Perturbations of Operators with Application to Testing Equality of Covariance Operators.
by
Krishna Kaphle, M.S.
A Dissertation
In
Mathematics and Statistics
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Approved
Frits H. Ruymgaart
Linda J. Allen
Petros Hadjicostas
Peggy Gordon Miller Interim Dean of the Graduate School
August, 2011

© 2011, Krishna Kaphle

Texas Tech University, Krishna Kaphle, August-2011
ACKNOWLEDGEMENTS
This dissertation would not have been possible without the continuous support of my advisor, Horn Professor Dr. Frits H. Ruymgaart. I am heartily thankful to Dr. Ruymgaart for his encouragement and supervision, from the identification of the research problem to the present level of my dissertation research. His continuous supervision enabled me not only to bring my dissertation to this stage but also to develop an understanding of the subject. I am lucky to have a great person like him as my advisor. I am grateful to Horn Professor Dr. Linda J. Allen for her help throughout my stay at Texas Tech. I would also like to thank Dr. Petros Hadjicostas for his encouragement and several valuable remarks during my research. I am also heartily thankful to Dr. David Gilliam for his support and help in writing Matlab code. I am pleased to thank those who made this dissertation possible. Among them, very important people are Dr. Kent Pearce, chair, Dr. Ram Iyer, graduate advisor, and all the staff of the Department of Mathematics and Statistics, who gave me the moral, financial, and technological support required during my study. I would also like to make a special reference to Dr. Magdalena Toda, undergraduate director, for the encouragement and support I received from her. I would like to thank all the professors of the Department of Mathematics and Statistics and my fellow graduate students for helping me whenever needed to finish this dissertation.
TABLE OF CONTENTS
Acknowledgements ...... ii
Abstract ...... v
List of Tables ...... vi
List of Figures ...... vii
1. Introduction ...... 1
2. Elements of Hilbert Space Theory ...... 3
2.1 Hilbert spaces ...... 3
2.2 Projection and Riesz representation ...... 5
2.2.1 Projections ...... 8
2.2.2 Spectral Properties of Operators ...... 11
3. Random Variables in a Hilbert Space ...... 19
3.1 Random variables ...... 19
3.2 Probability Measures on H ...... 22
3.2.1 Some remarks on B ...... 22
3.2.2 Probability measures on H ...... 23
3.2.3 Gaussian distributions ...... 26
3.2.4 Karhunen-Loève expansion ...... 28
4. Random Samples and Limit Theorems ...... 29
4.1 Random Samples ...... 29
4.2 Some Central Limit Theorems ...... 31
5. Functions of Covariance Operators and Delta Method ...... 41
5.1 Functions of bounded linear operators ...... 43
5.1.1 Fréchet derivative ...... 45
5.1.2 Delta Method ...... 48
6. Perturbation of Eigenvalues and Eigenvectors ...... 51
6.1 Perturbation theory for operators ...... 51
6.2 Perturbation theory for matrices ...... 56
7. Testing Equality of Covariance Operators ...... 61
7.1 Finite dimensional case ...... 61
7.2 Infinite dimensional case ...... 62
7.2.1 Test statistic under null hypothesis ...... 62
7.2.2 Estimation of variance ...... 67
8. Generalized Test ...... 70
8.1 The two-sample case ...... 73
8.2 Test statistics ...... 75
8.3 Estimation of the variance ...... 79
8.3.1 The Gaussian case ...... 80
9. Some Simulations ...... 82
9.1 Test using single eigenvalue ...... 83
9.2 Test using the first m largest eigenvalues ...... 85
9.2.1 The null hypothesis ...... 85
9.2.2 The fixed alternative ...... 86
9.2.3 Identification of regularization parameter ...... 87
9.2.4 Local alternatives ...... 88
10. Conclusion ...... 93
Bibliography ...... 94
Appendix: Matlab Code ...... 98
ABSTRACT
The generalization of multivariate statistical procedures to infinite dimension naturally requires extra theoretical work. In this dissertation, we focus on testing the equality of covariance operators. We derive a procedure from the union-intersection principle in conjunction with a likelihood ratio test. This procedure leads to a statistic which is the largest eigenvalue of a product of operators. We generalize this procedure by using a test statistic that is based on the first m ∈ ℕ largest eigenvalues. Perturbation theory of operators and functional calculus of covariance operators are extensively used to derive the required asymptotics. It is shown that the power of the test improves with the inclusion of more eigenvalues. We perform simulations to corroborate the testing procedure, using samples from two Gaussian distributions.
LIST OF TABLES
9.1 Test statistic and the regularization parameter ...... 85
9.2 Power vs. inclusion of eigenvalues ...... 86
9.3 Fraction of rejections under the null hypothesis ...... 87
9.4 Type II error ...... 92
LIST OF FIGURES
8.1 Contours used for integration ...... 71
8.2 Contours enclosing m eigenvalues ...... 72
9.1 Histogram of 1000 test statistic values under the null ...... 84
9.2 ê1 vs. e1 ...... 84
9.3 v̂ vs. v ...... 84
9.4 Histograms of test statistic values under the null when two and three eigenvalues are taken ...... 86
9.5 Histograms of test statistics under the local alternative when two and three eigenvalues are taken ...... 87
9.6 Histograms of test statistics under the null when epsilon is 0.5 and two and three eigenvalues are taken ...... 88
9.7 Histograms of test statistics under the null when epsilon is 1.0 and two and three eigenvalues are taken ...... 89
9.8 Histograms of test statistics under the null when epsilon is 1.5 and two and three eigenvalues are taken ...... 89
9.9 QQ plots of the test statistics under the null when epsilon is 0.5 and two and three eigenvalues are taken ...... 90
9.10 QQ plots of the test statistics under the null when epsilon is 1.0 and two and three eigenvalues are taken ...... 90
9.11 QQ plots of the test statistics under the null when epsilon is 1.5 and two and three eigenvalues are taken ...... 91
9.12 Fraction of rejections vs. gamma for fixed sample size ...... 91
CHAPTER 1 INTRODUCTION
Univariate statistical theory is concerned with statistical inference based on samples from a distribution on the real line. Multivariate statistics refers to inference based on samples from a distribution on Euclidean, i.e. finite dimensional, spaces. Functional data analysis deals with samples in infinite dimensional spaces, typically function spaces. Indeed, the generic sample element is usually a function and hence an infinite dimensional object; it may also be an object of very high finite dimension. Throughout this research we will assume the data to be infinite dimensional. More specifically, the generic sample element will be supposed to be an element of an infinite dimensional Hilbert space H. We will keep H abstract. In the theory, this has the advantage that the properties obtained for sample means, for instance, entail at once properties of the sample covariance operator, because the latter is also a sample mean in a Hilbert space (albeit not the same Hilbert space in which the sample elements assume their values). A good example of a Hilbert space that might be used for functional data is L²(0, 1), the space of all square integrable functions on [0, 1]. Many classical statistical problems can be formulated for functional data [36]. This dissertation focuses on testing the equality of the covariance structures of two populations, based on random samples. Before going any further in the development of the theory, we will discuss some examples of functional data analysis (see [29] for a detailed discussion).
1. As a generic sample element we may consider the angles of the hip and knee over a child's gait cycle. The cycle begins when the heel touches the ground and ends when it touches the ground again. A natural question that arises is whether there is a relation between the two angle curves (see [26]).
2. Near-infrared spectroscopy is applied to different varieties of wheat. Let X_i(t) denote the density of the reflected radiation recorded at the spectrometer when the wavelength equals t, and let η_i represent the level of a given protein for the i-th type of wheat. Theory suggests that the relation between η_i and X_i is of the form
η_i = C + ∫_a^b X_i(t) f(t) dt + error, for i = 1, 2, . . . , n,
where C is a constant. This represents a functional regression model (see [19]).
In the above examples we have seen several statistical problems for functional data that are well known in multivariate statistics: for example, the two-sample problem, multiple and canonical correlation, and regression. Principal component analysis is another point of interest that extends to functional data. In this research we will need a version of the central limit theorem in Hilbert spaces that is somewhat more general than the central limit theorem in its simplest form. Although very general central limit theorems exist (see [24]), we prefer to present an independent proof tailored to the situation at hand. This yields the asymptotic distribution of both the sample mean and the sample covariance operator for certain triangular arrays of H-valued random variables. An important tool for the asymptotic distribution of the test statistic that we propose in Chapter 7 to deal with equality of covariance structures is a delta method for analytic functions of random operators.
There are many analogies between infinite dimensional Hilbert spaces and Euclidean spaces. However, there are some differences: for example, the closed unit ball in H is not compact in the norm topology, and there is no Lebesgue measure on H, so the option of defining a density with respect to Lebesgue measure is no longer available. Gaussian probability distributions can be defined on H, but two Gaussian distributions are either orthogonal or equivalent. Apart from some Hilbert space theory, we will discuss random elements, probability distributions, central limit theorems, and some operator and perturbation theory for H.
CHAPTER 2 ELEMENTS OF HILBERT SPACE THEORY
In this chapter, we will focus on some theory of Hilbert spaces.
2.1 Hilbert spaces

A real inner product space is a vector space equipped with a real-valued inner product ⟨·, ·⟩ which is symmetric, linear in both components, and satisfies ⟨x, x⟩ ≥ 0 for every element x in the space, with equality only if x = 0. Associated with the inner product is the norm ‖x‖ = √⟨x, x⟩. A sequence {x_n} in the inner product space is said to be Cauchy if for every ε > 0 there is an integer N such that ‖x_n − x_m‖ < ε whenever n, m > N. The inner product space is said to be complete if every Cauchy sequence converges to a limit in the space.
Definition 2.1.1. A Hilbert space H is a complete inner product space.
For any two elements x, y in H, the distance ρ(x, y) = ‖x − y‖ defines a metric on H. A subset of a Hilbert space is called a subspace if it is a Hilbert space in its own right. Some of the important properties of a Hilbert space H are:
(a) The Cauchy-Schwarz Inequality: For any x, y ∈ H,
|⟨x, y⟩| ≤ ‖x‖ ‖y‖. (2.1)
(b) The Triangle Inequality: For any x, y ∈ H,
‖x + y‖ ≤ ‖x‖ + ‖y‖. (2.2)
(c) The Continuity of the Inner Product: The inner product is jointly continuous with respect to the induced norm. That is,

x_n → x and y_n → y ⇒ ⟨x_n, y_n⟩ → ⟨x, y⟩.
(d) The Parallelogram Law: For any x, y ∈ H,
‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²). (2.3)
(e) The Polarization Identity: For any x, y ∈ H,
⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²). (2.4)

Complete normed vector spaces are called Banach spaces. Hilbert spaces are special Banach spaces in which the norm is induced by an inner product. The following proposition relates Banach spaces and Hilbert spaces.
Proposition 2.1.1. Every Banach space in which the parallelogram identity holds is a Hilbert space, and the inner product is uniquely determined by the polarization identity.
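The parallelogram law and the polarization identity are easy to check numerically in a finite dimensional Hilbert space such as ℝⁿ with the Euclidean inner product; a minimal sketch (the dimension and the random test vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)

ip = lambda u, v: float(np.dot(u, v))   # Euclidean inner product on R^n
norm = lambda u: np.sqrt(ip(u, u))      # induced norm

# Parallelogram law: ||x+y||^2 + ||x-y||^2 = 2(||x||^2 + ||y||^2)
lhs = norm(x + y) ** 2 + norm(x - y) ** 2
rhs = 2 * (norm(x) ** 2 + norm(y) ** 2)

# Polarization identity recovers the inner product from the norm:
# <x, y> = (1/4)(||x+y||^2 - ||x-y||^2)
polar = 0.25 * (norm(x + y) ** 2 - norm(x - y) ** 2)

print(abs(lhs - rhs) < 1e-9, abs(polar - ip(x, y)) < 1e-9)
```

This illustrates how the polarization identity uniquely determines the inner product from the norm, as the proposition asserts.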
Some examples: (i) The space l² of sequences {x_n} such that Σ_j x_j² < ∞, with inner product ⟨x, y⟩ = Σ_{i=1}^∞ x_i y_i. (ii) The space L²(0, 1) of all square integrable functions f on [0, 1], with inner product ⟨f, g⟩ = ∫_0^1 f(t) g(t) dt.
The following example shows that not every inner product space is a Hilbert space. Consider the space C[0, 1] of all continuous functions f on [0, 1]; the inner product can be defined by ⟨f, g⟩ = ∫_0^1 f(t) g(t) dt, but the space is not complete: f_n(t) = max{0, min(1, n(t − 1/2))} is a Cauchy sequence whose pointwise limit g, with g(t) = 0 for t ≤ 1/2 and g(t) = 1 for t > 1/2, is not continuous on [0, 1].
Definition 2.1.2 (Orthogonality). Two vectors x and y in H are said to be orthogonal if ⟨x, y⟩ = 0.
We write x⊥y if x and y are orthogonal vectors. A set S ⊂ H is said to be an orthogonal set if all vectors in it are non-zero and are mutually orthogonal. An orthogonal set with only unit vectors is called an orthonormal set. We will end this section with one of the very important identities for any inner product space.
Pythagorean Identity: If x1, x2, . . . , xn is an orthogonal system, then
‖x_1 + x_2 + · · · + x_n‖² = ‖x_1‖² + ‖x_2‖² + · · · + ‖x_n‖². (2.5)
2.2 Projection and Riesz representation

In this section we discuss two important classes of mappings on H: one assuming values in ℝ and the other assuming values in H itself. A mapping l : H → ℝ is called a functional on H, and L : H → H is called an operator on H. A mapping φ is linear if φ(ax + by) = aφ(x) + bφ(y) for all a, b ∈ ℝ and x, y ∈ H, and it is said to be continuous if φ(x_n) → φ(x) whenever x_n → x in H.
Definition 2.2.1. A bounded linear functional is a mapping l : H → ℝ which is linear and satisfies

sup_{x ≠ 0} |l(x)| / ‖x‖ = ‖l‖ < ∞.
This definition entails |l(x)| ≤ ‖l‖ ‖x‖ for all x ∈ H. Without confusion we have used the same notation for the norm ‖l‖ of l and the norm ‖x‖ of x, although l is not an element of H. See, however, Theorem 2.2.1.
Proposition 2.2.1. The linear functional l on H is bounded if and only if it is continuous.
Proof. Let l be bounded and x_n → x. We have

|l(x_n) − l(x)| = |l(x_n − x)| ≤ ‖l‖ ‖x_n − x‖ → 0, as n → ∞.
Hence l is continuous. Conversely, let l be continuous, and suppose l is not bounded. Then there exists a sequence {x_n} with ‖x_n‖ = 1 and |l(x_n)| > n for each n. Let y_n = x_n / n. Then y_n → 0, so continuity gives l(y_n) → l(0) = 0, while |l(y_n)| = |l(x_n)| / n > 1 for each n, which is a contradiction.
It is worth noting that boundedness implies not only continuity but uniform continuity, since for any x ∈ H we have |l(x + h) − l(x)| = |l(h)| ≤ ‖l‖ ‖h‖. The following theorem is of extreme importance, and its proof can be found in any standard functional analysis book (see [31], [10], [9] and [25]).
Theorem 2.2.1 (Riesz representation theorem). To each bounded linear functional l : H → ℝ there corresponds a unique vector a_l ∈ H such that

l(x) = ⟨x, a_l⟩, x ∈ H; moreover, ‖l‖ = ‖a_l‖. (2.6)
The vector a_l is called the representer of the functional l.
Definition 2.2.2. A linear operator T : H → H is called bounded if

sup_{x ≠ 0} ‖Tx‖ / ‖x‖ = sup_{‖x‖ = 1} ‖Tx‖ = ‖T‖_L < ∞.
As in the case of functionals, the boundedness of T entails its uniform continuity. The class of all bounded linear operators on H will be denoted by L(H) or simply by L.
Proposition 2.2.2. L is a Banach space under the norm ‖·‖_L.
Proof. Let T_1, T_2 ∈ L and a, b ∈ ℝ. For any x ∈ H we have ‖(aT_1 + bT_2)x‖ = ‖aT_1(x) + bT_2(x)‖ ≤ (|a| ‖T_1‖ + |b| ‖T_2‖) ‖x‖, so aT_1 + bT_2 ∈ L. Now let {T_n} be a Cauchy sequence in L. Since ‖T_n(x) − T_m(x)‖ ≤ ‖T_n − T_m‖ ‖x‖, the sequence T_n(x) is Cauchy in H and hence convergent; let T(x) denote its limit. Since {T_n} is Cauchy, there is M < ∞ such that ‖T_n‖ < M for all n, and the continuity of the norm implies ‖T(x)‖ ≤ M ‖x‖ for all x ∈ H; thus T ∈ L. Let ε > 0 and let k be such that ‖T_n − T_m‖ < ε for m, n ≥ k. Then

‖T_m(x) − T_n(x)‖ ≤ ‖T_m − T_n‖ ‖x‖ ≤ ε ‖x‖ for all m, n ≥ k,

so that, letting m → ∞,

‖T(x) − T_n(x)‖ = lim_{m→∞} ‖T_m(x) − T_n(x)‖ ≤ ε ‖x‖

for all n ≥ k and x ∈ H. That is, ‖T − T_n‖ ≤ ε for all n ≥ k, and hence lim T_n = T ∈ L.
Definition 2.2.3. The operator T* : H → H is called the adjoint of T if

⟨Tx, y⟩ = ⟨x, T*y⟩ for all x, y ∈ H. (2.7)
Proposition 2.2.3. Every bounded linear operator T ∈ L has a bounded linear adjoint.
Proof. Let y ∈ H be fixed. Then l(x) = ⟨Tx, y⟩, x ∈ H, is a linear functional on H. This functional is bounded because T is bounded:

|⟨Tx, y⟩| ≤ ‖Tx‖ ‖y‖ ≤ ‖T‖ ‖x‖ ‖y‖.

Thus, by the Riesz representation theorem, there exists a representer T*y ∈ H of this functional such that l(x) = ⟨Tx, y⟩ = ⟨x, T*y⟩, x ∈ H. Next, ⟨x, T*(ay + bz)⟩ = ⟨Tx, ay + bz⟩ = a⟨Tx, y⟩ + b⟨Tx, z⟩ = ⟨x, aT*y + bT*z⟩, so T*(ay + bz) = aT*y + bT*z for all a, b ∈ ℝ and y, z ∈ H; therefore T* is linear. Finally, ‖T*y‖² = ⟨T*y, T*y⟩ = ⟨TT*y, y⟩ ≤ ‖T‖ ‖T*y‖ ‖y‖ implies ‖T*y‖ ≤ ‖T‖ ‖y‖. Similarly, ‖Tx‖ ≤ ‖T*‖ ‖x‖. Thus T* is bounded and we have

‖T‖ = ‖T*‖. (2.8)
Definition 2.2.4. A linear operator T ∈ L is called Hermitian if T ∗ = T .
For any T ∈ L, the operator T*T : H → H is Hermitian.
Example 1: Let H = l², the space of square-summable sequences. The operator T : l² → l², defined by

T(x) = T(x_1, x_2, . . .) = (x_2, x_3, . . .), where x = (x_1, x_2, . . .), Σ_{k=1}^∞ x_k² < ∞,

is known as the shift operator. Since

‖T(x_1, x_2, . . .)‖² = Σ_{k=2}^∞ x_k² ≤ Σ_{k=1}^∞ x_k² = ‖(x_1, x_2, . . .)‖²,

T is bounded. Also

⟨Tx, y⟩ = Σ_{k=1}^∞ x_{k+1} y_k = x_1 · 0 + Σ_{k=2}^∞ x_k y_{k−1} = ⟨x, T*y⟩,

where T*y = (0, y_1, y_2, . . .).
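The shift operator and its adjoint can be illustrated on truncated l² vectors; a small sketch (the truncation length is an arbitrary choice, and the truncation itself is only a finite dimensional stand-in for the infinite dimensional operator):

```python
import numpy as np

def shift(x):
    # T(x1, x2, x3, ...) = (x2, x3, ...): drop the first coordinate
    return np.append(x[1:], 0.0)

def shift_adj(y):
    # T*(y1, y2, ...) = (0, y1, y2, ...): prepend a zero (last entry truncated)
    return np.concatenate(([0.0], y[:-1]))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(50), rng.standard_normal(50)

# <Tx, y> = <x, T*y>, and ||Tx|| <= ||x||, so T is bounded
print(np.isclose(np.dot(shift(x), y), np.dot(x, shift_adj(y))),
      np.linalg.norm(shift(x)) <= np.linalg.norm(x))
```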
Example 2: Let H = L²(0, 1). The operator known as the primitivation operator is defined by

(Tf)(x) = ∫_0^x f(t) dt, 0 ≤ x ≤ 1, f ∈ L²(0, 1).

We have

‖Tf‖² = ∫_0^1 (∫_0^x f(t) dt)² dx = ∫_0^1 ⟨1_[0,x], f⟩² dx ≤ ∫_0^1 ‖1_[0,x]‖² ‖f‖² dx = ‖f‖² ∫_0^1 x dx = (1/2) ‖f‖².

This shows that ‖T‖ ≤ 1/√2. (The bound is not sharp: the Cauchy-Schwarz step cannot hold with equality for every x simultaneously, and the exact operator norm is known to be 2/π.) To find the adjoint, note that

⟨Tf, g⟩ = ∫_{x=0}^1 (∫_{t=0}^1 1_[0,x](t) f(t) dt) g(x) dx = ∫_{t=0}^1 f(t) (∫_{x=0}^1 1_[0,x](t) g(x) dx) dt = ⟨f, T*g⟩,

where (T*g)(t) = ∫_{x=t}^1 g(x) dx.
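Discretizing the primitivation operator on a grid makes the bound ‖T‖ ≤ 1/√2 easy to check numerically; a sketch assuming a simple midpoint-rule discretization (the grid size is an arbitrary choice). The largest singular value of the discretized operator approaches the exact operator norm 2/π:

```python
import numpy as np

n = 500
h = 1.0 / n
# Midpoint-rule discretization of (Tf)(x) = ∫_0^x f(t) dt: a lower
# triangular matrix of quadrature weights acting on grid values of f.
A = h * np.tril(np.ones((n, n)))
# The adjoint (T*g)(t) = ∫_t^1 g(x) dx is represented by A.T.

smax = np.linalg.svd(A, compute_uv=False)[0]  # discretized operator norm
print(smax)  # close to 2/pi, below the bound 1/sqrt(2)
```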
2.2.1 Projections

The class of projection operators is one of the most important classes of operators on a Hilbert space. They resemble the class of projection matrices in the case of Euclidean spaces. In this subsection we briefly discuss projection operators.
Definition 2.2.5. For any non-empty subset G of H, the orthogonal complement G^⊥ is defined to be the set

G^⊥ = {x ∈ H : x ⊥ y, ∀y ∈ G}.
Proposition 2.2.4. G⊥ is a closed linear subspace of H.
Proof. Let x_1, x_2 ∈ G^⊥ and a, b ∈ ℝ. For any y ∈ G we have ⟨ax_1 + bx_2, y⟩ = a⟨x_1, y⟩ + b⟨x_2, y⟩ = 0; that is, G^⊥ is a linear subspace of H. Next, let {x_n} be any sequence in G^⊥ converging to x. For any y ∈ G we have ⟨x, y⟩ = lim_{n→∞} ⟨x_n, y⟩ = 0; that is, x ∈ G^⊥. Hence G^⊥ is closed.
The following theorem is of extreme importance.
Theorem 2.2.2. Let L ⊂ H be a closed linear subspace of H. Then, for each x ∈ H, there exists a unique P x ∈ L such that
‖x − Px‖ = min_{y ∈ L} ‖x − y‖.

Moreover, we have the unique decomposition

x = Px + (x − Px), Px ∈ L, (x − Px) ∈ L^⊥.
Proof. Since L is a closed linear subspace, it is convex and complete. By the definition of the minimum there exists a sequence {y_n} ⊂ L such that d_n → min_{y ∈ L} ‖x − y‖ = d, where d_n = ‖x − y_n‖. Note that, by the parallelogram law,

‖y_n − y_m‖² = ‖(y_n − x) − (y_m − x)‖²
= 2‖y_n − x‖² + 2‖y_m − x‖² − ‖(y_n − x) + (y_m − x)‖²
= 2(d_n² + d_m²) − 4‖(y_n + y_m)/2 − x‖²
≤ 2(d_n² + d_m²) − 4d² → 0,

since (y_n + y_m)/2 ∈ L. Thus {y_n} is Cauchy and hence converges in L to some y = Px. Also, Px ∈ L implies ‖x − Px‖ ≥ d. Moreover,

‖x − Px‖ ≤ ‖x − y_n‖ + ‖y_n − Px‖ = d_n + ‖y_n − Px‖ → d.
Hence d = ‖x − Px‖. Now suppose z ∈ L is another minimizer, i.e. ‖x − z‖ = d. By the parallelogram law, we have

‖Px − z‖² = ‖(Px − x) + (x − z)‖²
= 2‖Px − x‖² + 2‖x − z‖² − 4‖(Px + z)/2 − x‖²
≤ 2d² + 2d² − 4d² = 0,

since (Px + z)/2 ∈ L. Therefore Px − z = 0, i.e. Px = z, which proves uniqueness. The rest of the decomposition can be proved by exploiting uniqueness.
The vector P x is called the orthogonal projection of x onto L, and we have the following result associated with orthogonal projection.
Proposition 2.2.5. The map x 7−→ P x is linear and bounded with kP k = 1.
Proof. Let L ⊂ H be a closed linear subspace, x, y ∈ H, and a, b ∈ ℝ. We have

x = Px + (x − Px) and y = Py + (y − Py), with Px, Py ∈ L and (x − Px), (y − Py) ∈ L^⊥.

Also, ax + by = P(ax + by) + (ax + by − P(ax + by)), with P(ax + by) ∈ L and ax + by − P(ax + by) ∈ L^⊥. On the other hand, ax + by = a(Px + (x − Px)) + b(Py + (y − Py)) = (aPx + bPy) + (a(x − Px) + b(y − Py)), with aPx + bPy ∈ L and a(x − Px) + b(y − Py) ∈ L^⊥. The uniqueness of the decomposition implies P(ax + by) = aPx + bPy; hence the map is linear. By the Pythagorean identity, ‖x‖² = ‖Px‖² + ‖x − Px‖², so ‖Px‖ ≤ ‖x‖. Thus P is bounded and ‖P‖ ≤ 1. But Px = x for x ∈ L, so ‖P‖ = 1.
Theorem 2.2.3. A bounded linear operator P : H → H is a projection if and only if it is idempotent and Hermitian, that is, P² = P and P* = P.
Proof. To prove the "if" part, let G = {g : Pg = 0} and L = G^⊥. Since P² = P, we have (x − Px) ∈ G for all x ∈ H, and for any g ∈ G, ⟨Px, g⟩ = ⟨x, Pg⟩ = 0. Thus Px ∈ G^⊥ = L, and the decomposition x = Px + (x − Px) shows that P is the projection onto L.
Conversely, let P be the projection onto a closed subspace L. For any x ∈ H we have x = Px + (x − Px) with Px ∈ L and x − Px ∈ L^⊥. Applying P again gives Px = P²x for all x ∈ H, so P² = P. Moreover,

⟨Px, y⟩ = ⟨Px, Py + (y − Py)⟩ = ⟨Px, Py⟩,
⟨x, Py⟩ = ⟨Px + (x − Px), Py⟩ = ⟨Px, Py⟩.

That is, ⟨Px, y⟩ = ⟨x, Py⟩ = ⟨P*x, y⟩ for all x, y ∈ H, which implies P* = P.
Theorem 2.2.4. Let L = span(e_1, e_2, . . . , e_n), where {e_1, e_2, . . . , e_n} is a finite orthonormal system. If P is the projection onto L, we have

Px = Σ_{k=1}^n ⟨x, e_k⟩ e_k, x ∈ H.

Proof. We have x = Px + (x − Px), Px ∈ L, and (x − Px) ∈ L^⊥. Thus Px = Σ_{k=1}^n c_k e_k for some constants c_k. In addition, ⟨x − Px, e_j⟩ = 0 for each j = 1, 2, . . . , n, which implies c_k = ⟨x, e_k⟩ for all k. Hence the result follows.
An orthonormal sequence e_1, e_2, . . . in H is called complete if ⟨x, e_k⟩ = 0 for all integers k implies x = 0. A complete orthonormal sequence is called an orthonormal basis, and we have the following two important results:
(1) Any separable Hilbert space of infinite dimension has a countably infinite orthonormal basis e_1, e_2, . . ..
(2) Any vector x ∈ H can be written as x = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k, in the sense that

‖x − Σ_{k=1}^n ⟨x, e_k⟩ e_k‖ → 0, as n → ∞,

which yields Parseval's identity

‖x‖² = Σ_{k=1}^∞ ⟨x, e_k⟩².
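The projection formula of Theorem 2.2.4 and Parseval's identity can both be verified in a finite dimensional Hilbert space; a minimal sketch where the orthonormal systems are produced by QR factorizations (the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(20)

# Orthonormal system e_1, ..., e_5 in R^20 via a reduced QR factorization
E, _ = np.linalg.qr(rng.standard_normal((20, 5)))

# Theorem 2.2.4: Px = sum_k <x, e_k> e_k
Px = E @ (E.T @ x)
P = E @ E.T                               # the projection as a matrix
checks = [np.allclose(P @ P, P),          # idempotent
          np.allclose(P, P.T),            # Hermitian
          np.allclose(E.T @ (x - Px), 0)] # x - Px orthogonal to the span

# Parseval: with a complete orthonormal basis, ||x||^2 = sum_k <x, e_k>^2
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
checks.append(np.isclose(np.linalg.norm(x) ** 2, np.sum((Q.T @ x) ** 2)))
print(all(checks))
```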
2.2.2 Spectral Properties of Operators We will now discuss some important classes of bounded operators on H.
Definition 2.2.6. A subset S of H is compact if every sequence in S contains a convergent subsequence with limit in S.
Closed and bounded sets are compact in Euclidean spaces. But, the same is not true in infinite dimensional Hilbert spaces. The following theorem shows one of the differences between Euclidean spaces and infinite dimensional Hilbert spaces.
Theorem 2.2.5. If H is infinite dimensional, the closed unit ball B = {x : ‖x‖ ≤ 1} is not compact in the norm metric on H.

Proof. Let e_1, e_2, . . . be an orthonormal basis of H. We have

‖e_j − e_k‖ = √(2 − 2⟨e_j, e_k⟩) = √2 for j ≠ k.

Hence no subsequence of e_1, e_2, . . . can be Cauchy; that is, the sequence does not have any convergent subsequence. Since e_n ∈ B for all n, B cannot be compact.
Definition 2.2.7. A set S ⊂ H is called precompact if every sequence of points in S contains a Cauchy subsequence. A linear operator T : H → H is called compact if the image TB of the unit ball B is precompact. Equivalently, T is compact if for every bounded sequence {xn} in H, the sequence {T xn} contains a convergent subsequence.
Proposition 2.2.6. Compact operators are bounded.
Proof. Let T be compact. If T is unbounded, there exists a sequence {x_n} in H such that ‖x_n‖ = 1 for all n and ‖Tx_n‖ → ∞. But then {Tx_n} cannot have a convergent subsequence.
Definition 2.2.8. A linear operator T is called positive if T is Hermitian and

⟨Tx, x⟩ > 0 for all x ∈ H \ {0}.

It is called nonnegative if equality is allowed.
A positive operator is one-to-one: if there exist x ≠ y with Tx = Ty, then ⟨T(x − y), x − y⟩ = 0 with x − y ≠ 0, contradicting positivity.
Definition 2.2.9. A number λ ∈ ℂ is called an eigenvalue of an operator T if there exists x ≠ 0 such that Tx = λx.

It is easy to see that the eigenvalues of a positive operator T are real and positive. The vector x in the above definition is called an eigenvector for the eigenvalue λ. The set of all eigenvectors for a given eigenvalue, together with 0, is a closed linear subspace. If the operator is compact, we have the following spectral theorem, whose proof can be found in any standard functional analysis book (see [10], [25], [30], and [31]).
Theorem 2.2.6. If H is infinite dimensional and T : H → H is a positive compact operator, then T has countably many positive eigenvalues τ_k, which can be arranged in a decreasing sequence converging to zero:

τ_1 > τ_2 > · · · ↓ 0,

and the corresponding eigenspaces are all finite dimensional and mutually orthogonal. If E_k is the orthogonal projection onto the eigenspace for τ_k, then we have

T = Σ_{k=1}^∞ τ_k E_k, with E_k E_l = 0 for k ≠ l, and Σ_{k=1}^∞ E_k = I. (2.9)
The following result relates the norm of an operator to its largest eigenvalue; the easy proof is omitted.

Theorem 2.2.7. For any positive compact operator T,

‖T‖ = τ_1 = the largest eigenvalue.
Example: Let us take H = L2(0, 1). The kernel
K(s, t) = s ∧ t − st, (s, t) ∈ [0, 1] × [0, 1] ,
defines an integral operator K : L²(0, 1) → L²(0, 1) by

(Kf)(s) = ∫_0^1 K(s, t) f(t) dt
= ∫_0^1 (s ∧ t) f(t) dt − ∫_0^1 st f(t) dt
= ∫_0^s t f(t) dt + s ∫_s^1 f(t) dt − s ∫_0^1 t f(t) dt
= (1 − s) ∫_0^s t f(t) dt + s ∫_s^1 (1 − t) f(t) dt,

where s ∈ [0, 1] and f ∈ L²(0, 1). This operator is known to be positive and compact. Differentiating the equation (Kf)(s) = λf(s) twice and using the above expression, we get

−f(s) = λ f″(s), 0 ≤ s ≤ 1.

We also have the boundary conditions

f(0) = f(1) = 0.

This yields the eigenfunctions

φ_k(s) = √2 sin(kπs), 0 ≤ s ≤ 1,

with eigenvalues

λ_k = 1/(kπ)², k ∈ ℕ.
In this case all the eigenspaces are one dimensional, and φ_1, φ_2, . . . form an orthonormal basis. The integral kernel considered here is the covariance kernel of the Brownian bridge, the limiting process of the uniform empirical process.
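The eigenvalues λ_k = 1/(kπ)² can be recovered numerically by discretizing the kernel; a sketch assuming a midpoint-rule Nyström approximation of the integral operator (the grid size is an arbitrary choice):

```python
import numpy as np

n = 400
s = (np.arange(n) + 0.5) / n                    # midpoint grid on (0, 1)
K = np.minimum.outer(s, s) - np.outer(s, s)     # kernel K(s,t) = s∧t - st
A = K / n                                       # Nyström matrix for the integral operator

ev = np.sort(np.linalg.eigvalsh(A))[::-1]       # eigenvalues, largest first
theory = 1.0 / (np.arange(1, 6) * np.pi) ** 2   # λ_k = 1/(kπ)^2

print(np.max(np.abs(ev[:5] - theory)))          # small discretization error
```

Consistent with Theorem 2.2.7, the largest computed eigenvalue also approximates the operator norm ‖K‖ = 1/π².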
Definition 2.2.10. A linear operator T ∈ L is called Hilbert-Schmidt if for some orthonormal basis e_1, e_2, . . . of H we have

Σ_{k=1}^∞ ‖Te_k‖² < ∞. (2.10)
The following proposition shows that the finite number in (2.10) is independent of the choice of basis.
Proposition 2.2.7. If T is Hilbert-Schmidt, then for any two orthonormal bases e_1, e_2, . . . and a_1, a_2, . . . we have

Σ_{k=1}^∞ ‖Te_k‖² = Σ_{m=1}^∞ ‖Ta_m‖².

Proof. By Parseval's identity, and interchanging the order of summation (all terms are nonnegative),

Σ_{k=1}^∞ ‖Te_k‖² = Σ_{k=1}^∞ Σ_{m=1}^∞ ⟨Te_k, a_m⟩² = Σ_{m=1}^∞ Σ_{k=1}^∞ ⟨e_k, T*a_m⟩² = Σ_{m=1}^∞ ‖T*a_m‖².

Taking both bases equal to a_1, a_2, . . . in this identity gives Σ_m ‖Ta_m‖² = Σ_m ‖T*a_m‖², and combining the two equalities yields the result.
Proposition 2.2.8. If S, T ∈ L_HS and e_1, e_2, . . . is an orthonormal basis of H, then

⟨S, T⟩_HS = Σ_{k=1}^∞ ⟨Se_k, Te_k⟩ (2.11)

defines an inner product on L_HS.
Proof. Let S_1, S_2, T ∈ L_HS and a, b ∈ ℝ. We have

⟨aS_1 + bS_2, T⟩_HS = Σ_{k=1}^∞ ⟨aS_1 e_k + bS_2 e_k, Te_k⟩ = a Σ_{k=1}^∞ ⟨S_1 e_k, Te_k⟩ + b Σ_{k=1}^∞ ⟨S_2 e_k, Te_k⟩ = a⟨S_1, T⟩_HS + b⟨S_2, T⟩_HS.

Also, ⟨T, T⟩_HS = Σ_{k=1}^∞ ⟨Te_k, Te_k⟩ = Σ_{k=1}^∞ ‖Te_k‖² ≥ 0 for all T ∈ L_HS.

The inner product in (2.11) does not depend on the choice of basis, as can be seen in the same way as for the norm. Equipped with this inner product, the space L_HS of all Hilbert-Schmidt operators becomes a separable infinite dimensional Hilbert space in its own right, and the Hilbert-Schmidt norm will be written as

‖T‖²_HS = Σ_{k=1}^∞ ‖Te_k‖². (2.12)
Definition 2.2.11. For a, b ∈ H, the tensor product a ⊗ b is defined by the relation

(a ⊗ b)x = ⟨x, b⟩ a, ∀x ∈ H. (2.13)

This is an important example of an operator in L_HS. The following proposition gives its norm.
Proposition 2.2.9. We have ‖a ⊗ b‖_HS = ‖a‖ ‖b‖.

Proof. If b = 0 the result is obvious. Let b ≠ 0 and choose an orthonormal basis e_1 = b/‖b‖, e_2, . . .. We have

‖a ⊗ b‖²_HS = ‖(a ⊗ b)(b/‖b‖)‖² + Σ_{k=2}^∞ ‖(a ⊗ b)e_k‖²
= (⟨b, b⟩/‖b‖)² ‖a‖² + Σ_{k=2}^∞ ⟨e_k, b⟩² ‖a‖²
= ‖a‖² ‖b‖²,

since ⟨e_k, b⟩ = 0 for k ≥ 2.
This completes the proof of the Proposition.
If u ∈ H has ‖u‖ = 1, then u ⊗ u is the orthogonal projection onto span(u), and ‖u ⊗ u‖_HS = 1.
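In coordinates the tensor product is the outer product, which makes Proposition 2.2.9 and the projection remark easy to check; a minimal finite dimensional sketch (dimensions and vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, x = (rng.standard_normal(10) for _ in range(3))

# (a ⊗ b)x = <x, b> a — in coordinates, a ⊗ b is the outer product a b^T
T = np.outer(a, b)
checks = [np.allclose(T @ x, np.dot(x, b) * a),
          # ||a ⊗ b||_HS = ||a|| ||b|| (the Frobenius norm is the HS norm)
          np.isclose(np.linalg.norm(T), np.linalg.norm(a) * np.linalg.norm(b))]

# For a unit vector u, u ⊗ u is the orthogonal projection onto span(u)
u = a / np.linalg.norm(a)
P = np.outer(u, u)
checks.append(np.allclose(P @ P, P) and np.isclose(np.linalg.norm(P), 1.0))
print(all(checks))
```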
Theorem 2.2.8. If T ∈ L_HS, then T is compact.
Proof. Let e_1, e_2, . . . be an orthonormal basis of H, and note that

C_N = ‖T‖²_HS − Σ_{k=1}^N ‖Te_k‖² → 0, as N → ∞.

Define the operator T_N by T_N x = Σ_{k=1}^N ⟨x, e_k⟩ Te_k, x ∈ H; then T_N is compact because it has finite dimensional range. Choose x ∈ H and write x = x_N + x_N^⊥, where x_N = Σ_{k=1}^N ⟨x, e_k⟩ e_k and x_N^⊥ ⊥ span(e_1, e_2, . . . , e_N). If x_N^⊥ = 0, we have ‖(T − T_N)x‖² = 0 ≤ C_N ‖x‖². If x_N^⊥ ≠ 0, set ẽ_{N+1} = x_N^⊥ / ‖x_N^⊥‖, and extend ẽ_1 = e_1, . . . , ẽ_N = e_N, ẽ_{N+1} to an orthonormal basis of H. Because the HS-norm is independent of the basis, it follows that

‖(T − T_N)x‖² = ‖T x_N^⊥‖² = ‖T ẽ_{N+1}‖² ‖x_N^⊥‖² ≤ C_N ‖x‖².

It follows from these arguments that ‖T − T_N‖ → 0 as N → ∞. But this implies that T is compact, because a limit in operator norm of compact operators is itself compact.
Definition 2.2.12. The trace of a positive bounded operator T is defined as

tr(T) = ‖T‖_tr = Σ_{k=1}^∞ ⟨e_k, Te_k⟩, (2.14)

where e_1, e_2, . . . is an orthonormal basis of H.
The number in (2.14) is independent of the choice of basis. The operator is called trace class if ‖T‖_tr < ∞. The family of all trace class operators will be denoted by L_tr, and we have the relations

L_tr ⊂ L_HS ⊂ {all compact operators} ⊂ L. (2.15)
The following theorem justifies the name trace.
Theorem 2.2.9. Let T be a positive and compact operator on an infinite dimensional Hilbert space H with all eigenvalues τ_1 > τ_2 > · · · ↓ 0 simple. Then

tr(T) = Σ_{k=1}^∞ τ_k. (2.16)

Proof. Since all the eigenvalues are simple, there exists an orthonormal basis e_1, e_2, . . . of eigenvectors, with corresponding eigenprojections E_k = e_k ⊗ e_k, so that T can be written as

T = Σ_{k=1}^∞ τ_k e_k ⊗ e_k.

Thus we have

tr(T) = Σ_{k=1}^∞ ⟨e_k, Te_k⟩ = Σ_{k=1}^∞ τ_k ‖e_k‖² = Σ_{k=1}^∞ τ_k.
Thus, in this special case, the trace of an operator is nothing more than the sum of its eigenvalues.
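For the Brownian bridge kernel of the earlier example, the trace can be checked three ways: tr(K) = ∫_0^1 (s − s²) ds = 1/6, which should match both the eigenvalue sum of the discretized operator and the series Σ_k 1/(kπ)². A sketch using the same midpoint discretization (grid size and series truncation are arbitrary choices):

```python
import numpy as np

n = 400
s = (np.arange(n) + 0.5) / n
A = (np.minimum.outer(s, s) - np.outer(s, s)) / n  # discretized Brownian bridge covariance

tr_diag = np.trace(A)                              # grid version of ∫ K(s,s) ds
tr_eig = np.sum(np.linalg.eigvalsh(A))             # sum of eigenvalues
tr_series = np.sum(1.0 / (np.arange(1, 20001) * np.pi) ** 2)  # sum 1/(kπ)^2

# All three approximate ∫_0^1 (s - s^2) ds = 1/6
print(tr_diag, tr_eig, tr_series)
```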
CHAPTER 3 RANDOM VARIABLES IN A HILBERT SPACE
3.1 Random variables

In this section we define random variables in a Hilbert space and discuss their basic properties. The Hilbert space H considered from here onward is infinite dimensional and separable. Let (Ω, F, P) be a probability space. A mapping X : Ω → H that is (F, B)-measurable is called a random variable in H, where B is the smallest sigma algebra that contains the family of all open sets in H. A random variable induces a probability distribution P_X = P on (H, B), defined by the relation
P(B) = P_X(B) = P(X⁻¹(B)) = P{X ∈ B}, B ∈ B. (3.1)
Now we define moments of the random variable in H.
Definition 3.1.1. Let X be a random variable in H with E kXk < ∞. The mean EX of X is the vector µ ∈ H uniquely determined by the relation
E ha, Xi = ha, µi , ∀a ∈ H. (3.2)
With the above defined mean, \((X - \mu) \otimes (X - \mu)\) is a random element in \(\mathcal{L}_{HS}\), with norm
\[ \|(X-\mu)\otimes(X-\mu)\|_{HS} = \|X-\mu\|^2. \tag{3.3} \]
If we assume that
\[ \mathbb{E}\|X\|^2 < \infty, \tag{3.4} \]
the random element (X − µ) ⊗ (X − µ) has a mean in LHS in its own right by essentially the same definition as for the mean of X, i.e. E (X − µ) ⊗ (X − µ) is uniquely determined by the requirement
E hT, (X − µ) ⊗ (X − µ)iHS = hT, ΣiHS , ∀T ∈ LHS. (3.5)
The uniqueness in these definitions follows from the Riesz representation
theorem, because the functionals a 7→ E ha, Xi and T 7→ E hT, (X − µ) ⊗ (X − µ)iHS are bounded.
The relation in (3.5) can be further simplified. We know that LHS is spanned by the operators of the form (a ⊗ b), a, b ∈ H. So choosing T = a ⊗ b we see that the left hand side of (3.5) is,
\[
\mathbb{E}\langle a\otimes b, (X-\mu)\otimes(X-\mu)\rangle_{HS}
= \mathbb{E}\sum_{k=1}^{\infty}\langle (a\otimes b)e_k, ((X-\mu)\otimes(X-\mu))e_k\rangle
\]
\[
= \mathbb{E}\langle \langle b,e_1\rangle a,\; \langle X-\mu, e_1\rangle (X-\mu)\rangle
= \langle b,e_1\rangle\, \mathbb{E}\langle X-\mu, e_1\rangle \langle a, X-\mu\rangle
= \mathbb{E}\langle a, X-\mu\rangle \langle X-\mu, b\rangle,
\]
where we have assumed \(b \ne 0\) and taken \(e_1 = b/\|b\|, e_2, \ldots\) to be an orthonormal basis of \(H\). Also the right hand side of (3.5) reduces to
\[
\langle a\otimes b, \Sigma\rangle_{HS} = \sum_{k=1}^{\infty}\langle (a\otimes b)e_k, \Sigma e_k\rangle = \langle b,e_1\rangle \langle a, \Sigma e_1\rangle = \langle a, \Sigma b\rangle.
\]
This establishes a very important relation: the covariance operator \(\Sigma\) of \(X\) is uniquely determined by
\[ \mathbb{E}\langle a, X-\mu\rangle\langle X-\mu, b\rangle = \langle a, \Sigma b\rangle, \quad \forall a, b \in H. \tag{3.6} \]
Apart from some elementary properties similar to those in the multivariate case [28], [11], covariance operators have some specific properties. The following theorem states some of the important ones.
Theorem 3.1.1. The covariance operator \(\Sigma\) is Hermitian and of trace class, and hence compact. It is nonnegative, and positive if it is one to one.
Proof. Using (3.6) it is easy to see that \(\Sigma\) is Hermitian. To prove it is of trace class, let
e1, e2,... be any orthonormal basis in H. Then,
\[ X - \mu = \sum_{k=1}^{\infty} \langle X-\mu, e_k\rangle e_k. \]
Hence
\[ \sum_{k=1}^{\infty}\langle e_k, \Sigma e_k\rangle = \sum_{k=1}^{\infty}\mathbb{E}\langle X-\mu, e_k\rangle^2 = \mathbb{E}\sum_{k=1}^{\infty}\langle X-\mu, e_k\rangle^2 = \mathbb{E}\|X-\mu\|^2 < \infty. \]
Therefore Σ is of finite trace. Moreover, for any a ∈ H,
\[ \langle a, \Sigma a\rangle = \mathbb{E}\langle a, X-\mu\rangle^2 \ge 0. \]
If Σ is one to one, the number on the left in the above inequality is positive and hence Σ is positive.
Now we define an important class of operators on a Hilbert space.
Definition 3.1.2. An operator is called a covariance operator if it has the following properties: (i) it is Hermitian; (ii) it is positive; (iii) it has finite trace. Hence it is Hilbert-Schmidt and therefore compact.
Because of the compactness and positivity of \(\Sigma\), it has a spectral decomposition as in Theorem 2.2.6. Let \(\sigma_1^2 > \sigma_2^2 > \cdots \downarrow 0\) be the eigenvalues of \(\Sigma\). If we assume that all the eigenspaces are one dimensional, there exists an orthonormal basis \(e_1, e_2, \ldots\) such that the eigenprojections can be written as
Ek = ek ⊗ ek, k ∈ N, (3.7)
and we have the spectral representation
\[ \Sigma = \sum_{k=1}^{\infty} \sigma_k^2\, e_k \otimes e_k. \tag{3.8} \]
Note that the variance of \(\langle X - \mu, e_k\rangle\) is \(\langle e_k, \Sigma e_k\rangle = \sigma_k^2\), so that the \(\sigma_k^2\) are in fact variances, as the notation suggests.
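The defining relation (3.6) and the spectral form (3.7)-(3.8) can be illustrated in finite dimensions. The following sketch (hypothetical matrices, not from the text) verifies the eigendecomposition of a covariance matrix and the identity \(\langle e_k, \Sigma e_k\rangle = \sigma_k^2\).

```python
import numpy as np

# Finite-dimensional illustration of (3.7)-(3.8): a covariance matrix Sigma
# decomposes as sum_k sigma_k^2 * e_k e_k^T, and <e_k, Sigma e_k> recovers
# the variances sigma_k^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T                          # symmetric positive semi-definite

sig2, E = np.linalg.eigh(Sigma)          # eigenvalues sigma_k^2, eigenvectors e_k
recon = sum(sig2[k] * np.outer(E[:, k], E[:, k]) for k in range(5))
assert np.allclose(recon, Sigma)

# <e_k, Sigma e_k> = sigma_k^2 for each k
assert np.allclose([E[:, k] @ Sigma @ E[:, k] for k in range(5)], sig2)
```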
3.2 Probability Measures on H
3.2.1 Some remarks on B
Let \(\mathcal{A}\) be any \(\sigma\)-field of subsets of \(H\), that is, \(\emptyset \in \mathcal{A}\); \(A \in \mathcal{A} \Rightarrow A^c \in \mathcal{A}\); and \(A_1, A_2, \ldots \in \mathcal{A} \Rightarrow \cup_{k=1}^{\infty} A_k \in \mathcal{A}\). The sigma field generated by any collection \(\mathcal{C}\) is \(\sigma(\mathcal{C}) = \cap_{\mathcal{A} \supset \mathcal{C}} \mathcal{A}\); that is, \(\sigma(\mathcal{C})\) is the smallest sigma field containing \(\mathcal{C}\). The sigma field \(\mathcal{B}\) is generated by the collection of all open sets \(\mathcal{O}\). Each \(O \in \mathcal{O}\) can be written as \(O = \cup_{k=1}^{\infty} B_k\) for some open balls \(B_1, B_2, \ldots\). A collection \(\mathcal{V}\) of subsets of \(H\) is called a field if it is closed under complementation and finite unions. For each \(k \in \mathbb{N}\) let \(\mathcal{B}_k\) denote the \(\sigma\)-field of Borel sets in \(\mathbb{R}^k\). A subset \(C\) of \(H\) is called a Borel cylinder set with base \(B\) if it is of the form \(C = \{x \in H : (\langle a_1, x\rangle, \ldots, \langle a_k, x\rangle) \in B\}\), for some \(k \ge 1\), \(a_1, \ldots, a_k \in H\), and \(B \in \mathcal{B}_k\). Thus \(C\) is an inverse image of a Borel set under a continuous function and hence measurable. The class \(\mathcal{V}\) of all cylinder sets is a field and a subset of \(\mathcal{B}\). It is not difficult to see that \(\mathcal{V}\) also generates \(\mathcal{B}\). Since \(H\) is separable there exists a countable set of elements \(x_n \in H\), \(x_n \ne 0\), which is everywhere dense in \(H\). Let
\[ u_n = \frac{x_n}{\|x_n\|}, \quad \text{so that } \langle x_n, u_n\rangle = \|x_n\|, \quad n \in \mathbb{N}. \]
Using the Cauchy - Schwarz inequality it is easy to see that
\[ \{x \in H : \|x\| \le \alpha\} \subset \bigcap_{n=1}^{\infty} \{x \in H : \langle x, u_n\rangle \le \alpha\}. \tag{3.9} \]
Since the \(x_n\) are everywhere dense in \(H\), for \(x \in H\) with \(\|x\| > \alpha\) there exists \(n_0 \in \mathbb{N}\) such that \(\|x - x_{n_0}\| < \frac{1}{2}(\|x\| - \alpha)\), so that
\[ \|x_{n_0}\| \ge \|x\| - \|x_{n_0} - x\| > \|x\| - \tfrac{1}{2}(\|x\| - \alpha) = \tfrac{1}{2}(\|x\| + \alpha). \]
Moreover,
\[ \left| \langle x, u_{n_0}\rangle - \|x_{n_0}\| \right| = \left| \langle x - x_{n_0}, u_{n_0}\rangle \right| \le \|u_{n_0}\|\, \|x - x_{n_0}\| < \tfrac{1}{2}(\|x\| - \alpha). \]
This entails hx, un0 i > α. Hence,
\[ \{x \in H : \|x\| > \alpha\} \subset \bigcup_{n=1}^{\infty} \{x \in H : \langle x, u_n\rangle > \alpha\}. \]
Taking complements we get
\[ \{x \in H : \|x\| \le \alpha\} = \bigcap_{n=1}^{\infty} \{x \in H : \langle x, u_n\rangle \le \alpha\}. \tag{3.10} \]
The set \(\cap_{n=1}^{\infty} \{x \in H : \langle x, u_n\rangle \le \alpha\}\) is the intersection of a countable collection of cylinder sets and therefore an element of \(\sigma(\mathcal{V})\) for \(\alpha > 0\), and trivially for \(\alpha = 0\). But then
\[ \{x \in H : \|x\| < \alpha\} = \bigcup_{n=1}^{\infty} \left\{x \in H : \|x\| \le \alpha - \frac{1}{n}\right\}, \tag{3.11} \]
is also an element of \(\sigma(\mathcal{V})\) for each \(\alpha > 0\). This argument extends easily to all open balls, so we have \(\sigma(\mathcal{V}) \supset \mathcal{B}\). Because of the structure of \(\sigma(\mathcal{V})\), we can apply a fundamental result from measure theory due to Carathéodory: if \(P_0 : \mathcal{V} \to [0, 1]\) satisfies \(P_0(H) = 1\) and is \(\sigma\)-additive on \(\mathcal{V}\), then \(P_0\) extends uniquely to a probability measure \(P\) on \(\mathcal{B}\).
3.2.2 Probability measures on H
Let \(x_1, x_2, \ldots \in H\) and \(p_1 \ge 0, p_2 \ge 0, \ldots\) be such that \(\sum_{k=1}^{\infty} p_k = 1\). The corresponding discrete probability measure on \(H\) is defined as
\[ P(B) = \sum_{k : x_k \in B} p_k, \quad B \in \mathcal{B}. \]
Since each point subset is in B we have P ({xk}) = pk. Important examples of discrete distributions are the empirical distributions.
Let us assume that X has a discrete distribution PX = P as above, and that
\[ \mathbb{E}\|X\| = \sum_{k=1}^{\infty} \|x_k\| p_k < \infty. \]
Under this assumption the mean of X is given by
\[ \mathbb{E}X = \mu = \sum_{k=1}^{\infty} x_k p_k, \]
because
\[ \mathbb{E}\langle a, X\rangle = \sum_{k=1}^{\infty} \langle a, x_k\rangle p_k = \Big\langle a, \sum_{k=1}^{\infty} x_k p_k \Big\rangle = \langle a, \mu\rangle, \quad \forall a \in H. \]
Moreover, assuming that
\[ \mathbb{E}\|X\|^2 = \sum_{k=1}^{\infty} \|x_k\|^2 p_k < \infty, \]
the covariance operator of \(X\) equals
\[ \mathbb{E}(X-\mu)\otimes(X-\mu) = \Sigma = \sum_{k=1}^{\infty} \{(x_k-\mu)\otimes(x_k-\mu)\}\, p_k. \]
To see this, choose an arbitrary a, b ∈ H, and note that
\[ \mathbb{E}\langle a, X-\mu\rangle\langle X-\mu, b\rangle = \sum_{k=1}^{\infty} \langle a, x_k-\mu\rangle\langle x_k-\mu, b\rangle p_k. \]
This last expression should be ha, Σbi. To double check note that, with Σ as above we indeed have
\[
\langle a, \Sigma b\rangle = \sum_{k=1}^{\infty} \langle a, ((x_k-\mu)\otimes(x_k-\mu))b\rangle p_k
= \sum_{k=1}^{\infty} \langle a, \langle x_k-\mu, b\rangle (x_k-\mu)\rangle p_k
= \sum_{k=1}^{\infty} \langle a, x_k-\mu\rangle\langle x_k-\mu, b\rangle p_k.
\]
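The double check above is exact and can be reproduced numerically for a discrete distribution with finitely many atoms; the sketch below uses hypothetical atoms and weights in a finite-dimensional stand-in for \(H\).

```python
import numpy as np

# Discrete distribution on finitely many atoms x_k with weights p_k.
rng = np.random.default_rng(2)
xs = rng.standard_normal((4, 3))        # four atoms x_k in R^3
p = np.array([0.1, 0.2, 0.3, 0.4])      # p_k >= 0, sum = 1

mu = (p[:, None] * xs).sum(axis=0)      # mean: sum_k x_k p_k
# covariance operator: sum_k (x_k - mu)(x_k - mu)^T p_k
Sigma = sum(p[k] * np.outer(xs[k] - mu, xs[k] - mu) for k in range(4))

# check the defining relation E<a, X - mu><X - mu, b> = <a, Sigma b>
a, b = rng.standard_normal(3), rng.standard_normal(3)
lhs = sum(p[k] * (a @ (xs[k] - mu)) * ((xs[k] - mu) @ b) for k in range(4))
assert np.isclose(lhs, a @ Sigma @ b)
```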
One of the major differences between Euclidean spaces and infinite dimensional Hilbert spaces is that there does not exist a Lebesgue measure on a Hilbert space. Consequently, densities with respect to Lebesgue measure cannot be defined, and we have to specify the probability distribution in a different manner. Let us recall that a positive Hermitian operator with finite trace is called a covariance operator. We will show that any such operator is indeed the covariance operator of some random variable in \(H\).
Definition 3.2.1. The characteristic functional of a random variable \(X\) with induced probability distribution \(P_X = P\) is \(\hat{P} : H \to \mathbb{C}\), given by
\[ \hat{P}(t) = \mathbb{E}e^{i\langle t, X\rangle} = \int_H e^{i\langle t, x\rangle}\, dP(x). \tag{3.12} \]
The characteristic functional has the following properties:
1. \(\hat{P}(0) = 1\).
2. \(\hat{P} : H \to \mathbb{C}\) is continuous.
3. \(\hat{P}\) is positive semi-definite. That is, for every \(N \in \mathbb{N}\), every set of numbers \(z_1, \ldots, z_N \in \mathbb{C}\), and every set of vectors \(t_1, \ldots, t_N \in H\) we have
\[ \sum_{j=1}^{N}\sum_{k=1}^{N} z_j \bar{z}_k \hat{P}(t_j - t_k) \ge 0. \]
4. If Y is another random variable whose characteristic functional is the same as that of X then PX = PY .
5. For any \(\varepsilon > 0\) there exists a covariance operator \(\Sigma_\varepsilon\) such that
\[ 1 - \operatorname{Re}(\hat{P}(t)) < \varepsilon, \quad \text{whenever } \langle t, \Sigma_\varepsilon t\rangle < 1. \tag{3.13} \]
Moreover, the Minlos-Sazonov theorem (1963; see [27] and [33]) states that the above mentioned properties are sufficient for a complex valued function \(\hat{P}\) on \(H\) to be the characteristic functional of a probability measure \(P\) on \((H, \mathcal{B})\). Henceforth we will refer to any function with these five properties as a characteristic functional, and note that in principle probability measures on \(H\) can be specified through their characteristic functionals. An important class of probability measures that can be defined this way is the class of Gaussian distributions.
3.2.3 Gaussian distributions
Let µ ∈ H be a vector and Σ ∈ LHS be a covariance operator. Then the functional
\[ \varphi(t) = e^{i\langle t, \mu\rangle - \frac{1}{2}\langle t, \Sigma t\rangle}, \quad t \in H, \tag{3.14} \]
is a characteristic functional. The corresponding probability distribution is called the Gaussian distribution with parameters \(\mu\) and \(\Sigma\), and we write it as \(G(\mu, \Sigma)\).
Proposition 3.2.1. The random variable X whose characteristic functional is given by (3.14) has a mean µ and a covariance operator Σ.
Proof. Let a, b ∈ H and s1, s2 ∈ R. Then
\[
\mathbb{E}e^{i(s_1\langle X, a\rangle + s_2\langle X, b\rangle)} = \mathbb{E}e^{i\langle X,\, s_1 a + s_2 b\rangle}
= e^{i\langle \mu,\, s_1 a + s_2 b\rangle - \frac{1}{2}\langle s_1 a + s_2 b,\; \Sigma(s_1 a + s_2 b)\rangle}
\]
\[
= \exp\left( i s_1\langle \mu, a\rangle + i s_2\langle \mu, b\rangle - \frac{1}{2}(s_1, s_2) \begin{pmatrix} \langle a, \Sigma a\rangle & \langle a, \Sigma b\rangle \\ \langle a, \Sigma b\rangle & \langle b, \Sigma b\rangle \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} \right).
\]
This shows that \((\langle X, a\rangle, \langle X, b\rangle)^*\) has a bivariate normal distribution, that the mean of \(\langle X, a\rangle\) is \(\langle \mu, a\rangle\), and that \(\mathbb{E}\langle X-\mu, a\rangle\langle X-\mu, b\rangle = \langle a, \Sigma b\rangle\). Hence the result follows.
In general, probability measures on a Hilbert space behave differently from those on a Euclidean space. This can be demonstrated in terms of Gaussian measures.
Definition 3.2.2. Two probability measures P,Q on (H, B) are called equivalent (P ∼ Q) if P (B) = 0 ⇔ Q(B) = 0,B ∈ B, and they are called orthogonal (P ⊥Q) if
∃S ∈ B : P (S) = 1,Q(S) = 0.
Thus if P ∼ Q we have P << Q and Q << P , that is the measures are absolutely continuous with respect to each other, and we have the Radon - Nikodym derivatives
\[ \frac{dP}{dQ} = f_{P,Q} : \; P(B) = \int_B f_{P,Q}\, dQ, \qquad \frac{dQ}{dP} = f_{Q,P} : \; Q(B) = \int_B f_{Q,P}\, dP, \quad B \in \mathcal{B}. \]
On the real line any two normal distributions with nonzero variances are equivalent, but two Gaussian distributions on \(H\) can be orthogonal. We state the following theorem (see [12], [13] for the proof).
Theorem 3.2.1. (Feldman, Hájek, 1958). Let \(P = G(\mu, \Sigma)\) and \(Q = G(\nu, \Sigma)\) be two Gaussian distributions on \(H\) with means \(\mu\) and \(\nu\) and common covariance operator \(\Sigma = \sum_{k=1}^{\infty} \sigma_k^2\, e_k \otimes e_k\). Let \(\mu_k = \langle \mu, e_k\rangle\), \(\nu_k = \langle \nu, e_k\rangle\). We have
\[ P \sim Q \iff \sum_{k=1}^{\infty} \frac{(\mu_k - \nu_k)^2}{\sigma_k^2} < \infty. \tag{3.15} \]
If the sum is infinite they are orthogonal.
If two Gaussian measures are equivalent we can define the density of one with respect to the other.
Theorem 3.2.2. If P ∼ Q as defined above the density of P with respect to Q equals
\[ \frac{dP}{dQ} = \exp\left( \sum_{k=1}^{\infty} \frac{\mu_k - \nu_k}{\sigma_k^2} \left( x_k - \frac{\mu_k + \nu_k}{2} \right) \right), \]
P∞ P∞ where x = k=1 hx, eki ek = k=1 xkek.
Proof. We will only give a sketch. Let Pn and Qn be the restrictions of P and Q respectively to the subspaces generated by e1, e2, . . . , en, and let λn be the Lebesgue n measure on R . The density of Pn with respect to Qn equals
\[
\frac{dP_n}{dQ_n}(x) = \frac{(dP_n/d\lambda_n)(x)}{(dQ_n/d\lambda_n)(x)}
= e^{-\frac{1}{2}\sum_{k=1}^{n} \frac{1}{\sigma_k^2}\{(x_k-\mu_k)^2 - (x_k-\nu_k)^2\}}
= e^{\sum_{k=1}^{n} \frac{\mu_k - \nu_k}{\sigma_k^2}\left(x_k - \frac{\mu_k+\nu_k}{2}\right)}.
\]
It can be shown that dP/dQ is equal to the limit of the above expression. Hence the result follows.
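The dichotomy in (3.15) can be made concrete numerically. The sketch below (a hypothetical example, with spectrum \(\sigma_k^2 = k^{-2}\)) shows that a mean difference decaying like \(k^{-2}\) gives a convergent series (equivalence), while one decaying like \(k^{-1}\) makes every term equal to 1, so the series diverges (orthogonality).

```python
import numpy as np

# Numerical illustration of the Feldman-Hajek criterion (3.15) with a
# hypothetical spectrum sigma_k^2 = k^(-2).
k = np.arange(1, 100001, dtype=float)
sigma2 = k**-2.0

# mean difference ~ k^(-2): sum (mu_k - nu_k)^2 / sigma_k^2 = sum k^(-2) < inf
s_equiv = np.cumsum((k**-2.0)**2 / sigma2)
# mean difference ~ k^(-1): each term equals 1, so the partial sums diverge
s_orth = np.cumsum((k**-1.0)**2 / sigma2)

assert s_equiv[-1] < np.pi**2 / 6 + 1e-6   # bounded by zeta(2) = pi^2/6
assert s_orth[-1] == len(k)                 # grows without bound
```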
3.2.4 Karhunen-Loève expansion
Gaussian random variables in a Hilbert space are easy to deal with for several reasons. One is that calculations regarding a Gaussian random variable are relatively simple, because it can be expressed in terms of a countable collection of standard normal random variables. This expansion is known as the Karhunen-Loève expansion (see [18]). Let \(X\) be a mean 0 random variable in \(H\) with covariance operator \(\Sigma\). Let \(\sigma_1^2 > \sigma_2^2 > \cdots \downarrow 0\) be the eigenvalues and \(e_1, e_2, \ldots\) the orthonormal basis of corresponding eigenvectors of \(\Sigma\). Then \(X = \sum_{k=1}^{\infty} \langle X, e_k\rangle e_k\), and thus,
\[ \mathbb{E}\langle e_j, X\rangle\langle X, e_k\rangle = \langle e_j, \Sigma e_k\rangle = \sigma_k^2 \delta_{j,k}. \]
If \(X =_d G(0, \Sigma)\), then the \(\langle X, e_j\rangle\) are normal with mean 0 and variance \(\sigma_j^2\), and are uncorrelated. Consequently, they are independent. This yields the following Karhunen-Loève expansion:
\[ X = \sum_{k=1}^{\infty} \sigma_k Z_k e_k, \tag{3.16} \]
where \(Z_1, Z_2, \ldots\) are iid \(N(0, 1)\) random variables. We will end this chapter with one important result which will be needed in the sequel.
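A truncated version of (3.16) is easy to simulate. The sketch below (hypothetical basis and spectrum, truncation to three terms) draws \(X = \sum_k \sigma_k Z_k e_k\) and checks that the empirical covariance is close to \(\Sigma = \sum_k \sigma_k^2\, e_k \otimes e_k\).

```python
import numpy as np

# Karhunen-Loeve simulation of (3.16), truncated to d = 3 terms.
rng = np.random.default_rng(3)
sigma = np.array([1.0, 0.5, 0.25])
E_basis, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthonormal e_k (columns)

n = 200_000
Z = rng.standard_normal((n, 3))
X = (Z * sigma) @ E_basis.T            # each row: sum_k sigma_k Z_k e_k

Sigma = (E_basis * sigma**2) @ E_basis.T   # sum_k sigma_k^2 e_k e_k^T
Sigma_hat = X.T @ X / n                    # mean is zero by construction
assert np.allclose(Sigma_hat, Sigma, atol=0.02)
```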
d Theorem 3.2.3. Let µ ∈ H and Σ be a covariance operator, then X = G(µ, Σ) if d and only if hX, ai = N (ha, µi , ha, Σai) , ∀a ∈ H. d Proof. Let X = G(µ, Σ), and a ∈ H. For any s ∈ R we have
\[ \mathbb{E}e^{is\langle X, a\rangle} = \mathbb{E}e^{i\langle X, sa\rangle} = e^{i\langle sa, \mu\rangle - \frac{1}{2}\langle sa, \Sigma sa\rangle} = e^{is\langle a, \mu\rangle - \frac{1}{2}s^2\langle a, \Sigma a\rangle}. \]
d This implies that hX, ai = N (ha, µi , ha, Σai) for all a ∈ H. Conversely, let t ∈ H. Since hX, ti =d N (ht, µi , ht, Σti) , we have
\[ \mathbb{E}e^{i\langle X, t\rangle} = e^{i\langle t, \mu\rangle - \frac{1}{2}\langle t, \Sigma t\rangle}. \]
This implies X =d G(µ, Σ).
CHAPTER 4 RANDOM SAMPLES AND LIMIT THEOREMS
4.1 Random Samples
In this chapter we will develop some tools for statistical analysis. Let X1,X2,... be an iid sample from PX = P . The two important parameters of the probability measure P are the mean µ and the covariance operator Σ. We will now define their sample analogues. The sample mean is defined to be
\[ \hat{\mu} = \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \tag{4.1} \]
and the sample covariance operator is defined as
\[ \hat{\Sigma} = \hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n) \otimes (X_i - \bar{X}_n). \tag{4.2} \]
The assumption of a finite fourth moment of the norm of \(X\) in particular guarantees the existence and uniqueness of \(\mu\) and \(\Sigma\). We have the following theorem establishing their unbiasedness.
Theorem 4.1.1. The sample mean is an unbiased estimator of the population mean and a rescaled sample covariance operator is an unbiased estimator of the population covariance operator.
Proof. We have,
\[ \mathbb{E}\langle a, \bar{X}_n\rangle = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\langle a, X_i\rangle = \frac{1}{n}\, n\langle a, \mu\rangle = \langle a, \mu\rangle, \quad \forall a \in H. \]
¯ This shows that EXn = µ. In order to prove the same for the covariance operator, we note that for any two random elements X and Y with finite second moment and means µ and ν respectively,
E(X − µ) ⊗ (Y − ν) = T ⇔ E ha, X − µi hY − ν, bi = ha, T bi , ∀a, b ∈ H. (4.3)
Now, let
\[ \tilde{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu) \otimes (X_i - \mu). \tag{4.4} \]
Since the r.h.s is an average of iid elements in the Hilbert space LHS each with mean E (X − µ) ⊗ (X − µ) = Σ, this must be an unbiased estimator of Σ. That is EΣ˜ = Σ. Also, note that
\[ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu) \otimes (X_i - \mu) - (\bar{X} - \mu) \otimes (\bar{X} - \mu) \tag{4.5} \]
\[ = \tilde{\Sigma} - (\bar{X} - \mu) \otimes (\bar{X} - \mu). \tag{4.6} \]
Because Xi and Xj are independent for i 6= j we have E ha, Xi − µi hXj − µ, bi = 0. Using (4.3), E (Xi − µ) ⊗ (Xj − µ) = 0, and hence,
\[ \mathbb{E}(\bar{X} - \mu) \otimes (\bar{X} - \mu) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbb{E}(X_i - \mu) \otimes (X_j - \mu) \tag{4.7} \]
\[ = \frac{1}{n}\,\mathbb{E}(X - \mu) \otimes (X - \mu) = \frac{1}{n}\,\Sigma. \tag{4.8} \]
Combining these, we see that
\[ \mathbb{E}\hat{\Sigma} = \Sigma - \frac{1}{n}\Sigma = \frac{n-1}{n}\Sigma. \]
Thus \(\frac{n}{n-1}\hat{\Sigma}\) is an unbiased estimator of \(\Sigma\).
The exact distribution of the sample mean for a random sample from a Gaussian distribution is easy to find. If the \(X_i\) are iid \(G(\mu, \Sigma)\), using \(X\) as a generic sample element we have,
\[ \mathbb{E}e^{i\langle t, \bar{X}\rangle} = \prod_{j=1}^{n}\mathbb{E}e^{\frac{i}{n}\langle t, X_j\rangle} = \left( \mathbb{E}e^{i\langle \frac{t}{n}, X\rangle} \right)^n = e^{i\langle t, \mu\rangle - \frac{1}{2}\langle t, \frac{\Sigma}{n} t\rangle}. \tag{4.9} \]
This shows that \(\bar{X}\) has the \(G(\mu, \Sigma/n)\) distribution.
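The decomposition (4.5)-(4.6) used in the proof of Theorem 4.1.1 is a purely algebraic identity and can be checked numerically in finite dimensions; the sketch below uses hypothetical matrix stand-ins.

```python
import numpy as np

# Deterministic check of (4.5)-(4.6) on one sample:
# Sigma_hat = Sigma_tilde - (Xbar - mu)(Xbar - mu)^T, for any fixed mu.
rng = np.random.default_rng(4)
n, d = 50, 4
X = rng.standard_normal((n, d))
mu = rng.standard_normal(d)            # any fixed vector plays the role of mu

Xbar = X.mean(axis=0)
Sigma_hat = (X - Xbar).T @ (X - Xbar) / n
Sigma_tilde = (X - mu).T @ (X - mu) / n
assert np.allclose(Sigma_hat, Sigma_tilde - np.outer(Xbar - mu, Xbar - mu))
```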
4.2 Some Central Limit Theorems
There are various versions of univariate central limit theorems [37]. To derive the asymptotic distributions of \(\bar{X}\) and \(\hat{\Sigma}\) we will need the notion of convergence in distribution in a Hilbert space \(H\).
Definition 4.2.1. Let \(T, T_1, T_2, \ldots\) be random variables in \(H\), and let \(P_T = P, P_{T_1} = P_1, P_{T_2} = P_2, \ldots\) be the induced probability measures. Then \(T_n \to_d T\) in \(H\), or \(P_n \to P\), if
\[ \mathbb{E}f(T_n) = \int f\, dP_n \to \int f\, dP = \mathbb{E}f(T), \quad \forall f \in C_0(H), \tag{4.10} \]
where C0(H) is the class of all bounded and continuous real valued functions on H. As in the multivariate case, the above definition is equivalent to the following statement
P(Tn ∈ B) = Pn(B) → P (B) = P (T ∈ B), ∀B ∈ B : P (∂B) = 0. (4.11)
In the case of an infinite dimensional situation, the pointwise convergence of the characteristic functionals is no longer sufficient for establishing the convergence in distribution. We will need a further condition for convergence in distribution of a sequence of random variables in a Hilbert space. The following theorem will give the condition for this convergence (see [24]).
Theorem 4.2.1. Let T1,T2,..., be a sequence of random variables in H such that, for each a ∈ H, d hTn, ai → Ua, as n → ∞, in R, (4.12)
where Ua is some real valued random variable. Suppose, moreover, that for some
orthonormal basis e1, e2,...,
\[ \sup_{n \in \mathbb{N}} \sum_{k=\nu}^{\infty} \mathbb{E}\langle T_n, e_k\rangle^2 \to 0, \quad \text{as } \nu \to \infty. \tag{4.13} \]
d Then there exists an H− valued random variable T such that Tn → T as n → ∞, in H.
Condition (4.12) is known as the convergence of finite dimensional distributions and condition (4.13) is known as the tightness of the sequence. Hence, we say that a sequence of random variable in H converges in distribution if (1) the finite dimensional distributions converge, and (2) the sequence is tight.
Now we will consider the simplest version of the central limit theorem in Hilbert spaces (see also [8]).
Theorem 4.2.2. (Central limit theorem for the sample mean.) Let \(X_1, X_2, \ldots, X_n\) be an iid sample with finite second moment of the norm and let \(\bar{X}_n\) be the sample mean. Then there exists a Gaussian random element \(G =_d G(0, \Sigma)\) such that
\[ \sqrt{n}(\bar{X}_n - \mu) \to_d G, \quad \text{as } n \to \infty, \text{ in } H. \tag{4.14} \]
Proof. Let e1, e2,... be an orthonormal basis of eigenvectors of Σ. Then
\[
\sup_{n \in \mathbb{N}} \sum_{k=\nu}^{\infty} \mathbb{E}\langle \sqrt{n}(\bar{X}_n - \mu), e_k\rangle^2
= \sup_{n \in \mathbb{N}} \sum_{k=\nu}^{\infty} n\, \mathbb{E}\langle \bar{X}_n - \mu, e_k\rangle^2
= \sup_{n \in \mathbb{N}} \sum_{k=\nu}^{\infty} \langle e_k, \Sigma e_k\rangle
= \sum_{k=\nu}^{\infty} \sigma_k^2 \to 0, \quad \text{as } \nu \to \infty,
\]
because \(\Sigma\) is of finite trace. This shows that the sequence \(\sqrt{n}(\bar{X}_n - \mu)\) is tight. Now let \(a \in H\) and note that the \(\langle X_i - \mu, a\rangle\) are iid real valued random variables with mean 0 and variance \(\langle a, \Sigma a\rangle\). The central limit theorem in one dimension yields the existence of a real valued random variable \(U_a =_d N(0, \langle a, \Sigma a\rangle)\) such that
\[ \langle \sqrt{n}(\bar{X}_n - \mu), a\rangle \to_d U_a, \quad \text{as } n \to \infty, \text{ in } \mathbb{R}. \tag{4.15} \]
Hence Theorem 4.2.1 applies and yields the existence of a random variable G ∈ H such that √ ¯ d n(Xn − µ) → G, as n → ∞, in H.
By the continuous mapping theorem it follows that
\[ \langle \sqrt{n}(\bar{X}_n - \mu), a\rangle \to_d \langle G, a\rangle, \quad \text{as } n \to \infty, \text{ in } \mathbb{R}. \tag{4.16} \]
Thus, equations (4.15) and (4.16) imply that
d hG, ai = N (0, ha, Σai) , ∀a ∈ H, (4.17)
and G is Gaussian by Theorem 3.2.3. ˆ To derive the asymptotic distribution for Σn, we will need some discussion about
the structure of LHS. As mentioned in Chapter 2, it is a Hilbert space with its own
inner product and norm. We define the tensor product S ⊗HS T for S,T ∈ LHS as follows.
(S ⊗HS T )U = hU, T iHS S,U ∈ LHS. (4.18)
Since (X − µ) ⊗ (X − µ) − Σ is a zero mean random element in LHS, its covariance operator V : LHS → LHS equals
V = E {(X − µ) ⊗ (X − µ) − Σ} ⊗HS {(X − µ) ⊗ (X − µ) − Σ} . (4.19)
This operator is uniquely determined by the property
E hS, (X − µ) ⊗ (X − µ) − ΣiHS h(X − µ) ⊗ (X − µ) − Σ,T iHS
= hS, VT iHS , ∀S ∈ LHS, ∀T ∈ LHS. (4.20)
Now we are in a position to prove a central limit theorem for covariance operators.
Theorem 4.2.3. Let X1,X2,...Xn be a sequence of iid random variables with finite ˆ fourth moment of the norm. If Σn is the sample covariance operator as defined before, d then there exists a Gaussian random element G = GHS(0, V) such that √ ˆ d n(Σn − Σ) → G, as n → ∞, in LHS (4.21)
ˆ Proof. Σn can be decomposed as
ˆ ˜ ¯ ¯ Σn = Σn − (Xn − µ) ⊗ (Xn − µ),
where \(\tilde{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu) \otimes (X_i - \mu)\). Note that \(\tilde{\Sigma}_n\) is an average of iid random elements in \(\mathcal{L}_{HS}\) with common mean \(\Sigma\) and covariance operator \(\mathcal{V}\). Therefore applying Theorem 4.2.2 yields \(\sqrt{n}(\tilde{\Sigma}_n - \Sigma) \to_d \mathcal{G}\) as \(n \to \infty\), in \(\mathcal{L}_{HS}\). Also, we have (see Proposition 2.2.9)
\[ \left\| \sqrt{n}(\bar{X}_n - \mu) \otimes (\bar{X}_n - \mu) \right\|_{HS} = \frac{1}{\sqrt{n}} \left\| \sqrt{n}(\bar{X}_n - \mu) \right\|^2. \]
2 Since k.k is a continuous function on H, the continuous mapping theorem entails
\[ \left\| \sqrt{n}(\bar{X}_n - \mu) \right\|^2 \to_d \|G\|^2, \quad \text{as } n \to \infty, \text{ in } \mathbb{R}. \]
Therefore \(\frac{1}{\sqrt{n}}\|\sqrt{n}(\bar{X}_n - \mu)\|^2 = O_P(n^{-1/2})\) and does not contribute to the asymptotic distribution, which is determined by \(\sqrt{n}(\tilde{\Sigma}_n - \Sigma)\). Thus we have the desired result.
Using this simple version of the central limit theorems we will now prove a more general central limit theorem for a triangular array of random variables in H. For this generalization we will need some additional properties of covariance operators. We know that, for any bounded, positive and Hermitian operator A, there exists a √ bounded operator B such that B2 = A, and we write B = A. Since for any T ∈ L the operator T ∗T is positive and Hermitian, we have the following definition.
Definition 4.2.2. For \(T \in \mathcal{L}\) we define \(|T|\) by
\[ |T| = \sqrt{T^*T}. \tag{4.22} \]
If \(\mathcal{C} \subset \mathcal{L}_{HS}\) is the class of covariance operators and \(T \in \mathcal{C}\), we have
kT ktr = tr(|T |) = tr(T ). (4.23)
If S,T ∈ C then (|S − T |) ∈ C, and we define
kS − T ktr = tr(|S − T |),T ∈ C. (4.24)
For any T ∈ L the following inclusions are immediate:
kT ktr < ∞ ⇒ kT kHS < ∞ ⇒ kT kL < ∞. (4.25)
The above notions extend easily to the class of covariance operators defined on the space of Hilbert-Schmidt operators. We will write \(\mathcal{C}_{HS}\) for the class of covariance operators on \(\mathcal{L}_{HS}\), with trace \(\operatorname{tr}_{HS}(\mathcal{V})\) and norm \(\|\mathcal{V}\|_{HS,tr}\), where the trace and norm are as defined above. Now we will prove a generalized version of the central limit theorem. For each
sampling stage n ∈ N, let Xn,1,...,Xn,n be independent and identically distributed H− valued random elements with the same distribution as Xn. A sufficient condition for all that follows is,
\[ \sup_{n \in \mathbb{N}} \mathbb{E}\|X_n\|^{4+\delta} < \infty, \quad \text{for some } \delta > 0. \tag{4.26} \]
For each n ∈ N we write,
EXn,i = µn, E(Xn,i − µn) ⊗ (Xn,i − µn) = Σn, (4.27)
and introduce
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_{n,i}, \qquad \hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n} (X_{n,i} - \mu_n) \otimes (X_{n,i} - \mu_n). \tag{4.28} \]
Also define
\[ \mathbb{E}\left( (X_{n,i} - \mu_n) \otimes (X_{n,i} - \mu_n) - \Sigma_n \right) \otimes_{HS} \left( (X_{n,i} - \mu_n) \otimes (X_{n,i} - \mu_n) - \Sigma_n \right) = \mathcal{V}_n. \tag{4.29} \]
Theorem 4.2.4. Let us assume that there exist covariance operators Σ on H and V
on LHS with,
\[ \|\Sigma_n - \Sigma\|_{tr} \to 0 \quad \text{and} \quad \|\mathcal{V}_n - \mathcal{V}\|_{HS,tr} \to 0. \tag{4.30} \]
Then
\[ \sqrt{n}(\bar{X}_n - \mu_n) \to_d G, \quad \text{as } n \to \infty, \text{ in } H, \tag{4.31} \]
where \(G\) is a Gaussian \((0, \Sigma)\) random element, and
\[ \sqrt{n}(\hat{\Sigma}_n - \Sigma_n) \to_d \mathcal{G}, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}_{HS}, \tag{4.32} \]
where \(\mathcal{G}\) is a Gaussian \((0, \mathcal{V})\) random element.
Proof. Let \(T_n = \sqrt{n}(\bar{X}_n - \mu_n)\). We need to prove that, for each \(e \in H\), \(\langle T_n, e\rangle\) converges in distribution to a random variable \(U_e \in \mathbb{R}\), and that the sequence \(\{T_n\}\) is tight. Note that the \(Y_{n,i} = \frac{1}{\sqrt{n}}\langle X_{n,i} - \mu_n, e\rangle\) are iid real valued random variables with mean 0 and variance \(s_{n,i}^2 = \frac{1}{n}\sigma_n^2(e)\), where \(\sigma_n^2(e) = \langle e, \Sigma_n e\rangle\). Thus \(s_n^2 = \sum_{i=1}^{n} s_{n,i}^2 = \sigma_n^2(e)\). Since \(\sup_{n\in\mathbb{N}} \mathbb{E}\|X_{n,i}\|^{4+\delta} < \infty\),
\[ \mathbb{E}|Y_{n,i}|^{2+\delta} \le \frac{K}{(\sqrt{n})^{2+\delta}}, \quad \text{where } K = (2\|e\|)^{2+\delta} \sup_{n\in\mathbb{N}} \mathbb{E}\|X_{n,i}\|^{2+\delta}. \]
Now,
\[ \frac{\sum_{i=1}^{n} \mathbb{E}|Y_{n,i}|^{2+\delta}}{s_n^{2+\delta}} \le \frac{K}{(\sqrt{n})^{\delta}\, (\sigma_n(e))^{2+\delta}} \to 0, \quad \text{as } n \to \infty. \]
By the Lyapunov central limit theorem for real valued random variables applied to \(\{Y_{n,i}\}\) we get that \(\sum_{i=1}^{n} Y_{n,i} = \langle T_n, e\rangle\) converges in distribution to a normal random variable \(U_e \in \mathbb{R}\) with mean 0 and variance \(\langle e, \Sigma_n e\rangle\). But,
|he, Σnei − he, Σei| = |he, (Σn − Σ) ei| ≤ kΣn − Σktr → 0.
That is, Ue is N(0, he, Σei).
For tightness, let \(\{e_m\}\) be an orthonormal basis of \(H\) and let \(\varepsilon > 0\) be given. Note that \(\mathbb{E}\langle \sqrt{n}(\bar{X}_n - \mu_n), e_m\rangle^2 = \langle e_m, \Sigma_n e_m\rangle\), and that \(|\langle a, (\Sigma_n - \Sigma)a\rangle| \le \langle a, |\Sigma_n - \Sigma|\, a\rangle\) for
each a ∈ H. It follows that
\[ \sup_{n\in\mathbb{N}} \sum_{m=N}^{\infty} \langle e_m, \Sigma_n e_m\rangle \le \sum_{m=N}^{\infty} \langle e_m, \Sigma e_m\rangle + \sup_{n\in\mathbb{N}} \sum_{m=N}^{\infty} \langle e_m, |\Sigma_n - \Sigma|\, e_m\rangle. \tag{4.33} \]
Since \(\|\Sigma_n - \Sigma\|_{tr} \to 0\), there exists \(n(\varepsilon) \in \mathbb{N}\) such that
\[ \sup_{n \ge n(\varepsilon)} \sum_{m=N}^{\infty} \langle e_m, |\Sigma_n - \Sigma|\, e_m\rangle \le \sup_{n \ge n(\varepsilon)} \|\Sigma_n - \Sigma\|_{tr} \le \frac{\varepsilon}{2}. \tag{4.34} \]
Since each of the operators in the finite collection \(\Sigma, \Sigma_1, \ldots, \Sigma_{n(\varepsilon)}\) has finite trace, there exists an index \(N(\varepsilon) \in \mathbb{N}\) such that
\[ \sum_{m=N(\varepsilon)}^{\infty} \langle e_m, \Sigma e_m\rangle < \frac{\varepsilon}{4}, \tag{4.35} \]
and
\[ \sup_{1 \le n \le n(\varepsilon)} \sum_{m=N(\varepsilon)}^{\infty} \langle e_m, |\Sigma_n - \Sigma|\, e_m\rangle \le \sum_{m=N(\varepsilon)}^{\infty} \langle e_m, \Sigma e_m\rangle + \sup_{1 \le n \le n(\varepsilon)} \sum_{m=N(\varepsilon)}^{\infty} \langle e_m, \Sigma_n e_m\rangle \le \frac{\varepsilon}{2}. \tag{4.36} \]
Combining (4.34), (4.35), and (4.36) shows that the expression on the left in (4.33) is smaller than \(\varepsilon\) provided that \(N \ge N(\varepsilon)\). Consequently, there exists a random element \(G \in H\) such that \(T_n \to_d G\) as \(n \to \infty\) in \(H\). Using the continuous mapping theorem we get
\[ \langle T_n, e\rangle \to_d \langle G, e\rangle \;\Rightarrow\; \langle G, e\rangle =_d N(0, \langle e, \Sigma e\rangle). \]
This means that \(G =_d G(0, \Sigma)\). To prove (4.32), note that \(\hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n} (X_{n,i} - \mu_n) \otimes (X_{n,i} - \mu_n)\) is an average of iid random elements in \(\mathcal{L}_{HS}\), so the result follows from the same arguments.
Corollary 4.2.1. Suppose that in addition to the assumptions in Theorem 4.2.4 we
have
\[ \sqrt{n}(\Sigma_n - \Sigma) \to \Delta, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}_{HS}, \tag{4.37} \]
then we have
\[ \sqrt{n}(\hat{\Sigma}_n - \Sigma) \to_d \Delta + \mathcal{G}, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}_{HS}. \tag{4.38} \]
Special case : local alternatives. The above result will be applied in a special instance of local alternatives, where the generic random element satisfies
\[ X_n =_d X + R_n, \quad X \perp R_n, \quad \mathbb{E}\|R_n\|^2 \to 0, \quad \text{as } n \to \infty, \tag{4.39} \]
with \(\mathbb{E}\|X\|^4 < \infty\), \(\mathbb{E}X = \mu\), \(\mathbb{E}(X-\mu)\otimes(X-\mu) = \Sigma\), \(\mathbb{E}\|R_n\|^4 < \infty\), and \(\mathbb{E}R_n \otimes R_n = T_n\). In this situation we have,
Σn = E (Xn − µ) ⊗ (Xn − µ) = Σ + Tn. (4.40)
Furthermore, \(\mathcal{V}_n\) is the covariance operator of \((X_n - \mu) \otimes (X_n - \mu)\), and for \(\mathcal{V}\) we choose the covariance operator of \((X - \mu) \otimes (X - \mu)\). Since \(\mathbb{E}\|R_n\|^2 = \operatorname{tr}(T_n) = \operatorname{tr}(\Sigma_n - \Sigma)\), the first condition in (4.30) is fulfilled. To verify the second, let \(e_1, e_2, \ldots\) be the orthonormal basis of eigenvectors of \(\Sigma\), and note that
hej ⊗ ek, Vnej ⊗ ekiHS (4.41)
= E hej ⊗ ek, {(Xn − µ) ⊗ (Xn − µ) − Σn} ⊗HS
{(Xn − µ) ⊗ (Xn − µ) − Σn} ej ⊗ ekiHS
= Var hej,Xn − µi hXn − µ, eki
= Var hej,X − µi hX − µ, eki + Var hej,Rni hX − µ, eki
+ Var hej,X − µi hRn, eki + Var hej,Rni hRn, eki
≥ Var hej,X − µi hX − µ, eki = hej ⊗ ek, Vej ⊗ ekiHS ,
because the covariances are all zero, as follows from the first two assumptions in (4.39).
This inequality means that Vn − V ≥ 0, so that
\[ \operatorname{tr}|\mathcal{V}_n - \mathcal{V}| = \operatorname{tr}(\mathcal{V}_n - \mathcal{V}) = \sum_{j}\sum_{k} \left\{ \operatorname{Var}\langle e_j, R_n\rangle\langle X-\mu, e_k\rangle + \operatorname{Var}\langle e_j, X-\mu\rangle\langle R_n, e_k\rangle + \operatorname{Var}\langle e_j, R_n\rangle\langle R_n, e_k\rangle \right\} \to 0, \; n \to \infty. \tag{4.42} \]
To verify (4.42), note that the first two terms on the right are each bounded by \(\sum_j \mathbb{E}\langle e_j, R_n\rangle^2\, \sum_k \mathbb{E}\langle X-\mu, e_k\rangle^2 = \operatorname{tr}(T_n)\operatorname{tr}(\Sigma) \to 0\), and the last term is bounded by \(\left( \sum_j \mathbb{E}\langle e_j, R_n\rangle^2 \right)^2 = (\operatorname{tr}(T_n))^2 \to 0\), and we are done. We will end this chapter with some discussion of the Karhunen-Loève expansion of the limit \(\mathcal{G}\) in Theorems 4.2.3 and 4.2.4. We know that \(\mathcal{G}\) is a zero mean Gaussian random variable in \(\mathcal{L}_{HS}\), but the eigenvalues and eigenvectors of \(\mathcal{V}\) are unknown. Since the sequence \(\{e_j \otimes e_k\}\), \(j \in \mathbb{N}\), \(k \in \mathbb{N}\), is a basis for \(\mathcal{L}_{HS}\), we have
∞ ∞ X X G = hG, ej ⊗ ekiHS ej ⊗ ek. (4.43) j=1 k=1
Note that the \(\langle \mathcal{G}, e_j \otimes e_k\rangle_{HS}\) are zero mean normal random variables in \(\mathbb{R}\). Since the covariance structure of \(\mathcal{G}\) is the same as that of \((X-\mu)\otimes(X-\mu) - \Sigma\), we have
\[
\mathbb{E}\langle e_j \otimes e_k, \mathcal{G}\rangle_{HS} \langle e_\alpha \otimes e_\beta, \mathcal{G}\rangle_{HS} \tag{4.44}
\]
\[
= \mathbb{E}\langle e_j \otimes e_k, (X-\mu)\otimes(X-\mu) - \Sigma\rangle_{HS} \langle (X-\mu)\otimes(X-\mu) - \Sigma, e_\alpha \otimes e_\beta\rangle_{HS}
\]
\[
= \mathbb{E}\left( \langle e_j, X-\mu\rangle\langle X-\mu, e_k\rangle - \sigma_k^2\delta_{j,k} \right) \left( \langle X-\mu, e_\alpha\rangle\langle X-\mu, e_\beta\rangle - \sigma_\beta^2\delta_{\alpha,\beta} \right)
\]
\[
= \mathbb{E}\langle X-\mu, e_j\rangle\langle X-\mu, e_k\rangle\langle X-\mu, e_\alpha\rangle\langle X-\mu, e_\beta\rangle - \delta_{j,k}\delta_{\alpha,\beta}\sigma_k^2\sigma_\beta^2 = v_{(j,k),(\alpha,\beta)}.
\]
Although the random variables hX − µ, e1i , hX − µ, e2i ,... are uncorrelated, they are not in general independent and the above numbers can not be further specified. If we assume X =d G(µ, Σ), the uncorrelatedness does entail independence and
(4.44) simplifies to
\[
v_{(j,k),(\alpha,\beta)} =
\begin{cases}
0 & (j,k) \ne (\alpha,\beta), \\
2\sigma_j^4 & (j,k) = (\alpha,\beta),\; j = k, \\
\sigma_j^2\sigma_k^2 & (j,k) = (\alpha,\beta),\; j \ne k,
\end{cases}
\tag{4.45}
\]
and for \((j,k) = (\alpha,\beta)\) we write \(v_{j,k}^2 = v_{(j,k),(j,k)}\).
In other words, in this case the operator \(\mathcal{V}\) is diagonal in the basis \(e_j \otimes e_k\). Thus, if \(X\) is Gaussian \((\mu, \Sigma)\), the random variable \(\mathcal{G}\) has the Karhunen-Loève expansion
\[ \mathcal{G} = \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} v_{j,k} Z_{j,k}\, e_j \otimes e_k, \tag{4.46} \]
where the \(v_{j,k}\) are given by (4.45), and the \(Z_{j,k}\), \(j \in \mathbb{N}\), \(k \in \mathbb{N}\), are independent standard normal random variables.
CHAPTER 5 FUNCTIONS OF COVARIANCE OPERATORS AND DELTA METHOD
In multivariate statistics the inverse and the inverse square root of the sample and population covariance matrices quite often play a role. The same is true in functional statistics. Since \(\Sigma\) is assumed to be one to one, its inverse \(\Sigma^{-1}\) exists. One way to obtain this inverse is via the spectral representation of \(\Sigma\). If \(\Sigma\) has one dimensional eigenprojections \(e_k \otimes e_k\) corresponding to its eigenvalues \(\sigma_k^2\), \(k \in \mathbb{N}\), it is easy to see that
\[ \Sigma^{-1} = \sum_{k=1}^{\infty} \frac{1}{\sigma_k^2}\, e_k \otimes e_k \tag{5.1} \]
has all the properties of an inverse of \(\Sigma\). It should be noted, however, that (a) \(\Sigma^{-1}\) is not defined on all of \(H\), and (b) \(\Sigma^{-1}\) is not bounded where it is defined. To see (a), choose \(x = \sum_{k=1}^{\infty} \sigma_k e_k\). Since \(\Sigma\) is of finite trace, \(x \in H\). But \(\Sigma^{-1}x = \sum_{k=1}^{\infty} \frac{1}{\sigma_k} e_k\), and because \(\sum_{k=1}^{\infty} \frac{1}{\sigma_k^2} = \infty\) we see that \(\Sigma^{-1}x \notin H\), i.e. \(x\) is not in the domain of \(\Sigma^{-1}\). Also, since \(\sigma_k^2 \downarrow 0\) as \(k \uparrow \infty\), \(\Sigma^{-1}\) is unbounded. The situation with \(\hat{\Sigma}\) is even worse, because \(\hat{\Sigma}\) can never be one to one. The definition of \(\hat{\Sigma}\) entails that its range is finite dimensional and contained in the linear span of \(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}\). Since \(\sum_{i=1}^{n}(X_i - \bar{X}) = 0\), the dimension of this linear span is at most \(n-1\). These observations motivate a generalization of the inverse of the covariance operator.
Definition 5.0.3. Let \(\Sigma\) be a covariance operator. A regularized inverse of Tikhonov type of \(\Sigma\) is defined by
\[ \Sigma_\varepsilon^{-1} = (\varepsilon I + \Sigma)^{-1}, \quad \text{for some } \varepsilon > 0, \tag{5.2} \]
where \(\varepsilon\) is called the regularization parameter.
If \(\Sigma^{-1}\) has the spectral representation as in (5.1), we have
\[ \Sigma_\varepsilon^{-1} = \sum_{k=1}^{\infty} \frac{1}{\varepsilon + \sigma_k^2}\, e_k \otimes e_k. \tag{5.3} \]
The unboundedness of the inverse is taken care of by the regularization, and we have the following.
Proposition 5.0.1. We have \(\|\Sigma_\varepsilon^{-1}\|_{\mathcal{L}} = \frac{1}{\varepsilon}\).
Proof. Since \(\Sigma_\varepsilon^{-1} e_k = \frac{1}{\varepsilon + \sigma_k^2} e_k\) for each \(k\), and \(\sigma_k^2 \downarrow 0\), we have
\[ \frac{1}{\varepsilon} \ge \|\Sigma_\varepsilon^{-1}\|_{\mathcal{L}} \ge \|\Sigma_\varepsilon^{-1} e_k\| = \frac{1}{\varepsilon + \sigma_k^2} \to \frac{1}{\varepsilon}. \]
Hence \(\|\Sigma_\varepsilon^{-1}\|_{\mathcal{L}} = \frac{1}{\varepsilon}\). The following theorem justifies the name regularized inverse.
Theorem 5.0.5. For any \(x \in H\) we have
\[ \left\| (\varepsilon I + \Sigma)^{-1}\Sigma x - x \right\| \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{5.4} \]
Proof.
\[
\left\| (\varepsilon I + \Sigma)^{-1}\Sigma x - x \right\|^2
= \left\| \sum_{k=1}^{\infty} \frac{\sigma_k^2 - (\varepsilon + \sigma_k^2)}{\varepsilon + \sigma_k^2} \langle x, e_k\rangle e_k \right\|^2
= \sum_{k=1}^{\infty} \frac{\varepsilon^2}{(\varepsilon + \sigma_k^2)^2} \langle x, e_k\rangle^2.
\]
Introduce the functions \(f_\varepsilon : \mathbb{N} \to [0, \infty)\), given by
\[ f_\varepsilon(k) = \langle x, e_k\rangle^2 \frac{\varepsilon^2}{(\varepsilon + \sigma_k^2)^2}, \quad \varepsilon > 0. \tag{5.5} \]
Then \(0 \le f_\varepsilon(k) \le \langle x, e_k\rangle^2 = g(k)\), \(k \in \mathbb{N}\). If \(\nu_{\mathbb{N}}\) is the counting measure on \(\mathbb{N}\) we have \(\int_{\mathbb{N}} g\, d\nu_{\mathbb{N}} = \sum_{k=1}^{\infty} \langle x, e_k\rangle^2 < \infty\), so that the \(f_\varepsilon\) are dominated by an integrable function.
Also \(f_\varepsilon(k) \to 0\) as \(\varepsilon \downarrow 0\), for each \(k \in \mathbb{N}\). Therefore the dominated convergence theorem applies and we get
\[ \lim_{\varepsilon \downarrow 0} \left\| (\varepsilon I + \Sigma)^{-1}\Sigma x - x \right\|^2 = \lim_{\varepsilon \downarrow 0} \int_{\mathbb{N}} f_\varepsilon\, d\nu_{\mathbb{N}} = \int_{\mathbb{N}} 0\, d\nu_{\mathbb{N}} = 0, \]
hence the result.
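The convergence in Theorem 5.0.5 is easy to observe numerically. The sketch below (a hypothetical finite-dimensional stand-in with a strictly positive definite matrix) shrinks the regularization parameter and watches the error decrease.

```python
import numpy as np

# Finite-dimensional illustration of Theorem 5.0.5: for positive definite
# Sigma, (eps*I + Sigma)^(-1) Sigma x -> x as eps decreases to 0.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + 0.1 * np.eye(6)     # positive definite
x = rng.standard_normal(6)

errs = []
for eps in [1.0, 0.1, 0.01, 0.001]:
    y = np.linalg.solve(eps * np.eye(6) + Sigma, Sigma @ x)
    errs.append(np.linalg.norm(y - x))

# the approximation error decreases monotonically with eps
assert all(errs[i] > errs[i + 1] for i in range(len(errs) - 1))
assert errs[-1] < 0.05 * errs[0]
```

In the eigenbasis each error component equals \(\varepsilon/(\varepsilon + \sigma_k^2)\, |\langle x, e_k\rangle|\), which is exactly the quantity controlled by dominated convergence in the proof.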
Apart from the regularized inverse, other functions of Σ play a role in statistical analysis. In the next subsection we will focus on such functions of bounded linear operators in general.
5.1 Functions of bounded linear operators We begin this section with some definitions. Let T ∈ L.
Definition 5.1.1. The resolvent set ρ(T ) of T is defined by
\[ \rho(T) = \left\{ z \in \mathbb{C} : (zI - T) \text{ is one to one and } (zI - T)^{-1} \in \mathcal{L} \right\}. \tag{5.6} \]
The complement of ρ(T ) is called the spectrum σ(T ). That is,
σ(T ) = {ρ(T )}c . (5.7)
Definition 5.1.2. The resolvent of T is the bounded function R(z), z ∈ ρ(T ), defined by R(z) = (zI − T )−1 (5.8)
Since L is a complete, normed, linear space, we can see that
∞ −1 X k (I − S) = S ,S ∈ L : kSkL < 1. (5.9) k=0
This generalizes to
\[ (S - U)^{-1} = (I - S^{-1}U)^{-1} S^{-1} = \sum_{k=0}^{\infty} (S^{-1}U)^k\, S^{-1}, \tag{5.10} \]
−1 −1 provided that S, U, S ∈ L and kS UkL < 1. With the help of these results we can see that ρ(T ) is an open subset of C. Let z0 ∈ ρ(T ) and z such that |z − z0| kR(z0)kL <
1. We have zI − T = z0I − T − (z0 − z)I, and (5.10) applies with S = z0I − T and
U = (z0 − z)I and yields that
∞ −1 X k k+1 R(z) = (zI − T ) = (z0 − z) R (z0), (5.11) k=0
exists and is continuous. Thus ρ(T) is open, and hence σ(T) is closed. Because of the power series expansion, R(z) is analytic on ρ(T). Moreover, σ(T) is bounded and
\[
\sigma(T) \subset \{ z \in \mathbb{C} : |z| \le \|T\|_L \}. \tag{5.12}
\]
Henceforth let Ω ⊃ σ(T) be an open region in ℂ with smooth boundary Γ = ∂Ω. Also let D ⊃ Ω̄ be an open neighborhood of Ω̄, and let 0 < δ_Γ ≤ dist(Γ, σ(T)). We will now give a very general definition of an analytic function of a bounded operator.
Definition 5.1.3. If φ : D → ℂ is an analytic function, we define
\[
\phi(T) = \frac{1}{2\pi i} \oint_\Gamma \phi(z) R(z) \, dz. \tag{5.13}
\]
The above integral is understood as a limit of Riemann sums, and the definition is meaningful because L is a complete normed linear space. The following special cases show that the definition behaves as expected. (a) For any T ∈ L, if φ(z) = z we have
\[
\phi(T) = \frac{1}{2\pi i} \oint_\Gamma z R(z) \, dz = T. \tag{5.14}
\]
To see this, let Γ be a circle {z ∈ ℂ : |z| = r} with r > ‖T‖_L. Since ‖T/z‖_L < 1 on Γ, we have
\[
z R(z) = \Bigl( I - \frac{T}{z} \Bigr)^{-1} = \sum_{k=0}^{\infty} z^{-k} T^k.
\]
Hence
\[
\frac{1}{2\pi i} \oint_\Gamma z R(z) \, dz = \sum_{k=0}^{\infty} \frac{1}{2\pi i} \oint_\Gamma z^{-k} T^k \, dz = T,
\]
since only the term with k = 1 has a nonvanishing contour integral.
(b) If T is positive and compact with spectral expansion T = Σ_{k=1}^∞ τ_k E_k, then it can be directly verified that
\[
R(z) = \sum_{k=1}^{\infty} \frac{1}{z - \tau_k} E_k, \tag{5.15}
\]
for each z ∈ ρ(T). Therefore, for any φ we have
\[
\phi(T) = \frac{1}{2\pi i} \oint_\Gamma \phi(z) R(z) \, dz
= \frac{1}{2\pi i} \oint_\Gamma \phi(z) \sum_{k=1}^{\infty} \frac{1}{z - \tau_k} E_k \, dz
= \sum_{k=1}^{\infty} \phi(\tau_k) E_k. \tag{5.16}
\]
In the compact case the last expression in (5.16) will often be used as the definition of φ(T). The collection of all bounded linear operators on H and the collection of all analytic functions on D are both algebras [40]. The mapping φ ↦ φ(T) establishes an algebra homomorphism, and we have
\[
\phi(T)\psi(T) = (\phi\psi)(T), \tag{5.17}
\]
for any φ, ψ analytic on D. Using this homomorphism, we see that
\[
T^n = \frac{1}{2\pi i} \oint_\Gamma z^n R(z) \, dz, \qquad n \in \mathbb{N}. \tag{5.18}
\]
In particular,
\[
I = \frac{1}{2\pi i} \oint_\Gamma R(z) \, dz. \tag{5.19}
\]
Furthermore, for any z₀ ∈ Ω̄^c we have (see [1])
\[
R(z_0) = \frac{1}{2\pi i} \oint_\Gamma \frac{1}{z_0 - z} R(z) \, dz. \tag{5.20}
\]
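As an informal illustration, for a Hermitian operator the functional calculus reduces to the spectral form (5.16), and the homomorphism property (5.17) can be checked numerically. In the sketch below a symmetric positive-definite matrix stands in for T; the helper name `apply_fn` is hypothetical.

```python
import numpy as np

# phi(T) = sum_k phi(tau_k) E_k for Hermitian T, as in (5.16); we then check
# the algebra homomorphism property (5.17): phi(T) psi(T) = (phi psi)(T).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
T = A @ A.T + np.eye(4)                     # positive definite Hermitian

def apply_fn(T, phi):
    """Spectral functional calculus: phi(T) = E diag(phi(tau)) E^T."""
    tau, E = np.linalg.eigh(T)
    return E @ np.diag(phi(tau)) @ E.T

lhs = apply_fn(T, np.exp) @ apply_fn(T, np.sqrt)
rhs = apply_fn(T, lambda t: np.exp(t) * np.sqrt(t))
print(np.allclose(lhs, rhs))                # homomorphism property (5.17)

# Sanity check matching (5.14): phi(z) = z reproduces T itself.
print(np.allclose(apply_fn(T, lambda t: t), T))
```

Both checks agree to machine precision, reflecting that φ ↦ φ(T) respects products on commuting spectral projections.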
5.1.1 Fréchet derivative

Let T ∈ L. For an arbitrary Π ∈ L the operator T̃ = T + Π may be considered a perturbation of T. Let M_φ = max_{z∈Γ} |φ(z)| < ∞, and let L_Γ = length(Γ) < ∞. Let us
define φ̇_T : L → L by
\[
\dot{\phi}_T \Pi = \frac{1}{2\pi i} \oint_\Gamma \phi(z) R(z) \Pi R(z) \, dz, \tag{5.21}
\]
and
\[
\rho_T \Pi = \frac{1}{2\pi i} \oint_\Gamma \phi(z) R(z) \bigl(\Pi R(z)\bigr)^2 \bigl(I - \Pi R(z)\bigr)^{-1} \, dz. \tag{5.22}
\]
The above operators are well defined for sufficiently small Π ∈ L and will play an important role in our further analysis. Using the fact that there exists a constant 0 < K < ∞ such that ‖R(z)‖_L ≤ K/δ_Γ for all z ∈ Ω^c (see [8]), we will prove some properties of these operators.
Proposition 5.1.1. The operator φ̇_T is linear and bounded.

Proof. Linearity of φ̇_T follows from the linearity of integration. To see that it is bounded, note that
\[
\bigl\| \dot{\phi}_T \Pi \bigr\|_L \le \frac{1}{2\pi} M_\phi \oint_\Gamma \|R(z)\|_L^2 \, \|\Pi\|_L \, dz
\le \frac{1}{2\pi} M_\phi L_\Gamma \frac{K^2}{\delta_\Gamma^2} \|\Pi\|_L. \tag{5.23}
\]
Although the operator ρ_T is not linear, we can prove it is bounded.

Proposition 5.1.2. ρ_T Π is bounded.

Proof. We have
\[
\| \rho_T \Pi \|_L \le \frac{1}{2(1-c)\pi} M_\phi L_\Gamma \frac{K^3}{\delta_\Gamma^3} \|\Pi\|_L^2, \tag{5.24}
\]
provided that ‖Π‖_L ≤ c δ_Γ / K for some 0 < c < 1.

We will end this section with the following important theorem (see [7]) and some remarks.
Theorem 5.1.1. Let T ∈ L and let φ be analytic on the domain D as defined before. Then φ maps the neighborhood {T̃ = T + Π : Π ∈ L, ‖Π‖_L ≤ c δ_Γ / K} into L, when φ(T̃) is defined in
the usual way of functional calculus. This mapping is Fréchet differentiable at T, tangentially to L, with bounded derivative φ̇_T as defined before, and we have
\[
\phi(T + \Pi) = \phi(T) + \dot{\phi}_T \Pi + \rho_T \Pi. \tag{5.25}
\]
Remark 5.1. If we assume T and Π commute, the Fréchet derivative reduces to the ordinary derivative in the sense of the functional calculus. That is,
\[
\dot{\phi}_T \Pi = \Bigl( \frac{1}{2\pi i} \oint_\Gamma \phi(z) R^2(z) \, dz \Bigr) \Pi
= \Bigl( \frac{1}{2\pi i} \oint_\Gamma \phi'(z) R(z) \, dz \Bigr) \Pi = \phi'(T) \Pi. \tag{5.26}
\]
To see this result, note that for z, w ∈ ρ(T) we have the resolvent identity
\[
R(z) - R(w) = (w - z) R(z) R(w). \tag{5.27}
\]
Thus,
\[
R'(z) = \lim_{w \to z} \frac{R(z) - R(w)}{z - w} = -R^2(z), \tag{5.28}
\]
and the result follows by using integration by parts.
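As an informal numerical check (with a Hermitian matrix standing in for T), the resolvent identity in the form R(z) − R(w) = (w − z)R(z)R(w) and the derivative R′(z) = −R²(z), both consistent with the convention R(z) = (zI − T)^{-1} of (5.8), can be verified directly:

```python
import numpy as np

# Check R(z) - R(w) = (w - z) R(z) R(w) and R'(z) = -R(z)^2 numerically,
# with a Hermitian matrix standing in for T and z, w in the resolvent set.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
T = (B + B.T) / 2                           # Hermitian, real spectrum

def R(z):
    return np.linalg.inv(z * np.eye(4) - T)

z, w = 10.0 + 1.0j, 8.0 - 2.0j              # well away from the spectrum
print(np.allclose(R(z) - R(w), (w - z) * R(z) @ R(w)))

# Finite-difference check of R'(z) = -R(z)^2:
h = 1e-6
print(np.allclose((R(z + h) - R(z)) / h, -R(z) @ R(z), atol=1e-4))
```

The identity holds exactly up to rounding, and the finite difference matches −R² to the expected first-order accuracy in h.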
Remark 5.2. In the situation of commuting operators we have the following Taylor series expansion
\[
\phi(T + \Pi) = \sum_{n=0}^{\infty} \frac{\phi^{(n)}(T)}{n!} \Pi^n. \tag{5.29}
\]
Remark 5.3. If T is compact and positive Hermitian with eigenvalues τ₁ > τ₂ > ⋯ ↓ 0 and corresponding eigenprojections E₁, E₂, …, we get
\[
\dot{\phi}_T \Pi = \sum_{j=1}^{\infty} \phi'(\tau_j) E_j \Pi E_j
+ \sum_{j \ne k} \frac{\phi(\tau_k) - \phi(\tau_j)}{\tau_k - \tau_j} E_j \Pi E_k, \qquad \Pi \in L. \tag{5.30}
\]
Furthermore, if the operators commute, the double sum reduces to zero and we get
\[
\dot{\phi}_T \Pi = \sum_{j=1}^{\infty} \phi'(\tau_j) E_j \Pi. \tag{5.31}
\]
Examples: (a) If φ(z) = z, we immediately see that φ̇_T Π = Π.
(b) Let φ(z) = (α + z)^{-p}, p > 0, α > δ_Γ > 0, z ≠ −α. Note that the choice of α
ensures that the pole at z = −α remains outside the contour Γ. In this situation we get
\[
\dot{\phi}_T \Pi = -p \sum_{j=1}^{\infty} \frac{1}{(\alpha + \tau_j)^{p+1}} E_j \Pi E_j
+ \sum_{j \ne k} \frac{(\alpha + \tau_j)^p - (\alpha + \tau_k)^p}{(\tau_k - \tau_j)(\alpha + \tau_j)^p (\alpha + \tau_k)^p} E_j \Pi E_k. \tag{5.32}
\]
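As an informal check of the expansion (5.25) with the derivative (5.30), the sketch below builds φ̇_T Π from the diagonal and divided-difference terms of (5.30) and verifies that the remainder ρ_T Π is of second order in Π. A symmetric matrix with distinct eigenvalues stands in for T; the helper names are hypothetical.

```python
import numpy as np

# phi_dot_T(Pi) from (5.30): phi'(tau_j) on diagonal blocks, divided
# differences (phi(tau_k) - phi(tau_j)) / (tau_k - tau_j) off the diagonal.
def frechet(T, phi, dphi, Pi):
    tau, E = np.linalg.eigh(T)
    n = len(tau)
    D = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            D[j, k] = dphi(tau[j]) if j == k else \
                (phi(tau[k]) - phi(tau[j])) / (tau[k] - tau[j])
    return E @ (D * (E.T @ Pi @ E)) @ E.T   # sum of the E_j Pi E_k blocks

def mat_fn(T, phi):
    tau, E = np.linalg.eigh(T)
    return E @ np.diag(phi(tau)) @ E.T

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
T = Q @ np.diag([0.5, 1.0, 1.5, 2.0]) @ Q.T  # distinct eigenvalues tau_j

Pi = rng.standard_normal((4, 4))
Pi = 1e-4 * (Pi + Pi.T)                      # small Hermitian perturbation

remainder = (mat_fn(T + Pi, np.exp) - mat_fn(T, np.exp)
             - frechet(T, np.exp, np.exp, Pi))
print(np.linalg.norm(remainder))             # second order in ||Pi||
```

The remainder is far smaller than the first-order difference φ(T + Π) − φ(T), as the bound (5.24) predicts.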
5.1.2 Delta Method

We know from multivariate statistics that if a sequence of random matrices converges in distribution, then a smooth function of these matrices also converges in distribution, and the limiting distribution depends on a derivative of the function evaluated at the limit [34]. This is a special case of the well-known delta method (see also [7] and [16]). We will use the Fréchet derivative discussed above to derive the delta method for functions of operators. Since eigenvalues, eigenvectors, and eigenprojections are functions of operators, this method will be used extensively in this research. Let (Ω, F, P) be a probability space and let B be the σ-field of Borel sets in L. Suppose that for each n ∈ ℕ we have random operators T̂_n ∈ L_HS, and that there exists a T ∈ L_HS such that
\[
a_n (\hat{T}_n - T) \xrightarrow{d} G, \quad \text{as } n \to \infty, \ \text{in } L_{HS}, \tag{5.33}
\]
where the a_n are numbers such that a_n > 0 and a_n → ∞. Usually we will have a_n = √n because of the central limit theorem.
Since the embedding S ↦ S of L_HS into L is continuous, the above result entails
\[
a_n (\hat{T}_n - T) \xrightarrow{d} G, \quad \text{as } n \to \infty, \ \text{in } L. \tag{5.34}
\]
If Π̂_n = T̂_n − T, then with the help of the continuous mapping theorem we get
\[
\sqrt{n}\, \bigl\| \hat{\Pi}_n \bigr\|_{HS} = \sqrt{n}\, \bigl\| \hat{T}_n - T \bigr\|_{HS} \xrightarrow{d} \|G\|_{HS}, \quad \text{as } n \to \infty. \tag{5.35}
\]
This entails
\[
\bigl\| \hat{\Pi}_n \bigr\|_{HS} = O_P\Bigl( \frac{1}{\sqrt{n}} \Bigr). \tag{5.36}
\]
From this result we can see that
\[
P(\Omega_n) = P\Bigl( \omega \in \Omega : \bigl\| \hat{\Pi}_{n,\omega} \bigr\|_{HS} \le \frac{1}{n^{1/3}} \Bigr) \to 1, \quad \text{as } n \to \infty. \tag{5.37}
\]
Because of (4.25) the Hilbert–Schmidt norm can be replaced by the operator norm without changing the above results (see also [17]). With the help of the sets Ω_n we immediately get the following delta method for functions of operators.
Theorem 5.1.2. Let φ : D → ℂ be analytic on the domain D as defined before. Then we have
\[
\sqrt{n}\, \bigl( \phi(\hat{T}_n) - \phi(T) \bigr) \xrightarrow{d} \dot{\phi}_T G, \quad \text{as } n \to \infty, \ \text{in } L, \tag{5.38}
\]
where φ̇_T G is given by (5.30).
Proof. Let Ω_n be the set defined in (5.37), and let n be sufficiently large to ensure that
\[
\frac{1}{n^{1/3}} \le c\, \frac{\delta_\Gamma}{K},
\]
where K is as in Section 5.1. For such n we have, with Ω_n as defined before,
\[
\begin{aligned}
\sqrt{n} \bigl\{ \phi(\hat{\Sigma}_n) - \phi(\Sigma) \bigr\}
&= \sqrt{n} \bigl\{ \phi(\hat{\Sigma}_n) - \phi(\Sigma) \bigr\} 1_{\Omega_n}
+ \sqrt{n} \bigl\{ \phi(\hat{\Sigma}_n) - \phi(\Sigma) \bigr\} 1_{\Omega_n^c} \\
&= \sqrt{n} \bigl\{ \dot{\phi}_\Sigma \hat{\Pi}_n + \rho_\Sigma \hat{\Pi}_n \bigr\} 1_{\Omega_n} + o_P(1).
\end{aligned}
\]
Since ρ_Σ is bounded, using (5.37) and (5.24) we have
\[
\bigl\| \rho_\Sigma \hat{\Pi}_n 1_{\Omega_n} \bigr\|_L = O_P\Bigl( \frac{1}{n^{2/3}} \Bigr),
\]
so that √n{φ(Σ̂_n) − φ(Σ)} has the same limiting distribution as √n φ̇_Σ Π̂_n. Because φ̇_Σ : L → L is continuous we find that
\[
\sqrt{n}\, \dot{\phi}_\Sigma \hat{\Pi}_n = \dot{\phi}_\Sigma \bigl( \sqrt{n}\, \hat{\Pi}_n \bigr) \xrightarrow{d} \dot{\phi}_\Sigma G,
\]
as n → ∞, and we are done.

Remark 5.4. If T = Σ is the covariance operator of a Gaussian random variable X with only simple eigenvalues, and Σ̂_n is the sample covariance operator, then using the
Karhunen–Loève expansion of G as in (4.46), we see that the first term of φ̇_Σ G is
\[
\begin{aligned}
\sum_j \phi'(\sigma_j^2) (e_j \otimes e_j) \Bigl( \sum_\alpha \sum_\beta v_{\alpha,\beta} Z_{\alpha,\beta}\, e_\alpha \otimes e_\beta \Bigr) (e_j \otimes e_j)
&= \sum_j \phi'(\sigma_j^2) (e_j \otimes e_j) \sum_\alpha v_{\alpha,j} Z_{\alpha,j}\, (e_\alpha \otimes e_j) \\
&= \sum_j \phi'(\sigma_j^2)\, v_{j,j} Z_{j,j}\, (e_j \otimes e_j),
\end{aligned}
\]
and similarly it can be seen that the second term is
\[
\sum_{j \ne k} \frac{\phi(\sigma_k^2) - \phi(\sigma_j^2)}{\sigma_k^2 - \sigma_j^2}\, v_{j,k} Z_{j,k}\, (e_j \otimes e_k).
\]
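As an informal illustration of the stochastic rate in (5.35)–(5.36), a small simulation shows the Hilbert–Schmidt (here Frobenius) distance between the sample and true covariance shrinking like 1/√n. A fixed 4 × 4 covariance matrix stands in for Σ; all names are hypothetical.

```python
import numpy as np

# ||Sigma_hat_n - Sigma||_HS = O_P(1/sqrt(n)) for Gaussian data, cf. (5.36).
rng = np.random.default_rng(4)
Sigma = np.diag([4.0, 2.0, 1.0, 0.5])
L = np.linalg.cholesky(Sigma)

def sample_cov(n):
    X = (L @ rng.standard_normal((4, n))).T      # n draws from N(0, Sigma)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / n

errs = [np.linalg.norm(sample_cov(n) - Sigma) for n in [100, 10_000, 1_000_000]]
print(errs)   # shrinks by roughly a factor of 10 per step, i.e. like 1/sqrt(n)
```

The Frobenius norm plays the role of the Hilbert–Schmidt norm here; by (4.25) the operator norm behaves the same way up to constants.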
CHAPTER 6
PERTURBATION OF EIGENVALUES AND EIGENVECTORS
6.1 Perturbation theory for operators

We will discuss convergence results for the eigenvalues, eigenvectors, and eigenprojections of a small perturbation T̃ = T + Π, where both T and Π are bounded and Hermitian. Although our aim is to apply these results to covariance operators, we will discuss the theory for more general operators. We will assume that T ∈ L_H
has an isolated simple eigenvalue λ₁, whose one-dimensional eigenprojection is given by E₁ = e₁ ⊗ e₁ for some unit vector e₁. Let Ω = Ω₀ ∪ Ω₁ be as defined in Chapter 5, where
\[
\Omega_0 \supset \sigma(T) \setminus \{\lambda_1\}, \qquad \Omega_1 \supset \{\lambda_1\}, \qquad \operatorname{dist}(\Omega_0, \Omega_1) > 0. \tag{6.1}
\]
Let us define an analytic ψ₁ : D → ℂ such that
\[
\psi_1(z) = 1, \ z \in \Omega_1; \qquad \psi_1(z) = 0 \ \text{otherwise}. \tag{6.2}
\]
We have
\[
T = \lambda_1 E_1 + T_0, \tag{6.3}
\]
where T₀ is Hermitian with σ(T₀) ⊂ Ω₀. There exists a resolution of the identity E(λ), λ ∈ σ(T₀), with [1]
\[
T_0 = \int_{\sigma(T_0)} \lambda \, dE(\lambda), \tag{6.4}
\]
with resolvent
\[
R_0(z) = \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda),
\]
and
\[
E_1 E(\lambda) = E(\lambda) E_1 = 0. \tag{6.5}
\]
We have
\[
R(z) = (zI - T)^{-1} = \frac{\lambda_1}{z(z - \lambda_1)} E_1 + \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda). \tag{6.6}
\]
Thus we get
\[
\psi_1(T) = \frac{1}{2\pi i} \oint_{\partial\Omega_1} R(z) \, dz
= \frac{1}{2\pi i} \oint_{\partial\Omega_1} \Bigl( \frac{1}{z - \lambda_1} - \frac{1}{z} \Bigr) E_1 \, dz = E_1. \tag{6.7}
\]
Let
\[
Q_1 = \int_{\sigma(T_0)} \frac{1}{\lambda_1 - \lambda} \, dE(\lambda). \tag{6.8}
\]
Now we are in a position to find the Fréchet derivative of ψ₁ at T.
Theorem 6.1.1. The Fréchet derivative of ψ₁ at T is given by
\[
\dot{\psi}_{1,T} \Pi = E_1 \Pi Q_1 + Q_1 \Pi E_1. \tag{6.9}
\]
Proof. We have
\[
\dot{\psi}_{1,T} \Pi = \frac{1}{2\pi i} \oint_{\Gamma_1} R(z) \Pi R(z) \, dz, \tag{6.10}
\]
where Γ₁ = ∂Ω₁. Thus we get
\[
\begin{aligned}
\dot{\psi}_{1,T} \Pi
&= \frac{1}{2\pi i} \oint_{\Gamma_1} \frac{\lambda_1^2}{z^2 (z - \lambda_1)^2} E_1 \Pi E_1 \, dz \\
&\quad + \frac{1}{2\pi i} \oint_{\Gamma_1} \frac{\lambda_1}{z(z - \lambda_1)} E_1 \Pi \Bigl( \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda) \Bigr) dz \\
&\quad + \frac{1}{2\pi i} \oint_{\Gamma_1} \Bigl( \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda) \Bigr) \Pi\, \frac{\lambda_1}{z(z - \lambda_1)} E_1 \, dz \\
&\quad + \frac{1}{2\pi i} \oint_{\Gamma_1} \Bigl( \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda) \Bigr) \Pi \Bigl( \int_{\sigma(T_0)} \frac{1}{z - \mu} \, dE(\mu) \Bigr) dz.
\end{aligned}
\]
By the Cauchy integral formula the first term is the zero operator [6]. However, note that
\[
\begin{aligned}
&\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{\lambda_1}{z(z - \lambda_1)} E_1 \Pi \Bigl( \int_{\sigma(T_0)} \frac{1}{z - \lambda} \, dE(\lambda) \Bigr) dz \\
&\qquad = E_1 \Pi\, \frac{1}{2\pi i} \int_{\sigma(T_0)} \Bigl( \oint_{\Gamma_1} \frac{\lambda_1}{z(z - \lambda_1)(z - \lambda)} \, dz \Bigr) dE(\lambda) \\
&\qquad = E_1 \Pi\, \frac{1}{2\pi i} \int_{\sigma(T_0)} \Bigl( \oint_{\Gamma_1} \Bigl[ \frac{1}{\lambda z} + \frac{1}{(\lambda_1 - \lambda)(z - \lambda_1)} - \frac{\lambda_1}{\lambda(\lambda_1 - \lambda)(z - \lambda)} \Bigr] dz \Bigr) dE(\lambda).
\end{aligned}
\]
Also, note that λ₁ lies inside Γ₁ but λ and 0 lie outside. Thus we have
\[
\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{1}{z - \lambda_1} \, dz = 1, \qquad
\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{1}{z - \lambda} \, dz = 0, \qquad
\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{1}{z} \, dz = 0.
\]
Hence we get that the second term equals
\[
E_1 \Pi \int_{\sigma(T_0)} \frac{1}{\lambda_1 - \lambda} \, dE(\lambda) = E_1 \Pi Q_1. \tag{6.11}
\]
Similarly the third term equals Q₁ Π E₁. Since both λ and μ lie outside the contour, the last term is zero, because 1/((z − λ)(z − μ)) is analytic inside the contour. This proves (6.9).
Remark 6.1. The result of the above theorem holds for any eigenprojection E_p, replacing λ₁ by any isolated and simple eigenvalue λ_p.
The following result regarding the eigenprojection of a perturbation of T is of importance. For sufficiently small Π ∈ L_H we have the following.

Theorem 6.1.2. T̃ = T + Π has an isolated simple eigenvalue λ̃₁ with eigenprojection Ẽ₁, and
\[
\tilde{E}_1 = E_1 + E_1 \Pi Q_1 + Q_1 \Pi E_1 + O\bigl( \|\Pi\|_L^2 \bigr). \tag{6.12}
\]
Proof. We have ψ₁(T̃) = ψ₁(T) + ψ̇_{1,T} Π + O(‖Π‖²_L) = E₁ + E₁ Π Q₁ + Q₁ Π E₁ + O(‖Π‖²_L). Note that (ψ₁(T̃))² = ψ₁²(T̃) = ψ₁(T̃) (see (5.17)), so ψ₁(T̃) is idempotent. Clearly it is Hermitian, so it is a projection; call this projection Ẽ₁. For sufficiently small Π we have
\[
\bigl\| \tilde{E}_1 - E_1 \bigr\|_L < 1. \tag{6.13}
\]
Hence Ẽ₁ must also have dimension 1 (see [31]), and the eigenvalue associated with it must be simple, with Ẽ₁ = ẽ₁ ⊗ ẽ₁ for some unit vector ẽ₁. Also, with χ(z) = z for all z ∈ ℂ, we see that (χψ₁)(T̃) ẽ₁ = T̃ Ẽ₁ ẽ₁ = T̃ ẽ₁, and also (ψ₁χ)(T̃) ẽ₁ = Ẽ₁ T̃ ẽ₁ = (ẽ₁ ⊗ ẽ₁) T̃ ẽ₁ = ⟨T̃ ẽ₁, ẽ₁⟩ ẽ₁ = λ̃₁ ẽ₁. Hence T̃ ẽ₁ = λ̃₁ ẽ₁, and λ̃₁ is an eigenvalue of T̃.
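As an informal numerical check of Theorem 6.1.2 (a symmetric matrix with an isolated simple top eigenvalue stands in for T; all names are hypothetical), the exact perturbed eigenprojection agrees with E₁ + E₁ΠQ₁ + Q₁ΠE₁ up to second order in Π:

```python
import numpy as np

# Check E_1_tilde = E_1 + E_1 Pi Q_1 + Q_1 Pi E_1 + O(||Pi||^2), cf. (6.12).
rng = np.random.default_rng(5)
lam = np.array([5.0, 2.0, 1.0, 0.5])             # lambda_1 = 5, simple, isolated
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
T = Q @ np.diag(lam) @ Q.T

e1 = Q[:, 0]
E1 = np.outer(e1, e1)
# Reduced resolvent (6.8): Q_1 = sum_{k>=2} (lambda_1 - lambda_k)^{-1} e_k e_k^T
Q1 = sum(np.outer(Q[:, k], Q[:, k]) / (lam[0] - lam[k]) for k in range(1, 4))

Pi = rng.standard_normal((4, 4))
Pi = 1e-3 * (Pi + Pi.T)                          # small Hermitian perturbation

w, V = np.linalg.eigh(T + Pi)
v1 = V[:, np.argmax(w)]                          # perturbed top eigenvector
E1_tilde = np.outer(v1, v1)

approx = E1 + E1 @ Pi @ Q1 + Q1 @ Pi @ E1
print(np.linalg.norm(E1_tilde - approx))         # second order in ||Pi||
```

The discrepancy is far smaller than the first-order change ‖Ẽ₁ − E₁‖, consistent with the O(‖Π‖²) remainder in the proof.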
The following corollaries show similar results for the eigenvectors and eigenvalues of a small perturbation of an operator, assuming that T ∈ L_H has a simple and isolated eigenvalue λ₁ with corresponding eigenvector e₁.

Corollary 6.1.1. If ẽ₁ and λ̃₁ are the eigenvector and eigenvalue corresponding to e₁ and λ₁, then