A new decomposition of Gaussian random elements in Banach spaces with application to Bayesian inversion

Jean-Charles Croix
To cite this version:
Jean-Charles Croix. A new decomposition of Gaussian random elements in Banach spaces with application to Bayesian inversion. General Mathematics [math.GM]. Université de Lyon, 2018. English. NNT: 2018LYSEM019. tel-02862789
HAL Id: tel-02862789 https://tel.archives-ouvertes.fr/tel-02862789 Submitted on 9 Jun 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Abstract
Inference is a fundamental task in science and engineering: it provides the opportunity to compare theory with experimental data. Since measurements are finite by nature whereas parameters are often functional, it is necessary to offset this loss of information with external constraints, through regularization. The obtained solution then realizes a trade-off between proximity to the data on one side and regularity on the other.
For more than fifteen years, these methods have incorporated probabilistic thinking, taking uncertainty into account. Regularization now consists in choosing a prior probability measure on the parameter space, making explicit a link between data and parameters, and deducing an update of the initial belief. This posterior distribution indicates how likely a set of parameters is, even in infinite dimension.
In this thesis, the approximation of such methods is studied. Indeed, infinite-dimensional probability measures, while attractive theoretically, often require a discretization to actually extract information (estimation, sampling). When they are Gaussian, the Karhunen-Loève decomposition gives a practical method to do so, and the main result of this thesis is to provide a generalization to Banach spaces, which are much more natural and less restrictive. The other contributions are related to its practical use in a real-world example.
Résumé
Inference is a fundamental activity in science and engineering: it allows theoretical models to be confronted with, and adjusted to, experimental data. Since measurements are finite by nature and model parameters are often functional, this loss of information must be compensated by adding external constraints to the problem, via regularization methods. The associated solution then satisfies a compromise between, on the one hand, its proximity to the data and, on the other, a form of regularity.

For some fifteen years, these methods have incorporated a probabilistic formalism, which allows uncertainties to be taken into account. Regularization then consists in choosing a probability measure on the model parameters, making explicit the link between data and parameters, and deducing an update of the initial measure. This posterior probability then makes it possible to determine a set of parameters compatible with the data while specifying their respective likelihoods, even in infinite dimension.

This thesis addresses the question of the approximation of such problems. Indeed, the use of infinite-dimensional distributions, although theoretically attractive, often requires a discretization in order to extract information (computation of estimators, sampling). When the prior measure is Gaussian, the Karhunen-Loève decomposition is an answer to this question. The main result of this thesis is its generalization to Banach spaces, which are much more natural and less restrictive than Hilbert spaces. The other works developed concern its use in applications with real data.
Remerciements
During the three years that led me to write this manuscript, I am aware of having benefited from the support of many people, each contributing in their own way to the results transcribed here. It is impossible for me to be exhaustive, but I wish to thank a number of them here.
My very first thanks go to the two reviewers of this thesis, Antoine Ayache and Josselin Garnier. Through their very detailed reports, they both offered an expert perspective on the work presented, as well as valuable remarks leading to new research directions. Their validation was essential in my eyes and strengthens me in this career choice.
Secondly, I wish to express all my gratitude to Anthony Nouy and Mikhail Lifshits, who agreed to serve as examiners on the jury. Together with those of the other members, I am convinced that their analyses and remarks will allow me to envisage fruitful new directions.
The people I now wish to thank form the team that supervised me during these three years: Mireille Batton-Hubert, Xavier Bay and Eric Touboul. I have many reasons to be grateful to them, but I would first like to highlight the human qualities they showed throughout this collaboration. Thank you for having constantly trusted me, and for having been so available and caring during this doctorate. Whether in meetings, over coffee, during lunch or in the middle of a run, I spent my time benefiting from their scientific curiosity and expertise. I learned a great deal from them and sincerely wish to continue sharing their passion for science. I owe them a lot, both humanly and scientifically. Thank you for allowing me to carry out this doctoral thesis in such good conditions.
During this thesis, Nicolas Durrande invited me to join him in a Franco-Colombian scientific project. This was the start of joint work with Mauricio A. Alvarez, which led me to visit the university of Pereira in Colombia several times. Besides the opportunity to teach in an exceptional context, this exchange gave birth to the application presented in this document. For all these reasons, I owe a lot to Nicolas and Mauricio. Of course, I also thank all the people with whom I now share unforgettable memories, notably Julian David Echeverry, Julian Gil Gonzalez and Alvaro Angel Orozco.
I would now like to thank my other colleagues in Saint-Etienne, whose daily life I had the pleasure of sharing over the years: Rodolphe Leriche, Olivier Roustant, Isabelle Chantrel, Christine Exbrayat, Xavier Delorme, Frédéric Grimaud, Audrey Derrien, Paolo Gianessi, Didier Rullière, David Gaudrie, Marie-Liesse Cauwet, Adrien Spagnol, Andres Felipe Lopez-Lopera, Mohammed-Amine Abdous, Madhi El-Alaoui-El-Abdellaoui, Serena Finco, Oussama Masmoudi, Marie-Line Barneoud, Bruno Léger, Nicolas Garland, Esperan Padonou, Thierry Garaix, Niloufare Sadr, Jean-François Tchebanoff, Gabrielle Bruyas and Alain Mounier. Many of them have become close friends.
My family has always accompanied me, sharing every moment of joy and supporting me through the others. I am extremely lucky to have them all and, even if they already know it, I intend to say it again here: I am proud and infinitely grateful to have them in my life.
In Saint-Etienne, there are two people to whom I owe a great deal, for they quite simply were a daily source of happiness. I found in them all the qualities one looks for in those close to us, from the generosity and kindness necessary to any good relationship to the tenacity and courage that forge friendships. Their names are Afafe Zehrouni and Romain Noël, and I thank them both for making these three years so beautiful.
I will end by thanking my close friends, Clément Pallez, Etienne & Marie Spormeyeur, Nicolas & Pauline Valance, Thomas & Fanny Calmés, Jean-Baptiste Dufour, Mathilde Armant, Claire Fousse, Joël Pfaff, Hugo Bernard-Brunel, Fulvio Mounier, Anne-Kim Lê, Chloé Dasse, Amélie Beyssat, Sylvain Housseman, Alban Derrien and Pauline Julou, for all the moments lived and still to come.

Contents
Abstract
Résumé
Remerciements
Table of contents
List of figures
Introduction

I Approximation of Gaussian priors in Banach spaces

1 Introduction to Gaussian random elements
1.1 Definition
1.2 Integrability
1.3 Covariance and Hilbert subspace(s)
1.4 Series representation
1.5 Additional results in the Hilbert case
1.6 Gaussian random elements in C(K, R)
1.7 Conclusion

2 Karhunen-Loève decomposition in Banach spaces
2.1 Extending the Karhunen-Loève decomposition
2.2 The special case of C(K, R)
2.3 Examples in C(K, R)
2.4 A study of the standard Wiener process
2.5 Conclusion

II Applications to Bayesian inverse problems

3 Introduction to inverse problems
3.1 Forward, inverse and ill-posed problems
3.2 Examples
3.3 Tikhonov-Philips regularization for operators
3.4 Other regularization methods

4 Theory of Bayesian regularization in Banach spaces
4.1 Kriging: a motivation for Bayesian inversion
4.2 Bayesian regularization
4.3 Maximum a posteriori
4.4 Markov chain Monte-Carlo method
4.5 Approximation
4.6 Conclusion

5 Karhunen-Loève approximation of Bayesian inverse problems
5.1 Approximation using projections
5.2 Application to advection diffusion model
5.3 Numerical results
5.4 Conclusion

Conclusion and future works

Glossary of mathematical notations

Bibliography

List of Figures
2.1 First 8 normalized basis functions (squared exponential kernel).
2.2 Evolution of the first 8 basis function variances (squared exponential kernel).
2.3 First 8 normalized basis functions (Matérn ν = 3/2 kernel).
2.4 Evolution of the first 8 basis function variances (Matérn ν = 3/2 kernel).
2.5 First 8 normalized basis functions (fBm H = 1/4 kernel).
2.6 Evolution of the first 8 basis function variances (fBm H = 1/4 kernel).
2.7 First 8 normalized basis functions (fBm H = 3/4 kernel).
2.8 Evolution of the first 8 basis function variances (fBm H = 3/4 kernel).
2.9 Numerical comparison of l-errors of admissible sequences for the standard Wiener process.
2.10 Comparison of a-errors of admissible sequences for the standard Wiener process.
3.1 Forward and inverse problems.
4.1 Ω-minimizing and Tikhonov-Philips solutions in C([0, 1], R) with Matérn 3/2 covariance kernel.
4.2 Mean function m with 95% confidence interval and 10 sample functions from u | u(x) = y, with Matérn 3/2 covariance kernel.
4.3 Bayesian inverse problem.
5.1 Trace plots of Φ(u; y), λ, D, ξ0, ξ1 and ξ2 (the first 6,000 iterations are burned).
5.2 Autocorrelation of Φ(u; y), λ, D, ξ0, ξ1 and ξ2, excluding burn-in sample.
5.3 Marginal posterior distributions for λ | y and D | y.
5.4 MAP estimator from MCMC sampling. Left: estimated source term f_MAP; right: estimated solution z(u_MAP) with absolute error at data locations.
5.5 Posterior point-wise variance of source (left) and solution (right).
Introduction
In experimental sciences and engineering, most mathematical models are designed to represent real-world phenomena with various levels of fidelity. They describe how a well-known initial state evolves, under the modelled assumptions, into a final outcome. Their quality is then assessed by comparison with a physical experiment, for instance. Since this represents a logical link from causes to consequences, it is called a direct approach, and most models are designed to be well-posed in the sense of [Had02], meaning that for any set of initial conditions, one and only one solution exists and it depends continuously on the causes.
However, the same model can serve different purposes; in particular, it can be reversed. Indeed, since the model gives a tangible link between causes and consequences, information can go both ways. The reverse approach then consists in inferring which causes would have led to a particular outcome. Despite the obvious symmetry of these two concepts, they are very different in nature.
The well-posedness of the direct problem implies the opposite for the inverse approach, especially in infinite-dimensional spaces, where an important loss of information (also called smoothing) happens. The stronger this smoothing is, the harder the inversion will be.
One common methodology to overcome this difficulty is regularization. In essence, it consists in looking for less general causes for the observed output, by adding new, model-external constraints. If the parameter is a function, this could be done by imposing a certain regularity. The notion of solution is thus slightly modified, including a compromise between actually explaining the observations and respecting these additional constraints. This provides a new problem, which in particular cases can be well-posed (in the previous sense). Probably the best-known regularization method is Tikhonov-Philips regularization [Tik63, Phi62].
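As a minimal finite-dimensional sketch (not taken from the thesis), this compromise can be illustrated by discretizing a smoothing forward operator and comparing naive and Tikhonov-regularized inversions; the kernel, grid, noise level and regularization weight below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized smoothing forward operator on [0, 1] (a Gaussian kernel):
# severely ill-conditioned, so direct inversion amplifies measurement noise.
n = 50
x = np.linspace(0.0, 1.0, n)
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02) / n

u_true = np.sin(2 * np.pi * x)                   # unknown cause
y = A @ u_true + 1e-3 * rng.standard_normal(n)   # noisy observed consequence

# Tikhonov-Philips regularized solution:
#   u_alpha = argmin_u ||A u - y||^2 + alpha ||u||^2 = (A^T A + alpha I)^(-1) A^T y
alpha = 1e-4
u_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)
u_naive = np.linalg.solve(A, y)                  # unregularized inversion

err_reg = np.linalg.norm(u_alpha - u_true) / np.linalg.norm(u_true)
err_naive = np.linalg.norm(u_naive - u_true) / np.linalg.norm(u_true)
```

The regularized reconstruction stays close to the true cause while the naive inversion is dominated by amplified noise; the weight alpha plays exactly the role of the external regularity constraint described above.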
In parallel, it is now well established that errors of different natures may appear in the practical use of mathematical models. For instance, real-world measurements are always limited to a level of precision, which can sometimes be taken into account in a probabilistic framework. In the context of inverse problems, this results in randomness of the associated causes, even with regularization. The study of this effect is called uncertainty quantification, and the objective of Bayesian methods is to intrinsically provide it as well as regularization [Stu10, DS15]. The approach consists in choosing an initial probability measure on the causes, using some observations (even poorly related) of the consequences, and then applying Bayes' theorem to obtain a new distribution reflecting how informative the data are. Such an inverse problem is said to be well-posed when the posterior probability satisfies Hadamard's definition.
The motivation of this thesis is to investigate Bayesian inverse problems, with a particular focus on their discretization [CDS10]. Indeed, when the mathematical model involves partial differential equations, the solution is often an infinite-dimensional probability distribution. As it is characterized by a density, it is usually necessary to rely on simulation or variational optimization to extract information. Both methods require a discretization, and this is the subject of interest here.
When the prior distribution is Gaussian [Bog98], such as for continuous Gaussian random fields [RW04], it is well known that it can be represented as a random series. The approximation then consists in keeping a finite number of terms in a truncated representation. Since this defines a different prior, it results in a distinct posterior, and such a discretization is consistent when the two posteriors tend to be equal as the number of terms increases [Hos17a]. It appears that this convergence, measured in the Hellinger metric, can be controlled by the quality of the prior approximation. This naturally leads to the question of optimal series representations.
The Karhunen-Loève decomposition gives such a representation in Hilbert spaces. However, it has several limitations in practice. First, it is the solution of an eigenvalue problem involving the covariance operator, which is rarely known analytically. This heavily restricts the possible choices for the covariance kernel, for instance. Secondly, it is based on Hilbert geometry, whereas the natural framework for inverse problems is Banach spaces. Both of these questions will be addressed, through a direct generalization of this decomposition to Banach spaces [BC17]. Besides the theoretical result, it appears that this new representation is easy to implement in the space of continuous functions. The optimality properties are discussed in this context, even though they are established only in the special case of the Wiener process over [0, 1].
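To make the classical Hilbert-space construction concrete, the eigenvalue problem can be sketched numerically: discretize a covariance kernel on a grid and take eigenpairs of the resulting matrix as an approximate Karhunen-Loève basis. This is only an illustrative sketch (not the Banach-space method developed in the thesis); grid size, lengthscale and truncation level are arbitrary choices.

```python
import numpy as np

# Discretize a squared-exponential covariance kernel on [0, 1];
# the grid spacing h approximates the L^2 inner product.
n = 200
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.1**2))

# Eigendecomposition of the discretized covariance operator:
# (h K) phi_k = lambda_k phi_k gives the approximate KL basis.
lam, phi = np.linalg.eigh(h * K)
lam, phi = lam[::-1], phi[:, ::-1]        # sort by decreasing eigenvalue
lam = np.clip(lam, 0.0, None)             # remove round-off negatives

# Truncated KL representation: X ~ sum_{k<m} sqrt(lambda_k) xi_k phi_k,
# with xi_k i.i.d. standard normal coefficients.
rng = np.random.default_rng(1)
m = 10
xi = rng.standard_normal(m)
sample = (phi[:, :m] / np.sqrt(h)) @ (np.sqrt(lam[:m]) * xi)

# The trace captured by the first m modes controls the truncation error.
captured = lam[:m].sum() / lam.sum()
```

Fast eigenvalue decay is what makes the truncation efficient here; for rougher kernels the decay is slower and more terms are needed for the same accuracy.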
The document is written in two distinct parts of respectively two and three chapters. Chapter 1 presents a short introduction to the theory of Gaussian random elements, with a particular focus on series representations and covariance factorizations. In chapter 2, the new decomposition is given, both in general Banach spaces and in the case of continuous functions over a compact metric set. It also includes analytical and numerical examples involving Gaussian random processes. Finally, a particular analysis is given concerning the Wiener process, where two distinct optimality criteria are studied. The second part turns to inverse problems. Specifically, chapter 3 gives a short reminder of Tikhonov-Philips regularization. Chapter 4 provides a detailed presentation of the Bayesian methodology for inverse problems, as it can be found in the literature. It includes a unified treatment with the sets of hypotheses given in [Hos17a], adapted to prior measures with exponential tails. Finally, chapter 5 gives an example of how the new Karhunen-Loève decomposition can be used in a real-world, non-linear inverse problem. In particular, the complete analysis of the problem is provided, as well as experimental hyper-parameter tuning.

Part I
Approximation of Gaussian priors in Banach spaces
Chapter 1
Introduction to Gaussian random elements
The objective of this first chapter is to introduce the theory of Gaussian random elements in Banach spaces. It is a direct generalization of Gaussian random vectors to infinite-dimensional spaces which, in particular, will be useful for prior distributions in a Bayesian context. The presentation given here only includes results already well known in the literature; one may find inspiring treatments in [Bog98, Lif12] for locally convex spaces, [VTC87, Hai09, VV08, Kuo75, QL04, vN07, DZ14] for Banach spaces and [DZ04a] in the Hilbert context.
1.1 Definition
Random elements In finite-dimensional spaces, the notion of random variable is well known (see the textbooks [Str10, Mél10]) and defined as a Borel measurable application from any probability space. To extend this notion to (general) Banach spaces, measurability must be specified, for different reasons. First, in infinite-dimensional spaces, the cylindrical σ-algebra (the smallest such that every bounded linear functional is measurable) and the Borel σ-algebra (the smallest containing all open sets) are distinct, leading to different definitions in general. Secondly, a Borel measurable map may not be approximated by simple functions, making the usual construction of Lebesgue's integral intractable. In order to circumvent both difficulties, we stick to the notion of strong (or separably-valued) random elements, which offers a sufficiently general setup while avoiding unnecessary technical complications. From now on, we will always consider (Ω, F, P) and (U, Bor(U)) as prototypes of a probability space and a measurable Banach space (equipped with its Borel σ-algebra).
Definition 1 (Random element). Let (Ω, F, P) be a probability space and U a Banach space. A random element in U is a map X : Ω → U such that:

• X is (strongly) measurable: ∀A ∈ Bor(U), X⁻¹(A) ∈ F,
• X is almost-surely separably valued: ∃U₀ ⊂ U, a separable Banach space, such that X ∈ U₀ almost surely.

With this definition, it is equivalent to consider the Banach space U separable, since one could restrict oneself to U₀. In this case, the cylindrical and Borel σ-algebras coincide. The space of all (equivalence classes of) random elements from (Ω, F, P) to (U, Bor(U)) will be denoted L⁰(U) instead of L⁰(Ω, F, P; U, Bor(U)), since we always assume the same probability space and σ-algebra. An alternative treatment could be made with adequate notions of probability measures, since random elements as previously defined immediately induce Radon probability measures. However, the random-element point of view is used here, since it appears more intuitive in applications.
Proposition 1 (Distribution of a random element). Let U be a Banach space and X ∈ L⁰(U) a random element. Then the map

μ_X : A ∈ Bor(U) → P(X ∈ A) ∈ [0, 1]

is a Borel probability measure, called the distribution of X, and it is Radon.

Proof. Since X is measurable, μ_X is well-defined as a push-forward measure. The fact that μ_X is Radon is a classical result from measure theory; see proposition I.2 in [QL04] for instance.
Two random elements sharing the same distribution will be called identically distributed. One particularly important property of such probability measures is their characterization through all one-dimensional projections.
Definition 2 (Fourier transform). Let U be a Banach space and X ∈ L⁰(U) a random element. The associated Fourier transform is defined as

X̂ : l ∈ U* → E[exp(i⟨X, l⟩_{U,U*})] ∈ C.

Proposition 2. Let U be a Banach space and X, Y ∈ L⁰(U) two random elements. If X̂ = Ŷ, then X and Y are identically distributed.

Proof. This result relies on a fundamental property: two finite-dimensional probability measures with the same Fourier transform are equal. Now, since the set of all cylinders is a Dynkin system and generates the cylindrical σ-algebra, which in separable spaces coincides with the Borel σ-algebra, the result follows.
From now on, the notation ⟨u, l⟩ will always denote the image of a vector u by a bounded linear functional l; it is the duality pairing between U and U*. The dual space will be denoted U*. When different spaces are involved, the complete notation ⟨u, l⟩_{U,U*} will be preferred.
Gaussian random elements The role of Gaussian distributions is ubiquitous in probability theory, statistics and most of their applications in science and engineering. In Euclidean spaces, they are usually defined by a density w.r.t. the fundamental Lebesgue measure. When the dimension is infinite, this cannot be done, since Lebesgue's measure no longer exists. However, since all one-dimensional projections of random elements totally characterize their underlying distribution, the following definition is natural.
Definition 3 (Gaussian random element). Let U be a Banach space and X ∈ L⁰(U) a random element. X is Gaussian if

∀l ∈ U*, ⟨X, l⟩ is a Gaussian random variable.
This definition is clearly a generalization of a fundamental fact about Gaussian vectors. In terms of vocabulary, a random element is said to be centred when all its one-dimensional projections ⟨X, l⟩ are, and only this case will be considered here for simplicity (without further mention); the general theory is obtained through a translation. In terms of distribution, since all projections are Gaussian, the Fourier transform takes a specific form.
Proposition 3. Let U be a Banach space and X ∈ L⁰(U) a random element. X is Gaussian if and only if

X̂(l) = exp(−(1/2) φ_X(l, l)),

where

φ_X : (l₁, l₂) ∈ U* × U* → E[⟨X, l₁⟩⟨X, l₂⟩] ∈ R

is a symmetric, non-negative, bilinear form.
Proof. Let X ∈ L⁰(U) be a random element. If it is Gaussian, then by definition ⟨X, l⟩ is a Gaussian random variable for all l ∈ U*, and its Fourier transform is

⟨X, l⟩^ : t ∈ R → exp(−(t²/2) E[⟨X, l⟩²]) ∈ R.

Now, it is enough to see that ∀l ∈ U*, X̂(l) = ⟨X, l⟩^(1), and to take φ_X(l₁, l₂) := E[⟨X, l₁⟩⟨X, l₂⟩], which is clearly bilinear, non-negative and symmetric. Conversely, suppose the Fourier transform of X has the previous form; then for all l ∈ U*, ⟨X, l⟩ is Gaussian, thus X is a Gaussian random element and the proof is complete.
The bilinear form φ_X in proposition 3 is called the covariance of X and totally characterizes the distribution of a Gaussian random element (since it describes the Fourier transform). In some cases, it is definite as well; such a covariance is said to be non-degenerate and provides a pre-Hilbert structure on U*. The finite-dimensional analogue of φ_X is the usual covariance matrix. One easy (and useful) consequence of proposition 3 is the conservation of Gaussian distributions under bounded linear transformations.
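Proposition 3 can be sanity-checked in finite dimension, where φ_X(l₁, l₂) = l₁ᵀΣl₂ for a covariance matrix Σ. The following Monte-Carlo sketch (with an arbitrary Σ and functional l, chosen only for illustration) compares the empirical Fourier transform with exp(−φ_X(l, l)/2).

```python
import numpy as np

rng = np.random.default_rng(42)

# Centred Gaussian vector in R^2 with covariance Sigma: in finite dimension
# the covariance form is phi_X(l1, l2) = E[<X,l1><X,l2>] = l1^T Sigma l2.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L_chol = np.linalg.cholesky(Sigma)
X = rng.standard_normal((200_000, 2)) @ L_chol.T   # samples of N(0, Sigma)

l = np.array([0.3, -0.5])

# Proposition 3 in R^2: E[exp(i <X, l>)] = exp(-phi_X(l, l) / 2).
empirical = np.exp(1j * (X @ l)).mean()
theoretical = np.exp(-0.5 * (l @ Sigma @ l))
```

The empirical characteristic function matches the Gaussian form up to Monte-Carlo error, and its imaginary part vanishes, reflecting that the element is centred.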
Proposition 4. Let U₁, U₂ be Banach spaces, X ∈ L⁰(U₁) a Gaussian random element and A ∈ L(U₁, U₂) a bounded operator. Then AX ∈ L⁰(U₂) is a Gaussian random element.

Proof. A : U₁ → U₂ is a bounded operator, thus the adjoint A* : U₂* → U₁* exists and ∀(x, l) ∈ U₁ × U₂*, ⟨Ax, l⟩_{U₂,U₂*} = ⟨x, A*l⟩_{U₁,U₁*}. Now,

∀l ∈ U₂*, (AX)^(l) = X̂(A*l) = exp(−(1/2) φ_X(A*l, A*l)),

which is the Fourier transform of a centred Gaussian random element by proposition 3.
Corollary 1. Let U be a Banach space and X = (X₁, X₂) a Gaussian random element in L⁰(U × U). Then X₁ + X₂ is a Gaussian random element in U.

Proof. Apply proposition 4 with the bounded operator A : (x, y) → x + y.

Now that Gaussian random elements are defined, the presentation turns to more advanced properties, starting with Bochner integrability.
1.2 Integrability
There are multiple notions of vector-valued integration in Banach spaces (Pettis and Bochner for instance), referring to the different notions of measurability mentioned earlier; here, only Bochner's theory will be used. In this area, Fernique's theorem is a very powerful result, stating that all Gaussian random elements have exponential tails. This will have deep consequences, in particular w.r.t. the covariance form. Fernique's theorem only needs a rotation (of angle π/4) invariance principle, stated in the next lemma.
Lemma 1. Let U be a Banach space and X, Y ∈ L⁰(U) independent and identically distributed Gaussian random elements with distribution μ. Then (√2/2)(X − Y) and (√2/2)(X + Y) are both Gaussian random elements with distribution μ and are independent.
Proof. Let (l₁, l₂) ∈ U* × U*. Then

E[⟨X − Y, l₁⟩⟨X + Y, l₂⟩] = φ_X(l₁, l₂) − φ_Y(l₁, l₂) = 0,

which gives independence, and

E[(1/2)⟨X − Y, l₁⟩²] = (1/2)(φ_X(l₁, l₁) + φ_Y(l₁, l₁)) = φ_X(l₁, l₁).

The computation is similar for the second random element.
Theorem 1 (Fernique). Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Then there exists α > 0 such that:

∀κ ∈ [0, α], E[exp(κ‖X‖²_U)] < ∞.
Proof. The proof is based on an estimate of the type:

E[exp(κ‖X‖²_U)] ≤ P(‖X‖_U ≤ t₀) exp(κt₀²) + Σ_{n=0}^∞ P(t_n ≤ ‖X‖_U ≤ t_{n+1}) exp(κt_{n+1}²),

where the sequence (t_n)_{n∈N} is chosen diverging and such that the series converges.

• Let X, Y ∈ L⁰(U) be independent and identically distributed Gaussian random elements and (s, t) ∈ R² such that t ≥ s > 0. From the previous lemma:

P(‖X‖_U ≤ s) P(‖X‖_U > t) = P(‖X + Y‖_U ≤ √2 s) P(‖X − Y‖_U > √2 t) ≤ P(|‖X‖_U − ‖Y‖_U| ≤ √2 s, ‖X‖_U + ‖Y‖_U > √2 t).

Clearly,

‖X‖_U + ‖Y‖_U > √2 t and |‖X‖_U − ‖Y‖_U| ≤ √2 s ⟹ 2 min(‖X‖_U, ‖Y‖_U) > √2 (t − s),

thus

P(‖X‖_U ≤ s) P(‖X‖_U > t) ≤ P(‖X‖_U > (t − s)/√2)².

Now s will be fixed and t will run along a specific sequence (t_n)_{n∈N}.

• Let t₀ > 0 be such that P(‖X‖_U ≤ t₀) ≥ 2/3 and define t_{n+1} = t₀ + √2 t_n for all n ∈ N. By induction, t_n = ((√2^{n+1} − 1)/(√2 − 1)) t₀, and thus t_n ≤ t₀ √2^{n+4}. Furthermore, let u_n := P(‖X‖_U > t_n)/P(‖X‖_U ≤ t₀), which leads to u_{n+1} ≤ u_n² from the previous inequality with s = t₀ and t = t_{n+1}. Finally, u_n ≤ u₀^{2ⁿ} ≤ 2^{−2ⁿ} and P(‖X‖_U > t_n) = u_n P(‖X‖_U ≤ t₀) ≤ 2^{−2ⁿ}.

• Injecting these elements into the initial expression leads to

E[exp(κ‖X‖²_U)] ≤ exp(κt₀²) + Σ_{n=0}^∞ 2^{−2ⁿ} exp(κt₀² 2^{n+5}) ≤ exp(κt₀²) + Σ_{n=0}^∞ exp(−2ⁿ log(2) + κt₀² 2^{n+5}),

which converges for κ > 0 sufficiently small.
From this theorem, we know that Gaussian random elements have thin tails, similarly to the finite-dimensional case. In particular, it implies the existence of moments of (strong) order p, for all p ∈ N.
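In one dimension Fernique's bound can be made fully explicit: if X ~ N(0, σ²), a direct computation gives E[exp(κX²)] = (1 − 2κσ²)^(−1/2) for κ < 1/(2σ²), and the expectation diverges at and beyond that threshold, so the α of theorem 1 cannot exceed 1/(2σ²) here. The following quadrature check (with arbitrary illustrative values of σ and κ) confirms the closed form.

```python
import numpy as np

sigma = 1.5
kappa = 0.25 / sigma**2        # strictly below the threshold 1 / (2 sigma^2)

# Midpoint-rule quadrature of exp(kappa x^2) against the N(0, sigma^2) density;
# the integrand decays like exp(-(1/(2 sigma^2) - kappa) x^2), so a wide
# truncated window suffices.
edges = np.linspace(-40 * sigma, 40 * sigma, 400_001)
mid = 0.5 * (edges[:-1] + edges[1:])
h = edges[1] - edges[0]
density = np.exp(-mid**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
numeric = np.sum(np.exp(kappa * mid**2) * density) * h

closed_form = (1.0 - 2.0 * kappa * sigma**2) ** -0.5   # here sqrt(2)
```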
Corollary 2. Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Then ∀β ∈ R₊, E[exp(β‖X‖_U)] < +∞.

Proof. Let κ be as in theorem 1. Then for all α ∈ (0, κ], the function

x ∈ R → exp(βx) exp(−αx²) ∈ R₊

is bounded for all β ∈ R.
Corollary 3. Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Then: ∀p ∈ N, E[‖X‖ᵖ_U] < +∞.

Proof. Let κ be as in theorem 1. Then the function

x ∈ R₊ → xᵖ exp(−κx²) ∈ R

is bounded for all p ∈ N.
Corollary 4. Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Then the covariance bilinear form φ_X is continuous and satisfies the following estimate:

‖φ_X‖ ≤ E[‖X‖²_U].

Proof.

∀(l₁, l₂) ∈ U* × U*, E[⟨X, l₁⟩⟨X, l₂⟩] ≤ E[‖X‖²_U] ‖l₁‖_{U*} ‖l₂‖_{U*};

taking the supremum over the unit ball of U* in both arguments completes the proof.
The next section continues with properties of the covariance form, using Hilbert spaces.
1.3 Covariance and Hilbert subspace(s)
In the definition of a Gaussian random element, the covariance is a central notion and encloses valuable information. In particular, it completely characterizes the distribution and, as will be shown, provides a fundamental Hilbert structure.
Covariance operator In the spirit of finite dimensional spaces, the covariance can be expressed in operator language instead of bilinear forms (the covariance matrix may also be seen as a linear map). This will provide an alternative point of view, particularly adapted to standard functional analysis tools.
Definition 4. Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Its covariance operator C_X is defined as follows:

C_X : l ∈ U* → E[⟨X, l⟩X] ∈ U.

The covariance operator is well-defined and bounded as a consequence of corollary 3.
Proposition 5. Let U be a Banach space and X ∈ L⁰(U) a Gaussian random element. Then its covariance operator C_X satisfies the following relations:

• ∀(l₁, l₂) ∈ U* × U*, ⟨C_X l₁, l₂⟩_{U,U*} = φ_X(l₁, l₂),
• ‖C_X‖_{L(U*,U)} = ‖φ_X‖.

Proof. It is a standard property of Bochner's integral that ∀(l₁, l₂) ∈ U* × U*:

⟨C_X l₁, l₂⟩ = ⟨E[⟨X, l₁⟩X], l₂⟩ = E[⟨X, l₁⟩⟨X, l₂⟩] = φ_X(l₁, l₂).

The equality of the norms is a direct application of the definition and the Cauchy-Schwarz inequality. Start with

∀(l₁, l₂) ∈ B_{U*} × B_{U*}, ⟨C_X l₁, l₂⟩ ≤ ‖φ_X‖,

thus ‖C_X‖ ≤ ‖φ_X‖. The other inequality is obtained because

⟨C_X l₁, l₂⟩ ≤ ‖C_X‖ ‖l₁‖_{U*} ‖l₂‖_{U*},

which gives ‖φ_X‖ ≤ ‖C_X‖.

Since the covariance operator and bilinear form are linked, the notation X ∼ N(0, C_X) will be used to define a Gaussian random element in L⁰(U) together with its distribution. The covariance operator enjoys more regularity than simple boundedness: it is nuclear in the following sense.
Proposition 6. Let U be a Banach space and X ∼ N(0, C_X). Then C_X is a nuclear operator, that is, there exists (u_n)_{n∈N} ⊂ U such that

∀l ∈ U*, C_X l = Σ_{n≥0} ⟨u_n, l⟩_{U,U*} u_n,

with (‖u_n‖_U)_{n∈N} ∈ l²(N).

Proof. The proof can be found in [VTC87].
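In finite dimension, proposition 6 reduces to the spectral decomposition of the covariance matrix: taking u_k = √λ_k e_k from the eigenpairs gives Σl = Σ_k ⟨u_k, l⟩ u_k, and the nuclear (trace) norm is Σ_k ‖u_k‖². A small numerical sketch with an arbitrary matrix:

```python
import numpy as np

rng = np.random.default_rng(7)

# A covariance matrix in R^4 (finite-dimensional covariance operator).
B = rng.standard_normal((4, 4))
Sigma = B @ B.T

# Spectral decomposition gives vectors u_k = sqrt(lambda_k) e_k with
# Sigma l = sum_k <u_k, l> u_k  -- the rank-one expansion of proposition 6.
lam, E = np.linalg.eigh(Sigma)
U = E * np.sqrt(np.clip(lam, 0.0, None))      # columns are the u_k

l = rng.standard_normal(4)
lhs = Sigma @ l
rhs = sum(U[:, k] * (U[:, k] @ l) for k in range(4))

# In finite dimension the nuclear (trace) norm is sum_k ||u_k||^2.
trace_from_uk = sum(U[:, k] @ U[:, k] for k in range(4))
```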
The particular properties (symmetry and non-negativity) of the covariance operator are almost those of an inner product; however, it is not necessarily non-degenerate. This can be exploited using a quotient-space construction.
Lemma 2. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; then there exist a Hilbert space $H$ and a bounded operator $A \in \mathcal{L}(\mathcal{U}^*, H)$ such that
• $\mathcal{C}_X = A^* A$,
• $R(A)$ is dense in $H$,
• $R(A^*) = R(\mathcal{C}_X)$,
• $\|A\|_{\mathcal{L}(\mathcal{U}^*,H)} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2}$.
Furthermore, the factorization is unique in the sense that if $A_1 \in \mathcal{L}(\mathcal{U}^*, H_1)$ and $A_2 \in \mathcal{L}(\mathcal{U}^*, H_2)$ are two operators such that $\mathcal{C}_X = A_1^* A_1 = A_2^* A_2$ with $R(A_1)$, $R(A_2)$ respectively dense in $H_1$, $H_2$, then there exists an isomorphism $u : H_2 \to H_1$ satisfying $A_1 = u A_2$.
Proof. The operator $\mathcal{C}_X$ is bounded, thus $\ker(\mathcal{C}_X)$ is closed in $\mathcal{U}^*$. Let $H_0 = \mathcal{U}^*/\ker(\mathcal{C}_X)$ be the quotient space, equipped with the following bilinear application:
\[
\langle [l_1], [l_2]\rangle_{H_0} = \langle \mathcal{C}_X l_1, l_2\rangle,
\]
which is well defined. It is symmetric, bilinear and positive-definite. Let $H$ be the topological completion of $H_0$ and $A : \mathcal{U}^* \to H$ the natural embedding operator. It is clearly linear and has a dense image (again, by construction). To see that $A$ is bounded, one has:
\[
\forall l \in \mathcal{U}^*, \quad \|Al\|_H^2 = \langle Al, Al\rangle_H = \langle \mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*} \leq \|\mathcal{C}_X\| \|l\|_{\mathcal{U}^*}^2,
\]
thus $\|A\| \leq \|\mathcal{C}_X\|^{1/2}$. Conversely, the Cauchy-Schwarz inequality gives
\[
\forall (l, g) \in \mathcal{U}^* \times \mathcal{U}^*, \quad \langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = \langle Al, Ag\rangle_H \leq \|Al\|_H \|Ag\|_H,
\]
and $\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} \leq \|A\|_{\mathcal{L}(\mathcal{U}^*,H)}^2$, thus $\|A\|_{\mathcal{L}(\mathcal{U}^*,H)} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2}$. Now, let $(l_1, l_2) \in \mathcal{U}^* \times \mathcal{U}^*$; then
\[
\langle \mathcal{C}_X l_1, l_2\rangle = \langle Al_1, Al_2\rangle_H = \langle A^* A l_1, l_2\rangle,
\]
thus $A^* A = \mathcal{C}_X$. Now, let $A_1, H_1$ and $A_2, H_2$ be two valid factorizations. Let $l \in \mathcal{U}^*$ such that $A_1 l = 0$; then $\mathcal{C}_X l = 0$, thus $A_2 l = 0$ as well. One can then consider the map
\[
u : A_1 l \in H_1 \longmapsto A_2 l \in H_2,
\]
which is linear and isometric between $R(A_1)$ and $R(A_2)$. To see that it is isometric, let $l \in \mathcal{U}^*$; then it comes:
\[
\|A_1 l\|_{H_1}^2 = \langle \mathcal{C}_X l, l\rangle = \|A_2 l\|_{H_2}^2,
\]
and it extends to $H_1$ and $H_2$ by density.

1.3. COVARIANCE AND HILBERT SUBSPACE(S) 15
Even if this result is rather formal (relying on a quotient construction), it has two concrete instances, namely the Gaussian and Cameron-Martin spaces, which are presented below.
Gaussian space The first method to build the Hilbert subspace (from Lemma 2) is to follow the definition of a Gaussian random element, and associate to every linear functional a Gaussian random variable through the following application:
\[
l \in \mathcal{U}^* \longmapsto \langle X, l\rangle_{\mathcal{U},\mathcal{U}^*} \in L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R}).
\]
This is completely equivalent to considering the equivalence classes (equality $\mu_X$-a.e.) of all bounded linear functionals:
\[
l \in \mathcal{U}^* \longmapsto l \in L^2(\mathcal{U}, \mathrm{Bor}(\mathcal{U}), \mu_X; \mathbb{R}).
\]
This last representation will be kept (noted shortly as $L^2(\mathcal{U}; \mathbb{R})$), and the Gaussian space is defined as the topological closure of the dual space under this linear map.
Definition 5 (Gaussian space). Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; the associated Gaussian space is
\[
\mathcal{H}_X = \overline{\mathcal{U}^*}^{L^2(\mathcal{U};\mathbb{R})}.
\]
Here, linear functionals are implicitly identified with their equivalence classes; the notation is unchanged and will always be clear from context. The denomination is justified since all elements of this space, seen as random variables, have indeed a Gaussian distribution.
Proposition 7. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element and $z \in \mathcal{H}_X$; then $z$ is a Gaussian random variable.

Proof. Let $(l_n)_{n \in \mathbb{N}} \subset \mathcal{U}^*$ such that $l_n \to z$ in $L^2(\mathcal{U}; \mathbb{R})$; then for all $t \in \mathbb{R}$ it comes that $\hat{l}_n(t) \to \hat{z}(t)$, since
\[
\left| e^{itl_n(u)} - e^{itz(u)} \right| \leq |t| \, |l_n(u) - z(u)|
\]
and
\[
\int_{\mathcal{U}} \left| e^{itl_n(u)} - e^{itz(u)} \right| \mu_X(du) \leq \sqrt{\int_{\mathcal{U}} t^2 (l_n(u) - z(u))^2 \mu_X(du)}.
\]
Since $\forall n \in \mathbb{N}, \forall t \in \mathbb{R}$, $\widehat{\langle X, l_n\rangle}(t) = \exp\left(-\frac{t^2}{2}\|l_n\|_{L^2(\mathcal{U};\mathbb{R})}^2\right)$, one has:
\[
\forall t \in \mathbb{R}, \quad \hat{z}(t) = \exp\left(-\frac{t^2}{2}\|z\|_{L^2}^2\right),
\]
which completes the proof.
This construction does not require any theoretical completion, since the space of bounded linear functionals is seen as a subset of the larger space $L^2(\mathcal{U}; \mathbb{R})$.
Proposition 8. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with Gaussian space $\mathcal{H}_X$; then the following operator:
\[
i_{\mathcal{H}_X} := l \in \mathcal{U}^* \longmapsto l \in \mathcal{H}_X
\]
is bounded and provides a factorization of the covariance operator: $\mathcal{C}_X = i_{\mathcal{H}_X}^* i_{\mathcal{H}_X}$.

Proof. The operator $i_{\mathcal{H}_X}$ is linear and bounded because $X \in L^2(\mathcal{U})$ and
\[
\|l\|_{\mathcal{H}_X}^2 = \int_{\mathcal{U}} l(u)^2 \mu_X(du) \leq \|l\|_{\mathcal{U}^*}^2 \int_{\mathcal{U}} \|u\|_{\mathcal{U}}^2 \mu_X(du).
\]
The adjoint operator is well-defined as
\[
i_{\mathcal{H}_X}^* := z \in \mathcal{H}_X \longmapsto \int_{\mathcal{U}} z(u)\, u\, \mu_X(du) \in \mathcal{U}.
\]
It is clear that $\mathcal{C}_X = i_{\mathcal{H}_X}^* i_{\mathcal{H}_X}$.

Proposition 7 is the reason why this particular factorization is said to be canonical in [Vak91], as the Hilbert space is made of random variables.
Cameron-Martin space The second possibility is to use the covariance operator and build a proper subspace of $\mathcal{U}$, the Cameron-Martin space. Indeed, one can define the following bilinear form on $R(\mathcal{C}_X)$: taking
\[
f = \sum_{i=1}^n \alpha_i \mathcal{C}_X l_i, \qquad g = \sum_{j=1}^m \beta_j \mathcal{C}_X g_j,
\]
\[
\langle f, g\rangle_X := \sum_{i=1}^n \sum_{j=1}^m \alpha_i \beta_j \langle \mathcal{C}_X l_i, g_j\rangle_{\mathcal{U},\mathcal{U}^*},
\]
where $(l_1, \ldots, l_n, g_1, \ldots, g_m) \in (\mathcal{U}^*)^{n+m}$ and $(\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_m) \in \mathbb{R}^{n+m}$. Then $\langle \cdot, \cdot\rangle_X$ is a well-defined, symmetric, positive-definite bilinear form. The associated norm will be noted $\|\cdot\|_X$ and the Cameron-Martin space will be defined as the topological completion of $R(\mathcal{C}_X)$ under this norm.

Definition 6 (Cameron-Martin space). Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; the associated Cameron-Martin space is
\[
\mathcal{U}_X := \widehat{R(\mathcal{C}_X)}^{\langle \cdot, \cdot\rangle_X}.
\]
Unlike the previous construction, there is no ambient space for the completion to take place: it is an abstract topological operation using Cauchy sequences. However, it can be identified with a subspace of $\mathcal{U}$.
Lemma 3. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$, and $\langle \cdot, \cdot\rangle_X$ the inner product on $R(\mathcal{C}_X)$ defined as previously; then
\[
\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2} = \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \|\mathcal{C}_X l\|_X.
\]

Proof. Let $l \in \mathcal{U}^*$; then
\[
\|\mathcal{C}_X l\|_{\mathcal{U}} = \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X l, \mathcal{C}_X g\rangle_X \leq \|\mathcal{C}_X l\|_X \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \|\mathcal{C}_X g\|_X,
\]
and one gets $\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2} \leq \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \|\mathcal{C}_X g\|_X$. Furthermore,
\[
\|\mathcal{C}_X l\|_X = \sqrt{\langle \mathcal{C}_X l, \mathcal{C}_X l\rangle_X} = \sqrt{\langle \mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}} \leq \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2} \|l\|_{\mathcal{U}^*},
\]
which gives the converse inequality, and the proof is complete.
Proposition 9. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then there is a natural topological injection $i_{\mathcal{U}_X} : \mathcal{U}_X \to \mathcal{U}$ such that
\[
\forall h \in R(\mathcal{C}_X), \quad i_{\mathcal{U}_X} h = h,
\]
and $\|i_{\mathcal{U}_X}\|_{\mathcal{L}(\mathcal{U}_X,\mathcal{U})} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2}$. Furthermore (reproduction property):
\[
\forall h \in \mathcal{U}_X, \ \forall l \in \mathcal{U}^*, \quad \langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} = \langle h, \mathcal{C}_X l\rangle_X.
\]

Proof. Let $i : h \in R(\mathcal{C}_X) \mapsto h \in \mathcal{U}$. By definition of the inner product on $\mathcal{U}_X$, one gets:
\[
\forall h \in R(\mathcal{C}_X), \ \forall l \in \mathcal{U}^*, \quad \langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} = \langle h, \mathcal{C}_X l\rangle_X.
\]
To see that $i$ is bounded, it is enough to write:
\[
\|i\|_{\mathcal{L}(R(\mathcal{C}_X),\mathcal{U})} = \sup_{\|\mathcal{C}_X l\|_X \leq 1} \, \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{\|\mathcal{C}_X l\|_X \leq 1} \, \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X l, \mathcal{C}_X g\rangle_X = \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \left\langle \frac{\mathcal{C}_X g}{\|\mathcal{C}_X g\|_X}, \mathcal{C}_X g \right\rangle_X = \sup_{g \in \mathcal{B}_{\mathcal{U}^*}} \|\mathcal{C}_X g\|_X,
\]
and by Lemma 3:
\[
\|i\|_{\mathcal{L}(R(\mathcal{C}_X),\mathcal{U})} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2}.
\]
Now, $R(\mathcal{C}_X)$ is dense in $\mathcal{U}_X$ (by construction), thus $i$ uniquely extends by continuity to $\mathcal{U}_X$; this new map is noted $i_{\mathcal{U}_X}$ and $\|i_{\mathcal{U}_X}\|_{\mathcal{L}(\mathcal{U}_X,\mathcal{U})} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2}$. Let $h \in \mathcal{U}_X$, $l \in \mathcal{U}^*$ and $(\mathcal{C}_X l_n)_{n \in \mathbb{N}}$ a sequence converging to $h$ in $\mathcal{U}_X$; then
\[
\lim_{n \to \infty} \langle \mathcal{C}_X l_n, \mathcal{C}_X l\rangle_X = \langle h, \mathcal{C}_X l\rangle_X
\]
since strong convergence implies weak convergence. However, $i_{\mathcal{U}_X}$ is bounded, thus $i_{\mathcal{U}_X}(\mathcal{C}_X l_n) = \mathcal{C}_X l_n \to i_{\mathcal{U}_X}(h)$ in $\mathcal{U}$, which implies that $\langle \mathcal{C}_X l_n, l\rangle_{\mathcal{U},\mathcal{U}^*} \to \langle h, l\rangle_{\mathcal{U},\mathcal{U}^*}$. Because $\langle \mathcal{C}_X l_n, \mathcal{C}_X l\rangle_X = \langle \mathcal{C}_X l_n, l\rangle_{\mathcal{U},\mathcal{U}^*}$ and limits are unique, the reproduction property holds. It remains to see that $i_{\mathcal{U}_X}$ is injective. Let $h \in \mathcal{U}_X$ such that $i_{\mathcal{U}_X} h = 0$; then by the reproduction property, one has $\langle i_{\mathcal{U}_X} h, l\rangle_{\mathcal{U}} = \langle h, \mathcal{C}_X l\rangle_X = 0$. Since $R(\mathcal{C}_X)$ is dense in $\mathcal{U}_X$, it comes that $h = 0$, thus $i_{\mathcal{U}_X}$ is an injection.
From now on, the space $\mathcal{U}_X$ is identified with a subspace of $\mathcal{U}$. As announced, the Cameron-Martin space and its injection provide a factorization of the covariance operator.

Corollary 5. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$ and Cameron-Martin space $\mathcal{U}_X$; then one has the following factorization:
\[
\mathcal{C}_X = i_{\mathcal{U}_X} i_{\mathcal{U}_X}^*.
\]
Loève isometry Since both the Cameron-Martin and Gaussian spaces provide factorizations of the covariance operator, there exists an isometry between them. In particular, it will be possible to use either point of view to prove results on $X$.

Proposition 10 (Loève isometry). Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then
\[
\mathcal{U}_X \cong \mathcal{H}_X.
\]

Proof. Consider the following map:
\[
\mathcal{C}_X l \in R(\mathcal{C}_X) \longmapsto l \in L^2(\mathcal{U}; \mathbb{R}).
\]
Since $l = 0$ $\mu_X$-a.e. implies that $\mathcal{C}_X l = 0$, it is well-defined and injective. Moreover, for all $l \in \mathcal{U}^*$, $\|\mathcal{C}_X l\|_X = \|l\|_{L^2(\mathcal{U};\mathbb{R})}$. This isometry naturally extends by density, thus the Cameron-Martin and Gaussian spaces are isometrically isomorphic.

It states that to every element of the Cameron-Martin space corresponds a unique element of the Gaussian space, and vice-versa.

Corollary 6. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then the associated Cameron-Martin space $\mathcal{U}_X$ is separable.

Proof. Since $\mathcal{U}$ can be taken separable, $L^2(\mathcal{U}; \mathbb{R})$ is separable since $\mathrm{Bor}(\mathcal{U})$ is countably generated. By the Loève isometry, it comes that $\mathcal{U}_X$ is separable.

The following theorem will be critical in the next chapter.
Theorem 2. Let $\mathcal{U}$ be a Banach space and $X \sim \mathcal{N}(0, \mathcal{C}_X)$; then
\[
\mathcal{B}_{\mathcal{U}_X} = \{ h \in \mathcal{U}_X, \ \|h\|_X \leq 1 \} \subset \mathcal{U}
\]
is compact in $\mathcal{U}$.

Proof. The proof can be found in [Bog98].
Translation of Gaussian elements So far, the presentation has only concerned centred Gaussian random elements, for simplicity. However, all previous notions adapt to the case where the mean is non-zero. For any vector $u \in \mathcal{U}$ and Gaussian element $X$, $X + u$ is also Gaussian. In terms of distributions, however, one may ask whether the laws of $X$ and $X + u$ are related in some sense. The Cameron-Martin theorem states precisely that both distributions are equivalent for translations in the Cameron-Martin space.
Theorem 3 (Cameron-Martin). Let $X$ be a Gaussian element and $h \in \mathcal{U}_X$ with corresponding element $h^* \in \mathcal{H}_X$; then the distributions of $X$ and $X + h$ are equivalent, with Radon-Nikodym density
\[
\frac{\partial \mu_{X+h}}{\partial \mu_X}(u) = \exp\left( h^*(u) - \frac{\|h\|_X^2}{2} \right).
\]

Proof. Let $h \in \mathcal{U}_X$; there is a corresponding element $h^*$ in $L^2(\mathcal{U}; \mathbb{R})$ (Loève isometry), which is a Gaussian random variable with variance $\|h\|_X^2$. It follows that $\exp(h^*)$ is in $L^1(\mathcal{U}, \mathbb{R})$ (as a log-normal random variate) and $f_h : u \in \mathcal{U} \mapsto \exp\left( h^*(u) - \frac{1}{2}\|h\|_X^2 \right)$ is a strictly positive (Radon-Nikodym) density (it integrates to one). Consider the following application for a fixed $l \in \mathcal{U}^*$:
\[
\varphi := t \in \mathbb{R} \longmapsto \exp\left( -\frac{1}{2}\|h\|_X^2 \right) \int_{\mathcal{U}} \exp\left( i\langle u, l\rangle_{\mathcal{U},\mathcal{U}^*} + ith^*(u) \right) \mu_X(du) \in \mathbb{C}.
\]
As $l + th^*$ is Gaussian (it is an element of $\mathcal{H}_X$), the previous expression becomes
\[
\varphi(t) = \exp\left( -\frac{1}{2}\left( \varphi_X(l, l) + 2t\langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} + (1 + t^2)\|h\|_X^2 \right) \right),
\]
using the reproduction property. Now, $\varphi$ can be extended to the complex plane, and in particular Lebesgue's dominated convergence theorem gives
\[
\lim_{z \to -i} \varphi(z) = \exp\left( i\langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} - \frac{1}{2}\varphi_X(l, l) \right).
\]
On the other hand, $\varphi(-i) = \int_{\mathcal{U}} \exp\left( i\langle u, l\rangle_{\mathcal{U},\mathcal{U}^*} \right) f_h(u) \mu_X(du)$ is precisely the Fourier transform of the measure $f_h \, d\mu_X$. The Fourier transform of $\mu_{X+h}$ is obtained directly as follows:
\[
\forall l \in \mathcal{U}^*, \quad \widehat{X + h}(l) = \mathbb{E}\left[ \exp\left( i\langle X + h, l\rangle_{\mathcal{U},\mathcal{U}^*} \right) \right] = \exp\left( i\langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} - \frac{1}{2}\varphi_X(l, l) \right).
\]
Since both Fourier transforms are the same, the underlying distributions are equal.
In the literature, the Cameron-Martin theorem also states that if $h \notin \mathcal{U}_X$, the distributions are mutually singular. This result is omitted here since it will not be used. One interesting consequence of the Cameron-Martin theorem is the possibility to quantify so-called small-ball probabilities.
Theorem 4 (Onsager-Machlup functional). Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then $\forall (h_1, h_2) \in \mathcal{U}_X^2$ it comes:
\[
\lim_{r \to 0} \frac{\mu_X(\mathcal{B}_{\mathcal{U}}(h_1, r])}{\mu_X(\mathcal{B}_{\mathcal{U}}(h_2, r])} = \exp\left( \frac{1}{2}\|h_2\|_X^2 - \frac{1}{2}\|h_1\|_X^2 \right).
\]

Proof. The proof may be found in [Bog98] (Corollary 4.7.8).
1.4 Series representation
Admissible sequences The concept of series representation for a Gaussian random element $X$ consists in finding particular sequences of vectors $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ such that the random series
\[
\sum_{n \geq 0} \xi_n u_n,
\]
with $(\xi_n)_{n \in \mathbb{N}}$ independent $\mathcal{N}(0, 1)$ random variables, converges almost-surely in $\mathcal{U}$ with distribution $\mu_X$ (the law of $X$). Every such family is called an admissible sequence for $X$.

Definition 7 (Admissible sequence). Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element and $(\xi_n)_{n \in \mathbb{N}}$ a family of independent standard Gaussian random variables; then a sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ is admissible for $X$ if
\[
\sum_{n \geq 0} \xi_n u_n \ \text{converges almost-surely in } \mathcal{U},
\]
and its distribution is $\mu_X$.
The existence of such admissible sequences was first shown in [Tsi81] and later studied, for instance, in [Bog98, VTC87, Vak91, LP09, Lif12]. Whenever the series representation is actually a finite sum, the random element is said to be of finite rank (equal to the number of terms). Finding admissible sequences is essentially the same problem as finding (tensor) representations of the covariance operator.
Lemma 4 (Lemma 1 in [LP09]). Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element, $\mathcal{C}_X$ its covariance operator and $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$; then the following assertions are equivalent:
1. $(u_n)_{n \in \mathbb{N}}$ is admissible for $X$,
2. $\forall l \in \mathcal{U}^*$, $\left( \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} \right)_{n \in \mathbb{N}}$ is admissible for $\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}$,
3. $\forall l \in \mathcal{U}^*$, $\mathcal{C}_X l = \sum_{n \geq 0} \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} u_n$, where the convergence is in $\mathcal{U}$,
4. $\forall r > 0$, $\forall l, g \in \mathcal{B}_{\mathcal{U}^*}(0, r]$,
\[
\langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = \sum_{n \geq 0} \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} \langle u_n, g\rangle_{\mathcal{U},\mathcal{U}^*},
\]
where the convergence is uniform.
Proof. $1 \Rightarrow 2$ is direct, while $2 \Rightarrow 1$ is a consequence of the Itô-Nisio theorem [IN68]. Now let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of independent standard Gaussian random variables and note $X_n = \sum_{k=0}^n \xi_k u_k$. $1 \Rightarrow 4$: by hypothesis, $X_n \to Y$ almost-surely with $Y$ a Gaussian random element identically distributed with $X$, thus $X_n \to Y$ in $L^2(\mathcal{U})$, and one has for all $r > 0$:
\[
\left| \sum_{k=0}^n \langle u_k, l\rangle_{\mathcal{U},\mathcal{U}^*} \langle u_k, g\rangle_{\mathcal{U},\mathcal{U}^*} - \langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} \right| = \left| \mathbb{E}\left[ \langle X_n - Y, l\rangle_{\mathcal{U},\mathcal{U}^*} \langle X_n - Y, g\rangle_{\mathcal{U},\mathcal{U}^*} \right] \right| \leq r^2 \, \mathbb{E}\left[ \|X_n - Y\|_{\mathcal{U}}^2 \right],
\]
with $(l, g) \in \mathcal{B}_{\mathcal{U}^*}(0, r]^2$. $4 \Rightarrow 3$ is direct. $3 \Rightarrow 2$: finally,
\[
\forall l \in \mathcal{U}^*, \quad \mathbb{E}\left[ \exp\left( i\langle X_n, l\rangle_{\mathcal{U},\mathcal{U}^*} \right) \right] = \exp\left( -\frac{1}{2} \sum_{k=0}^n \langle u_k, l\rangle_{\mathcal{U},\mathcal{U}^*}^2 \right),
\]
thus $\hat{X}_n(l) \to \hat{X}(l)$.

As pointed out in [LP09], the last item in Lemma 4 is a general version of Mercer's theorem [Mer09], and all convergences are unconditional. It appears that admissible sequences are completely characterized using covariance factorizations from Lemma 2 [LP09].
Theorem 5. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element and $\mathcal{H}$ a separable Hilbert space with Hilbert basis $(e_n)_{n \in \mathbb{N}}$. A sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ is admissible for $X$ if and only if there is an operator $S \in \mathcal{L}(\mathcal{H}, \mathcal{U})$ such that $\mathcal{C}_X = SS^*$ and $\forall n \in \mathbb{N}$, $Se_n = u_n$.

Proof. Let $\mathcal{H}$ be a separable Hilbert space, $S \in \mathcal{L}(\mathcal{H}, \mathcal{U})$ such that $\mathcal{C}_X = SS^*$ and $(e_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ a Hilbert basis. It comes that
\[
\forall l \in \mathcal{U}^*, \quad S^* l = \sum_{n \geq 0} \langle S^* l, e_n\rangle_{\mathcal{H}} e_n = \sum_{n \geq 0} \langle S e_n, l\rangle_{\mathcal{U},\mathcal{U}^*} e_n,
\]
and applying $S$, it comes $\forall l \in \mathcal{U}^*$, $\mathcal{C}_X l = SS^* l = \sum_{n \geq 0} \langle S e_n, l\rangle_{\mathcal{U},\mathcal{U}^*} S e_n$. Using Lemma 4, $(S e_n)_{n \in \mathbb{N}}$ is admissible. Conversely, let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ be an admissible sequence for $X$; then $\forall (c_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N})$, one has
\[
\left\| \sum_{k=n}^m c_k u_k \right\|_{\mathcal{U}}^2 = \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \left\langle \sum_{k=n}^m c_k u_k, l \right\rangle_{\mathcal{U},\mathcal{U}^*}^2 \leq \sum_{k=n}^m c_k^2 \, \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \sum_{i \geq 0} \langle u_i, l\rangle_{\mathcal{U},\mathcal{U}^*}^2 \leq \sum_{k=n}^m c_k^2 \, \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*} \leq \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} \sum_{k=n}^m c_k^2,
\]
using Lemma 4; thus the sequence of partial sums is Cauchy in $\mathcal{U}$ and converges. Since $(e_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ is a Hilbert basis, Parseval's equality gives
\[
\forall h \in \mathcal{H}, \quad \left( \langle h, e_n\rangle_{\mathcal{H}} \right)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}),
\]
and $\sum_{n \geq 0} \langle h, e_n\rangle_{\mathcal{H}} u_n \in \mathcal{U}$. Define $S$ as the following operator:
\[
S := h \in \mathcal{H} \longmapsto \sum_{n \geq 0} \langle h, e_n\rangle_{\mathcal{H}} u_n,
\]
which is linear. To see that it is bounded, write
\[
\|Sh\|_{\mathcal{U}}^2 = \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \langle Sh, l\rangle_{\mathcal{U},\mathcal{U}^*}^2 \leq \sup_{l \in \mathcal{B}_{\mathcal{U}^*}} \left( \sum_{n \geq 0} \langle h, e_n\rangle_{\mathcal{H}} \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} \right)^2 \leq \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} \|h\|_{\mathcal{H}}^2,
\]
and by construction $\forall n \in \mathbb{N}$, $Se_n = u_n$. Finally, $S^* l = \sum_{n \geq 0} \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} e_n$, hence $SS^* l = \sum_{n \geq 0} \langle u_n, l\rangle_{\mathcal{U},\mathcal{U}^*} u_n = \mathcal{C}_X l$ by Lemma 4. The proof is complete.

In practice, one finds admissible sequences using covariance factorizations and pre-existing Hilbert bases. In [LP09], the authors emphasize the more general notion of Parseval frames instead of Hilbert bases, allowing redundancy and the possibility to work with wavelets for instance.
Corollary 7. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with Cameron-Martin space $\mathcal{U}_X$; then every Hilbert basis $(h_n)_{n \in \mathbb{N}} \subset \mathcal{U}_X$ is admissible for $X$.

Proof. Since the injection $i_{\mathcal{U}_X} : \mathcal{U}_X \to \mathcal{U}$ provides a valid factorization, it is a consequence of Theorem 5.
Among all possible bases in the Cameron-Martin space, there are some satisfying a strong summability condition, linked with the nuclearity of the covariance operator.
Proposition 11. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then there exists an admissible sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ such that
\[
\sum_{n \geq 0} \|u_n\|_{\mathcal{U}}^2 < +\infty.
\]

Proof. The proof can be found in [Bog98].
An admissible sequence with this last property will be called a nuclear representation of $X$. The general concept of admissible sequences only provides a Gaussian random element with the target distribution $\mu_X$. However, it is possible to find particular sequences lying in the range of $\mathcal{C}_X$ (as it is dense in $\mathcal{U}_X$). In that case, the approximation is much stronger, since it holds almost-surely.
Proposition 12. Let $\mathcal{U}$ be a Banach space, $X \sim \mathcal{N}(0, \mathcal{C}_X)$ and $(l_n)_{n \in \mathbb{N}} \subset \mathcal{U}^*$ such that $(\mathcal{C}_X l_n)_{n \in \mathbb{N}}$ is admissible for $X$; then
\[
X = \sum_{n \geq 0} \langle X, l_n\rangle_{\mathcal{U},\mathcal{U}^*} \mathcal{C}_X l_n, \quad \text{a.s.}
\]

Proof. For all $n \in \mathbb{N}$, let $X_n = \sum_{k=0}^n \langle X, l_k\rangle_{\mathcal{U},\mathcal{U}^*} \mathcal{C}_X l_k$; then for every $l \in \mathcal{U}^*$, $\langle X_n, l\rangle_{\mathcal{U},\mathcal{U}^*}$ converges to $\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}$ in $L^2(\Omega; \mathbb{R})$ by Lemma 4, thus in probability, and the Itô-Nisio theorem gives the conclusion.
This (stronger) type of representation is known as a stochastic basis in [Her81, Oka86].

Definition 8. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element, $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ and $(u_n^*)_{n \in \mathbb{N}} \subset \mathcal{U}^*$; then the system $(u_n, u_n^*)_{n \in \mathbb{N}}$ is a stochastic basis if
• $\forall (n, m) \in \mathbb{N}^2$, $\langle u_n, u_m^*\rangle_{\mathcal{U},\mathcal{U}^*} = \delta_{nm}$,
• $X = \sum_{n \geq 0} \langle X, u_n^*\rangle_{\mathcal{U},\mathcal{U}^*} u_n$, a.s.

Optimality of representations Since there are multiple admissible sequences (and stochastic bases) representing a Gaussian random element, it is natural to study the speed of convergence of such representations, in particular in applied contexts. Indeed, the practitioner will often be interested in finite approximations using an admissible sequence $(u_n)_{n \in \mathbb{N}}$ of the following form:
\[
\forall n \in \mathbb{N}^*, \quad X_n = \sum_{k=0}^n \xi_k u_k.
\]
It is then of interest to look for optimal representations, provided a measure of convergence speed. The usual notions of distance (Hellinger, total variation) between the distributions $\mu_X$ and $\mu_{X_n}$ are not adapted (in fact not defined), since both measures are mutually singular. However, a first criterion could be a uniform approximation of the type (see [KL02] for a precise discussion):
\[
\forall n \in \mathbb{N}^*, \quad \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k u_k \right\|_{\mathcal{U}}^2 \right]^{1/2}.
\]
This leads to the important notion of l-numbers for a Gaussian random element.
Definition 9. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then its sequence of $l$-numbers is defined, $\forall n \in \mathbb{N}^*$, as:
\[
l_n(X) = \inf\left\{ \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k u_k \right\|_{\mathcal{U}}^2 \right]^{1/2}, \ (u_n)_{n \in \mathbb{N}} \text{ is admissible for } X \right\}.
\]
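For a Gaussian vector in $\mathbb{R}^d$ expanded in the eigenbasis of $C$ (anticipating the Karhunen-Loève results of Section 1.5), the truncation criterion entering this definition is explicit: with $u_k = \sqrt{\lambda_k}\, e_k$, the mean-square tail equals the sum of the discarded eigenvalues. A hedged numerical check (NumPy; illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
C = A @ A.T
lam, E = np.linalg.eigh(C)
lam = np.clip(lam[::-1], 0.0, None)  # eigenvalues in decreasing order

# Keep m terms of the expansion X = sum_k sqrt(lam_k) xi_k e_k and measure
# the mean-square norm of the discarded tail sum_{k >= m}.
m = 2
xi = rng.standard_normal((200_000, 5))
tail_coeffs = xi[:, m:] * np.sqrt(lam[m:])     # coefficients on orthonormal e_k
tail_mc = (tail_coeffs**2).sum(axis=1).mean()  # Monte-Carlo E ||tail||^2
tail_exact = lam[m:].sum()                     # closed form: tail eigenvalue sum
rel_gap = abs(tail_mc - tail_exact) / tail_exact
```

Minimizing this tail over orderings explains why the decreasing eigenvalue arrangement is the natural candidate for optimality.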
In particular, these $l$-numbers were shown to be closely related to the metric entropy of the underlying probability measure [LLW99]. In fact, this notion can be linked to the covariance operator using factorizations.
Proposition 13. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$, $\mathcal{H}$ a separable Hilbert space and $A \in \mathcal{L}(\mathcal{H}, \mathcal{U})$ such that $\mathcal{C}_X = AA^*$; then $\forall n \in \mathbb{N}^*$:
\[
l_n(X) = \inf\left\{ \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k A e_k \right\|_{\mathcal{U}}^2 \right]^{1/2}, \ (e_n)_{n \in \mathbb{N}} \text{ Hilbert basis in } \mathcal{H} \right\},
\]
where $(\xi_n)_{n \in \mathbb{N}}$ is a sequence of i.i.d. $\mathcal{N}(0, 1)$ random variables.
However, it is not clear that there exists an admissible sequence $(u_n)_{n \in \mathbb{N}}$ such that
\[
\forall n \in \mathbb{N}^*, \quad l_n(X) = \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k u_k \right\|_{\mathcal{U}}^2 \right]^{1/2}.
\]
In other words, the infimum may not be a minimum in the definition of $l$-numbers. Nevertheless, if one can at least estimate the sequence of $l$-numbers, there exists at least one admissible sequence achieving this speed of approximation.
Proposition 14. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$, $\mathcal{H}$ a Hilbert space, $A \in \mathcal{L}(\mathcal{H}, \mathcal{U})$ such that $\mathcal{C}_X = AA^*$, $\alpha > 0$ and $\beta \in \mathbb{R}$; then the following assertions are equivalent:
1. $\exists c_1 > 0$, $\forall n \in \mathbb{N}^*$, $l_n(X) \leq c_1 \frac{(1+\log(n))^\beta}{n^\alpha}$,
2. there exist an admissible sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ for $X$ and $c_2 > 0$ such that
\[
\forall n \in \mathbb{N}^*, \quad \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k u_k \right\|_{\mathcal{U}}^2 \right]^{1/2} \leq c_2 \frac{(1+\log(n))^\beta}{n^\alpha},
\]
3. there exist a sequence of independent Gaussian random elements $(X_k)_{k \in \mathbb{N}} \subset L^0(\mathcal{U})$ and $c_3 > 0$ such that
\[
X = \sum_{k \geq 0} X_k, \qquad \forall k \in \mathbb{N}, \ \mathrm{rank}(X_k) < 2^k, \qquad \forall k \in \mathbb{N}, \ \mathbb{E}\left[ \|X_k\|_{\mathcal{U}}^2 \right]^{1/2} \leq c_3 k^\beta 2^{-\alpha k}.
\]
Proof. The proof can be found in [Pis89].
In view of Proposition 14, we see that if one estimates the $l$-numbers for a particular factorization of the covariance operator, then there exists an admissible sequence with the same approximation-error estimate. However, in some cases it is also possible to bound the $l$-numbers from below, and when both estimates are similar, the decomposition is said to be asymptotically optimal.
Definition 10. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element; then an admissible sequence $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ is asymptotically optimal for $X$ if $\exists c_1, c_2 > 0$, $\forall n \in \mathbb{N}^*$,
\[
c_1 l_n(X) \leq \mathbb{E}\left[ \left\| \sum_{k \geq n-1} \xi_k u_k \right\|_{\mathcal{U}}^2 \right]^{1/2} \leq c_2 l_n(X).
\]
The second notion of approximation of a Gaussian random element is of worst-case type. It is based on the approximation of the covariance operator by finite-dimensional operators, and it leads to the notion of approximation numbers for Gaussian random elements.
Definition 11. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; the sequence of approximation numbers is defined as:
\[
\forall n \in \mathbb{N}^*, \quad a_n(X) = \inf\left\{ \|\mathcal{C}_X - S\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}, \ S \in \mathcal{L}(\mathcal{U}^*, \mathcal{U}), \ \mathrm{rank}(S) < n \right\}.
\]

Proposition 15 (Proposition 6.2 in [KL02]). Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element, $n \in \mathbb{N}^*$ and $\epsilon > 0$; then $a_n(X) < \epsilon$ if and only if there exist $(l_0, \ldots, l_{n-1}) \in (\mathcal{U}^*)^n$ and $(\alpha_0, \ldots, \alpha_{n-1}) \in \mathbb{R}^n$ such that
\[
\forall l \in \mathcal{U}^*, \quad \mathbb{E}\left[ \left( \langle X, l\rangle_{\mathcal{U},\mathcal{U}^*} - \sum_{k=0}^{n-1} \alpha_k \langle X, l_k\rangle_{\mathcal{U},\mathcal{U}^*} \right)^2 \right] \leq \epsilon \|l\|_{\mathcal{U}^*}^2.
\]

Proof. The proof is in [KL02].

The second interest of approximation numbers is the possibility to provide a lower bound on $l$-numbers.

Proposition 16. Let $\mathcal{H}$ be a Hilbert space, $\mathcal{U}$ a Banach space and $A \in \mathcal{L}(\mathcal{H}, \mathcal{U})$; then there exists $c > 0$ such that $\forall m, n \geq 1$,
\[
\sqrt{\log(m)}\, a_{n+m-1}(A) \leq c\, l_n(A).
\]

1.5 Additional results in the Hilbert case

In this section, some important results are provided concerning the very specific case where $\mathcal{U}$ is a Hilbert space. Indeed, there are some important strengthenings of the previous general concepts in this very special type of Banach space. Here, the dual space $\mathcal{U}^*$ will be identified with $\mathcal{U}$ using the Riesz representation theorem. In particular, this implies that the covariance operator is an endomorphism and can be diagonalized by the spectral theorem (since it is compact and self-adjoint). Moreover, it is non-negative, thus all the eigenvalues $(\lambda_n)_{n \in \mathbb{N}} \subset \mathbb{R}^+$, and nuclearity translates into $(\lambda_n)_{n \in \mathbb{N}} \in \ell^1(\mathbb{N})$.

Characterization of Gaussian covariance operators In Section 1.3, the covariance operator was defined using Bochner's integral and shown to be non-negative, symmetric and nuclear. In Hilbert spaces, the converse is also true: every operator sharing these properties is the covariance of a Gaussian random element.

Theorem 6 (Prohorov). Let $\mathcal{U}$ be a separable Hilbert space and $\mathcal{C} \in \mathcal{L}(\mathcal{U}, \mathcal{U})$; then $\mathcal{C}$ is the covariance of a Gaussian random element in $\mathcal{U}$ if and only if it is symmetric, non-negative and nuclear.

Proof. Let $\mathcal{U}$ be a separable Hilbert space and $X \sim \mathcal{N}(0, \mathcal{C}_X)$ a Gaussian random element.
It has already been shown that its covariance operator is self-adjoint and non-negative. The nuclearity can be shown directly here. Indeed, let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ be a Hilbert basis; then
\[
\int_{\mathcal{U}} \|u\|_{\mathcal{U}}^2 \mu_X(du) = \int_{\mathcal{U}} \sum_{n \geq 0} \langle u, u_n\rangle_{\mathcal{U}}^2 \mu_X(du) = \sum_{n \geq 0} \int_{\mathcal{U}} \langle u, u_n\rangle_{\mathcal{U}}^2 \mu_X(du) = \sum_{n \geq 0} \langle \mathcal{C}_X u_n, u_n\rangle_{\mathcal{U}}
\]
from Lebesgue's dominated convergence theorem, which implies a finite trace; thus $\mathcal{C}_X$ is nuclear. Now, let $\mathcal{C}_X \in \mathcal{L}(\mathcal{U}, \mathcal{U})$ be a symmetric, non-negative and nuclear operator; the spectral theorem implies the existence of a Hilbert basis $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ and a sequence $(\lambda_n)_{n \in \mathbb{N}} \subset \mathbb{R}^+$ (by non-negativity) such that
\[
\forall u \in \mathcal{U}, \quad \mathcal{C}_X u = \sum_{n \geq 0} \lambda_n \langle u, u_n\rangle_{\mathcal{U}} u_n,
\]
with $(\lambda_n)_{n \in \mathbb{N}} \in \ell^1(\mathbb{N})$ (by nuclearity). Now, let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. $\mathcal{N}(0, 1)$ random variables (on the same probability space) and consider the following quantity:
\[
\tilde{X} = \sum_{n \geq 0} \sqrt{\lambda_n} \xi_n u_n.
\]
It comes that
\[
\mathbb{E}\left[ \|\tilde{X}\|_{\mathcal{U}}^2 \right] = \sum_{n \geq 0} \lambda_n \mathbb{E}\left[ \xi_n^2 \right] = \sum_{n \geq 0} \lambda_n < +\infty,
\]
using Lebesgue's dominated convergence theorem, and thus $\tilde{X} \in L^2(\mathcal{U})$. Now it remains to see that it is a Gaussian random element with covariance $\mathcal{C}_X$. Again, Lebesgue's dominated convergence theorem gives
\[
\forall u \in \mathcal{U}, \quad \mathbb{E}\left[ e^{i\langle \tilde{X}, u\rangle_{\mathcal{U}}} \right] = \prod_{n \geq 0} \mathbb{E}\left[ e^{i\sqrt{\lambda_n}\langle u, u_n\rangle_{\mathcal{U}} \xi_n} \right] = \exp\left( -\frac{1}{2}\langle \mathcal{C}_X u, u\rangle_{\mathcal{U}} \right).
\]
In conclusion, $\mu_{\tilde{X}} = \mu_X$ by Proposition 2.

In general Banach spaces, the characterization of Gaussian covariance operators is still an open problem (except in $l^p$ spaces, see [KT14] and the references therein).

Karhunen-Loève representation In separable Hilbert spaces, the covariance operator admits a spectral representation. This eigendecomposition naturally leads to a Hilbert basis, which will be used to give an important admissible sequence, namely the Karhunen-Loève decomposition.

Proposition 17. Let $\mathcal{U}$ be a separable Hilbert space and $X \in L^2(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$ whose eigendecomposition is $(\lambda_n)_{n \in \mathbb{N}} \subset \mathbb{R}^+$, $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$; then
1. $\mathcal{U}_X = R\left( \mathcal{C}_X^{1/2} \right)$,
2. $(\sqrt{\lambda_n} u_n)_{n \in \mathbb{N}}$ is a Hilbert basis in $\mathcal{U}_X$,
3.
$\forall (h_1, h_2) \in \mathcal{U}_X^2$, $\langle h_1, h_2\rangle_X = \sum_{n \geq 0} \lambda_n^{-1} \langle h_1, u_n\rangle_{\mathcal{U}} \langle h_2, u_n\rangle_{\mathcal{U}}$.

Proof. Let $u \in \mathcal{U}$ and note $h = \mathcal{C}_X u$; then
\[
\|h\|_X^2 = \langle h, h\rangle_X = \langle \mathcal{C}_X u, u\rangle_{\mathcal{U}} = \left\| \mathcal{C}_X^{1/2} u \right\|_{\mathcal{U}}^2,
\]
using the reproduction property. This isometry extends by density, giving item 1. Similarly, let $(u_n)_{n \in \mathbb{N}}$ be a spectral basis of $\mathcal{C}_X$, $u \in \mathcal{U}$ and $h = \mathcal{C}_X u$; then
\[
\forall n \in \mathbb{N}, \quad \langle h, u_n\rangle_X = \langle u, u_n\rangle_{\mathcal{U}}.
\]
This implies that if $\langle h, u_n\rangle_X = 0$ for all $n \in \mathbb{N}$, then $h = 0$; thus $(u_n)_{n \in \mathbb{N}}$ is dense in $\mathcal{U}_X$. Finally,
\[
\left\| \sqrt{\lambda_n} u_n \right\|_X^2 = \lambda_n \langle u_n, u_n\rangle_X = \langle u_n, u_n\rangle_{\mathcal{U}} = 1,
\]
and similarly, for $\lambda_n, \lambda_p > 0$,
\[
\left\langle \sqrt{\lambda_n} u_n, \sqrt{\lambda_p} u_p \right\rangle_X = \frac{\sqrt{\lambda_n}}{\sqrt{\lambda_p}} \langle u_n, u_p\rangle_{\mathcal{U}} = \delta_{np}.
\]
The specific form of the inner product is deduced directly.

Theorem 7. Let $\mathcal{U}$ be a separable Hilbert space and $X$ a Gaussian random element in $\mathcal{U}$ with covariance operator $\mathcal{C}_X$; then, with the previous notations, the sequence $(\sqrt{\lambda_n} u_n)_{n \in \mathbb{N}}$ is a Hilbert basis of $\mathcal{U}_X$ and one has
\[
X = \sum_{n \geq 0} \langle X, u_n\rangle_{\mathcal{U}} u_n \quad \text{a.s.}
\]

Proof. Since $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ is a Hilbert basis, the result is clear.

Besides having independent Gaussian components, this decomposition enjoys two fundamental properties: a precise quantification of errors under truncation (low-rank approximation) and optimality in both trace and operator norms.

Theorem 8. Let $\mathcal{U}$ be a Hilbert space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X \in \mathcal{L}(\mathcal{U}, \mathcal{U})$. Let $(\lambda_n)_{n \in \mathbb{N}} \subset \mathbb{R}$ and $(u_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ be respectively its eigenvalues (in decreasing order) and eigenvectors; then $\forall n \in \mathbb{N}^*$:
• $l_n(X) = \left( \sum_{k \geq n} \lambda_k \right)^{1/2}$,
• $a_n(X) = \lambda_{n-1}^{1/2}$.

In other words, the problem of approximation of Gaussian random elements in Hilbert spaces is solved, the optimal basis being essentially unique and given by the (rescaled) eigenvectors. In finite-dimensional spaces, this is also known as the Eckart-Young-Mirsky theorem for low-rank approximation.

Feldman-Hajek theorem The second important result is that the equivalence and singularity properties from the Cameron-Martin theorem may be extended to measures with different covariance operators.
Theorem 9 (Feldman-Hajek). Let $\mathcal{U}$ be a Hilbert space and $X_i \sim \mathcal{N}(m_i, \mathcal{C}_i)$, $i \in \{1, 2\}$, two Gaussian random elements; then:
1. The distributions $\mu_{X_1}$ and $\mu_{X_2}$ are equivalent if and only if:
(a) $\mathcal{U}_{X_1} = \mathcal{U}_{X_2}$,
(b) $m_1 - m_2 \in \mathcal{U}_{X_1}$,
(c) $\left\| \left( \mathcal{C}_1^{-1/2} \mathcal{C}_2^{1/2} \right) \left( \mathcal{C}_1^{-1/2} \mathcal{C}_2^{1/2} \right)^* - \mathcal{I} \right\|_{HS} < \infty$.
2. They are mutually singular otherwise.

Proof. The proof is given in [DZ14], Theorem 2.25.

Here, the Hilbert-Schmidt norm of an operator $A \in \mathcal{L}(\mathcal{U}, \mathcal{U})$ is:
\[
\|A\|_{HS} = \left( \sum_{n \geq 0} \|A e_n\|_{\mathcal{U}}^2 \right)^{1/2},
\]
where $(e_n)_{n \in \mathbb{N}} \subset \mathcal{U}$ is a Hilbert basis (this quantity is invariant under change of basis).

Examples

Example 1 (Gaussian vectors). This first example shows how the previous general theory applies in a Euclidean space $\mathcal{U} = \mathbb{R}^n$ with $n \in \mathbb{N}^*$. Let $X \in L^0(\mathcal{U})$ be a Gaussian random element. In the canonical basis $(e_i)_{i \in [1,n]}$, the covariance operator is the usual matrix:
\[
\forall (i, j) \in [1, n]^2, \quad (\Sigma_X)_{i,j} = \mathbb{E}\left[ \langle X, e_i\rangle_{\mathcal{U}} \langle X, e_j\rangle_{\mathcal{U}} \right].
\]
Because this matrix is symmetric, one can find an eigendecomposition and represent it as a diagonal matrix with eigenvalues $(\lambda_1, \ldots, \lambda_n) \in [0, +\infty[^n$. The Cameron-Martin space is then equal to the span of the eigenvectors with strictly positive eigenvalues, while the inner product is defined by $\forall (\Sigma x, \Sigma y) \in \Sigma(\mathbb{R}^n)^2$, $\langle \Sigma y, \Sigma x\rangle_X = \langle \Sigma y, x\rangle_{\mathcal{U}}$. In this very special case, the classical terminology is Gaussian random vectors.

Example 2 (Inverse Laplace covariance). Let $\Omega = ]0, 1[$, $f \in L^2(\Omega; \mathbb{R})$ and consider the following equation:
\[
-\Delta u = f \ \text{on } \Omega, \qquad u = 0 \ \text{on } \partial\Omega.
\]
Using standard variational analysis (Lax-Milgram theorem), one shows that a (unique) weak solution $u$ exists in the Sobolev space $H_0^1(\Omega)$, defining a solution map $C := f \in L^2(\Omega; \mathbb{R}) \mapsto u \in H_0^1(\Omega; \mathbb{R})$. This map is linear and bounded. Furthermore, since $\Omega$ is itself bounded, the Sobolev embedding of $H_0^1(\Omega)$ in $L^2(\Omega)$ is compact, and $C$ is thus a compact operator. To see that it is self-adjoint, let $(f, g) \in L^2(\Omega)^2$ and $u = Cf$, $v = Cg$.
It is clear that
\[
\langle Cf, g\rangle_{L^2} = \langle u', v'\rangle_{L^2} = \langle f, Cg\rangle_{L^2},
\]
and $C$ is self-adjoint in $L^2$. By the spectral theorem, it admits a spectral representation, and in this particular case it is given by, $\forall n \in \mathbb{N}$:
\[
C f_n = \lambda_n f_n, \qquad \lambda_n = \frac{1}{(n+1)^2 \pi^2}, \qquad f_n := x \in ]0, 1[ \ \mapsto \sqrt{2}\sin((n+1)\pi x) \in \mathbb{R}.
\]
It is thus clearly trace-class, and the following random series
\[
X = \sum_{n \geq 0} \sqrt{\lambda_n} \xi_n f_n,
\]
with $(\xi_n)_{n \in \mathbb{N}}$ a sequence of independent standard normal variates, defines a Gaussian random element in $\mathcal{U} = L^2(\Omega, \mathbb{R})$ with covariance operator $C$ (known as the Brownian bridge). The Cameron-Martin space can be identified with the subspace $H_0^1(\Omega)$: $\forall f \in L^2(\Omega)$, note $u = Cf$; then
\[
\|u\|_X^2 = \langle u, u\rangle_X = \langle u, f\rangle_{L^2} = \langle u', u'\rangle_{L^2} = \|u\|_{H_0^1}^2.
\]

1.6 Gaussian random elements in $C(\mathcal{K}, \mathbb{R})$

In this section, the presentation focuses on the particular case of continuous functions over a compact metric set, that is, $\mathcal{U} = C(\mathcal{K}, \mathbb{R})$ from now on. First of all, it is a Banach space when the supremum norm is considered, and it is separable (as a consequence of the Stone-Weierstrass theorem). Moreover, the dual space $\mathcal{U}^*$ is identified with the space of finite signed measures on $\mathcal{K}$ equipped with the Borel $\sigma$-algebra (Riesz-Markov theorem, see [Rud09]). In this context, the theory of continuous Gaussian random fields will provide an alternative viewpoint on the previous material. Indeed, they can be seen as Gaussian random elements of $L^0(\mathcal{U})$.

Continuous Gaussian random fields The theory of random fields is a practical tool to define probability measures over function spaces [AT07]. Indeed, let $\mathcal{K}$ be a set and $(\Omega, \mathcal{F}, \mathbb{P})$ a probability space; then a family $\{X_s\}_{s \in \mathcal{K}}$ of real random variables, all defined on the same probability space, maps $\Omega$ to $\mathbb{R}^{\mathcal{K}}$ (the set of all maps from $\mathcal{K}$ to $\mathbb{R}$).

Definition 12 (Random field). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathcal{K}$ a set; then a random field on $\mathcal{K}$ is a family of real random variables $(X_s)_{s \in \mathcal{K}}$ defined on the same probability space.
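Returning to Example 2: the eigenpairs $\lambda_n = ((n+1)\pi)^{-2}$, $f_n(x) = \sqrt{2}\sin((n+1)\pi x)$ give a kernel expansion $\sum_n \lambda_n f_n(s) f_n(t)$ which should reproduce the Brownian-bridge covariance kernel, whose classical closed form $\min(s, t) - st$ is assumed here. A hedged numerical check (NumPy):

```python
import numpy as np

N = 2000  # truncation order of the spectral expansion
s = np.linspace(0.05, 0.95, 19)
n = np.arange(N)
lam = 1.0 / ((n + 1)**2 * np.pi**2)
F = np.sqrt(2.0) * np.sin(np.outer(s, (n + 1) * np.pi))  # F[i, k] = f_k(s_i)

# Truncated Mercer-type expansion of the covariance kernel on the grid.
K_series = F @ (lam[None, :] * F).T

# Closed-form Brownian-bridge kernel min(s, t) - s t (assumed, classical).
S, T = np.meshgrid(s, s, indexing="ij")
K_exact = np.minimum(S, T) - S * T

max_gap = np.abs(K_series - K_exact).max()
```

The truncation error decays like the tail eigenvalue sum, of order $1/N$ here, so a few thousand modes already match the closed form to within a fraction of a percent.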
This formally defines a measurable mapping $X$ between $(\Omega, \mathcal{F})$ and $\mathbb{R}^{\mathcal{K}}$ equipped with the cylindrical $\sigma$-algebra $\mathrm{Cyl}(\mathcal{K})$ (the smallest one such that all point-wise evaluations are measurable). Let $\omega \in \Omega$ such that the previous map is well-defined; then $X(\omega) := (X_s(\omega))_{s \in \mathcal{K}}$ is called a sample function. From now on, we will always suppose that a unique probability space is given, and will not mention it any more. Now, given a random field on $(\mathcal{K}, d)$, one can consider an associated probability measure, its distribution.

Definition 13 (Distribution of a random field). Let $\mathcal{K}$ be a set and $X = (X_s)_{s \in \mathcal{K}}$ a random field; its distribution is defined as:
\[
\mu_X := A \in \mathrm{Cyl}(\mathcal{K}) \longmapsto \mathbb{P}\left( X^{-1}(A) \right) \in [0, 1].
\]
The particular nature of the cylindrical $\sigma$-algebra gives the possibility to consider the previous distribution on cylinders, providing a family of finite distributions.

Definition 14 (Finite distribution). Let $\mathcal{K}$ be a set and $X = (X_s)_{s \in \mathcal{K}}$ a random field; then for all $n \in \mathbb{N}$ and all $(s_0, \ldots, s_n) \in \mathcal{K}^{n+1}$, the following map is a probability measure on $(\mathbb{R}^{n+1}, \mathrm{Bor}(\mathbb{R}^{n+1}))$:
\[
\mu_X^{(s_0, \ldots, s_n)} := A \in \mathrm{Bor}(\mathbb{R}^{n+1}) \longmapsto \mathbb{P}\left( (X_{s_0}, \ldots, X_{s_n}) \in A \right) \in [0, 1].
\]
Obviously, the distribution of a random field uniquely defines a family of finite distributions. However, it is possible to build a distribution on $\mathrm{Cyl}(\mathcal{K})$ by specifying a family of finite distributions, provided they are consistent (Kolmogorov extension theorem). Now, since a random field is a family of random variables, one can consider important mappings, generalizing the usual notion of moments.

Definition 15. Let $\mathcal{K}$ be a set and $X = (X_s)_{s \in \mathcal{K}}$ a random field on $\mathcal{K}$:
• if $\forall s \in \mathcal{K}$, $X_s \in L^1(\mathbb{R})$, then the map
\[
m_X := s \in \mathcal{K} \longmapsto \mathbb{E}[X_s] \in \mathbb{R}
\]
is the mean function of $X$,
• if $\forall s \in \mathcal{K}$, $X_s \in L^2(\mathbb{R})$, then the map
\[
k_X := (s, t) \in \mathcal{K}^2 \longmapsto \mathbb{E}\left[ (X_s - \mathbb{E}[X_s])(X_t - \mathbb{E}[X_t]) \right] \in \mathbb{R}
\]
is the covariance kernel of $X$.
A random field with a mean function is said to be integrable, and if additionally it has a covariance kernel, it is square integrable.
A random field with trivial mean function will be called centred and, again, only this case will be considered here, so that
$$\forall (s,t) \in \mathcal{K}^2, \quad k_X(s,t) = \mathbb{E}[X_s X_t].$$
An important consequence of the definition is that a covariance kernel is symmetric and semi-positive definite.

Proposition 18. Let $\mathcal{K}$ be a set and $X = (X_s)_{s\in\mathcal{K}}$ a square integrable random field on $\mathcal{K}$; then its covariance kernel $k_X$ has the following properties:
- Symmetry: $\forall (s,t) \in \mathcal{K}^2$, $k_X(s,t) = k_X(t,s)$;
- Semi-positive definiteness:
$$\forall n \in \mathbb{N},\ \forall (s_1,\ldots,s_n) \in \mathcal{K}^n,\ \forall (\alpha_1,\ldots,\alpha_n) \in \mathbb{R}^n, \quad \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k_X(s_i,s_j) \ge 0.$$

The terminology of semi-positive definite kernel is usual in the literature on covariance kernels, while non-negativity is used in functional analysis for operators. Now, the definition of a Gaussian random field is based on all its finite-dimensional distributions, as follows.

Definition 16 (Gaussian random field). Let $\mathcal{K}$ be a set and $(X_s)_{s\in\mathcal{K}}$ a random field on $\mathcal{K}$; it is Gaussian if $\forall n \in \mathbb{N}$, $\forall (s_0,\ldots,s_n) \in \mathcal{K}^{n+1}$, $(X_{s_0},\ldots,X_{s_n})$ is a Gaussian vector on $(\mathbb{R}^{n+1},\mathrm{Bor}(\mathbb{R}^{n+1}))$.

One particularity of such fields is that they are square integrable, thus the mean function and covariance kernel are well-defined. As a consequence, all finite-dimensional distributions can be expressed using these two notions. Indeed, let $n \in \mathbb{N}$ and $s = (s_0,\ldots,s_n)$; then
$$(X_{s_0},\ldots,X_{s_n}) \sim \mathcal{N}(0, k_X(s,s)),$$
where $k_X(s,s)$ is a square matrix of dimension $n+1$ such that $(k_X(s,s))_{(i,j)} = k_X(s_i,s_j)$. Since all finite-dimensional distributions are parametrized by the covariance kernel (the mean function being taken null), and as it bears the necessary consistency, it provides a characterization of the random field's law. It is also important to mention that, given a symmetric, semi-positive definite kernel, one can build a Gaussian process of which it is the covariance.

Example 3.
Let $\mathcal{K} \subset \mathbb{R}^n$, $n \ge 1$; then the following applications are valid covariance kernels:
- Squared-exponential kernel: $\forall (s,t) \in \mathcal{K}^2$, $k(s,t) = \exp\left(-d(s,t)^2\right)$;
- Exponential kernel: $\forall (s,t) \in \mathcal{K}^2$, $k(s,t) = \exp\left(-d(s,t)\right)$;
- Matérn kernel:
$$\forall (s,t) \in \mathcal{K}^2, \quad k(s,t) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu}\,d(s,t)\right)^{\nu} K_\nu\!\left(\sqrt{2\nu}\,d(s,t)\right),$$
where $\Gamma$, $K_\nu$ are respectively the gamma and modified Bessel functions and $\nu > 0$.

There exists a large family of well-known kernels in the literature; one may consult [RW04, BTA04] for instance. In the special case where the compact set $\mathcal{K}$ is a Cartesian product, it is possible to define covariance kernels by tensorization.

Proposition 19. Let $\mathcal{K}_1$, $\mathcal{K}_2$ be two sets, $\mathcal{K} = \mathcal{K}_1 \times \mathcal{K}_2$, and $k_1$, $k_2$ two symmetric, semi-positive definite kernels on $\mathcal{K}_1$, $\mathcal{K}_2$ respectively; then
$$\forall (s,t) \in \mathcal{K}^2, \quad k(s,t) = k_1(s_1,t_1)\, k_2(s_2,t_2)$$
is a symmetric, semi-positive definite kernel on $\mathcal{K}$.

Example 4. Let $\mathcal{K} = [0,1]^2$ and consider the Wiener process kernel $\forall (s,t) \in [0,1]^2$, $k_W(s,t) = \min(s,t)$; the Brownian sheet is then defined as the centred Gaussian process with the following covariance kernel:
$$\forall (s_1,s_2,t_1,t_2) \in [0,1]^4, \quad k((s_1,s_2),(t_1,t_2)) = k_W(s_1,t_1)\, k_W(s_2,t_2).$$

Now, concerning the associated sample functions, their continuity (as well as other types of regularity) has been widely studied in the literature, and [AT07] provides a detailed introduction. The analysis of general necessary conditions ensuring this continuity is beyond the scope of this work, but there is a practical criterion when $\mathcal{K}$ is a subset of a Euclidean space.

Theorem 10 (Kolmogorov continuity test). Let $n \in \mathbb{N}^*$, $\mathcal{K} \subset \mathbb{R}^n$, and $X = (X_s)_{s\in\mathcal{K}}$ a random field on $\mathcal{K}$. If
$$\forall (s,t) \in \mathcal{K}^2, \quad \mathbb{E}\left[|X_s - X_t|^p\right] \le C\, \|s-t\|_{\mathbb{R}^n}^{p\alpha + n},$$
where $p > 1$, $\alpha > 0$ and $C > 0$ are constants, then there exists a version $\tilde{X}$ of $X$ with almost-surely $\beta$-Hölder continuous sample functions, for all $0 < \beta < \alpha$.

Example 5.
Let $\mathcal{K} = [0,1]$ and consider $W = (W_t)_{t\in[0,1]}$ the standard Wiener process, that is $W_0 = 0$ almost-surely and $k_W(s,t) = \min(s,t)$. One important property is that the increments are normally distributed with variance proportional to the time lag, $\forall (s,t) \in [0,1]^2$, $\mathbb{E}\left[(W_t - W_s)^2\right] = |t-s|$, thus
$$\forall (s,t) \in [0,1]^2,\ s \ne t, \quad \mathbb{E}\left[\left(\frac{W_s - W_t}{\sqrt{|s-t|}}\right)^4\right] = 3,$$
and one can apply theorem 10 with $C = 3$, $p = 4$ and $\alpha = \frac{1}{4}$. In other words, there exists a modification of the standard Wiener process with $\beta$-Hölder continuous sample functions, provided $0 < \beta < \frac{1}{4}$.

It appears that continuous Gaussian random fields necessarily have a continuous covariance kernel, but the converse need not be true.

Proposition 20. Let $(\mathcal{K},d)$ be a compact metric space and $X = (X_s)_{s\in\mathcal{K}}$ a Gaussian random field with almost-surely continuous sample functions; then $k_X$ is continuous.

General Gaussian random fields on a set $\mathcal{K}$ are initially defined as measurable mappings from $(\Omega,\mathcal{F})$ to $(\mathbb{R}^{\mathcal{K}},\mathrm{Cyl}(\mathcal{K}))$. In the case of almost-sure continuity of sample functions, the random field takes its values in $C(\mathcal{K},\mathbb{R})$, where the Borel and cylindrical $\sigma$-algebras coincide when $(\mathcal{K},d)$ is a compact metric set. In this case, the random field may then be viewed as a measurable map from $(\Omega,\mathcal{F},\mathbb{P})$ to $(\mathcal{U},\mathrm{Bor}(\mathcal{U}))$, and one can show that it is a Gaussian random element (in the sense of definition 3). Consequently, the concepts of covariance kernel and covariance operator are linked in the following sense.

Proposition 21. Let $(\mathcal{K},d)$ be a compact metric space and $X = (X_s)_{s\in\mathcal{K}}$ a continuous Gaussian random field on $\mathcal{K}$ with covariance kernel $k_X$; then it is a Gaussian random element in $\mathcal{U} = C(\mathcal{K},\mathbb{R})$ with covariance operator:
$$\mathcal{C}_X : \mu \in \mathcal{U}^* \mapsto \left(s \in \mathcal{K} \mapsto \int_{\mathcal{K}} k_X(s,t)\,\mu(dt)\right) \in \mathcal{U}.$$
Conversely, a Gaussian random element $X \in L^0(\mathcal{U})$ has a covariance kernel defined as follows:
$$\forall (s,t) \in \mathcal{K}^2, \quad k_X(s,t) = \langle \mathcal{C}_X \delta_s, \delta_t\rangle_{\mathcal{U},\mathcal{U}^*}.$$

Proof.
Using the definition, one has, $\forall (\mu,\nu) \in \mathcal{U}^* \times \mathcal{U}^*$:
$$\langle \mathcal{C}_X \mu, \nu\rangle_{\mathcal{U},\mathcal{U}^*} = \mathbb{E}\left[\int_{\mathcal{K}} X_t\,\mu(dt) \int_{\mathcal{K}} X_s\,\nu(ds)\right] = \mathbb{E}\left[\int_{\mathcal{K}}\int_{\mathcal{K}} X_s X_t\,\mu(dt)\nu(ds)\right] = \int_{\mathcal{K}}\int_{\mathcal{K}} \mathbb{E}[X_s X_t]\,\mu(dt)\nu(ds) = \int_{\mathcal{K}}\int_{\mathcal{K}} k_X(s,t)\,\mu(dt)\nu(ds),$$
from which the covariance operator is identified. Conversely, one has
$$\forall (s,t) \in \mathcal{K}^2, \quad k_X(s,t) = \mathbb{E}[X_s X_t] = \langle \mathcal{C}_X \delta_s, \delta_t\rangle_{\mathcal{U},\mathcal{U}^*}.$$

Reproducing Kernel Hilbert Space (RKHS). Given a symmetric, semi-positive definite kernel (or equivalently a covariance kernel), one can build an important Hilbert subspace of $\mathbb{R}^{\mathcal{K}}$, a construction similar in many respects to the Cameron-Martin space of Gaussian random elements (see [Aro50, BTA04, VV08]).

Proposition 22. Let $\mathcal{K}$ be a set, $k$ a symmetric, semi-positive definite kernel, and consider the vector space
$$\mathcal{H}_0 := \mathrm{span}\{k(s,\cdot),\ s \in \mathcal{K}\}.$$
Let $(f,g) \in \mathcal{H}_0^2$ be such that
$$f = \sum_{i=0}^n \alpha_i\, k(\cdot,s_i), \qquad g = \sum_{j=0}^m \beta_j\, k(\cdot,t_j),$$
with $n,m \in \mathbb{N}$, $(\alpha_0,\ldots,\alpha_n,\beta_0,\ldots,\beta_m) \in \mathbb{R}^{n+m+2}$ and $(s_0,\ldots,s_n,t_0,\ldots,t_m) \in \mathcal{K}^{n+m+2}$; then the map
$$(f,g) \in \mathcal{H}_0^2 \mapsto \langle f,g\rangle_k = \sum_{i=0}^n \sum_{j=0}^m \alpha_i \beta_j\, k(s_i,t_j) \in \mathbb{R}$$
is well-defined and an inner product. Moreover, it satisfies the reproduction property:
$$\forall f \in \mathcal{H}_0,\ \forall s \in \mathcal{K}, \quad f(s) = \langle f, k(\cdot,s)\rangle_k.$$

Proof. Let $f \in \mathcal{H}_0$ and $s \in \mathcal{K}$; then
$$\langle f, k(\cdot,s)\rangle_k = \sum_{i=0}^n \alpha_i\, k(s_i,s) = f(s),$$
thus the application is well-defined (it does not depend on the particular representation of vectors), bilinear, symmetric and non-negative. To see that it is definite, it remains to use the Cauchy-Schwarz inequality:
$$\forall s \in \mathcal{K}, \quad |f(s)| = |\langle f, k(\cdot,s)\rangle_k| \le \|f\|_k\, \sqrt{k(s,s)},$$
thus $\|f\|_k = 0 \Rightarrow f = 0$.

Proposition 23 (Reproducing Kernel Hilbert Space). Let $\mathcal{K}$ be a set and $k$ a symmetric, semi-positive definite kernel; then the space $\mathcal{H}_0 = \mathrm{span}\{k(\cdot,s),\ s \in \mathcal{K}\}$ is continuously embedded in $\mathbb{R}^{\mathcal{K}}$. The associated Reproducing Kernel Hilbert Space is its topological completion in $\mathbb{R}^{\mathcal{K}}$:
$$\mathcal{H}_k = \overline{\mathcal{H}_0}^{\langle\cdot,\cdot\rangle_k} \subset \mathbb{R}^{\mathcal{K}}.$$
Moreover, it satisfies the reproduction property: $\forall f \in \mathcal{H}_k$, $\forall s \in \mathcal{K}$, $\langle f, k(s,\cdot)\rangle_k = f(s)$.
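The semi-positive definiteness of proposition 18, which underlies the whole RKHS construction, can be checked numerically for the kernels of example 3 on any finite design: the associated Gram matrix must have a non-negative spectrum. A minimal sketch (the smoothness $\nu = 3/2$, chosen so the Matérn kernel has a closed form avoiding the Bessel function, is an illustrative choice, not from the text):

```python
import numpy as np

def dist(S, T):
    """Pairwise Euclidean distances d(s, t) between two point sets."""
    return np.linalg.norm(S[:, None, :] - T[None, :, :], axis=-1)

def sq_exp(S, T):       # squared-exponential kernel: exp(-d(s,t)^2)
    return np.exp(-dist(S, T) ** 2)

def exponential(S, T):  # exponential kernel: exp(-d(s,t))
    return np.exp(-dist(S, T))

def matern32(S, T):     # Matern kernel, nu = 3/2, closed form
    d = np.sqrt(3.0) * dist(S, T)
    return (1.0 + d) * np.exp(-d)

S = np.random.default_rng(0).uniform(size=(60, 2))
for k in (sq_exp, exponential, matern32):
    eigs = np.linalg.eigvalsh(k(S, S))
    assert eigs.min() > -1e-8   # semi-positive definite, up to round-off
```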
The notion of Gaussian space may be defined as well, using the embedding $s \in \mathcal{K} \mapsto X_s \in L^2(\Omega;\mathbb{R})$, and it is isometric (Loève isometry) to the Reproducing Kernel Hilbert space. When the kernel is continuous, the embedding will be in $C(\mathcal{K},\mathbb{R})$.

Proposition 24. Let $(\mathcal{K},d)$ be a compact metric space and $k$ a symmetric, positive-definite and continuous kernel; then $\mathcal{H}_k \subset C(\mathcal{K},\mathbb{R})$.

Proof. Let $f \in \mathcal{H}_k$; then for all $(s,t) \in \mathcal{K}^2$ it comes:
$$(f(s) - f(t))^2 = \langle f, k(s,\cdot) - k(t,\cdot)\rangle_k^2 \le \|f\|_k^2\, \|k(s,\cdot) - k(t,\cdot)\|_k^2.$$
Now, $\|k(s,\cdot) - k(t,\cdot)\|_k^2 = k(s,s) + k(t,t) - 2k(s,t)$ and, since $k$ is continuous, $\lim_{s\to t} \|k(s,\cdot) - k(t,\cdot)\|_k^2 = 0$.

Corollary 8. Let $(\mathcal{K},d)$ be a compact metric space and $k$ a symmetric, semi-positive definite and continuous kernel; then $\mathcal{H}_k$ is separable.

Concerning continuous Gaussian random fields on a compact metric space, both definitions of RKHS and Cameron-Martin space are well-defined and lead to the very same notion (Theorem 2.1 in [VV08]), but this need not be true in general.

Series representation of continuous Gaussian random fields. The previous Reproducing Kernel Hilbert space is useful in the analysis of Gaussian random fields. In particular, when it is separable, one can use Hilbert bases to give series representations of the underlying random field.

Proposition 25. Let $(\mathcal{K},d)$ be a compact metric space, $X = (X_s)_{s\in\mathcal{K}}$ a Gaussian random field with continuous covariance kernel $k_X$, and $(h_n)_{n\in\mathbb{N}} \subset \mathcal{H}_{k_X}$ a Hilbert basis; then
$$\forall s \in \mathcal{K}, \quad X_s = \sum_{n\ge 0} \xi_n h_n(s),$$
with $(\xi_n)_{n\in\mathbb{N}}$ a sequence of independent standard Gaussian random variables, the convergence being in $L^2(\Omega,\mathcal{F},\mathbb{P};\mathbb{R})$.

Proof. Since $k_X$ is continuous, $\mathcal{H}_{k_X}$ is separable, and since it is isometric to the Gaussian space, the latter is separable as well.
Let $s \in \mathcal{K}$ and $(h_n^*)_{n\in\mathbb{N}}$ a Hilbert basis in the Gaussian space; then
$$X_s = \sum_{n\ge 0} \mathbb{E}[X_s h_n^*]\, h_n^*,$$
and, using the Loève isometry, $\mathbb{E}[X_s h_n^*] = \langle k_X(s,\cdot), h_n\rangle_{k_X} = h_n(s)$, thus the result follows.

However, the very nature of continuous Gaussian random fields can be used to significantly strengthen the convergence, passing from $L^2(\Omega,\mathbb{R})$ to $C(\mathcal{K},\mathbb{R})$.

Theorem 11. Let $(\mathcal{K},d)$ be a compact metric space, $X = (X_s)_{s\in\mathcal{K}}$ a continuous Gaussian random field with covariance kernel $k_X$, and $(h_n)_{n\in\mathbb{N}} \subset \mathcal{H}_{k_X}$ a Hilbert basis; then
$$X = \sum_{n\ge 0} \xi_n h_n, \quad \text{a.s.}$$
with $(\xi_n)_{n\in\mathbb{N}}$ a sequence of i.i.d. $\mathcal{N}(0,1)$ random variables, the convergence being in $C(\mathcal{K},\mathbb{R})$.

Proof. Let $(h_n)_{n\in\mathbb{N}} \subset \mathcal{H}_{k_X}$ a Hilbert basis and $s \in \mathcal{K}$; then one has
$$k_X(s,\cdot) = \sum_{n\ge 0} \langle k_X(s,\cdot), h_n\rangle_{k_X}\, h_n = \sum_{n\ge 0} h_n(s)\, h_n$$
by the reproduction property, and it implies
$$\forall (s,t) \in \mathcal{K}^2, \quad k_X(s,t) = \sum_{n\ge 0} h_n(s) h_n(t).$$
In particular, taking $t = s$ gives
$$\forall s \in \mathcal{K}, \quad k_X(s,s) = \sum_{n\ge 0} h_n(s)^2;$$
the convergence being point-wise and monotone, Dini's theorem establishes the uniform convergence. Consider $X$ as a Gaussian random element and note $X_n = \sum_{k=0}^{n-1} \xi_k h_k$. For all $\mu \in \mathcal{U}^*$:
$$\mathbb{E}\left[|\langle X_n,\mu\rangle - \langle X,\mu\rangle|\right] = \mathbb{E}\left[\left|\int (X_n(s) - X(s))\,\mu(ds)\right|\right] \le \int \mathbb{E}\left[|X_n(s) - X(s)|\right]\,|\mu|(ds) \le \int \mathbb{E}\left[(X_n(s) - X(s))^2\right]^{\frac12} |\mu|(ds) \le \int \Big(\sum_{k\ge n} h_k(s)^2\Big)^{\frac12} |\mu|(ds).$$
Since the integrand is the remainder of a uniformly convergent series, the right-hand side of the last inequality tends to $0$. This implies that $\langle X_n,\mu\rangle \to \langle X,\mu\rangle$ in $L^1(\Omega,\mathcal{F},\mathbb{P};\mathbb{R})$, thus in probability. Applying the Itô-Nisio theorem, it follows that $X_n \to X$ almost-surely in $C(\mathcal{K},\mathbb{R})$.

Similarly to Gaussian random elements and their covariance factorizations, Hilbert bases in the Cameron-Martin space can be identified by decomposing the covariance kernel.

Proposition 26.
Let $(\mathcal{K},d)$ be a compact metric space, $k$ a continuous, symmetric, semi-positive definite kernel, $(\lambda_n) \subset \mathbb{R}^+$ and $(f_n)_{n\in\mathbb{N}} \subset C(\mathcal{K},\mathbb{R})$ such that
$$\forall (s,t) \in \mathcal{K}^2, \quad k(s,t) = \sum_{n\ge 0} \lambda_n f_n(s) f_n(t),$$
where the convergence is uniform; then $(\sqrt{\lambda_n} f_n)_{n\in\mathbb{N}}$ is a Hilbert basis in $\mathcal{H}_k$.

However, it is a notoriously difficult problem to find such decompositions in general, and the following section provides a canonical example of such a method. Note that there are numerous recent contributions in this field (see [Git12, Zha17, BCM18] for instance).

Karhunen-Loève decomposition in $L^2(\mathcal{K},\mu)$. The Karhunen-Loève decomposition is a very important result in the theory and applications of random fields, since it gives a practical method to obtain bases in the Reproducing Kernel Hilbert space. It is based on a spectral representation of the covariance operator, with sample functions in $L^2(\mathcal{K},\mu;\mathbb{R})$ where $\mu$ is a finite Borel measure on $\mathcal{K}$.

Proposition 27. Let $(\mathcal{K},d)$ be a compact metric space, $k$ a continuous, symmetric and semi-positive definite kernel, and $\mu$ a finite Borel measure on $\mathcal{K}$; then
$$C_k : f \in L^2(\mathcal{K},\mu;\mathbb{R}) \mapsto \int_{\mathcal{K}} k(\cdot,t) f(t)\,\mu(dt) \in L^2(\mathcal{K},\mu;\mathbb{R})$$
is well-defined, self-adjoint and compact.

Proof. Let $f \in L^2(\mathcal{K},\mu;\mathbb{R})$; then
$$\|C_k f\|_{L^2}^2 = \int_{\mathcal{K}} \left(\int_{\mathcal{K}} k(s,t) f(t)\,\mu(dt)\right)^2 \mu(ds) \le \int_{\mathcal{K}}\int_{\mathcal{K}} k(s,t)^2\,\mu(ds)\mu(dt)\; \|f\|_{L^2}^2.$$
Since $k$ is continuous and $\mathcal{K}$ compact, it is clear that
$$\int_{\mathcal{K}}\int_{\mathcal{K}} k(s,t)^2\,\mu(ds)\mu(dt) < +\infty,$$
thus $C_k f \in L^2$ and the operator $C_k$ is well-defined. The linearity follows from the linearity of Lebesgue's integral. The operator is self-adjoint because
$$\forall (f,g) \in L^2(\mathcal{K},\mu;\mathbb{R})^2, \quad \langle C_k f, g\rangle_{L^2} = \int_{\mathcal{K}}\int_{\mathcal{K}} k(s,t) f(t)\,\mu(dt)\, g(s)\,\mu(ds) = \int_{\mathcal{K}}\int_{\mathcal{K}} k(s,t) g(s)\,\mu(ds)\, f(t)\,\mu(dt) = \langle f, C_k g\rangle_{L^2},$$
the exchange of integrals being possible thanks to Fubini's theorem. Now, let $f \in L^2(\mu)$; then
$$|C_k f(s) - C_k f(t)| = \left|\int_{\mathcal{K}} (k(s,v) - k(t,v)) f(v)\,\mu(dv)\right| \le \int_{\mathcal{K}} |(k(s,v) - k(t,v)) f(v)|\,|\mu|(dv) \le \|f\|_{L^2}\, \|k(s,\cdot) - k(t,\cdot)\|_{L^2}.$$
Since $k$ is continuous, $\lim_{s\to t} \|k(s,\cdot) - k(t,\cdot)\|_{L^2} = 0$ and $C_k f$ is continuous as well. Moreover,
$$\forall s \in \mathcal{K}, \quad |C_k f(s)| \le \sup_{t\in\mathcal{K}} |k(s,t)|\; \mu(\mathcal{K})^{\frac12}\, \|f\|_{L^2},$$
thus, taking the supremum, $C_k(\mathcal{B}_{L^2(\mu)})$ is bounded. It remains to see that $C_k(\mathcal{B}_{L^2(\mu)})$ is equicontinuous and conclude. Since $\mathcal{K}$ is compact and $k$ continuous, $k$ is uniformly continuous, thus $\forall \epsilon > 0$, $\exists \delta > 0$, $\forall (s_1,s_2,t_1,t_2) \in \mathcal{K}^4$, $d(s_1,s_2) + d(t_1,t_2) \le \delta \Rightarrow |k(s_1,t_1) - k(s_2,t_2)| \le \epsilon$. Let $\epsilon > 0$ and take $d(s,t) \le \delta$; then
$$|C_k f(s) - C_k f(t)| \le \epsilon \int_{\mathcal{K}} |f(v)|\,\mu(dv) \le \epsilon\, \mu(\mathcal{K})^{\frac12}\, \|f\|_{L^2} \le \epsilon\, \mu(\mathcal{K})^{\frac12}.$$
Now, since $C_k(\mathcal{B}_{L^2})$ is bounded and equicontinuous, it is relatively compact by the Arzelà-Ascoli theorem and the proof is complete.

Now, the Karhunen-Loève decomposition uses the spectral theory in Hilbert spaces, as presented in section 1.5.

Theorem 12 (Mercer). Let $(\mathcal{K},d)$ be a compact metric space, $k$ a continuous, symmetric and semi-positive definite kernel, $\mu$ a finite Borel measure on $\mathcal{K}$, and $(\lambda_n,f_n)_{n\in\mathbb{N}}$ the spectral decomposition of $C_k$; then:
- $\lambda_n \ge 0$, $\forall n \in \mathbb{N}$;
- $f_n \in C(\mathcal{K},\mathbb{R})$, $\forall n \in \mathbb{N}$;
- $k$ admits the following expansion:
$$\forall (s,t) \in \mathcal{K}^2, \quad k(s,t) = \sum_{n\ge 0} \lambda_n f_n(s) f_n(t),$$
where the convergence is absolute and uniform on $\mathcal{K}\times\mathcal{K}$.

In other words, the problem of finding series representations of Gaussian random fields with values in $L^2(\mathcal{K},\mu;\mathbb{R})$ is turned into an eigendecomposition problem. The examples below are taken directly from [Wan08] in the case where $\mathcal{K} = [0,1]$.

Example 6 (Wiener process). Let $\mathcal{K} = [0,1]$ and consider the standard Wiener process $W = (W_s)_{s\in[0,1]}$ with covariance kernel $\forall (s,t)$, $k_W(s,t) = \min(s,t)$ and $W_0 = 0$ almost-surely. Since it is a continuous Gaussian random field on a compact metric space, the sample functions are square integrable w.r.t. Lebesgue's measure. Consider the eigendecomposition of the covariance operator:
$$C_W := f \in L^2([0,1]) \mapsto \int_0^1 k_W(\cdot,t) f(t)\,dt \in L^2([0,1]).$$
It is obtained for all $n \in \mathbb{N}$ by solving an ordinary differential equation:
$$\lambda_n = \frac{1}{\left(n+\frac12\right)^2\pi^2}, \qquad \forall s \in [0,1],\ f_n(s) = \sqrt{2}\sin\left(\left(n+\tfrac12\right)\pi s\right).$$

Example 7 (Brownian bridge). Let $\mathcal{K} = [0,1]$ and consider the previous Wiener process $W = (W_s)_{s\in[0,1]}$; the Brownian bridge is defined as the Gaussian process $B = (B_s)_{s\in[0,1]} = (W_s - sW_1)_{s\in[0,1]}$. This process corresponds to the Wiener process conditioned on $W_1 = 0$. The spectral decomposition of the covariance operator gives the following elements:
$$\lambda_n = \frac{1}{(n+1)^2\pi^2}, \qquad \forall s \in [0,1],\ f_n(s) = \sqrt{2}\sin((n+1)\pi s).$$

Example 8 (Ornstein-Uhlenbeck process). Consider now the Ornstein-Uhlenbeck process $Z = (Z_s)_{s\in[0,1]}$ on the time interval $[0,1]$, defined as the stationary solution of the following stochastic differential equation:
$$dZ_t = -\beta Z_t\,dt + \sigma\,dW_t, \qquad \beta > 0,\ \sigma > 0. \tag{1.1}$$
One can show that the solution is a centred, continuous Gaussian random field with covariance kernel
$$k(t,s) = \mathrm{Cov}(Z_t,Z_s) = \frac{\sigma^2}{2\beta}\, e^{-\beta|t-s|}.$$
This kernel is also known as the exponential covariance kernel, or Matérn covariance kernel of order $\nu = \frac12$. The associated eigendecomposition is given, for all $n \in \mathbb{N}$, by
$$\lambda_n = \frac{2\beta}{w_n^2 + \beta^2}, \qquad \forall s \in [0,1],\ f_n(s) = \sqrt{\frac{2w_n^2}{2\beta + w_n^2 + \beta^2}}\cos(w_n s) + \sqrt{\frac{2\beta^2}{2\beta + w_n^2 + \beta^2}}\sin(w_n s),$$
where, for each $n \in \mathbb{N}$, $w_n$ is a solution of the equation
$$(w^2 - \beta^2)\sin(w) = 2\beta w\cos(w).$$

Recall from section 1.5 that this decomposition is optimal in $L^2(\mathcal{K},\mu;\mathbb{R})$. However, it also provides a series representation in $C(\mathcal{K},\mathbb{R})$ when the random field is continuous.

Optimality in $C(\mathcal{K},\mathbb{R})$. The search for optimal series representations of continuous Gaussian random fields over $\mathcal{K} = [0,1]$ is a very active field of research, in particular for the fractional Brownian motion [AT03, DZ04b, DvZ05, Igl05, Nda16]. The results presented here are taken directly from [KL02, LP09] and give practical methods to estimate the $l$-numbers associated to a continuous Gaussian random field, using both the covariance kernel and operator.
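When, unlike examples 6-8, no closed-form spectrum is available, the eigenpairs of $C_k$ are typically approximated by discretizing the integral operator on a quadrature grid (a Nyström-type scheme). A minimal sketch for the Wiener kernel, so that the output can be compared with example 6; the grid size is an arbitrary choice:

```python
import numpy as np

m = 2000
s = (np.arange(m) + 0.5) / m             # midpoint quadrature grid on [0, 1]
K = np.minimum.outer(s, s)               # k_W(s, t) = min(s, t)
evals = np.linalg.eigvalsh(K / m)[::-1]  # discretization of C_W f = int k(., t) f(t) dt
analytic = 1.0 / ((np.arange(5) + 0.5) ** 2 * np.pi ** 2)
# the leading discrete eigenvalues approach 1 / ((n + 1/2)^2 pi^2)
assert np.allclose(evals[:5], analytic, rtol=5e-3)
```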
The first result provides an upper bound on the $l$-numbers of continuous Gaussian random fields on $\mathcal{K} = [0,1]$, using the Faber-Schauder basis.

Proposition 28 (Proposition 7.1 in [KL02]). Let $X = (X_s)_{s\in[0,1]}$ be a Gaussian random field; if there exist $c_1 > 0$, $\gamma > 0$, $\beta \in \mathbb{R}$ such that
$$\mathbb{E}\left[(2X_s - X_{s+t} - X_{s-t})^2\right] \le c_1\, t^{2\gamma} \log^{2\beta}\left(\frac{1}{t}\right)$$
for all admissible $0 \le s - t < s + t \le 1$, then there exists $c > 0$ such that
$$\forall n \in \mathbb{N}^*, \quad l_n(X) \le c\, \frac{(1+\log n)^{\gamma+\beta}}{n^{\gamma}}.$$

Example 9. Let $\mathcal{K} = [0,1]$ and $W = (W_s)_{s\in[0,1]}$ be the standard Wiener process ($W_0 = 0$ a.s.); then
$$\mathbb{E}\left[(W_s - W_t)^2\right] = s + t - 2\min(s,t) = |s-t|.$$
Using proposition 28 with $\gamma = \frac12$, $c_1 = 1$ and $\beta = 0$ gives
$$\forall n \in \mathbb{N}^*, \quad l_n(W) \le c\left(\frac{1+\log(n)}{n}\right)^{\frac12}.$$

In order to get a lower bound, which will be used to obtain asymptotically optimal representations, the space $C(\mathcal{K},\mathbb{R})$ is embedded in $L^2(\mathcal{K},\mu;\mathbb{R})$, using a finite Borel measure on $\mathcal{K}$ with full topological support ($\mathrm{supp}(\mu) = \mathcal{K}$). Indeed, consider the following operator:
$$i_\mu : f \in C(\mathcal{K},\mathbb{R}) \mapsto f \in L^2(\mathcal{K},\mu;\mathbb{R});$$
it is linear, bounded and injective. This implies that if one has an operator $S : L^2(\mathcal{K},\mu;\mathbb{R}) \to C(\mathcal{K},\mathbb{R})$, then $i_\mu \circ S$ is an endomorphism of $L^2(\mathcal{K},\mu;\mathbb{R})$. In particular, the approximation numbers of such operators coincide with their eigenvalues. This gives an interesting lower bound for the $l$-numbers, expressed in terms of the eigenvalues of $i_\mu \circ S$.

Proposition 29 (Proposition 4 in [LP09]). Let $d \in \mathbb{N}^*$, $\mathcal{K} = [0,1]^d$, $X = (X_s)_{s\in\mathcal{K}}$ a continuous Gaussian random field with covariance kernel $k_X$, and note $\mathcal{C}_X$ the associated integral operator:
$$\mathcal{C}_X : L^2(\mathcal{K},ds) \to L^2(\mathcal{K},ds).$$
Suppose that $(f_n)_{n\in\mathbb{N}} \subset C(\mathcal{K},\mathbb{R})$ is admissible for $X$; then if

1. the sequence $(\lambda_n)_{n\in\mathbb{N}}$ of eigenvalues of $\mathcal{C}_X$ satisfies
$$\forall n \in \mathbb{N}, \quad \lambda_n \ge c_1\, \frac{\log(n+2)^{2\gamma}}{(n+1)^{2\nu}}$$
with $\nu > \frac12$, $\gamma \ge 0$ and $c_1 > 0$,

2. $\forall n \in \mathbb{N}$, $\|f_n\|_{C(\mathcal{K},\mathbb{R})} \le c_2\, \frac{\log(n+2)^{\gamma}}{(n+1)^{\nu}}$ with $c_2 > 0$,

3. $(f_n)_{n\in\mathbb{N}}$ is $\beta$-Hölder continuous with $\forall n \in \mathbb{N}$, $\|f_n\|_\beta \le c_3 (n+1)^b$, with $b \in \mathbb{R}$, $\beta \in\, ]0,1]$ and $c_3 > 0$,

then
$$l_n(X) \approx \frac{\log(n)^{\gamma+\frac12}}{n^{\nu-\frac12}}$$
and $(f_n)_{n\in\mathbb{N}}$ is asymptotically optimal for $X$.
Example 10 (Wiener process). Let $W = (W_s)_{s\in[0,1]}$ be the standard Wiener process, that is $W_0 = 0$ almost-surely and $\forall (s,t) \in \mathcal{K}^2$, $k_W(s,t) = \min(s,t)$; then the $l$-numbers are such that
$$l_n(W) \approx \sqrt{\frac{\log(n)}{n}}.$$
Indeed, from example 6, the eigendecomposition of the associated covariance operator (seen as an endomorphism of $L^2([0,1],ds;\mathbb{R})$) is, for all $n \in \mathbb{N}$:
$$\lambda_n = \frac{1}{\left(n+\frac12\right)^2\pi^2}, \qquad f_n(s) = \sqrt{2}\sin\left(\left(n+\tfrac12\right)\pi s\right).$$
It is clear that the sequence of eigenvalues satisfies item 1 in proposition 29 with $c_1 = \frac{1}{\pi^2}$, $\nu = 1$ and $\gamma = 0$. Here, the admissible sequence is $(\sqrt{\lambda_n} f_n)_{n\in\mathbb{N}}$, thus:
$$\forall n \in \mathbb{N}, \quad \left\|\sqrt{\lambda_n} f_n\right\|_{C([0,1],\mathbb{R})} = \frac{\sqrt{2}}{\pi\left(n+\frac12\right)} \le \frac{2\sqrt{2}}{\pi(n+1)},$$
and item 2 in proposition 29 is satisfied with $\gamma = 0$, $c_2 = \frac{2\sqrt{2}}{\pi}$ and $\nu = 1$. Finally, taking $\beta = 1$, the Lipschitz semi-norms of $(\sqrt{\lambda_n} f_n)_{n\in\mathbb{N}}$ are
$$\forall n \in \mathbb{N}, \quad \left\|\sqrt{\lambda_n} f_n\right\|_1 = \sqrt{\lambda_n} \sup_{s\ne t} \frac{|f_n(s) - f_n(t)|}{|s-t|} = \sqrt{\lambda_n}\,\sqrt{2}\left(n+\tfrac12\right)\pi = \sqrt{2},$$
which satisfies item 3 with $c_3 = \sqrt{2}$ and $b = 0$. Other asymptotically optimal bases can be found in [LP09] and in the next chapter.

1.7 Conclusion

In this introductory chapter on Gaussian random elements, the necessary material has been given for the rest of this first part. In particular, the possibility to represent Gaussian random elements as series has been studied, with particular focus on the links with the covariance operator and the Cameron-Martin space. Covariance factorizations are key in the practical construction of admissible sequences; however, except in the Hilbert case, this factorization is non-constructive. It will be the objective of the next chapter, as a first contribution of this thesis, to derive such a construction. A second difficulty concerns continuous Gaussian random fields: the Karhunen-Loève representation rests on an eigendecomposition problem, which can be difficult to solve in practice.
In this case, the decomposition proposed in the next chapter yields a practical optimization problem, easier to solve numerically.

Chapter 2
Karhunen-Loève decomposition in Banach spaces

The previous chapter was dedicated to a short introduction to the theory of Gaussian random elements in general Banach spaces and in $C(\mathcal{K},\mathbb{R})$ with $\mathcal{K}$ a compact metric space. In particular, it has been shown that these elements can be represented as random series, leading to the notion of admissible sequences. Finding such families is usually done by factorization of the covariance operator, using a pre-existing Hilbert basis in the intermediate space. However, this method is not always applicable and it is of interest to find constructive solutions to this problem. This is already done in Hilbert spaces under the name of Karhunen-Loève decomposition, since the eigendecomposition of the covariance operator provides such a factorization. Again, these problems do not have explicit solutions in all cases. In this second chapter, a constructive approach is given to build stochastic bases for any Gaussian random element, extending the Karhunen-Loève decomposition to general Banach spaces. Besides its theoretical interest, it provides a new methodology to decompose Gaussian random fields.

2.1 Extending the Karhunen-Loève decomposition

In chapter 1, it has been shown that if $\mathcal{U}$ is a Banach space, every Gaussian random element $X \in L^0(\mathcal{U})$ can be represented as a random series, using a factorization of its covariance operator. In Hilbert spaces, the spectral representation of the covariance operator provides such a decomposition in a canonical fashion. In particular, the eigenvectors are obtained sequentially, as unit vectors of maximum variance. This principle will be extended to the Banach setup, using adequate notions of Rayleigh quotients in this context. The role of the eigenvector will be played by a pair $(u,l) \in \mathcal{U}\times\mathcal{U}^*$ of respective unit norms, such that $\mathcal{C}_X l = \lambda u$.
In particular, this generalization recovers the usual spectral representation of the covariance operator when $\mathcal{U}$ is a Hilbert space.

Spectral decomposition in Hilbert spaces. Before actually going to the Banach setup, the spectral decomposition of non-negative, self-adjoint and compact operators in Hilbert spaces is revisited, emphasizing the point of view that will be fruitful later on. Indeed, let $\mathcal{U}$ be a separable Hilbert space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$. Since it is a non-negative, self-adjoint and nuclear operator (see theorem 6), one has the following characterization of its highest eigenvalue.

Proposition 30. Let $\mathcal{U}$ be a separable Hilbert space and $A \in \mathcal{L}(\mathcal{U},\mathcal{U})$ a non-negative, self-adjoint and compact operator; then:
$$\|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})} = \sup_{u\in\mathcal{B}_{\mathcal{U}}} \langle Au,u\rangle_{\mathcal{U}} = \langle Au_0,u_0\rangle_{\mathcal{U}} = \lambda_0,$$
where $\lambda_0$ is the highest eigenvalue of $A$ and $u_0$ an associated eigenvector of unit norm: $Au_0 = \lambda_0 u_0$.

Proof. First,
$$\forall (u,v) \in \mathcal{B}_{\mathcal{U}}^2, \quad \langle Au,v\rangle_{\mathcal{U}} \le \|Au\|_{\mathcal{U}}\|v\|_{\mathcal{U}} \le \|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})} \|u\|_{\mathcal{U}}\|v\|_{\mathcal{U}},$$
thus $\sup_{u\in\mathcal{B}_{\mathcal{U}}} \langle Au,u\rangle_{\mathcal{U}} \le \|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})}$. Conversely, the Cauchy-Schwarz inequality gives
$$|\langle Au,v\rangle_{\mathcal{U}}| \le \sqrt{\langle Au,u\rangle_{\mathcal{U}}}\sqrt{\langle Av,v\rangle_{\mathcal{U}}},$$
thus $\|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})} \le \sup_{u\in\mathcal{B}_{\mathcal{U}}} \langle Au,u\rangle_{\mathcal{U}}$. Since $\langle Au_0,u_0\rangle_{\mathcal{U}} = \lambda_0$, $\lambda_0 \le \|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})}$. Finally, if $(\lambda_n,u_n)_{n\in\mathbb{N}}$ is the spectral representation of $A$,
$$\forall u \in \mathcal{U}, \quad \langle Au,u\rangle_{\mathcal{U}} = \sum_{n\ge 0} \lambda_n \langle u,u_n\rangle_{\mathcal{U}}^2 \le \lambda_0 \|u\|_{\mathcal{U}}^2,$$
thus $\|A\|_{\mathcal{L}(\mathcal{U},\mathcal{U})} \le \lambda_0$. The proof is complete.

Thus, the first element $u_0$ in the spectral basis is a unit vector of maximum variance:
$$\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U},\mathcal{U})} = \max_{u\in\mathcal{B}_{\mathcal{U}}} \langle \mathcal{C}_X u,u\rangle_{\mathcal{U}} = \max_{u\in\mathcal{B}_{\mathcal{U}}} \mathbb{V}\left[\langle X,u\rangle_{\mathcal{U}}\right] = \|\mathcal{C}_X u_0\|_X^2 = \lambda_0.$$
Since $u_0 \in \mathcal{U}$ is a vector of unit norm, the Gaussian random element can be split using the associated orthogonal projector of unit rank:
$$X = Y + Z, \qquad Y = \langle X,u_0\rangle_{\mathcal{U}}\, u_0, \qquad Z = X - Y.$$
Then $Y, Z \in L^0(\mathcal{U})$ are independent Gaussian random elements.
The covariance operator is decomposed as well:
$$\mathcal{C}_X = \mathcal{C}_Y + \mathcal{C}_Z, \qquad \mathcal{C}_Y = \lambda_0\, \langle\cdot,u_0\rangle_{\mathcal{U}}\, u_0, \qquad \mathcal{C}_Z = \mathcal{C}_X - \mathcal{C}_Y.$$
Here, $Z$ is identically distributed with the conditional Gaussian random element $X \mid \langle X,u_0\rangle_{\mathcal{U}} = 0$. The second component is chosen similarly, under the additional constraint of being independent of the first one (orthogonality in the Cameron-Martin space). This is exactly the same as looking for the unit vector of maximum variance w.r.t. $Z$, and it appears that:
$$\|\mathcal{C}_Z\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} = \max_{u\in\mathcal{B}_{\mathcal{U}}} \langle \mathcal{C}_Z u,u\rangle_{\mathcal{U}} = \max_{u\in\mathcal{B}_{\mathcal{U}},\ \langle u,u_0\rangle_{\mathcal{U}}=0} \langle \mathcal{C}_X u,u\rangle_{\mathcal{U}} = \|\mathcal{C}_Z u_1\|_X^2 = \lambda_1.$$
This scheme can be repeated to obtain every eigenvector, and this is what will be done in the Banach setup. However, the optimality (see chapter 1) of the spectral basis rests on the following observation (Courant-Fischer min-max theorem):
$$\forall n \in \mathbb{N}, \quad \lambda_n = \min_{\mathcal{V},\,\dim(\mathcal{V})=n}\ \max_{u\in\mathcal{V}^\perp,\,\|u\|_{\mathcal{U}}=1} \langle \mathcal{C}_X u,u\rangle_{\mathcal{U}} = \max_{\mathcal{V},\,\dim(\mathcal{V})=n+1}\ \min_{u\in\mathcal{V},\,\|u\|_{\mathcal{U}}=1} \langle \mathcal{C}_X u,u\rangle_{\mathcal{U}}.$$
It implies that sequentially taking the maximum of variance in the orthogonal complement of the previous vectors leads to the best solution. However, in the Banach setting, this principle need not hold.

Splitting the space. Now, let $\mathcal{U}$ be a general Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element. The first step is to split the space in such a way that the underlying Gaussian random element is decomposed as the sum of two independent components (one being of unit rank). Indeed, one could always consider a non-trivial linear functional and the associated hyperplane, which is of codimension 1:
$$\forall l \in \mathcal{U}^*\setminus\{0\}, \quad \mathcal{U} = \ker l + \mathrm{span}\{u\},$$
where $u \in \mathcal{U}\setminus\ker l$. However, nothing imposes that the resulting decomposition of the Gaussian random element:
$$X = Y + Z, \qquad Y = \langle X,l\rangle\, u, \qquad Z = X - \langle X,l\rangle\, u,$$
provides two independent elements.
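In finite dimension, the Hilbert-space scheme above (unit vector of maximum variance, then the independent residual) is simply eigendecomposition by deflation, which can be made concrete on a small matrix covariance; a sketch with an arbitrary matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
C = A @ A.T                         # a covariance operator on R^6
lam, U = np.linalg.eigh(C)
lam, U = lam[::-1], U[:, ::-1]      # decreasing eigenvalues
u0 = U[:, 0]                        # unit vector of maximum variance

# rank-one split C = C_Y + C_Z along u0
C_Y = lam[0] * np.outer(u0, u0)
C_Z = C - C_Y

# the residual operator has norm lambda_1, attained orthogonally to u0
assert np.isclose(np.linalg.eigvalsh(C_Z).max(), lam[1])
# Cov(<X,u0> u0, X - <X,u0> u0) vanishes because C u0 = lam0 u0: the parts are independent
assert np.allclose(C @ u0 - lam[0] * u0, 0)
```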
Nevertheless, the next lemma shows that it is possible to impose this independence by choosing an adequate vector $u \in \mathcal{U}$, which results in an orthogonal direct sum of the two resulting Cameron-Martin spaces. Moreover, both spaces inherit the inner product from $\mathcal{U}_X$.

Lemma 5. Let $\mathcal{U}$ be a Banach space, $X \in L^0(\mathcal{U})$ a Gaussian random element with non-trivial covariance operator $\mathcal{C}_X$, $l \in \mathcal{U}^*\setminus\ker(\mathcal{C}_X)$, $h = \|\mathcal{C}_X l\|_X^{-1}\,\mathcal{C}_X l$ and $h^* = \|\mathcal{C}_X l\|_X^{-1}\, l$; then
$$X = Y + Z,$$
where
$$Y = \langle X,h^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h$$
and $Z = X - Y$ are two independent Gaussian random elements with respective covariance operators
$$\mathcal{C}_Y := g \in \mathcal{U}^* \mapsto \langle h,g\rangle_{\mathcal{U},\mathcal{U}^*}\, h \in \mathcal{U}$$
and $\mathcal{C}_Z = \mathcal{C}_X - \mathcal{C}_Y$. Concerning the respective Cameron-Martin spaces, it comes that $\mathcal{U}_Y$ is isometric to $\mathrm{span}\{h\} \subset \mathcal{U}_X$ and $\mathcal{U}_Z$ to $\mathrm{span}\{h\}^\perp \subset \mathcal{U}_X$, leading to
$$\mathcal{U}_X = \mathcal{U}_Y \oplus \mathcal{U}_Z.$$
Moreover, $\mathbb{V}[\langle X,h^*\rangle] = \|h\|_X^2 = 1$.

Proof. Since $Y$ and $Z$ are both images of $X$ under bounded operators, they are Gaussian random elements themselves (proposition 4). Concerning $Y$, the covariance operator may be deduced directly from its Fourier transform:
$$\forall g \in \mathcal{U}^*, \quad \hat{Y}(g) = \mathbb{E}\left[\exp\left(i\,\langle X,h^*\rangle_{\mathcal{U},\mathcal{U}^*}\langle h,g\rangle_{\mathcal{U},\mathcal{U}^*}\right)\right] = \exp\left(-\frac12 \langle h,g\rangle_{\mathcal{U},\mathcal{U}^*}^2\right),$$
from which $\mathcal{C}_Y$ is easily identified. For the independence property, let $(l_0,l_1) \in \mathcal{U}^*\times\mathcal{U}^*$; then
$$\mathbb{E}[\langle Y,l_0\rangle\langle Z,l_1\rangle] = \mathbb{E}[\langle Y,l_0\rangle\langle X-Y,l_1\rangle] = \mathbb{E}[\langle Y,l_0\rangle\langle X,l_1\rangle] - \mathbb{E}[\langle Y,l_0\rangle\langle Y,l_1\rangle] = \langle h,l_0\rangle\langle \mathcal{C}_X h^*,l_1\rangle - \langle h,l_0\rangle\langle h,l_1\rangle\langle \mathcal{C}_X h^*,h^*\rangle = 0,$$
since $\mathcal{C}_X h^* = h$ and $\langle h,h^*\rangle = 1$, which is sufficient. Now, because $X = Y + Z$ and $Y, Z$ are independent,
$$\forall l_0,l_1 \in \mathcal{U}^*, \quad \langle \mathcal{C}_X l_0,l_1\rangle = \mathbb{E}[\langle X,l_0\rangle\langle X,l_1\rangle] = \mathbb{E}[\langle Y+Z,l_0\rangle\langle Y+Z,l_1\rangle] = \mathbb{E}[\langle Y,l_0\rangle\langle Y,l_1\rangle] + \mathbb{E}[\langle Z,l_0\rangle\langle Z,l_1\rangle] = \langle \mathcal{C}_Y l_0,l_1\rangle + \langle \mathcal{C}_Z l_0,l_1\rangle,$$
and $\mathcal{C}_X = \mathcal{C}_Y + \mathcal{C}_Z$.
Let $P$ be the orthogonal projector from $\mathcal{U}_X$ onto $\mathrm{span}\{h\}$; as $\|h\|_X = 1$, one has, $\forall g \in \mathcal{U}^*$,
$$\mathcal{C}_Y g = \langle h,g\rangle_{\mathcal{U},\mathcal{U}^*}\, h = \langle \mathcal{C}_X g,h\rangle_X\, h,$$
so $\mathcal{C}_Y g = P(\mathcal{C}_X g)$ and similarly $\mathcal{C}_Z g = (I-P)(\mathcal{C}_X g)$, thus
$$\forall g \in \mathcal{U}^*, \quad \langle \mathcal{C}_Z g, \mathcal{C}_Y g\rangle_X = 0.$$
It remains to see that on $R(\mathcal{C}_Y)$ and $R(\mathcal{C}_Z)$, both respective Cameron-Martin norms are inherited from $\mathcal{U}_X$. Let $g \in \mathcal{U}^*$; then for $i \in \{Y,Z\}$ one has:
$$\|\mathcal{C}_i g\|_{X_i}^2 = \langle \mathcal{C}_i g, \mathcal{C}_i g\rangle_{X_i} = \langle \mathcal{C}_i g, g\rangle_{\mathcal{U},\mathcal{U}^*} = \langle \mathcal{C}_i g, \mathcal{C}_X g\rangle_X = \langle \mathcal{C}_i g, \mathcal{C}_i g\rangle_X = \|\mathcal{C}_i g\|_X^2.$$
Now, clearly $\mathcal{U}_Y = R(\mathcal{C}_Y) = \mathrm{span}\{h\}$ and $\mathcal{U}_Z = \overline{R(\mathcal{C}_Z)}^{\|\cdot\|_X} = \mathrm{span}\{h\}^\perp$. Finally,
$$\mathbb{V}[\langle X,h^*\rangle] = \langle \mathcal{C}_X h^*,h^*\rangle = \frac{\langle \mathcal{C}_X l,l\rangle_{\mathcal{U},\mathcal{U}^*}}{\|\mathcal{C}_X l\|_X^2} = 1.$$

One very important interpretation of the previous lemma is that $Z$ has the same distribution as $X \mid \langle X,l\rangle = 0$; it will be called the residual Gaussian random element. If $Z$ has a non-trivial covariance operator, the lemma can be iterated, yielding the following result.

Proposition 31. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$ and Cameron-Martin space $\mathcal{U}_X$. If $\mathcal{U}_X$ is infinite dimensional, there exist $(h_n)_{n\in\mathbb{N}} \subset \mathcal{U}_X$ orthonormal and $(h_n^*)_{n\in\mathbb{N}} \subset \mathcal{U}^*$ such that
$$X = \sum_{n\ge 0} Y_n + X_\infty,$$
with $Y_n = \langle X,h_n^*\rangle\, h_n$, $\forall n \in \mathbb{N}$, and $X_\infty$ mutually independent Gaussian random elements in $L^0(\mathcal{U})$. For all $n \in \mathbb{N}$, $\mathbb{V}[\langle X,h_n^*\rangle] = \|h_n\|_X = 1$ and $\mathcal{U}_{Y_n} = \mathrm{span}\{h_n\}$ (with $\langle\cdot,\cdot\rangle_{Y_n} = \langle\cdot,\cdot\rangle_X$). Furthermore,
$$\mathcal{U}_X = \overline{\mathrm{span}}\{h_n,\ n\in\mathbb{N}\} \oplus \mathcal{U}_{X_\infty},$$
where $\mathcal{U}_{X_\infty} = \mathrm{span}\{h_n,\ n\in\mathbb{N}\}^\perp$ and $\langle\cdot,\cdot\rangle_{X_\infty} = \langle\cdot,\cdot\rangle_X$.

Proof. The proof is done by induction; the initialization ($n = 0$) is the result of lemma 5 applied to $X$, with notations $l = l_0$, $h_0 = \|\mathcal{C}_X l_0\|_X^{-1}\,\mathcal{C}_X l_0$, $h_0^* = \|\mathcal{C}_X l_0\|_X^{-1}\, l_0$ and:
$$X = Y_0 + X_1,$$
where $Y_0 = \langle X,h_0^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h_0$ and $X_1 = X - Y_0$ are independent Gaussian random elements in $L^0(\mathcal{U})$ with orthogonal Cameron-Martin spaces $\mathcal{U}_{Y_0} = \mathrm{span}\{h_0\}$ and $\mathcal{U}_{X_1} = \mathcal{U}_{Y_0}^\perp$.
The associated covariance operators are:
$$\mathcal{C}_{Y_0} = \langle h_0,\cdot\rangle_{\mathcal{U},\mathcal{U}^*}\, h_0, \qquad \mathcal{C}_{X_1} = \mathcal{C}_X - \mathcal{C}_{Y_0}.$$
For the induction step, let $n \in \mathbb{N}$ and suppose the proposition true at this stage, meaning that
$$X = \sum_{k=0}^n Y_k + X_{n+1},$$
with $Y_k = \langle X,h_k^*\rangle\, h_k$, $\forall k \in [0,n]$, and $X_{n+1}$ forming $n+2$ mutually independent Gaussian random elements in $L^0(\mathcal{U})$, with respective Cameron-Martin spaces $\mathcal{U}_{Y_k} = \mathrm{span}\{h_k\}$, $\forall k \in [0,n]$, and $\mathcal{U}_{X_{n+1}} = \mathrm{span}\{h_k,\ k\in[0,n]\}^\perp$. Since $\mathcal{U}_X$ is infinite dimensional, there exists $l_{n+1} \in \mathcal{U}^*\setminus\ker(\mathcal{C}_{X_{n+1}})$, and
$$h_{n+1} = \|\mathcal{C}_{X_{n+1}} l_{n+1}\|_X^{-1}\, \mathcal{C}_{X_{n+1}} l_{n+1}$$
is well-defined. Now it comes:
$$X_{n+1} = \left\langle X_{n+1}, \frac{l_{n+1}}{\|\mathcal{C}_{X_{n+1}} l_{n+1}\|_X}\right\rangle h_{n+1} + X_{n+2} = \left\langle X - \sum_{k=0}^n \langle X,h_k^*\rangle h_k, \frac{l_{n+1}}{\|\mathcal{C}_{X_{n+1}} l_{n+1}\|_X}\right\rangle h_{n+1} + X_{n+2} = \left\langle X, \frac{l_{n+1} - \sum_{k=0}^n \langle h_k,l_{n+1}\rangle h_k^*}{\|\mathcal{C}_{X_{n+1}} l_{n+1}\|_X}\right\rangle h_{n+1} + X_{n+2} = \langle X,h_{n+1}^*\rangle\, h_{n+1} + X_{n+2},$$
with obvious notations. Since $h_{n+1} \in \mathcal{U}_{X_{n+1}} = \mathrm{span}\{h_0,\ldots,h_n\}^\perp$ and has unit norm, the family $(h_0,\ldots,h_{n+1}) \subset \mathcal{U}_X$ is orthonormal, and one has $\mathcal{U}_{X_{n+2}} = \mathrm{span}\{h_0,\ldots,h_{n+1}\}^\perp$. The independence property is obtained directly and the proposition is true at step $n+1$. By the induction principle, the proof is complete.

In the degenerate case where the covariance operator has finite rank, which is equivalent to $\mathcal{U}_X$ being finite dimensional, the previous proposition is clearly enough to provide an orthonormal basis of the Cameron-Martin space. However, the case where $\mathcal{U}_X$ is infinite dimensional requires choosing the linear functionals more specifically at each step.

Maximum variance functionals. Here, it is shown that one can choose $l \in \mathcal{B}_{\mathcal{U}^*}$ maximizing the one-dimensional projected variance. This principle is directly motivated by analogy with the Hilbert case, where eigenvectors (seen as linear functionals) maximize the Rayleigh quotient.

Lemma 6. Let $\mathcal{U}$ be a Banach space and $X \in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; then
$$\exists l \in \mathcal{B}_{\mathcal{U}^*}, \quad \langle \mathcal{C}_X l,l\rangle_{\mathcal{U},\mathcal{U}^*} = \max_{g\in\mathcal{B}_{\mathcal{U}^*}} \langle \mathcal{C}_X g,g\rangle_{\mathcal{U},\mathcal{U}^*}.$$
Moreover, it can be chosen with unit norm, and it satisfies the relation
$$\|\mathcal{C}_X l\|_{\mathcal{H}_X} = \sup_{g\in\mathcal{B}_{\mathcal{U}^*}}\ \sup_{h\in\mathcal{B}_{\mathcal{H}_X}} \langle h, g\rangle_{\mathcal{U},\mathcal{U}^*} = \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})}^{1/2} = \|\mathcal{C}_X l\|_{\mathcal{U}}^{1/2}.$$
Note $\lambda = \langle \mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*} = V[\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}]$, $h = \|\mathcal{C}_X l\|_{\mathcal{H}_X}^{-1}\,\mathcal{C}_X l$ and $u = \|\mathcal{C}_X l\|_{\mathcal{U}}^{-1}\,\mathcal{C}_X l$; then one has the following properties:
1. $\mathcal{C}_X l = \lambda u = \sqrt{\lambda}\, h$,
2. $\|h\|_{\mathcal{H}_X} = 1$, $\|h\|_{\mathcal{U}} = \sqrt{\lambda}$,
3. $\|u\|_{\mathcal{H}_X} = 1/\sqrt{\lambda}$, $\|u\|_{\mathcal{U}} = 1$,
4. $\langle u, l\rangle_{\mathcal{U},\mathcal{U}^*} = 1$.

Proof. Let $(l_n)_{n\in\mathbb{N}}\subset\mathcal{B}_{\mathcal{U}^*}$ be a maximizing sequence, i.e. such that $\langle\mathcal{C}_X l_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} \to \sup_{l\in\mathcal{B}_{\mathcal{U}^*}}\langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}$. Because $\mathcal{B}_{\mathcal{U}^*}$ is compact in the $\sigma(\mathcal{U}^*,\mathcal{U})$-topology (Banach-Alaoglu theorem), there exists $l\in\mathcal{B}_{\mathcal{U}^*}$ such that $l_n \rightharpoonup l$ in the $\sigma(\mathcal{U}^*,\mathcal{U})$ topology (up to extraction). Now,
$$\hat{X}(l_n) = E\left[\exp\left(i\langle X, l_n\rangle_{\mathcal{U},\mathcal{U}^*}\right)\right] \to E\left[\exp\left(i\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}\right)\right] = \hat{X}(l),$$
by Lebesgue's dominated convergence theorem. Using proposition 3, it implies that $\langle \mathcal{C}_X l_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} \to \langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}\in\mathbb{R}$. If $\mathcal{C}_X$ is trivial, then one can take any element of $\mathcal{B}_{\mathcal{U}^*}$. If $\mathcal{C}_X\ne 0$, then $l$ must be of unit norm: indeed, if $\|l\|_{\mathcal{U}^*} < 1$, then $g = \|l\|_{\mathcal{U}^*}^{-1} l$ would contradict the maximality. Now, let $h\in\mathcal{B}_{\mathcal{H}_X}$; the reproducing property gives $\langle h, g\rangle_{\mathcal{U},\mathcal{U}^*} = \langle h, \mathcal{C}_X g\rangle_{\mathcal{H}_X}$ for all $g\in\mathcal{U}^*$, and taking the supremum in $h$ over the unit ball $\mathcal{B}_{\mathcal{H}_X}$ leads to
$$\sup_{h\in\mathcal{B}_{\mathcal{H}_X}} \langle h, g\rangle_{\mathcal{U},\mathcal{U}^*} = \|\mathcal{C}_X g\|_{\mathcal{H}_X} = \sqrt{\langle\mathcal{C}_X g, g\rangle_{\mathcal{U},\mathcal{U}^*}},$$
which leads to the announced equality. Now, from the definitions it is clear that $\|h\|_{\mathcal{H}_X} = \|u\|_{\mathcal{U}} = 1$ and
$$\langle u, l\rangle_{\mathcal{U},\mathcal{U}^*} = \langle u, \mathcal{C}_X l\rangle_{\mathcal{H}_X} = \left\langle \frac{h}{\sqrt{\lambda}},\ \sqrt{\lambda}\,h\right\rangle_{\mathcal{H}_X} = \langle h, h\rangle_{\mathcal{H}_X} = 1.$$
Finally,
$$\|\mathcal{C}_X l\|_{\mathcal{H}_X}^2 = \langle\mathcal{C}_X l, \mathcal{C}_X l\rangle_{\mathcal{H}_X} = \langle \mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*} \le \|\mathcal{C}_X l\|_{\mathcal{U}}\, \|l\|_{\mathcal{U}^*} \le \|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})},$$
and conversely
$$\langle \mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = \langle\mathcal{C}_X l, \mathcal{C}_X g\rangle_{\mathcal{H}_X} \le \|\mathcal{C}_X l\|_{\mathcal{H}_X}\,\|\mathcal{C}_X g\|_{\mathcal{H}_X},$$
thus $\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} \le \|\mathcal{C}_X l\|_{\mathcal{H}_X}^2$ and both are equal. ∎

The similarity with the Hilbert case is emphasized here, where $l\in\mathcal{U}^*$ and $u\in\mathcal{U}$ play the role of eigenvectors associated to $\lambda$, since $\mathcal{C}_X l = \lambda u$.
Remark that uniqueness of the maximizing linear functional in the previous lemma need not hold in general; the set of solutions may even be infinite. However, one can always choose this element among the extremal points of the unit ball of $\mathcal{U}^*$, which will be particularly important in practice.

Proposition 32. Let $\mathcal{U}$ be a Banach space and $X\in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; then there exists an extremal point $\delta^*$ of $\mathcal{B}_{\mathcal{U}^*}$ such that $\langle\mathcal{C}_X\delta^*, \delta^*\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{l\in\mathcal{B}_{\mathcal{U}^*}} \langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}$.

Proof. Since the functional $l\in\mathcal{U}^* \to \langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}$ is quadratic, this is a general result from convex optimization theory. ∎

Splitting the space using maximum variance functionals. It has been proved previously that whenever the Cameron-Martin space is finite dimensional, one can build a basis using only linear functionals of strictly positive residual variance at each stage (proposition 31). Now, if one chooses these functionals such that they maximize the residual variance at each step, the family $(h_n)_{n\in\mathbb{N}}$ will be a Hilbert basis in $\mathcal{H}_X$. The first step is to give an analogue of proposition 31, taking into account that the linear functionals maximize the variance.

Lemma 7. Let $\mathcal{U}$ be a Banach space and $X\in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$, and suppose $\mathcal{H}_X$ infinite dimensional. Note $X_0 := X$ and define by induction on $n\in\mathbb{N}$:
$$l_n = \arg\max_{l\in\mathcal{B}_{\mathcal{U}^*}} \langle\mathcal{C}_{X_n} l, l\rangle_{\mathcal{U},\mathcal{U}^*},\qquad
\lambda_n = \langle\mathcal{C}_{X_n} l_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*},\qquad
h_n = \lambda_n^{-1/2}\,\mathcal{C}_{X_n} l_n,\qquad
u_n = \lambda_n^{-1}\,\mathcal{C}_{X_n} l_n,\qquad
X_{n+1} = X_n - \langle X_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*}\, u_n;$$
then one has the following properties:
- $\forall n\in\mathbb{N},\ \mathcal{C}_{X_n} l_n = \sqrt{\lambda_n}\, h_n = \lambda_n u_n$,
- $\forall n\in\mathbb{N},\ \|h_n\|_{\mathcal{H}_X} = \|u_n\|_{\mathcal{U}} = \langle u_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} = 1$,
- $\forall (n,m)\in\mathbb{N}^2,\ \langle h_n, h_m\rangle_{\mathcal{H}_X} = \delta_{nm}$.

Furthermore, $(\lambda_n)_{n\in\mathbb{N}}\subset\mathbb{R}^+$ is non-increasing and $\lim_{n\to\infty}\lambda_n = 0$.

Proof. Since $\mathcal{H}_X$ is infinite dimensional, the construction from proposition 31 is licit.
From lemma 6, linear functionals of maximum residual variance (a fortiori strictly positive) exist and the sequences $(u_n)_{n\in\mathbb{N}}$, $(l_n)_{n\in\mathbb{N}}$, $(\lambda_n)_{n\in\mathbb{N}}$ are well-defined. Now,
$$\forall n\in\mathbb{N},\quad X_n = \langle X_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*}\, u_n + X_{n+1},$$
where $\langle X_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} u_n$ and $X_{n+1}$ are independent Gaussian random elements; it follows that
$$\forall n\in\mathbb{N},\ \forall l\in\mathcal{U}^*,\quad \langle\mathcal{C}_{X_n} l, l\rangle_{\mathcal{U},\mathcal{U}^*} \ge \langle\mathcal{C}_{X_{n+1}} l, l\rangle_{\mathcal{U},\mathcal{U}^*},$$
and taking the supremum over the unit ball of $\mathcal{U}^*$ shows that $(\lambda_n)_{n\in\mathbb{N}}$ is non-increasing. Moreover, $(h_n)_{n\in\mathbb{N}}$ is an orthonormal system in $\mathcal{H}_X$, hence
$$\forall l\in\mathcal{U}^*,\quad \langle h_n, l\rangle_{\mathcal{U},\mathcal{U}^*} = \langle h_n, \mathcal{C}_X l\rangle_{\mathcal{H}_X} \to 0,$$
as a consequence of Bessel's inequality. In other words, $h_n \rightharpoonup 0$ in the weak topology $\sigma(\mathcal{U},\mathcal{U}^*)$. Now, since the unit ball of $\mathcal{H}_X$ is precompact in $\mathcal{U}$, there exists a convergent subsequence $(h_{n_k})_{k\in\mathbb{N}}$ such that $h_{n_k}\to h_\infty$ in the strong topology of $\mathcal{U}$. Using the uniqueness of limits in $\mathcal{U}$ equipped with the weak topology $\sigma(\mathcal{U},\mathcal{U}^*)$, it comes that $h_\infty = 0$ and therefore $\|h_{n_k}\|_{\mathcal{U}} = \sqrt{\lambda_{n_k}} \to 0$, which implies that $(\lambda_n)_{n\in\mathbb{N}}\to 0$. The remaining relations are obtained directly using the reproducing property at each stage, as in lemma 6. ∎

Now, it remains to prove that the orthonormal sequence $(h_n)_{n\in\mathbb{N}}\subset\mathcal{H}_X$ from lemma 7 is actually a Hilbert basis.

Theorem 13. Let $\mathcal{U}$ be a Banach space and $X\in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$. Consider the orthonormal family $(h_n)_{n\in\mathbb{N}}\subset\mathcal{H}_X$ built in lemma 7; then it is a Hilbert basis of $\mathcal{H}_X$.

Proof. The family $(h_n)_{n\in\mathbb{N}}$ is orthonormal by lemma 7; it remains to see that $\operatorname{span}\{h_n,\ n\in\mathbb{N}\}$ is dense in $\mathcal{H}_X$ to conclude. Let $h\in\mathcal{H}_X$ be such that $\langle h, h_n\rangle_{\mathcal{H}_X} = 0$ for all $n\in\mathbb{N}$; then clearly $h\in\mathcal{H}_{X_{n+1}}$ for all $n\in\mathbb{N}$. Using the reproducing property in $\mathcal{H}_{X_{n+1}}$, it comes, for all $n\in\mathbb{N}$ and $l\in\mathcal{B}_{\mathcal{U}^*}$:
$$\langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} = \langle h, \mathcal{C}_{X_{n+1}} l\rangle_{\mathcal{H}_{X_{n+1}}} \le \|h\|_{\mathcal{H}_{X_{n+1}}} \sqrt{\langle\mathcal{C}_{X_{n+1}} l, l\rangle_{\mathcal{U},\mathcal{U}^*}} \le \|h\|_{\mathcal{H}_{X_{n+1}}} \sqrt{\lambda_{n+1}}.$$
This implies that $\langle h, l\rangle_{\mathcal{U},\mathcal{U}^*} = 0$ for all $l\in\mathcal{B}_{\mathcal{U}^*}$, therefore $h = 0$ and $\operatorname{span}\{h_n,\ n\in\mathbb{N}\}$ is dense in $\mathcal{H}_X$. ∎
Consequences. The previous construction leads, constructively, to a Hilbert basis of $\mathcal{H}_X$ contained in $R(\mathcal{C}_X)$. In fact, the system $(u_n, h_n^*)_{n\in\mathbb{N}}\subset\mathcal{U}\times\mathcal{U}^*$ is a stochastic basis.

Corollary 9. Let $\mathcal{U}$ be a Banach space and $X\in L^0(\mathcal{U})$ a Gaussian random element with covariance operator $\mathcal{C}_X$; with the notations of lemma 7:
- $X = \sum_{n\ge 0} \langle X, h_n^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h_n$ a.s.,
- $\forall l\in\mathcal{U}^*,\ \mathcal{C}_X l = \sum_{n\ge 0} \langle h_n, l\rangle_{\mathcal{U},\mathcal{U}^*}\, h_n$;

moreover one has $\|\mathcal{C}_X\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} = \lambda_0$ and
$$\forall n\in\mathbb{N},\quad \left\|\mathcal{C}_X - \sum_{k=0}^n \langle h_k, \cdot\rangle_{\mathcal{U},\mathcal{U}^*}\, h_k\right\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} = \lambda_{n+1}.$$

Proof. The sequence $(h_n)_{n\in\mathbb{N}}$ is a Hilbert basis in $\mathcal{H}_X$, thus admissible for $X$, and the two representations are consequences of lemma 4 and corollary 7. Now, the truncation error norm is
$$\left\|\mathcal{C}_X - \sum_{k=0}^n \langle h_k, \cdot\rangle_{\mathcal{U},\mathcal{U}^*}\, h_k\right\|_{\mathcal{L}(\mathcal{U}^*,\mathcal{U})} = \langle\mathcal{C}_{X_{n+1}} l_{n+1}, l_{n+1}\rangle_{\mathcal{U},\mathcal{U}^*} = \lambda_{n+1}. \qquad\blacksquare$$

However, since the maximizing linear functionals need not be unique at each step, the error $\lambda_{n+1}$ may depend on these choices. It is therefore not necessarily minimal, so the obtained basis may not be optimal in the sense $a_n(X) = \sqrt{\lambda_{n+1}}$ for all $n\in\mathbb{N}$. For instance, one could start with linear functionals having strictly positive residual variance for a finite number of steps, and only then apply the proposed decomposition. Still, the decomposition is of important practical interest for the following reason.

Corollary 10. Let $\mathcal{U}$ be a Banach space and $X\in L^0(\mathcal{U})$ a Gaussian random element; with the notations of lemma 7,
$$\sup_{l\in\mathcal{B}_{\mathcal{U}^*}} E\left[\left\langle X - \sum_{k=0}^n \langle X, h_k^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h_k,\ l\right\rangle_{\mathcal{U},\mathcal{U}^*}^2\right]^{1/2} = \sqrt{\lambda_{n+1}}.$$

Proof. By construction, for all $n\in\mathbb{N}$ and $l\in\mathcal{U}^*$,
$$E\left[\left\langle X - \sum_{k=0}^n \langle X, h_k^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h_k,\ l\right\rangle_{\mathcal{U},\mathcal{U}^*}^2\right] = \langle\mathcal{C}_{X_{n+1}} l, l\rangle_{\mathcal{U},\mathcal{U}^*},$$
thus the result is immediate. ∎

In particular, this gives an estimate of the approximation numbers of the Gaussian random element $X$: $a_n(X) \le \sqrt{\lambda_{n+1}}$ for all $n\in\mathbb{N}$.
The question whether one has
$$a_n(X) \approx \sqrt{\lambda_n}$$
in general is still open, but it will be answered positively in the particular study of the Wiener process (see section 2.4 below).

Non-Gaussian case. It will now be shown that the Gaussian hypothesis in the previous construction is not necessary. Indeed, all the above theory can be applied to a wider class, namely the random elements $X\in L^2(\mathcal{U})$ of strong order 2. First, the notions of covariance operator and Cameron-Martin space are well-defined and share similar properties.

Proposition 33. Let $\mathcal{U}$ be a Banach space and $X\in L^2(\mathcal{U})$ a random element of strong order 2; then there exists a unique covariance operator $\mathcal{C}_X : \mathcal{U}^*\to\mathcal{U}$ such that
$$\forall l,g\in\mathcal{U}^*,\quad \langle\mathcal{C}_X l, g\rangle_{\mathcal{U},\mathcal{U}^*} = E\left[\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}\langle X, g\rangle_{\mathcal{U},\mathcal{U}^*}\right].$$
This operator is linear, non-negative, symmetric and nuclear.

Proof. Since $X\in L^2(\mathcal{U})$, the covariance operator is directly defined using Bochner's integral:
$$\mathcal{C}_X := l\in\mathcal{U}^* \to E\left[\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}\, X\right]\in\mathcal{U}.$$
The above relation is a property of this integral, thus symmetry, non-negativity and linearity are immediate. The nuclearity is proved in [VTC87] (chapter 3, theorem 2.3). ∎

In particular, the factorization lemma and the construction of the Cameron-Martin space are still valid.

Proposition 34. Let $\mathcal{U}$ be a Banach space and $X\in L^2(\mathcal{U})$ a random element of strong order 2 with covariance operator $\mathcal{C}_X$; then the associated Cameron-Martin space $\mathcal{H}_X$ is well-defined. It is separable and injects compactly into $\mathcal{U}$.

Proof. The proof is similar to the Gaussian case, the separability being given in [VTC87] (chapter 3, corollary 1). The compactness comes from the nuclearity of $\mathcal{C}_X$. ∎

Now that the essential ingredients are similar, it remains to see that the maximizing linear functionals are still well-defined, which is the object of the next proposition.

Proposition 35.
Let $\mathcal{U}$ be a Banach space and $X\in L^2(\mathcal{U})$ a random element of strong order 2 with covariance operator $\mathcal{C}_X$; then
$$l_n \rightharpoonup l \implies \langle\mathcal{C}_X l_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} \to \langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}.$$
As a consequence, there exists $l\in\mathcal{B}_{\mathcal{U}^*}$ such that $\langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{g\in\mathcal{B}_{\mathcal{U}^*}}\langle\mathcal{C}_X g, g\rangle_{\mathcal{U},\mathcal{U}^*}$.

Proof. Let $(l_n)_{n\in\mathbb{N}}\subset\mathcal{U}^*$ and $l\in\mathcal{U}^*$ be such that $l_n\rightharpoonup l$ in $\sigma(\mathcal{U}^*,\mathcal{U})$. In particular, this implies $\langle X, l_n\rangle_{\mathcal{U},\mathcal{U}^*}^2 \to \langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}^2$ pointwise. Moreover, one has
$$\langle X, l_n\rangle_{\mathcal{U},\mathcal{U}^*}^2 \le \|X\|_{\mathcal{U}}^2\, \|l_n\|_{\mathcal{U}^*}^2,$$
and since $(\|l_n\|_{\mathcal{U}^*})$ is bounded, Lebesgue's dominated convergence theorem gives:
$$\langle\mathcal{C}_X l_n, l_n\rangle_{\mathcal{U},\mathcal{U}^*} = E\left[\langle X, l_n\rangle_{\mathcal{U},\mathcal{U}^*}^2\right] \to E\left[\langle X, l\rangle_{\mathcal{U},\mathcal{U}^*}^2\right] = \langle\mathcal{C}_X l, l\rangle_{\mathcal{U},\mathcal{U}^*}.$$
The rest of the proof is similar to the Gaussian case. ∎

With these three propositions at hand, the previous decomposition can be derived in the exact same manner. It leads to the following result:
$$X = \sum_{n\ge 0} \langle X, h_n^*\rangle_{\mathcal{U},\mathcal{U}^*}\, h_n,$$
with $(\langle X, h_n^*\rangle_{\mathcal{U},\mathcal{U}^*})_{n\in\mathbb{N}}$ a sequence of square-integrable random variables with unit variance and $(h_n)_{n\in\mathbb{N}}$ a basis of the Cameron-Martin space. The main differences with the Gaussian case are:
1. the variables $\langle X, h_n^*\rangle_{\mathcal{U},\mathcal{U}^*}$ need not be Gaussian,
2. they are mutually uncorrelated but possibly dependent.

2.2 The special case of C(K, R)

In this section, the previous theory is applied to the Banach space $\mathcal{U} = C(\mathcal{K},\mathbb{R})$, where $(\mathcal{K},d)$ is a compact metric space. First, the decomposition is directly given by application of the previous general theorems. Then, a direct proof is given, using standard analysis in Reproducing Kernel Hilbert spaces.

Application of the previous theory. As stated in chapter 1, any continuous Gaussian random field on $\mathcal{K}$ may be considered as a random element, the covariance operator being obtained as follows (see chapter 1):
$$\mathcal{C}_X : \mu\in\mathcal{U}^* \to \int_{\mathcal{K}} k_X(\cdot, t)\,\mu(dt).$$
One particularly useful result from the previous section is the possibility to choose among extremal points of the unit ball in $\mathcal{U}^*$ at each stage.
In this context, these points are (signed) Dirac delta measures and lead to a classical optimization problem.

Proposition 36. Let $(\mathcal{K},d)$ be a compact metric space, $X$ a continuous Gaussian random field on $\mathcal{K}$, with $k_X$ and $\mathcal{C}_X$ its respective covariance kernel and operator; then
$$\sup_{\mu\in\mathcal{B}_{\mathcal{U}^*}} \langle\mathcal{C}_X\mu, \mu\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{\mu\in\mathcal{B}_{\mathcal{U}^*}} \int_{\mathcal{K}}\int_{\mathcal{K}} k_X(s,t)\,\mu(dt)\,\mu(ds) = \sup_{s\in\mathcal{K}} k_X(s,s).$$

Proof. The functional $\mu\in\mathcal{U}^* \to \langle\mathcal{C}_X\mu,\mu\rangle_{\mathcal{U},\mathcal{U}^*}$ is quadratic. From proposition 32, there exists an extremal point $\delta^*$ of $\mathcal{B}_{\mathcal{U}^*}$ such that $\langle\mathcal{C}_X\delta^*,\delta^*\rangle_{\mathcal{U},\mathcal{U}^*} = \sup_{\mu\in\mathcal{B}_{\mathcal{U}^*}}\langle\mathcal{C}_X\mu,\mu\rangle_{\mathcal{U},\mathcal{U}^*}$. Now, the extremal points of the unit ball in $\mathcal{U}^*$ are Dirac delta measures, thus $\delta^* = \delta_{s^*}$ or $\delta^* = -\delta_{s^*}$ with $s^*\in\mathcal{K}$. In this case, it is clear that
$$\int_{\mathcal{K}}\int_{\mathcal{K}} k_X(s,t)\,\delta_{s^*}(dt)\,\delta_{s^*}(ds) = k_X(s^*,s^*),$$
and the result follows. ∎

This principle, jointly with the previous decomposition, gives the following result.

Corollary 11. Let $(\mathcal{K},d)$ be a compact metric space and $X = (X_s)_{s\in\mathcal{K}}$ a continuous Gaussian random field with covariance kernel $k_X$. Take $k_0 := k_X$ and define by induction:
$$s_n = \arg\max_{s\in\mathcal{K}} k_n(s,s),\qquad \lambda_n = k_n(s_n,s_n),\qquad h_n(s) := \frac{k_n(s,s_n)}{\sqrt{k_n(s_n,s_n)}},\qquad k_{n+1}(s,t) := k_n(s,t) - h_n(s)h_n(t).$$
The sequence $(h_n)_{n\in\mathbb{N}}\subset\mathcal{H}_k\subset C(\mathcal{K},\mathbb{R})$ is admissible for $X$.

Furthermore, the precise quantification of uncertainty from corollary 10 translates as follows.

Corollary 12. Let $(\mathcal{K},d)$ be a compact metric space and $X = (X_s)_{s\in\mathcal{K}}$ a continuous Gaussian random field with covariance kernel $k_X$. With the notations from corollary 11, note, for all $n\in\mathbb{N}$, $X^n$ the continuous Gaussian process with kernel $k_n$; then:
$$\sup_{\mu\in\mathcal{B}_{\mathcal{U}^*}} V\left[\int_{\mathcal{K}} X_s^n\,\mu(ds)\right] = \sup_{s\in\mathcal{K}} V[X_s^n] = \lambda_n.$$

A direct proof for covariance kernels. As shown in chapter 1, the series representation of a continuous Gaussian process is the same problem as finding tensor representations of its covariance kernel.
The extended Karhunen-Loève decomposition (described for general Banach spaces) is given here directly in the language of Reproducing Kernel Hilbert spaces, the probabilistic aspect having already been investigated. It is always assumed that the covariance kernel $k$ is associated to a continuous Gaussian random field $X = (X_s)_{s\in\mathcal{K}}$, with $(\mathcal{K},d)$ a compact metric space.

Lemma 8. Let $k$ be a continuous, symmetric, semi-positive definite kernel and $\tilde{s}\in\mathcal{K}$ such that $k(\tilde{s},\tilde{s}) > 0$; then $k = k_1 + k_2$, where, for all $(s,t)\in\mathcal{K}^2$,
$$k_1(s,t) = \frac{k(s,\tilde{s})\,k(t,\tilde{s})}{k(\tilde{s},\tilde{s})},\qquad k_2(s,t) = k(s,t) - k_1(s,t),$$
are continuous, symmetric, semi-positive definite kernels such that $\mathcal{H}_k = \mathcal{H}_{k_1}\oplus\mathcal{H}_{k_2}$ and, for $i\in\{1,2\}$, $\|\cdot\|_{k_i} = \|\cdot\|_k$ on $\mathcal{H}_{k_i}$.

Proof. Let $\tilde{s}$ be such that $k(\tilde{s},\tilde{s}) > 0$, then for all $s\in\mathcal{K}$:
$$X_s = X_s^1 + X_s^2,\qquad X_s^1 = \frac{k(s,\tilde{s})}{k(\tilde{s},\tilde{s})}\, X_{\tilde{s}},\qquad X_s^2 = X_s - X_s^1.$$
$X^1$ and $X^2$ are two independent Gaussian random fields; the rest of the proof is similar to lemma 5. ∎

Proposition 37. Let $k$ be a continuous, symmetric, semi-positive definite kernel and $s = \arg\max_{t\in\mathcal{K}} k(t,t)$, then
$$\sup_{h\in\mathcal{B}_{\mathcal{H}_k}} \|h\|_{C(\mathcal{K},\mathbb{R})} = \sqrt{k(s,s)}.$$

Proof. Using the reproducing property, it comes:
$$\sup_{h\in\mathcal{B}_{\mathcal{H}_k}} \|h\|_{C(\mathcal{K},\mathbb{R})} = \sup_{h\in\mathcal{B}_{\mathcal{H}_k}}\ \sup_{t\in\mathcal{K}} |h(t)| = \sup_{h\in\mathcal{B}_{\mathcal{H}_k}}\ \sup_{t\in\mathcal{K}} |\langle h, k(t,\cdot)\rangle_k|.$$
Now, the Cauchy-Schwarz inequality implies
$$|\langle h, k(t,\cdot)\rangle_k| \le \|h\|_k \sqrt{k(t,t)},$$
thus $\sup_{h\in\mathcal{B}_{\mathcal{H}_k}}\|h\|_{C(\mathcal{K},\mathbb{R})} \le \sup_{t\in\mathcal{K}}\sqrt{k(t,t)}$. However, $f = k(s,\cdot)/\sqrt{k(s,s)}$ is of unit norm in $\mathcal{H}_k$ and $f(s) = \sqrt{k(s,s)}$, thus $\|f\|_{C(\mathcal{K},\mathbb{R})} \ge \sqrt{k(s,s)}$ and the proof is complete. ∎

Lemma 9. Let $k$ be a symmetric, semi-positive definite and continuous kernel; if $(h_n)_{n\in\mathbb{N}}$ is orthonormal in $\mathcal{H}_k$ then $\|h_n\|_{C(\mathcal{K},\mathbb{R})}\to 0$.

Proof. The unit ball $\mathcal{B}_{\mathcal{H}_k}$ is compact in $C(\mathcal{K},\mathbb{R})$, thus there exist $h_\infty\in\mathcal{H}_k$ and a subsequence $(h_{n_k})_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty} h_{n_k} = h_\infty$ in $C(\mathcal{K},\mathbb{R})$. Moreover, for all $s\in\mathcal{K}$, $h_n(s)\to 0$ since (Bessel's inequality):
$$\sum_{n\ge 0} h_n(s)^2 = \sum_{n\ge 0} \langle h_n, k(\cdot,s)\rangle_k^2 \le \|k(\cdot,s)\|_k^2,$$
that is $(h_n(s))_{n\in\mathbb{N}}\in l^2(\mathbb{N})$. It follows that $h_\infty = 0$ and $\|h_n\|_{C(\mathcal{K},\mathbb{R})}$ tends to 0. ∎
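Lemma 8 can be checked numerically on a grid. The sketch below (a minimal illustration, not part of the thesis; it assumes the Wiener kernel $\min(s,t)$ and the hypothetical pivot $\tilde{s}=1$, and the helper name `split_kernel` is ours) verifies that the residual kernel $k_2$ stays symmetric and positive semi-definite; for this choice of pivot, $k_2$ is the Brownian bridge kernel $\min(s,t)-st$.

```python
# Sketch of lemma 8: split a covariance kernel at a pivot point s~ with k(s~,s~) > 0.
import numpy as np

def split_kernel(k, s_tilde):
    """Return (k1, k2) with k = k1 + k2, as in lemma 8."""
    v = k(s_tilde, s_tilde)
    assert v > 0
    k1 = lambda s, t: k(s, s_tilde) * k(t, s_tilde) / v
    k2 = lambda s, t: k(s, t) - k1(s, t)
    return k1, k2

k = lambda s, t: np.minimum(s, t)          # Wiener covariance kernel on [0, 1]
k1, k2 = split_kernel(k, 1.0)              # pivot s~ = 1 (the maximum-variance point)

grid = np.linspace(0.0, 1.0, 50)
K2 = k2(grid[:, None], grid[None, :])      # residual kernel on the grid
# k2 remains symmetric, positive semi-definite (eigenvalues >= 0 up to rounding)
eigs = np.linalg.eigvalsh(K2)
print(eigs.min() >= -1e-10)  # → True
```

For this pivot, $k_1(s,t) = st$, so the residual is exactly the Brownian bridge kernel, matching the first step of example 11 below.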
Theorem 14. Let $k$ be a symmetric, semi-positive definite and continuous kernel such that $\mathcal{H}_k$ is infinite dimensional. Let $k_0 := k$ and define by induction, for all $n\in\mathbb{N}$:
$$s_n = \arg\max_{s\in\mathcal{K}} k_n(s,s),\qquad \lambda_n = k_n(s_n,s_n),\qquad h_n(s) := \frac{k_n(s_n,s)}{\sqrt{\lambda_n}},\qquad k_{n+1}(s,t) := k_n(s,t) - h_n(s)h_n(t);$$
then the family $(h_n)_{n\in\mathbb{N}}$ is a Hilbert basis of $\mathcal{H}_k$.

Proof. Since $\mathcal{H}_k$ is infinite dimensional, $k_0 = k$ is non-trivial and one has $k_0(s_0,s_0) > 0$, thus $k_1$ is well-defined. From lemma 8, both $k_1$ and $h_0\otimes h_0$ are symmetric, continuous, semi-positive definite kernels and
$$\mathcal{H}_k = \operatorname{span}\{h_0\}\oplus\mathcal{H}_{k_1},$$
with $\mathcal{H}_{k_1}$ infinite dimensional. Now, let $n\in\mathbb{N}$ and suppose that $k_n$ is a well-defined, symmetric, semi-positive definite and continuous kernel with $\mathcal{H}_{k_n}$ an infinite-dimensional RKHS. Again, the application of lemma 8 establishes the same properties for $k_{n+1}$. By induction, this holds for all $n\in\mathbb{N}$. Now, by construction the family is orthonormal in $\mathcal{H}_k$; it remains to see that it is a basis. Let $h\in\mathcal{H}_k$ be such that $\langle h, h_n\rangle_k = 0$ for all $n\in\mathbb{N}$; it comes that $h\in\mathcal{H}_{k_n}$ for all $n\in\mathbb{N}$, thus:
$$\forall n\in\mathbb{N},\ \forall s\in\mathcal{K},\quad |h(s)| = |\langle h, k_n(\cdot,s)\rangle_k| \le \|h\|_k\sqrt{k_n(s,s)} \le \|h\|_k\,\|h_n\|_{C(\mathcal{K},\mathbb{R})}.$$
From lemma 9, $\|h_n\|_{C(\mathcal{K},\mathbb{R})}\to 0$, thus $h = 0$ and the proof is complete. ∎

This strategy is distinct from [Git12], as the sequence of points $(s_n)_{n\in\mathbb{N}}$ is neither necessarily dense nor given a priori. However, the process of building orthogonal functions is similar to Gram-Schmidt orthogonalisation.

2.3 Examples in C(K, R)

In this section, different examples of continuous Gaussian random fields are given. In a first part, they are treated analytically, and later on by a numerical algorithm. As a reminder of notations, the decomposition is as follows:
$$\forall s\in\mathcal{K},\quad X_s = \sum_{n\ge 0}\sqrt{\lambda_n}\,\xi_n u_n(s) = \sum_{n\ge 0}\xi_n h_n(s),$$
with $\|h_n\|_k = \|u_n\|_{C(\mathcal{K},\mathbb{R})} = 1$ for all $n\in\mathbb{N}$. Basis functions will refer to the sequence $(h_n)_{n\in\mathbb{N}}$, while their normalized version is $(u_n)_{n\in\mathbb{N}}$.
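On a finite grid (a discretization that the thesis itself avoids by optimizing over the continuum), the recursion of theorem 14 can be sketched in a few lines; the function name `greedy_kl` and the grid size are illustrative choices of ours. On a grid the construction coincides with a pivoted Cholesky factorization of the kernel matrix, which is a useful way to see why each step removes a rank-one term.

```python
# Grid-based sketch of the recursion of theorem 14 / corollary 11, illustrated on
# the squared exponential kernel of example 13 below.
import numpy as np

def greedy_kl(K, n):
    """Run n steps of the generalized Karhunen-Loeve recursion on the kernel
    matrix K (kernel evaluated on a grid); equivalent to pivoted Cholesky.
    Returns pivot indices, residual variances and discretized basis functions."""
    K = K.copy()
    idx, lams, H = [], [], []
    for _ in range(n):
        i = int(np.argmax(np.diag(K)))     # s_i = argmax of the residual variance
        lam = K[i, i]
        if lam <= 0:
            break
        h = K[:, i] / np.sqrt(lam)         # h_i = k_i(., s_i) / sqrt(lambda_i)
        K -= np.outer(h, h)                # k_{i+1} = k_i - h_i (x) h_i
        idx.append(i); lams.append(lam); H.append(h)
    return idx, np.array(lams), np.array(H)

s = np.linspace(0, 1, 200)
K = np.exp(-(s[:, None] - s[None, :])**2)  # squared exponential kernel
idx, lams, H = greedy_kl(K, 8)
print(lams)  # non-increasing residual variances, decaying by orders of magnitude
```

The non-increasing sequence `lams` is the grid analogue of $(\lambda_n)_{n\in\mathbb{N}}$ from lemma 7.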
Analytical examples in C([0,1], R). It appears that the same examples admitting an analytical Karhunen-Loève decomposition in $L^2([0,1],ds;\mathbb{R})$ are available here.

Example 11 (Wiener process). Consider again the standard Wiener process $W = (W_s)_{s\in\mathcal{K}}$ on $\mathcal{K} = [0,1]$. Note $k_0 := k$; the first step in the decomposition is to find $s_0\in[0,1]$ such that
$$s_0 = \arg\max_{s\in[0,1]} k_0(s,s).$$
Here, $k_0(s,s) = s$ for all $s\in[0,1]$, thus the maximum is attained at $s_0 = 1$, $\lambda_0 = k_0(s_0,s_0) = 1$ and $h_0(s) = s$. Now, the residual process is the Brownian bridge $(B_s)_{s\in[0,1]} = (W_s - W_1 s)_{s\in[0,1]}$, with covariance kernel
$$k_1 : (t,s)\in[0,1]^2 \to E[B_t B_s] = \min(s,t) - ts.$$
In this case, the maximum of the variance is attained at $s_1 = \frac12$ since $k_1(s,s) = s(1-s)$, and it comes $\lambda_1 = \frac14$ and $h_1(s) = 2\left(\min(s,\tfrac12) - \tfrac{s}{2}\right)$. The residual process is the conditional Wiener process $W \mid W_{1/2} = W_1 = 0$. Since it is Markovian, the study of the process can be carried out independently on $[0,\frac12]$ and $[\frac12,1]$; on both parts, it is a Brownian bridge. By induction, one directly obtains:
$$\lambda_n = \frac{1}{2^{p+2}}\quad\text{for } n = 2^p + k,\ k = 0,\dots,2^p - 1,\ p\ge 0.$$
Furthermore, the Hilbert basis $(h_n)_{n\in\mathbb{N}}\subset\mathcal{H}_k$ is given by $h_0(t) = t$ and $h_n(t) = \int_0^t h_n'(s)\,ds$, $n > 0$, where
$$h_n'(s) = \begin{cases} \sqrt{2^p} & \text{for } \frac{2k}{2^{p+1}} \le s \le \frac{2k+1}{2^{p+1}},\\ -\sqrt{2^p} & \text{for } \frac{2k+1}{2^{p+1}} < s \le \frac{2k+2}{2^{p+1}},\\ 0 & \text{otherwise},\end{cases}$$
if $n = 2^p + k$, $k = 0,\dots,2^p-1$, $p\ge 0$. The family $(h_n')_{n\in\mathbb{N}}$ is the usual Haar basis of $L^2([0,1],\mathbb{R})$. The normalized functions $u_n(s) = \sqrt{2^{p+2}}\, h_n(s)$ are Schauder's functions, corresponding to hat functions of height 1 lying above the intervals $[\frac{k}{2^p}, \frac{k+1}{2^p}]$ ($n = 2^p + k$). The resulting decomposition $\sum_{n\ge 0}\xi_n(\omega)h_n$ is the famous Lévy-Ciesielski construction of the Wiener process on the interval $[0,1]$ (see [JK69]). Remark that
$$\sum_{n\ge 0}\|h_n\|_{\mathcal{U}}^2 = \sum_{n\ge 0}\lambda_n = 1 + \frac14 + 2\times\frac18 + 4\times\frac{1}{16} + \dots = +\infty,$$
due to the "multiplicity", and this representation is not nuclear.
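A minimal numerical sketch of example 11 (assuming only the formulas above; the function name `schauder` is ours): building the hat functions $h_n$ and checking that the partial sums of $h_n(s)h_n(t)$ reproduce the Wiener kernel $\min(s,t)$. With all levels $p\le 7$ included, the partial sum agrees with $\min(s,t)$ exactly at the dyadic points $j/256$.

```python
# Levy-Ciesielski basis of example 11: h_0(t) = t and, for n = 2^p + k, the
# integral of the L^2-normalized Haar function: a hat of height
# sqrt(lambda_n) = 2^{-(p+2)/2} above [k/2^p, (k+1)/2^p].
import numpy as np

def schauder(n, t):
    t = np.asarray(t, dtype=float)
    if n == 0:
        return t.copy()
    p = int(np.log2(n))
    k = n - 2**p
    a, b = k / 2**p, (k + 1) / 2**p
    m = (a + b) / 2
    slope = 2**(p / 2)                      # slope of the integrated Haar function
    h = np.zeros_like(t)
    up = (t >= a) & (t <= m)
    down = (t > m) & (t <= b)
    h[up] = slope * (t[up] - a)
    h[down] = slope * (b - t[down])
    return h

t = np.linspace(0, 1, 257)                  # dyadic grid: all points j/256
# Partial sums of h_n(s) h_n(t) converge to the Wiener kernel min(s, t);
# with n < 256 (levels p <= 7) the agreement is exact on this grid.
K = sum(np.outer(schauder(n, t), schauder(n, t)) for n in range(256))
err = np.max(np.abs(K - np.minimum(t[:, None], t[None, :])))
print(err)
```

Sampling $\sum_n \xi_n h_n(t)$ with i.i.d. standard Gaussians $\xi_n$ then gives the classical Lévy-Ciesielski simulation of Brownian paths.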
The decomposition of the Brownian bridge is also obtained in example 11, since it appears as the residual process after the first iteration.

Example 12 (Ornstein-Uhlenbeck process). We already know that the Ornstein-Uhlenbeck process is Gaussian with covariance kernel
$$k(s,t) = \frac{\sigma}{2\beta}\,e^{-\beta|t-s|},$$
and $\sigma$ will be supposed equal to $2\beta$ without loss of generality. Here, the process is stationary and any value $s\in[0,1]$ attains the maximum, hence is valid for the first iteration. However, we choose $s_0 = 0$ as it provides an initial condition for equation 1.1. It immediately follows that the conditional process has covariance kernel
$$k_1 : (t,s)\in[0,1]^2 \to e^{-\beta|t-s|} - e^{-\beta t}e^{-\beta s}.$$
The maximum of $t\to k_1(t,t)$ is attained at $s_1 = 1$ and it comes
$$\lambda_1 = 1 - e^{-2\beta},\qquad u_1(t) = \frac{1}{\lambda_1}\left(e^{-\beta(1-t)} - e^{-\beta}e^{-\beta t}\right).$$
The next iteration is the last one needed to obtain the full decomposition of the process. The conditional Gaussian process has covariance kernel
$$k_2(t,s) = k_1(t,s) - \lambda_1^{-1}\,k_1(t,s_1)\,k_1(s,s_1).$$
A straightforward computation shows that the variance function $t\to k_2(t,t)$ is maximal at $s_2 = \frac12$, which leads to
$$\lambda_2 = 1 - 2\,\frac{e^{-\beta}}{1+e^{-\beta}},\qquad u_2(t) = \frac{1}{\lambda_2}\left(e^{-\beta\left|t-\frac12\right|} - e^{-\frac{\beta}{2}}\,\frac{e^{-\beta t} + e^{-\beta(1-t)}}{1+e^{-\beta}}\right).$$
From this, and since the Ornstein-Uhlenbeck process is Markovian, we deduce the shape of all other functions. Indeed, all further steps are given on dyadic intervals of the form $[\frac{k-1}{2^p}, \frac{k}{2^p}]$ with $k\in\{1,\dots,2^p\}$, the process being independent on these intervals. Here is the general form of the basis for $n = 2^p + k$ with $p\ge 0$ and $k\in\{1,\dots,2^p\}$:
$$\lambda_n = 1 - 2\,\frac{e^{-\frac{\beta}{2^p}}}{1+e^{-\frac{\beta}{2^p}}},\qquad
u_n(t) = \frac{1}{\lambda_n}\left[e^{-\beta\left|t-\frac{2k-1}{2^{p+1}}\right|} - e^{-\frac{\beta}{2^{p+1}}}\,\frac{e^{-\beta\left(t-\frac{k-1}{2^p}\right)} + e^{-\beta\left(\frac{k}{2^p}-t\right)}}{1+e^{-\frac{\beta}{2^p}}}\right].$$
Remark that these analytical examples rely heavily on the Markovian nature of the considered processes, as it provides a sort of self-similarity argument.
Numerical examples. Since not all one-dimensional continuous Gaussian random fields are Markovian, the generalized Karhunen-Loève decomposition is not always available analytically. However, the particular nature of the decomposition provides a practical method to build it numerically: the search for maximum variance functionals translates into maximizing a continuous function on a compact space. Algorithm 1 provides a naïve method to build such a basis. All the following numerical examples are obtained on a regular desktop machine (late 2015), using Python. The chosen optimization library is Pyswarm (https://pythonhosted.org/pyswarm/), as swarm-type algorithms provide robustness to local maxima.

Data: a compact metric set $(\mathcal{K},d)$; a continuous, symmetric, semi-positive definite kernel $k$; an integer $n\in\mathbb{N}$.
Result: a vector of points $(s_0,\dots,s_{n-1})$; a vector of maximal residual variances $(\lambda_0,\dots,\lambda_{n-1})$; a list of basis functions $(h_0,\dots,h_{n-1})$.

Set $k_0 = k$ and $i = 0$;
while $i < n$ do
  find $s_i := \arg\max_{t\in\mathcal{K}} k_i(t,t)$;
  put $\lambda_i := k_i(s_i,s_i)$;
  put $h_i := k_i(\cdot,s_i)/\sqrt{k_i(s_i,s_i)}$;
  if $\lambda_i > 0$ then
    set $k_{i+1} := (s,t)\in\mathcal{K}^2 \to k_i(s,t) - h_i(s)h_i(t)$;
    set $i := i + 1$;
  else
    stop;
  end
end
Algorithm 1: Build the n first basis functions.

Example 13 (Squared exponential kernel). Let $X^1 = (X_s^1)_{s\in[0,1]}$ be the Gaussian process with squared exponential kernel:
$$k_{X^1}(s,t) = \exp\left(-(t-s)^2\right).$$
A basis is built considering $s_0 = 0$ and using algorithm 1. The 8 first normalized basis functions $(u_0,\dots,u_7)$ are presented in figure 2.1, as well as their respective variances $(\lambda_0,\dots,\lambda_7)$ in figure 2.2. It appears that most of the variability of the process is captured with 8 basis functions: indeed, the maximum residual variance is less than $10^{-8}$ of the initial variance.

Example 14 (Matérn $\nu = \frac32$ kernel). Let $X^2 = (X_s^2)_{s\in[0,1]}$ be the Gaussian process with Matérn $\frac32$ kernel:
$$k_{X^2}(s,t) = \left(1 + \sqrt{3}\,|s-t|\right)\exp\left(-\sqrt{3}\,|s-t|\right).$$
A basis is built considering $s_0 = 0$ and using algorithm 1. The normalized functions are presented in figure 2.3, as well as the associated variances in figure 2.4.

Example 15 (Fractional Brownian motion). Let $X^3 = (X_s^3)_{s\in[0,1]}$ be the Gaussian process with fractional Brownian motion kernel:
$$k_{X^3}(s,t) = \frac12\left(|s|^{2H} + |t|^{2H} - |s-t|^{2H}\right),$$
where $H$ is the Hurst index. A basis is built using algorithm 1 for both $H = \frac14$ and $H = \frac34$. The normalized functions are presented in figures 2.5 and 2.7, as well as the associated variances in figures 2.6 and 2.8.

2.4 A study of the standard Wiener process

This last section is dedicated to a comparison of different admissible sequences for the standard Wiener process in $C([0,1],\mathbb{R})$. In particular, if $(h_n)_{n\in\mathbb{N}}$ is an admissible sequence, then
$$\forall s\in[0,1],\quad W_s = \sum_{n\ge 0}\xi_n h_n(s)\quad\text{a.s.},$$
with uniform convergence. Two distinct error measures will be considered for every admissible sequence $(h_n)_{n\in\mathbb{N}}$, namely:
1. the l-error:
$$\forall n\in\mathbb{N},\quad l_n((h_n)_{n\in\mathbb{N}}) = E\left[\sup_{s\in[0,1]}\left(\sum_{k\ge n}\xi_k h_k(s)\right)^2\right]^{1/2},$$
2. the a-error:
$$\forall n\in\mathbb{N},\quad a_n((h_n)_{n\in\mathbb{N}}) = \sup_{s\in[0,1]}\left(\sum_{k\ge n} h_k(s)^2\right)^{1/2}.$$
In this special case, the asymptotic rates of both $l_n(W)$ and $a_n(W)$ are known and will be compared with $l_n((h_n)_{n\in\mathbb{N}})$ and $a_n((h_n)_{n\in\mathbb{N}})$.

Figure 2.1: 8 first normalized basis functions $(u_0,\dots,u_7)$ (squared exponential kernel). The dots represent the positions $(s_0,\dots,s_7)$ of the associated Dirac functionals of maximum variance.
Figure 2.2: Evolution of the 8 first basis function variances $(\lambda_0,\dots,\lambda_7)$ (squared exponential kernel).
Figure 2.3: 8 first normalized basis functions $(u_0,\dots,u_7)$ (Matérn $\nu=\frac32$ kernel). The dots represent the positions $(s_0,\dots,s_7)$ of the associated Dirac functionals of maximum variance.
Figure 2.4: Evolution of the 8 first basis function variances (Matérn $\nu=\frac32$ kernel).
Figure 2.5: 8 first normalized basis functions $(u_0,\dots,u_7)$ (fBm $H=\frac14$ kernel). The dots represent the positions $(s_0,\dots,s_7)$ of the associated Dirac functionals of maximum variance.
Figure 2.6: Evolution of the 8 first basis function variances (fBm $H=\frac14$ kernel).
Figure 2.7: 8 first normalized basis functions $(u_0,\dots,u_7)$ (fBm $H=\frac34$ kernel). The dots represent the positions $(s_0,\dots,s_7)$ of the associated Dirac functionals of maximum variance.
Figure 2.8: Evolution of the 8 first basis function variances (fBm $H=\frac34$ kernel).

A few admissible sequences. In this thesis, all admissible sequences presented can be derived using the following covariance factorization.

Proposition 38.
Let $\mathcal{K} = [0,1]$ and consider the standard Wiener process with covariance kernel $k_W$ and operator $\mathcal{C}_W$; then the operator
$$A_W : f\in L^2([0,1],ds) \to \left(s\to\int_0^s f(t)\,dt\right)\in\mathcal{U} = C([0,1],\mathbb{R})$$
provides a factorization of $\mathcal{C}_W$ in $H = L^2([0,1],ds;\mathbb{R})$.

Proof. First, the Cauchy-Schwarz inequality gives
$$\forall f\in L^2([0,1],ds),\ \forall (s,t)\in[0,1]^2,\quad |A_W f(s) - A_W f(t)| \le \sqrt{|s-t|}\,\|f\|_{L^2},$$
thus the functions $A_W f$ are continuous; the operator is thus well-defined and linear. Furthermore, it is clearly bounded, and it has an adjoint. It appears that, for all $(\mu, f)\in\mathcal{U}^*\times L^2([0,1],ds;\mathbb{R})$:
$$\langle A_W f, \mu\rangle_{\mathcal{U},\mathcal{U}^*} = \int_0^1 A_W f(s)\,\mu(ds) = \int_0^1\int_0^1 \chi_{[0,s]}(t)f(t)\,dt\,\mu(ds) = \int_0^1\int_0^1\chi_{[0,s]}(t)\,\mu(ds)\,f(t)\,dt = \int_0^1\mu([t,1])f(t)\,dt = \langle A_W^*\mu, f\rangle_{L^2([0,1],ds)},$$
from which $A_W^* := \mu\to(s\to\mu([s,1]))$ is identified. Finally, one has, for all $s\in[0,1]$:
$$A_W A_W^*\mu(s) = \int_0^s\mu([t,1])\,dt = \int_0^1\int_0^1\chi_{[0,s]}(t)\chi_{[t,1]}(z)\,\mu(dz)\,dt = \int_0^1\int_0^1\chi_{[0,s]}(t)\chi_{[t,1]}(z)\,dt\,\mu(dz) = \int_0^1 k_W(s,z)\,\mu(dz) = \mathcal{C}_W\mu(s). \qquad\blacksquare$$

Using theorem 5, the image through $A_W$ of any basis of $L^2([0,1],ds;\mathbb{R})$ provides an admissible sequence for $W$. This is what is done in the next examples.

Example 16 (Karhunen-Loève in $L^2$). Consider the following Hilbert basis in $L^2([0,1],ds;\mathbb{R})$:
$$\forall n\in\mathbb{N},\quad e_n(s) = \sqrt{2}\cos\left(\pi\left(n+\tfrac12\right)s\right);$$
then the associated admissible sequence using $A_W$ is the usual $L^2([0,1],ds;\mathbb{R})$ Karhunen-Loève basis:
$$\forall n\in\mathbb{N},\quad A_W e_n(s) = \frac{\sqrt{2}}{\pi(n+\frac12)}\sin\left(\pi\left(n+\tfrac12\right)s\right).$$

Example 17 (Lévy-Ciesielski). Consider the Haar Hilbert basis $(e_n)_{n\in\mathbb{N}}$ of $L^2([0,1],ds;\mathbb{R})$, that is $e_0 = 1$ and, for $n = 2^p + k$, $k\in[0,2^p-1]$, $p\ge 0$,
$$e_n(s) = \begin{cases}\sqrt{2^p} & \text{for }\frac{2k}{2^{p+1}}\le s\le\frac{2k+1}{2^{p+1}},\\ -\sqrt{2^p} & \text{for }\frac{2k+1}{2^{p+1}} < s\le\frac{2k+2}{2^{p+1}},\\ 0 & \text{otherwise};\end{cases}$$
then the admissible sequence obtained using $A_W$ is the Lévy-Ciesielski basis of example 11.

Example 18 (Paley-Wiener basis).
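The factorization can be checked numerically (an illustration of ours, not from the thesis): the images $A_W e_n$ of the cosine basis of example 16 are the sine functions above, and summing $h_n(s)h_n(t)$ over many $n$ recovers the Wiener kernel $\min(s,t)$, since $\mathcal{C}_W = A_W A_W^*$.

```python
# Numerical check of proposition 38 / example 16: with h_n = A_W e_n given by
# h_n(s) = sqrt(2)/(pi(n+1/2)) sin(pi(n+1/2)s), the series sum h_n(s)h_n(t)
# converges to min(s, t).
import numpy as np

s = np.linspace(0, 1, 101)
N = 2000                                   # truncation level; tail is O(1/N)
total = np.zeros((s.size, s.size))
for n in range(N):
    w = np.pi * (n + 0.5)
    h = np.sqrt(2) / w * np.sin(w * s)     # A_W e_n: integrated cosine
    total += np.outer(h, h)
err = np.max(np.abs(total - np.minimum(s[:, None], s[None, :])))
print(err)  # small: the partial sums approach the Wiener kernel
```

The residual error is of order $\sum_{n\ge N}\frac{2}{\pi^2(n+\frac12)^2} \approx \frac{2}{\pi^2 N}$, i.e. about $10^{-4}$ here.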
Consider the following (Fourier) Hilbert basis in $L^2([0,1],ds;\mathbb{R})$:
$$e_0(s) = 1,\qquad e_{2n+1}(s) = \sqrt{2}\cos(2\pi(n+1)s),\qquad e_{2n+2}(s) = \sqrt{2}\sin(2\pi(n+1)s),\quad n\in\mathbb{N};$$
then the admissible sequence obtained using $A_W$ is the Paley-Wiener basis:
$$A_W e_0(s) = s,\qquad A_W e_{2n+1}(s) = \frac{1}{\sqrt{2}\,\pi(n+1)}\sin(2\pi(n+1)s),\qquad A_W e_{2n+2}(s) = \frac{1}{\sqrt{2}\,\pi(n+1)}\left(1-\cos(2\pi(n+1)s)\right).$$

Example 19 (Sinus). Consider the following Hilbert basis in $L^2([0,1],ds;\mathbb{R})$:
$$\forall n\in\mathbb{N},\quad e_n(s) = \sqrt{2}\sin(\pi(n+1)s);$$
then the admissible sequence obtained using $A_W$ is:
$$\forall n\in\mathbb{N},\quad A_W e_n(s) = \frac{\sqrt{2}}{\pi(n+1)}\left(1-\cos(\pi(n+1)s)\right).$$

The previous factorization can be used to construct other admissible sequences, using the more general notion of Hilbert frames in $L^2([0,1],ds;\mathbb{R})$ instead of Hilbert bases [LP09] (wavelets for instance). It is also possible to build bases by specifying a dense sequence $(t_n)_{n\in\mathbb{N}}$ in $[0,1]$, along with a Gram-Schmidt construction [Git12]. The last admissible sequence given here for the standard Wiener process is a transformation of the $C([0,1],\mathbb{R})$ Karhunen-Loève basis (Lévy-Ciesielski).

Example 20 (Rotated Lévy-Ciesielski). Let $\mathcal{K} = [0,1]$, $W = (W_t)_{t\in[0,1]}$ the standard Wiener process and $(h_n)_{n\in\mathbb{N}}$ the admissible sequence given in example 11 (Lévy-Ciesielski basis). The specific structure of this basis will now be exploited to define a new one. Note $\tilde{h}_0 = h_0$, $\tilde{h}_1 = h_1$ and consider $\tilde{h}_2, \tilde{h}_3$ defined by
$$\tilde{h}_2 = \frac{1}{\sqrt{2}}(h_2 + h_3),\qquad \tilde{h}_3 = \frac{1}{\sqrt{2}}(h_2 - h_3);$$
both are orthogonal with unit norm in $\mathcal{H}_X$ and $\operatorname{span}\{h_2,h_3\} = \operatorname{span}\{\tilde{h}_2,\tilde{h}_3\}$. Repeating a similar transformation for $(h_4,\dots,h_7)$, $(h_8,\dots,h_{15})$ and so forth gives a new basis, called rotated Lévy-Ciesielski.

l-optimality. The comparison starts with the question of asymptotic optimality (related to l-numbers). It has been shown in the literature that the optimal rate for the standard Wiener process is:
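A small numerical check of example 20 (our illustration; the grid and the helper `hat` are ours, with hat heights $\sqrt{\lambda_n}=2^{-3/2}$ taken from example 11): rotating the disjointly supported pair $(h_2,h_3)$ divides the sup-norm by $\sqrt{2}$ while leaving $h_2\otimes h_2 + h_3\otimes h_3$, hence the law of the partial sums, unchanged.

```python
# Rotation of the Levy-Ciesielski pair (h2, h3): same span and same rank-two
# covariance contribution, but smaller sup-norm.
import numpy as np

t = np.linspace(0, 1, 1001)

def hat(a, b, height, t):
    """Hat function of the given height on [a, b], zero elsewhere."""
    m = (a + b) / 2
    return height * np.clip(1 - np.abs(t - m) / (m - a), 0, None) * ((t >= a) & (t <= b))

h2 = hat(0.0, 0.5, 2**-1.5, t)   # n=2: p=1, k=0, height sqrt(lambda_2) = 2^{-3/2}
h3 = hat(0.5, 1.0, 2**-1.5, t)   # n=3: p=1, k=1
h2r = (h2 + h3) / np.sqrt(2)     # rotated pair of example 20
h3r = (h2 - h3) / np.sqrt(2)

print(np.max(np.abs(h2r)) / np.max(np.abs(h2)))              # → about 0.7071 = 1/sqrt(2)
print(np.allclose(np.outer(h2, h2) + np.outer(h3, h3),
                  np.outer(h2r, h2r) + np.outer(h3r, h3r)))  # → True
```

This is precisely the sup-norm reduction $\|\tilde{h}_n\|_{\mathcal{U}} = 2^{-p/2}\|h_n\|_{\mathcal{U}}$ used in the proof of proposition 39.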
$$l_n(W) \approx \sqrt{\frac{\log n}{n}}.$$
Using proposition 29, which gives a sufficient set of conditions, it is immediate to show that the following bases are indeed asymptotically optimal; figure 2.9 provides a Monte-Carlo illustration for the previous admissible sequences ($10^4$ simulations on a fine grid for each $n$ between 0 and 31).

Proposition 39. Let $\mathcal{K} = [0,1]$ and $W = (W_s)_{s\in[0,1]}$ be the standard Wiener process; then the following admissible sequences:
- Karhunen-Loève in $L^2([0,1],ds;\mathbb{R})$,
- Paley-Wiener,
- Sinus,
- Lévy-Ciesielski (Karhunen-Loève in $C([0,1],\mathbb{R})$),
- rotated Lévy-Ciesielski,
are all asymptotically optimal for $W$.

Proof. The proof relies on an application of proposition 29 to the different admissible sequences. First, note that the eigenvalues of the covariance operator $\mathcal{C}_W$ (as an endomorphism of $L^2([0,1],ds;\mathbb{R})$) are known:
$$\forall n\in\mathbb{N},\quad \lambda_n = \frac{1}{\pi^2(n+\frac12)^2},$$
thus item 1 in proposition 29 is satisfied with $\nu = 1$. It remains to check items 2 and 3, relative to the infinite and $\beta$-Hölder norms.

- Karhunen-Loève in $L^2([0,1],ds;\mathbb{R})$: this has already been done in example 10.
- Paley-Wiener: the basis functions are built upon cosines and sines, and it is clear that
$$\|h_0\|_{\mathcal{U}} = 1,\qquad \|h_{2n+1}\|_{\mathcal{U}} = \frac{1}{\sqrt{2}\,\pi(n+1)},\qquad \|h_{2n+2}\|_{\mathcal{U}} = \frac{\sqrt{2}}{\pi(n+1)},$$
thus $\|h_n\|_{\mathcal{U}} \le C(n+1)^{-1}$ with $C$ sufficiently large. Concerning the $\beta$-Hölder norms, one has
$$\|h_0\|_\beta = 1,\qquad \|h_{2n+1}\|_\beta \le 2^{\beta+1}(n+1)^{\beta-1},\qquad \|h_{2n+2}\|_\beta \le 2^{\beta+1}(n+1)^{\beta-1},$$
and the proposition applies.
- Sinus: the analysis is the same as for the Karhunen-Loève basis in $L^2([0,1],ds;\mathbb{R})$.
- Rotated Lévy-Ciesielski: the transformation reduces the infinite norm as follows:
$$\left\|\tilde{h}_2\right\|_{\mathcal{U}} = \left\|\frac{h_2+h_3}{\sqrt{2}}\right\|_{\mathcal{U}} = \frac{1}{\sqrt{2}}\,\|h_2\|_{\mathcal{U}} = \sqrt{\frac{\lambda_2}{2}},$$
since $h_2$ and $h_3$ have disjoint supports. The result is similar at each step, thus for all $n\ge 1$, $n = 2^p+k$, $k\in[0,2^p-1]$, $p\ge 1$:
$$\left\|\tilde{h}_n\right\|_{\mathcal{U}} = \frac{1}{\sqrt{2^p}}\,\|h_n\|_{\mathcal{U}}.$$
In other words, $(\tilde{h}_n)_{n\in\mathbb{N}}$ satisfies:
$$\forall n\in\mathbb{N},\ n = 2^p+k,\ k\in[0,2^p-1],\ p\ge 0,\quad \left\|\tilde{h}_n\right\|_{\mathcal{U}} = \frac{1}{2^{p+1}}.$$
This implies that:
$$\forall n\in\mathbb{N}^*,\quad \|\tilde h_n\|_{\mathcal{U}} \le \frac{1}{n+1},$$
thus $(\tilde h_n)_{n\in\mathbb{N}}$ satisfies item 2 in proposition 29. Concerning the $\beta$-Hölder norms, it comes
$$\|\tilde h_n\|_\beta = \sup_{s\ne t}\frac{|\tilde h_n(t)-\tilde h_n(s)|}{|t-s|^\beta} = \frac{2^{-(p+1)}}{2^{-(p+1)\beta}} = 2^{(p+1)(\beta-1)},$$
and taking either $n = 2^p$ or $n = 2^{p+1}-1$ gives a valid rate for item 3.

• Lévy-Ciesielski. The necessary conditions from proposition 29 are not satisfied in this case. Indeed, one only has
$$\|h_n\|_{\mathcal{U}} \le \frac{C}{\sqrt{n+1}}, \quad C>0.$$
However, the rotated Lévy-Ciesielski admissible sequence is optimal, and by construction one has
$$\sum_{k=2^p}^{2^{p+1}-1}\xi_k h_k \overset{\text{Law}}{=} \sum_{k=2^p}^{2^{p+1}-1}\xi_k \tilde h_k$$
with $(\xi_n)_{n\in\mathbb{N}}$ a sequence of i.i.d. $\mathcal{N}(0,1)$ random variables; it comes:
$$\forall n\in\mathbb{N},\quad \mathbb{E}\left\|\sum_{k=0}^{2^n-1}\xi_k h_k\right\|_{\mathcal{U}}^2 = \mathbb{E}\left\|\sum_{k=0}^{2^n-1}\xi_k \tilde h_k\right\|_{\mathcal{U}}^2,$$
from which optimality is deduced.

a-optimality. The second interesting feature to observe is the evolution of the $a$-errors, as they measure the linear reconstruction rate. These numbers have been shown to asymptotically satisfy (corollary 7.6 in [KL02]):
$$a_n(W) \approx \frac{1}{\sqrt{n}}.$$
It appears that all previous admissible sequences achieve this convergence rate.

Proposition 40. Let $K = [0,1]$ and $W = (W_s)_{s\in[0,1]}$ be the standard Wiener process on $[0,1]$, with covariance kernel $k_W(s,t) = \min(s,t)$ for all $(s,t)\in[0,1]^2$ and $W_0 = 0$ almost surely. Note $(h_n)_{n\in\mathbb{N}}$ the admissible sequence obtained by Karhunen-Loève decomposition in $C([0,1],\mathbb{R})$; then
$$a_n((h_n)_{n\in\mathbb{N}}) = \sqrt{\lambda_n} \approx a_n(W).$$

Proof. In this construction, one has:
$$\lambda_n = \frac{1}{2^{p+2}}, \quad n = 2^p+k,\ k\in[0,2^p-1],\ p\ge0,$$
and it comes that
$$\forall n\in\mathbb{N}^*,\quad \frac{1}{4n} \le \lambda_n \le \frac{1}{2(n+1)} \le \frac{1}{2n},$$
thus the proof is complete.

Proposition 41. Let $K = [0,1]$ and $W = (W_s)_{s\in[0,1]}$ be the standard Wiener process on $[0,1]$. Note $(h_n)_{n\in\mathbb{N}}$ the admissible sequence obtained by Karhunen-Loève decomposition in $L^2([0,1],ds;\mathbb{R})$; then
$$a_n((h_n)_{n\in\mathbb{N}}) = \sqrt{\sum_{k\ge n}\lambda_k}.$$

Proof.
Clearly, one has
$$\forall n\in\mathbb{N},\ \forall s\in[0,1],\quad h_n(s)^2 \le \lambda_n,$$
thus $\sup_{s\in[0,1]}\sum_{k\ge n}h_k(s)^2 \le \sum_{k\ge n}\lambda_k$. Conversely,
$$\forall n\in\mathbb{N},\quad h_n(1)^2 = \lambda_n,$$
thus $\sup_{s\in[0,1]}\sum_{k\ge n}h_k(s)^2 \ge \sum_{k\ge n}\lambda_k$.

Figure 2.10 represents the maximum of the residual kernels at each step for the different admissible sequences.

2.5 Conclusion

In this chapter, the important concept of Karhunen-Loève decomposition has been generalized from Hilbert to Banach spaces. It mimics the Hilbert case, where each eigenvector maximizes the associated Rayleigh quotient. In particular, it provides a precise quantification of uncertainty, since the variances of all linear functionals of the residual Gaussian random elements are dominated. In the case of continuous Gaussian random fields, this decomposition translates into a recursive optimization algorithm that is easy to implement. However, it is not yet clear whether this new construction always leads to $l$- and $a$-optimal admissible sequences. After a short analysis of the standard Wiener process, it appears to be an interesting lead. Here are some open questions:

1. Does the Karhunen-Loève decomposition of a Gaussian random element $X$ always lead to $a$-optimal and/or $l$-optimal bases?

2. Does the rotation provided for the Lévy-Ciesielski basis of the standard Wiener process generalize to other Gaussian random elements?

3. In view of the Courant-Fischer minimax principle in Hilbert spaces, is there a dual construction of the presented decomposition?

4. In the literature on optimal design of experiments, the concept of sequential search for maximum-variance points is generalized in Stepwise Uncertainty Reduction [BBG16]. Is it possible to adapt alternative criteria (trace, etc.) to derive new admissible sequences for continuous Gaussian random fields?
Figure 2.9: Numerical comparison of $l$-errors of admissible sequences for the standard Wiener process (Monte-Carlo estimation; Karhunen-Loève $L^2$, Sinus, Paley-Wiener, Lévy-Ciesielski and rotated Lévy-Ciesielski; order of projection $n$ from 0 to 31).

Figure 2.10: Comparison of $a$-errors of admissible sequences for the standard Wiener process (same sequences; logarithmic scale in the order of projection $n$).

Part II
Applications to Bayesian inverse problems

Chapter 3
Introduction to inverse problems

This introductory chapter presents important concepts of the area of inverse problems. It starts with the notions of forward, inverse and well-posed problems (section 3.1), which are illustrated on both linear and non-linear examples (section 3.2). Then, the well-known Tikhonov-Philips regularization is introduced in the context of reflexive Banach spaces with general convex penalty functionals (section 3.3).

3.1 Forward, inverse and ill-posed problems

Well-posed problem. In a large number of fields, ranging from engineering to fundamental sciences, mathematics is applied to represent various phenomena. These models are designed by formulating hypotheses that ensure the desired behaviour of their solutions. One well-accepted set of properties that a model (or problem) should have was stated by J. Hadamard [Had02] in his definition of well-posed problems.

Definition 17 (Well-posed problem in the sense of Hadamard [Had02]). A problem is well-posed if its solutions:
1. exist,
2. are unique,
3. vary continuously with the data.

Let us emphasize the three statements in this definition. Although the absence of a solution is an interesting mathematical result in itself, one generally expects solutions for further analysis, especially in applied contexts. Thus, if condition 1 in the previous definition is not satisfied, the whole model needs adjustments.
The second assertion, about multiplicity, is also important, especially if the solutions are particularly different. Indeed, one needs either to treat them all (which may be impossible) or to choose between them and, more importantly, justify this choice. Finally, continuity is also mandatory, for theoretical and practical reasons. For instance, any approximate treatment of the problem (numerical discretization, measurement errors, etc.) would otherwise have important and undesired consequences. It may be necessary to consider different topologies (the weak topology in Banach spaces for instance) to obtain continuity. Naturally, a problem that is not well-posed will be called ill-posed.

Forward and inverse problems. Once a model has been carefully designed and shown to be well-posed, a typical use (for instance in physics or engineering) is to determine the consequences implied by known causes through the model, which is called the forward or direct approach. Since the model is a tangible link between the two, one may also want to determine the causes leading to known consequences, which is the inverse problem. Both approaches are illustrated in figure 3.1.

Figure 3.1: Forward and inverse problems (the model maps causes to consequences; the forward problem goes from causes to consequences, the inverse problem the other way).

From a mathematical standpoint, this terminology is conventional, as both problems can be considered the inverse of each other. In practice however, forward models are usually studied first to check well-posedness, and their inverse counterparts are considered afterwards. Even though they represent the two faces of the same coin, both problems may have very different properties. As previously stated, forward approaches are well-posed almost by definition, while inverse ones are often ill-posed (see the linear example of compact operators in section 3.2 for instance). This is often due to a smoothing effect, meaning that two different causes could lead to similar consequences, or in some sense that the forward model loses information.
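The smoothing effect just described can be sketched numerically. In the illustration below (an assumption chosen for clarity, not an example from the text), the forward map is discretized integration: the noisy data are nearly indistinguishable from the exact data, yet naive inversion by differentiation amplifies the noise by orders of magnitude. Grid size, noise level and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
s = np.linspace(0.0, 1.0, n)
h = s[1] - s[0]

u = np.sin(2 * np.pi * s)                      # the "cause"
y = np.cumsum(u) * h                           # forward map: integration (smoothing)
y_noisy = y + 1e-3 * rng.standard_normal(n)    # tiny measurement noise

# Naive inversion: differentiation. The noise, invisible in y_noisy,
# dominates the reconstruction.
u_rec = np.gradient(y_noisy, h)

forward_err = np.max(np.abs(y_noisy - y))      # of the order of the noise level
inverse_err = np.max(np.abs(u_rec - u))        # orders of magnitude larger
```

Two causes differing only by a high-frequency component are mapped to almost identical consequences, which is exactly the loss of information mentioned above.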
This phenomenon is particularly clear in the context of linear operators (see section 3.2).

Mathematical formulation. In this work, we will use the following notations. Let $\mathcal{U}$ and $\mathcal{Y}$ be Banach spaces and
$$\mathcal{G}: D(\mathcal{G})\subset\mathcal{U} \to \mathcal{G}(D(\mathcal{G}))\subset\mathcal{Y}$$
a map. A direct problem will be formally defined as:
$$\text{for } u\in\mathcal{U},\ \text{find } y\in\mathcal{Y} \text{ such that } y = \mathcal{G}(u), \quad (3.1)$$
whereas its inverse is:
$$\text{for } y\in\mathcal{Y},\ \text{find } u\in\mathcal{U} \text{ such that } y = \mathcal{G}(u). \quad (3.2)$$
In this setting, the well-posedness of the forward problem 3.1 is equivalent to $\mathcal{U} = D(\mathcal{G})$ with $\mathcal{G}$ continuous. Indeed, in that case $\mathcal{G}(u)$ is defined for all $u\in\mathcal{U}$ and continuous in $u$. Conversely, problem 3.2 is well-posed if $\mathcal{G}(D(\mathcal{G})) = \mathcal{Y}$ and $\mathcal{G}^{-1}$ exists and is continuous. These statements will be detailed in the case of a linear map in the next section.

3.2 Examples

Following the previous discussion, examples of forward and inverse problems are provided in this section. The first two examples are given with a linear map $\mathcal{G}$, as this illustrates fundamental phenomena, while the last one is a real-world non-linear example from computational biology.

Closed range operators. Let $\mathcal{G}\in\mathcal{L}(\mathcal{U},\mathcal{Y})$ be a bounded operator such that $R(\mathcal{G})$ is closed. If $\mathcal{U} = D(\mathcal{G})$, the forward model is well-posed. Conversely, if $\mathcal{Y} = R(\mathcal{G})$, there may be multiple solutions, in which case the inverse problem is ill-posed. Indeed, any particular solution $u\in\mathcal{U}$ leads to the full set of solutions $u + \ker(\mathcal{G})$. However, if $\mathcal{G}$ is injective as well, its co-restriction to $R(\mathcal{G})$ is invertible, and the inverse is automatically continuous (open mapping theorem).

Example 21. Let $\mathcal{U} = l^2(\mathbb{N})$, $0 < m \le M < \infty$, $(\lambda_n)_{n\in\mathbb{N}}\subset\mathbb{R}$ such that $\lambda_0 = 0$ and $\forall n\in\mathbb{N}^*$, $0 < m \le \lambda_n \le M < +\infty$, and let
$$\mathcal{G}: (u_n)_{n\in\mathbb{N}}\in\mathcal{U} \to (\lambda_n u_n)_{n\in\mathbb{N}}\in\mathcal{U}.$$
This operator is non-injective, as $\ker(\mathcal{G}) = \operatorname{span}\{e_0\}$ where $e_0 = (1,0,\dots)$. Furthermore, let $f: (u_n)_{n\in\mathbb{N}}\in\mathcal{U} \to u_0\in\mathbb{R}$; then $R(\mathcal{G}) = \ker(f)$ is closed.
Indeed:
• $\forall u\in\mathcal{U}$, $f(\mathcal{G}u) = (\mathcal{G}u)_0 = \lambda_0 u_0 = 0$, thus $R(\mathcal{G})\subset\ker(f)$;
• $\forall u\in\ker(f)$, let $v := \left(0, \frac{u_1}{\lambda_1}, \frac{u_2}{\lambda_2}, \dots\right)$; then $v\in\mathcal{U}$ since $\|v\|_{\mathcal{U}} \le \frac{\|u\|_{\mathcal{U}}}{m}$, and $u = \mathcal{G}v$, thus $\ker(f)\subset R(\mathcal{G})$.

The forward problem is well-posed and its inverse is ill-posed because of the multiplicity of solutions. However, this ill-posedness is removed by restricting $\mathcal{G}$ to $\ker(\mathcal{G})^\perp$ for instance. In other words, for closed range operators between Banach spaces, the ill-posedness is essentially due to the possible multiplicity of solutions. This is the main idea behind the following alternative definition of a well-posed problem.

Definition 18 (Well-posed problem in the sense of Nashed [Nas81]). The inverse problem 3.2 is well-posed if and only if $R(\mathcal{G})$ is closed.

A very important case in practice is given by finite rank operators, because finite dimensional spaces are always topologically closed. In particular, this means that ill-posedness is essentially an infinite dimensional phenomenon [EHN96].

Example 22 (Interpolation of continuous functions). Let $\mathcal{U} = C([0,1],\mathbb{R})$ equipped with the supremum norm, $(x_1,\dots,x_n)\in[0,1]^n$ and
$$\mathcal{G}: u\in\mathcal{U} \to (u(x_1),\dots,u(x_n))\in\mathbb{R}^n,$$
which is a well-defined linear map with finite rank. The forward problem of computing $\mathcal{G}(u)$ given $u$ is well-posed. The inverse problem of finding $u\in\mathcal{U}$ given $y\in\mathbb{R}^n$ is ill-posed in the sense of Hadamard (but well-posed in the sense of Nashed).

Compact operators. Suppose now that $\mathcal{U}$ and $\mathcal{Y}$ are infinite dimensional Banach spaces and $\mathcal{G}\in\mathcal{K}(\mathcal{U},\mathcal{Y})$ is a compact operator; then it is well known that $\mathcal{G}$ can't be a homeomorphism.

Proposition 42. Let $\mathcal{U}$, $\mathcal{Y}$ be two infinite dimensional Banach spaces and $\mathcal{G}\in\mathcal{K}(\mathcal{U},\mathcal{Y})$ an injective compact operator; then $\mathcal{G}^{-1}: R(\mathcal{G})\to\mathcal{U}$ is not bounded.

Proof. Let $\mathcal{G}\in\mathcal{K}(\mathcal{U},\mathcal{Y})$ be injective and suppose $\mathcal{G}^{-1}\in\mathcal{L}(R(\mathcal{G}),\mathcal{U})$; then $\mathcal{G}^{-1}\mathcal{G} = \mathcal{I}$ is compact, which contradicts Riesz's theorem.

Example 23 (Poisson's equation).
Let $\Omega$ be a smooth domain and consider the following equation:
$$-\Delta u = f \ \text{ on } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega.$$
The associated weak formulation is
$$\forall v\in H_0^1(\Omega),\quad \int_\Omega \nabla u\cdot\nabla v\,dx = \int_\Omega f v\,dx,$$
which, according to the Lax-Milgram theorem, has a unique solution $u\in H_0^1(\Omega)$ for every $f\in H^{-1}(\Omega)$. The solution map is linear and compact ($H_0^1(\Omega)$ is compactly embedded in $L^2(\Omega)$), thus the forward problem of computing $u$ from $f$ is well-posed. However, the inverse problem is ill-posed according to proposition 42.

Non-linear forward map. Here, the following linear evolution equation is considered:
$$\begin{aligned}
\frac{\partial y}{\partial t}(x,t) + \lambda y(x,t) - D\Delta y(x,t) &= f(x,t), && \forall (x,t)\in\,]0,L[\,\times\,]0,T],\\
y(x,0) &= 0, && \forall x\in\,]0,L[,\\
y(0,t) &= 0, && \forall t\in\,]0,T],\\
y(L,t) &= 0, && \forall t\in\,]0,T].
\end{aligned} \quad (3.3)$$
In other words, the quantity of interest $y(x,t)$ evolves from a null initial state under three mechanisms. First, a direct variation, resulting from the source $f(x,t)$, which depends on time and space. Furthermore, $y$ diffuses at a rate $D$ (strictly positive) and decays at a rate $\lambda$ (positive), both being constant in time and space. The direct problem is to compute the solution $y(x,t)$ at all times and positions given the model parameters $u = (\lambda, D, f)$. When the source $f$ is the only parameter, this problem is linear, and equation 3.3 is called a linear evolution equation. Here, the parameters of the differential operator are considered as well, thus the solution map is non-linear. The inverse problem consists in finding $u = (\lambda, D, f)$ from partial information on $y$. This example is further studied in chapter 5.

Other well-known examples. Different examples may be found in the literature, based for instance on the theory of partial differential equations. These models are usually taken from physics or engineering; tomography, gravimetry and imaging are important examples [Isa17].
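The forward map of equation 3.3 can be sketched with an explicit finite-difference scheme. All numerical values below (grid sizes, $L$, $T$, parameter values) are illustrative assumptions; the point is only that the map is linear in the source $f$ alone, but non-linear in the full parameter $u = (\lambda, D, f)$.

```python
import numpy as np

def solve_forward(lam, D, f, L=1.0, T=0.1, nx=51, nt=2000):
    """Explicit finite-difference scheme for
       dy/dt + lam*y - D*d2y/dx2 = f(x,t), y = 0 at t = 0 and on the boundary.
       Stability of this explicit scheme requires D*dt/dx^2 <= 1/2."""
    dx = L / (nx - 1)
    dt = T / nt
    assert D * dt / dx ** 2 <= 0.5, "explicit scheme unstable"
    x = np.linspace(0.0, L, nx)
    y = np.zeros(nx)
    for k in range(nt):
        t = k * dt
        lap = np.zeros(nx)
        lap[1:-1] = (y[2:] - 2 * y[1:-1] + y[:-2]) / dx ** 2
        y = y + dt * (-lam * y + D * lap + f(x, t))
        y[0] = y[-1] = 0.0   # Dirichlet boundary conditions
    return y

src = lambda x, t: np.sin(np.pi * x)

# Linear in the source f alone: doubling f doubles y ...
y1 = solve_forward(0.5, 0.01, src)
y2 = solve_forward(0.5, 0.01, lambda x, t: 2 * src(x, t))
lin_f = np.max(np.abs(y2 - 2 * y1))        # vanishes up to round-off

# ... but non-linear in u = (lam, D, f): changing (lam, D) as well
# breaks the proportionality.
y3 = solve_forward(5.0, 0.02, lambda x, t: 2 * src(x, t))
nonlin_u = np.max(np.abs(y3 - 2 * y1))     # clearly non-zero
```

This is only a sketch of the direct problem; the inverse problem of recovering $(\lambda, D, f)$ from partial observations of $y$ is the subject of chapter 5.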
3.3 Tikhonov-Philips regularization for operators

A regularization method is now presented, which consists in defining alternative notions of solutions to the inverse problem 3.2, for which existence and continuity can be proved. It is presented in the case of bounded operators between reflexive Banach spaces, following the lines of [SKHK12], but this approach extends to much more general settings.

Regularized solution. Consider a penalty functional
$$\Omega: D(\Omega)\subset\mathcal{U}\to\mathbb{R}_+$$
which quantifies a certain property of elements $u\in\mathcal{U}$. A typical example would be a norm, which measures the size of all elements in $\mathcal{U}$, but functionals $\Omega(u) := \|Au\|_{\mathcal{X}}$ with $A: \mathcal{U}\to\mathcal{X}$ are also classical ($A$ being a differential operator on a function space for instance). Using this new functional, one can define a new notion of solution to problem 3.2.

Definition 19 ($\Omega$-minimizing solution [SKHK12]). An element $u^\dagger\in\mathcal{U}$ is an $\Omega$-minimizing solution to problem 3.2 if
$$u^\dagger = \arg\min_{u\in\mathcal{G}^{-1}(y)\cap D(\Omega)} \Omega(u).$$

In other words, we choose, among the set of solutions of problem 3.2, those that minimize the penalty functional $\Omega$. The existence and uniqueness of such elements strongly depend on the properties of both spaces $\mathcal{U}$, $\mathcal{Y}$ and of the map $\Omega$. Usually, a penalty functional is chosen such that it is:
1. Proper: $D(\Omega)\ne\emptyset$, so that $\Omega$ may be used to distinguish some elements in $\mathcal{U}$;
2. Convex: $\forall c\in\mathbb{R}$, the level set $\{\Omega(u)\le c\}$ is convex;
3. Lower semi-continuous: $\forall c\in\mathbb{R}$, the level set $\{\Omega(u)\le c\}$ is closed in the strong topology;
4. Stabilizing: $\forall c\in\mathbb{R}$, $\{\Omega(u)\le c\}$ is weakly sequentially pre-compact.

These particular hypotheses are exactly what is needed to show the existence of minimizers. Indeed, the combination of items 2 and 3 gives the following general result.

Proposition 43. Let $\mathcal{U}$ be a Banach space and $A\subset\mathcal{U}$ a convex and (strongly) closed subset; then it is weakly closed as well.

Proof. See [Bre11], chapter 3.
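In finite dimension, definition 19 can be made concrete: with $\Omega$ the Euclidean norm and a linear map $\mathcal{G}$, the $\Omega$-minimizing solution of an underdetermined system is the minimal-norm one, computable through the Moore-Penrose pseudoinverse. The $2\times4$ matrix below is a hypothetical example chosen so the computation is transparent.

```python
import numpy as np

# G maps R^4 -> R^2, so G u = y has an affine set of solutions u + ker(G).
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
y = np.array([2.0, 4.0])

# Omega-minimizing solution for Omega = Euclidean norm:
u_dag = np.linalg.pinv(G) @ y                  # minimal-norm solution (1, 2, 1, 2)

# Any other solution, obtained by moving along ker(G), has a larger norm:
kernel_dir = np.array([1.0, 0.0, -1.0, 0.0])   # G @ kernel_dir = 0
u_other = u_dag + kernel_dir

fits_dag = np.allclose(G @ u_dag, y)
fits_other = np.allclose(G @ u_other, y)
```

In the Hilbert setting, the map $y \to u^\dagger$ implemented here by the pseudoinverse is exactly the Moore-Penrose inverse discussed in remark 1 below.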
Finally, item 4 gives the compactness and hence the existence of a limit for any minimizing sequence. For instance, the norm of $\mathcal{U}$ is a penalty functional satisfying these hypotheses, and this particular choice leads to the concept of minimal-norm solutions. Other penalties may be defined; the following power-type family has been particularly studied (chapter 5 in [SKHK12] for instance), with $q\in\,]1,+\infty[$:
$$\Omega(u) = \begin{cases}\dfrac{1}{q}\|u\|_{\mathcal{X}}^q & \text{if } u\in\mathcal{X},\\[4pt] +\infty & \text{otherwise},\end{cases}$$
where $\mathcal{X}$ is a continuously embedded Banach subspace of $\mathcal{U}$.

Example 24 (Penalty from reproducing kernel Hilbert spaces (RKHS)). For every compact topological space $K$, the set of real continuous functions on $K$, equipped with the supremum norm, is a Banach space. Choosing a continuous kernel $k$, the associated RKHS injects continuously (see chapter 1 for a proof).

Now, using the previous properties of penalty functionals, and in the context of reflexive Banach spaces, we can show the existence of $\Omega$-minimizing solutions to problem 3.2.

Proposition 44. Let $\mathcal{U}$ be a reflexive Banach space, $\mathcal{Y}$ a Banach space and $\mathcal{G}\in\mathcal{L}(\mathcal{U},\mathcal{Y})$ a bounded operator; then the set of $\Omega$-minimizing solutions of problem 3.2 is non-empty whenever $y\in R(\mathcal{G})$ and $D(\Omega)\cap\mathcal{G}^{-1}(y)\ne\emptyset$.

Proof. Since $y\in R(\mathcal{G})$ and the set $\mathcal{G}^{-1}(y)\cap D(\Omega)$ is non-empty, it contains a minimizing sequence $(u_n)_{n\in\mathbb{N}}$ such that $\Omega(u_n)\to\inf\{\Omega(u),\ u\in\mathcal{G}^{-1}(y)\}\ge0$. Because every convergent sequence is bounded, there is $k\in\mathbb{R}$ such that $(u_n)_{n\in\mathbb{N}}\subset\{u\in\mathcal{U},\ \Omega(u)\le k\}$. This set being convex and closed, it is weakly closed and even weakly sequentially compact, thus we can extract a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ which converges weakly to some $\tilde u\in\mathcal{U}$. Finally, the lower semi-continuity of $\Omega$ imposes that $\tilde u$ is an $\Omega$-minimizing solution.

The uniqueness is more involved, and can be established using geometrical arguments, such as uniform convexity for instance.

Proposition 45.
Let $\Omega(u) := \frac{1}{q}\|u\|_{\mathcal{U}}^q$ with $q\in[1,+\infty[$ and $\mathcal{U}$ uniformly convex; then there exists a unique $\Omega$-minimizing solution to problem 3.2.

Proof. Suppose $q = 1$ without loss of generality. From the Milman-Pettis theorem, $\mathcal{U}$ is reflexive. Moreover, the norm is proper, convex, continuous (thus lower semi-continuous) and stabilizing, as any closed ball is weakly compact in a reflexive Banach space; thus, from proposition 44, the set of $\Omega$-minimizing solutions is non-empty. Let $u_1^\dagger$, $u_2^\dagger$ be distinct $\Omega$-minimizing solutions (supposed to be of unit norm without loss of generality). Since the set of $\Omega$-minimizing solutions is convex, it contains $\tilde u := \frac{u_1^\dagger + u_2^\dagger}{2}$ as well. However, the uniform convexity implies $\Omega(\tilde u) < \Omega(u_1^\dagger)$, which is a contradiction.

However, even if the $\Omega$-minimizing solution is a singleton, neither the linearity nor the continuity of the resulting map is given by proposition 44. In other words, looking directly for $\Omega$-minimizing solutions may itself be an ill-posed problem. The Tikhonov-Philips regularization method will instead look for approximations of $\Omega$-minimizing solutions.

Remark 1. If $\mathcal{U}$ and $\mathcal{Y}$ are Hilbert spaces and $\Omega$ is the norm of $\mathcal{U}$, the map $\mathcal{G}^\dagger: y\in R(\mathcal{G}) \to u^\dagger\in\mathcal{U}$ is known as the Moore-Penrose inverse [EHN96]. Its domain of definition is $R(\mathcal{G}) + R(\mathcal{G})^\perp$. In particular, if $R(\mathcal{G})$ is closed, $D(\mathcal{G}^\dagger) = \mathcal{Y}$.

Regularized problem. The concept of $\Omega$-minimizing solutions is appealing for its clarity, but it does not provide well-posed inverse problems. Moreover, the data $y$ may not be available, and one must sometimes work with a noise $\delta\in\mathcal{Y}$, thus considering $y^\delta = y + \delta$ instead of $y$. The regularized problem gives an answer to these two issues by considering the following Tikhonov-Philips functional:
$$T_\alpha^\delta: u\in D(\Omega)\cap D(\mathcal{G})\subset\mathcal{U} \to \frac{1}{p}\left\|\mathcal{G}(u) - y^\delta\right\|_{\mathcal{Y}}^p + \alpha\,\Omega(u),$$
where $p\in\,]1,+\infty[$ and $\alpha>0$. This quantity is composed of two terms, balancing proximity to the data against the penalty as $\alpha$ varies.
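For a concrete (hypothetical) illustration, take $\mathcal{U}$, $\mathcal{Y}$ Hilbert, $p = 2$ and $\Omega(u) = \frac{1}{2}\|u\|^2$, in which case the minimizer of $T_\alpha^\delta$ for a linear $\mathcal{G}$ has the closed form $u_\alpha^\delta = (\mathcal{G}^*\mathcal{G} + \alpha I)^{-1}\mathcal{G}^* y^\delta$. Below, $\mathcal{G}$ is a discretized integration operator; the sizes, noise level and choice of $\alpha$ are illustrative assumptions, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
h = 1.0 / n
G = h * np.tril(np.ones((n, n)))               # discretized integration operator

s = np.linspace(h, 1.0, n)
u_true = np.sin(2 * np.pi * s)
y_delta = G @ u_true + 1e-3 * rng.standard_normal(n)   # noisy data y^delta

# Tikhonov-Philips solution for p = 2, Omega(u) = ||u||^2 / 2:
#   u_alpha = (G^T G + alpha I)^{-1} G^T y_delta
alpha = 1e-4
u_alpha = np.linalg.solve(G.T @ G + alpha * np.eye(n), G.T @ y_delta)

# Unregularized inversion (alpha = 0) amplifies the noise:
u_naive = np.linalg.solve(G, y_delta)

err_alpha = np.linalg.norm(u_alpha - u_true) / np.linalg.norm(u_true)
err_naive = np.linalg.norm(u_naive - u_true) / np.linalg.norm(u_true)
```

As $\alpha$ varies, $u_\alpha^\delta$ moves between data fidelity (small $\alpha$, noise-dominated) and penalty (large $\alpha$, over-smoothed), which is exactly the trade-off described above.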
Taking these two requirements into account, a Tikhonov-Philips solution to problem 3.2 can be defined.

Definition 20 (Tikhonov-Philips solution). An element $u_\alpha^\delta\in\mathcal{U}$ is a Tikhonov-Philips solution to problem 3.2 if
$$T_\alpha^\delta(u_\alpha^\delta) = \inf\{T_\alpha^\delta(u),\ u\in\mathcal{U}\}.$$
Remark that $u_\alpha^\delta$ may not be an element of $\mathcal{G}^{-1}(y)$.

Theorem 15. Let $\mathcal{U}$, $\mathcal{Y}$ be reflexive Banach spaces, $\mathcal{G}\in\mathcal{L}(\mathcal{U},\mathcal{Y})$ a bounded operator, $y\in R(\mathcal{G})$ and $y^\delta\in\mathcal{Y}$; then:

1. (Existence of solutions, proposition 4.1 in [SKHK12]): $\forall\alpha>0$, $\forall y^\delta\in\mathcal{Y}$, the set of Tikhonov-Philips solutions is non-empty.

2. (Stability of solutions, proposition 4.2 in [SKHK12]): $\forall\alpha>0$, if $y^{\delta_n}\to y^\delta$, then any sequence $(u_\alpha^{\delta_n})_{n\in\mathbb{N}}$ of Tikhonov-Philips solutions has a subsequence $(u_\alpha^{\delta_{n_k}})_{k\in\mathbb{N}}$ such that $u_\alpha^{\delta_{n_k}}\rightharpoonup u_\alpha^\delta$, where $u_\alpha^\delta$ is a Tikhonov-Philips solution.

3. (Convergence of solutions, theorem 4.3 in [SKHK12]): let $(y^{\delta_n})_{n\in\mathbb{N}}$ be a sequence in $\mathcal{Y}$ such that $y^{\delta_n}\to y$ with $\|\delta_n\|_{\mathcal{Y}}$ converging monotonically to $0$, $(\alpha_n)_{n\in\mathbb{N}}\subset\,]0,+\infty[$ and $(u_{\alpha_n}^{\delta_n})_{n\in\mathbb{N}}$ a sequence of Tikhonov-Philips solutions. If $\alpha_n\to0$ and $\|\delta_n\|_{\mathcal{Y}}^p/\alpha_n\to0$, then $(u_{\alpha_n}^{\delta_n})_{n\in\mathbb{N}}$ has a subsequence converging weakly to an $\Omega$-minimizing solution.

There exists $c_2 > 0$ such that
$$\forall n\in\mathbb{N},\quad l_n(X) \le c_2\,\frac{(1+\log(n))^{\beta+\frac{1}{2}}}{n^\gamma}.$$
This is the case when there exists $c_3 > 0$ such that
$$\forall (s,t)\in[0,1]^2,\ s\ne t,\quad \mathbb{E}\left[(X_s - X_t)^2\right] \le c_3\,|s-t|^{2\gamma}\log^{2\beta}\!\left(\frac{1}{|s-t|}\right).$$
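The last displayed condition is satisfied by the standard Wiener process with $\gamma = \frac{1}{2}$ and $\beta = 0$, since $\mathbb{E}[(W_s - W_t)^2] = |s-t|$, consistently with the rate $l_n(W) \approx \sqrt{\log(n)/n}$ used in chapter 2. A small Monte-Carlo sketch follows; the grid, truncation level, sample size and seed are arbitrary assumptions, the residual is truncated, and the $l$-number is replaced by its natural root-mean-square estimator, so this is only indicative.

```python
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 513)
N = 4096        # truncation level standing in for the full expansion
n_mc = 200      # Monte-Carlo sample size

# Karhunen-Loeve basis of W in L2: h_k(s) = sqrt(2) sin(pi (k + 1/2) s) / (pi (k + 1/2)),
# matching the eigenvalues lambda_k = 1 / (pi^2 (k + 1/2)^2).
k = np.arange(N)
freqs = np.pi * (k + 0.5)
H = np.sqrt(2) * np.sin(np.outer(grid, freqs)) / freqs   # shape (grid points, N)

def l_error(n):
    """Monte-Carlo estimate of l_n: root mean square of the sup norm
    of the (truncated) residual sum_{k >= n} xi_k h_k."""
    xi = rng.standard_normal((n_mc, N - n))
    residual = xi @ H[:, n:].T
    return np.sqrt(np.mean(np.max(np.abs(residual), axis=1) ** 2))

# Ratios against the optimal rate sqrt(log(n)/n) stay bounded:
ratios = [l_error(n) / np.sqrt(np.log(n) / n) for n in (8, 16, 32)]
```

The bounded ratios echo figure 2.9: the Karhunen-Loève sequence tracks the optimal $l$-rate up to a constant.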