Kernel Mean Embedding of Probability Measures and Its Applications to Functional Data Analysis

Saeed Hayati, Kenji Fukumizu, Afshin Parvardeh

November 5, 2020

arXiv:2011.02315v1 [math.ST] 4 Nov 2020

Abstract

This study introduces kernel mean embedding of probability measures over infinite-dimensional separable Hilbert spaces induced by functional response statistical models. The embedded function represents the concentration of probability measures in small open neighborhoods, which identifies a pseudo-likelihood and fosters a rich framework for statistical inference. Utilizing Maximum Mean Discrepancy, we devise new tests in functional response models. The performance of the newly derived tests is evaluated against competitors in three major problems in functional data analysis: function-on-scalar regression, functional one-way ANOVA, and equality of covariance operators.

1 Introduction

Functional response models are among the major problems in the context of Functional Data Analysis. A fundamental issue in dealing with functional response statistical models arises from the lack of practical frameworks for characterizing probability measures on function spaces. This is mainly a consequence of the tremendous gap between how we represent probability measures in finite-dimensional and in infinite-dimensional spaces.

A useful property of finite-dimensional spaces is the existence of a locally finite, strictly positive, and translation-invariant measure such as the Lebesgue or counting measure, which makes it easy to take advantage of probability measures directly in statistical inference.
Fitting a statistical model, estimating parameters, testing hypotheses, deriving confidence regions, and developing goodness-of-fit indices can all be carried out by integrating the distribution or conditional distribution of the response variables as a presumption into statistical procedures.

Sporadic efforts have gone into approximating or representing probability measures on infinite-dimensional spaces. Let $H$ be a separable infinite-dimensional Hilbert space and $X$ be an $H$-valued random element with finite second moment and covariance operator $C$. Delaigle and Hall [5] approximated the probability of $B_r(x) = \{\|X - x\| < r\}$ by the surrogate density of a finite-dimensional approximation of $X$, obtained by projecting the random element $X$ onto the space spanned by the first few eigenfunctions of $C$ with the largest eigenvalues. The approximated small-ball probability is based on the Karhunen-Loève expansion and on the extra assumption that the component scores are independent. The precision of this approximation depends on the volume of the ball and on the probability measure itself.

Let $I$ be a compact subset of $\mathbb{R}$, such as the closed interval $[0, 1]$, and $X$ be a zero-mean $L^2[I]$-valued random element with finite second moment and Karhunen-Loève expansion $X = \sum_{j \ge 1} \lambda_j^{1/2} X_j \psi_j$, in which $X_j = \lambda_j^{-1/2} \langle X, \psi_j \rangle$ and $\{\lambda_j, \psi_j\}_{j \ge 1}$ is the eigensystem of the covariance operator $C$. Suppose that the distribution of $X_j$ is absolutely continuous with respect to the Lebesgue measure with density $f_j$. The approximation of the logarithm of $p(x \mid r) = P(B_r(x)) = P(\{\|X - x\| < r\})$ given by Delaigle and Hall [5] is

$$\log p(x \mid r) = C_1(h, \{\lambda_j\}_{j \ge 1}) + \sum_{j=1}^{h} \log f_j(x_j) + o(h),$$

in which $x_j = \langle x, \psi_j \rangle$, and $h$ is the number of components, which depends on $r$ and tends to infinity as $r$ declines to zero. $C_1(\cdot)$ depends only on the size of the ball and the sequence of eigenvalues, whereas the quantity $o(h)$, the precision of the approximation, depends on $P$.
The quantity $h^{-1} \sum_{j=1}^{h} \log f_j(x_j)$ is called the log-density by Delaigle and Hall [5]. A serious concern with this approximation is its precision, which depends on the probability measure itself. Accordingly, it cannot be employed to compare small-ball probabilities within a family of probability measures. For example, when estimating the parameters of a functional response regression model, the induced probability measure varies with the choice of parameters. Thus this approximation cannot be employed for parameter estimation or for comparing the goodness of fit of different regression models.

Another approach to representing probability measures on a general separable Hilbert space $H$ was presented by Lin et al. [17]. They constructed a dense subspace of $H$ called the Mixture Inner Product Space (MIPS), which is the union of a countable collection of finite-dimensional subspaces of $H$. An approximating version of the given $H$-valued random element lies in this subspace and, in consequence, lies in a finite-dimensional subspace of $H$ according to a given discrete distribution. They defined a base measure on the MIPS, which is not translation-invariant, and introduced density functions for MIPS-valued random elements.

The absence of a proper method for representing probability measures over infinite-dimensional spaces has caused severe problems for statistical inference. As an example, Greven et al. [9] developed a general framework for functional additive mixed-effect regression models. They considered a log-likelihood function obtained by summing the log-likelihood of the response functions $Y_i$ over a grid of time points $t_{id}$, $d = 1, \ldots, D_i$, assuming the $Y_i(t_{id})$ to be independent within the grid of time points. A simulation study by Kokoszka and Reimherr [16] revealed the weak performance of the proposed framework in statistical hypothesis testing in a simple Gaussian function-on-scalar linear regression problem.

Currently, MLE and other density-based methods are out of reach in the context of functional response models. In this study, we follow a different path by identifying probability measures with their kernel mean functions and introduce a framework for statistical inference in infinite-dimensional spaces. A promising fact about kernel mean functions, which is shown in this paper, is their ability to reflect the concentration of probability measures in small open neighborhoods in a way that, unlike the approach of Delaigle and Hall [5], is comparable among different probability measures. This property of the kernel mean function motivates us to use it for fitting statistical models and for introducing new statistical tests in the context of functional data analysis.

This paper is organized as follows. In Section 2, kernel mean embedding of probability measures over infinite-dimensional separable Hilbert spaces is discussed. In Section 3, the Maximum Kernel Mean estimation method is introduced and estimators for Gaussian response regression models are derived. In Section 4, new statistical tests are developed for three major problems in functional data analysis, and their performance is evaluated using simulation studies. Section 5 is devoted to discussion and conclusion. Major proofs are collected in the appendix.

2 Kernel mean embedding of probability measures

We summarize the basics of kernel mean embedding; see Muandet et al. [20] for a general reference. Let $(H, \mathcal{B}(H), P)$ be a probability measure space. Throughout this study, $H$ is an infinite-dimensional separable Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle_H$. A function $k : H \times H \to \mathbb{R}$ is a positive definite kernel if it is symmetric, i.e., $k(x, y) = k(y, x)$, and $\sum_{i,j=1}^{n} a_i a_j k(x_i, x_j) \ge 0$ for all $n \in \mathbb{N}$, $a_i \in \mathbb{R}$, and $x_i \in H$. $k$ is strictly positive definite if, for mutually distinct $x_1, \ldots, x_n$, equality implies $a_1 = a_2 = \cdots = a_n = 0$.
$k$ is said to be integrally strictly positive definite if $\iint k(x, y) \, \mu(dx) \, \mu(dy) > 0$ for any non-zero finite signed measure $\mu$ defined over $(H, \mathcal{B}(H))$. Any integrally strictly positive definite kernel is strictly positive definite, while the converse is not true [26].

A positive definite kernel induces a Hilbert space of functions over $H$, which is called the Reproducing Kernel Hilbert Space (RKHS) and equals $\mathcal{H}_k = \overline{\mathrm{span}}\{k(x, \cdot) : x \in H\}$ with inner product

$$\Big\langle \sum_{i \ge 1} a_i k(x_i, \cdot), \sum_{j \ge 1} b_j k(y_j, \cdot) \Big\rangle_{\mathcal{H}_k} = \sum_{i \ge 1} \sum_{j \ge 1} a_i b_j k(x_i, y_j).$$

For each $f \in \mathcal{H}_k$ and $x \in H$ we have $f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}_k}$, which is the reproducing property of the kernel $k$. A strictly positive definite kernel $k$ is said to be characteristic for a family of measures $\mathcal{P}$ if the map

$$m : \mathcal{P} \to \mathcal{H}_k, \qquad P \mapsto \int k(x, \cdot) \, P(dx)$$

is injective. If $E_P\big(\sqrt{k(X, X)}\big) < \infty$, then $m_P(\cdot) := (m(P))(\cdot)$ exists in $\mathcal{H}_k$ [20], and the function $m_P(\cdot) = \int k(x, \cdot) \, P(dx)$ is called the kernel mean function. Moreover, for any $f \in \mathcal{H}_k$ we have $E_P[f(X)] = \langle f, m_P \rangle_{\mathcal{H}_k}$ [25]. Thus, if the kernel $k$ is characteristic, then every probability measure defined over $(H, \mathcal{B}(H))$ is uniquely identified by an element $m_P$ of $\mathcal{H}_k$, and the Maximum Mean Discrepancy (MMD), defined as

$$\mathrm{MMD}(\mathcal{H}_k, P, Q) = \sup_{f \in \mathcal{H}_k, \, \|f\|_{\mathcal{H}_k} \le 1} \left\{ \int f(x) \, P(dx) - \int f(x) \, Q(dx) \right\} = \sup_{f \in \mathcal{H}_k, \, \|f\|_{\mathcal{H}_k} \le 1} \langle f, m_P - m_Q \rangle_{\mathcal{H}_k} = \|m_P - m_Q\|_{\mathcal{H}_k}, \qquad (1)$$

is a metric on the family of measures $\mathcal{P}$ over $H$ [20].

A similar quantity called Ball divergence is proposed by Pan et al. [22] to distinguish probability measures defined over separable Banach spaces. In the case of infinite-dimensional spaces, Ball divergence distinguishes two probability measures if at least one of them has full support, that is, $\mathrm{Supp}(P) = H$. They employed Ball divergence for a two-sample test; according to their simulation results, the performance of MMD and Ball divergence is close and superior to that of other tests.

Kernel mean functions can also be used to reflect the concentration of probability measures in small balls, if the kernel function is translation-invariant.
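
As a rough illustration of the kernel mean function and of the MMD in (1), the following Python sketch computes plug-in estimates from two samples of discretized curves. Several choices here are assumptions made only for illustration and are not taken from the paper: the Gaussian kernel $k(x, y) = \exp(-\sigma \|x - y\|^2)$ (one example of a translation-invariant kernel), the equally spaced grid on $[0, 1]$ with the $L^2$ norm approximated by a Riemann sum, the simple biased plug-in estimator, and all function names (`sq_l2_dists`, `gaussian_gram`, `kernel_mean`, `mmd`).

```python
import numpy as np


def sq_l2_dists(X, Y, dt):
    """Squared L2[0, 1] distances between curves (rows of X vs. rows of Y).

    The integral is approximated by a Riemann sum with grid spacing dt.
    """
    diff = X[:, None, :] - Y[None, :, :]      # shape (n, m, T)
    return np.sum(diff ** 2, axis=2) * dt     # shape (n, m)


def gaussian_gram(X, Y, dt, sigma=1.0):
    """Gram matrix of the assumed kernel k(x, y) = exp(-sigma * ||x - y||^2)."""
    return np.exp(-sigma * sq_l2_dists(X, Y, dt))


def kernel_mean(X, dt, sigma=1.0):
    """Empirical kernel mean function y -> (1/n) * sum_i k(x_i, y)."""
    def m_hat(Y):
        Y = np.atleast_2d(Y)                  # accept one curve or several
        return gaussian_gram(X, Y, dt, sigma).mean(axis=0)
    return m_hat


def mmd(X, Y, dt, sigma=1.0):
    """Biased plug-in estimate of ||m_P - m_Q|| in the RKHS, as in (1)."""
    kxx = gaussian_gram(X, X, dt, sigma).mean()
    kyy = gaussian_gram(Y, Y, dt, sigma).mean()
    kxy = gaussian_gram(X, Y, dt, sigma).mean()
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


# Toy data (hypothetical): random-walk curves on T grid points,
# the second sample shifted by the constant mean function 2.
rng = np.random.default_rng(0)
n, T = 40, 50
dt = 1.0 / T
X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, T)), axis=1)
Y = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, T)), axis=1) + 2.0

print("MMD(X, Y):", mmd(X, Y, dt))
print("kernel mean at the zero curve:", kernel_mean(X, dt)(np.zeros(T)))
```

The estimate uses the identity $\|m_P - m_Q\|^2_{\mathcal{H}_k} = E\,k(X, X') + E\,k(Y, Y') - 2E\,k(X, Y)$, which follows from expanding the norm in (1) with the reproducing property, and replaces each expectation by a sample average. Other estimators (for example an unbiased U-statistic) and other kernels would fit the same pattern.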