Kernel Mean Embedding of Probability Measures and Its Applications to Functional Data Analysis
Saeed Hayati          Kenji Fukumizu          Afshin Parvardeh
[email protected]    [email protected]         [email protected]

November 5, 2020

Abstract

This study introduces the kernel mean embedding of probability measures over infinite-dimensional separable Hilbert spaces induced by functional response statistical models. The embedded function represents the concentration of probability measures in small open neighborhoods, which identifies a pseudo-likelihood and fosters a rich framework for statistical inference. Utilizing the Maximum Mean Discrepancy, we devise new tests for functional response models. The performance of the new tests is evaluated against competitors on three major problems in functional data analysis: function-on-scalar regression, functional one-way ANOVA, and equality of covariance operators.

1 Introduction

Functional response models are among the major problems in the context of functional data analysis. A fundamental issue in dealing with functional response statistical models is the lack of a practical framework for characterizing probability measures on function spaces. This is mainly a consequence of the substantial gap between how probability measures are represented in finite-dimensional and in infinite-dimensional spaces.

A useful property of finite-dimensional spaces is the existence of a locally finite, strictly positive, and translation-invariant measure, such as the Lebesgue or counting measure, which makes it possible to use probability measures directly in statistical inference. Fitting a statistical model, estimating parameters, testing hypotheses, deriving confidence regions, and developing goodness-of-fit indices can all be carried out by integrating the distribution, or conditional distribution, of the response variables into the statistical procedure.

Sporadic efforts have gone into approximating or representing probability measures on infinite-dimensional spaces. Let $\mathcal{H}$ be a separable infinite-dimensional Hilbert space and $X$ an $\mathcal{H}$-valued random element with finite second moment and covariance operator $C$. Delaigle and Hall [5] approximated the probability of $B_r(x) = \{\| X - x \| < r\}$ by the surrogate density of a finite-dimensional approximation of $X$, obtained by projecting the random element $X$ onto the space spanned by the first few eigenfunctions of $C$ with the largest eigenvalues. The approximated small-ball probability rests on the Karhunen-Loève expansion together with the extra assumption that the component scores are independent. The precision of this approximation depends on the volume of the ball and on the probability measure itself.

Let $I$ be a compact subset of $\mathbb{R}$, such as the closed interval $[0,1]$, and let $X$ be a zero-mean $L^2[I]$-valued random element with finite second moment and Karhunen-Loève expansion $X = \sum_{j \geq 1} \lambda_j^{1/2} X_j \phi_j$, in which $X_j = \lambda_j^{-1/2} \langle X, \phi_j \rangle$ and $\{\lambda_j, \phi_j\}_{j \geq 1}$ is the eigensystem of the covariance operator $C$. Suppose that the distribution of $X_j$ is absolutely continuous with respect to the Lebesgue measure, with density $f_j$. The approximation of the logarithm of $p(x \mid r) = P(B_r(x)) = P(\{\| X - x \| < r\})$ given by Delaigle and Hall [5] is

    $$\log p(x \mid r) = C_1\big(h, \{\lambda_j\}_{j \geq 1}\big) + \sum_{j=1}^{h} \log f_j(x_j) + o(h),$$

in which $x_j = \langle x, \phi_j \rangle$, and $h$ is the number of components, which depends on $r$ and tends to infinity as $r$ declines to zero. $C_1(\cdot)$ depends only on the size of the ball and the sequence of eigenvalues, whereas the $o(h)$ term, the precision of the approximation, depends on $P$. The quantity $h^{-1} \sum_{j=1}^{h} \log f_j(x_j)$ is called the log-density by Delaigle and Hall [5]; a minimal sketch of how this quantity can be estimated from data is given below.
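The following sketch (ours, not from Delaigle and Hall [5]) illustrates one way to estimate the log-density from discretized curves: a functional PCA via the empirical covariance yields the component scores, and Gaussian kernel density estimates stand in for the unknown score densities $f_j$. The choice of $h$, the use of unstandardized scores, and the omission of quadrature weights are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def dh_log_density(curves, x, h=5):
    """Sketch of the Delaigle-Hall log-density h^{-1} sum_{j<=h} log f_j(x_j).

    curves : (n, T) array of discretized curves (one curve per row)
    x      : (T,) discretized curve at which the log-density is evaluated
    h      : number of retained components (illustrative choice)

    Inner products are plain dot products, i.e. an equally spaced grid
    with unit weights is assumed; quadrature weights are omitted.
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Eigendecomposition of the empirical covariance matrix; its leading
    # eigenvectors play the role of the eigenfunctions phi_j of C.
    cov = centered.T @ centered / len(curves)
    eigvals, eigvecs = np.linalg.eigh(cov)
    phi = eigvecs[:, np.argsort(eigvals)[::-1][:h]]   # (T, h)
    scores = centered @ phi                           # <X_i, phi_j>, shape (n, h)
    x_scores = (x - mean) @ phi                       # x_j = <x, phi_j>, shape (h,)
    # Kernel density estimates stand in for the unknown score densities f_j.
    log_f = [np.log(gaussian_kde(scores[:, j])(x_scores[j])[0])
             for j in range(h)]
    return np.mean(log_f)
```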
A serious concern with this approximation is its precision, which depends on the probability measure itself. Accordingly, it cannot be employed to compare small-ball probabilities across a family of probability measures. For example, when estimating the parameters of a functional response regression model, the induced probability measure varies with different choices of the parameters; thus this approximation can be used neither for parameter estimation nor for comparing the goodness of fit of different regression models.

Another approach to representing probability measures on a general separable Hilbert space $\mathcal{H}$ was presented by Lin et al. [17]. They constructed a dense subspace of $\mathcal{H}$ called the Mixture Inner Product Space (MIPS), which is the union of a countable collection of finite-dimensional subspaces of $\mathcal{H}$. An approximating version of the given $\mathcal{H}$-valued random element lies in this subspace and, in consequence, lies in a finite-dimensional subspace of $\mathcal{H}$ according to a given discrete distribution. They defined a base measure on the MIPS, which is not translation-invariant, and introduced density functions for MIPS-valued random elements.

The absence of a proper method for representing probability measures over infinite-dimensional spaces causes severe problems for statistical inference. As an illustration, Greven et al. [9] developed a general framework for functional additive mixed-effect regression models. They considered a log-likelihood function obtained by summing the log-likelihoods of the response functions $Y_i$ over a grid of time points $t_{id}$, $d = 1, \dots, D_i$, assuming the $Y_i(t_{id})$ to be independent within the grid of time points (a minimal sketch of this pointwise criterion is given below). A simulation study by Kokoszka and Reimherr [16] revealed the weak performance of this framework for statistical hypothesis testing in a simple Gaussian function-on-scalar linear regression problem.
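The following sketch (ours, not code from Greven et al. [9]) makes the criticized working criterion explicit: a Gaussian log-likelihood summed over the grid as if the within-curve values were independent. The arguments `mu` and `sigma` are hypothetical fitted-model outputs.

```python
import numpy as np
from scipy.stats import norm

def pointwise_gaussian_loglik(y, mu, sigma):
    """Working log-likelihood that treats Y_i(t_id) as independent
    across the grid: sum_i sum_d log N(y_i(t_id); mu_i(t_id), sigma^2).

    y, mu : (n, D) arrays of observed and fitted curve values on the grid
    sigma : residual standard deviation (assumed constant for simplicity)
    """
    return norm.logpdf(y, loc=mu, scale=sigma).sum()
```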
Currently, MLE and other density-based methods are out of reach in the context of functional response models. In this study, we follow a different path: we identify probability measures with their kernel mean functions and introduce a framework for statistical inference in infinite-dimensional spaces. A promising fact about kernel mean functions, which is shown in this paper, is their ability to reflect the concentration of probability measures in small open neighborhoods in a way that, unlike the approach of Delaigle and Hall [5], is comparable among different probability measures. This property of the kernel mean function motivates us to use it for fitting statistical models and introducing new statistical tests in the context of functional data analysis.

This paper is organized as follows. In Section 2, the kernel mean embedding of probability measures over infinite-dimensional separable Hilbert spaces is discussed. In Section 3, the Maximum Kernel Mean estimation method is introduced and estimators for Gaussian response regression models are derived. In Section 4, new statistical tests are developed for three major problems in functional data analysis, and their performance is evaluated using simulation studies. Section 5 is devoted to discussion and conclusion. Major proofs are aggregated in the appendix.

2 Kernel mean embedding of probability measures

We summarize the basics of kernel mean embedding; see Muandet et al. [20] for a general reference. Let $(\mathcal{H}, \mathcal{B}(\mathcal{H}), P)$ be a probability measure space. Throughout this study, $\mathcal{H}$ is an infinite-dimensional separable Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. A function $k \colon \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ is a positive definite kernel if it is symmetric, i.e., $k(x, y) = k(y, x)$, and $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j) \geq 0$ for all $n \in \mathbb{N}$, $a_i \in \mathbb{R}$, and $x_i \in \mathcal{H}$. The kernel $k$ is strictly positive definite if equality implies $a_1 = a_2 = \dots = a_n = 0$. It is said to be integrally strictly positive definite if $\int k(x, y)\, \mu(dx)\, \mu(dy) > 0$ for any non-zero finite signed measure $\mu$ defined over $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$. Any integrally strictly positive definite kernel is strictly positive definite, while the converse is not true [26]. A positive definite kernel induces a Hilbert space of functions over $\mathcal{H}$, called the Reproducing Kernel Hilbert Space (RKHS), which equals $\mathcal{H}_k = \overline{\mathrm{span}}\{k(x, \cdot) : x \in \mathcal{H}\}$ with inner product

    $$\Big\langle \sum_{i \geq 1} a_i k(x_i, \cdot), \sum_{j \geq 1} b_j k(y_j, \cdot) \Big\rangle_{\mathcal{H}_k} = \sum_{i \geq 1} \sum_{j \geq 1} a_i b_j k(x_i, y_j).$$

For each $f \in \mathcal{H}_k$ and $x \in \mathcal{H}$ we have $f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}_k}$, which is the reproducing property of the kernel $k$. A strictly positive definite kernel $k$ is said to be characteristic for a family of measures $\mathcal{P}$ if the map

    $$m \colon \mathcal{P} \to \mathcal{H}_k, \qquad P \mapsto \int k(x, \cdot)\, P(dx)$$

is injective. If $E_P\big(\sqrt{k(X, X)}\big) < \infty$, then $m_P(\cdot) := (m(P))(\cdot)$ exists in $\mathcal{H}_k$ [20], and the function $m_P(\cdot) = \int k(x, \cdot)\, P(dx)$ is called the kernel mean function. Moreover, for any $f \in \mathcal{H}_k$ we have $E_P[f(X)] = \langle f, m_P \rangle_{\mathcal{H}_k}$ [25]. Thus, if the kernel $k$ is characteristic, then every probability measure defined over $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$ is uniquely identified by the element $m_P$ of $\mathcal{H}_k$, and the Maximum Mean Discrepancy (MMD), defined as

    $$\mathrm{MMD}(\mathcal{H}_k, P, Q) = \sup_{f \in \mathcal{H}_k,\, \|f\|_{\mathcal{H}_k} \leq 1} \left( \int f(x)\, P(dx) - \int f(x)\, Q(dx) \right) = \sup_{f \in \mathcal{H}_k,\, \|f\|_{\mathcal{H}_k} \leq 1} \langle f, m_P - m_Q \rangle = \| m_P - m_Q \|_{\mathcal{H}_k}, \qquad (1)$$

is a metric on the family of measures $\mathcal{P}$ over $\mathcal{H}$ [20].

A similar quantity, called the Ball divergence, was proposed by Pan et al. [22] to distinguish probability measures defined over separable Banach spaces. In the case of infinite-dimensional spaces, the Ball divergence distinguishes two probability measures if at least one of them possesses full support, that is, $\mathrm{Supp}(P) = \mathcal{H}$. They employed the Ball divergence for a two-sample test; according to their simulation results, the performances of MMD and the Ball divergence are close, and both are superior to other tests.

Kernel mean functions can also be used to reflect the concentration of probability measures in small balls, provided the kernel function is translation-invariant.
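As a concrete illustration (our sketch, not code from the paper), the squared MMD in (1) has the well-known expansion $\mathrm{MMD}^2(P, Q) = E\,k(X, X') - 2\,E\,k(X, Y) + E\,k(Y, Y')$, which can be estimated from two samples of discretized curves with a translation-invariant kernel such as the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. The bandwidth and sample sizes below are illustrative assumptions, and quadrature weights for the $L^2$ norm are omitted (an equally spaced grid is assumed).

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)),
    with curves discretized as the rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2(P, Q) =
    E k(X,X') - 2 E k(X,Y) + E k(Y,Y') from samples X ~ P, Y ~ Q."""
    return (gaussian_gram(X, X, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean())

# Usage: two samples of 50 curves observed on a grid of 100 points;
# the second sample is shifted, so the estimate should be clearly positive.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
Y = rng.standard_normal((50, 100)) + 0.5
print(mmd2_biased(X, Y, sigma=5.0))
```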