Diffussion Kernels on Statistical Manifold

Diffusion Kernels on Statistical Manifolds Juan Carlos Arango Parra September 29, 2017 Advisers: PhD Gabriel Ignacio Loaiza Ossa PhD Carlos Alberto Cadavid Moreno Doctoral Seminar 1 Universidad EAFIT Department of Mathematical Sciences PhD in Mathematical Engineering 2017 Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Objective of the article Lafferty and Lebanon propose in their article Diffusion Kernels on Statis- tical Manifold (Laferty and Lebanon, 2005) the following way of work Classification of Data (SVM) Linear Gaussian Data set Mercer Kernels Classic look Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Objective of the article Lafferty and Lebanon propose in their article Diffusion Kernels on Statis- tical Manifold (Laferty and Lebanon, 2005) the following way of work Classification of Data (SVM) Linear Gaussian Data set Mercer Kernels Classic look Probability distributions Heat Kernel families Information Manifold Fisher Heat Equation Information metric or Difussion New look Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Support Vector Machine - SVM Mathematically, a support vector machine builds a hyperplane that can be used to do classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the greatest distance to the closest training data points of any class (called functional margin), since in general, the larger the margin, the less generalization error of the classifier. (Cárdenas, 2015; Scikit Learn, 2017). Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold What is Kernel in mathematics? Let X be non-empty set. A symmetric function K : X X R is called positive definite Kernel in X if × → n m ci cj K(xi , xj ) 0 ≥ i 1 j 1 X= X= + is verified for any n, m N, each xi X and ci R . Examples of Kernels are ∈ ∈ ∈ 1 Linear: K(x, y)= x T y, where x, y Rd . n ∈ 2 Polynomial: K(x, y)= x T y + r , where x, y Rd and r > 0. ∈ x y 2 3 Gaussian (RBF): K(x, y)= exp k − 2k . − 2σ 4 Sigmoid: K(x, y)= tanh αx T y+ r where α and r are constant. Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example SVM Feature space Data Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example SVM Feature space Decision boundary Support vector Data Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example SVM Feature space Decision boundary Support vector Data Margin Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example SVM Feature space Decision boundary Support vector Data Margin Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example SVM Feature space Decision boundary Support vector Data Margin Figure 1: Ideas about the operation of SVM Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Riemannian manifold Let M be a topological space. A chart in M consists of a pair (U, φ), where U is an open set in M and φ is a bijection of U to some open set A of Rn. Given a set of indexes I , a collection of charts on M = (Ui , φi ): i I A { ∈ } p where φi : Ui Ai is called a C -atlas with p 0, when the following conditions are→ verified ≥ 1 the family = Ui : i I is a covering of , F { ∈ } M 2 for all i, j I , φi (Ui Uj ) is an open set, ∈ 3 i j I U U for all , , with Ti j = , the mapping (called transition function) ∈ 6 ∅ T 1 φj φ− : φi (Ui Uj ) φj (Ui Uj ) , ◦ i ∩ → ∩ is an isomorphism of class C p. If the transition function are differentiable then the manifold is differentiable (Loaiza and Quiceno, 2013) Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Rn Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Ui p b Rn Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Ui p b Rn φi b Ai u Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Ui p b 1 φi− Rn φi b Ai u Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Ui Uj p b 1 φ− φj i 1 Rn φi φj− Rn b Aj Aj Ai u Ai b u Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold M Ui Uj p b 1 φ− φj i 1 Rn φi φj− Rn b Aj Aj Ai u Ai b u 1 φi (Ui Uj ) φj φ− φj (Ui Uj ) ∩ ◦ i ∩ Figure 2: Topological Manifold Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Riemannian Metric A Riemannian metric on a differentiable manifold M is a mapping that assigns each point p of the variety M an inner product gp on the tangent space TpM; thus, for every two vector fields Xp, Yp en M, the function gp(Xp, Yp)= Xp, Yp p is smooth. A M variety provided with a Rieman- nian metric ish called Riemanniani Manifold. The metric does not depend on the choice of the local coordinates, in addition it can be expressed n ∂ gp(Xp, Yp)= vi wj ∂i p, ∂j p with ∂i = h | | i ∂xi i,j X gij | {z } where vi y wj are the coefficients in the representation of Xp y Yp in the canonical basis of the tangent space Tp(M) given by ∂1 p, . , ∂n p . The { | | } terms gij (p) represent the entries of the matrix g(p) which is symmetric and definite positive. The existence (always exists at least one) of a metric allows defining the length of vectors (of curves) . The curve for which the shortest distance is presented is called geodesic (Burns and Gidea, 2005). Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example of a Riemannian manifold: spherical surface p b M Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example of a Riemannian manifold: spherical surface p b Yp TpM Xp M Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Example of a Riemannian manifold: spherical surface p b Yp TpM Xp b M Q Geodesic = PQ = arc of a great circle d Figure 3: Space tangent and geodesic in a sphere Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Laplacian Operator Let (M, g) a Riemannian manifold. For any smooth function f over M, the gradient is defined as the vector field grad(f ) in T (M) that satisfies grad(f ), X g = X (f ) for all X T (M), in local coordinates, the gradient h i ij∈∂f ij is written as (grad(f ))i = g where g are the components of the ∂xj j inverse of the matrix g =[gPij ], then n ij ∂f graf (f )= g ∂i . ∂xj i j 1 X, = (graf (f ))i In these same local coordinates, the divergent| {z } of X is written 1 ∂ div(X )= detgXi . ∂xi det g i X p The Laplacian or Laplace-Beltramip operator on (M, g) of a smooth func- tion1 f : M R is defined as → 1 ∂ ij ∂f ∆g f = div(grad(f )) = g detg . ∂xj ∂xi det g j i ! X X p Juan Carlos Arangop Parra Diffussion Kernels on Statistical Manifold Heat Equation ∂f The heat equation on (M, g) is the partial differential equation ∂t = ∆g f . A solution to the problem with initial condition ∂f ∂t = ∆g f 2 (1) f ( , 0)= f0 L (M) ( · ∈ is a continuous function f : M [0, ) R denoted f (x, t) which de- scribes the temperature at a point× x∞at time→ t beginning with an initial heat distribution described by the initial condition f (x, 0). This solution is such that for each fixed t > 0, f ( , t) is a function C 2, and for each x M, f (x, ) is C 1 (Cadavid and Vélez,· 2014). ∈ · Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Heat Kernel The Heat Kernel Kt (x, y) is the solution to the initial condition heat equation given by the Dirac delta function δx . This Heat Kernel allows to describe the other solutions of the heat equation by means of convolution as f (x, t)= Kt (x, y)f0(y)dy . ZM It also satisfies the following properties 1 Kt (x, y)= Kt (y, x) (Symmetric) ∂ 2 ∆ Kt (x, y)= 0 (Solution of the Heat Equation) − ∂t 3 Kt (x, y)= Kt s (x, z)Ks (z, y)dz for any s > 0 (Definite positive) M − According to theseR properties, the Heat Kernel is also a Mercer Kernel. Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold In the case of a n-dimensional flat Euclidean space, the Heat Kernel has the form 1 x y 2 1 d 2(x, y) Kt (x, y)= exp k − k = exp (4πt)n/2 − 4t (4πt)n/2 − 4t 2 exp kx−yk − 2σ2 | {z } where x y 2 is the square of the Euclidean distance between points x and y.k The− parametrixk expansion approximates the heat kernel locally as a correction to this Euclidean heat Kernel, is written 2 m 1 d (x, y) m P (x, y)= exp (Ψ0(x, y)+ ... +Ψm(x, y)t ) t (4πt)n/2 − 4t 2 for currently unspecified functions Ψk (x, y), where d (x, y) denote the square of the geodesic distance on the manifold. Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Interpretation of the Heat Equation using the Dirac Delta x b b y Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Interpretation of the Heat Equation using the Dirac Delta δx f (x, 0)= f0(x) x b b y Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Interpretation of the Heat Equation using the Dirac Delta δx f (x, 0)= f0(x) x b Kt (x, y) bb y Figure 4: Heat Kernel Juan Carlos Arango Parra Diffussion Kernels on Statistical Manifold Particular cases 1 Multinomial Simplex.

Diffussion Kernels on Statistical Manifold

Information Geometry (Part 1)

Statistical Manifold, Exponential Family, Autoparallel Submanifold

Exponential Families: Dually-Flat, Hessian and Legendre Structures

Machine Learning on Statistical Manifold Bo Zhang Harvey Mudd College

Exponential Statistical Manifold

Information Geometry and Its Applications: an Overview

An Elementary Introduction to Information Geometry

Low-Dimensional Statistical Manifold Embed- Dingofdirectedgraphs

Diffusion Kernels on Statistical Manifolds

An Information Geometric Perspective on Active Learning

Pseudo-Elliptic Geometry of a Class of Frobenius-Manifolds & Maurer

Symplectic Structures on Statistical Manifolds