Diffusion Kernels on Statistical Manifolds

Total Pages: 16

File Type: PDF, Size: 1020 KB

Diffusion Kernels on Statistical Manifolds

Juan Carlos Arango Parra, September 29, 2017.
Advisers: PhD Gabriel Ignacio Loaiza Ossa and PhD Carlos Alberto Cadavid Moreno.
Doctoral Seminar 1, Universidad EAFIT, Department of Mathematical Sciences, PhD in Mathematical Engineering, 2017.

Objective of the article

Lafferty and Lebanon propose in their article Diffusion Kernels on Statistical Manifolds (Lafferty and Lebanon, 2005) the following way of working.

Classic look: a data set is passed through a Mercer kernel (linear, Gaussian) and then to classification of the data with an SVM.

New look: the data set is modeled by families of probability distributions, which form an information manifold equipped with the Fisher information metric; the heat equation (diffusion) on this manifold yields a heat kernel that takes the place of the Mercer kernel.

Support Vector Machine - SVM

Mathematically, a support vector machine builds a hyperplane that can be used to do classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the greatest distance to the closest training data points of any class (called the functional margin), since in general, the larger the margin, the lower the generalization error of the classifier (Cárdenas, 2015; Scikit Learn, 2017).

What is a Kernel in mathematics?

Let $X$ be a non-empty set. A symmetric function $K : X \times X \to \mathbb{R}$ is called a positive definite kernel on $X$ if
$$\sum_{i=1}^{n}\sum_{j=1}^{m} c_i c_j K(x_i, x_j) \geq 0$$
is verified for any $n, m \in \mathbb{N}$, each $x_i \in X$ and $c_i \in \mathbb{R}^{+}$. Examples of kernels are

1. Linear: $K(x, y) = x^T y$, where $x, y \in \mathbb{R}^d$.
2. Polynomial: $K(x, y) = \left(x^T y + r\right)^n$, where $x, y \in \mathbb{R}^d$ and $r > 0$.
3. Gaussian (RBF): $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$.
4. Sigmoid: $K(x, y) = \tanh\left(\alpha x^T y + r\right)$, where $\alpha$ and $r$ are constants.

A short numerical sketch of these kernels is given after Figure 1.

Example SVM

Figure 1: Ideas about the operation of SVM (feature space with the data, decision boundary, support vectors and margin).
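The following sketch (plain NumPy; the function names are ours, not taken from any particular library) evaluates the four kernels listed above and checks that the Gram matrix of the Gaussian kernel is positive semidefinite, which is the Mercer condition required for use inside an SVM.

```python
import numpy as np

# Minimal sketches of the kernels listed above; names are illustrative.

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, r=1.0, n=2):
    return (x @ y + r) ** n

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, alpha=0.1, r=0.0):
    return np.tanh(alpha * (x @ y) + r)

def gram_matrix(kernel, points):
    """Gram matrix K_ij = K(x_i, x_j); positive semidefinite for a Mercer kernel."""
    n = len(points)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(points[i], points[j])
    return K

# Positive definiteness check: all eigenvalues of the Gaussian Gram matrix
# are >= 0 (up to numerical tolerance).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = gram_matrix(gaussian_kernel, X)
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True
```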
Riemannian manifold

Let $M$ be a topological space. A chart on $M$ consists of a pair $(U, \phi)$, where $U$ is an open set in $M$ and $\phi$ is a bijection from $U$ onto some open set $A$ of $\mathbb{R}^n$. Given an index set $I$, a collection of charts on $M$, $\mathcal{A} = \{(U_i, \phi_i) : i \in I\}$ where $\phi_i : U_i \to A_i$, is called a $C^p$-atlas, with $p \geq 0$, when the following conditions are verified:

1. the family $\mathcal{F} = \{U_i : i \in I\}$ is a covering of $M$,
2. for all $i, j \in I$, $\phi_i(U_i \cap U_j)$ is an open set,
3. for all $i, j \in I$ with $U_i \cap U_j \neq \emptyset$, the mapping (called transition function)
$$\phi_j \circ \phi_i^{-1} : \phi_i(U_i \cap U_j) \to \phi_j(U_i \cap U_j)$$
is an isomorphism of class $C^p$.

If the transition functions are differentiable, then the manifold is differentiable (Loaiza and Quiceno, 2013).

Figure 2: Topological manifold (charts $\phi_i : U_i \to A_i$ and $\phi_j : U_j \to A_j$ into $\mathbb{R}^n$, and the transition function $\phi_j \circ \phi_i^{-1} : \phi_i(U_i \cap U_j) \to \phi_j(U_i \cap U_j)$ on the overlap).

Riemannian Metric

A Riemannian metric on a differentiable manifold $M$ is a mapping that assigns to each point $p$ of the manifold $M$ an inner product $g_p$ on the tangent space $T_pM$; thus, for every two vector fields $X_p, Y_p$ on $M$, the function $g_p(X_p, Y_p) = \langle X_p, Y_p \rangle_p$ is smooth. A manifold $M$ provided with a Riemannian metric is called a Riemannian manifold. The metric does not depend on the choice of local coordinates, and it can be expressed as
$$g_p(X_p, Y_p) = \sum_{i,j=1}^{n} v_i w_j \,\langle \partial_i|_p, \partial_j|_p \rangle, \qquad \partial_i = \frac{\partial}{\partial x_i},$$
where $v_i$ and $w_j$ are the coefficients in the representation of $X_p$ and $Y_p$ in the canonical basis $\{\partial_1|_p, \ldots, \partial_n|_p\}$ of the tangent space $T_p(M)$. The terms $g_{ij}(p) = \langle \partial_i|_p, \partial_j|_p \rangle$ are the entries of the matrix $g(p)$, which is symmetric and positive definite. The existence of a metric (at least one always exists) allows defining the length of vectors and of curves. The curve along which the shortest distance is attained is called a geodesic (Burns and Gidea, 2005).

Example of a Riemannian manifold: spherical surface

Figure 3: Tangent space $T_pM$, with vectors $X_p, Y_p$, and a geodesic on a sphere (the geodesic joining $P$ and $Q$ is an arc of a great circle).

Laplacian Operator

Let $(M, g)$ be a Riemannian manifold. For any smooth function $f$ on $M$, the gradient is defined as the vector field $\operatorname{grad}(f)$ in $T(M)$ that satisfies $\langle \operatorname{grad}(f), X \rangle_g = X(f)$ for all $X \in T(M)$. In local coordinates, the gradient is written as
$$(\operatorname{grad}(f))^i = \sum_j g^{ij} \frac{\partial f}{\partial x_j},$$
where $g^{ij}$ are the components of the inverse of the matrix $g = [g_{ij}]$, so that
$$\operatorname{grad}(f) = \sum_{i,j=1}^{n} g^{ij} \frac{\partial f}{\partial x_j}\, \partial_i.$$
In these same local coordinates, the divergence of $X$ is written
$$\operatorname{div}(X) = \frac{1}{\sqrt{\det g}} \sum_i \frac{\partial}{\partial x_i}\left(\sqrt{\det g}\; X^i\right).$$
The Laplacian or Laplace-Beltrami operator on $(M, g)$ of a smooth function $f : M \to \mathbb{R}$ is defined as
$$\Delta_g f = \operatorname{div}(\operatorname{grad}(f)) = \frac{1}{\sqrt{\det g}} \sum_j \frac{\partial}{\partial x_j}\left(\sum_i g^{ij} \sqrt{\det g}\, \frac{\partial f}{\partial x_i}\right).$$
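To make the coordinate formula for $\Delta_g$ concrete, here is a small symbolic sketch in SymPy (the helper name and the sphere example are ours, not from the slides): it applies the formula to the unit sphere with coordinates $(\theta, \phi)$ and metric $g = \mathrm{diag}(1, \sin^2\theta)$ and recovers the familiar relation $\Delta_g \cos\theta = -2\cos\theta$.

```python
import sympy as sp

# Illustrative check of the Laplace-Beltrami formula on the unit sphere,
# coordinates (theta, phi), metric g = diag(1, sin^2 theta).
theta, phi = sp.symbols('theta phi', positive=True)
coords = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])   # metric components g_ij
g_inv = g.inv()                                    # components g^{ij}
sqrt_det = sp.sin(theta)                           # sqrt(det g) = sin(theta) for 0 < theta < pi

def laplace_beltrami(f):
    """Delta_g f = (1/sqrt(det g)) * sum_j d/dx_j ( sum_i g^{ij} sqrt(det g) df/dx_i )."""
    terms = []
    for j in range(2):
        inner = sum(g_inv[i, j] * sqrt_det * sp.diff(f, coords[i]) for i in range(2))
        terms.append(sp.diff(inner, coords[j]))
    return sp.simplify(sum(terms) / sqrt_det)

# f = cos(theta) is a spherical harmonic with eigenvalue -l(l+1) = -2 (l = 1).
f = sp.cos(theta)
print(laplace_beltrami(f))   # -2*cos(theta)
```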
Heat Equation

The heat equation on $(M, g)$ is the partial differential equation $\frac{\partial f}{\partial t} = \Delta_g f$. A solution to the problem with initial condition
$$\frac{\partial f}{\partial t} = \Delta_g f, \qquad f(\cdot, 0) = f_0 \in L^2(M) \tag{1}$$
is a continuous function $f : M \times [0, \infty) \to \mathbb{R}$, denoted $f(x, t)$, which describes the temperature at a point $x$ at time $t$ beginning with an initial heat distribution described by the initial condition $f(x, 0)$. This solution is such that for each fixed $t > 0$, $f(\cdot, t)$ is a $C^2$ function, and for each $x \in M$, $f(x, \cdot)$ is $C^1$ (Cadavid and Vélez, 2014).

Heat Kernel

The heat kernel $K_t(x, y)$ is the solution of the heat equation whose initial condition is the Dirac delta function $\delta_x$. The heat kernel allows describing the other solutions of the heat equation by means of convolution, as
$$f(x, t) = \int_M K_t(x, y)\, f_0(y)\, dy.$$
It also satisfies the following properties:

1. $K_t(x, y) = K_t(y, x)$ (symmetric),
2. $\left(\Delta - \frac{\partial}{\partial t}\right) K_t(x, y) = 0$ (solution of the heat equation),
3. $K_t(x, y) = \int_M K_{t-s}(x, z)\, K_s(z, y)\, dz$ for any $s > 0$ (positive definite).

According to these properties, the heat kernel is also a Mercer kernel.

In the case of an $n$-dimensional flat Euclidean space, the heat kernel has the form
$$K_t(x, y) = \frac{1}{(4\pi t)^{n/2}} \exp\left(-\frac{\|x - y\|^2}{4t}\right) = \frac{1}{(4\pi t)^{n/2}} \exp\left(-\frac{d^2(x, y)}{4t}\right),$$
where $\|x - y\|^2$ is the square of the Euclidean distance between the points $x$ and $y$; compare with the Gaussian kernel $\exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)$. The parametrix expansion approximates the heat kernel locally as a correction to this Euclidean heat kernel; it is written
$$P_t^{(m)}(x, y) = \frac{1}{(4\pi t)^{n/2}} \exp\left(-\frac{d^2(x, y)}{4t}\right)\left(\Psi_0(x, y) + \cdots + \Psi_m(x, y)\, t^m\right)$$
for currently unspecified functions $\Psi_k(x, y)$, where $d^2(x, y)$ denotes the square of the geodesic distance on the manifold.

Interpretation of the Heat Equation using the Dirac Delta

Figure 4: Heat kernel (an initial condition $f(x, 0) = f_0(x) = \delta_x$ concentrated at the point $x$ diffuses over time, and $K_t(x, y)$ gives the temperature observed at another point $y$).

Particular cases

1. Multinomial simplex.
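Anticipating the multinomial case, the sketch below (NumPy; all names are ours) uses the standard fact that under the Fisher information metric the simplex is isometric to a portion of a sphere, so the geodesic distance between multinomial parameters is $d(\theta, \theta') = 2\arccos\left(\sum_i \sqrt{\theta_i \theta'_i}\right)$, and it keeps only the leading Gaussian-like factor $\exp(-d^2/4t)$ in the spirit of the parametrix expansion; this is an illustration, not the exact normalized kernel derived by Lafferty and Lebanon (2005).

```python
import numpy as np

def geodesic_distance(theta, theta_p):
    """Fisher-metric geodesic distance between two points of the multinomial simplex."""
    bc = np.clip(np.sum(np.sqrt(theta * theta_p)), -1.0, 1.0)  # Bhattacharyya coefficient
    return 2.0 * np.arccos(bc)

def diffusion_kernel(theta, theta_p, t=0.1):
    """Leading Gaussian-like factor exp(-d^2 / 4t) of the heat kernel approximation."""
    d = geodesic_distance(theta, theta_p)
    return np.exp(-d ** 2 / (4.0 * t))

# Two word-frequency histograms, i.e. points on the simplex.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(geodesic_distance(p, p))        # ~0.0
print(diffusion_kernel(p, q, t=0.1))  # close to 1 for nearby distributions
```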
Recommended publications
  • Information Geometry (Part 1)
    October 22, 2010. Information Geometry (Part 1). John Baez. Information geometry is the study of 'statistical manifolds', which are spaces where each point is a hypothesis about some state of affairs. In statistics a hypothesis amounts to a probability distribution, but we'll also be looking at the quantum version of a probability distribution, which is called a 'mixed state'. Every statistical manifold comes with a way of measuring distances and angles, called the Fisher information metric. In the first seven articles in this series, I'll try to figure out what this metric really means. The formula for it is simple enough, but when I first saw it, it seemed quite mysterious. A good place to start is this interesting paper: • Gavin E. Crooks, Measuring thermodynamic length, which was pointed out by John Furey in a discussion about entropy and uncertainty. The idea here should work for either classical or quantum statistical mechanics. The paper describes the classical version, so just for a change of pace let me describe the quantum version. First a lightning review of quantum statistical mechanics. Suppose you have a quantum system with some Hilbert space. When you know as much as possible about your system, then you describe it by a unit vector in this Hilbert space, and you say your system is in a pure state. Sometimes people just call a pure state a 'state'. But that can be confusing, because in statistical mechanics you also need more general 'mixed states' where you don't know as much as possible. A mixed state is described by a density matrix ρ, meaning a positive operator with trace equal to 1: tr(ρ) = 1. The idea is that any observable is described by a self-adjoint operator A, and the expected value of this observable in the mixed state ρ is ⟨A⟩ = tr(ρA). The entropy of a mixed state is defined by S(ρ) = −tr(ρ ln ρ), where we take the logarithm of the density matrix just by taking the log of each of its eigenvalues, while keeping the same eigenvectors.
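As a small numerical illustration of the quantities just defined (trace-one density matrix, expectation value, von Neumann entropy), here is a NumPy sketch; the 2x2 matrices are toy examples of ours, not taken from the post.

```python
import numpy as np

rho = np.array([[0.7, 0.0],
                [0.0, 0.3]])   # density matrix: positive operator with trace 1
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])    # an observable (self-adjoint operator)

print(np.trace(rho))           # tr(rho) = 1.0
print(np.trace(rho @ A))       # expected value <A> = tr(rho A) = 0.4

# Entropy S(rho) = -tr(rho ln rho), computed from the eigenvalues of rho,
# exactly as described in the text.
p = np.linalg.eigvalsh(rho)
print(-np.sum(p * np.log(p)))  # ~0.611
```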
  • Statistical Manifold, Exponential Family, Autoparallel Submanifold
    Global Journal of Advanced Research on Classical and Modern Geometries, ISSN: 2284-5569, Vol. 8 (2019), Issue 1, pp. 18-25. SUBMANIFOLDS OF EXPONENTIAL FAMILIES. MAHESH T. V. AND K. S. SUBRAHAMANIAN MOOSATH. ABSTRACT. Exponential family with ±1-connection plays an important role in information geometry. Amari proved that a submanifold M of an exponential family S is exponential if and only if M is a ∇^1-autoparallel submanifold. We show that if all ∇^1-autoparallel proper submanifolds of a ±1-flat statistical manifold S are exponential then S is an exponential family. Also shown that the submanifold of a parameterized model S which is an exponential family is a ∇^1-autoparallel submanifold. Keywords: statistical manifold, exponential family, autoparallel submanifold. 2010 MSC: 53A15. 1. INTRODUCTION. Information geometry emerged from the geometric study of a statistical model of probability distributions. The information geometric tools are widely applied to various fields such as statistics, information theory, stochastic processes, neural networks, statistical physics, neuroscience etc. [3][7]. The importance of the differential geometric approach to the field of statistics was first noticed by C. R. Rao [6]. On a statistical model of probability distributions he introduced a Riemannian metric defined by the Fisher information, known as the Fisher information metric. Another milestone in this area is the work of Amari [1][2][5]. He introduced the α-geometric structures on a statistical manifold consisting of the Fisher information metric and the ±α-connections. Harsha and Moosath [4] introduced more generalized geometric structures called the (F, G)-geometry on a statistical manifold, which is a generalization of the α-geometry.
  • Exponential Families: Dually-Flat, Hessian and Legendre Structures
    entropy Review Information Geometry of κ-Exponential Families: Dually-Flat, Hessian and Legendre Structures Antonio M. Scarfone 1,*, Hiroshi Matsuzoe 2 and Tatsuaki Wada 3 1 Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche (ISC-CNR), c/o Politecnico di Torino, 10129 Torino, Italy 2 Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan; [email protected] 3 Region of Electrical and Electronic Systems Engineering, Ibaraki University, Nakanarusawa-cho, Hitachi 316-8511, Japan; [email protected] * Correspondence: [email protected]; Tel.: +39-011-090-7339 Received: 9 May 2018; Accepted: 1 June 2018; Published: 5 June 2018 Abstract: In this paper, we present a review of recent developments on the κ-deformed statistical mechanics in the framework of the information geometry. Three different geometric structures are introduced in the κ-formalism which are obtained starting from three, not equivalent, divergence functions, corresponding to the κ-deformed version of Kullback–Leibler, “Kerridge” and Brègman divergences. The first statistical manifold derived from the κ-Kullback–Leibler divergence forms an invariant geometry with a positive curvature that vanishes in the κ → 0 limit. The other two statistical manifolds are related to each other by means of a scaling transform and are both dually-flat. They have a dualistic Hessian structure endowed by a deformed Fisher metric and an affine connection that are consistent with a statistical scalar product based on the κ-escort expectation. These flat geometries admit dual potentials corresponding to the thermodynamic Massieu and entropy functions that induce a Legendre structure of κ-thermodynamics in the picture of the information geometry.
  • Machine Learning on Statistical Manifold Bo Zhang Harvey Mudd College
    Claremont Colleges Scholarship @ Claremont HMC Senior Theses HMC Student Scholarship 2017 Machine Learning on Statistical Manifold Bo Zhang Harvey Mudd College Recommended Citation Zhang, Bo, "Machine Learning on Statistical Manifold" (2017). HMC Senior Theses. 110. https://scholarship.claremont.edu/hmc_theses/110 This Open Access Senior Thesis is brought to you for free and open access by the HMC Student Scholarship at Scholarship @ Claremont. It has been accepted for inclusion in HMC Senior Theses by an authorized administrator of Scholarship @ Claremont. For more information, please contact [email protected]. Machine Learning on Statistical Manifold Bo Zhang Weiqing Gu, Advisor Nicholas Pippenger, Reader Department of Mathematics May, 2017 Copyright © 2017 Bo Zhang. The author grants Harvey Mudd College and the Claremont Colleges Library the nonexclusive right to make this work available for noncommercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author. Abstract This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use the statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in many complex scenarios.
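The following sketch illustrates the kind of modification described in this abstract: a k-means-style clustering of discrete probability distributions that replaces the Euclidean distance with a statistical distance. The Hellinger distance is used here purely for simplicity; the thesis itself may rely on other statistical distances, and all function names are ours.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kmeans_distributions(dists, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = dists[rng.choice(len(dists), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each distribution to the nearest center in Hellinger distance.
        labels = np.array([np.argmin([hellinger(p, c) for c in centers]) for p in dists])
        # Update each center as the renormalized mean of its cluster.
        for j in range(k):
            members = dists[labels == j]
            if len(members):
                m = members.mean(axis=0)
                centers[j] = m / m.sum()
    return labels, centers

# Toy data: histograms concentrated on the first or on the last bin.
dists = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]])
labels, _ = kmeans_distributions(dists, k=2)
print(labels)   # two clusters, e.g. [0 0 1 1] (label order depends on the seed)
```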
  • Exponential Statistical Manifold
    AISM (2007) 59:27–56 DOI 10.1007/s10463-006-0096-y Exponential statistical manifold Alberto Cena · Giovanni Pistone Received: 2 May 2006 / Revised: 25 September 2006 / Published online: 16 December 2006 © The Institute of Statistical Mathematics, Tokyo 2006 Abstract We consider the non-parametric statistical model E(p) of all positive densities q that are connected to a given positive density p by an open exponential arc, i.e. a one-parameter exponential model p(t), t ∈ I, where I is an open interval. On this model there exists a manifold structure modeled on Orlicz spaces, originally introduced in 1995 by Pistone and Sempi. Analytic properties of such a manifold are discussed. Especially, we discuss the regularity of mixture models under this geometry, as such models are related with the notion of e- and m-connections as discussed by Amari and Nagaoka. Keywords Information geometry · Statistical manifold · Orlicz space · Moment generating functional · Cumulant generating functional · Kullback–Leibler divergence 1 Introduction 1.1 Content In the present paper we follow closely the discussion of Information Geometry developed by Amari and coworkers, see e.g. in Amari (1982), Amari (1985), Amari and Nagaoka (2000), with the specification that we want to construct a Banach manifold structure in the classical sense, see e.g. Bourbaki (1971) or Lang (1995), without any restriction to parametric models. A. Cena · G. Pistone (B) Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail: [email protected] A. Cena e-mail: [email protected] We build on the previous work on the theory of Statistical Manifolds modeled on Orlicz spaces as defined in Pistone and Sempi (1995), Pistone and Rogantin (1999), Gibilisco and Pistone (1998) and the unpublished PhD thesis Cena (2002).
  • Information Geometry and Its Applications: an Overview
    Information Geometry and Its Applications: an Overview Frank Critchley 1 and Paul Marriott 2 1 The Open University, Walton Hall, Milton Keynes, Buckinghamshire, UK MK7 6AA [email protected] 2 University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada [email protected] Abstract. We give a personal view of what Information Geometry is, and what it is becoming, by exploring a number of key topics: dual affine families, boundaries, divergences, tensorial structures, and dimensionality. For each, (A) we start with a graphical illustrative example, (B) give an overview of the relevant theory and key references, and (C) finish with a number of applications of the theory. We treat 'Information Geometry' as an evolutionary term, deliberately not attempting a comprehensive definition. Rather, we illustrate how both the geometries used and application areas are rapidly developing. Introduction This paper is an overview of information geometry (IG) and it is important to emphasize that ours is one of many possible approaches that could have been taken. It is, necessarily, a somewhat personal view, with a focus on the authors' own expertise. We, the authors, both have our key interest in statistical theory and practice, and were both strongly influenced, just after its publication, by Professor Amari's monograph, Amari (1985). Recently we, and co-workers, have focused our attention on what we call computational information geometry (CIG). This, in all its forms (see, for example, Liu et al. (2012), Nielsen and Nock (2013), Nielsen and Nock (2014), Anaya-Izquierdo et al. (2013a), and Critchley and Marriott (2014a)) has been a significant recent development, and this paper includes further contribution to it.
  • An Elementary Introduction to Information Geometry
    entropy Review An Elementary Introduction to Information Geometry Frank Nielsen Sony Computer Science Laboratories, Tokyo 141-0022, Japan; [email protected] Received: 6 September 2020; Accepted: 25 September 2020; Published: 29 September 2020 Abstract: In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences. The exposition is self-contained by concisely introducing the necessary concepts of differential geometry. Proofs are omitted for brevity. Keywords: differential geometry; metric tensor; affine connection; metric compatibility; conjugate connections; dual metric-compatible parallel transport; information manifold; statistical manifold; curvature and flatness; dually flat manifolds; Hessian manifolds; exponential family; mixture family; statistical divergence; parameter divergence; separable divergence; Fisher–Rao distance; statistical invariance; Bayesian hypothesis testing; mixture clustering; α-embeddings; mixed parameterization; gauge freedom 1. Introduction 1.1. Overview of Information Geometry We present a concise and modern view of the basic structures lying at the heart of Information Geometry (IG), and report some applications of those information-geometric manifolds (herein termed “information manifolds”) in statistics (Bayesian hypothesis testing) and machine learning (statistical mixture clustering). By analogy to Information Theory
  • Low-Dimensional Statistical Manifold Embedding of Directed Graphs
    Published as a conference paper at ICLR 2020. LOW-DIMENSIONAL STATISTICAL MANIFOLD EMBEDDING OF DIRECTED GRAPHS. Thorben Funke (L3S Research Center, Leibniz University Hannover, Hannover, Germany), Tian Guo (Computational Social Science, ETH Zürich, Zurich, Switzerland), Alen Lancic (Faculty of Science, Department of Mathematics, University of Zagreb, Croatia), Nino Antulov-Fantulin (Computational Social Science, ETH Zürich, Zurich, Switzerland) [email protected] ABSTRACT We propose a novel node embedding of directed graphs to statistical manifolds, which is based on a global minimization of pairwise relative entropy and graph geodesics in a non-linear way. Each node is encoded with a probability density function over a measurable space. Furthermore, we analyze the connection between the geometrical properties of such embedding and their efficient learning procedure. Extensive experiments show that our proposed embedding is better in preserving the global geodesic information of graphs, as well as outperforming existing embedding models on directed graphs in a variety of evaluation metrics, in an unsupervised setting. 1 INTRODUCTION In this publication, we study the directed graph embedding problem in an unsupervised learning setting. A graph embedding problem is usually defined as a problem of finding a vector representation X ∈ R^K for every node of a graph G = (V, E) through a mapping φ : V → X. On every graph G = (V, E), defined with set of nodes V and set of edges E, the distance d_G : V × V → R_+ between two vertices is defined as the number of edges connecting them in the shortest path, also called a graph geodesic. In case that X is equipped with a distance metric function d_X : X × X → R_+, we can quantify the embedding distortion by measuring the ratio d_X / d_G between pairs of corresponding embedded points and nodes.
  • Diffusion Kernels on Statistical Manifolds
    Journal of Machine Learning Research 6 (2005) 129–163 Submitted 1/04; Published 1/05 Diffusion Kernels on Statistical Manifolds John Lafferty [email protected] Guy Lebanon [email protected] School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 USA Editor: Tommi Jaakkola Abstract A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over the standard use of Gaussian or linear kernels, which have been the standard for text classification. Keywords: kernels, heat equation, diffusion, information geometry, text classification 1. Introduction The use of Mercer kernels for transforming linear classification and regression schemes into nonlinear methods is a fundamental idea, one that was recognized early in the development of statistical learning algorithms such as the perceptron, splines, and support vector machines (Aizerman et al., 1964; Kimeldorf and Wahba, 1971; Boser et al., 1992). The resurgence of activity on kernel methods in the machine learning community has led to the further development of this important technique, demonstrating how kernels can be key components in tools for tackling nonlinear data analysis problems, as well as for integrating data from multiple sources.
  • An Information Geometric Perspective on Active Learning
    An information geometric perspective on active learning Chen-Hsiang Yeang Artificial Intelligence Lab, MIT, Cambridge, MA 02139, USA [email protected] Abstract. The Fisher information matrix plays a very important role in both active learning and information geometry. In a special case of active learning (nonlinear regression with Gaussian noise), the inverse of the Fisher information matrix (the dispersion matrix of the parameters) induces a variety of criteria for optimal experiment design. In information geometry, the Fisher information matrix defines the metric tensor on model manifolds. In this paper, I explore the intrinsic relations of these two fields. The conditional distributions which belong to exponential families are known to be dually flat. Moreover, the author proves for a certain type of conditional models, the embedding curvature in terms of true parameters also vanishes. The expected Riemannian distance between current parameters and the next update is proposed to be the loss function for active learning. Examples of nonlinear and logistic regressions are given in order to elucidate this active learning scheme. 1 Introduction Active learning is a subcategory of machine learning. The learner seeks new examples from a specific region of input space instead of passively taking the examples generated by an unknown oracle. It is crucial when the effort of acquiring output information is much more demanding than collecting the input data. When the objective is to learn the parameters of an unknown distribution, a data point (x, y) contains input variables x and output variables y. Actively choosing x distorts the natural distribution of p(x, y) but generates no bias on the conditional distribution p(y | x).
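To connect the two roles of the Fisher information matrix mentioned in this abstract, the sketch below works the simplest linear-Gaussian case, where I(β) = XᵀX/σ², and greedily selects inputs that maximize log det I (a D-optimality criterion). This is an illustrative reduction of the idea in NumPy, with names of our choosing, not the nonlinear-regression scheme of the paper.

```python
import numpy as np

def fisher_information(X, sigma=1.0):
    """Fisher information of beta in the linear-Gaussian model y = X beta + noise."""
    return X.T @ X / sigma ** 2

def d_optimality(X, sigma=1.0):
    """log det of the Fisher information (D-optimal design criterion)."""
    sign, logdet = np.linalg.slogdet(fisher_information(X, sigma))
    return logdet if sign > 0 else -np.inf

rng = np.random.default_rng(1)
candidates = rng.uniform(-1, 1, size=(100, 2))   # candidate input points

# Greedy active selection: repeatedly add the candidate that most increases log det I.
design = candidates[:3].copy()
for _ in range(5):
    gains = [d_optimality(np.vstack([design, x])) for x in candidates]
    design = np.vstack([design, candidates[int(np.argmax(gains))]])
print(d_optimality(design))
```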
  • Pseudo-Elliptic Geometry of a Class of Frobenius-Manifolds & Maurer
    PSEUDO-ELLIPTIC GEOMETRY OF A CLASS OF FROBENIUS-MANIFOLDS & MAURER–CARTAN STRUCTURES N. COMBE, PH. COMBE, AND H. NENCKA Abstract. The recently discovered fourth class of Frobenius manifolds by Combe–Manin in [11] opened and highlighted new geometric domains to explore. The guiding mantra of this article is to show the existence of hidden geometric aspects of the fourth Frobenius manifold, which turns out to be related to so-called causality conditions. Firstly, it is proved that the fourth class of Frobenius manifolds is a Pseudo-Elliptic one. Secondly, this manifold turns out to be a sub-manifold of a non-orientable Lorentzian projective manifold. Thirdly, Maurer–Cartan structures for this manifold and hidden geometrical properties for this manifold are unraveled. In fine, these investigations lead to the rather philosophical concept of causality condition, creating a bridge between the notion of causality coming from Lorentzian manifolds (originated in special relativity theory) and the one arising in probability and statistics. Contents: Introduction 2; 1. Overview on the paracomplex geometry 5; 2. First part of the proof of Theorem A 9; 3. Second part of the proof of Theorem A 13; 4. Theorem B: the fourth Frobenius manifold is a Lorentzian manifold 16; 5. Second part of the proof of Theorem B 18; 6. Conclusion 23; References 23. arXiv:2107.01985v1 [math.AG] 5 Jul 2021. Date: July 6, 2021. 2020 Mathematics Subject Classification. Primary: 17Cxx, 53B05, 53B10, 53B12; Secondary: 16W10, 53D45. Key words and phrases. Non-Euclidean geometry, Paracomplex geometry, Lorentzian manifold, Jordan algebras, Frobenius manifold. This research was supported by the Max Planck Society's Minerva grant. The authors express their gratitude towards MPI MiS for excellent working conditions.
  • Symplectic Structures on Statistical Manifolds
    J. Aust. Math. Soc. 90 (2011), 371–384 doi:10.1017/S1446788711001285 SYMPLECTIC STRUCTURES ON STATISTICAL MANIFOLDS TOMONORI NODA (Received 18 August 2010; accepted 25 October 2010) Communicated by M. K. Murray Abstract A relationship between symplectic geometry and information geometry is studied. The square of a dually flat space admits a natural symplectic structure that is the pullback of the canonical symplectic structure on the cotangent bundle of the dually flat space via the canonical divergence. With respect to the symplectic structure, there exists a moment map whose image is the dually flat space. As an example, we obtain a duality relation between the Fubini–Study metric on a projective space and the Fisher metric on a statistical model on a finite set. Conversely, a dually flat space admitting a symplectic structure is locally symplectically isomorphic to the cotangent bundle with the canonical symplectic structure of some dually flat space. We also discuss nonparametric cases. 2010 Mathematics subject classification: primary 53D20; secondary 60D05. Keywords and phrases: symplectic manifold, moment map, statistical manifold, statistical model, dually flat space. 1. Introduction Information geometry is the study of probability and information from a differential geometric viewpoint. On the space of probability measures, there exist a natural Riemannian metric, called the Fisher metric, and a family of affine connections, called α-connections. In this paper we see that symplectic structures are very natural for the elementary spaces appearing in information geometry. Let M be a smooth manifold and ω be a 2-form on M. Denote by ω♭ the linear map from TM to T*M determined by ω.