Statistical Manifold, Exponential Family, Autoparallel Submanifold


Global Journal of Advanced Research on Classical and Modern Geometries, ISSN: 2284-5569, Vol. 8 (2019), Issue 1, pp. 18-25

SUBMANIFOLDS OF EXPONENTIAL FAMILIES

MAHESH T. V. AND K. S. SUBRAHAMANIAN MOOSATH

ABSTRACT. The exponential family with the $\pm 1$-connections plays an important role in information geometry. Amari proved that a submanifold $M$ of an exponential family $S$ is exponential if and only if $M$ is a $\nabla^1$-autoparallel submanifold. We show that if all $\nabla^1$-autoparallel proper submanifolds of a $\pm 1$-flat statistical manifold $S$ are exponential, then $S$ is an exponential family. We also show that a submanifold of a parametrized model $S$ which is an exponential family is a $\nabla^1$-autoparallel submanifold.

Keywords: statistical manifold, exponential family, autoparallel submanifold.
2010 MSC: 53A15

1. INTRODUCTION

Information geometry emerged from the geometric study of statistical models of probability distributions. Information geometric tools are widely applied in fields such as statistics, information theory, stochastic processes, neural networks, statistical physics and neuroscience [3][7]. The importance of the differential geometric approach to statistics was first noticed by C. R. Rao [6]: on a statistical model of probability distributions he introduced a Riemannian metric defined by the Fisher information, known as the Fisher information metric. Another milestone in this area is the work of Amari [1][2][5], who introduced the $\pm\alpha$-geometric structures on a statistical manifold, consisting of the Fisher information metric and the $\pm\alpha$-connections. Harsha and Moosath [4] introduced the more general $(F, G)$-geometry on a statistical manifold, which generalizes the $\alpha$-geometry. There have been many attempts to understand the geometry of statistical manifolds and to develop a differential geometric framework for estimation theory.

In this paper we study the geometry of the exponential family. The exponential family is an important statistical model that has attracted many researchers from physics, mathematics and statistics. It contains as special cases most of the standard discrete and continuous distributions used in practical modelling, such as the normal, Poisson, binomial, exponential, gamma and multivariate normal distributions. Distributions in the exponential family have been used in classical statistics for decades. We discuss the dually flat structure of the finite dimensional exponential family with respect to the $\pm 1$-connections defined by Amari. Then we prove a condition for a $\pm 1$-flat statistical manifold to be an exponential family. We also show that a submanifold of a statistical manifold which is an exponential family is a $\nabla^1$-autoparallel submanifold.

2. STATISTICAL MANIFOLD

Consider the sample space $\mathcal{X} \subseteq \mathbb{R}^n$. A probability measure on $\mathcal{X}$ can be represented in terms of a density function with respect to the Lebesgue measure.

Definition 2.1. Consider a family $S$ of probability distributions on $\mathcal{X}$. Suppose each element of $S$ can be parametrized using $n$ real-valued variables $(\theta^1, \dots, \theta^n)$ so that

$S = \{\, p_\theta = p(x; \theta) \mid \theta = (\theta^1, \dots, \theta^n) \in E \,\}$   (2.1)

where $E$ is a subset of $\mathbb{R}^n$ and the mapping $\theta \mapsto p_\theta$ is injective. We call such a family an $n$-dimensional statistical model, a parametric model, or simply a model on $\mathcal{X}$.
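To fix ideas, here is a minimal Python sketch (not from the paper; the function names are illustrative) of Definition 2.1 for the two-parameter normal family used later in Example 2.3: the parameter $\theta = (\mu, \sigma)$ ranges over $E \subset \mathbb{R}^2$, and each $\theta \in E$ is mapped injectively to a density $p(\cdot; \theta)$.

```python
import numpy as np

def in_E(theta):
    """Parameter domain E of the normal family: -inf < mu < inf, 0 < sigma."""
    mu, sigma = theta
    return np.isfinite(mu) and sigma > 0.0

def p(x, theta):
    """Density p(x; theta) of p_theta, for theta = (mu, sigma) in E."""
    assert in_E(theta)
    mu, sigma = theta
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def log_likelihood(x, theta):
    """l(x; theta) = log p(x; theta)."""
    mu, sigma = theta
    return -(x - mu) ** 2 / (2 * sigma ** 2) - np.log(np.sqrt(2 * np.pi) * sigma)

# The map theta -> p_theta: distinct parameters give distinct densities (injectivity).
print(p(0.5, (0.0, 1.0)), p(0.5, (0.0, 2.0)))
```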
Let us now state certain regularity conditions which are required for our geometric theory.

Regularity conditions:

(1) We assume that $E$ is an open subset of $\mathbb{R}^n$ and, for each $x \in \mathcal{X}$, the function $\theta \mapsto p(x; \theta)$ is of class $C^\infty$.
(2) Let $\ell(x; \theta) = \log p(x; \theta)$. For every fixed $\theta$, the $n$ functions in $x$, $\{\partial_i \ell(x; \theta);\ i = 1, \dots, n\}$, are linearly independent, where $\partial_i = \frac{\partial}{\partial \theta^i}$.
(3) The order of integration and differentiation may be freely interchanged.
(4) The moments of $\partial_i \ell(x; \theta)$ exist up to the necessary orders.
(5) For a probability distribution $p$, define its support as $\mathrm{supp}(p) := \{x \mid p(x) > 0\}$. The case when $\mathrm{supp}(p_\theta)$ varies with $\theta$ poses rather significant difficulties for the analysis, hence we assume that $\mathrm{supp}(p_\theta)$ is constant with respect to $\theta$. Then we can redefine $\mathcal{X}$ to be $\mathrm{supp}(p_\theta)$. This is equivalent to assuming that $p(x; \theta) > 0$ holds for all $\theta \in E$ and all $x \in \mathcal{X}$, which means that the model $S$ is a subset of

$\mathcal{P}(\mathcal{X}) := \{\, p : \mathcal{X} \to \mathbb{R} \mid p(x) > 0\ (\forall x \in \mathcal{X}),\ \int_{\mathcal{X}} p(x)\, dx = 1 \,\}.$   (2.2)

Definition 2.2. For a model $S = \{p_\theta \mid \theta \in E\}$, the mapping $\varphi : S \to \mathbb{R}^n$ defined by $\varphi(p_\theta) = \theta$ allows us to consider $\varphi = (\theta^i)$ as a coordinate system for $S$. Suppose we have a $C^\infty$ diffeomorphism $\psi : E \to \psi(E)$, where $\psi(E)$ is an open subset of $\mathbb{R}^n$. If we use $\rho = \psi(\theta)$ instead of $\theta$ as our parameter, we obtain $S = \{\, p_{\psi^{-1}(\rho)} \mid \rho \in \psi(E) \,\}$, which expresses the same family of probability distributions $S = \{p_\theta\}$. If we consider parametrizations which are $C^\infty$ diffeomorphic to each other to be equivalent, then we may regard $S$ as a $C^\infty$ differentiable manifold, and we call it a statistical manifold.

For the statistical manifold $S = \{p(x; \theta)\}$, define $\ell(x; \theta) = \log p(x; \theta)$ and consider the partial derivatives $\partial_i \ell$, $i = 1, \dots, n$. By our assumption, the $\partial_i \ell$, $i = 1, \dots, n$, are linearly independent functions in $x$, so they span the $n$-dimensional vector space

$T^1_\theta(S) = \{\, A(x) \mid A(x) = \sum_{i=1}^{n} A^i \partial_i \ell \,\}.$   (2.3)

Define the expectation with respect to the distribution $p(x; \theta)$ as

$E_\theta(f) = \int f(x)\, p(x; \theta)\, dx.$   (2.4)

Note that $E_\theta[\partial_i \ell(x; \theta)] = 0$ since $p(x; \theta)$ satisfies

$\int p(x; \theta)\, dx = 1.$   (2.5)

Hence for any random variable $A(x) \in T^1_\theta(S)$ we have $E_\theta[A(x)] = 0$. This expectation induces an inner product on $S$ in a natural way:

$\langle A(x), B(x) \rangle_\theta = E_\theta[A(x)B(x)]$, for $A(x), B(x) \in T^1_\theta(S)$.

In particular, the inner product of the basis vectors $\partial_i$ and $\partial_j$ is

$g_{ij}(\theta) = \langle \partial_i, \partial_j \rangle_\theta = E_\theta[\partial_i \ell(x; \theta)\, \partial_j \ell(x; \theta)]$   (2.6)
$\qquad\qquad = -E_\theta[\partial_i \partial_j \ell(x; \theta)]$   (2.7)
$\qquad\qquad = \int \partial_i \ell(x; \theta)\, \partial_j \ell(x; \theta)\, p(x; \theta)\, dx.$   (2.8)

Clearly the matrix $G(\theta) = (g_{ij}(\theta))$ is symmetric (i.e. $g_{ij} = g_{ji}$). For any nonzero $n$-dimensional vector $c = [c^1, \dots, c^n]^t$,

$c^t G(\theta) c = \int \big\{ \sum_{i=1}^{n} c^i \partial_i \ell(x; \theta) \big\}^2 p(x; \theta)\, dx > 0$   (2.9)

since $\{\partial_1 \ell(x; \theta), \dots, \partial_n \ell(x; \theta)\}$ are linearly independent; hence $G$ is positive definite. Therefore $g = \langle \cdot, \cdot \rangle$ defined in (2.8) is a Riemannian metric on the statistical manifold $S$, called the Fisher information metric.
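The Fisher metric of (2.6)-(2.8) can be approximated directly from its defining expectation. The sketch below (my own illustration, not from the paper) uses a plain Monte Carlo average of products of score functions, $g_{ij}(\theta) \approx \frac{1}{N}\sum_k \partial_i\ell(x_k;\theta)\,\partial_j\ell(x_k;\theta)$ with $x_k \sim p(\cdot;\theta)$, for the normal family of Example 2.3 below; the estimate should approach $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$.

```python
import numpy as np

def score_normal(x, mu, sigma):
    """Score functions d_mu l and d_sigma l for the N(mu, sigma) family."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = (x - mu) ** 2 / sigma ** 3 - 1.0 / sigma
    return np.stack([d_mu, d_sigma], axis=-1)        # shape (N, 2)

def fisher_metric_mc(mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of G(theta) = E_theta[ d_i l  d_j l ], cf. (2.6)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)        # samples from p(x; theta)
    s = score_normal(x, mu, sigma)
    return s.T @ s / n_samples                       # average of outer products of the score

# At (mu, sigma) = (0, 2) the exact Fisher matrix is diag(1/4, 1/2).
print(fisher_metric_mc(0.0, 2.0))
```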
Example 2.3 (Normal distribution). $\mathcal{X} = \mathbb{R}$, $n = 2$, $\theta = (\mu, \sigma)$, $E = \{ (\mu, \sigma) \mid -\infty < \mu < \infty,\ 0 < \sigma < \infty \}$ and

$S = \{ N(\mu, \sigma) \} = \big\{\, p(x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\big( -\frac{(x-\mu)^2}{2\sigma^2} \big) \,\big\}.$   (2.10)

This is a 2-dimensional manifold which can be identified with the upper half plane. The log-likelihood function is

$\ell(x; \theta) = -\frac{(x-\mu)^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma).$

The tangent space $T^1_\theta(S)$ is spanned by $\partial_1 = \frac{\partial}{\partial \mu}$ and $\partial_2 = \frac{\partial}{\partial \sigma}$, with

$\partial_1 \ell = \frac{x-\mu}{\sigma^2}, \qquad \partial_2 \ell = \frac{(x-\mu)^2}{\sigma^3} - \frac{1}{\sigma}.$

Then the Fisher information matrix $G(\theta) = (g_{ij})$ is

$G(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{pmatrix}.$

Definition 2.4. Let $S = \{ p(x; \theta) \mid \theta \in E \}$ be an $n$-dimensional statistical manifold with the Fisher metric $g$. We can define $n^3$ functions $\Gamma^1_{ijk}$ by

$\Gamma^1_{ijk} = E_\theta[(\partial_i \partial_j \ell(x; \theta))(\partial_k \ell(x; \theta))].$   (2.11)

The $\Gamma^1_{ijk}$ uniquely determine an affine connection $\nabla^1$ on the statistical manifold $S$ by

$\Gamma^1_{ijk} = \langle \nabla^1_{\partial_i} \partial_j, \partial_k \rangle.$   (2.12)

$\nabla^1$ is called the 1-connection or the exponential connection.

Here $\ell(x; \theta)$, the logarithm of the density function $p(x; \theta)$, is used to define the fundamental geometric structures on a statistical model $S = \{p(x; \theta)\}$. Amari defined a one-parameter family of functions, called the $\alpha$-embeddings, indexed by $\alpha \in \mathbb{R}$.

Definition 2.5. Let $L^{(\alpha)}(p)$ be the one-parameter family of functions defined by

$L^{(\alpha)}(p) = \begin{cases} \frac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}} & \alpha \neq 1 \\ \log p & \alpha = 1 \end{cases}$   (2.13)

and call

$\ell_\alpha(x; \theta) = L^{(\alpha)}(p(x; \theta))$   (2.14)

the $\alpha$-representation of the density function $p(x; \theta)$.

The 1-representation $\ell_1(x; \theta)$ is the log-likelihood function $\ell(x; \theta)$, and the $(-1)$-representation $\ell_{-1}(x; \theta)$ is the density function $p(x; \theta)$ itself.

Let $T^\alpha_\theta(S)$ be the vector space spanned by the $n$ linearly independent functions $\partial_i \ell_\alpha(x; \theta)$ in $x$, for $i = 1, \dots, n$:

$T^\alpha_\theta(S) = \{\, A(x) \mid A(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x; \theta) \,\}.$   (2.15)

There is a natural isomorphism between the two vector spaces $T^1_\theta(S)$ and $T^\alpha_\theta(S)$ given by

$\partial_i \ell_1(x; \theta) \in T^1_\theta(S) \;\longleftrightarrow\; \partial_i \ell_\alpha(x; \theta) \in T^\alpha_\theta(S).$   (2.16)

The vector space $T^\alpha_\theta(S)$ is called the $\alpha$-representation of the tangent space $T^1_\theta(S)$. The $\alpha$-representation of a vector $A = \sum_{i=1}^{n} A^i \partial_i \ell \in T^1_\theta(S)$ is the random variable

$A_\alpha(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x; \theta).$   (2.17)

Let us define the $\alpha$-expectation of a random variable $f$ with respect to the density $p(x; \theta)$ as

$E^\alpha_\theta(f) = \int f(x)\, p(x; \theta)^\alpha\, dx.$
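A minimal numerical sketch of Definition 2.5 and of the $\alpha$-expectation, written for the normal density of Example 2.3; the grid-based integration and the names are illustrative assumptions, not part of the paper. It checks the two special cases stated above, $L^{(1)}(p) = \log p$ and $L^{(-1)}(p) = p$, and that for $\alpha = 1$ the $\alpha$-expectation reduces to the ordinary expectation $E_\theta(f)$.

```python
import numpy as np

def L_alpha(p, alpha):
    """alpha-embedding L^(alpha)(p) of Definition 2.5."""
    if alpha == 1:
        return np.log(p)
    return 2.0 / (1.0 - alpha) * p ** ((1.0 - alpha) / 2.0)

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def alpha_expectation(f, mu, sigma, alpha, lo=-30.0, hi=30.0, n=200_001):
    """E^alpha_theta(f) = integral of f(x) p(x; theta)^alpha dx, by a Riemann sum on a grid."""
    grid = np.linspace(lo, hi, n)
    dx = grid[1] - grid[0]
    pdf = normal_pdf(grid, mu, sigma)
    return np.sum(f(grid) * pdf ** alpha) * dx

q = normal_pdf(0.7, 0.0, 1.0)
assert np.isclose(L_alpha(q, 1), np.log(q))   # 1-representation = log-likelihood
assert np.isclose(L_alpha(q, -1), q)          # (-1)-representation = the density itself
# For alpha = 1: E^1_theta(x^2) = sigma^2 when mu = 0, here approximately 4.0.
print(alpha_expectation(lambda x: x ** 2, mu=0.0, sigma=2.0, alpha=1))
```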
Recommended publications
  • Information Geometry (Part 1)
    October 22, 2010 Information Geometry (Part 1) John Baez Information geometry is the study of 'statistical manifolds', which are spaces where each point is a hypothesis about some state of affairs. In statistics a hypothesis amounts to a probability distribution, but we'll also be looking at the quantum version of a probability distribution, which is called a 'mixed state'. Every statistical manifold comes with a way of measuring distances and angles, called the Fisher information metric. In the first seven articles in this series, I'll try to figure out what this metric really means. The formula for it is simple enough, but when I first saw it, it seemed quite mysterious. A good place to start is this interesting paper: • Gavin E. Crooks, Measuring thermodynamic length. which was pointed out by John Furey in a discussion about entropy and uncertainty. The idea here should work for either classical or quantum statistical mechanics. The paper describes the classical version, so just for a change of pace let me describe the quantum version. First a lightning review of quantum statistical mechanics. Suppose you have a quantum system with some Hilbert space. When you know as much as possible about your system, then you describe it by a unit vector in this Hilbert space, and you say your system is in a pure state. Sometimes people just call a pure state a 'state'. But that can be confusing, because in statistical mechanics you also need more general 'mixed states' where you don't know as much as possible. A mixed state is described by a density matrix ρ, meaning a positive operator with trace equal to 1: tr(ρ) = 1. The idea is that any observable is described by a self-adjoint operator A, and the expected value of this observable in the mixed state is ⟨A⟩ = tr(ρA). The entropy of a mixed state is defined by S(ρ) = −tr(ρ ln ρ), where we take the logarithm of the density matrix just by taking the log of each of its eigenvalues, while keeping the same eigenvectors.
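The density-matrix formulas quoted in this excerpt are easy to check numerically. The small sketch below is my own illustration (not from the post): it builds a 2×2 mixed state, checks tr(ρ) = 1, and evaluates ⟨A⟩ = tr(ρA) and S(ρ) = −tr(ρ ln ρ) through the eigenvalues of ρ.

```python
import numpy as np

rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])                    # a positive operator with trace 1
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])                     # an observable (self-adjoint operator)

assert np.isclose(np.trace(rho), 1.0)           # tr(rho) = 1
expectation = np.trace(rho @ A)                 # <A> = tr(rho A)
eigvals = np.linalg.eigvalsh(rho)               # log of rho acts on its eigenvalues
entropy = -np.sum(eigvals * np.log(eigvals))    # S(rho) = -tr(rho ln rho)
print(expectation, entropy)
```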
  • Information Geometry of κ-Exponential Families: Dually-Flat, Hessian and Legendre Structures
    entropy Review Information Geometry of κ-Exponential Families: Dually-Flat, Hessian and Legendre Structures Antonio M. Scarfone 1,*, Hiroshi Matsuzoe 2 and Tatsuaki Wada 3 1 Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche (ISC-CNR), c/o Politecnico di Torino, 10129 Torino, Italy 2 Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan; [email protected] 3 Region of Electrical and Electronic Systems Engineering, Ibaraki University, Nakanarusawa-cho, Hitachi 316-8511, Japan; [email protected] * Correspondence: [email protected]; Tel.: +39-011-090-7339 Received: 9 May 2018; Accepted: 1 June 2018; Published: 5 June 2018 Abstract: In this paper, we present a review of recent developments on the κ-deformed statistical mechanics in the framework of the information geometry. Three different geometric structures are introduced in the κ-formalism which are obtained starting from three, not equivalent, divergence functions, corresponding to the κ-deformed version of Kullback–Leibler, "Kerridge" and Brègman divergences. The first statistical manifold derived from the κ-Kullback–Leibler divergence forms an invariant geometry with a positive curvature that vanishes in the κ → 0 limit. The other two statistical manifolds are related to each other by means of a scaling transform and are both dually-flat. They have a dualistic Hessian structure endowed by a deformed Fisher metric and an affine connection that are consistent with a statistical scalar product based on the κ-escort expectation. These flat geometries admit dual potentials corresponding to the thermodynamic Massieu and entropy functions that induce a Legendre structure of κ-thermodynamics in the picture of the information geometry.
  • Machine Learning on Statistical Manifold Bo Zhang Harvey Mudd College
    Claremont Colleges Scholarship @ Claremont HMC Senior Theses HMC Student Scholarship 2017 Machine Learning on Statistical Manifold Bo Zhang Harvey Mudd College Recommended Citation Zhang, Bo, "Machine Learning on Statistical Manifold" (2017). HMC Senior Theses. 110. https://scholarship.claremont.edu/hmc_theses/110 This Open Access Senior Thesis is brought to you for free and open access by the HMC Student Scholarship at Scholarship @ Claremont. It has been accepted for inclusion in HMC Senior Theses by an authorized administrator of Scholarship @ Claremont. For more information, please contact [email protected]. Machine Learning on Statistical Manifold Bo Zhang Weiqing Gu, Advisor Nicholas Pippenger, Reader Department of Mathematics May, 2017 Copyright © 2017 Bo Zhang. The author grants Harvey Mudd College and the Claremont Colleges Library the nonexclusive right to make this work available for noncommercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author. Abstract This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use the statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in many complex scenarios.
  • Exponential Statistical Manifold
    AISM (2007) 59:27–56 DOI 10.1007/s10463-006-0096-y Exponential statistical manifold Alberto Cena · Giovanni Pistone Received: 2 May 2006 / Revised: 25 September 2006 / Published online: 16 December 2006 © The Institute of Statistical Mathematics, Tokyo 2006 Abstract We consider the non-parametric statistical model E(p) of all positive densities q that are connected to a given positive density p by an open exponential arc, i.e. a one-parameter exponential model p(t), t ∈ I, where I is an open interval. On this model there exists a manifold structure modeled on Orlicz spaces, originally introduced in 1995 by Pistone and Sempi. Analytic properties of such a manifold are discussed. Especially, we discuss the regularity of mixture models under this geometry, as such models are related with the notion of e- and m-connections as discussed by Amari and Nagaoka. Keywords Information geometry · Statistical manifold · Orlicz space · Moment generating functional · Cumulant generating functional · Kullback–Leibler divergence 1 Introduction 1.1 Content In the present paper we follow closely the discussion of Information Geometry developed by Amari and coworkers, see e.g. in Amari (1982), Amari (1985), Amari and Nagaoka (2000), with the specification that we want to construct a Banach manifold structure in the classical sense, see e.g. Bourbaki (1971) or Lang (1995), without any restriction to parametric models. A. Cena · G. Pistone (B) Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail: [email protected] A. Cena e-mail: [email protected] We build on the previous work on the theory of Statistical Manifolds modeled on Orlicz spaces as defined in Pistone and Sempi (1995), Pistone and Rogantin (1999), Gibilisco and Pistone (1998) and the unpublished PhD thesis Cena (2002).
  • Information Geometry and Its Applications: an Overview
    Information Geometry and Its Applications: an Overview Frank Critchley1 ? and Paul Marriott2 ?? 1 The Open University, Walton Hall, Milton Keynes, Buckinghamshire, UK MK7 6AA [email protected] 2 University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada [email protected] Abstract. We give a personal view of what Information Geometry is, and what it is becoming, by exploring a number of key topics: dual affine families, boundaries, divergences, tensorial structures, and dimensionality. For each, (A) we start with a graphical illustrative example, (B) give an overview of the relevant theory and key references, and (C) finish with a number of applications of the theory. We treat 'Information Geometry' as an evolutionary term, deliberately not attempting a comprehensive definition. Rather, we illustrate how both the geometries used and application areas are rapidly developing. Introduction This paper is an overview of information geometry (IG) and it is important to emphasize that ours is one of many possible approaches that could have been taken. It is, necessarily, a somewhat personal view, with a focus on the authors' own expertise. We, the authors, both have our key interest in statistical theory and practice, and were both strongly influenced, just after its publication, by Professor Amari's monograph, Amari (1985). Recently we, and co-workers, have focused our attention on what we call computational information geometry (CIG). This, in all its forms – see, for example, Liu et al. (2012), Nielsen and Nock (2013), Nielsen and Nock (2014), Anaya-Izquierdo et al. (2013a), and Critchley and Marriott (2014a) – has been a significant recent development, and this paper includes further contribution to it.
  • An Elementary Introduction to Information Geometry
    entropy Review An Elementary Introduction to Information Geometry Frank Nielsen Sony Computer Science Laboratories, Tokyo 141-0022, Japan; [email protected] Received: 6 September 2020; Accepted: 25 September 2020; Published: 29 September 2020 Abstract: In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences. The exposition is self-contained by concisely introducing the necessary concepts of differential geometry. Proofs are omitted for brevity. Keywords: differential geometry; metric tensor; affine connection; metric compatibility; conjugate connections; dual metric-compatible parallel transport; information manifold; statistical manifold; curvature and flatness; dually flat manifolds; Hessian manifolds; exponential family; mixture family; statistical divergence; parameter divergence; separable divergence; Fisher–Rao distance; statistical invariance; Bayesian hypothesis testing; mixture clustering; α-embeddings; mixed parameterization; gauge freedom 1. Introduction 1.1. Overview of Information Geometry We present a concise and modern view of the basic structures lying at the heart of Information Geometry (IG), and report some applications of those information-geometric manifolds (herein termed "information manifolds") in statistics (Bayesian hypothesis testing) and machine learning (statistical mixture clustering). By analogy to Information Theory
  • Low-Dimensional Statistical Manifold Embedding of Directed Graphs
    Published as a conference paper at ICLR 2020 LOW-DIMENSIONAL STATISTICAL MANIFOLD EMBEDDING OF DIRECTED GRAPHS Thorben Funke (L3S Research Center, Leibniz University Hannover, Hannover, Germany), Tian Guo (Computational Social Science, ETH Zürich, Zurich, Switzerland), Alen Lancic (Faculty of Science, Department of Mathematics, University of Zagreb, Croatia), Nino Antulov-Fantulin (Computational Social Science, ETH Zürich, Zurich, Switzerland) [email protected] ABSTRACT We propose a novel node embedding of directed graphs to statistical manifolds, which is based on a global minimization of pairwise relative entropy and graph geodesics in a non-linear way. Each node is encoded with a probability density function over a measurable space. Furthermore, we analyze the connection between the geometrical properties of such embedding and their efficient learning procedure. Extensive experiments show that our proposed embedding is better in preserving the global geodesic information of graphs, as well as outperforming existing embedding models on directed graphs in a variety of evaluation metrics, in an unsupervised setting. 1 INTRODUCTION In this publication, we study the directed graph embedding problem in an unsupervised learning setting. A graph embedding problem is usually defined as a problem of finding a vector representation X ∈ ℝ^K for every node of a graph G = (V, E) through a mapping φ : V → X. On every graph G = (V, E), defined with set of nodes V and set of edges E, the distance d_G : V × V → ℝ₊ between two vertices is defined as the number of edges connecting them in the shortest path, also called a graph geodesic. In case that X is equipped with a distance metric function d_X : X × X → ℝ₊, we can quantify the embedding distortion by measuring the ratio d_X / d_G between pairs of corresponding embedded points and nodes.
  • Diffusion Kernels on Statistical Manifolds
    Journal of Machine Learning Research 6 (2005) 129–163 Submitted 1/04; Published 1/05 Diffusion Kernels on Statistical Manifolds John Lafferty [email protected] Guy Lebanon [email protected] School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 USA Editor: Tommi Jaakkola Abstract A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over the standard use of Gaussian or linear kernels, which have been the standard for text classification. Keywords: kernels, heat equation, diffusion, information geometry, text classification 1. Introduction The use of Mercer kernels for transforming linear classification and regression schemes into nonlinear methods is a fundamental idea, one that was recognized early in the development of statistical learning algorithms such as the perceptron, splines, and support vector machines (Aizerman et al., 1964; Kimeldorf and Wahba, 1971; Boser et al., 1992). The resurgence of activity on kernel methods in the machine learning community has led to the further development of this important technique, demonstrating how kernels can be key components in tools for tackling nonlinear data analysis problems, as well as for integrating data from multiple sources.
  • An Information Geometric Perspective on Active Learning
    An information geometric perspective on active learning Chen-Hsiang Yeang Artificial Intelligence Lab, MIT, Cambridge, MA 02139, USA [email protected] Abstract. The Fisher information matrix plays a very important role in both active learning and information geometry. In a special case of active learning (nonlinear regression with Gaussian noise), the inverse of the Fisher information matrix – the dispersion matrix of parameters – induces a variety of criteria for optimal experiment design. In information geometry, the Fisher information matrix defines the metric tensor on model manifolds. In this paper, I explore the intrinsic relations of these two fields. The conditional distributions which belong to exponential families are known to be dually flat. Moreover, the author proves that for a certain type of conditional models, the embedding curvature in terms of true parameters also vanishes. The expected Riemannian distance between current parameters and the next update is proposed to be the loss function for active learning. Examples of nonlinear and logistic regressions are given in order to elucidate this active learning scheme. 1 Introduction Active learning is a subcategory of machine learning. The learner seeks new examples from a specific region of input space instead of passively taking the examples generated by an unknown oracle. It is crucial when the effort of acquiring output information is much more demanding than collecting the input data. When the objective is to learn the parameters of an unknown distribution, a data point (x, y) contains input variables x and output variables y. Actively choosing x distorts the natural distribution of p(x, y) but generates no bias on the conditional distribution p(y|x).
  • Pseudo-Elliptic Geometry of a Class of Frobenius-Manifolds & Maurer–Cartan Structures
    PSEUDO-ELLIPTIC GEOMETRY OF A CLASS OF FROBENIUS-MANIFOLDS & MAURER–CARTAN STRUCTURES N. COMBE, PH. COMBE, AND H. NENCKA Abstract. The recently discovered fourth class of Frobenius manifolds by Combe–Manin in [11] opened and highlighted new geometric domains to explore. The guiding mantra of this article is to show the existence of hidden geometric aspects of the fourth Frobenius manifold, which turns out to be related to so-called causality conditions. Firstly, it is proved that the fourth class of Frobenius manifolds is a Pseudo-Elliptic one. Secondly, this manifold turns out to be a sub-manifold of a non-orientable Lorentzian projective manifold. Thirdly, Maurer–Cartan structures for this manifold and hidden geometrical properties for this manifold are unraveled. In fine, these investigations lead to the rather philosophical concept of causality condition, creating a bridge between the notion of causality coming from Lorentzian manifolds (originated in special relativity theory) and the one arising in probability and statistics. Contents Introduction 2 1. Overview on the paracomplex geometry 5 2. First part of the proof of Theorem A 9 3. Second part of the proof of Theorem A 13 4. Theorem B: the fourth Frobenius manifold is a Lorentzian manifold 16 5. Second part of the proof of Theorem B 18 6. Conclusion 23 References 23 arXiv:2107.01985v1 [math.AG] 5 Jul 2021 Date: July 6, 2021. 2020 Mathematics Subject Classification. Primary: 17Cxx, 53B05, 53B10, 53B12; Secondary: 16W10, 53D45. Key words and phrases. Non-Euclidean geometry, Paracomplex geometry, Lorentzian manifold, Jordan algebras, Frobenius manifold. This research was supported by the Max Planck Society's Minerva grant. The authors express their gratitude towards MPI MiS for excellent working conditions.
  • Symplectic Structures on Statistical Manifolds
    J. Aust. Math. Soc. 90 (2011), 371–384 doi:10.1017/S1446788711001285 SYMPLECTIC STRUCTURES ON STATISTICAL MANIFOLDS TOMONORI NODA (Received 18 August 2010; accepted 25 October 2010) Communicated by M. K. Murray Abstract A relationship between symplectic geometry and information geometry is studied. The square of a dually flat space admits a natural symplectic structure that is the pullback of the canonical symplectic structure on the cotangent bundle of the dually flat space via the canonical divergence. With respect to the symplectic structure, there exists a moment map whose image is the dually flat space. As an example, we obtain a duality relation between the Fubini–Study metric on a projective space and the Fisher metric on a statistical model on a finite set. Conversely, a dually flat space admitting a symplectic structure is locally symplectically isomorphic to the cotangent bundle with the canonical symplectic structure of some dually flat space. We also discuss nonparametric cases. 2010 Mathematics subject classification: primary 53D20; secondary 60D05. Keywords and phrases: symplectic manifold, moment map, statistical manifold, statistical model, dually flat space. 1. Introduction Information geometry is the study of probability and information from a differential geometric viewpoint. On the space of probability measures, there exist a natural Riemannian metric, called the Fisher metric, and a family of affine connections, called α-connections. In this paper we see that symplectic structures are very natural for the elementary spaces appearing in information geometry. Let M be a smooth manifold and ω be a 2-form on M. Denote by ω♭ the linear map from TM to T∗M determined by ω.
  • An Elementary Introduction to Information Geometry
    An elementary introduction to information geometry Frank Nielsen Sony Computer Science Laboratories Inc Tokyo, Japan Abstract In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences. The exposition is self-contained by concisely introducing the necessary concepts of differential geometry, but proofs are omitted for brevity. Keywords: Differential geometry; metric tensor; affine connection; metric compatibility; conjugate connections; dual metric-compatible parallel transport; information manifold; statistical manifold; curvature and flatness; dually flat manifolds; Hessian manifolds; exponential family; mixture family; statistical divergence; parameter divergence; separable divergence; Fisher-Rao distance; statistical invariance; Bayesian hypothesis testing; mixture clustering; α-embeddings; gauge freedom 1 Introduction 1.1 Overview of information geometry We present a concise and modern view of the basic structures lying at the heart of Information Geometry (IG), and report some applications of those information-geometric manifolds (herein termed "information manifolds") in statistics (Bayesian hypothesis testing) and machine learning (statistical mixture clustering). By analogy to Information Theory (IT) (pioneered by Claude Shannon in his celebrated 1948 paper [119]) which considers primarily the communication of messages over noisy transmission channels, we may define Information Sciences (IS) as the fields that study "communication" between (noisy/imperfect) data and families of models (postulated as a priori knowledge). In short, information sciences seek methods to distill information from data to models. Thus information sciences encompass information theory but also include the fields of Probability & Statistics, Machine Learning (ML), Artificial Intelligence (AI), Mathematical Programming, just to name a few.