Introduction to Information Geometry – Based on the Book “Methods of Information Geometry” Written by Shun-ichi Amari and Hiroshi Nagaoka

Yunshu Liu (ASPITRG)
2012-02-17

Outline

1 Introduction to differential geometry
   Manifold and Submanifold
   Tangent vector, Tangent space and Vector field
   Riemannian metric and Affine connection
   Flatness and autoparallel
2 Geometric structure of statistical models and statistical inference
   The Fisher metric and α-connection
   Exponential family
   Divergence and Geometric statistical inference

Part I: Introduction to differential geometry

Basic concepts in differential geometry

- Manifold and Submanifold
- Tangent vector, Tangent space and Vector field
- Riemannian metric and Affine connection
- Flatness and autoparallel

Manifold

- Manifold $S$: a set with a coordinate system, that is, a one-to-one mapping from $S$ to $\mathbb{R}^n$; $S$ is supposed to "locally" look like an open subset of $\mathbb{R}^n$.
- Elements of the set (points): points in $\mathbb{R}^n$, probability distributions, linear systems.
Figure: A coordinate system $\xi$ for a manifold $S$.

Manifold: definition

Definition: Let $S$ be a set. If there exists a set of coordinate systems $\mathcal{A}$ for $S$ which satisfies conditions (1) and (2) below, we call $S$ an $n$-dimensional $C^\infty$ differentiable manifold.

(1) Each element $\varphi$ of $\mathcal{A}$ is a one-to-one mapping from $S$ to some open subset of $\mathbb{R}^n$.
(2) For all $\varphi \in \mathcal{A}$, given any one-to-one mapping $\psi$ from $S$ to $\mathbb{R}^n$, the following holds: $\psi \in \mathcal{A} \iff \psi \circ \varphi^{-1}$ is a $C^\infty$ diffeomorphism.

Here, by a $C^\infty$ diffeomorphism we mean that $\psi \circ \varphi^{-1}$ and its inverse $\varphi \circ \psi^{-1}$ are both $C^\infty$ (infinitely many times differentiable).

Examples of Manifold: one-dimensional manifolds

- A straight line: a manifold in $\mathbb{R}^1$, even if it is given in $\mathbb{R}^k$ for any $k \geq 2$.
- Any open subset of a straight line: a one-dimensional manifold.
- A closed interval on a straight line (endpoints included): not a manifold, since an endpoint has no neighborhood that looks like an open subset of $\mathbb{R}$.
- A circle: locally the circle looks like a line.
- Any open subset of a circle: a one-dimensional manifold.
- A closed arc of a circle (endpoints included): not a manifold, for the same reason.

Examples of Manifold: surface of a sphere

The surface of a sphere in $\mathbb{R}^3$ is defined by $S = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1\}$. Locally it can be parameterized by two coordinates; for example, we can use latitude and longitude as the coordinates.
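As an illustrative sketch (not part of the slides), the latitude/longitude coordinates of the unit sphere can be checked numerically. The function names `sphere_point` and `sphere_coords` are hypothetical, and the convention used here (latitude measured from the equator, angles in radians) is an assumption.

```python
import math

def sphere_point(lat, lon):
    """Map latitude/longitude (radians) to a point on the unit sphere in R^3."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def sphere_coords(p):
    """Local inverse: recover (lat, lon) from a point, away from the poles."""
    x, y, z = p
    return (math.asin(z), math.atan2(y, x))

# Pick coordinates, map them to the sphere, and map the point back.
lat, lon = 0.3, 1.2
p = sphere_point(lat, lon)
radius_sq = sum(c * c for c in p)      # should equal 1 up to rounding
lat_back, lon_back = sphere_coords(p)  # should reproduce (lat, lon)
```

The inverse map only works away from the poles, where longitude is undefined; this is why latitude and longitude serve as local rather than global coordinates.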
Sphere in $\mathbb{R}^n$ (the $(n-1)$-sphere): $S^{n-1} = \{(x_1, x_2, \dots, x_n) \in \mathbb{R}^n \mid x_1^2 + x_2^2 + \dots + x_n^2 = 1\}$.

Examples of Manifold: surface of a torus

The torus in $\mathbb{R}^3$ (surface of a doughnut):

$$x(u, v) = ((a + b\cos u)\cos v,\ (a + b\cos u)\sin v,\ b\sin u), \quad 0 \leq u, v < 2\pi,$$

where $a$ is the distance from the center of the tube to the center of the torus, and $b$ is the radius of the tube.

A torus is a closed surface defined as the product of two circles: $T^2 = S^1 \times S^1$.

$n$-torus: $T^n$ is defined as a product of $n$ circles: $T^n = S^1 \times S^1 \times \dots \times S^1$.

Coordinate systems for manifold: parametrization of the unit hemisphere

Parametrization: a map of the unit hemisphere into $\mathbb{R}^2$.

(1) By latitude and longitude (here $\varphi$ is the polar angle measured from the north pole):

$$x(\varphi, \theta) = (\sin\varphi\cos\theta,\ \sin\varphi\sin\theta,\ \cos\varphi), \quad 0 < \varphi < \pi/2,\ 0 \leq \theta < 2\pi \tag{1}$$

(2) By stereographic projection:

$$x(u, v) = \left(\frac{2u}{1 + u^2 + v^2},\ \frac{2v}{1 + u^2 + v^2},\ \frac{1 - u^2 - v^2}{1 + u^2 + v^2}\right), \quad u^2 + v^2 \leq 1 \tag{2}$$

Examples of Manifold: colors

Parametrization of color models

- 3-channel color models: RGB, Lab, HSV and so on (CMYK, by contrast, uses four channels).
- The RGB color model: an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors.
- The Lab color model: the three coordinates of Lab represent the lightness of the color ($L$), its position between red and green ($a$), and its position between yellow and blue ($b$).

Coordinate systems for manifold: parametrization of color models

Parametrization: a map of colors into $\mathbb{R}^3$.

Submanifolds

- Definition: a submanifold $M$ of a manifold $S$ is a subset of $S$ which itself has the structure of a manifold.
- An open subset of an $n$-dimensional manifold forms an $n$-dimensional submanifold.
- One way to construct an $m$-dimensional submanifold ($m < n$): fix $n - m$ coordinates.

Submanifolds: color model examples

- 3-dimensional submanifold: any open subset.
- 2-dimensional submanifold: fix one coordinate.
- 1-dimensional submanifold: fix two coordinates.

Note: in the Lab color model, if we set $a$ and $b$ to 0 and vary $L$ from 0 to 100 (from black to white), we get a 1-dimensional submanifold.

Curves and Tangent vector of Curves

Curves

- Curve $\gamma: I \to S$, a map from some interval $I\ (\subset \mathbb{R})$ to $S$.
- Examples: a curve on a sphere, in a set of probability distributions, or in a set of linear systems.
- Using a coordinate system $\{\xi^i\}$ to express the point $\gamma(t)$ on the curve (where $t \in I$): $\gamma^i(t) = \xi^i(\gamma(t))$, so that $\bar{\gamma}(t) = [\gamma^1(t), \dots, \gamma^n(t)]$.

$C^\infty$ curves

- $C^\infty$: infinitely many times differentiable (sufficiently smooth).
- If $\bar{\gamma}(t)$ is $C^\infty$ for $t \in I$, we call $\gamma$ a $C^\infty$ curve on the manifold $S$.

Tangent vector of Curves

A tangent vector is a vector that is tangent to a curve or surface at a given point.

When $S$ is an open subset of $\mathbb{R}^n$, the range of $\gamma$ is contained within a single linear space, hence we can use the standard derivative:

$$\dot{\gamma}(a) = \lim_{h \to 0} \frac{\gamma(a + h) - \gamma(a)}{h} \tag{3}$$

In general, however, this is not true; an example is the range of $\gamma$ in a color model. Thus we use a more general "derivative" instead:

$$\dot{\gamma}(a) = \sum_{i=1}^{n} \dot{\gamma}^i(a) \left(\frac{\partial}{\partial \xi^i}\right)_p \tag{4}$$

where $p = \gamma(a)$, $\gamma^i(t) = \xi^i \circ \gamma(t)$, $\dot{\gamma}^i(a) = \frac{d}{dt}\gamma^i(t)\big|_{t=a}$, and $\left(\frac{\partial}{\partial \xi^i}\right)_p$ is an operator which maps $f \mapsto \left(\frac{\partial f}{\partial \xi^i}\right)_p$ for a given function $f: S \to \mathbb{R}$.

Tangent space

Tangent space at $p$: a hyperplane $T_p$ containing all the tangents of curves passing through the point $p \in S$ ($\dim T_p(S) = \dim S$).

$$T_p(S) = \left\{ \sum_{i=1}^{n} c^i \left(\frac{\partial}{\partial \xi^i}\right)_p \,\middle|\, [c^1, \dots, c^n] \in \mathbb{R}^n \right\}$$

Examples: hemisphere and color.

Figure: Tangent vector and tangent space in unit hemisphere.

Vector fields

- Vector field: a map that assigns to each point $p$ of a manifold $S$ a tangent vector in $T_p(S)$.
- Consider a coordinate system $\{\xi^i\}$ for an $n$-dimensional manifold; clearly $\partial_i = \frac{\partial}{\partial \xi^i}$ are vector fields for $i = 1, \dots, n$.

Riemannian Metrics

Riemannian metric: an assignment to each point $p \in S$ of an inner product $\langle \cdot, \cdot \rangle_p$ on the tangent space $T_p(S)$. For tangent vectors $D, D', D'' \in T_p(S)$ and real numbers $a, b$ we have $\langle D, D' \rangle_p \in \mathbb{R}$, and the following conditions hold:

- Linearity: $\langle aD + bD', D'' \rangle_p = a\langle D, D'' \rangle_p + b\langle D', D'' \rangle_p$
- Symmetry: $\langle D, D' \rangle_p = \langle D', D \rangle_p$
- Positive-definiteness: if $D \neq 0$ then $\langle D, D \rangle_p > 0$

The components $\{g_{ij}\}$ of a Riemannian metric $g$ with respect to the coordinate system $\{\xi^i\}$ are defined by $g_{ij} = \langle \partial_i, \partial_j \rangle$, where $\partial_i = \frac{\partial}{\partial \xi^i}$.
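The definition $g_{ij} = \langle \partial_i, \partial_j \rangle$ can be made concrete on the unit hemisphere with parametrization (1). The sketch below is an illustration, not part of the slides. It assumes that the inner product on each tangent space is the one inherited from the Euclidean dot product of $\mathbb{R}^3$ (the induced metric), it approximates the basis vectors $\partial_\varphi$ and $\partial_\theta$ by central finite differences, and all function names are hypothetical.

```python
import math

def hemisphere(phi, theta):
    """Parametrization (1): coordinates (phi, theta) -> point in R^3."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def basis_vectors(phi, theta, h=1e-6):
    """Approximate d_phi and d_theta by central differences of the embedding."""
    def diff(p_plus, p_minus):
        return tuple((a - b) / (2 * h) for a, b in zip(p_plus, p_minus))
    d_phi = diff(hemisphere(phi + h, theta), hemisphere(phi - h, theta))
    d_theta = diff(hemisphere(phi, theta + h), hemisphere(phi, theta - h))
    return d_phi, d_theta

def metric_components(phi, theta):
    """g_ij = <d_i, d_j>, assuming the Euclidean inner product of R^3."""
    basis = basis_vectors(phi, theta)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return [[dot(u, v) for v in basis] for u in basis]

# Metric components at the point with coordinates (phi, theta) = (0.7, 1.1).
g = metric_components(0.7, 1.1)
```

Under these assumptions the result should be close to the diagonal matrix with entries $1$ and $\sin^2\varphi$, which follows from differentiating (1) by hand. The matrix is symmetric and positive definite for $0 < \varphi < \pi/2$, in line with the conditions listed above.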
