Global Journal of Advanced Research on Classical and Modern Geometries
ISSN: 2284-5569, Vol. 8 (2019), Issue 1, pp. 18-25

SUBMANIFOLDS OF EXPONENTIAL FAMILIES

MAHESH T. V. AND K. S. SUBRAHAMANIAN MOOSATH

ABSTRACT. The exponential family with the ±1-connections plays an important role in information geometry. Amari proved that a submanifold M of an exponential family S is exponential if and only if M is a ∇¹-autoparallel submanifold. We show that if all ∇¹-autoparallel proper submanifolds of a ±1-flat statistical manifold S are exponential, then S is an exponential family. We also show that a submanifold of a parametrized model S which is an exponential family is a ∇¹-autoparallel submanifold.

Keywords: statistical manifold, exponential family, autoparallel submanifold.
2010 MSC: 53A15

1. INTRODUCTION

Information geometry emerged from the geometric study of statistical models of probability distributions. Information geometric tools are widely applied in various fields such as statistics, information theory, stochastic processes, neural networks, statistical physics, and neuroscience [3][7]. The importance of the differential geometric approach to statistics was first noticed by C. R. Rao [6]: on a statistical model of probability distributions he introduced a Riemannian metric defined by the Fisher information, known as the Fisher information metric. Another milestone in this area is the work of Amari [1][2][5], who introduced the α-geometric structures on a statistical manifold, consisting of the Fisher information metric and the ±α-connections. Harsha and Moosath [4] introduced more general geometric structures, called the (F, G)-geometry, which generalize the α-geometry. There have been many attempts to understand the geometry of statistical manifolds and to develop a differential geometric framework for estimation theory. In this paper we study the geometry of the exponential family.
An exponential family is an important statistical model which has attracted many researchers from physics, mathematics and statistics. The exponential family contains as special cases most of the standard discrete and continuous distributions used in practical modelling, such as the normal, Poisson, binomial, exponential, gamma, and multivariate normal distributions, and distributions in the exponential family have been used in classical statistics for decades. We discuss the dually flat structure of the finite dimensional exponential family with respect to the ±1-connections defined by Amari. Then we prove a condition for a ±1-flat statistical manifold to be an exponential family, and show that a submanifold of a statistical manifold which is an exponential family is a ∇¹-autoparallel submanifold.

2. STATISTICAL MANIFOLD

Consider the sample space X ⊆ ℝⁿ. A probability measure on X can be represented by a density function with respect to the Lebesgue measure.

Definition 2.1. Consider a family S of probability distributions on X. Suppose each element of S can be parametrized using n real-valued variables (θ¹, ..., θⁿ), so that

$$ S = \{\, p_\theta = p(x;\theta) \mid \theta = (\theta^1,\dots,\theta^n) \in E \,\} \qquad (2.1) $$

where E is a subset of ℝⁿ and the mapping θ ↦ p_θ is injective. We call such a family an n-dimensional statistical model, a parametric model, or simply a model on X.

Let us now state certain regularity conditions which are required for our geometric theory.

Regularity conditions:
(1) E is an open subset of ℝⁿ and, for each x ∈ X, the function θ ↦ p(x; θ) is of class C^∞.
(2) Let ℓ(x; θ) = log p(x; θ). For every fixed θ, the n functions in x, {∂ᵢℓ(x; θ) ; i = 1, ..., n}, are linearly independent, where ∂ᵢ = ∂/∂θⁱ.
(3) The order of integration and differentiation may be freely interchanged.
(4) The moments of ∂ᵢℓ(x; θ) exist up to the necessary orders.
(5) For a probability distribution p on X, let the support of p be supp(p) := {x | p(x) > 0}. The case when supp(p_θ) varies with θ poses rather significant difficulties for analysis, hence we assume that supp(p_θ) is constant with respect to θ; we may then redefine X to be supp(p_θ). This is equivalent to assuming that p(x; θ) > 0 holds for all θ ∈ E and all x ∈ X, which means that the model S is a subset of

$$ \mathcal{P}(X) := \Big\{\, p : X \to \mathbb{R} \;\Big|\; p(x) > 0 \ (\forall x \in X),\ \int_X p(x)\,dx = 1 \,\Big\} \qquad (2.2) $$

Definition 2.2. For a model S = {p_θ | θ ∈ E}, the mapping φ : S → ℝⁿ defined by φ(p_θ) = θ allows us to consider φ = (θⁱ) as a coordinate system for S. Suppose we have a C^∞ diffeomorphism ψ : E → ψ(E), where ψ(E) is an open subset of ℝⁿ. If we use ρ = ψ(θ) instead of θ as our parameter, we obtain S = { p_{ψ⁻¹(ρ)} | ρ ∈ ψ(E) }, which expresses the same family of probability distributions S = {p_θ}. If we consider parametrizations which are C^∞ diffeomorphic to each other to be equivalent, then we may consider S as a C^∞ differentiable manifold, and we call it a statistical manifold.

For the statistical manifold S = {p(x; θ)}, define ℓ(x; θ) = log p(x; θ) and consider the partial derivatives ∂ᵢℓ, i = 1, ..., n. By our assumption, the ∂ᵢℓ, i = 1, ..., n, are linearly independent functions in x, so they span an n-dimensional vector space

$$ T_\theta^1(S) = \Big\{\, A(x) \;\Big|\; A(x) = \sum_{i=1}^{n} A^i \partial_i \ell \,\Big\}. \qquad (2.3) $$

Define the expectation with respect to the distribution p(x; θ) as

$$ E_\theta(f) = \int f(x)\, p(x;\theta)\, dx. \qquad (2.4) $$

Since p(x; θ) satisfies

$$ \int p(x;\theta)\, dx = 1, \qquad (2.5) $$

differentiating (2.5) with respect to θⁱ gives E_θ[∂ᵢℓ(x; θ)] = 0. Hence for any random variable A(x) ∈ T¹_θ(S) we have E_θ[A(x)] = 0. This expectation induces an inner product on S in a natural way.
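The identity E_θ[∂ᵢℓ(x; θ)] = 0 can be checked numerically for a concrete model. The following sketch is ours, not part of the paper; the quadrature helper `expect` and the parameter values are illustrative choices. It integrates the two score functions of the normal model N(μ, σ) against its density:

```python
import math

# Illustrative check (not from the paper) that the scores of the normal
# model N(mu, sigma) have zero expectation, E_theta[d_i l] = 0.
mu, sigma = 1.0, 2.0

def p(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def score_mu(x):      # d l / d mu = (x - mu) / sigma^2
    return (x - mu) / sigma ** 2

def score_sigma(x):   # d l / d sigma = (x - mu)^2 / sigma^3 - 1 / sigma
    return (x - mu) ** 2 / sigma ** 3 - 1 / sigma

def expect(f, n=200_000, half_width=12.0):
    # midpoint quadrature of E_theta[f] over [mu - hw*sigma, mu + hw*sigma];
    # the Gaussian tails outside this window are negligible
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / n
    return sum(f(a + (k + 0.5) * h) * p(a + (k + 0.5) * h) for k in range(n)) * h

print(abs(expect(score_mu)) < 1e-8, abs(expect(score_sigma)) < 1e-8)  # → True True
```

Differentiating the normalization (2.5) under the integral sign, as allowed by regularity condition (3), is exactly what makes both expectations vanish.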
$$ \langle A(x), B(x) \rangle_\theta = E_\theta[A(x)B(x)], \quad \text{for } A(x), B(x) \in T_\theta^1(S). $$

In particular, the inner product of the basis vectors ∂ᵢ and ∂ⱼ is

$$ g_{ij}(\theta) = \langle \partial_i, \partial_j \rangle_\theta = E_\theta[\partial_i \ell(x;\theta)\, \partial_j \ell(x;\theta)] \qquad (2.6) $$
$$ = -E_\theta[\partial_i \partial_j \ell(x;\theta)] \qquad (2.7) $$
$$ = \int \partial_i \ell(x;\theta)\, \partial_j \ell(x;\theta)\, p(x;\theta)\, dx. \qquad (2.8) $$

It is clear that the matrix G(θ) = (g_ij(θ)) is symmetric (i.e. g_ij = g_ji). For any nonzero n-dimensional vector c = [c¹, ..., cⁿ]ᵗ,

$$ c^t G(\theta) c = \int \Big\{ \sum_{i=1}^{n} c^i \partial_i \ell(x;\theta) \Big\}^2 p(x;\theta)\, dx > 0 \qquad (2.9) $$

since {∂₁ℓ(x; θ), ..., ∂ₙℓ(x; θ)} are linearly independent; hence G is positive definite. Therefore g = ⟨ , ⟩ defined in (2.8) is a Riemannian metric on the statistical manifold S, called the Fisher information metric.

Example 2.3. (Normal distribution) X = ℝ, n = 2, θ = (μ, σ), E = {(μ, σ) | −∞ < μ < ∞, 0 < σ < ∞},

$$ S = N(\mu, \sigma) = \Big\{\, p(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{(x-\mu)^2}{2\sigma^2} \Big\} \,\Big\}. \qquad (2.10) $$

This is a 2-dimensional manifold which can be identified with the upper half plane. The log-likelihood function is

$$ \ell(x;\theta) = -\frac{(x-\mu)^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma). $$

The tangent space T¹_θ(S) is spanned by ∂₁ℓ and ∂₂ℓ, where ∂₁ = ∂/∂μ and ∂₂ = ∂/∂σ:

$$ \partial_1 \ell = \frac{x-\mu}{\sigma^2}, \qquad \partial_2 \ell = \frac{(x-\mu)^2}{\sigma^3} - \frac{1}{\sigma}. $$

Then the Fisher information matrix G(θ) = (g_ij) is given by

$$ G(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{pmatrix}. $$

Definition 2.4. Let S = {p(x; θ) | θ ∈ E} be an n-dimensional statistical manifold with the Fisher metric g. Define n³ functions Γ¹_ijk by

$$ \Gamma^1_{ijk} = E_\theta[(\partial_i \partial_j \ell(x;\theta))(\partial_k \ell(x;\theta))]. \qquad (2.11) $$

The Γ¹_ijk uniquely determine an affine connection ∇¹ on the statistical manifold S by

$$ \Gamma^1_{ijk} = \langle \nabla^1_{\partial_i} \partial_j, \partial_k \rangle. \qquad (2.12) $$

∇¹ is called the 1-connection or the exponential connection.

Here ℓ(x; θ), the logarithm of the density function p(x; θ), is used to define the fundamental geometric structures on the statistical model S = {p(x; θ)}. Amari defined a one-parameter family of functions, called the α-embeddings, indexed by α ∈ ℝ.

Definition 2.5.
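The Fisher matrix of Example 2.3 can be verified by numerical integration. This is an illustrative sketch (ours, not from the paper; the quadrature helper `expect` and the parameter values are arbitrary choices): it computes g_ij = E_θ[∂ᵢℓ ∂ⱼℓ] from (2.6) by midpoint quadrature and recovers diag(1/σ², 2/σ²):

```python
import math

# Illustrative quadrature check (not from the paper) of Example 2.3:
# the Fisher matrix of N(mu, sigma) in the coordinates (mu, sigma)
# should be diag(1/sigma^2, 2/sigma^2).
mu, sigma = 0.5, 1.5

def p(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

scores = [
    lambda x: (x - mu) / sigma ** 2,                   # d l / d mu
    lambda x: (x - mu) ** 2 / sigma ** 3 - 1 / sigma,  # d l / d sigma
]

def expect(f, n=200_000, half_width=12.0):
    # midpoint quadrature over [mu - hw*sigma, mu + hw*sigma]
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / n
    return sum(f(a + (k + 0.5) * h) * p(a + (k + 0.5) * h) for k in range(n)) * h

# g_ij = E_theta[(d_i l)(d_j l)], equation (2.6)
G = [[expect(lambda x, i=i, j=j: scores[i](x) * scores[j](x)) for j in range(2)]
     for i in range(2)]

print(round(G[0][0], 6), round(G[1][1], 6))  # → 0.444444 0.888889
```

The off-diagonal entries come out numerically zero as well, matching the vanishing third central moment E[(x−μ)³] = 0 of the normal distribution.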
Let L^(α)(p) be the one-parameter family of functions defined by

$$ L^{(\alpha)}(p) = \begin{cases} \dfrac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}} & \alpha \neq 1 \\[1ex] \log p & \alpha = 1 \end{cases} \qquad (2.13) $$

and we call

$$ \ell_\alpha(x;\theta) = L^{(\alpha)}(p(x;\theta)) \qquad (2.14) $$

the α-representation of the density function p(x; θ). The 1-representation ℓ₁(x; θ) is the log-likelihood function ℓ(x; θ), and the (−1)-representation ℓ₋₁(x; θ) is the density function p(x; θ) itself.

Let T^α_θ(S) be the vector space spanned by the n linearly independent functions ∂ᵢℓ_α(x; θ) in x, for i = 1, ..., n:

$$ T_\theta^\alpha(S) = \Big\{\, A(x) \;\Big|\; A(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x;\theta) \,\Big\}. \qquad (2.15) $$

There is a natural isomorphism between the vector spaces T¹_θ(S) and T^α_θ(S), given by

$$ \partial_i \ell_1(x;\theta) \in T_\theta^1(S) \;\longleftrightarrow\; \partial_i \ell_\alpha(x;\theta) \in T_\theta^\alpha(S). \qquad (2.16) $$

The vector space T^α_θ(S) is called the α-representation of the tangent space T¹_θ(S). The α-representation of a vector A = Σᵢ Aⁱ∂ᵢℓ ∈ T¹_θ(S) is the random variable

$$ A_\alpha(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x;\theta). \qquad (2.17) $$

Define the α-expectation of a random variable f with respect to the density p(x; θ) as

$$ E_\theta^{(\alpha)}(f) = \int f(x)\, p(x;\theta)^\alpha\, dx. $$
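Two basic properties of the α-embedding can be sanity-checked numerically: L^(−1)(p) = p, and the chain rule applied to (2.13) gives ∂ᵢℓ_α = p^{(1−α)/2} ∂ᵢℓ, which at α = 1 recovers the score ∂ᵢℓ. The following sketch is ours, not part of the paper; the evaluation point and parameter values are arbitrary, and the derivative in μ is approximated by central differences:

```python
import math

# Illustrative check (not from the paper) of the alpha-embedding (2.13)
# for the normal density, with x held fixed and mu as the parameter.
mu, sigma, x = 0.0, 1.0, 0.7

def p(m):
    # normal density as a function of the parameter mu (x fixed)
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def L_alpha(v, alpha):
    # equation (2.13)
    return math.log(v) if alpha == 1 else 2.0 / (1 - alpha) * v ** ((1 - alpha) / 2)

# the (-1)-embedding returns the density itself: L^(-1)(p) = p
print(abs(L_alpha(p(mu), -1) - p(mu)) < 1e-12)  # → True

# d/dmu l_alpha = p**((1 - alpha)/2) * (d/dmu l), checked by central differences
score = (x - mu) / sigma ** 2   # d l / d mu
h = 1e-6
for alpha in (-1.0, 0.0, 0.5, 1.0):
    fd = (L_alpha(p(mu + h), alpha) - L_alpha(p(mu - h), alpha)) / (2 * h)
    print(abs(fd - p(mu) ** ((1 - alpha) / 2) * score) < 1e-6)  # → True (each alpha)
```

The factor p^{(1−α)/2} relating the two derivatives is exactly the correspondence behind the isomorphism (2.16) between T¹_θ(S) and T^α_θ(S).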