Global Journal of Advanced Research on Classical and Modern Geometries ISSN: 2284-5569, Vol.8, (2019), Issue 1, pp.18-25

SUBMANIFOLDS OF EXPONENTIAL FAMILIES

MAHESH T. V. AND K.S. SUBRAHAMANIAN MOOSATH

ABSTRACT. The exponential family with the $\pm 1$-connections plays an important role in information geometry. Amari proved that a submanifold $M$ of an exponential family $S$ is exponential if and only if $M$ is a $\nabla^{1}$-autoparallel submanifold. We show that if all $\nabla^{1}$-autoparallel proper submanifolds of a $\pm 1$-flat statistical manifold $S$ are exponential, then $S$ is an exponential family. We also show that a submanifold of a parameterized model $S$ which is an exponential family is a $\nabla^{1}$-autoparallel submanifold.

Keywords: statistical manifold, exponential family, autoparallel submanifold.
2010 MSC: 53A15

1. INTRODUCTION

Information geometry emerged from the geometric study of a statistical model of probability distributions. Information geometric tools are widely applied in fields such as statistics, information theory, stochastic processes, neural networks, statistical physics and neuroscience [3][7]. The importance of the differential geometric approach to the field of statistics was first noticed by C. R. Rao [6]. On a statistical model of probability distributions he introduced a Riemannian metric defined by the Fisher information matrix, known as the Fisher information metric. Another milestone in this area is the work of Amari [1][2][5]. He introduced the $\alpha$-geometric structures on a statistical manifold, consisting of the Fisher information metric and the $\pm\alpha$-connections. Harsha and Moosath [4] introduced more general geometric structures, called the $(F, G)$-geometry, on a statistical manifold, which generalize the $\alpha$-geometry. There have been many attempts to understand the geometry of the statistical manifold and also to develop a differential geometric framework for estimation theory.

In this paper we study the geometry of the exponential family. The exponential family is an important statistical model which has attracted many researchers from Physics, Mathematics and Statistics. It contains as special cases most of the standard discrete and continuous distributions used for practical modelling, such as the normal, Poisson, binomial, exponential, Gamma and multivariate normal distributions. Distributions in the exponential family have been used in classical statistics for decades. We discuss the dually flat structure of the finite dimensional exponential family with respect to the $\pm 1$-connections defined by Amari. Then we prove a condition for a $\pm 1$-flat statistical manifold to be an exponential family. We also show that a submanifold of a statistical manifold which is an exponential family is a $\nabla^{1}$-autoparallel submanifold.

2. STATISTICAL MANIFOLD

Consider the sample space $\mathcal{X} \subseteq \mathbb{R}^n$. A probability measure on $\mathcal{X}$ can be represented in terms of a density function with respect to the Lebesgue measure.



Definition 2.1. Consider a family $\mathcal{S}$ of probability distributions on $\mathcal{X}$. Suppose each element of $\mathcal{S}$ can be parametrized using $n$ real-valued variables $(\theta^1, \dots, \theta^n)$ so that
\[ \mathcal{S} = \{ p_\theta = p(x;\theta) \mid \theta = (\theta^1, \dots, \theta^n) \in E \} \tag{2.1} \]
where $E$ is a subset of $\mathbb{R}^n$ and the mapping $\theta \mapsto p_\theta$ is injective. We call such a family $\mathcal{S}$ an $n$-dimensional statistical model, a parametric model, or simply a model with parameter $\theta$.

Let us now state certain regularity conditions which are required for our geometric theory.

Regularity conditions
(1) We assume that $E$ is an open subset of $\mathbb{R}^n$ and, for each $x \in \mathcal{X}$, the function $\theta \mapsto p(x;\theta)$ is of class $C^\infty$.
(2) Let $\ell(x;\theta) = \log p(x;\theta)$. For every fixed $\theta$, the $n$ functions $\{\partial_i \ell(x;\theta);\ i = 1, \dots, n\}$ in $x$ are linearly independent, where $\partial_i = \frac{\partial}{\partial \theta^i}$.
(3) The order of integration and differentiation may be freely rearranged.
(4) The moments of $\partial_i \ell(x;\theta)$ exist up to the necessary orders.
(5) For a density $p$ on the sample space $\mathcal{X}$, let the support of $p$ be defined as $\mathrm{supp}(p) := \{x \mid p(x) > 0\}$. The case when $\mathrm{supp}(p_\theta)$ varies with $\theta$ poses rather significant difficulties for analysis. Hence we assume that $\mathrm{supp}(p_\theta)$ is constant with respect to $\theta$. Then we can redefine $\mathcal{X}$ to be $\mathrm{supp}(p_\theta)$. This is equivalent to assuming that $p(x;\theta) > 0$ holds for all $\theta \in E$ and all $x \in \mathcal{X}$. This means that the model $\mathcal{S}$ is a subset of
\[ \mathcal{P}(\mathcal{X}) := \Big\{ p : \mathcal{X} \to \mathbb{R} \ \Big|\ p(x) > 0\ (\forall x \in \mathcal{X});\ \int_{\mathcal{X}} p(x)\,dx = 1 \Big\}. \tag{2.2} \]

Definition 2.2. For a model $\mathcal{S} = \{p_\theta \mid \theta \in E\}$, the mapping $\varphi : \mathcal{S} \to \mathbb{R}^n$ defined by $\varphi(p_\theta) = \theta$ allows us to consider $\varphi = (\theta^i)$ as a coordinate system for $\mathcal{S}$. Suppose we have a $C^\infty$ diffeomorphism $\psi : E \to \psi(E)$, where $\psi(E)$ is an open subset of $\mathbb{R}^n$. Then if we use $\rho = \psi(\theta)$ instead of $\theta$ as our parameter, we obtain $\mathcal{S} = \{p_{\psi^{-1}(\rho)} \mid \rho \in \psi(E)\}$. This expresses the same family of probability distributions $\mathcal{S} = \{p_\theta\}$. If we consider parametrizations which are $C^\infty$ diffeomorphic to each other to be equivalent, then we may consider $\mathcal{S}$ as a $C^\infty$ differentiable manifold, and we call it a statistical manifold.

For the statistical manifold $\mathcal{S} = \{p(x;\theta)\}$, define $\ell(x;\theta) = \log p(x;\theta)$ and consider the partial derivatives $\partial_i \ell$; $i = 1, \dots, n$. By our assumption, $\partial_i \ell$; $i = 1, \dots, n$ are linearly independent functions in $x$. We can construct the following $n$-dimensional vector space spanned by the $n$ functions $\partial_i \ell$; $i = 1, \dots, n$ in $x$:
\[ T^1_\theta(\mathcal{S}) = \Big\{ A(x) \ \Big|\ A(x) = \sum_{i=1}^{n} A^i \partial_i \ell \Big\}. \tag{2.3} \]
Define the expectation with respect to the distribution $p(x;\theta)$ as
\[ E_\theta(f) = \int f(x)\, p(x;\theta)\,dx. \tag{2.4} \]
Note that $E_\theta[\partial_i \ell(x;\theta)] = 0$ since $p(x;\theta)$ satisfies
\[ \int p(x;\theta)\,dx = 1. \tag{2.5} \]
Hence for any random variable $A(x) \in T^1_\theta(\mathcal{S})$, we have $E_\theta[A(x)] = 0$. This expectation induces an inner product on $\mathcal{S}$ in a natural way:
\[ \langle A(x), B(x) \rangle_\theta = E_\theta[A(x)B(x)], \qquad A(x), B(x) \in T^1_\theta(\mathcal{S}). \]
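The zero-mean property of the score and the induced inner product can be checked numerically. The following is a minimal sketch, assuming a one-parameter model that is not taken from the paper: the exponential density $p(x;\theta) = \theta e^{-\theta x}$ on $x > 0$. The function names, the truncation point `upper` and the midpoint rule are illustrative choices only.

```python
import math

def density(x, theta):
    """Assumed example model: p(x; theta) = theta * exp(-theta * x) for x > 0."""
    return theta * math.exp(-theta * x)

def score(x, theta):
    """Derivative of log p(x; theta) with respect to theta: 1/theta - x."""
    return 1.0 / theta - x

def expectation(f, theta, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of E_theta[f], as in (2.4), truncated at `upper`."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += f(x) * density(x, theta)
    return total * h

theta = 1.5
# E_theta[d log p / d theta], which should vanish because of (2.5).
mean_score = expectation(lambda x: score(x, theta), theta)
# <d_theta, d_theta>_theta = E_theta[(score)^2]; for this model it equals 1/theta^2.
inner_product = expectation(lambda x: score(x, theta) ** 2, theta)

print(mean_score, inner_product, 1.0 / theta ** 2)
```

For this assumed model the score is $1/\theta - x$, so its mean vanishes because $E_\theta[x] = 1/\theta$, and the inner product of the basis vector with itself is the variance of $x$.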


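In particular, the inner product of the basis vectors $\partial_i$ and $\partial_j$ is
\begin{align}
g_{ij}(\theta) = \langle \partial_i, \partial_j \rangle_\theta &= E_\theta[\partial_i \ell(x;\theta)\, \partial_j \ell(x;\theta)] \tag{2.6} \\
&= -E_\theta[\partial_i \partial_j \ell(x;\theta)] \tag{2.7} \\
&= \int \partial_i \ell(x;\theta)\, \partial_j \ell(x;\theta)\, p(x;\theta)\,dx. \tag{2.8}
\end{align}
It is clear that the matrix $G(\theta) = (g_{ij}(\theta))$ is symmetric (i.e. $g_{ij} = g_{ji}$). For any nonzero $n$-dimensional vector $c = [c^1, \dots, c^n]^t$,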

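\[ c^t G(\theta)\, c = \int \Big\{ \sum_{i=1}^{n} c^i \partial_i \ell(x;\theta) \Big\}^2 p(x;\theta)\,dx > 0 \tag{2.9} \]
since $\{\partial_1 \ell(x;\theta), \dots, \partial_n \ell(x;\theta)\}$ are linearly independent; hence $G$ is positive definite. Hence $g = \langle\,,\rangle$ defined in (2.8) is a Riemannian metric on the statistical manifold $\mathcal{S}$, called the Fisher information metric.

Example 2.3. $\mathcal{X} = \mathbb{R}$, $n = 2$, $\theta = (\mu, \sigma)$, $E = \{(\mu, \sigma) \mid -\infty < \mu < \infty,\ 0 < \sigma < \infty\}$,
\[ \mathcal{S} = N(\mu, \sigma) = \Big\{ p(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{(x-\mu)^2}{2\sigma^2} \Big\} \Big\}. \tag{2.10} \]
This is a 2-dimensional manifold which can be identified with the upper half plane. The log-likelihood function is given by
\[ \ell(x;\theta) = -\frac{(x-\mu)^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma). \]
The tangent space $T^1_\theta(\mathcal{S})$ is spanned by $\partial_1 = \frac{\partial}{\partial\mu}$ and $\partial_2 = \frac{\partial}{\partial\sigma}$, with
\[ \partial_1 \ell = \frac{x-\mu}{\sigma^2}, \qquad \partial_2 \ell = \frac{(x-\mu)^2}{\sigma^3} - \frac{1}{\sigma}. \]
Then the Fisher information matrix $G(\theta) = (g_{ij})$ is given by
\[ G(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{2}{\sigma^2} \end{pmatrix}. \]

Definition 2.4. Let $\mathcal{S} = \{p(x;\theta) \mid \theta \in E\}$ be an $n$-dimensional statistical manifold with the Fisher metric $g$. We can define $n^3$ functions $\Gamma^1_{ijk}$ by
\[ \Gamma^1_{ijk} = E_\theta[(\partial_i \partial_j \ell(x;\theta))(\partial_k \ell(x;\theta))]. \tag{2.11} \]
The $\Gamma^1_{ijk}$ uniquely determine an affine connection $\nabla^1$ on the statistical manifold $\mathcal{S}$ by
\[ \Gamma^1_{ijk} = \langle \nabla^1_{\partial_i} \partial_j, \partial_k \rangle. \tag{2.12} \]
$\nabla^1$ is called the 1-connection or the exponential connection.

Here $\ell(x;\theta)$, the logarithm of the density function $p(x;\theta)$, is used to define the fundamental geometric structures on a statistical model $\mathcal{S} = \{p(x;\theta)\}$. Amari defined a one parameter family of functions, called the $\alpha$-embedding, indexed by $\alpha \in \mathbb{R}$.

Definition 2.5. Let $L^{(\alpha)}(p)$ be a one parameter family of functions defined by
\[ L^{(\alpha)}(p) = \begin{cases} \dfrac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}}, & \alpha \neq 1, \\[4pt] \log p, & \alpha = 1, \end{cases} \tag{2.13} \]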

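and we call
\[ \ell_\alpha(x;\theta) = L^{(\alpha)}(p(x;\theta)) \tag{2.14} \]
the $\alpha$-representation of the density function $p(x;\theta)$. The 1-representation $\ell_1(x;\theta)$ is the log-likelihood function $\ell(x;\theta)$ and the $(-1)$-representation $\ell_{-1}(x;\theta)$ is the density function $p(x;\theta)$ itself.

Let $T^\alpha_\theta(\mathcal{S})$ be the vector space spanned by the $n$ linearly independent functions $\partial_i \ell_\alpha(x;\theta)$ in $x$ for $i = 1, \dots, n$:
\[ T^\alpha_\theta(\mathcal{S}) = \Big\{ A(x) \ \Big|\ A(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x;\theta) \Big\}. \tag{2.15} \]
There is a natural isomorphism between the two vector spaces $T^1_\theta(\mathcal{S})$ and $T^\alpha_\theta(\mathcal{S})$ given by
\[ \partial_i \ell_1(x;\theta) \in T^1_\theta(\mathcal{S}) \longleftrightarrow \partial_i \ell_\alpha(x;\theta) \in T^\alpha_\theta(\mathcal{S}). \tag{2.16} \]
The vector space $T^\alpha_\theta(\mathcal{S})$ is called the $\alpha$-representation of the tangent space $T^1_\theta(\mathcal{S})$. The $\alpha$-representation of a vector $A = \sum_{i=1}^{n} A^i \partial_i \ell \in T^1_\theta(\mathcal{S})$ is the random variable
\[ A_\alpha(x) = \sum_{i=1}^{n} A^i \partial_i \ell_\alpha(x;\theta). \tag{2.17} \]
Let us define the $\alpha$-expectation of a random variable $f$ with respect to the density $p(x;\theta)$ as
\[ E^\alpha_\theta(f) = \int f(x)\, p(x;\theta)^\alpha\,dx. \tag{2.18} \]
Then an inner product can be defined naturally as
\[ \langle A_\alpha(x), B_\alpha(x) \rangle^\alpha_\theta = E^\alpha_\theta[A_\alpha(x) B_\alpha(x)]. \tag{2.19} \]
We have the relations
\begin{align}
\partial_i \ell_\alpha(x;\theta) &= p^{\frac{1-\alpha}{2}}\, \partial_i \ell(x;\theta), \tag{2.20} \\
(\partial_i \ell_\alpha)(\partial_j \ell_{-\alpha}) &= p(x;\theta)\, \partial_i \ell\, \partial_j \ell. \tag{2.21}
\end{align}
Thus we have
\begin{align}
\langle \partial_i, \partial_j \rangle^\alpha_\theta &= \int \partial_i \ell_\alpha(x;\theta)\, \partial_j \ell_\alpha(x;\theta)\, p(x;\theta)^\alpha\,dx \tag{2.22} \\
&= \int \partial_i \ell\, \partial_j \ell\, p(x;\theta)\,dx \tag{2.23} \\
&= g_{ij}(\theta) \tag{2.24}
\end{align}
and the inner product has the following dualistic expression for any $\alpha$:
\[ \langle A_\alpha(x), B_\alpha(x) \rangle^\alpha_\theta = \int A_\alpha(x;\theta)\, B_{-\alpha}(x;\theta)\,dx. \tag{2.25} \]
Then we say that the two vector spaces $T^\alpha_\theta(\mathcal{S})$ and $T^{-\alpha}_\theta(\mathcal{S})$ are dually coupled. That is, the inner product of two vectors $A$ and $B$ is given by the integral of the product of their $\alpha$- and $(-\alpha)$-representations.

We have
\[ \partial_i \partial_j \ell_\alpha = p^{\frac{1-\alpha}{2}} \Big( \partial_i \partial_j \ell + \frac{1-\alpha}{2}\, \partial_i \ell\, \partial_j \ell \Big). \tag{2.26} \]
Hence we can define $n^3$ functions $\Gamma^\alpha_{ijk}$ by
\[ \Gamma^\alpha_{ijk} = \int \partial_i \partial_j \ell_\alpha(x;\theta)\, \partial_k \ell_{-\alpha}(x;\theta)\,dx. \tag{2.27} \]
These $\Gamma^\alpha_{ijk}$ uniquely determine connections $\nabla^\alpha$ on the statistical manifold $\mathcal{S}$ by
\[ \Gamma^\alpha_{ijk} = \langle \nabla^\alpha_{\partial_i} \partial_j, \partial_k \rangle \tag{2.28} \]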

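which is called the $\alpha$-connection.

Thus the one parameter family of functions $L^{(\alpha)}(p)$ defines a family of connections $\nabla^\alpha$, $\alpha \in \mathbb{R}$, on the statistical manifold $\mathcal{S}$.

Lemma 2.6. The $\alpha$-connection $\nabla^\alpha$ and the $(-\alpha)$-connection $\nabla^{-\alpha}$ are dual with respect to the Fisher information metric. In particular, the 0-connection is the Levi-Civita connection or the metric connection.

Proof. By the use of the $\alpha$-representation, we have
\begin{align}
A \langle B, C \rangle &= A \int B_\alpha(x;\theta)\, C_{-\alpha}(x;\theta)\,dx \tag{2.29} \\
&= \int (A B_\alpha(x;\theta))\, C_{-\alpha}(x;\theta)\,dx + \int B_\alpha(x;\theta)\, (A C_{-\alpha}(x;\theta))\,dx \notag \\
&= \langle \nabla^\alpha_A B, C \rangle + \langle B, \nabla^{-\alpha}_A C \rangle. \tag{2.30}
\end{align}
For $\alpha = 0$ both connections coincide, so $\nabla^0$ is a metric connection. □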
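The Fisher information matrix of Example 2.3 can be verified numerically. The sketch below, written under stated assumptions, evaluates both expressions (2.6) and (2.7) for the normal model (2.10) with a midpoint rule and compares them with $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The integration window of $\pm 12\sigma$, the step count and all function names are illustrative choices, not part of the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density p(x; theta) of N(mu, sigma) as in (2.10)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def score(x, mu, sigma):
    """First derivatives (d/dmu, d/dsigma) of the log-likelihood."""
    z = x - mu
    return (z / sigma ** 2, z ** 2 / sigma ** 3 - 1.0 / sigma)

def hessian(x, mu, sigma):
    """Second derivatives of the log-likelihood as a 2x2 nested tuple."""
    z = x - mu
    d_mu_mu = -1.0 / sigma ** 2
    d_mu_sigma = -2.0 * z / sigma ** 3
    d_sigma_sigma = -3.0 * z ** 2 / sigma ** 4 + 1.0 / sigma ** 2
    return ((d_mu_mu, d_mu_sigma), (d_mu_sigma, d_sigma_sigma))

def fisher_matrices(mu, sigma, width=12.0, steps=100_000):
    """Midpoint-rule estimates of G via (2.6) (score products) and (2.7) (minus Hessian)."""
    lower = mu - width * sigma
    h = 2.0 * width * sigma / steps
    g_outer = [[0.0, 0.0], [0.0, 0.0]]
    g_hess = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(steps):
        x = lower + (n + 0.5) * h
        weight = normal_pdf(x, mu, sigma) * h
        s = score(x, mu, sigma)
        hs = hessian(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                g_outer[i][j] += s[i] * s[j] * weight
                g_hess[i][j] -= hs[i][j] * weight
    return g_outer, g_hess

mu, sigma = 0.5, 2.0
g_outer, g_hess = fisher_matrices(mu, sigma)
expected = [[1.0 / sigma ** 2, 0.0], [0.0, 2.0 / sigma ** 2]]
print(g_outer, g_hess, expected)
```

Both estimates should agree with each other and with the closed form of Example 2.3 up to quadrature error, which illustrates the equality of (2.6) and (2.7) under regularity condition (3).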

3. THE EXPONENTIAL FAMILY

The exponential family is a practically convenient and widely used unified family of distributions on finite dimensional Euclidean spaces, parametrized by a finite dimensional parameter vector. It contains as special cases most of the standard discrete and continuous distributions used for practical modelling, such as the normal, Poisson, binomial, exponential, Gamma and multivariate normal distributions.

Definition 3.1. The standard form of an $n$-dimensional exponential family of distributions $\mathcal{S} = \{p(x;\theta) \mid \theta \in E \subseteq \mathbb{R}^n\}$ is
\[ p(x;\theta) = \exp\Big( \sum_{i=1}^{n} \theta^i x_i - \psi(\theta) \Big) \quad \text{or} \quad \log p(x;\theta) = \sum_{i=1}^{n} \theta^i x_i - \psi(\theta) \tag{3.1} \]
where $x = (x_1, \dots, x_n)$ is a set of random variables, $\theta = (\theta^1, \dots, \theta^n)$ are the canonical parameters and $\psi(\theta)$ is determined from the normalization condition.

Now consider the exponential family $\mathcal{S} = \{p(x;\theta) \mid \theta \in E \subseteq \mathbb{R}^n\}$, where $p(x;\theta) = \exp\big[\sum_{i=1}^{n} \theta^i x_i - \psi(\theta)\big]$. We have
\[ \partial_i \ell(x;\theta) = x_i - \partial_i \psi(\theta), \qquad \partial_i \partial_j \ell(x;\theta) = -\partial_i \partial_j \psi(\theta). \]
Since $\partial_i \partial_j \ell$ does not depend on $x$ and $E_\theta[\partial_k \ell] = 0$,
\[ \Gamma^1_{ijk} = -\partial_i \partial_j \psi(\theta)\, E_\theta[\partial_k \ell] = 0. \]
Thus we have $\nabla^1_{\partial_i} \partial_j = 0$, and we say that the exponential family is 1-flat. By duality it is also $(-1)$-flat. We have thus proved the following.

Theorem 3.2. The exponential family is a dually flat space with respect to the $\pm 1$-connections defined by Amari.

A dually flat space is an important tool in the geometric study of statistical estimation. We have now seen that an important statistical model, the exponential family, has a dually flat structure with respect to the $\alpha = \pm 1$ connections.
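The vanishing of $\Gamma^1_{ijk}$ in the canonical coordinates can be illustrated on a small example. The sketch below is only an illustration under stated assumptions: it uses a three-outcome discrete family in the form (3.1), so integrals with respect to Lebesgue measure are replaced by sums over the outcomes (counting measure), and the derivatives of the log-likelihood are taken by central finite differences so that the result is not built in by hand. It evaluates $\Gamma^\alpha_{ijk} = E_\theta[(\partial_i\partial_j \ell + \tfrac{1-\alpha}{2}\partial_i \ell\, \partial_j \ell)\,\partial_k \ell]$, which follows from (2.20), (2.26) and (2.27), at $\alpha = 1$. The outcome table, step size and function names are hypothetical.

```python
import math
from itertools import product

# Assumed example: sufficient statistics x = (x_1, x_2) of a three-outcome family.
OUTCOMES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

def psi(theta):
    """Normalizer psi(theta) = log sum_x exp(theta . x), from the normalization condition."""
    return math.log(sum(math.exp(theta[0] * x[0] + theta[1] * x[1]) for x in OUTCOMES))

def log_p(x, theta):
    """Log-likelihood in the standard form (3.1)."""
    return theta[0] * x[0] + theta[1] * x[1] - psi(theta)

def shifted(theta, i, h):
    """Copy of theta with the i-th coordinate moved by h."""
    t = list(theta)
    t[i] += h
    return tuple(t)

def d1(x, theta, i, h=1e-4):
    """Central-difference approximation of d_i log p."""
    return (log_p(x, shifted(theta, i, h)) - log_p(x, shifted(theta, i, -h))) / (2.0 * h)

def d2(x, theta, i, j, h=1e-4):
    """Central-difference approximation of d_i d_j log p."""
    return (d1(x, shifted(theta, j, h), i, h) - d1(x, shifted(theta, j, -h), i, h)) / (2.0 * h)

def gamma_alpha(theta, alpha, i, j, k):
    """Gamma^alpha_{ijk} as an expectation over the finite sample space."""
    total = 0.0
    for x in OUTCOMES:
        p = math.exp(log_p(x, theta))
        bracket = d2(x, theta, i, j) + 0.5 * (1.0 - alpha) * d1(x, theta, i) * d1(x, theta, j)
        total += p * bracket * d1(x, theta, k)
    return total

theta = (0.3, -0.7)
gamma_e = {ijk: gamma_alpha(theta, 1.0, *ijk) for ijk in product(range(2), repeat=3)}
max_abs_gamma_e = max(abs(v) for v in gamma_e.values())
print(max_abs_gamma_e)
```

All eight coefficients at $\alpha = 1$ should vanish up to finite-difference error, in line with the computation preceding Theorem 3.2. Passing other values of `alpha` to `gamma_alpha` gives the remaining members of the family in the same coordinates.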

4. CHARACTERIZATION OF EXPONENTIAL FAMILY

Amari has given a necessary and sufficient condition for a submanifold of an exponential family to be exponential. In this section we prove that a parameterized family which is flat with respect to the $\pm 1$-connections is an exponential family if all its $\nabla^{1}$-autoparallel proper submanifolds are exponential. We also show that if a submanifold of a statistical manifold is exponential, then it is $\nabla^{1}$-autoparallel.


Definition 4.1. Let $M$ be a smooth $n$-dimensional manifold and let $\tau(M)$ denote the set of all smooth vector fields on $M$. Let $X \in \tau(M)$ and let $\nabla$ be a connection on $M$. Then $X$ is parallel on $M$ with respect to $\nabla$ iff $\nabla_Y X = 0$ for all $Y \in \tau(M)$. $M$ is called flat with respect to $\nabla$ if, in some coordinate system, $\nabla_{\partial_i} \partial_j = 0$ for all $i, j$.

Definition 4.2. Let $N$ be a submanifold of $M$ and let $\nabla$ be an affine connection on $M$. $N$ is said to be autoparallel with respect to $\nabla$ if $\nabla_X Y \in \tau(N)$ for all $X, Y \in \tau(N)$. 1-dimensional autoparallel submanifolds are called geodesics.

Remark 4.3. A necessary and sufficient condition for $N$ to be autoparallel is that $\nabla_{\partial_a} \partial_b \in \tau(N)$ holds for all $a, b$.

Definition 4.4. Let $N$ be a submanifold of $M$. Let $p \in N$; then $T_p N \subset T_p M$. Now consider the projection map $\pi_p : T_p M \to T_p N$ with $\pi_p(D) = D$, $\forall D \in T_p N$. Let $\nabla$ be a connection on $M$; then we define a connection $\nabla^\pi$ on $N$ as
\[ (\nabla^\pi_X Y)_p = \pi_p\big((\nabla_X Y)_p\big), \quad \forall p \in N. \]
Now define
\[ H(X, Y) = \nabla_X Y - \nabla^\pi_X Y. \]
$H$ is called the second fundamental form or embedding curvature.

Remark 4.5. For each $p \in N$, let $\{(\partial_a)_p ;\ 1 \le a \le m\}$, $m = \dim(N)$, be a basis for $T_p N$ and let $\{(\partial_k)_p ;\ m+1 \le k \le n\}$, $n = \dim(M)$, be a basis for $(T_p N)^\perp$. Then we define $m^2(n-m)$ functions $\{H_{abk}\}$ in the following way:
\[ H_{abk} = \langle H(\partial_a, \partial_b), \partial_k \rangle = \langle \nabla_{\partial_a} \partial_b, \partial_k \rangle. \]
It follows that $H = 0$ iff $H_{abk} = 0$, $\forall a, b, k$.

Remark 4.6. $H(X, Y) = 0$ for all $X, Y \in \tau(N)$ iff $N$ is a $\nabla$-autoparallel submanifold of $M$.

Remark 4.7. We know that the exponential family is dually flat with respect to the $\pm 1$-connections. But a parametrized model which is flat with respect to the $\pm 1$-connections need not be an exponential family.

Example 4.8. Let $q$ be a smooth probability density function on $\mathbb{R}$ and let $q^k$ be its $k$-th i.i.d. extension. Then for
\[ Y = (y_1, y_2, y_3, \dots, y_k)^t \tag{4.1} \]
we have
\[ q^k(Y) = q(y_1)\, q(y_2)\, q(y_3) \cdots q(y_k). \tag{4.2} \]
For a regular matrix $A \in \mathbb{R}^{k \times k}$ and a vector $\mu \in \mathbb{R}^k$, we define a probability density function on $\mathbb{R}^k$ by
\[ p(A, \mu, x) = \frac{q^k(A^{-1}(x - \mu))}{|\det(A)|}. \tag{4.3} \]
Now define a statistical model
\[ S = \{ p(A, \mu, x) \mid \mu \in \mathbb{R}^k \}. \tag{4.4} \]
Now consider
\[ \log p(A, \mu, x) = \sum_{i=1}^{k} \log q\big( (A^{-1}(x - \mu))_i \big) - \log(|\det(A)|). \tag{4.5} \]
Then clearly $\frac{\partial \log p(A, \mu, x)}{\partial \mu^i}$ is constant. So, from the definition of Amari's $\alpha$-connection, $\Gamma^\alpha_{ij,k} = 0$, which implies that $S$ is $\alpha$-flat for all $\alpha$; but in general $S$ is not necessarily an exponential family.


Amari [1] has proved that a submanifold $M$ of an exponential family $S$ is an exponential family if and only if $M$ is autoparallel with respect to $\nabla^1$ in $S$. Now we prove a condition for a statistical manifold to be an exponential family.

Theorem 4.9. Let $S = \{p(x, \theta) \mid \theta \in \Theta\}$ be a parametrized family which is flat with respect to $\nabla^1$ and $\nabla^{-1}$. If all $\nabla^1$-autoparallel proper submanifolds of $S$ are exponential families, then $S$ is an exponential family.

Proof. Let $S = \{p(x, \theta) \mid \theta \in \Theta\}$ be an $n$-dimensional statistical manifold with dually flat structure $(S, g, \nabla^1, \nabla^{-1})$, where $g$ is the Fisher information metric. Let $\theta = [\theta^i]$ and $\eta = [\eta_j]$ be the coordinate systems of $S$ with respect to $\nabla^1$ and $\nabla^{-1}$ respectively. Now subdivide the range of the index $i = 1, 2, \dots, n$ into indexing sets $I = \{i = 1, 2, \dots, k\}$ and $II = \{i = k+1, k+2, \dots, n\}$. Let $M(C_{II})$ be the set of points whose coordinates $[\theta^i]$ in $II$ are fixed to the constants $C_{II} = (C^i_{II})$ for $i = k+1, k+2, \dots, n$. That is,
\[ M(C_{II}) = \{ p \in S \mid \theta^{k+1} = C^{k+1}_{II},\ \theta^{k+2} = C^{k+2}_{II},\ \dots,\ \theta^{n} = C^{n}_{II} \} \tag{4.6} \]
where $C_{II} \in \mathbb{R}^{n-k}$. Clearly this is an affine subspace with respect to the $\theta$-coordinate system, which implies that $M(C_{II})$ is a $\nabla^1$-autoparallel submanifold of $S$. Also, if $C_{II} \neq C'_{II}$ then $M(C_{II}) \cap M(C'_{II}) = \emptyset$, and $\bigcup_{C_{II}} M(C_{II}) = S$. Now by our assumption $M(C_{II})$ is an exponential family for all $C_{II}$. If $p(x, \theta) \in S$, then $p(x, \theta) \in M(C_{II})$ for some constant $C_{II}$; this implies that
\[ p(x, \theta) = \exp\Big( \sum_{i=1}^{k} \theta^i x_i - \psi^\beta(\theta) \Big) \tag{4.7} \]
where $\psi^\beta(\theta)$ is defined on $\Theta^\beta = \{\theta \in \Theta \mid \theta^{k+1} = C^{k+1}_{II},\ \theta^{k+2} = C^{k+2}_{II},\ \dots,\ \theta^{n} = C^{n}_{II}\}$. Now define $\phi(\theta) = \psi^\beta(\theta)$ if $\theta \in \Theta^\beta$. Then we can write
\begin{align}
p(x, \theta) &= \exp\Big( \sum_{i=1}^{k} \theta^i x_i - \phi(\theta) \Big) \tag{4.8} \\
&= \exp\Big( \sum_{i=1}^{k} \theta^i x_i + \sum_{i=k+1}^{n} C^i_{II} x_i - \sum_{i=k+1}^{n} C^i_{II} x_i - \phi(\theta) \Big) \tag{4.9} \\
&= \exp\Big( \sum_{i=1}^{n} \theta^i x_i + F(x) - \phi(\theta) \Big) \tag{4.10}
\end{align}
where $F(x) = -\sum_{i=k+1}^{n} C^i_{II} x_i$ for $p(x, \theta) \in M(C_{II})$, using $\theta^i = C^i_{II}$ for $i = k+1, \dots, n$ on $M(C_{II})$. Hence $S$ is an exponential family. □

Theorem 4.10. Let $S = \{p(x, \theta) \mid \theta \in \Theta\}$ be a statistical manifold with the $\nabla^1$ connection. Let $M$ be a submanifold of $S$. If $M$ is an exponential family then $M$ is a $\nabla^1$-autoparallel submanifold of $S$.

Proof. Let $S = \{p(x, \theta) \mid \theta \in \Theta\}$ and let $M = \{q(x, u)\}$ be a submanifold of $S$, with $[\theta^i]$ the coordinates on $S$ and $[u^a]$ the coordinates on $M$. Suppose $M$ is an exponential family; then, with $m = \dim(M)$,
\[ q(x, u) = p(x, \theta(u)) = \exp\Big\{ \sum_{a=1}^{m} u^a G_a(x) + D(x) - \phi(u) \Big\}. \tag{4.11} \]
We have $\Gamma^1_{ab,k} = E_\theta[(\partial_a \partial_b \ell_\theta)\, \partial_k \ell_\theta]$, where $\ell_\theta = \log p(x, \theta)$. Then $\partial_a \partial_b \ell_\theta = -\frac{\partial^2 \phi}{\partial u^a \partial u^b}$, which does not depend on $x$. Since $E_\theta[\partial_k \ell_\theta] = 0$, we have $\Gamma^1_{ab,k} = 0$, which implies $\langle \nabla^1_{\partial_a} \partial_b, \partial_k \rangle = 0$, $\forall k$. Hence $H_{abk} = 0$, which implies that $M$ is a $\nabla^1$-autoparallel submanifold of $S$. □

5. ACKNOWLEDGMENTS

The first named author was supported by a Doctoral Research Fellowship from the Indian Institute of Space Science and Technology (IIST), Kerala, India.


REFERENCES

[1] Amari S., Methods of Information Geometry, Oxford University Press (2000).
[2] Amari S., Information Geometry and Its Applications, Springer (2016).
[3] Ay N., Jost J., Le H.V., Schwachhöfer L., Information Geometry, Springer (2017).
[4] Harsha K.V. and Subrahamanian Moosath K.S., F-geometry and Amari's α-geometry on a Statistical Manifold, Entropy, 16(5): 2472-2487, 2014.
[5] Murray M.K. and Rice J.W., Differential Geometry and Statistics, Chapman & Hall (1993).
[6] Rao C.R., Information and accuracy attainable in the estimation of statistical parameters, Bulletin of the Calcutta Mathematical Society, 37: 81-91, 1945.
[7] Crasmareanu M., Hreţcanu C.E., Statistical structures on metric path spaces, Chin. Ann. Math., Ser. B 33, No. 6, 889-902 (2012).

DEPARTMENT OF MATHEMATICS, INDIAN INSTITUTE OF SPACE SCIENCE AND TECHNOLOGY, TRIVANDRUM, KERALA, INDIA
E-mail address: [email protected]

E-mail address : [email protected]
