
ARTICLE IN PRESS

Journal of Mathematical Psychology 50 (2006) 60–65 www.elsevier.com/locate/jmp

Statistical manifold as an affine space: A functional equation approach

Jun Zhang a,*, Peter Hästö b,1

a Department of Psychology, 525 East University Ave., University of Michigan, Ann Arbor, MI 48109-1109, USA
b Department of Mathematics and Statistics, P.O. Box 68, 00014 University of Helsinki, Finland

Received 7 January 2005; received in revised form 9 July 2005; available online 11 October 2005

Abstract

A statistical manifold $M_\mu$ consists of positive functions $f$ such that $f\,d\mu$ defines a probability measure. In order to define an atlas on the manifold, it is viewed as an affine space associated with a subspace of the Orlicz space $L^\Phi$. This leads to a functional equation whose solution, after imposing the linearity constraint in line with the vector space assumption, gives rise to a general form of mappings between the affine probability manifold and the vector (Orlicz) space. These results generalize the exponential statistical manifold and clarify some foundational issues in non-parametric information geometry.

© 2005 Published by Elsevier Inc.

Keywords: Probability density function; Exponential family; Non-parametric; Information geometry

* Corresponding author. Fax: +1 313 763 7480.
E-mail addresses: [email protected] (J. Zhang), peter.hasto@helsinki.fi (P. Hästö).
URLs: http://www-personal.umich.edu/junz/, http://www.helsinki.fi/hasto/.
1 Supported in part by a Gehring-Finland Post-doctoral Fellowship, by the Finnish Academy of Science and Letters and by the Research Council of Norway, Project 160192/V30.
0022-2496/$ - see front matter © 2005 Published by Elsevier Inc. doi:10.1016/j.jmp.2005.08.003

1. Introduction

Information geometry investigates the differential geometric structure of the manifold of probability density functions with continuous support (or probability distributions with discrete support). The idea of treating a family of parametric probability density functions as a smooth manifold, with their parameters as its coordinates, originated with Rao (1945), who identified Fisher information as the Riemannian metric on this manifold. Later Amari (1982), motivated by Efron (1975) and David (1975), and independently Čencov (1972/1982), established a family of affine connections with dualistic properties on the manifold of parametric density functions. A manifold endowed with a Fisher metric and dualistic (conjugate) affine connections is abstracted as a statistical manifold (Lauritzen, 1987). Building on this foundational work, parametric information geometry has found many applications in theoretical statistics, information theory, stochastic processes, neural computation, machine learning, Bayesian statistics, and other related fields (Amari, 1985; Amari & Nagaoka, 1993/2000).

Recently, an interest in the geometry associated with non-parametric probability densities has arisen (Pistone & Sempi, 1995; Giblisco & Pistone, 1998; Pistone & Rogantin, 1999; Grasselli, 2005). Non-parametric statistical models are important in a wide range of areas including psychological measurement and perception (Townsend, Solomon, & Smith, 2001) and model selection and testing (Karabatsos, in press). Unlike in the parametric case, where the manifold of probability density functions inherits a Euclidean topology from the space of its natural parameters, a major challenge for the non-parametric case is to define a suitable topology and develop a corresponding notion of convergence. Fortunately, this obstacle was overcome by the introduction of an exponential statistical manifold by Pistone and Sempi (1995). These authors, among other things, gave an explicit formula for a chart of the manifold formed by all density functions absolutely continuous with respect to a given one. The topology on this infinite-dimensional manifold is induced via this chart; it is metric and can be defined based on a notion of exponential convergence of sequences (Pistone & Sempi, 1995; Pistone, 2001).

In this paper we address some foundational issues of non-parametric information geometry, with the goal of extending the Pistone–Sempi formula that maps the manifold of probability density functions into an Orlicz space $L^\Phi$ (a special Banach space, see Appendix A). The Banach-space valued map is the infinite-dimensional extension of the coordinate functions which take values in $\mathbb{R}^n$. Recall that $\mathbb{R}^n$ has a topological structure, in terms of its canonical topology, an algebraic structure as a vector space, and a geometric or affine structure as a set of points. Our approach is to exploit this affine structure and view the statistical manifold as an affine space associated with the vector space $L^\Phi$.

An affine space is a homogeneous set of points such that no point stands out in particular. Affine spaces differ from vector spaces in that no origin has been selected, so an affine space is fundamentally a geometric structure. The structure of an affine space is given by an operation $\oplus : A \times U \to A$ which associates to a point $a$ in the affine space $A$ and a vector $u$ in the vector space $U$ another point $a \oplus u$ in $A$. We think of this as a translation of a point $a$ in its space $A$ by a vector $u$. Notice that it makes no sense to add two points of $A$ in this setting.

One advantage of modelling a statistical manifold as a generalized affine space is to address the issue of representation of probability measures, i.e., through the use of an extended vector space $U$ as opposed to the probability manifold $A$. If the operation $\oplus$ exists and is continuous and differentiable, a global mapping is established between $U$ and $A$. This allows the setting up of a global atlas at any given point of $A$ (recall that the usual differentiable manifold only assumes the existence of an atlas locally).

The structure of the rest of this paper is as follows. In the next section we review the Pistone–Sempi framework and consider an obvious generalization of their original formulation. In Section 3 we introduce the affine structure on the probability manifold and derive a corresponding functional equation. We then solve this equation and derive a general expression for the $\oplus$ operation. This generalizes the exponential model in a natural way. Discussion of possible applications of these results is given in Section 4, followed by conclusions in Section 5. The reader who needs a refresher on Orlicz space theory can start by consulting Appendix A.

2. The Pistone–Sempi framework revisited

Pistone and Sempi (1995) introduced a non-parametric exponential statistical manifold consisting of all densities that are absolutely continuous with respect to a given one. In this section we review the parts of that article most relevant to our considerations.

Let $X$ be a set and $\mu$ a $\sigma$-finite probability measure on $X$, in other words $\mu(X) = 1$. We will consider the set

$$M_\mu := \{ p \in L^1(X, \mu) : p > 0 \text{ a.e.},\ \|p\|_{L^1(X,\mu)} = 1 \}.$$

This set will be endowed with a manifold structure. Since $X$ and $\mu$ will be fixed, we abbreviate $L^1 = L^1(X, \mu)$ and $L^1(p) = L^1(X, p\,\mu)$, and similarly for other function spaces. The expectation operator $E_p(f)$ over a function $f$ on $X$ is defined as

$$E_p(f) = \int_X f\,p\,d\mu = \|f\|_{L^1(p)},$$

the last equality holding for non-negative $f$.

Recall that a differentiable manifold $M$ of a set of functions is defined as follows: there exists a system of charts $\{O_i, \phi_i\}_{i \in \mathbb{N}}$, collectively called an atlas, such that

(i) each $O_i$ is an open set on $M$ and the union $\bigcup_{i \in \mathbb{N}} O_i$ covers $M$;
(ii) the associated mappings $\phi_i : O_i \to B$ are all homeomorphisms (here $B$ is some Banach space) and have the properties that $\phi_i(O_i)$ is open in $B$ and that whenever $O_i$ and $O_j$ have non-empty intersection then the mapping $\phi_j \circ \phi_i^{-1} : \phi_i(O_i \cap O_j) \to \phi_j(O_i \cap O_j)$ is differentiable up to a certain order.

The functions $\phi_i$ themselves are called coordinate functions. In the finite-dimensional case, the coordinate functions are valued in (subsets of) $\mathbb{R}^n$. However, in the construction of Pistone and Sempi, the coordinate functions $\phi_i$ are valued in subsets of an "exponential Orlicz space" (which is a Banach space) defined by the Young function $\Phi(t) = \exp(|t|) - 1$; we denote this Orlicz space by $\exp L(p) = \exp L(X, p\,\mu)$.

A major achievement of Pistone and Sempi (1995) is to provide an atlas $\phi_p$ centered at a point $p \in M_\mu$ and mapping its neighborhood to $\exp L_0(p)$:

$$\phi_p(q) = \log\frac{q}{p} - E_p\!\left(\log\frac{q}{p}\right), \tag{1}$$

where $\exp L_0(p)$ is a subspace of $\exp L(p)$ consisting of the functions with zero mean:

$$\exp L_0(p) = \{ u \in \exp L(p) : E_p(u) = 0 \}.$$

In other words, the space $\exp L_0(p)$ contains all random variables with zero expectation value. Pistone and Sempi (1995) showed that in an open neighborhood surrounding $p$, the $E_p\{\cdot\}$ term in (1) is always finite and hence $\phi_p$ is a well-defined chart for any reference point $p \in M_\mu$, with $\phi_p(p) = 0$. (Note that the chart is not global because, in the infinite-dimensional setting, the second term in (1) may be unbounded for certain $q \in M_\mu$.) The inverse $\phi_p^{-1}$ of the coordinate mapping, which maps (a subset of) $\exp L_0(p) \to M_\mu$, gives the exponential family of density functions

$$\phi_p^{-1}(u) = \frac{p\,e^u}{E_p(e^u)}. \tag{2}$$

Again this formula is well-defined only when $E_p(e^u)$ is finite. For this reason, Pistone and Sempi require that $u$ lies in the closed unit ball $B^{\exp}(p)$ of $\exp L(p)$. Under this condition, the coordinate transformation $\phi_{p_2} \circ \phi_{p_1}^{-1}$ between two atlases centered at different points $p_1, p_2$ of $M_\mu$ is simply

$$u \mapsto u + \log\frac{p_1}{p_2} - E_{p_2}\!\left(u + \log\frac{p_1}{p_2}\right).$$

The construction of the mapping (1) from $M_\mu$ to the space of zero-mean random variables can be understood in the following way. Suppose we start from the entire Orlicz space $\exp L(p)$, as opposed to $\exp L_0(p)$, and restrict the coordinate function $u \in \exp L(p)$ to lie in the closed unit ball $B^{\exp}$, i.e., $\|u\|_{\exp L(p)} \le 1$. By the definition of the Luxemburg norm (Appendix A) and the monotonicity of the Young function, this implies that

$$\int_X \exp(|u|)\,p\,d\mu \le \int_X \exp\!\big(|u| / \|u\|_{\exp L(p)}\big)\,p\,d\mu \le 2,$$

i.e., $\exp|u|$ is in $L^1(p)$. Thus $\exp u$ is also in $L^1(p)$ since

$$\int_X \exp(u)\,p\,d\mu \le \int_X \exp(|u|)\,p\,d\mu.$$

So there is an obvious way to get a function of unit integral (and hence an element of $M_\mu$) using $\exp u$, namely

$$u \mapsto \frac{p\,\exp u}{E_p(\exp u)}, \tag{3}$$

which is just (2). It is clear that in this case $\phi_p^{-1} : B^{\exp} \to M_\mu$ as defined above is many-to-one. Specifically, $\phi_p^{-1}(u) = \phi_p^{-1}(v)$ if and only if $\exp u = c \exp v$, or expressed differently, $u = v + \log c$, for some positive constant $c$. Because of this freedom in the additive constant, we can define a foliation $\exp L(p) = \bigcup_c \exp L_c(p)$, where

$$\exp L_c(p) = \{ u \in \exp L(p) : E_p(u) = \log c \}.$$

We can then require that $\phi_p^{-1}$ be defined on a particular leaf of this foliation of the Banach space. When $c = 1$ this leads to the Pistone–Sempi formula for $\phi_p$, i.e., (1).

We now investigate whether this construction can be extended to an arbitrary Orlicz space $L^\Phi$, using an arbitrary class of Young functions $\Phi$, instead of $\exp L$ with the particular Young function $\exp(|\cdot|) - 1$. Introduce a strictly increasing function $\Phi : \mathbb{R} \to (0, \infty)$ that is convex on $[0, \infty)$ and satisfies $\Phi(-t) = (\Phi(t))^{-1}$ (and hence $\Phi(0) = 1$). Now $\Psi(t) = \Phi(|t|) - 1$ is a Young function. With an abuse of notation, we still denote the corresponding Orlicz space as $L^\Phi$ rather than $L^\Psi$. The arguments of the above paragraph, which were made for $\Phi(t) = e^t$, can now be made for a general $\Phi$. Everything works fine up until (3). When we require, in general, that $\phi_p^{-1}(u) = \phi_p^{-1}(v)$ if and only if $\Phi(u) = c\,\Phi(v)$, we run into a problem. Let us define $u_c = \Phi^{-1}(c\,\Phi(u))$, a function on $X$. The problem is that we do not know if for a fixed $u \in B^\Phi$ (the unit ball in $L^\Phi$) there exists a positive constant $c$ such that

$$E_p(u_c) = 0.$$

Hence we must look for another foliation procedure. Since our goal is to turn the multiplicative constant into an additive one, it is natural to look at the logarithm. We easily see that

$$E_p(\log \Phi(u_c)) = \log c + E_p(\log \Phi(u)). \tag{4}$$

In order for this formula to make sense, we need to prove that the integral on the right-hand side is absolutely convergent. It follows from our assumption $\log \Phi(-t) = -\log \Phi(t)$ that

$$E_p(|\log \Phi(u)|) = E_p(\log \Phi(|u|)).$$

An application of Jensen's inequality yields

$$E_p(\log \Phi(|u|)) \le \log E_p(\Phi(|u|)) < \infty,$$

where we also used $u \in L^\Phi$. That $E_p(|\log \Phi(u)|)$ (and hence $E_p(\log \Phi(u))$) is finite allows us to impose, with reference to (4), the condition

$$E_p(\log \Phi(u_c)) = 0.$$

Therefore the Pistone–Sempi approach will work if we use any positive function $\Phi(t)$ that is strictly increasing and convex on $[0, \infty)$, and satisfies $\Phi(t)\,\Phi(-t) = 1$. The corresponding map is

$$u \mapsto \frac{p\,\Phi(u)}{E_p(\Phi(u))}. \tag{5}$$

We call this the pseudo-exponential map. This generalization of Pistone–Sempi is somewhat straightforward because it still uses the same kind of normalization for mapping $B^\Phi$ to $M_\mu$ and follows their insights into the representation of probability density functions using Orlicz space functions. We turn to another approach in the next section.

3. The affine space model of statistical manifolds

An affine space is a set of points in which each point can be "translated" to any other point through an associated vector space. More precisely, the set $A$ is an affine space associated to the vector space $U$ if there exists an operation $\oplus : A \times U \to A$, called "right translation" or "translation" for short, through which $U$ acts on $A$ transitively. Writing out the axioms, this means that

(i) $(p \oplus u) \oplus u' = p \oplus (u + u')$ for all $p \in A$ and $u, u' \in U$.
(ii) $p \oplus 0 = p$ for all $p \in A$.
(iii) The restricted mapping $f_u(p) = p \oplus u$ is surjective for every fixed $u \in U$.

It can be easily verified that the exponential model satisfies the above axioms with

$$p \oplus u = \frac{p\,e^u}{E_p(e^u)}.$$

The Pistone–Sempi formula (2) can be viewed as defining the right translation for the exponential family. Of course, the affine structure of the exponential map, with log-likelihoods as score functions, is well known, e.g., (Amari, 1985; Murray & Rice, 1993). So one way to generalize the exponential model is to see what other representations $\oplus$ could have.

Let us denote $p \oplus u$ by $F(p, u)$. Then the first axiom above can be written as $F(F(p, u), u') = F(p, u + u')$. This is a functional equation known as the "translation equation" (Aczél, 1966, 8.2.2). Below we derive a general form for the solution of this equation in the infinite-dimensional case, following the finite-dimensional solution in (Aczél, 1966, 8.2.2, Theorem 1). In the next theorem we assume that the Banach space splits as a direct sum of the type $B = V \oplus \mathbb{R}u_0$ (here the direct sum $\oplus$ is not to be confused with the right-translation operator $\oplus$ above). We use the notation $u = [v, c]$ to denote the splitting of individual elements of the Banach space $B$ accordingly.

Theorem 1. Let $P$ and $B$ be Banach spaces and suppose that $F : P \times B \to P$ is a function satisfying

$$F(F(p, u), u') = F(p, u + u') \tag{6}$$

for all $p \in P$ and $u, u' \in B$. Assume that $B$ splits as $V \oplus \{c\,u_0 : c \in \mathbb{R}\}$ for a fixed $u_0 \in B$, and there exists $\bar{p} \in P$ so that $G_c(\cdot) = F(\bar{p}, [\cdot, c]) : V \to P$ is a bijection for every fixed $c \in \mathbb{R}$. Then all continuous solutions of (6) are of the form

$$F(p, u) = G_0\big(G_0^{-1}(p) + K(u)\big), \tag{7}$$

where $K : B \to V$ is linear.

Proof. First, it is clear from (6) along with the bijectivity assumption that $F(p, 0) \equiv p$. Fix $\bar{p} \in P$ so that $G_c(\cdot) = F(\bar{p}, [\cdot, c]) : V \to P$ is a bijection for every fixed $c \in \mathbb{R}$. For $v \in V$, $G_0(v) = F(\bar{p}, [v, 0])$, so that $G_0(0) = \bar{p}$ with inverse $G_0^{-1}(\bar{p}) = 0$.

We next set $p = \bar{p}$, $u = [v, c]$ and $u' = [v', -c]$ in (6):

$$F(F(\bar{p}, [v, c]), [v', -c]) = F(\bar{p}, [v + v', 0]) = G_0(v + v').$$

We denote $q = F(\bar{p}, [v, c]) = G_c(v)$, and note that then $G_c^{-1}(q) = v$. We have thus derived that

$$F(q, [v', -c]) = G_0\big(G_c^{-1}(q) + v'\big) \tag{8}$$

for every $q \in P$ (here we use that $G_c$ is a surjection) and $[v', -c] \in B$.

We substitute this expression for $F$ in the original functional equation (with $u = [v, -c]$ and $u' = [v', -c']$) and get

$$G_0\Big(G_{c'}^{-1}\big(G_0(G_c^{-1}(p) + v)\big) + v'\Big) = G_0\big(G_{c+c'}^{-1}(p) + v + v'\big).$$

Cancelling the outermost $G_0$, setting $p = \bar{p}$, and defining $\lambda : \mathbb{R} \to V$ by $\lambda(c) = G_c^{-1}(\bar{p})$ yields

$$G_{c'}^{-1}\big(G_0(\lambda(c) + v)\big) = \lambda(c + c') + v. \tag{9}$$

Setting $v = -\lambda(c)$ in (9), rearranging, and recalling $G_0(0) = \bar{p}$ gives

$$\lambda(c') + \lambda(c) = \lambda(c + c').$$

Due to the assumed continuity of $G_c^{-1}$, $\lambda$ is continuous with $\lambda(0) = G_0^{-1}(\bar{p}) = 0$, so the above Cauchy equation has only linear solutions

$$\lambda(c) = kc,$$

where $k$ is an arbitrary element in $V$. Finally, setting $c = 0$ in (9) gives

$$G_{c'}^{-1}(G_0(v)) = \lambda(c') + v,$$

or, after denoting $q = G_0(v)$,

$$G_{c'}^{-1}(q) = \lambda(c') + G_0^{-1}(q).$$

Substituting into (8) gives

$$F(p, u) = G_0\big(G_0^{-1}(p) + K(u)\big),$$

where $K([v, -c]) = v + \lambda(c) = v + ck$ is a linear transformation mapping $B$ to $V$. It is easy to see that every $F$ of this form is a solution. □

The previous theorem is not directly applicable to our setting. The problem is that, in $F(p, u)$, the range of the second variable $u$ depends on the value of the first variable $p \in M_\mu$, because $u$ is in the unit ball of $L^\Phi(p)$. The way to get around this is to notice that $L^\infty \subset L^\Phi(p)$ for every $p \in M_\mu$, and, moreover, it is dense. So we consider only functions $F : M_\mu \times L^\infty \to M_\mu$. Since we know all continuous solutions on a dense subset of our space, we know all continuous solutions on the whole space as well.

With the understanding that we restrict ourselves to dense subsets when necessary, we can see immediately how Theorem 1 relates to the charting of a statistical manifold. By defining $q \mapsto \phi_p(q) \in V$ as

$$\phi_p(q) = G_0^{-1}(q) - G_0^{-1}(p),$$

we get a chart mapping a region (centered on the reference point $p$) of the manifold $M_\mu$ to $V \subset B$. Note that $K$ is an identity map when restricted to $V$: $K(\phi_p(q)) = \phi_p(q)$. So for all $p, q$

$$F(p, \phi_p(q)) = q,$$

effecting a right-translation from $p \in M_\mu$ to $q \in M_\mu$. In the above, the coordinate function $\phi_p(\cdot)$ is valued in $V$. In general, if it is valued in the entire $B$, then there is an additional term $c(u_0 + k)$, an element of the kernel of $K$, in the expression of $\phi_p(q)$, where the constant $c$ is arbitrary.

Let us next see how the exponential model fits into this general scheme. Recall that Theorem 1 involves a co-dimension 1 splitting of the Banach space $B$ in the form $B = V \oplus \{c\,u_0 : c \in \mathbb{R}\}$. Set $u_0 = 1$, the constant function, so that the splitting becomes $\exp L = \exp L_0 \oplus \mathbb{R}$, where $\exp L_0$ is (as defined in Section 2) the space of exponentially integrable functions of zero mean, and $\mathbb{R}$ denotes the space of constant functions. Then choose $k = 0$, which amounts to requiring $F(p, u) = p$ for all $u \in \mathbb{R}$, i.e., for all constant functions $u$. Consequently, $K(u) \equiv 0$ if and only if $u$ is a constant function. Choose $G_0^{-1}(p) = K(\log p)$. Then

$$p \oplus u = \exp\big(K^*(K(\log p) + K(u))\big) = \exp\big(K^*(K(\log p + u))\big),$$

where $K^*$ denotes the pseudo-inverse of $K$, i.e., a mapping such that $K \circ K^*$ is the identity, which will be specified next. We know that $K^*(K(u)) = u + c$ for some constant $c$, since constants are the only elements in the kernel of $K$. Let us define a functional (interpreting the constant function as a number) by $c[u] = K^*(K(u)) - u$. Then our previous equation gives

$$p \oplus u = \exp\big(u + \log p + c[u + \log p]\big) = p\,e^u\,e^{c[\log(p \exp u)]}.$$

Thus we get the Pistone–Sempi model if we choose $K^*$ so that

$$c[u] = -\log \int_X \exp u \, d\mu. \tag{10}$$

Let us look at some obvious generalizations: we fix a non-zero $u_0 \in L^\Phi$ and define $K$ to be a linear transformation so that $K(u) = 0$ if and only if $u = c\,u_0$. We define $G_0^{-1}(p) = K(\Phi^{-1}(p))$. We require that $G_0^{-1}$ be an injection, so $\Phi^{-1}(p) = \Phi^{-1}(q) + c\,u_0$ if and only if $p = q$ and $c = 0$. We find that

$$p \oplus u = \Phi\big(u + \Phi^{-1}(p) + c[u + \Phi^{-1}(p)]\big),$$

where the functional $c[u] = K^*(K(u)) - u$ is as before. Now we should choose

$$c[v] = \min\left\{ t \in \mathbb{R} : \int_X \Phi(v + t)\,d\mu = 1 \right\},$$

in order for $p \oplus u$ to be in the manifold. For $\Phi(t) = e^t$, the above reduces to (10).

To conclude this section, we examine whether or not our previous generalization (5) investigated in Section 2 satisfies the right-translation property. Imposing (6) on (5) yields

$$\Phi(u_1)\,\Phi(u_2) = c\,\Phi(u_1 + u_2),$$

where $c$ is some constant. So $\log(c^{-1}\Phi)$ satisfies the Cauchy equation

$$\log(c^{-1}\Phi(u_1)) + \log(c^{-1}\Phi(u_2)) = \log(c^{-1}\Phi(u_1 + u_2)),$$

with solution $\log(c^{-1}\Phi(u)) = bu$ (where $b$ is another constant), or $\Phi(u) = c\,e^{bu}$. Therefore the exponential model studied by Pistone and Sempi (1995) is the only common element in the two generalized forms of charts for statistical manifolds, as investigated here.

4. Potential applications and discussions

An atlas on a manifold provides a systematic way of representing points on the manifold by coordinate functions. In the case of a statistical manifold $M_\mu$ where points are themselves probability density functions (positive functions in $L^1$), the coordinates are Banach-space valued functions as well (e.g., valued in $L^\Phi$), with normalization and positivity constraints removed. Each individual chart centered at $p \in M_\mu$, denoted $\phi_p^{-1}(u)$ in Section 2 and $F(p, u)$ in Section 3, contains a bijective mapping from an open subset of $L^\Phi$ to an open subset of $M_\mu$ that includes the point $p$.

Let us give an example for the use of the differential geometric notion of chart/atlas for computations. Our generalized expression (5) based on the normalization approach has a natural interpretation as the Bayes formula

$$q = \frac{p\,l}{E_p(l)},$$

where $p$ is the prior distribution and $l = c\,\Phi(u)$ is the likelihood function. In other words, the Orlicz-space functions $u = \Phi^{-1}(l/c)$ are just the $\Phi^{-1}$-transformed likelihoods on the support $X$ (here $c$ is a certain constant). The Pistone–Sempi model (2) is then the Bayes formula using the log-likelihood function

$$u = \log l - E_p(\log l).$$

Viewed in this way, the meaning of the bijective transformation between $\exp L(p)$, or $L^\Phi(p)$ in general, and $M_\mu$ as given by any $p$-referenced chart is that the posterior distribution $q$, with respect to the prior distribution $p$, is in one-to-one correspondence to (a subspace of) the Orlicz space where likelihood functions (after proper transformation) are defined. So trajectories in $M_\mu$ can be mapped to trajectories in $\exp L(p)$ or in $L^\Phi(p)$; aggregation of posterior distributions in $M_\mu$ can be carried out by concatenating likelihood functions on a vector space. The upshot is that, given the prior distribution $p$, the space of posterior distributions and the space of likelihood functions are in one-to-one correspondence as established by the chart $\phi_p$, so that computations involving posterior distributions can be effectively performed using $p$-centered likelihood functions.

Note that our approach to extending the exponential model is not all-inclusive. Burdet, Combe, and Nencka (2001) gave the following chart for mapping the open unit ball of $L^2(p)$ to $M_\mu$:

$$p \oplus u = p\left(u + \sqrt{1 - E_p(u^2)}\right)^2,$$

with the inverse given (for fixed $p$) by

$$\phi_p(q) = \sqrt{\frac{q}{p}} - E_p\!\left(\sqrt{\frac{q}{p}}\right).$$

Clearly for such $\oplus$,

$$(p \oplus u) \oplus u' \ne p \oplus (u + u');$$

hence this model does not fit into the scheme of Section 3. As a generalization of the translation equation (6), one should consider the so-called "transformation equation" (Aczél, 1966, 8.2.1)

$$(p \oplus u) \oplus u' = p \oplus (u \circ u'),$$

where $\circ$ is some associative operator,

$$(u \circ u') \circ u'' = u \circ (u' \circ u'').$$

The corresponding functional equation is

$$F(F(p, u), v) = F(p, G(u, v)),$$

where $G : B \times B \to B$ is associative. This is the subject for future research.

5. Conclusions

We have proposed two extensions to Pistone–Sempi's exponential model (2) as an atlas on the manifold of non-parametric probability density functions. The first approach constructs coordinate functions in a general Orlicz space using a Young function whose logarithm is assumed to be odd, and yields (5) as a generalized form for normalization-based models. The second approach views the statistical manifold as an affine space that obeys a "right-translation" property, and derives (7) as the general representation of an atlas. The exponential map is shown to be the only common element of these two approaches, highlighting its importance in charting statistical manifolds.

Appendix A. Orlicz spaces

In this appendix we give a short recap of the theory of Orlicz spaces. For more details the reader is referred to one of the many books on the subject, e.g., (Krasnosel'skiĭ & Rutickiĭ, 1961; Rao & Ren, 1991).

Recall that a Young function is a convex increasing function $\Phi : [0, \infty) \to [0, \infty)$ with $\Phi(0) = 0$. For a Young function $\Phi$ we define a modular on the set of measurable functions by

$$\rho_\Phi(u) := \int_X \Phi(|u(x)|)\,d\mu(x).$$

Using the modular we can define the Luxemburg norm on the same set by

$$\|u\|_\Phi := \inf\{ t > 0 : \rho_\Phi(u/t) \le 1 \}.$$

Using these concepts we define the Orlicz space by

$$L^\Phi(X) := \{ u : \|u\|_\Phi < \infty \}.$$

The best-known example of an Orlicz space is given by the Young functions $\Phi(t) = t^p$ for some real number $p$ greater than or equal to 1. In this case we have

$$\rho_\Phi(u) = \int_X |u(x)|^p\,d\mu(x),$$

and the norm is just the Lebesgue norm:

$$\|u\|_\Phi = \inf\{ t > 0 : \rho_\Phi(u/t) \le 1 \} = \left( \int_X |u(x)|^p\,d\mu(x) \right)^{1/p}.$$

Another important case is $\Phi(t) = \exp(|t|) - 1$, and the corresponding Orlicz space $\exp L$ is the space of exponentially integrable functions.

References

Aczél, J. (1966). Lectures on functional equations and their applications. New York, London: Academic Press.
Amari, S. (1982). Differential geometry of curved exponential families-curvatures and information loss. Annals of Statistics, 10, 357–385.
Amari, S. (1985). Differential-geometrical methods in statistics. Lecture Notes in Statistics (Vol. 28). New York: Springer.
Amari, S., & Nagaoka, H. (1993/2000). Methods of information geometry. Translations of Mathematical Monographs (Vol. 191). Oxford: Oxford University Press.
Burdet, G., Combe, Ph., & Nencka, H. (2001). On real Hilbertian info-manifolds. Disordered and complex systems (London, 2000) (pp. 153–158), AIP Conference Proceedings, Vol. 553, Melville, NY: American Institute of Physics.
Čencov, N. N. (1972/1982). Statistical decision rules and optimal inference. Providence, RI, USA: American Mathematical Society, 1982 (Originally published in Russian, Nauka, Moscow, 1972).
David, A. P. (1975). Discussion to Efron's paper. Annals of Statistics, 3, 1231–1234.
Efron, B. (1975). Defining the curvature of a statistical problem (with application to second order efficiency) (with discussion). Annals of Statistics, 3, 1189–1242.
Giblisco, P., & Pistone, G. (1998). Connections on non-parametric statistical manifolds by Orlicz space geometry. Infinite Dimensional Analysis, Quantum Probability, and Related Topics, 1(2), 325–347.
Grasselli, M. R. (2005). Dual connections in nonparametric classical information geometry. Annals of the Institute for Statistical Mathematics (in press).
Karabatsos, G. Bayesian nonparametric model selection and model testing. Journal of Mathematical Psychology, in press.
Krasnosel'skiĭ, M. A., & Rutickiĭ, Ya. B. (1961). Convex functions and Orlicz spaces. Translated by Boron, L. F. Groningen, The Netherlands: P. Noordhoff, Ltd.
Lauritzen, S. (1987). Statistical manifolds. In: S. Amari, O. Barndorff-Nielsen, R. Kass, S. Lauritzen, C. R. Rao (Eds.), Differential geometry in statistical inference, IMS Lecture Notes (Vol. 10) (pp. 163–216). Hayward, CA.
Murray, M. K., & Rice, J. W. (1993). Differential geometry and statistics. London: Chapman & Hall.
Pistone, G. (2001). New ideas in non-parametric estimation. Disordered and complex systems (London, 2000) (pp. 159–164), AIP Conference Proceedings, Vol. 553, Melville, NY: American Institute of Physics.
Pistone, G., & Rogantin, M. P. (1999). The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli, 5(4), 721–760.
Pistone, G., & Sempi, C. (1995). An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Annals of Statistics, 23(5), 1543–1561.
Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.
Rao, M., & Ren, Z. (1991). Theory of Orlicz spaces. Monographs and Textbooks in Pure and Applied Mathematics: Vol. 146. New York: Marcel Dekker, Inc.
Townsend, J. T., Solomon, B., & Smith, J. S. (2001). The perfect Gestalt: Infinite dimensional Riemannian face spaces and other aspects of face perception. In M. J. Wenger, & J. T. Townsend (Eds.), Computational, geometric and process perspectives of facial cognition: Contexts and challenges (pp. 39–82). Mahwah, NJ: Lawrence Erlbaum Associates.
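As a numerical companion to Sections 2 and 3, the following minimal Python sketch evaluates the chart (1) and the exponential map (2) on a finite sample space and checks the right-translation property (6) for the exponential model. It is only an illustration under stated assumptions: the four-point space with uniform measure, the particular vectors, and the helper names (`expect`, `chart`, `exp_map`, `close`) are hypothetical choices made here and do not appear in the paper.

```python
import math

def expect(p, f, mu):
    """E_p(f) = sum over x of f(x) p(x) mu(x) on a finite sample space."""
    return sum(fi * pi * mi for fi, pi, mi in zip(f, p, mu))

def chart(p, q, mu):
    """Equation (1): log(q/p) minus its expectation under p."""
    log_ratio = [math.log(qi / pi) for qi, pi in zip(q, p)]
    mean = expect(p, log_ratio, mu)
    return [r - mean for r in log_ratio]

def exp_map(p, u, mu):
    """Equation (2): p e^u / E_p(e^u), the exponential right translation."""
    eu = [math.exp(ui) for ui in u]
    z = expect(p, eu, mu)
    return [pi * e / z for pi, e in zip(p, eu)]

def close(a, b, tol=1e-12):
    """Componentwise comparison of two vectors up to a small tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

# Assumed toy setting: four points, uniform probability measure mu,
# and a density p with respect to mu (so that sum p * mu = 1).
mu = [0.25, 0.25, 0.25, 0.25]
p = [0.4, 0.8, 1.2, 1.6]
u = [0.3, -0.1, 0.2, -0.4]
v = [-0.2, 0.5, 0.1, 0.0]

q = exp_map(p, u, mu)                      # q = p (+) u
u_plus_v = [a + b for a, b in zip(u, v)]

# q is again a density, the chart inverts the exponential map,
# and translating by u and then by v equals translating by u + v.
print(abs(sum(qi * mi for qi, mi in zip(q, mu)) - 1.0) < 1e-12)
print(close(exp_map(p, chart(p, q, mu), mu), q))
print(close(exp_map(q, v, mu), exp_map(p, u_plus_v, mu)))
```

On a finite space every function is bounded, so the integrability questions that motivate the Orlicz-space setting do not arise; the sketch only illustrates the algebra of the chart and of the translation axiom.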