Duality for Real and Multivariate Exponential Families
Gérard Letac $^{a,*}$

$^{a}$ Institut de Mathématiques de Toulouse, 118 route de Narbonne 31062 Toulouse, France.

arXiv:2104.05510v2 [math.PR] 8 Aug 2021

Abstract

Consider a measure $\mu$ on $\mathbb{R}^n$ generating a natural exponential family $F(\mu)$ with variance function $V_{F(\mu)}(m)$ and Laplace transform

$$\exp(\ell_\mu(s)) = \int_{\mathbb{R}^n} \exp(-\langle s, x\rangle)\,\mu(dx).$$

A dual measure $\mu^*$ satisfies $-\ell'_{\mu^*}(-\ell'_\mu(s)) = s$. Such a dual measure does not always exist. One important property is $\ell''_{\mu^*}(m) = (V_{F(\mu)}(m))^{-1}$, leading to the notion of duality among exponential families (or rather among the extended notion of T exponential families $T_F$ obtained by considering all translations of a given exponential family $F$).

Keywords: Dilogarithm distribution, Landau distribution, large deviations, quadratic and cubic real exponential families, Tweedie scale, Wishart distributions.

2020 MSC: Primary 62H05, Secondary 60E10

1. Introduction

One can be surprised by the explicit formulas that one gets from large deviations in one dimension. Consider iid random variables $X_1, \ldots, X_n, \ldots$ and the empirical mean $\overline{X}_n = (X_1 + \cdots + X_n)/n$. Then the Cramér theorem [6] says that the limit

$$\Pr(\overline{X}_n > m)^{1/n} \to_{n\to\infty} \alpha(m)$$

does exist for $m > E(X_1)$. For instance, the symmetric Bernoulli case $\Pr(X_n = \pm 1) = \frac{1}{2}$ leads, for $0 < m < 1$, to

$$\alpha(m) = (1+m)^{-\frac{1+m}{2}}\,(1-m)^{-\frac{1-m}{2}}. \tag{1}$$

For the bilateral exponential distribution $X_n \sim \frac{1}{2}e^{-|x|}\,dx$ we get, for $m > 0$,

$$\alpha(m) = \frac{1}{2}\,e^{1-\sqrt{1+m^2}}\,\bigl(1+\sqrt{1+m^2}\bigr).$$

What strange formulas! Other ones can be found in [17], Problem 410. The present paper is going to interpret in certain cases the function $m \mapsto 1/\alpha(m)$ as the Laplace transform of a certain dual measure $\mu^*$ which can be deduced explicitly from the distribution $\mu$ of $X_n$.
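As a numerical sanity check of the two closed forms above, the following Python sketch evaluates the Cramér limit in its usual form $\alpha(m) = \exp\bigl(-\sup_\theta(\theta m - k(\theta))\bigr)$, where $k$ is the logarithm of the moment generating function of $X_1$, and compares it with the displayed formulas. The function names, the bisection solver and the search brackets are illustrative assumptions, not taken from the paper.

```python
import math


def solve_increasing(f, target, lo, hi, iterations=200):
    """Bisection for f(theta) = target, assuming f is increasing on (lo, hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha_numeric(k, k_prime, m, lo, hi):
    """exp(-sup_theta(theta*m - k(theta))); the sup is attained where k'(theta) = m."""
    theta = solve_increasing(k_prime, m, lo, hi)
    return math.exp(k(theta) - theta * m)


def alpha_bernoulli(m):
    """Closed form (1) for Pr(X = 1) = Pr(X = -1) = 1/2, for 0 < m < 1."""
    return (1 + m) ** (-(1 + m) / 2) * (1 - m) ** (-(1 - m) / 2)


def alpha_bilateral_exponential(m):
    """Closed form for the density exp(-|x|)/2, for m > 0."""
    r = math.sqrt(1 + m * m)
    return 0.5 * math.exp(1 - r) * (1 + r)


def bernoulli_numeric(m):
    # k(theta) = log cosh(theta); the bracket (0, 20) is an assumption that
    # covers every m with atanh(m) < 20.
    return alpha_numeric(lambda t: math.log(math.cosh(t)), math.tanh, m, 0.0, 20.0)


def bilateral_exponential_numeric(m):
    # k(theta) = -log(1 - theta^2) on (-1, 1); for m > 0 the maximiser is in (0, 1).
    return alpha_numeric(
        lambda t: -math.log(1 - t * t),
        lambda t: 2 * t / (1 - t * t),
        m, 0.0, 1.0,
    )


if __name__ == "__main__":
    for m in (0.1, 0.5, 0.9):
        print(m, bernoulli_numeric(m), alpha_bernoulli(m))
    for m in (0.5, 1.0, 3.0):
        print(m, bilateral_exponential_numeric(m), alpha_bilateral_exponential(m))
```

In each printed row the numerical and closed-form values should agree up to rounding error; the bisection is used only because it needs nothing beyond the standard library.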
Since the Cramér theorem can be seen as a result about one dimensional exponential families, we develop the idea in this framework, using the large box of examples obtained from the theory of variance functions of exponential families initiated by the article of Carl Morris [21] in 1982. An even simpler example of duality is provided by the Tweedie families (Barlev and Enis [1], Jørgensen [9] and Tweedie [22]): the variance function $Am^p$ with $p > 1$ has dual $Bm^q$ with $\frac{1}{p} + \frac{1}{q} = 1$ for suitable pairs $(A, B)$ (see (8)). For instance, the Inverse Gaussian family $Am^3$ has dual $Bm^{3/2}$.

The Poisson distribution, with one of the simplest variance functions $V_F(m) = m$, leads to the study of the exponential family with variance function $e^m$, generated up to a translation by the unsymmetric stable law with Laplace transform $e^{-s}s^{s}$, also called Landau distribution. This gives tools for describing duals of other familiar exponential families. The cases of the normal and gamma families are very simple, being self dual, but other familiar cases like the negative binomial, the Bernoulli distribution and the cubic families are tougher. Finally, we consider another family, which is the dilogarithm family with variance function $e^m - 1$, as well as the one with variance function $\sinh m$. Like the normal and gamma families, they have the remarkable property of being self dual.

The definition of dual measures makes sense also in $\mathbb{R}^n$, while the probabilistic interpretation in terms of large deviations is lost. However we consider several cases in $\mathbb{R}^n$: the multinomial distribution, the Wishart ones and other quadratic families as classified by Casalis ([4]).

We proceed as follows: the notion of duality leads us unfortunately to change a bit the tradition about the exponential families. Indeed we will use $e^{-sx}$ instead of $e^{\theta x}$ in order to obtain later more readable formulas. This is explained in Section 2, together with the description of the classical objects attached to an exponential family. In the preceding lines we have been vague about duality. Section 3 gives proper definitions, explaining what we call a dual measure $\mu^*$ of $\mu$ and showing that some measures have no dual. We explain also what a T exponential family $T_F$ is. It is nothing but an exponential family $F$ plus all its translations. Indeed, talking about the dual $F^*$ of an exponential family $F$ does not exactly make sense, while the dual $T_F^*$ of $T_F$ does. Section 3 gives also the link with large deviations.

Section 4 concentrates on the $T_F$ when the variance function $V_F(m)$ is $e^m$, and on some parent distributions. It also gives details on what we call Lévy measures of types 0, 1 and 2. Of course, large parts of this material are well known to probabilists and statisticians: exponential families, variance functions, Lévy measures, Landau distribution. It was necessary to present them again for ease of reading. This section contains crucial calculations for the sequel in Proposition 7. Section 5 applies the results of Section 4 to the description of the duals of the Morris and the cubic families, with the surprising fact that they all exist, with the sole exception of the hyperbolic family with variance function $m^2 + 1$. Section 6 describes the self dual dilogarithm distribution $\mu$ on the set $\mathbb{N} = \{0, 1, \ldots\}$ of integers defined by

$$\sum_{n=0}^{\infty} \mu(n) z^n = \exp\left(\sum_{k=1}^{\infty} \frac{1}{k^2}\,(z^k - 1)\right)$$

which, for $m > 0$, generates the exponential family with variance function $e^m - 1$. Since the consideration of this exponential family and of a set of parent distributions is not done in the literature, we develop some of their properties, somewhat deviating from the study of duality. For instance, if $N$ is the standard Gaussian distribution, then the variance function of the exponential family generated by the convolution $N * \mu$ is $e^m + 1$. Section 7 considers the $\mathbb{R}^n$ case: the multinomial distribution has a very explicit dual expressed in terms of the Landau distribution. The Wishart distribution is self dual, like the one dimensional gamma distribution. We prove some negative results, like the fact that the multivariate negative binomial law has no dual. Section 8 discusses open problems.

$^{*}$ Corresponding author.
Preprint submitted to Journal of Multivariate Analysis, August 10, 2021.

2. Laplace and bilateral Laplace transforms

At first the exponential families and Laplace transforms are considered. A certain tradition among statisticians (see Morris [21]), as opposed to physicists and maybe to probabilists, defines the Laplace transform of a positive measure $\mu$ on $\mathbb{R}$ and $\mathbb{R}^n$ as follows:

$$L_\mu(\theta) = \int_{\mathbb{R}^n} e^{\langle\theta, x\rangle}\,\mu(dx). \tag{2}$$

From the Hölder inequality the set $D(\mu) = \{\theta \,;\, L_\mu(\theta) < \infty\}$ is a convex set, and the function $k_\mu = \log L_\mu$ is convex on $D(\mu)$. Actually $k_\mu$ is strictly convex outside of the particular case where $\mu$ is concentrated on one point in the case of $\mathbb{R}$, or on an affine hyperplane in the case of $\mathbb{R}^n$. To avoid trivialities one introduces the interior $\Theta(\mu)$ of $D(\mu)$. One calls $\mathcal{M}(\mathbb{R}^n)$ the set of $\mu$ which are not concentrated on an affine hyperplane and such that $\Theta(\mu)$ is not empty. Such a $\mu$ generates a set of probabilities

$$P(\theta, \mu)(dx) = e^{\langle\theta, x\rangle - k_\mu(\theta)}\,\mu(dx)$$

and $F = F(\mu) = \{P(\theta, \mu) \,;\, \theta \in \Theta(\mu)\}$ is called the natural exponential family generated by $\mu$. Note that $\mu$ is not necessarily bounded: simple examples on $\mathbb{R}$ like $\mu(dx) = 1_{(0,\infty)}(x)\,dx$ or $\sum_{n=0}^{\infty} \delta_n$ generate the important families of exponential distributions or geometric discrete laws. Omitting 'natural', we will always say exponential family for short. Objects linked to $F$ are the mean $m$ of $P(\theta, \mu)$, the domain of the means $M_F$ and the inverse function $\psi_\mu$. They are defined by

$$m = k'_\mu(\theta) = \int_{\mathbb{R}^n} x\,P(\theta, \mu)(dx), \qquad M_F = k'_\mu(\Theta(\mu)), \qquad \theta = \psi_\mu(m).$$

Note that since $k_\mu$ is strictly convex, $k'_\mu$ is injective on $\Theta(\mu)$, the map $\psi_\mu$ from $M_F$ onto $\Theta(\mu)$ is well defined and $M_F$ is a connected open set.
If $C_F$ is the closed convex set generated by the support of $\mu$, clearly $M_F$ is contained in $C_F$. We say that $F$ or $\mu$ are steep if $M_F$ is equal to the interior of $C_F$. Most of the classical exponential families are steep, but not all (see for instance the Tweedie scale below for $p < 0$). In $\mathbb{R}$ the set $M_F$ is an interval, but in $\mathbb{R}^n$ there are non steep examples such that $M_F$ is non-convex ([16], p. 35). Finally, the last important object about the exponential family $F$ is its variance function $V_F$ defined on $M_F$ by

$$V_F(m) = \frac{1}{\psi'_\mu(m)} = k''_\mu(\psi_\mu(m)) = \int_{\mathbb{R}^n} (x - m) \otimes (x - m)\,P(\psi_\mu(m), \mu)(dx),$$

which characterizes $F$.

Now bilateral Laplace transforms are considered. In the particular case of dimension one, an older tradition associates the name of Laplace to integrals $\int_0^\infty e^{-sx}\,\mu(dx)$, which are conveniently defined for $s > 0$ in many circumstances. In the present paper we need to consider what the physicists call the bilateral Laplace transform

$$B_\mu(s) = \int_{\mathbb{R}^n} e^{-\langle s, x\rangle}\,\mu(dx). \tag{3}$$

Dealing with this slight change of notation $B_\mu(s) = L_\mu(-s)$ will much simplify the description of duality between two natural exponential families. In the sequel, we say 'Laplace transform' for short and by abus de langage instead of the longer term 'bilateral Laplace transform'. Because we will deal with these bilateral Laplace transforms, we have to modify the description of the classical objects associated to $F = F(\mu)$ with

$$S(\mu) = -\Theta(\mu), \qquad \ell_\mu(s) = k_\mu(-s), \qquad m = -\ell'_\mu(s), \tag{4}$$

$$s = \varphi_\mu(m) = -\psi_\mu(m), \qquad \ell''_\mu(s) = k''_\mu(-s), \qquad V_F(m) = -(\varphi'_\mu(m))^{-1}.$$
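The sign conventions of (4) can be checked on the measure $\mu(dx) = 1_{(0,\infty)}(x)\,dx$ mentioned above, for which $B_\mu(s) = 1/s$ and $\ell_\mu(s) = -\log s$ on $S(\mu) = (0, \infty)$. The following Python sketch, a rough illustration only, computes $m = -\ell'_\mu(s)$ and compares $\ell''_\mu(s)$ with $-(\varphi'_\mu(m))^{-1}$; both should be close to $m^2$, the variance function of the exponential distributions. The helper names and the finite-difference steps are assumptions chosen for illustration.

```python
import math


def ell(s):
    """ell_mu(s) = log B_mu(s) = -log(s), for Lebesgue measure on (0, inf), s > 0."""
    return -math.log(s)


def first_derivative(f, x, h=1e-5):
    """Central finite difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def phi(m):
    """phi_mu(m) = s: inverse of s -> m = -ell'(s) = 1/s, written in closed form."""
    return 1.0 / m


def variance_two_ways(s):
    """Return (m, ell''(s), -1/phi'(m)); the last two should agree with m**2."""
    m = -first_derivative(ell, s)             # m = -ell'(s)
    v_from_ell = second_derivative(ell, s)    # ell''(s) = k''(-s)
    v_from_phi = -1.0 / first_derivative(phi, m)
    return m, v_from_ell, v_from_phi


if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0):
        print(s, variance_two_ways(s))
```

Finite differences are used here instead of symbolic derivatives so that the check does not depend on already knowing the answer; the price is agreement only up to discretisation error.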