1 One Parameter Exponential Families
The world of exponential families bridges the gap between the Gaussian family and general distributions. Many properties of Gaussians carry through to exponential families in a fairly precise sense.

• In the Gaussian world, there are exact small-sample distributional results (e.g. $t$, $F$, $\chi^2$).
• In the exponential family world, there are approximate distributional results (e.g. deviance tests).
• In the general setting, we can only appeal to asymptotics.

A one-parameter exponential family $\mathcal{F}$ is a one-parameter family of distributions of the form

$$P_\eta(dx) = \exp\left(\eta \cdot t(x) - \Lambda(\eta)\right) P_0(dx)$$

for some probability measure $P_0$. The parameter $\eta$ is called the natural or canonical parameter. The function $\Lambda$ is called the cumulant generating function; it is simply the normalization needed to make

$$f_\eta(x) = \frac{dP_\eta}{dP_0}(x) = \exp\left(\eta \cdot t(x) - \Lambda(\eta)\right)$$

a proper probability density. The random variable $t(X)$ is the sufficient statistic of the exponential family. Note that $P_0$ does not have to be a distribution on $\mathbb{R}$, but these are of course the simplest examples.

1.0.1 A first example: Gaussian with linear sufficient statistic

Consider the standard normal distribution

$$P_0(A) = \int_A \frac{e^{-z^2/2}}{\sqrt{2\pi}} \, dz$$

and let $t(x) = x$. Then the exponential family is

$$P_\eta(dx) \propto \frac{e^{\eta \cdot x - x^2/2}}{\sqrt{2\pi}} \, dx$$

and, completing the square in the exponent, we see that $\Lambda(\eta) = \eta^2/2$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot the cumulant generating function Lambda(eta) = eta^2 / 2.
eta = np.linspace(-2, 2, 101)
CGF = eta**2 / 2.
plt.plot(eta, CGF)
A = plt.gca()
A.set_xlabel(r'$\eta$', size=20)
A.set_ylabel(r'$\Lambda(\eta)$', size=20)
f = plt.gcf()
```

Thus, the exponential family in this setting is the collection

$$\mathcal{F} = \left\{ N(\eta, 1) : \eta \in \mathbb{R} \right\}.$$

1.0.2 Normal with quadratic sufficient statistic on $\mathbb{R}^d$

As a second example, take $P_0 = N(0, I_{d \times d})$, i.e. the standard normal distribution on $\mathbb{R}^d$. As sufficient statistic, we take $t(x) = \|x\|_2^2/2$. Then the exponential family is

$$P_\eta(dx) \propto e^{\eta \cdot \|x\|_2^2/2 - \|x\|_2^2/2} \, dx$$

and we see that the family is only defined for $\eta < 1$. For $\eta < 1$,

$$\Lambda(\eta) = -\frac{d}{2} \log(1 - \eta).$$

We see that not all exponential families have all of $\mathbb{R}$ as their parameter space.
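The closed form $\Lambda(\eta) = -\frac{d}{2}\log(1-\eta)$ can be sanity-checked numerically. The following is a minimal sketch for $d = 1$ only, using a plain Riemann sum; the function names, the grid half-width and the grid size are illustrative choices, not part of the original notes.

```python
import numpy as np

def cgf_quadratic_numeric(eta, half_width=40.0, n=200001):
    """Riemann-sum approximation of log E_0[exp(eta * X^2 / 2)], X ~ N(0, 1).

    Only meaningful for eta < 1; the grid limits are an assumption of this sketch.
    """
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    # exp(eta * x^2 / 2) multiplied by the standard normal density.
    integrand = np.exp((eta - 1.0) * x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return np.log(integrand.sum() * dx)

def cgf_quadratic_exact(eta, d=1):
    """Closed form -(d / 2) * log(1 - eta), valid for eta < 1."""
    return -d * np.log(1.0 - eta) / 2.0

for eta in [-3.0, 0.0, 0.5, 0.9]:
    print(eta, cgf_quadratic_numeric(eta), cgf_quadratic_exact(eta))
```

As $\eta$ approaches 1 the integrand decays more and more slowly, so a fixed grid eventually stops being adequate; this mirrors the fact that the integral diverges for $\eta \geq 1$.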
We might as well define $\Lambda$ over all of $\mathbb{R}$:

$$\Lambda(\eta) = \begin{cases} -\frac{d}{2} \log(1 - \eta) & \eta < 1 \\ \infty & \eta \geq 1. \end{cases}$$

The exponential family here is

$$\mathcal{F} = \left\{ N\left(0_{d \times 1}, (1 - \eta)^{-1} \cdot I_{d \times d}\right) : \eta < 1 \right\}.$$

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot Lambda(eta) = -(d / 2) * log(1 - eta) on part of its domain eta < 1.
eta = np.linspace(-3, 0.99, 101)
d = 3
CGF = -d * np.log(1 - eta) / 2.
plt.plot(eta, CGF)
A = plt.gca()
A.set_xlabel(r'$\eta$', size=20)
A.set_ylabel(r'$\Lambda(\eta)$', size=20)
```

1.0.3 Tilts of triangular distribution

In the previous two examples, we could express $\Lambda$ explicitly by simple integration. This is not always possible, though we can use the computer to do some calculations for us.

Set $P_0$ to be the triangular distribution on $(-1, 1)$, with density $1 - |x|$, and take sufficient statistic $t(x) = x$, so that

$$P_\eta(dx) = \exp(\eta \cdot x - \Lambda(\eta)) P_0(dx)$$

with

$$\Lambda(\eta) = \log\left( \int_{-1}^{1} e^{\eta x} (1 - |x|) \, dx \right).$$

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-1, 1, 501)
dX = X[1] - X[0]

def tilted_density(eta):
    # Unnormalized tilt of the triangular density min(1 + x, 1 - x) = 1 - |x|.
    D = np.exp(eta * X) * np.minimum(1 + X, 1 - X)
    # Riemann-sum approximation of Lambda(eta).
    CGF = np.log((D * dX).sum())
    return D / np.exp(CGF)

for eta in [0, 1, 2, 3]:
    plt.plot(X, tilted_density(eta), label=r'$\eta=%d$' % eta)
plt.gca().set_title('Tilts of the triangular distribution.')
plt.legend(loc='upper left')
```

1.0.4 Carrier measure

More generally, $P_0$ could be replaced by some measure $m_0$ that is not a probability measure. For example, let $m_0$ be Lebesgue measure on $\mathbb{R}$ and $t(x) = x^2/2$. Then, for all $\eta < 0$,

$$\frac{dP_\eta}{dm_0}(x) = \frac{e^{\eta x^2/2}}{\sqrt{-2\pi/\eta}}$$

corresponds to a $N(0, -\eta^{-1})$ density. To find $\Lambda(\eta)$, note that

$$e^{\Lambda(\eta)} = \int_{\mathbb{R}} e^{\eta x^2/2} \, dm_0(x) = \begin{cases} \sqrt{-2\pi/\eta} & \eta < 0 \\ \infty & \text{otherwise.} \end{cases}$$

Therefore, for $\eta < 0$,

$$\Lambda(\eta) = -\frac{1}{2} \log(-\eta) + \frac{1}{2} \log(2\pi).$$

The exponential family is therefore

$$\left\{ N(0, -\eta^{-1}) : \eta < 0 \right\}.$$

1.1 Reparametrizing the family

Note that the exponential family is determined by the pair $(t(X), m_0)$. The choice of $m_0$ is somewhat arbitrary.
We could fix some $\eta_0$ and consider a new family with carrier measure $P_{\eta_0} \in \mathcal{F}$:

$$\tilde{\mathcal{F}} = \left\{ \tilde{P}_{\tilde\eta}(dx) = \exp\left(\tilde\eta \cdot t(x) - \tilde\Lambda(\tilde\eta)\right) P_{\eta_0}(dx) \right\}.$$

But a simple manipulation shows that

$$P_\eta(dx) = \exp\left(\eta \cdot t(x) - \Lambda(\eta)\right) m_0(dx) = \exp\left((\eta - \eta_0) \cdot t(x) - (\Lambda(\eta) - \Lambda(\eta_0))\right) P_{\eta_0}(dx).$$

This shows that there is a 1:1 correspondence between $\mathcal{F}$ and $\tilde{\mathcal{F}}$. Namely,

$$\tilde{\mathcal{F}} \ni \tilde{P}_{\tilde\eta} \mapsto P_{\tilde\eta + \eta_0} \in \mathcal{F}, \qquad \tilde\Lambda(\tilde\eta) = \Lambda(\tilde\eta + \eta_0) - \Lambda(\eta_0).$$

1.2 Domain of an exponential family

In the examples above, we saw that not all values of $\eta$ lead to a probability distribution, because $e^{\eta \cdot t(x)}$ can fail to be integrable with respect to $m_0$. The domain $D(\mathcal{F})$ can be thought of as the set of all natural parameters which lead to a probability distribution. Formally, we define the domain as

$$D(\mathcal{F}) = D((t(X), m_0)) = \left\{ \eta : \Lambda(\eta) < \infty \right\}.$$

The domain is also defined relative to the carrier measure $m_0$. As in the previous section on reparametrization, we see

$$D(\tilde{\mathcal{F}}) = D((t(X), P_{\eta_0})) = \left\{ \tilde\eta : \tilde\eta + \eta_0 \in D(\mathcal{F}) \right\} = \left\{ \tilde\eta : \Lambda(\tilde\eta + \eta_0) - \Lambda(\eta_0) < \infty \right\} = D(\mathcal{F}) \oplus (-\eta_0).$$

Hence, the domains of two exponential families that differ only in the canonical parameter chosen for the carrier measure are related by a simple translation.

1.2.1 Exercise: convexity of $D(\mathcal{F})$

1. Show that $\Lambda$ is a (possibly infinite) convex function on $\mathbb{R}$.
2. Use this to show that $D(\mathcal{F})$ is convex, i.e. a (possibly infinite) interval.

1.2.2 Exercise: half-Gaussian density

Consider the half-Gaussian distribution with density

$$f(x) = \frac{2 e^{-x^2/2}}{\sqrt{2\pi}}, \qquad x \geq 0.$$

1. Use $f$ as carrier measure to create an exponential family with sufficient statistic $t(x) = -x$.
2. Plot the density for $\eta \in \{0, 2, 4, 6\}$.
3. What is $D(\mathcal{F})$?
4. What happens as $\eta \to \infty$? What about $\eta \to -\infty$?
5. Can you renormalize the random variables with distributions in $\mathcal{F}$ to get a "nice" limit at either $\pm\infty$? That is, suppose $Z_n \sim P_{\eta_n}$ with $\eta_n \to \pm\infty$. Can you define $W_n = c_n (Z_n - \mu_n)$ so that $W_n$ converges in distribution?
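The reparametrization identity of Section 1.1 and the translation of domains in Section 1.2 can be illustrated with the two Gaussian examples already seen. The sketch below starts from the Lebesgue carrier measure of Section 1.0.4 and recenters at $\eta_0 = -1$, so that the new carrier is $N(0, 1)$; the recentered $\tilde\Lambda$ should then agree with $-\frac{1}{2}\log(1 - \tilde\eta)$ from Section 1.0.2 with $d = 1$. The function names and the choice $\eta_0 = -1$ are illustrative assumptions.

```python
import numpy as np

def cgf_lebesgue(eta):
    """Lambda(eta) for carrier m0 = Lebesgue measure and t(x) = x^2 / 2.

    Finite only for eta < 0; returns +inf elsewhere.
    """
    eta = np.asarray(eta, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        finite = -0.5 * np.log(-eta) + 0.5 * np.log(2.0 * np.pi)
    return np.where(eta < 0, finite, np.inf)

def cgf_recentered(eta_tilde, eta0):
    """Lambda-tilde(eta_tilde) = Lambda(eta_tilde + eta0) - Lambda(eta0)."""
    return cgf_lebesgue(np.asarray(eta_tilde, dtype=float) + eta0) - cgf_lebesgue(eta0)

eta0 = -1.0  # P_{eta0} is then the standard normal N(0, 1)
eta_tilde = np.array([-3.0, 0.0, 0.5, 0.99])
print(cgf_recentered(eta_tilde, eta0))
print(-0.5 * np.log(1.0 - eta_tilde))  # Section 1.0.2 with d = 1
print(cgf_recentered(1.0, eta0))       # boundary of the shifted domain
```

In this example the original domain is $(-\infty, 0)$ and the recentered one is $(-\infty, 1)$, i.e. the original domain translated by $-\eta_0 = 1$.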
1.3 Example: the Poisson family

An important example of a one-parameter family that we will revisit often is the Poisson family on the non-negative integers $\mathbb{Z}_{\geq 0} = \{0, 1, 2, \dots\}$. The carrier measure is

$$m_0(dx) = \frac{1}{x!} m(dx)$$

with $m$ the counting measure on $\mathbb{Z}_{\geq 0}$. Poisson random variables are usually parametrized by their expectation $\lambda$. This differs from the canonical parametrization. Let's write this parametrization as

$$Q_\lambda(dx) = \frac{e^{-\lambda} \lambda^x}{x!} m(dx) = e^{-\lambda} \lambda^x \, m_0(dx).$$

Writing $\lambda^x = e^{x \log \lambda}$, we see that

$$P_\eta(dx) = \exp(\eta \cdot x - \Lambda(\eta)) m_0(dx).$$

The two parametrizations are related by

$$\eta = \log(\lambda), \qquad \Lambda(\eta) = e^\eta = \lambda.$$

1.3.1 Exercise: reparametrizing the Poisson family

Let $\mathcal{F} = (x, m_0)$ denote the Poisson family.

1. What is $D(\mathcal{F})$?
2. Rewrite the Poisson family $P_\eta$ so that the carrier measure is a Poisson distribution with mean 2. Call the exponential family with this carrier measure $\mathcal{F}_2$. What is $D(\mathcal{F}_2)$?
3. Write the Poisson distribution with mean 6 as a point in $D(\mathcal{F})$ and as a point in $D(\mathcal{F}_2)$. That is, in each case, find the canonical parameter such that the corresponding distribution is a Poisson with mean 6.

1.4 Expectation and variances

The function $\Lambda$ is the cumulant generating function of the family, and differentiating it yields the cumulants of the random variable $t(X)$. Specifically, if the carrier measure is a probability measure $P_0$, then $\Lambda$ is the logarithm of the moment generating function of $t(X)$ under $P_0$.
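More generally, if the carrier measure is not a probability measure but just a measure on some sample space $\Omega$, then for any $\eta \in D(\mathcal{F})$

$$E_\eta\left(e^{\theta \cdot t(X)}\right) = \int_\Omega e^{(\theta + \eta) t(x) - \Lambda(\eta)} \, m_0(dx) = e^{\Lambda(\theta + \eta) - \Lambda(\eta)}.$$

Note that

$$e^{\Lambda(\eta)} = \int_\Omega e^{\eta \cdot t(x)} \, m_0(dx).$$

Differentiating with respect to $\eta$ yields

$$\dot\Lambda(\eta) e^{\Lambda(\eta)} = \int_\Omega t(x) e^{\eta \cdot t(x)} \, m_0(dx) = e^{\Lambda(\eta)} \cdot \int_\Omega t(x) P_\eta(dx) = e^{\Lambda(\eta)} \cdot E_\eta(t(X)).$$

Differentiating a second time yields

$$\left( \ddot\Lambda(\eta) + \dot\Lambda(\eta)^2 \right) e^{\Lambda(\eta)} = \int_\Omega t(x)^2 e^{\eta \cdot t(x)} \, m_0(dx) = e^{\Lambda(\eta)} E_\eta(t(X)^2).$$

Summarizing, and subtracting $\dot\Lambda(\eta)^2 = E_\eta(t(X))^2$ from the second identity,

$$\dot\Lambda(\eta) = E_\eta(t(X)), \qquad \ddot\Lambda(\eta) = E_\eta\left[ (t(X) - E_\eta(t(X)))^2 \right] = \mathrm{Var}_\eta(t(X)).$$

The above also motivates the definition of another space related to $\mathcal{F}$, the set of realizable expected values

$$M(\mathcal{F}) = \left\{ \dot\Lambda(\eta) : \eta \in D(\mathcal{F}) \right\} = \dot\Lambda(D(\mathcal{F})).$$

1.4.1 Parametrization by the mean

The above calculation yields a parametrization

$$\mu(\eta) = E_\eta(t(X)) = \dot\Lambda(\eta).$$

As

$$\frac{d\mu}{d\eta} = \ddot\Lambda(\eta) = \mathrm{Var}_\eta(t(X)) \geq 0,$$

we see that the mapping is non-decreasing. It is strictly increasing, hence 1:1 and invertible, as long as the random variable $t(X)$ is not constant under $P_\eta$.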
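The identities $\dot\Lambda(\eta) = E_\eta(t(X))$ and $\ddot\Lambda(\eta) = \mathrm{Var}_\eta(t(X))$ can be checked numerically on the Poisson family of Section 1.3, where $\Lambda(\eta) = e^\eta$ and $t(x) = x$. The sketch below compares moments computed from a truncated probability mass function with central finite differences of $\Lambda$; the helper names, the truncation point `x_max` and the step size `h` are assumptions of this illustration.

```python
import numpy as np
from math import lgamma

def poisson_pmf(eta, x_max=200):
    """P_eta({x}) = exp(eta * x - exp(eta)) / x! for x = 0, ..., x_max."""
    x = np.arange(x_max + 1)
    log_factorial = np.array([lgamma(k + 1.0) for k in x])
    return x, np.exp(eta * x - np.exp(eta) - log_factorial)

def cgf(eta):
    """Cumulant generating function of the Poisson family."""
    return np.exp(eta)

def numeric_derivatives(f, eta, h=1e-4):
    """Central finite-difference approximations of f'(eta) and f''(eta)."""
    first = (f(eta + h) - f(eta - h)) / (2.0 * h)
    second = (f(eta + h) - 2.0 * f(eta) + f(eta - h)) / h**2
    return first, second

eta = np.log(6.0)  # canonical parameter of the Poisson with mean 6
x, p = poisson_pmf(eta)
mean = (x * p).sum()
var = ((x - mean) ** 2 * p).sum()
d1, d2 = numeric_derivatives(cgf, eta)
print("mean:", mean, "Lambda-dot:", d1)
print("variance:", var, "Lambda-ddot:", d2)
```

For this family the mean parametrization is $\mu(\eta) = e^\eta$, which is strictly increasing with inverse $\eta = \log \mu$, so $M(\mathcal{F}) = (0, \infty)$.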