<<

Lecture 2. Exponential Families

Nan Ye

School of Mathematics and Physics University of Queensland

1 / 19 Recall: definition of GLM ∙ A GLM has the following structure

⊤ (systematic) E(Y | x) = h(훽 x). (random) Y | x follows an distribution.

∙ This is usually separated into three components ∙ The linear predictor 훽⊤x. ∙ The response function h. People often specify the link function g = h−1 instead. ∙ The exponential family for the conditional distribution of Y given x.

2 / 19 Recall: remarks on exponential families ∙ It is common! normal, Bernoulli, Poisson, and Gamma distributions are exponential families. ∙ It gives a well-defined model. its parameters are determined by the 휇 = E(Y | x). ∙ It leads to a unified treatment of many different models. , , ...

3 / 19 This Lecture

∙ One-parameter exponential family: definition and examples. ∙ Maximum likelihood estimation: score equation, MLE as .

4 / 19 Exponential Families

One parameter exponential family An exponential family distribution has a PDF (or PMF for a discrete ) of the form

(︂휂(휃)T (y) − A(휃) )︂ f (y | 휃, 휑) = exp + c(y, 휑) b(휑)

∙ T (y) is called the natural . ∙ 휂(휃) is called the natural parameter. ∙ 휑 is a nuisance parameter treated as known.

Range of y must be independent of 휃.

5 / 19 Special forms ∙ A natural exponential family is one parametrized by 휂, that is, 휃 = 휂, and the PDF/PMF can be written as

(︂휂T (y) − A(휂) )︂ f (y | 휂, 휑) = exp + c(y, 휑) b(휑)

∙ The nuisance parameter can be absent, in which case b and c have the form b(휑) = 1 and c(y, 휑) = c(y).

6 / 19 General form ∙ In general, 휂(휃) and T (y) can be vector-valued functions, 휃 can be a vector, and y can be an arbitrary object (like strings, sequences). ∙ Of course, 휂(휃)T (y) need to be replaced by the inner product 휂(휃)⊤T (y) then. ∙ In practice, this general form has been heavily used in various domains such as computer vision, natural language processing. ∙ For this course, we focus on the single-parameter case, but the properties studied here can be easily generalized for the general form.

7 / 19 Examples

Example 1. Gaussian distribution (known 휎2) ∙ The PDF is 1 (︂ (y − 휇)2 )︂ f (y | 휇, 휎) = √ exp − 2휋휎2 2휎2 (︂ y 2 휇y 휇2 √ )︂ = exp − + − − ln 2휋휎2 , 2휎2 휎2 2휎2

∙ This is a natural exponential family with a nuisance parameter 휎 ∙ 휂(휇) = 휇. ∙ T (y) = y. ∙ A(휇) = 휇2/2. ∙ b(휎) = 휎2. 2 √ ∙ y 2 c(y, 휎) = − 2휎2 − ln 2휋휎 .

8 / 19 Example 2. ∙ The PMF is

f (y | p) = py (1 − p)1−y = exp (y ln p + (1 − y) ln(1 − p)) (︂ p )︂ = exp y ln + ln(1 − p) 1 − p

∙ This is an exponential family without a nuisance parameter ∙ p 휂(p) = ln 1−p . ∙ T (y) = y. ∙ A(p) = − ln(1 − p). ∙ c(y) = 0.

9 / 19 ∙ The natural form is given by ey휂 f (y | 휂) = . 1 + e휂

10 / 19 Example 3. ∙ The PMF is 휆y f (y | 휆) = exp(−휆) y! = exp(−휆 + y ln 휆 − ln y!)

∙ This is an exponential family without a nuisance parameter ∙ 휂(휆) = ln 휆. ∙ T (y) = y. ∙ A(휆) = 휆. ∙ c(y) = − ln y!.

11 / 19 Example 4. Gamma distribution (fixed shape) ∙ The PDF, parametrized using the shape k and scale 휃, is

y k−1 f (y | k, 휃) = e−y/휃. Γ(k)휃k

∙ An alternative parametrization, sometimes easier to work with, is to use two parameters 휇 and 휈 such that 휇 is the mean, and the is 휇2/휈.

12 / 19 ∙ In the (k, 휃) parametrization, the mean and variance are k휃 and k휃2 respectively, thus k = 휈 and 휃 = 휇/휈, and

y 휈−1휈휈 f (y | 휇, 휈) = e−y휈/휇 Γ(휈)휇휈 (︂ (︂ 1 )︂ )︂ = exp −휈 y + ln 휇 + (휈 − 1) ln y + 휈 ln 휈 − ln Γ(휈) 휇

∙ This is an exponential family with a nuisance parameter 휈 ∙ 휂(휇) = 1/휇. ∙ T (y) = y. ∙ A(휇) = ln 휇 ∙ b(휈) = 1/휈. ∙ c(y, 휈) = (휈 − 1) ln y + 휈 ln 휈 − ln Γ(휈).

13 / 19 Example 5. Negative binomial (fixed r) ∙ The PMF of N(r, p) is

(︂y + r − 1)︂ f (y | p, r) = (1 − p)r py y (︂ (︂y + r − 1)︂ )︂ = exp ln + r ln(1 − p) + y ln p y

∙ This is an exponential family when r is fixed ∙ 휂(p) = ln p. ∙ T (y) = y. ∙ A(p) = −r ln(1 − p). ∙ (︀y+r−1)︀ c(y) = ln y .

14 / 19 MLE

Score equation The MLE 휃 is the solution to A′(휃) = T˜, 휂′(휃)

˜ 1 ∑︀ where T = n i T (yi ).

15 / 19 Derivation

∙ The log-likelihood of 휃 given y1,..., yn is

∑︁ 휂(휃)T (yi ) − A(휃) ℓ(휃) = + c(y , 휑). b(휑) i i ∙ Set the derivative of the log-likelihood to 0, and rearrange to obtain A′(휃)/휂′(휃) = T˜.

16 / 19 MLE as moment matching ′ ′ ∙ We will see that E(T | 휃) = A (휃)/휂 (휃) in next lecture. ∙ Thus the score equation is E(T | 휃) = T˜. ∙ MLE is thus matching the first moment of the exponential family to that of the empirical distribution (i.e., a method of moment estimator).

17 / 19 Example. MLE for negative ∙ Recall: for fixed r, A(p) = −r ln(1 − p), 휂(p) = ln p. rp 1 ∑︀ ∙ The score equation is 1−p =y ¯ = n i yi . ∙ p =y ¯/(¯y + r).

18 / 19 What You Need to Know

∙ Recognise whether a distribution is a one-parameter exponential family. ∙ Score equation for one-parameter exponential family ∙ Sufficiency of T˜ for determining MLE. ∙ MLE as moment matching.

19 / 19