Lecture 2. Exponential Families
Nan Ye
School of Mathematics and Physics, University of Queensland

Recall: definition of GLM
∙ A GLM has the following structure:
  (systematic) E(Y | x) = h(β⊤x).
  (random) Y | x follows an exponential family distribution.
∙ This is usually separated into three components:
  ∙ The linear predictor β⊤x.
  ∙ The response function h. People often specify the link function g = h⁻¹ instead.
  ∙ The exponential family for the conditional distribution of Y given x.

Recall: remarks on exponential families
∙ It is common: the normal, Bernoulli, Poisson, and Gamma distributions are all exponential families.
∙ It gives a well-defined model: its parameters are determined by the mean μ = E(Y | x).
∙ It leads to a unified treatment of many different models: linear regression, logistic regression, ...

This Lecture
∙ One-parameter exponential family: definition and examples.
∙ Maximum likelihood estimation: score equation, MLE as moment matching.

Exponential Families

One-parameter exponential family
An exponential family distribution has a PDF (or PMF for a discrete random variable) of the form
  f(y | θ, φ) = exp( (η(θ)T(y) − A(θ)) / b(φ) + c(y, φ) ).
∙ T(y) is called the natural statistic.
∙ η(θ) is called the natural parameter.
∙ φ is a nuisance parameter treated as known.
The range of y must be independent of θ.

Special forms
∙ A natural exponential family is one parametrized by η, that is, θ = η, and the PDF/PMF can be written as
  f(y | η, φ) = exp( (ηT(y) − A(η)) / b(φ) + c(y, φ) ).
∙ The nuisance parameter can be absent, in which case b and c have the form b(φ) = 1 and c(y, φ) = c(y).

General form
∙ In general, η(θ) and T(y) can be vector-valued functions, θ can be a vector, and y can be an arbitrary object (like strings or sequences).
∙ Of course, η(θ)T(y) then needs to be replaced by the inner product η(θ)⊤T(y).
∙ In practice, this general form has been heavily used in various domains such as computer vision and natural language processing.
∙ For this course, we focus on the single-parameter case, but the properties studied here can be easily generalized to the general form.

Examples

Example 1. Gaussian distribution (known σ²)
∙ The PDF is
  f(y | μ, σ) = (1/√(2πσ²)) exp(−(y − μ)²/(2σ²))
            = exp( −y²/(2σ²) + μy/σ² − μ²/(2σ²) − ln √(2πσ²) ).
∙ This is a natural exponential family with a nuisance parameter σ:
  ∙ η(μ) = μ.
  ∙ T(y) = y.
  ∙ A(μ) = μ²/2.
  ∙ b(σ) = σ².
  ∙ c(y, σ) = −y²/(2σ²) − ln √(2πσ²).
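
The decomposition above can be verified numerically: the direct density and the exponential-family form should agree at every point. A minimal sketch (function names are illustrative):

```python
import math

def gaussian_pdf(y, mu, sigma):
    # Direct form of the normal density with mean mu and s.d. sigma.
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def gaussian_expfam(y, mu, sigma):
    # Exponential-family form exp((eta*T(y) - A)/b + c) with
    # eta = mu, T(y) = y, A = mu^2/2, b = sigma^2,
    # c = -y^2/(2 sigma^2) - ln sqrt(2 pi sigma^2).
    eta, T, A, b = mu, y, mu ** 2 / 2, sigma ** 2
    c = -y ** 2 / (2 * sigma ** 2) - math.log(math.sqrt(2 * math.pi * sigma ** 2))
    return math.exp((eta * T - A) / b + c)

# The two forms agree for any y, mu, sigma:
assert abs(gaussian_pdf(1.3, 0.5, 2.0) - gaussian_expfam(1.3, 0.5, 2.0)) < 1e-12
```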

Example 2. Bernoulli distribution
∙ The PMF is
  f(y | p) = p^y (1 − p)^(1−y)
         = exp(y ln p + (1 − y) ln(1 − p))
         = exp( y ln(p/(1 − p)) + ln(1 − p) ).
∙ This is an exponential family without a nuisance parameter:
  ∙ η(p) = ln(p/(1 − p)).
  ∙ T(y) = y.
  ∙ A(p) = −ln(1 − p).
  ∙ c(y) = 0.
∙ The natural form is given by
  f(y | η) = e^(yη) / (1 + e^η).
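
As a quick sanity check, the natural form agrees with the original PMF once η = ln(p/(1 − p)); a small sketch (function names are illustrative):

```python
import math

def bernoulli_pmf(y, p):
    # Direct PMF: p^y (1 - p)^(1 - y) for y in {0, 1}.
    return p ** y * (1 - p) ** (1 - y)

def bernoulli_natural(y, eta):
    # Natural form f(y | eta) = e^(y eta) / (1 + e^eta).
    return math.exp(y * eta) / (1 + math.exp(eta))

p = 0.3
eta = math.log(p / (1 - p))  # the natural parameter (log-odds)
for y in (0, 1):
    assert abs(bernoulli_pmf(y, p) - bernoulli_natural(y, eta)) < 1e-12
```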

Example 3. Poisson distribution
∙ The PMF is
  f(y | λ) = e^(−λ) λ^y / y! = exp(y ln λ − λ − ln y!).
∙ This is an exponential family without a nuisance parameter:
  ∙ η(λ) = ln λ.
  ∙ T(y) = y.
  ∙ A(λ) = λ.
  ∙ c(y) = −ln y!.
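
The same kind of numerical check works here; a sketch (function names are illustrative, and ln y! is computed via the log-gamma function):

```python
import math

def poisson_pmf(y, lam):
    # Direct PMF: e^(-lam) lam^y / y!.
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_expfam(y, lam):
    # Exponential-family form: eta = ln(lam), T(y) = y, A = lam,
    # c(y) = -ln y! (computed as -lgamma(y + 1)).
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

for y in range(6):
    assert abs(poisson_pmf(y, 2.5) - poisson_expfam(y, 2.5)) < 1e-12
```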

Example 4. Gamma distribution (fixed shape)
∙ The PDF, parametrized using the shape k and scale θ, is
  f(y | k, θ) = y^(k−1) e^(−y/θ) / (Γ(k) θ^k).
∙ An alternative parametrization, sometimes easier to work with, uses two parameters μ and ν such that μ is the mean and the variance is μ²/ν.
∙ In the (k, θ) parametrization, the mean and variance are kθ and kθ² respectively; thus k = ν and θ = μ/ν, and
  f(y | μ, ν) = y^(ν−1) ν^ν e^(−yν/μ) / (Γ(ν) μ^ν)
           = exp( −ν(y/μ + ln μ) + (ν − 1) ln y + ν ln ν − ln Γ(ν) ).
∙ This is an exponential family with a nuisance parameter ν:
  ∙ η(μ) = −1/μ (note the sign, needed for η(μ)y − A(μ) to match the exponent above).
  ∙ T(y) = y.
  ∙ A(μ) = ln μ.
  ∙ b(ν) = 1/ν.
  ∙ c(y, ν) = (ν − 1) ln y + ν ln ν − ln Γ(ν).
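
The (μ, ν) form can also be checked numerically, taking the natural parameter to be η(μ) = −1/μ so that the exponent matches; a sketch (function names are illustrative):

```python
import math

def gamma_pdf(y, mu, nu):
    # Gamma density with mean mu and variance mu^2/nu (shape nu, scale mu/nu).
    return y ** (nu - 1) * nu ** nu * math.exp(-y * nu / mu) / (math.gamma(nu) * mu ** nu)

def gamma_expfam(y, mu, nu):
    # Exponential-family form: eta = -1/mu, T(y) = y, A = ln mu, b = 1/nu,
    # c = (nu - 1) ln y + nu ln nu - ln Gamma(nu).
    eta, A, b = -1 / mu, math.log(mu), 1 / nu
    c = (nu - 1) * math.log(y) + nu * math.log(nu) - math.lgamma(nu)
    return math.exp((eta * y - A) / b + c)

assert abs(gamma_pdf(1.7, 2.0, 3.0) - gamma_expfam(1.7, 2.0, 3.0)) < 1e-12
```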

Example 5. Negative binomial (fixed r)
∙ The PMF of N(r, p) is
  f(y | p, r) = C(y + r − 1, y) (1 − p)^r p^y
           = exp( ln C(y + r − 1, y) + r ln(1 − p) + y ln p ),
  where C(n, k) denotes the binomial coefficient.
∙ This is an exponential family when r is fixed:
  ∙ η(p) = ln p.
  ∙ T(y) = y.
  ∙ A(p) = −r ln(1 − p).
  ∙ c(y) = ln C(y + r − 1, y).
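
Again the two forms can be compared numerically; a sketch (function names are illustrative):

```python
import math

def negbin_pmf(y, p, r):
    # Direct PMF: C(y + r - 1, y) (1 - p)^r p^y.
    return math.comb(y + r - 1, y) * (1 - p) ** r * p ** y

def negbin_expfam(y, p, r):
    # Exponential-family form: eta = ln p, T(y) = y, A = -r ln(1 - p),
    # c(y) = ln C(y + r - 1, y).
    return math.exp(y * math.log(p) + r * math.log(1 - p)
                    + math.log(math.comb(y + r - 1, y)))

for y in range(5):
    assert abs(negbin_pmf(y, 0.4, 3) - negbin_expfam(y, 0.4, 3)) < 1e-12
```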

MLE

Score equation
The MLE θ̂ is the solution to
  A′(θ)/η′(θ) = T̃,
where T̃ = (1/n) Σᵢ T(yᵢ).

Derivation
∙ The log-likelihood of θ given y₁, ..., yₙ is
  ℓ(θ) = Σᵢ [ (η(θ)T(yᵢ) − A(θ)) / b(φ) + c(yᵢ, φ) ].
∙ Its derivative is ℓ′(θ) = (n/b(φ)) (η′(θ)T̃ − A′(θ)).
∙ Set the derivative to 0 and rearrange to obtain A′(θ)/η′(θ) = T̃.
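
When the score equation has no closed-form solution, it can be solved numerically; a minimal bisection sketch (the helper `score_mle` is illustrative, and assumes A′/η′ is increasing on the bracket):

```python
def score_mle(dA, deta, t_bar, lo, hi, tol=1e-10):
    # Solve the score equation A'(theta)/eta'(theta) = t_bar by bisection,
    # assuming A'/eta' is increasing on (lo, hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dA(mid) / deta(mid) > t_bar:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Poisson: A(lam) = lam, eta(lam) = ln lam, so A'/eta' = lam
# and the MLE should come out as the sample mean.
ys = [3, 1, 4, 1, 5]
ybar = sum(ys) / len(ys)
lam_hat = score_mle(lambda l: 1.0, lambda l: 1 / l, ybar, 1e-6, 100.0)
assert abs(lam_hat - ybar) < 1e-6
```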

MLE as moment matching
∙ We will see that E(T | θ) = A′(θ)/η′(θ) in the next lecture.
∙ Thus the score equation is E(T | θ) = T̃.
∙ The MLE thus matches the first moment of the exponential family to that of the empirical distribution (i.e., it is a method of moments estimator).

Example. MLE for negative binomial distribution
∙ Recall: for fixed r, A(p) = −r ln(1 − p), η(p) = ln p.
∙ The score equation is A′(p)/η′(p) = rp/(1 − p) = ȳ = (1/n) Σᵢ yᵢ.
∙ Solving for p gives p̂ = ȳ/(ȳ + r).
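
The closed-form estimate can be checked directly against the log-likelihood: since ℓ(p) is concave in p here, p̂ should beat nearby values. A sketch with a made-up sample (names are illustrative):

```python
import math

def negbin_loglik(p, ys, r):
    # Log-likelihood of p for a negative binomial sample with fixed r.
    return sum(math.log(math.comb(y + r - 1, y))
               + r * math.log(1 - p) + y * math.log(p)
               for y in ys)

ys = [2, 0, 5, 3, 1, 4]          # a made-up sample
r = 3
ybar = sum(ys) / len(ys)
p_hat = ybar / (ybar + r)        # closed-form MLE from the score equation

# p_hat should beat nearby values of p:
for p in (p_hat - 0.05, p_hat + 0.05):
    assert negbin_loglik(p_hat, ys, r) >= negbin_loglik(p, ys, r)
```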

What You Need to Know
∙ Recognise whether a distribution is a one-parameter exponential family.
∙ The score equation for a one-parameter exponential family.
∙ The sufficiency of T̃ for determining the MLE.
∙ MLE as moment matching.