Lecture 2. Exponential Families
Nan Ye
School of Mathematics and Physics, University of Queensland

Recall: definition of GLM
∙ A GLM has the following structure:
  (systematic) E(Y | x) = h(β^T x)
  (random) Y | x follows an exponential family distribution.
∙ This is usually separated into three components:
  ∙ The linear predictor β^T x.
  ∙ The response function h. People often specify the link function g = h^(-1) instead.
  ∙ The exponential family for the conditional distribution of Y given x.

Recall: remarks on exponential families
∙ It is common: the normal, Bernoulli, Poisson, and Gamma distributions are all exponential families.
∙ It gives a well-defined model: its parameters are determined by the mean μ = E(Y | x).
∙ It leads to a unified treatment of many different models: linear regression, logistic regression, ...

This Lecture
∙ One-parameter exponential family: definition and examples.
∙ Maximum likelihood estimation: score equation, MLE as moment matching.

Exponential Families

One-parameter exponential family
An exponential family distribution has a PDF (or PMF for a discrete random variable) of the form

  f(y | θ, φ) = exp( (η(θ)T(y) − A(θ)) / b(φ) + c(y, φ) )

∙ T(y) is called the natural statistic.
∙ η(θ) is called the natural parameter.
∙ φ is a nuisance parameter treated as known.
∙ The range of y must be independent of θ.

Special forms
∙ A natural exponential family is one parametrized by η, that is, θ = η, and the PDF/PMF can be written as

  f(y | η, φ) = exp( (ηT(y) − A(η)) / b(φ) + c(y, φ) )

∙ The nuisance parameter can be absent, in which case b and c have the form b(φ) = 1 and c(y, φ) = c(y).

General form
∙ In general, η(θ) and T(y) can be vector-valued functions, θ can be a vector, and y can be an arbitrary object (like strings or sequences).
∙ Of course, η(θ)T(y) then needs to be replaced by the inner product η(θ)^T T(y).
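The one-parameter form above is easy to check numerically. Below is a minimal sketch (plain Python; the generic helper `exp_family_pmf` and the `poisson` dictionary are my own names, not part of the lecture) that codes the generic density once and instantiates it with the Poisson components worked out later in this lecture (η(λ) = ln λ, T(y) = y, A(λ) = λ, c(y) = −ln y!):

```python
import math

def exp_family_pmf(y, theta, eta, T, A, c, b=lambda phi: 1.0, phi=None):
    """Generic one-parameter exponential family density/PMF:
    f(y | theta, phi) = exp((eta(theta) T(y) - A(theta)) / b(phi) + c(y, phi))."""
    return math.exp((eta(theta) * T(y) - A(theta)) / b(phi) + c(y, phi))

# Poisson(lambda): eta(lam) = ln lam, T(y) = y, A(lam) = lam, c(y) = -ln y!
poisson = dict(eta=math.log, T=lambda y: y, A=lambda lam: lam,
               c=lambda y, phi: -math.lgamma(y + 1))

lam = 3.0
for y in range(5):
    direct = math.exp(-lam) * lam**y / math.factorial(y)  # standard Poisson PMF
    via_family = exp_family_pmf(y, lam, **poisson)        # exponential-family form
    assert abs(direct - via_family) < 1e-12
```

The same generic helper can be reused for the other examples below by swapping in their η, T, A, b, and c.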
∙ In practice, this general form has been heavily used in various domains such as computer vision and natural language processing.
∙ For this course, we focus on the single-parameter case, but the properties studied here generalize easily to the general form.

Examples

Example 1. Gaussian distribution (known σ²)
∙ The PDF is

  f(y | μ, σ) = (1/√(2πσ²)) exp( −(y − μ)²/(2σ²) )
              = exp( −y²/(2σ²) + μy/σ² − μ²/(2σ²) − ln √(2πσ²) )

∙ This is a natural exponential family with nuisance parameter σ:
  ∙ η(μ) = μ.
  ∙ T(y) = y.
  ∙ A(μ) = μ²/2.
  ∙ b(σ) = σ².
  ∙ c(y, σ) = −y²/(2σ²) − ln √(2πσ²).

Example 2. Bernoulli distribution
∙ The PMF is

  f(y | p) = p^y (1 − p)^(1−y)
           = exp( y ln p + (1 − y) ln(1 − p) )
           = exp( y ln(p/(1 − p)) + ln(1 − p) )

∙ This is an exponential family without a nuisance parameter:
  ∙ η(p) = ln(p/(1 − p)).
  ∙ T(y) = y.
  ∙ A(p) = −ln(1 − p).
  ∙ c(y) = 0.
∙ The natural form is given by

  f(y | η) = e^(yη) / (1 + e^η).

Example 3. Poisson distribution
∙ The PMF is

  f(y | λ) = exp(−λ) λ^y / y!
           = exp( −λ + y ln λ − ln y! )

∙ This is an exponential family without a nuisance parameter:
  ∙ η(λ) = ln λ.
  ∙ T(y) = y.
  ∙ A(λ) = λ.
  ∙ c(y) = −ln y!.

Example 4. Gamma distribution (fixed shape)
∙ The PDF, parametrized using the shape k and scale θ, is

  f(y | k, θ) = y^(k−1) e^(−y/θ) / (Γ(k) θ^k).

∙ An alternative parametrization, sometimes easier to work with, uses two parameters μ and ν such that μ is the mean and μ²/ν is the variance.
∙ In the (k, θ) parametrization, the mean and variance are kθ and kθ² respectively, thus k = ν and θ = μ/ν, and

  f(y | μ, ν) = y^(ν−1) ν^ν e^(−yν/μ) / (Γ(ν) μ^ν)
              = exp( −ν(y/μ + ln μ) + (ν − 1) ln y + ν ln ν − ln Γ(ν) )

∙ This is an exponential family with nuisance parameter ν:
  ∙ η(μ) = −1/μ (note the sign: with b(ν) = 1/ν this gives the −ν(y/μ + ln μ) term in the exponent).
  ∙ T(y) = y.
  ∙ A(μ) = ln μ.
  ∙ b(ν) = 1/ν.
  ∙ c(y, ν) = (ν − 1) ln y + ν ln ν − ln Γ(ν).
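As a sanity check on the (μ, ν) parametrization of the Gamma distribution, the sketch below (plain Python; the helper names are my own) evaluates the density both in the usual shape/scale form and via the exponential-family components η(μ) = −1/μ, T(y) = y, A(μ) = ln μ, b(ν) = 1/ν, using the correspondence k = ν and θ = μ/ν:

```python
import math

def gamma_pdf_shape_scale(y, k, theta):
    # Standard Gamma PDF in (shape k, scale theta):
    # y^(k-1) e^(-y/theta) / (Gamma(k) theta^k)
    return y**(k - 1) * math.exp(-y / theta) / (math.gamma(k) * theta**k)

def gamma_pdf_mean_form(y, mu, nu):
    # (mu, nu) parametrization written as an exponential family:
    # eta(mu) = -1/mu, T(y) = y, A(mu) = ln mu, b(nu) = 1/nu,
    # c(y, nu) = (nu - 1) ln y + nu ln nu - ln Gamma(nu)
    eta, A, b = -1.0 / mu, math.log(mu), 1.0 / nu
    c = (nu - 1) * math.log(y) + nu * math.log(nu) - math.lgamma(nu)
    return math.exp((eta * y - A) / b + c)

mu, nu = 2.5, 4.0        # mean mu, variance mu^2 / nu
k, theta = nu, mu / nu   # equivalent shape/scale parameters
for y in (0.5, 1.0, 3.7):
    assert abs(gamma_pdf_shape_scale(y, k, theta) - gamma_pdf_mean_form(y, mu, nu)) < 1e-12
```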
Example 5. Negative binomial distribution (fixed r)
∙ The PMF of N(r, p) is

  f(y | p, r) = C(y + r − 1, y) (1 − p)^r p^y
              = exp( ln C(y + r − 1, y) + r ln(1 − p) + y ln p )

∙ This is an exponential family when r is fixed:
  ∙ η(p) = ln p.
  ∙ T(y) = y.
  ∙ A(p) = −r ln(1 − p).
  ∙ c(y) = ln C(y + r − 1, y).

MLE

Score equation
The MLE θ̂ is the solution to

  A′(θ)/η′(θ) = T̄,

where T̄ = (1/n) Σ_i T(y_i).

Derivation
∙ The log-likelihood of θ given y_1, ..., y_n is

  ℓ(θ) = Σ_i [ (η(θ)T(y_i) − A(θ)) / b(φ) + c(y_i, φ) ].

∙ Set the derivative of the log-likelihood to 0 and rearrange: ℓ′(θ) = Σ_i (η′(θ)T(y_i) − A′(θ))/b(φ) = 0 gives η′(θ) n T̄ = n A′(θ), that is, A′(θ)/η′(θ) = T̄.

MLE as moment matching
∙ We will see in the next lecture that E(T | θ) = A′(θ)/η′(θ).
∙ Thus the score equation is E(T | θ) = T̄.
∙ The MLE thus matches the first moment of the exponential family to that of the empirical distribution (i.e., it is a method of moments estimator).

Example. MLE for the negative binomial distribution
∙ Recall: for fixed r, A(p) = −r ln(1 − p) and η(p) = ln p.
∙ The score equation is rp/(1 − p) = ȳ = (1/n) Σ_i y_i.
∙ Hence p̂ = ȳ/(ȳ + r).

What You Need to Know
∙ Recognise whether a distribution is a one-parameter exponential family.
∙ The score equation for a one-parameter exponential family.
∙ Sufficiency of T̄ for determining the MLE.
∙ MLE as moment matching.
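The closed-form MLE p̂ = ȳ/(ȳ + r) from the score equation can also be checked numerically. The sketch below (plain Python; the sample data and function name are my own, for illustration) evaluates the negative binomial log-likelihood on a small sample and confirms that the score-equation solution scores at least as high as nearby values of p:

```python
import math

def nb_log_likelihood(p, ys, r):
    # Negative binomial log-likelihood for fixed r:
    # sum_i [ ln C(y_i + r - 1, y_i) + r ln(1 - p) + y_i ln p ],
    # with ln C(y + r - 1, y) = lgamma(y + r) - lgamma(r) - lgamma(y + 1).
    return sum(y * math.log(p) + r * math.log(1 - p)
               + math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               for y in ys)

ys, r = [3, 0, 5, 2, 4], 2
ybar = sum(ys) / len(ys)
p_hat = ybar / (ybar + r)   # closed-form MLE from the score equation

# The score-equation solution should beat nearby values of p:
for p in (p_hat - 0.05, p_hat + 0.05):
    assert nb_log_likelihood(p_hat, ys, r) >= nb_log_likelihood(p, ys, r)
```

Since the log-likelihood is concave in p here, the unique stationary point found by the score equation is the global maximizer.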