Chapter 1 Introduction

Yibi Huang
Department of Statistics, University of Chicago

Outline
• Variable types
• Review of binomial and multinomial distributions
• Likelihood and maximum likelihood method
• Inference for a binomial proportion (Wald, score, and likelihood ratio tests and confidence intervals)
• Small sample inference

Variable Types

Regression methods are used to analyze data when the response variable is numerical
• e.g., temperature, blood pressure, height, speed, income
• Stat 22200, Stat 22400

Methods in categorical data analysis are used when the response variable takes categorical (or qualitative) values
• e.g.,
  • gender (male, female),
  • political philosophy (liberal, moderate, conservative),
  • region (metropolitan, urban, suburban, rural)
• Stat 22600

In either case, the explanatory variables can be numerical or categorical.

Two Types of Categorical Variables

Nominal: unordered categories, e.g.,
• transport to work (car, bus, bicycle, walk, other)
• favorite music (rock, hiphop, pop, classical, jazz, country, folk)

Ordinal: ordered categories, e.g.,
• patient condition (excellent, good, fair, poor)
• government spending (too high, about right, too low)

We pay special attention to binary variables (success or failure), for which the nominal/ordinal distinction is unimportant.

Review of Binomial and Multinomial Distributions

Binomial Distributions (Review)

If n Bernoulli trials are performed:
• only two possible outcomes for each trial (success, failure)
• π = P(success) and 1 − π = P(failure) for each trial
• trials are independent
• Y = number of successes out of n trials

then Y has a binomial distribution, denoted Y ∼ binomial(n, π).

Example. Vote (Dem, Rep). Suppose π = Pr(Dem) = 0.4. Sample n = 3 voters and let y = the number of Dem votes among them. Then

  P(y) = [n! / (y!(n−y)!)] π^y (1−π)^(n−y) = [3! / (y!(3−y)!)] (0.4)^y (0.6)^(3−y)

  P(0) = [3! / (0!3!)] (0.4)^0 (0.6)^3 = (0.6)^3 = 0.216
  P(1) = [3! / (1!2!)] (0.4)^1 (0.6)^2 = 3(0.4)(0.6)^2 = 0.432
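The voting example can be reproduced directly in R. A minimal sketch (the variable names are my own, not from the slides) that computes P(y) from the factorial formula and checks it against the built-in dbinom():

```r
# Voting example: n = 3 voters, P(Dem) = 0.4.
n     <- 3
p_dem <- 0.4
y     <- 0:3

# n! / (y! (n - y)!) * pi^y * (1 - pi)^(n - y), computed term by term
p_manual  <- factorial(n) / (factorial(y) * factorial(n - y)) *
  p_dem^y * (1 - p_dem)^(n - y)

# same probabilities from R's built-in binomial density
p_builtin <- dbinom(y, size = n, prob = p_dem)

print(round(p_manual, 3))        # 0.216 0.432 0.288 0.064
print(all.equal(p_manual, p_builtin))
print(sum(p_manual))             # the probabilities sum to 1
```

The four values match the P(y) table on the next slide.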
The probability function of Y is

  P(Y = y) = (n choose y) π^y (1−π)^(n−y),   y = 0, 1, ..., n,

where (n choose y) = n! / (y!(n−y)!) is the binomial coefficient and
m! = "m factorial" = m × (m−1) × (m−2) × ··· × 1. Note that 0! = 1.

For the voting example (n = 3, π = 0.4):

    y     P(y)
    0     0.216
    1     0.432
    2     0.288
    3     0.064
  total   1

R Code

> dbinom(x=0, size=3, p=0.4)
[1] 0.216
> dbinom(0, 3, 0.4)
[1] 0.216
> dbinom(1, 3, 0.4)
[1] 0.432
> dbinom(0:3, 3, 0.4)
[1] 0.216 0.432 0.288 0.064
> plot(0:3, dbinom(0:3, 3, .4), type = "h", xlab = "y", ylab = "P(y)")

Facts About the Binomial Distribution

If Y is a binomial(n, π) random variable, then
• E(Y) = nπ
• σ(Y) = √Var(Y) = √(nπ(1−π))
• binomial(n, π) can be approximated by Normal(nπ, nπ(1−π)) when n is large (nπ > 5 and n(1−π) > 5).

[Figure: probability histograms of binomial(n = 8, π = 0.2) and binomial(n = 25, π = 0.2), and the plot of P(y) for the voting example.]

Multinomial Distribution: Generalization of Binomial

If n trials are performed:
• in each trial there are c > 2 possible outcomes (categories)
• πi = P(category i) for each trial, with π1 + π2 + ··· + πc = 1
• trials are independent
• Yi = number of trials that fall in category i out of n trials

then the joint distribution of (Y1, Y2, ..., Yc) is a multinomial distribution, with probability function

  P(Y1 = y1, Y2 = y2, ..., Yc = yc) = [n! / (y1! y2! ··· yc!)] π1^y1 π2^y2 ··· πc^yc,

where 0 ≤ yi ≤ n for all i and y1 + y2 + ··· + yc = n.

Example. Suppose the proportions of individuals with genotypes AA, Aa, and aa in a large population are

  (πAA, πAa, πaa) = (0.25, 0.5, 0.25).

Randomly sample n = 5 individuals from the population. The chance of getting 2 AA's, 2 Aa's, and 1 aa is

  P(YAA = 2, YAa = 2, Yaa = 1) = [5! / (2!2!1!)] (0.25)^2 (0.5)^2 (0.25)^1 ≈ 0.117

and the chance of getting no AA, 3 Aa's, and 2 aa's is

  P(YAA = 0, YAa = 3, Yaa = 2) = [5! / (0!3!2!)] (0.25)^0 (0.5)^3 (0.25)^2 ≈ 0.078
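The genotype example can be checked with R's built-in multinomial density, dmultinom(). A minimal sketch (object names are my own, not from the slides):

```r
# Category probabilities for genotypes (AA, Aa, aa)
probs <- c(AA = 0.25, Aa = 0.50, aa = 0.25)

# P(2 AA, 2 Aa, 1 aa) out of n = 5, via the built-in density
p1 <- dmultinom(x = c(2, 2, 1), prob = probs)

# the same probability from the formula n!/(y1! y2! y3!) * prod(pi^y)
p1_manual <- factorial(5) / (factorial(2) * factorial(2) * factorial(1)) *
  0.25^2 * 0.5^2 * 0.25^1

# P(0 AA, 3 Aa, 2 aa)
p2 <- dmultinom(x = c(0, 3, 2), prob = probs)

print(round(c(p1, p2), 3))   # 0.117 0.078
```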
Facts About the Multinomial Distribution

If (Y1, Y2, ..., Yc) has a multinomial distribution with n trials and category probabilities (π1, π2, ..., πc), then
• E(Yi) = nπi for i = 1, 2, ..., c
• σ(Yi) = √Var(Yi) = √(nπi(1−πi))
• Cov(Yi, Yj) = −nπiπj

Likelihood and Maximum Likelihood Estimation

A Probability Question

A push pin is tossed n = 5 times. Let Y be the number of times the push pin lands on its head. What is P(Y = 3)?

Answer. As the tosses are independent, Y is binomial(n = 5, π), so

  P(Y = y; π) = [n! / (y!(n−y)!)] π^y (1−π)^(n−y),

where π = P(push pin lands on its head in a toss). If π is known to be 0.4, then

  P(Y = 3; π) = [5! / (3!2!)] (0.4)^3 (0.6)^2 = 0.2304.

A Statistics Question

Suppose a push pin is observed to land on its head Y = 8 times in n = 20 tosses. Can we infer the value of π = P(push pin lands on its head in a toss)?

The chance of observing Y = 8 in n = 20 tosses is

  P(Y = 8; π) = (20 choose 8) (0.3)^8 (0.7)^12 ≈ 0.1143   if π = 0.3,
  P(Y = 8; π) = (20 choose 8) (0.6)^8 (0.4)^12 ≈ 0.0354   if π = 0.6.

It appears that π = 0.3 is more plausible than π = 0.6, since the former gives a higher probability to the observed outcome Y = 8.

Likelihood

The probability

  P(Y = y; π) = (n choose y) π^y (1−π)^(n−y) = ℓ(π|y),

viewed as a function of π, is called the likelihood function (or just likelihood) of π, denoted ℓ(π|y). It is a measure of the "plausibility" of a value being the true value of π.

In general, suppose the observed data (Y1, Y2, ..., Yn) have a joint probability distribution with some parameter(s) θ:

  P(Y1 = y1, Y2 = y2, ..., Yn = yn) = f(y1, y2, ..., yn | θ).

The likelihood function for the parameter θ is

  ℓ(θ) = ℓ(θ | y1, y2, ..., yn) = f(y1, y2, ..., yn | θ).

• Note the likelihood function regards the probability as a function of the parameter θ rather than as a function of the data y1, y2, ..., yn.
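The push-pin comparison amounts to evaluating the binomial likelihood at candidate values of π. A minimal sketch (the function name lik and the grid are my own, not from the slides):

```r
# Likelihood l(pi | y = 8) for n = 20 tosses, as a function of pi
lik <- function(p, y = 8, n = 20) dbinom(y, size = n, prob = p)

print(round(lik(0.3), 3))   # about 0.114: pi = 0.3 is more plausible
print(round(lik(0.6), 3))   # about 0.035: than pi = 0.6, given Y = 8

# Trace the whole likelihood curve over a grid of pi values
grid       <- seq(0, 1, by = 0.01)
curve_vals <- lik(grid)
print(grid[which.max(curve_vals)])   # 0.4, the most plausible value on this grid
```

Plotting curve_vals against grid reproduces the shape of the likelihood curves shown for n = 20.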
• If ℓ(θ1 | y1, ..., yn) > ℓ(θ2 | y1, ..., yn), then θ1 appears more plausible as the true value of θ than θ2 does, given the observed data y1, ..., yn.

[Figure: curves of the likelihood ℓ(π|y) at different values of y (y = 0, 2, 8, 14) for n = 20.]

Maximum Likelihood Estimate (MLE)

The maximum likelihood estimate (MLE) of a parameter θ is the value at which the likelihood function is maximized.

Example. If a push pin lands on its head Y = 8 times in n = 20 tosses, the likelihood function

  ℓ(π | y = 8) = (20 choose 8) π^8 (1−π)^12

reaches its maximum at π = 0.4, so the MLE of π given the data Y = 8 is π̂ = 0.4.

[Figure: plot of ℓ(π | y = 8), peaking at π = 0.4.]

Maximizing the Log-likelihood

Rather than maximizing the likelihood, it is usually computationally easier to maximize its logarithm, called the log-likelihood,

  log ℓ(π|y),

which is equivalent since the logarithm is strictly increasing:

  x1 > x2  ⟺  log(x1) > log(x2).

So

  ℓ(π1|y) > ℓ(π2|y)  ⟺  log ℓ(π1|y) > log ℓ(π2|y).

Example (MLE for Binomial)

If the observed data Y ∼ binomial(n, π) but π is unknown, the likelihood of π is

  ℓ(π|y) = P(Y = y | π) = (n choose y) π^y (1−π)^(n−y)

and the log-likelihood is

  log ℓ(π|y) = log (n choose y) + y log(π) + (n−y) log(1−π).

From calculus, a function f(x) reaches its maximum at x = x0 if (d/dx) f(x) = 0 at x = x0 and (d²/dx²) f(x) < 0 at x = x0. Since

  (d/dπ) log ℓ(π|y) = y/π − (n−y)/(1−π) = (y − nπ) / (π(1−π))

equals 0 when π = y/n, and

  (d²/dπ²) log ℓ(π|y) = −y/π² − (n−y)/(1−π)² < 0

is always true, log ℓ(π|y) reaches its maximum at π = y/n. So the MLE of π is π̂ = y/n.

More Facts About MLEs

• If Y1, Y2, ..., Yn are i.i.d. N(µ, σ²), the MLE of µ is the sample mean (Y1 + ··· + Yn)/n.
• In ordinary linear regression,

    Yi = β0 + β1 xi1 + ··· + βp xip + εi,

  when the noise terms εi are i.i.d. normal, the usual least squares estimates of β0, β1, ..., βp are MLEs.

Large Sample Optimality of MLEs

MLEs are not always the best estimators, but they have a number of good properties.
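The closed-form MLE π̂ = y/n can also be confirmed numerically by maximizing the log-likelihood with a one-dimensional optimizer. A minimal sketch (not from the slides) using R's optimize():

```r
# Push-pin data: y = 8 heads in n = 20 tosses
y <- 8
n <- 20

# Binomial log-likelihood of pi (log = TRUE gives log P(Y = y))
loglik <- function(p) dbinom(y, size = n, prob = p, log = TRUE)

# Maximize over the open interval (0, 1)
fit <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)

print(fit$maximum)   # approximately 0.4
print(y / n)         # 0.4, the closed-form MLE
```

Maximizing the log-likelihood rather than the likelihood itself avoids underflow when n is large, which is the computational point made above.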
