
Maximum Likelihood Estimation & Contingency Tables

Joseph Abraham1

1Estacio Uniseb

Lecture I RGM5908


Outline

Maximum Likelihood Estimation

Simple Example of MLE

Association between Categorical Variables

Contingency Tables


Intro. to Maximum Likelihood I

Consider a coin with P(heads) = p

Data: 15 heads in 100 tosses

Is p = 0.90 reasonable ?

Not really, what about p = 0.6 ?

p = 0.5 ... how do we choose p based on data ?

The data are independent and identically distributed samples:

100 tosses in total with 15 heads, depending on one parameter (p)

Likelihood L(parameter | data) = p^15 (1 − p)^85

Form similar to a joint distribution but not a joint distribution

Intro to Maximum Likelihood II

Key Idea: Best choice of p is one which

maximizes L (data fixed !)

(With many parameters, the maximization must be done numerically)

In practice we work with the log-likelihood ℓ = log L

Maximum Likelihood Estimator (MLE) is normally distributed

Provided Sample Size is Large !

Asymptotically unbiased & variance decreases with the size of the data set

Variance of MLE is smallest among a large class of estimators
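As a quick sketch of the numerical maximization mentioned above (not part of the original slides), the coin example with 15 heads in 100 tosses can be maximized with a generic optimizer; in practice one minimizes the negative log-likelihood:

```python
# Numerical maximization of the log-likelihood for the coin example:
# 15 heads in 100 tosses. We minimize the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

n_heads, N = 15, 100

def neg_log_lik(p):
    # -log L = -(n log p + (N - n) log(1 - p))
    return -(n_heads * np.log(p) + (N - n_heads) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # close to 0.15 = n/N
```

The bounded method keeps the optimizer inside (0, 1), where the log-likelihood is defined.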

MLE example I

N tosses of a coin with n heads recorded (HTHHHTT ...)

Assume independent Bernoulli trials common probability p

What is the MLE for p based on the data ?

L(p | data) = p^n (1 − p)^(N−n); solve ∂L/∂p = 0, or equivalently

solve ∂ℓ/∂p = 0, where ℓ = log L = n log p + (N − n) log(1 − p)

MLE for p: p̂ = n/N, as expected.

Variance of the estimate is n(N − n)/N³; it decreases with sample size, and normal confidence intervals apply

Compare with the binomial confidence interval for a proportion !
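The closed-form result above can be checked numerically; a minimal sketch (using the slide's 15-heads-in-100-tosses numbers) verifies that the score ∂ℓ/∂p vanishes at p̂ = n/N and computes the variance formula:

```python
N, n = 100, 15
p_hat = n / N                        # MLE: p-hat = n/N

# Stationarity condition: dl/dp = n/p - (N - n)/(1 - p) should be 0 at p-hat
score = n / p_hat - (N - n) / (1 - p_hat)
print(score)                         # approximately 0 (floating-point rounding)

# Variance of the estimate: n(N - n)/N^3, equivalently p-hat(1 - p-hat)/N
var = n * (N - n) / N**3
print(var)
```

Both forms of the variance agree because n(N − n)/N³ = (n/N)(1 − n/N)/N.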

MLE example II

Real Data: Simulate N = 100 tosses with p = 0.4 to get n = 43

Maximum likelihood estimate p̂ is 0.43

Variance of the estimate is 0.002451

95% confidence interval: 0.43 ± 0.097

Fully compatible with p = 0.4.

Suppose we did not know p = 0.4 but observed just 100 trials &

43 successes. Is p = 0.2 a reasonable guess ?

Not reasonable from confidence intervals for p.

More general way to check reasonable parameter values
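The numbers on this slide follow directly from the formulas of the previous slide; a short sketch reproduces the variance and the 95% normal confidence interval:

```python
import numpy as np

N, n = 100, 43
p_hat = n / N                       # 0.43
var = n * (N - n) / N**3            # 0.002451
half_width = 1.96 * np.sqrt(var)    # half-width of the 95% normal CI, ~0.097
print(p_hat, var, round(half_width, 3))
```

Note that p = 0.4 lies inside 0.43 ± 0.097, while p = 0.2 lies far outside it.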

Likelihood Ratio Test I

Very general technique to see how well the data fit a

choice of parameters, assuming some given statistical model !

We wish to test p = p0 as a parameter choice

Calculate the MLE (p̂) and ℓ(p̂)

Compute LR = 2(ℓ(p̂) − ℓ(p0))

For large samples, LR is distributed like χ²₁

(One parameter estimated in ℓ(p̂) and none in ℓ(p0))

Likelihood Ratio Test II

For (N = 100, n = 43): p̂ = 0.43, ℓ(p̂) = −68.33149

Using p0 = 0.2 we get ℓ(p0) = −81.925

LR = 2(ℓ(p̂) − ℓ(p0)) = 27.18704; far in the tail of χ²₁, hence reject p = p0 (p-value ∼ 10⁻⁶)

(Testing p0 = 0.4 instead would give a p-value far from significant.)

Can be generalized to more complicated situations

p → (θ₁, θ₂, . . . , θₘ) (used in later lectures)
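The likelihood ratio test on this slide can be sketched in a few lines; the log-likelihood is the one from the MLE example, and the p-value comes from the χ²₁ survival function:

```python
import numpy as np
from scipy.stats import chi2

N, n = 100, 43

def log_lik(p):
    # log-likelihood of the binomial coin model
    return n * np.log(p) + (N - n) * np.log(1 - p)

p_hat = n / N
ll_hat = log_lik(p_hat)          # ~ -68.33149
ll_0 = log_lik(0.2)              # ~ -81.925
LR = 2 * (ll_hat - ll_0)         # ~ 27.187
p_value = chi2.sf(LR, df=1)      # far below 0.05: reject p = 0.2
print(LR, p_value)
```

Replacing 0.2 with 0.4 in `log_lik` gives a small LR and a large p-value, as the slide notes.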

Associations of Categorical Variables I

We have many independent observations characterized by two

or more categorical variables (education, genotype, hair color,

smoker ...). In some cases (ordinal variables)

it may be possible to define an order (Doctorate > Master's)

For others (nominal) this is not possible (Smoker > Non-Smoker ???)

How to quantify correlations among categorical variables ?

Cannot define means, Pearson correlations etc.

Associations of Categorical Variables II

Assume we have a sample of independent individuals,

some with cancer, some without

Some individuals are smokers, some are not

(Assume similar ages, similar proportions of men & women)

Is there a correlation between Smoking and Cancer ?

If independent (H0) then P(S, C) = P(S)P(C)

Contingency Tables I

Sample independent individuals with Cancer & Healthy

Some smokers and others not smokers

            Smokers   Not Smokers   Total
Cancer      n11       n12           nC
Healthy     n21       n22           nH
Total       nS        nNS           N

nC → number of individuals with cancer

nC + nH = nS + nNS = N (total number)

n11 → Number of individuals with Cancer & Smokers

n12, n22 etc. defined similarly

Contingency Tables II

Under H0 (independence), the expected count n⁰₁₁ = nC nS / N

n⁰₁₂ etc. defined similarly

Does the data support H0 ?

For large samples use Pearson’s Chi-square Test

X² = Σᵢⱼ (nᵢⱼ − n⁰ᵢⱼ)² / n⁰ᵢⱼ

Under H0, X² is distributed like χ²₁

(1 df: with the margins nC and nS fixed, only one entry, e.g. n11, is free)
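A minimal sketch of Pearson's chi-square test on a 2×2 table; the counts below are hypothetical, chosen only for illustration (they are not from the lecture):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts for the smoking/cancer table (illustrative only)
table = np.array([[30, 20],   # Cancer:  smokers, non-smokers
                  [20, 30]])  # Healthy: smokers, non-smokers

# correction=False gives the plain Pearson X^2 from the slide's formula
X2, p_value, df, expected = chi2_contingency(table, correction=False)
print(X2, p_value, df)
print(expected)   # the expected counts n0_ij = row_total * col_total / N
```

With all margins equal to 50, every expected count is 25, so X² = 4·(5²/25) = 4 with 1 df.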

Contingency Tables III

Suppose we find for some data X² = 9.2

p-value highly significant for α = 0.05

S & C appear to be correlated

How large is the correlation between S & C ?

Cannot be obtained from p-value alone

Need different measure of correlation

Contingency Tables IV

Define the Odds Ratio (OR):

OR = [ P(Disease | RiskFactor) · P(NoDisease | NoRiskFactor) ] / [ P(Disease | NoRiskFactor) · P(NoDisease | RiskFactor) ]

Ratio of odds: OR = 1 under H0

OR > 1 =⇒ Disease and Risk Factor positively correlated

OR < 1 =⇒ Disease and Risk Factor negatively correlated

In our case the sample OR = (n11 · n22) / (n21 · n12)

A large or small OR is evidence for a strong association
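The sample odds ratio is a one-liner; the counts here are again hypothetical illustration values:

```python
# Sample odds ratio from a 2x2 table: (n11 * n22) / (n21 * n12)
def odds_ratio(n11, n12, n21, n22):
    return (n11 * n22) / (n21 * n12)

# Hypothetical counts: 30 smoking cases, 20 non-smoking cases,
# 20 smoking controls, 30 non-smoking controls
print(odds_ratio(30, 20, 20, 30))  # 2.25: positive association
```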

Review of Fisher Exact Test I

Sample of Cancer Patients (C), Healthy Individuals (H)

Some are smokers (S) & others not (N)

Data is as follows:

CCCHHCHCH ...

SSNNNSSNS ...

Statistically significant association between C & S ?

(Is S below C statistically frequent ?)

How to test ?

Review of Fisher Exact Test II

Rearrange the assignments of S, N relative to C, H

CCCHHCHCH ... SSSNSNSNN ...

CCCHHCHCH ... NNSSSNNSS ...

Numbers of C, H, S & N are fixed !

Repeat many times ... (permutation test)

How frequently does S fall below C in the permutations,

compared with the original data ? → empirical p-value
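The permutation procedure above can be sketched directly; the labels below are hypothetical data invented for illustration (smoking deliberately concentrated among the cases):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labels: 1 = Cancer / Smoker, 0 = Healthy / Non-smoker
disease = np.array([1] * 10 + [0] * 10)
smoker = np.array([1] * 8 + [0] * 2 + [1] * 2 + [0] * 8)

# Observed count of smokers among the cancer cases (S "below" C)
observed = np.sum((disease == 1) & (smoker == 1))

# Shuffle the smoker labels: margins stay fixed, association is destroyed
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(smoker)
    if np.sum((disease == 1) & (perm == 1)) >= observed:
        count += 1

p_empirical = count / n_perm
print(observed, p_empirical)   # small p: association unlikely under H0
```

Fisher's exact test computes this same tail probability analytically instead of by simulation.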

Review of Fisher Exact Test III

Alternatively, create Contingency Table

            Smokers   Not Smokers   Total
Cancer      n11       n12           nC
Healthy     n21       n22           nH
Total       nS        nNS           N

nC, nH, nS and nNS fixed & nC + nH = nS + nNS = N

Probability of a table (hypergeometric distribution):

P(n11) = C(nC, n11) · C(nH, nS − n11) / C(N, nS)

Is this value large ? (Compared with what ?)

Review of Fisher Exact Test IV

Compared with all other tables !

Fixed nC etc. =⇒ only one entry is independent

3  1
1  3

If the 3 → 2, all other entries become 2.

All row and column totals equal 4, so the only possibilities for n11 are 4, 3, 2, 1 & 0

Probabilities (and odds ratios) for each value of n11:

n11 = 4: 0.0142 (OR = ∞)     n11 = 3: 0.2286 (OR = 9)     n11 = 2: 0.514 (OR = 1)
n11 = 1: 0.2286 (OR = 0.111)     n11 = 0: 0.0142 (OR = 0)
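These probabilities come straight from the hypergeometric distribution with all margins fixed at 4; a short sketch enumerates them:

```python
from scipy.stats import hypergeom

# 2x2 table with all row and column totals equal to 4, so N = 8.
# With margins fixed, n11 follows a hypergeometric distribution:
# P(n11 = k) = C(4, k) * C(4, 4 - k) / C(8, 4)
N, nC, nS = 8, 4, 4
dist = hypergeom(M=N, n=nC, N=nS)

for n11 in range(4, -1, -1):
    print(n11, round(dist.pmf(n11), 4))
# n11 = 4 and 0 have probability ~0.0143; 3 and 1 ~0.2286; 2 ~0.5143
```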

Review of Fisher Exact Test V

Suppose we observe table with n11 = 4

Significant at α = 0, 05 ?

Actually significant at α = 0.0284 (two-sided)

No observation can be significant at exactly α = 0.05

Attainable p-values fall only strictly below 0.05 !

Tests are conservative (the attained level is more stringent than α)

A consequence of discrete data (n11 ∈ {4, 3, 2, 1, 0})

Effect Sizes and P-Values I

Consider 2 contingency tables, with small & large samples:

Small:  5  4      Large:  500  400
        3  3              300  300

For the small table the odds ratio is 1.2315 & the p-value is 1. If α = 0.05,

no evidence for association between row and column variables.

For the large table the odds ratio is 1.2498 & the p-value is 0.03935; if α = 0.05,

STRONG evidence for association between

row and column variables, with very similar effect sizes !
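The contrast between the two tables can be reproduced with Fisher's exact test. One caveat: SciPy's `fisher_exact` reports the sample odds ratio (1.25 for both tables); the slide's 1.2315 and 1.2498 look like conditional maximum likelihood odds ratios, as reported by R's `fisher.test`:

```python
from scipy.stats import fisher_exact

small = [[5, 4], [3, 3]]
large = [[500, 400], [300, 300]]

# Two-sided Fisher exact test on each table
or_small, p_small = fisher_exact(small)
or_large, p_large = fisher_exact(large)
print(or_small, p_small)   # p-value 1: no evidence of association
print(or_large, p_large)   # p-value below 0.05: "strong" evidence
```

The effect size barely changes between tables; only the sample size (and hence the p-value) does.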

Effect Sizes and P-Values II

Both p-values and effect sizes are useful.

A significant p-value with a small effect size is

not very interesting ...

A close-to-significant p-value & a large effect size with a small sample

may be important
