Maximum Likelihood Estimation & Contingency Tables
Joseph Abraham
Estacio Uniseb
Lecture I RGM5908
Joseph Abraham — Maximum Likelihood & Contingency Tables
Outline of Maximum Likelihood Estimation
Simple Example of MLE
Association between Categorical Variables
Contingency Tables
Intro. to Maximum Likelihood I
Consider a coin with P(heads) = p
Data: 15 heads in 100 tosses
Is p = 0.90 reasonable?
Not really; what about p = 0.6?
p = 0.5? ... how do we choose p based on the data?
Data are independent and identically distributed samples,
100 in total with 15 heads, dependent on one parameter (p)
Likelihood L(parameter | data) = p^15 (1 − p)^85
Form similar to a joint distribution, but not a joint distribution
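The comparison above is easy to check directly. A minimal sketch (the function name `likelihood` is ours) that evaluates L(p | data) at the candidate values of p:

```python
def likelihood(p, heads=15, tails=85):
    """L(p | data) = p^heads * (1 - p)^tails.
    The binomial coefficient is omitted: it does not depend on p,
    so it does not affect the comparison between candidate values."""
    return p ** heads * (1 - p) ** tails

# Candidate values from the slide, plus 0.15 (= 15/100)
for p in (0.90, 0.6, 0.5, 0.15):
    print(f"p = {p:.2f}  L = {likelihood(p):.3e}")
```

p = 0.15 gives by far the largest likelihood, previewing the MLE p̂ = n/N derived in the example below.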
Intro to Maximum Likelihood II
Key Idea: the best choice of p is the one which
maximizes L (data fixed!)
(With many parameters this must be done numerically)
In practice work with the log-likelihood ℓ = log L
The Maximum Likelihood Estimator (MLE) is normally distributed,
provided the sample size is large!
Asymptotically unbiased & variance decreases with
the size of the data set
Variance of the MLE is smallest among a large class of estimators
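With many parameters the maximization is numerical. For the one-parameter coin model a crude grid search over ℓ already works; this sketch (function name ours) stands in for a real optimizer such as `scipy.optimize`:

```python
import math

def log_lik(p, n=15, N=100):
    """Log-likelihood for n heads in N tosses."""
    return n * math.log(p) + (N - n) * math.log(1 - p)

# A fine grid search stands in for a numerical optimizer
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_lik)
print(p_hat)  # 0.15, i.e. n/N
```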
MLE example I
N tosses of a coin with n heads recorded (HTHHHTT ...)
Assume independent Bernoulli trials with common probability p
What is the MLE for p based on the data?
L(p | data) = p^n (1 − p)^(N−n); solve ∂L/∂p = 0, or equivalently
solve ∂ℓ/∂p = 0, where ℓ = log L = n log p + (N − n) log(1 − p)
MLE for p: p̂ = n/N, as expected.
Variance of the estimate is n(N − n)/N^3; it decreases with sample size, and normal confidence intervals apply
Compare with the binomial confidence interval for a proportion!
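The closed-form result can be verified in a few lines (a sketch; the function name `mle_coin` is ours):

```python
def mle_coin(n, N):
    """MLE p_hat = n/N and its estimated variance n(N - n)/N^3."""
    p_hat = n / N
    var = n * (N - n) / N ** 3  # equals p_hat * (1 - p_hat) / N
    return p_hat, var

p_hat, var = mle_coin(15, 100)
# The score (derivative of the log-likelihood) vanishes at p_hat:
score = 15 / p_hat - 85 / (1 - p_hat)
print(p_hat, var, abs(score) < 1e-9)
```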
MLE example II
Real Data: simulate N = 100 with p = 0.4 to get n = 43
Maximum Likelihood estimate p̂ is 0.43
Variance of the estimate is 0.002451
95% confidence interval: 0.43 ± 0.097
Fully compatible with p = 0.4.
Suppose we did not know p = 0.4 but just had 100 trials &
43 successes. Is p = 0.2 a reasonable guess?
Not reasonable, from the confidence interval for p.
There is a more general way to check reasonable parameter values
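The experiment is easy to reproduce (a sketch; with a different random seed the simulated count n will differ from the slide's 43, but the interval should cover 0.4 about 95% of the time):

```python
import random

random.seed(1)
N, p_true = 100, 0.4
n = sum(random.random() < p_true for _ in range(N))  # simulated heads

p_hat = n / N
var = n * (N - n) / N ** 3
half = 1.96 * var ** 0.5  # half-width of the normal 95% interval
print(f"n = {n}, p_hat = {p_hat:.2f}, 95% CI = {p_hat:.2f} +/- {half:.3f}")
```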
Likelihood Ratio Test I
Very general technique to see how well the data fit
a choice of parameters (assuming some given statistical model!)
We wish to test p = p0 as a parameter choice
Calculate the MLE p̂ and ℓ(p̂)
Compute LR = 2(ℓ(p̂) − ℓ(p0))
For large samples LR is distributed like χ² with 1 df
(One parameter estimated in ℓ(p̂) and none in ℓ(p0))
Likelihood Ratio Test II
For (N = 100, n = 43): p̂ = 0.43, ℓ(p̂) = −68.33149
Using p0 = 0.2 we get ℓ(p0) = −81.925
LR = 2(ℓ(p̂) − ℓ(p0)) = 27.18704, in the extreme tail of χ² with 1 df, hence reject p = p0 (p-value ∼ 10⁻⁶)
(For p0 = 0.4 the p-value is far from significant: the LR is not significant.)
Can be generalized to more complicated situations:
p → (θ1, θ2, . . . θm) (used in later lectures)
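A sketch of the test with the slide's numbers (function names ours). The χ² tail probability for 1 df is computed via the identity P(χ²₁ > x) = erfc(√(x/2)), so no statistics library is needed:

```python
import math

def log_lik(p, n, N):
    return n * math.log(p) + (N - n) * math.log(1 - p)

def lr_test(p0, n, N):
    """LR = 2*(l(p_hat) - l(p0)) and its chi-square (1 df) p-value."""
    LR = 2 * (log_lik(n / N, n, N) - log_lik(p0, n, N))
    pval = math.erfc(math.sqrt(LR / 2))  # survival function, 1 df
    return LR, pval

print(lr_test(0.2, 43, 100))  # LR ~ 27.19: reject p0 = 0.2
print(lr_test(0.4, 43, 100))  # LR small: p0 = 0.4 is compatible
```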
Associations of Categorical Variables I
Have many independent observations characterized by two
or more categorical variables (education, genotype, hair color,
smoker ...)
In some cases (ordinal variables)
it may be possible to define an order (Doctorate > Master's)
For others (nominal) this is not possible (Smoker > Non-Smoker ???)
How to quantify correlations among categorical variables?
Cannot define means, Pearson correlations, etc.
Associations of Categorical Variables II
Assume we have a sample of independent individuals,
some with cancer, some without
Some individuals are smokers, some are not
(Assume similar ages, similar proportions of men & women)
Is there a correlation between Smoking and Cancer?
If independent (H0) then P(S, C) = P(S)P(C)
Contingency Tables I
Sample of independent individuals, some with Cancer & some Healthy
Some are smokers and others are not smokers

            Smokers   Not Smokers   Total
  Cancer    n11       n12           nC
  Healthy   n21       n22           nH
  Total     nS        nNS           N

nC → number of individuals with cancer
nC + nH = nS + nNS = N (total number)
n11 → number of individuals who have Cancer & are Smokers
n12, n22, etc. defined similarly
Contingency Tables II
Under H0 (independence) the expected count is n0_11 = nC nS / N
n0_12 etc. are defined similarly
Does the data support H0?
For large samples use Pearson's Chi-square Test
X^2 = Σ_{i,j} (n_ij − n0_ij)^2 / n0_ij
Under H0, X^2 is distributed like χ² with 1 df
(1 df: with the margins nC and nS fixed, only one entry of the table is free)
Contingency Tables III
Suppose for some data we find X^2 = 9.2
The p-value is highly significant for α = 0.05
S & C appear to be correlated
How large is the correlation between S & C?
It cannot be obtained from the p-value alone
Need a different measure of correlation
Contingency Tables IV
Define the Odds Ratio (OR)
OR = [P(Disease | RiskFactor) P(NoDisease | NoRiskFactor)] / [P(Disease | NoRiskFactor) P(NoDisease | RiskFactor)]
A ratio of odds; OR = 1 under H0
OR > 1 =⇒ Disease and Risk Factor positively correlated
OR < 1 =⇒ Disease and Risk Factor negatively correlated
In our case the sample OR = n11 n22 / (n21 n12)
A large or small OR is evidence for a strong association
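The sample odds ratio is one line of code; the two toy tables below (ours, for illustration) show OR = 1 under independence and OR > 1 under positive association:

```python
def sample_odds_ratio(n11, n12, n21, n22):
    """Sample OR = (n11 * n22) / (n21 * n12) for a 2x2 table."""
    return (n11 * n22) / (n21 * n12)

# Proportional rows -> independence -> OR = 1
print(sample_odds_ratio(15, 35, 15, 35))   # 1.0
# Disease concentrated among the exposed -> OR > 1
print(sample_odds_ratio(30, 20, 10, 40))   # 6.0
```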
Review of Fisher Exact Test I
Sample of Cancer Patients (C), Healthy Individuals (H)
Some are smokers (S) & others not (N)
Data is as follows:
CCCHHCHCH ...
SSNNNSSNS ...
Is there a statistically significant association between C & S?
(Is S below C unusually frequent?) How to test?
Review of Fisher Exact Test II
Rearrange the assignments of S, N with C, H:
CCCHHCHCH ... SSSNSNSNN ...
CCCHHCHCH ... NNSSSNNSS ...
The numbers of C, H, S & N are fixed!
Repeat many times ... (permutation test)
How frequently is S below C in the permutations,
compared with the original data? → empirical p-value
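The permutation scheme above can be sketched as follows (toy data and function name ours; with all four margins equal to 4, the exact one-sided probability of seeing 3 or more smoking cancer patients is 17/70 ≈ 0.243, so the empirical p-value should land near that):

```python
import random

def perm_pvalue(status, smoker, n_perm=20000, seed=0):
    """Empirical one-sided p-value: fraction of label shuffles with at
    least as many (C, S) pairs as the observed data (margins fixed)."""
    def stat(labels):
        return sum(st == "C" and sm == "S" for st, sm in zip(status, labels))

    rng = random.Random(seed)
    observed = stat(smoker)
    labels = list(smoker)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)        # reassign S/N labels; counts unchanged
        hits += stat(labels) >= observed
    return (hits + 1) / (n_perm + 1)

status = list("CCCCHHHH")
smoker = list("SSSNNNNS")          # 3 of the 4 cancer patients smoke
print(perm_pvalue(status, smoker))
```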
Review of Fisher Exact Test III
Alternatively, create a Contingency Table

            Smokers   Not Smokers   Total
  Cancer    n11       n12           nC
  Healthy   n21       n22           nH
  Total     nS        nNS           N

nC, nH, nS and nNS fixed & nC + nH = nS + nNS = N
Probability of a table with these margins (hypergeometric distribution):
P(n11) = C(nC, n11) C(nH, nS − n11) / C(N, nS)
Is this value large? (Compared with what?)
Review of Fisher Exact Test IV
Compared with all other tables!
Fixed nC etc. =⇒ only one entry is independent
Example table (all margins equal to 4):
  3 1
  1 3
If 3 → 2, all other entries become 2.
All row and column totals equal 4; the only possibilities for n11 are 4, 3, 2, 1 & 0
Probabilities (with sample ORs in parentheses):
n11 = 4: 0.0143 (∞); 3: 0.2286 (9); 2: 0.5143 (1); 1: 0.2286 (0.111); 0: 0.0143 (0)
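The full enumeration on this slide takes a few lines with the hypergeometric formula (a sketch; `math.comb` requires Python ≥ 3.8):

```python
from math import comb

def hypergeom_p(n11, nC, nH, nS):
    """P(n11 | fixed margins) for a 2x2 table (hypergeometric)."""
    return comb(nC, n11) * comb(nH, nS - n11) / comb(nC + nH, nS)

# All margins equal to 4, as on the slide: n11 ranges over 0..4
for n11 in range(5):
    print(n11, round(hypergeom_p(n11, 4, 4, 4), 4))
# probabilities: 0.0143, 0.2286, 0.5143, 0.2286, 0.0143
```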
Review of Fisher Exact Test V
Suppose we observe the table with n11 = 4
Significant at α = 0.05?
Actually significant at α = 0.0286 (two-sided)
No observation is significant at exactly α = 0.05
Observations are available only at levels strictly below 0.05!
Such tests are conservative (the attained level is below the nominal α)
A consequence of discrete data (n11 ∈ {4, 3, 2, 1, 0})
Effect Sizes and P-Values I
Consider 2 contingency tables, with small & large samples:
  small:  5 4      large:  500 400
          3 3              300 300
Small table: Odds Ratio is 1.2315 & p-value is 1; at α = 0.05 there is
no evidence for association between row and column variables.
Large table: Odds Ratio is 1.2498 & p-value is 0.03935; at α = 0.05 there is
STRONG evidence for association between
row and column variables, with very similar effect sizes!
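The contrast is easy to reproduce. The raw sample OR is exactly 1.25 for both tables (the slide's 1.2315 and 1.2498 appear to come from a conditional estimate reported by an exact-test routine); below, a Pearson chi-square p-value stands in for the exact test and shows the same small-sample vs large-sample contrast:

```python
from math import erfc, sqrt

def sample_or(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return a * d / (b * c)

def chi2_pvalue(a, b, c, d):
    """Pearson chi-square (1 df) p-value for the 2x2 table [[a, b], [c, d]]."""
    N = a + b + c + d
    X2 = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(X2 / 2))

for table in ((5, 4, 3, 3), (500, 400, 300, 300)):
    print(table, sample_or(*table), round(chi2_pvalue(*table), 4))
```

Same effect size, wildly different p-values: only the sample size changed.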
Effect Sizes and P-Values II
Both P-Values and Effect Sizes are useful.
A significant P-Value with a small effect size is
not very interesting ...
A close-to-significant P-Value with a large effect size, in a small sample,
may be important