Maximum-Likelihood Estimation: Basic Ideas

- The method of maximum likelihood provides estimators that have both a reasonable intuitive basis and many desirable statistical properties.
- The method is very broadly applicable and is simple to apply.
- Once a maximum-likelihood estimator is derived, the general theory of maximum-likelihood estimation provides standard errors, statistical tests, and other results useful for statistical inference.
- A disadvantage of the method is that it frequently requires strong assumptions about the structure of the data.

1. An Example

- We want to estimate the probability $\pi$ of getting a head upon flipping a particular coin.
  - We flip the coin 'independently' 10 times (i.e., we sample $n = 10$ flips), obtaining the following result: $HHTHHHTTHH$.
  - The probability of obtaining this sequence, in advance of collecting the data, is a function of the unknown parameter $\pi$:

$$\Pr(\text{data} \mid \text{parameter}) = \Pr(HHTHHHTTHH \mid \pi) = \pi\pi(1-\pi)\pi\pi\pi(1-\pi)(1-\pi)\pi\pi = \pi^7(1-\pi)^3$$

  - But the data for our particular sample are fixed: we have already collected them.
  - The parameter $\pi$ also has a fixed value, but this value is unknown, and so we can let it vary in our imagination between 0 and 1, treating the probability of the observed data as a function of $\pi$.
  - This function is called the likelihood function:

$$L(\text{parameter} \mid \text{data}) = L(\pi \mid HHTHHHTTHH) = \pi^7(1-\pi)^3$$

- The probability function and the likelihood function are given by the same equation, but the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed.
- Here are some representative values of the likelihood for different values of $\pi$:

    $\pi$    $L(\pi \mid \text{data}) = \pi^7(1-\pi)^3$
    0.0      0.0
    0.1      0.0000000729
    0.2      0.00000655
    0.3      0.0000750
    0.4      0.000354
    0.5      0.000977
    0.6      0.00179
    0.7      0.00222
    0.8      0.00168
    0.9      0.000478
    1.0      0.0

- The complete likelihood function is graphed in Figure 1.

[Figure 1. Likelihood of observing 7 heads and 3 tails in a particular sequence for different values of the probability of observing a head, $\pi$.]

- Although each value of $L(\pi \mid \text{data})$ is a notional probability, the function $L(\pi \mid \text{data})$ is not a probability or density function: it does not enclose an area of 1.
- The probability of obtaining the sample of data that we have in hand, $HHTHHHTTHH$, is small regardless of the true value of $\pi$.
  - This is usually the case: any specific sample result, including the one that is realized, will have low probability.
- Nevertheless, the likelihood contains useful information about the unknown parameter $\pi$.
- For example, $\pi$ cannot be 0 or 1, and is 'unlikely' to be close to 0 or 1.
- Reversing this reasoning, the value of $\pi$ that is most supported by the data is the one for which the likelihood is largest. This value is the maximum-likelihood estimate (MLE), denoted $\hat{\pi}$. Here, $\hat{\pi} = .7$, which is the sample proportion of heads, 7/10.
- More generally, for $n$ independent flips of the coin, producing a particular sequence that includes $x$ heads and $n - x$ tails,

$$L(\pi \mid \text{data}) = \Pr(\text{data} \mid \pi) = \pi^x(1-\pi)^{n-x}$$

- We want the value of $\pi$ that maximizes $L(\pi \mid \text{data})$, which we often abbreviate $L(\pi)$.
- It is simpler, and equivalent, to find the value of $\pi$ that maximizes the log of the likelihood:

$$\log_e L(\pi) = x \log_e \pi + (n - x)\log_e(1-\pi)$$

- Differentiating $\log_e L(\pi)$ with respect to $\pi$ produces

$$\frac{d \log_e L(\pi)}{d\pi} = x\,\frac{1}{\pi} + (n - x)\,\frac{1}{1-\pi}\,(-1) = \frac{x}{\pi} - \frac{n-x}{1-\pi}$$

- Setting the derivative to 0 and solving produces the MLE which, as before, is the sample proportion $x/n$.
- The maximum-likelihood estimator is $\hat{\pi} = X/n$ (with $X$ the number of heads regarded as a random variable).

2. Properties of Maximum-Likelihood Estimators

Under very broad conditions, maximum-likelihood estimators have the following general properties:

- Maximum-likelihood estimators are consistent.
- They are asymptotically unbiased, although they may be biased in finite samples.
- They are asymptotically efficient: no asymptotically unbiased estimator has a smaller asymptotic variance.
- They are asymptotically normally distributed.
- If there is a sufficient statistic for a parameter, then the maximum-likelihood estimator of the parameter is a function of a sufficient statistic. A sufficient statistic is a statistic that exhausts all of the information in the sample about the parameter of interest.
- The asymptotic sampling variance of the MLE $\hat{\alpha}$ of a parameter $\alpha$ can be obtained from the second derivative of the log-likelihood:

$$\mathcal{V}(\hat{\alpha}) = \frac{1}{-E\left[\dfrac{d^2 \log_e L(\alpha)}{d\alpha^2}\right]}$$

  - The denominator of $\mathcal{V}(\hat{\alpha})$ is called the expected or Fisher information:

$$\mathcal{I}(\alpha) \equiv -E\left[\frac{d^2 \log_e L(\alpha)}{d\alpha^2}\right]$$

  - In practice, we substitute the MLE $\hat{\alpha}$ into the equation for $\mathcal{V}(\hat{\alpha})$ to obtain an estimate of the asymptotic sampling variance, $\widehat{\mathcal{V}}(\hat{\alpha})$.
- $L(\hat{\alpha})$ is the value of the likelihood function at the MLE $\hat{\alpha}$, while $L(\alpha)$ is the likelihood for the true (but generally unknown) parameter $\alpha$. The log likelihood-ratio statistic

$$G^2 \equiv -2\log_e \frac{L(\alpha)}{L(\hat{\alpha})} = 2\left[\log_e L(\hat{\alpha}) - \log_e L(\alpha)\right]$$

  follows an asymptotic chi-square distribution with one degree of freedom.
  - Because, by definition, the MLE maximizes the likelihood for our particular sample, the value of the likelihood at the true parameter value $\alpha$ is generally smaller than at the MLE $\hat{\alpha}$ (unless, by good fortune, $\hat{\alpha}$ and $\alpha$ happen to coincide).

3. Statistical Inference: Wald, Likelihood-Ratio, and Score Tests

These properties of maximum-likelihood estimators lead directly to three common and general procedures for testing the statistical hypothesis $H_0\colon \alpha = \alpha_0$.

1. Wald Test: Relying on the asymptotic normality of the MLE $\hat{\alpha}$, we calculate the test statistic

$$Z_0 \equiv \frac{\hat{\alpha} - \alpha_0}{\sqrt{\widehat{\mathcal{V}}(\hat{\alpha})}}$$

   which is asymptotically distributed as $N(0, 1)$ under $H_0$.

2. Likelihood-Ratio Test: Employing the log likelihood ratio, the test statistic

$$G_0^2 \equiv -2\log_e \frac{L(\alpha_0)}{L(\hat{\alpha})} = 2\left[\log_e L(\hat{\alpha}) - \log_e L(\alpha_0)\right]$$

   is asymptotically distributed as $\chi^2_1$ under $H_0$.

3. Score Test: The 'score' is the slope of the log-likelihood at a particular value of $\alpha$, that is, $S(\alpha) \equiv d \log_e L(\alpha)/d\alpha$. At the MLE, the score is 0: $S(\hat{\alpha}) = 0$. It can be shown that the score statistic

$$S_0 \equiv \frac{S(\alpha_0)}{\sqrt{\mathcal{I}(\alpha_0)}}$$

   is asymptotically distributed as $N(0, 1)$ under $H_0$.

- Unless the log-likelihood is quadratic, the three test statistics can produce somewhat different results in specific samples, although the three tests are asymptotically equivalent.
- In certain contexts, the score test has the practical advantage of not requiring the computation of the MLE $\hat{\alpha}$ (because $S_0$ depends only on the null value $\alpha_0$, which is specified in $H_0$).
- The Wald and likelihood-ratio tests can be 'turned around' to produce confidence intervals for $\alpha$.
- Figure 2 compares the three test statistics.

[Figure 2. Likelihood-ratio, Wald, and score tests. The plot shows $\log_e L$ against $\alpha$, with $\hat{\alpha}$ and $\alpha_0$ marked.]

- Maximum-likelihood estimation and the Wald, likelihood-ratio, and score tests extend straightforwardly to simultaneous estimation of several parameters.
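The coin example can be checked numerically. The following Python sketch evaluates the likelihood for 7 heads and 3 tails at the representative values of the parameter, finds the maximum of the log-likelihood by a crude grid search, and estimates the asymptotic sampling variance of the MLE. The function names, the grid step of 0.001, and the use of a grid search at all are illustrative assumptions rather than part of the original notes. The variance line uses the fact that, for this model, the Fisher information works out to n / (pi (1 - pi)).

```python
import math

N_FLIPS = 10   # n: number of flips in the example
N_HEADS = 7    # x: number of heads observed


def likelihood(pi, x=N_HEADS, n=N_FLIPS):
    """L(pi | data) = pi^x (1 - pi)^(n - x) for one particular sequence."""
    return pi ** x * (1.0 - pi) ** (n - x)


def log_likelihood(pi, x=N_HEADS, n=N_FLIPS):
    """log_e L(pi); defined only for 0 < pi < 1."""
    return x * math.log(pi) + (n - x) * math.log(1.0 - pi)


# Likelihood at pi = 0.0, 0.1, ..., 1.0 (the representative values).
table = [(k / 10, likelihood(k / 10)) for k in range(11)]
for pi, value in table:
    print(f"{pi:.1f}  {value:.3g}")

# Crude grid search over the open interval (0, 1) with step 0.001
# (an illustrative choice, not a recommended optimizer).
grid = [k / 1000 for k in range(1, 1000)]
pi_hat_grid = max(grid, key=log_likelihood)

# Closed-form MLE from setting the derivative of log_e L to zero.
pi_hat = N_HEADS / N_FLIPS

# Estimated asymptotic variance: reciprocal of the information
# n / (pi (1 - pi)), evaluated at the MLE.
var_hat = pi_hat * (1.0 - pi_hat) / N_FLIPS

print("grid MLE:", pi_hat_grid, "closed-form MLE:", pi_hat)
print("estimated asymptotic variance:", var_hat)
```

The grid search is there only to show that the numerical maximum agrees with the sample proportion; in practice the closed form makes it unnecessary for this model.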

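To make the three test statistics concrete, here is a minimal Python sketch that applies them to the coin data (7 heads in 10 flips). The null value of 0.5 (a fair coin) is a hypothetical choice for illustration and does not appear in the notes; the helper names are likewise assumptions. The score and information functions are the first derivative of the log-likelihood and the expected negative second derivative for this model. With only 10 flips the asymptotic reference distributions are rough approximations, so the p-values are indicative only.

```python
import math

n, x = 10, 7   # flips and heads from the coin example
pi_0 = 0.5     # hypothetical null value (fair coin); an assumption


def log_lik(pi):
    """log_e L(pi) = x log pi + (n - x) log(1 - pi)."""
    return x * math.log(pi) + (n - x) * math.log(1.0 - pi)


def score(pi):
    """Slope of the log-likelihood at pi."""
    return x / pi - (n - x) / (1.0 - pi)


def information(pi):
    """Expected (Fisher) information for n Bernoulli trials."""
    return n / (pi * (1.0 - pi))


pi_hat = x / n  # MLE: the sample proportion

# Wald: distance from the null in estimated standard-error units.
wald_z = (pi_hat - pi_0) / math.sqrt(1.0 / information(pi_hat))

# Likelihood ratio: twice the drop in log-likelihood at the null.
lr_g2 = 2.0 * (log_lik(pi_hat) - log_lik(pi_0))

# Score: slope at the null, scaled by the information at the null.
score_z = score(pi_0) / math.sqrt(information(pi_0))


def normal_two_sided_p(z):
    """Two-sided p-value under N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chisq1_p(g2):
    """Upper-tail p-value under chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))


print("Wald z  :", wald_z, "p =", normal_two_sided_p(wald_z))
print("LR G^2  :", lr_g2, "p =", chisq1_p(lr_g2))
print("Score z :", score_z, "p =", normal_two_sided_p(score_z))
```

Note that the score statistic is computed without using `pi_hat`, while the Wald and likelihood-ratio statistics both need it. In this small sample the three statistics do not coincide, in line with the remark that the tests are equivalent only asymptotically.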