Maximum-Likelihood Estimation: Basic Ideas
- The method of maximum likelihood provides estimators that have both a reasonable intuitive basis and many desirable statistical properties.
- The method is very broadly applicable and is simple to apply.
- Once a maximum-likelihood estimator is derived, the general theory of maximum-likelihood estimation provides standard errors, statistical tests, and other results useful for statistical inference.
- A disadvantage of the method is that it frequently requires strong assumptions about the structure of the data.

1. An Example

- We want to estimate the probability $\pi$ of getting a head upon flipping a particular coin.
  – We flip the coin 'independently' 10 times (i.e., we sample $n = 10$ flips), obtaining the following result: $HHTHHHTTHH$.
  – The probability of obtaining this sequence, in advance of collecting the data, is a function of the unknown parameter $\pi$:
    $$\Pr(\text{data} \mid \text{parameter}) = \Pr(HHTHHHTTHH \mid \pi) = \pi\pi(1-\pi)\pi\pi\pi(1-\pi)(1-\pi)\pi\pi = \pi^7(1-\pi)^3$$
  – But the data for our particular sample are fixed: We have already collected them.
  – The parameter $\pi$ also has a fixed value, but this value is unknown, and so we can let it vary in our imagination between 0 and 1, treating the probability of the observed data as a function of $\pi$.
  – This function is called the likelihood function:
    $$L(\text{parameter} \mid \text{data}) = L(\pi \mid HHTHHHTTHH) = \pi^7(1-\pi)^3$$
- The probability function and the likelihood function are given by the same equation, but the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed.
- Here are some representative values of the likelihood for different values of $\pi$:

    $\pi$    $L(\pi \mid \text{data}) = \pi^7(1-\pi)^3$
    0.0      0.0
    .1       .0000000729
    .2       .00000655
    .3       .0000750
    .4       .000354
    .5       .000977
    .6       .00179
    .7       .00222
    .8       .00168
    .9       .000478
    1.0      0.0

- The complete likelihood function is graphed in Figure 1.

  [Figure 1. Likelihood of observing 7 heads and 3 tails in a particular sequence for different values of the probability of observing a head, $\pi$.]

- Although each value of $L(\pi \mid \text{data})$ is a notional probability, the function $L(\pi \mid \text{data})$ is not a probability or density function: it does not enclose an area of 1.
- The probability of obtaining the sample of data that we have in hand, $HHTHHHTTHH$, is small regardless of the true value of $\pi$.
  – This is usually the case: Any specific sample result, including the one that is realized, will have low probability.
- Nevertheless, the likelihood contains useful information about the unknown parameter $\pi$.
- For example, $\pi$ cannot be 0 or 1, and is 'unlikely' to be close to 0 or 1.
- Reversing this reasoning, the value of $\pi$ that is most supported by the data is the one for which the likelihood is largest.
  – This value is the maximum-likelihood estimate (MLE), denoted $\hat{\pi}$.
  – Here, $\hat{\pi} = .7$, which is the sample proportion of heads, 7/10.
- More generally, for $n$ independent flips of the coin, producing a particular sequence that includes $x$ heads and $n - x$ tails,
    $$L(\pi \mid \text{data}) = \Pr(\text{data} \mid \pi) = \pi^x(1-\pi)^{n-x}$$
  – We want the value of $\pi$ that maximizes $L(\pi \mid \text{data})$, which we often abbreviate $L(\pi)$.
  – It is simpler, and equivalent, to find the value of $\pi$ that maximizes the log of the likelihood
    $$\log_e L(\pi) = x \log_e \pi + (n - x)\log_e(1-\pi)$$
  – Differentiating $\log_e L(\pi)$ with respect to $\pi$ produces
    $$\frac{d \log_e L(\pi)}{d\pi} = \frac{x}{\pi} + (n - x)\frac{1}{1-\pi}(-1) = \frac{x}{\pi} - \frac{n-x}{1-\pi}$$
  – Setting the derivative to 0 and solving produces the MLE which, as before, is the sample proportion $x/n$.
  – The maximum-likelihood estimator is $\hat{\pi} = X/n$.

2. Properties of Maximum-Likelihood Estimators

Under very broad conditions, maximum-likelihood estimators have the following general properties:

- Maximum-likelihood estimators are consistent.
- They are asymptotically unbiased, although they may be biased in finite samples.
- They are asymptotically efficient: no asymptotically unbiased estimator has a smaller asymptotic variance.
- They are asymptotically normally distributed.
- If there is a sufficient statistic for a parameter, then the maximum-likelihood estimator of the parameter is a function of a sufficient statistic.
  – A sufficient statistic is a statistic that exhausts all of the information in the sample about the parameter of interest.
- The asymptotic sampling variance of the MLE $\hat{\alpha}$ of a parameter $\alpha$ can be obtained from the second derivative of the log-likelihood:
    $$\mathcal{V}(\hat{\alpha}) = \frac{1}{-E\left[\dfrac{d^2 \log_e L(\alpha)}{d\alpha^2}\right]}$$
  – The denominator of $\mathcal{V}(\hat{\alpha})$ is called the expected or Fisher information:
    $$\mathcal{I}(\alpha) \equiv -E\left[\frac{d^2 \log_e L(\alpha)}{d\alpha^2}\right]$$
  – In practice, we substitute the MLE $\hat{\alpha}$ into the equation for $\mathcal{V}(\hat{\alpha})$ to obtain an estimate of the asymptotic sampling variance, $\hat{\mathcal{V}}(\hat{\alpha})$.
- $L(\hat{\alpha})$ is the value of the likelihood function at the MLE $\hat{\alpha}$, while $L(\alpha)$ is the likelihood for the true (but generally unknown) parameter $\alpha$.
  – The log likelihood-ratio statistic
    $$G^2 \equiv -2\log_e\frac{L(\alpha)}{L(\hat{\alpha})} = 2[\log_e L(\hat{\alpha}) - \log_e L(\alpha)]$$
    follows an asymptotic chi-square distribution with one degree of freedom.
  – Because, by definition, the MLE maximizes the likelihood for our particular sample, the value of the likelihood at the true parameter value $\alpha$ is generally smaller than at the MLE $\hat{\alpha}$ (unless, by good fortune, $\hat{\alpha}$ and $\alpha$ happen to coincide).

3. Statistical Inference: Wald, Likelihood-Ratio, and Score Tests

These properties of maximum-likelihood estimators lead directly to three common and general procedures for testing the statistical hypothesis $H_0\colon \alpha = \alpha_0$.

1. Wald Test: Relying on the asymptotic normality of the MLE $\hat{\alpha}$, we calculate the test statistic
    $$Z_0 \equiv \frac{\hat{\alpha} - \alpha_0}{\sqrt{\hat{\mathcal{V}}(\hat{\alpha})}}$$
   which is asymptotically distributed as $N(0, 1)$ under $H_0$.
2. Likelihood-Ratio Test: Employing the log likelihood ratio, the test statistic
    $$G_0^2 \equiv -2\log_e\frac{L(\alpha_0)}{L(\hat{\alpha})} = 2[\log_e L(\hat{\alpha}) - \log_e L(\alpha_0)]$$
   is asymptotically distributed as $\chi^2_1$ under $H_0$.
3. Score Test: The 'score' is the slope of the log-likelihood at a particular value of $\alpha$, that is, $S(\alpha) \equiv d\log_e L(\alpha)/d\alpha$. At the MLE, the score is 0: $S(\hat{\alpha}) = 0$. It can be shown that the score statistic
    $$S_0 \equiv \frac{S(\alpha_0)}{\sqrt{\mathcal{I}(\alpha_0)}}$$
   is asymptotically distributed as $N(0, 1)$ under $H_0$.

- Unless the log-likelihood is quadratic, the three test statistics can produce somewhat different results in specific samples, although the three tests are asymptotically equivalent.
- In certain contexts, the score test has the practical advantage of not requiring the computation of the MLE $\hat{\alpha}$ (because $S_0$ depends only on the null value $\alpha_0$, which is specified in $H_0$).
- The Wald and likelihood-ratio tests can be 'turned around' to produce confidence intervals for $\alpha$.
- Figure 2 compares the three test statistics.

  [Figure 2. Likelihood-ratio, Wald, and score tests. The plot shows $\log_e L$ against $\alpha$, with $\hat{\alpha}$ and $\alpha_0$ marked on the horizontal axis.]

- Maximum-likelihood estimation and the Wald, likelihood-ratio, and score tests extend straightforwardly to simultaneous estimation of several parameters.
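
The coin example of Section 1 can be carried through the three tests numerically. The following is a minimal sketch, not part of the original notes, under two stated assumptions: the null value 0.5 (a 'fair' coin) is chosen purely for illustration, and the Fisher information for the coin model, n / (pi (1 - pi)), is obtained by differentiating the coin log-likelihood twice and taking the expectation with E(X) = n pi. All function and variable names are hypothetical.

```python
import math

# Data from the coin example: x = 7 heads in n = 10 independent flips.
n, x = 10, 7


def log_lik(p):
    """Log-likelihood x*log(p) + (n - x)*log(1 - p), defined for 0 < p < 1."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


def score(p):
    """Slope of the log-likelihood at p: x/p - (n - x)/(1 - p)."""
    return x / p - (n - x) / (1 - p)


def fisher_info(p):
    """Expected information n / (p*(1 - p)).

    Assumption: derived for this sketch from the coin log-likelihood;
    the formula is not stated in the notes themselves.
    """
    return n / (p * (1 - p))


# Likelihood p^x (1 - p)^(n - x) on the grid 0.0, 0.1, ..., 1.0.
lik_table = [(i / 10, (i / 10) ** x * (1 - i / 10) ** (n - x)) for i in range(11)]

# MLE (the sample proportion) and its estimated asymptotic variance,
# obtained by evaluating 1 / information at the MLE.
pi_hat = x / n
var_hat = 1 / fisher_info(pi_hat)

# Illustrative null hypothesis (an assumption, not from the notes).
pi_0 = 0.5

# Wald statistic: distance from the null value in estimated standard errors.
z0 = (pi_hat - pi_0) / math.sqrt(var_hat)

# Likelihood-ratio statistic: twice the drop in log-likelihood from MLE to null.
g2 = 2 * (log_lik(pi_hat) - log_lik(pi_0))

# Score statistic: slope at the null value, scaled by the information there.
s0 = score(pi_0) / math.sqrt(fisher_info(pi_0))

for p, lik in lik_table:
    print(f"pi = {p:.1f}  likelihood = {lik:.3g}")
print(f"MLE = {pi_hat}, estimated variance = {var_hat:.4f}")
print(f"Wald Z0 = {z0:.4f}, LR G0^2 = {g2:.4f}, score S0 = {s0:.4f}")
```

With only 10 flips the three statistics disagree noticeably: the Wald statistic is about 1.38, the likelihood-ratio statistic about 1.65, and the score statistic about 1.26 (so the squared Wald and score statistics are about 1.90 and 1.60). This is the finite-sample divergence described above for a log-likelihood that is not quadratic. Note that the score statistic is computed without using the MLE at all.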