
Likelihood and Bayesian Inference in Scientific Statistics
UCL Graduate School: Graduate Skills Course

Your hosts for today:
James Mallet, Professor of Biological Diversity, http://abacus.gene.ucl.ac.uk/jim/
Ziheng Yang, Professor of Statistical Genetics, http://abacus.gene.ucl.ac.uk/
(Department of Biology, UCL)

What we will cover:
• The basis of inference in statistics.
• A comparison of the "frequentist" approach we usually learn with today's widely used alternatives in statistics.
  – Particularly the use of "likelihood" and Bayesian methods.
• We will hopefully empower you to develop your own analyses, using simple examples.

What we will not cover:
• Not suitable for people already well-versed in statistics. They'll already know most of this!
• Not suitable for people who've no idea about statistics. At least GCSE knowledge required.
• We won't have time to teach you all you need to know to analyse your data.
• We won't have time to go into very complicated examples.

Instead, we hope:
• You begin to develop a healthy disrespect for most "off-the-shelf" methods. (But you will probably still use them.)
• You start to form your own ideas of how statistics and scientific inference are related (a philosophy-of-science topic).
• Your interest in likelihood and Bayesian analysis is piqued, and you might be motivated to do further reading.
• You become empowered to perform simple statistical analyses, using Excel and Excel's Solver "add-in"; with a little programming you can analyse much more difficult problems.

My main source:
Anthony W. F. Edwards (1972; reprinted 1992). Likelihood. Cambridge UP.
See also, more in-depth: Yudi Pawitan (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford UP.

Overview:
• What is scientific inference?
• Three philosophies of statistical inference:
  – Frequentist (probability in the long run)
  – Likelihood (likelihood measures strength)
  – Bayesian (posterior probability)
• Common ground: opposing philosophies agree (approximately) on many problems
• Discussion
• Exercises, example of ABO blood groups
• Ziheng's talk: when philosophies conflict ...

Scientific inference:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

The nature of scientific inference:
• "I'm sure this is true" ... "I'm pretty sure" ... "I'm not sure" ... "It is likely that ..." ... "This seems most probable to me"
• All of our inference about the world is likely to be based on probability; it's statistical. (Except divine revelation!)

Models and hypotheses:
• Science is about trying to find "predictability" or "regularities" in nature, which we can use.
• For some reason, this usually seems to work ...
• Models and hypotheses allow prediction. We test them by analysing something about their "likelihood" or "probability".

Models and hypotheses in statistical inference:
• Models are assumed to be true for the purposes of the particular test or problem, e.g. we assume height in humans to be normally distributed.
• Hypotheses are "parameters" that are the focus of interest in estimation, e.g. the mean and variance of height in humans.
• For example, milk fat ... [Figure: example frequency distribution of milk-fat measurements, from Sokal & Rohlf 1981, Biometry, p. 47]

Data is typically discrete:
• Counts of things; measurements to the nearest mm, or 0.1 °C.
• Data is also finite.
• Models and hypotheses can be discrete too, or continuous; they may be finite or infinite in scope.
• A good method of inference should take this discreteness of data into account when we analyse the data. Many analyses, particularly frequentist ones, don't!

Null hypotheses in statistics:
• We are often taught in biology a simplistic kind of "Popperian" approach to science, to falsify simple hypotheses. We then try to test the null hypothesis!
• (Zero-dimensional statistics, if you like; only one hypothesis can be excluded.)
• In this view, estimation (e.g. of a mean or variance) is like natural history, not good science. Physics-envy?

Estimation is primary:
• Edwards argues that we should turn this argument on its head.
• Estimation of a distribution or model can lead to testing of an infinitude of hypotheses, including the null hypothesis.
• It uses the full dimensionality of the problem: 1- to n-dimensional statistical analyses. More powerful!

The three philosophies:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

1. Frequentist significance testing and P-values:
• Perfected in the 1920s (Pearson, Fisher et al.); e.g. the χ² test, or the t-test.
• Often tends to assume the data come from a continuous distribution; e.g. χ² tests on counts, Σ(O−E)²/E.
• Encourages testing of the null hypothesis.

Philosophical problems with the frequentist approach:
• We only have one set of data, yet the approach imagines the experiment done a very large number of times.
• "What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred" — Jeffreys 1961.

P-values:
• P-values are "tail probabilities".
• e.g. χ² = 5.28, d.f. = 1; or t = 3.92, d.f. = 10. We find P < 0.05, or P = 0.009834.
• This is the "tail probability", or "probability in the long run", of getting results at least as extreme as the data under the null hypothesis.
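A quick illustration (added here, not part of the original slides) of how such tail probabilities are obtained in practice; Python with scipy is assumed, and the test statistics are simply the example values quoted above:

```python
# Sketch: tail probabilities ("P-values") for the example statistics above.
# Assumes scipy is installed; the numbers are the slide's illustrative values.
from scipy import stats

p_chi2 = stats.chi2.sf(5.28, df=1)       # P(chi-square >= 5.28) with 1 d.f.
p_t = 2 * stats.t.sf(3.92, df=10)        # two-tailed P for t = 3.92, 10 d.f.

print(f"chi-square tail probability: {p_chi2:.4f}")   # about 0.02, i.e. P < 0.05
print(f"two-tailed t probability:    {p_t:.4f}")      # well below 0.05
```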

Alternatives to frequentism:
• Frequentism: "probability in the long run".
• Two alternative measures of support:
  – Bayesian probability (Thomas Bayes 1763, Marquis de Laplace 1820): "the probability of a hypothesis given the data".
  – Likelihood (R.A. Fisher 1920s, Edwards 1972): "the probability of the data given a hypothesis" (can be viewed as a simplified form of Bayesian probability).

2. Likelihood:
• The likelihood of a hypothesis (H) after doing an experiment or gathering data (D) is the probability of the data given the hypothesis: L(H | D) = P(D | H).
• Probabilities add to 1 for each hypothesis (by definition), but do not add to 1 across different hypotheses — hence "likelihood".

The Law of Likelihood:
• "Within the framework of a statistical model, a particular set of data supports one statistical hypothesis better than another if the likelihood of the first hypothesis on the data exceeds the likelihood of the second hypothesis."
• Likelihood ratio = P(D | H1) / P(D | H2)

Example: binomial distribution:
• Suppose we are interested in estimating the frequency of an allele of a gene in a sample:
  A: 2 (i), a: 8 (n − i), total alleles: 10 (n).
• This is a problem that is well suited to the binomial distribution:
  P(D | Hj) = C(n, i) p^i (1 − p)^(n−i) = [n! / (i! (n − i)!)] p^i (1 − p)^(n−i)

A common frequentist approach:
• Sample mean: p* = 2/10 = 0.2
• Sample variance of the mean: sp² = p*q*/n = 0.2 × 0.8 / 10 = 0.016
• Standard error of the mean: sp = √0.016 = 0.126
• 95% confidence limits of the mean = p* ± t(9, 0.05) sp = 0.2 ± 2.262 × 0.126 = (−0.085, +0.485)
• Note the NEGATIVE lower limit!

Support:
• Support is defined as the natural logarithm of the likelihood ratio:
  Support = loge [P(D | H1) / P(D | H2)] = loge P(D | H1) − loge P(D | H2)

Likelihood & the binomial:
• To get the support for two hypotheses, we need to calculate:
  Support = loge [P(D | H1) / P(D | H2)]
• Note! The binomial coefficient depends only on the data (D), not on the hypothesis (H):
  P(D | Hj) = C(n, i) p^i (1 − p)^(n−i)
• The binomial coefficient cancels in the ratio! No need to calculate the tedious constant; we just need the p^i (1 − p)^(n−i) terms.

Likelihood plot:
[Figure: the binomial likelihood kernel p^i (1 − p)^(n−i) plotted against p, for n = 10 (i = 2) and n = 40.]

Likelihood approach using a spreadsheet (binomial probability; sample size n = 10, "successes" i = 2):

  Hj = p   p^i (1−p)^(n−i)   ln likelihood   ln likelihood ratio
  0        0                 #NUM! (impossible)   #NUM! (→ −∞)
  0.001    1.002E-06         −13.81351       −8.36546
  0.01     9.22745E-05       −9.290743       −4.19635
  0.05     0.001658551       −6.401811       −1.39779
  0.1      0.004304672       −5.448054       −0.44403
  0.15     0.006131037       −5.094391       −0.09037
  0.2      0.006710886       −5.004024       0  (* = maximum, at p = i/n = 0.2)
  0.25     0.006257057       −5.074045       −0.07002
  0.3      0.005188321       −5.261345       −0.25732
  0.35     0.003903399       −5.545908       −0.54188
  0.4      0.002687386       −5.919186       −0.91516
  0.45     0.001695612       −6.379711       −1.37569
  0.5      0.000976563       −6.931472       −1.92745
  0.55     0.000508658       −7.583736       −2.57971
  0.6      0.00023593        −8.351977       −3.34795
  0.65     9.51417E-05       −9.260143       −4.25612
  0.7      3.21489E-05       −10.34513       −5.34111

Log likelihood plot:
[Figure: ln likelihood ratio plotted against binomial p, for n = 10 and n = 40.]
• The support curve gives a continuous measure of relative support for the hypotheses.
• Edwards: the points 2 units below the maximum can be viewed as "support limits" (equivalent to approximately 2 standard deviations in the frequentist approach).
• loge LR = 2 implies LR = e², i.e. the best-supported hypothesis is about 7.4× as well supported.
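The spreadsheet above can be reproduced in a few lines of Python (a sketch added here, not part of the original handout): because the binomial coefficient cancels, only the p^i (1 − p)^(n−i) kernel is needed, and the 2-unit support limits fall straight out of the support curve:

```python
# Sketch: support curve for the binomial example and Edwards' 2-unit limits.
import numpy as np

i, n = 2, 10
p = np.linspace(0.001, 0.999, 999)                  # grid of hypothesised p
loglik = i * np.log(p) + (n - i) * np.log(1 - p)    # constant term omitted
support = loglik - loglik.max()                     # ln likelihood ratio vs best p

inside = p[support >= -2.0]                         # 2-unit support region
print(f"maximum support at p = {p[support.argmax()]:.3f}")        # ~0.2
print(f"2-unit support limits: ({inside.min():.3f}, {inside.max():.3f})")
# unlike the frequentist interval, the lower limit cannot go negative
```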

3. Bayes' Theorem:
• P(A | B) = P(B | A) P(A) / P(B)
• Named after its inventor, Thomas Bayes, in 18th-century England.
• Led by Bayes and Laplace, the theorem and "Bayesian probability" have come to be used as a system of inference:
  P(H | D) = k · P(D | H) · P(H)
  Posterior probability = Likelihood × Prior probability

Bayes' Theorem as a means of inference:
• P(H1 | D) / P(H2 | D) = [k P(D | H1) P(H1)] / [k P(D | H2) P(H2)]
• If the prior is "uniform", P(H1) = P(H2), then
  P(H1 | D) / P(H2 | D) = P(D | H1) / P(D | H2)
• The ratio of posterior probabilities collapses to a likelihood ratio!

Sum of support from different experiments:
[Figure: log likelihood ratio (rescaled) against binomial p for two samples, 20 A + 20 a and 2 A + 10 a.]
• Support provides a way to adjudicate between data from different experiments.
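A small sketch (with the sample sizes read off the figure, i.e. 2 A + 10 a and 20 A + 20 a, and two arbitrarily chosen point hypotheses) of the two ideas above: supports from independent experiments simply add, and with a uniform prior the posterior odds reduce to the likelihood ratio:

```python
# Sketch: adding support across experiments; posterior odds under a flat prior.
import numpy as np

def support(p, i, n):
    """ln likelihood for i successes in n binomial trials, constant dropped."""
    return i * np.log(p) + (n - i) * np.log(1 - p)

p = np.linspace(0.01, 0.99, 99)
s_total = support(p, 2, 12) + support(p, 20, 40)    # 2A,10a plus 20A,20a
print(f"combined best-supported p = {p[np.argmax(s_total)]:.2f}")  # ~22/52 = 0.42

# posterior odds for two point hypotheses with a uniform prior, P(H1) = P(H2):
H1, H2 = 0.2, 0.5                                    # arbitrary example values
odds = np.exp(support(H1, 2, 12) - support(H2, 2, 12))
print(f"posterior odds H1:H2 = likelihood ratio = {odds:.1f}")
```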

Common ground:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

Opposing philosophies:
• It is important to realize there isn't just one way of doing statistics. For me:
• Edwards' argument for likelihood as the means of inference seems powerful: the probability of the data given the hypothesis is a good measure.
• Bayesian difficulties: the "probability of a hypothesis" without data (the prior).
• Frequentist difficulties: P-values, probabilities based on events that haven't happened.

In practice:
• In practice, in most applications, all three approaches tend to support similar hypotheses.
• Edwards shows that significance tests are justifiable by appealing to likelihood ratios — the tail probability is low when the likelihood ratio (itself often proportional to the relative Bayesian probability) is high.
• In very complex estimation problems (e.g. GLMs), where we test for "significance" of extra parameters, we use the chi-square approximation: 2 loge LR = "deviance" ≈ χ²ν. This interpretation employs a frequentist approach.
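To make the deviance approximation concrete, here is a brief sketch (added to the handout) that reuses the earlier binomial example rather than a real GLM: the maximum-likelihood estimate p = 0.2 is compared with a null value p = 0.5, and 2 loge LR is referred to a χ² distribution with one degree of freedom:

```python
# Sketch: likelihood-ratio test via the chi-square approximation to the deviance.
from math import log
from scipy import stats

i, n = 2, 10
def lnlik(p):                       # binomial log-likelihood, constant dropped
    return i * log(p) + (n - i) * log(1 - p)

p_mle, p_null = i / n, 0.5          # ML estimate vs an assumed null value
deviance = 2 * (lnlik(p_mle) - lnlik(p_null))    # 2 log_e LR, about 3.85
p_value = stats.chi2.sf(deviance, df=1)          # one extra free parameter
print(f"deviance = {deviance:.2f}, approximate P = {p_value:.3f}")  # just under 0.05
```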

Relationship of the likelihood ratio to frequentism:
• In large samples, G = 2 {loge P(D | H1) − loge P(D | H2)} converges to a χ² distribution, with the number of degrees of freedom given by the number of free parameters.
• For a test of a null hypothesis H0 vs. the maximum-likelihood hypothesis H1, P can be calculated from the integral of the χ² probability density function.
• Also, note that with a support value (ΔlnL) of 2.0, G = 4.0 ≈ 1.96² = 3.84, i.e. the value of χ² which is "significant" at P = 0.05 with 1 degree of freedom.

Utility of likelihood:
• Estimation and hypothesis testing of complex problems today almost always use likelihood or Bayesian methods, often with numerical optimization or MCMC, for example:
  – Generalized linear models, deviance
  – Phylogeny estimation, molecular clock estimation
  – Linkage mapping, QTL analysis in human genetics
  – High-energy physics experiments

Conclusion:
• At the very least, these methods enable more complex problems to be analysed. At best, they may provide an improved philosophical basis for inference.

Excel exercise with "Solver":
• Go to www.ucl.ac.uk/~ucbhdjm/bin/
• Open the ABO_Student.xls file
• Follow the instructions
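For readers without Excel, here is a rough sketch of the kind of maximum-likelihood calculation the ABO exercise involves, with scipy's optimizer standing in for Solver. The phenotype counts below are invented purely for illustration, and the phenotype probabilities (p² + 2pr, q² + 2qr, 2pq, r² for A, B, AB, O) are the standard ABO genetic model; the actual exercise is defined by ABO_Student.xls:

```python
# Sketch: ML estimation of ABO allele frequencies p (A), q (B), r (O).
import numpy as np
from scipy.optimize import minimize

counts = {"A": 44, "B": 27, "AB": 4, "O": 88}    # hypothetical phenotype counts

def neg_log_lik(x):
    p, q = x
    r = 1.0 - p - q                              # allele frequencies sum to 1
    if p <= 0 or q <= 0 or r <= 0:
        return np.inf                            # outside the valid region
    probs = {"A": p*p + 2*p*r, "B": q*q + 2*q*r, # standard ABO phenotype
             "AB": 2*p*q,      "O": r*r}         # probabilities
    return -sum(c * np.log(probs[ph]) for ph, c in counts.items())

fit = minimize(neg_log_lik, x0=[0.2, 0.1], method="Nelder-Mead")
p, q = fit.x
print(f"ML estimates: p = {p:.3f}, q = {q:.3f}, r = {1 - p - q:.3f}")
```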
