
Likelihood and Bayesian Inference in Scientific Statistics
UCL Graduate School: Graduate Skills Course

Your hosts for today:
James Mallet, Professor of Biological Diversity, http://abacus.gene.ucl.ac.uk/jim/
Ziheng Yang, Professor of Statistical Genetics, http://abacus.gene.ucl.ac.uk/
(Department of Biology, UCL)

What we will cover:
• The basis of inference in statistics.
• A comparison of the "frequentist" approach we usually learn with today's widely used alternatives in statistics.
  – Particularly the use of "likelihood" and Bayesian methods.
• We will hopefully empower you to develop your own analyses, using simple examples.

What we will not cover:
• Not suitable for people already well-versed in statistics. They'll already know most of this!
• Not suitable for people who've no idea about statistics. At least GCSE knowledge required.
• We won't have time to teach you all you need to know to analyse your data.
• We won't have time to go into very complicated examples.

Instead, we hope:
• You begin to develop a healthy disrespect for most "off-the-shelf" methods. (But you will probably still use them.)
• You start to form your own ideas of how statistics and scientific inference are related (a philosophy-of-science topic).
• Your interest in likelihood and Bayesian analysis is piqued, and you might be motivated to do further reading.
• You become empowered to perform simple statistical analyses, using Excel and Excel's Solver "add-in"; with a little programming you can analyse much more difficult problems.

My main source:
Anthony W. F. Edwards (1972; reprinted 1992). Likelihood. Cambridge UP.
See also, more in-depth: Yudi Pawitan (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford UP.

Overview:
• What is scientific inference?
• Three philosophies of statistical inference:
  – Frequentist (probability in the long run)
  – Likelihood (likelihood measures strength)
  – Bayesian (posterior probability)
• Common ground: opposing philosophies agree (approximately) on many problems
• Discussion
• Exercises, example of ABO blood groups
• Ziheng's talk: when philosophies conflict ...

Scientific inference:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

The nature of scientific inference:
• "I'm sure this is true" ... "I'm pretty sure" ... "I'm not sure" ... "It is likely that ..." ... "This seems most probable to me"
• All of our inference about the world is likely to be based on probability; it's statistical. (Except divine revelation!)

Models and hypotheses:
• Science is about trying to find "predictability" or "regularities" in nature, which we can use.
• For some reason, this usually seems to work ...
• Models and hypotheses allow prediction. We test them by analysing something about their "likelihood" or "probability".

Models and hypotheses in statistical inference:
• Models are assumed to be true for the purposes of the particular test or problem, e.g. we assume height in humans to be normally distributed.
• Hypotheses are "parameters" that are the focus of interest in estimation, e.g. the mean and variance of height in humans.
• For example, milk fat ... [Figure: example frequency distribution of milk-fat measurements, from Sokal & Rohlf 1981, Biometry, p. 47]

Data is typically discrete:
• Counts of things; measurements to the nearest mm, or 0.1 °C.
• Data is also finite.
• Models and hypotheses can be discrete too, or continuous; they may be finite or infinite in scope.
• A good method of inference should take this discreteness of data into account when we analyse the data. Many analyses, particularly frequentist ones, don't!

Null hypotheses in statistics:
• We are often taught in biology a simplistic kind of "Popperian" approach to science, to falsify simple hypotheses. We then try to test the null hypothesis!
• (Zero-dimensional statistics, if you like; only one hypothesis can be excluded.)
• In this view, estimation (e.g. of a mean or variance) is like natural history, not good science. Physics-envy?

Estimation is primary:
• Edwards argues that we should turn this argument on its head.
• Estimation of a distribution or model can lead to testing of an infinitude of hypotheses, including the null hypothesis.
• It uses the full dimensionality of the problem: 1- to n-dimensional statistical analyses. More powerful!

The three philosophies:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

1. Frequentist significance testing and P-values:
• Perfected in the 1920s (Pearson, Fisher et al.); e.g. the χ² test, or the t-test.
• Often tends to assume the data come from a continuous distribution; e.g. χ² tests on counts, Σ(O−E)²/E.
• Encourages testing of the null hypothesis.

Philosophical problems with the frequentist approach:
• We only have one set of data, yet the approach imagines the experiment done a very large number of times.
• "What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred" — Jeffreys 1961.

P-values:
• P-values are "tail probabilities".
• e.g. χ² = 5.28, d.f. = 1; or t = 3.92, d.f. = 10. We find P < 0.05, or P = 0.009834.
• This is the "tail probability", or "probability in the long run", of getting results at least as extreme as the data under the null hypothesis.
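A quick illustration (added here, not part of the original slides) of how such tail probabilities are obtained in practice; Python with scipy is assumed, and the test statistics are simply the example values quoted above:

```python
# Sketch: tail probabilities ("P-values") for the example statistics above.
# Assumes scipy is installed; the numbers are the slide's illustrative values.
from scipy import stats

p_chi2 = stats.chi2.sf(5.28, df=1)       # P(chi-square >= 5.28) with 1 d.f.
p_t = 2 * stats.t.sf(3.92, df=10)        # two-tailed P for t = 3.92, 10 d.f.

print(f"chi-square tail probability: {p_chi2:.4f}")   # about 0.02, i.e. P < 0.05
print(f"two-tailed t probability:    {p_t:.4f}")      # well below 0.05
```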

Alternatives to frequentism:
• Frequentism: "probability in the long run".
• Two alternative measures of support:
  – Bayesian probability (Thomas Bayes 1763, Marquis de Laplace 1820): "the probability of a hypothesis given the data".
  – Likelihood (R.A. Fisher 1920s, Edwards 1972): "the probability of the data given a hypothesis" (can be viewed as a simplified form of Bayesian probability).

2. Likelihood:
• The likelihood of a hypothesis (H) after doing an experiment or gathering data (D) is the probability of the data given the hypothesis: L(H | D) = P(D | H).
• Probabilities add to 1 for each hypothesis (by definition), but do not add to 1 across different hypotheses — hence "likelihood".

The Law of Likelihood:
• "Within the framework of a statistical model, a particular set of data supports one statistical hypothesis better than another if the likelihood of the first hypothesis on the data exceeds the likelihood of the second hypothesis."
• Likelihood ratio = P(D | H1) / P(D | H2)

Example: binomial distribution:
• Suppose we are interested in estimating the frequency of an allele of a gene in a sample:
  A: 2 (i), a: 8 (n − i), total alleles: 10 (n).
• This is a problem that is well suited to the binomial distribution:
  P(D | Hj) = C(n, i) p^i (1 − p)^(n−i) = [n! / (i! (n − i)!)] p^i (1 − p)^(n−i)

A common frequentist approach:
• Sample mean: p* = 2/10 = 0.2
• Sample variance of the mean: sp² = p*q*/n = 0.2 × 0.8 / 10 = 0.016
• Standard error of the mean: sp = √0.016 = 0.126
• 95% confidence limits of the mean = p* ± t(9, 0.05) sp = 0.2 ± 2.262 × 0.126 = (−0.085, +0.485)
• Note the NEGATIVE lower limit!

Support:
• Support is defined as the natural logarithm of the likelihood ratio:
  Support = loge [P(D | H1) / P(D | H2)] = loge P(D | H1) − loge P(D | H2)

Likelihood & the binomial:
• To get the support for two hypotheses, we need to calculate:
  Support = loge [P(D | H1) / P(D | H2)]
• Note! The binomial coefficient depends only on the data (D), not on the hypothesis (H):
  P(D | Hj) = C(n, i) p^i (1 − p)^(n−i)
• The binomial coefficient cancels in the ratio! No need to calculate the tedious constant; we just need the p^i (1 − p)^(n−i) terms.

Likelihood plot:
[Figure: the binomial likelihood kernel p^i (1 − p)^(n−i) plotted against p, for n = 10 (i = 2) and n = 40.]

Likelihood approach using a spreadsheet (binomial probability; sample size n = 10, "successes" i = 2):

  Hj = p   p^i (1−p)^(n−i)   ln likelihood   ln likelihood ratio
  0        0                 #NUM! (impossible)   #NUM! (→ −∞)
  0.001    1.002E-06         −13.81351       −8.36546
  0.01     9.22745E-05       −9.290743       −4.19635
  0.05     0.001658551       −6.401811       −1.39779
  0.1      0.004304672       −5.448054       −0.44403
  0.15     0.006131037       −5.094391       −0.09037
  0.2      0.006710886       −5.004024       0  (* = maximum, at p = i/n = 0.2)
  0.25     0.006257057       −5.074045       −0.07002
  0.3      0.005188321       −5.261345       −0.25732
  0.35     0.003903399       −5.545908       −0.54188
  0.4      0.002687386       −5.919186       −0.91516
  0.45     0.001695612       −6.379711       −1.37569
  0.5      0.000976563       −6.931472       −1.92745
  0.55     0.000508658       −7.583736       −2.57971
  0.6      0.00023593        −8.351977       −3.34795
  0.65     9.51417E-05       −9.260143       −4.25612
  0.7      3.21489E-05       −10.34513       −5.34111

Log likelihood plot:
[Figure: ln likelihood ratio plotted against binomial p, for n = 10 and n = 40.]
• The support curve gives a continuous measure of relative support for the hypotheses.
• Edwards: the points 2 units below the maximum can be viewed as "support limits" (equivalent to approximately 2 standard deviations in the frequentist approach).
• loge LR = 2 implies LR = e², i.e. the best-supported hypothesis is about 7.4× as well supported.
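The spreadsheet above can be reproduced in a few lines of Python (a sketch added here, not part of the original handout): because the binomial coefficient cancels, only the p^i (1 − p)^(n−i) kernel is needed, and the 2-unit support limits fall straight out of the support curve:

```python
# Sketch: support curve for the binomial example and Edwards' 2-unit limits.
import numpy as np

i, n = 2, 10
p = np.linspace(0.001, 0.999, 999)                  # grid of hypothesised p
loglik = i * np.log(p) + (n - i) * np.log(1 - p)    # constant term omitted
support = loglik - loglik.max()                     # ln likelihood ratio vs best p

inside = p[support >= -2.0]                         # 2-unit support region
print(f"maximum support at p = {p[support.argmax()]:.3f}")        # ~0.2
print(f"2-unit support limits: ({inside.min():.3f}, {inside.max():.3f})")
# unlike the frequentist interval, the lower limit cannot go negative
```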

3. Bayes' Theorem:
• P(A | B) = P(B | A) P(A) / P(B)
• Named after its inventor, Thomas Bayes, in 18th-century England.
• Led by Bayes and Laplace, the theorem and "Bayesian probability" have come to be used as a system of inference:
  P(H | D) = k · P(D | H) · P(H)
  Posterior probability = Likelihood × Prior probability

Bayes' Theorem as a means of inference:
• P(H1 | D) / P(H2 | D) = [k P(D | H1) P(H1)] / [k P(D | H2) P(H2)]
• If the prior is "uniform", P(H1) = P(H2), then
  P(H1 | D) / P(H2 | D) = P(D | H1) / P(D | H2)
• The ratio of posterior probabilities collapses to a likelihood ratio!

Sum of support from different experiments:
[Figure: log likelihood ratio (rescaled) against binomial p for two samples, 20 A + 20 a and 2 A + 10 a.]
• Support provides a way to adjudicate between data from different experiments.
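A small sketch (with the sample sizes read off the figure, i.e. 2 A + 10 a and 20 A + 20 a, and two arbitrarily chosen point hypotheses) of the two ideas above: supports from independent experiments simply add, and with a uniform prior the posterior odds reduce to the likelihood ratio:

```python
# Sketch: adding support across experiments; posterior odds under a flat prior.
import numpy as np

def support(p, i, n):
    """ln likelihood for i successes in n binomial trials, constant dropped."""
    return i * np.log(p) + (n - i) * np.log(1 - p)

p = np.linspace(0.01, 0.99, 99)
s_total = support(p, 2, 12) + support(p, 20, 40)    # 2A,10a plus 20A,20a
print(f"combined best-supported p = {p[np.argmax(s_total)]:.2f}")  # ~22/52 = 0.42

# posterior odds for two point hypotheses with a uniform prior, P(H1) = P(H2):
H1, H2 = 0.2, 0.5                                    # arbitrary example values
odds = np.exp(support(H1, 2, 12) - support(H2, 2, 12))
print(f"posterior odds H1:H2 = likelihood ratio = {odds:.1f}")
```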

Common ground:
• What is scientific inference?
• Three philosophies of statistical inference: frequentist (probability in the long run); likelihood (likelihood measures strength); Bayesian (posterior probability).
• Common ground: opposing philosophies agree (approximately) in many problems.

Opposing philosophies:
• It is important to realize there isn't just one way of doing statistics. For me:
• Edwards' argument for likelihood as the means of inference seems powerful: the probability of the data given the hypothesis is a good measure.
• Bayesian difficulties: the "probability of a hypothesis" without data (the prior).
• Frequentist difficulties: P-values, probabilities based on events that haven't happened.

In practice:
• In practice, in most applications, all three approaches tend to support similar hypotheses.
• Edwards shows that significance tests are justifiable by appealing to likelihood ratios — the tail probability is low when the likelihood ratio (itself often proportional to the relative Bayesian probability) is high.
• In very complex estimation problems (e.g. GLMs), where we test for "significance" of extra parameters, we use the chi-square approximation: 2 loge LR = "deviance" ≈ χ²ν. This interpretation employs a frequentist approach.
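To make the deviance approximation concrete, here is a brief sketch (added to the handout) that reuses the earlier binomial example rather than a real GLM: the maximum-likelihood estimate p = 0.2 is compared with a null value p = 0.5, and 2 loge LR is referred to a χ² distribution with one degree of freedom:

```python
# Sketch: likelihood-ratio test via the chi-square approximation to the deviance.
from math import log
from scipy import stats

i, n = 2, 10
def lnlik(p):                       # binomial log-likelihood, constant dropped
    return i * log(p) + (n - i) * log(1 - p)

p_mle, p_null = i / n, 0.5          # ML estimate vs an assumed null value
deviance = 2 * (lnlik(p_mle) - lnlik(p_null))    # 2 log_e LR, about 3.85
p_value = stats.chi2.sf(deviance, df=1)          # one extra free parameter
print(f"deviance = {deviance:.2f}, approximate P = {p_value:.3f}")  # just under 0.05
```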

Relationship of the likelihood ratio to frequentism:
• In large samples, G = 2 {loge P(D | H1) − loge P(D | H2)} converges to a χ² distribution, with the number of degrees of freedom given by the number of free parameters.
• For a test of a null hypothesis H0 vs. the maximum-likelihood hypothesis H1, P can be calculated from the integral of the χ² probability density function.
• Also, note that with a support value (ΔlnL) of 2.0, G = 4.0 ≈ 1.96² = 3.84, i.e. the value of χ² which is "significant" at P = 0.05 with 1 degree of freedom.

Utility of likelihood:
• Estimation and hypothesis testing of complex problems today almost always use likelihood or Bayesian methods, often with numerical optimization or MCMC, for example:
  – Generalized linear models, deviance
  – Phylogeny estimation, molecular clock estimation
  – Linkage mapping, QTL analysis in human genetics
  – High-energy physics experiments

Conclusion:
• At the very least, these methods enable more complex problems to be analysed. At best, they may provide an improved philosophical basis for inference.

Excel exercise with "Solver":
• Go to www.ucl.ac.uk/~ucbhdjm/bin/
• Open the ABO_Student.xls file
• Follow the instructions
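For readers without Excel, here is a rough sketch of the kind of maximum-likelihood calculation the ABO exercise involves, with scipy's optimizer standing in for Solver. The phenotype counts below are invented purely for illustration, and the phenotype probabilities (p² + 2pr, q² + 2qr, 2pq, r² for A, B, AB, O) are the standard ABO genetic model; the actual exercise is defined by ABO_Student.xls:

```python
# Sketch: ML estimation of ABO allele frequencies p (A), q (B), r (O).
import numpy as np
from scipy.optimize import minimize

counts = {"A": 44, "B": 27, "AB": 4, "O": 88}    # hypothetical phenotype counts

def neg_log_lik(x):
    p, q = x
    r = 1.0 - p - q                              # allele frequencies sum to 1
    if p <= 0 or q <= 0 or r <= 0:
        return np.inf                            # outside the valid region
    probs = {"A": p*p + 2*p*r, "B": q*q + 2*q*r, # standard ABO phenotype
             "AB": 2*p*q,      "O": r*r}         # probabilities
    return -sum(c * np.log(probs[ph]) for ph, c in counts.items())

fit = minimize(neg_log_lik, x0=[0.2, 0.1], method="Nelder-Mead")
p, q = fit.x
print(f"ML estimates: p = {p:.3f}, q = {q:.3f}, r = {1 - p - q:.3f}")
```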
