Parameter Estimation

Parameter Estimation Peter N Robinson Estimating Parameters from Data Parameter Estimation Maximum ML vs. MAP Likelihood (ML) Estimation Beta distribution Peter N Robinson Maximum a posteriori (MAP) Estimation December 14, 2012 MAQ Estimating Parameters from Data Parameter Estimation Peter N Robinson Estimating Parameters In many situations in bioinformatics, we want to estimate \op- from Data timal" parameters from data. In the examples we have seen in Maximum Likelihood the lectures on variant calling, these parameters might be the (ML) Estimation error rate for reads, the proportion of a certain genotype, the Beta proportion of nonreference bases etc. However, the hello world distribution example for this sort of thing is the coin toss, so we will start Maximum a posteriori with that. (MAP) Estimation MAQ Coin toss Parameter Estimation Peter N Robinson Estimating Parameters Let's say we have two coins that are each tossed 10 times from Data Maximum Coin 1: H,T,T,H,H,H,T,H,T,T Likelihood (ML) Coin 2: T,T,T,H,T,T,T,H,T,T Estimation Beta Intuitively, we might guess that coin one is a fair coin, i.e., distribution P(X = H) = 0:5, and that coin 2 is biased, i.e., Maximum a posteriori P(X = H) 6= 0:5 (MAP) Estimation MAQ Discrete Random Variable Parameter Estimation Peter N Let us begin to formalize this. We model the coin toss process Robinson as follows. Estimating Parameters The outcome of a single coin toss is a random variable X from Data that can take on values in a set X = fx1; x2;:::; xng Maximum Likelihood (ML) In our example, of course, n = 2, and the values are Estimation x1 = 0 (tails) and x2 = 1 (heads) Beta distribution We then have a probability mass function p : X −! [0; 1]; P Maximum a the law of total probability states that p(xi ) = 1 posteriori x2X (MAP) Estimation This is a Bernoulli distribution with parameter µ: MAQ p(X = 1; µ) = µ (1) Probability of sequence of events Parameter Estimation In general, for a sequence of two events X1 and X2, the joint Peter N probability is Robinson P(X1; X2) = p(X2jX1)p(X1) (2) Estimating Parameters Since we assume that the sequence is iid (identically and from Data independently distributed), by definition p(X2jX1) = P(X2). Maximum Likelihood Thus, for a sequence of n events (coin tosses), we have (ML) Estimation n Beta Y distribution p(x1; x2;:::; xn; µ) = p(xi ; µ) (3) Maximum a i=1 posteriori (MAP) Estimation if the probability of heads is 30%, the the probability of the MAQ sequence for coin 2 can be calculated as 3 2 7 8 p(T ; T ; T ; H; T ; T ; T ; H; T ; T ; µ) = µ2(1 − µ)8 = (4) 10 10 Probability of sequence of events Parameter Estimation Peter N Robinson Estimating Parameters Thus far, we have considered p(x; µ) as a function of x, from Data parametrized by µ. If we view p(x; µ) as a function of µ, then Maximum Likelihood it is called the likelihood function. (ML) Estimation Beta distribution Maximum a Maximum likelihood estimation basically chooses a value of µ posteriori (MAP) that maximizes the likelihood function given the observed data. Estimation MAQ Maximum likelihood for Bernoulli Parameter Estimation The likelihood for a sequence of i.i.d. Bernoulli random Peter N Robinson variables X = [x1; x2;:::; xn] with xi 2 f0; 1g is then Estimating n n Parameters Y Y xi 1−xi from Data p(X; µ) = p(xi ; µ) = µ (1 − µ) (5) Maximum i=1 i=1 Likelihood (ML) Estimation We usually maximize the log likelihood function rather than the Beta distribution original function Maximum a posteriori Often easier to take the derivative (MAP) Estimation the log function is monotonically increasing, thus, the MAQ maximum (argmax) is the same Avoid numerical problems involved with multiplying lots of small numbers Log likelihood Parameter Thus, instead of maximizing this Estimation n Peter N Y Robinson p(X; µ) = µxi (1 − µ)1−xi (6) Estimating i=1 Parameters from Data we maximize this n Maximum Likelihood Y xi 1−xi (ML) log p(X; µ) = log µ (1 − µ) Estimation i=1 Beta n distribution X = log µxi (1 − µ)1−xi Maximum a posteriori i=1 (MAP) n Estimation X xi 1−xi MAQ = log µ + log(1 − µ) i=1 n X = [xi log µ + (1 − xi ) log(1 − µ)] i=1 Log likelihood Parameter Estimation Peter N Robinson Note that one often denotes the log likelihood function with Estimating the symbol L = log p(X; µ). Parameters from Data A function f defined on a subset of the real numbers with real Maximum Likelihood values is called monotonic (also monotonically increasing, in- (ML) creasing or non-decreasing), if for all x and y such that x ≤ y Estimation Beta one has f (x) ≤ f (y) distribution Maximum a Thus, the monotonicity of the log function guarantees that posteriori (MAP) Estimation argmax p(X; µ) = argmax log p(X; µ) (7) MAQ µ µ ML estimate Parameter Estimation Peter N Robinson Estimating Parameters The ML estimate of the parameter µ is then from Data Maximum n Likelihood X (ML) argmax [xi log µ + (1 − xi ) log(1 − µ)] (8) Estimation µ i=1 Beta distribution We can calculate the argmax by setting the first derivative Maximum a posteriori equal to zero and solving for µ (MAP) Estimation MAQ ML estimate Parameter Estimation Peter N Robinson Thus Estimating Parameters n from Data @ X @ log p(X; µ) = [xi log µ + (1 − xi ) log(1 − µ)] Maximum @µ @µ Likelihood i=1 (ML) n n Estimation X @ X @ = x log µ + (1 − x ) log(1 − µ) Beta i @µ i @µ distribution i=1 i=1 Maximum a n n posteriori 1 X 1 X (MAP) = xi − (1 − xi ) Estimation µ 1 − µ i=1 i=1 MAQ ML estimate @ Parameter and finally, to find the maximum we set log p(X; µ) = 0: Estimation @µ Peter N n n Robinson 1 X 1 X 0 = xi − (1 − xi ) Estimating µ 1 − µ Parameters i=1 i=1 from Data Pn 1 − µ (1 − xi ) Maximum = i=1 Likelihood µ Pn x (ML) i=1 i Estimation Pn 1 i=1 1 Beta − 1 = Pn − 1 distribution µ i=1 xi Maximum a 1 n posteriori = n (MAP) µ P x Estimation i=1 i n MAQ 1 X µ^ = x ML n i i=1 Reassuringly, the maximum likelihood estimate is just the proportion of flips that came out heads. Problems with ML estimation Parameter Estimation Peter N Robinson Estimating Parameters Does it really make sense that from Data Maximum H,T,H,T ! µ^ = 0:5 Likelihood (ML) H,T,T,T ! µ^ = 0:25 Estimation Beta T,T,T,T! µ^ = 0:0 distribution ML estimation does not incorporate any prior knowledge and Maximum a posteriori does not generate an estimate of the certainty of its results. (MAP) Estimation MAQ Maximum a posteriori Estimation Parameter Estimation Peter N Bayesian approaches try to reflect our belief about µ. In this Robinson case, we will consider µ to be a random variable. Estimating Parameters from Data p(Xjµ)p(µ) Maximum p(µjX) = (9) Likelihood p(X) (ML) Estimation Thus, Bayes' law converts our prior belief about the parameter Beta distribution µ (before seeing data) into a posterior probability, p(µjX), by Maximum a using the likelihood function p(Xjµ). The maximum posteriori (MAP) a-posteriori (MAP) estimate is defined as Estimation MAQ µ^MAP = argmax p(µjX) (10) µ Maximum a posteriori Estimation Parameter Estimation Peter N Robinson Note that because p(X) does not depend on µ, we have Estimating Parameters from Data Maximum µ^MAP = argmax p(µjX) Likelihood µ (ML) Estimation p(Xjµ)p(µ) = argmax Beta µ p(X) distribution Maximum a = argmax p(Xjµ)p(µ) posteriori µ (MAP) Estimation This is essentially the basic idea of the MAP equation used by MAQ SNVMix for variant calling MAP Estimation; What does it buy us? Parameter Estimation To take a simple example of a situation in which MAP estima- Peter N tion might produce better results than ML estimation, let us Robinson consider a statistician who wants to predict the outcome of the Estimating Parameters next election in the USA. from Data Maximum The statistician is able to gather data on party preferences Likelihood 1 (ML) by asking people he meets at the Wall Street Golf Club Estimation which party they plan on voting for in the next election Beta distribution The statistician asks 100 people, seven of whom answer Maximum a posteriori \Democrats". This can be modeled as a series of (MAP) Estimation Bernoullis, just like the coin tosses. MAQ In this case, the maximum likelihood estimate of the proportion of voters in the USA who will vote democratic isµ ^ML = 0:07. 1i.e., a notorious haven of ultraconservative Republicans MAP Estimation; What does it buy us? Parameter Estimation Peter N Robinson Estimating Somehow, the estimate ofµ ^ML = 0:07 doesn't seem quite right Parameters given our previous experience that about half of the electorate from Data Maximum votes democratic, and half votes republican. But how should Likelihood (ML) the statistician incorporate this prior knowledge into his Estimation prediction for the next election? Beta distribution Maximum a posteriori (MAP) The MAP estimation procedure allows us to inject our prior Estimation beliefs about parameter values into the new estimate. MAQ Beta distribution: Background Parameter Estimation The Beta distribution is appropriate to express prior belief about Peter N a Bernoulli distribution.

Parameter Estimation

A Matrix-Valued Bernoulli Distribution Gianfranco Lovison1 Dipartimento Di Scienze Statistiche E Matematiche “S

Discrete Probability Distributions Uniform Distribution Bernoulli

STATS8: Introduction to Biostatistics 24Pt Random Variables And

Chapter 5 Sections

Lecture 6: Special Probability Distributions

11. Parameter Estimation

Chapter 3. Discrete Random Variables and Probability Distributions

Some Discrete Distributions.Pdf

Chapter 2: Random Variables Example 1

Discrete Distributions: Empirical, Bernoulli, Binomial, Poisson

Hand-Book on STATISTICAL DISTRIBUTIONS for Experimentalists

Discrete Probability Distributions