Parameter Estimation
ML vs. MAP

Peter N Robinson

December 14, 2012

Outline: Estimating Parameters from Data; Maximum Likelihood (ML) Estimation; Beta distribution; Maximum a posteriori (MAP) Estimation; MAQ

Estimating Parameters from Data

In many situations in bioinformatics, we want to estimate "optimal" parameters from data. In the examples we have seen in the lectures on variant calling, these parameters might be the error rate for reads, the proportion of a certain genotype, the proportion of non-reference bases, etc. However, the "hello world" example for this sort of thing is the coin toss, so we will start with that.

Coin toss

Let's say we have two coins that are each tossed 10 times:

    Coin 1: H,T,T,H,H,H,T,H,T,T
    Coin 2: T,T,T,H,T,T,T,H,T,T

Intuitively, we might guess that coin 1 is a fair coin, i.e., P(X = H) = 0.5, and that coin 2 is biased, i.e., P(X = H) ≠ 0.5.

Discrete random variables

Let us begin to formalize this. We model the coin toss process as follows.

The outcome of a single coin toss is a random variable X that can take on values in a set X = {x1, x2, ..., xn}. In our example, of course, n = 2, and the values are x1 = 0 (tails) and x2 = 1 (heads). We then have a probability mass function p : X → [0, 1] that satisfies ∑_{x ∈ X} p(x) = 1.

This is a Bernoulli distribution with parameter µ:

    p(X = 1; µ) = µ                                                   (1)

Probability of sequence of events

In general, for a sequence of two events X1 and X2, the joint probability is

    P(X1, X2) = p(X2|X1) p(X1)                                        (2)

Since we assume that the sequence is i.i.d. (independent and identically distributed), by definition p(X2|X1) = p(X2). Thus, for a sequence of n events (coin tosses), we have

    p(x1, x2, ..., xn; µ) = ∏_{i=1}^{n} p(xi; µ)                      (3)

If the probability of heads is 30%, the probability of the sequence for coin 2 can be calculated as

    p(T,T,T,H,T,T,T,H,T,T; µ) = µ^2 (1 − µ)^8 = (3/10)^2 (7/10)^8     (4)
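As a quick numerical check (not part of the original slides), Eq. (4) can be evaluated directly in R:

## Probability of the exact sequence observed for coin 2 when µ = 0.3
mu <- 0.3
tosses <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0)   # 1 = heads, 0 = tails
prod(mu^tosses * (1 - mu)^(1 - tosses))     # 0.00519, the same as 0.3^2 * 0.7^8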

Thus far, we have considered p(x; µ) as a function of x, parametrized by µ. If we view p(x; µ) as a function of µ, then it is called the likelihood function.

Maximum likelihood estimation basically chooses the value of µ that maximizes the likelihood function given the observed data.

Maximum likelihood for Bernoulli

The likelihood for a sequence of i.i.d. Bernoulli random variables X = [x1, x2, ..., xn] with xi ∈ {0, 1} is then

    p(X; µ) = ∏_{i=1}^{n} p(xi; µ) = ∏_{i=1}^{n} µ^{xi} (1 − µ)^{1−xi}     (5)

We usually maximize the log likelihood function rather than the original function:

- It is often easier to take the derivative
- The log function is monotonically increasing, thus the maximum (argmax) is the same
- It avoids numerical problems involved with multiplying lots of small numbers

Log likelihood

Thus, instead of maximizing this

    p(X; µ) = ∏_{i=1}^{n} µ^{xi} (1 − µ)^{1−xi}                            (6)

we maximize this

    log p(X; µ) = log ∏_{i=1}^{n} µ^{xi} (1 − µ)^{1−xi}
                = ∑_{i=1}^{n} log [ µ^{xi} (1 − µ)^{1−xi} ]
                = ∑_{i=1}^{n} [ log µ^{xi} + log (1 − µ)^{1−xi} ]
                = ∑_{i=1}^{n} [ xi log µ + (1 − xi) log(1 − µ) ]
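As an illustration (not from the original slides), this log likelihood can be coded and evaluated over a grid of µ values in R; the function name loglik is ours:

## Bernoulli log likelihood, evaluated for the 10 tosses of coin 2
loglik <- function(mu, x) sum(x * log(mu) + (1 - x) * log(1 - mu))
x  <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0)          # coin 2: 1 = heads, 0 = tails
mu <- seq(0.01, 0.99, by = 0.01)
ll <- sapply(mu, loglik, x = x)
mu[which.max(ll)]                              # 0.2, i.e., mean(x)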

Note that one often denotes the log likelihood function with the symbol L = log p(X; µ).

A real-valued function f defined on a subset of the real numbers is called monotonic (also monotonically increasing, increasing, or non-decreasing) if for all x and y such that x ≤ y one has f(x) ≤ f(y).

Thus, the monotonicity of the log function guarantees that

    argmax_µ p(X; µ) = argmax_µ log p(X; µ)                                (7)

ML estimate

The ML estimate of the parameter µ is then

    argmax_µ ∑_{i=1}^{n} [ xi log µ + (1 − xi) log(1 − µ) ]                (8)

We can calculate the argmax by setting the first derivative equal to zero and solving for µ.

Thus

    ∂/∂µ log p(X; µ) = ∑_{i=1}^{n} ∂/∂µ [ xi log µ + (1 − xi) log(1 − µ) ]
                     = ∑_{i=1}^{n} xi ∂/∂µ log µ + ∑_{i=1}^{n} (1 − xi) ∂/∂µ log(1 − µ)
                     = (1/µ) ∑_{i=1}^{n} xi − (1/(1 − µ)) ∑_{i=1}^{n} (1 − xi)

and finally, to find the maximum, we set ∂/∂µ log p(X; µ) = 0:

    0 = (1/µ) ∑_{i=1}^{n} xi − (1/(1 − µ)) ∑_{i=1}^{n} (1 − xi)

    (1 − µ)/µ = ∑_{i=1}^{n} (1 − xi) / ∑_{i=1}^{n} xi

    1/µ − 1 = ∑_{i=1}^{n} 1 / ∑_{i=1}^{n} xi − 1

    1/µ = n / ∑_{i=1}^{n} xi

    µ̂_ML = (1/n) ∑_{i=1}^{n} xi

Reassuringly, the maximum likelihood estimate is just the proportion of flips that came out heads.
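As a minimal numerical check (our addition, not from the slides), a generic optimizer recovers the same answer as the closed-form estimate:

## ML estimate for coin 1: numerical optimization vs. the sample mean
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0)           # coin 1
loglik <- function(mu) sum(x * log(mu) + (1 - x) * log(1 - mu))
optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum
mean(x)                                        # both are approximately 0.5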

Problems with ML estimation

Does it really make sense that

    H,T,H,T → µ̂ = 0.5
    H,T,T,T → µ̂ = 0.25
    T,T,T,T → µ̂ = 0.0

ML estimation does not incorporate any prior knowledge and does not generate an estimate of the certainty of its results.

Maximum a posteriori Estimation

Bayesian approaches try to reflect our belief about µ. In this case, we will consider µ to be a random variable.

    p(µ|X) = p(X|µ) p(µ) / p(X)                                            (9)

Thus, Bayes' law converts our prior belief about the parameter µ (before seeing data) into a posterior probability, p(µ|X), by using the likelihood function p(X|µ). The maximum a posteriori (MAP) estimate is defined as

    µ̂_MAP = argmax_µ p(µ|X)                                               (10)

Note that because p(X) does not depend on µ, we have

    µ̂_MAP = argmax_µ p(µ|X)
           = argmax_µ p(X|µ) p(µ) / p(X)
           = argmax_µ p(X|µ) p(µ)

This is essentially the basic idea of the MAP equation used by SNVMix for variant calling.

MAP Estimation: What does it buy us?

To take a simple example of a situation in which MAP estimation might produce better results than ML estimation, let us consider a statistician who wants to predict the outcome of the next election in the USA.

The statistician is able to gather data on party preferences by asking people he meets at the Wall Street Golf Club¹ which party they plan on voting for in the next election. The statistician asks 100 people, seven of whom answer "Democrats". This can be modeled as a series of Bernoullis, just like the coin tosses. In this case, the maximum likelihood estimate of the proportion of voters in the USA who will vote Democratic is µ̂_ML = 0.07.

¹ i.e., a notorious haven of ultraconservative Republicans

Somehow, the estimate of µ̂_ML = 0.07 doesn't seem quite right given our previous experience that about half of the electorate votes Democratic and half votes Republican. But how should the statistician incorporate this prior knowledge into his prediction for the next election?

The MAP estimation procedure allows us to inject our prior beliefs about parameter values into the new estimate.

Beta distribution: Background

The Beta distribution is appropriate to express prior belief about a Bernoulli distribution. The Beta distribution is a family of continuous distributions defined on [0, 1] and parametrized by two positive shape parameters, α and β:

    p(µ) = (1/B(α, β)) · µ^{α−1} (1 − µ)^{β−1}

Here, µ ∈ [0, 1], and

    B(α, β) = Γ(α) · Γ(β) / Γ(α + β)

where Γ is the Gamma function (an extension of the factorial).

[Figure: Beta densities for α = β = 0.5; α = 5, β = 1; α = 1, β = 3; α = β = 2; and α = 2, β = 5.]

Random variables are either discrete (i.e., they can assume one of a list of values, like the Bernoulli with heads/tails) or continuous (i.e., they can take on any numerical value in a certain interval, like the Beta distribution with µ).

A probability density function (pdf) of a continuous random variable is a function that describes the relative likelihood for this random variable to take on a given value, i.e., p(µ) : R → R⁺ such that

    Pr(µ ∈ (a, b)) = ∫_a^b p(µ) dµ                                        (11)

The probability that the value of µ lies between a and b is given by integrating the pdf over this region.
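For illustration (not part of the original slides), Eq. (11) can be checked numerically in R with integrate() for a Beta(3, 3) density:

## Probability that µ lies in (0.4, 0.6) under a Beta(3, 3) distribution
integrate(dbeta, lower = 0.4, upper = 0.6, shape1 = 3, shape2 = 3)   # ~0.365
integrate(dbeta, lower = 0, upper = 1, shape1 = 3, shape2 = 3)       # 1, as required for a pdf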

Recall the difference between a PDF (for a continuous random variable) and a probability mass function (PMF) for a discrete random variable:

- A PMF is defined as Pr(X = xi) = pi, with 0 ≤ pi ≤ 1 and ∑_i pi = 1; e.g., for a fair coin toss (Bernoulli), Pr(X = heads) = Pr(X = tails) = 0.5
- In contrast, for a PDF there is no requirement that p(µ) ≤ 1, but we do have

    ∫_{−∞}^{+∞} p(µ) dµ = 1                                               (12)

We calculate the PDF of the Beta distribution for a sequence of values 0, 0.01, 0.02, ..., 1.00 in R as follows:

x <- seq(0.0, 1.0, 0.01)
y <- dbeta(x, 3, 3)

Recalling how to approximate an integral with a Riemann sum, ∫_a^b p(µ) dµ ≈ ∑_{i=1}^{n} p(µi) ∆i, where µi is a point in the subinterval ∆i and the subintervals span the entire interval [a, b], we can check that ∫_0^1 Beta(µ) dµ ≈ 1:

> sum((1/101)*y)
[1] 0.990099

Here, ∆i = 1/101 and the vector y contains the values p(µi).

The mode of a continuous distribution is the value x at which its probability density function attains its maximum value. The Beta(α, β) distribution has its mode at

    (α − 1) / (α + β − 2)                                                 (13)

alpha <- 7
beta <- 3
x <- seq(0.0, 1.0, 0.01)
y <- dbeta(x, alpha, beta)
md <- (alpha - 1)/(alpha + beta - 2)
title <- expression(paste(alpha, "=7 ", beta, "=3"))
plot(x, y, type="l", main=title, xlab="x", ylab="pdf", col="blue", lty=1, cex.lab=1.25)
abline(v=md, col="red", lty=2, lwd=2)

The code from the previous slide leads to:

[Figure: pdf of the Beta(7, 3) distribution, with the mode marked by a red line.]

The mode, shown as the dotted red line, is calculated as (α − 1)/(α + β − 2) = (7 − 1)/(7 + 3 − 2) = 0.75.

Maximum a posteriori (MAP) Estimation

With all of this information in hand, let's get back to MAP estimation!

Going back to Bayes' rule again, we seek the value of µ that maximizes the posterior Pr(µ|X):

    Pr(µ|X) = Pr(X|µ) Pr(µ) / Pr(X)                                       (14)

We then have

    µ̂_MAP = argmax_µ Pr(µ|X)
           = argmax_µ Pr(X|µ) Pr(µ) / Pr(X)
           = argmax_µ Pr(X|µ) Pr(µ)
           = argmax_µ ∏_{xi ∈ X} Pr(xi|µ) Pr(µ)

As we saw above for maximum likelihood estimation, it is easier to calculate the argmax for the logarithm:

    argmax_µ Pr(µ|X) = argmax_µ log Pr(µ|X)
                     = argmax_µ log [ ∏_{xi ∈ X} Pr(xi|µ) · Pr(µ) ]
                     = argmax_µ ∑_{xi ∈ X} log Pr(xi|µ) + log Pr(µ)

Parameter Estimation Peter N Let’s go back now to our problem of predicting the results of Robinson the next election. Essentially, we plug in the equations for the Estimating Parameters distributions of the likelihood (a Bernoulli distribution) and the from Data prior (A Beta distribution). Maximum Likelihood (ML) Estimation Pr(µ|X) ∝ Pr(xi |µ) · Pr(µ) Beta distribution

Maximum a posteriori posterior (MAP) Estimation Likelihood (Bernoulli) MAQ prior (Beta) Maximum a posteriori (MAP) Estimation

We thus have that

    Pr(xi|µ) = Bernoulli(xi|µ) = µ^{xi} (1 − µ)^{1−xi}

    Pr(µ) = Beta(µ|α, β) = (1/B(α, β)) · µ^{α−1} (1 − µ)^{β−1}

thus

    Pr(µ|X) ∝ Pr(X|µ) Pr(µ)

is equivalent to

    Pr(µ|X) ∝ { ∏_i Bernoulli(xi|µ) } · Beta(µ|α, β)                      (15)
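As a numerical sketch (not from the original slides), the unnormalized posterior of Eq. (15) can be evaluated on a grid of µ values for the golf-club poll; the Beta(5, 5) prior here is an illustrative choice:

## Unnormalized posterior for 93 "Republican" (= 1) and 7 "Democrat" (= 0) answers
n  <- 100; nr <- 93
mu <- seq(0.001, 0.999, by = 0.001)
post <- mu^nr * (1 - mu)^(n - nr) * dbeta(mu, 5, 5)   # likelihood * Beta(5, 5) prior
mu[which.max(post)]                                   # ~0.898; compare the closed form derived below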

Furthermore,

    L = log Pr(µ|X)
      = log [ { ∏_i Bernoulli(xi|µ) } · Beta(µ|α, β) ]
      = ∑_i log Bernoulli(xi|µ) + log Beta(µ|α, β)

We solve for µ̂_MAP = argmax_µ L as follows:

    argmax_µ ∑_i log Bernoulli(xi|µ) + log Beta(µ|α, β)

Note that this is almost the same as the ML estimate except that we now have an additional term resulting from the prior.

Again, we find the maximum value of µ by setting the first derivative of L equal to zero and solving for µ:

    ∂/∂µ L = ∑_i ∂/∂µ log Bernoulli(xi|µ) + ∂/∂µ log Beta(µ|α, β)

The first term is the same as for ML (see the ML estimate derivation above), i.e.,

    ∑_i ∂/∂µ log Bernoulli(xi|µ) = (1/µ) ∑_{i=1}^{n} xi − (1/(1 − µ)) ∑_{i=1}^{n} (1 − xi)     (16)

To find the second term, we note

    ∂/∂µ log Beta(µ|α, β) = ∂/∂µ log [ (Γ(α + β) / (Γ(α) · Γ(β))) · µ^{α−1} (1 − µ)^{β−1} ]
                          = ∂/∂µ log (Γ(α + β) / (Γ(α) · Γ(β))) + ∂/∂µ log [ µ^{α−1} (1 − µ)^{β−1} ]
                          = 0 + ∂/∂µ log [ µ^{α−1} (1 − µ)^{β−1} ]
                          = (α − 1) ∂/∂µ log µ + (β − 1) ∂/∂µ log(1 − µ)
                          = (α − 1)/µ − (β − 1)/(1 − µ)

To find µ̂_MAP, we now set ∂/∂µ L = 0 and solve for µ:

    0 = ∂/∂µ L
      = (1/µ) ∑_{i=1}^{n} xi − (1/(1 − µ)) ∑_{i=1}^{n} (1 − xi) + (α − 1)/µ − (β − 1)/(1 − µ)

and thus

    µ [ ∑_{i=1}^{n} (1 − xi) + β − 1 ] = (1 − µ) [ ∑_{i=1}^{n} xi + α − 1 ]

    µ [ ∑_{i=1}^{n} (1 − xi) + ∑_{i=1}^{n} xi + β − 1 + α − 1 ] = ∑_{i=1}^{n} xi + α − 1

    µ [ ∑_{i=1}^{n} 1 + β + α − 2 ] = ∑_{i=1}^{n} xi + α − 1

Finally, if we code our Bernoulli distribution as Republican = 1 and Democrat = 0, we have that

    ∑_{i=1}^{n} xi = nR, where nR denotes the number of Republican voters     (17)

Then,

    µ [ ∑_{i=1}^{n} 1 + β + α − 2 ] = ∑_{i=1}^{n} xi + α − 1

    µ [ n + β + α − 2 ] = nR + α − 1

and finally

    µ̂_MAP = (nR + α − 1) / (n + β + α − 2)                                    (18)
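A small helper implementing Eq. (18) (our addition; the name map_bernoulli is not from the slides):

## Closed-form MAP estimate for a Bernoulli likelihood with a Beta(alpha, beta) prior
map_bernoulli <- function(nr, n, alpha, beta) (nr + alpha - 1) / (n + alpha + beta - 2)
map_bernoulli(nr = 93, n = 100, alpha = 5, beta = 5)   # ~0.898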

And now our prediction

It is useful to compare the ML and the MAP predictions. Note again that α and β are essentially pseudocounts, and the higher their value, the more the prior affects the final prediction (i.e., the posterior).

Recall that in our poll of 100 members of the Wall Street Golf Club, only seven said they would vote Democratic. Thus

    n = 100
    nR = 93

We will assume that the mode of our prior belief is that 50% of the voters will vote Democratic and 50% Republican; thus α = β. However, different values of α and β express different strengths of prior belief.

    n      nR     α       β       µ̂_ML    µ̂_MAP
    100    93     1       1       0.93    0.93
    100    93     5       5       0.93    0.90
    100    93     100     100     0.93    0.64
    100    93     1000    1000    0.93    0.52
    100    93     10000   10000   0.93    0.502

Thus, MAP "pulls" the estimate towards the prior to an extent that depends on the strength of the prior.
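The µ̂_MAP column of this table can be reproduced from Eq. (18); a small sketch, not from the original slides:

## MAP estimates for increasingly strong symmetric priors (alpha = beta)
n <- 100; nr <- 93
ab <- c(1, 5, 100, 1000, 10000)
round((nr + ab - 1) / (n + 2*ab - 2), 3)   # 0.930 0.898 0.644 0.520 0.502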

The use of MAP in MAQ

Recall from the lecture that we calculate the posterior probabilities of the three genotypes given the data D, that is, a column with n aligned nucleotides and quality scores, of which k correspond to the reference a and n − k to a variant nucleotide b.

    p(G = ⟨a,a⟩|D) ∝ p(D|G = ⟨a,a⟩) p(G = ⟨a,a⟩) ∝ α_{n,k} · (1 − r)/2

    p(G = ⟨b,b⟩|D) ∝ p(D|G = ⟨b,b⟩) p(G = ⟨b,b⟩) ∝ α_{n,n−k} · (1 − r)/2

    p(G = ⟨a,b⟩|D) ∝ p(D|G = ⟨a,b⟩) p(G = ⟨a,b⟩) ∝ C(n, k) · (1/2^n) · r

where C(n, k) is the binomial coefficient.
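A rough sketch of these three posteriors in R, with the caveat that MAQ's α_{n,k} error-model terms are replaced by a simple binomial with per-base error rate eps; this stand-in and the defaults r = 0.001, eps = 0.01 are our illustrative assumptions, not MAQ's actual model:

## Simplified genotype posteriors; k is the number of reference bases, as above
genotype_posterior <- function(k, n, r = 0.001, eps = 0.01) {
  hom_ref <- dbinom(n - k, n, eps) * (1 - r) / 2   # <a,a>: the n-k variant bases are errors
  hom_alt <- dbinom(k, n, eps)     * (1 - r) / 2   # <b,b>: the k reference bases are errors
  het     <- dbinom(k, n, 0.5)     * r             # <a,b>
  post <- c(ref = hom_ref, het = het, alt = hom_alt)
  post / sum(post)                                 # normalize over the three genotypes
}
genotype_posterior(k = 18, n = 20)   # 18 reference, 2 variant bases: confident hom-ref call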

MAQ: Consensus Genotype Calling

Note that MAQ does not attempt to learn the parameters; rather, it uses a user-supplied parameter r, which roughly corresponds to µ in the election example.

MAQ calls the genotype with the highest posterior probability:

    ĝ = argmax_{g ∈ {⟨a,a⟩, ⟨a,b⟩, ⟨b,b⟩}} p(g|D)

The probability of this genotype is used as a measure of confidence in the call.

A major problem in SNV calling is false positive heterozygous variants. A heterozygous variant is a priori less plausible at a position that is not a known common SNP in the population. For this reason, MAQ uses a different prior (r) for previously known SNPs (r = 0.2) and "new" SNPs (r = 0.001).

Let us examine the effect of these two priors on variant calling. In R, we can write

    p(G = ⟨a,b⟩|D) ∝ C(n, k) · (1/2^n) · r

as

> dbinom(k, n, 0.5) * r

where k is the number of non-reference bases, n is the total number of bases, and 0.5 corresponds to the probability of seeing a non-reference base given that the true genotype is heterozygous.


[Figure: two panels showing the posterior probability of the heterozygous genotype as a function of the number of ALT bases k (0 to 20); panel A uses r = 0.001, panel B uses r = 0.2.]

The figures show the posterior for the heterozygous genotype according to the simplified MAQ algorithm discussed in the previous lecture. The prior r = 0.001 means that positions with 5 or fewer ALT bases do not get called as heterozygous, whereas the prior r = 0.2 means that positions with 5 ALT bases do get a het call.
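To reproduce the qualitative behaviour of these curves (not MAQ's actual implementation), one can reuse the simplified binomial stand-in for the error model from the sketch above; the error rate eps = 0.02 is an arbitrary illustrative choice:

## Posterior probability of the het genotype vs. number of ALT bases k, n = 20
het_posterior <- function(k, n, r, eps = 0.02) {
  hom_ref <- dbinom(k, n, eps)     * (1 - r) / 2   # the k ALT bases are errors
  hom_alt <- dbinom(n - k, n, eps) * (1 - r) / 2   # the n-k REF bases are errors
  het     <- dbinom(k, n, 0.5)     * r
  het / (hom_ref + het + hom_alt)
}
k <- 0:20
plot(k, het_posterior(k, 20, r = 0.001), type = "b", ylim = c(0, 1),
     xlab = "k", ylab = "P(het|D)")
lines(k, het_posterior(k, 20, r = 0.2), type = "b", col = "red")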

What have we learned?

- Get used to Bayesian techniques, which are important in many areas of bioinformatics
- Understand the difference between ML and MAP estimation
- Understand the Beta distribution, priors, and pseudocounts
- Note that MAQ is no longer a state-of-the-art algorithm, and its use of the MAP framework is relatively simplistic
- Nonetheless, it is a good introduction to this topic, and we will see how these concepts are used in EM today