
E.E. Santovetti, lesson 4: Maximum likelihood, Interval estimation

Extended Maximum Likelihood

Sometimes the total number of events (measurements) n of the experiment is not fixed, but is itself a random variable, for example a Poisson r.v. with mean ν. The extended likelihood function is then:

L(ν, θ) = (ν^n e^(-ν) / n!) ∏_i f(x_i; θ)

If ν is a function of θ we have (up to terms not depending on θ):

ln L(θ) = -ν(θ) + Σ_i ln[ ν(θ) f(x_i; θ) ]

Example: ν(θ) can be the expected number of events of a certain process.
● Extended ML uses more information, so the errors on the parameters will be smaller, compared to the case in which the information carried by n is not used (n treated as independent of θ).
● If ν does not depend on θ, we recover the usual likelihood.

Extended ML example

Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: f_s(x) and f_b(x). We observe a mixture of the two event types, with signal fraction θ, expected total number ν and observed total number n, so that

f(x; θ) = θ f_s(x) + (1 - θ) f_b(x)

Let s = θν and b = (1 - θ)ν be the expected numbers of signal and background events that we want to estimate; the extended log-likelihood is then

ln L(s, b) = -(s + b) + Σ_i ln[ s f_s(x_i) + b f_b(x_i) ]

Extended ML example (2)

Consider for the signal a Gaussian pdf and for the background an exponential. Maximize ln L to find the estimates of s and b. Here the errors reflect the total Poisson fluctuation as well as the uncertainty in the proportion of signal and background.

Unphysical values for estimators

An estimate can fall in an unphysical region. Such an unphysical estimate is unbiased and should nevertheless be reported, since the average of a large number of unbiased estimates converges to the true value (cf. PDG). This can be verified by repeating the entire MC experiment many times, allowing unphysical estimates.

Extended ML example II

The likelihood does not provide any information on the goodness of the fit; this has to be checked separately:
● simulate toy MC samples according to the estimated pdf (using the fit results from data as "true" parameter values) and compare the maximum likelihood value in the toys with the one in data;
● draw the data in a (binned) histogram and compare the distribution with the result of the ML fit.

Extended ML example II

Again we want to distinguish (and count) signal events with respect to background events. The signal is a B meson decay into two daughter particles; the background is combinatorial: vertices (and hence candidates) reconstructed from the wrong tracks. To separate signal from background we can use two main handles:
1) the invariant mass of the two daughter particles has to peak at the B meson mass;
2) the decay time of the B meson candidate has to be of the order of the B meson lifetime.
These two variables behave in a completely different way for the two event categories. Let us look at the distributions of these variables.

Extended ML example II

A first look at the distributions (mass and decay time) allows us to state:
● pdf for the signal mass: double Gaussian;
● pdf for the signal decay time: (negative) exponential;
● pdf for the background mass: exponential (almost flat);
● pdf for the background decay time: exponential + Lorentzian.
We build the total pdf and the extended likelihood from these components. By maximizing the likelihood we can estimate the numbers of signal and background events as well as the B meson mass and lifetime.

Extended ML example II

[Figure: fit projections onto the mass and decay time distributions, showing the signal, background and all-data components.] The fit is done with the RooFit package (ROOT). A minimal code sketch of this kind of extended fit is given below, after the introduction of the next example.

Weighted maximum likelihood

Suppose we want to measure the polarization of the J/Ψ meson (J^PC = 1--). The measurement can be done by looking at the angular distribution of the decay products of the meson itself: θ and φ are respectively the polar and azimuthal angles of the positive muon in the decay J/Ψ → μ+μ-, in the J/Ψ rest frame, measured choosing the J/Ψ direction in the lab frame as the polar axis.
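Referring back to the B meson example above: the following is a minimal sketch of an extended unbinned ML fit, not the actual RooFit analysis of the slides. It uses a single Gaussian for the signal mass (instead of the double Gaussian quoted above), an exponential background, and only the mass variable; the mass window, yields and shape parameters are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy data in a mass window [lo, hi]; all numbers are illustrative only
lo, hi = 5.0, 5.6                      # GeV
m0_true, sig_true, tau_true = 5.28, 0.02, 0.4
x = np.concatenate([
    rng.normal(m0_true, sig_true, rng.poisson(300)),   # "signal"
    lo + rng.exponential(tau_true, rng.poisson(700)),  # "background"
])
x = x[(x > lo) & (x < hi)]

def pdf_sig(x, m0, sigma):
    # Gaussian normalized on the mass window [lo, hi]
    win = norm.cdf(hi, m0, sigma) - norm.cdf(lo, m0, sigma)
    return norm.pdf(x, m0, sigma) / win

def pdf_bkg(x, tau):
    # Falling exponential in (x - lo), normalized on [lo, hi]
    win = 1.0 - np.exp(-(hi - lo) / tau)
    return np.exp(-(x - lo) / tau) / (tau * win)

def nll(p):
    # Extended negative log-likelihood:
    #   -ln L(s, b, ...) = (s + b) - sum_i ln[ s f_s(x_i) + b f_b(x_i) ]
    s, b, m0, sigma, tau = p
    if s < 0 or b < 0 or sigma <= 0 or tau <= 0:
        return np.inf
    dens = s * pdf_sig(x, m0, sigma) + b * pdf_bkg(x, tau)
    return (s + b) - np.sum(np.log(dens))

start = [200.0, 800.0, 5.27, 0.03, 0.5]
res = minimize(nll, start, method="Nelder-Mead", options={"maxiter": 20000})
s_hat, b_hat, m0_hat, sigma_hat, tau_hat = res.x
print(f"s = {s_hat:.1f}, b = {b_hat:.1f}, m0 = {m0_hat:.4f} GeV, "
      f"sigma = {sigma_hat*1e3:.1f} MeV, tau = {tau_hat:.2f} GeV")
```

The uncertainties on the fitted s and b (e.g. from the curvature of -ln L at its minimum) then include the total Poisson fluctuation as well as the uncertainty on the signal/background proportion, as stated above.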
Weighted likelihood – polarization measurement

We have to measure the angular distribution and fit it with the angular function P(cosθ, φ; λ), parametrized by the polarization parameters λ (for a 1-- state, of the general form 1 + λ_θ cos²θ + λ_φ sin²θ cos2φ + λ_θφ sin2θ cosφ). There are two main problems to face:
● when we select our signal, there is an unavoidable amount of background events (evident from the mass distribution);
● the angular distribution of the background events is unknown and also very difficult to parametrize.
The likelihood function is:

ln L(λ) = Σ_i ln[ ε(cosθ_i, φ_i) P(cosθ_i, φ_i; λ) / Norm(λ) ]

where ε is the total detection efficiency, P is the angular function above and Norm is a normalization function needed to have the probability normalized to 1.

Weighted likelihood – polarization measurement

In order to take the background events into account, the likelihood sum is extended to all events (signal region, left side band and right side band), but with proper weights for each region:

ln L(λ) = Σ_i w_i ln[ ε(cosθ_i, φ_i) P(cosθ_i, φ_i; λ) / Norm(λ) ]

The background events contribution cancels out if:
● the background mass distribution is linear;
● the combinatorial background angular distributions are the same in the signal and side band regions (this can be demonstrated by shifting the three regions 300 MeV up).
These hypotheses are well satisfied; otherwise we can always take this into account by readjusting the weights in a proper way. The efficiency does not depend on the λ parameters, so it gives a constant term in the maximization procedure; it enters only through the normalization term at the denominator.

Weighted likelihood – polarization measurement

How do we evaluate the Norm function (which depends on the detector efficiency)? We can again use the MC simulation, considering an unpolarized sample (P = 1 at generation), so that from the accepted MC events we can compute the function:

Norm(λ) ∝ Σ_(MC events) P(cosθ_i, φ_i; λ)

The sPlot technique

Relationship between ML and Bayesian estimators

In Bayesian statistics, both θ and x are treated as random variables. In the Bayes approach, if θ is a certain hypothesis:

p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ') π(θ') dθ'

where p(θ|x) is the posterior pdf for θ (the conditional pdf for θ given x) and π(θ) is the prior probability for θ.
● Purist Bayesian: p(θ|x) contains all the information about θ.
● Pragmatist Bayesian: p(θ|x) can be a complicated function: summarize it by using a new estimator.
Looking at p(θ|x): what do we use for π(θ)? There is no golden rule (it is subjective!); 'prior ignorance' is often represented by π(θ) = constant, in which case p(θ|x) ∝ L(x|θ) and the Bayesian estimator (the mode of the posterior) coincides with the ML estimator.

But... we could have used a different parameter, e.g., λ = 1/θ, and if the prior π(θ) is constant, then π(λ) is not! 'Complete prior ignorance' is not well defined.

Relationship between ML and Bayesian estimators

The main concern expressed by frequentist statisticians regarding the use of Bayesian probability is its intrinsic dependence on a prior probability that can be chosen in an arbitrary way. This arbitrariness makes Bayesian probability to some extent subjective. Adding more measurements increases one's knowledge of the unknown parameter, so the posterior probability depends less on, and becomes less sensitive to, the choice of the prior. When a large number of measurements is available, the results of Bayesian calculations tend in most cases to be identical to those of frequentist calculations. Many interesting statistical problems, however, arise in the case of low statistics, i.e. a small number of measurements, and in those cases Bayesian and frequentist methods usually lead to different results.
In those cases, using the Bayesian approach, the choice of the prior probabilities plays a crucial role and has a great influence on the results. One main difficulty is how to choose a PDF that models one's complete ignorance of an unknown parameter. One could naively choose a uniform ("flat") PDF in the interval of validity of the parameter, but it is clear that if we change the parametrization from x to a function of x (say log x or 1/x), the transformed parameter will no longer have a uniform prior PDF.

The Jeffreys prior

One possible approach has been proposed by Harold Jeffreys: adopt a prior PDF that is invariant under parameter transformations. This choice is:

π(θ) ∝ √(det I(θ))

where I(θ) is the Fisher information matrix,

I(θ)_ij = -E[ ∂² ln L(x; θ) / ∂θ_i ∂θ_j ]

Examples of Jeffreys priors for some important parameters: for the mean μ of a Poisson distribution, π(μ) ∝ 1/√μ; for the mean of a Gaussian with known width, the prior is uniform; for the width σ of a Gaussian with known mean, π(σ) ∝ 1/σ.

Interval estimation, setting limits

Interval estimation – introduction

In addition to a 'point estimate' of a parameter we should report an interval reflecting its statistical uncertainty. Desirable properties of such an interval include:
● it communicates objectively the result of the experiment;
● it has a given probability of containing the true parameter;
● it provides the information needed to draw conclusions about the parameter, possibly incorporating stated prior beliefs.
Often one quotes +/- the estimated standard deviation of the estimator. In some cases, however, this is not adequate:
● the estimate is near a physical boundary, e.g., an observed event rate consistent with zero.
We will look briefly at frequentist and Bayesian intervals.

Neyman confidence intervals

This is the rigorous procedure to obtain confidence intervals in the frequentist approach. Consider an estimator θ̂ (the measurable quantity) for a parameter θ; we also need its pdf g(θ̂; θ). Specify upper and lower tail probabilities, e.g., α = 0.05, β = 0.05, then find functions u_α(θ) and v_β(θ) such that

P(θ̂ ≥ u_α(θ)) = ∫ from u_α(θ) to +∞ of g(θ̂; θ) dθ̂ = α,   P(θ̂ ≤ v_β(θ)) = ∫ from -∞ to v_β(θ) of g(θ̂; θ) dθ̂ = β

(the integrals run over the possible values of the estimator). We obtain an interval [v_β(θ), u_α(θ)] with confidence level CL = 1 - α - β for the estimator, as a function of the true parameter value θ. Note that this is an interval for the estimator, and there is no unique way to define such an interval with the same CL.

Confidence interval from the confidence belt

The confidence belt is the region spanned by these intervals as a function of the parameter. Find the points where the observed estimate intersects the confidence belt: this gives the confidence interval for the true parameter θ. The confidence level = 1 - α - β is the probability for the interval to cover the true value of the parameter. A small numerical sketch of this construction for a Poisson counting experiment is given below.
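To make the Neyman construction concrete, here is a small numerical sketch for a Poisson counting experiment (an illustration, not taken from the slides). For each value of the mean μ on a grid, a central acceptance region [n_lo, n_hi] is built with lower and upper tail probabilities not exceeding β and α; the confidence interval for μ given an observed count n_obs is then obtained by inverting the belt, i.e. keeping the μ values whose acceptance region contains n_obs. The observed count and the grid are illustrative assumptions, and since n is discrete the coverage is at least 1 - α - β rather than exact.

```python
import numpy as np
from scipy.stats import poisson

alpha, beta = 0.05, 0.05
n_obs = 5                                # observed count (illustrative)

mu_grid = np.linspace(0.01, 20.0, 4000)  # scan of the true parameter
belt = []                                # acceptance region [n_lo, n_hi] per mu

for mu in mu_grid:
    # Lower edge of the acceptance region: P(N < n_lo) <= beta
    # (discrete analogue of v_beta in the construction above)
    n_lo = int(poisson.ppf(beta, mu))
    # Upper edge: smallest n_hi with P(N > n_hi) <= alpha
    # (discrete analogue of u_alpha)
    n_hi = int(poisson.ppf(1.0 - alpha, mu))
    belt.append((n_lo, n_hi))

# Invert the belt: the interval for mu is the set of mu values whose
# acceptance region contains the observed count
covered = np.array([n_lo <= n_obs <= n_hi for (n_lo, n_hi) in belt])
mu_in = mu_grid[covered]
print(f"n_obs = {n_obs}: {1 - alpha - beta:.0%} CL interval for mu "
      f"is approximately [{mu_in.min():.2f}, {mu_in.max():.2f}]")
```

For n_obs = 5 this gives roughly [2.0, 10.5], the standard central 90% CL Poisson interval; the same scan-and-invert logic applies to any estimator once its pdf g(θ̂; θ) is known.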