Bayesian Inference for the Location Parameter of a Student-T Density


Bayesian inference for the location parameter of a Student-t density

Jean-François Angers
CRM-2642, February 2000

Dép. de mathématiques et de statistique; Université de Montréal; C.P. 6128, Succ. "Centre-ville"; Montréal, Québec, H3C 3J7; [email protected]. This research has been partially funded by NSERC, Canada.

Abstract

Student-t densities play an important role in Bayesian statistics. For example, suppose that an estimator of the mean of a normal population with unknown variance is desired; then the marginal posterior density of the mean is often a Student-t density. In this paper, estimation of the location parameter of a Student-t density is considered when its prior is also a Student-t density. It is shown that the posterior mean and variance can be written as the ratio of two finite sums when the numbers of degrees of freedom of both the likelihood function and the prior are odd. When one of them (or both) is even, approximations for the posterior mean and variance are given. The behavior of the posterior mean is also investigated in the presence of outlying observations. When robustness is achieved, second-order approximations of the estimator and its posterior expected loss are given.

Mathematics Subject Classification: 62C10, 62F15, 62F35.
Keywords: robust estimator, Fourier transform, convolution of Student-t densities.

1 Introduction

Heavy-tailed priors play an important role in Bayesian statistics. They can be viewed as an alternative to noninformative priors, since they lead to estimators which are insensitive to misspecification of the prior parameters, while still allowing the use of prior information when it is available. Because of its heavier tails, the Student-t density is a "robust" alternative to the normal density when large observations are expected. (Here, robustness means that the prior information is ignored when it conflicts with the information contained in the data.)
This density is also encountered when the data come from a normal population with unknown variance. In this paper, the problem of estimating the location parameter of a Student-t density, under squared-error loss, is considered. To obtain an estimator which ignores the prior information when it conflicts with the likelihood information, the prior density proposed in this paper is another Student-t density, with fewer degrees of freedom than the likelihood. Consequently, the prior tails are heavier than those of the likelihood, resulting in an estimator which is insensitive to prior misspecification (cf. O'Hagan, 1979). This problem has been previously studied by Fan and Berger (1990), Angers and Berger (1991), Angers (1992) and Fan and Berger (1992). However, some conditions have to be imposed on the degrees of freedom in order to obtain an analytic expression for the estimator. A statistical motivation of the importance of this problem can be found in Fan and Berger (1990).

In Section 2 of this paper, it is assumed that the degrees of freedom of both the prior and the likelihood are odd. Using Angers (1996a), an alternative form (which is sometimes easier to use (cf. Angers, 1996b)) for the estimator is also proposed in this section. In Section 3, it is shown that the effect of large observations on the proposed estimator is limited. In the last section, using Saleh (1994), an approximation is considered for the case where the number of degrees of freedom of the likelihood function is even.

2 Development of the estimator: odd degrees of freedom

Let us consider the following model:

    X | θ ~ T_{2k+1}(θ, σ),    θ ~ T_{2κ+1}(µ, τ),

where σ, µ and τ are known and both k and κ are in N. The notation T_m(η, ν) denotes the Student-t density with m degrees of freedom and location and scale parameters respectively given by η and ν, that is,

    f_m(x | η, ν) = [Γ([m+1]/2) / (ν √(mπ) Γ(m/2))] (1 + (x − η)²/(mν²))^{−[m+1]/2}.    (1)

Since the hyperparameters are assumed to be known, we suppose, without loss of generality, that µ = 0 and σ = 1. The general case can be obtained by replacing X by σX + µ and θ by θ + µ in Theorems 2 and 3.

In Angers (1996a), the following theorem is proved.

Theorem 1. If X | θ ~ g(x − θ) and if θ | τ ~ τ^{−1} h(θ/τ), then

    m(x) = marginal density of X evaluated at x = I_0(x),    (2)

    θ̂(x) = posterior mean of θ = x − i I_1(x)/I_0(x),    (3)

    ρ(x) = posterior variance of θ = (I_1(x)/I_0(x))² − I_2(x)/I_0(x),    (4)

where i = √(−1), I_j(x) = F^{−1}{ĥ(τs) ĝ^{(j)}(s); x}, ĥ(s) denotes the Fourier transform of h(x), ĝ^{(j)}(s) the jth derivative of the Fourier transform of g(x), and F^{−1}{f̂; x} represents the inverse Fourier transform of f̂ evaluated at x.

Applications of this theorem to several models can be found in Leblanc and Angers (1995) and Angers (1996a, 1996b). In order to compute equations (2), (3) and (4), the Fourier transform of a Student-t density is needed. It is given, along with its first two derivatives, in the following proposition. Since the proof is mostly technical, it is omitted.

Proposition 1. If X ~ T_m(0, σ), then

    f̂_m(s) = [(√m σ|s|)^{m/2} / (2^{[m−2]/2} Γ(m/2))] K_{m/2}(√m σ|s|),

    f̂'_m(s) = −√m σ sign(s) [(√m σ|s|)^{m/2} / (2^{[m−2]/2} Γ(m/2))] K_{[m−2]/2}(√m σ|s|),

    f̂''_m(s) = m σ² f̂_m(s) − m(m−1) σ² [(√m σ|s|)^{[m−2]/2} / (2^{[m−2]/2} Γ(m/2))] K_{[m−2]/2}(√m σ|s|),

where K_{m/2}(s) denotes the modified Bessel function of the second kind of order m/2.

Note that if m = 2k + 1 where k ∈ N, then, using Gradshteyn and Ryzhik (1980, equation (8.468)), we have that

    K_{m/2}(√m σ|s|) = K_{k+1/2}(√(2k+1) σ|s|)
                     = [√π e^{−√(2k+1)σ|s|} / ((2σ√(2k+1))^{k+1/2} |s|^{k+1/2})] Σ_{p=0}^{k} [(2k − p)! / (p!(k − p)!)] (2σ√(2k+1)|s|)^p.    (5)

To obtain the marginal density of x, and the posterior mean and variance of θ, we need to compute F^{−1}{f̂_{2κ+1}(τs) f̂^{(j)}_{2k+1}(s); x}, for j = 0, 1 and 2.
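Proposition 1 can be checked numerically. The sketch below (our code, not the paper's; the helper names t_density and t_fourier are ours) compares the Bessel-function closed form of f̂_m with a direct cosine transform of the density (1), here for m = 5 and σ = 1:

```python
# Sketch verifying Proposition 1 numerically; function names are ours, not the paper's.
import numpy as np
from scipy.integrate import quad
from scipy.special import kv, gamma

def t_density(x, m, sigma=1.0):
    """Student-t density T_m(0, sigma) of equation (1) with eta = 0."""
    c = gamma((m + 1) / 2) / (sigma * np.sqrt(m * np.pi) * gamma(m / 2))
    return c * (1.0 + x**2 / (m * sigma**2)) ** (-(m + 1) / 2)

def t_fourier(s, m, sigma=1.0):
    """Closed-form Fourier transform of T_m(0, sigma) from Proposition 1."""
    z = np.sqrt(m) * sigma * abs(s)
    return z ** (m / 2) * kv(m / 2, z) / (2 ** ((m - 2) / 2) * gamma(m / 2))

m, sigma, s = 5, 1.0, 0.7
# The density is even, so its Fourier transform reduces to a cosine transform.
num = 2 * quad(lambda x: t_density(x, m, sigma), 0, np.inf, weight='cos', wvar=s)[0]
assert abs(num - t_fourier(s, m, sigma)) < 1e-6
```

The same comparison can be repeated for other odd m and other scales σ; the closed form is what makes the finite-sum development below tractable.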
Hence, the following two integrals need to be evaluated:

    A_{k,l}(x) = ∫_0^∞ cos(|x|s) s^{k+κ−l+1} K_{k−l+1/2}(√(2k+1) s) K_{κ+1/2}(√(2κ+1) τs) ds,    (6)

for l = 0 and 1, and

    B_k(x) = ∫_0^∞ sin(|x|s) s^{k+κ+1} K_{k−1/2}(√(2k+1) s) K_{κ+1/2}(√(2κ+1) τs) ds.    (7)

Using Angers (1997), we can also show the following theorem.

Theorem 2.

    m_{2k+1}(x) = [(2k+1)^{[2k+1]/4} ((2κ+1)τ²)^{[2κ+1]/4} / (2^{k+κ−1} π Γ(k+1/2) Γ(κ+1/2))] A_{k,0}(x),

    θ̂_{2k+1}(x) = x − √(2k+1) sign(x) B_k(x)/A_{k,0}(x),

    ρ_{2k+1}(x) = (2k+1) [ (2k/√(2k+1)) A_{k,1}(x)/A_{k,0}(x) − (1 + (B_k(x)/A_{k,0}(x))²) ].

In order to compute equations (6) and (7), we need the following lemma, which can be proven using Gradshteyn and Ryzhik (1980, equations (3.944.5) and (3.944.6)).

Lemma 1.

    ∫_0^∞ s^a cos(xs) e^{−bs} ds = [Γ(a+1) / (b² + x²)^{[a+1]/2}] cos([a+1] tan^{−1}(x/b)),

    ∫_0^∞ s^a sin(xs) e^{−bs} ds = [Γ(a+1) / (b² + x²)^{[a+1]/2}] sin([a+1] tan^{−1}(x/b)).

Using Lemma 1 and equation (5), the functions A_{k,l}(x) and B_k(x) can be easily evaluated; they are given in the following theorem.

Theorem 3.

    A_{k,l}(x) = [π / (2^{k−l+κ+1} (2k+1)^{[2(k−l)+1]/4} ((2κ+1)τ²)^{[2κ+1]/4})]
        × Σ_{p=0}^{k−l} Σ_{q=0}^{κ} [(2[k−l] − p)! / (k−l−p)!] [(2κ − q)! / (κ − q)!] C(p+q, q)
        × [2^{p+q} (2k+1)^{p/2} ((2κ+1)τ²)^{q/2} / ([√(2k+1) + τ√(2κ+1)]² + x²)^{[p+q+1]/2}]    (8)
        × cos([p+q+1] tan^{−1}(|x| / [√(2k+1) + τ√(2κ+1)])),

    B_k(x) = [π / (2^{k+κ} (2k+1)^{[2k−1]/4} ((2κ+1)τ²)^{[2κ+1]/4})]
        × Σ_{p=0}^{k−1} Σ_{q=0}^{κ} [(2[k−1] − p)! / (k−1−p)!] [(2κ − q)! / (κ − q)!] (p+q+1) C(p+q, q)
        × [2^{p+q} (2k+1)^{p/2} ((2κ+1)τ²)^{q/2} / ([√(2k+1) + τ√(2κ+1)]² + x²)^{[p+q+2]/2}]    (9)
        × sin([p+q+2] tan^{−1}(|x| / [√(2k+1) + τ√(2κ+1)])),

where C(p+q, q) denotes the binomial coefficient (p+q)!/(p! q!).

Using Theorems 2 and 3, the posterior mean and the posterior variance can be computed using only a ratio of two finite sums. In Section 4, the case where the likelihood function is a Student-t density with an even number of degrees of freedom is considered. In this situation, the posterior quantities cannot be written using finite sums, although they can be expressed as the ratio of two infinite series (cf. Angers, 1997).
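To make the finite-sum representation concrete, here is a small numerical sketch (the code and the names A_kl, B_k, theta_hat are ours, not the paper's; µ = 0 and σ = 1 as above). It evaluates the posterior mean of Theorem 2 through the sums of Theorem 3 and checks it against brute-force quadrature of the posterior:

```python
# Sketch: posterior mean via the finite sums of Theorems 2 and 3 (our notation),
# checked against direct numerical integration.  Assumes mu = 0, sigma = 1.
import numpy as np
from math import comb, factorial, gamma, atan2, sqrt, pi
from scipy.integrate import quad

def A_kl(x, k, kappa, tau, l):
    """A_{k,l}(x), equation (8)."""
    b = sqrt(2 * k + 1) + tau * sqrt(2 * kappa + 1)
    pref = pi / (2 ** (k - l + kappa + 1)
                 * (2 * k + 1) ** ((2 * (k - l) + 1) / 4)
                 * ((2 * kappa + 1) * tau ** 2) ** ((2 * kappa + 1) / 4))
    total = 0.0
    for p in range(k - l + 1):
        for q in range(kappa + 1):
            c = (factorial(2 * (k - l) - p) / factorial(k - l - p)
                 * factorial(2 * kappa - q) / factorial(kappa - q)
                 * comb(p + q, q)
                 * 2 ** (p + q) * (2 * k + 1) ** (p / 2)
                 * ((2 * kappa + 1) * tau ** 2) ** (q / 2))
            total += (c * np.cos((p + q + 1) * atan2(abs(x), b))
                      / (b ** 2 + x ** 2) ** ((p + q + 1) / 2))
    return pref * total

def B_k(x, k, kappa, tau):
    """B_k(x), equation (9)."""
    b = sqrt(2 * k + 1) + tau * sqrt(2 * kappa + 1)
    pref = pi / (2 ** (k + kappa)
                 * (2 * k + 1) ** ((2 * k - 1) / 4)
                 * ((2 * kappa + 1) * tau ** 2) ** ((2 * kappa + 1) / 4))
    total = 0.0
    for p in range(k):  # p = 0, ..., k-1
        for q in range(kappa + 1):
            c = (factorial(2 * (k - 1) - p) / factorial(k - 1 - p)
                 * factorial(2 * kappa - q) / factorial(kappa - q)
                 * (p + q + 1) * comb(p + q, q)
                 * 2 ** (p + q) * (2 * k + 1) ** (p / 2)
                 * ((2 * kappa + 1) * tau ** 2) ** (q / 2))
            total += (c * np.sin((p + q + 2) * atan2(abs(x), b))
                      / (b ** 2 + x ** 2) ** ((p + q + 2) / 2))
    return pref * total

def theta_hat(x, k, kappa, tau):
    """Posterior mean of Theorem 2."""
    return x - sqrt(2 * k + 1) * np.sign(x) * B_k(x, k, kappa, tau) / A_kl(x, k, kappa, tau, 0)

def t_pdf(u, m, s=1.0):
    return (gamma((m + 1) / 2) / (s * sqrt(m * pi) * gamma(m / 2))
            * (1 + u ** 2 / (m * s ** 2)) ** (-(m + 1) / 2))

k, kappa, tau, x = 2, 1, 1.5, 3.0   # X | theta ~ T_5(theta, 1), theta ~ T_3(0, 1.5)
num = quad(lambda t: t * t_pdf(x - t, 2 * k + 1) * t_pdf(t, 2 * kappa + 1, tau),
           -np.inf, np.inf)[0]
den = quad(lambda t: t_pdf(x - t, 2 * k + 1) * t_pdf(t, 2 * kappa + 1, tau),
           -np.inf, np.inf)[0]
assert abs(theta_hat(x, k, kappa, tau) - num / den) < 1e-6
```

For k = 2 and κ = 1 the double sums contain only a handful of terms, which is the computational appeal of the finite-sum representation over numerical integration.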
However, using an approximation for the Student-t density (cf. Saleh, 1994), θ̂_{2k}(x) and ρ_{2k}(x) can be approximated accurately. Before doing so, we first discuss two limiting cases, namely when |x| is large and when τ → ∞.

3 Special cases

The main advantage of using a heavy-tailed prior is that the resulting Bayes estimator, under squared-error loss, is insensitive to the choice of the prior when there is a conflict between the prior and the likelihood information.