Two Statistical Paradigms: Bayesian versus Frequentist


Steven Janke
April 2012 (Bayesian Seminar)

Probability versus Statistics

Probability problem: Given some symmetry, what is the probability that an event occurs? (Example: the probability of 3 heads when a fair coin is flipped 5 times.)

Statistical problem: Given some data, what is the underlying symmetry? (Example: Four heads resulted from tossing a coin 5 times. Is it a fair coin?)

Statistics Problems

All statistical problems begin with a model for the problem. Then there are different flavors for the actual question:

- Point estimation (Find the mean of the population.)
- Interval estimation (Find a confidence interval for the mean.)
- Hypothesis testing (Is the mean greater than 10?)
- Model fit (Could one random variable be linearly related to another?)

Estimators and Loss Functions

Definition (Estimator): An estimator $\hat{\theta}$ is a random variable that is a function of the data and reasonably approximates $\theta$. For example, $\hat{\theta} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

Loss functions:

- Squared error: $L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2$
- Absolute error: $L(\theta, \hat{\theta}) = |\theta - \hat{\theta}|$
- Linex: $L(\theta, \hat{\theta}) = \exp(c(\hat{\theta} - \theta)) - c(\hat{\theta} - \theta) - 1$

Comparing Estimators

Definition (Risk): An estimator's risk is $R(\theta, \hat{\theta}) = E[L(\theta, \hat{\theta})]$. For two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, if $R(\theta, \hat{\theta}_1) < R(\theta, \hat{\theta}_2)$ for all $\theta \in \Theta$, then $\hat{\theta}_2$ is called inadmissible.

How do you find "good" estimators? What does "good" mean? How do you definitively rank estimators?

Frequentist Approach

Philosophical principles:

- Probability is relative frequency over repeatable trials (Law of Large Numbers, Central Limit Theorem).
- $\theta$ is a fixed quantity, but the data can vary.

Definition (Likelihood function): Suppose the population density is $f_\theta$ and the data $(x_1, x_2, \ldots, x_n)$ are sampled independently. Then the likelihood function is $L(\theta) = \prod_{i=1}^{n} f_\theta(x_i)$.

The value of $\theta$ (as a function of the data) that maximizes the likelihood is a good candidate for an estimator $\hat{\theta}$.

Maximum Likelihood for Normal and Binomial

Population is Normal$(\mu, \sigma^2)$:
$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$.
Maximum likelihood gives $\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Population is Binomial$(n, p)$ (flip a coin $n$ times with $P[\text{head}] = p$):
$L(p) = p^k (1-p)^{n-k}$, where $k$ is the number of heads.
Maximum likelihood gives $\hat{p} = k/n$.

Both estimators are unbiased: $E[\hat{p}] = np/n = p$ and $E[\hat{\mu}] = n\mu/n = \mu$. Among all unbiased estimators, these estimators minimize the squared error loss.

Frequentist Conclusions

Definition (Confidence interval): $\bar{X}$ is a random variable with mean $\mu$ and distribution $N(\mu, \sigma^2/n)$. The random interval $(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})$ is a confidence interval: the probability that it covers $\mu$ is 0.95. (A simulation illustrating this coverage property appears below.)
>>> The probability that $\mu$ lies in one observed interval is not 0.95!

Definition (Hypothesis testing): Null hypothesis: $\mu = 10$. Calculate $z^* = \frac{\bar{X} - 10}{\sigma/\sqrt{n}}$; the p-value is $P[Z \le -|z^*|] + P[Z \ge |z^*|]$.
>>> The p-value is not the probability that the null hypothesis is true.
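To make the coverage statement concrete, here is a minimal simulation sketch, not part of the original slides; Python with NumPy, a fixed seed, and arbitrary illustrative values of the mean, standard deviation, and sample size are assumed. It repeatedly draws a sample from a normal population with known sigma and counts how often the random interval covers the fixed, true mu.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 10.0, 2.0, 25          # true mean, known sd, sample size (assumed values)
trials = 100_000
covered = 0

for _ in range(trials):
    x = rng.normal(mu, sigma, size=n)
    xbar = x.mean()
    half_width = 1.96 * sigma / np.sqrt(n)
    # The interval is the random object here; mu is a fixed constant.
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

print(f"Empirical coverage: {covered / trials:.3f}")   # close to 0.95
```

About 95% of the simulated intervals cover mu, which is exactly what the confidence statement claims; any single observed interval, by contrast, either contains mu or it does not.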
Space of Estimators

Methods: moments, linear, invariant.
Characteristics: unbiased, sufficient, consistent, minimum variance, admissible.

Frequentist Justification: Three Theorems

A statistic $T$ such that the conditional distribution of the data given $T$ does not depend on $\theta$ (the parameter) is called sufficient.

Theorem (Rao-Blackwell): If $\hat{\theta}$ is an unbiased estimator of $\theta$ and $T$ is sufficient, then $E[\hat{\theta} \mid T]$ is unbiased and has variance no larger than that of $\hat{\theta}$.

Theorem (Lehmann-Scheffé): If $T$ is a complete sufficient statistic for $\theta$ and $\hat{\theta} = h(T)$ is an unbiased estimator of $\theta$, then $\hat{\theta}$ has the smallest possible variance among unbiased estimators of $\theta$ (it is the UMVUE of $\theta$).

Theorem (Cramér-Rao): Let a population have a "regular" distribution with density $f_\theta(x)$ and let $\hat{\theta}$ be an unbiased estimator of $\theta$. Then $\mathrm{Var}(\hat{\theta}) \ge \left( n\, E\!\left[\left(\tfrac{\partial}{\partial\theta} \ln f_\theta(X)\right)^{2}\right] \right)^{-1}$.

Problems Choosing Estimators

How does the Frequentist choose an estimator?

Example (Variance):
- Maximum likelihood estimator: $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Unbiased estimator: $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Minimum mean squared error (normal population): $\frac{1}{n+1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Example (Mean): Estimate the IQ of an individual whose test score was 130. Suppose the scoring machine reported all scores below 100 as 100. Now the Frequentist must change the estimate of the IQ, even though the censoring never touched the observed score.

What Is the p-value?

The p-value depends on events that didn't happen.

Example (Flipping coins): Flip a coin $n = 17$ times. $K = 13$ heads gives a p-value of $P[K \le 4 \text{ or } K \ge 13] = 0.049$.
- If the experimenter had instead stopped as soon as there were at least 4 heads and 4 tails, the p-value would be 0.021.
- If the experimenter stops at $n = 17$ but would continue to $n = 44$ if the data were inconclusive, then the p-value is greater than 0.049.

The p-value is already suspect since it is NOT the probability that $H_0$ is true.

Several Normal Means at Once

Example (Estimating several normal means): Suppose we estimate $k$ normal means $\mu_1, \mu_2, \ldots, \mu_k$ from populations with a common variance, using squared Euclidean distance as the loss function. The Frequentist estimate is $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$.

Stein (1956) showed that for $k \ge 3$ this estimator is not even admissible! A better estimator is $\hat{\mu}(\bar{\mathbf{x}}) = \left(1 - \frac{k-2}{\|\bar{\mathbf{x}}\|^2}\right)\bar{\mathbf{x}}$, and it can be thought of as a Bayesian estimator with the correct prior. A simulation sketch of this shrinkage estimator follows.
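Below is a minimal simulation sketch of the Stein (James-Stein) shrinkage effect, not part of the original slides. It assumes Python with NumPy, true means chosen arbitrarily for illustration, and the standard setting in which each sample mean has unit variance, so the shrinkage factor matches the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

k = 10                                   # number of means (k >= 3)
mu = rng.normal(0.0, 1.0, size=k)        # arbitrary true means (assumed, for illustration)
trials = 50_000

loss_means, loss_stein = 0.0, 0.0
for _ in range(trials):
    xbar = rng.normal(mu, 1.0)                       # sample means, each with unit variance
    shrink = 1.0 - (k - 2) / np.sum(xbar ** 2)       # shrinkage factor from the slide
    stein = shrink * xbar
    loss_means += np.sum((xbar - mu) ** 2)           # squared Euclidean loss, ordinary means
    loss_stein += np.sum((stein - mu) ** 2)          # squared Euclidean loss, Stein estimator

print("Average loss, sample means    :", loss_means / trials)   # about k = 10
print("Average loss, Stein shrinkage :", loss_stein / trials)   # noticeably smaller
```

The shrinkage estimator's average loss comes out below $k$ whatever the true means are, which is what inadmissibility of the ordinary means asserts; the commonly used "positive-part" variant, which truncates the shrinkage factor at zero, does better still.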
Bayes' Theorem

Richard Price communicated the Rev. Thomas Bayes's paper to the Royal Society (1763). Given data, Bayes found the probability that the binomial parameter $p$ lies in a given interval. One of the many lemmas is now known as "Bayes' Theorem":

$P[A \mid B] = \frac{P[AB]}{P[B]} = \frac{P[B \mid A]\,P[A]}{P[B]} = \frac{P[B \mid A]\,P[A]}{P[B \mid A]\,P[A] + P[B \mid A^c]\,P[A^c]}$

Conditional probability is defined rigorously after defining conditional expectation as an integral over a sub-sigma-field of sets. The conditional distribution is then

$f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{f(y \mid x)\, f_X(x)}{f_Y(y)}$

Bayesian Approach

- Assume that the parameter under investigation is a random variable and that the data are fixed.
- Decide on a prior distribution for the parameter. (Bayes considered $p$ to be uniformly distributed from 0 to 1.)
- Calculate the distribution of the parameter given the data (the posterior distribution).
- To minimize squared error loss, take the expected value of the posterior distribution as the estimator.

Bayesian Approach for the Binomial

Assume a binomial model with parameters $n, p$ (flipping a coin), and let $K$ be the number of heads (the data). For a prior distribution on $p$, take the Beta distribution:

$f_p(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}$ for $0 < p < 1$.

$f(p \mid k) = \frac{f(k \mid p)\, f_p(p)}{f_K(k)} = C \cdot p^{\alpha+k-1}(1-p)^{\beta+n-k-1}$

Hence $E[p \mid k] = \frac{\alpha+k}{\alpha+\beta+n}$. With the uniform prior ($\alpha = 1$, $\beta = 1$) the estimator is

$\frac{k+1}{n+2} = \left(\frac{n}{n+2}\right)\frac{k}{n} + \left(\frac{2}{n+2}\right)\frac{1}{2}$,

a weighted average of the maximum likelihood estimate and the prior mean.

Bayesian Approach for the Normal

If $\mu$ is the mean of a normal population, a reasonable prior distribution for $\mu$ could be another normal distribution, $N(\mu_0, \sigma_0^2)$. Then

$f(\mu \mid \bar{x}) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma\sigma_0\, g(\bar{x})}\, e^{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2} - \frac{(\mu-\mu_0)^2}{2\sigma_0^2}} \propto e^{-\frac{(\mu-\mu_1)^2}{2\sigma_1^2}}$,

so the posterior is again normal. Hence

$E[\mu \mid \bar{x}] = \bar{x}\left(\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\right) + \mu_0\left(\frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\right)$.

Bayesian Approach under Other Loss Functions

Note: If we use absolute error loss, the Bayes estimator is the median of the posterior distribution. If we use Linex loss, the Bayes estimator is $-\frac{1}{c}\ln E[e^{-c\theta}]$, the expectation taken over the posterior.

Calculating Mean Squared Error

If the loss function is squared error, then the mean squared error (MSE) measures expected loss:

$\mathrm{MSE} = E(\hat{\theta} - \theta)^2 = (E(\hat{\theta}) - \theta)^2 + E(\hat{\theta} - E(\hat{\theta}))^2$  (squared bias + variance).

For $\hat{\theta} = \frac{k}{n}$: $\mathrm{MSE} = 0 + \frac{p(1-p)}{n}$.

For $\hat{\theta} = \frac{k+1}{n+2}$: $\mathrm{MSE} = \left(\frac{1-2p}{n+2}\right)^2 + \frac{n\,p(1-p)}{(n+2)^2}$.

Figure: Comparing the MSE of the frequentist and Bayes estimators (plot not reproduced).

Bayesian Confidence (Credible) Interval

Let $\Theta$ and $X$ be the random variables representing the parameter of interest and the data, let $u(x)$ and $v(x)$ be arbitrary functions, and let $f(\theta \mid x)$ be the posterior distribution. Then

$P[u(x) < \Theta < v(x) \mid X = x] = \int_{u(x)}^{v(x)} f(\theta \mid x)\, d\theta$.

Definition (Credible interval): If $u(x)$ and $v(x)$ are picked so that this probability is 0.95, the interval $[u(x), v(x)]$ is a 95% credible interval for $\Theta$.

For the normal distribution,

$\frac{n\bar{x}\,\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2} \pm 1.96\sqrt{\frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2}}$

is a 95% credible interval for $\mu$.

Comparing Estimates

Frequentist and Bayesian estimates for $p$ in the binomial (a short computation reproducing the Bayes rows appears at the end of this section):

                  (k, n)     Estimate   95% interval      Width
Frequentist       (2, 3)     0.667      (0.125, 0.982)    0.857
                  (30, 45)   0.667      (0.509, 0.796)    0.287
                  (60, 90)   0.667      (0.559, 0.760)    0.201
Bayes, Beta(1,1)  (2, 3)     0.600      (0.235, 0.964)    0.729
                  (30, 45)   0.660      (0.527, 0.793)    0.266
                  (60, 90)   0.663      (0.567, 0.759)    0.191
Bayes, Beta(4,4)  (2, 3)     0.545      (0.270, 0.820)    0.550
                  (30, 45)   0.642      (0.514, 0.768)    0.254
                  (60, 90)   0.653      (0.560, 0.747)    0.187

Bayesian Hypothesis Testing

For the coin example, $H_0\colon p = \frac{1}{2}$ and $H_1\colon p \ne \frac{1}{2}$. Suppose $K = 14$ and $n = 17$.
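As a companion to the comparison table above, here is a minimal sketch, not from the original slides, that computes the Beta-posterior mean and a 95% credible interval for the binomial cases in the table; Python with SciPy is assumed. The slides do not say how their intervals were computed, so equal-tailed intervals are assumed here and the numbers may differ slightly from the table.

```python
from scipy import stats

def beta_posterior_summary(k, n, alpha=1.0, beta=1.0):
    """Posterior mean and 95% equal-tailed credible interval for Binomial(n, p)
    data with k successes under a Beta(alpha, beta) prior (conjugate update)."""
    posterior = stats.beta(alpha + k, beta + n - k)
    lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
    return posterior.mean(), lo, hi

for a, b in [(1, 1), (4, 4)]:
    for k, n in [(2, 3), (30, 45), (60, 90)]:
        mean, lo, hi = beta_posterior_summary(k, n, a, b)
        print(f"Beta({a},{b}) prior, k={k:2d}, n={n:2d}: "
              f"estimate {mean:.3f}, 95% interval ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

The same function can also be applied to the hypothesis-testing example above ($K = 14$, $n = 17$) to see where the posterior mass for $p$ sits relative to $\frac{1}{2}$.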