6 Finite Sample Theory of Order Statistics and Extremes


The ordered values of a sample of observations are called the order statistics of the sample, and the smallest and the largest are called the extremes. Order statistics and extremes are among the most important functions of a set of random variables that we study in probability and statistics. There is natural interest in studying the highs and lows of a sequence, and the other order statistics help in understanding the concentration of probability in a distribution, or equivalently, the diversity in the population represented by the distribution. Order statistics are also useful in statistical inference, where estimates of parameters are often based on suitable functions of the order statistics. In particular, the median is of very special importance. There is a well developed theory of the order statistics of a fixed number $n$ of observations from a fixed distribution, as well as an asymptotic theory where $n$ goes to infinity. We discuss the case of fixed $n$ in this chapter. Distribution theory for order statistics when the observations come from a discrete distribution is complex, both notationally and algebraically, because several observations may be exactly equal. These ties among the sample values make the distribution theory cumbersome. We therefore concentrate on the continuous case. Principal references for this chapter are the books by David (1980), Reiss (1989), Galambos (1987), Resnick (2007), and Leadbetter, Lindgren, and Rootzén (1983). Specific other references are given in the sections.

6.1 Basic Distribution Theory

Definition 6.1. Let $X_1, X_2, \cdots, X_n$ be any $n$ real-valued random variables, and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote their ordered values. Then $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$ are called the order statistics of $X_1, X_2, \cdots, X_n$.
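As a quick computational illustration (not part of the text), the order statistics of a sample are obtained simply by sorting it; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(size=10)      # a sample X_1, ..., X_n from U[0, 1]

order_stats = np.sort(x)      # X_(1) <= X_(2) <= ... <= X_(n)
sample_min = order_stats[0]   # the first order statistic, X_(1)
sample_max = order_stats[-1]  # the n-th order statistic, X_(n)
```

The extremes obtained this way agree with the componentwise minimum and maximum of the unsorted sample.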
Remark: Thus the minimum among $X_1, X_2, \cdots, X_n$ is the first order statistic, and the maximum the $n$th order statistic. The middle value among $X_1, X_2, \cdots, X_n$ is called the median. But it needs to be defined precisely, because there is really no middle value when $n$ is an even integer. Here is our definition.

Definition 6.2. Let $X_1, X_2, \cdots, X_n$ be any $n$ real-valued random variables. Then the median of $X_1, X_2, \cdots, X_n$ is defined to be $M_n = X_{(m+1)}$ if $n = 2m + 1$ (an odd integer), and $M_n = X_{(m)}$ if $n = 2m$ (an even integer). That is, in either case the median is the order statistic $X_{(k)}$, where $k$ is the smallest integer $\ge n/2$.

Example 6.1. Suppose $.3, .53, .68, .06, .73, .48, .87, .42, .89, .44$ are ten independent observations from the $U[0, 1]$ distribution. Then the order statistics are $.06, .3, .42, .44, .48, .53, .68, .73, .87, .89$. Thus $X_{(1)} = .06$, $X_{(n)} = .89$, and since $n/2 = 5$, $M_n = X_{(5)} = .48$.

An important connection to understand is the connection order statistics have with the empirical CDF, a function of immense theoretical and methodological importance in both probability and statistics.

Definition 6.3. Let $X_1, X_2, \cdots, X_n$ be any $n$ real-valued random variables. The empirical CDF of $X_1, X_2, \cdots, X_n$, also called the empirical CDF of the sample, is the function
$$F_n(x) = \frac{\#\{X_i : X_i \le x\}}{n},$$
i.e., $F_n(x)$ measures the proportion of sample values that are $\le x$ for a given $x$.

Remark: Therefore, by its definition, $F_n(x) = 0$ whenever $x < X_{(1)}$, and $F_n(x) = 1$ whenever $x \ge X_{(n)}$. It is also a constant, namely $k/n$, for all $x$-values in the interval $[X_{(k)}, X_{(k+1)})$. So $F_n$ satisfies all the properties of a valid CDF. Indeed, it is the CDF of a discrete distribution, which puts an equal probability of $\frac{1}{n}$ at the sample values $X_1, X_2, \cdots, X_n$. This calls for another definition.

Definition 6.4. Let $P_n$ denote the discrete distribution which assigns probability $\frac{1}{n}$ to each $X_i$. Then $P_n$ is called the empirical measure of the sample.

Definition 6.5.
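To make Definitions 6.2 and 6.3 concrete, here is a small Python sketch (an illustration, not from the text) that recomputes Example 6.1: the empirical CDF $F_n$ and the median $M_n$ for the ten $U[0,1]$ observations.

```python
import numpy as np

# The ten observations of Example 6.1
x = np.array([.3, .53, .68, .06, .73, .48, .87, .42, .89, .44])
n = len(x)
order_stats = np.sort(x)

def F_n(t):
    """Empirical CDF: the proportion of sample values <= t."""
    return np.sum(x <= t) / n

# Definition 6.2 with n = 2m = 10: the median is X_(m) = X_(5)
m = n // 2
M_n = order_stats[m - 1]      # 1-based order statistic m -> index m - 1
```

Here `F_n(0.5)` equals 0.5 (five of the ten values are at most 0.5) and `M_n` equals 0.48, matching the example.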
Let $Q_n(p) = F_n^{-1}(p)$ be the quantile function corresponding to $F_n$. Then $Q_n = F_n^{-1}$ is called the quantile function of $X_1, X_2, \cdots, X_n$, or the empirical quantile function.

We can now relate the median and the order statistics to the quantile function $F_n^{-1}$.

Proposition. Let $X_1, X_2, \cdots, X_n$ be $n$ random variables. Then
(a) $X_{(i)} = F_n^{-1}\left(\frac{i}{n}\right)$;
(b) $M_n = F_n^{-1}\left(\frac{1}{2}\right)$.

We now specialize to the case where $X_1, X_2, \cdots, X_n$ are independent random variables with a common density function $f(x)$ and CDF $F(x)$, and work out the fundamental distribution theory of the order statistics $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$.

Theorem 6.1. (Joint Density of All the Order Statistics) Let $X_1, X_2, \cdots, X_n$ be independent random variables with a common density function $f(x)$. Then the joint density function of $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$ is given by
$$f_{1,2,\cdots,n}(y_1, y_2, \cdots, y_n) = n! \, f(y_1) f(y_2) \cdots f(y_n) \, I_{\{y_1 < y_2 < \cdots < y_n\}}.$$

Proof: A verbal heuristic argument is easy to understand. If $X_{(1)} = y_1, X_{(2)} = y_2, \cdots, X_{(n)} = y_n$, then exactly one of the sample values $X_1, X_2, \cdots, X_n$ is $y_1$, exactly one is $y_2$, etc. But we can put any of the $n$ observations at $y_1$, any of the other $n - 1$ observations at $y_2$, etc., and so the density of $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$ is $f(y_1) f(y_2) \cdots f(y_n) \times n(n-1) \cdots 1 = n! \, f(y_1) f(y_2) \cdots f(y_n)$; and obviously, if the inequality $y_1 < y_2 < \cdots < y_n$ is not satisfied, then at such a point the joint density of $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$ must be zero.

Here is a formal proof. The multivariate transformation $(X_1, X_2, \cdots, X_n) \to (X_{(1)}, X_{(2)}, \cdots, X_{(n)})$ is not one-to-one, as any permutation of a fixed $(X_1, X_2, \cdots, X_n)$ vector has exactly the same set of order statistics $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$. However, fix a specific permutation $\{\pi(1), \pi(2), \cdots, \pi(n)\}$ of $\{1, 2, \cdots, n\}$ and consider the subset $A_\pi = \{(x_1, x_2, \cdots, x_n) : x_{\pi(1)} < x_{\pi(2)} < \cdots < x_{\pi(n)}\}$. Then the transformation $(x_1, x_2, \cdots, x_n) \to (x_{(1)}, x_{(2)}, \cdots, x_{(n)})$ is one-to-one on each such $A_\pi$, and indeed then $x_{(i)} = x_{\pi(i)}, \, i = 1, 2, \cdots, n$. The Jacobian of the transformation is 1 in absolute value on each such $A_\pi$.
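The proposition's identity $X_{(i)} = F_n^{-1}(i/n)$ can be checked numerically. A small sketch, again using the Example 6.1 data, implementing $Q_n(p)$ as the smallest sample value $t$ with $F_n(t) \ge p$ (a common inversion convention; an assumption here, since the text does not spell out the inversion rule):

```python
import numpy as np

x = np.array([.3, .53, .68, .06, .73, .48, .87, .42, .89, .44])
n = len(x)
order_stats = np.sort(x)

def Q_n(p):
    """Empirical quantile: the smallest sample value t with F_n(t) >= p."""
    for t in order_stats:
        if np.sum(x <= t) / n >= p:
            return t

# Part (a) of the proposition: X_(i) = F_n^{-1}(i/n) for every i
for i in range(1, n + 1):
    assert Q_n(i / n) == order_stats[i - 1]

# Part (b): the median is F_n^{-1}(1/2)
median = Q_n(1 / 2)
```

For this sample, `median` is 0.48, in agreement with Example 6.1.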
A particular vector $(x_1, x_2, \cdots, x_n)$ falls in exactly one $A_\pi$, and there are $n!$ such regions $A_\pi$, as we exhaust all the $n!$ permutations $\{\pi(1), \pi(2), \cdots, \pi(n)\}$ of $\{1, 2, \cdots, n\}$. By a modification of the Jacobian density theorem, we then get
$$f_{1,2,\cdots,n}(y_1, y_2, \cdots, y_n) = \sum_\pi f(x_1) f(x_2) \cdots f(x_n) = \sum_\pi f(x_{\pi(1)}) f(x_{\pi(2)}) \cdots f(x_{\pi(n)}) = \sum_\pi f(y_1) f(y_2) \cdots f(y_n) = n! \, f(y_1) f(y_2) \cdots f(y_n). \qquad \clubsuit$$

Example 6.2. (Uniform Order Statistics). Let $U_1, U_2, \cdots, U_n$ be independent $U[0, 1]$ variables, and $U_{(i)}, 1 \le i \le n$, their order statistics. Then, by our theorem above, the joint density of $U_{(1)}, U_{(2)}, \cdots, U_{(n)}$ is
$$f_{1,2,\cdots,n}(u_1, u_2, \cdots, u_n) = n! \, I_{\{0 < u_1 < u_2 < \cdots < u_n < 1\}}.$$

Once we know the joint density of all the order statistics, we can find the marginal density of any subset by simply integrating out the rest of the coordinates, being extremely careful to use the correct domain over which to integrate them. For example, if we want the marginal density of just $U_{(1)}$, that is, of the sample minimum, then we will want to integrate out $u_2, \cdots, u_n$, and the correct domain of integration is, for a given value $u_1 \in (0, 1)$ of $U_{(1)}$, the region $u_1 < u_2 < u_3 < \cdots < u_n < 1$. So we integrate in the order $u_n, u_{n-1}, \cdots, u_2$, to obtain
$$f_1(u_1) = n! \int_{u_1}^{1} \int_{u_2}^{1} \cdots \int_{u_{n-1}}^{1} du_n \, du_{n-1} \cdots du_3 \, du_2 = n(1 - u_1)^{n-1}, \quad 0 < u_1 < 1.$$

Likewise, if we want the marginal density of just $U_{(n)}$, that is, of the sample maximum, then we will want to integrate out $u_1, u_2, \cdots, u_{n-1}$, and now the answer is
$$f_n(u_n) = n! \int_0^{u_n} \int_0^{u_{n-1}} \cdots \int_0^{u_2} du_1 \, du_2 \cdots du_{n-1} = n u_n^{n-1}, \quad 0 < u_n < 1.$$

However, it is useful to note that for the special case of the minimum and the maximum, we could have obtained the densities much more easily and directly. Here is why. First take the maximum, and consider its CDF: for $0 < u < 1$,
$$P(U_{(n)} \le u) = P\left(\cap_{i=1}^n \{U_i \le u\}\right) = \prod_{i=1}^n P(U_i \le u) = u^n,$$
and hence the density of $U_{(n)}$ is $f_n(u) = \frac{d}{du}[u^n] = n u^{n-1}, \; 0 < u < 1$.
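The direct CDF argument for the maximum is easy to verify by simulation. A minimal Monte Carlo sketch (the sample size, evaluation point, and replication count below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, u = 5, 200_000, 0.7

# Each row is one sample of n i.i.d. U[0, 1] variables
samples = rng.uniform(size=(reps, n))
maxima = samples.max(axis=1)           # U_(n) in each replication

empirical = np.mean(maxima <= u)       # Monte Carlo estimate of P(U_(n) <= u)
exact = u ** n                         # the CDF derived above: u^n
```

With these settings the empirical and exact probabilities agree to roughly two decimal places.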
Likewise, for the minimum, for $0 < u < 1$, the tail CDF is
$$P(U_{(1)} > u) = P\left(\cap_{i=1}^n \{U_i > u\}\right) = (1 - u)^n,$$
and so the density of $U_{(1)}$ is
$$f_1(u) = \frac{d}{du}\left[1 - (1 - u)^n\right] = n(1 - u)^{n-1}, \quad 0 < u < 1.$$

[Figure: Densities of the minimum, median, and maximum of $U[0, 1]$ variables; $n = 15$.]

For a general $r$, $1 \le r \le n$, the density of $U_{(r)}$ works out to a Beta density:
$$f_r(u) = \frac{n!}{(r-1)! \, (n-r)!} \, u^{r-1} (1 - u)^{n-r}, \quad 0 < u < 1,$$
which is the $Be(r, n - r + 1)$ density.
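The $Be(r, n - r + 1)$ result can likewise be sanity-checked by simulation, using the standard fact that a $Be(a, b)$ variable has mean $a/(a + b)$, so that $E[U_{(r)}] = r/(n + 1)$. A sketch with $n = 15$ and $r = 8$ (the median, as in the figure):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, reps = 15, 8, 200_000

samples = rng.uniform(size=(reps, n))
u_r = np.sort(samples, axis=1)[:, r - 1]   # the r-th order statistic, U_(r)

simulated_mean = u_r.mean()
beta_mean = r / (n + 1)                    # mean of Be(r, n - r + 1); 0.5 here
```

With 200,000 replications the simulated mean lands within about a hundredth of the Beta mean.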