The Most Informative Order Statistic and Its Application to Image Denoising


Alex Dytso*, Martina Cardone†, Cynthia Rush‡
*New Jersey Institute of Technology, Newark, NJ 07102, USA. Email: [email protected]
†University of Minnesota, Minneapolis, MN 55404, USA. Email: [email protected]
‡Columbia University, New York, NY 10025, USA. Email: [email protected]

arXiv:2101.11667v1 [cs.IT] 27 Jan 2021

Abstract—We consider the problem of finding the subset of order statistics that contains the most information about a sample of random variables drawn independently from some known parametric distribution. We leverage information-theoretic quantities, such as entropy and mutual information, to quantify the level of informativeness and to rigorously characterize the amount of information contained in any subset of the complete collection of order statistics. As an example, we show how these informativeness metrics can be evaluated for a sample of discrete Bernoulli and continuous Uniform random variables. Finally, we unveil how our most informative order statistics framework can be applied to image processing applications. Specifically, we investigate how the proposed measures can be used to choose the coefficients of the L-estimator filter to denoise an image corrupted by random noise. We show that, both for discrete (e.g., salt-and-pepper) and continuous (e.g., mixed Gaussian) noise distributions, the proposed method is competitive with off-the-shelf filters, such as the median and the total variation filters, as well as with wavelet-based denoising methods.

I. INTRODUCTION

Consider a random sample $X_1, X_2, \ldots, X_n$ drawn independently from some known parametric distribution $p(x \mid \theta)$, where the parameter $\theta$ may or may not be known. Let the random variables (r.v.) $X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}$ represent the order statistics of the sample. In particular, $X_{(1)}$ corresponds to the minimum value of the sample, $X_{(n)}$ corresponds to the maximum value of the sample, and $X_{(n/2)}$ (provided that $n$ is even) corresponds to the median of the sample. We denote the collection of the random samples as $X^n := (X_1, X_2, \ldots, X_n)$, and we use $[n]$ to denote the collection $\{1, 2, \ldots, n\}$.

As illustrated by comprehensive survey texts [1], [2], order statistics have a broad range of applications, including survival and reliability analysis, life testing, statistical quality control, filtering theory, signal processing, robustness and classification studies, radar target detection, and wireless communication. In such a wide variety of practical situations, some order statistics – such as the minimum, the maximum, and the median – have been analyzed and adopted more than others. For instance, in the context of image processing (see also Section V), a widely employed order statistic filter is the median filter. However, to the best of our knowledge, there is no theoretical study that justifies why certain order statistics should be preferred over others. Although such a universal¹ choice can be justified when there is no knowledge of the underlying distribution, in scenarios where some knowledge is available a natural question arises: can we somehow leverage such knowledge to choose which is the "best" order statistic to consider?

The main goal of this paper is to answer the above question. Towards this end, we introduce and analyze a theoretical framework for performing 'optimal' order statistic selection that fills the aforementioned theoretical gap. Specifically, our framework allows us to rigorously identify the subset of order statistics that contains the most information about a random sample. As an application, we show how the developed framework can be used for image denoising to produce approaches that are competitive with off-the-shelf filters, as well as with wavelet-based denoising methods. Similar ideas also have the potential to benefit other fields where order statistics find application, such as radar detection and classification. With the goal of developing a theoretical framework for 'optimal' order statistic selection, in this work we are interested in answering the following questions:

(1) How much 'information' does a single order statistic $X_{(i)}$ contain about the random sample $X^n$, for each $i \in [n]$? We refer to the $X_{(i)}$ that contains the most information about the sample as the most informative order statistic.

(2) Let $S \subseteq [n]$ be a set of cardinality $|S| = k$, and let $X_{(S)} = \{X_{(i)}\}_{i \in S}$. Which subset of order statistics $X_{(S)}$ of size $k$ is the most informative with respect to the sample $X^n$?

(3) Given a set $S \subseteq [n]$ and the collection of order statistics $X_{(S)}$, which additional order statistic $X_{(i)}$, where $i \in [n]$ but $i \notin S$, adds the most information about the sample $X^n$?

One approach for defining the most informative order statistics, and the one that we investigate in this work, is to consider the mutual information as a base measure of informativeness. Recall that, intuitively, the mutual information between two variables $X$ and $Y$, denoted as $I(X; Y) = I(Y; X)$, measures the reduction in uncertainty about one of the variables given the knowledge of the other. Let $p(x, y)$ be the joint density of $(X, Y)$ and let $p(x), p(y)$ be the marginals. The mutual information is calculated as

$$I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy. \quad (1)$$

The base of the logarithm determines the units of the measure, and throughout the paper we use base $e$. Notice that there is a relationship between the mutual information and the differential entropy, namely,

$$I(X; Y) = h(Y) - h(Y \mid X), \quad (2)$$

where the entropy and the conditional entropy are defined as $h(Y) = -\int p(y) \log p(y) \, dy$ and $h(Y \mid X) = \iint p(x, y) \log \frac{p(x)}{p(x, y)} \, dy \, dx$. The discrete analogue of (1) replaces the integrals with sums, and (2) holds with the differential entropy $h(Y)$ replaced by its discrete version, denoted as $H(Y) = -\sum_{y} p(y) \log p(y)$.

In particular, if $X$ and $Y$ are independent – so knowing one delivers no information about the other – then the mutual information is zero. Differently, if $Y$ is a deterministic function of $X$ and $X$ is a deterministic function of $Y$, then knowing one gives us complete information about the other. If, additionally, $X$ and $Y$ are discrete, the mutual information is then the same as the amount of information contained in $X$ or $Y$ alone, as measured by the entropy $H(Y)$, since $H(Y \mid X) = 0$. If $X$ and $Y$ are continuous, the mutual information is infinite, since $h(Y \mid X) = -\infty$ (because $(X, X)$ is singular with respect to the Lebesgue measure on $\mathbb{R}^2$).

II. MEASURES OF INFORMATIVENESS OF ORDER STATISTICS

In this section, we propose several metrics, all of which leverage the mutual information as a base measure of informativeness. We start by considering the mutual information between the sample $X^n$ and any order statistic $X_{(i)}$, i.e., $I(X_{(i)}; X^n)$, and we find the index $i \in [n]$ that results in the largest mutual information. In the case of discrete r.v., we have

$$I(X_{(i)}; X^n) = H(X^n) - H(X^n \mid X_{(i)}).$$

Definition 1. Let $Z^n := (Z_1, Z_2, \ldots, Z_n)$ be a vector of i.i.d. standard Gaussian r.v. independent of $X^n = (X_1, X_2, \ldots, X_n)$. Let $S \subseteq [n]$ be defined as

$$S = \{(i_1, i_2, \ldots, i_k) : 1 \leq i_1 < i_2 < \ldots < i_k \leq n\},$$

with $|S| = k$. We define the following three measures of order statistic informativeness:

$$r_1(S, X^n) = I(X^n; X_{(S)}), \quad (3)$$
$$r_2(S, X^n) = \lim_{\sigma \to \infty} 2\sigma^2 I(X^n + \sigma Z^n; X_{(S)}), \quad (4)$$
$$r_3(S, X^n) = \lim_{\sigma \to \infty} 2\sigma^2 I(X^n; X_{(S)} + \sigma Z^k). \quad (5)$$

In Definition 1, the measure $r_1(S, X^n)$ computes the mutual information between a subset of order statistics $X_{(S)}$ and the sample $X^n$. The measure $r_2(S, X^n)$ computes the slope of the mutual information at $\sigma = \infty$: intuitively, as the noise becomes large, only the most informative $X_{(S)}$ should maintain the largest mutual information. The measure $r_3(S, X^n)$ is an alternative to $r_2(S, X^n)$, with the noise added to $X_{(S)}$ instead of $X^n$. The limits in (4) and (5) always exist, but may be infinite.

One might also consider measures similar to those in (4) and (5), but in the limit of $\sigma$ that goes to zero, namely

$$r_4(S, X^n) = \lim_{\sigma \to 0} \frac{I(X^n + \sigma Z^n; X_{(S)})}{\tfrac{1}{2} \log\left(1 + \tfrac{1}{\sigma^2}\right)}, \qquad r_5(S, X^n) = \lim_{\sigma \to 0} \frac{I(X^n; X_{(S)} + \sigma Z^k)}{\tfrac{1}{2} \log\left(1 + \tfrac{1}{\sigma^2}\right)}. \quad (6)$$

In particular, the intuition behind $r_4(S, X^n)$ is that the most informative set $X_{(S)}$ should have the largest increase in the mutual information as the observed sample becomes less noisy. The measure $r_5(S, X^n)$ is an alternative to $r_4(S, X^n)$ where the noise is added to $X_{(S)}$ instead of $X^n$. However, as we prove next, these measures evaluate to

$$r_4(S, X^n) = 0 \text{ (continuous and discrete r.v.)}, \qquad r_5(S, X^n) = \begin{cases} k, & \text{continuous r.v.,} \\ 0, & \text{discrete r.v.} \end{cases}$$

¹A large body of the literature has focused on analyzing information measures of the (continuous or discrete) parent population of order statistics (examples include the differential entropy [3], the Rényi entropy [4], [5], the cumulative entropies [6], the Fisher information [7], and the f-divergence [8]), and on trying to show universal (i.e., distribution-free) properties for such information measures; see for instance [5], [9], [8], [10].

The work of M. Cardone was supported in part by the U.S. National Science Foundation under Grant CCF-1849757.
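For discrete samples, the first of these measures has a simple closed form: since $X_{(S)}$ is a deterministic function of $X^n$, the conditional entropy $H(X_{(S)} \mid X^n)$ vanishes, so $I(X^n; X_{(S)}) = H(X_{(S)})$. The following sketch (not from the paper; $n = 5$ and $p = 0.3$ are illustrative choices) uses this identity to answer questions (1) and (2) for a Bernoulli sample by brute-force enumeration:

```python
import math
from itertools import combinations, product

def subset_informativeness(n: int, p: float, S: tuple) -> float:
    """I(X^n; X_(S)) in nats for n i.i.d. Bernoulli(p) samples.
    Since X_(S) is a deterministic function of the discrete sample X^n,
    the mutual information reduces to the joint entropy H(X_(S)),
    computed here by enumerating all 2^n possible samples."""
    pmf: dict = {}
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        prob = p**ones * (1.0 - p) ** (n - ones)
        # order statistics X_(i) for i in S: i-th smallest entries of the sample
        x_S = tuple(sorted(sample)[i - 1] for i in S)
        pmf[x_S] = pmf.get(x_S, 0.0) + prob
    return -sum(q * math.log(q) for q in pmf.values() if q > 0.0)

n, p = 5, 0.3

# Question (1): the most informative single order statistic.
single = {i: subset_informativeness(n, p, (i,)) for i in range(1, n + 1)}
best_i = max(single, key=single.get)

# Question (2): the most informative subset of size k = 2.
pairs = {S: subset_informativeness(n, p, S) for S in combinations(range(1, n + 1), 2)}
best_S = max(pairs, key=pairs.get)

print(best_i, best_S)
```

For these parameters the winner is not the median: a single Bernoulli order statistic is most informative when its probability of being 1, i.e. $P(\mathrm{Bin}(n, p) \geq n - i + 1)$, is closest to $1/2$, which depends on the underlying distribution. This is exactly the kind of distribution-dependent selection the framework is meant to formalize.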