
The Most Informative Order Statistic and its Application to Image Denoising

Alex Dytso⋆, Martina Cardone∗, Cynthia Rush†
⋆ New Jersey Institute of Technology, Newark, NJ 07102, USA, Email: [email protected]
∗ University of Minnesota, Minneapolis, MN 55404, USA, Email: [email protected]
† Columbia University, New York, NY 10025, USA, Email: [email protected]

Abstract—We consider the problem of finding the subset of order statistics that contains the most information about a sample of random variables drawn independently from some known parametric distribution. We leverage information-theoretic quantities, such as entropy and mutual information, to quantify the level of informativeness and rigorously characterize the amount of information contained in any subset of the complete collection of order statistics. As an example, we show how these informativeness metrics can be evaluated for a sample of discrete Bernoulli and continuous Uniform random variables. Finally, we unveil how our most informative order statistics framework can be applied to image processing applications. Specifically, we investigate how the proposed measures can be used to choose the coefficients of the L-estimator filter to denoise an image corrupted by random noise. We show that, both for discrete (e.g., salt-pepper noise) and continuous (e.g., mixed Gaussian noise) noise distributions, the proposed method is competitive with off-the-shelf filters, such as the median and the total variation filters, as well as with wavelet-based denoising methods.

I. INTRODUCTION

Consider a random sample X1, X2, . . . , Xn drawn independently from some known parametric distribution p(x|θ), where the parameter θ may or may not be known. Let the random variables (r.v.) X(1) ≤ X(2) ≤ . . . ≤ X(n) represent the order statistics of the sample. In particular, X(1) corresponds to the minimum value of the sample, X(n) corresponds to the maximum value of the sample, and X(n/2) (provided that n is even) corresponds to the median of the sample. We denote the collection of the random samples as X^n := (X1, X2, . . . , Xn), and we use [n] to denote the collection {1, 2, . . . , n}.

Some order statistics have traditionally been preferred over others. Although such a universal¹ choice can be justified when there is no knowledge of the underlying distribution, in scenarios where some knowledge is available a natural question arises: can we somehow leverage such knowledge to choose which order statistics are the "best" to consider? The main goal of this paper is to answer the above question. Towards this end, we introduce and analyze a theoretical framework for performing 'optimal' order statistic selection to fill the aforementioned theoretical gap. Specifically, our framework allows us to rigorously identify the subset of order statistics that contains the most information on a random sample. As an application, we show how the developed framework can be used for image denoising to produce approaches that are competitive with off-the-shelf filters, as well as with wavelet-based denoising methods. Similar ideas also have the potential to benefit other fields where order statistics find application, such as radar detection and classification. With the goal of developing a theoretical framework for 'optimal' order statistic selection, in this work we are interested in answering the following questions:

(1) How much 'information' does a single order statistic X(i) contain about the random sample X^n for each i ∈ [n]? We refer to the X(i) that contains the most information about the sample as the most informative order statistic.

(2) Let S ⊆ [n] be a set of cardinality |S| = k and let X(S) = {X(i)} for i ∈ S. Which subset of order statistics X(S) of size k is the most informative with respect to the sample X^n?

(3) Given a set S ⊆ [n] and the collection of order statistics X(S), which additional order statistic X(i), with i ∈ [n] but i ∉ S, adds the most information about the sample X^n?

As illustrated by comprehensive survey texts [1], [2], order

arXiv:2101.11667v1 [cs.IT] 27 Jan 2021 statistics have a broad of applications including survival One approach for defining the most informative order statis- and reliability analysis, life testing, statistical , tics, and the one that we investigate in this work, is to consider filtering theory, signal processing, robustness and classification the mutual information as a base measure of informativeness. studies, radar target detection, and wireless communication. In Recall that, intuitively, the mutual information between two such a wide variety of practical situations, some order statistics variables X and Y , denoted as I(X; Y ) = I(Y ; X), measures – such as the minimum, the maximum, and the median – have the reduction in uncertainty about one of the variables given been analyzed and adopted more than others. For instance, in the knowledge of the other. Let p(x, y) be the joint density the context of image processing (see also Section V), a widely of (X,Y ) and let p(x), p(y) be the marginals. The mutual employed order statistic filter is the median filter. However, to the best of our knowledge, there is not a theoretical study 1A large body of the literature has focused on analyzing information that justifies why certain order statistics should be preferred measures of the (continuous or discrete) parent population of ordered statistics (examples include the differential entropy [3], the Rényi entropy [4], [5], the cumulative entropies [6], the Fisher information [7], and the f- [8]) The work of M. Cardone was supported in part by the U.S. National Science and trying to show universal (i.e., distribution-free) properties for such Foundation under Grant CCF-1849757. information measures, see for instance [5], [9], [8], [10]. n information is calculated as Definition 1. Let Z := (Z1,Z2,...,Zn) be a vector n ZZ  p(x, y)  of i.i.d. standard Gaussian r.v. independent of X = I(X; Y ) = p(x, y) log dx dy. (1) (X ,X ,...,X ). Let S ⊆ [n] be defined as p(x)p(y) 1 2 n The base of the logarithm determines the units of the measure, S = {(i1, i2, . . . , ik) : 1 ≤ i1 < i2 < . . . < ik ≤ n}, and throughout the paper we use base e. Notice that there is a with |S| = k. We define the following three measures of order relationship between the mutual information and the differential statistic informativeness: entropy, namely, r (S,Xn) = I(Xn; X ), (3) I(X; Y ) = h(Y ) − h(Y |X), (2) 1 (S) n 2 n n r2(S,X ) = lim 2σ I(X + σZ ; X(S)), (4) where the entropy and the conditional entropy are de- σ→∞ n 2 n k R r3(S,X ) = lim 2σ I(X ; X(S) + σZ ). (5) fined as h(Y ) = − p(y) log p(y) dy, and h(Y |X) = σ→∞ RR p(x, y) log(p(x)/p(x, y))dy dx. The discrete analogue n of (1) replaces the integrals with sums, and (2) holds with In Definition 1, the measure r1(S,X ) computes the mutual the differential entropy h(Y ) being replaced with its discrete information between a subset of order statistics X(S) and n n P the sample X . The measure r2(S,X ) computes the slope version, denoted as H(Y ) = − y p(y) log p(y). In particular, if X and Y are independent – so knowing of the mutual information at σ = ∞: intuitively, as noise becomes large, only the most informative X(S) should maintain one delivers no information about the other – then the mutual n X the largest mutual information. The measure r3(S,X ) is an information is zero. 
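As a small numerical sanity check of the discrete analogue of (1), the sketch below enumerates a toy Bernoulli sample, builds the joint pmf of (X^n, X(i)), and evaluates the double sum directly; because X(i) is a deterministic function of X^n, the result equals H(X(i)). The function name and the toy parameters are ours, not part of the paper.

```python
import itertools
import math

def mi_sample_vs_order_stat(n=4, p=0.3, i=2):
    """Brute-force evaluation of the discrete analogue of (1): I(X^n; X_(i)).

    All 2^n Bernoulli(p) samples are enumerated, the joint pmf of
    (X^n, X_(i)) is tabulated, and the double sum is computed.  Since
    X_(i) is a function of X^n, the value coincides with H(X_(i)).
    """
    joint, marg_x, marg_y = {}, {}, {}
    for xs in itertools.product([0, 1], repeat=n):
        prob = math.prod(p if x == 1 else 1 - p for x in xs)
        y = sorted(xs)[i - 1]               # the i-th smallest entry
        joint[(xs, y)] = joint.get((xs, y), 0.0) + prob
        marg_x[xs] = marg_x.get(xs, 0.0) + prob
        marg_y[y] = marg_y.get(y, 0.0) + prob
    return sum(q * math.log(q / (marg_x[xs] * marg_y[y]))
               for (xs, y), q in joint.items() if q > 0)

print(mi_sample_vs_order_stat())   # in nats, base e as in the paper
```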
Differently, if is a deterministic function n Y Y X alternative to r2(S,X ), with noise added to X(S) instead of of and is a deterministic function of , then knowing n one gives us complete information on the other. If additionally, X . The limits in (4) and (5) always exist, but may be infinity. X and Y are discrete, the mutual information is then the same One might also consider similar measures as in (4) and (5), as the amount of information contained in X or Y alone, as but in the limit of σ that goes to zero, namely measured by the entropy, H(Y ), since H(Y |X) = 0. If X n n n I(X + σZ ; X(S)) and Y are continuous, the mutual information is infinite since r4(S,X ) = lim , σ→0 1 log(1 + 1 ) h(Y |X) = −∞ (because (X,X) is singular with respect to 2 σ2 n k (6) the Lebesgue measure on 2). n I(X ; X(S) + σZ ) R r5(S,X ) = lim . σ→0 1 1 2 log(1 + σ2 ) II.MEASURESOF INFORMATIVENESS OF ORDER n STATISTICS In particular, the intuition behind r4(S,X ) is that the most informative set X(S) should have the largest increase in the In this section, we propose several metrics, all of which mutual information as the observed sample becomes less noisy. leverage the mutual information as a base measure of infor- n n The measure r5(S,X ) is an alternative to r4(S,X ) where mativeness. We start by considering the mutual information the noise is added to X instead of Xn. However, as we n (S) between the sample X and any order statistic X(i), i.e., prove next, these measures evaluate to n I(X(i); X ) and find the index i ∈ [n] that results in the n largest mutual information. In the case of discrete r.v., we have r4(S,X ) = 0, continuous and discrete r.v.,  n n n n k, continuous r.v., I(X(i); X ) = H(X ) − H(X |X(i)) r5(S,X ) =  n  0, discrete r.v.. X X p(x(i), x ) = p(x , xn) log . (i) n Hence, these are not useful measures of information. n p(x(i))p(x ) x(i) x n Such an approach works only when the sample is composed Proof. To characterize r4(S,X ) in (6), recall that by the of discrete r.v. and does not work for continuous r.v. The processing inequality, if X → Y → Z is a Markov n n n reason for this is that, as highlighted in Section I, when Xn chain then I(X; Z|Y ) = 0. Now, since X + σZ → X → n n n is a collection of continuous r.v., then I(X ; Xn) = ∞ as X(S) is a Markov chain and I(X + σZ ; X(S)|X ) = 0, (i) n n n h(X |Xn) = −∞. we therefore have that I(X + σZ ; X(S)) = I(X + (i) n n This idea of using mutual information, however, can be σZ ; X ,X(S)). Then, by the chain rule of the mutual n n n n n n salvaged by introducing noise to the sample. For example, information, I(X + σZ ; X ,X(S)) = I(X + σZ ; X ) − n n n I(X + σZ ; X |X(S)), and, the informativeness of X(i) can be measured by considering n n n I(X(i); X + σZ ) where Z := (Z1,Z2,...,Zn) is a vector I(Xn + σZn; X ) n n (S) of i.i.d. Gaussian r.v. independent of X with σ being the r4(S,X ) = lim σ→0 1 log(1 + 1 ) noise . Next, based on the above discussion, 2 σ2 I(Xn + σZn; Xn) − I(Xn + σZn; Xn|X ) we propose three potential measures of informativeness of = lim (S) n σ→0 1 1 order statistics about the sample X , all based on the mutual 2 log(1 + σ2 ) information measure. n n = d(X ) − d(X |X(S)), where d(Xn) is known as the information dimension or Rényi Proof. For simplicity, we focus on the case V = ∅. The proof dimension [11], [12], namely for arbitrary V follows along the same lines. First, assume that Xn is a sequence of discrete r.v. Then, by using the relationship  n continuous r.v. 
d(Xn) = (7) between mutual information and entropy given in (2) we have, 0 discrete r.v.. n n I(X ; X(S)) = H(X(S)) − H(X(S)|X ) = H(X(S)), where k n the last equality uses that H(X |Xn) = 0 since X is Similarly, since (X(S) + σZ ) → X(S) → X is a Markov (S) (S) k n fully determined given the value of the sequence Xn. As chain with I(X(S) + σZ ; X(S)|X ) = 0, we obtain mentioned in Section II, if Xn is a sequence of continuous n k n n n I(X ; X(S) + σZ ) r.v. then I(X ; X(S)) = h(X(S)) − h(X(S)|X ) = ∞ since r5(S,X ) = lim 1 1 n n σ→0 h(X |X ) = −∞. This characterizes r1(S,X ) 2 log(1 + σ2 ) (S) n k We now characterize the measure r2(S,X ). We have that I(X(S); X(S) + σZ ) = lim = d(X(S)), σ→0 1 1 n 2 n n 2 log(1 + σ2 ) r2(S,X ) = 2 lim σ I(X + σZ ; X(S)) σ→∞ √ n n where d(·) is defined in (7). (a) I( snrX + Z ; X ) = 2 lim (S) snr→0 snr Remark 1. We emphasize that the scaling and Gaussian noise (b) d √ n n used above were not chosen artificially. It can be shown that = 2 I( snrX + Z ; X(S)) dsnr snr=0 any absolutely continuous perturbation with a finite Fisher (c)  n n n 2 n n n 2 information would result in equivalent limits [13]. Therefore, = E kX − E[X |Z ]k − kX − E[X |Z ,X(S)]k the choice of Gaussian noise was simply made for the ease of (d) = kXn − [Xn]k2− kXn − [Xn|X ]k2 , (14) exposition and the proof. E E E E (S) There are a few shortcomings of the measures just introduced. where the labeled equalities follow from: (a) defining snr = 2 For instance, the elements of the most informative set are not 1/σ and noting that I(aX; Y ) = I(X; Y ) for a constant a; ordered based on the amount of information that each element (b) using the fact that provides. Moreover, at this point, we are unable to quantify the f(snr) − f(0) d amount of information that an additional order statistic adds to lim = f(a) , snr→0 snr da a=0 a given collection X(S) of order statistics. These shortcomings √ n n can be remedied by considering a conditional version of the where f(a) = I( aX + Z ; X(S)) with f(0) = n measures introduced in Definition 1. I(Z ; X(S)) = 0; (c) using the generalized I-MMSE rela- n √ n n tionship [14, Thm. 10] since X(S) → X → ( snrX + Z ) Definition 2. Under the assumptions in Definition 1, let V ⊂ is a Markov chain; and (d) since Zn is independent of Xn. [n] such that S ∩ V = . Then, we define three conditional n ∅ To conclude the proof of r2(S,X ) in (12), we would like measures of order statistic informativeness: n n 2 to show that (14) is equal to E[kE[X ] − E[X |X(S)]k ]. We n n start by noting that r1(S,X |V) = I(X ; X(S)|X(V)), (8) n 2 n n  n n 2 r2(S,X |V) = lim 2σ I(X + σZ ; X(S)|X(V)), (9) k [X ] − [X |X ]k σ→∞ E E E (S) n 2 n k h n n n n 2i r3(S,X |V) = lim 2σ I(X ; X(S) + σZ |X(V)). (10) = ( [X ] − X ) + (X − [X |X ]) σ→∞ E E E (S)  n n 2  n n 2 III.CHARACTERIZATION OF THE INFORMATIVENESS = E kE[X ] − X k + E kX − E[X |X(S)]k  n n T n n  MEASURES + 2E (E[X ] − X ) (X − E[X |X(S)]) . (15) In this section, we characterize the measures of informative- Moreover, we note that ness of order statistics proposed in Definition 1 and Definition 2.  n n T n n  In particular, we have the following theorem. − 2E (E[X ] − X ) (X − E[X |X(S)])  n n T n n n T n  Theorem 1. Let S ⊆ [n] such that |S| = k, and V ⊂ [n] such = 2E (X − E[X ]) X − (X − E[X ]) E[X |X(S)] that S ∩ V = ∅. 
Then, the metrics in Definition 2 evaluate to (a)  n T n n  = 2E (X ) (X − E[X |X(S)])  H(X |X ), for discrete r.v., (b) r (S,Xn|V)= (S) (V) (11) = 2 kXn − [Xn|X ]k2 , (16) 1 ∞, otherwise, E E (S) n n n 2 r2(S,X |V)=E[kE[X |X(V)]−E[X |X(S),X(V)]k ], (12) where the labeled equalities follow from: (a) the fact that n 2 r3(S,X |V) = E[kX(S) − E[X(S)|X(V)]k ]. (13)  n T n n  E (E[X |) (X − E[X |X(S)]) n T  n n  Taking V = ∅ gives an evaluation of the metrics in Def- = (E[X |) E X − E[X |X(S)] n inition 1, namely r1(S,X ) = H(X(S)) for discrete r.v. = ( [Xn|)T [Xn] − [ [Xn|X ]] and r (S,Xn) = ∞ otherwise, r (S,Xn) = [k [Xn] − E E E E (S) 1 2 E E n T n n n 2 n 2 = ( [X |) ( [X ] − [X ]) = 0, E[X |X(S)]k ], and r3(S,X ) = E[kX(S) − E[X(S)]k ]. E E E where in the third equality we have used the law of total approach to use (i.e., which of the three questions raised in expectation; and (b) using the orthogonality principle [15], Section I is most relevant for the problem at hand). which states that [( [Xn|X ])T (Xn − [Xn|X ])] = 0. E E (S) E (S) IV. EVALUATION OF THE INFORMATIVENESS MEASURES By substituting (16) back into (15), we obtain A. Discrete Random Variables: The Bernoulli Case  n n 2 E kE[X ] − E[X |X(S)]k We assess the three measures in Theorem 1 for the case of a  n n 2  n n 2 = E kE[X ] − X k − E kX − E[X |X(S)]k , sample of discrete r.v. in Lemma 2 (proof in Appendix B). In particular, Lemma 2 studies the Bernoulli case, and in Section V which is precisely (14). Hence, r (S,Xn) = [k [Xn] − 2 E E we consider another discrete distribution with applications to [Xn|X ]k2]. E (S) image processing. The results presented here rely heavily on We now characterize r (S,Xn). It follows by the data 3 Lemma 5 in Appendix A-A to compute the joint distribution processing inequality, that I(X; Z) = I(X; Y ) for a Markov of k order statistics. chain X → Y → Z if I(X; Y |Z) = 0. Notice that k n n in our problem, (X(S) + σZ ) → X(S) → X forms a Lemma 2. Let X be sampled as i.i.d. Bernoulli with success k n 0 Markov chain with I(X(S) + σZ ; X(S)|X ) = 0. Thus, probability p. Let B be a Binomial(n, 1 − p) r.v. and B be a k n k I(X(S) + σZ ; X ) = I(X(S) + σZ ; X(S)). Therefore, Binomial(n − 1, 1 − p) r.v. Then, n 2 n k r (i, Xn) = h (P (B < i)), (20) r3(S,X ) = lim 2σ I(X ; X(S) + σZ ) 1 b σ→∞ 2 2 2 k n np h 0 i = lim 2σ I(X(S); X(S) + σZ ) r2(i, X ) = P (B < i) σ→∞ P (B < i)  2 2 = E kX(S) − E[X(S)]| , np h i2 + P (B0 ≥ i) − np2, (21) where the last limit is a standard result and can for example P (B ≥ i) n be found in [16, Corollary 2]. r3(i, X ) = P (B < i)P (B ≥ i), (22) By leveraging Theorem 1, we can now construct procedures where hb(t) := −t log(t) − (1 − t) log(1 − t) is the binary that answer the three questions raised in Section I. Specifically, entropy function. given m ∈ [3], we propose the following three approaches: Remark 2. Consider x(1 − x) for x ∈ (0, 1), which is (1) Marginal Approach: Generate one set of cardinality k symmetric and convex with the maximum occurring at x = 1/2. n according to Thus, r3(i, X ) in (22) is maximized by the i such that P (B ≥ i) or 1 − P (B ≥ i) is as close to 1/2 as possible. S¯M = {(i , . . . , i ): r (i ,Xn) ≥ ... ≥ r (i ,Xn), m 1 k m 1 m k Hence, the maximizer i is a median of B, namely, 1 ≤ i1 < . . . < ik ≤ n}. (17) ? n n i3(X ) = arg max r3(i, X ) ¯M i∈[n] This approach generates an ordered set Sm of indices of order n 1 o (23) statistics, listed from the (first) most informative to the k-th = arg min P (B ≥ i) − . 
most informative, and quantifies the amount of information i∈{bn(1−p)c,dn(1−p)e} 2 that an individual order statistic contains about the sample. Moreover, since the binary entropy function hb(t) is increasing (2) Joint Approach: Generate one set of cardinality k with on 0 < t ≤ 1/2 and decreasing on 1/2 ≤ t < 1, the maximizer n ¯J n for r1(i, X ) in (20) will also be given by (23). Sm ∈ arg max rm(S,X ). (18) S⊆[n], |S|=k When Xn is sampled i.i.d. Bernoulli with probability p, the ¯J ‘information’ in the order statistics 0 ≤ X(1) ≤ X(2) ≤ ... ≤ Now Sm contains the indices of the k order statistics that are the most informative about the sample. X(n) ≤ 1 is simply the counts of 0’s and 1’s present in the data. In terms of the order statistics, the ‘information’ lies in (3) Sequential Approach: Generate one set of cardinality k the location of the switch point (if there is one), i.e., the i according to n where X(i) = 0 but X(i+1) = 1. Since we expect E[X ] = np ¯S Sm = {(i1, . . . , ik): of the samples to take the value 1, the switch point is expected n n to occur at round(n(1 − p)), and Remark 2 (at least for r (·, ·) rm(it,X |Vt−1) ≥ max rm(j, X |Vt−1), 1 j∈[n]:j∈V / t−1 and r3(·, ·)) tells us that the ‘most informative’ order statistic Vt = (i1, . . . , it), t ∈ [k], V0 = ∅}. (19) is where we expect the switch point to occur. In the next proposition, we further show that, as the sample size grows, S¯S This approach produces an ordered set, m, of indices of order the most informative order statistic significantly dominates the i statistics where t is the most informative order statistic given other statistics for measures (20) and (22). that the information of t − 1 order statistics has already been incorporated (captured by the conditioning term). Proposition 3. Let Xn be i.i.d. Bernoulli with success proba- ¯M ¯J ¯S In the next section we show that the sets Sm , Sm and Sm bility p ∈ (0, 1). For any c ∈ (0, 1) independent of n, we obtain may not be the same, even in simple cases. Thus, the application  n log(2), c = (1 − p), lim r1(bcnc,X ) = of interest and target analysis should guide the choice of which n→∞ 0, otherwise,  n 1/4, c = (1 − p), lim r3(bcnc,X ) = 2 n→∞ 0, otherwise. n a i(n + 1 − i) r3(i, X ) = 2 , and (24b) The same result holds when b·c is replaced by d·e. (n + 1) (n + 2)     Proof. From de Moivre-Laplace theorem [17], we know that ? n ? n n + 1 n + 1 i2(X ) = i3(X ) ∈ , . (24c) for B ∼ Binomial(n, 1 − p) the distribution of B√−n(1−p) 2 2 np(1−p) converges to the standard . Hence, Remark 3. Lemma 4 also encompasses the case where Xn is sampled as i.i.d. U(a, b) for general a < b since the mutual lim P (B < bcnc) n n n→∞ information, which characterizes r2(i, X ) and r3(i, X ) (see ! B − n(1 − p) bcnc − n(1 − p) Definition 1), has the property that I(X + c; Y ) = I(X; Y ) = lim P < when c is some constant. n→∞ pnp(1 − p) pnp(1 − p) !  1, c > (1 − p), Remark 4. For c ∈ (0, 1) independent of n, metrics r2(·, ·) bcnc − n(1 − p)  = lim Φ = 1/2, c = (1 − p), and r3(·, ·) have the following behaviors as n goes to infinity: n→∞ p np(1 − p)  n 2 0, c < (1 − p), lim r2(bcnc,X ) = a c(1 − c)/4, n→∞ n 2 where Φ(·) is the cumulative distribution function of the lim n · r3(bcnc,X ) = a c(1 − c). standard normal. Inserting the limit above into the expressions n→∞ for r1(·, ·) and r3(·, ·) in (20) and (22) completes the proof. We conclude this section by again considering the sets of ¯M ¯J ¯S most informative order statistics Sm , Sm and Sm in (17)–(19). 
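The three selection rules in (17)-(19) can be made concrete for the Bernoulli case, where Theorem 1 reduces r1 to the entropy of the selected order statistics. The following sketch (function names and tie-breaking are ours, not the authors' code) evaluates H(X(S)) exactly through the distribution of the number of zeros and then applies the marginal, joint, and sequential rules.

```python
import itertools
import math

def entropy_of_order_subset(S, n, p):
    """r1(S, X^n) = H(X_(S)) for an i.i.d. Bernoulli(p) sample (Theorem 1).

    For Bernoulli data the sorted sample is a block of 0's followed by a
    block of 1's, so X_(S) is determined by how many indices in S the
    zero-count K ~ Binomial(n, 1 - p) reaches; H(X_(S)) is the entropy of
    that discretisation of K.
    """
    pmf_K = [math.comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(n + 1)]
    cells = {}
    for k, q in enumerate(pmf_K):
        cell = sum(k >= i for i in S)        # which switch-point region K falls in
        cells[cell] = cells.get(cell, 0.0) + q
    return -sum(q * math.log(q) for q in cells.values() if q > 0)

def select_sets(n, p, k):
    """Toy versions of the marginal (17), joint (18) and sequential (19) rules."""
    h1 = {i: entropy_of_order_subset([i], n, p) for i in range(1, n + 1)}
    marginal = sorted(h1, key=h1.get, reverse=True)[:k]
    joint = max(itertools.combinations(range(1, n + 1), k),
                key=lambda S: entropy_of_order_subset(S, n, p))
    sequential = []
    for _ in range(k):
        best = max((i for i in range(1, n + 1) if i not in sequential),
                   key=lambda i: entropy_of_order_subset(sequential + [i], n, p))
        sequential.append(best)
    return marginal, list(joint), sequential

print(select_sets(n=19, p=0.5, k=4))
```

For n = 19 and p = 0.5 this recovers sets of the kind reported in Section IV-A, up to how ties between equally informative indices are broken.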
In the above, we focused our analysis on the single most Specifically, for an i.i.d. sample uniform on (0, a) with a = 1 ¯M informative order statistic. We now want to consider sets Sm , and n = 5, the sets of sizes k ∈ [4] are given by ¯J ¯S Sm and Sm defined in (17)–(19). For simplicity, we consider ¯M measure r1(·, ·) and an i.i.d. Bernoulli sample of size n = 19 S3 → {3}, {3, 2}, {3, 2, 4}, {3, 2, 4, 1}; with p = 0.5. Then for set sizes k ∈ [4] we find ¯J S3 → {3}, {3, 2}, {3, 2, 4}, {3, 2, 4, 1}; ¯M ¯S S1 → {10}, {10, 9}, {10, 9, 11}, {10, 9, 11, 8}; S3 → {3}, {3, 5}, {3, 5, 1}, {3, 5, 1, 4}. ¯J S1 → {10}, {9, 11}, {10, 8, 12}, {10, 8, 12, 9}; Similarly to the discrete case, we see that it is possible for the ¯S S1 → {10}, {10, 8}, {10, 8, 12}, {10, 8, 12, 9}. approaches to result in different sets. To interpret the above, ¯M consider only the k = 2 collection. From S3 , we know that Notice that the three sets can all be different (e.g., when k = 2) the 3rd statistic (the median) is the most informative and the and we find that this difference becomes more drastic when any nd ¯J 2 is the second most. By S3 , the same order statistics form of the following occurs: n increases, the size of the r.v. support ¯S the most informative pair. However, from S3 , we know that increases, or the distribution becomes more asymmetric. To given the most informative (the 3rd), the 5th provides the most interpret the above, consider only the k = 2 collection. From additional information. ¯M th S1 , we know that the 10 statistic is the most informative and the 9th is the second most. However, the pair of most V. APPLICATIONS th th ¯J ¯S informative statistics is the 9 and 11 by S1 . From S1 , In this section, we show how the informativeness framework we know that, given the most informative (the 10th), the 8th for order statistics just developed can be used in image provides the most additional information. processing applications. We begin by reviewing some of the details about order statistics filters, which represent a class of B. Continuous Random Variables: The Uniform Case non-linear filters. Now we look at an example for a sample of continuous random variables in Lemma 4 (proof in Appendix C). Remem- A. Order Statistics Filtering ber that, from Theorem 1, we have that the metric r1(·, ·) is Consider the following discrete-time filter, referred to as an infinity for continuous r.v., and hence we here focus on r2(·, ·) L-estimator in the remainder of the paper. and r (·, ·). In particular, Lemma 4 studies a Uniform sample, 3 Definition 3. Define a filter and in Section V we consider another continuous distribution n with applications to image processing. Throughout this section X we use Lemma 6, in Appendix A-B, to compute the joint Yt = αkX(k), t ∈ Z, (25) distribution of k order statistics. k=1 where: (i) X , for k ∈ [n], is the k-th order statistic of an i.i.d. Lemma 4. Let Xn be sampled as i.i.d. U(0, a) for a > 0, i.e., (k) sequence X , for i ∈ [n+1]; (ii) n is the filtering window sampled i.i.d. uniform on the interval (0, a) and, for k ∈ {2, 3} t+i−1 ? n n width; and (iii) αk ≥ 0’s, for k ∈ [n], are the coefficients of define ik(X ) = arg maxi∈[n] rk(i, X ). Then, Pn the filter such that k=1 αk = 1. This filter is known as an a2i(n + 1 − i) L-estimator in [18] and as an order statistics r (i, Xn) = , (24a) 2 4n(n + 2) filter in image processing [19], [20]. The general form of the L-estimator encompasses a large number of linear and non-linear filters. 
Examples are: 1) moving-average filter: αk = 1/n, for all k ∈ [n]; 2) median filter: (by considering odd values of n) αk = 1 for k = (n + 1)/2 and αk = 0 for k 6= (n + 1)/2; 3) maximum filter: αn = 1 and αk = 0 for k 6= n; 4) minimum filter: α1 = 1 and αk = 0 for k 6= 1; 5) midpoint filter: α1 = αn = 1/2 and αk = 0 for k 6= 1, n; 6) r-th ranked-order filter: αr = 1 and αk = 0 for k 6= r. The L-estimator in Definition 3 has been extensively studied in the literature [21], [22], [1], [23], [24], [25], [26], [27]. A comprehensive survey of their applications and, more generally, of order statistics is given in [1]. It is important to highlight that the L-estimator in (25) forms a restricted class of estimators, and, as such, it is possible that other estimators, like the maximum likelihood, may have better efficiency. Nonetheless, Fig. 1: Test image. it was shown in [23] that for a certain choice of weights, the estimator in (25) attains the Cramér-Rao bound asymptotically and, hence, is asymptotically efficient. For an excellent survey that it can be applied to both continuous and discrete models. on L-estimators, the interested reader is referred to [24]. Moreover, reliance on the MSE can be avoided, and signal The optimal choice of the coefficients in (25) has received fidelity can instead be measured using alternative quantities like considerable attention in the context of scale-and-shift models. the entropy. Our goal is to show that selecting the L-estimator Specifically, suppose that the Xi’s are generated i.i.d. according coefficients using the most informative order statistics is a x−λ viable and competitive approach, worth further exploration. We to a cumulative distribution function, F ( σ ), where the , λ, and the scaling parameter, σ, are compare the performance of the proposed L-estimator to that unknown. The best unbiased estimator of (λ, σ) under the of several state-of-the-art denoising methods such as the total squared error (MSE) criterion was found in [25]. This variation filter [30], and three different implementations of approach, however, requires computation and inversion of the wavelet-based filters namely empirical Bayes [31], Stein’s matrices of order statistics and is often prohibitive. Unbiased Estimate of Risk (SURE) [32] and False Discovery To overcome this, the authors of [26] proposed a choice of Rate (FDR) [33]. coefficients resulting in an approximately minimum , Our simulations use the image in Fig. 1, which has N = 2 while depending only on F (·) and the probability density (512) pixels. As there is no universally-used performance function (pdf), and only requiring inversion of a 2 × 2 matrix. metric for image reconstruction, we consider several well- Our interest in this work lies in applications of order statistics known ones: (i) the MSE normalized by N; (ii) the peak to image processing, where the median filter is the most popular signal-to-noise ratio (PSNR), measured in dB; (iii) the structural choice [27]. The work in [20] also applies the L-estimator to similarity (SSIM) index [34], taking values between 0 and 1 image processing in a setting where the image is assumed to where 1 is perfect reconstruction; and (iv) the image quality be corrupted by additive noise and the optimal MSE estimator index IQI [35], taking values between −1 and 1 where 1 is of [25] was used. A comprehensive survey of applications of perfect reconstruction. order statistics to digital image processing can be found in [19]. B. 
Image Denoising in Salt and Pepper Noise For image processing, using a parametric scale-and-shift We analyze gray scale image denoising where pixels are model might be too simplistic as it only models additive noise typically 8-bit data values ranging from 0 (black) to 255 and a variety of widely-used image processing noise models, (white). We use an observation model where an unknown such as salt and pepper or speckle noise, cannot be modeled as pixel x ∈ [0 : 255] is corrupted by the salt and pepper noise. additive. Moreover, the majority of the distortions encountered Let P (x is corrupted by pepper noise|x is noisy) = ρ and in practice are discrete in nature, and hence one needs to 1 P (x is noisy) = ρ. We model the noisy observation X with a work with discrete, instead of continuous, order statistics. probability mass function (pmf): Another issue that arises with the aforementioned approaches to choosing the optimal coefficients in (25) is the use of the P (x corrupted by pepper noise)=P (X =0)=ρ1ρ, (26a) MSE as the fidelity criterion. Indeed, it turns out that the MSE P (x noise-free) = P (X = x) = 1 − ρ, (26b) is not a good approximation of the human perception of image P (x )=P (X =255)=(1−ρ )ρ. fidelity [28], [29]. Thus, coefficients that are optimal for the corrupted by salt noise 1 MSE might not be the best choice if the goal is to optimize (26c) the human perceptual criterion for image quality. In the above, ρ corresponds to the percentage of pixels We will use the measures in Section III to choose the L- corrupted by noise, and ρ1 is the percentage of pixels corrupted estimator coefficients. This approach benefits from the fact by pepper noise. The pseudocode in Algorithm 1 summarizes our general 1 image denoising algorithm based on the L-estimator. In particular, we use a square-shaped window of size w × w 0.8 ) to sample the pixels of an image. Moreover, if ρ1 and ρ are n 0.6

unknown, their estimates can be computed as i, X

( 0.4 1 PN 1 r ρˆ = t=1 {Xt=0} 0.2 1 PN (1{X =0} + 1{X =255}) t=1 t t (27) 0 N 0 5 10 15 1 X ρˆ = (1 + 1 }), N {Xt=0} {Xt=255 i t=1 0.8 where 1{·} is the indicator function, and N is the number of pixels in the image. The estimators in (27) perform well if the 0.6 original image contains very few pixel values exactly equal to 0.4

0 and 255, but since these are the extremes of possible pixel pmf values, this is often reasonable to assume. 0.2 Choosing r1(·, ·) as the performance metric offers several benefits. First, the received data Xn for n = w2 is discrete, and 0 hence entropy is a natural choice for informativeness measure. 0 50 100 150 200 250 Second, the measures r2(·, ·) and r3(·, ·) depend on the values Support of X n of the support of X . Thus, one would need to specify the n Fig. 2: ρ = 0.3; ρ1 = 0.05. Above: r1(i, X ) for i ∈ [n], n = 16; value of the unknown parameter x in (26). In contrast, the Below: pmf of X. measure r1(·, ·) does not depend on the support values but only on the relative positions of the support points. Hence, the Low-Noise Regime, ρ < 0.5. In this regime, the noise-free parameter x can be left unspecified, and we only assume that pixels are the most common or typical. Now, recall that the it lies in the range [0 : 255]. entropy can be interpreted as the average rate at which a stochastic source produces information, where typical events Algorithm 1 Image denoising based on the L-estimator. are assigned less weight than extreme probability events. Hence, n Input: Image; Size w of the square-shaped window; Probabil- we expect that r1(i, X ) is smaller for values of i that fall in the middle chunk of samples (that consists of noise-free pixels) ities ρ1 and ρ or their estimates in (27). Output: Reconstructed image. compared to values of i corresponding to other samples. Hence, in this regime, we choose the coefficients of the L-estimator 1: Set the length of the sequence n = w2. Sample the square window of size w × w and collect the samples in a vector to be inversely proportional to r1(·, ·) as shown in (28) for of length n. This constitutes the noisy sequence Xn = ρ < 0.5, where the normalization is needed to ensure that the estimator is unbiased. {Xi, for i ∈ [n]}. n As an example, we consider a low-noise regime with ρ = 0.3 2: Compute r1(k, X ) in Theorem 1 for all k ∈ [n] by using (26). and ρ1 = 0.05, where we expect that roughly 30% of the image is corrupted by noise and the noise is mostly salt. In Fig. 2, 3: Compute the coefficients αk’s for all k ∈ [n] for the L- n estimator in Definition 3 as follows: we plot the measure r1(i, X ) for i ∈ [n] and n = 16 (i.e., 4 × 4 window). Observe that in this regime, approximately −1 n r1 (k, X ) 0.24 samples are corrupted by pepper noise, 4.56 samples are If ρ < 0.5, assign αk = n , P r−1 (i, Xn) 11.2 i=1 1 (28) corrupted by salt noise, and samples are noise-free. r (k, Xn) We now show that in the low-noise regime, our procedure in otherwise, assign α = 1 . k Pn n Algorithm 1 competes with some of the state-of-the-art filters. i=1 r1(i, X ) The simulation results are presented in Fig. 3. In the simulation, 4: Apply the L-estimator in Definition 3 to the samples in estimated values of the parameters ρ and ρ are used to train Step 1 with the coefficients in (28). 1 the L-estimator. The estimates are computed as in (27) and are given by ρˆ = 0.3007 and ρˆ1 = 0.0508 (recall the true We now explain our choice of the coefficients in (28) for the values are ρ = 0.3 and ρ1 = 0.05). The coefficients of the low-noise regime, i.e., ρ < 0.5, and for the high-noise regime, L-estimator in (25) are computed by using (28) for ρ < 0.5 i.e., ρ ≥ 0.5. We start by noting that in the ordered sample where the values of r1(·, ·) are those in Fig. 2. From Fig. 
3, 2 X(1),...,X(n) with n = w , approximately: (i) the first ρ1ρn we observe that we have the following performance across the samples are corrupted by pepper noise; (ii) the middle chunk four considered metrics: of samples of length (1−ρ)n consists of noise-free pixels; and MSE: Total Variation L-Estimator Median Filter (iii) the last chunk of samples of length (1 − ρ1)ρn consists of pixels corrupted by salt noise. Avg. Filter E. Bayes Filter FDR Filter SURE Filter, (a) Noisy Image. (b) Average Filter. (c) Median Filter. (d) Total Variation Filter. (e) L-Estimator in (28). MSE=0.022, PSNR=10.510, MSE=0.007, PSNR= 15.398, MSE=0.003, PSNR=19.375, MSE = 1.74 · 10−4, PSNR=31.537, MSE = 6.95 · 10−4, PSNR=25.525, SSIM=0.099, IQI=0.037. SSIM=0.366, IQI=0.062. SSIM=0.560, IQI=0.664. SSMI=0.956, IQI=0.130. SSMI=0.914, IQI=0.779.

(f) FDR filter: MSE=0.012, PSNR=13.235, SSIM=0.318, IQI=0.042. (g) SURE filter: MSE=0.013, PSNR=12.972, SSIM=0.195, IQI=0.045. (h) Bayes filter: MSE=0.011, PSNR=13.643, SSIM=0.303, IQI=0.044.

Fig. 3: Denoising salt & pepper noise with ρ = 0.3, ρ1 = 0.05.
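To make the pipeline concrete, here is a minimal NumPy sketch of Algorithm 1 for the salt-and-pepper model in (26) with the coefficient rule (28). The helper names, the small ε-guard against zero-entropy order statistics, and the reflect padding are our choices rather than the authors' implementation; ρ and ρ1 can be the true values or the estimates from (27), and the clean pixel value x is assumed to lie strictly between 0 and 255.

```python
import numpy as np
from math import comb, log

def binom_sf(n, p, k):
    """P(Binomial(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def r1_salt_pepper(n, rho, rho1):
    """r1(k, X^n) = H(X_(k)), k = 1..n, under the pmf (26).

    X_(k) = 0 iff at least k pixels are pepper, X_(k) = 255 iff at least
    n - k + 1 pixels are salt, and X_(k) = x otherwise; the entropy does
    not depend on the clean value x (assumed strictly inside (0, 255)).
    """
    p_pepper, p_salt = rho1 * rho, (1.0 - rho1) * rho
    r1 = np.zeros(n)
    for k in range(1, n + 1):
        p0 = binom_sf(n, p_pepper, k)            # P(X_(k) = 0)
        p255 = binom_sf(n, p_salt, n - k + 1)    # P(X_(k) = 255)
        probs = [p0, p255, max(1.0 - p0 - p255, 0.0)]
        r1[k - 1] = -sum(q * log(q) for q in probs if q > 0)
    return r1

def l_estimator_denoise(img, w, rho, rho1):
    """Sketch of Algorithm 1: slide a w x w window, sort, weight by (28)."""
    n = w * w
    r1 = r1_salt_pepper(n, rho, rho1)
    eps = 1e-12                                  # guard against degenerate r1 values
    weights = 1.0 / (r1 + eps) if rho < 0.5 else r1 + eps
    alpha = weights / weights.sum()              # coefficients in (28)
    pad = w // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    out = np.empty(img.shape, dtype=float)
    H, W = img.shape                             # grayscale image assumed
    for i in range(H):
        for j in range(W):
            window = np.sort(padded[i:i + w, j:j + w], axis=None)
            out[i, j] = alpha @ window           # L-estimator (25)
    return np.clip(out, 0, 255)
```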

PSNR: Total Variation ≻ L-Estimator ≻ Median Filter ≻ Avg. Filter ≻ E. Bayes Filter ≻ FDR Filter ≻ SURE Filter,

SSIM: Total Variation ≻ L-Estimator ≻ Median Filter ≻

Avg. Filter FDR Filter E. Bayes Filter SURE Filter, i, X ( 0.5 1 IQI: L-Estimator Median Filter Total Variation r Avg. Filter SURE Filter E. Bayes Filter FDR Filter, 0 where, for a given metric M, the notation A B that 0 10 20 30 A outperforms B when M is considered. The fact that the i median outperforms the total variation when the IQI metric is considered stems from the fact that the median filter allows for a better edge recovery compared to the total variation 0.4 filter. Moreover, the L-estimator outperforms the median for

all considered metrics, and has a competitive performance to pmf 0.2 that of the total variation filter (i.e., the performance is slightly worse over the MSE, PSNR and SSIM metrics, but significantly better over the IQI metric). Finally, the L-estimator outperforms 0 the wavelet-based filters over all metrics. 0 100 200 300 High-Noise Regime, ρ ≥ 0.5. Arguably, the noise-dominated Support of X regime is the most interesting case both theoretically and n Fig. 4: ρ = 0.7; ρ1 = 0.3. Above: r1(i, X ) for i ∈ [n], n = 36; practically. Consider, ρ = 0.7 and ρ1 = 0.3, where we expect Below: pmf of X. that 70% of the image is corrupted by mostly salt noise. In n the majority of the samples is corrupted. We have the following Fig. 4, we plot r1(i, X ) for n = 36 (i.e., 6 × 6 window). Here, approximately 7.56 samples are corrupted by pepper performance across the metrics: noise, 17.64 samples are corrupted by salt noise, and 10.8 MSE, PSNR: L-Estimator FDR filter = E. Bayes filter samples are noise-free. Thus, noisy pixels are the most common, = SURE filter Avg. Filter Total Variation, which is a fundamental difference from the low-noise regime, and justifies our choice of the L-estimator coefficients in (28) SSIM: E. Bayes filter FDR filter Total Variation for ρ ≥ 0.5. In other words, these coefficients are chosen to SURE filter Avg. Filter Median Filter, be directly proportional to r1(·, ·). The performance of the IQI: Total Variation FDR filter = E. Bayes filter proposed filter is evaluated in Fig. 5 (top (a)-(h)), where the L-Estimator Avg. Filter SURE filter, estimates of ρ and ρ1 are computed from (27) and given by ρˆ = 0.7003 and ρˆ1 = 0.2995. where, for a given metric M, the notation A B means that We observe that the median filter performs the worst for A outperforms B when M is considered. The result above all the four considered image quality metrics, except for the suggests that the L-estimator is very much competitive with SSIM metric where it outperforms the L-estimator. This is the total variation filter and wavelet-based filters, and most of expected since the median filter performance degrades once the time it also outperforms the average filter. It is also worth (a) Noisy Image. (b) Average Filter. (c) Median Filter. (d) Total Variation Filter. (e) L-Estimator in (28). MSE=0.055, PSNR=6.548, MSE=0.015, PSNR=12.177, MSE=0.032, PSNR=8.958, MSE= 0.021, PSNR=10.666, MSE=0.010, PSNR=14.043, SSIM=0.010, IQI=0.007. SSIM=0.205, IQI=0.019. SSIM=0.190, IQI=0.016. SSIM=0.750, IQI=0.136. SSIM= 0.114, IQI=0.021.

(f) FDR filter: MSE=0.014, PSNR=12.593, SSIM=0.770, IQI=0.021. (g) SURE filter: MSE=0.014, PSNR=12.620, SSIM=0.726, IQI=0.016. (h) Empirical Bayes Filter: MSE=0.014, PSNR=12.599, SSIM=0.771, IQI=0.021.

(i) Noisy Image: MSE=0.072, PSNR=5.384, SSIM=0.010, IQI=0.005. (j) Median Filter: MSE=0.088, PSNR=4.517, SSIM=0.010, IQI=0.000. (k) Total Variation Filter: MSE=0.073, PSNR=5.336, SSIM=0.017, IQI=0.005. (l) L-Estimator in (28): MSE=0.018, PSNR=11.505, SSIM=0.061, IQI=0.019. (m) Sequential L-Estimator in (29): MSE=0.016, PSNR=11.933, SSIM=0.048, IQI=0.019.

(n) FDR filter. (o) SURE filter. (p) Empirical Bayes Filter. MSE=0.044, PSNR=7.487, MSE=0.048, PSNR=7.166, MSE=0.045, PSNR=7.445, SSIM=0.408, SSIM=0.498, IQI=0.016. SSIM=0.078, IQI=0.006. IQI=0.009.

Fig. 5: Denoising salt & pepper noise. ρ = 0.7, ρ1 = 0.3 top (a)-(h); ρ = 0.8, ρ1 = 0.9 bottom (i)-(p). noting that visually the total variation filter appears to have coefficients. We highlight that d is introduced for computational the worst performance across all the four filters in terms of purposes to speed the simulations, and with reference to Fig. 5 recovering the shapes, but this observation is not captured by we have d = 4. Fig. 5 shows that the total variation and median the SSIM and IQI metrics. filters perform on the level of the noisy image. We also note In the extremely high-noise regime, we observe through that the L-estimator with coefficients as in (29) offers better extensive simulations that the L-estimator performs significantly MSE and PSNR metrics than the L-estimator with coefficients better than the total variation filter and the wavelet-based as in (28), but performs either the same or worse for IQI and filters. The two bottom rows of Fig. 5 (i)-(p) show the filters SSIM metrics. Finally, we highlight that the L-estimator based performance for n = 16 in an extremely noisy setting where on the joint approach in (18) was also simulated, and observed ρ = 0.8 and ρ1 = 0.9. Here, we expect that 80% of the image to offer a similar performance to the sequential L-estimator is corrupted by noise, and this perturbation is dominated by in (28). The performance of all filters is as follows: pepper noise. The estimates of ρ and ρ1 are computed from (27) MSE: Seq. L-Estimator L-Estimator FDR filter and given by ρˆ = 0.7990 and ρˆ1 = 0.8992. In addition to the already used filters, in this regime Fig. 5 also shows the L- E. Bayes filter SURE filter Total Variation, estimator performance where the coefficients are chosen based PSNR: Seq. L-Estimator L-Estimator FDR filter on the sequential approach, discussed in (19), i.e., for i ∈ S¯S, k 1 E. Bayes filter SURE filter Total Variation, n r1(ik,X |Vk−1) αk = , when k ≤ d, and SSIM: FDR filter E. Bayes filter SURE filter Pd r (i ,Xn|V ) k=1 1 k k−1 (29) L-Estimator Seq. L-Estimator Total Variation, α = 0, k > d, k when IQI: Seq. L-Estimator L-Estimator FDR filter where Vk−1 is the set that contains the first k − 1 indices of E. Bayes filter SURE filter Total Variation. ¯S S1 and d is the truncation parameter. The idea is to choose the k-th coefficient by conditioning on the information that has The above suggests that the L-estimator is very much com- been already incorporated into the previously selected k − 1 petitive with the total variation filter and wavelet-based filters. (a) Noisy Image. (b) Mean Filter. (c) Median Filter. (d) Total Variation Filter . (e) L-Estimator. MSE=0.032, PSNR=8.917, MSE=0.015, PSNR=12.141, MSE=0.026, PSNR=9.794, MSE=0.029, PSNR=9.382, MSE=0.014, PSNR=12.533, SSIM=0.020, IQI=0.005. SSIM=0.221, IQI=0.012. SSIM=0.056, IQI=0.002. SSIM=0.237, IQI=-0.001. SSIM=0.627, IQI=0.021.

(f) FDR filter: MSE=0.016, PSNR=11.960, SSIM=0.780, IQI=0.048. (g) SURE filter: MSE=0.016, PSNR=11.977, SSIM=0.775, IQI=0.016. (h) Empirical Bayes filter: MSE=0.016, PSNR=11.960, SSIM=0.781, IQI=0.048.
Fig. 6: Denoising mixed Gaussian noise with µ = [−2, 2], σ² = [0.15, 0.1] and p = [0.5, 0.5].

In particular, L-estimators perform better than wavelet-based denoisers over the MSE, PSNR and IQI metrics, and better than the total variation denoiser over all metrics.

C. Image Denoising in Additive Continuous Noise

( −1 Now we consider image denoising under the signal model 3 10 r X = x + Z, where x is the unknown pixel value and Z is random noise. We consider two example noise distributions, 10−2 Cauchy and mixed Gaussian. In particular, here we focus on 0 5 10 15 20 25 the mixed Gaussian case, and an in the next subsection we i will focus on the case when Z is Cauchy. We also performed simulations for Gaussian Z and observed that the total variation filter always outperforms our proposed L-estimator. We believe 0.8 this is due to the fact that the total variation filter was designed for Gaussian noise perturbation. 0.6 With mixed Gaussian noise, our denoising works as in pdf 0.4 Algorithm 1, but the coefficients of the L-estimator are now 2 chosen with respect to the r3(·, ·) measure : 0.2 r (k, Xn) α = 3 , (30) 0 k Pn n −5 0 5 i=1 r3(i, X ) x where n = 25 (i.e., 5 × 5 window). The Gaussian mixture n has two components with means µ = [−2, 2], σ2 = Fig. 7: r3(i, X ), for i ∈ [n] and n = 25 (top), and pdf of the mixed Gaussian Z with µ = −2 2 , [0.15, 0.1] and weights p = [0.5, 0.5]. Fig. 7 shows r (i, Xn). 3 σ2 = 0.15 0.1 and p = 0.5 0.5 (bottom). Fig. 6 shows all filters performance for this setting, assuming known µ, σ and p. The performance of all filters is as follows: Here the L-estimator always outperforms the total variation filter. Moreover, it outperforms all filters over the MSE and MSE: L-Estimator Avg. Filter SURE filter PSNR metrics and its performance is comparable to those of E. Bayes filter = FDR filter Median Filter, wavelet-based filters over the SSIM and IQI metrics. PSNR: L-Estimator Avg. Filter SURE filter D. Cauchy Noise Distribution FDR filter E. Bayes filter Median Filter, Now we consider a continuous noise model as discussed SSIM: E. Bayes filter FDR filter SURE filter in Section V-C. We let the noise, Z, be distributed according L-Estimator Avg. Filter Median Filter, to a . This is a heavy tail distribution that IQI: E. Bayes filter FDR filter L-Estimator models impulsive noise, which occurs commonly in image SURE filter Avg. Filter Median Filter. processing applications [36]. In the presence of Cauchy noise, our denoising algorithm works as in Algorithm 1, however, the 2 coefficients of the L-estimator in (25) are now chosen with Simulations were performed also for the r2(·, ·) measure and observed to have similar performance as for the r3(·, ·) measure. respect to the r3(·, ·) measure as in (30). (a) Noisy Image. (b) Mean Filter. (c) Median Filter. (d) Total Variation Filter. (e) L-Estimator. MSE=0.052, PSNR=6.765, MSE=0.021, PSNR=10.810, MSE=0.052, PSNR=6.765, MSE=0.007, PSNR=15.682, MSE=0.004, PSNR=18.471, SSIM=0.690, IQI=0.513 SSIM=0.765, IQI=0.529. SSIM=0.690, IQI=0.653. SSIM=0.887, IQI=0.710. SSIM=0.770, IQI=0.732. Fig. 8: Denoising Cauchy noise with parameter γ = 0.0002.
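Since, by Theorem 1 (with V = ∅ and a single index), r3(k, X^n) is simply the variance of the k-th order statistic and is invariant to the unknown shift x, the coefficients in (30) can be estimated by plain Monte Carlo. The sketch below (function names and sample sizes are ours, not the authors' implementation) does this for the Gaussian-mixture noise of this section.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_gaussian_noise(size):
    """Two-component Gaussian mixture used in Section V-C (Fig. 6 and Fig. 7)."""
    comp = rng.integers(0, 2, size=size)                     # weights p = [0.5, 0.5]
    means = np.where(comp == 0, -2.0, 2.0)                   # mu = [-2, 2]
    stds = np.where(comp == 0, np.sqrt(0.15), np.sqrt(0.1))  # sigma^2 = [0.15, 0.1]
    return rng.normal(means, stds)

def r3_coefficients(noise_sampler, n=25, trials=200_000, x=128.0):
    """Monte Carlo sketch of the coefficient rule (30).

    Simulate windows X = x + Z, sort each window, estimate Var(X_(k)) per
    position (this is r3(k, X^n) by Theorem 1), and normalize into
    L-estimator weights.
    """
    windows = x + noise_sampler((trials, n))
    order_stats = np.sort(windows, axis=1)        # row-wise order statistics
    r3 = order_stats.var(axis=0)                  # estimates of Var(X_(k))
    return r3 / r3.sum()

alpha = r3_coefficients(mixed_gaussian_noise)     # weights for the 5x5 window
```

For heavy-tailed noise such as the Cauchy model of Section V-D, the extreme positions have infinite variance (r3(1, X^n) = r3(n, X^n) = ∞, as noted with Fig. 9), so they should be excluded or handled through closed-form expressions rather than raw simulation.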

continuous Uniform random samples. As an application, the proposed measures have been used to choose the coefficients of the L-estimator filter to denoise an image corrupted by random noise. To show the utility of our approach, several examples of various noise mechanisms (e.g., salt and pepper,

mixed Gaussian) have been considered, and the proposed filters have been shown to be competitive with off-the-shelf filters (e.g., median, total variation and wavelet).

APPENDIX A
JOINT DISTRIBUTION OF k ORDERED STATISTICS
A. Discrete Random Variables

Lemma 5. Let X1, X2, . . . , Xn be i.i.d. r.v. from a discrete distribution with cumulative distribution function F(x). Let S = {(i1, i2, . . . , ik) : 1 ≤ i1 < i2 < . . . < ik ≤ n} and let P(X(S) = x(S)) := P(∩i∈S {X(i) = x(i)}), where x(i) denotes the observation associated to index i. Then, P(X(S) =

x(S)) is non-zero only if x(i1) ≤ x(i2) ≤ ... ≤ x(ik) and when 0 this is true we have that −1 −0.5 0 0.5 1  x P X(S) = x(S) ·10−3 k   Fig. 9: Cauchy random variable Z with x0 = 0 and γ = 0.0002. X v−1 X − n = F(S)(x(S)) − (−1) F(I,Ic)(x(I), x(Ic)) , Top: r3(i, X ), for i ∈ [n] with n = 25; Bottom: pdf of Z. v=1 I⊆S |I|=v Using location parameter, x0 = 0, and , n (31a) γ = 0.0002, in Fig. 9 we plot r3(i, X ) for i ∈ {2, . . . , n − 1}  −  and n = 25 (i.e., 5×5 window) and the pdf of Z. We highlight F(I,Ic) x , x(Ic) n n (I) that r3(1,X ) = r3(n, X ) = ∞, which is due to the infinite n k  Pk  variance of the Cauchy distribution. However, r3(i, X ) < ∞ X Y n − u=j+1 tu − = g(t[k], {x(I) ∪ x(Ic)}), (31b) for i ∈ {2, . . . , n − 1}, as we observe from Fig. 9. tj t ∈T j=1 Fig. 8 shows the performance of all the four filters for the [k] g(t , y )=[1−F (y )]tk [F (y )−F (y )]tk−1 ... case where the Cauchy scale parameter is given by γ = 0.0002, [k] (S) (ik) (ik) (ik−1) Pk and it is assumed to be known. In this example, the L-estimator t1 n− u=1 tu [F (y(i2)) − F (y(i1))] [F (y(1))] , (31c) has the best performance as compared to all other filters across Pk all four metrics, except the SSIM where the total variation where T = {t[k] ≥ 0 : m=j tm ≤ n − ij, ∀j ∈ [k]}, with filter has a slightly better performance. It is also important to t[k] = {t1, t2, . . . , tk}, tu ≥ 0 ∀u ∈ [k]. note that the MSE and PSNR metrics might not be meaningful Proof. For all t ∈ [k], we define the event in this case since the Cauchy noise has infinite variance. c (At) = {X(it) = x(it) | X(it) ≤ x(it)}, (32) VI.CONCLUSION c This work has proposed an information-theoretic framework where (·) denotes the complement of the event. First notice for finding the order statistic that contains the most information that by De Morgan’s Law we have that  about the random sample. Specifically, the work has proposed P X(S) = x(S) | X(S) ≤ x(S) three different information-theoretic measures to quantify the = P ∩k (A )c | X ≤ x  informativeness of order statistics. As an example, all three t=1 t (S) (S)  k c  measures have been evaluated for discrete Bernoulli and = P ∪t=1At |X(S) ≤ x(S) k  − = 1 − P ∪t=1At | X(S) ≤ x(S) . (33) Equivalently, we also note that F(I,Ic)(x(I), x(Ic)) can be computed as the probability that: Next we study the probability on the right side of (33). First, • i ∈ I j ∈ [k] (n − i ) applying the inclusion-exclusion principle and, for any subset For all j with there are at most j observations greater than or equal to x(ij ); I ⊆ S, defining the event AI := ∩i∈I Ai, we find c • For all it ∈ I with t ∈ [k] there are at most (n − it) k  P ∪t=1At | X(S) ≤ x(S) observations greater than x(it). k Thus, computing P (X = x ) boils down to computing X  X  (S) (S) = (−1)t−1 P A | X ≤ x  . (34) − I (S) (S) F(I,Ic)(x(I), x(Ic)) for all subsets I ⊆ S. Finally, simple t=1 I⊆S − counting techniques are used to show that F c (x , x c ) |I|=t (I,I ) (I) (I ) is equal to (31b) with the function g(·, ·) is defined in (31c). Next notice that P (X | Y) = P (X , Z | Y) for Z ⊆ Y. Then, This concludes the proof of Lemma 5. for any set I ⊆ S, denoting Ic = S\I, B. Continuous Random Variables P (AI | X ≤ x ) (S) (S) We state a lemma from [37] that computes the joint = P (AI ,X(Ic) ≤ x(Ic) | X(S) ≤ x(S)) distribution of k order statistics, and is the counterpart of = P (X(I) < x(I),X(Ic) ≤ x(Ic) | X(S) ≤ x(S)), (35) Lemma 5 for the case of continuous random variables. where in the last equality we use the definition of A’s from (32). 
Lemma 6. Let X1,X2,...,Xn be i.i.d. r.v. from an abso- Now combining (33)-(35), we have that lutely continuous distribution with cumulative distribution function F (x) and probability density function f(x). Let P (X(S) = x(S) | X(S) ≤ x(S)) = 1− S = {(i1, i2, . . . , ik) : 1 ≤ i1 < i2 < . . . < ik ≤ n} and k X t−1 X (−1) P (X

(36) denotes the observation associated to index i. Then, fX(S) (x(S)) is non-zero only if −∞ < x < x < . . . < x < ∞, We now note that the event in the conditioning in (36), namely, (i1) (i2) (ik) and, when this is true, its expression is given by X(S) ≤ x(S), is a superset of the other event considered, X(S) = x(S). It therefore follows that by multiplying both fX(S) (x(S)) sides of (36) by P (X(S) ≤ x(S)), we obtain our probability k k+1 of interest. In other words, Y Y  it−it−1−1 = g(n, i(S)) f(x(it)) F (x(it)) − F (xit−1 ) ,  t=1 t=1 P (X(S) ≤ x(S))P X(S) = x(S) | X(S) ≤ x(S)  where x(i ) = −∞, x(i ) = +∞, and, with i0 = 0 and = P X(S) = x(S) and X(S) ≤ x(S) 0 k+1 ik+1 = n + 1, = P (X(S) = x(S)). n! g(n, i ) = . Using the above in (36), we find a representation for P (X(S) = (S) Qk+1 t=1 (it − it−1 − 1)! x(S)) as: APPENDIX B P (X = x ) = P (X ≤ x ) (S) (S) (S) (S) PROOFOF LEMMA 2 k X v−1 X  First, for any i ∈ [n], by Lemma 5, we have − (−1) P X(I) < x(I),X(Ic) ≤ x(Ic) . (37) n v=1 I⊆S X n |I|=v P (X = 0) = (1 − p)kpn−k, (i) k We finally note that the probability on the right side of (37) k=i is equal to the result given in (31a), which can be seen by P (X(i) = 1) = 1 − P (X(i) = 0). defining, F(S)(x(S)) := P (X(S) ≤ x(S)), and for all I ⊆ S, Thus, X(i) is Bernoulli distributed with success probability   −  v(i), i.e., X(i) ∼ Ber(v(i)), where F(I,Ic) x(I), x(Ic) :=P X(I)

observations less than or equal to x(it). random variable. n We first consider the measure r1(i, X ) = H(X(i)) where APPENDIX C the equality follows by Theorem 1. Since X(i) ∼ Ber(v(i)), PROOFOF LEMMA 4 the entropy is given by 0 1 If Xis are i.i.d. ∼ U(0, a), then a X(i) ∼ Beta(i, n − i + 1) n r1(i, X ) = H(X(i)) = hb(v(i)), (39) with mean and variance given by 2 where hb(t) := −t log(t) − (1 − t) log(1 − t) is defined to be ai a i(n + 1 − i) E[X(i)] = , and Var(X(i)) = . (44) the binary entropy function. n + 1 (n + 1)2(n + 2) n Next, we focus on the metric r3(i, X ). By Theorem 1, we Thus, by Theorem 1, we have n 2 have r3(i, X ) = E[(X(i) − E[X(i)]) ] = Var(X(i)). By the n  2 result just discussed, X(i) ∼ Ber(v(i)) and therefore r3(i, X ) = E (X(i) − E[X(i)]) a2i(n + 1 − i) Var(X(i)) = v(i)(1 − v(i)) = P (B < i)P (B ≥ i). (40) = Var(X(i)) = 2 . n (n + 1) (n + 2) Finally, we study the measure r2(i, X ). We have n n  n n 2 By taking the first derivative of r3(i, X ) above with respect r2(i, X ) = E kE[X ] − E[X |X(i)]k ? n to i and equating it to zero, we obtain i3(X ) as in (24c). n n X  2 We now compute r2(i, X ). Using (12), we have = E (E[Xj] − E[Xj|X(i)]) . (41) n  n n 2 j=1 r2(i, X ) = E kE[X ] − E[X |X(i)]k n Now consider just a single term inside the sum in (41): X = ( [X ] − [X |X ])2 . (45)  2  2 E E j E j (i) E (E[Xj] − E[Xj|X(i)]) = E (p − E[Xj|X(i)]) j=1 2  2   = p + E (E[Xj|X(i)]) − 2pE E[Xj|X(i)] Now we look at computing the expectation E[Xj|X(i) = x(i)].  2 2 = E (E[Xj|X(i)]) − p . (42) By the law of total expectation, Moreover, we notice that E[Xj|X(i) = x(i)]  2 2   E (E[Xj|X(i)]) = P (X(i) = 1) E[Xj|X(i) = 1] = E Xj|X(i) = x(i), {Xj = X(i)} P (Xj = X(i)) 2   + P (X(i) = 0) E[Xj|X(i) = 0] . (43) + E Xj|X(i) = x(i), {Xj < X(i)} P (Xj < X(i)) + X |X = x , {X > X } P (X > X ). (46) With the above in mind, we study the expectations E[Xj|X(i) = E j (i) (i) j (i) j (i) 1] = P (Xj = 1|X(i) = 1) and E[Xj|X(i) = 0] = P (Xj = Now we simplify the three terms of the above. First notice 1|X(i) = 0). First, by Bayes rule, that the probabilities can be computed using the fact that any P (X(i) = 1|Xj = 1)P (Xj = 1) Xj is equally likely to produce the i-th order statistic, so P (Xj = 1|X(i) = 1) = P (X(i) = 1) 1 P (Xj = X(i)) = , p · P (X(i) = 1|Xj = 1) n = . i − 1 v(i) P (X < X ) = , j (i) n Now we study the probability P (X(i) = 1|Xj = 1). First n − i notice that this equals the probability that there are at least P (Xj > X(i)) = . 0 n n n − i + 1 total 1 s in the sample X , given that Xj = 1, or in Next we compute the expectations in (46). Clearly, other words, this equals the probability that there are at least   n − i total 10s from the n − 1 other sample values (excluding E Xj|X(i) = x(i), {Xj = X(i)} = x(i). Moreover, we note the jth one). Using this rationale, that Xj is independent of the event {X(i) = x(i)} given {Xj > X } and hence n−1   (i) X n − 1 n−1−k k P (X(i) = 1|Xj = 1) = (1 − p) p     a + x(i) k E Xj|X(i), {Xj > X(i)} = E Xj|Xj > x(i) = . k=n−i 2 i−1 X n − 1 Similarly, = (1 − p)kpn−1−k = P (B0 < i), k     x(i) k=0 X |X , {X < X } = X |X < x = . E j (i) j (i) E j j (i) 2 where B0 ∼ Binomial(n−1, 1−p). Putting this all together, we p 0 Plugging these results into (46), we find have that E[Xj|X(i) = 1] = v(i) P (B < i). Similar reasoning, and the fact that P (X(i) = 0|Xj = 1) = 1 − P (X(i) = 2nE[Xj|X(i) = x(i)] = 2x(i) + (i − 1)x(i) + (n − i)(a + x(i)) p 0 1|Xj = 1), shows that [Xj|X(i) = 1] = P (B ≥ i). E 1−v(i) = (1 + n)x(i) + a(n − i). 
(47) Now, plugging the above results into the work in (41)-(43), 2 2 Now we use the result in (47) to simplify (45). First, n np 0 2 np 0 2 2 r2(i, X ) = [P (B < i)] + [P (B ≥ i)] − np , n v(i) 1 − v(i) n X  2 r2(i, X ) = E (E[Xj] − E[Xj|X(i)]) where recall that v(i) = P (B < i). j=1 n X = ( [X ])2 − 2 [X ]  [X |X ] + ( [X |X ])2 [9] K. M. Wong and S. Chen, “The entropy of ordered sequences and order E j E j E E j (i) E E j (i) statistics,” IEEE Transactions on Information Theory, vol. 36, no. 2, pp. j=1 276–284, 1990. n [10] N. Ebrahimi, E. S. Soofi, and H. Zahedi, “Information properties of 2 X  2 order statistics and spacings,” IEEE Transactions on Information Theory, = −n(E[X1]) + E (E[Xj|X(i)]) , vol. 50, no. 1, pp. 177–183, 2004. j=1 [11] A. Guionnet and D. Shlyakhtenko, “On classical analogues of free entropy dimension,” Journal of Functional Analysis, vol. 251, no. 2, pp. 738–771, where in the final equality we have used E[Xj]E[E[Xj|X(i)]] = 2 2 2 2007. (E[Xj]) and that (E[Xj]) = (E[X1]) for all j ∈ [n]. [12] Y. Wu and S. Verdú, “Optimal phase transitions in compressed sensing,” 2 2 Therefore, using that n(E[X1]) = na /4 and plugging the IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6241–6263, result in (47) into the above, we have 2012. [13] D. Guo, S. Shamai, and S. Verdú, “Additive non-Gaussian noise channels: 2 Mutual information and conditional mean estimation,” in Proceedings. n −na 1 h 2i r2(i, X ) = + E (1 + n)X(i) + a(n − i) International Symposium on Information Theory (ISIT), 2005, pp. 719– 4 4n 723. −na2 (n + 1)2  a(n − i)2 = + X + [14] ——, “Mutual information and minimum mean-square error in Gaussian 4 4n E (i) n + 1 channels,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005. −na2 = [15] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation 4 Theory. Prentice Hall, 1997. (n + 1)2  a(n − i)2 a(n − i) [16] V. V. Prelov and S. Verdú, “Second-order asymptotics of mutual 2 information,” IEEE Transactions on Information Theory, vol. 50, no. 8, + E[X(i)] + + 2E[X(i)] . 4n n + 1 n + 1 pp. 1567–1580, 2004. [17] W. Feller, An Introduction to and its Applications. Next, note that by (44), John Wiley & Sons, 2008, vol. 2. 2 2 [18] P. J. Huber, Robust Statistics. John Wiley & Sons, 2004, vol. 523. E[X(i)] = Var[X(i)] + (E[X(i)]) [19] I. Pitas and A. N. Venetsanopoulos, “Order statistics in digital image a2i(n + 1 − i) a2i2 a2i(i + 1) processing,” Proceedings of the IEEE, vol. 80, no. 12, pp. 1893–1921, = + = . 1992. (n + 1)2(n + 2) (n + 1)2 (n + 1)(n + 2) [20] A. Bovik, T. Huang, and D. Munson, “A generalization of median filtering using linear combinations of order statistics,” IEEE Transactions on Therefore, using the above and E[X(i)] = ai/(n + 1), Acoustics, Speech, and Signal Processing, vol. 31, no. 6, pp. 1342–1350, 1983. n r2(i, X ) [21] R. Viswanathan, “Order statistics application to CFAR radar target −na2 detection,” Handbook of Statistics, vol. 17, pp. 643–671, 1998. = [22] H.-C. Yang and M.-S. Alouini, Order Statistics in Wireless Communi- 4 cations: Diversity, Adaptation, and Scheduling in MIMO and OFDM (n + 1)2  a2i(i + 1) a2(n − i)2 2a2i(n − i) Systems. Cambridge University Press, 2011. [23] H. Chernoff, J. L. Gastwirth, and M. V. Johns, “Asymptotic distribution + + 2 + 2 4n (n + 1)(n + 2) (n + 1) (n + 1) of linear combinations of functions of order statistics with applications a2[(n + 1)i(i + 1) + (n + i)(n − i)(n + 2) − n2(n + 2)] to estimation,” The Annals of , vol. 38, no. 1, pp. = 52–72, 1967. 
4n(n + 2) [24] J. Hosking, “L-estimation,” Handbook of Statistics, vol. 17, pp. 215–235, a2i(n + 1 − i) 1998. = , [25] E. Lloyd, “Least-squares estimation of location and scale parameters 4n(n + 2) using order statistics,” Biometrika, vol. 39, no. 1/2, pp. 88–95, 1952. ? n [26] G. Blom, “Nearly best linear estimates of location and scale parameters,” which has maximum value for i2(X ) as reported in (24c). Contributions to Order Statistics, vol. 3446, 1962. [27] J. Tukey, “Nonlinear (nonsuperposable) methods for smoothing data,” REFERENCES Proc. Cong. Rec. EASCOM’74, pp. 673–681, 1974. [1] C. R. Rao and V. Govindaraju, Handbook of Statistics. Elsevier, 2006, [28] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A vol. 17. new look at signal fidelity measures,” IEEE Signal Processing Magazine, [2] H. A. David and H. N. Nagaraja, Order Statistics, Third edition. John vol. 26, no. 1, pp. 98–117, 2009. Wiley & Sons, 2003. [29] T. N. Pappas, R. J. Safranek, and J. Chen, “Perceptual criteria for image [3] S. Baratpour, J. Ahmadi, and N. R. Arghami, “Some characterizations quality evaluation,” Handbook of Image and Video Processing, vol. 110, based on entropy of order statistics and record values,” Communications 2000. in Statistics-Theory and Methods, vol. 36, no. 1, pp. 47–57, 2007. [30] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based [4] ——, “Characterizations based on rényi entropy of order statistics and noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, record values,” Journal of Statistical Planning and Inference, vol. 138, no. 1-4, pp. 259–268, 1992. no. 8, pp. 2544–2551, 2008. [31] I. M. Johnstone, B. W. Silverman et al., “Needles and straw in haystacks: [5] M. Abbasnejad and N. R. Arghami, “Renyi entropy properties of order Empirical bayes estimates of possibly sparse sequences,” The Annals of statistics,” Communications in Statistics-Theory and Methods, vol. 40, Statistics, vol. 32, no. 4, pp. 1594–1649, 2004. no. 1, pp. 40–52, 2010. [32] D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet [6] N. Balakrishnan, F. Buono, and M. Longobardi, “On cumulative entropies shrinkage,” biometrika, vol. 81, no. 3, pp. 425–455, 1994. in terms of moments of order statistics,” arXiv preprint arXiv:2009.02029, [33] A. Pizurica, A. M. Wink, E. Vansteenkiste, W. Philips, and B. J. Roerdink, 2020. “A review of wavelet denoising in MRI and ultrasound brain imaging,” [7] G. Zheng, N. Balakrishnan, and S. Park, “Fisher information in ordered Current Medical Imaging, vol. 2, no. 2, pp. 247–260, 2006. data: A review,” Statistics and its Interface, vol. 2, pp. 101–113, 2009. [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image [8] A. Dytso, M. Cardone, and C. Rush, “Measuring dependencies quality assessment: From error visibility to structural similarity,” IEEE of order statistics: An information theoretic perspective,” arXiv: Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. https://arxiv.org/abs/2009.12337, to appear in IEEE ITW 2020, September [35] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE 2020. Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002. [36] V. Barnett, “Order statistics estimators of the location of the Cauchy distribution,” Journal of the American Statistical Association, vol. 61, no. 316, pp. 1205–1218, 1966. [37] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics. Siam, 1992, vol. 54.