Open saenapark-dissertation.pdf

Total Pages: 16

File Type: PDF, Size: 1020 KB

The Pennsylvania State University
The Graduate School

CLASSIFICATION OF TRANSIENTS BY DISTANCE MEASURES

A Dissertation in Statistics by Sae Na Park, © 2015 Sae Na Park. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, August 2015.

The dissertation of Sae Na Park was reviewed and approved* by the following: G. Jogesh Babu, Professor of Statistics, Dissertation Advisor, Chair of Committee; John Fricks, Associate Professor of Statistics; Matthew Reimherr, Assistant Professor of Statistics; Eric B. Ford, Professor of Astronomy; Aleksandra Slavkovic, Associate Professor of Statistics, Chair of Graduate Program. *Signatures are on file in the Graduate School.

Abstract

Due to the rapidly increasing size of data in astronomical surveys, statistical methods that can automatically classify newly detected celestial objects in an accurate and efficient way have become essential. In this dissertation, we introduce two methodologies to classify variable stars and transients by using light curves, which are graphs of magnitude (the logarithmic measure of the brightness of a star) as a function of time. Our analysis focuses on characterizing light curves by magnitude changes over time increments and developing a classifier with this information. First, we present a classifier based on the difference between two distributions of magnitudes, estimated by statistical distance measures such as the Kullback-Leibler divergence, the Jensen-Shannon divergence, and the Hellinger distance. We also propose a method that groups magnitudes and times by binning and uses the frequencies in each bin as the variables for classification. Along with these two methods, a way to incorporate other measures that have been used for the classification of light curves into our classifiers is presented. Finally, the proposed methods are demonstrated on real data and compared with past classification methods for variable stars and transients.

Table of Contents

Chapter 1: Introduction
Chapter 2: Literature Review, Background, and Statistical Methods
  2.1 Introduction to CRTS Data
  2.2 Classification of Variable Stars (2.2.1 Periodic Feature Generation; 2.2.2 Non-periodic Feature Generation; 2.2.3 Modeling Light Curves: Gaussian Process Regression)
  2.3 Bayesian Decision Theory
  2.4 Kernel Density Estimation
  2.5 Classification Methods (2.5.1 Linear Discriminant Analysis; 2.5.2 Classification Tree; 2.5.3 Random Forest; 2.5.4 Support Vector Machines; 2.5.5 Neural Networks)
  2.6 Distance Measures (2.6.1 Kullback-Leibler Divergence: definition and properties, Bayesian estimate; 2.6.2 Jensen-Shannon Divergence; 2.6.3 Hellinger Distance)
Chapter 3: Methods
  3.1 Introduction
  3.2 Classification by Distance Measures (3.2.1 Kullback-Leibler Divergence with Kernel Density Estimation; 3.2.2 Kullback-Leibler Divergence with Binning; 3.2.3 Bayesian Estimate of Kullback-Leibler Divergence; 3.2.4 Incorporating Other Measures)
  3.3 Classification by Binning (3.3.1 Incorporating Other Measures)
  3.4 Summary and Conclusions
Chapter 4: Simulation
  4.1 Data Generation
  4.2 Simulation Results (4.2.1 Light Curves without Gaps; 4.2.2 Light Curves with Gaps)
Chapter 5: Analysis
  5.1 Introduction
  5.2 Data Description
  5.3 Exploratory Analysis
  5.4 Classification by Distance Measures (5.4.1 Settings: Kullback-Leibler Divergence, Bayesian Estimate of Kullback-Leibler Divergence, Jensen-Shannon Divergence and Hellinger Distance; 5.4.2 Results: Binary Classification, Multi-class Classification; 5.4.3 Incorporating Other Measures)
  5.5 Classification by Binning (5.5.1 Incorporating Other Measures)
  5.6 Application on Newly Collected Data Sets
  5.7 Summary
Chapter 6: Summary and Future Work
  6.1 Summary
  6.2 Future Work
Appendix A: Details on Application to CRTS Data
  A.1 Choice of Pseudocounts
  A.2 Choice of Measures (A.2.1 Measures for Section 5.4.3; A.2.2 Measures for Section 5.6)
  A.3 Contingency Tables (A.3.1 Tables of Sections 5.4 and 5.5; A.3.2 Tables of Section 5.6)
Appendix B: R Code
  B.1 Classification by Distance Measures
  B.2 Classification by Binning
Bibliography

List of Figures

1.1 Example of a light curve
2.1 Celestial coordinates: right ascension and declination
2.2 Four images of CSS080118:112149-131310, taken minutes apart by CSS on March 30, 2008. This object was classified as a transient.
2.3 The light curve for transient CSS080118:112149-131310. Measurements are shown as blue points with error bars; the x-axis and y-axis represent time and magnitude, respectively.
2.4 GPR curves for an AGN and a supernova. Black dots are the observations; the blue solid line and the red dashed line are the fitted curves with the median magnitude and the detection limit 20.5 as mean functions, respectively.
3.1 Light curves of a supernova (left) and a non-transient (right)
4.1 Example of an RR Lyrae light curve
4.2 Simulated RR Lyrae light curves with b = 0.2, a = (1, 2, 3) (from top to bottom), and p = (0.2, 0.5, 1) (from left to right)
4.3 Simulated RR Lyrae light curves with b = 0.4, a = (1, 2, 3) (from top to bottom), and p = (0.2, 0.5, 1) (from left to right)
4.4 Simulated RR Lyrae light curves with b = 0.6, a = (1, 2, 3) (from top to bottom), and p = (0.2, 0.5, 1) (from left to right)
4.5 Examples of Type I, Type II-P, and Type II-L supernova light curves (Doggett and Branch, 1985)
4.6 Simulated Type I, Type II-P, and Type II-L (from top to bottom) supernova light curves with a = (1, 3, 5) (from left to right)
5.1 Top: a boxplot of dmag for different types of transients (entire range). Bottom: a boxplot of dmag between -2 and 2.
5.2 Light curves of an AGN, a flare, a supernova, and a non-transient
5.3 (dt, dmag) density plots by kernel density estimation for an AGN, a flare, a supernova, and a non-transient; the same objects whose light curves are presented in Figure 5.2. The number of grid points used for evaluating densities is 50 on each side.
5.4 Left: the histogram of the entire dt. Right: the histogram of dt values less than 100.
5.5 The histogram of dmag
5.6 Classification rates of KLD3 for different choices of α
5.7 Stepwise selection of measures to include for the distance method: the first value on the plot is the classification rate when every measure is used; after that, the rate when the measure specified on the x-axis is excluded from the set.
5.8 Stepwise selection of measures to include for the binning method: the first value on the plot is the classification rate when every measure is used; after that, the rate when the specified measure is excluded from the set.
6.1 Light curve of a CV (CSS071216:110407045134) and its (dt, dmag) kernel density plot

List of Tables

2.1 Non-periodic features from Richards et al. (2011)
2.2 Features generated by modeling light curves in Faraway et al. (2014)
3.1 Contingency table for dt and dmag bins
4.1 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different amplitudes
4.2 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different periods
4.3 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different shapes
4.4 Supernova vs. non-variable classification: completeness and contamination (in parentheses) for different types
4.5 Supernova vs. non-variable classification: completeness and contamination (in parentheses) for different amplitudes
4.6 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different amplitudes when gaps are present in light curves
4.7 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different periods when gaps are present in light curves
4.8 RR Lyrae vs. non-variable classification: completeness and contamination (in parentheses) for different shapes when gaps are present in light curves
5.1 Number of light curves for each transient type and for non-transient objects (Faraway et al., 2014)
5.2 Transients vs. non-variables classification by KLD2 with α = 0.6
5.3 Examples of a contingency table for SNe versus non-SNe classification
5.4 Completeness: percentages correctly classified to each type of transient
5.5 Contamination: false alarm rates
5.6 Comparisons among the Richards measures, the Faraway measures, and our distance measure for completeness
5.7 Comparisons among the Richards measures, the Faraway measures, and our distance measure for contamination
5.8 Completeness for all-type classification
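The dissertation's two classifiers rest on distances between distributions of magnitude changes (dmag) over time increments (dt). As a rough illustration of the three distance measures named in the abstract, here is a minimal R sketch on binned dmag samples; the helper names, bin grid, and pseudocount choice are assumptions for illustration, not the code of Appendix B:

```r
# Sketch: distance measures between two binned dmag distributions.
# Illustrative only -- not the dissertation's Appendix B code.
to_probs <- function(x, breaks, eps = 1e-10) {
  # Bin the magnitude changes and convert counts to probabilities;
  # eps acts as a pseudocount so the KL divergence stays finite.
  p <- hist(x, breaks = breaks, plot = FALSE)$counts + eps
  p / sum(p)
}

kld <- function(p, q) sum(p * log(p / q))            # Kullback-Leibler
jsd <- function(p, q) {                              # Jensen-Shannon
  m <- (p + q) / 2
  0.5 * kld(p, m) + 0.5 * kld(q, m)
}
hellinger <- function(p, q) sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)

# Example with hypothetical dmag samples from two light curves
set.seed(1)
dmag_a <- rnorm(200, 0, 0.3)   # stand-in for a non-transient
dmag_b <- rnorm(200, 0, 1.0)   # stand-in for a transient
breaks <- seq(-6, 6, by = 0.5)
p <- to_probs(dmag_a, breaks); q <- to_probs(dmag_b, breaks)
c(KLD = kld(p, q), JSD = jsd(p, q), Hellinger = hellinger(p, q))
```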
Recommended publications
  • Conditional Central Limit Theorems for Gaussian Projections
Conditional Central Limit Theorems for Gaussian Projections. Galen Reeves.

Abstract—This paper addresses the question of when projections of a high-dimensional random vector are approximately Gaussian. This problem has been studied previously in the context of high-dimensional data analysis, where the focus is on low-dimensional projections of high-dimensional point clouds. The focus of this paper is on the typical behavior when the projections are generated by an i.i.d. Gaussian projection matrix. The main results are bounds on the deviation between the conditional distribution of the projections and a Gaussian approximation, where the conditioning is on the projection matrix. The bounds are given in terms of the quadratic Wasserstein distance and relative entropy and are stated explicitly as a function of the number of projections and certain key properties of the random vector. The proof uses Talagrand's transportation inequality and a general integral-moment inequality for mutual information.

The focus is on the typical behavior when the projections are generated randomly and independently of the random variables. Given an n-dimensional random vector X, the k-dimensional linear projection Z is defined according to Z = ΘX (1), where Θ is a k × n random matrix that is independent of X. Throughout the paper it is assumed that X has finite second moment and that the entries of Θ are i.i.d. Gaussian random variables with mean zero and variance 1/n. The main results are bounds on the deviation between the conditional distribution of Z given Θ and a Gaussian approximation.
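For intuition, a minimal R sketch of the projection in equation (1), assuming nothing beyond what the abstract states (the choice of X here is arbitrary):

```r
# Sketch of Z = Theta %*% X with i.i.d. N(0, 1/n) projection entries.
set.seed(42)
n <- 1000; k <- 5
X <- runif(n)                                  # any X with finite second moment
Theta <- matrix(rnorm(k * n, mean = 0, sd = sqrt(1 / n)), nrow = k)
Z <- Theta %*% X                               # k-dimensional projection
# For large n, the conditional law of Z given Theta is close to Gaussian.
```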
  • Hellinger Distance Based Drift Detection for Nonstationary Environments
Hellinger Distance Based Drift Detection for Nonstationary Environments. Gregory Ditzler and Robi Polikar, Dept. of Electrical & Computer Engineering, Rowan University, Glassboro, NJ, USA. [email protected], [email protected].

Abstract—Most machine learning algorithms, including many online learners, assume that the data distribution to be learned is fixed. There are many real-world problems where the distribution of the data changes as a function of time. Changes in nonstationary data distributions can significantly reduce the generalization ability of the learning algorithm on new or field data, if the algorithm is not equipped to track such changes. When the stationary data distribution assumption does not hold, the learner must take appropriate actions to ensure that the new/relevant information is learned. On the other hand, data distributions do not necessarily change continuously, necessitating the ability to monitor the distribution and detect when a significant change in distribution has occurred.

… decision boundaries as a concept change, whereas gradual changes in the data distribution as a concept drift. However, when the context does not require us to distinguish between the two, we use the term concept drift to encompass both scenarios, as it is usually the more difficult one to detect. Learning from drifting environments is usually associated with a stream of incoming data, either one instance or one batch at a time. There are two types of approaches for drift detection in such streaming data: in passive drift detection, the learner assumes – every time new data become available – that some drift may have occurred, and updates the classifier accordingly…
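A toy R sketch of the underlying idea, comparing consecutive batches via the Hellinger distance between their histograms; the threshold is an arbitrary assumption, and this is not the authors' detection algorithm:

```r
# Toy Hellinger-distance drift check between two data batches.
# A simplified illustration, not Ditzler & Polikar's method.
hellinger_hist <- function(x, y, breaks) {
  p <- hist(x, breaks = breaks, plot = FALSE)$counts
  q <- hist(y, breaks = breaks, plot = FALSE)$counts
  p <- p / sum(p); q <- q / sum(q)
  sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
}

set.seed(7)
batch1 <- rnorm(500, 0, 1)      # reference batch
batch2 <- rnorm(500, 1.5, 1)    # drifted batch
breaks <- seq(-6, 8, by = 0.5)
d <- hellinger_hist(batch1, batch2, breaks)
if (d > 0.2) message("possible drift, Hellinger distance = ", round(d, 3))
# The 0.2 cutoff is an assumed toy threshold, not a calibrated one.
```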
  • Hellinger Distance-Based Similarity Measures for Recommender Systems
Hellinger Distance-based Similarity Measures for Recommender Systems. Roma Goussakov. One-year master thesis, Umeå University.

Abstract: Recommender systems are used in online sales and e-commerce for recommending potential items/products for customers to buy based on their previous buying preferences and related behaviours. Collaborative filtering is a popular computational technique that has been used worldwide for such personalized recommendations. Of the two forms of collaborative filtering, neighbourhood-based and model-based, the neighbourhood-based approach is more popular yet relatively simple. It relies on the concept that a certain item might be of interest to a given customer (active user) if either he appreciated similar items in the buying space, or the item is appreciated by similar users (neighbours). To implement this concept, different kinds of similarity measures are used. This thesis is set to compare different user-based similarity measures, and to define meaningful measures based on the Hellinger distance, which is a metric on the space of probability distributions. Data from the popular MovieLens database will be used to show the effectiveness of different Hellinger distance-based measures compared to other popular measures such as Pearson correlation (PC), cosine similarity, constrained PC, and JMSD. The performance of the different similarity measures will then be evaluated with the help of mean absolute error, root mean squared error, and F-score. From the results, no evidence was found to claim that the Hellinger distance-based measures performed better than the more popular similarity measures for the given dataset.
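A sketch of what a Hellinger-based user similarity could look like on discrete rating data; the helper name, the 1-to-5 rating scale, and the 1 − H conversion are assumptions for illustration, not the thesis's definitions:

```r
# Sketch: Hellinger-distance-based user similarity from rating histograms.
# Illustrative assumption of how such a measure could look, not thesis code.
hellinger_sim <- function(r1, r2, levels = 1:5, eps = 1e-10) {
  p <- tabulate(factor(r1, levels = levels), nbins = length(levels)) + eps
  q <- tabulate(factor(r2, levels = levels), nbins = length(levels)) + eps
  p <- p / sum(p); q <- q / sum(q)
  h <- sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
  1 - h                      # turn the distance into a similarity in [0, 1]
}

user_a <- c(5, 4, 4, 5, 3)   # hypothetical MovieLens-style ratings
user_b <- c(2, 1, 3, 2, 2)
hellinger_sim(user_a, user_b)
```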
  • On Measures of Entropy and Information
On Measures of Entropy and Information. Tech. Note 009 v0.7, http://threeplusone.com/info. Gavin E. Crooks, 2018-09-22.

Contents: (0) Notes on notation and nomenclature. (1) Entropy: entropy, joint entropy, marginal entropy, conditional entropy. (2) Mutual information: mutual information, multivariate mutual information, interaction information, conditional mutual information, binding information, residual entropy, total correlation, lautum information, uncertainty coefficient. (3) Relative entropy: relative entropy, cross entropy, … (5) Csiszár f-divergences: Csiszár f-divergence, dual f-divergence, symmetric f-divergences, K-divergence, fidelity, Hellinger discrimination, Pearson divergence, Neyman divergence, LeCam discrimination, skewed K-divergence, alpha-Jensen-Shannon entropy. (6) Chernoff divergence: Chernoff divergence, Chernoff coefficient, Rényi divergence, alpha-divergence, Cressie-Read divergence, Tsallis divergence, Sharma-Mittal divergence.
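Several of the measures listed are Csiszár f-divergences, D_f(P‖Q) = Σᵢ qᵢ f(pᵢ/qᵢ). A small R sketch (illustrative, not taken from the note) evaluating three classic generators:

```r
# Sketch: Csiszar f-divergence D_f(p||q) = sum_i q_i * f(p_i / q_i),
# evaluated for a few classic generator functions.
f_divergence <- function(p, q, f) sum(q * f(p / q))

p <- c(0.5, 0.3, 0.2)
q <- c(0.4, 0.4, 0.2)
c(
  KL         = f_divergence(p, q, function(t) t * log(t)),        # Kullback-Leibler
  Pearson    = f_divergence(p, q, function(t) (t - 1)^2),         # Pearson chi-square
  Hellinger2 = f_divergence(p, q, function(t) (sqrt(t) - 1)^2 / 2) # squared Hellinger
)
```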
  • Multivariate Statistics Chapter 5: Multidimensional Scaling
Multivariate Statistics, Chapter 5: Multidimensional scaling. Pedro Galeano, Departamento de Estadística, Universidad Carlos III de Madrid, [email protected]. Course 2017/2018, Master in Mathematical Engineering.

Outline: 1 Introduction; 2 Statistical distances; 3 Metric MDS; 4 Non-metric MDS.

Introduction. As we have seen in previous chapters, principal components and factor analysis are important dimension reduction tools. However, in many applied sciences, data is recorded as ranked information. For example, in marketing, one may record "product A is better than product B". Multivariate observations therefore often have mixed data characteristics and contain information that would not enable us to employ one of the multivariate techniques presented so far. Multidimensional scaling (MDS) is a method based on proximities between objects, subjects, or stimuli, used to produce a spatial representation of these items. MDS is a dimension reduction technique, since the aim is to find a set of points in low dimension (typically two dimensions) that reflects the relative configuration of the high-dimensional data objects. The proximities between objects are defined as any set of numbers that express the amount of similarity or dissimilarity between pairs of objects. In contrast to the techniques considered so far, MDS does not start from an n × p dimensional data matrix, but from an n × n dimensional dissimilarity or distance matrix D, with elements δii′ or dii′, respectively, for i, i′ = 1, …, n.
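Classical metric MDS is available in base R as cmdscale; a minimal example on the built-in eurodist distance matrix:

```r
# Minimal classical (metric) MDS example with base R's cmdscale.
# eurodist is a built-in distance matrix between European cities.
coords <- cmdscale(eurodist, k = 2)   # embed the cities in two dimensions
plot(coords, type = "n", xlab = "", ylab = "", main = "Metric MDS of eurodist")
text(coords, labels = rownames(coords), cex = 0.7)
```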
  • Tailoring Differentially Private Bayesian Inference to Distance
Tailoring Differentially Private Bayesian Inference to Distance Between Distributions. Jiawen Liu*, Mark Bun**, Gian Pietro Farina*, and Marco Gaboardi*. *University at Buffalo, SUNY, {jliu223,gianpiet,gaboardi}@buffalo.edu. **Princeton University, [email protected].

Contents: 1 Introduction. 2 Preliminaries. 3 Technical Problem Statement and Motivations. 4 Mechanism Proposition (4.1 Laplace Mechanism Family: using the ℓ1 norm metric, using an improved ℓ1 norm metric; 4.2 Exponential Mechanism Family: standard exponential mechanism, exponential mechanism with Hellinger metric and local sensitivity, exponential mechanism with Hellinger metric and smoothed sensitivity). 5 Privacy Analysis (privacy of the Laplace mechanism family; privacy of the exponential mechanism family: ε-differential privacy of expMech, non-differential privacy of expMechlocal, ε-differential privacy proof for expMechsmoo). 6 Accuracy Analysis (accuracy bounds for the baseline mechanisms: Laplace mechanism, improved Laplace mechanism; accuracy bound for expMechsmoo; accuracy comparison between expMechsmoo, lapMech, and ilapMech). 7 Experimental Evaluations (efficiency evaluation; accuracy evaluation: theoretical and experimental results; privacy evaluation). 8 Conclusion and Future Work.

Abstract: Bayesian inference is a statistical method which allows one to derive a posterior distribution, starting from a prior distribution and observed data.
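A minimal R sketch of the standard Laplace mechanism, the baseline of the Laplace mechanism family above; parameter names are assumptions, and this is not the paper's implementation:

```r
# Sketch of the Laplace mechanism: add Laplace(sensitivity / epsilon) noise
# to a query answer. Illustrative only.
rlaplace <- function(n, scale) {
  # Inverse-CDF sampling of the Laplace distribution
  u <- runif(n, -0.5, 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

laplace_mechanism <- function(true_answer, sensitivity, epsilon) {
  true_answer + rlaplace(length(true_answer), scale = sensitivity / epsilon)
}

set.seed(3)
count <- 120    # hypothetical counting query, which has sensitivity 1
laplace_mechanism(count, sensitivity = 1, epsilon = 0.5)
```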
  • An Information-Geometric Approach to Feature Extraction and Moment
An information-geometric approach to feature extraction and moment reconstruction in dynamical systems. Suddhasattwa Das, Dimitrios Giannakis (Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA), Enikő Székely (Swiss Data Science Center, ETH Zürich and EPFL, 1015 Lausanne, Switzerland).

Abstract: We propose a dimension reduction framework for feature extraction and moment reconstruction in dynamical systems that operates on spaces of probability measures induced by observables of the system, rather than directly in the original data space of the observables themselves as in more conventional methods. Our approach is based on the fact that orbits of a dynamical system induce probability measures over the measurable space defined by (partial) observations of the system. We equip the space of these probability measures with a divergence, i.e., a distance between probability distributions, and use this divergence to define a kernel integral operator. The eigenfunctions of this operator create an orthonormal basis of functions that capture different timescales of the dynamical system. One of our main results shows that the evolution of the moments of the dynamics-dependent probability measures can be related to a time-averaging operator on the original dynamical system. Using this result, we show that the moments can be expanded in the eigenfunction basis, thus opening up the avenue for nonparametric forecasting of the moments. If the collection of probability measures is itself a manifold, we can in addition equip the statistical manifold with the Riemannian metric and use techniques from information geometry. We present applications to ergodic dynamical systems on the 2-torus and the Lorenz 63 system, and show on a real-world example that a small number of eigenvectors is sufficient to reconstruct the moments (here the first four moments) of an atmospheric time series, i.e., the real-time multivariate Madden-Julian oscillation index.
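A schematic R sketch of one ingredient of this pipeline: pairwise divergences between window-induced empirical distributions, turned into a kernel matrix whose eigenvectors are then extracted. The window construction, the Hellinger choice, and the Gaussian kernel are assumptions for illustration, not the authors' framework:

```r
# Sketch: kernel eigenvectors from pairwise Hellinger distances between
# empirical distributions induced by windows of a time series.
set.seed(11)
series  <- sin(seq(0, 20, length.out = 1000)) + rnorm(1000, sd = 0.2)
windows <- split(series, rep(1:20, each = 50))     # 20 observation windows
breaks  <- seq(min(series) - 0.1, max(series) + 0.1, length.out = 21)

# Empirical distribution induced by each window (20 bins x 20 windows)
probs <- sapply(windows, function(w) {
  p <- hist(w, breaks = breaks, plot = FALSE)$counts + 1e-10
  p / sum(p)
})
hell <- function(p, q) sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)

n <- ncol(probs)
D <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) D[i, j] <- hell(probs[, i], probs[, j])

K   <- exp(-D^2 / median(D^2))  # Gaussian kernel on the divergence (assumed)
eig <- eigen(K)                 # leading eigenvectors capture timescales
head(eig$values)
```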
  • Issue PDF (13986
European Mathematical Society NEWSLETTER No. 22, December 1996.

Contents: Second European Congress of Mathematics; Report on the Second Junior Mathematical Congress; Obituary – Paul Erdős; Report on the Council and Executive Committee Meetings; Fifth Framework Programme for Research and Development; Diderot Mathematics Forum; Report on the Prague Mathematical Conference; Preliminary report on EMS Summer School; EMS Lectures; European Women in Mathematics; Euronews; Problem Corner; Book Reviews.

Produced at the Department of Mathematics, Glasgow Caledonian University; printed by Armstrong Press, Southampton, UK. Editors: Prof. Roy Bradley (Department of Mathematics, Glasgow Caledonian University, Glasgow G4 0BA, Scotland); editorial team in Glasgow: R. Bradley, V. Jha, J. Gomatam, G. Kennedy, M. A. Speller, J. Wilson. Editor – Mathematics Education: Prof. Vinicio Villani (Dipartimento di Matematica, Via Buonarroti 2, 56127 Pisa, Italy). Editors – Brief Reviews: I. Netuka and V. Souček (Mathematical Institute, Charles University, Sokolovská 83, 18600 Prague, Czech Republic). Secretary: Peter W. Michor (Institut für Mathematik, Universität Wien, Strudlhofgasse 4, A-1090 Wien, Austria). Treasurer: A. Lahtinen (Department of Mathematics, P.O. Box 4, FIN-00014 University of Helsinki, Finland). EMS Secretariat: Ms. T. Mäkeläinen, University of Helsinki. President: Jean-Pierre Bourguignon (IHES, Route de Chartres, F-94400 Bures-sur-Yvette).
  • Statistics As Both a Purely Mathematical Activity and an Applied Science NAW 5/18 Nr
Statistics as both a purely mathematical activity and an applied science. Piet Groeneboom (Delft Institute of Applied Mathematics, Delft University of Technology, [email protected]), Jan van Mill (KdV Institute for Mathematics, University of Amsterdam, [email protected]), Aad van der Vaart (Mathematical Institute, Leiden University, [email protected]). NAW 5/18, no. 1, March 2017, p. 55. In Memoriam Kobus Oosterhoff (1933–2015).

On 27 May 2015 Kobus Oosterhoff passed away at the age of 82. Kobus was employed at the Mathematisch Centrum in Amsterdam from 1961 to 1969, at the Roman Catholic University of Nijmegen from 1970 to 1974, and then as professor in Mathematical Statistics at the Vrije Universiteit Amsterdam from 1975 until his retirement in 1996. In this obituary Piet Groeneboom, Jan van Mill and Aad van der Vaart look back on his life and work.

Kobus (officially: Jacobus) Oosterhoff was born on 7 May 1933 in Leeuwarden, the capital of the province of Friesland in the […] 'diploma' (comparable to a masters) in 1963. His favorite lecturer was the topologist J. de Groot, who seems to have deeply […] contact with Hemelrijk and the encouragement received from him, but he did his thesis under the direction of Willem van Zwet who, one year younger than Kobus, had been a professor at Leiden University since 1965. Kobus became Willem's first PhD student, defending his dissertation Combination of One-sided Test Statistics on 26 June 1969 at Leiden University.
  • On Some Properties of Goodness of Fit Measures Based on Statistical Entropy
IJRRAS 13 (1), October 2012. www.arpapress.com/Volumes/Vol13Issue1/IJRRAS_13_1_22.pdf

ON SOME PROPERTIES OF GOODNESS OF FIT MEASURES BASED ON STATISTICAL ENTROPY. Atif Evren & Elif Tuna, Yildiz Technical University, Faculty of Sciences and Literature, Department of Statistics, Davutpasa, Esenler, 34210, Istanbul, Turkey.

ABSTRACT: Goodness of fit tests can be categorized in several ways. One categorization may be based on the differences between observed and expected frequencies. Another may be based upon the differences between some values of distribution functions. Still another may be based upon the divergence of one distribution from the other. Some widely used and well-known divergences, such as the Kullback-Leibler divergence or Jeffreys divergence, are based on entropy concepts. In this study, we compare some basic goodness of fit tests in terms of their statistical properties, with some applications.

Keywords: measures for goodness of fit, likelihood ratio, power divergence statistic, Kullback-Leibler divergence, Jeffreys' divergence, Hellinger distance, Bhattacharya divergence.

1. INTRODUCTION. Boltzmann may have been the first scientist to emphasize the probabilistic meaning of thermodynamical entropy. For him, the entropy of a physical system is a measure of the disorder related to it [35]. For probability distributions, on the other hand, the general idea is that observing a random variable whose value is known is uninformative. Entropy is zero if there is unit probability at a single point, whereas if the distribution is widely dispersed over a large number of individually small probabilities, the entropy is high [10]. Statistical entropy has some conflicting interpretations, in that it can measure two complementary concepts: information and lack of information.
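For concreteness, a short R sketch (illustrative) comparing the Pearson chi-square statistic with the likelihood-ratio statistic G, which equals 2n times the Kullback-Leibler divergence of the fitted from the observed relative frequencies:

```r
# Sketch: Pearson chi-square vs. likelihood-ratio (G) goodness-of-fit
# statistics for the same observed/expected counts. Illustrative example.
observed   <- c(18, 55, 27)           # hypothetical category counts
expected_p <- c(0.25, 0.50, 0.25)     # hypothesized model probabilities
expected   <- sum(observed) * expected_p

pearson <- sum((observed - expected)^2 / expected)
G <- 2 * sum(observed * log(observed / expected))  # = 2n * KL(obs || model)

df <- length(observed) - 1
c(pearson = pearson, G = G,
  p_pearson = pchisq(pearson, df, lower.tail = FALSE),
  p_G       = pchisq(G, df, lower.tail = FALSE))
```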
  • On a Generalization of the Jensen-Shannon Divergence and the Jensen-Shannon Centroid
On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means. Frank Nielsen, Sony Computer Science Laboratories, Inc., Tokyo, Japan. arXiv:1904.04017v3 [cs.IT], 10 Dec 2020.

Abstract: The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using generalized statistical mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family, and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence. As a second illustrating example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. We also define generalized Jensen-Shannon divergences between matrices (e.g., quantum Jensen-Shannon divergences) and consider clustering with respect to these novel Jensen-Shannon divergences.

Keywords: Jensen-Shannon divergence, Jeffreys divergence, resistor average distance, Bhattacharyya distance, Chernoff information, f-divergence, Jensen divergence, Burbea-Rao divergence, Bregman divergence, abstract weighted mean, quasi-arithmetic mean, mixture family, statistical M-mixture, exponential family, Gaussian family, Cauchy scale family, clustering.
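On discrete distributions, the abstract-mean construction can be sketched numerically in R by swapping the arithmetic mixture for a normalized geometric mixture; this is an illustration under that assumption, while the paper's closed-form results concern parametric families:

```r
# Sketch: Jensen-Shannon-type symmetrization with an abstract mean.
# For discrete p, q, compare the usual arithmetic-mean mixture with a
# normalized geometric-mean mixture. Illustrative, not the paper's formulas.
kld <- function(p, q) sum(p * log(p / q))

js_general <- function(p, q, mean_fun) {
  m <- mean_fun(p, q)
  m <- m / sum(m)                 # normalize the abstract mean of p and q
  0.5 * kld(p, m) + 0.5 * kld(q, m)
}

p <- c(0.7, 0.2, 0.1)
q <- c(0.1, 0.3, 0.6)
c(arithmetic = js_general(p, q, function(a, b) (a + b) / 2),
  geometric  = js_general(p, q, function(a, b) sqrt(a * b)))
```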
  • Disease Mapping Models for Data with Weak Spatial Dependence Or
Epidemiol. Methods 2020; 9(1): 20190025. Helena Baptista*, Peter Congdon, Jorge M. Mendes, Ana M. Rodrigues, Helena Canhão and Sara S. Dias. Disease mapping models for data with weak spatial dependence or spatial discontinuities. https://doi.org/10.1515/em-2019-0025. Received November 28, 2019; accepted October 26, 2020; published online November 11, 2020.

Abstract: Recent advances in the spatial epidemiology literature have extended traditional approaches by including determinant disease factors that allow for non-local smoothing and/or non-spatial smoothing. In this article, two of those approaches are compared and further extended to areas of high interest from the public health perspective. These are a conditionally specified Gaussian random field model, using a similarity-based non-spatial weight matrix to facilitate non-spatial smoothing in Bayesian disease mapping, and a spatially adaptive conditional autoregressive prior model. The methods are specially designed to handle cases where there is no evidence of positive spatial correlation, or where the appropriate mix between local and global smoothing is not constant across the region being studied. Both approaches proposed in this article produce results consistent with the published knowledge, and increase the accuracy with which areas of high or low risk can be determined.

Keywords: Bayesian modelling; body mass index (BMI); limiting health problems; spatial epidemiology; similarity-based and adaptive models.

Background. To allocate scarce health resources to the spatial units that need them most is of paramount importance nowadays. Methods to identify excess risk in particular areas should ideally acknowledge and examine the extent of potential spatial clustering in health outcomes (Tosetti et al.
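As a rough sketch of the kind of prior involved, a proper conditional autoregressive (CAR) prior has precision matrix τ(D − ρW) built from a symmetric weight matrix W; the CAR form, parameter values, and toy weight matrix here are assumptions, not the authors' model:

```r
# Sketch: precision matrix of a proper CAR prior, Q = tau * (D - rho * W),
# where W is a (spatial or similarity-based) symmetric weight matrix and
# D is diagonal with the row sums of W. Illustrative only.
W <- rbind(c(0, 1, 1, 0),      # toy adjacency among four areas
           c(1, 0, 1, 0),
           c(1, 1, 0, 1),
           c(0, 0, 1, 0))
D <- diag(rowSums(W))
tau <- 2; rho <- 0.9           # precision and spatial-dependence parameters
Q <- tau * (D - rho * W)       # positive definite for |rho| < 1 here

# Draw area effects from the implied Gaussian prior
Sigma <- solve(Q)
set.seed(5)
effects <- MASS::mvrnorm(1, mu = rep(0, 4), Sigma = Sigma)
effects
```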