A New Family of Bounded Divergence Measures and Application to Signal Detection
arXiv:1201.0418v9 [math.ST] 10 Apr 2016

Shivakumar Jolad (1), Ahmed Roman (2), Mahesh C. Shastry (3), Mihir Gadgil (4) and Ayanendranath Basu (5)

(1) Department of Physics, Indian Institute of Technology Gandhinagar, Ahmedabad, Gujarat, INDIA
(2) Department of Mathematics, Virginia Tech, Blacksburg, VA, USA
(3) Department of Physics, Indian Institute of Science Education and Research Bhopal, Bhopal, Madhya Pradesh, INDIA
(4) Biomedical Engineering Department, Oregon Health & Science University, Portland, OR, USA
(5) Indian Statistical Institute, Kolkata, West Bengal-700108, INDIA

Keywords: Divergence Measures, Bhattacharyya Distance, Error Probability, F-divergence, Pattern Recognition, Signal Detection, Signal Classification.

Abstract: We introduce a new one-parameter family of divergence measures, called bounded Bhattacharyya distance (BBD) measures, for quantifying the dissimilarity between probability distributions. These measures are bounded, symmetric and positive semi-definite and do not require absolute continuity. In the asymptotic limit, the BBD measure approaches the squared Hellinger distance. A generalized BBD measure for multiple distributions is also introduced. We prove an extension of a theorem of Bradt and Karlin for BBD, relating Bayes error probability and divergence ranking. We show that BBD belongs to the class of generalized Csiszar f-divergences and derive some properties such as curvature and its relation to Fisher Information. For distributions with vector-valued parameters, the curvature matrix is related to the Fisher-Rao metric. We derive certain inequalities between BBD and well-known measures such as Hellinger and Jensen-Shannon divergence. We also derive bounds on the Bayesian error probability. We give an application of these measures to the problem of signal detection, where we compare two monochromatic signals buried in white noise and differing in frequency and amplitude.

1 INTRODUCTION

Divergence measures for the distance between two probability distributions are a statistical approach to comparing data and have been extensively studied in the last six decades [Kullback and Leibler, 1951, Ali and Silvey, 1966, Kapur, 1984, Kullback, 1968, Kumar et al., 1986]. These measures are widely used in varied fields such as pattern recognition [Basseville, 1989, Ben-Bassat, 1978, Choi and Lee, 2003], speech recognition [Qiao and Minematsu, 2010, Lee, 1991], signal detection [Kailath, 1967, Kadota and Shepp, 1967, Poor, 1994], Bayesian model validation [Tumer and Ghosh, 1996] and quantum information theory [Nielsen and Chuang, 2000, Lamberti et al., 2008].

Distance measures try to achieve two main objectives (which are not mutually exclusive): to assess (1) how "close" two distributions are compared to others and (2) how "easy" it is to distinguish between one pair compared to another [Ali and Silvey, 1966].

There is a plethora of distance measures available to assess the convergence (or divergence) of probability distributions. Many of these measures are not metrics in the strict mathematical sense, as they may not satisfy either the symmetry of arguments or the triangle inequality. In applications, the choice of measure depends on the interpretation of the metric in terms of the problem considered, its analytical properties and ease of computation [Gibbs and Su, 2002].

One of the most well-known and widely used divergence measures, the Kullback-Leibler divergence (KLD) [Kullback and Leibler, 1951, Kullback, 1968], can create problems in specific applications. Specifically, it is unbounded above and requires that the distributions be absolutely continuous with respect to each other.
Various other information theoretic measures have been introduced keeping in view ease of computation and utility in problems of signal selection and pattern recognition. Of these measures, Bhattacharyya distance [Bhattacharyya, 1946, Kailath, 1967, Nielsen and Boltz, 2011] and Chernoff distance [Chernoff, 1952, Basseville, 1989, Nielsen and Boltz, 2011] have been widely used in signal processing. However, these measures are again unbounded from above. Many bounded divergence measures such as the Variational and Hellinger distances [Basseville, 1989, DasGupta, 2011] and the Jensen-Shannon metric [Burbea and Rao, 1982, Rao, 1982b, Lin, 1991] have been studied extensively. The utility of these measures varies depending on properties such as tightness of bounds on error probabilities, information theoretic interpretations, and the ability to generalize to multiple probability distributions.

Here we introduce a new one-parameter (α) family of bounded measures based on the Bhattacharyya coefficient, called bounded Bhattacharyya distance (BBD) measures. These measures are symmetric, positive-definite and bounded between 0 and 1. In the asymptotic limit (α → ±∞) they approach the squared Hellinger divergence [Hellinger, 1909, Kakutani, 1948]. Following Rao [Rao, 1982b] and Lin [Lin, 1991], a generalized BBD is introduced to capture the divergence (or convergence) between multiple distributions. We show that BBD measures belong to the generalized class of f-divergences and inherit useful properties such as curvature and its relation to Fisher Information.

Bayesian inference is useful in problems where a decision has to be made on classifying an observation into one of a possible array of states, whose prior probabilities are known [Hellman and Raviv, 1970, Varshney and Varshney, 2008]. Divergence measures are useful in estimating the error in such classification [Ben-Bassat, 1978, Kailath, 1967, Varshney, 2011]. We prove an extension of the Bradt and Karlin theorem for BBD, which establishes the existence of prior probabilities relating Bayes error probabilities with ranking based on the divergence measure. Bounds on the error probability P_e can be calculated through BBD measures using certain inequalities between the Bhattacharyya coefficient and P_e. We derive two inequalities for a special case of BBD (α = 2) with the Hellinger and Jensen-Shannon divergences. Our bounded measure with α = 2 has been used by Sunmola [Sunmola, 2013] to calculate the distance between Dirichlet distributions in the context of Markov decision processes.

We illustrate the applicability of BBD measures by focusing on the signal detection problem that comes up in areas such as gravitational wave detection [Finn, 1992]. Here we consider discriminating two monochromatic signals, differing in frequency or amplitude, and corrupted with additive white noise. We compare the Fisher Information of the BBD measures with that of KLD and Hellinger distance for these random processes, and highlight the regions where FI is insensitive to large parameter deviations. We also characterize the performance of BBD for different signal-to-noise ratios, providing thresholds for signal separation.

Our paper is organized as follows: Section I is the current introduction. In Section II, we recall the well-known Kullback-Leibler and Bhattacharyya divergence measures, and then introduce our bounded Bhattacharyya distance measures. We discuss some special cases of BBD, in particular Hellinger distance. We also introduce the generalized BBD for multiple distributions. In Section III, we show the positive semi-definiteness of the BBD measure and the applicability of the Bradt and Karlin theorem, and prove that BBD belongs to the generalized f-divergence class. We also derive the relation between curvature and Fisher Information, discuss the curvature metric, and prove some inequalities with other measures such as the Hellinger and Jensen-Shannon divergences for a special case of BBD. In Section IV, we move on to discuss the application to the signal detection problem. Here we first briefly describe the basic formulation of the problem, and then move on to computing distances between random processes and comparing the BBD measure with Fisher Information and KLD. In the Appendix we provide the expressions for BBD measures, with α = 2, for some commonly used distributions. We conclude the paper with a summary and outlook.
2 DIVERGENCE MEASURES

In the following subsections we consider a measurable space Ω with σ-algebra B and the set of all probability measures M on (Ω, B). Let P and Q denote probability measures on (Ω, B), with p and q denoting their densities with respect to a common measure λ. We recall the definition of absolute continuity [Royden, 1986]:

Absolute Continuity: A measure P on the Borel subsets of the real line is absolutely continuous with respect to the Lebesgue measure Q if P(A) = 0 for every Borel subset A ∈ B for which Q(A) = 0; this is denoted by P ≪ Q.

2.1 Kullback-Leibler divergence

The Kullback-Leibler divergence (KLD) (or relative entropy) [Kullback and Leibler, 1951, Kullback, 1968] between two distributions P, Q with densities p and q is given by:

$$I(P,Q) \equiv \int p \log\frac{p}{q}\, d\lambda. \quad (1)$$

The symmetrized version is given by $J(P,Q) \equiv \left(I(P,Q) + I(Q,P)\right)/2$ [Kailath, 1967]. $I(P,Q) \in [0, \infty]$; it diverges if there exists $x_0$ such that $q(x_0) = 0$ and $p(x_0) \neq 0$. KLD is defined only when P is absolutely continuous w.r.t. Q. This feature can be problematic in numerical computations when the measured distribution has zero values.

2.2 Bhattacharyya Distance

Bhattacharyya distance is a widely used measure in signal selection and pattern recognition [Kailath, 1967]. It is defined as:

$$B(P,Q) \equiv -\ln\int \sqrt{pq}\, d\lambda = -\ln(\rho), \quad (2)$$

where the term in parentheses, $\rho(P,Q) \equiv \int \sqrt{pq}\, d\lambda$, is called the Bhattacharyya coefficient [Bhattacharyya, 1946].

2.3 Bounded Bhattacharyya Distance Measures

where α ∈ [−∞, 0) ∪ (1, ∞]. This gives the measure

$$B_\alpha(\rho(P,Q)) \equiv -\log_{\left(1-\frac{1}{\alpha}\right)}\left(1 - \frac{1-\rho}{\alpha}\right), \quad (5)$$

which can be simplified to

$$B_\alpha(\rho) = \frac{\log\left[1 - \frac{1-\rho}{\alpha}\right]}{\log\left(1 - \frac{1}{\alpha}\right)}. \quad (6)$$

It is easy to see that B_α(0) = 1 and B_α(1) = 0.

2.4 Special cases

1. For α = 2 we get

$$B_2(\rho) = -\log_2\left(\frac{1+\rho}{2}\right). \quad (7)$$

We study some of its special properties in Sec. 3.7.
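As a concrete illustration of the definitions in Eqs. (1) and (2), the following minimal Python sketch (not part of the original paper; the function names are our own) evaluates the KLD and the Bhattacharyya coefficient for two discrete distributions on a common support. It shows the behaviour discussed above: KLD diverges when Q has a zero where P does not, while ρ and B(P,Q) remain finite.

```python
import numpy as np

def kl_divergence(p, q):
    """I(P,Q) = sum_i p_i log(p_i / q_i) for discrete distributions (Eq. 1).
    Returns inf when P is not absolutely continuous w.r.t. Q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    support = p > 0                      # terms with p_i = 0 contribute nothing
    if np.any(q[support] == 0.0):        # q vanishes where p does not -> divergence
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def bhattacharyya(p, q):
    """Bhattacharyya coefficient rho = sum_i sqrt(p_i q_i) and distance B = -ln(rho) (Eq. 2)."""
    rho = float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))
    return rho, -np.log(rho)

p = [0.5, 0.3, 0.2, 0.0]
q = [0.4, 0.4, 0.0, 0.2]                 # q has a zero where p does not

print(kl_divergence(p, q))               # inf: KLD undefined/unbounded here
print(bhattacharyya(p, q))               # (~0.794, ~0.231): rho and B stay finite
```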
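To make Eqs. (5)-(7) concrete, here is a small numerical sketch (again our own illustration in Python, not the authors' code) that evaluates B_α(ρ) from Eq. (6), checks the endpoint values B_α(0) = 1 and B_α(1) = 0, verifies the α = 2 form of Eq. (7), and checks the large-|α| limit against 1 − ρ, which is the squared Hellinger distance under the convention H² = 1 − ρ assumed here.

```python
import numpy as np

def bbd(rho, alpha):
    """Bounded Bhattacharyya distance B_alpha(rho) of Eq. (6):
    log(1 - (1 - rho)/alpha) / log(1 - 1/alpha), alpha in [-inf, 0) U (1, inf]."""
    rho = np.asarray(rho, dtype=float)
    return np.log(1.0 - (1.0 - rho) / alpha) / np.log(1.0 - 1.0 / alpha)

rho = np.linspace(0.0, 1.0, 6)

# Endpoint values stated in the text: B_alpha(0) = 1 and B_alpha(1) = 0.
print(bbd(0.0, 2.0), bbd(1.0, 2.0))

# alpha = 2 special case, Eq. (7): B_2(rho) = -log2((1 + rho)/2).
print(np.allclose(bbd(rho, 2.0), -np.log2((1.0 + rho) / 2.0)))

# Asymptotic limit |alpha| -> infinity: B_alpha(rho) -> 1 - rho,
# i.e. the squared Hellinger distance with the convention H^2 = 1 - rho.
print(np.allclose(bbd(rho, 1e8), 1.0 - rho, atol=1e-6))
```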