Applications of Information Geometry to Hypothesis Testing and Signal Detection

CMCAA 2016
Yongqiang Cheng
National University of Defense Technology
July 2016

Outline
1. Principles of Information Geometry
2. Geometry of Hypothesis Testing
3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices
4. Geometry of Matrix CFAR Detection

1. Principles of Information Geometry

Important problems in statistics
Distribution (likelihood): $p(x|\theta)$, where
• $x$ is a vector of data;
• $\theta$ is a vector of unknowns.
1. How much does the data $x$ tell about the unknown $\theta$?
2. How good is an estimator $\hat\theta$?
3. How can the difference between two distributions be measured?
4. What is the structure of a statistical model specified by a family of distributions?

What is information geometry?
[Diagram: Data → Distributions → Manifold, paralleling Data processing → Statistics → Information Geometry.]

Information geometry is the study of the intrinsic properties of manifolds of probability distributions by way of differential geometry. Its main tenet is that many important structures in information theory and statistics can be treated as structures in differential geometry, by regarding a space of probability distributions as a differentiable manifold endowed with a Riemannian metric and a family of affine connections.
[Diagram: relationships between information geometry and other subjects: statistics, information theory, probability theory, physics, systems theory, differential geometry and Riemannian geometry.]

• Statistical manifold
$$S = \{\, p(x|\theta) \mid x \in X,\ \theta \in \Theta \subseteq \mathbb{R}^n \,\}$$
• Riemannian metric (Fisher information matrix)
$$G(\theta) = [g_{ij}(\theta)],\qquad g_{ij}(\theta) = E_\theta\!\left[\frac{\partial \log p(x|\theta)}{\partial \theta^i}\,\frac{\partial \log p(x|\theta)}{\partial \theta^j}\right]$$
• Affine connections ($\alpha$-connections), with $l(x,\theta) = \log p(x|\theta)$ and $\partial_i = \partial/\partial\theta^i$
$$\Gamma^{(\alpha)}_{ij,m}(\theta) = E_\theta\!\left[\partial_i\partial_j l(x,\theta)\,\partial_m l(x,\theta)\right] + \frac{1-\alpha}{2}\,E_\theta\!\left[\partial_i l(x,\theta)\,\partial_j l(x,\theta)\,\partial_m l(x,\theta)\right]$$
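As a concrete check of the Fisher metric $g_{ij}(\theta) = E_\theta[\partial_i l\,\partial_j l]$, the following minimal NumPy sketch evaluates it numerically for the univariate Gaussian family with $\theta = (\mu, \sigma)$, where the closed form is $G(\theta) = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The grid-based Riemann sum and the function names are illustrative assumptions, not part of the slides.

```python
import numpy as np


def gaussian_log_pdf(x, mu, sigma):
    """l(x, theta) = log p(x | mu, sigma) for a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def score(x, mu, sigma):
    """Partial derivatives of l(x, theta) with respect to theta = (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3
    return np.stack([d_mu, d_sigma])  # shape (2, len(x))


def fisher_metric(mu, sigma, half_width=12.0, num=200_001):
    """g_ij(theta) = E_theta[d_i l * d_j l], approximated by a Riemann sum on a grid."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, num)
    dx = x[1] - x[0]
    p = np.exp(gaussian_log_pdf(x, mu, sigma))
    s = score(x, mu, sigma)
    # Outer product of the score, weighted by p(x|theta) and integrated over x.
    return (s * p) @ s.T * dx


G = fisher_metric(mu=1.0, sigma=2.0)
print(np.round(G, 4))  # close to diag(1/4, 1/2) for sigma = 2
```

The off-diagonal entry vanishes because it involves only odd central moments of the Gaussian.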
• Distance and geodesic
$$ds^2 = g_{ij}(\theta)\,d\theta^i d\theta^j = d\theta^T G(\theta)\,d\theta$$
• Curvatures
$$R^l_{ijk} = \frac{\partial \Gamma^l_{ik}}{\partial \theta^j} - \frac{\partial \Gamma^l_{ij}}{\partial \theta^k} + \Gamma^l_{js}\Gamma^s_{ik} - \Gamma^l_{ks}\Gamma^s_{ij}$$

2. Geometry of Hypothesis Testing

1) Start from target detection
[Figure: the densities $P(x|H_0)$ and $P(x|H_1)$ plotted against $x$; hypothesis testing.]

2) Likelihood ratio test
• Principle: divide the observation space into decision regions.
• Basic method: the detector decides $H_1$ if the likelihood ratio exceeds a threshold $\gamma$,
$$L(x) = \frac{p(x|H_1)}{p(x|H_0)} \mathop{\gtrless}_{H_0}^{H_1} \gamma$$
• Essentials of signal detection: discrimination between two distributions of the same family that differ in their parameters.

3) Geometric interpretation of hypothesis testing
Consider hypothesis testing from a geometric viewpoint: a family of distributions, such as the univariate Gaussians
$$p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\qquad (\mu,\sigma) \in \mathbb{H} = \{(\mu,\sigma) \mid \mu \in \mathbb{R},\ \sigma > 0\},$$
forms a statistical manifold.
[Figure: pairs of distributions A, B and C, D in the family of distributions, and the corresponding points on the statistical manifold at distances $d_F(A,B)$ and $d_F(C,D)$.]

• Equivalence between the LRT and the Kullback-Leibler divergence
Suppose $x_1, x_2, \dots, x_N$ are $N$ i.i.d. observations from a distribution $q(x)$, and there are two models (hypotheses) for $q(x)$, denoted by $p_0(x)$ and $p_1(x)$. Let
$$D(q\|p) = \int q(x)\ln\frac{q(x)}{p(x)}\,dx$$
be the Kullback-Leibler divergence (KLD). Then the normalized log-likelihood ratio is
$$\frac{1}{N}\ln L = \frac{1}{N}\sum_{i=1}^{N}\ln\frac{p_1(x_i)}{p_0(x_i)} \approx D(q\|p_0) - D(q\|p_1) \mathop{\gtrless}_{H_0}^{H_1} \frac{1}{N}\ln\gamma,$$
where the approximation holds for large $N$ by the law of large numbers, since $E_q[\ln p_1(x) - \ln p_0(x)] = D(q\|p_0) - D(q\|p_1)$. The LRT is therefore a minimum distance detector.
• Error exponent:
$$K = \lim_{N\to\infty} -\frac{1}{N}\log P_M$$
• Stein's lemma: for a fixed false-alarm probability,
$$K = D(p_0\|p_1),\qquad P_M \approx 2^{-NK}$$
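The equivalence between the normalized log-likelihood ratio and the KL difference can be checked numerically. This is a minimal sketch, assuming NumPy and Gaussian hypotheses with a common unit variance, for which $D(q\|p) = (\mu_q-\mu_p)^2/2$; the parameter values, the seed and the function names are arbitrary illustrations.

```python
import numpy as np


def kl_gauss(mu_q, mu_p, sigma=1.0):
    """D(q||p) for two Gaussians with the same variance: (mu_q - mu_p)^2 / (2 sigma^2)."""
    return (mu_q - mu_p) ** 2 / (2 * sigma ** 2)


def normalized_log_lr(x, mu0, mu1, sigma=1.0):
    """(1/N) ln L = (1/N) sum_i ln p1(x_i)/p0(x_i) for equal-variance Gaussian hypotheses."""
    log_ratio = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
    return log_ratio.mean()


rng = np.random.default_rng(0)
mu0, mu1, mu_q = 0.0, 1.0, 0.8           # hypotheses p0, p1 and the true source q
x = rng.normal(mu_q, 1.0, size=100_000)  # N i.i.d. observations from q

lhs = normalized_log_lr(x, mu0, mu1)
rhs = kl_gauss(mu_q, mu0) - kl_gauss(mu_q, mu1)  # D(q||p0) - D(q||p1) = 0.30
print(f"(1/N) ln L = {lhs:.4f}, D(q||p0) - D(q||p1) = {rhs:.4f}")

# Minimum distance decision with threshold gamma = 1, i.e. ln(gamma)/N = 0.
decision = "H1" if lhs > 0.0 else "H0"
print("decide", decision)
```

Because $q$ is closer to $p_1$ than to $p_0$ in the KL sense, the detector decides $H_1$.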
The problem of hypothesis testing can thus be regarded as a discrimination problem in which the decision is made by comparing the distances, in the sense of the KL divergence, from the estimated signal distribution to the two hypotheses, i.e., by selecting the model that is "closer" to the estimated signal distribution:
$$D(q\|p_0) - D(q\|p_1) \mathop{\gtrless}_{H_0}^{H_1} \frac{1}{N}\ln\gamma \qquad \text{(minimum distance detector)}$$
[Figure: on the statistical manifold S, the estimated distribution $p(x|\theta)$ lies at distances $d_0$ and $d_1$ from the hypotheses $p(x|\theta_0)$ and $p(x|\theta_1)$.]

3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices

1) Constant false alarm rate detector
Classical CFAR detector: the decision is made by comparing the content of the cell under test, $x_D$, with an adaptive threshold proportional to the arithmetic mean of the reference cells $x_1, \dots, x_{N/2}$ and $x_{N/2+1}, \dots, x_N$ on either side of it, so as to achieve the desired constant probability of false alarm. The decision is $H_0$ (target absent) or $H_1$ (target present).

2) Matrix CFAR detector
In 2008, F. Barbaresco proposed a generalized CFAR technique based on the manifold of symmetric positive-definite (SPD) matrices, in which each cell is represented by a covariance matrix: $R_D$ for the cell under test and $R_1, \dots, R_N$ for the reference cells. It has been proved that the Riemannian distance-based detector has better detection performance than the classical CFAR detector.

• Riemannian distance between two SPD matrices
$$d^2(R_1,R_2) = \left\|\ln\!\left(R_1^{-1/2}R_2R_1^{-1/2}\right)\right\|_F^2 = \sum_{k=1}^{n}\ln^2\lambda_k,$$
where $\lambda_k$ are the eigenvalues of $R_1^{-1/2}R_2R_1^{-1/2}$.
• Riemannian center of $N$ SPD matrices
$$\bar R = \arg\min_{R}\sum_{i=1}^{N} w_i\, d^p(R, R_i),$$
where $p=1$ gives the Riemannian median and $p=2$ the Riemannian mean.
• The matrix CFAR detector
$$d(R_D, \bar R) \mathop{\gtrless}_{H_0}^{H_1} \gamma$$
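A minimal NumPy sketch of the Riemannian distance, the Riemannian mean ($p=2$, equal weights $w_i = 1/N$) and the matrix CFAR decision described above. The mean is computed here with a standard fixed-point (Karcher) iteration, $R \leftarrow R^{1/2}\exp\big(\frac{1}{N}\sum_i \ln(R^{-1/2}R_iR^{-1/2})\big)R^{1/2}$; this choice is an assumption, since the slides do not say how the center is computed. Function names and the threshold value are hypothetical.

```python
import numpy as np


def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    S = (S + S.T) / 2  # remove round-off asymmetry before eigh
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def riemannian_distance(R1, R2):
    """d(R1, R2) = ||ln(R1^{-1/2} R2 R1^{-1/2})||_F = sqrt(sum_k ln^2 lambda_k)."""
    R1_isqrt = sym_fun(R1, lambda w: 1.0 / np.sqrt(w))
    M = R1_isqrt @ R2 @ R1_isqrt
    lam = np.linalg.eigvalsh((M + M.T) / 2)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def riemannian_mean(Rs, iters=50, tol=1e-12):
    """Riemannian (Karcher) mean with equal weights, by fixed-point iteration."""
    R = np.mean(Rs, axis=0)  # start from the arithmetic mean
    for _ in range(iters):
        R_sqrt = sym_fun(R, np.sqrt)
        R_isqrt = sym_fun(R, lambda w: 1.0 / np.sqrt(w))
        # Average of the matrix logarithms of the R_i seen from the current estimate R.
        T = np.mean([sym_fun(R_isqrt @ Ri @ R_isqrt, np.log) for Ri in Rs], axis=0)
        R = R_sqrt @ sym_fun(T, np.exp) @ R_sqrt
        if np.linalg.norm(T) < tol:
            break
    return R


def matrix_cfar(R_D, reference, gamma):
    """Decide H1 (target present) if d(R_D, R_bar) exceeds the threshold gamma."""
    R_bar = riemannian_mean(reference)
    d = riemannian_distance(R_D, R_bar)
    return ("H1" if d > gamma else "H0"), d


I2 = np.eye(2)
print(riemannian_mean(np.array([I2, 4 * I2])))  # approximately 2 * I
print(riemannian_distance(I2, 4 * I2))           # sqrt(2) * ln 4, about 1.96
print(matrix_cfar(4 * I2, np.array([I2, I2, I2]), gamma=1.0))
```

For commuting matrices such as $I$ and $4I$ the Riemannian mean reduces to the geometric mean $2I$, which makes a convenient sanity check.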
[Figure: initial and mean spectra of measurements (intensity), and outputs of the classical and geometric detectors.]

3) Robust matrix CFAR detector
• Two shortcomings of the Riemannian distance-based matrix CFAR detector:
a) The computational cost is high, because the matrix exponentials and logarithms required to compute the Riemannian distance and its average are expensive;
b) The Riemannian mean and median are not robust to outliers.

• Symmetrized Kullback-Leibler (sKL) divergence-based matrix CFAR detector
• Total Kullback-Leibler (tKL) divergence-based matrix CFAR detector
[Diagram: sample data → covariance matrices $R_D$ (cell under test, CUT) and $R_1, \dots, R_N$ → sKL mean, sKL median or tKL t-center $\bar R$ → divergence computation → comparison with threshold.]

• sKL divergence between two SPD matrices
$$\mathrm{sKL}(R_1,R_2) = \mathrm{tr}\!\left(R_1^{-1}R_2 + R_2^{-1}R_1 - 2I\right)$$
• sKL mean of $N$ SPD matrices
$$\bar R = B^{-1/2}\left(B^{1/2}AB^{1/2}\right)^{1/2}B^{-1/2},\qquad A = \frac{1}{N}\sum_{i=1}^{N}R_i,\quad B = \frac{1}{N}\sum_{k=1}^{N}R_k^{-1}$$
• sKL median of $N$ SPD matrices, obtained by the fixed-point iteration
$$\bar R_{k+1} = B_k^{-1/2}\left(B_k^{1/2}A_kB_k^{1/2}\right)^{1/2}B_k^{-1/2},\qquad A_k = \sum_{i=1}^{N}\frac{R_i}{\mathrm{sKL}(\bar R_k,R_i)},\quad B_k = \sum_{j=1}^{N}\frac{R_j^{-1}}{\mathrm{sKL}(\bar R_k,R_j)}$$

• The tKL divergence is a special case of the total Bregman divergence (tBD), which is invariant to rotations of the coordinate axes. For a convex function $f$,
$$\mathrm{BD}_f(x,y) = f(x) - f(y) - \langle x-y, \nabla f(y)\rangle,\qquad \mathrm{tBD}_f(x,y) = \frac{f(x) - f(y) - \langle x-y, \nabla f(y)\rangle}{\sqrt{1 + \|\nabla f(y)\|^2}}$$
[Figure: $\mathrm{BD}(x,y)$ versus $\mathrm{tBD}(x,y)$; tBD is more robust.]
• tKL divergence between two SPD matrices
$$\mathrm{tKL}(R_1,R_2) = \frac{\log\det\!\left(R_1^{-1}R_2\right) + \mathrm{tr}\!\left(R_2^{-1}R_1\right) - n}{2\sqrt{c + \frac{(\log\det R_2)^2}{4} - \frac{n(1+\log 2\pi)}{2}\log\det R_2}}$$
where $c$ is a constant.
• tKL center (t-center) of $N$ SPD matrices
$$\bar R = \left(\sum_{i} w_i R_i^{-1}\right)^{-1},\qquad w_i = \frac{\delta_i}{\sum_j \delta_j},\qquad \delta_i = \frac{1}{2\sqrt{c + \frac{(\log\det R_i)^2}{4} - \frac{n(1+\log 2\pi)}{2}\log\det R_i}}$$
Each weight is inversely proportional to the magnitude of the divergence gradient, which makes the t-center robust to outliers.
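A minimal NumPy sketch of the sKL divergence, the closed-form sKL mean and a generic total Bregman divergence. Setting the gradient of $\sum_i \mathrm{sKL}(R, R_i)$ to zero gives $R B R = A$, and the SPD solution of that equation is the closed-form mean above, which needs only inverses and matrix square roots. Function names are hypothetical, equal weights are assumed, and the tKL t-center is not implemented here.

```python
import numpy as np


def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    S = (S + S.T) / 2
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def skl(R1, R2):
    """sKL(R1, R2) = tr(R1^{-1} R2 + R2^{-1} R1 - 2I)."""
    n = R1.shape[0]
    return float(np.trace(np.linalg.solve(R1, R2) + np.linalg.solve(R2, R1)) - 2 * n)


def skl_mean(Rs):
    """Minimizer of sum_i sKL(R, R_i): R = B^{-1/2} (B^{1/2} A B^{1/2})^{1/2} B^{-1/2}."""
    A = np.mean(Rs, axis=0)                              # mean of the R_i
    B = np.mean([np.linalg.inv(R) for R in Rs], axis=0)  # mean of the R_i^{-1}
    B_sqrt = sym_fun(B, np.sqrt)
    B_isqrt = sym_fun(B, lambda w: 1.0 / np.sqrt(w))
    return B_isqrt @ sym_fun(B_sqrt @ A @ B_sqrt, np.sqrt) @ B_isqrt


def tbd(f, grad_f, x, y):
    """Total Bregman divergence BD_f(x, y) / sqrt(1 + ||grad f(y)||^2)."""
    g = grad_f(y)
    bd = f(x) - f(y) - np.dot(x - y, g)
    return float(bd / np.sqrt(1.0 + np.dot(g, g)))


Rs = np.array([[[2.0, 0.3], [0.3, 1.0]],
               [[1.0, -0.2], [-0.2, 0.5]],
               [[3.0, 0.5], [0.5, 2.0]]])
R_bar = skl_mean(Rs)
print(R_bar)
print([round(skl(R_bar, R), 4) for R in Rs])  # divergences of each R_i from the mean

# tBD with f(v) = ||v||^2: BD is ||x - y||^2, scaled by 1 / sqrt(1 + 4||y||^2).
sq = lambda v: float(np.dot(v, v))
print(tbd(sq, lambda v: 2 * v, np.array([2.0, 0.0]), np.array([1.0, 0.0])))  # 1/sqrt(5)
```

For two commuting matrices such as $I$ and $4I$, the sKL mean coincides with the geometric mean $2I$.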
• Comparison of dissimilarity measures: Riemannian distance, sKL divergence and tKL divergence.
The signal-to-clutter ratio (SCR) is significantly improved by the tKL divergence mapping.
• Comparison of detection performance between the Riemannian distance, sKL divergence and tKL divergence detectors.
The tKL divergence-based matrix CFAR detector performs better than the other two.

Table I. Time taken by different algorithms

Algorithm                     Time (s)
Riemannian mean detector         29.74
Riemannian median detector       41.66
sKL mean detector                 0.09
sKL median detector               2.81
tKL t-center detector             0.15

4. Geometry of Matrix CFAR Detection

• Classical CFAR detector: Euclidean space, Euclidean distance measure.
• Matrix CFAR detector: matrix manifold, Riemannian distance measure, KL divergence, etc.
A good detector should
• properly characterize the intrinsic structure of the measurement space;
• maximize the divergence between the two hypotheses (clusters).

Future work
• Other divergences that better measure the dissimilarity between distributions
• Better approaches for clustering the distributions
• Detectors for heavy clutter
• Detectors for nonstationary clutter
• Detectors for few samples

Thank you for your attention!
