A method to integrate and classify normal distributions

Abhranil Das ∗1,3,4 and Wilson S Geisler2,3,4

1Department of Physics, The University of Texas at Austin 2Department of Psychology, The University of Texas at Austin 3Center for Perceptual Systems, The University of Texas at Austin 4Center for Theoretical and Computational Neuroscience, The University of Texas at Austin

July 29, 2021

Abstract performance (behavior) of humans, other animals, neural circuits or algorithms. They also serve as a starting point in developing Univariate and multivariate normal probability distribu- other models/theories that describe sub-optimal performance of tions are widely used when modeling decisions under uncer- agents. tainty. Computing the performance of such models requires integrating these distributions over specic domains, which To compute the performance predicted by such theories it is can vary widely across models. Besides some special cases necessary to integrate the normal distributions over specic do- where these integrals are easy to calculate, there exist no gen- mains. For example, a particularly common task in vision science eral analytical expressions, standard numerical methods or is classication into two categories (e.g. detection and discrim- software for these integrals. Here we present mathematical ination tasks). The predicted maximum accuracy in such tasks results and open-source software that provide (i) the proba- is determined by integrating normals over domains dened by bility in any domain of a normal in any dimensions with any a quadratic decision boundary.2,3 Predicted accuracy of some of parameters, (ii) the probability density, cumulative distribu- the possible sub-optimal models is determined by integrating over tion, and inverse cumulative distribution of any function of a other domains. normal vector, (iii) the classication errors among any num- Except for some special cases4,5 (e.g. multinormal probabilities ber of normal distributions, the Bayes-optimal discriminabil- ity index and relation to the operating characteristic, (iv) di- in rectangular domains, such as when two normals have equal mension reduction and visualizations for such problems, and covariance and the optimal classication boundary is at), there (v) tests for how reliably these methods may be used on given exists no general analytical expression for these integrals, and we data. We demonstrate these tools with vision research appli- must use numerical methods, such as integrating over a Cartesian cations of detecting occluding objects in natural scenes, and grid, or Monte-Carlo integration. detecting camouage. Since the tails o innitely outwards, it is Keywords: multivariate normal, integration, classication, inecient to numerically integrate it over a nite uniform Carte- signal detection theory, Bayesian ideal observer, vision sian grid, which would be large and collect ever-reducing masses outwards, yet omit some mass wherever the grid ends. Also, if the normal is elongated by unequal and strong covariances, 1 Introduction or the integration domain is complex and non-contiguous, naïve integration grids will waste resources in regions or directions that The univariate or multivariate normal (henceforth called sim- have low density or are outside the domain. One then needs to arXiv:2012.14331v7 [stat.ML] 27 Jul 2021 ply ‘normal’) is arguably the most important and widely-used visually inspect and arduously hand-tailor the integration grid to . It is frequently used because various t the shape of each separate problem. central-limit theorems guarantee that normal distributions will Monte-Carlo integration involves sampling from the multinor- occur commonly in nature, and because it is the simplest and most mal, then counting the fraction of samples in the integration do- tractable distribution that allows arbitrary correlations between main. 
This does not have the above ineciencies or problems, but the variables. has other issues. Unlike grid integration, the desired precision Normal distributions form the basis of many theories and mod- cannot be specied, but must be determined by measuring the els in the natural and social sciences. For example, they are the spread across multiple runs. Also, when the probability in the in- foundation of Bayesian statistical decision/classication theories 1 tegration domain is very small (e.g. to compute the classication using Gaussian discriminant analysis, and are widely applied in error rate or discriminability d0 for highly-separated normal dis- diverse elds such as vision science, neuroscience, probabilistic tributions), it cannot be reliably sampled without a large number planning in robotics, psychology, and economics. These theo- of samples, which costs resource and time (see the performance ries specify optimal performance under uncertainty, and are of- benchmark section for a comparison). ten used to provide a benchmark against which to evaluate the Thus, there is no single analytical expression, numerical ∗[email protected] method or standard software tool to quickly and accurately in-

1 tegrate arbitrary normals over arbitrary domains, or to compute dimensional sd, since it linearly scales the normal (like σ in 1d), classication errors and the discriminability index d0. Evaluat- and its eigenvectors and values are the axes of the 1 sd error el- ing these quantities is often simplied by making the limiting lipsoid. We rst invert this transform to standardize the normal: assumption of equal . This impedes the quick testing, z = S−1(x − µ). This decorrelates or ‘whitens’ the variables, comparison, and optimization of models. Here we describe a and transforms the integration domain to a dierent quadratic: mathematical method and accompanying software implementa- 0 0 tion which provides functions to (i) integrate normals with arbi- q˜(z) = z Qe 2z + q˜1z +q ˜0 > 0, with trary means and covariances in any number of dimensions over Qe 2 = SQ2S, arbitrary domains, (ii) compute the pdf, cdf and inverse cdf of (2) q˜ = 2SQ µ + Sq , any function of a multinormal variable (normal vector), and (iii) 1 2 1 compute the performance of classifying amongst any number of q˜0 = q(µ). normals. This software is available as a Matlab toolbox ‘Inte- Now the problem is to nd the probability of the standard normal grate and classify normal distributions’, and the source code is z in this domain. If there is no quadratic term Qe 2, q˜(z) is nor- at github.com/abhranildas/IntClassNorm. mally distributed, the integration domain boundary is a at, and We rst review and assimilate previous mathematical results the probability is Φ( q˜0 ), where Φ is the standard normal cdf.4 into a generalized chi-squared method that can integrate arbi- kq˜1k 0 trary normals over quadratic domains. Then we present a novel Otherwise, say Qe 2 = RDR is its eigen-decomposition, where 0 ray-trace method to integrate arbitrary normals over any domain, R is orthogonal, i.e. a rotoreection. So y = R z is also standard and consequently to compute the distribution of any real-valued normal, and in this space the quadratic is: function of a normal vector. We describe how these results can qˆ(y) = y0Dy + b0y +q ˜ (b = R0q˜ ) be used to compute error rates (and other relevant quantities) 0 1 X 2  X for Bayes-optimal and custom classiers, given arbitrary priors = Diyi + biyi + bi0 yi0 +q ˜0 and outcome cost matrix. We then present some methods to re- i i0 duce problems to fewer dimensions, for analysis or visualization. (i and i0 index the nonzero and zero eigenvalues) Next, we provide a way to test whether directly measured sam-  2  2 X bi X X bi ples from the actual distributions in a classication problem are = Di yi + + bi0 yi0 +q ˜0 − 2Di 2Di close enough to normal to trust the computations from the tool- i i0 i box. After describing the methods and software toolbox with ex- X 02 = Di χ 2 + x, amples, we demonstrate their accuracy and speed across a variety 1,(bi/2Di) i of problems. We show that for quadratic-domain problems both the generalized chi-squared method and the ray-trace method are a weighted sum of non-central chi-square variables χ02, each with accurate, but vary in relative speed depending on the particular 1 degree of freedom, and a normal variable x ∼ N(m, s). So this 2 problem. Of course, for domains that are not quadratic, only the is a generalized chi-square variable χ˜w,k,λ,m,s, where we merge ray-trace method applies. 
Finally, we illustrate the methods with the non-central chi-squares with the same weights, so that the two applications from our laboratory: modeling detection of oc- vector of their weights w are the unique nonzero eigenvalues cluding targets in natural scenes, and detecting camouage. among Di, their degrees of freedom k are the numbers of times the eigenvalues occur, and their non-centralities, and normal pa- rameters are: 2 Integrating the normal s 1 X 2 X 2 λj = 2 bi , m = q(µ) − w.λ, s = bi0 . 2.1 In quadratic domains: the generalized chi- 4wj 0 i:Di=wj i square method The required probability, p χ˜2 > 0, is now a 1d integral, com- Integrating the normal in quadratic domains is important for putable using, say, Ruben’s6 or Davies’7 methods. We use the computing the maximum possible classication accuracy. The Matlab toolbox ‘Generalized chi-square distribution’ that we de- problem is the following: given a column vector x ∼ N(µ, Σ), veloped (source code is at github.com/abhranildas/gx2), which nd the probability that can compute the generalized chi-square parameters correspond- 0 0 ing to a quadratic form of a normal vector, its , cdf (using q(x) = x Q2x + q1x + q0 > 0. (1) three dierent methods), pdf, inverse cdf and random numbers. (Here and henceforth, bold upper-case symbols represent ma- Previous software implements specic forms of this theory for trices, bold lower-case symbols represent vectors, and regular particular quadratics such as ellipsoids.5 The method described lower-case symbols represent scalars.) here correctly handles all quadratics (ellipsoids, hyperboloids, This can be viewed as the multi-dimensional integral of the paraboloids and degenerate conics) in all dimensions. normal probability over the domain q(x) > 0 (that we call the ‘normal probability view’), or the single-dimensional integral of 2.2 In any domain: the ray-trace method the probability of the scalar quadratic function q(x) of a normal vector, above 0 (the ‘function probability view’). We present below our method to integrate the normal distribu- Note that x = Sz + µ, where z is standard multinormal, and tion in an arbitrary domain, which takes an entirely dierent ap- 1 the symmetric square root S = Σ 2 may be regarded as the multi- proach than the generalized chi-square method. The overview of

2 the method is as follows. We rst standardize the normal to make a it spherically symmetric, then we integrate it in spherical polar coordinates, outwards from the center. We rst calculate the ra- n dial integral by sending ‘rays’ from the center to trace out the inte- gration domain in every direction, i.e. determine the points where each ray crosses into and out of the domain (akin to the computer dn graphics method of ray-tracing, that traces light rays outward from the projection center to compute where it hits the dierent z2 objects it has to render). By knowing these crossing points, we then calculate the probability amount on each ray. Then we add up these probabilities over all the angles. This method of break- ing up the problem produces fast and accurate results to arbitrary ~ tolerance for all problem shapes, without needing any manual ad- f(z)>0 justment.

z1 2.2.1 Standard polar form The problem is to nd the probability that f(x) > 0, where f(x) is a suciently general function (with a nite number of zeros in any direction within the integration span around the normal mean, i.e. without rare pathologies such as the Dirichlet function with innite zeros in any interval). As before, we rst standardize the space to obtain f˜(z) = f(Sz + µ). Then we switch to polar axis-angle coordinates z and n: any point z = zn, where the unit ~ b f (z ) vector n denotes the angle of that point, and z is its coordinate ray n φk along the axis in this direction. Then the integral can be written as:

Z 2 Z Z 2 z z − k − z − k − z k−1 1 2 (2π) 2 e 2 dz = dn (2π) 2 e 2 z dz . ˜ ˜ Ω n Ωn Figure 1: Method schematic. a. Standard normal error ellipse is blue. | {z } axial integral Arrow indicates a ray from it at angle n in an angular slice dn, crossing the gray integration domain f˜(z) > 0 at z1 and z2. b. 1d slice of this ˜ where Ω˜ is the domain where f(z) > 0, and Ω˜ n is its slice along picture along the ray. The standard normal density along a ray is blue. ˜ ˜ the axis n, i.e. the intervals along the axis where the axial domain fn(z) is the slice of the domain function f(z) along the ray, crossing 0 ˜ ˜ at z1 and z2. function fn(z) = f(zn) > 0. This may be called the ‘standard polar form’ of the integral. dn is the dierential angle element (dθ in 2d, sin θ dθ dφ in 3d etc). domain function does not matter). That is, we specify whether or not the beginning of the ray (at −∞) is inside the domain, and 2.2.2 Integration domain on a ray all the points at which the ray crosses the domain. We denote the ˜ First let us consider the axial integration along direction n. Imag- rst by the initial sign ψ(n) = sign(fn(−∞)) = 1/−1/0 if the ine that we ‘trace’ the integration domain with an axis through ray begins inside/outside/grazing the integration domain. For a the origin in this direction (a bi-directional ‘ray’ marked by the quadratic domain, for example: arrow in g.1a), i.e. determine the part of this ray axis that is in  ˜ sign (˜q2 (n)) , if q˜2(n) 6= 0, the integration domain, dened by fn(z) > 0. For example, if the  integration domain is a quadratic such as eq.2, its 1d trace by the ψ(n) = sign (˜qn (−∞)) = −sign (˜q1 (n)) , if q˜2(n) = 0,  ray is given by: sign (˜q0) , if q˜2(n) =q ˜1(n) = 0.

0 2 0 ˜ q˜n(z) =q ˜(zn) = n Qe 2nz + q˜1nz +q ˜0 The crossing points are the zeros zi(n) of fn(z) = f(zSn+µ) 2 =q ˜2(n) z +q ˜1(n) z +q ˜0 > 0. (zin are then the boundary points in the full space). For a quadratic domain q˜n(z), these are simply its roots. For a general This is a scalar quadratic domain in z that varies with the direc- domain, the zeros are harder to compute. Chebyshev polynomial tion. Fig.1b is an example of such a domain. The ray domain approximations8 aim to nd all zeros of a general function, but ˜ function fn crosses 0 at z1 and z2, and the integration domain is can be slow. Other numerical algorithms can nd all function below z1 (which is negative), and above z2. zeros in an interval to arbitrary accuracy. We use such an algo- ˜ Note that a sucient description of such domains on an axis is rithm to nd the zeros of fn(z) within (−m, m). This amounts to specify all the points at which the domain function crosses zero, to ray-tracing f(x) within a Mahalanobis distance m of the nor- and its overall sign, which determines which regions are within mal. The error in the integral due to this approximation is there- and which are outside the domain (so any overall scaling of the fore < 2Φ(¯ m), where Φ¯ is the complementary cdf of the standard

3 normal. inside the domain (ψ = 1), it stays inside, and the probability con- In g.1, the initial sign along the ray is 1, and z and z are the tent is 2dn . If it begins and stays outside (ψ = −1), it is 0. And if 1 2 Ωk crossing points. it grazes the domain throughout (ψ = 0), half of the angular vol- Most generally, this method can integrate in any domain for ume is inside the domain and half is outside, so the probability is which we can return its ‘trace’ (i.e. the initial sign and crossing dn . So without accounting for roots, the probability in general is Ωk points) along any ray n through any origin o. So if a domain is al- ψ(n)+1 . To this we add, sequentially for each root, the probability Ωk ready supplied in the form of these ‘ray-trace’ functions ψ(o, n) from the root to ∞, signed according to whether we are entering and zi(o, n), our method can readily integrate over it. For ex- or exiting the domain at that root. So we have, for g.1, ample, the ray-trace function of the line y = k in 2d returns k−oy  ray ray  ψ = −sign(ny) and z = . When supplied with quadratic 2 2Φ¯ (z1) 2Φ¯ (z2) ny dp (n) = − k + k dn. domain coecients, or a general implicit domain f(x) > 0, the Ωk Ωk Ωk toolbox ray-traces it automatically under the hood. For an im- plicit domain, the numerical root-nding works only in a nite The sign of the rst root term is always opposite to ψ, and subse- interval, and is slower and may introduce small errors. So, if pos- quent signs alternate as we enter and leave the domain. In general sible, a slightly faster and more accurate alternative to the implicit then, we can write: domain format is to directly construct its ray-trace function by " # X dn hand. dp (n) = ψ(n) + 1 + 2ψ(n) (−1)i Φ¯ ray (z (n)) k i Ω i k | {z } 2.2.3 Standard normal distribution on a ray α(n) In order to integrate over piecewise intervals of z such as g.1b, Thus, the axial integral is α(n) . The total probability we shall rst calculate the semi-denite integral up to some z, Ωk 1 R α(n) dn can be computed, for up to 3d, by numerically then stitch them together over the intervals with the right signs. Ωk Consider the probability in the angular slice dn below some integrating α(n) over a grid of angles spanning half the angu- negative z such as z1 in g.1a. Note that the probability of a lar space (since we account for both directions of a ray), using standard normal beyond some radius is given by the chi distribu- any standard scheme. An adaptive grid can match the shape of tion. If Ωk is the total angle in k dimensions (2 in 1d, 2π in 2d, 4π the integration boundary (ner grid at angles where the bound- in 3d), and Fχk (x) is the cdf of the chi distribution with k degrees ary is sharply changing), and also set its neness to evaluate of freedom, we have: the integral to a desired absolute or relative precision. Fig.2a, top, illustrates integrating a trivariate normal with arbitrary co- z<0 Z k z2 − 2 − 2 k−1 variance in an implictly-dened toroidal domain ft(x) = a − Ωk (2π) e z dz = 1 − Fχk (|z|). 2 −∞  p 2 2 2 b − x1 + x2 − x3 > 0. So the probability in the angular slice dn below a negative z is Beyond 3d, we can use Monte Carlo integration over the angles. dn [1 − Fχ (|z|)] . 
Now, for the probability in the angular slice We draw a sample of random numbers from the standard multi- k Ωk below a positive z (such as z2), we need to add two probabili- normal in those dimensions, then normalize their magnitudes, to ties: that in the nite cone from the origin to the point, which is get a uniform random sample of rays n, over which the expec- dn Fχ (z) , and that in the entire semi-innite cone on the nega- tation hα(n)i/2 is the probability estimate. Fig.2b shows the k Ωk dn dn tive side, which is , to obtain [1 + Fχ (z)] . Thus, the prob- computation of the 4d standard normal probability in the domain Ωk k Ωk P4 ability in an angular slice dn below a positive or negative z is fp(x) = i=1|xi| < 1, a 4d extension of a regular octahedron dn with plane faces meeting at sharp edges. [1 + sign(z)Fχk (|z|)] Ω . We normalize this by the total proba- k Since the algorithm already computes the boundary points over bility in the angular slice, 2 dn , to dene the distribution of the Ωk its angular integration grid, they may be stored for plotting and standard normal along a ray: Φray(z) = [1 + sign(z)F (|z|)] /2. k χk inspecting the boundary. Rather than an adaptive integration grid Its density is found by dierentiating: φray(z) = f (|z|)/2, so it k χk though, boundaries are often best visualized over a uniform grid is simply the chi distribution symmetrically extended to negative (uniform array of angles in 2D, or a Fibonacci sphere in 3D9), φray(z) = φ(z) numbers. Notice that 1 , but in higher dimensions which we can explicitly supply for this purpose. it rises, then falls outward (g.1b), due to the opposing eects of the density falling but the volume of the angular slice growing outward. Since Matlab does not yet incorporate the chi distribu- 2.2.5 Set operations on domains tion, we instead dene, in terms of the chi-square distribution, Some applications require more complex integration or ray h 2 i ray 2 Φ (z) = 1 + sign(z)F 2 (z ) /2 and φ (z) = |z|f 2 (z ). classication domains built using set operations (inver- k χk k χk sion/union/intersection) on simpler domains. With implicit domain formats this is easy. For example, if fA(x) > 0 and 2.2.4 Probability in an angular slice c fB(x) > 0 dene two domains A and B, then A , A ∩ B, and c We can now write the total probability in the angular slice of g.1 A ∪ B are described by −fA(x) > 0, min(fA(x), fB(x)) > 0 as the sum of terms accounting for the initial sign and each root. and max(fA(x), −fB(x)) > 0 respectively. Fig.2a, bottom, The total volume fraction of the double cone is 2dn . Now rst illustrates integrating a 2d normal in a domain built by the union Ωk consider only the initial sign and no roots. Then if the ray starts of two circles.

4 a p = 0.03 b p ≈ 0.0148 c p = 0.41 d 10 2 0 15 25 p(e) = 0.008 .017 -2 = 4.83 1 7 p 1 1 = 5.11 7 x = 4.45

cos x 2 .015 2 -7 = x

2 l = 0 3 p = 0.97 f .013 ’ 101 102 103 104 -15 p (e) = 0.013 -7 7 -15 10 -3 f = x sin x x -10 10 -6 6 sample size of rays 1 1 2 1 e p(e) = 0.13/0.26/0.10 f p(e) = 0.23 g p = 0.13 h p(e) = 0.04/0.06 15 10

-30 0 40 q(x) l = 0 p(e) = 0.14/0.22/0.17 γ = 0

30 -20 0 -10 15 -20 15 -10 10 β(x) (1,1,1,1) axis

Figure 2: Toolbox outputs for some integration and classication problems. a. Top: the probability of a 3d normal (blue shows 1 sd error ellipsoid) in an implicit toroidal domain ft(x) > 0. Black dots are boundary points within 3 sd traced by the ray method, across Matlab’s adaptive integration grid over angles. Inset: pdf of ft(x) and its integrated part (blue overlay). Bottom: integrating a 2d normal (blue error ellipse) in a domain built by P4 the union of two circles. b. Estimates of the 4d standard normal probability in the 4d polyhedral domain fp(x) = i=1|xi| < 1 using the ray-trace method with Monte Carlo ray-sampling, across 5 runs, converging with growing sample size of rays. Inset: pdf of fp(x) and its integrated part. c. Left: heat map of joint pdf of two functions of a 2d normal, to be integrated over the implicit domain f1 − f2 > 1 (overlay). Right: corresponding integral of the normal over the domain h(x) = x1 sin x2 − x2 cos x1 > 1 (blue regions), ‘traced’ up to 3 sd (black dots). Inset: pdf of h(x) and 0 its integrated part. d. Classifying two 2d normals using the optimal boundary l (which yields the Bayes-optimal discriminability db), and a custom 0 0 linear boundary. de and da are approximate discriminability indices. e. Classication based on samples (dots) from non-normal distributions. Filled ellipses are error ellipses of tted normals. γ is an optimized boundary between the samples. The three error rates are: of the normals with l, of the samples with l, and of the samples with γ. f. Classifying several 2d normals with arbitrary means and covariances. g. Top: 1d projection of a 4d normal integral over a quadratic domain q(x) > 0. Bottom: Projection of the classication of two 4d normals based on samples, with unequal priors, and unequal outcome values (correctly classifying the blue class is valued 4 times the red, hence the optimal criterion is shifted), onto the axis of the Bayes decision variable β. Histograms and smooth curves are the projections of the samples and the tted normals. The sample-optimized boundary γ = 0 cannot be uniquely projected to this β axis. h. Classication based on four 4d non-normal samples, with dierent priors and outcome values, projected on the axis along (1,1,1,1). The boundaries cannot be projected to this axis.

As we noted before, computations are faster and more accu- x. But in the function probability view, f(x) is instead seen as a rate when domains are supplied in explicit ray-trace form, than mapping from the multi-dimensional variable x to a scalar, which as implicit functions. The toolbox provides functions to convert can be considered a decision variable. Hence, integrating the nor- quadratic and general implicit domains to ray-trace format, and mal in the multi-dimensional domain f(x) > 0 corresponds to functions to use set operations on these to build complex ray-trace integrating the 1d pdf of the decision variable f(x) beyond 0. It domains. For example, when a domain is inverted, only the initial is helpful to plot this 1d pdf, especially when there are too many sign of a ray through it ips, and for the intersection of several dimensions of x to visualize the normal probability view. domains, the initial sign of a ray is the minimum of its individual Conversely, given any scalar function f(x) of a normal, its cdf, initial signs, and the roots are found by collecting those roots of Ff (c) = p(f(x) < c), is computed as the normal probability in each domain where every other domain is positive. the domain c − f(x) > 0. Dierentiating this gives us the pdf. (If it is a quadratic function, its generalized chi-square pdf can also 2.2.6 Probabilities of functions of a normal vector be computed by convolving the constituent noncentral chi-square pdf’s.) Figs.2a-c and g show 1d pdf’s of functions computed in We previously mentioned the equivalent ‘normal probability’ and this way. Also, inverting the function cdf using a numerical root- ‘function probability’ views of conceptualizing a normal integral. nding method gives us its inverse cdf. So far we have mostly used the normal probability view, seeing scalar functions f(x) as dening integral domains of the normal With these methods to obtain the pdf, cdf and inverse cdf of

5 functions of a normal vector, we can conveniently compute cer- mated classes): tain quantities. For example, if x and y are jointly normal with µx = 1, µy = 2, σx = .1, σy = .2, and ρxy = .8, we can compute hv(A|x)i 0 0 ln = β(x) = x Q2x + q x + q0 > 0, where: the pdf, cdf and inverse cdf of the function xy, and determine, say, hv(B|x)i 1 that its mean, and sd are respectively 1.03, 1 and 0.21.

The probability of a vector (multi-valued) function of the nor- 1 −1 −1   Q2 = Σ − Σa , (3) mal, e.g. f(x) = f1(x) f2(x) , in some f-domain (which 2 b may also be seen as the joint probability of two scalar func- −1 −1 q1 = Σa µa − Σb µb, tions), is again the normal probability in a corresponding x-   1 0 −1 0 −1 |Σb| pava domain. For example, the joint cdf Ff (c1, c2) is the function q0 = µbΣb µb − µaΣa µa + ln + ln . 2 |Σa| pbvb probability in an explicit domain: p (f1 < c1, f2 < c2), and can be computed as the normal probability in the intersection of This quadratic β(x) is the Bayes classier, or the Bayes deci- the x-domains f1(x) < c1 and f2(x) < c2, i.e. the domain sion variable that, when compared to 0, maximizes expected gain. min (c1 − f1 (x) , c2 − f2 (x)) > 0. Numerically computing ∂ ∂ When V = 1, the Bayes decision variable is the log posterior Ff (c1, c2) then gives the joint pdf of the vector function. ∂c1 ∂c2 Fig.2c, left, is an example of a joint pdf of two functions of a ratio, and this decision rule minimizes overall error. The error  10 −7 rates of dierent types, i.e. true and false positives and nega- bivariate normal with µ = −2 5 and Σ = , com- −7 10 tives, are then the probabilities of the normals on either side of puted in this way. the quadratic boundary β(x) = 0. These probabilities can be computed entirely numerically using the ray-trace method, or we The probability of such a vector function in an implicit domain, can rst arrive at mathematical expressions using the generalized i.e. p (g (f) > 0), is computed as the normal probability in the chi-square method (as follows), which are then numerically eval- implicit domain: p (h (x) > 0) where h = g ◦ f. Fig.2c illus- uated. The overall error p(e) is the prior-weighted sum of the trates the function probability and normal probability views of error rates of each normal. the implicit integral p(h = x sin x − x cos x > 1). The 83rd 1 2 2 1 Further, when priors are equal, the Bayes decision variable is percentile of this function h (using the inverse cdf) is 4.87. the log likelihood ratio (of a vs b), which can be called l(x).

3.1.1 Single-interval (yes/no) task 3 Classifying normal samples Consider a yes/no task where the stimulus x comes from one of two equally likely 1d normals a and b with means µa, µb and sd’s σa > σb (g.3a). The optimal decision (eq.3) is to pick a if the Suppose observations come from several normal distributions a b l a (x) > 0 Bayes decision variable (log likelihood ratio of vs ) b , with parameters µi, Σi and priors pi, and the outcome values i.e. if (rewards and penalties) of classifying them are represented in a  2  2 x − µb x − µa σb matrix V: vij is the value of classifying a sample from i as j. − + 2 ln > 0. σb σa σa If the true class is i, selecting i over others provides a relative P l a (x) is a scaled and shifted 1 dof noncentral chi-square for each value gain of vi := vii − j6=i vij. Given a sample x, the ex- b pected value gain of deciding i is therefore hv(i|x)i = p(i|x)vi = class (g.3b), and the Bayes error rates are: p(x|i) pivi. The Bayes-optimal decision is to assign each sample  02 2   02 2  to the class that maximizes this gain, or its log: p (B|a) = p χ 2 < σb c , p (A|b) = p χ 2 > σac , 1,σaλ 1,σb λ  2 2 ln σa µa − µb σb 1 0 −1 pivi where λ = 2 2 , c = λ + 2 2 , lnhv(i|x)i = − (x − µi) Σ (x − µi) + ln . σa − σb σa − σb i p k 2 |Σi|(2π) (4) and p(e) is their average.

When the outcome value is simply the correctness of classi- cation, V = 1 (so each vi = 1), then this quantity is the log 3.1.2 Two-interval task posterior, ln p(i|x), and when priors are also equal, it is the log Now consider an equal-priors two-interval task, where two stim- likelihood. uli x1 and x2 come from each of the (general) distributions a and b. A decision rule commonly employed here is to check which stimulus is larger.10, 11 But note that the optimal strategy is to determine whether the tuple x = (x1, x2) came from the joint 3.1 Two normals distribution ab of independent a and b (in that order), or from ba (opposite order). To do this we compute, given x, the log likeli- a b hood ratio l ab of ab vs ba, which turns out to be simply related to Suppose there are only two normal classes and . The Bayes- ba A l a optimal decision rule is to pick if (upper-case denotes the esti- the log likelihood ratios b for the individual stimuli in the single-

6 interval task: so has generalized chi-square distributions for each category (g.  2  p(ab|x) p(x|ab) p(x |a).p(x |b) p(a|x ).p(b|x ) 3d), and we can calculate that p(e) = p χ˜w,k,λ,0,0 < 0, where = = 1 2 = 1 2 p(ba|x) p(x|ba) p(x1|b).p(x2|a) p(b|x1).p(a|x2) µ − µ w = σ2 −σ2 , k = 1 1 , λ = a b σ2 σ2 . =⇒ l ab (x1, x2) = l a (x1) − l a (x2). a b 2 2 a b ba b b σa − σb

The optimal rule is to pick ab if l ab (x1, x2) > 0, i.e. if l a (x1) > If the two stimuli themselves arise from k-dimensional nor- ba b 2 l a (x ) mals N(µa, Σa) and N(µb, Σb), then the optimal discrimination b 2 . This is the familiar decision rule: the observer gets a log likelihood ratio from each distribution for the single-interval task is between 2k-dimensional normals ab and ba, whose means are (e.g. g.3b), and picks the larger likelihood ratio (not the larger the concatenations of µa and µb, and covariances are the block- stimulus). diagonal concatenations of Σa and Σb, in opposite order to each When a and b are normals, ab is the 2d normal with mean other. µab = (µa, µb) and sd Sab = diag(σa, σb), and ba is its its ipped version (g.3c). The optimal decision rule (eq.3) boils down to 3.1.3 m-interval task selecting ab when: Consider the m-interval (m-alternative forced choice) task with 2 2 2 2 2 2 m stimuli, one from the signal distribution N(µa, σa), and the σa − σb x1 − x2 + 2 µaσb − µbσa (x1 − x2) > 0. rest from N(µb, σb). Following previous reasoning, the probabil- When σa = σb, this is the usual condition of whether x1 > x2. ity of the ith stimulus being the signal is an m-d normal, with But when σa 6= σb, this optimal decision boundary comprises mean vector whose ith entry is µa and the rest are µb, and diag- two perpendicular lines, solid and dashed (g.3c). The x1 > x2 onal sd matrix whose ith entry is σa and the rest are σb. The part criterion is to use only the solid boundary, which is sub-optimal. of this log likelihood that varies across i is: The minimum error rate p(e) is the probability that the dier-  2  2 X xj − µb xi − µa ence distribution of the two categories of g.3b exceeds 0. l ab ba − − l a σb σa is the dierence of scaled and shifted noncentral chi-squares b , j6=i  2  2  2 X xj − µb xi − µb xi − µa = − + − . σ σ σ a p(c) = 0.70 j b b a b | {z } 1 | {z } varies with i b 10 b constant l (x)=0 The optimal response is to pick the m-d normal with highest like- lihood, i.e. pick the xi with the largest value of the second term above, i.e. with the largest log likelihood ratio l of a vs b, which a a is the familiar rule.2 Analogous to Wickens12 eq. 6.19, the maximum accuracy is then given by: 10-2 Z ∞ 0 m−1 x log likelihood ratio l (x) p(c) = Fb (l) fa(l) dl −∞ c p(c) = 0.74 d where fa and Fb are the pdf and cdf of l under a and b, which ba are known, so this can be evaluated numerically. For example, for 2, 3 and 4 intervals with the parameters of g.3a, the accuracy is 0.74 (g.3c), 0.64 and 0.58 (see example in the getting started x ab guide for the toolbox). When the variances are equal, these com- 2 12 (μ , μ ) puted accuracies match table 6.1 of Wickens. (μs , μa n) b σ a ba ab σb 3.1.4 Discriminability index

l (x1,x2)=0 Bayesian classiers are often used to model behavioral (or neural) 0 0 performance in binary classication. Within the Bayesian mod- x eling framework, it is possible to estimate, from the pattern of 1 log likelihood ratio l (x1,x2) errors, the separation (or overlap) of the decision variable distri- Figure 3: Binary yes/no and two-interval classication tasks. a. Optimal butions for the two categories, independent of the decision cri- yes/no decision between two unequal-variance 1d normal distributions. terion (which may dier from the optimal value of zero). The b. The same task transformed to the log likelihood ratio axis (log verti- discriminability index d0 measures this separation. If the two cal axis for clarity). c. Optimal two-interval discrimination between the underlying distributions are equal-variance univariate normals 0 same 1d normal distributions a and b is actually a discrimination between a and b, then d = |µa − µb|/σ, and if they are multivariate 2d normals ab and ba. d. The task transformed to the log likelihood ratio with equal covariance matrices, then it is their Mahalanobis dis- axis. 0 p 0 −1 −1 tance: d = (µa − µb) Σ (µa − µb) = kS (µa − µb)k =

7 −1 kµa − µbk/σµ, where σµ = 1/kS µk is the 1d slice of the ratios, these are: sd along the unit vector µ through the means, i.e. the multi- Z 0 dimensional d0 equals the d0 along the 1d slice through the means. p(B|a) = fa(l) dl = Fa(0), For unequal variances, there exist several contending discrim- −∞ Z ∞ inability indices.10, 12, 13 A common one is Simpson and Fitter’s p(A|b) = fb(l) dl = 1 − Fb(0). 0 10 da = |µa −µb|/σrms, extended to general dimensions as the Ma- 0 halanobis distance using the pooled covariance, i.e. with Srms = 1 (For 1d normals, these are given by eq.4.) The overall Bayes error [(Σ + Σ ) /2] 2 as the common sd.14 Another index is Egan and a b p(e) is the average of these two, and is the amount of overlap of Clarke’s d0 = |µ − µ |/σ ,15 which we here extend to general e a b avg the two distributions (e.g. the overlap area in g.3a). We now dimensions using S = (S + S ) /2. avg a b dene the Bayes discriminability index as the equal-variance in- dex that corresponds to this same Bayes error, i.e. the separation between two unit variance normals that have the same overlap as a 1d 2d s the two distributions, which comes out to be twice the z-score of 1000 1 the maximum accuracy:

0.8 0 100 db = −2Z (Bayes error / overlap fraction p (e)) 0.6 = 2Z (best accuracy / non-overlapping fraction p (c)) 0.4 10 0.2 This index is the best possible discriminability, i.e. by an ideal observer. It extends to all cases as a smooth function of the lay- 0 1 0 60 0 60 out and shapes of the distributions, and reduces to d0 for equal variance/covariance normals. d0 b b is a positive-denite statistical distance measure that is free of assumptions about the distributions, like the Kullback-Leibler divergence DKL(a, b), which is the expected log likelihood ra- Savg tio l under the a distribution (mean of the blue distributions in Sa 0 gs.3b and d). DKL(a, b) is asymmetric, whereas d (a, b) is Srms b Sb 0 symmetric for the two distributions. However, db does not sat- isfy the triangle inequality. For example, consider three equal- width, consecutively overlapping uniform distributions: a over [0, 3], b over [2, 5], and c over [4, 7]. b overlaps with a and c: 0 0 db(a, b) = db(b, c) = 2Z(2/3), but a and c do not overlap: 0 0 0 db(a, c) = ∞ ≮ db(a, b) + db(b, c). 0 0 0 In g.4a, we compare db with da and de for dierent mean- c separations and sd ratios of two normals, in 1d and 2d. We rst take two 1d normals and increase their discriminability by equally shrinking their sd’s while maintaining their ratio σa/σb = s, i.e. s eectively separating the means. We repeat this by starting with two 2d normals with dierent sd matrices, one of them scaled by 0 1 dierent values s each time, then shrink them equally. Extending previous ndings,10 we see that in 1d (g.4a left), Figure 4: Comparing discriminability indices. a. Plots of existing indices d0 ≤ d0 ≤ d0 . Thus, d0 and d0 underestimate the optimal dis- 0 0 0 a e b a e da and de as fractions of the Bayes index db, with increasing separation criminability of normal distributions. The worst case is when the between two 1d and two 2d normals, for dierent ratios s of their sd’s. 0 0 0 means are equal, so da = de = 0, but db is positive, since unequal b. Left : two normals with 1 sd error ellipses corresponding to their sd variances still provide discriminability. matrices Sa and Sb, and their average and rms sd matrices. Right: the space has been linearly transformed, so that a is now standard normal, Now consider the opposite end, where large mean-separation and b is aligned with the coordinate axes. c. Discriminating two highly- has a much greater eect on discriminability than sd ratios. Even 0 separated 1d normals. here, the underestimate by da persists, and worsens as the sd’s 0 become more unequal, reaching nearly 30% in the worst case. de 0 is a better estimate throughout, and equals db at large separation. 0 0 These unequal-covariance measures are simple approxima- In higher dimensions, da ≤ de still, and they still usually un- 0 tions that do not describe the exact separation between the dis- derestimate db (especially when means are close), but there are tributions. However, our methods can be used to dene a dis- exceptions (g.4a right, and g.2d). 0 0 criminability index that exactly describes the separation between We can theoretically show that da ≤ de in all dimensions and two arbitrary distributions (even non-normal). First, we deter- cases. In 1d, this is simply because σavg ≤√σrms, and at the limit√ 0 0 mine the minimum possible (Bayes) errors when V = 1 and pri- of highly unequal sd’s, σavg/σrms → 1/ 2, so da → de/ 2, ors are equal. In terms of the distributions of the log likelihood which is the 30% underestimate. 
In higher dimensions, we can

8 show analogous results using g.4b as an example. The left g- mals are N(0, 1) and N(µ, σ). From the single-criterion ROC ure shows two normals with error ellipses corresponding to their curve, we rst estimate µ and σ, then we use our method to com- 0 sd’s, and their average and rms sd’s. Now we make two linear pute db of normals with these parameters. 0 transformations of the space: rst we standardize normal a, then db can also be estimated from a likelihood-ratio ROC curve. 0 we diagonalize normal b (i.e. a rotation that aligns the axes of For any two distributions in any dimensions, db corresponds to error ellipse b with the coordinate axes). In this space (right g- the accuracy at the point along their likelihood-ratio ROC curve ure), Sa = 1, Sb is diagonal, and the axes of Savg and Srms are the that maximizes p(hit)−p(false alarm), which is the farthest point average and rms of the corresponding axes of Sa and Sb. Srms is from the diagonal, where the curve tangent is parallel to the di- hence bigger than Savg, so has larger overlap at the same separa- agonal (g.5a). 0 0 0 0 −1 −1 tion, so da ≤ de. The ratio of da and de is kSrmsµk/kSavgµk, the ratio of the 1d slices of the average and rms sd’s along the axis 3.1.6 Custom classiers through the means.√ When these are highly unequal, we again 0 0 Sometimes, instead of the optimal classier, we need to test and have da → de/ 2 in general dimensions. 0 compare sub-optimal classiers, e.g. one that ignores a cue, or We can also show that at large separation in 1d, de converges 0 some cue covariances, or a simple linear classier. So the toolbox to db. Consider normals at 0 and 1 with sd’s sσ and σ (g.4c). At large separation (σ → 0), the boundary points, where the dis- allows the user to extract the optimal boundary and change it, and s 1 explicitly supply some custom sub-optimal classication bound- tributions cross, are s±1 . The right boundary is σ(s−1) sd’s from ary. Fig.2d compares the classication of two bivariate normals each normal, so it adds as much accuracy for the left normal as 0 it subtracts for the right. So only the inner boundary is useful, using the optimal boundary (which corresponds to db), vs. using a which is 1 sd’s from each normal. The overlap here thus hand-supplied linear boundary. Just as with integration, one can σ(s+1) supply these custom classication domains in quadratic, ray-trace corresponds to d0 = 2 = d0 . So, when two 1d normals are b σ(s+1) e or implicit form, and use set operations on them. too far apart to compute their overlap (see performance section) 0 0 and hence db, the toolbox returns de instead. 0 3.2 Classifying using data Given that de is often the better approximation to the best dis- 0 0 10 criminability db, why is da used so often? Simpson and Fitter If instead of normal parameters, we have labelled data as input, 0 argued that da is the best index, because it is the accuracy in we can estimate the parameters. The maximum-likelihood esti- a two-interval task with stimuli x1 and x2, using the criterion mates of means, covariances and priors of normals are simply the x1 > x2. But as we saw, this is not the optimal way to do this sample means, covariances and relative frequencies. With these task. The optimal error p(e) is instead as calculated previously, parameters we can compute the optimal classier β(x) and the 0 and db(ab, ba) = 2Z (1 − p (e)) is the best discriminability. Un- error matrix. 
We can further calculate another quadratic bound- 0 fortunately, this does not have a simple relationship with db(a, b) ary γ(x) to better separate the given samples: starting with β(x), for the yes/no√ task. But we can calculate mathematically here that we optimize its (k+1)(k+2)/2 independent parameters to maxi- 0 0 de(ab, ba) = 2de(a, b), which may still√ better approximate the mize the classication outcome value of the given samples. This is 0 0 best discriminability than da(ab, ba) = 2da(a, b). A brief note about Grey and Morgan’s approximate index, which uses the geometric mean of the sd’s: this behaves incon- a b p(c) = 0.97 sistently; it underestimates d0 at small discriminability, but over- 1 b N estimates it at large discriminability. 0 In sum, db is the maximum discriminability between normals in all cases, including two-interval tasks, especially when means 0 are closer and variances are unequal. de often approximates it

0 p (hit) better than da, e.g. when the decision variable in a classication t task is modeled as two unequal-variance 1d normals. likelihood ratio (4d) likelihood ratio (1d) single criterion (1d) 3.1.5 ROC curves 0 1 -7 0 20 p(false alarm) ROC curves track the outcome rates from a single criterion swept log likelihood ratio l (x) across two 1d distributions (e.g. black curve of g.5a), or varying the likelihood-ratio between any two distributions in any dimen- Figure 5: ROC curves. a. Yes/no ROC curves for a single shifting cri- terion (black), vs. a shifting likelihood-ratio (green), between the two sions (green and purple curves), which corresponds to sweeping a 1d normals of g.3a (adapted from Wickens, 12 g. 9.3), and a shift- single criterion across the 1d distributions of the likelihood ratio ing likelihood-ratio between a normal and a t distribution in 4d (purple). l (gs.3b and5b for the green and purple curves). The optimal two-interval accuracies of the 1d normals (g.3c) and the Discriminability√ indices are frequently estimated from ROC 4d distributions (g.5b) are 0.74 and 0.97, equal to the areas under their 0 curves. da is 2 times the z-score of the single-criterion ROC likelihood-ratio curves here. The points marked on these curves are the 0 curve area. db has no such simple relationship with curve area, farthest from the diagonal, and correspond to the Bayes discriminability. 0 t but can be estimated in dierent ways. Even though db uses both b. Distributions of the log likelihood ratio of the 4d vs normal distri- criteria for unequal-variance 1d normals, it can still be estimated bution. Sweeping the criterion corresponds to moving along the purple from the usual single-criterion ROC curve. Assume that the nor- ROC curve of a.

9 important for non-normal samples, where the optimal boundary In the sections below, we shall see examples of such applica- between estimated normals may not be a good classier. This op- tions, where we map to fewer dimensions to see how well a mul- timization then improves classication while still staying within tivariate normal model works for a problem, and also combine the smooth quadratic family and preventing overtting. Fig.2e groups of cues to organize a problem and get visual insight. shows classication based on labelled non-normal samples. If, along with labelled samples, we supply a custom quadratic 3.5 Testing a normal model for classication classier, the toolbox instead optimizes this for the sample. This is useful, say, in the following case: suppose we have already com- The results developed here are for normal distributions. But even puted the optimal classier for samples in some feature space. when the variables in a classication problem are not exactly nor- Now if we augment the data with additional features, we may mal (e.g. either they are an empirical sample, or they are from start from the existing classier (with its coecients augmented some known, but non-normal distribution), we can still use the with zeros in the new dimensions) to nd the optimal classier in current methods if we check whether normals are an adequate the larger feature space. model for them. One test, as described before, is to project the distributions to one dimension, either by mapping to a quadratic 3.3 Multiple normals form (g.2g), or to an axis (g.2h), where we can visually com- pare the projections of the observed distributions and those of The optimal classier between two normals is a quadratic, so their tted normals. error rates can be computed using the generalized chi-square We could further explicitly test the normality of the variables method or the ray-trace method. When classifying amongst more with measures like negentropy, but this is stricter than neces- than two normals, the decision region for each normal is the in- i sary. If the nal test of the normal model is against outcomes of a tersection of its quadratic decision regions qn(x) > 0 with all the limited-trials classication experiment, then it is enough to check other normals i, and may be written as: for agreement between outcome counts predicted by the true dis- i tributions and their normal approximations, given the number of f(x) = min qn(x) > 0. i trials. For any classication boundary, we can calculate outcome rates, e.g. p(A|a) for a hit, determined from the true distributions, This is not a quadratic, so only the ray-trace method can compute vs. from the normal approximations. The count of hits in a task the error rates here, by using the intersection operation on the domains as described before. Fig.2f shows the classication of several normals with arbitrary means and covariances. a b 500 1 3.4 Combining and reducing dimensions a It is often useful to combine the multiple dimensions in a problem to fewer, or one dimension.16 Mapping many-dimensional inte- gration and classication problems to fewer dimensions allows b p(A|a) visualization, which can help us understand multivariate normal l = 0 p(A|b) models and their predictions, and to check how adequately they 0 represent the empirical or other theoretical probability distribu- -500 500 -35 0 100 tions for a problem. 
log likelihood ratio l As we have described, the multi-dimensional problem of in- c 1 d 1 tegrating a normal probability in the domain f(x) > 0 can be p(B|b) viewed as the 1d integral of the pdf of f(x) above 0. Similarly, multi-dimensional binary classication problems with a classier pe f(x) can be mapped to a 1d classication between two distribu- tions of the scalar decision variable f(x), with the criterion at 0, while preserving all classication errors. For optimally classify- ing between two normals, mapping to the Bayes decision variable 0 0 β(x) is the optimal quadratic combination of the dimensions. For 10-5 100 105 1010 10-1 101 103 integration and binary classication problems in any dimensions, assumed prior ratio p /(p +p +p ) assumed covariance scale the toolbox can plot these 1d ’function probability’ views (g.2g). b a c d With multiple classes, there is no single decision variable to map Figure 6: Testing normal approximations for classication. a. Classify- the space to, but the toolbox can plot the projection along any ing two empirical distributions (a is not normal). Gray curves are con- chosen vector direction. Fig.2h shows the classication of sam- tours of l, i.e. the family of boundaries corresponding to varying like- ples from four 4d t distributions using normal ts, projected onto lihood ratios of the two tted normals. b. Mean ± sd of hit and false the axis along (1,1,1,1). alarm fractions observed (color lls) vs. predicted by the normal model For a many-dimensional classication problem, we can also de- (outlines), along this family of boundaries. Vertical line is the optimal ne a decision variable on a subset of dimensions to combine boundary. c. Similar bands for class b hits and overall error, for the 4d them into one, then combine those combinations further etc, ac- 4-class problem of g.2h, across boundaries assuming dierent priors 0 cording to the logic of the problem. pb, and d. across boundaries assuming dierent covariance scales (d s).

10 is binomial with parameters equal to the number of a trials and cues (which changes only their variances but not their covari- p(A|a), so we can compare its count distribution between the true ances), ones that ignore certain cues or cue covariances, or simple and the normal model. at boundaries. As seen here, even for many-dimensional distri- If the classes are well-separated (e.g. for ideal observers), the butions that cannot be visualized, these tests can be performed optimal boundary provides near-perfect accuracy on both the to reveal some of their structure, and to show which specic out- true and the normal distributions, so comparing yields no insight. comes deviate from normal prediction for which boundaries. To make the test more informative, we repeat it as we sweep the When the problem variables have a known non-normal theo- boundary across the space into regions of error, to show if the nor- retical distribution, the maximum-likelihood normal model is the mal model still stands. This is similar to how the decision criterion one that matches its mean and covariance, and these tests can be between two 1d distributions is swept to create an ROC curve that performed by theoretically calculating or bootstrap sampling the characterizes the classication more richly than a single bound- error rate distributions induced by the known true distributions. ary. In multiple dimensions, there is more than one unique way to sweep the boundary. We pick two common suboptimal boundary families. The rst corresponds to an observer being biased to- 4 Matlab toolbox: functions and exam- wards one type of error or another, i.e. a change in the assumed ples ratio of priors or outcome values. The second is an observer hav- ing an internal discriminability dierent from the true (external) For an integration problem, the toolbox provides a function that one (e.g. due to blurring the distributions by internal noise), so inputs the normal parameters and the integration domain (as adopting a boundary corresponding to covariance matrices that quadratic coecients or a ray-trace or implicit function), and out- are scaled by a factor. When there are two classes, the boundaries puts the integral and its complement, the boundary points com- for both of these sub-optimal observers correspond to a shift in puted, and a plot of the normal probability or function probabil- q the constant oset 0 (eq.3), i.e. a shift in the likelihood ratio ity view. The function for a classication problem inputs normal of the two normals. So we are simply moving along the normal parameters, priors, outcome values and an optional classication likelihood-ratio ROC curves, as we compare the outcome rates of boundary, and outputs the coecients of the quadratic bound- the true and the normal distributions. ary β and points on it, the error matrix and discriminability in- Fig.6a shows the classication of two empirical distributions, 0 0 0 dices db, da and de, and produces a normal probability or func- where a is not normal, and gray curves show this family of bound- tion probability plot. With sample input, it additionally returns aries, which are simply contours of the log likelihood ratio l. 0 the coecients of γ and points on it, error matrices and db values Since a and b are well-separated, the ROC curves for both true corresponding to classication accuracies of the samples using β and normal distributions would hug the top and left margins, so and γ, and the mapped scalar decision variables β(x) and γ(x) they cannot be compared. 
Instead, we detach the hits and false from the samples. The toolbox also provides functions to compute alarms from each other and plot them individually against the pdf’s, cdf’s and inverse cdf’s of functions of normals. changing likelihood ratio criterion, which gives us more insight. Many dierent example problems, including every problem Fig.6b shows the mean ± sd bands of hits and false alarms from discussed in this paper (examples in gs.2,3,4 and5, tests in applying these boundaries on samples of 100 trials (typical of a gs.6 and7, and research applications in g.8) are available as psychophysics experiment) from each true distribution, vs. the interactive demos in the ‘getting started’ live script of the toolbox, normal approximations. They exactly coincide for false alarms / and can be easily adapted to other problems. correct rejections, but deviate for hits / misses, correctly reect- ing that b is normal but a is not. The investigator can judge if this deviation is small enough to be ignored for their problem. 5 Performance benchmarks Now consider the case of applying these tests to multi-class problems. The two kinds of sub-optimal boundaries we picked In this section we test the performance (accuracy and speed) of are no longer the same family here. Recall that the classication our Matlab implementations of the generalized chi-square and problem of2h had four 4d t distributions. Fig.6c shows simi- ray-trace algorithms, against a standard Monte Carlo integration lar tests to see if this problem (with priors now equal) are well- algorithm. We rst set up a case where we know the ground truth, modeled by normals. The family of boundaries corresponds to and compare the estimates of all three methods at the limit of varying the assumed prior pb. We may compare any of the 16 out- high discriminability (low error rates) where it is most challeng- come rates here, e.g. p(B|b), and also the overall error p(e). When ing (which occurs for computational models and ideal observers). there are multiple classes, for any given true class, the numbers We take two 3d normals with the same covariance matrix, so that of responses in the dierent classes are multinomially distributed, the true discriminability d0 is exactly calculated as their Maha- so that the total number of wrong responses is again binomially lanobis distance. Now we increase their separation, while com- distributed. p(e) is the prior-weighted sum of these binomially puting the optimal error with each of our methods at maximum distributed individual errors, so we can calculate its mean and sd precision, and a discriminability from it. The generalized chi- predicted by the observed vs. the normal distributions. Fig.6d square method is very fast (due to the trivial planar boundary shows the test across boundaries corresponding to all covariance here), and the ray-trace method takes an average of 40s. For a 0 matrices scaled by a factor, changing the d between the classes. fair comparison, we use 108 samples for the Monte Carlo, which Some other notable suboptimal boundaries to consider for this also takes ∼40s. Each method returns an estimate dˆ0. Fig.7a test are ones that correspond to adding independent noise to the shows the relative inaccuracies |dˆ0 − d0|/d0 as true d0 increases.

5 Performance benchmarks

In this section we test the performance (accuracy and speed) of our Matlab implementations of the generalized chi-square and ray-trace algorithms against a standard Monte Carlo integration algorithm. We first set up a case where we know the ground truth, and compare the estimates of all three methods at the limit of high discriminability (low error rates), where it is most challenging (which occurs for computational models and ideal observers). We take two 3d normals with the same covariance matrix, so that the true discriminability d′ is exactly calculated as their Mahalanobis distance. Now we increase their separation, while computing the optimal error with each of our methods at maximum precision, and a discriminability from it. The generalized chi-square method is very fast (due to the trivial planar boundary here), and the ray-trace method takes an average of 40 s. For a fair comparison, we use 10^8 samples for the Monte Carlo, which also takes ∼40 s. Each method returns an estimate d̂′. Fig. 7a shows the relative inaccuracies |d̂′ − d′|/d′ as the true d′ increases.

With increasing separation, the Monte Carlo method quickly becomes inaccurate, since the error rate, i.e. the probability content in the integration domain, becomes too small to be sampled. The Monte Carlo method stops working beyond d′ ≈ 10, where none of the 10^8 samples fall in the error domain. In contrast, inaccuracies in our methods are extremely small, of the order of the double-precision machine epsilon ε, demonstrating that the algorithms contain no systematic error besides machine imprecision (however, Matlab's native integration methods may not always reach the desired precision for a problem). This is possible because a variety of techniques are built into our algorithms to preserve accuracy, such as holding tiny and large summands separate to prevent rounding, using symbolic instead of numerical calculations, and using accurate tail probabilities. The inaccuracies do not grow with increasing separation, until d′ ≈ 75, which corresponds to the smallest error rate p(e) representable in double precision ('realmin' = 2e-308), beyond which both methods return p(e) = 0 and d′_b = ∞. For 1d problems beyond this, the toolbox returns d′_e instead.

Next, we compare the three methods across several problems of fig. 2. The values here are large enough that Monte Carlo estimates are reliable and quick, so we use it as a provisional ground truth. We compute the values with all three methods up to maximum practicable precisions, then we calculate the relative (fractional) differences of our methods from the Monte Carlo. If a value is within the spread of the Monte Carlo estimates, we call the relative difference 0. Table 7b lists these, along with the times to compute the values to 1% precision on an AMD Ryzen Threadripper 2950X 16-core processor. We see that both of our methods produce accurate values at comparable speeds.
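Returning to the ground-truth setup above, here is a minimal sketch in base Matlab plus mvnrnd/normcdf/norminv from the Statistics Toolbox; the means and covariance are illustrative values, not those used for fig. 7:

    % Two 3d normals with equal covariance: true d' is the Mahalanobis distance,
    % and the optimal error rate with equal priors is p(e) = Phi(-d'/2).
    mu_a = [0 0 0]';  mu_b = [4 2 1]';
    S    = [2 .5 0; .5 1 .2; 0 .2 1.5];                 % shared covariance (illustrative)

    d_true = sqrt((mu_b-mu_a)' / S * (mu_b-mu_a));      % true d' = Mahalanobis distance
    pe     = normcdf(-d_true/2);                        % exact optimal error (equal priors)

    % Monte Carlo estimate of the same error rate, using the optimal planar boundary
    n  = 1e6;                                           % far fewer than the 10^8 used above
    x  = mvnrnd(mu_a', S, n)';                          % samples of class a (columns)
    w  = S \ (mu_b - mu_a);  c = w'*(mu_a + mu_b)/2;    % boundary: w'x = c
    pe_mc = mean(w'*x > c);                             % fraction misclassified
    d_mc  = -2*norminv(pe_mc);                          % d' recovered from the MC error
    % As d_true grows, pe falls below 1/n, pe_mc becomes 0 and d_mc becomes Inf:
    % the Monte Carlo failure described above.

    % The double-precision ceiling quoted above: the smallest representable error
    % rate is realmin, so the largest finite d' is -2*norminv(realmin), about 75.
    d_max = 2*sqrt(2)*erfcinv(2*realmin);               % same as -2*norminv(realmin)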
Figure 7: Performance benchmarks of the generalized chi-square (denoted by χ̃²) and ray-trace methods, against a standard Monte Carlo method. a. Relative inaccuracies in d′ estimates by the Monte Carlo method (across multiple runs), and our two methods, as the true d′ increases. Monte Carlo estimates that take similar time rapidly become erroneous, failing beyond d′ ≈ 10. Our methods stay extremely accurate (around machine epsilon ε) up to d′ ≈ 75, which corresponds to the smallest error rate representable in double precision ('realmin'). b. For several problems of fig. 2, relative differences in the outputs of the two methods from the Monte Carlo estimate, and computation times for 1% precision:

    Problem          Rel. diff. from MC      Time (s)
                     χ̃²        ray           χ̃²       ray     MC
    2a top, p        n/a       0             n/a      0.9     0.02
    2a bottom, p     n/a       0             n/a      0.3     0.003
    2d, p(e)         0         0             0.7      0.1     0.05
    2d, p′(e)        0         0             0.006    0.08    0.04
    2e, 1st p(e)     0         0             1.7      0.12    0.01
    2g, p            0         8e-4          0.4      1.7     0.02
    2g, 1st p(e)     0         9e-4          0.7      1.4     0.01

6 Applications in visual detection

We demonstrate the use of these methods in visual detection tasks that have multiple cues with different variances and correlations.

6.1 Detecting targets in natural scenes

We have applied this method in a study to measure how humans compare against a nearly ideal observer in detecting occluding targets against natural scene backgrounds in a variety of conditions.17 We placed a target on a random subset of natural images, then blurred and downsampled them to mimic the effect of the early visual system (fig. 8a). We sought to measure how well the targets on these degraded images can be detected using three cues, related to the luminance in the target region, the target pattern, and the target boundary. We computed these cues on the set of images. They form two approximately trivariate normal distributions for the target-present and target-absent categories. We then computed the decision boundary, error rate and d′_b against varying conditions. Fig. 8b shows the result for one condition, with a hyperboloidal boundary. These error rates and d′_b's can then be compared across conditions.
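A minimal sketch of this kind of cue classification, with synthetic stand-in data in place of the measured cues (Statistics Toolbox: mvnrnd, mvnpdf, norminv); the same log-likelihood-ratio mapping is also what merges groups of cues in the next subsection:

    % Fit trivariate normals to 'present' and 'absent' cue vectors and classify
    % with the Bayes decision variable l (log likelihood ratio). Stand-in data.
    rng(1);
    cues_p = mvnrnd([1 1 1], [1 .3 0; .3 1 .2; 0 .2 1], 500);   % stand-in present cues
    cues_a = mvnrnd([0 0 0], eye(3), 500);                      % stand-in absent cues

    % maximum-likelihood normal fits (match mean and covariance)
    mu_p = mean(cues_p);  S_p = cov(cues_p);
    mu_a = mean(cues_a);  S_a = cov(cues_a);

    % Bayes decision variable: l(x) = log p(x|present) - log p(x|absent)
    l = @(x) log(mvnpdf(x, mu_p, S_p)) - log(mvnpdf(x, mu_a, S_a));

    % error rates of the quadratic boundary l(x) = 0 (equal priors)
    p_miss = mean(l(cues_p) < 0);                % present trials called absent
    p_fa   = mean(l(cues_a) > 0);                % absent trials called present
    p_e    = (p_miss + p_fa)/2;
    db_hat = -2*norminv(p_e);                    % sample-based discriminability estimate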

6.2 Detecting camouflage

We also applied this method in a study measuring performance in detecting camouflaged objects.18 The major cue for detecting the object (fig. 8c) is its edge, which we compute at scales of 2px, 4px and 8px. We extract two scalar features from the edge at each scale: the edge power captures its overall prominence, and the edge spectrum characterizes how this prominence is distributed along the boundary. We thus have 6 total features. Fig. 8d shows the classification of these images using these 6 features, projected onto the Bayes decision variable (log likelihood ratio) l. In this reduced dimension, we can see that the absent distribution is quite normal, and present is nearly so. Consistently, in a normality test for classification with 100 trials, fig. 8e, the hit fraction deviates only marginally from its normal prediction, so we accept the normal model here. Fig. 8f shows classification using only the 2px features. We use our dimension reduction technique to combine these two cues into the Bayes decision variable l of this space, which we call simply the 2px edge cue. Classifying using this single variable, fig. 8g, is the same as the 2d classification of fig. 8f, and preserves the errors. We do the same merging at 4px and 8px, thus mapping 6 features to 3. Fig. 8h shows the classification using these 3 merged cues. Due to the information in the two added scales, the classification has improved. The total number of classifier parameters used in this sequential classification is 28 (6 for each of the three 2d classifiers, then 10 when combining them in 3d). The classifier in full 6d has 28 parameters as well, yet it performs better since it can simultaneously optimize them all. Even so, merging features allows one to combine them in groups and sequences according to the problem structure, and visualize them.

Figure 8: Applying the method and toolbox to visual target detection studies. a. Example image of a target on a natural background. b. Classification of images with the target present or absent, in the space of three cues. p(e) and d′_b correspond to the classification error of the sample points using γ. c. Example image for camouflage detection. d. Classifying these using 6 cues, viewed in terms of the log likelihood ratio l. e. Bootstrap mean ± sd of hit and false alarm fractions from applying a family of boundaries (corresponding to varying the criterion likelihood ratio) on 100 samples of the true 6d cue distributions (color fills), vs. their normal approximations (outlines). f. Classifying with only two cues computed at 2px. Grey curves are contours of the log likelihood ratio l. g. Combining the two cues of plot f into one using l (i.e. the space of plot f has been projected along the gray contours). h. Classifying with such combined cues at 3 scales.

7 Conclusions

In this paper, we presented our methods and open-source software for computing integrals and classification performance for normal distributions over a wide range of situations.

We began by describing how to integrate any multinormal distribution in a quadratic domain using the generalized chi-square method, then presented our ray-trace method to integrate in any domain, using examples from our software. We explained how this is synonymous with computing cdf's of quadratic and arbitrary functions of normal vectors, which can then be used to compute their pdf's and inverse cdf's as well.

We then described how to compute, given the parameters of multiple multinormals or labelled data from them, the classification error matrix, with optimal or sub-optimal classifiers, and the maximum (Bayes-optimal) discriminability index d′_b between two normals. We showed that the common indices d′_a and d′_e underestimate this, and that contrary to common use, d′_e is often a better approximation than d′_a, even for two-interval tasks.

We next described methods to merge and reduce dimensions for normal integration and classification problems without losing information. We presented tests for how reliably all the above methods, which assume normal distributions, can be used for other distributions. We followed this by demonstrating the speed and accuracy of the methods and software on different problems.

Finally, we illustrated all of the above methods on two visual detection research projects from our laboratory.
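As a small 1d illustration of the index comparison recapped above, the sketch below computes the three indices for an unequal-variance pair; the expressions used here for d′_a (RMS sd), d′_e (mean sd) and d′_b (from the Bayes error as −2Φ⁻¹(e_B)) are standard signal-detection forms and should be checked against the definitions given earlier in the paper (Statistics Toolbox: normpdf, norminv):

    % Compare d'_a, d'_e and d'_b for two 1d normals with unequal sds.
    s1 = 1;  s2 = 3;  mu = 2;                                % illustrative values
    f1 = @(x) normpdf(x, 0,  s1);
    f2 = @(x) normpdf(x, mu, s2);

    eB  = 0.5*integral(@(x) min(f1(x), f2(x)), -Inf, Inf);   % Bayes error, equal priors
    d_b = -2*norminv(eB);                                    % Bayes discriminability index
    d_a = mu/sqrt((s1^2 + s2^2)/2);                          % RMS-sd index
    d_e = mu/((s1 + s2)/2);                                  % mean-sd index
    % for this pair d_a < d_e < d_b: both underestimate d'_b, d_e less so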

Although not developed here, the approach of our ray-trace integration method may carry over to other univariate and multivariate distributions. In the method, we spherically symmetrize the normal, find its distribution along any ray from the center, then add it over a grid of angles. This transforms all problem shapes to the canonical spherical form, then efficiently integrates outward from the center of the distribution. Some distributions, e.g. log-normal, can simply be transformed to a normal and then integrated with this method. For example, if y ∼ lognormal(µ = 1, σ = 0.5), then we can compute, say, p(sin y > 0) = p(sin e^x > 0) = 0.65 (where x is normal), and all other quantities such as pdf's, cdf's, and inverse cdf's of its arbitrary functions (see toolbox example guide).

For other distributions, our general method is still useful if they are already spherically symmetric (i.e. spherical distributions), or can be made so (e.g. elliptical distributions), and the ray distribution through the sphere can be found. When they cannot be spherized, the ray distribution (if calculable) will depend on the orientation, just as the integration domain does. But once this additional dependency has been taken into account, integrating along rays from the center should still be the efficient method for distributions that fall off away from their center.
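As a quick Monte Carlo check (a sketch, base Matlab only) of the log-normal example above:

    % y = e^x with x ~ N(1, 0.5^2), i.e. y ~ lognormal(mu = 1, sigma = 0.5)
    rng(3);
    x = 1 + 0.5*randn(1e6, 1);
    y = exp(x);
    p = mean(sin(y) > 0);        % should come out close to the 0.65 quoted above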

8 Acknowledgements

We thank Dr. Johannes Burge (University of Pennsylvania), Dr. R Calen Walshe (University of Texas at Austin), Kristoffer Frey (MIT) and Dr. David Green for discussions and improvements in the method, code and text. This work was supported by NIH grants EY11747 and EY024662.

References

1. Andrew Ng. Generative learning algorithms. CS229 Lecture Notes, IV, 2019.
2. David Marvin Green and John A. Swets. Signal detection theory and psychophysics, volume 1. Wiley, New York, 1966.
3. Richard O Duda, Peter E Hart, and David G Stork. Pattern classification. John Wiley & Sons, 2012.
4. Harold Ruben. Probability content of regions under spherical normal distributions, I. The Annals of Mathematical Statistics, 31(3):598–618, 1960.
5. Alan Genz and Frank Bretz. Computation of multivariate normal and t probabilities, volume 195. Springer Science & Business Media, 2009.
6. Harold Ruben. Probability content of regions under spherical normal distributions, IV: The distribution of homogeneous and non-homogeneous quadratic functions of normal variables. The Annals of Mathematical Statistics, 33(2):542–570, 1962.
7. Robert B Davies. Numerical inversion of a characteristic function. Biometrika, 60(2):415–417, 1973.
8. Lloyd N Trefethen. Approximation theory and approximation practice, volume 164. SIAM, 2019.
9. Edward B Saff and A B J Kuijlaars. Distributing many points on a sphere. The Mathematical Intelligencer, 19(1):5–11, 1997.
10. Adrian J Simpson and Mike J Fitter. What is the best index of detectability? Psychological Bulletin, 80(6):481, 1973.
11. David M Green. A homily on signal detection theory. The Journal of the Acoustical Society of America, 148(1):222–225, 2020.
12. Thomas D Wickens. Elementary signal detection theory. Oxford University Press, USA, 2002.
13. R L Chaddha and L F Marcus. An empirical comparison of distance statistics for populations with unequal covariance matrices. Biometrics, pages 683–694, 1968.
14. S A Paranjpe and A P Gore. Selecting variables for discrimination when covariance matrices are unequal. Statistics & Probability Letters, 21(5):417–419, 1994.
15. James P Egan and Frank R Clarke. Psychophysics and signal detection. Technical report, Indiana University Hearing and Communication Laboratory, 1962.
16. Ipek Oruç, Laurence T Maloney, and Michael S Landy. Weighted linear cue combination with possibly correlated error. Vision Research, 43(23):2451–2468, 2003.
17. R Calen Walshe and Wilson S Geisler. Detection of occluding targets in natural backgrounds. Journal of Vision, 20(13):14, 2020.
18. Abhranil Das and Wilson Geisler. Understanding camouflage detection. Journal of Vision, 18(10):549, 2018.
