Entropy-Based Approach for Detecting Feature Reliability

Dragoljub Pokrajac, Delaware State University
Longin Jan Latecki, Temple University

Abstract – Although tremendous effort has been made over the past fifty years to analyze images and videos reliably, the reality is that one cannot rely 100% on the analysis results. In this paper, rather than proposing yet another improvement in video analysis techniques, we discuss entropy-based monitoring of feature reliability. The major assumption of our approach is that the noise adversely affecting the observed feature values is Gaussian, and that the distribution of a noisy feature approaches the normal distribution as the noise magnitude increases. We consider one-dimensional features and compare two ways of estimating differential entropy: histogram-based and parametric estimation. We demonstrate that the parametric approach is superior and can identify time intervals of a test video in which the observed features are not reliable for motion detection tasks.

1. INTRODUCTION

The main effort in video analysis today still goes into making feature computation more reliable. Our claim is that, in spite of all such efforts, we need to accept that computed features will never be absolutely reliable, and to focus our attention also on computing reliability measures. This way, system decisions will be made only when features are sufficiently reliable. For an intelligent video analysis system, we propose supplementing feature computation with instantaneous evaluation of the computed features, assessment of their reliability, and adaptation of the system behavior in accordance with feature reliability. For example, if the goal of a system is to monitor motion activity and to signal an alarm if the activity is high, the system can make reliable decisions only if there exists a subset of the computed motion activity features that is sufficiently reliable. Monitoring feature reliability and adjusting the system behavior accordingly therefore seems to be the only way to deploy autonomous video surveillance systems in the near future.

A good overview of existing approaches to motion detection can be found in the collection of papers edited by Remagnino et al. [1] and in the special section on video surveillance in IEEE PAMI edited by Collins et al. [2]. A common feature of existing approaches to moving object detection is that they are pixel based. Some approaches compare the color or intensity of pixels in the incoming video frame to a reference image. Jain et al. [3] use simple intensity comparison to reference images, so that values above a given threshold identify the pixels of moving objects. A large class of approaches is based on appropriate statistics of color or gray values over time at each pixel location (e.g., the segmentation by background subtraction in W4 [4], eigenbackground subtraction [5], etc.). Wren et al. [6] were the first to use a statistical model of the background instead of a reference image.

One of the most successful approaches to motion detection was introduced by Stauffer and Grimson [7]. It is based on an adaptive Gaussian mixture model of the distribution of color values over time at each pixel location. Each Gaussian function in the mixture is defined by its prior probability, mean, and covariance matrix. In [8], we applied the Stauffer-Grimson technique to spatiotemporal blocks and demonstrated the robustness of the modified technique in comparison to the original pixel-based approach.

In [9], we showed that motion detection algorithms based on local variation are not only much simpler but can also be a more adequate model for motion detection in infrared videos. This technique can significantly reduce processing time in comparison to the Gaussian mixture model, due to the lower complexity of the local variation computation, thus making real-time processing of high-resolution videos and efficient analysis of large-scale video data viable. Moreover, the local-variation based algorithm remains stable with higher dimensions of input data, which is not necessarily the case for an EM-type algorithm (used for Gaussian model estimation).

2. ENTROPY-BASED ASSESSMENT OF FEATURE RELIABILITY

To determine whether a particular feature is reliable, we assume that the feature bears more information if its distribution differs more significantly from a normal (Gaussian) distribution. Observe that similar heuristics are used, e.g., in Independent Component Analysis [10]. The follow-up assumption is that the feature becomes unreliable if additional random noise is superimposed, which makes the distribution of such a noisy feature more Gaussian-like. Hence, by measuring to what extent a feature distribution differs from a Gaussian distribution, one can learn not only to what extent the feature is useful but also when such usefulness drops (e.g., due to some external and often unobserved factor).

Information theory proposes negentropy as the measure of this discrepancy. Given a probability density p(x) of a feature, the differential entropy is defined [11, 12] as:

$$H(x) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx \tag{1}$$

Among all distributions p(x) that have the same variance σ², differential entropy is maximal for a Gaussian distribution, where it is equal to:

$$H_{Gauss}(\sigma^2) = \frac{1}{2} \log\left(2\pi e \sigma^2\right) \tag{2}$$

Hence the negentropy, defined as

$$J(x) = H_{Gauss}(\sigma^2) - H(x) \tag{3}$$

or its normalized value

$$\frac{J(x)}{H_{Gauss}(\sigma^2)} = 1 - \frac{H(x)}{H_{Gauss}(\sigma^2)} \tag{4}$$

may be used to measure the usefulness and reliability of a feature. Observe that the minimal value of negentropy is 0 (when p(x) is Gaussian).
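As a small illustration of eqs. (2)-(4), the sketch below evaluates the negentropy of two non-Gaussian densities whose differential entropy is known in closed form: a uniform density on [0, a] (entropy log a, variance a²/12) and a Laplace density with scale b (entropy 1 + log 2b, variance 2b²). The function names and the choice of these two densities are ours, not part of the original method, and natural logarithms (nats) are assumed.

```python
import math


def gaussian_entropy(variance):
    """Eq. (2): differential entropy (nats) of a Gaussian with this variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def negentropy(entropy, variance):
    """Eq. (3): J = H_Gauss(sigma^2) - H."""
    return gaussian_entropy(variance) - entropy


def normalized_negentropy(entropy, variance):
    """Eq. (4): J / H_Gauss = 1 - H / H_Gauss (undefined when H_Gauss is 0)."""
    return 1.0 - entropy / gaussian_entropy(variance)


# Uniform density on [0, a]: H = log(a), variance = a^2 / 12.
a = 4.0
j_uniform = negentropy(math.log(a), a * a / 12.0)

# Laplace density with scale b: H = 1 + log(2b), variance = 2 b^2.
b = 1.5
j_laplace = negentropy(1.0 + math.log(2.0 * b), 2.0 * b * b)

# A Gaussian has zero negentropy by construction.
j_gauss = negentropy(gaussian_entropy(3.0), 3.0)

print(f"uniform: {j_uniform:.4f}  laplace: {j_laplace:.4f}  gauss: {j_gauss:.4f}")
```

Under these assumptions the uniform density gives roughly 0.18 nats and the Laplace density roughly 0.07 nats, regardless of a and b: the negentropy of eq. (3) does not change when the feature is rescaled, whereas the normalized value of eq. (4) does, because H_Gauss itself depends on the variance.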
Differential entropy and negentropy can be generalized to multidimensional variables. However, in this paper we consider only a one-dimensional feature x.

2.1 Computation of Differential Entropy Based on Piecewise Approximation

Let us approximate p(x) with a piecewise constant function p'(x*) such that:

$$p'(x^*) = K\,p(x_i), \quad x^* \in [x_i, x_i + \Delta x], \quad i = 0, \ldots, N_{BINS} - 1 \tag{5}$$

where $x_{N_{BINS}} \equiv x_{N_{BINS}-1} + \Delta x$. Here, K is a normalization constant chosen such that p'(x*) is a distribution, i.e., $\int_{-\infty}^{\infty} p'(x^*)\,dx^* = 1$. Hence, we choose K such that $K = 1 / \left(\Delta x \sum_i p(x_i)\right)$.

Let us denote $p_i \equiv K\,p(x_i)$, so that $p'(x^*) = p_i$ for $x^* \in [x_i, x_i + \Delta x]$. The differential entropy can be approximated as:

$$H(x^*) = -\int_{-\infty}^{\infty} p'(x^*) \log p'(x^*)\,dx^* = -\sum_i \int_{x_i}^{x_i + \Delta x} p_i \log p_i\,dx^* = -\sum_i p_i \log p_i\,\Delta x \tag{6}$$

Let us now compute the variance $\hat{\sigma}^{*2}$ of the discretized variable x* (corresponding to the distribution p'(x*)). The variance is:

$$\hat{\sigma}^{*2} = E\left(x^{*2}\right) - \left(E(x^*)\right)^2 \tag{7}$$

where:

$$E(x^*) = \Delta x \sum_i p_i x_i + \frac{\Delta x}{2} \tag{8a}$$

$$E\left(x^{*2}\right) = \Delta x \sum_i p_i x_i^2 + \Delta x^2 \sum_i p_i x_i + \frac{\Delta x^2}{3} \tag{8b}$$

The differential entropy of a Gaussian variable that has the same variance as x* is (see eq. (2)):

$$H_{gauss}\left(\hat{\sigma}^{*2}\right) = \frac{1}{2} \log\left(2\pi e\,\hat{\sigma}^{*2}\right) \tag{9}$$

Using eq. (6) and eq. (9), we approximate J(x), eq. (3), with the negentropy J(x*) of the discretized variable x*:

$$J(x^*) = H_{gauss}\left(\hat{\sigma}^{*2}\right) - H(x^*) \tag{10}$$

Similarly, we can approximate the normalized negentropy, eq. (4), as:

$$\frac{J(x^*)}{H_{gauss}\left(\hat{\sigma}^{*2}\right)} = \frac{H_{gauss}\left(\hat{\sigma}^{*2}\right) - H(x^*)}{H_{gauss}\left(\hat{\sigma}^{*2}\right)} \tag{11}$$

[...] compare reliability with no need to normalize with the entropy of a Gaussian.

2) Use a first-order Taylor approximation of the logarithmic function in eq. (1), which leads to: $(1 + \varepsilon)\log(1 + \varepsilon) \approx \varepsilon + \varepsilon^2 / 2$;

3) Use a conveniently chosen set of orthogonal functions of a feature x (denoted by G_i(x)) to expand the probability density function p(x) in the vicinity of a Gaussian probability density.

In practice, the choice of orthogonal functions is based on the practicability of computing estimates of the expectations E(G_i(x)) and their sensitivity to outliers, on the integrability of the obtained probability density function approximation and, last but not least, on the properties of non-Gaussian distributions we want to capture.

Based on such considerations, [11] proposes the following two approximations of negentropy, which we use in this paper:

$$J_a(x^*) = k_1 \left(E\left(x^* e^{-x^{*2}/2}\right)\right)^2 + k_{2a} \left(E\left(|x^*|\right) - \sqrt{\frac{2}{\pi}}\right)^2 \tag{12a}$$

$$J_b(x^*) = k_1 \left(E\left(x^* e^{-x^{*2}/2}\right)\right)^2 + k_{2b} \left(E\left(e^{-x^{*2}/2}\right) - \sqrt{\frac{1}{2}}\right)^2 \tag{12b}$$

Here the coefficients are determined as:

$$k_1 = \frac{36}{8\sqrt{3} - 9}, \quad k_{2a} = \frac{24}{2 - \frac{6}{\pi}}, \quad k_{2b} = \frac{24}{16\sqrt{3} - 27} \tag{13}$$

3. EXPERIMENTAL RESULTS

We evaluated the proposed techniques for assessing feature reliability on a set of videos [14]. This set includes infrared videos, for which the same parameter settings as for visible-light videos were used. Here we focus on our results on a video sequence from the Performance Evaluation of Tracking and Surveillance (PETS) repository, referred to as the Campus 3 sequence [15].

We consider the reliability of a feature called motion activity, computed as follows [9]. First, we represent videos as three-dimensional (3D) arrays of gray level or monochromatic infrared pixel values. We divide each image in a video sequence into disjoint 8x8 squares and generate spatiotemporal (3D) blocks from three adjacent squares in consecutive frames at the same video plane location. The obtained 3D blocks are represented as 192-dimensional vectors (8*8*3) of gray level or monochromatic infrared pixel values. We then zero-mean these vectors and project them to three dimensions using principal component analysis (PCA) [16]. For each block location (x,y), we consider the set of vectors of the PCA projections corresponding to a symmetric window of 7 frames around the temporal instant t, and for
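As an illustration of Section 2, the histogram-based estimate of eqs. (5)-(10) and the parametric approximation of eq. (12b) can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: the function names are hypothetical, the histogram uses equal-width bins spanning the sample range, logarithms are natural, and the samples are standardized to zero mean and unit variance before eq. (12b) is applied. Only the J_b variant is shown.

```python
import math
import random


def histogram_negentropy(samples, n_bins=32):
    """Negentropy J(x*) of eq. (10) from an equal-width histogram (nats)."""
    lo, hi = min(samples), max(samples)
    if hi <= lo:
        raise ValueError("samples must not be constant")
    dx = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        i = min(int((s - lo) / dx), n_bins - 1)  # the maximum goes into the last bin
        counts[i] += 1
    n = len(samples)
    p = [c / (n * dx) for c in counts]        # p_i, so that dx * sum(p_i) = 1
    x = [lo + i * dx for i in range(n_bins)]  # left bin edges x_i
    # Eq. (6), using the convention 0 * log 0 = 0 for empty bins.
    h = -dx * sum(pi * math.log(pi) for pi in p if pi > 0.0)
    s1 = sum(pi * xi for pi, xi in zip(p, x))
    s2 = sum(pi * xi * xi for pi, xi in zip(p, x))
    m1 = dx * s1 + dx / 2.0                        # eq. (8a)
    m2 = dx * s2 + dx * dx * s1 + dx * dx / 3.0    # eq. (8b)
    var = m2 - m1 * m1                             # eq. (7)
    h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * var)  # eq. (9)
    return h_gauss - h                             # eq. (10)


def parametric_negentropy_b(samples):
    """Approximation J_b of eq. (12b); standardizing first is our assumption."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    if std == 0.0:
        raise ValueError("samples must not be constant")
    z = [(s - mean) / std for s in samples]
    k1 = 36.0 / (8.0 * math.sqrt(3.0) - 9.0)      # eq. (13)
    k2b = 24.0 / (16.0 * math.sqrt(3.0) - 27.0)   # eq. (13)
    e_odd = sum(v * math.exp(-v * v / 2.0) for v in z) / n
    e_even = sum(math.exp(-v * v / 2.0) for v in z) / n
    return k1 * e_odd ** 2 + k2b * (e_even - math.sqrt(0.5)) ** 2


# Synthetic check: Gaussian samples should score near 0, uniform samples higher.
rng = random.Random(0)
gauss_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform_samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
for name, data in (("gaussian", gauss_samples), ("uniform", uniform_samples)):
    print(name, histogram_negentropy(data), parametric_negentropy_b(data))
```

Because the histogram estimate computes both the entropy and the variance from the same piecewise constant density, it cannot fall below zero except through rounding error. The number of bins is a free parameter of this sketch, and the estimate depends on it.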
