
Charles Lesniewska-Choquet, Gilles Mauris, Abdourrahmane Atto, Grégoire Mercier. On Elliptical Possibility Distributions. IEEE Transactions on Fuzzy Systems, Institute of Electrical and Electronics Engineers, 2020, 28 (8), pp. 1631-1639. DOI: 10.1109/TFUZZ.2019.2920803. HAL Id: hal-02149345, https://hal.archives-ouvertes.fr/hal-02149345, submitted on 6 Jun 2019.


On Elliptical Possibility Distributions

Charles LESNIEWSKA-CHOQUET1,∗, Gilles MAURIS1, Abdourrahmane M. ATTO1, Grégoire MERCIER2

1 Univ. Savoie Mont Blanc, LISTIC, F-74000 Annecy, France
2 eXo maKina, Digital Technologies, Paris, France
∗ [email protected]

The work was supported by USMB and the PHOENIX ANR-15-CE23-0012 grant of the French National Agency of Research.

Abstract—This paper aims to propose two main contributions in the field of multivariate data analysis through the possibility theory. The first proposition is the definition of a generalized family of multivariate elliptical possibility distributions. These distributions have been derived from a consistent probability-possibility transformation over the family of so-called elliptical probability distributions. The second contribution proposed by the paper is the definition of two divergence measures between possibility distributions. We prove that a symmetric version of the Kullback-Leibler divergence guarantees all divergence properties when applied to possibility distributions. We further derive analytical expressions of the latter divergence and of the Hellinger divergence for certain possibility distributions pertaining to the elliptical family proposed, especially the normal multivariate possibility distribution in two dimensions. Finally, this paper provides an illustration of the developed possibilistic tools in an application of bi-band change detection between optical satellite images.

Keywords: Multivariate possibility distributions; Joint possibility distributions; Elliptical probability distributions; Kullback-Leibler divergence; Hellinger divergence; Change detection.

I. INTRODUCTION

Multivariate data analysis has long been an outstanding issue where the probability theory plays a major role. Recently, different attempts have been made in the literature in terms of possibility theory in order to suggest an alternative representation of multivariate data that can be convenient in cases where probabilistic features are weak or difficult to obtain.

The first way to deal with a multivariate issue is to directly define a multidimensional possibility distribution as proposed, among others, by Tanaka in [1], who considers multivariate exponential distributions. The second way consists in combining marginal possibility distributions by means of t-norms [2], in order to obtain a joint possibility distribution [3][4][5][6]. The third way, followed in this paper, is to transform a multivariate probability distribution by extending the univariate probability-possibility transformation [7][8] to the n-dimensional case.

In the first part of this paper, we will consider in the same framework:
• the probability-possibility transformation framework proposed in one dimension,
• the multivariate elliptical probability distributions [9],
• and the Mahalanobis distance relating multivariate distributions to their monovariate closeness measures,

in order to derive a family of multivariate possibility distributions. These possibility distributions will be called elliptical, for terminology analogy with respect to their probabilistic counterparts. Thus our work will help to design a possibility distribution from any elliptical probability distribution thanks to a general analytical form and regardless of the dimension.

In the second part of this paper, we consider a possibilistic divergence definition [10], called Π-divergence hereafter. Two divergences are studied: the Kullback-Leibler and the Hellinger divergence. We show that the probabilistic Kullback-Leibler divergence [11] does not satisfy positivity constraints when transposed to the possibility domain, and that its symmetric version does not suffer from this lack. On the other hand, due to its closeness to the Euclidean norm, the Hellinger divergence preserves its divergence properties when converted to the possibility domain. Then we derive some analytical closed forms from the previously mentioned Π-divergences applied to elliptical possibility distributions that can be used for dissemblance assessment in image processing applications. We also prove the relevancy of the joint possibility modeling and divergence evaluation in the context of a bi-band change detection application between optical satellite images.

This paper is organized as follows: Section 2 presents the proposed n-dimensional probability-possibility transformation for the derivation of elliptical possibility distributions. Some properties of this transformation are given as well as an illustration of different possibility distributions obtained in the case n = 2. Section 3 is divided into two main steps. First, two possibilistic divergence measures are proposed, and their analytical forms calculated for the joint possibility distribution obtained from the normal bivariate distribution are given. Secondly, an application using the possibilistic tools developed in this paper is presented in the form of an illustration of bi-band change detection between optical satellite images. Section 4 provides an overall conclusion to this paper.

II. ELLIPTICAL CONTINUOUS POSSIBILITY DISTRIBUTION

After recalling some definitions of the possibility theory and of the probability-possibility transformation in the continuous case for the unidimensional probability model, this section will present a new family of n-dimensional elliptical possibility distributions. At the end, a last subsection will propose some bivariate elliptical possibility distributions obtained from well-known probability distributions.

A. Unidimensional Probability-Possibility transformation

In the following, let us consider a continuous random variable X on the set of reals and its associated probability density p with mode x∗, F being its corresponding cumulative distribution function.

Definition 1. (Definition 3.1 in [8]) The possibility distribution induced by a continuous strictly unimodal probability density p around x∗ is the possibility distribution (denoted π∗) whose α-cuts are the confidence intervals I∗α of level P(I∗α) = α around the mode x∗ computed from p.

π∗ can be defined as follows:

$$\pi^*(x) = 1 - P(\{y,\, p(y) \geq p(x)\}) \quad (1)$$

The possibility distribution π∗ is continuous, encodes the whole set of confidence intervals, and π∗(x∗) = 1.

Theorem 1. (Theorem 3.1 in [8]) For any probability density p, the possibility distribution π∗ in Definition 1 is consistent with p, that is: ∀A measurable, Π∗(A) ≥ P(A), Π∗ and P being the possibility and probability measures associated to π∗ and p.

Lemma 1. (Lemma 3.1 in [8]) For any continuous probability density p having a finite number of modes, any minimal length measurable subset I of the real line such that P(I) = α ∈ (0, 1] is of the form {x, p(x) ≥ β} for some β ∈ [0, p_max], where p_max = sup_x p(x). It thus contains the modal value(s) of p.

From Definition 1 and Theorem 1, it is stated that the maximally specific possibility distribution π∗ obtained from a given probability distribution, while remaining consistent, is built from the set of confidence intervals around a nominal value. From Lemma 1, we ensure that we preserve the maximum amount of information during the transformation from p to π when the nominal value around which the possibility distribution is built is chosen to be the mode x∗.

Moreover, in the one-dimensional case, it has been proved in [8] that from (1), the expression of the possibility distribution π∗ corresponding to a unimodal symmetric PDF of mode x∗, strictly increasing on the left and decreasing on the right, can be written as [12],[13]:

for −∞ < x ≤ x∗, π∗(x) = 2F(x)
for x∗ ≤ x < ∞, π∗(x) = 2(1 − F(x))

As an example we can consider the case of a Normal probability distribution with mean µ and variance σ²; the corresponding possibility distribution is then:

for −∞ < x ≤ µ, π(x) = 1 + erf((x − µ)/(σ√2))
for µ ≤ x < ∞, π(x) = 1 − erf((x − µ)/(σ√2))

where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$ is the Error function. Since the Error function is negative when x < 0 and positive when x > 0, we can then rewrite it as:

$$\pi(x) = 1 - \mathrm{erf}\left(\frac{|x - \mu|}{\sigma\sqrt{2}}\right) \quad (2)$$

As shown by (2), in one dimension, the continuous probability-possibility transformation can be applied to any symmetric unimodal probability distribution that has an analytical form for its PDF. However, one can note that Definition 1 and Theorem 1, in which this transformation lies, are not limited to the unidimensional case but only to unimodal probability distributions. Lemma 1 can also be extended to the case of n-dimensional possibility distributions by taking a hyper-volume as a measurable length, which is a Lebesgue measure on Rⁿ. Different ways to define the hyper-volume are possible. Thus, by taking as a starting point (1) from Definition 1 and by applying it to the n-dimensional case, in the next section we will propose a new family of elliptical possibility distributions derived from the family of elliptical probability distributions.
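To make the unidimensional transformation concrete, the following sketch (our own illustration; the paper itself provides no code) evaluates both the two-sided form above and the closed form (2) for a Normal density, and checks that they agree:

```python
# Sketch (not from the paper): 1-D probability-possibility
# transformation for a Normal density, cf. (1) and (2).
import numpy as np
from scipy.stats import norm
from scipy.special import erf

mu, sigma = 0.0, 1.0

def pi_closed_form(x):
    """Closed form (2): pi(x) = 1 - erf(|x - mu| / (sigma * sqrt(2)))."""
    return 1.0 - erf(np.abs(x - mu) / (sigma * np.sqrt(2.0)))

def pi_two_sided(x):
    """Generic form for a symmetric unimodal density of mode mu:
    2F(x) left of the mode, 2(1 - F(x)) right of it."""
    F = norm.cdf(x, loc=mu, scale=sigma)
    return 2.0 * F if x <= mu else 2.0 * (1.0 - F)

x = 1.3
assert np.isclose(pi_closed_form(x), pi_two_sided(x))  # both ~ 0.1936
```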

B. Probability-Possibility transformation for Multivariate elliptical probability distributions

For the sake of generality, we consider a family of probability distributions which represents a general framework while presenting good performance in the modelization of correlated information. Thus our attention focuses on the family of elliptical probability distributions, which has been widely studied, especially in [14], [15] or [16], and has already shown its usefulness. This family notably includes the Normal, Laplace, Student's t, and Cauchy distributions.

Firstly, let's start by giving the considered framework. Let X be an n-dimensional elliptically contoured random vector (denoted by X ∼ ECn(Σ)), whose PDF has the following form

$$p_X(x, \mu, \Sigma) = \alpha_n |\Sigma|^{-\frac{1}{2}}\, g_n\!\left((x - \mu)^T \Sigma^{-1} (x - \mu)\right) \quad (3)$$

where µ ∈ Rⁿ is the mean value, Σ ∈ Rⁿˣⁿ its positive and symmetric variance-covariance matrix, αn is a normalization scalar and |Σ| is the determinant of Σ. The function gn : R+ → R+ is a density generator function of the elliptical distribution in Rⁿ with the following property [14]:

$$\int g_n(x^T x)\, dx = 1.$$

Let dmah be the Mahalanobis distance between two vectors x and y according to the weighting matrix Σ. Then dmah is such as:

$$d_{mah}(x, y) = \|x - y\|_\Sigma = \sqrt{(x - y)^T \Sigma^{-1} (x - y)} \quad (4)$$

It is important to note that the equidensity contours of pX are ellipsoids centered around the mean µ (which is also the mode), and the axes of the ellipsoids are the eigenvectors of Σ [17]. Secondly, let's determine the probability accumulated in a region delimited by an equidensity contour of the PDF given in (3).

Proposition 1. Let $a_x = d_{mah}(x, \mu)$. For an n-dimensional random vector X ∈ Rⁿ with mean µ ∈ Rⁿ and covariance matrix Σ ∈ Rⁿˣⁿ, whose PDF is (3), the probability accumulated in the region

$$A_\Sigma := \{z \in \mathbb{R}^n \text{ such that } d_{mah}(z, \mu) \leq a_x\} \quad (5)$$

is

$$P(A_\Sigma) := P(d_{mah}(z, \mu) \leq a_x) = \alpha_n \frac{2\pi^{n/2}}{\Gamma(\frac{n}{2})} \int_0^{a_x} r^{n-1} g_n(r^2)\, dr \quad (6)$$

where $\alpha_n$ is a normalization scalar, $\Gamma(u) = \int_0^{+\infty} t^{u-1} e^{-t}\, dt$ is the Gamma function and $g_n$ is the density generator function.

Proof. Let's directly evaluate $P(A_\Sigma) = \int_{A_\Sigma} p(z)\, dz$. Firstly, as Σ is a definite positive matrix, we can proceed to a change of variable in the integral $P(A_\Sigma)$ by setting $y = \Sigma^{-\frac{1}{2}}(z - \mu)$, whose Jacobian determinant is $|\frac{\partial z}{\partial y}| = |\Sigma|^{\frac{1}{2}}$. Hence, from (3)

$$\int_{A_\Sigma} p(z)\, dz = \alpha_n |\Sigma|^{-\frac{1}{2}} \int_{\|y\| \leq a_x} g_n\!\left(\|y\|^2\right) |\Sigma|^{\frac{1}{2}}\, dy = \alpha_n \int_{\|y\| \leq a_x} g_n\!\left(\|y\|^2\right) dy \quad (7)$$

where ‖·‖ stands for the Euclidean norm. Then, by observing that there is a rotational invariance in the integral in (7) since it is the integral over an ellipsoid, we split the n-dimensional volume element dy into an area element dA and a radial element dr. Thus, by setting r = ‖y‖, we obtain an expression of (7) in spherical coordinates such as

$$P(A_\Sigma) = \alpha_n \int_0^{a_x} g_n(r^2) \int_{S^{n-1}(r)} dA\, dr,$$

where $S^{n-1}(r)$ is the (n − 1)-sphere of radius r. Knowing that the area of the surface of this (n − 1)-sphere can be written as $A_n(r) = r^{n-1} A_{n-1}(1)$, where $A_{n-1}(1) = 2\pi^{n/2}/\Gamma(\frac{n}{2})$ is the area of the unit sphere, we have

$$P(A_\Sigma) = \alpha_n \frac{2\pi^{n/2}}{\Gamma(\frac{n}{2})} \int_0^{a_x} r^{n-1} g_n(r^2)\, dr \quad (8)$$

Finally, from (1) and Proposition 1 we are able to obtain the elliptical possibility distribution associated with any elliptical probability distribution:

$$\pi_\Sigma(x) = 1 - P(A_\Sigma) = 1 - \alpha_n \frac{2\pi^{n/2}}{\Gamma(\frac{n}{2})} \int_0^{a_x} r^{n-1} g_n(r^2)\, dr \quad (9)$$

with $a_x = d_{mah}(x, \mu)$.

The following proposition provides a dominance property between possibility distributions.

Proposition 2. Let X be an n-dimensional random vector on the set of reals with mean µ ∈ Rⁿ and $\Sigma_X \in \mathbb{R}^{n \times n}$ its semi-definite positive covariance matrix. Let Y be another n-dimensional random vector on the set of reals with the same mean µ ∈ Rⁿ and $\Sigma_Y \in \mathbb{R}^{n \times n}$ its semi-definite positive covariance matrix. Lastly, let $\pi_X$ and $\pi_Y$ be the two possibility distributions associated respectively to X and Y. If the matrix $M = \Sigma_Y - \Sigma_X$ is semi-definite positive, then $\pi_{\Sigma_X} \leq \pi_{\Sigma_Y}$.

Proof. Let's denote $\mathcal{C}_1$ the class of all convex centrally symmetric sets in Rⁿ and let $\Sigma_X$ and $\Sigma_Y$ be two definite positive matrices. Finally, let's denote $P_{\Sigma_X}$ and $P_{\Sigma_Y}$ two multivariate elliptical probability distributions. As a consequence of Theorem 5.1 in [18] we have the following statement: assume that $X \sim EC_n(\Sigma)$; if $C \in \mathcal{C}_1$ and C is closed, and if $\Sigma_Y - \Sigma_X$ is a definite positive matrix, then $P_{\Sigma_X}(C) \geq P_{\Sigma_Y}(C)$. Hence, we can conclude that

$$P_{\Sigma_X}(C) \geq P_{\Sigma_Y}(C) \;\Leftrightarrow\; 1 - P_{\Sigma_X}(C) \leq 1 - P_{\Sigma_Y}(C) \;\Leftrightarrow\; \pi_{\Sigma_X}(x) \leq \pi_{\Sigma_Y}(x)$$

Note that in the case of an uncorrelated random vector, the associated covariance matrix is diagonal definite positive (i.e. $\Sigma_X = \mathrm{diag}(\sigma_{X,1}, ..., \sigma_{X,n})$ and $\Sigma_Y = \mathrm{diag}(\sigma_{Y,1}, ..., \sigma_{Y,n})$). Hence the matrix $M = \Sigma_Y - \Sigma_X$ is semi-definite positive if and only if the eigenvalues of the diagonal matrix M are positive or null, meaning that $\forall i \in \mathbb{N}^*,\; \sigma_{X,i} \leq \sigma_{Y,i}$.

Although (9) may seem complicated to handle in its general form, it can be applied to some well-known elliptical probability models such as the Normal multivariate probability distribution, the Cauchy multivariate distribution or the Student's t-distribution, for example.
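As a numerical sanity check of (9) (a sketch under our own assumptions; the function name is hypothetical and not from the paper), the radial integral can be evaluated by quadrature; with the Normal generator gn(t) = e^(−t/2) and αn = (2π)^(−n/2), the result must match the closed form (15) derived below for n = 2:

```python
# Sketch (ours): evaluate the general elliptical transformation (9)
# by one-dimensional quadrature over the radial variable.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def elliptical_possibility(x, mu, Sigma, g_n, alpha_n):
    """pi(x) = 1 - alpha_n * (2*pi^(n/2)/Gamma(n/2))
               * Integral_0^{a_x} r^(n-1) g_n(r^2) dr, cf. (9)."""
    n = mu.size
    d = x - mu
    a_x = np.sqrt(d @ np.linalg.inv(Sigma) @ d)      # Mahalanobis distance (4)
    radial, _ = quad(lambda r: r ** (n - 1) * g_n(r ** 2), 0.0, a_x)
    return 1.0 - alpha_n * (2.0 * np.pi ** (n / 2) / gamma(n / 2)) * radial

# Normal generator; parameters below are an arbitrary illustration.
mu = np.array([5.0, 20.0])
cov = 0.8 * np.sqrt(8.0 * 6.0)
Sigma = np.array([[8.0, cov], [cov, 6.0]])           # positive definite
x = np.array([6.0, 21.0])
val = elliptical_possibility(x, mu, Sigma,
                             lambda t: np.exp(-t / 2.0), (2.0 * np.pi) ** (-1.0))
d = x - mu
a2 = d @ np.linalg.inv(Sigma) @ d
assert np.isclose(val, np.exp(-a2 / 2.0))            # closed form for n = 2
```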

C. Special case of elliptical possibility distributions

In the previous section, we have determined a general form of elliptical possibility distributions from the n-dimensional family of elliptical probability distributions; let's now apply this transformation onto some well-known probability distributions.

In order to illustrate the use of the probability-possibility transformation given by (9), we will firstly consider the case of one of the most famous elliptical probability distributions: the Normal distribution. Firstly, we recall that the Normal multivariate distribution possesses a PDF of the following form:

$$p_N(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}.$$

It corresponds to (3) when $\alpha_n = \frac{1}{(2\pi)^{n/2}}$ and $g_n(t) = e^{-\frac{1}{2}t}$. From (6) the expression of the probability accumulated in the region $A_\Sigma$ becomes

$$P_N(A) = \frac{2\pi^{n/2}}{(2\pi)^{n/2}\, \Gamma(n/2)} \int_0^{a_x} r^{n-1} e^{-\frac{r^2}{2}}\, dr. \quad (10)$$

Moreover, it has been proved (Result 2 in [19]) that (10), which represents the probability accumulated in a region delimited by an equidensity contour of the PDF, can be computed thanks to the substitution $t = r^2/2$:

$$P_N(A_\Sigma) := P_N(d_{mah}(z, \mu) \leq a_x) = \frac{\gamma\!\left(\frac{n}{2}, \frac{a_x^2}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} \quad (11)$$

γ being the lower incomplete Gamma function defined as $\gamma(a, b) = \int_0^b t^{a-1} e^{-t}\, dt$.

Finally, the possibility distribution associated to the multivariate Normal distribution takes the following form:

$$\forall x \in \mathbb{R}^n, \quad \pi_N(x) = 1 - \frac{\gamma\!\left(\frac{n}{2}, \frac{a_x^2}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}. \quad (12)$$

In the case of n = 1, the Mahalanobis distance is equivalent to $d_{mah}(x, \mu) = \|x - \mu\|_\Sigma = \frac{|x - \mu|}{\sigma}$ and $\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}$. Thus, we find an expression of the possibility distribution exactly similar to (2).
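Equation (12) is directly computable with scipy.special.gammainc, which returns the regularized lower incomplete gamma function γ(a, b)/Γ(a). A minimal sketch (ours, not the authors' code), with the n = 1 case checked against the erf form (2):

```python
# Sketch (ours): Gaussian elliptical possibility distribution (12).
import numpy as np
from scipy.special import gammainc, erf

def pi_gaussian(x, mu, Sigma):
    d = np.atleast_1d(x) - np.atleast_1d(mu)
    n = d.size
    a2 = d @ np.linalg.inv(np.atleast_2d(Sigma)) @ d  # squared Mahalanobis distance
    return 1.0 - gammainc(n / 2.0, a2 / 2.0)          # equation (12)

# n = 1 sanity check against the erf form (2).
mu, sigma = 2.0, 3.0
x = 4.5
assert np.isclose(pi_gaussian(x, mu, sigma ** 2),
                  1.0 - erf(abs(x - mu) / (sigma * np.sqrt(2.0))))
```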

In Table I, we give the analytical expressions of the joint possibility distributions obtained for different probability distributions for n = 2.

TABLE I: Analytic forms of possibility distributions obtained from well-known elliptical probability distributions (PDF / associated joint possibility distribution for n = 2).

Cauchy:
$$\pi_C(x) = \frac{1}{1 + \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}} \quad (13)$$

Student's t:
$$\pi_S(x) = \left(1 + \frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{\nu}\right)^{-\frac{\nu}{2}} \quad (14)$$

Normal:
$$\pi_N(x) = e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \quad (15)$$

Fig. 1 presents different possibility distributions as well as their associated probability distributions.

[Fig. 1: Examples of PDFs (left column) and their associated possibility distributions (right column): (a) Cauchy's PDF, (b) πC; (c) Student's t PDF, (d) πS; (e) Normal's PDF, (f) πN.]

In each case the mean vector is set to µ = (5, 20), the variances along the x and y components are σx = 8 and σy = 6 respectively, and the correlation coefficient is equal to 0.8. For the Student's t distribution, the degree of freedom ν is equal to 3. Note that (15) is close to the analytical expression found by H. Tanaka [1], although his exponential possibility function holds for any dimension whereas (15) is derived from a consistent probability-possibility transformation, is specific to the case n = 2 and differs otherwise. Finally, as can be seen, (15) is remarkable in that it is a covariance-based normalization of the joint probability distribution for n = 2, such as

$$\pi_N(x) = 2\pi |\Sigma|^{\frac{1}{2}}\, p_N(x). \quad (16)$$

Thus, in the following section, we will focus on the consequences of this normalization-based possibility distribution through a change detection application under a high level of imprecision.
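The closed forms of Table I are one-liners in code. The sketch below (our own; it reuses the Fig. 1 parameters and reads σx, σy as variances, which is an assumption) also verifies the normalization identity (16) numerically:

```python
# Sketch (ours): Table I closed forms for n = 2 and a check of (16).
import numpy as np
from scipy.stats import multivariate_normal

def mah2(x, mu, Sigma):
    """Squared Mahalanobis distance, cf. (4)."""
    d = x - mu
    return d @ np.linalg.inv(Sigma) @ d

pi_cauchy  = lambda x, mu, S: 1.0 / (1.0 + np.sqrt(mah2(x, mu, S)))              # (13)
pi_student = lambda x, mu, S, nu=3.0: (1.0 + mah2(x, mu, S) / nu) ** (-nu / 2.0) # (14)
pi_normal  = lambda x, mu, S: np.exp(-mah2(x, mu, S) / 2.0)                      # (15)

mu = np.array([5.0, 20.0])
cov = 0.8 * np.sqrt(8.0 * 6.0)                    # correlation coefficient 0.8
S = np.array([[8.0, cov], [cov, 6.0]])
x = np.array([6.0, 19.0])

lhs = pi_normal(x, mu, S)
rhs = 2.0 * np.pi * np.sqrt(np.linalg.det(S)) * multivariate_normal(mu, S).pdf(x)
assert np.isclose(lhs, rhs)                       # identity (16)
```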

III. CHANGE DETECTION OVER OPTICAL SATELLITE IMAGES

In this section, we will firstly present the two divergences we derived in order to compare possibility distributions. The possibility distributions are obtained thanks to the previously proposed multivariate probability-possibility transformation applied to the Normal distribution in the special case of n = 2. Then we will illustrate, through a change detection application on real optical satellite images, the use of the proposed possibilistic framework.

A. Possibilistic Divergences

Let's first introduce the possibilistic divergences (Π-divergences) as comparison functions between two possibility distributions. The first divergence proposed is inspired (though slightly different) from the extension of the probabilistic Kullback-Leibler divergence obtained by replacing the probability distributions by possibility distributions [20]. This approach has proved its interest in many applications and is of particular value due to the exponential form of the possibility distribution in (15).

As a reminder, for two PDF p1 and p2, the probabilistic Kullback-Leibler divergence is defined as

$$\forall X \in \mathbb{R}^2, \quad KL(p_1 \| p_2) = \int_{\mathbb{R}^2} p_1(x) \ln\!\left(\frac{p_1(x)}{p_2(x)}\right) dx. \quad (17)$$

Remark 1. Let's begin with an example. In the case where $\pi_2 = \sqrt{\pi_1}$, a possibilistic Kullback-Leibler divergence would take the following form

$$KL(\pi_1 \| \pi_2) = \int_{\mathbb{R}^2} \pi_1 \log\frac{\pi_1}{\sqrt{\pi_1}} = \frac{1}{2}\int_{\mathbb{R}^2} \pi_1 \log(\pi_1).$$

But it has been demonstrated in (24) in the appendix that if π1 has the form of (15) then we have $\int_{\mathbb{R}^2} \pi_1 \log(\pi_1) = -2\pi|\Sigma_1|^{\frac{1}{2}}$. Thus $KL(\pi_1 \| \pi_2) = -\pi|\Sigma_1|^{\frac{1}{2}}$, which is negative, and so a possibilistic Kullback-Leibler divergence does not respect the positivity.

Moreover, when the possibility distribution has the form of (15) we can determine the lowest value of $KL(\pi_1 \| \pi_2) = \int_{\mathbb{R}^2} \pi_1 \log\frac{\pi_1}{\pi_2} = -\int_{\mathbb{R}^2} \pi_1 \log\frac{\pi_2}{\pi_1}$. Indeed, since the logarithm is concave, thanks to Jensen's inequality we can write

$$\int_{\mathbb{R}^2} \pi_1 \log\frac{\pi_2}{\pi_1} \leq \log\!\left(\int_{\mathbb{R}^2} \pi_1 \frac{\pi_2}{\pi_1}\right) = \log\!\left(\int_{\mathbb{R}^2} \pi_2\right).$$

From (16), we can rewrite this inequality as

$$\int_{\mathbb{R}^2} \pi_1 \log\frac{\pi_2}{\pi_1} \leq \log\!\left(2\pi|\Sigma_2|^{\frac{1}{2}} \int_{\mathbb{R}^2} p_2\right).$$

Thus, we have $KL(\pi_1 \| \pi_2) \geq -\log\!\left(2\pi|\Sigma_2|^{\frac{1}{2}}\right)$.

In order to conserve the properties of a divergence, it is necessary to consider a symmetrical form of the KL-divergence. We have called this divergence $\Pi D_{KL}$ and it is expressed for all X in R² as

$$\Pi D_{KL}(\pi_1, \pi_2) = \int_{\mathbb{R}^2} \pi_1(x) \log\frac{\pi_1(x)}{\pi_2(x)}\, dx + \int_{\mathbb{R}^2} \pi_2(x) \log\frac{\pi_2(x)}{\pi_1(x)}\, dx \quad (18)$$

Note that (18) is close to the probabilistic Jeffrey's divergence [21].

Proposition 3. The divergence $\Pi D_{KL}$ is a particular case of f-divergence.

The demonstration that the divergence $\Pi D_{KL}$ is an f-divergence can be found in the appendix. Lastly, the analytical expression of the Π-divergence between two possibility distributions whose expression is of the form in (15) is given by:

$$\Pi D_{KL}(\pi_1, \pi_2) = \pi\sqrt{|\Sigma_1|}\left[tr(\Sigma_2^{-1}\Sigma_1) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) - 2\right] + \pi\sqrt{|\Sigma_2|}\left[tr(\Sigma_1^{-1}\Sigma_2) + (\mu_2 - \mu_1)^T \Sigma_1^{-1} (\mu_2 - \mu_1) - 2\right] \quad (19)$$

where |A| and tr(A) are respectively the determinant and the trace of the matrix A. The calculation steps needed to determine (19) are given in the appendix.

The second Π-divergence proposed is an extension of the Hellinger divergence obtained by replacing probability distributions by possibility distributions. As a reminder, for two PDF p1 and p2, the square of the Hellinger divergence is defined as

$$H^2_{prob}(p_1, p_2) = \frac{1}{2}\int_{\mathbb{R}^2} \left(\sqrt{p_1(x)} - \sqrt{p_2(x)}\right)^2 dx. \quad (20)$$

Note that, due to its form closely related to the Euclidean norm, the Hellinger divergence still preserves its divergence properties while replacing PDF by possibility distributions. After some straightforward calculus, following [22], the analytical expression of the square of the Hellinger divergence (called $\Pi D_H$) between two possibility distributions in the form of (15) is given by:

$$\Pi D_H(\pi_1, \pi_2) = \pi\left(\sqrt{|\Sigma_1|} + \sqrt{|\Sigma_2|}\right) - 2\pi\left|\tfrac{1}{2}\Sigma_1^{-1} + \tfrac{1}{2}\Sigma_2^{-1}\right|^{-\frac{1}{2}} e^{-\frac{1}{8}(\mu_1 - \mu_2)^T \left(\frac{1}{2}\Sigma_2 + \frac{1}{2}\Sigma_1\right)^{-1} (\mu_1 - \mu_2)} \quad (21)$$

Another approach to extend probabilistic divergences is to use a measure-based approach as proposed by V. Torra, Y. Narukawa and M. Sugeno in [23]. But unlike the distribution-based approach, analytical expressions are not available in a lot of cases. Therefore we have not considered this approach in this paper, but this could be the object of future works.
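For later reuse in the application, here is a sketch of the closed forms (19) and (21) (our own implementation of the formulas above; the function names pdkl and pdh2 are hypothetical). Both quantities must vanish when the two possibility distributions coincide:

```python
# Sketch (ours): closed-form possibilistic divergences (19) and (21)
# between two Gaussian-shaped possibility distributions of the form (15).
import numpy as np

def pdkl(mu1, S1, mu2, S2):
    """Symmetric possibilistic Kullback-Leibler divergence, equation (19)."""
    def half(mu_a, S_a, mu_b, S_b):          # one of the two symmetric terms
        d = mu_a - mu_b
        return np.pi * np.sqrt(np.linalg.det(S_a)) * (
            np.trace(np.linalg.inv(S_b) @ S_a) + d @ np.linalg.inv(S_b) @ d - 2.0)
    return half(mu1, S1, mu2, S2) + half(mu2, S2, mu1, S1)

def pdh2(mu1, S1, mu2, S2):
    """Squared possibilistic Hellinger divergence, equation (21)."""
    d = mu1 - mu2
    P = 0.5 * np.linalg.inv(S1) + 0.5 * np.linalg.inv(S2)
    cross = 2.0 * np.pi * np.linalg.det(P) ** (-0.5) * np.exp(
        -0.125 * d @ np.linalg.inv(0.5 * S1 + 0.5 * S2) @ d)
    return np.pi * (np.sqrt(np.linalg.det(S1)) + np.sqrt(np.linalg.det(S2))) - cross

S = np.array([[8.0, 5.5], [5.5, 6.0]])
assert np.isclose(pdkl(np.zeros(2), S, np.zeros(2), S), 0.0)  # identical inputs
assert np.isclose(pdh2(np.zeros(2), S, np.zeros(2), S), 0.0)
```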

B. Application

1) Data description: The application of change detection has been carried out over 14 pairs of images from the training set of the Onera Satellite Change Detection (OSCD) dataset, for which the groundtruth is available. The OSCD dataset has been presented in the paper "Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks" presented at the international conference IGARSS'2018 [24]. The original OSCD dataset comprises 24 pairs of multispectral images taken from the Sentinel-2 satellite between 2015 and 2018. Locations are picked all over the world, in South America, USA, Europe, Middle East and Asia. For each location, registered pairs of 13-band multispectral satellite images obtained by the Sentinel-2 satellites are provided. Images vary in spatial resolution between 10m, 20m and 60m. Fig. 2 presents an example of a pair of images used in the application. The pair considered hereafter consists of two optical images of the city of Beihai taken between December 2016 and March 2018. The ground truth mask of the changes is provided, as well as an example of change map obtained after running the change detection algorithm.

[Fig. 2: Example of a pair of images with its associated ground truth and a change map obtained: (a) image of Beihai at date t1, (b) image of Beihai at date t2, (c) ground truth mask of Beihai, (d) change map.]

2) Experimental setup: Let's now expose in more detail the functioning of the change detection algorithm. Firstly, a preliminary step is necessary before running the core of the change detection algorithm. The images are represented as double-precision values in the range 0 to 1. Next, in order to avoid any border effect, a symmetric extension is applied to each satellite image proportionally to the size of the sliding window considered afterwards. Then, the images are altered by the application of a Gaussian noise with zero mean and fixed variance. This guarantees realistic covariance matrix estimation in presence of rank-deficient correlation structures that can be due to missing pixels. Note that the higher the value of the variance, the blurrier the image is. An example of the pair of images of the city of Beihai altered by a Gaussian noise with zero mean and variance equal to 0.1 is given in Fig. 4.

[Fig. 4: Example of the pair of images of the city of Beihai altered by a Gaussian noise with zero mean and a variance of 0.1: (a) image of Beihai at date t1, (b) image of Beihai at date t2.]

Secondly, we run the change detection algorithm over a pair of images. The principle of the algorithm is to calculate a measure of dissemblance at each iteration between the image before (t1) and the image after (t2) for a pixel and its surrounding neighborhood. A high dissimilarity measure signifies a possible change. At each iteration, a patch (sliding window) is extracted from the two images, and we consider two color channels out of the three available for the sake of performing a bi-band change detection. Then, for the two color channels selected, we extract the mean vector µ and the variance-covariance matrix Σ thanks to the Sample Covariance Matrix method (SCM) [25], µ and Σ being the parameters of the possibility distribution in (15). Then, with the parameters estimated, one of the previously proposed analytical possibilistic divergences (either in the form of (19) or (21)) is computed and its result is stored into the change map. The last step consists in shifting the sliding window by one pixel before the next iteration. Fig. 3 presents the functioning of the algorithm for any possibilistic divergence proposed and when the color channels selected for the parameter extraction are the green and the blue ones; a minimal code sketch is given below.

[Fig. 3: Flowchart of the proposed change detection algorithm.]

Next, when the change map is fully completed, the performance of the algorithm is determined by computing the Receiver Operating Characteristic (ROC) curve comparing the change map and the ground truth mask. Then, by looking at the area under the curve (AUC), we can calculate the overall performance of the algorithm for all possible detection thresholds. See [26] for more details.

Finally, in order to highlight the interest of a possibilistic framework, we compare the performance of the possibilistic model to the corresponding probabilistic one. We repeat exactly the same steps with the analytical form of the symmetrical Kullback-Leibler divergence (called KLprob) or with the Hellinger divergence (called Hprob) associated with the normal bivariate probability distribution.
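The sliding-window procedure described above can be paraphrased in a few lines (a sketch of the flowchart of Fig. 3 under our own assumptions: hypothetical helper names, two pre-selected channels, and the pdkl closed form sketched earlier):

```python
# Sketch (ours) of the sliding-window change detection loop of Fig. 3.
# img1, img2: (H, W, 2) float arrays holding the two selected channels
# (e.g. green and blue), already noise-regularized as described above so
# that the patch covariance matrices stay invertible.
import numpy as np

def change_map(img1, img2, half=2, divergence=pdkl):  # half=2 -> 5x5 window
    H, W, _ = img1.shape
    out = np.zeros((H, W))
    for i in range(half, H - half):
        for j in range(half, W - half):
            # flatten the (2*half+1) x (2*half+1) patch into 2-D samples
            p1 = img1[i - half:i + half + 1, j - half:j + half + 1].reshape(-1, 2)
            p2 = img2[i - half:i + half + 1, j - half:j + half + 1].reshape(-1, 2)
            mu1, mu2 = p1.mean(axis=0), p2.mean(axis=0)
            S1 = np.cov(p1, rowvar=False)             # sample covariance (SCM)
            S2 = np.cov(p2, rowvar=False)
            out[i, j] = divergence(mu1, S1, mu2, S2)  # store (19) or (21)
    return out                                        # thresholding/ROC comes next
```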

3) Experimental results: Two different test protocols have been carried out with the change detection algorithm. In the first series of tests, we set a specific variance of 0.01 for the Gaussian noise applied to each image. Then we run the algorithm several times by increasing the size of the sliding window by 2 pixels each time, from 3 by 3 pixels to 41 by 41 pixels. The results of this first series of tests are presented in Fig. 5 for the two pairs of images. In the second series of tests, we fix the size of the sliding window to 5 by 5 pixels and we increase the variance of the applied noise at each run of the algorithm. The results of this second series of tests are presented in Fig. 6.

[Fig. 5: Graphs showing the average AUC over 14 pairs of images as a function of the diameter of the patch used.]

In Fig. 5, the two divergences considered obtain similar results on the 14 pairs of images considered, either by comparing possibility distributions or probability distributions. When the size of the patch is 3 by 3 pixels, the SCM estimator is affected by the presence of outliers in the patch. Therefore the overall performance of each method is affected and the results are below the optimal performance. On the other side, when the patch size is big enough (more than 10 pixels wide), the SCM estimator is more consistent. Nevertheless, since most of the changes are only a few pixels wide, each method has its AUC decreasing as the size of the patch becomes bigger. Finally, the best compromise seems to be found for a 7x7 pixels patch, although it depends highly on the size of the changes considered. Note that the possibilistic algorithm shows an interesting robustness to the noise, observable both in the case of the Kullback-Leibler divergence and in the Hellinger divergence. This property of robustness needs to be confirmed in the second series of tests.

Fig. 6 presents the average AUC obtained while testing the noise resilience of the possibilistic methods and of the probabilistic methods over 14 pairs of images. As expected, their performance decreases as the variance of the applied Gaussian noise increases. However, as we could begin to observe in the first series of tests, an interesting property of robustness to the noise emerges from the possibilistic methods compared to the probabilistic methods. This property seems to be confirmed by the fact that both divergences give similar results.

[Fig. 6: Graphs showing the average AUC over 14 pairs of images according to the variance of the applied noise for a 5x5 pixels sliding window.]

In the following paragraphs, the authors propose a hypothesis that could explain the robustness to the noise of the possibilistic framework. First, it is important to recall that the final results in terms of AUC are impacted by three factors: the parameter estimator (SCM here), the statistical model (probabilistic or possibilistic) and the divergences used. In each case the estimator and the divergences used are the same. Thus, the real difference between the probabilistic and the possibilistic framework is the statistical model chosen. As both statistical models are elliptical distributions, we have to look at their intrinsic properties to give an explanation of the behavior observed. The elliptical possibility distribution is homogeneous to a cumulative function (but slightly different) thanks to the probability-possibility transformation, which integrates the probability density function. Thus the possibilistic divergence is a comparison between cumulative functions whereas the probabilistic framework compares density functions. That could explain the difference in terms of performance.

IV. CONCLUSION

In this paper we have proposed a new family of elliptical possibility distributions derived from the well-known elliptical family of probability distributions, thanks to the extension to the n-dimensional case of the continuous probability-possibility transformation in one dimension. Finally, through a change detection application, we have emphasized the consequences of a possibilistic framework in the field of multivariate data analysis, especially when the data are noisy or their amount is insufficient to allow evaluating their characteristic parameters accurately.

In future works, two main aspects will be investigated: firstly, elliptical possibility distributions of higher dimensions will be considered for the sake of higher multivariate data analysis; and secondly, other divergences will be studied to propose useful analytical forms in application of continuous processing.

APPENDIX A
KULLBACK-LEIBLER DIVERGENCE

A. Positivity & symmetry

Considering the following inequality, we have for 0 < x ≤ 1:

$$\log(x) \leq x - 1 \;\Leftrightarrow\; \log\!\left(\frac{1}{x}\right) \leq \frac{1}{x} - 1 \;\Leftrightarrow\; \log(x) \geq 1 - \frac{1}{x}.$$

Then, by replacing x by $\frac{x}{y}$ with 0 < y ≤ 1, and by multiplying the whole expression by x, we obtain

$$x \log\frac{x}{y} \geq x - y.$$

Let's now replace x and y by any classical continuous possibility distributions π1 and π2, which have values in the interval ]0, 1]. Thus, it becomes

$$\pi_1 \log\frac{\pi_1}{\pi_2} \geq \pi_1 - \pi_2 \quad \text{or} \quad \pi_2 \log\frac{\pi_2}{\pi_1} \geq \pi_2 - \pi_1.$$

Finally, by adding the two expressions together we have

$$\pi_1 \log\frac{\pi_1}{\pi_2} + \pi_2 \log\frac{\pi_2}{\pi_1} \geq 0,$$

and so ΠD(π1, π2) ≥ 0. The symmetry property is direct. Indeed, by setting π1 = π2, we find ΠD(π1, π2) = 0.

B. Analytical form

First, let's consider the first term in (18), which we will call A; B will be the second term. Then replace π1(x) and π2(x) by their respective expressions given in (15), i.e. for all X in R²,

$$\pi_1(x) = e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} \quad \text{and} \quad \pi_2(x) = e^{-\frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)}.$$

We then obtain the following form:

$$A = \int_{\mathbb{R}^2} \pi_1(x) \log\frac{\pi_1(x)}{\pi_2(x)}\, dx = \int_{\mathbb{R}^2} \pi_1(x)\log(\pi_1(x))\, dx - \int_{\mathbb{R}^2} \pi_1(x)\log(\pi_2(x))\, dx$$
$$= \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1}(x - \mu_2)\, \pi_1(x)\, dx - \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)\, \pi_1(x)\, dx.$$

From (16) we can then rewrite it as

$$A = 2\pi\sqrt{|\Sigma_1|}\left(\int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1}(x - \mu_2)\, p_1(x)\, dx - \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)\, p_1(x)\, dx\right). \quad (22)$$

Let's call α and β the two scalars such as

$$\alpha = \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1}(x - \mu_2)\, p_1(x)\, dx, \qquad \beta = \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)\, p_1(x)\, dx.$$

By noting $\mathbb{E}_1(\cdot)$ the expected value under p1 of the random variable X ∈ R², we have:

$$\beta = \frac{1}{2}\mathbb{E}_1\!\left[tr\!\left((x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)\right)\right] = \frac{1}{2}\mathbb{E}_1\!\left[tr\!\left((x - \mu_1)(x - \mu_1)^T \Sigma_1^{-1}\right)\right] = \frac{1}{2}\, tr\!\left(\mathbb{E}_1\!\left[(x - \mu_1)(x - \mu_1)^T\right]\Sigma_1^{-1}\right) = \frac{1}{2}\, tr\!\left(\Sigma_1 \Sigma_1^{-1}\right) = \frac{1}{2}\, tr(I_2) = 1,$$

and

$$\alpha = \int_{\mathbb{R}^2} \frac{1}{2}\left[(x - \mu_1) + (\mu_1 - \mu_2)\right]^T \Sigma_2^{-1} \left[(x - \mu_1) + (\mu_1 - \mu_2)\right] p_1(x)\, dx$$
$$= \frac{1}{2}\int_{\mathbb{R}^2} \left((x - \mu_1)^T \Sigma_2^{-1}(x - \mu_1) + 2(x - \mu_1)^T \Sigma_2^{-1}(\mu_1 - \mu_2) + (\mu_1 - \mu_2)^T \Sigma_2^{-1}(\mu_1 - \mu_2)\right) p_1(x)\, dx$$
$$= \frac{1}{2}\left(\mathbb{E}_1\!\left[(x - \mu_1)^T \Sigma_2^{-1}(x - \mu_1)\right] + \mathbb{E}_1\!\left[2(x - \mu_1)^T \Sigma_2^{-1}(\mu_1 - \mu_2)\right] + \mathbb{E}_1\!\left[(\mu_1 - \mu_2)^T \Sigma_2^{-1}(\mu_1 - \mu_2)\right]\right)$$
$$= \frac{1}{2}\left(tr(\Sigma_2^{-1}\Sigma_1) + (\mu_1 - \mu_2)^T \Sigma_2^{-1}(\mu_1 - \mu_2)\right).$$

The analytical expression of the first term of (18) then becomes:

$$A = \pi\sqrt{|\Sigma_1|}\left[tr(\Sigma_2^{-1}\Sigma_1) + (\mu_1 - \mu_2)^T \Sigma_2^{-1}(\mu_1 - \mu_2) - 2\right]. \quad (23)$$

Finally, we obtain (19) by adding A and B, where B is calculated in the same way as A.

NB: Note that the term β is actually similar to a Shannon entropy of a 2D-joint possibility distribution, such as:

$$\beta = \int_{\mathbb{R}^2} \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)\, p_1(x)\, dx = -\frac{1}{2\pi\sqrt{|\Sigma_1|}} \int_{\mathbb{R}^2} \pi_1(x)\ln(\pi_1(x))\, dx.$$

Since β = 1, we then have:

$$-\int_{\mathbb{R}^2} \pi_1(x)\ln(\pi_1(x))\, dx = 2\pi\sqrt{|\Sigma_1|}. \quad (24)$$

Moreover, we also have:

$$\int_{\mathbb{R}^2} \pi_1(x)\, dx = 2\pi\sqrt{|\Sigma_1|} \int_{\mathbb{R}^2} p_1(x)\, dx = 2\pi\sqrt{|\Sigma_1|} \quad (25)$$

since $\int_{\mathbb{R}^2} p_1(x)\, dx = 1$. Thus, an invariance property appears and can be written as:

$$\int_{-\infty}^{+\infty} \pi(x)\left(1 + \ln(\pi(x))\right) dx = 0 \quad (26)$$
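The invariance property (26) is easy to verify numerically for a bivariate Gaussian-shaped possibility distribution of the form (15); a sketch (ours, with an arbitrary positive definite Σ):

```python
# Numerical check (ours) of (25) and (26) for the bivariate form (15).
import numpy as np
from scipy.integrate import dblquad

S = np.array([[2.0, 0.5], [0.5, 1.0]])
Sinv = np.linalg.inv(S)

def pi2(x, y):
    v = np.array([x, y])
    return np.exp(-0.5 * v @ Sinv @ v)

# integration bounds wide enough for the tails to be negligible
total, _ = dblquad(lambda y, x: pi2(x, y), -12, 12, -12, 12)
invariance, _ = dblquad(lambda y, x: pi2(x, y) * (1.0 + np.log(pi2(x, y))),
                        -12, 12, -12, 12)

assert np.isclose(total, 2.0 * np.pi * np.sqrt(np.linalg.det(S)))  # (25)
assert abs(invariance) < 1e-6                                      # (26)
```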

REFERENCES

[1] H. Tanaka and H. Ishibuchi, "Evidence theory of exponential possibility distributions," International Journal of Approximate Reasoning, vol. 8, no. 2, pp. 123–140, 1993.
[2] D. Dubois and H. Prade, "A class of fuzzy measures based on triangular norms: a general framework for the combination of uncertain information," International Journal of General Systems, vol. 8, no. 1, pp. 43–61, 1982.
[3] J. Vejnarová, "On interpretation of t-product extensions of possibility distributions," Fuzzy Sets and Systems, vol. 232, pp. 3–17, 2013.
[4] E. L. Esmi, G. Barroso, L. C. Barros, and P. Sussner, "A family of joint possibility distributions for adding interactive fuzzy numbers inspired by biomathematical models," in IFSA-EUSFLAT, 2015.
[5] C. Baudrit, I. Couso, D. Dubois et al., "Joint propagation of probability and possibility in risk analysis: Towards a formal framework," International Journal of Approximate Reasoning, vol. 45, no. 1, pp. 82–105, 2007.
[6] L. Coroianu and R. Fullér, "Nguyen type theorem for extension principle based on a joint possibility distribution," International Journal of Approximate Reasoning, vol. 95, pp. 22–35, 2018.
[7] A. Ferrero, M. Prioli, S. Salicone, and B. Vantaggi, "2D probability-possibility transformations," in Synergies of Soft Computing and Statistics for Intelligent Data Analysis. Springer, 2013, pp. 63–72.
[8] D. Dubois, L. Foulloy, G. Mauris, and H. Prade, "Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities," Reliable Computing, vol. 10, no. 4, pp. 273–297, 2004.
[9] S. Cambanis, S. Huang, and G. Simons, "On the theory of elliptically contoured distributions," Journal of Multivariate Analysis, vol. 11, no. 3, pp. 368–385, 1981.
[10] C. Lesniewska-Choquet, A. M. Atto, G. Mauris, and G. Mercier, "Image change detection by possibility distribution dissemblance," in Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on. IEEE, 2017, pp. 1–6.
[11] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[12] G. Mauris, V. Lasserre, and L. Foulloy, "A fuzzy approach for the expression of uncertainty in measurement," Measurement, vol. 29, no. 3, pp. 165–177, 2001.
[13] G. Mauris, "Possibility distributions: A unified representation of usual direct-probability-based parameter estimation methods," International Journal of Approximate Reasoning, vol. 52, no. 9, pp. 1232–1242, 2011.
[14] S. Bhattacharyya and P. J. Bickel, "Adaptive estimation in elliptical distributions with extensions to high dimensions," Preprint, 2014.
[15] J. Owen and R. Rabinovitch, "On the class of elliptical distributions and their applications to the theory of portfolio choice," The Journal of Finance, vol. 38, no. 3, pp. 745–752, 1983.
[16] K. W. Fang, Symmetric Multivariate and Related Distributions. Chapman and Hall/CRC, 2017.
[17] R. Blanc, E. Syrkina, and G. Székely, "Estimating the confidence of statistical model based shape prediction," in International Conference on Information Processing in Medical Imaging. Springer, 2009, pp. 602–613.
[18] M. D. Perlman, "Concentration inequalities for multivariate distributions: II. Elliptically contoured distributions," Lecture Notes-Monograph Series, pp. 284–308, 1992.
[19] G. Gallego, C. Cuevas, R. Mohedano, and N. García, "On the Mahalanobis distance classification criterion for multidimensional normal distributions," IEEE Transactions on Signal Processing, vol. 61, no. 17, pp. 4387–4396, 2013.
[20] A. Ohlan, "An overview of recent developments of fuzzy divergence measures and their generalizations," International Journal of Current Research and Review, vol. 9, no. 7, p. 1, 2017.
[21] H. Jeffreys, The Theory of Probability. OUP Oxford, 1998.
[22] L. Pardo, Statistical Inference Based on Divergence Measures. Chapman and Hall/CRC, 2005.
[23] V. Torra, Y. Narukawa, and M. Sugeno, "On the f-divergence for non-additive measures," Fuzzy Sets and Systems, vol. 292, pp. 364–379, 2016.
[24] R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, "Urban change detection for multispectral earth observation using convolutional neural networks," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2018.
[25] S. T. Smith, "Covariance, subspace, and intrinsic Cramér-Rao bounds," IEEE Transactions on Signal Processing, vol. 53, no. 5, pp. 1610–1630, 2005.
[26] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.