8th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2013)

Hellinger distance for fuzzy measures

Vicenç Torra1 Yasuo Narukawa2 Michio Sugeno3 Michael Carlson4

1IIIA-CSIC, Campus UAB s/n, 08193 Bellaterra (Catalonia, Spain) 2Toho Gakuen, Kunitachi, Tokyo 186-0004 (Japan) 3ECSC, c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres (Spain) 4Dept. Statistics, U. Stockholm, SE-106 91 Stockholm (Sweden)

Abstract The structure of this paper is as follows. In Sec- tion 2 we review some concepts needed in the rest of Hellinger distance is a distance between two ad- this work. In Section 3 we introduce the Hellinger ditive measures defined in terms of the Radon- distance for fuzzy measures, give some examples and Nikodym derivative of these two measures. This results. The paper finishes with some conclusions. proposed in 1909 has been used in a large variety of contexts. 2. Preliminaries In this paper we define an analogous measure for fuzzy measures. We discuss them for distorted prob- This section reviews some results that are needed abilities and give two examples. later on in the rest of this paper. We focus on the Hellinger distance and on the Choquet integral. Keywords: Hellinger distance, fuzzy measures, Radon-Nikodym derivative, Choquet integral, ca- 2.1. Measures pacities. We begin with the definition of additive and non- 1. Introduction additive (fuzzy) measures.

In 1909 Ernst Hellinger [7] introduced a distance Definition 1 A collection of subsets of a set Ω F to evaluate in which extent two probability distri- is a σ-algebra if butions are similar. The definition is based on the 1. Ω and ; Radon-Nikodym derivatives of the two probabilities 2. If ∈A F then∅ ∈ its F complement Ac ; with respect to a third . This 3. If A ∈,A F ,... , then their union ∈A F . distance has been used in a variety of contexts as 1 2 F ∪ i ∈ F e.g. data privacy [20], data mining [2]. A pair (Ω, ) consisting of a set Ω and a σ-algebra In the area of fuzzy measures and capacities some of subsetsF of Ω is called a measurable space. research has been done to prove a Radon-Nikodym F type theorem, as Graf [6] puts it. That is, re- Definition 2 (see e.g. [19]) Let (Ω, ) and (Λ, ) searchers try to solve the question of when a given be measurable spaces and f a functionF from Ω to GΛ. fuzzy measure can be expressed as the (Choquet) in- The function f is called a measurable function from tegral [1] of a function with respect to another given (Ω, ) to (Λ, ) if and only if f −1( ) . measure. When such relationship is found, we can F G G ⊂ F say that this function is the Radon-Nikodym deriva- Here we use for a set A tive. Graf [6] was one of the first authors to deal ∈ G − with this problem. He focuses on sub-additive fuzzy f 1(A) = ω Ω f(ω) A , measures and gives (Theorem 4.3) necessary and { ∈ | ∈ } sufficient conditions for this to happen. Sugeno [21] and for a collection of subsets of of Λ A deals with the same problem but considering dis- − − f 1( ) = f 1(A) A torted probabilities. Rébillé [17] deals with the case A { | ∈ A} of almost subadditive set functions of bounded sum. In this paper we consider the definition of the Definition 3 Let (Ω, ) be a measurable space. A set function µ definedF on is called a (additive) Hellinger distance for fuzzy measures. To do so, we F use the concept of derivative as used in [21], and the measure if and only if Choquet integral as an alternative to the Lebesgue 0 µ(A) for any A ; integral. We will illustrate the definition with some • µ(≤) = 0; ≤ ∞ ∈ F examples using distorted probabilities, and prove • ∅ If A1,A2,... are disjoint elements of (i.e., some properties. • F Ai Aj = for all i = j) then The calculation of our examples requires the com- ∩ ∅ 6 putation of Choquet integrals. The problem of the ∞ ∞ calculation of the Choquet integral has been previ- µ( Ai) = µ(Ai). i=1 i=1 ously considered in e.g. [12, 13, 10, 14]. [ X

© 2013. The authors - Published by Atlantis Press 541 The triple (Ω, , µ) is called a measure space. Theorem 10 Let µ and ν be two additive measures F on (Ω, ) and µ be σ-finite. If ν << µ, then there Example 4 There is a unique measure λ on (R, ) exists aF nonnegative measurable function f on Ω B such that λ([a, b]) = b a for every finite interval such that − [a, b], < a b < . This is known as the ν(A) = fdµ Lebesgue−∞ measure.≤ ∞ ZA Here R is the real line and is a Borel σ-algebra. The function f in this theorem is called the B Radon-Nikodym derivative of ν with respect to µ, Definition 5 Let (Ω, ) be a measurable space. A denoted F dν set function µ defined on is called a fuzzy measure f = . if an only if F dµ The function f may not be unique, but if f0 and 0 µ(A) for any A ; f1 are both Radon-Nikodym derivatives of ν, then • µ(≤) = 0; ≤ ∞ ∈ F f0 = f1 almost everywhere µ (see e.g. [8] p.7). • If ∅A A then • 1 ⊆ 2 ⊆ F Definition 11 Let P , Q be two probabilities that µ(A ) µ(A ) 1 ≤ 2 are absolutely continuous with respect to a third probability measure ν. The Hellinger distance be- Example 6 Let m : R+ R+ be a continuous tween P and Q is defined as and increasing function such→ that m(0) = 0. Let λ be the Lebesgue measure. Let µm be the set function 2 1 dP dQ defined by H(P,Q) = dν v µm(A) = m(λ(A)) (1) u2 r dν − r dν ! u Z for all A. The function µ is a fuzzy measure. t m Here dP/dν and dQ/dν are the Radon-Nikodym When m(1) = 1 and we restrict λ to the measur- derivatives of P and Q able space ([0, 1], ), then λ is a probability mea- [0,1] The Hellinger distance does not depend on the sure and µ is aB distorted probability. Distorted m measure ν (see [16]). probabilities have been studied in [5, 4, 11]. 2.3. Choquet integral and derivatives with Example 7 [21] Let a > 0. We define µa as the fuzzy measure of the form of Equation 1 with respect to fuzzy measures

2 In this section we review the definition of the Cho- ma(t) = t + at /2, quet integral as well as some results related to this integral. The Choquet integral integrates a function that is with respect to a fuzzy measure. When the fuzzy 2 measure is additive, it corresponds to the Lebesgue µa(A) = (λ(A)) + a(λ(A)) /2. integral. Definition 8 Let µ be a fuzzy measure space. Definition 12 [1] Let X be a reference set, let 1. µ is said to be submodular if (X, ) be a measurable space, let µ be a fuzzy mea- sureA on (X, ), and let g be a measurable function µ(A) + µ(B) µ(A B) + µ(A B). g : X [0, 1]A; then, the Choquet integral of g with ≥ ∪ ∩ respect→ to µ is defined by 2. µ is said to be supermodular if ∞ µ(A) + µ(B) µ(A B) + µ(A B). (C) gdµ := µg(r)dr, (2) ≤ ∪ ∩ Z Z0 2.2. Hellinger distance where µg(r) := µ( x g(x) > r ). { | }

The Hellinger distance was defined for pairs of prob- An alternative notation for this integral is Cµ(f). abilities. It is defined in terms of the Radon- The Choquet integral of g with respect to a fuzzy Nikodym derivative. measure µ on a set A is defined by:

Definition 9 (see e.g. [19, 8]) Let ν and µ be ∞ two additive measures in the same measurable space (C) gdµ := µ( x g(x) > r A)dr. (3) (Ω, ). Then, A 0 { | } ∩ F Z Z if µ(A) = 0 implies ν(A) = 0 which corresponds to we say that ν is absolutely continuous with respect (C) gdµ = (C) g 1Adµ to µ. In this cas we write ν << µ. · ZA Z

542 Theorem 13 [1, 3, 9] Let µ be a non-additive mea- ν(A) = ν(B). Therefore, when λ(A) = x, ν(A) = sure in on (R, ), and f, g be non negative measur- ν([0, x]) and we can define a function f as follows: able function. B f(x) = ν([0, x]). So, if λ(A) = x then

1. If µ is submodular, then ν(A) = ν([0, x]) = f(x) = (C) gdµm. Z[0,x] (C) (f + g)dµ (C) fdµ + (C) gdµ. ≤ As stated in the introduction, Graf and Sugeno, Z Z Z among others have proven conditions on when a 2. If µ is supermodular, then fuzzy measure ν is a Choquet integral of another fuzzy measure µ. Here, we illustrate these results (C) (f + g)dµ (C) fdµ + (C) gdµ. ≥ with two examples adapted from [21]. We will use Z Z Z these examples later. Definition 14 Let (Ω, ) be a measurable space F and let ν, µ : R+ be fuzzy measures. We say Example 17 (See Example 7 in [21]) Let µm be that ν is a ChoquetF → integral of µ if there exists a as in Example 7 (i.e., a distorted Lebesgue measure 2 measurable function g :Ω R+ with with m(t) = t + at /2). Let νn be another distorted → measure with n(t) = t2. Then, ν(A) = (C) gdµ (4) ZA νn(A) = n(λ(A)) = (C) gdµm [0,x] for all A . Z ∈ F for all A such that λ(A) = x with g(t) = (2/a)(1 Then, in Graf [6], Nguyen [15], and Sugeno [21] e−at). − the inverse problem is considered. That is, given measures µ and ν, the problem is to know whether a Example 18 (See Example 8 in [21]) Let µm be g exists that satisfies Equation 4 and, if so, compute as in Example 7 (i.e., a distorted Lebesgue measure 2 such g. with m(t) = t + at /2). Let νn be another distorted measure with n(t) = eat 1 for a > 0. Then, Definition 15 Let µ and ν be two fuzzy measures. − If µ is a Choquet integral of ν, and g is a function νn(A) = n(λ(A)) = (C) gdµm such that Equation 4 is satisfied we write Z[0,x] dν/dµ = g, for all A such that λ(A) = x with g(t) = cosh(at). and we say that g is a derivative of ν with respect Let m(t), g(t) and f(t) be continuously differen- to µ. tiable. Let µ([τ, t]) differentiable with respect to τ on [0, t] for every t > 0. We require the regularity Graf’s definition [6] considers the case of two mea- condition that µ( t ) = 0 holds for every t 0. Let sures when these measures are subadditive capaci- µ′([τ, t]) denote ({∂/∂τ} )µ([τ, t]), where we note≥ that ties. Sugeno’s [21] definition of derivative, which µ′([τ, t]) 0 for τ t. If µ is a distorted Lebesgue ≤ ≤ follows, is of a function with respect to a distorted measure (Definition 1) in terms of m, i.e. µ = µm, Lebesgue measure, and, thus suitable for the deriva- then µ′([τ, t]) = m′(t τ) where m′(t) = dm(t)/t. tive of a distorted Lebesgue measure with respect to − − Theorem 19 (Theorem 1 in [21]) Let + be the another distorted Lebesgue measure. F In short, Graf and Sugeno prove some cases where class of measurable, non-negative, continuous and g exists. I.e., there is a unique g that satisfies Equa- increasing (non-decreasing) functions such that g : R+ R+ + tion 4 for all A . See e.g. Theorem 4.3. in Graf. . Let g , then the Choquet integral ∈ F of g→with respect to∈µ Fon [0, t] is represented as: Definition 16 (Definition 2 in [21]) For a contin- ∞ uous and increasing function f(t) with f(0) = 0, the (C) gdµ = µ( τ g(τ) r [0, t])dr { | ≥ } ∩ derivative of the function f with respect to a fuzzy Z[0,t] Z0 t measure µm is defined as the inverse operation of ′ = µ ([τ, t])g(τ)dt, (6) the Choquet integral by − Z0 df/dµm = g, (5) and when the measure is a distorted Lebesgue mea- sure µ = µm then if g is found to be continuous and increasing. ∞ t ′ If µ(A) is a distorted Lebesgue measure then µ( τ g(τ) r [0, t])dr = m (t τ)g(τ)dτ. 0 { | ≥ }∩ 0 − there exists a function m such that µm(A) = Z Z m(λ(A)). Similarly if ν(A) is a distorted Lebesgue For a function h : R+ R+, its Laplace transfor- → there exists a function n such that νn(A) = n(λ(A)). mation is denoted by H(s) and its inverse Laplace Then, naturally, if λ(A) = λ(B) it holds that transformation by h(t) = L−1[H(s)].

543 Proposition 20 (Proposition 3 in [21]) Let +, The next proposition follows from Theorem 13. + F g , and µ = µm be as in Theorem 19, let ∈ F Proposition 25 If ν is submodular, then we have t ′ f(t) = (C) gdµm = m (t τ)g(τ)dτ. h (µ , µ ) + h (µ , µ ) h (µ , µ ). − ν 1 2 ν 2 3 ≥ ν 1 3 Z[0,t] Z0 As h is symmetric and hν (µ1, µ2) 0, we have Then, the Laplace transformation of f(t) and the that our definition is a distance for submodular≥ ν. expression of f(t) in terms of the inverse Laplace transformation are as follows: Corollary 26 If ν is submodular, Definition 22 de- fines a distance. F (s) = L(f(t)) = sM(s)G(s) (7) − f(t) = L 1[sM(s)G(s)] (8) This definition is illustrated with an example based on the measures given above. where M(s) is the Laplace transformation of m and G(s) is the Laplace transformation of g. Example 27 Let a > 0. Let µm be as in Ex- ample 7 (i.e., a distorted Lebesgue measure with 2 Proposition 21 (Proposition 4 in [21]) Let f(t) be m(t) = t + at /2). Let νn1 be the measure in Ex- a continuous and increasing function with f(0) = 0, ample 17 (i.e., a distorted Lebesgue measure with 2 let µm be a distorted Lebesgue measure, then there n1(t) = t ), and let νn2 be the measure in Ex- exists an increasing (non-decreasing) function g so ample 18 (i.e., a distorted Lebesgue measure with at that f(t) = (C) g(τ)dµ and the following n2(t) = e 1). Then, the Hellinger distance of [0,t] m − holds: µn1 and µn2 with respect to µm is defined as fol- R lows:

G(s) = F (s)/sM(s) (9) 2 −1 1 dµ1 dµ2 g(t) = L [F (s)/sM(s)]. (10) Hµm (µ1, µ2) = (C) − dµm = s 2 dµm dµm Z r r  Here, F (s) is the Laplace transformation of f, M 1 2 the Laplace transformation of m, and G the Laplace = (C) a(2/ )(1 − e−at) − cosht (a ) dµ 2 m transformation of g. r Z where p p  2 3. Variation of Hellinger distance −at (C) a(2/ )(1 − e ) − cosht (a ) dµm = We introduce now our definition of the Hellinger Z   p p 2 −at distance for non-additive measures. µm t | a(2/ )(1 − e ) − cosht (a ) ≥ r dr Z   Definition 22 Let µ1 and µ2 be two fuzzy measures p p  that are Choquet integrals of ν. The Hellinger dis- Now we consider another example of the new Hellinger distance. tance between µ1 and µ2 is defined as Example 28 Let us consider µ be the distorted 2 m 1 dµ dµ Lebesgue measure with m(t) = t2 and let νp be the H (µ , µ ) = (C) 1 2 dν ν 1 2 v distorted Lebesgue measure with distortion n(t) = u2 r dν − r dν ! u Z tp. Therefore, νp(A) = (λ(A))p for p 2, and t νp([0, t]) = tp. ≥ Here dµ1/dν and dµ2/dν are the derivatives of µ1 In this example we compute the Hellinger distance and µ2 according to Definition 15. 2 p between ν and ν with respect to µm. That is, 2 p The first proposition below establishes that this Hµm (ν , ν ) using Definition 22. definition is a proper generalization of standard It is known (see e.g. [18]) that for all r > 1, − Hellinger distance. s > 0:

r Γ(r + 1) Proposition 23 Let ν, µ1, and µ2 be three addi- L[t ] = sr+1 tive measures such that µ1 and µ2 are absolutely r−1 − 1 t continuous with respect to ν. Then, Hν (µ1, µ2) is L 1 = sr Γ(r) the standard Hellinger distance of Definition 11.   This proposition follows from the fact that the where Γ is Euler’s gamma function. p Choquet integral corresponds to the Lebesgue inte- Therefore, the Laplace transforms of ν and µ gral for additive measures. are: We consider some additional properties below. Γ(p + 1) N p(s) = L[νp] = The next proposition is obvious from the definition. sp+1 2 Proposition 24 M(s) = L[µ] = (11) hν (µ1, µ2) = 0 if µ1 = µ2. s3

544 p Now, let us consider L[ dν ]. Applying Proposi- Therefore, the Hellinger distance between νn and dµm p tion 21, we have that: ν with respect to µm is calculated as

dνp N p(s) Γ(p + 1) 2 1 2 p L[ ] = = p−1 . 2 p 1 dν dν dµm sM(s) 2s H (ν , ν ) = (C) dµ (τ) µm v m u2 0 s µm − s µm ! Using the inverse Laplace transform on this expres- u Z t sion we obtain: 4 2)p(p 1 p = 1 − (13) dν −1 Γ(p + 1) Γ(p + 1) p−2 s − ( p + 2)p = L [ p−1 ] = t p dµm 2s 2Γ() p 1 2 p − We note that Hµ (ν , ν ) = 0 for p = 2. p(p 1) − m = − tp 2. 2 Remark 29 The Hellinger distance depends on Let us denote by g(t) the following expression to µm. be used later to compute the Hellinger distance be- 2 p To illustrate the previous remark, let us consider tween ν and µ with respect to µm: the following example. 2 dν2 dνp Example 30 Let µm′ be a distorted Lebesgue mea- g(t) = . ′ sdµm − sdµm ! sure with m (t) = t. Let νp be the distorted Lebesgue

2 measure of Definition 28. Then, First note dν = 1, therefore, dµm df df 2 = dµ ′ dt p)(p 1 − m g(t) = 1 − t(p 2)/2 − 2 and r ! t t − p(p 1) − (C) gdµ ′ = g(τ)dτ. = 1 2)p(p 1 t(p 2)/2 + − tp 2. m − − 2 Z0 Z0 We have p p From this follows that its Laplace transform is: dv − = ptp 1 1 Γ(p/2) p(p − 1) Γ(p − 1) dt G(s) = − 2)p(p − 1 + . s sp/2 2 sp−1 and let us define g(t) as follows p (12) 2 dν2 dνp Now, let us consider the Laplace transform of g(t) = the Choquet integral. Using Proposition 20, and r dt − r dt ! p/2 the expressions for M(s) and G(s) in Equations 11 2 2p − = √2t pt2 p−1 = t 2 + ptp 1. and 12, we have − − t  p  We obtain t 2 t L[(C) gdµm] = 2 G(s) = 2√2p 0 s g(t)dt = t2 tp/2+1 + tp Z − p/ 2 +1 1 1 Z0 2[ 3 2 p(p 1)Γ(p/2) p/2+2 + s − − s and, hence, p Γ(p + 1) 1 ]. 2 sp+1 1 4√2p g(t)dt = 2 . − p2 + Therefore, using the inverse of the Laplace trans- Z0 form, we have that Therefore, the Hellinger distance with respect to ′ t µm′ with m (t) = t is (C) gdµm = Z0 2 p/2+1 2 p 2√2p t t Hµ ′ (ν , ν ) = 1 . (14) 2[ 2 p(p 1)Γ(p/2) m s − p2 + 2 − − Γ(/ p 2 + 2) p Γ(p + 1) tp It is clear that the Hellinger distance between + ] = 2 p 2 Γ( p + 1) ν and ν depends on the measure µm and µm′ as Equation 13 and Equation 14 are, in general, differ- 8 2)p(p 1 t2 − tp/2+1 + tp ent (only equivalent for p = 2 when both distances − p( p + 2)p are zero). p Let us consider now ν and µm as distorted prob- The following can be deduced from the above ex- abilities and take the integration on [0,1]. Then, pressions.

t 2 p 8 2)p(p 1 Lemma 31 When p , both Hµm (ν , ν ) = 1 (C) g(τ)dµm(τ) = 2 − . 2 p → ∞ − ( p + 2)p and Hµm′ (ν , ν ) = 1. Z0 p

545 2 p Lemma 32 Distances Hµm (ν , ν ) and [11] Narukawa, Y., Torra, V. (2005) Fuzzy measure 2 p Hµm′ (ν , ν ) are increasing with respect to p, and probability distributions: distorted prob- and the following holds abilities, IEEE Trans. on Fuzzy Systems 13:5 617-629. 2 p Hµm (ν , ν ) [0, 1] for all p 2, [12] Narukawa, Y., Torra, V. (2009) Aggregation • 2 p ∈ ≥ H ′ (ν , ν ) [0, 1] for all p > 0. operators on the real line, Proc. 3rd Interna- • µm ∈ tional Workshop on Soft Computing Applica- tions (SOFA 2009) (ISBN: 978-1-4244-5056-5), 4. Conclusions Szeged, Hungary and Arad, Romania, 185-188. [13] Narukawa, Y., Torra, V. (2009) Continuous In this paper we have extended the definition of the OWA operator and its calculation, Proc. IFSA- Hellinger distance, which was initially defined for EUSFLAT (ISBN:978-989-95079-6-8), Lisbon, additive measures, to fuzzy measures. This exten- Portugal, 1132-1135. sion relies on a Radon-Nikodym-type derivative for [14] Narukawa, Y., Torra, V., Sugeno, M. Cho- fuzzy measure. quet integral with respect to a symmetric fuzzy As future work we plan to study some properties measure of a function on the real line, An- of this distance, and also study how this extension nals of Operations Research, in press. DOI: can be applied to other f-divergences, and if a gen- 10.1007/s10479-012-1166-6 eral definition of f- can also be given. [15] Nguyen, H.T. (2006) An Introduction to Ran- dom Sets, Chapman and Hall, CRC Press. 5. Acknowledgments [16] Pollard, D. (2008) Distances and affinities be- tween measures, from: Partial support by the Spanish MEC projects http://www.stat.yale.edu/˜pollard/ ARES (CONSOLIDER INGENIO 2010 CSD2007- Books/Asymptopia/Metrics.pdf 00004), eAEGIS (TSI2007-65406-C03-02), and CO- [17] Rébillé, R. (2012) A Radon-Nikodym deriva- PRIVACY (TIN2011-27076-C03-03) is acknowl- tive for almost subadditive set functions, Int. J. edged. of Unc., Fuzziness and Knowledge-Based Sys- tems, in press. [18] Schiff, J. L. (1999) The Laplace Transform, References Springer. [19] Shao, J. (2010) , [1] Choquet, G. (1953/54) Theory of capacities, Springer. Ann. Inst. Fourier, 5, 131-295. [20] Shlomo, N., Antal, L., Elliot, M. (2013) Dis- [2] Cieslak, D. A., Hoens, T. R., Chawla, N. V., closure Risk and Data Utility in Flexible Table Kegelmeyer, W. P. (2012) Hellinger distance Generators, Proc. NTTS 2013. decision trees are robust and skew-insensitive, [21] Sugeno, M. (2013) A note on derivatives of Data Min Knowl Disc 24 136-158. functions with respect to fuzzy measures, Fuzzy [3] Denneberg,D. (1994) Non Additive Measure and Sets and Systems 222 1-17. Integral, Dordorecht:Kluwer Academic Publish- ers. [4] Edwards, W. (1953) Probability-preferences in gambling, American Journal of Psychology 66 349-364. [5] Edwards, W. (1962) Subjective probabilities in- ferred from decisions, Psychological Review 69 109-135. [6] Graf, S. (1980) A Radon-Nikodym theorem for capacities, Journal für die reine und angewandte Mathematik, 1980 192-214. [7] Hellinger, E. (1909) Neue Begründung der The- orie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und ange- wandte Mathematik 136: 210-271. [8] Keener, R. W. (2010) Theoretical Statistics, Springer. [9] Klement, E. P., Mesiar, R., Pap, E. (2000) Tri- angular Norms, Kluwer Academic Publisher. [10] Narukawa, Y. (2012) Choquet Integral on the Real Line as a Generalization of the OWA Op- erator. Proc. MDAI 2012, Lecture Notes in Ar- tificial Intelligence 7647 56-65.

546