
Positive Definite Functions on Fuzzy Sets
Jorge Guevara, Roberto Hirata Jr, Stephane Canu

To cite this version:
Jorge Guevara, Roberto Hirata Jr, Stephane Canu. Positive Definite Kernel Functions on Fuzzy Sets. 2014. hal-00920645

HAL Id: hal-00920645
https://hal.archives-ouvertes.fr/hal-00920645
Preprint submitted on 6 Mar 2014

Positive Definite Kernel Functions on Fuzzy Sets

Jorge Guevara, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil. Email: [email protected]
Roberto Hirata Jr., Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil. Email: [email protected]
Stéphane Canu, Normandie Université, INSA de Rouen - LITIS, St Étienne du Rouvray, France. Email: [email protected]

Abstract—Embedding non-vectorial data into a normed vectorial space is very common in machine learning, aiming to perform tasks such as classification, regression, clustering and so on. Fuzzy datasets, or datasets whose observations are fuzzy sets, are an example of non-vectorial data, and many fuzzy pattern recognition algorithms analyze them in the space of fuzzy sets. However, that space has the limitation of not being a vectorial space. To overcome this limitation, in this work we propose the embedding of fuzzy data into a proper Hilbert space of functions called the Reproducing Kernel Hilbert Space or RKHS. This embedding is possible using a positive definite kernel defined on fuzzy sets. As a result, we present a formulation of real-valued kernels on fuzzy sets; in particular, we define the intersection kernel and the cross product kernel on fuzzy sets, giving some examples of them using T-norm operators. Also, we analyze the nonsingleton TSK fuzzy kernel and, finally, we give several examples of kernels on fuzzy sets that can be easily constructed from the previous ones.

Index Terms—Kernel on fuzzy sets, Reproducing Kernel Hilbert Space, positive definite kernel

I. INTRODUCTION

Several real-world applications contain datasets whose observations are fuzzy sets, i.e., datasets of the form {X_i}_{i=1}^N, where each X_i is a fuzzy set [1], [2]. Those datasets are a result of modeling impreciseness and vagueness in observations of real problems with fuzzy sets. For example, because of the uncertainty added by noise and the impreciseness due to measurement instruments, data from biological and astronomical problems could be modeled by fuzzy sets. Also, it is widely known that datasets with features given in the form of linguistic terms, words and intervals can be modeled by fuzzy sets [3]–[9].

In the machine learning community, datasets are used to automatically construct algorithms that give some useful information to the user, for instance, to make future predictions from the actual data, to select the most relevant features of the dataset, or to perform other important tasks such as clustering, regression, inference and density estimation [10], [11]. A methodology commonly used in machine learning is to analyze the embedding of the data in a proper subspace of a Hilbert space of functions called the Reproducing Kernel Hilbert Space or RKHS [12]–[15]. To make this embedding possible, it is only necessary to have a real-valued positive definite function called the reproducing kernel¹ of the RKHS. Methods working in this way are called kernel methods [13], [14], for instance, the Support Vector Machine [16], Support Vector Data Description [17], Kernel PCA [13], Gaussian Processes [18] and so on.

Kernel methods are attractive for data analysis because: 1) the domain of definition has no additional requirements, allowing the analysis of non-vectorial data such as graphs, sets and strings; 2) a RKHS has a structure such that closeness of two functions in the norm implies closeness of their values, allowing us to perform clustering, classification and other important tasks; 3) to construct a RKHS it is only necessary to have a positive definite kernel k : E × E → R; 4) computations in the RKHS are performed by knowing that kernel evaluations are equal to inner products of functions in the RKHS: k(x, y) = ⟨k(x, ·), k(y, ·)⟩_H, where x ∈ E ↦ k(x, ·) ∈ H is the embedding of the data into the RKHS H, and k(x, ·), k(y, ·) ∈ H are the representers of x, y ∈ E in the RKHS; 5) k(x, y) is a similarity measure between the objects x, y ∈ E and, because the mapping x ∈ E ↦ k(x, ·) ∈ H is nonlinear, simple functions in the RKHS are useful to analyze complex input data; 6) kernel methods are modular: algorithms working in the RKHS are independent of the kernel k that generates such a space, that is, we can choose many kernels without changing the algorithm; 7) many classical algorithms can be kernelized by applying the kernel trick [13].

In this paper, we give the theoretical basis to construct positive definite kernels on fuzzy sets, that is, we are going to consider the set E as the set of all fuzzy sets. This will allow us to use the whole machinery of kernel methods on datasets whose observations are given by fuzzy sets. As an example, Figure 1 shows a nonlinear classifier obtained from a support vector machine using a dataset whose observations are fuzzy sets, using a positive definite kernel on fuzzy sets [19].

Fig. 1. Supervised classification of fuzzy data using support vector machines. A1, A2, ..., A5 are fuzzy sets [19].

¹The word kernel comes from the theory of integral operators and should not be confused with the concept of kernel of a fuzzy set.

A. Previous Work using Positive Definite Kernels and Fuzzy Sets

The literature reports some work jointly using fuzzy theory techniques and positive definite kernels to solve machine learning problems, particularly in clustering [20]–[22], classification problems with outliers or noise [23], feature extraction [24] and discriminant analysis [25], without implying positive definite kernels on fuzzy sets; i.e., all the kernels are real-valued functions defined on R^D × R^D (R is the set of real numbers and D is a positive integer), and fuzzy techniques and kernels are used in some step of the algorithms.

There is also a relationship between some fuzzy concepts and positive definite kernels: for example, Takagi-Sugeno-Kang fuzzy systems, under some criteria, can be viewed as kernel evaluations [26]–[33]; some fuzzy basis functions can be used to construct positive definite kernels [34]; and some positive definite kernels are fuzzy equivalence relations [35]–[37]. However, in those works all the positive definite kernels are functions defined only on R^D × R^D.

To the best of our knowledge, the first attempt to fill this gap is the work [19], which gives a formulation to construct positive definite kernels on fuzzy sets and experiments with those kernels using fuzzy and interval datasets.
B. Contributions

To the best of our knowledge, there is no general formulation to define kernels on fuzzy sets; all previous works only consider kernels on R^D × R^D, relating fuzzy concepts in the design of the kernel or as a step of some algorithm. This work has the following contributions:

• We give a general formulation of kernels on fuzzy sets; in particular we define the intersection kernel on fuzzy sets and the cross product kernel on fuzzy sets. Also, we provide some examples of such kernels using different T-norm operators.
• We show that the kernel presented in [19] satisfies our definition of kernel on fuzzy sets, and we prove that such a kernel is a fuzzy equivalence relation and admits a representation as a fuzzy logic formula for fuzzy rules.
• We give several examples of how to construct new positive definite kernels on fuzzy sets from the previous kernels on fuzzy sets: we present the Fuzzy Polynomial Kernel, the Fuzzy Gaussian Kernel and the Fuzzy Rational Quadratic Kernel and, also, some conditionally positive definite kernels on fuzzy sets, such as the Fuzzy Multiquadric Kernel and the Fuzzy Inverse Multiquadric Kernel.

II. THEORETICAL BACKGROUND

A. Reproducing Kernel Hilbert Spaces

A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions with the nice property that closeness of two functions in the norm implies closeness of their values. Such Hilbert spaces are widely used in machine learning for data analysis. Algorithms like support vector machines [38], support vector data description [17] and kernel PCA [13] work with the embedding of the input data into some RKHS generated by a reproducing kernel.

In the sequel, E denotes a non-empty set and H is a real RKHS of real-valued functions on E.² The notation k(·, y) means the mapping x → k(x, y) with fixed y, where k is a function on E × E.

²We will consider only real-valued kernels, because they are the functions of more practical interest.

Definition II.1 (Reproducing kernel). A function

k : E × E → R
(x, y) ↦ k(x, y)    (1)

is called a reproducing kernel of the Hilbert space H if and only if
1) ∀x ∈ E, k(·, x) ∈ H;
2) ∀x ∈ E, ∀f ∈ H, ⟨f, k(·, x)⟩_H = f(x).

Condition 2) is called the reproducing property. From the definition above it follows that

∀(x, y) ∈ E × E, k(x, y) = ⟨k(·, x), k(·, y)⟩_H.

Definition II.2 (Real RKHS). A Hilbert space of real-valued functions on E, denoted by H, with a reproducing kernel is called a real Reproducing Kernel Hilbert Space or real RKHS.

In the sequel, we use the term RKHS to mean a real RKHS. A main characterization of a RKHS is that it is a Hilbert space of real-valued functions on E where all the evaluation functionals

e_x : H → R    (2)
f ↦ e_x(f) = f(x)    (3)

are continuous on H. By the Riesz representation theorem [39] and the reproducing property (see Definition II.1), it follows that

e_x(f) = f(x) = ⟨f, k(·, x)⟩_H, ∀x ∈ E, ∀f ∈ H.

As a result, in a RKHS a sequence converging in the norm also converges pointwise.

The following result shows the equivalence between reproducing kernels and positive definite functions.

Lemma 1. Any reproducing kernel k : E × E → R is a symmetric positive definite function, that is, it satisfies

Σ_{i=1}^N Σ_{j=1}^N c_i c_j k(x_i, x_j) ≥ 0    (4)

∀N ∈ N, ∀c_i, c_j ∈ R, and k(x, y) = k(y, x), ∀x, y ∈ E. The converse is also true.

To prove that a kernel k is positive definite, it is enough to show that, for some Hilbert space H and mapping φ : E → H, the function k can be written as

k(x, y) = ⟨φ(x), φ(y)⟩_H.    (5)

An important result is the Moore-Aronszajn theorem [40]: it states that a RKHS H defines a corresponding reproducing kernel k and, conversely, a reproducing kernel k defines a unique RKHS H. Another important result, from probability theory, is that if k(·, ·) is positive definite, then there exists a family of zero-mean Gaussian random variables with k(·, ·) as covariance function [12].

Examples of reproducing kernels, or positive definite kernels, on R^D × R^D widely used in the machine learning community are:

Linear kernel      k(x, y) = ⟨x, y⟩
Polynomial kernel  k(x, y) = (⟨x, y⟩ + 1)^D
Gaussian kernel    k(x, y) = exp(−‖x − y‖² / σ²)

More sophisticated examples are kernels on probability measures [41], kernels on measures [42], kernels on strings [43], and other kernels for non-vectorial data such as graphs, sets and logic terms [44].

Summarizing, a RKHS is a Hilbert space of functions possessing the additional structure that all the evaluation functionals are continuous. A RKHS has a reproducing kernel k(x, y), x, y ∈ E, with the property that, for fixed x ∈ E, k(x, ·) is a function that belongs to the RKHS H. The kernel function k(x, y) is a positive definite function, and the space spanned by the functions k(x, ·) generates a RKHS, i.e., a Hilbert space with reproducing kernel k. Note that positive definite kernels are reproducing kernels of some RKHS.

As a comment, the space of square integrable functions L² is a Hilbert space and is isometric to the space of sequences ℓ², but it is not a RKHS because it is a space of equivalence classes of functions rather than a function space. Hence L² does not have a reproducing kernel; note that the delta function has a reproducing property but does not belong to this space.

Reference [15] gives more details about the theory of RKHS, and references [13], [14] give more details about kernel methods in machine learning.

Next, to introduce the concept of kernels on fuzzy sets, we review the concepts of fuzzy set, T-norm operator, semi-ring of sets and measure.

B. Fuzzy Set

Let Ω be the universal set. A fuzzy set on Ω is a set X ⊂ Ω with membership function

μ_X : Ω → [0, 1]    (6)
x ↦ μ_X(x).    (7)

Definition II.3 (α-cut of a fuzzy set). The α-cut of a fuzzy set X ⊂ Ω is the set

X_α = {x ∈ Ω | μ_X(x) ≥ α, α ∈ [0, 1]}.

Definition II.4 (Support of a fuzzy set). The support of a fuzzy set is the set

X_{>0} = {x ∈ Ω | μ_X(x) > 0}.

A complete review of the theory of fuzzy sets and applications can be found in [45].

C. T-Norm

A triangular norm or T-norm is a function T : [0, 1]² → [0, 1] that for all x, y, z ∈ [0, 1] satisfies:
T1 commutativity: T(x, y) = T(y, x);
T2 associativity: T(x, T(y, z)) = T(T(x, y), z);
T3 monotonicity: y ≤ z ⇒ T(x, y) ≤ T(x, z);
T4 boundary condition: T(x, 1) = x.

Using n ∈ N and associativity, a multiple-valued extension T_n : [0, 1]^n → [0, 1] of a T-norm T is given by

T_n(x_1, x_2, ..., x_n) = T(x_1, T_{n−1}(x_2, x_3, ..., x_n)).    (8)

We will use T to denote either T or T_n.
D. Semi-ring of Sets

Let Ω be a set. A semi-ring of sets S on Ω is a subset of the power set P(Ω), that is, a set of sets satisfying:
1) ∅ ∈ S;
2) A, B ∈ S ⇒ A ∩ B ∈ S;
3) for all A, A_1 ∈ S with A_1 ⊆ A, there exists a sequence of pairwise disjoint sets A_2, A_3, ..., A_N ∈ S such that

A = ∪_{i=1}^N A_i.

Condition 3) is called a finite decomposition of A.

E. Measure

Definition II.5 (Measure). Let S be a semi-ring and let ρ : S → [0, ∞] be a pre-measure, i.e., ρ satisfies:
1) ρ(∅) = 0;
2) for a finite decomposition of A ∈ S, ρ(A) = Σ_{i=1}^N ρ(A_i).

By Carathéodory's extension theorem, ρ extends to a measure on σ(S), where σ(S) is the smallest σ-algebra containing S.

Finally, capital letters A, B, C will denote sets and capital letters X, Y, Z will denote fuzzy sets. The notation F(S ⊂ Ω) stands for the set of all fuzzy sets over Ω whose support belongs to S, i.e.,

F(S ⊂ Ω) = {X ⊂ Ω | X_{>0} ∈ S}.

III. KERNELS ON FUZZY SETS

We define kernel functions on fuzzy sets as mappings

k : F(S ⊂ Ω) × F(S ⊂ Ω) → R
(X, Y) ↦ k(X, Y),    (9)

where S is a semi-ring of sets on Ω and F(S ⊂ Ω) is the set of all fuzzy sets over Ω whose support belongs to S. This is a kernel for non-vectorial input.

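To make the objects in (9) concrete, a fuzzy set with finite support can be encoded as a mapping from elements to membership degrees. This small container is our own illustration, assuming a finite support; the paper itself works with general supports in a semi-ring S.

```python
# A fuzzy set with finite support, encoded as {element: membership degree}.
# Our own illustrative encoding, not part of the paper's formulation.
class FuzzySet:
    def __init__(self, memberships):
        # keep only the support, i.e. elements with mu(x) > 0
        self.mu = {x: m for x, m in memberships.items() if m > 0.0}

    def membership(self, x):
        return self.mu.get(x, 0.0)

    def support(self):
        return set(self.mu)

X = FuzzySet({1: 0.2, 2: 1.0, 3: 0.7, 4: 0.0})
print(sorted(X.support()))      # [1, 2, 3]
print(X.membership(4))          # 0.0
```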
Because each fuzzy set X belongs to F(S ⊂ Ω), the support X_{>0} of X admits a finite decomposition, that is,

X_{>0} = ∪_{i∈I} A_i ∈ S,

where A = {A_1, A_2, ..., A_N} are pairwise disjoint sets and I stands for an arbitrary index set.

In the following, we derive some kernels on fuzzy sets based on the intersection of fuzzy sets and the cross product between their elements.

A. Intersection Kernel on Fuzzy Sets

The intersection of two fuzzy sets X, Y ∈ F(S ⊂ Ω) is the fuzzy set X ∩ Y ∈ F(S ⊂ Ω) with membership function

μ_{X∩Y} : Ω → [0, 1]    (10)
x ↦ μ_{X∩Y}(x) = T(μ_X(x), μ_Y(x)),    (11)

where T is a T-norm operator. Using this fact, we define the intersection kernel on fuzzy sets as follows.

Definition III.1 (Intersection Kernel on Fuzzy Sets). Let X, Y be two fuzzy sets in F(S ⊂ Ω). The intersection kernel on fuzzy sets is the function

k : F(S ⊂ Ω) × F(S ⊂ Ω) → R
(X, Y) ↦ k(X, Y) = g(X ∩ Y),

where g is the mapping

g : F(S ⊂ Ω) → [0, ∞]
X ↦ g(X).

The mapping g plays an important role, assigning real values to the intersection fuzzy set X ∩ Y. We can think of this function as a similarity measure between two fuzzy sets, and its design will be highly dependent on the problem and the data.

For instance, our first choice for g uses the fact that the support of X ∩ Y has a finite decomposition, that is,

(X ∩ Y)_{>0} = ∪_{i∈I} A_i ∈ S,

with pairwise disjoint sets {A_1, A_2, ..., A_N}. We can measure this support using the measure ρ : S → [0, ∞] as follows:

ρ((X ∩ Y)_{>0}) = ρ(∪_{i∈I} A_i) = Σ_{i∈I} ρ(A_i).

The idea to include fuzziness is to weight each ρ(A_i) by a value given by the contribution of the membership function on all the elements of the set A_i.

Next, we give a definition of an intersection kernel on fuzzy sets using the concepts of measure and membership function.

Definition III.2 (Intersection Kernel on Fuzzy Sets with measure ρ). Let ∪_{i∈I} A_i ∈ S be a finite decomposition of the support of the intersection fuzzy set X ∩ Y ∈ F(S ⊂ Ω) as defined before. Let g be the function

g : F(S ⊂ Ω) → [0, ∞]
X ∩ Y ↦ g(X ∩ Y) = Σ_{i∈I} μ_{X∩Y}(A_i) ρ(A_i),

where

μ_{X∩Y}(A_i) = Σ_{x∈A_i} μ_{X∩Y}(x)

and ρ is a measure according to Definition II.5. We define the Intersection Kernel on Fuzzy Sets with measure ρ as

k(X, Y) = g(X ∩ Y) = Σ_{i∈I} μ_{X∩Y}(A_i) ρ(A_i).    (12)

Using the T-norm operator, the intersection kernel on fuzzy sets with measure ρ given by (12) can be written as

k(X, Y) = Σ_{i∈I} μ_{X∩Y}(A_i) ρ(A_i)
        = Σ_{i∈I} Σ_{x∈A_i} μ_{X∩Y}(x) ρ(A_i)
        = Σ_{i∈I} Σ_{x∈A_i} T(μ_X(x), μ_Y(x)) ρ(A_i).

Table I shows several example kernels for different T-norm operators. The function Z in Table I is defined as

Z(μ_X(x), μ_Y(x)) = μ_X(x) if μ_Y(x) = 1; μ_Y(x) if μ_X(x) = 1; 0 otherwise.

TABLE I
KERNELS ON FUZZY SETS (intersection kernels with measure ρ)

k_min(X, Y) = Σ_{i∈I} Σ_{x∈A_i} min(μ_X(x), μ_Y(x)) ρ(A_i)
k_P(X, Y)   = Σ_{i∈I} Σ_{x∈A_i} μ_X(x) μ_Y(x) ρ(A_i)
k_max(X, Y) = Σ_{i∈I} Σ_{x∈A_i} max(μ_X(x) + μ_Y(x) − 1, 0) ρ(A_i)
k_Z(X, Y)   = Σ_{i∈I} Σ_{x∈A_i} Z(μ_X(x), μ_Y(x)) ρ(A_i)

More examples can be obtained by setting specific measures. For example, k_min with the probability measure P gives the kernel

k_min(X, Y) = Σ_{i∈I} Σ_{x∈A_i} min(μ_X(x), μ_Y(x)) P(A_i),

or k_P with the Dirac measure δ_x(A_i) gives

k_P(X, Y) = Σ_{i∈I} Σ_{x∈A_i} μ_X(x) μ_Y(x) δ_x(A_i).

The next step is to determine which intersection kernels on fuzzy sets with measure ρ are positive definite, that is, which intersection kernels are reproducing kernels of some RKHS.
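A minimal numerical sketch of (12) for fuzzy sets with finite supports, assuming the decomposition is into singletons A_i = {x} with ρ({x}) = 1 (an illustrative choice; all names are ours):

```python
# Intersection kernel (12) over finite supports, assuming a singleton
# decomposition A_i = {x} with rho({x}) = 1 (illustrative assumption).
def intersection_kernel(mu_x, mu_y, tnorm, rho=lambda x: 1.0):
    """mu_x, mu_y: dicts element -> membership degree; tnorm: binary T-norm."""
    common = set(mu_x) | set(mu_y)
    return sum(tnorm(mu_x.get(x, 0.0), mu_y.get(x, 0.0)) * rho(x) for x in common)

X = {1: 0.5, 2: 0.6}
Y = {2: 0.8, 3: 0.4}

k_min = intersection_kernel(X, Y, min)                 # min T-norm
k_P = intersection_kernel(X, Y, lambda a, b: a * b)    # product T-norm
print(k_min, round(k_P, 2))   # 0.6 0.48
```

Swapping the `tnorm` argument reproduces any row of Table I; a different `rho` plays the role of the measure.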

Lemma 2. k_min(X, Y) is positive definite.

Proof: We first define the function

1_{[0,a]} : R → {0, 1}    (13)
t ↦ 1_{[0,a]}(t) = 1 if t ∈ [0, a], 0 otherwise.    (14)

Then the function min can be written as

min(a, b) = ∫ 1_{[0,a]}(t) 1_{[0,b]}(t) dλ(t) = ⟨1_{[0,a]}(·), 1_{[0,b]}(·)⟩_H.

By Lemma 1 and Equation (5), it follows that min is positive definite; that is, for a fixed x ∈ Ω,

Σ_{i=1}^N Σ_{j=1}^N c_i c_j min(μ_{X_i}(x), μ_{X_j}(x)) ≥ 0,
∀N ∈ N, ∀c_i, c_j ∈ R, ∀X_i, X_j ∈ F(S ⊂ Ω).    (17)

Next, we show that k_min is positive definite, that is,

Σ_{i=1}^N Σ_{j=1}^N c_i c_j Σ_{l∈I} Σ_{x∈A_l} min(μ_{X_i}(x), μ_{X_j}(x)) ρ(A_l) ≥ 0,

∀N ∈ N, ∀c_i, c_j ∈ R, ∀X_i, X_j ∈ F(S ⊂ Ω), where I stands for an arbitrary index set. Note that

Σ_{i=1}^N Σ_{j=1}^N c_i c_j Σ_{l∈I} Σ_{x∈A_l} min(μ_{X_i}(x), μ_{X_j}(x)) ρ(A_l)
 = Σ_{l∈I} Σ_{x∈A_l} ( Σ_{i=1}^N Σ_{j=1}^N c_i c_j min(μ_{X_i}(x), μ_{X_j}(x)) ) ρ(A_l)
 ≥ 0. ∎

Lemma 3. k_P(X, Y) is positive definite.

Proof: Note that, for a fixed x ∈ Ω,

Σ_{i=1}^N Σ_{j=1}^N c_i c_j μ_{X_i}(x) μ_{X_j}(x) = ( Σ_{i=1}^N c_i μ_{X_i}(x) )² ≥ 0.

Next, we show that k_P is positive definite, that is,

Σ_{i=1}^N Σ_{j=1}^N c_i c_j Σ_{l∈I} Σ_{x∈A_l} μ_{X_i}(x) μ_{X_j}(x) ρ(A_l) ≥ 0,

∀N ∈ N, ∀c_i, c_j ∈ R, ∀X_i, X_j ∈ F(S ⊂ Ω), where I stands for an arbitrary index set. Note that

Σ_{i=1}^N Σ_{j=1}^N c_i c_j Σ_{l∈I} Σ_{x∈A_l} μ_{X_i}(x) μ_{X_j}(x) ρ(A_l)
 = Σ_{l∈I} Σ_{x∈A_l} ( Σ_{i=1}^N Σ_{j=1}^N c_i c_j μ_{X_i}(x) μ_{X_j}(x) ) ρ(A_l)
 ≥ 0. ∎

It is worth noting that, if the σ-algebra is a Borel algebra of subsets of R^D, then the intersection kernel with measure ρ can be written as

k(X, Y) = ∫_{R^D} T(μ_X(x), μ_Y(x)) dρ(x);

for example, k_min and k_P can be written as

k_min(X, Y) = ∫_{R^D} min(μ_X(x), μ_Y(x)) dρ(x)    (15)
k_P(X, Y) = ∫_{R^D} μ_X(x) μ_Y(x) dρ(x).    (16)

Another type of intersection kernel is the nonsingleton TSK fuzzy kernel presented in [19]. We will see that this kernel satisfies our definition of intersection kernel; we review its properties and study the link with fuzzy equivalence relations in Section IV.

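Lemmas 2 and 3 can also be checked numerically: the Gram matrix of k_min over a few randomly generated fuzzy sets should yield a nonnegative quadratic form for every coefficient vector. A sketch of such a check (our own, assuming finite supports with ρ({x}) = 1):

```python
import random

# Numerical sanity check of Lemma 2: the Gram matrix of k_min over
# randomly generated fuzzy sets should be positive semidefinite.
# Finite supports, rho({x}) = 1 (illustrative assumptions).
def k_min(mu_x, mu_y):
    return sum(min(mu_x.get(e, 0.0), mu_y.get(e, 0.0)) for e in set(mu_x) | set(mu_y))

random.seed(0)
elements = range(5)
fuzzy_sets = [{e: random.random() for e in elements} for _ in range(6)]

G = [[k_min(a, b) for b in fuzzy_sets] for a in fuzzy_sets]

# A symmetric matrix is PSD iff c^T G c >= 0 for all c; probe random c.
for _ in range(1000):
    c = [random.uniform(-1, 1) for _ in fuzzy_sets]
    q = sum(ci * cj * G[i][j] for i, ci in enumerate(c) for j, cj in enumerate(c))
    assert q >= -1e-9
print("all quadratic forms nonnegative")
```

Replacing k_min by the product-based k_P exercises Lemma 3 in the same way; a T-norm without the positive definiteness property would make some quadratic form go negative.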
B. Cross Product Kernel between Fuzzy Sets

Definition III.3. Let k : Ω × Ω → R be a positive definite kernel. The cross product kernel between fuzzy sets X, Y ∈ F(S ⊂ Ω) is the real-valued function k× defined on F(S ⊂ Ω) × F(S ⊂ Ω) as

k×(X, Y) = Σ_{x∈X} Σ_{y∈Y} k(x, y) μ_X(x) μ_Y(y).    (18)

Lemma 4. The kernel k× is positive definite.

Proof: By Definition III.3,

k×(X, Y) = Σ_{x∈X} Σ_{y∈Y} k(x, y) μ_X(x) μ_Y(y)
          = ⟨ Σ_{x∈X} k(·, x) μ_X(x), Σ_{y∈Y} k(·, y) μ_Y(y) ⟩_H,

where H is the RKHS of k; hence k× has the form (5) and, by Lemma 1, is positive definite. ∎

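A sketch of the cross product kernel (18) for finite supports, taking a Gaussian base kernel on R as one possible choice of k (names and parameters are ours):

```python
import math

# Cross product kernel (18) between two finite-support fuzzy sets,
# with a Gaussian base kernel on the elements (illustrative choice).
def gauss(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / sigma ** 2)

def cross_product_kernel(mu_x, mu_y, base=gauss):
    return sum(base(x, y) * mx * my
               for x, mx in mu_x.items()
               for y, my in mu_y.items())

X = {0.0: 1.0, 1.0: 0.5}
Y = {0.0: 0.8}
print(round(cross_product_kernel(X, Y), 4))   # 0.9472
```

Unlike the intersection kernels, (18) compares every pair of elements through the base kernel, so two fuzzy sets with disjoint supports can still have a nonzero similarity.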
IV. NONSINGLETON TSK FUZZY KERNEL

The nonsingleton TSK fuzzy kernel was presented in [19]. We are going to show that this kernel is an instance of the fuzzy intersection kernel from Definition III.1.

Definition IV.1 (Nonsingleton TSK Fuzzy Kernel). Let X ∩ Y be a fuzzy set given by Definition III.1 and let g be the function

g : F(S ⊂ Ω) → [0, ∞]
X ∩ Y ↦ g(X ∩ Y) = sup_{x∈Ω} μ_{X∩Y}(x).

Then the nonsingleton TSK fuzzy kernel is given by

k(X, Y) = sup_{x∈Ω} μ_{X∩Y}(x).    (19)

Using T-norm operators, this kernel can be written as

k(X, Y) = sup_{x∈Ω} μ_{X∩Y}(x) = sup_{x∈Ω} T(μ_X(x), μ_Y(x)).

Note that the nonsingleton TSK fuzzy kernel satisfies the definition of intersection kernel on fuzzy sets (Definition III.1) for the particular setting g(X ∩ Y) = sup_{x∈Ω} μ_{X∩Y}(x).

Lemma 5. The nonsingleton TSK fuzzy kernel is positive definite, that is,

Σ_{i=1}^N Σ_{j=1}^N c_i c_j k(X_i, X_j) ≥ 0,

∀N ∈ N, ∀c_i, c_j ∈ R, ∀X_i, X_j ∈ F(S ⊂ Ω).

Proof: Let I be an arbitrary index set. By the commutativity property of T-norms, k is symmetric. Note that

Σ_{i,j∈I} c_i c_j k(X_i, X_j) = Σ_{i∈I} c_i² k(X_i, X_i) + 2 Σ_{i>j, i,j∈I} c_i c_j k(X_i, X_j),

and sup_{x∈Ω} T(μ_{X_i}(x), μ_{X_i}(x)) = 1, ∀i ∈ I. Then

Σ_{i,j∈I} c_i c_j k(X_i, X_j) = Σ_{i∈I} c_i² + 2 Σ_{i>j, i,j∈I} c_i c_j k(X_i, X_j).

Using that (Σ_{i∈I} c_i)² = Σ_{i∈I} c_i² + 2 Σ_{i>j, i,j∈I} c_i c_j ≥ 0, and the fact that k(X_i, X_j) ∈ [0, 1], we have:
a) if k(X_i, X_j) = 0, ∀i, j ∈ I : i > j, then Σ_{i,j∈I} c_i c_j k(X_i, X_j) = Σ_{i∈I} c_i² ≥ 0;
b) if k(X_i, X_j) = 1, ∀i, j ∈ I : i > j, then Σ_{i,j∈I} c_i c_j k(X_i, X_j) = Σ_{i∈I} c_i² + 2 Σ_{i>j, i,j∈I} c_i c_j = (Σ_{i∈I} c_i)² ≥ 0. ∎

Some examples of this kernel are given in [19].

A. Relation with Fuzzy Equivalence Relations

We now review two results, from [35] (Corollary 6) and [36] (Theorem 9). The first one shows that every positive definite kernel mapping to the unit interval, with constant one on the diagonal, is a fuzzy equivalence relation with respect to a given T-norm. The second one shows that such kernels can be viewed as a fuzzy logic formula used to represent fuzzy rule bases. We then show that the nonsingleton TSK fuzzy kernel satisfies both results.

Definition IV.2 (Fuzzy Equivalence Relation). A function E : X × X → [0, 1] is called a fuzzy equivalence relation with respect to the T-norm T if:
1) ∀x ∈ X, E(x, x) = 1;
2) ∀x, y ∈ X, E(x, y) = E(y, x);
3) ∀x, y, z ∈ X, T(E(x, y), E(y, z)) ≤ E(x, z).

The value E(x, y) can be interpreted as "x is equal to y". Condition 3) is called T-transitivity and can be regarded as the statement "if x and y are similar, and y and z are similar, then x and z are also similar" [35].

Lemma 6 (Kernels are at least Tcos-transitive [35]). Let X be a nonempty set and let k : X × X → [0, 1] be a positive definite kernel such that ∀x ∈ X : k(x, x) = 1. Then, ∀x, y, z ∈ X, the kernel k satisfies Tcos-transitivity:

Tcos(k(x, y), k(y, z)) ≤ k(x, z),    (20)

where

Tcos(a, b) = max(ab − √(1 − a²) √(1 − b²), 0)    (21)

is an Archimedean T-norm, and it is the greatest T-norm with this property [35].

Lemma 7 (Kernels as fuzzy logic formulas for fuzzy rules [36]). Let X be a nonempty set and let k : X × X → [0, 1] be a positive definite kernel such that ∀x ∈ X : k(x, x) = 1. Then there is a family of membership functions (μ_i)_{i∈I}, μ_i : X → [0, 1], where I is a nonempty index set, such that

∀x, y ∈ X : k(x, y) = inf_{i∈I} T_M^↔(μ_i(x), μ_i(y)),    (22)

where T_M^↔(x, y) = min(T^→(x, y), T^→(y, x)) is the induced bi-implication operator and T^→(x, y) = sup{t ∈ [0, 1] | T(x, t) ≤ y} is the implication function generated from a T-norm T [36].

Lemma 8. The nonsingleton TSK fuzzy kernel is Tcos-transitive (Lemma 6) and admits the representation given by Lemma 7.

Proof: By construction, the nonsingleton TSK fuzzy kernel is a positive definite kernel such that ∀X ∈ F(S ⊂ Ω) : k(X, X) = 1, and k takes values in the interval [0, 1]. By Lemma 6, k is Tcos-transitive and, by Lemma 7, k admits the representation given by Lemma 7. ∎
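A numerical sketch of the nonsingleton TSK fuzzy kernel (19) over finite supports (our own encoding; the sup becomes a max):

```python
# Nonsingleton TSK fuzzy kernel (19) over finite supports:
# k(X, Y) = sup_x T(mu_X(x), mu_Y(x)); the sup becomes a max here.
def tsk_kernel(mu_x, mu_y, tnorm=min):
    common = set(mu_x) & set(mu_y)   # T(m, 0) = 0, so only shared support matters
    if not common:
        return 0.0
    return max(tnorm(mu_x[e], mu_y[e]) for e in common)

X = {1: 0.4, 2: 1.0, 3: 0.6}
Y = {2: 0.7, 3: 1.0}
print(tsk_kernel(X, Y))   # 0.7
print(tsk_kernel(X, X))   # 1.0 for a normal fuzzy set
```

Note that k(X, X) = 1 whenever X is normal (its membership reaches 1), which is the condition used in Lemmas 5 and 8.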
V. MORE KERNELS ON FUZZY SETS

It is easy to construct new kernels on fuzzy sets from the previously defined kernels. For example, if k_1(·,·) and k_2(·,·) are positive definite kernels on fuzzy sets then, by the closure properties of kernels [14], the following are also positive definite kernels on fuzzy sets:
1) k_1(X, Y) + k_2(X, Y);
2) α k_1(X, Y), α ∈ R⁺;
3) k_1(X, Y) k_2(X, Y);
4) f(X) f(Y), f : F(S ⊂ Ω) → R;
5) k_1(f(X), f(Y)), f : F(S ⊂ Ω) → F(S ⊂ Ω);
6) exp(k_1(X, Y));
7) p(k_1(X, Y)), where p is a polynomial with positive coefficients.

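Closure properties 1), 3) and 6) are easy to exercise numerically; a sketch (our own names), combining two intersection kernels over finite supports with ρ({x}) = 1:

```python
import math

# Closure properties of kernels on fuzzy sets: if k1, k2 are positive
# definite, so are k1 + k2, k1 * k2 and exp(k1).  The base kernels are
# intersection kernels over finite supports, rho({x}) = 1 (our choice).
def k1(mu_x, mu_y):   # min T-norm intersection kernel
    return sum(min(mu_x.get(e, 0.0), mu_y.get(e, 0.0)) for e in set(mu_x) | set(mu_y))

def k2(mu_x, mu_y):   # product T-norm intersection kernel
    return sum(mu_x.get(e, 0.0) * mu_y.get(e, 0.0) for e in set(mu_x) | set(mu_y))

def k_sum(X, Y):  return k1(X, Y) + k2(X, Y)       # property 1)
def k_prod(X, Y): return k1(X, Y) * k2(X, Y)       # property 3)
def k_exp(X, Y):  return math.exp(k1(X, Y))        # property 6)

X = {1: 0.5, 2: 1.0}
Y = {2: 0.8, 3: 0.4}
print(k_sum(X, Y), round(k_prod(X, Y), 2))   # 1.6 0.64
```

The same pattern gives the fuzzy polynomial and fuzzy Gaussian kernels below, since they are built from sums, products and exponentials of a base kernel.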
More kernels on fuzzy sets can be obtained using the nonlinear mapping

φ : F(S ⊂ Ω) → H
X ↦ φ(X),

the fact that k(X, Y) = ⟨φ(X), φ(Y)⟩_H, and the squared distance

D(X, Y) := ‖φ(X) − φ(Y)‖²_H = k(X, X) − 2 k(X, Y) + k(Y, Y).

We have the following positive definite kernels on fuzzy sets:

• Fuzzy Polynomial kernel, α ≥ 0, β ∈ N:
k_pol(X, Y) = (⟨φ(X), φ(Y)⟩_H + α)^β = (k(X, Y) + α)^β.

• Fuzzy Gaussian kernel, γ > 0:
k_gauss(X, Y) = exp(−γ ‖φ(X) − φ(Y)‖²_H) = exp(−γ D(X, Y)).

• Fuzzy Rational Quadratic kernel, α, β > 0:
k_ratio(X, Y) = (1 + ‖φ(X) − φ(Y)‖²_H / (α β²))^(−α) = (1 + D(X, Y) / (α β²))^(−α).

A. Conditionally Positive Definite Kernels on Fuzzy Sets

We can construct another class of fuzzy kernels using the concept of conditionally positive definite kernels, which are kernels satisfying Lemma 1 with the additional requirement that Σ_{i=1}^N c_i = 0. Examples of conditionally positive definite fuzzy kernels are:

• Fuzzy Multiquadric kernel:
k_multi(X, Y) = −√(‖φ(X) − φ(Y)‖²_H + α²) = −√(D(X, Y) + α²).

• Fuzzy Inverse Multiquadric kernel:
k_invmult(X, Y) = (√(‖φ(X) − φ(Y)‖²_H + α²))^(−1) = (√(D(X, Y) + α²))^(−1).

It is easy to construct positive definite kernels from conditionally positive definite kernels by taking exp(t k_multi(X, Y)) and exp(t k_invmult(X, Y)) for t > 0, because a kernel k is conditionally positive definite if and only if exp(t k) is positive definite for all t > 0 [13]. See Proposition 2.22 of [13] for more details on how to construct positive definite kernels from conditionally positive definite kernels.

VI. CONCLUSIONS AND FURTHER RESEARCH

It is possible to work with kernel methods for datasets whose observations are fuzzy sets. The set of all fuzzy sets is an example of non-vectorial data; we can embed this data into a vectorial space, the RKHS, using a positive definite kernel defined on the set of all fuzzy sets. Using some basic concepts of measure theory, such as semi-rings of sets, we presented a novel formulation for kernels on fuzzy sets. As instances of our general definition, we gave a definition of the intersection kernel on fuzzy sets and the cross product kernel on fuzzy sets, together with some examples of positive definite functions for those kernels. Moreover, we showed that the kernel on fuzzy sets presented in [19] is a fuzzy equivalence relation, admits a special representation as a fuzzy logic formula for fuzzy rules, and is a special case of the intersection kernel on fuzzy sets. Furthermore, we gave some examples of positive definite kernels on fuzzy sets and conditionally positive definite kernels on fuzzy sets.

As a next step of our research, we are going to experiment with the proposed kernels on several machine learning problems, specifically supervised classification problems. Besides the applications mentioned in the introductory part of this paper, we are particularly interested in applying those kernels to datasets whose observations are clusters, prototypes or groups of samples; for example, some scholars have modeled this kind of observation with probability measures, with important applications such as anomaly identification in galaxies and physical particles [46], [47].

Other important applications to be explored are in the areas of big data and large scale machine learning. Because of hardware and software requirements, large datasets are difficult to analyze, and one possible solution is to construct a summary of the data or to transform the large dataset into a smaller one (data squashing [48]), where all the information of the features is preserved but the size of the dataset is decreased, with the hope that the analysis of the smaller dataset gives approximately the same information as the large dataset. Modeling groups of samples as fuzzy sets, jointly with the kernels on fuzzy sets, in supervised classification problems is an interesting topic to be investigated.

Another important topic to be investigated is the performance of other formulations for kernels on fuzzy sets, such as the multiple kernel learning approach [49] and non-positive kernels [50] on fuzzy sets, and the study of their performance on real datasets.

ACKNOWLEDGMENT

The authors would like to thank CNPq and CAPES (Brazilian agencies) for their financial support.

VI.CONCLUSIONSAND FURTHER RESEARCH As a next step of our research, we are going experiment with the proposed kernels on several machine learning problems, specifically, in supervised classification problems. Besides the applications mentioned in the introductory part of this paper, we are particularly interested in apply those kernels in datasets whose observations are clusters, prototypes or groups of samples, as for example, some scholars had modeling this kind of observations with probability measures with important applications such as anomaly identification in galaxies and physical particles [46], [47]. Another important applications to be explored are in the areas of big data and large scale machine learning. Because hardware and software requirements, large datasets are difficult to analyze and one possible solution is to construct a summary of data or transform the large dataset into a smaller one (data x— [28] J.-T. Jeng and T.-T. Lee, “Support vector machines for the fuzzy neural networks,” in Systems, Man, and Cybernetics, 1999. IEEE SMC ’99 Conference Proceedings. 1999 IEEE International Conference on, vol. 6, EFERENCES R 1999, pp. 115 –120 vol.6. [29] C.-T. Lin, C.-M. Yeh, S.-F. Liang, J.-F. Chung, and N. Kumar, “Support- [1] C. Bertoluzza, M.-A.´ Gil, and D. A. Ralescu, Statistical modeling, vector-based fuzzy neural network for pattern classification,” Fuzzy analysis and management of fuzzy data. Physica-Verlag, 2002. Systems, IEEE Transactions on, vol. 14, no. 1, pp. 31 – 41, feb. 2006. [2] R. Viertl, Statistical methods for fuzzy data. Wiley. com, 2011. [30] C.-F. Juang, S.-H. Chiu, and S.-W. Chang, “A self-organizing ts-type [3] A. Palacios and J. Alcala-Fdez, “Mining fuzzy association rules from fuzzy network with support vector learning and its application to low quality data,” Soft Computing - A Fusion of Foundations, Method- classification problems,” IEEE Transactions on Fuzzy Systems, vol. 15, ologies and Applications, pp. 0–0, 2012. no. 
5, pp. 998 –1008, oct. 2007. [4] A. Palacios, L. Sanchez,´ and I. Couso, “Diagnosis of dyslexia with [31] C. T. Wee and T. W. Wan, “Efsvmrobus-fcm: Evolutionary fuzzy low quality data with genetic fuzzy systems,” International Journal of rule-based support vector machines classifier with fcm clustering,” in Approximate Reasoning, vol. 51, no. 8, pp. 993–1009, 2010. Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on [5] A. M. Palacios, L. Sanchez,´ and I. Couso, “Future performance mod- Computational Intelligence). IEEE International Conference on, june eling in athletism with low quality data-based genetic fuzzy systems,” 2008, pp. 606 –612. Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 207– [32] S.-M. Zhou and J. Gan, “Constructing l2-svm-based fuzzy classifiers in 228, 2011. high-dimensional space with automatic model selection and fuzzy rule [6] L. Sanchez,´ I. Couso, and J. Casillas, “Genetic learning of fuzzy rules ranking,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 3, pp. 398 based on low quality data,” Fuzzy Sets and Systems, vol. 160, no. 17, –409, june 2007. pp. 2524–2552, 2009. [33] W.-Y. Cheng and C.-F. Juang, “An incremental support vector machine- [7] L. Zadeh, “Soft computing and fuzzy logic,” Software, IEEE, vol. 11, trained ts-type fuzzy system for online classification problems,” Fuzzy no. 6, pp. 48 –56, nov. 1994. Sets Syst., vol. 163, pp. 24–44, January 2011. [8] P. P. Wang, Computing with words. John Wiley & Sons, Inc., 2001. [34] J. M. Lski, “On support vector regression machines with linguistic [9] L. A. Zadeh, “Fuzzy logic= computing with words,” Fuzzy Systems, interpretation of the kernel matrix,” Fuzzy Sets Syst., vol. 157, no. 8, IEEE Transactions on, vol. 4, no. 2, pp. 103–111, 1996. pp. 1092–1113, Apr. 2006. [10] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (2Nd [35] B. Moser, “On the t-transitivity of kernels,” Fuzzy Sets and Systems, vol. Edition). Wiley-Interscience, 2000. 157, no. 
13, pp. 1787 – 1796, 2006. [11] C. M. Bishop and N. M. Nasrabadi, Pattern recognition and machine [36] ——, “On representing and generating kernels by fuzzy equivalence learning. springer New York, 2006, vol. 1. relations,” J. Mach. Learn. Res., vol. 7, pp. 2603–2620, Dec. 2006. [12] G. Wahba, Spline models for observational data. Siam, 1990, no. 59. [37] F. Liu and X. Xue, “Design of natural classification kernels using prior [13] B. Scholkopf¨ and A. J. Smola, Learning with kernels : support vector knowledge,” Trans. Fuz Sys., vol. 20, no. 1, pp. 135–152, Feb. 2012. machines, regularization, optimization, and beyond, ser. Adaptive com- [38] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., putation and machine learning. MIT Press, 2002. vol. 20, no. 3, pp. 273–297, Sep. 1995. [14] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. [39] L. Debnath and P. Mikusinski,´ Hilbert Spaces with Applications. El- New York, NY, USA: Cambridge University Press, 2004. sevier Academic Press, 2005. [15] A. Berlinet and C. Thomas-Agnan, Reproducing kernel Hilbert spaces [40] N. Aronszajn, “Theory of reproducing kernels,” Transactions of the in probability and statistics. Kluwer Academic Boston, 2004, vol. 3. American Mathematical Society, vol. 68, 1950. [16] N. Cristianini and J. Shawe-Taylor, An introduction to support Vector [41] T. Jebara, R. Kondor, and A. Howard, “Probability product kernels,” The Machines: and other kernel-based learning methods. New York, NY, Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004. USA: Cambridge University Press, 2000. [42] M. Cuturi, K. Fukumizu, and J.-P. Vert, “ kernels on mea- [17] D. M. Tax and R. P. Duin, “Support vector data description,” Machine sures,” in Journal of Machine Learning Research, 2005, pp. 1169–1198. learning, vol. 54, no. 1, pp. 45–66, 2004. [43] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, [18] C. E. Rasmussen and C. 
Williams, Gaussian Processes for Machine “Text classification using string kernels,” The Journal of Machine Learning. MIT Press, 2006. Learning Research, vol. 2, pp. 419–444, 2002. [19] J. Guevara, R. Hirata, and S. Canu, “Kernel functions in takagi-sugeno- [44] T. Gartner and W. Scientific., Kernels for structured data. Hackensack, kang fuzzy system with nonsingleton fuzzy input,” in Fuzzy Systems N.J.: World Scientific, 2008. (FUZZ), 2013 IEEE International Conference on, 2013, pp. 1–8. [45] W. Pedrycz and F. A. C. Gomide, Fuzzy Systems Engineering - Toward [20] Y. Kanzawa, Y. Endo, and S. Miyamoto, “On kernel fuzzy c-means for Human-Centric Computing. Wiley, 2007. data with tolerance using explicit mapping for kernel data analysis,” in [46] K. Muandet, K. Fukumizu, F. Dinuzzo, and B. Schlkopf, “Learning from Fuzzy Systems (FUZZ), 2010 IEEE International Conference on. IEEE, distributions via support measure machines,” in Advances in Neural 2010, pp. 1–6. Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, [21] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen, “Multiple kernel fuzzy L. Bottou, and K. Weinberger, Eds., 2012, pp. 10–18. clustering,” Fuzzy Systems, IEEE Transactions on, vol. 20, no. 1, pp. [47] K. Muandet and B. Schoelkopf, “One-class support measure machines 120 –134, feb. 2012. for group anomaly detection,” in Proceedings of the Twenty-Ninth [22] M. Raza and F.-H. Rhee, “Interval type-2 approach to kernel possibilistic Conference Annual Conference on Uncertainty in Artificial Intelligence c-means clustering,” in Fuzzy Systems (FUZZ-IEEE), 2012 IEEE Inter- (UAI-13). Corvallis, Oregon: AUAI Press, 2013, pp. 449–458. national Conference on, june 2012, pp. 1 –7. [48] J. Abello, P. M. Pardalos, and M. G. Resende, Handbook of massive [23] X. Yang, G. Zhang, J. Lu, and J. Ma, “A kernel fuzzy c-means clustering- data sets. Springer, 2002, vol. 4. based fuzzy support vector machine algorithm for classification problems [49] A. Rakotomamonjy, F. 
Bach, S. Canu, and Y. Grandvalet, “Simplemkl,” with outliers or noises,” Fuzzy Systems, IEEE Transactions on, vol. 19, Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008. no. 1, pp. 105 –115, feb. 2011. [50] C. Ong, X. Mary, S. Canu, and A. Smola, “Learning with non-positive [24] J. Wang, J. Hua, and J. Guo, “Fuzzy maximum scatter discriminant anal- kernels,” in Proceedings of the twenty-first international conference on ysis with kernel methods,” in Fuzzy Systems and Knowledge Discovery Machine learning. ACM, 2004, p. 81. (FSKD), 2010 Seventh International Conference on, vol. 2, aug. 2010, pp. 560 –564. [25] G. Heo and P. Gader, “Robust kernel discriminant analysis using fuzzy memberships,” Pattern Recogn., vol. 44, no. 3, pp. 716–723, Mar. 2011. [26] Y. Chen and J. Wang, “Support vector learning for fuzzy rule-based classification systems,” Fuzzy Systems, IEEE Transactions on, vol. 11, no. 6, pp. 716 – 728, dec. 2003. [27] Y. Chen, “Support vector machines and fuzzy systems,” in Soft Com- puting for Knowledge Discovery and Data Mining, O. Maimon and L. Rokach, Eds. Springer US, 2008, pp. 205–223.