Learning of Decision Fusion Mappings for Pattern Recognition

Learning of Decision Fusion Mappings for Pattern Recognition Friedhelm Schwenker, Christian R. Dietrich, Christian Thiel, Gunther¨ Palm Department of Neural Information Processing Universitat¨ Ulm, 89069 Ulm, Germany fi[email protected] Abstract 2. Training of the Fusion Layer performing a mapping of the classifier outputs (soft or crisp decisions) into the Different learning algorithms for the decision fusion set of desired class labels. mapping of a multiple classifier system are compared in this This two-phase MCS training is very similar to RBF learn- paper. It is very well known that the confusion matrices ing procedures, where the network is learned over two or of the individual classifiers are utilised in the naive Bayes even three learning phases: combination of classifier outputs. After a brief review of the decision templates, the linear associative memory and the 1. Training of the RBF kernel parameters (centres and pseudoinverse matrix approaches it is demonstrated that all widths) of the first layer through clustering or vector four adaptive decision fusion mappings share the confusion quantization. matrices as the essential ingredient. 2. Supervised learning of the output layer (second layer) of the network. 1 Introduction 3. Training both layers of the RBF network through a backpropagation-like optimisation procedure. During the last years, a community has emerged that is Typically, for these three-phase learning scheme a labeled interested in the fusion of classifiers in the context of so- training set is used to calculate the radial basis function called multiple classifier systems (MCS). One goal is to centres, the radial basis function widths and the weights of design classifier systems with a high classification accu- the output layer, but it should be noticed that other learning racy ([7]...[13],[11]).The reason behind this new approach schemes may be applied. For instance, if a large set of unla- is that in traditional pattern recognition, the best individ- belled data points is available this data can be used to train ual classifier has to be found, which is difficult to identity. the parameters of the first RBF layer (centres and widths) Also, a single classifier can not make use of information through unsupervised clustering [17]. from other discrimination functions. For the training of the two layer fusion architecture two Typical fusion strategies for the combination of classi- labeled data sets may be used. It is assumed that I classifier fiers in MCS are based on fixed fusion mappings, for in- decisions, here calculated on I different features, have to be stance averaging or multiplying the classifier outputs [1, 6]. combined. First, the classifiers Ci, i = 1, ..., I, are trained, In this paper we consider supervised learning methods to by utilising a training set, and then another labeled training train the fusion mapping of the MCS. Whereas the MCS ar- set R, which is not necessarily completely disjunct to the chitecture (see Figure 1) is very similar to layered artificial first, is sent to the previously trained first level classifiers to neural networks, like radial basis function networks (RBFs) calculate the individual classifier outputs. These classifier or multilayer perceptrons (MLPs), the training is rather dif- i outputs C (xi), i = 1, ..., I together with the desired class ferent. For most artificial neural network architectures, e.g. labels are then used to train the decision fusion mapping F. MLPs, the parameters are typically trained simultaneously To train this fusion mapping different algorithms have been through a backpropagation-like training algorithm by a one- proposed. phase learning procedure. On the other hand a MCS is typ- In this paper decision templates [9, 10], naive Bayes ically trained by two completely different training phases: decision fusion [19], linear matrix memory [4], and Pseu- 1. Building the Classifier Layer consisting of a set of clas- doinverse matrix [4] methods are studied and links between sifiers where each classifier is trained separately, e.g. these fusion mappings are shown. The mappings are intro- each classifier is trained on a specific feature subset. duced in Section 2, while Section 3 presents experimental results on their performance. The closing Section explores stored classifier outputs Ci are adapted through a Hebbian the close links between the fusion mappings. learning rule [15] and V i is given as the product of the classifier outputs Ci and the desired classifier outputs Y : 2 Adaptive fusion mappings i T V := YCi . (4) | {z } i In this Section we briefly describe methods to train adap- W tive fusion mappings. It is assumed that a set of first level In the case of crisp classifiers the matrix V i is equal to the classifiers C1, ..., CI has been trained through a supervised i i confusion matrix of classifier C , where Vω,ω∗ is equal to learning procedure using a labeled training set. This corre- the number of samples of class ω in the training set which sponds to phase 1 of the two-phase MCS learning scheme. were assigned by Ci to class ω∗ [19]. For soft classifiers the To fix the notation, let Ω = {1, ..., L} be a set of class ω-th row of V i contains the accummulated soft classifier labels and Ci(xµ) ∈ ∆ be the probabilistic classifier output i µ ω 1 i decisions of C for the feature vectors xi ∈ R . i µ of the i-th classifier C given the input feature vector xi ∈ In the classification phase these matrices are then used to R, with ∆ defined as combine the individual classifier decisions to calculate the overall classifier decision (see Eq 3). For a feature vector L X X = (x , ..., x ) Ci(x ) ∆ := {(y , ..., y ) ∈ [0, 1]L| y = 1}. (1) 1 I , the classifier outputs i are applied 1 L l to the matrices V i, i = 1, ..., I and the outputs zi ∈ IRL are l=1 given by i i i T Then for each classifier the outputs for the training set R z := V (C (xi)) . (5) are given by a (L × M)-matrix C , i = 1, ..., I where M = i The combined class membership estimate based on I fea- |R| and the µ-th column of Ci contains the classifier output i µ T ture vectors is then calculated as the average of the individ- C (xi ) . Here the superscript T denotes the transposition. µ µ ual outputs of the second level classifiers The desired classifier outputs ω ∈ Ω for inputs xi ∈ R are given by the (L × M)-matrix Y defined by the 1 of L I I X i X i i T encoding scheme for class labels z := z = V C (xi) (6) i=1 i=1 ( 1, l = ωµ Yl,µ = . (2) and the final class label based on the combined class mem- 0, otherwise bership estimate z is then determined by the maximum membership rule That is, corresponding to Ci the µ-th column Y·,µ ∈ ∆ contains the binary coded target output of feature vector ω := argmax(zl). (7) µ l∈Ω xi ∈ R. In the following we discuss different approaches to com- i 2.2 Decision Templates bine the classifier outputs C (xi), i = 1, ..., I into an overall classifier decision The concept of decision templates is a simple, intuitive, 1 I z := F(C (x1), ..., C (xI )). (3) and robust aggregation idea that evolved from the fuzzy template which was introduced by KUNCHEVA, see [10, 11]. The four different learning schemes, namely linear associa- Decision templates are calculated as the mean of the classi- µ tive memory, decision template, pseudoinverse matrix and fier outputs for inputs xi of class ω ∈ Ω naive Bayes are introduced to calculate the decision fusion 1 X mapping F : ∆I → ∆ (see Eq. 3). In all these methods T ω := C(xµ). (8) i |Rω| i the fusion mapping F is realised through (L × L)-matrices µ ω xi ∈R V 1, ..., V I , calculated through a certain training algorithm. The overall classifier system is depicted in Figure 1. In the case of I input features for each class the decision template T ω is given by the (I × L)-matrix 2.1 Linear Associative Memory ω T1 T ω := . ∈ ∆I . (9) A linear decision fusion mapping F may be re- . ω alised through an associative matrix memory whose error- TI correcting properties have been shown in several numerical 1Independent from the classifier type (soft or crisp) we consider W i a experiments and theoretical investigations [8]. In order to confusion matrix. It is defined to compare the individual fusion schemes calculate the memory matrix V i for each classifier Ci the (see Eq. 4,11,12 and 14). Classifier Layer Fusion Layer 1 C 1 (x1) 1 Feature 1 V z : : : i : Decision C (x ) : i Fusion i i z Feature i V : I : Final : : : I : C (xI) z Classification Feature I V I z Classification Transformation Figure 1. Two layer MCS architecture consisting of a classifier layer and an additional fusion layer. In i general the combination of the classifier outputs C (xi), i = 1, ..., I is accomplished through a fusion 1 I mapping F(C (x1), ..., C (xI )). In this paper we restrict ourselves to separable linear fusion mappings, i i where the classifier outputs C (xi) are multiplied with matrices V , i = 1, ..., I. The resulting decisions z1, ..., zI are combined with decision fusion. To produce independent answers, the classifiers in this example work on different features. In order to align the decision template combining scheme a set of input vectors X = (x1, ..., xI ) and a set of classifier i in such a way that the combination is done as proposed in outputs C (xi), i = 1, ..., I the combined class membership Figure 1, for each feature space a linear matrix operator V i estimate is given by Eq.

Load more