Learning of Decision Fusion Mappings for Pattern Recognition Abstract 1 Introduction

Learning of Decision Fusion Mappings for Pattern Recognition Friedhelm Schwenker, Christian R. Dietrich, Christian Thiel, Günther Palm Department of Neural Information Processing, University of Ulm 89069 Ulm, Germany [email protected], http://www.informatik.uni-ulm.de/neuro/ Abstract classiers where each classier is trained sepa- rately, e.g. each classier is trained on a specic Dierent learning algorithms for the decision fusion feature subset. mapping of a multiple classier system are compared in this paper. It is very well known that the confusion 2. Training of the Fusion Layer performing a map- matrices of the individual classiers are utilised in the ping of the classier outputs (soft or crisp deci- naive Bayes combination of classier outputs. After sions) into the set of desired class labels. a brief review of the decision templates, the linear This two-phase MCS training is very similar to RBF associative memory and the pseudoinverse matrix learning procedures, where the network is learned over approaches it is demonstrated that all four adaptive two or even three learning phases: decision fusion mappings share the confusion matrices as the essential ingredient. 1. Training of the RBF kernel parameters (centres and widths) of the rst layer through clustering Keywords: Decision fusion mapping, Linear as- or vector quantisation. sociative memory, Decision templates, Pseudoinverse 2. Supervised learning of the output layer (second matrix, Naive Bayes, Confusion matrix. layer) of the network. 3. Training both layers of the RBF network through 1 Introduction a backpropagation-like optimisation procedure. The reason behind this new approach is that in tra- Typically, for these three-phase learning scheme a la- ditional pattern recognition, the best individual clas- beled training set is used to calculate the radial ba- sier has to be found, which is dicult to identify. sis function centres, the radial basis function widths Also, a single classier can not make use of informa- and the weights of the output layer, but it should be tion from other discrimination functions. noticed that other learning schemes may be applied. Typical fusion strategies for the combination of For instance, if a large set of unlabeled data points is classiers in MCS are based on xed fusion map- available this data can be used to train the parameters pings, for instance averaging or multiplying the clas- of the rst RBF layer (centres and widths) through sier outputs [1, 5]. In this paper we consider super- unsupervised clustering [13]. vised learning methods to train the fusion mapping of For the training of the two layer fusion architec- the MCS. Whereas the MCS architecture (see Fig- ture two labeled data sets may be used. It is assumed ure 1) is very similar to layered articial neural net- that I classier decisions, here calculated on I dier- works, like radial basis function networks (RBFs) or ent features, have to be combined. First, the classiers multilayer perceptrons (MLPs), the training is rather Ci, i = 1, ..., I, are trained, by utilising a training set, dierent. For most articial neural network architec- and then another labeled training set R, which is not tures, e.g. MLPs, the parameters are typically trained necessarily completely disjunct to the rst, is sent to simultaneously through a backpropagation-like train- the previously trained rst level classiers to calcu- ing algorithm by a one-phase learning procedure. On late the individual classier outputs. These classier i the other hand a MCS is typically trained by two outputs C (xi), i = 1, ..., I together with the desired completely dierent training phases: class labels are then used to train the decision fusion mapping F. To train this fusion mapping dierent 1. Building the Classier Layer consisting of a set of algorithms have been proposed. In this paper decision templates [7, 8], naive Bayes correcting properties have been shown in several nu- decision fusion [15], linear matrix memory [3], and merical experiments and theoretical investigations [6]. Pseudoinverse matrix [3] methods are studied and In order to calculate the memory matrix V i for i links between these fusion mappings are shown. The each classier C the stored classier outputs Ci are mappings are introduced in Section 2, while Section adapted through a Hebbian learning rule [6] and V i is 3 presents experimental results on their performance. given as the product of the classier outputs Ci and The closing Section explores the close links between the desired classier outputs Y : the fusion mappings. i T (4) V := YCi . | {z } 2 Adaptive fusion mappings W i In the case of crisp classiers the matrix V i is equal to the confusion matrix of classier i, where i is In this Section we briey describe methods to train C Vω,ω∗ adaptive fusion mappings. It is assumed that a set of equal to the number of samples of class ω in the training set which were assigned by i to class ∗ [15]. For rst level classiers C1, ..., CI has been trained through C ω a supervised learning procedure using a labeled train- soft classiers the ω-th row of V i contains the accum- ing set. This corresponds to phase 1 of the two-phase mulated soft classier decisions of Ci for the feature vectors µ ω.1 MCS learning scheme. xi ∈ R In the classication phase these matrices are then To x the notation, let Ω = {1, ..., L} be a set µ used to combine the individual classier decisions to of class labels and Ci(x ) ∈ ∆ be the probabilistic i calculate the overall classier decision (see Eq 3). For classier output of the i-th classier Ci given the input µ a feature vector , the classier outputs feature vector x ∈ R, with ∆ dened as X = (x1, ..., xI ) i i i C (xi) are applied to the matrices V , i = 1, ..., I and L the outputs i L are given by X z ∈ IR ∆ := {(y , ..., y ) ∈ [0, 1]L| y = 1}. (1) 1 L l i i i T (5) l=1 z := V (C (xi)) . Then for each classier the outputs for the training The combined class membership estimate based on I set R are given by a (L × M)-matrix C , i = 1, ..., I feature vectors is then calculated as the average of the i individual outputs of the second level classiers where M = |R| and the µ-th column of Ci contains the classier output i µ T. Here the superscript C (xi ) I I denotes the transposition. The desired classier X i X i i T T z := z = V C (xi) (6) outputs µ for inputs µ are given by the ω ∈ Ω xi ∈ R i=1 i=1 (L × M)-matrix Y dened by the 1 of L encoding and the nal class label based on the combined class scheme for class labels membership estimate z is then determined by the ( maximum membership rule 1, l = ωµ Yl,µ = . (2) 0, otherwise ω := argmax(zl). (7) l∈Ω That is, corresponding to Ci the µ-th column Y·,µ ∈ ∆ contains the binary coded target output of feature 2.2 Decision Templates vector µ . xi ∈ R The concept of decision templates is a simple, in- In the following we discuss dierent approaches to tuitive, and robust aggregation idea that evolved combine the classier outputs i , into C (xi) i = 1, ..., I from the fuzzy template which was introduced by an overall classier decision Kuncheva, see [8, 9]. Decision templates are calculated as the mean of the classier outputs for inputs z := F(C1(x ), ..., CI (x )). (3) 1 I µ of class xi ω ∈ Ω The four dierent learning schemes, namely linear 1 X associative memory, decision template, pseudoinverse T ω := C(xµ). (8) i |Rω| i matrix and naive Bayes are introduced to calculate µ ω xi ∈R the decision fusion mapping F : ∆I → ∆ (see Eq. 3). In the case of I input features for each class the deci- In all these methods the fusion mapping F is re- sion template T ω is given by the (I × L)-matrix alised through (L × L)-matrices V 1, ..., V I , calculated ω through a certain training algorithm. The overall clas- T1 . sier system is depicted in Figure 1. ω . I (9) T := . ∈ ∆ . T ω 2.1 Linear Associative Memory I 1Independent from the classier type (soft or crisp) we con- A linear decision fusion mapping may be realised F sider W i a confusion matrix. It is dened to compare the indi- through an associative matrix memory whose error- vidual fusion schemes (see Eq. 4,11,12 and 14). Classifier Layer Fusion Layer 1 C 1 (x1) 1 Feature 1 V z : : : i : Decision C (x ) : i Fusion i i z Feature i V : I : Final : : : I : C (xI) z Classification Feature I V I z Classification Transformation Figure 1: Two layer MCS architecture consisting of a classier layer and an additional fusion layer. In gen- i eral the combination of the classier outputs C (xi), i = 1, ..., I is accomplished through a fusion mapping 1 I F(C (x1), ..., C (xI )). In this paper we restrict ourselves to separable linear fusion mappings, where the classi- i i 1 I er outputs C (xi) are multiplied with matrices V , i = 1, ..., I. The resulting decisions z , ..., z are combined with decision fusion. To produce independent answers, the classiers in this example work on dierent features. In order to align the decision template combining 2.3 Pseudoinverse Solution scheme in such a way that the combination is done as Another linear decision fusion mapping can be cal- proposed in Figure 1, for each feature space a linear F culated as an optimal least squares solution between matrix operator i is calculated.

Load more