3rd International Conference on Automation, Control, Engineering and Computer Science (ACECS-2016)
20 - 22 March 2016 - Hammamet, Tunisia

Dimension Reduction Based on Scattering Matrices and Classification Using Fisher's Linear Discriminant

Nasar Aldian Ambark Shashoa #1, Salah Mohamed Naas #2, Abdurrezag S. Elmezughi #2
#1 Electrical and Electronics Engineering Department, Azzaytuna University
#2 Computer Engineering Department, Azzaytuna University
Tarhuna, Libya
[email protected]  [email protected]  [email protected]

Abstract— This paper presents feature extraction based on scattering matrices for classification using Fisher's linear discriminant approach. The recursive least squares identification algorithm is used to estimate the parameters of an autoregressive (ARX) model. Next, dimension reduction based on scattering matrices is used to obtain good classification performance. The classification between two classes is done by Fisher's linear discriminant. Our simulation results illustrate the usefulness of the proposed procedures.

Keywords— Scattering Matrices; Fault Diagnosis; Feature Selection; Fisher's Linear Discriminant.

I. INTRODUCTION

Feature selection is a process of selecting a small number of highly predictive features out of a large set of candidate attributes that might be irrelevant or redundant. It plays a fundamental role in pattern recognition, data mining, and more generally machine learning tasks [1], e.g., facilitating data interpretation, reducing measurement and storage requirements, increasing processing speeds, improving generalization performance, etc. System identification is an important approach to model dynamical systems and has been used in many areas such as chemical processes and signal processing [2]. Fault detection and isolation (FDI) using analytical redundancy methods is currently the subject of extensive research, and numerous surveys can be found. The analytical methods compare real process data to those obtained by mathematical models of the system. The most popular analytical redundancy techniques are parameter estimation, parity relations and observer-based approaches. In the parameter estimation method, the target system is considered as a continuous-variable dynamic system, which has an input $U$ and an output $Y$; the detection method generates parameter estimates $\hat{\theta}$, which are called features [3]. Although all these techniques are well designed for fault detection, one of the most relevant techniques for diagnosis is supervised classification. It is not unusual to see fault diagnosis as a classification task whose objective is to classify new observations into one of the existing classes. Many methods have been developed for supervised classification, such as the Fisher discriminant. Nevertheless, classification (diagnosis) is a hard task when a large number of parameters is involved. Indeed, it is not uncommon for a process to be described by a large number of parameters, where not all parameters are of equal informative value, such that it is possible to describe the behavior of the process well enough using a smaller set of parameters. Therefore, a selection of the informative variables for the classification task should be done in order to increase the accuracy of the classification. The paper is structured in the following manner: in Section 2, parameter estimation using the recursive least squares algorithm is presented; in Section 3, dimension reduction techniques using scattering matrices and separability criteria are derived; Section 4 provides a detailed description of the linear discriminant function; in Section 5, the simulation results are presented; Section 6 contains the conclusion.

II. MODEL IDENTIFICATION OF THE AUTOREGRESSIVE MODEL

The basic step in the identification procedure is the choice of a suitable type of model. The general linear model takes the following form, called the autoregressive (ARX) model:

$$A(z^{-1})\,y(k) = B(z^{-1})\,u(k) + n(k) \qquad (1)$$

where

$$A(z^{-1}) = 1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_{na} z^{-na}, \qquad B(z^{-1}) = b_1 z^{-1} + b_2 z^{-2} + \dots + b_{nb} z^{-nb} \qquad (2)$$

are polynomials in the backward shift operator, defined by $z^{-i} y(k) = y(k-i)$. Here $y(k)$, $u(k)$ and $n(k)$ are the sequences of system output, measurable input and stochastic input (noise), respectively, while the constants $a_i$ and $b_j$ represent the system parameters.

Fig. 1  Model structure of the ARX system: the input $u(i)$ passes through $B(z)/A(z)$ to produce the output $y(i)$.
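To make the ARX structure of (1)-(2) and Fig. 1 concrete, the following minimal Python sketch simulates output data from a second-order ARX model; the coefficient values, noise level and input signal are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_arx(a, b, u, noise_std=0.05, rng=None):
    """Simulate y(k) from A(z^-1) y(k) = B(z^-1) u(k) + n(k), eq. (1),
    with A = 1 + a1 z^-1 + ... and B = b1 z^-1 + ... as in eq. (2)."""
    rng = np.random.default_rng() if rng is None else rng
    na, nb = len(a), len(b)
    y = np.zeros(len(u))
    for k in range(len(u)):
        # y(k) = -sum_i a_i y(k-i) + sum_i b_i u(k-i) + n(k), zero initial conditions
        ar = sum(a[i] * y[k - 1 - i] for i in range(na) if k - 1 - i >= 0)
        xb = sum(b[i] * u[k - 1 - i] for i in range(nb) if k - 1 - i >= 0)
        y[k] = -ar + xb + noise_std * rng.standard_normal()
    return y

# Illustrative second-order system with assumed coefficients
rng = np.random.default_rng(0)
u = rng.standard_normal(500)                         # excitation input
y = simulate_arx([-1.5, 0.7], [1.0, 0.5], u, rng=rng)
```

The simulated pair (u, y) plays the role of the measurable input and output sequences used by the identification algorithm described next.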
The recursive parameter estimation algorithms are based on the analysis of the input and output signals from the process to be identified. This method can be used to estimate the parameters of the ARX model [4]. The algorithm can be written in the following form. Consider a linear, dynamic, time-invariant, discrete-time system, which can be represented by

$$y(k) = -\sum_{i=1}^{na} a_i\, y(k-i) + \sum_{i=1}^{nb} b_i\, u(k-i) + n(k) \qquad (3)$$

Equation (3) can be written in the linear regression form

$$y(k) = Z^{T}(k)\,\theta + n(k) \qquad (4)$$

where

$$\theta^{T} = [a_1 \ldots a_{na},\; b_1 \ldots b_{nb}] \qquad (5)$$

represents the unknown parameter vector, and

$$Z^{T}(k) = [-y(k-1) \ldots -y(k-na),\; u(k-1) \ldots u(k-nb)] \qquad (6)$$

represents the vector of measurable input and output samples (the information vector); the residual $n(k)$ is introduced as

$$n(k) = y(k) - Z^{T}(k)\,\theta \qquad (7)$$

In many practical cases, it is necessary that parameter estimation takes place concurrently with the system's operation. This parameter estimation problem is called on-line identification, and its methodology usually leads to a recursive procedure for every new measurement (or data entry). For this reason, it is also called recursive least-squares estimation (RLS) or recursive identification [5]. The proposed recursive algorithm is given by the following theorem. Suppose that $\hat{\theta}(k-1)$ is the estimate of the parameters of the $n$th order system for $k-1$ data entries. Then, the estimate of the parameter vector $\hat{\theta}(k)$ for $k$ data entries, with $k = 1,2,3,\ldots$, is given by the expression

$$\hat{\theta}(k) = \hat{\theta}(k-1) + \Gamma(k)\,\big[y(k) - Z^{T}(k)\,\hat{\theta}(k-1)\big] \qquad (8)$$

The correcting vector is given by

$$\Gamma(k) = P(k)\,Z(k) = \frac{P(k-1)\,Z(k)}{Z^{T}(k)\,P(k-1)\,Z(k) + 1} \qquad (9)$$

and the matrix $P(k)$ is calculated from the recursive formula

$$P(k) = \big[I - \Gamma(k)\,Z^{T}(k)\big]\,P(k-1) \qquad (10)$$

with initial conditions

$$P(0) = I \quad \text{and} \quad \hat{\theta}(0) = 0 \qquad (11)$$
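The update (8)-(11) can be turned into a short routine. The sketch below is one possible Python implementation of this RLS recursion for the ARX regressor of (6); the function name and the choice of data are assumptions for illustration.

```python
import numpy as np

def rls_arx(y, u, na, nb):
    """Recursive least-squares estimate of theta = [a1..a_na, b1..b_nb]
    for the regression y(k) = Z^T(k) theta + n(k), eqs. (4)-(11)."""
    n = na + nb
    theta = np.zeros(n)                      # theta_hat(0) = 0, eq. (11)
    P = np.eye(n)                            # P(0) = I, eq. (11)
    for k in range(max(na, nb), len(y)):
        # information vector Z(k), eq. (6)
        Z = np.concatenate((-y[k - 1 - np.arange(na)], u[k - 1 - np.arange(nb)]))
        gamma = P @ Z / (Z @ P @ Z + 1.0)             # correcting vector, eq. (9)
        theta = theta + gamma * (y[k] - Z @ theta)    # parameter update, eq. (8)
        P = (np.eye(n) - np.outer(gamma, Z)) @ P      # covariance update, eq. (10)
    return theta

# Illustrative use, e.g. with the simulated data from the earlier sketch:
# theta_hat = rls_arx(y, u, na=2, nb=2)
```

With a persistently exciting input, the returned estimate approaches the true parameter vector as the number of data entries grows; these estimated parameter vectors are the features passed to the dimension reduction and classification stages below.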
III. DIMENSION REDUCTION TECHNIQUES

In pattern classification, dimension reduction of the estimated parameters is often unavoidable. Namely, it is not uncommon for a process to be described by a large number of parameters, where not all parameters are of equal informative value, such that it is possible to describe the behavior of the process well enough using a smaller set of parameters. Numerous dimension reduction techniques have been developed, which largely seek out suitable transformation matrices $A_{nm}$, where $n$ is the initial vector dimension and $m$ is the desired dimension ($m < n$), that will allow for the appropriate projection

$$Y_{m\times 1} = A_{nm}^{T}\, X_{n\times 1} \qquad (12)$$

of the initial measurement vectors $X$ (in the present case this is the parameter vector of the identified model) onto reduced-dimension vectors $Y$, which need not have a physical meaning in the general case. Therefore, the major task now is summarized as follows: given a number of features, how can one select the most important of them so as to reduce their number and at the same time retain as much of their class discriminatory information as possible? This procedure is known as feature selection or reduction [5]. There are many types of dimension reduction methods; the scattering matrices and separability criteria approach was selected for this work.

SCATTERING MATRICES AND SEPARABILITY CRITERIA

Let $X$ be an $n$-dimensional random vector. Then, $X$ can be represented without error by the summation of $n$ linearly independent vectors as

$$X = \sum_{i=1}^{n} y_i\,\varphi_i = \Phi\, Y \qquad (13)$$

where

$$\Phi = [\varphi_1 \;\cdots\; \varphi_n] \qquad (14)$$

and

$$Y = [y_1 \;\cdots\; y_n]^{T} \qquad (15)$$

The matrix $\Phi$ is deterministic and is made up of $n$ linearly independent column vectors. Thus, $|\Phi| \neq 0$. We may assume that the columns of $\Phi$ form an orthonormal set, that is,

$$\varphi_i^{T}\varphi_j = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{for } i \neq j \end{cases} \qquad (16)$$

We may call $\varphi_i$ the $i$th feature or feature vector, and $y_i$ the $i$th component of the sample in the feature (or mapped) space. We should aim to select features leading to a large between-class distance and a small within-class variance in the feature vector space. This means that features should take distant values in the different classes and closely located values in the same class. In discriminant analysis in statistics, within-class, between-class, and mixture scatter matrices are used to formulate criteria of class separability. A within-class scatter matrix shows the scatter of samples around their respective class expected vectors, and is expressed by

$$S_w = \sum_{i=1}^{L} P_i\, E\{(X - M_i)(X - M_i)^{T} \mid \omega_i\} = \sum_{i=1}^{L} P_i\, \Sigma_i \qquad (17)$$

where $P_i$ is the a priori probability of class $\omega_i$, that is, $P_i \approx n_i/N$, where $n_i$ is the number of samples in class $\omega_i$ out of a total of $N$ samples [7]. On the other hand, a between-class scatter matrix is the scatter of the expected vectors around the mixture mean.

The resulting feature space has several attractive properties, which we can list [8]:

1. The effectiveness of each feature, in terms of representing $X$, is determined by its corresponding eigenvalue. If a feature, say $\varphi_i$, is deleted, the mean-square error increases by $\lambda_i$. Therefore, the feature with the smallest eigenvalue should be deleted first, and so on. If the eigenvalues are indexed in descending order as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$, the features should be ordered in the same manner. The feature values are mutually uncorrelated, that is, the covariance matrix of $Y$ is diagonal. This follows because

$$\Sigma_Y = \Phi^{T}\,\Sigma_X\,\Phi = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \qquad (22)$$

2. The set of $m$ eigenvectors of $\Sigma_X$ which correspond to the $m$ largest eigenvalues minimizes $\bar{\varepsilon}^{2}(m)$ over all choices of $m$ orthonormal basis vectors, where $\Lambda_{m\times m} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$.
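As an illustration of how the within-class scatter matrix (17), the between-class scatter and the class priors $P_i \approx n_i/N$ combine in the two-class case, the following Python sketch computes both matrices and the standard Fisher discriminant direction $w = S_w^{-1}(M_1 - M_2)$ (the classifier named in the abstract); the synthetic class samples and helper names are assumptions, not data or code from the paper.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Within-class scatter S_w (eq. 17) and between-class scatter S_b
    for two classes, with a priori probabilities P_i ~ n_i / N."""
    N = len(X1) + len(X2)
    P1, P2 = len(X1) / N, len(X2) / N
    M1, M2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (P1 * np.cov(X1, rowvar=False, bias=True)
          + P2 * np.cov(X2, rowvar=False, bias=True))       # eq. (17)
    M0 = P1 * M1 + P2 * M2                                   # mixture mean
    Sb = (P1 * np.outer(M1 - M0, M1 - M0)
          + P2 * np.outer(M2 - M0, M2 - M0))                 # between-class scatter
    return Sw, Sb, M1, M2

def fisher_direction(X1, X2):
    """Standard two-class Fisher direction w = S_w^{-1} (M1 - M2)."""
    Sw, _, M1, M2 = scatter_matrices(X1, X2)
    return np.linalg.solve(Sw, M1 - M2)

# Illustrative use with synthetic parameter vectors for two classes (assumed data)
rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 0.3, size=(50, 4))      # e.g. fault-free parameter estimates
X2 = rng.normal(1.0, 0.3, size=(50, 4))      # e.g. faulty parameter estimates
w = fisher_direction(X1, X2)
scores1, scores2 = X1 @ w, X2 @ w            # 1-D projections used for classification
```

Thresholding the projected scores then assigns a new estimated parameter vector to one of the two classes, which is the diagnosis step the paper builds on.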