Sensors and Actuators B 80 (2001) 243-254

Artificial intelligence methods for selection of an optimized sensor array for identification of volatile organic compounds

Robi Polikar (a,*), Ruth Shinar (b), Lalita Udpa (c), Marc D. Porter (b)

(a) Department of Electrical and Computer Engineering, Rowan University, 136 Rowan Hall, Glassboro, NJ 08028, USA
(b) Ames Laboratory, USDOE and Department of Chemistry, Microanalytical Instrumentation Center, Iowa State University, Ames, IA 50011, USA
(c) Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA

Accepted 19 July 2001

Portions of this work were completed while the corresponding author was with the Department of Electrical and Computer Engineering of Iowa State University.
* Corresponding author. Tel.: 1-856-256-5372; fax: 1-856-256-5241. E-mail address: [email protected] (R. Polikar).
0925-4005/01/$ - see front matter. PII: S 0925-4005(01)00903-0

Abstract

We have investigated two artificial intelligence (AI)-based approaches for the optimum selection of a sensor array for the identification of volatile organic compounds (VOCs). The array consists of quartz crystal microbalances (QCMs), each coated with a different polymeric material. The first approach uses a decision tree classification algorithm to determine the minimum number of features that are required to classify the training data correctly. The second approach employs the hill-climb search algorithm to search the feature space for the optimal minimum feature set that maximizes the performance of a neural network classifier. We also examined the value of simple statistical procedures that could be integrated into the search algorithm in order to reduce computation time. The strengths and limitations of each approach are discussed. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Optimum coating selection; Decision tree; Wrapper search; Neural network classification

1. Introduction

Piezoelectric chemical sensors, such as surface acoustic wave (SAW) devices and quartz crystal microbalances (QCMs), have been widely used for detection and identification of volatile organic compounds (VOCs) [1-8]. In general, an array of polymer-coated sensors is used for detection, where the change in the resonant frequency of each sensor as a function of VOC concentration constitutes a response pattern. Over the past 15 years, a significant amount of work has been done on developing pattern recognition algorithms, using principal component analysis, neural networks and fuzzy inference systems, for various gas sensing problems [9-13]. However, these methods can only be successful if the features (polymer-coated sensor responses) used to identify the VOCs allow an efficient separation of patterns in the feature space. The challenge is then to identify a subset of polymer coatings such that a classification algorithm provides optimum classification performance.

Selection of coatings is usually based on various chemical properties (e.g. solubility parameters [14-16]) of the VOCs and the compatibility of each with a range of compositionally different polymer coatings. Some researchers have also tried using various signal processing metrics, such as the Euclidean distance [17] or principal component analysis [18], to obtain the optimum set of coatings for specific applications.

Since there may be a large number of polymers suitable for the identification of a VOC, the selection of the smallest set giving the best performance is an ill-defined problem. This situation arises because testing every possible combination is usually not manageable. Furthermore, many researchers have investigated the relationship between the number of sensors and the performance of the array [19], and found that using as many sensors as possible does not necessarily improve the performance of a classification system. In fact, Park and Zellers [20] and Park et al. [21], through a careful analysis of the required number of sensors versus the number of analytes, and Osbourn et al. [22], through an examination of the effects of increasing the sensor size, have shown that the performance of classifiers for VOC identification typically degrades as the number of sensors increases beyond a certain number. Therefore, an efficient algorithm for optimum selection of sensors is of paramount importance.

For small pools of potential coatings, an exhaustive search may be manageable. For example, Zellers and coworkers used extended disjoint principal components regression analysis to conduct an exhaustive search on a 10-polymer dataset and identified four polymers as requisite array elements for optimum identification of six VOCs [20,21,23]. The use of four polymers out of 10 amounts to 210 possible combinations, which is manageable for an exhaustive search. However, as the number of possible coatings increases, an exhaustive search becomes computationally prohibitive. Adding only two more coatings to the pool, for instance, requires evaluating 495 possible four-coating combinations, and a more practical problem of choosing 6 out of 20 coatings requires testing 38,760 different combinations of coatings.

In efforts to reduce the number of candidate coatings from a larger pool of potentially useful coatings, various pattern recognition (PR) algorithms have been developed. Principal component analysis (PCA), a dimensionality reduction technique, has been one of the most popular of such techniques. Carey et al. used PCA [24] to reduce the feature vector obtained from 27 sensors to fewer than 8 for an identification problem consisting of 14 VOCs. Avila et al. introduced correspondence analysis as an alternative to PCA [25] and showed that it had computational advantages as well as performance improvement over PCA on the same dataset used by Carey et al. [24]. PCA has been employed not only in the gas sensor area, but also in many other areas where data analysis for dimensionality reduction is important. With PCA, the strategy is to find a set of n orthogonal vectors along which the m-dimensional data has the largest variance, such that n < m. PCA is, therefore, a dimensionality reduction procedure rather than a feature selection procedure. This distinction arises because the principal components are computed as the projection of the data on a set of orthogonal vectors that are the eigenvectors of the covariance matrix of the data. The covariance matrix may, and frequently does, contain significant information obtained from each sensor. Consequently, PCA does not reduce the number of sensors, nor does it identify the optimum set of coatings. Recently, Osbourn and Martinez [26] and Ricco et al. [27] introduced visual empirical region of influence pattern recognition (VERI-PR) for identification of VOCs. Various shortcomings of neural network and statistical techniques for pattern recognition have also been addressed by these authors. For example, VERI does not require or assume any specific probability distributions to be known, and it does not require a large number of parameters to be adjusted by the user. Furthermore, VERI is a versatile algorithm capable not only of pattern recognition, but also of optimum feature selection. The optimal feature selection capabilities of VERI [...] subset of features for a PR problem. Feature subset selection is commonly encountered in pattern analysis, machine learning and artificial intelligence [29,30]. Many studies have shown that most classification algorithms perform best when the feature space includes only the most relevant information that is required for identification [31-33].

While having relevant features is a key to successful performance of any classification algorithm, the definition of a relevant feature has been extensively debated. Some studies suggest algorithms that are preprocessing in nature. These preprocessing algorithms can be viewed as filtering the data, and thus eliminating irrelevant features. Statistical measures, such as the properties of the probability distribution function of the data, are often employed for filtering out the irrelevant features, and consequently, these algorithms are referred to as filter approaches [34-36]. Filter algorithms, however, are independent of the classification algorithm to be used to process the data. Some researchers suggest that relevant features for any set of data are dependent on the classification algorithm [30,37]. For example, a good set of features for a neural network may not be as effective for decision trees. Such studies indicate that a feature selection algorithm must be based on or wrapped around the classification algorithm [37]. Feature selection algorithms that use such an approach are known as wrapper approaches. Most wrapper approaches, on the other hand, suffer from large computational time and space complexity problems, particularly for data sets with a large number of features.

Due to the limited number of possible coatings typically used in the gas sensing area, the computational complexity of wrapper approaches does not constitute a major drawback. We have therefore analyzed two techniques based on the wrapper approach, and we report herein on the performances of these two artificial intelligence (AI) approaches for selecting the optimum set of coatings for VOC identification. The first approach is based on Quinlan's iterative dichotomizer 3 (ID3) algorithm [31], a decision tree algorithm that integrates classification and feature selection. The second approach is a modified version of the wrapper model of Kohavi and John [37], which uses a hill-climb search algorithm to search the feature space for an optimum set of features. The original wrapper model combines the hill-climb search with ID3. We have explored integrating the hill-climb search with a multilayer perceptron (MLP) neural network. We have also investigated the value of using a different starting point for the search, based on the variance of the data, to accelerate the convergence of the hill-climb search.
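
To make the wrapper idea concrete, the following is a minimal sketch of a hill-climb search over feature subsets, not the implementation used in this work. It rests on several assumptions made purely for illustration: the data are synthetic (only feature 0 carries class information), a simple nearest-centroid classifier stands in for the MLP, and the names make_data, accuracy and hill_climb are hypothetical. The last line reproduces the exhaustive-search counts quoted in the Introduction for comparison with the number of subsets the search actually evaluates.

```python
import math
import random


def make_data(n_per_class=40, n_features=6, seed=0):
    """Synthetic stand-in for sensor responses (assumption, not real QCM data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 5.0 * label  # only feature 0 separates the classes
            X.append(row)
            y.append(label)
    return X, y


def accuracy(X, y, subset):
    """Hold-out accuracy of a nearest-centroid classifier restricted to `subset`."""
    if not subset:
        return 0.0
    cols = sorted(subset)
    pairs = list(zip(X, y))
    train = [p for i, p in enumerate(pairs) if i % 2 == 0]  # even rows train
    test = [p for i, p in enumerate(pairs) if i % 2 == 1]   # odd rows test
    centroids = {}
    for label in set(y):
        rows = [r for r, c in train if c == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for r, c in test:
        point = [r[j] for j in cols]
        guess = min(centroids, key=lambda k: math.dist(point, centroids[k]))
        hits += guess == c
    return hits / len(test)


def hill_climb(X, y, n_features, start=frozenset()):
    """Move to the best neighbouring subset until no neighbour scores higher."""
    current = frozenset(start)
    best = accuracy(X, y, current)
    evaluated = 1
    while True:
        # Neighbours differ from the current subset by adding or removing one feature.
        neighbours = [current ^ {j} for j in range(n_features)]
        scored = [(accuracy(X, y, s), s) for s in neighbours]
        evaluated += len(scored)
        top_score, top_subset = max(scored, key=lambda t: t[0])
        if top_score <= best:
            return current, best, evaluated
        current, best = top_subset, top_score


X, y = make_data()
subset, score, evaluated = hill_climb(X, y, n_features=6)
print(sorted(subset), round(score, 3), evaluated)
print(math.comb(10, 4), math.comb(12, 4), math.comb(20, 6))  # 210 495 38760
```

The search evaluates only a handful of subsets per step, in contrast to the combinatorial counts of an exhaustive search, at the cost of possibly stopping at a local optimum. The `start` argument is where a different starting point could be supplied, for example a subset chosen from the variance of the data.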