Approach to Fault Identification for Electronic Products Using Md 2057

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 59, NO. 8, AUGUST 2010 2055

Approach to Fault Identification for Electronic Products Using Mahalanobis Distance

Sachin Kumar, Member, IEEE, Tommy W. S. Chow, Senior Member, IEEE, and Michael Pecht, Fellow, IEEE

Abstract—This paper presents a Mahalanobis distance (MD)-based diagnostic approach that uses probabilistically derived thresholds to classify a product as healthy or unhealthy. A technique for detecting trends and bias in system health is presented by constructing a control chart for the MD value. The residuals of the performance parameters, which are the differences between the estimated values (from an empirical model) and the observed values (from health monitoring), are used to isolate the parameters that exhibit faults. To aid in the qualification of a product against a specific known fault, we suggest that a fault-specific threshold MD value be defined by minimizing an error function. A case study on notebook computers is presented to demonstrate the applicability of the proposed diagnostic approach.

Index Terms—Computers, diagnostics, electronic products, fault identification, fault isolation, Mahalanobis distance (MD).

Manuscript received March 19, 2009; revised August 27, 2009; accepted August 28, 2009. Date of publication October 30, 2009; date of current version July 14, 2010. The Associate Editor coordinating the review process for this paper was Dr. John Sheppard.

S. Kumar is with the Prognostics and Health Management Laboratory, Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD 20742 USA.

T. W. S. Chow is with the Prognostics and Health Management Centre, Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.

M. Pecht is with the Prognostics and Health Management Laboratory, CALCE, University of Maryland, College Park, MD 20742 USA, and also with the Prognostics and Health Management Center, City University of Hong Kong, Kowloon, Hong Kong.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

I. INTRODUCTION

Quantification of degradation and fault progression in an electronic system is difficult, since not all faults necessarily lead to system failure or loss of functionality [1], [2]. In addition, there is a significant lack of knowledge about failure precursors in electronics [3]. With limited failure precursors and complex architectures, it is generally hard to implement a health-monitoring system that can directly monitor all the conditions in which fault incubation occurs.

The health of a system is a state of complete physical, structural, and functional well-being and not merely conformance to the system's specifications. A health assessment of electronic products can be performed at the product level, assembly level, or component level [4]. The health assessment procedure should also consider the various environmental and usage conditions in which a product is likely to be used.

Built-in test (BIT) and self-test abilities were early attempts at incorporating diagnostic capabilities into a system's own structure. Gao and Suryavanshi have catalogued applications of BITs in many industries, including semiconductor production, manufacturing, aerospace, and transportation [5]. BIT applicability is limited to the failure definitions embedded at the system's manufacturing stage, whereas developments in sensor and data analysis capabilities now make it possible to develop and implement data-driven diagnostic systems that can adapt to new failure definitions.

Today, a product's health can be assessed in many ways, including monitoring changes in its performance parameters, which are used to characterize a system's performance; monitoring canaries (structures that have equivalent circuitry but are calibrated to fail at a faster rate than the actual product); and estimating accumulated damage based on physics-of-failure modeling [6]. Performance parameter analysis uncovers the interactions between performance parameters and the influence of environmental and operational conditions on these parameters. In the absence of fault-indicating parameters, health assessment can be performed by combining 1) damage estimates obtained from physics-based models that use data on environmental and operating conditions and 2) failure precursor information extracted from data-driven models [7]. A product's historical data on intermittent failures (i.e., failures that cannot be reproduced in a laboratory environment [8]) should also be included in its health assessment.

Sun Microsystems developed the Continuous System Telemetry Harness for collecting, conditioning, synchronizing, and storing computer systems' telemetry signals [9]. The Multivariate State Estimation Technique (MSET) provides an estimate of each parameter, and these estimates are later used for decision making with the Sequential Probability Ratio Test and hypothesis testing. The MD approach considered in this paper is a distance measure in multidimensional space that accounts for correlations among parameters [10]. Using the MD approach instead of MSET reduces the analytical burden: the MD approach yields a single number for determining a system's health after combining information on all performance parameters, whereas MSET provides an estimate for each parameter and requires an analytical assessment of each parameter to determine the system's health.

Other distance-based approaches that have been used for diagnostics and classification include the Manhattan distance, Euclidean distance, Hamming distance, Hotelling T-square, and squared prediction error.
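As a rough illustration of how MD collapses several correlated performance parameters into one health number, the sketch below learns a baseline from healthy data (mean, standard deviation, and correlation matrix) and then scores new observations with the squared distance z^T R^{-1} z, where z is the normalized observation and R is the correlation matrix. This is not the authors' implementation. The function names, the synthetic temperature/fan-speed data, and the choice to report the squared MD without dividing by the number of parameters (some Mahalanobis–Taguchi formulations do divide) are assumptions made for illustration.

```python
import numpy as np


def fit_healthy_baseline(X_train):
    """Learn normalization and correlation structure from healthy data.

    X_train: shape (n_observations, p_parameters), assumed to be collected
    while the product is known to be healthy (hypothetical helper).
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                # normalization removes the scaling effect
    corr = np.corrcoef(Z, rowvar=False)       # p x p correlation matrix R
    return mean, std, np.linalg.inv(corr)


def mahalanobis_distance(X, mean, std, corr_inv):
    """Squared MD of each row of X from the healthy baseline: z^T R^-1 z."""
    Z = (np.atleast_2d(X) - mean) / std
    return np.einsum("ij,jk,ik->i", Z, corr_inv, Z)


# Synthetic "healthy" data (assumed): fan speed tracks temperature.
rng = np.random.default_rng(0)
n = 500
temp = rng.normal(50.0, 5.0, n)                                   # deg C
fan = 2000.0 + 40.0 * (temp - 50.0) + rng.normal(0.0, 50.0, n)    # rpm
X_train = np.column_stack([temp, fan])

mean, std, corr_inv = fit_healthy_baseline(X_train)
train_md = mahalanobis_distance(X_train, mean, std, corr_inv)

points = np.array([
    [55.0, 2200.0],  # hot, fan fast: consistent with the healthy correlation
    [55.0, 1800.0],  # hot, fan slow: each value in range, correlation broken
])
md = mahalanobis_distance(points, mean, std, corr_inv)
eucl = np.linalg.norm((points - mean) / std, axis=1)  # correlation-blind distance

print("MD^2:", md)         # expected: second value much larger than the first
print("Euclidean:", eucl)  # expected: the two values are close to each other
```

The two test points sit at nearly the same Euclidean distance in normalized units, but the one that breaks the healthy temperature–fan relationship receives a much larger MD. That sensitivity to interparameter changes is what the correlation-blind distances listed above lack.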
Digital Object Identifier 10.1109/TIM.2009.2032884
0018-9456/$26.00 © 2009 IEEE

Manhattan distance is the distance between two points measured along axes at right angles, i.e., the sum of the absolute differences of their coordinates. It has been used to classify text via the N-gram approach [11]. Euclidean distance is the straight-line distance between two points and can be calculated as the square root of the sum of the squared differences between their coordinates. The Hotelling T-square and squared prediction error (SPE) are used as statistical indices in principal component analysis [12]. The Hotelling T-square is a measure that accounts for the covariance structure of a multivariate normal distribution and is computed in a reduced model space defined by a few principal components (i.e., fewer principal components than original parameters) [13]. The SPE index is a measure computed in the residual space, i.e., the part of the data not explained by the model space [14].

The Manhattan distance, Euclidean distance, and Hamming distance do not use correlation among parameters and, in contrast to MD, suffer from a scaling effect. The scaling effect describes a situation where the variability of one parameter masks the variability of another; it occurs when the measurement ranges or scales of two parameters differ [15]. To remove the scaling effect (i.e., eliminate the influence of measurement units), the data should be normalized. The Hotelling T-square and SPE indices are calculated in reduced dimensions (i.e., with some information loss) and use a covariance matrix as opposed to a correlation matrix, which is one reason to consider MD for fault diagnosis. MD calculation uses the normalized values of the measured parameters, which eliminates the scaling problem. MD also uses the correlation among parameters, which makes it sensitive to interparameter "health" changes. For example, consider a set

These traditional methods do not provide a generic framework for defining a threshold MD value for fault identification. Unlike methods such as clustering and supervised neural networks, which require a priori knowledge of the types of faults during training [21], the proposed diagnostic method does not require a faulty product to be defined during training and fault isolation. When unforeseen types of faults occur, supervised neural networks or clustering approaches may fail to deliver a correct decision on system health [21].

The MD approach suffers from a masking effect if the training data contain a significant number of outliers [22]. This is because MD uses a sample mean and a correlation matrix, both of which can be influenced by a cluster of outliers. These outliers can shift the sample mean and inflate the correlation matrix in a covariate direction. This is particularly true when the ratio n/p is small, where n is the number of observations and p is the number of features. Another issue is the computation time, which can reach O(p²) for p-dimensional feature vectors [23].

This paper provides a probabilistic approach for defining warning and fault threshold MD values, improving upon traditional approaches in which threshold MD values are decided by experts. Since MD values do not follow any particular distribution and are always positive, a Box–Cox transformation was applied to the MD values to obtain a normally distributed transformed variable. The transformed variable was used to construct a control chart and to define threshold values for detecting faults. An optimized MD value, obtained by minimizing an error function, was used to qualify a product against a particular
