Detecting abnormal ozone measurements with a deep learning-based strategy

Item Type Article

Authors Harrou, Fouzi; Dairi, Abdelkader; Sun, Ying; Kadri, Farid

Citation Harrou F, Dairi A, Sun Y, Kadri F (2018) Detecting Abnormal Ozone Measurements With a Deep Learning-Based Strategy. IEEE Sensors Journal 18: 7222–7232. Available: http://dx.doi.org/10.1109/JSEN.2018.2852001.

Eprint version Post-print

DOI 10.1109/JSEN.2018.2852001

Publisher Institute of Electrical and Electronics Engineers (IEEE)

Journal IEEE Sensors Journal

Rights (c) 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Download date 04/10/2021 08:33:37

Link to Item http://hdl.handle.net/10754/628364

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2018.2852001, IEEE Sensors Journal.

Detecting abnormal ozone measurements with a deep learning-based strategy Fouzi Harrou, Member, IEEE, Abdelkader Dairi, Ying Sun, Farid Kadri

Abstract—Air quality management and monitoring are vital to maintaining clean air, which is necessary for the health of humans, vegetation, and ecosystems. Ozone is one of the main pollutants that negatively affect human health and ecosystems. This paper reports the development of an unsupervised and efficient scheme for detecting anomalies in unlabeled ozone measurements. This scheme combines a Deep Belief Network (DBN) model and a one-class support vector machine (OCSVM). The DBN model accounts for nonlinear variations in ground-level ozone concentrations, while the OCSVM detects abnormal ozone measurements. The performance of this approach is evaluated using real data from Isère in France. We also compare the detection quality of DBN-based detection schemes to that of deep stacked auto-encoders, Restricted Boltzmann Machine-based OCSVM, and DBN-based clustering procedures (i.e., K-means, Birch, and Expectation Maximization). The results show that the developed strategy is able to identify anomalies in ozone measurements.

Index Terms—Air quality, learning, OCSVM, Ozone, DBNs, statistical monitoring, …

I. INTRODUCTION
To achieve acceptable air quality and better understand air pollution phenomena, investigations of the health effects of ambient air pollution have attracted numerous efforts over the last few decades [1]–[6]. The surveillance of atmospheric pollution is of great importance due to the negative effects of pollution on human health and the ecosystem [7], [8]. High levels of air pollution have become an important issue and can lead to serious impacts on human health [2], [9], [10]. Pollution from ozone (O3) formed in the troposphere is a growing issue in industrialized countries, mainly because of its role as a greenhouse gas [7]. Accordingly, air quality management and monitoring are key to assuring a healthy living environment [11].

Air quality is a key challenge that is gaining increasing attention worldwide [12], [13]. Ground-level ozone (O3), an important pollutant that affects everyone's health, is generated mainly by chemical reactions of nitrogen and carbon-based compounds discharged in vehicle exhausts [12], [13]. The reactions creating ozone are favored by bright sunlight and usually result in a severe air quality issue named photochemical smog [14]–[16]. Thus, there is a need for a systematic approach to detecting abnormal ozone levels, which provides pertinent information for enhancing air quality management and for rapid decision-making, such as warning the public of harmful ozone levels or checking the sensors when a technical problem is causing anomalies [17]–[19]. This paper presents a flexible monitoring scheme to detect abnormal ozone data.

Recently, deep learning-based feature extraction methodologies have come to play a considerable role in the literature [20]–[24]. Deep learning methods were designed to model complex systems with flexibility, simplicity, and strength using a series of multilayer architectures. For instance, they have been used to enhance intelligent transportation systems [22], [25], health informatics [23], human action recognition [26], detection of cerebral microbleed voxels [20], classification of hearing loss images [21], and fingerprint-based indoor positioning via WiFi [27]. Due to their broad applications, Deep Belief Network (DBN) models have recently received much attention from researchers. However, to the best of our knowledge, DBN models have not been employed for ozone pollution monitoring.

Constructing a fast and precise model of ozone variation is very challenging, mainly because the mechanisms of ozone production in the troposphere are complex and there are measurement uncertainties in all the parameters involved [28], [29]. To bypass this difficulty, in this study we use a DBN to model the variability in ozone data, since DBNs perform well in learning complex nonlinearity layer by layer [30]. The purpose is to design a deep learning-based methodology capable of detecting abnormal ozone measurements. Basically, this method combines a DBN model with a one-class support vector machine (OCSVM) to simultaneously take advantage of the DBN model in extracting features from high-dimensional data and of the anomaly detection capability of OCSVM. We assess our approach using data obtained from the Isère region in France. We also compare the detection quality of DBN-based detection schemes to that of deep stacked auto-encoders (DSA), Restricted Boltzmann Machine-based OCSVM, and other clustering procedures.

The study site and data collection are briefly described in Section II. Section III reviews the DBN model and the OCSVM scheme. Section IV presents the designed DBN-OCSVM monitoring technique. In Section V, we assess the efficiency of the developed approach using real data, and conclusions are presented in Section VI.

F. Harrou and Y. Sun are with King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia. A. Dairi is with the Computer Science Department, University of Oran 1 Ahmed Ben Bella, Street El senia el mnouer, BP 31000 Oran, Algeria. F. Kadri is with Sopra Steria Group, Big Data & Data Analytics Team, 31770 Colomiers, France.

1558-1748 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. STUDIED AIR QUALITY MONITORING NETWORK
In this paper, we monitor ozone measurements from Isère, France (see Figure 1). Atmo Auvergne-Rhône-Alpes is the organization in charge of supervising air quality in this region. Hourly ozone data are collected by 14 stations distributed throughout the region (see Table I). Each station consists of a room with several analyzers (see Figure 2). Ambient air is drawn in through the roof of each measuring station and pumped to the measuring instruments located inside. Each analyzer is dedicated to a specific pollutant (e.g., ozone). The ambient air is analyzed continuously, 24 hours a day and 7 days a week. The data measured at the network stations are transmitted by telephone line to a central computer, stored in a local database, processed statistically, and validated manually every day by the Atmo team. The validated data are then sent to a national database.

TABLE I. MEASUREMENT STATIONS IN ISÈRE REGION.

Fig. 2. Measurement station.

Fig. 1. Location of Isère in France.

The spatial distribution of measurement stations is shown in Figure 3.

Fig. 3. Network of measurement stations for the Isère region.

III. PRELIMINARY MATERIAL
Here, we briefly describe the DBN model and the OCSVM scheme, which are used to construct the proposed detection approach.

A. Deep Belief Networks
DBN models can be obtained by stacking a succession of Restricted Boltzmann Machines (RBMs). RBMs are stochastic neural networks (see Figure 4) that can be viewed as a special case of Boltzmann machines (BMs). They are efficient at estimating the probability distribution of input data. RBMs are generative models that can be used to generate new data by sampling from the model. They are defined with the restriction that there is no interaction between hidden variables, i.e., neurons belonging to the same layer are not connected. However, a full connection mode exists with the
neurons in the other layer. RBMs are undirected graphs with a simple structure composed of a visible layer v representing the (observed) input data and a hidden (or latent) layer h corresponding to the feature detectors. The interaction between the visible and hidden units (v ⇔ h) has an energy expressed by the following energy function [31]:

$$\mathrm{Energy}(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i}\sum_{j} v_i W_{ij} h_j,$$

where $W_{ij}$ is the weight between the input $v_i$ and the hidden variable $h_j$, and b and c are the visible and hidden bias vectors (model parameters). Generally speaking, RBMs can be viewed as estimators of the distribution underlying the training data. The probability (distribution) that the RBM assigns to every possible pair of visible and hidden vectors (v, h) is given by the energy function. The joint distribution of the RBM configuration is

$$P(v, h) = \frac{1}{Z}\exp\left(-\mathrm{Energy}(v, h)\right),$$

where Z is the partition function, $Z = \sum_{v}\sum_{h}\exp(-\mathrm{Energy}(v, h))$. Thus, a DBN can be defined as a stack of RBMs (see Figure 5), trained in an unsupervised manner with unlabeled training data. It is exploited to discover relevant features of data via multiple layers of nonlinear representations.

The principal building block of a DBN is an RBM [32], which represents a layer in the network. The training method proposed in [32] trains the whole deep network sequentially, layer by layer, which means that each stacked RBM in the architecture is trained separately in an unsupervised manner. This approach is called greedy layer-wise training, and it creates a synergy between the layers (RBMs) in that the output of one layer is the input of the next layer. This gives DBNs the capability to discover features by learning a higher-level representation of the data through the succession of layers. Once the training phase is completed, each k-th RBM provides a weight matrix $W^k$ with optional bias vectors for both the visible and hidden layers.

The DBN joint distribution between the observed vector x and the ℓ hidden layers $h^k$ is given by [32]

$$P(x, h^1, \ldots, h^\ell) = P(x \mid h^1)\,P(h^1 \mid h^2)\cdots P(h^{\ell-2} \mid h^{\ell-1})\,P(h^{\ell-1}, h^\ell),$$

where $P(x \mid h^1)$ denotes the conditional distribution of the input x given the first hidden layer $h^1$, and the terms $P(h^k \mid h^{k+1})$ model the conditional distributions between successive hidden layers as sigmoid belief networks (see Figure 5). The term $P(h^{\ell-1}, h^\ell)$ expresses the joint distribution of an RBM. The DBN model can be written compactly as

$$P(x, h^1, \ldots, h^\ell) = \left(\prod_{k=0}^{\ell-2} P(h^k \mid h^{k+1})\right) P(h^{\ell-1}, h^\ell),$$

where $x = h^0$, $P(h^k \mid h^{k+1})$ is the visible-given-hidden conditional distribution of the RBM at level k of the DBN, given elementwise by $P(h^k_i = 1 \mid h^{k+1}) = \mathrm{sigm}\big(b^k_i + \sum_j W^k_{ij} h^{k+1}_j\big)$, and $P(h^{\ell-1}, h^\ell)$ represents the joint distribution of the top-level RBM.

Fig. 4. Structure of an RBM model.
Fig. 5. A diagrammatic representation of a DBN model with three hidden layers.

B. One-class support vector machines (OCSVM)
OCSVM can be regarded as a special case of the multiclass SVM scheme that separates normal from abnormal data [33]. In other words, OCSVM is constructed using only data from one class (i.e., the normal data). It is trained to build boundaries (hyperplanes) in an unsupervised manner based on training data. The constructed boundaries are then used to check whether new data points have features close to those of the nominal class in the feature space F, where the region of normal observations is delimited by the hyperplanes. The decision is made via a decision function f(x), which returns +1 (inlier) if the tested data point lies within the region of F occupied by normal data and −1 (outlier, i.e., an anomaly) otherwise.
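As a small numerical check of the RBM equations in Section III-A (energy function, joint distribution, and partition function Z), the sketch below enumerates every binary configuration of a toy RBM and verifies two properties: the probabilities P(v, h) sum to one, and the conditional P(h_j = 1 | v) obtained by brute-force marginalization equals the sigmoid of c_j + Σ_i v_i W_ij. The sizes (three visible and two hidden units), the random parameters, and the helper names are illustrative assumptions; exhaustive enumeration is only feasible at toy sizes, which is why practical RBM training relies on approximations.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """RBM energy for binary vectors: -b.v - c.h - v^T W h."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_distribution(W, b, c):
    """Enumerate every (v, h) pair; return {(v, h): P(v, h)} and the partition function Z."""
    n_visible, n_hidden = W.shape
    weights = {}
    for v in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            weights[(v, h)] = np.exp(-energy(np.array(v), np.array(h), W, b, c))
    Z = sum(weights.values())
    return {key: w / Z for key, w in weights.items()}, Z

# Toy parameters (illustrative only): 3 visible units, 2 hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 2))   # W[i, j] couples v_i and h_j
b = rng.normal(scale=0.1, size=3)        # visible biases
c = rng.normal(scale=0.1, size=2)        # hidden biases

P, Z = joint_distribution(W, b, c)
print("configurations:", len(P), "sum of P(v, h):", sum(P.values()))

# P(h_1 = 1 | v) by brute-force marginalization vs. the closed-form sigmoid
v = (1, 0, 1)
p_v = sum(p for (vv, _), p in P.items() if vv == v)
p_h1_given_v = sum(p for (vv, hh), p in P.items() if vv == v and hh[0] == 1) / p_v
closed_form = sigmoid(c[0] + np.array(v) @ W[:, 0])
print(p_h1_given_v, closed_form)
```

The agreement holds because the energy is linear in each hidden unit, so the hidden units are conditionally independent given v; this factorization is what makes the layer-by-layer encoding in a DBN cheap to compute.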


Let $x_1, \ldots, x_n \in D$ be the training dataset. OCSVM maps the input data into a high-dimensional feature space F via a kernel, such as the radial basis function (RBF) kernel $K(x, y) = \langle \Psi(x), \Psi(y) \rangle = \exp(-\gamma \|x - y\|^2)$. As illustrated in Figure 6, the decision rule f(x) aims to maximize the distance between the origin and the hyperplane H that separates the training data in the feature space F. The decision function f(x) is expressed as

$$f(x) = \operatorname{sign}\left(\langle W, \Psi(x) \rangle - \rho\right),$$

where W, ρ, and Ψ represent, respectively, a weight vector, an offset, and a feature map D → F. W and ρ are determined by solving the following quadratic optimization problem:

$$\min_{W,\,\xi,\,\rho}\ \frac{1}{2}\|W\|^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \quad \text{subject to} \quad \langle W, \Psi(x_i) \rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n,$$

where the $\xi_i$ are slack variables and ν ∈ (0, 1] is the parameter that shapes the solution: it is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors.

Fig. 6. Illustration of a one-class SVM scheme.

IV. DBN-OCSVM METHODOLOGY
Here, we describe the designed anomaly detection method that we use for monitoring ozone measurements. Because models of ozone variation are not commonly available, owing to the complex nonlinear dynamics of ozone, we adopt the DBN model, a powerful tool with greedy learning features that accounts for the nonlinear aspects of ozone variations. This methodology merges the modeling benefits of a DBN with the anomaly detection ability of OCSVM to improve ozone monitoring (see Figure 8). DBNs exhibit high performance in identifying features from high-dimensional data [34]. Here, we first construct a DBN model with four RBM layers based on anomaly-free data (i.e., data collected under anomaly-free conditions). The first layer is trained with the training input data. Incorporating more hidden units into a given layer increases the flexibility of the model to extract relevant features from the analyzed data. Once the parameters of a given layer that provide a minimum deviation between its output and input have been selected, its output becomes the input of the next layer. In DBN modeling, the features produced by the first layer feed the second layer to obtain new features that in turn feed the next layer, and so on. At the end of each layer, the model (i) discovers and extracts new features and (ii) generates a new encoded output that is used as input for the next layer. This greedy layer-wise training procedure was proposed in [35] for building a DBN model.

Fig. 7. Diagrammatic illustration of the DBN-based OCSVM methodology.

Generally speaking, an appropriate model is able to extract the complex features needed to mimic the process dynamics and approximately reconstruct the original input data. In the training stage, the designed deep learning model is usually validated by evaluating the reconstruction error. The cross-entropy function is commonly used for this purpose: it measures the dissimilarity between the distribution of the input data and that of the data reconstructed by the designed model, and it decreases (approaching zero for binary-valued inputs) as the model describes the input data better. It is defined as [36]

$$C(X, \hat{X}) = -\sum_{i=1}^{n}\left[X_i \log \hat{X}_i + (1 - X_i)\log\left(1 - \hat{X}_i\right)\right],$$

where X is the distribution of the input data and $\hat{X}$ is the distribution of the data reconstructed by the constructed model.

The initialization of parameters plays a key role in approaching the global minimum of the cost function. Three main methods are usually used to set the hyperparameters of deep learning architectures [37]: (1) random initialization, which randomly selects the starting values; (2) grid search, which defines a range for each hyperparameter and iterates over those ranges until an acceptable result is reached; and (3) manual search based on knowledge of the problem at hand.

After constructing a reference DBN model, in the second phase, an OCSVM algorithm with a nonlinear kernel is trained with the features extracted by the DBN. As stated above, the DBN and the OCSVM are both trained entirely in an unsupervised way. Then, the constructed DBN model is used with the OCSVM to test new data. The main steps of the proposed approach are summarized in Algorithm 1. We apply this model to inspect process variables for abnormal events by looking for future deviations from normality. In this approach, the output of the DBN model is used by the OCSVM algorithm for anomaly detection. A schematic illustration of the proposed DBN-OCSVM strategy is given in Figure 8 and outlined below.
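The quadratic problem of Section III-B is usually solved through its dual: minimize ½ αᵀKα subject to 0 ≤ α_i ≤ 1/(νn) and Σ_i α_i = 1; ρ then equals Σ_j α_j K(x_j, x_i) for any support vector with 0 < α_i < 1/(νn), and f(x) = sign(Σ_i α_i K(x_i, x) − ρ). The NumPy sketch below solves that dual with plain projected gradient descent so that every step is visible. It is an illustration under assumptions, not the solver used by the authors (libraries such as LIBSVM use SMO-type solvers): the class name `ToyOCSVM`, the iteration count, and the two-dimensional Gaussian training data are hypothetical. γ = 0.1 mirrors the RBF setting reported in Section V, whereas ν = 0.1 is chosen here only so that the toy problem has several margin support vectors.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def project_capped_simplex(y, cap, iters=60):
    """Euclidean projection onto {a : 0 <= a_i <= cap, sum(a) = 1} by bisection on a shift tau."""
    lo, hi = y.min() - cap, y.max()  # the clipped sum is len(y)*cap >= 1 at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(y - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(y - 0.5 * (lo + hi), 0.0, cap)

class ToyOCSVM:
    """Hypothetical one-class SVM: Schölkopf's dual solved by projected gradient descent."""

    def __init__(self, gamma=0.1, nu=0.1, n_iter=1000):
        self.gamma, self.nu, self.n_iter = gamma, nu, n_iter

    def fit(self, X):
        n = len(X)
        cap = 1.0 / (self.nu * n)               # box constraint alpha_i <= 1/(nu n)
        K = rbf_kernel(X, X, self.gamma)
        step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / Lipschitz constant of the gradient K alpha
        alpha = np.full(n, 1.0 / n)
        for _ in range(self.n_iter):
            alpha = project_capped_simplex(alpha - step * (K @ alpha), cap)
        g = K @ alpha                            # sum_j alpha_j K(x_j, x_i) at each training point
        tol = 1e-8
        margin = (alpha > tol) & (alpha < cap - tol)
        # rho equals g on margin support vectors; if none, use bounded ones (where g <= rho)
        self.rho = g[margin].mean() if margin.any() else g[alpha > tol].max()
        self.X_train, self.alpha = X, alpha
        return self

    def decision_function(self, Z):
        return rbf_kernel(Z, self.X_train, self.gamma) @ self.alpha - self.rho

    def predict(self, Z):
        """+1 (inlier) or -1 (outlier), matching the decision rule f(x)."""
        return np.where(self.decision_function(Z) >= 0.0, 1, -1)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))  # stand-in for anomaly-free DBN features
model = ToyOCSVM(gamma=0.1, nu=0.1).fit(X_train)
print(model.predict(np.array([[0.0, 0.0], [8.0, 8.0]])))
print("fraction of training points flagged:", np.mean(model.predict(X_train) == -1))
```

On this toy set, the center of the training cloud is classified as +1 and a distant point as −1. In the DBN-OCSVM setting, the rows of the training matrix would instead be the DBN-encoded features S̄.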


Algorithm 1: Training the DBN-OCSVM approach
Input: training dataset S
Output: deep belief network model DBNModel; one-class SVM model OCSVMModel
    D, O: vectors
    D ← S
    for each layer RBM_i in the DBN layers, i ∈ {0, ..., l}, where l is the number of layers in the DBN, do
        RBMModel ← TrainRBM(RBM_i, D)
        O ← EncodeRBM(RBM_i, D)
        D ← O
    end for
    S̄ ← D
    DBNModel ← BuildDBN(RBM_0, ..., RBM_l)
    OCSVMModel ← TrainOCSVM(S̄)
    return DBNModel, OCSVMModel
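Algorithm 1 can be sketched in NumPy as follows, assuming Bernoulli RBMs trained with one-step contrastive divergence (CD-1), a common choice that the text does not specify. `train_rbm` and `encode_rbm` are hypothetical stand-ins for TrainRBM and EncodeRBM, the returned list of RBMs plays the role of BuildDBN, and the mean reconstruction cross-entropy C(X, X̂) from Section IV is recorded after every epoch. Layer sizes, learning rate, epoch count, and the synthetic 14-station data are assumptions; the last step, TrainOCSVM(S̄), would pass `S_bar` to any one-class SVM implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(X, X_hat, eps=1e-12):
    """C(X, X_hat) = -sum[X log X_hat + (1 - X) log(1 - X_hat)], clipped for numerical safety."""
    X_hat = np.clip(X_hat, eps, 1.0 - eps)
    return -np.sum(X * np.log(X_hat) + (1.0 - X) * np.log(1.0 - X_hat))

def train_rbm(D, n_hidden, lr=0.01, epochs=100, seed=0):
    """Hypothetical TrainRBM: Bernoulli RBM trained by full-batch CD-1."""
    rng = np.random.default_rng(seed)
    n, n_visible = D.shape
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    loss = []
    for _ in range(epochs):
        ph = sigmoid(D @ W + c)                        # positive phase: P(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)  # sampled hidden states
        v_rec = sigmoid(h @ W.T + b)                   # reconstruction P(v = 1 | h)
        ph_rec = sigmoid(v_rec @ W + c)                # negative phase
        W += lr * (D.T @ ph - v_rec.T @ ph_rec) / n
        b += lr * (D - v_rec).mean(axis=0)
        c += lr * (ph - ph_rec).mean(axis=0)
        recon = sigmoid(sigmoid(D @ W + c) @ W.T + b)  # deterministic reconstruction
        loss.append(cross_entropy(D, recon) / n)       # mean per-sample cross-entropy
    return {"W": W, "b": b, "c": c, "loss": loss}

def encode_rbm(rbm, D):
    """Hypothetical EncodeRBM: hidden activation probabilities feed the next layer."""
    return sigmoid(D @ rbm["W"] + rbm["c"])

def train_dbn(S, layer_sizes, **rbm_kwargs):
    """Greedy layer-wise loop of Algorithm 1; returns the RBM stack and the encoded set S_bar."""
    D, rbms = S, []
    for size in layer_sizes:
        rbm = train_rbm(D, size, **rbm_kwargs)
        rbms.append(rbm)
        D = encode_rbm(rbm, D)
    return rbms, D

# Synthetic stand-in for 14 hourly station series, min-max normalized to [0, 1]
rng = np.random.default_rng(1)
t = np.arange(24 * 40)
raw = 60 + 30 * np.sin(2 * np.pi * t / 24)[:, None] + rng.normal(0, 5, size=(t.size, 14))
S = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))
rbms, S_bar = train_dbn(S, layer_sizes=[10, 8, 6], lr=0.1, epochs=50)
print(S_bar.shape, [round(r["loss"][-1], 3) for r in rbms])
```

Recording `loss` per layer is one way to produce a convergence curve similar to Figure 12. The inputs are scaled to [0, 1] because Bernoulli visible units model probabilities.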

Offline modeling, Step (a):
1) Collection of data without anomalies (i.e., ambient pollution) to build the training dataset S.
2) Normalization of the training dataset S.
3) Greedy layer-wise unsupervised training of each layer (RBM) separately.
4) Hierarchical nonlinear transformations to discover features.
5) Building of the DBN model M, weight matrix W, and bias B.
6) Generation of a new training dataset S̄, an encoded version of the original dataset S that provides consistent features.
7) Unsupervised training of the OCSVM using dataset S̄.
8) Building of the OCSVM model for anomaly detection using hyperplanes.

Online detection, Step (b):
1) Acquisition and normalization of a new observation X.
2) Encoding of the new observation X using the DBN model.
3) Classification of X as an inlier or outlier based on the OCSVM model hyperplanes.

Fig. 8. DBN-based OCSVM fault detection methodology.

V. OZONE POLLUTION MONITORING
This section evaluates the proposed anomaly-detection strategy with real data from Isère in France. The ozone data were downloaded from Atmo Auvergne-Rhône-Alpes's website (http://www.atmo-auvergnerhonealpes.fr). Ozone data measured from January 1st to March 4th, 2015 (see Figure 9) were used to develop the DBN model. Before evaluating the performance of the proposed algorithm, we first conduct a descriptive analysis of the anomaly-free training data (of ambient ozone pollution). Figure 9 clearly shows that the ozone time series collected by the different sensors vary similarly.

Fig. 9. Ozone measurements used in training.

Autocorrelation function (ACF) plots of the training time series from Figure 9 are presented in Figure 10. Figure 10 indicates an ACF periodicity of 24 hours, where one period is defined as the time between two successive maxima of the ACF. This periodicity is due to the diurnal ozone cycle, which results mainly from the diurnal temperature cycle.

Fig. 10. Autocorrelation function of ozone data.

The summary statistics for the training dataset are shown in Table II. The location of the distribution of each variable (i.e., ozone measurements from each station) is captured by the mean value, while the spread of the dataset is described by the standard deviation, extremes, and quartiles. The skewness and kurtosis describe the symmetry and flatness of the distribution of the studied time series. The kurtosis, or degree of peakedness of a distribution, is used to quantify the non-Gaussianity of the data; it equals 3 for a Gaussian distribution. If the kurtosis is greater than 3, the distribution is super-Gaussian, i.e., more peaked than a Gaussian; if it is less than 3, the distribution is sub-Gaussian, i.e., flatter than a Gaussian. Skewness indicates the asymmetry of the data about the sample mean. The skewness of a symmetric distribution, such as the Gaussian, is zero. Negative skewness indicates that the distribution is skewed to the left, and positive skewness indicates that it is skewed to the right. From Table II, we conclude that the ozone data are approximately Gaussian: the skewness values are small (between −0.16 and 0.36), and the kurtosis values (between 2.07 and 2.68) are close to, though slightly below, 3.

TABLE II. DESCRIPTIVE STATISTICS OF THE TRAINING DATASET (Q1 and Q3 denote the first and third quartiles).

Station  Mean   STD    Min   Q1     Median  Q3      Max     Skewness  Kurtosis
S1       63.80  30.85  4.00  39.00  63.00   88.00   134.00   0.08     2.07
S2       63.89  26.51  4.00  44.00  64.00   84.00   129.00   0.10     2.33
S3       61.96  26.12  5.00  42.00  61.00   81.00   137.00   0.15     2.39
S4       58.75  29.06  2.00  36.00  57.00   80.00   132.00   0.18     2.26
S5       54.99  26.86  1.00  34.00  53.00   74.00   125.00   0.26     2.33
S6       63.20  26.41  9.00  43.00  62.00   82.00   128.00   0.24     2.32
S7       59.71  28.20  3.00  37.00  57.00   80.00   138.00   0.36     2.37
S8       64.33  27.38  4.00  44.00  63.00   84.00   137.00   0.09     2.36
S9       55.40  31.03  0.00  32.00  59.00   79.00   127.00  -0.07     2.12
S10      65.75  29.10  1.00  44.00  69.00   86.00   134.00  -0.16     2.37
S11      52.88  28.31  1.00  30.00  52.00   73.00   120.00   0.16     2.23
S12      66.29  31.20  1.00  43.00  68.00   90.00   136.00  -0.13     2.21
S13      59.11  27.40  0.00  39.00  59.00   80.00   129.00   0.01     2.21
S14      81.84  24.83  7.00  64.00  81.00  100.00   159.00   0.10     2.68

For the training data in Figure 9, the box plots are shown in Figure 11. Figure 11 shows that the data distributions from all sensors are almost symmetric and behave comparably.

Fig. 11. The box plots of the data in Figure 9.

The parameter values of the designed DBN model and OCSVM scheme are given in Table III. After constructing the reference DBN model with anomaly-free data, we use the output features of the DBN to monitor abnormal ozone pollution with the OCSVM. In this study, the initial values of the DBN parameters are set manually during the training phase. A loss function is used to measure the error made by the network when trying to reproduce the input; this function is also called the cost or objective function. As pointed out above, the cross-entropy error function is used to quantify the quality of the designed model. The training is performed until the cross-entropy error converges close to zero. Figure 12 shows that the cross-entropy loss converges after about 180 epochs.

TABLE III. PARAMETER VALUES OF THE DBN MODEL AND OCSVM SCHEME.

Fig. 12. The evolution of the cross-entropy error as a function of the number of epochs during the training phase.

Abnormalities in ozone measurements can be categorized into two types: true and false anomalies. True anomalies are generated by an increase in ozone concentration (i.e., abnormal pollution) (Figure 13 (Left)). They arise under certain conditions, including sunny days with stagnant, humid air and high temperatures, which enhance the formation of ozone [17]. These anomalies last several hours because of the time required for the progressive production of photochemical ozone. In contrast, false anomalies are mainly related to sensor malfunctions (e.g., sensor drifts); they are very short, lasting less than an hour, and very large (Figure 13 (Right)).

Fig. 13. (Top) Photochemical ozone pollution and (Bottom) spike in ozone concentration due to sensor malfunction.

To detect atypical ozone peaks, we complete two steps. First, we design a model that describes the variation in ozone pollution. Then, based on the DBN model and the OCSVM detector, we check newly measured data for anomalies. We combine the DBN model with the OCSVM detector using an RBF kernel and parameter values γ = 0.1 and ν = 0.001. We compare the anomalies detected by the designed methodology with those declared by the experts of Atmo Auvergne-Rhône-Alpes. The testing measurements cover the period between January 1st and September 10th, 2017. The detection results are given in Table IV.

Fig. 14. Original ozone data (plot (a)) and the DBN-OCSVM chart (plot (b)) in the case of abnormal measurements that occurred on 22 and 23 June 2017.
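The statistics behind Table II and Figure 10 are standard. The sketch below, assuming moment-based (population) definitions, computes skewness, Pearson kurtosis (equal to 3 for a Gaussian, the convention used above), and the sample ACF, and reads the period off the first ACF maximum after lag 0. A synthetic hourly series with a 24-hour cycle stands in for the Atmo data, and the function names are hypothetical.

```python
import numpy as np

def skewness(x):
    """Third standardized moment (0 for a symmetric distribution)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def kurtosis(x):
    """Pearson kurtosis: fourth standardized moment (3 for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4)

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def dominant_period(x, max_lag=48):
    """Lag of the first local maximum of the ACF after lag 0 (None if not found)."""
    r = acf(x, max_lag)
    for k in range(1, max_lag):
        if r[k] > r[k - 1] and r[k] >= r[k + 1]:
            return k
    return None

hours = np.arange(24 * 60)
ozone = 60 + 30 * np.sin(2 * np.pi * hours / 24)  # synthetic diurnal cycle, hourly samples
print(skewness(ozone), kurtosis(ozone), dominant_period(ozone))
```

A pure sinusoid is strongly sub-Gaussian (kurtosis 1.5), whereas Gaussian noise gives a kurtosis near 3.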

running time, which is acceptable and meets the requirement of the real-time application.

A. Sensor anomaly detection: False anomalies TABLE IV In this section, we assess the performance of the DBN- DETECTIONRESULTSFROMTHE DBN-OCSVM ALGORITHM. OCSVM algorithm in detecting sensor faults (i.e., faults in sen- sor readings). We note that high ozone concentrations related to sensor faults can be observed outside the summer period The ozone data with atypical ozone measurements that and even in a night, which is not the case for ozone pollution occurred on June 22nd and 23rd, 2017 and the corresponding produced via photochemical ozone. Also, sensor faults may DBN-OCSVM results are demonstrated in plots (a) and (b) produce unusually high values, from 150 to over 600 µg/m3. of Figure 14. The output of DBN-OCSVM is ’1’ in the case Here, we inject three types of faults to the testing datasets and of normal measurements and ’-1’ in the case of anomalies in assess the efficiency of the proposed algorithm. In the first case ozone data. We see that the DBN-based OCSVM chart detects study, we introduce a single bias fault (i.e., in one variable) the abnormal ozone measurements. to the testing data from S2, where the measured values are The proposed approach is running entirely using a single shifted by a bias, x(t) + b, from the correct measurements. In CPU desktop machine with Intel i7 4770 CPU based on the second case study, we simultaneously inject multiple bias Streaming SIMD Extensions (SSE) technology. This approach faults into the data from S3 and S6. In the third case study, is fast and provides the detection result in around 10ms we introduce an intermittent fault into the data of S5. Here,

1558-1748 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2018.2852001, IEEE Sensors Journal IEEE SENSORS JOURNAL 8

studied algorithms are presented in Table V.

TABLE V VALUESOFPARAMETERSUSEDINTHESTUDIEDSCHEMES.

Models Parameter Value weights uniform Learning rate 0.01 Training epochs 100 Birch branch 50 init k-means++ K-means init 10 iteration 300 covarianceType full EM covar 1e-06 iteration 100

1) Case A: Single abrupt fault: Here, we evaluate the per- formance of the DBN-OCSVM algorithm in detecting a single bias sensor fault. We introduced the bias to measurements collected by S2. Then, we apply the DBN-OCSVM algorithm Fig. 15. Original ozone data (plot(a)). The DBN-OCSVM scheme (plot(b)) and compute the AUC values for each fault magnitude between in the case with abnormal measurements on 29 August 2017. 5% and 100% of the total variation found in the raw data. Figure 17 shows the AUC values of DBN3, DBN2, RBM and DSA-based OCSVM algorithms, and standalone OCSVM for different fault magnitudes.

Fig. 17. AUC comparison of the studied fault detection methods (i.e., DBN3, DBN2, DBN1RBM, and DSA-based OCSVM and standalone OCSVM meth- ods) for different fault magnitudes (case A).

Fig. 16. Original ozone data (plot (a)) and the DBN-OCSVM scheme (plot (b)) in the case with abnormal measurements on July 7th, 2017.

We utilize the Area Under the Curve (AUC) metric to evaluate the accuracy of the proposed algorithm. An AUC of 1 corresponds to ideal detection, and an AUC > 0.9 corresponds to good detection. In this section, we compare the proposed DBN3-OCSVM approach with the RBM-, DBN2-, and deep stacked autoencoder (DSA) [38]-based OCSVM methods and with the DBN-based Expectation Maximization (EM) [39], Birch [40], and K-means [41] approaches. Here, DBN2 comprises two hidden layers, and the proposed DBN model comprises three. The experimental parameters of the

From Figure 17, we see that the DBN-OCSVM algorithm performs well in detecting the single bias fault, even when its magnitude is relatively small. The results in Figure 17 also confirm the superiority of the DBN-OCSVM approach over the RBM-, DBN2-, and DSA-based OCSVM and the standalone OCSVM algorithms. In particular, the deeper DBN architecture with three layers provides more accurate detection than the RBM- and DBN2-based OCSVM approaches. Indeed, adding more layers to the network may increase the capability of the model to extract relevant features from the input data. We also compared the integrated DBN-OCSVM with the standalone OCSVM and found that the former performs better in detecting abnormalities in ozone data. Compared with DSA-OCSVM, the proposed approach also delivers higher detection performance. This is mainly due to the flexibility and efficiency of the DBN model in extracting the important features from the data and the high capacity of OCSVM to discriminate quantitatively between abnormal and normal features. Thus, combining the OCSVM algorithm with deep learning models improves its detection capacity.
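The evaluation protocol used in these case studies (inject a bias fault into fault-free test data, score every sample, and summarize detection quality with the AUC) can be sketched in standard-library Python. This is a minimal, hypothetical sketch, not the authors' implementation: the ozone-like signal is synthetic, the fault magnitude is assumed to be a fraction of the range (max minus min) of the raw series, the sample intervals (410 to 440 and 502 to 520, as in Case C) are taken as inclusive, and a per-hour z-score stands in for the DBN-OCSVM detector. All function names are illustrative.

```python
import math
import random


def simulate_ozone(n_hours, seed):
    """Hypothetical hourly ozone-like signal: a daily cycle plus Gaussian noise."""
    rng = random.Random(seed)
    return [60 + 30 * math.sin(2 * math.pi * (t % 24) / 24) + rng.gauss(0, 5)
            for t in range(n_hours)]


def inject_bias(series, intervals, magnitude_frac):
    """Add a constant bias on each inclusive [start, end] interval.

    The bias is `magnitude_frac` times the range (max - min) of the raw
    series; reading "total variation" as the range is an assumption.
    Returns the faulty copy and per-sample labels (1 = faulty).
    """
    bias = magnitude_frac * (max(series) - min(series))
    faulty, labels = list(series), [0] * len(series)
    for start, end in intervals:
        for i in range(start, end + 1):
            faulty[i] += bias
            labels[i] = 1
    return faulty, labels


def fit_daily_profile(train):
    """Per-hour-of-day mean and sample standard deviation of fault-free data."""
    by_hour = [[] for _ in range(24)]
    for t, x in enumerate(train):
        by_hour[t % 24].append(x)
    means = [sum(v) / len(v) for v in by_hour]
    stds = [math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))
            for v, m in zip(by_hour, means)]
    return means, stds


def anomaly_scores(series, profile):
    """Stand-in detector: absolute z-score against the learned daily profile.

    In the paper's pipeline this role is played by an OCSVM on DBN features.
    """
    means, stds = profile
    return [abs(x - means[t % 24]) / stds[t % 24] for t, x in enumerate(series)]


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (higher score = more anomalous).

    Tied scores receive their average rank, so each tie counts as 1/2.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both normal and faulty samples")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Usage: intermittent bias on two intervals of a 600-sample test set,
# evaluated for several fault magnitudes.
train = simulate_ozone(24 * 30, seed=0)
test = simulate_ozone(600, seed=1)
profile = fit_daily_profile(train)
results = {}
for frac in (0.05, 0.10, 0.20, 0.40):
    faulty, labels = inject_bias(test, [(410, 440), (502, 520)], frac)
    results[frac] = auc(labels, anomaly_scores(faulty, profile))
    print(f"fault magnitude {frac:.0%}: AUC = {results[frac]:.3f}")
```

To plug in a real detector, one would replace `anomaly_scores` with, for example, the negated output of scikit-learn's `OneClassSVM.decision_function` applied to DBN features of the test data (that method returns positive values for inliers); `sklearn.metrics.roc_auc_score` computes the same rank-based AUC as `auc` above.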

We also compared the detection efficiency of the proposed approach with that of the DBN-based EM, K-means, and Birch algorithms (Figure 18). Here, these clustering schemes are applied to the features from the last layer of the DBN model. Figure 18 indicates that DBN-OCSVM outperforms the DBN-based EM, K-means, and Birch algorithms. Unlike the EM algorithm, OCSVM does not rely on assumptions about data convexity. In the OCSVM algorithm, a kernel function projects the input data into a higher-dimensional space to achieve a clear separation between normal and abnormal features. In addition, unlike the Birch scheme, OCSVM is insensitive to the order of the data records. Unsupervised OCSVM is easy to implement and requires only fault-free data for training, without the need for any labeled data. Moreover, the proposed technique showed higher sensitivity to abrupt changes in ozone measurements than the other studied approaches.

Fig. 18. AUC comparison of the DBN-based OCSVM, EM, K-means, and Birch methods for different fault magnitudes (case A).

2) Case B: Multiple abrupt faults: We simulate a bias sensor fault by simultaneously introducing a small constant change to the ozone measurements of both S3 and S6 between sample numbers 240 and 300 of the testing data. The AUC values of the DBN-OCSVM scheme with different numbers of hidden layers (i.e., RBM, DBN2, and DBN3), the DSA-OCSVM, and the standalone OCSVM algorithms, computed for different fault magnitude values, are presented in Figure 19. Comparing the RBM-, DBN2-, and DBN3-based OCSVM algorithms, we conclude that adding more hidden layers to the DBN model enhances fault detection performance. The figure shows that the DBN-OCSVM scheme performs well in detecting multiple faults (e.g., faults with 20% magnitude are detected with AUC = 0.916). The results in Figure 19 confirm the superiority of the DBN3-OCSVM approach over the DSA-OCSVM and the standalone OCSVM. Unlike the deep models, the standalone OCSVM is less effective in detecting small and moderate anomalies in ozone measurements; its detection capability decreases for small faults (e.g., faults with 20% magnitude are detected with AUC = 0.559).

Fig. 19. AUC comparison of the studied fault detection methods (i.e., DBN3-, DBN2-, RBM-, and DSA-based OCSVM and standalone OCSVM methods) for different fault magnitudes (case B).

Next, the performance of the DBN-OCSVM approach in detecting multiple anomalies in multivariate ozone data is compared with that of the DBN-based EM, K-means, and Birch algorithms (Figure 20). As in the previous case, the results clearly indicate the superiority of DBN-OCSVM in detecting multiple anomalies; overall, DBN-OCSVM showed the best detection quality for abnormalities in ozone data.

Fig. 20. AUC comparison of the DBN-based OCSVM, EM, K-means, and Birch methods for different fault magnitudes (case B).

3) Case C: Intermittent faults: Here, we assess the DBN-OCSVM scheme in the presence of intermittent sensor faults that occur and disappear repeatedly. We introduce a simulated intermittent fault into the raw data of S5. Specifically, we inject a bias into the ozone concentration measured by S5 from samples 410 to 440 and from samples 502 to 520 of the testing data. To evaluate the sensitivity of DBN-OCSVM to these intermittent faults, we vary the amplitude of the bias fault and compute the AUC obtained by the DBN3-OCSVM, DBN2-OCSVM, RBM-OCSVM, DSA-OCSVM, and standalone OCSVM algorithms. The AUC as a function of the fault magnitude is shown in Figure 21. The DBN-OCSVM algorithm performs well in detecting intermittent sensor faults: it detects intermittent faults with a magnitude of 20% of the total variation found in the raw data with an AUC of around 0.89, which is very promising. Our study again testifies that the performance of the OCSVM chart
is improved when applied to features from the DBN model rather than from the RBM, DBN2, and DSA.

Fig. 21. AUC comparison of the studied fault detection methods (i.e., DBN3-, DBN2-, RBM-, and DSA-based OCSVM and standalone OCSVM methods) for different fault magnitudes (case C).

The superiority of the DBN-OCSVM approach over the other charts in detecting intermittent anomalies in ozone measurements is confirmed by the results in Figure 22, which show that DBN-OCSVM achieves the highest AUC values compared with the DBN-based EM, K-means, and Birch approaches.

Fig. 22. AUC comparison of the DBN-based OCSVM, EM, K-means, and Birch methods for different fault magnitudes (case C).

Thus, our results demonstrate that the DBN-OCSVM algorithm performs reasonably well in detecting real ozone pollution and abnormal measurements resulting from sensor malfunctions. This study shows, in particular, the benefit of integrating a DBN model, which extracts the features of high-dimensional data, with OCSVM, which differentiates new data from the normal training data. The results also demonstrate that the proposed DBN-OCSVM methodology performs well compared with the DSA-based OCSVM and the DBN-based EM, K-means, and Birch schemes.

VI. CONCLUSION

Reliable monitoring of air quality is an essential element for improving air quality management and ensuring a proper environment for good health and quality of life. In this paper, an unsupervised anomaly detection strategy is designed for monitoring ozone measurements. This strategy combines DBN modeling with the sensitivity of the OCSVM scheme to small changes. We first evaluated the efficiency of the developed algorithm in detecting abnormal ozone pollution in real data from Isère, France. The developed approach correctly detected all anomalies reported by the experts of the Atmo Auvergne-Rhône-Alpes association. We also applied this approach to detect sensor anomalies. Furthermore, we compared the efficiency of the proposed approach with RBM- and DSA-based OCSVM and with DBN-based K-means, EM, and Birch algorithms. The results verify the ability of the DBN-OCSVM methodology to identify abnormalities in ozone measurements. As future work, it would be interesting to incorporate more data inputs, such as meteorological data or other pollutants, to enhance the efficiency of the proposed approach.

ACKNOWLEDGEMENT

REFERENCES

[1] M. Bacco, F. Delmastro, E. Ferro, and A. Gotta, "Environmental monitoring for smart cities," IEEE Sensors Journal, vol. 17, no. 23, pp. 7767–7774, 2017.
[2] K. B. Shaban, A. Kadri, and E. Rezk, "Urban air pollution monitoring system with forecasting models," IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.
[3] K. S. E. Phala, A. Kumar, and G. P. Hancke, "Air quality monitoring system based on ISO/IEC/IEEE 21451 standards," IEEE Sensors Journal, vol. 16, no. 12, pp. 5037–5045, 2016.
[4] F. Harrou, L. Fillatre, and I. Nikiforov, "Bounded nuisance rejection and redundant sensor network," in International Conference, System Identification and Control Problems, SICPRO'09. Moscow, 2009, pp. 786–795.
[5] J.-Y. Kim, C.-H. Chu, and S.-M. Shin, "ISSAQ: An integrated sensing systems for real-time indoor air quality monitoring," IEEE Sensors Journal, vol. 14, no. 12, pp. 4230–4244, 2014.
[6] H. Moshammer, "Communicating health impact of air pollution," in Air Pollution. InTech, 2010.
[7] F. Biancofiore, M. Verdecchia, P. Di Carlo, B. Tomassetti, E. Aruffo, M. Busilacchio, S. Bianco, S. Di Tommaso, and C. Colangeli, "Analysis of surface ozone using a recurrent neural network," Science of the Total Environment, vol. 514, pp. 379–387, 2015.
[8] L. E. Plummer, S. Smiley-Jewell, and K. E. Pinkerton, "Impact of air pollution on lung inflammation and the role of toll-like receptors," International Journal of Interferon, Cytokine and Mediator Research, vol. 4, no. 1, pp. 43–57, 2012.
[9] O. Taylan, "Modelling and analysis of ozone concentration by artificial intelligent techniques for estimating air quality," Atmospheric Environment, vol. 150, pp. 356–365, 2017.
[10] J.-S. Hwang and C.-C. Chan, "Effects of air pollution on daily clinic visits for lower respiratory tract illness," American Journal of Epidemiology, vol. 155, no. 1, pp. 1–10, 2002.
[11] F. Harrou, M. Nounou, and H. Nounou, "Statistical detection of abnormal ozone levels using principal component analysis," International Journal of Engineering & Technology, vol. 12, no. 6, pp. 54–59, 2012.
[12] M. P. Rissanen, T. Kurtén, M. Sipila, J. A. Thornton, J. Kangasluoma, N. Sarnela, H. Junninen, S. Jørgensen, S. Schallhart, M. K. Kajos et al., "The formation of highly oxidized multifunctional products in the ozonolysis of cyclohexene," Journal of the American Chemical Society, vol. 136, no. 44, pp. 15596–15606, 2014.
[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149–158, 2016.
[14] S. A. Abdul-Wahab, C. S. Bakheit, and S. M. Al-Alawi, "Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations," Environmental Modelling & Software, vol. 20, no. 10, pp. 1263–1271, 2005.
[15] C. Vlachokostas, S. Nastis, C. Achillas, K. Kalogeropoulos, I. Karmiris, N. Moussiopoulos, E. Chourdakis, G. Banias, and N. Limperi, "Economic damages of ozone air pollution to crops using combined air quality and GIS modelling," Atmospheric Environment, vol. 44, no. 28, pp. 3352–3361, 2010.
[16] C. Dueñas, M. Fernández, S. Canete, J. Carretero, and E. Liger, "Analyses of ozone in urban and rural sites in Málaga (Spain)," Chemosphere, vol. 56, no. 6, pp. 631–639, 2004.
[17] F. Harrou, F. Kadri, S. Khadraoui, and Y. Sun, "Ozone measurements monitoring using data-based approach," Process Safety and Environmental Protection, vol. 100, pp. 220–231, 2016.
[18] F. Harrou, A. Dairi, Y. Sun, and M. Senouci, "Reliable detection of abnormal ozone measurements using an air quality sensors network," in 2018 IEEE International Conference on Environmental Engineering (EE). IEEE, 2018, pp. 1–5.
[19] F. Harrou, L. Fillatre, M. Bobbia, and I. Nikiforov, "Statistical detection of abnormal ozone measurements based on constrained generalized likelihood ratio test," in Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on. IEEE, 2013, pp. 4997–5002.
[20] Y.-D. Zhang, Y. Zhang, X.-X. Hou, H. Chen, and S.-H. Wang, "Seven-layer deep neural network based on sparse autoencoder for voxelwise detection of cerebral microbleed," Multimedia Tools and Applications, vol. 77, no. 9, pp. 10521–10538, 2018.
[21] W. Jia, M. Yang, and S.-H. Wang, "Three-category classification of magnetic resonance hearing loss images based on deep autoencoder," Journal of Medical Systems, vol. 41, no. 10, p. 165, 2017.
[22] Y. Jia, J. Wu, and Y. Du, "Traffic speed prediction using deep learning method," in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 1217–1222.
[23] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G.-Z. Yang, "Deep learning for health informatics," IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
[24] O. Costilla-Reyes, P. Scully, and K. B. Ozanyan, "Deep neural networks for learning spatio-temporal features from tomography sensors," IEEE Transactions on Industrial Electronics, 2017.
[25] A. Koesdwiady, R. Soua, and F. Karray, "Improving traffic flow prediction with weather information in connected cars: A deep learning approach," IEEE Transactions on Vehicular Technology, vol. 65, no. 12, pp. 9508–9517, 2016.
[26] J. Wang, X. Zhang, Q. Gao, H. Yue, and H. Wang, "Device-free wireless localization and activity recognition: A deep learning approach," IEEE Transactions on Vehicular Technology, vol. 66, no. 7, pp. 6258–6267, 2017.
[27] X. Wang, L. Gao, S. Mao, and S. Pandey, "CSI-based fingerprinting for indoor localization: A deep learning approach," IEEE Transactions on Vehicular Technology, vol. 66, no. 1, pp. 763–776, 2017.
[28] J. H. Seinfeld and S. N. Pandis, Atmospheric Chemistry and Physics: From Air Pollution to Climate Change. John Wiley & Sons, 2016.
[29] B. Özbay, G. A. Keskin, Ş. Ç. Doğruparmak, and S. Ayberk, "Multivariate methods for ground-level ozone modeling," Atmospheric Research, vol. 102, no. 1-2, pp. 57–65, 2011.
[30] A. Dairi, F. Harrou, M. Senouci, and Y. Sun, "Unsupervised obstacle detection in driving environments using deep-learning-based stereovision," Robotics and Autonomous Systems, vol. 100, pp. 287–301, 2018.
[31] A.-r. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012.
[32] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[33] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[34] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, 2007, pp. 153–160.
[35] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[36] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.
[37] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 437–478.
[38] Y. Bengio et al., "Learning deep architectures for AI," Foundations and Trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[39] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–38, 1977.
[40] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: An efficient data clustering method for very large databases," in ACM SIGMOD Record, vol. 25, no. 2. ACM, 1996, pp. 103–114.
[41] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007, pp. 1027–1035.
