Detecting Abnormal Ozone Measurements with a Deep Learning-Based Strategy

Item Type: Article
Authors: Harrou, Fouzi; Dairi, Abdelkader; Sun, Ying; Kadri, Farid
Citation: Harrou F, Dairi A, Sun Y, Kadri F (2018) Detecting Abnormal Ozone Measurements With a Deep Learning-Based Strategy. IEEE Sensors Journal 18: 7222–7232. Available: http://dx.doi.org/10.1109/JSEN.2018.2852001.
Eprint version: Post-print
DOI: 10.1109/JSEN.2018.2852001
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Sensors Journal
Rights: (c) 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Download date: 04/10/2021 08:33:37
Link to Item: http://hdl.handle.net/10754/628364

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2018.2852001, IEEE Sensors Journal.

1558-1748 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fouzi Harrou, Member, IEEE, Abdelkader Dairi, Ying Sun, Farid Kadri

F. Harrou and Y. Sun are with King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, 23955-6900, Saudi Arabia. E-mail: [email protected]
A. Dairi is with the Computer Science Department, University of Oran 1 Ahmed Ben Bella, Algeria Street El senia el mnouer bp 31000 Oran, Algeria. E-mail: [email protected]
F. Kadri is with Sopra Steria Group, Big Data & Data Analytics Team, 31770 Colomiers, France. E-mail: [email protected]

Abstract: Air quality management and monitoring are vital to maintaining clean air, which is necessary for the health of humans, vegetation, and ecosystems. Ozone is one of the main pollutants that negatively affect human health and ecosystems. This paper reports the development of an unsupervised and efficient scheme for detecting anomalies in unlabelled ozone measurements. This scheme combines a Deep Belief Network (DBN) model and a one-class support vector machine (OCSVM). The DBN model accounts for nonlinear variations in the ground-level ozone concentrations, while the OCSVM detects the abnormal ozone measurements. The performance of this approach is evaluated using real data from Isère in France. We also compare the detection quality of DBN-based detection schemes to that of deep stacked auto-encoders, Restricted Boltzmann Machine-based OCSVM, and DBN-based clustering procedures (i.e., K-means, Birch, and Expectation Maximization). The results show that the developed strategy is able to identify anomalies in ozone measurements.

Index Terms: Air quality, learning, OCSVM, Ozone, DBNs, statistical monitoring, anomaly detection.

I. INTRODUCTION

To achieve acceptable air quality and better understand air pollution phenomena, investigations into the health influences of ambient air pollution have attracted numerous efforts throughout the last few decades [1]–[6]. The surveillance of atmospheric pollution is of great importance due to the negative effects of pollution on human health and the ecosystem [7], [8]. High levels of air pollution have become an important issue and can lead to serious impacts on human health [2], [9], [10]. Pollution from ozone (O3) formed in the troposphere is a growing issue in industrialized countries, mainly due to its participation in greenhouse gas emissions [7]. Accordingly, air quality management and monitoring are key to assuring a healthy living environment [11].

Air quality is a challenging problem that is gaining increasing attention worldwide [12], [13]. Ground-level ozone (O3), which is an important pollutant that affects everyone's health, is basically generated by a chemical reaction of nitrogen and carbon-based compounds discharged in vehicle exhausts [12], [13]. The reactions creating ozone are favored by bright sunlight and usually result in a severe air quality issue named photochemical smog [14]–[16]. Thus, there is a need for a systematic approach to detecting abnormal ozone levels, which provides pertinent information for enhancing air quality management or supporting rapid decision-making, such as warning the public of harmful ozone levels or checking the sensors in case a technical problem is causing anomalies [17]–[19]. This paper presents a flexible monitoring scheme to detect abnormal ozone data.

Recently, deep learning-based feature extraction methodologies have come to play a considerable role in the literature [20]–[24]. Deep learning methods were designed to model complex systems with flexibility, simplicity, and strength using a series of multilayer architectures. For instance, they are used to enhance intelligent transportation systems [22], [25], health informatics [23], human action recognition [26], detection of cerebral microbleed voxels [20], classification of hearing loss images [21], and fingerprint-based indoor positioning via WiFi [27]. Due to their broad applications, Deep Belief Network (DBN) models have received much attention from researchers recently. However, to the best of our knowledge, DBN models have not been employed for ozone pollution monitoring.

Constructing a fast and precise model of ozone variation is very challenging, mainly because the mechanisms of ozone production in the troposphere are complex and there are measurement uncertainties in all the parameters involved [28], [29]. To bypass this difficulty, in this study we use a DBN to model the variability in ozone data. The DBN performs well in learning complex nonlinearity layer by layer [30]. The purpose is to design a deep learning-based methodology capable of detecting abnormal ozone measurements. Basically, this method combines a DBN model with a one-class support vector machine (OCSVM) to benefit simultaneously from the ability of the DBN model to extract features from high-dimensional data and from the anomaly detection capability of the OCSVM. We assess our approach via data obtained from the Isère region in France. We also compare the detection quality of DBN-based detection schemes to that of deep stacked auto-encoders (DSA), Restricted Boltzmann Machine-based OCSVM, and other clustering procedures.

The study site and data collection are briefly described in Section II. Section III reviews the DBN model and the OCSVM scheme. Section IV presents the designed monitoring technique. In Section V, we assess the efficiency of the developed approach using real data, and conclusions are presented in Section VI.

II. STUDIED AIR QUALITY MONITORING NETWORK

In this paper, we monitor ozone measurements from Isère, France (see Figure 1).
Atmo Auvergne-Rhône-Alpes is the establishment in charge of supervising air quality in this region. Hourly ozone data are collected via 14 stations dispersed throughout the region (see Table I). Each station consists of a room with many analyzers (see Figure 2). The ambient air is taken in from the roof of each measuring station and pumped to the measuring instruments located inside the station. Each analyzer is dedicated to a specific pollutant (e.g., ozone). The ambient air is continuously analyzed 24 hours a day and 7 days a week. The data measured at the network stations are transmitted by telephone line to a central computer, stored in a local database, processed statistically, and validated manually every day by the Atmo team. The validated data are then sent to a national database. The spatial distribution of measurement stations is shown in Figure 3.

TABLE I. MEASUREMENT STATIONS IN ISÈRE REGION.
Fig. 1. Location of Isère in France.
Fig. 2. Measurement station.
Fig. 3. Network of measurement stations for the Isère region.

III. PRELIMINARY MATERIAL

Here, we briefly describe the DBN model and the OCSVM scheme, which are used to construct the proposed detection approach.

A. Deep Belief Networks

DBN models are basically obtained by stacking a succession of Restricted Boltzmann Machines (RBMs). The latter are stochastic neural networks (see Figure 5), which can be viewed as a special case of Boltzmann machines (BM). They are efficient at estimating the probability distribution of input data. RBMs are generative models that can be used to generate new data by sampling from the model. They are defined with the following restriction: there is no interaction between hidden variables, i.e., neurons belonging to the same layer are not connected. However, each neuron is fully connected to the neurons in the other layer. RBMs are undirected graphs with a simple structure composed of a visible layer v representing the (observed) input data and a hidden (or latent) layer corresponding to the feature detectors. The interaction between the [...] a weight matrix Wk with optional bias vectors for both the visible and hidden layers.

Fig. 4. Structure of an RBM model.
Fig. 5. A diagrammatic representation of a DBN model with three hidden layers.
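The bipartite structure described above (no connections within a layer, full connections between the visible and hidden layers) can be illustrated with a minimal sketch. It is not the authors' implementation: the class name TinyRBM, the binary units, the random initial weights, and the absence of any training step are all assumptions made for illustration. Because units in the same layer are not connected, the probability of each hidden unit depends only on the visible layer, and vice versa.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the RBM conditional probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Hypothetical binary RBM: visible layer v, hidden layer h,
    one weight matrix W and one bias vector per layer (untrained)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # W[i][j] links visible unit i to hidden unit j. There are no
        # visible-visible or hidden-hidden weights (the "restriction").
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit j (the extracted features)."""
        return [
            sigmoid(self.b_hidden[j]
                    + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
            for j in range(self.n_hidden)
        ]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit i."""
        return [
            sigmoid(self.b_visible[i]
                    + sum(h[j] * self.W[i][j] for j in range(self.n_hidden)))
            for i in range(self.n_visible)
        ]

    def sample(self, probs):
        """Draw one binary state per unit from its Bernoulli probability."""
        return [1 if self.rng.random() < p else 0 for p in probs]

    def gibbs_step(self, v):
        """One alternating Gibbs step v -> h -> v', used to generate data."""
        h = self.sample(self.hidden_probs(v))
        return self.sample(self.visible_probs(h))


# Usage: map a toy binary input to hidden features, then resample it.
rbm = TinyRBM(n_visible=6, n_hidden=3)
v0 = [1, 0, 1, 1, 0, 0]
features = rbm.hidden_probs(v0)
v1 = rbm.gibbs_step(v0)
```

In a stacked model, the hidden probabilities of one RBM would serve as the visible input of the next. Real-valued inputs such as ozone concentrations would need either rescaling to [0, 1] or a different visible-unit type, which this sketch does not cover.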
