<<

applied sciences

Article Pollutant Recognition Based on Supervised Machine Learning for Monitoring Systems

Shaharil Mad Saad 1,* ID , Allan Melvin Andrew 2, Ali Yeon Md Shakaff 3, Mohd Azuwan Mat Dzahir 1, Mohamed Hussein 1, Maziah Mohamad 1 and Zair Asrar Ahmad 1 1 Faculty of Mechanical Engineering, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia; [email protected] (M.A.M.D.); [email protected] (M.H.); [email protected] (M.M.); [email protected] (Z.A.A.) 2 Center for Diploma Studies (CDS), Universiti Malaysia Perlis (UniMAP), Kampus UniCITI Alam, Chuchuh, Padang Besar 02100, Malaysia; [email protected] 3 Center of Excellence for Advanced Sensor Technology (CEASTech), Universiti Malaysia Perlis (UniMAP), Taman Muhibbah, Jejawi, Arau 02600, Malaysia; [email protected] * Correspondence: [email protected]; Tel.: +60-7-553-4673

Received: 3 July 2017; Accepted: 8 August 2017; Published: 11 August 2017

Abstract: Indoor air may be polluted by various types of pollutants which may come from cleaning products, construction activities, perfumes, cigarette smoke, water-damaged building materials and outdoor pollutants. Although these gases are usually safe for humans, they could be hazardous if their amount exceeded certain limits of exposure for human health. A sophisticated indoor air quality (IAQ) monitoring system which could classify the specific type of pollutants is very helpful. This study proposes an enhanced indoor air quality monitoring system (IAQMS) which could recognize the pollutants by utilizing supervised machine learning algorithms: multilayer perceptron (MLP), K-nearest neighbour (KNN) and linear discrimination analysis (LDA). Five sources of indoor air pollutants have been tested: ambient air, combustion activity, presence of chemicals, presence of fragrances and presence of food and beverages. The results showed that the three algorithms successfully classify the five sources of indoor (IAP) with a classification rate of up to 100 percent. An MLP classifier with a model structure of 9-3-5 has been chosen to be embedded into the IAQMS. The system has also been tested with all sources of IAP presented together. The result shows that the system is able to classify when single and two mixed sources are presented together. However, when more than two sources of IAP are presented at the same period, the system will classify the sources as ‘unknown’, because the system cannot recognize the input of the new pattern.

Keywords: indoor air quality; supervised machine learning; pollutants recognition

1. Introduction The issue of outdoor air pollution (OAP), such as haze, is well known to the public due to the attention given to it by the media. In contrast, the issue of indoor air pollution (IAP) is less known to the public, although IAP poses similar threats towards human health. In fact, more attention should be given to the issue of IAP, because people normally spend 90% of their time in indoor environments [1]. IAP, which undermines indoor air quality (IAQ), is found to contain indoor air pollutants such as harmful gases and contaminants at a concentration level up to five times higher than the concentration of these pollutants in normal air. In severe cases, the concentration level of the pollutants could rise up to 100 times more than a normal concentration level, which is hazardous for human health [1]. IAP can be defined as the disturbance of any gases, materials or human activities on the state of ambient air in indoor environment [2]. In other words, IAP does not need the presence of disease or infirmity. As long as the concentration in the normal ambient air is disturbed either by gases, other materials or combustion activity, it is already considered as pollution. The most common sources

Appl. Sci. 2017, 7, 823; doi:10.3390/app7080823 www.mdpi.com/journal/applsci Appl. Sci. 2017, 7, 823 2 of 21 of IAP are contributed by combustion activities such as cigarette smoking, wood or paper burning, gas stoves, gas-fired dryers and engines emission [3,4]. Other materials, such as chemical products, building materials and office materials, are also major sources of IAP. Chemical products such as air fresheners and cleaning products contribute to IAP by emitting volatile organic compounds (VOCs), while building and office materials such as printers and carpets also release air contaminants such as dust into the indoor air atmosphere [5–9]. IAP may also emerge as a result of variations of gas concentration in the indoor air, as well as from variations in thermal conditions such as temperature and humidity [10,11]. Thermal or physical conditions such as temperature, relative humidity (RH) and air movement are important IAP parameters as they could affect people’s perception towards IAP. These physical parameters can act directly on building occupants, interact with indoor air pollution factors or affect human responses to the indoor environment [12–16]. These sources of IAP might not look harmful, but to a certain extent (based on the concentration level) they may be hazardous to human health. The health effects from IAP may be experienced soon after exposure, or possibly, years later, depending on the individuals and type of pollutants they have been exposed to [17]. The common health conditions that may show up immediately include irritation of the eyes, nose, and throat, headaches, dizziness, fatigue, and humidifier fever. The health effects of years of exposure to IAP are more dangerous and can be fatal; these include some respiratory-related diseases, heart disease, and cancer [18,19]. The health effects relating to poor indoor air quality have been divided into four categories: environmental tobacco smoke (ETS), sick building syndrome (SBS), building related illness (BRI), and thermal comfort problems (TCP) [10,13,20]. Hence, meticulous attention should be given to make sure the indoor air is safe and comfortable. Indoor air may be polluted by various types of pollutants which may come from cleaning products, construction activities, perfumes, cigarette smoke, water-damaged building materials and outdoor pollutants [21]. Although these gases are usually safe for humans, they could be hazardous to human beings, especially people with respiratory-related problem and children, if their amount exceeded certain limits of exposure for human health. In terms of the current technology, certain sensing devices have allowed researchers to get continuous, quick and reliable information about ambient air [21] which enable advanced data processing to be applied to the data of ambient air. Forecasting techniques allow the system to predict level of IAQ while pattern recognition techniques allow the system to recognize certain types of smell. However, these advanced data processing techniques were mainly used to investigate the odour in outdoor environments [22–25]. In indoor environments, odour recognition was usually used to detect odour from a single category; for example, the use of odour recognition to recognize types of mushrooms [26], types of oil flowers (lavender, hyssop, geranium and rosemary) [27] or types of pure chemical only (ethanol, hexanal, linalool, and ammonia) [28]. On the other hand, some other techniques, such as computational fluid dynamics (CFD), are used to predict IAP in indoor environments [29,30]. CFD is used to evaluate the IAQ for buildings near to the road traffic environment. This study proposed an enhancement of the IAQMS, where the system is integrated with sources of pollution recognition. The proposed IAQMS has been developed and designed using an array of sensors which can also effectively function as an electronic nose, meaning that it could measure multiple pollutants that influenced indoor air levels. The pattern recognition algorithm allows the system to recognize the sources of IAP from five conditions: ambient air, combustion activity, presence of fragrance, presence of chemical and presence of food and beverages. As for the sources influencing IAP recognition ability, this study utilized a machine learning algorithm that is widely used in data mining. In order to find the best classifier among the family of machine learning algorithms, three algorithms which have been used in many applications—especially involving odour or smell classification—have been chosen. The three algorithms chosen are multilayer perceptron (MLP), K-nearest neighbour (KNN) and linear discrimination analysis (LDA). Then, the classifier that provided the best classification results is chosen to be embedded into the system. Appl. Sci. 2017, 7, 823 3 of 21

Appl. Sci. 2017, 7, 823 3 of 21 2. Overview of Developed IAQMS System 2. Overview of Developed IAQMS System Basically, the developed system consists of the sensor module cloud (SMC), base station and service-orientedBasically, client, the developed as shown system in Figure consists1. The of th sensore sensor module module cloud cloud (SMC)(SMC), containsbase station collections and of sensorservice-oriented modules client, that measure as shown thein Figure air quality 1. The sensor data andmodule transmit cloud (SMC) the captured contains collections data to the of base stationsensor through modules a wireless that measure network. the air Each quality sensor data an moduled transmit includes the captured an integrated data to the sensor base station array that through a wireless network. Each sensor module includes an integrated sensor array that can can measure IAQ parameters. There are various IAQ parameters involved in measuring the IAQ measure IAQ parameters. There are various IAQ parameters involved in measuring the IAQ level. level.These These parameters parameters are are divided divided into into four four categories categories:: physical physical condition, condition, chemical chemical contaminants, contaminants, biologicalbiological contaminants, contaminants, and and other other common common IAQ IAQ parameters. parameters. However, However, for for the the purpose purpose of of this this project, only nineproject, parameters only nine haveparamete beenrs chosen, have been which chosen, are nitrogenwhich are dioxide nitrogen (NO dioxide2), carbon (NO2), dioxide carbon (COdioxide2), (O3),(CO carbon2), ozone monoxide (O3), carbon (CO), oxygenmonoxide (O 2(CO),), VOCs oxygen and (O particulate2), VOCs and matter particulate (PM10) matter along with(PM10 temperature) along and humidity.with temperature This study and chooseshumidity. the This parameters study choo forses itsthe IAQMS parameters based for on its theIAQMS indoor based air parameterson the highlightedindoor air by parameters the Occupational highlighted Safety by the and Occupati Healthonal Administration Safety and Health (OSHA), Administration the American (OSHA), Society of Heating,the American Refrigerating Society and of Air-Conditioning Heating, Refrigeratin Engineersg and Air-Conditioning (ASHRAE), the Engineers United States (ASHRAE), Environmental the ProtectionUnited Agency States Environmental (US EPA) and Protection Malaysian Agency regulations (US EPA) on and indoor Malaysian air as stipulatedregulations byon Departmentindoor air of as stipulated by Department of Occupational Safety and Health (DOSH) [12,17,31,32]. Occupational Safety and Health (DOSH) [12,17,31,32].

FigureFigure 1. System 1. System architecture architecture for for real-timereal-time indoor indoor air air quality quality (IAQ) (IAQ) monitoring. monitoring.

TheThe sensor sensor can can be be divided divided into into three three typestypes of sensor: gas gas sensor, sensor, particle particle sensor sensor and andthermal thermal sensor.sensor. Lists Lists of sensors of sensors used used in in the the proposed proposed systemsystem along along with with their their operational operational range range are presented are presented in Tablein Table1 below. 1 below.

TableTable 1. 1.List List of of sensorssensors used in in the the system. system. No Sensor Model Sensor Type Target Parameter Typical Detection Range Required Range in IAQ No1 SensorCDM Model 4161 SensorMOX Type TargetParameter CO2 Typical 400–2000 Detection ppm Range Required 400–1000 Range ppm in IAQ 2 TGS 5342 Electrochemical CO 0–100 ppm 0–10 ppm 1 CDM 4161 MOX CO2 400–2000 ppm 400–1000 ppm 23 TGS TGS 5342 2602 Electrochemical MOX VOCsCO 0–300–100 ppm ppm 0–3 0–10 ppm ppm 34 TGSMiCS-2610 2602 MOX MOX VOCs O3 0–10–30 ppm ppm 0–0.05 0–3 ppm ppm 5 MiCS-2710 MOX NO2 0.05–5 ppm 0–0.08 ppm 4 MiCS-2610 MOX O3 0–1 ppm 0–0.05 ppm 6 KE-25 Electrochemical O2 0–100% 19.5–23% 5 MiCS-2710 MOX NO2 0.05–5 ppm 0–0.08 ppm Thermal Humidity 20–95% RH 38–70% RH 67 KE-25HSM20G Electrochemical O2 0–100% 19.5–23% ThermalThermal Temperature Humidity 20–95% 0–50 °C RH 23–26 38–70% °C RH 7 HSM20G ◦ ◦ 8 GP2Y1010AU0F Thermal Optical Temperature PM10 0–0.5 0–50 mg/mC3 0–0.15 23–26 mg/m3C 3 3 8 GP2Y1010AU0F Optical PM10 0–0.5 mg/m 0–0.15 mg/m The sensors are chosen based on their detection rate, which comply with the range required by IAQ [10,12]. Each sensor generates a voltage signal based on the current environment. These The sensors are chosen based on their detection rate, which comply with the range required by sampling voltage levels are read by the microcontroller periodically. Selecting a proper gas sensor is IAQ [10,12]. Each sensor generates a voltage signal based on the current environment. These sampling voltage levels are read by the microcontroller periodically. Selecting a proper gas sensor is a relatively Appl. Sci. 2017, 7, 823 4 of 21 Appl. Sci. 2017, 7, 823 4 of 21 complicateda relatively issue, complicated as many issue, factors as needmany to factors be taken need into to consideration.be taken into consideration. For this study, For most this of study, the gas sensorsmost of are the metal gas sensors oxide (MOX)-based, are metal oxide while (MOX)-based the rest are, while electrochemical-based. the rest are electrochemical-based. The MOX gas sensorThe MOX gas sensor is composed of a sensor cap, sensing element and sensor base. Basically, the gas is composed of a sensor cap, sensing element and sensor base. Basically, the gas sensing element is sensing element is coated with a metal oxide—tin dioxide (SnO2)—material that responds to the gas coated with a metal oxide—tin dioxide (SnO )—material that responds to the gas molecules, which molecules, which are typically volatile compounds2 [33]. It consists of two major parts; namely, the are typically volatile compounds [33]. It consists of two major parts; namely, the heater and sensor heater and sensor substrate. The substrate has two terminals, and its resistance is measured as a substrate. The substrate has two terminals, and its resistance is measured as a representation of the representation of the amount of gas concentration in the environment, while the heater provides the amountstabilized of gas temperature concentration needed in the for environment, the measurement while [34]. the heaterDue toprovides its long lifetime, the stabilized high sensitivity temperature neededresponse for theand measurement low cost, this [type34]. Dueof sensor to its is long commonly lifetime, used high in sensitivity many indoor response applications and low such cost, as this typehomes, of sensor offices is and commonly factories used appliances. in many The indoor second applications type of gas such sensor as homes, used in offices this study and factoriesis the appliances.electrochemical-based The second sensor. type of This gas type sensor of sensor used has in this high study sensitivity is the to electrochemical-based environmental change sensor.and Thisit does type not of need sensor power has highto operate. sensitivity The typical to environmental electrochemical change sensors and consis it doest of chemical not need reactants power to operate.(electrolytes The typical or gels) electrochemical and two terminals—an sensors anode consist and of chemicala cathode. reactants The anode (electrolytes is responsible or gels)for an and twooxidization terminals—an process anode and andthe cathode a cathode. is responsible The anode isfor responsible a reduction for process. an oxidization As a result, process current and is the cathodecreated is by responsible way of positive for a reductionions flowing process. to the Ascathode a result, and current the negative is created ions byflowing way ofto positivethe anode. ions flowingThe output to the is cathode directly and proportional the negative to the ions concentration flowing to theof the anode. sample The gases. output is directly proportional to the concentrationThe calibration of of the each sample gas sensor gases. has been carried out in our previous manuscript [35]. For validation,The calibration self-developed of each sensor gas nodes sensor were has validating been carried with the out commercial in our previous device. manuscriptThe validation [35 ]. Forprocedure validation, had self-developed been carried out sensor to make nodes sure were that validating the data withcollected the commercialby the sensor device. was similar The validation to the proceduredata collected had been by a commercial carried out sensor. to make The sure discussi that theon of data this collected procedure by was the limited sensor only was similarto the NO to2 the, temperature and RH sensors. High NO2 levels in the outdoor environment, originating from local data collected by a commercial sensor. The discussion of this procedure was limited only to the NO2, traffic or other combustion sources, influences the NO2 level in the indoor environment. Exposure to temperature and RH sensors. High NO2 levels in the outdoor environment, originating from local an excessive level of NO2 could be fatal [4]. The sensor validation was carried out with an Aeroqual traffic or other combustion sources, influences the NO level in the indoor environment. Exposure to Ltd.(Auckland, New Zealand) Series 500 portable indoor2 monitor device (a professional grade air an excessive level of NO could be fatal [4]. The sensor validation was carried out with an Aeroqual Ltd. quality measurement 2system) which had been pre-calibrated [36]. Three sensor nodes and the (Auckland, New Zealand) Series 500 portable indoor monitor device (a professional grade air quality Aeroqual device were placed in a clear, sealed glass container of 100 cm × 40 cm × 30 cm which was measurementcompletely sealed. system) Then, which the had gas beenconcentration pre-calibrated inside [ 36the]. sealed Three container sensor nodes was andvaried the by Aeroqual injecting device the × × wereparticular placed ingas a of clear, interest. sealed The glass outputs container of the of sens 100ors cm were40 recorded cm 30 continuously cm which was for completely1 h and plotted sealed. Then,(Figure the 2). gas concentration inside the sealed container was varied by injecting the particular gas of interest. The outputs of the sensors were recorded continuously for 1 h and plotted (Figure2).

(a)

(b)

Figure 2. Cont. Appl.Appl. Sci. Sci.2017 2017, 7,, 7 823, 823 5 of5 21 of 21

(c)

Figure 2. (a) NO2 data; (b) Temperature data; (c) Humidity data. Figure 2. (a) NO2 data; (b) Temperature data; (c) Humidity data.

Figure 2a shows the result of the NO2 sensor when 35 ppb of NO2 gas concentration was injected Figure2a shows the result of the NO 2 sensor when 35 ppb of NO2 gas concentration was injected toto the the sealed sealed container. container. It It can can be be observedobserved that th thee value value for for all all sensor sensor nodes, nodes, including including the the Aeroqual Aeroqual device, gave relatively similar readings for a one hour period. During the experiment, the same room device, gave relatively similar readings for a one hour period. During the experiment, the same room temperature setting 25 °C was applied as shown in Figure 2b, while Figure 2c shows the readings for temperature setting 25 ◦C was applied as shown in Figure2b, while Figure2c shows the readings for RH. For validation purposes, means and standard deviations for all three parameters were calculated RH. For validation purposes, means and standard deviations for all three parameters were calculated as shown in Table 2 below. Also shown in Table 2, the mean value for NO2, Temp and RH of three as shown in Table2 below. Also shown in Table2, the mean value for NO , Temp and RH of three nodes nodes (Node 1, Node 2 and Node 3) did not differ substantially from2 that of Aeroqual (a pre- (Nodecalibrated 1, Node device). 2 and This Node shows 3) did that not the differ data substantially measured for from each that developed of Aeroqual sensor (a pre-calibratedmodules provide device). a Thissimilar shows response that the with data the measured pre-calibrated for each device. developed On the other sensor hand, modules the mean provide value a for similar those response three withparameters the pre-calibrated is within an device. acceptable On the range other or hand, exposure the meanlimit as value suggested for those by threeIAQ authorities parameters such is within as an[10]. acceptable The standard range deviation or exposure (SD) limitfrom asTable suggested 2 shows byhow IAQ the authoritiesdata differed such from as the [10 mean]. The value standard for deviationeach node. (SD) Overall, from Table it shows2 shows that the how de theveloped data differedsystem provides from the reliable mean valuedata. for each node. Overall, it shows that the developed system provides reliable data. Table 2. Means and standard deviations for three parameters. Table 2. Means and standard deviations for three parameters. Node 1 Node 2 Node 3 Aeroqual Parameter Mean SD Mean SD Mean SD Mean SD Node 1 Node 2 Node 3 Aeroqual ParameterNO2 (ppb) (0–80 ppb) 35.9 2.0 34.9 1.7 34.6 1.7 35.5 2.2 Temperature (°C) (23–26 °C)Mean 25.6 SD 0.3 Mean 26.5 SD0.1 Mean25.7 0.1 SD 25.7 Mean 0.1 SD

NO2 (ppb)Humidity (0–80 ppb)(%) 38–70% 35.9 39.7 2.0 0.8 34.939.6 1.70.6 39.3 34.6 0.6 1.7 40.2 35.50.1 2.2 Temperature (◦C) (23–26 ◦C) 25.6 0.3 26.5 0.1 25.7 0.1 25.7 0.1 3. Methodology Humidity (%) 38–70% 39.7 0.8 39.6 0.6 39.3 0.6 40.2 0.1 3.1. Sources of Indoor Air Pollution 3. Methodology Higher IAP levels could lead to certain health effects, while extremely high IAP levels could be 3.1.fatal. Sources Different of Indoor IAP Airparameters Pollution may come from different sources and impose different health effects towards humans, as shown in Table 3. Table 3 shows that there are certainly a lot of activities and conditionsHigher IAPwhich levels could could trigger lead IAP, to certainsuch as health air fresheners, effects, while combustion extremely appliances, high IAP water levels damage, could be fatal.cigarettes, Different fire IAP and parameters combustion may equipment come from [10,12,1 different3]. However, sources for and the impose purpose different of this healthstudy, effectsthe towardssources humans, of IAP are as limited shown to in five Table conditions3. Table 3only shows, because that they there are are commonly certainly present a lot of in activities the indoor and conditionsenvironment: which ambient could air, trigger combustion IAP, such activity, as air the fresheners, presence combustionof chemical products, appliances, the water presence damage, of cigarettes,food and firebeverages, and combustion and the presence equipment of fragrances [10,12, [17,37–39].13]. However, The first for condit the purposeion of sources of this of study, indoor the sourcesair pollution of IAP areis the limited ambient to five air. conditionsAmbient air only, refers because to the they air that are commonlynormally exists present in inthe the indoor indoor environment:environment ambient without air,the combustionpresence of other activity, sources the presence of indoor of air chemical pollution. products, Ambient the air presence could pollute of food andthe beverages, indoor air andif it thecarries presence excessive of fragrances dust from [carpets17,37–39 and]. The furniture first condition or if it carries of sources too much of indoor ozone air pollutionfrom office is the machines ambient [17]. air. The Ambient second air condition refers to of the sources air that of normally IAP is combustion exists inthe activity. indoor Combustion environment withoutactivities the such presence as smoking of other cigarettes sources and of indoorburning air fire-wood pollution. release Ambient poisonou airs could gasespollute such as theCO indoorand airCO if it2, and carries PMexcessive at a higher dust concentration from carpets than and ambient furniture air ordoes, ifit which carries could too muchharm human’s ozone from health office machines[37,39]. For [17 ].the The purpose second of conditionthis study, ofcigarette sources sm ofoking IAP ishas combustion been chosen activity. as the proxy Combustion for combustion activities such as smoking cigarettes and burning fire-wood release poisonous gases such as CO and CO2, and PM at a higher concentration than ambient air does, which could harm human’s health [37,39]. Appl. Sci. 2017, 7, 823 6 of 21

For the purpose of this study, cigarette smoking has been chosen as the proxy for combustion activities. The third condition of sources of indoor air pollution is the presence of chemical products or substances. Chemical products such as chemical cleaning agents which are usually used in homes and offices may release VOCs at a poisonous level. Excessive level of VOCs may lead to respiratory-related diseases, such as lung cancer [17]. Thus, for the purpose of this study, chemical cleaning products will be used as the proxy for the presence of chemicals. The fourth condition for sources of indoor air pollution is the presence of food and beverages. Cooking activity or certain food and beverages emit VOCs which could lead to an uncomfortable smell inside a building [38]. VOCs are known to have led to eye irritation, headache and nausea in certain people. Therefore, rotten cooked fish, which has a strong smell and a high level of VOCs, is chosen as the proxy to food and beverages.

Table 3. Indoor air pollution (IAP) parameters, its sources and health effects to humans.

IAP Pollutant Sources Health Effects At lower concentration can cause chest pain, Electric arcing, electronic air cleaners, some O coughing, shortness of breath (asthma) and 3 copiers, and printers throat irritation Air fresheners, furniture, office equipment, Nausea, damage to the liver, mucous VOC cleaning agents membrane annoyance and asthma Fatigues in healthy, chest pain and sore eyes Combustion equipment, engines, faulty CO (low concentration) Impaired vision heating systems and headaches Cause a variety of pathological changes Combustion, gas stoves, water heaters, NO including the destruction of cilia lining 2 gas-fired dryers, cigarettes, engines respiratory airways Combustion appliances, humans present Cause occupants to grow drowsy and get CO 2 in room headaches, shortness of breath Stoves, fireplaces, cigarettes, sprays, Eye irritation effects and respiratory illness PM 10 cooking like lung cancer

O2 Photosynthesis from organisms like plants Nausea, vomiting and lethargic movements Air conditioning, fire, outdoor Hyperthermia, skin pain and can cause Temperature air temperature serious cardiac arrhythmia Cold and dry will cause skin itchiness. Humidity Unsanitary conditions and water damage Moisture cause cough, eye irritation

Finally, the presence of fragrances is the fifth condition for the sources of indoor air pollution. Fragrances such as air fresheners and perfumes usually deliver a pleasant smell. However, excessive use of perfumes may cause annoyance and headaches to certain people. In addition, air fresheners usually emit a high amount of VOCs, which may cause irritation and discomfort to certain people [17]. For this project, air freshener is used to substitute for the presence of fragrances. Table4 summarizes the five conditions for the sources of indoor air pollution and their proxies which have been used in this study.

Table 4. Sources of indoor air pollutants.

Condition Proxy Combustion Activity Cigarette Chemical Present Cleaning Agent Fragrance Present Air Freshness Food & Beverage Present Rotten Cooked Fish Ambient Air Ambient Air

3.2. Experimental Setup and Data Collection Once all the five conditions of sources of indoor air pollution were identified, an experiment simulating the five conditions was set up for data collection purposes. The experiment was conducted Appl. Sci. 2017, 7, 823 7 of 21

3.2. Experimental Setup and Data Collection Once all the five conditions of sources of indoor air pollution were identified, an experiment Appl. Sci. 2017, 7, 823 7 of 21 simulating the five conditions was set up for data collection purposes. The experiment was conducted in medium-size room of 4.5 m × 2.4 m × 2.6 m located in a concrete building which is equipped with anin air-conditioner medium-size room at a of height 4.5 m of× 2.42.2 mm× from2.6 mthe located floor, inas a shown concrete in buildingFigure 3. which The building is equipped is located with an 100air-conditioner m away from at the a height main ofroad, 2.2 mwhich from is the relatively floor, as far shown from in traffic-related Figure3. The outdoor building air is locatedpollution. 100 In m addition,away from the the location main road, of the which building is relatively is basically far from in a traffic-relatedrural area, which outdoor eliminated air pollution. the influence In addition, of urbanthe location air pollution of the on building the ambient is basically air. Thus, in a the rural ambient area, which air measured eliminated during the the influence experiment of urban is not air highlypollution influenced on the ambientby the outside air. Thus, air itself. the ambient The room air was measured a closed during environment the experiment and sealed is not by highlyusing rubber-sealinfluenced bywindows. the outside The air sensor itself. module, The room which was a is closed used environment to collect the and data sealed on the by usingindoor rubber-seal air, was installedwindows. hanging The sensor up to module, the right which of the is used wall to of collect the room the data with on a the1.1 indoorm height air, above was installed the ground; hanging a positionup to the considered right of the as wall the of breathing the room withzone afor 1.1 th me height occupants above [10]. the ground;The sensor a position module considered was powered as the usingbreathing a 7.5 zoneV adaptor for the and occupants was programmed [10]. The sensor to send module the data was poweredto the base using station a 7.5 every V adaptor one minute. and was Theprogrammed data collection to send was the conducted data to the baseover station16 days every between one minute. 9:00 a.m. The and data 5:00 collection p.m. with was conductedthe room temperatureover 16 days set between at 22 °C. 9:00 After a.m. each and experiment, 5:00 p.m. withthe window the room was temperature opened to purge set at 22the◦ indoorC. After air each as wellexperiment, as to allow the outside window air was to openedenter the to purgeroom. theSince indoor the ambient air as well air as essentially to allow outside served airas toa baseline enter the forroom. the experiment, Since the ambient the pollutants air essentially of interest served woul as a baselined vary each for the day experiment, based on the the outside pollutants ambient of interest air concentrationswould vary each that day day. based Since on the the ambient outside air ambient essentially air concentrations served as a baseline that day. for Sincethe experiment, the ambient the air pollutantsessentially of served interest as would a baseline vary foreach the day experiment, based on the the outside pollutants ambient of interest air concentrations would vary that each day. day Thisbased variation on the outside can be ambient accounted air concentrations for by using thatdata day. pre-processing This variation techniqu can be accountedes such as for baseline by using manipulation.data pre-processing Baseline techniques manipulation such as is baseline the so manipulation.lution to the Baselineproblem manipulation and the correct is the way solution of representingto the problem the andsignal the when correct the way analysis of representing deals with thesensor signal values when from the different analysis conversion deals with sensorunits. Baselinevalues from manipulation different conversionhelps to pre-process units. Baseline the sensor manipulation output to helps free to itself pre-process from the the drift sensor effect, output the intensityto free itself dependence from the and, drift possibly, effect, the from intensity non-linearity dependence [40,41]. and, The possibly, details fromabout non-linearity other type of [40 data,41]. pre-processingThe details about techniques other type will of be data described pre-processing in the next techniques section. will be described in the next section.

Figure 3. Room setup for the experiment. Figure 3. Room setup for the experiment. The process of data collection began with the first condition, which was the ambient air The process of data collection began with the first condition, which was the ambient air environment. The purpose of this experiment was to collect the data of clean air for the room with the environment. The purpose of this experiment was to collect the data of clean air for the room with assumption that the ambient air was not contaminated. For the first environment, the data collection the assumption that the ambient air was not contaminated. For the first environment, the data process took about two days. Thus, at the end of day 2, there were 960 samples collected for ambient collection process took about two days. Thus, at the end of day 2, there were 960 samples collected air. The second environment was the environment with the presence of chemical substance. In this for ambient air. The second environment was the environment with the presence of chemical experiment, a cleaning agent was used as a proxy of the chemical substance. About 100 mL of chemical substance. In this experiment, a cleaning agent was used as a proxy of the chemical substance. About was put in a beaker and placed inside the room—at the centre of the room. The experiment was repeated for two days and 960 samples were collected during that period. For the third environment, an air freshener was used as a proxy for the fragrance. An automatic air freshener which released Appl. Sci. 2017, 7, 823 8 of 21 Appl. Sci. 2017, 7, 823 8 of 21 100 mL of chemical was put in a beaker and placed inside the room—at the centre of the room. The experiment was repeated for two days and 960 samples were collected during that period. For the fragrancethird environment, every 15 min an was air freshener placed inside was used the room. as a proxy It was for hung the fragrance. on the wall An adjacent automatic to theair freshener wall where thewhich sensing released node wasfragrance placed, every about 15 min 2 m was from placed the floor inside and the about room. 2 It m was from hung the on sensing the wall node. adjacent The air flowto fromthe wall the where air conditioner the sensing would node accumulatewas placed, about the fragrance 2 m from in the the floor room. and The about data 2 wasm from collected the forsensing two days node. with The 960 air samples.flow from For the theair conditioner fourth condition would of accumulate the room the environment fragrance in with the room. combustion The activity,data was a person collected smoking for two adays cigarette with 960 was samples. chosen For as the the fourth proxy. condition A person of the was room asked environment to smoke in thewith room combustion so that the activity, real data a person of a person smoking smoking a cigare atte cigarette was chosen in a as room the proxy. was collected. A person was That asked person to smoke in the room so that the real data of a person smoking a cigarette in a room was collected. smoked one cigarette at the centre of the room. Every cigarette produced data for approximately That person smoked one cigarette at the centre of the room. Every cigarette produced data for 30 min. The experiment was repeated four times a day for eight days. The amount of data collected for approximately 30 min. The experiment was repeated four times a day for eight days. The amount of the environment with combustion activity was 960 samples. Lastly, for the room environment with data collected for the environment with combustion activity was 960 samples. Lastly, for the room the presence of food and beverages, rotten cooked fish had been selected to represent this category. environment with the presence of food and beverages, rotten cooked fish had been selected to A bowlrepresent of rotten this category. cooked fish A bowl was placedof rotten in thecooked middle fishof was the placed room. in The the experiment middle of wasthe room. repeated The for twoexperiment days and was 960 samplesrepeated werefor two collected days and during 960 samples that period. were collected during that period. 3.3. Sensor Response 3.3. Sensor Response InIn this this section, section, the the sensors’ sensors’ response response towardstowards the five five different different conditions conditions of of sources sources of ofindoor indoor airair pollution—ambient pollution—ambient air, air, combustion combustion activity,activity, chemicalchemical presence, presence, fragrance fragrance product product (air (air freshener) freshener) andand foods foods and and beverages beverages (rotten (rotten cookedcooked fish)—is fish)—is disc discussed.ussed. Figure Figure 4a4 ashows shows that that the the sensors sensors gave gave a a relativelyrelatively steadysteady readingreading throughout the the time. time. The The sensors’ sensors’ response response was was as as expected expected as asthere there was was nono substance substance which which could could interrupt interrupt thethe ambientambient air concentration. On On the the other other hand, hand, in inFigure Figure 4b,4 b, withwith the the presence presence of of a a chemical chemical substance, substance, whichwhich is represented by by ch chemicalemical cleaning cleaning product, product, it can it can bebe observed observed that that certain certain gas gas sensors, sensors, suchsuch asas VOCs,VOCs, NO2 andand O O33, ,reacted reacted differently differently compared compared to to ambientambient environment. environment. TheThe readingreading ofof thethe VOCs gas sensor, sensor, particularly, particularly, raised raised sharply sharply when when the the chemicalchemical was was present present in in the the room. room. A similarA similar situation situatio couldn could be be observed observed with with the the presence presence of of food food and beverages,and beverages, which which was represented was represented by rotten by rotten cooked cooked fish as fish shown as shown in Figure in Figure4c. The 4c. graphThe graph for VOCs for increasedVOCs increased dramatically dramatically when the when smell the was smell first was introduced first introduced and then and remained then remained at the peak. at the The peak. graph forThe other graph gases for didother not gases change did not much. change Notably, much. inNotabl all graphs,y, in all agraphs, different a different set of gas set of sensors gas sensors reacted differentlyreacted differently towards different towards conditions. different conditions. In the following In the sections, following the sections, raw data the collected raw data went collected through patternwent recognitionthrough pattern procedures. recognition Figure procedures.4d represents Figure the 4d responserepresents of the the response sensors of when the sensors the automatic when the automatic air freshener released fragrance into the room every 15 min. The fragrance of the air air freshener released fragrance into the room every 15 min. The fragrance of the air freshener however, freshener however, vaporized quickly into the air after it was released. Thus, these changes of high vaporized quickly into the air after it was released. Thus, these changes of high and low concentration and low concentration of fragrance in the air could be observed from the disturbed graph. of fragrance in the air could be observed from the disturbed graph.

(a) (b)

Figure 4. Cont. Appl. Sci. 2017, 7, 823 9 of 21 Appl. Sci. 2017, 7, 823 9 of 21 Appl. Sci. 2017, 7, 823 9 of 21

(c) (d) (c) (d)

(e) (e) Figure 4. (a) Ambient environment; (b) Chemical presence; (c) Food and beverages; (d) Fragrance FigureFigure 4. 4.(a )(a Ambient) Ambient environment; environment; ((bb)) ChemicalChemical presence; ( (cc)) Food Food and and beverages; beverages; (d ()d Fragrance) Fragrance presence; and (e) Combustion activity. presence;presence; and and (e )(e Combustion) Combustion activity. activity. Meanwhile, for the last condition, 30 min of data were recorded instead of 8 h because the effects Meanwhile, for the last condition, 30 min of data were recorded instead of 8 h because the effects of cigaretteMeanwhile, smoke for only the lastlast condition,for 30 min. 30Figure min 4e of illustrates data were the recorded effect of instead the cigarette of 8 h smoking because theactivity effects of cigarette smoke only last for 30 min. Figure 4e illustrates the effect of the cigarette smoking activity ofon cigarette the sensor smoke in onlythe room. last for Notably, 30 min. in Figure all graphs4e illustrates, a different the set effect of ofgas the sensors cigarette reacted smoking differently activity on the sensor in the room. Notably, in all graphs, a different set of gas sensors reacted differently ontowards the sensor different in the conditions. room. Notably, in all graphs, a different set of gas sensors reacted differently towardstowards different different conditions. conditions. 3.4. Steps in Pattern Recognition 3.4.3.4. Steps Steps in in Pattern Pattern Recognition Recognition The multivariate response of an array of chemical gases with broad and partially overlapping The multivariate response of an array of chemical gases with broad and partially overlapping selectivityThe multivariate created “electronic response fingerprints” of an array offor chemical a wide range gases of with odour broad which and can partially be characterized overlapping selectivity created “electronic fingerprints” for a wide range of odour which can be characterized selectivityusing pattern-recognition. created “electronic The fingerprints” process of pattern for a wide recognition range of may odour be split which into can four be sequential characterized stages: using using pattern-recognition. The process of pattern recognition may be split into four sequential stages: pattern-recognition.data pre-processing, The dimensionality process of pattern reduction, recognition classification may and be splitdecision into making. four sequential Figure 5 illustrates stages: data data pre-processing, dimensionality reduction, classification and decision making. Figure 5 illustrates pre-processing,the pattern recognition dimensionality process. reduction, Each stage classification is described in and detail decision in the making.following Figure sections.5 illustrates the the pattern recognition process. Each stage is described in detail in the following sections. pattern recognition process. Each stage is described in detail in the following sections.

Sensor Data Dimensionality Classification Decision Sensor Data Dimensionality Classification Decision Response Preprocessing Reduction Network Making Response Preprocessing Reduction Network Making Optimization Feedback Optimization Feedback

FigureFigure 5.5. Steps in pattern pattern recognition. recognition. Figure 5. Steps in pattern recognition. Appl. Sci. 2017, 7, 823 10 of 21 Appl. Sci. 2017, 7, 823 10 of 21

TheThe firstfirst stagestage ofof the pattern recognition process, after collecting raw data, is data data pre-processing. pre-processing. DataData pre-processingpre-processing isis aa procedureprocedure thatthat involvesinvolves extractingextracting certaincertain significantsignificant characteristicscharacteristics fromfrom thethe sensorsensor responseresponse curvescurves oror transienttransient responseresponse inin orderorder toto produceproduce aa setset ofof numericalnumerical datadata oror featurefeature forfor furtherfurther processingprocessing [[42].42]. ChoosingChoosing thethe correctcorrect pre-processingpre-processing techniquetechnique isis importantimportant becausebecause itit maymay aidaid inin thethe successsuccess ofof subsequentsubsequent analysis and affect the performance of pattern recognition [[43].43]. MostMost datadata pre-processingpre-processing techniques techniques are are basically basically derived derived from from a typicala typical sensor sensor response response as shownas shown in Figurein Figure6. Vo 6. isVo a is measured a measured value value in clean in clean ambient ambient air or air an or initial an initial value value called called the baseline, the baseline, while whileVs is theVs is response the response value value to odour to odour or smell. or smell.

FigureFigure 6.6. Typical sensorsensor response.response.

Basically,Basically, datadata pre-processingpre-processing techniquestechniques cancan bebe divideddivided intointo threethree majormajor categories:categories: baselinebaseline manipulation,manipulation, normalizationnormalization andand compression.compression. Ev Everyery category has their ownown formulaformula whichwhich waswas transformedtransformed from from the the sensor sensor response. response. In In this this study, study, only only five five data data pre-processing pre-processing techniques techniques have beenhave selectedbeen selected which which are frequently are frequently used inused odour in odour pattern pattern recognition recognition as summarized as summarized in Table in Table5. These 5. These five techniques,five techniques, called called features features of pattern of pattern recognition recogn andition raw and data, raw are data, also are chosen also aschosen one of as the one features. of the Thefeatures. feature The output feature from output the pre-processing from the pre-processing stage are often stage not are suitable often not to be suitable processed to be by processed the classifier by duethe toclassifier data redundancy due to data and redundancy high-dimensionality and high that-dimensionality can cause the that problem can ofcause dimensionality the problem [44 of]. Ondimensionality the other hand, [44]. if On too the many other features hand, if are too used many for features the classification, are used for there the isclassification, a risk that the there model is a becomesrisk that the too model complex becomes and the too capability complex and of the the model capability to classify of the canmodel be to very classify poor. can For be this very reason, poor. aFor dimensionality this reason, a reductiondimensionality stage isreduction required stage to eliminate is required the curseto eliminate of dimensionality the curse of indimensionality classification andin classification improve the and accuracy improve orperformance the accuracy oforclassification performance[ of45 ].classification [45].

TableTable 5.5. DataData pre-processingpre-processing techniquestechniques selected.selected.

TechniqueTechnique Abbreviation Abbreviation Formula References References Raw RW = [46] Raw RW Xij = Vij [46] Differential DIFF = − [43,46,47] Differential DIFF Xij = Vij− Vbj [43,46,47] Relative REL = Vij Relative REL Xij = V −bj = Fractional FRACT Vij−Vbj Fractional FRACT Xij = V bj −min [43,46,47] Vij − V [43,46,47] Sensor Normalization SN = ij Sensor Normalization SN Xij = max−min Vij − Vij = Vij Xij = q Vector ArrayVector Normalization Array Normalization VAN VAN N 2 ∑ (Vij ) ∑q=1()

Where, Xij represents feature matrix for the ith sample of jth sensor, Vij represents sensor’s response value, and Vbj represents baseline value of Vij. A feature extraction technique based on principal component analysis (PCA), which is widely used in in machine learning for dimensionality reduction, is chosen for this study [22,48,49]. PCA is Appl. Sci. 2017, 7, 823 11 of 21 defined by a matrix having as rows the eigenvectors of the feature space covariance matrix. The PCA removes any redundancy between the components of the projected vectors, since the covariance matrix in the transformed space becomes diagonal as shown in Equation (1):

∑ y = diag[λ1 λ2 ... λn] (1) where, {λi}i=1...n represent for the eigenvalues of the covariance matrix. The PCA performs the vector projection without any knowledge of their labels. This transformation is defined in such a way that the first principal component has as high a variance as possible and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components. It is therefore known as an unsupervised data analysis method or algorithm since it “ignores” class labels [50–52]. In this research, PCA was used to remove any redundancy between the components of the projected vectors and reduce the dimension of the original dataset. The result is explained in term of the “total variance explained” table. The table shows the number of the principal component (PC) that has been extracted with eigenvalue and how much information (variance) can be attributed to each component. Only a few components will be selected based on “eigenvalue-one criterion”. In PCA, one of the most commonly-used criteria for solving the number of components problem is the eigenvalue-one criterion, also known as the Kaiser criterion [53]. With this approach, any component with an eigenvalue greater than 1.00 will be selected, and thus it will reduce the dimension of original datasets which have nine dimensions. Table6 shows the total variance explained for the raw (RW) feature after being analyzed via PCA. The table clearly shows that most of the variance (31.77%) can be explained by the first principal component (PC1) alone. The second principal component (PC2) still bears some information (23.42%) while the third (PC3) and fourth principal components (PC4) bear least information with variance of 14.59% and 11.16%. respectively. Based on “eigenvalue-one criterion”, four components (PC1, PC2, PC3 and PC4) have been selected because they display an eigenvalue greater than 1.00 and hold the greater amount of variance. Together, the selected components explain 80.88% of the information.

Table 6. Total variance explained for raw (RW) feature.

Initial Eigenvalues Extraction Sums of Squared Loadings PC Total Variance (%) Cumulative (%) Total Variance (%) Cumulative (%) 1 2.854 31.711 31.711 2.854 31.711 31.711 2 2.108 23.420 55.131 2.108 23.420 55.131 3 1.313 14.590 69.720 1.313 14.590 69.720 4 1.005 11.162 80.882 1.005 11.162 80.882 5 0.833 9.251 90.133 6 0.407 4.526 94.659 7 0.278 3.085 97.744 8 0.152 1.685 99.429 9 0.051 0.571 100.000

The dimensionality reduction based on PCA has also been performed to other features. Table7 illustrates the overall results for all features. According to Table7, it is clear that most features can be explained based on four dimensions, except for vector array normalization (VAN), which can be explained by two dimensions only. VAN also gave the highest total variance explained at 93.70%. All other features are being dimensionally reduced from nine dimensions to four dimensions, with differential (DIFF) being the second-highest total variance explained at 84.48%, while SN was the lowest total variance explained at 73.99% only. Other features gave a similar result to raw (RW) at 80.88%, and relative (REL) and fractional (FRACT) shared the same percentage at 80.85%. Appl. Sci. 2017, 7, 823 12 of 21

Table 7. Total variance explained for all features.

Feature Original Dimension New Dimension Total Variance Explained RW 9 4 80.88% DIFF 9 4 84.48% REL 9 4 80.85% FRACT 9 4 80.85% SN 9 4 73.99% VAN 9 2 93.70%

4. Results and Discussion

4.1. Supervised Machine Learning Analysis for Pattern Recognition There are various supervised machine learning used in classification techniques, which can be sorted into a few categories: logic-based, perceptron-based, instance-based, statistical learning-based and vector-based [54]. The classifiers for each category are shown in Table8. For this study, three algorithms which have been used in many applications, especially involving odour or smell classification, have been chosen: MLP (perceptron-based), KNN (instance-based), and LDA (statistical learning-based) [55–57]. All of these classifiers have been run using MATLAB’s 2015 functions library that supports supervised machine learning. At the end of the program, the MATLAB R2015b (version 8.6, The MathWorks Company, Natick, MA, USA, 2015) produced an output file which then embedded to the system.

Table 8. Methods for supervised machine learning.

Method Classifier

• Decisions trees Logic-based • Rule based

• Artificial neural network (ANN) Perceptron-based • Multilayer perceptron (MLP)

Instance-based • K-nearest neighbour (KNN)

Statistical Learning-based • Linear discriminant analysis (LDA)

Vector-based • Support vector machine (SVM)

The example of one of the algorithms, such as the MLP model, along with its classification performance of six features is discussed in this section. For each of the features, a separate MLP model was formulated. Separate models need to be formulated as the aim of the research is to find the optimal classification accuracy for each feature. In order to identify the source affecting the IAQ, this study used the MLP, which consists of three layers: the input layer, hidden layer and output layer. As network architecture, a 3-layer perceptron model as shown in Figure7 was used. The first input layer contains the input variables for the network, which is the data after the pre-processing technique. For the data set before PCA, the input layer contains nine neurons of IAQ parameters, which are CO2, CO, O3, NO2,O2, VOC, PM10, temperature and humidity, while, for the data set after PCA, the input layer contains the dimensions for each feature after reduction. There is one hidden layer used and the numbers of hidden neurons were not fixed and were adjusted until the desired performance was achieved. The last layer of the model is the output layer, which consists of five target outputs that represent five types of sources of indoor air pollution, such as combustion activity, the presence of fragrances and so on. Sixty percent of all data is selected randomly to become the training set. A goal is set (in this case, a mean square error (MSE) of 0.0001 has been chosen as the goal) Appl. Sci. 2017, 7, 823 13 of 21 Appl. Sci. 2017, 7, 823 13 of 21 and thegoal) training and the dataset training is dataset trained is untiltrained the until desired the desired MSE is MSE obtained. is obtained. MSE wasMSE usedwas used as the asstopping the criterion.stopping Training criterion. was Training conducted was untilconducted the MSE until fell the belowMSE fell 0.0001 below or 0.0001 a maximum or a maximum epoch epoch limit limit of 1000 is of 1000 is reached. This is to ensure that the model is trained with minimum error iteration and not reached. This is to ensure that the model is trained with minimum error iteration and not over-trained. over-trained. The learning rate and momentum factor were chosen based on experimental analysis. The learning rate and momentum factor were chosen based on experimental analysis. The number of The number of hidden neurons was adjusted by the network to achieve this goal. The testing hiddentolerance neurons of wasthe neural adjusted network by the model network was tochosen achieve as 0.1. this This goal. value The is testing the maximum tolerance allowable of the neural networktolerance model level was for chosen the testing. as 0.1. This value is the maximum allowable tolerance level for the testing.

Figure 7. Network architecture of the multilayer perceptron (MLP) model. Figure 7. Network architecture of the multilayer perceptron (MLP) model.

TheThe detailed detailed parameter parameter for for MLP MLP training training is given in in Table Table 9.9 .For For example, example, to train to train the thedataset dataset (before(before PCA) PCA) for thefor the condition condition of of ambient ambient air, air, thethe data from from all all 9 9IAQ IAQ parameters parameters become become the neurons the neurons for thefor input the input layer. layer. Randomly, Randomly, 60% 60% of eachof each IAQ IAQ parameter parameter was was selected selected toto bebe the training set set (60% (60% out of 960out data of 960 for data ambient for ambient air). air). The The training training was was conducted conducted until the the targeted targeted MSE MSE was was reached reached or or untiluntil a maximum a maximum epoch epoch limit limit of of 1000 1000 was was reached.reached. This This process process was was repeated repeated with with the other the other four four conditions.conditions. In theIn the end, end, the the network network would would produceproduce a a model model which which was was then then tested tested against against the 40% the 40% of theof remainingthe remaining data data for for all all conditions conditions (the(the testing set) set) in in order order to tofind find the theclassification classification accuracy. accuracy. The wholeThe whole process process was was repeated repeated again again for for differentdifferent fe features.atures. The The model model of the of thefeature feature that thatgave gavethe the highest classification accuracy may be chosen to be embedded into the system after being compared highest classification accuracy may be chosen to be embedded into the system after being compared to to the model of other type of classifiers, such as KNN or LDA. the model of other type of classifiers, such as KNN or LDA. Table 9. Parameters for multilayer perceptron (MLP) training. Table 9. Parameters for multilayer perceptron (MLP) training. Training Parameter Value Sample Training Parameter Value • Number of samples used for training: 2880 4800 • Number of samplesSample used for testing: 1920 • Number of samplesInput used for training: 2880 9 4800 • Number of samplesHidden usedneurons for testing: 1920 Flexible Output neurons 5 PerformanceInput MSE 9 HiddenGoal neurons 0.0001 Flexible OutputLearning neurons rate 0.01 5 MomentumPerformance constant 0.5 MSE Goal 0.0001 Learning rate 0.01 Momentum constant 0.5 Appl. Sci. 20172017,, 77,, 823 823 14 of 21

The classification performance of the MLP, KNN and LDA using the six features for the dataset before-PCAThe classification and after-PCA performance are shown of thein Figure MLP, KNN8. The and PCA LDA was using performed the six because features PCA for the is datasetknown tobefore-PCA be able to and increase after-PCA the areclassification shown in Figure accuracy8. The of PCAcertain was datasets performed by becausereducing PCA the isnumber known toof variables,be able to losing increase only the a classification minimum of accuracyvariability of [46]. certain However, datasets as by shown reducing in Figure the number 8, the classification of variables, ratelosing for only dataset a minimum after PCA of is variability less than the [46 ].classificati However,on as rate shown before in FigurePCA for8, theall features. classification This result rate for is duedataset to the after information PCA is less loss than during the classification PCA. According rate before to [58], PCA for fordatasets all features. with very This low result complexity is due to (fewthe information PCs), the relevant loss during information PCA. According has been toexcluded [58], for during datasets the with process very of low PCA, complexity which resulted (few PCs), in athe lower relevant classification information accuracy has been for excludeddatasets after during PCA. the The process PCA of coul PCA,d give which a higher resulted classification in a lower accuracyclassification to datasets accuracy with for very datasets high after complexity PCA. The (many PCA PCs), could where give a the higher dataset classification before PCA accuracy does not to onlydatasets have with relevant very information, high complexity but also (many contains PCs), noise where [59] the. With dataset the before presence PCA of doesnoise, not the only classifier have over-fitsrelevant the information, training data but and also thus contains does noise not generalize [59]. With well. the presence Based on of the noise, explanation the classifier by [58], over-fits it can bethe seen training that this data study and thus has a does dataset not generalizewith a very well. low complexity Based on the (only explanation 9 PCs). Thus, by [58 the], itPCA can process be seen hasthat excluded this study relevant has a dataset information with athat very could low complexitycontribute to (only the 9high PCs). classification Thus, the PCAaccuracy, process which has explainsexcluded the relevant lower information classification that accuracy could contribute for the todataset the high after classification PCA. Nevertheless, accuracy, which although explains the datasetthe lower after classification PCA (VAN accuracy feature) could for the not dataset give 100% after classification PCA. Nevertheless, accuracy, although it could the classify dataset 99.58% after ofPCA the (VAN dataset feature) using couldonly two not variables give 100% instead classification of nine accuracy,variables itneeded could classifyfor the dataset 99.58% ofbefore the datasetPCA. using only two variables instead of nine variables needed for the dataset before PCA.

Classification before PCA Classification after PCA 100.00 100.00 90.00 90.00 80.00 80.00 70.00 70.00 60.00 60.00 FRA

FRA Accuracy (%) Classification RW DIFF REL SN VAN Classification Accuracy Accuracy (%) Classification RW DIFF REL SN VAN CT CT MLP 96.67 97.50 97.29 97.29 70.00 99.58 MLP 99.69 98.33 97.92 97.81 89.69 100.0 KNN 98.44 98.75 98.02 98.02 88.33 97.92 KNN 99.06 98.96 98.13 98.13 97.50 100.0 LDA 92.19 93.96 94.69 94.69 64.17 66.56 LDA 97.60 96.56 95.73 95.73 85.10 99.38

Figure 8. ClassificationClassification performance performance for for the the before- before- pr principalincipal component component analysis analysis (PCA) (PCA) and and after- after- principal component analysis (PCA) dataset.

The validation in identifying pollutants by th thee proposed machine learning algorithm can be obtained byby lookinglooking at at the the confusion confusion matrix. matrix. A confusion A confusion matrix matrix is a table is a thattable is oftenthat is used often to describeused to describethe performance the performance of a classification of a classification model (or model “classifier”) (or “classifier” on a set of) on test a data set of for test which data the for true which values the trueare known. values Forare example,known. For in theexample, case of in the the MLP case classifier, of the MLP the confusionclassifier, matrixthe confusion for the featuresmatrix for giving the featuresthe lowest giving and the lowest highest and classification the highest accuracy classifica aretion shown accuracy in Tables are shown 10 and in Tables11. Rows 10 and and 11. columns Rows andrepresent columns actual represent and predicted actual values, and predicted respectively. valu Tablees, respectively. 10 shows the Table confusion 10 shows matrix the for featureconfusion SN matrix(after PCA) for feature as it gave SN (after the lowest PCA) confusionas it gave the matrix. lowest Based confusion on the matrix. table, it Based can be on observed the table, that it can every be observedcondition contributesthat every condition to the confusion contributes level, withto the human confusion activity level, having with the human highest activity confusion having level the at highest50%. This confusion means that level MLP at 50%. can classify This means only 50% that of MLP combustion can classify activity only correctly 50% of andcombustion it cannot activity classify correctlythe other and 50% it correctly cannot classify as combustion the othe activity.r 50% correctly as combustion activity. Table 1111 presents presents the the confusion confusion matrix matrix for VAN for (beforeVAN PCA)(before which PCA) has which the highest has classificationthe highest classificationaccuracy. Compared accuracy. to Compared the confusion to the matrix confusion of SN matrix in Table of SN 10, in MLP Table does 10, notMLP have does any not confusion have any confusionin classifying in classifying all the five all conditions. the five conditions. It means thatIt means it can that classify it can all classify of the all five of conditions the five conditions correctly. correctly.This confusion This confusion matrix validates matrix validates the classification the classi ratefication for VAN rate (beforefor VAN PCA), (before which PCA), is 100%.which is 100%.

Appl. Sci. 2017, 7, 823 15 of 21

Table 10. Confusion matrix of multilayer perceptron (MLP) for sensor normalization (SN) after principal component analysis (PCA).

Predicted Sources of IAQ Human Confusion Ambient Chemical Food & Beverages Fragrance Pollutant Activity Level (%) Actual Ambient 276 88 20 0 0 28.13 Chemical 16 308 50 10 0 19.79 Food & Beverages 0 16 280 84 4 27.08 Fragrance 6 16 40 288 34 25.00 Human Activity 4 12 60 116 192 50.00

Table 11. Confusion matrix of multilayer perceptron (MLP) for vector array normalization (VAN) before principal component analysis (PCA).

Predicted Sources of IAQ Human Confusion Ambient Chemical Food & Beverages Fragrance Pollutant Activity Level (%) Actual Ambient 384 0 0 0 0 0 Chemical 0 384 0 0 0 0 Food & Beverages 0 0 384 0 0 0 Fragrance 0 0 0 384 0 0 Human Activity 0 0 0 0 384 0

The classification accuracy that has been achieved by this study is quite similar to the classification result achieved by a previous researcher [46]. They have developed a laboratory-made malodour sensing system, used to identify five typical sources of olfactive annoyance: printing houses, a paint shop in a coach building, wastewater treatment plant, urban waste composting facilities and a rendering plant. The researcher adopted various data pre-processing techniques, such as the VAN feature, which was also used in this study. Their results also show that the best classification results are obtained using a VAN feature with 100% classification accuracy. The objective of testing these three classifiers is to see which classifier gives the highest classification accuracy. Based on the results of the classifiers discussed before, there are four sets of classifiers with one feature (VAN before PCA) which gave 100% classification accuracy:

(1). MLP-VAN feature before PCA (model 9-3-5), (2). MLP-VAN feature before PCA (model 9-9-5), (3). MLP-VAN feature before PCA (model 9-12-5), and (4). KNN-VAN feature before PCA (K factor is 2).

To prove that the VAN feature before PCA really gave 100% classification accuracy, another analysis has been done. The PCA visualization for the VAN feature before any dimensionality reduction is constructed in s 3D plot as shown in Figure9. From Figure9, it can be seen that none of the five conditions coincide with each other, and therefore they are mutually exclusive. After the feature has been identified, it is now time to choose between the two classifiers: MLP or KNN. This study chooses MLP with a model structure of 9-3-5 because it is easier to be embedded in the system. Model 9-3-5 only has three hidden variables, while the other two model structures have nine and 12 hidden variables. A model structure with fewer hidden variables has a less complicated formula and is therefore easy to be embedded. As far as KNN is concerned, KNN requires a large storage space in the system because it saves every data that it receives. MLP, on the other hand, does not require a large storage system. Due to these reasons, an MLP classifier is chosen for this study. Appl. Appl.Sci. 2017 Sci., 20177, 823, 7 , 823 16 of 1621 of 21

Figure 9. 3D plot of PCA visualization for the VAN feature. Figure 9. 3D plot of PCA visualization for the VAN feature. 4.2. Classification of Multiple Sources of IAP 4.2. Classification of Multiple Sources of IAP This section shows results for the classification of sources of IAP when multiple sources are This section shows results for the classification of sources of IAP when multiple sources are present present at the same time. Based on the previous result, the MLP classifier with a model structure of at the same time. Based on the previous result, the MLP classifier with a model structure of 9-3-5 was 9-3-5 was chosen for this analysis. To collect the related data for this analysis, an experiment that chosen for this analysis. To collect the related data for this analysis, an experiment that simulates the simulates the presence of multiple sources of IAP was conducted. A similar environment to the presence of multiple sources of IAP was conducted. A similar environment to the previous experiment previous experiment of collecting a single source of IAP was maintained, where the sensor module of collecting a single source of IAP was maintained, where the sensor module was installed hanging was installed hanging up to the right of the wall with 1.1 m of height above the ground, and the up to the right of the wall with 1.1 m of height above the ground, and the temperature of the room was temperature of the room was set to 22 °C. The data was collected for one day: between 9:00 a.m. and set to 22 ◦C. The data was collected for one day: between 9:00 a.m. and 11:30 a.m. Figure 10 shows the 11:30 a.m. Figure 10 shows the process of data collection for testing the system when all sources process of data collection for testing the system when all sources present in a room. present in a room. The experiment began with the first condition present, which was the ambient air. There were The experiment began with the first condition present, which was the ambient air. There were 45 samples collected for ambient air over 45 min duration. This environment was tagged as “single 45 samples collected for ambient air over 45 min duration. This environment was tagged as “single source”. Then, an automatic air freshener which released fragrance every 15 min was placed inside source”. Then, an automatic air freshener which released fragrance every 15 min was placed inside the room. This environment was conducted to simulate the presence of two sources: ambient air and the room. This environment was conducted to simulate the presence of two sources: ambient air and fragrance. The air freshener was hung up on the wall with a height of 2 m from the floor and about 2 m fragrance. The air freshener was hung up on the wall with a height of 2 m from the floor and about 2 from the sensing node. Total data collected for the air freshener was over 75 min. This environment is m from the sensing node. Total data collected for the air freshener was over 75 min. This environment tagged as “mixed sources A”. After that, a person was asked to smoke in the room. This environment is tagged as “mixed sources A”. After that, a person was asked to smoke in the room. This was conducted to simulate the presence of three sources of IAP: ambient air, fragrance and single environment was conducted to simulate the presence of three sources of IAP: ambient air, fragrance cigarette smoke. That person smoked one cigarette at the centre of the room. One cigarette took 10 min, and single cigarette smoke. That person smoked one cigarette at the centre of the room. One cigarette which contribute to a 10 min data sample. This environment was tagged as “mixed sources B”. Lastly, took 10 min, which contribute to a 10 min data sample. This environment was tagged as “mixed the other two sources of IAQ, which are food and beverages and chemical cleaning product, were sources B”. Lastly, the other two sources of IAQ, which are food and beverages and chemical cleaning added into the environment. These two sources of IAP were placed in the middle of the room for product, were added into the environment. These two sources of IAP were placed in the middle of 20 min, which gave 20 samples. This environment was conducted to simulate the presence of all the room for 20 min, which gave 20 samples. This environment was conducted to simulate the five sources of IAP. Although there was no person smoking in the room at this time, the presence of presence of all five sources of IAP. Although there was no person smoking in the room at this time, smoking can still be traced. The presence of smoking can be traced up to 30 min, as shown in Figure 10. the presence of smoking can still be traced. The presence of smoking can be traced up to 30 min, as This environment was tagged as “mixed sources C”. shown in Figure 10. This environment was tagged as “mixed sources C”. Appl. Sci. 2017, 7, 823 17 of 21 Appl. Sci. 2017, 7, 823 17 of 21

Experiment Setting

Single Source • Duration = 9.00 to 9.45am • IAP Source = Clean air • 1 minute = 1 sample • Data Collected = 45 samples

Mixed Sources A • Day = 9.46 to 11.00am • IAP Source = Clean air + Air freshener • 1 minute = 1 sample • Data Collected = 75 samples

Mixed Sources B • Day = 11.01 to 11.10am • IAP Source = Clean air + Air freshener + Cigarette smoke • 1 minute = 1 sample • Data Collected = 10 samples

Mixed Sources C • Day = 11.11 to 11.30am • IAP Source = Clean air + Air freshener + Cigarette smoke +Rotten Cooked Fish + Cleaning Agent • 1 minute = 1 sample • Data Collected = 20 samples

FigureFigure 10. 10.Data Data collection collectionprocedures procedures for for multiple multiple sources sources present. present.

FigureFigure 11 shows 11 shows the the result result of of the the sensor’ssensor’s response response for for all all situations situations as described as described above, above, from a from a singlesingle source source up up to to five five sources sources ofof IAP.IAP. ItIt can be observed observed that that the the sensors sensors react react differently differently when when additionaladditional sources sources of IAP of areIAP added are added into the into room. the Theroom. result The of result the classifier of the cl basedassifier on based all environments on all environments is shows in Table 12. Based on the table, it can be observed that the system can precisely is shows in Table 12. Based on the table, it can be observed that the system can precisely detect a single detect a single source (ambient) with 45 correct classifications out of 45 data samples. This means that source (ambient) with 45 correct classifications out of 45 data samples. This means that the MLP the MLP classifier can classify an ambient environment at 100% classification rate. Similarly, when classifierthe fragrance can classify is present an ambient in the condition environment of “mixed at 100% sources classification A”, the system rate. Similarly,correctly classifies when the the fragrance two is presentmixed in sources the condition of IAQ. ofThe “mixed system sources did not A”,misclassif the systemy the sources correctly as unknown classifies sources. the two This mixed result sources is of IAQ.as The expected system didbecause not misclassifythe environment the sources of fragranc as unknowne mixed sources.with the Thisambient result air is is as similar expected to the because the environmentpresence of fragrance of fragrance alone, mixed and the with system the has ambient already air been is similar trained to with the such presence an environment. of fragrance On alone, and thethe systemother hand, has the already system been could trained not classify with two such samples an environment. out of 10 samples On the as the other available hand, sources the system could(ambient not classify air, presence two samples of fragrance out of and 10 presence samples ci asgarette the availablesmoke) in the sources condition (ambient of “mixed air, sources presence of fragranceB”. Nonetheless, and presence the cigaretteMLP classifier smoke) correctly in the classi conditionfied the of other “mixed eight sources samples B”. as fragrance Nonetheless, (one) theand MLP cigarette smoke (seven). The result is also as expected, because the presence of smoking was classifier correctly classified the other eight samples as fragrance (one) and cigarette smoke (seven). overpowering the presence of fragrance due to the high amount of gases produced during smoking The result is also as expected, because the presence of smoking was overpowering the presence of fragrance due to the high amount of gases produced during smoking as compared to the amount of gases produced by air freshener. Likewise, in the condition of “mixed sources C”, when all of the sources of IAP were mixed together in the room, the MLP classifier could correctly classify 50% of Appl. Sci. 2017, 7, 823 18 of 21 Appl. Sci. 2017, 7, 823 18 of 21 as compared to the amount of gases produced by air freshener. Likewise, in the condition of “mixed sources C”, when all of the sources of IAP were mixed together in the room, the MLP classifier could the samplescorrectly as classify combustion 50% of activity the samples (one), as food combustion and beverages activity (two)(one), andfood presenceand beverages of chemical (two) and (seven). The otherpresence 10 samples of chemical have (seven). been The classified other 10 as samp unknownles have sources. been classified as unknown sources.

Single Source Mixed Mixed Mixed Sources C Sources A Sources B

FigureFigure 11. 11.Sensor’s Sensor’s response response when when additionaladditional sources sources of of IAP IAP are are present. present.

Table 12. Result of the classifier based on mixed sources. Table 12. Result of the classifier based on mixed sources. Sources of IAQ Pollutant Data Condition Combustion Food & Samples Ambient Fragrance Sources of IAQ Pollutant Chemical Unknown Condition Data Samples Activity Beverages Ambient Fragrance Combustion Activity Food & Beverages Chemical Unknown Single source 45 45 0 0 0 0 0 SingleMixed source sources A 45 75 45 7 0 68 0 0 0 0 0 0 0 0 Mixed sourcesMixed sources A B 75 10 7 0 68 1 0 7 0 0 0 0 2 0 MixedMixed sources sources B C 10 20 0 0 1 0 7 1 2 0 7 0 10 2 Mixed sources C 20 0 0 1 2 7 10

5. Conclusions 5. Conclusions This study proposed an enhanced IAQMS which could recognize the pollutants by the utilized Thissupervised study machine proposed learning an enhanced algorithm IAQMS that is widely which used could in data recognize mining. the With pollutants regards to by pollutant the utilized supervisedrecognition, machine it can learning be concluded algorithm that this that IAQMS is widely has successfully used in data recognized mining. Withfive sources regards of to indoor pollutant recognition,air pollution it can with be concluded a classification that thisrate IAQMSof 100 pe hasrcent. successfully These five recognizedsources of indoor five sources air pollution— of indoor air pollutionambient with air, a classificationcombustion activity, rate of presence 100 percent. of chemic Theseals, fivepresence sources of fragrances of indoor and air presence pollution—ambient of food air, combustionand beverages—are activity, successfully presence classified of chemicals, by MLP presence and KNN of using fragrances the VAN and before presence PCA feature. of food To and prove that all the five sources of indoor air pollution are mutually exclusive, a 3-D graph has been beverages—are successfully classified by MLP and KNN using the VAN before PCA feature. To prove attached, as shown in Section 4. Three classifiers have been tested to choose for the best classifier: that allMLP, the KNN five sourcesand LDA. of In indoor the end, air the pollution MLP classifier are mutually with a model exclusive, structure a 3-D of graph9-3-5 has has been been chosen attached, as shownto be inembedded Section4 .into Three the classifiersIAQMS because have beenit is more tested suitable to choose to be for embedded the best in classifier: the system. MLP, A model KNN and LDA.structure In the end, with the fewer MLP hidden classifier variables with has a modela less comp structurelicated of formula 9-3-5 hasand beenis therefore chosen easy to to be embed. embedded into theKNN IAQMS requires because a large itstorage is more space suitable in the system to be embedded because it saves in the every system. data that A model it receives. structure MLP, with feweron hidden the other variables hand, does has anot less require complicated a large storage formula system. and isDue therefore to these easyreasons, to embed. the MLP KNN classifier requires a largeis storagechosen for space this inproject. the system The system because has it also saves been every tested data with that multiple it receives. sources MLP, of IAP on the presented other hand, doestogether. not require The aresult large shows storage that system. the system Due is able to these to classify reasons, when the single MLP and classifier two mixed is chosensources are for this presented together; however, when more than two sources of IAP are presented at the same period, project. The system has also been tested with multiple sources of IAP presented together. The result the system will classify the sources as ‘unknown’ because the system cannot recognize the input of shows that the system is able to classify when single and two mixed sources are presented together; the new pattern. Hence, the classification accuracy falls dramatically. Future research should consider however,expanding when morethe sources than twoof indoor sources air of pollution IAP are to presented include more at the sources same period, of indoor the air system pollution. will classifyIn the sources as ‘unknown’ because the system cannot recognize the input of the new pattern. Hence, the classification accuracy falls dramatically. Future research should consider expanding the sources of indoor air pollution to include more sources of indoor air pollution. In addition, the proxy for each condition could also be added to include more activity and to test the efficiency of IAQ that can detect more sources of IAP such as furniture (wood, plastic), building materials (plaster, insulation), Appl. Sci. 2017, 7, 823 19 of 21 and coatings (carpeting, painting). In order to sense mixed sources, more sensitive sensors might provide more insight; more sensitive sensors with the ability to distinguish the different sources need to used. Furthermore, the system can also be trained with two or more mixed sources, for example, mixing between air freshener with smoking activity and the presence of food and beverages. For this study, a neural network (NN) used to classify the sources of indoor air pollution is embedded into the IAQMS only. Therefore, this study would like to suggest that the NN should be embedded onto the microcontroller for future research. If the classifier is embedded onto the microcontroller, the IAQMS could operate as a stand-alone device, which would enable the IAQMS to send an alert to users (via short massage system (SMS)) if the air quality level in the observed room changed to a “bad” or “hazardous” level.

Acknowledgments: The equipment used in this project was pre-developed by Universiti Malaysia Perlis (UniMAP) for previous studies. The authors would like to offer a special thanks to the members of the Centre of Excellence for Advanced Sensor Technology (CEATech), UniMAP, Malaysia for their critical advice and warm cooperation. The authors also would like to acknowledge the financial sponsorship provided by Malaysian Ministry of Higher Education (MOHE) under myBrain15 scheme. Lastly, the authors would like to thanks to Universiti Teknologi Malaysia (UTM) for supporting this research especially in terms of funding through PAS grant (Project No: 02K56). Author Contributions: The presented work is the product of cooperative effort by the whole group members. In particular, Shaharil Mad Saad has performed the experiments, analyzed the data and drafted the manuscript; Allan Melvin Andrew, Ali Yeon Md Shakaff, Mohd Azuwan Mat Dzahir, Mohamed Hussein, Maziah Mohamad and Zair Asrar Ahmad contributed to the ideas, developments of certain parts of the project and made critical revisions to the manuscript. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Environmental Protection Agency (EPA). Buildings and Their Impact on the Environment: A Statistical Summary; Environmental Protection Agency Green Building Workgroup: Washington, DC, USA, 2009. Available online: http://www.epa.gov/greenbuilding/pubs/gbstatpdf (accessed on 26 November 2014). 2. Reffat, R.M.; Harkness, E.L. Environmental comfort criteria: Weighting and integration. J. Perform. Constr. Facil. 2001, 15, 104–108. [CrossRef] 3. Batterman, S.; Burge, H. HVAC systems as emission sources affecting indoor air quality: A critical review. HVAC&RRes. 1995, 1, 61–78. 4. World Health Organization (WHO). Indoor Air Quality Guidelines: Household Fuel Combustion; WHO: Geneva, Switzerland, 2014. 5. Borkar, C. Development of Wireless Sensor Network System for Indoor Air Quality Monitoring. Master’s Thesis, University of North Texas, Denton, TX, USA, 2012. 6. Centers for Disease Control and Prevention (CDC). Indoor Environmental Quality: Chemicals and Odors; CDC: Atlanta, GA, USA, 2014. Available online: http://www.cdc.gov/niosh/topics/indoorenv/chemicalsodors. html. (accessed on 13 December 2014). 7. Mumyakmaz, B.; Karabacak, K. An E-Nose-based indoor air quality monitoring system: Prediction of combustible. Turk. J. Electr. Eng. Comput. Sci. 2015, 2, 1–12. 8. Sait, C.S. The Development of the Indoor Air Pollution Index for Office Buildings. Ph.D. Thesis, Illinois Institute of Technology, Chicago, IL, USA, 2001. 9. Wasana, T. A Validation Method for the Development of Carbon Monoxide Wireless Sensor for Ambient Air Monitoring. Master’s Thesis, University of Cincinnati, Cincinnati, OH, USA, 2007. 10. Department of Occupational Safety and Health (DOSH). Industry Code of Practice On Indoor Air Quality; DOSH: Putrajaya, Malaysia, 2010. 11. Postolache, O.A.; Pereira, J.M.D.; Girao, P.M.B.S. Smart sensors network for air quality monitoring applications. IEEE Trans. Instrum. Meas. Trans. Instrum. Meas. 2009, 58, 3253–3262. [CrossRef] 12. American Society of Heating Refrigerating and Air-Conditioning Engineers (ASHRAE). 2009 Ashrae Handbook: Fundamentals, I-P ed.; ASHRAE: Atlanta, GA, USA, 2009. 13. Indoor Air Quality Management Group (IAQMG). Guidance Notes for the Management of Indoor Air Quality in Offices and Public Places; IAQMG: Hong Kong, China, 2003. Appl. Sci. 2017, 7, 823 20 of 21

14. Rahul, R. An Embedded Real-Time Environment Monitoring System. Master’s Thesis, University of Texas at Arlington, Arlington, MA, USA, 2002. 15. Thad, G. Indoor Environmental Quality; CRC Press: Boca Raton, FL, USA, 2001. 16. Yau, Y.H.; Chew, B.T.; Saifullah, A.Z. Studies on the indoor air quality of Pharmaceutical Laboratories in Malaysia. Int. J. Sustain. Built Environ. 2012, 1, 110–124. [CrossRef] 17. Environmental Protection Agency (EPA). Indoor Air: An Introduction to Indoor Air Quality (IAQ); EPA: Washington, DC, USA, 2014. Available online: http://www.epa.gov/iaq/ia-intro.html (accessed on 12 January 2015). 18. Maroni, M.; Seifert, B.; Lindwall, T. Indoor Air Quality: A Comprehensive Reference Book; Elsevier: New York, NY, USA, 1995. 19. Mohave, L. Indoor air quality and health. In Proceedings of the Healthy Buildings 2000, Espoo, Finland, 6–10 August 2000. 20. Environmental Protection Agency (EPA). Care for Your Air: A Guide to Indoor Air Quality; EPA: Washington, DC, USA, 2008. 21. Capelli, L.; Sironi, S.; del Rosso, R. Electronic noses for environmental monitoring applications. Sensors 2014, 14, 19979–20007. [CrossRef][PubMed] 22. Dentoni, L.; Capelli, L.; Sironi, S.; del Rosso, R.; Zanetti, S.; della Torre, M. Development of an electronic nose for environmental odour monitoring. Sensors 2012, 12, 14363–14381. [CrossRef][PubMed] 23. Nicolas, J.; Romain, A.C.; Wiertz, V.; Maternova, J.; André, P. Using the classification model of an electronic nose to assign unknown malodours to environmental sources and to monitor them continuously. Sens. Actuators B Chem. 2000, 69, 366–371. [CrossRef] 24. Sironi, S.; Capelli, L.; Céntola, P.; del Rosso, R.; Grande, M.I. Continuous monitoring of odours from a composting plant using electronic noses. Waste Manag. 2007, 27, 389–397. [CrossRef][PubMed] 25. Sohn, J.H.; Pioggia, G.; Craig, I.P.; Stuetz, R.M.; Atzeni, M.G. Identifying major contributing sources to odour annoyance using a non-specific gas sensor array. Biosyst. Eng. 2009, 102, 305–312. [CrossRef] 26. Archie, L.W. Prediction Of Odor Pleasantness Using Electronic Nose Technology and Artificial Neural Networks. Ph.D. Thesis, Pennsylvania State University, State College, PA, USA, 2006. 27. Kim, D.K.; Roh, Y.W.; Hong, K.S. A method of multiple odors detection and recognition. In Proceedings of the International Conference on Human-Computer Interaction, Orlando, FL, USA, 9–14 July 2011; Volume 6762, pp. 464–473. 28. Loutfi, A.; Coradeschi, S. Odor recognition for intelligent systems. IEEE Intell. Syst. 2008, 23, 41–48. [CrossRef] 29. Tong, Z.; Chen, Y.; Malkawi, A.; Adamkiewicz, G.; Spengler, J.D. Quantifying the impact of traffic-related air pollution on the indoor air quality of a naturally ventilated building. Environ. Int. 2016.[CrossRef] [PubMed] 30. Tong, Z.; Yang, B.; Hopke, P.K.; Zhang, K.M. Microenvironmental air quality impact of a commercial-scale biomass heating system. Environ. Pollut. 2017, 220, 1112–1120. [CrossRef][PubMed] 31. OSHA. Indoor Air Quality in Commercial and Institutional Building; OSHA: Washington, DC, USA, 2011. 32. Department of Safety and Health (DOSH). DOSH Profile; Department of Safety and Health, Ministry of Human Resources: Putrajaya, Malaysia, 2014. 33. Figaro. Technical Information For Air Quality Control Sensors; Figaro: Osaka, Japan, 2001. 34. Figaro. TGS 2602—For the Detection of Air Contaminants; Figaro: Osaka, Japan, 2005. 35. Saad, S.M.; Andrew, A.M.; Shakaff, A.Y.M.; Saad, A.R.M.; Kamarudin, A.M.Y.; Zakaria, A. Classifying sources influencing indoor air quality (IAQ) using artificial neural network (ANN). Sensors 2015, 15, 11665–11684. [CrossRef][PubMed] 36. Aeroqual. Series-200-300-500-Portable-Monitor-User-Guide-11–14; Aeroqual: Auckland, New Zealand, 2014. 37. Invernizzi, R.B.G.; Ruprecht, A.; Mazza, R.; Rossetti, E.; Sasco, A.; Nardini, S. Particulate matter from tobacco versus diesel car exhaust: An educational perspective. Tob. Control 2004, 13, 219–221. [CrossRef][PubMed] 38. Meena, K. Indoor Air Pollution: Sources, Health Effects and Mitigation Strategies. In The Encylopedia of Earth; Environmental Information Coalition: Washington, DC, USA, 2009. Available online: http://editors.eol.org/ eoearth/wiki/Indoor_air_quality_(IAQ) (accessed on 31 January 2015). Appl. Sci. 2017, 7, 823 21 of 21

39. Loomis, D.; Grosse, Y.; Lauby-Secretan, B.; El Ghissassi, F.; Bouvard, V.; Benbrahim-Tallaa, L.; Guha, N.; Baan, R.; Mattock, H.; Straif, K.; et al. The carcinogenicity of outdoor air pollution. Lancet Oncol. 2013, 14, 1262–1263. [CrossRef] 40. Andrew, A.M.; Zakaria, A.; Saad, S.M.; Shakaff, A.Y.M. Multi-stage feature selection based intelligent classifier for classification of incipient stage fire in building. Sensors 2016, 16, 31. [CrossRef][PubMed] 41. Bahram, G.K. On Using Artificial Neural Networks and Genetic Algorithms to Optimize Performance of an Electronic Nose. Ph.D. Thesis, North Carolina State University, Raleigh, NC, USA, 1996. 42. Distante, C.; Leo, M.; Siciliano, P.; Persaud, K.C. On the study of feature extraction methods for an electronic nose. Sens. Actuators B Chem. 2002, 87, 274–288. [CrossRef] 43. Gutierrez-Osuna, R.; Nagle, H.T. A method for evaluating data-preprocessing techniques for odor classification with an array of gas sensors. IEEE Trans. Syst. Man Cybern. B Cybern. 1999, 29, 626–632. [CrossRef][PubMed] 44. Gutierrez-Osuna, R. Pattern analysis for machine olfaction: A review. IEEE Sens. J. 2002, 2, 189–202. [CrossRef] 45. Bahraminejad, B.; Basri, S.; Isa, M.; Hambli, Z. Real-time gas identification by analyzing the transient response of capillary-attached conductive gas sensor. Sensors 2010, 10, 5359–5377. [CrossRef][PubMed] 46. Romain, A.C.; Nicolas, J.; Wiertz, V.; Maternova, J.; Andre, P. Use of a simple tin oxide sensor array to identify five malodours collected in the field. Sens. Actuators B Chem. 2000, 62, 73–79. [CrossRef] 47. Gardner, J.W.; Bartlett, P.N. A brief history of electronic noses. Sens. Actuators B Chem. 1994, 18, 210–211. [CrossRef] 48. Scott, S.M.; James, D.; Ali, Z. Data analysis for electronic nose systems. Microchim. Acta 2006, 156, 183–207. [CrossRef] 49. Kim, Y.; Kim, I.; Kim, J.; Yoo, C. Real-time multivariate monitoring and diagnosis of air pollutants in a subway station. In Proceedings of the International Conference on Control, Automation and Systems, Seoul, Korea, 14–17 October 2008; pp. 2610–2615. 50. Hidayat, W.; Shakaff, A.Y.; Ahmad, M.N.; Adom, A.H. Classification of agarwood oil using an electronic nose. Sensors 2010, 10, 4675–4685. [CrossRef][PubMed] 51. Kim, E.; Lee, S.; Kim, J.H.; Kim, C.; Byun, Y.T.; Kim, H.S.; Lee, T. Pattern recognition for selective odor detection with gas sensor arrays. Sensors 2012, 12, 16262–16273. [CrossRef][PubMed] 52. Yingjie, Z. Portable Electronic Nose System for Detecting Volatile Organic Compounds. Master’s Thesis, University of Massachusetts Lowell, Lowell, MA, USA, 2012. 53. Kaiser, H. The application of electronic computers to factor analysis. Educ. Psychol. Meas. 1960, 20, 141–151. [CrossRef] 54. Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine learning: A review of classification and combining techniques. Artif. Intell. Rev. 2006, 26, 159–190. [CrossRef] 55. Schiffman, S.S.; Wyrick, D.W.; Nagle, H.T. Effectiveness of an Electronic Nose for Monitoring Bacterial and Fungal Growth; CRC Press: Boca Raton, FL, USA, 2001. 56. Yolanda, G.M.; Oliveros, C.; Perez, P.; Carmelo, G.; Bernardo, M.C. Electronic nose based on metal oxide semiconductor sensors and pattern recognition techniques: Characterisation of vegetable oils. Anal. Chim. Acta 2001, 449, 69–80. 57. Mamat, M.; Samad, S.A.; Hannan, M.A. An electronic nose for reliable measurement and correct classification of beverages. Sensors 2011, 11, 6435–6453. [CrossRef][PubMed] 58. Trevor, H.; Robert, T.; Jerome, F. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Springer: Berlin, Germany, 2009; Volume 2. 59. Howley, T.; Madden, M.G.; Connell, M.O.; Ryder, A.G. The Effect of principal component analysis on machine learning accuracy with high dimensional spectral data. Knowl. Based Syst. 2006, 19, 363–370. [CrossRef]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).