Prototyping Low-Cost Automatic Stations for Natural Disaster Monitoring

Gabriel F. L. R. Bernardesa, Rogerio´ Ishibashib, Andre´ A. S. Ivob, Valerio´ Rosseta, Bruno Y. L. Kimuraa,∗

aInstitute of Science and Technology, Federal University of S˜aoPaulo (ICT/UNIFESP), S˜aoJos´edos Campos - SP, Brazil bBrazilian National Center for Monitoring and Early Warnings of Natural Disasters (Cemaden), S˜aoJos´edos Campos - SP, Brazil

Abstract Weather events put human lives at risk mostly when people might reside in areas susceptible to natural disasters. Weather monitoring is a pivotal remote sensing task that is accomplished in vulnerable areas with the support of reliable weather stations. Such stations are front-end equipment typically mounted on a fixed mast structure with a set of digital and magnetic weather sensors connected to a datalogger. While remote sensing from a number of stations is paramount, the cost of professional weather instruments is extremely high. This imposes a challenge for large-scale deployment and maintenance of weather stations for broad natural disaster monitoring. To address this problem, in this paper, we validate the hypothesis that a Low-Cost Automatic system (LCAWS) entirely developed from commercial-off-the-shelf and open-source IoT technologies is able to provide data as reliable as a Professional Weather Station (PWS) of reference for natural disaster monitoring. To achieve data reliability, we propose an intelligent sensor calibration method to correct weather parameters. From the experimental results of a 30-day uninterrupted observation period, we show that the results of the calibrated LCAWS sensors have no statistically significant differences with the PWS’s results. Together with The Brazilian National Center for Monitoring and Early Warning of Natural Disasters (Cemaden), LCAWS has opened new opportunities towards reducing maintenance cost of its weather observational network. Keywords: Low-cost Automatic Weather Station; Natural Disaster; Intelligent Sensor Calibration; Internet of Things.

1. Introduction warnings of vulnerable areas in the whole country. At Ce- maden, weather stations are the main monitoring components Weather monitoring is a crucial task in different domains of in a decision-making pipeline for early warnings of natural dis- applications, e.g., high precision agriculture [28], military mis- asters. This means that, while the weather stations themselves sions [36], outdoor entertainment and recreation [16], industrial cannot directly monitor natural disasters, the role of such sta- production, and logistics. One of the most critical applications tions is essentially collecting reliable weather parameters to feed is natural disaster monitoring. change has intensified forecasting models running on Cemaden’s back-end servers. As the occurrence of natural disasters around the world [3]. More such, the forecasting models are the entities responsible for pre- intense weather events have been experienced in the last few dicting disasters and then trigger early warnings. decades, such as higher and lower , intense , strong in tropical cyclones, and intensified droughts [34]. 1.1. Problem Addressed and Proposed Solution People might be exposed to the consequences of extreme weather events, e.g., flash flooding in underground drainage galleries, To provide effective and efficient large-scale weather mon- landslides on slopes, river overflow, soil and coastal erosion, itoring, Cemaden has an observational network composed of infrastructural collapsing of houses and buildings. Besides hu- more than five thousand Data Collection Platforms (DCP) de- man losses, climate-related natural disaster events affect the lo- ployed in almost a thousand municipalities within the Brazilian territory. A DCP is implemented by a Professional Weather arXiv:2102.04574v2 [cs.NI] 3 May 2021 cal economy by destroying productive capital, supply chains, and housing stock [5, 11]. Station (PWS) which consists of a set of one or more sensors to To warn people under imminent risk of natural disasters, measure weather events, e.g., rainfall , speed, weather data remotely collected from weather stations play a , air , relative , and atmo- key role. Although and citizen science based ap- spheric pressure. Although a PWS provides precision and data proaches [13, 15, 10] for disaster management can obtain use- reliability, its cost can reach tens of thousands of dollars. This ful information from volunteers, the main source of accurate implies that the deployment and maintenance of PWSs are ex- and reliable weather data still comes from radar and weather tremely costly for large-scale weather monitoring such as Ce- stations. In order to support risk management and reduce the maden’s observational network. impacts of natural disasters in Brazil, Cemaden is the govern- As an initiative to address cost reduction, in this paper, we ment institution responsible for monitoring and providing early present a low-cost automatic weather station system, LCAWS, which we developed from commercial-off-the-shelf and open- source IoT technologies. The key requirement of LCAWS is ∗Corresponding author Email address: [email protected] (Bruno Y. L. Kimura) providing low-cost weather instrumentation that allows preci- sion and data reliability equivalent to the PWSs employed by 1.3. Related Work Cemaden to monitor natural disasters across the Brazilian ter- In line with the hardware architecture proposed in this pa- ritory. By reducing the costs, it is possible to execute mainte- per, some Arduino-based low-cost prototypes especially designed nance in short periods and expand the network coverage, allow- for environmental monitoring are also present in the literature ing redundancy for a failed equipment. [26, 27, 21, 32]. Among them, ground weather stations devel- oped by Sabharwal et al. [26] and Saini et al. [27] are based 1.2. Summary of Main Contributions on the combination of Arduino Uno and ZigBee technologies. The main contributions of this work are the following: Whereas Lockridge et al. [21] designed a Sonde for monitor- ing marine environments, Strigaro et al. [32] also demonstrated (1) Design and implementation. We describe a detailed de- that low-cost weather stations based on COTS IoT devices are sign of an LCAWS, including: the interaction of the em- accessible solutions able to produce data of appropriate quality bedded electronics and sensor dynamics; the data process- for natural resource and risk management. ing to produce weather parameters with the main equations; Benghanem [4] proposed a wireless data acquisition system the system software architecture with the main algorithms (WDAS) and a low-cost weather station, whose hardware archi- at the automatic weather station (client node) and the cloud tecture was based on PIC (Parallel Interface Controller) 16F877 services (server node). microcontroller and a communication module using RF Mono- (2) Weather station validation. We present a consistent ex- lithics TX5002 and RX5002. Likewise, Tenzin et al. [33] devel- perimental methodology to validate LCAWS which includes oped a low-cost weather station for a smart agriculture applica- three main steps. First, the weather station deployment to- tion based on PIC24FJ64 microcontroller and a Xbee module, gether with a PWS of reference. Second, the data acqui- comparing results with a commercial and more costly station. sition with the same frequency within a one-month con- Shaout et al. [30] present a low-cost weather prototype for tinuous period of data sampling from measurements of at- measuring , wind directions, and temperature. These mospheric pressure, air temperature, relative humidity, parameters are used in combination with a neural network in precipitation, wind speed, and wind direction. Third, the order to estimate a dressing index, i.e., the number of clothes data analysis to compare the weather parameters produced a person should use under certain conditions. The hardware by the stations from different performance indicators. architecture consists of a Freescale HCS12 family’s Dragon12- Plus2 and a MC9S12DG256 microcontroller (16-bit CPU, 256 (3) Intelligent sensor calibration and data correction. We KB flash memory, 12 KB RAM, 4 KB EEPROM). Kodali and apply a robust methodology for calibrating the sensors in Mandal [18] explore the capabilities of ESP8266 (4 MB RAM, order to correct the weather parameters by means of linear 128 KB ROM) to prototype a very low cost weather station to and machine learning regression models. In such a method- acquire temperature, pressure, light, and rain drop sensors. ology, for each weather sensor, we select the best models Different from these related works, in our study, we focus from a broad set of candidate regression models. To do so, on the context of the natural disaster monitoring system which for each candidate model, we ran experiments exhaustively presents stronger operation requirements and high accuracy of with different randomized train-validation and test datasets gathered data. We validate an LCAWS by comparing results by using a k-fold cross-validation based machine learning with a PWS of reference for natural disaster monitoring, ap- pipeline. When applying the best models to correct the plying a consistent experimental methodology in order to ful- LCAWS weather parameters, they allowed an improvement fill weather monitoring requirements of Cemaden—a national on R-squared coefficient determination (R2) from 0.93 to monitoring center. In order to improve the accuracy of collected 0.99 for digital sensors, and from 0.31 to 0.97 for magnetic weather data, we propose and implement as an intelligent cali- sensors. In other words, this means a root mean squared er- bration system based on linear and machine learning regression ror (RMSE) reduction of up to 70% for digital sensors and models, which is a contribution that does not appear in related and up to 80% for magnetic sensors. Such an intelligent works. approach allowed an improvement in such a way that there The remainder of this paper is organized as follows: Sec- were no significant differences in the weather parameters tion 2 presents Cemaden’s remarks on the problem we addressed produced by the LCAWS and PWS, i.e., T-Test’s p-value in this paper. Section 3 describes our LCAWS prototype. Sec- greater than 0.05. tion 4 describes the weather data processing for LCAWS. Sec- tion 5 describes the weather station validation. Section 6 presents (4) A good candidate solution to reduce costs in natural dis- the intelligent sensor calibration and data correction approach aster monitoring. Using a robust methodology, we showed we proposed. Finally, Section 7 brings our main conclusions, that an LCAWS of a few hundred dollars has the potential to as well as the LCAWS’s limitations to be addressed in future provide weather data as reliable as a PWS applied for natu- work. ral disaster monitoring. When validating a reliable LCAWS prototype, new opportunities have been presented together with Cemaden to expand its national weather observational network and reduce the cost of maintenance of its DCPs.

2 Figure 1: Distribution of Cemaden’s observational network.

2. The Brazilian National Center for Monitoring and Early 2.1. DCP Types for Natural Disaster Monitoring Warning of Natural Disasters Every Cemaden DCP is a Professional Weather Station (PWS) equipped with a tipping bucket , and able to accom- Cemaden is a national center created in 2011 under the Min- modate additional sensors of specific groups of natural disaster istry of Science, Technology, and Innovation of the Brazilian monitoring. Such groups are divided into five DCP/PWS cat- government. Besides providing uninterrupted natural disasters egories:Pluviometric, Hydrological, Geotechnical, Agrometeo- monitoring and early warnings, Cemaden also carries out re- rological, and Acqua. search and technology innovation to contribute to its monitoring A pluviometric DCP is used for monitoring natural disasters systems in order to reduce the number of victims living in risk in general, which are often triggered by rain, and it is usually areas around the country [12]. Nowadays, Cemaden is monitor- composed of only a rain gauge. A hydrological DCP is used ing 958 municipalities in the entire Brazilian territory, mostly for flood monitoring from a rain gauge, a level radar for river for geological (landslides) and hydrological (floods) events of- level measurement, and a photography camera. A geotechni- ten caused by intense or sparse rain events [22]. To reach its cal DCP consists of a rain gauge and six soil humidity sensors goals, Cemaden holds an observational network, currently con- (up to 3 m) to be used together with a robotic total station to sisting of 5857 different kinds of DCPs and nine weather radars. help in the landslide monitoring. An agrometeorological DCP Such a network covers metropolitan areas subject to natural dis- is used for drought monitoring in the semiarid region of Brazil. asters while gathering meteorological data in real-. Ce- It consists of a rain gauge, a thermo for air tem- maden also has numerous partnership agreements to provide perature and relative humidity, an for wind speed, and receive weather information from other observational net- a vane for wind direction, four soil temperature and humidity works (radars and DCPs), expanding its network for better cov- sensors, a , and a for solar radiation erage and redundancy. The coverage map in Figure 1 shows measurements. An acqua DCP is also used for drought mon- the national territory with the priority municipalities overlaid itoring in the Brazilian semiarid region from a rain gauge and by the distribution of the observational systems. two superficial soil humidity sensors. In spite of its importance for governmental decision-making,

3 the expansion and maintenance of almost six thousands weather 2.3. Maintenance Schedules: A Cost-Related Challenge to Mit- instruments in a territory of 8,516,000 km2 impose financial and igate the Impacts of Vulnerabilities of DCPs technological challenges and logistics and communication dif- The aforementioned vulnerabilities can directly affect the ficulties and vulnerabilities. weather condition analysis at Cemaden’s Situational Room and, as consequence, affect the quality of the warning messages sent 2.2. Frequent Vulnerabilities of DCPs to Civil Defenses. In order to mitigate the impacts of these Based on historical information and maintenance reports on vulnerabilities, an effective preventive maintenance schedule Cemaden’s DCP network, it was possible to identify the most in short periods is a pivotal measure. However, Cemaden is frequent vulnerabilities that cause service interruption, classify a governmental center with very limited financial and human them into types, and determine their impact from the percentage resources. Currently, it is possible to execute a device’s main- of affected devices. Table 1 presents an analysis of vulnerabili- tenance every eighteen months. ties regarding a 2017 survey. In this context, one of its greatest challenges to mitigate Table 1: Frequent vulnerabilities found in Ceamaden’s DCP network. the impacts of vulnerabilities in its observational network is to execute maintenance within appropriate periods (less than six Vulnerability Type Affected Devices months) using the same human resources and available bud- get. From the vulnerability analysis, we observed that they Rain gauge clogging Physical 45.60% Theft, vandalism, and damage Human 5.37% are not linked to the robustness of hardware, but to the applied Battery Technological 4.49% technology. Cemaden’s DCPs use expensive imported compo- Communication failure Technological 12.05% nents (e.g., sensors, power regulators, solar panels, AC charg- No internet coverage Technological – ers, batteries, dataloggers, network interface controllers, enclo- sures, among others), which makes maintenance expensive, ei- Rain gauge clogging is a physical vulnerability that repre- ther preventively or for replacing failed components. Indeed, sents the greatest number of occurrences, a total of 35,185 oc- low-cost stations would be exposed to the same vulnerabilities currences in 2017, which affected 2671 devices. In most cases, in the field. However, an LCAWS with similar functionalities the problem of funnel clogging is caused by dirt, falling leaves and measurement accuracy should be a good cost alternative to and small insects that interrupt flow through the a PWS, allowing the execution of maintenance in shorter pe- of the tipping bucket rain gauge. riods and also an expansion of the network coverage, giving Theft, vandalism, and damage are human type vulnerabili- redundancy for a failed equipment. ties that affected around 5.37% of devices in 2017. In particular, The maintenance regards other preventive actions, such as there are many goats raised for milk and meat in the semiarid the installation of fences around most exposed equipment, part- region. These animals have a peculiar diet, and it is not unusual nerships with Civil Defenses for maintenance, informative boards, to gnaw the DCP’s cables, sensors, and even the holding box. etc. Communication failures and the absence of internet access Regarding the technological vulnerabilities, we highlight coverage are vulnerabilities that require efforts also from third that the most impactful ones are battery maintenance, commu- parties such as mobile network operators. To improve connec- nication failures, and the absence of internet coverage access. tivity beyond the access provided by network operators, e.g., by A battery is a type of commodity item that has a short life deploying an ad-hoc network to connect DCPs and Cemaden’s cycle and needs to be replaced every two years. Thus, it is servers, the DCPs have to be updated. However, besides the important to have a good schedule for battery replacement in need for acquiring new components (i.e., incurring cost of im- maintenance campaigns. Communication failures were regis- ported components), improving communication capacity would tered in 706 devices by data transmission problems with a to- be limited to the DCP manufacturer’s solutions. tal of 18,750 cases in 2017. Currently, the DCP communica- tion is based on GSM/GPRS technologies that uses the mobile 2.4. LCAWS as a Key Enabler for Low-Cost Reliable Weather telecommunication infrastructure. Thus, there are at least two Monitoring factors that could degrade signal quality and cause a failure: As discussed above, the cost reduction in the maintenance long distances between the telecommunication tower and DCP, process for a reliable weather monitoring is a key challenge for and obstructions between them, e.g., mountains, hills, build- Cemaden, regarding its limited financial and human resources. ings, even a strong rain can attenuate radio signals. No inter- In this context, prototyping Low-Cost Automatic Weather Sta- net coverage is an extension of communication failures. This tions (LCAWS) can represent a great opportunity to reduce the technological vulnerability represents a limitation to expand or maintenance cost composition, since it uses easily found Com- rearrange the observational network, as the telecommunication mercial Off The Shelf (COTS) components, representing just a towers typically cover the most populated areas and the coastal fraction of the values of actual DCP/PWS components. Besides zone of the country. It means that there are difficulties to in- a significant cost reduction in extreme scales (≈1/30) without stall instruments in rural areas (e.g., to monitor drought), top of losing autonomy or functionalities, the openness feature from mountains, and even other basins where rain could contribute open-source IoT technologies employed by LCAWS leverages to a natural disaster in a risk area. Nowadays, Cemaden has not other benefits, e.g., the flexibility for updating hardware/software deployed any DCPs in areas with no telecommunication cover- age. 4 components without being confined to single manufactures’ so- The mobile component of the anemometer is composed of lutions; the possibility of acting in different domains of re- a permanent magnet and three 75 mm diameter aluminum cups search and development, such as low-cost technologies for en- mounted on horizontal arms. Air flow makes this mobile part vironmental sensors, sensor data communication, data loggers, rotate on a vertical axis and once per full rotation the perma- among others. In the next section, we describe the design and nent magnet triggers a reed switch which is fixed to the static implementation of our LCAWS prototype. part. These rotations generate electric pulses that, along with a timestamp associated with each pulse, can be used to determine wind speed. 3. The Prototype of a Low-Cost Automatic Weather Station The tipping bucket rain gauge consists of a 150 mm diame- The LCAWS presented in this paper is composed of six ter funnel where rainwater collects and drips down to a tipping main components mounted on a structure of vertical and hor- bucket. The tipping buckets are the only moving parts and store izontal axes, as seen in Figure 2a and described in Table 2. The water in one of two buckets. When one bucket accumulates a weather measurements are provided by magnetic and digital specific volume of water, it tips over, emptying itself and expos- sensors. The magnetic sensors are the wind vane, cup anemome- ing the other one. Once that other bucket is full, it tips over and ter, and tipping bucket rain gauge, which are implemented by repeats the process. A permanent magnet placed between the the components 1, 2, and 3, respectively. The sensors for mea- two buckets triggers one of two reed switches, creating a pulse suring , air temperature, and relative hu- that is interpreted as a precise amount of precipitation that has midity are implemented by a single digital module that is housed occurred since the last pulse. in component 5. Component 4 is a photo-voltaic solar panel responsible for charging a 12 V 6 Ah lead-acid battery which 3.2. Digital Sensors powers the weather station. The battery is housed in component Component 5 is a sensor housing which accommodates and 6, the datalogger, which is responsible for acquiring, process- protects three digital sensors: air temperature, atmospheric pres- ing, storing, and transmitting the weather measurements. In the sure, and relative humidity. Such sensors are combined into a next subsections, we present details on the magnetic and digi- single Bosch BME280 [6] module. While providing low power tal sensors, the datalogger components, as well as the software consumption (3.6 µA at 1 Hz measurement), high accuracy and architecture of the system. resolution, the BME280 module encapsulates the sensors into an electronic component of small dimensions (11.5 × 15 mm). 3.1. Magnetic Sensors Different settings of oversampling and filtering allow tailor- ing data rate, noise reduction, response time, and energy con- The wind vane, cup anemometer, and tipping bucket rain sumption. An impulse response filter (IIR) can remove short gauge are magnetic sensors. Each consists of one or more reed fluctuations in data sampling. The small dimensions and versa- switches and a permanent magnet that is attached to a particu- tile features allow sensor implementation in small devices such lar mechanical device. Reed switches are devices that open or as handsets, GPS modules, watches, in different applications, close an electrical circuit when influenced by a magnetic field. e.g., home automation, indoor navigation, fitness, and GPS re- All reed switches in these specific sensors are of the normally finement. Differently, in this paper, we apply the BME280 to open type, i.e., they remain open by default until a nearby mag- implement an automatic weather station. netic field causes them to close. Each mechanical device has a specialized structure designed to interact physically with the 3.3. Datalogger environment so that movement of its parts causes a permanent magnet to influence one or more reed switches. By causing a Component 6 is the datalogger, which is a key instrument reed switch to allow electricity to pass, the mechanical device’s that contains all the circuitry responsible for sensor data acqui- movement generates electric signals which can consistently and sition, processing, and storage, for telemetry, and for powering reliably be interpreted as a specific weather phenomenon. the weather station. The datalogger is composed of eight main Each weather device produces particular movement dynam- internal components, as seen on Figure 2b: ics. Air flow causes the rotary part of the wind vane to rotate (.1) GPS receiver. Aside from useful to synchronize time and on a vertical axis until it points to the direction from where the localization, the GPS receiver provides a source of data for wind is blowing. The permanent magnet fixed to the rotary part further investigations, e.g., correlation between signal-to- moves on top of eight reed switches equally spaced by 45° fixed noise ratio (SNR) from different and the weather to the static part of the wind vane so that only one reed switch is measurements. on at any given time. Each of the reed switches is in series with a resistor of a distinct value varying uniformly from 10,000 Ω to (.2) Charge controller. It is a device that regulates the highly 80,000 Ω. When that particular circuit is closed, the resistance variable voltage and current supplied by a solar panel and of the overall circuit is uniquely related to a particular wind di- charges a battery following a charge curve. The charge rection. The overall circuit is in series with a fixed reference controller used in our prototype is a generic 10 A max resistor of 4700 Ω, forming a voltage divider. The wind direc- 12/24 V pulse width modulation (PWM) controller. tion is thus associated with the voltage measured in the wind (.3) Battery. The weather station is powered by a 12 V 6 Ah vane circuit. lead-acid absorbed glass mat (AGM) motorcycle battery. 5 Table 2: LCAWS components.

Component Description

1 Wind vane: Wind vane magnetic sensor (reed switch) of 45° resolution (N, NE, E, SE, S, SW, W, NW). 2 Anemometer: Wind speed magnetic sensor (reed switch) of one pulse per full rotation of 75 mm diameter aluminum cups, 147 mm radius to cup center. 3 Rain gauge: Tipping bucket rain gauge with magnetic sen- sor (reed switch) of 0.25 mm precipitation per pulse. 150 mm diameter collector. 4 Solar panel: Yingli YL055P-17b 55 W peak power solar panel used to charge the battery. 5 Sensor housing: Bosch BME280 combined digital sensors of air temperature, atmospheric pressure, and relative humidity. 6 Datalogger: .1 GPS receiver: u-blox Neo-6M GPS receiver module. .2 Charge controller: Generic 12 V/24 V 10 A Charge controller. .3 Battery: 12 V 6 Ah AGM lead-acid battery. .4 Terminal block: Generic terminal block for cable connections. .5 Internal sensors: Bosch BME280. .6 Storage: 2 GB MicroSD card. .7 Processing & Arduino Mega 2560, Telemetry: Epalsite GPRS Shield V1.0 (SIM900). .8 Voltage regulator: DC-DC step-down voltage regulator. (a) Weather station components (b) Components inside datalogger

Figure 2: Components of the Low-Cost Automatic Weather Station (LCAWS).

(.4) Terminal block. A 10-pin terminal block (component 6.4) weather station. The weather sensors are gathered into a pool is used to facilitate maintenance and secure stable electri- of S sensors: atmospheric pressure (AP), air temperature (AT), cal connections between sensors and the microcontroller. relative humidity (RH), rain gauge (RG), wind speed (WS), and wind direction (WD). The client program runs continuously and (.5) Internal weather sensors. A Bosch BME280 sensor mod- collects the sensor data once a minute, hence there is a standby ule is installed inside the datalogger housing to monitor time ts between each data acquisition cycle. The generic pro- internal temperature and humidity. cedure get sensordata() reads the sensors and returns the fol- (.6) Storage. Data is stored in a 2 GB microSD card. lowing 7-tuple:

(.7) Processing and telemetry. The Arduino Mega 2560 is the τ = {AP, AT, RH, RG, WS, WD, tts} (1) microcontroller used to acquire, process, store, and trans- mit data from sensors. Coupled with a SIM900 GPRS where tts is the UTC timestamp of each data acquisition cycle shield, it sends the weather data automatically to a cloud on the pool of sensors. Inside the get sensordata() procedure, remote server. the read sensor() procedure reads out the value available for each sensor s. (.8) Voltage regulator. An LM2596S DC-DC step-down volt- A file f is created to store a dataset of Nτ = 10 tuples τ in a age regulator chip, which converts 10–14 V inputs from cache on the local file system. The file f is sent to the server via the battery to a fixed output at 7 V to the Arduino. TCP/IP over the mobile operator network under configurable data frequencies. For the sake of real-time weather monitoring, 3.4. System Software Architecture the periodic communication can be accomplished at the rate of The LCAWS system software is based on a client–server the data acquisition from the sensors, i.e., the sending of sensor architecture, as shown in Figure 3. The automatic weather sta- data tuple every minute cycle as soon as it is measured. This tion (AWS) is composed of a client program and routines de- sampling frequency should be enough, since it is higher than ployed on the datalogger. On the server side, the cloud ser- the one configured in Cemaden’s DCPs on the field, which is vices (CS) are provided by programs and routines for process- one sample every 10 min to monitor natural disasters. If the ing and data corrections deployed on a cloud. The rationale of internet connection is completely interrupted, the data continue AWS and CS operations are summarized in Algorithms 1 and 2, being buffered in the local files and is transmitted when the in- respectively. ternet connection is reestablished. If a connection is unstable The client program is implemented with the C++ Arduino and packet drop occurs, the TCP connection between the AWS API [1] in order to run on the microcontroller deployed on the client and the CS server is responsible for providing the reli- 6 Automatic Weather Station (AWS)

get_sensordata(S) Algorithm 1: AWS operation. Algorithm 2: CS operation. AP send_to(server, f ) Input: Input: AT N = 10, tuples τ per file. DB = , an initialized τ database. RH τ L null pool of client RG sensors S ts = 1 min, a sampling standby. DBL = null, an initialized τ database. write_file(f, τ) server = IP-port, address info. t, UTC starting time for data processing. WS Internet WD program client(Nτ, ts, server): i = 1 hour, time interval from i. tuple τ cache init modules() program server(): S = init sensors() while true do Cloud Services (CS) while true do f = create file() f = create file() recv from(client, f ) dataset d recv_from(client, f ) foreach 1 to Nτ do append(DBL, f ) τ = get sensordata(S ) close( f ) server write file( f , τ)) end DBL process_data standby(t ) s program process data(DBL, DBL, i, g): τ summary end subset(DB, i ≤ t ≤ i + g) while not end of(DBL) do ts send to(server, f ) append(DB, f ) d = subset(DBL, t ≤ tts < t + i) close( f ) DB AP = summary(dAP) // Equation (3) L end AT = summary(d ) // Equation (3) procedure get sensordata(S ): AT Inteligent Sensor RH = summary(d ) // Equation (3) Calibrataion get dataset τ = null RH Dsk to correct DB DBP foreach s ∈ S do RG = summary(dRG) // Equation (4) get dataset v = read sensor(s) data_correction Dsj to learn WS = summary(dWS) // Equation (10) τ = cat(τ, v) WD = summary(dWD) // Equation (13) D * learning_pipeline sk end τ = cat(AP, T, RH, P, WS, WD) learner learning tts = get timestamp() DB* techniques append(DBL, τ) L model l* τ = cat(τ, tts) t t i best model l* return τ = + O L end

Figure 3: LCAWS system software architecture. able transmission required to consistently send the weather data eral database DB. The database DBP stores the corresponding cached in local files. In our proof-of-concept prototype for au- weather parameters obtained from a professional weather sta- tomatic data acquisition, the data transmission is implemented tion of reference. As such, a dataset Ds j ∈ DB of a sensor s ∈ S with a GPRS shield connected to the Arduino microcontroller. consists of the weather parameters of both low-cost and pro- Nevertheless, once the LCAWS design is a COTS-based modu- fessional weather stations, regarding a j period of observation. lar architecture, there is flexibility to apply other up to date de- Given Ds j and a set L of candidate machine learning-based re- ployable long-range IoT data transmission technologies [8, 19], gression techniques, the procedure learning pipeline is respon- ∗ e.g., LoRA [2], SigFox [39], and NB-IoT [29]. sible for constructing the learner model l to fit Ds j through a On the remote side, cloud services are provided from a server k-fold cross-validation learning pipeline. The resulting learn- program and a program for data processing. The server is ers l∗ (i.e., the fitted regression models) are stored into a set O implemented in Python and runs continuously in order to re- together with other machine learning output material. When ceive the file f and append Nτ tuples of raw data values into having a number of learner models available in O, the proce- a local database DB. Asynchronously, process data is an- dure data correction takes the best one l∗ for a target sensor s other program implemented in R Language, which processes in order to correct a given dataset Dsk of a k observation period ∗ weather measurements in raw values and produces the corre- from the LCAWS’s database DBL. The resulting dataset Dsk sponding weather parameters. To do so, a subset of raw data containing the corrected weather parameters is thus stored in a d in DB is selected regarding t , which is the measurement ∗ L ts final database DBL . UTC timestamp. Since the weather parameters are usually sum- The proposed intelligent sensor calibration and data correc- marized in statistics per hour, the dataset d considers an interval tion is expected to be a continuous process. This means that, [t, t+i−1], where t is a given UTC starting time, and i = 60 min once we have a set of fitted models in O, the regression models is the dataset time interval. For each 1 h subset d, we deter- can be applied to correct data as soon as new weather parame- ∈ mine the weather parameter for each sensor s S from the ters are available in DBL. Meanwhile, the set O should not be procedure summary(). As highlighted in Algorithm 2, such a a permanent repository of regression models, so that the pro- procedure applies equations to give the parameters across the cedure learning pipeline has to be run whenever new ground- different types of weather data. These equations are discussed truth values are available in DBP. In Section 6, we describe in the next section. details of the implementation and methodology the proposed As a service CS at the server side, we have the intelligent calibration and data corrections, as well as the discussion of sensor calibration programs. The processed LCAWS weather results we obtained from experiments conducted exhaustively. data stored in DBL and the ones in DBP are merged into a gen-

7 4. Weather Sensor Data Processing The lagged differences are given by:  In this section, we describe the equations we implemented xi+1 + xmax − xi, if (xi+1 − xi) < 0, throughout the procedure summary() to process the different fd(xi) = (5) x − x , otherwise, sensor data and obtain the desired weather parameters. i+1 i

where xmax is the maximum unsigned integer value in which 4.1. Digital Sensor Measurements xi can store. The LCAWS client data collector module assumes Atmospheric pressure (AP), air temperature (AT), and rela- 16-bit integer on the Arduino Mega microcontroller, i.e., xmax = tive humidity (RH) are measured by the Bosch BME280 digital 216 − 1. sensor kept in the sensor housing. The BME280 allows for ac- complishing a period of measurement with configurable over- 4.3. Wind Speed sampling. After such a measurement, an infinite impulse re- The raw measurement obtained from the LCAWS anemome- sponse filter (IIR) can be applied to AP and AT values in order ter computes cumulatively the number of revolutions of its cup- to increase the output signal resolution from 16 to 20 bits while mounted arm. To convert such a data into wind speed in meters reducing bandwidth and removing short-term fluctuations. RH per second, w, the data processing is accomplished as follows: measurement does not fluctuate rapidly, thus it requires no fil- tering. BME280s IIR is given by a low-pass filter: w = C · R, (6)

∗ k x · (2 − 1) + xi where C = π × o is the circumference of a revolution for a x∗ = i−1 , (2) i 2k propeller of o diameter; and R is the expected amount of rev- olutions of the propeller observed in a second. The LCAWS x where i is the observed measurement from the ADC output anemometer provides revolutions of circumference C = 92.4 cm, data; x∗ is the previous measurement filtered; and 2k is a filter i−1 once its arms have a diameter of o = 29.4 cm. k , coefficient with a factor = [0 4]. Since a cup-mounted arm revolution causes a magnetic pulse, × The LCAWS oversampling setting is of 16 measurements R provides the expected number of pulses per second, which is k with filter disabled (i.e., = 0) per data reading. A cycle of given by: a single sampling takes the time of three consecutive measure- fd(r) ments (AT, AP, and RH) followed by a standby time of 0.5 ms. R = · s, (7) fd(t) The LCAWS client data collector accomplishes one data read- ing per minute from the BME280 module. Thus, the mean of where fd(x) is given by Equation (5); r the set of cumulative filtered measurements per hour for a BME280 sensor is: pulses observed in a 1 h period; t is a set of the numbers of cu- mulative milliseconds passed since LCAWS datalogger started N running; and s is the time unit in seconds, i.e., s = 1000 ms. 1 X ∗ X = x , (3) Cup and wind vanes are typical weather in- N i i=1 struments that provide horizontal wind measurements, so that where N is the sample size of measurements obtained in a pe- wind speed is related to a particular wind direction. Thus, to ob- ∗ tain the mean of wind speed of an observation period, the east– riod of 1 h observation; and xi is an i measurement filtered with Equation (2). west and north–south wind directions require to be mapped into the Cartesian plane of two components, x and y. The mean of 4.2. Rainfall Precipitation the vector components,x ¯ andy ¯, are given by: The LCAWS pluviometer is a tipping bucket rain gauge, N 1 X   θi  thus it counts cumulatively the number of magnetic pulses when- x¯ = −wi · sin 2π · , (8) N 360 ever the bucket tips due to the rainfall precipitation. From the i number of pulses, the correspondent rainfall in 1 XN   θ  millimeters per hour is: y¯ = −w · cos 2π · i , (9) N i 360 i XN RG = fd(pi) × ν, (4) where N is the sample size of measurements obtained with the i=1 weather station in 1-hour observation; wi is the wind speed in m/s; and θ is the wind direction in degrees. The wind speed w where N is the sample size of measurements obtained in 1 h i i and LCAWS direction θ are given by Equations (6) and (12), observation; p is the number of pulse clicks accumulated in i i respectively. that measurement; ν is the amount of precipitation measured From the means of vector components x and y, the mean of by each pulse generated by the tipping of the LCAWS bucket, wind speed is determined from the resultant vector average: which is ν = 0.25 mm; and fd(x) is the lagged differences be- tween two consecutive values, i and i + 1 in x. q WS = x¯2 + y¯2, (10)

8 wherex ¯ andy ¯ are the component averages obtained with Equa- catch and measure the same weather events at near location and tions (8) and (9), respectively. time, allowing direct comparison of the measurement results between them. The Professional Weather Station (PWS) of ref- 4.4. Wind Vane erence we utilized was the Campbell Scientific aCR200 Series The LCAWS wind vane operates through eight reed switches (Campbell Scientific, Inc, Leicestershire, UK) [7]. Specifica- fixed to the chassis, which are switched on by a permanent mag- tions of both stations are detailed in Table 3. Figure 4 shows net fixed to the rotary part of the vane. This gives the wind vane the pictures of the deployment location. a resolution of θ = 45°, corresponding to the Cardinal direc- res Table 3: Weather stations’ specifications. tions north, northeast, east, southeast, south, southwest, west, and northwest. LCAWS PWS In the magnetic sensor device, the wind direction corre- Manufacture: Multi-suppliers Campbell Scientific sponds to the current voltage of the vane voltage divider in the Model: Beta v3.0 CR200(X) continuous interval of 0v and 5v. ADC provides such a volt- Maturity level: Academic/test prototype Professional use age in 10-bit integer values, which can be lossless reduced to Weather sensors: AP, AT, RH, RG, WS, WD AP, AT, RH, RG, WS, WD the space of an 8-bit integer. Thus, the value domain is within Control sensors: GPS, BME280, Battery l. Battery level 8 − ∈ API languages: C++, C, R, Python CRBASIC V = [0, vmax], where vmax = 2 1. Then, a voltage v V is A/D converter: 10 bits 12 bits mapped into one out of nine discrete vane positions p ∈ P.A Max. scan rate: 1/s 1/s mapping function fp : V → P is given by: User program: – PC200W software Processing: Arduino Mega 2560 CR200(X) CPU     vmax Memory: 8 KB RAM, 6.5 KB program storage fp(v) = Rref · − 1 , (11) 256 KB program storage v Data storage: 2 GB microSD 128-512 KB EEPROM Data format: C++ primitive data types 4 B per data point where Rref = 4700 Ω is the reference resistor. Data retrieval: USB/RS232, GPRS, microSD RS232, PCCOM If 1 ≤ p ≤ 8, then the vane position p can be mapped into a Data correction: ML-based sensor calibration – corresponding angle θ ∈ Θ, where Θ = {0, 45, ··· , 315} is a set Communication: USB/RS232, GPRS shield Serial RS232 o o of angles spaced in θ degrees according to the eight expected Temp. range: – -40 C to +50 C res Datalogger Plastic, IP55, Aluminium, NEMA 4X, Cardinal directions. A mapping function, fθ : P → Θ, is given enclosure: 30 cm× 22 cm × 12 cm 14 cm × 17.6 cm × 5.1 cm by: Current drain: 80 mA avg, 2 A peak (GPRS 3 mA avg, 75 mA peak,  TX), 12 V 12 V θ0, if p = 0, fθ(p) =  (12) Battery: 6 Ah 0.08 Ah θres · (p − 1), otherwise, Power supply: Solar panel 55 Wp Solar panel 10 Wp Redundancy: – Battery backup where θ0 = 225° is the calibration of a particular position of the Protections: Moisture monitoring EMI, ESD vane when fp(v) = 0, in which the circuit is spliced so that ACD cannot provide a valid voltage; and θres = 45° is the angular space between two intercardinal directions. 5.2. Weather Data Acquisition When determining θ, the mean of wind direction, WS, of a We collected the weather measurements from both stations period of observation is determined by: during 30 days of continuous observations, during the rainy sea- ! x¯ 180 son in southeast Brazil (March 2019). Each station provided the WD = arctan · + 180, (13) y¯ π measurements once per minute, so that we processed the corre- sponding weather parameters into statistics per hour, according wherex ¯ andy ¯ are the means of the vector wind components to the data processing described in Section 4. Thus, from a large obtained from Equations (8) and (9), respectively. uninterrupted observation period, we obtained samplings of at- mospheric pressure (AP), air temperature (AT), relative humid- ity (RH), rain gauge (RG), wind speed (WS), and wind direc- 5. Weather Station Validation tion (WD). Figure 5 shows the time-series for each parameter To validate LCAWS, we applied an experimental method- provided by LCAWS and PWS during the 30 days of obser- ology of three main steps: weather station deployment, data ac- vations. When interposing the results of the stations, the time- quisition, and data analysis. In the next subsections, we discuss series gives us a description of how each parameter had behaved each step. and how close the results of LCAWS are with respect to PWS. Since we used PWS as a reference, we assume its results as be- 5.1. Weather Station Deployment ing the ground-truth to discuss closeness and dissimilarity from the results produced by LCAWS. We deployed LCAWS at the outdoor test environment of the For each sensor of each station, we generate a dataset by Technology Park (PqTec) [25, 23] of the city of Sao˜ Jose´ dos summarizing statistics (e.g., minimum, 1st quartile, median, Campos, state of Sao˜ Paulo, southeast Brazil. In order to asses mean, 3rd quartile, maximum, sum, and mode) per hour over the weather measurement results, LCAWS was co-located 3 m the corresponding weather parameter (i.e., over the weather data apart of a reference weather station. Both stations were able to processed with the equations described in Section 4). Thus, 9 (a) Deployment location (red point): 23o09’21.5” S 45o7’30.0” W. (b) LCAWS. (c) PWS.

Figure 4: Weather station deployment in situ pictures: (a) a 300 ft aerial image of the Sao˜ Jose´ dos Campos Technological Park (PqTec)[25] – image adapted from OpenStreetMap contributors [23], with the red point marking the exact location (-23.15598 -45.79166) of the deployment; (b) the Low-Cost Automatic Weather Station (LCAWS); and (c) the Professional Weather Station (PWS). the summarized weather parameter is described in the domain root of such an average, respectively, of non-negative real numbers, where each tuple in the dataset N t 1 X is driven to the corresponding timestamp ts. The datasets are MSE = (y − yˆ )2, (16) publicly available and can be found in [20]. N i i √ i=1 5.3. Weather Data Analysis RMSE = MSE, (17)

We analyzed the parameters by computing various perfor- where yi andy ˆi are the actual and predicted values, as in Equa- mance metrics such as Pearson’s Correlation Coefficient (PCC), tion (15). While MSE is a risk metric corresponding to a quadratic R-squared coefficient of determination (R2), and residuals anal- score from the average residual magnitude, RMSE is its squared ysis from Mean Squared Error (MSE) and Root Mean Squared root, which can be interpreted with the same unit as the mea- Error (RMSE). In addition, we applied t-tests over the obtained sured data. results in order to verify the statistical significance. Figure 7 shows the PPC results between pairs of sensors. To correlate the weather parameters in pairs of sensors, we From the PPC similarity of internal sensor pairs in each sta- apply PPC: tion, as shown in Figure 7a,b, one can assume that the results between the stations are coherent. While there was no corre- PN (xi − x¯) · (yi − y¯) PPC = i=1 , (14) lation between AP and the other sensors, RH was strongly and q q negatively correlated to AT. The other pairs of sensors had pos- PN (x − x¯)2 · PN (y − y¯)2 i=1 i i=1 i itive and negative PPC results, however, with no strong coeffi- − where x and y are two given values correlated from two sensors, cients greater than 0.5 or lower than 0.5. In general, the ob- andx ¯ andy ¯ are their averages, respectively. tained PPCs show that LCAWS works in accordance with PWS. Considering the PWS results as being the ground-truth, we However, when combining sensors in pairs between LCAWS determine the goodness-of-fit from LCAWS to PWS for each and PWD, as shown in Figure 7c, we observed correlation di- pair of equivalent sensors by applying the R2 coefficient, vergences. While the Bosch BME280s digital sensors (AP, AT, RH) allowed coefficients closer to the ones in PWD, the PN 2 magnetic-based sensors (RG, WS, WD) could not provide such i=1 (yi − yˆi) R2 = 1 − , (15) stronger correlations. PN (¯y − y )2 i=1 i To investigate such a divergence, we compared those pairs of sensors by analyzing the residual results between them with where yi andy ˆi are the actual and predicted values, i.e., PWS and LCAWS results, respectively. The best case is R2 = 1, i.e., the performance metrics R2, MSE, and RMSE. Table 4 presents when the predicted values exactly match the actual ones, so that the metrics obtained between LCAWS and PWS. To assess the the residual sum of squares is zero. statistical significance of the results, we applied paired t-tests To measure how well the results of LCAWS can fit to PWS, by assuming the following parameterization: the confidence in- for each pair of equivalent sensors, we analyze the residuals terval of 95 %, with the null hypothesis H0 : sL = sP, i.e., a between them from the average of squared errors and square datum collected from a given sensor s in LCAWS is equal to a corresponding one in PWS. Otherwise, the alternative hypothe-

10 ssi eetdfrtesnosA,A,adR,wihmeans which RH, di provide and AT, sensors AP, such sensors hypoth- statistically the null that the for words, rejected other is In esis (**). 0.001 than lower icance, verified we PWD, results and the per- LCAWS of between promising significance the obtaining assessing when the Although metrics, to formance PWS. close were where with LCAWS 6, obtained from Figure RH ones in and AT, plots AP, scatter for the results in the seen be can This RMSE. R2 with 0 accuracy, high allow could module BME280 Bosch’s and LCAWS between observation). significance (30-day statistical PWS and metrics Performance 4: Table and (LCAWS) Station H Weather sis Automatic Low-Cost the side-by-side, co-located stations both (WS). with Direction di observation Wind from and weather (WS), (PWS), of Speed Station days Weather 30 Professional of the Time-series 5: Figure facpigH accepting of

. WD (degree) WS (m/s) RG (mm/h) RH (%) AT (C) AP (mbar) 100 200 300 100 945 950 955 1adsalrsdasacrigt h bandMEand MSE obtained the to according residuals small and 91 10 15 20 40 60 80 20 25 30 35 0 1 2 3 4 0 5 0 speetdi al ,tecmie iia esr in sensors digital combined the 4, Table in presented As T096 .79099 .40** ** 0.7056 4.6400 59.7297 0.2569 3.7500 3567.6384 4.1609 0.4979 0.9894 0.6136 Significance (**) 0.0660 code: 0.3445 0.5305 Significance 17.3133 0.9390 WD 0.9789 0.9186 RMSE WS 0.2815 0.9260 RG 0.9557 RH MSE AT AP R2 Sensor 1 1 : s 2 L , 3 s P 0 . ihatgtmthbtenteotoe of outcomes the between match tight a with 4 5 p -value 6 ≤ 7 0.001. 8 9 Professional Weather Station(PWS), p ff ff vle fhg signif- high of -values rn esr:ArPesr A) i eprtr A) eaieHmdt R) anGue(G,Wind (RG), Gauge Rain (RH), Humidity Relative (AT), Temperature Air (AP), Pressure Air sensors: erent rn eut.I case In results. erent − 10 − − − 300 ** 13.0100 t .300.2601 1.1300 0.6132 ** 0.5100 4.5700 -value 11 12 p -value 13 Days inMarch,2019 ofobservation 14 ≥ 15 11 pe W) n iddrcin(D r antcsensors magnetic e are the (WD) on depends direction accuracy wind whose and (WS), speed ag iesre r ecie y9.%ad8.%o non- of 88.8% and 91.4% by rain described the are that highlight time-series to gauge important is it However, LCAWS. and di icant av- the residuals, an these represents RMSE of an Such error 4. erage Table in described MSE as with 0.93, magnitude, between low residuals gauge RMSE of rain were the stations resolution higher, the bucket PWS 2.5 the about Although was LCAWS by respectively. registered PWS, observa- mm 98.7 30-day and and the mm 80.75 during with stations period, the tion by rainy precipitation catch di of is implied to ing sensor This gauge precipitation sensor. LCAWS rain the less than PWS requiring events the by that sensitive, means This more the mm mm). tip (0.25 0.10 PWS to the of vs. needed that than precipitation higher is amount bucket tipping LCAWS The dimensions. ical under implemented the are di on sensors these events addition, those In caught by device. pulses sensor caused interaction switch mechanical reed the from from measured are events Weather have would we sensors, of pairs 16 ff rn ein o ahsain hc nld di include which station, each for designs erent Di 17 Low−Cost Automatic Weather Station(LCAWS) ff ff = rn rmtedgtlsnos h angue(G,wind (RG), gauge rain the sensors, digital the from erent 18 rnebtenteri ag esrmnso PWD of measurements gauge rain the between erence .59(m,wiealwn rdcaiiyo R2 of predictability allowing while (mm), 0.2569 19 ± CW uktbtentesain.From stations. the between bucket LCAWS 1 20 p vlewso .1s htteei osignif- no is there that so 0.61 of was -value 21 22 23 t -value ffi 24 inyo nlgapparatus. analog of ciency 25 ≈ and 0 26 27 ff p rn account- erent -value = ff 28 rn phys- erent .6 and 0.066 29 ≈ 1. 30 ≥ AP (mbar) AT (C) RH (%) RG (mm/h) WS (m/s) WD (degree) 958 ● ● ● ● ● ● ● p−value = 2e−04 ●● 35 p−value = 0 ●● 100 p−value = 0 ●●●●●●● 20 p−value = 0.6132 p−value = 0 ● ● ●●p−value = 0.2601 ●● ●● ●●●●●●● ● ● ●● ● ●●● ●● ● ●●●●●●●● ● ● ● ●●●●● R2 = 0.9557 ●● R2 = 0.926 ●●●● R2 = 0.9186 ●●●●●●● R2 = 0.939 4 R2 = 0.3445 ● ● R2 = 0.6136● ●●●●●●● ●●● ●●●● ●●● ●●●●● ●● ●●●●●●●● ●●●● ●●●●● ●●●●●●● ● ●● 300 ● ●●●●●●●● ● ●●●● ●●●● ●●●●●●●● ●● ● ● ● ●●●●●●●●● ● 954 ●●●● ●●●●● ●●●●●●● 15 ● ●● ● ● ●●●●●●●●●●●●●●● ●●●● 30 ●●●●● ● ●●●● ●●● ●● ●● ●●●●●●● ●●●● ●●●●● 80 ● ●●●●●●● ●●●● ● ●●●●●●●●●●● ●●● ●●●●● ●●●●●● ●●●●●●●● 3 ●● ●●● ● ●●●●●●●●● ●●●● ●●●●● ●●●●●●●● ●●●●● ●●●●●●● ● ●●●●● ●●●●● ●●●●●●● ● ●● ●●●●● ●●●● ●●●●● ●●●●●●● ●●●●● ●●●● 200 ● ●●●●●● ●●●●● ●●●●●● ●●●●●●● ●●●●●●● ●●●●●●●● ●●● 950 ●●●●● ●●●●●●●● ●●●●● 10 ●●●●●●●●● ● ●●●●● ●●●●● 25 ●●●●●● ●●●●● ● 2 ● ●●●●●● ●●●●●●●●●●●● ● ●●●●● ●●●●●●● ●●●●● ● ●●●●●● ●●●●●●●●●●●● ● ●●●●● ●●●● ●●● ●●● ● ●●●●●●●●●● ●●●●●● ●● ● ●●● ●●●●●●●● 60 ●●●● ●●●●●●●●●● ●●●●●●● ● ●●●● ●●●●●●● ●●●● ● ●●●●●●●● ●● ●●●●●●●●● ● ● ●●●● ●●●●●● ●●●● ●●●●●●●●● 100 ●●●●●● ●● ●● ●● ● ●●● ●●●●●● ●●●● 5 ● 1 ●●●●●●●●●●● ●●●●●●●● ●● 946 ●●●● ●●●●● ●●● ●●●●●●●●●●●● ●●●●●●● ●● ●●● 20 ●●●● ● ●●● ●● ●●●●●●●●●●● ●●●●●● ●● ●●●● ●● ● ●● ●●●●●●●●●●●● ●●●●●● ● ●● ●●● ●● ●●● ●●●●●●●●●● ●●●●●●● ● ● ●●● ●● ●●●● ●●●●●●●●●●●● ●●● ● ● ●● ●●●● 40 ● 0 ●●●● ● 0 ●●●●●●●●●●● 0 ●●●● ● 946 950 954 958 20 25 30 35 40 60 80 100 0 5 10 15 0.0 0.5 1.0 1.5 2.0 0 100 200 300 Professional Weather Station (PWS) Weather Professional Low−Cost Automatic Weather Station (LCAWS)

Figure 6: Scatter plots of 30-day observation period with the Low-Cost Automatic Weather Station (LCAWS) and the Professional Weather Station (PWS) from different sensors: Air Pressure (AP), Air Temperature (AT), Relative Humidity (RH), Rain Gauge (RG), Wind Speed (WS), and Wind Direction (WS). L L P P P P L L P P P P L L P P P P S D S D S D stations. This means that, if the values sampled by the stations H G P T H G H G P T P T A A R R W W A A R R W W A A R R W W are very different, they may represent wind directions of differ- APL 1 0.1 −0.07−0.02 0.05 0 APP 1 0.1 −0.07−0.03 0.06 0 APL 1 −0.39 0.12 −0.02−0.11−0.24 ent moments within the observed minute. Although there were ATL 1 −0.89 0.11 −0.37−0.04 ATP 1 −0.88 0.07 −0.43−0.06 ATL 1 −0.88−0.07 0.34 0.19 these two possibilities of data inconsistencies, they could not RHL 1 −0.37 0.29 0.15 RHP 1 −0.35 0.35 0.18 RHL 1 0.1 −0.42−0.08 cause more impact than the observed outliers. Thus, we under- RGL 1 −0.06−0.21 RGP 1 −0.09−0.24 RGL 0.98 0.06 0 stand that the wind direction measurements had little fluctuation WSL 1 −0.23 WSP 1 −0.1 WSL 0.98 −0.2

WDL 1 WDP 1 WDL 0.83 within the observed minutes.

−1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1 From the analysis we carried out, one can conclude that the weather stations produce different weather data, either by the (a) LCAWS (b) PWS (c) LCAWS vs. PWS residual or by the significance of paired t-tests. In order to re- duce residuals while enabling operations with no statistical dif- Figure 7: Pearson correlation coefficient (PCC) of the observed results (30 days): (a) between sensors of LCAWS, (b) between sensors of PWS, and (c) be- ference between the stations, we propose to extend the exper- tween sensors of LCAWS and PWS. imental methodology by applying linear and machine learning regression models in order to calibrate the LCAWS weather pa- rameters. In the next section, we discuss experimental results rainy hours for LCAWS and PWS, respectively. This unbal- obtained from such a proposal. anced sampling of rainy and non-rainy measurements also ex- plains the low residuals, even when sensors have different bucket volumes. 6. Calibrating Weather Sensors with Linear and Machine Among the magnetic sensors, the worst performance met- Learning-Based Regression Models rics came from the wind speed (WS) sensor, with the lowest R2 Assuming the PWD results as being the expected values, of 0.34, and residuals of MSE = 0.50 and RMSE = 0.70 m/s. i.e., ground-truth, we applied regression methods in order to Regarding the space of the observed results in the range 0 ≥ reduce the LCAWS residuals. As described in Table 5, other WS ≤ 4.45 m/s, those residuals are quite representative. How- works applied regression methods to accomplish sensor cali- ever, from the time-series in Figure 5, we observe that the faster bration or data inference. The authors observed promising re- the wind speed, the higher the residuals between the WS results. sults from well-known linear methods such as Linear Regres- On the other hand, while the residuals are of different scales, sion (LM), Multiple Linear Regression (MLR), as well as from they behave under a similar pattern with correlation PPC = well-known machine learning (ML) based techniques such as 0.98. Support Vector Machines (SVM), Random Forest (RF), and When comparing the results of the wind direction sensors Neural Networks (NN). Aside from analyzing performance of (WD) between the stations, a p-value greater than 0.05 accepts well-known methods, we also verified the efficiency of sophis- the null hypothesis. However, it lies on the higher magnitude ticated methods of Ensemble Learning (EL) by combining a of residuals with MSE = 3567.64 and RMSE = 59.73 ( degree), set of different base models into super learner meta-models. and hence a lower predictability of R2 = 0.61. Such an RMSE Although ML-based methods are more computationally costly led to errors around ±60 degrees in a space of 0 ≤ WS < 360. than the linear models LM and MLR, computing resource con- The influence of residuals in WD can be seen from the outliers straints in the data correction process should not be of concern. in Figure 6. The reasons are twofold. First, the data collecting Such a process is a non-real-time task which is carried out after procedure is discrete and constrained, accomplished only by a data processing as a Cloud Service (CS) on the server side. single sampling a minute from the WD sensor. This implies that the wind may change its direction by moving the vane to a new 6.1. Machine Learning Pipeline position just before or soon after reading out the sensor. In this occasion, the collected value may not be representative for that In order to reduce the LCAWS residual, we implemented a whole minute of observation. The second reason is that the time machine learning pipeline in R language, as illustrated in Fig- of reading out the WD sensors is not synchronized between the ure 8. In such a pipeline, there are four important steps. In

12 Table 5: Related work on applying regression methods to improve weather Algorithm 3. We aim at identifying the best candidate regres- sensing process. sion models for accomplishing data correction on the weather parameters. Thus, improvements on machine learning methods Applications Regression Methods are beyond the scope of this paper. Reference Calibration Inference LM MLR NN SVM RF EL

[14] XX [17] XX Algorithm 3: ML experiments. [31] XX Input: [38] XX S = {AP, AT, RH, RG, WS, WD}, the set of sensors. [37] XXX L = {LM, MLR, NN, SVM, RF, EL}, the set of learning techniques. [9] XXXXX E = {1, 2, ×, 100}, set of experiments. *LCAWS XXXXXXX DB, processed dataset containing the weather parameters. * The method we proposed in this paper for intelligent sensor calibration Output: and data correction. Label: Linear Model (LM), Multiple Linear Regres- O, resulting set of multiple ML outcomes, o = {l?, d, yˆ, m}. sion (MLR), Support Vector Machines (SVM), Random Forest (RF), Neural program run experiments(S , L, E, DB): Networks (NN), Ensemble Learning (EL). O = null foreach s ∈ S do Ds = subset(DB, s) step (1), the processed dataset of each sensor is split into two foreach l ∈ L do sets. The first set consists of 60 % of the dataset and is used for foreach e ∈ E do o = learning pipeline(Ds, l, e) training and validating the regression models. The second set append(O, o) with the remaining 40 % of the dataset is used for testing the end models fitted in step (1). We observed that the train–test dataset end proportion of 60–40% could bring better results, while train sets end larger than that have led to overfitting. In step (2), we submitted return O procedure learning pipeline(Ds, l, e): the train set to SuperLearner [35], which is a machine learning set seed(e)

R package [24] for constructing super-learners in predictions d = split dataset(Ds, 60%, ’rand’) ? based on k-fold cross-validations from a set of candidate mod- l = super learner(dtrain[y], dtrain[x], l) ? els. We set k = 10 folds in our experiments. Then, learner l? yˆ = predict(l , dtest[x]) m = compute metrics(yˆ, dtest[y]) is the model or the set of weighted models that most minimizes return o = {l?, d,y ˆ, m} the risk metric MSE. In step (3), the fitted model l? is then used to make out-of-fold predictions from the test set with the 40 % of the input dataset. Finally, in step (4), we apply the perfor- In the experiments, we set three models of feed-forward mance metrics of R2, MSE, and RMSE in order to compare the neural networks regarding different sizes for the hidden layer outcomesy ˆ predicted by the l? model with the expected values (4, 7, and 10 units). The SVM model was configured with ra- y in the test set. dial basis function and regression precision (nu) of 0.5. The Random Forest (RF) model was parameterized with 1000 trees Processed dataset Super Learner's Pipeline grown. We defined a set of more than twenty different candidate 1 (1) Split the train set models for ensemble learning (EL). (2) into k folds for From the database DB with the processed parameters of Split dataset cross-validation both LCAWS and PWS stations, for each sensor s ∈ S , we selected the respective processed dataset DB and conducted (3) For each k block, s Test set Learner model l* train and predict 100 experiments from it. In each experiment e ∈ E, the four using base models steps showing in Figure 8 are accomplished by the learning pipeline procedure. Since a random seed is set for each experiment, Compute metrics (4) Outcome (R2, MSE, RMSE) predictions different random samples of each weather parameters are gen- erated and split by keeping the 60–40% train–test proportion. Figure 8: Machine learning regression pipeline we implemented in order to re- When having the fitted model l? from the SuperLearner’s pipeline duce the LCAWS residuals. Four main steps are accomplished: (1) dataset split- for training and cross-validation, the predictions are accom- ting; (2) model training and cross-validation; (3) predictions with the learner plished from the test set and performance metrics are computed. model; and (4) computing performance metrics. The output o of each experiment consists of the model l?, the dataset d, the predicted valuesy ˆ, and the obtained performance metrics m. Regarding the outputs o ∈ O, we applied t-tests to 6.2. Efficiency of the Regression Models Considering the processed dataset for the sensors S = {AP, AT, RH, RG, WS, WD}, we evaluated the efficiency of a set of 1Candidate models applied for the ensemble learning (EL) from a num- regression models L = {LM, MLR, NN, SVM, RF, EL} from ber of R packages: biglasso, extraTrees, gam, glm, glm.interaction, ipredbagg, kernelKnn, ksvm, lm, loess, mean, nnet, nnls, polymars, an amount of different experiments |E| = 100, as described in randomForest, ranger, rpart, rpartPrune, speedglm, speedlm, step, step.forward, step.interaction, stepAIC, svm. 13 verify the significance of the top-1 fittest model’s R2 coefficient against the other models. Thus, we identified whether the best regression model was really able to perform differently than the others. Table 6 presents the performance ranking based on the av- erage (Avg.) and standard deviations (SD) of the regression models’ R2 coefficients, as well as the the paired t-tests. The Table 6: Ranking of different regression models and R2 significance (Top-1 baseline model RAW (highlighted in blue) presents the LCAWS model vs. others) from results of 100 experiments. performance results regarding its originally processed weather parameter. Except for RG parameters, the experimental results Sensor Model Rank R2 R2 Significance show significant improvement on the data correctness. When Avg. (±SD) t-value p-value minimizing the residuals, the top-1 regression models allowed AP MLR 1 0.9991 (± 1e-04) for maximizing the performance in R2 with statistical signifi- EL 2 0.9987 (± 0.0025) 1.62 0.1078 cance in relation to the RAW baseline. Considering R2 and the LM 3 0.9966 (± 3e-04) 74.85 ** respective p-values for AP, AT, and WS sensors, one can verify SVM 4 0.9871 (± 0.0047) -25.69 ** that there were no significant differences in the performance of RF 5 0.9613 (± 0.0059) -64.57 ** RAW 6 0.9555 (± 0.0035) 125.61 ** simple regression models (LM, MLR) and more sophisticated NN.4 7 0.0127 (± 0.142) 69.47 ** ML-based models (EL, SVM, RF). The EL model was the best AT MLR 1 0.9939 (± 0.0012) one for the RH sensor, reaching p-value with high significance, EL 2 0.9939 (± 0.0045) 0.11 0.9141 differentiating it from the others. For the WD parameter, the LM 3 0.9929 (± 0.0014) 5.68 ** ML-based models EL and RF have performance with no statis- SVM 4 0.9852 (± 0.0046) -18.31 ** tical difference. For the RG sensor, it is irrelevant applying or RF 5 0.9835 (± 0.0022) -41.08 ** not with regression models to correct the RG parameter. This RAW 6 0.9254 (± 0.0058) 115.06 ** can be verified from the R2 performance of the fittest model NN.4 7 0.0340 (± 0.1971) 48.7 ** LM, which had no significant difference to the ML, RAW (i.e., RH EL 1 0.9943 (± 0.0049) LCAWS raw data), and EL models. MLR 2 0.9924 (± 0.0014) -3.77 ** RG and WD sensors have inherent limitations of analog de- LM 3 0.9910 (± 0.0017) -6.44 ** SVM 4 0.9838 (± 0.0063) -13.17 ** vices based on magnetic reed switches, either by the device de- RF 5 0.9826 (± 0.0025) -21.16 ** sign or by the data sampling procedure. Unlike the other param- RAW 6 0.9177 (± 0.0065) 94.29 ** eters, rainy events and wind speed had no pattern. We expect NN.10 7 0.0110 (± 0.1206) -81.44 ** to address these issues to improve the data correctness in fur- RG LM 1 0.9235 (± 0.1145) ther investigations, e.g., hardware device design, software data MLR 2 0.9228 (± 0.1155) -0.05 0.964 collecting procedures, and/or improving regression methods. RAW 3 0.9060 (± 0.0982) 1.16 0.2485 EL 4 0.8967 (± 0.1066) 1.71 0.0887 6.3. Correcting Weather Parameters with the Fittest Models RF 5 0.4902 (± 0.1446) -23.5 ** SVM 6 0.2220 (± 0.2002) -30.42 ** Once the best regression models for each sensor are ob- NN.10 7 0.0316 (± 0.1891) 40.34 ** tained, we carried out a last experiment to verify how data cor- WS EL 1 0.9685 (± 0.0736) rection would be accomplished in reality. Instead of splitting MLR 2 0.9668 (± 0.0036) -0.24 0.8138 the processed parameters into randomized samples, we deter- SVM 3 0.9563 (± 0.0113) -1.65 0.1026 mined the 60–40% train-test proportion with chronological in- LM 4 0.9554 (± 0.0051) -1.78 0.0776 tervals, from day 1 to day 18 and from day 18 to day 30, respec- RF 5 0.9458 (± 0.0089) -3.06 * tively. Figure 9 shows the scatter plots (a) and (b) of the weather RAW 6 0.3395 (± 0.0213) 82.1 ** parameters with no corrections (RAW data) and the parameters NN.10 7 0.1704 (± 0.3764) -20.81 ** corrected with the fittest regression models, respectively. Ta- WD EL 1 0.7231 (± 0.0486) ble 7 presents the results from performance metrics. RF 2 0.7213 (± 0.037) -0.29 0.7708 As we previously discussed, we observed that regression SVM 3 0.7007 (± 0.0553) -3.04 * models could not reduce the residuals of the RG parameter. In MLR 4 0.6818 (± 0.0532) -5.73 ** LM 5 0.6779 (± 0.0563) -6.08 ** this last experiment, the best model LM in the RG parameters RAW 6 0.6045 (± 0.0719) 13.67 ** led to a lower R2 with a performance degradation of −4.49%. NN.7 7 0.0195 (± 0.1252) -52.39 ** On the other hand, for the other weather parameters, we ob- Significance codes: (∗∗) < 0.001, (∗) < 0.01 served that the LCAWS predicted valuesy ˆ from the best re- gression models could be fitted well to those expected y in PWS. The best models allowed performance improvements in R2 of 1.9%, 5.6%, 6.8%, 68.0%, and 19.9% for the AP, AT, RH, WS, and WD parameters, respectively. In addition, the paired t-tests applied in such weather parameters showed p- values higher than the level of 0.05, i.e., there was no significant 14 AP (mbar) AT (C) RH (%) RG (mm/h) WS (m/s) WD (degree) 958 ● ● 100 ● ● ● ● ●● ●● ●●●●●● ● ● Model: RAW ●● Model: RAW ● ● Model: RAW ●●●●● Model: RAW Model: RAW ● Model: RAW ● ●● ●●● ●●●●●● ● ● ●● ●●●● ● ●●●●●● 4 ● ● ●●● p−value = 0.1048 ●●● p−value = 0.0062 ●● p−value = 0.0107 ● ●●●●● 6 p−value = 0.8449 p−value = 0 ●● p−value = 0.4627 ● ●● ●●● 30 ●●● ●●●●●● ● ● 300 ● ●● ●●● ●●● ●●●● ● ● ● ● ●●●●● 954 R2 = 0.9789 ●● R2 = 0.9352 ●●● R2 = 0.9359 ●●●●●● R2 = 0.9196 R2 = 0.3109 R2 = 0.5467 ● ●●●●● ● ●● ●● ●●●●● ●●● ●● ● ●●● ●●●●● ●●● 80 ●●●●●● 3 ●● ● ●●●● ●●●● ●●●● ● ●●● ● ●●●●● ●● ●●●●●● ●●●● ●●● ●●● ●● ●●●● ●●●● ●● 4 ● ●●●● ●●● ●●●● ●●●● ●●●●● ●● 200 ●●●●● ●●●●● 25 ●●●●● ●●●● ●●● ●●●●●● ●● 950 ●●● ●●●● ●●● ●●●●●●● ● ●●● ●●●●● ●●●●● ●●● 2 ●●●● ●●●●●●●● ● ●●●● ●●●●● ●● ●●●● ●●●●●●● ● ● ●●● ●●●●●●● 60 ●● ●●●●●● ●●●● ●● ● ●●●●● ●●● ● ●● ● ●●●●● ●●●●● ●●● ●●●●● ●● ●●●●● ● ●●●●●●●● ● ● ●●● ●●● ● ●● 2 ●●●●●● 100 ●●●●● ●● ●●● ●●● ●●●● ● 1 ●●●●●●●● ●●●●● ● 946 ●●● 20 ●●●● ● ●● ●●●●●●●●● ●●●● ● ●●● ●●●● ●● ● ●●●●●●●●● ●●●● ●●● ●●● ● ● ●●●●●●●● ●●● ● ●● ●●●● ●● ● ●●●●●●●● ●● ● ●● ●● ●● ● ●●●●●●●●●● ● ● ● ●●● 40 ● 0 ●● ● 0 ●●●●●●●●● 0 ● ● 946 950 954 958 20 25 30 40 60 80 100 0 2 4 6 0.0 0.5 1.0 1.5 0 100 200 300 Professional Weather Station (PWS) Weather Professional Low−Cost Automatic Weather Station (LCAWS)

(a) LCAWS weather parameters processed from the raw data.

AP (mbar) AT (C) RH (%) RG (mm/h) WS (m/s) WD (degree) 958 ● ● 100 ●● ● ● ● ●● ●● ●●●●●● ● ● Model: MLR ●● Model: MLR ●● Model: EL ●●●●●● Model: LM Model: EL ● Model: EL ● ● ●● ●●●● ●●●●●● ● ● ●● ●●● ● ●●●●●●●● 4 ● ● ● ● p−value = 0.6985 ●●● p−value = 0.674 ●● p−value = 0.5571 ●●●●●●●● 6 p−value = 0.3945 p−value = 0.4773 ●● p−value = 0.7097 ● ● ●●● 30 ●●● ●●●●●● ● ● 300 ● ● ●● ●● ●● ●●●●●● ● ● ● ● ●●●●●●● 954 R2 = 0.9981 ●● R2 = 0.9912 ●●● R2 = 0.9822 ●●●●●●●● R2 = 0.8801 R2 = 0.972 R2 = 0.6827 ● ●●●●●●● ●● ●●●● ●●●●●●● ●●● ● ●● ●●● ●●●● ●●● 80 ●●●●●●●● 3 ●●● ● ●●●● ●●● ●●●● ●●●●●●● ● ● ●●● ●●● ●●●● ●●●●●●● ●● ● ●● ● ●●● ●●●● ●● ● 4 ●●● ● ●● ●●● ●●●●● ●●●●● ●● 200 ●●●●●● ●●● 25 ●●●●● ●●● ●● ●●●●● ● 950 ●●● ●●●●● ●● ●● ●●●●●● ●●●●● ●●● ●●●●●● ●●●● 2 ●●●● ●●●●●●●●● ● ●●● ●●●●●● ●● ●●● ●●●●●●●●●●●● ● ●●● ●●●●●●●● 60 ●●●● ●●●● ●●●●●● ●● ● ●●●● ●●●●● ●●● ● ●●●●●● ●●● ● ●●● ●●●●●● ●●● ●●●● ●●●●●●●●●● ● ●●● ●●●●●● ●● 2 ●●●●● 100 ●●●●●● ● ●● ●●● ●●●● ●●● ● 1 ●●●●●● ●●●●●●●● ● 946 ●●● 20 ●●●● ● ●● ●●●●●●● ●●●●● ●● ● ●●●● ●● ● ●●●●●● ●●●●●●● ● ●● ●●●●● ● ● ●●●●●●●● ● ●● ● ●● ●●●●● ●● ● ●●●●● ●● ● ●● ●● ●● ● ●●●●●●●●● ● ● ● ●●● 40 ● 0 ●● ● 0 ●●●●●●●●●● 0 ● ● 946 950 954 20 25 30 40 60 80 100 0 2 4 6 8 0 1 2 3 4 100 200 300 Professional Weather Station (PWS) Weather Professional ML−based Regression Models (LCAWS)

(b) Processed LCAWS weather parameters corrected through the fittest regression models.

Figure 9: Scatter plots between LCAWS and PWS weather parameters, regarding the test set from day 18 to day 30, with different sensors: Air Pressure (AP), Air Temperature (AT), Relative Humidity (RH), Rain Gauge (RG), Wind Speed (WS), and Wind Direction (WS). difference betweeny ˆ and y. These promising fitting results are the weather stations produce equivalent weather parameters. In seen in the time-series in Figure 10, where the black lines of the the case of the RG sensor, whether or not to apply regression yˆ values predicted by the best regression models nearly overlap models is indifferent. Nevertheless, the original results with the red line of y-values expected in PWS. processed data from LCAWS show to be statistically similar to the PWS results. Table 7: Performance metrics and significance between LCAWS and PWS weather parameters regarding the test set from the day 18 to day 30. 7. Conclusions Sensor Model R2 MSE RMSE Significance t-value p-value Weather stations play a key role for the natural disaster mon- AP RAW 0.9789 0.2316 0.4813 1.6200 0.1048 itoring, being the main source of accurate, reliable, and in situ AT RAW 0.9352 0.9686 0.9842 2.7500 * weather data. To support risk management and reduce the im- RH RAW 0.9359 14.6893 3.8327 −2.5600 . pacts of natural disasters in Brazil, Cemaden urges to provide RG RAW 0.9196 0.0168 0.1296 0.2000 0.8449 WS RAW 0.3109 0.6404 0.8003 −8.6200 ** large-scale weather monitoring from an observational network WD RAW 0.5467 2676.4324 51.7342 −0.7400 0.4627 of more than five thousands of data collection platforms (DCPs) AP MLR 0.9981 0.0204 0.1430 −0.3900 0.6985 spread across risk areas in thousands of cities of the Brazilian AT MLR 0.9912 0.1309 0.3618 0.4200 0.674 territory. The main vulnerabilities of DCPs are not linked to RH EL 0.9822 4.0804 2.0200 0.5900 0.5571 the hardware robustness, but mostly to the applied technology RG LM 0.8801 0.0251 0.1583 0.8500 0.3945 that is provided only from high-cost professional weather in- WS EL 0.9720 0.0260 0.1613 −0.7100 0.4773 WD EL 0.6827 1873.3925 43.2827 0.3700 0.7097 strumentation. While the expensive DCP’s components lead to high costs of maintenance for both preventively and replacing Significance codes: (**) < 0.001, (*) < 0.01, (.) < 0.05. failed components, the vulnerabilities in the observational net- work are mitigated ultimately from short-term periodic mainte- From the experimental results we obtained, one can con- nance. In this context, the maintenance of DCPs is extremely clude that, by applying the regression model-based data correc- costly for a vast country with a need for thousands of DCPs tion approach, it was possible to calibrate the sensors AP, AT, such as Cemaden’s observational network, mostly under lim- RH, WS, and WD. Such an approach allowed: (1) minimizing ited human and financial resources. the residuals from MSE and RMSE metrics and, hence, maxi- As an initiative to address cost reduction in maintenance mizing the performance R2 of these LCAWS sensors; (2) val- process of natural disaster monitoring, we propose to apply idating statistically these LCAWS sensors with no significant IoT technology based on COTS in order to reduce the cost of differences of its corrected weather parameters to the expected weather instrumentation from tens of thousands of dollars to a ones in PWS. With reduced residuals, the p-values above 0.05 few hundred. To this end, in this paper, we present a compre- make the null hypothesis be accepted, i.e., the hypothesis that 15 h olwn adaeadsfwr limitations: software identified and we hardware process, following the network. validation LCAWS observational main- the weather reduce along Brazilian However, to the alternatives of low-cost costs of tenance feasibility the strate Work Future in Limitations LCAWS’s Addressing 7.1. station monitoring. weather disaster natural professional for a reference pro- as of to reliably (PWS) able as was data LCAWS weather proposed vide the that showed method- robust we a ology, From technologies. commercial-grade IoT open-source from and weather parts developed automatic entirely low-cost (LCAWS) a station of calibration and sensor validation, intelligent implementation, design, the on material hensive 30. day to 18 day from regression LCAWS, set PWS, test from the regarding parameters models, weather of Time-series 10: Figure WD (degree) WS (m/s) RG (mm/h) RH (%) AT (C) AP (mbar) 100 200 300 100 945 950 955 40 60 80 20 25 30 0 1 2 3 4 0 2 4 6 8 0 sapofo-ocp,LASi niiilwr odemon- to work initial an is LCAWS proof-of-concept, a As • 18 hra nuainadsiligteecouefo di- from enclosure . the superior rect shielding with and enclosure insulation an thermal utilizing to are improvements investigated max- Proposed a be Celsius. to degrees up 47 days, of sunny imum during housing inside datalogger observed the frequently were More- temperatures came . acid high water battery over, this from if or known outside not the is from It silica the saturated. when became ob- housing hous- the gel were of the walls condensation interior of water the on interior of served the amounts keep small dry, to ing the utilized Although was sealed. gel hermetically silica not hous- is IP55, datalogger rated The ing, protection. enclosure Datalogger 19 20 PWS, Days inMarch,2019(testdataset) ofobservation 21 22 LCAWS, 23 24 25 ML−based RegressionModel 26 27 28 29 16 Acknowledgments moni- disaster natural its pipeline. into toring integrated obser- and Cemaden’s on network deployed vational be to LCAWS evolve mentioned then, above and, limitations software and hardware LCAWS’s References [20]. in Availability Data correc- data a and tion. calibration implement sensor to intelligent for us method helped robust that practices best about thank machine-learning suggestions also the We for Brazil) (UNIFESP, [25]. Faria PqTec Augusto Fabio at professional deployed group’s (PWS) research station be weather his to LCAWS with of together validation accomplished the allowed who Brazil) (UNESP, #2019 8, #2015 Nos. Grant (FAPESP), Foundation Bnhnm . 09 esrmn fmtoooia aabsdon based data meteorological of Measurement 2009. M., Benghanem, [4] change climate of impact The 2014. S., of Donner, study J., Kossin, A S., Banholzer, 2016. M., [3] W. Townsley, T., Clausen, J., Yi, A., Augustin, [2] Adio 00 h run reference. arduino The 2020. Arduino, [1] hsrsac a atal uddb S by funded partially was research This address to required are investigations further context, this In aaesadohrcmlmnaymtrascnb found be can materials complementary other and Datasets • • • ieesdt custo ytmmntrn.ApidEeg 6(12), 86 Energy Applied monitoring. 2660. – system 2651 acquisition data for wireless systems warning Early 21–49. disaster: pp. Springer, Reducing change. climate In: disasters. natural on Sensors things. of internet 1466. the for (9), networks 16 power low & range Long lora: URL etto Fgr )hsvldtdLAS t coding its refactoring. LCAWS, imple- including validated improvements, AWS requires has the 3) While (Figure month, investigation. mentation a under once still is happening and usually recurrent, This reset. is hard manual problem a requiring freeze, to Arduino causes the error software supposed A inconsistencies. AWS’ position. (tipped) end then the was at switch sat bucket reed the Each when replac- switches. triggered and reed two position with middle it the ing passed triggered bucket was which the corrected switch, when tip- reed successfully the single was the caused This removing by winds vibrate. high was to design when bucket gauge ping pulses rain false original to The prone faults. gauge Rain temperatures high observed the battery to housing. short the due This inside also night. be whole may a life last hold to not battery could charge the it one, enough before Wp to 55 months necessary a six to approximately it panel lasted made solar Wp GPS) 10 (GPRS, the switch consumption modules power RF the run While to life-cycle. battery Short https://www.arduino.cc/reference/en/ / 38-.W hn noi alsVrl Saraiva Varela Carlos Antonio thank We 23382-2. / 80-,#2018 18808-0, oPuoResearch Paulo ao ˜ / 23064- [5] Boustan, L. P., Kahn, M. E., Rhode, P. W., Yanguas, M. L., 2020. The Package ‘superlearner’ version 2.0-26, The Comprehensive R Archive effect of natural disasters on economic activity in us counties: A century Network (CRAN). of data. Journal of Urban Economics 118, 103257. URL https://cran.r-project.org/web/packages/ [6] BST-BME280-DS002-15, Sep 2018. Bosch sensortech - bme280 SuperLearner/index.html datasheet - revision 1.6. [25] PqTec, 2006. Sao˜ jose´ dos campos technology park. URL https://www.bosch-sensortec.com/media/ URL http://www.pqtec.org.br/en-us/about-us boschsensortec/downloads/datasheets/bst-bme280- [26] Sabharwal, N., Kumar, R., Thakur, A., Sharma, J., 2014. A low cost zig- ds002.pdf bee basedautomatic wireless weather station with gui and web hosting [7] Campbell Scientific, 2015. Cr200/cr200x series dataloggers: Operator’s facility. International Journal of Electrical and Electronics Engineering 1. manual. [27] Saini, H., Thakur, A., Ahuja, S., Sabharwal, N., Kumar, N., 2016. Ar- URL https://s.campbellsci.com/documents/eu/manuals/ duino based automatic wireless weather station with remote graphical cr200_200x.pdf application and alerts. In: 2016 3rd International Conference on Signal [8] Centenaro, M., Vangelista, L., Zanella, A., Zorzi, M., 2016. Long-range Processing and Integrated Networks (SPIN). pp. 605–609. communications in unlicensed bands: The rising stars in the iot and smart [28] Sawant, S., Durbha, S. S., Jagarlapudi, A., 2017. Interoperable agro- city scenarios. IEEE Wireless Communications 23 (5), 60–67. meteorological observation and analysis platform for precision agricul- [9] Cordero, J. M., Borge, R., Narros, A., 2018. Using statistical methods to ture: A case study in citrus crop water requirement estimation. Computers carry out in field calibrations of low cost air quality sensors. Sensors and and Electronics in Agriculture 138, 175 – 187. Actuators B: Chemical 267, 245–254. [29] Schlienz, J., Raddino, D., 2016. Narrowband internet of things whitepa- [10] de Vos, L. W., Leijnse, H., Overeem, A., Uijlenhoet, R., 2019. Quality per. White Paper, Rohde&Schwarz, 1–42. control for crowdsourced personal weather stations to enable operational [30] Shaout, A., Li, Y., Zhou, M., Awad, S., 2014. Low cost embedded weather rainfall monitoring. Geophysical Research Letters 46 (15), 8820–8829. station with intelligent system. In: 2014 10th International Computer En- [11] Debortoli, N. S., Camarinha, P. I. M., Marengo, J. A., Rodrigues, R. R., gineering Conference (ICENCO). IEEE, pp. 100–106. 2017. An index of brazil’s vulnerability to expected increases in natural [31] Sharma, N., Sharma, P., Irwin, D., Shenoy, P., 2011. Predicting solar gen- flash flooding and landslide disasters in the context of climate change. eration from weather forecasts using machine learning. In: 2011 IEEE Natural hazards 86 (2), 557–582. International Conference on Smart Grid Communications (SmartGrid- [12] Decreto, 2011. Decreto n 7.513, de 1 de julho de 2011, presidenciaˆ da Comm). pp. 528–533. republica.´ [32] Strigaro, D., Cannata, M., Antonovic, M., 2019. Boosting a weather mon- URL http://www.planalto.gov.br/ccivil_03/_ato2011- itoring system in low income economies using open and non-conventional 2014/2011/decreto/d7513.htm systems: Data quality analysis. Sensors 19 (5), 1185. [13] Degrossi, L. C., de Albuquerque, J. P., Fava, M. C., Mendiondo, E. M., [33] Tenzin, S., Siyang, S., Pobkrut, T., Kerdcharoen, T., 2017. Low cost 2014. Flood citizen observatory: a crowdsourcing-based approach for weather station for climate-smart agriculture. In: 2017 9th international flood risk management in brazil. In: The 26th International Conference conference on knowledge and smart technology (KST). IEEE, pp. 172– on Software Engineering and Knowledge Engineering (SEKE 2014). pp. 177. 570–575. [34] Van Aalst, M. K., 2006. The impacts of climate change on the risk of [14] Fang, X., Bate, I., 2017. Using multi-parameters for calibration of low- natural disasters. Disasters 30 (1), 5–18. cost sensors in urban environment. In: Proceedings of the 2017 Interna- [35] Van der Laan, M. J., Polley, E. C., Hubbard, A. E., 2007. Super learner. tional Conference on Embedded Wireless Systems and Networks. Junc- Statistical applications in genetics and molecular biology 6 (1). tion Publishing, p. 1–11. [36] Winkler, M., Street, M., Tuchs, K.-D., Wrona, K., 2012. Wireless sen- [15] Fava, M. C., Abe, N., Restrepo-Estrada, C., Kimura, B. Y. L., Mendiondo, sor networks for military purposes. In: Autonomous Sensor Networks. E. M., 2019. Flood modelling using synthesised citizen science urban Springer, pp. 365–394. streamflow observations. Journal of Flood Risk Management 12 (S2), [37] Yamamoto, K., Togami, T., Yamaguchi, N., Ninomiya, S., 2017. Machine e12498. learning-based calibration of low-cost air temperature sensors using envi- [16] Finger, R., Lehmann, N., 2012. Modeling the sensitivity of outdoor recre- ronmental data. Sensors 17 (6), 1290. ation activities to climate change. Climate Research 51 (3), 229–236. [38] Zimmerman, N., Presto, A. A., Kumar, S. P., Gu, J., Hauryliuk, A., Robin- [17] Kelley, J., Pardyjak, E. R., 2019. Using neural networks to estimate site- son, E. S., Robinson, A. L., Subramanian, R., 2018. A machine learning specific crop evapotranspiration with low-cost sensors. Agronomy 9 (2), calibration model using random forests to improve sensor performance 108. for lower-cost air quality monitoring. Atmospheric Measurement Tech- [18] Kodali, R. K., Mandal, S., 2016. Iot based weather station. In: 2016 In- niques 11 (1). ternational Conference on Control, Instrumentation, Communication and [39] Zuniga, J. C., Ponsard, B., 2016. Sigfox system description. LPWAN@ Computational Technologies (ICCICCT). pp. 680–683. IETF97, Nov. 14th 25. [19] Lauridsen, M., Nguyen, H., Vejlgaard, B., Kovacs, I. Z., Mogensen, P., Sorensen, M., 2017. Coverage comparison of gprs, nb-iot, lora, and sig- fox in a 7800 km2 area. In: 2017 IEEE 85th Vehicular Technology Con- ference (VTC Spring). pp. 1–5. [20] LCAWS, 2021. Lcaws datasets and complementary material. URL https://github.com/brunokimura-dev/LCAWS [21] Lockridge, G., Dzwonkowski, B., Nelson, R., Powers, S., 2016. Develop- ment of a low-cost arduino-based sonde for coastal applications. Sensors 16 (4), 528. [22] Mendes, R. M., de Andrade, M. R. M., Tomasella, J., de Moraes, M. A. E., Scofield, G. B., 2018. Understanding shallow landslides in campos do jordao˜ municipality – brazil: disentangling the anthropic effects from natural causes in the disaster of 2000. Natural Hazards and Earth System Sciences 18 (1), 15–30. URL https://www.nat-hazards-earth-syst-sci.net/18/15/ 2018/ [23] OpenStreetMap contributors, 2020. Planet dump retrieved on 2020-06- 26. URL https://www.openstreetmap.org/export#map=17/- 23.15598/-45.79166 [24] Polley, E., LeDell, E., Kennedy, C., Lendle, S., van der Laan, M., 2019.

17