2018 International Workshop on ADVANCEs in ICT Infrastructures and Services (ADVANCE’ 2018)

Proceedings

Santiago, Chile 11th – 12th January 2018

ISBN: 978-2-9561129

The 6th edition of the International Workshop on ADVANCEs in ICT Infrastructures and Services focuses on sustaining the efforts of the worldwide scientific community of practitioners, researchers and engineers from both academia and industry, providing them with a forum for discussion and technical presentations on the latest theoretical and technological advances in ICT to solve societal challenges in developed and developing countries. The workshop aims at advancing open science, initiating and strengthening research collaborations, sharing knowledge and data, networking for new research collaboration proposals, and strengthening friendship among communities and ethics in conducting science.

ICT technologies, and more particularly novel networking, computing and service infrastructures, are drastically changing our society in all its dimensions. These advances have an impact not only on the way people work but also on the way they interact, learn, educate, and play, among others. How these technologies respond to societal needs and how they should evolve to respond to the future needs of the digital society are crucial aspects discussed in the technical papers selected for the proceedings of ADVANCE' 2018.

The workshop included several tracks addressing specific topics in ICT. Each session had a keynote talk, paper presentations, and discussion sessions.

Technical Chairs

Karima Boudaoud, I3S Laboratory, University of Nice, France
Sandra Céspedes, Depto de Ingeniería Eléctrica, Universidad de Chile, Chile

Technical Program Committee

Hakim Abdelhafid, University of Montreal, Canada
Nazim Agoulmine, IBISC Lab, Univ Evry, France
Farid Alilat, USTHB, Algeria
Cesar Azurdia Meza, University of Chile, Chile
Rossana Maria de Castro Andrade, UFC, Brazil
Javier Baliosian, University of Uruguay, Uruguay
Djamel Belaid, Telecom Sud Paris, France
Sonia Ben rejeb-Chaouch, Mediatron-Supcom, ISI, Tunisia
Bharat Bhushan, Vodafone Group Technology, UK
Karima Boudaoud, University of Nice, France
Reinaldo Braga, IFCE, Aracaxi, Brazil
Javier Bustos, NICLabs, Universidad de Chile, Chile
Sandra Céspedes, Universidad de Chile, Chile
Jose Bringel Filho, UESPI, Brazil
Joaquim Celestino, UECE, Brazil
Tijani Chahid, Telecom Sud Paris, France
Nada Chendeb, Lebanese University at Tripoli, Lebanon
Emanuel Coutinho, UFC, Brazil
Diego Dujovne, Universidad Diego Portales, Chile
Ahmed Elmisery, Universidad Tecnica Federico Santa Maria, Chile
Stenio Fernandes, UFPE, Brazil
Paulo Roberto Freire Cunha, UFPE, Brazil
Elhadi Cherkaoui, IBISC Lab, Univ Evry, France
Willie Donnelly, WIT, Ireland
Ismail Guvenc, NC State University, USA
Massum Hassan, Verizon, USA
Artur Henrique Kronbauer, UNIFACS, Brazil
Hanna Klaudel, IBISC Lab, Univ Evry, France
Christian Lazo, Universidad Austral de Chile, Chile
Hannane Lutfiyya, University of Western Ontario, Canada
Joberto Martins, University of Salvador, Brazil
Wassila Mtalaa, LIST, Luxembourg
Hassine Moungla, University Paris 5, France
Augusto Neto, UFRN, Brazil
Thinh Nguyen, Orange, France
Cesar Olavo, IFCE, Brazil
Mauro Oliveira, IFCE-Aracaxi, Brazil
Carina Oliveira, IFCE-Aracaxi, Brazil
Rafael Tolosana, Universidad Zaragoza, Spain
Franck Pommereau, IBISC Lab, Univ Evry, France
Rafael Freitas Reale, IFBA, Brazil
Antonio Wendell Rodrigues, IFCE, Brazil
Paulo Nazareno Sampaio, UNIFACS, Brazil
Marcelo Anderson Baptista Dos Santos, IF Sertao-PE, Brazil
Thiago Silva Moreira, IBISC Laboratory, France
Khanh Toan Tran, Carrefour Information System, France
Marco Winckler, IRIT, University Paul Sabatier, France

Table of Contents

QoS Instrumentation to Support Cloud Computing SLA Assurance ..... 1 Mustapha Aitidir, Nazim Agoulmine, Rafael Tolosana and Javier Baliosian

Evaluate Location Features for Continuous Authentication with Machine Learning Experiments ...... 13 Rossana Andrade, Marcio Correia, Carlos Carvalho and Pablo Ximenes

Neural Network Model of QoE for Estimation Video Streaming over 5G network ...... 21 Lounis Asma, Alilat Farid and Agoulmine Nazim

Service Orchestrator in Cloud RAN based Edge Cloud ...... 28 Olfa Chabbouh, Nazim Agoulmine, Sonia Ben Rejeb and Javier Baliosian

Intelligent Environments Applied to Precision Agriculture ...... 35 Clemilson Costa Dos Santos, Gabriel Paillard, Emanuel Coutinho, Maurício Neto, Leonardo Moreira and Ernesto Trajano de Lima

Research Opportunities in Quality Assessment of Internet of Things, Software Defined Networks and Network Function Virtualization Environments ...... 43 Emanuel Coutinho, Maurício Moreira Neto, Jose De Souza, Carla Bezerra and William Sales

IoT Research Opportunities in SOLAR E-learning Software Ecosystem ...... 51 Emanuel Coutinho, Maurício Moreira Neto, Leonardo Moreira, Carla Bezerra and Jose De Souza

Secure Services Recommendation for Social IoT Systems ...... 59 Ahmed Elmisery and Hugo Vélez

Using Archetypes for Interoperability on Clinical Management Scenarios: A Case Study on Aedes Aegypti's Arboviruses ...... 70 Fábio Gomes, César Moura, Arthur Bezerra, João José, Oton Braga, Odorico Monteiro and Mauro Oliveira

Automated Scale Calibration and Color Normalization for Recognition of Time Series Wound Images ...... 78 Te-Wei Ho, Jin-Ming Wu, Hao-Chih Tai, Chun-Che Chang, Chien Hsu Chen and Feipei Lai

A Publish/Subscribe QoS-aware Framework for Massive IoT Traffic Orchestration ...... 86 Pedro Moraes, Rafael Reale and Joberto Martins

ASP: An IoT Approach to Help Sedentary People ...... 100 Maurício Moreira Neto, Emanuel Coutinho, Matheus Roberto Da Silva Oliveira, Leonardo Moreira and Jose De Souza

Experimental Evaluation of Adaptive Beaconing for Vehicular Communications ...... 108 Pablo Ortega, Sandra Cespedes, Sandy Bolufe and Cesar Azurdia

A new SDN-enabled Routing scheme in Vehicular Networks ...... 116 Mehdi Rayeni and Abdelhakim Senhaji Hafid

Enhancing the Performance of Wireless Sensor Network through Cross-Layer and Graph Coloring Approaches ...... 118 Bruno Santos, Leonardo Rocha, Renan Alves, Rafael Gomes and Joaquim Celestino Júnior

SMDAnonymizer: a web tool for data anonymization ...... 126 Italo Santos, Emanuel Coutinho and Leonardo Moreira

Blockchain Technology: A new secured Electronic Health Record System 134 Lotfi Tamazirt, Farid Alilat and Nazim Agoulmine

Context-Based Dynamic Optimization of Software Defined Networks ...... 142 Francisco J. Badaro V. Neto, Constantino Jacob Miguel, Jorge Alves Santos and Paulo N. M. Sampaio

A Solution for Acquisition of Vital Signs on Healthcare IoT Application ...... 150 David Viana, Emilson Rocha, Nicodemos Freitas, Vitor Lopes, Odorico Monteiro and Mauro Oliveira

Using Classification Algorithms for generating alerts of the risk of infant death ...... 157 Gerson Vieira Albuquerque Neto, Cristiano Silva, Joyce Quintino, Mauro Oliveira and Odorico Miranda

Benchmarking microservices deployment patterns: Virtual Machines vs Container over Virtual Machine ...... 165 Ahmed Yakdhane, El Hadi Cherkaoui and Fouad Guenane


QoS Instrumentation to Support Cloud Computing SLA Assurance

Mustapha Ait-Idir1, Nazim Agoulmine1, Rafael Tolosana Calasanz2, and Javier Baliosian3

1 IBISC Lab, University of Evry Val d'Essonne, Evry, France
[maitidir, nazim.agoulmine]@ibisc.univ-evry.fr
2 Universidad de Zaragoza, Zaragoza, Spain
[email protected]
3 Universidad de la República, Montevideo, Uruguay
[email protected]

Abstract. Today, organizations and individuals are moving from proprietary servers to cloud computing to benefit from the powerful advantages of theoretically unlimited resources and computation power. The PAYG (Pay As You Go) principle makes this new paradigm more attractive and has enabled its rapid growth and expansion. However, using a multi-tenancy environment brings new challenges in terms of security, reliability and QoS (Quality of Service). Thus, to fulfill cloud customer expectations for cloudified applications, the relationship must be formalized in a signed contract (i.e. an SLA, or Service Level Agreement). The service provider monitors the application components to prevent breaches during the SLA life cycle and guarantees the enforcement of the promised QoS. Usually, a service is monitored as a whole instead of considering each component in the service individually, which would allow better intervention and optimized scaling. In this paper, we propose a component-based application monitoring mechanism. We have adopted the OCCI (Open Cloud Computing Interface) framework to model our agreement and instrument a response-time metric for SLA enforcement. The goal is to prevent latency and SLA violations by identifying the responsible components involved in the request chain. Our priority is to take actions to prevent breaches, or at least to act toward a resolution when a problem manifests.

1 Introduction

To the best of our knowledge, there is a lack of a clear specification of a Quality of Service (QoS) framework in cloud computing. This concept actually varies from one IT domain to another. QoS is often applied to a system, a service or a component and is defined in [22] as the overall service performance from the end-user perspective. The popularity of Internet services has created the need to provide customers with more than best-effort support, allowing them to choose between providers based on the level of the provided services. QoS has always been one of the key differentiation factors among service providers, and this is also true in Cloud Computing (CC). CC QoS can be Functional (functionality and behavior) or Non-Functional (criteria for evaluation and operation). A QoS metric can also be qualitative (e.g. user experience) or quantitative (e.g. response-time). There is therefore a need to define what must be monitored and analyzed to assess whether the level of QoS specified in the signed SLA contract is satisfactory and whether it is respected.

1.1 Problem Statement

Cloud Computing QoS is a very important aspect, as a customer only experiences the launching and resuming of a service provisioning request, regardless of how many underlying components and/or services are actually activated to provision the customer application. In case of a resource shortage, the Cloud Computing scaling mechanism may impact only the overloaded components instead of the overall application. Often, when scaling an application, we scale a VM (Virtual Machine) regardless of which component in the application is causing a bottleneck or latency, which may add an overhead (time and cost) for resource provisioning and releasing. For this reason, in this paper, we propose to accurately identify and monitor the status of all components to achieve better decision-making based on the collected information. When processing a client request, several components may interact to handle it, consequently increasing the complexity of investigating any QoS attribute failure. The more accurate the monitoring, the faster any detected problem can be resolved. With this approach, it becomes possible to improve the performance of the provided service by reducing outage, MTSO (Mean-Time Switch Over) and MTSR (Mean-Time System Recovery) durations. For instance, in the case of response-time performance QoS, the requester component is tightly coupled with the requested component: the more the requested component fulfills the SLA, the more likely the requester component will process a request in time. In this paper, we propose to put an enforcement framework in place to monitor the response-time performance of these components using smart data monitoring and a decision-making algorithm, either to prevent any latency or to intervene immediately after a detected problem. We describe how to measure a response-time metric based on acceptable QoS values defined in the SLA and how to pinpoint potentially failing components.

1.2 Quality of Service Concept

Most existing state-of-the-art contributions on QoS assurance have focused on two main aspects: QoS modeling using ontology and QoS instrumentation. QoS modeling using ontology aims to provide definitions, terminology, metric units and all areas where QoS should be applied [4, 11, 7, 13, 6, 10, 12, 17]. QoS instrumentation is related to measurement, enforcement techniques and monitoring [20, 3, 9, 21]. Ontology is very important when defining a common understanding of SLS (Service Level Specification) parameters. To enforce this concept, the NIST (National Institute of Standards and Technology) has proposed an abstract overview for metric definitions and measurements [18]. Cloud popularity pushes service providers to offer more QoS metrics monitoring and reporting by transparently exposing their semantics. Thus, several authors have emphasized their importance for CC adoption and attractiveness.

1.3 Paper Structure

The remainder of the paper is organized as follows: Section 2 presents the related work, mainly in the areas of QoS instrumentation and enforcement. Section 3 presents the proposed solution. The implementation and a use case are presented in Section 4. Finally, Section 5 concludes this work and presents some important future directions.

2 Related Work on QoS Enforcement

Understanding the semantics of QoS metrics is very important. The QoS ontology model gathers standardization, definitions and terminology in this area, while instrumentation is more focused on the metrics for monitoring and measurement. Monitoring is mandatory for QoS validation, such as verifying whether a QoS metric instance is within acceptable thresholds, as specified in the SLS of the SLA. Oftentimes, QoS parameters are specified in the SLA and are monitored by the provider to assess whether the SLA is violated. To achieve this objective, the monitoring and SLA management systems should collect performance data from the underlying monitored resources and map them to the target QoS parameters. SLA monitoring is one of the seven functions of SLM (Service Level Management: creation, negotiation, provisioning, monitoring, maintenance, reporting and assessment), even if QoS properties may differ from one service to another. User experience is itself considered a QoS, such as the QoBiz (Quality of Business) metric. The work done in [4] provides a reasonable specification range for QoS metric values and describes the service operations (e.g. migration) needed to keep the overall QoS acceptable. In the work presented in [11], the authors assessed the QoSS (Quality of Security Service) concept to verify whether it may improve security and system performance in a QoS-aware distributed environment. Security is studied as a dimension of QoS and is related to a system deployed across several sites. In such a case, a managed resource may have a predictable and efficient allocation and utilization. Hence, if a resource is overloaded, the provider can define priority actions such as postponing tasks or even canceling and terminating jobs. The authors in [7] proposed another approach, based on the W3C (World Wide Web Consortium) QoS specifications for Web services, to identify relevant attributes when selecting a service provider. The considered QoS attributes include: Performance, Scalability, Reliability, Accuracy, Integrity, Robustness, Availability, Interoperability, Accessibility and Security. Current standards in Web services such as WSDL (Web Service Definition Language) mostly support the description of functional aspects of interfaces rather than QoS specification, and the terminology used is somewhat ambiguous when defining quality attributes. The authors in [7] have therefore proposed an interesting classification of QoS attributes and sub-attributes, as depicted in Table 1.


Attribute        Sub Classes
Functionality    Suitability, Accuracy (or Correctness), Security, Interoperability, Compliance.
Reliability      Maturity, Fault Tolerance, Recovery, Compliance, Robustness, Availability, Integrity.
Efficiency       Time Behavior (or Performance: Latency and Throughput), Resource Behavior, Compliance, Scalability, Accessibility.
Maintainability  Analyzability, Changeability (or Modifiability), Stability, Testability, Compliance.
Portability      Adaptability, Install-ability, Co-existence, Replace-ability, Regulatory.
Usability        Understandability, Learn-ability, Operability, Attractiveness, Compliance, Documentation.

Table 1: QoS Attributes Classification

In the commercial area, Appirio Cloud Metrics [1] is used to monitor and manage Salesforce.com by delivering reports. It helps customers identify the most important factors impacting the production environment and lets them focus on performance enhancement. These reports help clients detect QoS breaches in their services and manage them accordingly. SAS Environment Manager [21] is used to track the performance of middle-tier components during run-time. Potential bottlenecks and breaches can be identified by collecting the relevant measurements; this requires knowing more precisely when, what, how and where to collect relevant information. DevOps also brought the notion of rapid deployment, configuration and scaling. In this area, AWS Fargate [8] helps customers deploy applications using containers on AWS, leveraging containers instead of VMs as the compute primitive. In this case, knowledge of the application is mandatory for properly sizing each container image. Kubernetes [2] is another open-source framework used to manage centralized applications for automated deployment, configuration and scaling. The containers are grouped in logical units to build an application; hence, treating each component as an application drastically increases the number of containers. Finally, the state of the art also shows that the concept of QoS monitoring is omnipresent, and often focuses on either the middle-tier infrastructure or the whole system (in the case of a cloudified application). The more accurate the monitoring (down to a specific component), the stronger the SLA enforcement.

3 Proposed Solution & Architecture

From this state-of-the-art analysis, it appears that monitoring the system only as a whole, rather than also monitoring the specific constituent components of the system, does not lead to accurate SLA management. We argue that it is necessary to have a more fine-grained monitoring of the system. Therefore, we propose a solution that aims to provide QoS modeling and an associated algorithm for SLA breach detection and assurance. We focus mainly on the response-time QoS in the performance category [19], as it is the most visible metric for the end user.

3.1 Solution Architecture and Design

The proposed solution is based on monitoring components by dynamically analyzing the monitoring data they produce. The monitoring data should respect a specific protocol and structure, described later in this section. The goal of the monitoring data analysis is to survey the response-time QoS (a date-time metric). As specified in [12], a response-time is the minimum elapsed time of a service request. It is measured by a date-time metric (often in milliseconds) and may be expressed as an average or as a min-max interval. In our case, the response-time of a component is the time used by the component to process a request, from the time it receives it to the time it sends the response. In a request chain, we use the term requester for a component Ci sending a request and the term requested for a component Cj receiving a request. The idea is to trigger an alert whenever a component is not capable of processing a request in the specified time, i.e., its response-time exceeds a predefined threshold derived from the SLA. Two types of alert can be triggered:

– Proactive Alert: an alert triggered by a component; it serves to mitigate a potential SLA violation.
– Emergency Alert: an alert triggered by the application; it requires an immediate intervention, since the condition will eventually break the SLA terms and lead to penalties for the provider.

Parametrization and Hypothesis: After several long-running tests and benchmarks performed on a real deployed service, we measured the average response-time thresholds to apply to service components. In addition, we observed that useful alerts happen after three failures of a component to respond within an acceptable threshold. Thus, if a component triggers three consecutive alerts in the same hour, it is reasonable to consider it a component violation (not an SLA violation). We then derive the following definitions for a customer request on a service that involves n+1 components:

– $RT_{max} = \sum_{i=0}^{n} RT_{max}(i)$: pessimistic response-time for the n+1 components
– $RT_{min} = \sum_{i=0}^{n} RT_{min}(i)$: optimistic response-time for the n+1 components
– $RT_{avg} = \sum_{i=0}^{n} RT_{avg}(i)$: average response-time for the n+1 components
– $Alert_{max}$: number of occurrences before triggering an emergency alert, set to 3. Each component $C_i$ has its own maximum alert occurrence $Alert(i)_{max}$.


Response-Time Model: A set of components are deployed and logically connected together to provide an end-to-end service. Each triggered component receives a request, performs specific computation and processing on it, and finally delivers a response to the requester component or to another component in the chain. The response-time for a request, $RT_{req}$, is tightly bound to all components involved in the request chain. Thus, the final response-time $RT_{req}$ can be calculated as follows:

$$RT_{req} = \sum_{i=0}^{n} T(i) \quad (n+1 \text{ involved components}),$$

where $RT_{req} \le RT_{max}$ (the RT QoS in the SLA) and $T(i) = T_{out}(i) - T_{in}(i)$ is the RT of the $i$-th component.

Monitoring Data Structure: The monitoring tool requires a specific data structure. We use the JSON format [14] to represent it and a REST API [23] to render it, as these two technologies are widely used today. The data structure is depicted in Table 2.

Data Type       Attribute      Description
Identification  message-Id     Message identifier across the request.
                component-Id   Component identifier across the application.
                instance-Id    Container instance identifier in the VM.
Measurements    time-stamps    Timestamps in milliseconds.
                start-time     Component request time $T_{in}(i)$, in seconds.
                end-time       Component response time $T_{out}(i)$, in seconds.

Table 2: Monitoring Data Structure
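
For illustration, a single monitoring record following this structure could be rendered by the REST API as in the short Python sketch below; all identifier and time values are hypothetical, chosen only to match the fields of Table 2.

import json

# Hypothetical monitoring record following Table 2 (all values illustrative).
record = {
    "message-Id": "msg-0042",      # message identifier across the request
    "component-Id": "C3",          # component identifier across the application
    "instance-Id": "container-7",  # container instance identifier in the VM
    "time-stamps": 1498744800000,  # collection timestamp, in milliseconds
    "start-time": 1498744800.120,  # T_in(i): when the component received the request
    "end-time": 1498744800.870,    # T_out(i): when the component sent the response
}

# T(i) = T_out(i) - T_in(i): the component's response time (0.75 s here).
t_i = record["end-time"] - record["start-time"]
print(json.dumps(record, indent=2), t_i)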

3.2 SLA Enforcement Algorithm

The algorithm we propose is based on two major events that can trigger two types of alerts. The first type (Proactive) warns about a particular component's performance that may cause an SLA violation if not mitigated. The second type (Emergency) requires immediate intervention because it will eventually cause an SLA violation. Each time a performance violation is detected for a particular component, its alert occurrence count is incremented. When the maximum allowed number of alerts $Alert(i)_{max}$ is reached within the same hour for a particular component, a proactive alert is sent. If $RT_{req}$ is not met and the maximum number of alerts $Alert_{max}$ is reached within the same hour, then an emergency alert is triggered. An emergency alert must satisfy these conditions:

$$\text{Emergency:} \quad RT_{req} > RT_{max} \ \text{(RT exceeds the limit)} \quad \text{and} \quad Alert_{max} = 3 \ \text{(maximum number of QoS violations during the same hour)}.$$


A system is reset if one hour has elapsed or an emergency alert has been triggered.
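
As an illustration, the following Python sketch captures this alert logic; the class and variable names, and the sliding-window bookkeeping, are our own assumptions rather than the authors' implementation.

import time

ALERT_MAX = 3    # maximum QoS violations tolerated within the same hour
WINDOW_S = 3600  # observation window of one hour, in seconds

class AlertMonitor:
    def __init__(self, rt_max_component, rt_max_request):
        self.rt_max_component = rt_max_component  # per-component thresholds from the SLA
        self.rt_max_request = rt_max_request      # end-to-end RT_max from the SLA
        self.counts = {}                          # per-component violation counters
        self.window_start = time.time()

    def reset(self):
        # Reset when one hour has elapsed or an emergency alert has been triggered.
        self.counts = {}
        self.window_start = time.time()

    def observe(self, component, t_in, t_out, rt_req):
        if time.time() - self.window_start > WINDOW_S:
            self.reset()
        if t_out - t_in > self.rt_max_component[component]:
            self.counts[component] = self.counts.get(component, 0) + 1
            if self.counts[component] >= ALERT_MAX:
                print(f"Proactive alert: component {component}")  # mitigate before a breach
                if rt_req > self.rt_max_request:
                    print("Emergency alert: SLA violation")       # immediate intervention
                    self.reset()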

3.3 SLA Modeling

The corresponding SLA model is specified using OCCI Service Level Agreements [16]. OCCI is used both for the SLA specification and for modeling the monitoring tasks [15, 5]. The proposed SLA is divided into two major parts: the first part is related to the services specification (e.g. the application components) and the second part is related to the modeling of the QoS objectives, or SLOs (Service Level Objectives). In our case, the SLOs are related to service performance (e.g. response-time). Figure 1 depicts the resulting Service Level Agreement model for one customer application (user service).

[Figure 1 (diagram): the Application Resource is linked to the QoSSLA Agreement through the QoSSLALink; the QoSSLA Agreement carries the AgreementTemplate and AgreementTerms Mixins, on which the AppPerformance (term = service_performance) and LocalisationTemp (term = localisation_fr) Mixins depend, together with their attribute sets, including the response-time limits rt.max = 3 s and rt.min = 1 s.]

Fig. 1: Application OCCI SLA Model

The QoSSLA is the agreement related to the application resource and is linked to it using the QoSSLALink link. The QoS response-time is represented by the AppPerformance Mixin (a concept in OCCI that allows properties to be added dynamically to a particular concept) with its OCCI AgreementTerms Mixin and a set of attributes including the response-time limit condition. A LocalisationTemp Mixin is also added using the OCCI AgreementTemplate Mixin with its attribute set. The second model is for the monitoring representation, as depicted in Figure 2. The AppMonitoring resource is linked with the monitored application using the AppMonitoringLink link. The AppMonitoringData Mixin represents the performance condition while the AppMonitoringCollector Mixin defines the collected data.


[Figure 2 (diagram): the AppMonitoring Resource, attached to the application through the AppMonitoringLink, with the AppMonitoringData and AppMonitoringCollector Mixins (scheme http://schemas.ogf.org/occi/monitoring#) and the EmergencyAlert and ProactiveAlert Mixins, both depending on the AlertTemp template Mixin, which carries the support contact attributes (sms, email, pager).]

Fig. 2: Monitoring OCCI Model

A set of alert Mixins (EmergencyAlert and ProactiveAlert) is defined based on the AlertTemp template Mixin. Note that we do not represent the Decision Agent and the Notification Agent, which appear in the architecture diagram in Figure 3.
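
For illustration, the attributes carried by the QoSSLA Agreement and its AppPerformance SLO term in Figure 1 could be rendered, on the client side, as the flat attribute set sketched below; this is a simplified Python view based on the values shown in the figure, not an exact OCCI serialization.

# Sketch of the QoSSLA agreement attributes shown in Fig. 1 (simplified).
qos_sla_attributes = {
    "occi.agreement.state": "accepted",
    "occi.agreement.agreedAt": "2017-04-30T14:00:00+00:00",
    "occi.agreement.effectiveFrom": "2017-05-01T00:01:00+00:00",
    "occi.agreement.effectiveUntil": "2017-07-01T00:01:00+00:00",
    # SLO term carried by the AppPerformance Mixin:
    "application_performance.term.type": "SLO-TERM",
    "application_performance.term.state": "Undefined",
    "application_performance.term.desc": "Response Time",
    "application_performance.term.rt.max": "3 s",
    "application_performance.term.rt.min": "1 s",
}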

4 Implementation

We have implemented a proof-of-concept simulation. It consists of an environment hosting two VMs with different resource capacities: VM1 (CPU: 1, RAM: 3 GB, Disk: 30 GB) and VM2 (CPU: 1, RAM: 2.5 GB, Disk: 20 GB). We configured each VM with the necessary tools and software to accurately monitor the middleware and execute the application. We installed a J2EE-based application server that serves as an application container, and we added an MDC (Monitoring Data Collector) and a DNA (Decision and Notification Agents). The two VMs are fronted by a local LB (Load Balancer) in a local network, as illustrated in Figure 3. We specified our complex, composite application as a set of interacting components. To facilitate our simulation, the MDC tool collects monitoring data from each component and builds monitoring metrics. This data is then exposed through a REST API in JSON format. When we ran the performance tests, we faced an issue with CPU consumption that impacted the overall response-time. After running an RCA (Root Cause Analysis), we discovered that the issue was caused by the increasing number of REST calls.


[Figure 3 (diagram): end users reach a Load Balancer in a local network, fronting two Virtual Machines; each VM hosts an Application Container with components C0..Ci, a Monitoring Data Collector producing alerts, and the Decision and Notification Agents with their configuration.]

Fig. 3: Global Architecture Design

By tuning the time between two consecutive calls, we reached an acceptable compromise between the CPU usage and the number of monitoring calls, as illustrated in Figures 4 and 5. This issue is pinpointed in the SLALOM project [19], which recommends a minimum time interval between two monitoring requests (around 1 minute) in order not to overload the components.

[Figures 4 and 5 (plots): CPU usage (%) over roughly 12 hours with a 1/2-second interval between monitoring calls (Fig. 4) versus a 2-second interval (Fig. 5).]

Fig. 4: CPU Usage (1)    Fig. 5: CPU Usage (2)

After executing our simulation scenario, we received twenty-six alerts (twenty-five proactive and one emergency). After analyzing the alerts, we discovered that the emergency alert happened after the same component caused four proactive alerts, as depicted in Figures 6 and 7.


[Figures 6 and 7 (plots): per-hour response time (s) against the RTmin and RTmax thresholds over 24 hours (Fig. 6), and the corresponding number of proactive alerts per hour, with the emergency alert marked (Fig. 7).]

Fig. 6: Response Time Result    Fig. 7: Alerts Result

5 Conclusions and Future Works

Assuring SLAs in a Cloud Computing environment is very important for its adoption. However, the QoS metrics defined by cloud providers are heterogeneous, domain dependent and, more seriously, lack consistent semantics from one application to another. Thus, if we exclude traditional hardware monitoring and tuning, it is difficult to have common tools to manage these metrics in an efficient way. Cloud applications are usually built as a set of components that are logically connected together. It is important to monitor each component individually and to understand how the performance violation of each component may impact the service as a whole. It is also important to avoid performing unnecessary reconfiguration or scaling on these components and to refine the right thresholds and failure occurrences before any action. In this work, we have proposed an approach to optimize and adapt these values based on peak and normal hours to minimize resource usage and enhance the overall system performance. In the future, we would like to extend this approach to other QoS metrics and to improve the provisioning process and scaling mechanism using container (e.g. Docker) technology and its associated monitoring mechanisms. In this scenario, scaling containers instead of VMs may permit better optimization of resource utilization by modeling a set of components as an application.

Acknowledgments

This research is partially funded by the French Ministry of Foreign Affairs and International Development (MAEDI) in the frame of the AmSud-VNET project (16STIC11-VNET), as well as by the Industry and Innovation Department of the Aragonese Government and European Social Funds (COSMOS group, ref. T93) and the Spanish Ministry of Economy (TIN2013-40809-R). Thanks to all the partners of the project who have helped with their discussions to improve the research work presented in this paper.


References

1. Appirio. Cloud Metrics for Salesforce.com. Technical report, Appirio, Inc., 760 Market St. 11th Floor, San Francisco, CA 94102, 2012.
2. The Kubernetes Authors. Production-grade container orchestration, 2017.
3. A.K. Bardsiri and S.M. Hashemi. QoS metrics for cloud computing services evaluation. I.J. Intelligent Systems and Applications, MECS, pages 27–33, December 2014. doi:10.5815/ijisa.2014.12.04.
4. Ch. Braga, F. Chalub, and A. Sztajnberg. A formal semantics for a Quality of Service contract language. Elsevier, Electronic Notes in Theoretical Computer Science 203, pages 103–120, 2009.
5. A. Ciuffoletti. A simple generic interface for a cloud monitoring service. CLOSER 2014 - 4th International Conference on Cloud Computing and Services Science, Barcelona, April 2-5, 2014.
6. G. Dobson, R. Lock, and I. Sommerville. QoSOnt: A QoS ontology for service-centric systems. 31st EUROMICRO Conference on Software Engineering and Advanced Applications, pages 80–87, September 2005. doi:10.1109/EUROMICRO.2005.49.
7. S. Hanna and A. Alawneh. An approach of Web service quality attributes specification. IBIMA Publishing, Communications of the IBIMA Journal (ISSN:1943-7765), 2010:1–13, January 2010. doi:10.5171/2010.552843.
8. R. Hunt. Introducing AWS Fargate: run containers without managing infrastructure, November 2017.
9. A. Iosup, S. Ostermann, M. N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema. Performance analysis of cloud computing services for many-tasks scientific computing. IEEE Transactions on Parallel and Distributed Systems, 22:931–945, June 2011. doi:10.1109/TPDS.2011.66.
10. C. Irvine and T. Levin. Toward a taxonomy and costing method for security services. Proc. ACSAC'99, Phoenix, AZ, December 1999.
11. C. Irvine and T. Levin. Quality of security service. NSPW - Proceedings of the Workshop on New Security Paradigms, pages 91–99, 2001.
12. J. McKendrick. Survey: Public cloud metrics are still too... cloudy. http://www.forbes.com/forbes/welcome/3db1de661194, December 2015.
13. J. Jin and K. Nahrstedt. QoS specification languages for distributed multimedia applications: A survey and taxonomy. IEEE Computer Society, IEEE MultiMedia (ISSN:1070-986X), 11:74–87, July 2004. doi:10.1109/MMUL.2004.16.
14. JSON. The home of JSON Schema. Json-schema Organization. http://json-schema.org/documentation.html, 2017.
15. M. Mohamed, Dj. Belaid, and S. Tata. Extending OCCI for autonomic management in the cloud. http://elsevier.com/locate/jss, 2016.
16. OGF. GFD228 - Open Cloud Computing Interface - Service Level Agreements. http://occi-wg.org/2016/10/, October 2016.
17. Opengroup. The Open Group, Building Return on Investment from Cloud Computing: Cloud Computing Key Performance Indicators and Metrics. www.opengroup.org/cloud/whitepapers/ccroi/kpis.htm, June 2016.
18. P. Pritzker and W. May. Cloud Computing Service Metrics Description. National Institute of Standards and Technology, Special Publication 500-307, 2015.
19. SLALOM Project. SLA specification and reference model - D3.6. Technical report, Institute of Communication and Computer Systems and other members of the SLALOM consortium, ICCS, 9 Iroon. Polytechniou Str., 157 73 Zografou, Greece, 2016.


20. M. Selva, L. Morel, K. Marquet, and S. Frenot. A QoS monitoring system for dataflow programs. ComPAS'2013: RenPar'21/SympA'15/CFSE'9, Grenoble, France, January 2013. HAL Id: hal-00780976.
21. R. Sioss. SAS 9.4 Web Application Performance: Monitoring, Tuning, Scaling, and Troubleshooting. Proceedings 2014, SAS Global Forum, March 2014.
22. Wikipedia. Quality of service. https://en.wikipedia.org/wiki/Quality_of_service, August 2016.
23. Wikipedia. Representational state transfer. https://en.wikipedia.org/wiki/Representational_state_transfer, September 2017.


Evaluate Location Features for Continuous Authentication with Machine Learning Experiments

Rossana M. C. Andrade1,∗, Marcio A. S. Correia1, Carlos A. B. Carvalho1,2, and Pablo Ximenes3

1 Group of Computer Networks, Software Engineering and Systems, Federal University of Ceara, Fortaleza, Brazil
{rossana,marcio}@ufc.br
2 Federal University of Piaui, Teresina, Brazil
[email protected]
3 Information Technology Company of Ceara (ETICE), State Government of Ceara, Fortaleza, Brazil
[email protected]

Abstract

Traditional authentication mechanisms take some time, especially from mobile users who access their devices many times but for short periods. Continuous authentication appears as an alternative, requiring less active user interaction. These mechanisms aim at reducing the time users spend on authentication as well as raising the security level of identity verification. Existing solutions use biometric data as the input of machine learning algorithms that identify whether a user is authorized. It is possible to use a single biometric feature or to combine several features. It is then necessary to analyze the impact of the features and machine learning algorithms when designing an authentication mechanism. In this context, this paper describes a methodology to design and perform experiments whose results can be used in a decision-making process. We apply this methodology to evaluate outdoor location features obtained from GSM or GPS, using algorithms available in the WEKA environment. Based on efficacy and efficiency measures, our experiments, using three datasets, indicate better results when GPS data and the J48 algorithm are used for authentication.

1 Introduction

Authentication solutions should be designed to balance security and usability. Due to limitations imposed by the interface of mobile devices, the use of passwords may not be the best authentication solution. Besides, the duration and frequency of user authentication on mobile devices must be considered in the design of authentication mechanisms, because users handle these devices frequently but for short periods of time [1]. Usability cannot be ignored because of the risk of the authentication mechanism being disabled. Continuous authentication (also known as transparent or implicit authentication) appears as an alternative for user authentication, requiring less active user interaction [2]. In continuous authentication mechanisms, the user's identity is continuously monitored based on physiological and behavioral biometrics extracted from the natural use of a device. Some examples of biometric features are touchscreen manipulation, typing, gait, face, voice, location, active mobile app, accessed websites, typed text, audible noise, and received/sent text messages and calls.

∗Researcher Scholarship - Technological Development and Innovative Extension (DT 2), sponsored by CNPq, Brazil


The user profile is built from patterns recognized from a single biometric feature or from combined features. Pattern recognition is broadly based on machine learning algorithms [3]. The security of authentication mechanisms relies on their effectiveness, which is influenced by the features and algorithms used. In this context, a decision-making process is essential when proposing authentication mechanisms, observing which algorithms and features produce better results. Statistical analysis based on efficacy and efficiency measures allows the comparison between algorithms and features. On the other hand, existing solutions did not follow guidelines in their design, such as those proposed by Alpaydin [3], which would enable the reproduction of experiments and the comparison between solutions. Therefore, we propose, in this paper, a methodology to design and perform experiments in order to evaluate which algorithms and features are most suitable for continuous authentication mechanisms. It is worth mentioning that this type of mechanism can also be used by IoT devices. As an example, a presence sensor can automatically identify who entered the environment and send commands to the lights and the air conditioner, configuring them in accordance with the user's preferences. GSM (Global System for Mobile Communications) and GPS (Global Positioning System) data have been used as location features to authenticate users [4, 5]. However, it has not been possible to infer which feature is most suitable for authentication. We therefore applied our methodology to evaluate location features. We detail here the results of this evaluation, which also includes the analysis of machine learning algorithms and the merging of features, considering, for example, a user's trajectory. We use WEKA [6], a publicly available machine learning environment for experimentation that has achieved wide acceptance in the academic world. Finally, we also use Geolife [7] and MIT Reality [8], two open datasets with location data collected from a large number of real mobile device users. The following sections present the background (Section 2); related work (Section 3); the proposed methodology and case study (Section 4); results and analysis (Section 5); and, finally, conclusions and future work (Section 6).

2 Background

Continuous authentication mechanisms have been proposed in the literature to identify mobile users without their active participation [1]. Mobile devices have multiple sensors whose data can be collected and analyzed, allowing user identification. Usually, the mechanisms are based on machine learning algorithms in which an authorized user provides samples of his/her features, which are used to build the user profile. This process can be called registration and is followed by the authentication process [9]. In the authentication process, the unknown user's identity is verified to allow the user access to the system. Although there are unsupervised learning algorithms [3], this scenario fits the use of supervised classification algorithms. The algorithm learns the pattern of the authorized user based on the features extracted from the training set. Next, this pattern is compared with the features extracted during the use of the device, allowing or denying access. It is essential to perform experiments to analyze the effectiveness of the authentication. For this, a test set, containing features of authorized and unauthorized users, is used for classification, and the obtained results are compared with the expected results. In this scenario, the training set contains only data of the authorized user, requiring a one-class classification algorithm such as One-class SVM [10]. We performed some experiments using this algorithm and obtained results well below those achieved by related work.


However, one-class algorithms are not used in the related work, whose training sets contain data of both authorized and unauthorized users. In this research, two-class classifiers were used, and each entry of the training set is labeled as authorized or unauthorized, indicating whether the entry refers to the authorized user. In our experiments (see Section 5), each entry of the test set is also classified as authorized or unauthorized. In order to statistically analyze the results, it is necessary to collect some measures, considering also the expected results. The basic measures are True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). The first two measures show the hits of the classifier, indicating, respectively, the number of times the authorized and unauthorized users have been identified correctly. On the other hand, the last two measures indicate the mistakes of the classifier. In order to enable reproducibility and comparison with other solutions, it is essential to follow guidelines during the experiments, such as those described by Alpaydin [3]. His guidelines specify the need to: i) define the objectives of the experiments; ii) select the response variables; iii) choose the factors and levels (i.e., the controllable elements of an experiment, such as the algorithms and their configurations); iv) design the experiments, detailing their variations in accordance with, for example, the dataset and algorithm used; v) perform the experiments; and vi) analyze the results. Our methodology is based on these guidelines and is detailed in Section 4. The repetition of experiments using the same factors is essential for statistical analysis, so that the effects of uncontrollable factors are reduced. In machine learning, a dataset is divided in different ways to repeat each experiment, using a cross-validation technique. Besides, in order to compare algorithms, the same subsets are used, modifying only the algorithms. Following the recommendation of Alpaydin [3], we use the 10-Fold Cross-Validated Paired T-Test, performing ten experiments with each algorithm. The dataset is divided into ten equal parts; one part is used as the test set in each experiment while the other parts make up the training set.
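
For concreteness, the sketch below shows this procedure in Python, using scikit-learn and scipy in place of WEKA (the paper itself runs the test inside WEKA): both algorithms are trained and tested on the same ten splits, and the paired t-test is applied to the per-fold accuracy differences.

import numpy as np
from scipy import stats
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

def paired_cv_ttest(X, y, clf_a, clf_b, folds=10, seed=0):
    # The same ten train/test splits are used for both algorithms.
    kfold = KFold(n_splits=folds, shuffle=True, random_state=seed)
    diffs = []
    for train, test in kfold.split(X):
        accs = []
        for clf in (clf_a, clf_b):
            clf.fit(X[train], y[train])
            accs.append(accuracy_score(y[test], clf.predict(X[test])))
        diffs.append(accs[0] - accs[1])
    # Paired t-test on the ten per-fold accuracy differences.
    return stats.ttest_1samp(diffs, popmean=0.0)

# e.g., comparing a decision tree (J48-like) against an SVM:
#   from sklearn.tree import DecisionTreeClassifier
#   from sklearn.svm import SVC
#   t, p = paired_cv_ttest(X, y, DecisionTreeClassifier(), SVC())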

3 Related Work

Patel et al. [2] present a review of the recent literature on continuous authentication. Among these works, [4] and [5] use location features. We performed a systematic literature review to identify other proposals that use location for authentication. Table 1 summarizes our findings, showing for each related work: i) the technology used to extract the users' location; ii) the extracted features; iii) the machine learning algorithm; iv) the evaluation measures; and v) details of the dataset. Although we highlight only features related to user location, some solutions use other features to perform user authentication, such as web browsing [5] and app usage [11]. However, the conducted evaluations do not describe the impact of each feature on the efficiency and efficacy of the proposed mechanisms. Each mechanism focuses on a single machine learning algorithm, such as K-NN (K-Nearest Neighbor) [13] or SVM (Support Vector Machine) [5], so the most suitable algorithm for continuous authentication, considering the selected features, is not evaluated. Besides, it is not possible to compare the solutions because a proper evaluation methodology was not applied. The absence of such a methodology makes it impossible to reproduce the experiments that were performed without open datasets. Thus, although the related work presents good results under the observed scenarios, it is hard to draw conclusions about the overall effectiveness of the proposed mechanisms.


Table 1: Main analysed characteristics in related work.

Paper                     Tech.  Features                       Learning               Measures                     Datasets
Shi et al. [4]            GSM    CellID Sequences               Levenshtein Distance   TPR, FPR, Precision, Recall  Collected by authors (7 users)
Fridman et al. [5]        GPS    Latitude and Longitude         SVM with RBF Kernel    TPR, FPR and ROC curve       Collected by authors (200 users)
Ramakrishnan et al. [11]  GPS    Latitude and Longitude         K-NN                   TPR and FPR                  Collected by authors (4 users)
Tang et al. [12]          GPS    Latitude, Longitude, and Time  Rule-Based Classifier  TPR and FPR                  Collected by authors (10 users)
Hayashi et al. [13]       GPS    Latitude and Longitude         K-NN                   Others                       Collected by authors (32 users)

In this context, the related work does not provide elements that help in the decision-making process used during the design of authentication mechanisms. It is possible to ask which algorithm and which feature (e.g., GPS-based or GSM-based) are most appropriate. Another possible analysis is the effect of a set of features on the efficiency and efficacy of authentication. In this research, for example, we observe the impact of the time and of a collection of locations (trajectory).

4 Evaluation Methodology and Case Study

In this section, we describe the proposed methodology, which is based on the guidelines for machine learning experiments [3]. We extend these guidelines, proposing two new steps, as presented in Figure 1. In presenting each step of our methodology, we also detail its execution in accordance with our case study, focusing on location features.

Figure 1: Evaluation methodology.

4.1 Objective

In machine learning, the definition of objectives is essential to plan the evaluation so that the executed experiments allow achieving them. In this research, the objective of the experiments is to compare the results obtained with outdoor location features, considering: i) GPS- and GSM-based features; ii) machine learning algorithms; iii) the effect of the use of time and trajectory in authentication; and iv) the impact of the existence of users with similar behavior in the dataset.

4.2 Problem Modeling

In this step, the original problem (in our case, authentication) is mapped to a machine learning problem. This mapping is not in the scope of the guidelines [3] and helps in the choice of the types of algorithms to be used during the experiments. An authentication mechanism verifies whether a user is authorized or not. As described in Section 2, our mapping results in datasets that contain samples of authorized and unauthorized users. Then, a two-class classifier can identify whether a sample is authorized or not.

4.3 Response Variables

In this step, quality measures are selected to be used during the evaluation. Effectiveness is an important aspect to be analyzed, showing not only the success in identifying users but also the security of a mechanism in blocking unauthorized access. There are several measures (e.g., accuracy, precision and recall) to calculate the effectiveness of a classification [3]. Due to space limitations, we present here only the accuracy, which is computed as indicated in Equation 1, where N is the number of entries of the test set, and TP and TN are, respectively, the number of true positives and true negatives. This measure is usually cited in the literature and focuses on the hits of the classifier. However, we also computed the other measures, likewise obtaining good results. Besides, efficiency measures must also be computed, showing the cost in computing resources such as CPU and memory. Energy consumption was not analyzed because this measure is not supported by WEKA.

Accuracy = (TP + TN)/N (1)
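
As a minimal illustration of Equation 1 (a sketch, not tied to WEKA):

def accuracy(tp, tn, n):
    # Equation 1: fraction of the N test entries classified correctly.
    return (tp + tn) / n

# Hypothetical counts for a test set with N = 100 entries:
print(accuracy(tp=48, tn=42, n=100))  # 0.90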

4.4 Factors and Levels

The factors are the controllable elements that affect the results of the experiments. The main factors are the features, datasets and machine learning algorithms. The features are defined based on the objectives, which include the analysis of location, time and trajectory. We therefore specified four feature sets (FS). FS#1 contains entries with one single location, while in FS#2 the time of day at which the location was collected is also indicated. Each entry of FS#3 represents one trajectory, composed of six consecutive locations, as recommended by Shi et al. [4]; a sketch of this construction is given below. The time of the last location was included in FS#4. Two open datasets were used to enable the analysis of GPS and GSM data. While Geolife [7] contains users' data collected by GPS, MIT Reality [8] consists of data collected by GSM. We also built a dataset containing GPS data of two users with similar behavior to evaluate the impact of this similarity. The algorithms are selected based on the modeled problem and their availability in WEKA. In this research, we use the following classifiers: i) Decision Tree C4.5 (J48); ii) Support Vector Machine C-SVC (LibSVM); iii) Naive Bayes (NaiveBayes); and iv) 0-R (ZeroR). The levels of a factor are related to, for example, the different configurations of an algorithm. Pretests can help in the definition of the configurations to be used, reducing the size of the experiments without losing the quality of the results. In our experiments, the only change made to the standard settings was in LibSVM: for this algorithm, it was necessary to activate the normalization of the input features (-Z) and deactivate the use of the heuristics (-H).
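
The sliding-window construction of FS#3 can be sketched as follows; the representation of a location as a discretized cell is an assumption carried over from the pretests of Section 4.5.

def trajectories(locations, length=6):
    # FS#3: each entry is a trajectory of six consecutive (already
    # discretized) locations, following the recommendation of Shi et al. [4].
    return [tuple(locations[i:i + length])
            for i in range(len(locations) - length + 1)]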


4.5 Pretests

Besides helping in the definition of configuration parameters, the pretests are included to filter the datasets. Repeated information reduces efficiency and does not contribute to the learning process, making it necessary to remove redundant samples. In our scenario, it is necessary to discretize space and time. As a result of this step, the location scale was fixed at approximately 100 square meters and the time scale was fixed at one minute. Then, only a change of region is considered in the extraction of the trajectory. Besides, in order to obtain more realistic results, it is necessary to have some similarity between the users' behavior, so only users of the same region (e.g., city) are analyzed together. Last, we remove users with fewer than 500 samples, because a low quantity of samples is one of the causes of underfitting [3].
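
A possible discretization is sketched below, assuming WGS-84 latitude/longitude fixes and square cells of roughly 100 square meters (about 10 m per side); the exact grid used by the authors' scripts may differ.

import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude

def discretize(lat, lon, timestamp_s, cell_m=10.0):
    # Map a GPS fix to a ~10 m x 10 m grid cell and a one-minute time slot.
    lat_cell = int(lat * METERS_PER_DEG_LAT // cell_m)
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    lon_cell = int(lon * meters_per_deg_lon // cell_m)
    minute = int(timestamp_s // 60)
    return lat_cell, lon_cell, minute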

4.6 Experimental Design

Based on the results of the previous steps, the experiments are planned, defining which combinations of factors will be used. Although we have fixed the parameters of the algorithms, all possible scenarios are considered in our experiments, following a factorial design [3]. We thus have 48 evaluation scenarios, covering all combinations of the three datasets, four feature sets and four algorithms. The statistical analysis depends on the repetition of experiments, and we generated 610 different experiments, using the 10-Fold Cross-Validated Paired T-Test.
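
The 48 scenarios are simply the full factorial combination of the three factors, as the following sketch shows (dataset and algorithm names as used in this paper):

from itertools import product

datasets = ["Geolife", "MIT Reality", "Research Data"]
feature_sets = ["FS#1", "FS#2", "FS#3", "FS#4"]
algorithms = ["J48", "LibSVM", "NaiveBayes", "ZeroR"]

scenarios = list(product(datasets, feature_sets, algorithms))
assert len(scenarios) == 48  # 3 datasets x 4 feature sets x 4 algorithms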

4.7 Performing the Experiment

The equipment used for the experiments was a Lenovo ThinkCentre M91p desktop with an Intel Core i5-2400 processor, 8 GB of RAM and the Windows 7 Professional 64-bit SP1 operating system. The WEKA environment was chosen for the execution of the experiments conducted in this work. Python scripts were created to generate the ARFF file (used by WEKA) for each experiment, filtering the original datasets and producing the training and test sets. The files are available at https://goo.gl/DMtHn8 to allow the reproducibility of the experiments. This link also contains the tables with all measures of the statistical analysis.
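
The published scripts are available at the link above; as an illustration only, writing an ARFF file for a feature set like FS#1 could look like the following sketch (relation and attribute names are hypothetical).

def write_arff(path, samples):
    # samples: iterable of (lat_cell, lon_cell, label),
    # with label in {authorized, unauthorized} (see Section 2).
    with open(path, "w") as f:
        f.write("@RELATION location_authentication\n\n")
        f.write("@ATTRIBUTE lat_cell NUMERIC\n")
        f.write("@ATTRIBUTE lon_cell NUMERIC\n")
        f.write("@ATTRIBUTE class {authorized,unauthorized}\n\n")
        f.write("@DATA\n")
        for lat_cell, lon_cell, label in samples:
            f.write(f"{lat_cell},{lon_cell},{label}\n")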

5 Statistical Analysis of Results

In this section, the results of this work are presented. The results of the evaluations of the classifiers are presented in graphs, with a confidence interval of 95%. The comparisons between the results obtained by the classifiers considered a significance level of 0.05. Figure 2 presents only the accuracy of the classifiers, grouped by dataset. Comparing the results obtained by the classifiers, it is possible to observe that J48 reached the greatest accuracy among the algorithms, followed by LibSVM, which had a significantly lower accuracy. We also found evidence that the use of GPS data produces better results when compared with GSM data, extracted from the MIT Reality dataset. Although the datasets are different, it is important to note that the behavior of the users in our GPS-based dataset (Research Data) is similar and, even so, the obtained results were better than those of the MIT Reality. The effect of the use of time and trajectory in authentication is another aspect to be analyzed. It is possible to see, in Figures 2a and 2b, the improvement of the results when time and location are used as features. However, observing also Figure 2d, the inclusion of the trajectory generated better results with GSM data and worse results with GPS data. On the other hand, there is a reduction in efficiency, as shown in Figure 3.


Figure 2: Accuracy results, grouped by dataset: (a) Feature Set #1; (b) Feature Set #2; (c) Feature Set #3; (d) Feature Set #4.

Efficiency is another important aspect, especially considering that authentication is a real-time procedure. Due to space restrictions, Figure 3 displays only the results obtained by the J48 and LibSVM classifiers on the Geolife dataset. It is possible to highlight the good results obtained with J48.

Figure 3: Performance results on the Geolife dataset: (a) CPU time for training; (b) CPU time for testing.

6 Conclusions and Future Work

In this work, we propose a methodology to evaluate the use of machine learning algorithms by continuous authentication mechanisms. We applied our methodology to compare algorithms and different outdoor location strategies that can be used by these mechanisms. The evidence obtained from the experiments indicated that Decision Tree C4.5 (J48) is the machine learning algorithm that offers the best efficacy and efficiency among the classifiers evaluated. Besides, analyzing the results by location technique, the evidence indicated that the location features retrieved from GPS data offer better results. Finally, considering the evaluated feature sets, the evidence indicated that the combination of location features with time and the trace of last locations degrades the accuracy with GPS and improves the accuracy with GSM. Opportunities for future work include performing experiments using different configurations of the evaluated algorithms, other algorithms (e.g., K-NN), other biometric features (e.g., face), and the development of an environment for conducting experiments on mobile devices. The proposed methodology can also help to define the most appropriate techniques and features in the design of new authentication mechanisms, including solutions for IoT environments.

References

[1] Heather Crawford. Adventures in authentication – position paper. In Symposium on Usable Privacy and Security (SOUPS). USENIX Association, 2014.
[2] Vishal M Patel, Rama Chellappa, Deepak Chandra, and Brandon Barbello. Continuous user authentication on mobile devices: Recent progress and remaining challenges. IEEE Signal Processing Magazine, 33(4):49–61, 2016.
[3] Ethem Alpaydin. Introduction to machine learning. MIT Press, 2014.
[4] Weidong Shi, Jun Yang, Yifei Jiang, Feng Yang, and Yingen Xiong. SenGuard: Passive user identification on smartphones using multiple sensors. In 2011 IEEE 7th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pages 141–148. IEEE, 2011.
[5] Lex Fridman, Steven Weber, Rachel Greenstadt, and Moshe Kam. Active authentication on mobile devices via stylometry, application usage, web browsing, and GPS location. IEEE Systems Journal, PP(99):1–9, 2015.
[6] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.
[7] Yu Zheng, Xing Xie, and Wei-Ying Ma. GeoLife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull., 33(2):32–39, 2010.
[8] Nathan Eagle and Alex Sandy Pentland. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255–268, 2006.
[9] Nathan Clarke. Transparent user authentication: biometrics, RFID and behavioural profiling. Springer Science & Business Media, 2011.
[10] Larry M Manevitz and Malik Yousef. One-class SVMs for document classification. Journal of Machine Learning Research, 2(Dec):139–154, 2001.
[11] Arun Ramakrishnan, Jochen Tombal, Davy Preuveneers, and Yolande Berbers. PRiSM: Policy-driven risk-based implicit locking for improving the security of mobile end-user devices. In Proceedings of the 13th International Conference on Advances in Mobile Computing and Multimedia, pages 365–374. ACM, 2015.
[12] Yujin Tang, Nakazato Hidenori, and Yoshiyori Urano. User authentication on smart phones using a data mining method. In Information Society (i-Society), 2010 International Conference on, pages 173–178. IEEE, 2010.
[13] Eiji Hayashi, Sauvik Das, Shahriyar Amini, Jason Hong, and Ian Oakley. CASA: context-aware scalable authentication. In Proceedings of the Ninth Symposium on Usable Privacy and Security, page 3. ACM, 2013.


Neural Network Model for QoE Estimation of Video Streaming over 5G Network

LOUNIS Asma1, ALILAT Farid1, and AGOULMINE Nazim2

1 Image Processing and Radiation Laboratory, EII, USTHB, Algiers, Algeria
[email protected] [email protected]
2 Laboratory of Information, Integrative Biology and Complex Systems, University of Evry Val d'Essonne, Bd François Mitterrand, Évry Cedex, France.
[email protected]

Abstract
With the rapidly increasing demand for commercial video streaming, the satisfaction of the end user is becoming more and more important to measure and ensure. The quality of experience (QoE) is defined as the measure of the overall level of customer satisfaction with the usage of a service provided by a vendor. Many works have addressed this issue in many different scenarios in cellular networks; however, most of these works have addressed video streaming over LTE (Long Term Evolution) networks. To date, there are few contributions that address QoE over 5G networks, since there are still some challenges in the latter to address. In this paper, we present the specific aspects we consider important in the evolution from 4G to 5G in terms of traffic management, and a solution to estimate the QoE in this new context. We adopted an approach based on a Neural Network (NN) to estimate the QoE parameters. NNs have been successfully used in many domains where it was difficult to derive an exact analytical model of the system, as is the case for the 5G network.

1 Introduction

Recent years have seen an amazing evolution in information and telecommunication technologies. The development of multimedia streaming services (Netflix, YouTube, Facebook, etc.) and innovative end-user devices (e.g., smartphones, wearable devices, tablets, laptops) has created a huge demand for high data rate services. According to the latest Cisco Visual Networking Index (VNI) [1], global mobile data traffic will increase almost ten times in the next ten years, and three-quarters of this traffic will be video streaming. This has pushed many researchers to investigate new technologies in the frame of 5G (Fifth Generation of Cellular Networks). The expectations from 5G are huge, given the enormous amount of spectrum in the millimeter wave (mmWave) bands it plans to use [2]. The goal of 5G is not only to provide high data transfer rates but also to improve the quality of experience (QoE) of end users, and more particularly for video streaming, which is one of the most critical services that will be supported by 5G networks at large scale. Nowadays, video streaming accounts for more than 70 percent of all Internet traffic, and it is expected to reach 82 percent by 2020 [3]. This increase in the volume of video traffic is fueled by new applications that will emerge quickly in the market, such as social video, virtual reality (VR) and augmented reality (AR), which shows that multimedia streaming, and more precisely video streaming, will be more and more important in human society (smart city, eHealth, Industry 4.0, etc.).

Actually, each class of services has its own QoS (Quality of Service) parameters that influence the QoE of end users. Researchers have been working on finding methods to quantify and map these factors into a QoE model, to be able to assess the quality of the corresponding service [4]. Most existing contributions in this area are related to LTE and wireless networks; very few articles address specifically the case of future 5G networks. In [5], the authors proposed an analytical method that reduces delay and packet loss rate (PLR) to increase the QoE of video streaming over 5G. In [6], the authors present a novel method of performance evaluation for service chains within cloud networks by task graph reduction; they evaluate this method on cloud-based mobile video streaming as an example of a service chain in a 5G network. In [7], the authors propose a novel QoE model for evaluating video streaming; they use non-linear regression and an exponential function to derive a formula for estimating the QoE of video streaming over 5G. Therefore, the objective of this paper is first to present these contributions and then to propose a new solution for estimating the QoE of 5G end users for video streaming over millimeter wave; the 5G millimeter wave network should be used to obtain higher bandwidth and bit rate gains. The remainder of this paper is structured as follows: in Section II, a definition of QoE is presented and we discuss the different state-of-the-art QoE evaluation methods; in Section III, we present the architecture of the video streaming system over a 5G network and the proposed methodology to estimate the QoE of streaming videos; finally, in Section IV, a conclusion of the work is presented as well as some future work.

2 QoE Definition and State-of-the-Art Assessment Methods

2.1 QoE Definition
As of today, it is difficult to find a unique definition of the Quality of Experience. The International Telecommunication Union (ITU-T) has defined it as the overall acceptability of an application or service, as perceived subjectively by the end user. Recently, in the European Qualinet community, QoE was defined as the degree of delight or annoyance of the user of an application or service. It is clear from these two definitions that the general understanding of QoE remains the same to a large extent, despite some specific formulations in different contexts of interaction between a user and a provided service.

2.2 State-of-the-Art Assessment Methods
State-of-the-art QoE assessment methods can be categorized into three categories: subjective assessment, objective assessment and hybrid assessment. Subjective assessment is the most direct way to evaluate users' QoE, since the results are given by humans directly during the usage of the service. The evaluation consists of constructing a panel of human observers who evaluate the video sequences according to their point of view and their perception. The output of the test is generally expressed in terms of a Mean Opinion Score (MOS). The subjective approach is not appropriate for implementation, because it is very expensive and takes a long time to carry out. This has pushed researchers to find less subjective methods, where the QoE is mathematically modeled with some objectively measurable factors as input variables and the value of QoE as the corresponding output, as proposed in [7]. Researchers have found many objective measures, such as the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Metric (SSIM) and the Video Quality Metric (VQM) [8], to optimize video streaming.

Between the two approaches described above, a hybrid evaluation called pseudo-subjective quality assessment (PSQA) was created to provide an accurate assessment of QoE as perceived by humans. The method is based on statistical learning. The classical machine learning approaches currently used in PSQA are the recurrent neural network (RNN) model, support vector machines (SVM), the decision tree model and Bayesian networks [3]. Based on experience researching QoE, it has been realized that in many situations the relation between influencing factors and QoE cannot be expressed explicitly by mathematical formulas. Furthermore, even if a mathematical model can be established, system parameters may be hidden at the beginning. In order to explore this implicit relation and the system parameters, machine learning methods are widely applied; this is why we use machine learning methods in our proposed scenario.

3 Scenario for Evaluating the QoE of Video Streaming over 5G
3.1 The architecture of the video streaming system over 5G networks
Due to the shortage of frequency spectrum below the 6 GHz bands and the demand for higher data rates, higher frequencies, e.g., the millimeter-wave (mmWave) frequency bands, have been suggested as candidates for future 5G smartphone applications, as the considerably larger bandwidth could be exploited to increase the capacity and enable users to experience several-gigabits-per-second data rates [9][10]. In our architecture, we use the millimeter wave link model to build the video streaming client and server sides, which achieves the goal of spreading video streaming [10][11]. Our architecture for the video streaming system over a 5G network is shown in Fig. 1. From the figure, we can see that it includes two parts: the first one builds the millimeter wave network using the ns-3 simulation tools, as defined and explained in [12]; the second constitutes the estimation model for video streaming over 5G.

Figure 1: Architecture of video streaming system over 5G network.

3.2 Millimeter wave network
5G millimeter-wave (mmWave) communication is widely considered to be a promising candidate technology for fifth generation (5G) cellular and next-generation wireless local area networks (WLANs). The implementation of millimeter wave devices comprises the propagation and channel model, the physical (PHY) layer, and the MAC layer. The module is completely explained in [7] and described in Fig. 2.

Figure 2: 5G millimeter wave end-to-end network with video streaming system.

As we can see in Fig. 2, the architecture is very robust. The millimeter wave MAC layer is designed to meet ultra-low latency and high data rate demands, as presented in [13]. The mmWave bands potentially enable ultra-low latency and massive bandwidths at the physical (PHY) layer, which is designed for multiple functions, such as establishing the channel model and frame structure parameters, including the millimeter wave transmission path loss model, MIMO technology to achieve beam-forming, the channel configuration parameters, etc.

3.3 Estimation of the parameters
Our scenario for estimating the QoE of video streaming over millimeter wave in 5G is shown in Fig. 3. As the figure shows, our QoE estimation is built on a Neural Network for which several parameters need to be introduced.

Figure 3: Architecture of our scenario.

First, the Peak Signal-to-Noise Ratio (PSNR) of the videos is calculated and mapped to the Mean Opinion Score (MOS) value of the user, as shown in Table 1. Second, we calculate the quality of service (QoS) parameters (jitter, packet loss, delay) in order to characterize the videos, as shown in Figure 4.


Table 1: PSNR-to-MOS mapping

PSNR (dB)   MOS   Estimated Quality
> 37         5    Excellent
31–37        4    Good
25–31        3    Reasonable
20–25        2    Poor
< 20         1    Bad
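To illustrate how Table 1 can be applied, the sketch below computes the PSNR from a mean squared error and maps it to a MOS value; the boundary handling of the PSNR intervals is our reading of the table.

import math

def psnr_db(mse, max_pixel=255.0):
    """Peak Signal-to-Noise Ratio for 8-bit video frames."""
    return float("inf") if mse == 0 else 10 * math.log10(max_pixel ** 2 / mse)

def psnr_to_mos(psnr):
    """Table 1: map PSNR (dB) to a Mean Opinion Score from 1 to 5."""
    if psnr > 37: return 5   # Excellent
    if psnr > 31: return 4   # Good
    if psnr > 25: return 3   # Reasonable
    if psnr > 20: return 2   # Poor
    return 1                 # Bad

print(psnr_to_mos(psnr_db(mse=12.0)))  # about 37.3 dB -> MOS 5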

Figure 4: Estimation of QoS parameters and MOS.

It will also be important to find a parameter that introduces the user profile, which is a very important factor for improving the QoE. The QoS parameters (jitter, delay, packet loss, human profile) and the MOS will be used for the automatic learning part. Hereinafter, we will describe this part in detail.

The third part concerns the automatic learning of the selected neural network. In our QoE assessment we adopt the fuzzy ARTMAP (FAM) neural network. Since its inception in 1992, the fuzzy ARTMAP (FAM) [14] neural network has attracted researchers' attention. FAM excels at fast incremental supervised learning in a non-stationary environment using few examples. The network enables the learning of new data without forgetting past data, addressing the so-called plasticity-stability dilemma [14]. FAM is considered one of the leading algorithms for classification [15]. The QoS parameters (jitter, packet loss, delay, human profile) will be the input of our NN and the output will be the QoE, as shown in Fig. 4. We will select a number n of videos for learning and a number m (n << m) of videos for the test. The two sets will be chosen independently of each other. From these videos, we will determine the characteristic parameters (jitter, delay, packet loss, human profile) which will constitute the signature of the videos.
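Since a fuzzy ARTMAP implementation is beyond the scope of a short sketch, the following stand-in illustrates only the input/output mapping of this learning stage: a generic regressor trained on synthetic (jitter, delay, packet loss, profile) signatures to predict the MOS. It is not the authors' FAM network, and the data generation is entirely made up.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic signatures: jitter (ms), delay (ms), packet loss (%), profile id.
X = rng.uniform([0, 20, 0, 0], [50, 300, 10, 3], size=(200, 4))
# Toy ground truth: QoE degrades with jitter, delay and packet loss.
y = np.clip(5 - 0.03 * X[:, 0] - 0.01 * X[:, 1] - 0.2 * X[:, 2], 1, 5)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(X, y)
print(model.predict([[5, 50, 0.5, 1]]))  # estimated MOS for a good link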


4 Conclusion

In future 5G networks, estimating the QoE metric is a challenging task. So, in this paper, we propose to use a fuzzy ARTMAP neural network to estimate the QoE of a video streaming system over a 5G network. The QoS parameters (jitter, packet loss, delay, human profile) will be the input of our neural network and the output will be the QoE. We will complete this work with a comparative study of different architectures using different neural networks and architectures using mathematical models.

References

[1] Cisco Visual Networking Index. Global mobile data traffic forecast update 2014–2019, white paper, Feb 2015. See: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html.
[2] Yong Niu, Yong Li, Depeng Jin, Li Su, and Athanasios V Vasilakos. A survey of millimeter wave communications (mmWave) for 5G: opportunities and challenges. Wireless Networks, 21(8):2657–2676, 2015.
[3] Ying Wang, Peilong Li, Lei Jiao, Zhou Su, Nan Cheng, Xuemin Sherman Shen, and Ping Zhang. A data-driven architecture for personalized QoE management in 5G wireless networks. IEEE Wireless Communications, 24(1):102–110, 2017.
[4] Hui Zhang, Xiuhua Jiang, and Xiaohua Lei. A method for evaluating QoE of live streaming services. International Journal of Computer and Electrical Engineering, 7(5):296, 2015.
[5] M Sajid Mushtaq, Scott Fowler, Brice Augustin, and Abdelhamid Mellouk. QoE in 5G cloud networks using multimedia services. In Wireless Communications and Networking Conference (WCNC), 2016 IEEE, pages 1–6. IEEE, 2016.
[6] Frank Loh, Valentin Burger, Florian Wamser, Phuoc Tran-Gia, Giovanni Schembra, and Corrado Rametta. Performance evaluation of video streaming service chains in softwarized 5G networks with task graph reduction. In Teletraffic Congress (ITC 29), 2017 29th International, volume 2, pages 36–41. IEEE, 2017.
[7] Yanjun Hou, Lijun Song, Mengyu Gao, et al. A QoE estimation model for video streaming over 5G millimeter wave network. In International Conference on Broadband and Wireless Computing, Communication and Applications, pages 93–104. Springer, 2016.
[8] Guan-Ming Su, Xiao Su, Yan Bai, Mea Wang, Athanasios V Vasilakos, and Haohong Wang. QoE in video streaming over wireless networks: perspectives and research challenges. Wireless Networks, 22(5):1571–1593, 2016.
[9] Jian Qiao, Xuemin Sherman Shen, Jon W Mark, Qinghua Shen, Yejun He, and Lei Lei. Enabling device-to-device communications in millimeter-wave 5G cellular networks. IEEE Communications Magazine, 53(1):209–215, 2015.
[10] Menglei Zhang, Marco Mezzavilla, Russell Ford, Sundeep Rangan, Shivendra Panwar, Evangelos Mellios, Di Kong, Andrew Nix, and Michele Zorzi. Transport layer performance in 5G mmWave cellular. In Computer Communications Workshops (INFOCOM WKSHPS), 2016 IEEE Conference on, pages 730–735. IEEE, 2016.
[11] Marco Mezzavilla, Sourjya Dutta, Menglei Zhang, Mustafa Riza Akdeniz, and Sundeep Rangan. 5G mmWave module for the ns-3 network simulator. In Proceedings of the 18th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 283–290. ACM, 2015.
[12] Sourjya Dutta, Marco Mezzavilla, Russell Ford, Menglei Zhang, Sundeep Rangan, and Michele Zorzi. Frame structure design and analysis for millimeter wave cellular systems. IEEE Transactions on Wireless Communications, 16(3):1508–1522, 2017.


[13] I Chih-Lin, Corbett Rowell, Shuangfeng Han, Zhikun Xu, Gang Li, and Zhengang Pan. Toward green and soft: a 5G perspective. IEEE Communications Magazine, 52(2):66–73, 2014.
[14] Gail A Carpenter, Stephen Grossberg, Natalya Markuzon, John H Reynolds, and David B Rosen. Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3(5):698–713, 1992.
[15] D Sabella, P Serrano, G Stea, A Virdis, I Tinnirello, F Giuliano, D Garlisi, P Vlacheas, P Demestichas, V Foteinos, et al. A flexible and reconfigurable 5G networking architecture based on context and content information. In Networks and Communications (EuCNC), 2017 European Conference on, pages 1–6. IEEE, 2017.


Service Orchestrator in Cloud RAN based Edge Cloud

Olfa Chabbouh1,2, Nazim Agoulmine2, Sonia Ben Rejeb1 and Javier Baliosian3
1 High School of Communication of Tunis (Sup'com), Carthage University, Ariana, Tunisia.
2 IBISC – IBGBI Laboratory, University of Evry-Val-d'Essonne, Evry, France.
3 University of Uruguay, Uruguay.

[email protected], [email protected], [email protected], [email protected]

Abstract
The fifth generation of mobile networks (5G) is designed to introduce a multitude of new services which require higher processing resources and lower latency. The Cloud Radio Access Network (Cloud RAN) is one of the most promising solutions for next generation networks. The basic idea of C-RAN is to migrate Baseband Units (BBUs) to the cloud for centralized processing and management. In our work, we propose to extend the C-RAN architecture with an edge cloud, the Cloud RRH, in order to enhance the quality of service (QoS). In this paper, we propose a service orchestrator in the Cloud RAN based edge cloud in order to enhance resource utilization while keeping a good QoS.

1 Introduction
In recent years, and thanks to the development of mobile devices, the services offered to users have multiplied in number and diversity. These services are increasingly greedy in terms of resources. This evolution in the usage model has led to an exponential increase in traffic, estimated to grow 13-fold from 2012 until 2017 according to Cisco [1]. Therefore, to cope with this increase, mobile network operators have to increase the capacity of their access networks. Several solutions were proposed, such as access network densification; however, this solution is very expensive because it requires the deployment of new access points. Moreover, reducing the cell size was proposed as a solution, since limited spectral resources can be reused among the small cells more frequently, thus enhancing the total system capacity. However, it increases the problem of inter-cell interference.

A major orientation adopted by 5G is based on the use of the cloud. Among the proposed approaches, there is the virtualization of the access network, which gave birth to the Cloud RAN. The concept of C-RAN was first introduced in [3] and described in detail in [4]. In the Cloud RAN architecture, baseband units (BBUs) are centralized and virtualized in BBU pools. Therefore, traditionally complicated base stations can be simplified to cost-effective and power-efficient radio units, the RRHs (Remote Radio Heads). This helps to reduce the network costs in terms of CAPEX and OPEX.


This new access network architecture brings several advantages. Indeed, C-RAN makes it possible to benefit from all the advantages of the cloud computing paradigm. It also benefits from centralized management and processing, thanks to the provision of BBUs as pools in the cloud. Moreover, Cloud RAN permits better spectral and energy efficiency.

On the other hand, running applications specifically developed for mobile devices, for entertainment, health care, business, social networking, traveling, news and so on, has become an everyday habit. However, there is a gap between mobile handsets' capabilities and those required to run such sophisticated applications. Mobile Cloud Computing (MCC) and the cloud edge are becoming key flexible and cost-effective tools to allow mobile terminals to have access to much larger computational and storage resources than those available in typical user equipment.

In our work, we propose a novel heterogeneous Cloud RAN architecture where we introduce the Cloud RRH as an edge cloud. The basic idea consists in providing additional computational and storage resources to High RRHs to bring resources closer to the end user. Containers are the technology used to support this offloading [2], because they provide a higher level of abstraction in terms of virtualization and isolation compared to other virtualization techniques. Using this infrastructure, mobile users will be able to execute their applications locally on the mobile handset or to offload them to the Cloud-RRH or to the BBU pools. Therefore, in order to fully profit from this architecture, we need to efficiently orchestrate service execution among the available resources. That is why we propose a service orchestrator to efficiently decide where to execute applications.

The remainder of this paper is organized as follows. In the next section, we present our system model and the basic idea of the proposed solution. Section 3 provides simulation results, and Section 4 concludes this paper.

2 Proposed service orchestrator
2.1 System model
The scenario is represented in Figure 1. We consider a HetNet Cloud RAN infrastructure composed of H-RRHs (High RRHs), which act as macro cells, and L-RRHs (Low RRHs), which act as small cells. In our system, we propose to introduce additional cloud capacities in the edge network, named the Cloud RRH. Unlike a traditional C-RAN architecture where all RAN functionalities are centralized in the cloud, we propose to flexibly split these functionalities between the Cloud RRH and the central cloud. We suppose also that additional computation and storage resources are available in the edge for computation offloading. These additional resources are represented by cloud containers.

Moreover, as part of the cloud management, we propose to add a new functional entity that we call the Cloudlet Manager (CM). The main functionalities of the CM are container placement, deployment, monitoring, and resource and application scheduling.

Mobile users can access their services directly in the Cloud RRH. In this case, the CM can instantiate containers in the edge and offload the service logic computation into these containers. Containers are not always active; rather, they are activated or deactivated as needed. The different interaction schemas are represented in Figure 2.


Figure 1: C-RAN Architecture proposed model

Mobile services are increasingly sought after by users in all areas: sports, health, leisure, transportation, studies, etc. In our architecture, these services can be executed on the mobile terminal, in the Cloud-RRH or in the Central Cloud (BBU pools). However, the main problem is how to make the decision about the place of execution of a given service while ensuring a number of conditions, such as QoS improvement, cost reduction and optimized resource utilization. Therefore, we developed a service orchestrator for a 5G Cloud RAN environment.

Figure 2: Cloudlet Manager interactions


2.2 Service orchestrator
As we have explained, a given service can be executed in three parts of our Cloud RAN architecture: the mobile, the cloud edge and the central cloud. Actually, the assignment of a service to a processing unit depends both on the resources required by the service to launch and on the free capacities of the processing units. Here comes the role of the decision algorithm in our orchestrator.

Different decision algorithms exist in the literature [5][6][7]. In our work, we propose to use fuzzy logic as the decision algorithm. As represented in Figure 3, a fuzzy logic controller (FLC) is essentially composed of three steps. Fuzzification is the first step of the fuzzy logic process. This step takes as input the numerical values of the variables on which the fuzzy controller will base its decision, $x_i$ ($i = 1, 2, \ldots, n$), and gives as output $m$ fuzzy variables, denoted by $F_{ij}$ ($j = 1, 2, \ldots, m$). Therefore, a continuous space is mapped into a discrete space. The matching between the numerical and fuzzy variables is made using a membership function $\delta_{ij}(x_i)$, which defines the membership degree of the numerical input $x_i$ with the fuzzy output $F_{ij}$. In our work, we use the Gaussian membership function as follows:

$$\delta_{ij}(x_i) = \exp\left(\frac{-(x_i - F_{ij})^2}{2\sigma^2}\right) \qquad (1)$$

where $\sigma$ is the width of the membership functions, $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$.

The second step is inference. This step takes as input the fuzzy variables and the rules $s \in S$ defined for the fuzzy controller, and calculates the validation degree of each of these rules. All rules have the following form:

$$\text{if } x \text{ is } r_s \text{ then } a = c_s \qquad (2)$$

Figure 3: Fuzzy Logic Control Process

31 Proceedings ADVANCE 2018 ISBN 978-2-9561129

In the defuzzification step, the output action $a$ is given by the center of gravity of the conclusions $c_s$ of each rule, weighted by the membership functions, using the following expression:

$$a = \sum_{s \in S} \left( \prod_{i=1}^{n} \delta_{is}(x_i) \right) c_s \qquad (3)$$

To use fuzzy logic, we need to define the inputs of the decision algorithm, the sets of values, the membership functions mapping the inputs to the sets of values, and finally the rules on which the algorithm will base its decision about where the service should optimally be executed.

a) Input specification
First of all, we should define our inputs, which are the service's resource requirements. We specified the resources required by the service to be:
• CPU
• RAM
• Bandwidth
• Disk space

b) Set of values specification
Now that we have specified the inputs of our algorithm, we should define the sets of values that we will use. Since the service can be executed either on the mobile, the cloud edge or the central cloud, the sets of values will be:
• Mobile region
• Cloud edge or Cloud RRH region
• Central cloud region

c) Membership functions
Figure 4 represents an example of the membership functions of the resources (CPU, RAM, BW, DS) for the sets of values (mobile, cloud edge, central cloud), knowing that:
• the Mobile limit is taken as 5 units
• the Cloud RRH limit is 10 units
• the Central cloud limit is 15 units

d) Rules specification
Now that we have defined the membership functions for our inputs, we need to define the rules on which the fuzzy controller will base its decision. The rules that we defined are (see the sketch after this list):
• If ((CPU is Mobile) and (RAM is Mobile) and (BW is Mobile) and (DS is Mobile)) then (Output is Mobile)
• If ((CPU is Cloud_RRH) and (RAM is Cloud_RRH) and (BW is Cloud_RRH) and (DS is Cloud_RRH)) then (Output is Cloud_RRH)
• If ((CPU is Central_Cloud) and (RAM is Central_Cloud) and (BW is Central_Cloud) and (DS is Central_Cloud)) then (Output is Central_Cloud)
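A minimal sketch of such a controller, implementing equations (1)–(3) with the three rules and the unit limits above, follows; the membership width σ and the mapping of the crisp output back to an execution place are our assumptions.

import math

# Fuzzy set centers taken from the membership limits above (in resource units):
# mobile = 5, Cloud RRH = 10, central cloud = 15. Sigma is an illustrative
# assumption; the paper does not state its value.
CENTERS = {"mobile": 5.0, "cloud_rrh": 10.0, "central_cloud": 15.0}
SIGMA = 2.5

def membership(x, center, sigma=SIGMA):
    """Gaussian membership degree, equation (1)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def decide(cpu, ram, bw, ds):
    """Evaluate the three rules and defuzzify by center of gravity, eq. (3).

    Each rule ANDs the four inputs; the product of the memberships is used
    as the rule's validation degree, and the rule conclusion c_s is the
    center of the corresponding region.
    """
    inputs = (cpu, ram, bw, ds)
    weights = {}
    for region, center in CENTERS.items():
        w = 1.0
        for x in inputs:
            w *= membership(x, center)
        weights[region] = w
    # Center of gravity of the conclusions, weighted by the rule degrees
    # (normalized, as is usual for a center of gravity).
    num = sum(w * CENTERS[r] for r, w in weights.items())
    den = sum(weights.values()) or 1.0
    a = num / den
    # Map the crisp output back to the nearest execution place.
    return min(CENTERS, key=lambda r: abs(CENTERS[r] - a)), a

# Example: a service needing mid-range resources lands on the Cloud RRH.
print(decide(cpu=9, ram=11, bw=10, ds=8))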


Figure 4: Membership functions

3 Simulation and results
This section provides simulation results for the proposed fuzzy-logic-based service orchestrator; we compared its results with a random association of services to the MT, the Cloud RRH or the central cloud. We considered a Cloud-RRH with N = 20 containers having heterogeneous resources. The computing capacity of the containers varies from 1 to 10 CPUs. The memory is set from 128 MBytes to 512 MBytes and the network bandwidth is set from 100 Kbps to 200 Kbps. Services have heterogeneous requirements and the requests to execute a service arrive randomly.

Figure 5 represents the variation of the response time over the service's data size (ranging from 1 to 100 Kbits). The results show that the proposed service orchestrator ensures a lower response time compared to random association. Therefore, it will enhance the user's quality of experience.

[Figure: response time (ms), from 0 to 5000, plotted against service data size (Kbits), from 0 to 120, comparing the proposed orchestrator and random association.]

Figure 5: Services response time evaluation


4 Conclusion
This paper addresses a topical issue in the field of telecommunications and mobile networks: the optimization of resource usage in a Cloud RAN network, which depends on the orchestration of service execution in this network. To develop our service orchestrator, we used a fuzzy logic decision algorithm. The proposed system model is expected to optimize resource utilization while keeping a good level of QoS.

As future work, we will extend the simulation of the proposed solution in order to further test its performance. Then, we will integrate the execution cost into the optimization process.

5 Acknowledgement
We would like to thank the French Ministry of Foreign Affairs and International Development for its support of this research work in the frame of the VNET 16STIC11-VNET AmSud project.

References

[1] "Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012–2017," Cisco, Tech. Rep., February 2013.
[2] R. Dua, A. R. Raja, and D. Kakadia, "Virtualization vs Containerization to Support PaaS," in 2014 IEEE International Conference on Cloud Engineering, 2014, pp. 610–614.
[3] Y. Lin, L. Shao, Z. Zhu, Q. Wang, and R. K. Sabhikhi, "Wireless network cloud: Architecture and system requirements," IBM Journal of Research and Development, January–February 2010.
[4] "C-RAN: The Road Towards Green RAN," China Mobile Research Institute, Tech. Rep., October 2011.
[5] Chen, Z. Han, H. Zhang, G. Xue, Y. Xiao and M. Bennis, "Wireless Resource Scheduling in Virtualized Radio Access Networks Using Stochastic Learning," IEEE Transactions on Mobile Computing, Volume PP, Issue 99, August 2017.
[6] P. Tang, F. Li, W. Zhou, W. Hu and L. Yang, "Efficient Auto-Scaling Approach in the Telco Cloud Using Self-Learning Algorithm," Global Communications Conference (GLOBECOM), December 2015.
[7] B. Németh, J. Czentye, G. Vaszkun, L. Csikor and B. Sonkoly, "Customizable real-time service graph mapping algorithm in carrier grade networks," IEEE Conference on Network Function Virtualization and Software Defined Network (NFV-SDN), January 2015.


Intelligent Environments Applied to Precision Agriculture

Clemilson C. Santos1, Gabriel A. L. Paillard1, Emanuel F. Coutinho1, Maurício Moreira Neto2, Leonardo O. Moreira1, and Ernesto T. Lima Neto1

1 Virtual University Institute, Federal University of Ceará (UFC), Fortaleza, Ceará, Brazil
2 Federal University of Ceará (UFC), Fortaleza, Ceará, Brazil
{clemilson.santos,gabriel,emanuel}@virtual.ufc.br, [email protected], {leoomoreira,ernesto}@virtual.ufc.br

Abstract
The need to improve the efficiency of irrigation in arid and semi-arid regions has made its rational planning essential, and therefore a great deal of effort has been invested in the development of new technologies that allow a better knowledge of the phenomena of nature involving water, soil, and plant. The technologies applied to precision irrigation use resources capable of identifying the correct location, time and quantity of water sufficient to meet the water needs of a given plant cultivated in the most varied types of soil and climate. Precision agriculture uses different technologies for commercial agriculture management on large and medium scales. The Internet of Things (IoT) integrates different technologies from diverse areas, allowing the application of sensors and actuators that can serve the most varied types of agricultural production. Considering these factors, we propose an autonomous and adaptive control system using a network of wireless sensors and actuators capable of efficiently and accurately automating a conventional high-efficiency pressurized irrigation system of the drip type.

1 Introduction

The expansion of irrigated agriculture has made the agricultural sector the largest consumer of water. Globally, agriculture consumes about 69% of all water derived from sources (rivers, lakes and underground aquifers), while industries and home use consume the other 31%. Therefore, proper control and management are essential elements in maintaining sustainable agriculture. Irrigation involves the set of artificial techniques to meet the water needs of the exploited crop. The effort seeks to enable agricultural productivity in areas without the natural structures necessary for its practice. The rational management of irrigation means the adequate application of water to the plants, always considering the quantity and the right moment. Precision irrigation uses technologies that allow uniform amounts of water to be applied at precise locations within the soil profile, providing differential irrigation treatments by treating the plants or small areas within the field individually. The search for improvement in precision agriculture has involved the use of many digital devices, geographic information systems, remote sensing devices, automated machine control and Internet communication. Among the techniques of precision agriculture, we highlight the control of the amount of water applied to the crop (precision irrigation). Precision irrigation uses technologies that allow the control of water application within well-defined limits between deficit points and water excess, using differential irrigation treatments, a process that can be automated and managed through software applications. The Internet of Things (IoT) emerges in this context as a concept capable of integrating the capabilities of Wireless Sensor Networks and automation into several areas of research and production. Mainly due to their great deployment flexibility, IoT devices present characteristics that make them a tool compatible with agricultural production environments [6].


IoT's advancement in agriculture will contribute to the analysis of data, making the knowledge obtained useful not only for the irrigation process but also for making decisions related to investments, transportation, the environment and other areas vital to the sustainability of agribusiness. The possible applications allow a quicker integration of all the actors of the production chain, clearly presenting the performance of the production in the field. In this work, an IoT control platform was developed, capable of efficiently and accurately automating a conventional high-efficiency pressurized irrigation system of the drip type. The work proposes the development of an integrated irrigation management environment, managed by a distributed system and with availability of the data in the cloud. This article is organized into six parts. The second makes some observations on the concept of precision irrigation and its characteristics and components. The third section gives a brief exposition of the system architecture and a basic model of the control algorithms employed. The fourth section deals with the operation and management interfaces of the monitoring and control system. The fifth section outlines some results achieved during the field testing period. The last one concludes the paper, presenting the obstacles encountered and the challenges listed for future work.

2 Precision Irrigation

Technological innovations are being tested and gradually deployed in agricultural practice. The monitoring of the actual physical and climatic conditions in the field can currently draw on a wide range of sensors and actuators, which provides viable conditions for the real-time maintenance of ideal productivity conditions [4]. Among these technologies, we can highlight the use of Wireless Sensor Networks (WSNs) in the field. WSNs, along with other technologies, can be autonomous modules with embedded processing and communication, offering monitoring and control features with great practicality and economy, allowing them to be rapidly deployed in various environmental contexts.

2.1 High Efficiency Pressurized Irrigation System
The irrigation system used in this work is the drip system, which is among the pressurized methods that depend on pumping. In this arrangement, the water is brought under pressure through tubes and applied to the soil through emitters directly on the root zone of the plant, with high frequency and low intensity. The system can reach 90% efficiency and has been widely used in perennial crops, fruit crops, and the production of vegetables and flowers. The higher demand for this type of irrigation is primarily related to its low water requirement when compared to other systems. The installation may be on the surface or buried, chosen after careful analysis of the crop to be irrigated [3].

2.2 Sensors in the Field
In this work, capacitive sensors were used to measure soil moisture. These capacitive sensors are of the FDR (Frequency Domain Reflectometry) type and were developed in the Department of Agricultural Engineering (DENA) of UFC [8]. This monitoring technique has been used in the field and proved to be an efficient tool for irrigation control, monitoring soil moisture in real time, with precision and accuracy, and allowing a higher fractionation of the water distribution. In [5], this type of sensor was employed to monitor and control irrigation in a watermelon crop, which allowed the author to observe the soil water content in real time, obtaining greater irrigation efficiency.

2.3 Wireless Sensor Networks
The capacitive sensor was integrated into the wireless communication modules [7], forming a distributed network of sensors, and its use was tested by [2] in the management of a system with localized irrigation. In those evaluations, groundwater movement data were obtained, acquired by remote modules equipped with two FDR-type capacitive sensors installed at different depths. The measurements of these sensors were taken at predetermined time intervals and sent to a master module connected to a computer. The stored data were treated in spreadsheets with the calibration and correction equations, generating as output the soil moisture in quantity of water per amount of soil, and the definition of the irrigation time required to maintain the ideal moisture for the crop.

2.4 System Actuators
In order for the information obtained in the monitoring process to be automatically used for irrigation control, it is necessary to create remote modules capable of being integrated into the control panels of the irrigation pumps and the various bypass valves installed in the hydraulic mesh of the crop lot. The coupling of these devices presents challenging requirements to be solved. This integration should be easy, fast and without the need to reconfigure the existing hydraulic or electrical infrastructure. The requirement for such flexibility is mainly related to the types of crops harvested, especially those with short growing periods and high turnover in land use. These characteristics make it unfeasible to redistribute wire-connected sensors between each harvest and replanting. Another very convenient feature of Wireless Sensor Networks (WSNs) is the practicality of geographically distributing control and monitoring activities over areas of interest. However, in order to exploit this characteristic of WSNs, the remote units must present hardware configurations with the ability to integrate multiple peripherals, sensors or actuators, making these nodes capable of serving the most varied functions. In this work, the hardware used is modularized and can be quickly adapted to the activities of interest by replacing its interface cards with the peripheral control and monitoring devices. As examples, we can mention the sensor coupling layers, valve actuation and pressurizing pumps. These layers were dimensioned according to the electrical characteristics of consumption and operation relative to the sources of energy available at the intervention sites. In some locations it is necessary to use alternative energy sources, such as solar panels, to ensure the continued operation of some units. The module developed for the control of the pumps has a connection layer capable of offering several levels of control for the pumping. These levels vary according to the resources found in the infrastructure of the irrigated area. The module, after being installed next to the circuit breaker, starts to control the pump cycles through the contactor switches. A more efficient level of control can be achieved by the module when there are frequency inverters for controlling the rotation of the pumps; in this way the system starts to act by controlling the flow, further fractionating the irrigation cycles and attending, in a more localized way, only the areas with a water deficit. The operating interface is critical, as it deals with the high power of the pump drive and the control of the pressure in the piping. At this point in the project, the control algorithm must use safety devices that guarantee adequate synchronism in the operation of the hydraulic mesh. Safety devices are implemented in both hardware and software and should avoid errors such as: pump operation with all sector valves closed, pump shutdown during the irrigation interval of any of the areas under control, pump operation without water or with low voltage in the network, and others. The control of the valves is carried out through a module endowed with higher power, with its charge maintained through photovoltaic cells. This unit, in addition to the control, monitors the service pressure at the bypass of the hydraulic network, returning such data for compensating controls, load loss detection, or as a safety against overflow of the ducts. The most efficient flow control was achieved through the use of frequency inverters, whose control is also integrated into the network through a wireless interface with the central system supervision unit.

2.5 Distributed System
A single piece of equipment cannot control a heterogeneous system such as an irrigation system. Soil moisture sensors, wireless communication modules, control modules, hydraulic pumps, frequency inverters, etc. are required for automatic irrigation control. In the proposed system, control algorithms and interfaces are allocated to a personal computer (PC), which provides excellent processing power, proper storage, and other advantages such as Internet communication. Figure 1 presents an overview of the proposed automation system. The calculation of irrigation time and system control is centralized, with the computer as the system's brain. It is where the control software runs, collecting data from sensors and processing it to make decisions about when and how much to irrigate. The use of the computer makes it possible to set up a database and system monitoring, alerting the owner or administrator about unusual events in the hydraulic grid, or simply storing data about pump activation times, the irrigation time of each area, soil moisture throughout the day and others. The Master Module is coupled to the PC via a serial or Universal Serial Bus (USB) connection; it is the root node, or sink, of the WSN, serving as the communication interface between the PC and the wireless modules present in the field. In the master module, the control logic for valve control, pump actuation, and sensor reading is defined. The collected information is packaged in the communication protocol and disseminated through multihop among all the remote modules of the network. The Control Module is in the pump house and is responsible for triggering a frequency inverter that controls the speed of the motor rotation. Another responsibility of this module is the monitoring of the pressure in the hydraulic system, guaranteeing the pressure necessary for the irrigation. The Valve Module is in the field next to the bypass stand and triggers the four valves that divide the irrigation areas. This module also has pressure sensors that are used to monitor the service pressure after each of the valves, ensuring the flow of the emitters. The Sensor Modules are installed in the field, where they measure the soil moisture through the FDR capacitive sensors. Each module is capable of reading up to three sensors, which can be installed at different depths, depending on the culture being monitored. Sensor information is sent to the master module periodically, closing the control system loop.
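As an illustration of the PC side of this link, the sketch below assumes a pyserial connection and a hypothetical line-based request/reply protocol; the actual packet format of the WSN communication protocol is not specified here.

import serial  # pip install pyserial

# Section 5 notes that keeping one permanent connection, instead of
# reopening the port per message, greatly reduced the USB failures.
link = serial.Serial("/dev/ttyUSB0", 9600, timeout=2)

def read_soil_moisture(module_id: int) -> float:
    """Ask one sensor module for its FDR frequency reading (kHz),
    using a made-up "READ <id>" request format."""
    link.write(f"READ {module_id}\n".encode())
    reply = link.readline().decode().strip()   # e.g. "1423.5"
    return float(reply)

print(read_soil_moisture(1))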

3 Control Algorithm

The control algorithm is a fundamental part of an automation system because it governs the other components. Casadesus et al. [1] proposed an interesting and general algorithm for the closed-loop control of a drip irrigation system in tree plantations. With some adaptations applied to the algorithm, it can be adapted to any crop, irrigation system and/or management strategy, as we can see in Figure 2.


Figure 1: Architecture of a system for irrigation automation

Figure 2: Control Algorithm

Step 1 (Monitoring of Soil Sensors) should be done at small intervals (5 or 10 minutes) for sandy soils. The sensors in this work are of the FDR type, and their response is given by the frequency of a digital signal. This signal is read by the sensor module, which measures it and sends it to the central computer. The soil sensor signal ranges from 1700 kHz (dry soil) to 800 kHz (wet soil).

Step 2 (Interpretation of Sensor Data) translates the sensor reading (kHz) into a volumetric soil moisture value (cm³/cm³) [2]. A potential equation relates these factors; the calibration of its parameters depends on the physical and chemical characteristics of the soil, so the sensors must be calibrated for each type of soil on which they will be installed.

Step 3 (React to Events) is performed by the system's main computer. At this moment, it is verified whether adverse events occurred, such as the malfunction of some equipment (valve or pump) or rainfall. Events are detected by variations in soil moisture and system pressure. When they occur, the control system must react adequately by skipping the water replenishment calculation step.

Step 4 (Calculation of the Water Requirement) is very simple when using the soil moisture sensor data and knowing the field capacity: $\Delta\Theta = \Theta_{CC} - \Theta_{current}$. The irrigation need, in this case, is simply the amount of water required to move from the current moisture to the field capacity.

Step 5 (Adaptation to the Irrigation System) defines the irrigation time necessary for the soil to reach the field capacity. The water requirement, together with data on the hydraulic system and the crop management strategy, forms the set of parameters necessary for this decision. In drip systems, the irrigation time is defined by the equation below:

$$T = \frac{\Delta\Theta \, Z \, E_L \, E_E}{Q \, E_a} \qquad (1)$$

where $T$ is the irrigation time in hours; $\Delta\Theta$ is the water requirement in cm³ cm⁻³; $\Theta_{CC}$ is the moisture at field capacity (cm³ cm⁻³); $\Theta_{current}$ is the current moisture (cm³ cm⁻³); $E_L$ is the line spacing in m; $E_E$ is the emitter spacing in m; $Q$ is the flow per emitter in L h⁻¹; $E_a$ is the application efficiency; and $Z$ is the depth of application of the water blade in mm.
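A minimal sketch of steps 2, 4 and 5 follows, assuming a placeholder sensor calibration; the real calibration must be fitted per soil type as described above, and the hydraulic parameters in the example are made up.

def sensor_to_moisture(freq_khz, f_dry=1700.0, f_wet=800.0,
                       theta_dry=0.05, theta_wet=0.30):
    """Step 2: translate the FDR sensor frequency (kHz) into volumetric
    soil moisture (cm^3/cm^3). The paper fits a potential equation per
    soil type; this linear interpolation between the dry (1700 kHz) and
    wet (800 kHz) endpoints is only an illustrative placeholder."""
    frac = (f_dry - freq_khz) / (f_dry - f_wet)
    return theta_dry + frac * (theta_wet - theta_dry)

def irrigation_time_hours(theta_cc, theta_current, z_mm, el_m, ee_m,
                          q_lph, ea):
    """Steps 4 and 5: water requirement and irrigation time, equation (1)."""
    delta_theta = max(theta_cc - theta_current, 0.0)   # cm^3/cm^3
    return delta_theta * z_mm * el_m * ee_m / (q_lph * ea)

# Example with made-up calibration and hydraulic parameters.
theta_now = sensor_to_moisture(1500)          # a fairly dry reading
t = irrigation_time_hours(theta_cc=0.20, theta_current=theta_now,
                          z_mm=200, el_m=1.0, ee_m=0.5, q_lph=4.0, ea=0.9)
print(f"irrigate for {t:.2f} h")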

4 System Management Interfaces

A tab-based approach was chosen, as it helps to modularize the system and facilitates development and subsequent maintenance. The home screen, linked to the "System Status" tab, displays some critical information about the software's current operation and status. It allows the automatic control to be switched to manual and provides buttons for activating the different steps of the automatic control. The other tabs contain the different functionalities of the system. The "Sensor Modules" tab deals with the registration of the remote modules and the sending of commands for reading the soil moisture sensors. The "Pumps" tab deals with the control of the system's pumping stations and pressure readings. The "Valve Racks" tab deals with the bypass rails of the hydraulic mesh, allowing their remote activation and also the pressure reading. Finally, the "Manual Control" tab contains tools for quick manual control, designed to be easy to use in system maintenance or pest control situations. The Auto Control frame provides a simple and quick way to trigger the automatic system operation and is designed to give greater flexibility to the operator. There are three functions that can be actuated using the buttons included therein: automatic reading of the soil moisture sensors, automatic reading of the pressure sensors, and the irrigation drive. The first two can be activated independently, so the operator can decide to only monitor the system without activating it automatically, as in the case of meteorological monitoring.


5 Results Achieved

The experimental area used in this project is located in the irrigated perimeter of Baixo Acaraú, in the state of Ceará, 210 km from Fortaleza (3°07'13" South latitude and 40°05'13" West longitude). It has a support house, a pump house and a cultivated field of 1 hectare. The system was installed in the field at two different moments: one in the rainy season, when it was used to monitor soil moisture, and another in the dry period, to control irrigation. In this way, the efficiency of the system could be verified while performing both functions. In the monitoring carried out in the cultivated field, it was possible to observe the water dynamics in the soil. In the first phase, the automatic control was not activated, with the application of water to the soil made by seasonal rains or sporadic irrigations. The first layer (5 to 15 cm), monitored by sensor 1, was the most affected by the rain/irrigation cycles, with humidity varying considerably over the days. The second layer (15 to 25 cm), monitored by sensor 2, was little affected by water applications, showing that all the applied water was being absorbed by the plant in the first layer. The second layer was modified only after heavy rain in April, shown by the abrupt increase in soil moisture in the prominent area. The software responded well to continuous monitoring of soil moisture, having operated for three days without a restart. A problem of loss of USB communication arose when using other PC programs or other functions of the software itself. Modifications were made to the code to reduce the number of accesses to the USB port, but the problem persisted. The solution was to maintain a permanent communication with the USB device, instead of starting a new communication whenever it was necessary to exchange messages. This change brought a significant improvement in both the number of failures and the communication time between the PC and the master module. The control stage was tested for three days in a production field where, besides monitoring the soil moisture, the software controlled the irrigations whenever the control parameters were reached. A first irrigation was performed at the beginning of the experiment, in order to verify the time between the opening of the valve and the detection of the moisture front by sensor 1, which was 12 minutes. Successive observations determined a time between 10 and 20 minutes for this irrigation system. Irrigations occurred throughout the days, whenever the humidity reached the critical point of 13 cm³/cm³. Sometimes irrigation was purposely interrupted to observe the effect on soil moisture. It was noticed that the soil moisture growth was proportional to the irrigation time and that it began to decay shortly after the valves closed. The rate of increase of humidity was high with dry soil, decreasing as it approached the field capacity. At the same time that sensor 1 showed significant variation in its response throughout the days, it can be observed that sensor 2 had only a slight increase in humidity. It can be seen that the second layer of soil was gradually retaining water until reaching the field capacity, similar to the result obtained by [2]. With the same data, it is possible to notice that the water is not lost by deep percolation, since it accumulates in the layer of 15 to 25 cm, where there are still corn roots. During the tests, some problems tested the robustness of the system, in both software and hardware devices.
These issues revealed points susceptible to external faults in the sensor modules, batteries, USB library, database, hydraulic valves, piping, etc. Many of these problems are not directly identifiable via software, such as a punctured pipe causing an unexpected pressure loss, which can lead a sensor in the vicinity of the leak to detect high humidity. The need for frequent maintenance to keep the system functional is therefore evident. The software may indirectly indicate possible causes of errors, but a human operator is still needed to notice that a problem is occurring and to intervene in the system when necessary: repairing pipes, replacing hardware components, or performing any other appropriate action.
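
The resulting control logic is simple to state: keep one serial connection open for the whole session and open the valve whenever the first-layer reading drops below the critical point. The following is a minimal sketch of that loop, assuming a pyserial-style interface; the port name, baud rate, message format, and sampling period are hypothetical, as the paper does not specify them.

    # Minimal sketch of the control loop described above, assuming a
    # pyserial-style interface. The port name, 9600 baud rate, and the
    # ASCII message format are hypothetical; the paper does not give them.
    import time
    import serial

    CRITICAL_MOISTURE = 0.13  # cm^3/cm^3, critical point that triggers irrigation

    # Keep ONE connection open for the whole session: reopening the port for
    # every message was the cause of the USB communication failures reported.
    link = serial.Serial("/dev/ttyUSB0", 9600, timeout=2)

    def read_moisture(sensor_id: int) -> float:
        """Poll the master module for one sensor reading over the open link."""
        link.write(f"READ {sensor_id}\n".encode())
        return float(link.readline().decode().strip())

    def set_valve(open_valve: bool) -> None:
        """Ask the master module to open or close the irrigation valve."""
        link.write(b"VALVE ON\n" if open_valve else b"VALVE OFF\n")

    while True:
        moisture = read_moisture(sensor_id=1)    # first layer (5-15 cm)
        set_valve(moisture < CRITICAL_MOISTURE)  # irrigate while below the critical point
        time.sleep(60)                           # sampling period (arbitrary)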

6 Conclusions and Future Work

The developed system allowed correct control of the soil moisture, always keeping it within acceptable limits for the good development of the corn crop. With parameter changes, it should be able to correctly control irrigation for different soil types and crops. The software is flexible enough to test new control algorithms, requiring only new functions in the valve-drive configuration tab. Also, the soil moisture database can be used for manual control or even for evaluating the water consumption of a crop. However, the wireless communication protocol still needs to be improved, allowing lower power consumption and better communication availability. It was not possible to test the program during the complete cycle of a maize crop, due to the project execution time. However, the system correctly controlled irrigation over a period of one week and can be used for a longer period, ensuring the correct amount of water for the crop throughout the cycle. The integration of environmental monitoring presents itself as a possible expansion of the system, since it behaved suitably when monitoring soil moisture. Other sensors can be coupled to the modules to measure temperature, atmospheric pressure, relative humidity, etc. Thus, it is possible to set up a database to evaluate the water consumption of a crop, the temperature in a greenhouse or aviary, the distribution of rain along a center pivot, etc.

References

[1] A general algorithm for automated scheduling of drip irrigation in tree crops. Computers and Electronics in Agriculture, 83:11–20, 2012. ISSN 0168-1699.
[2] T. M. L. Cruz. Estratégia de monitoramento e automação em sistemas de irrigação utilizando dispositivos de comunicação em redes de sensores sem fio. Dissertação (Mestrado), Departamento de Engenharia Agrícola, UFC, Fortaleza, 2009.
[3] M. M. Ramos and E. C. Mantovani. Eficiência na aplicação da água. In Quimigação: aplicação de produtos químicos e biológicos via irrigação, pages 135–152. Embrapa-SPI, 1994.
[4] Jean Marie Farines, Joni da S. Fraga, and Rômulo S. de Oliveira. Sistemas de Tempo Real. Escola de Computação, IME-USP, 2000.
[5] A. D. S. D. Oliveira. Avaliação do sensor de umidade topdea no manejo da irrigação. Dissertação (Mestrado), Departamento de Engenharia Agrícola, UFC, Fortaleza, 2008. 71p.
[6] Davi Oliveira, Maurício M. Neto, Emanuel Ferreira Coutinho, Gabriel Paillard, Ernesto Trajano, and Leonardo Moreira. An autonomic system for energy cost reduction in computer labs. In 5th International Workshop on ADVANCEs in ICT Infrastructures and Services (ADVANCE 2017), Evry, Jan 2017.
[7] C. C. Santos. Sistema de sensoriamento remoto de umidade e temperatura do solo para irrigação de precisão. Dissertação (Mestrado), Departamento de Engenharia Elétrica, UFC, Fortaleza, 2008. 129p.
[8] I. O. Silva. Desenvolvimento de um sensor capacitivo para o monitoramento de umidade do solo. Dissertação (Mestrado), Departamento de Engenharia Agrícola, UFC, Fortaleza, 2005. 86p.


Research Opportunities in Quality Assessment of Internet of Things, Software Defined Networks and Network Function Virtualization Environments

Emanuel Coutinho1,2,5, Maurício M. Neto1,4,5, William Sales4,5, Carla Ilane Bezerra3,5, and José Neuman de Souza4,5

1 IBITURUNA – Research Group of Cloud Computing and Systems
2 Virtual University Institute (UFC-VIRTUAL), Federal University of Ceará, Fortaleza, Ceará, Brazil
3 Campus Quixadá – Federal University of Ceará (UFC) – Quixadá, Brazil
4 Master and Doctorate in Computer Science (MDCC)
5 Federal University of Ceará (UFC) – Fortaleza – Ceará – Brazil
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract
Currently, IoT has been extensively researched, and emerging technologies such as SDN and NFV are also being targeted because of the possibility of using software to control a computer network and to build new network functions. Many of these technologies rely directly on software at various levels, for example graphical user interfaces, data access, communication, and integration with several devices. To evaluate the quality of a software product, a quality model is required; it defines quality goals for intermediate and final software products. An example of a product quality model is described in ISO/IEC 25010. This work aims to highlight research opportunities among these technologies and software quality, and to point out some research aspects in the software quality of the generated products, both during the development process and during full usage.

1 Introduction and Concepts

Internet has led to the creation of a digital society in which (almost) everything is connected and accessible from anywhere [10]. However, despite their widespread adoption, traditional IP networks are complex and difficult to manage. It is difficult to configure the network according to pre-defined policies, and just as difficult to reconfigure it to meet different workloads and changes and to respond to failures. To make such situations even more difficult, today's networks are also vertically integrated: control and data planes are grouped together.

Software-Defined Networking (SDN) is an emerging paradigm that promises to change this state of affairs by breaking vertical integration, separating the network control logic from the underlying routers and switches, promoting centralization of network control, and introducing network programming capability [10]. The separation of concerns introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic is the key to the desired flexibility: by breaking the network control problem into manageable parts, SDN makes it easier to create and introduce new network abstractions, simplifying network management and making the network easier to evolve.

SDN refers to a network architecture where the data plane is managed by a remote control plane decoupled from it. Its architecture can be characterized by [10]: (i) decoupled data and control planes; (ii) routing decisions based on flows, rather than on destinations; (iii) control logic moved to an external entity (SDN controller or network operating system); and (iv) a network programmable by software applications that interact with data plane devices.
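
To make characteristics (i)–(iii) concrete, the toy sketch below separates a flow-matching data plane from a controller that installs forwarding decisions on table misses. It is purely illustrative: the class and method names are ours and do not correspond to any real controller API such as OpenFlow.

    # Toy illustration of the SDN split described above: a data-plane switch
    # only matches packets against flow entries; all routing logic lives in a
    # separate controller that installs those entries on a table miss.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Flow:
        src: str
        dst: str

    class Controller:
        """Centralized control logic, decoupled from the switching hardware."""
        def decide(self, flow: Flow) -> int:
            # e.g., a policy that splits traffic by source; trivially replaceable
            return 1 if flow.src.startswith("10.0.") else 2

    class DataPlaneSwitch:
        def __init__(self, controller: Controller):
            self.table = {}             # flow -> output port (installed by controller)
            self.controller = controller

        def forward(self, flow: Flow) -> int:
            if flow not in self.table:  # table miss: ask the control plane
                self.table[flow] = self.controller.decide(flow)
            return self.table[flow]     # decisions are per flow, not per destination

    switch = DataPlaneSwitch(Controller())
    print(switch.forward(Flow("10.0.0.5", "10.0.1.9")))  # -> 1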

Network Functions Virtualization (NFV) has drawn significant attention in industry and academia as a major shift in the provision of telecommunications services [12]. By unlinking Network Functions (NFs) from the physical devices on which they are executed, NFV has the potential to lead to significant reductions in operating expenses (OPEX) and capital expenses (CAPEX). In addition, it promotes the deployment of new services with greater agility and greater return. The NFV paradigm is still in its early stages, and there is a wide spectrum of opportunities for the research community to develop new architectures, systems, and applications, and to evaluate alternatives and cost–benefit trade-offs in the development of technologies for its successful deployment.

NFV also promotes more flexibility in network services for users and other services, as well as the ability to implement or support new network services faster and more cheaply. For example [4]: (i) software/hardware dissociation: as the network element is no longer a composite of integrated hardware and software entities, they can evolve independently of each other, allowing for separate development timeframes and separate software and hardware maintenance; (ii) flexible network function deployment: decoupling software from hardware helps to allocate and share infrastructure resources, so that, together, hardware and software can perform different functions at various times; (iii) dynamic sizing: decoupling the functionality of the network function into instantiable software components provides greater flexibility to scale the actual performance of the function in a more dynamic and granular way, for example according to actual traffic, for which the network operator needs to provision capacity.
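
As a worked example of benefit (iii), dynamic sizing can be as simple as deriving the number of VNF instances from observed traffic. The per-instance capacity and the ceiling-based rule below are illustrative assumptions, not figures from the ETSI framework.

    # Sketch of the "dynamic sizing" benefit (iii): since a network function
    # is just instantiable software, capacity can follow actual traffic.
    import math

    CAPACITY_PER_INSTANCE = 10_000  # packets/s one VNF instance handles (assumed)

    def instances_needed(traffic_pps: float) -> int:
        """Scale the VNF horizontally to match observed traffic."""
        return max(1, math.ceil(traffic_pps / CAPACITY_PER_INSTANCE))

    for load in (3_000, 25_000, 95_000):   # sampled traffic levels
        print(load, "pps ->", instances_needed(load), "instance(s)")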

The ISO/IEC 25000 standard recommends that a quality model be defined for the quality assessment of a software product and that it be used in the definition of quality goals for intermediate and final software products [8]. In addition, ISO/IEC 25010 [9] presents a software quality model that updates the quality model defined in ISO/IEC 9126 [2]. The model has eight characteristics and thirty quality attributes related to the internal and external quality model.

The Internet of Things (IoT) is a new paradigm that combines aspects and technologies coming from different approaches [3]. Ubiquitous computing, pervasive computing, the Internet Protocol, sensing technologies, communication technologies, and embedded devices are merged together to form a system where the real and digital worlds meet and are continuously in symbiotic interaction. The smart object is the building block of the IoT vision. By putting intelligence into everyday objects, they are turned into smart objects able not only to collect information from the environment and to interact with or control the physical world, but also to be interconnected with each other through the Internet to exchange data and information. IoT allows the connection of people and things "anytime", "anyplace", with "anything" and "anyone" [6]. IoT is an evolving concept with an increasing range of applications, leading to the development of new technologies and methods for the improvement of IoT environments [5]. In this new vision, things are connected and controlled through the Internet.

The objective of this work is to promote a discussion about the use of SDN and NFV in conjunction with IoT, and about the need and opportunity for research and development in software quality, such as the evaluation of the software quality of the generated products, both during the development process and during full use.


2 Software Product Quality

NBR ISO 8402 [1] defines quality as the totality of characteristics of an entity that gives it the ability to meet implicit and explicit needs. Explicit needs are those expressed in the definition of the requirements proposed by the producer; implicit requirements are not documented by the producer, but they are necessary to the user. Regarding software quality, McCall et al. [11] define the quality of a software product through three different perspectives: Product Operations, Product Revisions, and Product Transitions. The fundamental idea of this model is to evaluate the relationship between external quality factors and product quality criteria.

The evaluation of software products aims to satisfy the quality needs in one of the stages of the software development lifecycle. Software product quality can be assessed by measuring internal attributes (typically static measures of intermediate products), external attributes (typically by measuring code behavior when executed), or quality attributes in use. The goal is that the product has the required effect in a particular context of use [9].

To evaluate the quality of a product, a possible strategy is the use of measures. According to ISO/IEC 25010, a measure is the mapping of an entity to a number or a symbol in order to characterize a property of the entity [9]. Measures can be part of a quality model. According to the SQuaRE standard, a quality model categorizes software quality into characteristics, which are subdivided into subcharacteristics and quality attributes [9]. According to ISO/IEC 9126-1, quality characteristics are properties of a software product through which quality can be defined and evaluated [2]. Quality attributes are measurable physical or abstract properties of an entity. Quality measures are used to reflect quality characteristics, subcharacteristics, or attributes.

ISO/IEC 25000 recommends that a quality model be defined for the quality evaluation of a software product and that it be used in the definition of quality goals for intermediate and final software products [8]. In addition, ISO/IEC 25010 [9] presents a software quality model, which updates the quality model defined in ISO/IEC 9126 [2]. The model has eight characteristics: functional suitability, reliability, performance efficiency, usability, security, compatibility, maintainability, and portability. These eight characteristics are divided into thirty subcharacteristics (quality attributes) related to the internal and external quality model, as shown in Figure 1.

Figure 1: Product quality model [7][9]
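
As a concrete reading of this hierarchy, the sketch below encodes characteristics as keys mapping to subcharacteristics and aggregates normalized measures into a characteristic score. Only the eight characteristic names come from ISO/IEC 25010 as cited above; the subcharacteristic subset, the scores, and the unweighted-mean aggregation are illustrative assumptions.

    # One way to make the SQuaRE hierarchy concrete: characteristics break
    # down into subcharacteristics, each reflected by a measure in [0, 1].
    # The subcharacteristics are a partial subset and the scores invented;
    # only the eight characteristic names come from ISO/IEC 25010.
    quality_model = {
        "functional suitability": ["completeness", "correctness", "appropriateness"],
        "reliability": ["maturity", "availability", "fault tolerance", "recoverability"],
        "performance efficiency": ["time behaviour", "resource utilization", "capacity"],
        "usability": ["learnability", "operability", "accessibility"],
        "security": ["confidentiality", "integrity", "accountability"],
        "compatibility": ["co-existence", "interoperability"],
        "maintainability": ["modularity", "testability"],
        "portability": ["adaptability", "installability", "replaceability"],
    }

    def characteristic_score(measures: dict[str, float], subs: list[str]) -> float:
        """Aggregate subcharacteristic measures into one characteristic score."""
        known = [measures[s] for s in subs if s in measures]
        return sum(known) / len(known) if known else float("nan")

    measured = {"availability": 0.99, "fault tolerance": 0.7, "recoverability": 0.8}
    print(characteristic_score(measured, quality_model["reliability"]))  # ~0.83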


3 Architecture Proposals and Software Quality

In general, architectural views reflect layers of applications, control or data flow, and hardware or network. These views can contain as many layers as are of interest to represent something significant about the architecture. Figure 2 displays an overview of an application of IoT with SDN support alone, proposed by [14]. This proposal has four layers: application, control, network, and devices. The control layer stands out, as it manages the IoT gateways necessary for full communication with the various devices.

Figure 2: Overview of SDN and IoT usage [14]

Figure 3 shows an overview of the combined use of SDN, NFV, and IoT, proposed by [13]. In this figure, we can visualize four layers with different services: application, data, control, and perception. From bottom to top, we have a perception layer that uses several common IoT technologies, such as RFID and various types of sensors. Next, we have a data layer managed by SDN, where the data flows are executed, able to use NFV features such as virtual network functions (e.g., firewalls and load balancers), which are now software entities. In the next layer, we have the control plane, managed by network operating systems and controllers. Finally, in the upper layer, we have applications that use resources from the environment captured by sensors, routed through network services or intermediary applications, transparently to the end user.

Figure 4 displays an architecture proposal that mixes elements of IoT, SDN, and NFV. Given this architecture, it is possible to apply each of the quality characteristics described in ISO/IEC 25010 [9] to the proposal's layers. It should be emphasized that the applicability of the characteristics is not limited to the following examples; it is possible for all of them to be applied in all proposed layers. All definitions mentioned in the following examples were taken from ISO/IEC 25010 [9]. Organizing by layer, some examples are:


Figure 3: Overview of SDN, NFV, and IoT usage [13]

Figure 4: Architecture proposal with IoT, SDN and NFV


• Application layer: Functional Suitability and Usability. Functional suitability is the degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions. Usability is the degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.
• Control layer: Maintainability. Maintainability is the degree of effectiveness and efficiency with which a product or system can be modified by the intended maintainers.
• Virtualization layer: Performance Efficiency. Performance efficiency is the performance relative to the amount of resources used under stated conditions.
• Communication layer: Compatibility and Portability. Compatibility is the degree to which a product, system or component can exchange information with other components, systems or products, and/or perform its required functions, while sharing the same hardware or software environment. Portability is the degree of effectiveness and efficiency with which a component, product or system can be transferred from one hardware, software or other operational or usage environment to another.
• Devices layer: Reliability and Security. Reliability is the degree to which a system, product or component performs specified functions under specified conditions for a specified period of time. Security is the degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization.

4 Research Opportunities with IoT, SDN, NFV and Software Quality

In view of the scenarios presented in Section 3, we list some research opportunities:

4.1 Software Products Evaluation

The evaluation of software products is an aspect to be considered regarding quality needs in the stages of the software development lifecycle. Software product quality can be assessed by measuring internal attributes (typically static measures of intermediate products), external attributes (typically by measuring code behavior when executed), or quality attributes in use. The goal is for the software product to have the required effect in a particular context of use [9], and evaluating this is a research opportunity. In an IoT environment, the software products are scattered and often run on different devices. Evaluating software developed for this environment becomes a more complex task. Adding SDN and NFV increases this complexity, making it even more challenging.

4.2 Software for Environment Management

In an environment with several network functions (e.g., firewalls, load balancers, and routers), a sequence of functions is usually defined. Managing the orchestration of virtual network functions, as well as their composition, then becomes a necessity; a sketch of such a sequence is given below. Software that manages such sequences would be very useful for an administrator or network operator. In addition, access control and resource management are themselves opportunities for the development of new applications, whether for control, configuration, or monitoring.
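
The sketch models a service function chain as an ordered list of software network functions applied to each packet, which is the kind of sequence such management software would orchestrate. The function bodies, packet layout, and names are our own illustration, not a real NFV orchestration (MANO) interface.

    # A service function chain as an ordered list of software network
    # functions; an orchestrator would manage the order and membership.
    from typing import Callable, Optional

    Packet = dict                                # e.g., {"src": ..., "dst": ...}
    VNF = Callable[[Packet], Optional[Packet]]   # None means "dropped"

    def firewall(pkt: Packet) -> Optional[Packet]:
        return None if pkt["src"] in {"10.0.0.66"} else pkt  # toy block list

    def load_balancer(pkt: Packet) -> Optional[Packet]:
        pkt["dst"] = f"backend-{hash(pkt['src']) % 2}"       # pick a backend
        return pkt

    def run_chain(chain: list[VNF], pkt: Packet) -> Optional[Packet]:
        """Apply each virtual network function in sequence; stop on drop."""
        for vnf in chain:
            pkt = vnf(pkt)
            if pkt is None:
                return None
        return pkt

    print(run_chain([firewall, load_balancer], {"src": "10.0.0.5", "dst": "svc"}))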


4.3 Software for Control

Control of network functions, and of what happens throughout the environment, becomes programmable with SDN. This opens opportunities to implement, test, and integrate, since its effects have repercussions on all layers and for the end user. Here, the aggregation of quality characteristics, such as those described in ISO/IEC 25010 [9], forms a wide range of research opportunities, since each subcharacteristic can be applied in SDN, NFV, and IoT environments.

4.4 Software for Virtual Network Functions

With NFV, hardware can be replaced with software, such as firewalls and load balancers. In addition, it becomes possible to generate new functions oriented to the customer's business rules, further expanding the need for requirements management and process and product quality. In this context, many applications at various levels can be developed, and the level of integration between them and the various layers of the environment also raises several challenges. In addition, this new software needs to be tested to validate user and infrastructure needs, as well as to validate the integration between layers.

4.5 Application Software

Traditional applications sit at the top of the architecture, where the usual software development process applies most directly, with the opportunity to develop the most varied applications, connecting network functions and data captured from the environment, on various platforms such as web and mobile applications. These applications are the closest to end users, and the entire infrastructure is transparent to them.

4.6 Software for Diversified Devices

With the advent of IoT, new mobile devices and other embedded systems could be designed and built. In addition, access to such devices by the population has become easier, and with their use, diverse applications and services have arisen. Hardware not traditionally as common as smartphones, for example smartwatches, has begun to run embedded applications and end-user applications. Some devices use specific sensors, so specific software needs to be developed and distributed. Cross-platform integration also requires software, especially for IoT communication gateways. Finally, in all these cases, software quality is an aspect that must be observed at all levels, and ISO/IEC 25010 [9] has several characteristics and subcharacteristics to be considered. The quality of the software embedded in these assorted devices must be verified and validated, as a failure at some point can impact both end users and developers.

5 Conclusion and Future Work

This work presented initial observations on how to integrate current computer network technologies (SDN and NFV) with IoT elements, and thus enable an increasingly integrated environment, aligned with good practices in software development. The association of SDN with NFV appears to be a good opportunity for infrastructure operators, regardless of whether or not IoT resources are used; often these technologies are supported by cloud computing environments. As future work, we intend to investigate the research opportunities briefly described in this paper and to apply them in an environment involving IoT, SDN, and NFV. We also intend to investigate the quality attributes described in ISO/IEC 25010 [9] and where they have the most impact on the proposed IoT/SDN/NFV architecture, supported by a cloud computing environment. In this way, quality characteristics and subcharacteristics can be mapped gradually onto the layers of the proposed architecture, always with some associated application.

Acknowledgments: This work was partially supported by the Universal MCTI/CNPq 01/2016 program (process 422342/2016-5).

References

[1] NBR ISO 8402. Gestão da qualidade e garantia da qualidade, normas de gestão da qualidade e garantia da qualidade. Parte 1: Diretrizes para seleção e uso, 1994.
[2] ISO/IEC 9126-3. Software engineering – Product quality – Part 3: Internal metrics. Geneva: International Organization for Standardization, 2002.
[3] Eleonora Borgia. The internet of things vision: Key features, applications and open issues. Computer Communications, 54:1–31, 2014.
[4] ETSI. ETSI GS NFV 002 V1.2.1: Network Functions Virtualisation (NFV); Architectural Framework, 2014. Online; accessed June 2017.
[5] K. N. Fallavi, V. R. Kumar, and B. M. Chaithra. Smart waste management using internet of things: A survey. In 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pages 60–64, Feb 2017.
[6] Patrick Guillemin and Peter Friess. Internet of things strategic research roadmap. Technical report, http://www.internet-of-things-research.eu/pdf/IoT Cluster Strategic Research Agenda 2009.pdf, 2009.
[7] ISO25000. ISO/IEC 25010, 2015. http://iso25000.com/index.php/en/iso-25000-standards/iso-25010.
[8] ISO/IEC 25000. Software engineering – Software product quality requirements and evaluation (SQuaRE), 2014.
[9] ISO/IEC 25010. Systems and software engineering – Systems and software quality requirements and evaluation (SQuaRE) – System and software quality models, 2011.
[10] D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig. Software-defined networking: A comprehensive survey. Proceedings of the IEEE, 103(1):14–76, Jan 2015.
[11] Jim A. McCall, Paul K. Richards, and Gene F. Walters. Factors in Software Quality. Volumes I, II, and III. Technical report, 1977.
[12] R. Mijumbi, J. Serrat, J. L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba. Network function virtualization: State-of-the-art and research challenges. IEEE Communications Surveys & Tutorials, 18(1):236–262, 2016.
[13] M. Ojo, D. Adami, and S. Giordano. An SDN-IoT architecture with NFV implementation. In 2016 IEEE Globecom Workshops (GC Wkshps), pages 1–6, Dec 2016.
[14] O. Salman, I. Elhajj, A. Kayssi, and A. Chehab. An architecture for the internet of things with decentralized data and centralized control. In 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), pages 1–8, Nov 2015.


IoT Research Opportunities in SOLAR E-learning Software Ecosystem

Emanuel Coutinho1,2,5, Maurício M. Neto1,4,5, Leonardo Moreira1,2,5, Carla Ilane Bezerra3, and José Neuman de Souza4,5

1 IBITURUNA – Research Group of Cloud Computing and Systems
2 Virtual University Institute (UFC-VIRTUAL)
3 Campus Quixadá – Federal University of Ceará (UFC) – Quixadá, Brazil
4 Master and Doctorate in Computer Science (MDCC)
5 Federal University of Ceará (UFC) – Fortaleza – Ceará – Brazil
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract
Internet of Things (IoT) is a paradigm that is rapidly gaining ground in the scenario of telecommunications, allowing people and things to be connected. In this scenario, things have identities and operate in smart spaces connecting users, environment, and social contexts. A Software Ecosystem (SECO) refers to a set of software products with a certain degree of symbiotic relationship; it may consist of actors interacting with a market, supported by a common technology platform or market. Virtual Learning Environments (VLE) integrate information and communication technologies into an environment, aiming at the creation of internet-based environments that enable a knowledge construction process and autonomy for their users. SOLAR VLE is a virtual environment for face-to-face or semi-presential courses. From a broader perspective, SOLAR VLE can also be viewed as a SECO connecting its various components. The main objective of this work is to present IoT research opportunities in the SOLAR SECO.

1 Introduction

Internet of Things (IoT) is a novel paradigm that is rapidly gaining ground in the scenario of modern wireless telecommunications [1]. IoT allows people and things to be connected "anytime", "anyplace", with "anything" and "anyone", ideally using "any" path and "any" service [8]. These "things" have identities and virtual personalities, operating in smart spaces with intelligent interfaces to connect within society, environment, and user contexts [18].

Traditionally, a Software Ecosystem (SECO) refers to a collection of software products with some degree of symbiotic relationship [14]. A SECO can also consist of a set of actors acting as a unit, interacting with a market distributed between software and services through the relationships between these entities [11]. Such relationships are often supported by a technology platform or by a common market, and are carried out through the exchange of information, resources, and artifacts.

Virtual Learning Environments (VLE) integrate information and communication technologies into an environment, aiming at the creation of internet-based environments that enable a knowledge construction process and autonomy for their users. A VLE can have many learning mechanisms, such as class visualization, forums, chats, and message exchanges [7]. However, using these mechanisms does not imply real learning. Currently, educators have access to better and more modern technologies, such as Web 2.0, and they should take advantage of this [9]. Nevertheless, despite the available technologies, there are many factors that influence resource usage, such as processes, strategies, motivations, and culture.


Due to these complexities faced today by educators, the design of any e-learning system should begin with the adoption of a sustainable model [9]. Such a model is comprehensive and able to cope with the use of new technologies and tools, incorporating new learning approaches, adaptable to a variety of learning styles, and sensitive to learning conditions. SOLAR (Learning Online System) [17] is a three-layer web application enabling the publication of courses and interaction with them. In this context, the SOLAR SECO was modeled to show the relationships between software, hardware, and people, and to study the impact on development, society, and research [5][6]. The main objective of this work is to present IoT research opportunities in the SOLAR SECO.

2 SOLAR Virtual Learning Environment

SOLAR is a VLE designed as a virtual space to support face-to-face or semi-presential courses, currently with around 47,000 users and an average of 2,000 daily accesses. It serves as a point of convergence for the creation of a "blended education" (a blend of characteristics of both education modes), forming a new educational model that makes strong use of technologies. However, it necessarily includes face-to-face situations, hence the "blended" designation: something mixed, combined. Fig. 1 shows the SOLAR VLE web and mobile versions.

The SOLAR VLE is mainly available in its web version. However, two web versions exist, differing mainly in the number of new features added and in the graphical user interface. There is also a SOLAR VLE mobile version, with reduced functionality, and a SOLAR VLE version for MOOCs (Massive Open Online Courses). The VLE's main characteristics are: (i) greater customization of the interactors' interface; (ii) availability of tools for collaborative authoring of content (wikis, blogs, etc.); (iii) the possibility of integration with Web 2.0 tools (Twitter, Facebook, Gmail, Google Analytics, etc.); (iv) support for different media and devices (digital TV, smartphone, PDA, etc.); (v) a design based on research carried out in the usability and accessibility areas; and (vi) remaining simple, fast, and pleasant, while gaining new features.

Fig. 2 shows a SOLAR sociotechnical network. In this network, several elements stand out around the SOLAR VLE platform, often collaborating among themselves. For example, the SOLAR API enables the development of new applications, which can be used by internal, external, or independent customers. The involved institutions range from the university, with its undergraduate courses, face-to-face and at a distance, to external institutions and distance education centers. Moreover, there are research components, integration with different systems, social networks, and society, as well as different types of access to the environment. Through the network, we can identify some integration points with third-party products and suppliers.

Figure 1: SOLAR VLE web and mobile versions [5]


Figure 2: Sociotechnical network modeled from SOLAR [5]

3 SOLAR SECO

Around SOLAR, a set of relationships was formed: users, suppliers, solution developers, and business. Several systems have also been developed around the central platform, and many versions and maintenance releases have followed. The provision of an API for building solutions also contributed to the integration and diffusion of the environment.

The SOLAR SECO is composed of a set of elements that communicate at different levels, where SOLAR VLE is the basis of the ecosystem, being the technological platform that supports the SECO. These elements involve different institutions, producing or receiving information, supported by different technologies.

Considering the three key role types described by [10], we have: (i) keystone: an organization or a small group acting as the keystone organization, sometimes leading the development of the central software technology; (ii) end-users: end-users of the central technology who need it to carry out their business; and (iii) third-party organizations: actors who use the central technology as a platform for producing solutions or services. Establishing SOLAR VLE as the SECO central platform, we have an organization which develops and maintains its versions and technologies. The end-users are very diversified: research centers, the university, undergraduate courses, professors, and students. The community of external developers is still small, but the SOLAR API creates the possibility of expansion.


Figure 3: SOLAR SECO SSN Modeling [5]

Considering the three dimensions of SECO proposed in [3], we have: (i) Business Dimension: driven through the factors of vision, innovation, and strategic planning; (ii) Architectural Dimension: multiple product development, achieved by sharing a common organizational platform architecture, and third parties; and (iii) Social Dimension: the need for organizations to become more open, allowing third parties to develop applications for a specific platform. From (i), those involved with the platform have market knowledge, and they act as decision makers, identifying needs and platform expansions. From (ii), commonly analyzed aspects are the definition and maintenance of the technologies, and the platform's needs to improve quality, interoperability, and performance. Finally, from (iii), we have several actors, with different roles of users, suppliers, and developers, interacting with each other (Fig. 2).

A Software Supply Network (SSN) is a series of linked software, hardware, and service organizations cooperating to satisfy market demands [12]. To represent the SOLAR SECO [5], the SSN modeling notation from [2] and [4] was used (Fig. 3). In this notation, each element can be: Company of Interest or Product of Interest, Supplier, Customer, Intermediary, Customer's Customer, Trade Relationship, Flow, or Aggregator. SOLAR is the Company of Interest, supported by several types of suppliers: software, hardware, development teams, databases, etc. As Intermediaries we have the SOLAR developers, application stores, and a research committee. There are many types of Customers in the SOLAR SECO, such as universities, platform users, researchers, management systems, and external developers. The Aggregator role is played by the SOLAR Technical Coordinator, responsible for managing SOLAR development, intermediating new products and services aligned with business needs, and taking advantage of the opportunities that arise with the platform's usage.



Figure 4: IoT research opportunities and applications

4 IoT Research Opportunities

IoT applications can provide many opportunities for improvement and research in an ecosystem such as SOLAR. Fig. 4 illustrates a university campus with some opportunities for IoT applications. As SOLAR VLE is commonly used in the university, this kind of environment is suitable for the SOLAR SECO (though not limited to it), and it shows how some of these opportunities can be exploited.

4.1 Activities Tracking and Real-Time Tracking

Most users utilize mobile devices to perform VLE activities. These activities are often related to the performance of students, tutors, and coordinators. A research opportunity would be an IoT application that monitors this information to support the performance evaluation of the various roles in the VLE. In addition, these data can also be used to improve ecosystem processes and for many other applications. The SOLAR central platform is a VLE, so several administrative functions are required for ecosystem management. Most situations are only noticed after problems occur; the same happens at the city poles where the courses are located. The flexibility of use and ease of acquisition of mobile devices would help record and effectively communicate information across the various levels of ecosystem actors.

4.2 Use of Tags in Activities

The use of tags (e.g., RFID or QR codes) in classroom activities and laboratories, or to identify products and environments in class or in course activities scattered around the cities, can motivate users to use the environment, making it more integrated and promoting a degree of digital inclusion.
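
As a small illustration, a tag for a course activity can be generated in a few lines, assuming the third-party qrcode package (pip install qrcode[pil]); the URL pointing into SOLAR is hypothetical.

    # Sketch of generating a tag for a classroom activity, assuming the
    # third-party `qrcode` package; the URL scheme is a made-up route.
    import qrcode

    activity_url = "https://solar.virtual.ufc.br/activity/1234"  # hypothetical
    img = qrcode.make(activity_url)   # returns a PIL image of the QR code
    img.save("activity_1234.png")     # print and fix to the lab bench/equipment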


4.3 Development of Plugins

SOLAR has an API for developing applications/plugins. For the SOLAR SECO, this API can promote the entry of new application developers, who can be incorporated into the SECO itself. The use of different physical devices (intelligent objects) and the composition of new IoT services is a research opportunity in several aspects of the SECO, not restricted to the platform itself but also promoting the integration of legacy systems into the environment (systems of systems). In addition, it would support the incorporation of new technology platforms and the construction of new applications standardized by the API.

4.4 Data Mining

The SOLAR SECO generates a large amount of data daily, such as student attendance and grades; class and tutor data from different cities and courses; and employee allocation, logistics, and management data. This mass of data keeps growing, and the amount already stored is enormous. Although not all applications are connected to each other, if all had a single point of entry or easy access to the data, the possibilities of using these data in various sectors of the SECO would increase greatly. Examples of uses would be performance prediction for students and tutors, data-based simulation, resource redistribution, and allocation based on class and course history. In addition, data can be generated and transformed as needed, and this new information can be used for many purposes. If we add IoT elements, such as various physical devices connected to different databases, the range of applications also increases. And by adding external vendors, the production of new applications or products for different aspects also expands.
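
A minimal sketch of the performance-prediction use named above, flagging students whose recent grades trend below a threshold; the data layout, grading scale, and threshold are invented for illustration.

    # Flag students whose recent grades trend below a threshold; the data
    # layout, 0-10 scale, and threshold are invented for illustration.
    from statistics import mean

    grades = {  # student -> chronological grades collected from the VLE
        "ana":   [8.5, 9.0, 8.0],
        "bruno": [6.0, 5.0, 4.5],
        "clara": [7.0, 7.5, 8.0],
    }

    AT_RISK_THRESHOLD = 6.0

    def at_risk(history: list[float], window: int = 3) -> bool:
        """A student is at risk if the mean of the last `window` grades is low."""
        return mean(history[-window:]) < AT_RISK_THRESHOLD

    print([name for name, hist in grades.items() if at_risk(hist)])  # ['bruno']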

4.5 Integrated Application Development

The data diversity generated from the most varied situations of the SOLAR SECO can power potential applications yet to be designed and developed. Currently, several applications exist in the environment, such as for logistics, finance, allocation, and student monitoring. However, these applications are web-based and lack integration and broad coverage; in this case, one application needs to know the data of another. Here arises the opportunity to pursue integrated applications. Applications with embedded autonomic computing mechanisms [13] can help an ecosystem. The use of different devices with various communication mechanisms also enables a series of new support applications.

4.6 Inclusion of Emerging Technologies

The addition of emerging technologies to a SECO is research that involves several areas of knowledge. The use of simulators to support classes and accessibility features requires developers with differentiated skills. Integration into the environment is one of the challenges to be investigated, and encompasses knowing how to communicate to receive/send data in real time and how to facilitate usability for the actors in the ecosystem. The evaluation of the performance and cost of mobile devices, such as tablets and smartphones, is a situation to be analyzed, as more and more applications require computing resources on mobile devices. The use of physical computing components (e.g., Arduino and Raspberry Pi), whether for the classroom, for resource management of the environment, or for process automation, is an opportunity to be investigated, mainly due to the interaction between people and the environment.


4.7 Recommendation System

Recommendation systems can be applied in the SOLAR SECO in several ways. Data analysis may provide recommendations for readings, practices, and exercises based on student data. For tutors, in addition to recommendations on where they could best act, it is also possible to have recommendations on students who need more attention. A manager may be interested in observing certain city poles where performance does not meet quality needs. Finally, several situations can be explored from the mass of data that travels in the environment, from the database that is stored, and from prediction events that can contribute to the improvement of the environment.
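
One way to ground the reading-recommendation case is a tiny user-based filter over interaction counts: find the most similar student by cosine similarity and suggest what that student uses and the target does not. All profiles and resource names below are invented.

    # Minimal user-based recommender over VLE interaction counts; the
    # students, resources, and counts are invented for illustration.
    import math

    profiles = {  # student -> {resource: interaction count}
        "ana":   {"forum": 5, "wiki": 2, "reading-3": 4},
        "bruno": {"forum": 4, "reading-3": 5, "exercise-7": 3},
        "clara": {"wiki": 6, "exercise-7": 1},
    }

    def cosine(a: dict, b: dict) -> float:
        shared = set(a) & set(b)
        dot = sum(a[k] * b[k] for k in shared)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def recommend(target: str) -> list[str]:
        """Suggest resources used by the most similar student but not by target."""
        others = [(cosine(profiles[target], p), name)
                  for name, p in profiles.items() if name != target]
        _, nearest = max(others)
        return sorted(set(profiles[nearest]) - set(profiles[target]))

    print(recommend("ana"))  # -> ['exercise-7'] (bruno is the nearest neighbor)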

4.8 Infrastructure Management Support

It is possible to monitor and manage the city poles' infrastructure remotely, such as room and equipment allocation, helping to expand, merge, and plan new classes and to flexibly simulate courses. With IoT, the diversity of resources, devices, data types, communication between equipment, and new roles in the ecosystem expands, possibly generating new challenges that require more complete and integrated management.

4.9 Smart Labs

Figs. 2 and 3 show elements for a virtual laboratory. This item came from research to improve experimentation and efficiency in courses that require laboratories. Smart labs are laboratories where internal resources, such as temperature, lighting, and the power state of physical components (air conditioning and computers), can be controlled remotely or autonomously. Research in this area was carried out in real laboratories [16][15], saving energy and promoting better usage of laboratory resources. This research is important for laboratories that lack physical and financial resources.
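
A sketch of the autonomous control idea, in the spirit of a monitor-analyze-plan-execute loop: power off machines idle past a limit. The sensor and actuator functions are stubs standing in for real monitoring and smart-plug APIs, and the 30-minute policy is an assumption, not a figure from [15] or [16].

    # Sketch of autonomous lab control: power off machines idle past a limit.
    # idle_seconds() and power_off() are stubs for real device APIs.
    IDLE_LIMIT_S = 30 * 60  # power off after 30 idle minutes (assumed policy)

    def idle_seconds(machine: str) -> float:
        """Stub: would query the lab monitoring agent for idle time."""
        return {"pc-01": 45 * 60, "pc-02": 120.0}.get(machine, 0.0)

    def power_off(machine: str) -> None:
        """Stub: would trigger the actuator (e.g., a smart plug)."""
        print(f"powering off {machine}")

    def control_cycle(machines: list[str]) -> None:
        for m in machines:                     # monitor + analyze
            if idle_seconds(m) > IDLE_LIMIT_S:
                power_off(m)                   # plan + execute

    control_cycle(["pc-01", "pc-02"])  # only pc-01 is powered off
    # A real deployment would run control_cycle periodically.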

5 Conclusions and Future Work

This work presented the software ecosystem of the SOLAR VLE. The platform's level of integration with society and with other diverse technology platforms shows that the complexity of its relationships tends to grow. In this context, applications in the SECO with IoT become opportune and important. Finally, several opportunities for research were highlighted, though the possibilities are not limited to them. Most of the opportunities are related to the database generated by the use of the environment and to the integration and communication of physical devices with the SOLAR environment. As future work, we intend to address the challenges related to the use of data integrated with physical devices, initially on a small scale, to evaluate the feasibility of expanding the experiments, as well as the logistics and cost of applying them on a larger scale.

References

[1] Luigi Atzori, Antonio Iera, and Giacomo Morabito. The internet of things: A survey. Computer Networks, 54(15):2787–2805, 2010.
[2] Vasilis Boucharas, Slinger Jansen, and Sjaak Brinkkemper. Formalizing software ecosystem modeling. In Proceedings of the 1st International Workshop on Open Component Ecosystems, IWOCE '09, pages 41–50, New York, NY, USA, 2009. ACM.


[3] P. R. J. Campbell and Faheem Ahmed. A three-dimensional view of software ecosystems. In Proceedings of the Fourth European Conference on Software Architecture: Companion Volume, ECSA '10, pages 81–84, New York, NY, USA, 2010. ACM.
[4] Gabriella Costa, Felyppe Silva, Rodrigo Santos, Cláudia Werner, and Toacy Oliveira. From applications to a software ecosystem platform: An exploratory study. In Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems, MEDES '13, pages 9–16, New York, NY, USA, 2013. ACM.
[5] E. F. Coutinho, I. Santos, and C. I. M. Bezerra. A software ecosystem for a virtual learning environment: SOLAR SECO. In 2017 IEEE/ACM Joint 5th International Workshop on Software Engineering for Systems-of-Systems and 11th Workshop on Distributed Software Development, Software Ecosystems and Systems-of-Systems (JSOS), pages 41–47, May 2017.
[6] E. F. Coutinho, D. Viana, and R. P. d. Santos. An exploratory study on the need for modeling software ecosystems: The case of SOLAR SECO. In 2017 IEEE/ACM 9th International Workshop on Modelling in Software Engineering (MiSE), pages 47–53, May 2017.
[7] Emanuel Coutinho, Leonardo Moreira, and Wellington Sarmento. Maat - sistema de avaliação de alunos e tutores para um ambiente virtual de aprendizagem. In IX Simpósio Brasileiro de Sistemas de Informação (SBSI 2013), May 2013.
[8] Patrick Guillemin and Peter Friess. Internet of things strategic research roadmap. Technical report, http://www.internet-of-things-research.eu/pdf/IoT Cluster Strategic Research Agenda 2009.pdf, 2009.
[9] Christian Gütl and Vanessa Chang. The use of web 2.0 technologies and services to support e-learning ecosystem to develop more effective learning environments. In Proceedings of ICDEM 2008, pages 145–148, 2008.
[10] Geir K. Hanssen. A longitudinal case study of an emerging software ecosystem: Implications for practice and theory. Journal of Systems and Software, 85(7):1455–1466, 2012.
[11] S. Jansen, S. Brinkkemper, and A. Finkelstein. Business network management as a survival strategy: A tale of two software ecosystems. In Proceedings of the First International Workshop on Software Ecosystems, 11th International Conference on Software Reuse, pages 34–48, 2009.
[12] Slinger Jansen, Sjaak Brinkkemper, and Anthony Finkelstein. Providing Transparency in the Business of Software: A Modeling Technique for Software Supply Networks, pages 677–686. Springer US, Boston, MA, 2007.
[13] Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, 2003.
[14] D. Messerschmitt and C. Szyperski. Software Ecosystem: Understanding an Indispensable Technology and Industry. The MIT Press, 1st edition, 2003.
[15] Davi Oliveira, Leonardo Moreira, Emanuel Coutinho, and Gabriel Paillard. Uma proposta arquitetural de sistema autonômico para redução do custo de energia em laboratórios de informática. In V Workshop de Sistemas Distribuídos Autonômicos (WOSIDA 2015), May 2015.
[16] Davi Oliveira, Maurício M. Neto, Emanuel Ferreira Coutinho, Gabriel Paillard, Ernesto Trajano, and Leonardo Moreira. An autonomic system for energy cost reduction in computer labs. In 5th International Workshop on ADVANCEs in ICT Infrastructures and Services (ADVANCE 2017), Evry, Jan 2017.
[17] Solar. SOLAR. http://www.solar.virtual.ufc.br/, 2016. Online; accessed April 2016.
[18] Lu Tan and Neng Wang. Future internet: The internet of things. In 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), volume 5, pages V5-376–V5-380, Aug 2010.


Secure Services Recommendation for Social IoT Systems

Ahmed M. Elmisery1 and Hugo Vélez2

1 Department of Electronic Engineering, Universidad Técnica Federico Santa María, Chile [email protected]
2 IoT Cybersecurity Laboratory, Universidad Técnica Federico Santa María, Chile [email protected]

Abstract. Recent advances in embedded systems have facilitated the emergence of the Internet of Things (IoT). IoT devices are limited in size and have the capability to communicate with each other to distribute information and to execute simple processing tasks like measuring and coordinating. Cloud-based services built upon the IoT paradigm often follow a vertical model, in which each service collects data related to a specific scenario and does not allow sharing the collected data with other services. In fact, this model might prevent cloud-based services from generating accurate recommendations. The synergism between devices and services requires more investigation and data sharing to gain deeper insights into the best way of adopting new services according to users' usage patterns and preferences, which in turn improves the user experience. The utilization of the social IoT (SIoT) could boost the accuracy of the generated service recommendations by offering relational data related to the potential interconnections between the devices and the various services. Service recommendation in social IoT systems is the problem of detecting, for each device, its membership in a circle of similar devices. The IoT devices in every circle share some relevant data and conditions gathered from their owners. The circle formation process demands revealing all devices' data to the social IoT system. With the increasing trend of data violations, the need for secure services is emerging. In this paper, we introduce a distributed approach for secure service recommendation, which permits the composition of various circles without revealing the devices' data to other parties. We also provide a scenario depicting secure service recommendation for the detection of potential healthcare services for the internet of healthcare things, along with experimental results.

Keywords: Social IoT Systems, Secure Multiparty Computation, Recommender System, Cloud Services.

1 INTRODUCTION

The Internet of Things (IoT) is evolving into an extensive network of specific-purpose and uniquely identifiable smart devices with low power consumption that represents the latest phase in the evolution of the Internet. Each smart device embeds various kinds of sensors, a low-power processor, and some kind of actuator that enable these devices


to interact with their surroundings and/or deliver notifications. Each smart device generates data by sensing its operating environment or consumes data produced by other smart devices. The different smart devices cooperate all together in order to attain a common goal. The current development model of cloud-based services follows a vertical manner: each service collects data related to a specific scenario under certain conditions, such as e-health or fitness applications. The data collected for one service is generally not shared with other services. This, in turn, causes ineffective utilization of the different data collected by IoT devices, which could potentially be used in a variety of services. The usage of social IoT systems offers extensive data and settings that can aid in selecting the best cloud-based services for a specific set of devices. Recently, new marketplaces have been proposed to offer innovative and cheap cloud-based services to the owners of different IoT devices. For instance, the TELUS IoT marketplace offers out-of-the-box services to the owners of various IoT devices in exchange for a monthly subscription. Users can select appropriate services ranging from fleet trackers to smart temps for healthcare. The selected services can quickly identify, configure, and utilize the different IoT devices owned by the users.

Social networks have proliferated as an efficient tool to promote interaction between people and to facilitate information sharing among members of a network. This drives the emergence of more purpose-driven social networks in different domains. Employing these social concepts has had a considerable influence on the IoT paradigm, which in turn gave rise to the term Social Internet of Things (SIoT). SIoT holds substantial value for cloud-based service providers, since these systems create an interaction space where devices can collaborate together and gather information related to their conditions and observations. SIoT was originally developed to enhance the collective intelligence of IoT devices. However, the realization of SIoT can have positive implications for the whole IoT paradigm. These systems managed to turn regular IoT networks from networks of connected devices into networks of social devices [1]. SIoT facilitates the navigability of the IoT network by forming the various structures needed for the detection of new devices and services, the initiation of trust between various entities, and the sharing of data and resources for solving IoT networking problems. SIoT can also accumulate settings and conditions reflecting the best practices for dealing with certain IoT devices.

Within SIoT systems, each device has its own profile that represents its owner's usage patterns, preferences, and conditions. Sharing these profiles between different cloud-based services on the SIoT system can enhance the performance of the cloud-based services and the overall user experience. Additionally, each device in SIoT can have relationships with other (similar or dissimilar) devices and people. Service recommendation is one of the services that can run on SIoT systems: it utilizes devices' data to provide recommendations for joining personalized cloud-based services out of a large number of candidates. The principal enhancements of the SIoT system over the current IoT paradigm can be summarized as follows. First, the relations within the SIoT are established between the devices themselves, rather than with their owners only. Second, the different devices can easily discover the most suitable services and available resources of other devices by exploiting the relationships on the SIoT


system. Finally, the SIoT system has its own technology stack, machine-readable formats, and protocols, since it is a social network dealing only with devices and does not rely primarily on web technologies.

This kind of service recommendation relies mainly on the assumption that devices with related profile data are connected to the same cloud-based services. The extraction of these recommendations depends on private profiles of the owners' usage patterns, preferences, and conditions, which contain personal data and sensitive preferences. This service is usually accessible to different kinds of entities, which in turn brings new kinds of threats and problems to the users from the service provider and other registered users, such as malignant behaviors. For instance, malignant users might perform certain attacks to get one another's personal information, such as current health conditions, place of work, and/or relationship status. These kinds of attacks could reveal users' personal information even if it is not supposed to be exposed to the public. These privacy concerns might prohibit users from subscribing to these services and constrain the effective utilization of the collected data for recommendation purposes.

Several techniques exist in the literature to control the exposure of personal information. One of the most well-known approaches is the utilization of policy rules to control the flow of private data; a policy engine makes use of these rules to decide whether or not to release certain preferences in the user's profile. However, this approach is either coarse-grained or forces the user to have a detailed understanding of how to construct proper privacy rules, and any unsuitable setting may produce unexpected or unplanned exposure of sensitive data. Additionally, this approach is based on a binary philosophy of either permitting or denying the publication of specific preferences in the devices' profiles. Once these data are published, the process cannot be reversed and the data owners have no control over them. The published data can be used later to breach the privacy of end-users. Malignant parties could employ inference techniques to infer other private preferences that have not been published by the users. The authors in [2] applied inference techniques to deduce private information via social relations: the stronger the relationship between victims in the social network, the higher the deduction accuracy that could be attained.

In this work, the procedure of discovering and recommending relevant circles is carried out at the owner's side. This approach attains security and privacy and facilitates adoption of the proposed protocols. The published data is concealed using two protocols to exclude the risks of potential breaches. This also preserves the privacy of owners' data, since the sensitive usage patterns, preferences, and conditions are available only to the owner in raw form. During the execution of the proposed protocols, the patterns within devices' data are destroyed; hence, to facilitate the handling of this concealed data, some selective properties need to be maintained for the services recommendation algorithm.

This research is organized as follows. In Section 2, related works are presented. In Section 3, the proposed architecture running at the owner's side is described. Some essential definitions are introduced in Section 4. The protocols used for the formation of similar circles are explained in detail in Section 5. In Section 6, experiments and results are reported. Finally, conclusions and future work are presented in Section 7.


2 RELATED WORKS

Most of the literature on social network analysis highlights the importance of maintaining security when outsourcing any information published by the users to external recommendation services, since this is a potential cause of sensitive data leakage, as shown in [3]. A theoretical framework was proposed in [4] to maintain the privacy of patrons and the business interests of retailers. A hybrid recommender system was designed in that work, which utilized secure multiparty computation and public-key cryptography to attain the required objectives. The authors in [5, 6] proposed a privacy-preserving distributed recommender system based on peer-to-peer techniques. In their work, users' communities are formed and an aggregated profile is constructed for each community rather than for a single user. The aggregated profile is based on the personal information of each user in the community and is constructed via peer-to-peer communication and encryption techniques. The recommendation process occurs at the user's side. A similar approach was taken in [7], which stores users' information on their side and executes the recommendation process in a distributed manner without depending on a centralized entity.

3 THE PROPOSED ARCHITECTURE

This research work aims at a security-by-design approach [8], proposing a middleware for governing data publishing during the detection of devices' circles. The users are no longer forced to choose between two options, publishing their devices' profiles or not. Instead, the users are empowered to segregate sensitive information about themselves, and in that way they can unveil themselves gradually. The proposed middleware helps the users to manage the information they share with different devices' circles, and to be enrolled in a circle with a crafted portion of their devices' profile. The main idea behind the proposed middleware arose from a user-centric insight: the safest method to protect sensitive information is not to publish it but to keep it at the end-user's side. However, to discover the relevant devices' circles, users need to reveal their devices' profile in some manner to enable the detection process. The proposed middleware, named mobile middleware for cooperative security (MMCS), runs on the home gateways of users. It consists of different cooperative agents, each with a predefined function; the cooperation between these agents is required to attain the required security goals. The local camouflage agent is responsible for generating a generalized profile based on the real sensitive profile of its owner. The masking agent takes the generalized profile as input and then executes two cryptographic protocols needed for the detection process: the first is secure relevancy ranking (SRR), which is used to form virtual circles based on devices' profiles, and the second is secure circle detection (SCD), which is utilized to discover real circles within each virtual circle. MMCS utilizes aggregation topologies to organize the accumulation of users' data; these topologies can be as simple as a ring formation or as


complex as a hierarchical formation. The topological formation suits the proposed protocols well. The centralized services recommendation (CSR) is a centralized entity that launches the detection process and stores the extracted circles. Additionally, the CSR acts as a virtual workplace that facilitates any communication between devices. The scenario in this research is as follows: based on various themes of the services, the administrator of the SIoT creates initial circles and requests the devices to join them. Each device instructs its MMCS to generate a generalized profile that exposes only the main topics of its sensitive profile, since users seek to conceal their current personal information. MMCS executes the proposed protocols and then offers a services recommendation to enroll in a relevant circle that its owner may like.

4 THREAT MODEL

The proposed middleware maintains security in the semi-honest model. Each party involved in the detection process is compelled to act in accordance with the protocols, but the intermediate values might be stored to later investigate the inputs of other parties. The centralized services recommendation (CSR) is considered to be the untrusted adversary that focuses on collecting the data of various users in order to classify and trace them. This work does not assume the centralized recommender service to be entirely malicious, which is a sensible assumption, since it needs to achieve certain business goals to increase its profit. The attained privacy is high only if the CSR cannot infer any sensitive data of the users.

5 PROBLEM FORMULATION

In this section, the important notions used in this work are outlined based on our previous solutions in [9, 10]. The users' preferences are expressed within two types of profiles, a generalized profile and a sensitive profile. The generalized profile is the public version of the sensitive profile; it contains a set of hypernym phrases at the same semantic level as the sensitive data. The generalized profile represents the popular data that users are willing to disclose and accept to be published by MMCS. The sensitive profile contains the "personal/secret" preferences that the users refuse to reveal openly to other entities. Security must be respected when detecting relevant members of the devices' circles. It also needs to be maintained when recommending circles to new devices. The data in flux during the detection process needs to be protected against both the CSR and external third parties. The notion of virtual circle in this research can be described as follows:

Definition 1. A virtual circle is a set VC = {RC1, RC2, ..., RCn}, where n is the total number of real circles in VC, with the following properties: (1) Each RCi ∈ VC (i = 1, ..., n) is a 3-tuple RC = {Isg, Vsg, dsg} such that Isg = {i1, i2, ..., il} represents the set of generalized preferences, Vsg = {v1, v2, ..., vk} corresponds to the set of devices, and dsg ∈ Isg is the main preference of RC. (2) Each device vi ∈ Vsg has the preferences Isg in its profile. (3) dsg is the most frequent preference in the Vsg profiles, and it represents the "core-point" of this real-circle RC. (4) For any two circles RCa and RCb (1 ≤ a, b ≤ n and a ≠ b), Vsga ∩ Vsgb = ∅ and Isga ≠ Isgb.
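For concreteness, Definition 1 can be modeled as a plain data structure. The sketch below is our illustration only; the class and field names are not part of the formal definition.

```java
import java.util.List;
import java.util.Set;

// Illustrative model of Definition 1: a real-circle RC = {Isg, Vsg, dsg},
// and a virtual circle as a collection of device-disjoint real-circles.
class RealCircle {
    final Set<String> isg;  // generalized preferences {i1, ..., il}
    final Set<String> vsg;  // member devices {v1, ..., vk}
    final String dsg;       // main preference, the "core-point" of RC

    RealCircle(Set<String> isg, Set<String> vsg, String dsg) {
        if (!isg.contains(dsg))
            throw new IllegalArgumentException("dsg must belong to Isg");
        this.isg = isg;
        this.vsg = vsg;
        this.dsg = dsg;
    }
}

class VirtualCircle {
    // Property (4): the Vsg sets of any two members must be disjoint
    // and their Isg sets must differ (not checked in this sketch).
    List<RealCircle> realCircles;
}
```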

6 PROPOSED PROTOCOLS IN MMCS

In the proposed approach, MMCS enforces and maintains the security of the owners' profiles [11-18]. MMCS runs on users' gateways and is equipped with two cryptographic protocols, namely, the secure relevancy ranking (SRR) protocol and the secure circle detection (SCD) protocol. MMCS facilitates the formation of virtual circles and the detection of relevant real-circles. Users sharing the same preferences can exchange their personal settings and service recommendations to handle the emerging requirements of a certain cloud-based service. Any newly registered device can search for and enroll in any real-circle in a secure manner.

6.1 Secure Relevancy Ranking (SRR) Protocol

The main aim of this protocol is to cluster devices' data into numerous virtual circles. Two challenges were faced while forming these virtual circles. The first is the representation of such a circle, i.e., proper intra-circle closeness and inter-circle segregation need to be attained. The second is to maintain the privacy of sensitive devices' data; hence, the use of generalized profiles is essential. The generalized profiles are constructed using public information offered by the CSR, such as a taxonomy tree and a unique public dictionary. The local camouflage agent maps the contents of the sensitive profile into this public information space, which results in the creation of generalized profiles as proposed in [9, 10]. After creating the generalized profiles, MMCS prompts the masking agent to execute the SRR protocol to form the virtual circles. Each virtual circle contains the devices that share the same generalized preferences, and any device can enroll in multiple circles. SRR is performed in a distributed way. SRR begins by organizing a bag of interests that represents generalized preferences based on devices' sensitive preferences. The sensitive preferences are extracted and then generalized as stated previously. The generalized preferences of device Vc are used to create a preference vector Vc = (ec(w1), ..., ec(wm)), where m represents the total number of distinct preferences in the device's profile, and ec(wi) describes the significance of preference wi to device Vc (its weighted frequency). The further computation utilizes the term frequency-inverse profile frequency model [19] as follows:

Term-frequencyVc(wi) = (# occurrences of wi in Vc's profile) / (# preferences in Vc's profile), and

inverse-profile-frequency(wi) = log(# devices / # profiles containing preference wi), where

ec(wi) = Term-frequencyVc(wi) × inverse-profile-frequency(wi).

The similarity metric should be tuned appropriately to capture the similarity between the generalized preferences of each device; Dice similarity was employed for this duty. Let Vc and Vd be the preference vectors of devices C and D; then:

DevicesSimilarity(Vc, Vd) = 2|Vc ∩ Vd| / (|Vc|² + |Vd|²)
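A minimal sketch of these two computations, assuming the counts are already available to the caller; the names are illustrative, and the Dice denominator follows the squared-size reading of the reconstructed formula above.

```java
import java.util.HashSet;
import java.util.Set;

class PreferenceWeighting {

    // ec(wi) = Term-frequencyVc(wi) * inverse-profile-frequency(wi),
    // exactly as defined above; all counts are supplied by the caller.
    static double weight(int countInProfile, int profileSize,
                         int totalDevices, int profilesContaining) {
        double tf = (double) countInProfile / profileSize;
        double ipf = Math.log((double) totalDevices / profilesContaining);
        return tf * ipf;
    }

    // DevicesSimilarity(Vc, Vd) = 2|Vc ∩ Vd| / (|Vc|^2 + |Vd|^2)
    static double devicesSimilarity(Set<String> vc, Set<String> vd) {
        Set<String> common = new HashSet<>(vc);
        common.retainAll(vd); // |Vc ∩ Vd|
        double denom = (double) vc.size() * vc.size()
                     + (double) vd.size() * vd.size();
        return 2.0 * common.size() / denom;
    }
}
```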


In other words, any two devices C and D are deemed similar to each other if they share many generalized preferences; hence, these two devices will belong to the same circle. The sensitive preferences are kept at the owner's side. The procedure for SRR is as follows:

1. Every two devices C, D ∈ V own preference vectors ec(wi) and ed(wi). Each device applies a hash function h to its preference vector to generate Vc = h(ec(wi)) and Vd = h(ed(wi)). MMCS at device C produces an encryption key E and a decryption key U, then sends the encryption key E to device D. The similarity between the two devices is then calculated in two steps: first the numerator is computed, and then the denominator.
2. One of the gateways is selected as a trusted fog node for the aggregation process, and a topological formation is created between the devices to relay the calculated numerator values.
3. The masking agent at device D hides Vd by computing Bd = {ed(wi) × rD | wi ∈ Vd}, where rD is a random number chosen for each preference wi in its profile, and then sends Bd to device C.
4. The masking agent at device C signs Bd to obtain the signature Sd, then sends Sd back to D in the same order it was received. MMCS at device D unblinds Sd using the set of rD values and obtains the real signature SId, then applies the hash function h to SId to form SIHd = h(SId) (the blinding arithmetic is sketched after this list).
5. The masking agent at device C signs its own preference set Vc to get the signature SIc, applies the same hash function h to form SIHc = h(SIc), and then submits this to D.
6. The masking agent at device D compares SIHd and SIHc; utilizing its knowledge of Vd, D obtains the intersection set INC,D = SIHc ∩ SIHd, which represents |Vc ∩ Vd|. MMCS at D applies the hash function h to INC,D, encrypts this extracted value along with |Vd|, |Vc| and the devices' pseudonymous identities with the public key of the fog node, and sends this encrypted data to the fog node of this circle.
7. The fog node collects all these intermediate values, decrypts them, and then clusters the devices into different circles using the S-seeds clustering algorithm [9].

The above protocol executes these steps on m hashed generalized preferences hosted on m parties without revealing any of these preferences.
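Our reading of steps 3-4 is a textbook RSA blind-signature exchange: D blinds each hashed preference, C signs it without learning it, and D unblinds the result to recover the real signature. The sketch below illustrates only that arithmetic with toy parameters; it is not the authors' reference implementation.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Blinding/unblinding arithmetic behind SRR steps 3-4, using textbook
// RSA blind signatures (our interpretation of the protocol).
class BlindSignatureSketch {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        // Toy RSA key owned by device C (the paper uses 512-bit keys).
        BigInteger p = BigInteger.probablePrime(256, rnd);
        BigInteger q = BigInteger.probablePrime(256, rnd);
        BigInteger n = p.multiply(q);
        BigInteger e = BigInteger.valueOf(65537);
        BigInteger phi = p.subtract(BigInteger.ONE)
                          .multiply(q.subtract(BigInteger.ONE));
        BigInteger d = e.modInverse(phi);

        // Step 3: device D blinds a hashed preference m with a random r
        // (r is invertible mod n with overwhelming probability).
        BigInteger m = new BigInteger(128, rnd); // stands in for h(ed(wi))
        BigInteger r = new BigInteger(n.bitLength() - 1, rnd).mod(n);
        BigInteger blinded = m.multiply(r.modPow(e, n)).mod(n);

        // Step 4: device C signs the blinded value without learning m.
        BigInteger blindSig = blinded.modPow(d, n); // = m^d * r mod n

        // Device D unblinds to recover the real signature SId = m^d mod n.
        BigInteger sig = blindSig.multiply(r.modInverse(n)).mod(n);

        // Sanity check: equals a direct signature on m.
        System.out.println(sig.equals(m.modPow(d, n))); // prints true
    }
}
```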

6.2 Secure Circle Detection (SCD) Protocol

The masking agent in MMCS applies the SCD protocol to the virtual circles extracted by the SRR protocol. The SCD protocol learns, in a bilateral way, the correlated preferences in the generalized profiles; the final results are used to discover the relevant real-circles. The SCD protocol is based on the work in [9, 10]. The main idea behind SCD is that the sets of frequent preferences shared between the devices should be as large as possible in each real-circle. For this protocol, the devices need to be organized into a ring topological formation to discover the relevant real-circles. The SCD protocol can be outlined as follows:


1. The SCD protocol is initiated by the CSR. Users within the same virtual circle negotiate with each other to elect a fog node. The fog node publishes a catalog of 1-candidate frequent preferences. The registered devices execute a local function to extract local frequent preferences using local parameters for the support and closure. The algorithm presented in [20] is utilized to detect global and local frequent preferences for each virtual circle.
2. Each device Pi (i = 1, ..., n) encrypts the calculated local list with its own key and sends it to the next member Pi+1 in the virtual circle, and so on for all devices.
3. The last device in the circle, Pn−1, sends the received lists to the fog node. The fog node generates the global support by aggregating the local supports that were received, while the global closure is the intersection of the received local closures.
4. The fog node encrypts and publishes the lists of global supports and closures to device Pn−1 in random order. Device Pn−1 decrypts its encrypted contribution from these lists using its own private key and then forwards the lists to the next device, Pn−2. The fog node eventually receives these lists back encrypted only with its own key; hence, the final results can be obtained.
5. For each neighboring set of global frequent preferences, the fog node creates a basic real-circle RC that contains all devices sharing these global frequent preferences; these basic real-circles might overlap at the start. SCD uses these global frequent preferences as the real-circle representative.
6. For each device's profile Vr, the masking agent determines the best basic real-circle RCi using a scoring function (see the sketch after this list):

   SimilarityScore(RCi ← Vr) = [Σwi er(wi) × RC_support(wi)] − [Σwi' er(wi') × VC_support(wi')],

   where wi is a global frequent preference in profile r that is also common in the real-circle RCi, while wi' is a global frequent preference in profile r that is not common in the real-circle RCi. After applying this scoring function, each device belongs to exactly one real-circle. The representative is then re-calculated based on the current members of each real-circle.
7. For each virtual circle VC, the fog nodes compile a hierarchical structure of the discovered real-circles. The global frequent k-preferences of each real-circle are employed as representatives; hence, a real-circle with k preferences becomes available at level k of this hierarchical structure. The parent real-circle at level k−1 is a subset of its child real-circle at level k. The scoring function is also utilized to deduce the candidate parent for each child real-circle. The fog nodes in the various circles share the lists of discovered real-circles with one another. This essential step is used to merge similar real-circles together and to remove redundant real-circles according to inter-real-circle similarity. This similarity metric is very close to the previous scoring function; the only variation concerns the normalization, which is used to eliminate the effect of the variable number of members in each real-circle. The metric is measured as follows:

   RC_Similarity(RCi ← RCj) = [SimilarityScore(RCi ← ∀x=1..n Vx ∈ RCj) / (Σwj e(wj) + Σwj' e(wj'))] + 1

   Inter RC_similarity(RCi ↔ RCj) = RC_Similarity(RCi ← RCj) × RC_Similarity(RCj ← RCi).

   Here RCi and RCj are two real-circles; ∀x=1..n Vx ∈ RCj stands for the single conceptual profile of real-circle RCj; wj represents a global frequent preference in both RCi and RCj, while wj' represents a global frequent preference in RCj only and not in RCi; e(wj) and e(wj') are the weighted frequencies of wj and wj' in real-circle RCj.
8. Finally, any new device invokes its MMCS to obtain the list of available real-circles' representatives from the CSR and then executes the SRR and SCD protocols on its sensitive profile. MMCS measures the similarity with each representative, enrolls the device in the real-circle with the highest relevancy value, and then starts to recommend services associated with the other devices in that real-circle.
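As an illustration of the step-6 scoring function, the sketch below evaluates SimilarityScore(RCi ← Vr) given weight and support maps; the map representation and method names are ours, chosen to mirror the formula above.

```java
import java.util.Map;

// Sketch of step 6: score a device profile against one basic real-circle.
class CircleScoring {
    static double similarityScore(Map<String, Double> profileWeights, // er(w)
                                  Map<String, Double> rcSupport,      // RC_support(w) for RCi
                                  Map<String, Double> vcSupport) {    // VC_support(w) for the virtual circle
        double inCircle = 0.0, outOfCircle = 0.0;
        for (Map.Entry<String, Double> pref : profileWeights.entrySet()) {
            String w = pref.getKey();
            if (rcSupport.containsKey(w)) {
                inCircle += pref.getValue() * rcSupport.get(w);    // w common in RCi
            } else if (vcSupport.containsKey(w)) {
                outOfCircle += pref.getValue() * vcSupport.get(w); // w' not common in RCi
            }
        }
        // The device is enrolled in the real-circle maximizing this score.
        return inCircle - outOfCircle;
    }
}
```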

7 EXPERIMENTS

The experiments were conducted on two Intel® machines connected via a local network; the lead peer is an Intel® Core i7 and the other an Intel® Core 2 Duo. MySQL was used as the data storage for the devices' profiles. The CSR was deployed as a web service. MMCS was implemented as an applet to handle the complex interactions between the devices, the CSR and other registered devices. The proposed protocols were implemented in Java using the Bouncy Castle library, with the RSA key length set to 512 bits for all experimental scenarios. The experiments were conducted using a dataset pulled from the SportyPal network that was linked to another dataset containing 54 service recommendations for 30 IoT devices extracted from the TELUS IoT marketplace. To create the generalized profiles based on these profiles, we used the same method proposed in [10].
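For reference, the experimental key setup (512-bit RSA under the Bouncy Castle provider) can be reproduced roughly as follows; the paper does not publish its exact code, so this is only a sketch.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Security;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

class KeySetup {
    public static void main(String[] args) throws Exception {
        // Register Bouncy Castle and generate the 512-bit RSA keys
        // matching the experimental setup (too short for production use).
        Security.addProvider(new BouncyCastleProvider());
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA", "BC");
        kpg.initialize(512);
        KeyPair kp = kpg.generateKeyPair();
        System.out.println(kp.getPublic().getAlgorithm()); // "RSA"
    }
}
```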

Fig. 1. Service Recommendations Accuracy and Privacy


In order to evaluate the attained privacy level and the accuracy of the results of the proposed solution, precision and recall metrics were utilized, as shown in Fig. 1. As seen, a good quality is achieved by identifying virtual circles that involve different real-circles, which enables the extraction of accurate recommendations for the devices that share the same data. Also, the effect of each preference inside the real-circle can be easily calculated, which allows MMCS to detect and remove outlier values that are very far from the generalized preferences. We also evaluated the private preferences leaked by different devices when running the proposed solution. We considered devices that revealed a portion of their sensitive preferences in their generalized profiles; for each of these devices, we applied the attack procedure mentioned in the threat model to expose other hidden preferences in their sensitive profiles based on the real-circle they belong to. The obtained preferences were quantified and the results are shown in Fig. 1. As seen, the proposed solution can reduce the privacy leakage of exposed private preferences; moreover, the exposed preferences are only hashed hypernym phrases based on the sensitive preferences.

8 CONCLUSION AND FUTURE WORK

In this paper, we presented our attempt to develop a mobile-based middleware (MMCS) that runs at the users' gateways and allows the exchange of their information to facilitate service recommendations and the creation of real-circles without disclosing their preferences to other entities. A brief overview of the proposed protocols was given, and their performance was tested on a real dataset. The experimental and analytical results show that attaining higher privacy levels in service recommendation within the real-circles is feasible under the proposed approach without reducing the accuracy of the recommendations. A future research agenda includes utilizing game theory to better formulate virtual circles, as well as the release of multiple preferences and its impact on owners' privacy.

ACKNOWLEDGMENT. This work was partially financed by the "Dirección General de Investigación, Innovación y Postgrado" of Federico Santa María Technical University, Chile, in the project Security in Cyber-Physical Systems for Power Grids (UTFSM-DGIP PI.L.17.15), by the Advanced Center for Electrical and Electronic Engineering (AC3E) CONICYT-Basal Project FB0008, and by the Microsoft Azure for Research Grant (0518798).

References

1. L. Atzori, A. Iera, and G. Morabito, "Social internet of things: turning smart objects into social objects to boost the IoT," Newsletter.
2. J. He, W. W. Chu, and Z. V. Liu, "Inferring privacy information from social networks," pp. 154-165.
3. F. McSherry and I. Mironov, "Differentially private recommender systems: building privacy into the net," pp. 627-636.


4. E. Aimeur, G. Brassard, J. M. Fernandez, F. S. M. Onana, and Z. Rakowski, "Experimental demonstration of a hybrid privacy-preserving recommender system," pp. 161-170.
5. T. Hofmann and D. Hartmann, "Collaborative filtering with privacy via factor analysis," pp. 791-795.
6. J. Canny, "Collaborative filtering with privacy," pp. 45-57.
7. B. N. Miller, J. A. Konstan, and J. Riedl, "PocketLens: Toward a personal recommender system," ACM Transactions on Information Systems (TOIS), vol. 22, no. 3, pp. 437-476, 2004.
8. I. S. Rubinstein, "Regulating privacy by design," Berkeley Technology Law Journal, vol. 26, no. 3, pp. 1409-1456, 2011.
9. A. M. Elmisery, K. Doolin, and D. Botvich, Privacy Aware Community based Recommender Service for Conferences Attendees: IOS Press, 2012.
10. A. M. Elmisery, K. Doolin, I. Roussaki, and D. Botvich, "Enhanced Middleware for Collaborative Privacy in Community Based Recommendations Services," Computer Science and its Applications: CSA 2012, S.-S. Yeo, Y. Pan, S. Y. Lee and B. H. Chang, eds., pp. 313-328, Dordrecht: Springer Netherlands, 2012.
11. F. Beil, M. Ester, and X. Xu, "Frequent term-based text clustering," in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Edmonton, Alberta, Canada, 2002, pp. 436-442.
12. B. C. M. Fung, "Hierarchical document clustering using frequent itemsets," Master's Thesis, Simon Fraser University, 2002.
13. A. M. Elmisery, S. Rho, and D. Botvich, "Privacy-enhanced middleware for location-based sub-community discovery in implicit social groups," The Journal of Supercomputing, vol. 72, no. 1, pp. 247-274, 2015.
14. A. M. Elmisery, S. Rho, and D. Botvich, "Collaborative privacy framework for minimizing privacy risks in an IPTV social recommender service," Multimedia Tools and Applications, pp. 1-31, 2014.
15. A. M. Elmisery, "Private personalized social recommendations in an IPTV system," New Review of Hypermedia and Multimedia, vol. 20, no. 2, pp. 145-167, 2014.
16. A. Elmisery and D. Botvich, "Enhanced Middleware for Collaborative Privacy in IPTV Recommender Services," Journal of Convergence, vol. 2, no. 2, pp. 10, 2011.
17. A. M. Elmisery and D. Botvich, "Agent Based Middleware for Maintaining User Privacy in IPTV Recommender Services," Security and Privacy in Mobile Information and Communication Systems: Third International ICST Conference, MobiSec 2011, Aalborg, Denmark, May 17-19, 2011, Revised Selected Papers, R. Prasad, K. Farkas, A. U. Schmidt, A. Lioy, G. Russello and F. L. Luccio, eds., pp. 64-75, Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.
18. A. M. Elmisery and D. Botvich, "An Agent Based Middleware for Privacy Aware Recommender Systems in IPTV Networks," Intelligent Decision Technologies: Proceedings of the 3rd International Conference on Intelligent Decision Technologies (IDT' 2011), J. Watada, G. Phillips-Wren, L. C. Jain and R. J. Howlett, eds., pp. 821-832, Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.
19. F. Sebastiani, "Machine learning in automated text categorization," ACM Comput. Surv., vol. 34, no. 1, pp. 1-47, 2002.
20. D. W. Cheung, J. Han, V. T. Ng, A. W. Fu, and Y. Fu, "A fast distributed algorithm for mining association rules," in Proceedings of the fourth international conference on Parallel and distributed information systems, Miami Beach, Florida, United States, 1996, pp. 31-43.


Using Archetypes for Interoperability on Clinical Management Scenarios: A Case Study on Aedes Aegypti's Arboviruses

Fabio Gomes1, Cesar Moura2, Joao Jose Filho3, Arthur Bezerra4, Oton Braga5, Odorico Monteiro6 and Mauro Oliveira7

1 Instituto Federal do Ceara, Aracati, Ceara, Brasil [email protected] 2 Instituto Federal do Ceara, Fortaleza, Ceara, Brasil. [email protected] 3 Instituto Federal do Ceara, Aracati, Ceara, Brasil [email protected] 4 Instituto Federal do Ceara, Aracati, Ceara, Brasil [email protected] 5 Instituto Federal do Ceara, Aracati, Ceara, Brasil [email protected] 6 Congresso Nacional do Brasil, Brasilia, DF, Brasil [email protected] 7 Instituto Federal do Ceara, Aracati, Ceara, Brasil [email protected]

Abstract The quality of the services provided to patients in the health area has a strong correlation with the quality of the related clinical information. Furthermore, this information should be consistent, secure and available to health professionals, but these data are often distributed among heterogeneous systems. The Electronic Patient Record (EPR) has been proposed and is intended to minimize integration problems in the construction of health information systems. This work proposes a methodology for the development of interoperable and flexible systems, using the EHRServer framework of the OpenEHR standard. As a case study, this methodology has been applied in Aracati/CE since March 2017, in the context of the Chikungunya disease. As a primary result, a template (i.e., a set of archetypes) that represents the clinical management of the Chikungunya disease has been created and is available on the Clinical Knowledge Manager (CKM), the largest online repository of archetypes available on the Web.

Contents

1 Introduction

2 Theoretical Framework And Related Work
  2.1 OpenEHR
  2.2 Related Work




3 Proposed Methodology
  3.1 Applied Methodology

4 Future Work

5 Conclusion

6 Acknowledgments

1 Introduction

The Brazilian government and private health service providers have been investing in information technology (IT) for the construction of Health Information Systems (HIS) to help professionals improve people's health care. One of the solutions is the development of the Electronic Patient Record (EPR), which replaces the paper medical record with an electronic medical record and stores the patient's information in a health center [1]. Ideally, these systems should be interoperable, making this information available wherever it is expected to be. However, most of the EPRs used in health centers are proprietary systems built with different architectures, business rules, information technologies and models, in addition to the use of diverse clinical terminologies. These problems make it difficult for health professionals to provide adequate care to the patient [2] and hinder interoperability among HIS. In addition, the diversity of these systems and the complexity of the health area lead to problems such as: fragmentation of the clinical history across various HIS, relevant patient information being unavailable at a new health facility, medical errors due to the lack of information about the patient, redundant clinical information spread across the various EPRs, omission of test results (obliging the patient to redo such procedures), and greater expenditure of time and resources. All these problems negatively impact both patient care and information management for the decision-making of health professionals and managers [14]. A solution for this problem could be interoperability through interfaces, in which systems exchange information with each other. However, this solution is not reusable and is inappropriate for a large number of systems, since each newly added system would require a new interface [10]. The most appropriate solution for achieving interoperability is by means of consensus standards [2]. These standards provide the means for two or more systems to exchange and use information without adding new functionality, requiring only that the systems that need to communicate use consensus standards [9]. In Brazil, there is an effort to use interoperability standards for HIS. One initiative was Decree 2.073/2011, published by the Ministry of Health, in which twelve standards for interoperability between HIS were adopted. Among these standards is OpenEHR [8], which proposes the fast construction of flexible and interoperable EHRs [5]; for this reason, it was chosen as the standard for this work. Therefore, this work proposes a methodology for the development of interoperable and flexible systems, using the EHRServer framework of the OpenEHR standard. As a case study, this methodology is being applied in Aracati/CE, since March 2017, in the context of the Chikungunya disease. As a primary result, a template (i.e., a set of archetypes) that represents the clinical management of the Chikungunya disease has been created and is available on the Clinical Knowledge Manager (CKM), the largest online repository of archetypes available on the Web. This system aims to support primary care professionals throughout the treatment of the disease, given that this region is being affected by epidemics caused by the mosquito Aedes aegypti, specifically the Chikungunya arbovirus [7].




Figure 1: Dual model of the openEHR standard. Source: openEHR (2017)

2 Theoretical Framework And Related Work

2.1 OpenEHR

The OpenEHR standard is characterized as a set of specifications and free tools that allow the development of clinical records in a modular way [6]. This standard is maintained by the OpenEHR Foundation, which proposes, in particular, the development of Electronic Health Records (EHR) capable of keeping up with the dynamism and complexity of the health area, thus generating open, flexible, independent and interoperable systems [11]. The purpose of this standard is to represent clinical knowledge in a structured way by standardizing and organizing knowledge-domain data through models called archetypes (metadata standards). Archetypes represent clinical information of various natures, for example, heart rate [13].

2.1.1 OpenEHR Clinical Model

The archetypes are kept outside the system code, stored in an online repository called the Clinical Knowledge Manager (CKM), allowing medical experts to manage them independently from system analysts and IT specialists [5]. This allows health professionals themselves to represent complex concepts such as "blood pressure" or "family sickness history", especially through the reuse or definition of new archetypes [6]. Thus, as illustrated in Figure 1, there is an important and crucial separation of roles for the successful development of EHR solutions [5]:

Domain modellers: Health professionals define models that describe consensual concepts and create template models that will be used in health facilities. The template is a set of archetypes that represents a clinical record. The manipulation of these archetypes is done through specific software tools such as the Archetype Editor, which maps them into formal languages, namely ADL (Archetype Definition Language) or XML (eXtensible Markup Language).

Software developers: Information technology professionals develop software that understands these models and adjusts to their definitions, ensuring version control of the information and data management capability. From the formal representation, the archetypes can be implemented in the software, using some of the frameworks or libraries made available by the OpenEHR Foundation [4].

2.1.2 EHR Server

The EHRServer is an Electronic Health Record (EHR) server standardized with the OpenEHR standard and used as a clinical information repository. Through this server, it is possible




Figure 2: Proposed Methodology

to register and query EHR data in a Service Oriented Architecture. This architecture allows distinct systems to connect to, send data to, and request data from the EHRServer. The EHRServer uses the MySQL database manager and the Groovy programming language to generate software artifacts that write files to a database; in addition, the server is abstracted into three layers: a data layer, a logical layer, and a persistence layer (where data is stored). The EHRServer application program interface (API) allows an Electronic Medical Record (EMR) application to be integrated using two basic services: Commit, which sends a patient's EHR data to the EHRServer, and Query, which retrieves a patient's EHR data from the EHRServer.
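As an illustration of these two services, the sketch below commits a composition and queries a patient's data over plain HTTP. The endpoint paths and parameters are placeholders of our own, not the EHRServer's documented API.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class EhrServerClientSketch {
    static final String BASE = "http://localhost:8080/ehrserver"; // placeholder

    // Commit: send a patient's EHR data (an OpenEHR composition) to the server.
    static int commit(String ehrUid, String compositionXml) throws Exception {
        URL url = new URL(BASE + "/commit?ehrUid=" + ehrUid); // hypothetical path
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/xml");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(compositionXml.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    // Query: retrieve a patient's EHR data from the server.
    static int query(String ehrUid) throws Exception {
        URL url = new URL(BASE + "/query?ehrUid=" + ehrUid); // hypothetical path
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        return con.getResponseCode();
    }
}
```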

2.2 Related Work

There is a community effort on the road to standardizing systems through the use of archetypes. [12] defines an archetype model to map the formal clinical knowledge base for the construction of computational artifacts focused on telehealth. In turn, [3] implements an interoperable EHR based on the openEHR standard for the basic care of the Bahia State Harmony Foundation (FLH/BA, in Portuguese). Although it used a traditional system development methodology (analysis, design, implementation, testing and deployment), it was the first proof of concept of the openEHR standard in Brazil, serving as the basis for other standards-based HIS. [15] proposed a shareable EHR architecture and the creation of twenty archetypes for the primary care of the Minas Gerais Health Secretariat (SES/MG, in Portuguese) using ISO 13606. This work contributed to the development of an EHR system using two-level modeling for primary care.

3 Proposed Methodology

There is a big challenge when it comes to developing interoperable HIS. However, the openEHR standard provides a set of features for building interoperable software that may be modified later without changing source code, unlike the traditional methodology. An important framework of this standard that facilitates this process is the EHRServer, which will be used in this work. Based on this new paradigm, we propose the following methodology (Figure 2) for the generation of the proposed solution:

1. Requirements: Initial phase, where the system requirements are defined. A formal specification is written describing the functions of the future system.




2. Archetypes: In this phase, the archetypes are searched for in the CKM. These archetypes (metadata) represent the requirements of the health information system.
3. Composition: Then, the archetypes are clustered into a single base archetype (composition), giving rise to the template.
4. Exportation: After the template creation, it is exported, usually as an Operational Template (OPT)¹ or an XML Schema Definition (XSD)².
5. Template Upload: Then the template is uploaded to the EHR server. In this work we use the clinical server EHRServer, which allows archetypes and templates to be read for processing and manipulation.
6. Template Implementation: Once the template has been read by the EHRServer, it is implemented using a framework and other necessary tools.
7. System Form Generation: After the application is deployed, the HTML form for interaction with the end user is created and made available through a browser.

3.1 Applied Methodology

In order to validate the proposed methodology, a case study is being carried out implementing the Chikungunya clinical management system for a basic health unit in Aracati/CE. The case study is described in detail below.

1. The system requirements are based on the Chikungunya clinical management protocol (2017 edition)³, made available by SUS, the Brazilian health system. The objective is to implement the registry of the clinical procedures adopted in suspected cases of the disease, which are summarized in four actions: Anamnesis (record the patient's clinical history); Physical Examination (record information on physical examination findings); Blood Examination (register the results of the laboratory tests done to confirm the suspected case); and Conduct (record the clinical conduct, such as prescription of medications and contraindications).
2. Once the software requirements are defined, the Clinical Knowledge Manager (CKM) is searched for the archetypes that represent the clinical knowledge of each action. In case an archetype does not yet exist in the CKM, it is created using an appropriate tool, such as the Archetype Editor. Next, we present the Chikungunya archetypes:

• openEHR-EHR-COMPOSITION.encounter.v1: Base archetype for the others. It represents a composition through which the other archetypes are integrated.
• openEHR-EHR-OBSERVATION.IdParent.v1: Archetype for the identification of the person registered in the system, holding an identification code, personal data and demographic data.
• openEHR-EHR-OBSERVATION.history.v1: Archetype that represents the patient's medical history (anamnesis).

¹ Operational Template, a standard format used by frameworks in the implementation.
² XML Schema Definition, a data format recommended by the W3C Consortium.
³ Available at http://portalquivos.saude.gov.br/images/pdf/2016/December/25/chikungunya-new-protocol.pdf.




Figure 3: Mind map of Serology Archetype

• openEHR-EHR-CLUSTER.exam.v1: Archetype that represents the physical examination findings for the patient.
• openEHR-EHR-OBSERVATION.Serology.v1: Archetype that represents the patient's blood test result. The confirmation of the arbovirus infection comes from a blood test. This archetype was created to represent information on hemoglobin, hematocrit, leukocytes and platelets; the values of this information are essential to confirm or rule out a suspected case. Figure 3 shows the mind map of this archetype, which also serves to represent the serology of the Dengue and Zika virus arboviruses. This archetype was then submitted to the CKM.
• openEHR-EHR-SECTION.clinicalDecision.v0: Archetype that represents information involving medical conduct, such as indications and contraindications.

3. After defining the set of archetypes, it is clustered into a single composition archetype, and then the system template is generated. It consists of the set of relevant clinical information to be printed through a form, resulting in the system interface. Figure 4 shows the mind map of this template.
4. After the template creation, it is exported, usually as an Operational Template (OPT) or an XML Schema Definition (XSD).
5. Then the template is uploaded to the server. In this work we use the clinical server EHRServer, which allows archetypes and templates to be read for processing and manipulation.
6. The template is implemented using the Grails framework, generating the system source code. After that, the HTML forms of the applications are created for interaction with the end user.

This methodology can be applied to any other interoperable system development process based on the openEHR standard. Figure 4 shows the mind map that represents the clinical management of Chikungunya. This template was submitted to the CKM as a proof of validation.

4 Future Work

This system is being implemented on the EHRServer architecture, using the Groovy programming language and the MySQL database. To date, the template representing the clinical management record for Chikungunya has been defined. This model was designed with the assistance of health professionals from the municipality of Aracati and then submitted to the CKM for sharing and validation.





Figure 4: Mind Map of Chikungunya Template

After the implementation of the system, this work aims to deploy the solution in a basic health unit to assist primary care professionals, especially in the clinical management of Chikungunya. It will later include the clinical management of the Dengue and Zika virus arboviruses.

5 Conclusion

The challenges are great when it comes to HIS interoperability. However, the openEHR standard has several features to overcome such challenges. As demonstrated, it is possible to develop an HIS in an efficient, scalable, flexible, changeable, economically viable, open-source and interoperable way using the proposed architecture. The methodology used in this work can be adopted in the development of similar systems or in adaptations of existing systems.

6 Acknowledgments

This work was supported by FUNCAP (Fundação Cearense de Apoio ao Desenvolvimento Científico e Tecnológico, in Portuguese) under the Program of Research Productivity Grants, Incentive for Interiorization and Technological Innovation - BPI, FUNCAP Edital No. 09/2015.

References

[1] Tiago Veloso Araujo, Silvio Ricardo Pires, and Paulo Bandiera-Paiva. Adoção de padrões para registro eletrônico em saúde no Brasil. Revista Eletrônica de Comunicação, Informação & Inovação em Saúde, 8(4), 2014.
[2] Gustavo Bacelar and Ricardo Correia. openEHR. 2015.
[3] Gustavo M Bacelar-Silva, Hilton César, Patricia Braga, and Rodney Guimaraes. OpenEHR-based pervasive health information system for primary care: First Brazilian experience for public care. In Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on, pages 572–873. IEEE, 2013.
[4] T Beale, S Heard, D Kalra, and D Lloyd. Archetype definition language (ADL). OpenEHR specification, the openEHR foundation, 2005.
[5] Thomas Beale, Sam Heard, D Kalra, and D Lloyd. OpenEHR. Null Flavours and Boolean data in openEHR. [article on the Internet], 2007.
[6] Thomas Beale, Sam Heard, Dipak Kalra, and David Lloyd. OpenEHR architecture overview. The OpenEHR Foundation, 2006.




[7] Brazil. Ministério da Saúde. Monitoramento dos casos de dengue, febre de chikungunya e febre pelo vírus Zika até a semana epidemiológica 35, 2017. Accessed: 2017-09-29.
[8] Rigoleta Dutra Mediano Dias. Modelagem do padrão TISS por meio do enfoque dual da fundação openEHR. 2011.
[9] Sebastian Garde, Petra Knaup, Evelyn JS Hovenga, and Sam Heard. Towards semantic interoperability for electronic health records–domain knowledge governance for openEHR archetypes. Methods of Information in Medicine, 46(3):332–343, 2007.
[10] Juçara Salete Gubiani, Rafael Port da Rocha, and Marcos Cordeiro d'Ornellas. Interoperabilidade semântica do prontuário eletrônico do paciente. Simpósio de Informática da Região Centro do RS (2.: 2003: Santa Maria, RS). Anais. Santa Maria: SIRC, 2003.
[11] Dipak Kalra, Thomas Beale, and Sam Heard. The openEHR foundation. Studies in Health Technology and Informatics, 115:153–173, 2005.
[12] Marcia Narumi Shiraishi Kondo. Mapeamento da base de conhecimento fundamentado em arquétipos: contribuição à informática em saúde. PhD thesis, Universidade de São Paulo, 2012.
[13] Heather Leslie. openEHR–the world's record. Pulse+ IT, 6:50–55, 2007.
[14] Christiano Pereira Pessanha and Marcello Peixoto Bax. Implementando o prontuário eletrônico openEHR em sistemas gestores de conteúdo: uma aproximação. Tendências da Pesquisa Brasileira em Ciência da Informação, 8(2), 2015.
[15] Marcelo Rodrigues dos Santos. Sistema de registro eletrônico de saúde baseado na norma ISO 13606: aplicações na Secretaria de Estado de Saúde de Minas Gerais. Perspectivas em Ciência da Informação, 16(3):272–272, 2011.



A Publish/Subscribe QoS-aware Framework for Massive IoT Traffic Orchestration

Pedro F. Moraes,1 Rafael F. Reale1,2 and Joberto S. B. Martins1

1 PPGCOMP - Salvador University - UNIFACS, Brazil 2 Instituto Federal da Bahia - IFBA - Valenca, Brazil

Abstract. Internet of Things (IoT) application deployment requires the allocation of resources such as virtual machines, storage, and network elements that must be deployed over distinct infrastructures such as cloud computing, Cloud of Things (CoT), datacenters and backbone networks. For massive IoT data acquisition, a gateway-based data aggregation approach is commonly used, featuring seamless sensor/actuator access and providing cache/buffering and preprocessing functionality. In this perspective, gateways acting as producers need to allocate network resources to send IoT data to consumers. In this paper, a Publish/Subscribe (PubSub) quality-of-service (QoS) aware framework (PSIoT-Orch) is proposed that orchestrates IoT traffic and allocates network resources between aggregators and consumers for massive IoT traffic. PSIoT-Orch schedules IoT data flows based on their configured QoS requirements. Additionally, the framework allocates network resources (LSP/bandwidth) over a controlled backbone network with limited and constrained resources between IoT data producers and consumers. Network resources are allocated using a Bandwidth Allocation Model (BAM) to achieve efficient network resource allocation for scheduled IoT data streams. PSIoT-Orch adopts an ICN (Information-Centric Network) PubSub architectural approach to handle IoT data transfer requests among framework components. The proposed framework aims at gathering the inherent advantages of an ICN-centric approach using a PubSub message scheme while allocating resources efficiently, keeping QoS awareness and handling restricted network resources (bandwidth) for massive IoT traffic.

Keywords: Internet of Things (IoT), IoT Framework, Resource Allocation, Publish/Subscribe, Information-Centric Network (ICN), Quality of Service (QoS), Bandwidth Allocation Model (BAM), Edge Computing.

1 Introduction

The Internet of Things (IoT) is considered an important trend in many areas like smart cities, smart grid, e-health, industry and the future Internet [1] [2]. As such, a large effort is being undertaken to find suitable technologies, standards, middlewares and architectures to support IoT application deployment. IoT deployment typically requires the orchestration of heterogeneous resources that are


allocated over distinct infrastructures such as cloud computing, cloud of things (CoT), datacenters and backbone networks [1]. IoT devices like sensors and actuators are deployed in large quantities in most IoT applications and, in addition, they differ considerably in terms of processing, storage and functional capabilities. Moreover, IoT setups generate a huge amount of heterogeneous traffic, leading to increasing quality of service (QoS), resource allocation and network configuration complexity. Paradigms such as Cloud Computing [3] and, more recently, Fog Computing [4] arise to alleviate the weight that massive IoT data processing and traffic have on networks and devices. In particular, Fog Computing mostly operates at the edges of the network to minimize the load on the whole network by serving already processed and aggregated data. However, traffic might still need to be sent across the network to interested clients and, as the number of IoT devices increases and spreads geographically, processed IoT traffic served by IoT/Fog-like aggregators along the edges might still heavily load the network if no specific traffic management is done. In an attempt to address this scenario, we present a Pub/Sub QoS-aware framework (PSIoT-Orch) for managing massive IoT traffic aggregated into Fog-like IoT gateways along the network edge. This framework allows IoT quality-of-service traffic management according to network-wide specifications, the application domain and IoT characteristics. Aspects like the backbone network topology, network traffic saturation and IoT domain requirements are considered. In this article we describe the PSIoT-Orch framework components, their relation to IoT requirements and how they are combined to seamlessly manage IoT traffic. In Section 2 we explore proposed IoT-oriented architectures and how they pertain to our framework. Section 3 is an overview of the framework components as well as their key features and purpose. In Section 5 we describe how we built our simple proof-of-concept implementation and network evaluation scenario. Finally, Section 6 concludes with an overview of what has been achieved.

2 Key IoT Aspects and Related Work

IoT has potential for the creation of new intelligent applications in nearly every field, with its devices enabling local or mobile sensing and actuation services. The different fields of application can be organized in different ways into various domains like industrial, smart city and health, among others [2]. Across standardized and practical fields, IoT devices share common features and requirements in traffic and usage. Our framework builds on dealing with distinct IoT requirements, namely heterogeneity, scalability and QoS. While IoT devices, traffic characteristics and requirements are quite well defined, architectures for IoT generally have a hard time maintaining interoperability with each other. In 2009, the ETSI Technical Committee for Machine-to-Machine communications (ETSI TC M2M) was established to develop a reference IP-based architecture relying on existing technologies [5]. This architecture has


three domains: a) the Application Domain, where client and M2M applications reside; b) the Network Domain, consisting of any network between applications and device gateways; and c) the Device & Gateway Domain, where all the devices and/or gateways are located. Our framework fits well in this sort of architecture, building a bridge in the Network Domain between device gateways and applications. One approach for massive-scale IoT data dissemination is presented in [6]. In that proposal, remote and rural areas are the focus, and an ad-hoc interconnection infrastructure is adopted for IoT traffic transport using a mix of low-power wireless personal area networks (LoWPAN) and wireless sensor networks (WSN). To the best of our knowledge, the communication resources between IoT aggregators and consumer applications mostly rely on cloud computing, fog-like services, Internet connectivity, ad-hoc solutions or a mix of them [1] [4]. Our proposal, distinctly from commonly used approaches, uses a controlled network with limited resources in which bandwidth utilization and optimization is the focus.

3 PSIoT-Orch Framework QoS Awareness, Network Resource Allocation and PubSub Orchestration

The overall goal of the proposed PSIoT-Orch framework is to manage the massive traffic generated by a huge number of IoT devices, aiming to handle network resources and IoT QoS requirements efficiently over the network between the IoT devices and consumer IoT applications. Consumers might be hosts on the backbone network, cloud computing infrastructures accessed through the network or any other scheme that makes use of the managed network infrastructure for communication (Figure 1). IoT gateways (IoTGW-Ag) act as traffic aggregators interacting with IoT devices. The IoT gateway traffic aggregators use a PubSub-style architecture lined along the network edge to transmit their data to application clients (consumers), with these transmissions being mediated by a centralized orchestrator (PSIoT-Orchestrator) (Figure 1).

3.1 QoS-aware IoT Traffic Resource Deployment

The PSIoT-Orch framework offers a set of QoS levels based on time-sensitivity to deal with the requirements of massive IoT data transmission:

– Insensitive: best-effort transmission (e.g., weather data, non-critical smart home data).
– Sensitive: low data transmission delay (e.g., commercial data, security sensors).
– Priority: high transmission rate and low delay (e.g., health care, critical industrial sensors).
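In code, these three levels map naturally to an enumeration carried with every subscription; a minimal sketch with names of our own choosing:

```java
// Illustrative mapping of the three PSIoT-Orch QoS levels to traffic classes.
enum QosLevel {
    INSENSITIVE(0), // best-effort (e.g., weather data)
    SENSITIVE(1),   // low delay (e.g., security sensors)
    PRIORITY(2);    // high rate and low delay (e.g., health care)

    final int trafficClass; // TC used for scheduling over the backbone

    QosLevel(int trafficClass) { this.trafficClass = trafficClass; }
}
```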


Fig. 1. PSIoT-Orch framework basic components.

The PSIoT-Orch QoS levels are intended to allow the creation of IoT traffic classes (TCs) over the backbone. These time-sensitivity classes group IoT traffic with similar application requirements and allow a QoS-aware arbitration of massive IoT data flows over the backbone network. Traffic scheduling is done for these defined classes and requires the allocation of network resources. As indicated, subsequently and independently, PSIoT-Orch operation deploys network communication resources (LSPs, bandwidth) based on the IoT traffic classes (TCs). As such, applications' IoT data flows are simultaneously prioritized by the QoS scheduler while communication resources are efficiently allocated over the network using a bandwidth allocation model (BAM) based strategy.

3.2 Network Resource Allocation based on Bandwidth Allocation Model (BAM)

Once IoT traffic is scheduled according to its QoS requirements, a further aspect mediated by the PSIoT-Orch framework is the effective network resource allocation (LSP/bandwidth) between producer and consumer(s). The architectural approach adopted by the framework is to delegate the allocation of bandwidth resources to a bandwidth allocation model (BAM). Using a BAM to allocate resources is an implementation option that is aligned with the fact that backbone network resource availability is assumed to be restricted. This is a realistic scenario for typical backbone setups interconnecting IoT devices through private, metropolitan or long-distance networks (like smart cities or large distributed IoT setups). BAMs, in turn, have a proven capability to distribute and manage scarce network resources efficiently [7]. Consequently, relying on BAMs as a broker-like entity to allocate bandwidth is a

81 Proceedings ADVANCE 2018 ISBN 978-2-9561129

relevant option that must be considered for the PSIoT-Orch framework. This is especially valid when the massive IoT traffic generated by producers reduces the overall network resource availability and bandwidth disputes occur.
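As an illustration of BAM-style brokering, the sketch below performs a simplified MAM-like (Maximum Allocation Model) admission check per traffic class. It is a simplification of the BAM behaviors surveyed in [7]; the names and the per-class policy are our assumptions, not the framework's actual implementation.

```java
// Simplified MAM-style bandwidth admission check per traffic class (TC).
class MaxAllocationModel {
    private final double[] maxPerClass;  // bandwidth constraint per TC
    private final double[] usedPerClass; // currently allocated per TC
    private final double linkCapacity;

    MaxAllocationModel(double[] maxPerClass, double linkCapacity) {
        this.maxPerClass = maxPerClass;
        this.usedPerClass = new double[maxPerClass.length];
        this.linkCapacity = linkCapacity;
    }

    // Admit an LSP request of 'bw' for traffic class 'tc' only if both the
    // per-class constraint and the overall link capacity hold.
    synchronized boolean admit(int tc, double bw) {
        double totalUsed = 0;
        for (double u : usedPerClass) totalUsed += u;
        if (usedPerClass[tc] + bw > maxPerClass[tc]) return false;
        if (totalUsed + bw > linkCapacity) return false;
        usedPerClass[tc] += bw;
        return true;
    }
}
```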

3.3 PSIoT-Orch PubSub Message Scheme as an Information-Centric Network (ICN)

The PSIoT-Orch framework uses a Publish/Subscribe (PubSub) Information-Centric Network (ICN) message scheme in which an information-based access model is used [8]. As such, consumers try to access named content (IoT data) without any direct mapping to the transport mechanism used over the network that interconnects them with the data producers. This message scheme provides the inherent ICN advantages to the framework, including dynamic discovery and dissemination, efficient distribution of content, the potential to factorize functionalities, and in-aggregator caching that may save energy and increase local content availability [9] [10] [11].

4 PSIoT-Orch Framework

The PSIoT-Orch framework has four main components that interact to provide network resource allocation with quality-of-service awareness (Figure 1):

– IoT gateway data aggregators (IoTGW-Ag) acting as "Producers" and IoT applications acting as "Consumers";
– the "PSIoT-Orchestrator" acting as the mediator; and
– the backbone network efficiently interconnecting the previous components.

In addition, the PSIoT-Orch framework's functional operation is composed of two distinct and cooperating functional entities:

– the PubSub message scheme allowing the asynchronous request of IoT data over the entire infrastructure; and
– the quality of service (QoS) and network resource allocation policy and implementation scheme provided by the "PSIoT-Orchestrator".

In the following, these architectural components and functional entities are described, together with an overall framework overview.

4.1 PSIoT-Orch Framework Overview

The PSIoT-Orch framework's main role is to manage and monitor the traffic output from each IoT aggregator according to the overall aggregator throughput and the underlying network resource availability and usage (saturation). In PSIoT-Orch, each IoT aggregator works in a paradigm similar to Fog Computing. Each aggregator node is lined up along the network edge collecting

82 Proceedings ADVANCE 2018 ISBN 978-2-9561129

IoT traffic from local devices and offering them to applications via a topic-based PubSub interface. While PSIoT-Orch framework only manage traffic aggrega- tion, it can also be further extended to perform Fog-like capabilities such as IoT raw data processing and manipulation.

4.2 PSIoT-Orch Gateway Aggregator

Each PSIoT-Orch gateway aggregator (IoTGW-Ag) has two main roles: a) gathering IoT traffic from local devices, and b) sending consumers their subscribed data, observing the transmission effort each QoS level should follow, as defined by the orchestrator (Figure 2).

Fig. 2. IoTGW-Ag aggregator internal structure.

The IoTGW-Ag section that deals with the actual gathering of IoT data from devices can be based on MQTT (Message Queue Telemetry Transport) technology [12], on traffic generators for simulated devices, or on other generalized data gateways that interface with the IoTGW-Ag. All these options must preserve the topic-based nature of the framework's subscription data. The network-facing section of the IoTGW-Ag deals with consumer topic requests, via an HTTP PubSub API, sending the orchestrator the metadata related to the IoTGW-Ag's own subscriptions, transmission rates and buffer state, as well as keeping track of the orchestrator-defined transmission rates for each QoS level.

4.3 The PubSub QoS Configuration Message Scheme

Figure 3 shows the initial flow of configuration messages between subscribers, providers and the orchestrator that enables the IoT data transfers:


(a) A consumer initiates a topic subscription, with a requested QoS level, to a specific IoTGW-Ag.
(b) The IoTGW-Ag sends the orchestrator relevant metadata, such as the number of subscribers, their QoS levels and the buffer allocation.
(c) The orchestrator notifies the IoTGW-Ag of the amount of bandwidth that can be consumed by each QoS level (TC).
(d) The IoTGW-Ag publishes the data to the client according to bandwidth and data availability in the buffer.

As IoTGW-Ags receive application topic subscriptions, they must notify the orchestrator so as to keep up to date the information the orchestrator needs to manage each aggregator's data output. While the framework's orchestration is not distributed, a failure of the orchestration component would not entail the failure of IoTGW-Ag data delivery, since the aggregators maintain a set of predefined output rates to be used in case an orchestrator failure occurs.

Fig. 3. PSIoT-Orch PubSub messages.
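As an illustration of this exchange, the metadata of step (b) and the notification of step (c) can be thought of as small structured messages. The following Python sketch shows hypothetical payloads; the paper does not fix field names or a wire format, so everything below is illustrative only.

    # Hypothetical JSON payloads for the configuration exchange of Figure 3.
    # Field names are illustrative; the paper does not define a wire format.
    import json

    # (b) IoTGW-Ag -> orchestrator: metadata about current subscriptions.
    metadata_msg = {
        "aggregator_id": "ag1",
        "subscriptions": [
            {"topic": "temperature", "subscribers": 3, "qos_level": 0},
            {"topic": "humidity", "subscribers": 1, "qos_level": 2},
        ],
        "buffer_usage": {"0": 0.10, "1": 0.45, "2": 0.80},  # fraction full per QoS level
    }

    # (c) orchestrator -> IoTGW-Ag: bandwidth granted per QoS traffic class (kb/s).
    grant_msg = {"aggregator_id": "ag1", "rates": {"0": 250, "1": 350, "2": 450}}

    print(json.dumps(metadata_msg, indent=2))
    print(json.dumps(grant_msg, indent=2))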

4.4 The PubSub Network Resource Allocation Message Scheme

Once IoT data has been published with its attributed QoS level, consumers may request it. At this point, a second level of intervention by the PSIoT-Orchestrator is necessary to create an effective communication channel (LSP/bandwidth) between consumer(s) and producer over the backbone network that interconnects them.

Figure 3 details the flow of information exchanged between any IoTGW-Ag and the orchestrator. This communication is asynchronous: subscribing applications are not aware of it, nor does their subscription depend on the communication with the orchestrator. Considering an IoTGW-Ag that uses an MQTT broker to aggregate IoT data from devices in the local network, this process can be described step by step:

1. Consumers subscribe to any given IoTGW-Ag, requesting their desired topics and QoS levels. Consumers must be aware of the three QoS levels available in the framework.
2. Internally, the topic subscriptions are recorded in the IoTGW-Ag's list of active subscriptions.
3. Subscriptions are registered in the IoTGW-Ag's internal IoT MQTT broker.
4. Sensor data is published from IoT devices to the IoTGW-Ag's internal IoT MQTT broker.
5. The subscribed data is passed on to the IoTGW-Ag's buffers with different QoS levels, according to each topic subscriber's requested QoS level.
6. The IoTGW-Ag buffers store the messages that should be published to subscribed applications.
7. The IoTGW-Ag requests the allocation of communication resources from the PSIoT-Orchestrator, and a Label Switched Path (LSP) is allocated (or not) considering an efficient distribution of the available network resources.
8. Each buffer is emptied (IoT data is transmitted to the application) using the communication channel (LSP or other) allocated by the BAM model, consistent with its QoS level as determined by the orchestrator.

Fig. 4. PSIoT-Orch framework overall components message exchange.
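A minimal sketch of steps 7 and 8 from the aggregator's point of view is shown below. The orchestrator client and its request_channel() call are hypothetical placeholders, since the paper does not specify this interface; the stub policy grants every request.

    # Sketch of steps 7-8: the IoTGW-Ag asks the orchestrator for a channel,
    # then drains each QoS buffer up to the granted message budget.
    from collections import deque

    class QoSBuffers:
        def __init__(self, levels=(0, 1, 2)):
            self.buffers = {lvl: deque() for lvl in levels}

        def drain(self, orchestrator, publish):
            for level, buf in self.buffers.items():
                # Step 7: request a communication resource (e.g., an LSP).
                grant = orchestrator.request_channel(qos_level=level,
                                                     backlog=len(buf))
                if grant is None:      # request denied: keep buffering
                    continue
                # Step 8: empty the buffer up to the granted budget.
                for _ in range(min(grant, len(buf))):
                    publish(level, buf.popleft())

    class StubOrchestrator:
        def request_channel(self, qos_level, backlog):
            return backlog             # placeholder policy: grant everything

    ag = QoSBuffers()
    ag.buffers[0].extend(["m1", "m2"])
    ag.drain(StubOrchestrator(), lambda lvl, msg: print(lvl, msg))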

Other operational aspects of the data exchange between IoTGW-Ag and the orchestrator are:

– Each IoTGW-Ag should periodically (and necessarily on each new subscription) send the metadata that the orchestrator needs for traffic scheduling (e.g., buffer sizes, message frequency, subscriber list, topics, QoS levels).
– The orchestrator then decides the effort with which each QoS level must publish the data to the applications. This decision depends on several factors, such as the saturation state of the network and the priority of other IoTGW-Ags, according to each IoTGW-Ag's reported metadata.
– Once the decision is made, the orchestrator sends each IoTGW-Ag the transmission effort that each QoS level must have. The frequency with which each IoTGW-Ag receives transmission effort updates depends on the algorithm the orchestrator relies on to decide the transmission rates.

4.5 PSIoT-Orchestrator

The PSIoT-Orchestrator has two main components: the QoS IoT traffic scheduler module and the bandwidth allocation model module (BAM module).

The QoS IoT traffic scheduler module continuously computes the amount of bandwidth to be associated with each IoT QoS class. The orchestrator functions in a reactive manner: each time it receives an IoTGW-Ag's metadata, it recalculates all IoTGW-Ags' QoS level transmission rates. This is the default behavior but, as the amount of updates becomes a processing burden, the transmission rate calculations can be scheduled at appropriate intervals, according to each orchestrator implementation. The traffic scheduler module's main responsibility is to manage the available bandwidth and distribute it to all IoTGW-Ags, according to their throughput and QoS level.

For the orchestrator to properly manage the QoS-aware transmission of all aggregators, it is vital that it maintains updated knowledge of several pertinent factors. These factors include timely updates on aggregator metadata, such as subscriber count, buffer usage and throughput. It is also important to note that aggregators might choose different buffer overflow algorithms according to the IoT device data being collected. For real-time sensor data it might not make sense to send outdated data that sits in the aggregator output buffer, so a circular buffer strategy might be more useful in maintaining updated data than a simple first-in, first-out algorithm.

The bandwidth allocation model module acts as a broker for the available resources on the backbone network (channels). In effect, each time an IoTGW-Ag needs to send data to consumers, it sends the metadata to the PSIoT orchestrator, which in turn requests a communication channel (an LSP, for an MPLS-aware network) from the PSIoT BAM broker module (message exchange "f" in Figure 4). The request is treated by the BAM module, which will grant or deny the channel request, and the PSIoT orchestrator will then act accordingly. For computing bandwidth availability, the BAM module interacts with network resources as any BAM implementation does. Information about topology, used bandwidth, current LSPs and other relevant data is maintained by the BAM module. The behavior of BAM models has been extensively evaluated and detailed in [13][14]. For the massive IoT traffic scenario it is assumed that bandwidth disputes will occur between QoS classes (high and low priorities), since the network is assumed to have limited resources. For this scenario, the GBAM (Generalized Bandwidth Allocation Model) with AllocTC-Sharing (ATCS) behavior is evaluated in [7] and is the BAM model adopted in the implementation setup. ATCS behavior allows resource sharing between high- and low-priority traffic classes and provides the best possible network utilization [15].
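Purely to convey the intuition (this is not the GBAM/ATCS algorithm as specified in [7][15]), the toy sketch below grants a traffic class's request against its own configured share plus any bandwidth left unused by the other classes:

    # Toy sketch of the AllocTC-Sharing intuition: a class may borrow capacity
    # left unused by other classes. Illustration only, not the GBAM/ATCS
    # algorithm from the literature.
    class ToyATCS:
        def __init__(self, shares):          # shares: {tc: configured bandwidth}
            self.shares = dict(shares)
            self.used = {tc: 0 for tc in shares}

        def request(self, tc, amount):
            own_spare = self.shares[tc] - self.used[tc]
            others_spare = sum(self.shares[o] - self.used[o]
                               for o in self.shares if o != tc)
            if amount <= own_spare + max(0, others_spare):
                self.used[tc] += amount      # grant, possibly borrowing
                return True
            return False                     # deny: not enough total spare

    bam = ToyATCS({0: 250, 1: 350, 2: 450})
    print(bam.request(2, 500))  # True: TC2 borrows unused TC0/TC1 capacity
    print(bam.request(0, 600))  # False: total spare is now insufficient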

5 PSIoT-Orch proof-of-concept

As a proof-of-concept, we developed the three core components of our proposed framework: an orchestrator, an aggregator (producer) and a client (consumer). We chose the network emulator Mininet [16] for its SDN capabilities and flexibility in creating network topologies. Mininet is a network emulator that uses lightweight virtualization to run many different network components on a single system. With Mininet we built a simple bandwidth-constrained network with consumers, aggregators, an orchestrator and two hosts generating traffic to simulate non-IoT network traffic.

The orchestrator is implemented on a multi-purpose messaging platform with capabilities similar to PubSub systems. Each aggregator (producer) connects itself to the orchestrator and receives the designated efforts for each QoS level. A simple fixed scheduler was used to determine each QoS level for all aggregators (producers); the transmission rates for each QoS level are static and proportionally divided between QoS levels according to importance. These QoS levels are specific to our producer software and deal directly with each aggregator's IoT traffic throughput.

Because we are mostly interested in evaluating network traffic, each aggregator (producer) was built to simulate IoT sensor/device data; a configurable rate, topics and data can be attributed to each deployed aggregator (producer). The aggregator is built as an HTTP PubSub server, with none of the usual delivery guarantees, as those are not the focus of the proof-of-concept scenario. For consumers, another HTTP server is used; it logs the received IoT data and can be configured with subscriptions to different aggregators (producers). With this basic setup we can configure different producer-consumer topologies with varied numbers of producers and consumers. The proof-of-concept setup may also include non-IoT traffic between nodes to measure its impact under network congestion and near-congestion operation.

5.1 Proof-of-concept network topology

In Figure 5 we show the proof-of-concept network topology used, including the main expected traffic flows: from consumers to producers (in red), and between traffic generators (in yellow). Traffic-generating hosts are positioned so as to directly compete with IoT data consumers and producers for network resources (LSPs, bandwidth). To build our network scenario, we created Mininet hosts for each producer, each consumer, the orchestrator and the non-IoT traffic generators, connected through stock Mininet switches and a controller over 1 MB/s links. For the orchestrator, we used a simple algorithm that evenly divides the available bandwidth between the aggregators; within each aggregator's share, the QoS levels 0, 1 and 2 were allocated 25%, 35% and 45% of the bandwidth, respectively. To accommodate differences in QoS level output, our algorithm also listens for the aggregators' buffer levels and attempts to allocate more of the available bandwidth accordingly. Each aggregator then simulates three different topics generating IoT data at a reasonable throughput, with Ag1 having a fourth topic with 100 times more throughput. For our scenario, we follow a timeline with the following main events:

1. Hosts startup: each aggregator, the orchestrator and the non-IoT traffic generators are turned on.
2. Client 1 initiates subscriptions: Client 1 subscribes to three IoT topics, one from each aggregator.
3. Client 2 initiates subscriptions: Client 2 subscribes to three IoT topics, one from each aggregator, plus the fourth topic from Ag1.
4. End of scenario.

Fig. 5. Proof-of-concept network topology with two IoT clients and three aggregators.
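A minimal Mininet sketch of a topology in this spirit is shown below. The host names, the single-switch layout and the link rate (1 MB/s, i.e., 8 Mbit/s) are illustrative assumptions; the actual experiment scripts are not part of the paper.

    #!/usr/bin/env python
    # Minimal Mininet sketch of a PoC-like topology: aggregators, clients,
    # an orchestrator and traffic generators behind one switch. Names and
    # the single-switch layout are illustrative only.
    from mininet.net import Mininet
    from mininet.topo import Topo
    from mininet.link import TCLink

    class PoCTopo(Topo):
        def build(self):
            s1 = self.addSwitch('s1')
            for name in ('ag1', 'ag2', 'ag3',   # aggregators (producers)
                         'c1', 'c2',            # IoT clients (consumers)
                         'orch',                # orchestrator
                         'tg1', 'tg2'):         # non-IoT traffic generators
                host = self.addHost(name)
                self.addLink(host, s1, cls=TCLink, bw=8)  # bw in Mbit/s

    if __name__ == '__main__':
        net = Mininet(topo=PoCTopo(), link=TCLink)
        net.start()
        net.pingAll()    # sanity check: all hosts can reach each other
        net.stop()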


5.2 Results

After running the scenario, we were able to observe the communication between consumers and producers as well as the management of each aggregator by the orchestrator. After each subscription, the aggregators would report it to the orchestrator and continue to report their buffer state for each QoS level. Figure 6 shows the buffer states of the aggregators as well as the bandwidth allocated by the orchestrator for each QoS level, according to each of the main events. After the first subscription event, by Client 1, the buffers were constantly at near-empty levels, as the throughput was low enough that the default allocation from the orchestrator sufficed. But as soon as Client 2 initiates its subscriptions, including the fourth topic of Ag1, the buffer in Ag1 begins to fill rapidly. As soon as the orchestrator detects this, by means of a defined buffer size threshold, it attempts to allocate more bandwidth to Ag1. The orchestrator allocates as much bandwidth as it can without impairing the other subscriptions but, as the data generation rate of Ag1's fourth topic is much larger than the available bandwidth, we still observe Ag1's buffer growing while the orchestrator keeps attempting to allocate enough bandwidth.

6 Conclusion

The Internet of Things has arrived and, with it, the possibility that the massive amount of heterogeneous traffic it can generate will heavily load the backbone networks between producers and consumers of IoT traffic. This article presented the PubSub PSIoT-Orch framework, aimed at massive IoT traffic orchestration relying on IoT QoS needs and efficient management of network resources.

The emulation-based proof-of-concept demonstrated the basic functionality and showed that our framework is a viable solution for managing the transmission of massive IoT traffic, as it predictively manages and distributes the available bandwidth between producers.

The framework proved to be flexible both in network flow management and in maintaining the IoT traffic's characteristics by transparently offering the same topic-based IoT data from devices to consumers in the network. Furthermore, the ability to have the orchestrator work in tandem with CDN QoS functionalities allows our framework to be used either integrated into the network or simply as a third-party traffic control between IoT data producers and consumers.

The user-centric approach adopted by the PubSub message scheme between consumers and producers presents an easy interface for data subscription by hiding the complex QoS considerations and data management from both producers and consumers. This allows consumers to focus on receiving the data and also enables producers to deal with their Fog-like data aggregation and/or processing.

In terms of future work, a real test setup will be developed using a network for experimentation testbed (NfExp) distributed over various physical locations (FIBRE Network) to further validate the scalability and flexibility of the framework and to simulate BAM behaviors over a real producer/consumer IoT backbone.

Fig. 6. Proof-of-concept simulation main events results.

References

1. E. E. Rachikidi. Modeling and placement optimization of compound service in a converged infrastructure of cloud computing and internet of things. Université Paris-Saclay, Evry, 2017.
2. Eleonora Borgia. The internet of things vision: Key features, applications and open issues. Computer Communications, 54:1–31, 2014.
3. Ling Qian, Zhiguo Luo, Yujian Du, and Leitao Guo. Cloud computing: An overview. Lecture Notes in Computer Science, pages 626–631, 2009.
4. Ivan Stojmenovic and Sheng Wen. The fog computing paradigm: Scenarios and security issues. Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, 2014.


5. ETSI TS 102 921 v1.1.1. Machine-to-machine communications (M2M); mIa, dIa and mId interfaces, 2012.
6. Glenn Daneels, Esteban Municio, Kathleen Spaey, Gilles Vandewiele, Alexander Dejonghe, Femke Ongenae, Steven Latré, and Jeroen Famaey. Real-time data dissemination and analytics platform for challenging IoT environments. Presented at the Global Information Infrastructure and Networking Symposium (GIIS), October 2017.
7. Rafael Freitas Reale, Romildo Martins da S. Bezerra, and Joberto S. B. Martins. GBAM: A generalized bandwidth allocation model for IP/MPLS/DS-TE networks. International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), 6:635–643, 2014.
8. Sasu Tarkoma. Publish/subscribe systems: design and principles. John Wiley & Sons, 2012.
9. Emmanuel Baccelli, Christian Mehlis, Oliver Hahm, Thomas C. Schmidt, and Matthias Wählisch. Information centric networking in the IoT. Proceedings of the 1st International Conference on Information-Centric Networking - ICN '14, 2014.
10. Hong-Linh Truong and Nanjangud Narendra. SINC - an information-centric approach for end-to-end IoT cloud resource provisioning. 2016 International Conference on Cloud Computing Research and Innovations (ICCCRI), 2016.
11. Kyoungho An, Aniruddha Gokhale, Sumant Tambe, and Takayuki Kuroda. Wide area network-scale discovery and data dissemination in data-centric publish/subscribe systems. In Proceedings of the Posters and Demos Session of the 16th International Middleware Conference, page 6. ACM, 2015.
12. Andrew Banks and Rahul Gupta. MQTT Version 3.1.1. OASIS standard 29, 2014.
13. Rafael Freitas Reale, Romildo Martins da S. Bezerra, and Joberto S. B. Martins. A preliminary evaluation of bandwidth allocation model dynamic switchings. International Journal of Computer Networks & Communications (IJCNC), 6(3):131–143, May 2014.
14. Rafael Freitas Reale, Romildo Martins da S. Bezerra, and Joberto S. B. Martins. Analysis of bandwidth allocation models reconfiguration impacts. In 2014 3rd International Workshop on ICT Infrastructures and Services (ADVANCE), December 2014.
15. Rafael Freitas Reale, Walter d. Costa Pinto Neto, and Joberto S. B. Martins. AllocTC-Sharing: A new bandwidth allocation model for DS-TE networks. In 2011 7th Latin American Network Operations and Management Symposium, pages 1–4, Oct 2011.
16. Bob Lantz, Nikhil Handigol, Brandon Heller, and Vimal Jeyakumar. Introduction to Mininet, 2017.


Automated Scale Calibration and Color Normalization for Recognition of Time Series Wound Images

Te-Wei Ho 1,3,*, Jin-Ming Wu 1, Hao-Chih Tai 2,3, Chun-Che Chang 3, Chien-Hsu Chen 3, Feipei Lai 3,4,5
1 Department of Surgery, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
2 Department of Plastic Surgery, National Taiwan University Hospital, Taipei, Taiwan
3 Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
4 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
5 Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
[skbaskba]@gmail.com

Abstract

Compared with traumatic and surgical wounds, chronic wounds are hard to heal within two months and often fail to proceed through the normal phases of wound healing over time. Patients with chronic wounds often find it difficult to return to clinics because of their limited mobility. With an increase in the aging population, an organized long-term wound care technique is urgently needed. Besides, wound images taken by a camera vary with the angle and distance from the wound, which is not conducive to wound tracking. Hence, this study proposes an image-processing algorithm with a calibration mechanism that can calibrate the size and normalize the color of an image according to a color reference card. Our algorithm achieved a wound coverage rate of 90.25%, whereas the original images had a coverage rate of 88.31%. Furthermore, we propose a computer-aided wound area measurement prototype with a bias of 12.42% compared to the area manually calculated by physicians. These contributions provide further support to evaluate changes in wound condition as part of a telecare service.

Keywords—chronic wound; color normalization; image processing; computer-aided diagnosis; wound area measurement; telecare


1 Introduction

In the United States, approximately 6.5 million patients are affected by wound infections and their complications [1]. Treatment of chronic wounds is very costly owing to the difficulty of tracking treatment efficacy over time. The main metrics used to track healing progress over time are changes in wound area and tissue composition [2]. Unfortunately, human wound measurement is usually done by "ruler-based measurement" (Figure 1), resulting in high bias and inaccuracy, which deteriorates the tracking of healing progress. Some medical staff measure patient wounds by pasting an op-site film and then calculating the area by counting the total square grids on the film (Figure 2). This method can calculate the wound area accurately but still has some constraints. The first limitation is that the wound needs to be scabbed or dried to apply such a technique. The second limitation is that manually counting square grids on massive wounds is time consuming. Therefore, computer-aided diagnosis has become a major trend for solving these kinds of problems, owing to the advancement of digital image processing techniques [3].

Figure 1. Wound area measurement using ruler

Figure 2. Wound sketch on op-site film

Many studies on wound assessment and analysis have contributed to developments in computer vision. In a previous study [4], we designed an automatic surgical wound detection and segmentation method. As the pattern and color of a wound are clearly different from those of skin, the wound edge can delimit the wound area appropriately. Since using Canny edge detection alone produces many errors when the background is complicated, we used a combination of edge-based and color-based segmentation to filter the wound image. The segmented images were evaluated by three medical doctors, and a wound was considered well segmented only when a majority of votes agreed on the result.

The goal of this research was to establish an algorithm prototype that can normalize the color and calibrate the scale of wound images taken with the color reference card and accurately analyze the surface area of wounds, to reduce the bias of measuring wounds with a ruler. The advantage of normalizing the color of wounds is not only improved recognition precision but also the establishment of a gold standard for computer-aided diagnosis. Wound infection is often diagnosed by color changes around the wound, so detection of wound infection will benefit from standardizing the color of wound images.

2 System Architecture

The architecture of our system is shown in Figure 3. Each input image goes through a pipeline for wound detection with a wound area calculation feature. The color calibration is based on the reference card and adjusted using least-squares approximation.

Figure 3. System overview

3 Method

3.1 Reference Card Acquisition

In order to normalize and calibrate the color and size of our wound images, we need a standard reference for color and scale. For this reason, we used a color reference card, as mentioned in Jose et al. [5]. We placed the color reference card next to the wound while shooting a wound image (Figure 4). The medical practitioner always disinfected the card before use.

Figure 4. Color reference card with wound
Figure 5. Recognizing the color reference card


The first crucial step is to recognize the location of the color reference card in an image. The pattern of a color reference card can be roughly considered a regular texture pattern, and near-regular texture patterns are pervasive in man-made and natural environments. Different kinds of scale-invariant methods exist for detecting such patterns in an image, such as lattice fitting using a Markov Random Field (MRF) [6], the Scale Invariant Feature Transform (SIFT) [7], and Speeded Up Robust Features (SURF) [8]. In this study, we adopted SURF, as the computational complexity of MRF was too high. SURF initially detects key points by approximating the Laplacian of Gaussian with box filters; the advantage of computing the convolution with box filters is that it requires less calculation with the help of an integral image. A Hessian matrix is adopted to distinguish local maximum and minimum values in the image. The final step is the matching of key points: key points between two images are matched by identifying their nearest neighbors, and by clustering sufficiently many neighboring key points, the target shape can be recognized. Figure 5 shows the reference card framed by green lines.

3.2 Color Normalization

Color normalization is a concern when viewing the same object or pattern under different lighting conditions, cameras, angles, and other complicating factors. The distribution of color values varies when the conditions are dissimilar; color normalization compensates for these variances and provides a fair color distribution for object recognition. Although modern image processing methods have built-in color optimization functions, these are generally based on the distribution of the color histogram. Such methods result in unnatural images that are too distant from the colors of the original image. To overcome this problem, we propose a correction matrix with a bias parameter. The relationship between the original color reference card and the color reference card in the wound image can be described by the following function (Figure 6), where R', G', and B' represent the standard color references and r, g, and b represent the colors that require correction.
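A minimal numpy sketch of this correction, assuming the matrix-plus-bias form of Figure 6 is fitted by least squares over corresponding card patches (the patch values below are made up):

    # Least-squares fit of the color correction of Figure 6: a 3x3 matrix
    # plus a bias term mapping measured card colors (r, g, b) to the
    # reference colors (R', G', B'). Patch values here are illustrative.
    import numpy as np

    measured = np.array([[200, 40, 35], [60, 180, 70], [50, 55, 190],
                         [240, 240, 235], [20, 22, 25], [128, 126, 130]],
                        dtype=float)           # card patches as photographed
    reference = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255],
                          [255, 255, 255], [0, 0, 0], [128, 128, 128]],
                         dtype=float)          # known card colors

    # Append a constant 1 to each measured color so the fit includes a bias.
    X = np.hstack([measured, np.ones((len(measured), 1))])
    M, *_ = np.linalg.lstsq(X, reference, rcond=None)   # 4x3 solution

    def normalize(pixels):
        """Apply the fitted correction to an (N, 3) array of RGB pixels."""
        P = np.hstack([pixels, np.ones((len(pixels), 1))])
        return np.clip(P @ M, 0, 255)

    print(normalize(measured).round(1))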

Figure 6. Correction matrix

3.3 Scale Calibration

As the absolute length and width of the color reference card are known, geometric calibration can also be based on the color reference card. Since the scale and shape of the reference card differ in every wound image, a perspective transformation of homogeneous coordinates can be used for scale calibration [9]. We can map between the original color reference card and the wound image's color reference card using a homography transformation, which describes the homogeneous transformation between two planes. Based on this relationship, we can solve for the eight parameters given at least four corresponding points in each plane; the four corresponding points are chosen as the four corners of the color reference card (Figure 7). The homogeneous coordinates can be rearranged into an expanded matrix, which allows us to calculate the eight parameters via least-squares approximation (Figure 8).


Figure 7. Homogeneous transformation between two images
Figure 8. Expanded matrix for homogeneous coordinates
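A sketch of this calibration using OpenCV is shown below; the detected corner coordinates, the target resolution and the input file name are illustrative assumptions, not values from the paper.

    # Sketch of the scale calibration: solve the homography from four
    # detected card corners to the card's physical geometry, then warp the
    # image so that pixel distances map to known physical distances.
    import cv2
    import numpy as np

    PX_PER_CM = 40.0                      # chosen target resolution
    card_w, card_h = 6.65, 4.5            # card size in cm

    corners_img = np.float32([[412, 230], [731, 264], [702, 489], [389, 451]])
    corners_ref = np.float32([[0, 0], [card_w * PX_PER_CM, 0],
                              [card_w * PX_PER_CM, card_h * PX_PER_CM],
                              [0, card_h * PX_PER_CM]])

    H = cv2.getPerspectiveTransform(corners_img, corners_ref)  # 8 parameters
    image = cv2.imread('wound.jpg')       # hypothetical input file
    if image is not None:
        # Canvas larger than the card so the wound region remains visible.
        calibrated = cv2.warpPerspective(image, H,
                                         (int(card_w * PX_PER_CM) * 4,
                                          int(card_h * PX_PER_CM) * 4))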

3.4 Wound Detection

Our wound detection algorithm is based on a few observations. First, skin region pixels fall within certain ranges of the color space. In addition, the wound region possesses the most robust edges within the skin region. Based on these two observations, we developed a system comprising three operations: robust edge detection, skin region detection, and reconstruction of the wound region.

During robust edge detection, we calculate the robustness of each edge detected by the Canny edge detector, which consists of a Gaussian filter, non-maxima suppression, and hysteresis thresholding. Since under some circumstances the wound region may be tattered, it is crucial to reconnect the edges and create a more robust edge when necessary. The connection of edges is done by connected-component labeling, optimized by the mean gray pixel value; this procedure is applied to the edges found by the Canny detector to decide whether connected components should be kept.

We then detect the region of skin; normally the skin can be detected by a skin color model. A. Albiol [10] proposed a skin detection model for several color spaces, including Red, Green, and Blue (RGB) and Hue, Saturation, and Value (HSV). Their research compared skin region detection between different color spaces and showed that the color space does not have much influence on skin region detection. In contrast, P. Kakumanu [11] compared the efficiency of different color spaces and concluded that the HSV color space is best suited for skin color modeling. We followed the HSV color modeling presented there to perform skin region detection.

Next, we reconstruct the wound region of the image. We fill the hollow parts in the skin region using a topological skeleton image processing technique and perform a branch-linking method, similar in concept to a convex hull, to construct the shape. By applying dilation and erosion operators, we can smooth the edges and connect the robust edges into a wound mask.

3.5 Wound Area Calculation

As the size of the color reference card is known, we can count pixels and compare this relationship across every photo's color reference card. The length and width of the color reference card are 6.65 cm and 4.5 cm, respectively, which gives an area of 29.925 cm2. For every photo, we calculate the total pixel count of the color reference card region; thus, we can deduce how many pixels represent 1 cm2, denoted A1. We then determine the area of the wound region by comparing its pixel count to the previously defined A1.
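The area computation itself reduces to simple arithmetic over pixel counts. A small sketch, assuming binary masks for the card and wound regions produced by the previous steps:

    # Wound area from pixel counts, using the card's known 29.925 cm^2 area
    # as the scale reference. card_mask and wound_mask are assumed binary
    # (0/1) numpy arrays from the detection steps above.
    import numpy as np

    CARD_AREA_CM2 = 6.65 * 4.5                    # 29.925 cm^2

    def wound_area_cm2(card_mask, wound_mask):
        pixels_per_cm2 = card_mask.sum() / CARD_AREA_CM2   # A1 in the text
        return wound_mask.sum() / pixels_per_cm2

    # Toy example: a card of 2993 pixels gives ~100 px/cm^2, so a wound of
    # 1000 pixels measures about 10 cm^2.
    card = np.zeros((100, 100), int); card[:41, :73] = 1        # 2993 pixels
    wound = np.zeros((100, 100), int); wound[50:90, 50:75] = 1  # 1000 pixels
    print(round(wound_area_cm2(card, wound), 2))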


4 Results

4.1 Evaluation Study

We collected a variety of wound images from different body locations, including the chest, thigh, arm, and back. A total of 24 images were collected with an iPhone 6's back camera using the default mode. Despite the advances in image recognition after calibration, the evaluation of such methods has generally been subjective. To verify our segmentation improvement after color calibration, we applied the precision-recall measurement method presented by Monteiro, F. et al. [12]. In their work, they presented a boundary-based evaluation with a cross validation of positive and negative rates using the receiver operating characteristic (ROC) curve. We applied this evaluation method to see how much the segmentation results improve after we calibrate our images.

For the ROC curve evaluation, we set the manual segmentation result, performed by a professional surgeon at the National Taiwan University Hospital, as the gold standard. We compared the gold standard to the segmentation results produced by the computer after calibration. The outcome can be viewed in four major parts: true positives, false positives, true negatives, and false negatives. We then obtained the true positive rate, accuracy, and specificity from this information.

The boundary-based evaluation was conducted by comparing the results of the computer-executed segmentation with the gold standard. We performed this by subtracting the two images and calculating the non-black area (Figure 9). The computer-segmented image is named C and the manual gold standard M. We can determine how many pixels are over-segmented by computing |C-M|; note that we only counted values as non-black pixels where a non-black pixel was subtracted from a black pixel. Likewise, we can determine how many pixels of the segmentation are missing by computing |M-C|. Figure 10 shows the outcome of one of the wound segmentation results, with the positive and negative segmentation boundaries.

Figure 9. Boundary-based evaluation (original image, computer label, physician label)

Figure 10. Wound coverage rate (computer label, physician label, positive, negative)
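A sketch of this pixel-difference evaluation with numpy, assuming C and M are binary masks, is shown below:

    # Over- and under-segmentation counts from binary masks C (computer) and
    # M (manual gold standard), following the |C-M| / |M-C| idea in the text.
    import numpy as np

    def segmentation_counts(C, M):
        C, M = C.astype(bool), M.astype(bool)
        over = np.logical_and(C, ~M).sum()     # in C but not in M  (|C-M|)
        missing = np.logical_and(M, ~C).sum()  # in M but not in C  (|M-C|)
        tp = np.logical_and(C, M).sum()
        coverage = tp / M.sum()                # fraction of gold standard hit
        return over, missing, coverage

    C = np.zeros((10, 10), int); C[2:8, 2:9] = 1
    M = np.zeros((10, 10), int); M[3:9, 2:8] = 1
    print(segmentation_counts(C, M))   # (12, 6, 0.8333...)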

By performing our color normalization before wound image segmentation, we obtained an average wound coverage rate of 90.25%. The detailed average wound coverage rates for the calibrated and original images are presented in Table I.


Table I. Coverage rate evaluation

                   Cases   Wound Coverage Rate (%)   Out of Coverage Rate (%)
Calibrated Image   24      90.25                     20.55
Original Image     24      88.31                     20.15

4.2 Case Study

After calibration of color and scale, we developed a deeper understanding of our images. As mentioned earlier, since the size of the color reference card is known, we can estimate the area of wound images by comparing pixel counts with the color reference card. We collected five images with assistance from professional medical staff. The wound in each image was sketched on an op-site film and pasted on square-grid paper, so the area of the wound can be calculated by summing the total grids of the sketched area. The comparison of the areas of the five collected wound images is listed in Table II.

Table II. Wound area evaluation

Case      Area Calculated by Sketch (cm2)   Area Calculated by Computer (cm2)   Bias (%)
I         8.92                              10.22                               14.5
II        43.32                             47.45                               9.5
III       10.65                             12.06                               13.2
IV        12.53                             14.11                               12.6
V         13.66                             15.34                               12.3
Average                                                                         12.4

5 Conclusion and Future Work

In this study, we proposed a color and scale calibration method based on a color reference card for time series wound images, and implemented a wound area calculation prototype. Once the standard color definition is established, further applications for the wound healing process can be developed. The bias depends on the amount of skin region included in the segmented image; in addition, the curvature of the skin also affects the bias between the computer-aided and manual approaches. Our approach could be improved with a more accurate chronic wound segmentation algorithm. However, it is unclear how much of the skin region should be included in the segmented wound images to diagnose peripheral skin lesions. If the area of the image is sufficient, a deep neural network for wound healing process analysis can be implemented. In the future, we plan to develop a telecare service for ubiquitous wound assessment, implemented as a mobile application for wound monitoring. Further clinical studies on the wound healing process are also warranted.

Acknowledgment

The authors would like to thank all members of the Department of Surgery at the NTUH for their efforts and contribution. We are also grateful to Jui-Tse Hsu for his help in this study. This study was supported in part by grants from the Ministry of Science and Technology, Taiwan (MOST 106-2628-E-002-004-MY3 and 106-2627-M-002-022).

References

[1] Sen, C.K., Gordillo, G.M., Roy, S., Kirsner, R., Lambert, L., Hunt, T.K., Gottrup, F., Gurtner, G.C., and Longaker, M.T.: 'Human skin wounds: a major and snowballing threat to public health and the economy', Wound Repair and Regeneration, 2009, 17, (6), pp. 763-771.
[2] Budman, J., Keenahan, K., Acharya, S., and Brat, G.A.: 'Design of A Smartphone Application for Automated Wound Measurements for Home Care', Iproceedings, 2015, 1, (1), pp. e16.
[3] Song, B., and Sacan, A.: 'Automated wound identification system based on image segmentation and artificial neural networks' (IEEE, 2012), pp. 1-4.
[4] Shih, H.-F., Ho, T.-W., Hsu, J.-T., Chang, C.-C., Lai, F., and Wu, J.-M.: 'Surgical wound segmentation based on adaptive threshold edge detection and genetic algorithm' (International Society for Optics and Photonics, 2017), pp. 1022517.
[5] Jose, A., Haak, D., Jonas, S.M., Brandenburg, V., and Deserno, T.M.: 'Towards Standardized Wound Imaging', in Bildverarbeitung für die Medizin 2015 (Springer, 2015), pp. 269-274.
[6] Geman, S., and Graffigne, C.: 'Markov random field image models and their applications to computer vision' (1986), pp. 2.
[7] Lowe, D.G.: 'Distinctive image features from scale-invariant keypoints', International Journal of Computer Vision, 2004, 60, (2), pp. 91-110.
[8] Bay, H., Tuytelaars, T., and Van Gool, L.: 'SURF: Speeded up robust features', Computer Vision - ECCV 2006, 2006, pp. 404-417.
[9] Deserno, T.M., Sárándi, I., Jose, A., Haak, D., Jonas, S., Specht, P., and Brandenburg, V.: 'Towards quantitative assessment of calciphylaxis' (International Society for Optics and Photonics, 2014), pp. 90353C.
[10] Albiol, A., Torres, L., and Delp, E.J.: 'Optimum color spaces for skin detection' (IEEE, 2001), pp. 122-124.
[11] Kakumanu, P., Makrogiannis, S., and Bourbakis, N.: 'A survey of skin-color modeling and detection methods', Pattern Recognition, 2007, 40, (3), pp. 1106-1122.
[12] Monteiro, F., and Campilho, A.: 'Performance evaluation of image segmentation', Image Analysis and Recognition, 2006, pp. 248-259.


ASP: An IoT Approach to Help Sedentary People

Maurício Moreira Neto 1,4, Emanuel F. Coutinho 1,2,4, Matheus Roberto Oliveira 1,2,4, Leonardo O. Moreira 1,2,4, and José Neuman de Souza 3,4

1 IBITURUNA – Research Group in Cloud Computing and Systems
2 Virtual University Institute (UFC-VIRTUAL)
3 Master and Doctorate in Computer Science (MDCC)
4 Federal University of Ceará (UFC) – Fortaleza – Ceará – Brazil
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Due to advances in microprocessor technologies and embedded systems, the development of small devices capable of capturing data and exchanging information over the Internet became possible. The Internet of Things (IoT) uses these small devices (called smart objects) to capture information from the environment, and it has enabled the emergence of several areas of research, such as traffic, intelligent laboratories, health and smart cities. In the IoT domain, the health care area has been highlighted by the need to monitor people with limitations or diseases. With the low cost of wearable devices, it is possible to monitor the physiological data of a user in real time. In this article, we present ASP, an IoT approach to help sedentary people. The ASP approach monitors users' physiological data through wearable devices and sends it to the cloud for processing. In the cloud, the data is processed and presented to the health professional, who will monitor the users and establish new activities for them.

1 Introduction

The advancement of technologies such as embedded systems, microelectronics, communication, and detection allowed the emergence of the Internet of Things (IoT) [9]. The IoT can be defined as a set of devices connected to the Internet capable of monitoring and acting on the environment [9, 13]. Smart objects are everyday objects that have processing and communication capabilities, together with sensors to capture environment data. The advances in pervasive computing and network communication have driven the growth of data collection nowadays [3]. Modern devices such as smartphones, smart-home devices, smart televisions and wearable devices can be connected to public and private networks, increasing the capacity to monitor the environment and those who use these devices.

The life expectancy of Brazilian people has been increasing since the 1960s [5]. However, there has been an increasing number of elderly people with health issues who need a full-time caregiver but do not have one. On the other hand, individuals on the move (in urban traffic, or on national or international trips) may experience adverse situations such as dizziness, fainting, or being run over. In emergencies, a stranger usually does not know what to do to help the individual. In health care, there are several factors that must be taken into consideration before making important decisions, e.g., which area has a growing number of occurrences of a particular disease [10]. However, most of the time, the information needed by managers is distributed across diverse databases in different, non-integrated formats. Home care consists of a form of primary care performed by a lay caregiver, a specialist or a multidisciplinary team [4]. This modality is applied in the treatment of chronic patients who are not at risk of death, or of the elderly.


In this paper, we propose ASP, an acronym for "An IoT Approach to help Sedentary People". ASP monitors people with physical inactivity and, based on the data collected by wearable devices, sends the patient's information to a health professional, who analyzes it and creates new goals and exercises for the patient. The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the architecture of the ASP proposal. Section 4 describes the components of the proposal and the evaluation design. Finally, in Section 5, the conclusions and future work are presented.

2 Related Work

IoT allows the connection of people and things "anytime", "anyplace", with "anything" and "anyone" [6]. In this scenario, many applications can be developed to serve diverse areas of society, such as smart laboratories [14][15], parking lots [1], and traffic and health systems. Moreover, diverse services can also be created, such as prediction services [12].

Krachunov et al. (2017) presented an approach for monitoring people's heartbeat using wearable devices. According to the authors, the approach has a low energy cost, giving the device's battery a long duration (approximately a month) [7]. The heartbeat data is collected and processed on the electrocardiogram board present in the mobile device when activated. After the collection step, the data is transmitted via Bluetooth to the application to generate the graphs [7].

Manisha et al. (2016) proposed an approach to detect heart attacks by fully monitoring the user's health using a wearable device [11]. The wearable device captures the heartbeat and blood pressure and then sends this data to the user's smartphone, which sends alerts to the hospital if the user is having a heart attack [11].

Velicu et al. (2016) investigated low-complexity methods to process the minimum number of vital biological signs during sleep and classify them correctly [17]. The approach used a bracelet equipped with accelerometer and electrocardiogram sensors to capture user data and send it, via Bluetooth, to an Arduino that processes the data and generates sleep charts [17].

Lopes et al. (2015) proposed a software and hardware prototype to help caregivers and patients in home care situations. The authors used a Set-Top Box (STB) connected to a TV with Internet access so that the user can interact by entering information about their current state of health [4]. The proposal captures data continuously through medical sensors to feed the system; the data and information provided by the user are used for inference about the clinical condition of the patient [4].

Queiroz Neto et al. (2017) presented VITE, an intelligent system for emergency health. The VITE proposal was initially based on Brazilian digital TV and home care; however, it evolved towards emergency care and individual interurban mobility [5]. VITE meets the demand to expedite the service process, given the lack of availability of important information in real time. VITE consists of hardware components (V-HARD), a mobile application (V-APP), an ontology-based inference mechanism (V-ONTO), a specialized social network (V-NET) and a cloud service (VaaS) [5]. VaaS aims to provide user-generated information in real time using on-body sensors; this information is used in prevention services. To obtain user data, VITE uses a remote server designed to map out possible risks and identify emergency calls. The service layer of VITE provides scalability and modularity [5].

Parate et al. (2014) [16] built a wearable-smartphone system (RisQ) that detects when the user smokes, how many times a day, the duration, and the quantity of toxic substances ingested. RisQ is based on a wristband equipped with an accelerometer and gyroscope, which captures variations in the position and rotation of the user's arm and detects when the action performed was smoking. The collected data is sent to a smartphone, where graphs reporting the smoking sessions are created. The main challenge was to differentiate the gestures that the body performs when smoking from day-to-day gestures, such as drinking and eating, besides the possible actions the user can take while smoking.

Cvetkovic et al. (2017) [2] describe a wearable-smartphone system that monitors stress levels and the physical activity of the user. With the smartphone, an algorithm first calculates its position (location, rotation, etc.) and then monitors the performed activity through the device's accelerometer. With the wearable, in this case a Microsoft Band 2, the bracelet's sensors were used to capture the heart rate and EDA (electrodermal activity) of the user, making it possible to monitor stress levels. Data collected from both devices is sent to an existing application (Fit4Work), where graphs indicating physical activity and stress levels are created.

Table 1: Summary of related work

Reference                    IoT   Cloud Computing   Health
Krachunov et al. (2017)      X     X                 X
Manisha et al. (2016)        X     X                 X
Velicu et al. (2016)         X     X                 X
Lopes et al. (2015)          X     X                 X
Queiroz Neto et al. (2017)   X     X                 X
Parate et al. (2014)         X     X                 X
Cvetkovic et al. (2017)      X     X                 X
ASP                          X     X                 X

Table 1 summarizes related work that uses IoT with cloud computing for health applications. We observe that the integration of IoT elements (wearable devices and smartphones) with the power of cloud computing enables the creation of complex IoT systems capable of processing and storing data from thousands of smart objects. Health applications can capture a variety of physiological data from users but cannot process and store them on smart objects with limited hardware. Cloud computing can address these IoT limitations by providing the infrastructure for processing and storing data from smart objects.

3 ASP Architecture

In this section we detail ASP, an IoT approach to help sedentary people. The ASP architecture is composed of three main steps, as presented in Figure 1. The first step is the gathering and display of data. The second step is the storage and processing of the collected data in the cloud. Finally, the third step is the web application through which the health professional monitors the patient's condition and sets goals according to the data obtained.

The first step is responsible for collecting the user's physiological data (e.g., heartbeat, number of daily steps and calories) using a Xiaomi Mi Band 2 wearable device [8]. Once collected, the user data is sent via the Bluetooth Low Energy (BLE) protocol to the smartphone, where it is displayed. Bluetooth Low Energy is a wireless technology designed for novel applications in healthcare, fitness, home entertainment and security. Compared to classic Bluetooth, BLE is intended to provide considerably reduced power consumption and cost while maintaining a similar communication range.


Figure 1: Proposed architecture (users wearing Mi Bands send data over BLE to a smartphone, which forwards it over Wi-Fi to a cloud server hosting the database, the analyses, and the web application for the admin)

The user has access to his or her physiological data through a mobile application that displays the data through visualizations. The application also receives the feedback from the health professional, indicating the new daily goals to be followed. The smartphone is also used as a gateway to send the wearable device data to the cloud.

The second step is responsible for sending the captured data to the cloud for processing and storage. The data is sent from the smartphone to the cloud via the Wi-Fi protocol. In the cloud, the data is stored in the database and processed for statistical inference. The information is then sent to the health professional through a web application. The health professional can follow the patient remotely, observing their daily activities and recommending new goals according to their daily physiological data.

The third step is the web application responsible for receiving the statistical inferences produced by processing the data in the cloud and presenting them to the health professional. The health professional is able to see, through visualizations, the information of his or her patients and to propose new exercises or goals.

4 Evaluation Design

Wearable computing is a computing approach where devices are directly connected to the user; in this approach, users "wear" the device. These devices, called wearables, are intended to capture data in a way that the user does not notice in day-to-day use. Currently, there are several wearable devices (e.g., Apple Watch, Samsung Gear and Google Glass); among the best sellers, however, is the Mi Band 2 watchband [8]. The Mi Band 2 is a watchband from the Xiaomi company with several monitoring functions, among which the following stand out: counting steps traveled, counting calories spent, and measuring the user's heart rate. Another advantage of this smart watch is its low cost, at around $20.00 to $25.00. Figure 2 presents the Xiaomi Mi Band 2 wearable device.

To validate the proposed architecture, this section presents and comments on a Proof of Concept (PoC): an Android application that uses heartbeat data collected by a Mi Band 2 and interacts with a Java web application on a server in the cloud that persists the data in a relational database.


Figure 2: Mi Band 2 [8]

Table 2: API developed in Android for bluetooth communication with Mi Band 2

Method: connect() : boolean
Description: Establishes a communication channel with the Mi Band 2. It is only possible to connect to the Mi Band 2 if the smartphone is paired, via bluetooth, with the Mi Band 2.

Method: startListeningForHeartbeats() : void
Description: Initializes a Listener to observe the bluetooth communication and store the heartbeat data originating from the Mi Band 2.

Method: getLatestHeartbeatReadings() : double[]
Description: Transforms the heartbeat data stored by the Listener into a double vector and returns it.
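The PoC implements these methods in Android/Java. Purely as a language-neutral illustration of the same three operations, the sketch below uses the Python bleak BLE library and the standard GATT heart-rate characteristic; the device address is made up, and Mi Band 2 specifics such as its authentication handshake are omitted.

    # Illustration only: the actual PoC is Android/Java (Table 2). This
    # sketch shows the same idea of connecting and listening for heart-rate
    # notifications over BLE with the bleak library.
    import asyncio
    from bleak import BleakClient

    HR_CHAR = "00002a37-0000-1000-8000-00805f9b34fb"  # GATT heart rate UUID
    readings = []

    def on_heartbeat(_sender, data: bytearray):
        readings.append(data[1])           # bpm byte in the standard encoding

    async def main(address="C8:0F:10:00:00:00"):      # hypothetical MAC
        async with BleakClient(address) as client:    # connect()
            await client.start_notify(HR_CHAR, on_heartbeat)
            await asyncio.sleep(30)        # collect for a while
        print(readings)                    # getLatestHeartbeatReadings()

    if __name__ == "__main__":
        asyncio.run(main())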

Due to the lack of an official API for communication between Android applications and the Mi Band 2, we had to implement a bluetooth communication module (simulating an API) to enable the exchange of information between the Android application and the Mi Band 2. For this, we used essentially three methods to implement the bluetooth communication with the Mi Band 2 in the Android application. Table 2 shows the methods and their respective functions in the application developed to capture heartbeat data through the Mi Band 2.

In addition to the developed bluetooth API methods, we had to create two more important methods. The first is a method that initializes an Android thread that continually polls the server in the cloud to check whether there is an alert for the user; this method is invoked as soon as the connection to the Mi Band 2 is established. The second method is responsible for instantiating an Android thread to pack a set of heartbeat data, retrieved by the getLatestHeartbeatReadings() method, and send it to the server in the cloud. In this PoC, the HTTP protocol was used to send the heartbeat data to the server and also to obtain the alerts for the user that originate on the server.

Figure 3 illustrates the developed PoC, which is responsible for connecting to the Mi Band 2, retrieving heartbeat data and displaying to the user the alerts identified by the server in the cloud. For this PoC, a Java web application was also developed to represent the processing that is done on the server in the cloud.


Figure 3: Main activity of the Android application

This application is divided into two modules: a service module and an administrative module. The service module has two basic services: the first is responsible for receiving the heartbeat data and storing it in the database, and the second checks whether there is an alert to display in the Android application. The administrative module is used by a user who understands the nature of the data collected by the wearable device and interprets it. For this, the administrative module provides a web interface for viewing the data collected by the wearable device. In addition, the administrative module provides an interface for registering alerts to the Android application user. Figure 4 shows the screen with the data collected in the administrative module, and Figure 5 illustrates the form for sending alerts to the user's Android application. All data collected by the wearable device, as well as the alerts, are persisted in a database; in this PoC, an instance of the PostgreSQL database server was used.

Figure 4: View of data collected in the administrative module


Figure 5: Form for sending alerts to the user’s Android application
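The service module of the PoC is a Java web application backed by PostgreSQL. Purely as an illustration of the logic of its two services, the following Python/Flask sketch uses hypothetical endpoints, field names and in-memory storage:

    # Illustrative-only sketch of the two services of the service module:
    # receiving heartbeat data and answering alert polls. The real PoC is a
    # Java web application backed by PostgreSQL; everything below (routes,
    # payload fields, in-memory store) is a hypothetical stand-in.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    heartbeats = []          # stands in for the PostgreSQL database
    pending_alerts = {}      # user_id -> alert text set via the admin module

    @app.route('/heartbeats', methods=['POST'])
    def store_heartbeats():
        payload = request.get_json()        # e.g. {"user": ..., "bpm": [...]}
        heartbeats.append(payload)
        return jsonify(status='stored'), 201

    @app.route('/alerts/<user_id>', methods=['GET'])
    def check_alert(user_id):
        # Polled continually by the Android thread described above.
        return jsonify(alert=pending_alerts.pop(user_id, None))

    if __name__ == '__main__':
        app.run()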

5 Conclusions and Future Work

This paper proposed ASP, an IoT approach to help sedentary people. The ASP proposal monitors people with physical inactivity and, based on the data collected by wearable devices, sends the patient's information to the health professional, who analyzes it and creates new goals and exercises for the patient. In our approach, a low-cost wearable device (the Xiaomi Mi Band 2 [8]) was used to capture the patient's state data. As a Proof of Concept (PoC), we implemented an application that uses heartbeat data collected by a Mi Band 2; this Android application interacts with a Java web application on a server in the cloud that persists the data in a relational database. As future work, we intend to improve the interface of ASP in order to provide greater interaction between the health professional and the patient. We also intend to run tests with several connected devices to observe the behavior of ASP under high scalability.

References

[1] Emanuel Ferreira Coutinho, Dênis Leorne da Cunha, and Pedro Brito. A proposal for a parking integrated management. In 5th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2017), Evry, Jan 2017.
[2] Božidara Cvetković, Martin Gjoreski, Jure Šorn, Pavel Maslov, Michal Kosiedowski, Maciej Bogdański, Aleksander Stroiński, and Mitja Luštrek. Real-time physical activity and mental stress management with a wristband and a smartphone. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, UbiComp '17, pages 225–228, New York, NY, USA, 2017. ACM.
[3] Thiago Moreira da Costa, Elie EL Rachkidi, Nazim Agoulmine, Leonardo M. Gardini, Reinaldo Braga, Cesar Moura, Luiz O. M. Andrade, and Mauro Oliveira. An enhanced architecture for LARIISA: An intelligent system for decision making and service provision for e-health using the cloud. In 4th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2015), Recife, Dec 2015.

7

106 Proceedings ADVANCE 2018 ISBN 978-2-9561129

ASP Moreira Neto et al.

[4] Vitor de Carvalho Melo Lopes, Valdir Silveira Junior, Carlos Giovanni Nunes de Carvalho, An- tonio Mauro Oliveira, and Jose de Ribamar Matins Bringel Filho. A digital tv prototype with medical sensors for a smart system in home care. In 4th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2015), Recife, dez 2015. [5] Eliezio Gomes de Queiroz Neto, Oton Crispim Braga, Nicodemos Freitas, Luiz Odorico Monteiro de Andrade, and Mauro Oliveira. Vite - velocity and intelligence technology on emergency health systems. In 5th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2015), Evry, Jan 2017. [6] Patrick Guillemin and Peter Friess. Internet of things strategic research roadmap. Technical report, http://www.internet-of-things-research.eu/pdf/IoT Cluster Strategic Research Agenda 2009.pdf, 2009. [7] S. Krachunov, C. Beach, A. J. Casson, J. Pope, X. Fafoutis, R. J. Piechocki, and I. Craddock. Energy efficient heart rate sensing using a painted electrode ecg wearable. In 2017 Global Internet of Things Summit (GIoTS), pages 1–6, June 2017. [8] Rodrigo Leal, Ismael Pereira, Cidronio Oliveira, and Francisco Airton Silva. Uma anlise compara- tiva de aplicativos mveis para medio de batimentos cardacos. In III Escola Regional de Informtica do Piau. Livro Anais - Artigos e Minicursos, Jul 2017. [9] J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao. A survey on internet of things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet of Things Journal, 4(5):1125–1142, Oct 2017. [10] Gabriel Lopes, Mauro Oliveira, and Vania Vidal. Lais: Towards to a linked data framework to support decision-making on healthcare. In 5th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2015), Evry, Jan 2017. [11] Mamidi Manisha, Katakam Neeraja, Vemuri Sindhura, and Paruchuri Ramya. Iot on heart attack detection and heart rate monitoring. International Journal of Innovations in Engineering and Technology (IJIET), 7(2), August 201. [12] Maur´ıcioMoreira Neto, Emanuel Coutinho, Leonardo Moreira, and Gustavo Santos. Feed: a fore- cast evaluation of elasticity cloud data. In 7th WORKSHOP ON AUTONOMIC DISTRIBUTED SYSTEMS (WoSiDA2017), may 2017. [13] Maur´ıcioMoreira Neto, Jos´eNeuman de Souza, Atslands R. da Rocha, Emanuel Ferreira Coutinho, and Leonardo O. Moreira. Miner: an approach for management of iot elements in a cloud using software defined network. In 5th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2017), Evry, Jan 2017. [14] Davi Oliveira, Leonardo Moreira, Emanuel Coutinho, and Gabriel Paillard. Uma proposta ar- quitetural de sistema autonmico para reduo do custo de energia em laboratrios de informtica. In V Workshop de Sistemas Distribudos Autonmicos (WOSIDA2015), may 2015. [15] Davi Oliveira, Maur´ıcioM. Neto, Emanuel Ferreira Coutinho, Gabriel Paillard, Ernesto Trajano, and Leonardo Moreira. An autonomic system for energy cost reduction in computer labs. In 5th International Workshop on ADVANCEs in ICT INfrastructures and Services (ADVANCE 2017), Evry, Jan 2017. [16] Abhinav Parate, Meng-Chieh Chiu, Chaniel Chadowitz, Deepak Ganesan, and Evangelos Kaloger- akis. Risq: Recognizing smoking gestures with inertial sensors on a wristband. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’14, pages 149–161, New York, NY, USA, 2014. ACM. [17] O. R. Velicu, N. M. Madrid, and R. Seepold. 
Experimental sleep phases monitoring. In 2016 IEEE- EMBS International Conference on Biomedical and Health Informatics (BHI), pages 625–628, Feb 2016.


Experimental Evaluation of Adaptive Beaconing for Vehicular Communications

Pablo Ortega1, Sandra Céspedes1,2, Sandy Bolufé1, and Cesar A. Azurdia-Meza1

1 Department of Electrical Engineering, Universidad de Chile, Santiago, Chile
2 NIC Labs Chile, Universidad de Chile, Santiago, Chile
{pabloortega,scespedes,sbolufe,cazurdia}@ing.uchile.cl

Abstract
In this work we developed an experimental setup to evaluate an adaptive beaconing algorithm that helps maintain cooperative knowledge in vehicular communication networks. The algorithm controls the frequency of the beacons with the objective of reducing channel load while maintaining a fixed average position error, calculated from the location information received in the beacons from neighboring vehicles. To evaluate the performance of the adaptive algorithm, we set up a testbed with commercial on-board units (OBUs) of the kind to be used in connected vehicles. The experimental results are compared with simulated results obtained via network simulation tools.

1 Introduction

Modern vehicles are equipped with systems that include ultrasonic sensors used for parking assistance, cameras employed to monitor the lane or detect pedestrians, and radar technology that allows the detection of, and measurement of the distance to, vehicles or nearby obstacles, among others [1]. However, these sensors have limitations that in most cases can be overcome with vehicle-to-vehicle (V2V) communication, which offers an efficient platform for the deployment of safety and data dissemination applications in vehicular environments. Vehicular communication networks provide a reliable system through the exchange of information, mostly related to road safety. The intention is to provide each vehicle with data such as the identification and motion parameters of the vehicles in its area of influence, especially those that are not in the drivers' field of vision, in order to alert drivers of possible road hazards early enough that proper actions can be taken on time [2]. Cooperative knowledge is the basis of multiple applications in the categories of road safety and traffic management. This cooperative knowledge is built upon the periodic exchange of messages called beacons, which contain important data such as position, speed, and acceleration, among others. Using the information provided by cooperative messaging, vehicles and roadside units (RSUs) are able to create a map of their surroundings, which is then used as input for safety applications that detect potentially hazardous situations [3]. However, the effective transmission of beacons is very sensitive to network conditions: a fixed beacon transmission rate can easily increase the channel load and saturate the network, in particular in scenarios with a high density of vehicles [4]. Conversely, a reduction of the beacon transmission rate may degrade the quality of the information that is built with the help of neighboring vehicles. To address this problem, adaptive beaconing algorithms have been proposed in [5][6][7][8]; they have been shown to improve the effective transmission of beacons by reducing the beaconing load in the channel and increasing the reliability of beacon delivery, while adapting the transmission of beacons with the least impact on neighboring nodes. However, these adaptive algorithms have been tested only through computational tools, either numerically or via simulation.


The objective of this experimental work is to evaluate the performance of the adaptive beaconing algorithm proposed in [5], using testbed equipment for vehicular communications. This algorithm was selected because, to the best of the authors' knowledge, it is the most recent work that controls the beacon transmission rate to meet the position accuracy requirements of safety applications.

2 Performance Evaluation Metrics

2.1 Position Error Metric

In [5] the authors proposed the evaluation of an adaptive beaconing algorithm using the following criteria as metrics [4]: the minimum, maximum, and average error of the last received position information compared with the real physical position of the sending vehicle. The relevant input parameters for the metric are the vehicle velocity v, the beacon rate f_Tb, and the transmission delay D_TbRb.

Minimum position error (E_pmin): the position error that occurs due to the displacement of the vehicle during the beacon transmission. This error is related to the transmission delay D_TbRb, which is typically around 0.001 s according to [5].

Maximum position error (E_pmax): the position error that occurs when the position of the vehicle is looked up in the neighbor database just before receiving the next beacon from that vehicle. This error is equal to the distance traveled by the vehicle in a time interval equal to the inverse of the beacon transmission frequency.

Average position error (E_prom): the mean error, assuming that the event of looking up the position is uniformly distributed between the minimum and maximum time between the reception of two consecutive beacons. The time parameters that influence the average position error are depicted in Figure 1.
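The explicit decomposition below is our addition, following [4], and is written so as to be consistent with Eq. (1):

\[
E_{pmin} = v \, D_{TbRb}, \qquad
E_{pmax} = v \, f_{Tb}^{-1}, \qquad
E_{prom} = \frac{E_{pmin} + E_{pmax}}{2}.
\]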

Figure 1: Key parameters to determine position error (Adapted from [4])

The average position error is defined as follows [4]:

E_prom = (E_pmin + E_pmax) / 2 = v (D_TbRb + f_Tb^{-1}) / 2,    (1)

where f_Tb is the frequency of the beacon transmission.

Beaconing schemes that use a fixed frequency and a fixed transmission power involve certain tradeoffs. With a high f_Tb the network achieves a low position error, but the probability of collision on the shared channel increases in environments with high vehicular density.


Reducing f_Tb in such scenarios also reduces the probability of collision, but the position error increases significantly, hence degrading the level of cooperative knowledge. The algorithm proposed in [5] addresses this problem using a joint scheme with an adaptive beacon transmission frequency. The algorithm dynamically adjusts f_Tb in each vehicle according to its velocity and acceleration, to obtain a bounded average position error of 1 m, which provides an acceptable level of error for cooperative knowledge.

2.2 Numerical Evaluation

Based on the expression given in (1), we performed a numerical evaluation to verify the impact that the vehicle speed v has on the average and maximum position error. This evaluation does not consider the minimum position error because it depends only on the transmission delay, which is negligible. In our evaluation we use four typical velocities to account for different reference scenarios: residential areas (30 km/h ≈ 8.33 m/s), metropolitan areas (50 km/h ≈ 13.88 m/s), rural areas (100 km/h ≈ 27.7 m/s), and highways (150 km/h ≈ 41.7 m/s) [9]. Vehicles are assumed to move at a constant speed.

Figure 2: Average position error (m) depending on velocity and beacon transmission rate, with D_TbRb = 0.001 s

Figure 2 shows the relation between several beacon transmission rates and the vehicle velocity according to the average position error given in expression (1). The average position error achieved is between 0 m and 10 m, with the beacon rate varying from 0 Hz to 10 Hz. If the transmission rate is 10 beacons/s, the error can be kept at 1 m only up to 70 km/h (≈ 20 m/s). An average error of 5 m is achieved with a beacon transmission rate of 5 beacons/s only up to 180 km/h (≈ 50 m/s). An accuracy of 2 m with a beacon transmission rate of 10 beacons/s is achieved only at velocities up to 145 km/h (≈ 40 m/s).
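These operating points can be reproduced directly from expression (1). The short snippet below (an illustration of ours, not the authors' evaluation code) solves Eq. (1) for the maximum velocity that keeps a target average error at a given beacon rate:

```java
// Illustrative check of the operating points read from Figure 2:
// solving Eq. (1) for the velocity gives v = 2*E / (D + 1/f).
public class PositionErrorCheck {
    static final double D = 0.001; // transmission delay D_TbRb [s]

    static double maxVelocity(double targetError, double beaconRate) {
        return 2.0 * targetError / (D + 1.0 / beaconRate); // [m/s]
    }

    public static void main(String[] args) {
        System.out.printf("%.1f m/s%n", maxVelocity(1.0, 10.0)); // ~19.8 m/s (~71 km/h)
        System.out.printf("%.1f m/s%n", maxVelocity(5.0, 5.0));  // ~49.8 m/s (~179 km/h)
        System.out.printf("%.1f m/s%n", maxVelocity(2.0, 10.0)); // ~39.6 m/s (~143 km/h)
    }
}
```

Running it yields approximately 19.8 m/s, 49.8 m/s, and 39.6 m/s, matching the values read from Figure 2.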


With the aim of validating the results in [5], which were obtained through simulations and numerical evaluations with computational tools, in the following section we describe an experimental evaluation setup for the adaptive beaconing algorithm proposed in [5], using a testbed with devices for vehicular communications.

3 Methodology

In order to evaluate the selected adaptive beaconing algorithm, two commercial OBUs from Arada Systems [10] are employed to create the experimental testbed (Figure 3). Each OBU is installed on a bicycle to test the communication with other OBUs. The OBU equipment supports Dedicated Short-Range Communications (DSRC) and the IEEE 802.11p, IEEE 1609.2, IEEE 1609.3, and IEEE 1609.4 protocols; the transmission frequency range is between 5.7 GHz and 5.9 GHz [11]. To obtain preliminary results, we first built a pilot test between a fixed OBU and a moving portable OBU installed on a bicycle, to assess the average error in a controlled environment.


Figure 3: Arada LocoMate Mobile Device and its installation on the bicycle

4 Preliminary Results

The algorithm in [5] has been implemented with the Software Development Kit available for the Arada Systems equipment. The initial experiment is performed with a fixed OBU acting as the receiver and a mobile portable OBU acting as the transmitter of beacons. The data rate was set to 6 Mbps and the average packet size was fixed at 81 bytes. The testing scenario is O'Higgins Park, located in Santiago de Chile (Figure 4). The distance traveled by the mobile OBU is 200 meters, with the fixed OBU located in the middle of the route to take advantage of the coverage range. The experimental setup and the evaluation parameters are depicted in Figure 5 and Table 1, respectively. The first result analyzed is the packet loss rate according to the distance between the transmitter and receiver OBUs. It can be observed in Figure 6 that for distances below 50 m, the algorithm achieves a beacon reception rate near 100%. The loss rate is around 10% for distances between 50 m and 80 m. However, the packet loss rate increases up to 50% for distances longer than


Figure 4: Experimental scenario: Start of the route (Top point), End of the route (Bottom point)

Figure 5: Schematic illustration for experimental evaluation of adaptive beaconing algorithms

80 m. This results in the degradation of the neighbors' position information created at the receiving OBU. The next evaluation corresponds to the maximum and average position error achieved when employing a fixed beacon transmission frequency, compared to the adaptive algorithm proposed in [5]. The fixed transmission frequency f_Tb is set to 1 Hz, which could be considered an appropriate frequency to avoid channel congestion at the cost of degrading the accuracy of the neighbors' position information.


Parameter                        Value
Position of the Fixed OBU        33° 27' 37.69"S, 70° 39' 36.14"W
Position of the Mobile OBU       Initial: 33° 27' 35.78"S, 70° 39' 35.89"W
                                 Final: 33° 27' 40.42"S, 70° 39' 35.92"W
Speed of the Mobile OBU          0 m/s - 6 m/s
Average Route Distance           190 m
Frequency                        5.9 GHz
Channel Bandwidth                10 MHz
Desired Position Error           1 m
Data Rate                        6 Mbps
Transmission Power               20 dB

Table 1: Evaluation Parameters

Figure 6: Packet reception rate according to the distance between OBUs

The average and maximum position error obtained from the experimental setting is depicted in Figure 7. At the beginning of the test, both the velocity of the OBU and the beacon transmission rate of the fixed beaconing algorithm are low (i.e., 1 beacon/s). The position error obtained in the experiments is similar to the one reported in [5]. However, when the velocity increases in intervals of 3 m/s and 6 m/s, the observed position error also increases, to values around 4 m and 6 m for the average and maximum position errors, respectively. When the test is performed with adaptive beaconing, we set the frequency f_Tb to 1 Hz only for the initial state. When the velocity increases during the experiment, the algorithm implemented in the OBUs computes the velocity between two consecutive beacons and calculates f_Tb for the next transmission. The objective is to keep the position error bounded to 1 m [5]; in this work, the average position error with adaptive beacon transmission is 1.3 m. The results show the improvement of adaptive beaconing over the fixed beaconing scheme: both the average and maximum position error are reduced when the frequency is adapted according to the velocity of the vehicle. Our preliminary results are consistent with the theoretical results of the evaluated algorithm.
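A simplified form of this per-vehicle update can be obtained by inverting Eq. (1) for f_Tb. The sketch below assumes the target error and the rate bounds as illustrative parameters; the actual control law in [5] also accounts for acceleration, which is omitted here.

```java
// Simplified adaptive beacon-rate update obtained by inverting Eq. (1);
// the algorithm of [5] additionally uses acceleration, omitted here.
public class AdaptiveBeaconing {
    static final double D = 0.001;       // transmission delay D_TbRb [s]
    static final double TARGET_E = 1.0;  // desired average position error [m]
    static final double F_MIN = 1.0, F_MAX = 10.0; // assumed rate bounds [Hz]

    /** Beacon rate for the next transmission given the current speed [m/s]. */
    static double nextRate(double v) {
        double f = v / (2.0 * TARGET_E - v * D); // from E = v(D + 1/f)/2
        return Math.max(F_MIN, Math.min(F_MAX, f)); // clamp to the frame limits
    }

    public static void main(String[] args) {
        for (double v : new double[] {2, 6, 14, 20}) // walking to urban speeds
            System.out.printf("v=%4.1f m/s -> f=%5.2f Hz%n", v, nextRate(v));
    }
}
```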


Figure 7: Average and maximum position error perceived by the fixed node

5 Conclusions and Future Work

In this work we have presented a methodology to perform experimental tests of an adaptive beaconing algorithm for the maintenance of cooperative knowledge in vehicular networks. The algorithm aims to maintain a proper accuracy of the information, measured through the difference between the known and the real position of the vehicles. In previous works, adaptive beaconing algorithms have proved to be effective in reducing the position error, increasing beacon delivery, and reducing channel load; however, their validation has been performed exclusively through numerical and simulation tools. In this work the average position error with adaptive beacon transmission is 1.3 m. For distances below 50 m, the algorithm achieved a beacon reception rate near 100%, whereas the loss rate is around 10% for distances between 50 m and 80 m. Future work involves the installation of OBUs in cars to assess the average position error at vehicular velocities. We will refine our implementation and then artificially increase network density using a USRP to emulate high-density traffic conditions. The experimental results will be compared with simulation data obtained using software tools. We expect to establish which physical layer aspects of the network may be used to improve the performance of the algorithm via cross-layer design.

Acknowledgements

This work has been partially funded by CONICYT Project FONDECYT Iniciación 11140045, ERANET-LAC ELAC2015/T10-0761, and Project STIC-AmSud VNET 16-STIC-11.

References

[1] A. M. Cailean, B. Cagneau, L. Chassagne, V. Popa, and M. Dimian, "A survey on the usage of DSRC and VLC in communication-based vehicle safety applications," Proceedings of the 2014 IEEE 21st Symposium on Communications and Vehicular Technology in the BeNeLux, IEEE SCVT 2014, pp. 69–74, 2014.
[2] M. Sepulcre, J. Gozalvez, O. Altintas, and H. Kremo, "Integration of congestion and awareness control in vehicular networks," Ad Hoc Networks, vol. 37, pp. 29–43, 2016.
[3] M. Boban and P. M. D'Orey, "Exploring the Practical Limits of Cooperative Awareness in Vehicular Communications," IEEE Transactions on Vehicular Technology, vol. 65, no. 6, pp. 3904–3916, 2016.
[4] R. K. Schmidt, T. Leinmüller, E. Schoch, F. Kargl, and G. Schäfer, "Exploration of adaptive beaconing for efficient intervehicle safety communication," IEEE Network, vol. 24, no. 1, pp. 14–19, 2010.
[5] S. Bolufe, S. Montejo, C. A. Azurdia, S. Céspedes, R. D. Souza, and E. M. G. Fernandez, "Dynamic Control of Beacon Transmission Rate with Position Accuracy in Vehicular Networks," SSN2017: III Spring School on Networks, Pucón, Chile, October 2017, pp. 1–4, 2017.
[6] S. A. A. Shah, E. Ahmed, F. Xia, A. Karim, M. Shiraz, and R. M. Noor, "Adaptive Beaconing Approaches for Vehicular Ad Hoc Networks: A Survey," IEEE Systems Journal, pp. 1–15, 2016.
[7] M. Sepulcre, J. Gozalvez, O. Altintas, and H. Kremo, "Adaptive Beaconing for Congestion and Awareness Control in Vehicular Networks," pp. 81–88, 2014.
[8] A. Daniel, D. C. Popescu, and S. Olariu, "A study of beaconing mechanism for vehicle-to-infrastructure communications," IEEE International Conference on Communications, pp. 7146–7150, 2012.
[9] D. Jiang and L. Delgrossi, "IEEE 802.11p: Towards an International Standard for Wireless Access in Vehicular Environments," pp. 2036–2040, 2008.
[10] ARADA. LocoMate Classic On Board Unit OBU-200. [Online]. Available: http://www.aradasystems.com/
[11] J. B. Kenney, "Dedicated short-range communications (DSRC) standards in the United States," Proceedings of the IEEE, vol. 99, no. 7, pp. 1162–1182, 2011.

A new SDN-enabled Routing scheme in Vehicular Networks

Mehdi Sharifi Rayeni, Abdelhakim Hafid
Department of Computer Science and Operations Research, University of Montreal, Montreal, Canada

Abstract—Software Defined Networking (SDN) has already been used in recent literature to add flexibility and programmability to Vehicular Ad hoc Networks (VANETs). However, there are numerous open issues in implementing SDN for central control and management of VANETs. In this paper, we use an SDN controller to mitigate congestion of Vehicle-to-Vehicle communications. This is achieved by efficient utilization of VANET bandwidth on road segments. The proposed SDN controller provides a novel routing mechanism that takes into account other existing routing paths which are already relaying data in the VANET. New routing requests are served such that no road segment gets overloaded by multiple crossing routing paths. We provide an efficient algorithm for a practical solution. Our simulations show the QoS improvement, in terms of max crossing path ratio, achieved by our proposal in comparison with recent related contributions.

Keywords—Optimal resource utilization; Software Defined Networking; Connected vehicles; Road oriented routing; Congestion prevention; Heterogeneous Vehicular Networks.

I. INTRODUCTION

In this paper, we consider a heterogeneous vehicular network which consists of (i) the WAVE standard for Vehicular Ad hoc Network (VANET) communications, i.e. V2V and V2I communications; and (ii) LTE technology for vehicle communications to evolved NodeB (eNodeB) Radio Access Network units (E-UTRAN). Hence, there are two communication options for vehicles: WAVE and LTE networks. Vehicles may use their WAVE- and LTE-enabled client interfaces simultaneously. The resultant network is referred to as a Heterogeneous Vehicular Network (HetVNet) [1]. We employ a bird's-eye view over the routing paths of ongoing data communications in the VANET. This view is computed by a central decision center that keeps track of existing data communication paths in the VANET. For a new routing request, the decision center aims to balance data traffic over all road segments subject to delay constraints. The central decision center is implemented by a Software Defined Network (SDN) controller. SDN controllers employ OpenFlow [2] as a common protocol to adjust the routing of flows in OpenFlow-enabled switches throughout the whole network. In the case of HetVNets, the switches are vehicles and RSUs. In our scheme, the SDN controller does not deal with the flow tables of intermediate vehicles in the routing path; it just provides the requester with an optimal routing path. The SDN controller computes the routing path such that more road segments are utilized in VANET communications (thus balancing the load) and the delay constraint for packet delivery of the request is satisfied. In this paper, we propose a Cloud-based routing approach that takes into account the other existing routing paths which are already relaying data in the VANET. Moreover, we present a Software Defined Networking model and mechanism for VANET congestion control and for the monitoring of real-time WAVE connectivity and transmission delays on road segments.

The rest of the paper is organized as follows. Section II presents the proposed system model and mechanisms. Section III evaluates the performance of the proposed solutions. Section IV concludes the paper.

II. SYSTEM MODEL AND OPERATIONS

A. System Model and Application Scenarios

As for the system architecture and application scenarios, Fig. 1 shows the system model and a scenario for information query in HetVNets. Vehicles are provided with digital road maps and GPS. We assume that vehicles are equipped with both WAVE and LTE transceivers. As for LTE, eNodeBs provide cellular coverage for the radio access network in the urban environment.

Fig. 1. A typical urban scenario for the clients v and w requesting traffic information of road segment s in HetVNets.

B. Operations

Since our goal is to plan data routing in the WAVE network, we design an SDN-enabled scheme to keep track of the following: (1) existing data communication paths in the WAVE network; and (2) real-time WAVE connectivity and transmission delay over each road segment.

1) Keeping track of existing WAVE data communication paths: To initiate a communication path with a source, a vehicle sends the routing request RREQ to the SDN controller. RREQ includes a request id, vehicle id, requested content (e.g. traffic/parking information), geo-location of the source, and a delay threshold for packet delivery. The SDN controller computes the optimal routing path in WAVE from the source to the vehicle. The optimal path is computed based on the existing WAVE paths in the network. The SDN controller saves the optimal path in an OPT (i.e. Optimal path) packet and sends it via LTE downlink to the requester vehicle. OPT includes a message id, vehicle id, and the optimal path. The content of the optimal path field consists of the ordered string of road segment ids. Upon receipt of OPT, the requester embeds the optimal path in the header of each data packet and starts to send/receive data towards/from the source through WAVE multi-hop communications. When the data delivery job completes, the requester sends the Finish control message (FINI) to the SDN controller. FINI includes the original RREQ request id. Fig. 2 shows a simplified bird's-eye view of existing WAVE communication paths in the Manhattan urban scenario. Different paths may have some road segments in common; the more paths crossing a road segment, the more WAVE congestion and DSRC channel access contention on that segment. Therefore, it is desirable to establish routing paths that utilize unused road segments. However, the route planning method should take into account the delay threshold of packet delivery for each routing request (i.e. the delay threshold field in the RREQ message format).

Fig. 2. A bird's-eye view of existing WAVE communication paths in the Manhattan urban scenario.
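The paper does not reproduce ORUR's pseudo-code, but the intuition behind the controller's path computation (prefer road segments that few existing paths already cross, subject to the RREQ delay threshold) can be sketched as a load-weighted Dijkstra over the road graph. Everything in the sketch below (class and field names, the linear load penalty) is an illustrative assumption of ours, not the authors' adaptive K-constrained shortest path algorithm.

```java
// Illustrative sketch only: a load-weighted Dijkstra over the road graph,
// approximating ORUR's goal of avoiding segments already crossed by paths.
import java.util.*;

public class LoadAwareRouting {
    // edges.get(u) = list of {neighbor intersection, road segment id}
    Map<Integer, List<int[]>> edges = new HashMap<>();
    Map<Integer, Double> segmentDelay = new HashMap<>(); // monitored delay [s]
    Map<Integer, Integer> segmentLoad = new HashMap<>(); // # paths crossing

    /** Cost of a segment: its monitored delay inflated by its current load. */
    double cost(int seg) {
        return segmentDelay.getOrDefault(seg, 1.0)
                * (1 + segmentLoad.getOrDefault(seg, 0)); // assumed penalty
    }

    /** Least-cost route between two intersections (plain Dijkstra). */
    double route(int src, int dst) {
        Map<Integer, Double> settled = new HashMap<>();
        PriorityQueue<double[]> pq =
                new PriorityQueue<>(Comparator.comparingDouble(a -> a[0]));
        pq.add(new double[] {0, src});
        while (!pq.isEmpty()) {
            double[] cur = pq.poll();
            int u = (int) cur[1];
            if (settled.containsKey(u)) continue;
            settled.put(u, cur[0]);
            if (u == dst) return cur[0];
            for (int[] e : edges.getOrDefault(u, List.of()))
                if (!settled.containsKey(e[0]))
                    pq.add(new double[] {cur[0] + cost(e[1]), e[0]});
        }
        return Double.POSITIVE_INFINITY; // unreachable
    }
}
```

In such a scheme, a request would be accepted only if the cumulative delay of the returned route stays below the RREQ delay threshold, after which the load counters of the chosen segments are incremented (and decremented again when the corresponding FINI arrives).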

2) Monitoring real-time WAVE connectivity and transmission delays on road segments: To estimate the connectivity and delay metrics, the SDN controller, for every intersection, periodically selects a random vehicle which is located close to the intersection and sends it the control message MONITOR via LTE downlink (step 1 in Fig. 3). The vehicle disseminates MONITOR to the nearby road segments (step 2); the vehicles calculate their local density and forward MONITOR to the next vehicles; the vehicle closest to the next intersection (vehicle z) sends the average vehicle density and transmission delay of the road segment to the SDN controller via LTE uplink (step 3).

Fig. 3. Steps for monitoring real-time WAVE connectivity and transmission delay on road segment (i, j).

C. Optimal Resource Utilization Routing

Our proposed routing scheme aims to provide a routing path for a requester such that it involves a minimum number of road segments shared with existing routing paths in the WAVE network. The resulting path should satisfy the delay threshold of the request. We devise an adaptive K-constrained shortest path algorithm for the Manhattan urban scenario. It is a combination of a generalized Dijkstra algorithm and parallel computing. The complexity of the algorithm is in the order of O(maxK · V²), where K and V are the adjusted max number of shortest paths and the number of intersections, respectively.

III. PERFORMANCE EVALUATION

We compare the performance of our proposed Optimal Resource Utilization Routing scheme (ORUR) with GeoSpray [3], AQRV [4], and Vela [5]. In order to simulate communicating pairs, for each period of 3 seconds, we selected up to 600 vehicles to initiate 300 pairs of communicating sessions. To implement the simulations, we used OMNeT++ [6], SUMO [7], and Veins LTE [8].

Fig. 4. Max-Crossing-Paths-ratio versus number of communicating sessions.

Fig. 4 shows the Max-Crossing-Paths-ratio (i.e. the max number of routing paths which cross a road segment) versus the number of communicating sessions. Since AQRV uses global pheromones, it can select paths that are alternatives to the shortest path; thus, it shows a lower Max-Crossing-Paths-ratio compared to GeoSpray and Vela. Vela shows the highest Max-Crossing-Paths-ratio because all data routing paths are selected among a limited number of bus routes. In extreme cases, we observe that ORUR achieves a 28%, 42%, and 56% lower Max-Crossing-Paths-ratio than AQRV, GeoSpray, and Vela, respectively; this shows the load balancing behavior of ORUR.

IV. CONCLUSIONS

Instead of routing over a limited set of road segments, our proposed routing scheme, ORUR, balances the load of communication paths over the whole set of road segments, thus helping to prevent potential congestion in VANETs. ORUR makes use of a new SDN scheme in processing routing requests. The objective of the proposed SDN controller is to monitor real-time WAVE connectivity and transmission delays on road segments. ORUR outperforms existing VANET routing protocols in terms of Max-Crossing-Paths-ratio. For future work, we will consider more performance parameters in our evaluations.

REFERENCES

[1] K. Zheng, et al., "Architecture of Heterogeneous Vehicular Networks," in K. Zheng, et al., Heterogeneous Vehicular Networks, Springer, pp. 9-24, 2016.
[2] I. Ku, et al., "Towards software-defined VANET: Architecture and services," 13th Annual Mediterranean Ad Hoc Networking Workshop (MED-HOC-NET), pp. 103-110, 2014.
[3] V. N. G. J. Soares, et al., "GeoSpray: A geographic routing protocol for vehicular delay-tolerant networks," Information Fusion, vol. 15, pp. 102-113, 2014.
[4] G. Li, et al., "Adaptive Quality-of-Service-Based Routing for Vehicular Ad Hoc Networks With Ant Colony Optimization," IEEE Transactions on Vehicular Technology, vol. 66, no. 4, pp. 3249-3264, 2017.
[5] F. Zhang, et al., "On Geocasting over Urban Bus-Based Networks by Mining Trajectories," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 6, pp. 1734-1747, 2016.
[6] OMNeT++, Discrete event simulator. Online: https://omnetpp.org/.
[7] SUMO, Urban mobility simulator. Online: http://sumo.dlr.de/.
[8] Veins LTE, A simulator for heterogeneous vehicular networks. Online: http://veins-lte.car2x.org/.


Enhancing the Performance of Wireless Sensor Network through Cross-Layer and Graph Coloring Approaches

Bruno Rogério S. dos Santos1, Leonardo S. Rocha1, Renan da S. Alves1, Rafael L. Gomes1, Joaquim Celestino Jr.1

State University of Ceará (UECE), Fortaleza, Ceará, Brazil1
{brunoro1389, lsampaiorocha}@gmail.com, [email protected]
{rafaellgom, celestino}@larces.uece.br

Abstract
In Wireless Sensor Networks (WSNs), a set of autonomous sensors feeds a base station with information. The transmissions from the nodes to the base station need to be scheduled in time slots. Thus, an efficient protocol is necessary to avoid collisions and interference while minimizing energy consumption. Within this context, this paper proposes a cross-layer protocol based on graph coloring for wireless sensor networks. The main characteristic of the proposed protocol is an efficient use of the time slots allocated to the sensors, offering scalability and efficiency in the transmission to the base station. The validation of the proposal was performed in a simulator specific to sensor networks. The proposed protocol outperformed existing protocols that do not reuse time slots, even in high density scenarios. In general, the protocol supported 53% more nodes and achieved 30% better energy efficiency in the data collection.

1 Introduction

A wireless sensor network (WSN) is a set of autonomous distributed sensors used to monitor environmental information (such as temperature, pressure, and movement) and transmit this information to a base station, which processes it [5]. The sensors need to share the physical medium to transfer the data efficiently, so a medium access protocol needs to coordinate the transmissions to avoid interference and collisions [6]. In the Time-Division Multiple Access (TDMA) protocol, each sensor has a time interval in which to transmit its data, avoiding interference between sensors [8]. The scheduling of time slots in TDMA is done by the base station, which synchronizes the sensors. The scheduling process in WSNs allows receptions and transmissions free from collisions, reducing the energy consumption of the sensors (some nodes stay in sleep mode and no retransmissions are needed). Two other key aspects in WSNs (besides collisions, interference, and energy consumption) are scalability and effectiveness [2]. A scalable protocol enables the growth of the network topology (number of nodes and increased connectivity) without compromising transmission performance. In the same way, effectiveness represents the relation between the amount of data transmitted to the base station and the energy consumed by the network as a whole. One possible way to address these aspects is a cross-layer approach, which streamlines communication between layers, providing responses based on a more complete view of the network. Within this context, this paper proposes the Spatial Reuse-Medium Access Control (SR-MAC) protocol, a medium access protocol for WSNs that reuses the time slots in a TDMA frame. This approach increases the scalability of the network, as well as maximizes the performance of data transmission to the base station. The SR-MAC protocol applies a graph coloring heuristic [7] in order to reuse the time slots used by the sensors and, thus, make the protocol scalable and efficient with respect to base station data collection, in addition to maximizing network


performance and the exchange of information. Additionally, a cross-layer approach was used to allow SR-MAC to search for the best route between the base station and the sensors. From the results obtained, it is verified that SR-MAC provides scalability, allowing the insertion of more nodes in the network and generating an increase in data flow. It is also efficient in collecting the data captured by the base station, even at a significant energy cost. A previous version of SR-MAC was presented in reference [3]. The current version of the protocol has improvements in the medium access control mechanism, as will be described throughout the paper. Additionally, this paper presents new results (with a more robust and deeper analysis). This paper is organized as follows. Section 2 presents related works, encompassing topics related to WSNs and medium access protocols. Section 3 describes the proposed SR-MAC protocol, while Section 4 shows the results obtained in the experiments performed. Finally, Section 5 summarizes the paper and presents some future works.

2 Related Works

This section describes key related works about medium access protocols in WSNs, as well as cross-layer approaches. The SOTP protocol [9] applies a cross-layer approach based on the TDMA scheme. SOTP adopts a reverse scheduling to reduce the transmission delay in multihop networks. This scheduling allocates the time slots of the TDMA frame in reverse order: the first nodes to register in the network receive slots at the end of the frame, while the last nodes to register transmit at the beginning of the frame. This process occurs until all the nodes are registered. In this way, the data coming from a node will be delivered in a single frame, reducing the transmission delay. Despite the cross-layer approach, the SOTP protocol does not reuse time slots, preventing it from achieving better scalability and effectiveness. Gajjar et al. [4] propose a protocol, called Self Organized, Flexible, Latency and Energy Efficient (SOFLEE), that reuses the transmission slots, aiming to be flexible, energy efficient, and low latency. SOFLEE makes the base station responsible for slot allocation, where the same transmission interval can be shared by nodes that are 2 hops apart. Additionally, SOFLEE splits the network into sectors to increase the spatial reuse of the channels and to reduce the data latency. However, the 2-hop distance criterion cannot guarantee effectiveness in time interval reuse. Wu et al. [10] present a distributed protocol, called TDMA-CA, for scheduling based on graph coloring to perform a spatial reuse of transmission channels (allowing distinct sensors to use the same transmission channel). Thus, the TDMA-CA protocol allocates different channels to conflicting nodes and schedules distinct time intervals for data transmission. Nevertheless, the distributed approach of TDMA-CA compromises the scalability of the proposal, since all the nodes need to participate in the channel allocation and time interval scheduling processes. To the best of our knowledge, none of the works found in the literature focuses on the development of a medium access protocol for WSNs that targets both the scalability and the effectiveness of data transmission in the network. These issues are addressed by the SR-MAC protocol described in this paper.

3 SR-MAC Protocol

The SR-MAC is a cross-layer protocol which acts on the routing and, mainly, the medium access control tasks. SR-MAC offers an efficient reuse of time intervals in the scheduling of TDMA


using a graph coloring technique, where the scheduling process is centralized in the base station. The time interval reuse increases the duty cycle of the sensors, enhancing the data collection by the base station. Additionally, this reuse process improves the scalability of the network, since the time slots are better used. The graph coloring in sensor networks assigns colors, where each color represents an available time interval in the TDMA frame. When a sensor node registers in the network, an available time interval is associated with this node. Later, when an unregistered sensor node is introduced in the network, several conditions are considered to assign it an already used color (time interval). More details about these conditions are described in Section 3.2. The registry process in the SR-MAC protocol is self-organized, where the operation is divided into frames, which in turn are split into time intervals of the same duration. Each sensor has its own frame, synchronized with the other frames in the network. A frame has 5 types of time interval: Broadcast (BRS), Detection (CSS), Transmission (TX), Reception (RX), and Idle (IDS). Each frame has a time interval dedicated to the base station, the BRS and the CSS; BRS and CSS are the first and second time intervals, respectively. These intervals occur at the beginning of the frame to allow a better control of the network and of the node registration process by the base station. Each sensor has three states: (a) Searching State, (b) Synchronized State and (c) Registered State.

(a) Searching State (b) Synchronized State (c) Registered State
Figure 1: SR-MAC states

In the Search process, illustrated in Figure 1(a), a sensor listens to the transmissions inside its operating range. Thus, the sensor receives from the base station a Time Slot Packet (TSP), which assigns a time interval. The TSP includes information about allocated and available time intervals in the TDMA frame, as well as information about the state of the sensors. The TSP is sent by the base station by broadcast in the BRS interval. Therefore, all active sensors in the network change to synchronization mode when the TSP is received. Figure 1(b) presents the Synchronized state. A sensor chooses a time interval for TX from the information in the TSP (available and allocated time intervals), and its father node is decided (one of the neighbors, each neighbor with an ID). All this information is added to a packet, called REG, which is sent to the father node. From the data in the REG packet it is possible to detect conflicts. The father node forwards the REG packet to its own father, and so on, until the REG packet reaches the base station. When all the information (coming in REG packets from the sensors) is available, the base station applies a graph coloring algorithm to solve the conflicts. Thus, with the algorithm, it is possible to identify the time intervals that can be reused as TX intervals when a new node wants to register. The graph coloring algorithm is described in Section 3.2.


In Figure 1(c), after the allocation of the TX interval, the base station updates the TSP and sends it to all sensors in the network, aiming to propagate it through the network and to allow the father nodes to adapt their RX intervals to the TX intervals of their sons. If two or more nodes choose the same TX interval in the registry process, the base station allocates distinct time intervals to these sensors. After all nodes have registered, the routing tree is finished, determining the routes of each sensor. The graph coloring heuristic enables multiple simultaneous transmissions, avoiding interference and collisions between the sensors.

3.1 Routing

The routes interconnecting the sensors to the base station are based on multihop communication. The SR-MAC protocol defines a reverse scheduling to reduce transmission delays, increasing the probability of the data reaching the base station in a single frame. In order to increase the performance of the reverse scheduling, a heuristic to order the entry of nodes into the registry process is applied. Algorithm 1 represents this ordering process, where the sensor nodes close to the base station receive time intervals at the end of the frame (lower waiting time) and the nodes far from the base station get time intervals at the beginning of the frame (longer waiting time), after the BRS and CSS intervals.

Algorithm 1 Sorting Algorithm
1: Input: n sensor nodes
2: Output: Waiting time t
3: for i ← 1 to n do
4:   Compute the relative distance between i and the Base Station:
5:   Dist_iBS = sqrt((i_X − BS_X)² + (i_Y − BS_Y)²)
6:   t = Dist_iBS / 1.5
7: Return: t

In Algorithm 1, the following notation is used: i represents the ID of a sensor in the network; BS_X and BS_Y are the coordinates of the base station; and i_X and i_Y represent the coordinates of node i. The relative distance between each sensor and the base station is given by the variable Dist_iBS. The output of the algorithm is the variable t, i.e., the waiting time of a node before entering the registry process. Through this waiting time it is possible to order the sensors entering the searching state, which prevents a flood of registration requests at the same time, prevents a node from waiting too long to enter the registration process, and minimizes the registration time of all the nodes in the network. The value 1.5 was defined from an empirical study (more than fifty simulations in distinct scenarios) of the behavior of the nodes in the registration process. It was identified that this value significantly reduces the possibility of two or more nodes entering the network at the same time and choosing the same TX slot in the synchronized state.

D_NV = sqrt((N_X − V_X)² + (N_Y − V_Y)²)

D_VBS = sqrt((V_X − BS_X)² + (V_Y − BS_Y)²)    (1)

D_NBS = sqrt((N_X − BS_X)² + (N_Y − BS_Y)²)


Regarding the routing tree construction, the criterion used is related to the distance between the sensors and the base station. The choice of father nodes follows Equation 1. As a result of these equations, a node is chosen to be the father of another node when it is in a better position (closer) between the non-registered node (son) and the base station. These equations bring stability to the communication between sensors and reduce the energy consumption of the sensors. D_NV represents the distance between the non-registered node and each of its neighbors. D_VBS is the distance between the registered neighbor and the base station. And D_NBS represents the distance between the non-registered node and the base station. Additionally, N_X and N_Y are the "X" and "Y" coordinates of the node to be registered, respectively. In the same way, V_X and V_Y represent the "X" and "Y" coordinates of the registered neighbors.
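Algorithm 1 and Equation 1 translate almost directly into code. The sketch below is our illustrative Java rendering: the class and method names are ours, and the father-selection rule (minimize D_NV + D_VBS among neighbors closer to the base station than the new node) is our reading of the "better position" criterion, which the paper does not spell out as a tie-breaking rule.

```java
// Illustrative rendering of Algorithm 1 and Equation 1; names are ours.
import java.util.List;

public class SrMacRouting {
    static class Node {
        final int id; final double x, y;
        Node(int id, double x, double y) { this.id = id; this.x = x; this.y = y; }
    }

    static double dist(Node a, Node b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    /** Algorithm 1: waiting time before node i enters the registry process. */
    static double waitingTime(Node i, Node bs) {
        return dist(i, bs) / 1.5; // empirically chosen divisor (Section 3.1)
    }

    /** Equation 1: choose as father a registered neighbor lying in a better
     *  position between the unregistered node n and the base station. */
    static Node chooseFather(Node n, Node bs, List<Node> registeredNeighbors) {
        Node best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        double dNBS = dist(n, bs);
        for (Node v : registeredNeighbors) {
            double cost = dist(n, v) + dist(v, bs);      // D_NV + D_VBS
            if (dist(v, bs) < dNBS && cost < bestCost) { // closer to BS than n
                bestCost = cost;
                best = v;
            }
        }
        return best; // null if no registered neighbor improves on n
    }
}
```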

3.2 Scheduling

The SR-MAC protocol focuses on enhancing the scheduling process by reusing the available time intervals. To deploy this reuse approach in the TX intervals of already registered nodes, a heuristic with graph coloring was applied. Figure 2 shows an example of a network using the SR-MAC protocol. First, a conflict graph is defined from a neighborhood matrix kept at the base station. This matrix is created from information provided by each sensor in the network during the registry process. From the conflict graph, a set of interference rules is defined. In this representation, each color represents an existing time interval in the TDMA frame, excluding the BRS and CSS intervals.


Figure 2: Example of scenario using the SR-MAC protocol

Thus, for a sensor node to get a color (a TX interval), it is necessary to meet the following conditions: (I) a node cannot get the same TX interval as its neighbors; (II) a node cannot get the same TX interval as the fathers of its neighbors, preventing nodes inside the range of two other nodes from suffering interference and collisions from their neighbors; and (III) a node cannot get the same TX interval as the sons of its neighbors. These conditions guarantee that one transmission will not interfere with another one from its neighbors, avoiding collisions from direct neighbors and intermediary nodes (a node that is father and son of distinct nodes).


In the example of Figure 2, the base station can reuse the time intervals since it uses the graph coloring algorithm. Thus, in this example, with 11 registered nodes only 5 time intervals were necessary to allow collision- and interference-free transmissions. Additionally, Algorithm 2 summarizes the graph coloring approach to define the time intervals.

Algorithm 2 Coloring Algorithm
1: Input: Sensor node x, adjacency matrix and the vector of slots
2: Output: The largest color suitable for the sensor node x
3: Create vector D[1, .., s] to save the slots available to node x
4: for i ← 1 to s do
5:   D[i] = 1 // all slots are initially available
6: Let Slot(i) be the function that returns the slot of node i
7: for each neighbor i of x in the conflict graph do
8:   D[Slot(i)] = 0 // not available: if node i transmitted at the same time as node x, its receiver would not get the message correctly
9: for each neighbor i of x in the conflict graph do
10:   for each child j of i in the conflict graph do
11:     D[Slot(j)] = 0 // not available: if node j transmitted at the same time as node x, node i would not receive the message correctly
12: Return: Slot(x) = largest slot available in D (with value 1)
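For readers who prefer running code over pseudocode, the following is our illustrative Java rendering of Algorithm 2 under conditions (I)-(III); the data structures, including the explicit father array used for condition (II), are our assumptions about how the base station could store the conflict graph.

```java
// Illustrative rendering of Algorithm 2 under conditions (I)-(III);
// the data-structure names and the father[] array are our assumptions.
import java.util.Arrays;
import java.util.List;

public class SlotColoring {
    /**
     * Picks a TX slot (color) for node x.
     *
     * @param x        node requesting a TX slot
     * @param neighbor adjacency lists of the conflict graph
     * @param children sons of each node in the routing tree
     * @param father   father of each node in the routing tree (or null)
     * @param slot     slot already assigned to each node (or null)
     * @param s        number of data slots in the TDMA frame (after BRS/CSS)
     * @return largest reusable slot index, or -1 if a new slot is needed
     */
    static int chooseSlot(int x, List<List<Integer>> neighbor,
                          List<List<Integer>> children,
                          Integer[] father, Integer[] slot, int s) {
        boolean[] available = new boolean[s];
        Arrays.fill(available, true);                           // all slots start free
        for (int i : neighbor.get(x)) {
            if (slot[i] != null) available[slot[i]] = false;    // condition (I)
            Integer f = father[i];
            if (f != null && slot[f] != null) available[slot[f]] = false; // (II)
            for (int j : children.get(i))
                if (slot[j] != null) available[slot[j]] = false; // condition (III)
        }
        for (int c = s - 1; c >= 0; c--)  // largest available color, as in Alg. 2
            if (available[c]) return c;
        return -1; // the base station must allocate a fresh time interval
    }
}
```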

4 Results

In this section we show the evaluation of the SR-MAC1 protocol against SOTP (described in Section 2), using the Castalia simulator [1]. The scenario was composed of a set of nodes (three scenarios with 50, 100 and 150 nodes, respectively) randomly positioned in a 100 m² area. Additionally, the protocols were run in simulations of 200, 400, 600, 800 and 1000 seconds, to get results for distinct periods. Each sensor node was configured with 10 Joules of energy and the duration of each time interval of a frame was defined as 5 ms. 35 simulations were performed for each scenario, and all the results are presented with a 95% confidence interval. Figure 3(a) shows the number of TX intervals needed to transmit all data to the base station. In this case, SR-MAC used fewer time intervals than SOTP, since SOTP needs a number of intervals equal to the number of sensors in the network. SR-MAC reached an effectiveness higher than SOTP in all scenarios, with a performance 58% and 30% better in the best and worst cases, respectively. In the same way, Figures 3(b), 3(c) and 3(d) present the data related to the effectiveness of data collection, defined in Equation 2. In Equation 2, PctBS_P is the number of packets received by the base station in the period P, E_iP is the energy spent by node i in the period P, and n is the number of nodes in the network. Thus, the effectiveness is the number of packets received by the base station per unit of energy consumed by the network.

PctBS_P / ( Σ_{i=1}^{n} E_iP )    (2)

1Available at bitbucket.org/masks/sr-mac/downloads/


(a) Number of intervals required. (b) Collection Efficiency in Scenario 1.

(c) Collection Efficiency in Scenario 2. (d) Collection Efficiency in Scenario 3.

Figure 3: Evaluation Metrics in relation to the Base Station

Based on the information shown in Figures 3(b), 3(c) and 3(d), it is possible to conclude that SR-MAC presents a better effectiveness than SOTP: 36.42%, 19.28% and 26.77% in Scenarios 1, 2 and 3, respectively. This occurs because SR-MAC guarantees more data transmissions, and consequently more data is collected using less energy. The higher effectiveness of the SR-MAC protocol has a price: it spends more of the sensor nodes' energy. In the experiments performed, SR-MAC reduced the lifetime of the network when compared to SOTP (around 9% in the worst case and 6% in the best case). This higher energy consumption is caused by the higher frequency of transmissions and receptions in the sensor nodes. Therefore, the higher energy consumption comes with the higher effectiveness. The benefits of the effectiveness outweigh the energy consumption drawback, since a 36.42% higher effectiveness results in only a 9% higher energy consumption.

5 Conclusion and Future Works

WSNs are used to monitor environments through a set of sensors, where all collected data is transmitted by the sensors to the base station of the network. To avoid problems like interference


and collision, a medium access protocol is necessary to control the transmission time intervals. This protocol should be scalable and effective to support the data transmission. Therefore, this paper proposed a medium access protocol for WSNs that reuses the time intervals, called SR-MAC. SR-MAC applies a graph coloring heuristic and a cross-layer approach to reuse the time intervals in a TDMA frame. The goal of SR-MAC is to increase scalability and to maximize the performance of data transmission to the base station. The centralized design of the protocol was chosen precisely to compare the characteristics and functionalities of SR-MAC with the SOTP protocol, which is also centralized. The results suggest that SR-MAC improves the effectiveness of the network by reusing time intervals. In general, the performance of SR-MAC surpasses the existing work in the literature regarding time slot usage and scalability. As future work, we intend to improve the energy consumption of the proposal, increasing the lifetime of the sensors, in addition to comparing the SR-MAC protocol, using both a centralized and a distributed approach, with distributed protocols existing in the literature.

References

[1] Athanassios Boulis et al. Castalia: A simulator for wireless sensor networks and body area networks. NICTA: National ICT Australia, 2011.
[2] Felipe D. Cunha et al. Uma nova abordagem para acesso ao meio em redes de sensores sem fio. 31º Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, pages 3–16, 2013.
[3] Bruno Rogério S. dos Santos et al. SR-MAC: um protocolo cross-layer baseado em coloração de grafos para melhoria da eficiência de redes de sensores sem fio. XXXVII Congresso da Sociedade Brasileira de Computação - 16º WPerformance, pages 1642–1655, 2017.
[4] Sachin Gajjar et al. Self organized, flexible, latency and energy efficient protocol for wireless sensor networks. International Journal of Wireless Information Networks, 21(4):290–305, 2014.
[5] Deepak Goyal and Malay R. Tripathy. Routing protocols in wireless sensor networks: a survey. In Second International Conference on Advanced Computing & Communication Technologies, pages 474–480. IEEE, 2012.
[6] Pei Huang et al. The evolution of MAC protocols in wireless sensor networks: A survey. IEEE Communications Surveys & Tutorials, 15(1):101–120, 2013.
[7] Leonardo S. Rocha. Algorithmic aspects of graph colouring heuristics. PhD thesis, Nice, 2012.
[8] Aggeliki Sgora et al. A survey of TDMA scheduling schemes in wireless multihop networks. ACM Computing Surveys, 47(3):53, 2015.
[9] Yu Wang et al. SOTP: a self-organized TDMA protocol for wireless sensor networks. In Electrical and Computer Engineering, 2006. CCECE'06. Canadian Conference on, pages 1108–1111. IEEE, 2006.
[10] Dan Wu et al. Distributed TDMA scheduling protocol based on conflict-free for wireless sensor networks. In Intelligent Computing and Integrated Systems (ICISS), 2010 International Conference on, pages 876–879. IEEE, 2010.


SMDAnonymizer: a web tool for data anonymization

Ítalo O. Santos2, Emanuel F. Coutinho1,2, and Leonardo O. Moreira1,2

1 IBITURUNA – Research Group of Cloud Computing and Systems
2 Federal University of Ceará (UFC) – Fortaleza – Ceará – Brazil
[email protected], [email protected], [email protected]

Abstract
Cloud Computing provides a high-performance computing infrastructure to run applications with guarantees of quality of service. From the moment a user hires a cloud provider, there is a loss of control over data security and privacy issues. Privacy in the cloud is the ability of a user or organization to control what information can be revealed about themselves and to take control of who can access certain information and how that access can occur. Due to this, there is a need to use techniques or strategies to preserve the security and privacy of user data in cloud providers. This paper presents a tool called SMDAnonymizer that generates anonymized files using the k-anonymity algorithm to increase the privacy of data sets. We discuss some results obtained with our tool applied under real data conditions. Finally, we present some future work that we intend to carry out to improve our tool.

1 Introduction

With the advancement of modern human society, basic and essential services are almost all delivered transparently. Utilities such as water, electricity, gas and telephone have become fundamental to our lives, being exploited through the usage-based payment model [21]. Nowadays, existing business models provide many services anywhere and anytime. These services are charged considering various policies for the end user. This also applies to services offered in technology areas, due to the growth and spread of Cloud Computing (CC), which aims to be global and delivers services ranging from the end user who hosts personal documents such as texts, videos and images on the internet, to companies that outsource their entire IT infrastructure, offering their services through the cloud. However, being a recent area of research, CC brings the common risks relevant to IT environments. CC has its own set of security problems, identified by [10] in seven categories: network security, interfaces, data security, virtualization, governance, compliance, and legal issues. For a CC environment to be adopted by corporations, security and privacy of the data stored in the cloud is a fundamental requirement. Privacy is a concept directly related to people, and it is a human right, such as freedom, justice or equality before the law; it is directly related to people's interest in maintaining a personal space, without interference from other people or organizations. Moreover, it ensures that individuals control or influence what information related to them can be collected and stored by someone and with whom it can be shared [17]. Privacy in CC is the ability of a user or organization to control what information can be revealed about themselves in the cloud, that is, to take control of who can access certain information and how that access can occur. [9] defined three privacy dimensions: (i) Territorial Privacy is the protection of the region close to an individual; (ii) Individual Privacy is the protection against moral damage and unwanted interference; and (iii) Information Privacy is the protection of personal data collected, stored, processed and propagated to third parties. In a cloud, it is important to note that developers and users deliver their applications and data to be managed on the infrastructures and platforms provided by cloud providers. In this


In this sense, the need arises to adopt techniques so that the delivered data are free from internal or external access in an environment that may be controlled by third parties, especially when the data are considered sensitive or highly private [16]. The academic community currently proposes several techniques for data protection that can be applied with the aim of anonymizing data: (i) Generalization replaces quasi-identifier attribute values with less specific but semantically consistent values that represent them; this technique categorizes the attributes, creating a taxonomy of values with levels of abstraction going from the particular level to the generic level; (ii) Suppression deletes some identifier and/or quasi-identifier attribute values from the anonymized table; (iii) Encryption uses cryptographic schemes, normally based on public or symmetric keys, to replace sensitive data (identifiers, quasi-identifiers and sensitive attributes) with encrypted data; and (iv) Perturbation is used for privacy preservation in data mining or for replacing actual data values with dummy data when masking test or training databases.

In this paper, we investigate tools used for data anonymization in cloud environments and present a web tool for data anonymization that uses the k-anonymity technique to anonymize user data. The specific objectives are: (i) to describe how the tool works; (ii) to show the applicability of the web tool to real data; and (iii) to present new opportunities related to future versions of the web tool. The remainder of this paper is organized as follows: in the next section we discuss concepts of cloud computing, privacy and data anonymization; in Section 3, we review related work; in Section 4, the experiments and evaluation results are discussed; finally, Section 5 presents the conclusion and future work.
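To make the generalization and suppression techniques above concrete, the following is a minimal sketch in Python; the record, attribute names and masking depth are illustrative assumptions, not part of SMDAnonymizer itself.

```python
# Illustrative only: hypothetical record and masking depth.
record = {"name": "Maria Silva", "zip": "60440-554", "blood_type": "A+"}

def suppress(rec, attrs):
    """Suppression: delete identifier attribute values entirely."""
    return {k: v for k, v in rec.items() if k not in attrs}

def generalize_zip(zip_code, level):
    """Generalization: replace the trailing `level` digits of a ZIP code
    with '*', moving from the particular level to the generic level."""
    digits = zip_code.replace("-", "")
    return digits[: len(digits) - level] + "*" * level

anonymous = suppress(record, {"name"})               # drop the identifier
anonymous["zip"] = generalize_zip(record["zip"], 4)  # '60440554' -> '6044****'
print(anonymous)
```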

2 Cloud Computing, Privacy and Data Anonymization

2.1 Cloud Computing

According to the National Institute of Standards and Technology (NIST) [11], CC is an evolving paradigm: its definitions, use cases, technologies, problems, risks and benefits will be redefined in discussions between the public and private sectors, and these definitions, attributes, and characteristics will evolve over time. Regarding the definition itself, a broadly accepted one is not yet available. NIST presents the following definition for CC: "CC is a model that enables convenient and on-demand access to a set of configurable computing resources (for example, networks, servers, storage, applications, and services) which can be quickly acquired and released with minimal managerial effort or interaction with the service provider". Another CC definition is: "CC is a set of network-enabled services, providing scalability, quality of service, and inexpensive on-demand computing infrastructure that can be accessed in a simple and pervasive way" [2].

2.2 Privacy

As the amount of personal information transferred to the cloud increases, so does the concern of individuals and organizations about how this data will be stored and processed. The data is stored in multiple locations, often transparently to the user, which causes uncertainty about the degree of privacy to which it is exposed.

According to [12], the terminology for dealing with data privacy issues in the cloud includes the following concepts: (i) the Data Controller is an entity (individual or legal entity, public authority, agency or organization) that alone or jointly with others determines the manner and purpose for which personal information is processed;


(ii) the Data Processor is an entity (individual or legal entity, public authority, agency or organization) that processes personal information in accordance with the Data Controller's instructions; and (iii) the Data Subject is an identified or identifiable individual to whom personal information refers, identified either directly or indirectly (for example by reference to an identification number or to one or more physical, psychological, mental, economic, cultural or social characteristics).

The NIST Computer Security Handbook defines computer security as the protection afforded to an automated information system in order to achieve the objectives of preserving the integrity, availability, and confidentiality of information system resources [8]. The process of developing and deploying applications on a CC platform following the Software as a Service (SaaS) model should consider the following security aspects of data stored in the cloud [18]: (i) Data security: in the SaaS model, data is stored outside the boundaries of the organization's technology infrastructure, so the cloud provider must provide mechanisms that ensure data security, for instance strong encryption techniques and fine-grained mechanisms for authorization and access control; (ii) Network security: client data is processed by SaaS applications and stored on cloud servers, so the transfer of organization data to the cloud must be protected to prevent loss of sensitive information; (iii) Data location: in the SaaS model, the client uses SaaS applications to process their data but does not know where the data will be stored, which may be a problem due to privacy legislation in some countries that prohibits data from being stored outside their geographical boundaries; (iv) Data integrity: the SaaS model is composed of multi-tenant cloud-hosted applications, which use XML-based Application Program Interfaces (APIs) to expose their functionalities as web services; (v) Data segregation: data from multiple clients may be stored on the same server or database in the SaaS model, so the SaaS application must ensure the segregation of customer data at the physical level and at the application layer; and (vi) Access to data: the multi-tenant environment of the cloud can generate problems related to the lack of flexibility of SaaS applications to incorporate the specific data access policies of SaaS client organizations.

2.3 Data Anonymization

Data anonymization is performed to preserve privacy in data publishing. Large public and private corporations have increasingly been required to publish their "raw" data in electronic format, rather than providing only statistical or tabulated data. These "raw" data are called microdata. In this case, prior to publication, the data must be "sanitized" by removing explicit identifiers such as names, addresses, and telephone numbers. From the perspective of data dissemination, the attributes can be classified as follows [4]: (i) Identifiers are attributes that uniquely identify individuals (e.g. social security number, name, identity number); (ii) Quasi-identifiers (QIs) are attributes that can be combined with external information to expose some or all individuals, or to reduce uncertainty about their identities (e.g. date of birth, ZIP code, work position, function, blood type); and (iii) Sensitive attributes (SAs) are attributes that contain sensitive information about individuals (e.g. salary, medical examinations, credit card postings).

K-anonymization techniques are a key component of any comprehensive solution to data privacy and have been the focus of intense research in recent years. An important requirement for such techniques is to ensure the anonymization of data while minimizing the information loss resulting from data modifications such as generalization and suppression [3]. The k-anonymity model requires that any combination of quasi-identifier attributes be shared by at least k records in an anonymized database [14].


The value k is a positive integer defined by the data owner, possibly as a result of negotiations with other interested parties. A high value of k indicates that the anonymized database has a low disclosure risk, because the probability of re-identifying a record is 1/k; however, this does not protect the data against attribute disclosure: even if an attacker cannot re-identify a record, he may still discover sensitive attributes in the anonymized database. According to [7], anonymization techniques preserve data sets from several kinds of disclosure: (i) Identity disclosure occurs when an individual can be associated with a record in the published table; if his identity is disclosed, then the corresponding sensitive value of that individual is revealed; (ii) Attribute disclosure occurs when information about an individual record is revealed, i.e. when attributes of an individual can be inferred with high confidence from the released data; and (iii) Membership disclosure occurs when the released table allows an attacker to infer, through various attacks, whether an individual's record is included in it.
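Operationally, the k-anonymity requirement can be checked by grouping the records by their quasi-identifier values and verifying that no group is smaller than k; since the re-identification probability within a group is 1/k, larger groups mean lower disclosure risk. The sketch below, over hypothetical rows, is an illustration of the model in [14], not of the tool's implementation.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized rows in the spirit of the paper's example
rows = [
    {"UF": "Nordeste", "Code-Program": "13**"},
    {"UF": "Nordeste", "Code-Program": "13**"},
    {"UF": "Sul",      "Code-Program": "13**"},
    {"UF": "Sul",      "Code-Program": "13**"},
]
print(is_k_anonymous(rows, ["UF", "Code-Program"], k=2))  # True
```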

3 Related Work

According to [6], personal records are collected and used for data analysis by various organizations in the public and private sectors. In such cases, privacy should be ensured so that personal information is not disclosed at the time of data sharing and analysis. Although anonymization is an important method for privacy protection, there is a lack of tools that are both comprehensive and readily available to informatics researchers as well as to non-IT experts, e.g., researchers responsible for the sharing of data [13]. Graphical user interfaces (GUIs) and the option of using a wide variety of intuitive and replicable methods are needed. Tools have to offer interfaces allowing their integration into pipelines comprising further data processing modules. Moreover, extensive testing, documentation and openness to reviews by the community are of high importance. Informatics researchers who want to use or evaluate existing anonymization methods, or to develop novel methods, will benefit from well-documented, open-source software libraries [13].

We briefly review the related work as follows. µ-Argus [1] is a software program designed to create safe microdata files; it is a closed-source application that implements a broad spectrum of techniques, but it is no longer under active development. sdcMicro [15] is a package for the R statistics software used for the generation of anonymized microdata, i.e. for the creation of public- and scientific-use files; it implements many primitives required for data anonymization but offers only limited support for using them to find data transformations suitable for a specific context. The UTD Anonymization Toolbox [19], developed by the UT Dallas Data Security and Privacy Lab, packages implementations of various anonymization methods for public use by researchers; these algorithms can either be applied directly to a data set or be used as library functions inside other applications. The Cornell Anonymization Toolkit (CAT) [20] was designed for interactively anonymizing published data sets to limit identification disclosure of records under various attacker models. Both are research prototypes that have mainly been developed for demonstration purposes. Problems with these tools include scalability issues when handling large data sets, complex configuration requiring IT expertise, and incomplete support for privacy criteria and methods of data transformation. The ARX tool, an open-source data anonymization framework, features a cross-platform user interface oriented towards non-IT experts and utilizes a well-known and highly efficient anonymization algorithm [13].


4 Experimental Evaluation

[13] presents the ARX tool, a comprehensive open-source data anonymization framework that implements a simple three-step process. It provides support for all common privacy criteria, as well as for arbitrary combinations of them, and utilizes a well-known and highly efficient anonymization algorithm. Moreover, it implements a carefully chosen set of techniques that can handle a broad spectrum of data anonymization tasks while remaining efficient, intuitive and easy to understand. The tool features a cross-platform user interface oriented towards non-IT experts. [13] also provides a stand-alone software library with an easy-to-use public API for integration into other systems; the code base is extensible, well-tested and extensively documented, and as such provides a solid basis for developing novel privacy methods. In our approach, we used the ARX API to create a generic web tool for data anonymization, which we explain in the following subsections.

4.1 Experiment Settings

The proposed SMDAnonymizer tool is implemented in Java using NetBeans1 as the integrated development environment (IDE). Our experiments were conducted in a cloud environment2 hosted by the IBITURUNA research group. We collected a data set from a federal government website3; the data set, named BolsaFamilia, has the following attributes: UF, Code-SIAFI, Municipality, Code-Function, Code-Subfunction, Code-Program, Code-Action, NIS-Favored, Name-Favored, Source-Purpose, and Value-Month. The tool implements the k-anonymity algorithm, using generalization as the anonymization technique. The k-anonymity parameter is set to 2 by default for the results presented here; in future versions of the tool, we intend to let the user set the value of the k-anonymity parameter in the interface.

4.2 Experiment Process and Results

We present the tool interface in Figure 1, which shows the initial screen. From the interface the user can download an example data file that shows the type of file the tool accepts; our tool supports the .csv format (comma-separated values), which is often used to exchange data between different applications. The user clicks the confirm button after uploading the file to be anonymized. Then, the user selects the anonymization algorithm that will be applied to the data set. The clear button deletes the filled fields so that the user can restart the upload and algorithm selection. After uploading and selecting the algorithm, the tool reads and interprets the fields referring to the columns of the uploaded data set and shows these columns as checkboxes. The user can then select the fields to be anonymized, as presented in Figure 2. After selecting the fields to be anonymized, the user must upload the corresponding hierarchies: to generalize, a hierarchy is created for each attribute, defining its privacy levels. A hierarchy is created for quasi-identifiers based on the type of values these attributes hold. For instance, the hierarchies of the attributes UF, Code-Subfunction, Code-Program and Code-Action are shown in Figure 3.
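As an illustration of how such hierarchies might be applied, the sketch below generalizes a UF value to its region and masks the trailing digits of the codes, mirroring the style of the anonymized output shown later in Table 1; the region mapping and masking depths are assumptions, since the full hierarchies of Figure 3 are not reproduced here.

```python
# Assumed fragment of the UF hierarchy: state -> region.
UF_TO_REGION = {"CE": "Nordeste", "BA": "Nordeste", "SP": "Sudeste",
                "RS": "Sul", "GO": "Centro-Oeste", "AM": "Norte"}

def mask_code(code, visible):
    """Keep `visible` leading digits and replace the rest with '*'."""
    return code[:visible] + "*" * (len(code) - visible)

row = {"UF": "CE", "Code-Subfunction": "244", "Code-Action": "8442"}
row["UF"] = UF_TO_REGION.get(row["UF"], "*")                     # 'Nordeste'
row["Code-Subfunction"] = mask_code(row["Code-Subfunction"], 2)  # '24*'
row["Code-Action"] = mask_code(row["Code-Action"], 3)            # '844*'
print(row)
```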

1 https://netbeans.org/
2 http://app.ibituruna.virtual.ufc.br/
3 http://www.transparencia.gov.br/


Figure 1: Step 1 – Selecting the data set and the anonymization algorithm

Figure 2: Step 2 – Selecting the anonymization hierarchies of the fields

Therefore, after selecting the fields and uploading their hierarchies, the user must confirm the operation; the tool then exports and saves the anonymized data to a file in csv format. Table 1 shows the final file generated by the tool.

By multimedia, we understand all the programs and systems in which communication between human and computer occurs through multiple means of representation of information. Our tool qualifies as a multimedia product according to the characteristics described by [5]:


Figure 3: Hierarchy of attributes applied

Table 1: BolsaFamilia data set result

UF            Codigo-Subfuncao   Codigo-Programa   Codigo-Acao
Nordeste      24*                13**              844*
Nordeste      24*                13**              844*
Centro-Oeste  24*                13**              844*
Norte         24*                13**              844*
Sudeste       24*                13**              844*
Sul           24*                13**              844*
Centro-Oeste  24*                13**              844*

(i) Non-linear access: the information is quickly accessible in a non-linear fashion; the user is not stuck in a time sequence like the reader of a book, the listener of a lecture or the spectator of a movie; (ii) Interactivity: the user in front of the computer is not a passive spectator, but a participant in an activity; and (iii) Integration with application programs: the computer can perform calculations, searches on databases and other normal tasks of any application program.

Given the multimedia characteristics mentioned above, our tool meets the requirement of interactivity, since it requires the user to upload the file that will be anonymized and to select in the checkboxes the hierarchies that will be applied; and it meets the requirement of integration with application programs, because the tool executes the k-anonymity algorithm used for data anonymization and makes use of an API to execute the other functionalities present in the application.

5 Conclusion and Future Work

In this paper, we presented our tool, SMDAnonymizer, and described its use to anonymize raw data and generate a new file with anonymized data. Furthermore, we identified concepts presented in the literature for data anonymization, and we presented concepts related to CC and privacy, focusing on anonymization techniques. As future work, we intend to further develop the tool by implementing new anonymization algorithms and testing different types of data, comparing the efficiency of each implemented algorithm. We also want to integrate other APIs to make our tool more extensible and to increase its functionality. We intend to give the tool a friendly interface so that it can be used for research purposes and by non-IT experts.


References

[1] µ-Argus manual, 2017. Available from: http://neon.vb.cbs.nl/casc/Software/MuManual4.2.pdf.
[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, et al. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, 2009.
[3] Ji-Won Byun, Ashish Kamra, Elisa Bertino, and Ninghui Li. Efficient k-anonymization using clustering techniques. In International Conference on Database Systems for Advanced Applications, pages 188–200. Springer, 2007.
[4] Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg. Privacy and identity management for life. Springer Science & Business Media, 2011.
[5] Wilson de Pádua Paula Filho. Multimídia: Conceitos e Aplicações. Editora LTC, 2 edition, 2011.
[6] Benjamin Fung, Ke Wang, Rui Chen, and Philip S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys (CSUR), 42(4):14, 2010.
[7] S. Gokila and P. Venkateswari. A survey on privacy preserving data publishing. International Journal on Cybernetics & Informatics (IJCI), 3, 2014.
[8] Barbara Guttman and Edward A. Roback. An introduction to computer security: the NIST handbook. DIANE Publishing, 1995.
[9] Arlindo Marcon Jr., Marcos Laureano, Altair Santin, and Carlos Maziero. Aspectos de segurança e privacidade em ambientes de computação em nuvem. 2010.
[10] Ronald L. Krutz and Russell Dean Vines. Cloud security: A comprehensive guide to secure cloud computing. Wiley Publishing, 2010.
[11] Peter Mell and Tim Grance. The NIST definition of cloud computing. National Institute of Standards and Technology (NIST), Information Technology Laboratory. [Online]. Available: http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, 2009.
[12] Siani Pearson. Privacy, security and trust in cloud computing. In Privacy and Security for Cloud Computing, pages 3–42. Springer, 2013.
[13] Fabian Prasser, Florian Kohlmayer, Ronald Lautenschläger, and Klaus A. Kuhn. ARX - a comprehensive tool for anonymizing biomedical data. In AMIA Annual Symposium Proceedings, volume 2014, page 984. American Medical Informatics Association, 2014.
[14] Pierangela Samarati. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001.
[15] sdcMicro. Data-Analysis, 2017. Available from: https://cran.r-project.org/web/packages/sdcMicro/.
[16] Flávio R. C. Sousa, Leonardo O. Moreira, and Javam C. Machado. Computação em nuvem: Conceitos, tecnologias, aplicações e desafios. II Escola Regional de Computação Ceará, Maranhão e Piauí (ERCEMAPI), pages 150–175, 2009.
[17] William Stallings. Network security essentials: applications and standards. Pearson Education India, 2007.
[18] Subashini Subashini and Veeraruna Kavitha. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications, 34(1):1–11, 2011.
[19] UTD Anonymization Toolbox. UT Dallas Data Security and Privacy Lab, 2017. Available from: http://cs.utdallas.edu/dspl/cgi-bin/toolbox/.
[20] Cornell Anonymization Toolkit. Cornell Database Group, 2017. Available from: https://sourceforge.net/projects/anony-toolkit/.
[21] Christian Vecchiola, Xingchen Chu, and Rajkumar Buyya. Aneka: a software platform for .NET-based cloud computing. High Speed and Large Scale Scientific Computing, 18:267–295, 2009.


Blockchain Technology: A new secured Electronic Health Record System

Lotfi Tamazirt1*, Farid Alilat1, Nazim Agoulmine2
1 University of Sciences and Technology Houari Boumediene, Algiers, Algeria
2 University of Evry Val-d'Essonne, Paris, France
[email protected], [email protected], [email protected]

Abstract Nowadays, health systems are looking for effective ways to manage more patients in a shorter time, and to increase the quality of care through better coordination to provide quick, accurate and non-invasive diagnostics to patients. This paper aims to solve the dependence on trusted third parties by proposing a new management strategy, storage and security in a decentralized network through Blockchain technology. The proposed system also aims to offer a solution to help healthcare professionals to be informed of the slightest changes made in a patient's file in order to reduce medical error rates , but also by allowing them to consult them transparently if they are authorized.

1 Introduction

Over the last few decades, revolutionary new technologies such as the Internet have transformed the daily and professional lives of billions of people across the world, making it possible to overcome the constraints of distance through open and free flows of information. However, as soon as an exchange implies a transfer of material or immaterial value, it requires absolute confidence as a guarantee, a principle embodied by centralized institutions that are none other than trusted third parties. This question of the authenticity and confidentiality of data, as well as the security of information transfers, is especially pressing in the medical and health care field, in which the slightest error in the writing or reading of a medical record can be fatal. Certainly, there are now several electronic systems for the indexation and storage of patient records; nevertheless, they do not always achieve consensus in terms of reliability or legitimacy, hence the need to supervise them. However, the supervision of these trusted third parties is not without risk, which explains why patients and professionals are gradually losing confidence in those systems. The current problem is therefore how to build solutions that allow these systems to be trusted, solving the problems intrinsic to the functioning of these entities.


The aim is thus to reform the current system through innovative alternatives to the current schemes, and so to dispense with a third party in charge of the verification, validation and listing of the history of medical records. Among the promising technologies, a permanent, transparent, secure and non-centralized control solution, relying on its own security system, has emerged: Blockchain technology. This paper focuses mainly on investigating the applicability and implementation of an Electronic Health Record (EHR) system (Friend et al., 2017) based on blockchain technology on top of an e-health communication architecture. The rest of this paper is organized as follows. The first part is devoted to presenting the fundamentals of Blockchain technology. The second focuses on the technical part of the EHR Blockchain system, explaining in detail its functional aspect and its protocols, as well as the steps necessary for the development of this system. Finally, we discuss the potential of the Blockchain in such an application and end with a conclusion.

2 Fundamentals of Blockchain technology

The Blockchain literally means a chain of blocks. It is perceived as a system or a computer protocol for managing digital data of all kinds: transactions, contracts, medical data, etc. All this information is housed in digital containers, chronologically chained blocks, hence the name Blockchain. In other words, the Blockchain is a sort of gigantic register with an almost unlimited storage capacity: transparent, secured and decentralized. It contains the history of all the exchanges made by the users since the system was created, and this history can of course be consulted in a transparent manner. The Blockchain can be described as a register on which everyone can write, but which no one can erase or destroy. The confidentiality of data shared by users requires a high level of security; as a result, the system requires the sharing of copies of this register on the various computers known as "network nodes". The robustness of this technology comes not only from the use of a sophisticated security mechanism, asymmetric cryptography, but also from sharing data and information "peer-to-peer" (Gaffney, 2016), i.e. from one user to another without passing through a third party. Figure 1 shows a part of the architecture of a Blockchain.


Figure 1: EHR Blockchain architecture.


3 Types of Blockchain systems

Blockchain technology adapts to the desired degree of openness: if access is limited, the resulting Blockchain is called "private" or "consortium"; if it is open, it is called "public".

• Public Blockchain: the public Blockchain is a fully decentralized and open chain of blocks in terms of use, reading, and participation in the management of operations within the network. The main argument in favor of this model is the assured protection of the users of an application against the developers of that same application.
• Consortium Blockchain: the consortium Blockchain is a partially decentralized, semi-private chain, where the consensus process, i.e. the management of operations, is controlled by a set of pre-selected nodes. Depending on the application, access to and consultation of the registry may be public or restricted.
• Private Blockchain: the private Blockchain is a chain of blocks whose writing is reserved to a centralized organization, while the right of reading can be public or private, which reinforces the confidentiality of the users and reduces the cost of operations; its strength lies in the possibility of drafting, modifying and sealing all the rules of the Blockchain network.

The optimum solution for a particular industry depends greatly on the exact area of activity of the industry in question. In some cases, the public model is clearly more appropriate, in others a certain degree of private control is simply necessary.

4 Functional aspects of Blockchains

The functional aspect of a Blockchain system is based on three pillars:
1. A peer-to-peer network, which provides the decentralization of the system.
2. Cryptography, which ensures the anonymity, or more precisely the pseudonymity, of users, as well as electronic authentication, thanks to asymmetric algorithms such as RSA (Somani et al., 2010), which can provide encryption as well as electronic signatures. Its security is based on the intractability of the factorization problem for large integers.
3. A programmed consensus, distributed over all the nodes of the network, which ensures the construction of a chain of identical blocks on all the nodes of the Blockchain, even though each node builds its copy independently.
The combination of these three concepts gives rise to a transparent, secure and decentralized digital data management system, which represents a breakthrough in computing.

5 Electronic Health Records Blockchain

As mentioned before, the type of Blockchain to use depends heavily on the application. In the case of EHRs, the most judicious choice is the Consortium Blockchain.


The medical records of a particular patient should allow many practitioners to view the history of their patients, while not letting anyone add facts unless the pre-selected nodes allow it. So, in order to enhance the security of EHRs, we propose to implement a Consortium Blockchain dedicated to medical records. Thus, when recording a fact, the user writes in the register, or in the database, using his own private key; this is called the creation of facts. The attending physician, as well as all the members of the network - if they are authorized - can then decrypt and read the medical file with complete transparency. Moreover, the records cannot be modified or repudiated, because only the user holds the signing key, which is the private key. This process is called the electronic signature, or authentication.
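As a hedged illustration of this signature step (a sketch, not the system's actual implementation), the code below signs a fact with an RSA private key and verifies it with the matching public key, using the third-party Python cryptography package (pip install cryptography); the record format is invented.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

doctor_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
fact = b"EHR 64: blood pressure 120/80, recorded 2018-01-11"  # invented record

# The doctor signs the fact with the private key (creation of the fact)...
signature = doctor_key.sign(
    fact,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# ...and any authorized member of the network checks it with the public
# key; verify() raises InvalidSignature if the fact was tampered with.
doctor_key.public_key().verify(
    signature, fact,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
```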

[Figure 2 depicts the five-step validation process: (1) writing of the fact; (2) operations are grouped in the same block; (3) the block is validated by the pre-selected nodes of the network; (4) the block is dated and added to the blockchain, to which authorized users have access; and (5) the fact can be consulted.]

Figure 2: Validation process of the EHR Blockchain.

In the proposed EHR Blockchain, adding a fact is carried out by a process comprising the following steps (Figure 2 summarizes the process):
• A doctor proceeds to write a fact on the Blockchain account.
• The fact is recorded and timestamped in a block using arithmetic operations.
• The block is subsequently validated by the pre-selected nodes of the network through cryptographic techniques; this is called hashing.
• The block is dated and added to the block chain at the end of a vital element, the consensus mechanism, so that all users have access to the same chain even though each node builds its own copy independently.


• Finally, another doctor or other users can access the EHR of the specific patient added by the first doctor.

6 Development of the EHR Blockchain System

The development of an EHR block relies on an extremely important function in the generation process of the EHR Blockchain (Figure 3): the hash function. The latter converts a digital input value to another, compressed numerical value called a fingerprint. The input of this hash function has an arbitrary length, while its output has a fixed length. The properties of this function are as follows:
• Any input size always gives the same hash length.
• Slight changes in the input data give totally different hashes.
• Hashes are one-way.
This function therefore has two essential characteristics:
• The slightest change in the input causes a big change in the output and therefore a different fingerprint.
• Building a document that produces a given fingerprint is deemed to be extremely difficult.
A fingerprint can therefore be at the same time an identifier, a guarantee of integrity, a proof of existence, or even serve as a basic function in the production of proof of work. The fingerprint of block n is integrated into block n + 1, making it impossible to modify block n without also having to modify block n + 1, then n + 2, n + 3, etc. To find an identifier for each block of our EHR Blockchain, we build a file that contains all the validated facts of the block and the identifier of the previous block; to strengthen the security of this chain of blocks, a random number named the Nonce (Kishigami et al., 2015) is added. This file goes through a hash function (Zyskind & Nathan, 2015), and the result is a sequence of characters that is used as the identifier referenced by the next block; the blocks are thus chained one after the other, where the identifier of a block is used to process the next block. The addition of the random value Nonce aims to reinforce the robustness and security of the Blockchain. A condition on the identifier of the block is then imposed, called "the target difficulty": it consists of finding an identifier that begins with a number of '0' digits depending on the accuracy required by the system. The aforementioned process requires significant computing power, since it can only be solved by trying random combinations of the Nonce until a matching value is found.
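A minimal sketch of this search, under the simplifying assumptions that facts are serialized as JSON and that the target difficulty is expressed as a count of leading '0' characters, could look as follows.

```python
import hashlib
import json

def mine_block(prev_id: str, facts: list, difficulty: int = 3) -> dict:
    """Try Nonce values until the block identifier starts with
    `difficulty` leading '0' characters (the target difficulty)."""
    nonce = 0
    while True:
        payload = json.dumps(
            {"prev": prev_id, "facts": facts, "nonce": nonce}, sort_keys=True
        ).encode()
        block_id = hashlib.sha256(payload).hexdigest()
        if block_id.startswith("0" * difficulty):
            return {"prev": prev_id, "facts": facts,
                    "nonce": nonce, "id": block_id}
        nonce += 1  # brute-force search: no shortcut besides trying values

genesis = mine_block("0" * 64, ["EHR 62", "EHR 63"])
block_2 = mine_block(genesis["id"], ["EHR 64"])  # chained via the identifier
```

Because each block embeds the previous identifier, rewriting block n would change its hash and force the re-mining of every later block, which is exactly the tamper resistance described above.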


[Figure 3 shows two chained blocks: block n holds the identifier of the previous block, a list of facts to be validated and a Nonce; its resulting hash (e.g. 00gg5f7kfrdik68sfnbl) becomes the previous-block identifier of block n+1, whose own hash in turn begins with the leading zeros imposed by the target difficulty (e.g. 000l8g5c4g89g4d3s90m8i).]

Figure 3: Example of an EHR block construction.

7 Content of an EHR block

A block consists of two essential parts: the block header and the contents of the block.
1. The header of the block is composed of:
• Technical data, including a Magic ID, a version number that specifies which set of protocol rules this block complies with, and the size of the block.
• A hash corresponding to the identifier of the previous block; it provides the link that creates the Blockchain.
• The Merkle root (Kallahalla et al., 2003), which summarizes the history of all block transactions as a single hash.


• A block creation timestamp, which is used to determine whether the network is creating blocks too quickly or too slowly.
• The target difficulty related to block checking. This is the condition under which the block identifier is generated.
• The Nonce: a random number added to make the hashing more difficult, creating different hashes until the most suitable one is found.
2. The content of the block consists of a list of all the patient records, as well as the operations performed. Figure 4 shows the content of a block of the EHR Blockchain.


Figure 4: Block content of the EHR Blockchain.
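The block layout of Figure 4 can be summarized as a pair of simple data structures; the sketch below is purely illustrative, with field names chosen to match the list above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BlockHeader:
    magic_id: int           # technical data identifying the protocol
    version: int            # rule set this block complies with
    size: int               # block size in bytes
    prev_hash: str          # identifier of the previous block (chain link)
    merkle_root: str        # single hash summarizing all block operations
    timestamp: int          # block creation time
    target_difficulty: int  # required number of leading zeros in the id
    nonce: int              # random value found during hashing

@dataclass
class EHRBlock:
    header: BlockHeader
    patient_records: List[str] = field(default_factory=list)
    operations: List[str] = field(default_factory=list)
```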

8 Discussion

Through the implementation of our EHR Blockchain system, we truly believe that this technology has the potential to transform the health care sector by increasing the level of privacy and security, and above all the interoperability of health data, by providing a ubiquitous and secured network infrastructure capable of authenticating and verifying the identity of people with access to patients' medical records. The use of such a technology will open up new perspectives in the healthcare field by organizing it through the creation of secured and immutable health data, and by surpassing the economic challenges that must be addressed by governments.


In (Rifi et al., 2017), the authors illustrate the specific problems and highlight the benefits of blockchain technology for the deployment of a secure and scalable solution for medical data exchange. Like any recent technology taking its first steps, the Blockchain system will obviously face technical problems; however, technological research and hardware investments have already solved some of the major ones, such as data storage and the huge computing power needed to validate the blocks, which is currently mitigated by alternative protocols offering other management methods. In addition, an improvement must be made to the duration of the validation of facts and the integration of blocks into the chain, which is rather long because it requires a distributed consensus throughout the network.

9 Conclusion

Blockchain technology is a new strategy that gives users a new, decentralized power with respect to businesses and public authorities, guaranteeing users' security, transparency and confidentiality through a database similar to a register, implemented on a peer-to-peer network shared across all network nodes. Thus, we have proposed in this paper to use Blockchain technology as a tool to provide a new model for health care information exchanges, making EHRs more secure in order to reduce and eliminate medical errors. We also explained the tools and methodologies adopted for the development of our EHR Blockchain system, illustrating its operating principle. We believe that the proposed EHR Blockchain has the potential to support the healthcare sector by providing better support for health data management for doctors and patients.

References

Friend, T. H. et al. (2017). Communication Patterns in the Perioperative Environment During Epic Electronic Health Record System Implementation. Journal of Medical Systems, 41(2), 22.
Gaffney, T. (2016). The Peer-to-Peer Blockchain Mortgage Recording System: Scraping the Mortgage Electronic Registration System and Replacing It with a System Built off a Blockchain. Wake Forest J. Bus. & Intell. Prop. L., 17, 349.
Somani, U., et al. (2010, October). Implementing digital signature with RSA encryption algorithm to enhance the data security of cloud in cloud computing. In Parallel Distributed and Grid Computing (PDGC), 2010 1st International Conference on (pp. 211-216). IEEE.
Kishigami, J., et al. (2015, August). The blockchain-based digital content distribution system. In Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on (pp. 187-190). IEEE.
Zyskind, G., & Nathan, O. (2015, May). Decentralizing privacy: Using blockchain to protect personal data. In Security and Privacy Workshops (SPW), 2015 IEEE (pp. 180-184). IEEE.
Kallahalla, M., et al. (2003, March). Plutus: Scalable secure file sharing on untrusted storage. In FAST (Vol. 3, pp. 29-42).
Rifi, N., et al. (2017, October). Towards using Blockchain technology for eHealth data access management. In International Conference on Advances in Biomedical Engineering (ICABME17). IEEE.


Context-based Dynamic Optimization of Software Defined Networks

Francisco J. Badaró V. Neto, Constantino Jacob Miguel, Jorge Alves Santos and Paulo N. M. Sampaio Unifacs – Universidade Salvador [email protected], [email protected], [email protected], [email protected]

Abstract—Providing guarantees of real-time traffic delivery based on the availability of network resources has been one of the main issues discussed in the literature. However, due to the converging nature of digital architectures, to the increasing demand for real-time sensitive traffic such as VoIP and other multimedia applications, and to a greater user adaptability relative to the use of new technologies, solutions based only on Quality of Service (QoS) appear to be insufficient to meet user requirements. In fact, QoS metrics are centered on the network, describing the nature of the traffic (using metrics such as throughput, delay, jitter, etc.). The concept of context-based networks enriches traffic management, since it considers the requirements of users, the network and end-user devices, providing a generic and cutting-edge approach for traffic optimization. In this work, the Software Defined Networks (SDN) paradigm provides the required mechanisms for the implementation of a dynamic control architecture and management of network resources, due to the decoupling of the control plane and the routing plane. Nevertheless, flowtable configuration within SDN controllers is still carried out statically, which does not allow the description of the dynamic nature of context-based networks. Therefore, in this work we propose the application of a user-centric (context-based) optimization solution to SDNs called the Context-Aware Adaptive Routing Framework (CAARF-SDN). Within CAARF-SDN the concepts of Quality of Service (QoS), Quality of Experience (QoE), Quality of Device (QoD), Quality of Context (QoC), and adaptive routing are integrated to provide a dynamic and proactive approach for the delivery of context-sensitive traffic.

Keywords—Adaptive Routing, Context-Based Networks, Quality of Experience, Quality of Device, Quality of Context, Quality of Service, CAARF, SDN.

I. INTRODUCTION

The intensive use of existing IP networks requires an optimized management of the available resources of the network infrastructure, in order to allow the coexistence of multiple types of traffic while ensuring their proposed service levels (SLA). In recent years there has been a considerable increase in the availability of bandwidth, storage and processing capacity, which has motivated the


emergence of new applications, new end-user devices, mobile communication, etc. Unfortunately, network infrastructure and routing strategies have not evolved at the same pace as applications and their demands. As a result, the network infrastructure is constantly under scarcity of resources and hence under constant congestion, not fully meeting the demands of network utilisation, which are shifting from data-driven demands to service-centric dynamic demands accessed by fixed and mobile users, with a significant increase in traffic demands under highly heterogeneous environments.

The literature related to network optimization is extensive, revealing a wide range of approaches. Some of these approaches are related to flowtable configuration based on Software-Defined Networks (SDNs), such as OpenQoS [1], Q-Point [4], Sdnhas [3], Rvsdn [5], EuQoS [23], QoE-Serv [24], and the literature survey "Traffic Engineering in Software Defined Networks" [21], which presents two case studies with different approaches. This scenario presents the need for a solution to optimize network traffic delivery for critical applications in today's multi-service networks, in order to provide minimum guarantees to applications sensitive to network variations.

The work presented in this paper introduces a solution for the optimization of context-sensitive traffic within SDN networks based on Quality of Service (QoS), Quality of Experience (QoE), Quality of Device (QoD) and adaptive routing. This context-oriented solution is called the Context-Aware Adaptive Routing Framework (CAARF) [6] [7] [8] [9]. CAARF integrates the concepts of QoS, QoE and QoD in order to provide a more proactive and dynamic approach for time-sensitive traffic delivery (such as VoIP and video), while aiming at the improvement of user perception over a conventional IP network. CAARF collects, analyzes, and maintains system state (user, devices, and network), dynamically adapting the routing engine while responding to context changes during communication, thus enhancing user experience.

The main contributions presented in this paper are:

1) The presentation of a dynamic and scalable solution to provide adaptive and context-sensitive traffic optimization, through the proposal of the CAARF framework applied to SDN networks (CAARF-SDN);

2) The proposal of an integrated context model based on the concepts of Quality of Service (QoS), Quality of Experience (QoE) and Quality of Device (QoD) for adaptive routing;

3) The representation of a generic context model using JavaScript Object Notation (JSON) to describe the context notifications provided by users, end-user devices and network devices (a hypothetical example is sketched after this list);

4) The introduction of the implemented CAARF-SDN architecture, identifying the main modules and their interactions;

5) The proposal of a case study in order to illustrate and validate the implemented solution.

Compared to the approaches found in the literature, CAARF-SDN is not focused only on a specific metric (QoS or QoE), application (video or voice) or type of network (stationary or mobile). CAARF-SDN, besides considering QoS and QoE in the context definition, also takes into account Quality of Device (QoD) metrics, such as the CPU load of a network device (switch or router), end-user device information such as global positioning (GPS), or some specific data related to the mobile device radio interface, being totally agnostic to the type of application and network.
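As a purely hypothetical illustration of contribution 3 (the paper does not reproduce its actual JSON schema, so every field name below is an assumption), a context notification could be serialized as follows.

```python
import json

# Invented field names; only the QoS/QoD/QoE grouping follows the paper.
notification = {
    "source": "end-user-device",
    "device_id": "laptop-042",
    "qos": {"latency_ms": 35.0, "jitter_ms": 4.2,
            "packet_loss_pct": 0.1, "throughput_mbps": 87.5},
    "qod": {"cpu_load_pct": 41, "memory_load_pct": 63,
            "battery_pct": 88, "gps": [-23.56, -46.64]},
    "qoe": {"estimated_mos": 4.3, "r_factor": 91.0},
}
print(json.dumps(notification, indent=2))
```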


When compared to the contributions found in the literature, CAARF-SDN, besides supporting QoS, QoE and QoD, also presents high flexibility for dynamic network performance. Moreover, CAARF-SDN should not be mistaken for a regular network management platform: besides supporting monitoring, configuration and resource allocation, CAARF-SDN also analyses the collected data and issues context modification notifications in order to optimize traffic delivery through the definition of optimal paths and the dynamic reconfiguration of the network according to the traffic needs.

This paper is organized as follows: Section II introduces context-sensitive systems and presents the adopted context model; Section III presents the CAARF-SDN architecture; Section IV discusses a case study; and Section V presents the conclusions of this paper and some perspectives for future work.

II. CONTEXT-SENSITIVE SYSTEMS

Context can be defined as "any information that can be used to characterize the state of an entity considered as relevant to the interaction between a user and an application, or between the application and the communication infrastructure" [10]. Context is typically related to the location, identity, characteristics, and states of individuals, groups, and elements [11]. In particular, context awareness is understood as a ubiquitous computing paradigm that aims to dynamically deal with changes in the computational environment [2].

As discussed in the literature in [6], [7], [8], [9], the concept of QoS is related to the guarantee of the quality of traffic delivery (according to a Service Level Agreement - SLA) between the user and the communication platform, based on qualitative and quantitative metrics of the network, such as bandwidth, latency, jitter, packet loss and throughput. Therefore, QoS provides the network perspective. Within CAARF-SDN, the notion of context is associated with the integrated use of metrics related to QoS, QoE and QoD.

Quality of Experience (QoE) refers to a group of metrics related to user perception, such as the Mean Opinion Score (MOS) [19] and the R-Factor [20]. MOS is a value derived from the R-Factor that measures voice call quality on a scale of 1 (poor) to 5 (excellent). The Rating Factor (E-Model R-Factor) [20] is a value derived from metrics such as latency, jitter, and packet loss, used to quickly evaluate QoE for voice calls on a scale from 50 (poor) to 100 (excellent). Therefore, QoE provides an estimate of the user experience. Finally, Quality of Device (QoD) is related to the accuracy and technical characteristics of end-user devices or network devices (such as switches, routers and all other network infrastructure equipment). Some of the applied QoD metrics are: GPS location, device CPU level, device memory load, and others related to the state of the user's device or network equipment.

Fig. 1 - CAARF-SDN Conceptual Architecture

III. CONTEXT-AWARE ADAPTIVE ROUTING FRAMEWORK - SDN APPROACH (CAARF-SDN)

As previously mentioned, in this work the Context-Aware Adaptive Routing Framework (CAARF) [6], [7], [8], [9] is applied to the integration with SDN networks. For this purpose, a conceptual architecture of the proposed solution is illustrated in Figure 1. The conceptual architecture of CAARF-SDN is modularized and fully integrated, and it is composed of the CONTEXT READER, OPTIMIZATION and FLOWTABLE CONFIGURATION modules.


A. Context Reader Module

The context reader module aims to collect the QoS, QoD and QoE notifications from their respective sources (network devices, end-user devices and users) and to process them in order to verify the global context of the system. If a relevant context modification is verified, a notification for the optimization module is issued.

Fig. 2 – Conceptual Architecture of the context reader module

Figure 2 illustrates the functional description of the context reader module. A network device is any network equipment required to monitor and collect context information, such as routers, switches, and servers. Terminal devices are user equipment, such as a personal computer, laptop or smartphone.

The SDN Controller is responsible for the configuration of the SDN network, and therefore it holds important information about the traffic and the network topology that must be collected.

In order to evaluate the accuracy of the QoS, QoE and QoD metrics, the notion of Quality of Context (QoC) was proposed. The validation of QoC is related to metrics such as [11]: the accuracy of the information (which allows the evaluation of its relevance); the probability that the information is correct (Probability of Correction); the level of confidence in the information sources (Reliability); the resolution levels of information granularity (Resolution); and how up to date the information is, together with its temporal characteristics.

The context reader processing metrics initially proposed to be verified are:

• QoS: jitter, latency, packet loss, bandwidth, throughput;
• QoD: CPU load, memory load, battery charge, GPS positioning; and
• QoE: estimated MoS and R-Factor.

All the collected data are required to guarantee the quality of the optimization process, and the notification algorithm must take all of these metrics into account. Analyses are carried out when a minimal data criterion is reached. This analysis is done by the context management submodule which, upon detection of any significant data variation, sends a notification to the Optimization Module, detailed in the next subsection (a hedged sketch of this detection step is given at the end of this subsection).

B. Optimization Module

The optimization module aims at automatically selecting the existing optimal paths based on the contextual information generated by the Context Reader Module. The path selection relies on a set of pre-defined policies, built upon the data analysis and performance indexes also generated by the Context Reader Module. Therefore, when a context variation notification is generated by the Context Reader Module, the Optimization Module analyzes the network topology dynamically, determining the possible forwarding paths.

The path ranking submodule is responsible for determining the set of optimal paths that meet the traffic needs and the QoC criteria for each application. For this purpose, the QoE, QoD and QoS metrics are applied to rank all the possible forwarding paths and determine the ones most likely to be applied.

Based on the set of optimal forwarding paths determined by the path ranking submodule, the forwarding policies submodule determines which forwarding path will be chosen (among the set of proposed optimal paths), considering a set of pre-defined policies (criteria) for each application. After that, the optimization module sends the flowtable configuration module the configuration directives that have to be applied on the SDN controller's flowtable. These directives are finally stored on the optimization database.
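Returning to the notification step of the context reader described above: the paper does not specify the analysis criteria of the context management submodule, so the thresholds in the sketch below are assumptions intended only to make the detection logic concrete.

```python
# Hypothetical per-metric limits; real policies would be configurable.
THRESHOLDS = {"latency_ms": 50.0, "jitter_ms": 10.0,
              "packet_loss_pct": 1.0, "estimated_mos": 4.0}

def significant_variation(previous: dict, current: dict) -> bool:
    """Flag a relevant context modification when a metric crosses its
    threshold or degrades sharply relative to the last reading."""
    for metric, limit in THRESHOLDS.items():
        if metric == "estimated_mos":
            if current[metric] < limit:          # MOS: lower is worse
                return True
        elif current[metric] > limit or current[metric] > 1.5 * previous[metric]:
            return True
    return False

last = {"latency_ms": 20, "jitter_ms": 2, "packet_loss_pct": 0.0, "estimated_mos": 4.4}
now = {"latency_ms": 120, "jitter_ms": 18, "packet_loss_pct": 3.2, "estimated_mos": 2.4}
if significant_variation(last, now):
    print("notify optimization module")          # context variation notification
```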


The optimization to be carried out is able to modify the behavior of the network's traffic (for instance, by modifying its forwarding priority, or the flow path through the change of switching port). Figure 3 depicts the architecture of the optimization module and its components.

Fig. 3 – Conceptual Architecture of the optimization module

C. Flowtable Configuration Module

The main goal of the flowtable configuration module is to set the SDN controller's flowtable dynamically, based on the configuration directives issued by the optimization module.

This configuration is carried out through the SDN controller API functions, via the controller's Northbound interface (which is, in the SDN paradigm, the interface that enables external interaction with the SDN controller [13]). Once the controller's flowtable is reconfigured, it updates the SDN switches' flowtables via its Southbound interface (which is, in the SDN paradigm, the interface through which a real or virtual network can be controlled, configured and managed [13]).

The flowtable configuration module also stores all the configurations carried out on the SDN controller's flowtable in the configuration database, as action logs for further statistical queries. Indeed, the interaction with the controller's northbound interface allows the flowtable configuration module to interact with several types of controllers, enabling the dynamic reconfiguration of their flowtables.

Figure 4 illustrates the conceptual architecture of the flowtable configuration module, where it is possible to observe its interaction via API with (i) the optimization module, from which it receives the configuration directives; (ii) the SDN controller, via its Northbound interface's API, through which the controller's flowtable is dynamically reconfigured; and (iii) the Configuration Database, for the statistical recording of the flowtable configuration update/notification logs.

In Figure 4, it is also possible to notice the SDN controller, which, through its northbound interface, has its flowtable reconfigured and, via its southbound interface, reprograms the OpenFlow devices (switches) in the existing network infrastructure, thus optimizing traffic management.

Fig. 4 – Conceptual Architecture of the flowtable configuration module

IV. CAARF-SDN: CASE STUDY

In order to validate the proposed forwarding optimization solution, the CAARF-SDN architecture was implemented for the optimization of a Voice over IP (VoIP) application.
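To suggest what one such directive might look like on the wire, the sketch below builds a REST call to a controller's northbound interface; the endpoint URL, payload schema and authentication token are hypothetical placeholders and do not describe the HPE VAN SDN Controller API.

```python
import json
import urllib.request

# Hypothetical directive: mark VoIP traffic with DSCP 46 (Expedited
# Forwarding) so that switches forward it with higher priority.
directive = {
    "match": {"ip_proto": "udp", "udp_dst": 5060},
    "actions": [{"set_dscp": 46}, {"output": "normal"}],
    "priority": 40000,
}
req = urllib.request.Request(
    "https://controller.example:8443/sdn/v2.0/of/flows",  # placeholder URL
    data=json.dumps({"flow": directive}).encode(),
    headers={"Content-Type": "application/json",
             "X-Auth-Token": "<token>"},                  # placeholder auth
    method="POST",
)
# urllib.request.urlopen(req)  # would push the rule to the controller
```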


The laboratory setting for this case study is under congestion. The simulation of a voip call composed of a server running CAARF-SDN, a using Startrinity starts with a congestion free server for the SDN Controller (HPE VAN SDN network. The simulation runs smoothly (MoS Controler), elements for traffic/congestion above 4) until it reaches a congestion peak (1), generation (RouterOS Mikrotik RB1200 / caused by a traffic generated for this purpose btest), Startrinity voice call simulator (UAC + (1). CAARF-SDN, processes and implements UAS Functions) [14] and ethernet switches this policy through the SDN controller´s with support to openflow 1.3. In this topology northbound interface. The SDN controller the voice communication with Startrinity is processes the CAARF-SDN flowtable simulated between two stations, along with an configuration directives and updates the extra traffic simulation with the use of the flowtable with its configuration instruction, routerOS / Mikrotik btest utility [15]. forwarding the reprogrammed flowtables to the switches carrying the optimization instructions Several case studies can be proposed for (DSCP 46 -Expedited Forwarding) [16] for CAARF-SDN. However, due to space voice over ip packets, and; Best Effort voice limitations, only one case study scenario is packets that previoulsy suffered the greatest discussed in this paper. Figure 5 illustrates the impact during congestion, were reassigned proposed environment. with an EF (Expedited Forwarding) traffic priority, thus being routed with a higher priority within the network [16], and achieving a better performance given the congested scenario.

Fig. 6 – Formula used to estimate QoE (MoS) by CAARF-SDN

In order to prospect MoS as a QoE metric in this case study, the proposed approach applied the formula depicted in Figure 6 to estimate MoS within CAARF-SDN. This formula is introduced in [25] as an adaptation of the classical E-Model [20] to the context of packet-based networks. The formula considers the effective latency (and jitter) and the R-factor definition based on an effective border latency of 160 ms (above this value, due to network degradation – leading to high jitter and packet loss – the quality is highly compromised, and consequently the R-factor is also penalized) to determine the estimated MoS. As a complement to the MoS estimation in the proposed scenario, the CAARF-SDN context processing module also collects the metrics of the applications through the API agent via RTCP [18] and sends them to CAARF-SDN.
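Figure 6 itself is not reproduced here, but a widely circulated simplified adaptation of the E-model for packet networks matches the description above and the values in Tables I–III (an R-factor ceiling of 93.2 yields a MOS of 4.41). The sketch below uses those commonly quoted coefficients as an assumption; it illustrates the computation rather than reproducing the exact formula of [25].

```python
def estimate_mos(latency_ms, jitter_ms, loss_pct):
    """Estimate MOS from latency, jitter and packet loss.

    Simplified packet-network adaptation of the E-model; the
    coefficients are common textbook values, assumed here rather
    than taken verbatim from Figure 6.
    """
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0
    if effective_latency < 160.0:          # the 160 ms "border latency"
        r = 93.2 - effective_latency / 40.0
    else:                                  # heavier penalty past the border
        r = 93.2 - (effective_latency - 120.0) / 10.0
    r -= 2.5 * loss_pct                    # packet-loss penalty
    r = max(0.0, min(r, 93.2))
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

print(round(estimate_mos(20.0, 5.0, 0.0), 2))    # congestion-free: ~4.39
print(round(estimate_mos(400.0, 60.0, 25.0), 2)) # heavy congestion: 1.0
```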


C. Description of the Testing Scenario, Simulation and Results

Scenario I: Simulation of 5 simultaneous calls, during a period of 10 minutes of simulation, within a congestion-free FastEthernet network.

TABLE I
INDICATORS PRODUCED IN SCENARIO I

METRICS            MIN     AVERAGE   MAX
Packet Loss (%)    0.00    0.00      0.00
MOS                4.41    4.41      4.41
R-factor           93.20   93.20     93.20

Scenario II: Simulation of 5 simultaneous calls, 10 minutes of call simulations, with congestion within the FastEthernet network (99.8% of the interface is congested). In Scenario II (the congestion simulation), the packet loss also causes a sudden quality decrease in the Startrinity simulation environment [14], with a negative impact on its operation.

TABLE II
INDICATORS PRODUCED IN SCENARIO II

METRICS            MIN     AVERAGE   MAX
Packet Loss (%)    0.00    25.84     78.30
MOS                4.41    2.38      1.00
R-factor           93.20   41.27     0.00

Scenario III: Simulation of 5 calls, 10 minutes of call simulations, with a congested FastEthernet network (99.8% of the interface is congested) and with the optimization applied via CAARF-SDN, as described in this case study.

After testing Scenario III, quality problems are still perceptible due to packet loss, where the problems are clearly caused by the congested scenario.

TABLE III
INDICATORS PRODUCED IN SCENARIO III

METRICS            MIN     AVERAGE   MAX
Lost Packets (%)   0.00    9.44      37.13
MOS                4.41    3.90      2.14
R-factor           93.20   79.54     70.20

Scalability, performance and security issues must also be considered. In a small or lab environment with few network elements, a single controller can perfectly meet the expected requirements without high latency. However, for a larger and more complex network, where the latency between the controller and the physical devices is high or the controller performance is not satisfactory, the flowtable configuration performance will be affected, thus impacting its efficiency. These problems can be mitigated by using a distributed architecture with multiple controllers performing load balancing [13].

V. CONCLUSIONS AND FUTURE WORK

This paper discusses the effectiveness of the optimization techniques proposed by CAARF-SDN as a solution for the definition of optimal paths for context-sensitive traffic routing. The chosen architecture is expandable, modular and adaptive. SDN networks, due to their programmability characteristics, open up a promising horizon for next-generation networks, given their flexibility, programmability and adaptability.

As for future work, CAARF-SDN can be extended to support multidomain topologies, concerning bilateral agreements between different autonomous systems. CAARF-SDN can also be extended to support mobile networks, where QoD metrics should be improved. A QoC and routing quality monitoring approach is also under development, leading to diverse results and scenarios to be addressed in future work, thus demonstrating the flexibility and usability of the proposed architecture.


BIBLIOGRAPHIC REFERENCES

[1] H. E. Egilmez, S. T. Dane, K. T. Bagci and A. M. Tekalp, "OpenQoS: An OpenFlow controller design for multimedia delivery with end-to-end Quality of Service over Software-Defined Networks," Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), Asia-Pacific, 2012.
[2] S. Oh, J. Lee, K. Lee and I. Shin, "RT-SDN: Adaptive Routing and Priority Ordering for Software-Defined Real-Time Networking," Technical Report No. CS-TR-2014-387, KAIST School of Computing, Republic of Korea, 2014.
[3] A. Bentaleb, A. Begen, R. Zimmermann and S. Harous, "SDNHAS: An SDN-Enabled Architecture to Optimize QoE in HTTP Adaptive Streaming," IEEE Transactions on Multimedia, pp. 1-1, 2017.
[4] O. Dobrijevic, A. J. Kassler, L. Skorin-Kapov and M. Matijasevic, "Q-point: QoE-driven path optimization model for multimedia services," in Wired/Wireless Internet Communications, Springer International Publishing, 2014, pp. 134-147.
[5] H. Owens, A. Durresi and R. Jain, "RVSDN: Reliable video over software-defined networking," in 2014 IEEE Global Communications Conference, 2014, pp. 1974-1979. doi:10.1109/GLOCOM.2014.703709
[6] C. F. J. Muakad, Context-based Dynamic and Adaptive Forwarding Management, UNIFACS Universidade Salvador, Salvador, Brasil, 2015.
[7] A. L. C. d. Oliveira, Context-based Notification Mechanism for Adaptive Forwarding, UNIFACS Universidade Salvador, Salvador, Brasil, 2015.
[8] J. P. S. d. Silva, Implementation of an Architecture for the Context-Aware Adaptative Routing Framework (CAARF), UNIFACS Universidade Salvador, Salvador, Brasil, 2015.
[9] S. S. Spinola, Context Management applied to Adaptive Forwarding within Convergent Solutions, UNIFACS Universidade Salvador, Salvador, Brasil, 2015.
[10] A. Zimmermann, A. Lorenz and R. Oppermann, "An Operational Definition of Context," 6th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT'07), Fraunhofer Institute for Applied Information Technology, 2007.
[11] D. C. Nazario, M. A. R. Dantas and J. L. Todesco, "Taxonomy of Context Quality publications," Sustentable Business International Journal, ISSN 1807-5908, n. 20, 2012.
[12] "The JavaScript Object Notation (JSON) Data Interchange Format," Internet Engineering Task Force (IETF), 03/2014. [Online]. Available: https://tools.ietf.org/html/rfc7159. [Accessed 04-19-2017].
[13] "Framework for SDN: Scope and Requirements," ONF – Open Networking Foundation. [Online]. Available: https://www.opennetworking.org/images/stories/downloads/sdnresources/technicalreports/Framework_for_SDN_-Scope_and_Requirements.pdf. [Accessed 04-19-2017].
[14] "Startrinity," Startrinity.com. [Online]. Available: http://startrinity.com. [Accessed 04-19-2017].
[15] "Mikrotik RouterOS," Mikrotik. [Online]. Available: https://mikrotik.com/software. [Accessed 04-19-2017].
[16] "RFC 3260: New Terminology and Clarifications for Diffserv," Internet Engineering Task Force (IETF), 04/2002. [Online]. Available: https://tools.ietf.org/html/rfc3260. [Accessed 04-19-2017].
[17] "HP VAN SDN Controller," Hewlett Packard. [Online]. Available: https://marketplace.saas.hpe.com/sdn/content/hpe-van-sdn-controller-ova-free-trial#app_description. [Accessed 04-19-2017].
[18] "RTP: A Transport Protocol for Real-Time Applications," Internet Engineering Task Force (IETF), 07/2003. [Online]. Available: https://www.ietf.org/rfc/rfc3550.txt. [Accessed 04-19-2017].
[19] Opinion model for video-telephony applications, ITU-T G.1070. [Online]. Available: http://www.itu.int/ITU-T/recommendations/rec.aspx?id=9050. [Accessed 04-19-2017].
[20] The E-model: a computational model for use in transmission planning, ITU-T G.107. [Online]. Available: https://www.itu.int/rec/T-REC-G.107. [Accessed 04-19-2017].
[21] J. de M. Bezerra, A. J. Pinheiro, M. S. Bonfim, J. A. Suruagy Monteiro and D. R. Campelo, "Traffic Engineering in Software Defined Networks." [Online]. Available: http://www.sbrc2016.ufba.br/. [Accessed 04-19-2017].
[22] ODL – OpenDayLight SDN Controller. [Online]. Available: https://marketplace.saas.hpe.com/sdn/content/hpe-van-sdn-controller-ova-free-trial#app_description. [Accessed 04-19-2017].
[23] M. A. Callejo-Rodriguez, J. Enriquez-Gabeiras and W. Burakowski, "EuQoS: End-To-End QoS over Heterogeneous Networks," 2008 First ITU-T Kaleidoscope Academic Conference – Innovations in NGN: Future Network and Services, Geneva, 2008, pp. 177-184.
[24] E. Liotou, G. Tseliou, K. Samdanis, D. Tsolkas, F. Adelantado and C. Verikoukis, "An SDN QoE-service for dynamically enhancing the performance of OTT applications," 2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX), Pylos-Nestoras, 2015, pp. 1-2.
[25] The ITU-T E-model: G.108: New Appendix II – Planning examples regarding delay in packet-based networks. [Online]. Available: https://www.itu.int/rec/T-REC-G.108-200403-I!Amd2/en. [Accessed 04-19-2017].


A solution for acquisition of vital signs on Healthcare IoT Application.

David Viana1, Emilson Rocha1, Nicodemos Freitas1, Vitor Lopes1, Odorico Monteiro2, and Mauro Oliveira1

1 IFCE - Instituto Federal de Ciência e Tecnologia do Ceará, Fortaleza, Ceará, Brasil
[email protected] [email protected] [email protected] [email protected] [email protected]
2 UFC - Universidade Federal do Ceará, Fortaleza, Ceará, Brasil
[email protected]

Abstract
The need to monitor the vital signs of hospitalized patients can be met by information and communication technologies. IoT technology applied to health enables the use and analysis of patients' vital data, allowing a faster response from the healthcare team. This work uses heart rate, temperature and blood oxygen sensors to monitor the health status of patients, as well as their requests for care and the response of the healthcare team. The analysis of the results shows that the solution is feasible, that the sensors have accuracy within the expected average, and that the requests for care and the responses of the healthcare team can be monitored.

1 Introduction

Internet of Things (IoT) is an emerging technology that can be applied in many areas to provide solutions that can transform the way industry is known, as well as current systems [17]. These possible transformations are mainly due to the growth of the number of devices connected to the Internet: in 2003 there were 500 million, and it is predicted that more than 30 billion will be connected by 2020 [9], [4]. Thus, the Industrial Internet of Things (IIoT) and the Internet of Everything (IoE) are concepts addressed by [14], which emphasize the constant technological development followed by standardization and recommendations for data exchange among network types, proposing to use incident reports in these areas to work on standardized models, building secure software and making data exchange secure [14]. It is clear that the health industry can also be transformed by the IoT and by the constant technological development discussed by [14]; according to [13], mobile technologies for health will transform clinical intervention, especially in the care of the elderly with chronic diseases that prevent the individual from living independently. According to [13], noncommunicable diseases (NCDs) such as cardiovascular and respiratory diseases, diabetes and cancer are the main causes of morbidity and mortality. For the WHO (World Health Organization), NCDs are one of the major health and development challenges of the 21st century [16]. In Brazil, the number of elderly people will reach 33.4 million, representing 15% of the Brazilian population, the 5th largest in the world [6], [16]. It can be seen that, from both sides – information technology, and care for the elderly and/or people with NCDs – with the growth of the elderly population and the number of devices connected to the network, related to the emerging IoT and the possible transformations of the health industry, various technology services may emerge. Thus, with interest in the evolution


of technologies for health systems and in the changes that impact such systems, this work proposes: to create an IoT solution for the acquisition of vital signs using open hardware, in order to evaluate the use of temperature, blood oxygen level and heart rate sensors for monitoring patients without life-threatening conditions during hospitalization; and also to monitor the requests for care and the follow-up of care by the healthcare team.

2 Methodology

This work creates an embedded IoT solution using three types of sensors (heart rate, temperature and blood oxygen) to capture the vital signs of the patient. A button and a magnetic reader are also used to identify, respectively, when the patient requests care and when the healthcare team treats the patient.

2.1 Acquisition of vital signs, request for service and service performed

The acquisition of blood oxygen level and heart rate (beats per minute) signals was performed on 3 participants, with a 1-minute interval for each sample. The 1-minute interval was used so that there were no very large time differences affecting the value of the acquisitions, since a person can have his heart rhythm changed at different times of the day. The samples are captured as an average of 15 readings performed at intervals of 400 milliseconds, also to avoid very different values, since the sensor may produce divergent readings in case of sudden movements of the finger on the sensor. Only for the vital sign of body temperature was the instantaneous value of the measurement used, because the temperature sensor is in permanent contact with the skin of the volunteer and, in approximately 15 seconds, can already make an accurate reading of the temperature of the area in contact. Service requests were made during solution testing. The simulated fulfillment of the service was also confirmed after each request made.
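The sampling strategy just described can be summarized in a few lines; read_heart_rate_and_spo2 below is a hypothetical stand-in for the actual sensor driver call, not part of the original solution.

```python
import time

def read_heart_rate_and_spo2():
    """Hypothetical driver call returning (bpm, spo2) from the sensor."""
    raise NotImplementedError

def sample_vital_signs(n_readings=15, interval_s=0.4):
    """Average n_readings taken 400 ms apart, as in the methodology."""
    bpm_acc, spo2_acc = 0.0, 0.0
    for _ in range(n_readings):
        bpm, spo2 = read_heart_rate_and_spo2()
        bpm_acc += bpm
        spo2_acc += spo2
        time.sleep(interval_s)
    return bpm_acc / n_readings, spo2_acc / n_readings
```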

2.2 Analysis of the data

The data analysis was performed in an exploratory way, testing the acquisition of the signals on three participants with a 1-minute interval for each sample. Five samples were taken for each participant. The analysis of the visits was made during the vital signs acquisition tests.

3 Related works

In [3] a Fog-based¹ architecture is proposed, using an FPGA to process signals, as well as a mesh layer to aggregate data before processing them in the Fog layer. The solution proposed here is similar, but the architecture proposed by [11], an intelligent local layer, which resembles the architecture of [3], is studied and used in a similar way. The paper [12] proposes the use of sensors for monitoring real-time vital signs using low-cost hardware, with concern for energy efficiency. In a similar way, the use of open hardware with relatively low cost is proposed here; however, cost and energy are not analyzed in this work.

¹ IoT approaches can have local processing, local intelligence, local databases, etc., and studies propose Fog-based architectures when trying to minimize latency and optimize network usage towards the cloud [2].


In [10] and [5], the authors used positional analysis, fall detection, a panic button and context analysis to trigger urgency/emergency assistance, similar to [7]. Here, the call-for-assistance button will be used in a similar way, along with the monitoring of the delivery of assistance by the healthcare team. The work [8] used a mesh network through XBee for the communication of several nodes that monitor the temperature of patients, with concern to reduce costs and increase the quality of health services. Here, however, a star topology will be used, where each node has its own access to the Internet. The works previously cited used IoT to monitor health signs and other functionalities, such as heart rate, SpO2, temperature, fall analysis, blood pressure, environment sensors, air quality, CO2 levels, etc. Each work has its individual contributions; however, according to [1], there are few sensors that can be used with an open API and that are reasonably priced for prototyping.

4 Aspects of implementation

4.1 The hardware

An ESP8266 was used as an open-hardware platform, with the following characteristics: 32-bit RISC CPU at 80 MHz, 64 KB of instruction RAM and 96 KB of data RAM, external QSPI flash (512 KB to 4 MB), IEEE 802.11 b/g/n Wi-Fi, 16 GPIO pins, SPI and I2C. In order to acquire the temperature, the DS18B20 sensor was used, which is able to operate at resolutions from 9 up to 12 bits, with accuracies of 0.5 °C, 0.25 °C, 0.125 °C and 0.0625 °C, respectively, for 9, 10, 11 and 12 bits. Each reading can be performed at 750-millisecond intervals. To capture the heart rate and blood oxygen level, the MAX30100 sensor was used. This sensor can work with resolutions from 13 up to 16 bits, varying the number of samples from 50 to 1000 per second. A push button was placed on the device, to be pressed in the event of a service request, and a reed-switch presence sensor, triggered by a magnet, indicates that the service was performed. A Bluetooth HC-06 module was used to send the captured information. It was decided to use communication through a smartphone with the Android system to receive the data and transmit the information to the cloud through 3G/4G. Finally, two 4.7 V, 400 mAh lithium batteries connected in parallel were used. These devices were chosen for their immediate availability in the market at a relatively low cost, since the choice depends on the scenario where the solution will be used and on whether the project has enough funding to implement it.

4.2 Software and firmware

The software was developed on two different platforms: one for mobile monitoring on Android, and another for managing the data in a web system. The application developed for Android receives the data from the sensors through Bluetooth. In addition, it can be configured to generate alerts when abnormal levels of some vital sign are reached, so that the healthcare team takes the necessary action. The web software stores the sent data and displays it on a screen to monitor the assisted patients centrally. For firmware development, the Arduino IDE and the libraries available in its library manager were used.


5 The architecture

The communication architecture used can work in two ways:

• Case 1: Uses the hardware formed by Bluetooth and Wi-Fi, sending the data via Bluetooth when it is paired with the smartphone through the Android application.

• Case 2: Sends the data to the cloud without needing a smartphone.

There are 3 types of messages with the following format each:

Message_of_vital_signs: {"bmp":"value","spo2":"value","temp":"value","id":"0123456789"}
Visit_requisition_message: {"dt":"HH:mm:ss MM/dd/yyyy","id":"0123456789"}
Message_of_attendance: {"dt":"HH:mm:ss MM/dd/yyyy","id":"0123456789","id_care":"0123456789"}

The size of the data transmitted for the 3 message types may vary, but the maximum size is 64 bytes in the JSON format previously displayed for the vital-signs message. Thus, on a network with a wireless switch of 100 Mbps capacity, it is possible to transmit up to 160 simultaneous messages; that is, 160 patients can send messages every second. For service request messages and service confirmation messages, there is no need to send them at such a fast rate. Taking into account the studies already mentioned, such as [11], the objective of the real-time monitoring of patients is not to transmit the data in this raw way, but to use a fog or intelligent layer to obtain a more efficient transmission of data and to send only relevant data to the health monitoring team, which could also address other aspects such as energy consumption and battery usage; however, this is not the object of study in this work. Figure 1 represents the hospital rooms, where the following are portrayed: Case 1, Case 2, the I.T. room and the real-time healthcare monitoring room.
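The 64-byte upper bound can be sanity-checked by serializing an example vital-signs message; the field values below are arbitrary worst-case examples, not data from the experiments.

```python
import json

# Arbitrary worst-case payload following the vital-signs message format.
msg = {"bmp": "180", "spo2": "100", "temp": "36.5", "id": "0123456789"}
encoded = json.dumps(msg, separators=(",", ":")).encode("utf-8")
print(len(encoded))  # 58 bytes here, below the 64-byte bound stated above
```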

Figure 1: Adapted from this proposed solution.


6 Results

After the development and testing stage, the solution was able to capture temperature, blood oxygen level and heart rate (beats per minute) data. It was possible to capture the moment when the patient pressed the button to request care, and it was also possible to record the moment the patient was treated. Table 1 shows the monitored vital signs of 3 participants in the testing stage, where each sample is the result of the average of 15 acquisitions of each vital sign.

          Patient 1              Patient 2              Patient 3
Sample    HR    SPO2  Temp.     HR    SPO2  Temp.     HR    SPO2  Temp.
1         63    98    36.4      81    98    36.3      67    98    36.5
2         68    98    36.5      78    97    36.2      69    97    36.7
3         65    98    36.5      72    98    36.3      74    97    36.2
4         67    98    36.4      73    98    36.4      68    98    36.2
5         66    98    36.4      72    97    36.4      74    98    36.1
MEAN      65.8  98.0  36.44     75.2  97.6  36.32     70.4  97.6  36.34
VAR       3.7   0.0   0.003     16.7  0.3   0.01      11.3  0.3   0.06
STD       1.92  0.0   0.05      4.09  0.55  0.08      3.36  0.55  0.25

Table 1: Results of the acquisition for 3 patients with an interval of 1 min. Each sample is the result of the average of 15 acquisitions of each vital sign.

Table 2 shows the moments when the user made a service request, showing that all the tests were captured and sent to the server, as well as all the moments in which the service key was activated.

No.   Solicitation (datetime)   Treatment (datetime)
1     10/24/2017 08:12:56       10/24/2017 08:13:13
2     10/24/2017 08:14:25       10/24/2017 08:15:09
3     10/24/2017 08:15:36       10/24/2017 08:15:52
4     10/24/2017 08:18:23       10/24/2017 08:19:05
5     10/24/2017 08:19:49       10/24/2017 08:20:10

Table 2: Results of the solicitation and treatment tests

7 Discussion

According to the presented data, it is noticed that the standard deviation of the heart rate measurements was not greater than 4.1, i.e., within ±5% of the mean. In [1], the accuracy presented ranged from ±2% to ±5% for the sensors studied; the tests here showed the same variation in one single sensor. This means that the captured heart rate varied, in the worst case, by ±5% of the mean value and, in the best case, by ±2%. In relation to the requests for visits by the healthcare team, the equipment can send the requisitions, as well as record the moment in which someone from the healthcare team came to perform the service, according to the results of Table 2.


The paper [8] argues that the use of health signal sensors – and, in this case, the use of IoT for health systems – can improve care for hospitalized people, as well as the costs involving professionals. The work [15] points out that monitoring more efficiently, through a health team working in a satisfied and motivated way, is also very important for the treatment of hospitalized elderly people. However, even with protocols and the responsibility of health professionals for the collected data, the records often have filling errors for various reasons: stressful work days, poor compensation, reduced staff, and so on. It can be said that the acquisition of vital signs through IoT devices can only improve the quality of the information delivered to the healthcare team, given that the device accurately captures the signals within the standards presented and records the cases in which the request/confirmation of care is performed. It can be perceived that these data can be used to improve the quality of service in health care, process improvement and data quality, and the management of people, their activities and their efficiency. IoT technology can bring improvements to the standards and protocols that healthcare organizations follow and demand.

8 Conclusion and future work

This work was a case study for a large health insurance company trying to improve its healthcare assistance and services, with a view to implementing and using local knowledge in technology and healthcare. However, even with off-the-shelf devices and open hardware, the sensors and platforms tested by the team of this article – even with members with a great deal of development experience – had a negative impact on the development time of this work, since some sensors did not communicate or did not work as promised. The team devoted themselves to searching for errors in the libraries or in the hardware provided by their manufacturers. Thus, open-hardware technology brings cost risks in this respect. This IoT solution for the acquisition of vital signs using open hardware was presented in an experimental way: three people, at different times, had their vital signs checked through the use of temperature, blood oxygen level and heart rate sensors. It was possible to capture and analyze such vital signs in order to monitor them. It was also possible to monitor the requests for care and the services provided by the healthcare team. For future work, there is the intention to compare the data between different sensors and/or mHealth applications, and to explore the architecture elaborated in [11] or a Fog-based¹ one, inserting another heterogeneous element in the data communication with the use of LPWAN, in an attempt to reduce costs and to evaluate long-distance communication and the possibilities of reducing energy consumption.

9 Acknowledgment

We thank FUNCAP (Ceará Foundation for the Support of Scientific and Technological Development). We also thank PPGCC-IFCE (Post-Graduation Program in Computer Science at the Federal Institute of Education, Science and Technology of Ceará), the laboratory LAR (Laboratory of Computer Networks and Multimedia Systems – Aracati, CE) and LIT (Laboratory of Technological Innovation – Fortaleza, CE).

References

[1] Alexander Borodin, Yuliya Zavyalova, Alexei Zaharov, and Igor Yamushev. Architectural approach to the multisource health monitoring application design. 2015 17th Conference of Open Innovations


Association (FRUCT), pages 16–21, 2015.
[2] L. Cerina, S. Notargiacomo, M. G. Paccanit, and M. D. Santambrogio. A fog-computing architecture for preventive healthcare and assisted living in smart ambients. In 2017 IEEE 3rd International Forum on Research and Technologies for Society and Industry (RTSI), pages 1–6, Sept 2017.
[3] Luca Cerina, Sara Notargiacomo, Matteo GrecoLuca Paccanit, and Marco Domenico Santambrogio. A fog-computing architecture for preventive healthcare and assisted living in smart ambients. 2017 IEEE 3rd International Forum on Research and Technologies for Society and Industry (RTSI), pages 1–6, 2017.
[4] Dave Evans. The Internet of Things. Scientific American, (February):13–15, 2011.
[5] Eliezio Gomes, De Queiroz Neto, Oton Crispim Braga, Nicodemos Freitas, Luiz Odorico, Monteiro De Andrade, and Mauro Oliveira. VITE - Velocity and Intelligence Technology on Emergency Health Systems. pages 1–4.
[6] Lucia Hisako Takase Gonçalves, Angela Maria Alvarez, Edite Lago da Silva Sena, Luzia Wilma da Silva Santana, and Fernanda Regina Vicente. Perfil da família cuidadora de idoso doente/fragilizado do contexto sociocultural de Florianópolis, SC. Texto & Contexto - Enfermagem, 15:570–577, 2006.
[7] Yair Enrique Rivera Julio. Design ubiquitous architecture for telemedicine based on mhealth Arduino 4G LTE. 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services, Healthcom 2016, 2016.
[8] Ravi Kishore Kodali, Govinda Swamy, and Boppana Lakshmi. An implementation of IoT for healthcare. 2015 IEEE Recent Advances in Intelligent Computational Systems, RAICS 2015, (December):411–416, 2016.
[9] B. V. S. Krishna and T. Gnanasekaran. A systematic study of security issues in internet-of-things (iot). pages 107–111, Feb 2017.
[10] Vitor Lopes, Emilson Rocha, Eliezio Queiroz, Nicodemos Freitas, David Viana, and Mauro Oliveira. VITESSE - more intelligence with emerging technologies for health systems. 2016 7th International Conference on the Network of the Future (NOF), pages 1–3, 2016.
[11] Vitor De Carvalho Melo Lopes. TV health das coisas: uma arquitetura IoT para assistência domiciliar à saúde. 2017.
[12] Tuan Nguyen Gia, Mingzhe Jiang, Victor Kathan Sarker, Amir M. Rahmani, Tomi Westerlund, Pasi Liljeberg, and Hannu Tenhunen. Low-cost fog-assisted health-care IoT system with energy-efficient sensor nodes. 2017 13th International Wireless Communications and Mobile Computing Conference, IWCMC 2017, pages 1765–1770, 2017.
[13] Pradeep Ray, Politecnico Milano, and Aura Ganz. mHealth Technologies for Chronic Diseases and Elders: A Systematic Review. 31(September):6–18, 2013.
[14] Melnik Sergey, Smirnov Nikolay, and Erokhin Sergey. Cyber security concept for Internet of Everything (IoE). 2017 Systems of Signal Synchronization, Generating and Processing in Telecommunications, SINKHROINFO 2017, 2017.
[15] Cristiane Chagas Teixeira, Rafaela Peres Boaventura, Adrielle Cristina Silva Souza, Thatianny Tanferri de Brito Paranaguá, Ana Lúcia Queiroz Bezerra, Maria Márcia Bachion, and Virginia Visconde Brasil. Vital Signs Measurement: an Indicator of Safe Care Delivered To Elderly Patients. Texto & Contexto - Enfermagem, 24(4):1071–1078, 2015.
[16] WHO. Global status report on noncommunicable diseases 2014. World Health, page 176, 2014.
[17] L. D. Xu, W. He, and S. Li. Internet of things in industries: A survey. IEEE Transactions on Industrial Informatics, 10(4):2233–2243, Nov 2014.


Evaluating Classification Algorithms performance with Matlab for generating alerts of risk of infant death

Gerson Albuquerque1, Cristiano Silva1, Joyce Quintino1, Odorico Andrade2, and Mauro Oliveira1

1 Federal Institute of Education, Science and Technology of Ceará, Fortaleza, Ceará, Brasil
[email protected] [email protected] [email protected] [email protected]
2 Brazilian National Congress - Brasília, Distrito Federal, Brazil
[email protected]

Abstract
GISSA is an intelligent system for health decision making focused on maternal and child care. The system generates alerts covering five health domains: clinical-epidemiological, normative, administrative, knowledge management and shared knowledge. The system proposes a contribution to the reduction of child mortality in Brazil. Thus, this paper presents studies of an intelligent module that uses Machine Learning to generate child death risk alerts in GISSA. These studies focus on the use of different classification algorithms, with a methodology based on Data Mining, in order to reach a learning model capable of calculating the probability of a newborn dying. The work brings together the public databases SIM and SINASC for the training of the classification algorithms. During the methodological process, a subsampling was made to balance the number of inputs and be fair in the training model results, executed with Matlab scripts.

Introduction

The problem of infant mortality mainly affects the so-called underdeveloped countries [21]. According to the United Nations, the overall rate of child mortality has dropped by 53% in 25 years [22], while in Brazil this reduction has been 77% in the last 22 years [23], probably due to improvements in maternal and child care through programs to support pregnant women, such as the Stork Network, whose goal is to preserve maternal and child health, especially in the first years of life [6]. The State of Ceará had a reduction of 11.5% between 2014 and 2015 [14]. However, these rates are still high compared with developed countries; Norway, for example, presented an infant mortality rate of 2.4% in 2014 [8]. Therefore, more effective strategies are needed to alleviate this problem. GISSA (Intelligent Governance in Health Systems) is a framework to support decision making in health settings. The system is able to generate alerts and administrative reports for managers and health professionals. This work presents LAIS, a mechanism based on machine learning capable of predicting cases of infant mortality to assist managers in decision making. An integration and analysis of the SIM (Mortality Information System) and SINASC (Information System for Live Births) public databases, made available by DATASUS (Department of Information Technology of the SUS), was performed. Thus, the model generated by LAIS is able, from the attributes of the newborn and those of its mother, to classify and calculate the risk of infant death. This paper is organized as follows: Section 1 presents LARIISA and GISSA; Section 2 discusses related work; Section 3 describes the intelligent module based on machine learning, using the pattern recognition methodology; and Section 4 presents conclusions and future work.


1 Theoretical Foundation

1.1 LARIISA project

LARIISA is a platform developed in 2009 [20] with the aim of providing governance intelligence in the five areas of health (clinical and epidemiological, legal, administrative, knowledge management and shared knowledge), helping users (patients, health workers, nurses, doctors, administrators, health secretaries, etc.) in decision making. To do so, it is necessary to manage health-related databases, dispersed on government bases or not, by crossing them with information captured in real time [15].

1.2 GISSA

The GISSA project is an instance of the LARIISA platform, focused on the Stork Network project of the Brazilian Ministry of Health, supported by FINEP (Funding of Studies and Projects) and implemented by Instituto Atlântico. Its purpose is to help decision makers at all levels of the health cycle (patient, health agent, physician, hospital manager, secretary, etc.) with the generation of dashboards and alerts related to maternal and child health issues. A GISSA prototype is operational in Tauá, Ceará, and is being implemented in other municipalities of the State of Ceará. GISSA is therefore composed of a set of components that allows the collection, integration and visualization of information relevant to the decision-making process [3]. Currently, it has the following alerts: live birth with low weight; delayed vaccination; prenatal care; vaccine campaign; among others. In this context, [11] proposes a mechanism based on heuristics capable of calculating the probability of death of newborns using information from different databases for GISSA. However, although it is based on medical knowledge, that work does not perform tests of efficiency or precision, which prevents the evaluation of the mechanism.

2 Related Work

Malnutrition is considered to be a major cause of child mortality in underdeveloped countries. In [17], classification algorithms were used to find patterns related to the nutritional status of children under five years of age. The study aims to identify which factors affect the nutritional status of children. A total of 11,654 cases were treated, with 16 health and socioeconomic attributes, collected from an Ethiopian Health Demographic Survey conducted in 2011. The machine learning algorithms used were the decision tree J48 [25], Naive Bayes [16] and the rule inducer PART [10]. After several experiments, the PART algorithm was selected, as it presented the best performance, with a precision of 92.6% and a Receiver Operating Characteristic (ROC) curve area of 97.8%. In [28], a study on infant death in children under one year of age was performed using Data Mining techniques. The SIM and SINASC databases were used for the municipality of Rio de Janeiro between 2008 and 2012. The integration was done through the field DN (Birth Certificate Number), present in both SINASC and SIM. A total of 3,336 individuals were born and died in the period. In the research, the following 13 attributes were used: sex of the newborn, Apgar1 (5 parameters that are assessed during the first minute of the child's life – heart rate, respiration, muscle tone, irritability and skin color), Apgar5 (the same parameters, evaluated during the fifth minute of the child's life), newborn weight, newborn color, newborn age, basic cause of death, age of the mother, number of dead children, number of live children, number of weeks of gestation, type of pregnancy and type of delivery. The unsupervised algorithm Apriori [1] was used for the investigation of birth characteristics that are associated with death in children under one year of age. At the end of the work, some of the rules provided may assist health professionals. A study of births at the Bega Obstetrics and Gynecology Clinique, Timișoara, Romania, was presented in [27]. A dataset with 2,325 births and 15 attributes was analyzed, where some of the attributes are: mother's age, number of pregnancies, number of weeks of gestation, child's sex, child's weight and type of delivery. The goal of the paper is to predict the child's Apgar score at birth, using the tool


Figure 1: Smart Alert Generation Methodology

WEKA [9] and 10 classification algorithms: Naive Bayes, J48, IBK [2], Random Forest [5], SMO [24], AdaBoost [12], LogitBoost [13], JRip [7], REPTree and SimpleCart [4]. The LogitBoost algorithm presented the best results in the experiments. The generated model was used in a Java application to predict the Apgar score of a new patient. [18] uses Bayesian Networks to support decision making in uncertain environments. A network was developed to classify hypertensive disorders, focused on the care of pre-eclampsia. Using the Bayesian Noisy-OR model on a database, the system analyzes the data layout and classifies the cases in the network. From the symptoms presented by the pregnant woman, the system, through statistical data, infers the severity of the case, helping the doctor specialized in the diagnosis of pre-eclampsia. This approach proved accurate even with a small amount of data. [19] makes a detailed analysis between the Naive Bayes and the decision tree J48 classifiers. The paper analyzes a set of data related to hypertensive disorders to evaluate pregnancy complications. A study of the performance of the classifiers and a Confusion Matrix is done using predictive parameters. The two classifiers present close values; however, the J48 decision tree had a more accurate result.

3 Intelligent Module Based on Machine Learning

To achieve better results in the Data Mining process, the pattern recognition methodology developed at the Centauro Laboratory of the Federal University of Ceará (UFC) was used. This methodology consists of a set of steps performed in the Data Mining process with the objective of selecting the best algorithms and attributes according to the context studied [26]. Figure 1 shows the steps developed in this work: data collection; integration; evaluation and results; and application. First, data is collected from the databases. Then, this data is integrated through the junction between the bases. Subsequently, the algorithms are trained, tested and evaluated according to the appropriate metric, generating a prediction model. Finally, the generated model is tested on a prototype capable of predicting the risk of a newborn coming to death.


3.1 Collection and Preparation of Data

Data were collected from two different public databases: SINASC, which contains information on live births; and SIM, which contains information on mortality, including cases of infant mortality. Both databases are available on the DATASUS portal in DBC (DataBase Container) format. The data refer to the State of Ceará in the years 2013 (SINASC and SIM) and 2014 (SIM). These data were converted to SQL (Structured Query Language) using TABWIN, a software provided by DATASUS for viewing and manipulating public data.

3.1.1 Integration and Selection of Attributes

With the relationship between the SIM and SINASC bases, it is possible to retrieve information about the birth of children who were victims of infant mortality, and thus to distinguish the children who did or did not survive up to one year of age. Each live birth has a unique attribute called the Birth Declaration Number (numerodn), always filled in at the SINASC base. The SIM base also has the field (numerodn), which is filled in only in cases of infant death. This field was essential for the integration of the bases, since it makes it possible to relate the infant mortality data to the birth data. The integration was divided into 4 steps. Step 01: Taking into account that children born in 2013 can be victims of infant mortality in 2014, the bases SIM2013 and SIM2014 were united for children who died at less than 1 year of age. The following is a simplified expression (in relational algebra) of the integration process (Equation 1):

$SIM' \leftarrow \sigma_{idade}((SIM2013) \cup (SIM2014))$   (1)

Step 02: Then, SINASC2013 and SIM' are joined together using the numerodn field. The result returns all child mortality cases that occurred in 2013 or 2014. The following is a simplified expression in relational algebra (Equation 2):

$M \leftarrow \rho_{SN}(SINASC) \bowtie_{SN.numerodn = S.numerodn} \rho_{S}(SIM')$   (2)

Step 03: Next, cases of newborns who did not suffer death were retrieved. Thus, a query was made on SINASC2013, excluding the cases of infant death M. The following is a simplified expression in relational algebra (Equation 3):

$V \leftarrow (SINASC2013 - M)$   (3)

Step 04: Finally, a union of the death cases M and the non-death cases V that occurred in 2013 and 2014 is made. The following is a simplified expression (in relational algebra) (Equation 4):

$ALL \leftarrow (M \cup V)$   (4)

In this stage, labeling was also performed (death YES and non-death NO), as needed in supervised classification problems. The result of the integration is a dataset with 50 attributes, containing information on the birth and death (if it occurred) of children born in 2013. The dataset obtained resulted in 1,182 cases of death and 124,876 cases of children who survived up to one year. 16 of these attributes were selected for the analysis step. The values of these attributes are originally inserted as strings. The Matlab scripts used work with numbers, so it was necessary to convert these strings into numeric values to make calculations. The weight for each category value was determined by the sequence in which it was observed in the data analysis. The values were defined from -1 to 9, where:

• -1 – used for "Campo-em-Branco" (free translation: blank field)
• -0.5 – for "Ignorado" (free translation: ignored)
• 0 – for "Errado" (free translation: wrong)
• 1 to 9 – defined for the other characteristics, considering 1 the smallest death risk and 9 the biggest risk of death
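Steps 01–04 can also be pictured with a small pandas sketch, assuming the DBC bases have been exported to CSV; the file names, the 'idade' filter and the label column are illustrative (in the real SIM base the age field is coded, not a plain number of years).

```python
import pandas as pd

# Hypothetical CSV exports of the DBC files.
sim2013 = pd.read_csv("SIM2013.csv")
sim2014 = pd.read_csv("SIM2014.csv")
sinasc2013 = pd.read_csv("SINASC2013.csv")

# Step 01: union of SIM 2013/2014, restricted to deaths under one year
# of age (the plain 'idade < 1' test is an illustrative simplification).
sim = pd.concat([sim2013, sim2014])
sim = sim[sim["idade"] < 1]

# Step 02: join births to infant deaths on the birth declaration number.
deaths = sinasc2013.merge(sim, on="numerodn", how="inner")

# Step 03: births with no matching death record.
alive = sinasc2013[~sinasc2013["numerodn"].isin(deaths["numerodn"])]

# Step 04: label the two classes and unite them.
deaths = deaths.assign(obito="YES")
alive = alive.assign(obito="NO")
all_cases = pd.concat([deaths, alive], ignore_index=True)
```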


Figure 2: Example of a Test procedure using KFold with CrossValidation - Classified by MLP and KFold = 5

3.2 Analysis and Tests

In order to find the most adequate model for the infant death prediction problem, experiments were performed with the algorithms K-Nearest Neighbors (KNN), Naive Bayes (NB), Support Vector Machines (SVM) and Artificial Neural Networks (ANN), using scripts on a trial version of Matlab. Some of these algorithms are described as follows: (i) K-Nearest Neighbors (KNN): it works by calculating the similarity between the record to be analyzed and the records of the dataset, in order to estimate the class of the record presented as input. When a new record needs to be classified, it is compared to all training data records to identify the k nearest neighbors according to some selected metric, one of the most used being the Euclidean distance, which refers to the distance between two points measured by the straight line that interconnects them (Equation 5).

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$   (5)

(ii) Artificial Neural Networks: in Equation 6, $x_i$ represents the input signals (data on the problem), while the synaptic weights are represented by $w_i$ (responsible for weighting the input signals according to their level of importance), and the summation represents the aggregating function. It also has $\Theta$, the activation threshold, a constant responsible for allowing or not the passage of the signal; $u$ represents the activation potential [29].

$u = \sum_{i=1}^{n} w_i x_i + \Theta$   (6)

For this classifier, a K-Fold cross-validation was adopted (Figure 2), because the stratification allows tests using different parameters and the determination of the best learning rate and number of neurons. (iii) Naive Bayes: the Naive Bayes algorithm is based on probability theory and assumes that the attributes influence the class independently. During model creation, the classifier builds a table showing how much each category of each attribute contributes to each class. In Equation 7, $C$ represents the class and $e = \{A_1 = a_1, \ldots, A_n = a_n\}$ are the attributes of the classes. The tests show that the Naive Bayes classifier is the most suitable for this purpose, presenting good results with an area under the ROC curve of 92.1%.

$P(A_1, \ldots, A_n, C) = P(C) \prod_{i=1}^{n} P(A_i \mid C)$   (7)
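A sketch of the balancing-and-evaluation pipeline with scikit-learn (one of the tools the authors mention for future work), assuming the numerically encoded dataset of Section 3.1; the file and column names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Integrated, numerically encoded dataset from Section 3.1
# (file name and column names are illustrative).
data = pd.read_csv("sinasc_sim_integrated.csv")

# Random subsampling of the majority class, mirroring the
# Spread Subsample step (1,182 instances per class).
dead = data[data["obito"] == 1]
alive = data[data["obito"] == 0].sample(n=len(dead), random_state=42)
balanced = pd.concat([dead, alive])

X = balanced.drop(columns=["obito"])
y = balanced["obito"]

# Cross-validated area under the ROC curve for Naive Bayes.
auc = cross_val_score(GaussianNB(), X, y, cv=10, scoring="roc_auc")
print("ROC AUC: %.3f +/- %.3f" % (auc.mean(), auc.std()))
```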


3.3 Evaluation and Results

In the first experiment, the Naive Bayes algorithm obtained the best results: it presented the highest area value of the ROC curve compared to the others. Table 1 refers to the experiment with balanced data, in which the Spread Subsample algorithm was used to balance the classes of the data sample by means of a random subsampling. This equalizes the number of individuals in the living and dead classes at 1,182 instances for each class. It is noticed that, even with the change in the class sizes, the Naive Bayes algorithm continued to have the best result, as it had the largest area value of the ROC curve.

Algorithms      Precision   Recall   F-Measure   ROC
KNN             0.895       0.830    0.861       0.888
Naive Bayes     0.921       0.809    0.861       0.924
V. Perceptron   0.900       0.838    0.868       0.875
MLPerceptron    0.837       0.821    0.829       0.898

Table 1: Experiment

Because the Naive Bayes algorithm obtained the largest area value of the ROC curve in the experiment, a more detailed study of it was carried out: Table 2 shows the Confusion Matrix of the Naive Bayes algorithm for an analysis of the results. It is verified that Naive Bayes correctly classified 2,056 children (86.912%), corresponding to the main diagonal in the table below (956 + 1,100). Therefore, 308 children (13.028%) were classified incorrectly (the other diagonal: 82 + 226).

                     Predicted Class
                     Dead    Live
Real Class   Dead    956     226
             Live    82      1,100

Table 2: Confusion Matrix - Naive Bayes

Among the 308 children who were misclassified, 82 (3.46%) were false positives and 226 (9.56%) were false negatives. Of the 2,056 children who were correctly classified, 956 (40.44%) are true positives and 1,100 (46.53%) are true negatives. The 956 true positives indicate those who suffered infant death, while the 82 false positives did not suffer infant death but were classified as having died.

3.4 Application

As mentioned before, the objective of this paper is to predict, through the application of classification algorithms to the analysis of the attributes, the chance of a newborn dying. The tools used are useful to search for results based on precision after some training. After this process of analysis and comparison between the algorithms, the most efficient classification algorithm according to the studied domain was chosen. Thus, the Naive Bayes classifier is the one that best fits the analyzed dataset, and the model generated by the algorithm was used to classify the risk of new patients suffering death.

4 Conclusions and Future Work

The proposal presented in this paper adds value to the GISSA alerts, providing them with an intelligent mechanism based on classifiers. Thus, it is able to provide the health manager, in addition to the important alerts already produced, with the probability of death of a newborn from the information


of the pregnant woman and the newborn itself. Therefore, the decision maker may prioritize more urgent cases and, consequently, mitigate the serious problem of infant mortality. As future work, it is intended to apply the methodology used in the present work to the integration of the SINASC and e-SUS databases, as well as to run tests with other tools, such as the R language and scikit-learn (Python), to test the performance of the tools themselves. It is also expected to use classification and heuristics together with an ontology (which is under development by another team) to fit specific classes of problems. This will allow the possibility of developing a hybrid mechanism to be added to GISSA from these experiments.

Acknowledgment

The authors would like to thank PRPI / IFCE and FUNCAP for the support they received via the Program for Productivity in Research, Incentives for Interiorization and Technological Innovation.

References

[1] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases (VLDB), September 12-15, Santiago, Chile, volume 1215, pages 487–499, 1994.
[2] David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-based learning algorithms. Machine Learning, 6(1):37–66, 1991.
[3] L. O. M. Andrade, M. Oliveira, and Ronaldo Ramos. Projeto GISSA: META FÍSICA 3 – atividade 3.1 Definir modelo de inteligência de gestão na saúde. https://amauroboliveira.files.wordpress.com/2015/11/2015-nov30-meta-3-ativ-1-moldelointeligc3aanciagestc3a3o-draf-1-0.pdf, 2015. [Online; accessed 30-September-2016].
[4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees, Wadsworth International Group, Belmont, CA, 1984. Case Description Feature Subset Correct Missed FA Misclass, 1:1–3, 1993.
[5] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[6] Pauline Cristine da Silva Cavalcanti, Garibaldi Dantas Gurgel Junior, Ana Ribeiro de Vasconcelos, and André Vinicius Pires Guerrero. Um modelo da Rede Cegonha, 12 2013.
[7] William W. Cohen. Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, July 9–12, Tahoe City, CA, USA, pages 115–123, 1995.
[8] CIA World Factbook. Noruega taxa de mortalidade infantil. http://www.indexmundi.com/pt/noruega/taxa_de_mortalidade_infantil.html/, last viewed: July 17 2017, 2015.
[9] Eibe Frank, Mark Hall, and Ian Witten. Online appendix for "Data mining: Practical machine learning tools and techniques". Morgan Kaufmann, 5th edition, 2016.
[10] Eibe Frank and Ian H. Witten. Generating accurate rule sets without global optimization. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), pages 144–151, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[11] Renato Freitas, Cleilton Lima, Oton Braga, Gabriel Lopes, Odorico Monteiro, and Mauro Oliveira. Using linked data in the integration of data for maternal and infant death risk of the SUS in the GISSA project. In Proceedings of the 23nd Brazilian Symposium on Multimedia and the Web (WebMedia '17), October 17–20, Gramado, RS, Brazil. ACM, 2017.
[12] Yoav Freund, Robert E. Schapire, et al. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156, 1996.


[13] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2):337–407, 2000.
[14] G1. Taxa de mortalidade infantil no Ceará. http://g1.globo.com/ceara/noticia/2016/08/ceara-reduz-mortalidade-infantil-materna-e-fetal, last viewed: PREENCHER, 2016.
[15] Leonardo M. Gardini, Reinaldo Braga, Jose Bringel, Carina Oliveira, Rossana Andrade, Hervé Martin, Luiz O. M. Andrade, and Mauro Oliveira. Clariisa, a context-aware framework based on geolocation for a health care governance system. In IEEE 15th International Conference on e-Health Networking, Applications & Services (Healthcom), October 9-12, Lisbon, Portugal, pages 334–339. IEEE, 2013.
[16] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, August 18-20, Montreal, QU, Canada, pages 338–345. Morgan Kaufmann Publishers Inc., 1995.
[17] Z. Markos, F. Doyore, M. Yifiru, and J. Haidar. Predicting under nutrition status of under-five children using data mining techniques: The case of 2011 Ethiopian demographic and health survey. J Health Med Inform, 5:152, 2014.
[18] M. W. L. Moreira, J. J. P. C. Rodrigues, A. M. B. Oliveira, R. F. Ramos, and K. Saleem. A preeclampsia diagnosis approach using Bayesian networks. In 2016 IEEE International Conference on Communications (ICC), pages 1–5, May 2016.
[19] M. W. L. Moreira, J. J. P. C. Rodrigues, A. M. B. Oliveira, K. Saleem, and A. Neto. Performance evaluation of predictive classifiers for pregnancy care. In 2016 IEEE Global Communications Conference (GLOBECOM), pages 1–6, Dec 2016.
[20] Mauro Oliveira, Carlos Hairon, Odorico Andrade, Regis Moura, Claude Sicotte, J.-L. Denis, Stenio Fernandes, Jerome Gensel, Jose Bringel, and Hervé Martin. A context-aware framework for health care governance decision-making systems: A model based on the Brazilian digital TV, 2010.
[21] ONU. ONU: Meta global de mortalidade infantil será atingida com atraso de 11 anos. http://www.bbc.com/portuguese/noticias/2014/09/140916_unicef_meta_mortalidade_infantil_rm, last viewed: July 22 2017, 2014.
[22] ONU. ONU afirma que taxa de mortalidade infantil no mundo caiu pela metade em 25 anos. http://www.uai.com.br/app/noticia/saude/2015/09/09/noticias-saude,187094/onu-afirma-que-taxa-de-mortalidade-infantil-no-mundo-caiu-pela-metade, last viewed: July 17 2017, 2015.
[23] ONU. Taxa de mortalidade infantil no Brasil cai 77% em 22 anos, diz ONU. https://istoe.com.br/324257_TAXA+DE+MORTALIDADE+INFANTIL+NO+BRASIL+CAI+77+EM+22+ANOS+DIZ+ONU/, last viewed: June 28 2017, 2015.
[24] John C. Platt. Advances in kernel methods. Chapter: Fast Training of Support Vector Machines Using Sequential Minimal Optimization, pages 185–208. MIT Press, Cambridge, MA, USA, 1999.
[25] J. Ross Quinlan. C4.5: programs for machine learning. Elsevier, 2014.
[26] Ronaldo F. Ramos, César L. C. Mattos, Amauri H. Souza Júnior, Ajalmar R. Rocha Neto, Guilherme A. Barreto, Hélio A. Mazzal, and Márcio O. Mota. Heart diseases prediction using data from health assurance systems, in Models and Methods for Supporting Decision-Making in Human Health and Environment Protection. Nova Publishers, New York, NY, USA, 2016.
[27] Raul Robu and Ștefan Holban. The analysis and classification of birth data. Acta Polytechnica Hungarica, 12(4), 2015.
[28] Cláudio Jesus Rosa. Aplicação de KDD nos dados dos sistemas SIM e SINASC em busca de padrões descritivos de óbito infantil no município do Rio de Janeiro, 2015.
[29] I. N. da Silva, Danilo Hernane Spatti, and Rogério Andrade Flauzino. Redes neurais artificiais para engenharia e ciências aplicadas. São Paulo: Artliber, pages 33–111, 2010.

8

Benchmarking microservices deployment patterns: Virtual Machines vs. Container over Virtual Machine

Ahmed Yakdhane, El Hadi Cherkaoui and Fouad Guenane
{ayakdhane,echerkaoui,fguenane}@beamap.fr

Beamap, a Sopra Steria Group company, 5 place de l'Iris, 92400 Courbevoie, France

Abstract—Microservices architecture is becoming popular and is attracting the attention of many companies that aim to move their infrastructure into the Cloud. Public Cloud providers such as Google Cloud Platform (GCP) and Microsoft Azure address this concern by adding container services to their service catalogs, and containers are widely identified as the natural candidates for deploying microservices-based applications. However, most container layers are not optimized to run on virtual machines: the overhead generated in terms of latency and CPU resources may double when microservices run on top of containers. In this paper we propose a benchmark comparing microservices running on virtual machines against microservices running in containers on top of virtual machines, in order to highlight the impact of the overhead that these kinds of deployments can generate.

Keywords—Microservices, containers, Kubernetes, Public cloud, CaaS, virtualization.

I. INTRODUCTION

Microservices have emerged as a new architectural concept in which distributed applications are divided into small, independently deployable services, each running in its own context. This context is generally described as a process, a communication protocol, and stateless mechanisms. However, there is a lack of a quantitative approach to design, and thus to evaluate, the benefit of microservices applications with respect to traditional deployments such as virtual machines. As a first step of this study, we propose in this paper a set of metrics that may be worth benchmarking for microservices deployments.

Many companies deploy thousands of applications that support vital functions of their business, and they have to ensure that these applications run effectively while minimizing their maintenance and IT operations. In many cases, the easiest way to achieve this is to allocate a unique, separate server to each application. Adopting microservices brings many challenges and decisions, one of the most important being the choice of the right deployment model. Cloud architects address this challenge by re-architecting, or sometimes re-platforming, the applications and the hosting infrastructure. Another question arises when deploying these applications on public cloud providers such as AWS, GCP, or Azure: is it better to follow the trend of container-based virtualization or to stick to a virtual machine deployment?

Microservice-oriented architectures emerged from the need to separate and distribute resources into several independently managed services, and they now have all the maturity necessary to be deployed into production environments [1]. Most large companies and web players, such as Netflix, Airbnb, and Amazon, use these architectures in their deployments because they support agile methodologies, are cloud compatible, and are particularly well adapted to economic change.

Traditional web applications are built using a monolithic architecture [3], which runs nearly all of the system components in a single process; virtual machines are a perfect candidate for deploying such architectures. The microservices architecture, in contrast, is a distributed system of independent, lightweight services: each service has a single purpose and runs in its own memory space. Microservices use messaging or other web-based protocols [4] for inter-process communication, while components in a monolithic deployment communicate through shared memory. Each service can be developed in a different language by a different team of developers, which makes microservices easier to deploy, upgrade, or reuse individually.
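To make this contrast concrete, the sketch below shows what a single-purpose service looks like when it runs in its own process and is reached only through a web protocol rather than shared memory. This is a minimal illustration using only the Python standard library; the service name ("inventory"), the port, and the endpoint are arbitrary choices for the example and are not taken from the paper.

```python
# Minimal single-purpose "microservice": its own process, its own
# memory space, reachable only over HTTP (a web-based protocol).
# Illustrative sketch only; service name and port are placeholders.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One narrow responsibility: report this service's own state.
        body = json.dumps({"service": "inventory", "status": "up"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Each microservice binds its own port and can be deployed,
    # upgraded, or scaled independently of the others.
    HTTPServer(("0.0.0.0", 8080), StatusHandler).serve_forever()
```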


Container-based virtualization brings better and more efficient [2] virtual memory management, allowing its users to fit more applications onto a single server. However, in terms of performance, there is a lack of research and proposals providing software engineers and researchers with baselines, best practices, and repeatable empirical studies involving this new architectural style. Such a benchmark could be useful, for instance, to investigate the outcomes of adopting each of these virtualization technologies and to help make the architectural decision. It could also be interesting for studying the impact of new microservices architectures and programming models. Another usage scenario could be to serve as a reference implementation for developers who are unfamiliar with the microservices style.

The rest of this paper is organized as follows: Section II describes how the benchmark infrastructure is composed and deployed, and illustrates its metrics and infrastructure components. Section III presents the results of the benchmark and highlights the identified metrics. Section IV explains the obtained results. Finally, Section V draws conclusions and future work.

II. DEPLOYING THE BENCHMARK INFRASTRUCTURE

In this section we present the metrics and tools necessary for building the benchmark, describe the different scenarios that have been deployed, and finally present the two Cloud provider platforms used during the benchmark.

A. Metrics, test tools and resources

1) Apache Bench: Apache Bench (ab) is a very popular tool for measuring the performance of HTTP servers. It works by generating a flood of requests to a given URL and reporting comprehensible performance-related metrics. In this paper we are interested in RPS (requests per second) as a metric. RPS measures throughput, which is typically the most important measure; other parameters such as latency can be interesting depending on the application, but in a typical application RPS is the main metric. Apache Bench also runs concurrent requests: a tool parameter defines how many requests are sent to the server at a time.

2) Sysbench: SysBench is a modular, cross-platform, multi-threaded benchmark tool for evaluating the OS parameters that matter for a system running a database under intensive load. The idea of this benchmark suite is to quickly get an impression of system performance. One of the main features of Sysbench is its ability to benchmark an SQL database: it prepares a database, performs a series of test operations, and returns a collection of metrics showing how the database server behaves under a load test. The most important metric we look for in this test is the rate of transactions the database server is capable of performing.

B. Test scenarios

1) Apache web server: In the following, we describe how the virtualized and containerized web servers are deployed and tested. We start by installing the Apache web server on the virtual machines of each of the two cloud providers. To run a containerized Apache server on a Kubernetes cluster, we create a deployment file specifying the parameters and configuration of the Pod; Kubernetes then starts the pod hosting the application. For test purposes, the web server serves a static 3055-byte HTML file. The benchmark tool runs an incremental performance test by increasing the concurrency-level parameter of the Apache benchmark (ab) tool: the HTML file is requested NR times for each concurrency level from CMIN up to CMAX, and all of these parameters can be set by the user.

2) MySQL database server: Following the web serving performance test, we describe how the database performance test is set up and executed. To test the MySQL relational database server's performance, we deploy a MySQL server on all of the resources and write a total of 10000 rows into a test database using the "prepare" function of Sysbench; these rows serve as the data processed by the read/write operations and database transactions. Sysbench then measures the performance of the MySQL server for a number of threads ranging from TMIN to TMAX (see the driver sketch at the end of this section).

C. Benchmark resources: Cloud providers

For the needs of this benchmark, we chose two of the major cloud providers, Google Cloud Platform and Microsoft Azure. These cloud providers offer container services with a fully managed Kubernetes platform; using the same container orchestration platform provides comparable results and avoids variations in the software layers between containers and hardware. On these two cloud providers, we provisioned 1 CPU and 1.75 GB of RAM for the virtual machines and containers running Apache HTTP, and 2 CPUs and 3.5 GB of RAM for the virtual machines and containers running the MySQL database server. All of the cloud benchmark resources were deployed in the "West Europe" zone.
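As an illustration of how such an incremental test could be driven, the sketch below loops ab over increasing concurrency levels and sysbench over increasing thread counts, writing one CSV file per scenario as described above. This is a minimal sketch under stated assumptions, not the authors' actual scripts: the target addresses, the NR/CMIN/CMAX/TMIN/TMAX values, the MySQL user, and the output file names are placeholders, and the sysbench flags follow the classic 0.x syntax (`--test=oltp`, `--num-threads`); newer sysbench releases use `sysbench oltp_read_write --threads=N` instead.

```python
# Minimal benchmark driver sketch (placeholder values throughout).
# Assumes `ab` and `sysbench` are installed, and that the MySQL test
# database has already been populated with `sysbench ... prepare`.
import csv
import re
import subprocess

WEB_URL = "http://10.0.0.4/index.html"        # placeholder web server
NR, CMIN, CMAX, CSTEP = 10000, 10, 200, 10    # placeholder ab parameters
TMIN, TMAX = 1, 64                            # placeholder thread range

def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Scenario 1: Apache, increasing concurrency levels.
with open("apache_results.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["concurrency", "requests_per_second"])
    for c in range(CMIN, CMAX + 1, CSTEP):
        out = run(["ab", "-n", str(NR), "-c", str(c), WEB_URL])
        # ab prints e.g. "Requests per second:    28130.34 [#/sec] (mean)"
        m = re.search(r"Requests per second:\s+([\d.]+)", out)
        w.writerow([c, m.group(1) if m else "NA"])

# Scenario 2: MySQL via sysbench, increasing thread counts.
with open("mysql_results.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["threads", "transactions_per_second"])
    t = TMIN
    while t <= TMAX:
        out = run(["sysbench", "--test=oltp", "--oltp-table-size=10000",
                   "--mysql-host=10.0.0.5", "--mysql-user=bench",
                   "--num-threads=" + str(t), "run"])
        # sysbench prints e.g. "transactions: 10000 (626.20 per sec.)"
        m = re.search(r"transactions:\s+\d+\s+\(([\d.]+) per sec\.\)", out)
        w.writerow([t, m.group(1) if m else "NA"])
        t *= 2  # double the thread count at each step
```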


III. ANALYZING BENCHMARK RESULTS

In the previous section we deployed an Apache web server and a MySQL database server and performed a series of performance tests. In this section we study the results of the benchmark scripts by collecting the different metrics for comparison and analysis.

A. Results

1) Apache server: VM vs. Kubernetes Pod: Upon completion of the previous tests, we collect the resulting CSV files. Here we are interested in how the Apache web server performs under the incremental tests; Figures 1 and 2 illustrate the number of requests per second the web server can handle as the concurrency level increases.

Fig 1: Apache server VM vs Container over VM (GCP)

Fig 2: Apache server VM vs Container over VM (Azure)

Visually, the request rate quickly jumps and then stabilizes as the Apache server reaches the limit of its resources. To get a clearer view of the output, Table 1 gives the maximum and minimum request rates for each of the tests:

              Google (VM)   Google (Pod)   Azure (VM)   Azure (Pod)
Min Req/s         5845.54        4924.55      1749.34       1253.09
Max Req/s        28130.34       14842.51     12932.08       7992.27

Table 1: Apache server VM vs Container on VM, request rate

At the maximum performance points, the containerized Apache server performed 47.2% lower than the server deployed on a virtual machine on Google Cloud Platform, and 38.1% lower on Microsoft Azure. This clearly indicates that an important amount of processing power is lost when running a web server on Kubernetes Pods.

2) MySQL server: VM vs. Kubernetes Pod: In Figures 3 and 4, we study how the MySQL database server behaves under the load of an increasing number of Sysbench threads; this multithreaded database benchmarking tool measures a collection of metrics by performing a number of read and write operations and measuring the rate of transactions per second the server can handle. After running the virtual machine and the container tests, we collect the metrics in CSV format; the figures show how the database server responds to an increasing number of threads in both tests.

Fig 3: MySQL server VM vs Container over VM (GCP)

Fig 4: MySQL server VM vs Container over VM (Azure)

In these figures, we note that for a low number of Sysbench threads the containerized and the VM-deployed database servers perform closely; but as the number of operations grows with the number of Sysbench threads, the load on the MySQL database server quickly rises and the number of transactions per second descends, a clear indicator of system resource overload. Table 2 shows the maximum and minimum transaction rates achieved on each of the deployed servers.

                       Google (VM)   Google (Pod)   Azure (VM)   Azure (Pod)
Min Transactions/s          230.14         200.53        21.05         20.39
Max Transactions/s          626.20         596.98       283.82        195.18

Table 2: MySQL server VM vs Container on VM, transaction rate
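The relative performance drops quoted in this section can be reproduced directly from the maximum rates in Tables 1 and 2; the short snippet below is an illustrative check of that arithmetic, not part of the original benchmark scripts.

```python
# Reproduce the performance-drop percentages from Tables 1 and 2.
def drop(vm, pod):
    """Relative drop (%) of the Pod deployment versus the VM deployment."""
    return 100.0 * (vm - pod) / vm

# Table 1: maximum Apache requests per second
print(f"Apache GCP:   {drop(28130.34, 14842.51):.1f}% drop")  # ~47.2%
print(f"Apache Azure: {drop(12932.08, 7992.27):.1f}% drop")   # ~38.2%

# Table 2: maximum MySQL transactions per second
print(f"MySQL GCP:    {drop(626.20, 596.98):.1f}% drop")      # ~4.7%
print(f"MySQL Azure:  {drop(283.82, 195.18):.1f}% drop")      # ~31.2%
```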


By computing the difference at the maximum performance points, we get a 4.6% performance drop when running the MySQL server on a Kubernetes Pod on Google Container Engine (GKE), and a 31.2% drop when running it on a Kubernetes Pod on Azure Container Service (ACS). Running the performance tests on the different virtualization technologies gave us a quick idea of how the web and database servers performed: we noticed a significant performance difference between the deployment patterns. In the next section, we interpret the results and try to explain the output.

IV. RESULTS INTERPRETATION AND HYPOTHESIS

On Google Container Engine and Azure Container Service, containers (Pods) run on worker nodes (virtual machines). Figure 5 illustrates the layers between the server application and the hardware for both of the virtualization technologies [5] studied in this paper.

Fig 5: Virtual Machines vs. Container-based virtualization deployment

A hypervisor shares the hardware infrastructure, allowing multiple isolated virtual machines to run on the same hardware, so virtualized applications need to access resources through several layers, and every layer adds an amount of latency and overhead as requests are processed. The overhead of the Hyper-V hypervisor, for example, is reported [6] to be between 9% and 12%; in other words, a guest operating system on top of Hyper-V gets access to 88-91% of the available CPU. The memory overhead of Hyper-V was observed to be around 340 MB of RAM. As a result, a virtualized application's performance will be lower than that of the same application running directly on a physical server.

V. CONCLUSION

This paper presented a set of tests conducted for a candidate microservices application benchmark to be used in software architecture design. The proposed scenarios were discussed and illustrated in the context of selecting possible benchmark candidates among two Cloud providers: Google Cloud Platform and Microsoft Azure. Our results highlight the overhead of the virtualization layers, both on virtual machines and on containers running on virtual machines. Applying these conclusions to real-life scenarios, we pointed out the importance of running benchmarks on compute-intensive tasks to measure the loss of performance when running containers on virtual machines. We believe that the proposed work can be useful in selecting the appropriate Cloud provider and microservice architecture.

REFERENCES

[1] Mario Villamizar, Oscar Garcés, et al., "Infrastructure Cost Comparison of Running Web Applications in the Cloud Using AWS Lambda and Monolithic and Microservice Architectures," Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on.
[2] Wes Felter, Alexandre Ferreira, Ram Rajamony, Juan Rubio, "An Updated Performance Comparison of Virtual Machines and Containers," IBM Research, Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on.
[3] Mario Villamizar, Oscar Garcés, et al., "Evaluating the Monolithic and the Microservice Architecture Pattern to Deploy Web Applications in the Cloud," Computing Colombian Conference (10CCC), 2015 10th.
[4] Sam Newman, Building Microservices: Designing Fine-Grained Systems, O'Reilly Media, February 2015.
[5] Miguel G. Xavier, Marcelo V. Neves, Fabio D. Rossi, Tiago C. Ferreto, Timoteo Lange, Cesar A. F. De Rose, "Performance Evaluation of Container-based Virtualization for High Performance Computing Environments," Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil, Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on.
[6] Microsoft, BizTalk Server 2006 R2 Hyper-V Guide, "System Resource Costs of Hyper-V."


Author Index

A
Agoulmine, Nazim 1, 26, 131
Aitidir, Mustapha 1
Alilat, Farid 131
Alves, Renan 115
Andrade, Rossana 11
Asma, Lounis 19
Azurdia, Cesar 106

B
Baliosian, Javier 1, 26
Ben Rejeb, Sonia 26
Bezerra, Arthur 68
Bezerra, Carla 41, 49
Bolufe, Sandy 106
Braga, Oton 68

C
Carvalho, Carlos 11
Cespedes, Sandra 106
Chabbouh, Olfa 26
Chang, Chun-Che 76
Chen, Chien Hsu 76
Cherkaoui, El Hadi 162
Correia, Marcio 11
Costa Dos Santos, Clemilson 33
Coutinho, Emanuel 33, 41, 49, 98, 123

D
De Souza, Jose 41, 49, 98

E
Elmisery, Ahmed 57

F
Farid, Alilat 19
Freitas, Nicodemos 147

G
Gomes, Fábio 68
Gomes, Rafael 115
Guenane, Fouad 162

H
Hafid, Abdelhakim Senhaji 114
Ho, Te-Wei 76

J
José, João 68
Júnior, Joaquim Celestino 115


L
Lai, Feipei 76
Lopes, Vitor 147

M
M. Sampaio, Paulo N. 139
Martins, Joberto 84
Miguel, Constantino Jacob 139
Miranda, Odorico 154
Monteiro, Odorico 68, 147
Moraes, Pedro 84
Moreira Neto, Maurício 41, 49, 98
Moreira, Leonardo 33, 49, 98, 123
Moura, César 68

N
Nazim, Agoulmine 19
Neto, Maurício 33

O
Oliveira, Mauro 68, 147, 154
Ortega, Pablo 106

P
Paillard, Gabriel 33

Q
Quintino, Joyce 154

R
Rayeni, Mehdi 114
Reale, Rafael 84
Roberto Da Silva Oliveira, Matheus 98
Rocha, Emilson 147
Rocha, Leonardo 115

S
Sales, William 41
Santos, Bruno 115
Santos, Italo 123
Santos, Jorge Alves 139
Silva, Cristiano 154

T
Tai, Hao-Chih 76
Tamazirt, Lotfi 131
Tolosana, Rafael 1
Trajano de Lima, Ernesto 33

V
V. Neto, Francisco J. Badaro 139
Viana, David 147
Vieira Albuquerque Neto, Gerson 154
Vélez, Hugo 57


W
Wu, Jin-Ming 76

X
Ximenes, Pablo 11

Y
Yakdhane, Ahmed 162
