Machine Learning in the Internet of Things: a Semantic-Enhanced Approach

Semantic Web 0 (2018) 1, IOS Press

Michele Ruta a,*, Floriano Scioscia a, Giuseppe Loseto a, Agnese Pinto a, and Eugenio Di Sciascio a

a Polytechnic University of Bari, Department of Electrical and Information Engineering, via E. Orabona 4, I-70125, Bari, Italy. E-mail: [email protected]

* Corresponding author. E-mail: [email protected].

Editors: Maria Maleshkova, University of Bonn, Germany; Ruben Verborgh, Ghent University – imec – IDLab, Belgium
Solicited reviews: Simon Mayer, TU Graz & Pro2Future GmbH, Austria; one anonymous reviewer

1570-0844/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved

Abstract. Novel Internet of Things (IoT) applications and services rely on an intelligent understanding of the environment, leveraging data gathered via heterogeneous sensors and micro-devices. Though increasingly effective, Machine Learning (ML) techniques generally do not go beyond classification of events with opaque labels, lacking machine-understandable representation and explanation of taxonomies. This paper proposes a framework for semantic-enhanced data mining on sensor streams, amenable to resource-constrained pervasive contexts. It merges an ontology-based characterization of data distributions with non-standard reasoning for fine-grained event detection. The typical classification problem of ML is treated as a resource discovery problem by exploiting semantic matchmaking. Outputs of classification are endowed with computer-processable descriptions in standard Semantic Web languages, while explanation of matchmaking outcomes motivates confidence in results. A case study on road and traffic analysis validated the proposal and allowed an assessment against state-of-the-art ML algorithms.

Keywords: Semantic Web, Machine Learning, Non-standard Reasoning, Internet of Things

1. Introduction

The Internet of Things (IoT) paradigm is emerging through the widespread adoption of sensing micro- and nano-devices immersed in everyday environments and interconnected in low-power, lossy networks. The number of pervasive devices increases daily, and with it the rate of raw data available for processing and analysis grows exponentially. More than ever, effective methods are needed to treat data streams with the final goal of giving a meaningful interpretation of retrieved information.

The Big Data label has been coined to denote the research and development of data mining techniques and management infrastructures to deal with "volume, velocity, variety and veracity" issues [1] emerging when very large quantities of information materialize and need to be manipulated. Hence, Machine Learning (ML) is adopted to classify raw data and make predictions oriented to decision support and automation. Progress in ML algorithms and optimization goes hand in hand with advances in pervasive technologies and Web-scale data management architectures, so that undeniable benefits have been produced in data analysis. Nevertheless, some non-negligible weaknesses are still evident with respect to the increasing complexity and heterogeneity of pervasive computing scenarios. In particular, the lack of a machine-understandable characterization of outputs is a prominent limit of state-of-the-art ML techniques for possible exploitation in fully autonomic application scenarios.

This paper proposes a framework named MAFALDA (1) (MAtchmaking Features for mAchine Learning Data Analysis), aiming to enhance classical ML analysis on IoT data streams by associating semantic descriptions to information retrieved from the physical world, as opposed to simplistic classification labels. The basic idea is to treat a typical ML classification problem like knowledge-based resource discovery. This process calls for building a logic-based characterization of statistical data distributions and performing fine-grained event detection through non-standard reasoning services for matchmaking [2].

(1) The name alludes to the well-known Quino comic strip, hinting at the shrewd gaze of the character Mafalda, her investigative attitude to life and her curiosity about the world.

The proposal leverages both general theory and technologies of Pervasive Knowledge-Based Systems (PKBS), intended as KBS whose individuals (assertional knowledge) are physically tied to objects disseminated in a given environment, without centralized coordination. Each annotation refers to an ontology providing the conceptualization and vocabulary for the particular knowledge domain, and advanced matchmaking can operate on this metadata stored in sensing and capturing devices. No fixed knowledge bases are needed. In other words, inference tasks are distributed among devices which provide minimal computational capabilities. Stream reasoning techniques provide the groundwork to harness the flow of annotation updates inferred from low-level data, in order to enable proper context-aware capabilities. Along this vision, innovative analysis methods applied to data extracted by inexpensive off-the-shelf sensor devices can provide useful results in event recognition without requiring large computational resources: limits of capturing hardware could be counterbalanced by novel software-side data interpretation approaches.

MAFALDA has been tested and validated in a case study for road and traffic monitoring on a real data set collected for experiments. Results have been compared to classic ML algorithms in order to evaluate the provided performance. The experimental test campaign allows a preliminary assessment of both feasibility and sustainability of the proposed approach.

The remainder of the paper is as follows. Section 2 outlines motivation for the proposal, before discussing in Section 3 both background and state of the art on semantic data mining and ML for the IoT. The MAFALDA framework is presented in Section 4, while Section 5 and Section 6 report on the case study and the experiments, respectively. Conclusion finally closes the paper.

2. Motivation

Motivation for the work derives from the evidence of current limitations of typical IoT scenarios. There, information is gathered through micro-devices attached to common items or deployed in given environments and interconnected wirelessly. Due to their small size, such objects have minimal processing capabilities, small storage and low-throughput communication capabilities. They continuously produce raw data whose volume requires processing by advanced remote infrastructures. Classical ML techniques have been largely used for that, but often representations of detected events are not completely manageable in practical applications: this is mostly due to the difficulty of making descriptions interoperable with respect to shared vocabularies. In addition, ML solutions are usually very much tailored (i.e., trained) to a specific classification problem. In spite of increasing device pervasiveness (miniaturization) and connectivity (interconnection capability), data streams produced at the edge of the network cannot be fully analyzed locally yet. Commonly adopted data mining techniques have two main drawbacks: i) they basically carry out no more than a classification task and ii) their precision increases if they are applied on very large amounts of data, so on-line analyses hardly achieve high performance on typical IoT devices, due to computational and storage requirements. These factors still prevent the possibility of actualizing thinking things, able to make decisions and take actions locally after the sensing stage.

IoT relevance could be enhanced by annotating real-world objects, the data they gather and the environments they are immersed in, with concise, structured and semantically rich descriptions. The combination of the IoT with Semantic Web models and technologies is bringing about the so-called Semantic Web of Things (SWoT) vision, introduced in [3] and developed, e.g., in [4–7]. This paradigm aims to enable novel classes of intelligent applications and services grounded on Knowledge Representation (KR), exploiting semantic-based automatic inferences to derive implicit information from an explicit event and context detection [8]. By associating a machine-understandable, structured description in standard Semantic Web languages, each classification output could have an unambiguous meaning. Furthermore, semantic-based explanation capabilities allow increasing confidence in system outcomes. If pervasive micro-devices are capable of efficient on-board processing on locally retrieved data, they can describe themselves and the context where they are located toward external devices and applications. This enhances interoperability and flexibility and enables autonomicity of pervasive knowledge- [...]

[...]tering. This paper focuses on classification, i.e., the association of an observation (sample) to one of a set of possible categories (classes), e.g., whether an e-mail message is spam or not, based on values of its relevant attributes (features). Classification has therefore a discrete n-ary output.

The implementation of a ML system typically includes training and testing stages, respectively devoted to building a model of the particular problem inductively
