
Journal of Computing and Information Technology - CIT 20, 2012, 3, 167–173, doi:10.2498/cit.1002093

A Framework for Semantic Enrichment of Sensor Data

Alexandra Moraru and Dunja Mladenić

Artificial Intelligence Laboratory, J. Stefan Institute, Ljubljana, Slovenia

The increased interest in sensing the environment in which we live has led to the deployment of thousands of sensors which can measure and report its status. In order to raise the impact that sensor networks can have, improving the usability and accessibility of the measurements they provide is an important step. The problem addressed in this paper is that of enrichment of sensor descriptions and measurements in order to provide richer data, i.e., data containing more meaning. We propose a framework for automating the process of semantically enriching sensor descriptions and measurements with the purpose of improving the usability and accessibility of sensor data.

Keywords: linked data, sensor web

1. Introduction

Sensors are materials or devices which change their (conductive) properties according to a physical stimulus. These sensors can be attached to more complex devices, called sensor nodes, which can have computing and communication capabilities. More and more sensor nodes are embedded into physical objects used in everyday life, ranging from pacemakers and transportation cargos to electrical appliances. Furthermore, communication links can be established between these objects, organized into wired and wireless networks called sensor networks, or sensor webs in the case when web accessibility is provided [1].

Using semantic technologies for enriching sensor descriptions and measurements in scalable and heterogeneous sensor networks is intended as a solution for better interoperability and easier maintenance. Through semantic descriptions it is possible to provide context for sensor networks, which can improve knowledge extraction from sensor data streams and facilitate reasoning capabilities.

Some of the directions adopted for achieving the integration of semantic technologies and Sensor Web are related to linked sensor data [2][3][4], or to semantic annotation and composition of web services [5]. More general directions that can be identified in building the Semantic Sensor Web are:
• Automatically annotate and enrich sensor data, by providing semantic metadata about spatiotemporal and thematic properties.
• Publish annotated sensor data using shared vocabularies and standard schemas, in order to facilitate accessibility and enable sensor discovery.
• Apply reasoning mechanisms on semantically enriched sensor data for solving problems, such as sensor composition, event detection and network management.

The enrichment of data generally refers to adding information, annotation or additional features to the data by means of computation or by pulling information from external sources (e.g., the web). Semantic enrichment of sensor data denotes the process of associating semantic tags to initial sensor descriptions and measurements. These tags represent concepts, properties and relationships from an ontology and are used to describe the metadata associated to sensor data (i.e., measurement capabilities, observed phenomena, spatial properties, etc.) [6].

Making sensor data publicly available enables the development of new and useful applications.

The methods for publishing sensor data can vary from standardized web services, such as OGC's Sensor Observation Service (SOS), to application-specific methods, such as the ones used by web platforms like Pachube¹ or Sensorpedia². However, such methods require prior knowledge of the infrastructures used, whereas publishing semantically annotated sensor data, following the linked data principles, would enable better accessibility. Moreover, when supported for integration with existing knowledge, it would also increase the usability of the published data.

Reasoning, in general, is the process of producing new beliefs from a collection of believed propositions. It is strongly related to the field of logic, and in the context of ontologies the logical formalisms are provided by a family of representation languages known as Description Logics (DL) [7]. Describing sensor data using ontology terms enables reasoning mechanisms that can be used to infer new knowledge for further enrichment of data or to solve complex problems.

We propose a framework for semantic enrichment of sensor descriptions and measurements, with the purposes of automating the process of translating existing sensor descriptions into semantic descriptions and enabling semantic querying over sensor measurements. The primary focus of this work is on the first general direction mentioned above, that of annotating and enriching sensor data using semantic technologies. Next, we take into consideration linked data as a method of publishing annotated sensor data, while the aspects of applying reasoning mechanisms on sensor data are briefly mentioned as possible future directions.

2. Related Work

Sheth et al. [8] propose the Semantic Sensor Web (SSW) as a solution for the problem of "too much data and not enough knowledge" that appeared with the rapid development of sensor networks. In their view, the SSW represents semantically annotated sensor data with spatial, temporal and thematic metadata, thereby facilitating advanced query and reasoning. The Resource Description Framework in Attributes (RDFa) format is adopted as an annotation language for two demonstrative applications that also use several Sensor Web Enablement standards. Moreover, rule-based reasoning is applied for determining specific weather conditions, such as freezing or blizzard. The idea of semantic annotation is taken further by Wei and Barnaghi [9], who use Linked Open Data (LOD) resources for annotation, which brings access to knowledge already represented and eliminates the risk of creating redundant data.

A recent trend for making sensor descriptions and measurements available on the Web is to publish them on the LOD cloud. The advantages and challenges of Linked Sensor Data are discussed by Keßler and Janowicz [4] as a solution for better sensor data accessibility without introducing very high complexity. The paper stresses the importance of finding the appropriate links between different datasets from LOD and proposes a semiautomatic way of generating them.

Other research in the direction of publishing sensor data as linked data includes [2] and [3]. Patni et al. [2] were the first to publish a large dataset of sensor descriptions and measurements, by first representing it in the Observations and Measurements (O&M) standard and then converting it to the Resource Description Framework (RDF) format. The linked sensor data uses a sensor ontology schema based on the concepts from O&M, and the external links are made only for the location attribute, using the Geonames dataset. Barnaghi and Presser [3] propose a platform for publishing linked sensor data following the four principles proposed by Berners-Lee [10]. The platform offers an interface for publishing linked sensor data without requiring a related technical background from its user. However, the user is requested to manually enter relevant keywords that describe the sensors in order to obtain a list of suggested concepts from on-line repositories.

In our work we propose methods for automating the translation of simple sensor descriptions into semantic sensor descriptions and for processing sensor measurements for extracting

more meaningful values for the properties observed. We adopt a strategy of analysis of sensor measurements before building the semantic representations and we use tools capable of dealing with large amounts of sensor data.

1 http://www.pachube.com/
2 http://www.sensorpedia.com/

3. Semantic Enrichment of Sensor Data

The requirements for building the SSW refer to knowledge representation, description languages and semantic reasoners.

A fundamental definition of knowledge representation is given by Davis et al. [11] as "a surrogate, a substitute for the thing itself", seen as a model. One of the categories of knowledge representation appropriate for the model required is represented by ontologies.

A detailed survey of semantic specification of sensor networks is provided in [12], where eleven sensor network ontologies are analyzed. The ontologies developed for modeling sensor networks have a set of common concepts related to the taxonomy of different types of sensors, physical properties of sensor devices, data acquisition and sensed domain. However, the features of the sensed domain may vary, depending on the application where the sensor network is used, and further development of this set of concepts is required. Moreover, none of the ontologies analyzed in the survey provides means to fully describe all the concepts which might be required in a real-life scenario. They can, however, provide a ground to which further extensions can be added to meet the specific domain requirements.

A classification in different layers of ontologies that are used to describe the sensor network domain can be found in [13]. Gray et al. suggest four layers of ontologies:
• upper layer, comprising upper-level ontologies used for the interoperability between other ontologies.
• infrastructure layer, describing the information required for the infrastructure (i.e., sensor network deployment, services provided by the infrastructure, metadata about sensor streams).
• external layer, representing concepts which are not directly related to the sensor domain, such as geographical information.
• domain layer, defining the domain concepts related to a specific scenario where the sensor networks are used (e.g., floods, landslides, oil spills, etc.).

The description languages are used in representing ontologies. These are named ontology languages and are included in the larger family of formal languages. Ontology languages encode the domain knowledge and the rules used for reasoning on that knowledge.

Two of the most commonly used languages for knowledge representation are Web Ontology Language (OWL) and RDF. Both languages are W3C standardized and different versions are defined, presenting varying levels of expressivity. RDF is a data model based on subject-predicate-object triples and commonly uses XML for specifying its syntax. RDF Schema introduces semantics to an RDF data model; it describes concepts, such as classes, properties of classes and hierarchies of these. However, RDF and RDF Schema support a limited number of semantic primitives. The advantage of OWL is better expressivity, but sometimes with higher costs regarding efficiency and reasoning capabilities.

Baader et al. [14] presented Description Logics (DL) for ontology languages, as they can provide both well-defined semantics and powerful reasoning tools. DL models concepts, roles and individuals, and the relationships between them are expressed through axioms. Based on the axioms stated in DL, new relationships can be inferred using a reasoning engine that can "deduce implicit knowledge from the explicit represented knowledge" [14]. Therefore, a reasoning engine (or semantic reasoner) is a system able to draw conclusions or to infer logical consequences by applying logic rules to a set of facts or hypotheses.

For the SSW domain, we classified the existing reasoners in three categories: distributed or large-scale reasoners, normal-scale reasoners and reasoners for resource-constrained devices.

In the first category we refer to distributed platforms that are able to process large amounts of data, usually Web data. The systems, existing or under development, that must be mentioned are Marvin [15] and LarKC [16]. While the first one uses a divide-conquer-swap strategy that assures massive scalability and is able to eventually reach completeness, LarKC accepts incomplete reasoning in exchange for lower computational cost, being intended for massive heterogeneous information.

The second category covers reasoners that can normally run on a simple desktop machine and are meant for not very large ontologies, for domain-specific problems where complete reasoning is required. They could be used in relatively small sensor networks. A few examples of the existing reasoners that we considered for this category are Pellet³, RacerPro⁴, FaCT++⁵ and Cyc⁶ (the reasoning component).

For the last category we consider the reasoners that can run on resource-constrained devices, such as sensor nodes. These types of reasoners are useful in large sensor networks, where a centralized system will no longer perform well. Currently, there are some prototype implementations, one of which is based on a method for automatically composing a reasoner for the needs of particular applications [17].

3.1. Linked Data

Publishing information on the Web has already changed once with the Web's evolution. If we consider Web 1.0, the data was in a static form and the interaction between the user and the published data was mostly read-only. The evolution to Web 2.0 put the user as the central actor for generating information, through blogs or social media sites, resulting in a read-write interaction with the data. Web 3.0, also referred to as the Semantic Web, uses semantic description languages such as RDF and OWL to provide a formal description of data and knowledge. Publishing information for the Semantic Web can be done via object representations described in RDF or OWL, using structured vocabularies in the form of ontologies, or as documents annotated with formal metadata describing the content of the document (using annotation languages such as RDFa).

The principles of linked data have been defined by Tim Berners-Lee in [10] as follows:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.

These principles have been largely adopted in recent years by the Linked Open Data (LOD) community and several methods for publishing data following these principles have been documented [18]. Therefore, one can say that publishing the related information that describes the real world object in HTML and RDF/XML representations is straightforward once it is available.

4. Conceptual Framework

The problem that the proposed framework addresses is that of semantic enrichment of sensor descriptions and measurements with the purpose of enabling sensor discovery for better accessibility and processing of sensor data.

The enrichment of data generally refers to adding information, annotation or additional features to the data by means of computation or by pulling information from external sources (e.g., the web). One example of enrichment by computation is to generate features, such as headache likelihood, based on the barometric pressure values and their variation. An example of enrichment by pulling data from the Web is adding tweets about the weather generated around the time the values have been measured. The enriched data is then usually further processed, instead of processing the original data only.

The conceptual framework illustrated in Figure 1 is defined by the following components:
• Sensor Descriptions and Measurements
• Ontology Collection
• Enrichment Components
• Semantic Repository of Sensor Data
• Data Consumers

3 Pellet, http://clarkparsia.com/pellet/
4 RacerPro, https://www.racer-systems.com/
5 FaCT++, http://owl.man.ac.uk/factplusplus/
6 Cyc, http://www.cyc.com/

Figure 1. Conceptual framework. Illustration of the main components constituting the proposed framework.
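One possible way to read Figure 1 as code is sketched below: a small mapping stands in for the Ontology Collection, a function plays the role of an Enrichment Component that translates a raw sensor description into subject-predicate-object triples, an in-memory set of triples stands in for the Semantic Repository of Sensor Data, and a pattern query acts as a Data Consumer. All namespaces, class names, function names and the input dictionary format are hypothetical; the paper does not prescribe an implementation, and a real system would more likely use an RDF store and ontologies selected for the application domain.

```python
# Hypothetical namespaces; a real deployment would use URIs from the
# ontologies selected for the application domain.
SENSOR_NS = "http://example.org/sensor/"
ONT_NS = "http://example.org/ontology#"

# Stand-in for the Ontology Collection: raw description terms -> concepts.
CONCEPTS = {
    "temperature": ONT_NS + "Temperature",
    "pressure": ONT_NS + "AtmosphericPressure",
}


class SemanticRepository:
    """Tiny in-memory stand-in for the Semantic Repository of Sensor Data."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        return {
            (s, p, o)
            for (s, p, o) in self.triples
            if subject in (None, s)
            and predicate in (None, p)
            and obj in (None, o)
        }


def enrich_description(description, repository):
    """Enrichment component: translate a raw description into triples.

    Returns the observed-property terms that had no matching concept.
    """
    sensor_uri = SENSOR_NS + description["id"]
    repository.add(sensor_uri, ONT_NS + "type", ONT_NS + "Sensor")
    unmapped = []
    for term in description.get("observes", []):
        concept = CONCEPTS.get(term.lower())
        if concept is None:
            unmapped.append(term)  # candidate for extending the ontology
        else:
            repository.add(sensor_uri, ONT_NS + "observes", concept)
    return unmapped


def sensors_observing(repository, term):
    """Data consumer: list the sensors that observe the given property."""
    hits = repository.match(predicate=ONT_NS + "observes", obj=CONCEPTS[term])
    return sorted(subject for subject, _, _ in hits)
```

Terms returned as unmapped correspond to the case where the selected ontologies need to be extended with domain-specific concepts before the description can be fully enriched.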

The two components to start the framework with are the Sensor Descriptions and Measurements and an Ontology Collection.

The Sensor Descriptions contain the metadata defining the sensor characteristics. The process of generating the metadata can follow a manual or an automatic approach. Manually generating metadata involves engineers aware of the sensor characteristics, who can describe the sensor metadata following a predefined schema (e.g., XML Schema). An automatic approach for generating sensor metadata assumes that the sensor nodes would have the capabilities of describing themselves by sending their characteristics encoded in messages to a server. In this case the physical capabilities of sensor nodes are to be considered, such as memory constraints or power consumption for transmitting metadata messages. The metadata generated by either of the two methods, manually or automatically, are usually stored in databases.

The Sensor Measurements contain numerical values quantifying the changes of sensor properties and can be accessed using traditional database methods or streaming-based approaches.

The Ontology Collection consists of a set of ontologies necessary for describing sensor characteristics and providing context for sensor measurements.

The main process in the framework is run by the Enrichment Components, where the sensor descriptions are enriched with semantic concepts and the sensor measurements are processed to generate new features enriched with semantic concepts. The main steps of the enrichment process are:
• Analysis of the sensor descriptions and measurements for identifying the associated semantic concepts.
• Selection of the most appropriate ontologies out of the existing ontologies.
• Extension of the selected ontologies with the concepts specific to the domain of application. This mainly implies particularization of the observed properties and features of interest.
• Implementation of enrichment components, which are software programs that parse the sensor descriptions and measurements, extract the required metadata, and translate it with the associated semantics into a formalized language for semantic representation.

The result of the Enrichment Components involved in the framework is a Semantic Repository of Sensor Data (SRSD), which contains the enriched sensor descriptions and measurements.

The data from the SRSD can be consumed by different Data Consumers, such as query end-points, semantic browsers and inference engines. The query end-points and the semantic browsers provide simple means for searching and browsing through the SRSD, supporting its representation format. The inference engines are represented by different semantic reasoners, able to infer conclusions based on the existing facts stored in the SRSD by applying the rules defined by the ontology. A common application in which

inference engines are useful is that of virtual sensor composition, but we can also mention anomaly detection of sensor observations and sensor network management.

The conceptual components of the proposed framework can be instantiated by various implementations. As part of a master thesis [19], a deep analysis of a set of implementations has been performed together with a discussion of their advantages for different real-life scenarios.

5. Conclusions

Semantic technologies can improve the interoperability and accessibility of sensor descriptions and measurements by semantically enriching them. The context that semantic annotation provides for sensor data improves knowledge extraction and enables the development of new applications. This paper proposes a framework for enrichment of sensor descriptions and measurements that provides means for automating the process of semantically describing and publishing sensor data.

6. Acknowledgments

This work was supported by the Slovenian Research Agency and the IST Programme of the EC under PASCAL2 (IST-NoE-216886), ENVISION (IST-2009-249120) and PlanetData (IST-NoE-257641).

References

[1] A SEA OF SENSORS. The Economist, 2010. http://www.economist.com/node/17388356 [01/15/2012]

[2] H. PATNI, C. HENSON, A. SHETH, Linked sensor data. In 2010 International Symposium on Collaborative Technologies and Systems, (2010 May 17-21), pp. 362–370. Chicago, USA.

[3] P. BARNAGHI, M. PRESSER, Publishing Linked Sensor Data. In Proceedings of the 3rd International Workshop on Semantic Sensor Networks, (2010 November).

[4] C. KEßLER, K. JANOWICZ, Linking Sensor data – Why, to What and How? In Proceedings of the 3rd International Workshop on Semantic Sensor Networks, (2010 November).

[5] ENVISION EU PROJECT, http://www.envision-project.eu/ [01/15/2012]

[6] R. HULL, D. JENKINS, A. MCCUTCHEN, Semantic Enrichment and Fusion of Multi-Intelligence Data. Modus Operandi, Inc., 2009. http://www.modusoperandi.com/downloads/ Semantic Enrichment and Fusion of Multi INT Data White Paper.pdf [01/15/2012]

[7] R. BRACHMAN, H. J. LEVESQUE, Knowledge Representation and Reasoning. Elsevier, 2004.

[8] A. SHETH, C. HENSON, S. S. SAHOO, Semantic Sensor Web. IEEE Internet Computing, 12(4) (2008), 78–83.

[9] W. WEI, P. BARNAGHI, Semantic Annotation and Reasoning for Sensor Data. In Proceedings of the 4th European Conference on Smart Sensing and Context, (2009).

[10] T. BERNERS-LEE, Design Issues – Linked Data. http://www.w3.org/DesignIssues/LinkedData.html [01/15/2012]

[11] R. DAVIS, H. SHROBE, P. SZOLOVITS, What is Knowledge Representation? AI Magazine, 14(1) (1993), 17–33.

[12] M. COMPTON ET AL., A Survey of the Semantic Specification of Sensors. In Proceedings of the 2nd International Workshop on Semantic Sensor Networks, (2009 October).

[13] A. GRAY ET AL., A Semantically Enabled Service Architecture for Mashups over Streaming and Stored Data. In Proceedings of Extended Semantic Web Conference 2011, Part II, LNCS 6644, (2011), 300–314.

[14] F. BAADER, I. HORROCKS, U. SATTLER, Description Logics as Ontology Languages for the Semantic Web. Mechanizing Mathematical Reasoning, (2005), 228–248.

[15] E. OREN, S. KOTOULAS, G. ANADIOTIS, R. SIEBES, A. TEN TEIJE, F. VAN HARMELEN, Marvin: A platform for large-scale analysis of Semantic Web data. In Proceedings of International Conference, (2009).

[16] LARKC: THE LARGE KNOWLEDGE COLLIDER. http://www.larkc.eu [01/15/2012]

[17] W. TAI, R. BRENNAN, J. KEENEY, D. O'SULLIVAN, An automatically composable OWL reasoner for resource constrained devices. In Proceedings of the 2009 IEEE International Conference, (2009), Washington, DC, USA.

[18] T. HEATH, C. BIZER, Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, 2011. ISBN 978-1608454303.

[19] A. MORARU, Enrichment of Sensor Descriptions and Measurements Using Semantic Technologies. Master Thesis, Jožef Stefan International Postgraduate School, Ljubljana, 2011.

Received: June, 2012
Accepted: August, 2012

Contact addresses:

Alexandra Moraru
Artificial Intelligence Laboratory
J. Stefan Institute
Jamova 39
Ljubljana
Slovenia
e-mail: [email protected]

Dunja Mladenić
Artificial Intelligence Laboratory
J. Stefan Institute
Jamova 39
Ljubljana
Slovenia
e-mail: [email protected]

ALEXANDRA MORARU is a PhD student at the J. Stefan International Postgraduate School in the Information and Communication Technologies program. She received her BSc in Computer Science from the Technical University of Cluj-Napoca in 2009 and her MSc in Information and Communication Technology from the J. Stefan International Postgraduate School in 2011. She started her collaboration with the J. Stefan Institute in 2008, with a two-month internship program, and has been a student there since 2009. Her general research interests are in the area of semantic web and semantic technologies, more specifically, the applicability of semantics in sensor networks.

DUNJA MLADENIĆ is an expert on the study and development of machine learning, data/text mining and semantic technology techniques and their application to real-world problems. She has been associated with the J. Stefan Institute since 1987, first as a student and since 1992 as a researcher. She has led the Artificial Intelligence Laboratory of the J. Stefan Institute since 2011. She received her MSc and PhD in Computer Science from the University of Ljubljana in 1995 and 1998 respectively. She was a visiting researcher at the School of Computer Science, Carnegie Mellon University, USA, in 1996-1997 and in 2000-2001. She has published papers in refereed conferences and journals, served on the program committees of international conferences and organized international events in the area of text mining, link analysis and data mining. She is co-editor of several books including "Data Mining and Decision Support: Integration and Collaboration", Kluwer Academic Publishers 2003, "Semantic Knowledge Management: Integrating Ontology Management, Knowledge Discovery, and Human Language Technologies", Springer 2008, "Web Mining: from Web to Semantic Web", Springer 2004, "Semantics, Web and Mining", Springer 2006, "From Web to Social Web: discovering and deploying user and content profiles", Springer 2007, and "Knowledge Discovery Enhanced with Semantic and Social Information", Springer 2009.