Consuming Linked Open Data via Standard Web Widgets
DIPLOMARBEIT
zur Erlangung des akademischen Grades
Diplom-Ingenieurin
im Rahmen des Studiums
Business Informatics
eingereicht von
Irina Pershina Matrikelnummer 1127738
an der Fakultät für Informatik der Technischen Universität Wien
Betreuung: o.Univ.-Prof. Dipl.-Ing. Dr.techn. A Min Tjoa Mitwirkung: Univ.Ass. Dipl.-Ing. Dr.rer.soc.oec. Amin Anjomshoaa
Wien, 23.04.2014 (Unterschrift Verfasserin) (Unterschrift Betreuung)
Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at
Consuming Linked Open Data via Standard Web Widgets
MASTER’S THESIS
submitted in partial fulfillment of the requirements for the degree of
Diplom-Ingenieurin
in
Business Informatics
by
Irina Pershina Registration Number 1127738
to the Faculty of Informatics at the Vienna University of Technology
Advisor: o.Univ.-Prof. Dipl.-Ing. Dr.techn. A Min Tjoa Assistance: Univ.Ass. Dipl.-Ing. Dr.rer.soc.oec. Amin Anjomshoaa
Vienna, 23.04.2014 (Signature of Author) (Signature of Advisor)
Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at
Erklärung zur Verfassung der Arbeit
Irina Pershina Kohlgasse 49/15, 1050
Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit - einschließlich Tabellen, Karten und Abbildungen -, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.
(Ort, Datum) (Unterschrift Verfasserin)
Acknowledgements
I would like to express my very great appreciation to my supervisors, Univ.-Prof. A Min Tjoa and Univ.Ass. Dr. Amin Anjomshoaa, for their valuable and constructive suggestions and useful critiques during the process of writing this master thesis. Their willingness to give their time so generously has been very much appreciated. To my family, who has always been my support in every stage of my life. I am especially grateful to my parents, who supported me emotionally and financially. I would also like to thank my colleagues Peter Wetz, Dat Trinh Tuan, and Lam Ba Do from the Linked Data Lab, and Lucas Gerrand and Raffael Prätterhoffer from the Business Informatics Master program for their listening, patience, and support during the last ten months. I greatly enjoyed the collaboration and knowledge exchange with them.
Abstract
The Semantic Web describes a concept for storing, sharing, and retrieving information on the Web by adding machine-readable meta information that conveys the meaning of the data. Linked Open Data is publicly available structured data which is stored and modelled according to Semantic Web standards and interlinked with other Open Data. The Linked Open Data cloud comprises Linked Data sources and has been growing significantly in recent years. Complementary to this, mashups allow non-professional users to access, consume, and analyze data from various sources. The basic component of a mashup is a widget that can access certain datasets, process data, and provide additional functionality. Mashups can partly handle Linked Data consumption for knowledge workers. The main challenges are finding the appropriate widget as the number of available widgets increases, categorizing and finding widgets with similar functionality, and adding provenance information to widgets. The primary purpose of this thesis is to design a semantic model for a mashup platform that enables (i) publishing of widget information on the Linked Open Data Cloud, (ii) widget discovery, (iii) widget composition, and (iv) smart data consumption based on the semantic model. Additionally, the semantic model should provide provenance information in order to supply additional information about the origin and authenticity of data and to increase trust in data resources. During this research work, existing approaches applicable to Semantic Web Service description have been compared, and Semantic Web Service description techniques have been evaluated concerning their application in the area of Web Widgets. Requirements for the semantic model are derived from a literature review and complemented with requirements for mashup systems. Finally, the semantic widget model is implemented in a mashup prototype to demonstrate its usability.
Kurzfassung
Das Semantische Web beschreibt ein Konzept zu Informationsspeicherung, -austausch und -abruf im Web durch Hinzufügen maschinenlesbarer Metainformation. Ziel ist es, Daten eine Bedeutung zu geben. Zu diesem Konzept zählt auch Linked Open Data. Dabei handelt es sich um Daten, die der Resource Description Framework Spezifikation entsprechend modelliert und gespeichert sind. Zudem sind diese Daten öffentlich verfügbar und miteinander verknüpft. Die Linked Open Data Cloud beinhaltet alle bedeutenden Linked Data Quellen und befindet sich seit den letzten Jahren in einem ständigen Wachstum. Ergänzend dazu ermöglichen Mashups nicht fachkundigen Anwendern Zugang zu Konsum und Analyse von Linked Data. Die Grundkomponente eines Mashups ist ein Widget. Dieses kann auf bestimmte Datensätze zugreifen, Daten verarbeiten und zusätzliche Funktionen zur Verfügung stellen. Bis dato können Mashups die vorhandenen Probleme für Wissensarbeiter, die mit der Verwendung von Linked Data zusammenhängen, nur teilweise lösen. Die größten Herausforderungen sind es, passende Widgets zu finden, während die Anzahl verfügbarer Widgets steigt, Widgets mit ähnlichen Funktionalitäten zu kategorisieren und zu finden, und Informationen über die Herkunft und Vertrauenswürdigkeit von Daten hinzuzufügen. Der Hauptzweck dieser Masterarbeit ist die Entwicklung eines semantischen Modells für eine Mashup-Plattform. Das Modell ermöglicht (i) die automatische Veröffentlichung der Widgetinformation in die Linked Open Data Cloud, (ii) Widget Auffindung, (iii) Widget Zusammensetzung und (iv) smarte Anwendung von Daten basierend auf semantischen Modellen. Zusätzlich soll das Modell Informationen über die Herkunft von Daten beinhalten. Während meiner Forschung evaluierte ich Ähnlichkeiten und Unterschiede zwischen Web Widgets und Semantic Web Services, verglich existierende Ansätze zu Semantic Web Service Beschreibungen und evaluierte Semantic Web Service Beschreibungstechniken, die für eine Anwendung im Bereich Web Widgets relevant sind. Anforderungen an das Modell werden von vorhandener Literatur abgeleitet und mit den Anforderungen an Mashupsysteme ergänzt. Abschließend wird das semantische Widget Modell mittels eines Prototyps implementiert, um dessen praktische Nutzbarkeit zu demonstrieren.
Contents
1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Structure of the Thesis
2 Web of Data
  2.1 Web 2.0
  2.2 Web 3.0
  2.3 Resource Description Framework (RDF)
  2.4 Web Ontology Language (OWL)
  2.5 SPARQL. Query Language for RDF
  2.6 Linked Open Data
  2.7 Overview of Linked Data Endpoints
    2.7.1 DBPedia
    2.7.2 Linked Movie Data Base
  2.8 Widgets & Mashups
  2.9 Schema.org
  2.10 Semantic Web Services
3 State of the Art
  3.1 Applications
    3.1.1 Overview of existing applications
    3.1.2 Yahoo!Pipes
    3.1.3 DERI Pipes
    3.1.4 BIO2RDF
    3.1.5 LOD2
  3.2 Semantic Description Approaches
    3.2.1 Web Services Description Language (WSDL)
    3.2.2 Semantic Annotation for Web Services Description Language (SAWSDL)
    3.2.3 Semantic Markup for Web Services (OWL-S)
    3.2.4 Web Service Modeling Ontology (WSMO)
    3.2.5 WSMO-Lite
    3.2.6 RESTdesc semantic description
    3.2.7 SA-REST
    3.2.8 EXPRESS
    3.2.9 Linked Open Services (LOS)
    3.2.10 Linked Data Services (LIDS)
    3.2.11 Data-Fu
    3.2.12 Karma
    3.2.13 RDB to RDF Mapping Language (R2RML)
  3.3 Summary
4 Solution
  4.1 Definition of requirements
  4.2 Use and Extension of Karma Approach
  4.3 Widget Model
  4.4 DCAT
  4.5 Provenance
5 Results and Evaluation
  5.1 Resulting Semantic Model
  5.2 Semantic Model Use cases
    5.2.1 Publishing examples
    5.2.2 Discovery examples
    5.2.3 Composition examples
    5.2.4 Smart Data Consumption
  5.3 Result evaluation
6 Conclusion and Future Work
  6.1 Research Summary
  6.2 Research Limitation
  6.3 Future Work
7 Appendix
  7.1 Acronyms
  7.2 Widget Semantic Model
  7.3 Semantic Models in Top Braid Composer
Bibliography
CHAPTER 1 Introduction
1.1 Motivation
The Web is a phenomenon which has changed the modern era of communication and enterprise networks. The idea was originally conceived 25 years ago by Tim Berners-Lee and Robert Cailliau [15]. The main goals of the project were: to provide a protocol for requesting and exchanging information over networks; to provide a method of reading information; to provide search mechanisms; and to provide a collection of documents [15]. The documents were presented by a list of references, so-called hyperlinks, to other text sources over the Internet. In general, the Web is based on the following technologies:
• documents written in Hypertext Markup Language (HTML)1, the language that “was primarily designed as a language for semantically describing scientific documents“ [67],
• Uniform Resource Locator (URL) references to a resource, consisting of “a naming scheme specifier followed by a string whose format is a function of the naming scheme“, and Uniform Resource Identifier (URI), “a compact sequence of characters that identifies an abstract or physical resource“ [1], i.e. the names of Web resources,
• Hypertext Transfer Protocol (HTTP), a protocol for “distributed, collaborative, hypermedia information systems“ [64].
With the widespread use of the Web we saw the next stage in this evolution, the so-called “Read-Write“ Web, or Web 2.0, where information can be distributed. The term includes social communities, services, and a corresponding set of technologies. Examples of Web 2.0 are blogs, Web applications, wikis, social networking sites, and mashups. The value of the information that organizations and people put onto the Internet started to increase. This progress had an increasingly significant influence on decision-making processes [27]. Furthermore, information became an essential factor for management. The main focus of Web 1.0 and Web 2.0 was
1http://www.w3.org/TR/html/
content generation and content representation. The immense amount of content created problems for data storage, due to the inability to provide a solid structure, comprehensive meaning, and interchangeability through machine-understandable formats. Aiming to solve these challenges, a new generation called the Semantic Web emerged. It “provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries“ [2]. The main focus of the Semantic Web is the interlinking of data from various sources, whereas the original Web mainly focused on the interlinking of Web documents. Additionally, the meaning of data becomes machine-understandable. Fundamentally, the Semantic Web is based on the concept of metadata, which provides data descriptions; ontologies, which describe hierarchies and relationships between data; and reasoning, which allows for the logical derivation of new facts. Furthermore, Semantic Web technologies provide effective querying of large data sets. Data on the Web should have a standard format and be interlinkable in order to generate relationships among data. The “collection of interrelated datasets on the Web“ [2] is called Linked Data. To publish and connect data on the Web it is important to follow a set of best practices and principles, the so-called Linked Data principles [17]. These include the use of URIs for entity identification and the use of Semantic Web standards for data description (RDF2), data querying (SPARQL3), and data interlinking.
1.2 Problem Statement
In spite of the fact that the quantity of Linked Data is continually increasing, there are still many research challenges. For instance, creating and publishing Linked Data, trust and provenance of Linked Data, user interaction and usability, and natural language interfaces are still relevant research issues. In addition to the aforementioned challenges, there is a lack of successful applications that offer people who are not necessarily from a professional background access to Linked Data. Additionally, making complex queries, data analysis, data enrichment, visualization, and the aggregation of data from various Linked Data sources in a feasible manner is still cumbersome. Semantic technologies like SPARQL4, RDF5 and OWL6 plus good programming skills are usually needed to process Linked Data. One solution for the challenge mentioned above is the use of mashups, which are “user-driven micro-integration of Web-accessible data“ [21]. Using mashups allows users to avoid redundancy, while enabling easy and cost-effective implementation and integration of software components in applications at the same time [60]. Furthermore, they provide features such as reusability, easy implementation, the possibility to combine widgets, and the consumption of data from different Linked Open Data (LOD, publicly available Linked Data) sources.
2http://www.w3.org/RDF/ 3http://www.w3.org/TR/rdf-sparql-query/ 4http://www.w3.org/TR/rdf-sparql-query/ 5http://www.w3.org/RDF/ 6http://www.w3.org/TR/owl-features/
The basic component of a mashup is a widget, which is an application with limited functionality. Each widget fulfills a simple task, and widgets can be linked to each other to enable new, more complex tasks. Widgets either have access to datasets from different sources or can process data. Widgets can have input/output terminals that define the type of data the widget can process and return. Additionally, widgets can include options to control the way they process data. The user can also wire widgets together to create a mashup. Figure 1.1 presents the typical graphical interface of a mashup platform, called Yahoo!Pipes7. High usability obviously has an impact on an application’s success. The mashup platform can support widget development, so a certain growth in the number of available widgets can be expected. Widgets can process data from different fields like finance, population, transport, etc. First, it is not possible for users to rapidly learn to work with the system: to find a required widget, the user has to check all categories of widgets. Secondly, the user does not know the source of the information that widgets provide. Thirdly, the user does not know how to find widgets that can be combined in order to create new knowledge. Therefore, it is important to provide a means to solve the following problems:
• Publishing: Make Linked Widget information available on the LOD Cloud.
• Discovery: Search for widgets that contain a specific kind of semantic relation.
• Composition: Search for widgets that can consume a specific dataset or produce the required output data.
• Smart data consumption based on the semantic model: Selection of the required input from the provided context data.
7http://pipes.yahoo.com/pipes/
Figure 1.1: Yahoo Pipes User Interface
For example, a set of locations is used as input information that is then processed by a widget. After processing, the widget returns a set of movies. Still, some facts remain unclear: Does the provided location describe where the film is running, or does it describe where the author or producer was born? Even though it is possible to add some human-readable information in order to clarify a widget’s meaning, this still proves to be a problem for the machine, as it does not understand the human-readable information. Another problem addressed by this master thesis refers to data quality and trustworthiness. Due to the fact that mashups process data from various sources, it is difficult to define the origin of the data. Additionally, information on the Web is often inconsistent or questionable, and therefore people often make trust judgments based on the authorship of information. With the fast growth of Linked Data, provenance information becomes a factor that influences the success of new Semantic Web applications8, especially of a mashup platform. Provenance includes information about the origin and ownership of datasets, change tracking, and access. The proposed solution of this thesis aims at solving this challenge by creating a semantic model. This is due to the widgets having access to various types and formats of data. The semantic model should provide a description of the data that a widget accesses via its input and output, the relationships between the data behind them, and provenance information. The goals of this thesis can be summarized as follows:
• Evaluation of similarities and differences between Web Widgets and Semantic Web Ser- vices.
• Comparison of existing approaches applicable for Semantic Web Service description.
• Evaluation of Semantic Web Service description techniques regarding application in the area of Web Widgets.
• Defining the widget semantically in order to enable service composition and search, and to publish the widget description as a part of the LOD cloud.
• Defining a semantic model that can be used to select the required input widget or data from the provided context data.
• Semantic model extension that provides provenance information.
• Semantic widget model implementation.
The following research questions are derived from the master thesis goals: 1) Is it possible to apply semantic service description languages for widget description? 2) How can the semantic model be extended to support data flow and data streams? 3) How can this semantic model be integrated with a mashup environment?
8http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
1.3 Structure of the Thesis
Chapter 2 provides an overview of the basic principles and concepts of the Semantic Web (like RDF, OWL, and SPARQL), Linked Open Data, and Semantic Web Services. A part of this chapter also describes two Linked Data endpoints that are used for the examples mentioned in the chapters of this master thesis. Chapter 3 explores the state of the art in mashups and semantic description approaches. It consists of two parts. The first part describes existing mashup platforms (semantic and non-semantic) that are comparable with the approach of Linked Widgets. The second part follows with detailed descriptions of semantic service description methodologies. Besides the description of each methodology, it includes the advantages and disadvantages of each approach. Chapter 4 presents the Linked Widgets approach and the requirements for the developed Linked Widget Model, which includes the description of input/output graphs as well as information about provenance by using DCAT and PROV-O. Chapter 5 follows with evaluations and use cases. The ontology for widget description and examples of the ontology implementation in TopBraid Composer9 are provided in the Appendix.
9http://www.topquadrant.com/tools/IDE-topbraid-composer-maestro-edition/
CHAPTER 2 Web of Data
To understand the Semantic Web it is important to begin with defining the World Wide Web (WWW, 3W or the Web), the purposes of the Web, and its principles. “The Web is a system of interlinked documents that run over the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia and navigates between them using hyperlinks“1. The first step in the Web evolution was Web 1.0, a collection of static web pages, or the “Read-Only“ Web. The user had the possibility to search for information on the Internet and to publish information on his/her web pages, but it was not possible to interact with other users and distribute the information. For example, in the e-commerce sector web pages were presented like catalogs, the goal of which was to show information about products to customers. In the era of Web 1.0 the interaction between users was insufficient. Therefore the appearance of Web 2.0 was predictable. It started in 1999 with the emergence of systems like LiveJournal2 and Blogger3.
2.1 Web 2.0
With the emergence of Web 2.0, or the “Read-Write“ Web, users of the Internet gained instruments for communication, information sharing, and advertising, like social networks, wikis, data feeds, online markets, blog platforms, e-conferences, etc. It is also characterized by dynamic user-generated content in the form of information postings (e.g. photos, videos, text). Now even a non-technical user can use Web 2.0 to share information, communicate with other users, etc. The most popular platforms are Twitter4, Flickr5, Facebook6, Youtube7, etc.
1http://en.wikipedia.org/wiki/World_Wide_Web 2http://www.livejournal.com/ 3http://www.blogger.com/ 4http://www.twitter.com/ 5http://www.flickr.com/ 6http://www.facebook.com/ 7http://www.youtube.com/
With the widespread adoption and penetration of the web into consumer and commercial interests, a new phenomenon of the social and interactive web sprang up. This stage is commonly called Web 2.0 and is characterized by a wide variety of ways in which websites and software can be developed and designed. These web pages are often also characterized by personal profiles (such as birthday, contact and location information), connections between users, groups of users, RSS feeds in the form of links, and public APIs. Web technologies allow users to include contents from other web pages in their own pages. For example, Youtube.com usually shares the code of a video so that it can be embedded into another web page. “Richer applications make more extensive use of more recently opened APIs“ 8. Web 2.0 is also characterized by the use of AJAX (asynchronous JavaScript and XML). AJAX is a key technology in Web 2.0 for creating asynchronous web applications “without interfering with the display and behavior of the existing page“ [4]. AJAX is a set of technologies that realize data exchange between a client and a server and integrate it into the Web page presentation. AJAX is often used in combination with HTML and CSS. The retrieval of data goes through the XMLHttpRequest (XHR) object, although other formats (like JSON, HTML, text) can be used. The goal of AJAX is to let scripts send and receive data via the HTTP methods PUT, DELETE, GET, HEAD, POST, and OPTIONS (like an HTTP client). AJAX can interact with the server by sending a request in order to get data, and at the same time a whole reload of the web page is not needed. There are some alternatives to AJAX available, e.g. Flash9, Microsoft WPF/E10, XBAP11. The most important achievement of Web 2.0 is the open API. Programmers can access a set of modules via open APIs without direct programming in the source code, create mashups (cf. Chapter 2.8) from different data sources, and integrate data. Regardless of the fact that there is an information overflow, efficient search remains complex. For example [5], somebody is planning to attend some conferences in a city. The conferences take place in different locations. He/she wants to book some hotels near the conferences. To do this the user has to take a look at more than one web page in order to find the needed information, and the result will not be perfect because the data about hotels and conference locations are not really connected. One solution for this problem is to program an application using programming languages like Java12 and Python13, or query languages like XPath14 for transforming HTML documents (in this example, information from the booking system and the conference web page). A second solution can be the use of open APIs. But the use of open APIs costs extra time and adds complexity to integrating the data because of the lack of links between the data. Web 2.0 moved static Web pages to a more dynamic and interactive level. Through Web 2.0 a large amount of information has been collected and made widely available. The problem of Web 1.0 and Web 2.0 is that they provide human-readable content, linked via URLs15 [37], without machine-readable and understandable logic.
8http://firstmonday.org/ojs/index.php/fm/article/view/2125/1972 9http://www.adobe.com/at/products/flashruntimes.html 10http://msdn.microsoft.com/de-de/library/aa970060(v=vs.110).aspx 11http://www.xbap.org/ 12http://www.java.com 13http://www.python.org 14http://www.w3.org/standards/techs/xpath 15http://en.wikipedia.org/wiki/Uniform_resource_locator
Figure 2.1: The Web & the Semantic Web
2.2 Web 3.0
The next generation of the Web is the Semantic Web or Web of Data (the so-called Web 3.0) [9]. Tim Berners-Lee defined the Semantic Web as “an extension of the original Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation“ [14]. The differences to the traditional Web are depicted in Table 2.1:
• While the traditional Web consists of content which is attractive for the user, with nicely structured content and an interface, the Semantic Web provides machine-readable content.
• Not web pages, but the data behind the web pages are connected. The links indicate the location and meaning of the data.
• It is possible to create logical statements.
Feature | Web | Semantic Web
Fundamental component | Unstructured content | Formal statements
Primary audience | Humans | Applications
Links | Indicate location | Indicate location and meaning
Primary vocabulary | Formatting instructions | Semantics and logic
Logic | Informal/nonstandard | Description logic
Table 2.1: Comparison of Web and Semantic Web. Source: [37]
To clarify the difference between data representation in the Web and the Semantic Web, an example is illustrated in Figure 2.1. In the case of the Web, web pages are connected via hyperlinks. A search engine finds movies or actors according to keywords like movie, actor, city, etc. The model does not represent the data behind the web pages. In the case of the Semantic Web, the machine is able to read and interpret the data behind the web page. For example, the movie has a title and it has a relation to actors. There are different techniques to add and recognize structured content. Web content can be automatically generated from relational databases, and this helps search engines to interpret the data [83]. To achieve the goal of adding structure to the data, Microformats16 or RDFa can be applied. Microformats is a vocabulary which describes the data within the web page and extracts semi-structured information [44]. The meta information can be added into (X)HTML code. The code below (Listing 2.1) is an example of the use of an open microformat standard named hCard17. The hCard is used to add information about persons and organisations to web contents. The root class name is vcard18. In this case the properties fn (first and last name), org (organization name), email (email) and url (link to a web page) are used to add semantics to the existing (X)HTML code (cf. Listing 2.1).
<div class="vcard">
  <span class="fn">Irina Pershina</span>
  <div class="org">Vienna University of Technology</div>
  <a class="email" href="mailto:contact@example.org">contact@example.org</a>
  <a class="url" href="http://www.example.org/">www.example.org</a>
</div>
Listing 2.1: Example of the hCard microformat
The information about the person is machine-readable; an application can retrieve the information directly from the web page where the format is used. An alternative to Microformats for data interchange on the Web is a more generic language named Resource Description Framework in Attributes (RDFa) [65], a serialization format [42] for semantic inclusion in (X)HTML code. It supports embedding any type of data [83]. Microformats and RDFa have a common feature: they are focused on the addition of meta information into (X)HTML. There is also another technique to add machine-readable (structured) data: Linked Data. Compared to Microformats and RDFa, it gives the possibility to publish data as Linked Data into the Semantic Web. It represents a graph or set of entities that are connected using RDF and URIs [52]. In 2000 Tim Berners-Lee introduced 7 layers of the Semantic Web (cf. Figure 2.2):
• IRI/Unicode. IRI is the Unique Internationalized Resource Identifier for the Semantic Web. Unicode is a global encoding standard that includes characters for various languages and mathematical formulas.
• XML - the language for structured content creation.
16http://microformats.org/ 17http://microformats.org/wiki/hcard 18http://microformats.org/wiki/hcard
Figure 2.2: The layers of the Semantic Web
• RDF/RDF-Schema. RDF is data format for creating statements in triple form (subject, object, and predicate). RDF Schema is used for hierarchies creations.
• Ontology. Web Ontology Language (OWL) is an extension of RDFS that includes con- structs for semantic description (e.g. cardinality, transitivity).
• Logic. Reasoning within the logic layer.
• Proof. Result verification.
• Trust. The derived statements should be verified and the resources identified (they should come from trusted sources).
“The main idea of the Semantic Web is to support a distributed Web at the level of the data rather than at the level of the presentation“ [5]. The better the data are structured, the easier it is for search engines or applications to read and interpret the content of web pages [83]. The goal is to extend the existing Internet and computer tools in order to get machine-readable information and add semantics (meaning) to the data.
2.3 Resource Description Framework (RDF)
RDF is a metamodel for knowledge representation and fact expression. The structure of an RDF expression is a collection of triples. A triple expresses a fact that is represented as a relation between two nodes of the graph (things). The triple consists of (cf. Figure 2.3) [65]:
Figure 2.3: The triple
• The subject (an RDF URI reference or a blank node).
• The predicate (an RDF URI reference).
• The object (an RDF URI reference, a literal or a blank node).
Generally, each subject, object and predicate is an RDF URI reference. But there are some exceptions:
• If a subject is not a URI, it is an anonymous resource (with a local identifier instead of a URI). It is also called a blank node, and it can be used in one or more RDF statements. For data with a complicated structure it is recommended to use blank nodes, as a blank node ties together some elements of an entity and adds different relations. For example, an address consists of a street, city, house number, etc.
• The object can be either an RDF URI reference, an anonymous resource, or a literal. There are two kinds of literals: plain literals, being a string optionally with a language tag, and typed literals, which carry a datatype URI taken from XML Schema.
The predicate is a URI; it depicts a relation between two things or an attribute of the subject, having the object as a value. Returning to the example depicted in Figure 2.1, the class :movie is a subject, the class a:actor is an object, and the relationship :starring is a property. The class :movie has a title :hasTitle, stored as a string. The value of the object is a plain literal.
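Written out as RDF, this example corresponds to the following Turtle statements. The sketch assumes an illustrative ex: namespace and resource names, which are not part of the original figure:

@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A relation between two things: the movie stars an actor.
ex:Movie1 ex:starring ex:Actor1 .

# An attribute of the subject; the object is a plain literal.
ex:Movie1 ex:hasTitle "Life, or Something Like It" .

# A typed literal would carry a datatype URI from XML Schema, e.g.:
# ex:Movie1 ex:hasTitle "Life, or Something Like It"^^xsd:string .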
RDF NOTATION
Like XML, RDF has two types of notation: graphical notations and serialized notations. The graphical notations look like directed labelled graphs and show how RDF triples are connected. The nodes of the graph are things (objects and subjects) and the arcs are predicates. Figure 2.4 depicts a simple example of an RDF graph and represents the following sentence: Angelina Jolie is starring in the movie “Life, or Something Like It“ and she is the author of “Notes from My Travels“, which has 213 pages. The parts of the sentence are presented in Table 2.2. The nodes (ovals) in the graph are resources that can be either objects or subjects. The objects and the subjects are connected via directed arrows. The direction of the arrow goes from the subject to the object.
Figure 2.4: The RDF graph
Element | Value
Subject (Resource) | http://dbpedia.org/page/Angelina_Jolie
Predicate (Property) | http://dbpedia.org/property/starringof
Object (Resource) | http://dbpedia.org/resource/Life,_or_Something_Like_It
Predicate (Property) | http://dbpedia.org/property/authorof
Object (Resource) | http://dbpedia.org/page/Notes_from_My_Travels
Subject (Resource) | http://dbpedia.org/page/Notes_from_My_Travels
Predicate (Property) | http://dbpedia.org/property/pages
Object (Literal) | 213
Table 2.2: The parts of the sentence
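Rendered in one of the serializations introduced below (Turtle), the graph of Figure 2.4 corresponds to the following statements; the dbp: prefix is assumed here as an abbreviation for the property namespace shown in the table:

@prefix dbp: <http://dbpedia.org/property/> .

<http://dbpedia.org/page/Angelina_Jolie>
    dbp:starringof <http://dbpedia.org/resource/Life,_or_Something_Like_It> ;
    dbp:authorof   <http://dbpedia.org/page/Notes_from_My_Travels> .

<http://dbpedia.org/page/Notes_from_My_Travels>
    dbp:pages 213 .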
The serialized notations are syntaxes for RDF: N-Triples19, Turtle20, N321, RDF/XML22, N-Quads23, TriG24, TriX25. Table 2.3 shows a comparison of these syntaxes.
19http://www.w3.org/2001/sw/RDFCore/ntriples 20http://www.w3.org/TeamSubmission/turtle 21http://www.w3.org/TeamSubmission/n3 22http://www.w3.org/TR/REC-rdf-syntax 23http://www.w3.org/TR/2013/WD-n-quads-20130905 24http://www.w3.org/TR/2013/WD-trig-20130409 25http://www.w3.org/2004/03/trix
Table 2.3: Syntax comparison of the RDF serializations
RDF AND RDF SCHEMA VOCABULARY
RDF Schema provides a data-modelling vocabulary for RDF data [23].
Classes
• rdfs:Resource. All things are instances of this class. “rdfs:Resource is a subclass of rdfs:Class“ [23].
• rdfs:Class defines a resource as a class.
• rdfs:Literal presents literal values [23] like string and integer.
• rdfs:Datatype is a subclass of rdfs:Literal that presents the datatypes.
• rdf:XMLLiteral is “an instance of rdfs:Datatype and a subclass of rdfs:Literal“ [23].
• rdf:Property depicts the relationships between classes and is an instance of the class rdfs:Class.
Properties
• rdfs:range “is used to state that the values of a property are instances of one or more classes“.
• rdfs:domain is used for subject definition of a triple.
• rdf:type sets the assignment of a resource to a class.
• rdfs:subClassOf depicts that a class is a subclass of another class.
• rdfs:subPropertyOf depicts that a property is a subproperty of another property.
• rdfs:label “is an instance of rdf:Property that may be used to provide a human- readable version of a resource’s name“ [23].
• rdfs:comment provides a description of the resource.
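A minimal sketch of how these classes and properties combine, reusing the movie example (the ex: namespace and all names in it are assumptions for illustration):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:Movie rdf:type rdfs:Class ;
    rdfs:label   "Movie" ;
    rdfs:comment "A motion picture." .

ex:Action rdf:type rdfs:Class ;
    rdfs:subClassOf ex:Movie .

ex:starring rdf:type rdf:Property ;
    rdfs:domain ex:Movie ;   # the subject of a starring triple is a movie
    rdfs:range  ex:Actor .   # its values are instances of ex:Actor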
Benefits of using RDF data model
The main benefits are that [36]:
• By using HTTP URIs as globally unique identifiers for data items as well as for vocabu- lary terms, the RDF data model is inherently designed for being used at global scale and enables anybody to refer to anything.
• Clients can look up any URI in an RDF graph explored locally to retrieve additional information.
• The data model enables you to set RDF links between data from different sources.
• Information from different sources can easily be combined by merging the two sets of triples into a single graph.
• RDF allows one to represent information that is expressed using different schemata in a single graph, meaning that you can mix terms from different vocabularies to represent data.
• Combined with schema languages such as RDF-Schema and OWL, the data model allows the use of as much or as little structure as desired, meaning that tightly structured data as well as semi-structured data can be represented. A short introduction to RDF Schema and OWL is also given in this Chapter.
Disadvantages of using RDF data model:
• Introduction of redundancy.
• Limited processing speed. Frequently, data from relational databases can be retrieved faster.
2.4 Web Ontology Language (OWL)
In a knowledge-based system, an ontology represents a vocabulary for knowledge representation [71]. The vocabulary defines the set of objects or entities, and the relationships between them, which present a knowledge domain. An ontology language means a declarative language for knowledge encoding or knowledge representation. Such languages also support knowledge reasoning and rule declaration. The elements of an ontology are classes and properties. A class means a group of entities [71]; a property describes relationships. An important point is the focus on relationships. The Web Ontology Language (OWL) is an extension of RDF and a semantic markup language, derived from the DAML+OIL Web Ontology Language26 [73] and based on Description Logic with some additional features [69]; its purpose is to share, author, and publish ontologies on the WWW. In comparison to traditional Description Logic, OWL partially uses different terms (cf. Table 2.4). For the overview of the ontology elements the following namespaces, groups of identifiers, are defined (cf. Listing 2.2).
xmlns:owl ="http://www.w3.org/2002/07/owl#"
xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd ="http://www.w3.org/2001/XMLSchema#"
Listing 2.2: Namespaces
1. owl: - OWL namespace (vocabulary for OWL).
26http://www.w3.org/Submission/2001/12
OWL | DL
class name | concept name
class | concept
object property name | role name
object property | role
ontology | knowledge base
axiom | axiom
vocabulary | vocabulary/signature
Table 2.4: OWL and DL. Source: [69]
2. rdf: - RDF syntax.
3. rdfs: - RDF Schema syntax.
An OWL class represents a set of entities that share common features or characteristics; properties are used for the description of individuals [71] [37]. A member of a class is an individual. The main properties of a class are object properties and datatype properties. According to the W3C recommendation, the value of an object property is an individual and the value of a datatype property is a literal. The definition of a class is realized via owl:Class, and the instances of a class are specified via rdf:type. For example, Movie is a class and Harry Potter is an instance of the class Movie (cf. Listing 2.3).
@prefix ex: <http://example.org/> .

ex:Movie rdf:type owl:Class .
ex:HarryPotter rdf:type ex:Movie .
Listing 2.3: Example of a description in OWL
It is possible to add taxonomic relationships to OWL by using the rdfs:subClassOf property. The distinction between relations to an instance and to a subclass is that an instance represents an individual of a class, while a subclass represents a subset of its members [37]. For example (cf. Listing 2.4), there are some types of movies in an ontology: action, adventure and biography. The class ex:Movie represents the set of all movies, which is further subdivided into the subclasses ex:Action, ex:Adventure and ex:Biography. The individuals ex:EndOfDays and ex:Divergent are members of the subclasses ex:Action and ex:Adventure.
@prefix ex: <http://example.org/> .

ex:Movie rdf:type owl:Class .

ex:Action rdf:type owl:Class ;
    rdfs:subClassOf ex:Movie .
ex:Adventure rdf:type owl:Class ;
    rdfs:subClassOf ex:Movie .
ex:Biography rdf:type owl:Class ;
    rdfs:subClassOf ex:Movie .

ex:EndOfDays rdf:type ex:Action .
ex:Divergent rdf:type ex:Adventure .
Listing 2.4: Example of a description in OWL
OWL has two major classes: owl:Thing and owl:Nothing. The resource owl:Thing represents the set of all individuals, and owl:Nothing represents the empty class without members [37]. OWL properties are used to show the relationships between resources [71]. The two main kinds of properties are owl:ObjectProperty for relationships between two individuals and owl:DatatypeProperty for relationships between an individual and a literal [37]. For example, the title of a movie is a string value, therefore it is modelled as a datatype property. The property ex:starring is an object property because it shows the relationship between an actor and a movie.
@prefix ex: <http://example.org/> .

ex:hasTitle rdf:type owl:DatatypeProperty .
ex:starring rdf:type owl:ObjectProperty .
Listing 2.5: Example of a description in OWL
The next step is to define which domain and which range the properties have. The rdfs:domain property defines the subject of a triple and the rdfs:range property defines the object of a triple. Returning to the example about movies, a movie has a title; in OWL semantics this means that the domain of the property ex:hasTitle is the class ex:Movie and its range is string (xsd:string).
...

ex:hasTitle rdfs:domain ex:Movie .
ex:hasTitle rdfs:range xsd:string .
ex:starring rdfs:domain ex:Movie .
ex:starring rdfs:range ex:Star .
Listing 2.6: Example of a description in OWL
Properties, like classes, can have subproperties. This is defined via rdfs:subPropertyOf. For example, the properties ex:hasShortDescription and ex:hasLongDescription are specializations of the property ex:hasDescription.
...

ex:hasShortDescription rdf:type owl:DatatypeProperty ;
    rdfs:subPropertyOf ex:hasDescription .
Listing 2.7: Example of a description in OWL
As already explained, a property always has a direction, from subject to object. Sometimes it is important to indicate that an inverse relationship also exists. OWL uses the property owl:inverseOf for expressing inverse relationships. According to the example, ex:starringOf is the inverse property of ex:starring; a short sketch of this is given at the end of this section. There are some different variants of OWL that differ in complexity, expressiveness, and the needs of concrete users:
1. OWL Lite. The simplest version of OWL designed for user which primarily needing a classification hierarchy and simple constraints (W3C).
2. OWL DL. DL means Description Logic, formal knowledge representation language. The language includes some restrictions like type separation that provides reasoning.
3. OWL Full. OWL Full offers the user full capability in expressiveness and is a pure extension of RDF [37]. The main disadvantage is that it is undecidable.
Each variant of OWL represents an extension of the previous one: OWL DL is an extension of OWL Lite, and OWL Full is an extension of OWL DL.
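Returning to the inverse-property example mentioned above, a minimal Turtle sketch (using the illustrative ex: names from this chapter):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/> .

ex:starringOf owl:inverseOf ex:starring .

# From the asserted triple   ex:Movie1 ex:starring ex:Actor1 .
# a reasoner can now derive  ex:Actor1 ex:starringOf ex:Movie1 .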
2.5 SPARQL. Query Language for RDF
SPARQL is a W3C recommendation that was introduced for retrieving data stored in RDF format [73]. SPARQL supports only reading; for writing data, SPARQL Update27 is used. According to the W3C recommendation, the following terms are used in SPARQL [73]:
• IRI. Resource ID, includes URIs and URLs.
• RDF graph. A set of RDF triples; its nodes are the subjects and objects of the triples.
• Lexical form, “being a Unicode string, which should be in Normal Form C“ [32].
• Plain literals “have a lexical form and optionally a language tag as defined by RFC-3066 28 and normalized to lowercase“.
• Language tags - tags for language identification, defined by RFC-3066.
• Typed literals “have a lexical form and a datatype URI being an RDF URI reference“.
• Datatype IRI - an “RDF URI reference“.
• Blank node - an anonymous resource.
There are four different query forms that SPARQL uses:
27http://www.w3.org/TR/sparql11-update 28http://www.isi.edu/in-notes/rfc3066.txt
• SELECT query. The basic command for reading facts according to some graph pattern; it returns the subset of variables bound in the SPARQL query.
• CONSTRUCT query. The CONSTRUCT query form returns a single RDF graph specified by a graph template. “The result is an RDF graph formed by taking each query solution in the solution sequence, substituting for the variables in the graph template, and combining the triples into a single RDF graph by set union“ [73].
• ASK query. A simple question to a SPARQL endpoint; it returns a true/false result.
• DESCRIBE query. Used to read an RDF graph (it returns all facts about the matched resources).
Listing 2.8 presents an example SPARQL SELECT query to DBPedia. The query returns a list of films (dbpedia-owl:Film) and their corresponding books, specified by the dbpedia-owl:basedOn property, with a gross income greater than 390000000 dollars.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?film ?book
WHERE {
  ?film a dbpedia-owl:Film .
  ?film dbpedia-owl:basedOn ?book .
  ?film dbpedia-owl:gross ?gross .
  FILTER (?gross > 390000000)
}
Listing 2.8: Example of a SPARQL SELECT query to DBPedia
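For comparison, a minimal sketch of an ASK query over the same data; it merely tests whether at least one such film exists and returns true or false:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

ASK {
  ?film a dbpedia-owl:Film .
  ?film dbpedia-owl:gross ?gross .
  FILTER (?gross > 390000000)
}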
Figure 2.5: Evolution of the web. Source: [10]
2.6 Linked Open Data
The Web can be seen as a huge database. The problem of today’s Web is that the data are not really connected. Therefore an effective search over the data is often hard. The idea of connecting the data over the Web is not new. The approach was introduced by Tim Berners-Lee, director of the World Wide Web Consortium, more than 20 years ago; now it is becoming more popular, but there are still some complexities. Figure 2.5 depicts the history of data on the web. The evolution has four steps: documents on the web, Web of Documents (linked pages), Data on the Web (Open Data, not linked) and Web of Data (Linked Data). The last step is Linked Data. “Linked Data refers to a set of best practices for publishing and connecting structured data on the Web“ [17]. Berners-Lee outlined four basic principles [17]:
1. Use URIs to denote things.
2. Use HTTP URIs so that people can look up those names over the Web.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
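A minimal Turtle sketch of these principles in practice (both the film URI and the owl:sameAs target are illustrative): the thing is named by a dereferenceable HTTP URI, described with RDF standards, and linked via an RDF link to a URI in another dataset:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

<http://data.linkedmdb.org/resource/film/1>
    rdfs:label "An example film" ;
    owl:sameAs <http://dbpedia.org/resource/Example_Film> .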
According to these principles we can start to see the advantages that Linked Data brings. The first principle recommends using URIs to identify things like real-world objects or concepts, for example animals, people, places, etc. These things can have some properties, such as name, color, and description, and some relations to other objects. “In the classic Web, HTTP URIs are used to combine globally unique identification with a simple, well-understood retrieval mechanism“ [36]. A URI is a globally unique resource identifier which, according to the second principle, is used for the identification of things over the Web.
Figure 2.6: Linked Open Data Cloud
In order to facilitate data processing for different data on the Web, it is necessary to find a common format. HTML is the dominant document format for pages on the web [36]. The W3C recommends the use of RDF as a common format for Linked Data. According to the Linked Data principles, the things on the Web are linked with each other via RDF links (in comparison to the classic Web, where web pages are linked via hyperlinks). For example, a link between an actor and a film, or a hotel and its location. The linking of the data empowers the retrieval of distributed information from different resources. An additional advantage of Linked Data is the facility to build the links between data over the existing Web architecture. The Web of Data has properties similar to the classic Web [36]:
• The Web of Data is generic and can contain any type of data.
• Anyone can publish data to the Web of Data.
• The Web of Data is able to represent disagreement and contradictory information about an entity.
• Entities are connected by RDF links, creating a global data graph that spans data sources and enables the discovery of new data sources. This means that applications do not have to be implemented against a fixed set of data sources, but they can discover new data sources at run-time by following RDF links.
• Data publishers are not constrained in their choice of vocabularies which represent data.
Figure 2.7: The 5 star scheme. Source: [54]
• Data is self-describing. If an application consuming Linked Data encounters data described with an unfamiliar vocabulary, the application can dereference the URIs that identify the vocabulary terms in order to find their definition.
• The use of HTTP as a standardized data access mechanism and RDF as a standardized data model simplifies data access compared to Web APIs, which rely on heterogeneous data models and access interfaces.
Data on the Web can be characterized using the “five-star rating scheme“ (cf. Figure 2.7). The criteria are as follows:
• 1 Star. Data on the web, in any format (e.g., PDF, an image scan), with an open licence available.
• 2 Star. Structured Data, machine-readable formats (e.g., excel).
• 3 Star. Non-proprietary format (e.g., CSV).
• 4 Star. Use of URIs to identify things, open standards from the W3C, possibility to link the things.
• 5 Star. Linked content from different resources, using the Linked Data principles.
A large amount of structured data has been posted on the Web. The result of this is the Linked Open Data Cloud (cf. Figure 2.6), which is highly interlinked and forms a very extensive graph. The graph consists of billions of triples stored in RDF format from different sources. The datasets cover many topics like media, geography, publications, government, and various others. A technical overview of Linked Open Data is presented by the Linked Open Data Puzzle (cf. Figure 2.8). The stack shows which technology should be used for working with LOD.
Figure 2.8: Linked Open Data Puzzle. Source: [10]
The LOD documents are stored on WWW (HTTP) servers. URIs are required for identifying resources. Additionally, vocabularies are used for the description of nouns, while ontologies add relationships between them. As per the diagram, SPARQL gives the ability to query the data. Finally, the applications that can consume and produce Linked Data are mashups and search engines. Linked Data presents a new way to organize information on the web and in organizations because of its flexible and expressive standards. Linked Data connects data from different sectors. The crucial point is the adaptation of Linked Data to the enterprise level. This includes better techniques for publishing and consuming the data, as well as better usability and easier learning for working with Linked Data. Due to this problem, the main focus of the Linked Widget approach is to increase the level of usability, to make the work with Linked Data more intuitive and understandable, and to give organizations the ability to combine their internal data with the Linked Open Data Cloud.
2.7 Overview of Linked Data Endpoints
In the following chapters of this thesis I will present some examples of semantic service description with the use of various semantic description approaches.
Figure 2.9: Overview of DBPedia components. Source: [18]
The semantic web services will process Linked Data taken from Linked Data endpoints like DBPedia30 and Linked Movie Data Base31.
2.7.1 DBPedia
DBPedia is the semantic version of Wikipedia32. It “allows to ask queries against Wikipedia and to link the different data sets on the Web to Wikipedia data“ [18]. The main components of the framework are (cf. Figure 2.9):
• Page Collections - local or remote sources of Wikipedia contents.
• Destinations - storing or serializing extracted RDF triples.
• Parsers - supporting the extractors, converting values between different units and splitting markup into lists [18].
The Extraction Manager is used for managing the processes of mapping Wikipedia articles to the domain ontology. The framework includes the following extractors [18]:
• Labels (rdfs:label) - a title of the articles.
30http://dbpedia.org/About 31http://linkedmdb.org/ 32http://www.wikipedia.org/
• Abstracts. There are two versions of the abstract: a short one using rdfs:comment and a long one using dbpedia:abstract.
• Interlanguage links are links that connect articles about the same topics in different languages.
• Images. The images are connected to resources via the foaf:depiction property.
• Redirects - identification of synonymous terms, references between DBpedia resources.
• Disambiguation - explanation of the different meanings of homonyms via the predicate dbpedia:disambiguates.
• External links link data from DBPedia to external Web resources with use of the property dbpedia:reference.
• Pagelinks - links between Wikipedia articles (dbpedia:wikilink property).
• Homepages - links entities to their homepages (foaf:homepage).
• Categories - categories of articles are represented with the use of the SKOS vocabulary33 (the properties skos:concept and skos:broader).
• Geo-coordinates use the Basic Geo Vocabulary34 and the GeoRSS Simple encoding of the W3C Geospatial Vocabulary35 to define a location.
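A sketch of the kind of triples these extractors produce; the concrete values and the image and homepage URLs are illustrative:

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

<http://dbpedia.org/resource/Angelina_Jolie>
    rdfs:label "Angelina Jolie" ;                              # Labels extractor
    rdfs:comment "Angelina Jolie is an American actress." ;    # short abstract
    foaf:depiction <http://example.org/jolie.jpg> ;            # Images extractor
    foaf:homepage <http://example.org/> .                      # Homepages extractor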
The framework uses four types of extraction [18]:
• Dump-based extraction. The DBpedia database is updated monthly with dumps of all Wikipedia editions. The dump-based workflow uses the page collection from Wikipedia as the source of article texts and the N-Triples serializer as the output destination.
• Live extraction. The extractor uses Wikipedia’s Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) live feed36, which announces all changes in Wikipedia, as a stream for new RDF extraction. SPARQL Update deletes existing and inserts new triples into a separate triple store.
• Generic Infobox Extraction processes all infoboxes within a Wikipedia article. The triples are produced in the following way: the algorithm uses the corresponding DBpedia URI as the subject; the predicate URI is constructed from the namespace fragment http://dbpedia.org/property/ and the name of the infobox attribute; the attribute values are used as objects (see the sketch after this list).
33http://www.w3.org/2004/02/skos/ 34http://www.w3.org/2003/01/geo/ 35http://www.w3.org/2005/Incubator/geo/XGR-geo/ 36http://wiki.dbpedia.org/DBpediaLive
Figure 2.10: DBPedia page
• Mapping-based Infobox Extraction maps Wikipedia templates to an ontology by arranging the 350 most commonly used infobox templates within the English version into 170 classes and 2350 attributes from within these templates. “The property mappings define fine-grained rules on how to parse infobox values and define target datatypes, which help the parsers to process attribute values“.
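A sketch of the generic infobox extraction pattern described above (the infobox attribute runtime and its value are assumed for illustration):

@prefix dbpprop: <http://dbpedia.org/property/> .

# subject: the article's DBpedia URI
# predicate: property namespace + infobox attribute name
# object: the parsed attribute value
<http://dbpedia.org/resource/Life,_or_Something_Like_It>
    dbpprop:runtime 103 .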
DBpedia currently includes about 4.0 million “things“ with 470 million “facts“, and about 45 million links to external data sets37 like Freebase38 or flickr wrappr39. DBPedia also provides versions in 199 languages. The advantages of DBPedia are the coverage of many domains, real community agreement, automatic actualization of contents, and multilingual support. “The DBpedia knowledge base is served as Linked Data on the Web“40 and presents one of the central interlinking hubs. Each thing is identified via a dereferenceable IRI, or a URI-based reference. For example, http://dbpedia.org/ontology/Agent is the URI of the class “Agent“. http://dbpedia.org/page/Angelina_Jolie is the URI of the instance “Angelina Jolie“ of the class “Agent“. Figure 2.10 depicts the web page which presents the information about the actor “Angelina Jolie“ stored in DBPedia. The information about “Angelina Jolie“ is divided into two parts, property and value. The instance has the following properties: dbpedia-owl:birthName, which presents the name of the actor; dbpedia-owl:birthPlace, the birth place of the actor; dbpedia-owl:parent, the parents; and also external links to other data sets.
37http://wiki.dbpedia.org/Datasets 38http://www.freebase.com/ 39http://wifo5-03.informatik.uni-mannheim.de/flickrwrappr/ 40http://dbpedia.openlinksw.com:8890/About
The data can also be accessed via the SPARQL endpoint at http://dbpedia.org/sparql. Listing 2.9 presents an example query to DBPedia. The query returns a list of movie instances and their directors; the actor “Paul Reubens“ performed a role in these movies. The endpoint also supports full-text search over properties.
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?film ?director
WHERE {
  ?film dbpedia-owl:starring :Paul_Reubens .
  ?film dbpedia-owl:director ?director .
}
Listing 2.9: Example of a SPARQL query to DBPedia
2.7.2 Linked Movie Data Base
Linked Movie Data Base is another open semantic web database, which contains information about movies. The data sets also include links to other Linked Open Data and references to webpages. There are about 1645000 interlinks to other Linked Data endpoints such as DBPedia (via owl:sameAs), RDF Book Mashup (via movie:relatedBook), flickr wrappr (via dbpedia:hasPhotoCollection), etc. Figure 2.11 depicts the interlinks to other data endpoints. The resources are represented by the following sample entities: film, actor, director, writer, producer, music contributor, cinematographer, etc. The SPARQL endpoint is available at http://data.linkedmdb.org/sparql. Listing 2.10 shows a simple example of a SPARQL query to the endpoint, which selects instances of the class movie that are available in the English language, have a relation to the actor “Paul Reubens“ (http://data.linkedmdb.org/resource/actor/1395), and include links to other sources (owl:sameAs).
SELECT ?film ?title ?instance
WHERE {
  ?film movie:actor <http://data.linkedmdb.org/resource/actor/1395> .
  ?film dc:title ?title .
  ?film owl:sameAs ?instance .
}
Listing 2.10: Example of a SPARQL query to the Linked Movie Data Base
Figure 2.12 depicts the result of the query.
2.8 Widgets & Mashups
W3C’s Widget Specification defines a widget as “an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user’s machine or mobile device“ [56]. In other words, a widget is a small and simple application or piece of dynamic content developed for different types of software platforms.
Figure 2.11: LinkedMDB in the Linking Open Data cloud. Source: http://richard.cyganiak.de/blog/
Figure 2.12: SPARQL results
There are different types of widgets:
• A GUI (graphical user interface) widget is a part of applications designed for human-computer interaction (such as a window, a text box or a check box) in order to control displayed elements.
• Disclosure widget specifies which information should be hidden or shown for the user.
29 • Desktop widget is small application for desktop that control simple function like clocks setting, calenders or have access to some web services and show actual information (e.g. news, rate of exchange).
• Metawidget is used for control of other widgets.
A special kind of widget is the web widget, which can be embedded in the code of a web page in order to show information from another source. It is often used for advertising or for displaying video. Widgets are also frequently used in the Social Web, where a “widget application“ is a third-party application “for an online social network platform, with the user interface or the entire application hosted by the network service“41. It is possible to combine a widget with other components and data for complex problem solving. Widgets offer the following benefits:
• Versatility and seamless integration into diverse Web environments.
• Reusability.
• Easy implementation.
• Possibility to combine several widgets together.
• Possibility to use internal resources of a web page (site data) together with online data from different LOD sources.
• Easy and cost-efficient use of widgets for adding semantic functionality.
As already mentioned in Chapter 2.1.2, Web 2.0 opened the door to new technologies that enable easy data integration, like open APIs or mashups. A possible solution for processing data from various sources is the use of mashups. Mashups are applications developed for retrieving content from disparate Web sources. The data and functions can be received through various mechanisms and formats like REST APIs, feed formats, JSON, XML, and HTML. Typical characteristics of mashups are:
• a mashup often consists of widgets and feeds that are mixed together and have access to different sources,
• use of a service-oriented architecture,
• focus on specific domains or problems,
• the result can be published on the Web, providing access to its functionality,
• the ability to access published mashups and include their functionality in a new mashup.
41 http://en.wikipedia.org/wiki/Software_widget
Mashup development differs from traditional component-based application development. It is typically more collaborative, organic, and dependent on the reuse of existing components. The development can be realized manually or with the use of a development environment. The development includes:
• Widget creation and organisation of data flows, transforming the data into an appropriate format or reusable feeds;
• Mashup sharing, tagging, and trustworthiness indication;
• Reuse of existing mashups or extension of mashup logic and sharing of new combinations;
• Data analysis and personalization.
There is a set of tools and technologies that can be categorized as mashup builders, like Yahoo!Pipes42, QedWiki (an IBM product), and Intel Mash Maker43, or as mashup enablers [53]. In Chapter 3, Yahoo!Pipes will be reviewed because of its popularity. A mashup builder gives non-experts the ability to create new composite applications by combining simple operators and operations like filtering, selecting, etc. An operator is a widget that provides access to data sources. The mashups can be published and combined with each other. The disadvantage of such tools is that developers cannot extend their features. A mashup enabler provides data source adapters that give structure to the data. Examples of mashup enablers are Feed4344, Openkapow, and Kapow Mashup Server45. Typical data source adapters are application-specific APIs, RSS46, and RMDB47. The disadvantage of this kind of tool is the lack of a graphical mashup builder. The most challenging problems [46] are:
• Combining data and functions. The data are stored in various sources in different formats, and it is important to recognize which data and functions can be combined together.
• Data integrity. “Mashups are a quick way to create new applications but they can raise data integrity problems when changes of end-users are not valid against the underlying commitment“.
• Mashup search/cataloging. It is necessary to provide an efficient search mechanism. If many mashups exist, users do not know which mashup can be used for a given task and which mashups can be combined together.
• Making data Web-enabled. Not all data and functionalities are published on the Web. Some data are available but not accessible from mashup systems because of their format, or because they include extra data (e.g., HTML formatting structure), so a conversion to structured data is needed. Therefore a well-defined process is required to prepare the raw data for web publishing.
42 http://pipes.yahoo.com/pipes/
43 http://software.intel.com/en-us/articles/
44 http://feed43.com/
45 http://kapowsoftware.com
46 http://cyber.law.harvard.edu/rss/rss.html
47 http://rmdb.stanford.edu/repository/
• Security and identity. Some data are confidential and the system should protect them via an appropriate authorization mechanism.
• Sharing and reusing. It should be possible for users to reuse the already created mashups and share the new mashups with other users.
• Trust certificates. The owner of the mashup system should provide a license that will guarantee end-user rights and permissions of the mashup.
• Version control mechanisms. The data from various sources may get updated and the end user of the system should know about changes in the data sets, therefore a version control mechanism is essential.
Mashup development needs a methodical construction that can include the following steps: problem and domain definition (objectives, factors), IT environment definition, identification of technical requirements, technology selection, and definition of special mashup features like version control or data integrity. Mashups are a novel approach to building Web applications that can access various data sets and combine them. Mashup creation can be done by non-professional users. Mashups can cover different topics like finance, government, news, libraries, etc. Mashups can be used to consume Linked Data. Furthermore, mashups offer an easy way to integrate non-semantic data in different formats with Linked Data sets.
2.9 Schema.org
Schema.org was introduced in 2011 [39] by Yahoo!, Google, and Bing. It represents a collection of schemas that can be used to mark up web pages in order to improve the recognition of data by search engines. Schema.org is especially relevant in the context of Linked Data. Schema.org supports the generation of the following formats: RDF/Turtle, RDF/XML, RDF/NTriples, JSON, and CSV. As already mentioned in Chapter 2.1.3, the data can be automatically generated from databases and put into HTML. The data stored in databases are already structured, but search engines cannot recognize the structure if the data is presented in HTML format. “Many applications, especially search engines, can benefit greatly from direct access to this structured data“ [39]. On-page markup supports more effective search for the data and orders the data to make the search results more relevant for users. The following example shows how the content can be marked up using microdata. The original HTML code looks as follows:
<div>
  Harry Potter
  Author: J.K. Rowling (born 31.07.1965)
  Country: United Kingdom
  Movie
</div>
Listing 2.11: HTML code
The schema.org vocabulary can be used with the microdata format to add structure to the content of the web page. To identify the section that is about a movie, the itemscope attribute is used. The concrete items are defined by adding itemprop and itemtype attributes inside the div block.

<div itemscope itemtype="http://schema.org/Movie">
  <span itemprop="name">Harry Potter</span>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    Author: <span itemprop="name">J.K. Rowling</span>
    (born <span itemprop="birthDate">31.07.1965</span>)
  </div>
  <span itemprop="countryOfOrigin">United Kingdom</span>
  Movie
</div>
Listing 2.12: HTML with microdata format

2.10 Semantic Web Services
One of the goals of this master thesis is the semantic description of Web Widgets with the use of web service description approaches or languages. This part of the Master Thesis gives an overview of Semantic Web Services. As mentioned earlier, the Web nowadays has great significance for society. The traditional Web focused on interaction between people and applications, on information sharing, on providing the basic features for e-Commerce, and on (very limited) support for application integration [26]. The ability to exchange and use information is a major challenge because of the limitations of the Web. The solution to this interoperability problem was the introduction of Web Services. The W3C defines a web service as “a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards“. Web Services connect applications over the Internet using Web service standards in order to exchange data. Consider an online purchase as an example: if the user wants to buy an item, he or she sends a request to the server and gets a response. The request includes the ID of the item, the amount, the credit card name, the address, etc. The response includes information about a successful purchase or about errors. A client and a web service exchange this information via request and response messages: the client application sends a request message to the Web server and the server returns a response message to the client. The technology has the following aspects:
• The protocol is responsible for message transportation, for example HTTP, SMTP, FTP, or BEEP48.
• The message structure is defined with SOAP or REST.
• The interface description describes the structure of the messages, for example WSDL.
• The data format is an XML-based message format or JSON.
It is necessary to take a close look at the data that web services receive as input and return as output. For clarification, the following examples of messages in different formats are introduced.

REST + XML. A URI identifies a resource, e.g., http://ex.com/actors/angelinajolie. A REST response is a document in XML format served at the resource URL.
<profile>
  <firstName>Angelina</firstName>
  <lastName>Jolie</lastName>
  <citizenship>US</citizenship>
  <year>1982</year>
</profile>
Listing 2.13: REST + XML
REST + JSON. REST + JSON is essentially the same as the previous format. The difference is that the data is transferred in JSON format. The advantage of JSON is the ability to parse the structure directly in JavaScript.
{
  "firstName": "Angelina",
  "lastName": "Jolie",
  "citizenship": "US",
  "year": "1982"
}
Listing 2.14: REST + JSON
XML-RPC. The message is also represented in XML format.
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml
Server: ex.com

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>
        <struct>
          <member>
            <name>firstName</name>
            <value>Angelina</value>
          </member>
          <member>
            <name>lastName</name>
            <value>Jolie</value>
          </member>
          <member>
            <name>citizenship</name>
            <value>US</value>
          </member>
          <member>
            <name>year</name>
            <value>1982</value>
          </member>
        </struct>
      </value>
    </param>
  </params>
</methodResponse>
Listing 2.15: XML-RPC
48 http://en.wikipedia.org/wiki/BEEP
The Richardson maturity model49 characterizes REST as interaction between a client and a server according to three principles:
• Resource identification by means of URIs.
• The API should use a constrained set of operations (HTTP verbs).
• Hypermedia controls (automatic web application control).
The main problems of Web Services are:
• As pointed out before, Web Services can follow different standards and their content is not machine-understandable.
• They are not self-describing.
• Service discovery is complex.
• There are technical challenges in service composition.
A solution for these problems is adding semantic descriptions to the Web Services and to their corresponding messages that contain data. “Semantic Web Services is a synergistic confluence of the Semantic Web and Web Services“ [62]. They are like traditional Web Services but include machine-readable and machine-understandable information. The implementation of Semantic Web Services should apply standards for semantic data. Due to this fact the services can be discovered and assembled.
49 http://martinfowler.com/articles/richardsonMaturityModel
Semantic Web Services have many similarities with Web Widgets: widgets also have inputs, outputs, functional properties, etc. An important task is to describe the semantics behind the services. There are different methodologies (OWL-S, WSMO, WSDL, etc.) for the service description task, which are explored in detail in Chapter 3. The goal of Chapter 3 is to understand which approaches can be applied to the problem statement of this Master Thesis.
CHAPTER 3
State of the Art
This chapter is divided into two parts. The first part describes the state of consuming and publishing tools. The second presents different methodologies that can be applied for Web Service and data descriptions.
3.1 Applications
3.1.1 Overview of existing applications
This section describes related applications and projects in the field of consuming and publishing Linked Data. There are several existing tools and applications available. The applications can be categorized as follows [17] [36]:
• Linked Data Browsers are similar to web browsers; however, instead of navigating between pages via hyperlinks, the users navigate between data resources by following links expressed by RDF triples [17]. Examples: Tabulator (a generic data browser and editor [13]) and Marbles1 (a server-side application that formats Semantic Web content for XHTML clients using Fresnel2 lenses and formats).
• Linked Data Search Engines and Indexes. A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over the aggregated data. Broadly speaking, these services can be divided into two categories: human-oriented search engines and application-oriented indexes [17]. Examples: Falcons3, SWSE4, Swoogle5 (semantic web search engines that provide keyword-based search for objects, data, ontologies, and documents), sameAs.org6, Sindice7, and Sig.ma8.
1 http://mes.github.io/marbles
2 http://www.w3.org/2005/04/fresnel-info/
3 http://www.w3.org/2001/sw/wiki/Falcons
4 http://swse.org
5 http://swoogle.umbc.edu
• Domain-specific Applications. Applications that were developed for domain-specific goals. Such applications access specific data from different Linked Data sources. Examples: DBpedia Mobile9, DERI Pipes10, BBC Programmes and Music11.
The following sections of this chapter present existing mashup platforms and applications that are able to consume Linked Data. The chapter covers not only semantic applications but also Yahoo!Pipes12, a mashup platform that consumes data from various resources. For the evaluation of the tools, parameters like discovery, input/output data types, access methods, recursion, and behavior are used [3]. The user interface is also an important factor.
3.1.2 Yahoo!Pipes
Yahoo!Pipes is an online application that was launched on 7 February 2007 by Yahoo. The purpose of the application is the integration and consumption of data from different web pages, web feeds (RSS feeds), and other online resources by way of constructing data mashups [40] [43]. The mashup system includes different types of widgets. Some of them have access to data sources; other widgets provide aggregation or filtering options. Widgets can be wired together in order to process data. The Yahoo!Pipes environment includes four main parts: a navigation bar, the toolbox, the work canvas, and a debug-output panel [43]. A mashup is created by dragging modules (operators) from the toolbox into the work canvas and linking the modules. Each of the modules completes a specific task [40]. Widgets have input and output terminals. Widgets are linked by wiring the output terminal of one widget to the input terminal of another. Data flows from input modules to a single pipe output, the end of the execution process [43]. The output is returned in different formats such as RSS or JSON. A project can be saved and shared with other users of Yahoo!Pipes. Pipes can be accessed via their URL (each pipe has a unique URL). The user has the possibility to store the pipe in the public directory. Anyone can search and browse the pipes from the directory. Users can search for published pipes, inspect and modify pipes, and also save a copy in a directory. There are eleven categories of modules (features): sources, user inputs, operators, url, string, date, location, number, favorites, my pipes, and deprecated.
Source modules bring the data from web pages into the pipe [43]. These modules can process data on the Web in CSV (module Fetch CSV), XML and JSON (module Fetch Data), and RSS, Atom and RDF (module Fetch Feed) formats. Find First Site Feed is a module for finding RSS or Atom feeds. It is also possible to extract any information from web pages
6 http://sameas.org/
7 http://sindice.com/
8 http://sig.ma/
9 http://dbpedia.org/DBpediaMobile
10 http://pipes.deri.org
11 http://www.bbc.co.uk
12 http://pipes.yahoo.com/pipes/
using the XPath Fetch Page module. E.g., the command //img is used to return all images from a web page. This category also includes other components.
User inputs make Yahoo!Pipes more flexible and enable adding user input to the data flow. There are five types of input modules: date, location, number, text, and URL. The user may provide the following fields: name (parameter name), prompt (a text entry field for the Run Pipe option), position (the order of input fields), default (a default value), and debug (a default value within the Pipes Editor).
Operators are used for data transformation and filtering. This category includes the following modules [43]:
• Count Module counts the number of items. The input of the module is a data feed and the output is a number.
• Filter Module is used for including or excluding items from a feed via rule definitions. The module can contain multiple rules.
• Location Extractor Module is used for adding location elements (y:location), which include sub-elements such as latitude, longitude, quality, country, state, city, street, and postal code. This element makes it possible to display the feed on a map.
• Loop Module is used to add sub-modules to pipes. A module can be inserted into the Loop Module; the sub-module will run once for each item in the input feed. There are two options that define the output of the module: “emit result“ (the output is only the data from the sub-module) and “assign results to“ (the output is all the data from the original input, with the data from the sub-module assigned to a specified field).
• Regex Module “modifies fields in an RSS feed using regular expressions, a powerful type of pattern matching“ [43].
• Rename Module renames elements. E.g., it is possible to convert some data into RSS format (so that the elements have a title, description, etc.) or into location elements for the Location Extractor. There are two types of mapping: “rename“ (create a new element with a new name and delete the old element) and “copy as“ (create a new element without deleting the old element).
• Reverse Module reverses the order of items.
• Split Module splits the feed “into two identical output feeds“ [43]. The module is useful when different operations must be applied to the same data items.
• Sort Module sorts feeds in either ascending or descending order by any element (e.g., name, date).
• Sub-Element Module extracts selected sub-elements from a feed.
• Tail Module truncates a feed to the last N items, where N is a number specified by the user.
• Truncate Module truncates a feed to the first N items.
Figure 3.1: the Web & the Semantic Web
• Union Module combines separate sources of items (maximum 5). The output is a single list of items.
• Unique Module removes items with duplicate string values from the feed.
• Web Service Module sends a request to a Web Service for additional processing of the data. Yahoo!Pipes gets the response from the Web Server in JSON format. The Web Service should support HTTP POST with JSON payloads.
• Create RSS Module transforms input data into RSS format. Non-RSS elements are renamed to existing RSS element names.
Figure 3.1 presents an example in Yahoo!Pipes. The example shows the aggregation of information from different sources; the processing is done separately for each data source. In the example, data from the Sciencenews web page (https://www.sciencenews.org/) and the CNN news page (http://rss.cnn.com/) have been selected. The merging of the data is processed via the Union module. The use case for the example was finding articles that have the word “Dolphin“ in the title. To get “Dolphin“ references from the Sciencenews web page, the XPath Fetch Page module was used. For the selection of the data, the XPath command //a[contains(.,’Dolphin’)] has been used. The Truncate module has been used for taking the top two articles. The articles from the CNN web page are available as an RSS feed (http://rss.cnn.com/rss/cnn_topstories.rss). For the selection of “Dolphin“ in the title, the Filter module has been used (item.description contains “Dolphin“). Finally, both feeds are piped into a Union module, merging both into one feed. After running the pipe, the result of the merging (3 articles) is shown in the debug-output panel.
The URL category includes only one module, the URL Builder Module. All resources are defined by URLs, some of which are complex. The module is used for controlled URL construction.
String modules are used to process string values, for example, building a string from several sub-strings. The category includes the String Builder, Sub String, Term Extractor, Translate, String Regex, String Replace, String Tokenizer, Yahoo! Shortcuts, and Private String modules.
Date modules are used for date building and formatting. There are two modules: the Date Builder Module and the Date Formatter Module. The first converts a string value into a datetime value; the second defines a format for the datetime value.
The Location Builder Module extracts geographical data from a description. “The module outputs a location structure with separate fields for city, state, country, latitude, and longitude“ [40]. The location can be connected with any module that accepts location types.
The Simple Math Module processes mathematical operations like division, subtraction, power, etc.
Yahoo!Pipes supports the creation of new information streams from different sources by using a cascade of simple operators. Data sources are usually web feeds (e.g., from news web pages) or other simple data. The access to data is realized via standard web protocols (HTTP, RSS). Yahoo!Pipes mashups can be combined with each other and can be accessed via HTTP. The retrieved data are usually refreshed automatically after each start of a pipe.
The disadvantage of Yahoo!Pipes is the lack of Semantic Data processing capabilities. Yahoo!Pipes also does not support search for stored widgets based on author, topic, etc., nor semantic descriptions of the resources. Yahoo!Pipes supports component discovery based on keywords. The possible formats of data are limited to RSS, Atom, XML, and JSON. An advantage is that Yahoo!Pipes gives the ability to use XPath expressions for retrieving data from web pages. It accesses the data via HTTP or RSS/Atom. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is, however, very complicated for non-professional users.
3.1.3 DERI Pipes
DERI Pipes [29] is an open-source project for transforming, filtering, and aggregating web data (the data should be in RDF format or one of several RDF serialization formats) [49], and for building RDF-based mashups [61]. The tool supports RDF, XML, SPARQL, XQuery, JSON, and several scripting languages [29]. External applications can use the output stream of data (e.g., JSON). The web data sources can be accessed via URIs. Data are processed by several basic operators, and each operator may have one or more inputs (e.g., text, output from other operators, or URIs) and only one output (an RDF graph, an RDF dataset, or a SPARQL result set). A set of instances of the operators represents a pipe [61] [49]. A Semantic Web pipe processes a data flow from a set of RDF sources through pipelined special-purpose operators [49]. Figure 3.2 presents the basic operators such as CONSTRUCT and
Figure 3.2: Semantic Web pipe operators. Source: [49]
SELECT. The input values can be data in RDF, string, or XML format. The output is usually data either in RDF or in XML format. The definitions of the pipes are stored as XML. The structure of a simple pipe is presented by the following example [29], which shows a simple pipe that aggregates data from different Linked Data sources. Each pipe starts with the XML tag <pipe>. The construct block (<construct>) is used for RDF transformation (c.f. Listing 3.1).
<pipe>
  <construct>
    <query>
      <![CDATA[
        CONSTRUCT {
          // SPARQL query
        }
      ]]>
    </query>
  </construct>
</pipe>
Listing 3.1: Pipes Definition
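As an illustration of what the CDATA block may contain, the following sketch shows a SPARQL CONSTRUCT query that extracts FOAF names from the aggregated sources; the graph pattern is illustrative and not part of the original example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?person foaf:name ?name .
}
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}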
DERI Pipes also has a graphical editor (c.f. Figure 3.3). The environment is similar to Yahoo!Pipes. On the left side there are sets of operators that are grouped into four categories: fetch, operators, url, and inputs. The operators can be moved onto the designer tab canvas and connected. The source code can be seen by clicking on the button “source code“ under the designer tab canvas. The result of a pipe is shown in the view panel (text or table view). To better understand the features, the operators [29] are considered below.
Figure 3.3: DERI interface
The first category of operators includes fetch operators. These operators get data from a data source (via a URI) in RDF, HTML, XML, or XSL format. The second category, “Operators“, includes operators for data processing. The triples that are fetched from different sources can be merged via the MIX operator. The input of the operator should be RDF/XML data (a constant or the output of another operator in RDF/XML format). The operator RDFS MIX merges the specified sources and then infers triples from the merged triples. The operator CONSTRUCT is used to derive data from one or more specified RDF sources via SPARQL. The cycle operator FOR invokes “a parametrized pipe multiple times and merge the resulting outputs of each invocation“. The operator SMOOSHER can be used to merge all data from different sources according to a URI and based on the owl:sameAs statement. The third category is “URL“. There are two operators in this category: URL Builder and SPARQL Endpoint. URL Builder is similar to the Yahoo!Pipes URL builder. SPARQL Endpoint accesses a SPARQL endpoint via a SPARQL query which is contained in the operator. The fourth category, “Inputs“, includes PARAMETER, which “accepts user input“ [29], and FOR VARIABLE, which gives a name to a field that is used within a loop [29].
DERI Pipes, like Yahoo!Pipes, can be stored, shared, and re-used by other users. Each DERI Pipe has a unique URL. Users can connect different pipes, modify existing pipes, and include pipes as functional blocks into projects (because of the XML format and the HTTP-retrievable model). DERI Pipes, like Yahoo!Pipes, does not support efficient search for stored widgets or semantic descriptions of widgets. DERI Pipes processes data in RDF, XML, microformats, JSON, and binary streams and converts the data into RDF format. The platform accesses the data via SPARQL. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is very complicated for non-professional users, and programming skills are needed.
3.1.4 BIO2RDF
BIO2RDF is an open-source semantic web project that provides Life Science Linked Data from over 1500 biological databases (like Kegg13, MGI14, PDB15) [16]. The goal of the project is the implementation of a more sophisticated scheme for biomedical data, bringing data from different web sites together and adding semantics to the data in order to obtain machine-understandable content [20]. Bio2RDF provides scripts that convert a diverse set of heterogeneously formatted sources [16] [20] into RDF. The datasets are converted based on Tim Berners-Lee’s design principles of Linked Data (cf. Chapter 2.2). The transformation of the data into RDF format is performed through a JSP toolbox. The data can be locally stored or accessed via HTTP requests [16] [12]. The system supports “relational databases, text files, XML documents, and HTML pages“ [12]. Depending on the data format the system uses different methods to get the data: XML to RDF conversion, SQL to RDF conversion, or text file to RDF conversion. The data conversion includes three steps [12]:
1. Namespace definition for URI normalization (each URI is unique; equivalent URIs are linked using the owl:sameAs predicate).
2. Analysis of the data source and design of an RDF model.
3. Implementation of an RDFizer for data transformation and loading of the data into a triple store.
Bio2RDF suggests a set of principles for providers [12]:
1. “Use a REST like interface“ for clear and stable URI creation.
2. “Lowercase all the URI up to the colon“ to be effectively case-insensitive.
3. “All URIs should return an RDF document“ for easy connection to other linking data.
“The syntax of normalized URI is described by the following pattern: http://bio2rdf.org/namespace:identifier“ [12]. Figure 3.4 presents the framework architecture. The input data can be in different formats such as text, XML, RDF, etc. The system processes the data in two ways:
• The data from external sources can be stored in an SQL database on the BIO2RDF.org server. These sources are accessible directly from the server. The direct access to the BIO2RDF server affords high speed, e.g., for data from HGNC16, Entrez Gene17, and Kegg18.
13 http://www.genome.jp/kegg/
14 http://www.informatics.jax.org/
15 http://www.rcsb.org/pdb/
16 http://www.genenames.org/
17 http://www.ncbi.nlm.nih.gov/gene
18 http://www.genome.jp/kegg/
Figure 3.4: Bio2RDF system framework architecture. Source: [12]
• The data from external sources can be requested directly from the data source. After the request the data is transformed into RDF with the use of an RDFizer program, e.g., for data from Reactome19, PubMed20, and UniProt21.
There are two servlets in the system: Elmo22 and Sesame23. Elmo is used for crawling the RDF documents. The triples are processed in the local Sesame repository. By means of the Sesame interface the data can be browsed and queried. BIO2RDF works like a search engine: the user can use it like Google or Yahoo to find the needed information. The result of a request is a table with properties and values. BIO2RDF contains very specific knowledge; therefore it is very popular in the life sciences industry.
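Since the repository can be queried, such a property-value table can also be obtained with a generic SPARQL query. A minimal sketch, assuming access to a Bio2RDF SPARQL endpoint; the normalized URI used here is purely illustrative:

SELECT ?property ?value
WHERE {
  <http://bio2rdf.org/geneid:4157> ?property ?value .
}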
3.1.5 LOD2
LOD2 is a large European project and a set of tools that “support the whole life cycle of Linked Data from extraction, authoring/creation via enrichment, interlinking, fusing to maintenance“ [8], developed by partner companies and universities. The architecture of the components is based on three foundations [8]:
• “Software integration and deployment using the Debian packaging system“.
• “Use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between different tools“.
19 http://www.reactome.org/PathwayBrowser/
20 http://www.ncbi.nlm.nih.gov/pubmed
21 http://www.uniprot.org/
22 http://www.openrdf.org/
23 http://www.w3.org/2001/sw/wiki/Sesame
• “Integration of the LOD2 Stack user interfaces based on REST enabled Web Applications“.
LOD2 defines a Linked Data Lifecycle that includes eight phases, for each of which a set of tools is available:
• Extraction. Conversion of data into RDF. Tools: Valiant24, Apache Stanbol25, DBpedia Spotlight26, D2RQ27.
• Storage. Optimization of data storage, dynamic querying of RDF graphs, graph processing, etc. Tools: Virtuoso28.
• Authoring. Publishing of the Linked Data, addition of semantically enriched content, and editing it for non-expert users, e.g., via the WYSIWYM paradigm29. Tools: PoolParty30, OntoWiki31.
• Interlinking. Data integration, addition of links between semantic contents. Tools: SILK32, LIMES33.
• Classification. The integration of the raw data with an ontology for future work with the integrated data.
• Quality. Quality characteristics like coverage, context, or structure are very important. Tools: Sieve34.
• Evolution/Repair. Monitoring of the relevance of data sets and ontologies in order to keep things stable. Repair strategies should be planned for problems that appear. Tools: Sieve.
• Search/Browsing/Exploration. Tools: SemMap35.
Apache Stanbol
Apache Stanbol is a set of components that combine traditional content management systems with semantic services. Apache Stanbol includes:
• Content enhancement. The goals are information extraction from contents, content analysis, and presenting contents as RDF. It is used for search and navigation improvement.
24 http://lod2.eu/Project/Valiant.html
25 https://stanbol.apache.org/
26 http://dbpedia-spotlight.github.io/demo/
27 http://d2rq.org/
28 http://lod2.eu/Project/Virtuoso.html
29 http://en.wikipedia.org/wiki/WYSIWYM
30 http://lod2.eu/Project/PoolParty.html
31 http://lod2.eu/Project/OntoWiki.html
32 http://lod2.eu/Project/Silk.html
33 http://lod2.eu/Project/LIMES.html
34 http://sieve.wbsg.de/
35 http://aksw.org/Projects/SemMap
• Reasoning. The Stanbol reasoners analyze sets of axioms and facts in order to derive logical consequences (additional semantics).
• Knowledge models. The Ontology Manager provides access to ontologies stored in the system for managing ontologies, ontology networks, and user sessions [28].
• Persistence (Contenthub). A document repository for storing semantic information.
The functionalities of the components are exposed in terms of a RESTful web service API.
D2RQ
D2RQ is a platform for retrieving data in the form of RDF graphs from relational databases without additionally storing the data in an RDF store. The D2RQ Platform is “a system for accessing relational databases as virtual, read-only RDF graphs. It offers RDF-based access to the content of relational databases without having to replicate it into an RDF store“36. The system supports querying of non-RDF databases using SPARQL, presentation of relational databases as Linked Data with access to the data, use of the Apache Jena API, and creation of custom dumps. The platform includes:
• The D2RQ Mapping Language. The language describes the mapping between relational databases and ontologies. The data are presented as virtual data graphs that include information from the relational databases.
• The D2RQ Engine. The engine converts Jena API calls into common SQL queries according to a mapping description.
• D2R Server. The server gives the ability to publish the data as LOD. It transforms the data from a relational database into RDF formats according to a mapping description. After the transformation, the data can be browsed and searched (see the query sketch below).
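To make this concrete, the following sketch shows a SPARQL query as it could be posed to a D2R Server endpoint; the vocab:Customer class and vocab:name property stand for terms that a concrete mapping would generate and are purely illustrative:

PREFIX vocab: <http://example.org/vocab/>

SELECT ?customer ?name
WHERE {
  ?customer a vocab:Customer ;
            vocab:name ?name .
}
LIMIT 10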
Virtuoso
Virtuoso is a multi-model data server for data and information storage and knowledge management. It allows access to various data sources that are stored in different formats and supports various query languages and data representation formats, for example SQL, SPARQL, JDBC, HTTP, WebDAV, XML, RDF, etc. Virtuoso covers many areas like data management (relational, RDF graph, or document), free-text content management and full-text indexing, document web server, Linked Data server, Linked Data deployment, and messaging.
PoolParty
PoolParty is a thesaurus management system for the generation of knowledge models and the creation of thesauri and taxonomies. The platform is based on semantic technology and provides the
36 http://d2rq.org/
ability to combine thesauri with Linked Open Data. The information is analyzed by the system and published into a semantic graph. The system has the following features:
• It can analyse documents in order to find inconsistencies between existing taxonomies and the content.
• The system follows the W3C SKOS standard.
• Connection to Linked Data.
• Support for various datatypes.
• Use of Virtuoso for knowledge graph storage.
• Integration with SharePoint, Drupal, etc.
• The system is based on the following standards: RDF, SPARQL, and SKOS.
• Integration with other enterprise systems.
A Link Discovery Framework for the Web of Data (SILK). “Using the declarative Silk - Link Specification Language (Silk-LSL), data publishers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked.“ [?]. SILK specifies RDF links between data sources and the conditions for data interlinking. Different similarity metrics can be used for defining links.
LIMES. LIMES is a link discovery framework. The approach refers to the “interlinking“ phase and is based on the estimation of similarity between instances [8] [51]. The instance pairs are filtered to find those that are sufficient according to the specified conditions. The approach also includes machine-learning algorithms (EAGLE37, COALA38, and EUCLID39) to find appropriate pairs of instances. The framework includes several modules: the control module (matching process coordination), the data module (consists of the classes needed to work with data), the I/O module (used for data reading and data extraction), the query module, and the LIMES engine (used for result computation).
Sieve. Sieve relates to the quality phase of the Linked Data Life Cycle. The tool consists of two modules: data quality assessment and data fusion [58]. Sieve assesses the quality of data through various mechanisms:
• Assessment Metrics. An assessment metric combines several quality indicators and “calculates an assessment score from these indicators using a scoring function“ [58].
37 http://en.wikipedia.org/wiki/Eagle_strategy
38 http://www.cs.mu.oz.au/~jbailey/papers/coalafinal.pdf
39 http://en.wikipedia.org/wiki/Euclidean_algorithm
• Data quality indicators. The indicators depend on the information that the users need and on the specific situation.
• Scoring functions. The functions are related to the data quality indicators and present an evaluation of them. They include simple comparison functions, complex statistical functions, network analyses, etc.
• Aggregate Metrics. The metric aggregates assessment metrics with the use of average, sum, max, min, or threshold functions.
The second module presents a data fusion mechanism. “Data Fusion is commonly seen as a third step following schema mapping and identity resolution, as a way to deal with conflicts that either already existed in the original sources or were generated by integrating them“ [58]. There are two types of fusion functions in Sieve:
• Filter function. It uses a quality metric to remove some values from the input data sets.
• Transform function. It generates new values from the input datasets with the use of fusion functions like Filter, First, Last, Random, Average, Max, or Min.
SemMap is used for knowledge visualization. It explores spatial areas and shows objects according to specific properties. The interaction between triple stores and the application is realized via SPARQL queries.
Sig.ma
Sig.ma is a Semantic Web mashup. The application supports the following tasks [80]:
• Browsing the Web of Data. Sig.ma browses information according to the input text. The application returns data from the Web of Data (e.g., name, title, location, etc.). The user has the ability to follow the links that the system returns.
• Embedding, linking, and Sig.ma alerts. The user has the ability to expand and refine the sources in order to select the needed values and properties.
• Structured property search for multiple entities. Search for properties. For example, the request “title, actor, year, [...] @ Harry Potter“ returns an array with the given properties for the entity “Harry Potter“.
The search for data sets consists of the following steps [80]:
• Data source selection. The result is a list of sources that have been found via various search engine interrogations.
• Parallel data gathering. The extraction of structured data from different data sources.
• Extraction and alignment of related subgraphs. The structured data are separated into parts, each of which has a resource description. As a next step, similar data are identified and connected via owl:sameAs.
The information above can be summarized as follows: LOD2 is a large research and development project which covers the full Linked Data life cycle from data extraction to search. LOD2 focuses on data and information integration, quality of data, and bringing Linked Data to enterprises.
3.2 Semantic Description Approaches
In this part the following kinds of description methodologies are presented:
• Service description approaches such as WSDL, OWL-S, WSMO, and WSMO-Lite.
• Approaches that integrate services and Linked Data, like LIDS, LOS, Data-Fu, RESTDesc, and Karma.
• A mapping approach from relational databases to RDF, R2RML.
In the course of these sections it is important to clarify whether there is one approach that can be applied to Linked Widgets.
3.2.1 Web Services Description Language (WSDL)
WSDL is an XML-based language and a model for Web service descriptions. A WSDL description provides machine-readable information about how the service can be invoked, what data or information are needed, and what the service returns. The service description includes the operations provided by the service and the expected parameters. The model of WSDL is a set of components and properties. There are two versions of WSDL: 1.1 and 2.0. Version 2.0 is a W3C recommendation. WSDL 2.0 provides two kinds of information: an abstract model (application-level description) and a concrete model (the specific protocol-dependent details) [66]. The separation is needed because different endpoints with dissimilar access protocols may offer common functionality. The abstract model describes the messages that are sent and received by a Web service. The concrete model describes the communication protocol (e.g., SOAP), service interactions, and the endpoint of communication (the address). A WSDL document uses the following elements in the definition of web services40 [66]:
• Types – a container for data type definitions using some type system (such as XSD) [66].
• Message (WSDL 1.1) includes the essential information for operation execution and corresponds to an action (an operation).
• Operation - “an abstract description of an action supported by the service“.
• Port Type (WSDL 1.1) or Interface (WSDL 2.0) – a list of operations (inputs and outputs) that can be performed by one or more endpoints.
40 WSDL 1.1 and WSDL 2.0 use different terms in some cases.
• Binding indicates a protocol and a data format specification for a port type (SOAP binding style);
• Port (WSDL 1.1) or Endpoint (WSDL 2.0) – usually a URL of a single endpoint.
• Service – a set of endpoints.
Listing 3.2 presents the main elements of a WSDL description.
<description>
  <documentation />*
  [ <import /> | <include /> ]*
  <types />?
  [ <interface /> | <binding /> | <service /> ]*
</description>
Listing 3.2: Main Elements of a WSDL Description
WSDL is “an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information“41. It focuses more on the technical side of processes and describes the syntax of a service but not its semantics. This approach cannot cover all requirements of semantic service retrieval and service composition.
3.2.2 Semantic Annotations for Web Services Description Language (SAWSDL)
SAWSDL defines “mechanisms using which semantic annotations can be added to WSDL components“ [70]. SAWSDL provides mechanisms by which concepts from semantic models that are defined either within or outside the WSDL document can be referenced from within WSDL components as annotations [70]. Based on the member submission WSDL-S, the key design principles for SAWSDL are [70]:
• “The specification enables semantic annotations for Web services using and building on the existing extensibility framework of WSDL.
• It is agnostic to semantic representation languages.
• It enables semantic annotations for Web services not only for discovering Web services but also for invoking them“.
3.2.3 Semantic Markup for Web Services (OWL-S)
OWL-S is an OWL-based ontology for the description of web services. Its language constructs are used for describing the properties and capabilities of Web services. “OWL-S markup of Web services will facilitate the automation of Web service tasks including automated Web service discovery, execution, interoperation, composition and execution monitoring“ [25]. The descriptions of Semantic Web services usually consist of three interrelated subontologies or profiles:
41 http://www.daml.org/services/owl-s/1.0/owl-s-wsdl.html
Figure 3.5: Top level of the service ontology
• The service profile provides the service description, in standard OWL.
• The process model describes the processes inside the semantic web service.
• The service grounding describes access to the semantic web service, typically expressed in WSDL.
Like WSDL, OWL-S has abstract and concrete models. The abstract characterizations are the service profile and the process model. The service grounding provides the concrete information needed for access to a web service, like message formats, protocol, etc. Figure 3.5 depicts the relations between a service and its components: the arrows present OWL properties and the ovals show the OWL classes. The Service Model includes:
• Inputs and outputs. The inputs describe the data that the service needs to process; the outputs describe the result data that the service produces. Both properties are values from the Service Model.
• Precondition is a proposition that has to be true to execute the service.
• Result is a condition that becomes true after process execution.
The Service Profile specializes the representation of services [25]. An OWL-S profile provides the following kinds of information:
• Non-functional description (metadata like service name, description, contact information etc.). For example, the provider information includes information about the entity that is responsible for running the service.
• Functional description about information transformation (the function that can be computed, service characteristics) and service states (precondition and postcondition, facts). For example, a booking service may require as preconditions the first and last name of a person, credit card data, and an identity card ID. As output the service returns a booking confirmation.
The service profile includes references to the service model; therefore it is possible to find those services that best satisfy requests. The service grounding describes the access to the semantic web service and the mapping to WSDL, SPARQL, etc. It represents the exchange of information between consumer and service provider. It maps an abstract specification to a concrete model [25]. For example, in the case of WSDL, it maps each atomic process to a WSDL operation and relates “each OWL-S process input and output to elements of the XML serialization of the input and output messages of that operation“ [57]. The inputs and outputs of a process at the grounding level are realized as messages. The following example demonstrates the syntax of OWL-S. The process model describes the processes inside the semantic web service. Each service has inputs, outputs, conditions, result variables, and effects. Consider, e.g., a purchase service: to buy products on the Internet, the user has to provide information about his or her credit card, like the credit card number, a name, and a CVV. The card will be the input of the process “purchase“ (c.f. Listing 3.3).
<process:Input rdf:ID="CreditCard">
  <process:parameterType rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">
    http://example.org/ecommerce#CreditCard
  </process:parameterType>
</process:Input>
Listing 3.3: OWL-S Example
As output after the purchase the user gets a confirmation number (c.f. Listing 3.4).
<process:Output rdf:ID="ConfirmationNumber">
  <process:parameterType rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">
    http://example.org/ecommerce#ConfirmationNumber
  </process:parameterType>
</process:Output>
Listing 3.4: OWL-S Example
A result variable should also be defined (c.f. Listing 3.5). The result variable is a variable scoped to the Result block and bound by the result condition.
<process:Result>
  <process:hasResultVar>
    <process:ResultVar rdf:ID="PurchaseConfirmed"/>
  </process:hasResultVar>
</process:Result>
Listing 3.5: OWL-S Example
The advantages are: OWL-S synthesizes both an extensional and a functional view of Web services; it provides a complete description of the services it describes [57]; and OWL logic and ontological descriptions are included in the description.
The disadvantages are: OWL-S is limited by the use of OWL as a language based on description logic [6]; it is hard to describe the semantic relation between input and output because OWL-S does not provide mechanisms to express a service’s relation to other services [81]; and there are no working tools.
3.2.4 Web Service Modeling Ontology (WSMO)
WSMO is an ontology for the description of the core elements of Semantic Web Services. The following design principles are the basis of WSMO: Web compliance (use of URIs for resource identification), ontology-based design (resource description is based on an ontology), strict decoupling (“each resource is specified independently without regard to possible usage or interactions with other resources“ [68]), centrality of mediation, ontological role separation (separation of client roles), description versus implementation (separation between the description of a Web Service and its implementation), execution semantics (technical realization), and service versus Web service. WSMO uses a similar approach to OWL-S for declaring and describing services, but there is a difference: while OWL-S focuses more on the description of services, WSMO focuses more on the application domain and on solving integration problems [47]. The model includes four parts:
• Ontology - domain description that can be used by other WSMO elements. This part includes the machine-processable information that is needed for adding meaning to the data.
• Web service interface - semantic description of services (the capabilities, interfaces and internal working of the service);
• Goal - results or goals of the usage of the web service;
• Mediator - coordinates WSMO components.
An ontology in WSMO also includes non-functional properties, used mediators, concept definitions, relation definitions, axioms, and instances [47]. The non-functional properties are globally accessible by all modelling elements. The properties can come from controlled vocabularies like Dublin Core, from other vocabularies, or from a standard set provided by WSMO (hasContributor, hasDate, etc.). Used mediators serve for linking to ontologies that should be imported, linking goals, linking between services and WSMO goals, and orchestration. In comparison to WSMO, OWL-S does not support such a meta-ontology.
3.2.5 WSMO-Lite
WSMO-Lite is a lightweight approach which is standardized according to the W3C standards for semantic service description, “the next evolutionary step after SAWSDL, filling the SAWSDL annotations with concrete semantic service descriptions“ [84], which can be directly applied to WSDL descriptions. WSMO-Lite allows bottom-up modelling of web services. WSMO-Lite adopts the WSMO model and makes its semantics lighter in the following major aspects: WSMO-Lite treats mediators as infrastructure elements and specifications of user goals as dependent on the particular discovery mechanism used; it only defines semantics for the information model and for functional and non-functional descriptions; and it accepts any ontology language based on the Resource Description Framework [84]. The approach treats Web services as atomic, does not focus on the internal behaviour of web services [84], and does not have a concrete language for describing the semantics of functions. The WSMO-Lite service ontology has three parts:
• A domain ontology that presents an information or data structure model. WSMO-Lite identifies types and a simple vocabulary for the semantic description of services and the languages used to express the descriptions.
• Capabilities and/or functionality classifications that present the functional description of the service (condition definitions, effects).
• A non-functional description represented by an ontology that specifies policy or other non-functional properties.
The main disadvantage of this approach is that it focuses on the description of Web APIs and not on providing relationships between the data that is processed by the web services.
3.2.6 RESTDesc semantic description
RESTDesc is a semantic, functionality-centred method which expresses the functionality of a service, as well as its communication, in a concise way that appeals to humans and can be processed automatically [81] [82]. The main elements of the RESTDesc approach are the precondition, the postcondition, and the request details. The precondition describes the input state of a resource of a service. The postcondition is the output state after the interaction, and the request details define the method which should be used to achieve the new state. These elements are brought together in the form of a rule, which takes care of correct quantification and variable instantiation [81]. This approach presupposes the use of an ontology model, e.g., an RDF schema. Links are used to define the relationships between resources. E.g., between http://example.org/pictures/1 and http://example.org/pictures/1/animals/1 a link means that picture 1 is grouped into the category “Animals“. Listing 3.6 demonstrates the use of the RESTDesc description language for a service description. The service gets a director as input data and returns a list of movies. The precondition defines the input of the service, an instance of the class movie:Director. This input is required for the service invocation. In the postcondition section the HTTP vocabulary is used to describe a GET request. The directorOf link shows the relationship between the director and the movies. The service returns the list of movies and provides additional data such as year and actors.
# the movie: namespace URI is illustrative; the originally printed URIs were lost
@prefix movie: <http://example.org/movie#> .
@prefix http: <http://www.w3.org/2011/http#> .
@prefix tmpl: <http://purl.org/restdesc/http-template#> .

{ ?director a movie:Director. }
=>
{
  _:request http:methodName "GET";
            tmpl:requestURI (?director "/movie");
            http:resp [ tmpl:represents ?movie ].
  ?director movie:directorOf ?movie.
  ?movie movie:year _:year;
         movie:starring _:actor;
         movie:type _:type.
}.
Listing 3.6: RESTDesc Example
The advantages of this approach are: it is possible to describe relationships between input and output data (e.g., ?director movie:directorOf ?movie.), and it links web services directly to data sets. The disadvantages are: the user has to write the description of the model manually, since RESTDesc does not support automatic generation of service descriptions; and the approach focuses on applying HTTP methods for data retrieval, publishing, etc., while mashups focus on Linked Data consumption and data description.
3.2.7 SA-REST
SA-REST is a simple and open microformat for enhancing Web resources with additional semantic information [31]. The meta information can be modeled according to various formats such as RDFa, OWL, or Gleaning Resource Descriptions from Dialects of Languages (GRDDL42). Altogether this makes the service description human-readable as well as machine-readable. The main idea of SA-REST is to add the semantic description directly into SAWSDL or HTML code. SA-REST, like SAWSDL, “annotates outputs, inputs, operations, and faults, along with the type of request that it needed to invoke the service“ [48] in the form of URIs. This means that SA-REST links an ontology to a service; for example, an input message can be annotated by embedding a URI from an ontology. An important point of the approach is the lifting and lowering schema specification. It is used for transforming data structures from the input or output of services to an ontology. The idea is similar to the OWL-S grounding. To realize this transformation SA-REST uses XSLT or XQuery. The queries take a data structure from the implementation level (data expected as input or output of the service) and convert it into an ontology structure. Listing 3.7 demonstrates a Web page which is annotated with the use of SA-REST. In this example, the user searches for information about a movie. The user puts a title object into the movie-search-service, and the service returns the description from the output of the movie-search-service.
<html xmlns:sarest="http://lsdis.cs.uga.edu/SAREST#">
<head>
  <!-- the service URI and the ontology concepts are illustrative -->
  <meta about="http://example.org/movie-search-service"
        property="sarest:input"
        content="http://example.org/ontology#Title" />
  <meta about="http://example.org/movie-search-service"
        property="sarest:output"
        content="http://example.org/ontology#MovieDescription" />
  <meta about="http://example.org/movie-search-service"
        property="sarest:action"
        content="HTTP GET" />
</head>
</html>
Listing 3.7: SA-REST Example
42 www.w3.org/TR/grddl/
The advantages of SA-REST are: it adds semantics directly to REST services, WSDL, or HTML; and SA-REST does not enforce the choice of language for representing an ontology or a conceptual model, but allows the use of OWL or RDF [48]. SA-REST is “a more general purpose language that adds semantic annotations only to those page elements that wrap a service or a service description“ [48]. This could ease the problems associated with widget composition and discovery. The disadvantages are that the annotation of web pages is often problematic, since the programmer usually has to select a page which will be annotated with the semantic description. Additionally, for Linked Widgets it is important to separate the technical and semantic parts.
3.2.8 EXPRESS
EXPRESS is an approach for semantic service description. The main feature of EXPRESS is providing "an uniform interface for resources" [6]. The resources are described using an OWL ontology and the HTTP methods (GET, PUT, DELETE, POST, and OPTIONS). The RESTful interface can be created automatically because of the automatic direct mapping between entities and resources. EXPRESS includes a service provider and an EXPRESS deployment engine. The service provider provides an OWL file describing the resources in a Web Service [7]. The OWL file also defines the "exchanged message format" [7]. The URIs for resources are generated through the EXPRESS deployment engine. After URI generation, the service provider assigns the HTTP methods to the classes, properties, and instances [7]. User roles are provided to differentiate the access to resources and methods for different kinds of users (role-based access control). Listing 3.8 shows an example of a DVD ordering service description using the EXPRESS approach. DVD ordering is provided by a Web Service. The service provider provides an ontology that describes entities and the relationships between them. In this example, the classes are DVD, Customer, and Order. The customer can order movies and games (subclasses of the class DVD).
# The customer can order movies and games
:DVD a owl:Class.
:Movie rdfs:subClassOf :DVD.
:Game rdfs:subClassOf :DVD.

# An order can include movies and games;
# the order has properties that define a customer
# and the time of the ordering
:Order a owl:Class.
:hasDVD a owl:ObjectProperty;
    rdfs:domain :Order; rdfs:range :DVD.
:OrderedBy a owl:ObjectProperty;
    rdfs:domain :Order; rdfs:range :Customer.
:hasDate a owl:DatatypeProperty;
    rdfs:domain :Order; rdfs:range xsd:dateTime.

# Class Customer
:Customer a owl:Class.
:hasName a owl:DatatypeProperty;
    rdfs:domain :Customer; rdfs:range xsd:string.
Listing 3.8: EXPRESS Example
The EXPRESS deployment engine generates the URIs. E.g., http://www.example.org/DVD is the URI for the class DVD, and http://www.example.org/Order is the URI for the class Order. URIs are also generated for properties and instances of the classes. E.g., http://www.example.org/customer1 is the URI for an instance of the class "Customer", and http://www.example.org/customer1/hasName is the URI for the property "hasName". The next step is assigning methods to the resources, which defines the HTTP methods for each URI. If there are different types of users in the system, the methods are defined via role-based access control. After the role definition, stubs are created automatically. Firstly, for DVD ordering, the user sends a request to the server via the GET method. The method returns the list of DVDs from the OWL file. Secondly, the user orders items via a POST request to http://www.example.org/Order. The response of the server will be the URI of the new order (http://www.example.org/order1). If the user already exists, the server automatically inserts the URI of this user into the new order. Otherwise, a new user is created and the server returns a new URI. The desired products are added via a PUT request to the server. Listing 3.9 is an example of the message that will be sent to the server.
# The customer "Irina" ordered an item
:customer1 a :Customer ;
    :hasName "Irina".
:order1 a :Order ;
    :hasDVD :Movie ;
    :OrderedBy :customer1.
Listing 3.9: EXPRESS Example
The EXPRESS approach has the following advantages: it eliminates the need to describe services separately [7], it is not as complicated as WSMO and OWL-S, and it uses an OWL ontology to provide "a description of a RESTful Semantic Service" [6]. The disadvantages of the EXPRESS approach are: there is no implementation of this approach, automatic discovery and composition are not yet possible, and the integration of the semantic model into the resource-oriented architecture is not yet implemented [6].
3.2.9 Linked Open Services (LOS)
The LOS approach proposes a service description method that simplifies the access to Semantic Web Services for LOD specialists [63]. The input and output of the services are connected via links to Linked Data. The semantic description presents what kind of input and output RDF data a service can consume and produce, and "how a service invocation contributes to the knowledge of its consumers" [63]. The approach focuses on data description; therefore services can be more easily integrated into service compositions [63]. LOS does not only follow the Linked Data principles, but also proposes "a list of further service-specific principles to be followed for openly exposing services over Linked Data" [63]. These principles are:
• SPARQL graph patterns for the service description of input and output (including the specification of the data format);
• the use of RESTful content negotiation;
• the explicit relation between outputs and inputs;
• optional SPARQL CONSTRUCT queries for lifting or mapping (a sketch follows below).
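As a sketch of the last principle, such a lifting rule could be expressed as a SPARQL CONSTRUCT query; the ex: vocabulary below is a hypothetical service vocabulary, not part of LOS itself:

PREFIX dbpedia:     <http://dbpedia.org/ontology/>
PREFIX dbpediaprop: <http://dbpedia.org/property/>
PREFIX ex:          <http://example.org/service#>

# Lift raw service output into the target vocabulary
CONSTRUCT { ?p a dbpedia:Person ;
               dbpediaprop:name ?name . }
WHERE     { ?p ex:rawName ?name . }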
The approach proposes to transform the RDF input data to a non-RDF format if the service does not accept RDF data. After the data processing, the returned non-RDF output data should be transformed back to the RDF format. The following code snippets show examples of consumption and production patterns for the data that are accepted and returned by a service. The service gets information about actors and returns movies based on the names and birthdays of the actors. The client sends a request which contains information about an actor, namely the name and the birthday (c.f. Listing 3.10).
[ a dbpedia:Person;
  dbpediaprop:name ?name;
  dbpediaprop:birthDay ?b ]
Listing 3.10: Request to a service
The server sends a response in the form of a message after completing the request. The response contains information about the movies (the title, the year, and the actor) (c.f. Listing 3.11).
[ moviedbbase:movie [
    moviedbbase:title ?title ;
    moviedbbase:year ?year ;
    dbpedia:actors ?actor ;
    dbpedia:name ?name ;
    dbpedia:birthDay ?b ]
]
Listing 3.11: Response of a service
The disadvantage is that LOS uses string values to represent graph patterns, e.g. "[a dbpedia:Person; dbpediaprop:name ?name]". This reduces the quality of discovery and composition.
3.2.10 Linked Data Services (LIDS)
LIDS focuses on the integration of Web Services and Linked Data by providing an interface [74]. LIDS follows the Linked Data principles; therefore the following set of requirements has to be fulfilled: a URI of the input of a service is required to invoke this service; the "URI must return a description of the input entity, relating it to the service output data"; and the description has to be modeled according to the RDF standard [75]. The use of URIs as identifiers for input entities has the following advantages: the explicit link between input and output; the entities can be connected to different results; the representation of the result structure by a description; and the meaning of the data by means of an ontology. A Linked Data Service is interlinked with a Linked Data Endpoint. This makes it possible to enrich Linked Data automatically. Additionally, the LIDS approach supports Linked Data publication and the interlinking of Linked Data Endpoints with Linked Data Services. Listing 3.12 presents the basic elements of a description. SPARQL constructs are used to add the relation between data and a service. input represents specific input values and service parameters. endpoint is the URI of a Linked Data Endpoint, which is used to construct service calls. io-relation is the relation between input and output data.
CONSTRUCT { [io-relation] } FROM [endpoint]
WHERE { [input] }
Listing 3.12: LIDS Construct
Listing 3.13 presents a construct expression. The variable ?star will be found by the service. The variable ?movie is the input object of the service, which has the properties dbpediaprop:title and dbpediaprop:year. The service receives the title and year attributes of a movie and, based on these attributes, finds and returns a list of stars.
CONSTRUCT { ?movie dbpediaprop:starring ?star }
FROM [endpoint]
WHERE { ?movie dbpediaprop:title ?title .
        ?movie dbpediaprop:year ?year }
Listing 3.13: LIDS Example
Listing 3.14 shows the basic pattern of LIDS descriptions, which can be added to an ontology, where LIDS is an instance of the Linked Data Service, ENDPOINT is the HTTP URI of the Linked Data Service, ENTITY is the name of the entity, INPUT and OUTPUT are graph patterns, and VARS are the variables or input parameters.
@prefix lids: <http://openlids.org/vocab#> .
LIDS a lids:LIDS;
    lids:lids_description [
        lids:endpoint ENDPOINT ;
        lids:service_entity ENTITY ;
        lids:input_bgp INPUT ;
        lids:output_bgp OUTPUT ;
        lids:required_vars VARS
    ].
Listing 3.14: LIDS basic pattern

Listing 3.15 presents an example of applying the LIDS approach. The example shows a "Movie Find Service" which returns a set of stars based on the year and the title of a movie.
:MovieFindService a lids:LIDS;
    lids:lids_description [
        lids:endpoint <...> ;
        lids:service_entity "movie" ;
        lids:input_bgp "?movie a dbpedia:Work .
                        ?movie dbpediaprop:title ?title .
                        ?movie dbpediaprop:year ?year" ;
        lids:output_bgp "?movie dbpediaprop:starring ?star" ;
        lids:required_vars "title year"
    ].
Listing 3.15: LIDS Example

The work on this approach is not yet finished. The LIDS developers plan to improve the tool support, develop an integration mechanism into SPARQL processing, and add usage policies. The disadvantage of the LIDS approach is that it uses graph patterns which are represented as strings, e.g. lids:input_bgp "?movie a dbpedia:Work". This limits service discovery and composition because the graph cannot be queried.
3.2.11 Data-Fu
The goal of the approach is the specification of data and services that process Linked Data from various data sources. Data-Fu is a "resource-driven programming approach leveraging the combination of REST with Linked Data" [76]. The approach makes it possible to develop applications that access Semantic Web resources using a declarative rule language. It simplifies web application development by providing links to Linked Data and an interaction specification based on resource state. Data-Fu follows three basic principles: the use of URIs for resource identification, the use of HTTP methods to access and process data, and the interlinking of resources. It also notes that Linked Data "does not distinguish explicitly between URI-identified objects and their representation" [76]. The combination of Linked Data with REST brings the ability to manipulate data. Data-Fu provides a mechanism to define changes of resource states. Data-Fu includes two layers:
• Read/Write Linked Data Resource - the application of HTTP methods to Linked Data resources. The most important methods are GET, POST, OPTIONS, DELETE, and PUT. Data-Fu distinguishes safe and non-safe methods. The non-safe methods affect the state of the resource (e.g., the method DELETE, which deletes some datasets). The safe methods do not affect the state of the resources. "The dependency between communicated input and the resulting state of resources also needs to be described" [76]. For example, the method PUT creates or overwrites a resource with the submitted input.
• REST Service Model - a formalized model for the description of interactions that are supported by RESTful services. It describes the influence of HTTP methods on Linked Data resource states and is represented by "a REST state transition system (RSTS)" [76].
Both layers use RDF for the description of methods and resources. The Data-Fu technique also includes an interpreter, an engine which invokes the service interactions. The interactions are specified by Data-Fu rules. An advantage of the engine is its ability to process complex queries at the same time. After processing, the engine can store the data in different formats like JSON or RDF. Listing 3.16 presents a description of Linked Data services using the Data-Fu language. The first part describes an HTTP GET method which returns a movie item. The second part describes a POST method which adds additional information to the movie item (title, year, and a star).
GET (?mid, {})
  <- { ?mid rdf:type ex:MovieID }

POST (?d, {[] rdf:type ex:Description;
           ex:title ?t ;
           ex:year ?y ;
           ex:starring ex:Person. })
  <- { ex:Movie ex:hasID ex:MovieID }.
Listing 3.16: Data-Fu Example
The approach focuses on applying HTTP methods for the interaction with Linked Data processing. The disadvantages of this approach are that it does not define how to discover and compose services and that the querying mechanism is not defined.
3.2.12 Karma
Karma is a tool for integrating data from various data sources and generating semantically interlinked data. An ontology describes the APIs as well as the semantic relations between input and output data. The Karma approach suggests to "represent the semantics of Web APIs in terms of well known vocabularies, and to wrap these APIs so that they can consume RDF from the LOD cloud and produce RDF that links back to the LOD cloud" [78]. Due to the semantic description of the APIs, the description can be queried with SPARQL queries. The modelling includes the following steps: ontology definition, assignment of data to semantic types, and identification of the relationships between the data and the ontology. The model of a linked API consists of two parts: the syntactic part, which provides the required information (e.g., a URI, input parameters) for the service, and the semantic part, which describes the input and output data of a service and the relations between them [78]. Figure 3.6 represents the semantic model of this approach. Each service km:Service has inputs km:hasInput and outputs km:hasOutput that are linked to a model km:Model. The input and output models are defined with use of the Semantic Web Rule Language (SWRL)43. The Model is linked to swrl:Atom instances: swrl:ClassAtom, which describes an instance of a class,
43http://www.w3.org/Submission/SWRL/
Figure 3.6: The ontology description of Web APIs. Source: [78]
and swrl:IndividualPropertyAtom, which presents an instance of a property [78]. The instances of swrl:Variable are the data that the service gets as input or returns as output. For example, how can a service be described with the Karma approach which receives an instance of the class Author as input data and returns an instance of the class Publication as output? The class atoms will be linked to instances of the classes Author and Publication. The individual property atom will have a relation to an rdf:Property - "theAuthorOf". The variables will be the author name and the birth date. The service processes these data and returns the URIs of one or more publications. Listing 3.17 presents a snippet of a service description with use of the Karma approach.
@prefix : <...> .
@prefix dbpedia: <...> .
...
: a km:Service;
    km:hasName "actors" ;
    hrests:hasAddress "http://api.example.org/findactors?
        title={harry}&username={username}"^^hrests:URITemplate ;
    hrests:hasMethod "GET";
    km:hasInput :input;
    km:hasOutput :output .
...
:input a km:Input;
    km:hasAttribute :in_title;
    km:hasModel :inputModel .
:in_title a km:Attribute;
    km:hasName "title" ;
    hrests:isGroundedIn "p1"^^rdf:PlainLiteral .

:output a km:Output;
    km:hasAttribute :out_actor_name ;
    km:hasModel :outputModel .
:out_actor_name a km:Attribute;
    km:hasName "actor_name" .
...
:title_var a swrl:Variable .
:inputModel a km:Model;
    km:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate dbpedia:title;
          swrl:argument1 :title_var ];
...
Listing 3.17: Karma Example
Another goal of the Karma approach is the automatic modelling and optimization of source models. This is based on a graph-based approach which was introduced by the Karma developers. The main focus is the problem of automatic semantic annotation. The approach increases "the quality of the automatically generated models by using the already modelled sources to learn the patterns that more likely represent the intended meaning of a new source" [77]. There are many sources that provide similar semantically linked data. The task of the project is to use already existing resource models in order to derive a new one.
Figure 3.7: Graph-based approach illustrated by an example. Source: [78]
Typically, there are two steps in the modelling process. The first step is the determination of semantic types. This means that each attribute should be "labelled with a class or a data property of the domain ontology" [77]. For example, to invoke the service getEmployees it is required to provide the attributes "employer" and "employee". The domain ontology includes two classes: Person and Organisation. As a result of this step, the attribute "employee" will be labelled with the class "Person", and the attribute "employer" with the class "Organization". The second step is the relationship definition, e.g., a person "worksFor" an organisation (c.f. Figure 3.7). Once such a graph is constructed, labelling the attributes with semantic types and searching for appropriate nodes can be performed using machine learning techniques. Next, the models are scored in order to find the one that matches the most coherent and frequent patterns, and a tree is built for generating candidate models. The last step is the generation of a ranking list from which the users can choose the correct model. Additionally, the new version of Karma can include a direct mapping between data stored in relational databases and domain ontologies using W3C's R2RML44. This mapping language is introduced in the following section.
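As a minimal sketch of the outcome of these two steps (the domain ontology IRI and the property names are assumptions for illustration, not Karma's actual output), the learned source model for getEmployees could be expressed as:

@prefix ex: <http://example.org/ontology#> .   # hypothetical domain ontology

# Step 1: attributes labelled with semantic types
ex:employee a ex:Person .
ex:employer a ex:Organization .

# Step 2: relationship between the labelled attributes
ex:employee ex:worksFor ex:employer .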
3.2.13 RDB to RDF Mapping Language (R2RML)
In order to make the semantic model more flexible, the mapping from relational databases to RDF is very important. The automatic mapping can increase the data volume that can be used for specific tasks. The suggestion is to use the RDB to RDF Mapping Language (R2RML) for automatic dataset generation and combination with the existing web service description models. This part of the Master Thesis is based on the W3C recommendation for R2RML [72]. R2RML is a language for the transformation of "relational database datasets to RDF datasets". The language takes the database structure as input and returns the structure of the new RDF dataset. The transformation to an RDF graph occurs via SPARQL constructs. The target RDF vocabulary is composed of the database element names; therefore it is not possible to change the RDF structure or vocabulary. Figure 3.8 presents the meta-model of R2RML. It includes the following elements: "triplesMap, LogicalTable, PredicateObjectMap, GraphMap, SubjectMap, PredicateMap, ObjectMap, RefObjectMap and Join". The input can be an SQL query to the database. The code below presents an example of an SQL query that selects data about movies (title and date) from a movie database.
[] rr:sqlQuery """
    SELECT ('Movie' || MOVIENO) AS MOVIEID
         , MOVIENO
         , TITEL
         , DATE
    FROM LW.MOVIE
    """;
   rr:sqlVersion rr:SQL2008.
Listing 3.18: R2RML
44http://www.w3.org/TR/r2rml/
The rules for the mapping of a relational dataset to RDF are specified via a TriplesMap, which has exactly one logical table, one subject map, and zero or more predicate-object map properties. The logical table describes the set of data that has to be mapped to RDF.
[] rr:logicalTable [ rr:tableName "MOVIE" ];
   rr:subjectMap [ rr:template
       "http://linkedwidget.org/moviedataset/{MOVIENO}" ];
   rr:predicateObjectMap [
       rr:predicate lw:titel;
       rr:objectMap [ rr:column "TITEL" ];
   ];
   rr:predicateObjectMap [
       rr:predicate lw:date;
       rr:objectMap [ rr:column "DATE" ];
   ].
Listing 3.19: R2RML
The subject map property describes the way subjects are generated. It can reference one or more rr:class properties (c.f. Listing 3.20). The value of the property is an IRI.
input:  [] rr:template "http://linkedwidget.org/moviedataset/{MOVIENO}" ;
           rr:class lw:Movie.
output: rdf:type lw:Movie.
Listing 3.20: R2RML
Figure 3.8: An overview of R2RML
The predicate-object map is "a function that creates one or more predicate-object pairs for each logical table row of a logical table" [72]. The predicate-object map is linked to one or more predicate maps, and one or more object maps or referencing object maps. The term map is "a function that generates an RDF term from a logical table row" [72]. The term map relates to the following RDF terms:
• Constant value (via rr:constant), represented by a resource.
• Column name (via rr:column), a valid SQL identifier.
• String template (via rr:template), a format string for building strings from multiple components.
• rr:IRI, rr:BlankNode, rr:Literal (via rr:termType), defines the type of an RDF term, which can be either an IRI, a blank node, or a literal.
• Language tag (via rr:language).
• rdfs:Datatype (via rr:datatype).
• String template (via rr:inverseExpression), for term map optimisation.
"A term map must be exactly one of the following: a constant-valued term map, a column-valued term map, a template-valued term map" [72].
[] rr:predicateMap [ rr:constant rdf:type ];
   rr:objectMap [ rr:constant lw:Movie ].
?x rdf:type lw:Movie.

[] rr:objectMap [ rr:column "MOVIENO";
                  rr:datatype xsd:positiveInteger ].
Listing 3.21: R2RML
Relations mapping. It is possible to add a reference between two instances instantiated from the database, for example, the relation between movies and the actors who have played in them. This is realized by adding a predicate-object map that references a triples map and a join condition via rr:parentTriplesMap and rr:joinCondition. The join condition has exactly one value of the property rr:child and one value of the property rr:parent. The following code presents the resulting SQL query if the referencing object map has no join condition.
SELECT * FROM ({child-query}) AS tmp
Listing 3.22: R2RML
The second code presents the SQL query if the referencing object map has at least one join condition.
SELECT * FROM ({child-query}) AS child ,
              ({parent-query}) AS parent
WHERE child.{child-column1} = parent.{parent-column1}
  AND child.{child-column2} = parent.{parent-column2}
  AND ...

[] rr:predicateObjectMap [
     rr:predicate lw:movie;
     rr:objectMap [
       rr:parentTriplesMap <#TriplesMap2>;
       rr:joinCondition [
         rr:child "MOVIENO" ;
         rr:parent "MOVIENO" ;
       ];
     ];
   ].
Listing 3.23: R2RML
The result is a triple of the form <movie IRI> ex:starring <actor IRI>.
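Assuming the subject templates above and hypothetical row values, the generated triple could look like:

# Both IRIs are hypothetical examples built from subject templates
<http://linkedwidget.org/moviedataset/1>
    ex:starring <http://linkedwidget.org/actordataset/7> .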
3.3 Summary
This chapter presented some of the existing mashup platforms (Yahoo!Pipes, DERI Pipes, BIO2RDF), the tools that are provided by LOD2, and semantic approaches for enhancing web services with additional semantic information. The first part of the analysis showed that the applications have a number of weak points:
• Most of the applications are not general, i.e. the focus is usually a specific problem. For example, BIO2RDF provides just Life Science Linked Data, LIMS is based on the estimation of similarity between instances [8], etc.
• The systems do not offer the possibility to develop new functions that can solve additional tasks.
• The mashup platforms are not described semantically; therefore composition and discovery are very difficult.
• For non-professional users it is often difficult to use these applications because specific knowledge is needed.
The second part of the analysis described the advantages and disadvantages of the semantic description approaches. The approaches can be categorized into the following types of service description approaches: approaches that focus on technical aspects of web services, approaches that focus on the integration of web services and Linked Data, approaches that focus on the quality of the ontology models, and approaches that focus on mappings to transform different types of data into RDF.
The main focus of the first group is the representation of the interaction between software components. Most of them do not describe explicit relations between input and output data. Additionally, the developer should have very good knowledge in this domain: the developer needs to describe the preconditions for the Web Service execution, the postconditions, and the effects. Moreover, the developer should describe very detailed rules as well as the choreography and orchestration of the service. In comparison to web services, widgets do not have such a wide variety of functionalities that must be described. Additionally, the mashup platform supports widget development. Often knowledge workers do not have enough practical experience in service-oriented architecture; therefore the mashup platform should provide automatic generation of the semantic widget descriptions. Due to this fact, applying these approaches for widget description is not possible.

The second group of approaches focuses on the description of relations between Linked Data and Semantic Web Services. This is advantageous because widgets process Linked Data. But most of them have limitations; for example, LIDS and LOS "integrate data services with Linked Data by assigning a URI to each service invocation. The service URI is linked to resources in the Linked Data cloud and dereferencing the URI provides RDF information about the linked resources" [78]. The input and output graphs are represented as strings. This limits widget discovery and composition. Additionally, it should be possible to query the semantic descriptions with SPARQL. Data-Fu and EXPRESS do not support easy service querying; therefore these approaches are not applicable for widgets. An approach which can support widget publishing, widget discovery, widget composition, and widget execution is Karma. An example of a widget description with use of the Karma ontology is provided in Chapter 4.

The third and fourth groups of approaches are not relevant at this stage of the mashup platform implementation. In the future, it may be possible to extend the semantic model with the relational database to RDF mapping (R2RML) in order to increase the amount of datasets that can be processed by the mashup platform.

The following benchmarking tables 3.1 and 3.2 summarize the features of the approaches that are described in this chapter:

• Possibility to publish service descriptions on the LOD cloud. Does the approach follow the Linked Data principles to make information available on the LOD Cloud?
• Discovery and composition based on input and output data. Does the approach support a description of input and output data based on which discovery and composition of the services is possible?
• Provenance information. Is it possible to define the origin of data sets?
• Description of relations between data. Does the approach support semantic relations between input and output data?
• Separation of presentation and data level.
• Complexity. How much time does the developer need to spend to become familiar with the approach?
• Possibility to discover the service using SPARQL. Is it possible to query the models?
Feature | WSDL | SAWSDL | OWL-S | WSMO | WSMO-Lite | SA-REST
Goal | description of web service functionality | adding semantic annotations to WSDL | semantic description of services | semantic description of services | adding semantics to the service description | adding semantic annotations to service functionalities
Method | description of service endpoints and their messages | extension of WSDL | description logic (OWL) expressions | F-Logic logical expressions | an annotation mechanism for WSDL using this ontology | adding the annotation in the service description
Possibility to publish service descriptions on the LOD cloud | no | no | no | no | no | no
Discovery and composition based on input and output data | no | the concepts from the semantic models are referenced from within WSDL components as annotations | via OWL-S process models (based only on input/output data, ignoring relations) | complicated, hard to implement | no | no
Provenance information | no | it is possible to extend the model | it is possible to extend the ontology | no | no | difficult
Description of relations between data | no | no | no | no | no | no
Separation of presentation and data level | yes | yes | yes | yes | yes | no
Complexity | yes | yes | yes | yes | yes | no
Possibility to discover the service using SPARQL queries | no | no, an extension is required [41] | provides a basic function for discovery; it is required to extend the model, e.g. [30] | provides a basic function for discovery; it is required to extend the model | no | no

Table 3.1: Approaches comparison. Part 1

Feature | RESTDesc | EXPRESS | LOS | LIDS | Data-Fu | Karma
Goal | adding semantic description of service functionalities | adding semantic description of services | semantic description of services that process Linked Data | semantic description of services that process Linked Data | semantic description of services that process Linked Data | integration of data from different sources
Method | describing precondition and postcondition of a service | description of services with use of OWL | applying SPARQL constructs for service description | semantic description of services following Linked Data principles | using a declarative rule language | providing semantic description of data and API resources
Possibility to publish service descriptions on the LOD cloud | no | no | no | yes | no | yes
Discovery and composition based on input and output data | yes | no | difficult, because of using string values for graph patterns | difficult, because of using string values for graph patterns | no | yes
Provenance information | no | no | no | no | no | no
Description of relations between data | yes | yes | yes | yes | yes | yes
Separation of presentation and data level | yes | yes | yes | yes | yes | yes
Complexity | no | no | no | no | no | yes
Possibility to discover the service using SPARQL queries | no | no | no, because of using string values for graph patterns | no, because of using string values for graph patterns | no | yes

Table 3.2: Approaches comparison. Part 2
CHAPTER 4 Solution
4.1 Definition of requirements
Figure 4.1 presents an example of a mashup. The mashup provides a combination of wired widgets, simple applications that provide some functionality for data processing or visualization, such as "Location" and "Air Quality Filter". The main components of a widget are its input/output terminals and its options. The input and output terminals are used to wire the widgets in order to process the data. Additionally, the widgets include options, i.e., inputs that influence the data processing, such as "Choose location type", "Street", and "Maximum distance". The widgets can be categorized into the following types: data widgets, which access data sources and retrieve data; processing widgets, which process data that were retrieved from other widgets (e.g., "geo merger"); presentation widgets, which visualize data sets in the form of diagrams, maps, etc.; and user interaction widgets, which are used to provide additional functionality, e.g., item selection (a sketch of this categorization is given after Figure 4.1).
Figure 4.1: Mashup example
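This categorization could, purely as an illustration, be expressed in RDF; the widget classes and instance names below are hypothetical and are not part of the model developed in this thesis:

@prefix lw: <http://example.org/lw#> .        # hypothetical vocabulary
@prefix :   <http://example.org/widgets#> .   # hypothetical widget IRIs

:location   a lw:DataWidget .            # accesses a data source and retrieves data
:geoMerger  a lw:ProcessingWidget .      # processes data retrieved from other widgets
:mapView    a lw:PresentationWidget .    # visualizes data sets, e.g. on a map
:itemPicker a lw:UserInteractionWidget . # provides interaction such as item selection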
A goal of this thesis is to develop a semantic model that supports publishing the widgets on the LOD Cloud, widget discovery, widget composition and execution, and the selection of the required input from the provided context information based on the semantic model. The previous two chapters have given an overview of the principles of the Semantic Web and Linked Data, and of Semantic Web Service description. Based on these principles, the following basic requirements and widget features can be specified:
1. Widgets and mashups are identified via an identifier - a URI. User agents may dereference the widgets via these URIs. The user will have the possibility to share and publish information about unique widgets and mashups.
2. By dereferencing a widget URI, the semantic model is returned. Widgets have semantic models that describe what kind of data a widget can retrieve and process.
3. The model should follow web standards (W3C recommendations), e.g., the use of Semantic Web standards for data description (RDF, PROV). "The use of standards enables the Web to transcend different technical architectures" [36]. The use of standardized content formats makes it possible to process and publish data on the Web. RDF is used to present the data structure and enables the integration of information from multiple sources. Since the widget description is presented with use of the RDF standards, it should be possible to discover and compose the widgets.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into a data space and to navigate over the data sets. For example, a link adds the relationship "owns" between an owner and his/her pet. The mashup platform can find the URI of a widget that retrieves RDF data describing pet owners. Following the "owns" links, the mashup platform can find widgets that can process data about their pets.
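A minimal sketch of such a link (all IRIs and the ex: vocabulary are hypothetical):

@prefix ex: <http://example.org/ontology#> .
@prefix :   <http://example.org/data#> .

:anna a ex:PetOwner ;
    ex:owns :rex .      # the link the mashup platform can follow

:rex a ex:Pet ;
    ex:name "Rex" .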
5. A widget may have more than one semantic model, but all of them should generate the same output with an explicit relation to the input graph. An example is finding geographical coordinates based on different types of location, such as parks, organisations, and libraries, that have different properties and can be consumed from various Linked Data Endpoints. For example, organisations can be consumed from the DBPedia Endpoint, and places can be consumed from Open Governmental Data like US Governmental Data1 or similar Linked Data Endpoints. The output of the widget will be a set of points that are modelled according to the GeoNames Ontology2 and related to the location models.
1http://www.data.gov/ 2http://www.geonames.org/ontology/documentation.html
6. The input and output data should be interlinked, and explicit relations between the data should be defined. Figure 4.2 presents an example of the widget "DBPedia Film Merger". This widget can have instances of the classes dbpedia:Person and dbpedia:Work as input and output. The explicit relations in this case are dbprop:starring and dbprop:directorOf.
7. The model should be general in order to support various types of widgets, e.g., data widgets and presentation widgets.
8. The semantic model should provide "an explicit representation of provenance information that is accessible to machines, not just to humans" [85]. The semantic model should provide information about the origin and ownership of datasets, change tracking, and access control, which will increase people's trust in the data quality.
In the previous chapter, approaches for the semantic description of web services have been compared according to features that are relevant for widgets, such as techniques of composition and discovery, the possibility to add relationships between input and output data, the possibility to publish services on the LOD cloud, provenance information, etc. The benchmarking tables show that the approach provided by Karma satisfies nearly all requirements for the semantic model. The following section provides an implementation of the semantic model based on this approach.
4.2 Use and Extension of Karma Approach
Figure 4.3 shows a semantic model for widget description based on Karma approach. The model represents the semantics of widgets including relationships between input and output data, and it uses RDF so that models can be queried using SPARQL [45]. The model
Figure 4.2: Widget & Semantic Model
Figure 4.3: Linked Widget Model based on the Karma approach
has two kinds of properties, lw:hasInput and lw:hasOutput, that are linked to a Model (property lw:hasModel). The SWRL vocabulary is used to define the input and output data. SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of the OWL Web Ontology Language with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language [38]. SWRL allows writing rules expressed in terms of OWL concepts. An swrl:ClassAtom entity shows the membership of an instance in a class, a swrl:DatavaluedPropertyAtom presents an instance of a data property (e.g., an entity of the class dbpedia:Work has the property dbprop:hasTitle, where the title is a string value), and an swrl:IndividualPropertyAtom entity describes an instance of a property. Figure 4.2 shows a widget which finds films. The widget receives datasets that contain information about some stars and directors. The first terminal is used to wire the widget with a widget that returns datasets about stars. The second terminal is used to wire the widget with a widget that returns datasets about directors. The widget is identified by a URI, e.g. http://www.linkedwidgets.org/widget/w5, and has two inputs that are connected with the models mw5:starModel and mw5:directorModel and one output that is connected with the model mw5:filmModel. The models define the data that the widget processes and add relationships between these data with use of the SWRL vocabulary. In this example there are three models because the widget processes a set of stars and a set of directors in order to return a set of films. Each kind of instance needs a semantic description. The models are depicted in Figures 4.4, 4.5, and 4.6:
• The first picture presents the model of the first input - a set of stars. A star is an instance of the class dbpedia:Person that has the property dbprop:starring.
• The second picture presents the model of the second input - a set of directors. A director is an instance of the class dbpedia:Person that has the property dbprop:director.
• The third picture presents the model of the output - a set of films. The relationships between the star class and the film class, and between the director class and the film class, are described using the instances mw5:PropertyAtom1 and mw5:PropertyAtom2 of the class swrl:IndividualPropertyAtom.
Figure 4.4: The star model
The following code presents a part of a widget description that can be published on the LOD cloud.
@prefix mw5: <...> .
@prefix ontology: <http://dbpedia.org/ontology/> .
...
mw5:Widget a lw:Widget;
    lw:hasName "Movie Widget" ;
    lw:hasInput mw5:Input1;
    lw:hasInput mw5:Input2;
    lw:hasOutput mw5:Output.

mw5:StarModel a lw:Model.
mw5:DirectorModel a lw:Model.
mw5:FilmModel a lw:Model.
...
mw5:Output a lw:Output;
    km:hasModel mw5:FilmModel.
...
mw5:FilmModel a lw:Model;
    lw:hasAtom
        [ a swrl:IndividualPropertyAtom ;
          swrl:propertyPredicate dbprop:starring;
          swrl:argument1 mw5:star ;
          swrl:argument2 mw5:film ];
    lw:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate ontology:Work;
          swrl:argument1 mw5:film ];
    ...
Listing 4.1: Widget Model represented formally

Figure 4.5: The director model

Figure 4.6: The film model
Moreover, the semantic model should support automatic widget matching and execution. Listing 4.2 shows a SPARQL query that searches for widgets that contain a specific kind of semantic relation, dbprop:starring.
SELECT ?widget ?name ?variable1 ?variable2
WHERE {
  ?widget lw:hasInput [ lw:hasModel
    [ lw:hasAtom [ swrl:propertyPredicate dbprop:starring;
                   swrl:argument1 ?variable1;
                   swrl:argument2 ?variable2 ] ] ].
  ?widget lw:hasOutput [ lw:hasModel
    [ lw:hasAtom [ swrl:propertyPredicate dbprop:starring;
                   swrl:argument1 ?variable1;
                   swrl:argument2 ?variable2 ] ] ].
  ?widget lw:hasName ?name.
}
Listing 4.2: SPARQL query
Figure 4.7 shows the result of the query.
Figure 4.7: Results
The second SPARQL query searches for a widget which can produce a set of films. DBPedia does not have a special class for films; for the film definition, the class ontology:Work with the relation dbprop:starring is used.
SELECT ?widget ?name ?variable
WHERE {
  ?widget lw:hasOutput [ lw:hasModel
    [ lw:hasAtom [ swrl:classPredicate ontology:Work;
                   swrl:argument1 ?variable ],
                 [ swrl:propertyPredicate dbprop:starring;
                   swrl:argument1 ?variable2;
                   swrl:argument2 ?variable ] ] ].
  ?widget lw:hasName ?name.
}
Figure 4.8 shows the result of the query.
Figure 4.8: Results
The third SPARQL query finds data similar to the data that a widget processes. These can be links to internal resources or links to external data resources, for example, an equivalent entity. In this case, the property owl:sameAs is often used. It means "that two URI references actually refer to the same thing: the individuals have the same identity" [11]. For example, the entity http://dbpedia.org/page/Angelina_Jolie has the property owl:sameAs that relates it to http://de.dbpedia.org/resource/Angelina_Jolie in the German DBPedia and to freebase:Angelina_Jolie from Freebase3.
SELECT ?widget ?name ?variable
WHERE {
  ?widget lw:hasInput [ lw:hasModel
    [ lw:hasAtom [ swrl:propertyPredicate owl:sameAs;
                   swrl:argument1 ?variable ] ] ].
  ?widget lw:hasName ?name.
}
Figure 4.9 shows the result of the query. Due to the fact that the mashup platform may support widget development, it is very hard to develop a user interface which is clear for end users and supports this kind of model. Widget discovery can also be problematic because creating such queries is difficult for the end user. Additionally, the model should be easily extendable because of the substantial growth in the amount of available widgets, which can have more complex features.
3http://www.freebase.com/
Figure 4.9: Results
4.3 Widget Model
Due to the fact that it is difficult to apply the web service description approaches, a dedicated semantic model for widget description had to be implemented. Figure 4.10 depicts this model. According to this description model, each widget has three types of models: input model, output model, and model, which can be connected via three types of relationships (lw:hasInputModel, lw:hasOutputModel, and lw:hasModel) to an instance of a widget. The models contain specific kinds of semantic relations. An instance of the class lw:InModel has a direct link to Linked Data instances via the property lw:hasInNode and describes the kind of semantic relation which a widget has as input. An instance of the class lw:OutModel has a direct link to Linked Data instances via the property lw:hasOutNode and describes the kind of semantic relation which a widget has as output. The class Model has the property lw:hasNode and describes the full semantic model which is processed by a widget. Listing 4.3 presents a use case. A widget might have more than one input model, but the output model is unique. The widget receives instances of dbpedia:Person and returns an instance of the class dbpedia:Work. A work can be found either by providing the name of a director or the name of a star. There are two kinds of persons in the widget model, namely star and director. dbpedia:starring is the relation between the input and output models that shows that the person is a star. dbpedia:director is the relation between the input and output models that shows that the person is a director.
Figure 4.10: Widget Model

@prefix : <...> .
@prefix lw: <...> .
@prefix db: <...> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .

:film a db:Work.

:star a db:Person;
    dbpedia-owl:starring :film.
:director a db:Person;
    dbpedia-owl:directorOf :film.

:inM1 a lw:InModel;
    lw:hasInNode :star;
    lw:hasInNode :nameStringObject1;
    lw:hasInNode :nameStringObject2;
    lw:hasInNode :nameDateObject.

:inM2 a lw:InModel;
    lw:hasInNode :director;
    lw:hasInNode :nameStringObject1;
    lw:hasInNode :nameStringObject2;
    lw:hasInNode :nameDateObject.

:outM a lw:OutModel;
    lw:hasOutNode :star;
    lw:hasOutNode :director.

:m a lw:Model;
    lw:hasNode :star;
    lw:hasNode :director.

:Widget a lw:Widget;
    lw:hasName "Movie Agent Widget" ;
    lw:hasInputModel :inM1;
    lw:hasInputModel :inM2;
    lw:hasOutputModel :outM;
    lw:hasModel :m.
Listing 4.3: A Widget Model
The code above shows a semantic description using the semantic model which was introduced in this section. Figure 4.11 presents this semantic description in a graphical notation. This semantic model has the following advantages:
• it supports a more natural way of widget description,
• it is easily extendable,
• the direct interlinking of widget models with Linked Data provides a clear definition of the semantic relations,
• the semantic repository of widgets can be queried in a clear and efficient way to find the appropriate widgets.
Moreover, the semantic model follows the Semantic Web standards, and the direct interlinking to Linked Data supports better querying of the Linked Data sets. The model lw:Model defines the explicit relations between input and output data. Therefore, the input and output models can be used to create the appropriate queries for finding specific kinds of semantic relations, extracting required data from Linked Data datasets, searching for widgets that can consume a specific dataset or produce the required output data, or selecting the required input from the provided context data. Querying examples are provided in Chapter 5.
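For example, a query for widgets that can consume persons could, based on this model, look like the following sketch (prefixes as in Listing 4.3):

SELECT ?widget ?name
WHERE {
  ?widget a lw:Widget ;
          lw:hasName ?name ;
          lw:hasInputModel [ lw:hasInNode ?node ] .
  ?node a db:Person .
}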
Figure 4.11: Widget Model
4.4 DCAT
A requirement for Linked Widgets is to provide information about the origin and ownership of datasets and to increase the interoperability between widgets. A possibility to provide these additional features is to model them according to the Data Catalogue Vocabulary4 (DCAT). W3C defines the vocabulary as "a RDF vocabulary that has been designed to facilitate interoperability between data catalogs published on the Web" [55]. Figure 4.12 depicts the use of the DCAT vocabulary, which has been adapted to widget description. For the semantic model of the widgets, the following namespaces are used:
Prefix | Namespace IRI | Description
dcat | http://www.w3.org/ns/dcat# | DCAT is "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web" [55].
dct | http://purl.org/dc/terms/ | Dublin Core Schema
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | c.f. Chapter 2
rdfs | http://www.w3.org/2000/01/rdf-schema# | c.f. Chapter 2
foaf | http://xmlns.com/foaf/0.1/ | "FOAF is a project devoted to linking people and information using the Web. Regardless of whether information is in people's heads, in physical or digital documents, or in the form of factual data, it can be linked" [22]
skos | http://www.w3.org/2004/02/skos/core# | SKOS is "a common data model for sharing and linking knowledge organization systems via the Semantic Web" [59]
vcard | http://www.w3.org/2006/vcard/ns# | VCard is a vocabulary designed for the description of organisations and people.

Table 4.1: Prefix and Namespaces
The semantic model includes the following properties and classes:
• dct:title - a name given to the widget;
• dct:description - description of the widget;
4http://www.w3.org/TR/vocab-dcat/
Figure 4.12: Extension of the Semantic Widget Model with DCAT
• dct:issued - date of the formal issuance of the widget;
• dct:modified - most recent date on which the widget (in general) was changed, updated, or modified;
• dct:language - language;
• dcat:keyword - keywords or tags describing the widget;
• dcat:contactPoint - link to contact information, which is provided using the VCard vocabulary;
• dct:temporal - the temporal period that the dataset covers (for data cubes);
• dct:publisher - an entity responsible for widget creation and publishing; a link to foaf:Agent (persons, organizations, or groups of any kind);
• dcat:theme - the main topic of the widget;
• skos:Concept - a category or a theme used for describing, categorizing, and organising datasets;
• skos:ConceptScheme - the knowledge organization system used to represent the concepts of widgets.
Listing 4.4 demonstrates an example of applying the DCAT vocabulary to Linked Widgets. The widget has the title "Movie Widget" and a relationship to the media theme (:media is an instance of skos:Concept).
:Widget a lw:Widget ;
    rdfs:label "Widget 1"^^xsd:string ;
    lw:hasInModel :inM1, :inM2 ;
    lw:hasModel :m ;
    lw:hasOutModel :outM ;
    lw:name "Movie Widget"^^xsd:string ;
    dct:title "Search for movie"^^xsd:string ;
    dct:description "The widget searches for movies
        based on actors and directors name" ;
    dct:issued "2014-01-10"^^xsd:date ;
    dct:modified "2014-01-12"^^xsd:date ;
    dcat:keyword "movie, film, actor, star"^^xsd:string ;
    dct:publisher :tuvienna ;
    dcat:theme :media .

:tuvienna a org:Organization , foaf:Agent ;
    rdfs:label "University of Technology Vienna" .
Listing 4.4: DCAT example
Listing 4.5 shows a SPARQL example for searching for widgets that have "media" as the main theme and have been issued on 2014-01-10 by :tuvienna.
SELECT ?w ?title
WHERE {
  ?w a lw:Widget;
     dct:title ?title;
     dcat:theme :media;
     dct:issued "2014-01-10"^^xsd:date;
     dct:publisher :tuvienna .
}
Listing 4.5: SPARQL example
DCAT provides the following benefits for mashups: it increases findability, enables the description of data that are located in different Linked Data endpoints, and provides a better search for widgets.
4.5 Provenance
An important requirement for Linked Widgets is extending the widget description by adding provenance information. The W3C Provenance Incubator Group defines provenance as "a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing" [35]. In other words, the meta-data may include information about:
• the creator of the data (author, reviewer, etc.);
• the version of the data sets (the data are changed often);
• the data sources of the information; in the case of data integration, it is necessary to describe which part comes from which data sets;
• the description of rules, vocabularies, ontologies, etc.
"The Provenance Family of Documents (PROV) defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. The goal of PROV is to enable the wide publication and interchange of provenance on the Web and other information systems. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core" [35]. There is a set of 12 documents that the W3C group defined for adding provenance: PROV-OVERVIEW5, PROV-PRIMER6, PROV-DM7, PROV-N8, etc. Figure 4.13 shows the organisation of the PROV documents. The colors in the figure indicate which category of users the documents are focused on:
• light blue is for users (understanding and supporting provenance);
• blue is for developers (creating and consuming provenance);
• pink is for advanced users (creating new PROV serializations or other applications based on provenance).
The common vocabulary is defined by the conceptual data model (PROV-DM). The users and the developers use the set of constraints (PROV-Constraints9) for constructing valid provenance expressions. The formal semantics (a declarative specification) is defined by PROV-SEM10. Furthermore, the developers use provenance access (PROV-AQ11), the linking of provenance information (PROV-Links12), dictionary-style collections (PROV-Dictionary13), and the Dublin Core vocabulary (PROV-DC). The approach suggests the use of the PROV ontology [50] (PROV-O, a standard lightweight vocabulary) for adding meta-information about the provenance of information. The W3C Provenance Incubator Group describes PROV-O as "an OWL2 ontology allowing the mapping of the PROV data model to RDF". The PROV ontology includes a set of classes, properties, and restrictions for the representation of the information. Table 4.2 shows the namespaces which are used by PROV-O. The three basic classes of PROV-O are:
5http://www.w3.org/TR/prov-overview/
6http://www.w3.org/TR/prov-primer/
7http://www.w3.org/TR/prov-dm/
8http://www.w3.org/TR/prov-n/
9http://www.w3.org/TR/prov-constraints/
10http://www.w3.org/TR/prov-sem/
11http://www.w3.org/TR/prov-aq/
12http://www.w3.org/TR/prov-links/
13http://www.w3.org/TR/prov-dictionary/
Figure 4.13: PROV documents. Source: [35]
Prefix | Namespace IRI
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns#
xsd | http://www.w3.org/2000/10/XMLSchema#
owl | http://www.w3.org/2002/07/owl#
prov | http://www.w3.org/ns/prov#

Table 4.2: Prefix and Namespaces
• A prov:Entity is a kind of thing with some fixed aspects (real or imaginary).
• A prov:Activity is an event that happens over a period of time with entities (e.g. include consuming, transforming, using, etc.).
• A prov:Agent is an agent that bears responsibility for an activity.
The relations between the classes Entity, Agent, and Activity are shown in Figure 4.14. The properties prov:startedAtTime and prov:endedAtTime show the start and end times of activities. The entities can be used and generated by activities (the properties prov:used and prov:wasGeneratedBy). Additionally, some dependency information between activities can be provided via prov:wasInformedBy. This provides "some dependency information without explicitly providing the activities' start and end times" [50]. For example, the activity "creationCollection" calls an additional activity "aggregationByTopic" to subscribe a widget to a theme.
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <...> .

:creationCollection
    a prov:Activity ;
    prov:wasInformedBy :subscribeActivity .

:subscribeActivity
    a prov:Activity ;
    prov:wasInfluencedBy :aggregationByTopic ;
    # aggregation of widgets by topics
    prov:wasAssociatedWith :irina .

:irina a prov:Agent .

:aggregationByTopic a prov:Activity .
Listing 4.6: PROV-O
The property prov:wasDerivedFrom is used for the definition of provenance chains (the transformation of one entity into another). For example, a new dataset can be the result of filtering another dataset. "Arbitrary RDF properties can be used to describe the fixed aspects of an Entity that are interesting within a particular application" [50] (e.g., the format of the dataset). The responsibilities of an agent can be shown via prov:wasAssociatedWith and prov:wasAttributedTo. The property prov:actedOnBehalfOf describes an agent's responsibility for another agent that relates to the influenced Activity or Entity. The following code presents a part of a description with use of DCAT.
:movieWidget a lw:Widget ;
    dct:title "Search for films" ;
    dct:creator :irina ;
    dct:contributor :peter ;
    dct:created "2013-12-01" ;
    dcat:theme :Media .
...

:irina a dct:Agent .

:tuvienna a dct:Agent .

:Media a skos:Concept ;
    dct:creator :irina .
...
Listing 4.7: A part of a Widget Description
Figure 4.14: Relation between three basic classes
Figure 4.15: Relation between the basic classes
Figure 4.15 represents the transformation from the semantic description modelled following the DCAT vocabulary to the semantic description modelled with use of PROV-O. In this case, the entity is :movieWidget (an instance of the class lw:Widget). There are two agents that are responsible for the action affecting the entity :movieWidget: :irina and :peter. The action is :creatingTheWidget, which describes how the entity, namely the lw:Widget, has been created or changed. The properties prov:startedAtTime and prov:endedAtTime describe the date of the first creation of the widget and the date of the last change. The ontology described above can be extended via additional terms (c.f. Figure 4.16). These additions can be divided into five categories [50]:
1. The class prov:Agent has three subclasses: prov:Person - for people; prov:Organization - for companies, social institutions, societies, etc.; and prov:SoftwareAgent - for running software. The class prov:Entity divides into: prov:Collection, which provides structure to some entities; prov:Bundle - a set of provenance descriptions; and prov:Plan - a set of actions.

Figure 4.16: The extended terms
2. The property prov:specializationOf presents "an entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter" [50]. Alternate entities can be presented using the prov:alternateOf property.
3. The property prov:atLocation defines a prov:Location for the entities.
4. The lifetime of entities that are generated by an activity and used by other activities is defined by prov:invalidatedAtTime, prov:wasInvalidatedBy, etc.
5. The lifetime of an activity is the time between its start and end times.
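A minimal sketch of some of these terms (all entity IRIs are hypothetical):

@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org/data#> .   # hypothetical IRIs

# Specialization and alternates (category 2)
:widgetV2 prov:specializationOf :movieWidget ;
          prov:alternateOf :widgetV1 .

# Location (category 3)
:movieWidget prov:atLocation :viennaServer .
:viennaServer a prov:Location .

# End of an entity's lifetime (category 4)
:widgetV1 prov:invalidatedAtTime "2014-01-12T00:00:00Z"^^xsd:dateTime .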
Figure 4.17 and the following code provide an example of using the additional terms (three types of agents: person, organization, and software).
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <...> .
@base <...> .

<>
    a prov:Bundle, prov:Entity ;
    prov:wasAttributedTo :postEditor ;
    prov:generatedAtTime "2011-07-16T02:52:02Z"^^xsd:dateTime .

:irina
    a prov:Person, prov:Agent ;
    ## prov:Agent is inferred from prov:Person
    foaf:givenName "Irina" ;
    foaf:mbox <...> ;
    prov:actedOnBehalfOf :tuvienna .

:tuvienna
    a prov:Organization, prov:Agent ;
    ## prov:Agent is inferred from prov:Organization
    foaf:name "TU Vienna" .

:widgetSystem
    a prov:SoftwareAgent, prov:Agent ;
    ## prov:Agent is inferred from prov:SoftwareAgent
    foaf:name "Linked Widget" .

:movieWidget
    a prov:Entity ;
    sioc:title "Find me a movie" ;
    prov:generatedAtTime "2013-08-16T01:01:01Z"^^xsd:dateTime ;
    prov:wasGeneratedBy :creatingTheWidget12 .

:creatingTheWidget12
    a prov:Activity ;
    prov:startedAtTime "2013-08-16T01:01:01Z"^^xsd:dateTime ;
    prov:wasStartedBy :irina ;
    prov:wasAssociatedWith :widgetSystem ;
    prov:generated :movieWidget ;
    prov:endedAtTime "2013-08-16T03:52:02Z"^^xsd:dateTime ;
    prov:wasEndedBy :irina .
Listing 4.8: Additional terms of PROV-O
Figure 4.17: Relation between the basic classes
CHAPTER 5 Results and Evaluation
5.1 Resulting Semantic Model
Figure 5.1 presents the semantic model based on the semantic widget model, DCAT, and PROV-O, which were described in the previous chapter. The model describes a possible way to bring the ontologies together in order to satisfy the requirements for the mashup system. It includes the most important classes that cover all requirements. If additional classes or properties are needed, it is possible to extend the semantic model. The semantic model has been extended with the following properties and classes (a combined usage sketch follows the list):
• dcat:theme - for widget classification, e.g. media, actors, science, etc.
• dct:publisher - for providing information about the creators of widgets. There are three types of possible agents: foaf:Person defines a person who created a widget, foaf:Group defines a group of creators or the institution to which the creators belong, and foaf:Software defines software, e.g. an editor or mashup creator. The DCAT vocabulary “makes extensive use of terms from other vocabularies“ [55], e.g. Dublin Core1.
• prov:wasGeneratedBy - for providing the activity that has an influence on the state of the widget (e.g. creation, changing, etc.).
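Taken together, these extension points attach directly to a widget instance. The following Turtle sketch combines them in one description (the example namespace and the individual names are illustrative; the lw prefix follows the ontology URI http://linkedwidgets.org/ontologies# that appears in the appendix):

@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix lw:      <http://linkedwidgets.org/ontologies#> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix :        <http://example.org/widgets#> .   # illustrative namespace

:movieSearchWidget
    a lw:Widget ;
    dcat:theme lw:media ;                     # classification of the widget, e.g. media
    dcterms:publisher :irina ;                # the creator of the widget
    prov:wasGeneratedBy :creatingTheWidget .  # activity that created or changed the widget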
5.2 Semantic Model Use cases
In this section, use cases are addressed by semantic widgets that follow the semantic model presented in this chapter. The use cases are divided into the following categories:
1http://dublincore.org/documents/dcmi-terms/
Figure 5.1: Widget Model
• Publishing of Linked Widget information on the Linked Open Data cloud: the detailed description of widgets.
• Discovery: finding widgets that contain a specific kind of semantic relation, e.g. all widgets that contain the property dbprop:livesIn.
• Composition: finding the matching widget that can consume a specific dataset or produce the required output data, e.g. all widgets that have instances of the class dbpedia:Person from DBpedia.
• Smart data consumption based on the semantic model: the semantic model is used to select the required input from the provided context data.
5.2.1 Publishing examples
Figure 5.2 presents a set of widgets that are needed for searching a set of films. The widget “DBPedia Film Agent Search“ gives the possibility to find either actors or directors based on the following properties: name, birthplace, and year of birth. This widget is presented by the instance w:widget of the class lw:Widget, which has two input models w:inM1 and w:inM2, a model w:m, and an output model w:outM. The models are connected with the instances of the DBpedia class dbpedia:Person via the properties lw:hasInputNode, lw:hasNode, and lw:hasOutputNode. The instances of the class dbpedia:Person are w:star and w:director, which are differentiated with the help of the DBpedia properties dbpedia:starring and dbpedia:director. The model w:m includes all properties and all classes that are needed to depict all relationships between the inputs and outputs of the widget. The output model is connected to stars and directors, which are instances of the same DBpedia class dbpedia:Person. Furthermore, a person can be a star and a director at the same time and have both properties dbpedia:starring and dbpedia:director. Figures 5.3 and 5.4 present a part of the semantic model of the widget “DBPedia Film Agent Search“, in graphical notation and in Turtle; a condensed Turtle sketch is also given below.
Figure 5.2: Widget “DBPedia Film Agent Search“
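A condensed Turtle sketch of the structure just described could look as follows (the w: namespace, the intermediate node w:work, and the dbpedia prefix URI are assumptions; the thesis only shows the prefixed names):

@prefix dbpedia: <http://dbpedia.org/ontology/> .
@prefix lw:      <http://linkedwidgets.org/ontologies#> .
@prefix w:       <http://example.org/filmAgentSearch#> .

w:widget
    a lw:Widget ;
    lw:hasName "DBPedia Film Agent Search" ;
    lw:hasInputModel w:inM1 , w:inM2 ;
    lw:hasModel w:m ;
    lw:hasOutputModel w:outM .

# the full model relates a work to its stars and directors
w:m lw:hasNode w:work , w:star , w:director .

w:work
    a dbpedia:Work ;
    dbpedia:starring w:star ;
    dbpedia:director w:director .

w:star     a dbpedia:Person .
w:director a dbpedia:Person .

# the output model exposes the persons that were found
w:outM lw:hasOutputNode w:star , w:director .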
Figure 5.3: Semantic Model of Widget “DBPedia Film Agent Search“ in graphical notation
Figure 5.4: Semantic Model of Widget “DBPedia Film Agent Search“ in TopBraid Composer
Figure 5.5 presents the widget “Google Maps“, which receives a list of coordinates (longitude and latitude) and shows the points on a map. This widget is presented by the instance w2:Widget of the class lw:Widget, which has an input model w2:inM1 and a model w2:m. The models are connected with the instances of the Geonames ontology2 class gn:Feature via the properties lw:hasInputNode and lw:hasNode. The instance of the class gn:Feature is w2:feature, which has the properties wgs84_pos:lat and wgs84_pos:long. Figure 5.6 presents a part of the semantic model of the widget “Google Maps“; a condensed sketch follows.
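A condensed sketch of this input model, in the style of the listings in this chapter, might look as follows (the w2: namespace URI and the standard prefix URIs are assumptions):

@prefix gn:        <http://www.geonames.org/ontology#> .
@prefix lw:        <http://linkedwidgets.org/ontologies#> .
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .
@prefix w2:        <http://example.org/googleMaps#> .

w2:Widget
    a lw:Widget ;
    lw:hasName "Google Maps" ;
    lw:hasInputModel w2:inM1 ;
    lw:hasModel w2:m .

w2:inM1 lw:hasInputNode w2:feature .
w2:m    lw:hasNode      w2:feature .

# the input node is a Geonames feature carrying coordinates
w2:feature
    a gn:Feature ;
    wgs84_pos:lat  w2:lat ;
    wgs84_pos:long w2:long .

# coordinate nodes are typed as floats, in the style of Listing 5.1
w2:lat  a xsd:float .
w2:long a xsd:float .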
5.2.2 Discovery examples
The second goal of the semantic description is searching for widgets. This can be implemented with the use of SPARQL queries.
Discovery example 1
The first SPARQL query (cf. Figure 5.7) finds widgets that contain the property dbpedia:starring in the models of the widget descriptions. The property defines a relationship between the two DBpedia classes dbpedia:Person and dbpedia:Work.
2http://www.geonames.org/ontology/documentation.html
Figure 5.5: Widget “Google Maps“
The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget, ?name - the name of the widget, ?publisher - a publisher of the widget, an instance of the class foaf:Agent, ?n - a node that is connected to an instance which has the property dbpedia:starring.
• The “WHERE clause provides the basic graph pattern to match against the data graph“ [73]. The basic graph pattern includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget; ?w lw:hasName ?name - finding the names of widgets; ?w dcterms:publisher ?publisher - finding the publishers of widgets; ?w lw:hasModel ?m - finding the models of widgets; ?m lw:hasNode ?n - finding the nodes of models; ?n dbpedia:starring ?ins - finding the property dbpedia:starring. A sketch of the assembled query is given below. Figure 5.8 shows the main classes and properties that are included in the SPARQL query.
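Assembled from the clauses above, the query could take roughly the following shape (a sketch; the prefix URIs for lw and dbpedia are assumptions, since only the prefixed names are given in the thesis):

PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX lw:      <http://linkedwidgets.org/ontologies#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?w ?name ?publisher ?n
WHERE {
  ?w rdf:type lw:Widget .           # an instance of the class lw:Widget
  ?w lw:hasName ?name .             # the name of the widget
  ?w dcterms:publisher ?publisher . # the publisher of the widget
  ?w lw:hasModel ?m .               # the models of the widget
  ?m lw:hasNode ?n .                # the nodes of the model
  ?n dbpedia:starring ?ins .        # nodes that use the property dbpedia:starring
}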
Figure 5.6: Semantic Model of widget “Google Maps“
Figure 5.7: Finding widgets that contain the property “starring“ in the semantic model
Figure 5.8: SPARQL query steps
Discovery example 2
Figure 5.9 presents the search for widgets that contain the owl:sameAs property, which states that two things with different URIs are the same thing. The instance “Angelina Jolie“ of the class Actor from the Linked Movie Database is the same as the instance “Angelina Jolie“ of the class Person from DBpedia. The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget, ?name - the name of the widget, ?class - a class of the instance.
• The basic graph pattern of the WHERE clause includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget; ?w lw:hasName ?name - finding the names of widgets; ?w lw:hasModel ?m - finding the models of widgets; ?m lw:hasNode ?n - finding the nodes of models; ?n owl:sameAs ?x - finding the widget nodes that have the property owl:sameAs; ?x rdf:type ?class - finding the class of the instance. A sketch of the corresponding query is given below.
Figure 5.9: Search for widgets that contain the “owl:sameAs“ property
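Analogously to the first example, the owl:sameAs query could be sketched as follows (prefix URIs assumed as before):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX lw:  <http://linkedwidgets.org/ontologies#>

SELECT ?w ?name ?class
WHERE {
  ?w rdf:type lw:Widget .  # an instance of the class lw:Widget
  ?w lw:hasName ?name .    # the name of the widget
  ?w lw:hasModel ?m .      # the models of the widget
  ?m lw:hasNode ?n .       # the nodes of the model
  ?n owl:sameAs ?x .       # nodes that have the property owl:sameAs
  ?x rdf:type ?class .     # the class of the equivalent instance
}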
5.2.3 Composition examples
The following SPARQL query (cf. Figure 5.10) finds widgets that produce geo data for map visualization. The location is defined with the use of the Geonames ontology class gn:Feature, which has the properties geo:lat and geo:long. A part of the widget description is provided in Listing 5.1. This widget returns a set of locations (longitude and latitude). The goal is to provide a mechanism for automatic SPARQL query generation. In this case, the query will search for widgets that can be wired with the output of this widget.
Figure 5.10: Search for widgets that produce geo data

@prefix dbpedia: <http://dbpedia.org/ontology/> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix gn:      <http://www.geonames.org/ontology#> .
@prefix lw:      <http://linkedwidgets.org/ontologies#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix p:       <> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix :        <> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

:Widget
    rdf:type lw:Widget ;
    lw:hasOutputModel :OutM1 ;
    lw:hasModel :m ;
    lw:hasName "Map Widget"^^xsd:string ;
    dcterms:publisher p:tuVienna , p:irina ;
    dcat:theme lw:map ;
    prov:wasGeneratedBy lw:widgetCreation .

:OutM1
    rdf:type lw:OutModel ;
    rdfs:label "Input model for map"^^xsd:string ;
    lw:hasInputNode w2:long , w2:feature , w2:lat .

:lat
    rdf:type xsd:float ;
    rdfs:label "lat"^^xsd:string .

:long
    rdf:type xsd:float .

w2:feature
    rdf:type gn:Feature .
Listing 5.1: Source Code
Figure 5.11: Generation of SPARQL queries
Figure 5.11 presents the automatic generation of SPARQL queries from a widget description. The arrows in the picture show the transformation from the output model into the terms of the SPARQL query. For example, the property lw:hasOutputModel is reversed to the term lw:hasInputModel, and the property lw:hasOutputNode is reversed to lw:hasInputNode. Under this reversal rule, the generated query could look like the sketch below. Figure 5.12 shows the result of the SPARQL query.
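Applying the reversal rule to the output model of the “Map Widget“ in Listing 5.1, the generated query could look roughly like this sketch (the prefix URIs for lw, gn, and geo are assumptions):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX lw:  <http://linkedwidgets.org/ontologies#>
PREFIX gn:  <http://www.geonames.org/ontology#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?w ?name
WHERE {
  ?w rdf:type lw:Widget .
  ?w lw:hasName ?name .
  ?w lw:hasInputModel ?im .  # lw:hasOutputModel of the source widget, reversed
  ?im lw:hasInputNode ?f .   # lw:hasOutputNode of the source widget, reversed
  ?f rdf:type gn:Feature .   # node type carried over from the output model
  ?f geo:lat ?lat .          # latitude property of the feature
  ?f geo:long ?long .        # longitude property of the feature
}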
Figure 5.12: SPARQL query and result in TopBraid Composer
5.2.4 Smart Data Consumption
Figure 5.13 demonstrates selecting input data for the widget “Google Map“. The widget gets its data flow from three widgets: the “Location“ widget that returns the locations of libraries, the “City Bike“ widget that returns the locations of city bike stations, and the “Geo Merger“ widget. The “Geo Merger“ widget processes this data according to the defined options. The result includes a set of locations that are instances of the class gn:Point with the properties geo:lat (latitude), geo:long (longitude), and :address. Based on this model, the application knows which kind of data is required for the input of this widget; a sketch of one result item is given below.
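One item of such a merged result could be described as in the following sketch (the namespace, coordinates, and address are purely illustrative):

@prefix gn:  <http://www.geonames.org/ontology#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/geoMerge#> .

# one hypothetical item of the merged result
:location1
    a gn:Point ;
    geo:lat  "48.19"^^xsd:float ;           # illustrative latitude
    geo:long "16.37"^^xsd:float ;           # illustrative longitude
    :address "Examplegasse 1, 1050 Wien" .  # illustrative address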
Figure 5.13: Smart consumption example
5.3 Result evaluation
In Chapter 4 a list of requirements has been defined. The purpose of this section is to evaluate the resulting model based on the fulfilment of these requirements:
1. Widgets and Mashups are identified via an identifier - a URI. Mashups and widgets have to be stored in the Widget Repository and identified by their URIs. The users have the possibility to define the URI.
2. By dereferencing the widget URIs, the semantic model will be returned. It is possible to find a widget which is identified by a URI, and its semantic model can be returned.
3. The model should follow web standards (W3C recommendations). The data are described with the use of the XML-serialized RDF format, which gives the possibility to define the structure of the data and to publish them into the LOD cloud. The provenance of information is described with the use of the web standards DCAT and PROV-O. The search is provided by using SPARQL queries; some examples are given in this chapter.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into a data space and to navigate over the data sets. Due to the fact that widget models are usually connected to the original Linked Data, it is possible to define relations to external Linked Data endpoints.
5. A widget may have more than one semantic model, but all should generate the same output with an explicit relation to the input graph. It is possible to create more than one model that generates the same output.
6. The input and output data should be interlinked, and explicit relations between data should be defined. There are three different types of widget models: input model, output model, and model. This gives a flexible mechanism to define the full data model and all connections between input and output data.
7. The model should be general to support various types of widgets, e.g. data widgets and presentation widgets. At the current stage of implementation, the semantic model supports all existing types of widgets.
8. The semantic model should provide “an explicit representation of provenance information that is accessible to machines, not just to humans“ [85]. The provenance of data is provided by applying the PROV-O and DCAT ontologies. The ontologies allow defining information about the author, date of creation, versions, etc.
The requirements are fulfilled. The semantic model follows Linked Data and Semantic Web principles. With the use of the model, the widgets can be published into the LOD cloud. The data are described with the use of W3C standards, which makes the data machine-readable.
CHAPTER 6 Conclusion and Future Work
This chapter summarizes the research work and research results, indicates research limitations, and provides advice for future work.
6.1 Research Summary
The main questions of this research work were to determine whether semantic service description languages can be applied for widget description, to define requirements for the semantic model, to implement the semantic model according to the defined requirements, and to integrate the model into a prototype mashup environment. The first challenge of this work was to define what kind of basic semantic concepts and principles the mashup platform should be based on. Therefore, the second part of this master thesis introduces both the definition of the web of data and software technologies such as Mashups and Web Services. A set of requirements was derived based on concepts of the semantic web. Another very important part is the extensive analysis and comparison of existing mashup platforms and semantic web service description techniques. The analysis was divided into two parts:
• The first part covers the analysis of existing mashup platforms in order to define what kind of factors can increase usability.
• The second part covers semantic web service description techniques, their advantages and disadvantages, and an evaluation of the possibilities to apply these concepts to the mashup platform.
The result of the analysis shows that there is no directly applicable web service description approach for the proposed system. Even though Karma seems to be the most suitable approach, it still poses barriers regarding model implementation, because mashup development based on the Karma approach is very complex.
Figure 6.1: Widget recommendation in Mashup Platform
The main goals were the definition of requirements and the implementation of the widget model. These goals were achieved successfully. This includes a set of requirements for the Linked Widget Model that are derived from semantic web and Linked Data principles, and the parts of the resulting semantic model that are presented by DCAT, information provenance, and the semantic widget model. Additionally, the Karma approach is discussed as an alternative to the developed model. The Karma-based widget description shows that this method does not satisfy all requirements, such as the possibility to include explicit descriptions of relations for the semantic model (the Karma approach uses the SWRL vocabulary), and therefore it would have to be extended. The extension of the model, however, can provoke problems for widget discovery and widget composition.
The main result of this research is the semantic model, which can be integrated into a prototype mashup environment. The resulting semantic model follows Linked Data principles. This enables publishing widget descriptions into the LOD cloud. Widget matching and composition can be provided by the use of SPARQL queries. The model contains the required meta-data for defining the origin of data. Figure 6.1 shows a new feature of the developed mashup platform, which has been available since the implementation of the semantic model. By clicking on an output terminal, the user gets a list of widgets that can be wired with the widget on the workspace. The suggested widgets appear in the bottom left corner of the mashup platform.
6.2 Research Limitations
The following restrictions are noted:
• Due to the fact that the mashup platform is at a very early stage of implementation, only a limited number of use cases can be provided. A substantial growth in the number of available widgets will lead to widgets that are more complex and therefore richer in terms of features. For example, data sets may have to be transformed into a more understandable, suitable structure. This can be done by applying algorithms or statistical methods (correlation, rule learning) that can provide an analysis of the available Linked Data. This will influence the semantic model, because it is required to describe explicitly the relation between input and output data.
• The existence of double relations between instances is not supported by the semantic model. For example, the entity http://dbpedia.org/page/Angelina_Jolie has the two similar properties dbpprop:birthPlace and dbpprop:dateOfBirth.
• The semantic model includes only the basic components required for widget description. For example, PROV-O provides a very complex set of entities and relations which were not required for our solution.
6.3 Future Work
Due to the significant growth of statistical data provided by various public organizations, in the future the mashup platform, with all of its advantages regarding Linked Data consumption, can provide access to this data. A possible way to process this kind of public data through widgets is data publishing as Linked Data using the W3C Data Cube vocabulary1, a format for statistical data publishing on the Web of Data. This makes it possible to link and combine the data with additional information. Additional advantages of this approach are that multi-dimensional data can be presented with the use of the RDF standard and published following the Linked Data principles. Furthermore, the model is general, which enables high reusability, and can be used for various datasets like OLAP data cubes. The main elements of the Data Cube vocabulary are a collection of observations (datasets), a set of dimensions defining the foundations of the observations, measures that describe the objects of the observations, and attributes of the observed values. This facet implies the development of new types of widgets that can process data modeled based on the Data Cube format. A possible way to integrate such data is to extend the existing semantic model by adding additional entities and relations like observation, dataset, measure, etc. The mashup platform will support visualization of such multi-dimensional data and integration with other data sets to support end users in deducing knowledge from statistical data. Furthermore, this will allow developers to easily discover a data source and then develop statistical web applications of high quality and flexibility. The current version supports only three common statistical charts: pie, bar, and line charts. More types of
1http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/
charts can improve the visualization of Linked Data and data browsing. This will also influence the semantic model, because the visualization widgets can process and return additional data like summaries of some data values or differences between data values, etc., and the semantic model should enable the description of such data types. Finally, streaming data2 can be integrated into the mashup platform. For this kind of data it will be necessary to find a mechanism to deal with temporal data (time stamps, time intervals, and other options) and include it into the semantic model in order to provide the best widget matching and composition.
2http://www.w3.org/community/rsp/wiki/RDF_Stream_Models
CHAPTER 7 Appendix
7.1 Acronyms
CSV Comma Separated Values
DAML DARPA Agent Markup Language
DCAT Data Catalog Vocabulary
DL Description Logic
HTML Hypertext Markup Language
IRI Internationalized Resource Identifier
LIDS Linked Data Services
LOD Linked Open Data
LOS Linked Open Services
OWL-S Semantic Markup for Web Services
OWL Web Ontology Language
PROV-O Provenance Ontology
R2RML RDB to RDF Mapping Language
RDB Relational Data Base
RDFS Resource Description Framework Schema
RDFa Resource Description Framework in Attributes
REST Representational State Transfer
RSS Really Simple Syndication
SAWSDL Semantic Annotation for Web Services Description Language
SOAP Simple Object Access Protocol
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
SWRL Semantic Web Rule Language
URI Uniform Resource Identifier
W3C World Wide Web Consortium
WSDL Web Service Description Language
WSMO Web Service Modeling Ontology
WWW, W3 World Wide Web
XML Extensible Markup Language
XSLT XSL Transformation
XSL Extensible Stylesheet Language
7.2 Widget Semantic Model
[RDF/XML serialization of the widget semantic model; the markup did not survive extraction. The recoverable fragments show the classes Widget, Input Model, Output Model, Model, and Mashup, the properties name, has model, and has node, and the property URI http://linkedwidgets.org/ontologies#hasInputNode. A Turtle restatement of the model follows.]
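Since the RDF/XML serialization was lost, the following Turtle sketch restates the core of the widget model with the class and property names as they are used in the examples of this thesis (the rdfs:subClassOf and rdfs:subPropertyOf statements are assumptions):

@prefix lw:   <http://linkedwidgets.org/ontologies#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

lw:Widget      a owl:Class ; rdfs:label "Widget" .
lw:Mashup      a owl:Class ; rdfs:label "Mashup" .
lw:Model       a owl:Class ; rdfs:label "Model" .
lw:InputModel  a owl:Class ; rdfs:label "Input Model" ;  rdfs:subClassOf lw:Model .
lw:OutModel    a owl:Class ; rdfs:label "Output Model" ; rdfs:subClassOf lw:Model .

lw:hasName        a owl:DatatypeProperty ; rdfs:label "name" ;      rdfs:domain lw:Widget .
lw:hasModel       a owl:ObjectProperty ;   rdfs:label "has model" ; rdfs:domain lw:Widget ; rdfs:range lw:Model .
lw:hasInputModel  a owl:ObjectProperty ;   rdfs:subPropertyOf lw:hasModel ; rdfs:range lw:InputModel .
lw:hasOutputModel a owl:ObjectProperty ;   rdfs:subPropertyOf lw:hasModel ; rdfs:range lw:OutModel .
lw:hasNode        a owl:ObjectProperty ;   rdfs:label "has node" ;  rdfs:domain lw:Model .
lw:hasInputNode   a owl:ObjectProperty ;   rdfs:subPropertyOf lw:hasNode .
lw:hasOutputNode  a owl:ObjectProperty ;   rdfs:subPropertyOf lw:hasNode .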
7.3 Semantic Models in Top Braid Composer
Figure 7.1: Top Braid Composer Interface
Figure 7.2: Import of ontologies in Top Braid Composer
Figure 7.3: DBPedia classes in Top Braid Composer
Figure 7.4: An example of a property in Top Braid Composer
Figure 7.5: Instances in Top Braid Composer
Figure 7.6: An example of Widget Description in Top Braid Composer
Figure 7.7: An example of a model description in Top Braid Composer
Bibliography
[1] T. Berners-Lee, R. Fielding, and L. Masinter. http://tools.ietf.org/html/rfc3986. Accessed: 2014-02-21.
[2] W3C. http://www.w3.org/2001/sw/. Accessed: 2014-02-21.
[3] Saeed Aghaee and Cesare Pautasso. An evaluation of mashup tools based on support for heterogeneous mashup components. In Proceedings of the 11th International Conference on Current Trends in Web Engineering, ICWE’11, pages 1–12. Springer-Verlag, 2012.
[4] AJAX. http://en.wikipedia.org/wiki/ajax_(programming), Accessed: 2013-11-11.
[5] Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist: effective modelling in RDFS and OWL. Morgan Kaufmann Publishers, 2. edition, 2011.
[6] Areeb Alowisheq, David E. Millard, and Thanassis Tiropanis. Express: Expressing restful semantic services using domain ontologies. International Semantic Web Conference, 5823:941–948, 2009.
[7] Areeb Alowisheq and David E. Millard. Express: Expressing restful semantic web services. The Seventh Reasoning Web Summer School, pages 23–27, 2011.
[8] Sören Auer, Lorenz Bühmann, Christian Dirschl, Michael Hausenblas, Orri Erling, Robert Isele, Jens Lehmann, Michael Martin, Pablo N. Mendes, Bert van Nuffelen, Claus Stadler, Sebastian Tramp, and Hugh Williams. Managing the Life-Cycle of Linked Data with the LOD2 Stack. The Semantic Web – ISWC 2012, pages 1–16, 2012.
[9] Robert J. Aumann, A. Michael Spence, Martin L. Perl, Frank Wilczek, Steve Wozniak, Vinton G. Cerf, Ann Winblad, Richard Stallman, Jim Rogers, Alan Kay, Bjarne Stroustrup, Brian Behlendorf, Rajeev Madhavan, Jimmy Wales, Craig Newmark, Greg Gianforte, Grady Booch, and Chief Scientist. Frontier visionary interview. Frontier Journal, 6(7), 2009.
[10] Florian Bauer and Martin Kaltenböck. Linked Open Data: The Essentials. A Quick Start Guide for Decision Makers. Edition mono/monochrom, Vienna, Austria, 1. edition, 2012.
[11] Sean Bechhofer, Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. http://www.w3.org/tr/owl-ref/, Accessed: 2013-12-05.
[12] Francois Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Jean Morissette. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. The Semantic Web: Semantics and Big Data, pages 706–716, 2008.
[13] T. Berners-Lee, J. Hollenbach, Kanghao Lu, J. Presbrey, and Mc Schraefel. Tabulator redux: Browsing and writing linked data, Accessed: 2013-11-02.
[14] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, pages 29–37, 2011.
[15] Tim Berners-Lee and Robert Cailliau. http://www.w3.org/proposal.html. Accessed: 2014-02-21.
[16] BIO2RDF. https://github.com/bio2rdf/bio2rdf-scripts/wiki/, Accessed: 2013-11-11.
[17] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data — The Story So Far. International Journal on Semantic Web and Information Systems, pages 1–22, 2009.
[18] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. Dbpedia - a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):154–165, 2009.
[19] Brian McBride. http://www.w3.org/tr/rdf-schema/. Accessed: 2013-10-29.
[20] Alison Callahan, José Cruz-Toledo, Peter Ansell, and Michel Dumontier. Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. Journal of Biomedical Informatics, pages 200–212, 2013.
[21] JackBe Corporation. A business guide to enterprise mashups, 2008.
[22] Dan Brickley and Libby Miller. http://xmlns.com/foaf/spec/. Accessed: 2013-10-24.
[23] Dave Beckett and Brian McBride. http://www.w3.org/tr/rec-rdf-syntax/. Accessed: 2013-10-21.
[24] David Beckett and Tim Berners-Lee, W3C. http://www.w3.org/teamsubmission/turtle/. Accessed: 2013-10-24.
[25] David Martin, Mark Burstein, Jerry Hobbs, Ora Lassila, Drew McDermott, Sheila McIlraith, Srini Narayanan, Massimo Paolucci, Bijan Parsia, Evren Sirin, Naveen Srinivasan, and Katia Sycara. http://www.w3.org/submission/owl-s/, Accessed: 2013-11-15.
[26] Dieter Fensel, Federico Michele Facca, Elena Simperl, and Ioan Toma. Semantic Web Services. Springer-Verlag Berlin Heidelberg, 1. edition, 2011.
[27] Fernando J. Garrigos-Simon, Rafael Lapiedra Alcamí, and Teresa Barberá Ribera. Social networks and Web 3.0: their impact on the management and marketing of organizations. Management Decision, 50(2):1880–1890, 2012.
[28] The Apache Software Foundation. http://stanbol.apache.org/, Accessed: 2013-12-03.
[29] DERI Galway. http://pipes.deri.org/, Accessed: 2013-11-08.
[30] Jose María García, David Ruiz, and Antonio Ruiz-Cortes. A lightweight prototype implementation of sparql filters for wsmo-based discovery. In Technical Report ISA-11-TR-01. ISA Research Group, 2011.
[31] Karthik Gomadam, Ajith Ranabahu, and Amit Sheth. http://www.w3.org/submission/sa-rest/, Accessed: 2013-11-17.
[32] Graham Klyne, Jeremy J. Carroll, and Brian McBride. http://www.w3.org/tr/rdf11-concepts/. Accessed: 2014-02-27.
[33] Benjamin Grosof, Mike Dean, Carl Andersen, William Ferguson, Daniela Inclezan, and Richard Shapiro. A silk graphical ui for defeasible reasoning, with a biology causal process example. In Proc. 4th Intl. Web Rule Symp. (RuleML), 2010.
[34] Benjamin Grosof, Mike Dean, and Michael Kifer. The silk system: Scalable higher-order defeasible rules. In International RuleML Symposium on Rule Interchange and Applications, 2009.
[35] Paul Groth and Luc Moreau. http://www.w3.org/tr/prov-overview/, Accessed: 2013-12-05.
[36] Tom Heath and Christian Bizer. Linked Data. Evolving the Web into a Global Data Space. Morgan & Claypool, 1. edition, 2011.
[37] John Hebeler, Matthew Fisher, Ryan Blace, and Andrew Perez-Lopez. Semantic Web Programming. Wiley Publishing, Inc., 1. edition, 2009.
[38] Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. http://www.w3.org/submission/swrl/. Accessed: 2014-02-21.
[39] Google Inc, Yahoo Inc, and Microsoft Corporation. http://schema.org/, Accessed: 2013-11-29.
[40] Yahoo! Inc. http://pipes.yahoo.com/, Accessed: 2013-11-06.
[41] Kashif Iqbal, Marco Luca Sbodio, Vassilios Peristeras, and Giovanni Giuliani. Semantic service discovery using sawsdl and sparql. In Proceedings of the 2008 Fourth International Conference on Semantics, Knowledge and Grid, pages 205–212. IEEE Computer Society, 2008.
[42] Ivan Herman, Ben Adida, Manu Sporny, Digital Bazaar, and Mark Birbeck. http://www.w3.org/tr/xhtml-rdfa-primer. Accessed: 2013-10-21.
[43] M. Cameron Jones and Elizabeth F. Churchill. Conversations in Developer Communities: a Preliminary Analysis of the Yahoo! Pipes Community. In Proceedings of the Fourth International Conference on Communities and Technologies (C&T ’09), pages 195–204, 2009.
123 [44] Rohit Khare and Tantek Çelik. Microformats: a pragmatic path to the semantic web. WWW ’06 Proceedings of the 15th international conference on World Wide Web, pages 865–866, 2006.
[45] Craig A. Knoblock, Pedro Szekely, José Luis Ambite, Aman Goel, Shubham Gupta, Kristina Lerman, Maria Muslea, Mohsen Taheriyan, and Parag Mallick. Semi-automatically mapping structured sources into the semantic web. In The Semantic Web: Research and Applications, Lecture Notes in Computer Science, pages 375–390. Springer Berlin Heidelberg, 2012.
[46] Agnes Koschmider, Victoria Torres, and Vicente Pelechano. Elucidating the mashup hype: Definition, challenges, methodical guide and tools for mashups. In 2nd Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web in conjunction with the 18th International World Wide Web Conference, Madrid, 2009.
[47] Rubén Lara, Dumitru Roman, Axel Polleres, and Dieter Fensel. A conceptual comparison of wsmo and owl-s. Multimedia Tools and Applications, 64(2):365–387, 2013.
[48] Jon Lathem, Karthik Gomadam, and Amit P. Sheth. Sa-rest and (s)mashups : Adding semantics to restful services. International Conference on Semantic Computing, pages 469–476, 2007.
[49] Danh Le-Phuoc, Axel Polleres, Manfred Hauswirth, Giovanni Tummarello, and Christian Morbidoni. Rapid Prototyping of Semantic Mash-Ups through Semantic Web Pipes. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09), pages 581–590, 2009.
[50] Timothy Lebo, Satya Sahoo, and Deborah McGuinness. http://www.w3.org/tr/prov-o/, Accessed: 2013-12-05.
[51] Faculty of Mathematics Leipzig University and Dept. Business Information Systems Computer Science, Institute of Computer Science. http://aksw.org/projects/limes.html, Accessed: 2013-11-19.
[52] linkeddata.org, administrated by Tom Heath. http://linkeddata.org. Accessed: 2013-10-21.
[53] Yan Liu, Xin Liang, Lingzhi Xu, Mark Staples, and Liming Zhu. Composing enterprise mashup components and services using architecture integration patterns. J. Syst. Softw., 84(9):1436–1446, 2011.
[54] LOD-Around-The-Clock (LATC). http://5stardata.info/. Accessed: 2013-11-02.
[55] Fadi Maali and John Erickson. http://www.w3.org/tr/vocab-dcat/, Accessed: 2013-12-05.
[56] Marcos Caceres and Mark Priestley. http://www.w3.org/tr/2009/wd-widgets-reqs-20090430/. Accessed: 2014-02-21.
[57] David Martin, Mark Burstein, Drew Mcdermott, Sheila Mcilraith, Massimo Paolucci, Katia Sycara, Deborah L. Mcguinness, Evren Sirin, and Naveen Srinivasan. Bringing semantics to web services with owl-s. Multimed Tools Appl, pages 365–387, 2012.
[58] Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. Sieve: Linked data quality assessment and fusion. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pages 116–123. ACM, 2012.
[59] Alistair Miles and Sean Bechhofer. http://www.w3.org/2009/08/skos-reference/skos.html, Accessed: 2013-12-05.
[60] Eetu Mäkelä, Kim Viljanen, Olli Alm, Jouni Tuominen, Onni Valkeapää, Tomi Kauppinen, Jussi Kurki, Reetta Sinkkilä, Robin Lindroos, Osma Suominen, Tuukka Ruotsalo, Eero Hyvönen, and et al. Enabling the semantic web with ready-to-use web widgets, 2007.
[61] Christian Morbidoni, Axel Polleres, Giovanni Tummarello, and Danh Le Phuoc. Semantic Web Pipes, 2007.
[62] Jagadeesh Nandigam, Venkat N. Gudivada, and Mrunalini Kalavala. Semantic web services. J. Comput. Sci. Coll., 21(1):50–63, 2005.
[63] Barry Norton, Reto Krummenacher, Adrian Marte, and Dieter Fensel. Dynamic linked data via linked open services. In Linked Data in the Future Internet 2010, pages 1–10, 2010.
[64] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. http://www.w3.org/tr/html/. Accessed: 2014-02-21.
[65] RDF Working Group. http://www.w3.org/rdf. Accessed: 2013-10-21.
[66] Roberto Chinnici and Jean-Jacques Moreau and Arthur Ryman and Sanjiva Weerawarana. http://www.w3.org/tr/wsdl20/, Accessed: 2013-11-15.
[67] Robin Berjon and Steve Faulkner and Travis Leithead and Erika Doyle Navara and Edward O’Connor and Silvia Pfeiffer. http://www.w3.org/tr/html/. Accessed: 2014-02-21.
[68] Dumitru Roman, Uwe Keller, Holger Lausen, Jos de Bruijn, Ruben Lara, Michael Stollberg, Axel Polleres, Cristina Feier, Christoph Bussler, and Dieter Fensel. Web service modeling ontology. Applied Ontology, pages 77–106, 2005.
[69] Sebastian Rudolph. Foundations of Description Logics. Reasoning Web 2011, LNCS 6848, 2011.
[70] SAWSDL Working Group. http://www.w3.org/2002/ws/sawsdl/. Accessed: 2014-02-27.
[71] Toby Segaran, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O’REILLY, 1. edition, 2009.
[72] Souripriya Das, Seema Sundara, and Richard Cyganiak. http://www.w3.org/tr/r2rml/, Accessed: 2013-11-13.
[73] SPARQL Working Group. http://www.w3.org/tr/rdf-sparql-query/. Accessed: 2013-10-21.
[74] Sebastian Speiser and Andreas Harth. Taking the lids off data silos. In Proceedings of the 6th International Conference on Semantic Systems, I-SEMANTICS ’10, pages 44:1–44:4. ACM, 2010.
[75] Sebastian Speiser and Andreas Harth. Integrating linked data and services with linked data services. In Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications - Volume Part I, ESWC’11, pages 170–184. Springer-Verlag, 2011.
[76] Steffen Stadtmüller, Sebastian Speiser, Andreas Harth, and Rudi Studer. Data-fu: A language and an interpreter for interaction with read/write linked data. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 1225–1236. International World Wide Web Conferences Steering Committee, 2013.
[77] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. A graph-based approach to learn semantic descriptions of data sources. In The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, pages 607–623. Springer Berlin Heidelberg, 2013.
[78] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and Jose Luis Ambite. Rapidly integrating services into the linked data cloud. In The Semantic Web – ISWC 2012, Lecture Notes in Computer Science, pages 559–574. Springer Berlin Heidelberg, 2012.
[79] Tim Berners-Lee, W3C and Dan Connolly, W3C. http://www.w3.org/teamsubmission/n3/. Accessed: 2013-10-24.
[80] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, Renaud Delbru, and Stefan Decker. Sig.ma: Live views on the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):355 – 364, 2010.
[81] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Jos De Roo, Rik Van de Walle, and Joaquim Gabarró Vallés. Capturing the functionality of web services with functional descriptions. World Wide Web, 10(3):243–277, 2012.
[82] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Sam Coppens, Erik Mannens, Rik Van de Walle, and Joaquim Gabarró Vallés. Integrating data and services through functional semantic service descriptions. In Proceedings of the W3C Workshop on Data and Services Integration, 2011.
[83] Roberto De Virgilio, Francesco Guerra, and Yannis Velegrakis. Semantic Search over the Web. Springer-Verlag Berlin Heidelberg, 1. edition, 2012.
[84] Tomas Vitvar, Jacek Kopecký, Jana Viskova, and Dieter Fensel. Wsmo-lite annotations for web services. In Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC’08, pages 674–689. Springer-Verlag, 2008.
[85] W3C. http://www.w3.org, Accessed: 2013-12-15.
<!-- markup reconstructed; the item types and property names are illustrative -->
<div itemscope itemtype="http://schema.org/Person">
    Author: <span itemprop="name">J.K. Rowling</span>
    (born <span itemprop="birthDate">31.07.1965</span>)
</div>
<div itemscope itemtype="http://schema.org/Movie">
    <span itemprop="countryOfOrigin">United Kingdom</span>
    Movie
</div>
Listing 2.12: HTML with microdata format
2.10 Semantic Web Services
One of the goals of this master thesis is semantic Web Widget description with the use of web service description approaches or languages. This part of the master thesis gives an overview of Semantic Web Services. As referred to earlier, nowadays the Web has great significance for society. The traditional Web focused on interaction between people and applications, information sharing, providing the basic features for e-Commerce, and (very limited) support for application integration [26]. The ability to exchange and use information is the major task because of the limitations of the Web. The solution for the problem of interoperability was the introduction of Web Services. The W3C defines a web service as “a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards“. Web Services connect applications over the internet using Web service standards in order to exchange data. Consider, for example, an online purchase: if a user wants to buy an item, he or she sends a request to the server and gets a response. The request includes the ID of the item, the amount, the credit card name, the address, etc. The response includes information about a successful purchase or some errors. A client and a web service exchange this information via request and response messages. The client application sends a request message to the Web server, and the server returns a response message to the client. The technology has the following aspects:
• The protocol is responsible for message transportation, for example HTTP, SMTP, FTP, or BEEP48.
• The message structure is defined with SOAP or REST.
• The interface description describes the structure of the messages, for example WSDL.
• The data format is an XML-based message format or JSON.
It is necessary to take a close look at the data that web services have as input and as output. For clarification, the following examples of the messages in different formats are introduced.
REST + XML
The URI defines a resource, e.g. http://ex.com/actors/angelinajolie. A REST response is a document in XML format and the resource URL.
<!-- element names reconstructed from the corresponding JSON example -->
<profile>
    <firstName>Angelina</firstName>
    <lastName>Jolie</lastName>
    <citizenship>US</citizenship>
    <year>1982</year>
</profile>
Listing 2.13: REST + XML
REST + JSON
REST + JSON is essentially the same as the previous format. The difference is that the data is transferred in JSON format. The advantage of JSON is the ability to parse the structures into JavaScript.
{
    "firstName": "Angelina",
    "lastName": "Jolie",
    "citizenship": "US",
    "year": "1982"
}
Listing 2.14: REST + JSON
XML RPC
The message is also represented in XML format.
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml
Server: ex.com

<?xml version="1.0"?>
<!-- response envelope reconstructed; the extraction preserved only the struct members -->
<methodResponse>
    <params>
        <param>
            <value>
                <struct>
                    <member>
                        <name>firstName</name>
                        <value><string>Angelina</string></value>
                    </member>
                    <member>
                        <name>lastName</name>
                        <value><string>Jolie</string></value>
                    </member>
                    <member>
                        <name>citizenship</name>
                        <value><string>US</string></value>
                    </member>
                    <member>
                        <name>year</name>
                        <value><string>1982</string></value>
                    </member>
                </struct>
            </value>
        </param>
    </params>
</methodResponse>
Listing 2.15: XML RPC
48http://en.wikipedia.org/wiki/BEEP
The Richardson maturity model49 identifies REST as the interaction between a client and a server according to three principles:
• Resource identification by means of URI.
• API should use constrained set of operations (HTTP verbs).
• Hypermedia controls (automatic web application control).
The main problems of Web Services are:
• As pointed out before, Web Services can have different standards and not machine-understandable content.
• They are not self-describing.
• Service discovery is complex.
• There are technical challenges in service composition.
A solution for these problems is adding a semantic description to the Web Services and their corresponding messages that contain data. “Semantic Web Services is a synergistic confluence of the Semantic Web and Web Services“ [62]. They are like traditional Web Services but include machine-readable and understandable information. The implementation of Semantic Web Services should apply standards for semantic data. Due to this fact, the services can be discovered and assembled.
49http://martinfowler.com/articles/richardsonMaturityModel
Semantic Web Services have many similarities with Web Widgets: widgets also have inputs, outputs, functional properties, etc. An important task is to describe the semantics behind the services. There are different methodologies (OWL-S, WSMO, WSDL, etc.) for the service description task, which are explored in detail in Chapter 3. The goal of Chapter 3 is to understand which approaches can be applied to the problem statement of this master thesis.
CHAPTER 3 State of the Art
This Chapter is divided into two parts. The first part describes the state of consuming and publishing tools. The second presents the different methodologies that can be applied for Web Service and data descriptions.
3.1 Applications
3.1.1 Overview of existing applications
This chapter describes related applications and projects in the field of consuming and publishing Linked Data. There are some existing tools and applications available. The applications can be categorized as follows [17] [36]:
• Linked Data Browsers are similar to web browsers; however, instead of navigating between pages via hyperlinks, the users navigate between data resources by following links expressed by RDF triples [17]. Examples: Tabulator (a generic data browser and editor [13]) and Marbles1 (a server-side application that formats Semantic Web content for XHTML clients using Fresnel2 lenses and formats).
• Linked Data Search Engines and Indexes. A number of search engines have been developed that crawl Linked Data from the Web by following RDF links and provide query capabilities over the aggregated data. Broadly speaking, these services can be divided into two categories: human-oriented search engines and application-oriented indexes [17]. Examples: Falcons3, SWSE4, Swoogle5 (semantic Web search engines that provide keyword-based search for objects, data, ontologies, and documents), sameAs.org6, Sindice7, and Sig.ma8.
1http://mes.github.io/marbles 2http://www.w3.org/2005/04/fresnel-info/ 3http://www.w3.org/2001/sw/wiki/Falcons 4http://swse.org 5http://swoogle.umbc.edu
• Domain-specific Applications. Applications that were developed for domain-specific goals. Such applications access specific data from different Linked Data sources. Examples: DBpedia Mobile9, DERI Pipes10, BBC Programmes and Music11.
The following sections of this Chapter present existing mashup platforms and applications that are able to consume Linked Data. This Chapter covers not only semantic applications but also Yahoo!Pipes12, a mashup tool that consumes data from various resources. For the evaluation of the tools, parameters like discovery, input/output data types, access methods, recursion, and behavior are used [3]. The user interface is also an important factor.
3.1.2 Yahoo!Pipes
Yahoo!Pipes is an online application that was launched on 7 February 2007 by Yahoo. The purpose of the application is the integration and consumption of data from different web pages, web feeds (RSS feeds), and other online resources by way of constructing data mashups [40] [43]. The mashup system includes different types of widgets. Some of them have access to data sources; other widgets include aggregation or filtering options. Widgets can be wired together in order to process data. The Yahoo!Pipes environment includes four main parts: a navigation bar, the toolbox, the work canvas, and a debug-output panel [43]. A mashup is created by dragging modules (operators) from the toolbox into the work canvas and linking the modules. Each of the modules completes a specific task [40]. Widgets have input and output terminals. Linking a widget to another widget occurs via wiring from an input to an output port (terminal) or vice versa. Data flows from the input modules to a single Pipe output (the end of the execution process) [43]. The output is returned in different formats such as RSS or JSON. The project can be saved and shared with other users of Yahoo!Pipes. Pipes can be accessed via their URL (each pipe has a unique URL). The user has the possibility to store the pipe in the public directory. Anyone can search and browse the pipes from the directory. Users can search for published pipes, inspect and modify pipes, and also save a copy of them in a directory. There are eleven categories of modules (features): sources, user inputs, operators, url, string, data, location, number, favorites, my pipes, and deprecated. Source is the component that brings the data from web pages into the pipe [43]. These modules can process data on the Web in CSV (module Fetch CSV), XML and JSON (module Fetch Data), and RSS, Atom and RDF (module Fetch Feed) formats. Find First Site Feed is a module for finding an RSS or Atom feed. It is also possible to extract any information from web pages
6http://sameas.org/ 7http://sindice.com/ 8http://sig.ma/ 9http://dbpedia.org/DBpediaMobile 10http://pipes.deri.org 11http://www.bbc.co.uk 12http://pipes.yahoo.com/pipes/
using the XPATH Fetch Page Module. E.g. the command //img is used to return all images from a web page. This category also includes other components. User inputs make Yahoo!Pipes more flexible and enable adding user inputs into the data flow. There are five types of input modules: date, location, number, text, and URL. The user may provide the following fields: name (parameter name), prompt (for the Run Pipe option, a text entry field), position (the order of the input fields), default (a default value), and debug (a default value within the Pipes Editor). Operators are used for data transformation and filtering. This category includes the following modules [43]:
• Count Module counts the number of items. The input of the module is a data feed and the output is a number.
• Filter Module is used for item inclusion and exclusion from a feed via rules definition. The module can contain multiple rules.
• Location Extractor Module is used for adding location elements (y:location), which include sub-elements such as latitude, longitude, quality, country, state, city, street, and postal code. This element gives the possibility to display the feed on a map.
• Loop Module is used to add sub-modules to Pipes. A module can be inserted into the Loop Module. The sub-module will run once for each item in the input feed. There are two options that define the output of the module: “emit result“ (the output is only the data from the sub-module) and “assign results to“ (the output is all the data from the original input, with the data from the sub-module assigned to the specified element).
• Regex Module “modifies fields in an RSS feed using regular expressions, a powerful type of pattern matching“ [43].
• Rename Module renames elements. E.g. it is possible to convert some data into RSS format (the elements will have title, description, etc.) or into location elements for the Location Extractor. There are two types of mapping: “rename“ (create a new element with a new name, deleting the old element) and “copy as“ (create a new element without deleting the old element).
• Reverse Module provides reversing the order of items.
• Split Module splits the feed “into two identical output feeds“ [43]. The module is useful in case of different operation on the same data items.
• Sort Module sorts feeds in either ascending or descending order by any element (e.g., name, date).
• Sub-Element Module extracts selected sub-elements from a feed.
• Tail Module tails a feed to the last N items, where N is a number specified by the user.
• Truncate Module truncates a feed to the first N items.
Figure 3.1: the Web & the Semantic Web
• Union Module combines separate sources of items (maximum 5). The output is a list of items.
• Unique Module removes duplicated string-type data from the feeds.
• Web Service Module sends a request to a Web Service for additional processing of the data. Yahoo!Pipes gets the response from the Web Service in JSON format. The Web Service should support HTTP POST in JSON format.
• Create RSS Module transforms input data into RSS format. Non-RSS elements are renamed to existing RSS element names.
Figure 3.1 presents an example in Yahoo!Pipes. The example shows the aggregation of information from different sources. The processing happens separately for each data source. In the example, data from the Sciencenews web page (https://www.sciencenews.org/) and the CNN news page (http://rss.cnn.com/) have been selected. The merging of the data is processed via the Union module. The use case for the example was finding the articles that have the word “Dolphin“ in the title. To get the “Dolphin“ references from the Sciencenews web page, the XPATH Fetch Page module was used. For the selection of the data, the XPATH command //a[contains(.,’Dolphin’)] has been used. The Truncate module has been used for taking the top two articles. The articles from the CNN web page have an RSS feed (http://rss.cnn.com/rss/cnn_topstories.rss). For the selection of “Dolphin“ from the title, the module Filter has been used (item.description contains “Dolphin“). Finally, both feeds are piped into a Union module, merging both into one feed. After running the pipe, the result of the merging is shown in the debug-output panel (3 articles).
URL Module includes only one module - the URL Builder Module. All resources are defined by URLs; some of them are complex. The module is used for controlling URL construction.
String Modules are used to process string values, for example building a string from some sub-strings. The category includes the String Builder Module, Sub String Module, Term Extractor Module, Translate Module, String Regex Module, String Replace Module, String Tokenizer Module, Yahoo! Shortcuts Module, and Private String Module.
Date Modules are used for date building and formatting. There are two modules: the Date Builder Module and the Date Formatter Module. The first module converts a string value into a datetime value; the second module defines a format for the datetime value.
Location Builder Module extracts geographical data from a description. “The module outputs a location structure with separate fields for city, state, country, latitude, and longitude“ [40]. The location can be connected with any modules that accept location types.
Simple Math Module processes mathematical operations like division, subtraction, power, etc.
Yahoo!Pipes supports the creation of new information streams from different sources by using a cascade of simple operators. Data sources are usually web feeds (e.g. from news web pages) or other simple data. The access to data is realized via standard web protocols (HTTP, RSS). Yahoo!Pipes mashups can be combined with each other and can be accessed via HTTP. The retrieved data are usually automatically refreshed after each start of the pipes. A disadvantage of Yahoo!Pipes is the lack of Semantic Data processing capabilities. Yahoo!Pipes also does not support search for stored widgets based on authors, topic, etc., or a semantic description of the resources. Yahoo!Pipes supports component discovery according to keywords. The possible formats of data are limited to RSS, Atom, XML, and JSON. An advantage is that Yahoo!Pipes gives the ability to use XPath expressions for data retrieval from web pages. It accesses the data via HTTP or RSS/Atom. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is very complicated for non-professional users.
3.1.3 DERI Pipes
DERI Pipes [29] is an open source project for transforming, filtering, and aggregating web data (the data should be in RDF format or in several RDF serialization formats) [49] and for building RDF-based mashups [61]. The tool supports RDF, XML, SPARQL, XQUERY, JSON, and several scripting languages [29]. External applications can use the output stream of data (e.g. JSON). The web sources of data can be accessed via URIs. Data are processed by several basic operators, and each operator may have one or more inputs (e.g. text, output from other operators, or URIs) and only one output (an RDF graph, an RDF dataset, or a SPARQL result set). A set of instances of the operators represents a pipe [61] [49]. A Semantic Web pipe processes a data flow from a set of RDF sources through pipelined special-purpose operators [49]. Figure 3.2 presents the basic operators such as CONSTRUCT and SELECT. The input values can be data in RDF, string, or XML formats. The output is usually data either in RDF or in XML format.
Figure 3.2: Semantic Web pipe operators. Source: [49]
The definitions of the pipes are stored as XML. The structure of a simple pipe is presented by the following example [29], which shows a simple pipe that aggregates data from different Linked Data sources. Each pipe starts with the XML tag <pipe>. The construct block (<construct>) is used for RDF transformation (cf. Listing 3.1).
<pipe>
    <construct>
        <query>
            <![CDATA[
                CONSTRUCT {
                    // SPARQL query
                }
            ]]>
        </query>
    </construct>
</pipe>
Listing 3.1: Pipes Definition
DERI Pipes also has a graphical editor (cf. Figure 3.3). The environment is similar to Yahoo!Pipes. On the left side there are sets of operators that are grouped into four categories: fetch, operators, url, and inputs. The operators can be moved onto the designer tab canvas or panel and connected. The source code can be seen by clicking on the button “source code“ under the designer tab canvas. The result of the pipes is shown in the view panel (text or table view). To better understand the features, the operators [29] are considered below.
Figure 3.3: DERI interface
The first category of operators includes the fetch operators. These operators get data from a data source (via a URI) in RDF, HTML, XML, or XSL format. The second category, “Operators“, includes operators for data processing. The triples that are fetched from different sources can be merged via the MIX operator. The input of the operator should be RDF/XML data (a constant or an output of another operator in RDF/XML format). The operator RDFS MIX merges the specified sources and then infers triples from the merged triples. The operator CONSTRUCT is used to derive data from one or more specified RDF sources via SPARQL (see the sketch after this paragraph). The cycle operator FOR invokes “a parametrized pipe multiple times and merge the resulting outputs of each invocation“. The operator SMOOSHER can be used to merge all data from different sources according to a URI and based on the owl:sameAs statement. The third category is “URL“. There are two operators in this category: URL builder and SPARQL Endpoint. URL builder is similar to the Yahoo!Pipes URL builder. SPARQL Endpoint accesses a SPARQL endpoint via a SPARQL query which is contained in the operator. The fourth category, “Inputs“, includes PARAMETER, which “accepts user input“ [29], and FOR VARIABLE, which gives a name to a field that is used within a loop [29]. DERI Pipes, like Yahoo!Pipes, can be stored, shared, and re-used by other users. Each DERI Pipe has a unique URL. Users can connect different pipes, modify existing pipes, and include pipes as functional blocks into projects (because of the XML format or the HTTP-retrievable model). DERI Pipes, like Yahoo!Pipes, does not support an efficient search for stored widgets or a semantic description of widgets. DERI Pipes processes data in RDF, XML, Microformats, JSON, and binary streams and converts the data into RDF format. The platform accesses the data via SPARQL. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is very complicated for non-professional users, and programming skills are needed.
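For illustration, the kind of SPARQL CONSTRUCT query that such an operator embeds might look like this (a generic sketch; the FOAF vocabulary is used purely as an example):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?person foaf:name ?name .   # emit a simplified triple per match
}
WHERE {
  ?person foaf:name ?name .   # match persons with names in the source data
}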
3.1.4 BIO2RDF
BIO2RDF is an open-source semantic project that provides Life Science Linked Data from over 1500 biological databases (like Kegg13, MGI14, PDB15) [16]. The goal of the project is the implementation of a more sophisticated scheme for biomedical data, bringing data from different web sites together and adding semantics to the data in order to get machine-understandable content [20]. Bio2RDF provides scripts that convert a diverse set of heterogeneously formatted sources [16] [20] into RDF. The datasets are converted based on Tim Berners-Lee’s principles of Linked Data (cf. Chapter 2.2). The transformation of the data into RDF format is done through a JSP toolbox. The data can be locally stored or accessed via HTTP requests [16] [12]. The system supports “relational databases, text files, XML documents, and HTML pages“ [12]. Depending on the data format, the system uses different methods to get the data: XML to RDF conversion, SQL to RDF conversion, or text file to RDF conversion. The data conversion includes three steps [12]:
1. Namespace definition for URI normalization (each URI is unique; the owl:sameAs predicate is used).
2. Analysis of the data source and design of an RDF model.
3. The implementation of an RDFizer for data transformation and putting the data into a triple store. (A minimal sketch of a converted record is given below.)
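As a minimal illustration of steps 1 and 3, a converted and normalized record could look like the following Turtle sketch (both URIs and the label are hypothetical; the URI pattern follows the normalization principle quoted below):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# hypothetical normalized record produced by an RDFizer
<http://bio2rdf.org/geneid:12345>
    rdfs:label "example gene" ;                   # illustrative label
    owl:sameAs <http://bio2rdf.org/hgnc:99999> .  # hypothetical equivalent URI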
Bio2RDF suggests a set of principles for providers [12]:
1. “Use a REST like interface“ for clear and stable URI creation.
2. “Lowercase all the URI up to the colon“ for effectively case insensitive.
3. “All URIs should return an RDF document“ for easy connection to other linking data.
“The syntax of a normalized URI is described by the following pattern: http://bio2rdf.org/<namespace>:<id>“ [12]. Figure 3.4 presents the framework architecture. The input data can be in different formats such as text, XML, RDF, etc. The system processes the data in two ways:
• Data from external sources can be stored in an SQL database on the BIO2RDF.org server. These sources are accessible directly from the server; this direct access to the BIO2RDF server affords high speed (e.g., data from HGNC16, Entrez Gene17, Kegg18).
13http://www.genome.jp/kegg/ 14http://www.informatics.jax.org/ 15http://www.rcsb.org/pdb/ 16http://www.genenames.org/ 17http://www.ncbi.nlm.nih.gov/gene 18http://www.genome.jp/kegg/
Figure 3.4: Bio2RDF system framework architecture. Source: [12]
• Data from external sources can be requested directly from the data source. After the request, the data is transformed into RDF by means of an RDFizer program (e.g., data from Reactome19, PubMed20, UniProt21).
There are two servlets in the system: Elmo22 and Sesame23. Elmo is used for crawling the RDF documents; the triples are processed in the local Sesame repository. By means of the Sesame interface, the data can be browsed and queried. BIO2RDF behaves like a search engine: the user can use it like Google or Yahoo to find the needed information. The result of a request is a table with properties and values. BIO2RDF contains very specific knowledge and is therefore very popular in the life sciences industry.
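As an illustration of this search-engine-like access, the following SPARQL sketch lists all properties and values of a single Bio2RDF resource; the concrete gene identifier is illustrative, and only the http://bio2rdf.org/namespace:identifier URI pattern is taken from the principles above.

    # Retrieve the property/value table for one Bio2RDF resource;
    # the identifier is an illustrative example.
    SELECT ?property ?value
    WHERE { <http://bio2rdf.org/geneid:4157> ?property ?value . }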
3.1.5 LOD2 LOD2 is a large European project and a set of tools that "support the whole life cycle of Linked Data from extraction, authoring/creation via enrichment, interlinking, fusing to maintenance" [8], developed by partner companies and universities. The architecture of the components is based on three foundations [8]:
• “Software integration and deployment using the Debian packaging system“.
• “Use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between different tools“.
19http://www.reactome.org/PathwayBrowser/ 20http://www.ncbi.nlm.nih.gov/pubmed 21http://www.uniprot.org/ 22http://www.openrdf.org/ 23http://www.w3.org/2001/sw/wiki/Sesame
• "Integration of the LOD2 Stack user interfaces based on REST enabled Web Applications".
LOD2 defines a Linked Data lifecycle that includes eight phases; for each phase a set of tools is available:
• Extraction. Conversion of data into RDF. Tools: Valiant24, Apache Stanbol25, DBpedia Spotlight26, D2RQ27.
• Storage. Optimization of data storage, dynamic querying of RDF graphs, graph processing, etc. Tools: Virtuoso28.
• Authoring. Publishing of Linked Data, addition of semantically enriched content, and editing it for non-expert users, e.g. via the WYSIWYM paradigm29. Tools: PoolParty30, OntoWiki31.
• Interlinking. Data integration, addition of links between semantic contents. Tools: SILK32, LIMES33.
• Classification. The integration of the raw data with an ontology for future work with integrated data.
• Quality. The quality characteristics like coverage, context or structure are very important. Tools: Sieve34.
• Evolution/Repair. Monitoring of the relevance of data sets and ontologies in order to keep things stable. Repair strategies should be planned for problems that appear. Tools: Sieve.
• Search/Browsing/Exploration. Tools: SemMap35.
Apache Stanbol Apache Stanbol is a set of components that combines traditional content management systems with semantic services. Apache Stanbol includes:
• Content enhancement. The goals are information extraction from contents, content analysis, and presenting contents as RDF. It is used to improve search and navigation.
24http://lod2.eu/Project/Valiant.html 25https://stanbol.apache.org/ 26http://dbpedia-spotlight.github.io/demo/ 27http://d2rq.org/ 28http://lod2.eu/Project/Virtuoso.html 29http://en.wikipedia.org/wiki/WYSIWYM 30http://lod2.eu/Project/PoolParty.html 31http://lod2.eu/Project/OntoWiki.html 32http://lod2.eu/Project/Silk.html 33http://lod2.eu/Project/LIMES.html 34http://sieve.wbsg.de/ 35http://aksw.org/Projects/SemMap
• Reasoning. The Stanbol reasoners analyze sets of axioms and facts in order to derive logical consequences (additional semantics).
• Knowledge models. The Ontology Manager provides access to ontologies stored in the system for managing ontologies, ontology networks, and user sessions [28].
• Persistence or Contenthub. It is a document repository for storing semantic information.
The functionalities of the components are exposed in terms of a RESTful web service API.
D2RQ D2RQ is a platform for retrieving data in the form of RDF graphs from relational databases without additionally storing the data in an RDF store. The D2RQ Platform is "a system for accessing relational databases as virtual, read-only RDF graphs. It offers RDF-based access to the content of relational databases without having to replicate it into an RDF store"36. The system supports querying of non-RDF databases using SPARQL, presentation of relational databases as Linked Data and access to the data, use of the Apache Jena API, and creation of custom dumps. The platform includes:
• The D2RQ Mapping Language. The language describes the mapping between relational databases and ontologies. The data is presented as virtual data graphs that include information from relational databases (a mapping sketch follows this list).
• The D2RQ Engine. The engine converts Jena API calls into common SQL queries according to a mapping description.
• D2R Server gives the ability to publish the data as Linked Open Data. The server transforms the data from a relational database into RDF format according to a mapping description. After the transformation, the data can be browsed and searched.
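The following is a minimal sketch of a D2RQ mapping; the lw: vocabulary, the database schema, and the connection details are illustrative assumptions. It maps rows of a MOVIE table to instances of a Movie class.

    @prefix map:  <#> .
    @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
    @prefix lw:   <http://example.org/vocab#> .   # illustrative vocabulary

    map:database a d2rq:Database ;
        d2rq:jdbcDriver "com.mysql.jdbc.Driver" ;
        d2rq:jdbcDSN    "jdbc:mysql://localhost/movies" .

    # Each row of the MOVIE table becomes one lw:Movie resource.
    map:Movie a d2rq:ClassMap ;
        d2rq:dataStorage map:database ;
        d2rq:uriPattern  "movie/@@MOVIE.ID@@" ;
        d2rq:class       lw:Movie .

    # The TITLE column becomes a lw:title property of the movie.
    map:movieTitle a d2rq:PropertyBridge ;
        d2rq:belongsToClassMap map:Movie ;
        d2rq:property          lw:title ;
        d2rq:column            "MOVIE.TITLE" .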
Virtuoso Virtuoso is a multi-model data server for data and information storage and knowledge management. It allows access to various data sources that are stored in different formats and supports various query languages and data representation formats, for example SQL, SPARQL, JDBC, HTTP, WebDAV, XML, and RDF. Virtuoso covers many areas like Data Management (Relational, RDF Graph, or Document), Free Text Content Management & Full Text Indexing, Document Web Server, Linked Data Server, Linked Data Deployment, and Messaging.
PoolParty PoolParty is a thesaurus management system for the generation of knowledge models and the creation of thesauri and taxonomies. The platform is based on semantic technology and provides the
36http://d2rq.org/
ability to combine thesauri with Linked Open Data. The information is analyzed by the system and published into a semantic graph. The system has the following features:
• It can analyse documents in order to find inconsistencies between existing taxonomies and the content.
• The system follows W3C's SKOS standard (a small SKOS sketch follows this list).
• Connection to Linked Data.
• Support for various datatypes.
• Use of Virtuoso for knowledge graph storing.
• Integration with SharePoint, Drupal, etc.
• The system is based on the following standards: RDF, SPARQL and SKOS.
• Integration with other enterprise systems.
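A small SKOS sketch of the kind of concept data such a thesaurus system manages; the ex: namespace and the concept URIs are illustrative assumptions, and the skos:exactMatch link shows how a thesaurus entry can be combined with Linked Open Data.

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/thesaurus/> .   # illustrative namespace

    ex:semanticWeb a skos:Concept ;
        skos:prefLabel "Semantic Web"@en ;
        skos:broader   ex:webTechnology ;
        # mapping link into the LOD cloud
        skos:exactMatch <http://dbpedia.org/resource/Semantic_Web> .

    ex:webTechnology a skos:Concept ;
        skos:prefLabel "Web technology"@en .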
A Link Discovery Framework for the Web of Data (SILK). "Using the declarative Silk - Link Specification Language (Silk-LSL), data publishers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked." [?]. SILK specifies RDF links between data sources and the conditions for data interlinking. Different similarity metrics can be used to define the links. LIMES. LIMES is a link discovery framework. The approach refers to the "interlinking" phase and is based on the estimation of similarity between instances [8] [51]. Pairs of instances are filtered to find those that satisfy the specified conditions. The approach also includes machine-learning algorithms (EAGLE37, COALA38 and EUCLID39) to find the appropriate pairs of instances. The framework includes several modules: a control module (matching process coordination), a data module (consists of classes needed to work with data), an I/O module (used for data reading and data extraction), a query module, and the LIMES engine (used for computing results).
Sieve. Sieve relates to the quality phase of the Linked Data Life Cycle. The tool consists of two modules: data quality and data fusion [58]. Sieve assesses the quality of data through various mechanisms:
• Assessment Metrics. The metrics combine some quality indicators and "calculates an assessment score from these indicators using a scoring function" [58].
37http://en.wikipedia.org/wiki/Eagle_strategy 38http://www.cs.mu.oz.au/~jbailey/papers/coalafinal.pdf 39http://en.wikipedia.org/wiki/Euclidean_algorithm
• Data quality indicators. The indicators depend on the information that the users need and on the specific situation.
• Scoring functions. The functions are related to data quality indicators and present an evaluation of them. They include simple comparison functions, complex statistical functions, network analyses, etc.
• Aggregate Metrics. The metric aggregates assessment metrics with use of the average, sum, max, min or threshold functions.
The second module presents a data fusion mechanism. "Data Fusion is commonly seen as a third step following schema mapping and identity resolution, as a way to deal with conflicts that either already existed in the original sources or were generated by integrating them" [58]. There are two types of fusion function in Sieve:
• Filter function. It uses a quality metric to remove some values from the input data sets.
• Transform function. It generates new values from input datasets with use of fusion functions like Filter, First, Last, Random, Average, Max, or Min.
SemMap is used for knowledge visualization. It explores spatial areas and shows objects according to specific properties. The interaction between triple stores and the application is realized via SPARQL queries.
Sig.ma Sig.ma is a Semantic Web mashup. The application has the following tasks [80]:
• Browsing the Web of Data. Sig.ma browses information according to the input text data. The application returns data from the Web of Data (e.g., name, title, location, etc.). The user has the ability to follow the links that the system returned.
• Embedding, linking and Sig.ma alerts. The user has the ability to expand and refine the sources in order to select the needed values and properties.
• Structured property search for multiple entities. Search for properties. For example, the request "title, actor, year, [...] @ Harry Potter" returns an array with the given properties for the entity "Harry Potter".
The search for data sets has the following steps [80]:
• Data source selection. The result is a list of sources that have been found via various search engine interrogations.
• Parallel Data Gathering. The extraction of structured data from different data sources.
• Extraction and Alignment of related subgraphs. The structured data are separated into parts, each of which contains a resource description. In the next step, similar data are identified and connected via owl:sameAs, as sketched below.
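A minimal sketch of such an alignment in Turtle; the two source URIs are illustrative assumptions.

    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    # Two descriptions of the same entity, gathered from different sources,
    # are aligned so that their properties can be consolidated into one view.
    <http://source-a.example.org/person/42>
        owl:sameAs <http://source-b.example.org/people/tim> .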
The information above can be summarized as follows: LOD2 is a large research and development project which covers the full Linked Data cycle from data extraction to search. LOD2 focuses on data and information integration, quality of data, and bringing Linked Data to enterprises.
3.2 Semantic Description Approaches
In this part the following kinds of description methodologies are presented:
• Service description approaches such as WSDL, OWL-S, WSMO, and WSMO Lite.
• Approaches which integrate services and Linked Data, like LIDS, LOS, Data-Fu, RESTDesc, and Karma.
• A mapping approach from relational databases to RDF, namely R2RML.
In the course of this analysis it was important to clarify whether there is one approach that can be applied to Linked Widgets.
3.2.1 Web Services Description Language (WSDL) WSDL is an XML-based language and a model for Web service descriptions. A WSDL description provides machine-readable information about how the service can be invoked, what data or information is needed, and what the service returns. The service description includes the operations provided by the service and the expected parameters. The model of WSDL is a set of components and properties. There are two versions of WSDL: 1.1 and 2.0. Version 2.0 is part of a W3C recommendation. WSDL 2.0 provides two kinds of information: an abstract model (application-level description) and a concrete model (the specific protocol-dependent details) [66]. The separation is needed because different endpoints with dissimilar access protocols can offer common functionality. The abstract model describes the messages that are sent and received by a Web service. The concrete model describes the communication protocol (e.g., SOAP), service interactions, and the endpoint of communication (the address). A WSDL document uses the following elements in the definition of web services40 [66]:
• Types – a container for data type definitions using some type system (such as XSD) [66].
• Message (WSDL 1.1) includes essential information for operation execution and corresponds to an action (an operation).
• Operation - “an abstract description of an action supported by the service“.
• Port Type (WSDL 1.1) or Interface (WSDL 2.0) – a list of operations (inputs and out- puts) that can be performed by one or more endpoints.
40WSDL 1.1 and WSDL 2.0 use in some cases different terms
• Binding indicates a protocol and a data format specification for a port type (SOAP binding style);
• Port (WSDL 1.1) or Endpoint (WSDL 2.0) – usually a URL to a single endpoint.
• Service – a set of endpoints.
Listing 3.2 presents the main elements of a WSDL description.
    <description>
        <documentation />*
        [ <import /> | <include /> ]*
        <types />?
        [ <interface /> | <binding /> | <service /> ]*
    </description>
Listing 3.2: Main Elements of a WSDL Description
WSDL is "an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information"41. It focuses on the technical side of processes and describes the syntax of a service, but not its semantics. This approach cannot cover all requirements of semantic service retrieval and service composition.
3.2.2 Semantic Annotation for Web Services Description Language (SAWSDL) SAWSDL provides "mechanisms using which semantic annotations can be added to WSDL components" [70]. SAWSDL provides mechanisms by which concepts from the semantic models that are defined either within or outside the WSDL document can be referenced from within WSDL components as annotations [70]. Based on the member submission WSDL-S, the key design principles for SAWSDL are [70]:
• "The specification enables semantic annotations for Web services using and building on the existing extensibility framework of WSDL.
• It is agnostic to semantic representation languages.
• It enables semantic annotations for Web services not only for discovering Web services but also for invoking them".
3.2.3 Semantic Markup for Web Services (OWL-S) OWL-S is an OWL-based ontology for the description of web services. The language constructs are used for describing the properties and capabilities of Web services. "OWL-S markup of Web services will facilitate the automation of Web service tasks including automated Web service discovery, execution, interoperation, composition and execution monitoring" [25]. Descriptions of Semantic Web services usually have three interrelated subontologies or profiles:
41http://www.daml.org/services/owl-s/1.0/owl-s-wsdl.html
Figure 3.5: Top level of the service ontology
• The service profile provides the service description, in standard OWL.
• The process model describes the processes inside the semantic web service.
• The service grounding describes access to the semantic web service, typically expressed in WSDL.
Like WSDL, OWL-S has abstract and concrete models. The abstract characterizations are the service profile and the process model. The service grounding provides the concrete information needed for access to a web service, like message formats, protocols, etc. Figure 3.5 depicts the relations between a service and its components: the arrows represent OWL properties and the ovals show OWL classes. The Service Model includes:
• Inputs and outputs. The inputs describe the data that the service needs to process; the outputs describe the result data that the service produces. Both properties are values from the Service Model.
• Precondition is a proposition that has to be true to execute the service.
• Result is a condition that becomes true after process execution.
The Service Profile specializes the representation of services [25]. An OWL-S profile provides the following kinds of information:
• Non-functional description (metadata like service name, description, contact information etc.). For example, the provider information includes information about the entity that is responsible for running the service.
• Function description about information transformation (the function that can be computed, service characteristics) and service states (precondition and postcondition, fact). For example, a booking service may require as precondition the first and last name of a person, credit card data, and an identity card ID. As output the service returns a booking confirmation.
The service profile includes references to the service model, therefore it is possible to find those services that best satisfy requests. The service grounding describes access to the semantic web service and the implementation in WSDL, SPARQL, etc. It represents the exchange of information between the consumer and the service provider, and maps the abstract specification to a concrete model [25]. For example, in the case of WSDL, it maps each atomic process to a WSDL operation and relates "each OWL-S process input and output to elements of the XML serialization of the input and output messages of that operation" [57]. The inputs and outputs of a process at the grounding level are realized as messages. The following example demonstrates the syntax of OWL-S. The process model describes the processes inside the semantic web service. Each service has inputs, outputs, conditions, result variables, and effects, e.g. the service "purchase": to buy some products on the internet, the user has to provide the information about his or her credit card, like the credit card number, a name, and a CVV. The card will be the input of the process "purchase" (c.f. Listing 3.3).
    <!-- sketch: the listing content was lost in extraction; element names
         follow the OWL-S process ontology, the concept URI is illustrative -->
    <process:Input rdf:ID="CreditCard">
      <process:parameterType rdf:resource="&concepts;CreditCard"/>
    </process:Input>
Listing 3.3: OWL-S Example
As output after the purchase the user gets a confirmation number (c.f. Listing 3.4).
    <!-- sketch: reconstructed output declaration, concept URI illustrative -->
    <process:Output rdf:ID="ConfirmationNumber">
      <process:parameterType rdf:resource="&concepts;ConfirmationNumber"/>
    </process:Output>
Listing 3.4: OWL-S Example
A result variable should also be defined (c.f. Listing 3.5). The result variable is a variable scoped to the Result block and bound by the result condition.
    <!-- sketch: reconstructed result-variable declaration -->
    <process:hasResult>
      <process:Result>
        <process:hasResultVar>
          <process:ResultVar rdf:ID="PurchaseConfirmed"/>
        </process:hasResultVar>
      </process:Result>
    </process:hasResult>
Listing 3.5: OWL-S Example
The advantages are: OWL-S synthesizes both an extensional and a functional view of Web services; it provides a complete description of the services that it describes [57]; and OWL logic and ontological reasoning are included in the description.
The disadvantages are: OWL-S is limited by using OWL as a language based on description logic [6]; it is hard to describe the semantic relation between input and output because it doesn't provide mechanisms to express their relation to other services [81]; and there are no working tools.
3.2.4 Web Service Modeling Ontology (WSMO) WSMO is an ontology for the description of the core elements of Semantic Web Services. The following design principles are the basis for WSMO: Web compliance (use of URIs for resource identification), ontology-based (resource description is based on an ontology), strict decoupling ("each resource is specified independently without regard to possible usage or interactions with other resources" [68]), centrality of mediation, ontological role separation (separation of client roles), description versus implementation (separation between the description of a Web Service and its implementation), execution semantics (technical realization), and service versus Web service. WSMO uses a similar approach to OWL-S for declaring and describing services, but there is a difference: while OWL-S focuses more on the description of services, WSMO focuses more on the application domain and on solving integration problems [47]. The model includes four parts:
• Ontology - a domain description that can be used by other WSMO elements. This part includes the machine-processable information that is needed for adding meaning to the data.
• Web service interface - semantic description of services (the capabilities, interfaces and internal working of the service);
• Goal - results or goals of the usage of the web service;
• Mediator - coordinates WSMO components.
An ontology in WSMO also includes non-functional properties, used mediators, concept definitions, relation definitions, axioms, and instances [47]. The non-functional properties are globally accessible by all the modelling elements. The properties can come from controlled vocabularies like Dublin Core, from other vocabularies, or from a standard set provided by WSMO (hasContributor, hasDate, etc.). Used mediators serve for linking to ontologies that should be imported, linking the goals, linking between services and WSMO goals, and orchestration. In comparison to WSMO, OWL-S does not support such a meta-ontology.
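To illustrate, non-functional properties of the kind WSMO draws from Dublin Core could look as follows when expressed as RDF; the service URI is an illustrative assumption, and WSMO descriptions themselves are typically written in the WSML syntax rather than Turtle.

    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    <http://example.org/services/movieFinder>
        dc:title       "Movie finder service" ;
        dc:contributor "Example Org" ;
        dc:date        "2014-04-23" .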
3.2.5 WSMO lite WSMO-Lite is a lightweight approach which is standardized according to W3C standards for semantic service description, "the next evolutionary step after SAWSDL, filling the SAWSDL annotations with concrete semantic service descriptions" [84], which can be directly applied to a WSDL description. WSMO-Lite allows bottom-up modelling of web services. WSMO-Lite adopts the WSMO model and makes its semantics lighter in the following major aspects: WSMO-Lite treats mediators as infrastructure elements, and specifications for user goals as dependent on the particular
discovery mechanism used; WSMO-Lite only defines semantics for the information model and for functional and nonfunctional descriptions; and it accepts any ontology language based on the Resource Description Framework [84]. The approach treats Web services as atomic; it does not focus on the internal behaviour of web services [84] and does not have a concrete language for describing the semantics of functions. The WSMO-Lite service ontology has three parts:
• A domain ontology that presents an information or data structure model. WSMO-Lite identifies types and a simple vocabulary for the semantic description of services, and languages used to express descriptions.
• Capabilities and/or functionality classifications that present the functional description of the service (condition definitions, effects); a classification sketch follows this list.
• The non-functional description is represented by an ontology that specifies policies or other non-functional properties.
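A minimal sketch of such a functional classification, assuming the WSMO-Lite namespace; the ex: ontology and the category names are illustrative. Services can then point to these categories, e.g. via sawsdl:modelReference annotations.

    @prefix wl:   <http://www.wsmo.org/ns/wsmo-lite#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/onto#> .   # illustrative ontology

    # A small category hierarchy used as a functional service classification.
    ex:BookingService a wl:FunctionalClassificationRoot .
    ex:HotelBookingService rdfs:subClassOf ex:BookingService .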
The main disadvantage of this approach is that it focuses on description of Web APIs, and not on providing relationships between data that is processed by web services.
3.2.6 RESTDesc semantic description RESTDesc is a semantic, functionality-centred method which expresses the functionality of a service - as well as its communication - in a concise way that appeals to humans and can be processed automatically [81] [82]. The main elements of the RESTDesc approach are the precondition, the postcondition, and the request details. The precondition describes the input state of a resource of a service. The postcondition is the output state after the interaction, and the request details define the method which should be used to achieve the new state. These elements are brought together in the form of a rule, which takes care of correct quantification and variable instantiation [81]. The approach presupposes the use of an ontology model, e.g. an RDF schema. Links are used to define the relationships between resources; e.g. for http://example.org/pictures/1 and http://example.org/pictures/1/animals/1, the link means that picture 1 is grouped into the category "Animals". Listing 3.6 demonstrates the use of the RESTDesc description language for service description. The service gets a director as input data and returns a list of movies. The precondition defines the input of the service, an instance of the class movie:Director. This input is required for the service invocation. In the postcondition section the HTTP vocabulary is used to describe a GET request. The directorOf link shows the relationship between the director and the movies. The service returns the list of movies and provides additional data such as year and actor.
    # prefix URIs were elided in the original; the movie: vocabulary is illustrative
    @prefix movie: <http://example.org/movie#> .
    @prefix http:  <http://www.w3.org/2011/http#> .
    @prefix tmpl:  <http://purl.org/restdesc/http-template#> .

    { ?director a movie:Director. }
    =>
    {
      _:request http:methodName "GET";
                tmpl:requestURI (?director "/movie");
                http:resp [ tmpl:represents ?movie ].
      ?director movie:directorOf ?movie.
      ?movie movie:year _:year;
             movie:starring _:actor;
             movie:type _:type.
    }.
Listing 3.6: RESTDesc Example
The advantages of this approach are: it is possible to describe relationships between input and output data (e.g., ?director movie:directorOf ?movie), and it links web services directly to data sets. The disadvantages are: the user has to write the description of the model manually, since RESTDesc doesn't support automatic generation of service descriptions; and the approach focuses on applying HTTP methods for data retrieval, publishing, etc., while a mashup focuses on Linked Data consumption and data description.
3.2.7 SA-REST SA-REST is a simple and open microformat for enhancing Web resources with additional semantic information [31]. The meta information can be modeled according to various formats such as RDFa, OWL, or Gleaning Resource Descriptions from Dialects of Languages (GRDDL42). Altogether this makes the service description human-readable as well as machine-readable. The main idea of SA-REST is to add the semantic description directly into SAWSDL or HTML code. SA-REST, like SAWSDL, "annotates outputs, inputs, operations, and faults, along with the type of request that it needed to invoke the service" [48] in the form of URIs. This means that SA-REST links an ontology to a service; for example, an input message can be annotated by embedding a URI from an ontology. An important point of the approach is the lifting and lowering schema specification. It is used for transforming data structures from the input or output of services to an ontology. The idea is similar to the OWL-S grounding. To realize this transformation SA-REST uses XSLT or XQuery. The queries take a data structure from the implementation level (data expected as input or output of the service) and convert it into an ontology structure. Listing 3.7 demonstrates a Web page which is annotated with the use of SA-REST. In this example, the user searches for information about a movie. The user puts a title object to the movie-search-service, and the service returns the description from the output of the movie-search-service.
    <!-- sketch: the markup was lost in extraction; property names follow the
         SA-REST microformat, all URIs are illustrative -->
    <html xmlns:sarest="http://lsdis.cs.uga.edu/SA-REST">
    <p about="http://example.org/movie-search-service">
42www.w3.org/TR/grddl/
      <meta property="sarest:input"
            content="http://example.org/ont.owl#Movie_Title" />
      <meta property="sarest:output"
            content="http://example.org/ont.owl#Movie_Description" />
      <meta property="sarest:action" content="HTTP GET" />
    </p>
    </html>
Listing 3.7: SA-REST Example
The advantages of SA-REST are: it adds semantics directly to REST services, WSDL, or HTML; SA-REST doesn't enforce the choice of language for representing an ontology or a conceptual model, but allows the use of OWL or RDF [48]; and SA-REST is "a more general purpose language that adds semantic annotations only to those page elements that wrap a service or a service description" [48]. This could cause problems associated with widget composition and discovery. The disadvantages are that the annotation of web pages is often problematic, since the programmer usually has to select the page which will be annotated with the semantic description. Additionally, for Linked Widgets it is important to separate the technical and semantic parts.
3.2.8 EXPRESS EXPRESS is an approach for semantic service description. The main feature of EXPRESS is providing "an uniform interface for resources" [6]. The resources are described with the use of an OWL ontology and the HTTP methods (GET, PUT, DELETE, POST and OPTIONS). The RESTful interface can be created automatically because of the automatic direct mapping between entities and resources. EXPRESS includes a service provider and an EXPRESS deployment engine. The service provider provides an OWL file describing the resources in a Web Service [7]. The OWL file also defines the "exchanged message format" [7]. The URIs for the resources are generated through the EXPRESS deployment engine. After URI generation, the service provider assigns the HTTP methods to the classes, properties, and instances [7]. User roles are provided to differentiate the access to resources and methods for different kinds of users (role-based access control). Listing 3.8 shows an example of a DVD ordering service description using the EXPRESS approach. DVD ordering is provided by a Web Service. The service provider provides an ontology that describes the entities and the relationships between them. In this example, the classes are DVD, Customer and Order. The customer can order movies and games (the subclasses of class DVD).
    # the customer can order movies and games
    :DVD a owl:Class.
    :Movie rdfs:subClassOf :DVD.
    :Game rdfs:subClassOf :DVD.

    # an order can include movies and games; the order has properties
    # that define a customer and the time of the ordering
    :Order a owl:Class.
    :hasDVD a owl:ObjectProperty;
        rdfs:domain :Order; rdfs:range :DVD.
    :orderedBy a owl:ObjectProperty;
        rdfs:domain :Order; rdfs:range :Customer.
    :hasDate a owl:DatatypeProperty;
        rdfs:domain :Order; rdfs:range xsd:dateTime.

    # class Customer
    :Customer a owl:Class.
    :hasName a owl:DatatypeProperty;
        rdfs:domain :Customer; rdfs:range xsd:string.
Listing 3.8: EXPRESS Example
The EXPRESS deployment engine generates the URIs. E.g., http://www.example.org/DVD is a URI for the class DVD and http://www.example.org/Order is a URI for the class Order. URIs are also generated for properties and instances of the classes. E.g., http://www.example.org/customer1 is a URI for an instance of the class "Customer", and http://www.example.org/customer1/hasName is a URI for the property "hasName". The next step is assigning methods to the resources, which defines the HTTP methods for each URI. If there are different types of users in the system, the methods are defined via role-based access control. After the role definition, stubs are automatically created. Firstly, for DVD ordering the user sends a request to the server via the GET method. The method returns the list of DVDs from the OWL file. Secondly, the user orders items via a POST request to http://www.example.org/Order. The response of the server will be a URI of the new order (http://www.example.org/order1). If the user already exists, the server automatically inserts the URI of this user into the new order; otherwise, a new user will be created and the server will return a new URI. The desired products are added via a PUT request to the server. Listing 3.9 is an example of the message that will be sent to the server.
    # the customer "Irina" ordered an item
    :customer1 a :Customer ;
        :hasName "Irina".
    :order1 a :Order ;
        :hasDVD :Movie ;
        :orderedBy :customer1.
Listing 3.9: EXPRESS Example
The EXPRESS approach has the following advantages: it eliminates the need to describe services separately [7], it is not as complicated as WSMO and OWL-S, and it uses an OWL ontology to provide "a description of a RESTful Semantic Service" [6]. The disadvantages of the EXPRESS approach are: there is no implementation of this approach, automatic discovery and composition are not yet possible, and the integration of the semantic model into the resource-oriented architecture is not yet implemented [6].
3.2.9 Linked Open Services (LOS) The LOS approach proposes a service description method that simplifies access to semantic web services for LOD specialists [63]. The input and output of the services are connected via links to Linked Data. The semantic description presents what kind of input and output RDF data a service can consume and produce, and "how a service invocation contributes to the knowledge of its consumers" [63]. The approach focuses on data description; therefore services can be more easily integrated in service compositions [63]. LOS does not only follow the Linked Data principles, but also proposes "a list of further service-specific principles to be followed for openly exposing services over Linked Data" [63]. These principles are:
• SPARQL graph patterns for the service description of input and output (including the specification of the data format);
• Use of RESTful content negotiation.
• The explicit relation between outputs and inputs.
• SPARQL CONSTRUCT will be available for lifting or mapping (optionally).
The approach proposes to transform the data to a non-RDF format if the service does not accept RDF data directly; after the processing, the returned non-RDF output data should be transformed back to RDF format. The following code fragments show examples of production and consumption patterns for data that is accepted and returned by a service. The service gets information about actors and returns movies based on the names and birthdays of the actors. The client sends a request which contains information about an actor, the name and the birthday (c.f. Listing 3.10).
    [ a dbpedia:Person;
      dbpediaprop:name ?name; dbpediaprop:birthDay ?b ]
Listing 3.10: Request to a service
After completing the request, the server sends a response in the form of a message. The response contains information about movies (the title, the year, and the actor) (c.f. Listing 3.11).
    [ moviedbbase:movie [
        moviedbbase:title ?title ;
        moviedbbase:year ?year ;
        dbpedia:actors ?actor ;
        dbpedia:name ?name; dbpedia:birthDay ?b ]
    ]
Listing 3.11: Response of a service
The disadvantage is that LOS uses string values to represent graph patterns, e.g. "[a dbpedia:Person; dbpediaprop:name ?name]". Therefore the quality of discovery and composition is reduced.
3.2.10 Linked Data Services (LIDS) LIDS focuses on the integration of Web Services and Linked Data by providing an interface [74]. LIDS follows the Linked Data principles, therefore a set of requirements is fulfilled: a URI for the input of a service is required to invoke the service; the "URI must return a description of the input entity, relating it to the service output data"; and the description has to be modeled according to the RDF standard [75]. The use of URIs as identifiers for input entities has the following advantages: the explicit link between input and output; the entities can be connected to different results; the representation of the result structure by a description; and the meaning of the data by means of an ontology. A Linked Service is interlinked with a Linked Data Endpoint. This makes it possible to enrich Linked Data automatically. Additionally, the LIDS approach supports Linked Data publication and the interlinking of Linked Data Endpoints with Linked Data Services. Listing 3.12 presents the basic elements of a description. SPARQL constructs are used for adding the relation between the data and a service. input represents specific input values and service parameters, endpoint is the URI of a Linked Data Endpoint that is used to construct service calls, and io-relation is the relation between input and output data.
    CONSTRUCT { [io-relation] } FROM [endpoint]
    WHERE { [input] }
Listing 3.12: LIDS Construct
Listing 3.13 presents a construct expression. The variable ?star will be found by the service. The variable ?movie is an input object of the service, which has the properties dbpediaprop:title and dbpediaprop:year. The service receives the title and year attributes of a movie and, based on these attributes, finds and returns a list of stars.
    CONSTRUCT { ?movie dbpediaprop:starring ?star }
    # the FROM endpoint URI was elided in the original
    FROM <endpoint>
    WHERE { ?movie dbpediaprop:title ?title .
            ?movie dbpediaprop:year ?year }
Listing 3.13: LIDS Example
Listing 3.14 shows the basic pattern of LIDS descriptions, which can be added to an ontology, where LIDS is an instance of the Linked Service, ENDPOINT is an HTTP URI of the Linked Service, ENTITY is the name of the entity, INPUT and OUTPUT are graph patterns, and VARS are variables or input parameters.
    # the prefix URI was elided in the original; the LIDS vocabulary
    # namespace at openlids.org is assumed here
    @prefix lids: <http://openlids.org/vocab#> .
    LIDS a lids:LIDS;
      lids:lids_description [
        lids:endpoint ENDPOINT ;
        lids:service_entity ENTITY ;
        lids:input_bgp INPUT ;
        lids:output_bgp OUTPUT ;
        lids:required_vars VARS
      ].
Listing 3.14: LIDS basic pattern
Listing 3.15 presents an example of applying the LIDS approach. The example shows a "movie find service" which returns a set of stars based on a year and a title of a movie.
    :MovieFindService a lids:LIDS;
      lids:lids_description [
        lids:endpoint
          <http://example.org/moviefind/> ;   # endpoint URI elided in the original
        lids:service_entity "movie" ;
        lids:input_bgp "?movie a dbpedia:Work.
                        ?movie dbpediaprop:title ?title .
                        ?movie dbpediaprop:year ?year" ;
        lids:output_bgp "?movie dbpediaprop:starring ?star" ;
        lids:required_vars "title year"
      ].
Listing 3.15: LIDS Example
The work on this approach is not yet finished. The LIDS developers plan to improve tool support, develop an integration mechanism into SPARQL processing, and add usage policies. The disadvantage of the LIDS approach is that it uses graph patterns which are represented as strings, e.g. lids:input_bgp "?movie a dbpedia:Work". This limits service discovery and composition because the graph can not be queried.
3.2.11 Data-Fu The goal of the approach is the specification of data and services that process Linked Data from various data sources. Data-Fu is a "resource-driven programming approach leveraging the combination of REST with Linked Data" [76]. The approach gives the opportunity to develop applications that access semantic web resources using a declarative rule language, and it simplifies web application development by providing links to Linked Data and an interaction specification based on resource state. Data-Fu follows three basic principles: use of URIs for resource identification, use of HTTP methods to access and process data, and interlinking of resources. It also notes that Linked Data "does not distinguish explicitly between URI-identified objects and their representation" [76]. The combination of Linked Data with REST brings the ability to manipulate data; Data-Fu provides a mechanism to define changes of resource states. Data-Fu includes two layers:
• Read/Write Linked Data Resource - the application of HTTP methods to Linked Data resources. The most important methods are GET, POST, OPTIONS, DELETE, and PUT. Data-Fu distinguishes safe and non-safe methods. The non-safe methods affect the state of the resource (e.g., the method DELETE, which deletes some datasets); the safe methods don't affect the state of the resources. "The dependency between communicated input and the resulting state of resources also needs to be described" [76]. For example, the method PUT creates or overwrites a resource with the submitted input.
• REST Service Model - a formalized model for the description of interactions that are supported by RESTful services. It describes the influence of HTTP methods on the states of Linked Data resources and is represented by "a REST state transition system (RSTS)" [76].
Both layers use RDF for the description of methods and resources. The Data-Fu technique also includes an interpreter, an engine which invokes service interactions. The interactions are specified by Data-Fu rules. An advantage of the engine is the ability to process complex queries at the same time. After processing, the engine can store the data in different formats like JSON or RDF. Listing 3.16 presents a description of Linked Data services using the Data-Fu language. The first part describes the HTTP method GET, which returns a movie item. The second part describes a method POST, which adds additional information to the movie item (title, year and a star).
    GET (?mid, {})
      <- { ?mid rdf:type ex:MovieID }

    POST (?d, { [] rdf:type ex:Description;
                ex:title ?t ;
                ex:year ?y ;
                ex:starring ex:Person. })
      <- { ex:Movie ex:hasID ex:MovieID }.
Listing 3.16: Data-Fu Example
The approach focuses on applying HTTP methods for the interaction with Linked Data. The disadvantages of this approach are: it is not defined how to discover and compose services, and the querying mechanism is not defined.
3.2.12 Karma Karma is a tool for integrating data from various data sources and generating semantically interlinked data. An ontology describes the API as well as the semantic relations between input and output data. The Karma approach suggests to "represent the semantics of Web APIs in terms of well known vocabularies, and to wrap these APIs so that they can consume RDF from the LOD cloud and produce RDF that links back to the LOD cloud" [78]. Due to the semantic description of APIs, the description can be queried with SPARQL queries. The modelling includes the following steps: ontology definition, assignment of data to semantic types, and identification of relationships between the data and the ontology. The model of a linked API consists of two parts: the syntactic part, which provides the required information (e.g., a URI, input parameters) for the service, and the semantic part, which describes the input and output data of a service and the relations between them [78]. Figure 3.6 represents the semantic model of this approach. Each service km:Service has inputs km:hasInput and outputs km:hasOutput that are linked to a model km:Model. The input and output models are defined with use of the Semantic Web Rule Language (SWRL)43. The Model is linked to swrl:Atom instances: swrl:ClassAtom, which describes an instance of a class,
43http://www.w3.org/Submission/SWRL/
Figure 3.6: The ontology description of Web APIs. Source: [78]
and swrl:IndividualPropertyAtom, which presents an instance of a property [78]. The variables swrl:Variable are the data that the service gets as input or returns as output. For example, how can a service which receives an instance of the class author as input data and returns an instance of the class publication as output be described with the Karma approach? The class atoms will be linked to instances of the classes author and publication. The individual property atom will have a relation to an rdf:property - "theAuthorOf". The variables will be the author name and birth date. The service processes this data and returns the URIs of one or more publications. Listing 3.17 presents a snippet of a service description with use of the Karma approach.
    # prefix URIs were elided in the original; shown here illustratively
    @prefix : <http://example.org/services#> .
    @prefix dbpedia: <http://dbpedia.org/ontology/> .
    ...
    : a km:Service;
      km:hasName "actors" ;
      hrests:hasAddress "http://api.example.org/findactors?
          title={harry}&username={username}"^^hrests:URITemplate ;
      hrests:hasMethod "GET";
      km:hasInput :input;
      km:hasOutput :output .

    ...

    :input a km:Input;
      km:hasAttribute :in_title;
      km:hasModel :inputModel .
    :in_title a km:Attribute;
      km:hasName "title" ;
      hrests:isGroundedIn "p1"^^rdf:PlainLiteral .

    :output a km:Output;
      km:hasAttribute :out_actor_name ;
      km:hasModel :outputModel .
    :out_actor_name a km:Attribute;
      km:hasName "actor_name" .

    ...
    :title_var a swrl:Variable .
    :inputModel a km:Model;
      km:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate dbpedia:title;
          swrl:argument1 :title_var ];
    ...
Listing 3.17: Karma Example
Another goal of the Karma approach is the automatic modelling and optimization of source models. This is based on a graph-based approach which was introduced by the Karma developers. The main focus is the problem of automatic semantic annotation. The approach increases "the quality of the automatically generated models by using the already modelled sources to learn the patterns that more likely represent the intended meaning of a new source" [77]. There are many sources that provide similar semantically linked data; the task of the project is to use already existing resource models in order to derive a new one.
Figure 3.7: Graph-based approach by an example. Source: [78]
Typically there are two steps in the modelling process. The first step is the determination of semantic types. It means that each attribute should be "labelled with a class or a data property of the domain ontology" [77]. For example, to invoke the service getEmployees it is required to provide the attributes "employer" and "employee". The domain ontology includes two classes: Person and Organisation. As a result of this step, the attribute "employee" will be labelled with the class "Person", and the attribute "employer" with the class "Organization". The second step is the definition of relationships, e.g. a person "worksFor" an organisation (c.f. Figure 3.7). Once a graph is constructed, labelling the attributes with semantic types and searching for appropriate nodes can be performed with machine learning techniques. Next, the models are scored in order to find the one that matches the most coherent and frequent patterns, and a tree for candidate model generation is built. The last step is the generation of a ranking list, according to which the users can choose the correct model. Additionally, the new version of Karma can include a direct mapping between data stored in relational databases and domain ontologies with use of W3C's R2RML44. This mapping language will be introduced in the following section.
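Before moving on, the outcome of the two modelling steps in the getEmployees example can be sketched as RDF; the ex: ontology and the property name are illustrative assumptions.

    @prefix ex: <http://example.org/onto#> .   # illustrative domain ontology

    # Step 1: the attributes are labelled with semantic types.
    _:employee a ex:Person .
    _:employer a ex:Organization .

    # Step 2: a relationship between the typed attributes is selected.
    _:employee ex:worksFor _:employer .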
3.2.13 RDB to RDF Mapping Language (R2RML) In order to make the semantic model more flexible, the mapping from relational databases to RDF is very important. Automatic mapping can increase the volume of data that can be used for specific tasks. The suggestion is to use the RDB to RDF Mapping Language (R2RML) for automatic dataset generation and for combination with the existing web service description models. This part of the Master Thesis is based on the W3C recommendation for R2RML [72]. R2RML is a language for the transformation of "relational database datasets to RDF datasets". The language takes the database structure as input and returns the structure of the new RDF dataset. The transformation to an RDF graph happens via SPARQL constructs. The target RDF vocabulary is composed of the names of the database elements, therefore it is not possible to change the RDF structure or vocabulary. Figure 3.8 presents the meta-model of R2RML. It includes the following elements: "triplesMap, LogicalTable, PredicateObjectMap, GraphMap, SubjectMap, PredicateMap, ObjectMap, RefObjectMap and Join". The input can be an SQL query to the database. The code below presents an example of an SQL query that selects data about movies (title and date) from a movie database.
    [] rr:sqlQuery """
        SELECT ('Movie' || MOVIENO) AS MOVIEID
             , MOVIENO
             , TITEL
             , DATE
        FROM LW.MOVIE
        """;
       rr:sqlVersion rr:SQL2008.
Listing 3.18: R2RML
44http://www.w3.org/TR/r2rml/
The rules for the mapping of a relational dataset to RDF are specified via a TriplesMap, which has exactly one logical table, one subject map, and zero or more predicate-object map properties. The logical table describes the set of data that has to be mapped to RDF.
    # namespace declarations elided in the original
    [] rr:logicalTable [ rr:tableName "MOVIE" ];
       rr:subjectMap [ rr:template
           "http://linkedwidget.org/moviedataset/{MOVIENO}" ];
       rr:predicateObjectMap [
           rr:predicate lw:titel;
           rr:objectMap [ rr:column "TITEL" ];
       ];
       rr:predicateObjectMap [
           rr:predicate lw:date;
           rr:objectMap [ rr:column "DATE" ];
       ].
Listing 3.19: R2RML
The subject map property describes the way the subjects are generated. It may reference one or more rr:class properties (c.f. Listing 3.20). The value of the property is an IRI.
    input:  [] rr:template
                "http://linkedwidget.org/moviedataset/{MOVIENO}" ;
            rr:class lw:Movie.
    output: rdf:type lw:Movie.
Listing 3.20: R2RML
Figure 3.8: An overview of R2RML
The predicate-object map is "a function that creates one or more predicate-object pairs for each logical table row of a logical table" [72]. The predicate-object map is linked to one or more predicate maps, and to one or more object maps or referencing object maps. The term map is "a function that generates an RDF term from a logical table row" [72]. The term map relates to the following RDF terms:
• Constant value (via rr:constant), represented by a resource.
• Column name (via rr:column), a valid SQL identifier.
• String template (via rr:template), a format string for strings building from multiple components.
• rr:IRI, rr:BlankNode, rr:Literal (via rr:termType), defines the type of an RDF term, which can be either an IRI, a blank node, or a literal.
• language tag (via rr:language).
• rdfs:Datatype (via rr:datatype).
• string template (via rr:inverseExpression), for term map optimisation.
"A term map must be exactly one of the following: a constant-valued term map, a column-valued term map, a template-valued term map".
    [] rr:predicateMap [ rr:constant rdf:type ];
       rr:objectMap [ rr:constant lw:Movie ].
    ?x rdf:type lw:Movie.

    [] rr:objectMap [ rr:column "MOVIENO";
                      rr:datatype xsd:positiveInteger ].
Listing 3.21: R2RML
Relations mapping. It is possible to add a reference between two instances instantiated from the database, for example the relation between movies and the actors who have played in the movies. It is realized by adding a predicate-object map whose object map references a triples map and a join condition via rr:parentTriplesMap and rr:joinCondition. The join condition has exactly one value of the property rr:child and one value of the property rr:parent. The following code presents the SQL query if the referencing object map has no join condition.
    SELECT * FROM ({child-query}) AS tmp
Listing 3.22: R2RML
The second code fragment presents the SQL query if the referencing object map has at least one join condition.
    SELECT * FROM ({child-query}) AS child,
                  ({parent-query}) AS parent
    WHERE child.{child-column1} = parent.{parent-column1}
      AND child.{child-column2} = parent.{parent-column2}
      AND ...

    [] rr:predicateObjectMap [
         rr:predicate lw:movie;
         rr:objectMap [
           rr:parentTriplesMap <#TriplesMap2>;
           rr:joinCondition [
             rr:child "MOVIENO" ;
             rr:parent "MOVIENO" ;
           ];
         ];
       ].
Listing 3.23: R2RML
The result is a triple of the form <movie-URI> ex:starring <actor-URI>.
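Once the mapping has been executed, the generated triples can be queried like any other RDF data. A minimal sketch, assuming a binding for the lw: prefix used in the listings above:

    PREFIX lw: <http://linkedwidget.org/vocab#>   # assumed prefix binding

    # List all movies with their titles from the mapped dataset.
    SELECT ?movie ?title
    WHERE { ?movie a lw:Movie ;
                   lw:titel ?title . }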
3.3 Summary
This chapter presented some of the existing mashup platforms (Yahoo!Pipes, DERI Pipes, BIO2RDF), the tools that are provided by LOD2, and semantic approaches for enhancing web services with additional semantic information. The first part of the analysis showed that the applications have a number of weak points:
• Most of the applications are not general, i.e. the focus is usually a specific problem. For example, BIO2RDF provides only Life Science Linked Data, LIMES is based on the estimation of similarity between instances [8], etc.
• The systems do not give a possibility to develop new functions that can solve additional tasks.
• The mashup platforms are not described semantically, therefore composition and discov- ery are very difficult.
• For non-professional users it is often difficult to use these applications because specific knowledge is needed.
The second part of the analysis described the advantages and disadvantages of the semantic description approaches. The approaches can be categorized into the following types of service description approaches: approaches that focus on technical aspects of web services, approaches that focus on the integration of web services and Linked Data, approaches that focus on the quality of the ontology models, and mapping approaches that transform different types of data into RDF.
The main focus of the first group is the representation of the interaction between software components. Most of these approaches do not describe explicit relations between input and output data. Additionally, the developer needs very good knowledge of this domain: he or she has to describe preconditions for the Web Service execution, postconditions, and effects, as well as very detailed rules and the choreography and orchestration of the service. In comparison to web services, widgets do not have such a wide variety of functionalities that must be described. Additionally, the mashup platform supports widget development. Knowledge workers often don't have enough practical experience in service-oriented architecture, therefore the mashup platform should provide automatic generation of the semantic widget description. Due to this fact, applying these approaches for widget description is not possible.
The second group of approaches focuses on the description of relations between Linked Data and Semantic Web Services. This is advantageous because widgets process Linked Data. But most of these approaches have limitations; for example, LIDS and LOS "integrate data services with Linked Data by assigning a URI to each service invocation. The service URI is linked to resources in the Linked Data cloud and dereferencing the URI provides RDF information about the linked resources" [78]. The input and output graphs are presented as strings. This limits widget discovery and composition. Additionally, it should be possible to query the semantic descriptions with SPARQL. Data-Fu and EXPRESS don't support easy service querying; therefore these approaches are not applicable for widgets. An approach which can support widget publishing, widget discovery, widget composition, and widget execution is Karma. An example of a widget description with use of the Karma ontology is provided in Chapter 4.
The third and fourth groups of approaches are not relevant in this stage of the mashup platform implementation. In future, it may be possible to extend the semantic model with the relational database to RDF mapping (R2RML) in order to increase the amount of datasets that can be processed by the mashup platform.
The benchmarking tables 3.1 and 3.2 summarize the features of the approaches that are described in this chapter:
• Possibility to publish the service description on the LOD cloud. Does the approach follow Linked Data principles to make information available on the LOD Cloud?
• Discovery and composition based on input and output data. Does the approach support a description of input and output data on the basis of which discovery and composition of the services is possible?
• Provenance information. Is it possible to define the origin of data sets?
• Description of relations between data. Does the approach support semantic relations between input and output data?
• Separation of presentation and data level.
• Complexity. How much time does the developer need to spend to become familiar with the approach?
• Possibility to discover the service using SPARQL. Is it possible to query the models?
Table 3.1: Approaches comparison. Part 1

WSDL | Goal: description of web service functionality. Method: description of service endpoints and their messages. Publishing on the LOD cloud: no. Discovery and composition based on input and output data: no. Provenance information: no. Description of relations between data: no. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: no.

SAWSDL | Goal: adding semantic annotations to WSDL. Method: extension of WSDL. Publishing on the LOD cloud: no. Discovery and composition: the concepts from the semantic models are referenced from within WSDL components as annotations. Provenance information: it is possible to extend the model. Description of relations between data: no. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: no, an extension is required [41].

OWL-S | Goal: semantic description of services. Method: description logic (OWL) for service description. Publishing on the LOD cloud: no. Discovery and composition: via OWL-S process models (based only on input/output data, ignoring relations). Provenance information: it is possible to extend the ontology. Description of relations between data: no. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: provides basic functions for discovering; it is required to extend the model, e.g. [30].

WSMO | Goal: semantic description of services. Method: F-Logic logical expressions. Publishing on the LOD cloud: no. Discovery and composition: complicated, hard to implement. Provenance information: no. Description of relations between data: no. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: provides basic functions for discovering; it is required to extend the model.

WSMO-Lite | Goal: adding semantic descriptions to service functionalities. Method: an annotation mechanism for WSDL using the WSMO-Lite service ontology. Publishing on the LOD cloud: no. Discovery and composition: no. Provenance information: no. Description of relations between data: no. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: no.

SA-REST | Goal: adding semantics to the service description. Method: adding the annotation in the service description. Publishing on the LOD cloud: no. Discovery and composition: no. Provenance information: difficult. Description of relations between data: no. Separation of presentation and data level: no. Complexity: no. Discovery using SPARQL queries: no.

Table 3.2: Approaches comparison. Part 2

RESTDesc | Goal: adding semantic descriptions of service functionalities. Method: describing precondition and postcondition of the resources of a service. Publishing on the LOD cloud: no. Discovery and composition based on input and output data: yes. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: no. Discovery using SPARQL queries: no.

EXPRESS | Goal: adding semantic annotations to services. Method: description of services with use of OWL. Publishing on the LOD cloud: no. Discovery and composition: no. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: no. Discovery using SPARQL queries: no.

LOS | Goal: semantic description of services that process Linked Data. Method: applying SPARQL constructs for service description. Publishing on the LOD cloud: no. Discovery and composition: difficult, because of using string values for graph patterns. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: no. Discovery using SPARQL queries: no, because of using string values for graph patterns.

LIDS | Goal: semantic description of services that process Linked Data. Method: semantic description of services following Linked Data principles. Publishing on the LOD cloud: yes. Discovery and composition: difficult, because of using string values for graph patterns. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: no. Discovery using SPARQL queries: no, because of using string values for graph patterns.

Data-Fu | Goal: semantic description of services that process Linked Data. Method: using a declarative rule language. Publishing on the LOD cloud: no. Discovery and composition: no. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: no. Discovery using SPARQL queries: no.

Karma | Goal: integration of data from different sources. Method: providing semantic descriptions of data and APIs. Publishing on the LOD cloud: yes. Discovery and composition based on input and output data: yes. Provenance information: no. Description of relations between data: yes. Separation of presentation and data level: yes. Complexity: yes. Discovery using SPARQL queries: yes.
CHAPTER 4 Solution
4.1 Definition of requirements
Figure 4.1 presents an example of a mashup. The mashup provides a combination of wired widgets, i.e., simple applications that provide some functionality for data processing or visualization, such as “Location“ and “Air Quality Filter“. The main components of a widget are input/output terminals and options. The input and output terminals are used to wire the widgets in order to process the data. Additionally, the widgets include options, i.e., inputs that influence the data processing, such as “Choose location type“, “Street“, and “Maximum distance“. The widgets can be categorized into the following types: data widgets, which access data sources and retrieve data; processing widgets, which process data retrieved from other widgets (e.g., “geo merger“); presentation widgets, which visualize data sets in the form of diagrams, maps, etc.; and user interaction widgets, which provide additional functionality, e.g., item selection.
Figure 4.1: Mashup example
A goal of this thesis is to develop a semantic model that supports publishing widgets on the LOD cloud, widget discovery, widget composition and execution, and selection of the required input from the provided context information based on a semantic model. The previous two sections have given an overview of the principles of the Semantic Web and Linked Data, and of Semantic Web Service description. Based on these principles, the following basic requirements and widget features can be specified:
1. Widgets and Mashups are identified via an identifier, a URI. User agents may dereference the widgets via these URIs. The user has the possibility to share and publish information about unique widgets and mashups.
2. By dereferencing the widget URIs, the semantic model will be returned. Widgets have semantic models that describe what kind of data a widget can retrieve and process. This model will be returned.
3. The model should follow web standards (W3C recommendations), e.g., use Semantic Web standards for data description (RDF, PROV). “The use of standards enables the Web to transcend different technical architectures“ [36]. The use of a standardized content format makes it possible to process and publish data on the Web. RDF is used to represent the data structure and enables the integration of information from multiple sources. Since the widget description is expressed with RDF standards, it should be possible to discover and compose the widgets.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into a data space and to navigate over the data sets. For example, a link adds the relationship “owns“ between an owner and his/her pet. The mashup platform can find the URI of a widget that retrieves RDF data describing pet owners. Following the “owns“ links, the mashup platform can find widgets that can process data about their pets (a minimal sketch is given after this list).
5. A widget may have more than one semantic model, but all of them should generate the same output with an explicit relation to the input graph. An example is finding geographical coordinates based on different types of locations, such as parks, organisations, and libraries, which have different properties and can be consumed from various Linked Data endpoints. For example, organisations can be consumed from the DBPedia endpoint and places from Open Governmental Data like US Governmental Data1 or similar Linked Data endpoints. The output of the widget will be a set of points that are modelled according to the GeoNames Ontology2 and related to the location models.
1http://www.data.gov/ 2http://www.geonames.org/ontology/documentation.html
6. The input and output data should be interlinked, and explicit relations between data should be defined. Figure 4.2 presents an example of the widget “DBPedia Film Merger“. This widget can have instances of the classes dbpedia:Person and dbpedia:Work as input and output. The explicit relations in this case are dbprop:starring and dbprop:directorOf.
7. The model should be general to support various types of widgets. E.g. data widgets, presentation widgets.
8. The semantic model should provide “an explicit representation of provenance information that is accessible to machines, not just to humans“ [85]. The semantic model should provide information about origin and ownership of datasets, change tracking, and access control that will increase people’s trust in data quality.
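As an illustration of requirement 4, the “owns“ link could be expressed in Turtle as in the following minimal sketch; the namespace ex: and all resource names are hypothetical:

@prefix ex: <http://example.org/> .   # hypothetical namespace

ex:john a ex:PetOwner ;
    ex:owns ex:rex .     # the “owns“ link connects owner data to pet data
ex:rex a ex:Dog .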
In the previous chapter, approaches for the semantic description of web services have been compared according to features that are relevant for widgets, such as techniques of composition and discovery, the possibility to add relationships between input and output data, the possibility to publish services on the LOD cloud, provenance information, etc. The benchmarking tables show that the approach provided by Karma satisfies nearly all requirements for the semantic model. The following section provides an implementation of the semantic model based on this approach.
4.2 Use and Extension of Karma Approach
Figure 4.3 shows a semantic model for widget description based on the Karma approach. The model represents the semantics of widgets, including relationships between input and output data, and it uses RDF so that the models can be queried using SPARQL [45].

Figure 4.2: Widget & Semantic Model

Figure 4.3: Linked Widget Model based on Karma approach

The model has two kinds of properties, lw:hasInput and lw:hasOutput, that are linked to a model (property lw:hasModel). The SWRL vocabulary is used to define input and output data. SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of the OWL Web Ontology Language with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language [38]. SWRL makes it possible to write rules expressed in terms of OWL concepts. A swrl:ClassAtom entity shows the membership of an instance in a class, a swrl:DatavaluedPropertyAtom presents an instance of a data property (e.g., an entity of the class dbpedia:Work has the property dbprop:hasTitle, where the title is a string value), and a swrl:IndividualPropertyAtom entity describes an instance of an object property.
Figure 4.2 shows a widget which finds films. The widget receives datasets that contain information about stars and directors. The first terminal is used to wire the widget with a widget that returns datasets about stars; the second terminal is used to wire it with a widget that returns datasets about directors. The widget is identified by a URI, e.g., http://www.linkedwidgets.org/widget/w5, and has two inputs that are connected with the models mw5:starModel and mw5:directorModel, and one output that is connected with the model mw5:filmModel. The models define the data that the widget processes and add relationships between these data with use of the SWRL vocabulary. In this example there are three models, because the widget processes a set of stars and a set of directors in order to return a set of films. Each kind of instance needs a semantic description. The models are depicted in Figures 4.4, 4.5, and 4.6:
• The first picture presents a model of the first input, a set of stars. A star is an instance of the class dbpedia:Person that has the property dbprop:starring.
• The second picture presents a model of the second input, a set of directors. A director is an instance of the class dbpedia:Person that has the property dbprop:director.
• The third picture presents a model of the output, a set of films. The relationships between the star class and the film class, and between the director class and the film class, are described using the instances mw5:PropertyAtom1 and mw5:PropertyAtom2 of the class swrl:IndividualPropertyAtom.
Figure 4.4: The star model
Figure 4.5: The director model

Figure 4.6: The film model

The following code presents a part of a widget description that can be published on the LOD cloud.

@prefix lw:       <http://linkedwidgets.org/ontologies#> .
@prefix mw5:      <http://www.linkedwidgets.org/widget/w5#> .   # assumed IRI
@prefix swrl:     <http://www.w3.org/2003/11/swrl#> .
@prefix ontology: <http://dbpedia.org/ontology/> .              # assumed IRI
@prefix dbprop:   <http://dbpedia.org/property/> .              # assumed IRI
...
mw5:Widget a lw:Widget ;
    lw:hasName "Movie Widget" ;
    lw:hasInput mw5:Input1 ;
    lw:hasInput mw5:Input2 ;
    lw:hasOutput mw5:Output .

mw5:StarModel a lw:Model .
mw5:DirectorModel a lw:Model .
mw5:FilmModel a lw:Model .
...
mw5:Output a lw:Output ;
    lw:hasModel mw5:FilmModel .
...
mw5:FilmModel a lw:Model ;
    lw:hasAtom
        [ a swrl:IndividualPropertyAtom ;
          swrl:propertyPredicate dbprop:starring ;
          swrl:argument1 mw5:star ;
          swrl:argument2 mw5:film ] ;
    lw:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate ontology:Work ;
          swrl:argument1 mw5:film ] .
...
Listing 4.1: Widget Model represented formally
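Listing 4.1 contains a class atom and an individual property atom. For completeness, a datavalued property atom, e.g., for the dbprop:hasTitle property mentioned above, could be sketched as follows; the argument name mw5:titleValue is hypothetical:

mw5:FilmModel lw:hasAtom
    [ a swrl:DatavaluedPropertyAtom ;
      swrl:propertyPredicate dbprop:hasTitle ;
      swrl:argument1 mw5:film ;        # the film instance
      swrl:argument2 mw5:titleValue    # a string-valued argument
    ] .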
Moreover, the semantic model should support automatic widget matching and execution. Listing 4.2 shows a SPARQL query that searches for widgets whose models contain a specific kind of semantic relation, dbprop:starring.
SELECT ?widget ?name ?variable1 ?variable2
WHERE {
  ?widget lw:hasInput
      [ lw:hasModel
          [ lw:hasAtom
              [ swrl:propertyPredicate dbprop:starring ;
                swrl:argument1 ?variable1 ;
                swrl:argument2 ?variable2 ] ] ] .
  ?widget lw:hasOutput
      [ lw:hasModel
          [ lw:hasAtom
              [ swrl:propertyPredicate dbprop:starring ;
                swrl:argument1 ?variable1 ;
                swrl:argument2 ?variable2 ] ] ] .
  ?widget lw:hasName ?name .
}
Listing 4.2: SPARQL query
Figure 4.7 shows the result of the query.
Figure 4.7: Results
The second SPARQL query searches for a widget which can produce a set of films. DBPedia does not have a special class for films; for the film definition, the class ontology:Work with the relation dbprop:starring is used.
SELECT ?widget ?name ?variable
WHERE {
  ?widget lw:hasOutput
      [ lw:hasModel
          [ lw:hasAtom
              [ swrl:classPredicate ontology:Work ;
                swrl:argument1 ?variable ] ,
              [ swrl:propertyPredicate dbprop:starring ;
                swrl:argument1 ?variable2 ;
                swrl:argument2 ?variable ] ] ] .
  ?widget lw:hasName ?name .
}
Figure 4.8 shows the result of the query.
Figure 4.8: Results
The third SPARQL query finds data similar to the data that a widget processes. These can be links to internal resources or links to external data resources, for example to another representation of an entity. In this case, the property owl:sameAs is often used. It states “that two URI references actually refer to the same thing: the individuals have the same identity“ [11]. For example, the entity http://dbpedia.org/page/Angelina_Jolie has the property owl:sameAs relating it to http://de.dbpedia.org/resource/Angelina_Jolie in the German DBPedia and to freebase:Angelina_Jolie from Freebase3.
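For reference, these links can be written in Turtle roughly as follows; the prefix IRIs, in particular the Freebase namespace, are assumptions:

@prefix owl:        <http://www.w3.org/2002/07/owl#> .
@prefix dbpedia:    <http://dbpedia.org/resource/> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .   # assumed prefix
@prefix freebase:   <http://rdf.freebase.com/ns/> .       # assumed prefix

dbpedia:Angelina_Jolie
    owl:sameAs dbpedia-de:Angelina_Jolie ,   # German DBPedia
               freebase:Angelina_Jolie .     # Freebase (identifier assumed)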
SELECT ?widget ?name ?variable
WHERE {
  ?widget lw:hasInput
      [ lw:hasModel
          [ lw:hasAtom
              [ swrl:propertyPredicate owl:sameAs ;
                swrl:argument1 ?variable ] ] ] .
  ?widget lw:hasName ?name .
}
Figure 4.9 shows the result of the query. Since the mashup platform may support widget development, it is very hard to develop a user interface which is clear for end users and supports this kind of model. Widget discovery can also be problematic because creating such queries is difficult for end users. Additionally, the model should be easily extensible because of the expected growth in the number of available widgets, which can have more complex features.
3http://www.freebase.com/
Figure 4.9: Results
4.3 Widget Model
Because it is difficult to apply web service description approaches, a semantic model for widget description had to be implemented. Figure 4.10 depicts this model. According to this description model, each widget has three types of models: an input model, an output model, and a model, which can be connected via three types of relationships (lw:hasInputModel, lw:hasOutputModel, and lw:hasModel) to an instance of a widget. The models contain specific kinds of semantic relations. An instance of the class lw:InModel has a direct link to Linked Data instances via the property lw:hasInNode and describes the kind of semantic relation which a widget has as input. An instance of the class lw:OutModel has a direct link to Linked Data instances via the property lw:hasOutNode and describes the kind of semantic relation which a widget has as output. The class lw:Model has the property lw:hasNode and describes the full semantic model which is processed by a widget. Listing 4.3 presents a use case. A widget may have more than one input model but only one output model, which is unique. The widget receives instances of dbpedia:Person and returns an instance of the class dbpedia:Work. A work can be found either by providing the name of a director or the name of a star. There are two kinds of persons in the widget model, namely star and director. dbpedia:starring is the relation between input and output models showing that the person is a star; dbpedia:director is the relation between input and output models showing that the person is a director.
Figure 4.10: Widget Model

@prefix :            <http://linkedwidgets.org/example#> .    # assumed IRI
@prefix lw:          <http://linkedwidgets.org/ontologies#> .
@prefix db:          <http://dbpedia.org/ontology/> .         # assumed IRI
@prefix db-prop:     <http://dbpedia.org/property/> .         # assumed IRI
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .         # assumed IRI

:film a db:Work .

:star a db:Person ;
    dbpedia-owl:starring :film .
:director a db:Person ;
    dbpedia-owl:directorOf :film .

:inM1 a lw:InModel ;
    lw:hasInNode :star ;
    lw:hasInNode :nameStringObject1 ;
    lw:hasInNode :nameStringObject2 ;
    lw:hasInNode :nameDateObject .

:inM2 a lw:InModel ;
    lw:hasInNode :director ;
    lw:hasInNode :nameStringObject1 ;
    lw:hasInNode :nameStringObject2 ;
    lw:hasInNode :nameDateObject .

:outM a lw:OutModel ;
    lw:hasOutNode :star ;
    lw:hasOutNode :director .

:m a lw:Model ;
    lw:hasNode :star ;
    lw:hasNode :director .

:Widget a lw:Widget ;
    lw:hasName "Movie Agent Widget" ;
    lw:hasInputModel :inM1 ;
    lw:hasInputModel :inM2 ;
    lw:hasOutputModel :outM ;
    lw:hasModel :m .
Listing 4.3: A Widget Model
The code above shows a semantic description with use of the semantic model which was introduced in this section. Figure 4.11 presents this semantic description in graphical notation. This semantic model has the following advantages:
• it supports a more natural way of widget description,
• it is easily extendible,
• the direct interlinking of widget models with Linked Data provides a clear definition of semantic relations,
• the semantic repository of widgets can be queried in a clear and efficient way to find the appropriate widgets.
Moreover, the semantic model follows Semantic Web standards, and the direct interlinking to Linked Data supports better querying of the Linked Data sets. The model lw:Model defines the explicit relations between input and output data. Therefore, the input and output models can be used to create the appropriate query for finding specific kinds of semantic relations, extracting required data from Linked Data datasets, searching for widgets that can consume a specific dataset or produce the required output data, or selecting the required input from the provided context data. Querying examples are provided in Chapter 5; a first minimal sketch follows below.
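The following sketch, reusing the prefixes of Listing 4.3, finds widgets whose input model expects instances of db:Person; it is only an illustration of the intended querying style, not a definitive platform query:

SELECT ?widget ?name
WHERE {
  ?widget a lw:Widget ;
          lw:hasName ?name ;
          lw:hasInputModel [ lw:hasInNode ?node ] .
  ?node a db:Person .
}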
Figure 4.11: Widget Model
4.4 DCAT
A requirement for Linked Widgets is to provide information about the origin and ownership of datasets and to increase interoperability between widgets. A possibility to provide these additional features is to model them according to the Data Catalogue Vocabulary4 (DCAT). The W3C defines the vocabulary as “an RDF vocabulary that has been designed to facilitate interoperability between data catalogs published on the Web“ [55]. Figure 4.12 depicts the use of the DCAT vocabulary, which has been adapted to widget description. For the semantic model of the widgets, the following namespaces are used:
| Prefix | Namespace IRI | Description |
|---|---|---|
| dcat | http://www.w3.org/ns/dcat# | DCAT is “an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web“ [55]. |
| dct | http://purl.org/dc/terms/ | Dublin Core Schema |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | c.f. Chapter 2 |
| rdfs | http://www.w3.org/2000/01/rdf-schema# | c.f. Chapter 2 |
| foaf | http://xmlns.com/foaf/0.1/ | “FOAF is a project devoted to linking people and information using the Web. Regardless of whether information is in people’s heads, in physical or digital documents, or in the form of factual data, it can be linked“ [22] |
| skos | http://www.w3.org/2004/02/skos/core# | SKOS is “a common data model for sharing and linking knowledge organization systems via the Semantic Web“ [59] |
| vcard | http://www.w3.org/2006/vcard/ns# | VCard is a vocabulary designed for the description of organisations and people. |

Table 4.1: Prefix and Namespaces
The semantic model includes the following properties and classes:
• dct:title - a name given to the widget;
• dct:description - description of the widget;
4http://www.w3.org/TR/vocab-dcat/
Figure 4.12: Extension of the Semantic Widget Model with DCAT
• dct:issued - date of formal issuance of the widget;
• dct:modified - most recent date on which the widget (in general) was changed, updated, or modified;
• dct:language - language;
• dcat:keyword - keywords or tags describing the widget;
• dcat:contactPoint - link to contact information, which is provided using the VCard vocabulary;
• dct:temporal - the temporal period that the dataset covers (for data cubes);
• dct:publisher - an entity responsible for widget creation and publishing; a link to foaf:Agent (persons, organizations, or groups of any kind);
• dcat:theme - the main topic of the widget;
• skos:Concept - a category or theme used for describing, categorizing, and organising datasets;
• skos:ConceptScheme - the knowledge organization system used to represent concepts of widgets.
Listing 4.4 demonstrates an example of applying the DCAT vocabulary to Linked Widgets. The widget has the title “Movie Widget“ and a relationship to the media theme (:media is an instance of skos:Concept).
:Widget a lw:Widget ;
    rdfs:label "Widget 1"^^xsd:string ;
    lw:hasInModel :inM1 , :inM2 ;
    lw:hasModel :m ;
    lw:hasOutModel :outM ;
    lw:name "Movie Widget"^^xsd:string ;
    dct:title "Search for movie"^^xsd:string ;
    dct:description "The widget searches for movies based on actors and directors names" ;
    dct:issued "2014-01-10"^^xsd:date ;
    dct:modified "2014-01-12"^^xsd:date ;
    dcat:keyword "movie, film, actor, star"^^xsd:string ;
    dct:publisher :tuvienna ;
    dcat:theme :media .

:tuvienna a org:Organization , foaf:Agent ;
    rdfs:label "University of Technology Vienna" .
Listing 4.4: DCAT example
Listing 4.5 shows a SPARQL example for searching for widgets that have “media“ as their main theme and were issued on “2014-01-10“ by :tuvienna.
SELECT ?w ?title
WHERE {
  ?w a lw:Widget ;
     dct:title ?title ;
     dcat:theme :media ;
     dct:issued "2014-01-10"^^xsd:date ;
     dct:publisher :tuvienna .
}
Listing 4.5: SPARQL example
DCAT provides the following benefits for mashups: it increases findability, enables the description of data located in different Linked Data endpoints, and provides better search for widgets.
4.5 Provenance
An important requirement for Linked Widgets is extending the widget description by adding provenance information. The W3C Provenance Incubator Group defines provenance as “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing“ [35]. In other words, the meta-data may include information about:
• the creator of the data (author, reviewer, etc.);
• versions of data sets (the data change often);
• data sources of the information; in the case of data integration, it is necessary to describe which part comes from which data sets;
• description of rules, vocabularies, ontologies; etc.
“The Provenance Family of Documents (PROV) defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. The goal of PROV is to enable the wide publication and interchange of provenance on the Web and other information systems. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core“ [35]. The W3C group defined a set of 12 documents for adding provenance: PROV-OVERVIEW5, PROV-PRIMER6, PROV-DM7, PROV-N8, etc. Figure 4.13 shows the organisation of the PROV documents. The colors in the figure indicate which category of user each document is aimed at:
• light blue is for users (understanding and supporting provenance);
• blue is for developers (creating and consuming provenance);
• pink is for advanced users (creating new PROV serializations or other applications based on provenance).
The common vocabulary is defined by the conceptual data model (PROV-DM). Users and developers use the set of constraints (PROV-Constraints9) for constructing valid provenance expressions. The formal semantics (a declarative specification) is defined by PROV-SEM10. Furthermore, developers use provenance access (PROV-AQ11), linking provenance information (PROV-Links12), dictionary-style collections (PROV-Dictionary13), and the Dublin Core vocabulary (PROV-DC). The approach suggests the use of the PROV ontology [50] (PROV-O, a standard lightweight vocabulary) for adding meta-information about the provenance of information. The W3C Provenance Incubator Group describes PROV-O as “an OWL2 ontology allowing the mapping of the PROV data model to RDF“. The PROV ontology includes a set of classes, properties, and restrictions for representing this information. Table 4.2 shows the namespaces which are used by PROV-O. The three basic classes of PROV-O are:
5http://www.w3.org/TR/prov-overview/ 6http://www.w3.org/TR/prov-primer/ 7http://www.w3.org/TR/prov-dm/ 8http://www.w3.org/TR/prov-n/ 9http://www.w3.org/TR/prov-constraints/ 10http://www.w3.org/TR/prov-sem/ 11http://www.w3.org/TR/prov-aq/ 12http://www.w3.org/TR/prov-links/ 13http://www.w3.org/TR/prov-dictationary/
Figure 4.13: PROV documents. Source: [35]
| Prefix | Namespace IRI |
|---|---|
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| xsd | http://www.w3.org/2000/10/XMLSchema# |
| owl | http://www.w3.org/2002/07/owl# |
| prov | http://www.w3.org/ns/prov# |

Table 4.2: Prefix and Namespaces
• A prov:Entity is a kind of thing with some fixed aspects (real or imaginary).
• A prov:Activity is an event that happens over a period of time and acts upon or with entities (e.g., consuming, transforming, using, etc.).
• A prov:Agent is responsible for an activity.
The relations between the classes entity, agent, and activity are shown in Figure 4.14. The properties prov:startedAtTime and prov:endedAtTime show the start and end times of activities. Entities can be used and generated by activities (the properties prov:used and prov:wasGeneratedBy). Additionally, some dependency information between activities can be provided via prov:wasInformedBy. This provides “some dependency information without explicitly providing the activities’ start and end times“ [50]. For example, the activity :creationCollection calls an additional activity :aggregationByTopic to subscribe a widget to a theme.
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://linkedwidgets.org/example#> .   # assumed IRI

:creationCollection
    a prov:Activity ;
    prov:wasInformedBy :subscribeActivity .

:subscribeActivity
    a prov:Activity ;
    # aggregation of widgets by topics
    prov:wasInfluencedBy :aggregationByTopic ;
    prov:wasAssociatedWith :irina .

:irina a prov:Agent .

:aggregationByTopic a prov:Activity .
Listing 4.6: PROV-O
The property prov:wasDerivedFrom is used for the definition of provenance chains (the transformation of one entity into another); a minimal sketch of such a chain is given below. For example, a new dataset can be the result of filtering another dataset. “Arbitrary RDF properties can be used to describe the fixed aspects of an Entity that are interesting within a particular application“ [50] (e.g., the format of the dataset). The responsibilities of an agent can be shown via prov:wasAssociatedWith and prov:wasAttributedTo. The property prov:actedOnBehalfOf describes an agent’s responsibility for another agent in relation to the influenced Activity or Entity.
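A minimal sketch of a derivation chain, with hypothetical resource names:

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://linkedwidgets.org/example#> .   # assumed IRI

:filteredDataset
    a prov:Entity ;
    prov:wasDerivedFrom :rawDataset ;        # the provenance chain
    prov:wasGeneratedBy :filteringActivity .

:filteringActivity
    a prov:Activity ;
    prov:used :rawDataset .

:rawDataset a prov:Entity .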
The following code presents a part of a widget description with use of DCAT.

:movieWidget a lw:Widget ;
    dct:title "Search for films" ;
    dct:creator :irina ;
    dct:contributor :peter ;
    dct:created "2013-12-01" ;
    dcat:theme :Media .
...

:irina a dct:Agent .

:tuvienna a dct:Agent .

:Media a skos:Concept ;
    dct:creator :irina .
...
Listing 4.7: A part of a Widget Description

Figure 4.14: Relation between three basic classes

Figure 4.15: Relation between the basic classes
Figure 4.15 represents a transformation from the semantic description modelled according to the DCAT vocabulary to the semantic description modelled with use of PROV-O. In this case, the entity is :movieWidget (an instance of the class lw:Widget). There are two agents that are responsible for the action affecting the entity :movieWidget: :irina and :peter. The action is :creatingTheWidget, which describes how the entity, namely the lw:Widget, has been created or changed. The properties prov:startedAtTime and prov:endedAtTime describe the date of the first creation of the widget and the date of the last change. The ontology described above can be extended via additional terms (c.f. Figure 4.16). These additions can be divided into five categories [50]:
1. The class prov:Agent has three subclasses: prov:Person for people; prov:Organization for companies, social institutions, societies, etc.; and prov:SoftwareAgent for running software. The class prov:Entity is divided into prov:Collection, which provides structure to some entities; prov:Bundle, a set of provenance descriptions; and prov:Plan, a set of actions.

Figure 4.16: The extended term
2. The property prov:specializationOf states that “an entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter“ [50]. Alternate entities can be presented using the prov:alternateOf property; both properties are sketched after this list.
3. The property prov:atLocation defines a prov:Location for the Entities.
4. The lifetimes of entities that are generated by an activity and used by other activities are defined by prov:invalidatedAtTime, prov:wasInvalidatedBy, etc.
5. The lifetime of an Activity - the time between start and end time.
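A minimal sketch of the two properties from category 2, with hypothetical widget versions:

:movieWidgetV2
    a prov:Entity ;
    prov:specializationOf :movieWidget .   # shares all aspects and adds more specific ones

:movieWidgetMirror
    a prov:Entity ;
    prov:alternateOf :movieWidget .        # an alternate presentation of the same thing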
Figure 4.17 and the following code provide an example of using the additional terms (three types of agents: person, organization, and software).
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://linkedwidgets.org/example#> .   # assumed IRI
@base <http://linkedwidgets.org/example> .        # assumed base

<>
    a prov:Bundle , prov:Entity ;
    prov:wasAttributedTo :postEditor ;
    prov:generatedAtTime "2011-07-16T02:52:02Z"^^xsd:dateTime .

:irina
    a prov:Person , prov:Agent ;
    ## prov:Agent is inferred from prov:Person
    foaf:givenName "Irina" ;
    prov:actedOnBehalfOf :tuvienna .

:tuvienna
    a prov:Organization , prov:Agent ;
    ## prov:Agent is inferred from prov:Organization
    foaf:name "TU Vienna" .

:widgetSystem
    a prov:SoftwareAgent , prov:Agent ;
    ## prov:Agent is inferred from prov:SoftwareAgent
    foaf:name "Linked Widget" .

:movieWidget
    a prov:Entity ;
    sioc:title "Find me a movie" ;
    prov:generatedAtTime "2013-08-16T01:01:01Z"^^xsd:dateTime ;
    prov:wasGeneratedBy :creatingTheWidget12 .

:creatingTheWidget12
    a prov:Activity ;
    prov:startedAtTime "2013-08-16T01:01:01Z"^^xsd:dateTime ;
    prov:wasStartedBy :irina ;
    prov:wasAssociatedWith :widgetSystem ;
    prov:generated :movieWidget ;
    prov:endedAtTime "2013-08-16T03:52:02Z"^^xsd:dateTime ;
    prov:wasEndedBy :irina .
Listing 4.8: Additional terms of PROV-O
Figure 4.17: Relation between the basic classes
CHAPTER 5 Results and Evaluation
5.1 Resulting Semantic Model
Figure 5.1 presents the semantic model based on the semantic widget model, DCAT, and PROV-O that were described in the previous chapter. The model describes a possible way to bring the ontologies together in order to satisfy the requirements for the mashup system. It includes the most important classes that cover all requirements. If additional classes or properties are needed, it is possible to extend the semantic model. The semantic model has been extended with the following properties and classes:
• dcat:theme - for widget classification. E.g. media, actors, science, etc.
• dct:publisher - for providing information about the creators of widgets. There are three types of possible agents: foaf:Person defines a person who created a widget, foaf:Group defines groups of creators or an institution to which the creators belong, and foaf:Software defines software, e.g., an editor or mashup creator. The DCAT vocabulary “makes extensive use of terms from other vocabularies“ [55], e.g., Dublin Core1.
• prov:wasGeneratedBy - for providing an activity that has an influence on the state of the widget (e.g., creation, changing, etc.).
5.2 Semantic Model Use cases
In this section, use cases are addressed by semantic widgets which follow the semantic model presented in this chapter. The semantic model description use cases are divided into the following categories:
1http://dublincore.org/documents/dcmi-terms/
Figure 5.1: Widget Model
• Publishing the Linked Widget information on Linked Open Data Cloud. The detailed description of widgets.
• Discovery: finding widgets that contain a specific kind of semantic relation. E.g. all widgets that contain property dbprop:livesIn.
• Composition: finding the matching widget that can consume a specific dataset or produce the required output data. E.g. all widgets that have instances of class dbpedia:Person from DBPedia.
• Smart data consumption based on semantic model: semantic model is used to select the required input from the provided context data.
5.2.1 Publishing examples

Figure 5.2 presents a set of widgets that are needed for searching for a set of films. The widget “DBPedia Film Agent Search“ makes it possible to find either actors or directors based on the following properties: name, birthplace, and year of birth. This widget is presented by the instance w:widget of the class lw:Widget, which has the input models w:inM1 and w:inM2, a model w:m, and an output model w:outM. The models are connected with instances of the DBPedia class dbpedia:Person via the properties lw:hasInputNode, lw:hasNode, and lw:hasOutputNode. The instances of the class dbpedia:Person are w:star and w:director, which are differentiated with the help of the DBPedia properties dbpedia:starring and dbpedia:director. The model w:m includes all properties and classes that are needed to depict all relationships between the inputs and outputs of the widget. The output model is connected to stars and directors, which are instances of the same DBPedia class dbpedia:Person. Furthermore, a person can be a star and a director at the same time and have both properties dbpedia:starring and dbpedia:director. Figures 5.3 and 5.4 present a part of the semantic model of the widget “DBPedia Film Agent Search“, in graphical notation and in Turtle.
Figure 5.2: Widget “DBPedia Film Agent Search“
Figure 5.3: Semantic Model of Widget “DBPedia Film Agent Search“ in graphical notation

Figure 5.4: Semantic Model of Widget “DBPedia Film Agent Search“ in TopBraid Composer

Figure 5.5 presents the widget “Google Maps“, which receives a list of coordinates (longitude and latitude) and shows the points on a map. This widget is presented by the instance w2:Widget of the class lw:Widget, which has an input model w2:inM1 and a model w2:m. The models are connected with instances of the GeoNames ontology2 class gn:Feature via the properties lw:hasInputNode and lw:hasNode. The instance of the class gn:Feature is w2:feature, which has the properties wgs84_pos:lat and wgs84_pos:long. Figure 5.6 presents a part of the semantic model of the widget “Google Maps“.
5.2.2 Discovery examples

The second goal of the semantic description is searching for widgets. This can be implemented with use of SPARQL queries.

Discovery example 1

The first SPARQL query (c.f. Figure 5.7) finds widgets that contain the property dbpedia:starring in the widget description models. The property defines a relationship between the two DBPedia classes dbpedia:Person and dbpedia:Work.
2http://www.geonames.org/ontology/documentation.html
Figure 5.5: Widget “Google Maps“
The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget; ?name - the name of the widget; ?publisher - a publisher of the widget, an instance of the class foaf:Agent; ?n - a node that is connected to an instance which has the property dbpedia:starring.
• The “WHERE clause provides the basic graph pattern to match against the data graph“ [73]. The basic graph pattern includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget; ?w lw:hasName ?name - finding the names of widgets; ?w dcterms:publisher ?publisher - finding the publishers of widgets; ?w lw:hasModel ?m - finding the models of widgets; ?m lw:hasNode ?n - finding the nodes of the models; ?n dbpedia:starring ?ins - finding the property dbpedia:starring. A sketch of the assembled query is given below; Figure 5.8 shows the main classes and properties that are included in the SPARQL query.
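Assembled from the clauses described above, the query of Figure 5.7 can be sketched as follows (prefix declarations omitted):

SELECT ?w ?name ?publisher ?n
WHERE {
  ?w rdf:type lw:Widget .
  ?w lw:hasName ?name .
  ?w dcterms:publisher ?publisher .
  ?w lw:hasModel ?m .
  ?m lw:hasNode ?n .
  ?n dbpedia:starring ?ins .
}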
Figure 5.9 presents the search for widgets that contain the owl:sameAs property, which shows that two things with different URIs are the same thing. The instance “Angelina Jolie“ of the class Actor from the Linked Movie Database is the same as the instance “Angelina Jolie“ of the class Person from DBPedia. The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget, ?name - the name of the widget, ?class - a class of the instance.
• The basic graph pattern of the WHERE clause includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget; ?w lw:hasName ?name - finding the names of widgets; ?w lw:hasModel ?m - finding the models of widgets; ?m lw:hasNode ?n - finding the nodes of the models; ?n owl:sameAs ?x - finding the widget nodes that have the property owl:sameAs; ?x rdf:type ?class - finding the class of the instance.

Figure 5.6: Semantic Model of widget “Google Maps“

Figure 5.7: Finding widgets that contain property “starring“ in semantic model

Figure 5.8: SPARQL query steps
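Assembled from the triples described above, the query of Figure 5.9 can be sketched as follows (prefix declarations omitted):

SELECT ?w ?name ?class
WHERE {
  ?w rdf:type lw:Widget .
  ?w lw:hasName ?name .
  ?w lw:hasModel ?m .
  ?m lw:hasNode ?n .
  ?n owl:sameAs ?x .
  ?x rdf:type ?class .
}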
Figure 5.9: Search for widget that contained the “owl:sameAs“ property
5.2.3 Composition examples

The following SPARQL query (c.f. Figure 5.10) finds widgets that produce geo data for map visualization. The location is defined with use of the GeoNames ontology class gn:Feature, which has the properties geo:lat and geo:long. A part of the widget description is provided in Listing 5.1. This widget returns a set of locations (longitude and latitude). The goal is to provide a mechanism for automatic SPARQL query generation. In this case, the query will search for widgets that can be wired with the output of this widget.
Figure 5.10: Search for widgets that produce geo data

@prefix dbpedia: <http://dbpedia.org/ontology/> .     # assumed IRI
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix gn:      <http://www.geonames.org/ontology#> .
@prefix lw:      <http://linkedwidgets.org/ontologies#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix p:       <http://linkedwidgets.org/publishers#> .   # assumed IRI
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix :        <http://linkedwidgets.org/example#> .      # assumed IRI
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

:Widget
    rdf:type lw:Widget ;
    lw:hasOutputModel :OutM1 ;
    lw:hasModel :m ;
    lw:hasName "Map Widget"^^xsd:string ;
    dcterms:publisher p:tuVienna , p:irina ;
    dcat:theme lw:map ;
    prov:wasGeneratedBy lw:widgetCreation .

:OutM1
    rdf:type lw:OutModel ;
    rdfs:label "Input model for map"^^xsd:string ;
    lw:hasInputNode :long , :feature , :lat .

:lat
    rdf:type xsd:float ;
    rdfs:label "lat"^^xsd:string .

:long
    rdf:type xsd:float ;
    rdfs:label "long"^^xsd:string .

:feature
    rdf:type gn:Feature .
Listing 5.1: Source Code
Figure 5.11: Generation of SPARQL queries
Figure 5.11 presents the automatic generation of SPARQL queries from a widget description. The arrows in the picture show the transformation from the output model into the terms of the SPARQL query. For example, the property lw:hasOutputModel is reversed to the term lw:hasInputModel, and the property lw:hasOutputNode is reversed to lw:hasInputNode. Figure 5.12 shows the result of the SPARQL query.
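Under this reversal, the generated query could be sketched as follows; this is an illustration under the assumptions above, not the platform’s exact output (prefix declarations omitted):

SELECT ?widget ?name
WHERE {
  ?widget rdf:type lw:Widget ;
          lw:hasName ?name ;
          lw:hasInputModel [ lw:hasInputNode ?feature , ?lat , ?long ] .
  ?feature rdf:type gn:Feature .
  ?lat  rdf:type xsd:float .
  ?long rdf:type xsd:float .
}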
Figure 5.12: SPARQL query and result in TopBraid Composer
5.2.4 Smart Data Consumption

Figure 5.13 demonstrates selecting input data for the widget “Google Map“. The widget gets the data flow from three widgets: the “Location“ widget, which returns locations of libraries, the “City Byke“ widget, which returns locations of city bike stations, and the “Geo Merger“ widget. The “Geo Merger“ widget processes these data according to the defined options. The result includes a set of locations that are instances of the class gn:Point with the properties latitude geo:lat, longitude geo:long, and :address. Based on this model, the application knows which kind of data is required for the input of this widget; a sketch of such a location instance is given below.
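A single merged location could look like the following sketch; the instance name and the literal values are hypothetical (prefix declarations omitted):

:location1
    a gn:Point ;
    geo:lat "48.1987"^^xsd:float ;
    geo:long "16.3695"^^xsd:float ;
    :address "Karlsplatz 13, 1040 Wien" .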
Figure 5.13: Smart consumption example
5.3 Result evaluation
In Chapter 4 a list of requirements has been defined. The purpose of this section is to evaluate the resulting model based on the fulfilment of these requirements:
1. Widgets and Mashups are identified via an identifier, a URI. Mashups and widgets have to be stored in the Widget Repository and identified by their URIs. The users have the possibility to define the URI.
2. By dereferencing the widget URIs, the semantic model will be returned. It is possible to find a widget which is identified by a URI. The semantic model can be returned.
3. The model should follow web standards (W3C recommendations). The data are described with use of the XML-serialized RDF format, which gives the possibility to define the structure of the data and to publish it into the LOD cloud. The provenance of information is described with use of the web standards DCAT and PROV-O. The search is provided by using SPARQL queries; some examples are given in this chapter.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into a data space and to navigate over the data sets. Due to the fact that the widget models are usually connected to original Linked Data, it is possible to define relations to external Linked Data endpoints.
5. A widget may have more than one semantic model, but all should generate the same output with an explicit relation to the input graph. It is possible to create more than one model that generates the same output.
6. The input and output data should be interlinked, and explicit relations between data should be defined. There are three different types of widget models: input model, output model, and model. This gives a flexible mechanism to define the full data model and all connections between input and output data.
7. The model should be general to support various types of widgets, e.g., data widgets, presentation widgets. In the current stage of implementation, the semantic model supports all existing types of widgets.
8. The semantic model should provide “an explicit representation of provenance information that is accessible to machines, not just to humans“ [85]. The provenance of data is provided by applying PROV-O and DCAT ontologies. The ontologies allow to define the information about author, date of creation, versions, etc.
The requirements are fulfilled. The semantic model follows Linked Data and Semantic Web principles. With use of the model, the widgets can be published into the LOD cloud. The data are described with use of W3C standards, which makes the data machine-readable.
CHAPTER 6 Conclusion and Future Work
This chapter summarizes the research work and research results, indicates research limitations, and provides advice for future work.
6.1 Research Summary
The main questions of this research work were to determine whether semantic service description languages can be applied to widget description, to define requirements for the semantic model, to implement the semantic model according to the defined requirements, and to integrate the model into a prototype mashup environment. The first challenge of this work was to define what kind of basic semantic concepts and principles the mashup platform should be based on. Therefore, the second part of this master thesis introduces both the definition of the web of data and software technologies such as mashups and Web Services. A set of requirements was derived based on concepts of the Semantic Web. Another very important part is the extensive analysis and comparison of existing mashup platforms and semantic web service description techniques. The analysis was divided into two parts:
• The first part covers the analysis of existing mashup platforms in order to define what kinds of factors can increase usability.
• The second part reviews semantic web service description techniques, their advantages and disadvantages, and evaluates possibilities to apply these concepts to the mashup platform.
The result of the analysis shows that there is no directly applicable web service description approach for the proposed system. Even though Karma seems to be the most suitable approach, it still poses barriers regarding model implementation, because mashup development based on the Karma approach is very complex.
Figure 6.1: Widget recommendation in Mashup Platform
The main goals were the definition of requirements and the implementation of the widget model. These goals were achieved successfully. This includes a set of requirements for the Linked Widget Model that are derived from Semantic Web and Linked Data principles, and the parts of the resulting semantic model, namely DCAT, information provenance, and the semantic widget model. Additionally, the Karma approach is provided as an alternative to the developed model. The Karma-based widget description shows that this method does not satisfy all requirements, such as the possibility to include explicit descriptions of relations (the Karma approach uses the SWRL vocabulary) in the semantic model, and therefore it would have to be extended. The extension of the model, however, can provoke problems for widget discovery and widget composition.
The main result of this research is the semantic model, which can be integrated into a prototype mashup environment. The resulting semantic model follows Linked Data principles. This enables publishing widget descriptions into the LOD cloud. Widget matching and composition can be provided by use of SPARQL queries. The model contains the required meta-data for defining the origin of data. Figure 6.1 shows a new feature of the developed mashup platform, which is available since the implementation of the semantic model. By clicking on the output terminal, the user gets a list of widgets that can be wired with the widget on the workspace. The suggested widgets appear in the bottom left corner of the mashup platform.
6.2 Research Limitation
The following restrictions are noted:
• Due to the fact that the mashup platform is at a very early stage of implementation, only a limited number of use cases can be provided. An extensional growth in the number of available widgets will bring widgets that are more complex and therefore richer in terms of features. For example, data sets may need to be transformed into a more understandable, suitable structure. This can be done by applying algorithms or statistical methods (correlation, rule learning) that can provide an analysis of available Linked Data. This will influence the semantic model, because it is required to describe explicitly the relation between input and output data.
• The existence of double relations between instances is not supported by the semantic model. For example, the entity http://dbpedia.org/page/Angelina_Jolie has the two similar properties dbpprop:birthPlace and dbpprop:dateOfBirth.
• The semantic model includes the basic components required for widget description. For example, PROV-O provides a very complex set of entities and relations which were not required for our solution.
6.3 Future Work
Due to the significant growth of statistical data provided by various public organizations, in the future the mashup platform, with all of its advantages regarding Linked Data consumption, can provide access to these data. A possibility to process this kind of public data through widgets is publishing the data as Linked Data using the W3C Data Cube vocabulary1, a format for statistical data publishing on the Web of Data. This makes it possible to link and combine the data with additional information. Additional advantages of this approach are that multi-dimensional data can be presented with use of the RDF standard and published following the Linked Data principles. Furthermore, the model is general, which enables high reusability, and can be used for various datasets like OLAP data cubes. The main elements of the Data Cube vocabulary are a collection of observations (datasets), a set of dimensions defining the foundations of the observations, measures that describe the objects of the observations, and attributes of the observed values. This facet implies the development of new types of widgets that can process data modeled based on the Data Cube format. A possible way to integrate such data is to extend the existing semantic model by adding additional entities and relations like observation, dataset, measure, etc. The mashup platform will support the visualization of such multi-dimensional data and the integration with other data sets to support end users in deducing knowledge from statistical data. Furthermore, this will allow developers to easily discover a data source and then develop statistical web applications of high quality and flexibility. The current version supports only three common statistical charts: pie, bar, and line charts. More types of
1http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/
charts can improve the visualization of Linked Data and data browsing. This will also influence the semantic model, because the visualization widgets can process and return additional data, like summaries of data values or differences between data values, and the semantic model should enable the description of such data types. Finally, streaming data2 can be integrated into the mashup platform. For this kind of data it will be necessary to find a mechanism for dealing with temporal data (time stamps, time intervals, and other options) and include it into the semantic model in order to provide the best widget matching and composition.
2http://www.w3.org/community/rsp/wiki/RDF_Stream_Models
CHAPTER 7 Appendix
7.1 Acronyms
CSV Comma Separated Values
DAML DARPA Agent Markup Language
DCAT Data Catalog Vocabulary
DL Description Logic
HTML Hypertext Markup Language
IRI Internationalized Resource Identifier
LIDS Linked Data Services
LOD Linked Open Data
LOS Linked Open Services
OWL-S Semantic Markup for Web Services
OWL Web Ontology Language
PROV-O Provenance Ontology
R2RML RDB to RDF mapping Language
RDB Relational Database
RDFS Resource Description Framework Schema
RDFa Resource Description Framework in Attributes
REST Representational State Transfer
RSS Really Simple Syndication
SAWSDL Semantic Annotation for Web Services Description Language
SOAP Simple Object Access Protocol
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
SWRL Semantic Web Rule Language
URI Uniform Resource Identifier
W3C World Wide Web Consortium
WSDL Web Service Description Language
WSMO Web Service Modeling Ontology
WWW, W3 World Wide Web
XML Extensible Markup Language
XSLT XSL Transformation
XSL Extensible Stylesheet Language
7.2 Widget Semantic Model

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">

  <owl:Class rdf:about="http://linkedwidgets.org/ontologies#Widget"/>
  <owl:Class rdf:about="http://linkedwidgets.org/ontologies#InputModel"/>
  <owl:Class rdf:about="http://linkedwidgets.org/ontologies#OutputModel"/>
  <owl:Class rdf:about="http://linkedwidgets.org/ontologies#Model"/>
  <owl:Class rdf:about="http://linkedwidgets.org/ontologies#Mashup"/>

  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasInputNode">
    <rdfs:label>has node</rdfs:label>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasOutputNode">
    <rdfs:label>has node</rdfs:label>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasNode">
    <rdfs:label>has node</rdfs:label>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasInputModel">
    <rdfs:label>has model</rdfs:label>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasOutputModel">
    <rdfs:label>has model</rdfs:label>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://linkedwidgets.org/ontologies#hasModel">
    <rdfs:label>has model</rdfs:label>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:about="http://linkedwidgets.org/ontologies#name">
    <rdfs:label>name</rdfs:label>
  </owl:DatatypeProperty>
</rdf:RDF>
7.3 Semantic Models in Top Braid Composer
Figure 7.1: Top Braid Composer Interface
Figure 7.2: Import of ontologies in Top Braid Composer
Figure 7.3: DBPedia classes in Top Braid Composer
Figure 7.4: An example of a property in Top Braid Composer
Figure 7.5: Instances in Top Braid Composer
Figure 7.6: An example of Widget Description in Top Braid Composer
Figure 7.7: An example of a model description in Top Braid Composer
Bibliography
[1] T. Berners-Lee and R. Fielding and L. Masinter. http://tools.ietf.org/html/rfc3986. Accessed: 2014-02-21. [2] W3C. http://www.w3.org/2001/sw/. Accessed: 2014-02-21. [3] Saeed Aghaee and Cesare Pautasso. An evaluation of mashup tools based on support for heterogeneous mashup components. In Proceedings of the 11th International Conference on Current Trends in Web Engineering, ICWE’11, pages 1–12. Springer-Verlag, 2012. [4] AJAX. http://en.wikipedia.org/wiki/ajax_(programming), Accessed: 2013-11-11. [5] Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist: effective modelling in RDFS and OWL. Morgan Kaufmann Publishers, 2. edition, 2011. [6] Areeb Alowisheq, David E. Millard, and Thanassis Tiropanis. Express: Expressing restful semantic services using domain ontologies. International Semantic Web Conference, 5823:941–948, 2009. [7] Alowisheq Areeb and David E. Millard. Express: Expressing restful semantic web services. The Seventh Reasoning Web Summer School, pages 23–27, 2011. [8] Sören Auer, Lorenz Bühmann, Christian Dirschl, Michael Hausenblas, Orri Erling, Robert Isele, Jens Lehmann, Michael Martin, Pablo N. Mendes, Bert van Nuffelen, Claus Stadler, Sebastian Tramp, and Hugh Williams. Managing the Life-Cycle of Linked Data with the LOD2 Stack. The Semantic Web – ISWC 2012, pages 1–16, 2012. [9] Robert J. Aumann, A. Michael Spence, Martin L. Perl, Frank Wilczek, Steve Wozniak, Vinton G. Cerf, Ann Winblad, Richard Stallman, Jim Rogers, Alan Kay, Bjarne Stroustrup, Brian Behlendorf, Rajeev Madhavan, Jimmy Wales, Craig Newmark, Greg Gianforte, Grady Booch, and Chief Scientist. Frontier visionary interview. Frontier Journal, 6(7), 2009. [10] Florian Bauer and Martin Kaltenböck. Linked Open Data: The Essentials. A Quick Start Guide for Decision Makers. Edition mono/monochrom, Vienna, Austria, 1. edition, 2012. [11] Sean Bechhofer, Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. http://www.w3.org/tr/owl-ref/, Accessed: 2013-12-05.
[12] Francois Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Jean Morissette. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. The Semantic Web: Semantics and Big Data, pages 706–716, 2008.
[13] T Berners-lee, J. Hollenbach, Kanghao Lu, J. Presbrey, and Mc Schraefel. Tabulator redux: Browsing and writing linked data, Accessed: 2013-11-02.
[14] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, pages 29–37, 2011.
[15] Berners-Lee, Tim and Cailliau, Robert . http://www.w3.org/proposal.html. Accessed: 2014-02-21.
[16] BIO2RDF. https://github.com/bio2rdf/bio2rdf-scripts/wiki/, Accessed: 2013-11-11.
[17] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data — The Story So Far. International Journal on Semantic Web and Information Systems, pages 1–22, 2009.
[18] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. Dbpedia - a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):154–165, 2009.
[19] Brian McBride. http://www.w3.org/tr/rdf-schema/. Accessed: 2013-10-29.
[20] Alison Callahan, José Cruz-Toledo, Peter Ansell, and Michel Dumontier. Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. Journal of Biomedical Informatics, pages 200–212, 2013.
[21] JackBe Corporation. A business guide to enterprise mashups, 2008.
[22] Dan Brickley and Libby Miller. http://xmlns.com/foaf/spec/. Accessed: 2013-10-24.
[23] Dave Beckett and Brian McBride. http://www.w3.org/tr/rec-rdf-syntax/. Accessed: 2013-10-21.
[24] David Beckett and Tim Berners-Lee W3C. http://www.w3.org/teamsubmission/turtle/. Ac- cessed: 2013-10-24.
[25] David Martin and Mark Burstein and Jerry Hobbs and Ora Lassila and Drew McDermott and Sheila McIlraith and Srini Narayanan and Massimo Paolucci and Bijan Parsia and Evren Sirin and Naveen Srinivasan and Katia Sycara. http://www.w3.org/submission/owl-s/, Accessed: 2013-11-15.
[26] Dieter Fensel, Federico Michele Facca, Elena Simperl, and Ioan Toma. Semantic Web Services. Springer-Verlag Berlin Heidelberg, 1. edition, 2011.
[27] Fernando J. Garrigos-Simon, Rafael Lapiedra Alcamí, and Teresa Barberá Ribera. Social networks and Web 3.0: their impact on the management and marketing of organizations. Management Decision, 50(2):1880–1890, 2012.
[28] The Apache Software Foundation. http://stanbol.apache.org/, Accessed: 2013-12-03. [29] DERI Galway. http://pipes.deri.org/, Accessed: 2013-11-08. [30] José María García, David Ruiz, and Antonio Ruiz-Cortés. A lightweight prototype implementation of sparql filters for wsmo-based discovery. In Technical Report ISA-11-TR-01. ISA Research Group, 2011. [31] Karthik Gomadam, Ajith Ranabahu, and Amit Sheth. http://www.w3.org/submission/sa-rest/, Accessed: 2013-11-17. [32] Graham Klyne and Jeremy J. Carroll and Brian McBride. http://www.w3.org/tr/rdf11-concepts/. Accessed: 2014-02-27. [33] Benjamin Grosof, Mike Dean, Carl Andersen, William Ferguson, Daniela Inclezan, and Richard Shapiro. R.: A silk graphical ui for defeasible reasoning, with a biology causal process example. In Proc. 4th Intl. Web Rule Symp. (RuleML), 2010. [34] Benjamin Grosof, Mike Dean, and Michael Kifer. The silk system: Scalable higher-order defeasible rules. In International RuleML Symposium on Rule Interchange and Applications, 2009. [35] Paul Groth and Luc Moreau. http://www.w3.org/tr/prov-overview/, Accessed: 2013-12-05. [36] Tom Heath and Christian Bizer. Linked Data. Evolving the Web into a Global Data Space. Morgan & Claypool, 1. edition, 2011. [37] John Hebeler, Matthew Fisher, Ryan Blace, and Andrew Perez-Lopez. Semantic Web Programming. Wiley Publishing, Inc., 1. edition, 2009. [38] Ian Horrocks and Peter F. Patel-Schneider and Harold Boley and Said Tabet and Benjamin Grosof and Mike Dean. http://www.w3.org/submission/swrl/. Accessed: 2014-02-21. [39] Google Inc, Yahoo Inc, and Microsoft Corporation. http://schema.org/, Accessed: 2013-11-29. [40] Yahoo! Inc. http://pipes.yahoo.com/, Accessed: 2013-11-06. [41] Kashif Iqbal, Marco Luca Sbodio, Vassilios Peristeras, and Giovanni Giuliani. Semantic service discovery using sawsdl and sparql. In Proceedings of the 2008 Fourth International Conference on Semantics, Knowledge and Grid, pages 205–212. IEEE Computer Society, 2008. [42] Ivan Herman and Ben Adida and Manu Sporny and Digital Bazaar and Mark Birbeck. http://www.w3.org/tr/xhtml-rdfa-primer. Accessed: 2013-10-21. [43] M. Cameron Jones and Elizabeth F. Churchill. Conversations in Developer Communities: a Preliminary Analysis of the Yahoo! Pipes Community. Proceeding C&T ’09 Proceedings of the fourth international conference on Communities and technologies, pages 195–204, 2009.
[44] Rohit Khare and Tantek Çelik. Microformats: a pragmatic path to the semantic web. WWW ’06 Proceedings of the 15th international conference on World Wide Web, pages 865–866, 2006.
[45] Craig A. Knoblock, Pedro Szekely, José Luis Ambite, Aman Goel, Shubham Gupta, Kristina Lerman, Maria Muslea, Mohsen Taheriyan, and Parag Mallick. Semi-automatically mapping structured sources into the semantic web. In The Semantic Web: Research and Applications, Lecture Notes in Computer Science, pages 375–390. Springer Berlin Heidelberg, 2012.
[46] Agnes Koschmider, Victoria Torres, and Vicente Pelechano. Elucidating the mashup hype: Definition, challenges, methodical guide and tools for mashups. In 2nd Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web in conjunction with the 18th International World Wide Web Conference, Madrid, 2009.
[47] Rubén Lara, Dumitru Roman, Axel Polleres, and Dieter Fensel. A conceptual comparison of wsmo and owl-s. Multimedia Tools and Applications, 64(2):365–387, 2013.
[48] Jon Lathem, Karthik Gomadam, and Amit P. Sheth. Sa-rest and (s)mashups : Adding semantics to restful services. International Conference on Semantic Computing, pages 469–476, 2007.
[49] Danh Le-Phuoc, Axel Polleres, Manfred Hauswirth, Giovanni Tummarello, and Christian Morbidoni. Rapid Prototyping of Semantic Mash-Ups through Semantic Web Pipes. Proceeding WWW ’09 Proceedings of the 18th international conference on World wide web, pages 581–590, 2009.
[50] Timothy Lebo, Satya Sahoo, and Deborah McGuinness. http://www.w3.org/TR/prov-o/. Accessed: 2013-12-05.
[51] Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science, Dept. of Business Information Systems. http://aksw.org/projects/limes.html. Accessed: 2013-11-19.
[52] linkeddata.org, administered by Tom Heath. http://linkeddata.org. Accessed: 2013-10-21.
[53] Yan Liu, Xin Liang, Lingzhi Xu, Mark Staples, and Liming Zhu. Composing enterprise mashup components and services using architecture integration patterns. J. Syst. Softw., 84(9):1436–1446, 2011.
[54] LOD-Around-The-Clock (LATC). http://5stardata.info/. Accessed: 2013-11-02.
[55] Fadi Maali and John Erickson. http://www.w3.org/TR/vocab-dcat/. Accessed: 2013-12-05.
[56] Marcos Caceres and Mark Priestley. http://www.w3.org/TR/2009/WD-widgets-reqs-20090430/. Accessed: 2014-02-21.
[57] David Martin, Mark Burstein, Drew McDermott, Sheila McIlraith, Massimo Paolucci, Katia Sycara, Deborah L. McGuinness, Evren Sirin, and Naveen Srinivasan. Bringing semantics to web services with OWL-S. Multimedia Tools and Applications, pages 365–387, 2012.
[58] Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. Sieve: Linked data quality assessment and fusion. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pages 116–123. ACM, 2012.
[59] Alistair Miles and Sean Bechhofer. http://www.w3.org/2009/08/skos-reference/skos.html. Accessed: 2013-12-05.
[60] Eetu Mäkelä, Kim Viljanen, Olli Alm, Jouni Tuominen, Onni Valkeapää, Tomi Kauppinen, Jussi Kurki, Reetta Sinkkilä, Robin Lindroos, Osma Suominen, Tuukka Ruotsalo, Eero Hyvönen, et al. Enabling the semantic web with ready-to-use web widgets, 2007.
[61] Christian Morbidoni, Axel Polleres, Giovanni Tummarello, and Danh Le Phuoc. Semantic Web Pipes, 2007.
[62] Jagadeesh Nandigam, Venkat N. Gudivada, and Mrunalini Kalavala. Semantic web services. J. Comput. Sci. Coll., 21(1):50–63, 2005.
[63] Barry Norton, Reto Krummenacher, Adrian Marte, and Dieter Fensel. Dynamic linked data via linked open services. In Linked Data in the Future Internet 2010, pages 1–10, 2010.
[64] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. http://www.w3.org/TR/html/. Accessed: 2014-02-21.
[65] RDF Working Group. http://www.w3.org/RDF/. Accessed: 2013-10-21.
[66] Roberto Chinnici, Jean-Jacques Moreau, Arthur Ryman, and Sanjiva Weerawarana. http://www.w3.org/TR/wsdl20/. Accessed: 2013-11-15.
[67] Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, and Silvia Pfeiffer. http://www.w3.org/TR/html/. Accessed: 2014-02-21.
[68] Dumitru Roman, Uwe Keller, Holger Lausen, Jos de Bruijn, Ruben Lara, Michael Stollberg, Axel Polleres, Cristina Feier, Christoph Bussler, and Dieter Fensel. Web service modeling ontology. Applied Ontology, pages 77–106, 2005.
[69] Sebastian Rudolph. Foundations of Description Logics. Reasoning Web 2011, LNCS 6848, 2011.
[70] SAWSDL Working Group. http://www.w3.org/2002/ws/sawsdl/. Accessed: 2014-02-27.
[71] Toby Segaran, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O’Reilly, 1st edition, 2009.
[72] Souripriya Das, Seema Sundara, and Richard Cyganiak. http://www.w3.org/TR/r2rml/. Accessed: 2013-11-13.
[73] SPARQL Working Group. http://www.w3.org/TR/rdf-sparql-query/. Accessed: 2013-10-21.
[74] Sebastian Speiser and Andreas Harth. Taking the lids off data silos. In Proceedings of the 6th International Conference on Semantic Systems, I-SEMANTICS ’10, pages 44:1–44:4. ACM, 2010.
[75] Sebastian Speiser and Andreas Harth. Integrating linked data and services with linked data services. In Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications - Volume Part I, ESWC’11, pages 170–184. Springer-Verlag, 2011.
[76] Steffen Stadtmüller, Sebastian Speiser, Andreas Harth, and Rudi Studer. Data-Fu: A language and an interpreter for interaction with read/write linked data. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 1225–1236. International World Wide Web Conferences Steering Committee, 2013.
[77] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. A graph-based approach to learn semantic descriptions of data sources. In The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, pages 607–623. Springer Berlin Heidelberg, 2013.
[78] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and Jose Luis Ambite. Rapidly integrating services into the linked data cloud. In The Semantic Web – ISWC 2012, Lecture Notes in Computer Science, pages 559–574. Springer Berlin Heidelberg, 2012.
[79] Tim Berners-Lee and Dan Connolly (W3C). http://www.w3.org/TeamSubmission/n3/. Accessed: 2013-10-24.
[80] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, Renaud Delbru, and Stefan Decker. Sig.ma: Live views on the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):355–364, 2010.
[81] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Jos De Roo, Rik Van de Walle, and Joaquim Gabarró Vallés. Capturing the functionality of web services with functional descriptions. World Wide Web, 10(3):243–277, 2012.
[82] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Sam Coppens, Erik Mannens, Rik Van de Walle, and Joaquim Gabarró Vallés. Integrating data and services through functional semantic service descriptions. In Proceedings of the W3C Workshop on Data and Services Integration, 2011.
[83] Roberto De Virgilio, Francesco Guerra, and Yannis Velegrakis. Semantic Search over the Web. Springer-Verlag Berlin Heidelberg, 1. edition, 2012.
[84] Tomas Vitvar, Jacek Kopecký, Jana Viskova, and Dieter Fensel. WSMO-Lite annotations for web services. In Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC’08, pages 674–689. Springer-Verlag, 2008.
[85] W3C. http://www.w3.org. Accessed: 2013-12-15.
2.10 Semantic Web Services
One of the goals of this master thesis is the semantic description of Web Widgets with the use of web service description approaches and languages. This part of the Master Thesis therefore gives an overview of Semantic Web Services.
As mentioned earlier, the Web has great significance for society today. The traditional Web focused on the interaction between people and applications, on information sharing, on providing the basic features for e-Commerce, and on (very limited) support for application integration [26]. The ability to exchange and use information across applications is a major task that the traditional Web cannot fulfil on its own. The solution for this interoperability problem was the introduction of Web Services.
The W3C defines a web service as “a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards“.
Web Services connect applications over the internet using Web service standards in order to exchange data. Consider an online purchase: if the user wants to buy an item, he or she sends a request to the server and gets a response. The request includes the ID of the item, the amount, the credit card name, the address, etc. The response includes information about the successful purchase or about errors. A client and a web service exchange this information via request and response messages: the client application sends a request message to the Web server, and the server returns a response message to the client. The technology has the following aspects:
• The protocol is responsible for message transportation. For example, HTTP, SMTP, FTP or BEEP48.
• The message structure is defined with SOAP or REST.
• The interface description describes the structure of the messages. For example, WSDL.
• The data format is an XML-based message format or JSON.
It is necessary to take a close look at the data that web services receive as input and return as output. For clarification, the following examples show messages in the different formats.
REST + XML The URI defines a resource, e.g., http://ex.com/actors/angelinajolie. A REST response is a document in XML format returned from the resource URL. The profile of an actor could, for example, be returned as follows:
<profile>
  <firstName>Angelina</firstName>
  <lastName>Jolie</lastName>
  <citizenship>US</citizenship>
  <year>1982</year>
</profile>
Listing 2.13: REST + XML
REST + JSON REST + JSON is essentially the same as the previous format; the difference is that the data is transferred in JSON format. The advantage of JSON is that the structures can be parsed directly into JavaScript objects.
{
  "firstName": "Angelina",
  "lastName": "Jolie",
  "citizenship": "US",
  "year": "1982"
}
Listing 2.14: REST + JSON
XML RPC The message is also represented in XML format. A response may, for example, look as follows:
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml
Server: ex.com

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value><string>Angelina Jolie</string></value>
    </param>
  </params>
</methodResponse>
Listing 2.15: XML RPC
48http://en.wikipedia.org/wiki/BEEP
The Richardson maturity model49 characterizes REST as interaction between a client and a server according to three principles:
• Resource identification by means of URI.
• The API should use a constrained set of operations (HTTP verbs).
• Hypermedia controls (hypermedia as the engine of application state).
The main problems of Web Services are:
• As pointed out before, Web Services can follow different standards, and their content is not machine-understandable.
• They are not self-describing.
• Service discovery is complex.
• Technical challenges in service composition.
A solution for these problems is adding semantic descriptions to the Web Services and to the corresponding messages that contain the data. “Semantic Web Services is a synergistic confluence of the Semantic Web and Web Services“ [62]. They are like traditional Web Services but include machine-readable and machine-understandable information. The implementation of Semantic Web Services should apply standards for semantic data; due to this, the services can be discovered and assembled.
49http://martinfowler.com/articles/richardsonMaturityModel
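Once service descriptions carry such machine-readable annotations, they can be queried like any other semantic data. The following sketch shows what a discovery query could look like for the purchase example above; the sawsdl:modelReference property is taken from the SAWSDL standard (cf. Section 3.2.2), while the ex: vocabulary and the service data are illustrative assumptions.

PREFIX sawsdl: <http://www.w3.org/ns/sawsdl#>
PREFIX ex:     <http://example.org/ontology#>

# Find services annotated with the concept "purchase order" that
# consume a product ID and produce a confirmation (illustrative vocabulary).
SELECT ?service
WHERE {
  ?service sawsdl:modelReference ex:PurchaseOrder ;
           ex:hasInput  ex:ProductID ;
           ex:hasOutput ex:Confirmation .
}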
Semantic Web Services have many similarities with Web Widgets: widgets also have inputs, outputs, functional properties, etc. An important task is to describe the semantics behind the services. There are different methodologies for this service description task (OWL-S, WSMO, WSDL, etc.), which are explored in detail in Chapter 3. The goal of Chapter 3 is to understand which approaches can be applied to the problem statement of this Master Thesis.
CHAPTER 3
State of the Art
This chapter is divided into two parts. The first part describes the state of the art of tools for consuming and publishing Linked Data. The second part presents different methodologies that can be applied for the description of Web Services and data.
3.1 Applications
3.1.1 Overview of existing applications This section describes related applications and projects in the field of consumption and publication of Linked Data. Several tools and applications are available. They can be categorized as follows [17] [36]:
• Linked Data Browsers are similar to web browsers; however, instead of navigating between pages via hyperlinks, the users navigate between data resources by following links expressed by RDF triples [17]. Examples: Tabulator (a generic data browser and editor [13]) and Marbles1 (a server-side application that formats Semantic Web content for XHTML clients using Fresnel2 lenses and formats).
• Linked Data Search Engines and Indexes. A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over the aggregated data. Broadly speaking, these services can be divided into two categories: human-oriented search engines and application-oriented indexes [17]. Examples: Falcons3, SWSE4, Swoogle5 (semantic web search engines that provide keyword-
1http://mes.github.io/marbles
2http://www.w3.org/2005/04/fresnel-info/
3http://www.w3.org/2001/sw/wiki/Falcons
4http://swse.org
5http://swoogle.umbc.edu
based search for objects, data, ontologies and documents), sameAs.org6, Sindice7, and Sig.ma8.
• Domain-specific Applications. Applications that were developed for domain-specific goals; they access specific data from different Linked Data sources. Examples: DBpedia Mobile9, DERI Pipes10, BBC Programmes and Music11.
The following sections of this chapter present existing mashup platforms and applications that are able to consume Linked Data. The chapter covers not only semantic applications but also Yahoo!Pipes12, a mashup platform that consumes data from various resources. For the evaluation of the tools, parameters like discovery, input/output data types, access methods, recursion, and behavior are used [3]. The user interface is also an important factor.
3.1.2 Yahoo!Pipes Yahoo!Pipes is an online application that was launched on 7 February 2007 by Yahoo. The purpose of the application is the integration and consumption of data from different web pages, web feeds (RSS feeds) and other online resources by way of constructing data mashups [40] [43]. The mashup system includes different types of widgets: some of them access data sources, while other widgets provide aggregation or filtering options. Widgets can be wired together in order to process data. The Yahoo!Pipes environment includes four main parts: a navigation bar, the toolbox, the work canvas, and a debug-output panel [43]. A mashup is created by dragging modules (operators) from the toolbox onto the work canvas and linking the modules; each of the modules completes a specific task [40]. Widgets have input and output terminals, and one widget is linked to another by wiring an output terminal to an input terminal. The data flows from the input modules to a single pipe output (the end of the execution process) [43]. The output is returned in different formats such as RSS or JSON. A project can be saved and shared with other users of Yahoo!Pipes. Pipes can be accessed via their URL (each pipe has a unique URL). The user has the possibility to store a pipe in the public directory; anyone can search and browse the pipes in this directory, inspect and modify published pipes, and also save a copy of them in a directory. There are eleven categories of modules (features): sources, user inputs, operators, url, string, date, location, number, favorites, my pipes and deprecated. Sources are the components that bring data from web pages into the pipe [43]. These modules can process data on the Web in CSV (module Fetch CSV), XML and JSON (module Fetch Data), and RSS, Atom and RDF (module Fetch Feed) formats. Find First Site Feed is a module for finding RSS or Atom feeds. It is also possible to extract any information from web pages
6http://sameas.org/
7http://sindice.com/
8http://sig.ma/
9http://dbpedia.org/DBpediaMobile
10http://pipes.deri.org
11http://www.bbc.co.uk
12http://pipes.yahoo.com/pipes/
using the XPath Fetch Page module. E.g., the command //img returns all images of a web page. This category also includes other components. User inputs make Yahoo!Pipes more flexible and enable adding user input to the data flow. There are five types of input modules: date, location, number, text and URL. The user may provide the following fields: name (the parameter name), prompt (a text entry field for the Run Pipe option), position (the order of the input fields), default (a default value), and debug (a default value within the Pipes Editor). Operators are used for data transformation and filtering. This category includes the following modules [43]:
• Count Module counts the number of items. The input of the module is a data feed and the output is a number.
• Filter Module is used for including or excluding items from a feed via rule definitions. The module can contain multiple rules.
• Location Extractor Module is used for adding location elements (y:location), which include sub-elements such as latitude, longitude, quality, country, state, city, street, and postal code. This element gives the possibility to display the feed on a map.
• Loop Module is used to add sub-modules to pipes. A module can be inserted into the Loop Module; the sub-module will then run once for each item in the input feed. There are two options that define the output of the module: “emit results“ (the output is only the data from the sub-module) and “assign results to“ (the output is all the data from the original input, with the data from the sub-module assigned to a specified element).
• Regex Module “modifies fields in an RSS feed using regular expressions, a powerful type of pattern matching“ [43].
• Rename Module renames elements. E.g., it is possible to convert some data into RSS format (so that the elements have title, description, etc.) or into location elements for the Location Extractor. There are two types of mapping: “rename“ (create a new element with a new name and delete the old element) and “copy as“ (create a new element without deleting the old element).
• Reverse Module reverses the order of the items.
• Split Module splits the feed “into two identical output feeds“ [43]. The module is useful in case different operations have to be applied to the same data items.
• Sort Module sorts feeds in either ascending or descending order by any element (e.g., name, date).
• Sub-Element Module extracts selected sub-elements from a feed.
• Tail Module truncates a feed to the last N items, where N is a number specified by the user.
• Truncate Module truncates a feed to the first N items.
Figure 3.1: the Web & the Semantic Web
• Union Module combines separate sources of items (maximum 5). The output is a list of items.
• Unique Module removes items with duplicated values from the feed.
• Web Service Module sends a request to a Web Service for additional processing of the data. Yahoo!Pipes receives the response from the Web Service in JSON format; the Web Service must support HTTP POST with JSON.
• Create RSS Module transforms the input data into RSS format. Non-RSS elements are renamed to existing RSS element names.
Figure 3.1 presents an example in Yahoo!Pipes. The example shows the aggregation of information from different sources; the processing happens separately for each data source. In the example, data from the Sciencenews web page (https://www.sciencenews.org/) and the CNN news page (http://rss.cnn.com/) has been selected. The data is merged via the Union module. The use case for the example was finding articles that have the word “Dolphin“ in the title. To get the “Dolphin“ references from the Sciencenews web page, the XPath Fetch Page module was used; for the selection of the data, the XPath command //a[contains(.,'Dolphin')] has been applied. The Truncate module has been used for taking the top two articles. The articles from the CNN web page are provided as an RSS feed (http://rss.cnn.com/rss/cnn_topstories.rss).
For the selection of “Dolphin“ in the title, the Filter module has been used (item.description contains “Dolphin“). Finally, both feeds are piped into a Union module, which merges them into one feed. After running the pipe, the result of the merging is shown in the debug-output panel (3 articles).
The URL category includes only one module, the URL Builder Module. All resources are defined by URLs, some of which are complex; the module gives control over URL construction.
String Modules are used to process string values, for example for building a string from several sub-strings. The category includes the String Builder, Sub String, Term Extractor, Translate, String Regex, String Replace, String Tokenizer, Yahoo! Shortcuts, and Private String modules.
Date Modules are used for date building and formatting. There are two modules: Date Builder and Date Formatter. The first converts a string value into a datetime value; the second defines a format for the datetime value.
Location Builder Module extracts geographical data from a description. “The module outputs a location structure with separate fields for city, state, country, latitude, and longitude“ [40]. The location can be connected to any module that accepts location types.
Simple Math Module processes mathematical operations like division, subtraction, power, etc.
Yahoo!Pipes supports the creation of new information streams from different sources by using a cascade of simple operators. Data sources are usually web feeds (e.g., from news web pages) or other simple data. The access to the data is realized via standard web protocols (HTTP, RSS). Yahoo!Pipes mashups can be combined with each other and can be accessed via HTTP. The retrieved data is usually refreshed automatically at each start of a pipe.
The disadvantage of Yahoo!Pipes is the lack of semantic data processing capabilities. Yahoo!Pipes also does not support search for stored widgets based on authors, topic, etc., nor a semantic description of the resources; component discovery is based on keywords only. The possible data formats are limited to RSS, Atom, XML, and JSON. An advantage is that Yahoo!Pipes gives the ability to use XPath expressions for retrieving data from web pages. It accesses the data via HTTP or RSS/Atom. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is very complicated for non-professional users.
3.1.3 DERI Pipes DERI Pipes [29] is an open source project for transforming, filtering and aggregating web data (the data should be in RDF format or in one of several RDF serialization formats) [49] and for building RDF-based mashups [61]. The tool supports RDF, XML, SPARQL, XQuery, JSON and several scripting languages [29]. External applications can use the output data stream (e.g., JSON). The web data sources can be accessed via URIs. Data is processed by several basic operators, and each operator may have one or more inputs (e.g., text, output from other operators, or URIs) and only one output (an RDF graph, an RDF dataset, or a SPARQL result set). A set of instances of the operators represents a pipe [61] [49]. A Semantic Web pipe processes a data flow from a set of RDF sources through pipelined special-purpose operators [49]. Figure 3.2 presents the basic operators such as CONSTRUCT and
Figure 3.2: Semantic Web pipe operators. Source: [49]
SELECT. The input values can be data in RDF, string or XML format; the output is usually data either in RDF or in XML format. The definitions of the pipes are stored as XML. The structure of a simple pipe that aggregates data from different Linked Data sources is sketched below, following [29]; each pipe starts with the XML tag <pipe> (the element names follow the operators described below, while the URLs are illustrative):
<pipe>
  <mix>
    <fetch>
      <location>http://example.org/source1.rdf</location>
    </fetch>
    <fetch>
      <location>http://example.org/source2.rdf</location>
    </fetch>
  </mix>
</pipe>
Listing 3.1: A simple pipe aggregating two RDF sources
DERI Pipes also has a graphical editor (c.f. Figure 3.3). The environment is similar to Yahoo!Pipes: on the left side there are sets of operators that are grouped into four categories: fetch, operators, url, and inputs. The operators can be moved onto the designer tab canvas and connected. The source code can be seen by clicking on the button “source code“ under the designer tab canvas. The result of a pipe is shown in the view panel (text or table view). To understand the features better, the operators [29] are considered below.
Figure 3.3: DERI interface
The first category of operators comprises the fetch operators. These operators get data from a data source (via a URI) in RDF, HTML, XML, or XSL format. The second category, “Operators“, includes operators for data processing. The triples that are fetched from different sources can be merged via the MIX operator; the input of the operator should be RDF/XML data (a constant or the output of another operator in RDF/XML format). The operator RDFS MIX merges the specified sources and then infers triples from the merged triples. The operator CONSTRUCT is used to derive data from one or more specified RDF sources via SPARQL. The cycle operator FOR invokes “a parametrized pipe multiple times and merges the resulting outputs of each invocation“. The operator SMOOSHER can be used to merge all data from different sources according to a URI and based on the owl:sameAs statement. The third category is “URL“. There are two operators in this category: URL Builder and SPARQL Endpoint. URL Builder is similar to the Yahoo!Pipes URL builder; SPARQL Endpoint accesses a SPARQL endpoint via a SPARQL query which is contained in the operator. The fourth category, “Inputs“, includes PARAMETER, which “accepts user input“ [29], and FOR VARIABLE, which gives a name to a field that is used within a loop [29].
DERI Pipes, like Yahoo!Pipes, can be stored, shared and re-used by other users. Each DERI pipe has a unique URL. Users can connect different pipes, modify existing pipes and include pipes as functional blocks into projects (because of the XML format and the HTTP-retrievable model). Like Yahoo!Pipes, DERI Pipes does not support an efficient search for stored widgets or a semantic description of widgets. DERI Pipes processes data in RDF, XML, Microformats, JSON and binary streams and converts the data into RDF format. The platform accesses the data via SPARQL. A good feature is mashup recursion: stored mashups can be used as parts of other mashups. The interface is very complicated for non-professional users, and programming skills are needed.
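For illustration, a CONSTRUCT operator could contain a SPARQL query of the following form; the sketch keeps only English-language labels of the merged triples, and all names in it are illustrative.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Derive a new RDF graph from the fetched sources,
# keeping only English-language labels.
CONSTRUCT {
  ?resource rdfs:label ?label .
}
WHERE {
  ?resource rdfs:label ?label .
  FILTER(LANGMATCHES(LANG(?label), "en"))
}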
3.1.4 BIO2RDF Bio2RDF is an open-source semantic project that provides Life Science Linked Data from over 1500 biological databases (like Kegg13, MGI14, PDB15) [16]. The goal of the project is the implementation of a more sophisticated scheme for biomedical data, bringing data from different web sites together and adding semantics to the data in order to get machine-understandable content [20]. Bio2RDF provides scripts that convert a diverse set of heterogeneously formatted sources [16] [20] into RDF. The datasets are converted based on Tim Berners-Lee's design principles of Linked Data (cf. Chapter 2.2). The transformation of the data into RDF format happens through a JSP toolbox. The data can be stored locally or accessed via HTTP requests [16] [12]. The system supports “relational databases, text files, XML documents, and HTML pages“ [12]. Depending on the data format, the system uses different methods to get the data: XML to RDF conversion, SQL to RDF conversion, or text file to RDF conversion. The data conversion includes three steps [12]:
1. Namespace definitions for URI normalization (each URI is unique; the owl:sameAs predicate is used).
2. The analysis of the data source and the design of an RDF model.
3. The implementation of an RDFizer for transforming the data and putting it into a triple store.
Bio2RDF suggests a set of principles for providers [12]:
1. “Use a REST like interface“ for clear and stable URI creation.
2. “Lowercase all the URI up to the colon“ to be effectively case insensitive.
3. “All URIs should return an RDF document“ for easy connection to other linking data.
The syntax of a normalized URI is described by the following pattern: http://bio2rdf.org/namespace:identifier. The data can be accessed in two ways:
• Data from external sources can be stored in an SQL database on the Bio2RDF.org server; these sources are accessible directly from the server, and the direct access to the Bio2RDF server affords high speed. E.g., data from HGNC16, Entrez Gene17, Kegg18.
13http://www.genome.jp/kegg/
14http://www.informatics.jax.org/
15http://www.rcsb.org/pdb/
16http://www.genenames.org/
17http://www.ncbi.nlm.nih.gov/gene
18http://www.genome.jp/kegg/
Figure 3.4: Bio2RDF system framework architecture. Source: [12]
• Data from external sources can be requested directly from the data source. After the request, the data is transformed into RDF with the use of an RDFizer program. E.g., data from Reactome19, PubMed20, UniProt21.
There are two servlets in the system: Elmo22 and Sesame23. Elmo is used for crawling the RDF documents; the triples are processed in the local Sesame repository. By means of the Sesame interface, the data can be browsed and queried. Bio2RDF looks like a search engine: the user can use it like Google or Yahoo to find the needed information. The result of a request is a table with properties and values. Bio2RDF contains very specific knowledge and is therefore very popular in the life sciences industry.
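Such a property-value table corresponds to a simple SPARQL query pattern over the normalized URIs; the following sketch assumes a record URI built according to the pattern above (the concrete namespace and identifier are illustrative).

# List all properties and values of one Bio2RDF record.
SELECT ?property ?value
WHERE {
  <http://bio2rdf.org/geneid:4157> ?property ?value .
}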
3.1.5 LOD2 LOD2 is a large European project and a set of tools that “support the whole life cycle of Linked Data from extraction, authoring/creation via enrichment, interlinking, fusing to maintenance“ [8], developed by partner companies and universities. The architecture of the components is based on three foundations [8]:
• “Software integration and deployment using the Debian packaging system“.
• “Use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between different tools“.
19http://www.reactome.org/PathwayBrowser/
20http://www.ncbi.nlm.nih.gov/pubmed
21http://www.uniprot.org/
22http://www.openrdf.org/
23http://www.w3.org/2001/sw/wiki/Sesame
• “Integration of the LOD2 Stack user interfaces based on REST enabled Web Applications“.
LOD2 defines the Linked Data Lifecycle, which includes eight phases; for each of them a set of tools is available:
• Extraction. Conversion of data into RDF. Tools: Valiant24, Apache Stanbol25, DBpedia Spotlight26, D2RQ27.
• Storage. Optimization of data storage, dynamic querying of RDF graphs, graph processing, etc. Tools: Virtuoso28.
• Authoring. Publishing of Linked Data, addition of semantically enriched content and editing of it by non-expert users, e.g. following the WYSIWYM paradigm29. Tools: PoolParty30, OntoWiki31.
• Interlinking. Data integration, addition of links between semantic contents. Tools: SILK32, LIMES33.
• Classification. The integration of the raw data with an ontology for future work with integrated data.
• Quality. The quality characteristics like coverage, context or structure are very important. Tools: Sieve34.
• Evolution/Repair. Data sets and ontologies are checked for relevance in order to keep things stable, and repair strategies should be planned for problems that appear. Tools: Sieve.
• Search/Browsing/Exploration. Tools: SemMap35.
Apache Stanbol Apache Stanbol is a set of components that combine traditional content management systems with semantic services. Apache Stanbol includes:
• Content enhancement. The goals are information extraction from content, content analysis, and the presentation of content as RDF. It is used for search and navigation improvement.
24http://lod2.eu/Project/Valiant.html
25https://stanbol.apache.org/
26http://dbpedia-spotlight.github.io/demo/
27http://d2rq.org/
28http://lod2.eu/Project/Virtuoso.html
29http://en.wikipedia.org/wiki/WYSIWYM
30http://lod2.eu/Project/PoolParty.html
31http://lod2.eu/Project/OntoWiki.html
32http://lod2.eu/Project/Silk.html
33http://lod2.eu/Project/LIMES.html
34http://sieve.wbsg.de/
35http://aksw.org/Projects/SemMap
• Reasoning. The Stanbol reasoners analyze sets of axioms and facts in order to derive logical consequences (additional semantics).
• Knowledge models. The Ontology Manager provides access to the ontologies stored in the system for managing ontologies, ontology networks and user sessions [28].
• Persistence. The Contenthub is a document repository for storing semantic information.
The functionalities of the components are exposed in terms of a RESTful web service API.
D2RQ D2RQ is a platform for retrieving data in the form of RDF graphs from relational databases without additionally storing the data in an RDF store. The D2RQ Platform is “a system for accessing relational databases as virtual, read-only RDF graphs. It offers RDF-based access to the content of relational databases without having to replicate it into an RDF store“36. The system supports the querying of non-RDF databases using SPARQL (an example query is sketched after the following list), the presentation of relational databases as Linked Data and access to the data, the use of the Apache Jena API, and the creation of custom dumps. The platform includes:
• The D2RQ Mapping Language. The language describes the mapping between relational databases and ontologies. The data is presented as virtual data graphs that include the information from the relational databases.
• The D2RQ Engine. The engine converts Jena API calls into common SQL queries according to a mapping description.
• D2R Server gives the ability to publish the data as Linked Open Data. The server transforms the data from the relational database into RDF formats according to a mapping description. After the transformation, the data can be browsed and searched.
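The following sketch shows what such a SPARQL query over a virtual RDF graph could look like; the mapping vocabulary (map:Customer etc.) is an illustrative assumption, since D2RQ derives the actual vocabulary from the mapping description.

PREFIX map: <http://example.org/d2rq-mapping#>

# Query the relational database as a virtual RDF graph;
# the D2RQ engine rewrites this query into SQL.
SELECT ?name ?email
WHERE {
  ?customer a map:Customer ;
            map:customer_name  ?name ;
            map:customer_email ?email .
}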
Virtuoso Virtuoso is a multi-model data server for data and information storage and knowledge management. It allows access to various data sources that are stored in different formats and supports various query languages and data representation formats, for example SQL, SPARQL, JDBC, HTTP, WebDAV, XML, RDF, etc. Virtuoso covers many areas like data management (relational, RDF graph, or document), free text content management and full text indexing, document web server, Linked Data server, Linked Data deployment, and messaging.
PoolParty PoolParty is a thesaurus management system for the generation of knowledge models and the creation of thesauri and taxonomies. The platform is based on semantic technology and provides the
36http://d2rq.org/
ability to combine thesauri with Linked Open Data. The information is analyzed by the system and published into a semantic graph. The system has the following features:
• It can analyse documents in order to find inconsistencies between existing taxonomies and the content.
• The system follows W3C’s SKOS standard.
• Connection to Linked Data.
• Support for various datatypes.
• Use of Virtuoso for knowledge graph storing.
• Integration with SharePoint, Drupal, etc.
• The system is based on the following standards: RDF, SPARQL and SKOS (an example query is sketched after this list).
• Integration with other enterprise systems.
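The following sketch illustrates how a SKOS thesaurus managed by such a system can be queried with SPARQL; the concept label used here is an illustrative assumption.

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Retrieve the narrower concepts of the concept labelled "Semantic Web".
SELECT ?narrower ?label
WHERE {
  ?concept  skos:prefLabel "Semantic Web"@en .
  ?narrower skos:broader   ?concept ;
            skos:prefLabel ?label .
}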
A Link Discovery Framework for the Web of Data (SILK). “Using the declarative Silk - Link Specification Language (Silk-LSL), data publishers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked.“ [?]. SILK specifies RDF links between data sources and the terms of data interlinking. Different similarity metrics can be used for defining the links.
LIMES. LIMES is a link discovery framework. The approach refers to the “interlinking“ phase and is based on the estimation of the similarity between instances [8] [51]. The instance pairs are filtered to find the sufficient ones according to the specified conditions. The approach also includes machine-learning algorithms (EAGLE37, COALA38 and EUCLID39) to find appropriate pairs of instances. The framework includes several modules, among them: a control module (matching process coordination), a data module (consisting of the classes needed to work with data), an I/O module (used for data reading and data extraction), a query module, and the LIMES engine (used for computing the results).
Sieve. Sieve relates to the quality phase of the Linked Data Lifecycle. The tool consists of two modules: data quality assessment and data fusion [58]. Sieve realizes the assessment of data quality through various mechanisms:
• Assessment Metrics. A metric combines some quality indicators and “calculates an assessment score from these indicators using a scoring function“ [58].
37http://en.wikipedia.org/wiki/Eagle_strategy
38http://www.cs.mu.oz.au/~jbailey/papers/coalafinal.pdf
39http://en.wikipedia.org/wiki/Euclidean_algorithm
• Data quality indicators. The indicators depend on the information that the users need and on the specific situation.
• Scoring functions. The functions are related to the data quality indicators and present an evaluation of them. They include simple comparison functions, complex statistical functions, network analyses, etc.
• Aggregate Metrics. The metric aggregates assessment metrics with the use of average, sum, max, min or threshold functions.
The second module presents a data fusion mechanism. “Data Fusion is commonly seen as a third step following schema mapping and identity resolution, as a way to deal with conflicts that either already existed in the original sources or were generated by integrating them“ [58]. There are two types of fusion functions in Sieve:
• Filter functions. They use a quality metric to remove some values from the input data sets.
• Transform functions. They generate new values from the input datasets with the use of fusion functions like Filter, First, Last, Random, Average, Max, or Min.
SemMap is used for knowledge visualization. It explores spatial areas and shows objects according to specific properties. The interaction between the triple stores and the application is realized via SPARQL queries.
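For illustration, such an application could select the objects to display with a query of the following form; the use of the WGS84 vocabulary is an assumption, since the concrete vocabulary of the triple store is not documented here.

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

# Retrieve all objects with coordinates so they can be placed on a map.
SELECT ?object ?lat ?long
WHERE {
  ?object geo:lat  ?lat ;
          geo:long ?long .
}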
Sig.ma Sig.ma is a Semantic Web mashup. The application has the following tasks [80]:
• Browsing the Web of Data. Sig.ma browses information according to the input text data. The application returns the data from the Web of Data (e.g., name, title, location, etc.). The user has the ability to follow the links that the system returns.
• Embedding, linking and Sig.ma alerts. The user has the ability to expand and refine the sources in order to select the needed values and properties.
• Structured property search for multiple entities. Search for properties: for example, the request “title, actor, year, [...] @ Harry Potter“ returns an array with the given properties regarding the entity “Harry Potter“ (a roughly equivalent query is sketched below).
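Translated into SPARQL, such a structured property search roughly corresponds to the following sketch; the DBpedia-style properties are illustrative assumptions, since Sig.ma aggregates arbitrary vocabularies.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

# Fetch title, actor and year for entities matching "Harry Potter".
SELECT ?title ?actor ?year
WHERE {
  ?film rdfs:label      ?title ;
        dbo:starring    ?actor ;
        dbo:releaseDate ?year .
  FILTER(CONTAINS(STR(?title), "Harry Potter"))
}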
The search for data sets includes the following steps [80]:
• Data source selection. The result is a list of sources that have been found via various search engine interrogations.
• Parallel Data Gathering. The extraction of structured data from the different data sources.
• Extraction and Alignment of related subgraphs. The structured data is separated into parts, each of which has a resource description. As a next step, similar data is identified and connected via owl:sameAs.
The information above can be summarized as follows: LOD2 is a large research and development project which covers the full Linked Data lifecycle from data extraction to search. LOD2 focuses on data and information integration, the quality of data, and bringing Linked Data to enterprises.
3.2 Semantic Description Approaches
In this part the following kinds of description methodologies are presented:
• Service description approaches such as WSDL, OWL-S, WSMO, and WSMO Lite.
• Approaches which present the integration of services and Linked Data, like LIDS, LOS, Data-Fu, RESTDesc, and Karma.
• A mapping approach from relational databases to RDF, namely R2RML.
In the course of this part it is important to clarify whether there is one approach that can be applied to Linked Widgets.
3.2.1 Web Services Description Language (WSDL) WSDL is an XML-based language and a model for describing Web services. A WSDL description provides machine-readable information about how the service can be invoked, what data or information is needed, and what the service returns. The service description includes the operations provided by the service and the expected parameters. The model of WSDL is a set of components and properties. There are two versions of WSDL: 1.1 and 2.0; version 2.0 is a W3C recommendation. WSDL 2.0 provides two kinds of information: an abstract model (application-level description) and a concrete model (the specific protocol-dependent details) [66]. The separation is needed because different endpoints with dissimilar access protocols can provide common functionality. The abstract model describes the messages that are sent and received by a Web service. The concrete model describes the communication protocol (e.g., SOAP), the service interactions, and the endpoint of the communication (the address). A WSDL document uses the following elements in the definition of web services40 [66]:
• Types – a container for data type definitions using some type system (such as XSD) [66].
• Message (WSDL 1.1) includes essential information for operation execution and corresponds to an action (an operation).
• Operation – “an abstract description of an action supported by the service“.
• Port Type (WSDL 1.1) or Interface (WSDL 2.0) – a list of operations (inputs and outputs) that can be performed by one or more endpoints.
40WSDL 1.1 and WSDL 2.0 use different terms in some cases.
• Binding – indicates a protocol and a data format specification for a port type (SOAP binding style).
• Port (WSDL 1.1) or Endpoint (WSDL 2.0) – usually a URL of a single endpoint.
• Service – a set of endpoints.
Listing 3.2 sketches the main elements of a WSDL 2.0 description; the names in it are illustrative.
<description xmlns="http://www.w3.org/ns/wsdl"
             targetNamespace="http://ex.com/movies"
             xmlns:tns="http://ex.com/movies">
  <types>
    <!-- XML Schema definitions of the exchanged messages -->
  </types>
  <interface name="MovieInterface">
    <operation name="getMovies" pattern="http://www.w3.org/ns/wsdl/in-out">
      <input element="..."/>
      <output element="..."/>
    </operation>
  </interface>
  <binding name="MovieBinding" interface="tns:MovieInterface">
    <!-- protocol and message format details, e.g. SOAP -->
  </binding>
  <service name="MovieService" interface="tns:MovieInterface">
    <endpoint name="MovieEndpoint" binding="tns:MovieBinding"
              address="http://ex.com/movies"/>
  </service>
</description>
Listing 3.2: Main elements of a WSDL description
3.2.2 Semantic Annotation for Web Services Description Language (SAWSDL) SAWSDL defines “mechanisms using which semantic annotations can be added to WSDL components“ [70]. SAWSDL provides mechanisms by which concepts from semantic models that are defined either within or outside the WSDL document can be referenced from within WSDL components as annotations [70]. Based on the member submission WSDL-S, the key design principles for SAWSDL are [70]:
• “The specification enables semantic annotations for Web services using and building on the existing extensibility framework of WSDL.
• It is agnostic to semantic representation languages.
• It enables semantic annotations for Web services not only for discovering Web services but also for invoking them“.
3.2.3 Semantic Markup for Web Services (OWL-S) OWL-S is an OWL-based ontology for the description of web services. The language constructs are used for describing the properties and capabilities of Web services. “OWL-S markup of Web services will facilitate the automation of Web service tasks including automated Web service discovery, execution, interoperation, composition and execution monitoring“ [25]. The descriptions of Semantic Web services usually have three interrelated subontologies or profiles:
41http://www.daml.org/services/owl-s/1.0/owl-s-wsdl.html
Figure 3.5: Top level of the service ontology
• The service profile provides service description, in standard OWL.
• The process model describes the processes inside the semantic web service.
• The service grounding describes access to the semantic web service, typically expressed in WSDL.
Like WSDL, OWL-S has abstract and concrete models. The abstract characterizations are the service profile and the process model. The service grounding provides the concrete information needed for access to a web service, like message formats, protocol, etc. Figure 3.5 depicts the relations between a service and its components: the arrows represent OWL properties and the ovals represent OWL classes. The Service Model includes:
• Inputs and outputs. The inputs describe the data that the service needs to process; the outputs describe the result data that the service produces. Both properties take their values from the Service Model.
• Precondition is a proposition that has to be true to execute the service.
• Result is a condition that becomes true after process execution.
The Service Profile provides a specialized representation of services [25]. An OWL-S profile provides the following kinds of information:
• A non-functional description (metadata like service name, description, contact information, etc.). For example, the provider information includes information about the entity that is responsible for running the service.
• A functional description of the information transformation (the function that can be computed, service characteristics) and of the service states (precondition and postcondition, facts). For example, a booking service may require as preconditions the first and last name of a person, credit card data and an identity card ID. As output, the service returns a booking confirmation.
The service profile includes references to the service model; therefore it is possible to find those services that best satisfy a request.
The service grounding describes the access to the semantic web service and the implementation with WSDL, SPARQL, etc. It represents the exchange of information between the consumer and the service provider, and it maps the abstract specification to a concrete model [25]. For example, in the case of WSDL, it maps each atomic process to a WSDL operation and relates “each OWL-S process input and output to elements of the XML serialization of the input and output messages of that operation“ [57]. At the grounding level, the inputs and outputs of a process are realized as messages.
The following examples demonstrate the syntax of OWL-S. The process model describes the processes inside the semantic web service. Each service has some inputs; Listing 3.3 sketches the declaration of such an input for the purchase example (the names are illustrative).
<process:Input rdf:ID="ProductID">
  <process:parameterType rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">
    http://www.w3.org/2001/XMLSchema#string
  </process:parameterType>
</process:Input>
Listing 3.3: Declaration of an OWL-S input
As output after the purchase the user gets a confirmation number (c.f. Listing 3.4).
<process:Output rdf:ID="ConfirmationNumber">
  <process:parameterType rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">
    http://www.w3.org/2001/XMLSchema#string
  </process:parameterType>
</process:Output>
Listing 3.4: Declaration of an OWL-S output
A result variable should also be defined (c.f. Listing 3.5); the result variable is a variable scoped to the Result block and bound by the result condition. The following sketch again uses illustrative names.
<process:Result rdf:ID="PurchaseResult">
  <process:hasResultVar>
    <process:ResultVar rdf:ID="PurchaseStatus"/>
  </process:hasResultVar>
</process:Result>
Listing 3.5: Declaration of an OWL-S result variable
The advantages of OWL-S are: it synthesizes both an extensional and a functional view of Web services; it provides a complete description of the services that it describes [57]; and OWL logic and ontological descriptions are included in the description.
The disadvantages are: OWL-S is limited by the use of OWL, a language based on description logic [6]; it is hard to describe the semantic relation between input and output, because OWL-S does not provide mechanisms to express the relation of a service to other services [81]; and there are no working tools.
3.2.4 Web Service Modeling Ontology (WSMO) WSMO is an ontology for the description of the core elements of Semantic Web Services. The following design principles are the basis of WSMO: Web compliance (use of URIs for resource identification); ontology-based resource description; strict decoupling (“each resource is specified independently without regard to possible usage or interactions with other resources“ [68]); centrality of mediation; ontological role separation (separation of the client's role); description versus implementation (separation between the description of a Web Service and its implementation); execution semantics (technical realization); and service versus Web service. WSMO uses an approach similar to OWL-S for declaring and describing services, but there is a difference: while OWL-S focuses more on the description of services, WSMO focuses more on the application domain and on solving integration problems [47]. The model includes four parts:
• Ontology - a domain description that can be used by the other WSMO elements. This part includes the machine-processable information that is needed for adding meaning to the data.
• Web service interface - the semantic description of services (the capabilities, interfaces and internal working of the service);
• Goal - the results or goals of the usage of the web service;
• Mediator - coordinates the WSMO components.
An ontology in WSMO also includes non-functional properties, used mediators, concept definitions, relation definitions, axioms, and instances [47]. The non-functional properties are globally accessible by all modelling elements. The properties can come from controlled vocabularies like Dublin Core, from other vocabularies, or from a standard set provided by WSMO (hasContributor, hasDate, etc.). Used mediators serve for linking to ontologies that should be imported, for linking goals, for linking between services and WSMO goals, and for orchestration. In comparison to WSMO, OWL-S does not support such a meta-ontology.
3.2.5 WSMO-Lite WSMO-Lite is a lightweight approach to semantic service description which is standardized according to W3C standards, “the next evolutionary step after SAWSDL, filling the SAWSDL annotations with concrete semantic service descriptions“ [84], and which can be applied directly to WSDL descriptions. WSMO-Lite allows bottom-up modelling of web services. WSMO-Lite adopts the WSMO model and makes its semantics lighter in the following major aspects: WSMO-Lite treats mediators as infrastructure elements, and specifications for user goals as dependent on the particular
discovery mechanism used; WSMO-Lite only defines semantics for the information model, functional and non-functional descriptions; and it accepts any ontology language based on the Resource Description Framework [84]. The approach treats Web services as atomic and does not focus on the internal behaviour of web services [84]; it also does not have a concrete language for the description of functional semantics. The WSMO-Lite service ontology has three parts:
• A domain ontology that presents an information or data structure model. WSMO-Lite identifies types and a simple vocabulary for the semantic description of services and the languages used to express the descriptions.
• Capabilities and/or functionality classifications that present the functional description of the service (definition of conditions, effects).
• A non-functional description, represented by an ontology that specifies policies or other non-functional properties.
The main disadvantage of this approach is that it focuses on the description of Web APIs, and not on providing relationships between the data that is processed by the web services.
3.2.6 RESTDesc semantic description RESTDesc is a semantic, functionality-centred method which expresses the functionality of a service - as well as its communication - in a concise way that appeals to humans and can be processed automatically [81] [82]. The main elements of the RESTDesc approach are the precondition, the postcondition and the request details. The precondition describes the input state of a resource of a service. The postcondition is the output state after the interaction, and the request details define the method which should be used to achieve the new state. These elements are brought together in the form of a rule, which takes care of correct quantification and variable instantiation [81]. This approach presupposes the use of an ontology model, e.g. an RDF schema. Links are used to define the relationships between resources: e.g., between http://example.org/pictures/1 and http://example.org/pictures/1/animals/1, meaning that picture 1 is grouped into the category “Animals“. Listing 3.6 demonstrates the use of the RESTDesc description language for a service description; the namespace URIs of the prefixes are exemplary. The service gets a director as input data and returns a list of movies. The precondition defines the input of the service, an instance of the class movie:Director; this input is required for the service invocation. In the postcondition section, the HTTP vocabulary is used to describe a GET request. The directorOf link shows the relationship between the director and the movies. The service returns the list of movies and provides additional data such as year and actors.
@prefix movie: <http://example.org/movie#>.
@prefix http: <http://www.w3.org/2011/http#>.
@prefix tmpl: <http://purl.org/restdesc/http-template#>.

{
  ?director a movie:Director.
}
=>
{
  _:request http:methodName "GET";
    tmpl:requestURI (?director "/movie");
    http:resp [ tmpl:represents ?movie ].
  ?director movie:directorOf ?movie.
  ?movie movie:year _:year;
    movie:starring _:actor;
    movie:type _:type.
}.
Listing 3.6: RESTDesc Example
The advantages of this approach are: it is possible to describe relationships between input and output data (e.g., ?director movie:directorOf ?movie), and it links web services directly to data sets. The disadvantages are: the user has to write the description of the model manually, since RESTDesc does not support the automatic generation of service descriptions; and the approach focuses on applying HTTP methods for data retrieval, publishing, etc., while a mashup focuses on Linked Data consumption and data description.
3.2.7 SA-REST SA-REST is a simple and open microformat for enhancing Web resources with additional semantic information [31]. The meta information can be modeled according to various formats such as RDFa, OWL or Gleaning Resource Descriptions from Dialects of Languages (GRDDL42). Altogether this makes the service description human-readable as well as machine-readable. The main idea of SA-REST is to add the semantic description directly into SAWSDL or HTML code. SA-REST, like SAWSDL, “annotates outputs, inputs, operations, and faults, along with the type of request that it needed to invoke the service“ [48] in the form of URIs. This means that SA-REST links an ontology to a service; for example, an input message can be annotated by embedding a URI from an ontology. An important point of the approach is the lifting and lowering schema specification. It is used for transforming data structures from the input or output of services to an ontology; the idea is similar to the OWL-S grounding. To realize this transformation, SA-REST uses XSLTs or XQueries. The queries take a data structure from the implementation level (data expected as input or output of the service) and convert it into an ontology structure. Listing 3.7 demonstrates a Web page which is annotated with the use of SA-REST; the concrete property names and ontology URIs in the sketch are illustrative. In this example, the user searches for information about a movie: the user passes a title object to the movie-search-service, and the service returns the description from the output of the movie-search-service.
<html xmlns:sarest="http://lsdis.cs.uga.edu/SAREST#">
  <meta about="http://ex.com/movie-search-service"
        property="sarest:input"
        content="http://ex.com/ontology/Title" />
  <meta about="http://ex.com/movie-search-service"
        property="sarest:output"
        content="http://ex.com/ontology/MovieDescription" />
  <meta about="http://ex.com/movie-search-service"
        property="sarest:action"
        content="HTTP GET" />
</html>
Listing 3.7: SA-REST Example
42www.w3.org/TR/grddl/
The advantages of SA-REST are: it adds semantics directly to REST services, WSDL, or HTML; and it does not enforce the choice of language for representing an ontology or a conceptual model, but allows the use of OWL or RDF [48]. SA-REST is “a more general purpose language that adds semantic annotations only to those page elements that wrap a service or a service description“ [48]. This could cause problems associated with widget composition and discovery. The disadvantages are that the annotation of web pages is often problematic, since the programmer usually has to select the page which will be annotated with the semantic description. Additionally, for Linked Widgets it is important to separate the technical and the semantic parts.
3.2.8 EXPRESS EXPRESS is an approach for semantic service description. The main feature of EXPRESS is providing “an uniform interface for resources“ [6]. The resources are described with the use of an OWL ontology and the HTTP methods (GET, PUT, DELETE, POST and OPTIONS). The RESTful interface can be created automatically because of the direct mapping between entities and resources. EXPRESS includes a service provider and an EXPRESS deployment engine. The service provider provides an OWL file describing the resources of a Web Service [7]; the OWL file also defines the “exchanged message format“ [7]. The URIs for the resources are generated through the EXPRESS deployment engine. After the URI generation, the service provider assigns the HTTP methods to the classes, properties and instances [7]. User roles are provided to differentiate the access to resources and methods for different kinds of users (role-based access control). Listing 3.8 shows an example of a DVD ordering service description using EXPRESS. DVD ordering is provided by a Web Service; the service provider provides an ontology that describes the entities and the relationships between them. In this example, the classes are DVD, Customer and Order. The customer can order movies and games (subclasses of the class DVD).
# The customer can order movies and games
:DVD a owl:Class.
:Movie rdfs:subClassOf :DVD.
:Game rdfs:subClassOf :DVD.

# An order can include movies and games;
# the order has properties that define the customer
# and the time of the ordering
:Order a owl:Class.
:hasDVD a owl:ObjectProperty;
  rdfs:domain :Order; rdfs:range :DVD.
:orderedBy a owl:ObjectProperty;
  rdfs:domain :Order; rdfs:range :Customer.
:hasDate a owl:DatatypeProperty;
  rdfs:domain :Order; rdfs:range xsd:dateTime.

# Class Customer
:Customer a owl:Class.
:hasName a owl:DatatypeProperty;
  rdfs:domain :Customer; rdfs:range xsd:string.
Listing 3.8: EXPRESS Example
The EXPRESS deployment engine generates the URIs: e.g., http://www.example.org/DVD is the URI for the class DVD and http://www.example.org/Order the URI for the class Order. URIs are also generated for properties and instances of the classes: e.g., http://www.example.org/customer1 is the URI of an instance of the class “Customer“, and http://www.example.org/customer1/hasName is the URI for the property “hasName“. The next step is subscribing methods to the resources, which defines the HTTP methods allowed for each URI. If there are different types of users in the system, the methods are defined via role-based access control; after the role definition, stubs are created automatically. For DVD ordering, the user first sends a request to the server via the GET method, which returns the list of DVDs from the OWL file. Secondly, the user orders items via a POST request to http://www.example.org/Order. The response of the server is the URI of the new order (http://www.example.org/order1). If the user already exists, the server automatically puts the URI of this user into the new order; otherwise, a new user is created and the server returns a new URI. The desired products are added via a PUT request to the server. Listing 3.9 is an example of the message that is sent to the server.
# The customer "Irina" ordered an item
:customer1 a :Customer ;
  :hasName "Irina".
:order1 a :Order ;
  :hasDVD [ a :Movie ] ;
  :orderedBy :customer1.
Listing 3.9: EXPRESS Example
The EXPRESS approach has the following advantages: it eliminates the need to describe services separately [7], it is less complicated than WSMO and OWL-S, and it uses an OWL ontology to provide “a description of a RESTful Semantic Service“ [6]. The disadvantages of the EXPRESS approach are: there is no implementation of the approach, automatic discovery and composition are not yet possible, and the integration of the semantic model into the resource-oriented architecture is not yet implemented [6].
3.2.9 Linked Open Services (LOS)

The LOS approach proposes a service description method that simplifies the access to Semantic Web Services for LOD specialists [63]. The inputs and outputs of the services are connected via links to Linked Data. The semantic description presents what kind of input and output RDF data a service can consume and produce, and “how a service invocation contributes to the knowledge of its consumers“ [63]. The approach focuses on data description, therefore services can be more easily integrated into service compositions [63]. LOS does not only follow the Linked Data principles, but also proposes “a list of further service-specific principles to be followed for openly exposing services over Linked Data“ [63]. These principles are:
• SPARQL graph patterns for the service description of input and output (including the specification of the data format);

• use of RESTful content negotiation;

• an explicit relation between outputs and inputs;

• optionally, SPARQL CONSTRUCT queries for lifting or mapping.
The approach proposes to transform non-RDF data to RDF data if a service does not accept non-RDF input. After the data processing, the returned non-RDF output data should be transformed back to RDF. The following code snippets show examples of production and consumption patterns for data that are accepted and returned by a service. The service gets information about actors and returns movies based on the names and birthdays of the actors. The client sends a request which contains information about an actor, namely the name and the birthday (c.f. Listing 3.10).
[a dbpedia:Person;
    dbpediaprop:name ?name; dbpediaprop:birthDay ?b]
Listing 3.10: Request to a service
The server sends a response in the form of a message after completing the request. The response contains information about movies (the title, the year, and the actor) (c.f. Listing 3.11).
[ moviedbbase:movie [
    moviedbbase:title ?title ;
    moviedbbase:year ?year ;
    dbpedia:actors ?actor ;
    dbpedia:name ?name; dbpedia:birthDay ?b]
]
Listing 3.11: Response of a service
The disadvantage is that LOS uses string values to represent graph patterns, e.g. “[a dbpedia:Person; dbpediaprop:name ?name]“. This reduces the quality of discovery and composition.
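To illustrate this limitation, the following hedged sketch shows an input pattern embedded as a plain string literal; the property name los:consumes is an assumption for illustration only. A SPARQL query over a service repository can only match such a pattern by string comparison, not by its graph structure:

:filmService los:consumes
    "[a dbpedia:Person; dbpediaprop:name ?name]"^^xsd:string .

In contrast, approaches that model inputs as RDF resources (e.g. Karma, c.f. Section 3.2.12) allow structural queries over the patterns themselves.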
3.2.10 Linked Data Services (LIDS)

LIDS focuses on the integration of Web Services and Linked Data by providing an interface [74]. LIDS follows the Linked Data principles, therefore a set of requirements is fulfilled: a URI of the input of a service is required to invoke this service; the “URI must return a description of the input entity, relating it to the service output data“; and the description has to be modelled according to the RDF standard [75]. The use of URIs as identifiers for input entities has the following advantages: the explicit link between input and output; the entities can be connected to different results; the representation of the result structure by a description; and the meaning of the data by means of an ontology. A Linked Data Service is interlinked with a Linked Data Endpoint. This makes it possible to enrich Linked Data automatically. Additionally, the LIDS approach supports Linked Data publication and the interlinking of Linked Data Endpoints with Linked Data Services.
Listing 3.12 presents the basic elements of a description. SPARQL constructs are used for adding the relation between data and a service. input represents specific input values and service parameters, endpoint is the URI of a Linked Data Endpoint that is used to construct service calls, and io-relation is the relation between input and output data.
CONSTRUCT { [io-relation] } FROM [endpoint]
WHERE { [input] }
Listing 3.12: LIDS Construct
Listing 3.13 presents a construct expression. The variable ?star will be found by the service. The variable ?movie is an input object of the service which has the properties dbpediaprop:title and dbpediaprop:year. The service receives the title and year attributes of a movie and, based on these attributes, finds and returns a list of stars.
CONSTRUCT { ?movie dbpediaprop:starring ?star }
FROM
Listing 3.14 shows the basic pattern of LIDS descriptions which can be added to an ontology, where LIDS is an instance of the Linked Service, ENDPOINT is an HTTP URI of the Linked Service, ENTITY is the name of the entity, INPUT and OUTPUT are graph patterns, and VARS are variables or input parameters.
@prefix lids:
    lids:required_vars VARS
].
Listing 3.14: LIDS basic pattern

Listing 3.15 presents an example of applying the LIDS approach. The example shows a “Movie find service“ which returns a set of stars based on the year and title of a movie.
:MovieFindService a lids:LIDS;
    lids:lids_description [
        lids:endpoint
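Since Listings 3.13–3.15 are truncated in this copy, the following hedged sketch illustrates how the slots of the basic pattern might be filled in. The property names lids:service_entity, lids:input_bgp, and lids:output_bgp are assumptions for illustration only, while lids:LIDS, lids:lids_description, lids:endpoint, and lids:required_vars appear in the pattern above:

:MovieFindService a lids:LIDS ;
    lids:lids_description [
        lids:endpoint <http://www.example.org/moviefind> ;  # assumed endpoint URI
        lids:service_entity "movie" ;                       # assumed property name
        lids:input_bgp  "?movie dbpediaprop:title ?title . ?movie dbpediaprop:year ?year" ;
        lids:output_bgp "?movie dbpediaprop:starring ?star" ;
        lids:required_vars "title year"
    ] .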
3.2.11 Data-Fu

The goal of this approach is the specification of data and services that process Linked Data from various data sources. Data-Fu is a “resource-driven programming approach leveraging the combination of REST with Linked Data“ [76]. The approach makes it possible to develop applications that access Semantic Web resources using a declarative rule language. It simplifies web application development by providing links to Linked Data and an interaction specification based on resource state.
Data-Fu follows three basic principles: the use of URIs for resource identification, the use of HTTP methods to access and process data, and the interlinking of resources. It also notes that Linked Data “does not distinguish explicitly between URI-identified objects and their representation“ [76]. The combination of Linked Data with REST brings the ability to manipulate data. Data-Fu provides a mechanism to define changes of resource states. Data-Fu includes two layers:

• Read/Write Linked Data Resources - the application of HTTP methods to Linked Data resources. The most important methods are GET, POST, OPTIONS, DELETE, and PUT. Data-Fu distinguishes safe and non-safe methods. The non-safe methods affect the state of a resource (e.g., the method DELETE, which deletes some datasets); the safe methods do not affect the state of the resources. “The dependency between communicated input and the resulting state of resources also needs to be described“ [76]. For example, the method PUT creates or overwrites a resource with the submitted input.
• REST Service Model - a formalized model for the description of interactions that are supported by RESTful services. It describes the influence of HTTP methods on the states of Linked Data resources and is represented by “a REST state transition system (RSTS)“ [76].
Both layers use RDF for the description of methods and resources. The Data-Fu technique also includes an interpreter, an engine which invokes the service interactions. The interactions are specified by Data-Fu rules. An advantage of the engine is its ability to process complex queries at the same time. After processing, the engine can store the data in different formats like JSON or RDF. Listing 3.16 presents a description of Linked Data services using the Data-Fu language. The first part describes the HTTP method GET, which returns a movie item. The second part describes a method POST, which adds additional information to the movie item (title, year, and a star).
GET (?mid, {})
<- {?mid rdf:type ex:MovieID}

POST (?d, {[] rdf:type ex:Description;
    ex:title ?t ;
    ex:year ?y ;
    ex:starring ex:Person. })
<- { ex:Movie ex:hasID ex:MovieID }.
Listing 3.16: Data-Fu Example
The approach focuses on applying HTTP methods to interact with and process Linked Data. The disadvantages of this approach are: it is not defined how to discover and compose services, and the querying mechanism is not defined.
3.2.12 Karma

Karma is a tool for data integration from various data sources and for the generation of semantically interlinked data. An ontology describes the APIs as well as the semantic relations between input and output data. The Karma approach suggests to “represent the semantics of Web APIs in terms of well known vocabularies, and to wrap these APIs so that they can consume RDF from the LOD cloud and produce RDF that links back to the LOD cloud“ [78]. Due to the semantic description of the APIs, the descriptions can be queried with SPARQL. The modelling includes two steps: the assignment of data to the semantic types of an ontology, and the identification of relationships between the data and the ontology.
The model of a linked API consists of two parts: the syntactic part, which provides the information (e.g., a URI, input parameters) required to invoke the service, and the semantic part, which describes the input and output data of a service and the relations between them [78]. Figure 3.6 represents the semantic model of this approach. Each service km:Service has inputs km:hasInput and outputs km:hasOutput that are linked to a model km:Model. The input and output models are defined with the Semantic Web Rule Language (SWRL)43. The model is linked to swrl:Atom instances: a swrl:ClassAtom, which describes an instance of a class
43http://www.w3.org/Submission/SWRL/
Figure 3.6: The ontology description of Web APIs. Source: [78]

and a swrl:IndividualPropertyAtom, which presents an instance of a property [78]. Instances of swrl:Variable are the data that the service gets as input or returns as output.
Consider, for example, how to describe with the Karma approach a service which receives an instance of the class Author as input data and returns an instance of the class Publication as output. The class atoms will be linked to instances of the classes Author and Publication. The individual property atom will have a relation to an rdf:Property, “theAuthorOf“. The variables will be the author name and the birth date. The service processes this data and returns the URIs of one or more publications. Listing 3.17 presents a snippet of a service description using the Karma approach.
@prefix :
km:hasModel :inputModel .
:in_title a km:Attribute ;
    km:hasName "title" ;
    hrests:isGroundedIn "p1"^^rdf:PlainLiteral .

km:hasModel :outputModel .
:out_actor_name a km:Attribute ;
    km:hasName "actor_name" .
...

:title_var a swrl:Variable .
:inputModel a km:Model ;
    km:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate dbpedia:title ;
          swrl:argument1 :title_var ] ;
...
Listing 3.17: Karma Example
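To make the author/publication example from above concrete, the following is a minimal hedged sketch using the Karma and SWRL terms introduced in this section; the class and property names ex:Author, ex:Publication, and ex:theAuthorOf are assumptions for illustration:

:AuthorPublicationService a km:Service ;
    km:hasInput  [ km:hasModel :authorModel ] ;
    km:hasOutput [ km:hasModel :publicationModel ] .

:author_var a swrl:Variable .        # bound to the input author
:publication_var a swrl:Variable .   # bound to the returned publications

:authorModel a km:Model ;
    km:hasAtom [ a swrl:ClassAtom ;
                 swrl:classPredicate ex:Author ;
                 swrl:argument1 :author_var ] .

:publicationModel a km:Model ;
    km:hasAtom [ a swrl:ClassAtom ;
                 swrl:classPredicate ex:Publication ;
                 swrl:argument1 :publication_var ] ,
               [ a swrl:IndividualPropertyAtom ;
                 swrl:propertyPredicate ex:theAuthorOf ;
                 swrl:argument1 :author_var ;
                 swrl:argument2 :publication_var ] .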
Another goal of the Karma approach is the automatic modelling and optimization of source models. This is based on a graph-based approach which was introduced by the Karma developers. The main focus is the problem of automatic semantic annotation. The approach increases “the quality of the automatically generated models by using the already modelled sources to learn the patterns that more likely represent the intended meaning of a new source“ [77]. There are many sources that provide similar semantically linked data. The task of the project is to use already existing resource models in order to derive a new one. Typically there are two steps in the
Figure 3.7: Graph-based approach by an example. Source: [78]
modelling process. The first step is the determination of semantic types. It means that each attribute should be “labelled with a class or a data property of the domain ontology“ [77]. For example, to invoke the service getEmployees it is required to provide the attributes “employer“ and “employee“. The domain ontology includes two classes: Person and Organization. As a result of this step, the attribute “employee“ will be labelled with the class “Person“, and the attribute “employer“ with the class “Organization“. The second step is the relationship definition, e.g. a person “worksFor“ an organization (c.f. Figure 3.7).
Once a graph is constructed, labelling the attributes with semantic types and searching for appropriate nodes can be performed with machine learning techniques. Next, the models are scored in order to find the one that matches the most coherent and frequent patterns, and a tree is built for candidate model generation. The last step is the generation of a ranking list from which the users can choose the correct model.
Additionally, the new version of Karma includes a direct mapping between data stored in relational databases and domain ontologies using the W3C’s R2RML44. This mapping language is introduced in the following section.
3.2.13 RDB to RDF Mapping Language (R2RML)

In order to make the semantic model more flexible, the mapping from relational databases to RDF is very important. Automatic mapping can increase the data volume that can be used for specific tasks. The suggestion is to use the RDB to RDF Mapping Language (R2RML) for automatic dataset generation in combination with the existing web service description models. This part of the master thesis is based on the W3C recommendation for R2RML [72].
R2RML is a language for the transformation of “relational database datasets to RDF datasets“. The language takes the database structure as input and returns the structure of a new RDF dataset. The transformation to an RDF graph occurs via SPARQL constructs. The target RDF vocabulary is composed of the names of the database elements, therefore it is not possible to change the RDF structure or vocabulary. Figure 3.8 presents the meta-model of R2RML. It includes the following elements: “TriplesMap, LogicalTable, PredicateObjectMap, GraphMap, SubjectMap, PredicateMap, ObjectMap, RefObjectMap and Join“. The input can be an SQL query to the database. The code below presents an example of an SQL query that selects data about movies (title and date) from a movie database.
[] rr:sqlQuery """
    SELECT ('Movie' || MOVIENO) AS MOVIEID
        , MOVIENO
        , TITEL
        , DATE
    FROM LW.MOVIE
    """;
   rr:sqlVersion rr:SQL2008.
Listing 3.18: R2RML
44http://www.w3.org/TR/r2rml/
The rules for the relational-dataset-to-RDF mapping are specified via a TriplesMap, which has exactly one logical table, one subject map, and zero or more predicate-object map properties. The logical table describes the set of data that has to be mapped to RDF.
namespace:
[]
    rr:logicalTable [ rr:tableName "MOVIE" ];
    rr:subjectMap [ rr:template
        "http://linkedwidget.org/moviedataset/{MOVIENO}" ];
    rr:predicateObjectMap [
        rr:predicate lw:titel;
        rr:objectMap [ rr:column "TITEL" ];
    ];
    rr:predicateObjectMap [
        rr:predicate lw:date;
        rr:objectMap [ rr:column "DATE" ];
    ].
Listing 3.19: R2RML
The subject map property describes the way the subjects are generated. It may reference one or more rr:class properties (c.f. Listing 3.20). The value of the property is an IRI.
input:  [] rr:template
            "http://linkedwidget.org/moviedataset/{MOVIENO}" ;
        rr:class lw:Movie.
output:
Figure 3.8: An overview of R2RML
The predicate-object map is “a function that creates one or more predicate-object pairs for each logical table row of a logical table“ [72]. The predicate-object map is linked to one or more predicate maps, and to one or more object maps or referencing object maps. The term map is “a function that generates an RDF term from a logical table row“ [72]. A term map can relate to the following RDF terms:
• a constant value (via rr:constant), represented by a resource;

• a column name (via rr:column), a valid SQL identifier;

• a string template (via rr:template), a format string for building strings from multiple components;

• rr:IRI, rr:BlankNode, rr:Literal (via rr:termType), which defines the type of an RDF term, namely an IRI, a blank node, or a literal;

• a language tag (via rr:language);

• an rdfs:Datatype (via rr:datatype);

• a string template (via rr:inverseExpression), for term map optimisation.
“A term map must be exactly one of the following: a constant-valued term map, a column-valued term map, a template-valued term map“ [72].
[] rr:predicateMap [ rr:constant rdf:type ];
   rr:objectMap [ rr:constant lw:Movie ].
?x rdf:type lw:Movie.

[] rr:objectMap [ rr:column "MOVIENO";
   rr:datatype xsd:positiveInteger ].
Listing 3.21: R2RML
Relations mapping. It is possible to add references between two instances instantiated from the database, for example, the relation between movies and the actors who have played in them. This is realized by adding a predicate-object map. The referencing object map references a triples map and a join condition via rr:parentTriplesMap and rr:joinCondition. The join condition has exactly one value of the property rr:child and one value of the property rr:parent. The following code presents the SQL query that results if the referencing object map has no join condition.
SELECT * FROM ({child-query}) AS tmp
Listing 3.22: R2RML
The second code snippet presents the SQL query that results if the referencing object map has at least one join condition.
SELECT * FROM ({child-query}) AS child,
    ({parent-query}) AS parent
WHERE child.{child-column1} = parent.{parent-column1}
  AND child.{child-column2} = parent.{parent-column2}
  AND ...

[] rr:predicateObjectMap [
    rr:predicate lw:movie;
    rr:objectMap [
        rr:parentTriplesMap <#TriplesMap2>;
        rr:joinCondition [
            rr:child "MOVIENO";
            rr:parent "MOVIENO";
        ];
    ];
].
Listing 3.23: R2RML
The result is a set of triples that connect the subject generated for each child row to the subject generated by <#TriplesMap2> for the matching parent row via the predicate lw:movie.
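A hedged illustration of one such resulting triple; both instance URIs are assumptions, since the subject map of <#TriplesMap2> is not shown in this copy:

# assumed subject URIs generated by the child map and by <#TriplesMap2>
<http://linkedwidget.org/moviedataset/1>
    lw:movie <http://linkedwidget.org/moviedataset2/1> .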
3.3 Summary
This chapter presented some of the existing mashup platforms (Yahoo!Pipes, DERI Pipes, BIO2RDF), the tools that are provided by LOD2, and semantic approaches for enhancing web services with additional semantic information. The first part of the analysis showed that these applications have a number of weak points:
• Most of the applications are not general, i.e. the focus is usually a specific problem. For example, BIO2RDF provides only Life Science Linked Data, LIMS is based on the estimation of similarity between instances [8], etc.
• The systems do not give users the possibility to develop new functions that can solve additional tasks.
• The mashup platforms are not described semantically, therefore composition and discovery are very difficult.
• For non-professional users it is often difficult to use these applications because specific knowledge is needed.
The second part of the analysis described the advantages and disadvantages of the semantic description approaches. The approaches can be categorized into the following types of service description approaches: approaches that focus on the technical aspects of web services, approaches that focus on the integration of web services and Linked Data, approaches that focus on the quality of the ontology models, and mappings that transform different types of data into RDF.
The main focus of the first group is the representation of the interaction between software components. Most of them do not describe explicit relations between input and output data. Additionally, the developer needs very good knowledge in this domain: the developer has to describe preconditions for the Web Service execution, postconditions, and effects, as well as very detailed rules and the choreography and orchestration of the service. In comparison to web services, widgets do not have such a wide variety of functionalities that must be described. Additionally, the mashup platform supports widget development. Knowledge workers often do not have enough practical experience in service-oriented architecture, therefore the mashup platform should provide automatic generation of the semantic widget descriptions. For these reasons, applying the approaches of the first group for widget description is not possible.
The second group of approaches focuses on the description of relations between Linked Data and Semantic Web Services. This is advantageous because widgets process Linked Data. But most of them have limitations; for example, LIDS and LOS “integrate data services with Linked Data by assigning a URI to each service invocation. The service URI is linked to resources in the Linked Data cloud and dereferencing the URI provides RDF information about the linked resources“ [78]. The input and output graphs are presented as strings, which limits widget discovery and composition. Additionally, it should be possible to query the semantic descriptions with SPARQL; Data-Fu and EXPRESS do not support easy service querying, therefore these approaches are not applicable for widgets. An approach which can support widget publishing, widget discovery, widget composition, and widget execution is Karma. An example of a widget description using the Karma ontology is provided in Chapter 4.
The third and fourth groups of approaches are not relevant at this stage of the mashup platform implementation. In the future, it may be possible to extend the semantic model with the relational-database-to-RDF mapping (R2RML) in order to increase the amount of datasets that can be processed by the mashup platform.
The benchmarking tables 3.1 and 3.2 summarize the features of the approaches that are described in this chapter:

• Possibility to publish service descriptions on the LOD cloud. Does the approach follow the Linked Data principles to make information available on the LOD Cloud?

• Discovery and composition based on input and output data. Does the approach support a description of input and output data based on which discovery and composition of the services is possible?

• Provenance information. Is it possible to define the origin of data sets?

• Description of relations between data. Does the approach support semantic relations between input and output data?

• Separation of presentation and data level.

• Complexity. How much time does the developer need to spend to become familiar with the approach?

• Possibility to discover the service using SPARQL. Is it possible to query the models?
| Feature | WSDL | SAWSDL | OWL-S | WSMO | WSMO-Lite | SA-REST |
|---|---|---|---|---|---|---|
| Goal | adding description of functionalities | adding semantic annotations to WSDL | description of web service functionality | semantic description of services | semantic description of services | adding semantic annotations to service descriptions |
| Method | description of service endpoints and their messages | extension of WSDL | description logic (OWL) for service description using an ontology | F-Logic logical expressions | an annotation mechanism for WSDL | adding the annotation in the service description |
| Possibility to publish service descriptions on the LOD cloud | no | no | no | no | no | no |
| Discovery and composition based on input and output data | no | the concepts from the semantic models are referenced from within WSDL components as annotations | via OWL-S process models (based only on input/output data, relations are ignored) | complicated, hard to implement | no | no |
| Provenance information | no | it is possible to extend the model | it is possible to extend the ontology | no | no | difficult |
| Description of relations between data | no | no | no | no | no | no |
| Separation of presentation and data level | yes | yes | yes | yes | yes | no |
| Complexity | yes | yes | yes | yes | yes | no |
| Possibility to discover the service using SPARQL queries | no | no, an extension is required [41] | provides a basic function for discovery; it is required to extend the model, e.g. [30] | provides a basic function for discovery; it is required to extend the model | no | no |

Table 3.1: Approaches comparison. Part 1

| Feature | RESTDesc | EXPRESS | LOS | LIDS | Data-Fu | Karma |
|---|---|---|---|---|---|---|
| Goal | adding semantic description of service functionalities | adding semantic annotations to services | semantic description of services that process Linked Data | semantic description of services that process Linked Data | semantic description of services that process Linked Data | integration of data from different sources |
| Method | describing preconditions and postconditions of the resources of a service | description of services with use of OWL | applying SPARQL constructs for service description | semantic description of services following the Linked Data principles | using a declarative rule language | providing semantic descriptions of data and APIs |
| Possibility to publish service descriptions on the LOD cloud | no | no | no | yes | no | yes |
| Discovery and composition based on input and output data | yes | no | difficult, because of using string values for graph patterns | difficult, because of using string values for graph patterns | no | yes |
| Provenance information | no | no | no | no | no | no |
| Description of relations between data | yes | yes | yes | yes | yes | yes |
| Separation of presentation and data level | yes | yes | yes | yes | yes | yes |
| Complexity | no | no | no | no | no | yes |
| Possibility to discover the service using SPARQL queries | no | no | no, because of using string values for graph patterns | no, because of using string values for graph patterns | no | yes |

Table 3.2: Approaches comparison. Part 2
CHAPTER 4 Solution
4.1 Definition of requirements
Figure 4.1 presents an example of a mashup. The mashup provides a combination of wired widgets, simple applications that provide some functionality for data processing or visualization, such as “Location“ and “Air Quality Filter“. The main components of a widget are input/output terminals and options. The input and output terminals are used to wire the widgets in order to process the data. Additionally, the widgets include options, i.e. input that influences the data processing, such as “Choose location type“, “Street“, and “Maximum distance“. The widgets can be categorized into the following types: data widgets which access data sources and retrieve data, processing widgets which process data that were retrieved from other widgets (e.g., “geo merger“), presentation widgets which visualize data sets in the form of diagrams, maps, etc., and user interaction widgets which are used to provide additional functionality, e.g. item selection.
Figure 4.1: Mashup example
A goal of this thesis is to develop a semantic model that will support publishing the widgets on the LOD Cloud, widget discovery, widget composition and execution, and the selection of the required input from the provided context information based on a semantic model. The previous two sections have given an overview of the principles of the Semantic Web and Linked Data, and of Semantic Web Service description. Based on these principles the following basic requirements and widget features can be specified:
1. Widgets and Mashups are identified via an identifier - a URI. User agents may dereference the widgets via these URIs. The user will have the possibility to share and publish information about unique widgets and mashups.
2. By dereferencing the widget URIs, the semantic model will be returned. Widgets have semantic models that describe what kind of data a widget can retrieve and process. This model will be returned.
3. The model should follow web standards (W3C recommendations). E.g., the use of Semantic Web standards for data description (RDF, PROV). “The use of standards enables the Web to transcend different technical architectures“ [36]. The use of standardized content formats makes it possible to process and publish data on the Web. RDF is used to present the data structure and enables the integration of information from multiple sources. Since the widget description is presented using RDF standards, it should be possible to discover and compose the widgets.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into one data space and to navigate over the data sets. For example, a link adds the relationship “owns“ between an owner and his/her pet. The mashup platform can find the URI of a widget that retrieves RDF data describing pet owners. By following the “owns“ links, the mashup platform can find widgets that can process data about their pets (a hedged sketch of such a link is given after this list).
5. A widget may have more than one semantic model, but all of them should generate the same output with an explicit relation to the input graph. An example is finding geographical coordinates based on different types of locations, such as parks, organisations, and libraries, that have different properties and can be consumed from various Linked Data Endpoints. For example, organisations can be consumed from the DBPedia endpoint and places can be consumed from Open Governmental Data like the US Governmental Data1 or similar Linked Data Endpoints. The output of the widget will be a set of points that are modelled according to the GeoNames Ontology2 and related to the location models.
1http://www.data.gov/ 2http://www.geonames.org/ontology/documentation.html
6. The input and output data should be interlinked; explicit relations between the data should be defined. Figure 4.2 presents an example of the widget “DBPedia Film Merger“. This widget can have different instances of the classes dbpedia:Person and dbpedia:Work as input and output. The explicit relations in this case are dbprop:starring and dbprop:directorOf.
7. The model should be general to support various types of widgets. E.g. data widgets, presentation widgets.
8. The semantic model should provide “an explicit representation of provenance information that is accessible to machines, not just to humans“ [85]. The semantic model should provide information about origin and ownership of datasets, change tracking, and access control that will increase people’s trust in data quality.
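The following hedged Turtle sketch illustrates requirement 4; all names (:alice, :rex, :owns) are assumptions for illustration:

:alice a foaf:Person ;
    :owns :rex .          # assumed relationship linking owner and pet

:rex a dbpedia:Dog .      # the platform can follow :owns to find
                          # widgets that process data about pets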
In the previous chapter, approaches for the semantic description of web services have been compared according to the features that are relevant for widgets, such as techniques for composition and discovery, the possibility to add relationships between input and output data, the possibility to publish services on the LOD cloud, provenance information, etc. The benchmarking tables show that the approach provided by Karma satisfies nearly all requirements for the semantic model. The following section provides an implementation of the semantic model based on this approach.
4.2 Use and Extension of Karma Approach
Figure 4.3 shows a semantic model for widget description based on the Karma approach. The model represents the semantics of widgets, including the relationships between input and output data, and it uses RDF so that the models can be queried using SPARQL [45]. The model
Figure 4.2: Widget & Semantic Model
Figure 4.3: Linked Widget Model based on Karma approach

has two kinds of properties, lw:hasInput and lw:hasOutput, that are linked to a model (property lw:hasModel). The SWRL vocabulary is used to define the input and output data. SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of the OWL Web Ontology Language with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language [38]. SWRL allows writing rules expressed in terms of OWL concepts.
A swrl:ClassAtom entity shows the membership of an instance to a class, a swrl:DatavaluedPropertyAtom presents an instance of a data property (e.g., an entity of the class dbpedia:Work has the property dbprop:hasTitle where the title is a string value), and a swrl:IndividualPropertyAtom entity describes an instance of a property.
Figure 4.2 shows a widget which finds films. The widget receives datasets that contain information about some stars and directors. The first terminal is used to wire the widget with a widget that returns datasets about stars. The second terminal is used to wire the widget with a widget that returns datasets about directors. The widget is identified by a URI, e.g. http://www.linkedwidgets.org/widget/w5, and has two inputs that are connected with the models mw5:starModel and mw5:directorModel and one output that is connected with the model mw5:filmModel. The models define the data that the widget processes and add relationships between these data with the SWRL vocabulary. In this example there are three models because the widget processes a set of stars and a set of directors in order to return a set of films. Each kind of instance needs a semantic description. The models are depicted in Figures 4.4, 4.5, and 4.6:

• The first picture presents a model of the first input - a set of stars. A star is an instance of the class dbpedia:Person that has the property dbprop:starring.

• The second picture presents a model of the second input - a set of directors. A director is an instance of the class dbpedia:Person that has the property dbprop:director.
• The third picture presents a model of the output - a set of films. The relationships between the star class and the film class, and between the director class and the film class, are described using the instances mw5:PropertyAtom1 and mw5:PropertyAtom2 of the class swrl:PropertyAtom.
Figure 4.4: The star model
The following code presents a part of a widget description that can be published on the LOD cloud.
@prefix mw5:
Figure 4.5: The director model
Figure 4.6: The film model
        swrl:argument1 mw5:star ;
        swrl:argument2 mw5:film ];
    lw:hasAtom
        [ a swrl:ClassAtom ;
          swrl:classPredicate ontology:Work;
          swrl:argument1 mw5:film ];
    ...
Listing 4.1: Widget Model represented formally
Moreover, the semantic model should support automatic widget matching and execution. Listing 4.2 shows a SPARQL query that searches for widgets that contain a specific kind of semantic relation, dbprop:starring.
SELECT ?widget ?name ?variable1 ?variable2
WHERE {
    ?widget lw:hasInput [ lw:hasModel
        [ lw:hasAtom [ swrl:propertyPredicate dbprop:starring;
                       swrl:argument1 ?variable1;
                       swrl:argument2 ?variable2 ] ] ].
    ?widget lw:hasOutput [ lw:hasModel
        [ lw:hasAtom [ swrl:propertyPredicate dbprop:starring;
                       swrl:argument1 ?variable1;
                       swrl:argument2 ?variable2 ] ] ].
    ?widget lw:hasName ?name.
}
Listing 4.2: SPARQL query
Figure 4.7 shows the result of the query.
Figure 4.7: Results
The second SPARQL query searches for a widget which can produce a set of films. DBPedia does not have a special class for films. For film definition the class ontology:Work with the relation dbprop:starring is used.
SELECT ?widget ?name ?variable
WHERE {
    ?widget lw:hasOutput [ lw:hasModel
        [ lw:hasAtom [ swrl:classPredicate ontology:Work;
                       swrl:argument1 ?variable ],
                     [ swrl:propertyPredicate dbprop:starring;
                       swrl:argument1 ?variable2;
                       swrl:argument2 ?variable ] ] ].
    ?widget lw:hasName ?name.
}
Figure 4.8 shows the result of the query.
Figure 4.8: Results
The third SPARQL query finds data similar to the data that a widget processes. These can be links to internal resources or links to external data resources, for example to an equivalent entity. In this case, the property owl:sameAs is often used. It means “that two URI references actually refer to the same thing: the individuals have the same identity“ [11]. For example, the entity http://dbpedia.org/page/Angelina_Jolie has owl:sameAs links to http://de.dbpedia.org/resource/Angelina_Jolie in the German DBPedia and to freebase:Angelina Jolie from Freebase3.
SELECT ?widget ?name ?variable
WHERE {
    ?widget lw:hasInput [ lw:hasModel [ lw:hasAtom
        [ swrl:propertyPredicate owl:sameAs;
          swrl:argument1 ?variable ] ] ].
    ?widget lw:hasName ?name.
}
Figure 4.9 shows the result of the query. Due to the fact that the mashup platform may support widget development, it is very hard to develop a user interface which is clear for end users and supports this kind of model. Widget discovery can also be problematic because creating the queries is difficult for end users. Additionally, the model should be easily extensible because of the substantial growth in the number of available widgets, which can have more complex features.
3http://www.freebase.com/
Figure 4.9: Results
4.3 Widget Model
Due to the fact that it is difficult to apply the web service description approaches directly, a dedicated semantic model for widget description had to be implemented. Figure 4.10 depicts this model. According to this description model, each widget has three types of models - input model, output model, and model - that can be connected via three types of relationships (lw:hasInputModel, lw:hasOutputModel, and lw:hasModel) to an instance of a widget. The models contain specific kinds of semantic relations. An instance of the class lw:InModel has a direct link to Linked Data instances via the property lw:hasInNode and describes the kind of semantic relation which a widget has as input. An instance of the class lw:OutModel has a direct link to Linked Data instances via the property lw:hasOutNode and describes the kind of semantic relation which a widget has as output. The class lw:Model has the property lw:hasNode and describes the full semantic model which is processed by a widget.
Listing 4.3 presents a use case. A widget might have more than one input model, while the output model is unique. The widget receives instances of dbpedia:Person and returns an instance of the class dbpedia:Work. A work can be found either by providing the name of a director or the name of a star. There are two kinds of persons in the widget model, namely star and director. dbpedia:starring is the relation between the input and output models that shows that the person is a star, and dbpedia:director is the relation between the input and output models that shows that the person is a director.
@prefix :
Figure 4.10: Widget Model
    lw:hasInNode :star;
    lw:hasInNode :nameStringObject1;
    lw:hasInNode :nameStringObject2;
    lw:hasInNode :nameDateObject.

:inM2 a lw:InModel;
    lw:hasInNode :director;
    lw:hasInNode :nameStringObject1;
    lw:hasInNode :nameStringObject2;
    lw:hasInNode :nameDateObject.

:outM a lw:OutModel;
    lw:hasOutNode :star;
    lw:hasOutNode :director.

:m a lw:Model;
    lw:hasNode :star;
    lw:hasNode :director.

:Widget a lw:Widget;
    lw:hasName "Movie Agent Widget";
    lw:hasInputModel :inM1;
    lw:hasInputModel :inM2;
    lw:hasOutputModel :outM;
    lw:hasModel :m.
Listing 4.3: A Widget Model
The code above shows a semantic description using the semantic model which was introduced in this section. Figure 4.11 presents this semantic description in graphical notation. This semantic model has the following advantages:
• it supports a more natural way of widget description,

• it is easily extensible,

• the direct interlinking of widget models with Linked Data provides a clear definition of the semantic relations,

• the semantic repository of widgets can be queried in a clear and efficient way to find the appropriate widgets.
Moreover, the semantic model follows the Semantic Web standards, and the direct interlinking to Linked Data supports better querying of the Linked Data sets. The model lw:Model defines the explicit relations between input and output data. Therefore the input and output models can be used to create the appropriate query for finding specific kinds of semantic relations, extracting required data from Linked Data datasets, searching for widgets that can consume a specific dataset or produce the required output data, or selecting the required input from the provided context data. Querying examples are provided in Chapter 5.
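As a brief preview, a minimal hedged query against such a widget repository could look as follows; the prefix bindings and the typing of the input node as dbpedia:Person are assumptions:

PREFIX lw:      <http://linkedwidgets.org/ontology/>   # assumed namespace
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?widget ?name
WHERE {
    ?widget a lw:Widget ;
            lw:hasName ?name ;
            lw:hasInputModel [ lw:hasInNode [ a dbpedia:Person ] ] .
}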
Figure 4.11: Widget Model
4.4 DCAT
A requirement for Linked Widgets is to provide information about the origin and ownership of datasets and to increase the interoperability between widgets. A possibility to provide these additional features is to model them according to the Data Catalogue Vocabulary4 (DCAT). The W3C defines the vocabulary as “a RDF vocabulary that has been designed to facilitate interoperability between data catalogs published on the Web“ [55]. Figure 4.12 depicts the use of the DCAT vocabulary, which has been adapted to widget description. For the semantic model of the widgets the following namespaces are used:
| Prefix | Namespace IRI | Description |
|---|---|---|
| dcat | http://www.w3.org/ns/dcat# | DCAT is “an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web“ [55]. |
| dct | http://purl.org/dc/terms/ | Dublin Core Schema |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | c.f. Chapter 2 |
| rdfs | http://www.w3.org/2000/01/rdf-schema# | c.f. Chapter 2 |
| foaf | http://xmlns.com/foaf/0.1/ | “FOAF is a project devoted to linking people and information using the Web. Regardless of whether information is in people’s heads, in physical or digital documents, or in the form of factual data, it can be linked“ [22] |
| skos | http://www.w3.org/2004/02/skos/core# | SKOS is “a common data model for sharing and linking knowledge organization systems via the Semantic Web“ [59] |
| vcard | http://www.w3.org/2006/vcard/ns# | VCard is a vocabulary designed for the description of organisations and people. |

Table 4.1: Prefixes and Namespaces
The semantic model includes the following properties and classes:
• dct:title - a name given to the widget;
• dct:description - description of the widget;
4http://www.w3.org/TR/vocab-dcat/
Figure 4.12: Extension of the Semantic Widget Model with DCAT
• dct:issued - date of the formal issuance of the widget;

• dct:modified - most recent date on which the widget (in general) was changed, updated, or modified;

• dct:language - language;

• dcat:keyword - keywords or tags describing the widget;

• dcat:contactPoint - link to contact information which is provided using the VCard vocabulary;

• dct:temporal - the temporal period that the dataset covers (for data cubes);

• dct:publisher - an entity responsible for widget creation and publishing, link to foaf:Agent (persons, organizations, or groups of any kind);

• dcat:theme - the main topic of the widget;

• skos:Concept - a category or theme used for describing, categorizing, and organising datasets;

• skos:ConceptScheme - the knowledge organization system used to represent concepts of widgets.

Listing 4.4 demonstrates an example of applying the DCAT vocabulary to the Linked Widgets. The widget has the title “Movie Widget“ and a relationship to media themes (ex1:media is an instance of skos:Concept).
:Widget a lw:Widget ;
    rdfs:label "Widget 1"^^xsd:string ;
    lw:hasInModel :inM1, :inM2 ;
    lw:hasModel :m ;
    lw:hasOutModel :outM ;
    lw:name "Movie Widget"^^xsd:string ;
    dct:title "Search for movie"^^xsd:string ;
    dct:description "The widget searches for movies
        based on actors and directors name";
    dct:issued "2014-01-10"^^xsd:date;
    dct:modified "2014-01-12"^^xsd:date;
    dcat:keyword "movie, film, actor, star"^^xsd:string ;
    dct:publisher :tuvienna ;
    dcat:theme :media.

:tuvienna a org:Organization , foaf:Agent;
    rdfs:label "University of Technology Vienna" .
Listing 4.4: DCAT example
Listing 4.5 shows a SPARQL example for searching widgets that have “media“ as the main theme and have been issued on 2014-01-10 by :tuvienna.
SELECT ?w ?title
WHERE {
    ?w a lw:Widget;
        dct:title ?title;
        dcat:theme :media;
        dct:issued "2014-01-10"^^xsd:date;
        dct:publisher :tuvienna .
}
Listing 4.5: SPARQL example
DCAT gives the following benefits for mashups: it increases findability, enables the description of data that are located in different Linked Data endpoints, and provides a better search for widgets.
4.5 Provenance
An important requirement for Linked Widgets is to extend the widget description by adding provenance information. The W3C Provenance Incubator Group defines provenance as “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing“ [35]. In other words, the meta-data may include information about:
• the creator of the data (author, reviewer, etc.);
• versions of data sets (the data change often);

• the data sources of the information; in the case of data integration it is necessary to describe which part comes from which data set;

• descriptions of rules, vocabularies, ontologies, etc.
“The Provenance Family of Documents (PROV) defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. The goal of PROV is to enable the wide publication and interchange of provenance on the Web and other information systems. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core“ [35]. There is a set of 12 documents that the W3C group defined for adding provenance: PROV-OVERVIEW5, PROV-PRIMER6, PROV-DM7, PROV-N8, etc. Figure 4.13 shows the organisation of the PROV documents. The colors in the figure define on which category of user the documents are focused:
• light blue is for users (understanding and supporting provenance);

• blue is for developers (creating and consuming provenance);

• pink is for advanced users (creating new PROV serializations or other applications based on provenance).
The common vocabulary is defined by the conceptual data model (PROV-DM). Users and developers use the set of constraints (PROV-Constraints9) for constructing valid provenance expressions. The formal semantics (a declarative specification) is defined by PROV-SEM10. Furthermore, the developers use provenance access (PROV-AQ11), the linking of provenance information (PROV-Links12), dictionary-style collections (PROV-Dictionary13), and the Dublin Core vocabulary (PROV-DC). The approach suggests the use of the PROV ontology [50] (PROV-O, a standard lightweight vocabulary) for adding meta-information about the provenance of information. The W3C Provenance Incubator Group describes PROV-O as “an OWL2 ontology allowing the mapping of the PROV data model to RDF“. The PROV ontology includes a set of classes, properties, and restrictions for the representation of the information. Table 4.2 demonstrates the namespaces which are used by PROV-O. The three basic classes of PROV-O are:
5http://www.w3.org/TR/prov-overview/ 6http://www.w3.org/TR/prov-primer/ 7http://www.w3.org/TR/prov-dm/ 8http://www.w3.org/TR/prov-n/ 9http://www.w3.org/TR/prov-constraints/ 10http://www.w3.org/TR/prov-sem/ 11http://www.w3.org/TR/prov-aq/ 12http://www.w3.org/TR/prov-links/ 13http://www.w3.org/TR/prov-dictionary/
Figure 4.13: PROV documents. Source: [35]
| Prefix | Namespace IRI |
|---|---|
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| xsd | http://www.w3.org/2001/XMLSchema# |
| owl | http://www.w3.org/2002/07/owl# |
| prov | http://www.w3.org/ns/prov# |

Table 4.2: Prefixes and Namespaces
• A prov:Entity is a kind of thing with some fixed aspects (real or imaginary).
• A prov:Activity is an event that happens over a period of time with entities (e.g. include consuming, transforming, using, etc.).
• A prov:Agent is something that bears responsibility for an activity.
The relations between the classes entity, agent, and activity are shown in Figure 4.14. The properties prov:startedAtTime and prov:endedAtTime show the start and end times of activities. The entities can be used and generated by activities (via the properties prov:used and prov:wasGeneratedBy). Additionally, some dependency information between activities can be provided via prov:wasInformedBy. This provides “some dependency information without explicitly providing the activities’ start and end times“ [50]. For example, the activity “creationCollection“ calls an additional activity “aggregationByTopic“ to subscribe a widget to a theme.
@prefix prov:
    prov:wasInformedBy :subscribeActivity.

:subscribeActivity
    a prov:Activity;
    prov:wasInfluencedBy :aggregationByTopic;
        # aggregation of widgets by topics
    prov:wasAssociatedWith :irina.

:irina a prov:Agent.

:aggregationByTopic a prov:Activity .
Listing 4.6: PROV-O
The property prov:wasDerivedFrom is used for the definition of provenance chains (the transformation of one entity into another). For example, a new dataset can be the result of filtering another dataset. “Arbitrary RDF properties can be used to describe the fixed aspects of an Entity that are interesting within a particular application“ [50] (e.g., the format of the dataset). The responsibilities of an agent can be shown via prov:wasAssociatedWith and prov:wasAttributedTo. The property prov:actedOnBehalfOf describes an agent’s responsibility for another agent that relates to the influenced Activity or Entity. The following code presents a part of a description with use of DCAT.
:movieWidget a lw:Widget ;
    dct:title "Search for films" ;
    dct:creator :irina ;
    dct:contributor :peter ;
    dct:created "2013-12-01" ;
    dcat:theme :Media .
...
Figure 4.14: Relation between three basic classes
Figure 4.15: Relation between the basic classes
:irina a dct:Agent.

:tuvienna a dct:Agent.

:Media a skos:Concept;
    dct:creator :irina.
...
Listing 4.7: A part of a Widget Description
Figure 4.15 represents the transformation from the semantic description modelled following the DCAT vocabulary to the semantic description modelled with PROV-O. In this case, the entity is :movieWidget (an instance of the class lw:Widget). There are two agents that are responsible for the action affecting the entity :movieWidget, namely :irina and :peter. The activity is :creatingTheWidget, which describes how the entity, an instance of lw:Widget, has been created or changed. The properties prov:startedAtTime and prov:endedAtTime describe the date of the first creation of the widget and the date of the last change. The ontology described above can be extended via additional terms (c.f. Figure 4.16). These additions can be divided into five categories [50]:
1. The class prov:Agent has three subclasses: prov:Person for people; prov:Organization for companies, social institutions, societies, etc.; and prov:SoftwareAgent for running software. The class prov:Entity divides into: prov:Collection
Figure 4.16: The extended terms
that provides structure to some Entities; prov:Bundle - a set of provenance descriptions; and prov:Plan - a set of actions.
2. The property prov:specializationOf presents “an entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter“ [50]. Alternate entities can be presented using the prov:alternateOf property.
3. The property prov:atLocation defines a prov:Location for the Entities.
4. The lifetime of entities that are generated by an activity and used by other activities is defined by prov:invalidatedAtTime, prov:wasInvalidatedBy, etc.

5. The lifetime of an activity is the time between its start and end times.
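The following hedged Turtle sketch illustrates some of these additional terms; the instance names (:movieWidgetV2, :movieWidgetBeta, :replaceActivity) are assumptions for illustration:

:movieWidgetV2 a prov:Entity ;
    prov:specializationOf :movieWidget ;   # a more specific aspect of the same widget
    prov:alternateOf :movieWidgetBeta .    # an alternate presentation of the widget

:movieWidget a prov:Entity ;
    prov:invalidatedAtTime "2014-03-01T00:00:00Z"^^xsd:dateTime ;
    prov:wasInvalidatedBy :replaceActivity .   # end of this entity's lifetime

:replaceActivity a prov:Activity .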
Figure 4.17 and the following code provide an example of using the additional terms (three types of agents: person, organization, and software).
@prefix xsd:
<>
    a prov:Bundle, prov:Entity;
    prov:wasAttributedTo :postEditor;
    prov:generatedAtTime "2011-07-16T02:52:02Z"^^xsd:dateTime;
    .

:irina
    a prov:Person, prov:Agent;
    ## prov:Agent is inferred from prov:Person
    foaf:givenName "Irina";
    foaf:mbox
Figure 4.17: Relation between the basic classes
CHAPTER 5 Results and Evaluation
5.1 Resulting Semantic Model
Figure 5.1 presents the semantic model based on the semantic widget model, DCAT, and PROV-O that were described in the previous chapter. The model describes a possible way to bring the ontologies together in order to satisfy the requirements for the mashup system. It includes the most important classes that cover all requirements. If additional classes or properties are needed, it is possible to extend the semantic model. The semantic model has been extended with the following properties and classes:
• dcat:theme - for widget classification. E.g. media, actors, science, etc.
• dct:publisher - for providing information about the creators of widgets. There are three types of possible agents: foaf:Person defines a person who created a widget, foaf:Group defines a group of creators or an institution to which the creators belong, and foaf:Software defines software, e.g. an editor or a mashup creator. The DCAT vocabulary “makes extensive use of terms from other vocabularies“ [55], e.g. Dublin Core1.
• prov:wasGeneratedBy - for providing an activity that has an influence on the state of the widget (e.g. creation, changing, etc.).
5.2 Semantic Model Use cases
In this section, use cases are addressed by semantic widgets which follow the semantic model presented in this chapter. The semantic model description use cases are divided into the following categories:
1http://dublincore.org/documents/dcmi-terms/
Figure 5.1: Widget Model
• Publishing the Linked Widget information on Linked Open Data Cloud. The detailed description of widgets.
• Discovery: finding widgets that contain a specific kind of semantic relation. E.g. all widgets that contain property dbprop:livesIn.
• Composition: finding the matching widget that can consume a specific dataset or produce the required output data. E.g. all widgets that have instances of class dbpedia:Person from DBPedia.
• Smart data consumption based on semantic model: semantic model is used to select the required input from the provided context data.
5.2.1 Publishing examples

Figure 5.2 presents a set of widgets that are needed to search for a set of films. The widget “DBPedia Film Agent Search“ gives the possibility to find either actors or directors based on the following properties: name, birthplace, and year of birth. This widget is presented by the instance w:widget of the class lw:Widget, which has the input models w:inM1 and w:inM2, a model w:m, and an output model w:outM. The models are connected with the instances of the DBPedia class dbpedia:Person via the properties lw:hasInputNode, lw:hasNode, and lw:hasOutputNode. The instances of the class dbpedia:Person are w:star and w:director, which are differentiated with the help of the DBPedia properties dbpedia:starring and dbpedia:director. The model w:m includes all properties and all classes that are needed
to depict all relationships between the inputs and outputs of the widget. The output model is connected to stars and directors, which are instances of the same DBPedia class dbpedia:Person. Furthermore, a person can be star and director at the same time and thus have both properties dbpedia:starring and dbpedia:director. Figures 5.3 and 5.4 present a part of the semantic model of the widget “DBPedia Film Agent Search“, in graphical notation and in Turtle.
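Since Figure 5.4 renders the Turtle description only as a screenshot, the following is a hedged partial transcription; the namespace prefixes and the exact node typing are assumptions based on the description above:

w:widget a lw:Widget ;
    lw:hasName "DBPedia Film Agent Search" ;   # assumed literal
    lw:hasInputModel w:inM1 , w:inM2 ;
    lw:hasOutputModel w:outM ;
    lw:hasModel w:m .

w:inM1 a lw:InModel ;
    lw:hasInputNode w:star .

w:inM2 a lw:InModel ;
    lw:hasInputNode w:director .

w:star a dbpedia:Person .      # differentiated via dbpedia:starring
w:director a dbpedia:Person .  # differentiated via dbpedia:director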
Figure 5.2: Widget “DBPedia Film Agent Search“
Figure 5.5 presents the widget “Google Maps“, which receives a list of coordinates (longitude and latitude) and shows the points on a map. This widget is presented by the instance w2:Widget of the class lw:Widget, which has an input model w2:inM1 and a model w2:m. The models
Figure 5.3: Semantic Model of Widget “DBPedia Film Agent Search“ in graphical notation
Figure 5.4: Semantic Model of Widget “DBPedia Film Agent Search“ in TopBraid Composer

are connected with the instances of the Geonames ontology2 class gn:Feature via the properties lw:hasInputNode and lw:hasNode. The instance of the class gn:Feature is w2:feature, which has the properties wgs84_pos:lat and wgs84_pos:long. Figure 5.6 presents a part of the semantic model of the widget “Google Maps“.
5.2.2 Discovery examples

The second goal of the semantic description is searching for widgets. This can be implemented using SPARQL queries.

Discovery example 1

The first SPARQL query (c.f. Figure 5.7) finds widgets that contain the property dbpedia:starring in the widget description models. The property defines a relationship between the two DBPedia classes dbpedia:Person and dbpedia:Work.
2http://www.geonames.org/ontology/documentation.html
Figure 5.5: Widget “Google Maps“
The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget, ?name - the name of the widget, ?publisher - a publisher of the widget, an instance of the class foaf:Agent, ?n - a node that is connected to an instance which has property dbpedia:starring.
• The “WHERE clause provides the basic graph pattern to match against the data graph“ [73]. The basic graph pattern includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget, ?w lw:hasName ?name - finding the names of the widgets, ?w dcterms:publisher ?publisher - finding the publishers of the widgets, ?w lw:hasModel ?m - finding the models of the widgets, ?m lw:hasNode ?n - finding the nodes of the models, ?n dbpedia:starring ?ins - finding the property dbpedia:starring.

Figure 5.8 shows the main classes and properties that are included in the SPARQL query.
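Since the query itself is shown only as a figure, a hedged textual sketch of it (prefix bindings assumed) could read:

SELECT ?w ?name ?publisher ?n
WHERE {
    ?w rdf:type lw:Widget ;
       lw:hasName ?name ;
       dcterms:publisher ?publisher ;
       lw:hasModel ?m .
    ?m lw:hasNode ?n .
    ?n dbpedia:starring ?ins .
}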
Figure 5.9 presents the search for widgets that contain the owl:sameAs property, which shows that two things with different URIs are the same thing. The instance “Angelina Jolie“ of the class Actor from the Linked Movie Database is the same as the instance “Angelina Jolie“ of the class Person from DBPedia. The SPARQL query includes two clauses:
• The “SELECT clause identifies the variables to appear in the query results“ [73]: ?w - an instance of the class lw:Widget, ?name - the name of the widget, ?class - a class of the instance.
• The basic graph pattern of the WHERE clause includes the following triples: ?w rdf:type lw:Widget - finding an instance of the class lw:Widget, ?w lw:hasName ?name - finding the names of widgets, ?w lw:hasModel ?m - finding the models of widgets, ?m lw:hasNode ?n - finding the nodes of models, ?n owl:sameAs ?x - finding the widget nodes that have the property owl:sameAs, ?x rdf:type ?class - finding the class of the instance. A reconstruction of the query is sketched below.

Figure 5.6: Semantic Model of widget “Google Maps“

Figure 5.7: Finding widgets that contain property “starring“ in semantic model

Figure 5.8: SPARQL query steps
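A sketch of this query, assembled from the triples above (the original is shown in Figure 5.9); the lw: prefix URI is an assumption as before.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX lw:  <http://linkedwidgets.org/ontologies#>

SELECT ?w ?name ?class
WHERE {
  ?w rdf:type lw:Widget .
  ?w lw:hasName ?name .
  ?w lw:hasModel ?m .
  ?m lw:hasNode ?n .
  ?n owl:sameAs ?x .    # nodes linked to equivalent external resources
  ?x rdf:type ?class .  # the class of the linked instance
}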
Figure 5.9: Search for widgets that contain the “owl:sameAs“ property
5.2.3 Composition examples

The following SPARQL query (c.f. Figure 5.10) finds widgets that produce geo data for map visualization. The location is defined with the use of the Geonames ontology class gn:Feature, which has the properties geo:lat and geo:long. A part of the widget description is provided in Listing 5.1. This widget returns a set of locations (longitude and latitude). The goal is to provide a mechanism for the automatic generation of SPARQL queries. In this case, the query will search for widgets that can be wired with the output of this widget.
Figure 5.10: Search for widgets that produce geo data

Listing 5.1: Excerpt of the widget description (Turtle serialization)
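Only a prefix declaration of Listing 5.1 survives in this copy; a minimal sketch of what the excerpt could look like, assuming a hypothetical instance namespace w3: and the model structure described above:

@prefix lw:      <http://linkedwidgets.org/ontologies#> .
@prefix gn:      <http://www.geonames.org/ontology#> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix dbpedia: <http://dbpedia.org/ontology/> .          # declared in the original listing; URI assumed
@prefix w3:      <http://example.org/widgets/geoSource#> . # hypothetical namespace

w3:widget a lw:Widget ;
    lw:hasOutputModel w3:outM .

# The output model exposes a Geonames feature with coordinates.
w3:outM lw:hasOutputNode w3:feature .
w3:feature a gn:Feature ;
    geo:lat  w3:latValue ;   # hypothetical value nodes
    geo:long w3:longValue .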
Figure 5.11: Generation of SPARQL queries
Figure 5.11 presents the automatic generation of SPARQL queries from a widget description. The arrows in the picture show the transformation of the output model into the terms of the SPARQL query. For example, the property lw:hasOutputModel is reversed to the term lw:hasInputModel, and the property lw:hasOutputNode is reversed to lw:hasInputNode. A sketch of such a generated query is given below. Figure 5.12 shows the result of the SPARQL query.
Figure 5.12: SPARQL query and result in TopBraid Composer
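Applying the reversal described above to the output model of the geo widget, the generated query could look like the following sketch (prefix URIs are assumptions as before).

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX lw:  <http://linkedwidgets.org/ontologies#>
PREFIX gn:  <http://www.geonames.org/ontology#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?w ?name
WHERE {
  ?w rdf:type lw:Widget .
  ?w lw:hasName ?name .
  ?w lw:hasInputModel ?m .  # reversed from lw:hasOutputModel
  ?m lw:hasInputNode ?n .   # reversed from lw:hasOutputNode
  ?n rdf:type gn:Feature .  # node type expected by the consuming widget
  ?n geo:lat ?lat .
  ?n geo:long ?long .
}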
5.2.4 Smart Data Consumption

Figure 5.13 demonstrates the selection of input data for the widget “Google Map“. The widget gets its data flow from three widgets: the “Location“ widget, which returns the locations of libraries, the “City Byke“ widget, which returns the locations of city bike stations, and the “Geo Merger“ widget. The “Geo Merger“ widget processes this data according to defined options. The result includes a set
of locations that are instances of the class gn:Point with the properties latitude (geo:lat), longitude (geo:long), and :address. Based on this model, the application knows which kind of data is required for the input of this widget.
Figure 5.13: Smart consumption example
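As an illustration, one merged location record conforming to this model could look as follows; the ex: prefix, the address property URI, and all literal values are hypothetical.

@prefix gn:  <http://www.geonames.org/ontology#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ex:  <http://example.org/data#> . # hypothetical namespace

ex:point1 a gn:Point ;
    geo:lat    "48.1987" ;                   # example coordinates
    geo:long   "16.3695" ;
    ex:address "Karlsplatz 13, 1040 Wien" .  # stands in for the :address property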
5.3 Result evaluation
In Chapter 4 a list of requirements has been defined. The purpose of this section is to evaluate the resulting model based on the fulfilment of these requirements:
1. Widgets and Mashups are identified via an identifier - a URI. Mashups and widgets have to be stored in the Widget Repository and are identified by their URIs. Users have the possibility to define the URI.
2. By dereferencing the widget URIs, the semantic model will be returned. It is possible to find a widget which is identified by a URI, and its semantic model can be returned.
3. The model should follow web standards (W3C recommendations). The data are described with the use of the XML-serialized RDF format, which makes it possible to define the structure of the data and to publish them into the LOD cloud. The provenance of information is described with the use of the web standards DCAT and PROV-O. Search is provided by using SPARQL queries. Some examples are given in this chapter.
4. The semantic model should support adding links to other Linked Data sources. These links allow the mashup platform to connect distributed data into a data space and to navigate over the data sets. Due to the fact that widget models are usually connected to original Linked Data, it is possible to define relations to external Linked Data endpoints.
5. A widget may have more than one semantic model, but all should generate the same output with an explicit relation to the input graph. It is possible to create more than one model that generates the same output.
6. The input and output data should be interlinked, and explicit relations between data should be defined. There are three different types of widget models: input model, output model, and model. This gives a flexible mechanism to define the full data model and all connections between input and output data.
7. The model should be general enough to support various types of widgets, e.g. data widgets and presentation widgets. At the current stage of implementation the semantic model supports all existing types of widgets.
8. The semantic model should provide “an explicit representation of provenance information that is accessible to machines, not just to humans“ [85]. The provenance of data is provided by applying the PROV-O and DCAT ontologies. These ontologies allow defining information about the author, date of creation, versions, etc.; a sketch of such a description is given below.
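A minimal sketch of such provenance statements for a widget, reusing the hypothetical w: instance namespace from Section 5.2.1; the agent name and the date are placeholder values.

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix w:       <http://example.org/widgets/filmAgent#> . # hypothetical namespace

w:widget dcterms:publisher w:publisher ;     # DCTERMS/DCAT publisher information
    dcterms:created "2014-04-23"^^xsd:date ; # placeholder creation date
    prov:wasAttributedTo w:publisher .       # PROV-O attribution

w:publisher a foaf:Agent , prov:Agent ;
    foaf:name "Linked Widgets Lab" .         # placeholder agent name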
The requirements are fulfilled. The semantic model follows Linked Data and Semantic Web principles. With the use of the model, the widgets can be published into the LOD cloud. The data are described with the use of W3C standards, which makes the data machine-readable.
CHAPTER 6 Conclusion and Future Work
This chapter summarizes the research work and research results, indicates research limitations, and provides advice for future work.
6.1 Research Summary
The main questions of this research work were to determine whether semantic service description languages can be applied to widget description, to define requirements for the semantic model, to implement the semantic model according to the defined requirements, and to integrate the model into a prototype mashup environment. The first challenge of this work was to define what kind of basic semantic concepts and principles the mashup platform should be based on. Therefore, the second part of this master thesis introduces the Web of Data as well as software technologies such as mashups and Web Services. A set of requirements was derived based on the concepts of the Semantic Web. Another very important part is the extensive analysis and comparison of existing mashup platforms and semantic web service description techniques. The analysis was divided into two parts:
• The first part covers the analysis of existing mashup platforms in order to define which factors can increase usability.
• The second part surveys semantic web service description techniques, their advantages and disadvantages, and evaluates the possibility of applying these concepts to the mashup platform.
The result of the analysis shows that there is no directly applicable web service description approach for the proposed system. Even though Karma seems to be the most suitable approach, it still poses barriers regarding model implementation, because mashup development based on the Karma approach is very complex.
Figure 6.1: Widget recommendation in Mashup Platform
The main goals were the definition of requirements and the implementation of the widget model. These goals were achieved successfully. This includes a set of requirements for the Linked Widget Model that are derived from Semantic Web and Linked Data principles, and the parts of the resulting semantic model that are represented by DCAT, information provenance, and the semantic widget model. Additionally, the Karma approach is discussed as an alternative to the developed model. The Karma-based widget description shows that this method does not satisfy all requirements, such as the possibility to include explicit descriptions of relations for the semantic model (the Karma approach uses the SWRL vocabulary), and therefore it would have to be extended. The extension of the model, however, can provoke problems for widget discovery and widget composition.
The main result of this research is the semantic model, which can be integrated into a prototype mashup environment. The resulting semantic model follows Linked Data principles. This enables publishing widget descriptions into the LOD cloud. Widget matching and composition can be provided by the use of SPARQL queries. The model contains the required meta-data for defining the origin of data. Figure 6.1 shows a new feature of the developed mashup platform, which has been available since the implementation of the semantic model. By clicking on an output terminal, the user gets a list of widgets that can be wired with the widget on the workspace. The suggested widgets appear in the bottom left corner of the mashup platform.
6.2 Research Limitations
The following limitations are noted:
• Due to the fact that the mashup platform is at a very early stage of implementation, only a limited number of use cases can be provided. A substantial growth in the number of available widgets will give rise to widgets that are more complex and therefore richer in terms of features. For example, the data sets should be transformed into a more understandable, suitable structure. This can be done by applying algorithms or statistical methods (correlation, rule learning) that can provide an analysis of the available Linked Data. This will influence the semantic model, because it is required to describe explicitly the relation between input and output data.
• Double relations between instances are not supported by the semantic model. For example, the entity http://dbpedia.org/page/Angelina_Jolie has two similar properties, dbpprop:birthPlace and dbpprop:dateOfBirth.
• The semantic model includes only the basic components required for widget description. For example, PROV-O provides a very complex set of entities and relations, which was not required for our solution.
6.3 Future Work
Due to the significant growth of statistical data provided by various public organizations, the mashup platform, with all of its advantages regarding Linked Data consumption, can in the future provide access to this data. A possibility to process this kind of public data through widgets is publishing the data as Linked Data using the W3C Data Cube vocabulary1, a format for statistical data publishing on the Web of Data. This makes it possible to link and combine the data with additional information. Additional advantages of this approach are that multi-dimensional data can be represented with the use of the RDF standard and published following the Linked Data principles. Furthermore, the model is general, which enables high reusability, and can be used for various datasets like OLAP data cubes. The main elements of the Data Cube vocabulary are a collection of observations (datasets), a set of dimensions defining the foundations of the observations, measures that describe the objects of the observations, and attributes of the observed values; a small sketch is given after this paragraph. This facet implies the development of new types of widgets that can process data modeled in the Data Cube format. A possible way to integrate such data is to extend the existing semantic model by adding additional entities and relations like observation, dataset, measure, etc. The mashup platform will support the visualization of such multi-dimensional data and its integration with other data sets to support end users in deducing knowledge from statistical data. Furthermore, this will allow developers to easily discover a data source and then develop statistical web applications of high quality and flexibility. The current version supports only three common statistical charts: pie, bar, and line charts.
1http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/
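A minimal sketch of a Data Cube observation, assuming a hypothetical ex: namespace with a single dimension (reference year) and measure (population); the values are illustrative only.

@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/statistics#> . # hypothetical namespace

ex:populationDataset a qb:DataSet ;
    qb:structure ex:populationDSD .  # data structure definition (dimensions/measures)

ex:obs1 a qb:Observation ;
    qb:dataSet ex:populationDataset ;
    ex:refYear "2013" ;              # hypothetical dimension
    ex:population 1741000 .          # hypothetical measure, illustrative value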
More types of charts can improve the visualization of Linked Data and data browsing. This will also influence the semantic model, because the visualization widgets can process and return additional data, like the summary of some data values or the difference between data values, etc., and the semantic model should enable the description of such data types. Finally, streaming data2 can be integrated into the mashup platform. For this kind of data it will be necessary to find a mechanism for dealing with temporal data (time stamps, time intervals, and other options) and to include it in the semantic model in order to provide the best widget matching and composition.
2http://www.w3.org/community/rsp/wiki/RDF_Stream_Models
CHAPTER 7 Appendix
7.1 Acronyms
CSV Comma Separated Values
DAML DARPA Agent Markup Language
DCAT Data Catalog Vocabulary
DL Description Logic
HTML Hypertext Markup Language
IRI Internationalized Resource Identifier
LIDS Linked Data Services
LOD Linked Open Data
LOS Linked Open Services
OWL-S Semantic Markup for Web Services
OWL Web Ontology Language
PROV-O Provenance Ontology
R2RML RDB to RDF mapping Language
RDB Relational Data Base
RDFS Resource Description Framework Schema
RDFa Resource Description Framework in Attributes
REST Representational State Transfer
RSS Really Simple Syndication
SAWSDL Semantic Annotation for Web Services Description Language
SOAP Simple Object Access Protocol
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
SWRL Semantic Web Rule Language
URI Uniform Resource Identifier
W3C World Wide Web Consortium
WSDL Web Service Description Language
WSMO Web Service Modeling Ontology
WWW, W3 World Wide Web
XML Extensible Markup Language
XSLT XSL Transformation
XSL Extensible Stylesheet Language
7.2 Widget Semantic Model
The complete widget semantic model is serialized in RDF/XML; among the terms it defines is the property http://linkedwidgets.org/ontologies#hasInputNode. The serialization spans several pages and is not reproduced here.
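The following sketch summarizes, in Turtle rather than the original RDF/XML, the core terms that Chapter 5 uses; whether the input/output properties are declared as sub-properties of the generic ones is an assumption.

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lw:   <http://linkedwidgets.org/ontologies#> .

lw:Widget a rdfs:Class .

lw:hasName  a rdf:Property ; rdfs:domain lw:Widget .
lw:hasModel a rdf:Property ; rdfs:domain lw:Widget .
lw:hasInputModel  a rdf:Property ; rdfs:subPropertyOf lw:hasModel .  # assumption
lw:hasOutputModel a rdf:Property ; rdfs:subPropertyOf lw:hasModel .  # assumption

lw:hasNode a rdf:Property .
lw:hasInputNode  a rdf:Property ; rdfs:subPropertyOf lw:hasNode .    # assumption
lw:hasOutputNode a rdf:Property ; rdfs:subPropertyOf lw:hasNode .    # assumption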
7.3 Semantic Models in Top Braid Composer
Figure 7.1: Top Braid Composer Interface
Figure 7.2: Import of ontologies in Top Braid Composer
Figure 7.3: DBPedia classes in Top Braid Composer
Figure 7.4: An example of a property in Top Braid Composer
Figure 7.5: Instances in Top Braid Composer
Figure 7.6: An example of Widget Description in Top Braid Composer
Figure 7.7: An example of a model description in Top Braid Composer
Bibliography
[1] T. Berners-Lee, R. Fielding, and L. Masinter. http://tools.ietf.org/html/rfc3986. Accessed: 2014-02-21.
[2] W3C. http://www.w3.org/2001/sw/. Accessed: 2014-02-21.
[3] Saeed Aghaee and Cesare Pautasso. An evaluation of mashup tools based on support for heterogeneous mashup components. In Proceedings of the 11th International Conference on Current Trends in Web Engineering, ICWE’11, pages 1–12. Springer-Verlag, 2012.
[4] AJAX. http://en.wikipedia.org/wiki/ajax_(programming), Accessed: 2013-11-11.
[5] Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist: effective modelling in RDFS and OWL. Morgan Kaufmann Publishers, 2. edition, 2011.
[6] Areeb Alowisheq, David E. Millard, and Thanassis Tiropanis. Express: Expressing restful semantic services using domain ontologies. International Semantic Web Conference, 5823:941–948, 2009.
[7] Areeb Alowisheq and David E. Millard. Express: Expressing restful semantic web services. The Seventh Reasoning Web Summer School, pages 23–27, 2011.
[8] Sören Auer, Lorenz Bühmann, Christian Dirschl, Michael Hausenblas, Orri Erling, Robert Isele, Jens Lehmann, Michael Martin, Pablo N. Mendes, Bert van Nuffelen, Claus Stadler, Sebastian Tramp, and Hugh Williams. Managing the Life-Cycle of Linked Data with the LOD2 Stack. The Semantic Web – ISWC 2012, pages 1–16, 2012.
[9] Robert J. Aumann, A. Michael Spence, Martin L. Perl, Frank Wilczek, Steve Wozniak, Vinton G. Cerf, Ann Winblad, Richard Stallman, Jim Rogers, Alan Kay, Bjarne Stroustrup, Brian Behlendorf, Rajeev Madhavan, Jimmy Wales, Craig Newmark, Greg Gianforte, Grady Booch, and Chief Scientist. Frontier visionary interview. Frontier Journal, 6(7), 2009.
[10] Florian Bauer and Martin Kaltenböck. Linked Open Data: The Essentials. A Quick Start Guide for Decision Makers. Edition mono/monochrom, Vienna, Austria, 1. edition, 2012.
[11] Sean Bechhofer, Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. http://www.w3.org/tr/owl-ref/, Accessed: 2013-12-05.
[12] Francois Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Jean Morissette. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. The Semantic Web: Semantics and Big Data, pages 706–716, 2008.
[13] T. Berners-Lee, J. Hollenbach, Kanghao Lu, J. Presbrey, and Mc Schraefel. Tabulator redux: Browsing and writing linked data, Accessed: 2013-11-02.
[14] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, pages 29–37, 2011.
[15] Tim Berners-Lee and Robert Cailliau. http://www.w3.org/proposal.html. Accessed: 2014-02-21.
[16] BIO2RDF. https://github.com/bio2rdf/bio2rdf-scripts/wiki/, Accessed: 2013-11-11.
[17] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data — The Story So Far. International Journal on Semantic Web and Information Systems, pages 1–22, 2009.
[18] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. Dbpedia - a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):154–165, 2009.
[19] Brian McBride. http://www.w3.org/tr/rdf-schema/. Accessed: 2013-10-29.
[20] Alison Callahan, José Cruz-Toledo, Peter Ansell, and Michel Dumontier. Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. Journal of Biomedical Informatics, pages 200–212, 2013.
[21] JackBe Corporation. A business guide to enterprise mashups, 2008.
[22] Dan Brickley and Libby Miller. http://xmlns.com/foaf/spec/. Accessed: 2013-10-24.
[23] Dave Beckett and Brian McBride. http://www.w3.org/tr/rec-rdf-syntax/. Accessed: 2013-10-21.
[24] David Beckett and Tim Berners-Lee, W3C. http://www.w3.org/teamsubmission/turtle/. Accessed: 2013-10-24.
[25] David Martin, Mark Burstein, Jerry Hobbs, Ora Lassila, Drew McDermott, Sheila McIlraith, Srini Narayanan, Massimo Paolucci, Bijan Parsia, Evren Sirin, Naveen Srinivasan, and Katia Sycara. http://www.w3.org/submission/owl-s/, Accessed: 2013-11-15.
[26] Dieter Fensel, Federico Michele Facca, Elena Simperl, and Ioan Toma. Semantic Web Services. Springer-Verlag Berlin Heidelberg, 1. edition, 2011.
[27] Fernando J. Garrigos-Simon, Rafael Lapiedra Alcamí, and Teresa Barberá Ribera. Social networks and Web 3.0: their impact on the management and marketing of organizations. Management Decision, 50(2):1880–1890, 2012.
[28] The Apache Software Foundation. http://stanbol.apache.org/, Accessed: 2013-12-03.
[29] DERI Galway. http://pipes.deri.org/, Accessed: 2013-11-08.
[30] José María García, David Ruiz, and Antonio Ruiz-Cortés. A lightweight prototype implementation of sparql filters for wsmo-based discovery. In Technical Report ISA-11-TR-01. ISA Research Group, 2011.
[31] Karthik Gomadam, Ajith Ranabahu, and Amit Sheth. http://www.w3.org/submission/sa-rest/, Accessed: 2013-11-17.
[32] Graham Klyne, Jeremy J. Carroll, and Brian McBride. http://www.w3.org/tr/rdf11-concepts/. Accessed: 2014-02-27.
[33] Benjamin Grosof, Mike Dean, Carl Andersen, William Ferguson, Daniela Inclezan, and Richard Shapiro. A silk graphical ui for defeasible reasoning, with a biology causal process example. In Proc. 4th Intl. Web Rule Symp. (RuleML), 2010.
[34] Benjamin Grosof, Mike Dean, and Michael Kifer. The silk system: Scalable higher-order defeasible rules. In International RuleML Symposium on Rule Interchange and Applications, 2009.
[35] Paul Groth and Luc Moreau. http://www.w3.org/tr/prov-overview/, Accessed: 2013-12-05.
[36] Tom Heath and Christian Bizer. Linked Data. Evolving the Web into a Global Data Space. Morgan & Claypool, 1. edition, 2011.
[37] John Hebeler, Matthew Fisher, Ryan Blace, and Andrew Perez-Lopez. Semantic Web Programming. Wiley Publishing, Inc., 1. edition, 2009.
[38] Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. http://www.w3.org/submission/swrl/. Accessed: 2014-02-21.
[39] Google Inc, Yahoo Inc, and Microsoft Corporation. http://schema.org/, Accessed: 2013-11-29.
[40] Yahoo! Inc. http://pipes.yahoo.com/, Accessed: 2013-11-06.
[41] Kashif Iqbal, Marco Luca Sbodio, Vassilios Peristeras, and Giovanni Giuliani. Semantic service discovery using sawsdl and sparql. In Proceedings of the 2008 Fourth International Conference on Semantics, Knowledge and Grid, pages 205–212. IEEE Computer Society, 2008.
[42] Ivan Herman, Ben Adida, Manu Sporny, Digital Bazaar, and Mark Birbeck. http://www.w3.org/tr/xhtml-rdfa-primer. Accessed: 2013-10-21.
[43] M. Cameron Jones and Elizabeth F. Churchill. Conversations in Developer Communities: a Preliminary Analysis of the Yahoo! Pipes Community. In Proceedings of the Fourth International Conference on Communities and Technologies (C&T ’09), pages 195–204, 2009.
[44] Rohit Khare and Tantek Çelik. Microformats: a pragmatic path to the semantic web. In Proceedings of the 15th International Conference on World Wide Web (WWW ’06), pages 865–866, 2006.
[45] Craig A. Knoblock, Pedro Szekely, José Luis Ambite, Aman Goel, Shubham Gupta, Kristina Lerman, Maria Muslea, Mohsen Taheriyan, and Parag Mallick. Semi-automatically mapping structured sources into the semantic web. In The Semantic Web: Research and Applications, Lecture Notes in Computer Science, pages 375–390. Springer Berlin Heidelberg, 2012.
[46] Agnes Koschmider, Victoria Torres, and Vicente Pelechano. Elucidating the mashup hype: Definition, challenges, methodical guide and tools for mashups. In 2nd Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web in conjunction with the 18th International World Wide Web Conference, Madrid, 2009.
[47] Rubén Lara, Dumitru Roman, Axel Polleres, and Dieter Fensel. A conceptual comparison of wsmo and owl-s. Multimedia Tools and Applications, 64(2):365–387, 2013.
[48] Jon Lathem, Karthik Gomadam, and Amit P. Sheth. Sa-rest and (s)mashups: Adding semantics to restful services. International Conference on Semantic Computing, pages 469–476, 2007.
[49] Danh Le-Phuoc, Axel Polleres, Manfred Hauswirth, Giovanni Tummarello, and Christian Morbidoni. Rapid Prototyping of Semantic Mash-Ups through Semantic Web Pipes. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09), pages 581–590, 2009.
[50] Timothy Lebo, Satya Sahoo, and Deborah McGuinness. http://www.w3.org/tr/prov-o/, Accessed: 2013-12-05.
[51] Faculty of Mathematics Leipzig University and Dept. Business Information Systems Computer Science, Institute of Computer Science. http://aksw.org/projects/limes.html, Accessed: 2013-11-19.
[52] linkeddata.org, administrated by Tom Heath. http://linkeddata.org. Accessed: 2013-10-21.
[53] Yan Liu, Xin Liang, Lingzhi Xu, Mark Staples, and Liming Zhu. Composing enterprise mashup components and services using architecture integration patterns. J. Syst. Softw., 84(9):1436–1446, 2011.
[54] LOD-Around-The-Clock (LATC). http://5stardata.info/. Accessed: 2013-11-02.
[55] Fadi Maali and John Erickson. http://www.w3.org/tr/vocab-dcat/, Accessed: 2013-12-05.
[56] Marcos Caceres and Mark Priestley. http://www.w3.org/tr/2009/wd-widgets-reqs-20090430/. Accessed: 2014-02-21.
[57] David Martin, Mark Burstein, Drew McDermott, Sheila McIlraith, Massimo Paolucci, Katia Sycara, Deborah L. McGuinness, Evren Sirin, and Naveen Srinivasan. Bringing semantics to web services with owl-s. Multimed Tools Appl, pages 365–387, 2012.
[58] Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. Sieve: Linked data quality assessment and fusion. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pages 116–123. ACM, 2012.
[59] Alistair Miles and Sean Bechhofer. http://www.w3.org/2009/08/skos-reference/skos.html, Accessed: 2013-12-05.
[60] Eetu Mäkelä, Kim Viljanen, Olli Alm, Jouni Tuominen, Onni Valkeapää, Tomi Kauppinen, Jussi Kurki, Reetta Sinkkilä, Robin Lindroos, Osma Suominen, Tuukka Ruotsalo, Eero Hyvönen, et al. Enabling the semantic web with ready-to-use web widgets, 2007.
[61] Christian Morbidoni, Axel Polleres, Giovanni Tummarello, and Danh Le Phuoc. Semantic Web Pipes, 2007.
[62] Jagadeesh Nandigam, Venkat N. Gudivada, and Mrunalini Kalavala. Semantic web services. J. Comput. Sci. Coll., 21(1):50–63, 2005.
[63] Barry Norton, Reto Krummenacher, Adrian Marte, and Dieter Fensel. Dynamic linked data via linked open services. In Linked Data in the Future Internet 2010, pages 1–10, 2010.
[64] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. http://www.w3.org/tr/html/. Accessed: 2014-02-21.
[65] RDF Working Group. http://www.w3.org/rdf. Accessed: 2013-10-21.
[66] Roberto Chinnici, Jean-Jacques Moreau, Arthur Ryman, and Sanjiva Weerawarana. http://www.w3.org/tr/wsdl20/, Accessed: 2013-11-15.
[67] Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, and Silvia Pfeiffer. http://www.w3.org/tr/html/. Accessed: 2014-02-21.
[68] Dumitru Roman, Uwe Keller, Holger Lausen, Jos de Bruijn, Ruben Lara, Michael Stollberg, Axel Polleres, Cristina Feier, Christoph Bussler, and Dieter Fensel. Web service modeling ontology. Applied Ontology, pages 77–106, 2005.
[69] Sebastian Rudolph. Foundations of Description Logics. Reasoning Web 2011, LNCS 6848, 2011.
[70] SAWSDL Working Group. http://www.w3.org/2002/ws/sawsdl/. Accessed: 2014-02-27.
[71] Toby Segaran, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O’REILLY, 1. edition, 2009.
[72] Souripriya Das, Seema Sundara, and Richard Cyganiak. http://www.w3.org/tr/r2rml/, Accessed: 2013-11-13.
[73] SPARQL Working Group. http://www.w3.org/tr/rdf-sparql-query/. Accessed: 2013-10-21.
[74] Sebastian Speiser and Andreas Harth. Taking the lids off data silos. In Proceedings of the 6th International Conference on Semantic Systems, I-SEMANTICS ’10, pages 44:1–44:4. ACM, 2010.
[75] Sebastian Speiser and Andreas Harth. Integrating linked data and services with linked data services. In Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications - Volume Part I, ESWC’11, pages 170–184. Springer-Verlag, 2011.
[76] Steffen Stadtmüller, Sebastian Speiser, Andreas Harth, and Rudi Studer. Data-fu: A language and an interpreter for interaction with read/write linked data. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 1225–1236. International World Wide Web Conferences Steering Committee, 2013.
[77] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. A graph-based approach to learn semantic descriptions of data sources. In The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, pages 607–623. Springer Berlin Heidelberg, 2013.
[78] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and Jose Luis Ambite. Rapidly integrating services into the linked data cloud. In The Semantic Web – ISWC 2012, Lecture Notes in Computer Science, pages 559–574. Springer Berlin Heidelberg, 2012.
[79] Tim Berners-Lee, W3C and Dan Connolly, W3C. http://www.w3.org/teamsubmission/n3/. Accessed: 2013-10-24.
[80] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, Renaud Delbru, and Stefan Decker. Sig.ma: Live views on the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):355 – 364, 2010.
[81] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Jos De Roo, Rik Van de Walle, and Joaquim Gabarró Vallés. Capturing the functionality of web services with functional descriptions. World Wide Web, 10(3):243–277, 2012.
[82] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Sam Coppens, Erik Mannens, Rik Van de Walle, and Joaquim Gabarró Vallés. Integrating data and services through functional semantic service descriptions. In Proceedings of the W3C Workshop on Data and Services Integration, 2011.
[83] Roberto De Virgilio, Francesco Guerra, and Yannis Velegrakis. Semantic Search over the Web. Springer-Verlag Berlin Heidelberg, 1. edition, 2012.
[84] Tomas Vitvar, Jacek Kopecký, Jana Viskova, and Dieter Fensel. Wsmo-lite annotations for web services. In Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC’08, pages 674–689. Springer-Verlag, 2008.
[85] W3C. http://www.w3.org, Accessed: 2013-12-15.