A Framework for RDF Data Exploration and Conversion
Total Page:16
File Type:pdf, Size:1020Kb
A Framework for RDF Data Exploration and Conversion Master Thesis Gaurav Singha Roy Riemenschneiderstr. 2, 53117 Bonn [email protected] Matriculation number 2539517 Bonn, 12th January, 2015 Rheinische Friedrich-Wilhelms-Universität Bonn Informatik – Enterprise Information Systems Professor Dr. Sören Auer Declaration I hereby declare and confirm that this thesis is entirely the result ofmy own original work. Where other sources of information have been used, they have been indicated as such and properly acknowledged. I further declare that this or similar work has not been submitted for credit elsewhere. Bonn, Gaurav Singha Roy iii Acknowledgement Firstly, I would like to thank Dr. Sören Auer for his valuable time and guidance during the master thesis duration, for providing insightful and innovative solutions, and for being accessible whenever I needed. I would also like to thank Judie Attard and Dr. Fabrizio Orlandi, for their mentorship and constant guidance throughout the thesis. A special thanks especially to Judie, for her dedication and relentless help, which was a major reason for the completion of the thesis in such a small span of time. I would also like to thank my biggest idol, my uncle, Joydip Homchowd- hury, who is an innovator, an artist and a superhero. His constant guidance and vision in the past years made my life colourful with software engineering, and he had introduced me to computer science in the first place. Last but not the least, I would like to thank my parents for their support, even when distance was such a huge barrier. And finally, a special thanks to Adela, for her patience and love which has not bounds. v Contents Declaration iii Acknowledgementv List of Figures viii Abstract1 1 Introduction2 1.1 Problem Context and Motivation................2 1.2 Thesis Contributions.......................3 1.3 Thesis structure..........................4 2 Background5 2.1 The semantic web........................5 2.2 Linked OPEN Data (LOD)...................5 2.3 Resource Description Framework (RDF)............7 2.4 RDF Serialization........................9 2.4.1 RDF/XML........................9 2.4.2 Turtle........................... 10 2.4.3 JSON........................... 10 2.5 RDF Ontology.......................... 11 2.6 SPARQL............................. 11 3 Related Work 13 3.1 Linked Data and REST Services................ 13 3.2 RDF to Any............................ 14 3.3 Visualization of RDF data.................... 17 3.4 SPARQL Tools.......................... 18 3.5 XSPARQL............................ 20 4 Approach 23 4.1 The Initial Approach....................... 24 4.2 RDF Exploration and Conversion API server......... 25 vi Contents vii 4.2.1 RDF Exploration helper................. 26 4.2.2 RDF2Any Conversion.................. 36 4.3 Query Builder GUI Tool..................... 53 4.3.1 Selecting datasets.................... 54 4.3.2 Exploring classes..................... 54 4.3.3 Properties Histogram.................. 57 4.3.4 Equivalent Query..................... 60 4.3.5 Result Set Preview.................... 61 4.3.6 Result Set Download................... 61 4.3.7 SPARQL endpoint.................... 65 4.3.8 A Use Case........................ 65 5 Evaluation 68 5.1 Usability Feedback of Query Builder.............. 68 5.2 Evaluation of Conversion module................ 70 5.2.1 JSON Conversion..................... 70 5.2.2 CSV Conversion..................... 70 5.2.3 RDB Conversion..................... 70 5.2.4 Generic Conversion.................... 70 5.3 Performance............................ 75 6 Conclusion and Future Work 76 6.1 Summary............................. 76 6.2 Future Work........................... 77 References 79 Appendix 82 List of Figures 2.1 LOD Cloud in March 20091 ...................6 2.2 LOD Cloud in August 20142 ...................7 2.3 Basic RDF graph.........................8 2.4 An RDF triple example.....................9 2.5 RDF/XML example....................... 10 2.6 RDF Turtle example....................... 10 2.7 RDF JSON-LD example..................... 11 3.1 A conversion from RDBMS to RDF triples using Triplify3 .. 16 3.2 R2D System Architecture4 .................... 17 3.3 Hierarcheal facets can be used to filter information using gFacet5 18 3.4 Possible recommendations given using the SPARQL query formulation tool by the Graph Summary Project6 ....... 19 3.5 A lifting and lowering example................. 21 3.6 RDF lifting and lowering for Web service communication7 .. 21 3.7 XSPARQL : schematic view and how its conceptualized from XQuery and SPARQL...................... 22 3.8 XSPARQL lowering example.................. 22 4.1 Lucene indexes stored in the project.............. 31 4.2 A flowchart describing the properties indexing logic in Lucene 31 4.3 Example Body text of Generic convert parsed into Body Chunks expression tree...................... 52 4.4 DBpedia’s SPARQL Query Editor............... 55 4.5 Query Builder GUI : Selecting datasets............ 55 4.6 Query Builder GUI : Searching classes............. 56 4.7 Query Builder GUI : After selecting a class.......... 56 4.8 Query Builder GUI : URI viewer................ 57 4.9 Query Builder GUI : Classes histogram............ 57 4.10 Query Builder GUI : Property histogram............ 58 4.11 Query Builder GUI : Property filter.............. 59 4.12 Query Builder GUI : Select properties............. 59 4.13 Query Builder GUI : Equivalent query............. 60 viii List of Figures ix 4.14 Query Builder GUI : Equivalent query showing all the selected properties............................. 61 4.15 Query Builder GUI : Preview of Result Set.......... 62 4.16 Query Builder GUI : The result set download modal..... 62 4.17 Query Builder GUI : The JSON download........... 63 4.18 Query Builder GUI : The Generic download.......... 63 4.19 Query Builder GUI : A Sample Generic download template. 64 4.20 Query Builder GUI : A Sample Generic download template after filled by the user...................... 64 4.21 Query Builder GUI : SPARQL endpoint............ 65 4.22 Use Case : Searching for the classes matching Artist ..... 66 4.23 Use Case : Browsing the Artist class histogram........ 66 4.24 Use Case : Selecting the properties which the user wants for the class, Artist .......................... 67 5.1 Time taken for conversions of the class Person8 ........ 75 Abstract With the upswing increase of the use of RDF datastores, it has an increase in the number of users who want to access those data. RDF data exploration and conversion can be exhaustive and challenging. There is a lack of tools and frameworks which are needed to achieve this. Moreover, Conversion of RDF data to other formats is hard and may require the user to install a lot of softwares, if such softwares even exist. SPARQL is used mainly to access data in RDF data stores. Forming queries in SPARQL can be complex and overwhelming for even expert users. We thus propose the use of an interactive Query Builder, using which a user can formulate a SPARQL query even if they do not have any prior SPARQL experience. This thesis, therefore focuses to address the above problems as well as provide a framework with a RESTful solution which will allow developers to develop their own exploration and conversion tool by consuming those RESTful APIs. This framework can be utilized by non-expert users and also expert users, who can extend this framework easily and personalize it according to their requirements. Using the APIs from the framework, RDF data exploration and conversion is achieved. A prototype of an interactive Query builder has also been developed, which consumes these RESTful API. Keywords. RDF Exploration, SPARQL Query Builder, RESTful and Linked Data, RDF Serialization, RDF Conversion 1 Chapter 1 Introduction 1.1 Problem Context and Motivation There has been a surge in Web technologies and cloud data over the past few years. More and more users are making their data available through Linked Open Data (LOD). A common framework, Resource Description Framework (RDF) is adopted for providing meaningful semantics to the data. RDF [AG+12] was designed to provide a common way to interact with various computer applications. Since, they provide a common RDF/XML format, there can be interoperability between different systems. The LOD data providers provide APIs for their data. These API services return data results in standard JSON and XML format. There are various tools available which allows the user to convert this data to some popular formats such as XML, JSON, Turtle, N3, etc. from RDF. Many open source tools and RESTful services are available which help the users in achieving these RDF to any conversions. But first, it is important that the user under- stands the underlying data structure. RDF has a graph structure in which the Data structure is not as rigid as its Relational Database counterparts. The data in RDF data stores can be queried using the standard RDF query language, SPARQL. Let us consider a use case. Lets say, a user has built a Health cloud software and has an internal database which stores all the data of medicines, diseases, etc. Now this database needs to be updated on a regular basis. Lets say there is a another service, which provides this data in an RDF format by crawling through libraries like SNOMED CT1. Now, coming back to the Health cloud user. Lets say the user needs regular SQL upload scripts to update his database. Also, he is not interested in retrieving all of the data available from the RDF data source.