Rdf Repository Replacing Relational Database
Total Page:16
File Type:pdf, Size:1020Kb
RDF REPOSITORY REPLACING RELATIONAL DATABASE 1B.Srinivasa Rao, 2Dr.G.Appa Rao 1,2Department of CSE, GITAM University Email:[email protected],[email protected] Abstract-- This study is to propose a flexible enable it. One such technology is RDF (Resource information storage mechanism based on the Description Framework)[2]. RDF is a directed, principles of Semantic Web that enables labelled graph for representing information in the information to be searched rather than queried. Web. This can be perceived as a repository In this study, a prototype is developed where without any predefined structure the focus is on the information rather than the The information stored in the traditional structure. Here information is stored in a RDBMS’s requires structure to be defined structure that is constructed on the fly. Entities upfront. On the contrary, information could be in the system are connected and form a graph, very complex to structure upfront despite the similar to the web of data in the Internet. This tremendous potential offered by the existing data is persisted in a peculiar way to optimize database systems. In the ever changing world, querying on this graph of data. All information another important characteristic of information relating to a subject is persisted closely so that in a system that impacts its structure is the reqeusting any information of a subject could be modification/enhancement to the system. This is handled in one call. Also, the information is a big concern with many software systems that maintained in triples so that the entire exist today and there is no tidy approach to deal relationship from subject to object via the with the problem. predicate is captured in one record. 1.1. Relational databases Relational databases need a structure to be Key Concepts: Entity, Subject, Predicate, defined upfront and this might not be feasible Object, RDF Triple always with the complexities in the real 1. Introduction world.Relational databases have immense The Semantic Web is a collaborative movement capabilities to offer, however they do pose some led by the international standards body, the serious concerns in some aspects and the most World Wide Web Consortium (W3C).[1] The important of them are semantic web is a “web of data” that enables Enhancement to structure machines to understand the semantics of o requires a great deal of effort to update information on the World Wide Web. The term the structure of information stored "Semantic Web" is often used more specifically o total understanding of the system and its to refer to the formats and technologies that dependencies ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697,VOLUME-2, ISSUE-4,2015 27 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR) o complex analysis to ensure the system is not impacted even after the structural change o focussed monitoring once the change is put in place Scaling o scaling in line with the application is not always possible because of centralized nature Figure 2: Generalized RDF Model o when data set tends to grow, traditional The above picture represents an entity being an databases cannot handle it object and a subject in different contexts. 2. Background 2.2. Class Diagram 2.1. RDF Model Figure 1: RDF Model The RDF model has a triple consisting of subject, predicate and an object. The subject is Figure 3: Class diagram the entity that is being dealt with i.e. the subject and its characteristics are being discussed. The RDF data model is similar to classic Predicate indicates one characteristic type of the conceptual modelling approaches such as subject and the object represents the Entity-Relationship or Class diagrams,[10] as it characteristic. is based upon the idea of making statements about resources (in particular Web resources) in An analogy to understand this better is to the form of subject-predicate-object consider the {predicate, object} pair as an expressions. These expressions are known {attribute name, attribute value} pair for the as triples in RDF terminology. The subject subject. A subject can have multiple attributes, denotes the resource, and the predicate denotes and each of subject, attribute name and attribute traits or aspects of the resource and expresses a value is fully scoped. This ensures there is a relationship between the subject and the object. context associated with each triplet.Figure 1: RDF Model RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource A subject can have any number of Identifiers, or URIs[4]), and describing characteristics and an object in one context resources in terms of simple properties and could be a subject in another context. Similarly, property values. This enables RDF to represent a predicate could also be a subject or an object simple statements about resources as a graph of in other contexts. nodes and arcs representing the resources, their properties and values. ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697,VOLUME-2, ISSUE-4,2015 28 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR) syntax-ns#" xmlns:si="http://www.w3schools.com/rdf/"> <rdf:Description rdf:about="Thomas Edison"> <si:inventor>Electric bulb</si:inventor> <si:mbox>[email protected]</si:mbox> </rdf:Description> F <rdf:Description rdf:about="Albert igure 4: An RDF Graph Describing David Einstein"> <si:discoverer>Photoelectric Figure 4 illustrates that RDF uses URIs to Effect</si:discoverer> identify individuals, physical objects, properties and values of properties. <si:mbox>[email protected]</si:mbox> This concept is extended to define relations between objects in a relational database thereby </rdf:Description> eliminating the structure. The main advantage <rdf:Description rdf:about="Max Planck"> that this approach offers is the traversal from one object to another based on the relations <si:founder>Quantum Theory</si:founder> either directly or by hopping across objects. 2.3. Storage and querying <si:mbox>[email protected]</si:mbox> The RDF store can be a file system, RDBMS or </rdf:Description> any other store. <rdf:Description rdf:about="Max Planck"> Let us consider information of individuals and <si:age>43</si:age> their discoveries or pioneering works. Then we query the store for the mailbox and pioneering </rdf:Description> work where both of them exist. This is a simple <rdf:Description rdf:about="Marie Curie"> case that shows that information can be queried from an RDF store using query languages very similar to the ones used to query traditional <si:mbox>[email protected]</si:mbox> relational databases. </rdf:Description> The RDF format consists of triples namely <rdf:Description rdf:about="Marie Curie"> Subject, Object and Predicate. The Subject and Object are related to each other via a Predicate. <si:pioneer>Radioactivity</si:pioneer> This information can be maintained as simple </rdf:Description> XML as shown below </rdf:RDF> <?xml version="1.0"?> This information is maintained in an RDF store <rdf:RDF without knowledge of how it is structured or xmlns:rdf="http://www.w3.org/1999/02/22-rdf- maintained internally by RDF engines. This gives the flexibility to add new dimensions to ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697,VOLUME-2, ISSUE-4,2015 29 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR) the existing information simply executing a database, MongoDB stores structured data query. On the contrary, when a relational as JSON-like documents with dynamic schemas database has to be updated with a new (MongoDB calls the format BSON), making the dimension, a lock has to be acquired and the integration of data in certain types of database is not usable until the dimension is applications easier and faster. added. MongoDB scales horizontally using sharding Traditional RDBMS’s are queried using SQL, [9]. The developer chooses a shard key, which similarly there are querying languages like determines how the data in a collection will be SPARQL[5]/SeRQL[6] to query information distributed. The data is split into ranges (based from an RDF store. SPARQL allows for a query on the shard key) and distributed across to consist of triple patterns, conjunctions, multiple shards. (A shard is a master with one disjunctions, and optional patterns. or more slaves.) In order to fetch the mailbox and pioneering 3. Architecture work where both exist, a SPARQL query would Our system architecture consists of three major look like components 1. RDF Data store SELECT ?mbox ?pioneer where {?x 2. Reactor <http://www.w3schools.com/rdf/mbox> ?mbox a. Data de-normalizer . ?x <http://www.w3schools.com/rdf/pioneer> b. View generator ?pioneer} 3. Query Interface This returns the response Any external system can populate data in RDF "Radioactivity" [email protected] Data store. For every such insert/update of data in the RDF Data store, a trigger begins a One of the core concepts of RDF is its open reaction. The reactor picks up the change and structure i.e. there is no need for a structure to data de-normalizer component de-normalizes be defined upfront. This property removes the the data by breaking it into simple relational complexity of updating the system. Information triples and further breaks up the data into in RDF could be distributed across multiple subject, object and predicate. The view systems like the information on the web. This generator picks up this information and distributed nature of persisting information generates materialized views. These views are allows the systems relying on RDF to scale then persisted in MongoDB store for optimal without the need to do a lot of manual retrieval. configuration. The query interface is a light weight converter This helps in overcoming the scaling problem. that lets clients query for information. It directly talks to MongoDB and converts the data back 2.4.