The RDF Virtual Machine Marko A

1 The RDF Virtual Machine Marko A. Rodriguez Abstract—The Resource Description Framework (RDF) is a semantic network data model that is used to create machine- understandable descriptions of the world and is the basis of the Semantic Web. This article discusses the application of RDF to the representation of computer software and virtual computing machines. The Semantic Web is posited as not only a web of data, but also as a web of programs and processes. Index Terms—Resource Description Framework, Virtual Machines, Distributed Computing, Semantic Web, Web Computing. F 1 INTRODUCTION article is to discuss the general use of rule and process At its core, the Semantic Web is a global graph data information in the Semantic Web and their explicit real- structure used to describe web resources in a machine- ization as RDF encoded software programs and RDF vir- understandable way [1]. Unlike the World Wide Web, in tual machines (RVM). The remainder of this section will which document resources are interconnected through a introduce 1.) the Linked Data initiative and its intention single type of relationship (i.e. href, hyper-text links), of creating a massive-scale distributed data structure and on the Semantic Web, resources are related to one an- 2.) the RDF programming and virtual machine initiative other through a heterogeneous set of relationships. The and its intention of creating a massive-scale distributed set of resources and relationship types are identified by process infrastructure. The latter initiative is a nascent Uniform Resource Identifiers (URI) [2]. The Resource movement which has the potential to greatly advance Description Framework (RDF) is a standard for graphing the utility of the Semantic Web and, as previously stated, (i.e. relating) URIs, literal values, and blank nodes (or forms the primary point of discussion for this article. anonymous nodes) [3]. If U is the set of all URIs, L is the set of all literal values, and B is the set of all blank 1.1 Linked Data as a Distributed Data Structure hs; p; oi nodes, then an RDF triple (or link) is defined as , The Linked Data initiative is concerned with exposing s 2 (U [ B) p 2 U o 2 (U [ L [ B) where , , and . The data within the URI address space much like the World union of all triples constitutes the Semantic Web graph Wide Web initiative is concerned with exposing docu- and can be generally defined as ments and media in the URL address space [7]. Before G ⊆ h(U [ B) × U × (U [ L [ B)i: discussing the Linked Data movement, it is important to understand how the Semantic Web serves not simply At the level of RDF, the Semantic Web is simply a collec- as a data repository, but more importantly as a sin- tion of triples. These triples form a data structure known gle massive-scale distributed database. Moreover, it is as a directed edge labeled graph (or multi-relational net- important to discuss how the Semantic Web provides work). However, in order to create a layer of abstraction both a technological and cultural differentiation from to describe how resources should be interrelated and to the traditional notion of a database as posited by the reason about and infer non-explicit relationship between relational database community. These factors set the arXiv:0802.3492v2 [cs.PL] 25 Mar 2010 resources, ontological languages have been developed. Semantic Web up for being a revolutionary means by The two most prevalent Semantic Web languages are which data is globally managed and accessed. the RDF Schema (RDFS) [4] and the Web Ontology Technologically, the Semantic Web is reminiscent of the Language (OWL) [5]. For a fine, practical review of these relational database model, insofar as it is a data storage two languages see [6]. environment that provides well-structured data to ex- The prevalent conception of the Semantic Web is ternal applications; though this data is not represented that of a well-structured, massive-scale distributed data as a collection of interlinked tables, but instead as an repository that can be utilized by applications for various edge labeled graph (more specifically, an RDF graph). purposes. However, the RDF data model is general While the table and graph data structures can be mapped enough to support not only the representation of data, into one another without loss of information, the utility but also the representation process. The purpose of this of the graph structure has yielded the development of specialized graph databases known as triple stores [8].1 • M.A. Rodriguez is with the Digital Library Research and Prototyping Moreover, the data exposed by the Semantic Web is Team, the Center for Nonlinear Studies, and the Applied Mathematics and Plasma Physics Group of the Los Alamos National Laboratory, Los 1. It is important to note that many RDF servers still utilize an Alamos, New Mexico 87545. underlying relational database to manage data. Examples of such E-mail: [email protected] Website: http://markorodriguez.com architectures include the D2R Server [9]. 2 within the URI address space and as such is agnostic that have the processing power and space to download to the addressing scheme of the underlying machine and index it.3 For the keyword search space of the supporting its representation. In this way, the data on World Wide Web, this problem is perhaps best solved multiple physical machines are able to reference each by the few large-scale, search engines in existence today. other and thus, the Semantic Web serves as a single However, the Semantic Web, with its rich data model unified graph spanning serves worldwide. and nearly endless potential, is poised to require a new Culturally, the Semantic Web maintains the open, glob- Web infrastructure to support its processing within and ally accessible nature of the World Wide Web. In contrast, between its various Linked Data repositories. No single rarely are relational database schemas reused and/or institution or organization will have the compute power, openly distributed and rarely are relational database nor the man power, to execute and implement all the ports (e.g. ODBC) made available for the public har- potentially useful algorithms that will make the Seman- vesting of information. The common paradigm in the tic Web stand out as the defacto medium for representing relational database world is that data is accessed and data. In order to remedy this situation, a move towards a manipulated by software with privileges to the data computing paradigm for the Semantic Web is necessary. and only through that software is the information made available to other services, if at all. However, with re- 1.2 RVM Computing as a Distributed Process Infras- spect to the Semantic Web, not only does the community tructure 2 encourage the distribution and reuse of ontologies , but The Semantic Web has the potential to not only act it also provides open and accessible interfaces to the its as a data storage environment, but also capture the data. Such interfaces are known as SPARQL endpoints more procedural aspects of computing, such as computer [10] and HTTP-based linked RDF data [7]. The Semantic instructions and abstract virtual computing machines. In Web truly represents a new data management paradigm others words, given the flexibility of the RDF data model, because of the way in which data is distributed and it is possible to encode, in RDF, the rules by which RDF discovered: in an open, standards-based fashion. data is manipulated and thus, expose such information The Semantic Web’s Linked Data community is fo- on the Semantic Web. Moreover, the URI address space cused on the systematic union of RDF datasets in order is an infinite space that is only constrained by the size to allow and number of physical machines that are supporting its “[any man or machine] to start with one data representation. A flexible data model and an infinite ad- source and then move through a potentially dress space make the Semantic Web an ideal medium for endless Web of data sources connected by RDF distributed, global computing. In this more computation- links. Just as the traditional document Web centric environment, instructions expressed in RDF are can be crawled by following hypertext links, executed by RDF virtual machines (RVM). An RVM is the Web of Data can be crawled by following any entity that processes RDF computing instructions, RDF links. Working on the crawled data, search and in some instances, is represented in RDF as well. engines can provide sophisticated query capa- Thus, like other RDF data, computing instructions and bilities, similar to those provided by conven- RVMs are “first-class” citizens on the Semantic Web. tional relational databases. Because the query Many common computing models are made salient by results themselves are structured data, not just the RVM paradigm, such as open (refer to Section 4.1), links to HTML pages, they can be immediately distributed (refer to Section 4.2), and reflective comput- processed, thus enabling a new class of appli- ing (refer to Section 4.3). RDF programming languages cations based on the Web of Data.” [11] compile down to RDF and these RDF instructions can There is far-reaching potential for the Web of data that be accessed, annotated (i.e. RDF related), and reasoned currently exists and will continue to grow to become. on like any other RDF data on the Semantic Web. Fur- However, one of the limiting factors in the Linked Data thermore, unique situations emerge when RDF code is approach is that while the community is providing a represented across different physical machines. Because massive-scale distributed data structure, they are not all RDF instructions are in the same URI address space, providing a massive-scale distribute process infrastruc- there is nothing that prevents the software, much like the ture to compute on this Web of data [12].

Load more