Linked Data Overview and Usage in Social Networks

Gustavo G. Valdez
Technische Universität Berlin

Abstract: This paper introduces the principles of Linked Data, shows some possible uses in social networks, discusses the advantages and disadvantages of using it, and presents an example application of Linked Data to the field of social networks.

I. INTRODUCTION

The web is a very good source of human-readable information that has improved our daily life in several areas. By using rich text and hyperlinks, it made our access to information active rather than passive. But the web was made for humans and not for machines, even though today a lot of the accesses to the web are made by machines. This has led to a search for ways of adding structure, so that data is more machine-readable. Even when the data is structured, access is usually through Web APIs, which requires a manual implementation for each API and makes it very hard to connect data from different APIs.

Amid this chaos, the field of Linked Data [1] has appeared to define some standards and make connected data more easily available. Its goal is to solve exactly the problems described above, by defining a set of good practices for publishing and interlinking data, creating an environment where data can be accessed everywhere using the same interfaces and within the same framework. In summary, Linked Data uses the hyperlink concept (from the human web) to connect different datasets (and help create a machine web). In both cases, those hyperlinks enable a global web of information.

II. AN OVERVIEW OF LINKED DATA

To achieve these bold goals, whoever works with Linked Data must follow a set of principles and practices in order to create this so-called Web of Data.

A. Principles

The principles that guide Linked Data, listed in [1], can be paraphrased as:

1) Use URIs as identifiers for objects and documents.
2) Use HTTP URIs so that objects can be referred to and looked up both by humans and by machines.
3) When a URI is requested, provide useful information, using standards like RDF.
4) Relate this URI to other URIs, creating structure in the web.

The first one basically extends the scope of Uniform Resource Identifiers (URIs) so that, instead of just referencing webpages, they can also reference any object or concept (or even relationships between concepts/objects).

The second one combines the unique identification of the first with a well-known retrieval mechanism (HTTP), enabling these URIs to be looked up via the HTTP protocol.

The third principle tackles the problem of using the same document standards, in order to have scalability and interoperability. This is done by having a standard, the Resource Description Framework (RDF), which will be explained in Section III-A. These standards are very useful to provide resources that are machine-readable and are identified by a URI.

The fourth principle basically advocates that hyperlinks should be used not only to link webpages, but also to link other data. Those hyperlinks, unlike the web ones, would be typed (e.g., a friend-type hyperlink between two people). This principle encourages adding value to data resources by connecting them to other data that can provide more useful information.

B. The Web of Data

The use of these Linked Data principles by a large community led to the creation of a vast number of datasets, which interlinked form the Linking Open Data Cloud.

As with any big idea, it has to start somewhere. And that somewhere is probably the W3C Linking Open Data (LOD) Project, which sought to bootstrap this web of data (along with the semantic web community). Comparing what it was in 2007 (Figure 1) with what it has become (Figure 2) gives some indication of it being a success, at least in some areas.

Fig. 1. Linked Open Data Cloud as of May 2007 [2]

Fig. 2. Linked Open Data Cloud as of September 2011 [2]

This concept of a Web of Data is a very interesting one. It consists of the idea of having a giant global graph of information that covers all sorts of topics, from music to census data. It is basically the idea of connecting the data we have in a way very similar to the way the web connects documents.

As Figure 2 shows, the Web of Data is very diverse, including geographic, governmental, media, life sciences and user content datasets; there are also datasets from libraries. This can also be seen in Table I, which shows the distribution of the datasets across domains.

TABLE I
NUMBER OF DATA SETS, AMOUNT OF TRIPLES, AND AMOUNT OF RDF LINKS PER TOPICAL DOMAIN (ADAPTED FROM TABLE 3.1 OF [1])

Domain          Data Sets   Triples (approx.)   Percent   RDF Links   Percent
Cross-domain    20          2.0 Bi              7.42      29.1 Mi     7.36
Geographic      16          5.9 Bi              21.93     16.6 Mi     4.19
Government      25          11.6 Bi             43.12     17.7 Mi     4.46
Media           26          2.5 Bi              9.11      50.4 Mi     12.74
Libraries       67          2.2 Bi              8.31      78.0 Mi     19.71
Life sciences   42          2.7 Bi              9.89      200.4 Mi    50.67
User Content    7           57.5 Mi             0.21      3.4 Mi      0.86
Total           203         26.9 Bi                       395.5 Mi

This Web of Data can also have negative aspects. Because it is hugely dependent on other systems that provide added useful information, if one service is discontinued or has a failure, it can affect all the others. Another fact to be noted is that the graphs shown in Figures 1 and 2 can be somewhat misleading: not all these services still exist in their prime. Some were discontinued, are outdated by years or have very poor documentation and/or quality, while others represent some of the best the web can offer in matters of scientific data. These problems will be discussed again in Section V.

III. PUBLISHING LINKED DATA

Before understanding how to publish linked data, some things must be explained, especially how RDF works and how URIs are dereferenced. With that, the publishing patterns become clearer.

A. Resource Description Framework

The Resource Description Framework (or RDF) is a framework for representing information in the Web [3]. The following subsections describe its key concepts.

1) Graph data model: RDF expressions are structured as triples (subject, predicate, object). This structure is usually denoted as two nodes (subject and object) and a directed arc (predicate). The arc denotes that a property or relationship holds between subject and object. The nodes in the model can be either:

• a URI (with optional fragment identifier)
• a literal
• a blank node

It should also be noted that the subject can never be a literal, and the arc (predicate) is always a URI.

2) Datatypes: Datatypes are important in RDF to represent values (such as booleans, integers, floating point numbers, dates, ...). They consist of a Value Space, a Lexical Space and a Lexical-to-Value Mapping. One example (adapted from [3]) would be the boolean datatype:

The Value Space would be {True, False}.
A Lexical Space could be {"0", "1", "true", "false"}.
The Lexical-to-Value Mapping would then be {<"true", True>, <"1", True>, <"false", False>, <"0", False>}.

3) Literals: Literals in RDF can be either plain or typed. Plain literals are simple strings in natural language and are recommended to be self-denoting. Typed literals are simple strings combined with a datatype URI that provides the Lexical-to-Value Mapping, as shown in Section III-A2, and they represent the value given by the mapping.

4) Serialization Syntax: RDF is NOT a data format. It is a data model, and in order to be published it must be serialized. There are two standard W3C serialization formats: RDF/XML [4] and RDFa [5] (and many other non-standard ones that are used in specific situations).

Some RDF formats are commonly used: RDF/XML, Turtle, N-Triples and RDFa. They are exemplified below. These examples also make use of FOAF (Section IV-1) and are an excerpt from the datasets used in the Example Application (Section V).

RDF/XML:

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:j.0="http://somewhere/ont#"
        xmlns:j.1="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="http://somewhere/socialnet#me">
        <j.1:knows rdf:resource="http://somewhere/socialnet#rosa"/>
        <j.0:ListenedTo>Bad Romance</j.0:ListenedTo>
        <j.1:name>Gustavo Valdez</j.1:name>
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
      </rdf:Description>
    </rdf:RDF>

Turtle:

    <http://somewhere/socialnet#me>
        a <http://xmlns.com/foaf/0.1/Person> ;
        <http://xmlns.com/foaf/0.1/knows>
            <http://somewhere/socialnet#rosa> ;
        <http://somewhere/ont#ListenedTo>
            "Bad Romance" ;

That is not to say that it could not be added using the predicate structure given, but it would be cumbersome and too complicated, since there is no native support.

6) RDF, RDF-Schema and OWL: RDF describes resources in a (subject, predicate, object) triple, but it does not directly provide support for domain-specific terminologies, classes of things and relations between classes. These kinds of functions are better served by ontologies, vocabularies and taxonomies. For expressing those we can use RDF-Schema [6] and OWL (Web Ontology Language) [7].

The difference is probably made clearer by an example.
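As a supplementary illustration of the graph data model and the boolean datatype from Section III-A (separate from the RDF-Schema and OWL example announced above), the following minimal Python sketch models triples with plain data classes. The names URI, BlankNode, Literal, Triple and literal_value are hypothetical helpers written for this sketch and are not part of any RDF library; the sample data reuses the URIs from the serialization examples.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the string form (Lexical Space)
    datatype: Optional[str] = None  # datatype URI; None means a plain literal


Node = Union[URI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Node
    predicate: Node
    obj: Node

    def __post_init__(self) -> None:
        # Constraints stated in the graph data model subsection.
        if isinstance(self.subject, Literal):
            raise ValueError("the subject can never be a literal")
        if not isinstance(self.predicate, URI):
            raise ValueError("the predicate is always a URI")


XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Lexical-to-Value Mapping of the boolean datatype example.
BOOLEAN_LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}


def literal_value(lit: Literal):
    """Return the value a literal denotes (only plain and boolean handled)."""
    if lit.datatype is None:
        return lit.lexical  # plain literals denote themselves
    if lit.datatype == XSD_BOOLEAN:
        return BOOLEAN_LEXICAL_TO_VALUE[lit.lexical]
    raise NotImplementedError("datatype not covered by this sketch")


FOAF = "http://xmlns.com/foaf/0.1/"
me = URI("http://somewhere/socialnet#me")

# A graph is simply a set of triples.
graph = {
    Triple(me, URI(FOAF + "knows"), URI("http://somewhere/socialnet#rosa")),
    Triple(me, URI(FOAF + "name"), Literal("Gustavo Valdez")),
    Triple(me, URI("http://somewhere/ont#ListenedTo"), Literal("Bad Romance")),
}
```

Checking the two constraints in the constructor means an invalid triple, such as one with a literal subject, is rejected when it is built rather than when the graph is later queried.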
