
Using RDF with XML to Automatically Augment Metadata

Tina Jayroe
University of Denver
Professor Jessica Branco Colati
Metadata Architectures
June 3, 2009

Abstract

The World Wide Web Consortium’s Resource Description Framework (RDF) provides a methodology and an opportunity for information organizations to make their metadata more valuable, contextual, and uniform, which allows more relevant information to be retrieved when accessing Web data. Since Extensible Markup Language (XML) is the most widely used format for sharing metadata, it is advantageous for organizations to use its traits and constraints in conjunction with RDF triples and graphs, which correspond to ontologies and namespaces. However, writing RDF in XML is an expensive, labor-intensive process that is often too costly and technical for most library organizations. This article provides examples of the various schemata that exploit the RDF/XML specification, and attempts to make a case for information organizations to consider implementing RDF in order to provide richer metadata for information seekers and more uniformity for the entire Web community.

The Web will become a repository of knowledge not only a compendium of facts.
–Reed Hellman, A semantic approach adds meaning to the Web

In the past, individuals, associations, and computers have organized data to be retrievable by reconfiguring systems and adopting certain Web protocols and languages. A major component of the current flexible and interactive Web environment is the widely accepted metalanguage XML. XML is a World Wide Web Consortium (W3C) standard format language that allows a user to represent resources using any other type of language (Klein, 2001, p. 26).

One downside to XML, however, is that its elements, subelements, and attributes do not define or reference the content enclosed within its structure (or tags).
Therefore, many organizations have implemented Resource Description Framework (RDF) so that their metadata can reference ontologies for better context, uniformity, interoperability, and precision. “The Resource Description Framework is a ... W3C recommendation designed to standardize the definition and use of metadata – descriptions of Web-based resources” (Decker, Melnik, Van Harmelen, Fensel, Klein, Broekstra, ... Horrocks, 2000, p. 66).[1] RDF is an application of XML; it allows for self-description, reification, and logic through the use of statements, for which XML provides the notation (Berners-Lee, 1999, “Semantic Web: the pieces”; “Namespaces”).

XML is extensible, readable by humans, and can be encoded in any proprietary language. As previously mentioned, XML does not attribute any vocabulary or meaning to content; the metadata within its tags are essentially ambiguous (Klein, 2001, p. 26). Thus, a DTD (Document Type Declaration/Definition) is often used because it can specify a vocabulary and other syntax rules to apply when the XML document is processed. DTDs (and the XML Schema) serve as translation agreements between parties regarding a document’s grammar (Decker et al., 2000; Klein, 2001, p. 26), and are a step toward increased description and interoperability, both goals of the Semantic Web.

By using what is referred to as RDF’s triples list and graph, made up of entity, attribute, and value (also referred to as subject, predicate, and object), data in RDF can be independent of each other within the same syntax, or syntax-independent. The other, opposite advantage is that the meaning (the semantics) between the objects can be identified or defined in relation to a reference and to each other.

Figure 1: An RDF triple states the relationship between subject, predicate, and object.

Decker et al. provide a comparison of the XML Schema vs.
the RDF Schema[2] mechanisms:

    [I]n XML schema, if type T′ is derived from type T, then elements of the derived type T′ are not necessarily members of the original type T. In the subClassOf relationship in RDF schema, on the other hand, a member of a subclass is also a member of the original superclass. As a result, subClassOf can be used to model ontological subtyping, whereas XML schema’s type extension cannot (2000, p. 70).

This is an extremely important concept in the development of the future Web, where ontologies are used to express the meaning between applications and systems (Baca, 2008, “Glossary”). Ontologies define terms for computer expressions and concepts in a given domain and are technical components of the Semantic Web. While many pieces are needed to fulfill this vision (see Figure 2), the RDF component cannot be effectively discussed without at least a brief explanation of XML namespaces and Uniform Resource Identifiers (URIs).

URIs identify resources; resources are anything that can be identified on the Web. A URI reference (URIref), absolute or relative, is a URI together with an optional fragment identifier. XML namespaces supply the prefixes used in qualified names (QNames), which abbreviate URIrefs in the code. What this means is that, once the syntax that declares a namespace is in place, a referenced object within the schema corresponds to the given resource (e.g., a certain vocabulary/ontology or a vCard[3] URI) automatically. An application can then retrieve any specific, absolute, or relative information about that resource (Breitman, Casanova & Truszkowski, 2007, pp. 59–61).

Figure 2: The Semantic Web layers. Note: Created by Sebastian Faubel and released to the public domain.

In other words, RDF is a framework that takes advantage of XML’s encoding to identify and define objects on the Web and put them in context according to the appropriate domain. The RDF model is an advanced way of representing metadata for the benefit of reuse and automation.
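The ideas above (triples, prefixed names that expand to URIrefs, and the subClassOf rule quoted from Decker et al.) can be illustrated with a minimal Python sketch. The rdf and rdfs namespace URIs are the W3C ones; the "ex" namespace, the class names, and the resource "paper42" are hypothetical and chosen only for illustration. The inference function is a simplified stand-in for what an RDF Schema reasoner does, not a full implementation.

```python
# Prefix-to-namespace map. "rdf" and "rdfs" are the W3C namespaces;
# "ex" is a hypothetical namespace used only for this example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/library#",
}


def expand(qname):
    """Expand a prefixed name such as 'rdf:type' into a full URI reference."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


RDF_TYPE = expand("rdf:type")
SUBCLASS_OF = expand("rdfs:subClassOf")

# Each statement is a (subject, predicate, object) triple.
triples = {
    (expand("ex:Thesis"), SUBCLASS_OF, expand("ex:Document")),
    (expand("ex:Document"), SUBCLASS_OF, expand("ex:Resource")),
    (expand("ex:paper42"), RDF_TYPE, expand("ex:Thesis")),
}


def infer_types(statements):
    """Add the rdf:type triples implied by rdfs:subClassOf.

    Rule: if (x rdf:type C) and (C rdfs:subClassOf D), then (x rdf:type D).
    The rule is reapplied until no new triple appears.
    """
    inferred = set(statements)
    while True:
        new = {
            (s, RDF_TYPE, d)
            for (s, p, o) in inferred if p == RDF_TYPE
            for (c, p2, d) in inferred if p2 == SUBCLASS_OF and c == o
        } - inferred
        if not new:
            return inferred
        inferred |= new


closure = infer_types(triples)
types_of_paper = sorted(
    o for (s, p, o) in closure
    if s == expand("ex:paper42") and p == RDF_TYPE
)
print(types_of_paper)
```

Only one rdf:type statement was asserted for the hypothetical resource, yet it ends up as a member of all three classes. This is the "member of a subclass is also a member of the original superclass" behavior that XML Schema type extension does not provide.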
The Good News

RDF is now being used in many areas and by many organizations. Digital library classification systems have implemented applications that build on RDF. Simple Knowledge Organization System (SKOS) Core utilizes metadata contained in bibliographic records and in social computing networks such as Friend of a Friend (FOAF), an ontology that contains more personal links and so provides a richer source of data that can be related to other systems.

SKOS Core is a W3C format designed to be less costly and less complex than the more sophisticated Web Ontology Language (OWL) on which it is based. SKOS Core utilizes the RDF data model and schema and is intended for creating relationships between controlled vocabularies:

    Controlled vocabularies facilitate consistent documentation, but they do not guarantee flawless searching across multiple collections. Controlled vocabularies preclude the user from using natural language terms and phrases of his or her choosing, and seeking and finding resources illustrative of conceptual ideas and relationships (Cantara, 2006, p. 111).

The benefit of using SKOS Core in libraries (besides its cost-effectiveness) is that resource discovery systems will become more effective and interoperable if semantic tools are implemented to search multiple vocabularies, thesauri, taxonomies, subject headings, etc. Further, the retrieved data will have been analyzed using intelligent algorithms that are applied to the attributes of the terms contained within a system. Linda Cantara, author of Encoding controlled vocabularies for the Semantic Web using SKOS Core, notes that the terms used in classification systems to define subject headings are usually nouns, whereas in natural language processing those terms are often meant to be used as verbs, adjectives, or in some other context (2006, p. 112).
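As a rough illustration of how a discovery system might use SKOS relationships to search beyond a single controlled term, the Python sketch below parses a small RDF/XML fragment with the standard library and expands a user's term into related labels. The skos and rdf namespace URIs and the property names (prefLabel, altLabel, broader) come from the SKOS vocabulary; the example.org concepts, their labels, and the helper functions are hypothetical, and a production system would more likely rely on a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# A hypothetical two-concept vocabulary serialized as RDF/XML.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab#painting">
    <skos:prefLabel>Painting</skos:prefLabel>
    <skos:altLabel>Paintings</skos:altLabel>
    <skos:altLabel>Painted works</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/vocab#visualArts"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/vocab#visualArts">
    <skos:prefLabel>Visual arts</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def load_concepts(xml_text):
    """Map each concept URI to its preferred label, alternate labels, and broader concepts."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        concepts[uri] = {
            "pref": node.findtext(f"{{{SKOS}}}prefLabel"),
            "alt": [e.text for e in node.findall(f"{{{SKOS}}}altLabel")],
            "broader": [
                e.get(f"{{{RDF}}}resource")
                for e in node.findall(f"{{{SKOS}}}broader")
            ],
        }
    return concepts


def expand_query(term, concepts):
    """Return every label worth searching when the user types `term`."""
    wanted = term.lower()
    labels = set()
    for concept in concepts.values():
        names = [concept["pref"]] + concept["alt"]
        if wanted in (name.lower() for name in names):
            labels.update(names)
            # Also include the preferred label of each broader concept.
            for broader_uri in concept["broader"]:
                if broader_uri in concepts:
                    labels.add(concepts[broader_uri]["pref"])
    return sorted(labels)


concepts = load_concepts(DOC)
print(expand_query("paintings", concepts))
```

A user who types a natural-language variant ("paintings") is matched to the controlled term and its broader concept, which is the kind of cross-vocabulary searching the passage above describes.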
SKOS Core works extremely well with RDF because of its extensibility and serialization.[4] By associating concepts using RDF description statements, it provides the user with more accurate results; this occurs by mapping class and property elements, which are easily extensible (Miles, 2005, Slide 6, 13, 14; W3C, 2009, “SKOS Core”). Another advantage is that SKOS Core can be implemented as a “basic” or “advanced” application, depending on the level of expression needed for the given institution and the amount of time that can be dedicated to the transition and maintenance of such a complex system (W3C, 2009, ¶ 3–4).[5]

FOAF is used to aggregate information via the Web about a person, their related persons, and their personal associations:

    In addition to the FOAF vocabulary, one of the most interesting features of a FOAF file is that it can contain "see Also" pointers to other FOAF files. This provides a basis for automatic harvesting tools to traverse a Web of interlinked files, and learn about new people, documents, services, data ... (Brickley & Miller, 2000–2007, “The Basic Idea”).

This information may be extracted from a social networking site or from authority files, or it may be imported using vCard elements. The FOAF project/initiative is built on the RDF data model to support semantic connections between identities (Harper & Tillett, 2007, p. 61); RDF allows vCard vocabularies to be imported into FOAF code (Breitman et al., 2007, p. 183). Thus, by using FOAF as a resource for RDF, it becomes possible to obtain more contextual information from a larger pool of decentralized data and vocabularies. For example, in a library catalog a vCard could be retrieved in relation to the MARC fields 245/246, which denote statement of responsibility, thereby providing much more information (richer metadata) about that resource.
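The harvesting behavior described in the Brickley and Miller quotation can be sketched as a breadth-first traversal over rdfs:seeAlso links. This Python example is only an illustration under stated assumptions: the three FOAF files, their example.org URLs, and the people named in them are invented, and an in-memory dictionary (FAKE_WEB) stands in for real HTTP requests so that the sketch runs offline. The foaf, rdf, and rdfs namespace URIs are the published ones.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

HEADER = '<rdf:RDF xmlns:rdf="%s" xmlns:rdfs="%s" xmlns:foaf="%s">' % (RDF, RDFS, FOAF)

# Hypothetical FOAF files keyed by URL; a stand-in for fetching over the Web.
FAKE_WEB = {
    "http://example.org/alice.rdf": HEADER + """
      <foaf:Person>
        <foaf:name>Alice Example</foaf:name>
        <rdfs:seeAlso rdf:resource="http://example.org/bob.rdf"/>
      </foaf:Person></rdf:RDF>""",
    "http://example.org/bob.rdf": HEADER + """
      <foaf:Person>
        <foaf:name>Bob Example</foaf:name>
        <rdfs:seeAlso rdf:resource="http://example.org/alice.rdf"/>
        <rdfs:seeAlso rdf:resource="http://example.org/carol.rdf"/>
      </foaf:Person></rdf:RDF>""",
    "http://example.org/carol.rdf": HEADER + """
      <foaf:Person>
        <foaf:name>Carol Example</foaf:name>
      </foaf:Person></rdf:RDF>""",
}


def harvest(start_url, fetch=FAKE_WEB.get):
    """Follow rdfs:seeAlso links breadth-first and collect foaf:name values."""
    seen = set()
    queue = deque([start_url])
    names = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # Files can point back at each other, so skip repeats.
        seen.add(url)
        text = fetch(url)
        if text is None:
            continue  # Unknown or unreachable file.
        root = ET.fromstring(text)
        for person in root.iter(f"{{{FOAF}}}Person"):
            name = person.findtext(f"{{{FOAF}}}name")
            if name:
                names.append(name)
            for link in person.findall(f"{{{RDFS}}}seeAlso"):
                queue.append(link.get(f"{{{RDF}}}resource"))
    return names


people = harvest("http://example.org/alice.rdf")
print(people)
```

Starting from a single file, the traversal discovers people described only in linked files, and the `seen` set keeps mutual links from causing an endless loop. Passing a different `fetch` callable would let the same function work against real documents.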
Much like SKOS Core, FOAF uses object attributes such as classes and properties to be “discernable in the syntax for RDF” (Brickley & Miller, 2000–2007, “FOAF and RDF”).

Another application developed by the W3C, the RDF iCalendar, is also being referenced much like FOAF. By accessing iCalendar’s properties and components, applications can use information such as “[e]vents, places, names, and coordinates” to extend the definitions that RDF and XML namespaces can access, which allows venue information to be attributed to social information more specifically (Connelly & Miller, 2005, “Events, places, names and coordinates”). While FOAF and iCalendar are effective linking and exchange mechanisms for aggregating personal identification attributes, at present they raise valid privacy and security concerns.