RDFa vs. Microformats

DERI – Digital Enterprise Research Institute

DERI Technical Report 2007-04-10, April 2007

RDFa vs. Microformats: A Comparison of Inline Metadata Formats in (X)HTML

Alexander Graf¹

DERI Galway, University Road, Galway, Ireland, www.deri.ie
DERI Innsbruck, Technikerstrasse 21a, Innsbruck, Austria, www.deri.at
DERI Korea, Yeonggun-Dong, Chongno-Gu, Seoul, Korea, korea.deri.org
DERI Stanford, Serra Mall, Stanford, USA, www.deri.us

Abstract. Most Web pages contain inherent structured and significant data like contact details for various people, dates and addresses of events, descriptive elements for photos and a lot more. As it is, this data is expressed in a way that is easily understandable for humans but incredibly hard to detect and interpret for machines. Once content publishers gain the ability to express this data more completely and tools are developed that are able to understand the semantics, a whole new set of possibilities on the internet becomes available to the end user. New forms of web content, meaningful to computers, will unleash a revolution of the internet. A true Semantic Web might still lie in the future but that’s no reason not to start using the core ideas from which it is being formed. There are several attempts that try to combine the principles of the Semantic Web, as envisioned by Tim Berners-Lee, with currently established technologies such as (X)HTML. This growth of semantics in the existing Web is swiftly advancing the state of the art for all Semantic Web processes. By enhancing existing Web documents with semantics we allow machines to categorise and handle information, so it can be used in a much more practical way, yet keep the principles of the existing web and still code for humans first.
This paper reviews, analyzes and compares RDFa and microformats, two of the current technologies for inline metadata in (X)HTML, and aims to give an overview of the possibilities currently available for annotating existing data in Web sites.

Keywords: semantic web, microformats, rdfa, xhtml, inline metadata, erdf.

¹ Digital Enterprise Research Institute Innsbruck, University of Innsbruck, Technikerstraße 21a, A-6020 Innsbruck, Austria. E-mail: [email protected]

Copyright © 2007 by the authors

Contents

1 Introduction
2 Common Principles
2.1 Visible Metadata
2.2 DRY principle
3 RDFa
3.1 Benefits
3.2 Drawbacks
4 Microformats
4.1 Benefits
4.2 Drawbacks
5 Side by Side Comparison
6 Discussion and Conclusions

1 Introduction

“The goal of the Semantic Web initiative is as broad as that of the Web: to create a universal medium for the exchange of data. It is envisaged to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data. Facilities to put machine-understandable data on the Web are quickly becoming a high priority for many organizations, individuals and communities.” [1]

This vision of the Semantic Web consists essentially of a distributed knowledge system based on RDF, a data model that expresses logical statements and can be serialized in formats such as XML. It derives from Tim Berners-Lee’s vision of the World Wide Web as a universal medium for knowledge exchange. This new Semantic Web would be fundamentally different from the Web of today, mainly because it will be designed for machines first and humans second. Additionally, the Semantic Web requires that we take a step away from the Web that we all know, throw away current practices and formats and embrace a new Web.
While this isn’t exactly bad, it’s not a step that is to be taken lightly and it’s certainly not a step that can be taken quickly. At the moment we already have a Web which is viewable by humans in its native form and yet can be used as a first step to a Semantic Web. By re-using existing data and allowing the expression of semantics in Web pages, we can provide machines with information already being published on the Web as (X)HTML. Those “Real World Semantics” are seeing widespread adoption by companies, bloggers and other “real people” on the internet beyond academic institutions. Recently the term “Lowercase Semantic Web” was coined for this type of mark-up, where the goals of the semantic web are achieved without dependence on the standards that are part of the wider Semantic Web initiative but can still work together with the “Uppercase Semantic Web” which comprises those standards.

Several technologies that aim to enhance (X)HTML with semantic information have surfaced over time and struggle for public acceptance, the most important ones being RDFa and microformats. Both RDFa and microformats share the same goal, yet they are fundamentally different in that they approach the problem from different directions, and they deserve a closer inspection.

2 Common Principles

While RDFa and microformats are very different, they share several core principles. For example, both technologies support plain literals, are well formed and have no negative effect on browser behaviour. They also both follow the Principle of Least Astonishment, which states that, when several elements of an interface are ambiguous, the behaviour that least surprises the human user should apply as it will usually be the correct one. The principle of Visible Metadata and the DRY Principle are two further principles shared by both approaches.

2.1 Visible Metadata

Previously there were several attempts to annotate HTML documents with metadata.
Ranging from <meta> tags in the head of a document to embedded RDF in HTML comments, those attempts had in common that the metadata was invisible to the human reader of the document. Hidden metadata is often abused for search engine placement or other gain that only benefits the author of the document, not the user.

By making metadata available and completely visible, a consumer can easily know whether to trust the author and can be sure that all data is actually relevant to the human reader as well as machines. This principle also assists the document author in keeping the metadata up-to-date. Metadata that is hidden away can be easily forgotten and go stale, whereas visible inaccuracies would soon be discovered by humans and could thus be fixed.

2.2 DRY principle

DRY stands for Don’t Repeat Yourself and describes another important process philosophy used in the RDFa and microformats approaches. Also known as Once and Only Once or Single Point of Truth, the core principle, which was first described in Andy Hunt and Dave Thomas’s book The Pragmatic Programmer, aims to reduce redundancy in computing.

“DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. [...] Given all this knowledge, why should you find one way to represent each feature? The obvious answer is, if you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don’t, you’re guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur.” [3]

Often we maintain separate RDF documents along with their HTML equivalents and have to update both resources on a regular basis.
If the DRY principle is applied, a modification of any metadata in the system has to be made in only one place, expressed for both humans and machines.

3 RDFa

RDFa, developed and proposed by the W3C, is a set of rules that can be used as a module for XHTML 2. It reuses attributes from the standard XHTML meta and link elements and applies them to all other XHTML elements, so that one can annotate XHTML markup with semantic information. With a simple mapping it is possible to extract RDF triples from an RDFa-annotated document.

“RDFa is a syntax for expressing this structured data in XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don’t repeat themselves. The underlying abstract representation is RDF, which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.” [4]

The ultimate goal of RDFa is to make any RDF structure representable in pure XHTML. Unlike the microformats approach, this allows an author to use a predefined set of rules to mark up just about anything. Since the underlying abstract representation is pure RDF, publishers can build their own vocabulary and extend other vocabularies with maximum interoperability. The structure expressed with RDFa is closely tied to the actual data, so the rendered elements can be copied and pasted along with their relevant RDF structure.

However, there are also problems with RDFa. Not only does it require XHTML 2, it also requires a new form of URIs, called CURIEs.

3.1 Benefits

• Publishers are independent and each website is allowed to use its own standards
• Because of Self Containment, the RDF triples are separated from the (X)HTML content
• Modularity of the schema makes attributes reusable
• Follows several well-working microformats principles
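
The mapping from RDFa-annotated markup to RDF triples mentioned in Section 3 can be illustrated with a small Python sketch. It is not the W3C processing algorithm: it only handles the `property` attribute with literal text content, treats the whole page as the single subject, and ignores attributes such as `about`, `rel`, `rev` and `content`. The names `expand_curie`, `TripleExtractor`, `extract_triples`, the sample markup and the subject URI `http://example.org/report` are assumptions made for illustration, as is the hard-coded prefix table (in real RDFa, prefixes would be taken from namespace declarations in the document).

```python
from html.parser import HTMLParser

# Assumed prefix table for this sketch; real RDFa reads prefixes from the document.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' to a full URI; leave unknown prefixes untouched."""
    prefix, sep, reference = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return curie


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, literal) triples from `property` attributes."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject    # simplification: one subject for the whole page
        self.triples = []
        self._predicate = None    # predicate of the element currently being read
        self._depth = 0           # nesting depth inside that element
        self._text = []           # visible text collected for the literal

    def handle_starttag(self, tag, attrs):
        if self._predicate is not None:
            self._depth += 1      # nested element inside an annotated element
            return
        prop = dict(attrs).get("property")
        if prop:
            self._predicate = expand_curie(prop)
            self._depth = 1
            self._text = []

    def handle_data(self, data):
        if self._predicate is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._predicate is None:
            return
        self._depth -= 1
        if self._depth == 0:      # annotated element closed: emit one triple
            literal = "".join(self._text).strip()
            self.triples.append((self.subject, self._predicate, literal))
            self._predicate = None


def extract_triples(markup, subject):
    parser = TripleExtractor(subject)
    parser.feed(markup)
    parser.close()
    return parser.triples


# Hypothetical sample: the visible text doubles as the metadata value.
SAMPLE = (
    '<p>The report <span property="dc:title">RDFa vs. Microformats</span> '
    'was written by <span property="dc:creator">Alexander Graf</span>.</p>'
)

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE, "http://example.org/report"):
        print(triple)
```

Because the literal values are taken from the rendered text itself, the sketch also reflects the Visible Metadata and DRY principles from Section 2: the title and author appear once, readable by humans and extractable by machines.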
