Microformats: The Next (Small) Thing on the Semantic Web?
Standards
Editor: Jim Whitehead • [email protected]

Rohit Khare • CommerceNet

IEEE Internet Computing, January • February 2006. Published by the IEEE Computer Society. 1089-7801/06/$20.00 © 2006 IEEE

“Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.” (Microformats.org)

When we speak of the “evolution of the Web,” it might actually be more appropriate to speak of “intelligent design”: we can actually point to a living, breathing, and actively involved Creator of the Web. We can even consult Tim Berners-Lee’s stated goals for the “promised land,” dubbed the Semantic Web. Few presume we could reach those objectives by randomly hacking existing Web standards and hoping that “natural selection” by authors, software developers, and readers would ensure powerful enough abstractions for it.

Indeed, the elegant and painstakingly interlocked edifice of technologies, including RDF, XML, and query languages, is now growing powerful enough to attack massive information challenges in disciplines such as bioinformatics. All the same, incremental, messy innovation continues to take hold in fits and starts within such narrower ecological niches of the Web as blogging.

A prime example of this phenomenon is microformats, a new approach to encoding semistructured information in ordinary XHTML. Clever application of several existing XHTML elements and XHTML’s powerful class attribute system can make it easier to describe people, places, events, and other common types of semistructured information in human-readable form.

Microformats are better adapted to the blogosphere for some seemingly minor reasons. In Fred Brooks’ classic “No Silver Bullet” essay, he distinguishes between accidental and essential features of a problem. It might well be that the essential challenge in publishing a social network is precisely encoding the great variety of personal, professional, and genealogical relationships between people and organizations. By contrast, an accidental challenge is that any blogger with some knowledge of HTML can add microformat markup to a text-input form, but uploading an external file dedicated to machine-readable use remains forbiddingly complex with most blogging tools.

So, although any intelligent designer ought to be able to rely on the long-established facility of file transfer to publish the “right” model of a social network, the path of least resistance might favor adding one of a handful of fixed tags to an existing indirect form: the “blogroll” of hyperlinks to other people’s sites.

Sure, the XHTML Friends Network (XFN) microformat might be a weaker abstraction than the RDF-based Friend-of-a-Friend (FOAF) format [1], but choosing to merely “pave the cow path” of existing writing styles as simply as possible could actually lead to significant adoption, which is what really matters for standards of any stripe.

In this column, I’ll take a more detailed look at some examples of microformats, the general principles by which they can be constructed, and how a community of users is forming around these seemingly ad hoc specifications to advance the cause of what some call an alternative to the Semantic Web: the “lowercase semantic web.”

The Lowercase Semantic Web

Suppose you wanted to publicize an upcoming lecture. The existence of vCalendar [2] should be the end of the debate on how to do so: it’s a widely acclaimed international standard that thoroughly addresses such calendaring and scheduling concerns as time zones, recurrences, owners, and presenters. You’d simply link to a myEvent.vcs file that looked something like this example from Ryan King’s primer on the topic [3]:

    BEGIN:VCALENDAR
    BEGIN:VEVENT
    SUMMARY:Microformats: What the Hell Are They and Why Should I Care?
    DTSTART:20050926T000000Z
    LOCATION:Balder Room
    DTEND:20050926T010000Z
    DESCRIPTION:Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.
    END:VEVENT
    END:VCALENDAR

Given that this doesn’t look very human-readable, you’d assume it was well-suited to machine-readability. Given that today’s machines prefer angle brackets to colon-delimited header-value pairs, however, RDF Calendar [4] is a “better” alternative for use with the Semantic Web. For example, Masahide Kanzaki’s RDFical-a-matic tool (www.kanzaki.com/docs/sw/rdfical-a-matic.html) would generate the following output for the same event listing:

    <rdf:RDF
        xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        xmlns='http://www.w3.org/2002/12/cal/ical#'>
      <Vcalendar>
        <prodid>-//kanzaki.com//RDFCal 1.0//EN</prodid>
        <version>2.0</version>
        <method>PUBLISH</method>
        <component>
          <Vevent>
            <dtstart rdf:parseType='Resource'>
              <dateTime>2005-09-26T00:00:00Z</dateTime>
            </dtstart>
            <dtend rdf:parseType='Resource'>
              <dateTime>2005-09-26T01:00:00Z</dateTime>
            </dtend>
            <summary>Microformats: What the Hell Are They and Why Should I Care?</summary>
            <description>Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.</description>
            <location>Balder Room</location>
            <dtstamp>20051012T061505Z</dtstamp>
            <uid>[email protected]</uid>
          </Vevent>
        </component>
      </Vcalendar>
    </rdf:RDF>

Both of these alternatives must be stored as external resources and hyperlinked into the original Web page from HTML anchor text such as:

    <a href="/myEvent.vcs">
      <b>Microformats: What the Hell Are They and Why Should I Care?</b>
      <p>Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.</p>
      <small>September 25th, 2005, 5-6PM in the <i>Balder Room</i></small>
    </a>

In modern usage, of course, this HTML is an abomination of inline formatting. Applying Cascading Stylesheets (CSS) is simpler, more flexible, and more accessible than relying on such relics of the “browser wars” as <small>. You could instead mark up the same text as follows and let an external stylesheet define each string’s look and feel:

    <div class="vcalendar vevent">
      <span class="summary">Microformats: What the Hell Are They and Why Should I Care?</span>
      <p class="description">Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.</p>
      <abbr class="dtstart" title="20050925T170000-0700">September 25th, 2005, 5</abbr>-<abbr class="dtend" title="20050925T180000-0700">6PM</abbr>
      in the <span class="location">Balder Room</span>
    </div>

The payoff for choosing the class names I used in this example (vcalendar, vevent, summary, description, dtstart, dtend, and location) is that they eliminate the need for linking to an external representation in the first place. The inline style information is as sufficient as the other formats for encoding the same information, especially when combined with some of the lesser-used corners of the XHTML specification to “abbreviate” the machine-readable ISO-8601 timestamps [5] with human-readable phrases.

Unlocking XHTML’s Power

“But a Web full of XML documents of arbitrary application, ‘plain XML’? That future never happened.” (David Janes, Blogmatrix.com)

Despite this seemingly ad hoc process, there’s actually a fairly principled transformation for encoding event metadata into XHTML. Let’s look at how it works now, and return to the “why” in the next section.

When XML was new, CSS was scarce, and the browser wars raged, HTML was often cast as a hopeless muddle. The “Web of HTML” was poised to give way to a “Web of XML” in which each publisher used its own tags and presentation logic to empower a new generation of browsers. Today, users have access to fairly full-featured XML+XSL browsers on the desktop, but it’s too late. Like Java, XML’s niche turned out to be on the server side, out of sight.

In the meantime, HTML grew up and became a proper XML application, XHTML, offering all the […]

[…]fication out of thin air, for example, the hCalendar microformat (www.microformats.org/wiki/hcalendar) reuses the names, objects, properties, values, types, hierarchies, and constraints from IETF’s iCalendar [5] (itself a pared-down profile of vCalendar). It doesn’t even insist on a clever prefix: it still uses the vevent field because that’s the case-insensitive transliteration of the label in the original specification.

In the same way, spaces become dashes and plurals are reduced to singular instances (for example, categories in iCalendar becomes category in hCalendar). The latter rule works by expanding lists of values into multiple sibling elements in the Document Object Model (DOM). Similarly, hierarchical containment relationships in the originals are represented by nesting the corresponding microformatted XHTML elements.

Finally, we can abbreviate particularly ugly data using the <abbr> construct. This is useful for dates, for example, not merely because the ISO-8601 format is longer than the original text, but also because the constraints of machine readability can appear inconsistent: a conference ending on the 7th must actually be marked up with a dtend field on the 8th because it’s an exclusive range delimiter.

Each of these rules elaborates one basic theme: use the most semantically appropriate XHTML element in the first place. The preceding examples included <div>s and <span>s, but those are actually the last resort. Better choices are to use existing list, dictionary, link, or quote constructs.
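
The naming and abbreviation rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any hCalendar tooling: the helper names (class_name, property_spans, exclusive_dtend, dtend_abbr) are hypothetical, the plural-to-singular table holds only the one example the column gives, the comma splitting ignores iCalendar's escaping rules, and the date-only YYYYMMDD value for an all-day event is an assumed encoding.

```python
from datetime import date, timedelta
from html import escape

# Hypothetical lookup table: only "categories" -> "category" comes from the
# column; any further entries would have to be added by hand.
SINGULAR = {"categories": "category"}


def class_name(label: str) -> str:
    """Derive a class name from an original property label:
    lowercase it, turn spaces into dashes, reduce known plurals."""
    name = label.strip().lower().replace(" ", "-")
    return SINGULAR.get(name, name)


def property_spans(label: str, value: str) -> list[str]:
    """Expand a comma-separated value list into sibling <span> elements.
    Simplification: escaped commas inside values are not handled."""
    cls = class_name(label)
    items = [item.strip() for item in value.split(",") if item.strip()]
    return [f'<span class="{cls}">{escape(item)}</span>' for item in items]


def exclusive_dtend(last_day: date) -> str:
    """Return the dtend value for an all-day event whose final day is
    last_day. The end is exclusive, so it is the following day."""
    return (last_day + timedelta(days=1)).strftime("%Y%m%d")


def dtend_abbr(last_day: date, text: str) -> str:
    """Wrap human-readable text in <abbr>, carrying the machine-readable
    exclusive end date in the title attribute."""
    title = exclusive_dtend(last_day)
    return f'<abbr class="dtend" title="{title}">{escape(text)}</abbr>'


if __name__ == "__main__":
    print(class_name("VEVENT"))
    print(property_spans("CATEGORIES", "Conference, Web"))
    print(dtend_abbr(date(2006, 3, 7), "7th"))
```

Here class_name("VEVENT") yields vevent and class_name("CATEGORIES") yields category, while a conference whose last day is the 7th gets a title attribute dated the 8th. The reader still sees "7th"; only the machine-readable value carries the exclusive end date.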