Standards
Editor: Jim Whitehead • [email protected]

Microformats: The Next (Small) Thing on the Semantic Web?

Rohit Khare • CommerceNet

“Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.” — Microformats.org

When we speak of the “evolution of the Web,” it might actually be more appropriate to speak of “intelligent design”: we can actually point to a living, breathing, and actively involved Creator of the Web. We can even consult Tim Berners-Lee’s stated goals for the “promised land,” dubbed the Semantic Web. Few presume we could reach those objectives by randomly hacking existing Web standards and hoping that “natural selection” by authors, software developers, and readers would ensure powerful enough abstractions for it.

Indeed, the elegant and painstakingly interlocked edifice of technologies, including RDF, XML, and query languages, is now growing powerful enough to attack massive information challenges in disciplines such as bioinformatics. All the same, incremental, messy innovation continues to take hold in fits and starts within such narrower ecological niches of the Web as blogging.

A prime example of this phenomenon is microformats, a new approach to encoding semistructured information in ordinary XHTML. Clever application of several existing XHTML elements and its powerful class attribute system can make it easier to describe people, places, events, and other common types of semistructured information in human-readable form.

Microformats are better adapted to the blogosphere for some seemingly minor reasons. In his classic “No Silver Bullet” essay, Fred Brooks distinguishes between accidental and essential features of a problem. It might well be that the essential challenge in publishing a social network is precisely encoding the great variety of personal, professional, and genealogical relationships between people and organizations. By contrast, an accidental challenge is that any blogger with some knowledge of HTML can add markup to a text-input form, but uploading an external file dedicated to machine-readable use remains forbiddingly complex with most blogging tools.

So, although any intelligent designer ought to be able to rely on the long-established facility of file transfer to publish the “right” model of a social network, the path of least resistance might favor adding one of a handful of fixed tags to an existing indirect form: the “blogroll” of hyperlinks to other people’s sites.

Sure, the XHTML Friends Network (XFN) microformat might be a weaker abstraction than the RDF-based Friend-of-a-Friend (FOAF) format [1], but choosing to merely “pave the cow path” of existing writing styles as simply as possible could actually lead to significant adoption, which is what really matters for standards of any stripe.

In this column, I’ll take a more detailed look at some examples of microformats, the general principles by which they can be constructed, and how a community of users is forming around these seemingly ad hoc specifications to advance the cause of what some call an alternative to the Semantic Web: the “lowercase semantic web.”

The Lowercase Semantic Web

Suppose you wanted to publicize an upcoming lecture. The existence of vCalendar [2] should be the end

68 JANUARY • FEBRUARY 2006 Published by the IEEE Computer Society 1089-7801/06/$20.00 © 2006 IEEE IEEE COMPUTING Microformats

of the debate on how to do so: it’s a widely acclaimed international standard that thoroughly addresses such calendaring and scheduling concerns as time zones, recurrences, owners, and presenters. You’d simply link to a myEvent.vcs file that looked something like this example from Ryan King’s primer on the topic [3]:

    BEGIN:VCALENDAR
    BEGIN:VEVENT
    SUMMARY:Microformats: What the Hell Are They \
      and Why Should I Care?
    DTSTART:20050926T000000Z
    LOCATION:Balder Room
    DTEND:20050926T010000Z
    DESCRIPTION:Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.
    END:VEVENT
    END:VCALENDAR

Given that this doesn’t look very human-readable, you’d assume it was well-suited to machine-readability. Given that today’s machines prefer angle brackets to colon-delimited header-value pairs, however, RDF Calendar [4] is a “better” alternative for use with the Semantic Web. For example, Masahide Kanzaki’s RDFical-a-matic tool (www.kanzaki.com/docs/sw/rdfical-a-matic.html) would generate the following output for the same event listing:

    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             xmlns='http://www.w3.org/2002/12/cal/ical#'>
      <Vcalendar>
        <prodid>-//kanzaki.com//RDFCal 1.0//EN</prodid>
        <version>2.0</version>
        <method>PUBLISH</method>
        <component>
          <Vevent>
            <dtstart>2005-09-26T00:00:00Z</dtstart>
            <dtend>2005-09-26T01:00:00Z</dtend>
            <summary>Microformats: What the Hell Are They and Why Should I Care?</summary>
            <description>Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.</description>
            <location>Balder Room</location>
            <dtstamp>20051012T061505Z</dtstamp>
            <uid>[email protected]</uid>
          </Vevent>
        </component>
      </Vcalendar>
    </rdf:RDF>

Both of these alternatives must be stored as external resources and hyperlinked into the original Web page from HTML anchor text such as:

    Microformats: What the Hell Are They and Why Should I Care?
    Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.
    September 25th, 2005, 5-6PM in the Balder Room

In modern usage, of course, this HTML is an abomination of inline formatting. Applying Cascading Stylesheets (CSS) is simpler, more flexible, and more accessible than relying on such relics. You could use an external stylesheet to define each string’s look and feel, with markup like the following:

    <div class="vevent">
      <div class="summary">Microformats: What the Hell Are They and Why Should I Care?</div>
      <div class="description">Ryan King will explain why microformats are important and how you can mark up specific kinds of content in ways that make it easier for the right people to find your stuff.</div>
      <abbr class="dtstart" title="20050926T050000-0700">September 25th, 2005, 5</abbr>-<abbr class="dtend" title="20050926T060000-0700">6PM</abbr> in the <span class="location">Balder Room</span>
    </div>

The payoff for choosing the class names I used in this example (vevent, summary, description, dtstart, dtend, and location) is that they eliminate the need for linking to an external representation in the first place. The inline style information is just as sufficient as the other formats for encoding the same information, especially when combined with some of the lesser-used corners of the XHTML specification to “abbreviate” the machine-readable ISO-8601 timestamps [5] with human-readable phrases.

Unlocking XHTML’s Power

“But a Web full of XML documents of arbitrary application, ‘plain XML’? That future never happened.”
– David Janes (Blogmatrix.com)

Despite this seemingly ad hoc process, there’s actually a fairly principled transformation for encoding event information into XHTML. Let’s look at how it works now, and return to the “why” in the next section.

When XML was new, CSS was scarce, and the browser wars raged, HTML was often cast as a hopeless muddle. The “Web of HTML” was poised to give way to a “Web of XML” in which each publisher used its own tags and presentation logic to empower a new generation of browsers. Today, users have access to fairly full-featured XML+XSL browsers on the desktop, but it’s too late. Like Java, XML’s niche turned out to be on the server side, out of sight.

In the meantime, HTML grew up and became a proper XML application, XHTML, offering all the rigor and modularization that an information architect could ask for. Similarly, CSS support in browsers, on PCs and handhelds, and in print matured to the point that authors and designers adopted it broadly. This was the key ecological change that triggered a resurgence of experimentation with “plain old” HTML.

If XML’s essential strength, decentralized evolution, […] then there’s little to be gained by simply renaming the problem of Babel by encouraging random mutation of new CSS selector names (classes) instead. Technologically, XHTML class attributes do add a critical degree of freedom, insofar as they can accommodate multiple values in space-separated lists.

Socially, however, the key insight that the microformats community is taking advantage of is “appealing to authority”: stealing selector names outright from well-established standards rather than reinventing the wheel.

Rather than creating a new calendaring specification out of thin air, for example, the hCalendar microformat (www.microformats.org/wiki/) reuses the names, objects, properties, values, types, hierarchies, and constraints from IETF’s iCalendar [5] (itself a pared-down profile of vCalendar). It doesn’t even insist on a clever prefix: it still uses the vevent field because that’s the case-insensitive transliteration of the label in the original specification.

In the same way, spaces become dashes and plurals are reduced to singular instances (for example, categories in iCalendar becomes category in hCalendar). The latter rule works by expanding lists of values into multiple sibling elements in the Document Object Model (DOM). Similarly, hierarchical containment relationships in the originals are represented by nesting the corresponding microformatted XHTML elements.

Finally, we can abbreviate particularly ugly data using the <abbr> construct. This is useful for dates, for example, not merely because the ISO-8601 format is longer than the original text, but also because the constraints of machine readability can appear inconsistent: a conference ending on the 7th must actually be marked up with a dtend field on the 8th because it’s an exclusive range delimiter.

Each of these rules elaborates one basic theme: use the most semantically appropriate XHTML element in the first place. The preceding examples included <div>s and <span>s, but those are actually the last resort. Better choices are to use existing list, dictionary, link, or quote constructs.

hCard How-To

Continuing in a practical vein, let’s deconstruct the hCard microformat (www.microformats.org/wiki/). Start with an IETF specification for using vCard in email [7]. Applying the rule of using the most appropriate XHTML element, the URL data field becomes a class on an anchor (<a class="url" href="...">...</a>); EMAIL becomes a mailto: link; and PHOTO becomes a class on images.

Some data fields can occur more than once, or have further internal structure. Singular keys such as a formatted name (fn) are resolved by using only the first matching descendant element; given that a person can have multiples of things like telephone numbers, however, every instance of a descendant element with class tel should be preserved, each with its own additional flags such as home, work, fax, or pref (“preferred”).

Finally, we must evaluate the results of applying these transliteration rules for how well they balance human- and machine-readability. For example, the information about whether a phone line supports faxing or goes to the person’s residence should be kept visible. Putting it in a class attribute would hide it from the reader, so hCard adds an extra layer of indirection using the class type, with any of the vCard list of telephone types inserted as visible text instead.

The astute reader with an eye toward localization might note that this overloads the use of a previously machine-readable flag. The four-letter string “home” might also be meaningful in English, but we’re still waiting for real-world experience with the complications of, say, German: Haupttelefon.

Mapping the tel field to separate attributes forces the introduction of a value class to delimit where the actual digits begin:

    <span class="tel">Our <span class="type">work</span> number: <span class="value">+1.415.555.WORK</span></span>

Breaking tel into a type and value keeps the qualifier “work” visible and separates it from the actual phone number.

Other elements once considered mandatory, such as prodid, version, and source, became less relevant and were dropped. Over time, hCard users also gained enough experience to suggest author-friendly optimizations, such as using the words in a formatted name to derive the implied given-name and family-name of the corresponding compound name property, n; preferring organization-name when an organizational-unit isn’t mentioned; and assuming that an hCard represents corporate contact information when fn and org both appear on the same element.

Overall, transforming an existing schema into a microformat is a relatively straightforward process. Once we can express it in HTML, it becomes easier to adopt for authors comfortable with HTML, and even more so with the advent of template-driven content-management systems such as blogging tools that natively support microformats. Transforming it into XHTML also offers positive feedback in the form of easy-to-apply CSS stylesheets that make contact information more attractive.

The same ability to extract information from XHTML using class selectors has enabled a wide range of tools and utilities. The hCard Creator JavaScript form (www.tantek.com/microformats/hcard-creator.) and a similar editor for hCalendar can even be inserted into existing editable text areas using a bookmarklet. The X2V transformation service applies an Extensible Stylesheet Language Transformations (XSLT) template to export hCards and hCalendar entries found on a given Web page into .vcf or .ics files suitable for import into any standard address book or calendar application. Brian Suda currently operates this service on a voluntary basis (http://suda.co.uk/projects/X2V/), but Technorati will also soon be operating a large-scale public transformation service using this ([…] followed by the URL of the page to analyze). Unfortunately, because the vCard legacy format predated the wide availability of Unicode, the X2V script requires users to be aware of which international character set and encoding to use.

Even more exciting are applications that use Greasemonkey (www.diveintogreasemonkey.org) to find, edit, and share microformats found while surfing (see p. 3 for more on Greasemonkey). So-called “user scripts,” such as Mark Pilgrim’s MagicLine (www.mozdev.org/pipermail/greasemonkey/2005-August/004738.html) and Monkey Do (www.mozdev.org/pipermail/greasemonkey/2005-August/005030.html), are already detecting, parsing, storing, sharing, and searching snippets of structured data captured from Web pages.

The h* Effect

Skeptics might well note that the examples I’ve presented so far are just two facets of the same specification. A cynic might go further and ask whether microformats are simply a matter of slapping an “h” in front of existing specifications: call it the “h* effect.”

Microformat advocates might celebrate such criticism because it underscores the philosophy of “reduce, reuse, and recycle,” shorthand for several design principles that contrast strongly with existing standards bodies and processes.


Reduce

The microformats community and process serves to focus attention on a specific problem (“How can we point to licensing terms for blog posts?”) and favors the simplest solutions. This is so central to its culture that it is enshrined in an official manifesto of sorts on the microformats.org Web site.

Reuse

Never proceed from a priori reasoning alone; work from experience and favor examples of current practice. Always keep in mind Picasso’s dictum, “lesser artists borrow; great artists steal.” Avoid the “not invented here” syndrome and embrace existing, widely adopted schemas.

In fairness, XML advocates would readily salute the same flag. Creating a new document-type definition (DTD) might be easy, but most would prefer to reuse existing standards. The key point of departure with the Semantic Web community is the (lowercase) semantic web community’s rallying cry: “design for humans first, machines second.”

Recycle

Make sure the results make it easy to decentralize innovation by encouraging modularity and the ability to embed. By ensuring that microformats […] blog posts, and RSS feeds, and anywhere else you can access the Web.

Presentable and Parsable

Sam Ruby’s postulate states, “The accuracy of metadata is inversely proportional to the square of the distance between the data and the metadata” (www.intertwingly.net/slides/2004/devcon/68.html). Combining this with the venerable maxim “out of sight, out of mind” explains why the microformats community insists on keeping semistructured information in-band and visible.

Thinking about Linking

“If one Web site links to another, the link doesn’t carry any information about why the sites are linked. But what if it did?”
– Tantek Çelik [8]

The most successful microformat of all is tagging. The general idea of attaching short words to describe an item is ancient, but the recent “Web 2.0” enthusiasm for tags can be traced to the del.icio.us shared bookmarking service. Users started choosing tags that weren’t just keywords but also labeled groups and roles (“to-read”). By surfacing, or displaying, items to which other users have applied the same tag, or by suggesting additional tags for a given item, tags power an exciting bottom-up process for collaboratively organizing information.

Another startup, Flickr.com, tagged photographs in the same way, to great effect. In retrospect, it seems inevitable that the approach would be applied to blog posts, too. Technorati.com, which hosted the original developer’s wiki for microformats, reported that it was tracking 20 million tagged posts within six months of the relTag microformat’s introduction. Today, nearly a third of all blog entries include tags [9].

Figure 1 is based on a frame from a short video generated by the Art and Computer Science research group at Carnegie Mellon University to visualize the dramatic adoption rate of relTag. (The 60-second video is well worth watching: www.ourmedia.org/node/37881.) How could tags be retrofitted to such a wide range of blogs (and blogging tools) so rapidly? RelTag (www.microformats.org/wiki/reltag) couldn’t be much simpler: it’s a value for a hyperlink’s REL attribute. To indicate that a diary entry relates to ice cream, for example, you’d insert <a href="..." rel="tag">Ice Cream!</a>. The convention for tagging in HTML is to use the last component of the URL path as the tag name for further indexing, thus letting users cite or create any tag vocabulary they’d like.

[Figure 1. Microformat adoption rate. Data from Technorati and analysis by researchers at Carnegie Mellon University show that blog posters adopted the relTag microformat extremely rapidly. Axes: number of tags (millions, 0 to 20) versus date (Jan. ’05 to June ’05).]

Typed link relations are a mainstay of hypertext theory, but they’ve generally been overlooked on the Web. Consider the social networking phenomenon of blogrolls: lists of authors’ favorite blogs, presented as lists of links in the margins of their homepages. In contrast to more ambitious efforts, such as the RDF FOAF, which more completely capture social-network relationships, the XHTML Friends Network (XFN) took the approach of adding link relationships to existing blogrolls [10]. Table 1 shows XFN’s arbitrary vocabulary for human relationships. Although incomplete in any theoretical sense, the XFN vocabulary aims to solve “80 percent” of the problem.

Table 1. XHTML Friends Network (XFN) vocabulary of human relationships for annotating links between blogs and home pages.

    Relationship   Valid values                          Constraints
    Friendship     Contact, acquaintance, friend         Pick one
    Physical       Met                                   Presumed symmetric
    Professional   Coworker, colleague
    Geographic     Coresident, neighbor
    Family         Child, parent, sibling, spouse, kin   Pick one
    Romantic       Muse, crush, date, sweetheart         Not always symmetric!
    Identity       Me                                    Excludes all other types

The last of those (rel=me) is the most intriguing: it allows authors to link together all of the resources on the Web that represent themselves. It might seem superfluous, but it provides a pivot point for integrating fragmented digital identities, which are currently decentralized across multiple independent, isolated Web sites.

Microstandards?

Link-based microformats emerged first. These include vote links that express authors’ opinions of the linked page, which can be tallied into instant polls by search engines; the nofollow link, designed to avoid influencing search engines’ ranking algorithms when links occurred in comments; and license links that indicate the copyrights that apply to a given document. Today, these are known as elemental microformats, in contrast to the inline semantic markup of text, presented earlier, which are known as compound microformats.

Figure 2 illustrates the stack of standards on which microformats build. Like a sedimentary fossil record, it also happens to be a roughly accurate historical timeline of their emergence. Successful adoption of new standards at each layer has been essential for later diversification and complication in the layers above. The “standards process,” such as it is, also evolved during these phases.

[Figure 2. Standards on which microformats build. Microformats build on successive layers of existing standards. Just as XHTML depends on XML, the compound microformats for semistructured data reuse the simpler elemental microformats. Labels in the figure: compound microformats (hCard, hCalendar, hReview, ...); elemental microformats (XFN, XOXO, RelLicense, RelTag, ...); XMDP; XHTML; XML.]

XFN was a product of the Global Multimedia Protocols Group (GMPG; www.gmpg.org), a self-proclaimed club of a few designers whose name they borrowed from Neal Stephenson’s Snow Crash (1992, Bantam Spectra). The same team defined XHTML Metadata Profiles (XMDP) [11], a format for describing standards like XFN. Although it sounds like a mouthful, XMDP is just a clever way to tell readers a list of the class names and rel/rev link-attribute values that a particular microformat uses. XMDP declarations are linked in from the lesser-known profile attribute of the <head> element in HTML 4.01.

Although it’s not as ambitious as other, more powerful schema description languages, XMDP provides the foundation for microformats. It’s more of a human-readable help file than a machine-readable set of rules for automating parsing and validation. Again, by providing this sort of “80/20” solution (focusing on the smallest subset of the problem that benefits the most users), microformats are making headway as a simple authoring solution while more complete Semantic Web description languages remain less widely adopted.

As the microformats community grew, Technorati hired a few of the initial advocates. This led to an unfortunate conflation between the general concept and a single start-up company’s specific interests. CommerceNet.org, a long-standing nonprofit organization that promotes electronic commerce on the Internet, then helped sponsor the transition to a neutral home at Microformats.org in June 2005 and continues to help promote the technology and support the community in various ways.

An important test of the new regime came earlier this year with the rapid development of hReview, a format for publishing reviews. (See http://cnlabs.commerce.net/~rohit/hReview-in-Review/ for the overview that I presented at the […] 2005 conference.) Whether of books, movies, restaurants, or any other items, reviews are a common idiom in blog posting, and tool developers wanted a common way to share them with search engines that could aggregate community opinions.

Jointly authored by individuals from AOL, […], Yahoo, and Six Apart, among others, hReview was a watershed because it doesn’t appeal to the authority of prior art. Rather than translating one existing specification into XHTML, it builds on widely divergent standards, from the Platform for Internet Content Rating Services (PICS) [12] to the layout idioms of e-commerce sites. The critical break from past standardization attempts was making hReview independent of the items being reviewed; it contains nothing to limit it to only books, movies, restaurants, or so on. Instead, the item property is merely a formatted name, link, image, or hCard (for reviewing a person or corporation). This sidesteps all of the item-specific complexity of mandating bibliographic data, menus, or track lists; it’s just enough metadata to build a better search engine for humans to find and read reviews.

It’s important to note that neither CommerceNet nor Microformats.org is a standards body. The microformats community is an open wiki, mailing list, and Internet Relay Chat (IRC) channel that has proven remarkably scalable and accommodating. The only hard-and-fast rule for participation is that copyrights and patents on resulting specifications must be openly published and entirely royalty-free, respectively. Above and beyond that, the culture values research into existing standards, which helps dampen the tendency to promote too many narrow innovations. “Ruthless self-criticism” is actually one of the community’s stated values.

Random Mutations

Microformats are not the only alternative to the intelligent design of the Semantic Web. The original XML vision is also adapting to the blog environment, this time under the banner of structured blogging. (See www.structuredblogging.org for a good overview.)

Rather than treating XHTML content as authoritative and weaving metadata around it, structured blogging embeds plain XML within […] decentralization for creating new vocabularies, with an additional level of schema constraints for enumerated values and other basic data types. The disadvantage is that the semistructured information is invisible to ordinary browsers and screen readers for the disabled, and it creates less incentive for modularization and sharing of common vocabularies.

A subtler consequence is that structured blogging forces all the structured information on a Web page into a single “ghetto,” an island of XML within the larger HTML document. In contrast, we can add microformats to more complex HTML structures, such as table layouts for agenda grids or formatted presentation of bibliographic records.

In a classic joke, an inventor is showing off his latest gadget to a scientist who says, “Yes, yes, it works in practice, but does it work in theory?” Comparisons between the nascent microformats movement and the Semantic Web tend to raise the same question.

On “developers’ day” at WWW 2005, the organizers held a panel discussion with advocates from both communities, and at least two distinct responses emerged. On one hand, microformatted HTML might be a case of Richard Gabriel’s dictum “worse is better” by precluding “complete” solutions such as FOAF in favor of XFN. On the other hand, it might be the breakthrough on-ramp that the Semantic Web needs: a seedbed of common personal information that grows alongside 600,000-word ontologies for oncology.

Evolution, whether guided intelligently or randomly, will eventually churn out answers through the process of (technological) selection. In the meantime, it might be more evocative to end by considering what microformats are not:

“Microformats are not a new language; infinitely extensible and open-ended; an attempt to get everyone to change their behavior and rewrite their tools; a whole new approach that throws away what already works today; nor a panacea for all taxonomies, ontologies, and other such abstractions.”
– Tantek Çelik

By avoiding as much as possible any pretension to