Standards
Editor: Peiya Liu, Siemens Corporate Research
XML Topic Maps: Finding Aids for the Web
Michel Biezunski, InfoLoom
Steven R. Newcomb, Coolheads Consulting

IEEE MultiMedia, April–June 2001. 1070-986X/01/$10.00 © 2001 IEEE

Topic maps superimpose an external layer that describes the nature of the knowledge represented in the information resources. There are no limitations on the kinds of information that can be characterized by topic maps. The purpose of the Extensible Markup Language topic maps (XTM) initiative is to apply the topic maps paradigm in the context of the World Wide Web.

Finding information
In a world of infoglut, it's becoming a real challenge to find desired information. Hiding irrelevant information is most effectively and accurately done on the basis of categories, but there are a number of ways to categorize the contents of any corpus, and each system of categorization represents only one particular worldview. Information users shouldn't be forced to use a single ontology, taxonomy, glossary, namespace, or other implicit worldview. On the Web, we should federate and exploit different worldviews simultaneously, even if those worldviews are cognitively incompatible with each other.

Finding information (metadata that helps information seekers to find other information) is often too valuable to limit its exploitability to a single closed or proprietary environment. Finding information should be application- and vendor-neutral, so that users can freely exploit it in many ways and contexts.

Why topic maps?
The topic map paradigm provides a solution for interchanging and federating finding information that diverse sources produce and maintain according to different worldviews (see Figure 1).

Figure 1. A depiction of a corpus of information (a) without topic map management ("Before": infoglut) and (b) with topic map management ("After": information/knowledge management).

Building Standards for Finding Aids
Developers initiated work on topic maps in 1991 when Unix system vendors (and others, including the publisher O'Reilly and Associates) founded the Davenport Group. The vendors were under customer pressure to improve consistency in their printed documentation. Users were concerned about the inconsistent use of terms in the documentation of systems and in published books on the same subjects. Vendors wanted to include independently and seamlessly created documentation under license in their system manuals. One major problem was providing master indexes for independently maintained, constantly changing technical documentation aggregated into system manual sets by the vendors of such systems. The first attempt at a solution to the problem was humorously called SOFABED (Standard Open Formal Architecture for Browsable Electronic Documents).

The problem of providing living master indexes was so fascinating that a new group was created in 1993, called the Conventions for the Application of HyTime (CApH). The group applied the sophisticated hypertext facilities of the ISO 10744 Hypermedia/Time-based Structuring Language (HyTime) standard. HyTime was published in 1992 to provide Standard Generalized Markup Language (SGML, ISO 8879:1986), the standard on which XML is based, with multimedia and hyperlinking features. The Graphic Communications Association Research Institute (GCARI, now called IDEAlliance) hosted the CApH activity.

After the CApH group reviewed the possibilities of extended hyperlink navigation, it elaborated the SOFABED model, renaming it topic maps. By 1995, the model was mature enough that the ISO/JTC1/SC18/WG8 working group accepted it as a new work item, a basis for a new international standard. The topic maps standard was ultimately published as ISO/IEC 13250:2000 (http://www.y12.doe.gov/sgml/sc34/document/0129.pdf).

During the initial phase, the ISO/IEC 13250 model consisted of two constructs: topics and relationships between topics (later called associations). As the project developed, the need emerged for a supplementary construct that could handle filtering based on domain, language, security, and versioning. A vestigial form of the first filtering mechanism, called facet, persists in the ISO standard, but it is not found in XTM 1.0. Scope-based filtering is far more powerful than facet-based filtering, and since implementations of topic maps must support scoping anyway, the facet facility is redundant.

The ISO 13250 standard was finalized in 1999 and published in January 2000. The syntax of ISO Topic Maps is simultaneously open and constrained, expressed as a set of architectural forms. (Architectural forms are structured element templates; this templating facility is the subject of ISO/IEC 10744:1997 Annex A.3, http://www.ornl.gov/sgml/wg8/document/n1920/html/clause-A.3.html.) Applications of ISO 13250 can freely subclass the element types provided by the element type definitions in the standard syntax, and freely rename the element type names, attribute names, and so on. Thus, ISO 13250 meets the requirements of publishers and other power users for the management of their source codes for finding information assets.

However, the advent of XML, and XML's acceptance as the Web's lingua franca for communication between document and database-driven information systems, created a need for a less flexible, less daunting syntax for Web-centric applications and users. Such a syntax is achievable without losing any of the expressive or federating power that the topic maps paradigm provides to topic map authors and users; the XTM specification provides such a syntax.

The XTM initiative began as soon as the ISO 13250 topic maps standard was published. Working with IDEAlliance and others, the authors founded an independent organization called TopicMaps.Org (http://www.topicmaps.org) for the purpose of creating and publishing an XTM 1.0 specification as quickly as possible. In less than one year, TopicMaps.Org was chartered and it delivered the core of the XTM 1.0 specification at the XML 2000 conference in Washington, D.C. on 4 Dec. 2000.

The authors were the founding coeditors of the core deliverables portion of the XTM specification, as well as of the remaining portions of the authoring group review version of the specification. In January 2001, Graham Moore (Empolis) and Steve Pepper (Ontopia) became the new coeditors, and Eric Freese (DataChannel) was appointed the chair of TopicMaps.Org.

Another version of the XTM 1.0 specification was released on 17 Feb. 2001. It contains a corrected version of the XTM 1.0 document type definition, and its Annex F proposes certain constraints on the processing of XTM topic maps. Meanwhile, the authors are independently publishing their ongoing work relevant to topic maps, RDF, the Semantic Web, and so on at http://www.topicmaps.net.

What's a topic map?
A topic map is a representation of information used to describe and navigate information objects. The topic maps paradigm requires topic map authors to think in terms of topics (subjects, topics of conversation, specific notions, ideas, or concepts), and to associate various kinds of information with specific topics. A topic map is an unobtrusive superimposed layer, external to the information objects it makes findable. The findability of a given information object (that is, the ease with which it can be found) has two aspects:

1. The ease with which a list of information objects that is guaranteed to include the information object can be created by means of some query, and

2. The brevity of that list. The shorter the list, the easier it is for a human being to find the desired information object within the list.

A topic map can act as a kind of glue between disparate information objects, allowing all of the objects relevant to a specific concept to be associated with one another. Topic maps are metadata that need not be inside the information they describe.

Interchangeable versus application-internal topic maps
Topic maps take two forms: interchangeable topic maps that are XML or SGML documents, and directly usable topic map graphs that are the application-internal result of processing interchangeable topic maps. Topic map graphs are abstractly described in terms of nodes and arcs. Topic maps can be formatted as specific kinds of finding aids: indexes, glossaries, thesauri, and so on. We sometimes regard formatted finding aids as a third form of a topic map, but this isn't strictly true, because such finding aids cannot necessarily contain or reflect all of the information present in the topic maps from which they were derived.

Topics
A topic map consists of topics, associations between topics, and very little else. The interchange syntax of topic maps provides an additional element type, called [...]

Each name, or basename, consists of a string of characters. Basenames appear in namespaces, and each namespace is defined by a scope; thus, you can address a topic by its name in a given namespace (scope). Basenames can also have variants: icons, sound recordings, strings to be used as sort keys, and so on. The variants of a given basename are distinguishable by sets of topics (parameters) whose subjects are the processing contexts in which the variants are intended to be used.

Topics without names may seem useless, but they're actually frequently used. A simple Web link expressed as an HTML element can be a nameless topic (with a subject that's unspecified, but which is nonetheless quite real) that has two [...]

Scope
Topic characteristics (their names, occurrences, and participations in associations) are all scoped. Scopes, which are themselves sets of topics, define the extent of validity within which topics have each of their characteristics. Topics themselves don't have scope; only their characteristics have scope, and each characteristic has its own scope. The set of topics that defines a given topic characteristic's scope is entirely determined by the author of the topic map, who also determines the [...] any topic characteristic, but there's no requirement that any topic must participate in the definition of any scope.

Subject-based topic merging
It's possible for two or more [...] that exhibits the union of their characteristics (the subject-based merging rule). A similar rule, the name-based merging rule, applies when two [...] whenever two or more topics have the same name in the same scope, they're assumed to have the [...]
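As a rough illustration of how scoped names and name-based merging fit together, the following Python sketch may help. It is not taken from the XTM or ISO 13250 specifications: the `Topic` class, the `merge` and `merge_by_name` functions, the topic identifiers, and the example.org URLs are all hypothetical, and the sketch assumes that a scope can be modeled as a set of topic identifiers and that topics sharing a name in the same scope are to be combined into one topic carrying the union of their characteristics.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic whose characteristics each carry their own scope."""
    # Base names are (name string, scope) pairs; a scope is a frozenset of
    # topic identifiers (an assumption made for this sketch).
    names: set = field(default_factory=set)
    # Occurrences are (resource URL, scope) pairs.
    occurrences: set = field(default_factory=set)


def merge(a, b):
    """Return one topic exhibiting the union of both topics' characteristics."""
    return Topic(a.names | b.names, a.occurrences | b.occurrences)


def merge_by_name(topics):
    """Merge topics that share the same base name in the same scope."""
    merged = []  # invariant: no two topics in this list share a scoped name
    for topic in topics:
        current = topic
        remaining = []
        for other in merged:
            if current.names & other.names:  # same (name, scope) pair
                current = merge(current, other)
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged


# Illustrative data only: two scopes, each a set containing one topic id.
astronomy = frozenset({"astronomy"})
mythology = frozenset({"mythology"})
unconstrained = frozenset()

planet_a = Topic({("Mercury", astronomy)},
                 {("http://example.org/planets/mercury", unconstrained)})
planet_b = Topic({("Mercury", astronomy)},
                 {("http://example.org/orbits/mercury", unconstrained)})
god = Topic({("Mercury", mythology)},
            {("http://example.org/gods/mercury", unconstrained)})

result = merge_by_name([planet_a, planet_b, god])
```

In this sketch the two "Mercury" topics scoped by astronomy collapse into a single topic that holds both occurrences, while the "Mercury" scoped by mythology stays separate because its name lives in a different namespace. Subject-based merging could be sketched the same way by comparing a subject identifier instead of scoped names.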
Topic maps and the semantic Web
There's some overlap between topic maps and the Resource Description Framework (RDF, http://www.w3.org/rdf) specification. Both standards aim to represent connections between information objects and can encode metadata, among other things. Since the Extreme Markup Conference in August 2000 (http://www.extrememarkup.org), where a memorable boxing match between RDF and Topic Maps was held, discussions have been ongoing between those responsible for the two standards. Later, at the Graphic Communications Association (GCA) XML 2000 Conference where the publication of the XTM 1.0 Core Deliverables was announced, Tim Berners-Lee, the Director of the World Wide Web Consortium (W3C), proposed in his presentation on the semantic Web (http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide15-1.html) that there should be a convergence between RDF and topic maps. The implications of such a convergence will include benefits for DARPA's Agent Markup Language (DAML) initiative, as well as for the Ontology Inference Layer (OIL).

The central notions of topic maps will play increasingly significant roles in future generations of Web technology, because the severity of the infoglut problem is only going to increase. The Topic Maps paradigm is designed not only to accommodate diversity; it preserves, cherishes, and leverages diversity in the conquest of infoglut. Whenever a new vocabulary, ontology, and so on appears, it need not be regarded as evidence suggesting that the dream of global knowledge interchange can't be realized. On the contrary, it's cause for hope, because of the knowledge-federating, diversity-leveraging power of topic maps. No great difficulty is posed by the need to welcome yet another community of interest into the global community of communities of interest. Communities of interest are defined by their worldviews, and whenever a community of interest rigorously exposes its worldview in a fashion that permits its knowledge to be federated with the worldviews and knowledge of other communities, the whole human family is enriched.

Related Resources

Articles
We recommend the following articles as further resources.
L. Alschuler, "Going to Extremes," 13 Sept. 2000, http://www.xml.com/pub/a/2000/09/13/extremes.html
L. Alschuler, "Topic Maps," 16 June 2000, http://www.xml.com/pub/a/2000/06/xmleurope/maps.html
T. Berners-Lee, "Semantic Web," presentation at Extensible Markup Language 2000 Conf. (XML 2000), 6 Dec. 2000, http://www.w3.org/2000/Talks/1206-xml2k-tbl
N. Ogievetsky (Cogitech), "XSLT Transformations between ISO Topic Maps and XTM," http://www.cogx.com
S. Saint-Laurent, "Getting Topical," xml.com, 20 Dec. 2000, http://www.xml.com/pub/a/2000/12/20/topicmaps.html

Web Sites
The following Web sites deal with topic maps or expand upon related topics discussed in this article.
M. Biezunski and D. Kennedy, http://www.infoloom.com
Empolis (Bertelsmann), http://www.topicmaps.com
Enabling technology for XML and SGML architectural forms is freely available at http://www.hytime.org/SPt
Robin Cover's extensive bibliography on topic maps and current related events is available from OASIS at http://www.oasis-open.org/cover/topicMaps.html
Information about work in progress on the graphic representation of processing models, including the one being elaborated for XTM and ISO 13250. Page maintained by Sam Hunting, http://www.etopicality.com
Mondeca, topic map products, http://www.mondeca.com
Ontopia, products, http://www.ontopia.net
TM4J, open source software (Kal Ahmed), http://www.techquila.com
Xml.com Resource Guide, http://www.xml.com/pub/rg/Topic_Maps
XTM official Web site, http://www.topicmaps.org

Michel Biezunski and Steven R. Newcomb have been working together since 1992. They cofounded the CApH Group (Conventions for the Application of HyTime), hosted by the Graphic Communications Association Research Institute (now called IDEAlliance). With Martin Bryan, they coedited the ISO/IEC 13250:2000 "Topic Maps" standard. They cofounded TopicMaps.Org, and they were the editors of the first release of XTM 1.0, an interchange syntax for topic maps in XML.

Michel Biezunski is a consultant at InfoLoom (www.infoloom.com). Among other things, he designs and implements topic maps for commercial and government clients. Readers may email Biezunski at [email protected]

Steven R. Newcomb is an independent consultant in information management, with clients in industry and government. He is a coeditor of the HyTime international standard (ISO/IEC 10744:1997, Hypermedia/Time-based Structuring Language). Readers may email Newcomb at [email protected].

Contact Standards editor Peiya Liu, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, email [email protected].