
Standards
Editor: Peiya Liu, Siemens Corporate Research

XML Topic Maps: Finding Aids for the Web

Michel Biezunski, InfoLoom
Steven R. Newcomb, Coolheads Consulting

IEEE MultiMedia, April–June 2001
1070-986X/01/$10.00 © 2001 IEEE

Topic maps superimpose an external layer that describes the nature of the knowledge represented in the information resources. There are no limitations on the kinds of information that can be characterized by topic maps. The purpose of the XML Topic Maps (XTM) initiative is to apply the topic maps paradigm in the context of the World Wide Web.

Finding information

In a world of infoglut, it's becoming a real challenge to find desired information. Hiding irrelevant information is most effectively and accurately done on the basis of categories, but there are many ways to categorize the contents of any corpus, and each system of categorization represents only one particular worldview. Information users shouldn't be forced to use a single ontology, glossary, namespace, or other implicit worldview. On the Web, we should federate and exploit different worldviews simultaneously, even if those worldviews are cognitively incompatible with each other.

Finding information (information that helps information seekers to find other information) is often too valuable to limit its exploitability to a single closed or proprietary environment. Finding information should be application- and vendor-neutral, so that users can freely exploit it in many ways and contexts.

Why topic maps?

The topic map paradigm provides a solution for interchanging and federating finding information that diverse sources produce and maintain according to different worldviews (see Figure 1).

Figure 1. A depiction of a corpus of information (a) without topic map management (before: infoglut) and (b) with topic map management (after: information/knowledge management).

What's a topic map?

A topic map is a representation of information used to describe and navigate information objects. The topic maps paradigm requires topic map authors to think in terms of topics (subjects, topics of conversation, specific notions, ideas, or concepts), and to associate various kinds of information with specific topics. A topic map is an unobtrusive superimposed layer, external to the information objects it makes findable. The findability of a given information object (that is, the ease with which it can be found) has two aspects:

1. The ease with which a list of information objects that is guaranteed to include the information object can be created by means of some query, and

2. The brevity of that list. The shorter the list, the easier it is for a human being to find the desired information object within the list.

A topic map can act as a kind of glue between disparate information objects, allowing all of the objects relevant to a specific concept to be associated with one another. Topic maps are metadata that need not be inside the information they describe.

Interchangeable versus application-internal topic maps

Topic maps take two forms: interchangeable topic maps that are XML or SGML documents, and directly usable topic map graphs that are the application-internal result of processing interchangeable topic maps. Topic map graphs are abstractly described in terms of nodes and arcs. Topic maps can be formatted as specific kinds of finding aids: indexes, glossaries, thesauri, and so on. We sometimes regard formatted finding aids as a third form of a topic map, but this isn't strictly true, because such finding aids cannot necessarily contain or reflect all of the information present in the topic maps from which they were derived.

Topics

A topic map consists of topics, associations between topics, and very little else. The interchange syntax of topic maps provides an additional element type, called <mergeMap>, that refers to another interchangeable topic map with which the containing topic map should be merged at processing time.

A topic is a compound information object, represented and processed by a computer, that serves as a hub to which everything about the subject that the topic represents is connected. In a well-constructed topic map, every topic represents (or has) exactly one subject; a topic can be a computer-processable surrogate for its subject. (The Statue of Liberty can be a subject, but statues can't be processed by computers. A topic whose subject is the Statue of Liberty, however, is processable by a computer. One way to say this is, "A topic reifies a subject.") We chose the word topic because it connotes both subject and location (from the Greek topos, meaning location).

Topic occurrences

Information objects that are external to a topic map, but which bear some relevance to the subject of some topic, are called occurrences of that topic. A topic occurrence is a resource declared to contain some kind of information about the topic under consideration that's useful under certain conditions (that is, within some "scope"). Occurrences can be given user-defined types; the types, like everything else in topic maps, are represented as topics. The idea of classifying occurrences by type is comparable to a (much less expressive) convention that causes some page numbers in an index to appear in boldface.

Topic names

A topic may have zero, one, or several names, in any number of languages, for any set of user contexts and scenarios (such as beginner, expert, casual, official, slang, formal, secret, and so on). Each name, or basename, consists of a string of characters. Basenames appear in namespaces, and each namespace is defined by a scope; thus, you can address a topic by its name in a given namespace (scope). Basenames can also have variants: icons, sound recordings, strings to be used as sort keys, and so on. The variants of a given basename are distinguishable by sets of topics (parameters) whose subjects are the processing contexts in which the variants are intended to be used.

Topics without names may seem useless, but they're actually frequently used. A simple Web link expressed as an HTML <a> element can be a nameless topic (with a subject that's unspecified, but which is nonetheless quite real) that has two occurrences: one with an occurrence type of origin, the other with the occurrence type of target. The subject can be the reason why the link exists, whatever that reason might be (normally some particular notion that's common to all occurrences). There are other similar strategies for upgrading existing Web knowledge assets, because when there's such a link between two resources, it's usually because they're somehow relevant to the same subject. As in nearly all kinds of cross-references, the topic and its subject already exist; they lack only a corresponding formal declaration, and once declared they can be given any number of names.

Unintentional name clashes must be prevented by means of the scoping facilities of topic maps (see the "Scope" section).

Topic associations

Topics can participate in associations with one another. Topic occurrences and associations may be instances of classes of occurrences and associations. The classes, like everything else in topic maps, are also topics. A class topic and an instance topic play their respective roles in a class-instance association. Classes of topics, classes of associations, and the roles played in associations are all user-definable, and they're all topics (that is, they're the subjects of topics that represent them). Associations are composed of members: each topic participating in an association plays a specific member role in the association.

Topic maps neither interfere with nor limit the use of existing categorization schemes in knowledge bases, ontologies, vocabularies, indexes, and so on. All are welcome and supportable, and all can be federated to meet the needs of users.

Instances of associations can be subjected to validation via an association templating facility.

Scope

Topic characteristics (their names, occurrences, and participations in associations) are all scoped. Scopes, which are themselves sets of topics, define the extent of validity within which topics have each of their characteristics. Topics themselves don't have scope; only their characteristics have scope, and each characteristic has its own scope. The set of topics that defines a given topic characteristic's scope is entirely determined by the author of the topic map, who also determines the topics in the topic map. Any topic can be a member of the set of topics that defines the scope of any topic characteristic, but there's no requirement that any topic must participate in the definition of any scope.

Sidebar: Building Standards for Finding Aids

Developers initiated work on topic maps in 1991 when Unix system vendors (and others, including the publisher O'Reilly and Associates) founded the Davenport Group. The vendors were under customer pressure to improve consistency in their printed documentation. Users were concerned about the inconsistent use of terms in the documentation of systems and in books published on the same subjects. Vendors wanted to include independently and seamlessly created documentation under license in their system manuals. One major problem was providing master indexes for independently maintained, constantly changing technical documentation aggregated into system manual sets by the vendors of such systems. The first attempt at a solution to the problem was humorously called SOFABED (Standard Open Formal Architecture for Browsable Electronic Documents).

The problem of providing living master indexes was so fascinating that a new group was created in 1993, called the Conventions for the Application of HyTime (CApH). The group applied the sophisticated facilities of the ISO 10744 Hypermedia/Time-based Structuring Language (HyTime) standard. HyTime was published in 1992 to provide Standard Generalized Markup Language (SGML, ISO 8879:1986), the standard on which XML is based, with multimedia and hyperlinking features. The Graphic Communications Association Research Institute (GCARI, now called IDEAlliance) hosted the CApH activity. After the CApH group reviewed the possibilities of extended navigation, it elaborated the SOFABED model, renaming it topic maps. By 1995, the model was mature enough that the ISO/JTC1/SC18/WG8 working group accepted it as a new work item, a basis for a new international standard. The topic maps standard was ultimately published as ISO/IEC 13250:2000 (http://www.y12.doe.gov/sgml/sc34/document/0129.pdf).

During the initial phase, the ISO/IEC 13250 model consisted of two constructs: topics and relationships between topics (later called associations). As the project developed, the need emerged for a supplementary construct that could handle filtering based on domain, language, security, and versioning. A vestigial form of the first filtering mechanism, called facet, persists in the ISO standard, but it is not found in XTM 1.0. Scope-based filtering is far more powerful than facet-based filtering, and since implementations of topic maps must support scoping anyway, the facet facility is redundant.

The ISO 13250 standard was finalized in 1999 and published in January 2000. The syntax of ISO Topic Maps is simultaneously open and constrained, expressed as a set of architectural forms. (Architectural forms are structured element templates; this templating facility is the subject of ISO/IEC 10744:1997 Annex A.3, http://www.ornl.gov/sgml/wg8/document/n1920//clause-A.3.html.) Applications of ISO 13250 can freely subclass the element types provided by the element type definitions in the standard syntax, and freely rename the element type names, attribute names, and so on. Thus, ISO 13250 meets the requirements of publishers and other power users for the management of their source codes for finding information assets.

However, the advent of XML, and XML's acceptance as the Web's lingua franca for communication between document- and database-driven information systems, created a need for a less flexible, less daunting syntax for Web-centric applications and users. Such a syntax is achievable without losing any of the expressive or federating power that the topic maps paradigm provides to topic map authors and users; the XTM specification provides such a syntax.

The XTM initiative began as soon as the ISO 13250 topic maps standard was published. Working with IDEAlliance and others, the authors founded an independent organization called TopicMaps.Org (http://www.topicmaps.org) for the purpose of creating and publishing an XTM 1.0 specification as quickly as possible. In less than one year, TopicMaps.Org was chartered and delivered the core of the XTM 1.0 specification at the XML 2000 conference in Washington, D.C. on 4 Dec. 2000.

The authors were the founding coeditors of the core deliverables portion of the XTM specification, as well as of the remaining portions of the authoring group review version of the specification. In January 2001, Graham Moore (Empolis) and Steve Pepper (Ontopia) became the new coeditors, and Eric Freese (DataChannel) was appointed the chair of TopicMaps.Org.

Another version of the XTM 1.0 specification was released on 17 Feb. 2001. It contains a corrected version of the XTM 1.0 document type definition, and its Annex F proposes certain constraints on the processing of XTM topic maps. Meanwhile, the authors are independently publishing their ongoing work relevant to topic maps, RDF, the Semantic Web, and so on at http://www.topicmaps.net.

Subject-based topic merging

It's possible for two or more <topic> elements to have the same subject. In such a case, applications merge them into a single abstract topic node that exhibits the union of their characteristics (the subject-based merging rule). A similar rule, the name-based merging rule, applies when two <topic> elements have the same basename string within the same scope.

For purposes of subject-based merging, we assume that two topics have the same subject if they specify that the same resource serves as their subject indicator. A subject indicator is a resource that the topic map author believes will precisely indicate to users the subject of the topic. Similarly, two topics have the same subject if they specify that one and the same resource actually is, rather than indicates, the subject.

Theoretically and ideally, after completing all topic map processing, every topic has exactly one subject, and no subject is represented by more than one topic: there's a one-to-one correspondence between topics and subjects. Merging multiple independently produced topic maps can pose serious challenges. Topic map authors can greatly facilitate the federation of their work with the work of others by taking the trouble to refer to widely used published subject indicators. All topics that have a subject indicator in common readily merge into a single topic. The published subject indicator notion is expected to spawn businesses around lists, indexes, ontologies, vocabularies, and so forth.

Name-based topic merging

Several topics can have the same name (that is, they can be homonyms), and yet not be intended to represent the same subject. For example, New York (the state) isn't the same subject as New York (the city). To make the two topics addressable by their names, it's necessary that the two topics have these names within different namespaces. (The term namespace is used here in the generic sense. A namespace is an abstract place where, when one utters a name, one either gets the one and only thing in that namespace that has the name one uttered, or an error message saying, "There's nothing here that has the name you uttered.") In topic maps, the namespace within which a topic has a name is defined by the scope (that is, the set of topics that defines the scope) within which it has that name. In the New York example, the topic about the subject of New York City might have the basename New York within a scope that includes a topic whose subject is the notion of "cityness," while the scope within which the topic whose subject is New York State has the name New York wouldn't include the "cityness" topic.

The name-based merging rule requires that whenever two or more topics have the same name in the same scope, they're assumed to have the same subject, and they're automatically merged into a single topic node. This rule can be exploited to facilitate topic map maintenance and the merging of independently maintained topic maps. It's up to the author or maintainer of a topic map to decide whether particular topics should be merged, and name-based merging is just one of the knowledge-federation aspects of the paradigm.

When topics with different subjects are merged, the result is always incorrect, and such incorrect merging reduces the usefulness of the resulting merged topic map. To protect against all unintended name-based topic merging when whole topic maps are merged, as well as for other reasons, the <mergeMap> element can add topics to all the scopes of all the topic characteristics declared by the <topicMap> being merged.

Formalisms

The XTM 1.0 specification expresses the high-level concepts of the topic maps paradigm, and the relationships between the concepts, by means of Unified Modeling Language (UML) diagrams.

The XTM interchange syntax, like the ISO 13250 interchange syntax, is expressed as a document type definition (DTD). Although the XML interchange syntax is currently expressed as a DTD, it could also be expressed as an XML schema or in other ways, such as Relax, Schematron, TREX, and so forth. The DTD formalism was chosen because it's by far the most widely supported and most mature formalism for XML syntaxes, and it's completely adequate for the purpose.

Work on rigorous processing models for topic maps is in progress. A rigorous illustration of the authors' deepest understanding of the meaning of topic map information is being maintained by them at http://www.topicmaps.net/pmtm4.htm. The graph representation used to describe at least one kind of processing model for topic maps has recently led to the creation of a visual vocabulary for diagramming topic maps, their various constructs, and the decomposition of those constructs into their underlying concepts. An alternative mathematical expression language for this visual vocabulary is also under development. Because the graph-based language enables the representation of the properties of topic map constructs at their most elementary levels, it's expected to be helpful in showing how the Topic Maps paradigm intersects with (and, ultimately, can interoperate with) other metadata representation and interchange standards.
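The subject-based and name-based merging rules described in this article can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the XTM processing model: the Topic class, the merge_all and lookup functions, the example.org addresses, and the representation of a scope as a frozenset of topic identifiers are all hypothetical names chosen for the example, and two scopes are treated as "the same" only when the sets are exactly equal.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory topic node (not an XTM API)."""
    # Resources (addresses) that indicate the topic's subject.
    subject_indicators: set = field(default_factory=set)
    # (basename string, scope) pairs; a scope is a frozenset of topic ids.
    names: set = field(default_factory=set)
    # Addresses of occurrences.
    occurrences: set = field(default_factory=set)


def should_merge(a, b):
    """Subject-based rule: a shared subject indicator.
    Name-based rule: the same basename string in the same scope."""
    return bool(a.subject_indicators & b.subject_indicators) or bool(a.names & b.names)


def merge_all(topics):
    """Merge topics until no two remaining topics satisfy either rule."""
    merged = []
    for t in topics:
        current = Topic(set(t.subject_indicators), set(t.names), set(t.occurrences))
        while True:
            hits = [m for m in merged if should_merge(current, m)]
            if not hits:
                break
            for m in hits:
                # The merged node exhibits the union of all characteristics.
                current.subject_indicators |= m.subject_indicators
                current.names |= m.names
                current.occurrences |= m.occurrences
            merged = [m for m in merged if not any(m is h for h in hits)]
        merged.append(current)
    return merged


def lookup(topics, name, scope):
    """Address a topic by its name within a namespace (scope)."""
    for t in topics:
        if (name, scope) in t.names:
            return t
    return None  # "There's nothing here that has the name you uttered."


# Assumed scopes for the New York example.
CITY = frozenset({"city"})
STATE = frozenset({"state"})
UNCONSTRAINED = frozenset()

nyc_a = Topic(names={("New York", CITY)},
              occurrences={"http://example.org/nyc-guide"})
nys = Topic(names={("New York", STATE)},
            occurrences={"http://example.org/state-parks"})
nyc_b = Topic(subject_indicators={"http://example.org/psi/nyc"},
              names={("New York", CITY), ("NYC", UNCONSTRAINED)})
nyc_c = Topic(subject_indicators={"http://example.org/psi/nyc"},
              names={("Big Apple", UNCONSTRAINED)})

result = merge_all([nyc_a, nys, nyc_b, nyc_c])
print(len(result))  # 2: one topic for the city, one for the state
```

Here nyc_a and nyc_b merge by name, because both carry "New York" in the city scope, and nyc_c joins them by subject, because it shares a subject indicator with nyc_b. The state topic stays separate because its "New York" lives in a different scope, so lookup(result, "New York", CITY) and lookup(result, "New York", STATE) return different topics.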


Topic maps and the semantic Web

There's some overlap between topic maps and the Resource Description Framework (RDF, http://www.w3.org/rdf) specification. Both standards aim to represent connections between information objects and can encode metadata, among other things. Since the Extreme Markup Conference in August 2000 (http://www.extrememarkup.org), where a memorable boxing match between RDF and Topic Maps was held, discussions have been ongoing between those responsible for the two standards. Later, at the Graphic Communications Association (GCA) XML 2000 Conference, where the publication of the XTM 1.0 Core Deliverables was announced, Tim Berners-Lee, the Director of the World Wide Web Consortium (W3C), proposed in his presentation on the semantic Web (http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide15-1.html) that there should be a convergence between RDF and topic maps. Such a convergence would also benefit the DARPA Agent Markup Language (DAML) initiative and the Ontology Inference Layer (OIL).

The central notions of topic maps will play increasingly significant roles in future generations of Web technology, because the severity of the infoglut problem is only going to increase. The Topic Maps paradigm is designed not only to accommodate diversity; it preserves, cherishes, and leverages diversity in the conquest of infoglut. Whenever a new vocabulary, ontology, and so on appears, it need not be regarded as evidence suggesting that the dream of global knowledge interchange can't be realized. On the contrary, it's cause for hope, because of the knowledge-federating, diversity-leveraging power of topic maps. No great difficulty is posed by the need to welcome yet another community of interest into the global community of communities of interest. Communities of interest are defined by their worldviews, and whenever a community of interest rigorously exposes its worldview in a fashion that permits its knowledge to be federated with the worldviews and knowledge of other communities, the whole human family is enriched.

Sidebar: Related Resources

Articles

We recommend the following articles as further resources.

L. Alschuler, "Going to Extremes," 13 Sept. 2000, http://www.xml.com/pub/a/2000/09/13/extremes.html
L. Alschuler, "Topic Maps," 16 June 2000, http://www.xml.com/pub/a/2000/06/xmleurope/maps.html
T. Berners-Lee, "Semantic Web," presentation at Extensible Markup Language 2000 Conf. (XML 2000), 6 Dec. 2000, http://www.w3.org/2000/Talks/1206-xml2k-tbl
N. Ogievetsky (Cogitech), "XSLT Transformations between ISO Topic Maps and XTM," http://www.cogx.com
S. Saint-Laurent, "Getting Topical," xml.com, 20 Dec. 2000, http://www.xml.com/pub/a/2000/12/20/topicmaps.html

Web Sites

The following Web sites deal with topic maps or expand upon related topics discussed in this article.

M. Biezunski and D. Kennedy, http://www.infoloom.com
Empolis (Bertelsmann), http://www.topicmaps.com
Enabling technology for XML and SGML architectural forms is freely available at http://www.hytime.org/SPt
Robin Cover's extensive bibliography on topic maps and current related events is available from OASIS at http://www.oasis-open.org/cover/topicMaps.html
Information about work in progress on the graphic representation of processing models, including the one being elaborated for XTM and ISO 13250. Page maintained by Sam Hunting, http://www.etopicality.com
Mondeca, topic map products, http://www.mondeca.com
Ontopia, products, http://www.ontopia.net
TM4J, open source (Kal Ahmed), http://www.techquila.com
Xml.com Resource Guide, http://www.xml.com/pub/rg/Topic_Maps
XTM official Web site, http://www.topicmaps.org

Michel Biezunski and Steven R. Newcomb have been working together since 1992. They cofounded the CApH Group (Conventions for the Application of HyTime), hosted by the Graphic Communications Association Research Institute (now called IDEAlliance). With Martin Bryan, they coedited the ISO/IEC 13250:2000 "Topic Maps" standard. They cofounded TopicMaps.Org, and they were the editors of the first release of XTM 1.0, an interchange syntax for topic maps in XML.

Michel Biezunski is a consultant at InfoLoom (www.infoloom.com). Among other things, he designs and implements topic maps for commercial and government clients. Readers may email Biezunski at [email protected].

Steven R. Newcomb is an independent consultant in information management, with clients in industry and government. He is a coeditor of the HyTime international standard (ISO/IEC 10744:1997, Hypermedia/Time-based Structuring Language). Readers may email Newcomb at [email protected].

Contact Standards editor Peiya Liu, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, email [email protected].
