Standards XML Topic Maps: Finding Aids for The
Total Page:16
File Type:pdf, Size:1020Kb
Editor: Peiya Liu Standards Siemens Corporate Research XML Topic Maps: Finding Aids for the Web Michel Biezunski opic maps superimpose an external layer that What’s a topic map? InfoLoom Tdescribes the nature of the knowledge repre- A topic map is a representation of information sented in the information resources. There are no used to describe and navigate information objects. limitations on the kinds of information that can The topic maps paradigm requires topic map Steven R. be characterized by topic maps. The purpose of authors to think in terms of topics (subjects, top- Newcomb the Extensible Markup Language topic maps ics of conversation, specific notions, ideas, or con- Coolheads (XTM) initiative is to apply the topic maps para- cepts), and to associate various kinds of Consulting digm in the context of the World Wide Web. information with specific topics. A topic map is an unobtrusive superimposed layer, external to the Finding information information objects it makes findable. The find- In a world of infoglut, it’s becoming a real chal- ability of a given information object , (that is, the lenge to find desired information. Hiding irrele- ease with which it can be found) has two aspects: vant information is most effectively and accurately done on the basis of categories, but 1. The ease with which a list of information there’s a number of ways to categorize the con- objects that is guaranteed to include the infor- tents of any corpus, and each system of catego- mation object can be created by means of rization represents only one particular worldview. some query, and Information users shouldn’t be forced to use a single ontology, taxonomy, glossary, namespace, 2. The brevity of that list. The shorter the list, the or other implicit worldview. On the Web, we easier it is for a human being to find the should federate and exploit different worldviews desired information object within the list. simultaneously, even if those worldviews are cog- nitively incompatible with each other. A topic map can act as a kind of glue between Figure 1. A depiction of Finding information—metadata that helps infor- disparate information objects, allowing all of the a corpus of information mation seekers to find other information—is objects relevant to a specific concept to be associ- (a) without topic map often too valuable to limit its exploitability to a ated with one another. Topic maps are metadata management and (b) single closed or proprietary environment. Finding that need not be inside the information they with topic map information should be application- and vendor- describe. management. neutral, so that users can freely exploit it in Interchangeable versus application-internal many ways and con- topic maps Why topic maps? texts. Topic maps take two forms: interchangeable Before After The topic map para- topic maps that are XML or SGML documents, digm provides a solu- and directly usable topic map graphs that are the Information/knowledge Infoglut management tion for interchanging application-internal result of processing inter- and federating finding changeable topic maps. Topic map graphs are information that di- abstractly described in terms of nodes and arcs. verse sources produce Topic maps can be formatted as specific kinds of and maintain accord- finding aids: indexes, glossaries, thesauri, and so ing to different world- on. We sometimes regard formatted finding aids as (a) (b) views (see Figure 1). a third form of a topic map, but this isn’t strictly 2 1070-986X/01/$10.00 © 2001 IEEE true, because such finding aids cannot necessarily are distinguishable by sets of topics (parameters) contain or reflect all of the information present in whose subjects are the processing contexts in the topic maps from which they were derived. which the variants are intended to be used. Topics without names may seem useless, but Topics they’re actually frequently used. A simple Web A topic map consists of topics, associations link expressed as an HTML <A> element can be a between topics, and very little else. The inter- nameless topic (with a subject that’s unspecified, change syntax of topic maps provides an addi- but which is nonetheless quite real) that has two tional element type, called <mergeMap>, that occurrences—one with an occurrence type of ori- refers to another interchangeable topic map with gin, the other with the occurrence type of target. which the containing topic map should be The subject can be the reason why the <A> link merged at processing time. exists, whatever that reason might be (normally A topic is a compound information object— some particular notion that’s common to all represented and processed by a computer—that occurrences). There are other similar strategies for serves as a hub to which everything about the sub- upgrading existing Web knowledge assets, because ject that the topic represents is connected. In a it’s usually the case that when there’s such a link well-constructed topic map, every topic represents between two resources, it’s because they’re some- (or has) exactly one subject; a topic can be a com- how relevant to the same subject. As in nearly all puter-processable surrogate for its subject. (The kinds of cross-references, the topic and its subject Statue of Liberty can be a subject, but statues can’t already exist—they only lack a corresponding for- be processed by computers. A topic whose subject mal declaration, and they can then be given any is the Statue of Liberty, however, is processable by number of names. a computer. One way to say this is, “A topic reifies Unintentional name clashes must be prevent- a subject.”) We chose the word topic because it ed by means of the scoping facilities of topic maps connotes both subject and location (from the (see the “Scope” section). Greek topos, meaning location). Topic associations Topic occurrences Topics can participate in associations with one Information objects that are external to a topic another. Topic occurrences and associations may map—but which bear some relevance to the sub- be instances of classes of occurrences and associa- ject of some topic—are called occurrences of that tions. The classes—like everything else in topic topic. A topic occurrence is a resource declared to maps—are also topics. A class topic and an contain some kind of information about the topic instance topic play their respective roles in a class- under consideration that’s useful under certain instance association. Classes of topics, classes of conditions (that is, within some “scope”). Occur- associations, and the roles played in associations rences can be typed with user-defined semantics; are all user-definable, and they’re all topics (for the types, like everything else in topic maps, are example, they’re the subjects of topics that repre- represented as topics. The idea of classifying sent them). Associations are composed of mem- occurrences by type is comparable to a (much less bers—each topic participating in an association expressive) convention that causes some page plays a specific member role in the association. numbers in a book index to appear in boldface. Topic maps neither interfere with nor limit the use of existing categorization schemes in knowledge Topic names bases, ontologies, taxonomies, vocabularies, index- A topic may have zero, one, or several names, es, and so on. All are welcome and supportable, in any number of languages, for any set of user and all can be federated to meet the needs of users. contexts and scenarios (such as beginner, expert, Instances of associations can be subjected to casual, official, slang, formal, secret, and so on). validation via an association templating facility. Each name, or basename, consists of a string of April–June 2001 characters. Basenames appear in namespaces, and Scope each namespace is defined by a scope; thus, you Topic characteristics (their names, occurrences, can address a topic by its name in a given name- and participations in associations) are all scoped. space (scope). Basenames can also have variants— Scopes—which are themselves sets of topics— icons, sound recordings, strings to be used as sort define the extent of validity within which topics keys, and so on. The variants of a given basename have each of their characteristics. Topics them- 3 Standards Building Standards for Finding Aids Developers initiated work on topic maps in 1991 when Unix lished in January 2000. The syntax of ISO Topic Maps is simul- system vendors (and others, including the publisher O’Reilly taneously open and constrained, expressed as a set of archi- and Associates) founded the Davenport Group. The vendors tectural forms. (Architectural forms are structured element were under customer pressure to improve consistency in their templates; this templating facility is the subject of ISO/IEC printed documentation. Users were concerned about the 10744:1997 Annex A.3, http://www.ornl.gov/sgml/wg8/ inconsistent use of terms in the documentation of systems and document/n1920/html/clause-A.3.html.) Applications of ISO in published books on the same subjects. Vendors wanted to 13250 can freely subclass the element types provided by the include independently and seamlessly created documentation element type definitions in the standard syntax, and freely under license in their system manuals. One major problem was rename the element type names, attribute names, and so on. providing master indexes for independently maintained, con- Thus, ISO 13250 meets the requirements of publishers and stantly changing technical documentation aggregated into sys- other power users for the management of their source codes tem manual sets by the vendors of such systems. The first for finding information assets. attempt at a solution to the problem was humorously called However, the advent of XML—and XML’s acceptance as the SOFABED (Standard Open Formal Architecture for Browsable Web’s lingua franca for communication between document Electronic Documents).