Knowl. Org. 34(2007)No.4

KNOWLEDGE ORGANIZATION KO

Official Quarterly Journal of the International Society for Knowledge Organization ISSN 0943 – 7444 International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Contents

International Society for Knowledge Organization. 11th General Assembly 2008. Agenda ...... 196

Articles

Fulvio Mazzocchi, Melissa Tiberi, Barbara De Santis, and Paolo Plini. Relational Semantics in Thesauri: Some Remarks at Theoretical and Practical Levels ...... 197
Guglielmo Trentin. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities ...... 215
Jody L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries ...... 227
Koraljka Golub, Thierry Hamon, and Anders Ardö. Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering ...... 247

Book Reviews

Murtha Baca, Patricia Harping, Elisa Lanzi, Linda McCrea, and Ann Whiteside (eds.). Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. Chicago: American Library Association, 2006. 396 p. ISBN 978-0-8389-3564-4 (pbk.) ...... 264
Patrick Lambe. Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Oxford: Chandos, 2007. xix, 277 p. ISBN 978-1-84334-228-1 (hbk.); 978-1-84334-227-4 (pbk.) ...... 266

ISKO News ...... 268
Knowledge Organization Literature 34 (2007) No.4 ...... 269
Personal Author Index 34 (2007) No.4 ...... 282

Contents pages

Mazzocchi, Fulvio, Tiberi, Melissa, De Santis, Barbara, and Plini, Paolo. Relational Semantics in Thesauri: An Overview and Some Remarks at Theoretical and Practical Levels. Knowledge Organization, 34(4), 196-213. 39 references.

ABSTRACT: A thesaurus is a controlled vocabulary designed to allow for effective information retrieval. It consists of different kinds of semantic relationships, with the aim of guiding users to the choice of the most suitable index and search terms for expressing a certain concept. The relational semantics of a thesaurus deal with methods to connect terms with related meanings and are intended to enhance information recall capabilities. In this paper, focused on hierarchical relations, different aspects of the relational semantics of thesauri, and among them the possibility of developing richer structures, are analyzed. Thesauri are viewed as semantic tools providing, for operational purposes, the representation of the meaning of the terms. The paper stresses how theories of semantics, holding different perspectives about the nature of meaning and how it is represented, affect the design of the relational semantics of thesauri. The need for tools capable of representing the complexity of knowledge and of the semantics of terms as it occurs in the literature of their respective subject fields is advocated. It is underlined how this would contribute to improving the retrieval of information. To achieve this goal, even though in a preliminary manner, we explore the possibility of setting against the framework of thesaurus design the notions of language games and hermeneutic horizon.

Trentin, Guglielmo. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities. Knowledge Organization, 34(4), 215-226. 24 references.

ABSTRACT: The use of graphical representations is very common in information technology and engineering. Although these same tools could be applied effectively in other areas, they are not used because they are hardly known or are completely unheard of. This article aims to discuss the results of the experimentation carried out on graphical approaches to knowledge representation during research, analysis and problem-solving in the health care sector. The experimentation was carried out on conceptual mapping and Petri Nets, developed collaboratively online with the aid of the CMapTool and WoPeD graphic applications. Two distinct professional communities have been involved in the research, both pertaining to the Local Health Units in Tuscany. One community is made up of head physicians and health care managers, whilst the other is formed by technical staff from the Department of Nutrition and Food Hygiene. It emerged from the experimentation that concept maps are considered more effective in analyzing the knowledge domain related to the problem to be faced (description of what it is). On the other hand, Petri Nets are more effective in studying and formalizing its possible solutions (description of what to do). For the same reason, those involved in the experimentation have proposed the complementary rather than alternative use of the two knowledge representation methods as a support for professional problem-solving.

DeRidder, Jody L. The Immediate Prospects for the Application of Ontologies in Digital Libraries. Knowledge Organization, 34(4), 227-246. 53 references.

ABSTRACT: The purpose, scope, usage, methodology, cross-mapping and encoding of ontologies are summarized. A snapshot of current research and development includes available tools, ontologies, and query engines, with their applications. Benefits, problems, and costs are discussed, and the feasibility and usefulness of ontologies are weighed with respect to potential and current digital library arenas. The author concludes that ontology application potentially has a huge impact within knowledge management, enterprise integration, e-commerce, and possibly education. Outside of heavily funded domains, feasibility depends on assessment of various evolving factors, including the current tools and systems, level of adoption in the field, time and expertise available, and cost barriers.

Golub, Koraljka, Hamon, Thierry, and Ardö, Anders. Automated classification of textual documents based on a controlled vocabulary in engineering. Knowledge Organization, 34(4), 247-263. 33 references.

ABSTRACT. Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead, it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.

These contents pages may be reproduced without charge.
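The controlled-vocabulary string matching summarized in the Golub, Hamon, and Ardö abstract can be illustrated with a small sketch. This is a hypothetical illustration under stated assumptions, not the authors' algorithm: the vocabulary entries and class codes are invented, and the weighting (a title match counts double) and the relative cut-off (keep classes scoring at least half the top score) are placeholder choices standing in for the paper's term weighting schemes and cut-offs.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabulary: term -> class codes. These entries are invented
# for illustration and are not taken from the Engineering Information thesaurus.
VOCABULARY = {
    "heat exchanger": ["C1"],
    "heat transfer": ["C1"],
    "fluid flow": ["C2"],
    "pump": ["C2"],
    "control system": ["C3"],
}


def find_terms(text, vocabulary):
    """Count whole-phrase, case-insensitive occurrences of each vocabulary term."""
    text = text.lower()
    counts = {}
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[term] = hits
    return counts


def classify(title, abstract, vocabulary, title_weight=2.0, cutoff=0.5, excluded=()):
    """Score classes by matched terms and return those above a relative cut-off.

    Assumed weighting: a match in the title counts `title_weight` times a match
    in the abstract. Terms listed in `excluded` are ignored. A class is kept if
    its score is at least `cutoff` times the best class score.
    """
    scores = defaultdict(float)
    for field, weight in ((title, title_weight), (abstract, 1.0)):
        for term, hits in find_terms(field, vocabulary).items():
            if term in excluded:
                continue
            for cls in vocabulary[term]:
                scores[cls] += weight * hits
    if not scores:
        return []
    best = max(scores.values())
    return sorted(cls for cls, score in scores.items() if score >= cutoff * best)


doc_title = "Heat exchanger design"
doc_abstract = "We study heat transfer and fluid flow in a pump."
print(classify(doc_title, doc_abstract, VOCABULARY))              # ['C1', 'C2']
print(classify(doc_title, doc_abstract, VOCABULARY, cutoff=0.8))  # ['C1']
```

Raising `cutoff` trades recall for precision, and passing terms in `excluded` mimics the term-exclusion step; none of the parameter values reflect settings reported in the paper.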

KNOWLEDGE ORGANIZATION
This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: H. Peter OHLY, Social Science Information Center, Lennestr. 30, D-53113 Bonn, Germany).

Editors
Dr. Richard P. SMIRAGLIA (Editor-in-Chief), Palmer School of Library and Information Science, Long Island University, 720 Northern Blvd., Brookville NY 11548 USA.
Dr. Clément ARSENAULT (Book Review Editor), École de bibliothéconomie et des sciences de l’information, Université de Montréal, C.P. 6128, succ. Centre-ville, Montréal (QC) H3C 3J7, Canada.
Dr. Ia MCILWAINE (Literature Editor), Research Fellow, School of Library, Archive & Information Studies, University College London, Gower Street, London WC1E 6BT U.K.
Dr. Nancy WILLIAMSON (Classification Research News Editor), Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6 Canada.
Hanne ALBRECHTSEN, Institute of Knowledge Sharing, Bureauet, Slotsgade 2, 2nd floor DK-2200 Copenhagen N Denmark.
Gabriel MCKEE (Editorial Assistant), Palmer School of Library and Information Science, Long Island University.

Consulting Editors
Prof. Clare BEGHTOL, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.
Dr. Gerhard BUDIN, Dept. of Philosophy of Science, University of Vienna, Sensengasse 8, A-1090 Wien, Austria.
Prof. Jesús GASCÓN GARCÍA, Facultat de Biblioteconomia i Documentació, Universitat de Barcelona, C. Melcior de Palau, 140, 08014 Barcelona, Spain.
Claudio GNOLI, University of Pavia, Mathematics Department Library, via Ferrata 1, I-27100 Pavia, Italy.
Dr. Rebecca GREEN, Assistant Editor, Dewey Decimal Classification, Dewey Editorial Office, Library of Congress, Decimal Classification Division, 101 Independence Ave., S.E., Washington, DC 20540-4330, USA.
Dr. Birger HJØRLAND, Royal School of Library and Information Science, Copenhagen Denmark.
Dr. Barbara H. KWASNIK, Professor, School of Information Studies, Syracuse University, Syracuse, NY 13244 USA, (315) 443-4547 voice, (315) 443-4506 fax.
Dr. Jens-Erik MAI, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.
Ms. Joan S. MITCHELL, Editor in Chief, Dewey Decimal Classification, OCLC Online Computer Library Center, Inc., 6565 Frantz Road, Dublin, OH 43017-3395 USA.
Dr. Widad MUSTAFA el HADI, URF IDIST, Université Charles de Gaulle Lille 3, BP 149, 59653 Villeneuve D’Ascq, France.
H. Peter OHLY, IZ Sozialwissenschaften, Lennestr. 30, 53113 Bonn Germany.
Dr. Hope A. OLSON, School of Information Studies, 522 Bolton Hall, University of Wisconsin-Milwaukee, Milwaukee, WI 53201 USA.
Ms. Annelise Mark PEJTERSEN, Systems Analysis Dept., Risoe National Laboratory, P.O. Box 49, DK-4000 Roskilde, Denmark.
Dr. M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India.
Prof. Dr. J.F. (Jos) SCHREINEMAKERS, School of Sciences, Department of Mathematics and Computer Science, Section Business Informatics / Informatiekunder, Vrije Universiteit Amsterdam, De Boelelaan 1081a, U3.56, 1081 HV Amsterdam, Netherlands.
Dr. Otto SECHSER, In der Ey 37, CH-8047 Zürich, Switzerland.
Dr. Winfried SCHMITZ-ESSER, Salvatorgasse 23, 6060 Hall, Tirol, Austria.
Dr. Dagobert SOERGEL, College of Information Studies, Hornbake Bldg. (So. Wing), Room 4105, University of Maryland, College Park, MD 20742.
Dr. Eduard R. SUKIASYAN, Vozdvizhenka 3, RU-101000, Moscow, Russia.
Dr. Joseph A. TENNIS, School of Library, Archival and Information Studies, University of British Columbia, 301 - 6190 Agronomy Road, Vancouver, BC V6T 1Z3, Canada.
Dr. Martin van der WALT, Department of Information Science, University of Stellenbosch, Private Bag X1, Stellenbosch 7602, South Africa.
Prof. Dr. Harald ZIMMERMANN, Softex, Schmollerstrasse 31, D-66111 Saarbrücken, Germany.

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal's editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.

Publisher
ERGON-Verlag, Grombühlstr. 7, GER-97080 Würzburg
Phone: +49 (931) 280084; FAX +49 (931) 282872
http://www.ergon-verlag.de

Editor-in-chief (Editorial office)
Dr. Richard P. SMIRAGLIA (Editor-in-Chief), Palmer School of Library and Information Science, Long Island University, 720 Northern Blvd., Brookville NY 11548 USA.

Instructions for Authors
Manuscripts should be submitted electronically (in Word, WordPerfect, or RTF format) in English only to the editor-in-chief and should be accompanied by an indicative abstract of 100 or 200 words. Submissions via email are preferred; submissions will also be accepted via post provided that submissions are accompanied by a 3.5” diskette encoded in Word, WordPerfect, or RTF format.

A separate title page should include the article title and the author’s name, postal address, and E-mail address, if available. Only the title of the article should appear on the first page of the text. To protect anonymity, the author’s name should not appear on the manuscript, and all references in the body of the text and in footnotes that might identify the author to the reviewer should be removed and cited on a separate page. Articles that do not conform to these specifications will be returned to authors.

Criteria for acceptance will be appropriateness to the field of the journal (see Scope and Aims), taking into account the merit of the contents and presentation. The manuscript should be concise and should conform as much as possible to professional standards of English usage and grammar. Manuscripts are received with the understanding that they have not been previously published, are not being submitted for publication elsewhere, and that if the work received official sponsorship, it has been duly released for publication. Submissions are refereed, and authors will usually be notified within 6 to 10 weeks. Unless specifically requested, manuscripts and illustrations will not be returned.

The text should be structured by numbered subheadings. It should contain an Introduction, giving an overview and stating the purpose, a main body, describing in sufficient detail the materials or methods used and the results or systems developed, and a conclusion or summary.

Reference citations within the text should have the following form: (author year). For example, (Jones 1990). Specific page numbers are optional, but preferred when applicable, e.g. (Jones 1990, 100). A citation with two authors would read (Jones & Smith, 1990); three or more authors would be: (Jones et al., 1990). When the author is mentioned in the text, only the date and optional page number should appear in parentheses, e.g. According to Jones (1990), …

References should be listed alphabetically by author at the end of the article. Author names should be given as found in the sources (not abbreviated). Journal titles should not be abbreviated. Multiple citations to works by the same author should be listed chronologically and should each include the author’s name. Articles appearing in the same year should have the following format: “Jones 2005a, Jones 2005b, etc.” Issue numbers are given only when a journal volume is not through-paginated.

Examples:
Dahlberg, Ingetraut. 1978. A referent-oriented, analytical concept theory for INTERCONCEPT. International classification 5: 142-51.
Howarth, Lynne C. 2003. Designing a common namespace for searching metadata-enabled knowledge repositories: an international perspective. Cataloging & classification quarterly 37n1/2: 173-85.
Pogorelec, Andrej and Šauperl, Alenka. 2006. The alternative model of classification of belles-lettres in libraries. Knowledge organization 33: 204-14.
Schallier, Wouter. 2004. On the razor’s edge: between local and overall needs in knowledge organization. In McIlwaine, Ia C. ed., Knowledge organization and the global information society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK. Advances in knowledge organization 9. Würzburg: Ergon Verlag, pp. 269-74.
Smiraglia, Richard P. 2001. The nature of ‘a work’: implications for the organization of knowledge. Lanham, Md.: Scarecrow.
Smiraglia, Richard P. 2005. Instantiation: Toward a theory. In Vaughan, Liwen, ed. Data, information, and knowledge in a networked world; Annual conference of the Canadian Association for Information Science … London, Ontario, June 2-4 2005. Available http://www.cais-acsi.ca/2005proceedings.htm.

Footnotes are not permitted; all narration should be included in the text of the article.

Illustrations should be kept to a necessary minimum and should be submitted electronically when possible. Photographs (including color and half-tone) should be scanned with a minimum resolution of 600 dpi and saved as .tif files (Tagged Image File Format preferred). Tables and figures should be embedded within the document or, alternatively, saved as separate files with clear instructions indicating their placement in the text. Tables should contain a number and title at the top, and all columns and rows should have headings. All illustrations should be cited in the text as Figure 1, Figure 2, etc. or Table 1, Table 2, etc. Illustrations submitted in hard copy only should be marked to indicate their placement in the text.

Upon acceptance of a manuscript for publication, authors must provide a wallet-size photo and a one-paragraph biographical sketch. The photograph should be scanned with a minimum resolution of 600 dpi and saved as a .tif file (Tagged Image File Format).

Advertising
Responsible for advertising: Dr. H.-J. Dietrich, ERGON-Verlag, Grombühlstr. 7, 97080 Würzburg (Germany).

© 2007 by ERGON-Verlag Dr. H.-J. Dietrich. All Rights reserved.

KO is published quarterly by ERGON-Verlag. The price is € 115,00/ann. including airmail delivery.

Scope and Aims

The more scientific data is generated in the impetuous present times, the more ordering energy needs to be expended to control these data in a retrievable fashion. With the abundance of knowledge now available the questions of new solutions to the ordering problem and thus of improved classification systems, methods and procedures have acquired unforeseen significance. For many years now they have been the focus of interest of information scientists the world over.

Until recently, the special literature relevant to classification was published in piecemeal fashion, scattered over the numerous technical journals serving the experts of the various fields such as:

philosophy and science of science
science policy and science organization
mathematics, statistics and computer science
library and information science
archivistics and museology
journalism and communication science
industrial products and commodity science
terminology, lexicography and linguistics

Beginning in 1974, KNOWLEDGE ORGANIZATION (formerly INTERNATIONAL CLASSIFICATION) has been serving as a common platform for the discussion of both theoretical background questions and practical application problems in many areas of concern. In each issue experts from many countries comment on questions of an adequate structuring and construction of ordering systems and on the problems of their use in opening the information contents of new literature, of data collections and survey, of tabular works and of other objects of scientific interest. Their contributions have been concerned with

(1) clarifying the theoretical foundations (general ordering theory/science, theoretical bases of classification, data analysis and reduction)
(2) describing practical operations connected with indexing/classification, as well as applications of classification systems and thesauri, manual and machine indexing
(3) tracing the history of classification knowledge and methodology
(4) discussing questions of education and training in classification
(5) concerning themselves with the problems of terminology in general and with respect to special fields.

Thus, KNOWLEDGE ORGANIZATION is a forum for all those interested in the organization of knowledge on a universal or a domain-specific scale, using concept-analytical or concept-synthetical approaches, as well as quantitative and qualitative methodologies. KNOWLEDGE ORGANIZATION also addresses the intellectual and automatic compilation and use of classification systems and thesauri in all fields of knowledge, with special attention being given to the problems of terminology.

KNOWLEDGE ORGANIZATION publishes original articles, reports on conferences and similar communications, as well as book reviews, letters to the editor, and an extensive annotated bibliography of recent classification and indexing literature.

KNOWLEDGE ORGANIZATION should therefore be available at every university and research library of every country, at every information center, at colleges and schools of library and information science, in the hands of everybody interested in the fields mentioned above and thus also at every office for updating information on any topic related to the problems of order in our information-flooded times.

KNOWLEDGE ORGANIZATION was founded in 1973 by an international group of scholars with a consulting board of editors representing the world’s regions, the special classification fields, and the subject areas involved. From 1974-1980 it was published by K.G. Saur Verlag, München. Back issues of 1978-1992 are available from ERGON-Verlag, too.

As of 1989, KNOWLEDGE ORGANIZATION has become the official organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (ISKO) and is included for every ISKO-member, personal or institutional, in the membership fee (US $ 55/US $ 110).

Rates: From 2006 on for 4 issues/ann. (including indexes) € 115,00 (forwarding costs included). Membership rates see above.

ERGON-Verlag, Grombühlstr. 7, GER-97080 Würzburg; Phone: +49 (931) 280084; FAX +49 (931) 282872; http://www.ergon-verlag.de

The contents of this journal are indexed and abstracted in Referativnyi Zhurnal Informatika and in the following online databases: Information Science Abstracts, INSPEC, Library and Information Science Abstracts (LISA), Library Literature, PASCAL, Sociological Abstracts, and Web Science & Social Sciences Citation Index.

196 Knowl. Org. 34(2007)No.2 ISKO 2008 – Montréal. Call for Papers

International Society for Knowledge Organization 11th General Assembly 2008 Agenda

I am most pleased to invite you to the 11th ISKO General Assembly, which will take place in August 2008 in Montréal, Canada, during the 10th ISKO Conference. All ISKO members are encouraged to attend both the Conference and the General Assembly.

The proposed Agenda is as follows:

1. Opening: Election of General Assembly Chair and Secretary
2. Approval of & Additions to the Agenda
3. Report of the President
4. Report of the Secretary and Treasurer
5. Report of the Editor of the journal Knowledge Organization
6. New ISKO Chapters
7. Reports of the Representatives of ISKO Regional and National Chapters
8. The Eleventh International ISKO Conference
9. Elections of members for the Executive Committee
   a. Election of Secretary/Treasurer
   b. Election of two EC members
10. Any other business

I look forward very much to seeing as many ISKO members as possible at the 10th International Conference in Montréal and at this 11th General Assembly.

María J. López-Huertas, ISKO President.

Knowl. Org. 34(2007)No.4 197 F. Mazzocchi, M. Tiberi, B. De Santis, and P. Plini. Relational Semantics in Thesauri

Relational Semantics in Thesauri: Some Remarks at Theoretical and Practical Levels

Fulvio Mazzocchi*, Melissa Tiberi**, Barbara De Santis***, Paolo Plini****
* / *** / **** Institute for Atmospheric Pollution of CNR, Via Salaria km 29,300, Monterotondo staz., 00015 (RM), Italy
** Central National Library of Florence, Piazza dei Cavalleggeri, 1, I-50122 Florence, Italy

Fulvio Mazzocchi works as a researcher at the Institute for Atmospheric Pollution of the Italian National Research Council in Monterotondo (RM). He studied biological sciences and philosophy at ‘La Sapienza’ University in Rome. He has participated in a number of projects concerned with the design and implementation of thesauri for the environmental domain, such as EARTh and GEMET. His current research interests include the epistemological foundations of knowledge organization and the role of semantics in it.

Melissa Tiberi obtained a degree in philosophy at ‘La Sapienza’ University in Rome. At present, she is working as an external consultant for the National Central Library in Florence, where she is taking part in the development of the Thesaurus of the Nuovo Soggettario. Previously, she also contributed to the development of EARTh by researching the different kinds of semantic relationships and implementing them in the thesaurus.

Barbara De Santis obtained a degree in interpreting and translating (languages: English and German) at the University of Bologna. At present she is working as an external consultant for the Italian National Research Council on the development of the EARTh project, concentrating mainly on multilingual aspects of thesauri.

Paolo Plini was born in 1960 in Rome. He graduated in Natural Sciences from the University of Rome in 1984. Since 1994 he has been a researcher at the Italian National Research Council. At present he is the scientific head of the Environmental Knowledge Organisation Laboratory of the Institute for Atmospheric Pollution. His main activities focus on the design and management of EARTh (Environmental Applications Reference Thesaurus) and of other thesauri on specific environmental topics.

Mazzocchi, Fulvio, Tiberi, Melissa, De Santis, Barbara, and Plini, Paolo. Relational Semantics in Thesauri: An Overview and Some Remarks at Theoretical and Practical Levels. Knowledge Organization, 34(4), 197-214. 39 references.

ABSTRACT: A thesaurus is a controlled vocabulary designed to allow for effective information retrieval. It consists of different kinds of semantic relationships, with the aim of guiding users to the choice of the most suitable index and search terms for expressing a certain concept. The relational semantics of a thesaurus deal with methods to connect terms with related meanings and are intended to enhance information recall capabilities. In this paper, focused on hierarchical relations, different aspects of the relational semantics of thesauri, and among them the possibility of developing richer structures, are analyzed. Thesauri are viewed as semantic tools providing, for operational purposes, the representation of the meaning of the terms. The paper stresses how theories of semantics, holding different perspectives about the nature of meaning and how it is represented, affect the design of the relational semantics of thesauri. The need for tools capable of representing the complexity of knowledge and of the semantics of terms as it occurs in the literature of their respective subject fields is advocated. It is underlined how this would contribute to improving the retrieval of information. To achieve this goal, even though in a preliminary manner, we explore the possibility of setting against the framework of thesaurus design the notions of language games and hermeneutic horizon.
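The three basic relationship types of a traditional thesaurus (equivalence, hierarchical, and associative) can be pictured as a small data structure. The sketch below is a hypothetical toy, not EARTh or any system discussed in the paper: the class name, method names, and sample terms are invented, and the labels USE/UF, BT/NT and RT follow common thesaurus convention. The `expand` method shows one simple way hierarchical relations can serve recall, by adding all narrower terms to a search term.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical toy thesaurus holding the three basic relationship types:
    equivalence (USE/UF), hierarchical (BT/NT) and associative (RT)."""

    def __init__(self):
        self.use = {}               # non-preferred term -> preferred term (USE)
        self.uf = defaultdict(set)  # preferred term -> non-preferred terms (UF)
        self.bt = defaultdict(set)  # term -> broader terms (BT)
        self.nt = defaultdict(set)  # term -> narrower terms (NT)
        self.rt = defaultdict(set)  # term -> related terms (RT)

    def add_equivalence(self, non_preferred, preferred):
        # USE and UF are reciprocal views of one equivalence relation.
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def add_hierarchy(self, broader, narrower):
        # BT and NT are reciprocal views of one hierarchical relation.
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a, b):
        # RT is symmetric: each term points to the other.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term):
        """Map a non-preferred term to its preferred form (or return it unchanged)."""
        return self.use.get(term, term)

    def expand(self, term):
        """Return the preferred form of `term` plus every term below it in the
        hierarchy (NT, NT of NT, ...), a simple recall-oriented expansion."""
        result, stack = set(), [self.preferred(term)]
        while stack:
            current = stack.pop()
            if current not in result:
                result.add(current)
                stack.extend(self.nt.get(current, ()))
        return result


# Invented sample terms, for illustration only.
demo = Thesaurus()
demo.add_equivalence("atmospheric pollution", "air pollution")
demo.add_hierarchy("air pollution", "smog")
demo.add_hierarchy("smog", "photochemical smog")
demo.add_related("air pollution", "air quality")

print(sorted(demo.expand("atmospheric pollution")))
# ['air pollution', 'photochemical smog', 'smog']
```

Storing both directions of each relation keeps BT/NT and USE/UF reciprocal by construction, which is the usual consistency requirement for thesaurus relations; richer relation sub-kinds, such as the partitive subtypes the paper discusses, would need additional relation tables.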

A thesaurus is a controlled vocabulary designed to allow for successful information retrieval (IR). It includes different types of semantic relationships that guide indexers and searchers to the selection of the most suitable terms for expressing given concepts/queries (Dextre Clarke 2001). The relational semantics of a thesaurus are concerned with methods to connect terms with related meanings and are constituted by the set of meaning relationships. The basic relationships which typify a traditional thesaurus are three: hierarchical, associative, and equivalence. Being functional rather than semantic tools stricto sensu, in most cases thesauri do not provide a complete and precise definition of the meaning of terms (Schmitz Esser 1991). The relational structure is designed, in fact, mainly to enhance information recall performance (Svenonius 2000). Nonetheless, thesauri can still be regarded as (operational) semantic tools in the sense that thesaurus relations are semantic relations and that a thesaurus provides the conceptual structure of a subject field (Hjørland 2007).

A number of scholars have stressed the importance of semantic research in relation to information science (IS), and in particular to its subfield of knowledge organization, which is concerned with “the construction, use, and evaluation of semantic tools for IR” (Hjørland 2007, 369). The way meaning is understood can, in fact, have a considerable impact on how knowledge organization systems (KOSs), such as a thesaurus, and their relational semantics are designed and implemented. The primary relationships employed in a thesaurus, although at some levels they reflect certain basic cognitive inclinations of the human form of life (such as the inclination towards classification and hierarchization), are not ‘given’ as such (and thus necessarily and universally valid) but ‘constructed’ and defined within a certain (cultural and) theoretical tradition. In some cases, they are even based on assumptions rooted in centuries of the history of philosophy (Hjørland 2007), as occurs with the notion of genus and species, whose origin can be traced back to Aristotle and which is based on an idea of meaning that has been predominant in Western culture.

A more detailed discussion of this topic is beyond the scope of this paper; it would require further investigation into the nature of semantic relations as mostly theoretical constructs, since they are built within the framework of a cultural form of life (Wittgenstein 1953). The latter is, however, the expression of a most basic human form of life, which defines our primary cognitive means and other basic characteristics as members of the same species.

A number of models for conceptualizing the world have crystallized, and with them certain ways of considering the relationships between words as meaningful. In Western culture, some of these relations (genus-species, synonyms, antonyms, etc.) are common to all knowledge fields. Others are more specific to particular domains (in a thesaurus they can be represented as associative relationship sub-kinds). However, the implementation of any relation always depends on the conceptual and linguistic knowledge of the domain it refers to (in a thesaurus it depends on operational concerns as well).

Thus, in order to acquire a deeper understanding of KOSs as operational semantic tools, it is important to investigate which theories lie behind the principles that determine how the relations are to be established. At the same time, it is also important to explore whether other theoretical approaches exist and whether they can provide useful insights into such issues. An opportunity to examine this topic more deeply is offered by a new trend in the landscape of thesauri. In recent years thesauri have entered a larger area of application, including knowledge and language engineering. As a consequence, in this new framework and for present and future information retrieval and intelligent processing needs, the thesaurus relational structure is likely to require an enlargement and a refinement of its definition. In order to achieve these goals, a more thoughtful exploration of the theoretical bases that guide its development appears to be necessary.

Analyzing different aspects of the relational semantics of thesauri (the focus will be restricted to the hierarchical relationship) is the subject of this paper, which is structured as follows. Section 1 presents the basic roles of relational semantics in thesauri as well as the current trend towards its refinement. After introducing in section 2 the difference between the instance and the generic relationships, in section 3 we investigate a number of issues involved in meaning representation occurring in thesauri through the classificatory and taxonomic aspects of their relational semantics, such as the criteria upon which the construction of the (logical) hierarchical trees is normally based and the distinction between genus-species and perspective hierarchies. In this framework, we also explore what insights may be gained from the perspective of hermeneutics and from Wittgenstein’s notion of language game, together with their possible practical implications for the retrieval of information. Section 4 analyzes the partitive relationship and the possibility of its refinement through a differentiation into distinct subkinds. An overview of existing taxonomies of partitive relations is presented, too. Taking the partitive relationship as a case study, a more general discussion concerning the factors on which the choice of the kind of rela- […]

[…] saurus term and a structured representation of the general understanding of a subject area are provided. As stated by Soergel (1995, 369), in fact, “a good thesaurus provides, through its hierarchy augmented by associative relationships between concepts, a semantic road map for searchers and indexers and anybody else interested in an orderly grasp of a subject field”.

1.2 Trend towards a refinement of the relational semantics

Bearing in mind these important functions of the relational structure, it is then necessary to define the degree of complexity on the basis of which the thesaurus is conceived, in order to ensure its effectiveness for information indexing and retrieval. Methods to measure its richness have already been developed. Examples can range from the number of relation types to more sophisticated indicators, e.g.
the ratio tions, as well as their implementation depend, is also of the number of semantic relations and the number outlined. of terms which are included in a thesaurus (Van Slype 1976). The traditional thesaurus format— 1. Relational semantics in thesauri: which stems from the more than twenty year old its role and possible refinement recommendations of the Standard for thesaurus de- velopment—has been created to cope with informa- 1.1 The (general) role of the relational semantics tion needs in the library and archival fields (Schmitz Esser 1991). Thesauri are tools designed for the purpose of im- However, many things have changed and are pres- proving information retrieval. They are based on a ently changing (this has been partially reflected in natural language that is transformed, however, by the development of new Standards like ANSI/NISO means of certain semantic treatments, into an ‘artifi- Z.39.19.2005). Technological advance, which has also cial’ and normalized language where terms are basi- brought a larger and differentiated community to cally monosemic and relations among them are made search for information on a computer basis, has es- explicit. Two different semantic structures are used tablished a different framework, which requires reas- in order to achieve this scope: the referential and the sessing prior assumptions and reconsidering whether relational semantics (Svenonius 2000). Referential the existing types of relationships still cope with the semantics consists of methods to limit the meanings current needs of information organization. And ac- or referents of thesaurus terms: homonyms and tually, a rather widespread opinion is that the tradi- polysemes are disambiguated in order to improve tional thesaurus format is no longer the best-suited precision in IR. means of dealing with these needs. 
It seems that a It is through the relational semantics of a thesau- richer and hierarchically organized set of relations rus, that is the object of interest of this paper, that would be more clearly apt to face them and, as stated terms are connected to each other when related by Milstead (2001, 65): meanings are identified, devising in this way the rela- tional structure that enhances the information recall There is reason to expect that provision of se- performance, although it can also contribute to im- mantic relationships in controlled vocabularies prove precision by suggesting more specific terms will become much more extensive in a future that can refine the search and help to eliminate un- standard, though this does not automatically wanted information. The network of relations of a mean that users will need to be aware of all thesaurus plays a semantic role since by means of it a kinds of relationships in order to use a particu- further representation of the meaning of each the- lar vocabulary. 200 Knowl. Org. 34(2007)No.4 F. Mazzocchi, M. Tiberi, B. De Santis, and P. Plini. Relational Semantics in Thesauri

Despite the general trend towards an expansion of the semantic structure, the outcome of some past experiments comparing systems that incorporate different degrees of semantic structure seems somehow to question the equation ‘more structure = more effectiveness’. Besides, in order to evaluate the effectiveness of a semantic structure in IR, other factors should be considered, too, such as the comprehensiveness of the language or the manipulation in retrieval of the subject language (Svenonius 2000).

This refinement is necessary to enhance thesaurus suitability for uses in the artificial intelligence (AI) and Semantic Web environments, as well as to increase possibilities for IR. In particular, AI applications are creating a demand for more elaborate KOSs able to ensure higher expressive capabilities in order to allow inference (Dextre Clarke 2001). In such a setting, the traditional relational structure is considered insufficiently detailed and lacking a well-defined semantics. “All the well-known relationships are fuzzy in most thesauri. We could afford to allow them to be fuzzy as long as their only purpose was to achieve the desired degree of order in our documents, which is a modest requirement compared with what we need for Language and Knowledge Engineering” (Schmitz Esser 1991, 145).

Hence, besides enabling a higher (conceptual and lexical) level of user interaction with the KOS, in that the refinement of the relational semantics might improve query formulation and subject browsing, examples of new applications for which such refinement is advocated include supporting automated processing; query expansion; RDF representations of thesauri for the Semantic Web; and interoperability among different KOSs (Soergel et al., 2004; Tudhope et al., 2001).

Finally, the adoption of more expressive semantic relations is also advised to improve the degree of internal structural consistency. In many cases, in fact, the standard set of relationships has not been consistently applied (for instance, many links labelled as hierarchical would be better resolved through an associative relationship). For some authors, this is exactly a consequence of the fact that thesaurus relationships are not provided with a precise semantics (Soergel et al., 2004).

Some advanced thesauri are developing or have already included richer sets of semantic relationships, mainly in the medical domain, as with UMLS or MeSH. A further example is the Italian CNR’s EARTh project (Mazzocchi & Plini, 2005). Other projects, such as the FAO’s AGROVOC, are instead more concerned with the reengineering of thesauri into ontologies. They aim at developing an enriched set of relationships (which would be explicitly labelled and applied with specification of rules and constraints) on the basis of a more fully concept-oriented organizational model, where concepts are regarded as independent from and preceding their designation (Soergel et al., 2004). Indeed, the approach towards building thesauri with an extended relational structure partially converges with the idea and work behind ontology development. An investigation of ontologies, however, is not the focus of the present paper, even though a number of assumptions that are normally associated with them are part of the discussion.

The idea of developing thesauri and other KOSs with a more precise and rich semantics, or of using formal logic methods and employing a notion of concept as if it were an a priori entity, can somehow be viewed as expressions of the same theoretical point of view, based on logical positivism. What is sought is the creation of conditions for an unambiguous interpretation of terms and relationships, mainly to make KOSs suitable for AI applications. According to Svenonius (2004, 585):

The knowledge representations resting upon the epistemological foundations of logical positivism in its operationalist and representational approaches to meaning are … formalized to a greater degree and as such are simpler, more uniform, and relatively free from subjective interpretation. The objectivity they provide through definitional rigor is essential for automated applications in retrieval.

This idea of objectivity, however, conflicts with the fact that meanings and semantic structures in KOSs are always established within a given horizon (reflecting certain theoretical views and applied to specific knowledge domains and operational contexts). While, of course, the choice to reduce the complexity of reality for operational purposes can be made, and attempts to narrow it down to such an extent that it becomes manageable are not rare in the AI tradition, a better refinement and specification of relations or the adoption of a logicist view of semantics does not, as such, eliminate the issues posed by this complexity.

The role played by human judgement in such a task and the multiplicity of different contexts in

which all of this can occur cannot, in fact, be ignored. This is something that we will try to demonstrate throughout the whole paper, with special focus, though, on the hierarchical relationships.

2 An introductory note on the hierarchical relationship in thesauri

The hierarchical relationship connects pairs of terms when the scope of the broader term (BT) fully includes the scope of the narrower term (NT). Generally speaking, the purpose of the hierarchical relationship is to provide a semantic tree pathway, which can be useful both as a tool for semantic control and specification (the meaning of each term is, in fact, partially identified by its position within the tree) and as a navigational aid, by offering users the possibility to choose the terms to employ, when referring to a certain concept, among a range situated at different levels of specificity (Dextre Clarke 2001). This relation comprises the following three different kinds: generic, instantial and partitive. In a restricted number of thesauri they are distinguished as follows:

BTG/NTG: generic
BTP/NTP: partitive
BTI/NTI: instantial

The next section will first introduce the generic and instance relationships. Then, a discussion of the main features of the generic relation and a comparison with perspective hierarchies will follow. Special emphasis will be placed on how any given classification or hierarchization of a term depends on which of its conceptual features are made salient in the light of a given perspective. Section 4, instead, will analyze the partitive relationship.

3 The generic and instance relationships

The generic relationship, also named inclusion, subsumption or hyponymy, connects a genus with its species (e.g., animals—mammals). An important property of this relation, also used as a criterion for its identification, is the inheritance of properties: any attributes of the genus (hypernym) must also be attributable to the species (hyponym). In this sense, the meaning of the hyponym derives from the meaning of the hypernym, plus some additional features. Chaffin et al. (1988) distinguished four kinds of inclusion according to the type of concept involved: natural object-kind; artefact-kind; state-kind; and activity-kind. In the instance relationship the narrower terms are neither parts nor types, but individual instances of the broader terms. In a thesaurus, this characteristic of individuality is expressed through a proper name (e.g., deserts—Sahara desert).

At this stage, the distinction between generic relationship and instantiation seems clearly stated. Nonetheless, Milstead (2001) has emphasized that in the standards for thesauri there is no method used to determine the genus-species relationship that could not also be applied to the instance relationship. For example, the ‘all-and-some’ test, which is used to assess the validity of the generic links (ISO 1986), can be applied to both cases (if grammatical differences in number are admitted). The same is also true for ‘is a’ attribution:

1a. All mammals are animals / Some animals are mammals
1b. All (although only one exists) Sahara desert are deserts / Some (one) deserts are (is) Sahara desert
2a. a mammal is an animal
2b. the Sahara desert is a desert

All of this may also lead one to conceive the instance relationship as a variant of the genus-species relationship. However, unlike the generic one (a concept-to-concept relationship), the instance relationship points to a change of ‘logical level’ (an individual-to-concept relation).

3.1 Associative, perspective and logically-based hierarchies

The hierarchical relationship, and particularly the generic kind, is perhaps the most important within a thesaurus, and its proper application plays a key role in ensuring the quality of a structured vocabulary. But can we estimate such aptness in an abstract sense? It is true that in many thesauri this relationship has been implemented in quite an inconsistent way, often resulting in unpredictable semantic structures (Dextre Clarke 2001). As mentioned before, a higher degree of rigour is thus advocated to improve the level of structural consistency. Nonetheless, different contexts may require different solutions, each having its own implications. Furthermore, it is of the utmost importance to investigate the underlying assumptions that the generic relationship, on the basis of which hierarchical trees

are built, entails, not only to deepen our understanding of it, but also to have the chance to critically analyze these assumptions in the light of a comparison with alternative models.

3.1.1 RT-kind version of hierarchy

Many existing thesauri have labelled as hierarchical the relations between terms not belonging to the same conceptual category. An example can be found in the GEMET thesaurus, where the term Recycling ratio (a parameter) is considered to be a Narrower Term of Recycling (an operation). Relationships like this have been established according to a definition of hierarchy that is ‘pragmatic’ in nature and oriented towards the function of the search process: “Concept A is broader than concept B whenever the following holds: in any inclusive search for A all items dealing with B should be found. Conversely B is narrower than A” (Soergel 1974, 79).

Using such a version of the hierarchical relation can be useful to manage certain databases. But while it may function efficiently at local levels, i.e. in a specific operational context, in a different and wider framework this choice may prove unsatisfactory, since a hierarchy developed in this way would lack consistency with other structures, not conforming to the standard thesaurus format. Moreover, confusion may also arise if RT-kind (associative) hierarchies, like the above example, are labelled in the same way as the genus-species relation (or in any case as a hierarchical kind).

3.1.2 Genus-species and perspective hierarchies

In developing the thesaural relational structure, and thus hierarchies, Foskett (1980) emphasized the importance of the logical perspective: a thesaurus would benefit if the choice of terms and relationships reflected the logical structure of a subject field, instead of being a scarcely systematized gathering of terms extracted from the literature. Other authors, such as Maniez (1988), stressed that the usefulness of logical relationships should be subordinated to the purposes of information indexing and retrieval. Svenonius (2000), for her part, underlines the distinction between genus-species and perspective hierarchies. In a more general sense, this distinction, taken up by a number of thesaurus standards, is expressed as being between paradigmatic/a priori relations (e.g., genus-species) and syntagmatic/a posteriori ones (among them, perspective hierarchies). The genus-species relationship is viewed as logically-based, definitionally true and functioning independently of context. Besides, corresponding to the logical relationship of inclusion, it has been defined in terms of the properties of reflexivity, antisymmetry and transitivity. Conversely, perspective hierarchies are regarded as functioning more contingently in given empirical contexts and depending on the point of view. Normally, they are not provided with the same logical properties as the generic hierarchies. They express, in fact (Svenonius 2000, 164):

Points of view or aspects from which an object or concept is regarded. In many discipline-based classifications, the point of view is the knowledge domain in which the object or concept is located. … The genus-species relationship limits a rat to being a rodent; a perspective relationship allows it to be an agricultural pest, an experimental animal, and so on.

Thesaurus standards argue that relationships to be included in a thesaurus should be a priori rather than a posteriori. However, the genus-species and the perspective relationships can have different functions and, in defining which hierarchical relationships a thesaurus has to be made of, different factors should be taken into consideration, including the characteristics of the vocabulary to be structured and the purpose for which the relations are intended in retrieval. Concerning the first point, Svenonius (2000 and 2004), for example, in terms of hierarchy, considers a stricter logical ordering as particularly apt to structure terms whose meanings are somehow more fixed, e.g. scientific terms, whereas she regards perspective hierarchies as more suitable to represent polysemantic and vague lexicons, as is mostly the case in the social sciences. Regarding the second aspect, the genus-species relation, being logically based, is valuable, for example, for search broadening and narrowing as well as for retrieval strategies playing on inheritance properties. Perspective hierarchies, instead, are not suitable for these applications. Their added value in IR consists of providing contexts that elucidate from which point of view a term is being considered. In this way, they can assist in navigation and are apt for the disambiguation of multireferential terms (Svenonius 2000).

Perspective hierarchies are used by classifications such as the Dewey Decimal Classification (DDC). The term ‘Insect’, for example, while it can be located only in a single genus-species hierarchy (BT:

‘Arthropoda’), can instead pertain to several perspective hierarchies according to the points of view from which its meaning is regarded: an insect can be viewed, for example, as an agricultural pest, a disease carrier, etc. (Svenonius 2000 and 2004). In the EARTh thesaurus, the idea of multiple thematic classifications of terms as a complement to placing them into the genus-species tree has been developed on a similar basis (Mazzocchi & Plini, 2005).

It should be noted that terms linked by perspective hierarchies belong to the same conceptual category. Yet, since these links are based on a situated perspective, they are not amenable to the ‘all-and-some’ test and thus, according to a strict application of the standards, are not accepted as a valid hierarchy. To explain this, ISO 2788 mentioned as examples ‘Parrots BT Birds’, which is invariably a true (generic) hierarchy, and thus compatible with the all-and-some test, and ‘Parrots BT Pets’, which, however, is not (being a perspective hierarchy), since some Pets are Parrots, and only some Parrots are Pets. Yet, while this is mostly true, there may be special cases or particular circumstances where it does not apply. For example, in the restricted context of a specialized thesaurus on domestic animals, Parrots as NT of Pets can instead be accepted.

Anyway, special cases aside, since perspective hierarchies are somehow context-dependent, it seems that only genus-species hierarchies have the potential to provide the basis for a more consistent application throughout different systems.

3.1.3 The all-and-some test

Indeed, this matter is more complex than it appears. A couple of criteria are normally used to determine genus-species hierarchies. First, terms have to belong to the same conceptual category. This is a necessary (but not sufficient) condition to ensure that a hierarchy is logically based. Both the logical and perspective hierarchies are compatible with it, but (normally) not the RT-kind hierarchy.

The other criterion is compatibility with the all-and-some test. In the latter, Fisher (1998, 20) has recognized the extensional definition of subsumption:

Informally, it is said there that concepts are taken as classes which have members, and that for a genuine narrower concept [all] its members must also be members of the broader concept while for the broader concept only [some] of its members must also be members of the narrower concept.

It should be said, however, that while on the one hand its usefulness is undeniable, on the other this test seems to present a number of issues that still need to be addressed. For example, the test does not discriminate which levels of a genus-species tree are linked when establishing a hierarchy. ‘Parrots BT Birds’, ‘Parrots BT Animals’ and ‘Parrots BT Organisms’ are all validated as hierarchies, since all parrots are birds, animals and organisms. But, of course, they encompass different degrees of (conceptual) information.

3.1.4 The intensional definition of the generic relationship and its historical predecessor

Naturally, the genus-species relationship may also be described on the basis of a representation of terms/concepts as sets of attribute values or features. We proceed from superordinates to subordinates, which contain all the attribute values of the former, by means of the addition of further key conceptual features (Fugmann 1993). In this formulation, Fisher (1998) has recognized a form of the intensional definition of subsumption. Of course, as concepts become more specific they will also correspond to smaller classes of referents.

In order to better clarify this scheme, it might be helpful to briefly refer to the philosophical tradition from which it derives. Broadly speaking, the origin of the notions of genus and species in the history of Western thought can be traced back to Plato’s and Aristotle’s philosophies, whereas the representation of a series of subsequent genus-species links that, starting from a top level (categories), go down to the ultimate or infima species (which in turn are superordinate to the individuum) through a vertical taxonomic structure was first conceived with the Porphyrian tree.

The crucial notion for the establishment of the genus-species relationship is that of specific differentia, which represents the key distinctive element differentiating a species from all others sharing the same genus (co-hyponyms). For example, the category ‘substance’ with the specific differentia ‘material’ becomes the subordinate genus ‘body’, while with the differentia ‘immaterial’ it becomes ‘spirit.’ The tree in figure 1 derives from adding, along different hierarchical levels, differentiae to the first of Aristotle’s ten categories, substance. Even though Aristotle never puts it in this way, analogous trees are expected to be developed by means of the same method from any of the other categories (quality, quantity, relation, where or place, when or time, position, having or state, action or operation, passion or process). According to some authors (Girgenti 2004, introduction to Porphyry’s Isagoge), the genus-species tree can be navigated both in an upward direction (ascension, according to a logical point of view) and in a downward direction (declination, based on an ontological perspective).

Figure 1. The Tree of Porphyry, as drawn by the 13th-century logician Peter of Spain (from Sowa 2000, slightly modified)

The same notion of differentia also plays a key role in defining. A classic example is the definition of man (human) as a ‘rational animal.’ The parts of this definiens are ‘animal’, the proximate genus that incorporates within its range of meaning all the essential elements of the superordinate genera, and ‘rational’, the specific differentia distinguishing man from all other animals. Listing all the differentiae, ‘human’ is defined as ‘rational sensitive animate material substance.’

Summing up, in a hierarchical arrangement obtained in this way, two items are most relevant: the mechanism of conceptual feature addition (the lower level is always a subclass of the higher one) and the key differentiating character of the added conceptual features. For Aristotle, such a method reflects, on the logical and language planes, a principle that operates on an ontological level with the purpose of identifying the distinctive features of things. Should the latter be adopted, the problem is then how to put it into practice, also considering that our highly structured contemporary knowledge systems seem to be developing more on a horizontal and sectorial plane than on a vertical level, as a univocal unfolding from an Ur-structure.
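The two readings of the generic relationship discussed in sections 3.1.3 and 3.1.4, extensional (the all-and-some test) and intensional (addition of differentiae), can be contrasted in a small sketch. The Python code below is a hypothetical illustration, not drawn from the cited sources: it encodes the Tree of Porphyry as a mapping from each species to its proximate genus and specific differentia (the node names ‘living’, ‘mineral’, ‘plant’ and ‘beast’ follow the common rendering of the tree and are assumptions), builds an intensional definition by collecting differentiae, and applies the all-and-some test to a made-up set of individuals.

```python
# Hypothetical encoding of the Tree of Porphyry:
# species -> (proximate genus, specific differentia).
# Intermediate node names ("living", "mineral", "plant", "beast") are assumed.
TREE = {
    "body": ("substance", "material"),
    "spirit": ("substance", "immaterial"),
    "living": ("body", "animate"),
    "mineral": ("body", "inanimate"),
    "animal": ("living", "sensitive"),
    "plant": ("living", "insensitive"),
    "human": ("animal", "rational"),
    "beast": ("animal", "irrational"),
}

# Made-up individuals, described by the differentiae they exhibit.
INDIVIDUALS = {
    "Socrates": {"material", "animate", "sensitive", "rational"},
    "Fido": {"material", "animate", "sensitive", "irrational"},
    "oak": {"material", "animate", "insensitive"},
    "quartz": {"material", "inanimate"},
}


def differentiae(term):
    """Collect differentiae from `term` up to the top category, nearest first."""
    collected = []
    while term in TREE:
        term, differentia = TREE[term]  # move up to the proximate genus
        collected.append(differentia)
    return collected, term  # `term` is now the top category


def definition(term):
    """Intensional definition: every added differentia, then the top category."""
    collected, top = differentiae(term)
    return " ".join(collected + [top])


def extension(term):
    """Individuals exhibiting every feature in the term's intension."""
    required = set(differentiae(term)[0])
    return {name for name, feats in INDIVIDUALS.items() if required <= feats}


def all_and_some(narrower, broader):
    """All NT members are BT members, and only some BT members are NT members."""
    nt, bt = extension(narrower), extension(broader)
    return bool(nt) and nt < bt  # non-empty proper subset


print(definition("human"))  # rational sensitive animate material substance
print(all_and_some("human", "animal"), all_and_some("human", "substance"))  # True True
print(all_and_some("animal", "human"))  # False
```

Adding a differentia can only shrink the extension, which is the link between the two definitions. The test also accepts ‘human’ under both ‘animal’ and the top category ‘substance’, mirroring the observation in section 3.1.3 that the all-and-some test does not discriminate between levels of the tree.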

More generally, the very possibility of accessing on a rational level the ‘meta’ point of view (i.e., the fundamental ‘place of observation’ where the ontological order is unveiled) has become, from an epistemological point of view, questionable, and with it also the chance to separate, in a final and objective way, what is essential from what is accidental and to develop that ‘unique’ genus-species tree which derives from the further addition of specific differentiae to the top categories.

According to Eco (1983), even Aristotle, in some of his works such as De partibus animalium, recognizes at another level the possibility of developing multiple trees, which could be complementary to one another, according to different perspectives. Given the impossibility of univocally distinguishing accidental from distinctive features, such a characteristic of distinctiveness can, in Eco’s view, be acquired only in relation to a situated perspective (e.g., the classificatory or definitory problem in question).

Contemporary biological systematics and taxonomy provide an interesting example of the synchronic copresence of different theoretical approaches. The classic Linnaean approach, arranging organisms by their morphological similarities, and cladistics (or phylogenetic systematics), where living beings are classified on the basis of their order of branching in an evolutionary tree, coexist and may also be used in a combined way to obtain further information. Different (theoretical) perspectives can, thus, lead to focusing on different sets of characteristics. But they need not necessarily be regarded as being in opposition. There may be cases in which they provide complementary information, useful in obtaining a more complete picture of the matter.

3.1.5 Classification as interpretation

Broadening the perspective, this latter position may (partially) be related to the notion of interpretative horizon as developed in Gadamer’s work, in the framework of contemporary hermeneutics. Such a notion, in fact, has mainly been used to explain the historicity of human understanding, yet in a more general way it can be regarded as the range of vision including “everything that can be seen from a particular vantage point” (Gadamer 1976, 302). In opposition to an objectivistic and universalistic view, the idea of ‘classification as interpretation’ acknowledges the fact that any classificatory act is always made from a delimited horizon, which determines how classification is conceived and undertaken and, thus, within the limits of certain basic constraints, which aspects of an item (term or object) are made salient.

In information science, Hjørland and Nissen Pedersen (2005) have developed a theory of classification for IR (which by extension can be applied to hierarchization) that somehow reflects this principle and that has been summarized by Hjørland himself (2007, 373) as follows:

Classification is the ordering of objects (or processes or ideas) into classes on the basis of some properties. (The same is the case when terms are defined: It is determined what objects fall under the terms) …. The properties of objects [which are portrayed in the conceptual features of the terms used to name such objects] are not just ‘given’ but are available to us only on the basis of some descriptions and pre-understandings of those objects [although these still have ‘objective’ properties] …. Description (or every kind of representation) of objects is both a reflection of the thing described and of the subject creating the description …. The selection of the properties of the objects to be classified must reflect the purpose of the classification. There is no ‘neutral’ or ‘objective’ way to select properties for classification because any choice facilitates some kinds of use while limiting others …. Any given classification or definition will always be a reflection of a certain view or approach to the objects being classified.

Regarding classification as interpretation means acknowledging the fact that we always act from a classificatory horizon (Paling 2004). This notion, however, needs to be further explained, and this can be done by indicating its possible constitutive elements. First, it comprehends the ontological and epistemological meta-assumptions that provide the ‘lens’ through which we look at the world (Kuhn 1970) and the way in which they are reflected in scientific activity. For example, positivism and instrumentalism or hermeneutics have different views of the (same) world and, accordingly, lead to different conceptions of classification and hierarchization, too. Secondly, it includes the domain to which the classification refers. As stressed in their theory by Hjørland and Nissen Pedersen (2005), criteria for classification are (usually) domain-specific, since different domains may need different descriptions and

classification of items in order to meet their specific of mouth or chin, but without a single feature that purposes. necessarily all share). For example, ‘benzene’ can be described and de- Following this theoretical approach, it is clear fined in several different ways depending on the dis- that, having language and meaning the above charac- cipline or context in which it is considered. Chemists, teristics, they should not be confined to the rules of of course, emphasize its structural properties in being a particular language game. Should a deeper investi- precursor of a class of chemical compounds. Yet, gation still be required, this has a number of impor- physicists may focus on other properties and see it as tant implications with respect to the idea of hierar- a volatile and inflammable. Other descriptions can chical arrangement (in general and applied to a the- emphasize its possible effects—biologists may con- saurus) and to a number of other issues. As stated by sider its toxicity and the different routes through Svenonius (2004, 578): which it can enter an organism—or employments— engineers would consider it as a fuel for combustion Subscribing to the concept of language games engines (Fugmann, 1993). Furthermore, the fact that entails subscribing as well to the position that within the same domain conflicting paradigms and knowledge representations are not descriptive views can coexist should also be taken into considera- of things and relations in the real world; rather tion (Hjørland 2007, 385): “in every domain, there they are descriptive of linguistic behavior. The exist different theories, approaches, interests, or use of knowledge representations to organize ‘paradigms’, which also tend to describe and classify information is one kind of language game, one objects according to their respective views and goals.” kind of linguistic behaviour. Finally, the purpose of classification plays a role in determining the classificatory horizon, too. 
In fact, Besides, linking again the main point to what has even if a domain can be viewed in terms of a com- been said in the previous paragraph, it could be af- mon paradigm, different practical concerns may lead firmed that each field of knowledge, which has its to different choices in establishing classificatory and own set of conceptualisations, has also its particular hierarchical structures. language games with specific rules (although this does not mean that they cannot share common ele- 3.1.6 Possible insights from the language games theory ments). Meaning of words can, therefore, change (at least partially) from one domain to the next: “the In this context, we believe that the notion of lan- meanings of words—and, thus, words used to name guage games (Sprachspiele) can play a significant role subjects—are in part fixed and, in part, variable. The and be relevant for IS issues, too. This notion has variable part assumes its value by being contextual- been introduced by Wittgenstein (1953) to explain ized within a system of concepts” Svenonius (2004, the multiplicity of language practices that occur 581). within a language. Language does not consist, in fact, Further considerations would be needed to inves- of a single unified game. It is regarded, instead, as a tigate whether a hierarchy of conceptual features is collection of multiple and indefinite games. The ba- possible, if some of these features cannot be ‘can- sic assumption of this theory is that the meaning of a celled’ (without causing the total alteration of the as- word should not be regarded in terms of its referent, sociated meaning) and what their nature is. The but of its use. Speaking language is a social action. 
To meaning of a term has, in fact, also a more stable know the meaning of a word means to know how to part, that is likely to be maintained also after a major use it as part of an activity, within the framework of paradigm shift or along different domain-based a particular language game and its rules. viewpoints. Coming back to the example of ‘ben- Wittgenstein has introduced also the notion of zene’, all the listed descriptions share a common family resemblances. Considering several possible and premise: benzene, first of all, is a ‘substance’ (that different Sprachspiele, the instances of the use of a can have toxic effects, be used as fuel, etc.). Similarly, word do not (necessarily) share a common denomi- although diverse taxonomizations of a certain kind nator or essence (as it is, instead, assumed in class of animal may be possible (see note 5), none of them inclusion). They are ‘peripherically’ linked through questions its recognition and classification at a family resemblances, being similar but each in a dif- higher level as an animal. These features, thus, pro- ferent manner, like members of a family (where vide a more stable background while modifications some may have the same eyes, others the same form occur mostly at a foreground level. Knowl. Org. 34(2007)No.4 207 F. Mazzocchi, M. Tiberi, B. De Santis, and P. Plini. Relational Semantics in Thesauri
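The contrast drawn above can be made concrete with a small data structure. On one side is a stable background (benzene is, first of all, a substance); on the other are domain-dependent foreground descriptions.

The following Python sketch is purely illustrative. The `Term` class, the domain names and the toy instance sets are our own assumptions; they are not part of any thesaurus standard or of EARTh. The sketch does two things:
it keeps genus-species broader terms, which should hold in every context, apart from perspective-qualified broader terms;
it reads the all-and-some test extensionally. All instances of the narrower concept must be instances of the broader one, and some instances of the broader concept must be instances of the narrower one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A thesaurus term with two kinds of broader term (illustrative model only)."""

    label: str
    # Genus-species broader terms: expected to hold whatever the context.
    generic_broader: set[str] = field(default_factory=set)
    # Perspective broader terms, keyed by a hypothetical domain name.
    perspective_broader: dict[str, set[str]] = field(default_factory=dict)

    def broader_in(self, domain: str | None = None) -> set[str]:
        """Return the generic broader terms plus those added by one perspective."""
        result = set(self.generic_broader)
        if domain is not None:
            result |= self.perspective_broader.get(domain, set())
        return result


def all_and_some(narrower: set[str], broader: set[str]) -> bool:
    """Extensional reading of the all-and-some test on two sets of instances.

    'All <narrower> are <broader>': narrower is a subset of broader.
    'Some <broader> are <narrower>': the two sets share at least one instance.
    """
    return narrower <= broader and bool(narrower & broader)


benzene = Term(
    "benzene",
    generic_broader={"organic aromatic substance"},
    perspective_broader={
        "nature conservation": {"pollutant"},
        "engineering": {"fuel"},
    },
)

# Toy instance sets, invented for illustration: one benzene sample is a
# laboratory reagent rather than an environmental contaminant.
BENZENE = {"benzene_reagent", "benzene_spill"}
AROMATIC = BENZENE | {"toluene_sample"}
POLLUTANTS = {"benzene_spill", "no2_emission"}

print(benzene.broader_in())                       # generic view only
print(benzene.broader_in("nature conservation"))  # adds the perspective term
print(all_and_some(BENZENE, AROMATIC))            # True: holds 'always'
print(all_and_some(BENZENE, POLLUTANTS))          # False: context-dependent
```

In this model, a nature conservation thesaurus would display ‘pollutant’ as an additional broader term for benzene. The genus-species link, which passes the all-and-some test, remains available as a shared basis for compatibility between systems.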

Furthermore, in a given historical period, certain semantic relations express the dominant view. These relations (and hence the conceptual features on which they are based) appear to be more ‘stable’ and can be (extensionally) validated by the all-and-some test. For example, according to the taxonomy of the scientific discipline concerned with its study (chemistry), benzene is an ‘organic aromatic substance’, and this ‘always’ holds.

But this is not always the most important aspect in terms of application. In a nature conservation thesaurus, it might be more useful to represent the meaning of benzene as a ‘pollutant’ rather than as an ‘organic aromatic substance’. It is, however, true that relationships of this kind enjoy a stronger consensus. By virtue of it, they can (at least) provide a basis for a certain degree of compatibility and interoperability among different systems.

Of course, not all words convey meaning in the same manner. Some of them have more variable meanings, i.e. meanings more dependent on context, than others. For example, words used in the social sciences are regarded as having more variable meanings, whereas words used in science are regarded as having more fixed meanings. But this is only partially true.

First, the meaning of scientific words changes over history in correspondence with paradigm shifts (Kuhn 1970). Second, part of the XXI century epistemology has questioned two related ideas:
that, at a given historical moment, science is a knowledge system based on universal conceptual structures;
that words used in scientific discourse have one and the same meaning in all disciplinary domains.
Kuhn (2000), for example, regards each discipline, or community of practitioners of a scientific field, as bearing its own set of conceptualizations. These are crystallized in a particular lexical taxonomy, within which terms acquire specific meanings. This implies that, for a (restricted) number of terms, meaning changes across disciplinary fields (local incommensurability).

Evidently, this fact can be particularly relevant for the hierarchical arrangement of scientific thesauri whose subject field is multidisciplinary (such as those devoted to the ‘environment’). Moreover, within a given field of knowledge, different theoretical views can exist simultaneously, providing different descriptions of objects and interpretations of the meaning of terms. This also applies to scientific disciplinary areas, although there it is less evident (and also less agreed upon) (see also note 5).

Thus, in all cases, concepts are not a priori (and as such universal) entities. They should be regarded in the context of the conceptualization system in which they are embedded. The meaning of words, including those that are part of scientific vocabularies, should be understood according to the rules of the language games they belong to. The same word can have (slightly or significantly) different meanings according to its use in diverse language games. These games can pertain to different knowledge fields or to different theoretical views within the same domain.

3.1.7 Implications for the retrieval of information

Both the principles based on a hermeneutic perspective and the language games theory have practical implications for the retrieval of information (based on the use of a thesaurus). Many databases contain documents produced in different subject fields. Even within the same domain, such documents are sometimes produced according to different theoretical perspectives. Meaning, however, cannot be defined by examining the documents of a literature as such. Documents should rather be seen as a means to access the conceptual structure of a given knowledge field and the language games that it encloses.

Words (used in documents), in fact, pertain to given language games. Each paradigm within a given domain embodies the ‘cognitive’ authority of that domain. It specifies the basic rules for the use of any term and, hence, its meaning. Searchers do, in fact, look for concepts (contained in documents) as defined in subject fields and their literatures. If so, semantic tools such as thesauri should be able to represent, by means of their hierarchical arrangement and other relations, the meaning of words consistently with how these are defined in the language games of such domains. Retrieval would be facilitated if the documents covering a subject field in a database were indexed and searched with words used according to the (domain-based) language games they refer to (Andersen & Christensen 1999).

In particular, users should be made aware of the possible different views on the meaning of words (as they occur in different language games). They should thus also be made aware of all the possible different views on a given topic, each focusing on different aspects of it, that may be useful to them (Hjørland 1998). Attempts at standardizing terminology can cause the removal of some of these views. As underlined by Hjørland (2007, 389), however, “a precondition for designing quality KOS is that the designer knows the different views and is able to provide a reasonably informed and negotiated solution.”

Of course, a thesaurus has its own language game, too. Its rules are basically oriented towards achieving semantic univocity for operational purposes. However, a number of devices can be used in a thesaurus to represent the different aspects of the semantics of terms and (wherever necessary) to disambiguate them. One of these is the coupled use of genus-species and perspective hierarchies, which exploits the different functions they can have:
as already mentioned in 3.1.2, perspective hierarchies can provide additional views on the semantics of a term (or on the aspects of a given topic) and can be used for disambiguation purposes;
‘all-and-some’ hierarchies can also provide a shared basis for making different KOSs more compatible and interoperable.

4. The partitive relationship

This section deals with the partitive relationship. A number of taxonomies organizing it into subclasses are presented. These are followed by some remarks on the role played by ‘interpretation’ in implementing these relations (and semantic relations in general) to satisfy the needs of different conceptual contexts and empirical circumstances.

In the partitive relationship (also called meronymy), the narrower terms are parts of the broader ones. In linguistics, a number of test frames are used to detect it, such as ‘an X is a part of a Y’ (or, inversely, ‘a Y has an X / Xs’). None of them, however, seems to provide an unambiguous indicator of it, since they can also be used to express non-meronymic relationships (Cruse 1986).

Furthermore, which basic properties (among reflexivity, antisymmetry and transitivity) may be ascribed to this relationship is still debated (Iris et al., 1987; Winston et al., 1986). As a rule, thesaurus standards regard only four types of this relation as hierarchical, namely those holding among:
parts of the body;
organizational structures;
geographical locations;
disciplines or fields of knowledge.
All other cases are classified, instead, as associative relationships, although exceptions may be accepted in specific subject areas (ISO 1986). The partitive relationship is, thus, not restricted to material objects and should be viewed as a collection of different subkinds (Iris et al., 1988). Yet no consensus has been reached on the identification of such subkinds, nor on the linguistic patterns that express them.

4.1 An overview of existing taxonomies of partitive relations

A number of interesting studies have been undertaken in different knowledge fields, such as linguistics, logic and cognitive psychology, to develop a taxonomy of partitive relationships. Mostly, they focus on the degree of differentiation of the parts and on their role with respect to the whole. Despite their different origins and aims, the outcome of these studies also provides useful insights for refining this relationship in thesauri.

Perhaps the most influential taxonomization is that of Winston et al. (1987), based on experimental data and on a psychological perspective. Winston and his co-workers distinguish six subtypes on the basis of the values of three relational elements, which summarize the attributes of the relationships:

1. Functionality (functional/non-functional): parts are/are not in a specific spatial or temporal position with respect to each other that sustains their functional role with respect to the whole.
2. Degree of similarity (homeomerous/non-homeomerous): parts are similar/dissimilar to each other and to the whole to which they belong.
3. Spatial cohesion (separable/inseparable): parts can/cannot be physically separated from the whole.

Subtype | Examples | Functional | Homeomerous | Separable
Integral object-component | Cup-handle; Linguistics-phonology | + | - | +
Collection-member | Forest-tree | - | - | +
Mass-portion | Salt-grain | - | + | +
Object-stuff | Bike-steel | - | - | -
Activity-feature | Shopping-paying | + | - | -
Area-place | Desert-oasis | - | + | -

Table 1. Winston et al.’s taxonomy of the partitive relation
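Each of the six subtypes in Table 1 corresponds to a distinct combination of the three relational elements. The table can therefore be read as a lookup from element values to subtype.

The Python sketch below transcribes Table 1 directly. The names `Elements`, `WINSTON_SUBTYPES` and `classify` are our own. Two of the eight possible combinations (functional and homeomerous, with either value of separability) have no subtype in the table, so the lookup returns `None` for them.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Elements(NamedTuple):
    """The three relational elements of Winston et al. (1987)."""

    functional: bool
    homeomerous: bool
    separable: bool


# Transcription of Table 1: (functional, homeomerous, separable) -> subtype.
WINSTON_SUBTYPES: dict[Elements, str] = {
    Elements(True, False, True): "integral object-component",  # cup-handle
    Elements(False, False, True): "collection-member",         # forest-tree
    Elements(False, True, True): "mass-portion",               # salt-grain
    Elements(False, False, False): "object-stuff",             # bike-steel
    Elements(True, False, False): "activity-feature",          # shopping-paying
    Elements(False, True, False): "area-place",                # desert-oasis
}


def classify(functional: bool, homeomerous: bool, separable: bool) -> Optional[str]:
    """Return the Winston et al. subtype for a combination of element values,
    or None if Table 1 assigns no subtype to that combination."""
    return WINSTON_SUBTYPES.get(Elements(functional, homeomerous, separable))


print(classify(True, False, True))   # integral object-component
print(classify(False, True, False))  # area-place
print(classify(True, True, True))    # None
```

Such a lookup only helps once the element values have been decided for a given word pair. As the discussion of ‘kitchen-refrigerator’ in section 4.2 shows, deciding those values is itself an interpretive step.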

This scheme has already been integrated into some advanced thesauri, e.g. in the project for the development of an environmental thesaurus, EARTh (Environmental Applications Reference Thesaurus). Each relation of Winston et al.’s scheme is described below. To show some results of this implementation, each description is followed by a number of demonstrative partitive cases extrapolated from EARTh’s environmental (and closely related) terminology.

Integral object-component

It holds between a whole (an ‘integral object’), which presents some kind of patterned organization or structure, and its components. The components are also patterned and generally bear specific structural and functional relationships to one another and to the whole of which they are parts.

Integral objects comprise two kinds of things:
things having an extensive dimension, such as physical things (e.g., natural objects or artefacts);
things whose parts are not extensively contained in their wholes, such as abstract objects and organizations.
For this reason, a further differentiation into subtypes might still be planned.

Accordingly, the EARTh thesaurus expresses this relation in two ways:
 , used for material objects and their parts (material objects include, for example, biological systems such as cells, anatomical structures and plants, and, among artefacts, instruments, installations and buildings);
a second expression, which still needs to be defined (for the time being the generic ), to be used instead for the relation between abstract entities, such as disciplines, and their ‘parts’.

Cell – Cell membrane
Cardiovascular system – Heart
Electric vehicle – Electric engine
Ecology – Land ecology

Collection-member

It records membership in a collection. This relationship does not require that members have a given structural organization or carry out a particular function in relation to each other and to the whole. Collection-member has some similarity to, and can consequently be confused with, the relationship of inclusion, since both involve the membership of individuals in larger sets. The two differ, however, in how membership is determined:
membership in a class (genus) is determined by similarity to the other members (species), based on a set of intrinsic properties;
membership in a collection is defined by characteristics that are extrinsic to the individual members, such as spatial or temporal proximity or a social connection.

Chaffin and Herrmann (1988) distinguish three subkinds of this relationship: group-member (e.g., herd-cow); member-collection-member (e.g., tree-forest, fleet-ship); and organization-unit (e.g., army-battalion). Up to now, in EARTh collection-member has been applied to connect material objects and is expressed by .

Flora – Plants
Game – Game species
Car population – Car

Mass-portion

Portions are homeomerous parts of physical objects or masses, since every portion is similar to the others and to the whole. They have arbitrary boundaries and lack a functional relation to the whole. They should also be distinguished from ‘pieces’, which originate, for example, from the destruction of an object and, unlike portions, are not always homeomerous. In Cruse’s words (1986, 158), “The contrast between parts and pieces is potentially operative even with highly integrated wholes such as animal bodies: there is a clear difference between such a body hacked to pieces, and one carefully dissected into its parts”.

Chaffin and Herrmann (1988) also distinguish between mass-measured portion (e.g., pie-slice) and mass-natural tiny piece (e.g., salt-grain). Furthermore, they include measure-unit (e.g., mile-yard) as a third subkind. In EARTh, this relation has so far had quite limited application and is expressed by .

Land – Parcel of land

Object-stuff

This relation links an object to the substance or material from which the object is naturally made or manufactured/created. It differs from the object-component relationship in that the stuff of which an object is made cannot be physically separated from it without altering its identity. Chaffin and Herrmann (1988) distinguish mass-stuff (e.g., trash-paper) from object-stuff (e.g., lens-glass). These authors, like others such as Ahmad and Fulford (1992) and Iris et al. (1988), do not regard this relationship as partitive. It can, in fact, also be considered a kind of associative relationship. This has occurred, for example, in EARTh, where it is expressed by .

Road – Asphalt
Can – Tin
Bicycle – Aluminium

Activity-feature

It refers to those parts (phases, stages, discrete periods, features, etc.) that form, in a structured manner, a process or an activity, which constitutes the whole. Chaffin and Herrmann (1988) do not include it among partitive kinds. They distinguish process-phase (e.g., growing up-adolescence), continuous activity-phase (e.g., cycling-pedaling), and discrete activity-phase (shopping-buying). In EARTh, this relationship has been applied to (mostly natural) processes and to (social and other related) activities and their ‘parts’. It is expressed by .

Metabolism – Anabolism
Environmental policy – Nature conservation policy
Transport planning – Road planning

Area-place

It is applied to things that have a spatial extent, indicating the relation between areas and specific places within them. Places are inalienable parts of the whole (the areas) in which they are included. However, like members of a collection, places are not parts by virtue of any functional contribution to the whole. In EARTh, it has been applied mostly to geographic entities and expressed by .

Desert – Oasis
Earth – Continent
City – City centre
Park – Central park area

Apart from Winston et al.’s proposal, there are other taxonomies of the partitive relationship, mostly developed in the linguistics domain:
the above-mentioned Chaffin & Herrmann (1988) distinguish a set of subkinds by using relational elements that do not coincide with those of Winston et al.;
Iris et al. (1988) propose a classification founded on four basic models. Three of them (the functional component; the segmented whole; collection and members) are similar to the first three of Winston et al.’s categories, whereas the fourth (sets and subsets) resembles the notion of class inclusion;
Gerstl & Pribbenow (1995) propose another comparable list. They identify kinds induced by the compositional structure (mass/quantities, collection/elements and complex/components) and kinds independent of it (segments and portions);
Cruse (1986) classifies the partitive relationship according to quantificational differences.

WINSTON et al. | IRIS et al. | GERSTL & PRIBBENOW
Integral object/component | Functional component | Complex/components
Collection/member | Collection and members | Collection/elements
Mass/portion | Segmented whole | Mass/quantities

Table 2. (Partial) overlap of partitive categories in three of the cited taxonomies

The Subcommittee on Subject Relationships/Reference Structures of the ALA (American Library Association) Subject Analysis Committee (1997) compiled a master list of 165 relationships from the subject indexing and cataloguing literature. This work distinguishes two main categories:
composition partitive relationships, which focus on aggregates or composites of various members of a class of entities;
whole/part pairs, which are based on structural and spatial relations and comprise eight further subtypes.

Composition partitive relationships
Whole/part pairs
- Non-physical whole/part pairs
- Physical whole/part pairs (anatomical whole/part pairs; geographic whole/part pairs)
- Topic inclusion (discipline/subdiscipline pairs)
- Whole/attachment pairs
- Whole/integral part pairs
- Whole/piece pairs
- Whole/segmental part pairs
- Whole/systemic part pairs

Table 3. Subtypes of the partitive relation from ALA (simplified version)
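The refined subtypes surveyed above can be connected with the basic relations that thesaurus standards recognize. The following Python sketch collapses a refined relation into a basic thesaural code.

It assumes the rule summarized at the start of section 4. A partitive pair is treated as hierarchical (BT) only when it concerns parts of the body, organizational structures, geographical locations or disciplines. Otherwise it is treated as associative (RT), with room for subject-specific exceptions.

The `RefinedRelation` class, the `part_kind` tags and the `to_basic` function are hypothetical, and EARTh’s own relation labels are not reproduced. Collapsing relations this way is only one possible reading of the ‘minimum common denominator’ that Tudhope et al. (2001) advocate for interoperability, discussed in section 4.2.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Kinds of whole-part pair that thesaurus standards treat as hierarchical,
# as summarized in section 4 (ISO 1986). The tag strings are our own shorthand.
HIERARCHICAL_PART_KINDS = frozenset(
    {"body", "organizational structure", "geographical location", "discipline"}
)


@dataclass(frozen=True)
class RefinedRelation:
    """A relation recorded with more detail than the basic thesaural set."""

    broader: str
    narrower: str
    kind: str                        # "genus-species", "partitive" or "associative"
    subtype: Optional[str] = None    # e.g. a Winston et al. subtype
    part_kind: Optional[str] = None  # hypothetical tag used only by the ISO rule


def to_basic(rel: RefinedRelation, exceptions: frozenset[str] = frozenset()) -> str:
    """Collapse a refined relation to a basic code: 'BT' or 'RT'.

    `exceptions` lets a subject area declare extra part kinds as hierarchical.
    """
    if rel.kind == "genus-species":
        return "BT"
    if rel.kind == "partitive" and rel.part_kind in HIERARCHICAL_PART_KINDS | exceptions:
        return "BT"
    # Every other partitive pair, and every associative pair, falls back to RT.
    return "RT"


examples = [
    RefinedRelation("Cardiovascular system", "Heart", "partitive",
                    "integral object-component", "body"),
    RefinedRelation("Earth", "Continent", "partitive", "area-place",
                    "geographical location"),
    RefinedRelation("Electric vehicle", "Electric engine", "partitive",
                    "integral object-component", "artefact"),
    RefinedRelation("Road", "Asphalt", "partitive", "object-stuff", "material"),
]
for rel in examples:
    print(f"{rel.broader} -> {rel.narrower}: {to_basic(rel)}")
# Expected output:
# Cardiovascular system -> Heart: BT
# Earth -> Continent: BT
# Electric vehicle -> Electric engine: RT
# Road -> Asphalt: RT
```

The subtype is kept on each record, so a system that understands the refined relations loses nothing. A system that only understands BT and RT still receives a usable structure. For ‘Road – Asphalt’ the result is RT either way, consistent with EARTh’s associative treatment of object-stuff.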

Of course, the issue of meronymy is also widely discussed in the framework of ontologies, where attempts are made to eliminate ambiguity by providing formal definitions of relations. An interesting paper dealing with this topic, though within a broader analysis, is that of Smith et al. (2005). They proposed a Relation Ontology to assist the development of biomedical ontologies, such as the Gene Ontology, and to promote their interoperability.

4.2. Some remarks on relation refinement and implementation

Without going further into this analysis, and even though the overview is incomplete, it seems possible to infer an interesting point that applies to all relational patterns. There is general agreement on a restricted number of basic relationships (namely hierarchical, associative and equivalence), which are in fact used in thesauri and other KOS. A consensus on how to differentiate them into distinct subkinds, however, has still not been achieved and seems more difficult to achieve. Some authors, such as Tudhope et al. (2001), have highlighted the risk of an undisciplined extension of the basic semantic model. For this reason, they advocate a minimum common denominator, namely the basic thesaural relationships, for different types of applications. This would ensure a certain degree of interoperability among advanced systems adopting different solutions.

All of this may be partly understandable, since we are still at an experimental stage in this research field. As seen in the case of the partitive subkinds, there is a more stable consensus among scholars on some more specific relations. Even so, it remains difficult to determine univocally the ‘final’ set of relations. This difficulty may also be connected to the impossibility of finding a solution that fits every circumstance and context and is equally valid from all viewpoints. Hjørland (2007, 380-381) has underlined, for example, that the choice of which kinds of semantic relations a system should include has to be related to their practical usage in IR: “In a way, it is the specific ‘information need’ that determines which relations are fruitful and which are not in a given search session. A semantic relation that increases recall and precision in a given search is relevant in that situation.”

The fact is that further differentiating the basic semantic relations into subkinds, and defining them better, does not necessarily guarantee the same results in all applications. Once a shared set is established, it may still be implemented differently. As already said in describing classification, multiple features can be ascribed to terms (or objects). Depending on which of these features are made salient in a given context, different relations can be established.

Indeed, the application of the relations in a thesaurus should reflect the knowledge of the subject area that the thesaurus aims to represent (with its paradigms and language games). Besides, it can vary according to different practical concerns. In any case, it depends on the way in which the criteria defining relations are interpreted and implemented in given circumstances. This applies to the partitive relation, too. Depending on all these factors, there may be room for different ways of conceiving how parts relate to wholes.

As underlined by Chaffin & Herrmann (1988) in their study of partitive relations, even the same pair of objects, and thus of words representing these objects, can be viewed as connected by different relations once the context changes. Cases of strong relational ‘ambiguity’ of this kind are limited in number. Even so, there is no single way to associate a word pair with a relation kind, and this also concerns other kinds of relations (Chaffin & Herrmann 1988, 321-22):

The phenomenon of relation ambiguity makes the point that relations are constructed from knowledge of the two concepts related and that a particular relation may make use of some aspects of the two concepts and ignore others .… If two words have more than one relation, then each relation must be based on somewhat different aspects of the two concepts. This point about relation ambiguity may be clarified by comparison with ambiguity in other domains. The closest parallel is with categorization of concepts .… A word pair, more strictly a pair of word senses, may likewise support more than one relation. A relation need not to give equal weight to all aspects of the meaning of the two words. Relations typically emphasize some aspects and ignore others.

An example analyzed by different authors is ‘kitchen-refrigerator’ (Chaffin & Herrmann 1988; Iris et al., 1988; Winston et al., 1987). It has been viewed as:
integral object-component, when the most important aspect of the refrigerator is considered to be its function in relation to the kitchen (the position shared by most authors);
mass-portion, when the important feature to focus on is size, e.g., in situations where small kitchens are contrasted with large refrigerators (this attribution, however, seems too circumstantial);
area-place, when the focus is on the spatial area occupied in relation to the kitchen.

In particular, a word pair can be interpreted by focusing either on the component’s function in relation to the whole or on the spatial relation between the two. This possibility also applies to other cases, concerning for example body structures and geographical items. The relevant criteria are the following:
a component (normally) plays a functional role in relation to an integral object taken as a whole, but is separable from it;
a place does not stand in this same relation to the area, but is rather a spatial and inalienable part of it.
These criteria are not always easy to apply.

A refrigerator normally stands in a kitchen, although it is not an inseparable part of it. From the viewpoint of a kitchen, refrigerators are functional but ‘optional’ parts, since a kitchen can lack a refrigerator (Cruse 1986). From the viewpoint of the refrigerator, however, its functional role can be considered apart from its relation with a kitchen, even though the kitchen is its usual location. Its function, i.e. ‘to store food (or other products) at a low temperature’, in fact seems to relate more to ‘what’ (to store) than to ‘where’.

This is quite different from the relation between, for example, ‘handle’ and ‘cup’. There, the functional role of the handle applies only if it is attached to the cup (of which it constitutes a ‘canonical’ part) and only in relation to that whole. It is also interesting to note that Winston et al. (1987, 433) regard a refrigerator as being (normally) a functional part of a kitchen. Yet they regard the kitchen as “merely a place within a house, not a component of the house”. In other words, ‘house-kitchen’ is an example of the area-place kind. This attribution seems rather problematic (who would live in a house without a kitchen?).

To summarize, in our interpretation, two kinds of pair do not seem to fit entirely into any one of Winston et al.’s categories:
‘kitchen-refrigerator’, where a refrigerator is separable from a kitchen and has a ‘partial’ functional role in relation to it, in the sense that a kitchen is primarily its usual functional location;
pairs like ‘house-kitchen’, where the part is not separable from the whole but has a functional role in relation to it.
Partly for this reason, such pairs can be classified differently. This is not only a possible flaw of the taxonomy. It may also derive from the complexity of the matter, which seems to require descriptions from different perspectives to obtain a fuller view. This case also seems to underline the need for relational categories with fuzzier boundaries. Many situations could be classified more easily if conceived as part of a continuum between the two categories discussed.

What has been discussed in this section obviously furnishes only some preliminary remarks on this topic. To conclude, however, a more elaborate structure can help decrease the level of arbitrariness in the implementation of thesaurus relations, and this is of course highly recommendable. Yet there is no guarantee that only one valid set of relations exists, or that implementing more specific relations will give consistent results in all situations. The hermeneutic principle mentioned in the discussion about classification is, in fact, still relevant. Different choices can be made according to different perspectives and to satisfy the needs of different domains and operational contexts.

5. Conclusion

A thesaurus is a tool that semantically organizes a domain of knowledge for operational purposes. Its relational semantics is concerned with methods to connect terms with related meanings and is designed to support information indexing and retrieval. With a focus on hierarchical relations, we have analyzed different aspects of the relational semantics of thesauri. We have also analyzed the possibility of developing richer structures by differentiating standard relationships into subtypes, and examined how semantic issues are implied in thesaurus construction.

From a certain viewpoint, a thesaurus relational structure may be regarded as a system that represents, for operational purposes, the meanings of the terms contained in the thesaurus. Thus, theories of semantics, which hold different perspectives about the nature of meaning and how it is represented, affect the way in which the relational semantics of thesauri is designed.

In traditional approaches to knowledge organization, the influence of logical positivism has played a significant role. This is also reflected in the current trend towards increased formalism and standardization. The search for a more refined relational semantics in thesauri has arisen from this same framework. According to its advocates, it holds the promise of eliminating many of the ambiguity problems.

In our opinion, this field of study will likely bring valuable results, namely an improved methodological basis and more consistent application. Even so, different ways of interpreting meanings and of establishing semantic structures (and thus of organizing knowledge) will continue to be developed, on the basis of different paradigms, domains and operational contexts. Thus, even if standardization might be justified in given operational frameworks, other solutions should be explored, too. The usefulness of static and monolithic structures is, in fact, rather limited.

Instead, tools are needed that can represent the universe of knowledge domains and structures in its complexity. Such tools should also be flexible enough to incorporate the continuous changes in languages and meanings, not to mention how all of this is affected by the development of technology. This would facilitate access to the constitutive elements of that universe (concepts), which are the true object of searching.

Therefore, it is important to consider which contributions may derive from theoretical positions such as those based on hermeneutics and those based on Wittgenstein’s view of language and meaning. These positions are more inclined to value such complexity (in terms of diversity of perspectives, contexts, rules, etc.). The possibility of applying them in thesaurus design and other IR issues has been illustrated, even if this topic needs further investigation.

References

Ahmad, Khurshid and Fulford, Heather. 1992. Semantic relations and their use in elaborating terminology. Computing Science reports CS-92-7. Surrey: University of Surrey.
American Library Association (ALA), Subject Analy-

fication research. Proceedings of the 10th ASIS SIG/CR Classification Research Workshop vol. 10. Medford, NJ: Information Today, pp. 1-21.
Chaffin, Roger and Herrmann, Douglas J. 1988. The nature of semantic relations: a comparison of two approaches. In Evens, Martha Walton, ed., Relational models of the lexicon: representing knowledge in semantic networks. Studies in natural language processing. Cambridge: Cambridge University Press, pp. 249-94.
Chaffin, Roger, Herrmann, Douglas J., and Winston, Morton. 1988. An empirical taxonomy of part-whole relations: effects of part-whole type on relation identification. Language and cognitive processes 3: 17-48.
Cruse, D. Alan. 1986. Lexical semantics. Cambridge: Cambridge University Press.
Dextre Clarke, Stella G. 2001. Thesaural relationships. In Bean, Carol and Green, Rebecca, eds., Relationships in the organization of knowledge. Dordrecht: Kluwer, pp. 37-52.
Eco, Umberto. 1983. L’antiporfirio. In Vattimo, Gianni and Rovatti, Pier Aldo, eds., Il pensiero debole. Milan: Feltrinelli, pp. 52-80.
Fischer, Dietrich. 1998. From thesauri towards ontologies? In el-Hadi, Mustafa, Maniez, Jacques, and Pollitt, Stephen A., eds., Structure and relations in knowledge organization: Proceedings of the 5th International ISKO Conference. Würzburg: Ergon, pp. 18-30.
Foskett, Douglas J. 1980. Thesaurus. In Kent, Allen, ed., Encyclopedia of library and information science, vol. 20. New York: Marcel Dekker, Inc., pp. 416-63.
Fugmann, Robert. 1993. Subject analysis and indexing: theoretical foundation and practical advice. Frankfurt/Main: INDEKS Verlag.
Gadamer, Hans-Georg. 1976. Truth and method, trans. G. Barden and J. Cumming from the 2nd German ed. London: Sheed and Ward.
Gerstl, Peter and Pribbenow, Simone. 1995. Midwinters, end games and body parts: a classification of part-whole relations.
International journal of hu- sis Committee, Subcommittee on Subject Relation- man-computer studies 43. 865-89. ships/Reference Structures. 1997. Final report to the Hjørland, Birger. 1998. Information retrieval, text ALCTS/CCS Subject Analysis Committee. http:// composition, and semantics. Knowledge organiza- www.ala.org/ala/alctscontent/catalogingsection/ tion 25: 16-31. catcommittees/subjectanalysis/subjectrelations/ Hjørland, Birger. 2007. Semantics and knowledge finalreport.htm (consulted: 15.09.2007). organization. Annual review of information science Andersen, Jack and Christensen, Frank Sejer. 1999. and technology 41: 367-405. Wittgenstein and indexing theory. In Albrechtsen, Hjørland, Birger Pedersen, Karsten Nissen. 2005. Hanne and Mai, Jens-Erik eds. Advances in classi- A substantive theory of classification for informa- 214 Knowl. Org. 34(2007)No.4 F. Mazzocchi, M. Tiberi, B. De Santis, and P. Plini. Relational Semantics in Thesauri

tion retrieval. Journal of documentation 61: Smith, Barry, Ceusters, Werner, Klagges, Bert, 582-97. Köhler, Jacob, Kumar, Anand, Lomax, Jane, Mun- International Standards Organization (ISO).1986. gall, Chris, Neuhaus, Fabian, Rector, Alan L. and ISO 2788: Documentation—guidelines for the es- Rosse, Cornelius. 2005. Relations in biomedical tablishment and development of monolingual ontologies. Genome biology, 6: R46. thesauri. 2nd ed. Geneva: ISO. Soergel, Dagobert. 1974. Indexing languages and Iris, Madelyn A., Litowitz, Bonnie E. and Evens, thesauri: construction and maintenance. Los Ange- Martha Walton. 1988. Problems of the part-whole les: Melville Publishing. relation. In Evens, Martha Walton, ed., Relational Soergel, Dagobert. 1995. The Art and Architecture model of the lexicon. representing knowledge in se- Thesaurus (AAT): a critical appraisal. Visual re- mantic networks. Studies in natural language proc- source 10. 369-400. essing. Cambridge: Cambridge University Press, Soergel, Dagobert, Lauser, Boris, Liang, Anita, Fis- pp. 261-88. seha, Frehiwot, Keizer, Johannes and Katz, Kuhn, Thomas S. 1970 The structure of scientific revo- Stephen. 2004. Reengineering thesauri for new lutions. 2nd ed. Chicago: University of Chicago applications: the AGROVOC Example. Journal of Press. digital information 4 issue. 4. Article No. 257. Kuhn, Thomas S. 2000 The road since structure: phi- http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soer losophical essays, 1970-1993, with an autobio- gel/ (consulted: 01.10.2007). graphical interview. Conant, James and Hauge- Sowa, John F. 2000. Knowledge representation. logical, land, John, eds. Chicago: University of Chicago philosophical, and computational foundations. New Press. York: Brooks/Cole. Maniez, Jacques. 1988. Relationships in thesauri: Svenonius, Elaine. 2000. The intellectual foundation Some critical remarks. International classification of information organization. Cambridge, MA: The 15. 133-38. MIT Press. 
Mazzocchi, Fulvio & Plini, Paolo. 2005. Thesaurus Svenonius, Elaine. 2004. The epistemological foun- classification and relational structure: the EARTh dations of knowledge representations. Library experience. In Madsen, Bodil Nistrup and trends 52(3). 571-87. Thomsen, Hanne Erdman, eds., Terminology and Tudhope, Douglas, Alani, Harith and Jones, Christo- content development. Proceedings of the 7th Inter- pher. 2001. Augmenting thesaurus relationships: national conference on Terminology and Knowledge possibilities for Retrieval. Journal of digital infor- Engineering. Copenhagen pp. 265-78. mation 1, Issue 8, Article N.41. http://jodi.ecs Milstead, Jessica L. 2001. Thesaural relationships. In .soton.ac.uk/Articles/v01/i08/Tudhope/ Bean, Carol and Green, Rebecca, eds., Relation- (consulted: 01.10.2007) ships in the organization of knowledge. Dordrecht: Van Slype, Georges. 1976. Definition of the essential Kluwer, pp. 53-66. characteristics of thesauri. Prepared for the Com- National Information Standards Organization mission of the European Communities. Bruxelles: (NISO). 2005. ANSI/NISO Z.39.19.2005: Guide- Bureau Marcel van Dijk. lines for the construction, format and management Winston, Morton E., Chaffin, Roger, and Herrmann of monolingual controlled vocabularies. Bethesda Douglas J. 1987. A taxonomy of part-whole rela- (USA): NISO Press. tions. Cognitive science 11: 417-44. Paling, Stephen. 2004. Classification, rhetoric, and Wittgenstein, Ludwig. 1953. Philosophical investiga- the classification horizon. Library trends 52(3). tions, trans. Gertrude Elizabeth Margaret Ans- 588-603. combe. New York: Macmillan. Porphyry. Isagoge, Girgenti, Giuseppe ed.. 2004. Mi- lan: Bompiani. Schmitz-Esser, Windfried. 1991. New approaches in thesaurus application. International classification 18: 143-47.

Knowl. Org. 34(2007)No.4 215 G. Trentin. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning

Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities

Guglielmo Trentin Istituto Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Via De Marini 6, 16149 Genova, Italy,

Guglielmo Trentin is with the Institute for Educational Technology (ITD) of the Italian National Research Council (CNR). His studies have largely focused on the use of ICT in formal and informal learning. In this field he has managed several projects and scientific activities, developing technological applications and methodological approaches to support networked collaborative learning. He is a contributing editor of Educational Technology (USA) and a member of the editorial board of the International Journal of Technology, Pedagogy & Education (UK). Since 2002 he has taught Network Technology & Human Resources Development at the University of Turin, Faculty of Political Science.

Trentin, Guglielmo. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities. Knowledge Organization, 34(4), 215-226. 24 references.

ABSTRACT: The use of graphical representations is very common in information technology and engineering. Although these same tools could be applied effectively in other areas, they are rarely used there because they are little known or completely unheard of. This article discusses the results of an experimentation carried out on graphical approaches to knowledge representation during research, analysis and problem-solving in the health care sector. The experimentation was carried out on conceptual mapping and Petri Nets, developed collaboratively online with the aid of the CMapTool and WoPeD graphic applications. Two distinct professional communities were involved in the research, both belonging to the Local Health Units in Tuscany. One community is made up of head physicians and health care managers, whilst the other is formed by technical staff from the Department of Nutrition and Food Hygiene. The experimentation showed that concept maps are considered more effective in analyzing the knowledge domain related to the problem to be faced (description of what it is). On the other hand, Petri Nets are considered more effective in studying and formalizing its possible solutions (description of what to do). For this reason, the participants in the experimentation proposed the complementary, rather than alternative, use of the two knowledge representation methods as a support for professional problem-solving.

1. Introduction

In a discussion group, when trying to explain one’s viewpoint as clearly as possible, oral communication is often accompanied by simple diagrams drawn on the spot, either on paper or on a board. One thus conveys a sort of conceptual image (van Lambalgen and Hamm 2001; Stokhof 2002; Wheeler 2006) of the portion of knowledge to be discussed. This in turn triggers a process involving explicit, implicit and tacit knowledge (Polanyi 1975; Nonaka and Takeuchi 1995). The same thing often occurs during interaction among members of an online professional community. In this case, though, instead of paper or boards, ad hoc graphic editors are used which allow the online circulation of graphical representations as a support for collaborative interaction. This article, in particular, will refer to two specific methods for the graphical representation of knowledge (Concept Maps and Petri Nets) and the related software applications.

2. Graphical Representations

Graphical representations are de facto a language of communication and, like any language, they need syntactic rules in order to act as a medium of communication between two or more individuals (Donald 1987). Hence, specific graphic languages have been defined

and formalized that are geared towards knowledge representation (hierarchical representations, semantic networks, concept maps, approaches to the representation of procedural knowledge, etc.). Their development has been given considerable impetus by the field of artificial intelligence and, more generally, by all those areas which have attempted to “capture in digital form” knowledge domains. They are formally represented so that they can be used by specific software engines: see, for example, intelligent systems, decision support systems, semantic webs (Bosch 2006) and simulation systems.

Thanks to their simplicity and effectiveness, some of these graphic languages later spread beyond the specific area in which they originated; there their use was often more simplified and less rigorous (Trentin 1991), so that even non-specialists could capitalize on the basic concepts. The question is: when are these graphical representations useful for professional communities? A first consideration regards their effectiveness in facilitating the multi-perspective study of a given knowledge domain and/or area of exploration: new knowledge, the solution to a problem, the functionalities of a complex system. The representation of concepts through graphics amplifies, in the eyes of the interlocutors, the existence of multiple interpretations of a subject of study or debate (Cunningham 1991). A second consideration concerns the community’s need for technological aids to improve the flow and organization of community knowledge (Shipman 1993; Prusak 1994; Haldin-Herrgard 2000).

We know that knowledge-sharing processes (theoretical and procedural) are favored by two types of technological support: one for interpersonal communication and the other for the collection and management of information and knowledge (Auger et al. 2001). Both require a conceptual, schematic representation of the community’s knowledge domain of reference (or portions of it). Graphical representations can give an inside view of the conceptual interconnections between the elements making up the knowledge that is being discussed and shared. They are therefore an effective way to facilitate the communication of conceptual images as well as the semantic organization of the informative, documentary and factual material contained in the community memory (Lave and Wenger 1991). This last aspect is particularly interesting, as many search engines now use conceptual representations of the knowledge domain in which they work for the selective retrieval of information (for example http://www.webbrain.com).

Before dealing with the experimentation which is the subject of this article, the two underlying knowledge representation tools are summarized below.

3. Concept Maps

A concept map is a coherent visual and logical representation of knowledge on a specific topic which encourages individuals to direct, analyse and expand their analytical skills (Novak and Wandersee 1991; Halimi 2006). The approach was developed by J.D. Novak (1991) based on Ausubel’s theories (1963; 1968) and Quillian’s studies on semantic networks (1968). Concept maps use diagrams which highlight meaningful relationships between concepts in the form of propositions, also called semantic units or units of meaning. A proposition is the statement represented by a relationship connecting two concepts. Therefore, there are two basic features used to construct concept maps: concepts and their relationships (Figure 1).

Figure 1. Example of a concept map drawn with CMapTool

Besides these two basic features, a concept map is also characterized by hierarchical relationships between concepts and by cross-links between concepts belonging to different domains of the same map.

Various graphic tools for editing concept maps have been developed, and the dialogue window in Figure 1 shows one of the best known: CMapTool (http://cmap.ihmc.us/). Many of these environments are able to link the different concepts to a variety of items (documents, images, films, URLs, other concept maps), with the possibility of then converting them into HTML format, thereby creating structured repositories that can be accessed online. This, for example, is one of the possible ways to organize an online community’s shared memory.

Designing concept maps with these software applications is very simple; here, for example, is how one can work with CMapTool:

– after opening a new map and double-clicking on the white area, the starting concept may be defined (Figure 2a);
– by clicking and dragging the arrow, one can create a link between a new concept and the starting concept (Figure 2b);
– then the two concepts and the relation type linking them have to be described (Figure 2c).

Figure 2c. Description of concepts and relation type

By proceeding in this way, it is possible to obtain graphical representations like the one reported in Figure 3, which shows a map produced during the experimentation described here. When very complex knowledge domains have to be described, such as the Clinical Audit in Figure 3, the corresponding concept maps tend to become much larger and difficult to manage. For this reason, CMapTool provides a function to collapse/expand sections of the map being drawn. For example, by clicking on the symbol “>>” that appears to the right of “evidence-based practice”, the map linked to that concept expands (see Figure 4). Clicking on the symbol “<<” then takes you back to Figure 3.
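Since a concept map is, at bottom, a set of propositions (two concepts joined by a labelled relationship), its structure can be mirrored in a small data structure. The Python sketch below is only an illustration under stated assumptions: the class names, the `expand` helper standing in for CMapTool’s collapse/expand function, and the example concepts and linking phrases are all hypothetical; they are not taken from Figure 3 or from CMapTool’s own data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposition:
    """A semantic unit: two concepts joined by a labelled relationship."""
    source: str
    link: str
    target: str

    def as_statement(self) -> str:
        # Reading the triple left to right yields the proposition as a sentence.
        return f"{self.source} {self.link} {self.target}"


@dataclass
class ConceptMap:
    propositions: list[Proposition] = field(default_factory=list)
    # Hypothetical stand-in for CMapTool's collapse/expand feature:
    # a concept may carry a nested sub-map that is shown only on request.
    submaps: dict[str, ConceptMap] = field(default_factory=dict)

    def add(self, source: str, link: str, target: str) -> None:
        self.propositions.append(Proposition(source, link, target))

    def concepts(self) -> set[str]:
        # The concepts are the endpoints of the propositions.
        return {c for p in self.propositions for c in (p.source, p.target)}

    def expand(self, concept: str) -> list[str]:
        # Statements hidden behind a collapsed concept (empty if there are none).
        sub = self.submaps.get(concept)
        return [p.as_statement() for p in sub.propositions] if sub is not None else []


# Hypothetical example content, loosely inspired by the Clinical Audit topic.
audit = ConceptMap()
audit.add("clinical audit", "is based on", "evidence-based practice")
audit.add("clinical audit", "involves", "health care professionals")

detail = ConceptMap()
detail.add("evidence-based practice", "draws on", "clinical guidelines")
audit.submaps["evidence-based practice"] = detail

for p in audit.propositions:
    print(p.as_statement())
print(audit.expand("evidence-based practice"))
```

Keeping each proposition as an immutable triple means the map’s concepts never need to be stored separately: they are derived from the relationships, which matches the idea that meaning lies in the propositions rather than in isolated terms.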

4. Petri Nets and Procedural Knowledge Representation

Petri Nets provide an effective way to describe and analyze models of complex systems, processes, knowledge domains, etc. (Peterson 1981). Because of this characteristic, they are often used in the graphical representation of procedural knowledge.

Figure 2a. The starting concept

4.1. Resources and activities

A Petri Net is a directed graph with two types of node: resources (indicated with circles in Figure 5) and activities (indicated with segments). In the literature on Petri Nets these nodes are called places and transitions, respectively (Peterson 1981). An arc directed from a resource to an activity indicates that the resource is necessary to carry out that activity. Similarly, an arc directed from an activity to a resource indicates that the resource is a product of that activity.

Figure 2b. The link between two concepts
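The two arc directions described above can be made concrete with a minimal Python sketch. The `PetriNet` class, its method names and the example activities and resources are hypothetical. The article itself uses only the structural notions (resources, activities, arcs); the `enabled`/`fire` step added here applies the standard Petri Net firing rule (Peterson 1981) under a simplifying assumption: each resource is either available or not, i.e. a place holds at most one token.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PetriNet:
    # For each activity (transition): the resources (places) it needs,
    # i.e. the arcs directed from a resource to the activity.
    inputs: dict[str, frozenset[str]] = field(default_factory=dict)
    # For each activity: the resources it produces,
    # i.e. the arcs directed from the activity to a resource.
    outputs: dict[str, frozenset[str]] = field(default_factory=dict)

    def add_activity(self, name: str, needs: set[str], produces: set[str]) -> None:
        self.inputs[name] = frozenset(needs)
        self.outputs[name] = frozenset(produces)

    def resources(self) -> set[str]:
        # Every resource that appears on at least one arc.
        return set().union(*self.inputs.values(), *self.outputs.values())

    def enabled(self, available: set[str]) -> list[str]:
        # An activity can be carried out when all the resources it needs are available.
        return sorted(a for a, needs in self.inputs.items() if needs <= available)

    def fire(self, activity: str, available: set[str]) -> set[str]:
        # Standard firing rule, simplified to "available or not"
        # (at most one token per place): consume the inputs, add the products.
        if not self.inputs[activity] <= available:
            raise ValueError(f"activity {activity!r} is not enabled")
        return (set(available) - self.inputs[activity]) | self.outputs[activity]


# Hypothetical two-step procedure; the names are invented for illustration.
net = PetriNet()
net.add_activity("collect data", needs={"audit criteria"}, produces={"data set"})
net.add_activity("analyse data", needs={"data set"}, produces={"report"})

state = {"audit criteria"}
print(net.enabled(state))   # ['collect data']
state = net.fire("collect data", state)
print(net.enabled(state))   # ['analyse data']
state = net.fire("analyse data", state)
print(state)                # {'report'}
```

Storing inputs and outputs per activity keeps the graph bipartite by construction: arcs can only connect a resource to an activity or an activity to a resource, never two nodes of the same type.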


Figure 3. A concept map on the Clinical Audit developed with CMapTool

Figure 4. Example of a complex concept expansion

What has just been listed are, so to speak, the basic “ingredients” for giving shape to Petri Nets according to the use suggested within the experimentation referred to here. In actual fact, the theory underlying Petri Nets is much more articulated and rigorous (Peterson 1981). In our case only the key concepts were used, to enable the two communities involved to assess the general philosophy governing the approach.

Figure 5. An example of Petri Net

Just as for concept maps, ad hoc software environments have also been developed for Petri Nets. By way of example, Figure 6 shows the dialogue screen of one of these environments, specifically that of WoPeD (Workflow Petri Net Designer, http://www.woped.org/). Such applications provide not only an environment for editing Petri Nets, but also functions for checking their syntax and for simulating the procedures/systems they describe.

4.2. Successive refinements (top-down expansion)

Starting from an initial Petri Net, in attempting to describe the process/procedure or knowledge domain with ever greater precision, more and more activities, resources and links are often added. This produces very complex graphs that are hard to process and read. A good method to overcome this drawback is to describe the network through successive refinements (or stages), expanding it using a top-down approach (Trentin 1991). In the first stage an overall (undetailed) representation is given of what one wants to describe: the resources and main activities are reported together with their respective interconnections. In the same network, the complex activities that will be described in more detail in a specific sub-network are then highlighted (see, in Figure 6, the activity “AC development”, represented with a grey square). The following stage involves developing the refinement sub-networks, giving a detailed description of the more complex activities. For example, Figure 7 reports the refinement of the activity “AC development” shown in the Petri Net of Figure 6. The refinement process is iterated until the desired level of detail is attained.

The refinement activity is a consequence of the need to foster so-called “functional abstraction” (Stein 2002), the process through which the attention of the individual or of the whole group/community focuses on one aspect of what is being described at a time. This is a stepwise process. It begins with an overview of the subject matter, such as a professional issue, in which the key elements characterizing it are identified (macro-representation of the domain). In the following steps, each key element is isolated

and described in more detail by breaking it down into less complex sub-elements (for example, a complex activity is broken down into sub-activities). This is done by abstracting as much as possible from what surrounds the element under consideration (the other elements), so as to guarantee the maximum success of its specific analysis. Should this refinement step be inadequate for a deep analysis of the element being dealt with, the refinement process is iterated until the level of detail is considered the most functional for reaching the final objective (analyzing a situation, solving a problem, describing a complex system).

Figure 6. Example of environment to edit and implement Petri Nets

Figure 7. Example of refinement derived from Figure 6

5. Research Issue

The use of graphical representations is very popular in information technology and engineering. Although the same tools could be applied effectively in other areas, they are not, since there they are little known or completely unheard of. This is due to study curricula and/or training courses that offer no occasion to learn these techniques and technologies, since they are not considered important for a given disciplinary/professional area.

This is the reason why, within two specific projects aimed at fostering the launch and development of professional communities in the health care sector, research was carried out on the use of graphical approaches to professional knowledge representation. The aim was to analyze and discuss their actual usability and effectiveness in fostering collaborative interaction, debate and reciprocal clarification during a process geared towards examining a specific professional theme/issue.

6. Experimental Setting

Two distinct professional communities were involved in the research. The first (Audit community) was made up of 31 head physicians and health care managers belonging to Local Health Unit 11 of Livorno (Tuscany Region), who had the task of dealing with the theme of Clinical Audit, the key elements characterizing it and the working methods to carry it out. The second (Alert community) was formed by 18 technical staff from the Department of Nutrition and Food Hygiene, coming from all the health care units in Tuscany. In their case, the task was to define the organization of a Regional Working Group on the problem of managing food alerts.

In both cases, as already mentioned, concept maps and Petri Nets were proposed as methods for the graphical representation of knowledge. The development of each graphical representation was divided into three stages:

– a face-to-face meeting for initial familiarization with the graphic approach and the related editing software;
– two weeks of online collaborative activities in sub-groups;
– a closing meeting to evaluate and compare the graphical representations produced, and to discuss the online collaborative process implemented to produce them.

The participants were divided into sub-groups of 5-6 members and were asked to structure their work into two one-week periods:

– individual drafting of one’s own version of the graphical representation;
– sharing of the graphical representations and convergence towards a single sub-group version.

To co-construct the two representations, the following applications were used:

– CMapTool (http://cmap.ihmc.us/) and WoPeD (Workflow Petri Net Designer) (http://www.woped.org/), respectively for the development of concept maps and Petri Nets;
– Moodle (http://moodle.org/) as the environment for interpersonal group communication.

7. Methodology

At the end of the collaborative activity, the participants were given a questionnaire divided into 4 sections:

A. Learnability, intended to pinpoint the time required and the possible learning difficulties of the approaches to the formal representation of knowledge used in the experimentation.
B. Study and/or problem-solving, intended to investigate the perceived general usefulness of the proposed tools for study activities, analysis and the search for solutions.
C. Usefulness on an individual level in one’s own professional practice, intended to investigate the perceived usefulness of the proposed tools for individual use in one’s own professional practice.
D. Usefulness in facilitating collaborative group work, intended to discover whether the proposed tools were perceived as fostering group work when dealing with aspects related to participants’ own professional practice.

In the questionnaire, two questions are associated with each survey indicator: one with a closed-ended answer based on attributing a score (on a 1-5 Likert scale); the other with an open-ended answer asking respondents to explain the score given or to give further information about the same indicator. 25 participants belonging to the Audit community and 16 to the Alert community answered the questionnaire anonymously.

8. Results

The survey data revealed positive evaluations regarding the professional use of the proposed graphic formalization methods. However, there were various and sometimes considerable differences between the views expressed by the two communities. This is likely to be related to the different roles of the respective individuals: on the one hand, positive but lower scores were given by the Audit community, made up mainly of people with a managerial role; on the other hand, higher scores were assigned by the Alert community, made up of staff with a more technical role. A more analytical examination of the participants’ answers is provided in the next section.

8.1. Learnability

As shown by Table 1, both groups stated that they found it more difficult to enter the logic of the Petri Nets than that of the concept maps.

Learnability | Audit | Alert
How easy has it been for you to master the logic and syntax of the concept maps? | 3.1 | 3.7
How easy has it been for you to master the logic and syntax of the Petri Nets? | 2.6 | 2.8

Table 1. Average data relating to answers on learnability

It is a fairly common reaction, met in other similar experimentations (Trentin 1991; Stein 2002), and should be related to the greater effort of abstraction (and of dissection) that the top-down development of a Petri Net requires. The free answers given by the participants show that the use of concept maps seems to best mirror their way of coping with professional problems, i.e. considering the elements characterizing them all together and simultaneously. The use of Petri Nets, with a top-down approach, generally baffles professionals who are not used to functional abstraction mechanisms, which are more familiar in information technology and engineering. This was confirmed by directly observing the participants’ first approach towards elaborating a Petri Net, where individuals tended to draw a very detailed, and therefore complex, graph already at the overview stage of the knowledge domain. Some open answers given by participants pointed out, among the probable causes of difficulty, that they are used to a sequential approach to analyzing problems, which is closer to the logic of flow-charts (used occasionally by some of them) than to top-down logic.

8.2. General usefulness for study activities, analysis and problem-solving

To best understand the convergences and divergences expressed by the participants on this point, we will first make a quantitative comparison of the average scores assigned by the two communities and then summarize the usefulness of the two approaches in relation to every single activity indicated in the questionnaire.

8.2.1. Quantitative comparison of the scores assigned by the two communities

As can be observed in Figure 8, the trends of the average scores attributed by the two communities are fairly similar, even though they are quantitatively different. The only rather noticeable divergence corresponds to the use of concept maps for study activities. In this regard, 8 members of the Audit community justified their low score by claiming that a concept map on a given topic can be drawn up only if one already has sufficient knowledge about it. They

Figure 8. Quantitative comparison between the average scores assigned by the two communities in relation to the usefulness of graphical representations in their profession

therefore think that the use of concept maps can be more useful as a tool for self-checking one's learning than as an aid to studying (at least the basics). On the other hand, the rather high score attributed by the Alert community should be related to their idea of using concept maps as a tool to support collaborative study processes.

8.2.2. Summary on the different usefulness of the two approaches

Apart from the deviation between the quantitative evaluations formulated by the two groups and the divergence described above, the graph in Figure 8 suggests that:

– the graphical representations are considered useful particularly for analysis and problem-solving activities, and less useful for study activities. The evaluation of the Alert community is an exception to this as regards the use of concept maps;
– both communities agreed (despite attributing rather different average scores) that concept maps are more recommended for analysis activities, whilst Petri Nets are more recommended for problem-solving activities.

To sum up, the participants indicate that concept maps are more useful in describing "what it is", whilst Petri Nets are more useful in describing "what to do to."

8.3. Usefulness of graphical representations on a personal and group level

After the general considerations described in the previous sections, participants were asked to evaluate the perceived usefulness of the two graphic methodologies as tools for both personal and group use in their professional practice. Here are their evaluations:

Personal usefulness of graphical representations | Audit | Alert
How much do you think Concept Maps can/could be useful in your professional practice? | 3.3 | 3.8
How much do you think Petri Nets can/could be useful in your professional practice, for the representation of procedural knowledge? | 3.3 | 3.3
How much do you think Petri Nets can/could be useful in your professional practice, to describe complex situations/systems? | 3.2 | 3.6

Table 2. Average data relating to the personal usefulness of graphical representations

As can be seen, both communities gave scores ranging from average to high-average regarding the personal usefulness of graphical representations.

The attitude changes when the same tools are instead considered for collaborative group activities.

Usefulness of graphical representations in group work | Audit | Alert
How much do you think Concept Maps can/could be useful in group work? | 3.7 | 4.1
How much do you think Petri Nets can/could be useful in group work, for the representation of procedural knowledge? | 3.8 | 3.8
How much do you think Petri Nets can/could be useful in group work, to describe complex situations/systems? | 3.7 | 3.9

Table 3. Average data relating to the usefulness of graphical representations in group work

A comparison between Table 2 and Table 3 shows that the participants consider graphical representations more useful in group work than in individual work. Here, both communities have shown a certain convergence of opinion, although there are the usual deviations in average values.

From the diagram in Figure 9 it is interesting to observe that there is an appreciable divergence between the two communities regarding the usefulness of Petri Nets. The Audit community believes they are more effective for representing procedural knowledge, whereas the Alert community considers them more useful for activities connected with the description/analysis of complex systems. This holds for both individual and group activities. Again, the divergence of opinion is likely to be related to the members' roles within the two different communities in their respective local health units.
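The ratings above single out Petri Nets "for the representation of procedural knowledge". For readers unfamiliar with the formalism, the following is a minimal Python sketch of the standard place/transition firing rule: places hold tokens (conditions that hold, or units of an available resource), and a transition, i.e. an activity, fires only when every input place holds enough tokens. The class, the place names and the inspection scenario are hypothetical illustrations, not taken from the study or from the WoPeD tool.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Minimal place/transition net (hypothetical helper, not a WoPeD API).

    Places hold tokens: conditions that are true, or units of a resource.
    A transition (an activity) is enabled only when each of its input places
    holds at least as many tokens as the arc weight requires.
    """
    marking: dict
    transitions: dict = field(default_factory=dict)

    def add_transition(self, name, inputs, outputs):
        # inputs/outputs map place name -> arc weight (tokens consumed/produced)
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= w for place, w in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, w in inputs.items():      # consume input tokens
            self.marking[place] -= w
        for place, w in outputs.items():     # produce output tokens
            self.marking[place] = self.marking.get(place, 0) + w


# Hypothetical inspection procedure: analysing a sample needs an inspector
# and a free laboratory; both resources are returned when the activity ends.
net = PetriNet({"sample_received": 2, "inspector_free": 1, "lab_free": 0})
net.add_transition(
    "analyse",
    {"sample_received": 1, "inspector_free": 1, "lab_free": 1},
    {"sample_analysed": 1, "inspector_free": 1, "lab_free": 1},
)
net.add_transition(
    "file_report",
    {"sample_analysed": 1, "inspector_free": 1},
    {"report_filed": 1, "inspector_free": 1},
)

print(net.enabled("analyse"))  # False: the lab resource is missing
net.marking["lab_free"] = 1
net.fire("analyse")
net.fire("file_report")
print(net.marking)
```

Because each activity lists the resources it consumes and returns, a missing resource visibly blocks the procedure. A flow-chart of the same steps would not show this.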


Figure 9. Comparison between the average scores assigned by the two groups regarding the usefulness of graphical representations respectively for individual and collaborative use

9. Conclusions

Perhaps the most interesting result emerging from the research is the idea of combining the use of the two graphic tools in professional problem-solving activities. In particular, as the participants indicate explicitly in some answers, concept maps are believed to be more effective in analyzing the knowledge domain related to the problem to be faced, whereas Petri Nets are thought to be more effective in studying and describing the procedures for solving the problem itself.

This is indeed consistent with the typical stages characterizing problem-solving strategies (Heller and Reif 1984; Gick 1986):

1. analysis of the reference scenario related to the problem;
2. description of what is already known regarding the specific problem;
3. formalization of the problem and of its possible breakdown into sub-problems;
4. identification of actions to undertake to provide a solution to the problem and/or to the individual sub-problems into which it can be broken down;
5. identification of the resources necessary to carry out the actions determined in the previous point.

As can be observed, in the initial stages (points 1-2), where the question is to define the problem in terms of "what it is", the concept map would appear to be the most suitable tool. In the subsequent stages (points 3-5), Petri Nets would instead have the advantage of favoring the procedural description of "what to do to", at a macro level (solution overview) as well as at a micro level (detailed solutions to the sub-problems that make up the general problem).

With regard to the procedural representation of knowledge, it is worth pointing out that some participants found Petri Nets more effective than flow-charts in describing processes/solutions, for at least two reasons:

– besides indicating the links between the activities characterizing a process, Petri Nets require the resources necessary for carrying out those activities to be defined (flow-charts focus only on the statements);
– top-down refinement helps focus step by step on specific parts of the process, and therefore avoids having to manage the complexity of what is being studied/analysed within a single graphical representation.

These are fairly interesting conclusions that could lead to new developments in research on technological solutions supporting the integration of the two methods of formal knowledge representation discussed here. Such solutions need to offer, within the same software environment, functions supporting both conceptualization and proceduralization in problem-solving activities. These activities, as is known, provide the ideal opportunity to trigger informal peer-to-peer learning processes, which are typical of online professional communities.

References

Augier, Mie, Shariq, Syed Z. and Vendelø, Morten T. 2001. Understanding context: its emergence, transformation and role in tacit knowledge sharing. Journal of knowledge management 5: 125-36.
Ausubel, David P. 1963. The psychology of meaningful verbal learning. Grune and Stratton: New York.
Ausubel, David P. 1968. Educational psychology: a cognitive view. Holt, Rinehart & Winston: New York.
Bosch, Mela. 2006. Ontologies, different reasoning strategies, different logics, different kinds of knowledge representation: working together. Knowledge organization 33: 153-59.
Cunningham, Donald J. 1991. Assessing construction and constructing assessments: a dialogue. Educational technology 31(5): 38-45.
Donald, Janet G. 1987. Learning schemata: methods of representing cognitive, content and curriculum structures in higher education. Instructional science 16: 187-211.
Gick, Mary L. 1986. Problem-solving strategies. Educational psychologist 21: 99-120.
Haldin-Herrgard, Tua. 2000. Difficulties in diffusion of tacit knowledge in organizations. Journal of intellectual capital 1: 357-65.
Halimi, Sonia. 2006. The concept map as a cognitive tool for specialized information recall. In A. J. Cañas and J. D. Novak, eds., Concept Maps: Theory, Methodology, Technology: Proceedings of the Second International Conference on Concept Mapping. San José, Costa Rica: Universidad de Costa Rica, Sección de Impresión del SIEDIN, pp. 213-222.
Heller, Joan I. and Reif, Frederick. 1984. Prescribing effective human problem-solving processes: problem description in physics. Cognition and instruction 1: 177-216.
Lave, Jean and Wenger, Etienne. 1991. Situated learning: legitimate peripheral participation. Cambridge University Press.
Nonaka, Ikujiro and Takeuchi, Hirotaka. 1995. The knowledge-creating company: how Japanese companies create the dynamics of innovation. Oxford University Press: New York.
Novak, Joseph D. 1991. Clarify with concept maps. The science teacher 58(7): 45-49.
Novak, Joseph D. and Wandersee, Jim, eds. 1991. Special Issue on "Concept Mapping" of Journal of research in science teaching 28(10). New York: Wiley.
Peterson, James L. 1981. Petri net theory and the modeling of systems. Prentice-Hall, Inc.: Englewood Cliffs, N.J.
Polanyi, Michael. 1975. The tacit dimension. University of Chicago Press: Chicago.
Prusak, Laurence. 1994. How virtual communities enhance knowledge, Knowledge@Wharton. Retrieved from: http://www.knowledge.wharton.upenn.edu/articles.cfm?catid=7&articleid=152.
Quillian, M. Ross. 1968. Semantic memory. In M. Minsky (ed), Semantic information processing. MIT Press: Cambridge, pp. 216-70.
Shipman, Frank M. 1993. Supporting knowledge-base evolution with incremental formalization. Technical report CU-CS-658-93, Department of Computer Science, University of Colorado, USA.
Stein, Benno. 2002. Design problem-solving by functional abstraction. Retrieved from: http://www-is.informatik.uni-oldenburg.de/~sauer/puk2002/papers/stein.pdf.
Stokhof, Martin J.B. 2002. Meaning, interpretation, and semantics. In D. Barker-Plummer, D. Beaver, J. van Benthem and P. Scotto di Luzio, eds., Words, proofs, and diagrams. Stanford, CA: CSLI Press, pp. 217-40.
Trentin, Guglielmo. 1991. Description of problem solving using Petri Nets. Proceedings of the XXVth AETT International Conference, "Realizing Human Potential", AETT (Aspects of Educational and Training Technology), Roy Winterburn ed., v. 24. London: Kogan Page, pp. 122-28.
van Lambalgen, Michiel and Hamm, Fritz. 2001. Moschovakis' notion of meaning as applied to linguistics. Retrieved from: http://staff.science.uva.nl/~michiell.

Wheeler, Thomas J. 2006. Collaborative multidiscipline/multiscale analysis, modeling, simulation and integration in complex systems. In Marina L. Gavrilova et al., eds., Computational science and its applications: ICCSA 2006: International Conference, Glasgow, UK, May 8-11, 2006: Proceedings. Berlin/Heidelberg: Springer, pp. 654-664.

Knowl. Org. 34(2007)No.4 227 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

The Immediate Prospects for the Application of Ontologies in Digital Libraries

Jody L. DeRidder, Digital Library Center, James D. Hoskins Library, University of Tennessee, Knoxville, Tennessee, USA

Jody L. DeRidder received her M.S. in Computer Science from the University of Tennessee in 2002, after developing repositories for the Open Archives Initiative in its alpha phases. As the lead developer for the Digital Library Center of the University of Tennessee Libraries, she has built, customized and altered software to create interoperable digital library systems that provide usability features beyond the norm. Now nearing completion of her M.S. in Information Sciences, she has turned her research interests to interoperability between systems to support usability, sustainability in digital libraries, and the application and use of ontologies via automated cross-mapping by query engines.

DeRidder, Jody L. The Immediate Prospects for the Application of Ontologies in Digital Libraries. Knowledge Organization, 34(4), 227-246. 53 references.

ABSTRACT: The purpose, scope, usage, methodology, cross-mapping and encoding of ontologies are summarized. A snapshot of current research and development includes available tools, ontologies, and query engines, with their applications. Benefits, problems, and costs are discussed, and the feasibility and usefulness of ontologies are weighed with respect to potential and current digital library arenas. The author concludes that ontology application potentially has a huge impact within knowledge management, enterprise integration, e-commerce, and possibly education. Outside of heavily funded domains, feasibility depends on assessment of various evolving factors, including the current tools and systems, level of adoption in the field, time and expertise available, and cost barriers.

1. Introduction: defining ontology

Each of us has a slightly different way of looking at the world. Across cultures and research areas, these differences become palpable. What is clearly understood within one community may be unknown elsewhere, and technically specific terminology needs to be translated, as if into a different language, for the general user. For applications to serve us in search and retrieval across all these variations, human knowledge needs to be made comprehensible to computer programs. Building an ontology requires capturing concepts (including implicit ones), the relationships between them, and any constraints on those relationships (de Bruijn 2003, 35). In technical terms, an ontology represents a "language" of concepts, relations, instances and axioms (de Bruijn and Polleres 2004), which enables computer applications to logically reason out solutions or adapt queries. Stanford University offers a sample ontology application, which suggests wine selections for your choice of food and includes encoding examples and explanations (Hsu 2003). To illustrate an ontology description of an object, a graphic example of an ontology applied to an audio tape of a performance of a single concerto (in the ABC ontology) is shown in figure 1 (Hunter 2001).

Figure 1.

1.1 Points to consider

For ontologies to be useful and feasible in digital libraries, several requirements must be met. First, there must be evidence that they are helpful to users: their usefulness must outweigh the cost and effort of creation and maintenance. Here we must further consider the identification of our user audiences, and the purpose and scope of what we wish to accomplish. Secondly, what is the state of the art? What parts of this territory have been mapped out, and what are still murky waters? Is there, or will there soon be, broad support for the use of cross-mapped ontologies? If the road is clear and support is available, it behooves us to make our digital libraries accessible via ontology mapping, to increase accessibility and interoperability, and to leverage the work in the broader arena to meet our constituents' needs. If it will be years before the path is paved, standards will likely change rapidly over that time. Those with the funding and the capability can lead the way, contributing to the development of standards and interoperability. If funding and capabilities are limited, it is wiser to wait until the paths are well laid and the process is easier. Thirdly, we need user-friendly tools and methodology. What are the steps? What personnel and tools are needed? As the field is clearly still in its beginning stages, an overview of current research and development is provided for further investigation. Finally, we must seriously consider the costs. What level of funds, personnel, and expertise is available?

1.2 Benefits

As systems grow in decentralized ways, semantic heterogeneity is inevitable; how do we provide functional search and retrieval across distributed digital libraries? Searching by keyword retrieves irrelevant information when a term has multiple meanings, and misses information when multiple terms have the same meaning. In addition, concepts that are not represented by the terminology in a document or its metadata are not available to searchers. Information retrieval is a negotiation process, and as digital content multiplies, users need assistance in wading through the results of their searches. A comparison of precision and recall between full-text searching, latent semantic indexing, and ontology-based retrieval (with manual assignment of concepts to the query) finds ontologies capable of providing far better retrieval efficiency (Paralič and Kostial 2003).

Digital libraries routinely provide their services without human assistance; thus it is essential that their metadata be suitable for computation, supporting inference. The reference interview is not available; therefore, computer applications need to be able to reason about their contents to reformulate queries, deduce relations between works, and customize services to the task and user. This is only possible via ontologies (Weinstein and Birmingham 1998).

Imagine a user entering a query, and the computer application offering different meanings for the entered terms; the user selects the intended meaning, or chooses one of the related terms offered. The query engine transforms the query into a language that matches the terminology used in describing the data sources. In addition, it locates material related to the query, based on logical deduction and inference, and offers these results on the side. In this manner, relevance and pertinence are improved, and browsing is enabled. With ontologies, we enable computer applications to perform intelligent searching instead of keyword matching and query answering instead of information retrieval, and to provide customized views of materials. A standardized vocabulary referring to natural language semantics enables automatic and human agents to share information and interoperate functionally (Fensel et al. 2003c).
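The query scenario described above (offer meanings, let the user choose, translate the choice into the data source's terminology, and suggest related material) can be sketched in a few lines. This is a minimal Python illustration under loud assumptions: the three lookup tables stand in for a real ontology and query engine, and every concept identifier, gloss and catalogue label is hypothetical. "Related" is approximated here by sibling concepts under the same broader concept, which is only one simple form of inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str    # hypothetical identifier
    gloss: str  # short definition shown to the user when disambiguating


PLANET_MERCURY = Concept("ex:MercuryPlanet", "innermost planet of the solar system")
PLANET_VENUS = Concept("ex:Venus", "second planet from the sun")
ELEMENT_MERCURY = Concept("ex:MercuryElement", "chemical element Hg")

# Assumed lookup tables standing in for a real ontology.
SENSES = {"mercury": [PLANET_MERCURY, ELEMENT_MERCURY], "venus": [PLANET_VENUS]}
BROADER = {PLANET_MERCURY: "ex:Planet", PLANET_VENUS: "ex:Planet", ELEMENT_MERCURY: "ex:Metal"}
SOURCE_LABELS = {  # terminology used by one hypothetical catalogue
    PLANET_MERCURY: ["Mercury (Planet)"],
    PLANET_VENUS: ["Venus (Planet)"],
    ELEMENT_MERCURY: ["Mercury", "Hg"],
}


def meanings(term):
    """Step 1: offer the user the candidate senses of an entered term."""
    return SENSES.get(term.lower(), [])


def rewrite(concept):
    """Step 2: translate the chosen sense into the data source's own terms."""
    return " OR ".join(f'"{label}"' for label in SOURCE_LABELS[concept])


def related(concept):
    """Step 3: suggest related concepts (here: siblings under the same broader concept)."""
    parent = BROADER.get(concept)
    return [c for c, p in BROADER.items() if p == parent and c != concept]


options = meanings("Mercury")
for i, c in enumerate(options):
    print(i, c.gloss)
chosen = options[0]                      # the user picks the planet
print(rewrite(chosen))                   # "Mercury (Planet)"
print([c.uri for c in related(chosen)])  # ['ex:Venus']
```

A keyword search for "Mercury" would mix planets and metals; once the user picks a sense, the rewritten query matches only the catalogue's label for that concept.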

1.3 Depth and breadth

There are many different ways to classify ontologies; two of the most useful reflect the depth and the breadth of the ontology. In the depth dimension, the specificity of the ontology determines its "weight." Lightweight ontologies are little more than taxonomies, and include only concepts and their properties, relationships between those concepts, and controlled vocabularies. Heavyweight ontologies also include axioms and constraints that increase the capability of a computer application to reason logically with the data given. Dublin Core might be considered an extremely lightweight ontology, whereas Cyc (created using the Knowledge Interchange Format, a proposed standard) may be the most extensive top-level ontology currently in existence (de Bruijn 2003, 6-9). (Two limited versions of this encyclopedic ontology are available: OpenCyc and ResearchCyc.) In the breadth dimension, there are general (top-level, or global) ontologies, domain ontologies (specific to a particular area) and application ontologies, which describe concepts depending on the task as well as the domain (some regard application ontologies as another form of domain ontology).

1.4. Cross-mapping issues

In order to provide searching via natural vocabulary, a mapping is needed from the natural language of each user group to the entries in each metadata vocabulary. This is known as an "entry vocabulary index" or EVI. In addition, to search across databases, it is necessary to have mappings between each possible pair of system vocabularies, or ontologies. Mapping between ontologies must be done by people competent in both domains; at present, human assistance will likely remain necessary for high-quality mappings for some time to come (Bockting 2005).

Problems in cross-mappings can be of several types. Data objects of the same name may describe different real-world elements; concepts may be ascribed to different levels of the metadata structures (an attribute in one ontology may be a class in another); conceptual approaches may preclude a functional correspondence; descriptions of a single real-world element may vary considerably and conflict with one another; and one of the ontologies may contain incorrect information (Adam, Atluri, and Adiwijaya 2000). A concept in one ontology may not exist in another, or may have an entirely different meaning. For example, in the Harmony Project, members of the closely related digital library and cultural heritage/museum communities sought to merge the digital library ABC Ontology (Lagoze and Hunter 2001) with the CIDOC (International Committee for Documentation of the International Council of Museums) Conceptual Reference Model (ICOM/CIDOC and CIDOC CRM 2005). They uncovered cultural biases, particularly in terms of the nature of change: while both ontologies were concerned with change over time, one modeled the change of objects, while the other modeled changes in the context and meaning of those objects (Doerr, Hunter, and Lagoze 2003). A comprehensive overview of the problem areas of mapping, including variation in expressiveness and differing modeling paradigms or styles, is given by Klein and diagrammed in figure 2 (Klein 2001).
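The two kinds of mapping described in section 1.4 can be chained: an entry vocabulary index takes a user's everyday phrase to a controlled term, and a pairwise crosswalk carries that term into a second vocabulary. The following Python sketch is only an assumption-laden illustration. Both dictionaries and all terms in them are hypothetical, and real EVIs are usually built statistically rather than by hand. The missing crosswalk entry shows the case, noted above, of a concept that does not exist in the other vocabulary.

```python
# Hypothetical entry vocabulary index (EVI) for one user group:
# everyday phrasing -> preferred term in controlled vocabulary A.
EVI_A = {
    "heart attack": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "angina": "Angina Pectoris",
}

# Hypothetical crosswalk from vocabulary A to vocabulary B, i.e. one of the
# pairwise mappings needed to search across both databases.
A_TO_B = {
    "Myocardial Infarction": ["Heart Attack"],
    "Hypertension": ["Blood Pressure, High"],
    # "Angina Pectoris" deliberately has no counterpart in B
}


def search_terms(user_phrase):
    """Return the terms to send to each database for a user's phrase."""
    term_a = EVI_A.get(user_phrase.strip().lower())
    if term_a is None:
        return {}  # the EVI has no entry; fall back to keyword search
    return {"A": [term_a], "B": A_TO_B.get(term_a, [])}


print(search_terms("Heart attack "))
print(search_terms("angina"))  # B gets no terms: the concept is missing there
```

Every additional vocabulary needs its own EVI and its own crosswalks. This is why the text stresses the cost of competent human mapping.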

Figure 2.
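One of the cross-mapping problems listed in section 1.4 is that an attribute in one ontology may be a class in another. The sketch below illustrates this under hypothetical assumptions. Ontology "A" stores a work's creator as a plain string, and ontology "B" models Person as a class with its own instances. Mapping A to B therefore means creating or finding instances rather than copying values. The `b:` and `rdf:type` strings are plain illustrative labels, not output of any RDF library.

```python
from itertools import count

# Ontology A (hypothetical): the creator is a plain string attribute of a work.
record_a = {"id": "w1", "title": "The Night Watch", "creator": "Rembrandt"}


def a_to_b(record, persons, ids):
    """Map an A-style record to B-style triples, where Person is a class.

    `persons` caches name -> person identifier so that one string becomes one
    instance. Matching on the name alone is exactly the risky step: two people
    with the same name would be merged, so real mappings need human review.
    """
    name = record["creator"]
    if name not in persons:
        persons[name] = f"b:person{next(ids)}"
    person, work = persons[name], f"b:{record['id']}"
    return [
        (work, "rdf:type", "b:Work"),
        (work, "b:title", record["title"]),
        (person, "rdf:type", "b:Person"),
        (person, "b:name", name),
        (work, "b:createdBy", person),
    ]


persons, ids = {}, count(1)
for triple in a_to_b(record_a, persons, ids):
    print(triple)
```

The name-matching heuristic reproduces the first conflict type in the list, where objects with the same name may describe different real-world elements. That is where human judgment re-enters the process.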

An IMLS-funded effort (National Leadership Grant No. 178), based on prior research partially supported by a DARPA (Defense Advanced Research Projects Agency) contract, explored the feasibility of cross-mapping vocabularies of numeric data sets and text files (Buckland et al. 2007). It found that the vocabularies for topical categorization vary greatly, requiring interpretive mappings between systems, and that the specification of geographical area and time period is problematic. The names of both places and time periods are culturally based, unstable, and ambiguous. The use of geospatial coordinates is suggested as the only effective method of relating locations to search terms, which means that both gazetteers and map visualizations become critical to implementing search and retrieval in a user-friendly manner. A similar application needs to be developed for time periods, and this issue is being addressed in a subsequent IMLS-funded study by the Electronic Cultural Atlas Initiative (Electronic Cultural Atlas Initiative 2006). Among other objectives, the intent is to contextualize objects in library and museum collections by using or adapting existing and emerging standards and protocols. This initiative is described further by Petras et al. (2006).

Ontologies must be expected to evolve over time as knowledge and understanding grow and terminology changes. Their mappings to other ontologies must also evolve, and this evolution may require changes in the other ontologies to which they are mapped (de Bruijn and Polleres 2004, 11). Thus the initial effort to develop ontologies is insufficient; they must not only be maintained but also versioned over time, with compatibility with other ontologies considered at each evolution. Cross-mappings are rare, expensive, time-consuming, and difficult to maintain. With 135 semantic types and 54 relationships, the Unified Medical Language System Metathesaurus is a notable example (Smith et al. 2004).

1.5. A bird's eye view

It is insufficient to consider ontology mapping as a singular or purely local problem. Many differing ontologies already exist with overlapping domains of knowledge and application (de Bruijn 2003). And there are at least three basic conceptual approaches to interoperability: a global ontology to which all local ontologies are mapped, a peer-to-peer system (where mappings exist between local ontologies where needed), and a combination of the two. A central, heavyweight global ontology is clearly preferable for computer applications, as one-to-one mappings among all the ontologies involved do not scale. However, obtaining global agreement on controlled terms and relationships is infeasible, so a layered approach based on generality is more likely to succeed, with mappings between domains and higher-level ontologies as needed (Meersman 1999). A single general lightweight ontology to be shared by multiple domains was explored by Stuckenschmidt and van Harmelen (2005). After developing their framework, the authors stated that the shared ontology can only be developed if all sources of information are known and the conceptualization of each source is accessible; they concluded this was only feasible for a single domain (Stuckenschmidt and van Harmelen 2005, 249). De Bruijn and Polleres add that a limitation of this approach would be the likely lack of agreement among the authors of local ontologies on the interpretation of the concepts in the shared ontology (de Bruijn and Polleres 2004).

Another possible middle ground between the peer-to-peer approach and the central core ontology method would be to implement layers, or a hierarchical application (de Bruijn and Polleres 2004). One way to envision this is to compare a scientific discipline with a group of islands, where each area of research is an island, and each island is further broken down by specificity into "dialects." If a single island had 3 dialects, each dialect would be a Level 1 ontology, probably the most specific in terminology. A shared ontology for the entire island would be a Level 2 ontology. Islands (or domains) could map to one another as needed. A shared ontology for the group of islands would be a Level 3 ontology, the most general so far. Other sets of islands could have a similar structure, and again, the hierarchy could continue as needed, but with a distributed, organically growing base rather than a single top-down application. This may be the only feasible solution, as it reflects the grassroots approach and grows as needed.

2. State of the art

Currently, the semantic search engine Swoogle states that there are at least 10,000 ontologies in use on the WWW, and provides a list of ontology repositories, semantic web search engines and crawlers. The 2005 version of Swoogle indexed 337,182 documents, while the 2006 version lists 2,030,039 (Swoogle 2006), roughly a sixfold increase. This cursory comparison indicates a growing interest in the implementation of ontologies.
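The scaling argument in section 1.5 can be made concrete with a little arithmetic. Fully pairwise (peer-to-peer) mapping among n ontologies needs n(n-1)/2 mappings, which grows quadratically. In the layered "islands" scheme, each ontology maps only to the level above it. The Python sketch below counts both under simplifying assumptions: each mapping is treated as one bidirectional artifact, and the "as needed" lateral mappings between islands are ignored. The island and dialect names are hypothetical.

```python
def pairwise_mappings(n):
    """Peer-to-peer worst case: one mapping for every pair of n ontologies."""
    return n * (n - 1) // 2


def layered_mappings(children):
    """Child-to-parent mappings in a layered hierarchy.

    `children` is a nested dict {ontology: {sub-ontology: {...}}}; each
    ontology is assumed to map only to the level directly above it, and the
    lateral "as needed" mappings between islands are ignored.
    """
    return sum(1 + layered_mappings(sub) for sub in children.values())


# Hypothetical discipline: 4 islands (Level 2), each with 3 dialects (Level 1),
# all under one shared Level 3 ontology.
islands = {
    f"island{i}": {f"dialect{i}.{j}": {} for j in range(3)} for i in range(4)
}

print(pairwise_mappings(12))      # 66 mappings among the 12 dialect ontologies
print(layered_mappings(islands))  # 16 mappings: 12 dialect->island + 4 island->Level 3
print(pairwise_mappings(100))     # 4950: quadratic growth
```

The layered count grows linearly with the number of ontologies. The trade-off is that the shared Level 2 and Level 3 ontologies (five in this example) must themselves be built and agreed upon.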

2.1 Within domains within materials, would be extremely useful for isolat- ing information and minimizing the time spent sifting Ontologies seem to have already found a home in in- through search results. As the quantity of materials structional technology, as an outgrowth of KOS online explodes, findability becomes critical. (Knowledge Organization Systems). The primary An example of ontology use in enterprise integra- difference is that ontologies apply logic to the rela- tion would be the Unified Medical Language System tions (Binding and Tudhope 2004). Other differ- (UMLS), which provides services for computer appli- ences are that existing KOS lack conceptual abstrac- cations across a multitude of health-industry areas. tions, semantic coverage, consistency, and automat- The UMLS Metathesaurus is a and syn- able processing (Soergel et al. 2004). Ontologies are thesis of more than 100 different thesauri, classifica- important to education because concepts and the re- tions and code sets for health care, billing, statistics, lationships between them “provide a powerful, and medical literature, research and resources, and requires perhaps the only, level of granularity with which to constant updating and renovation. The Metathesaurus support effective access and learning” (Smith et al., preserves the many views present in the source vo- 2004, 2). A portal already exists for sharing tools, cabularies, as each may be useful for different tasks. projects, research and information for ontology use Hence, it must be customized to be effective in any in education (Dicheva et al. 2006), and a commercial one application (U.S. National Library of Medicine, success in the education arena is Xyleme, which de- March 2006a). 
UMLS includes a to pends upon the existing heterogenous XML struc- “provide a consistent categorization of all concepts ture in documents for pattern-matching, mapping, represented in the UMLS Metathesaurus and to pro- encoding, and creating “views” for abstract query re- vide a set of useful relationships between these con- sponse (Aquilera et al. 2000). cepts” (U.S. National Library of Medicine March The Alexandria Digital Earth Protoype (ADEPT), 2006b). In addition, the SPECIALIST Lexicon pro- currently in use for teaching geography courses at vides a general English vocabulary that includes bio- the University of California, employs an ontology to medical terms, for Natural Language Processing link the current lecture material to a graph showing (NLP), to improve searchability for the general user its relation to other concepts, and also links to ex- (U.S. National Library of Medicine, March 2006c). amples from the digital library. All three views are E-Commerce potential is clearly indicated in the presented at the same time, to give students the con- level to which ontologies have already proven their text and examples they need to make sense of what value in critical government defense, finance, and the teacher is trying to communicate. In addition, manufacturing. An example in the business arena is the ontology supports a Virtual Learning Environ- Australia’s InfoMaster. In the United States, Ontol- ment that lets the teacher create, use, and re-use ogy Works, founded in 1998 by former members of learning materials in different fields of science and in the intelligence community, currently serves the criti- various learning environments (Smith et al. 2004). cal needs of such clients as the U.S. Department of Yet here the content of the digital library itself is Defense, the U.S. Department of Justice, Science Ap- limited to examples, primarily images and graphs. 
For plications International Corporation, Boeing, North- digital libraries containing complex materials, there rop Grumman, and the Sierra Nevada Corporation. exists the need for two levels of access: discovery of Ontology Works is a highly successful commercial resources, and discovery within the resources, the lat- venture, and claims to have the most sophisticated on- ter of which requires the creation of descriptions of tology-driven database on the market (Ontology semantic and internal structural organization through Works, 2005). Another commercial success is Onto- resource decomposition. The GREEN digital library broker, a deductive, object-oriented database system, project explored the problems and possibilities in this now available via Ontoprise. area, using term extraction algorithms, performing MOMIS (Mediator environment for Multiple In- text analysis, and extending a combination of meta- formation Sources) has been used to model a tourism data schemes (LOM for learning objects and MatML information provider system. In the MOMIS Integra- for materials). This group noted the need for a con- tion Methodology, local source schemata are ex- vergence of metadata schemes and robust mechanisms tracted. If the source material is unstructured, text is for navigating a complex associational web of re- extracted, analyzed, and an XML schema is generated. sources (Shreve and Zeng 2003). Clearly, the ability to Then a meaning for each element of the source locate specific content, regardless of its location schema is chosen from a lexical database of English, 232 Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

WordNet (prompts for choices are given to a human; the choice is manual). A common thesaurus, a global schema, and sets of mappings to local schemata are generated. Finally, a meaning is assigned (semi-automatically) to each element of the global schema. The query manager then rewrites the incoming global query as an equivalent set of queries to match the local source schemata; local sources are queried with these, and the resulting responses are fused and reconciled into a final response (Bergamaschi et al. 2005).

Exploration has been made into non-textual content as well. Annotation of historical images with a domain-specific ontology enables users to retrieve images for which they have inadequate historical knowledge and keywords (Soo et al. 2002). An Amsterdam research group has developed a Visual Ontology using MPEG-7 and WordNet, which supports descriptions of colors and shapes of objects, to support automatic annotation (Hollink et al. 2005). By extracting and analyzing visual features and mapping clusters of sequences and patterns to ontological concepts, another experiment has demonstrated the feasibility of semi-automated ontology annotation of domain-specific videos (Bertini et al. 2005). In a fourth model, audio tapes of sports broadcasts were annotated (Khan, McLeod, and Hovy 2003), though the text analyzed was extracted from the closed captions that came with the audio objects. In this project, only three relations were modeled (isA, Instance-Of, and Part-Of), and an automatic query expansion mechanism was built using WordNet as a generic ontology, though the researchers found it too incomplete to model the domain functionally.

According to Ontology Works (2005), the leading research groups in ontologies are IFOMIS (the Institute for Formal Ontology and Medical Information Science), ECOR (European Centre for Ontological Research), LOA (Laboratory for Applied Ontology), and NCOR (National Center for Ontological Research). Based on the number of recent ontological projects, Stanford University’s Knowledge Systems, Artificial Intelligence Laboratory and the Sirma Group’s OntoText Semantic Technology Lab should perhaps be added to this list.

2.2. Across domains

One of the primary purposes of cross-mapping is to allow searching of heterogeneous resources from a single interface. The Digital Government Research Center Energy Data Collection project used an overarching ontology (SENSUS) to provide searching across over 50,000 database tables, manually defining the domain model with 500 concept nodes, then mapping them with intentionally vague semantic meaning to the possible 70,000 nodes of the larger ontology. While much of the model building was automated, it was far from simple to create a coherent domain model out of the variation of metadata and domain terms within the databases. The end product cannot support automated inference, but does enable browsing and non-expert searching with familiar terms (Hovy 2003).

OntoMedia, an open-source effort, builds on the CIDOC Conceptual Reference Model and the IFLANET (International Federation of Library Associations and Institutions) FRBR model (Functional Requirements for Bibliographic Records) to facilitate the annotation of semantic content of multimedia. It provides the user with a graphical user interface with metadata indexing and search capabilities for organizing multimedia collections, though the ontology is presented as a general, high-level ontology for reuse across domains (Lawrence et al. 2005).

Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) is a joint project of MIT Libraries and MIT Computer Science and Artificial Intelligence Laboratory, which leverages and extends DSpace. The intent is to enhance general interoperability across distributed information stores of varying types, and to provide useful end-user services for mining that material (Leuf 2006, 223-4). In an early prototype of the project, VRA Core (Visual Resources Association Data Standards Committee) and IMS LOM (Learning Object Metadata) were translated into RDF schemas with enrichment obtained from Wikipedia and the prototype OCLC Library of Congress Name Authority Service. Then the datasets were transformed from XML to RDF/XML using XSLT. While the developers were able to automate linkage of RDF datasets using string similarity techniques, the approach was error prone and results had to be manually reviewed. Likewise, the enrichment techniques could be automated, but again required human intervention to verify the validity of the data produced (Butler et al. 2004).

3. Fundamentals

3.1. Methodology

A recent analysis of the state of ontology engineering bemoans a lack of guidance, unified methodology,

cost-benefit analysis tools, and selection support to choose engineering approaches (Simperl and Tempich 2006). Real-world applications require comprehension of the scope and progression of the project, customizable workflows, user-friendly tools, and automation of the majority of the tasks. While several ontology management tools are relatively mature, many necessary ontology engineering activities are not yet adequately supported by technology, and critical aspects, such as automation of ontology creation, application, and mapping, are still being researched. The basic model for implementation of an ontology (without consideration of ontology mapping: see Figure 3) includes a feasibility study, domain analysis, conceptualization, encoding, maintenance and use (Simperl and Tempich 2006).

Figure 3. Ontology Engineering Activities

3.2. Purpose and scope

Before choosing, adapting, or creating an ontology, the purpose and the user audience must be determined. If the domain is clearly delineated and there is no desire for interoperability or cross-mapping to outside ontologies, the scope and direction are simplified. If, however, the desired outcome is more diverse and interoperable, the choices made in this assessment will be both critical and complex.

One unusual investigation tested the hypothesis that the more indexing is geared toward the user task, the better the results. Kabel, Hoog, Wielinga and Anjewierden (2004) compared the efficiency, effectiveness, precision of use, and quality of results when users creating lesson plans were given access to keywords, a domain index, or an instructional index. The domain index was content-based, with specific terminology. The instructional index classified objects by their use in instructional material, and hence was task-oriented (an application ontology). An example of this would be a “behavioral description” with “specific” scope and the instructional role of “illustration.” Their hypothesis was generally confirmed. The domain index provided more efficient, effective search and retrieval than the keyword search, and the instructional index provided better precision than the use of keywords and domain indexing. Hence, it appears that we need to clearly understand the needs of our users in order to choose the type of ontology that will actually provide the specificity they need for the task at hand.

ScholOnto (Shum et al. 2000), for example, is an effort to develop an ontology for discourse about research, rather than for the research itself, which is an

interesting twist. Designed to provide an ontology for scholars to interpret, discuss, analyze and debate existing literature, ScholOnto, developed using OCML (Operational Conceptual Modeling Language), overlays existing metadata and does not attempt to describe the content of the research directly. Instead, the ontology provides a structure to clarify the intellectual lineage of ideas, their impact, scholarly perspectives on those ideas, inconsistencies in approaches or claims, and convergences of different streams of research (Shum et al. 2000, 3). Here, the comments about the literature become the objects for retrieval and for building new structures to define the usefulness of the object. This is a social networking function, an interactive community-created layer over the research itself. This could be an invaluable way to add context and clarity to the understanding and exploration of a domain. Thus, the application of ontologies to digital libraries might lie not in querying the documents themselves, but in building relationships, connections, and social context around the documents.

3.3. Conceptualization

If ontologies exist that can be adapted to the purpose at hand, tools are needed to perform such adaptation. If an appropriate ontology does not yet exist, tools are needed for modeling and constructing the ontology. Selecting or creating an ontology involves a fundamental tradeoff between complexity and generality on the one hand and efficiency of interpretation and reasoning within the language on the other (Weinstein and Birmingham 1998, 35). Careful consideration must be given to the services desired. The following findings are intended to provide a starting point for further exploration.

One possibility is that of creating ontologies out of existing metadata schemes or thesauri, adapting and adding as needed. The more complex and structurally coherent the metadata scheme, the more feasible this may be. One effort under development is an adaptation of the AGROVOC Thesaurus, developed and maintained by the Food and Agriculture Organization of the United Nations (Soergel et al. 2004). An older effort to transform MARC (MAchine Readable Cataloging) uncovered difficulties in the varying dimensions and multiple levels of granularity containing partial descriptions, which is a requisite feature of bibliographic data (Weinstein and Birmingham 1998). Another possibility is creating an ontology from scratch, using existing models to pave the way. OCML (Operational Conceptual Modeling Language) supports the construction of ontologies and problem-solving methods, and is supported by a large library of reusable models (via the WebOnto editor). Currently in use by several projects, OCML is available free of charge for non-commercial use.

Building on previous work is a third option, and the one which offers the greatest variety of tools at present. Many of these are domain-specific.

The fundamental classes underlying the ABC Metadata Model Constructor were determined by analyzing commonalities between Dublin Core, INDECS (Interoperability of Data in e-Commerce Systems), MPEG-7 (Multimedia Content Description Interface), the CIDOC (International Committee for Documentation of the International Council of Museums) Conceptual Reference Model and the IFLANET (International Federation of Library Associations and Institutions) FRBR model (Functional Requirements for Bibliographic Records). These classes form building blocks for developing either application or domain-specific ontologies, with event-aware views for modeling different manifestations of a relationship (Hunter 2001). This tool provides graphical user interfaces and is free to download, but it is still an unsupported experimental prototype (Leuf 2006, 217-8) and assumes that users understand Java, RDF, and basic ontology and metadata principles.

WebOnto is a freely available Java applet coupled with a customized web server (LispWeb), which provides browsing, visualization and editing of knowledge models via the web. WebOnto is currently being used with ScholOnto (discussed above) and PlanetOnto, for search, retrieval, news feeds, alerts, and presentations of laboratory-related information.

The Kraft project outlines steps to building shared ontologies: ontology scoping, domain analysis, ontology formulation, and top-level ontology (Jones et al. 1998). However, its methodology lacks comprehensive evaluation of ontologies and is not applicable to global domains (Stuckenschmidt 2005, 68).

The Protégé open-source Ontology Editor provides two main ways of modeling ontologies, and

can export in various formats including OWL. Used extensively in clinical medicine and the biomedical sciences, Protégé covers the full range of development processes (Leuf 2006, 209-210).

KAON (KArlsruhe ONtology) offers a stable, open-source, comprehensive tool suite for ontology creation and management, and a framework for building applications; it was designed for business applications requiring scalability and efficient reasoning capabilities (Leuf 2006, 213).

Chimæra is a system for creating and maintaining distributed web ontologies, as well as for merging ontologies and providing multidimensional diagnoses to identify problems (Leuf 2006, 210-211). Chimæra can load and export files in OWL, and is available as open source.

There are many possible variations in the ability of software to combine and relate ontologies; Klein provides a comparison of several different approaches (see Table 1): SKC (Scalable Knowledge Composition), Chimæra, PROMPT, SHOE (Simple HTML Ontology Extensions), OntoMorph, metamodel, OKBC (Open Knowledge Base Connectivity) and layering. Of these, OntoMorph addresses the majority of the stated problems in combining ontologies. However, Klein states that “mismatches in expressiveness between languages is not solvable” and that more comprehensive schemes need to be developed for interoperability of ontologies (Klein 2001).

Table 1. Problems and approaches for combined use of ontologies (Klein 2001). Legend: A = solves problem automatically; U = solutions suggested to user; M = provides mechanism for specifying solution.

3.4 Encoding

For computer applications to be able to use ontologies, they must be encoded in machine-readable languages: in particular, all implicit relations between concepts must be explicitly encoded. To enable interoperability between ontologies and query engines, we need to agree on standards for these encodings. As in any other area, there is some disagreement on what is the most useful path. Ontology Works used the draft ISO (International Organization for Standardization) standard SCL (Simple Common Logic), which has been superseded by the Common Logic Standard, currently under development (ISO 2006). Since Ontology Works does not seek interoperability with the broader public (it is a commercial effort), its focus was on what was most efficient and effective for its needs. However, if this standard is adopted by the ISO, it will likely compete with OWL for wider ontology development. CyCorp developed its own language, CycL, for its powerful Cyc system; however, its open-source components (OpenCyc and ResearchCyc) provide translators to certain other languages, and the ability to export selectively in OWL (CyCorp 2002). Schematron, “a language for making assertions about patterns found in xml documents,” is based on the tree pattern uncovered in the marked-up document. It allows you to determine which variant of a language you are working with, as well as to verify that a document conforms to a particular schema (Leuf 2006, 218). Schematron was published as a draft ISO standard in 2004.
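The encoding requirement stated at the start of section 3.4, that implicit relations between concepts be made explicit, can be illustrated with a minimal Python sketch. It assumes a toy vocabulary stored as (subject, predicate, object) statements and a hypothetical `subClassOf` predicate treated as transitive (as `rdfs:subClassOf` is under RDF Schema semantics); the class names and the `entail_transitive` helper are invented for illustration and do not correspond to any tool discussed here. A human reader infers that a sonnet is a creative work; a program can use that fact only once the implied statement has been written out or derived.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: every statement is a (subject, predicate, object) triple.
# "subClassOf" is treated as transitive, mirroring rdfs:subClassOf in RDF Schema.
ASSERTED = {
    ("Sonnet", "subClassOf", "Poem"),
    ("Poem", "subClassOf", "LiteraryWork"),
    ("LiteraryWork", "subClassOf", "CreativeWork"),
}


def entail_transitive(triples, predicate):
    """Return every statement implied by treating `predicate` as transitive.

    The result holds the asserted statements for that predicate plus the
    implicit ones, e.g. (Sonnet, subClassOf, CreativeWork), which a program
    cannot use until they are written out explicitly.
    """
    # Adjacency list restricted to the chosen predicate.
    graph = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            graph[subject].add(obj)

    closure = set()
    for start in list(graph):
        # Depth-first walk from each subject; `seen` guards against cycles.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, predicate, node))
            stack.extend(graph.get(node, ()))
    return closure


explicit = entail_transitive(ASSERTED, "subClassOf")
implicit_only = explicit - ASSERTED  # statements a human infers but nobody asserted

for statement in sorted(implicit_only):
    print(statement)
```

Only the statements that were never asserted are printed. In practice this step would be delegated to a reasoner or to the semantics of the encoding language rather than to hand-written graph traversal.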

A proposed RDF Thesaurus Specification provides “conceptual relationships for encoding thesauri, classification systems and organized metadata”, as well as a proposal for encoding a core set of thesaurus relationships (Cross et al. 2003). The two standards that have been adopted by the World Wide Web Consortium are the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF is a simple notation for representing relationships between and among objects. RDF uses URIs (Uniform Resource Identifiers) for identification, and describes resources with statements in three parts: subject, predicate (the type of property about the subject), and object (the value of the property about the subject) (World Wide Web Consortium 2004b). OWL, the Web Ontology Language, was developed for defining and instantiating web ontologies so that computers can logically interpret information. An extension of RDF, OWL has three increasingly expressive sublanguages:

– OWL Lite is the simplest, and most closely related to thesauri.
– OWL DL is based on description logics, which enable computer applications to reason logically and make inferences.
– OWL Full provides maximum expressiveness with no computational guarantees.

OWL Full will probably never have wide usage due to its lack of tractability and lack of logic support; practical applications will likely use some subset of OWL DL, as it can provide both power and functionality (de Bruijn 2003, 74).

3.5 Tools

Much of the research in the cross-mapping arena is focused on identifying and seeking solutions to the problems, rather than developing tools. However, XeOML offers an extensible markup language for mapping ontologies against one another, two at a time. Simple mappings are one-to-one relations, and complex mappings may involve more than one element or element type in either or both languages (Pazienza 2004).

MetaNet is a metadata term thesaurus created by the Harmony project to provide additional semantic knowledge that does not exist in XML-encoded metadata descriptions. Since many entities and relationships occur across all domains, it is possible to map metadata terms in domain schemas to the preferred terms in the ABC ontology and then, based on this mapping, to generate cross-domain semantic relationships between the original metadata terms, outputting the results in RDF (Hunter 2001). In addition, Harmony offers the ABC Metadata Model Constructor for use with its ABC ontology, an RDF visualization tool for complex metadata (RDFViz), and a simple RDF query language (Rudolf “Squish”) (Brickley et al. 2002b).

The SIMILE project (Semantic Interoperability of Metadata and Information in unLike Environments) assessed existing tools in 2003, including RDF editors (IsaViz and RDFAuthor), schema editors (Protégé-2000, KAON OI-Modeller, and Ontolingua), ontology visualization software (OntoRama and Ontosaurus), application profile editors (SCART: The MEG Registry Client), metadata instance editors (Haystack, Standardized Hyper Adaptable Metadata Editor, and Simple Instance Creator), XForms for combining XML and forms, and thesaurus construction software (WebChoir vocabulary tools, Thesaurus Builder, MultiTes, and Term Tree) (Gilbert and Butler 2003). They determined that the existing tools only assist users in formally capturing existing models, rather than helping them to model their own schema. In addition, they found no formal approach for creating RDF models, so they proceeded to fill the gaps. Some of the tools they created include: a faceted browser for RDF browsing via standard web browsers (Longwell), an interactive graphical RDF visualization browser (Welkin), a tool for converting existing syntaxes into RDF (RDFizers), a tool that summarizes the structure of an XML dataset (Gadget), and a generic ontology for rendering RDF in a human-friendly manner (Fresnel, still in development) (Mazzocchi, Garland and Lee 2005).

University of Maryland’s Mindswap Lab has developed an open-source OWL-DL reasoner, Pellet, for which commercial-level support is available. The InfoSleuth project is working to develop a commercial query server that dynamically adapts to the available information sources and services, fusing related information from heterogeneous resources and abstracting results to the level appropriate to the user’s needs (Telcordia Technologies 2005). Query engines can currently be classified coarsely by whether they use a centralized ontology to which all others are mapped, or whether they support individual mappings between ontologies. TSIMMIS, InfoMaster, MOMIS, and Xyleme (an industrial solution) are based on a framework in which a single central

schema is mapped to local schemas (Pazienza et al. 2004). The Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) is a middleware of this same type, designed to access and integrate multiple ontologies which are based on a common vocabulary. The general top-level ontology it uses is based on simple Dublin Core with some added refinements (Visser and Schuster 2002). Thus the user must commit to the basic generalized vocabulary that is used to define concepts in all the source ontologies, and is not presented with a specific domain view (Stuckenschmidt 2005, 199-207).

In contrast, the OBSERVER (Ontology Based System Enhanced With Relationships for Vocabulary hEterogeneity Resolution) system requires the user to select terms from one of the ontologies it supports; the source material that ontology covers is then queried (Figure 4). If the results are not satisfactory, the user query is rewritten into the ontologies of other information sources in order to query other holdings (Mena et al. 2000). OBSERVER uses synonyms, hypernyms, hyponyms, overlap, disjointness and coverage to map between ontologies, storing these relations in a central repository to use for translating queries (Stuckenschmidt 2005, 192-198). In this manner, heterogeneous databases and ontologies are managed without the need for a single global ontology (Mena et al. 2000). MAFRA (the Ontology MApping FRAmework) is also based on distributed mediation rather than a centralized system (Pazienza et al. 2004).

Figure 4

3.6 Costs

While ontologies offer benefits in terms of interoperability, browsing and searching, reuse, and structuring knowledge in a domain, the costs must be considered. Costs include construction, learning, cross-mapping, and maintenance and continual development of both the ontologies and the software (Menzies 1997). Information about cost is difficult to obtain, as most efforts are prototypes or commercial developments. Tim Berners-Lee, a major proponent of the Semantic Web, downplays the total cost, and fails to consider methodologies, depth of ontology, or even level of usability in his online assessment (Berners-Lee 2005). In a later article with others, however, this stance is modified somewhat by implying that

general web applications may need only lightweight ontologies, and by recognizing that in certain commercial applications, the use of powerful heavyweight ontologies will easily recoup the cost (Shadbolt et al. 2006). Recently a cost estimation approach, ONTOCOM, has been developed; a detailed description is available in Bontas and Mochol (2006), and an example of its application to a particular ontology engineering methodology (DILIGENT) is described in Bontas and Tempich (2005), though the actual results of applying the many formulas to the various cost drivers are not included in that publication. These cost drivers include (Institut für Informatik 2006):

– Product factors: complexity of the domain analysis, conceptualization, implementation, instantiation, evaluation, integration, reusability, and documentation
– Personnel factors: ontologist/domain expert capability & experience, language and tool experience, and personnel continuity
– Project factors: tool support, multi-site development, and required development schedule
– Reuse/maintenance factors: ontology understandability, domain/expert unfamiliarity, and complexity of evaluation, modifications, and translations

Development of an ontology requires a shared conceptualization by domain experts, users and designers (de Bruijn 2003, 5); this is not only difficult, but requires such a high initial investment that it will only be supportable where there is commercial interest (Stuckenschmidt and van Harmelen 2005, 249). While the initial cost of ontology implementation is frightening, one IBM researcher predicts that long-term maintenance will account for 80% of the cost of an ontology (Welty 2005). In a recent survey of 34 ontology engineering projects, half of which were commercial, all participants emphasized the resource-intensive nature of domain analysis and the lack of low-barrier methods and tools (Simperl and Tempich 2006). The implications are that there must be a clear and pressing need for the benefits of ontological indexing and retrieval, sufficient to provide extensive funding or the dedicated volunteer labor of known and trusted professionals. From the limited survey of the landscape performed for this report, it appears that funding is currently available in medical fields, environmental research, national defense, and business applications. The educational field may contain sufficient volunteer experts, university support, and grant-funded development to make ontology development feasible for instructional materials.

To be able to apply an ontology effectively, much less change it, one must learn it, another time-consuming task. Apart from domain knowledge, the person encoding the document must have a level of understanding approaching that of a skilled knowledge engineer (Marshall and Shipman 2003). To expect the average citizen to have or develop the necessary knowledge and skill to apply a domain ontology coherently to a document is infeasible (Marshall 2004). If the users will not apply the ontologies, then the application of metadata to resources must be performed by the institution or service. Hence the users only bear the cost if they pay for the service, either directly or indirectly; this implies that ontologies may indeed only be feasible, in the long term, for applications in commercial services.

The only alternative would be to automate the application of ontologies to resources. The development of this functionality depends heavily on research and tools developed by the artificial intelligence community. Some of the techniques developed include a noun phrasing technique for concept extraction and concept association based on context, frequency and co-occurrence of terms (Chen 1999). However, precise meanings for every relation are necessary for automatic classification (Weinstein and Birmingham 1998). A 2003 assessment stated that a number of issues must be resolved before natural language can be understood by computers, yet the majority of information on the web is in natural language (Fensel 2003a). However, for technical fields with more structured terminology, a text-mining system for scientific literature, Textpresso, shows considerable promise for assisting in automatic ontology annotation. While the machine cannot replace the human expert, it can increase efficiency greatly (Müller et al. 2004). Further investigation into current developments in this area is warranted.

For the ontology to be widely usable and interoperable, cross-mapping to other ontologies and domains is necessary, requiring the involvement of multiple domain experts (Adam, Atluri, and Adiwijaya 2000). And ontologies (and their supporting software) must be expected to change (de Bruijn 2003, 35), as knowledge and terminology are continually evolving. It is quite possible that this aspect may restrict the usability of ontologies to specified domains. Cross-mapping is only likely if there is sufficient need and funding to offset the expense, and even then it is not likely to be maintained over time without continued funding and demand.
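The cost drivers listed in section 3.6 are the inputs to a parametric estimate. Models of the COCOMO family typically multiply a size-based baseline effort by one multiplier per cost driver, and ONTOCOM is generally described as adapting that approach. Because its formulas and calibrated values are not reproduced here, the following Python sketch simply assumes that multiplicative form; the size measure, the constants `a` and `alpha`, and every multiplier value are hypothetical placeholders. The sketch also turns Welty's 80% maintenance prediction into arithmetic: if maintenance is 80% of the total, initial development is the remaining 20%, so the lifetime cost is development divided by 0.2, or five times the initial outlay.

```python
from math import prod

# Hypothetical multipliers, one per cost-driver category in section 3.6.
# Values above 1.0 raise the estimate, values below 1.0 lower it.
# These are illustrative placeholders, NOT calibrated ONTOCOM values.
DRIVERS = {
    "domain_analysis_complexity": 1.3,  # product factor
    "ontologist_capability": 0.9,       # personnel factor
    "tool_support": 1.1,                # project factor
    "ontology_understandability": 1.0,  # reuse/maintenance factor
}


def development_effort(size_k: float, drivers: dict[str, float],
                       a: float = 2.5, alpha: float = 1.0) -> float:
    """COCOMO-style estimate in person-months (assumed form):
    effort = a * size_k ** alpha * product of all driver multipliers,
    where size_k is the ontology size in thousands of entities."""
    if size_k <= 0:
        raise ValueError("size_k must be positive")
    return a * size_k ** alpha * prod(drivers.values())


def lifetime_cost(development: float, maintenance_share: float = 0.8) -> float:
    """Total cost when maintenance accounts for `maintenance_share` of it.

    Development is then the remaining (1 - share) of the total, so the
    total is development / (1 - share): five times development at 0.8.
    """
    if not 0.0 <= maintenance_share < 1.0:
        raise ValueError("maintenance_share must be in [0, 1)")
    return development / (1.0 - maintenance_share)


effort = development_effort(2.0, DRIVERS)
print(f"initial development: {effort:.1f} person-months")
print(f"lifetime estimate:   {lifetime_cost(effort):.1f} person-months")
```

Because the drivers combine multiplicatively, a single unfavorable factor scales the whole estimate, which is one reason the resource-intensive domain analysis reported by survey participants weighs so heavily.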

4. Conclusions

The decision about whether, when, and how to apply ontologies to a digital library is a complex one. There are many aspects to consider, and several of those aspects are moving targets. Any assessment or survey, such as this one, can only be a snapshot of an evolving landscape, and as such is useful primarily in helping one get one’s bearings for the moment. Further research and feasibility studies are necessary components for any digital library considering the application of ontologies.

Some purposes of ontologies may be particularly useful. Dieter Fensel predicted in 2003 that three areas in which ontology application potentially has a huge impact are knowledge management, enterprise integration, and e-commerce (Fensel 2003b). Already this prediction seems to be proving true. If one’s digital library falls into these domains, the usefulness may outweigh the cost: funding created by demand for a service may well be sufficient to overcome other obstacles. Usefulness in educational realms seems quite promising, but the return on investment has yet to be proven (Milam 2005).

Outside of heavily funded domains, feasibility is yet to be determined. If the target audience for the digital library is the general public, at no cost to the user, then it is not likely that the application of ontologies is currently monetarily feasible. Ontologies incur tremendous expenditures of resources in their creation or adoption, application, cross-mapping, maintenance, and possibly software development. Tools exist to assist in modifying existing ontologies, but they are not simple, and require extensive domain knowledge and understanding of the concepts and relations required for the ontology to be functional. Tools to apply ontologies to existing resources are still under development. Cross-mapping ontologies for use beyond a single domain is new territory; if the source ontologies have the same basis, query engines appear to have good results, but that’s a rather telling caveat. Otherwise, it seems that only general mappings are feasible, supporting general queries with limited precision. To some extent, mappings can be automated, but they must still be reviewed by a human.

Systems to support ontology use (query engines and semantic web browsers) are becoming available, but their usefulness is limited by the ontologies and their mappings. And the cost of maintenance and continual evolution of an ontology is as yet unmeasured. On the other hand, a general ontology language has been adopted by W3C, tools and systems continue to evolve, and new ontologies appear every year. If funding exists, and an acceptable ontology exists in OWL for a domain covered by a particular digital library, it would be reasonable to assess the existing tools for application and delivery, and possibly move forward in implementation. As ontology and tool development lowers the technical and cost barriers, general digital libraries should certainly become involved, perhaps in the very near future.

References

ACS. World Ranking Thesaurus Software. Active Classification Solutions. http://www.termtree.com.au/, viewed 1 June 2007.
Adam, Nabil R., Atluri, Vijayalakshmi, and Adiwijaya, Igg. 2000. SI [system integration] in digital libraries. Communications of the ACM 46: 6.
Aguilera, Vincent, Cluet, Sophie, Veltri, Pierangelo, Vodislav, Dan, and Wattez, Fanny. 2000. Querying XML documents in Xyleme. In Proceedings of the ACM SIGIR Workshop on XML and Information Retrieval, 28 July 2000. http://www.haifa.il.ibm.com/sigir00-xml/final-papers/xyleme/XylemeQuery/XylemeQuery.html.
Becker, Peter, Green, Steve, and Roberts, Nataliya. OntoRama. Knowledge, Visualization and Ordering Laboratory. http://www.kvocentral.org/software/ontorama.html, viewed 1 June 2007.
Beckett, Dave, Steer, Damian, Heery, Rachel, and Johnston, Pete. 2003. MEG registry project. UKOLN Metadata for Education Group. http://www.ukoln.ac.uk/metadata/education/regproj/, last updated 12 September 2003.
Bergamaschi, Sonia, Beneventano, Domenico, Guerra, Francesco, and Vincini, Maurizio. 2005. Building a tourism information provider with the MOMIS system. Journal of information technology and tourism 7: 3-4. http://tourism.wu-wien.ac.at/Jitt/JITT_7_34_Bergamaschi_et_al.pdf.
Berners-Lee, Tim. 2005. Putting the Web back in Semantic Web. In 4th International Semantic Web Conference, November 2005, slide 17. http://www.w3.org/2005/Talks/1110-iswc-tbli/.
Bertini, M., Del Bimbo, A., and Torniai, C. 2005. Automatic video annotation using ontologies extended with visual information. In ACM Multimedia Conference 2005. ACM, November 2005.
Binding, Ceri, and Tudhope, Douglas. 2004. KOS at your Service: programmatic access to knowledge organisation systems. Journal of digital information 4: 4. http://jodi.tamu.edu/Articles/v04/i04/Binding.
Bockting, Sander. 2005. A semantic translation service using ontologies. In University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science 3rd Twente Student Conference on IT, Enschede, 20 June 2005. http://bockting.student.utwente.nl/documents/semantic_translation_service_using_ontologies.pdf.
Bontas, Elena Paslaru, and Mochol, Malgorzata. 2006. Ontology engineering cost estimation with ONTOCOM. Technical Report TR-B-06-01, Freie Universität Berlin, Germany, 7 February 2006. http://ontocom.ag-nbi.de/docs/tr-b-06-01.pdf.
Bontas, Elena Paslaru, and Tempich, Christoph. 2005. How much does it cost? Applying ONTOCOM to DILIGENT. Technical Report TR-B-05-20, Freie Universität Berlin, Germany, 27 October 2005. http://ontocom.ag-nbi.de/docs/tr-b-05-20.pdf.
Brickley, Dan, Miller, Libby, Hunter, Jane, and Lagoze, Carl, Principal Investigators. 2002a. About Harmony. A joint project of the Distributed Systems Technology Center (Australia), the Institute for Learning and Research Technology (UK), and Cornell Digital Library Research Group (USA). http://metadata.net/harmony/index.html, viewed 29 May 2007.
Brickley, Dan, Miller, Libby, Hunter, Jane, and Lagoze, Carl, Principal Investigators. 2002b. Harmony: Results. A joint project of the Distributed Systems Technology Center (Australia), the Institute for Learning and Research Technology (UK), and Cornell Digital Library Research Group (USA). http://metadata.net/harmony/Results.htm, viewed 30 May 2007.
Buckland, Michael, Chen, Aitao, Gey, Fredric C., and Larson, Ray R. 2006. Search across different media: numeric data sets and text files. In Information technology and libraries 25: 182. http://metadata.sims.berkeley.edu/searchacross.pdf.
Butler, Mark H., Gilbert, John, Seaborne, Andy, and Smathers, Kevin. 2004. Data conversion, extraction and record linkage using XML and RDF tools in Project SIMILE. Hewlett-Packard Company Technical Report HPL-2004-147, 31 August 2004. http://metadata.sims.berkeley.edu/searchacross.pdf.
CES. 2003. MetaNet: An Overview. A project of the European Commission Community Research Information Society Technologies, published by the University of Edinburgh, Research Centre of the School of Education, Centre for Educational Sociology. http://www.epros.ed.ac.uk/metanet/, last modified 3 November 2003.
Chalupsky, Hans. OntoMorph: a translation system for symbolic knowledge. University of Southern California Information Sciences Institute. http://www.isi.edu/~hans/ontomorph/presentation/ontomorph.html, viewed 30 May 2007.
Chen, Hsinchun. 1999. Semantic research for digital libraries. D-Lib Magazine 5: 10. http://www.dlib.org/dlib/october99/chen/10chen.html.
Cross, Phil, Brickley, Dan, and Koch, Traugott. 2003. RDF thesaurus specification (draft). Institute for Learning and Research Technology Technical Report Number 1011, 21 July 2003. http://www.ilrt.bris.ac.uk/publications/researchreport/rr1011/report_html.
CyCorp. 2007. What is Cyc? CyCorp, Inc. http://www.cyc.com/cyc/technology/whatiscyc, viewed 29 May 2007.
CyCorp. 2007. Opencyc.org: OpenCyc license information. CyCorp, Inc. http://www.opencyc.org/license, viewed 29 May 2007.
CyCorp. 2007. ResearchCyc. CyCorp, Inc. http://research.cyc.com/, viewed 29 May 2007.
CyCorp. 2002a. The Syntax of CycL. CyCorp, last updated 28 March 2002. http://www.cyc.com/cycdoc/ref/cycl-syntax.html.
CyCorp. 2002b. Frequently Asked Questions about OpenCyc, Version 07b. CyCorp, last updated 20 September 2002. http://www.opencyc.org/faq/opencyc_faq.
Davies, John, Fensel, Dieter, and Van Harmelen, Frank, editors. 2003. Towards the semantic web: ontology-driven knowledge management. West Sussex: John Wiley & Sons.
de Bruijn, Jos. 2003. Using ontologies: enabling knowledge sharing and reuse on the semantic web. Digital Enterprise Research Institute Technical Report DERI-2003-10-29, October 2003. http://www.deri.at/fileadmin/documents/DERI-TR-2003-10-29.pdf.
de Bruijn, Jos, and Polleres, Axel. 2004. Towards an ontology mapping specification language for the semantic web. Digital Enterprise Research Institute Technical Report DERI-2004-06-30, June 2004. http://www.deri.at/fileadmin/documents/DERI-TR-2004-06-30.pdf.
Delugach, Harry, editor. 2007. Common logic standard. ISO Final Draft International Standard 24707. http://cl.tamu.edu/, viewed 30 May 2007.

DGRC. The EDC Project. Digital Government Re- http://www.deri.at/fileadmin/documents/ search Center. http://www.isi.edu/dgrc/dgrc DERI-TR-2003-10-29.pdf. -research.html, viewed 29 May 2007. Fensel, Dieter, Hendler, Jim, Lieberman, Henry, and Dicheva, Darina, Sosnovsky, Sergey, Gavrilova, Ta- Wahlster, Wolfgang. 2003c Introduction. In Spin- tiana, and Brusilovsky, Peter. 2006. Ontologies for ning the semantic web. (Cambridge: MIT Press) Education Portal. A collaborative project of http://w5.cs.uni-sb.de/teaching/ws03/ Winston Salem State University, University of internetagenten/Introduction.pdf, viewed 15 May Pittsburgh, and Saint-Petersburg State Polytech- 2007. nic University. http://iiscs.wssu.edu/o4e/ Food and Agriculture Organization of the United viewhome.do?tm=O4E.xtm, viewed 1 April 2007. Nations. 2007. Agriculture Information Manage- Doerr, Martin, Hunter, Jane, and Lagoze, Carl. 2003. ment Standards: AGROVOC Thesaurus. http:// Towards a core ontology for information integra- www.fao.org/aims/ag_intro.htm, last updated 22 tion. Journal of digital information 4:1, Article May 2007. 169, 9. http://jodi.ecs.soton.ac.uk/Articles/v04/ FZI WIM and AIFB LS3. 2007. KAON Tool Suite. i01/Doerr/. http://kaon.semanticweb.org/, last updated 10 Domingue, John. WebOnto. The Open University May 2005. Knowledge Media Institute. http://kmi.open.ac.uk/ Genesreth, Michael R. 2004. Knowledge interchange projects/webonto/, viewed 30 May 2007. format: draft proposed American National Stan- Domingue, John, and Motta, Enrico. PlanetOnto. dard (dbANS), NCITS.T2/98-004. Stanford The Open University Knowledge Media Institute. Logic Group, Stanford University. http://logic http://kmi.open.ac.uk/projects/planetonto/, .stanford.edu/kif/dpans.html, viewed 14 April viewed 30 May 2007. 2007. Dublin Core Metadata Initiative. 2004. Dublin Core Gilbert, John, and Butler, Mark H. 2003. 
Review of Metadata Element Set, Version 1.1: reference de- existing tools for working with schemas, metadata, scription. Dublin Core Metadata Initiative, 20 De- and thesauri. Hewlett-Packard Company Techni- cember 2004. http://dublincore.org/documents/ cal Report HPL-2003-218, 6 November 2003. dces/, viewed 29 May 2007. http://www.hpl.hp.com/techreports/2003/HPL Electronic Cultural Atlas Initiative. 2006. Support for -2003-218.pdf. the learner: what, where, when, and who. University Gray, Peter, Gray, Alex, Fiddian, Nick, Shave, Mi- of California, Berkeley. http://ecai.org/imls2004/, chael, and Bench-Capon, Trevor, Principal Inves- viewed 29 May 2007. tigators. 2000. KRAFT: Knowledge Reuse & Fu- Faculty of Engineering at Modena. 2004. The Media- sion/Transformation. A joint project of The Uni- tor envirOnment for Multiple Information Sour- versity of Aberdeen Computer Science Depart- ces (MOMIS) Project. University of Modena e ment, The Cardiff University School of Com- Reggio Emilia, 2 October 2004. http://www puter Science, and The University of Liverpool .dbgroup.unimo.it/Momis/, viewed 15 May 2007. Computer Science Department. http://www.csd Fensel, Dieter. 2003a. From a presentation for the .abdn.ac.uk/~apreece/Research/KRAFT/, viewed Next Web Generation Seminar at the University of 30 May 2007. Innsbruck, Summer, 2003. In de Bruijn, J. Using Hjørland, Birger. 2007. Knowledge organization sys- Ontologies: Enabling Knowledge Sharing and Re- tems. Core concepts in library and information sci- use on the Semantic Web. Digital Enterprise Re- ence (LIS), 11 February 2007. http://www.db.dk/ search Institute Technical Report DERI- 2003-10- bh/lifeboat_ko/CONCEPTS/knowledge_ 29, October 2003. http://www.deri.at/fileadmin/ organization_systems.htm, viewed 15 May 2007. documents/DERI-TR-2003-10-29.pdf. Hovy, Eduard. 2003. Using an ontology to simplify Fensel, Dieter. 2003b Ontologies: a silver bullet for data access. Communications of the ACM 46. 
knowledge management and electronic commerce. Hollink, Laura, Worring, Marcel, and Schreiber, A. In de Bruijn, J. Using ontologies: enabling knowl- Th. (Guus). 2005. Building a visual ontology for edge sharing and reuse on the semantic web. Berlin: video retrieval. In ACM Multimedia Conference Springer-Verlag. Digital Enterprise Research Insti- 2005. ACM, November 2005. http://www.cs.vu tute Technical Report DERI-2003-10-29, October .nl/%7Eguus/papers/Hollink05b.pdf , viewed 10 2003. November 2007. Ontology available at http:// 242 Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

appling.kent.edu/nsdlgreen/default.htm, viewed work for a family of logic-based languages. 29 May 2007. (ISO/IEC JTC 1/SC 32 N 1498), 31 December Hsu, Eric I. 2003. Wine agent 1.0: how does it work? 2006. http://cl.tamu.edu/docs/cl/24707-31-Dec- Stanford University Knowledge Systems Artificial 2006.pdf. Intelligence Laboratory, last updated 8 April 2003. International Standards Organization. 2004. MPEG-7 http://www.ksl.stanford.edu/projects/wine/ overview. (ISO/IEC JTC1/SC29/WG11 N 6828) explanation.html Coding of Moving Pictures and Audio. http:// Hunter, Jane. 2001. MetaNet; a metadata term the- www.chiariglione.org/mpeg/standards/mpeg-7/ saurus to enable semantic interoperability be- mpeg-7.htm, viewed 29 May 2007. tween metadata domains. Journal of Digital In- International Standards Organization, International formation 1:8, No. 42, 8 February 2001. http:// Electrotechnical Commission. 2004. Document jodi.tamu.edu/Articles/v01/i08/Hunter/. schema definition languages (DSDL) – part 3: rule- Hüsemann, Bodo. 2006. OntoMedia. University of based validation – schematron. (ISO/IEC FDIS Muenster. http://www.ontomedia.de/, viewed 29 19757-3). http://www.schematron.com/iso/dsdl-3 May 2007. -fdis.pdf. ICOM/CIDOC Document Standards Group and Jelliffe, Rick. Schematron: a language for making as- CIDOC CRM Special Interest Group. 2005. sertions about patterns found in XML documents. Definition of the CIDOC Conceptual Reference http://www.schematron.com/overview.html, Model, Version 4.2. edited by Crofts, Nick, Doerr, viewed 30 May 2007. Martin, Gill, Tony, Stead, Stephen, and Stiff, Mat- Jones, D.M, Bench-Capon, T.J.M., and Visser, P.R.S. thew, June 2005. http://cidoc.ics.forth.gr/docs/ 1998. Methodologies for ontology development. cidoc_crm_version_4.2.pdf. In Jose ́ Cuena, ed., IT&KNOWS information IFLA. 1998. Functional requirements for bibliographic technology and knowledge systems: Proceedings of records. Final Report. International Federation of the XV. 
IFIP World Computer Congress, 31 Aug.- Library Associations and Institutions, Cataloguing 4 Sept. 1998, Vienna, Austria and Budapest, Hun- Section, FRBR Review Group. http://www.ifla gary. Vienna: Austrian Computer Society/Inter- .org/VII/s13/frbr/frbr.htm, viewed 29 May 2007. national Federation for Information Processing, InfoMaster. 2006. InfoMaster: the power to make pp. 62-75. better decisions. Efekt Pty Ltd. http://www Kabel, S., de Hoog, R., Wielinga, B.J., Anjewierden, .infomaster.com.au/, viewed 29 May 2007. A. 2004. The added value of task and ontology- IMS. 2007. Learning resource meta-data specifica- based markup for information retrieval. Journal of tion. IMS Global Learning Consortium, Inc. the American Society for Information Science and http://www.imsproject.org/metadata/, viewed 29 Te c h n o l o g y 55:348-62. May 2007. Kahn, L., McLeod, D., and Hovy, E. 2004. Retrieval Information Sciences Institute, University of South- effectiveness of an ontology-based model for in- ern California. Large resources: ontologies (SEN- formation selection. The international journal on SUS) and lexicons. Information Sciences Institute, very large data bases 13: 71- 85. University of Southern California. http://www.isi Karger, David R., Principal Investigator. Haystack .edu/natural-language/projects/ONTOLOGIES Project. Massachusetts Institute of Technology .html, viewed 29 May 2007. Computer Science and Artificial Intelligence Institut für Informatik. 2006a. Ontology engineer- Laboratory (MIT CSAIL). http://haystack.lcs.mit ing cost estimation with ONTOCOM. Institut für .edu/, viewed 1 June 2007. Informatik, Networked Information Systems, Kent State University. Green’s Functions Digital Li- Freie Universität Berlin. http://ontocom.ag- brary. A collaborative effort of Kent State Univer- nbi.de/index.html, viewed 1 June 2007. sity, National Institute of Standards and Testing Institut für Informatik. 2006b. 
ONTOCOM cost (NIST), and Massachusetts Institute of Technol- drivers. Institut für Informatik, Networked Infor- ogy (MIT). http://appling.kent.edu/nsdlgreen/ mation Systems, Freie Universität Berlin. http:// default.htm, viewed 29 May 2007. ontocom.ag-nbi.de/ontocom.html, viewed 1 June Klein, Michael. 2001. Combining and relating on- 2007. tologies: an analysis of problems and solutions. In International Standards Organization. 2006. Infor- International Joint Conferences on Artificial Intelli- mation technology – common logic (CL): a frame- gence Workshop on Ontologies and Information Knowl. Org. 34(2007)No.4 243 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

Sharing. http://www.informatik.uni-bremen.de/ Marshall, Catherine C. 2004. Taking a stand on the se- agki/www/buster/IJCAIwp/Finals/klein.pdf. mantic web. http://www.csdl.tamu.edu/~marshall/ Knowledge Media Institute and the Open Univer- mc-semantic-web.html, viewed 15 May 2007. sity. 2004. Scholarly ontologies project: Summary. Marshall, Catherine C. and Shipman, Frank M. 2003. http://kmi.open.ac.uk/projects/scholonto/summa Which semantic web? In HyperText ‘03 Confer- ry.html, viewed 30 May 2007. ence, Nottingham, UK, 26-30 August, 2003. Knowledge Media Institute. 2000. OCML: opera- Copyright 2003 ACM. http://www.csdl.tamu tional conceptual modelling language. http://kmi .edu/~marshall/ht03-sw-4.pdf, viewed 15 May .open.ac.uk/projects/ocml/ viewed 14 March 2007. 2007. MatML Working Group. 2004. MatML: XML for KSL. 2005a Chimæra. Stanford University Com- materials property data. http://www.matml.org/ puter Science Department, Knowledge Systems schema.htm, viewed 29 May 2007. Artificial Intelligence Laboratory. http://www-ksl Mazzocchi, Stephano, Garland, Stephen, and Lee, .stanford.edu/software/chimaera/, viewed 30 May Ryan. 2005. SIMILE: practical metadata for the 2007. semantic web. In O’Reilly XML.com 2005. XML KSL. 2005b Ontolingua. Stanford University Com- From the Inside Out, 26 January 2005. http:// puter Science Department, Knowledge Systems www.xml.com/pub/a/2005/01/26/simile.html Artificial Intelligence Laboratory. http://www-ksl Meersman, Robert. 1999. Semantic ontology tools in .stanford.edu/software/ontolingua/, viewed 1 is design. In Zbigniew W. Ras ́ and Andrzej Skow- June 2007. ron, eds., Foundations of intelligent systems: 11th Lagoze, Carl and Hunter, Jane. 2001. The ABC on- International Symposium, ISMIS’99, Warsaw, Po- tology and model. Journal of digital information land, June 8-11, 1999: proceedings. Berlin and New 2:2, Art. 77. http://jodi.ecs.soton.ac.uk/Articles/ York: Springer, 30-45. v02/i02/Lagoze/. 
Mena, Eduardo, Illarramendi, Arantza, Kashyap, Lawrence, Faith, Tuffield, Mischa M., Jewell, Mike Vipul, and Sheth, Amit P. 2000. OBSERVER: an O., Prügel-Bennett, Adam, Millard, David E., approach for query processing in global informa- Nixon, Mark S., Schraefel, Monica, and Shadbolt, tion systems based on interoperation across pre- Nigel R. 2005. OntoMedia creating an ontology existing ontologies. Distributed and Parallel Data- for marking up the contents of heterogenous me- bases 8: 223-71. dia. In Proceedings Ontology Patterns for the Se- Menzies, Tim. 1999. Cost benefits of ontologies. In- mantic Web ISWC-05 Workshop. http://eprints telligence 10, no. 3: 26-32. .ecs.soton.ac.uk/11153/01/onto_workshop.pdf. Milam, John. 2005. Ontologies in higher education. Leuf, Bo. 2006. The semantic web: crafting infrastruc- In HigherEd.org. http://highered.org/docs/milam ture for agency. West Sussex: John Wiley & Sons. -ontology.pdf, viewed 1 April 2007. Lin, David and Hunter, Jane. 2001. ABC Metadata Mindswap Lab. 2006. Pellet: an OWL DL reasoner. Model Constructor: The ABC Ontology. A result Developed at the University of Maryland’s of the DSTC (Australia), JISC (UK), and NSF Mindswap Lab, commercially supported by Clark (US) funded Harmony Project. Developed under & Parisial, LLC. http://pellet.owldl.com/, last up- the direction of Carl Lagoze. http://metadata dated 4 November 2006. .net/harmony/constructor/ABC_Constructor.htm, Miller, George A. WordNet: a lexical database for the viewed 30 May 2007. English language. Cognitive Science Laboratory, Library of Congress. 2007. MARC standards. Library Princeton University. http://wordnet.princeton of Congress, Network Development and MARC .edu/, viewed 29 May 2007. Standards Office. http://www.loc.gov/marc/, last MIT 2007a Simile: semantic interoperability of meta- updated 24 January 2007. data in unlike environments. A joint project of LSDIS. 2005. 
ADEPT: Alexandria Digital Earth Pro- Massachusetts Institute of Technology Libraries totype (1999-2004). Large Scale Distributed Infor- and Massachusetts Institute of Technology Com- mation Systems, University of Georgia, Computer puter Science and Artificial Intelligence Labora- Science Department. http://lsdis.cs.uga.edu/ tory. http://simile.mit.edu/, viewed 29 May 2007. projects/past/ADEPT/, viewed 29 May 2007. MIT and Hewlett Packard. 2007b DSpace: Welcome to DSpace. A joint project of Massachusetts Insti- 244 Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

tute of Technology Libraries and Hewlett Pack- puter Science, University of Maryland at College ard. http://www.dspace.org/, viewed 29 May 2007. Park. Motta, Enrico. OCML: operational conceptual model- http://www.cs.umd.edu/projects/plus/SHOE/ind ling language. The Open University Knowledge ex.html, viewed 30 May 2007. Media Insitute. http://kmi.open.ac.uk/projects/ Pazienza, MariaTeresa., Stellato, Armando, Vindigni, ocml/, viewed 30 May 2007. Michele, Zanzotto, Fabio Massimo. 2004. Müller, Hans-Michael, Kenny, Eimear E., and Stern- XeOML: an XML-based extensible ontology berg, Paul W. 2004. Textpresso: an ontology-based mapping language. Paper presented at the 3rd In- information retrieval and extraction system for ternational Semantic Web Conference (ISWC2004) biological literature. Public library of science: biol- in Hiroshima, Japan, November 2004. http:// ogy 2: 11. http://www.pubmedcentral.nih.gov/ ai-nlp.info.uniroma2.it/stellato/publications/2004 articleren- _ISWC-04_XeOML%20An%20XML-based%20 der.fcgi?tool=pubmed&pubmedid=15383839. extensible%20Ontology%20Mapping%20 Multisystems. 2007. Thesaurus construction and pub- Language.pdf, viewed 6 February 2007. lishing solutions. Multisystems. h t t p : / / w w w. Petras, Vivien, Larson, Ray, and Buckland, Michael. multites.com/, viewed 1 June 2007. 2006. Time period directories: a metadata infra- Noy, Natasha. Prompt. Stanford Medical Informat- structure for placing events in temporal and geo- ics, Stanford University. http://protege.stanford graphic context. In Opening information horizons: .edu/plugins/prompt/prompt.html, viewed 30 Joint Conference on Digital Libraries, Chapel Hill, May 2007. NC, 11-15 June 2006. http://metadata.sims. OCLC. 2007. Learn more about LC Name Author- berkeley.edu/tpdJCDL06.pdf, viewed 24 February ity Service. Online Computer Library Center Pro- 2007. grams and Research, ResearchWorks. http://www Pietriga, Emmanuel. 2007. 
IsaViz: a visual authoring .oclc.org/research/researchworks/authority/ tool for RDF. World Wide Web Consortium, RDF default.htm, viewed 29 May 2007. Developer. http://www.w3.org/2001/11/IsaViz/, OKBC Working Group. 1995. Open Knowledge Base May 2007. Connectivity Home Page. A joint project of Cy- Riva, Alberto. LispWeb. Common Lisp Web Server. Corp, Information Sciences Institute, Stanford http://snpper.chip.org/lispweb, viewed 30 May Knowledge Systems Laboratory, Science Applica- 2007. tions International Corporation (SAIC), SRI In- Russ, Tom and Patil, Ramesh. 2006. Loom ontosau- ternational, and Teknowledge; Richard Fikes, rus. University of Southern California Informa- working group chair. http://www.ai.sri.com/~ tion Sciences Institute. http://www.isi.edu/isd/ okbc/, viewed 30 May 2007. ontosaurus.html, last updated 5 December 2006. Ontology Works, Inc. 2005. Ontology Works knowl- Rys, Michael. 1998. The Stanford-IBM Manager of edge server. http://www.ontologyworks.com/ks Multiple Information Sources (TSIMMIS). Stan- .php, viewed 8 March 2007. ford University, last updated 4 April 1998. http:// Ontoprise. 2007. Know how to use know-how. Onto- www-db.stanford.edu/tsimmis/, viewed 3 Febru- prise GmbH. http://www.ontoprise.de/content/, ary 2007. viewed 29 May 2007. Shadbolt, Nigel, Hall, Wendy, and Berners-Lee, Tim. Palmer, Matthias, Naeve, Ambjörn, Enoksson, 2006. The semantic web revisited. IEEE Intelligent Fredrik, Nilsson, Mikael, Eriksson, Henrik, Systems 21: 96-101. http://eprints.ecs.soton.ac.uk/ Danils, Jan, and Stark, Jöran. 2007. SHAME: stan- 12614/01/Semantic_Web_Revisted.pdf. dardized hyper adaptible metadata editor. http:// Shreve, Gregory M. and Zeng, Marcia Lei. 2003. In- kmr.nada.kth.se/shame/wiki/Overview/Main, last tegrating resource metadata and domain markup updated 5 December 2006. in an NSDL collection. In Proceedings of the In- Paralič, Jan and Kostial, Ivan. 2003. 
Ontology-based ternational DCMI Metadata Conference and Work- Information Retrieval. In Proceedings of the 14th shop, Seattle, WA, 28 September - 2 October, International Conference on Information and Intel- 2003. http://www.siderean.com/dc2003/604_ ligent Systems (IIS2003), Varadin, Croatia. ISBN paper62.pdf, viewed 16 March 2007. 953-6071-22-3, 23-28. Shum, Simon Buckingham, Motta, Enrico and Parallel Understanding Systems Group. SHOE: sim- Dominigue, John. 2000. ScholOnto: an ontology- ple html ontology extensions. Department of Com- based digital library server for research documents Knowl. Org. 34(2007)No.4 245 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

and discourse. International Journal on Digital Li- http://www.ukoln.ac.uk/metadata/education/regp braries 3: 3. http://kmi.open.ac.uk/projects/ roj/scart/, viewed 1 June 2007. scholonto/docs/ScholOnto-IJoDL-2000.pdf, Stuckenschmidt, Heiner, and van Harmelen, Frank. viewed 4 February 2007. 2005. Information Sharing on the Semantic Web. SID. 2001. Interoperable Database Group, Research Berlin: Springer. Group of Distributed Information Systems Swoogle. 2006. Swoogle manual: FAQs. University of (SID). http://sid.cps.unizar.es/OBSERVER/, up- Maryland, Baltimore County Ebiquity Research dated 12 April 2001. Group. http://swoogle.umbc.edu/index.php Silva, Nuno, Varady, Zoltan, Westerhausen, Frank, ?option=com_swoogle_manual&manual=faq, Fodor, Oliver, Silva, PedroVieira, and Maio, Paulo. viewed 16 March 2007. 2006. MAFRA toolkit. http://mafra-toolkit Thesaurus Builder. 2007. Thesaurus Builder thesaurus .sourceforge.net/, last updated 2 January 2006. management software. Thesaurus Builder. http:// Simperl, Elena Paslaru Bontas, and Tempich, Chris- www.thesaurusbuilder.com/, viewed 1 June 2007. toph. 2006. Ontology engineering: a reality check. TECUP Consortium. 2001. INDECS: interoperabil- In 5th International Conference on Ontologies, Da- ity of data in e-commerce systems. TECUP Con- tabases, and Applications of Semantics. http:// sortium, lead by Universität Göttingen / Nied- ontocom.ag-nbi.de/docs/odbase2006.pdf, viewed ersächsische Staats- und Universitätsbibliothek 16 March 2007. (UNIGOE) as Coordinator. http://gdz.sub.uni Sirin, Evren. Simple Instance Creator (SIC). Mary- -goettingen.de/tecup/indecs.htm, viewed 30 May land Information and Network Dynamics Lab 2007. Semantic Web Agents Project (MINDSWAP). Telcordia Technologies. 2005. The InfoSleuth agent http://www.mindswap.org/~evren/SIC/, viewed system. Applied Research Greenhouse, Telcordia 1 June 2007. Technologies. 
http://www.argreenhouse.com/ Smith, Terence R., Zeng, Marcia L., and the ADEPT InfoSleuth/index.shtml, viewed 11 April 2007. Project Team. 2004. Building semantic tools for U.S. National Library of Medicine. 2006a UMLS me- concept-based learning spaces: knowledge bases tathesaurus fact sheet. National Institutes of of strongly-structured models for scientific con- Health, U.S. Department of Health and Human cepts in advanced digital libraries. Journal of digi- Services, 28 March 2006. http://www.nlm.nih.gov tal information 4:4, Art. 263, 28 January 2004. /pubs/factsheets/umlsmeta.html, viewed 14 http://jodi.tamu.edu/Articles/v04/i04/Smith/. March 2007. Soergel, Dagobert, Lauser, Boris, Liang, Anita, Fi- U.S. National Library of Medicine. 2006b. Unified esseha, Frehiwot, Keizer, Johannes, and Katz, Ste- medical language system. National Institutes of phen. 2004. Reengineering thesauri for new appli- Health, U.S. Department of Health and Human cations: the AGROVOC Example. In Journal of Services, 28 March 2006. http://www.nlm.nih digital information 4:3, No. 257. http://jodi.tamu .gov/research/umls/about_umls.html, viewed .edu/Articles/v04/i04/Soergel. 14 March 2007. Soo, Von-Wun, Lee, Chen-Yu and Yeh, Jaw Jium. Us- U.S. National Library of Medicine. 2006c. SPE- ing sharable ontology to retrieve historical images. CIALIST lexicon fact sheet. National Institutes of In International Conference on Digital Libraries, Health, U.S. Department of Health and Human Proceedings of the 2nd ACM/IEEE-CS Joint Con- Services, 28 March 2006. http://www.nlm.nih. ference on Digital Libraries, ACM Press, July 2002. gov/pubs/factsheets/umlslex.html, viewed 14 Stanford Medical Informatics. 2007. Protégé. Stanford March 2007. University School of Medicine, Stanford Medical U.S. National Library of Medicine. 2005. UMLS Informatics. http://protege.stanford.edu/, last up- knowledge source server (UMLSKS), Version 5.0. dated 25 May 2007. National Institutes of Health, U.S. 
Department of Steer, Damian. 2003a. RDF author. http://rdfweb Health and Human Services, 30 August 2005. .org/people/damian/RDFAuthor/, last modified 2 http://umlsks.nlm.nih.gov/, viewed 14 March 2007. August 2003. Visser, Ubbo and Hübner, Sebastian. 2003. Bremen Steer, Damian. 2003b. The Meg Registry Client Soft- University Semantic Translator for Enhanced Re- ware (SCART). UKOLN Metadata for Education trieval (BUSTER). http://www.informatik.uni Group. -bremen.de/agki/www/buster/new/application .html, last modified 25 May 2003. 246 Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

Visser, Ubbo and Schuster, Gerhard. 2002. Finding World Wide Web Consortium. 2007. Extensible and integration of information: a practical solu- markup language (XML). W3C Architecture Do- tion for the semantic web. In Proceedings of ECAI main. http://www.w3.org/XML/, viewed 29 May 02, Workshop on Ontologies and Semantic Interop- 2007. erability, pp. 73-78. http://www.informatik.uni World Wide Web Consortium. 2006. XForms 10 (Sec- -bremen.de/agki/www/buster/papers/ECAI02WS ond Edition): W3C Recommendation 14 March .pdf viewed 14 March 2007. 2006. http://www.w3.org/TR/xforms/, viewed 26 VRA. 2007. VRA core. Visual Resources Association, March 2007. The International Association of Image Media World Wide Web Consortium. (2004a) OWL web on- Professionals. http://www.vraweb.org/projects/ tology language guide. W3C Recommendation, Feb- vracore4/, viewed 29 May 2007. ruary 2004. http://www.w3.org/TR/owl-guide/, WebChoir. 2006. Project vocabulary tools. WebChoir, viewed 26 March 2007. Inc. http://www.webchoir.com/products/pvt World Wide Web Consortium. (2004b) RDF primer: .html, viewed 1 June 2007. W3C recommendation 10 February 2004. http:// Weinstein, PeterC. and Birmingham, William P. 1998. www.w3.org/TR/REC-rdf-syntax/, viewed 21 Creating ontological metadata for digital library March 2007. content and services. International journal on digital World Wide Web Consortium. 2004c. RDF/XML libraries 2: 20-37. http://deepblue.lib.umich.edu/ syntax specification (revised). W3C Recommenda- handle/2027.42/42334. tion 10 February 2004. http://www.w3.org/TR/ Welty, Chris. 2004. Ontology maintenance support: rdf-syntax-grammar/, viewed 29 May 2007. text, tools, and theories. Presentation at the 7th World Wide Web Consortium. 2000. Resource de- International Protégé Conference, Bethesda MD. scription framework (RDF) schema specification http://protege.stanford.edu/conference/2004/ 1.0. 
http://www.w3.org/TR/2000/CR-rdf-schema slides/2.1_Welty_Ontology_Maintenance_ -20000327/, viewed 29 May 2007. Support_v3.pdf, viewed 16 March 2007. World Wide Web Consortium. 1999. XSL transfor- Wiederhold, Gio, Jannink, Jan, Prasenjit, Mitra, mations (XSLT), Version 1.0. W3C Recommenda- Decker, Stefan, and Vasan, Pichai S. Scalable knowl- tion 16 November 1999. http://www.w3.org/TR/ edge composition (SKC). Stanford University In- xslt, viewed 29 May 2007. foLab. http://infolab.stanford.edu/SKC/, viewed XYLEME. 2006. Xyleme: harness the power of XML. 30 May 2007. XYLEME, 2006. http://www.xyleme.com/, viewed 29 May 2007.

Knowl. Org. 34(2007)No.4 247 K. Golub, Th. Hamon, A. Ardö. Automated Classification of Textual Documents

Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering†

Koraljka Golub,* Thierry Hamon,** and Anders Ardö***

*KnowLib Research Group, Lund University, P. O. Box 118, SE-221 00 Lund, Sweden
**Laboratoire d'Informatique de Paris-Nord – UMR CNRS 7030, Institut Galilée, Université Paris-Nord, Avenue J.-B. Clément, 93430 Villetaneuse, France
***KnowLib Research Group, Lund University, P. O. Box 118, SE-221 00 Lund, Sweden

Koraljka Golub is interested in traditional and recent knowledge organization systems in the context of digital libraries. She received her doctorate from Lund University, Sweden, in 2007, with a thesis on automated subject classification in Web-based hierarchical browsing systems. From 2008 she will work as a research officer at UKOLN, Bath, on two projects: one on terminology registries, and the other on social tagging and how it can enhance information retrieval, especially when combined with more traditional controlled vocabularies.

Thierry Hamon is an assistant professor in the Computer Science Department, Paris-Nord University. He received his Ph.D. in computer science in 2000, with a thesis on semantic variation in specialized corpora. His current research interests are terminology acquisition and structuring, and the integration of Natural Language Processing (NLP) tools. He has developed several NLP tools: SynoTerm, a terminological system for acquiring synonymy relations between terms on the basis of lexical resources; a term extractor; and a linguistic platform for enriching specialized web documents.

Anders Ardö is Associate Professor in the Department of Electrical and Information Technology, Lund University, where he manages the Knowledge Discovery and Digital Library Research Group (KnowLib). He has a background in computer systems, with a PhD from Lund University in 1986. Since 1992 he has worked on research and development of digital library services. He has participated in many EU projects, including DESIRE, Telematics Applications Programme, Renardus, ALVIS and DELOS.

† Many thanks to Traugott Koch, Douglas Tudhope, Marianne Lykke Nielsen, and the anonymous reviewers for their comments on the manuscript, which helped improve the paper considerably. The authors also wish to thank the two subject experts who helped with the evaluation. This work was supported by the IST Programme of the European Community under ALVIS (IST-002068-STP).

Golub, Koraljka, Hamon, Thierry, and Ardö, Anders. Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering. Knowledge Organization, 34(4), 247-263. 33 references.


ABSTRACT. Automated subject classification has been a challenging research issue for many years, and has received particular attention in the past decade owing to the rapid increase in digital documents. The most frequent approach to automated classification is machine learning. However, machine learning requires training documents, and it performs well on new documents only if they are sufficiently similar to the training documents. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead, it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the titles and abstracts of engineering papers from the Compendex database. Simple string matching was enhanced by several methods, such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall, obtained when the controlled vocabulary is enriched with new terms, and 79% precision, obtained when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to those of state-of-the-art machine-learning algorithms.
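The recall and precision figures reported above compare the automatically assigned classes with those assigned by human indexers. The following is a minimal Python sketch of that comparison, assuming micro-averaging over all document–class pairs. The averaging method, the function name `evaluate` and the class codes in the example are illustrative assumptions, not details taken from the study.

```python
from __future__ import annotations

from collections import Counter


def evaluate(assigned: dict[str, set[str]], gold: dict[str, set[str]]):
    """Micro-averaged precision and recall, plus precision per class.

    assigned: document id -> classes assigned automatically
    gold:     document id -> classes assigned by human indexers
    (Micro-averaging is an assumption made for this sketch.)
    """
    tp = fp = fn = 0
    class_correct = Counter()   # correct assignments per class
    class_assigned = Counter()  # all automatic assignments per class
    for doc, predicted in assigned.items():
        correct = gold.get(doc, set())
        hits = predicted & correct
        tp += len(hits)
        fp += len(predicted - correct)
        fn += len(correct - predicted)
        class_correct.update(hits)
        class_assigned.update(predicted)
    # Documents that received no automatic class: all their gold classes are missed.
    for doc, correct in gold.items():
        if doc not in assigned:
            fn += len(correct)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    per_class = {c: class_correct[c] / n for c, n in class_assigned.items()}
    return precision, recall, per_class
```

For instance, if a document indexed under classes A and B is automatically assigned A and C, and a second document indexed under C is assigned C, precision and recall are both 2/3; class A then has a precision of 1.0 and class C a precision of 0.5.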

1. Introduction

Subject classification is the organization of objects into topically related groups and the establishment of relationships between them. In automated subject classification (in further text: automated classification), human intellectual processes are replaced by, for example, statistical and computational linguistics techniques. Automated classification of textual documents has been a challenging research issue for several decades, and its relevance is growing rapidly with the advancement of the World Wide Web. Due to the high costs of human-based subject classification and the ever-increasing number of documents, there is a danger that the recognized objectives of bibliographic systems (Svenonius 2000, 20-21) will be left behind; automated means could provide a solution to preserve them (30).

Automated classification of text has many different applications (see Sebastiani 2002 and Jain et al. 1999); in this paper, the application context is that of information retrieval. In information retrieval systems, e.g., library catalogues or indexing and abstracting services, improved precision and recall are achieved by controlled vocabularies, such as classification schemes and thesauri. The specific aim of the classification algorithm is to provide a hierarchical browsing interface to a document collection through a classification scheme. In our opinion, one can distinguish between three major approaches to automated classification: text categorization, document clustering, and document classification (Golub 2006a).

In document clustering, both the subject clusters or classes into which documents are classified and, to a limited degree, the relationships between them are produced automatically. Labeling the clusters is a major research problem, and relationships between them, such as equivalence, related-term and hierarchical relationships, are even more difficult to derive automatically (Svenonius 2000, 168). In addition, “[a]utomatically-derived structures often result in heterogeneous criteria for category membership and can be difficult to understand” (Chen and Dumais 2000, 146). Also, cluster labels and the relationships between them change as new documents are added to the collection; in information retrieval systems, unstable class names and relationships are user-unfriendly, especially when used for subject browsing.

Text categorization (machine learning) is the most widespread approach to automated classification of text. Here, the characteristics of the subject classes into which documents are to be classified are learnt from documents with human-assigned classes. However, human-classified documents are often unavailable in many subject areas, for different document types or for different user groups. Judging by the standard Reuters Corpus Volume 1 collection (RCV1) (Lewis et al. 2004), some 8,000 training and testing documents would be needed per class. A related problem is that the algorithm performs well on new documents only if they are similar enough to the training documents. The issue of document collections was also pointed out by Yang (1999), who showed how different versions of one and the same document collection had a strong impact on performance.

In document classification, matching is conducted between a controlled vocabulary and the text of the documents to be classified. A major advantage of this approach is that it does not require training documents. If a well-developed classification scheme is used, the result will also be suitable for subject browsing in information retrieval systems. This would be less the case with the automatically developed classes and structures of document clustering, or with home-grown directories not created in compliance with professional principles and standards. Apart from improved information retrieval, another motivation to apply controlled vocabularies in automated classification is to reuse the intellectual effort that has gone into creating them (see also Svenonius 1997).

The importance of controlled vocabularies such as thesauri in automated classification has been recognized in recent research. Bang et al. (2006) used a thesaurus to improve the performance of a k-NN classifier and managed to improve precision by 14% without degrading recall. Medelyan and Witten (2006) showed how information from a subject-specific thesaurus improved the performance of keyphrase extraction by more than 1.5 times in F1, precision, and recall.

The overall purpose of this experiment is to gain insight into the degree to which a good controlled vocabulary, such as the Engineering Information thesaurus and classification scheme (Milstead 1995) (in further text: Ei controlled vocabulary), could be used in automated classification of text using string-matching. Vocabulary control in thesauri is achieved in several ways (Aitchinson et al. 2000). We believe that the following could be beneficial in the process of automated classification:

– Terms in thesauri are usually noun phrases, which are content words;
– Three main types of relationships are displayed in a thesaurus:
  - equivalence (e.g., synonyms, lexical variants);
  - hierarchical (e.g., generic, whole-part, instance relationships); and,
  - associative (terms that are closely related conceptually but not hierarchically and are not members of an equivalence set).
– In automated classification, equivalence terms could allow for discovering concepts and not just the terms expressing them. Hierarchies could provide additional context for determining the correct meaning of a term, and so could associative relationships;
– When a term has more than one meaning in the thesaurus, each meaning is indicated by the addition of scope notes and definitions, providing additional context for automated classification.

In a previous paper, Golub (2006c) explored to what degree different types of Ei thesaurus terms and Ei classification captions influence the performance of automated classification. In short, the algorithm searched for terms from the Ei controlled vocabulary in the engineering documents to be classified (see 2.1). The majority of classes were found when all types of terms were used (preferred terms, their synonyms, related, broader and narrower terms, and captions) in combination with a stemmer: recall was 73%. The remaining 27% of classes were not found because the words in the term list designating those classes did not occur in the text of the documents to be classified. No weighting or cut-offs were applied in that experiment. The experiment showed that all those types of terms should be used in the term list in order to achieve the best recall. It also indicated that higher weights could be given to preferred terms (from the thesaurus), captions (from the classification scheme) and synonyms (from the thesaurus), as those three types of terms yielded the highest precision.

The aim of this experiment is to improve the classification algorithm based on string-matching between the Ei controlled vocabulary and the engineering documents to be classified. In particular, we wanted to:

– increase F1 and precision to levels similar to the recall achieved in the previous experiment (Golub 2006c, 964), by applying different weights and cut-offs; and,
– increase recall beyond that achieved in the previous experiment by adding new terms extracted using natural language processing methods such as multi-word morpho-syntactic analysis and synonym extraction.

2. Methodology

2.1 String matching algorithm

This section describes the classification algorithm used in the experiment. It is based on searching for terms from the Ei controlled vocabulary, which covers the field of engineering, in the text of the documents to be classified (also in the field of engineering). The Ei controlled vocabulary consists of two parts: a thesaurus of engineering terms, and a hierarchical classification scheme of engineering topics. These two types of controlled vocabulary have traditionally had distinct functions. The thesaurus has been used to describe a document with as many controlled terms as possible. The classification scheme has been used to group similar documents together for the purpose of shelving them and allowing systematic browsing.

The aim of the algorithm was to classify documents into classes of the Ei classification scheme in order to provide a browsing interface to the document collection. A major advantage of Ei is that thesaurus descriptors are mapped to classes of the classification scheme. These mappings have been made manually (intellectually) and are an integral part of the thesaurus. Compared with captions alone, mapped thesaurus terms provide a rich additional vocabulary for every class. Instead of having only one term per class (there is only one caption per class), in our experiment there were on average 88 terms per class. (A caption is a class notation expressed in words; e.g., in the Ei classification scheme, “Electric and Electronic Instruments” is the caption for class “942.1”.)

Pre-processing of Ei included normalizing upper- and lower-case words. Upper-case words were left in upper case in the term list, on the assumption that they were acronyms. All other words containing at least one lower-case letter were converted into lower case. The first major step in designing the algorithm was to extract terms from Ei into what we call a term list. The term list contained three elements:

– class captions and thesaurus terms (Term);
– the classes to which the terms and captions map or which they denote (Class); and,
– a weight indicating how appropriate the term is for the class to which it maps or which it designates (Weight).

Geographical names, all mapping to class 95, were excluded on the grounds that they are not engineering-specific. The term list was formed as an array of triplets:

Weight: Term (single word, Boolean term or phrase) = Class

Single-word terms were terms consisting of one word. Boolean terms were terms consisting of two or more words that must all be present, but in any order and at any distance from each other. Boolean terms in this form were not explicitly part of Ei, but were created for our purpose. They were considered to be those terms which in Ei contained any of the following strings:

– and;
– vs. (short for versus);
– , (comma);
– ; (semicolon, separating different concepts in class captions);
– ( and ) (parentheses, indicating the context of a homonym);
– : (colon, indicating a more specific description of the previous term in a class caption); and,
– – (double dash, indicating a heading–subheading relationship).

We replaced these strings with @and, which indicated the Boolean relation in the term. All other terms consisting of two or more words were treated as phrases, i.e., strings that need to be present in the document in exactly the same order and form as in the term. Ei comprises a large portion of composite terms (3,474 out of a total of 4,411 distinct terms in our experiment). As such, Ei provides a rich and precise vocabulary with the potential to reduce the risk of false hits.

The following are two excerpts from the Ei classification scheme and thesaurus, from which the excerpt of the term list (further below) is created:

From the classification scheme:

931.2 Physical Properties of Gases, Liquids and Solids
…
942.1 Electric and Electronic Instruments
…
943.2 Mechanical Variables Measurements

From the thesaurus:

TM Amperometric sensors
UF Sensors–Amperometric measurements
MC 942.1
…
TM Angle measurement
UF Angular measurement
UF Mechanical variables measurement–Angles
BT Spatial variables measurement
RT Micrometers
MC 943.2
…
TM Anisotropy
NT Magnetic anisotropy
MC 931.2

All the different thesaurus terms as well as captions were added to the term list. Choosing all types of thesaurus terms might lead to losses in precision. We did so nonetheless in order to achieve maximum recall, as shown in a previous paper (Golub 2006c). The thesaurus uses the following codes:

– TM: preferred term;
– UF (“Used For”): equivalent term;
– BT: broader term;
– RT: related term;
– NT: narrower term;
– MC: main class;
– OC: optional class, valid only in certain cases (appears only sometimes).

Main and optional classes are the classes of the Ei classification scheme to which thesaurus terms have been mapped. These mappings have been made manually (intellectually) and are an integral part of the thesaurus. Based on the above excerpts, the following term list would be created:

1: physical properties of gases @and liquids @and solids = 931.2,
1: electric @and electronic instruments = 942.1,
1: mechanical variables measurements = 943.2,
1: amperometric sensors = 942.1,
1: sensors @and amperometric measurements = 942.1,
1: angle measurement = 943.2,
1: angular measurement = 943.2,
1: mechanical variables measurement @and angles = 943.2,
1: spatial variables measurement = 943.2,
1: micrometers = 943.2,
1: anisotropy = 931.2,
1: magnetic anisotropy = 931.2,
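The term-list construction and matching rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are hypothetical:

– the function names `to_term`, `matches` and `classify`;
– the regular expression used to detect the Boolean strings;
– tokenization on word characters;
– counting each matching term once per document;
– the `cutoff` parameter.

Stop-word removal and stemming (section 3.1.1) are omitted. The sketch converts the excerpts above into @and terms and sums weights per class, which the next paragraph explains.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the strings that turn an Ei term into a Boolean term:
# "and", "vs.", comma, semicolon, parentheses, colon, and the double dash
# (written "--" or printed as an en dash in the excerpts).
BOOLEAN_MARKERS = re.compile(r"\s*(?:\band\b|\bvs\.|--|\u2013|[,;:()])\s*")


def normalize_case(word):
    """Keep all-upper-case words (assumed acronyms); lower-case all others."""
    return word if word.isupper() else word.lower()


def to_term(ei_string):
    """Convert an Ei caption or thesaurus term into a term-list string."""
    text = " ".join(normalize_case(w) for w in ei_string.split())
    parts = [p for p in BOOLEAN_MARKERS.split(text) if p]
    return " @and ".join(parts)


def tokens(text):
    """Split text into case-normalized word tokens (assumed tokenization)."""
    return [normalize_case(w) for w in re.findall(r"\w+", text)]


def matches(term, doc_tokens):
    """Apply the single-word, phrase and Boolean matching rules."""
    parts = term.split(" @and ")
    if len(parts) > 1:
        # Boolean term: every word present, in any order and at any distance.
        doc_words = set(doc_tokens)
        return all(w in doc_words for part in parts for w in tokens(part))
    words = tokens(term)
    n = len(words)
    # Single word (n == 1) or phrase: the exact word sequence must occur.
    return any(doc_tokens[i:i + n] == words for i in range(len(doc_tokens) - n + 1))


def classify(doc_text, term_list, cutoff=0):
    """Sum weights of matching terms per class; keep classes scoring above cutoff."""
    doc_tokens = tokens(doc_text)
    scores = defaultdict(int)
    for weight, term, cls in term_list:
        if matches(term, doc_tokens):
            scores[cls] += weight  # assumption: each matching term counts once
    return {cls: score for cls, score in scores.items() if score > cutoff}


# (weight, term, class) triplets built from the excerpts, weight 1 as baseline.
term_list = [
    (1, to_term("Physical Properties of Gases, Liquids and Solids"), "931.2"),
    (1, to_term("Electric and Electronic Instruments"), "942.1"),
    (1, to_term("Amperometric sensors"), "942.1"),
    (1, to_term("Sensors\u2013Amperometric measurements"), "942.1"),
    (1, to_term("Angle measurement"), "943.2"),
    (1, to_term("Anisotropy"), "931.2"),
    (1, to_term("Magnetic anisotropy"), "931.2"),
]

# Invented example text, not a Compendex record.
doc = "Magnetic anisotropy of thin films measured with amperometric sensors"
print(to_term("Physical Properties of Gases, Liquids and Solids"))
# physical properties of gases @and liquids @and solids
print(classify(doc, term_list))            # {'942.1': 1, '931.2': 2}
print(classify(doc, term_list, cutoff=1))  # {'931.2': 2}
```

The sample document matches "amperometric sensors" (942.1), "anisotropy" and "magnetic anisotropy" (931.2). It does not match the Boolean term "sensors @and amperometric measurements", because the word "measurements" is absent. Raising `cutoff` discards weakly supported classes, which is the lever the weighting schemes in section 3.1 are tuned against.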

The number at the beginning of each triplet is a weight estimating the probability that the term of the triplet designates the class. In this example it is set to 1 as a baseline; experiments with different weights are discussed later on.

The algorithm searches the document to be classified for the strings of a given term list. If a string (e.g., magnetic anisotropy from the above list) is found, the class(es) assigned to that string in the term list (931.2 in our example) are assigned to the document. One class can be designated by many terms, and each time a term is found, the corresponding weight (1 in our example) is added to a score for the class. The scores for each class are summed up. Classes with scores above a certain cut-off (heuristically defined, discussed later on) are selected as the final ones for the document being classified.

The Ei classification scheme is hierarchical and consists of six main classes divided into 38 classes, which are further subdivided into 182 classes. These are subdivided even further, resulting in some 800 individual classes in a five-level hierarchy. For this experiment, one of the six main classes was selected together with all its subclasses: class 9, Engineering, General. This class was chosen because it covers both natural sciences, such as physics and mathematics, and social sciences fields, such as the engineering profession and management. The literature of the latter tends to contain more polysemic words than that of the former, and as such presents a more complex challenge for automated classification.

Within class 9, there are 99 subclasses. However, for seven of them, the database from which the document collection was created (see 2.2 Document collection) contained too few documents, fewer than 100. These seven classes were therefore excluded from the experiment altogether:

– 9 (Engineering, General);
– 902 (Engineering Graphics; Engineering Standards; Patents);
– 91 (Engineering Management);
– 914 (Safety Engineering);
– 92 (Engineering Mathematics);
– 93 (Engineering Physics); and,
– 94 (Instruments and Measurement).

Of the remaining 92 classes, the distribution at the five hierarchical levels is as follows: 11 classes at the fifth hierarchical level, 67 at the fourth, 14 at the third, and 5 at the second.

2.2 Document collection

The document collection comprised 35,166 bibliographic records from the Compendex database (Engineering Information 2006). (Compendex being a commercial database, the document collection cannot be made available to others, but the authors are willing to provide the documents’ identification numbers on request.) The records were selected by simply retrieving the top 100 or more records upon entering the class notation. A minimum of 100 records per class were downloaded at several different points in time during 2005 and 2006. Each record had at least one of the 92 selected classes human-assigned (see 2.1).

A subset of this collection was created to include only those records whose main class fell within class 9. The main class is the first one listed in the Ei classification codes field of the record. This subset contained 19,237 documents.

The following elements were extracted from each bibliographic record (in further text: document): an identification number, title, abstract and human-assigned classes (Ei classification codes). Thesaurus descriptors (in Compendex called Ei controlled terms) were not extracted, since the purpose of this experiment was to compare automatically assigned classes (and not descriptors) against the human-assigned ones. Below is an example of one document:

Identification number: 03337590709
Title: The concept of relevance in IR
Abstract: This article introduces the concept of relevance as viewed and applied in the context of IR evaluation, by presenting an overview of the multidimensional and dynamic nature of the concept. The literature on relevance reveals how the relevance concept, especially in regard to the multidimensionality of relevance, is many faceted, and does not just refer to the various relevance criteria users may apply in the process of judging relevance of retrieved information objects. From our point of view, the multidimensionality of relevance explains why some will argue that no consensus has been reached on the relevance concept. Thus, the objective of this article is to present an overview of the many different views and ways by which the concept of relevance is used - leading to a consistent and compatible understanding of the concept. In addition, special attention is paid to the type of situational relevance. Many researchers perceive situational relevance as the most realistic type of user relevance, and therefore situational relevance is discussed with reference to its potential dynamic nature, and as a requirement for interactive information retrieval (IIR) evaluation.

Ei classification codes: 903.3 Information Retrieval & Use, 723.5 Computer Applications, 921 Applied Mathematics

Automated classification was based on title and abstract. Automatically assigned classes were compared against the human-assigned ones (Ei classification codes in the example). On average, 2.2 classes per document were human-assigned, ranging from 1 to 10.

2.3 Evaluation methodology

2.3.1 Evaluation challenge

The relevant ISO standard covers methods for examining documents, determining their subjects, and selecting index terms (International Standards Organization 1985). According to it, human-based subject indexing is a process involving three steps:

1) determining the subject content of a document;
2) conceptual analysis to decide which aspects of the content should be represented; and
3) translation of those concepts or aspects into a controlled vocabulary.

These steps, in particular the second one, are based on a specific library’s policy with respect to its document collections and user groups. Thus, when evaluating automatically assigned classes against human-assigned ones, it is important to know the human-based indexing policies. Unfortunately, we were unable to obtain the indexing policies applied in the Compendex database. What we could derive from the document collection was the number of human-assigned classes per document, which was used in evaluation.

However, without a thorough qualitative analysis of automatically assigned classes, one cannot be sure about classes assigned by the algorithm but not by humans. Such classes may actually be wrong, or they may have been left out by the human indexers, either by mistake or because of the indexing policy. A further issue is that we did not know whether the articles had been human-classified based on their full text and/or abstracts; we had, however, only abstracts.

Another problem to consider when evaluating automated classification is the fact that certain subjects are erroneously assigned. When indexing, people make several kinds of errors (Lancaster 2003, 86-87):

– errors related to exhaustivity policy (too many or too few terms become assigned);
– errors in specificity of indexing (which usually means that people do not assign the most specific term);
– omission of important terms; and,
– assignment of an obviously incorrect term.

In addition, it has been reported that different people, whether users or professional subject indexers, would assign different subject terms or classes to the same document. Studies on inter- and intra-indexer consistency report generally low indexer consistency (Olson and Boll 2001, 99-101). Markey (1984) reviewed 57 indexer consistency studies and reported that consistency levels range from 4% to 84%, with only 18 studies showing over 50% consistency. There are two main factors that seem to affect it:

1. Higher exhaustivity and specificity of subject indexing both lead to lower consistency, i.e., indexers choose the same first term for the major subject of the document, but consistency decreases as they choose more classes or terms;
2. The bigger the vocabulary, or the more choices the indexers have, the less likely they are to choose the same classes or terms (Olson and Boll 2001, 99-101).

Both of these factors were present in our experiment:

1. High exhaustivity: on average, 2.2 classes per document had been human-assigned, ranging from 1 to 10;
2. The Ei controlled vocabulary is rather big (we chose 92 classes) and deep (five hierarchical levels), allowing many different choices.

An analysis of automatically and human-assigned classes in a previous study (Golub 2006b) showed, among other things, that certain human-assigned classes were actually wrong. It also showed that some automatically assigned classes that had not been human-assigned were correct. An analysis conducted within this study confirmed this (see section 4.3).

Today, evaluation in automated classification experiments is mostly conducted under controlled conditions, ignoring the issues discussed above. As Sebastiani (2002, 32) puts it:

The evaluation of document classifiers is typically conducted experimentally, rather than analytically. The reason is that … we would need a formal specification of the problem that the system is trying to solve (e.g., with respect to what correctness and completeness are defined), and the central notion … that of membership of a document in a category is, due to its subjective character, inherently nonformalizable.

A methodology for such experiments has yet to be developed, and our resources were limited. We therefore followed the common approach to evaluation: we assumed that the human-assigned classes in the document collection were correct, and compared automatically assigned classes against them.

2.3.2 Evaluation measures

The subset of the Ei controlled vocabulary we used comprised 92 classes that are all topically related to each other. The topical relatedness is expressed in the numbers representing the classes: the more initial digits any two classes have in common, the more related they are. For example, 933.1.2 for Crystal Growth is closely related to 933.1 for Crystalline Solids. Both belong to 933 for Solid State Physics, and finally to 93 for Engineering Physics. Each digit represents one hierarchical level: class 933.1.2 is at the fifth hierarchical level, 933.1 at the fourth, etc.

Thus, comparing two classes on only the first few digits (later referred to as partial matching), instead of all five, also makes sense. Still, unless specifically noted, the evaluation in this experiment was based on all five levels (later referred to as complete matching). In other words, an automatically assigned class was considered correct only if all its digits were the same as those of a human-assigned class for the same document.

The evaluation measures used were the standard microaveraged and macroaveraged precision, recall and F1 (Sebastiani 2002, 40-41), for both complete and partial matching:

$$\text{Precision} = \frac{\text{correctly automatically assigned classes}}{\text{all automatically assigned classes}}$$

$$\text{Recall} = \frac{\text{correctly automatically assigned classes}}{\text{all human-assigned classes}}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

In macroaveraging, the results are first calculated for each class, and then summed and divided by the number of classes. In microaveraging, the results for each part of every equation are summed up first, and then the “aggregated” values are used in one equation. For example, all correctly automatically assigned classes are added together, and all automatically assigned classes are added together.

The equations for macroaveraged and microaveraged precision are given below. Here C is the set of classes, |C| the number of classes, and Precision_c the precision for class c:

$$\text{Precision}_{\text{macro}} = \frac{\sum_{c \in C} \text{Precision}_c}{|C|}$$

$$\text{Precision}_{\text{micro}} = \frac{\sum_{c \in C} \text{correct automated assignments}_c}{\sum_{c \in C} \text{all automated assignments}_c}$$

In microaveraging, more weight is given to classes that have many automatically assigned instances, the majority of which are correct. In macroaveraging, the same weight is given to each class, regardless of whether it has many or few automatically assigned instances. The differences between macroaveraged and microaveraged values can be large, but whether one is better than the other has not been agreed upon (Sebastiani 2002, 41-42). Thus, in this experiment it is mostly the mean of the macroaveraged and microaveraged F1 that is used.

In order to examine different aspects of the automated classification performance, several other factors were also taken into consideration:

– Whether the (human-assigned) main class is found;
– The number of documents that got at least one class automatically assigned;
– Whether the class with the highest score was the same as the human-assigned main class;
– The distribution of automatically versus human-assigned classes; and,
– The average number of classes assigned to each document. There were 2.2 human-assigned classes per document, and our aim was to achieve a similar number. In the context of hierarchical browsing based on a classification scheme, assigning too many classes to a document would place it in too many different places. This would defeat the original purpose of a classification scheme, that of grouping similar documents together.

3. Improving the algorithm

The major aim of the experiment was to improve the algorithm previously experimented with in Golub 2006c. There, the highest (microaveraged) recall was 73% when all types of terms were included in the term list. In that experiment neither weights nor cut-offs were used, so all the classes found for a document were assigned to it. Here we wanted to achieve as high precision levels as possible by using term weighting and class cut-offs. In order to also allow for better recall, the basic term list was enriched with new terms extracted from documents in the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition.

3.1 Term weights

The aim of this part of the experiment was to achieve as high precision levels as possible by using weighting and cut-offs. As shown in Golub 2006c, all types of terms need to be used in the term list for maximum recall. Thus, all the different types of terms and their mappings to classes were merged into the final term list. This resulted in a number of duplicate cases, which were dealt with in the following manner:

– If one term mapping to the same class was a caption, a preferred term, and a synonym at the same time, the highest preference was, based on their performance (see Table 4), given to captions, followed by preferred terms, followed by synonyms, while others were removed from the list;
– If one term mapping to both optional class (OC) and main class (MC) was a caption, a preferred term, and a synonym at the same time, the highest preference was, based on their performance (see Table 4), given to captions, followed by preferred terms, followed by synonyms, while others were removed from the list;
– If one thesaurus term of the same type mapped to both optional class (OC) and main class (MC), the one that mapped to the optional class was removed (based on their performance, see Table 2).

The final term list consisted of 8,099 terms: 92 captions (all mapped to a main class (MC)), 668 broader terms, 729 narrower, 1,653 preferred, 3,224 related, and 1,733 synonyms. This large number of terms manually mapped to classes indicates the potential usefulness of such a controlled vocabulary in a string-matching algorithm for automated classification.

In order to systematically vary different parameters, the following 14 weighting schemes were devised:

1. w1: All terms in the term list were given the same weight, 1. This term list served as a baseline.

2. w134: Different term types were given different weights: single-word terms 1, phrases 3, and Boolean terms 4. These weights were heuristically derived in a separate experiment (Table 1). Three different term lists were created, containing only single-word terms, only phrases, and only Boolean terms respectively. Weight 1 was assigned to all of them. The documents were classified using these three term lists, and their performance was compared for precision.

Table 1. Single, phrase and Boolean term lists and their performance as a basis for weights.

                      Single   Phrase   Boolean
Avg. precision (%)         8       26        33
Derived weight             1        3         4

Avg. precision (%) is the mean of microaveraged and macroaveraged precision. Derived weights were based on dividing the precision values (Avg. precision) by the lowest precision value (in this case 8).

3. w12: Terms mapping to a main class (MC) were given weight 2, and those mapping to an optional class (OC) weight 1. These weights were heuristically derived in a separate experiment (Table 2). Two different term lists were created, one containing only those terms that map to a main class, and another containing only those terms that map to an optional class. Weight 1 was assigned to all of them. The documents were classified using these two term lists, and their performance was compared for precision.

Table 2. Main code and optional code term lists and their performance as a basis for weights.

                      MC   OC
Avg. precision (%)    13    6
Derived weight         2    1

Avg. precision (%) is the mean of microaveraged and macroaveraged precision. Derived weights were based on dividing the precision values (Avg. precision) by the lowest precision value (in this case 6).

4. w134_12: This list was a combination of the two preceding lists. The weights 1, 3, and 4 for single-word, phrase and Boolean terms were multiplied by the weight for the type of class to which the term mapped: 1 for an optional class or 2 for a main class.

5. wOrig: The weights used in the original term weighting scheme, when the string-matching algorithm based on Ei was first applied (Koch and Ardö 2000). These weights were intuitively derived. They combined the type of term (single-word, Boolean or phrase) with whether the assigned class was a main class (MC) or an optional class (OC).

      | Phrase | Boolean | Single
OC    | 4      | 2       | 1
MC    | 8      | 3       | 2

Table 3. Weights in the original algorithm.

6. w1234: Weights for the different Ei term types, as experimented with in Golub 2006c (captions are from the classification scheme, all others are thesaurus terms).

                     | Broader | Captions | Narrower | Preferred | Related | Synonyms
Avg. precision (%)   | 10      | 43       | 25       | 39        | 10      | 35
Derived weight       | 1       | 4        | 2        | 4         | 1       | 3

Table 4. Different types of thesaurus terms and captions and their performance as a basis for weights.

7. w134_1234: This list was a combination of two previous lists, w134 and w1234. The weights 1, 3 and 4 for single-word, phrase and Boolean terms were multiplied by the weight for the type of Ei term as given in Table 4.

8. w134_12_1234: This list was a combination of two previous lists, w134_12 and w1234. The weights 1, 3 and 4 for single-word, phrase and Boolean terms were multiplied by the weight for the type of class to which the term mapped (1 for an optional class, 2 for a main class), and by the weight for the type of Ei term as given in Table 4.

9. wTf10: In this list, weights were based on the number of words the term consisted of, and on the number of times each of its words occurred in other terms (cf. tf-idf, term frequency – inverse document frequency, Salton and McGill 1983, 63, 205). If f_w was the frequency with which a word w from term t occurred in other terms, and term t consisted of the n words w_1, …, w_n, then the weight of that term was calculated as follows:

$$\mathrm{weight}_t = \log(n) \cdot \left( \frac{1}{f_{w_1}} + \frac{1}{f_{w_2}} + \dots + \frac{1}{f_{w_n}} \right)$$

The logarithm was applied in order to reduce the impact of parameter n, i.e., to avoid overly high weights for terms consisting of several sparse words. In order to get integers as weights, the weights were multiplied by 10, rounded and increased by 1 to avoid zeros (for example, since log(1) = 0, every single-word term would otherwise receive weight 0).

10. wTf10Boolean: As in wTf10, with all the phrases modified into Boolean terms. This list was created in order to study the influence of phrases and Boolean terms on precision and recall.

11. wTf10Phrases: As in wTf10, with all the Boolean terms modified into phrases. This list was created in order to study the influence of phrases and Boolean terms on precision and recall.

12. wTf10_12: As in wTf10, with those weights multiplied by the weight for the type of class to which the term maps (1 for an optional class, 2 for a main class). The multiplication was done before the rounding.

13. wTf10_1234: As in wTf10, with those weights multiplied by the weight for the type of relationship (Table 4). The multiplication was done before the rounding.

14. wTf10_12_1234: As in wTf10_12, with those weights multiplied by the weight for the type of relationship (Table 4). The multiplication was done before the rounding.

3.1.1 Stop-word list and stemming

Although the terms and captions in the Ei controlled vocabulary are usually noun phrases, which are good content words, they can also contain words that are frequently used in many contexts and as such are not very indicative of any document's topicality (e.g., the word general in the Ei class caption Engineering, General). Thus, a stop-word list was used. It contained 429 such words and was taken from the Onix text retrieval toolkit (Onix text retrieval toolkit). For stemming, Porter's algorithm (Porter 1980) was used. The stop-word list was applied to the term lists, and stemming to the term lists as well as to the documents.

3.2 Cut-offs

In a previous experiment (Golub 2006c), cut-offs were not used; instead, all the classes that were found for a document were assigned to it. In the context of hierarchical browsing based on a classification scheme, assigning too many classes to a document would place one document in many different places, which would defeat the original purpose of a classification scheme (grouping similar documents together). In the document collection, there were 2.2 human-assigned classes per document, and the aim of automated classification was to achieve a similar number. The effect of several different cut-offs was investigated:

1. All automatically derived classes are assigned as final ones (no cut-off).
2. In order to be assigned as final, a class had to have a score of at least a minimum percentage of the sum of all the classes' scores. Different values for the minimum percentage were tested: 1, 5, 10, 15 and 20, as well as some others (see section 4 Results).
3. The second type of cut-off in combination with the rule that if no class had the required score, the one with the highest score would be assigned.
4. In order to follow the subject classification principle of always assigning the most specific class possible, the principle of score propagation was introduced. It was implemented so that the score of a class at a deeper hierarchical level was the sum of its own score and the scores of its broader classes at upper hierarchical levels, if such were assigned.

3.3 Enriching the term list with new terms

In the previous experiment (Golub 2006c), the highest achieved recall was 73% (microaveraged), when all types of terms were included in the term list. In order to further improve recall, the basic term list was enriched with new terms. These terms were extracted from bibliographic records of the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition, based on the existing preferred and synonymous terms (as they gave the best precision results).

Multi-word morpho-syntactic analysis was conducted using the parser FASTER (Jacquemin 1996), which analyses raw technical texts and, based on built-in meta-rules, detects morpho-syntactic variants. The parser exploits morphological (derivational and inflectional) information as given by the database CELEX (Baayen et al. 1995). Morphological analysis was used to identify derivational variants, such as:
– effect of gravity: gravitational effect
– architectural design: design of the proposed architecture
– supersonic flow: subsonic flow
– structural analysis: analysis of the structure

Syntactical analysis was used to:
a) insert a word inside a term, such as:
– flow measurement: flow discharge measurements
– distribution of good: distribution of the finished goods
– construction equipment: construction related equipment
– intelligent control: intelligent distributed control
b) permute components of a term, such as:
– control of the inventory: inventory control
– flow control: control of flow
– development of a flexible software: software development
c) add a coordinated component to a term, such as:
– project management: project schedule and management
– control system: control and navigation system

Synonyms were acquired through the rule-based system SynoTerm (Hamon and Nazarenko 2001), which infers synonymy relations between complex terms by employing semantic information extracted from lexical resources. First, the documents were preprocessed, tagged with part-of-speech information and lemmatized. Then, terms were identified through the YaTeA term extractor (Aubin and Hamon 2006). The semantic information provided by the database WordNet (Fellbaum 1998; WordNet) was used as a bootstrap to acquire synonym terms of the basic terms. The synonymy of the complex candidate terms was assumed to be compositional, i.e., two terms were considered synonymous if their components were identical or synonymous (e.g., building components: construction components; building components: construction elements).

Although verification by a subject expert is desirable for all automatically derived terms, due to limited resources only the extracted synonyms were verified. Checking the synonyms is also the most important, since computing them leads to a bigger semantic shift than morphological and syntactical operations do. The verification was conducted by a subject expert, a fifth-year student of engineering physics. Suggested synonym terms were displayed in the user interface of SynoTerm. The verification was not strict: derived terms were kept if they were semantically related to the basic term. Thus, hyperonym (generic/specific) or meronym (part/whole) terms were also accepted as synonyms. The expert spent 10 hours validating the derived terms. Of the 292 automatically acquired synonyms, 168 (57.5%) were validated and used in the experiment.

4. Results

4.1 Improving F1 and precision: applying weights and cut-offs

Based on each of the 14 term lists, the classification algorithm was run on the document collection of 35,166 documents (see 2.2). As described earlier (2.3.2), several aspects were evaluated and different evaluation measures were used; thus, for each term list, the following types of results were obtained:

1. min 1: if no classes were assigned because their final scores were below the pre-defined cut-off value (described in 3.2), the class with the highest score was assigned;
2. cut-off: the applied cut-off value;
3. min 1 correct: number of documents that were assigned at least one correct class;
4. min 1 auto: number of documents that were assigned at least one class;
5. avg auto/doc: average number of classes assigned per document, based on documents that were assigned at least one class;
6. macroa P: macroaveraged precision;
7. macroa R: macroaveraged recall;
8. macroa F1: macroaveraged F1;
9. microa P: microaveraged precision;
10. microa R: microaveraged recall;
11. microa F1: microaveraged F1;
12. mean F1s: arithmetic mean of macroaveraged and microaveraged F1 values.

The same experiment was run on all the 14 term lists. For each term list, two parameters were varied: 1) whether min 1 was applied or not; and 2) the first two cut-off variants from section 3.2.

When looking at mean F1 values, the differences between the term lists are not larger than four percent. Performance of the different lists measured in precision and recall is also similar. The three lists that perform best in terms of mean F1 are w1234, w134_1234 and w134_12_1234, all of them based on weights for different Ei term types. In comparison to the baseline, where no weights or cut-offs are used, an improvement of six percent is achieved when using these three term lists. As expected, the best precision results are obtained when the cut-off is highest, up to 0.37 macroaveraged, and the best recall when there is no cut-off, up to 0.54.

When using cut-offs, two sets of experiments were conducted: one assigning at least the class with the highest score, and the other following the threshold calculation only. Because the former results in more documents with correct classes assigned, the rule to assign at least the class with the highest score is applied in further experiments.

4.1.1 Stop words: removal and stemming

Next, the influence of stop-word removal and stemming was tested (as described in 3.1.1). For this experiment, the three lists that performed best in the previous one were chosen: w1234, w134_1234 and w134_12_1234. Every list was run with stop-word removal, with stemming, and with both stop-word removal and stemming, each in combination with different cut-off values: 5, 10 and 15. In the majority of cases, using stemming, stop-word removal or both yields improvements of up to two percent. There is also a slight increase in the number of correctly found classes without finding more incorrect classes. The differences between the three term lists measured in mean F1 are minor, one or two percent. The best term list is w134_12_1234 in combination with stemming, stop-word removal and cut-off 10, with a best mean F1 of 0.24. For this list, further cut-off values were tried; a cut-off of 9 performed best, but it outperformed a cut-off of 10 only in the third decimal place. In the following experiments, unless specifically noted, we used the best-performing w134_12_1234 term list and setting (applying stemming and stop-word removal, cut-off 9).

4.1.2 Individual classes

It was shown that certain classes perform much better than the average, and performance varies considerably between classes. For example, the top three performing classes as measured in precision are different from the top three classes for recall or F1 (see Table 5).

Precision (class: value)                     | Recall (class: value)              | F1 (class: value)
Cellular Manufacturing (913.4.3): 0.98       | Amorphous Solids (933.2): 0.61     | Crystal Growth (933.1.2): 0.45
Electronic Structure of Solids (933.3): 0.97 | Crystal Growth (933.1.2): 0.52     | Amorphous Solids (933.2): 0.44
Information Retrieval and Use (903.3): 0.82  | Manufacturing (913.4): 0.50        | Optical Variables Measurement (941.1): 0.40

Table 5. Top three performing individual classes

4.1.3 Partial matching

As expected, the algorithm performs better when evaluation is based on partial matching between automatically and human-assigned classes (see section 2.3.2). As seen from Table 6, F1 is up to 0.66 at the second hierarchical level and up to 0.59 at the third. At the second hierarchical level, the best F1 is achieved by the classes Engineering mathematics (represented by notation 92) and General engineering (90). At the third hierarchical level, the class that performs best of all is 921 Applied Mathematics, while the worst one is 943 Mechanical and Miscellaneous Instruments. In conclusion, for the 14 classes at the top three hierarchical levels, mean F1 is almost twice as high as for complete matching, which implies that our classification approach would better suit information systems in which fewer hierarchical levels are needed, like the Intute subject gateway on engineering (Intute Consortium 2006).

The variations in performance between individual classes are quite large for both complete and partial matching, but at this stage it is difficult to say why. The two best-performing classes at the second hierarchical level have by far the smallest number of terms designating them (see the Terms column in Table 6). However, in other cases there does not seem to be any correlation between number of terms and performance, as also discovered in Golub 2006b. Further research is needed to explore which factors contribute to performance.

Class (second level) | F1   | Terms
90 General           | 0.65 | 679
91 Management        | 0.5  | 1922
92 Maths             | 0.66 | 848
93 Physics           | 0.51 | 2902
94 Instruments       | 0.49 | 1748

Class (third level)  | F1   | Terms
901                  | 0.4  | 275
902                  | 0.3  | 241
903                  | 0.5  | 163
911                  | 0.3  | 237
912                  | 0.4  | 596
913                  | 0.3  | 393
914                  | 0.3  | 696
921                  | 0.6  | 628
922                  | 0.3  | 220
931                  | 0.44 | 1648
932                  | 0.3  | 801
933                  | 0.5  | 453
941                  | 0.3  | 422
942                  | 0.4  | 373
943                  | 0.2  | 604
944                  | 0.4  | 349

Table 6. Results for partial matching at the second and third hierarchical levels, and number of terms per class.

4.1.4 Score propagation

A relevant subject classification principle is to always assign the most specific class available. This principle provided us with a basis for score propagation, in which scores of classes at narrower (more specific) hierarchical levels were increased by scores assigned to their broader classes (later referred to as “propagated down”). In another run, this was slightly varied, so that the broader classes from which scores were propagated to their narrower classes were removed (“propagated down, broader removed”). These types of score propagation were tested on the best-performing term list and setting (w134_12_1234 with stemming and stop-word removal). Of the two variants, “propagated down” performs best in complete matching; however, it is slightly worse than not using score propagation at all. In partial matching, both “propagated down” and “propagated down, broader removed” perform slightly better than the original on the first two or three hierarchical levels, and slightly worse on the fourth and fifth ones. These not-so-good results with score propagation can be partially explained by the fact that the term list contained both broader and narrower terms, which was done in order to achieve the best recall (Golub 2006c).

4.1.5 Finding main classes

We further analyzed the degree to which the one most important concept of every document is found by the algorithm. For this purpose, a subset of 19,153 documents was used which had the human-assigned main class in class 9 (there is one main class per document). In complete matching, 78% of main classes are found when no cut-offs are applied. When cut-offs are applied, 22% of main classes are found. In partial matching, more main classes are found at the second and third hierarchical levels when using both types of score propagation, up to 59% and 38% respectively. Thus, score propagation could be used in services for which fewer hierarchical levels are needed (e.g., Intute Consortium 2006).
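The cut-off rule of section 3.2, the two score-propagation variants of section 4.1.4 and the partial matching of section 4.1.3 can be illustrated with a short Python sketch. It is not the authors' implementation. The helper names, the example scores and codes, and the set-based micro-averaging are our own assumptions. So is the reading of Ei notation in which the first three levels are successive digits (9, 92, 921) and deeper levels are dot-separated (921.1, 921.1.2), as in codes such as 913.4.3 in Table 5. Partial matching is assumed here to mean truncating both the automatically and the human-assigned codes to the same level before comparing them.

```python
"""Sketch of cut-offs (3.2), score propagation (4.1.4) and partial matching (4.1.3).

Assumptions, not from the paper: Ei-style codes such as "9", "92", "921",
"921.1"; one float score per class; set-based micro-averaged evaluation.
"""


def ancestors(code):
    """Return the broader classes of a code, from the top level down."""
    head, *rest = code.split(".")
    broader = [head[:i] for i in range(1, len(head))]  # "921" -> "9", "92"
    parts = [head]
    for part in rest:
        broader.append(".".join(parts))  # "921.1" -> also "921"
        parts.append(part)
    return broader


def truncate(code, level):
    """Return the class at hierarchical `level` (1 = top) on the path to `code`.

    Codes that are shallower than `level` are returned unchanged.
    """
    path = ancestors(code) + [code]
    return path[min(level, len(path)) - 1]


def propagate_down(scores, remove_broader=False):
    """Add the scores of all assigned broader classes to each class ("propagated down")."""
    result = {c: s + sum(scores.get(a, 0.0) for a in ancestors(c))
              for c, s in scores.items()}
    if remove_broader:
        # "propagated down, broader removed": drop every assigned class that
        # is broader than another assigned class.
        broader = {a for c in scores for a in ancestors(c) if a in scores}
        result = {c: s for c, s in result.items() if c not in broader}
    return result


def apply_cutoff(scores, min_percent, min1=True):
    """Keep classes scoring at least `min_percent` % of the sum of all scores.

    With min1=True (cut-off variant 3), the top-scoring class is kept when
    no class reaches the threshold.
    """
    total = sum(scores.values())
    kept = {c: s for c, s in scores.items() if s >= total * min_percent / 100}
    if not kept and min1 and scores:
        best = max(scores, key=scores.get)
        kept = {best: scores[best]}
    return kept


def micro_prf(docs, level=None):
    """Micro-averaged precision, recall and F1 over (auto, human) class sets.

    If `level` is given, codes are truncated to it first (partial matching).
    """
    tp = fp = fn = 0
    for auto, human in docs:
        if level is not None:
            auto = {truncate(c, level) for c in auto}
            human = {truncate(c, level) for c in human}
        tp += len(auto & human)
        fp += len(auto - human)
        fn += len(human - auto)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Illustrative scores for one document (made-up values and codes).
scores = {"9": 2.0, "92": 1.0, "921": 4.0, "921.1": 3.0, "933.1.2": 2.0}
propagated = propagate_down(scores, remove_broader=True)
final = apply_cutoff(propagated, min_percent=30)
human = {"921.3", "933.2"}
print(propagated)                                 # {'921.1': 10.0, '933.1.2': 4.0}
print(final)                                      # {'921.1': 10.0}
print(micro_prf([(set(final), human)]))           # (0.0, 0.0, 0.0)
print(micro_prf([(set(final), human)], level=3))  # (1.0, 0.5, 0.6666666666666666)
```

Calling propagate_down without remove_broader gives the plain “propagated down” variant. Passing min1=False to apply_cutoff corresponds to following the threshold calculation only. The 30% threshold in the example is arbitrary; the best setting reported above used a cut-off of 9. The example also shows why partial matching scores higher: the assigned class 921.1 misses both human classes in complete matching, but it agrees with one of them at the third level.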

4.1.6 Distribution of classes

Using the same best setting achieved so far, the algorithm was also evaluated for the distribution of automatically assigned classes in comparison to that of the human-assigned ones. The comparison was based on how often two classes are assigned together by the algorithm compared with how often they are assigned together by people. Figure 1 shows the frequency distribution of assigned class pairs. The x-axis presents human-assigned class pairs ordered by descending frequency. One point represents one class pair: e.g., the pair of classes 912.2 and 903 occurs most frequently in human-based classification (48 times, as marked on the y-axis) and is represented by point 1 on the x-axis; point 500 on the x-axis represents the pair 913.5 and 911, which occurs 3 times, as marked on the y-axis. Thus, the smoothest line (Human-assigned) represents the human-assigned classes. The minimum of 2538 pairs of classes that both the algorithm and people have produced are shown.

Figure 1. Frequency distribution of assigned pairs of classes (2538 pairs).

A correlation of 0.38 exists between the human-assigned classes and the automatically assigned classes (Automated). However, for the 100 most frequent pairs, the correlation drops to 0.21. In the top 10 most frequent pairs of classes, there is no overlap at all. In conclusion, the distributions of human-assigned and automatically assigned classes are more correlated when looking at all pairs of classes occurring together, but less so for more frequently occurring pairs.

4.1.7 Implications for application

Since automated classification algorithms can have a number of different applications, it is important to emphasize that an algorithm can be adjusted to the specific application need. Here we point out the applications for which our algorithm was shown to yield promising results in terms of F1 and precision.

1. In all applications, the best precision and F1 are achieved when applying the w134_12_1234 term list, together with stemming and stop-word removal.
2. In information systems such as Intute (Intute Consortium 2006), several broader hierarchical levels are used. For the purpose of such an application, the classification algorithm should be implemented so that only classes from the top three hierarchical levels are used, but so that scores from classes at lower hierarchical levels are added to the final sum of their broader classes.
3. In applications where classes at all hierarchical levels are needed, such as other hierarchical browsing systems, searching or machine-aided indexing software, a cut-off level of nine and the principle of assigning at least the class with the highest score should be implemented. In addition, the choice can be made to assign only the class with the highest score, i.e., the class with the highest probability of being correct, as is done in Thunderstone’s web site catalog (Thunderstone 2005). Alternatively, the classes can be ranked in descending order based on the score indicating the probability that the document is dealing with the topic designated by the class.

4.2 Enhancing the term list with new terms

In the previous experiment (Golub 2006c), the highest achieved recall was 73% (microaveraged), when all types of terms were included in the term list. In order to further improve recall, the basic term list was enriched with new terms. These terms were extracted from bibliographic records of the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition, based on the existing preferred and synonymous terms (as they gave the best precision results). The number of terms added to the term list was as follows:

1. Based on multi-word morpho-syntactic analysis:
– derivation: 705, of which 93 adjective to noun, 78 noun to adjective, and 534 noun to verb;
– permutation: 1373;
– coordination: 483;
– insertion: 742; and
– preposition change: 69.
2. Based on semantic variation (synonymy): 292 automatically extracted, of which 168 were verified as correct by the subject expert.

In order to examine the influence of different types of extracted terms, nine different term lists were created and the classification was based on each of them. It was shown that the number of terms is not proportional to performance. For example, permutation-based extraction comprises 1373 terms and, when stemming is applied, has a performance measured in mean F1 of 0.02, whereas coordination comprises 403 terms, with a performance of 0.07. These two cases can be explained by the fact that permutation also implies variation based on insertion and preposition change (e.g., engineering for commercial window systems: system engineering), which leads to a bigger semantic shift than the identification of term variants based on coordination. By combining all the extracted terms into one term list, the mean F1 is 0.14 when stemming is applied, and microaveraged recall is 0.11, which would imply that enriching the original Ei-based term list with these newly extracted terms should improve recall. In comparison to the results gained in Golub 2006c, where microaveraged recall with stemming is 0.73, here the best recall, also microaveraged and with stemming, is 0.76.

The next step was to assign appropriate weights to the newly extracted terms (Table 7). We used the w134_12_1234 term list, earlier shown to perform best. The result as measured in mean F1 is the same as in the original, 0.24 (cut-off 10, stemming applied but not stop-word removal). The difference is that recall and the number of correctly assigned classes increase by 3%, but precision decreases. Thus, depending on the final application, terms extracted in this way could be added to the term list or not.

Term list: all combined
stemming          | no    | yes   | no    | yes
stop-words out    | no    | no    | yes   | yes
min 1 correct     | 24479 | 29639 | 26039 | 30466
min 1 auto        | 34086 | 34966 | 34425 | 34987
avg auto/doc      | 16.79 | 28.61 | 18.06 | 29.68
macroa P          | 0.11  | 0.09  | 0.11  | 0.09
macroa R          | 0.54  | 0.71  | 0.55  | 0.72
macroa F1         | 0.19  | 0.16  | 0.18  | 0.15
microa P          | 0.07  | 0.06  | 0.07  | 0.06
microa R          | 0.55  | 0.73  | 0.59  | 0.76
microa F1         | 0.13  | 0.11  | 0.13  | 0.10
mean F1           | 0.16  | 0.13  | 0.16  | 0.13

Table 7. Performance of the w1 term list enriched with all automatically extracted terms.

4.2.1 Implications for application

Enriching the term list with terms extracted using multi-word morpho-syntactic analysis and synonym acquisition slightly improves recall. At the same time, precision decreases. Thus, enhancing the term list in this way would be appropriate for applications such as focused crawling, where the purpose is to crawl as many documents as possible and precision is less important. To maximize recall, no weights or cut-offs need be applied.

4.3 Term analysis and shortened term lists

In the original term list there were 4,411 distinct terms. In the document collection, 53% of them were found. The average length of the terms found was between one and two words, while the longer ones were found less frequently. Of the terms found in the collection, 16% always led to correct classes, while 43% always led to incorrect classes. For a sample of documents containing terms that were shown to always yield incorrect results, we had a male subject expert confirm whether, in his opinion, the documents were in the wrong class. For the 10 always-incorrect terms with the most frequent occurrences, the subject expert looked at 30 randomly selected abstracts containing those terms. According to his judgments, 24 of those 30 documents were indeed incorrectly classified, but he deemed 6 of them to be correctly classified. This is another indication of how problematic it is to evaluate subject classification in general, and automated subject classification in particular. One way forward would be to have a number of subject experts agree on all the possible subjects and classes for every document in a test collection for automated classification. Another could be to evaluate automated classification in context, by end-users.

Based on the term analysis, three new term lists were extracted from the original one and tested for performance:

1. Containing only those terms that found classes which were always correct (1,308 terms). When the cut-off is between 5 and 10, macroaveraged precision reaches 0.89, and microaveraged 0.99, when neither stemming nor stop-word removal is applied. Stemming does not really improve general performance, because recall increases only a little, by 0.03, while precision decreases by 0.2. However, when using only those 1,308 terms, only 5% of documents are classified. The best mean F1, 0.15, is achieved when stemming and stop-word removal are used.
2. Containing those terms that found classes which were correct in more instances than they were incorrect (1,924 terms). This list yields the best mean F1, 0.38. This value is achieved when stemming is used but no stop-words are removed. 65% of documents are classified, with an average of 1.7 classes per document. When stemming is not used, precision levels are 0.75 microaveraged and 0.79 macroaveraged.
3. Containing all terms excluding those that found classes which were always incorrect (4,751 terms). The mean F1 is 0.25 when the cut-off is 10 and both stop-word removal and stemming are used. The slight improvement in comparison to the original list is due to an increase in precision.

4.3.1 Implications for application

With the same w134_12_1234 term list, excluding terms that always yield incorrect classes improves precision and F1 beyond what weights and cut-offs alone achieve. This setting improves precision without degrading recall, so it should be used in applications where either, or both, are important. The best F1 throughout the whole experiment is achieved when terms that yield incorrect classes in the majority of cases are excluded.

5. Conclusion

The study showed that the string-matching algorithm could be enhanced in a number of ways:

1. Weights: adding different weights to the term list based on whether a term is single, phrase or Boolean, which type of class it maps to, and Ei term type, improves precision and the relevance order of assigned classes, the latter being important for browsing;
2. Cut-offs: selecting as final classes those above a certain cut-off level improves precision and F1;
3. Enhancing the term list with new terms based on morpho-syntactic analysis and synonym acquisition improves recall;
4. Excluding terms that in most cases gave wrong classes yields the best performance in terms of F1, where the improvement is due to increased precision levels.

The best achieved recall is 76%, when the basic term list is enriched with new terms. The best precision, 79%, is achieved when only those terms previously shown to yield correct classes in the majority of documents are used. Performance of individual classes, measured in precision, is up to 98%. At the third and second hierarchical levels, mean F1 reaches up to 60%.

These results are comparable to those of machine-learning algorithms (see, for example, Sebastiani 2002), which require training documents and are collection-dependent. Another benefit of classifying documents into classes of well-developed classification schemes is that they are suitable for subject browsing, unlike automatically developed controlled vocabularies or home-grown directories often used in document clustering and text categorization (Golub 2006a).

The experiment has also shown that different versions of the algorithm could be implemented so that it best suits the application of the automatically classified document collection. If the application requires high recall, such as, for example, in focused crawling, cut-offs would not be used. Or, if one provides a directory-style browsing interface to a collection of automatically classified web pages, web pages could be ranked by relevance based on weights. In such a directory, one might want to limit the number of web pages per class, e.g., assign only the class with the highest probability of being correct, as is done in Thunderstone’s web site catalog (Thunderstone 2005).

References

Aitchison, Jean, Gilchrist, Alan, Bawden, David. 2000. Thesaurus construction and use: a practical manual, 4th ed., Aslib, London.
Aubin, Sophie, and Hamon, Thierry. 2006. Improving term extraction with terminological resources. Proceedings of the 5th International Conference on NLP, FinTAL, pp. 380-387.
Baayen, R.H., Piepenbrock, R., and Gulikers, L. 1995. The CELEX lexical database, release 2, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA. [CD-ROM].
Bang, Sun Lee, Yang, Jae Dong, and Yang, Hyung Jeong. 2006. Hierarchical document categorization with k-NN and concept-based thesauri. Information processing and management 42: 387-406.
Chen, Hao, and Dumais, Susan T. 2000. Bringing order to the web: automatically categorizing search results. In T. Turner and G. Szwillus, eds., CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM Press, 145-152.
Engineering Information. 2006. Compendex, Engineering Information, Elsevier, available at: http://www.ei.org/databases/compendex.html (accessed 30 June 2006).
Fellbaum, Christiane. 1998. WordNet: an electronic lexical database, MIT Press, Cambridge, MA.
Golub, Koraljka. 2006a. Automated subject classification of textual web documents. Journal of documentation 62: 350-71.
Golub, Koraljka. 2006b. Automated subject classification of textual web pages, based on a controlled vocabulary: challenges and recommendations. New review of hypermedia and multimedia 12: 11-27.
Golub, Koraljka. 2006c. The role of different thesauri terms in automated subject classification of text. In T. Nishida, ed., 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI '06): Proceedings: 18-22 December 2006, Hong Kong, China. Los Alamitos, Calif.: IEEE Computer Society, 961-65.
Hamon, Thierry, and Nazarenko, Adeline. 2001. Detection of synonymy links between terms: experiment and results. Recent advances in computational terminology, ed. Didier Bourigault et al. Amsterdam: John Benjamins, pp. 185-208.
International Standards Organization. 1985. Documentation–methods for examining documents, determining their subjects, and selecting index terms: ISO 5963, Geneva, ISO.
Intute Consortium. 2006. Intute: science, engineering and technology – engineering, available at: http://www.intute.ac.uk/sciences/engineering/ (accessed 30 August 2007).
Jacquemin, Christian. 1996. A symbolic and surgical acquisition of terms through variation. Connectionist, statistical and symbolic approaches to learning for natural language processing, ed. Stefan Wermter et al. Berlin: Springer, pp. 425-38.
Jain, Anil K., Murty, M. Narasimha, and Flynn, Patrick J. 1999. Data clustering: a review. ACM Computing Surveys 31: 264-323.
Koch, Traugott, and Ardö, Anders. 2000. Automatic classification. DESIRE II D3.6a, Overview of results, available at: http://www.it.lth.se/knowlib/DESIRE36a-WP2.html (accessed 29 November 2007).
Lancaster, F.W. 2003. Indexing and abstracting in theory and practice, 3rd ed., Facet, London.
Lewis, David D., Yang, Yiming, Rose, Tony G., and Li, Fan. 2004. RCV1: a new benchmark collection for text categorization research. The journal of machine learning research 5: 361-97.
Markey, Karen. 1984. Interindexer consistency tests: a literature review and report of a test of consistency in indexing visual materials. Library & information science research 6: 155-77.
Medelyan, Olena, and Witten, Ian H. 2006. Thesaurus based automatic keyphrase indexing. In Gary Marchionini, Michael L. Nelson, and Cathy Marshall, eds., 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons. New York: ACM Press, 296-97.
Milstead, Jessica L., ed. 1995. Ei thesaurus, 2nd ed. Hoboken, NJ: Engineering Information Inc.
Olson, Hope A., and Boll, John J. 2001. Subject analysis in online catalogs, 2nd ed., Libraries Unlimited, Englewood, CO.
Onix text retrieval toolkit: Stop word list 1. Available at: http://www.lextek.com/manuals/onix/stopwords1.html (accessed 29 November 2007).
Porter, Martin F. 1980. An algorithm for suffix stripping. Program 14 no. 3: 130-37.
Salton, Gerard, and McGill, Michael J. 1983. Introduction to modern information retrieval, McGraw-Hill, Auckland.
Sebastiani, Fabrizio. 2002. Machine learning in automated text categorization. ACM computing surveys 34: 1-47.
Svenonius, Elaine. 1997. Definitional approaches in the design of classification and thesauri and their implications for retrieval and for automatic classification. In I.C. McIlwaine, ed., Knowledge organization for information retrieval: Proceedings of the Sixth International Study Conference on Classification Research held at University College London, 16-18 June 1997. The Hague: International Federation for Information Documentation, 12-16.
Svenonius, Elaine. 2000. The intellectual foundations of information organization, MIT Press, Cambridge, MA.
Thunderstone. 2005. Thunderstone’s Web Site Catalog, available at: http://search.thunderstone.com/texis/websearch (accessed 29 November 2007).
WordNet. “WordNet Search”, available at: http://wordnet.princeton.edu/perl/webwn (accessed 29 November 2007).
Yang, Yiming. 1999. An evaluation of statistical approaches to text categorization. Journal of information retrieval 1: 67-88.


Book Reviews

Edited by Clément Arsenault

Book Review Editor

Murtha Baca, Patricia Harping, Elisa Lanzi, Linda McCrea, and Ann Whiteside (eds.). Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. Chicago: American Library Association, 2006. 396 p. ISBN 978-0-8389-3564-4 (pbk.)

At a time when cataloguing code revision is continuing apace with the consolidation of the International Standard Bibliographic Description (ISBD), the drafting of RDA: Resource Description and Access, and the development of common principles for an international cataloguing code (International Meeting of Experts on an International Cataloguing Code [IME ICC]), the publication of a guide for cataloguing cultural objects is timely and purposeful. Compiling this data content standard on behalf of the Visual Resources Association, the five editors, with oversight from an advisory board, have divided the guide into three parts. A brief introduction outlines the purpose, intended audience, scope, and methodology of the publication. Part One, General Guidelines, then explains both what the Cataloging Cultural Objects (CCO) guide is (“a broad document that includes rules for formatting data, suggestions for required information, controlled vocabulary requirements, and display issues”, p. 1) and what it is not (“not a metadata element set per se”, p. 1). Part Two, Elements, is further divided into nine chapters, each dealing with one or more metadata elements and describing the relationships between and among the elements. Part Three, Authorities, discusses what elements to include in building authority records. A Selected Bibliography, a Glossary, and an Index round out the guide.

As the editors note in their introduction, “Standards that guide data structure, data values, and data content form the basis for a set of tools that can lead to good descriptive cataloging, consistent documentation, shared records, and increased end-user access” (p. xi). The VRA Core Categories, for example, represent a set of metadata elements expressed within an XML structure (data structure). Likewise, the Art & Architecture Thesaurus contains sets of terms and relationships, or defined data values. While much effort has been expended on developing both data structures and values, the editors argue, the third leg of the stool, data content, has received less attention. Unlike the library community with its Anglo-American Cataloging Rules [sic; though RDA is referenced in the Selected Bibliography], or its archival equivalent, Describing Archives: A Content Standard (DACS), those in the domain of cultural heritage responsible for describing and documenting works of art, architecture, cultural artifacts, and their respective images have not had the benefit of such data content standards. CCO is intended to address (or redress) that gap, emphasizing the exercise of good judgment and cataloguer discretion over the application of “rigid rules” (p. xii), and building on existing standards.

Part One, General Guidelines, sets the foundation. This 41-page section begins with the question “What are you Cataloguing?” and articulates the difference between a work and an image. It then sets out what institutions need to consider in deciding what kinds of information, and how much, to include in a minimal description for each of the following:

- a Work Record (elements subsequently covered in Chapters 1–8 of Part Two);
- an Image Record (dealt with in Chapter 9 of Part Two);
- records for a group, collection, or series of cultural objects;
- related works, that is, “those having an important conceptual relationship to each other” (p. 13).

Less familiar, perhaps, to those responsible for bibliographic or archival description is the inclusion of recommendations concerning database design, field structures, database construction, and the purpose of a database. Is it a cataloguing tool? A collection management system? A digital asset management system? An online catalogue? This latter part, while a useful inclusion, seems somewhat contradictory within a set of guidelines that profess to be “system independent”. Part One concludes with definitions of, and guidelines for, creating and maintaining controlled vocabularies and authority files, respectively. Examples of work records (Figures 1–7), and of a work record with two related image records (Figure 8), provide concrete, visual samples of the issues covered throughout the General Guidelines, and foreshadow the part to follow.

Part Two, Elements, provides (1) definition, context, and terminology, (2) cataloguing rules, and (3) guidelines on presentation of data for each of eight broad metadata element types, grouped by purpose and associated with a work record (e.g., object naming [work type/title]; creator information [creator/creator role]; stylistic, cultural, and chronological information [style/culture/date]; subject; etc.). The ninth chapter, on view information elements, addresses how to describe aspects of a work as captured in its surrogate, an image of the work. Each chapter within Part Two concludes with illustrated examples, again to reinforce concepts and applications discussed relative to a particular element set. Those expecting the inclusion of administrative, structural, and/or technical metadata for creating and managing digital repositories will be disappointed: the list of elements in Part Two is explicitly restricted to descriptive metadata.

Part Three, Authorities, follows a format similar to that of Part Two, including discussion and terminology, editorial rules, and presentation of data for (1) personal and corporate name authority, (2) geographic place authority, (3) concept authority, and (4) subject authority. As with Part Two, examples liberally populate the text of each chapter, with specific illustrations of the four types of authority record coming at the end of the respective Chapters 1–4.

The consistent formatting of chapters throughout the text ensures that prospective cataloguers understand the meaning, context, terminology, and application of guidelines for descriptive metadata and authority control. Thus, in its own internal structure, CCO remains true to its stated objective of promoting consistency of interpretation and implementation. Bolded recommendations throughout Part One are in some instances broad (“CCO recommends good and versatile database design and consistent cataloging rules”, p. 25) and in others appropriately specific (“Because of the complexity of cultural information and the importance of Authority Records, CCO recommends using a relational database”, p. 20). Regardless of their degree of specificity, the recommendations provide clear, logical, and principles-based guideposts for institutions and individual cataloguers alike. They also provide context for the series of “rules” that follow in Parts Two and Three. The rules, while named as such and articulated in a prescriptive tone, are discussed and presented throughout in a spirit of “recommended best practice”. This allows individual institutions to “make and enforce” local rules that accommodate their requirements and those of their end-users most effectively and efficiently (p. 2).

This manual will serve as an important tool for museum documentation specialists, visual resources curators, archivists, librarians, and others responsible for providing descriptive metadata and authority control for a variety of cultural objects. These include architecture, paintings, sculpture, prints, manuscripts, photographs and other visual media, performance art, archeological sites and artifacts, and different functional objects associated with material culture. While its coverage is impressively wide-ranging, CCO is not intended for natural history or scientific collections.

Cataloging Cultural Objects, in linking the work of cataloguers from different institutional contexts, provides a timely and useful content standard for cross-domain application. It also serves as an effective teaching tool for those who recognize and value not so much the location (museum, archive, library) where descriptive metadata are to be assigned as the purpose for which they are intended, namely to facilitate access to, and sharing of, both records and their corresponding objects. This reviewer would have appreciated more than a “Selected Bibliography”, and an expanded Glossary (e.g., where is a definition of “format controlled” among “controlled fields”, “controlled list”, and “controlled vocabulary”?). Even so, the inclusion of additional specialized sources for cataloguing museum collections, and of within-chapter references to standard tools for particular metadata elements, is especially foresighted and commendable. The text mentions a “CCO website” throughout. A URL or other link eluded this reviewer, though a Google™ search led to http://vraweb.org/ccoweb/cco/index.html [accessed September 28, 2007].

Overall, Cataloging Cultural Objects, with its attending guidelines for descriptive metadata and authority control for “one-of-a-kind cultural objects”, deserves a place among the “well-established” data content standards of the library and archival communities that CCO references with obvious regard.

Lynne C. Howarth
Professor
Faculty of Information Studies
University of Toronto
140 St. George Street, Toronto,
Ontario M5S 3G6, Canada
E-mail: [email protected]

Patrick Lambe. Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Oxford: Chandos, 2007. xix, 277 p. ISBN 978-1-84334-228-1 (hbk.); 978-1-84334-227-4 (pbk.)

The knowledge and information world we live in can rarely be described from a single coherent and predictable point of view. The global economy and mass society have brought an explosion of knowledge sources, paradigms, information-seeking behaviors, usage contexts and access devices. Together these overload our lives with an enormous number of signals and stimuli, all competing for our limited attention. Taxonomies are often cited as tools to cope with, organize and make sense of this complex and ambiguous environment.

Leveraging an extensive review of literature from a variety of disciplines, as well as a wide range of relevant real-life case studies, Organising Knowledge by Patrick Lambe has the great merit of liberating taxonomies from their recurring, obscure and limiting definitions. It presents them as living, evolving working tools for managing knowledge within organizations.

Primarily written for knowledge and information managers, this book can help a much larger audience of practitioners and students who wish to design, develop and maintain taxonomies for large-scale coordination and organizational effectiveness both within and across societies. Patrick Lambe opens our eyes to the fact that taxonomies are far from being just a synonym for purely hierarchical trees that improve navigation, findability and information retrieval. They take multiple forms (lists, trees, facets and system maps) and play different roles. These range from basic information organization to more subtle tasks, such as establishing common ground, overcoming boundaries, discovering new opportunities and helping in sense-making.

Over the course of the book, a number of misconceptions haunting taxonomy work are addressed and carefully dispelled. Taxonomy development is often thought to be an abstract task of analyzing and classifying entities, performed in complete isolation. On the contrary, taxonomies are to a large extent products of users’ perceptions and worldviews, strongly influenced by the pre-existing information infrastructure. They can also be dangerous tools. They have the potential to reveal and clarify, but also to exclude and conceal critical details with a large impact on basic business activities such as managing risk, controlling costs, understanding customers and supporting innovation.

The first part of the book introduces concepts, provides definitions and challenges wrong assumptions about taxonomies and the work of taxonomy-building; the second takes us step-by-step through a typical project. From here on, insights become part of practicable frameworks that form the basis of a concrete information-management strategy and process, flexible enough to be used in very different organizational environments and scenarios. The project starts with the definition of stakeholders, purpose and scope and ends with deployment, validation and governance. Lambe realistically presents it as an iterative and fascinating journey through competing needs, changing goals, mixed cues, and technical and cognitive constraints.

Beyond introducing fundamental guiding principles and addressing relevant implementation challenges, Organising Knowledge provides a large dose of political and pragmatic advice on making your efforts contribute usefully to the overall knowledge and information infrastructure. Taxonomies, much like an architect’s blueprints, only represent theory until they are implemented in practice with real people and real content. As Lambe explains, this step requires crossing over to the other side of the barricade. That means putting yourself in the user’s shoes, constructing an information neighborhood, designing and populating a metadata framework, solving usability issues, and successfully dealing with records management and information architecture concerns.

While every paragraph of the book is packed with valuable advice and real-life experience, I consider the last chapter to be the most intriguing and ground-breaking one. It is only here that taxonomists meet folksonomists and ontologists, in a fundamental attempt to write a new page on the relative positions of old and emerging classification techniques.

The chapter offers a well-balanced and sober analysis that forgoes excessive enthusiasm in favor of more appropriate considerations of content scale, domain maturity, precision and cost. In it, knowledge infrastructure tools are arrayed from inexpensive and expressive folksonomies on one side to the smart, formal, machine-readable but expensive world of ontologies on the other. In light of so many different tools, information infrastructure clearly appears more as a complex, dynamic ecosystem than as a static, over-designed environment. Such a variety of tasks, perspectives, work activities and paradigms calls for a resilient, adaptive and flexible knowledge environment with a minimum of standardization and uniformity. The right mix of tools and approaches can only be determined case by case, by carefully considering the particular objectives and requirements of the organization while aiming to maximize its overall performance and effectiveness.

Starting from the history of taxonomy-building and ending with the emerging trends in Web technologies, artificial intelligence and social computing, Organising Knowledge is thus both a guiding tool and inspirational reading. It deals not only with taxonomies, but also with effectiveness, collaboration and finding middle ground. These are exactly the right principles to make your intranet, portal or document management tool a rich, evolving and long-lasting ecosystem.

Emanuele Quintarelli
E-mail: [email protected]


ISKO News

Edited by Hanne Albrechtsen

Communications Editor

ISKO’s Nordic chapter was founded on November 8, 2007 in Copenhagen. It covers Sweden, Denmark, Norway, Finland, Iceland and the Faroe Islands. We will eventually approach the Baltic countries to determine their interest in the project. The chapter’s first board was constituted with Mikkel Christoffersen (DK) as chairman and, as board members, Professor Birger Hjørland (DK), Hanne Albrechtsen (DK) and Per Nyström (SWE). So far, 22 people have expressed interest in the chapter, but until we see who pays the membership fee for 2008, we do not know how many members we will have. We will establish a web presence soon. The plan is to hold a conference in odd years, the first one in 2009 in Sweden. The theme of the first conference will probably be whether there is a Nordic school of thought in knowledge organisation. We hope the chapter will bring together Nordic researchers in KO and will foster more communication, exchange of ideas and cooperation. We also hope it will raise general awareness of ongoing research and of the parties involved.

Mikkel Christoffersen, PhD fellow
Danmarks Biblioteksskole / Royal School of Library and Information Science


Knowledge Organization Literature

Ia C. McIlwaine: Literature Editor

Assisted by: Marie Baliková, Victoria Frâncu, Claudio Gnoli, Ágnes Barátné Hajdu, John McIlwaine, Gerhard Riesthuis, Aida Slavic, Rosa San Segundo, Alenka Sauperl, Nancy Williamson.

Without their assistance the task would not be possible, and their help is greatly appreciated, as would be contributions from any other willing person.

ICM

0 Form division

02 Literature Reviews in Knowledge Organization

0479 021
Miksa, F. – “The power to name”: a review essay (Lang.: eng). – In: Libraries and the Cultural Record, 42(2007)1, p.75-79.

0480 021;182
Broughton, V. – Classification and subject organization and retrieval. British librarianship and information work, 1991-2000; ed. J. H. Bowman (Lang.: eng). – London: Ashgate Publishing, 2006, p.494-516.

0481 021;182
Broughton, V. - Classification and subject organization and retrieval. British librarianship and information work, 2001-2005; ed. J. H. Bowman (Lang.: eng). London: Ashgate Publishing, 2007, p.467-488.

0482 026
Andrews, J.E. – (Book review of) Spink, A., Cole, C., eds. - New directions in cognitive information – Dordrecht: Springer, 2005 - viii, 250 p. - ISBN: 140204013X (HB); 9781402040139 (HB); 1402040148 (e-book); 9781402040146 (e-book) (Lang.: eng). - In: LIBRES: Library & Information Science Research, 29(2007)1, p.146-147.

04 Universal Classification Systems

0483 042.1
Universal'naja destjatičnaja klasifikacija (UDK): T. 5: 61 Medicinske nauki [Universal Decimal Classification: Volume 5: 61 Medical sciences]. 4th full ed. (Lang.: rus). Editor in chief Ju. M. Arskij. - Moscow: VINITI; RAN, 2006. - 305p. (Publication No: UDC-PO51).

0484 042.1
Universal'naja destjatičnaja klasifikacija (UDK): T.8: 66 Himničeskaja tehnologija. Himničeskaja prom'išlenost'. Piščevaja prom'išlennost'. Metallurgija. Rodstvenn'ie otrasli [Universal Decimal Classification: Volume 8: 66: Chemical technology]. 4th full ed. (Lang.: rus). Editor in chief Ju. M. Arskij. - Moscow: VINITI; RAN, 2007. - 310p. (Publication No: UDC-PO51). - ISBN 5-94577-031-0.

0485 042.2
Universalioji dešimtainė klasifikacija (UDK): lentelės mokslinėms bibliotekom [Universal Decimal Classification: standard edition] (Lang.: lith). - Vilnius: Lietuvos nacionalinė Martyno Mažvydo biblioteka, 2006. – 2 vols. - ISBN: 9955-541-56-3.

0486 042.3
Universal'naja destjatičnaja klasifikacija (UDK): Vtoroe sokraščenn'ie tablic'i [Universal Decimal Classification: Abridged Tables]. Editor in chief Ju. M. Arskij (Lang.: rus). - Moscow: VINITI; RAN, 2006. - 150p.

0487 042.5
UDK Täiendusvihik [UDC update according to the Extensions and Corrections to the UDC 20 – 27]. Online edition. Translated and edited by Katrin Karus and Sirje Nilbe (Lang.: est). - Tallin: Eesti Raamatukoguhoidjate Ühing; ELNET Konsortsium, 2007. - 58p. URL: http://www2.nlib.ee/ERY/liigit_marks_toimk/UDK_TV.pdf

0488 042.5
Universal'naja destjatičnaja klasifikacija (UDK): izmenenija i dopolnenija: v'ipusk 4 [Universal Decimal Classification: corrections and extensions: issue 4]. Prepared by Rossijskaja akademija nauk, VINITI. Editor in chief: Ju. M. Arskij. (Lang.: rus). - Moscow: VINITI, 2006. - 145p.

0489 042.5
Universal'na desjatkova klasifikacija (UDK): zmini ta dop. (1998-1999, 2001-2002) [Universal Decimal Classification: corrections and extensions]. M. I. Ahverdova [ed.] (Lang.: ukr). - Kiev: Knižkova palata Ukraini, 2006. - 199 p. - ISBN 966-647-065-9

06 Conference Reports and Proceedings

0490 06.04-07-06.14/15
Smiraglia, R. – A glimpse at knowledge organization in North America (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.69-71.


07 Textbooks (whole field)

0491 07.23
Broughton, V. - Essential thesaurus construction (Lang.: eng). - London: Facet Publishing, 2006. – viii, 296p. - ISBN-13: 978-1-85604-565-0
Book reviews by:
Quinn, S. (0492) - Words at work (Lang.: eng). – In: Australian Library Journal, 56(2007)2, p.183-184.
Trickey, K.V. (0493) (Lang.: eng). – In: New Library World, 108(2007)3/4, p.190-191.

0494 07.3
Taylor, A.G., Miller, D. P. - Introduction to cataloging and classification. 10th ed. (Lang.: eng). - Westport, CN.: Libraries Unlimited, 2006. - xviii, 589 p. - ISBN: 159158230X; 1591582350.
Book reviews by:
Conway, C.N. (0495) (Lang.: eng). – In: Reference and User Services Quarterly, 46(2007)3, p.104-105;
Intner, S.S. (0496) (Lang.: eng). - In: Technicalities, 27(2007)2, p.19-20.

1 Theoretical Foundations and General Problems

11 Order and Knowledge Organization

0497 111
Du Preez, M. - (Book review of) Nissen, M.E. - Harnessing knowledge dynamics: principled organizational knowing & learning – Hershey, PA: IRM Press, 2006. - xix, 278 p. - ISBN: 1591407737; 1591407745; 1591407753 (ebook) (Lang.: eng). – In: The Electronic Library, 25(2007)1, p.118-119.

0498 111
Salo, J. - A conceptual model of trust in the online environment (Lang.: eng). – In: Online Information Review, 31(2007)5, p.604-621.

12 Conceptology in Knowledge Organization

0499 122
Barátné Hajdu, Á. - Human perception and knowledge organization: visual imagery (Lang.: eng). – In: Library Hi Tech, 25(2007)3, p.338-351.

0500 122
Barátné Hajdu, Á. - A percepció és megjelenítés jelentősége az információkereső nyelvekben [The importance of perception and visualisation in information retrieval] (Lang.: hun). In: Tudományos és Műszaki Tájékoztatás, 54(2007)10. URL: http://tmt.omikk.bme.hu/show_news.html?id=4785&issue_id=487

0501 122
Karamuftuoglu, M. - Need for a systemic theory of classification in information science (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1977-1987.

0502 122
Ponnusamy, R., Gopal, T.V. – A concept matrix based approach (Lang.: eng). – In: Information studies, 12(2006)3, p.179-195.

13 Mathematics in Knowledge Organization

0503 131
Wang, Tai-Yue, Chiang, Huei-Min - Fuzzy support vector machine for multi-class text categorization (Lang.: eng). – In: Information Processing & Management, 43(2007)4, p.914-929.

14 System Theory and Knowledge Organization

0504 147
Manevitz, L., Yousef, M. - One-class document classification via Neural Networks (Lang.: eng). – In: Neurocomputing, 70(2007)7/9, p.1466-1481.

0505 149;918
Angrosh, M. L., Urs, S. R. - Ontology-driven knowledge management systems for digital libraries: towards creating semantic metadata-based information services (Lang.: eng). – In: Information Studies, 12(2006)3, p.151-168.

0506 149
Kasten, J. – Thoughts on the relationship of knowledge organization to knowledge management (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.9-15.

15 Psychology and Knowledge Organization

0507 157
Chen, Z., Lu, K. - A preprocess algorithm of filtering irrelevant information based on the minimum class difference (Lang.: eng). – In: Knowledge-based Systems, 19(2006)6, p.422-429.

18 Classification and Indexing Research

0508 182
Panici, A. - Noutăţile catalogării, clasificării şi indexării resurselor bibliografice [Novelties in cataloguing, classification and indexing of the bibliographic resources] (Lang.: rom). - In: Magazin Bibliologic, 1 (2006), p.23-26.

19 History of Knowledge Organization

0509 191
Van der Linden, H. M.M. - De actualiteit van een 19e eeuwse classificatietheorie in de digitale wereld: over brievenbussen en andere ordeningen [The actuality of a 19th-century classification theory in the digital world: about letter boxes and other forms of organization] (Lang.: du). – In: Informatie Professional, 11(2007)7/8, p.12-17.

2 Classification Systems and Thesauri, Structure and Construction

21 General Problems of Classification Systems and Thesauri

0510 211
Dalbin, S. - Thesaurus et informatique documentaires: partenaires de toujours? / Dokumentarische Thesauri und dokumentarische Informatik: Partner für immer? / Tesauros e informatica documentales: socios desde siempre? / The thesaurus and the digital library: still partners? (Lang.: fr). – In: Documentaliste - Sciences de l'Information, 44(2007)1, p.42-55.

0511 211
Dalbin, S. - Thesaurus et informatique documentaires: des Noces d'Or. / Dokumentarischen Thesauri und dokumentarische Informatik: die Goldene Hochzeit. / Tesauros e informatica documentales: Bodas de oro. / Information languages and the thesaurus: celebrating their golden anniversary (Lang.: fr). – In: Documentaliste - Sciences de l'Information, 44(2007)1, p.76-80.

0512 211
Hjørland, B. – Information: objective or subjective/situational? (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1448-1457.

0513 212
Kishida, K. - Effectiveness and functionality of controlled vocabulary in the Internet age (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.62-67.

0514 214
Bianchini, D. et al. - Ontology-based methodology for e-service discovery (Lang.: eng). – In: Information Systems, 31(2006)4-5, p.361-380.

0515 214
Jimenez, A. G. - Una aproximacio als llenguatges 'documentals' en la web semantica [An approach to “bibliographic” languages on the semantic web] (Lang.: cat). - In: Item, 42(2006) p.33-50.

0516 214
Madalli, P. – Ontologies as knowledge structures for semantic retrieval (Lang.: eng). – In: Information Studies, 12(2006)4, p.205-212.

0517 214
Ungváry, R. - Az ontológiák és legfontosabb fogalmaik [Ontologies and their most general concepts] (Lang.: hun). - In: Tudományos és Műszaki Tájékoztatás, 54(2007)10, 2007. URL: http://tmt.omikk.bme.hu/show_news.html?id=4789&issue_id=487

22 Structure and elements of CS & T

0518 225
Hunt, K. - Faceted browsing: breaking the tyranny of keyword searching (Lang.: eng). - In: Feliciter, 52(2006), p.36-37.

0519 225
Lin, Wen-Yau C. - The concept and applications of faceted classification (Lang.: chi). – In: Journal of Educational Media & Library Sciences, 44(2006)2, p.153-171.

0520 225
Miksa, S.D., et al. - The development of a facet analysis system to identify and measure the dimensions of interaction in online learning (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1569-1578.

0521 225;752
La Barre, K. – Faceted navigation and browsing features in new OPACs: robust support for scholarly information seeking? (Lang.: eng). - In: Knowledge Organization, 34(2007)2, p.78-90.

0522 226
Buizza, P. – (Book review of) Biblioteca nazionale centrale di Firenze. Nuovo soggettario: guida al sistema italiano di indicizzazione per soggetto, prototipo del thesaurus [National Central Library of Florence. New subject headings: a guide to the Italian system of subject description, prototype of a thesaurus] (Lang.: eng). – Milan: Bibliografica, 2007. – 246p. + 1 CD-ROM. – ISBN 978-88-7075-633-3 (bb) – In: Knowledge Organization, 35(2007)1, p.58-60.

0523 226
Soonja Lee Koh, G. - Capturing the intended messages of subject headings as exemplified in The List of Korean Subject Headings (Lang.: eng). - In: International Cataloguing & Bibliographic Control, 36(2007)2, p.27-36.

0524 226
Ojala, M. - Finding and using the magic words: keywords, thesauri, and free text search (Lang.: eng). – In: Online, 31(2007)4, p.40-42.

0525 226
Shimada, M. - The revision of the National Diet Library List of Subject Headings (NDLSH), and its future (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.73-78.

0526 229
Hunter, J., Cheung, K. - Provenance Explorer - a graphical interface for constructing scientific publication packages from provenance trails (Lang.: eng). – In: International Journal on Digital Libraries, 7(2007)1/2, p.99-107.


23 Construction of Classification Systems and Thesauri

0527 232 Rowley, J. – (Book review of) British Standard. Structured vocabularies for information retrieval – Guide. Part 1: Definitions, symbols and abbreviations. Part 2: Thesauri (BS8723-1 & 2: 2005) (Lang.: eng). – In: Journal of Documentation, 63(2007)3, p.428-431.

24 Relationships

0528 241;761 Mortchev-Bouveret, M. - Fonctions lexicales pour le typage de relations syntagmatiques et paradigmatiques : une approche lexicographique du terme [Lexical functions for the characterisation of syntagmatic and paradigmatic relations: a lexicographic approach to terms] (Lang.: fr). - In: Terminology, 12(2006)2, p.235-259.

25 Numerical Taxonomy

0529 252 Cathey, R.J. et al. – Exploiting parallelism to support scalable hierarchical clustering (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)8, p.1207-1222.

0530 252 Dunlavy, D.M. et al. - QCS: a system for querying, clustering and summarizing documents (Lang.: eng). – In: Information Processing & Management, 43(2007)6, p.1588-1605.

26 Notation. Codes

0531 265 Satija, M. P. – Book numbers in India with special reference to the author designed and used by the National Library of India (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.34-40.

29 Evaluation of C S & T

0532 292;048-46 Kim, S., Beck, H. W. - A practical comparison between thesaurus and ontology techniques as a basis for search improvement (Lang.: eng). – In: Journal of Agricultural & Food Information, 7(2006)4, p.23-42.

0533 294 Harper, C.A., Tillett, B. - Library of Congress controlled vocabularies and their application to the Semantic Web (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.47-68.

34 Classing and Indexing

0534 34 Sukula, S. K. - Indexing in electronic environment (Lang.: eng). – In: SRELS Journal of Information Management, 44(2007)3, p.249-254.

0535 343 Arazy, O., Woo, C. – Enhancing information retrieval through statistical natural language processing: a study of collocation indexing (Lang.: eng). – In: MIS Quarterly, 31(2007)3, p.525-547.

0536 344 Basile, P. et al. - The JIGSAW Algorithm for word sense disambiguation and semantic indexing of documents (Lang.: eng). – In: Lecture Notes in Computer Science, 4733(2007), p.313-325.

0537 344 Costa, V. S., Sagonas, K., Lopes, R. - Demand-driven indexing of Prolog clauses (Lang.: eng). – In: Lecture Notes in Computer Science, 4670(2007), p.395-409.

0538 344 Peng, D. - Automatic conceptual indexing of Web services and its application to service retrieval (Lang.: eng). – In: Lecture Notes in Computer Science, 4494(2007), p.290-301.

0539 344 Samantray, S. D., Vasudev, P. - A data mining approach for concept based document classification and automated text summarization (Lang.: eng). - In: International Conference Multidisciplinary Information Sciences and Technologies, 1, 2(2006), p.3-7.

0540 344 Shen, J-J., Chang, C-C., Li, Y-C. – Combined association rules for dealing with missing values (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.468-481.

0541 344 Zhan, J., Loh, H. T. - Using latent semantic indexing to improve the accuracy of document clustering (Lang.: eng). – In: Journal of Information and Knowledge Management, 6(2007)3, p.181-188.

0542 346 De Campos, L. M. et al. – Automatic indexing from a thesaurus using Bayesian networks: application to the classification of parliamentary initiatives (Lang.: eng). – In: Lecture Notes in Computer Science, 4724(2007), p.865-877.

0543 347 Lioma, C., Ounis, I. - A syntactically-based query reformulation technique for information retrieval (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.143-162.

0544 348 Giunchiglia, F., Zaihrayeu, I., Kharkevich, U. - Formalizing the get-specific document classification algorithm (Lang.: eng). – In: Lecture Notes in Computer Science, 4675(2007), p.26-37.

0545 348 Li, T., Zhu, S., Ogihara, M. - Hierarchical document classification using automatically generated hierarchy (Lang.: eng). – In: Journal of Intelligent Information Systems, 29(2007)2, p.211-230.

0546 348 Ru, Y., Horowitz, E. - Automated classification of HTML forms on e-commerce web sites (Lang.: eng). – In: Online Information Review, 31(2007)4, p.51-466.

0547 348 Song, D. et al. - An intelligent information agent for document title classification and filtering in document-intensive domains (Lang.: eng). – In: Decision Support Systems, 44(2007)1, p.251-265.

35 Manual and Automatic Order Techniques

0548 356 Agosti, M., Bonfiglio-Dosio, G., Ferro, N. - A historical and contemporary study on annotations to derive key features for systems design (Lang.: eng). – In: International Journal on Digital Libraries, 8(2007)1, p.1-19.

0549 357 Doucet, A., Lehtonen, M. - Unsupervised classification of text-centric XML document collections (Lang.: eng). - In: Lecture Notes in Computer Science, 4518(2007), p.497-509.

0550 357;751 Gery, M. - Indexing "Reading Paths" for a structured information retrieval at INEX 2006 (Lang.: eng). – In: Lecture Notes in Computer Science, 4518(2007), p.160-164.

36 Coding

0551 361 Tennis, J.T. - Scheme versioning in the Semantic Web (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.85-104.

39 Evaluation of Classing and Indexing

0552 393 Efron, M. - Query expansion and dimensionality reduction: notions of optimality in Rocchio relevance feedback and latent semantic indexing (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.163-180.

4 On Universal Classification Systems and Thesauri

42 On the Universal Decimal Classification

0553 42 Caranfil, L. - Clasificarea Zecimală Universală şi catalogul tematic al Bibliotecii Academiei Române (I-3) [The Universal Decimal Classification and the subject catalogue of the Romanian Academy Library] (Lang.: rom). - In: Biblioteca, 18(2007)2, p.50-51; 3/4, p.80-81; 5, p.127-128.

0554 42 Cordeiro, I. M. - The UDC in a time of change: a status report (Lang.: eng). Proceedings of the International Conference on Future of Knowledge Organization in Networked Environment (IKONE 2007), Bangalore, 3-5 September 2007; ed. K.S. Raghavan. - Bangalore: Indian Statistical Institute, Documentation Research & Training Centre, 2007. (Indian Statistical Institute Platinum Jubilee Conference Series), p.105-114.

0555 42 Frâncu, V. - Seminar internaţional CZU (I) [An international seminar on the UDC] (Lang.: rom). - In: Biblioteca, 18(2007)7, p.190-191.

0556 42 Kovac, T. et al. - Univerzalna decimalna klasifikacija: priročnik [Universal Decimal Classification: handbook] (Lang.: slo). – Ljubljana, Narodna in univerzitetna knižnica, 2006. - 130pp.

43 On the Dewey Decimal Classification

0557 43 Fleharty, C., Smith, S. - Biographies: where are they? (Lang.: eng). – In: School Library Media Activities Monthly, 23(2007)9, p.30-31.

0558 43 Khairy, I. - Le projet Web Dewey en arabe de la Bibliothèque Alexandrie [The project for an Arabic Web Dewey in the Biblioteca Alexandrina] (Lang.: fr). - Paper presented to 29th annual conference of MELCOM: the European Middle Eastern Libraries Association, Sarajevo, June 4-6, 2007. 5p. URL: http://www.sant.ox.ac.uk/mec/melcomintl/melcom/Papers-2007/iman.doc

0559 43 Montgomery, P. – Dewey Decimal Sudoku (Lang.: eng). – In: School Library Media Activities Monthly, 23(2007)10, p.16.

0560 43 Petgnet, D. - Y a-t-il une vie après la Dewey? [Is there a life after Dewey?] (Lang.: fr). – In: Bulletin des Bibliothèques de France, 52(2007)3, p.107-108.

0561 43 Richman, D. - Social search comes of age (Lang.: eng). – In: Information Outlook, 11(2007)8, p.18-24.

0562 43 Weihs, J. – (Book review of) Mitchell, J. S., Vizine-Goetz, D. Moving beyond the presentation layer: content and context in the Dewey Decimal Classification (DDC) system - New York: Haworth Information Press, 2006. - xix, 239 p. - ISBN: 0789034522; 9780789034526 (Lang.: eng). – In: Feliciter, 53(2007)5, p.265-265.


44 On the Library of Congress Classification and Library of Congress Subject Headings

0563 44 Making LC call numbers visual (Lang.: eng). - In: Library Journal, 130(2005)6, p.15.

0564 44;448 Schiff, A. - New edition of SACO Participants' Manual forthcoming (Lang.: eng). – In: ALCTS Newsletter Online, 16(2005)6, p.1.

0565 448 Kuntz, B. - Arabic librarians talk back to the empire (Lang.: eng). - Paper presented to 29th annual conference of MELCOM: the European Middle Eastern Libraries Association, Sarajevo, June 4-6, 2007. 5p. URL: http://www.sant.ox.ac.uk/mec/melcomintl/melcom/Papers-2007/Blair-Kuntz.doc

0566 448 Orphan, S. - EBSCO offers alternate subject headings through A-to-Z service (Lang.: eng). – In: College & Research Libraries News, 67(2006)6, p.351.

48 On other Universal Classification Systems and Thesauri

0567 481 Bilodeau, B. - RASUQAM: the thesaurus of descriptors of the Université du Québec à Montreal (UQAM) / RASUQAM le thesaurus de descripteurs de l'Université du Québec à Montreal (UQAM) (Lang.: eng). – In: Documentation et Bibliothèques, 52(2006)2, p.109-120.

0568 481 Sato, H. - The main purpose and the basic rules of developing "ExpressFinder/Thesaurus" (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.84-88.

6 On Special Subjects Classifications and Thesauri

0569 6 Gagnon, G. – (Book review of) Johnston, B. H. - Anishinaubae thesaurus – East Lansing, MI: Michigan State University Press, 2006 – 320p. - ISBN 0-87013-753-0 (Lang.: eng). – In: Choice: Current Reviews for Academic Libraries, 45(2007)1, p.62.

62 On C S & T in Physics, Chemistry, Electronics, Energy

0570 624 Access Innovations develops a thesaurus for the Institute of Electrical and Electronic Engineers (Lang.: eng). – In: Key Words, 15(2007)1, p.10.

64 On C S & T in Biological, Veterinary Science, Agriculture, Food Sciences, Ecology

0571 649;221 Stirling, D. A. - EPA glossaries: the struggle to define environmental terms (Lang.: eng). – In: Government Information Quarterly, 24(2007)2, p.414-428.

65 On C S & T in Human Biology, Medicine, Psychology, Education, Labour, Sports, Household

0572 651/4 Arencibia-Jorge, R., Vega-Almeida, R.L., Martí-Laher, Y. – Domain analysis for the construction of a conceptual structure: a case study (Lang.: eng). – In: LIBRES: Library and Information Science Research Electronic Journal, 17(2007)2. URL: http://libres.curtin.edu.au/

0573 651/4 Lacoste, C. et al. - Inter-media concept-based medical image indexing and retrieval with UMLS at IPAL (Lang.: eng). – In: Lecture Notes in Computer Science, 4730(2007), p.694-701.

0574 651/4 Muh-Chyun Tang - Browsing and searching in a faceted information space: a naturalistic study of PubMed users' interaction with a display tool (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1998-2006.

0575 651/4; 743 Stojmirović, A., Pestov, V. - Indexing schemes for similarity search in datasets of short protein fragments (Lang.: eng). – In: Information Systems, 32(2007)8, p.1145-1165.

66 On C S & T in Sociology, Politics, Social Policy, Law, Area Planning, Military Science, History

0576 66 Levinson, D. - Anthropology, taxonomies, and publishing (Lang.: eng). – In: Online, 30(2006)4, p.28-30.

0577 661 López-Huertas, M. J., Ramírez, I de T. - Gender terminology and indexing systems: the case of woman's body, image and visualization (Lang.: eng). – In: Libri: International Journal of Libraries & Information Services, 57(2007)1, p.34-44.

0578 666 Dabney, D. - The universe of unthinkable thoughts: literary warrant and West’s Key Number System (Lang.: eng). – In: Law Library Journal, 99(2007)2, p.229-247.

0579 666 Hickey, L. - (Book review of) Burton, W.C. - Burton's legal thesaurus. 4th ed. - New York: McGraw-Hill, c2007. - xvii, 1063 p. – ISBN 0071472622; 9780071472623 (Lang.: eng). – In: Choice: Current Reviews for Academic Libraries, 45(2007)1, p.70.

0580 666 Modeste, J., Dina, Y. – Use of the Elizabeth Moys Classification Scheme for Legal Materials in the Caribbean (Lang.: eng). – In: Caribbean Libraries in the 21st Century: Changes, Challenges and Choices; ed. by C. Peltier-Davies & S. Renwick. – Medford, NJ: Information Today, 2007 – ISBN 978-1-57387-301-7 – p.119-129.

68 On C S & T in Science of Science, Information Science, Computer Science, Communication Science, Semiotics

0581 682 Naumis-Peña, C. - Estudio comparativo de tesauros bibliotecológicos en lengua española [Comparative study of Spanish thesauri in LIS] (Lang.: sp). – In: Investigacion Bibliotecologica, 21(2007)42, p.195-210.

0582 69 Broughton, V., Slavic, A. - Building of a faceted classification for humanities: method and procedure (Lang.: eng). – In: Journal of Documentation, 63(2007)5, p.727-754. URL: http://dlist.sir.arizona.edu/1976/.

7 Knowledge Representation by Language and Terminology

71 General Problems of Natural Language in Relation to Knowledge Organization

0583 715 Ahn, H., Kim, K., Han, I. - Global optimization of feature weights and the number of neighbors that combine in a case-based reasoning system (Lang.: eng). – In: Expert Systems, 23(2006)5, p.290-301.

73 Automatic Language Processing

0584 732 Niemi, T., Jamsen, J. – A query language for discovering semantic associations, Pt. 1: approach and formal definition of query primitives; Part 2: sample queries and query evaluation (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1559-1568; 1686-1701.

0585 733 Lazarinis, F. – Engineering and utilizing a stopword list in Greek Web retrieval (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1645-1653.

0586 733 Strossa, P. - Komunikace mezi člověkem a počítačem v přirozeném jazyce [Communication between man and computer in natural language] (Lang.: cz). - Science WORLD [Online]. Praha, 2004. URL: http://scienceworld.cz/sw.nsf/ID/10D74E2E7ED7559EC1256F32005ACF20?OpenDocument, October 19, 2007.

0587 736 Ou, S., Khoo, C.S.G., Goh, D.H. – Automatic multidocument summarization of research abstracts: design and user evaluation (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1419-1434.

74 Grammar Problems

0588 743 Dehuri, S., Mall, R. - Predictive and comprehensible rule discovery using a multi-objective genetic algorithm (Lang.: eng). – In: Knowledge-based Systems, 19(2006)6, p.413-421.

0589 744 Hu, J. et al. - Locality discriminating indexing for document classification keywords: information retrieval (Lang.: eng). – In: Proceedings of the Annual International ACM SIGIR Conference on Research and Development, (2007), p.689-690.

0590 744 Ke, W., Mostafa, J., Fu, Y. - Collaborative classifier agents: studying the impact of learning in distributed document classification keywords (Lang.: eng). – In: Joint Conference on Digital Libraries, 7(2007), p.428-437.

75 On-Line Retrieval Systems and Technologies

751 General and Theoretical Problems

0591 751 Hutchinson, H.B., Druin, A., Bederson, B. – Supporting elementary-age children’s searching and browsing: design and evaluation using the International Children’s Digital Library (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1618-1631.

0592 751 Rédy, G., Neumann, A., Sutó, Z. - Információkeresés [Information retrieval] (Lang.: hun). – In: Tudomanyos es Muszaki Tajekoztatas, 54(2007)2, p.55-61.

0593 751 Ryoo, J., Saiedian, H. - A framework for classifying and developing extensible architectural views (Lang.: eng). – In: Information and Software Technology, 48(2006)7, p.456-470.

0594 751 Williamson, N.J. - Knowledge structures and the Internet: progress and prospects (Lang.: eng). – In: Cataloging & Classification Quarterly, 44(2007)3/4, p.329-342.

0595 751 Zhang, R., Hu, Y.C. - Assisted peer-to-peer search with partial indexing (Lang.: eng). – In: IEEE Transactions on Parallel and Distributed Systems, 18(2007), p.1146-1158.

0596 751 Zhu Xiaomin, Liao Jianxin – (Book review of) Lazar, J. - Web usability: a user-centered design approach. - Boston: Addison Wesley, 2006. - xxi, 394pp. ISBN: 9780321321350 (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)7, p.1066-1067.

0597 751;225 Uddin, M.N., Janecek, P. – Faceted classification in web information architecture (Lang.: eng). – In: The Electronic Library, 25(2007)2, p.218-233.

752 Dialogue systems. Interactive Catalogues

0598 752 Kapoor, K., Goyal, O.P. – Web-based OPACs in Indian academic libraries: a functional comparison (Lang.: eng). – In: Program: Electronic Library and Information Systems, 41(2007)3, p.291-310.

0599 752 Wells, D. - What is a library OPAC? (Lang.: eng). – In: The Electronic Library, 25(2007)4, p.386-394.

0600 752.3 Harcourt, K., Wacker, M., Wolley, I. - Automated access level cataloging for Internet resources at Columbia University Libraries (Lang.: eng). – In: Library Resources and Technical Services, 51(2007)3, p.212-225.

0601 752.3 O'Leary, M. - Northern light: better the second time around? (Lang.: eng). – In: Information Today, 24(2007)2, p.33,36.

753 Online Access, Queries, Free Text Searching

0602 753 Mandl, T. – The impact of web site structure on link analysis (Lang.: eng). – In: Internet Research, 17(2007)2, p.196-207.

754 Programs for on-line queries, e.g. for ranking. Relevance ranking

0603 754 Fourie, I., Bothma, T. – Information seeking: an overview of web tracking and the criteria for tracking software (Lang.: eng). – In: Aslib Proceedings: New Information Perspectives, 59(2007)3, p.264-285.

757 Expert Systems in Searching. Search Engines

0604 757 Boldiš, P. – Vyhledávače: současné problémy a trendy vývoje [Search engines: present questions and trends] (Lang.: cz). – In: Knihovna plus [Online]. 2005, č. 1. URL: http://knihovna.nkp.cz/knihovnaplus51/boldis.htm, October 19, 2007.

0605 757 Han, L., Chen, G., Xie, L. – AASA: a method of automatically acquiring semantic annotations (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.435-451.

0606 757 Kuramochi, M., Karypis, G. - Discovering frequent geometric subgraphs (Lang.: eng). – In: Information Systems, 32(2007)8, p.1101-1120.

0607 757 Tho, Q.T., Fong, A.C.M., Hui, S.C. – A scholarly semantic web system for advanced search functions (Lang.: eng). – In: Online Information Review, 31(2007)3, p.353-365.

0608 757 Xiaodong Shi, Yang, C.C. - Mining related queries from Web search engine query logs using an improved association rule mining model (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)12, p.1871-1883.

759 Evaluation of On-Line Information Retrieval Systems and Techniques

0609 759 Chen, Y-L., Cheng, L-C., Cheng, Y-L. – Using position, fonts and cited references to retrieve scientific documents (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.492-519.

0610 759 Raban, D.R. – User-centred evaluation of information: a research challenge (Lang.: eng). – In: Internet Research, 17(2007)3, p.306-323.

76 Lexicon/Dictionary Problems

0611 761;722 Gómez González-Jover, A. - Meaning and anisomorphism in modern lexicography (Lang.: eng). – In: Terminology, 12(2006)2, p.215-234.

0612 762 Bergenholtz, H., Nielsen, S. - Subject-field components as integrated parts of LSP dictionaries (Lang.: eng). – In: Terminology, 12(2006)2, p.281-303.

77 Problems of Terminology

0613 773 L'Homme, M.C. - The processing of terms in dictionaries: new models and techniques. State of the art (Lang.: eng). – In: Terminology, 12(2006)2, p.181-188.

0614 773;78-75 Faber, P. et al. - Process-oriented terminology management in the domain of coastal engineering (Lang.: eng). – In: Terminology, 12(2006)2, p.189-213.

0615 777 Chou, C.-H., Han, C.-C., Chen, Y.-H. - GA based optimal keyword extraction in an automatic Chinese Web document classification system (Lang.: eng). – In: Lecture Notes in Computer Science, 4743(2007), p.224-234.

0616 777 Fu Lee Wang, Yang, C. - Mining Web data for Chinese segmentation (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)12, p.1820-1837.

78 Subject-Oriented Terminology Work

0617 78-49 Chen, Z. et al. - Semantic integration of government data for water quality management (Lang.: eng). – In: Government Information Quarterly, 24(2007)4, p.716-735.

0618 78-66;448 Whited, M. - ALA/ALCTS Cataloging and Classification Section/Subject Analysis Committee (Lang.: eng). – In: Law Library Journal, 96(2004)4, p.862-863.

0619 78-93 Nero, L. M., Mitchell, J. S., Vizine-Goetz, D. - Classifying the popular music of Trinidad and Tobago (Lang.: eng). – In: Cataloging & Classification Quarterly, 42(2006)3/4, p.119-133.

0620 78-93;357 Pinto, A., Haus, G. - A novel XML music information retrieval method using graph invariants (Lang.: eng). – In: ACM Transactions on Information Systems, 25(2007)4, p.19-44.

0621 78-97 Tiberi, M., Mazzocchi, F. - La gestione della polisemia nei thesauri: il caso dei termini filosofici [Management of polysemy in thesauri: the case of philosophical terms] (Lang.: it). – In: Bollettino AIB, 47(2007)1/2, p.93-107.

79 Multilingual Systems and Translation

0622 791 Amaral, C., Laurent, D. - Implementation of a QA system in a real context (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/TellMeMore_version_2.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.

0623 791 Balíková, M. - M-CAST in libraries (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST_in_libraries.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.

0624 791 Czerniejewski, B. - Multilingual Content Aggregation System based on TRUST Search Engine (M-CAST) (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST-project_presentation-final.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.

0625 791 Heuwing, B., Mandl, T., Strotgen, R. - Multilingual web retrieval experiments with field specific indexing strategies for WebCLEF 2006 at the University of Hildesheim (Lang.: eng). – In: Lecture Notes in Computer Science, 4730(2007), p.834-837.

0626 791 Lisek, S. - P2P networks for distributed queries (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST-P2P.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.

0627 791;871 Ménard, E. – Indexing and retrieving images in a multilingual world (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.91-100.

0628 793 Strossa, P. - Information query formulation in a Slavonic language and its automatic processing: experience from Polish and Czech in comparison to Western European languages (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/TEL-ME-MOR-PS.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae.

0629 799 Clavel, P. – How localization challenges international portals: character sets and international access (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.51-55.

0630 799;945 Harai, N. – Japanese scripts and UNIMARC (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.55-58.

8 Applied Classing and Indexing

81 General Problems, Catalogues, Guidelines, Rules, Indexes

0631 811 Haslinger, I., Van Otegem, M. - Machines moeten zoeken, mensen willen vinden [Engines should search, people want to find] (Lang.: du). – In: Informatie Professional, 11(2007)1, p.16-19.

82 Data Classing and Indexing

0632 82-92;43 Hu Yuefang, Chen Yintao - Differences between the DDC and the CLC in classifying works of literature (Lang.: eng). – In: Illinois Libraries, 86(2007)4, p.5-10.

83 Titled Classing and Indexing. Derived Indexing, Folksonomy

0633 831 Kelsey, P.J. - The Financial Counseling and Planning Indexing Project: establishing a correlation between indexing, total citations, and library holdings (Lang.: eng). – In: Financial Counseling and Planning, 18(2007)1, p.19-24.

0634 835 Abbas, J. – In the margins: reflections on scribbles, knowledge organization and access (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.72-77.

0635 835 Francis, E., Quesnel, O. - Indexation collaborative et folksonomies. / Gemeinschaftliche Indexierung und Folksonomien. / Indizacion colaborativa y folksonomias. / Collaborative indexing and folksonomies (Lang.: fr). - In: Documentaliste - Sciences de l'Information, 44(2007)1, p.58-63.

0636 835 Munk, T.B., Mørk, K. – Folksonomy, the power law & the significance of the least effort (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.16-33.

84 Primary Literature Classification and Indexing

0637 842 Blanchard, A. - Understanding and customizing stopword lists for enhanced patent mapping (Lang.: eng). – In: World Patent Information, 29(2007)4, p.308-316.

0638 842 Kang, In-Su, et al. - Cluster-based patent retrieval (Lang.: eng). - In: Information Processing & Management, 43(2007)5, p.1173-1182.

0639 842 Kim, Jae-Ho, Choi, Key-Sun - Patent document categorization based on semantic structural information (Lang.: eng). – In: Information Processing & Management, 43(2007)5, p.1200-1215.

0640 842 Li, Y., Shawe-Taylor, J. - Advanced learning algorithms for cross-language patent retrieval and classification (Lang.: eng). – In: Information Processing & Management, 43(2007)5, p.1183-1199.

0641 844;918 Cuvillier, J. - Indexing grey resources: considering usual behavior of library users and the use of Dublin Core metadata via a database of specialized vocabulary (Lang.: eng). - In: Publishing Research Quarterly, 23(2007)1, p.78-88.

0642 846;88-46 Hoover, L. L. - Agriculture and food related theses and dissertations available on the Web (Lang.: eng). - In: Journal of Agricultural & Food Information, 7(2006)2/3, p.87-108.

85 (Back of the) Book Classification and Indexing

0643 854 Kunze, H., Dahlberg, I. – (Book review of) Fugmann, R. Das Buchregister: Methodische Grundlagen und praktische Anwendungen [The book index: methodological foundations and practical applications] – Frankfurt am Main: DGI, 2006. – 136pp. – (Reihe Informationswissenschaft der DGI, Bd. 10) – ISBN 978-3-925474-59-0; 3-925474-59-5 (Lang.: eng). - In: Knowledge Organization, 34(2007)1, p.60-61.

0644 854 Matthews, D. - Indexing (Lang.: eng). - In: Author, 18(2007)2, p.61-62.

86 Secondary Literature Classification and Indexing

0645 864 Bornmann, L., Daniel, H-D. – Multiple publication on a single research study: does it pay? The influence of number of research articles on total citation counts in biomedicine (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)8, p.1100-1108.

0646 864 Kousha, K., Thelwall, M. - How is science cited on the Web? A classification of Google unique Web citations (Lang.: eng). - In: Journal of the American Society for Information Science & Technology, 58(2007)11, p.1631-1644.

0647 864 Rousseau, R. – On Egghe’s construction of Lorenz curves (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1551-1552.

0648 864 Sawyer, S., Huang, H. – Conceptualizing information, technology and people: comparing information science and information systems literature (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1436-1448.

0649 864 Schneider, J.W., Borlund, P. – Matrix comparison: motivation and important issues for measuring the resemblance between proximity measures or ordination results. Parts 1 & 2 (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1586-95; 1596-1610.

0650 864 Vanclay, J.K. – On the robustness of the h-index (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1547-1550.

0651 864 Zanotto, E. D. - The scientists pyramid (Lang.: eng). – In: Scientometrics, 69(2006)1, p.175-181.

0652 864;757 Schneider, J.W. - Concept symbols revisited: naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols (Lang.: eng). – In: Scientometrics, 68(2006)3, p.573-593.

87 Classification and Indexing of Non-Book Materials

0653 871 Chatterjee, K., Chen, S.-C. - A novel indexing and access mechanism using affinity hybrid tree for content-based image retrieval in multimedia databases (Lang.: eng). – In: International Journal of Semantic Computing, 1(2007)2, p.147-170.

0654 871 Dmitry, M., Bovbel, E. - Indexing and retrieval scheme for content-based multimedia applications (Lang.: eng). - In: Lecture Notes in Computer Science, 4629(2007), p.162-169.

0655 871 Hsieh-Yee, I. – Organizing audio-visual and electronic resources for access: a cataloguing guide. 2nd ed. – Westport, CT: Libraries Unlimited, 2006. – xix, 376p. (Lang.: eng). Book reviews by: Hillson, B. (0656) - (Lang.: eng). - Reference and User Services Quarterly, 46(2007)3, p.105-106; Seabrook, N. (0657) - (Lang.: eng). – Library Review, 56(2007)7, p.629-631.

0658 871;847 Conduit, N., Rafferty, P. - Constructing an image indexing template for the Children's Society: users' queries and archivists' practice (Lang.: eng). – In: Journal of Documentation, 63(2007)6, p.898-919.

0659 872 Fielding, E. - Unlocking the garage: a web portal for car enthusiasts (Lang.: eng). – In: Electronic Library, 25(2007)4, p.453-464.

0660 872 Rorissa, A. – Relationships between perceived features and similarity images: a test of Tversky’s contrast model (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1401-1419.

0661 872 Yang, S. et al. - Semantic categorization of digital home photo using photographic region templates (Lang.: eng). – In: Information Processing & Management, 43(2007)2, p.503-514.

0662 875 Balkhatir, M., Charhad, M. - A conceptual framework for automatic text-based indexing and retrieval in digital video collections (Lang.: eng). – In: Lecture Notes in Computer Science, 4653(2007), p.392-403.

0663 876 Mangan, E. - Cartographic materials: a century of cataloging at Library of Congress and beyond (Lang.: eng). – In: Journal of Map & Geography Libraries, 3(2007)2, p.23-44.

0664 878 Baca, M. et al. - Cataloging cultural objects: a guide to describing cultural works and their images (Lang.: eng). - Chicago, IL: American Library Association, 2006. - xiii, 396 p. - ISBN: 0838935648; 9780838935644. Book reviews by: Chapman, J.W. (0665) – (Lang.: eng). - In: Technicalities, 27(2007)2, p.15-16; Frosch, P. (0666) - (Lang.: eng). – In: Library Journal, 132(2007)3, p.152.

0667 878 Leman, S. - Let op uw woorden: thesauri in de dagelijkse museumpraktijk [Take care of your words: thesauri in day-to-day practice in museums] (Lang.: du). – In: Bibliotheek- en Archiefgids, 83(2007)1, p.36-38.

0668 878 Uralman, N. H. - 21. yuzyila girerken bir bilgi kurumu olarak muze [The museum as an information institution in the 21st century] (Lang.: tur). – In: Bilgi Dunyasi / Information World, 7(2006)2, p.250-266.

88 Classification and Indexing in Subject Fields

0669 88-4/5 Acosta, S. - Classification of life (Lang.: eng). – In: Library Media Connection, 25(2007)4, p.95.

9 Knowledge Organization Environment

91 Professional and Organisational Problems in General and in Institutions

0670 918 Calhoun, K. - Being a librarian: metadata and metadata specialists in the twenty-first century (Lang.: eng). – In: Library Hi-Tech, 25(2007)2, p.174-187.

0671 918
Chapman, A. - Resource discovery: catalogs, cataloging and the user (Lang.: eng). – In: Library Trends, 55(2007)4, p.917-931.

0672 918
D’Ambrosio, D.M. – Conceptualizing metadata via repertory grids: exploring a method for the development of domain-specific systems for knowledge organization (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.41-57.

0673 918
Hert, C.A., et al. - Investigating and modelling metadata use to support information architecture development in the statistical knowledge network (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)9, p.1267-1285.

0674 918
Metadata and its impact on libraries; ed. by Intner, S.S., Lazinger, S.S., Weihs, J. Westport, CT & London: Libraries Unlimited, 2006. - 272pp. (Lang.: eng). Book reviews by:
Arnott, G. (0675) – (Lang.: eng). – In: Journal of Librarianship and Information Science, 39(2007)1, p.59-60;
Petrou, A.D. (0676) – (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)6, p.909-910.

0677 918;88-51/4
Michon, J. – Biomedicine and the Semantic Web: a knowledge model for visual phenotype (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.149-160.

0678 918;88-54
Ferraioli, L. - An exploratory study of metadata creation in a health care agency (Lang.: eng). – In: Cataloging & Classification Quarterly, 40(2005)3/4, p.75-102.

0679 918;88-54
Hatfield, A.J., Kelley, S.D. - Case study: lessons learned through digitizing the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research Collection (Lang.: eng). – In: Journal of the Medical Library Association, 95(2007)3, p.267-270.

92 Persons and Institutions in Knowledge Organization

0680 922
Kofnovec, L. - Ing. Dušan Simandl - život a dílo [Dušan Simandl - life and work] (Lang.: cz). – In: Čtenář, 59(2007)2, p.56-58.

94 Bibliographic Control. Bibliographic Records

0681 942
Danskin, A. – “Tomorrow never knows”: the end of cataloguing (Lang.: eng). – In: IFLA Journal, 33(2007)3, p.205-209.

0682 942
Rafferty, P., Hidderley, R. - Flickr and Democratic Indexing: dialogic approaches to indexing (Lang.: eng). – In: Aslib Proceedings: New Information Perspectives, 59(2007), p.397-410.

0683 942;918
Yee, M. M. - Cataloging compared to descriptive bibliography, abstracting and indexing services, and metadata (Lang.: eng). – In: Cataloging and Classification Quarterly, 44(2007)3/4, p.307-328.

0684 944; 357
Isaac, A. - SKOS: simple knowledge organisation system (Lang.: du). – In: Informatie Professional, 10(2006)11, p.40-43.

0685 944
Semantic Web Deployment Working Group requests comments (Lang.: eng). – In: Library Hi Tech News, 24(2007)6, p.48-49.

0686 945
Mao, C-C. A., Hsu, C-f. F. - Chinese MARC (Taiwan) and its bibliographic database (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.58-60.

0687 945
Panici, A. - De la MARC la UNIMARC: etapele dezvoltării, semnificaţii, importanţă, utilitate [From MARC to UNIMARC: stages of development, significance, importance and usefulness] (Lang.: rom). – In: Magazin Bibliologic, (2006)2-3, p.10-13.

0688 949
Chihaia, L., Chipcea, E., Covaci, M. - Fişierul de autoritate pentru nume de persoane: ghid practic de utilizare în Aleph 500 [Authority files for personal names: a practical guide to their use in Aleph 500] (Lang.: rom). - Iaşi: Biblioteca Centrală Universitară „Mihai Eminescu”, 2005.

0689 949
Chihaia, L., Popa, M. - Fişier de autoritate pentru nume de colectivitate în Aleph 500.16.02 [Authority file for corporate body names in Aleph 500.16.02] (Lang.: rom). - Iaşi: Biblioteca Centrală Universitară „Mihai Eminescu”, 2006. - 65 p.

0690 949
Galvez, C., Moya-Anegón, F. - Approximate personal name-matching through finite-state graphs (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1960-1976.

95 Education and Training in Knowledge Organization

0691 952
Bawden, D. - Information seeking and information retrieval: the core of the information curriculum? (Lang.: eng). – In: Journal of Education for Library & Information Science, 48(2007)2, p.125-138.

0692 952
Brunt, R. – Information storage and retrieval in the professional curriculum (Lang.: eng). – In: Library Review, 56(2007)7, p.552-567.

0693 952
Lørring, L. - Didactical models behind the construction of an LIS curriculum (Lang.: eng). – In: Journal of Education for Library & Information Science, 48(2007)2, p.82-93.

0694 952
Poulter, A. – On reading “Information storage and retrieval in the professional curriculum” by Rodney Brunt (Lang.: eng). – In: Library Review, 56(2007)7, p.557-560.

0695 953
Spotti Lopes Fujita, M. - La enseñanza de la lectura documentaria en el abordaje cognitivo y socio-cognitivo: orientaciones a la formación del indizador [Teaching documentary reading from a cognitive and sociocognitive approach: orientation for the training of the indexer] (Lang.: sp). – In: Anales de Documentación, 10(2007), p.397-412.

98 User Studies

0696 981
Chun-Yao Huang et al. - Characterizing Web users' online information behavior (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1988-1997.

0697 982
Cole, C. et al. - A classification of mental models of undergraduates seeking information for a course essay in history and psychology: preliminary investigations into aligning their mental models with online thesauri (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.2092-2104.

0698 982
Given, L.M. et al. – Inclusive interface design for seniors: image-browsing for a health information context (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1610-1618.

0699 982
Moore, J.L., Erdelez, S., He, W. – The search experience variable in information behaviour research (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1529-1547.

0700 982
Tenopir, C. et al. - Academic users’ interactions with ScienceDirect in search tasks: affective and cognitive behaviors (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.105-121.

Personal Author Index 34(2007)

Abbas, J. 0634
Acosta, S. 0669
Agosti, M. 0548
Ahn, H. 0583
Amaral, C. 0622
Andrews, J.E. 0482
Angrosh, M. L. 0505
Arazy, O. 0535
Arencibia-Jorge, R. 0572
Arnott, G. 0675
Baca, M. 0664
Balíková, M. 0623
Balkhatir, M. 0662
Barátné Hajdu, Á. 0499, 0500
Basile, P. 0536
Bawden, D. 0691
Beck, H.W. 0532
Bederson, B. 0591
Bergenholtz, H. 0612
Bianchini, D. 0514
Bilodeau, B. 0567
Blanchard, A. 0637
Boldiš, P. 0604
Bonfiglio-Dosio, G. 0548
Borlund, P. 0649
Bornmann, L. 0645
Bothma, T. 0603
Bovbel, E. 0654
Broughton, V. 0480, 0481, 0491, 0582
Brunt, R. 0692
Buizza, P. 0522
Burton, W.C. 0579
Calhoun, K. 0670
Caranfil, L. 0553
Cathey, R.J. 0529
Chang, C-C. 0540
Chapman, A. 0671
Chapman, J.W. 0665
Charhad, M. 0662
Chatterjee, K. 0653
Chen, G. 0605
Chen, S.-C. 0653
Chen, Y.-H. 0615
Chen, Y-L. 0609
Chen Yintao 0632
Chen, Z. 0507, 0617
Cheng, L-C. 0609
Cheng, Y.-L. 0609
Cheung, K. 0526
Chiang, Huei-Min 0503
Chihaia, L. 0688, 0689
Chipcea, E. 0688
Choi, Key-Sun 0639
Chou, C.-H. 0615
Chun-Yao Huang 0696
Clavel, P. 0629
Cole, C. 0482, 0697
Conduit, N. 0658
Conway, C.N. 0495
Cordeiro, I. M. 0554
Costa, V. S. 0537
Covaci, M. 0688
Cuvillier, J. 0641
Czerniejewski, B. 0624
Dabney, D. 0578
Dahlberg, I. 0643
Dalbin, S. 0510, 0511
D’Ambrosio, D.M. 0672
Daniel, H-D. 0645
Danskin, A. 0681
De Campos, L. M. 0542
Dehuri, S. 0588
Dina, Y. 0580
Dmitry, M. 0654
Doucet, A. 0549
Druin, A. 0591
Dunlavy, D.M. 0530
Du Preez, M. 0497
Efron, M. 0552
Erdelez, S. 0699
Faber, P. 0614
Ferraioli, L. 0678
Ferro, N. 0548
Fielding, E. 0659
Fleharty, C. 0557
Fong, A.C.M. 0607
Fourie, I. 0603
Francis, E. 0635
Frâncu, V. 0555
Frosch, P. 0666
Fu, Y. 0590
Fu Lee Wang 0616
Fugmann, R. 0643
Gagnon, G. 0569
Galvez, C. 0690
Gery, M. 0550
Giunchiglia, F. 0544
Given, L.M. 0698
Goh, D.H. 0587
Gómez González-Jover, A. 0611
Gopal, T.V. 0502
Goyal, O.P. 0598
Han, C.-C. 0615
Han, I. 0583
Han, L. 0605
Harai, N. 0630
Harcourt, K. 0600
Harper, C.A. 0533
Haslinger, I. 0631
Hatfield, A.J. 0679
Haus, G. 0620
He, W. 0699
Hert, C.A. 0673
Heuwing, B. 0625
Hickey, L. 0579
Hidderley, R. 0682
Hillson, B. 0656
Hjørland, B. 0512
Hoover, L. L. 0642
Horowitz, E. 0546
Hsieh-Lee, I. 0655
Hsu, C-f. F. 0686
Hu, J. 0589
Hu, Y. C. 0595
Hu Yuefang 0632
Huang, Chun-Yao 0696
Huang, H. 0648
Hui, S.C. 0607
Hunt, K. 0518
Hunter, J. 0526
Hutchinson, H.B. 0591
Intner, S.S. 0496, 0674
Isaac, A. 0684
Jamsen, J. 0584
Janecek, P. 0597
Jimenez, A. G. 0515
Johnston, B. H. 0569
Kang, In-Su 0638
Kapoor, K. 0598
Karamuftuoglu, M. 0501
Karypis, G. 0606
Kasten, J. 0506
Ke, W. 0590
Kelley, S.D. 0679
Kelsey, P.J. 0633
Khairy, I. 0558
Kharkevich, U. 0544
Khoo, C.S.G. 0587
Kim, Jae-Ho 0639
Kim, K. 0583
Kim, S. 0532
Kishida, K. 0513
Kofnovec, L. 0680
Kousha, K. 0646
Kovac, T. 0556
Kuntz, B. 0565
Kunze, H. 0643
Kuramochi, M. 0606
La Barre, K. 0521
Lacoste, C. 0573
Laurent, D. 0622
Lazar, J. 0596
Lazarinis, F. 0585
Lazinger, S.S. 0674
Lehtonen, M. 0549
Leman, S. 0667
Levinson, D. 0576
L'Homme, M.C. 0613
Li, T. 0545
Li, Y. 0640
Li, Y-C. 0540
Liao Jianxin 0596
Lin, Wen-Yau C. 0519
Lioma, C. 0543
Lisek, S. 0626
Loh, H. T. 0541
Lopes, R. 0537
López-Huertas, M. J. 0577
Lørring, L. 0693
Lu, K. 0507
Madalli, P. 0516
Mall, R. 0588
Mandl, T. 0602, 0625
Manevitz, L. 0504
Mangan, E. 0663
Mao, C-C. A. 0686
Martí-Laher, Y. 0572
Matthews, D. 0644
Mazzocchi, F. 0621
Ménard, E. 0627
Michon, J. 0677
Miksa, F. 0479
Miksa, S.D. 0520
Miller, D. P. 0494
Mitchell, J. S. 0562, 0619
Modeste, J. 0580
Montgomery, P. 0559
Moore, J.L. 0699
Mørk, K. 0636
Mortchev-Bouveret, M. 0528
Mostafa, J. 0590
Moya-Anegón, F. 0690
Muh-Chyun Tang 0574
Munk, T.B. 0636
Naumis-Peña, C. 0581
Nero, L. M. 0619
Neumann, A. 0592
Nielsen, S. 0612
Niemi, T. 0584
Nissen, M.E. 0497
Ogihara, M. 0545
Ojala, M. 0524
O'Leary, M. 0601
Orphan, S. 0566
Ou, S. 0587
Ounis, I. 0543
Panici, A. 0508, 0687
Peng, D. 0538
Pestov, V. 0575
Petgnet, D. 0560
Petrou, A.D. 0676
Pinto, A. 0620
Ponnusamy, R. 0502
Popa, M. 0689
Poulter, A. 0694
Quesnel, O. 0635
Quinn, S. 0492
Raban, D.R. 0610
Rafferty, P. 0658, 0682
Ramírez, I de T. 0577
Rédy, G. 0592
Richman, D. 0561
Rorissa, A. 0660
Rousseau, R. 0647
Rowley, J. 0527
Ru, Y. 0546
Ryoo, J. 0593
Sagonas, K. 0537
Saiedian, H. 0593
Salo, J. 0498
Samantray, S. D. 0539
Satija, M. P. 0531
Sato, H. 0568

Sawyer, S. 0648
Schiff, A. 0564
Schneider, J.W. 0649, 0652
Seabrook, N. 0657
Shawe-Taylor, J. 0640
Shen, J-J. 0540
Shi, Xiaodong 0608
Shimada, M. 0525
Slavic, A. 0582
Smiraglia, R. 0490
Smith, S. 0557
Song, D. 0547
Soonja Lee Koh, G. 0523
Spink, A. 0482
Spotti Lopes Fujita, M. 0695
Stirling, D. A. 0571
Stojmirović, A. 0575
Strossa, P. 0586, 0628
Strotgen, R. 0625
Sukula, S. K. 0534
Sutó, Z. 0592
Tang, Muh-Chyun 0574
Taylor, A.G. 0494
Tennis, J.T. 0551
Tenopir, C. 0700
Thelwall, M. 0646
Tho, Q.T. 0607
Tiberi, M. 0621
Tillett, B. 0533
Trickey, K.V. 0493
Uddin, M.N. 0597
Ungváry, R. 0517
Uralman, N. H. 0668
Urs, S. R. 0505
Van der Linden, H. M. M. 0509
Van Otegem, M. 0631
Vanclay, J.K. 0650
Vasudev, P. 0539
Vega-Almeida, R.L. 0572
Vizine-Goetz, D. 0562, 0619
Wacker, M. 0600
Wang, Fu Lee 0616
Wang, Tai-Yue 0503
Weihs, J. 0562, 0674
Wells, D. 0599
Whited, M. 0618
Williamson, N.J. 0594
Wolley, I. 0600
Woo, C. 0535
Xiaodong Shi 0608
Xie, L. 0605
Yang, C. 0616
Yang, C.C. 0608
Yang, S. 0661
Yee, M. M. 0683
Yintao, Chen 0632
Yousef, M. 0504
Yuefang, Hu 0632
Zaihrayeu, I. 0544
Zanotto, E. D. 0651
Zhan, J. 0541
Zhang, R. 0595
Zhu, S. 0545
Zhu Xiaomin 0596
