
Francisco-Javier García-Marco

Teaching Thesaurus Construction: A Top-Down Approach for LIS Undergraduate Programmes

Abstract: Thesauri constitute a great platform for hands-on learning and teaching of KOS design because they synthesize the alphabetic, terminological approach and the systematic one, flexibly allowing for the different options of hierarchical organization: terminological fields, categories, facets, and disciplines. In this paper, an alternative approach to thesaurus construction training is discussed, with an emphasis on the difficulties that undergraduate students experience in Library and Information Science and Information Studies programmes. In these programmes the teaching of thesaurus construction is usually closely connected with indexing training and vocabulary control, and complete thesaurus design is left for advanced professional or academic courses. In both cases the usual approach is bottom-up: first controlling the vocabulary and later organizing the systematic schedules. The proposal in this paper is a top-down approach, focused on the selection and organization of concepts within their disciplines, categories or facets, that is, on building the general architecture of the thesaurus. Students build a microthesaurus in a field of their choosing, so that they acquire a sense of self-competence that will allow them to face confidently their professional challenges in the growing information niches where small and medium-sized domain-oriented KOS are needed.

1. Purpose

Thesauri were incorporated into LIS education programmes very early, as cutting-edge tools for enhancing information retrieval and knowledge organization (KO). Later, the advances in latent semantic analysis (probabilistic and vector models, relevance ranking…) robbed thesauri of their leading role. However, as classic and widely used KO tools, they have remained part of mainstream LIS teaching programmes as specific subjects, or at least as modules within them. Recently, due to the ever-increasing growth of new specialized information and documentation fields and the proliferation of new systems on the Internet that the semantic web promotes, graduate students are increasingly required to be able to develop new small or medium-sized knowledge organization systems (KOS) suited to emergent or very specific communities of users, and to contribute to the maintenance of existing ones by adding to them or working on their interoperability. So the need to instruct graduate students in KOS design is again becoming a priority. In addition, thesauri constitute a great platform for teaching KOS design because they synthesize the alphabetic, terminological approach and the systematic one, flexibly allowing for the different options of hierarchical organization: terminological fields, categories, facets, and disciplines. In this paper, alternative approaches to thesaurus construction training are discussed, with an emphasis on the difficulties experienced by undergraduate students in Library and Information Science and Information Studies programmes.

2. Methodology: two approaches towards thesaurus construction

Knowledge organization teaching in graduate programmes usually focuses on understanding the concepts behind knowledge organization systems (KOS) and on using these tools properly when cataloguing and indexing. This is a very important aim, as LIS graduates are supposed to understand catalogues, bibliographies and other reference tools, so that they are qualified to add new items to them and to exploit them for searching and reference on behalf of their users. As a consequence, the teaching of thesaurus construction is normally closely connected with indexing training. Sometimes this occurs because both themes are included in the same curricular course, and thesaurus construction is taught at the same time as indexing with a controlled vocabulary, usually a thesaurus. On other occasions, thesauri are taught in different, sequential subjects, but the teacher chooses a bottom-up approach to thesaurus development, more connected with indexing. Thesauri are mainly explained and built as vocabulary control tools, and students learn to interconnect terms and concepts to build up the controlled list into a network of concepts. But, as presented in the previous section, there is a growing need to develop KOS for specific work teams, small information units and web sites which, despite their smaller size, also need an ambitious tool capable of organizing their knowledge domain and providing a systematic overview of it. In this sense, students need hands-on experience of building new thesauri from scratch, so that they become confident in designing tools with which they usually work as users, not as developers. However, one of the biggest difficulties that graduate students encounter when building a thesaurus is precisely learning to select and organize its concepts in disciplines, categories or facets, that is, to build its general architecture.
Frequently, they become blocked and need a lot of help from teachers in this part and, after finishing their project, they do not usually feel competent enough in organizing the concepts of a specific domain. On the contrary, they come to the conclusion that it is a difficult task that they must leave in the hands of specialists. This can be useful to advocate for the existence of thesaurus specialists, but it can condemn such specialists to remain a minority at a time when their numbers could be growing strongly, and so an opportunity is missed. Consequently, this paper presents and discusses an alternative approach to thesaurus design that emphasizes a top-down approach, so that students can finish the course with a strong feeling of self-competence, but without disregarding the need to pay attention to detail, carefully referring their work to the current (and excellent) standards.


3. Results

The results of the course development project can be grouped into two main sections: the lessons learned about the evolution, advantages and disadvantages of the bottom-up and top-down approaches to thesaurus design, construction and assessment; and the characteristics of the actual course that was designed.

Thesaurus instruction: reasons, pros and cons of the bottom-up and top-down approaches

Excellent courses have been created on thesaurus construction and development. Some are specifically devoted to them, such as Aitchison & Gilchrist (1972) with its many successful editions and follow-up collaborative works (among them Currás (1991), with many editions, sequels and translations and a great influence in Latin America) or Lancaster (1985). Others are included in handbooks of a larger scope, such as Lancaster (1972) and its successive editions and translations, or Slype (1987). Many of them were oriented towards professionals, who have a very different profile from graduate students regarding their motivations, their theoretical and practical background, and the extent and quality of their encyclopaedic knowledge; but they have been successfully used as textbooks in undergraduate courses. An excellent review of the literature on thesaurus teaching is available in Nielsen (2004), and very interesting references and tips are provided in Thomas (2004). In general, these courses follow an inductive, bottom-up approach. For example, the handbook created by F. W. Lancaster for the instructors of the General Information Programme of UNESCO in 1985, based on a course in Buenos Aires in 1978, is organized into 14 chapters: (1) purpose of vocabulary control; (2) major components of a controlled vocabulary; (3) gathering terms; (4) organizing terms; (5) the hierarchical relationship; (6) the associative relationship; (7) characteristics of descriptors; (8) the entry vocabulary; (9) scope notes and identifiers; (10) thesaurus format and display; (11) growth and updating; (12) computer use; (13) vocabulary factors affecting the performance of information systems; and (14) natural language systems. So, after an overall presentation of thesauri in the frame of vocabulary control, organizing the terms comes in fourth place, after gathering terms (3rd chapter).
So, this course follows an inductive, bottom-up approach. This is also the case with the main standards on thesaurus construction and some more recent publications on teaching thesaurus construction, such as the very practical and useful one by Shearer (2004) or the earlier one by Cabero & Castro (1997). This approach is coherent with the origin and evolution of the use of thesauri for information retrieval. Thesauri for information retrieval evolved from post-coordinated indexing, as a way of controlling concepts and terms, and grouping them to allow for easier term selection by indexers and for enhanced search expansion and refinement. Only after some years was the systematic presentation introduced, first hierarchical (mainly in disciplines or more specific categories, depending on the scope of the thesaurus), and thereafter faceted (Aitchison & Clarke, 2004; García-Marco, 2016). This structure also has the advantage of going from the more elementary units (concepts, terms and relations) towards the more complex subjects, like organizing the thesaurus, editing it for publication and maintaining it. But it also has some serious disadvantages when we consider the pedagogical context of undergraduate programmes. The topic is not usually assigned many hours and, as the inductive procedure takes a lot of time, undergraduate students finish with a feeling of incompetence in dealing with the more abstract, difficult task of organizing a knowledge domain. As a consequence, one of the course’s main objectives, promoting self-competence, is missed. Therefore, an innovative education project was set up to investigate whether a top-down approach to thesaurus learning and teaching could promote student self-competence more efficiently, inspired by the concept of Gestalt or cognitive closure.
As a result, a completely new schedule and set of activities was programmed for the subject “Thesaurus construction and assessment”, which is currently being taught in the Information and Documentation Graduate Programme of the University of Zaragoza (Spain) and has a student load of 6 ECTS [1]. It is an optional subject that students can choose in their third or fourth year [2], so they are advanced graduate students, with previous knowledge of indexing, cataloguing and classification.

Course design

The course has been designed according to the philosophy and requirements of the European Higher Education Area: accounting for the total student workload in 10-hour credits, establishing a set of generic and specific competences as the final educational goals, determining the corresponding learning results as operational and measurable variables, and setting a series of activities as the means to achieve the learning results. According to the configuration of the subject in the graduate study curriculum, students must devote a total of 150 hours, though different counts have shown that the average student does not reach this level of commitment, only the best ones. The competences proposed for the course are a subset of those defined for the whole graduate programme [3]. Four competences are generic: developing autonomous learning, an orientation towards continuous improvement and innovation, better personal organization and planning skills, and promoting an ethical engagement with users and their work environment. Two competences are specific: analysis and representation of information, and organization and storage of information. As can be seen, when the programme was designed thesaurus construction was selected as one of the subjects where creativity, innovation and ethical engagement can best be taught. This specific challenge was fully taken on when designing the learning-teaching process.

The competences unfold into twelve learning results. Accordingly, students should be able to: 1) identify, analyse and describe the objectives of a thesaurus, its components, structure and procedures for its creation, maintenance, dissemination and use; 2) organize a knowledge domain to facilitate the retrieval of documents pertaining to it; 3) detect and argue about the implications of the selection of indexing and retrieval terms; 4) take into account information dissemination needs when designing a thesaurus; 5) plan and manage the construction of a thesaurus as a project; 6) assess thesauri; 7) understand and use the ISO 25964 standard; 8) build specialized thesauri using appropriate software; 9) organize their work schedule; 10) analyse the ethical implications of their decisions; 11) plan and execute their work autonomously; and 12) develop and improve their thesaurus by taking innovative decisions.

Infrastructure

ISO 25964-1 is used as the main framework and reference. Diego Ferreyra’s TemaTres (TemaTres, 2006-; Ferreyra, 2016) was selected as the supporting application: an open-source PHP application that can be used in networked environments and supports many export formats: SKOS-Core, Zthes, TopicMap, Dublin Core, MADS, BS8723-5, RSS, SiteMap, txt, SQL. The software was installed on OS X Server 10.11, with MySQL 5.6.21 and PHP 7.0.6. Each student has a complete TemaTres installation and can invite other students to collaborate. The students’ projects are available at ibersid.unizar.es.
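To make the SKOS-Core export format more concrete, the following Python sketch writes a couple of concepts as SKOS in Turtle syntax. It is a hand-written illustration under stated assumptions, not TemaTres output: the base URI, the dictionary layout and the function names are hypothetical, and only standard SKOS properties (skos:prefLabel, skos:altLabel, skos:broader, skos:related, skos:scopeNote) are used.

```python
from urllib.parse import quote

PREFIXES = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"


def uri(base, term):
    """Build a concept URI from a hypothetical base and the preferred term."""
    return f"<{base}{quote(term.replace(' ', '_'))}>"


def literal(text, lang):
    """Quote a label as a language-tagged literal (escapes backslashes and quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(base, lang, concepts):
    """Serialize {preferred term: {"alt", "broader", "related", "note"}} as Turtle."""
    out = [PREFIXES]
    for term, data in concepts.items():
        props = ["a skos:Concept", f"skos:prefLabel {literal(term, lang)}"]
        props += [f"skos:altLabel {literal(a, lang)}" for a in data.get("alt", [])]
        if data.get("broader"):
            props.append(f"skos:broader {uri(base, data['broader'])}")
        props += [f"skos:related {uri(base, r)}" for r in data.get("related", [])]
        if data.get("note"):
            props.append(f"skos:scopeNote {literal(data['note'], lang)}")
        out.append(f"{uri(base, term)}\n    " + " ;\n    ".join(props) + " .\n")
    return "\n".join(out)


# Hypothetical sample data: two concepts from an invented beverage domain.
sample = {
    "Beverages": {},
    "Coffee": {"alt": ["Java"], "broader": "Beverages"},
}
ttl = to_turtle("http://example.org/thesaurus/", "en", sample)
print(ttl)
```

Reading such a file side by side with the thesaurus entry shows how non-preferred terms become alternative labels and how the broader-term relation becomes a link between concept URIs.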

Learning activities

Students must work with selected references to build a categorized chronology of the evolution of thesauri. In this way, they gain a general overview of the place of thesauri within the ecology of KOS. The suggested references are partially changed from year to year to avoid copying and pasting from previous students’ assignments. After completing this assignment, they are provided with additional references in case they want to widen their view of the state of the art (García-Marco, 2016). Later, they are given a presentation on the preliminary decisions that must be taken before beginning a thesaurus project, and the main available alternatives. Meanwhile, they must find several potential topics that could become the subject of their project, and choose one of them. For this, they use the common vocational perspective of finding intersections between their personal interests and capabilities and the needs of people to whom they are related (“clients”). Thereafter, they must assess potential problems and difficulties, so that they can choose a project that fits the duration of the course and their background knowledge. As thesaurus construction can be a very technical and time-consuming activity, we try at least to ensure that the subject they choose supports their motivation instead of becoming an obstacle, a common problem when they are provided with a list of compulsory subjects. Thirdly, the students are offered a detailed exposition of concepts and terms, while they continue working on the preliminary decisions for their thesaurus, which will eventually become its introduction. They also have a couple of sessions to present the projects to their peers and discuss them. To increase motivation and approximate real environments, students are asked to consider themselves a KOS firm, give it a brand name, and prepare a business meeting where they will present their projects as professionals.
They usually enjoy this part very much. While the conceptual relations are being explained to them, they begin their work with the sources, which must be chosen not only because they offer potential concepts, but mainly because they also provide alternative organizational perspectives on the domain. While this is a departure from what is advised in ISO 25964-1, it has been found to be pedagogically very effective, because students can work with the more general layers of their domain, relating them to the more abstract knowledge organization tools, e.g., facets, categories and disciplinary trees. Generally, a simple Excel sheet is enough to swiftly sketch the general structure of the thesaurus and enter descriptors in other languages, non-preferred terms, related terms and notes. Thereafter, the presentation and layout of thesauri is introduced, with some relevant, selected examples. In the meantime, they finish the sketch of their hierarchy and begin their work with the thesaurus application, TemaTres, entering concepts, terms, relations and notes, and working with the different presentations that are provided. Batch import using an amended Excel file is encouraged and assisted. Finally, the students are offered an introduction to interoperability covering its importance, its context (with an emphasis on the semantic web), tools and problems, so that they learn the next step they are expected to take to improve their thesaurus construction and maintenance capabilities. They practise exporting their thesaurus in at least one of the semantic web formats, and the student group analyses the files. The course finishes with four deliverables: an oral presentation; a traditional thesaurus report with its introduction and the two basic presentations, alphabetical and systematic; its online version supported by TemaTres; and a detailed task report with the timespans devoted to the project, the problems encountered and the solutions provided.
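The top-down sketching step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical sheet with one row per concept (preferred term, broader term, non-preferred terms, related terms, scope note); the `Concept` class, the function names and the sample beverage domain are illustrative assumptions, not part of the course materials or of TemaTres. Narrower terms are derived from the broader-term column, so the hierarchy is entered only once, and the two basic presentations, systematic and alphabetical, are generated from the same rows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One sheet row: a preferred term plus its relations."""
    term: str
    broader: Optional[str] = None                      # BT; None marks a top term
    use_for: List[str] = field(default_factory=list)   # UF: non-preferred terms
    related: List[str] = field(default_factory=list)   # RT
    scope_note: str = ""                               # SN


def build(rows):
    """Index rows by preferred term and check that every broader term exists."""
    concepts = {c.term: c for c in rows}
    for c in concepts.values():
        if c.broader is not None and c.broader not in concepts:
            raise ValueError(f"{c.term!r}: unknown broader term {c.broader!r}")
    return concepts


def narrower(concepts, term):
    """Derive NT from BT; with term=None this returns the top terms."""
    return sorted(c.term for c in concepts.values() if c.broader == term)


def systematic(concepts, term=None, depth=0):
    """Top-down display: top terms first, narrower terms indented below."""
    lines = []
    for t in narrower(concepts, term):
        lines.append("  " * depth + t)
        lines.extend(systematic(concepts, t, depth + 1))
    return lines


def alphabetical(concepts):
    """Alphabetical display: preferred terms with relations, plus USE entries."""
    entries = {}
    for c in concepts.values():
        block = [c.term]
        if c.scope_note:
            block.append(f"  SN {c.scope_note}")
        block += [f"  UF {u}" for u in sorted(c.use_for)]
        if c.broader:
            block.append(f"  BT {c.broader}")
        block += [f"  NT {n}" for n in narrower(concepts, c.term)]
        block += [f"  RT {r}" for r in sorted(c.related)]
        entries[c.term] = block
        for u in c.use_for:
            entries[u] = [u, f"  USE {c.term}"]
    return [line for key in sorted(entries, key=str.lower) for line in entries[key]]


# Hypothetical sample rows, entered from the most general concept downwards.
rows = [
    Concept("Beverages", scope_note="Drinks prepared for human consumption"),
    Concept("Coffee", broader="Beverages", use_for=["Java"], related=["Tea"]),
    Concept("Espresso", broader="Coffee"),
    Concept("Tea", broader="Beverages", related=["Coffee"]),
]
concepts = build(rows)
print("\n".join(systematic(concepts)))
print("\n".join(alphabetical(concepts)))
```

Because both displays are derived from the same rows, moving a concept to another branch of the hierarchy is a one-cell change in the sheet.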


4. Future developments

The current results are very satisfactory, and both the student satisfaction reports and the student projects show that the key objective of promoting student self-competence in thesaurus design is achieved. However, there is much room for improvement. At present, the practice of interoperability, which would be relatively easy using TemaTres, has not been addressed in depth, mainly because of a lack of time. At least two weeks should be devoted to explaining the basics of KOS interoperability according to ISO 25964-2, and to doing some mapping between the thesauri designed by the students. At this stage, introducing Protégé seems a very good option that should be taken into account, as Zeng (2005) has effectively shown for postgraduate courses. Protégé is powerful, well proven, interoperable and professional, and provides the greater context of ontology development. Introducing it in less advanced courses would be very formative. Thesaurus teachers must evaluate their students and context to see if this software can be adopted in undergraduate courses without sacrificing more basic aims. Also, to allow for better teamwork and integration with the rest of the learning activities of the LIS programme, it would be very useful to provide import and export in the most common formats for bibliographic authorities. In this way, students would be able to better connect their cataloguing practices with their thesaurus design classes. Besides these specific improvements, the integration of alternative teaching strategies should be explored. More directive programmes, such as the one proposed by Irving (1995), are complementary to a project-based educational approach. They can be very useful to ensure that students do not miss any important point, and that they have completed their conceptual learning before addressing the next step.
Finally, it is well known that learners display different styles of learning and thinking when dealing with the subjects they must master (Sternberg, 1997). These learning styles seem to be closely connected with the personality traits of the students. In particular, some persons prefer an analytical, step-by-step approach, while others need to obtain a gestalt of the field (Prägnanz) before dealing with the details. This could be connected with a preference for a bottom-up or top-down approach to learning thesaurus development. So, further research must be undertaken in this respect to inquire whether such an important personality trait has a real impact on thesaurus construction teaching and learning. Of course, such studies should be done in a controlled manner, by obtaining objective and subjective measures of the learning styles of the students.


Acknowledgements: This paper has been developed in the frame of the project “Implantación de un servidor de tesauros para el apoyo al desarrollo de metodologías activas y colaborativas en el Grado de Información y Documentación”, supported by a grant of the University of Zaragoza (PIIDUZ_15_031).

Notes

[1] ECTS stands for European Credit Transfer and Accumulation System. In the Spanish case, each credit typically corresponds to ten hours of class attendance, both practical and theoretical, and a total of 25 hours of student load per credit, including the ten one-hour face-to-face classes.
[2] Currently, general graduate programmes in Spain are four-year ones, with the exception of Medicine and Surgery. Their total learning load is 240 ECTS. Each yearly course is typically 60 ECTS.
[3] Most library and information graduate programmes in Spain follow a white paper built by consensus among the existing ones in 2003-4 under the guidance of the Agencia Nacional de Evaluación de la Calidad y Acreditación (2005). Its professional competence analysis follows the results of the DECIDoc project, developed under the Leonardo da Vinci programme of the European Union (Euroguide…, 2000).

References

Agencia Nacional de Evaluación de la Calidad y Acreditación (2004). Título de Grado en Información y Documentación: Libro Blanco. Madrid: Agencia Nacional de Evaluación de la Calidad y Acreditación. [http://www.aneca.es/media/150424/libroblanco_jun05_documentacion.]
Aitchison, Jean & Clarke, Stella Dextre (2004). The thesaurus: a historical viewpoint, with a look to the future. In The Thesaurus: Review, Renaissance, and Revision, edited by S. K. Roe and A. R. Thomas. New York: Haworth Press. Pp. 5-21.
Aitchison, Jean & Gilchrist, Alan (1972). Thesaurus construction: a practical manual. London: Aslib.
Cabero, Manuela Moro & Castro, Carmen Caro (1997). Propuesta metodológica para la enseñanza de la utilización y elaboración de tesauros [Methodological proposal for teaching thesaurus use and construction]. In Organización del Conocimiento en Sistemas de Información y Documentación. Pp. 159-67.
Currás, Emilia, Aitchison, Jean & Gilchrist, Alan (1991). Thesaurus: lenguajes terminológicos. Madrid: Paraninfo.
Euroguide LIS: the guide to competencies for European professionals in library and information services. Association for Information Management. [http://www.aslib.co.uk/pubs/2001/18/01/foreword.htm]
Ferreyra, Diego (2016). TemaTres. [http://www.vocabularyserver.com/blog/contact]
García-Marco, Francisco Javier (2015). 25731 - Construcción y evaluación de tesauros: Guía docente para el curso 2015–2016. Zaragoza: Universidad. [http://titulaciones.unizar.es/asignaturas/25731]
García-Marco, Francisco Javier (2016). The evolution of thesauri and the history of knowledge organization: between the sword of mapping knowledge and the wall of keeping it simple. Brazilian Journal of Information Studies: Research Trends, 10(1). [http://www.bjis.unesp.br/revistas/index.php/bjis/article/view/5786]
Irving, H. (1995). CAIT: Computer-assisted indexing tutor, implemented for training at NAL. Agric. Libr. & Inform. Notes, 21(4-6): 1-5.
Lancaster, F. W. (1985). Thesaurus Construction and Use: A Condensed Course. Paris: United Nations Educational, Scientific and Cultural Organization, General Information Programme. [http://unesdoc.unesco.org/images/0007/000703/070359EB.pdf]
Nielsen, Marianne Lykke (2004). Thesaurus Construction: Key Issues and Selected Readings. Cataloging & Classification Quarterly, 37(3-4): 57-74.
Shearer, James R. (2004). A Practical Exercise in Building a Thesaurus. Cataloging & Classification Quarterly, 37(3-4): 35-56.
Slype, George van (1987). Les langages d'indexation: conception, construction et utilisation dans les systèmes documentaires. París: les éditions d'organisation.
Sternberg, Robert J. (1997). Thinking styles. New York: Cambridge University Press.
TemaTres (2006). TemaTres: controlled vocabulary server. Sourceforge. [https://sourceforge.net/projects/tematres]
Thomas, Alan R. (2004). Teach Yourself Thesaurus: Exercises, Readings, Resources. Cataloging & Classification Quarterly, 37(3-4): 23-34.
Zeng, Marcia Lei (2005). Using software to teach thesaurus development and indexing in graduate programs of LIS and IAKM. Bulletin of the ASIS&T, 31(6): 11-3.