
Towards increased information findability in OJAX++

Are Virtual Research Environments ready for tags, annotations and user-generated, collaborative ?

Johan O. Bjornson, BSocSc (UCD)

A minor thesis in partial fulfilment of the requirements for the Degree of Master of Arts in Information Studies

National University of Ireland University College Dublin

School of Information and Library Studies

September 2008

Head of School: Dr. Ian Cornelius
Research Supervisor: Dr. Judith Wusteman

Table of contents

List of figures
Preliminary statements
Acknowledgements
Abstract
Guide to acronyms

1. Introduction
1.1 Combining tags and research
1.2 Problem statement
1.3 Thesis logic

2. Research context: Defining related concepts
2.1 Web 2.0 and the Social Web
2.2 Virtual Learning Environments
2.3 Virtual Research Environments
2.4 The Irish Virtual Research Library and Archive
2.4.1 IVRLA and social annotations
2.4.2 Current tagging implementation in IVRLA
2.5 OJAX and OJAX++
2.6 Metadata
2.6.1 Three types of metadata
2.6.2 Consortium recommendations for annotations
2.6.3 Authoritative metadata in IVRLA

3. Research design: Objectives, rationale and methods of inquiry
3.1 Research objectives
3.2 Contributing to new knowledge: The significance and value of the research
3.3 Research feasibility and methods of inquiry
3.4 Research questions
3.4.1 Research Question I
3.4.2 Research Question II
3.4.3 Research Question III
3.4.4 Research Question IV
3.4.5 Research Question V
3.5 Limitations of the research

4. The changing nature of searching, retrieving and classifying information
4.1 New approaches to information search and retrieval
4.1.1 ‘Berrypicking’ and serendipity
4.1.2 Precision and recall
4.1.3 Information search and retrieval in transformation
4.1.3.1
4.1.3.2 and OAI-PMH
4.1.3.3 OpenSearch and SRU/SRW
4.1.3.4
4.2 New approaches to information classification
4.2.1 Structured classification: Taxonomies and the hierarchical-enumerative approach
4.2.2 Semi-structured classification: Facets and the analytico-synthetic approach
4.2.2.1 Guided navigation
4.2.2.2 Faceted search in journals
4.2.2.3 Standardised facets: FaceTag
4.2.2.4 En route to an adaptive classification system

5. Uncontrolled classification: Collaborative tagging and social annotations
5.1 and ‘crowdsourced’ content
5.2 Origins of collaborative tagging
5.3 Examples of collaborative tagging and annotation in action
5.3.1 General
5.3.2 Reaching goals
5.3.3 Images
5.3.4 Museum exhibits
5.3.5 Books
5.3.6 Audio, films, videos and television series
5.3.7 Scholarly and scientific material
5.3.7.1 Research events information
5.3.7.2 Annotating information in the humanities
5.3.8 ‘People-tagging’
5.4 categories: A typology
5.4.1 Five categories of tags
5.4.2 Tagging for self, for others, or both
5.4.3 and Connotea: What do people tag for?
5.4.4 Connotea and Steve.Museum: Tags complement ‘author keywords’
5.5 ‘The public has spoken’:
5.5.1 Narrow folksonomies
5.5.2 Broad folksonomies: The power law and the ‘long tail’
5.6 Navigating folksonomies

6. Limitation to uncontrolled folksonomies
6.1 Linguistic problems
6.1.1 Exactness, precision and basic level variation
6.1.2 Problems with consistency
6.1.3 Polysemy and capitonyms
6.1.4 Homonymy
6.1.5 Plural noun forms
6.1.6 Synonymy
6.1.7 Spaces, symbols, acronyms and word collocation
6.1.8 ‘Meta noise’
6.2 Other problems
6.2.1 Limited life cycle
6.2.2 Low scalability
6.2.3 Non-hierarchical, ‘flat keywords’
6.2.4 ‘Mob indexing’
6.3 Problems with tag clouds

7. Quality control of user-generated content
7.1 Tags combined with a controlled vocabulary
7.1.1 WordNet
7.1.2 Ontology-directed folksonomies and tag normalisation
7.2 Tag suggestion mechanisms and recommender systems
7.2.1 A recommender system for IVRLA
7.2.2
7.2.3 Buzzillions
7.2.4 ZoneTag
7.2.5 Tagging sensibly on Delicious

8. Available open source tagging and annotation software
8.1 Steve Tagger
8.2 FreeTag
8.3 RichTags
8.4 Annotea
8.5 Multivalent, Fab4, CommentPress, SharedCopy and Fleck
8.6 The Open Annotation and Tagging System
8.7 Harvesting and Aggregating Networked Annotations
8.8 The Ultimate Tag Warrior, Jerome’s Keyword and Dekoh

9. Practical recommendations for a single tagging and annotation tool in OJAX++
9.1 Introduce tagging in OJAX++ as it leads to richer metadata and increased knowledge discovery: important aspects of successful research
9.2 Encourage collaborators in VREs to create multiple types of tags in order to capture the multi-faceted context of information items
9.3 Promote sensible tagging and introduce a ‘Tagging tips’ checklist
9.4 Combine encouraging, unrestricted tagging with tag recommenders and the WordNet lexicon
9.5 Extend OpenSearch syndication and aggregation to tags and annotations
9.6 Gather tags in a dedicated tag element and allow for private tags
9.7 Software recommendations
9.7.1 Use the Open Annotation and Tagging System software: it can be seamlessly integrated and offers both tagging and annotation functionality
9.7.2 Use a combination of cross-repository tag cloud and tag lists on individual information item pages to give the tagging tool maximum visibility
9.7.2.1 Use the Dekoh versatile widget software for creating tag clouds
9.7.3 Monitor and learn from the HarVANA annotation system prototype, which has much in common with IVRLA and OJAX++

10. Discussion
10.1 Research questions revisited
10.1.1 Research Question I
10.1.2 Research Question II
10.1.3 Research Question III
10.1.4 Research Question IV
10.1.5 Research Question V
10.2 Tagging across disciplines
10.3 Gaining momentum for user-generated content?

11. Conclusions
11.1 IVRLA in transition
11.2 Future work

Notes
References

Appendix A: Software comparison table

List of figures

2.1 Search options in OJAX (Wusteman and O’hIceadha, 2007)

4.1 Scientific classification (Wikimedia, 2008a)
4.2 Facets for classifying garments (Broughton, 2004: 262)
4.3 Guided navigation on the Guardian Unlimited web site (Endeca, 2008)

5.1 From uncontrolled keywords to first-order logic ontologies (Weller, 2007)
5.2 A variety of social bookmarking sites (Hammond et al. 2005)
5.3 Tagging for life goals on 43 Things (Robot Co-op, 2008)
5.4 An annotated image (Barnes, 2006)
5.5 Woman’s ceremonial skirt, with Steve Tagger (Steve.Museum, 2008)
5.6 LibraryThing tags relating to the philosophy of science (LibraryThing, 2008)
5.7 An annotated YouTube video clip (Chitu, 2008)
5.8 Tag relations on BibSonomy (University of Kassel, 2008)
5.9 Tags on CiteULike (2008)
5.10 Iugo project components (University of Bristol, 2008a)
5.11 The annotation interface in the Fab4 browser (Corubolo, 2008)
5.12 Tagging of people in the Fringe Contacts project (Farrell and Lau, 2006: 2)
5.13 Functional tag categories on Connotea (Heckner et al. 2008: 7)
5.14 ‘Tag to text category model’ for Connotea tagging (Heckner et al. 2008: 11)
5.15 Tags used for a web page on Delicious (Kipp and Campbell, 2006: 6)
5.16 A power law curve (Bianchini, 2007)
5.17 A simple tag cloud (Hassan-Montero and Herrero-Solana, 2006: 2)

6.1 List of NISO vocabulary elements (Spiteri, 2007: 25)
6.2 Compound word separators on Delicious (Guy and Tonkin, 2006)

7.1 networking (Wikimedia, 2008b)
7.2 Reasons for tagging images on Flickr (Ames and Naaman, 2007: 6)
7.3 Tag-creation recommendations on Delicious (2008b)

8.1 Steve Tagger tagging analysis (Trant et al. 2007)
8.2 The FreeTag-powered tagging service on Eatlunch.at (Luk, 2004)
8.3 Tag vocabulary window in RichTags (Fountopoulos, 2007: 34)
8.4 The Annotatio annotation client (World Wide Web Consortium, 2005)
8.5 The Multivalent browser with annotations (Phelps and Wilensky, 2001: 2)
8.6 The OATS annotation interface (Bateman et al. 2006b: 5)
8.7 Searching tags in OATS (Bateman et al. 2006b: 7)
8.8 Tagging in OATS (Bateman et al. 2006b: 6)
8.9 The OATS tagging and annotation control panel (Bateman, 2008)
8.10 Harvesting annotations with HarVANA (Hunter et al. 2007: 5)
8.11 Dekoh tag cloud widget (Dekoh, 2008b)
8.12 XML code for the sample Dekoh tag cloud in Figure 8.11 (Dekoh, 2008c)

Preliminary statements

I, Johan O. Bjornson, hereby certify that this Work submitted for assessment is my own and is expressed in my own words. Any uses made within it of the works of any other author, in any form (ideas, figures, images, texts, tables, computer software, et cetera), are properly acknowledged at the point of use. A list of all references used is included.

The right of Johan O. Bjornson to be identified as the Author of the Work has by this statement been asserted in accordance with Irish and international law, treaties, and agreements.

The Work is not endorsed by University College Dublin and I claim full responsibility for the contents of the Work.

The following terms of use apply to this Work. Please review carefully.

Printed copies of this Work are deposited in the thesis collection in the School of Information and Library Studies, University College Dublin. These copies remain the property of University College Dublin. Consultation and duplication of this Work are allowed for strictly personal use. Citing and referring to this Work may only be done if this Work is properly referenced.

For the duration of one (1) year following the date signed below, no part of this Work may be cited, reproduced, saved in a retrieval system or transmitted by any means without the prior written authorisation from the Author.

Dublin, Ireland, 29th September, 2008

______JOHAN O. BJORNSON

Email address for correspondence: [email protected]

© 2008, Johan O. Bjornson


Acknowledgements

A profound thank you goes out to everyone who has assisted me during these intensive months of self-disciplined, independent research and write-up.

I am especially indebted to the Head of School, Dr. Ian Cornelius, for the bounty of wisdom shared with the students in the Thesis Research Seminar module, and for all the invaluable guidance given in order for us to ‘survive the thesis’.

I am also highly appreciative of all the efforts by my thesis supervisor, Dr. Judith Wusteman, in suggesting the thesis topic, in realising the feasibility and meaningfulness of the research, and in helping me to refine my thoughts and stay focussed throughout this project.

Thank you to my relatives and friends for always being there. Thank you very much indeed, Mr. Seán Henneberry with family, for your time and support.

Last, but not least, my sincere gratitude to my wife for her continuous and infinite support and encouragement.


Abstract

Purpose: This study investigates the potential benefits of implementing a collaborative tagging and social annotation service in Virtual Research Environments (VREs), with particular reference to the OJAX++ framework and the user interface development plan of the Irish Virtual Research Library and Archive (IVRLA), which is the initial OJAX++ repository testbed. The overarching objective is to recommend best tagging practice and suitable open source software for the tagging tool in OJAX++.

Methodology: The exploratory method of inquiry is used. An extensive literature review of information search, retrieval and classification approaches is carried out, in order to position tagging and investigate how multi-faceted information artefacts in VREs benefit from being identified, assessed, organised and discovered through user-generated content.

Findings: Tagging complements, rather than replaces, traditional metadata creation. The prevalence of contextually rich metadata in IVRLA is low, which leads to low findability. The value of collaborative tagging in VREs is further justified by empirical studies showing that users add content- and context-related keywords overlooked by professional indexers in academic settings.

Recommendations: In VREs, tags describing content and context are deemed more useful than emotive, attitudinal, and time- and task-related tags, which are common in social bookmarking. Uncontrolled tagging in combination with the WordNet controlled vocabulary is recommended. Taggers should be presented with tagging tips and word suggestions when they create tags, in order to facilitate creativity and encourage sensible tagging. The ability to keep tags and annotations private is paramount. The Open Annotation and Tagging System (OATS) is shown to be capable of supplying OJAX++ with a seamless and user-friendly tagging and annotation tool.

Limitations: There is limited empirical evidence of tagging in Virtual Research Environments. No consensus around a theoretical framework for user-generated content has been reached. Much of the literature on how to improve tagging systems has a predilection for the , and the solutions are often largely hypothetical.

Originality: Superimposing the Social Web on Virtual Research Environments has not been widely done before. Research into facilitating the common, useful and crucial research activities of serendipitous search, browsing and knowledge discovery and exploration through user-generated tags contributes to new knowledge.

Keywords: collaborative tagging, and classification, IVRLA, knowledge discovery and exploration, OJAX, OJAX++, Open Annotation and Tagging System (OATS), recommender system, social annotation, Virtual Research Environments


Guide to acronyms

AJAX: Asynchronous JavaScript and XML
API: Application Programming Interface
CLE: Collaboration and Learning Environments
HarVANA: Harvesting and Aggregating Networked Annotations
HTML: HyperText Markup Language
IVRLA: The Irish Virtual Research Library and Archive
JISC: Joint Information Systems Committee
LGPL: Lesser General Public Licence, Free Software Foundation software licence
MIX: Metadata for Images in XML
MODS: Metadata Object Description Scheme
NISO: National Information Standards Organization
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
OATS: Open Annotation and Tagging System
OJAX: A federated search engine for repository metadata
OJAX++: A Virtual Research Environment incorporating OJAX
PHP: A computer scripting language for displaying dynamically generated web pages
RDF: Resource Description Framework
RSS: Really Simple Syndication
SQL: Structured Query Language, a computer language
UCD: University College Dublin
UGC: User-generated content
URL: Uniform Resource Locator
VLE: Virtual Learning Environment
VRE: Virtual Research Environment
W3C: World Wide Web Consortium
XML: Extensible Markup Language

ix 1. INTRODUCTION

1. Introduction

Within the area of user-generated content (UGC) on the Web, a tag is a label or keyword, that is, a type of annotation attached as a commentary to a resource or an item, for instance a book, a dissertation or a music album. Annotations can be single words, whole phrases or sentences of supplementary description, for instance a caption added to an image. Collaborative tagging refers to the activity that takes place when non-professional indexers assign keywords and comments stemming from their own interpretation of a piece of textual, visual or aural information. The site Delicious [1] defines tags in the following way:

A tag is simply a word you use to describe a bookmark. Unlike folders, you make up tags when you need them and you can use as many as you like. The result is a better way to organise your bookmarks and a great way to discover interesting things on the Web (Delicious, 2008a).

When a user assigns tags to web pages, it is the start of building ‘a collaborative repository of related information’ (Delicious, 2008a).
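The idea of a ‘collaborative repository’ can be illustrated with a minimal sketch in which each act of tagging is stored as a (user, resource, tag) assignment. The sketch is an illustration only: the class name `TagStore`, its methods and the sample identifiers are hypothetical, and nothing here is meant to describe how Delicious or any other service actually stores its data.

```python
from collections import Counter, defaultdict


class TagStore:
    """Hypothetical in-memory store of (user, resource, tag) assignments."""

    def __init__(self):
        # resource -> tag -> set of users who applied that tag
        self._by_resource = defaultdict(lambda: defaultdict(set))

    def add(self, user, resource, tag):
        # Normalise case and surrounding whitespace so that
        # "Dublin" and "dublin " are counted as the same tag.
        self._by_resource[resource][tag.strip().lower()].add(user)

    def tag_counts(self, resource):
        """Return how many distinct users applied each tag to a resource."""
        tags = self._by_resource.get(resource, {})
        return Counter({tag: len(users) for tag, users in tags.items()})

    def resources_for(self, tag):
        """Return the resources that at least one user labelled with the tag."""
        key = tag.strip().lower()
        return sorted(r for r, tags in self._by_resource.items() if key in tags)


store = TagStore()
store.add("anna", "photo-1", "Dublin")
store.add("ben", "photo-1", "dublin ")
store.add("ben", "photo-1", "tram")
store.add("anna", "photo-2", "tram")

print(store.tag_counts("photo-1").most_common())  # [('dublin', 2), ('tram', 1)]
print(store.resources_for("Tram"))                # ['photo-1', 'photo-2']
```

Counting distinct users per tag, rather than raw assignments, is one simple way of expressing how much agreement there is about a label; other weighting schemes are equally possible.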

1.1 Combining tags and research

This research is a showcase of combining tags and research. The primary research interest is the potential advantages of applying a tagging and annotation mechanism to Virtual Research Environments. Opening the doors for end users to collaboratively label and index content has been shown to work exceptionally well on web sites where the ‘collective intelligence’ is being ‘harnessed’ (O’Reilly, 2005). The popular social bookmarking service Delicious has more than 100 million tagged bookmarks in its database. Delicious and similar sites allow individuals to work towards a common goal and adopt a shared philosophy of altruistic assistance in finding useful information.

The motivation for conducting this study is a belief that the research community is filled with individuals who want to facilitate the search and retrieval of relevant research in a range of different ways. Why was the ‘Advanced search’ option invented in the first place? One answer is flexibility. Information seeking can be done in many different direct and indirect ways. At one extreme is the specificity of searching for a particular ISBN. At the other extreme are serendipity, browsing and other approaches to fortuitous knowledge discovery. The usefulness of these approaches in research environments intrigues me.


1.2 Problem statement

The Irish Virtual Research Library and Archive (IVRLA) [2] is a ‘major digitisation and digital object management project’ in the humanities. It was launched in early 2005 and is currently scheduled to run for a total of five years (University College Dublin, 2008a). IVRLA needs richer contextual metadata than is currently available, and the resulting vocabulary gaps need to be filled. IVRLA will include tens of thousands of historical images, but because many photograph captions include the date a photograph was taken and other terms unlikely to come to mind when students, researchers and other user categories search the archives, the existing metadata is relatively content-light (Chen et al. 2008a: 47). This restricts the search experience, making it imperative to find ways to promote accessibility and navigation.

The IVRLA interface development plan is described in ‘Work Package 5’. One of its objectives is to ‘explore the adoption of social tagging and bookmarking tools for use within a research repository system’ (University College Dublin, 2008b). Currently, IVRLA has a transitional, limited and, to a certain degree, concealed tagging service that is closely linked to the Delicious social bookmarking service; a user therefore needs a Delicious account to be able to tag and bookmark IVRLA pages (Healy, 2008). A closer integration between the tagging tool and the IVRLA content is called for, and this research identifies some strategies for bridging the gap between content, context and the degree of findability. Social tagging as it currently stands is not designed primarily for information discovery and retrieval by anyone other than those who create the tags as their own point of reference for later consultation. This research will investigate how tagging systems can be modified and refocused to facilitate purposeful information retrieval.

1.3 Thesis logic

The inner logic of chapter disposition, transition and progression in the thesis is briefly described here. The main text of the thesis starts in Chapter 2 with an outline of the research context, defining Web 2.0 and the Social Web, Virtual Learning Environments, Virtual Research Environments, the Irish Virtual Research Library and Archive (including user studies), OJAX and OJAX++, and metadata types and standards. Chapter 3 then introduces the research objectives, rationale, methods of inquiry and research questions. Proceeding from this and moving from the general to the specific, an overview of information classification, search and retrieval theory in Chapter 4 positions and contextualises collaborative tagging, annotations and folksonomies, concepts that are subsequently introduced and examined in depth in Chapter 5. Several real-world working examples of tagging and annotating resources are introduced in that chapter. An evaluation of the limitations of tagging follows in Chapter 6, while Chapter 7 introduces quality control of tags and the concepts of recommender systems and tag suggestion facilities, as these can reduce the effects of some of the shortcomings identified in Chapter 6. Chapter 8 introduces the open source tagging and annotation software available.

This leads up to the recommendations in Chapter 9. Drawing on and learning from the empirical evidence and the concepts, theories, opinions and user requirements mentioned in the previous chapters, pragmatic and user-centric recommendations for a tagging and annotation tool in Virtual Research Environments are developed. The chapter is twofold, starting with a list of general recommendations and then comparing and contrasting the software presented in Chapter 8. These recommendations are relevant for IVRLA, but are also interoperable with and transferable to other research domains, digital libraries, et cetera, where OJAX++ may be used in the future. Chapter 10 discusses the findings and revisits the research questions. Chapter 11 concludes the thesis and proposes directions for future study.


3 2. RESEARCH CONTEXT : DEFINING RELATED CONCEPTS

2. Research context: Defining related concepts

This chapter outlines and defines concepts closely related to this research. The frame of reference will also be conveyed and described.

2.1 Web 2.0 and the Social Web

The term Web 2.0 started to appear frequently around late 2003 and early 2004 (Cormode and Krishnamurthy, 2008). While there are as many definitions of Web 2.0 as there are services claiming to be true incarnations of it, it is essentially a set of techniques for encouraging and facilitating creativity, collaboration and information sharing. On Web 2.0, anyone can author a blog where the latest gadgets are reviewed and recommended, upload a video depicting the once-in-a-lifetime trip around the world, or share their favourite web pages via Delicious. In Web 2.0, the user is an active, first-class object, not just passively receiving information but also actively contributing it. This leads to a Social Web, built around the seven ‘social attributes’ of identity, reputation, presence, relationships, groups, conversations and sharing (Connolly, 2008). O’Reilly (2005) offers a breakdown of the technical, social and economic implications of Web 2.0 for society or groups of societies; for individuals and their recreational activities; for education, student behaviour and perceptions of learning and teaching, leading to more versatile Virtual Learning Environments; and for research communities, giving rise to the innovative concept of Virtual Research Environments.

2.2 Virtual Learning Environments

Virtual Learning Environments (VLEs), also called Collaboration and Learning Environments (CLEs), are networked computer applications that support online teaching and learning. Originally developed as a tool for distance learning, VLEs support flexible instructor–student interactions: lecturers and tutors can post course readings, lecture notes, self-assessment quizzes, external links to additional readings and other material. Students can access these documents and resources from computers on campus, from home or from any other location that has Internet access. VLEs often also include calendar and e-mail facilities in order to complement face-to-face interaction and collaboration between students. However, Duggan (2008) emphasises that Virtual Learning Environments differ from Virtual Research Environments in that the former are intended more for autonomous learning than for explicit collaboration between users, a task for which the latter are more useful. A few examples of VLE systems are Blackboard [3], Moodle [4] and Sakai [5]. Moodle is referred to as a Course Management System (CMS) and is open source software, as are many other VLE platforms.

2.3 Virtual Research Environments

Virtual Research Environments (VREs) take the idea of the VLE a step further. VREs are online frameworks for more effective research. The UK Joint Information Systems Committee (JISC) Roadmap defines a VRE as ‘a set of applications, services and resources integrated by a standards-based, service-oriented framework which will be populated by the research and IT communities working in partnership’ (Brennan, 2005: 4). The three main user groups of VREs are research-active staff, research support staff and system administrators. Fraser (2005) defines a VRE as ‘digital infrastructure and services which enable research to take place’. E-science is ‘grid-based distributed computing for scientists with huge amounts of data’ (ibidem). E-research ‘expands its remit to all research domains, not just the sciences’ (Joint Information Systems Committee, 2008). VREs are a technology used in e-research.

Many VRE test projects have been carried out in various countries during the last five years. JISC is currently in the second phase of its VRE programme, which will run until mid-2009. A few of the VREs developed under this scheme are the Collaborative Orthopaedic Research Environment (CORE) [6] at the University of Southampton, Building a Virtual Research Environment for the Humanities (BVREH) [7] at the University of Oxford, the Integrative Biology Virtual Research Environment (IBVRE) [8] at the University of Oxford, and Virtual Environments for Research in Archaeology (VERA) [9] at the University of Reading. In Ireland, the Virtual Institute of Bioinformatics Éire (VIBE) [10], connected to Trinity College Dublin, aims to consolidate bioinformatics research in Ireland. The Irish Research eLibrary [11], managed by the Irish Universities Association, facilitates access to a wide range of academic journals, mostly those of interest to the Science Foundation Ireland research programmes. The Observatory [12] is part of the Royal Irish Academy and provides online access to ‘a wide variety of interdisciplinary, multilingual, and multimodal digital resources created on the island of Ireland’.

The paradigm shift from modernism to postmodernism prompted Toffler to propose the idea of the unification of consumers and producers of information, leading to a ‘prosumer’ (Stock, 2007: 97). This is fundamental to the ‘’, the social bookmarking movement and the rest of the Social Web, and is part of the paradigm shift from a hierarchical, top-down dissemination and interpretation of information and ideas to what Castells (2000) calls ‘the network society’, where relationships are networked and complex, and where no one group has the sole right to propagate its notion of truth and accuracy. By establishing new connections, people become more visible to each other, and efficacious VREs can be created.

2.4 The Irish Virtual Research Library and Archive

The Irish Virtual Research Library and Archive (IVRLA) is an Irish digitisation project in which material from four University College Dublin (UCD) collections, namely the UCD Delargy Centre for Irish Folklore and the National Folklore Collection, the Irish Dialect Archive, the UCD Archives, and the James Joyce Library Special Collections, is being made available online for researchers, students and individuals with a general interest in the dissemination of these types of scholarly material. It will be an important part of preserving Irish heritage into the future. At the moment, only sample material is available on the IVRLA, but when all material is in place, the value of tagging and annotations will be even more prominent. IVRLA will gather fragile documents, manuscripts, images and other heritage material that has to be preserved from decay. This material will benefit from increased access, not least by the Irish diaspora, descendants and genealogists. Web 2.0 has been viewed with reluctance by some members of the archival community (Cox et al. 2007). IVRLA is a small step, taken in a small country, but it invites archivists to modify their traditional workflows. The Irish knowledge economy is directly contingent upon effective VREs, not only those collecting commercially ‘viable’, cutting-edge research in fields like pharmaceutics, chemistry and computer science, but also those in the humanities and the social sciences, as represented by the IVRLA repositories.

2.4.1 IVRLA and social annotations

In her IVRLA-centred study of user needs, preferences and requirements, Caccamo (2006) touches on aspects such as social annotation, design requirements, personalisation, customisation and bilingual resources in IVRLA. Caccamo finds that a majority, 64 %, of respondents would welcome the possibility to make ‘social annotations’ of items (Caccamo, 2006: 56). Many different user groups showed an interest in annotations, but postgraduate researchers were less convinced than senior staff as to the usefulness of adding these types of comments. This shows that the interest in user-generated content is not necessarily higher among younger individuals who have grown up with Web 2.0. Furthermore, there is a strong trend towards restricting access to comments and annotations to predefined user groups (35 %) or keeping them private (36 %), with only 23 % of the respondents willing to share their comments with the entire IVRLA user base (Caccamo, 2006: 58).

Caccamo uses the Cleveland Museum of Art [13] as a source of inspiration when evaluating the usefulness of social annotations. Two interviewees indicated that they thought of user comments as useful, but not relevant for their own personal research. Annotations would be useful ‘as it could save them time and could be used for teaching purposes’ (Caccamo, 2006: 58). All participants emphasised that some kind of authority control had to be in place. In conclusion, participants regarded a tagging and annotation facility as potentially useful, but in a majority of cases they would not make use of it themselves due to the lack of authority control. One participant felt that he ‘could critically assess material’ on his own (Caccamo, 2006: 80).

Caccamo (2006) shows that IVRLA users are reluctant to create a user account to view and react to recommendations, tags and annotations from other users. To be able to offer a rich collaborative tagging experience where user-generated information can be exchanged, a login is nevertheless necessary, as it meets the basic requirement of identifying users and their past activity, and thus feeds necessary profile data into OJAX++.

In a similar type of study, Fraser (2006) finds that users in the JISC-supported Building a Virtual Research Environment for the Humanities (BVREH) project (University of Oxford, 2007a) consider it important to be able to locate, view, compare and annotate image collections and associated research resources using a common workspace.

2.4.2 Current tagging implementation in IVRLA

Healy (2008) investigates the current tagging setup in IVRLA, a transitional, limited and, to a certain degree, concealed tagging service, closely linked to the Delicious social bookmarking service. A user needs a Delicious account to be able to tag and bookmark IVRLA pages. Healy finds that participants had problems finding the Delicious button that indicates that a tagging service is indeed available. Once they had figured this out, however, they had no further problems signing up for a Delicious account and starting to tag the items in the research experiment.

Healy (2008) mentions that members of the IVRLA team consider the content to be more important than tags, and this is the reason why they do not let visitors tag individual items: as an item is assigned more and more tags in the current Delicious-powered social bookmarking and tagging implementation, the actual content of that page is moved further down. What the user then sees when the page is opened is a list of user tags, and the user needs to scroll down to see the actual content that the tags describe.

Rock (2008, forthcoming) takes a broader look at VREs in Irish Higher Education institutions, and interviews researchers about their needs for Virtual Research Environments generally. The study shows that almost none of them currently tag or annotate on social bookmarking sites, photo sharing sites, et cetera. The only interviewee using tagging was a computer science researcher who used Delicious to tag bookmarks.

2.5 OJAX and OJAX++

IVRLA is the ‘testbed’ for OJAX++, a Virtual Research Environment described as a ‘next-generation collaborative research tool’ (Wusteman and O’hIceadha, 2007). One component of OJAX++ is OJAX, a ‘simple, non-threatening but powerful’ federated search engine, that is, a tool that collects, or ‘harvests’, metadata from multiple repositories, such as the National Folklore Collection and the Irish Dialect Archive in IVRLA, so that this metadata can be searched from a single interface. The open source information retrieval system Apache Lucene [14] is used to index and query the harvested records. OJAX has a dynamic user interface, including advanced features like search phrase auto-completion and auto-display of the number of matches. These features, illustrated in Figure 2.1, are enabled by the use of Asynchronous JavaScript and XML (AJAX), a popular technique in Rich Internet Applications (RIAs), that is, web applications that offer the same range of features, functionality and interactivity as found in traditional desktop applications (Loosley, 2006).

Figure 2.1: Search options in OJAX (Wusteman and O’hIceadha, 2007)
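
The principle of harvesting metadata from several repositories into one store that is searched from a single interface can be illustrated with a minimal Python sketch. It is not the OJAX or Lucene implementation: the sample records, the field names and the functions `build_index` and `search` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical harvested records. The repository names come from the text;
# the ids, titles and field names are invented for illustration.
HARVESTED = [
    {"id": "nfc-1", "repository": "National Folklore Collection", "title": "Storytelling in Kerry"},
    {"id": "ida-1", "repository": "Irish Dialect Archive", "title": "Kerry dialect recordings"},
    {"id": "ida-2", "repository": "Irish Dialect Archive", "title": "Ulster dialect survey"},
]

def build_index(records):
    """Map each lower-cased title word to the ids of the records containing it."""
    index = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            index[word].add(record["id"])
    return index

def search(index, query):
    """Return the sorted ids of the records that match every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

index = build_index(HARVESTED)
print(search(index, "kerry"))         # one query, hits from both repositories
print(len(search(index, "dialect")))  # a match count of the kind a search form could display
```

A production system would add ranking, stemming and field-specific queries, which is the role Lucene plays in OJAX.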

One of the features in the pipeline for OJAX++ is a single tagging and annotation service that can give a Social Web touch to the research community by delegating parts of the indexing of the scholarly material from the departmental content providers down to researchers and other end-users.

2.6 Metadata

Choudhury et al. (2000) emphasise that with more and more multimedia content ‘it becomes even more important to enhance the ability to search, identify, navigate, or browse through a collection of digital objects’. Metadata is the standard remedy for this problem. Metadata is commonly defined as ‘data about data’ or ‘information about information’, but a more elaborate and precise description offered by the National Information Standards Organization (2004) is ‘structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource’. Sureka (2006) illustrates this with an example of a library item that is searchable but not findable: ‘even though the book is kept somewhere in [a] library, which means that it is searchable, it becomes un-findable because it is not catalogued properly’.

2.6.1 Three types of metadata

Pure metadata carries meaning only in the context of the primary data that it describes (Agnew, 2003: 2). It allows the implementer as well as the end user to judge the appropriateness, context and relative value of primary data, that is, the information item that it describes. Metadata can be descriptive, structural or administrative (Keogh, 2005: 6). Descriptive metadata is used for discovering and identifying resources, by name, title, et cetera. Structural metadata is concerned with hierarchical structure, like the order of chapters in a book, while administrative metadata provides technical information facilitating the management and preservation of the item. The National Information Standards Organization (2004) lists six ‘Metadata Principles’. Metadata should

o Be appropriate to the materials in the collections, to the users of the collection, and for future use
o Support interoperability
o Have authority control and be based on controlled vocabularies
o Have clear statements of conditions and terms of use for the digital object
o Support long-term management of a digital object
o Have the qualities of authority, authenticity, archivability, persistence and unique identification, since they are objects in themselves

2.6.2 World Wide Web Consortium recommendations for annotations

Collaboratively created metadata, like tags and other annotations, is by definition not NISO-compliant metadata. However, for digital repositories, the World Wide Web Consortium (W3C) has proposed that annotations be treated as metadata (Heery and Anderson, 2005: 23). More precisely, tags and other annotations fall into the category of descriptive metadata. Hunter (2007) calls collaborative tags and annotations ‘secondary metadata’, while Metadata Object Description Schema (MODS) [15], [16], et cetera, consequently are schemas for ‘primary metadata’ (ibidem). Secondary metadata conforms partially to NISO guidelines: it is appropriate to the materials in, and users of, the collection, and it supports interoperability.
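
The relationship between the three types of primary metadata and user-contributed secondary metadata can be pictured as a simple data structure. The following Python sketch is illustrative only: the class names, field names and sample values are assumptions, not part of IVRLA, MODS or any W3C recommendation.

```python
from dataclasses import dataclass, field

@dataclass
class PrimaryMetadata:
    """Professionally assigned metadata, split into the three types."""
    descriptive: dict      # e.g. title, creator: used for discovery
    structural: dict       # e.g. order of pages or chapters
    administrative: dict   # e.g. file format, rights: used for management

@dataclass
class Item:
    primary: PrimaryMetadata
    # Secondary metadata: free-text tags, keyed by a (hypothetical) user name.
    tags: dict = field(default_factory=dict)

    def add_tag(self, user, tag):
        """Store a normalised tag under the user who contributed it."""
        self.tags.setdefault(user, set()).add(tag.strip().lower())

    def descriptive_terms(self):
        """All terms usable for discovery: primary descriptive values plus tags."""
        terms = {str(value).lower() for value in self.primary.descriptive.values()}
        for user_tags in self.tags.values():
            terms |= user_tags
        return terms

item = Item(PrimaryMetadata({"title": "Portrait photograph"},
                            {"sequence": 1},
                            {"format": "image/tiff"}))
item.add_tag("user_a", " Dublin ")
print(sorted(item.descriptive_terms()))
```

Keeping the tags separate from the primary record, while exposing both through one discovery method, mirrors the idea that tags act as descriptive metadata without replacing the authoritative description.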

2.6.3 Authoritative metadata in IVRLA

Previous MLIS research carried out in the School of Information and Library Studies, University College Dublin, has recommended the best approaches for professionally assigning primary metadata to the material phased into IVRLA. Metadata and cataloguing are covered by the IVRLA ‘Work Package 3’ (University College Dublin, 2008b). A subcategory of administrative metadata is technical metadata, examined in the IVRLA context by Holland (2006). Technical metadata is currently not in use in Ireland (ibidem, 58). However, as shown by Holland, the Library of Congress NISO Metadata for Images in XML (MIX) [17] is beneficial for harvesting metadata from images in IVRLA. Holland develops a MIX ‘Lite’ to meet the specific local needs of IVRLA.

Keogh (2005) carried out a case study auditing the best approach to descriptive metadata in IVRLA. The two alternatives compared are the Dublin Core Metadata Element Set (DCMES), with 15 core metadata elements, and MODS, with 20 core metadata elements. Both fulfil the requirement of being an international standard, and not a ‘local’ proprietary version. MODS emerged as the best choice in the IVRLA context, since it could better retain ‘the valuable level of description in IVRLA sample material’. Keogh concludes that it is ‘easier to go from the richer description of MODS to the mere basic [Dublin Core] even with considerable information loss, but ineffective to go in the opposite direction’ (Keogh, 2005: 73).
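
Why the conversion works in one direction only can be seen in a small Python sketch. It uses a deliberately simplified, assumed subset of a MODS-to-Dublin Core mapping with flattened element paths; it is not Keogh's crosswalk nor a complete official one, and the sample record is invented.

```python
# Simplified, assumed subset of a MODS-to-Dublin Core mapping.
# Keys are flattened MODS element paths; values are Dublin Core elements.
MODS_TO_DC = {
    "titleInfo/title": "title",
    "name/namePart": "creator",
    "originInfo/dateIssued": "date",
    "subject/topic": "subject",
    "subject/geographic": "coverage",
    "subject/temporal": "coverage",
}

def mods_to_dc(mods_record):
    """Collapse a flat MODS-like dict into Dublin Core.

    Fields without an entry in this simplified mapping are dropped.
    """
    dc = {}
    for path, value in mods_record.items():
        element = MODS_TO_DC.get(path)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc

def candidate_mods_paths(dc_element):
    """List the MODS paths that a Dublin Core element could have come from."""
    return sorted(path for path, element in MODS_TO_DC.items() if element == dc_element)

record = {
    "titleInfo/title": "Portrait of a Dublin street",
    "subject/geographic": "Dublin",
    "subject/temporal": "1900-1920",
    "name/role/roleTerm": "photographer",  # has no slot in this simplified mapping
}
dc = mods_to_dc(record)
print(dc["coverage"])                    # place and period now share one element
print(candidate_mods_paths("coverage"))  # the way back is ambiguous
```

Once place and period have been merged into a single `coverage` element, nothing in the Dublin Core record says which MODS element each value came from, which is the information loss the quotation refers to.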

3. Research design: Objectives, rationale and methods of inquiry

This chapter describes the ways in which the thesis attempts to contribute to new knowledge, the methods of inquiry used for this, and the limitations to the chosen approach. The research objectives and research questions are also put forward.

3.1 Research objectives

The study has seven key goals and objectives, distributed across the thesis chapters.

To position tagging and annotation tools in the broader context of information classification, search and retrieval (Chapters 2 and 4)

To illuminate the current use of collaborative tagging, annotation, and recommender technologies in the Social Web, and how they have started to emerge in Virtual Research Environments (VREs) (Chapters 5, 7 and 8)

To analyse whether a single tagging and annotation tool is beneficial in OJAX++ and the wider research community in ‘harnessing collective intelligence’ (Chapters 4, 5 and 6)

To identify and attempt to solve the problems with using collaborative tags and annotations in OJAX++ (Chapters 6 and 7)

To recommend best practice for introducing tags and annotations in a VRE (Chapter 9)

To investigate whether and how OpenSearch could be the preferred protocol for syndicating and aggregating the tagging and annotation metadata (parts of Chapter 4)

To propose practical specific changes needed on the OJAX federated search platform in order to introduce a single collaborative tagging and annotation tool (Chapter 9)

3.2 Contributing to new knowledge: The significance and value of the research

The initial material contained in IVRLA will mostly be images, but more and more textual material will be digitised, and items need broad metadata so that the material can be found through a range of different keywords. Up until now, no best-practice case study on the creation of a tagging and annotation service in VREs has been carried out. This thesis will be a significant scholarly contribution, resulting in a range of useful suggestions and preparing the ground for more thorough study in this field, which has not been at the centre of attention in much previous Irish research.

The action plan ‘Building Ireland’s knowledge economy’ (Inter Departmental Committee on Science, Technology and Innovation, 2004) describes the European Union target for Research and Development as constituting 3 % of all national GNPs in the EU by the year 2010. This will follow a three-fold increase in spending during the 1990s. The Lisbon strategy and the eEurope Action Plan 2005 aim to transform Europe into the ‘most competitive and dynamic knowledge-based economy in the world, capable of sustainable economic growth with more and better jobs and greater social cohesion’ (Brennan, 2005: 8). The strong and renowned Irish entrepreneurial culture can further stimulate and enrich a knowledge economy and give Ireland its competitive edge during the next decades. Virtual Research Environments can further drive this development if they have a useful and dynamic toolkit. Collaborative services, such as a tagging and annotation mechanism, play a vital role.

The JISC and University of Bath project ‘Enhanced Tagging for Discovery’ (UK Office for Library and Information Networking, 2008) illustrates the timeliness of studying user-generated content in research communities.
The aim of that project is to build a demonstrator ‘to test the combination and comparison of controlled and approaches to semantic interoperability’ and investigate how to take ‘free social tagging beyond personal bookmarking to aid resource discovery’. The JISC project will study two academic communities:

o Postgraduate users tagging resources on Intute [18], a free online service providing a resource discovery service of hand-selected Web resources for education and research
o Tagging by authors depositing material to the Council for the Central Laboratory of the Research Councils (CCLRC) repositories [19]

The results of the JISC ‘Enhanced Tagging for Discovery’ project will be published after this thesis has been submitted; however, one anticipated outcome is that tagging and annotating will benefit the Intute Repository Search Service. This in turn increases the significance of this research on tags and annotations in OJAX++.

3.3 Research feasibility and methods of inquiry

This dissertation reports the outcomes of secondary research, i.e., the collection, analysis and synthesis of data that already exists. This approach is also called desk-based research. The exploratory method of inquiry is used in this research. With this method, ‘new or relatively unknown territory’ is examined ‘for the purpose of searching out or closely scrutinising objects or phenomena to lead to a better understanding of them’ (Mauch and Park, 2003: 129). The new territory is tagging and annotation services, and the aim is to acquire a better understanding of these phenomena in the research community.

Mauch and Park (2003: 126) stress the importance of the seamlessness between methods and ‘the theoretical or hypothetical propositions under scrutiny in the investigation’. The data collection instrument should be practical, efficient and promising, and data should be readily available to answer the research questions and achieve the research objectives. However, all of these criteria can seldom be met and compromises have to be made (ibidem). Literature evidence that tagging promotes collaboration is readily available, but only a fraction of this material focuses on the research community.

The hypothesis that tagging and annotating resources can facilitate serendipitous knowledge discovery, deepen the understanding of a subject matter and lead to enhanced search procedures is used to justify the need to introduce tagging and annotation in online research communities, so that academic information can be successfully found, shared and digested. This is a manifestation of researcher needs and values, and an appeal to support productive and competent research, both in Ireland and abroad. The proposed hypothesis is that collaboration is second nature for most researchers and that this will impact on ‘information seek-and-find’.
The critical literature review is exploratory in its attempt to identify all components to consider prior to formulating the suggestions and recommendations for how to improve Virtual Research Environments by developing collaborative tools. This is the most practical and the most promising method of carrying out this research project. The literature review is an all-encompassing digest of readily available literature on the topic, intended to make the inquiry as transparent, impartial and efficient as possible. Different approaches to the classification of information are traced and then linked together with tagging and annotations in the Social Web. These findings are then applied to research communities, which traditionally have not relied on non-experts to come up with content descriptors. A chapter about various information search and retrieval strategies is included to demonstrate that tags are not ‘flat’ keywords; rather, they are highly searchable. Defining this wide range of underlying concepts will help reinforce the theoretical framework.

With a constructive research mindset, solutions to problems can be developed. In this research, information retrieval problems in VREs are identified and the solution proposed is more user-generated keywords, labels and comments. With this methodology, the research will devise practical strategies as regards tagging and annotations in VREs and how these can be put into action in a way that is competent, efficient and labour-saving for researchers. Case studies and illustrative examples of existing tagging and annotation services, both the user interfaces and the software backbones, will lay the foundation for the recommendations of the most suitable way of implementing tagging in VREs. Some tagging statistics have been included to achieve a more mixed methodology and to highlight the use of tags and annotations in more detail.

The literature on VREs and on tagging is substantial, but there is a limited range of fully working VREs and very little evidence of tagging being used in them on more than a limited experimental basis. There is, however, a range of conference proceedings and similar documents that investigate VRE pilot schemes and offer explanations of the rationale behind tagging on social networking and other sites. Restricting the literature review to formally peer-reviewed journals was never an option: the majority of articles written in the field are not professionally assessed since they are parts of Frequently Asked Questions pages, mailing lists, , news sites such as TechCrunch [20], and other flavours of ‘grey literature’.
There is no reason to assume automatically that this kind of material is unimportant, inaccurate or unreliable: it is a snapshot of actual viewpoints held by, often, respectable individuals in Library and Information Science (LIS) and an important contribution to the overall body of knowledge in the area. As Speller (2007) emphasises in her literature review of the current state of tagging research, ‘the literature of tagging is largely opinion-based, almost entirely online, and the topic is largely absent from academic literature as it has only emerged as a system in the last two years’. This opens up the possibility of very timely, finger-on-the-pulse research, and the opportunity to deliver a future-proof set of suggestions for a tagging and annotation service, bridging the gap between the Social Web and Virtual Research Environments.

3.4 Research questions

The five research questions are derived from the research objectives mentioned in section 3.1. They are revisited in the discussion, section 10.1, in order to establish whether the research managed to answer them.

3.4.1 How are collaborative tagging and annotation services currently used in academic and non-academic online environments?

3.4.2 Can a collaborative tagging and annotation tool benefit information classification and retrieval in OJAX++ Virtual Research Environments (VREs), and if so, how?

3.4.3 Are there any problems with using a collaborative tagging and annotation instrument in OJAX++ VREs, and if so, how can they be resolved?

3.4.4 What are the recommendations for the implementation of a single collaborative tagging and annotation tool in OJAX++?

3.4.5 Is there a role for the OpenSearch standard in syndicating and aggregating tags and annotations between VREs and other rich information environments?

3.5 Limitations of the research

A major problem with tagging is that a widely accepted ‘tagging theory’ underpinning the evolution of this type of user-generated content has yet to appear. Gene Smith (2008) contends that the lack of a theoretical framework decreases the level to which he is ‘feeling confident’ about the material in his book Tagging: People-Powered Metadata for the Social Web, released in January 2008. The lack of consensus around a single tagging theory, let alone a range of accepted theories, made the book difficult to organise. The theory dilemma experienced by Smith also affects this research, making it slightly harder to convince the reader of the credibility and validity of the information presented. Using a rich collection of screenshots and other imagery throughout the research presentation will hopefully facilitate the task of grounding the research in reality.

To move away from the way social bookmarking sites promote tagging of anything and everything is not always easy. Tagging has been the blanket method to let users interact with information, and tagging is hailed as a ‘democratic’ and ‘powerful’ activity. These repeated acclaims and the explosion of tagging in every possible place online might deter people from choosing tags over – why fix something that is working? This might turn the concept of tagging into somewhat wishy-washy territory. There are a lot of overly hypothetical research papers in the tagging domain, and the challenge is to strike a balance between ideas that have not left the drawing board and tools that are already out there. The distinction between mentioning and discussing real-world examples of tagging on the one hand, and recommending the same on the other hand, is somewhat artificial, but demarcations have to be made. Both separating and linking together theory and practice was necessary, and this put the rhetorical skills to the test.

Only a few studies on tagging and annotation of academic resources were identified, which restricted the conclusions that could be drawn. On a more practical level, the time assigned for writing this dissertation was limited, and the opportunities of going into detail about every aspect of the research were unfortunately, but inevitably, restricted.

4. The changing nature of searching, retrieving and classifying information

This chapter will start with a closer look at information search and retrieval and how it is optimised in different types of information environments with more or less structured information. Following from this, different approaches to classifying information, a prerequisite for successful information retrieval, will be introduced.

4.1 New approaches to information search and retrieval

The utopia of constructing the ‘perfect’ search engine has led to a long-term engagement between avant-garde engineers and a Herculean, ever-changing World Wide Web. Retrieval tools have to keep pace with information on the Internet that changes every second. As Quintarelli (2005) points out, the

‘sprawling, heterogeneous information sources [on the Internet] make up an enormous, ever-changing, time-sensitive, not-clearly defined corpus of items to classify without a central authority, targeted at a heterogeneous and increasing group of users. This situation requires new and different classification strategies.’

Traditional information retrieval presupposes that the user plays an active role, by querying and browsing, and information is ‘pulled’ by the user. In an era of information overload, information filtering mechanisms instead ‘push’ relevant information to the user, based on subscribed keywords, like in Really Simple Syndication (RSS) web feeds (Hassan-Montero and Herrero-Solana, 2006: 1).
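
The ‘push’ model can be sketched as a filter that compares incoming feed items with a user's subscribed keywords. The Python function below is a minimal illustration under stated assumptions: the function name, the item structure and the sample items are hypothetical and do not reproduce any particular RSS reader.

```python
def push_filter(feed_items, subscribed_keywords):
    """Return the titles of feed items that mention any subscribed keyword."""
    keywords = {keyword.lower() for keyword in subscribed_keywords}
    return [item["title"] for item in feed_items
            if keywords & set(item["title"].lower().split())]

# Hypothetical feed items; in practice these would be parsed from an RSS feed.
items = [
    {"title": "New tagging study published"},
    {"title": "Library opening hours"},
]
print(push_filter(items, ["Tagging", "annotation"]))
```

The user states an interest once, and matching items then arrive without any further querying or browsing.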

4.1.1 ‘Berrypicking’ and serendipity

As the power of interfaces has increased, classic models of information seeking behaviour have been complemented with new theories such as ‘berrypicking’: a realisation that search is an evolving and ongoing process, not a static and linear one, that information is gathered in bits and pieces rather than from one grand source, and that many different search techniques are used simultaneously (Bates, 1989). As a by-product of the never-ending growth of information in the ‘deep web’, which is mostly ‘untouched’ by traditional web crawlers, the concept of serendipity, or finding interesting things by chance while looking for something else, has become increasingly popular and important. This is also a result of the multi-faceted nature of information and a collective acknowledgement of today’s multi-disciplinary world. No matter how comprehensively a resource is indexed, curiosity can lead to places never visited before, and one way of getting there is through others’ recommendations and tags.

The power of serendipity is extensive, if not infinite. One example of an ‘accidental discovery’ is X-ray radiation. Wilhelm Conrad Röntgen was studying cathode ray tubes when he noticed that some fluorescent paper in his laboratory became illuminated at a distance even though his apparatus was opaque (Krock, 2001). The anti-malarial drug quinine, cellophane, penicillin, radioactivity, electromagnetism and the vulcanisation of rubber are some other important inventions and discoveries which might never have been unearthed without the force of serendipity.

4.1.2 Precision and recall

Successful information retrieval relies on high levels of both precision and recall (Kerchner, 2006). Precision indicates the level of exactness, and measures how many of the items retrieved are relevant. Recall is a measure of completeness, demonstrating how many relevant items are retrieved out of all existing relevant items. Fall-out is the number of non-relevant items retrieved in proportion to the entire collection of non-relevant items. In a study on finding biotechnology information on the Internet, Shafi and Rather (2005) conclude that as precision increases, recall decreases, and vice versa.

Morrison (2007: 28) studies precision and recall rates for folksonomies and traditional search engines and directories, and finds that folksonomies score highest when searching for news, but in all other types of searches, including searches for an exact site or searches with a short factual answer, search engines perform better, even though Delicious is statistically very similar to the directories in many instances. Morrison states that this comparison might not be a fair indicator of usefulness, since searching is just a minor part of folksonomies. Measuring overall task completion rate and time might have rendered a more neutral comparison. It would also have been of interest to study which effects tags have on the ways users socialise on social networking sites.
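
The three measures can be made concrete with a short worked example. The Python sketch below follows the standard set-based definitions; the function name and the sample document identifiers are assumptions chosen for illustration and are not taken from the studies cited.

```python
def retrieval_measures(retrieved, relevant, collection):
    """Return (precision, recall, fall_out) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    fall_out  = non-relevant retrieved / all non-relevant
    """
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    hits = retrieved & relevant
    non_relevant = collection - relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    fall_out = len(retrieved & non_relevant) / len(non_relevant) if non_relevant else 0.0
    return precision, recall, fall_out

# Hypothetical collection of 12 documents, 4 of which are relevant to the query.
collection = range(1, 13)
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 5, 6}   # two relevant and two non-relevant documents
print(retrieval_measures(retrieved, relevant, collection))
```

In this example half of the retrieved documents are relevant (precision 0.5), half of the relevant documents are retrieved (recall 0.5), and two of the eight non-relevant documents are retrieved (fall-out 0.25).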

4.1.3 Information search and retrieval in transformation

A range of information search and retrieval approaches will be summarised and empirical examples of their application given.

4.1.3.1 Vertical search

‘Vertical’ or specialised search engines index only niches of the Internet, and can thereby perform better within those niches than traditional ‘broad-based’ search engines, like Yahoo! [21] or Google [22]. Truveo [23], for example, focuses on indexing Internet videos. Other specialised search engines cover other parts of the Internet. The University College Dublin web site [24] previously used the Google ‘Custom Search’ service, which offers more in-depth indexing of all pages within a specific Internet domain, also called a site scope search (Grehan, 2005). In the UCD example, this gives search result prominence to university-related web pages.

4.1.3.2 Federated search and OAI-PMH

The next phase towards more intelligent search is meta-search or federated search engines, which can be described as a ‘one-stop shop’ for searching multiple specialised search engines at once. OJAX uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a simple protocol for collecting, or ‘harvesting’, metadata from any number of distributed , that is, repositories (Open Archives Initiative, 2003). The advantage of OAI-PMH over traditional web crawling is that the former does not need to harvest the same site or metadata twice – only new information is picked for inclusion in the ‘metadata store’. OAI-PMH is not a search engine, just a method of collecting metadata from multiple sources: it provides a technical description of the harvesting mechanism, but neither a search option nor a definition of the link between the metadata and the related content. To present the metadata to those performing search queries, another protocol is needed. OAI-PMH feeds the tag and annotation metadata from the person tagging into the ‘metadata store’, and the OJAX federated search is a solution for querying the metadata locally in the VRE.
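
Incremental harvesting relies on the protocol's selective harvesting arguments. The Python sketch below builds a `ListRecords` request that asks only for records added or changed since the previous harvest. The verb and the `metadataPrefix` and `from` arguments are defined by OAI-PMH; the base URL, the function name and the date are assumptions, and the sketch does not show how OJAX itself issues requests.

```python
from urllib.parse import urlencode

def list_records_url(base_url, last_harvest=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    If last_harvest (a date such as '2008-06-01') is given, only records
    created or modified from that date onwards are requested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        params["from"] = last_harvest
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint.
print(list_records_url("https://repository.example.org/oai", "2008-06-01"))
```

A harvester would store the date of each successful run and pass it as `last_harvest` the next time, so that unchanged records are never fetched twice.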

4.1.3.3 OpenSearch and SRU/SRW

For federated search engines to function, they need to know the search request syntax of every engine supported, so that queries and results can be passed correctly back and forth (LeVan, 2006). Two approaches will be mentioned: the twin protocols Search/Retrieve URL Service (SRU) and Search/Retrieve Web Service (SRW) [25], and the OpenSearch protocol [26]. SRU and SRW build on the search-and-retrieval protocol Z39.50, popular in library environments for querying library holdings to see whether an interlibrary loan is possible. SRW and SRU are ‘brother and sister’ standardised protocols for querying Internet-accessible databases and returning search results (Morgan, 2005). These protocols could make the content of the ‘hidden Web’ more accessible (ibidem). SRU and SRW are, however, best used when searching in ‘reasonably structured’ information resources, like library catalogues (McCallum, 2006), whereas the strength of OpenSearch is its ability to search many unstructured information sources, like the Internet. Users coming from the environment would see more similarities with OpenSearch, which offers a specification for how to discover and describe search engines and search queries and for how to share search results in a standardised way appropriate for syndication and aggregation (van Deen and Deneberg, 2006). OpenSearch comes under a Creative Commons Attribution-ShareAlike Licence.

The guiding principle behind OpenSearch is that search results are most conveniently displayed as Atom or RSS feeds, since these formats are already used widely online (Ogbuji, 2007). One way of using the OpenSearch technology to enable sharing of metadata is to display the results of searching many independent OAI-PMH metadata stores on the same page. Amazon A9 [27] is a web-based meta-search OpenSearch portal, where search results from around 700 OpenSearch-compatible search engines can be aggregated and syndicated.
These contributors can be either single-repository engines or federated search environments. OJAX uses OpenSearch to make its federated search interface available as a browser add-on. One example of an OAI-PMH metadata store that uses OpenSearch as a search interface is the National Library of Australia, which has made the Picture Australia [28] and Collections Australia Network (CAN) [29] repositories available for search via A9. Picture Australia captures a wide range of heritage information and a range of cultural institutions can be searched for images. Like IVRLA, Picture Australia is backed by a library, and both of these projects include a wide range of images, e.g., the IVRLA portrait photograph collection by the Dubliner Constantine Peter Curran (1880–1972).

Additionally, the OpenSearch framework includes a description format for auto-discovery of a search engine, which allows the engine to be added to the browser toolbar. The OJAX federated search engine can already be made available via a browser pull-down menu, as an alternative to the search fields within a VRE built on OJAX++. OpenSearch is capable of linking institutional repositories in a standardised way and of opening up richer search experiences.
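
How a client learns ‘the search request syntax’ of an OpenSearch engine can be illustrated with the URL templates that an OpenSearch description document publishes. In such a template, `{searchTerms}` is a required parameter and a trailing question mark, as in `{startPage?}`, marks an optional one. The Python sketch below fills in a template; the function name and the example URL are assumptions, and only unprefixed parameter names are handled.

```python
import re
from urllib.parse import quote

def fill_template(template, **values):
    """Substitute {name} and optional {name?} parameters in an OpenSearch URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{(\w+)(\?)?\}", replace, template)

# Hypothetical template, as it might appear in a description document of a VRE.
TEMPLATE = "https://vre.example.org/search?q={searchTerms}&page={startPage?}&format=rss"
print(fill_template(TEMPLATE, searchTerms="folklore tags"))
```

Because every compatible engine publishes its own template, a federated client needs only this one substitution routine to query all of them.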

4.1.3.4 Semantic search

The ‘Semantic Web’ [30] goes even further in Internet maturity; it is a ‘Web 3.0’ where ‘information is given well-defined meaning, better enabling computers and people to work in cooperation’ (Berners-Lee et al., 2001). Fountopoulos (2007: 40) defines semantic content as ‘the one that can be uniquely identified and distinguished no matter if its perceptible properties are not unique’. It can be thought of as a text where each and every word has a unique identifier attached to it. This identifier can then be looked up in a database to see what it means. This is semantic identification. Attaching unique identifiers to ranges of words and sentences, i.e., semantically describing and identifying longer pieces of text, is the greatest challenge for the Semantic Web (ibidem). The photo sharing site [31] introduced a facial recognition service in September 2008 (Reisinger, 2008), but apart from this, semantic search is not widely in use at present.
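
Semantic identification, as described above, amounts to attaching an identifier to a word and resolving that identifier in a database. The Python sketch below is a toy illustration: the identifiers, the concept descriptions and the function name are all hypothetical and do not correspond to any real Semantic Web vocabulary.

```python
# Hypothetical identifier database: each identifier resolves to exactly one meaning.
CONCEPTS = {
    "concept:001": "Dublin, capital city of Ireland",
    "concept:002": "Dublin Core, a metadata element set",
}

# The same visible word can carry different identifiers in different texts;
# words without an identifier are paired with None.
annotated_text = [("Dublin", "concept:002"), ("records", None)]

def resolve(token_pairs, concepts):
    """Look up the meaning behind each identified word; unidentified words are skipped."""
    return {word: concepts[identifier]
            for word, identifier in token_pairs if identifier is not None}

print(resolve(annotated_text, CONCEPTS))
```

The word ‘Dublin’ looks identical in both senses, but the attached identifier distinguishes them, which is the point of Fountopoulos's definition.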

4.2 New approaches to information classification

By combining powerful OpenSearch-compatible federated search with social search elements like a tagging and annotation service, OJAX can provide a richer and more collaborative search experience. However, for searches to be useful, whatever their purpose, the information needs to be carefully described, indexed and classified. Conventional classification has followed either the hierarchical-enumerative top-down approach or the analytico-synthetic bottom-up approach (Quintarelli, 2005).


4.2.1 Structured classification: Taxonomies and the hierarchical-enumerative approach

The hierarchical-enumerative approach ensures that text, sound, images, videos and other content types are assigned a rational, fixed and single point of access. Every subject is assigned a label, and a systematic, exclusive enumeration of the item takes place. Taxonomies are the hierarchical and controlled vocabularies used in this approach. The Linnaean taxonomy [32] is used for classifying living things just as Dewey Decimal Classification (DDC) [33] is used for indexing library items (Donnato, 2008: 8). However, one classification category is often not enough to capture the true nature of a multi-faceted object. Numerous hair-splitting moments have occurred when librarians try to find the best class mark for a multi-faceted book and finally produce a labyrinthine 30-digit notation. Hierarchical classification is so called because each level is contained within the more inclusive category right above it. Hierarchical classification is centralised and authoritative, and it is biased by the cataloguer’s subjective perspective of the world. It is costly, does not cater for differences in user needs and views, and will quickly find itself in a backlog, frantically trying to incorporate new concepts, for instance newly coined words in natural language (so-called neologisms), into a stale framework. Taxonomies are ideal for homogeneous material, but not for much else.

Figure 4.1: Scientific classification (Wikimedia, 2008a)

4.2.2 Semi-structured classification: Facets and the analytico-synthetic approach

The analytico-synthetic approach is built around a combination of fields or aspects, also called facets, where a broad subject, such as ‘food’, is branched out into clusters of concepts: ‘soup’, ‘bread’, ‘lasagne’, et cetera. This is the idea behind S. R. Ranganathan’s revolutionary Colon classification scheme [34], where the facets Personality, Matter, Energy, Space and Time are all reflected in the final shelf mark. This bottom-up approach allows for a more detailed classification and opens up multiple points of access. However, someone has to decide on which facets to include. Faceted systems are suitable in an information environment with many conflicting mental models, where one user wants to search for an object by its date of origin, while another user will base the search on an author name, a title name or the country of origin.

Figure 4.2: Facets for classifying garments (Broughton, 2004: 262)

The facets in Figure 4.2 are suitable for classifying different clothing types. They are flexible enough to describe things from many different categories, all of which deal with the intrinsic subject matter. There is an equal chance of findability by searching for the pattern, the colour, or another aspect listed in the faceted classification scheme.

4.2.2.1 Guided navigation

Faceted classification facilitates ‘guided navigation’. With this approach, users can narrow or refine their search through inclusion or exclusion of certain facets. The Endeca search and information access platform [35] has been employed by many web sites, for instance Guardian Unlimited [36], Walmart [37] and LexisNexis [38].
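The narrowing step in guided navigation can be illustrated with a minimal Python sketch. It is not taken from Endeca or any other platform named above; the catalogue, the facet names (‘colour’, ‘pattern’, ‘material’) and the function names are assumptions made purely for illustration, loosely following the garment facets of Figure 4.2.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: every item carries one value per facet.
GARMENTS: List[Dict[str, str]] = [
    {"name": "rain jacket", "colour": "red", "pattern": "plain", "material": "nylon"},
    {"name": "scarf", "colour": "red", "pattern": "striped", "material": "wool"},
    {"name": "jumper", "colour": "blue", "pattern": "striped", "material": "wool"},
]


def refine(items: List[Dict[str, str]],
           include: Optional[Dict[str, str]] = None,
           exclude: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Keep items that match every included facet value and no excluded one."""
    include = include or {}
    exclude = exclude or {}
    result = []
    for item in items:
        if any(item.get(facet) != value for facet, value in include.items()):
            continue  # fails an inclusion constraint
        if any(item.get(facet) == value for facet, value in exclude.items()):
            continue  # hits an exclusion constraint
        result.append(item)
    return result


def facet_counts(items: List[Dict[str, str]], facet: str) -> Dict[str, int]:
    """Count items per facet value, as typically displayed next to each refinement link."""
    counts: Dict[str, int] = {}
    for item in items:
        value = item.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    woollen_not_red = refine(GARMENTS, include={"material": "wool"}, exclude={"colour": "red"})
    print([item["name"] for item in woollen_not_red])
    print(facet_counts(GARMENTS, "pattern"))
```

Each facet acts as an independent entry point: the same item can be reached by colour, by pattern or by material, which is what distinguishes this approach from a single fixed class mark.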

Figure 4.3: Guided navigation on the Guardian Unlimited web site (Endeca, 2008)


4.2.2.2 Faceted search in journals

The Journal Storage (JSTOR) Faceted Search [39] is a prototype service that allows the following facets to be searched in the scholarly material included in the journal repositories:

o Discipline
o Journal
o Article type
o Publication date
o Language
o Times article is cited in JSTOR
o Number of pages
o Articles with image

4.2.2.3 Standardised facets: FaceTag

FaceTag [40] builds on a series of facets proposed by the Classification Research Group (CRG) in the 1960s, to enhance findability, browsability and user discovery (Quintarelli et al. 2006: 2). The FaceTag facets cover a wide range of dimensions of an information resource. The following facets are used in FaceTag:

o Resource types
o Language
o Activities and subjects
o Usage
o People
o Date


4.2.2.4 En route to an adaptive classification system

Using these facets when users tag leads to an ‘adaptive classification system’ that is able to cater for the indexing processes taking place in a social collaborative context (Quintarelli et al. 2006: 5). ‘Flat’ keywords are categorised and a more concept-based exploration is enabled. In a VRE, which is more topically structured than many broad and ambiguous web pages, guided navigation means increased findability, flexibility and scope for user discovery through multiple points of access. JSTOR, FaceTag and other guided navigation services have these characteristics. FaceTag facets are broader and suit different content types, whereas JSTOR facets are tailored specifically for journal articles.

25 5. UNCONTROLLED CLASSIFICATION : COLLABORATIVE TAGGING AND SOCIAL ANNOTATIONS

5. Uncontrolled classification: Collaborative tagging and social annotations

Chapter 4 showed that facets, although not hierarchically structured, still build on categories that someone has established. That is beneficial in guided navigation systems, where different users have different entry points. This chapter will introduce the concept of user-generated content in the form of unstructured keywords. For the purpose of this research, the usefulness and implications of complementing facets with a completely unordered list of keywords describing information artefacts will be investigated.

5.1 Citizen Science and ‘crowdsourced’ content

So-called Citizen Science refers to volunteers without formal training assisting scientists in observing and measuring large data sets which are time-consuming to classify and categorise (Lamb, 2008). For example, NASA runs a project called Stardust@home [41], where members of the public identify and index tiny dust particles in images taken onboard an interstellar spacecraft. It is a type of ‘crowdsourcing’ or ‘commons-based peer production’, where the general public is invited to refine concepts and express their trains of thought from a range of perspectives (Howe, 2006). Wikipedia [42] is an example of a crowdsourced where people meet exclusively online to create information and share the profit. These concepts are closely related to collaborative tagging: they let the general public organise knowledge. Collaborative tagging introduces a move to the left on the continuum shown in Figure 5.1, towards the social dimensions of collectively constructed ‘aboutness’, that is, reflective action based on the tagger’s experiences and beliefs. The whole field of cataloguing and classification is about to change when consumers describe, categorise and annotate information.

Figure 5.1: From uncontrolled keywords to first-order logic ontologies (Weller, 2007)


5.2 Origins of collaborative tagging

Collaborative tagging on the Internet can be traced back to the analytico-synthetic classification method, introduced in section 4.2.2. In this approach, a classification does not necessarily have to be built around a thoroughly open-ended vocabulary where ‘anything goes’, but doing so has proven amazingly popular on sites such as Delicious, which is an example of a Distributed Classification System (DCS) (Anadiotis, 2007). Multiple interpretations and multicultural views of the same piece of information create ‘shared intersubjectivities’ where users can benefit from not only their own discoveries, but also those of others (Campbell, 2006: 10). Collaborative tagging is an inclusive technique, and the number of tags assigned to a single resource or item is theoretically unrestricted.

The term ‘web annotation’ means a facility which lets users add, modify or remove longer comments about a resource, e.g., a web page. Annotations are longer pieces of text which can be subjective or objective, and for public or private use. They are burdensome to search through due to their length, and word frequencies in annotations do not have any real significance. Browsing would be the normal way to view them. Annotations could for example work like a guest book, where a user writes a short comment about what he or she thinks of an article, for private use or for the benefit of others.


5.3 Examples of collaborative tagging and annotation in action

This section will illuminate how collaborative tagging and annotation services are currently used in academic and non-academic online environments. There is a vast supply of sites incorporating some type of tagging or annotation service, and this exposé will only refer to a limited, yet heterogeneous, range of projects.

5.3.1 General social bookmarking

Figure 5.2: A variety of social bookmarking sites (Hammond et al. 2005)

Social bookmarking systems enable ‘social search’, and share some common features (Millen et al. 2005). Users can create lists of favourite bookmarks that are stored centrally rather than locally. These bookmarks can then be shared with other users. Web browsers order bookmarks in folders, and a bookmark can only belong to one folder – unless a user manually creates multiple copies of the same bookmark. Through tagging, social bookmarks can be placed in multiple categories. Web sites can be rated, tagged and commented on, and their usefulness assessed by others. Furl [43], Delicious, StumbleUpon [44] and Yahoo! MyWeb [45] are a few examples of social bookmarking sites with no particular niche. Technorati [46] indexes over 250 million blog entries by looking at the tags that bloggers have assigned to their entries.

BBC’s Shared Tags [47], Digg [48], Reddit [49] and Slashdot [50] are some of the web sites that allow for tagging and rating of online news articles. PennTags [51] is a project at the University of Pennsylvania which lets users tag web pages, links to journal articles, and records in the university’s online library catalogue Franklin and its online video catalogue VCat.

5.3.2 Reaching goals

43 Things [52] is a Webby Award-winning social networking site where a user is linked to other users based solely on tags concerning common goals and achievements. It is a kind of ‘1000 things to do before you die’ spin-off where users tag future goals in life based on suggestions from other users.

Figure 5.3: Tagging for life goals on 43 Things (Robot Co-op, 2008)


5.3.3 Images

Figure 5.4: An annotated Flickr image (Barnes, 2006)

By allowing users to tag images with the name of the person who shot it, the person it depicts, or just any random comment, [53] has over two billion images tagged and annotated. Flickr [54], Picasa and Shutterfly [55] allow for tagging of photos, and visitors can then create annotations. Flickr lets its members add annotations to any part of the picture, as in the beach scenery in Figure 5.4. Without knowing the credentials of the user hangingpixels, the first assumption is that the comment is just a random, casual remark, added ‘on the fly’ by a fellow photographer showing an interest in the camera used and the modus operandi of capturing the rainbow. Such sites have given rise to so-called automatic image annotation tools, like the Automatic Linguistic Index of Pictures Real Time (ALIPR) [56], where the user can search by ‘relatedness’ and ‘visual similarity’. Viewers can complement these automated tags by tagging for affective state of mind. Another example is Behold [57], which has automatically assigned tags to over one million Flickr photos. Flickr and Picasa can interpret ‘machine tags’, e.g., some of the auto-generated data included in images that follow the Exchangeable Image File Format (EXIF) specification. Using coordinates and other location data supplied by Global Positioning System (GPS) satellites has given rise to the popular activity of ‘geotagging’, where images are automatically positioned on Google Maps [58].


5.3.4 Museum exhibits

Steve.Museum [59] is a collaborative project where museum visitors can tag exhibits and collections with the Steve Tagger tool [60]. The aim is to lure visitors back to the museum and reduce the anxiety that art is too multi-faceted to appeal to a broader, more heterogeneous audience. Trant and Wyman (2006: 1) show that visitors who tag for perceptions and interpretations can apply personal meanings and perspectives to the museum collections. In two trials, IVRLA users were asked to annotate images from the Éamon de Valera Photograph Collection, comprising 3,000 black-and-white images. 50–60 % added caption tags, while 40–50 % added additional tags (Chen et al. 2008a). This shows a high user interest in adding personal remarks and comments to historical images.

Figure 5.5: Woman’s ceremonial skirt, tagged with Steve Tagger (Steve.Museum, 2008)


5.3.5 Books

LibraryThing [61] allows the user to catalogue favourite books by importing them from the Library of Congress, book stores affiliated with Amazon [62] or any of eighty library catalogues from around the world. Users can then classify the books with the Library of Congress Classification (LCC) [63] or the Dewey Decimal Classification schemes, and then tag the personal collection. FictionDB [64] lets its users read the latest reviews from a variety of web sites, browse series of books by publisher and by individual authors, learn about upcoming releases, buy books from sellers around the world and sell books to customers directly with no commissions.

Figure 5.6: LibraryThing tags relating to the philosophy of science (LibraryThing, 2008)

5.3.6 Audio, films, videos and television series

On YouTube [65], users can create annotations as commentary to videos via a service from Omnisio. Users can embed comments and talk bubbles in their own or others’ videos and add tags that can help people navigate to specific sections of a video. The Internet Movie Database (IMDb) has introduced increased discovery of film and television series and related concepts through its Movie Keyword Analyzer (MoKA) service [66], where users can ‘find accurate, yet unexpected results’ (Internet Movie Database, 2008). Odeo [67] makes it easy to search, discover and share audio and video from the web, including . On Last.fm [68], artists are tagged with the music genres they are active in.

Figure 5.7: An annotated YouTube video clip (Chitu, 2008)

5.3.7 Scholarly and scientific material

Figure 5.8: Tag relations on BibSonomy (University of Kassel, 2008)

Nature Inc.’s Connotea [69] is primarily aimed at the communities of scientists, researchers and clinicians and lets its users sort, rate and tag academic papers, for private use or for others to find later. CiteULike [70] organises a user’s scientific references, and caters for import of references, abstracts and citations from repositories such as PubMed [71], Journal Storage (JSTOR) [72], Public Library of Science (PLoS) [73] and Elsevier’s ScienceDirect [74]. Another project customised for the research community is BibSonomy [75], which provides the social bookmarking ‘staples’ found on Delicious, but extends this to include bibliographic information in the BibTeX [76] format. This information can then be tagged for easy access, and ‘busy tags’ are displayed on the home page. Relations between tags can also be traced, as seen in Figure 5.8.

Figure 5.9: Tags on CiteULike (2008)


5.3.7.1 Research events information

The Iugo project is part of the JISC Virtual Research Environment Programme and attempts to gather all online information pertaining to conferences and other research events. Iugo has a web annotation component which lets its users add a free-text annotation or an image to a web page containing research events information (University of Bristol, 2008b).

Figure 5.10: Iugo project components (University of Bristol, 2008a)

5.3.7.2 Annotating information in the humanities

The Multivalent Cheshire Kepler (MultiCheK) project [77] at the University of Liverpool is a component in the Building a Virtual Research Environment for the Humanities (BVREH) framework (Watry and Corubolo, 2007). An annotation system supporting ‘spontaneous distributed annotations across diverse document formats’ has been developed. It can be accessed via a toolbar in the Fab4 browser, which opens different media types, for instance images, texts and streaming content. Fab4 uses Search/Retrieve Web Service (SRW) to search and retrieve the annotations as well as the documents that they are applied to.

Figure 5.11: The annotation interface in the Fab4 browser (Corubolo, 2008)


5.3.8 ‘People-tagging’

Enterprises can tag day-to-day information so that searching and the formation of social networks are facilitated (John and Seligman, 2006: 1). The user’s expertise and interest are asserted, thus ‘I tag, therefore I know’. This leads to an asymmetric feedback loop and increased professional reputation of individual employees. Employees can tag their colleagues’ skills, interests and expertise, in order to boost efficiency and reduce the cost of setting up and maintaining professional relationships, as the Fringe Contacts project has shown (Farrell and Lau, 2006). The rationale is that people keep track of bookmarks and professional relationships in the same way.

Figure 5.12: Tagging of people in the Fringe Contacts project (Farrell and Lau, 2006: 2)

5.4 Tag categories: A typology

An outline of tag categories and private versus public tags will be given, followed by examples of how different types of tags are used on the Internet.

5.4.1 Five categories of tags

Golder and Huberman (2006) assert that tags can refine categories by simulating hierarchies. Xu et al. (2006) outline the different types of tags on Social Web sites as belonging to one or more of the following tag categories. They can be thought of as facets, all of which are optional; adding more of them, however, sharpens the searchable pathways leading back to the item.


o Content-related, mostly neutral statements describing main subject matters and concepts of the resource, e.g., ‘University College Dublin’ or ‘Master’s thesis’
o Context-related, mostly neutral statements describing underlying circumstances and factors, e.g., ‘education’, ‘life-long learning’, or ‘personal development’
o Attribute, mainly neutral tags, not immediately deduced from content or context tags, e.g., the name of the Head of School, SILS
o Subjective, attitudinal or emotive tags, e.g., ‘interesting’ and ‘boring’
o Organisational, often for strictly personal use and understanding, identifying tasks to do, e.g., temporal statements like ‘to read’, ‘to forward’

The first three types of tags are mainly suitable as public tags, while the last two mostly bear the characteristics of typical private tags, although attitudinal or emotive tags can be useful in rating and assessing resources and share these with others.

5.4.2 Tagging for self, for others, or both

Collaborative tagging falls into two broad categories: tags being stored separately from the source being tagged, and tags and sources stored in the same place (Trant and Wyman, 2006: 4). Marlow et al. (2006) distinguish between tagging by object type (textual, non-textual), source of material (user-contributed, system, global), and tagging rights (self-tagging, permission-based, free-for-all). Hammond et al. (2005) exemplify that tags can be

o Provided by self (Delicious),
o Provided by others (Facebook and Amazon),
o Provided for self (Connotea), or
o Provided for others (Flickr)

The attempt to ‘serve two masters at once; the personal collection, and the collective collection’ can be difficult, but Guy and Tonkin (2006) stress that certain tags, namely repeated tags, have a ‘social shared meaning alongside the personal meaning’.


5.4.3 Delicious and Connotea: What do people tag for?

Figure 5.13: Functional tag categories on Connotea (Heckner et al. 2008: 7)

A study by Heckner et al. (2008) surveys tagging on the academic reference portal Connotea as an indication of the tagging trends in a research environment, while Kipp and Campbell (2006) study tagging practices on Delicious. Heckner et al. find that 92 % of tags on Connotea are subject related; of these, 98 % are content-based and 2 % resource related (context-based). Within the non-subject related group (8 %), 1 % are affective tags (subjective tags in the typology), 20 % are task-related (organisational), and 79 % constitute ‘tag avoidance’, that is, tags such as ‘test1’ assigned because Connotea requires its users to assign at least one tag. Subjective (attitudinal or affective) tags are thus rare on Connotea (Heckner et al. 2008). This category and the organisational category are flagged by Kipp and Campbell as a way for users to ‘defy traditional subject analysis’ (Kipp and Campbell, 2006: 10). Tags in this category are more a response from the tagger than an analytical annotation; they are ‘intrinsically time-sensitive’, but they indicate an active engagement with the text in the same way as mapping tags into context or content categories does. 16 % of the tags in the Kipp and Campbell study of Delicious relate to time, task or sentiment, whereas only 1.3 % do so in the Heckner et al. (2008) study of Connotea. As Shirky (2005b) asserts, ‘user and time are core attributes’ in tagging: users want to know the expertise of the person tagging, so that irrelevant and decayed tags can be weeded out. This is important whether the material tagged consists of fluctuating web sites or of more robust material in a VRE.


5.4.4 Connotea and Steve Museum: Tags complement ‘author keywords’

Merholz (2004) shows that taggers can reveal terms that have been overlooked by indexers or forgotten in the author keywords section. In a study on tagging blog posts on Technorati, Berendt and Hanser (2007: 4) find that tags often are more than metadata; they are to be considered additional content, if

o The tags have low similarity with the blog body text, i.e., the body text cannot be used to predict the tags and vice versa, and
o The combination of body and tags can better describe the content than the body or tags alone.
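The first condition, low similarity between tags and body text, can be made concrete with a simple overlap measure. The following Python sketch uses the Jaccard coefficient on word sets; this is an assumption chosen for illustration and not necessarily the similarity measure applied by Berendt and Hanser. The function names and the sample text are likewise hypothetical, and the comparison only works for single-word tags.

```python
import re
from typing import Iterable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of distinct words that two sets have in common (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tags_missing_from_body(tags: Iterable[str], body: str) -> List[str]:
    """Single-word tags that never occur in the body text, i.e. candidate 'additional content'."""
    body_tokens = tokens(body)
    return sorted(tag for tag in tags if tag.lower() not in body_tokens)


if __name__ == "__main__":
    body = "Folksonomies emerge when users tag web pages"
    tags = ["folksonomies", "delicious", "web"]
    print(jaccard(tokens(" ".join(tags)), tokens(body)))
    print(tags_missing_from_body(tags, body))
```

A low coefficient together with a non-empty list of missing tags would suggest that the tags contribute terms the body text does not already contain.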

Figure 5.14: ‘Tag to text category model’ for Connotea tagging (Heckner et al. 2008: 11)

This ‘Tag to text category model’ in Figure 5.14 is taken from a study of tagging activity for 500 randomly chosen articles from the information and computer technology domain in the Connotea project (Heckner et al. 2008). The diagram outlines the different variations between tags and the full-text documents these tags describe. It is more common for author keywords than for collaborative tags to be multi-word. Two thirds of author keywords are not reflected in tags, and while author keywords averaged six words, the tag average is two words. Other findings were that 54 % of the tags could be found in the text, and most of the tags are part of the title of the tagged article (49 %), words from the abstract (9 %) or words from the full-text academic paper (42 %). 30 % of the tags could not be found in the original text being tagged, which shows that tags are additional content. In the Steve.Museum project, Trant et al. (2007) found that up to 90 % of tags assigned by volunteer taggers, that is, visitors, are words not present in official museum fact sheets describing the items. Social tagging and annotation is anticipated to be beneficial to IVRLA, which overlaps widely with Steve.Museum in the media types used, particularly images.


5.5 ‘The public has spoken’: Folksonomies

The tags amassed on social bookmarking sites and similar services give rise to a ‘folksonomy’, a folk taxonomy where all words have the same importance. When a sufficient participation rate has been reached, participants can operate as useful judges in deciding on the keywords and other metadata annotations that best encapsulate the original piece of information. According to Hannay (2008), folksonomies are ‘liberating, not restrictive; bottom-up, not imposed; relational, not hierarchical. It also cleverly harnesses selfish acts and directs them towards the common good. But most of all, it just seems to fit the way our brains work’. To cite Gahran (2005), ‘a folksonomy merges, diverges, and evolves much the way language does, through usage and interaction’, and this is the great strength of folksonomies. Thomas Vander Wal, who, in a joint action with Eric Scheid and Gene Smith, coined the term folksonomy, holds that folksonomies should be seen as just a dialect of metadata, where nothing is right and nothing is wrong. Vander Wal (2007) outlines the three tenets of a folksonomy as being

o Tags
o Objects being tagged
o Identities of the individuals tagging

Vander Wal stresses that collaborative tagging has not so much to do with pigeonholing objects into strict categories as with providing taggers with a channel where they can ‘place hooks’ between items so as to share their understanding and meaning of a concept with other users within the same environment, for instance in a Virtual Research Environment.

5.5.1 Narrow folksonomies

Flickr tags are ordered into a narrow folksonomy: Flickr images are usually only tagged once, by the author of the photo, and the opportunity for users to label photos taken and uploaded by others is restricted. However, Flickr photos are usually impossible to find via any other means of search, since they are visual and do not get descriptive MIX metadata assigned to them, and are thus not within the realm of a standard textual search engine. Narrow folksonomies can be ideal for images with an unambiguous subject matter, and the subsequent retrieval will be ‘fast, efficient and enjoyable’ (Quintarelli, 2005).


5.5.2 Broad folksonomies: The power law and the ‘long tail’

Folksonomies are near the language and mental model of the users (Quintarelli, 2005). Wüster (1999, cited in Heckner et al. 2008: 9) finds that there are indeed foundational disparities between scientific and informal language use: Connotea is explicitly tailored towards a professional user community of academics, while there are many colloquialisms to be found on Delicious. All Delicious tags are parts of a broad folksonomy, where multiple individuals openly tag the same content. Xu et al. (2006) emphasise that collaborative tagging is convenient for describing web pages, a medium that by nature does not have a hierarchical, but rather a network structure. A distinct ‘aboutness’ of a web page is often not possible. This makes tagging with an open-ended, analytico-synthetic tagging vocabulary convenient. The tag frequency graph in Figure 5.15 visualises how a specific page is tagged on Delicious by 4,171 different users. A tag frequency graph usually corresponds roughly to the ‘power law’, or the Pareto 80–20 principle (Reh, 2008), where 20 % of the tag words constitute 80 % of the total tags. The peak tag and the accompanying six-tag slope are the most popular tags, called the ‘head’ (Bateman et al. 2007: 3). The rest of the bar chart, the ‘long tail’, comprises minority tags used to classify the web page. The graph shows something important: the fact that many pages are multi-faceted leads to a gradual drop-off and has given rise to seven essential ‘co-word’ tags. If multiple users choose the same two or more tags to describe the same item, then these tags are co-words, and by analysing them, the strength of the relationship between them can be shown (Kipp and Campbell, 2006: 4).

Figure 5.15: Tags used for a web page on Delicious (Kipp and Campbell, 2006: 6)
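The co-word idea can be sketched as a simple co-occurrence count: every time one user assigns two tags to the same item, the pair is counted once, and the resulting count serves as a rough indicator of the strength of the relationship. The Python sketch below is illustrative only; it is not the procedure used by Kipp and Campbell, and the sample data and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_word_counts(taggings: Iterable[List[str]]) -> Counter:
    """Count how often each pair of tags is assigned together by the same user.

    Each element of `taggings` is the list of tags one user gave to the item.
    """
    pairs: Counter = Counter()
    for tags in taggings:
        unique = sorted(set(tags))  # ignore duplicates, fix the order within a pair
        pairs.update(combinations(unique, 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical tag sets from three users describing the same web page.
    taggings = [
        ["web", "design", "css"],
        ["css", "web"],
        ["design", "blog"],
    ]
    for (first, second), count in co_word_counts(taggings).most_common(3):
        print(first, second, count)
```

Pairs with high counts correspond to tags in the ‘head’ that tend to be used together, whereas pairs involving long-tail tags will typically occur only once.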


Compare the graphs in Figures 5.15 and 5.16. The latter is a typical representation of the power law with a more rapid drop-off. The sharper the drop-off in a collaborative tagging graph, the more closely the tagging resembles classic hierarchical, mono-faceted indexing.

Figure 5.16: A power law curve (Bianchini, 2007)
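Whether a given tag distribution approximates the 80–20 principle can be checked with a few lines of arithmetic. The following Python sketch computes the share of all tag assignments accounted for by the most frequent fifth of the distinct tags. The counts are invented for illustration and do not reproduce the data behind Figure 5.15; the function name and the 20 % default are assumptions.

```python
from typing import Dict


def head_share(tag_counts: Dict[str, int], head_fraction: float = 0.2) -> float:
    """Share of all tag assignments covered by the most frequent tags.

    `head_fraction` is the proportion of distinct tags treated as the 'head'.
    """
    ranked = sorted(tag_counts.values(), reverse=True)
    if not ranked:
        return 0.0
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


if __name__ == "__main__":
    # Hypothetical counts: two dominant tags and a tail of rarely used ones.
    steep = {"css": 50, "web": 30, "design": 5, "html": 4, "blog": 3,
             "tips": 3, "howto": 2, "fun": 1, "toread": 1, "misc": 1}
    flat = {f"tag{i}": 10 for i in range(10)}
    print(head_share(steep))
    print(head_share(flat))
```

A value close to 0.8 indicates a steep, power-law-like drop-off, while a value close to 0.2 indicates that tags are used about equally often, as one would expect for a strongly multi-faceted resource.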

5.6 Navigating folksonomies

There are two ways to navigate through folksonomies (Echarte et al. 2007: 2). The first one is the plain search-and-refine approach, offered on BibSonomy, LibraryThing and other social bookmarking sites. Tag search can also be achieved with OpenSearch when a folksonomy is hooked up to the metadata search fields within reach of the federated search.

Figure 5.17: A simple tag cloud (Hassan-Montero and Herrero-Solana, 2006: 2)

The main method for navigating folksonomies is tag visualisation (Echarte et al. 2007: 2). There are three main types of tag clouds, described below in rank of popularity (Lamantia, 2006):

o The tag size can represent the number of items to which a tag has been applied, exhibiting the popularity of each tag. This approach is used on Flickr and Technorati.
o The tag size can represent the number of times the tag has been applied to a single item. An example is the artist–genre connections on Last.fm.
o The tag can be used for categorisation, where the size of a tag represents the quantity of content items in a single category.

Tag clouds ‘perform a very valuable function without undue complexity’ (Lamantia, 2006). They are ‘a collection of labels referring to a cluster of aggregated concepts’, giving rise to a semantic field, that is, a collection of concepts connected to a main focus ‘in a form that is now independent of the originating taggers, and available to other people for understanding’. In this sense, tag clouds constitute a ‘visible and actionable information environment’ (ibidem).
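In all three types of tag cloud, a count is translated into a display size. A minimal Python sketch of such a mapping is given below. The logarithmic scaling, the point sizes and the function name are assumptions made for illustration; the sites mentioned above may well size their tags differently. Counts are assumed to be at least 1.

```python
import math
from typing import Dict


def font_sizes(tag_counts: Dict[str, int],
               min_pt: float = 10.0,
               max_pt: float = 32.0) -> Dict[str, float]:
    """Map each tag's count to a font size between min_pt and max_pt.

    A logarithmic scale is used so that a few very popular tags do not
    shrink all remaining tags to the minimum size.
    """
    if not tag_counts:
        return {}
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    low, high = min(logs.values()), max(logs.values())
    if high == low:
        # All tags are equally frequent: give them all a middle size.
        middle = round((min_pt + max_pt) / 2, 1)
        return {tag: middle for tag in logs}
    scale = (max_pt - min_pt) / (high - low)
    return {tag: round(min_pt + (value - low) * scale, 1)
            for tag, value in logs.items()}


if __name__ == "__main__":
    print(font_sizes({"web": 100, "css": 10, "blog": 1}))
```

The same function serves all three cloud types; only the meaning of the count changes (items per tag, applications of a tag to one item, or items per category).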

42 6. LIMITATIONS TO AND QUALITY CONTROL OF UNCONTROLLED FOLKSONOMIES

6. Limitations to and quality control of uncontrolled folksonomies

Chapter 5 focussed on the positive aspects of tagging and annotating digital resources, and reported successful utilisation of user-generated content. This chapter will go through a range of problems with collaborative tagging, and then introduce the concept of recommender systems, in an attempt to demonstrate how they can solve some of the problems with uncontrolled user-centred tag categorisation of information.

6.1 Linguistic problems

The linguistic problems are the most pronounced, since a plain, non-semantic folksonomy, as previously mentioned, is all-inclusive.

6.1.1 Exactness, precision and basic level variation (Shirky, 2004; Stock, 2007)

Perhaps the greatest problem with tags is their lack of precision. According to Shirky (2004), however, this is more a ‘function of user behaviour’ than a problem with tags themselves: Delicious “allows both hierarchical tags, of the weapon/lance form, as well as compounds, as with ‘SocialSoftware’”, so the freedom to choose either solution is available. Stock (2007) exemplifies the problem with basic level variation: ‘Perl’ might be too specific a term, but ‘programming’ might be too general. The specificity sought can vary, and users who do not know which level of specificity will best increase findability can be confused about how exact a tag should be.

6.1.2 Problems with consistency (Beck, 2007)

Beck (2007) draws attention to his own ‘SLS’ tagging principle: ‘You can disregard every other piece of advice I give you, but consistency is the single thing you must strive for when tagging. Everything else is window dressing.’ Beck goes on to list ‘three things that I think almost every tag should be, regardless of what program or system you’re using’:

o Succinct: Short, simple and memorable
o Lowercase: Adds to the consistency
o Singular: If everyone abides by this rule by using singular words by default, it reduces the questions you may have when searching the tags. “Did I tag that photo ‘people’ or ‘person’?” Guy and Tonkin (2006) call this ‘tags that do not follow convention’.

6.1.3 Polysemy and capitonyms (Sood et al. 2007)

Polysemy refers to the fact that a word can have many (‘poly’) related meanings or senses (‘semy’). One example is ‘window’, which can refer both to a hole in the wall and to the pane of glass that resides within it (Pustejovsky and Bouillon, 1995). Analogously, when an item is tagged ‘caterpillar’, it is unknown whether it concerns entomology or construction equipment (Sood et al. 2007).

‘Polish’ as a nationality versus ‘polish’ as an activity is an example of a capitonym: words that share the same spelling but take on a different meaning when capitalised.

6.1.4 Homonymy (Golder and Huberman, 2005)

Homonyms are words that share the same spelling but have unrelated meanings, as with ‘filtering’ in water filtering versus Bayesian filtering. Homonymy is less of a problem than polysemy: ‘homonyms can be largely ruled out in a tag-based search through the addition of a related term with which the unwanted homonym would not appear’ (Golder and Huberman, 2005: 2).

6.1.5 Plural noun forms (Guy and Tonkin, 2006; Beck, 2007)

Almost 90 % of the Flickr and Delicious tags studied by Guy and Tonkin (2006) are nouns. However, it is a conundrum whether to use a plural or a singular noun form in a tag. For personal retrieval, it makes sense to be consistent. When a VRE-active researcher is using someone else’s tags in his or her own knowledge discovery, consistency is even more important. To avoid problems with plurals, the PennTags project (University of Pennsylvania, 2008) advises taggers to use only plural nouns. A search on ‘book’ will then bring up the ‘books’ tag, since the search automatically includes wildcards: *book*. By contrast, Beck (2007) holds that tags should be kept singular: ‘By consciously making an effort to keep everything single, you will improve your ability to find exactly what you are looking for, which is what tags are all about. There are certainly times when you should use a plural tag; just make it count.’
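The wildcard behaviour described for PennTags can be illustrated with a minimal sketch. It assumes that *book* amounts to a case-insensitive substring match; the function name and the sample tags are hypothetical and are not taken from the PennTags software.

```python
def wildcard_search(query, tags):
    """Return the tags that contain the query as a substring (i.e. *query*)."""
    needle = query.lower()
    return sorted(tag for tag in tags if needle in tag.lower())


# Hypothetical sample tags, for illustration only.
sample_tags = {"books", "Bookbinding", "library", "notebook", "archive"}
print(wildcard_search("book", sample_tags))
# → ['Bookbinding', 'books', 'notebook']
```

The plural ‘books’ is found from the singular query, but so is ‘notebook’: under this assumption, wildcard matching buys recall at some cost in precision.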


Guy and Tonkin (2006) agree that tags should be singular: plural words do not follow system conventions. Among their findings is that almost 8 % of Flickr tags and over 11 % of Delicious tags were plural forms.

Spiteri (2007) examines the linguistic structure of tags gathered from Delicious, Furl and Technorati, which all publish daily tag logs. These tags were collated and analysed for patterns against the National Information Standards Organization (NISO) guiding principles for devising traditional controlled vocabularies, such as subject indexing schemes, thesauri, taxonomies and the Library of Congress subject headings. Some of the NISO entities are shown in Figure 6.1. The outcome of the Spiteri review is that the collaboratively constructed tags correlate closely with the NISO recommendations and suggestions regarding ‘types of concepts expressed, the predominance of single terms and nouns, and the use of recognised spelling’. In other respects, collaborative tagging runs into difficulties, for instance the erratic and inconsistent use of count nouns and the prevalence of imprecise tags such as homographs, abbreviations and acronyms. The survey points to a need for clearer guidelines for these tag categories, so that ‘folksonomies could serve as a powerful, flexible tool for increasing the user-friendliness’ (Spiteri, 2007: 13).

Figure 6.1: List of NISO vocabulary elements (Spiteri, 2007: 25)


6.1.6 Synonymy (Shirky, 2004)

‘Self-portrait’ versus ‘me’ on Flickr (Shirky, 2004) and conjugated forms such as ‘am’, ‘is’ and ‘are’ (Stock, 2007) are examples of synonyms that have been found as separate tags. Shirky (2005a) talks about a ‘signal loss’ when terms are forcibly merged in order to achieve a false non-binary classification. Shirky contends that there is no such thing as synonyms in folksonomies: people talking about ‘queer’ and ‘homosexual’ are choosing different words for a reason and do not perceive them as synonyms, but as belonging to very dissimilar concept spaces. By ‘reducing terms such as movies, film, and cinema to one all-encompassing category, the distinctive meanings of each term gets lost in the translation’ (Kroski, 2005). Kipp and Campbell (2006) analyse the collaborative tagging patterns on Delicious, and through frequency data and synonym analysis find that distances between synonymous tags are much the same as in controlled indexing. Traditional classification systems either coordinate terms across an array or relate them hierarchically along a chain, but the authors show that Delicious tagging uses the unconventional time dimension, where users tag for short-term needs (ibidem, 10).

6.1.7 Spaces, symbols, acronyms and word collocation (Guy and Tonkin, 2006; Hsieh et al. 2006)

NewYork, newyorkcity or New_York: the question is which alternative is more valid. Guy and Tonkin (2006) mention ‘the confusion inherent in folksonomic tagging’: over 10 % of the sampled Delicious tags were user attempts ‘to make compound words without simply concatenating words together, but by putting a symbol or a piece of punctuation inside the tag to represent a space’. Many users bunched a word and a number together, for instance ‘july14’.

This was particularly interesting, because some users appeared to be attempting to establish a hierarchical structure by building up a ‘pathway’ within the tag. For example, a user tagging several web pages within Delicious on the subject of programming languages might tag one topic as ‘Devel/C++’, a second as ‘Devel/BASIC’, a third as ‘Devel/Perl’, and so on (Guy and Tonkin, 2006)

Figure 6.2: Compound word separators on Delicious (Guy and Tonkin, 2006)


The authors find that no community consensus has been reached on how to treat these ‘non-breakable spaces’, but the hyphen (-) is most popular, followed by the underscore (_). For acronyms, MIT can mean either ‘Made in Taiwan’ or ‘Massachusetts Institute of Technology’ – or something else (Hsieh et al. 2006). But is MadeInTaiwan in CamelCase better? It comes back to compound words and ambiguous tag concatenations. As regards word collocation, Kipp and Campbell (2006) studied relationships between tags that are often used together, but there are no rules for the sequences of words that can or cannot co-occur. Guy and Tonkin (2006) highlight the dissociation of adjectives from nouns in English. When tags are used to describe a ‘black’ ‘cat’ and a ‘white’ ‘dog’, for example, the single-word tags lose their meaning: there is no way to indicate that ‘black’ belongs with ‘cat’ and ‘white’ with ‘dog’:

With regard to compound words, private conventions are chosen by individuals for indicating relationships within an otherwise flat namespace, but these indications are applied for personal use, are not standard and can not therefore be leveraged to any common advantage. (Guy and Tonkin, 2006)

6.1.8 ‘Meta noise’ (Guy and Tonkin, 2006)

Inaccuracy, misspellings, irrelevancy, inconsistency, ‘sloppy tagging’, tag spamming and other types of ‘electronic vandalism’ damage a folksonomy and lead to so-called meta noise. Misspellings or irrelevant tags can be partially eradicated with tag recommenders (Hsieh et al. 2006). According to Guy and Tonkin (2006), 40 % of Flickr tags and 28 % of Delicious tags are either ‘misspelt, from a language not available via the software used, encoded in a manner that was not understood by the dictionary software, or compound words consisting of more than two words or a mixture of languages’. Guy and Tonkin (2006) believe that ‘sloppy tags’ in a folksonomy will automatically be weeded out over time, as the tagging engine improves and people adjust their tags when they see how others tag. It would be dangerous to try ‘tidying up tags’ (Guy and Tonkin, 2006), since ‘folksonomies enhance exploration; taxonomies enhance searching’ (Gahran, 2005).


6.2 Other problems

A few other limitations to non-semantic tagging are introduced in this section.

6.2.1 Limited life cycle (Chi and Mytkowicz, 2006)

Chi and Mytkowicz (2006: 8) show that tags have a limited life cycle; they become more diverse over time and lose their descriptive efficiency. As Samuel (2006) points out, tags are not permanent and change naturally over time. Or, as Rosenfeld (2005) asserts, it is ‘a safe bet that no one will bother to go back and re-tag their photos with more precise terms’.

6.2.2 Low scalability (Rosenfeld, 2005)

The quality of a folksonomy tends to decrease as new tags are added. For instance, if information about television is only tagged ‘tv’ and someone searches for ‘television’, the item will not be retrieved. Rosenfeld (2005) exemplifies the scalability problem further: if a user adds the tag ‘summer’ to an image on Flickr, the image is added to a tag category that has already been used thousands of times, and the problem grows bigger: ‘Hard to browse now, harder when there are 60,000 photos a year from now.’

6.2.3 Non-hierarchical, ‘flat keywords’ (Anadiotis et al. 2007)

As Anadiotis et al. (2007) stress, retrieving information by typing keywords was not invented yesterday, and tags are still a raw collection of labels allocated to longer pieces of text, to images, to multimedia resources, or to any other type of material that needs to be indexed and that would benefit from quick and easy retrieval. It all comes down to the fact that tags lack hierarchical or semantic relations, and in the end they are a ‘flat collection of keywords’ (Anadiotis et al. 2007). Shirky (2005a) counterclaims that tags only look flat in the same way as Venn diagrams look flat on paper. Taking the idea of Boolean logic [78] further, Shirky shows that tags are highly multi-dimensional: if something is tagged ‘Virtual Research Environment’, this opens up for the AND, OR and NOT operators, e.g., Virtual OR research NOT environment. For this to work in practice, only one more tag needs to be assigned.
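Shirky’s point about Boolean combination can be sketched with ordinary set operations. The sketch assumes that each word is stored as a separate tag and that the query is read as (virtual OR research) AND NOT environment; the tag index and the item identifiers are hypothetical.

```python
# Hypothetical tag index: each tag maps to the set of items that carry it.
tag_index = {
    "virtual": {"doc1", "doc2", "doc4"},
    "research": {"doc2", "doc3"},
    "environment": {"doc2", "doc4"},
}


def items(tag):
    """Return the set of items carrying a tag (empty if the tag is unknown)."""
    return tag_index.get(tag, set())


# OR is set union, NOT is set difference, AND would be set intersection.
result = (items("virtual") | items("research")) - items("environment")
print(sorted(result))
# → ['doc1', 'doc3']
```

Each additional tag adds another set that can be combined with the existing ones, which is one way of reading the claim that a flat list of tags behaves multi-dimensionally.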


6.2.4 ‘Mob indexing’ (Morville, 2006)

Morville (2006) describes Delicious tagging as ‘mob indexing’. It follows a pattern of being ‘a response from the user’ (‘cool’ or ‘to read’) ‘rather than a statement of the aboutness of a document’ (Kipp and Campbell, 2006: 10). What these tags do, however, is shed some light on ‘the energy that an individual user throws against a knowledge structure’ (ibidem). It is hoped that the same energy can be found in research communities, given well-balanced incentives and benefits that retain researchers’ enthusiasm for improving tagging. The impetus for this might be the mutual satisfaction of using the tags and annotations created by colleagues, provided that the tags are kept vivid and up-to-date and maintain appropriate precision in subject matter ‘aboutness’. Quintarelli (2005) asserts that tags and folksonomies are a ‘forced move’ with no way to opt out. Folksonomies have already emerged and are a part of the ‘mass amateurisation of Web publishing’, e.g., blogs, which do not generally address ‘the masses’ but ‘a small circle of readers, usually friends and colleagues’ (Shirky, 2002). This leads to a ‘mass amateurisation of cataloguing’, a trade-off between structured centralised classification and no classification at all (Quintarelli, 2005), which means that ‘participating in the conversation is its own reward’ (Shirky, 2002).

6.3 Problems with tag clouds

Tag clouds are both ambiguous and undifferentiated, and as emphasised by Lamantia (2006), the user needs to put a given tag cloud into an understandable and ‘proper context in order to understand the cloud effectively’. This is necessary whether the objective of the user interaction with the cloud is to find related items, to identify and contact collaborators, or for ‘surveying the thinking within a knowledge domain’ (ibidem). Van Rijsbergen’s cluster hypothesis states that when two documents have similar content, the likelihood that they are relevant to the same information need or query is high (Crestani and Wu, 2006). Crestani and Wu find that this still holds true in highly heterogeneous information environments such as folksonomies. The currently most prevalent, ‘first generation’ tag clouds have low complexity and are not semantically coded; thus there is no way of automatically pre-clustering and excluding words with the same overarching meaning. For instance, both ‘java’ and ‘javascript’ have high weight in the cloud, when clearly their meanings are closely interconnected. A few popular topics will often dominate the whole cloud when inclusion is based solely on tag use frequency (Hassan-Montero and Herrero-Solana, 2006: 1). These dominating topics have a very high ‘semantic density’, i.e., a whole range of computer words are included, when it would have been more representative to replace some of them with minority tags and achieve a more heterogeneous composition of tag cloud words.
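The dominance of popular topics can be illustrated with a minimal sketch of a ‘first generation’ tag cloud, in which inclusion and font size depend solely on how often a tag is used. The linear scaling, the point sizes and the sample data are assumptions made for illustration; they are not taken from any of the systems cited.

```python
from collections import Counter


def cloud_sizes(tag_uses, top_n=5, min_pt=10, max_pt=30):
    """Map the top_n most used tags to font sizes scaled linearly by use count."""
    top = Counter(tag_uses).most_common(top_n)
    if not top:
        return {}
    counts = [count for _, count in top]
    low, high = min(counts), max(counts)
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {
        tag: min_pt + (count - low) * (max_pt - min_pt) // span
        for tag, count in top
    }


# Hypothetical tag log: three computing tags and two rarely used minority tags.
uses = ["java"] * 40 + ["javascript"] * 35 + ["php"] * 30 + ["folklore"] * 2 + ["maps"]
print(cloud_sizes(uses, top_n=3))
# → {'java': 30, 'javascript': 20, 'php': 10}
```

With selection by frequency alone, the minority tags ‘folklore’ and ‘maps’ never reach the cloud, while closely related computing tags occupy every slot.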

50 7. ATTEMPTING TO SOLVE SOME OF THESE PROBLEMS : QUALITY CONTROL OF USER -GENERATED CONTENT

7. Attempting to solve some of these problems: Quality control of user-generated content

Chapter 6 displayed the views of tagging sceptics and outlined the genuine linguistic limitations of an uncontrolled vocabulary. Without restricting the freedom of collaborative tagging, people can be instructed to move towards more rational tagging; however, Hunter et al. (2007: 2) wonder if there is a way to ‘carry out quality control of the community metadata without adversely impacting on the spontaneity and simplicity of the tags, and [doing so] with minimal cost and effort’.

7.1 Tags combined with a controlled vocabulary

One way of choosing conventional tags is with the help of a controlled vocabulary, from which terms can be retrieved and synonym relationships mapped, if only for inspiration. The JISC project ‘Enhanced Tagging for Discovery’ (UK Office for Library and Information Networking, 2008) investigates information indexing and retrieval aspects of unrestricted social tagging versus social tagging in combination with a controlled vocabulary in an academic environment. In a study on the use of controlled vocabularies and the feasibility of cross-language subject access in European libraries, Schmidt-Supprian (2007) finds that two-thirds of libraries complement controlled vocabularies with manually assigned keywords, especially for digitised material and online publications. There exists

a plethora of idiosyncratic controlled vocabularies: up to 118 of them, representing 63 ‘base’ tools. These bases ranged over 15 different classifications, roughly divided between a general and a specialist scope, 17 different pick lists, most for special collections, 16 subject heading languages, usually generalist subject access tools of national importance, and 15 thesauri, again both of generalist and of specialist scope (Schmidt-Supprian, 2007: 58).

7.1.1 WordNet

Shirky (2005a) shows that collaborative tagging can lead to the construction of controlled vocabularies. According to Shirky, this has already happened, for instance in business research. Another example is eBay, where the collector communities construct their own terminologies, vocabularies and taxonomies. In the ‘blogosphere’, too, the value of a ‘giving and taking’ economy of tags has been widely embraced.


WordNet [79] is a database and ontology of current English natural language, developed in the cognitive science laboratory at Princeton University through studies into human lexical behaviour and memory. It is free to download and use. In 2006, WordNet contained around 150,000 words, grouped into ‘synsets’ or ‘synonym rings’, i.e., words which can be used interchangeably without changing the semantic value. A valid synset can comprise the words ‘person’, ‘human’ and ‘individual’, and all these words are aliases for each other. WordNet as a ‘Word Sense Disambiguation’ (WSD) tool is employed in a tag recommender system developed at the University of Bari, Italy (Basile et al. 2008: 27). WordNet is also used when annotating blogs (Berendt and Hanser, 2007: 1), and it has been used to compare and contrast folksonomies with controlled vocabularies in ‘adaptive e-learning systems’, in order to detect words that are not in WordNet but are needed in e-learning.

The CommonFolks project (Bateman et al. 2006a) attempts to facilitate the creation of this semantic metadata. The aim is to describe concepts, the properties of concepts, and the relationships between them. CommonFolks ties words not listed in WordNet to the nearest equivalent WordNet synonym. Words are added through a ‘community consensus’, and the user can add an annotation, which is matched with annotations already made. Over time, this community consensus might lead to an increased level of disagreement, and this is something that needs further study, as the authors acknowledge.
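The idea of a synonym ring can be sketched as a simple lookup in which every member of a ring resolves to one shared representative. The rings below are hand-made for illustration (the first repeats the example above, the second the ‘tv’ and ‘television’ case from section 6.2.2); they are not read from the WordNet database, and all function names are hypothetical.

```python
# Hypothetical, hand-made synonym rings; not extracted from WordNet itself.
SYNSETS = [
    {"person", "human", "individual"},
    {"tv", "television"},
]


def build_lookup(synsets):
    """Map every word in a ring to the ring's alphabetically first member."""
    lookup = {}
    for ring in synsets:
        representative = min(ring)
        for word in ring:
            lookup[word] = representative
    return lookup


LOOKUP = build_lookup(SYNSETS)


def canonical_tag(tag):
    """Resolve a tag to its ring representative; unknown tags are only lowercased."""
    lowered = tag.lower()
    return LOOKUP.get(lowered, lowered)


print(canonical_tag("TV"), canonical_tag("Individual"), canonical_tag("Perl"))
# → television human perl
```

Such a merge is lossy in the sense of Shirky’s ‘signal loss’, so a system built along these lines would probably keep the original tag next to its representative rather than overwrite it.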

7.1.2 Ontology-directed folksonomies and tag normalisation

Different control mechanisms have been put forward (Hunter et al. 2007: 4). Taxonomists can go in, add syntax and structure, and ‘supervise’ the folksonomy, or the ontology-directed approach can be used, where users are recommended tags but can still add their own. Folksonomy words can then be clustered over time, either a priori (by the users) or a posteriori (by clustering algorithms). Another way to achieve more sensible tagging is ‘tag normalisation’, a technique whereby a tagging tool modifies ‘raw’ tags: e.g., ‘ChocolateChipS’ is normalised into ‘chocolatechips’ through a lowercasing operation (Luk, 2005a: 3). Some taggers will be upset when their raw tags are modified. Flickr, for example, shows raw tags when a user browses photos but converts inappropriate spacing, HumpCasing, et cetera, when users search the tags, to avoid confusion when someone uses someone else’s tags to search for items.
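Tag normalisation of this kind can be sketched in a few lines. The sketch assumes that normalisation means lowercasing and dropping every character that is not a letter or a digit; the function name is hypothetical, and the actual rules applied by Flickr or by any other tagging tool may differ.

```python
def normalise_tag(raw):
    """Lowercase a raw tag and drop every non-alphanumeric character."""
    return "".join(ch for ch in raw.lower() if ch.isalnum())


for raw in ["ChocolateChipS", "New_York", "NewYork", "new york"]:
    print(raw, "->", normalise_tag(raw))
# ChocolateChipS -> chocolatechips
# New_York -> newyork
# NewYork -> newyork
# new york -> newyork
```

Keeping the raw tag for display and using the normalised form only for matching mirrors the browse and search distinction described above.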


7.2 Tag suggestion mechanisms and recommender systems

Sensible tagging can be learned but might never be intuitively flawless. A system that delves into the folksonomy and delivers appropriate tags every time a user is in the process of creating a new tag could be useful. Some type of ‘quality control’ of tags and annotations is important both when creating them and when searching them. Section 5.5.2 demonstrates that one problem with a broad folksonomy is that people might use many different words to describe the same thing. Many words are so sporadically used as tags that they will never appear in a tag cloud, due to low ‘semantic density’ (Hassan-Montero and Herrero-Solana, 2006: 1). The concern with ‘the long tail’ is that although a great many unique search queries are performed, most of them are never repeated. The only chance to explore these infrequently used tags is by implementing some kind of randomised folksonomy viewer to facilitate undirected meandering through an ocean of tags. Recommender systems have grown increasingly important because there are so many resources available: the user needs to separate the wheat from the chaff but has no opportunity to do so unaided. Recommender systems learn the user’s taste and recommend similar items from a large number of choices that users would never have time to navigate (Resnick and Varian, 1997).

7.2.1 A recommender system for IVRLA

The OJAX federated search will include a recommender system, which is currently in its early stages of development. This system will suggest and recommend similar images, texts, et cetera, based on metadata keyword similarity between digital objects. IVRLA interface development is contained in ‘Work Package 5’, and one of the objectives is to co-operate with the School of Computer Science and Informatics, University College Dublin, in the development of a research recommendation service (University College Dublin, 2008b). At least initially, this recommender will only be able to liaise between users and an existing folksonomy on a ‘read-only’ basis, i.e., it will not be able to work side by side with a tagger in the process of creating appropriate tags (Chen et al. 2008a). Caccamo (2006: 53) reports that 87 % of survey respondents in the user requirements study would find a ‘Find similar’ button helpful, and the author accentuates the strong enthusiasm for such a feature. A reason for not favouring a recommender engine is the problem with ‘the level of irrelevant material often supplied by suggestion facilities’, while a reason given by a respondent in favour of a suggestion facility was that it ‘replicate[d] the “real-world” experience in an online environment’ (ibidem, 61). On the negative side, problems of being ‘sidetracked’ to irrelevant material are mentioned, and a simple disinterest in viewing other similar material, with a preference for new material or ‘new connections’, is also commented on. The company 83 Degrees [80] focuses on improving user experience on the Internet, and declares that ‘experience has led us to the belief that you must sometimes get sidetracked in order to find your focus’ (Davidson et al. 2008). Two steps sideways might be the new way forward, also when navigating scholarly repositories.
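A ‘Find similar’ facility based on metadata keyword similarity could work roughly as sketched below. The Jaccard coefficient (shared keywords divided by all keywords) is an assumed similarity measure chosen for its simplicity, since the source does not specify the measure used in the OJAX recommender; the catalogue, its identifiers and its keywords are hypothetical.

```python
def jaccard(a, b):
    """Share of keywords that two objects have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(target_id, catalogue, top_n=3):
    """Rank the other objects by keyword overlap with the target object."""
    target = catalogue[target_id]
    scored = [
        (jaccard(target, keywords), object_id)
        for object_id, keywords in catalogue.items()
        if object_id != target_id
    ]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [object_id for _, object_id in relevant[:top_n]]


# Hypothetical metadata keywords for four digital objects.
catalogue = {
    "photo1": {"de valera", "portrait", "1920s"},
    "photo2": {"de valera", "portrait", "dublin"},
    "letter1": {"de valera", "correspondence"},
    "map1": {"dublin", "cartography"},
}
print(find_similar("photo1", catalogue))
# → ['photo2', 'letter1']
```

Objects with no shared keyword are dropped altogether, which is one simple way of limiting the irrelevant suggestions that some survey respondents objected to.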

7.2.2 Collaborative filtering

The Rijksmuseum Amsterdam web site [81] lets users tag collections for author, genre and period, and the database can then recommend similar artworks, leading to a personalised search experience in which the search results are filtered and adapted to the specific interest and knowledge level of every user (Basile et al. 2008: 26). Amazon uses item-based collaborative filtering (Herlocker et al. 2004) when recommending that a user buy additional items which other users with the same music, book or movie taste had in their check-out baskets. Not only user similarity but also the trustworthiness of users is important in recommender systems (O’Donovan and Smyth, 2005). By taking this criterion into account, the collective intelligence can mimic some of the mechanisms behind peer review as it takes place in academic journals and other publications of high intellectual renown, where the crème de la crème of research is filtered out for inclusion in highly esteemed periodicals.

Figure 7.1: Collective intelligence networking (Wikimedia, 2008b)
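The check-out basket example can be sketched as a much simplified, co-occurrence form of item-based filtering: items are recommended according to how often they appear in the same basket as the item in question. Production recommenders normalise these counts and weigh in further signals such as trust; the baskets and the function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_chosen(item, baskets, top_n=2):
    """Return the items most often found in the same basket as the given item."""
    neighbours = co_occurrence(baskets).get(item, Counter())
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical check-out baskets.
baskets = [
    ["book_a", "book_b", "cd_x"],
    ["book_a", "book_b"],
    ["book_a", "cd_x", "dvd_y"],
    ["book_b", "dvd_y"],
]
print(also_chosen("book_a", baskets))
# → ['book_b', 'cd_x']
```

The same counting could be applied to tags instead of purchases, with ‘baskets’ read as the sets of tags assigned to individual items.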


7.2.3 Buzzillions

The product reviews service Buzzillions [82] uses a combination of faceted browsing (narrowing results by category, by brand or by price), a folksonomy (‘Consumers Speak’) and a recommender system (ratings and reviews of the products). Buzzillions uses the collective intelligence of the shopping community, the ‘network of trust’, to help consumers make informed decisions about the best product to suit their taste. Xu et al. (2006) propose a ‘reputation score’, a barometer of the quality of user-created tags. Buzzillions as a ‘commerce-oriented’ Social Web application is closely related to the Social Web attributes of identity (‘who are you’), reputation (‘what do people think you stand for’) and relationships (‘who do you trust’) (Connolly, 2008).

7.2.4 ZoneTag

Ames and Naaman (2007) study user behaviour on ZoneTag [83], a mobile phone application which tags and uploads images to Flickr. The rationale for tagging photos is ordered in a matrix with the axes ‘sociality’ and ‘function’. The study shows that tagging activity increases if the user is given the opportunity to tag at the point of capturing the image. Ames and Naaman acknowledge that a tag suggestion scheme can encourage tagging but that, at the same time, ‘users can be confused or alarmed by inexplicable tags’, and they might also seize this fast-track solution and accept a recommended tag even though it might not be the most suitable one (Ames and Naaman, 2007: 9).

Figure 7.2: Reasons for tagging images on Flickr (Ames and Naaman, 2007: 6)


7.2.5 Tagging sensibly on Delicious

There are many advisory articles on how to tag in a sensible, appropriate and consistent way. One, written by Samuel (2006), suggests twelve steps for ‘choosing effective Delicious tags’, including:

o Use Delicious to get a sense of how other social bookmarkers are tagging.
o Pick tags that have many links already, thus building on the existing body of knowledge.
o Don’t use CamelCase or spaces (blank space); this will just come out as ‘camelcase’, and ‘blank’ and ‘space’ respectively; also use underscores with care.
o For multi-word concepts, understand the difference between the compound tags ‘opensource’ and ‘canadianpolitics’: the first one is a good way of naming a multi-word concept with a unique tag; the second one is indeed also a unique tag, but it would have been better off as ‘Canada’ and ‘politics’. Spiteri (2007: 18) finds that single-term tags constitute 93 % of Delicious tags, 76 % of Furl tags, and 80 % of Technorati tags. Delicious does not allow for tags such as ‘open source’, i.e., a user is forced to use the compound ‘opensource’.
o Establish a wiki to help co-ordinate tagging efforts in a small group with common interests, and tag in a way that makes your tags (and you) discoverable by your friends and colleagues, if they are also using Delicious.

Another range of criteria for creating ‘good tags’ on web pages is established by Xu et al. (2006: 3).

o High coverage of multiple facets: generic tags (category, location, time), more specific tags and subjective tags. The larger the number of distinct facets tagged for, the higher the search recall will be. Heckner et al. (2008) find that on Connotea, nouns are most frequent, with 72 % of the tags; 15 % of tags are acronyms, 12 % are adjectives, and 1 % are numbers.
o High popularity: tags used often are less likely to be spam. The popularity aspect is equivalent to ‘term frequency’ in traditional information retrieval.
o Least-effort: Without ignoring multiple facets when constructing the tags, the total number of tags should be held to a minimum, and the number of objects and items that are tagged with the same tag combination should be small, so that most or all of the tagged objects can be reached through a minimum of clicks.
o Uniformity: Syntactic variance among taggers, e.g., ‘blog’ and ‘blogging’, or synonyms (‘cell phone’ versus ‘mobile phone’) introduces noise into the tagging instrument, but can increase recall. The authors recommend a system where any word is allowed, but similar words are bunched together group-wise.
o Exclude ‘organisational tags’, like ‘to read’ and ‘to forward’.

The Delicious tutorial on how to prepare bookmarks for inclusion on the site (Delicious, 2008a) shows the importance of useful tags. The user enters a Uniform Resource Locator (URL) and a Description; then there is a Notes field for annotations, and finally the Tags field, which lets the user specify a range of keywords thought to correspond to the search terms that other visitors are likely to use in the search box. To help the user, a range of suggested, ‘recommended’ and ‘popular’ tags is displayed on Delicious, but without interfering with the user’s own mental model of ‘aboutness’. Since the tagging instrument in IVRLA is at present integrated in Delicious, the tag recommender functionality is already there. However, a distinct and tailor-made OJAX++ tagging system with this tag recommender service as one of its components would be a more sensible solution.

Figure 7.3: Tag-creation recommendations on Delicious (2008b)

57 8. AVAILABLE OPEN SOURCE TAGGING AND ANNOTATION SOFTWARE

8. Available open source tagging and annotation software

Available open source and open standards software will be introduced in this chapter. The software packages will then be compared and contrasted in section 9.7 of the subsequent recommendations chapter. For a tabular overview of the software features, see Appendix A. An important aspect of open source software is that it tends to follow open standards and widely adopted technologies. One example is generating database query results with PHP, a scripting language for dynamically generated web pages, such as search result pages, which are generated uniquely for every specific search query. Another is the Structured Query Language (SQL), the standard language for querying databases. MySQL is a freely available, open source, multi-user, cross-platform database product which comes under a GNU General Public Licence and is very widely used for web applications. The IVRLA does not have a search facility at present. The upcoming IVRLA Search Application will have a back-end extension built on MySQL and a front-end annotation tool, collecting information into an annotation repository in order to enable ‘annotation-informed retrieval’ of repository data (Chen et al. 2008b).

8.1 Steve Tagger

The Steve.Museum project offers a PHP-driven web application that publishes a collection of items and records tags. The API is open source and allows integration with existing projects, for instance in an OJAX++ environment, with specific applicability to annotating IVRLA images, something that has already proven popular and successful in two IVRLA experiments on annotating images from the Éamon de Valera Photograph Collection (Chen et al. 2008a). Figure 8.1 shows the social tagging analysis in the Steve.Museum project. WordNet is used as one of the vocabulary sources, together with the two Getty controlled vocabularies, the Art and Architecture Thesaurus (AAT) [84] and the Union List of Artist Names (ULAN) [85].

Figure 8.1: Steve Tagger tagging analysis (Trant et al. 2007)

8.2 FreeTag

FreeTag [86] is a folksonomy-compatible modular tagging plug-in using the relational database MySQL and PHP, and running on the Linux, Apache, MySQL, PHP (LAMP) stack. It allows for a combination of ‘raw’ and ‘normalised’ (search-friendly) tags, as mentioned in section 7.1.2 on tag normalisation. A normalised tag is free of non-alphanumeric characters, periods.between.words and other anomalies, which are still allowed in raw tags (Luk, 2005a: 3).
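The pairing of raw and normalised tags can be sketched as a single relational table in which each tag assignment stores both forms. The sketch uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the table layout, column names and helper functions are hypothetical; they do not reproduce FreeTag’s actual schema or API.

```python
import sqlite3


def normalise(raw):
    """Lowercase a raw tag and drop every non-alphanumeric character."""
    return "".join(ch for ch in raw.lower() if ch.isalnum())


# In-memory database with a hypothetical one-table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (item_id TEXT, user_id TEXT, raw_tag TEXT, normalised_tag TEXT)"
)


def add_tag(item_id, user_id, raw):
    """Store the tag as typed, together with its search-friendly form."""
    conn.execute(
        "INSERT INTO tags VALUES (?, ?, ?, ?)",
        (item_id, user_id, raw, normalise(raw)),
    )


def items_for(query):
    """Search on the normalised form, so spelling variants find each other."""
    rows = conn.execute(
        "SELECT DISTINCT item_id FROM tags WHERE normalised_tag = ? ORDER BY item_id",
        (normalise(query),),
    )
    return [row[0] for row in rows]


def raw_tags(item_id):
    """Return the tags of an item exactly as their authors typed them."""
    rows = conn.execute(
        "SELECT raw_tag FROM tags WHERE item_id = ? ORDER BY rowid", (item_id,)
    )
    return [row[0] for row in rows]


add_tag("img1", "anna", "New_York")
add_tag("img2", "ben", "newyork")
add_tag("img3", "cara", "Dublin")
print(items_for("New.York"), raw_tags("img1"))
# → ['img1', 'img2'] ['New_York']
```

Searching runs against the normalised column while display uses the raw column, so taggers keep their own spelling without fragmenting retrieval.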

Figure 8.2: The FreeTag-powered tagging service on Eatlunch.at (Luk, 2004)

FreeTag is used by the site BlogSkins [87], a community of blog template designers and bloggers in search of a new background theme. It is also used on Eatlunch.at [88], a site by the FreeTag developer, a former Yahoo! employee. Eatlunch.at lets users tag their favourite lunch restaurants. The local events guide Upcoming [89], which was also developed by the FreeTag team, uses the FreeTag system. The tag engine was implemented on the Upcoming site in less than two days (Luk, 2005b).

8.3 RichTags

Building on W3C standards, RichTags [90] is a semantic tagging instrument funded by the JISC Digital Repositories Programme and developed at the University of Southampton. RichTags is the topic of a Master’s dissertation by Fountopoulos (2007). The RichTags project ran between February 2007 and January 2008. The RichTags Web Service uses the Web Ontology Language (OWL) format and queries the data via a Joseki engine built on the SPARQL Protocol and RDF Query Language.

Figure 8.3: Tag vocabulary window in RichTags (Fountopoulos, 2007: 34)

RichTags goes beyond arbitrary flat keywords and towards an unabridged collection of alternative labels and semantic relations. Semantic tagging can offer 100 % precision, but the user might prefer to rely on tags from UCD colleagues in the School of Information and Library Studies rather than on tags from someone working in the School of Sociology, in a scenario where both schools have aggregated their institutional repositories.


8.4 Annotea

A proposed W3C standard for collaborative annotations on the Semantic Web, Annotea [91] uses an annotation schema based on the Resource Description Framework (RDF), a W3C standard data model for representing metadata, e.g., annotations. It uses the XML XPointer registry to find the annotations, which can be stored locally or on a public annotation server. Amaya [92] is the first client implementation. Annozilla [93] is an annotation client plug-in for Mozilla-compatible browsers, which is still under development, while Annotatio [94] is a JavaScript server and client implementation.

Figure 8.4: The Annotatio annotation client (World Wide Web Consortium, 2005)

8.5 Multivalent, Fab4, CommentPress, SharedCopy and Fleck

Originally set up in 2001 as the Multivalent browser, Fab4 is a document viewer that works as a standalone browser and can open HTML, PDF, DVI, SVG, and JPEG files without any helper application. Content is displayed in the same way as in a web browser, and any piece of text on the screen can be annotated. Annotations can be displayed in different colours, moved around on the page, and distributed to other users of the system. Annotations set to public mode will be seen by anyone using Fab4 to open a document.

Figure 8.5: The Multivalent browser with annotations (Phelps and Wilensky, 2001: 2)


The Fab4 browser is used in the Multivalent Cheshire Kepler (MultiCheK) project (Watry and Corubolo, 2007), which is a part of the JISC-funded Building a Virtual Research Environment for the Humanities (BVREH) Virtual Research Environment framework at the University of Oxford. The provenance of the annotations is guaranteed through an XML attached to the annotation. Various identifiers are used to attach the annotation to the document it describes, making annotations independent of location and file format (Phelps and Wilensky, 2001).

CommentPress [95] is an open source annotation plug-in for the blog tool and publishing platform WordPress [96]. However, it is not transferable to other web sites or projects. Fleck [97] lets anyone place annotations on any web page without registration. Annotations can be sent via email to others, or uploaded to a blog. Fleck is freeware but not open source. Riley (2007) outlines a practical difference between Fleck and the similar SharedCopy [98] web annotation system: ‘Fleck loads a fresh copy of a marked site [while] SharedCopy shows a cached version of the page as it was at the exact time it was marked for annotation. The feature guarantees that what a later visitor sees is exactly what the original user intended. Links are provided to the original source site.’

8.6 The Open Annotation and Tagging System

Ostensibly, there are multiple examples of web annotation systems. However, Bateman et al. (2007) emphasise that an open source, open standards system that combines in-line annotations and tags in a web service interface has not been seen before. The novel value-added functionality delivered by the Open Annotation and Tagging System (OATS) [99] is that it supports both tag metadata and longer user-generated annotations. OATS was developed by Professor Gordon McCalla and MSc student Scott Bateman at the University of Saskatchewan, Canada. It was awarded ‘first place demonstration’ at the 2007 Learning Object Repository Network conference. OATS is being used both locally and with research partners in Pittsburgh to better connect online learners through web-based education content.

Figure 8.6: The OATS annotation interface (Bateman et al. 2006b: 5)

The OATS Web Service consists of a Java servlet which communicates with a MySQL database. The OATS Client is a library of JavaScript files, communicating via AJAX. Bateman et al. (2007: 1) show how OATS has been used for user-generated metadata in Virtual Learning Environments (VLEs) and Learning Management Systems (LMS). OATS is closely related to the iHelp [100] collective intelligence student support system for e-learning, and to the Annotations for Education (AnnotatEd) system for navigating educational resources at the University of Pittsburgh (Farzan and Brusilovsky, 2006). Bateman et al. (2007) find that students who use tags climb from the ‘consumption level of learning’ (knowledge and comprehension) to more meta-cognitive levels of application and analysis. Bateman et al. conclude that ‘metadata is best created if it focuses on a particular goal, is contextualised to a particular user and is created in an ambient manner by observing the actions and interactions of students in learning environments’ (ibidem). ‘Community highlights’ are sections that multiple readers have indicated as important. OATS can be used with the WordNet ontology to achieve a tag recommender facility via the “Community tags” and “Others’ notes” options.

Figure 8.7: Searching tags in OATS (Bateman et al. 2006b: 7)

Figure 8.8: Tagging in OATS (Bateman et al. 2006b: 6)


Figure 8.9: The OATS tagging and annotation control panel (Bateman, 2008)

8.7 Harvesting and Aggregating Networked Annotations

Harvesting and Aggregating Networked Annotations (HarVANA) [101] is an annotation harvester using OAI-PMH and the Annotea framework, to be released in mid-2009. The Australian National eResearch Architecture Taskforce (University of Queensland, 2008) argues that a wide range of research communities demand an environment of shared resources, leading to ‘additional layers of knowledge and interpretations that can be shared’. ‘Such systems actively facilitate community discourse and can also be used to support peer-review and research quality assessment.’ HarVANA is an Australian project that will run over two years and will develop ‘secure document sharing, tagging and annotation services’ for e-research. E-research comprises online, data-intensive and collaborative research environments in a range of academic disciplines, among them astronomy, medicine, the social sciences and the humanities, as well as national, state and university libraries. A VRE is a technology used in e-research, thus the concepts are closely related. There is a belief in the Australian National eResearch Architecture Taskforce that the type of humanities material that is found on IVRLA can benefit from being tagged and annotated.


Each tag and annotation will include associated metadata, such as creator, date, tags, target, description, type and language. The browser interface can be a tag cloud, a free text search interface or a filtered search, restricted by time of creation, creator, et cetera. Tags will first be suggested from a controlled vocabulary, but new tags can be added.

Figure 8.10: Harvesting annotations with HarVANA (Hunter et al. 2007: 5)

8.8 The Ultimate Tag Warrior, Jerome’s Keywords and Dekoh

A few tag cloud tools will be mentioned. The Ultimate Tag Warrior [102] and Jerome’s Keywords [103] are two plug-ins for the WordPress blogging platform. These pieces of code can generate tag clouds from keywords assigned to blogs. Since version 2.3, WordPress has had a tagging tool built in; however, the plug-ins are interesting examples of tag cloud generators. More versatile tag cloud software comes in the form of an AJAX tag cloud JavaScript widget [104] by Dekoh, an open source and open standards ‘cross-OS desktop platform that brings several key J2EE, database and web services modules together’ (Dekoh, 2008a). This piece of code can extract tags from a database and display them back to the user. Dekoh can manage multiple data formats, such as JavaScript Object Notation (JSON) and XML. Tags can be filtered depending on when they were added, ordered alphabetically, displayed with differently coloured style sheets, et cetera.

Figure 8.11: Dekoh tag cloud widget (Dekoh, 2008b)


To display the tag cloud in a web browser and connect to the database, HTML is used. Figure 8.12 is a snippet of XML code used in the Dekoh tag cloud in Figure 8.11. The XML tag has the attributes ‘created’, ‘freq’, ‘rank’ and ‘tagtitle’. Other possible attributes could be ‘facet’, to show which facet a tag belongs to in a faceted system, or ‘WordNet synset’, if the decision is made to code for the WordNet synonym ring that a particular tag belongs to.

Figure 8.12: XML code for the sample Dekoh tag cloud in Figure 8.11 (Dekoh, 2008c)
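The attribute structure described for the Dekoh tag cloud can be illustrated with a minimal Python sketch. Only the four attribute names (‘created’, ‘freq’, ‘rank’ and ‘tagtitle’) and the optional ‘facet’ attribute are taken from the text; the element names `tags` and `tag` and all sample values are assumptions made for illustration, and the actual Dekoh format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and values are invented for illustration;
# only the four attribute names come from the description of Figure 8.12.
SAMPLE = """
<tags>
  <tag created="2008-05-01" freq="12" rank="1" tagtitle="folklore"/>
  <tag created="2008-06-15" freq="4" rank="2" tagtitle="emigration"/>
</tags>
"""

def parse_tags(xml_text):
    """Return one dict per <tag> element, with freq and rank converted to integers."""
    root = ET.fromstring(xml_text.strip())
    parsed = []
    for element in root.findall("tag"):
        parsed.append({
            "tagtitle": element.get("tagtitle"),
            "created": element.get("created"),
            "freq": int(element.get("freq")),
            "rank": int(element.get("rank")),
            # An optional attribute such as 'facet' is None when it is absent.
            "facet": element.get("facet"),
        })
    return parsed

tags = parse_tags(SAMPLE)
```

Keeping optional attributes such as ‘facet’ nullable means that existing tag data would not need to change if a faceted scheme were added later.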


9. Practical recommendations for a single tagging and annotation tool in OJAX++

This chapter will synthesise the previous chapters and recommend best practice for introducing a tagging and annotation tool in OJAX++, learning from previous research into tagging and annotating in non-academic places on the web and using this set of ‘transferable skills’ in the OJAX++ infrastructure. A set of general recommendations will be outlined, followed by software recommendations.

9.1 Introduce collaborative tagging in OJAX++ as it leads to richer metadata and increased knowledge discovery: important aspects of successful research

In any research endeavour, the activities of browsing, knowledge discovery, exploration and serendipitous search are important to inform researchers of all relevant literature on a topic, whether this information is meticulously catalogued and classified or not. One subset of the IVRLA project plan is ‘interface development’, and more specifically to ‘explore the adoption of social tagging and bookmarking tools for use within a research repository system’ (University College Dublin, 2008b). Bookmarking tools matter for research environments but were beyond the scope of this research. For social tagging, however, the limited empirical evidence available points in the direction of a richer search experience through the harvesting of collective intelligence among the user base of a VRE. Following the comparison of hierarchical and faceted classification schemes in Chapter 4, the recommendation is to combine professional and user indexing and take the best of both worlds. Faceted guided navigation is useful, but the number of different facet categories is limited, while the improvisation aspect of user-generated tags is a suitable response to this problem of reaching orderly and unambiguous ‘aboutness’. Castells’ (2000) scholarship on ‘the network society’ and Toffler’s idea of the ‘prosumer’ (Stock, 2007: 97) capture the idea of innovation and creativity, where the boundaries between content provider or producer, and content user or consumer, are blurred. This open process is fundamental in research, one of the key pillars of the knowledge economy. Extending this open process to the indexing of the information being made available is important, and this is where collaborative tagging can add a new dimension, leading to a richer metadata experience.
Academic papers are not always written with information retrieval in mind, whether by spreading keyword cues evenly over the text or by collecting them in a paragraph between the abstract and the introduction sections. This is where collaborative tagging can add extra value, adding the keywords that the authors have forgotten. It is relevant in IVRLA, but even more so in an environment with a higher proportion of academic papers than images, and should be taken into consideration when OJAX++ develops further. When Flickr started its tagging of photos, many information scientists criticised the tagging tool for its lack of qualitative findability, claiming that users had to rely on chancy exploration and serendipity, rather than searching and intent (Vander Wal, 2007). This could have been rooted in a misunderstanding of a typical user’s intent when searching tags: exploring photos is possibly more important than knowing exactly what one is looking for. There are unending possibilities for serendipitous information seeking, where perhaps the first five clicks lead to information that the user already knew, but the sixth click opens up an exciting information source that is new to the user and will prove highly useful. The scope for serendipity in IVRLA will increase over time, as the project reaches important milestones through the inclusion of more digitised content.

9.2 Encourage collaborators in VREs to create multiple types of tags in order to capture the multi-faceted context of information items

It would be rational to ask taggers to tag for content, context, and attributes, following the five-tiered typology of tags (Xu et al. 2006). This typology classifies tags into the categories content-related tags, context-related tags, tags concerned with attributes of the subject matter, subjective tags and organisational tags. Particular attention should be given to contextual metadata in IVRLA, since this type of keyword is rare in the IVRLA repositories (Chen et al. 2008a). For other types of repositories too, contextual user-generated metadata complements authoritative metadata, which tends to capture uncontested content information rather than contextual information originating on a more personal level, and which thus leaves such information outside the scope of the metadata categories in the Dublin Core and MODS schemes. Poor access to descriptive metadata can occur not only because metadata is non-existent, but also because the existing metadata was intended not as keywords for a modern information retrieval system, but for perusal in a physical archive. The user base can then add contextual annotations to images. Steve.Museum appreciates that art is often beyond the reach of objective explanations; rather it is about instilling a certain personal feeling and about appealing to the creative, interpretive and associative skills of the visitors. If these reactions can result in keywords that other visitors subliminally could identify with, the tags are useful for the entire community of museum visitors, no matter if the exhibits are enjoyed in a physical museum building or through a web site where they have been digitised. In two trials, users were asked to annotate images from the IVRLA Éamon de Valera Photograph Collection, comprising 3,000 black-and-white images. 50–60 % added caption tags, while 40–50 % added additional tags (Chen et al. 2008a). This shows a high user interest in adding personal remarks and comments to historical images. It also substantiates the findings by Caccamo (2006) that IVRLA users see an annotation service as an exciting addition.

9.3 Promote sensible tagging and introduce a ‘Tagging tips’ checklist

Empowering users to tag sensibly is beneficial to everyone involved in an evolving research environment. It has been claimed that no matter how far the standards are lowered, the collaboration aspect of information organisation ‘flaunts too many standard principles of conventional indexing’ (Kipp and Campbell, 2006: 1). This radical view is unbalanced, and oblivious to the fact that collaborative tagging is not replacing conventional indexing, and the fact that folksonomies are dynamic and improve over time. They become more precise and sophisticated, for instance by moving more towards the Semantic Web. Thus, the list of disadvantages of tagging comes with the disclaimer that it only demonstrates how tags differ from conventional indexing, and there is no reason to infer from this that authoritative metadata and indexing are to be given up altogether. Tags are often taken from users’ natural language vocabulary, and therefore they inherently adapt more rapidly to new concepts and cater for an unlimited number of labels for the same information item. A ‘Tagging tips’ page, outlining the tagging advice from section 7.2.5, e.g., the value of multi-faceted, consistent, useful and contextually rich tags, would be beneficial. The ‘succinct, lowercase, singular’ scheme suggested by Beck (2007) is useful. Guy and Tonkin (2006) talk about ‘a set of helpful heuristics that promote good tag selection, such as a checklist of questions that could be applied to the object being tagged, in order to direct the tagger to various salient characteristics’. This could also be a part of the tagging tips page, as could the advice to pick tags that already have many links, building on the existing body of knowledge (Samuel, 2006). Samuel also stresses the confusion that CamelCase tags create. Xu et al. (2006) mention the least-effort principle: tags should be kept to a minimum, without affecting the multi-facetedness of the tags created.


The uniformity aspect is also very important. In a typical OJAX++ tag search, the best option is to exclude ‘raw’ tag versions (‘architecture123’) and only include ‘normalised’ tag versions (‘architecture123’ cleaned up leading to the tag ‘architecture’) (Luk, 2005a: 3).
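The distinction between raw and normalised tags can be sketched in a few lines of Python. The rules below are assumptions chosen to reproduce the ‘architecture123’ example and are not FreeTag’s actual implementation: the tag is lowercased, every character that is not an ASCII letter or digit is dropped, and digits are optionally removed as well. The function names and sample data are hypothetical.

```python
import re

def normalise_tag(raw_tag, strip_digits=True):
    """Return a search-friendly form of a raw tag.

    Assumed rules (not FreeTag's own): lowercase, drop anything that is not
    an ASCII letter or digit, and optionally drop digits too. Accented
    letters are lost, which is a simplification of this sketch.
    """
    lowered = raw_tag.lower()
    cleaned = re.sub(r"[^a-z0-9]", "", lowered)
    if strip_digits:
        cleaned = re.sub(r"[0-9]", "", cleaned)
    return cleaned

def search_normalised(query, tagged_items):
    """Return the ids of items whose normalised tags include the normalised query."""
    wanted = normalise_tag(query)
    return sorted(
        item_id
        for item_id, raw_tags in tagged_items.items()
        if wanted in {normalise_tag(tag) for tag in raw_tags}
    )

# Raw tags are stored as entered; matching happens on the normalised form only.
items = {
    "photo-1": ["architecture123", "Dublin"],
    "photo-2": ["Archi.tecture", "harbour"],
    "photo-3": ["folklore"],
}
matches = search_normalised("Architecture", items)
```

Storing the raw form and normalising only at search time keeps the original user input available, in line with the caution against destructive tidying of tags.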

9.4 Combine encouraging, unrestricted tagging with tag recommenders and the WordNet lexicon

The creation of tags and the search for tags need to be as easy, purposeful and beneficial as possible, for all different types of material. ‘Tagging tips’ do not solve all problems with ‘sloppy tagging’. Users might, for example, not understand the tips or forget to consult them. A tag recommender system can be beneficial once implemented. As identified by Hsieh et al. (2006), inconsistency, impreciseness, irrelevance and, to a certain degree, linguistic anomalies can be partially eradicated with tag recommenders linked to a lexical mechanism, such as WordNet, which can check for misspellings. It can also display synonyms and inspire the tagger to think further in terms of the ‘aboutness’ of an information artefact. Thus, the recommendation is for the implementation of a tagging and annotation instrument that, at the point of creating the tag, first recommends tags that other users have used for that item. The second step is to let the user choose another tag and check it against WordNet, thereby making a lexical matching, i.e., ‘compare tags and concepts on the basis of the lexical representations associated with them’ (van der Sluijs and Houben, 2008: 21). This approach was used by the Dutch heritage organisation Regionaal Historisch Centrum Eindhoven (RHCe) [105] when they opened up their collections to the public and encouraged user interaction with the artefact descriptors. The wide applicability of WordNet bodes well for using it in conjunction with OJAX++ VREs. As a last step, let OJAX++ users add their own tags to a broad folksonomy through the ontology-directed folksonomy approach (Hunter et al. 2007: 4). In Virtual Research Environments, a broad folksonomy where users can tag the same resource multiple times is the most logical approach. Unlike Flickr, the users of IVRLA are not the creators of the content; they are just reacting to material created by a third party.
A narrow folksonomy would be too restrictive in a research environment where the items are highly heterogeneous. The bottom line is that users might be discouraged from tagging if they are too restricted in the words available as acceptable tags. The motivational aspect should not be interfered with. Taggers may choose different words for a reason and may not perceive them as synonyms. Furthermore, the overheads of tagging, that is, the time and intellectual input behind the tagging, do not reward the original tagger if the folksonomy does not have the ‘tight feedback loop’ between the assignment of tags and the use of the same tags that Delicious offers (Udell, 2004). The immediate gratification can inspire taggers to assign keywords to content. Care has to be taken not to attempt to tidy up tags; in the words of Guy and Tonkin (2006), this would be ‘condoning the implementation of a destructive solution that may lose valuable metadata’. It can also lead to erratic, less spontaneous tagging.
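The three-step workflow recommended in this section (suggest existing tags, check a new tag against a lexicon, then accept the user’s own tag into a broad folksonomy) can be sketched as follows. This is a minimal illustration under stated assumptions: a tiny hypothetical word list stands in for WordNet, spelling suggestions come from Python’s standard `difflib` module, and all function names and sample data are invented.

```python
from collections import Counter
from difflib import get_close_matches

# Stand-in for a WordNet lookup: a tiny hypothetical word list.
LEXICON = {"architecture", "emigration", "famine", "folklore", "harbour"}

def recommend_existing(item_id, folksonomy, limit=3):
    """Step 1: suggest the tags most often applied to this item by other users."""
    counts = Counter(folksonomy.get(item_id, []))
    return [tag for tag, _ in counts.most_common(limit)]

def check_against_lexicon(tag, lexicon=LEXICON):
    """Step 2: report whether the tag is a known word and offer near matches if not."""
    if tag in lexicon:
        return True, []
    return False, get_close_matches(tag, sorted(lexicon), n=3, cutoff=0.8)

def add_tag(item_id, tag, folksonomy):
    """Step 3: record the tag; a broad folksonomy keeps repeated tags on one item."""
    folksonomy.setdefault(item_id, []).append(tag)

folksonomy = {"letter-7": ["famine", "emigration", "famine"]}
suggestions = recommend_existing("letter-7", folksonomy)
known, alternatives = check_against_lexicon("emmigration")
add_tag("letter-7", "famine", folksonomy)
```

The lexicon check only offers alternatives and never blocks or rewrites a tag, so the tagger remains free to keep an unrecognised word.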

9.5 Extend OpenSearch syndication and aggregation to tags and annotations

Tags and annotations are technically no different from MIX, MODS or Dublin Core metadata fields, and can be discovered and harvested in the same way. OpenSearch is a convenient and standardised way to link repositories and their metadata, and it realises the idea of federated institutional repositories. The strength of OpenSearch is that it can search unstructured information resources more efficiently than SRU and SRW, as seen in 4.1.3.3. EMC’s AskOnce [106] and IBM’s Venetica [107] repository searches use proprietary technologies, and access to the content indexed and retrievable through these services is contingent on the generosity of a content provider, a type of ‘vendor lock-in’, leading to restricted access. The Alfresco [108] Content Management System (CMS) for enterprises supports OpenSearch, as do the previously mentioned Picture Australia and 700 other projects, repositories and databases. As Ogbuji (2007: 2) points out, there is ‘a long and growing list of OpenSearch tools and search engines, so there is a good chance this specification will guide how we approach search and query for Web 2.0’. The introduction of an OpenSearch-compliant IVRLA metadata search environment can be justified for these reasons. OpenSearch can make OJAX++ tags and annotations available from anywhere, not only from within the VRE, with potential synergy effects. The IVRLA repositories, as well as future OJAX++ VREs, would benefit from being added as one of around 700 search engines that can be reached from the A9 OpenSearch portal. The value of this increases as more and more digitised repositories are made available through a single VRE, and subsequently also via OpenSearch.
Were the Digital Humanities Observatory (Royal Irish Academy, 2008) to use OpenSearch for the interdisciplinary, multilingual and multimodal digital resources available through its service, this would mean increased availability of other Irish digital libraries through a single OpenSearch interface. For the OJAX federated search, which already uses OAI-PMH on a large scale, the OpenSearch Discovery of the OJAX search engine is beneficial. It enables the same rich federated search engine experience without navigating to IVRLA or other repositories in an OJAX++ environment. Since collaborative tagging should preferably be treated the same as MODS, MIX or Dublin Core metadata, the recommendation is to allow tags and annotations to be searchable from the browser toolbar as well. In a scenario where IVRLA is unknown to a particular user searching for Irish emigration or famine questionnaires, OpenSearch can enable discovery of IVRLA metadata from the UCD Delargy Centre for Irish Folklore and the National Folklore Collection, if OJAX-compatible repositories and databases are made available via OpenSearch. This opens up increased synergy by linking institutional repository metadata.
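As a rough illustration of how a tag search might be exposed through OpenSearch, the sketch below fills in an OpenSearch-style URL template. The `{searchTerms}` and optional `{startPage?}` placeholders follow the OpenSearch URL template syntax; the endpoint, the `field=tags` parameter and the helper name `fill_template` are assumptions for illustration only and do not describe an existing IVRLA or OJAX interface.

```python
import re
from urllib.parse import quote

# Hypothetical endpoint and 'field' parameter; only the placeholder syntax
# ({name} required, {name?} optional) follows OpenSearch URL templates.
TEMPLATE = "https://repository.example.org/search?q={searchTerms}&field=tags&page={startPage?}"

def fill_template(template, values):
    """Substitute OpenSearch-style placeholders; optional ones may be left empty."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            # Percent-encode the value so that spaces and symbols are URL-safe.
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)

url = fill_template(TEMPLATE, {"searchTerms": "famine questionnaire"})
```

A browser toolbar or an aggregator would only need the template from the repository’s OpenSearch description document in order to issue such queries.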

9.6 Gather tags in a dedicated tag element and allow for private tags

Tags should preferably go into one all-encompassing ‘tag element’ and not be bunched together with the professionally created metadata. This makes it easier to capture the true vividness of tags. By mapping tags to standard Dublin Core elements like ‘description’ or ‘subject’, some of the fluidity of tags is lost. A ‘comments’ field could be the standard way of expressing tagging metadata. The OJAX Repository Search already allows for search in the ‘author’, ‘title’, and ‘abstracts’ fields, or any combination of these. The repository also searches ‘comments’ data by default, and it is straightforward to extend this field to include tags and annotations so that this collaboratively constructed information can be made searchable with OpenSearch, as recommended in section 9.5. The demarcation between private and public should be clear, and an option to keep certain tags private should be in place. But private tags could also be public tags, when there is a declaration that their purpose is to be private for a certain individual, but that they can have some ‘brainstorming value’ for others. The problem occurs when private tags are made public without being declared as primarily private tags. Allowing for both tag categories might lead to users welcoming the tagging service with more enthusiasm. The percentage of tags chosen as private can then be further analysed for trends.
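One way to picture a dedicated tag element with a private/public distinction is the small Python sketch below. The class and field names, and the sample record, are hypothetical and do not reflect an existing OJAX or IVRLA schema; the point is only that user tags sit apart from professionally created fields and carry a visibility flag.

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    text: str
    creator: str
    private: bool = False  # private tags are shown only to their creator

@dataclass
class Record:
    # Professionally created metadata keeps its own fields ...
    title: str
    subject: list = field(default_factory=list)
    # ... while user tags are gathered in one dedicated element.
    tags: list = field(default_factory=list)

def visible_tags(record, viewer):
    """Return the tag texts a viewer may see: public tags plus the viewer's own private tags."""
    return [t.text for t in record.tags if not t.private or t.creator == viewer]

def private_share(record):
    """Fraction of tags marked private, as a basis for analysing trends."""
    if not record.tags:
        return 0.0
    return sum(t.private for t in record.tags) / len(record.tags)

# Invented sample record for illustration.
record = Record(
    title="Famine questionnaire",
    subject=["Famines"],
    tags=[
        Tag("famine", "anna"),
        Tag("check-later", "anna", private=True),
        Tag("mayo", "brian"),
        Tag("thesis-ch3", "brian", private=True),
    ],
)
seen_by_anna = visible_tags(record, "anna")
```

Because the ‘subject’ field is never touched by user input, the authoritative metadata stays intact whatever happens in the tag element.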


9.7 Software recommendations

The open source and open standards philosophy of OJAX++ is paramount, since it allows the local software developer to tweak the software to suit a local need and it increases software interoperability. Free licences, such as the Berkeley Software Distribution (BSD) licence and the GNU Lesser General Public Licence (LGPL), keep the cost down. The pieces of software outlined in Chapter 8 range from FreeTag, RichTags and Steve Tagger, which only cater for tags, to OATS and HarVANA, which take both tags and longer annotations, such as image captions, into account. The aim was to find one piece of software capable of taking care of both tags and annotations, with no downloads required and a user interface that can fade into the background of the browser window.

9.7.1 Use the Open Annotation and Tagging System software: it can be seamlessly integrated and offers both tagging and annotation functionality

The most suitable tagging software for IVRLA, and for other repositories using the OJAX federated search engine and the OJAX++ VRE framework, is the Open Annotation and Tagging System (OATS). The four-button toolbar in OATS integrates seamlessly into the upper right-hand corner of the browser window, and can be used to annotate, search and browse. Tag recommendations can also be made in OATS, which ties in with the importance of quality control of tags. FreeTag is able to do most of this, but its lack of support for longer user-added comments is a limitation when the aim is a single tagging and annotation tool. The traits of OATS mean that the system has the prospect of providing a discreet Java-based tagging and annotation tool in OJAX++, both for textual content, like theses, essays and pamphlets, and for visual content, such as photography collections. OATS draws on previous research in the areas of support, web annotation systems, collaborative tagging and educational metadata, and has a proven track record in managing user-generated content in a range of e-learning settings. The OATS interface does not interfere with the navigability of a repository, the system can distinguish between private and public tags, and the ‘Other’s tags’ option serves as a basic tag recommender system. As a versatile management system for user-generated content, OATS is equipped with the sought-for characteristics of a single tagging and annotation system. A further step is to use the controlled vocabulary WordNet, mentioned in section 7.1.1 and in recommendation 9.4. Looking up proposed tags in WordNet before they are finally included in the folksonomy can lead to a lower rate of misspellings and irrelevant tags, as Hsieh et al. (2006) have shown. OATS has the capability of querying databases and comparing user-generated tags with words that occur in the WordNet language lexicon. Empirical evidence by Bateman et al. (2006b) shows that OATS is operational and has proven successful in e-learning and VLEs, where students advance towards meta-cognitive levels of application and analysis. As OATS was developed with VLEs in mind, a valid question is whether it is transferable to VREs; however, the potential end-user benefits are comparable: a researcher comes to a VRE to further broaden the understanding of a topic that is under investigation. The value of the acquired information can then be shared and thus become beneficial to others through the act of adding textual commentary, that is, tags and annotations, to it. Moving from consuming learning or research material to interacting with it, in order to promote critical thinking, is the underlying theme of OATS, and this fits in with the goals and objectives of OJAX++.

RichTags goes beyond arbitrary flat keywords and towards an unabridged collection of alternative labels and semantic relations. It builds on the Semantic Web philosophy of Annotea, a project that failed to take off. RichTags has a more sophisticated foundation, building on a ‘consistency mechanism’ to uphold regularity while a folksonomy expands with more and more user tags. RichTags is promising, but it is too early to predict its success with so few evidence-based examples currently available. Fleck and SharedCopy are browser add-ons but have limited support for tagging and the subsequent search and display of tags. Fleck can only be used in a Mozilla browser environment, which limits its universal reach. The SharedCopy annotation widget is hassle-free and nothing needs to be installed; however, no way of searching the user-generated content has been found. All comments are collected on ‘microblogs’ for each user, and this limits the findability of annotations.
The annotation functionality in the Multivalent and Fab4 browsers is interesting, but the limitation is that the additional browser has to be downloaded. Keeping it simple and staying within the boundaries of an ordinary web browser seems more sensible than this proprietary solution. Steve Tagger has been used for tagging of images and is more useful in an environment where most of the artefacts are images, photographs and other visual material. For the pictorial IVRLA content, Steve Tagger could work fine, but there is also a whole range of textual material in IVRLA. Both image annotations and tags for other IVRLA media types, such as papers, letters, pamphlets, slides, manuscripts, historical maps, and ballad sheets (University College Dublin, 2008c), need to be supported by the software. Steve Tagger does not have a tool for placing longer annotations on a page, and since the Open Annotation and Tagging System offers this, it is better to choose the latter system as it serves both tasks. FreeTag has also been implemented and shown to work on various services such as BlogSkins and Upcoming. This makes it suitable as software for the OJAX++ tagging tool, but again, longer annotations are not supported in FreeTag.

9.7.2 Use a combination of cross-repository tag cloud and tag lists on individual information item pages to give the tagging tool maximum visibility

The recommendation for navigating tags is to have a simple tag cloud where the tag size represents the number of items to which a tag has been applied, as a weighted list representation of the popularity of tags. This is the most popular way of using tag clouds, and can be seen on, among others, Flickr and Technorati. The user studies by Caccamo (2006) and Rock (2008, forthcoming) show that tagging is not used frequently by researchers, and a wide overnight adoption of tagging in IVRLA is unrealistic. Nevertheless, a cloud is a good way of immediately displaying the existence of a tagging service, and possibly also a token of the breadth and depth of an emerging folksonomy. The most sensible approach under these circumstances is to have an overarching cross-repository tag cloud. The IVRLA repositories all have their focus on Ireland, from various historic aspects. The theoretical frameworks behind the digitised material in the UCD Delargy Centre for Irish Folklore and the National Folklore Collection, the Irish Dialect Archive, the UCD Archives, and the James Joyce Library Special Collections partially overlap. Visualising folksonomies through a tag cloud is realistic, since it is the de facto standard. The semantic relationships between search terms need to be taken into account in Virtual Research Environments, where relevancy and findability are paramount. However, the new search dimension opened up by collaborative tagging can contribute to this findability, taking into account the concepts of serendipity, berrypicking, exploration and discovery, which are both common and vital methods for traversing information resources in a VRE. Tag clouds as an exploratory information environment can be achieved even with only a few tags.
For the users who do not want to use a tag cloud as a point of entry to an information item, clickable tags on the same page as the items being tagged are the alternative recommendation. The ‘Tags for this work’ approach on item pages, used in Steve Tagger, is suggested for OJAX++, so that users can explore a tag in depth by clicking it. This ‘Find similar’ approach was popular in the Caccamo (2006) study, and that is also the functionality of the IVRLA recommender system (Chen et al. 2008a).
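As an illustration of the weighted tag cloud recommended in this section, the following minimal Python sketch merges tag counts from several repositories and maps each count to a font size. It is not taken from any of the software packages reviewed; the function names, the repository names, the sample counts and the pixel range are all assumptions made for the example, and linear scaling is only one of several possible weightings.

```python
from collections import Counter


def merge_tag_counts(per_repository):
    """Sum tag counts across repositories into one cross-repository tally."""
    total = Counter()
    for counts in per_repository.values():
        total.update(counts)
    return total


def font_sizes(tag_counts, min_px=12, max_px=32):
    """Map each tag's item count linearly onto a font size in pixels."""
    if not tag_counts:
        return {}
    lowest = min(tag_counts.values())
    span = max(tag_counts.values()) - lowest
    sizes = {}
    for tag, count in tag_counts.items():
        # If every tag is equally popular, all tags get the smallest size.
        fraction = (count - lowest) / span if span else 0.0
        sizes[tag] = round(min_px + fraction * (max_px - min_px))
    return sizes


# Hypothetical data: the number of items each tag has been applied to.
repositories = {
    "folklore": {"famine": 8, "emigration": 3},
    "archives": {"famine": 2, "de valera": 1},
}
cloud = font_sizes(merge_tag_counts(repositories))
```

Because the counts are merged before scaling, a tag that is rare in one repository but frequent in another still appears prominently in the shared cloud, which is the point of the cross-repository approach.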

9.7.2.1 Use the Dekoh versatile widget software for creating tag clouds

For the tag cloud functionality, the WordPress tag cloud generators are tightly linked to that particular blog platform. The tag cloud widget from Dekoh is more promising. The Dekoh Desktop platform is open source and brings together a rich interface, rich functionality and rich access, ‘the three dimensions to a richer user experience’ (Dekoh, 2008a). It adds a flexible and appealing tag visualisation service, and since there are minimal overlaps with the Open Annotation and Tagging System, this software combination is valuable. The Dekoh Desktop shows its strength through a flexible JavaScript widget with many tag display options: by user, by date, by popularity, et cetera. It complements OATS well.

9.7.3 Monitor and learn from the HarVANA annotation system prototype, which has much in common with IVRLA and OJAX++

There is much space for flexibility in HarVANA, and the fact that it is built on OAI-PMH introduces the proven benefits of scalability and interoperability. By storing the annotations separately from the digital objects they describe, copyright issues can be avoided. A secure Shibboleth protocol ensures that the annotations and other metadata stay secure while transferred from the metadata server to the user running the search query. The HarVANA system has high currency and relevancy to OJAX++; however, it is still under development. The HarVANA project, and the ‘secure document sharing, tagging and annotation services’ for Virtual Research Environments developed there, should be closely monitored by the OJAX++ developers, since the project rationale, context and background story are closely related to OJAX and OJAX++. HarVANA is a promising project which can revolutionise the way tags and annotations are created, shared and discovered in community-driven VREs. Currently, Tomcat Web Application Archive (WAR) demonstrators for annotation of images and PDB Crystallography structures are available. More research is necessary after the HarVANA project has finalised the annotation tools, in order to see how they will initially be used, in which academic disciplines, and to what extent, so that the OJAX++ developers can learn from this closely related project.
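The principle of keeping annotations apart from the digital objects they describe can be sketched as follows. This is not HarVANA's actual data model, which is not documented here; the class names, the fields and the identifier are hypothetical and serve only to show that an annotation needs to hold a reference to an object, never a copy of it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    target_id: str    # identifier of the digital object, not the object itself
    author: str
    body: str
    tags: tuple = ()


@dataclass
class AnnotationStore:
    """Holds annotations only; the repository keeps the objects elsewhere."""
    _by_target: dict = field(default_factory=dict)

    def add(self, annotation):
        self._by_target.setdefault(annotation.target_id, []).append(annotation)

    def for_target(self, target_id):
        # Return a copy so callers cannot alter the store by accident.
        return list(self._by_target.get(target_id, []))


store = AnnotationStore()
store.add(Annotation(
    target_id="oai:repository.example.org:photo-42",  # hypothetical identifier
    author="researcher1",
    body="Taken outside the GPO.",
    tags=("dublin", "1916"),
))
notes = store.for_target("oai:repository.example.org:photo-42")
```

Since the store never contains the photograph or manuscript itself, it could in principle be hosted, harvested and shared independently of the repository that holds the copyrighted material.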

10. Discussion

The chapter will start by revisiting the research questions and briefly indicate whether these have been answered in the thesis. The remainder of the chapter will discuss other aspects of the research.

10.1 Research questions revisited

10.1.1 How are collaborative tagging and annotation services currently used in academic and non-academic online environments?

Chapter 5 surveyed various approaches to tagging, ranging from social bookmarking on the dominant and influential site Delicious and its many followers, to tagging of images, academic papers, books, audiovisual multimedia, et cetera.

10.1.2 Can a collaborative tagging and annotation tool benefit information classification and retrieval in OJAX++ Virtual Research Environments (VREs), and if so, how?

Chapter 4 offered a background to information search, retrieval and classification. It was a thorough introduction to hierarchical and faceted classification approaches, and it set the scene for the uncontrolled user-centred keyword creation that is collaborative tagging, which was comprehensively explained and analysed in Chapters 5–7. The answer to the first part of the research question is: yes, tagging and annotating research material can benefit information classification and retrieval in VREs. An analysis of how it can benefit these research communities showed that knowledge discovery, exploration and serendipity are facilitated, that many alternative ways to find an information item become possible, that the participation of collaborators in the indexing process leads to multiple interpretations of the same piece of information, and that fellow researchers can benefit not only from their own discoveries, but also from those of others. IVRLA user studies and general research community user studies have shown that people are broadly receptive to aspects of the Social Web, including social tagging.

10.1.3 Are there any problems with using a collaborative tagging and annotation instrument in OJAX++ VREs, and if so, how can they be resolved?

Chapter 6 outlined the problems with simple collaborative tagging, such as its inherent inability to address synonym and homonym control, but also its strength in living up to its primary endeavour and responsibility: to make research and academic information more subjectively and contextually appraised, classified and findable, when Dublin Core, MODS, MIX and other authoritative metadata schemes are not precise enough to cater for user-generated metadata. The linguistic and semantic problems can be resolved by moving towards a tag recommender system, mentioned in Chapter 7. Introducing a tagging facility does not automatically lead to wide adoption of it. On Technorati, tags are used by only a fraction of the users, and these users only tag for personal use and not for the good of the community; tagging is therefore ‘lost by definition’ (Wetzlmayr, 2005). More than half, or 56 %, of the tags (and overall content) on Digg is created by 100 unique users, while on Delicious, 30,000 users (of the total user base of over three million) are responsible for 50 % of the content (Heymann et al. 2008).
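The degree of concentration implied by the Delicious figures can be made explicit with a short calculation. The sketch below uses only the numbers quoted above; because the user base is given as "over three million", the total used here is a lower bound, so the resulting share is an upper bound. The variable names are chosen for the example.

```python
# Figures as quoted from Heymann et al. (2008) in the text above.
active_users = 30_000
total_users = 3_000_000   # "over three million": a lower bound on the real total
content_share = 0.50      # share of content created by the active users

# Share of the user base that produces half of the content (an upper bound).
user_share = active_users / total_users
summary = f"At most {user_share:.1%} of users create {content_share:.0%} of the content"
```

In other words, roughly one user in a hundred, at most, accounts for half of what is tagged. The equivalent share for Digg cannot be computed from the figures given, since the size of its user base is not stated.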

10.1.4 What are the recommendations for the implementation of a single collaborative tagging and annotation tool in OJAX++?

The recommendations were drawn up in Chapter 9. They ranged from the promotion of sensible tagging and the suggestion to combine unrestricted tagging with tag recommenders, to the recommendation that folksonomy tags benefit from tag cloud visualisation and navigation. Chapter 8 introduced the software, which was then revisited in Chapter 9 for the purpose of analysis. The Open Annotation and Tagging System was perceived as having the prospect of providing a discrete Java-based tagging and annotation tool in OJAX++, both for textual content, like theses, essays and pamphlets, and visual content, such as photography collections. OATS draws on previous research in the areas of social navigation support, web annotation systems, collaborative tagging and educational metadata, and has been used in a range of e-learning settings. It does not interfere with the navigability of a repository, and can distinguish between private and public tags. It is equipped with the sought-for characteristics of a tagging and annotation system. It builds on the open-source technologies PHP and MySQL.

10.1.5 Is there a role for the OpenSearch standard in syndicating and aggregating tags and annotations between VREs and other rich information environments?

Yes. OpenSearch is an emerging standard, a non-proprietary protocol for syndication and aggregation of search results from both structured and unstructured information sources, as Section 4.1.3.3 described. Tags are a type of metadata, which can be harvested in the OAI-PMH format in the same way as more conventional metadata, such as author of a paper, photographer, title of a book or month and year of publication. Ogbuji (2007: 2) substantiates the validity and currency of OpenSearch with the long and growing list of OpenSearch tools and search engines adopting the open and standardised way of federating search results and metadata. In a scenario where IVRLA is unknown to a particular user searching for Irish emigration or famine questionnaires, OpenSearch can enable discovery of IVRLA metadata from the UCD Delargy Centre for Irish Folklore and the National Folklore Collection, if the OJAX Repository Search database is made available via OpenSearch. This opens up opportunities for increased synergy by linking institutional repository metadata and connecting multiple rich information environments to each other. When sufficiently geared up, OJAX++ will be a multi-faceted VRE that can be used in humanities research. OpenSearch allows for easy syndication of repository-specific tags, annotations and other metadata. Indirectly, it gives the OJAX search environment a wider audience.
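To indicate how a client might query a repository that is exposed through OpenSearch, the sketch below fills in an OpenSearch 1.1 URL template, in which parameters appear in braces and a trailing question mark marks a parameter as optional. The template URL is hypothetical, since no OpenSearch endpoint for OJAX is described in this thesis, and the helper function is written for the example rather than taken from an existing library.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" marks an optional parameter.
_PARAM = re.compile(r"\{([A-Za-z][A-Za-z0-9:]*)(\?)?\}")


def fill_template(template, **values):
    """Substitute URL-encoded values into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""   # optional parameters may be left empty
        raise ValueError(f"missing required parameter: {name}")
    return _PARAM.sub(substitute, template)


# Hypothetical template, as it might appear in a description document.
TEMPLATE = (
    "http://repository.example.org/search"
    "?q={searchTerms}&page={startPage?}&format=rss"
)
url = fill_template(TEMPLATE, searchTerms="famine questionnaire")
```

A search engine or aggregator that reads the repository's description document needs nothing more than this template to send queries, which is what makes the protocol attractive for syndicating tags and annotations.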

10.2 Tagging across disciplines

VREs are highly discipline-specific and their toolkits differ: some have blogs, wikis or document sharing or versioning services, while others focus more on annotations and user-generated descriptors displayed on the same page as the content they pertain to. However, CiteULike and Connotea show that tagging can be done with a similar user interface and the same basic functionality over a range of topic areas and subject matters. There is no indication that the software suggested or the recommendations made could not be applied across disciplines, for instance humanities, physics, chemistry, engineering and many others. The idea and framework behind tagging as an activity do not differ considerably. The implications and benefits of tagging are equally important across fields of study and branches of knowledge. This was illustrated with the Connotea tagging study. Connotea uses an inter-disciplinary tagging instrument which is just as effective in natural sciences as in social sciences. On Connotea, 92 % of the tags are subject related, which is much higher than on Delicious and similar sites. Tagging academic and scientific information seems to be successful on Connotea. Subjects in the humanities domain might have more room for contextual, subjective keywords than in natural sciences, where a more precise terminology is necessary, but there is no hard and fast rule that user-generated tags for this reason are irrelevant in certain subject areas.

10.3 Momentum for user-generated content in VREs?

In all likelihood, tags and folksonomies will always be just a complement to traditional search, and the beauty of folksonomies is that the ‘people power’ is beneficial even through simple serendipitous browsing and undemanding knowledge exploration, as recognised and commended by Gahran (2005). Boosting the efficiency of tagging systems is an evolving task, and the theoretical framework and the technical infrastructure are becoming more and more advanced. Authority control of the user-generated content is of great importance, as is realising and tackling the challenges with non-conventional classification, as outlined in Chapter 6. This research did not give or claim to give definite solutions to all the problems; it just indicated some ways of dealing with them and how a basic tagging tool can draw on software and services that are already out there. Whether it is called ‘Wisdom of Crowds’, collaborative, folk or social content, attaching user-generated tags and annotations to scholarly and scientific material has evidently worked on Connotea, CiteULike, PennTags and Steve.Museum, and also in the annotation system in the JISC VRE prototype Building a Virtual Research Environment for the Humanities. Collaborative tagging in its social bookmarking incarnation is not primarily designed for information discovery and retrieval by anyone other than the person who is creating the tags and bookmarks. At this stage, looking back at the results of this research, there are good reasons to believe that user-generated content, such as tags, annotations and other elucidatory commentary derived from the theories behind the Social Web, is advantageous in the IVRLA repositories and future OJAX++ information resources, repositories and digital libraries.

11. Conclusions

11.1 IVRLA in transition

Social tagging as it currently stands is designed primarily for private information discovery and retrieval. User-generated tags utilised for a collective, social good are paramount in research communities, for instance a VRE building on OJAX++. The thesis attempted to investigate how tagging systems can be modified and refocused to facilitate search. The research superimposed tagging and annotation features from the Social Web on the research community in order to lay the foundation for more study in how to combine these domains. Unstructured and uncontrolled tagging emerged as a new approach for classifying information, able to be employed side by side with faceted classification, but leaving the hierarchical classification approach behind. The power of serendipitous information acquisition in research communities is of great importance for future advancement in high-tech science, but also for the day-to-day research in the humanities, for instance in IVRLA. At the same time, the temptation to be unduly optimistic that tags and annotations will make other information seeking approaches obsolete has to be resisted. They will not. They are just one component in an increasingly hybridised classification and search experience. Berendt and Hanser (2007: 1) argue that a ‘gold standard’ of classifying ‘may be meaningless’. Is this due to tags being too flat, or due to the content being too diverse? The answer will depend on an individual’s previous hands-on and theoretical experience of user-generated content and the Social Web, and the relative value ascribed to these processes. A major difference between tagging web pages on Delicious and tagging in IVRLA and other repositories is quantity versus quality.
Initially, IVRLA users will not be able to recommend and add external items and resources, but will be confined to annotating and tagging the material that has already been added, for instance images from the Éamon de Valera Photograph Collection or the collections of papers by William Frazer, Eugene O’Curry or Michael Tierney. Turning this to our advantage, the tagging service would work at its best if it could cross the boundaries between the repositories; for example, the Department of Archives, the James Joyce Library’s Special Collections and the Irish Dialect Archive could have their tags included in the same tag cloud. This would lead to increased knowledge discovery of concepts that a user is not familiar with. It is important that IVRLA management realises the value of user tags and ranks them as equally important as, and not inferior to, the repository content on IVRLA. If tagging is perceived as too loosely defined and hampered by the absence of ‘peer-review’ and filtering, procedures for helping visitors to tag sensibly are an option. This is closely connected to the level of user interaction sought, and the recognition of this needs to be re-evaluated over time. A greater awareness of the success with social classification in Steve.Museum, where multiple entry points to paintings and other museum exhibits are amassed from the collective intelligence, can further promote the paradigm of user-generated content. Collaborative tagging and annotation is not a negligent act of trivialising the scholarly need for credibility, verifiability and authority control; it just adds a new dimension to the discovery of material. The last word of approval and judgment will always remain with the researcher. Letting researchers tag material that is not already permeated with tags (moving away from the likes of Delicious) might result in a greater interest in the activity of tagging and annotating resources. The IVRLA material is a ‘clean slate’ as regards tagging, and it will be interesting to see how well this service will turn out once introduced.

11.2 Future work

The IVRLA users surveyed by Caccamo (2006) seem to have a preference for keeping user-generated content private. This is the prevalent mindset before this tool has been introduced, and further research is needed in the period after implementation of the tagging and annotation tool, in order to assess whether this propensity will stay the same. Bateman et al. (2007: 6) emphasise that tagging systems ‘require a critical mass before they become useful to a community’. Will a small but very active tagging community create most of the public tags? This needs to be revisited after tagging has been introduced. Semantic tagging and the move towards inducing well-balanced faceted categories from tag clouds are becoming increasingly important and need more study. More refined ways of searching tags, and not just exploring them in chunks on resource pages, need more research. JISC’s ‘Enhanced Tagging for Discovery’ project was presented as empirical evidence that the research community has realised the value of knowledge discovery through non-traditional classification. Free tagging with no instructions versus tagging using a hybrid system with user guidance is under investigation there, and the outcomes of this project should be closely monitored, as it studies tagging both by readers and by authors. Investigating how VREs can capitalise on other Social Web components, like blogs, wikis, forums, social bookmarks and feeds, is another important area for further research. Building further on the IVRLA recommender system research by Chen et al. (2008a) is essential.

Notes

[1] Delicious (formerly Del.icio.us) – http://delicious.com/
[2] The IVRLA repository web page can be found at http://ivrla.ucd.ie/
[3] Blackboard – http://www.blackboard.com/us/index.bbb
[4] Moodle – http://moodle.org/
[5] Sakai – http://sakaiproject.org/
[6] CORE – http://www.core.ecs.soton.ac.uk/
[7] BVREH – http://bvreh.humanities.ox.ac.uk/
[8] IBVRE – http://www.vre.ox.ac.uk/ibvre/
[9] VERA – http://vera.rdg.ac.uk/index.php
[10] VIBE – http://oscar.gen.tcd.ie/vibe/
[11] IReL – http://www.irelibrary.ie/default.aspx
[12] Digital Humanities Observatory – http://www.dho.ie/
[13] Cleveland Museum of Art – http://www.clemusart.com/
[14] Apache Lucene – http://lucene.apache.org/java/docs/
[15] The official web page of MODS is http://www.loc.gov/standards/mods/
[16] More info on Dublin Core can be found at http://dublincore.org/
[17] MIX – http://www.loc.gov/standards/mix/
[18] Intute – http://www.intute.ac.uk/
[19] Council for the Central Laboratory of the Research Councils – http://www.cclrc.ac.uk/
[20] TechCrunch – http://www.techcrunch.com/
[21] Google – http://www.google.ie/
[22] Yahoo! – http://ie.yahoo.com/
[23] Truveo – http://www.truveo.com/
[24] University College Dublin – http://www.ucd.ie/
[25] SRU / SRW – http://www.loc.gov/standards/sru/
[26] The OpenSearch standard – http://www.opensearch.org/Home
[27] Amazon A9 – http://opensearch.a9.com/
[28] Picture Australia – http://www.pictureaustralia.org/
[29] Collections Australia Network – http://www.collectionsaustralia.net/
[30] A wiki on the Semantic Web is available at http://semanticweb.org/wiki/Main_Page
[31] Picasa – http://picasa.google.com/
[32] More on the Linnaean taxonomy at http://www.palaeos.org/Linnaean_taxonomy
[33] DDC 22 – http://www.oclc.org/dewey/resources/summaries/default.htm
[34] International Society for Knowledge Organization goes more into depth on Colon classification – http://www.iskoi.org/doc/colon.htm
[35] Endeca – http://www.endeca.com/services/index.html
[36] Guardian Unlimited – http://www.guardian.co.uk/
[37] Walmart – http://www.walmart.com/
[38] LexisNexis – http://www.lexisnexis.com/
[39] JSTOR Faceted Search – http://sandbox.jstor.org/
[40] FaceTag – http://www.facetag.org/
[41] Stardust@home – http://stardustathome.ssl.berkeley.edu/
[42] Wikipedia – http://www.wikipedia.org/
[43] Furl – http://www.furl.net/
[44] StumbleUpon – http://www.stumbleupon.com/
[45] Yahoo! MyWeb – http://myweb.yahoo.com/
[46] Technorati – http://technorati.com/
[47] BBC Shared Tags – http://backstage.bbc.co.uk/prototypes/archives/2005/05/bbc_shared_tags.

[48] Digg – http://digg.com/
[49] Reddit – http://www.reddit.com/
[50] Slashdot – http://slashdot.org/
[51] PennTags – http://tags.library.upenn.edu/
[52] 43 Things – http://www.43things.com/
[53] Facebook – http://www.facebook.com/
[54] Flickr – http://www.flickr.com/
[55] Shutterfly – http://www.shutterfly.com/
[56] Automatic Linguistic Index of Pictures Real Time (ALIPR) – http://www.alipr.com/
[57] Behold – http://www.behold.cc/
[58] Google Maps – http://maps.google.com/
[59] Steve.Museum – http://www.steve.museum/
[60] Steve Tagger – http://sourceforge.net/projects/steve-museum
[61] LibraryThing – http://www.librarything.com/
[62] Amazon – http://www.amazon.com/
[63] LCC – http://www.loc.gov/catdir/cpso/lcc.html
[64] FictionDB – http://www.fictiondb.com/
[65] YouTube – http://www.youtube.com/
[66] Internet Movie Database, MoKA – http://www.imdb.com/Sections/Keywords/
[67] Odeo – http://www.odeo.com/
[68] Last.fm – http://www.last.fm/
[69] Connotea – http://www.connotea.org/
[70] CiteULike – http://www.citeulike.org/
[71] PubMed – http://www.ncbi.nlm.nih.gov/pubmed/
[72] JSTOR main page – http://www.jstor.org/
[73] PLoS – http://www.plos.org/
[74] Science Direct – http://www.sciencedirect.com/
[75] BibSonomy – http://www.bibsonomy.org/
[76] BibTeX format description and explanation at http://www.bibtex.org/
[77] The Multivalent Cheshire Kepler (MultiCheK) project – http://bodoni.lib.liv.ac.uk/anno/
[78] More on Boolean logic at http://computer.howstuffworks.com/boolean.htm
[79] WordNet – http://wordnet.princeton.edu/wn2.0.shtml
[80] 83 Degrees – http://www.83degrees.com/
[81] Rijksmuseum Amsterdam – http://www.rijksmuseum.nl/
[82] Buzzillions – http://www.buzzillions.com/
[83] ZoneTag – http://zonetag.research.yahoo.com/
[84] Getty Art and Architecture Thesaurus – http://www.getty.edu/research/conducting_research/vocabularies/aat/
[85] Getty Union List of Artists Names – http://www.getty.edu/research/conducting_research/vocabularies/ulan/
[86] FreeTag – http://code.google.com/p/freetag/wiki/FreetagHome
[87] BlogSkins – http://www.blogskins.com/
[88] Eatlunch.at – http://www.eatlunch.at/
[89] Upcoming – http://upcoming.yahoo.com/
[90] RichTags – http://beta.richtags.net/
[91] The Annotea Live Early Adoption and Demonstration (LEAD) web site – http://www.w3.org/2001/Annotea/
[92] Amaya – http://www.w3.org/Amaya/
[93] Annozilla – http://annozilla.mozdev.org/
[94] Annotatio – http://sourceforge.net/projects/annotatio
[95] CommentPress – http://www.futureofthebook.org/commentpress/
[96] WordPress – http://wordpress.com/

[97] Fleck – http://fleck.com/
[98] SharedCopy – http://sharedcopy.com/public/widget
[99] Open Annotation and Tagging System (OATS) – http://ihelp.usask.ca/OATS/
[100] iHelp – http://ihelp.usask.ca/
[101] HarVANA – http://www.itee.uq.edu.au/~eresearch/projects/harvana/index.html
[102] The Ultimate Tag Warrior – http://www.neato.co.nz/ultimate-tag-warrior/
[103] Jerome’s Keywords – http://vapourtrails.ca/wp-keywords
[104] Dekoh – http://www.dekoh.org/wiki/view/TagCloudWidget
[105] Regionaal Historisch Centrum Eindhoven – http://www.rhc-eindhoven.nl/
[106] AskOnce – http://www.askonce.com/
[107] Venetica – http://www.ibm.com/
[108] Alfresco – http://www.alfresco.com/

References

Agnew, G. (2003), ‘Developing a metadata strategy’, ALA ALCTS, February 2003, available online at http://gondolin.rutgers.edu/MIC/text/how/metadata_agnew.pdf , retrieved 28/09/08

Ames, M. and M. Naaman (2007), ‘Why we tag: Motivations for annotation in mobile and online media’, available online at http://www.stanford.edu/~morganya/research/chi2007-tagging.pdf , retrieved 28/09/08

Anadiotis, G., T. Franz and S. Boll (2007), ‘Tagging use case’, W3C Multimedia Semantic Incubator Group Wiki, available online at http://www.w3.org/2005/Incubator/mmsem/wiki/Tagging_Use_Case , retrieved 19/08/08

Barnes, C. (2006), ‘A rainbow sunset’, Flickr image, available online at http://www.flickr.com/photos/senrabphoto/236726633/, retrieved 23/09/08

Basile, P., F. Calefato, M. de Gemmis, P. Lops, G. Semeraro, M. Bux, C. Musto and F. Narducci (2008), ‘Augmenting a Content-based Recommender System with Tags for Cultural Heritage Personalization’, in PATCH 2008, Proceedings of the 2nd International Workshop on Personalized Access to Cultural Heritage, July–August 2008, Hannover, Germany, pp. 25–34, available online at http://www.ah2008.org/files/resourcesmodule/@random4875d36b687e4/1215681416__Proc_AH2008_WS4_Personalized_Access_to_Cultural_Heritage.pdf , retrieved 19/08/08

Bateman, S., C. Brooks and G. McCalla (2006a), ‘Collaborative tagging approaches for ontological metadata in adaptive e-learning systems’, Lab for Advanced Research in Intelligent Educational Systems, Department of Computer Science, University of Saskatchewan, Canada, available online at http://www.win.tue.nl/SW-EL/2006/camera-ready/02-bateman_brooks_mccalla_SWEL2006_final.pdf , retrieved 23/09/08

Bateman, S., R. Farzan, P. Brusilovsky and G. McCalla (2006b), ‘OATS: The Open Annotation and Tagging System’, Advanced Research in Intelligent Educational Systems Lab, Department of Computer Science, University of Saskatchewan, Canada, and Intelligent Systems Program and School of Information Sciences, University of Pittsburgh, Pennsylvania, available online at http://fox.usask.ca/files/oats-lornet.pdf , retrieved 23/09/08

Bateman, S., C. Brooks, G. McCalla and P. Brusilovsky (2007), ‘Applying collaborative tagging to e-learning’, Proceedings of the 16th International World Wide Web Conference, May 2007, Banff, Alberta, Canada, available online at http://fox.usask.ca/files/tagging_elearning_bateman.pdf , retrieved 01/05/08

Bateman, S. (2008), The OATS tagging and annotation demo, available online at http://ihelp.usask.ca/OATS/sampleDeployment/montreal.html , retrieved 23/09/08

Bates, M. J. (1989), ‘The design of browsing and berrypicking techniques for the online search interface’, in Online Review , 1989, available online at http://www.si.umich.edu/~rfrost/courses/SI110/readings/InfoFinding/Bates_on_Berrypicking.pdf , retrieved 19/08/08

Beck, I. (2007), ‘Tagging best practices’, available online at http://tagamac.com/2007/07/best_practices/ , retrieved 23/09/08

Berendt, B. and C. Hanser (2007), ‘Tags are not metadata, but “just more content” – to some people’, available online at http://www.icwsm.org/papers/2--Berendt-Hanser.pdf , retrieved 01/05/08

Berners-Lee, T., J. Hendler and O. Lassila (2001), ‘The semantic web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities’, in Scientific American , May 2001, available online at http://www.sciam.com/article.cfm?id=the-semantic-web , retrieved 23/09/08

Bianchini, G. (2007), ‘5,500 new social networks in 10 days!’, available online at http://blog.ning.com/2007/07/5500_new_social_networks_in_10.html , retrieved 23/09/08

Brennan, N. (2005), ‘Ballrooms of romance? Towards collaborative virtual research environments in Ireland’, Irish Universities Information Services Colloquium Conference, 2005, available online at http://www.iuisc.ie/2005/Presentations/Thursday/NIamh%20brennan.pdf , retrieved 01/05/08

Broughton, V. (2004), Faceted Classification , Essential Classification. London: Facet, image available at http://www.emeraldinsight.com/fig/2760580105003.png, retrieved 23/09/08

Caccamo, A. C. (2006), ‘User needs and the Irish Virtual Research Library and Archive (IVRLA)’, MLIS minor thesis, September 2006, School of Library and Information Studies, University College Dublin

Campbell, D. G. (2006), ‘A phenomenological framework for the relationship between the semantic web and user-centered tagging systems’, in Furner, J. and J. T. Tennis (Eds.), Proceedings 17th Workshop of the American Society for Information Science and Technology Special Interest Group in Classification Research 17, Austin, Texas, available online at http://dlist.sir.arizona.edu/1838/01/campbell.pdf , retrieved 19/08/08

Castells, M. (2000), The Rise of the Network Society, 2nd edition, Oxford: Wiley-Blackwell

Chen, J., L. McGinty, J. Shen and H. Brosnan (2008a), ‘Supporting Navigation Over Context-Limited Historical Data’, in PATCH 2008, Proceedings of the 2nd International Workshop on Personalized Access to Cultural Heritage, July–August 2008, Hannover, Germany, pp. 45–6, available online at http://www.ah2008.org/files/resourcesmodule/@random4875d36b687e4/1215681416__Proc_AH2008_WS4_Personalized_Access_to_Cultural_Heritage.pdf , retrieved 19/08/08

Chen, J., L. McGinty, J. Shen, J. Wusteman and J. Freyne (2008b), ‘Supporting navigation over context-limited historical library data’, unpublished poster, School of Computer Science and Informatics and School of Information and Library Studies, University College Dublin

Chi, E. H. and T. Mytkowicz (2006), ‘Understanding navigability of social tagging systems’, Palo Alto Research Center, available online at http://www.viktoria.se/altchi/submissions/submission_edchi_0.pdf , retrieved 23/09/08

Chitu, A. (2008), ‘YouTube annotations’, image of an annotated YouTube video clip, available online at http://googlesystem.blogspot.com/2008/06/youtube-annotations.html , retrieved 23/09/08

Choudhury, G. S., C. Requardt, I. Fujinaga, T. DiLauro, E. W. Brown, J. W. Warner and B. Harrington (2000), ‘Digital Workflow Management: The Lester S. Levy Digitised Collection of Sheet Music’, in First Monday , 5 (6), June 2000, available online at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/756/665, retrieved 19/08/08

CiteULike (2008), Tags on CiteULike, available online at http://www.citeulike.org/ , retrieved 23/09/08

Connolly, S. (2008), ‘Opening the social web for business: 7 key attributes of social web applications’, available online at http://connollyshaun.blogspot.com/2008/05/7-key-attributes-of-social-web.html , retrieved 01/09/08

Cormode, G. and B. Krishnamurthy (2008), ‘Key differences between Web 1.0 and Web 2.0’, in First Monday, 13 (6), June 2008, available online at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2125/1972 , retrieved 19/08/08

Corubolo, F. (2008), The annotation interface in the Fab4 browser, available online at http://bodoni.lib.liv.ac.uk/fab4/ , retrieved 23/09/08

Cox, R. J. and the University of Pittsburgh archive students (2007), ‘Machines in the archives: Technology and the coming transformation of archival reference’, in First Monday, 12 (11), November 2007, available online at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2029/1894 , retrieved 19/08/08

Crestani, F. and S. Wu (2006), ‘Testing the cluster hypothesis in distributed information retrieval’, in Information Processing and Management: an International Journal, 42 (5), September 2006, available online at http://dx.doi.org/10.1016/j.ipm.2005.12.002 , retrieved 19/08/08

Davidson, J., N. Rocherolle and N. Wilder (2008), ‘Who, what, why’, 83 Degrees, available online at http://www.83degrees.com/about , retrieved 23/09/08

Dekoh (2008a), ‘Dekoh big picture’, available online at http://www.dekoh.org/wiki/view/DekohBigPicture , retrieved 23/09/08

Dekoh (2008b), ‘TagCloud Example: XML Data’, available online at http://www.dekoh.org/widgets/tagcloud/tagcloud_xml_example.html , retrieved 23/09/08

Dekoh (2008c), XML file of the tag cloud data in Dekoh (2008b), available online at http://www.dekoh.org/widgets/tagcloud/tagcloud_xml_exampledata. , retrieved 23/09/08

Delicious (2008a), ‘Help on Delicious’, available online at http://Del.icio.us/help/tags , retrieved 24/07/08

Delicious (2008b), Tag-creation recommendations on Delicious, available online at http://www.howtogeek.com/wp-content/uploads/2007/08/image136.png , retrieved 23/09/08

Donnato, D. (2008), ‘From folksonomy to collaborative tagging: How social bookmarking can improve web search’, Yahoo! Research, Seminars of Computer Networks, Barcelona, Spain, available online at http://donade.net/lectures/tagging.pdf , retrieved 23/09/08

Duggan, P. (2008), ‘The relationship between VLEs and VREs: a study’, MLIS minor thesis, September 2008, School of Information and Library Studies, University College Dublin

Echarte, F., J. J. Astrain, A. Córdoba and J. Villadangos (2007), ‘Ontology of folksonomy: A new modelling method’, Dpt. de Ingeniería Matemática e Informática, Campus de Arrosadia, Pamplona, Navarra, Spain, available online at http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/p08.pdf , retrieved 23/09/08

Endeca (2008), ‘Try the guided navigation experience live at Guardian Unlimited newspapers’, available online at http://endeca.com/demo.html , retrieved 23/09/08

Farrell, S. and T. Lau (2006), ‘Fringe Contacts: People-tagging for the enterprise’, available online at http://tlau.org/research/papers/www06-tagging-fc.pdf , retrieved 01/05/08

Farzan, R. and P. Brusilovsky (2006), ‘AnnotatEd: A Social Navigation and Annotation Service for Web-based Educational Resources’, in T. Reeves and S. Yamashita (Eds.), Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, 2006, pp. 2794–2802, Association for the Advancement of Computing in Education, Chesapeake, Virginia

Fountopoulos, G. I. (2007), ‘RichTags: A Social Semantic Tagging System’, MSc thesis, University of Southampton, available online at http://eprints.ecs.soton.ac.uk/15109/1/gf_msc_thesis.pdf , retrieved 19/08/08

Fraser, M. (2005), ‘Virtual Research Environments: Overview and activity’, in ARIADNE, 44, July 2005, available online at http://www.ariadne.ac.uk/issue44/fraser/ , retrieved 19/08/08

Fraser, M. (2006), ‘The place of the digital library within Virtual Research Environments’, available online at http://users.ox.ac.uk/~mikef/rts/ticer/fraser_diglib_vre_24Aug06-online.pdf , retrieved 19/08/08

Gahran, A. (2005), Comment in reply to blog post ‘Technorati tags: Good idea, terrible implementation’ by D. Taylor, available online at http://www.intuitive.com/blog/technorati_tags_good_idea_terrible_implementation.html , retrieved 19/08/08

Golder, S. A. and B. A. Huberman (2005), ‘The Structure of Collaborative Tagging Systems’, HPL Technical Report, available online at http://www.hpl.hp.com/research/idl/papers/tags/tags.pdf , retrieved 19/08/08

Golder, S. A. and B. A. Huberman (2006), ‘Usage patterns of collaborative tagging systems’, in Journal of Information Science, 32 (2), pp. 198–208

Grehan, M. (2005), ‘Is a bigger search index better for relevancy?’, The ClickZ Network, available online at http://www.clickz.com/showPage.html?page=3549606 , retrieved 01/09/08

Guy, M. and E. Tonkin (2006), ‘Folksonomies: Tidying up tags’, in D-Lib Magazine, 12 (1), January 2006, available online at http://www.dlib.org/dlib/january06/guy/01guy.html , retrieved 19/08/08

Hammond, T., T. Hannay, B. Lund and J. Scott (2005), ‘Social bookmarking tools (I): A general review’, in D-Lib Magazine, 11 (4), April 2005, available online at http://www.dlib.org/dlib/april05/hammond/04hammond.html , retrieved 23/09/08

Hannay, T. (2008), ‘Introduction: Timo Hannay’, available online at http://tagsonomy.com/index.php/introduction-timo-hannay/ , retrieved 19/08/08

Hassan-Montero, Y. and V. Herrero-Solana (2006), ‘Improving Tag-Clouds as Visual Information Retrieval Interfaces’, Scimago Research Group, Faculty of Library and Information Science, University of Granada, Spain, available online at http://www.nosolousabilidad.com/hassan/improving_tagclouds.pdf , retrieved 19/08/08

Healy, S. (2008), ‘An evaluation of the user requirements of users of the Irish Virtual Research Library and Archive’, MLIS minor thesis, September 2008, School of Information and Library Studies, University College Dublin

Heckner, M., S. Mühlbacher and C. Wolff (2008), ‘Tagging tagging: Analysing user keywords in scientific bibliography management systems’, available online at http://journals.tdl.org/jodi/article/viewPDFInterstitial/246/208 , retrieved 19/08/08

Heery, R. and S. Anderson (2005), Digital repositories review, UK Office for Library and Information Networking, available online at http://www.jisc.ac.uk/uploaded_documents/digital-repositories-review-2005.pdf , retrieved 01/05/08

Herlocker, J. L., J. A. Konstan, L. G. Terveen and J. T. Riedl (2004), ‘Evaluating collaborative filtering recommender systems’, available online at http://ectrl.itc.it/home/laboratory/meeting/download/p5-l_herlocker.pdf , retrieved 19/08/08

Heymann, P., G. Koutrika and H. Garcia-Molina (2008), ‘Can social bookmarking improve web search?’, available online at http://heymann.stanford.edu/improvewebsearch.html , retrieved 19/08/08

Holland, M. C. (2006), ‘MIX as a standard for managing images in a digital library – IVRLA: a case study’, MLIS minor thesis, September 2006, School of Information and Library Studies, University College Dublin

Howe, J. (2006), ‘The rise of crowdsourcing’, in Wired, 14.06, June 2006, available online at http://www.wired.com/wired/archive/14.06/crowds.html , retrieved 19/08/08

Hsieh, W.-T., W.-S. Lai and S.-C. T. Chou (2006), ‘A collaborative tagging system for learning resources sharing’, in Current Developments in Technology-Assisted Education, 2006, pp. 1364–1368, available online at http://www.formatex.org/micte2006/pdf/1364-1368.pdf, retrieved 01/05/08

Hunter, J. (2007), ‘Harvesting community tags and annotations to augment institutional repository metadata’, available online at http://www.eresearch.edu.au/hunter , retrieved 19/08/08

Hunter, J., I. Khan and A. Gerber (2007), ‘HarVANA – Harvesting community tags to enrich collection metadata’, available online at http://www.itee.uq.edu.au/~eresearch/papers/2008/JCDL2008.pdf , retrieved 19/08/08

Inter Departmental Committee on Science, Technology and Innovation (2004), ‘Building Ireland’s knowledge economy’, The Irish Action Plan for Promoting Investment in R&D to 2010, Report to the Inter Departmental Committee on Science, Technology and Innovation, July 2004, available online at http://www.entemp.ie/publications/enterprise/2004/knowledgeeconomy.pdf , retrieved 05/09/08

Joint Information Systems Committee (2008), ‘E-research’, available online at http://www.jisc.ac.uk/whatwedo/themes/eresearch.aspx , retrieved 19/08/08

John, A. and D. Seligman (2006), ‘Collaborative tagging and expertise in the enterprise’, Proceedings of the Collaborative Web Tagging Workshop at WWW2006, Edinburgh, available online at http://www.semanticmetadata.net/hosted/taggingws-www2006-files/26.pdf , retrieved 01/05/08

Keogh, S. (2005), ‘Descriptive metadata in a digital library: The Irish Virtual Research Library and Archive: a case study’, MLIS minor thesis, October 2005, Faculty of Library and Information Studies, University College Dublin

Kerchner, M. D. (2006), ‘A Dynamic Methodology for Improving Search Experience’, in Information Technology and Libraries, 25 (2), June 2006, available online at http://www.lita.org/ala/lita/litapublications/ital/252006/2502jun/kerchner.pdf , retrieved 24/07/08

Kipp, M. E. I. and D. G. Campbell (2006), ‘Patterns and inconsistencies in collaborative tagging systems: An examination of tagging practices’, Proceedings of the ASIS&T, 2006, available online at http://dlist.sir.arizona.edu/1704/01/KippCampbellASIST.pdf , retrieved 01/05/08

Krock, L. (2001), ‘Accidental discoveries’, NOVA Online, available online at http://www.pbs.org/wgbh/nova/cancer/discoveries.html , retrieved 19/08/08

Kroski, E. (2005), ‘The hive mind: Folksonomies and user-based tagging’, available online at http://infotangle.blogsome.com/2005/12/07/the-hive-mind-folksonomies-and-user-based-tagging/ , retrieved 19/08/08

Lamantia, J. (2006), ‘Tag clouds evolve: Understanding tag clouds’, available online at http://www.joelamantia.com/blog/archives/ideas/tag_clouds_evolve_understanding_tag_clouds_1.html , retrieved 19/08/08

Lamb, G. M. (2008), ‘“Citizen scientists” watch for signs of climate change’, The Christian Science Monitor, available online at http://www.csmonitor.com/2008/0410/p14s01-sten.html , retrieved 19/08/08

LeVan, R. (2006), ‘OpenSearch and SRU: Continuum of Searching’, in Information Technology and Libraries, 25 (3), September 2006, pp. 151–3, available online at http://www.oclc.org/research/publications/archive/2006/levan-ital.pdf , retrieved 19/08/08

LibraryThing (2008), LibraryThing tags relating to the philosophy of science, available online at http://www.librarything.com/tag/philosophy+of+science , retrieved 19/08/08

Loosley, C. (2006), ‘Rich Internet Applications: Design, measurement and management challenges’, available online at http://www.keynote.com/docs/whitepapers/RichInternet_5.pdf , retrieved 19/08/08

Luk, G. (2004), The FreeTag-powered tagging service on Eatlunch.at, available online at http://www.eatlunch.at/ , retrieved 23/09/08

Luk, G. (2005a), ‘FreeTag implementation guide’, available online at http://freetag.googlecode.com/files/FreetagImplementationGuide.pdf , retrieved 19/08/08

Luk, G. (2005b), ‘FreeTag: an open source tagging / folksonomy module for PHP/MySQL applications’, available online at http://getluky.net/2005/04/16/freetag-an-open-source-tagging-folksonomy-module-for-phpmysql-applications/ , retrieved 26/06/08

Marlow, C., M. Naaman, d. boyd and M. Davis (2006), ‘HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead’, Proceedings of the Seventeenth Conference on Hypertext and Hypermedia. HYPERTEXT ’06, pp. 31–40, New York: ACM Press

Mauch, J. E. and N. Park (2003), Guide to the successful thesis and dissertation: A handbook for students and faculty, London: CRC Press

McCallum, S. H. (2006), ‘A Look at New Information Retrieval Protocols: SRU, OpenSearch/A9, CQL, and XQuery’, 102 IFLA-CDNL Alliance for Bibliographic Standards ICABS, World Library and Information Congress, 72nd IFLA General Conference and Council, August 2006, Seoul, Korea, available online at http://www.ifla.org/IV/ifla72/papers/102-McCallum-en.pdf , retrieved 26/06/08

Merholz, P. (2004), ‘Metadata for the masses’, available online at http://www.adaptivepath.com/publications/essays/archives/000361.php , retrieved 26/06/08

Millen, D., J. Feinberg and B. Kerr (2005), ‘Social bookmarking in the enterprise’, in ACM Queue, 3 (9), November 2005, available online at http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=344 , retrieved 26/06/08

Morgan, E. L. (2005), ‘Introduction to Search/Retrieve URL Service (SRU)’, available online at http://infomotions.com/musings/sru/ , retrieved 26/09/08

Morrison, J. (2007), ‘Tagging and searching: Search retrieval effectiveness of folksonomies on the World Wide Web’, available online at http://www.jasonmorrison.net/content/wp-content/uploads/2007/10/tagging-and-searching.pdf , retrieved 19/08/08

Morville, P. (2006), Ambient Findability: what we find changes who we become, Cambridge: O’Reilly

National Information Standards Organization (2004), ‘Understanding metadata’, available online at http://www.niso.org/publications/press/UnderstandingMetadata.pdf , retrieved 19/08/08

O’Donovan, J. and B. Smyth (2005), ‘Trust in recommender systems’, Proceedings of the 10th international conference on intelligent user interfaces, San Diego, California, pp. 167–174, available online at http://portal.acm.org/citation.cfm?id=1040870 , retrieved 19/08/08

Open Archives Initiative (2003), ‘OAI for beginners’, available online at http://www.oaforum.org/tutorial/english/page1.htm , retrieved 19/08/08

Ogbuji, U. (2007), ‘Introducing OpenSearch’, available online at http://www.xml.com/pub/a/2007/07/20/introducing-opensearch.html , retrieved 19/08/08

O’Reilly, T. (2005), ‘What is Web 2.0’, available online at http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html , retrieved 19/08/08

Phelps, T. A. and R. Wilensky (2001), ‘The Multivalent browser: A platform for new ideas’, Proceedings of Document Engineering, November 2001, Atlanta, Georgia, available online at http://multivalent.sourceforge.net/Research/PlatformForNewIdeas.pdf , retrieved 26/09/08

Pustejovsky, J. and P. Bouillon (1995), ‘Aspectual Coercion and Logical Polysemy’, in Journal of Semantics, 12 (2), pp. 133–162

Quintarelli, E. (2005), ‘Folksonomies: Power to the people’, paper presented at the ISKO Italy UniMIB meeting, Milan, June 2005, available online at http://www.iskoi.org/doc/folksonomies.htm , retrieved 19/08/08

Quintarelli, E., L. Rosati and A. Resmini (2006), ‘FaceTag: Integrating a bottom-up and top-down classification in a social tagging system’, available online at http://www.facetag.org/download/facetag.pdf , retrieved 19/08/08

Reh, J. (2008), ‘Pareto’s principle – The 80-20 rule’, available online at http://management.about.com/cs/generalmanagement/a/Pareto081202.htm , retrieved 19/08/08

Reisinger, D. (2008), ‘Picasa refresh brings facial recognition’, TechCrunch, available online at http://www.techcrunch.com/2008/09/02/picasa-refresh-brings-facial-recognition/ , retrieved 26/09/08

Resnick, P. and H. R. Varian (1997), ‘Recommender systems’, in Communications of the ACM, 40 (3), pp. 56–58

Riley, D. (2007), ‘Bookmark, copy, note and share: SharedCopy.com’, TechCrunch, available online at http://www.techcrunch.com/2007/05/10/bookmark-copy-note-and-share-sharedcopycom/ , retrieved 26/09/08

Robot Co-op (2008), Tagging for life goals on 43 Things, available online at http://www.43things.com/ , retrieved 26/09/08

Rock, C. (2008, forthcoming), ‘User requirements for Virtual Research Environments in Irish Higher Education institutions’, MLIS minor thesis, School of Information and Library Studies, University College Dublin

Rosenfeld, L. (2005), ‘Folksonomies? How about metadata ecologies?’, available online at http://louisrosenfeld.com/home/bloug_archive/000330.html , retrieved 19/08/08

Samuel, A. (2006), ‘Choosing effective del.icio.us tags’, available online at http://www.alexandrasamuel.com/20060310/update-choosing-effective-Delicious-tags , retrieved 19/08/08

Schmidt-Supprian, C. (2007), ‘Controlled Vocabularies in a Multilingual Federated Search Environment: The Example of the European Library’, MLIS minor thesis, November 2007, School of Information and Library Studies, University College Dublin

Shafi, S. M. and R. A. Rather (2005), ‘Precision and Recall of Five Search Engines for Retrieval of Scholarly Information in the Field of Biotechnology’, in Webology, 2 (2), August 2005, available online at http://www.webology.ir/2005/v2n2/a12.html , retrieved 19/08/08

Shirky, C. (2002), ‘Weblogs and the Mass Amateurization of Publishing’, available online at http://shirky.com/writings/weblogs_publishing.html , retrieved 19/08/08

Shirky, C. (2004), ‘Folksonomy’, available online at http://many.corante.com/archives/2004/08/25/folksonomy.php , retrieved 19/08/08

Shirky, C. (2005a), ‘Tags != folksonomies && Tags != Flat name spaces’, available online at http://many.corante.com/archives/2005/01/24/tags_folksonomies_tags_flat_name_spaces.php , retrieved 19/08/08

Shirky, C. (2005b), ‘Ontology is overrated: Categories, Links, and Tags’, available online at http://www.shirky.com/writings/ontology_overrated.html , retrieved 19/08/08

Smith, G. (2008), Tagging: People-Powered Metadata for the Social Web (Voices that Matter), London: New Riders Press

Sood, S. C., S. H. Owsley, K. J. Hammond and L. Birnbaum (2007), ‘TagAssist: Automatic Tag Suggestion for Blog Posts’, available online at http://www.icwsm.org/papers/2--Sood-Owsley-Hammond-Birnbaum.pdf , retrieved 19/08/08

Speller, E. (2007), ‘Collaborative tagging, folksonomies, distributed classification or ethnoclassification: a literature review’, available online at http://www.librarystudentjournal.org/index.php/lsj/article/view/45/58 , retrieved 19/08/08

Spiteri, L. F. (2007), ‘The Structure and Form of Folksonomy Tags: The Road to the Public Library Catalog’, in Information Technology and Libraries, 26 (3), September 2007, pp. 13–25

Steve.Museum (2008), Woman’s ceremonial skirt, tagged with Steve Tagger, available online at http://tagger.steve.museum/steve.php?task=randomizedCollectionController_viewImageset&mimeId=137&imagesetId=14 , retrieved 23/09/08

Stock, W. G. (2007), ‘Folksonomies and science communication: A mash-up of professional databases and Web 2.0 services’, in Information Services and Use, 27, pp. 97–103, available online at http://wwwalt.phil-fak.uni-duesseldorf.de/infowiss/admin/public_dateien/files/1/1194272247inf_servic.pdf , retrieved 01/05/08

Sureka, A. (2006), ‘Making Unstructured Data Findable Using Tagging and Annotation’, DM Review Special Report, available online at http://www.dmreview.com/specialreports/20060509/1054424-1.html , retrieved 19/08/08

Trant, J. and B. Wyman (2006), ‘Investigating social tagging and folksonomy in art museums with Steve.Museum’, Proceedings of the Collaborative Web Tagging Workshop at WWW2006, Edinburgh, available online at http://www.archimuse.com/research/www2006-tagging-steve.pdf , retrieved 19/08/08

Trant, J., D. Bearman and S. Chun (2007), ‘The eye of the beholder: Steve.Museum and social tagging of museum collections’, in Trant, J. and D. Bearman (Eds.), International Cultural Heritage Informatics Meeting (ICHIM07) Proceedings, Toronto: Archives & Museum Informatics, available online at http://www.archimuse.com/ichim07/papers/trant/trant.html , retrieved 19/08/08

Udell, J. (2004), ‘Collaborative knowledge gardening’, available online at http://www.infoworld.com/article/04/08/20/34OPstrategic_1.html , retrieved 19/08/08

UK Office for Library and Information Networking (2008), ‘Enhanced Tagging for Discovery’, available online at http://www.jisc.ac.uk/whatwedo/programmes/programme_rep_pres/etfd.aspx , retrieved 19/08/08

University College Dublin (2008a), ‘Irish Virtual Research Library and Archive (IVRLA)’, available online at http://www.ucd.ie/ivrla/main.html , retrieved 01/09/08

University College Dublin (2008b), ‘Irish Virtual Research Library and Archive (IVRLA) Work Packages’, available online at http://www.ucd.ie/ivrla/workpackages.html , retrieved 01/09/08

University College Dublin (2008c), ‘Irish Virtual Research Library and Archive (IVRLA), Project overview: Source repositories and collections’, available online at http://www.ucd.ie/ivrla/workbook/wsourcerepositories.html , retrieved 01/09/08

University of Bristol (2008a), Iugo project components, available online at http://iugo.ilrt.bris.ac.uk/ , retrieved 23/09/08

University of Bristol (2008b), The Iugo prototype, available online at http://Iugo.ilrt.bris.ac.uk/IugoPortal/help.jsp#Annotating , retrieved 19/08/08

University of Kassel (2008), Tag relations on BibSonomy, Knowledge and Data Engineering Group, University of Kassel, Germany, available online at http://www.bibsonomy.org/relations , retrieved 23/09/08

University of Pennsylvania (2008), PennTags, Tagging tips, available online at http://tags.library.upenn.edu/help/tagging_tips , retrieved 19/08/08

University of Queensland (2008), National eResearch Architecture Taskforce (NeAT), NEAT Project Proposal: Secure Document Sharing, Tagging and Annotation, Services for Online eResearch Communities, available online at http://www.pfc.org.au/pub/Main/NeAT-2/NeAT_Tagging_and_Annotation.pdf , retrieved 19/08/08

van Deen, T. and R. Deneberg (2006), ‘Integration of Services - Integration of Standards’, Workshop Report, Koninklijke Bibliotheek, The Hague, March 2006, in D-Lib Magazine, 12 (5), May 2006, available online at http://www.dlib.org/dlib/may06/vanveen/05vanveen.html , retrieved 19/08/08

van der Sluijs, K. and G.-J. Houben (2008), ‘Metadata-based Access to Cultural Heritage Collections: the RHCe Use Case’, pp. 15–24, in PATCH 2008, 2nd International Workshop on Personalized Access to Cultural Heritage, July–August 2008, Hannover, Germany, available online at http://www.ah2008.org/files/resourcesmodule/@random4875d36b687e4/1215681416__Proc_AH2008_WS4_Personalized_Access_to_Cultural_Heritage.pdf , retrieved 19/08/08

Vander Wal, T. (2007), ‘Folksonomy Coinage and Definition’, available online at http://vanderwal.net/folksonomy.html , retrieved 19/08/08

Watry, P. and F. Corubolo (2007), ‘MultiCheK, final report’, March 2007, available online at http://bodoni.lib.liv.ac.uk/VRE/Final_Report_MultiCheK_VRE2.pdf , retrieved 19/08/08

Weller, K. (2007), ‘Folksonomies and ontologies: Two new players in indexing and knowledge representation’, Proceedings of the Online Information Conference, London, 2007, available online at http://www.phil-fak.uni-duesseldorf.de/infowiss/admin/public_dateien/files/35/1197281173folksonomi.pdf , retrieved 01/05/08

Wetzlmayr, R. (2005), Comment in reply to blog post ‘Technorati tags: Good idea, terrible implementation’ by D. Taylor, available online at http://www.intuitive.com/blog/technorati_tags_good_idea_terrible_implementation.html , retrieved 19/08/08

Wikimedia (2008a), Scientific classification, available online at http://commons.wikimedia.org/wiki/Image:Scientific_classification.png , retrieved 23/09/08

Wikimedia (2008b), Collective intelligence networking, available online at http://en.wikipedia.org/?title=Collective_intelligence#Types_of_collective_intelligence , retrieved 23/09/08

World Wide Web Consortium (2005), The Annotatio annotation client, available online at http://sourceforge.net/project/screenshots.php?group_id=120886 , retrieved 23/09/08

Wusteman, J. and P. O’hIceadha (2007), The OJAX home page, available online at http://ojax.sourceforge.net/ , retrieved 19/08/08

Xu, Z., Y. Fu, J. Mao and D. Su (2006), ‘Towards the semantic web: Collaborative tag suggestions’, Yahoo! Inc., Santa Clara, California, available online at http://www.semanticmetadata.net/hosted/taggingws-www2006-files/13.pdf , retrieved 23/09/08

Appendix A: Software comparison table
