Automatic Indexing of News Articles by Yunus J Mansuri

Total Pages: 16

File Type: pdf, Size: 1020 KB

Automatic Indexing of News Articles

by Yunus J Mansuri

An 18-credit project report for the degree of Master of Information Science
The School of Computer Science and Engineering
The University of New South Wales
August, 1996

This thesis is dedicated to God, the most beneficent and merciful.

Acknowledgements

During my course of study many people have helped me. I would especially like to thank my supervisor John Shepherd for his patience and perseverance with my thesis. His assistance, suggestions, and constructive criticism in the development of this thesis are worthy of special praise. My brother Yusuf Mansuri, my sister Raisa, and baby Hannan deserve special thanks for their endless love and support, without which this thesis could not have appeared. I also thank my friends Sadik, Hassan, and Banchong for their help, the time they shared with me, and their friendship. Finally, I must thank my parents above all. Their contribution to my life makes everything else pale into insignificance.

Abstract

Information has become an essential currency in the "Information Age". With the growth of network technology and connectivity, the desire to share ideas via Usenet has grown exponentially, and huge amounts of data flow through Usenet daily. Our overall aim is to minimise the effort required by the reader to handle the large volume of news passing through Usenet. Achieving this involves two major tasks: automatic indexing, which dives into the ocean of data, and retrieval of relevant articles, which fetches a glass of information to satisfy the user's thirst, based on a user profile.

Contents

1 Introduction
  1.1 Filtering
    1.1.1 Information filtering system
  1.2 Objectives of research
  1.3 Area of research
  1.4 Organisation
2 Literature Review
  2.1 Information Retrieval
  2.2 Components of IR system
    2.2.1 Selection of information
    2.2.2 Text analysis and representation
    2.2.3 Searching strategy
  2.3 Summary
3 Literature Review - Available Systems
  3.1 History of newsreaders
    3.1.1 Popular screen-oriented news reading interfaces
  3.2 Related work
    3.2.1 SMART
    3.2.2 SIFT - Stanford Information Filtering Tool
    3.2.3 Tapestry
    3.2.4 URN
    3.2.5 INFOSCOPE
    3.2.6 Deja News
4 IAN - Intelligent Assistant for News reading
  4.1 Preview of IAN
  4.2 Introduction
    4.2.1 How does IAN work
  4.3 IAN system
    4.3.1 Fetch articles
    4.3.2 Automatic indexing of news articles
    4.3.3 Retrieval of relevant news articles based on user profile
5 Experiments
  5.1 Evaluation methods
  5.2 Evaluation procedure
    5.2.1 System performance method
    5.2.2 Comparison method
    5.2.3 Discussion
6 Conclusions
  6.1 Review of research objectives
  6.2 Conclusion
  6.3 Future work
A Outline of the program
A Filtering Statistics

List of Figures

2.1 Change in Document space after assignment of good discriminator
4.1 Modules and flow of data
4.2 Major Functions of our part of IAN
4.3 Different modules and flow of data
4.4 Modules involved in text analysis
4.5 Keyword structure during search
4.6 Query tree: representing A .and. B
4.7 Query tree: representing A .and. B .or. C .and. D
A.1 Rate of posting of articles
A.2 Increase in unique words per hour

List of Tables

2.1 Term weighting formulae depending on within-document frequency
2.2 Term-weighting formulae depending on term importance within an entire collection
2.3 Term weighting formulae depending on Document frequency
5.1 Results from system performance method
5.2 Results from comparison method - An average of 3 days
5.3 Result: Evaluation of performance
A.1 Statistical information: Information of data in MB
A.2 Statistical information: Information of data in terms of words
A.3 Results from comparison method - Day 1
A.4 Result: Evaluation of performance for Day 1
A.5 Results from comparison method - Day 2
A.6 Result: Evaluation of performance for Day 2
A.7 Results from comparison method - Day 3
A.8 Result: Evaluation of performance for Day 3

Chapter 1

Introduction

Information has become an essential currency of this "Information Age". With the advancement of network technology, the information resources of the world can be accessed from a desktop, and with the growth of this connectivity has grown a desire to share ideas and information. For that purpose a system already exists which enables millions of people around the world to send and receive information: the Internet. The Internet supports many styles of communication; one of them is Usenet News, a global bulletin board system. Usenet is a collaborative system with no barriers to access, no requirement of computer literacy beyond basic word processing skill, and it works in the most democratic way, without any restriction on the content or dissemination of information. Anyone can post about any topic, read what anyone else has to say about a topic, and share his/her own view.

With the explosive growth of the system in terms of the number of hosts and users connected, the information flow has increased many-fold. Every day around 30000 messages (90 MB) of text on a wide range of topics arrive at each site on Usenet. (A typical example of Usenet traffic volume: in 2 weeks, 990258 articles totalling 2512.5 MB were submitted from 53566 Usenet sites by 196093 different users to 11099 different newsgroups, an average of 180 MB per day [Com95].) Given this amount and diversity of information, the question arises: how can one actually make use of it, getting information on his/her topics of interest, and on those topics only? Users generally have a small number of specific interests, but most of the material found on Usenet is irrelevant to these interests and often of low quality. One solution to this information overload problem is controlling the information, either by charging for posting or by having editors filter out low-quality information. At the moment it is not practical to implement either of these alternatives. On the other hand, it is not clear that such restrictions are desirable; if restrictions of this sort had been applied from the outset, they would have hampered the growth of Usenet. The only feasible solution, then, is filtering: selecting relevant information from the incoming stream on the user's behalf.

1.1 Filtering

Filtering is not a new concept; we use information filtering in our day-to-day lives. For instance, when we go to look for books of interest in the library, we do not start reading all the books to find relevant topics but apply filtering, whereby we use a catalogue to limit our search to books on specific topics of interest. Once we have found a potentially useful book, we first look at its index to determine whether it covers any interesting material.
The same ideas are applied in filtering Usenet news, where the first layer of filtering is provided by newsgroups. Messages are partitioned over the newsgroups, with each newsgroup carrying articles on specific topics. Newsgroup-based filtering is achieved by simply subscribing only to a small set of newsgroups that carry articles relevant to one's interests. This sort of filtering is not entirely satisfactory: quite often topics cross newsgroup boundaries, and one cannot be certain that the un-subscribed newsgroups contain no relevant articles. The task of selecting relevant newsgroups is itself a problem; looking at all 3000 newsgroups and what each contains, and then selecting, is a non-trivial task. Large numbers of articles (hundreds per day) are posted in the most active newsgroups, and many of these are not of interest to every user who reads that newsgroup.

Currently the burden of selecting relevant articles lies with the user and is borne by looking at the subject line of each article. This functionality (displaying the subject line) is provided by almost all news reading programs (e.g. "rn", "trn", "nn"). However, simply looking at the subject heading is not a viable way to find relevant articles; sometimes articles do not have any subject, and more frequently they have a subject heading that is not directly related to the content of the article. Some news reading programs (e.g. "nn", "GNUS") support the idea of a "kill-file" which, depending upon criteria supplied by the user, removes articles from consideration before their subject lines are displayed (e.g. kill all articles posted from site "x" or posted by "author y"). This mechanism still puts the burden of filtering onto the user, who has to choose between subscribing to only a small number of newsgroups and potentially missing interesting items, or subscribing to more newsgroups and manually filtering out a large number of uninteresting articles. Clearly, more automatic assistance is required in the information filtering task.

1.1.1 Information filtering system

An information filtering system is an information system designed for unstructured or semistructured data [BC92].
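To make the two mechanisms above concrete, here is a minimal Python sketch of a kill-file combined with a simple profile-based filter. The header fields, kill rules, profile terms, and threshold are all invented for illustration; they are not taken from the thesis or from any actual newsreader.

```python
# Illustrative sketch of kill-file plus profile-based filtering.
# The Article fields mirror common Usenet headers; the rules and
# profile below are hypothetical examples, not the thesis's system.
from dataclasses import dataclass

@dataclass
class Article:
    author: str
    site: str
    subject: str
    body: str

# Kill-file: discard articles matching any rule before display.
KILL_RULES = [
    lambda a: a.site == "spam.example.com",     # kill all articles from site "x"
    lambda a: a.author == "noisy@example.com",  # kill all articles by "author y"
]

# User profile: weighted terms of interest (invented values).
PROFILE = {"indexing": 2.0, "retrieval": 1.5, "usenet": 1.0}

def passes_killfile(article: Article) -> bool:
    return not any(rule(article) for rule in KILL_RULES)

def profile_score(article: Article) -> float:
    return sum(PROFILE.get(w, 0.0) for w in article.body.lower().split())

articles = [Article("alice@example.com", "news.example.edu",
                    "Re: indexing", "automatic indexing of usenet news")]
relevant = [a for a in articles if passes_killfile(a) and profile_score(a) > 1.0]
print([a.subject for a in relevant])
```

In spirit, the kill rules mirror what "nn"-style kill-files do before subject lines are displayed, while the profile score anticipates the kind of content-based assistance the thesis argues for.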
Recommended publications
  • Automatic Indexing: an Approach Using an Index Term Corpus and Combining Linguistic and Statistical Methods
    Automatic indexing: an approach using an index term corpus and combining linguistic and statistical methods. Timo Lahtinen. Academic dissertation to be publicly discussed, by due permission of the Faculty of Arts at the University of Helsinki in lecture room Unioninkatu 35, on the 11th of December, 2000, at 11 o'clock. University of Helsinki, Department of General Linguistics, Publications No. 34, P.O. Box 4, FIN-00014 University of Helsinki, Finland. ISBN 951-45-9639-0; ISBN 951-45-9640-4 (PDF); ISSN 0355-7170. Helsinki 2000, Yliopistopaino.
    Abstract: This thesis discusses the problems and the methods of finding relevant information in large collections of documents. The contribution of this thesis to this problem is to develop better content analysis methods which can be used to describe document content with index terms. Index terms can be used as meta-information that describes documents, and that is used for seeking information. The main point of this thesis is to illustrate the process of developing an automatic indexer which analyses the content of documents by combining evidence from word frequencies and evidence from linguistic analysis provided by a syntactic parser. The indexer weights the expressions of a text according to their estimated importance for describing the content of a given document on the basis of the content analysis. The typical linguistic features of index terms were explored using a linguistically analysed text collection where the index terms are manually marked up. This text collection is referred to as an index term corpus.
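The statistical half of the evidence combination described in this abstract can be illustrated with a minimal tf-idf weighting sketch in Python. This is a generic illustration, not Lahtinen's actual method: the linguistic evidence from a syntactic parser and the index term corpus are omitted, and the three documents are invented.

```python
# Minimal tf-idf term weighting: one statistical ingredient of the
# content analysis described above. Linguistic (parser-based)
# evidence is not modelled here; the documents are invented.
import math
from collections import Counter

docs = [
    "automatic indexing of news articles",
    "index terms describe document content",
    "news articles flow through usenet daily",
]
tokenized = [d.split() for d in docs]
df = Counter(term for doc in tokenized for term in set(doc))
N = len(tokenized)

def tfidf(doc_terms):
    tf = Counter(doc_terms)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

for doc in tokenized:
    weights = tfidf(doc)
    top = sorted(weights, key=weights.get, reverse=True)[:3]
    print(top)  # highest-weighted candidate index terms per document
```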
  • NEDLIB Glossary
    NEDLIB Glossary
    Authors: José Luis Borbinha, National Library of Portugal; Fernando Cardoso, National Library of Portugal; Nuno Freire, INESC
    Date of Issue: 22 February 2000
    Issue: 1.0web
    Total Number of Pages: 46
    Contents: Abstract; Keywords; 1. Concepts (1.1 Index of Terms; 1.2 Terms Relationship; 1.3 Glossary)
  • A Design of the Inverted Index Based on Web Document Comprehending
    JOURNAL OF COMPUTERS, VOL. 6, NO. 4, APRIL 2011. A Design of the Inverted Index Based on Web Document Comprehending. Shaojun Zhong (Jiangxi University of Science and Technology, Ganzhou, China); Min Shang and Zhijuan Deng (Yuxi Normal University, Yuxi, China; Jiangxi University of Science and Technology, Ganzhou, China). Email: [email protected], [email protected], [email protected]
    Abstract—A design of an inverted index based on web document comprehending is introduced, and the algorithm is presented by combining the inverted index with web document understanding technology. Experimental results show that it costs less time than the traditional inverted index to query documents of the same size.
    Index Terms—inverted index, web document comprehending, latent semantic analysis, correlation search
    I. INTRODUCTION: The emergence of the Internet has made major changes to information retrieval. Search engines have become a convenient way to query data and access information.
    II. INVERTED INDEX TECHNOLOGY: The inverted index is the most widely used index model at present. All words in the documents are indexed as key words in an inverted index. The recording item of each word includes the documents that contain the word, as well as its locations in those documents. Thus, when you search for a word in the index, you can easily find the documents which contain the word and its location in each document. For the inverted index of a search engine, since the number of web pages related to lexical items changes dynamically, and so does the content of the web pages, it is more difficult to maintain the inverted index.
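A minimal positional inverted index of the kind this abstract describes: each word's recording item lists the documents that contain it and its positions within each. This Python sketch is a generic illustration with invented documents; the paper's document-comprehending extensions (e.g. latent semantic analysis) are not modelled.

```python
# Minimal positional inverted index: each word maps to the documents
# containing it and the word's positions within each document.
from collections import defaultdict

def build_index(docs):
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

docs = {
    "d1": "inverted index for web documents",
    "d2": "web search uses an inverted index",
}
index = build_index(docs)
print(dict(index["index"]))  # {'d1': [1], 'd2': [5]} -> documents and positions
```

Looking up a word is then a single dictionary access, which is why query time beats scanning the documents themselves; the maintenance difficulty the excerpt mentions comes from keeping these posting lists current as pages change.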
  • Three SGML Metadata Formats: TEI, EAD, and CIMI
    Three SGML metadata formats: TEI, EAD, and CIMI. A Study for BIBLINK Work Package 1.1, December 1996. Lou Burnard and Richard Light.
    Prefatory Note: BIBLINK Work Package 1 (Study of Metadata) is intended to identify, describe, and compare current approaches to the encoding of metadata, with a view to making recommendations. Amongst the many different approaches currently in use, those based on the Standard Generalized Markup Language (SGML: ISO 8879) appear to offer the widest range of features and the broadest potential for cross-sector applicability. This additional detailed study on three SGML-based formats was therefore commissioned to supplement the survey reported on in Work Package 1. This report consists of a brief overview of each of the three schemes studied, some general discussion of the technical aspects of using SGML in a production environment, and a detailed feature-by-feature comparison of the three schemes with respect to a number of key attributes identified by the BIBLINK project. A bibliography with pointers to further reading on each of the three schemes is also provided. Richard Light was responsible for the original draft of the report, and for the information on CIMI and EAD. Lou Burnard was responsible for final editing of the report and for the information on TEI and SGML. Thanks are due to John Perkins, Daniel Pitti, and Rachel Heery for helpful suggestions during the preparation of the report.
    Contents: 1 Introduction; 2 Overview of the schemes studied; 2.1 The Text Encoding
  • Metadata Standards & Applications
    Cataloging for the 21st Century, Course 2: Metadata Standards & Applications. Trainee Manual. Original course design by Diane I. Hillmann, Cornell University Library; revised by Rebecca Guenther and Allene Hayes, Library of Congress. For the Library of Congress and the Association for Library Collections & Technical Services, Washington, DC, August 2008.
    Course Outline:
    1. Introduction to Digital Libraries and Metadata: Discuss similarities and differences between traditional and digital libraries; understand how the environment where metadata is developing differs from the library automation environment; explore different types and functions of metadata (administrative, technical, etc.). Exercise: Examine three digital library instances, discuss differences in user approach and experience, and look for examples of metadata use.
    2. Descriptive Metadata Standards: Understand the categories of descriptive metadata standards (e.g., data content standards, data value standards, data structure standards, relationship models); learn about the various descriptive metadata standards and the communities that use them; evaluate the efficacy of a standard for a particular community; understand how relationship models are used. Exercise: Create a brief descriptive metadata record using the standard assigned.
    3. Technical and Administrative Metadata Standards: Understand the different types of administrative metadata
  • Unit 14: Overview of Web Indexing, Metadata, Interoperability and Ontologies
    UNIT 14: OVERVIEW OF WEB INDEXING, METADATA, INTEROPERABILITY AND ONTOLOGIES
    Structure: 14.0 Objectives; 14.1 Introduction; 14.2 Web Indexing (14.2.1 Concept; 14.2.2 Types of Web Indexes); 14.3 Metadata (14.3.1 Concept; 14.3.2 Types); 14.4 Ontology (14.4.1 Concept; 14.4.2 Web Ontology; 14.4.3 Types); 14.5 Interoperability (14.5.1 Need; 14.5.2 Interoperability and Web Search; 14.5.3 Methods for Achieving Interoperability; 14.5.4 Protocols for Interoperability); 14.6 Summary; 14.7 Answers to Self Check Exercises; 14.8 Keywords; 14.9 References and Further Reading
    14.0 OBJECTIVES: After reading this Unit, you will be able to: define the meaning and need of web indexing; explain the role, usage and importance of metadata; define ontology and its importance in web parlance; explain interoperability and various methods of interoperability; and discuss protocols for interoperability.
    14.1 INTRODUCTION: An index is a tool that has long been in use to locate information. It is a list of key words or terms that supplements a document at the end of the text for fruitful navigation and browsing. An index not only provides a chance to highlight content and give a bird's-eye view of the document, it also helps the author to identify inconsistencies and improve the content of the document. The Web has emerged as an enormous source of information, with a lot of chaotic information content as well. Structurally, it is a collection of websites hosted at different domains round the globe.
  • Bibliometrics, Information Retrieval and Natural Language Processing: Natural Synergies to Support Digital Library Research
    BIRNDL 2016 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries. Bibliometrics, Information Retrieval and Natural Language Processing: Natural Synergies to Support Digital Library Research. Dietmar Wolfram, School of Information Studies, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI, U.S.A. 53201. [email protected]
    Abstract: Historically, researchers have not fully capitalized on the potential synergies that exist between bibliometrics and information retrieval (IR). Knowledge of regularities in information production and use, as well as citation relationships in bibliographic databases that are studied in bibliometrics, can benefit IR system design and evaluation. Similarly, techniques developed for IR and database technology have made the investigation of large-scale bibliometric phenomena feasible. Both fields of study have also benefitted directly from developments in natural language processing (NLP), which have provided new tools and techniques to explore research problems in bibliometrics and IR. Digital libraries, with their full-text and multimedia content, along with searching and browsing capabilities, represent ideal environments in which to investigate the mutually beneficial relationships that can be forged among bibliometrics, IR and NLP. This brief presentation highlights the symbiotic relationship that exists among bibliometrics, IR and NLP.
    Keywords: bibliometrics, information retrieval, digital libraries, natural language processing
    1 Introduction: Both information retrieval (IR) and bibliometrics have long histories as distinct areas of investigation in information science. IR has focused on the storage, representation and retrieval of documents (text or other media) from the system and user perspectives. Bibliometrics and its allied areas (informetrics, scientometrics, webmetrics, or simply "metrics") have focused on discovering and understanding regularities that exist in the way information is produced and used, but this simple definition belies the breadth of research undertaken.
  • Search Engine Optimization for E-Business
    Search Engine Optimization for E-Business Website. Prepared by: Nancy Sharaf, Student ID: 401020098. Supervised by: Dr. Raed Hanandeh. A thesis proposal submitted in partial fulfillment of the requirements for the degree of Master of Electronic-Business, Faculty of Business, E-Business Department, Middle East University, Amman, Jordan, January 2013.
    Middle East University for Graduate Studies, Authorization Form: I, the undersigned (Nancy Sharaf), authorize the Middle East University for Graduate Studies to provide copies of my thesis to all and any university libraries and/or institutions or related parties interested in scientific research upon their request.
    Discussion Committee Decision: This thesis has been discussed under the title "Search Engine Optimization for E-Business Website". This is to certify that the thesis was successfully defended and approved on May 24th, 2010.
    Acknowledgements: This thesis is the product of an educational experience at MEU; various people have contributed towards its completion at different stages, either directly or indirectly, and any attempt to thank all of them is bound to fall short. To begin, I would like to express my wholehearted and sincere gratitude to Dr. Raed Hanandeh and Dr. Sharef Jad for their guidance, time, and patience, and for supporting me and this thesis during every stage of its development. I would like to extend my special thanks to my family and my work family, without whose encouragement and support I wouldn't have been here completing my degree's final requirements. Sincerely yours,
    Dedications: To my father and mother's soul, my brothers and sisters, and to all my friends, I dedicate this effort.
  • Topic Indexing with Wikipedia
    Topic Indexing with Wikipedia. Olena Medelyan, Ian H. Witten and David Milne, Computer Science Department, University of Waikato, New Zealand. {olena, ihw, dnk2}@cs.waikato.ac.nz
    Abstract: Wikipedia article names can be utilized as a controlled vocabulary for identifying the main topics in a document. Wikipedia's 2M articles cover the terminology of nearly any document collection, which permits controlled indexing in the absence of manually created vocabularies. We combine state-of-the-art strategies for automatic controlled indexing with Wikipedia's unique property—a richly hyperlinked encyclopedia. We evaluate the scheme by comparing automatically assigned topics with those chosen manually by human indexers. Analysis of indexing consistency shows that our algorithm outperforms some human subjects.
    1. Introduction: The main topics of a document often indicate whether or not it is worth reading. In libraries of yore, professional human indexers were employed to manually categorize documents. This paper shows how Wikipedia can be utilized effectively for topical indexing. The scheme is evaluated on a set of 20 computer science articles, indexed by 15 teams of computer science students working independently, two per team. The automatic approach outperforms some student teams, and needs only a very small training set.
    2. Related work: One of the largest controlled vocabularies used for indexing is the Medical Subject Heading (MeSH) thesaurus. It contains 25,000 concepts and has been applied to both term assignment and keyphrase indexing, individually and in combination. Markó et al. (2004) decompose document phrases into morphemes with a manually created dictionary and associate them with MeSH terms assigned to the documents.
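The first step of such a scheme, matching document phrases against a controlled vocabulary of article names, can be sketched as below. The vocabulary and text are invented, and the disambiguation and link-based ranking that the paper's actual algorithm relies on are omitted.

```python
# Toy controlled-vocabulary indexing: match document n-grams against
# a tiny, invented vocabulary standing in for Wikipedia article names.
# The real scheme adds disambiguation and ranking via Wikipedia links.
VOCABULARY = {"machine learning", "information retrieval", "wikipedia"}

def candidate_topics(text, max_n=3):
    words = text.lower().split()
    found = set()
    for n in range(1, max_n + 1):               # try 1- to max_n-grams
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in VOCABULARY:
                found.add(phrase)
    return found

doc = "Wikipedia supports information retrieval and machine learning research"
print(candidate_topics(doc))
# {'wikipedia', 'information retrieval', 'machine learning'}
```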
  • Natural Language and Mathematics Processing for Applicable Theorem Search
    Natural Language and Mathematics Processing for Applicable Theorem Search, by Ștefan Anca. A thesis for conferral of a Master of Science in Computer Science. Prof. Dr. Michael Kohlhase (Jacobs University), Prof. Dr. John Bateman (Universität Bremen). Date of Submission: August 24th, 2009. School of Engineering and Science.
    Declaration: The research subsumed in this thesis has been conducted under the supervision of Prof. Dr. Michael Kohlhase from Jacobs University Bremen. All material presented in this Master Thesis is my own, unless specifically stated. I, Ștefan Anca, hereby declare that, to the best of my knowledge, the research presented in this Master Thesis contains original and independent results, and it has not been submitted elsewhere for the conferral of a degree. Ștefan Anca, Bremen, August 24th, 2009.
    Acknowledgements: I would like to first of all thank my supervisor, Prof. Dr. Michael Kohlhase, for his continuous guidance throughout my entire research within the KWARC group and specifically for this thesis. Using his advice and direction, I have managed to fully grasp the complexity of my research topic and develop my own approach to pursue it. His comments on the final form of this thesis were most valuable and led me to a comprehensive exposition. I would also like to thank the members of the KWARC group for maintaining an exciting research environment and encouraging independent study and research. I also extend my gratitude to the developers of MathWebSearch for continuous support in my research efforts. I would like to thank my friends who took the time to review drafts of this report, especially Andrei Giurgiu, Adrian Djokic, Magdalena Narozniak and Milena Makaveeva.
  • Fall 2015 Web Search Engine Architecture Overview Chapter 1,2
    Subject 2, Fall 2015: Web Search Engine Architecture Overview. Chapters 1, 2, 3.1-3.4.
    Disclaimer: These abbreviated notes DO NOT substitute for the textbook for this class. They should be used IN CONJUNCTION with the textbook and the material presented in class. If there is a discrepancy between these notes and the textbook, ALWAYS consider the textbook to be correct. Report such a discrepancy to the instructor so that he resolves it. These notes are only distributed to the students taking this class with A. Gerbessiotis in Fall 2015; distribution outside this group of students is NOT allowed. (c) Copyright A. Gerbessiotis, CS 345, Fall 2015. All rights reserved.
    Search Engine Architecture, Overview of components: We introduce in this subject the architecture of a search engine. It consists of its software components, the interfaces provided by them, and the relationships between any two of them. (An extra level of detail could include the data structures supported.) In this subject, we use the example of an early centralized architecture, as reflected by the Altavista search engine of the mid 90s, to provide a high-level description of the major components of such a system. We then (Subject 3) give an example of the Google search engine architecture as it was originally developed and used back in 1997 and 1998. There are more components involved in the Google architecture, but a high-level abstraction of that architecture (minus the ranking engine perhaps) is not much different from Altavista's. The first search engines, such as Excite (1994), InfoSeek (1994), and Altavista (1995), employed primarily Information Retrieval principles and techniques, evaluating the similarity of a query q relative to a web document dj from a corpus of web documents retrieved from the Web.
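The similarity evaluation mentioned at the end of this excerpt can be illustrated with the classic IR measure: cosine similarity between a query q and a document dj over term-count vectors. This is a generic sketch with invented strings; real engines weight the vectors (e.g. with tf-idf) and combine many other signals.

```python
# Cosine similarity between a query q and a document d_j over raw
# term-count vectors: the basic IR measure early engines built on.
import math
from collections import Counter

def cosine(q: str, d: str) -> float:
    qv, dv = Counter(q.lower().split()), Counter(d.lower().split())
    dot = sum(qv[t] * dv[t] for t in qv)
    norm = math.sqrt(sum(v * v for v in qv.values())) * \
           math.sqrt(sum(v * v for v in dv.values()))
    return dot / norm if norm else 0.0

print(cosine("web search engine",
             "early search engines used web IR techniques"))
```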
  • Establishing the Value of Socially Created Metadata to Image Indexing
    This is a preprint of an article published in Library & Information Science Research: Stvilia, B., Jörgensen, C., & Wu, S. (2012). Establishing the value of socially created metadata to image indexing. Library & Information Science Research, 34(2), 99-109.
    Establishing the Value of Socially Created Metadata to Image Indexing. Besiki Stvilia, Corinne Jörgensen, and Shuheng Wu, School of Library and Information Studies, College of Communication and Information, Florida State University, Tallahassee, FL 32306-2100, USA. {bstvilia, cjorgensen, sw09f}@fsu.edu
    Abstract: There have been ample suggestions in the literature that terms added to documents from Flickr and Wikipedia can complement traditional methods of indexing and controlled vocabularies. At the same time, adding new metadata to existing metadata objects may not always add value to those objects. This research examines the potential added value of using user-contributed ("social") terms from Flickr and the English Wikipedia in image indexing compared with using two expert-created controlled vocabularies: the Thesaurus for Graphic Materials and the Library of Congress Subject Headings. Our experiments confirmed that the social terms did provide added value relative to terms from the controlled vocabularies. The median rating for the usefulness of social terms was significantly higher than the baseline rating but was lower than the ratings for the terms from the Thesaurus for Graphic Materials and the Library of Congress Subject Headings. Furthermore, complementing the controlled vocabulary terms with social terms more than doubled the average coverage of participants' terms for a photograph. The study also investigated the relationships between user demographics and users' perceptions of the value of terms, as well as the relationships between user demographics and indexing quality, as measured by the number of terms participants assigned to a photograph.