Identifying Experts and Authoritative Documents in Social Bookmarking Systems
IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS

by

Jonathan P. Grady
B.A. Economics & Business Administration, Ursinus College, 1997
M.S. Electronic Commerce, Carnegie Mellon University, 2001

Submitted to the Graduate Faculty of the School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Pittsburgh
2013

UNIVERSITY OF PITTSBURGH
SCHOOL OF INFORMATION SCIENCES

This dissertation was presented by Jonathan P. Grady. It was defended on March 27, 2013 and approved by:

Peter Brusilovsky, Ph.D., Professor, School of Information Sciences
Daqing He, Ph.D., Associate Professor, School of Information Sciences
Stephen C. Hirtle, Ph.D., Professor, School of Information Sciences
Brian S. Butler, Ph.D., Associate Professor, College of Information Studies, University of Maryland

Dissertation Advisor: Michael B. Spring, Ph.D., Associate Professor, School of Information Sciences

Copyright © by Jonathan P. Grady, 2013

IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS

Jonathan P. Grady
University of Pittsburgh, 2013

Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or “tags”, to their bookmarks as a way to organize resources for later recall. Ease of use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmarking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags. With few restrictions to entry, annotation spamming adds noise to public social bookmarking systems.
Furthermore, many algorithms for discovering semantic relations among tags do not scale to the Web. Recognizing these problems, we develop a novel graph-based Expert and Authoritative Resource Location (EARL) algorithm to find the most authoritative documents and expert users on a given topic in a social bookmarking system. In EARL’s first phase, we reduce noise in a Delicious dataset by isolating a smaller sub-network of “candidate experts”, users whose tagging behavior shows potential domain and classification expertise. In the second phase, a HITS-based graph analysis is performed on the candidate experts’ data to rank the top experts and authoritative documents by topic. To identify topics of interest in Delicious, we develop a distributed method to find subsets of frequently co-occurring tags shared by many candidate experts.

We evaluated EARL’s ability to locate authoritative resources and domain experts in Delicious by conducting two independent experiments. The first experiment relied on human judges’ n-point scale ratings of resources suggested by three graph-based algorithms and by Google. The second experiment evaluated the proposed approach’s ability to identify classification expertise through human judges’ n-point scale ratings of classification terms versus expert-generated data.

TABLE OF CONTENTS

PREFACE
1.0 INTRODUCTION
    1.1 FOCUS OF STUDY
    1.2 DELICIOUS
    1.3 LIMITATIONS AND DELIMITATIONS
    1.4 DEFINITION OF TERMS
        1.4.1 Annotation
        1.4.2 Bookmark
        1.4.3 Metadata
        1.4.4 Social Annotation
        1.4.5 Social Bookmark
        1.4.6 Tag
        1.4.7 Resource
        1.4.8 URL
        1.4.9 Taxonomy
        1.4.10 Folksonomy
        1.4.11 Noise
        1.4.12 Power Set
2.0 RELATED WORK
    2.1 BACKGROUND
        2.1.1 Annotations
            2.1.1.1 Bookmarks
            2.1.1.2 Social Bookmarks
        2.1.2 Classification
            2.1.2.1 Approaches to Categorization
            2.1.2.2 Classification Structures
            2.1.2.3 Subject Analysis - Classification by Experts
        2.1.3 Domain Expertise
    2.2 IDENTIFYING EXPERT USERS IN WEB-BASED SYSTEMS
        2.2.1 Identifying Expert Users in Bipartite Graphs
        2.2.2 Identifying Expert Users in Tripartite Graphs
    2.3 CLASSIFICATION IN SOCIAL ANNOTATION SYSTEMS
3.0 PRELIMINARY ANALYSIS
    3.1 INTRODUCTION
    3.2 SOCIAL BOOKMARKING SYSTEMS
        3.2.1 Usage Patterns
        3.2.2 Topics of Interest in Social Bookmarking Systems
    3.3 FINDING EXPERTS AND AUTHORITATIVE RESOURCES
        3.3.1 Defining Experts and Authoritative Resources
        3.3.2 EARL Algorithm
        3.3.3 Selecting Topics of Interest
    3.4 FINDING EXPERTS AND AUTHORITATIVE RESOURCES
        3.4.1 Candidate Expert Tagging Patterns
        3.4.2 Topics of Interest
        3.4.3 EARL versus HITS and SPEAR
4.0 RESEARCH DESIGN
    4.1 DELICIOUS DATA
    4.2 PRE-PROCESSING OF DATA
    4.3 METHODOLOGY
    4.4 EXPERIMENT 1: EVALUATING EARL’S ABILITY TO LOCATE AUTHORITATIVE RESOURCES
        4.4.1 Participants
        4.4.2 Variables and Expected Results
        4.4.3 Hypotheses of the 1st Experiment
        4.4.4 Subjects, Evaluation, and Analysis Procedure
    4.5 EXPERIMENT 2: EVALUATING EARL’S ABILITY TO LOCATE DOMAIN EXPERTS
        4.5.1 Participants
        4.5.2 Variables