Identifying Experts and Authoritative Documents in Social Bookmarking Systems

Total Page:16

File Type:pdf, Size:1020Kb

Identifying Experts and Authoritative Documents in Social Bookmarking Systems IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS by Jonathan P. Grady B.A. Economics & Business Administration, Ursinus College, 1997 M.S. Electronic Commerce, Carnegie Mellon Univ., 2001 Submitted to the Graduate Faculty of School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2013 UNIVERSITY OF PITTSBURGH SCHOOL OF INFORMATION SCIENCES This dissertation was presented by Jonathan P. Grady It was defended on March 27, 2013 and approved by Peter Brusilovsky, Ph.D., Professor, School of Information Sciences Daqing He, Ph.D., Associate Professor, School of Information Sciences Stephen C. Hirtle, Ph.D., Professor, School of Information Sciences Brian S. Butler, Ph.D., Associate Professor, College of Information Studies, University of Maryland Dissertation Advisor: Michael B. Spring, Ph.D., Associate Professor, School of Information Sciences ii Copyright © by Jonathan P. Grady 2013 iii IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS Jonathan P. Grady University of Pittsburgh, 2013 Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or “tags”, to their bookmarks as a way to organize resources for later recall. Ease-of-use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmaking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags. With few restrictions to entry, annotation spamming adds noise to public social bookmarking systems. Furthermore, many algorithms for discovering semantic relations among tags do not scale to the Web. Recognizing these problems, we develop a novel graph-based Expert and Authoritative Resource Location (EARL) algorithm to find the most authoritative documents and expert users on a given topic in a social bookmarking system. In EARL’s first phase, we reduce noise in a Delicious dataset by isolating a smaller sub-network of “candidate experts”, users whose tagging behavior shows potential domain and classification expertise. In the second phase, a HITS- based graph analysis is performed on the candidate experts’ data to rank the top experts and authoritative documents by topic. To identify topics of interest in Delicious, we develop a iv distributed method to find subsets of frequently co-occurring tags shared by many candidate experts. We evaluated EARL’s ability to locate authoritative resources and domain experts in Delicious by conducting two independent experiments. The first experiment relies on human judges’ n-point scale ratings of resources suggested by three graph-based algorithms and Google. The second experiment evaluated the proposed approach’s ability to identify classification expertise through human judges’ n-point scale ratings of classification terms versus expert- generated data. v TABLE OF CONTENTS PREFACE ................................................................................................................................. XVI 1.0 INTRODUCTION ........................................................................................................ 1 1.1 FOCUS OF STUDY ............................................................................................. 3 1.2 DELICIOUS ......................................................................................................... 6 1.3 LIMITATIONS AND DELIMITATIONS ........................................................ 8 1.4 DEFINITION OF TERMS ............................................................................... 10 1.4.1 Annotation ...................................................................................................... 10 1.4.2 Bookmark ....................................................................................................... 10 1.4.3 Metadata ......................................................................................................... 11 1.4.4 Social Annotation ........................................................................................... 11 1.4.5 Social Bookmark ............................................................................................ 11 1.4.6 Tag................................................................................................................... 11 1.4.7 Resource ......................................................................................................... 12 1.4.8 URL ................................................................................................................. 12 1.4.9 Taxonomy ....................................................................................................... 13 1.4.10 Folksonomy ................................................................................................... 13 1.4.11 Noise ............................................................................................................... 14 1.4.12 Power set........................................................................................................ 14 vi 2.0 RELATED WORK .................................................................................................... 15 2.1 BACKGROUND ................................................................................................ 15 2.1.1 Annotations .................................................................................................... 15 2.1.1.1 Bookmarks ........................................................................................... 18 2.1.1.2 Social Bookmarks ................................................................................ 19 2.1.2 Classification .................................................................................................. 24 2.1.2.1 Approaches to Categorization ............................................................ 26 2.1.2.2 Classification Structures ..................................................................... 27 2.1.2.3 Subject Analysis - Classification by Experts .................................... 32 2.1.3 Domain Expertise .......................................................................................... 34 2.2 IDENTIFYING EXPERT USERS IN WEB-BASED SYSTEMS ................. 39 2.2.1 Identifying Expert Users in Bipartite Graphs ............................................ 39 2.2.2 Identifying Expert Users in Tripartite Graphs ........................................... 42 2.3 CLASSIFICATION IN SOCIAL ANNOTATION SYSTEMS ..................... 45 3.0 PRELIMINARY ANALYSIS ................................................................................... 48 3.1 INTRODUCTION ............................................................................................. 48 3.2 SOCIAL BOOKMARKING SYSTEMS ......................................................... 49 3.2.1 Usage Patterns................................................................................................ 50 3.2.2 Topics of Interest in Social Bookmarking Systems .................................... 53 3.3 FINDING EXPERTS AND AUTHORITATIVE RESOURCES .................. 54 3.3.1 Defining experts and authoritative resources ............................................. 54 3.3.2 EARL algorithm ............................................................................................ 56 3.3.3 Selecting topics of interest ............................................................................. 63 vii 3.4 FINDING EXPERTS AND AUTHORITATIVE RESOURCES .................. 68 3.4.1 Candidate Expert Tagging Patterns ............................................................ 68 3.4.2 Topics of Interest ........................................................................................... 73 3.4.3 EARL versus HITS and SPEAR .................................................................. 77 4.0 RESEARCH DESIGN ............................................................................................... 81 4.1 DELICIOUS DATA ........................................................................................... 81 4.2 PRE-PROCESSING OF DATA ....................................................................... 84 4.3 METHODOLOGY ............................................................................................ 86 4.4 EXPERIMENT 1: EVALUATING EARL’S ABILITY TO LOCATE AUTHORITATIVE RESOURCES .................................................................................. 87 4.4.1 Participants .................................................................................................... 87 4.4.2 Variables and Expected Results ................................................................... 88 4.4.3 Hypotheses of the 1st Experiment ................................................................. 89 4.4.4 Subjects, Evaluation, and Analysis Procedure ........................................... 89 4.5 EXPERIMENT 2: EVALUATING EARL’S ABILITY TO LOCATE DOMAIN EXPERTS .......................................................................................................... 91 4.5.1 Participants .................................................................................................... 92 4.5.2 Variables
Recommended publications
  • Lyon-Nist241assmoct9.Pdf
    NIST Special Publication 500-241 Information Technology: A Quick-Reference List of Organizations and Standards for Digital Rights Management Gordon E. Lyon NIST Special Publication 500-241 Information Technology: A Quick-Reference List of Organizations and Standards for Digital Rights Management Gordon E. Lyon Convergent Information Systems Division Information Technology Laboratory National Institute of Standards and Technology Gaithersburg, MD 20899-8951 October 2002 U.S. Department of Commerce Donald L. Evans, Secretary Technology Administration Phillip J. Bond, Under Secretary of Commerce for Technology National Institute of Standards and Technology Arden L. Bement, Jr., Director Reports on Information Technology The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) stimulates U.S. economic growth and industrial competitiveness through technical leadership and collaborative research in critical infrastructure technology, including tests, test methods, reference data, and forward-looking standards, to advance the development and productive use of information technology. To overcome barriers to usability, scalability, interoperability, and security in information systems and networks, ITL programs focus on a broad range of networking, security, and advanced information technologies, as well as the mathematical, statistical, and computational sciences. This Special Publication 500-series reports on ITL’s research in tests and test methods for information technology, and its collaborative activities with industry, government, and academic organizations. Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.
    [Show full text]
  • Approved DITA 2.0 Proposals
    DITA Technical Committee DITA 2.0 proposals DITA TC work product Page 1 of 189 Table of contents 1 Overview....................................................................................................................................................3 2 DITA 2.0: Stage two proposals.................................................................................................................. 3 2.1 Stage two: #08 <include> element.................................................................................................... 3 2.2 Stage two: #15 Relax specialization rules......................................................................................... 7 2.3 Stage two: #17 Make @outputclass universal...................................................................................9 2.4 Stage two: #18 Make audience, platform, product, otherprops into specializations........................12 2.5 Stage two: #27 Multimedia domain..................................................................................................16 2.6 Stage two: #29 Update bookmap.................................................................................................... 20 2.7 Stage two: #36 Remove deprecated elements and attributes.........................................................23 2.8 Stage two: #46: Remove @xtrf and @xtrc...................................................................................... 31 2.9 Stage 2: #73 Remove delayed conref domain.................................................................................36
    [Show full text]
  • TRAFFIC AUTOPILOT™ SPECIAL REPORT Copyright Notice
    TRAFFIC AUTOPILOT™ SPECIAL REPORT Copyright Notice © Copyright 2011 Drew Trainor and Traffic Autopilot ALL RIGHTS RESERVED. No part of this report can be reproduced or distributed in any way without written permission of the author. DISCLAIMER AND/OR LEGAL NOTICES: The information presented herein represents the view of the author as of the date of publication. Because of the rate with which conditions change, the author reserves the right to alter and update his opinion based on the new conditions. The report is for informational purposes only. While every attempt has been made to verify the information provided in this report, neither the author nor his affiliates/partners assume any responsibility for errors, inaccuracies or omissions. Any slights of people or organizations are unintentional. If advice concerning legal or related matters is needed, the services of a fully qualified professional should be sought. This report is not intended for use as a source of legal or accounting advice. You should be aware of any laws which govern business transactions or other business practices in your country and state. Any reference to any person or business whether living or dead is purely coincidental. CONSUMER NOTICE: You should assume that the author of this report has an affiliate relationship and/or another material connection to the providers of goods and services mentioned in this report and may be compensated when you purchase from a provider. You should always perform due diligence before buying goods or services from anyone via the
    [Show full text]
  • Wordpress-For-Web-Developers.Pdf
    www.it-ebooks.info For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them. www.it-ebooks.info Contents at a Glance About the Author ............................................................................................................... xix About the Technical Reviewer ........................................................................................... xxi Acknowledgments ........................................................................................................... xxiii Introduction ...................................................................................................................... xxv ■ Chapter 1: Getting Started .................................................................................................1 ■ Chapter 2: Installing and Upgrading WordPress..............................................................13 ■ Chapter 3: Dashboard and Settings .................................................................................31 ■ Chapter 4: Working with Content ....................................................................................49 ■ Chapter 5: Working with Themes ....................................................................................81 ■ Chapter 6: Working with Plugins .....................................................................................95 ■ Chapter 7: Working with Users ......................................................................................101
    [Show full text]
  • CSCE A201 Computer Programming I Lab 0
    CSCE A201 Computer Programming I Lab 0 Lecture Instructors: Dr. Kenrick Mock, MW 11:30‐12:45pm Dr. Frank Witmer, TR 10‐11:15am Lab Assistants: Deanna Flynn Elisha Waugh Goals for Today • Introduce lab assistants • Meet some of your classmates • Introduction to Dr. Java • Understand platform independence • Write your first (!?) Java program Syllabus Highlights wrt. Lab • May students collaborate on lab assignments? • May students collaborate on programming assignments? • What percentage of the total course grade is determined by lab? • How is the lab grade determined? Introduce your Neighbor 1. Name 2. Hometown 3. Something interesting (e.g. hobby or recent trip) Some Java Development Environments • Dr. Java – from Rice University, simple & effective – standalone executable (installation instructions on Blackboard) • jGRASP – NSF funded at Auburn University • Intellij IDEA – good code completion & error messages – https://www.jetbrains.com/idea/ • NetBeans – from Oracle, good for multiple, larger projects • Eclipse – more plug‐ins, more customizable – good for Android development) • TextPad (initially free, but then ~$30) • Any text editor & command line – Notepad++ with Cygwin/DOS – emacs, vim, nano with unix/linux • See also Appendix 1 in Savitch book Java Example • Class Hello must be stored in file Hello.java public class Hello { public static void main(String[] args) { System.out.println("Hello world."); } } Platform Independence Demo • Compile java & C++ programs on linux – Which executable/bytecode will run on a Windows machine? Platform Independence • Use ‘javac’ command to compile Java source code .java files high‐level language • Use ‘java’ command to run Java bytecode .class and .jar (Java Archive) files low‐level language Same bytecodes can run any platform with an appropriate JVM (‘java’) Lab 0 Exercise 1.
    [Show full text]
  • Towards Actionable Visualisation in Software Development
    Towards Actionable Visualisation in Software Development Leonel Merino, Mohammad Ghafari, and Oscar Nierstrasz Software Composition Group, University of Bern Bern, Switzerland {merino, ghafari, oscar}@inf.unibe.ch Abstract—Although abundant studies have shown how visual- are still unaware of existing visualisation techniques to isation can help software developers to perform their daily tasks, adopt for their particular needs. A few studies have tried to visualisation is still not a common practice since developers address this issue by investigating to which software engi- have little support for adopting a proper visualisation for their needs. neering tasks particular visualisation techniques have been In this paper we review the 346 papers published in SOFT- applied [13]–[15]. Nevertheless, we believe these studies are VIS/VISSOFT venues and identify 65 design study papers that still coarse-grained to match a suitable visualisation to their describe how visualisation is used to alleviate various difficulties concrete needs. in software development. We classify these studies into several When developers perform a particular programming task problem domains that we collected from the research on soft- ware development community, and highlight the characteristics they ask some questions such as “what code is related to a of each study. On the one hand, we support software developers change?” or “where is this method called?”. Several studies to put visualisation in action by mapping existing techniques have investigated such questions and classified them into to particular needs in various problem domains. On the other groups [16]–[18]. Indeed, such questions reflect developer hand, we help researchers in the field by exposing domains with needs, and we believe mapping them to existing types of little visualisation support.
    [Show full text]
  • A 3-Tier Application for Topic Map Technology
    Proceedings of the 6th WSEAS International Conference on Applied Computer Science, Tenerife, Canary Islands, Spain, December 16-18, 2006 356 A 3-tier Application for Topic Map Technology A. PAPASTERGIOU, G. GRAMMATIKOPOULOS*, A. HATZIGAIDAS, G. TRYFON Department of Electronics, (*) Department of Esthetics Alexander Technological Educational Institute of Thessaloniki, EPEAEK II Sindos, 57400, Thessaloniki, Macedonia, GREECE Abstract: In the context of this paper the design and construction of a 3-tier application system for topic map technology is presented in detail. The proposed system employs Enterprise Topic Map Server (ETMS) that manages topic maps (TM) and establishes communication with client applications and persistent database. Yellow Web Services have been customized in order to achieve communication issues and exchange of TM. A client application is addressed to TM developers for editing and visualization of TM. Furthermore, the definition of TM Schema as well as the possibility to process rules and queries are supported. Another key aspect of the present paper is to analyze how different technologies utilized to formulate such a system. Key-words: Topic map, Server, Web services, Client, 3-tier, TM schema, Retrieval 1 Introduction definition of TM Schema as well as the possibility to The Topic Map model describes a mechanism for process rules and queries are also supported. In the representing knowledge about the structure of next sections the overall architecture of the proposed information and organizing it into topics [1,19]. Topics system as well as technologies that are implied in have occurrences and associations that define order to build the system are represented in details. relationships between the topics, creating thus a semantic network over the information [8,18].
    [Show full text]
  • Mapping Topic Maps to Common Logic
    MAPPING TOPIC MAPS TO COMMON LOGIC Tamas´ DEMIAN´ Advisor: Andras´ PATARICZA I. Introduction This work is a case study for the mapping of a particular formal language (Topic Map[1] (TM)) to Common Logic[2] (CL). CL was intended to be a uniform platform ensuring a seamless syntactic and semantic integration of knowledge represented in different formal languages. CL is based on first-order logic (FOL) with a precise model-theoretic semantic. The exact target language is Common Logic Interchange Format (CLIF), the most common dialect of CL. Both CL and TM are ISO standards and their metamodels are included in the Object Definition Metamodel[3] (ODM). ODM was intended to serve as foundation of Model Driven Architecture (MDA) offering formal basis for representation, management, interoperability, and semantics. The paper aims at the evaluation of the use of CL as a fusion platform on the example of TM. II. The Topic Maps TM is a technology for modelling knowledge and connecting this structured knowledge to relevant information sources. A central operation in TMs is merging, aiming at the elimination of redundant TM constructs. TopicMapConstruct is the top-level abstract class in the TM metamodel (Fig. 1). The later detailed ReifiableConstruct, TypeAble and ScopeAble classes are also abstract. The remaining classes are pairwise disjoint. Figure 1: The class hierarchy and the relation and attribute names of the TM metamodel. TopicMap is a set of topics and associations. Topic is a symbol used within a TM to represent exactly one subject, in order to allow statements to be made about that subject.
    [Show full text]
  • The Example of Swedish Independent Music Fandom by Nancy K
    First Monday Online groups are taking new forms as participants spread themselves amongst multiple Internet and offline platforms. The multinational online community of Swedish independent music fans exemplifies this trend. This participant–observation analysis of this fandom shows how sites are interlinked at multiple levels, and identifies several implications for theorists, researchers, developers, industry and independent professionals, and participants. Contents Introduction Fandom Swedish popular music The Swedish indie music fan community Discussion Conclusion Introduction The rise of social network sites is often taken to exemplify a shift from the interest–based online communities of the Web’s “first” incarnation to a new “Web 2.0” in which individuals are the basic unit, rather than communities. In a recent First Monday article, for instance, boyd (2006) states, “egocentric networks replace groups.” I argue that online groups have not been “replaced.” Even as their members build personal profiles and egocentric networks on MySpace, Facebook, BlackPlanet, Orkut, Bebo, and countless other emerging social network sites, online groups continue to thrive on Web boards, in multiplayer online games, and even on the all–but–forgotten Usenet. However, online communities are also taking a new form somewhere between the site-based online group and the egocentric network, distributing themselves throughout a variety of sites in a quasi–coherent networked fashion. This new form of distributed community poses particular problems for its members, developers, and analysts. This paper, based on over two years of participant–observation, describes this new shape of online community through a close look at the multinational online community of fans of independent rock music from Sweden.
    [Show full text]
  • How to Choose a Search Engine Or Directory
    How to Choose a Search Engine or Directory Fields & File Types If you want to search for... Choose... Audio/Music AllTheWeb | AltaVista | Dogpile | Fazzle | FindSounds.com | Lycos Music Downloads | Lycos Multimedia Search | Singingfish Date last modified AllTheWeb Advanced Search | AltaVista Advanced Web Search | Exalead Advanced Search | Google Advanced Search | HotBot Advanced Search | Teoma Advanced Search | Yahoo Advanced Web Search Domain/Site/URL AllTheWeb Advanced Search | AltaVista Advanced Web Search | AOL Advanced Search | Google Advanced Search | Lycos Advanced Search | MSN Search Search Builder | SearchEdu.com | Teoma Advanced Search | Yahoo Advanced Web Search File Format AllTheWeb Advanced Web Search | AltaVista Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Yahoo Advanced Web Search Geographic location Exalead Advanced Search | HotBot Advanced Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search Images AllTheWeb | AltaVista | The Amazing Picture Machine | Ditto | Dogpile | Fazzle | Google Image Search | IceRocket | Ixquick | Mamma | Picsearch Language AllTheWeb Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Google Language Tools | HotBot Advanced Search | iBoogie Advanced Web Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search Multimedia & video All TheWeb | AltaVista | Dogpile | Fazzle | IceRocket | Singingfish | Yahoo Video Search Page Title/URL AOL Advanced
    [Show full text]
  • Hostirian-100S-SEO-Tips.Pdf
    1 Introduction Hostirian has put together 100 search engine techniques. The competition is fierce in the search engine world. In order to get your web site or blog ranked high you need all the SEO techniques you can handle from this list. You can study millions of high rank web sites and model their techniques to use on your web site. If you choose to handle your SEO on your own, a great place to get started is getting your website setup with https://www.google.com/webmasters/#?modal_active=none 2 1) You might add keywords/phrases in the html title tag. You could also model other high ranked blog site's SEO tactics. 2) You can have good linking structure. You might also clone other top listed video site's search engine tips. 3) You should insert keywords/phrases in the html body copy. You can also imitate other leading world news site's optimization methods. 4) You could purchase quality web hosting. You may also fabricate other number one wiki site's SEO concepts. 5) You may use a fast loading web site. You should also mirror other top 10 web site template site's search engine ideas. 6) You might place keywords/phrases in bold typeface. You could also copy other hierarchy portal site's optimization advice. 7) You can affix keywords/phrases in header tags. You might also pattern other high traffic web directory site's SEO instructions. 3 8) You should use quality web site links. You can also duplicate other top of the list web based software site's search engine loopholes.
    [Show full text]
  • Young Adult Library Services Association
    THE OFFICIAL JOURNAL OF THE YOUNG ADULT LIBRARY SERVICES ASSOCIATION five ye ng ar ti s a o r f b y e a l l e s c young adult c e s l l e a b y r 5 f a t o in rs librarylibrary services services g five yea VOLUME 6 | NUMBER 2 WINTER 2008 ISSN 1541-4302 $12.50 INSIDE: INFORMATION TOOLS MUsiC WEB siTes TOP FIFTY GAMinG CORE COLLECTION TITLES INTERVIEW WITH KIMBERLY NEWTON FUSCO INFORMATION LITERACY AND MUCH MORE! TM ISSUE! TEEN TECH WEEK TM TM TEEN TECH WEEK MARCH 2-8, 2008 ©2007 American Library Association | Produced in partnership with YALSA | Design by Distillery Design Studio | www.alastore.ala.org march 2–8, 2008 for Teen Tech Week™ 2008! Join the celebration! Visit www.ala.org/teentechweek, and you can: ã Get great ideas for activities and events for any library, at any budget ã Download free tech guides and social networking resources to share with your teens ã Buy cool Teen Tech Week merchandise for your library ã Find inspiration or give your own ideas at the Teen Tech Week wiki, http://wikis.ala.org/yalsa/index.php/ Teen_Tech_Week! Teen Tech Week 2008 National Corporate Sponsor www.playdnd.com ttw_fullpage_cmyk.indd 1 1/3/2008 1:32:22 PM THE OFFICIAL JOURNAL OF THE YOUNG ADULT LIBRARY SERVICES ASSOCIATION young adult library services VOLUME 6 | NU MBER 2 WINTER 2008 ISSN 1541-4302 YALSA Perspective 33 Music Web Sites for Teen Tech Week 6 Margaret Edwards Award Turns 20 and Beyond By Betty Carter and Pam Spencer Holley By Kate Pritchard and Jaina Lewis 36 Top Fifty Gaming Core Collection Titles School Library Perspective Compiled by Kelly Czarnecki 14 Do We Still Dewey? By Christine Allen Literature Surveys and Research 39 Information Literacy As a Department Teen Perspective Store 15 Teens’ Top Ten Redux Applications for Public Teen Librarians Readers from New Jersey Talk about the By Dr.
    [Show full text]