
Proceedings of the First International Workshop on Crowdsourcing the Semantic Web

held in conjunction with the 12th International Semantic Web Conference

Sydney, Australia

Editors: Maribel Acosta, Lora Aroyo, Abraham Bernstein, Jens Lehmann, Natasha Noy, Elena Simperl

Copyright © 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Preface

This volume contains the papers presented at the 1st International Workshop on “Crowdsourcing the Semantic Web”, held in conjunction with the 12th International Semantic Web Conference (ISWC 2013), 21-25 October 2013, in Sydney, Australia. This interactive workshop takes stock of the emergent work and charts the research agenda, with interactive sessions to brainstorm ideas and potential applications of collective intelligence to solving AI-hard Semantic Web problems. There were 12 submissions. Each submission was reviewed by at least 2, and on average 3, program committee members. The committee decided to accept 9 papers. Our special thanks go to the reviewers who diligently reviewed the papers within this volume.

September 3, 2013

Maribel Acosta, Lora Aroyo, Abraham Bernstein, Jens Lehmann, Natasha Noy, Elena Simperl

Table of Contents

Full Papers

Crowdsourced Semantics with Semantic Tagging: “Don’t just tag it, LexiTag it!” ...... 1
Csaba Veres

“Dr. Detective”: combining gamification techniques and crowdsourcing to create a gold standard in medical text ...... 16
Anca Dumitrache, Lora Aroyo, Chris Welty, Robert-Jan Sips and Anthony Levas

SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing ...... 32
Umair Ul Hassan, Sean O’Riain and Edward Curry

Content and Behaviour Based Metrics for Crowd Truth ...... 45
Guillermo Soberon, Lora Aroyo, Chris Welty, Oana Inel, Manfred Overmeen and Hui Lin

Experiment Papers

Crowdsourced Entity Markup ...... 59
Lili Jiang, Yafang Wang, Johannes Hoffart and Gerhard Weikum

A Role for Provenance in Social Computation ...... 69
Milan Markovic, Peter Edwards and David Corsar

Developing Crowdsourced Tasks: An iterative process ...... 79
Jonathan Mortensen, Mark Musen and Natasha F. Noy

Position Papers

Information Reputation ...... 89
Peter Davis and Salman Haq

Frame Semantics Annotation Made Easy with DBpedia ...... 93
Marco Fossati, Sara Tonelli and Claudio Giuliano

Program Committee

Maribel Acosta, Lora Aroyo, Soren Auer, Abraham Bernstein, Irene Celino, Oscar Corcho, Philippe Cudre-Mauroux, Roberta Cuel, Dave de Roure, Yolanda Gil, Tudor Groza, Michiel Hildebrand, Haym Hirsh, Aidan Hogan, Mark Klein, Jens Lehmann, Patrick Minder, Dunja Mladenic, Jonathan Mortensen, Mathias Niepert, Barry Norton, Natasha Noy, Harald Sack, Cristina Sarasua, Elena Simperl, Jamie Taylor, Amrapali Zaveri, Jun Zhao

Additional Reviewers

Gianluca Demartini, Joerg Waitelonis

Crowdsourced Semantics with Semantic Tagging: “Don’t just tag it, LexiTag it!”

Csaba Veres, Institute for Information and Media Science, University of Bergen, Norway, [email protected]

Abstract. Free form tagging was one of the most useful contributions of “Web2.0” toward the problem of content management and discovery on the web. Semantic tagging is a more recent but much less successful innovation borne of frustration at the limitations of free form tagging. In this paper we present LexiTags, a new platform designed to help realize the potential of semantic tagging for content management, and as a tool for crowdsourcing semantic metadata. We describe the operation of the LexiTags semantic bookmarking service, and present results from tools that exploit the semantic tags. These tools show that crowdsourcing can be used to model the taxonomy of an information space, and to semantically annotate resources within the space.

Keywords. crowdsourcing, metadata, bookmarking, tagging, semantic tags

1 Introduction

The emergence of “Web2.0”1 brought a number of innovations which changed the way people interact with information on the World Wide Web. The new paradigms made it easy for anyone to contribute content rather than just consume it. One of the early success stories was social tagging, which gave rise to folksonomies2 as a way to organise and find information on the Web through emergent shared vocabularies developed by the users themselves. Social tagging for content management and discovery became very popular in commercial services like the photo sharing site flickr.com and the bookmarking site delicious.com. These successes prompted some commentators to declare victory of user-driven content tagging over the “overly complex” technologies of the Semantic Web. Perhaps most famously, in a web post entitled “Ontology is Overrated: Categories, Links, and Tags”, Clay Shirky argued that any technology based on hierarchical classification (including ontologies) was doomed to fail when applied to the world of electronic resources [1]. Instead, simple naive tagging opened the door to crowdsourced content management, where dynamic user-contributed metadata in a flat tag space offered a breakthrough in findability.

However, researchers and information architects soon began to point out the limitations of unconstrained tagging for enhancing information findability. [2] identified a number of problems with tagging which can limit its usefulness. Among these were tag ambiguity (e.g. apple - fruit vs. apple - company), idiosyncratic treatment of multi-word tags (e.g. vertigovideostillsbbc, design/css), synonyms (e.g. Mac, Macintosh, Apple), the use of acronyms and other terms as synonymous terms (e.g. NY, NYC, Big Apple), and of course misspelled and idiosyncratic made-up tags. These factors limit the use of tags in large-scale information management. For example, [3] discuss limitations of searching with tags, which is necessarily based on syntactic matching since there are no semantic links between individual tags. Thus, searching with “NYC” will not guarantee that results tagged with “NY” will be retrieved.

Semantic tags, or rich tags as they are known in the context of social tagging, emerged as a way to impose consistent and refined meanings on user tags [4]. The two best-known semantic tagging sites were Faviki3 and Zigtag4 (the latter now appears to be defunct). Each site expected users to use tags from a large collection of provided terms. Faviki used Wikipedia identifiers, while Zigtag used a “semantic dictionary”. Both sites also allowed tags which were not in their initial knowledge base. In the case of Faviki, users could link undefined tags to a web page which best represents the tag; the web page is located by a simple web search. Zigtag allowed the use of undefined tags, with the expectation that users would later return and provide definitions for them. However, a large proportion of tags remained without definition, leading to a mishmash of defined and undefined tags.

LexiTags [5] was initially developed for similar reasons, as a tool for content management with rich tags. But as a semantic application it also had higher aspirations: to provide a platform and a set of front-end tools designed to crowdsource the Semantic Web. By providing intuitive access points (APIs) to user-generated metadata with semantic tags, the platform was expected to outgrow its initial purpose and provide novel benefits for its users while at the same time generating semantic metadata for general consumption. The intuition is that users should have systems which behave as Web2.0 at the point of insertion, yet as Semantic Web at the point of retrieval. In this paper we present the core LexiTags system, and describe some tools we have developed to capitalise on the crowdsourced semantic metadata.

1 T. O'Reilly. What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html, 2005.
2 http://iainstitute.org/news/000464.php#000464

2. LexiTags

LexiTags5 is an acronym for “Lexical Tags”, reflecting the fact that the tags are primarily lexical items, or natural language dictionary words. They are disambiguated through the use of an interface which presents the user with a set of choices from WordNet, an electronic lexical database [6]. The main content-bearing units in WordNet are synsets, which are represented by contextually synonymous word meanings grouped in a single entry. The word couch, for example, is represented by the synset {sofa, couch, lounge}. But couch is of course an ambiguous word, whose alternate meaning appears in a second synset {frame, redact, cast, put, couch}, as in “Let me couch that statement ….”. There is therefore no ambiguity in WordNet, because every synset represents a unique meaning for a word string. LexiTags presents such synsets, and a short gloss, to help users choose their intended meaning. The use of synsets as tags provides precise definitions without the need to adopt a set of idiosyncratic keywords. Mappings can be set up between synsets and any other vocabulary, enabling specific keyword markup through a natural language interface.
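As an illustration of the kind of choices such an interface presents (a minimal sketch using NLTK’s WordNet interface, not the LexiTags implementation itself), the candidate senses for a word can be listed together with their glosses:

```python
# Illustrative only; requires NLTK and the WordNet corpus:
#   pip install nltk && python -c "import nltk; nltk.download('wordnet')"
from nltk.corpus import wordnet as wn

def candidate_senses(word):
    """Return (lemma names, gloss) for every WordNet synset containing `word`."""
    return [(syn.lemma_names(), syn.definition()) for syn in wn.synsets(word)]

for lemmas, gloss in candidate_senses("couch"):
    print(lemmas, "-", gloss)
# One noun sense groups {sofa, couch, lounge}; a verb sense groups
# {frame, redact, cast, put, couch}, matching the example in the text.
```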

The success of mapping efforts can of course vary. While highly technical ontologies could prove difficult, lightweight ontologies and taxonomies like schema.org are not problematic, and in Section 3.2 we will see a tool that makes use of exactly such a mapping.

Fig. 1 shows a detail of the main interface of LexiTags, which is mainly a simple list of URLs that have been bookmarked. Clicking on a URL opens a new window with the web site. The user tags appear below the URL. Hovering the mouse over these tags pops up a definition which clarifies the precise sense of that tag. Clicking on a tag will open a new tab which shows bookmarks tagged with the same tag sense.

3 http://www.faviki.com/pages/welcome/
4 http://zigtag.com
5 http://lexitags.dyndns.org/ui/webgui/

Fig. 1. The main LexiTags interface

Tags for a bookmark are entered freely in the text box near the bottom of the “Edit Bookmark” window (fig. 2), which is popped up through the use of a bookmarklet. The user types one tag at a time into the text box and presses enter, which puts each tag in a list above the text box, initially in red colour. Users can simply enter tags as they like, as in most other tagging sites. Finally, users click on each undefined tag to add disambiguation through the final “Editing the tag …” window, shown in fig. 3.

Fig. 2. Window for adding and editing tags

The “Edit tag …” window shows the possible interpretations from WordNet, or DBpedia if WordNet does not have an entry for the tag word. This is most often the case when the tag is the name of a company, a person, or a new technology. DBpedia also includes a large number of common abbreviations, such as “NYC”. In addition, DBpedia defines mappings to WordNet synsets for many concepts, which helps fill gaps in WordNet. Unfortunately the coverage is not complete, so “NYC”, for example, is not linked to the synset for New York in WordNet. Users must select the sense that best matches their intent. The choices are ordered by word frequency, and our experience suggests that the intended sense is amongst the first two or three senses. Since the main point of tagging is future retrieval, it would make sense if people tended to avoid words in their obscure, low-frequency senses. However, this is just conjecture at the moment, and we are investigating other methods for optimal ordering. One approach is to weight the rankings according to the aggregate distance of each candidate sense to the context provided by already disambiguated tags. Another approach is to personalise the rankings so that each user’s own tagging history influences the ranking of the candidate senses. As each tag is disambiguated by the user, it turns green. Any tag left undisambiguated is deleted when the user presses the “OK” button, so every tag in the LexiTags platform is a disambiguated, defined term.
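To make the frequency-based ordering described above concrete, the sketch below ranks a word’s WordNet senses by their tagged-corpus lemma counts (exposed in NLTK as Lemma.count()). It mirrors the ordering strategy, but it is not the LexiTags code itself.

```python
from nltk.corpus import wordnet as wn

def ranked_senses(word):
    """Order the candidate synsets for `word` by corpus frequency, most common first."""
    def freq(syn):
        # Sum the counts of this synset's lemmas that match the tagged word.
        return sum(l.count() for l in syn.lemmas() if l.name().lower() == word.lower())
    return sorted(wn.synsets(word), key=freq, reverse=True)

for syn in ranked_senses("apple")[:3]:
    print(syn.name(), "-", syn.definition())
```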

Fig. 3. Disambiguation window

3. The LexiTags ecosystem

If LexiTags were just a bookmarking service with rich tags, there would be little to differentiate it from Zigtag. But the idea was to use the bookmarked sites and their tags as a starting point for a set of tools that extract value from the tags. As such, LexiTags should be seen as a platform to expose crowdsourced semantic metadata to clients, both for creation and consumption of metadata. In terms of content creation clients, we are developing an iPhone app for tagging photographs with LexiTags, as described in [5]. Due to space limitations we are unable to discuss alternative input applications, but instead describe two applications for consuming the metadata. One creates a content taxonomy, the other produces metadata for the web.

3.1 Content taxonomy

[7] discuss SynsetTagger6, which was developed to consume LexiTags tags automatically, but has thus far only been demonstrated in manual mode to create lightweight ontologies from user input. SynsetTagger makes use of selected WordNet relations to construct a lightweight ontology by inferring additional nodes from the provided tags. The most important link for nouns is hyponymy/hypernymy, the semantic relation otherwise known as subordination/superordination, subset/superset, or the IS-A relation. A concept represented by the synset {x, x′, …} is said to be a hyponym of the concept represented by the synset {y, y′, …} if native speakers of English accept sentences constructed from such frames as “an x is a (kind of) y” [8], [9]. Another relation used by SynsetTagger is meronymy, or the part/whole relation. A concept {x, x′, …} is a meronym of a concept {y, y′, …} if “an x is a part of y”, and it is a holonym if “a y is a part of x”. There are several other important relations in WordNet, some of which will be mentioned in the case study.

SynsetTagger works by constructing the hypernym chain and pruning nodes whose information content is trivial, which is determined by counting the outward edges from each node and eliminating each node that falls below a threshold value. For example, fig. 4 shows a taxonomy constructed from the input synsets coloured green. The orange-coloured inferred hypernyms will be discarded because they have only a single outgoing edge (or are too close to the top-level node), and the clear white ones will be kept. The grey nodes are both inferred and asserted, and will also be kept in the final taxonomy. The nodes that are kept are called informative subsuming nodes.

6 http://csabaveres.net/csabaveres.net/Semantic_Apps.html

Fig. 4. Complete hypernym chain constructed from input synsets

The tool has a number of user-configurable parameters which determine the final selection of nodes. Two important ones are the number of outgoing edges and the distance from the topmost node. The optimal selection is a matter of trial and error with a given input set. The pruning mechanism is similar to the “nearly automated” approach to deriving category hierarchies for printed text [10], but SynsetTagger differs in that it allows users to adjust the parameters and receive immediate visual feedback about their consequences for the constructed taxonomy. It is intended as an interactive tool to give users a sense of control over their content.
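A minimal sketch of this style of pruning, written against NLTK’s WordNet interface; the parameter names (min_children, cut) are our own and the heuristics only approximate what SynsetTagger actually does.

```python
from collections import defaultdict
from nltk.corpus import wordnet as wn

def build_taxonomy(tag_synsets, min_children=2, cut=2):
    """Infer hypernyms for the asserted tag synsets and keep only the
    'informative subsuming nodes': asserted nodes, plus inferred nodes with at
    least `min_children` outgoing edges that lie at least `cut` steps below
    the WordNet root."""
    children = defaultdict(set)              # inferred node -> direct children seen
    asserted = set(tag_synsets)
    for syn in asserted:
        for path in syn.hypernym_paths():    # each path runs from the root down to syn
            for parent, child in zip(path, path[1:]):
                children[parent].add(child)
    keep = set(asserted)
    for node, kids in children.items():
        if len(kids) >= min_children and node.min_depth() >= cut:
            keep.add(node)
    return keep, children

# Example with the first sense of a few tag words (for illustration only).
tags = [wn.synsets(w)[0] for w in ("conference", "fund", "movie")]
kept, edges = build_taxonomy(tags)
print(sorted(s.name() for s in kept))
```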

3.2 Metadata for the Web

MaDaME (Meta Data Made Easy) is a tool for embedding semantic metadata into web sites. Its development was spurred by the release of the schema.org initiative, a type schema meant to be used by web masters to add structured metadata to their content. The incentive for web masters to use the schema is that web sites that contain markup will appear with additional details in search results, which enable people to judge the relevance of the site more accurately and hopefully increase the probability that the site will be visited. Clearly this is an important development in the effort to crowdsource Semantic Web content. However, the schema was designed specifically for the use case of search, and both the semantics and the preferred syntax reflect that choice. In terms of semantics, the schema contains some non-traditional concepts to fulfil its intended use. For example, there is a general class of Product but no general class for Artifact. There are also odd property ascriptions arising from the taxonomy structure, so, for example, Beach has openingHours and faxNumber. In terms of syntax, there is a very strong message that developers should use the relatively new microdata format rather than the more popular RDFa web standard [11]. This is unfortunate because it makes metadata from schema.org incompatible with many other sources of metadata like Facebook’s OGP7.

A strong motivation for MaDaME was to create a tool that would not only help web designers apply schema.org markup to their web sites, but simultaneously inject ontology terms from other standard sources into their web sites in the RDFa standard; in other words, to maximise the crowdsourcing potential offered by schema.org. This was achieved through mappings between WordNet and schema.org, as well as SUMO [12]. In normal operation users select key words in their web sites, which are then disambiguated using the LexiTags interface. The WordNet synsets are stored, and their most appropriate mappings to schema.org and SUMO computed. These are then inserted into the HTML source, and made available to the web designer for further refinement. While MaDaME is currently presented as a standalone tool8, it can also be used to automatically annotate any web site bookmarked on LexiTags. These annotations can be sent to the maintainers of the web sites with a cover letter explaining the purpose of the markup. Alternatively, the markup could be stored on the LexiTags platform and offered through an API.
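The snippet below sketches the kind of RDFa such a mapping could produce for one disambiguated keyword; the mapping table, the SUMO URI and the helper function are illustrative stand-ins of ours, not MaDaME’s actual code or mapping data.

```python
# Hypothetical mapping from WordNet synset names to (schema.org type, SUMO class URI).
MAPPINGS = {
    "movie.n.01": ("Movie", "http://www.ontologyportal.org/SUMO.owl#MotionPicture"),
}

def rdfa_span(word, synset_name):
    """Wrap a keyword in an RDFa span typed with both a schema.org type and a SUMO class."""
    schema_type, sumo_uri = MAPPINGS[synset_name]
    return (f'<span vocab="http://schema.org/" typeof="{schema_type} {sumo_uri}">'
            f'{word}</span>')

print(rdfa_span("movie", "movie.n.01"))
```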

4. Results

We present a short evaluation of SynsetTagger on a set of approximately 100 bookmarks for a single user on LexiTags, and then the metadata generated for one web site on MaDaME. Fig. 5 shows a portion of the taxonomy generated from the semantic tags used on the set of 100 bookmarks. The asserted green tags are used to build the hypernym chain from WordNet. If any node is selected in the interface, the set of connected nodes will also be highlighted, making it easier for users to understand the relationships in the taxonomy. For example, the asserted tag conference can be seen as a kind of meeting, which in turn is a kind of gathering, and so on. The orange nodes are inferred hypernyms, but they will be rejected in the final taxonomy for one of two reasons: a) they have fewer than the specified number of children, or b) they are closer than the selected cut to the entity node. The white nodes are the inferred nodes which will be retained in the final export. Users can therefore manipulate the two parameters until the desired level of generality is reached. We will see that, in general, more stringent settings (a higher number of requisite children and a higher cut) result in fewer subsuming nodes. This may seem counterintuitive at first, but the rather straightforward explanation is that more and more asserted tags fail to find a suitable subsuming concept when the criterion for retaining the concepts becomes more stringent.

7 http://ogp.me
8 http://csaba.dyndns.ws:3000

Fig. 5. Taxonomy from approx. 100 bookmarked sites

Fig. 6 shows the hyponyms of entity that are retained with “children” set at two and “cut” also set at two. The black arrow at the right side of each oval shows that the node can be expanded to reveal more children. Every tag except for weather found a more general subsuming hypernym when the criterion for subsuming nodes is lax. The tags which do not have a subsuming concept are noteworthy because they represent unusual bookmarks which stand apart from the rest. The subsuming concepts themselves are very general in this example, and their utility for browsing the bookmarked resources is questionable.

Fig. 6. Inferred hypernyms with lax criteria

Fig. 7 shows the same set of tags with “children” set at 5. Since the criterion for subsuming concepts is much higher, there are many more tags which do not have a subsumer. For example, the two tags fund and date are both a kind of measure in fig. 6. But these are the only two kinds of measure in the tag set, so with a criterion of 5 children, measure is no longer considered an informative subsuming node. On the other hand, the remaining subsuming concepts have become more differentiated and are now somewhat more specific. For example, group is replaced with the more specific organization, which is one of only three kinds of group in the tree. So group is discarded but organization is kept, because there are many different kinds of organization in the tree. Overall the constructed taxonomy is much more useful for browsing, because the general categories are now more informative and the outliers are not forced into overly general categories.

Fig. 7. Inferred hypernyms with stringent criteria

Fig. 8 shows that expanding the artifact node reveals some useful subcategories to browse. In general, users can easily configure the parameters to arrive at an optimal taxonomy for their needs.

Fig. 8. Different kinds of artefacts in the bookmark set

It is at this point that LexiTags becomes a system that behaves as Web2.0 at the point of insertion, yet as Semantic Web at the point of retrieval. Since the tags have precise definitions they avoid the pitfalls of free form tags like vagueness and ambiguity, and because the defined tags participate in various relations they add value to the asserted tags. They become self organising, with a neat hierarchical structure emerging automatically. Even in this small example of 100 bookmarks and around 400 tags the emerging generalisations form a useful browsing hierarchy. We expect the categorisations to improve with more bookmarks and tags, resulting in a situation where a growing set of bookmarks leads to more organisation rather than the complete chaos seen in traditional free form tagging systems. Users must gain such benefits in return for the little extra work they are asked to perform at the point of insertion. An important point of our work is to enhance LexiTagging services to provide additional benefit to the users, to encourage them to contribute semantic metadata.

The emergent taxonomy can be extended with additional related terms from WordNet. SynsetTagger already has the functionality to add some additional relations to enrich the ontology. Currently these include the two part-of relations and the domain terms. For example movie has meronyms episode, credits, subtitle, caption, scene, shot, and domain terms dub, synchronize, film, shoot, take, videotape, tape, reshoot. These can all be added with the appropriate relations. In addition, the synonyms appearing in each synset could also be included. Sometimes this is quite rich, as is the case in our example of movie, whose synset consists of {movie, film, picture, moving picture, moving-picture show, motion picture, motion-picture show, picture show, pic, flick}. WordNet also provides a set of coordinate terms, which are nouns or verbs that have the same hypernym as the target word. Once again movie has a number of coordinate terms including stage dancing, attraction, performance, burlesque, play, and variety show. A final source of data is user tags which are adjectives or verbs, and were not used in the construction of the taxonomy. These tend to be descriptive words that are suitable for use as properties, e.g. Hungarian, fine_tune, synchronize, semantic, amusing, and open. When all of this extra information is added to the taxonomy it results in a greatly enriched ontology which can be used to provide additional services like matching users against one another, or to aid content discovery and recommendation.

As part of the crowdsourcing effort we are planning to enable the export of the generated ontologies and the URLs to which they apply. This would include a list of topics and descriptions where available, as well as relations to other topics of interest. The semantic metadata for each URL would be stored on our servers and made available through an API.

The second form of metadata creation makes use of other vocabularies that have been mapped to WordNet, as in the mapping tool MaDaME, which can contribute schema.org and SUMO metadata for any bookmarked URL. Consider for example one URL bookmarked in the LexiTags data, http://www.imdb.com. This site was tagged with trailers, information, and movies. When submitted to MaDaME, it can automatically generate markup that can be inserted into the HTML site, or provided as additional data through the aforementioned information API. The mappings generated from the tags are shown in fig. 9.
Note that the markup is currently assigned to random words at the beginning of the text, and needs to be inserted into an appropriate location in the original HTML if it is used to mark up the page directly. The example shows that MaDaME currently generates markup from three vocabularies. The original WordNet synset is preserved, as well as the corresponding SUMO class. The SUMO class mappings tend to be quite precise because there is an extensive set of mappings readily available for SUMO9. On the other hand, the mappings to schema.org are significantly more sparse, because the schema.org types are considerably fewer in number. In cases where no exact match is found, a heuristic procedure is used to determine the closest match, and this can result in overly general or erroneous mappings. For example trailer, preview is mapped to schema:Intangible, whereas a more appropriate mapping might be to the concept schema:VideoObject (CreativeWork > MediaObject > VideoObject). On the other hand, many concepts simply do not have a more precise mapping in schema.org, and the mapping of information, data to schema:Intangible appears to be correct.

9 http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&lang=EnglishLanguage

Fig. 9. Automatic schema.org and SUMO annotation for imdb.com

The key point is that metadata from various different namespaces can automatically be made available for different consumers, simply as a side effect of bookmarking. Search engines can use the schema.org markup, while other services can use the SUMO or WordNet classifications. If the bookmarking service were to grow in popularity, it could become a large repository of schema.org markup for a large set of URLs. Search engines could, presumably, use this markup as if it were embedded in the web sites themselves. But MaDaME also provides a user interface that can be used to greatly enhance the schema.org markup using a drop-down form, as shown in fig. 10. The form includes a text box for all the possible properties of the schema type which is selected for a word in the text. Users could select words in addition to the tags already assigned, and add these to the schema. The figure also shows the extended markup generated by the tool for the word movie.

Fig. 10. Drop-down forms and the extensive schema.org markup they can generate

5. Related Work

WordNet is a highly cited resource in all language-related areas of study. The official web site at Princeton University maintains a list of publications10 based on WordNet, but the list is no longer updated because it was “growing faster than it was possible to maintain”. Within the Semantic Web community WordNet has enjoyed a dual reception, with some researchers criticising its use as an ontology [13]-[15] while others embrace it either as a core taxonomy [16] or as a way to infer semantic relations (e.g. [17], [18]). [10] used WordNet to automatically infer hierarchical classifications in textually annotated images, and [19] used it to implement hierarchical faceted classification of recipes and medical journal titles. Both systems use automated extraction and disambiguation of key input terms, which differs from our approach where we ask users to supply these terms. But they use a very similar pruning algorithm to establish the final taxonomic structure.

The idea that free form user tags can be semantically enhanced has received a great deal of attention. Most of the existing work focuses on automatically enriching the tags already present, by exploiting the statistical regularities in the way tags are assigned to resources by users. [20] suggests that the efforts can broadly be classified as (a) extracting semantics of folksonomies by measuring relatedness, clustering, and inferring subsumption relations, or (b) semantically enriching folksonomies by linking tags with professional vocabularies and ontologies, for example Wikipedia and WordNet [21]-[23]. These resources are used in various ways, including clustering tags, disambiguation, adding synonyms, and linking to annotated resources and ontology concepts. During this process the terms of the folksonomy are cleaned up and disambiguated, linked to formal definitions and given properties which make them more useful as ontologies.

There are also a few studies in which users are expected to contribute semantics at the time of tagging. [24] studies a corporate blogging platform which included a tagging interface. The tagging interface was linked to a domain ontology, and whenever someone typed a tag that had interpretations in the ontology the interface would present a choice of possible concepts to link the tag to. The ontology would also evolve as users typed new tags which were initially not in the ontology, but the scope of defined tags was limited by the ontology. [25] discuss a sophisticated Firefox plugin, Semdrops, which allowed users to annotate web resources with a complex set of tags including category, property, and attribute tags. These were aggregated in a semantic wiki of the user’s choosing. [26] reports on an open source bookmarking application (SemanticScuttle) that has been enhanced with structurable tags, which are tags that users can enhance with inclusion and equivalence relations at the time of tagging. [27] describes extreme tagging, in which users can tag other tags, to provide disambiguation and other relational information about tags.

These latter approaches require users to learn new ways of tagging, which are often more complex and opaque than free form tags. The benefit of LexiTagging is that the process is minimally different from activities users are already comfortable with. They simply sign up to a bookmarking site, install a bookmarklet and start tagging. The only addition to the workflow is to disambiguate tags, but this process is so similar to looking up definitions in a dictionary that it needs no explanation.

10 http://wordnet.princeton.edu/wordnet/publications/

6 Conclusion

The LexiTags platform is a familiar bookmarking platform, like delicious.com, where users can store the URLs of interesting web sites and tag them with meaningful terms that aid in subsequent recall and discovery. The only modification is that the user tags are disambiguated dictionary words, not undefined strings. But this small change gives the resources on the platform a sound semantic grounding which can significantly enhance the functionality of the service. Some examples of benefits to users are automatic content classification and browsing, external content recommendation, enhanced content discovery, and user profile matching. Some of these services represent crowdsourcing solutions to existing problems which are difficult to fully automate. For example, we have shown how the LexiTags tags can be used to infer schema.org and SUMO classifications for each bookmarked web site, a task that would otherwise be done manually. The vision is to create an integrated platform where users begin by simply bookmarking web sites, but then automatically receive the benefits of the enhanced services already described. This gives them the incentive to invest the added effort to disambiguate their tags. Many of the components are in place, but they need some programming effort to complete the integration. This paper presented the theoretical motivation behind the work, and some preliminary results to show what is possible.

7 References

[1] C. Shirky, “Ontology is Overrated: Categories, Links, and Tags,” http://www.shirky.com/writings/ontology_overrated.html, 2007.
[2] A. Mathes, “Folksonomies – cooperative classification and communication through shared metadata,” Computer Mediated Communication, 2004.
[3] A. Sheth and K. Thirunarayan, Semantics-empowered Data, Services, and Sensor and Social Webs. Morgan & Claypool Publishers, 2012.
[4] H. Hedden, “How SEMANTIC TAGGING Increases Findability,” EContent magazine, 08-Oct-2008.
[5] C. Veres, “LexiTags: An Interlingua for the Social Semantic Web,” in A. Passant, S. Fernández, J. Breslin, U. Bojārs (Eds.), Proceedings of the 4th International Workshop on Social Data on the Web, in conjunction with the International Semantic Web Conference (ISWC 2011), Bonn, 2011.
[6] C. Fellbaum, WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
[7] C. Veres, K. Johansen, and A. Opdahl, “SynsetTagger: A Tool for Generating Ontologies from Semantic Tags,” in Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, 2013.
[8] G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, Nov. 1995.
[9] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to WordNet: an on-line lexical database,” CSL Report; International Journal of Lexicography, 1993.
[10] E. Stoica and M. A. Hearst, “Nearly-automated metadata hierarchy creation,” Association for Computational Linguistics, 2004, pp. 117–120.
[11] P. Mika and T. Potter, “Metadata Statistics for a Large Web Corpus,” LDOW 2012, 16-Apr-2012, Lyon, France. [Online]. Available: http://events.linkeddata.org/ldow2012/papers/ldow2012-inv-paper-1.pdf. [Accessed: 11-Jul-2012].
[12] I. Niles and A. Pease, “Towards a Standard Upper Ontology,” in Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS ’01), New York, USA, 2001, pp. 2–9.
[13] A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari, “Restructuring WordNet’s top-level,” AI Magazine, 2002.
[14] A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari, “Sweetening WORDNET with DOLCE,” AI Magazine, vol. 24, no. 3, p. 13, Sep. 2003.
[15] A. Oltramari, A. Gangemi, et al., “Restructuring WordNet’s top-level: The OntoClean approach,” LREC 2002, 2002.
[16] F. Suchanek, G. Kasneci, and G. Weikum, “Yago: A Large Ontology from Wikipedia and WordNet,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 6, pp. 203–217, 2008.
[17] T. H. Duong, T. N. Ngoc, and G. S. Jo, “A Method for Integration of WordNet-Based Ontologies Using Distance Measures,” Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, vol. 5177, 2008.
[18] J. Kietz and A. Maedche, “A method for semi-automatic ontology acquisition from a corporate intranet,” Workshop “Ontologies and Text”, 2000.
[19] E. Stoica and M. Hearst, “Demonstration: Using WordNet to build hierarchical facet categories,” in ACM SIGIR Workshop on Faceted Search, August 2006.
[20] F. Limpens, F. Gandon, and M. Buffa, “Linking Folksonomies and Ontologies for Supporting Knowledge Sharing,” Projet ISICIL: Intégration Sémantique de l’Information par des Communautés d’Intelligence en Ligne, ANR CONTINT 2008, ANR-08-CORD-011-05, 01-Aug-2009. [Online]. Available: http://isicil.inria.fr/v2/res/docs/livrables/ISICIL-ANR-EA01-FolksonomiesOntologies-0906.pdf. [Accessed: 15-Aug-2011].
[21] L. Specia, “Integrating folksonomies with the semantic web,” The Semantic Web: Research and Applications, 2007.
[22] S. Angeletou, M. Sabou, L. Specia, and E. Motta, “Bridging the gap between folksonomies and the semantic web: an experience report,” in The 4th European Semantic Web Conference (ESWC 2007), 3–7 Jun 2007, Innsbruck, Austria.
[23] C. Van Damme, M. Hepp, and K. Siorpaes, “Folksontology: An integrated approach for turning folksonomies into ontologies,” in ESWC Workshop “Bridging the Gap between Semantic Web and Web 2.0”, 2007.
[24] A. Passant, “Using ontologies to strengthen folksonomies and enrich information retrieval in weblogs,” in Proceedings of the International Conference on Weblogs, 2007.
[25] D. Torres, A. Diaz, H. Skaf-Molli, and P. Molli, “Semdrops: A Social Semantic Tagging Approach for Emerging Semantic Data,” IEEE/WIC/ACM International Conference on Web Intelligence (WI 2011), Aug. 2011.
[26] B. Huynh-Kim Bang, E. Dané, and M. Grandbastein, “Merging semantic and participative approaches for organising teachers’ documents,” in J. Luca & E. Weippl (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008, pp. 4959–4966. Chesapeake, VA: AACE, Vienna, 2008.
[27] V. Tanasescu and O. Streibel, “Extreme tagging: Emergent semantics through the tagging of tags,” in Proceedings of the First International Workshop on Emergent Semantics and Ontology Evolution (ESOE 2007), co-located with ISWC 2007 + ASWC 2007, Busan, Korea, November 12, 2007.

“Dr. Detective”: combining gamification techniques and crowdsourcing to create a gold standard in medical text

Anca Dumitrache1,3, Lora Aroyo1, Chris Welty2, Robert-Jan Sips3, and Anthony Levas2

1 VU University Amsterdam [email protected], [email protected] 2 IBM Research Center, New York [email protected], [email protected] 3 CAS Benelux, IBM Netherlands robert-jan.sips@nl..com

Abstract. This paper proposes a design for a gamified crowdsourcing workflow to extract annotation from medical text. Developed in the context of a general crowdsourcing platform, Dr. Detective is a game with a purpose that engages medical experts into solving annotation tasks on medical case reports, tailored to capture disagreement between annotators. It incorporates incentives such as learning features, to motivate a continuous involvement of the expert crowd. The game was designed to identify expressions valuable for training NLP tools, and interpret their relation in the context of medical diagnosing. In this way, we can address the main problem in gathering ground truth from experts – that low inter-annotator agreement is typically caused by different interpretations of the text. We report on the results of a pilot study assessing the usefulness of this game. The results show that the quality of the annotations by the expert crowd is comparable to that of an NLP parser. Furthermore, we observed that allowing game users to access each other’s answers increases agreement between annotators.

Keywords: crowdsourcing, gold standard, games with a purpose, information extraction, natural language processing

1 Introduction

Modern cognitive systems require human annotated data for training and evaluation, especially when adapting to a new domain. An example of such a system is Watson QA [1], developed by IBM, which won the Jeopardy TV quiz show against human competitors. To tune its performance, Watson was trained on a series of databases, taxonomies, and ontologies of publicly available data [2]. Currently, IBM Research aims at adapting the Watson technology for question-answering in the medical domain, which requires large amounts of new training and evaluation data in the form of human annotations of medical text. Two issues arise in this context: (1) the traditional way of gathering ground-truth annotations is slow, expensive and generates only small amounts of data; (2) in order to achieve high inter-annotator agreement, the annotation guidelines are too restrictive. Such practice has proven to create over-generalization and brittleness [3], through losing the sense of diversity in the language, which leads to the fact that natural language processing tools have problems in processing the ambiguity of expressions in text, which is especially critical in medical text.

The diversity of interpretation of medical text can be seen at many levels; as a simple example, consider the sentence, “Patients exhibiting acute tailbone pain should be examined for extra bone nodules.” Human experts disagree routinely on whether “acute tailbone pain”, “tailbone pain”, or “pain” is the primary term in this sentence. Proponents of “tailbone pain” argue that there is a medical term for it (Coccydynia) making it primary; others argue that it is pain which is located in the tailbone. Traditional methods of gathering ground truth data for training and evaluation fail to capture such interpretation diversity, leading us to the innovative Crowd Truth approach [4] providing context for this work.

Our analysis led us to believe that the diversity of interpretation occurs at two levels, depending on whether the context is being considered. Term identification, as exemplified in the example above, may be done independent of the clinical context, for example when processing a textbook for background knowledge. However, in the presence of a particular patient, the role of the location and duration modifiers (e.g. tailbone and acute, respectively) may or may not be important. We also observe that context-independent tasks tend to require less expertise, allowing us to use a lay crowd more effectively. These two types of annotation tasks can be performed by two different types of crowds in order to optimize the time, effort and quality of the final result.

Given the experience [4, 5] with defining micro-tasks for the general crowd via crowdsourcing platforms such as Amazon Mechanical Turk4 or CrowdFlower5, in this paper we focus on a method to engage a crowd of medical experts to be able to resolve semantic ambiguity in medical text. Annotating complex medical text can be a time-consuming and mentally taxing endeavor, therefore the monetary incentive might not be sufficient for attracting a crowd of experts. However, providing a tailored experience for medical professionals through features such as e-learning, and competition with peers, could serve as additional motivation for assembling the right crowd for our task. This can be accomplished by incorporating gamification features into our application.
In this paper, we propose a gamified crowdsourcing application for engaging experts in a knowledge acquisition process that involves domain-specific knowledge extraction from medical texts. The goal of such text annotations is to generate a gold standard for training and evaluation of IBM Watson NLP components in the medical domain. First, we position our work in the context of already existing games with a purpose, crowdsourcing and other niche-sourcing initiatives. Then we outline our approach by focusing on the gaming elements used as incentives for medical experts, in the context of the overall game application architecture. We show how this gaming platform could fit together with a micro-task platform in a joint workflow combining efforts of both expert and non-expert crowds. Next, we describe the experimental setup to explore the feasibility and the usability of such an application. Finally, we discuss the results of the pilot run of our application, and we identify the points of improvement to bring in future versions.

4 www.mturk.com
5 www.crowdflower.com

2 Related Work

In recent years, crowdsourcing has gained a significant amount of exposure as a way of creating solutions for computationally complex problems. By carefully targeting workers with gaming elements and incentives, various crowdsourcing applications were able to garner a significant user base engaged in their tasks. The ESP Game [6] (later renamed Image Labeler) pioneered the field by implementing a gamified crowdsourcing approach to generate metadata for images. The reCAPTCHA [7] application combined the CAPTCHA security measure for testing human knowledge with crowdsourcing, in order to perform text extraction from images. The gamified crowdsourcing approach has been employed successfully even in scientific research, with applications such as Galaxy Zoo [8] using crowd knowledge to perform image analysis and extract observations from pictures of galaxies. All of these systems employ mechanisms for a continuous collection of a large amount of human annotated data.

A crowdsourcing framework by [9] introduces 10 design points for Semantic Web populating games. In the context of our research, of particular interest are: identifying tasks in semantic-content creation, designing game scenarios, designing an attractive interface, identifying reusable bodies of knowledge, and avoiding typical pitfalls. As not all crowdsourcing tasks are suitable for redesign as part of a gamified platform, identifying which of these tasks could successfully engage a medical expert crowd is of key importance to our research. It is also crucial to involve mechanisms to optimize the ratio of time spent to quality and volume of the output [9]. External knowledge sources for annotations (e.g. vocabularies, NLP parsers) can be used to target the work of the players to problems that are too complex to be handled only by computers [9]. Finally, in order to ensure the quality of the answers, unintentional mistakes of the users need to be avoided through clear instructions in the interface [9].

Gamification as applied to text annotation crowdsourcing is an emerging field in different domains. For instance, the Phrase Detective project [10] uses gamified crowdsourcing for building anaphoric annotation ground truth. The input documents are general purpose, and the crowd is not specialized. Two of its features that we also considered for Dr. Detective are (1) the need for a user training task to improve the usage of the application, and (2) understanding of the user profile (e.g. players can exhibit considerable variation in their interaction styles, abilities or background knowledge). The Sentiment Quiz [11], played through various social networking platforms, employs crowdsourcing to evaluate the accuracy of sentiment-detecting algorithms over sentences, and to create a lexicon of sentiments in various languages. The requirements for user incentives in Dr. Detective were based on the analysis provided by Sentiment Quiz, e.g. for scoring, the high score board, and level-based goals, as well as for enhancing the crowd output through statistical methods applied in the disagreement analytics. However, neither the Sentiment Quiz nor the Phrase Detective applications actively seek to capture the ambiguity in language. Phrase Detective even tries to enforce agreement, by awarding additional points to annotators that agree with the ground truth. Nor do most applications in the domain study the effect of using specialized crowds to perform the information extraction tasks.
Our goal is to build an end-to-end gamified crowdsourcing platform that can capture disagreement between annotators, while catering specifically to experts in the medical field.

3 “Crowd-Watson” Architecture: The Game Perspective

In this section, we describe the architecture of Dr. Detective6 – an application for engaging experts in knowledge extraction tasks for creating ground truth annotations in medical texts. We start by framing Dr. Detective as part of the general Crowd-Watson7 framework for crowdsourcing medical text annotation [12]. Then, we tackle the challenge of tailoring the application to a specialized crowd of medical professionals, through a study of possible motivating factors. Finally, we describe how gamification elements were integrated with the crowdsourcing workflow.

The Crowd-Watson framework supports the composition of crowd-truth gathering workflows, where a sequence of micro-annotation-tasks can be executed jointly either by the general crowd on platforms like CrowdFlower, or by a specialized crowd of domain experts on a gaming platform such as Dr. Detective. The Crowd-Watson framework focuses on micro-tasks for knowledge extraction in medical text. The main steps involved in the Crowd-Watson workflow are: pre-processing of the input, data collection, disagreement analytics for the results, and finally post-processing. These steps are realized as an automatic end-to-end workflow that can support a continuous collection of high quality gold standard data, with a feedback loop to all steps of the process. The input consists of medical documents from various sources such as Wikipedia articles or patient case reports. The output generated through this framework is annotation for medical text, in the form of concepts and the relations between them, together with a collection of visual analytics to explore these results. The architecture of this application, and the way its components interact with each other, can be seen in Figure 1. In this paper, we focus on those aspects of the architecture that relate to the Dr. Detective gaming platform for data collection. A full description of the Crowd-Watson architecture is available in [12].

6 http://crowd-watson.nl/dr-detective-game
7 http://crowd-watson.nl

Fig. 1. Crowd-Watson Framework Design (the highlighted components are the ones related to the Game Platform)

3.1 Pre-Processing for the Game Platform

Typically, the input is available in an unstructured format (e.g. simple text). As part of the input data filtering step, additional metadata, such as the specialization field in which it was published or, for case reports, the diagnosis of the patient, can be extracted from these documents. In addition, some annotation can also be generated automatically, by mapping the text to the UMLS vocabulary of biomedical terminology, classification, and coding standards [13]. The UMLS parser can be used to identify both concepts and relations; however, as a fully automated approach, it suffers from the typical issues of NLP techniques [14], such as lack of contextual awareness and limited ambiguity processing capabilities. Nevertheless, UMLS annotations can be employed as a good baseline for measuring the efficiency of the crowdsourced answers. The workers are asked to perform a series of annotation tasks on the input documents. The purpose of these tasks is creating annotation in the form of concepts and the relations between them. We define these tasks according to four micro-task templates:

1. Term extraction – the task of identifying all the relevant terms in a text, where a term refers to a set of words that forms a coherent medical concept;
2. Term categorization – the task of classifying a medical term into an appropriate category, such as the concepts in the UMLS thesaurus;
3. Relation extraction – the task of identifying whether or not a relation exists between two medical terms;
4. Relation categorization – the task of classifying a medical relation into an appropriate category (or set of categories), such as the relations in the UMLS thesaurus.
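As a rough illustration of how such tasks and the workers’ answers might be represented (our own sketch, not the actual Crowd-Watson data model), the snippet below encodes the four template types and the binary answer vectors used by the disagreement metrics discussed in the next subsection.

```python
from dataclasses import dataclass
from enum import Enum, auto

class TaskType(Enum):
    TERM_EXTRACTION = auto()
    TERM_CATEGORIZATION = auto()
    RELATION_EXTRACTION = auto()
    RELATION_CATEGORIZATION = auto()

@dataclass
class MicroTask:
    task_type: TaskType
    sentence: str
    candidate_answers: list   # e.g. candidate terms, categories, or term pairs

def answer_vector(task, selected):
    """Binary vector over the candidate answers: 1 if the worker picked that answer."""
    return [1 if answer in selected else 0 for answer in task.candidate_answers]

def sentence_vector(worker_vectors):
    """Element-wise sum of the individual worker vectors for one sentence and task."""
    return [sum(values) for values in zip(*worker_vectors)]
```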

The workers on Crowd-Watson consist of both an expert crowd and a general crowd. Each of these crowds interacts with the input documents on a specialized platform – for the general crowd, regular crowdsourcing micro-tasks have been constructed on CrowdFlower, whereas the expert crowd employs the Dr. Detective application for solving tasks tailored to their profile. The tasks can be solved by both the general and the expert crowd. The target crowd setting step entails picking the difficulty level of the task according to the level of expertise of the crowd. For instance, in term extraction, the general crowd could reliably find demographic terms, as they do not require significant medical knowledge, whereas the expert crowd can focus on annotating more difficult terminology.

3.2 Game Disagreement Analytics

After the input data is formatted and filtered appropriately through the pre-processing components, it is sent to the data collection component to gather either expert annotations (through the gaming platform) or lay crowd annotations (through the micro-task platform). Next, the annotation results are analyzed with a set of content and behavior-based metrics, to understand how the disagreement is represented in both cases [15, 16], and to assess the quality of the individual workers, and the quality of the individual and overall crowd truth results.

To track the individual performance of a user in the crowd, the expert metrics were developed. For each sentence in the input, the performance of the worker can be measured as a set of vectors, according to the task they solved on that input. Such a vector is composed of 0 and 1 values, such that for each answer a user annotated in that sentence there is a 1 in the corresponding position, whereas answers that were not picked by the user are set to 0. These answer vectors can also be measured at the level of the domain. At the level of the sentence, a set of task-dependent sentence metrics were also defined. For either term extraction or relation extraction, any sentence can be expressed as a sentence vector – the sum of all the individual user vectors on that sentence, for that task. Furthermore, an added layer of granularity can be introduced by considering the categories for the terms and relations. This representation can then be used to define appropriate metrics for sentence clarity, what the popular answers were, how disagreement is represented, and similarity of annotation categories and domains.

The prime role of the disagreement analytics in the gaming platform is to provide explicit measures for the quality and completeness of the final result, to identify gaps of missing types of annotations, and to discover possible contradictions and inconsistencies. This is opposed to the micro-task disagreement analytics, which follow the same approach but additionally apply filters for spam identification.

4 Data Collection: Gaming Platform

In order to collect data from a crowd of medical experts, it is imperative to find the necessary motivators for engaging them to contribute. To this end, we have performed a series of qualitative interviews with medical students and professionals. The purpose was to identify what requirements and features the medical crowd would be interested in seeing in a crowdsourced application, and how this application could be built to help in their work.
These interviews established incentives for crowd labor [17], such as competition, learning, and entertainment in the context of working in the medical field, as well as documents that the medical crowd would be interested in reading. After discussions with 11 people in the medical field (2 professionals, 3 lecturers, 5 students), we were able to identify several key requirements to incorporate into the gaming platform:

– at the level of the input, the interviewees expressed their interest in reading medical case reports;
– learning about their field, through targeted micro-tasks and extended feedback on their answers, was the most significant motivator;
– the interviewees expected the tasks to challenge their problem-solving skills;
– competition with peers emerged as a secondary motivator;
– the tasks need to be fun to solve, making entertainment another secondary motivator;
– medical professionals have difficult schedules, and would prefer to have flexibility in the time required to engage with the application.

In order to attract users to the application, a goal that is seen as useful by the players needs to be firmly established. As learning proved to be the most relevant incentive from the interviews, we focused the goal of the application on this, while also trying to incorporate the problem-solving requirement. We developed the concept of a clue-finding game, where the text annotation tasks were put in the context of searching for clues in the history of a patient. For instance, when performing the task of term extraction on a patient case report, the user can annotate any of these three clue types:

1. the term is a clue leading to the final diagnosis of the case;
2. the term is a false clue that is irrelevant to the final diagnosis of the case;
3. the term is a normal condition that does not influence the final diagnosis of the case.

The clue types can be used as an incentive, involving users with the task they are solving by redesigning it as a medical puzzle, but they can also be used to generate additional annotation. The annotations retrieved from the general crowdsourcing approach are dependent on the context of the sentence where they were identified, so by asking the expert crowd to find meta-relations at the level of the document, we can generate knowledge that is valid generally for the domain. This kind of task cannot be solved simply with the use of contextual information, and requires background knowledge of the field, therefore making it suitable for an application targeted at experts. The qualitative interviews helped us identify the extrinsic motivators for engaging the medical crowd. After the goal of the application was established, the final step was translating the user incentives into concrete features for building the Dr. Detective gaming platform.

4.1 Difficulty

In order to support the user learning experience and introduce flexibility in task solving, we define the concept of difficulty. This refers to the combination of skill and time required for reading the document and then performing the annotation task. While it is difficult to hypothesize on the comparative difficulty of performing annotations, the difficulty of the document can be expressed as syntactic and semantic difficulty. The syntactic difficulty expresses the effort needed for reading the document in three components: the number of sentences in the document (NoS), the number of words (NoW), and the average sentence length (ASL). The semantic difficulty expresses the effort needed for understanding the text in two components: the number of UMLS concepts present in the document (NoUMLS), and the readability of the document (SMOG). The SMOG [18] formula for computing readability was employed, as it is often recommended for use in evaluating healthcare documents [19]. Therefore, for every document D, its difficulty is defined as the norm of the normalized five-component vector:

difficulty(D) = ‖(NoS, NoW, ASL, NoUMLS, SMOG)‖.

4.2 Scoring

In order to develop the competition incentive, a scoring system was devised to reward players for their work. Through viewing a high score board, they are also encouraged to compete against each other. We want to reward users when they perform in a way that is beneficial to us. Since we want to collect the correct answers to the task, selecting a high-consensus solution should yield more points. This strategy could, however, make users rely entirely on the answers of others. Therefore, in order to encourage a wider answer set and capture semantic ambiguity, we need to give points for newly discovered answers. Users should also be penalized for giving wrong answers. We also want to encourage users to return to the application and keep playing. Finally, in order for users to solve tasks of increasing difficulty, scoring needs to be proportional to the difficulty of the task [20]. Based on this, for each user U solving a task T on document D, we developed the following scoring components:

– popular(U, D, T): the points users receive if they make annotations that were previously selected by at least one other user; we also want to reward partial answers, in order to capture ambiguity;
– consecutive(U): the points users gain the more consecutive tasks they solve;
– discovered(U, D, T): the points users receive if they are the first to discover an answer, if it is then selected by at least one other user;
– wrong(U, D, T): the points users lose if their answers are not selected by any other user.

Based on this analysis, we developed the following scoring formula:

score(U, D, T) = difficulty(D) · (popular(U, D, T) + consecutive(U) + discovered(U, D, T) − wrong(U, D, T)).
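As an illustration, the difficulty and scoring computations of Sections 4.1 and 4.2 could be implemented along the following lines (a minimal sketch in Python; the component values and point counts are illustrative, and the difficulty components are assumed to have been normalized beforehand):

import numpy as np

def difficulty(nos, now, asl, no_umls, smog):
    # Norm of the (already normalized) five-component vector
    # (NoS, NoW, ASL, NoUMLS, SMOG) from Section 4.1.
    return float(np.linalg.norm([nos, now, asl, no_umls, smog]))

def score(doc_difficulty, popular, consecutive, discovered, wrong):
    # Scoring formula from Section 4.2: points for popular and newly
    # discovered answers plus a consecutive-play bonus, minus a penalty
    # for wrong answers, all weighted by the document difficulty.
    return doc_difficulty * (popular + consecutive + discovered - wrong)

# Illustrative values only.
d = difficulty(0.4, 0.6, 0.3, 0.5, 0.7)
print(score(d, popular=10, consecutive=2, discovered=5, wrong=1))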

Fig. 2. Game flow as an expression of skill and difficulty

4.3 Immersion

In order to develop the entertainment incentive, the crowdsourcing application needs to provide immersion in the task-solving experience. Immersion is based on the concept of game flow [21], which states that at every point in the game, the difficulty needs to be proportionate to the skill required to solve the task. Skill at playing is acquired by the user as they solve more tasks. If the difficulty is disproportionately large compared to the skill, it will cause anxiety for the user, whereas if the difficulty is too small, the user will be bored. Immersion is achieved when skill and difficulty are proportionally balanced, as illustrated in Figure 2.

Immersion is considered when choosing the next document that the user will be asked to solve as part of the game. When a user solves a task on document Di, the document they will be asked to solve next needs to have a higher difficulty in order to avoid boredom, but the increase needs to be low enough to avoid anxiety. Therefore, we define the set of possible documents that occur after Di as:

next(Di) = { Dj | difficulty(Dj) − difficulty(Di) = min t≠i (difficulty(Dt) − difficulty(Di)), where difficulty(Dt) ≥ difficulty(Di) }.

4.4 Levels

Finally, in order to satisfy the constraint for flexibility, game levels were implemented to quantify the skill required for solving the tasks. As skill is proportional to difficulty, we defined the game levels by quantizing the difficulty metric previously described into three intervals:

1. easy: { D | difficulty(D) ∈ [0, 2] },
2. normal: { D | difficulty(D) ∈ [3, 4] },
3. hard: { D | difficulty(D) ∈ [5, 6] }.

These levels should enable users to plan which task they want to solve according to the time they have at their disposal, while also providing a goal-based incentive of progressing in their skill [20].

5 Experimental Setup

In order to test the feasibility of the Dr. Detective setup, we implemented a version of the workflow described in Section 3, and set up a pilot run involving a crowd of medical professionals. As part of our pilot run, we performed an initial evaluation of both the quality of the answers and the user enjoyment in this gamified crowdsourcing platform. The goal of this experiment can be described as three questions, which will be discussed as part of our results:

1. How do the answers annotated by the crowd compare to those found by the UMLS parser?
2. Does having access to the answers of other users stimulate diversity of opinion?
3. Did users experience immersion in the gaming experience?

In order to answer these questions, we set up two versions of the game: one in which users had the ability to see the answers of others, and one in which they did not. In addition, some of the gaming elements that would keep the users in the state of game flow (high score board, next document selection mechanism, levels) were only available in the full version of the game. We constructed an experiment where the users would play both versions of the game, then answer a questionnaire on their experiences. The details of this experimental setup are described in this section.

5.1 Input

Based on a suggestion in the qualitative interviews, the input was selected from clinical cases published in the New England Journal of Medicine8. Ten documents were picked from four of the most popular specialties (Hematology/Oncology, Nephrology, Primary Care/Hospitalist/Clinical Practice, Viral Infections). The diagnosis was extracted from each document, based on a string matching procedure performed on the text marked with "diagnosis" section headings (e.g. clinical diagnosis, pathological diagnosis, etc.). The documents were split into paragraphs to increase the ease of reading, and the difficulty metrics described in Section 4.1 were then applied to each paragraph.

8 www.nejm.org

Fig. 3. Screenshot from the “Dr. Detective” game

Fig. 4. Term types in the game.

Finally, we selected a set of 20 paragraphs to use for the game, with the values in the difficulty vector uniformly distributed to represent a broad range of text types, as we wanted to ensure that all of the text would be annotated in the limited time frame of the experiment run.

5.2 Task

The micro-task templates (described in Section 3.1) selected for this pilot were (1) term extraction and (2) term categorization. Based on how relevant they are for describing patient case reports, 3 meta-types, each with a set of term types taken from UMLS, were selected and implemented in the interface for the categorization task. These term types are based on factor categories given to domain experts during the expert annotation phase for Watson. The type selection menu can be seen in Figure 4. In total, 13 term types were available for the users to annotate. As most interviewees expressed their interest in a problem-solving application, we decided to set the clue type users seek as part of the application (described in Section 4) to (1) the term is a clue leading to the final diagnosis of the case.

Finally, in order to encourage diversity of opinion, and therefore capture ambiguity, we allowed users to look at the answers of others for the task they are solving. This feature was made available through a button, which the users could press to toggle the other answers. The scoring formula (described in Section 4.2) ensures that users are motivated to find new answers even in these circumstances, through the use of discovery bonus points. The users could access the details of how their score was computed through a hover notification in the menu. An example of how this task was presented to the users as part of the Dr. Detective interface can be seen in Figure 3.

5.3 Users

The pilot run of the Dr. Detective game had 11 participants in total, with 10 players engaging with the full game version and 7 engaging with the simple version. In total, 155 annotation sets were collected, with each paragraph solved as part of 2 to 7 different game rounds. In addition, 6 players completed the feedback questionnaire.

6 Results and Discussion

In keeping with the research questions defined in the previous section, we first analyzed how the answers from the crowd compare to the results of the UMLS parser. We selected the three paragraphs that were played the most, and compared the answers to the term list generated by the UMLS MetaMap parser9 for the same paragraphs. Fig. 5 shows that the crowd was able to identify the majority of the words annotated with UMLS. Additionally, Fig. 6 shows that around one third of the terms in UMLS had a full match with terms annotated by the crowd. Factoring in the partial term matches, the crowd was able to identify most of the UMLS terms.

9 http://metamap.nlm.nih.gov/

Fig. 5. Words in UMLS for the 3 most popular paragraphs in the game

Fig. 6. Terms in UMLS for the 3 most popular paragraphs in the game

Fig. 7. Number of words for the 3 most popular paragraphs, after each round of each game version

This shows that the efficiency of the crowd answers is quite high, enough for the crowd to be considered a viable alternative to automated named-entity recognition, provided that enough users give their input for a paragraph.

Next, we look at how diversity of opinion was expressed by the game users. Specifically, we are interested in finding out whether being able to see the results of other people stimulates disagreement, or rather makes users select each other's answers. To achieve this, we look at how the answers per paragraph varied according to the version of the game that the user played. Fig. 7 shows how the number of new words per paragraph increases after each round of the game, for the top three paragraphs. Each version of the game seems to follow the same progression in the rate of new words identified, with the first users finding most of the words, and then only slight increases as the paragraph is played by other people. However, the simple version of the game seems to consistently feature a higher total word count than the full game version. The same trend was observed both for the number of new types and the number of distinct terms. This seems to indicate that the full game version was less encouraging for collecting a wide array of terms.

In order to rule out an issue related to some other feature in the full game version, we looked at how the behavior of pressing the button to view other answers affected the output. Out of 67 game rounds played in the full version, this button was only pressed in 18 of the rounds, so it appears this was not a popular feature to begin with. Fig. 8 shows that, actually, users tended to annotate more words in total when they pressed it. However, as evidenced in Fig. 9, the ratio of new words to total words in this case was much lower than when the button was not pressed. Additionally, it appears there is not much difference between the simple version of the game and the full version where the users chose not to look at the answers of others. Therefore we can infer that having access to all the answers makes the crowd act more conservatively, selecting fewer new words and rather choosing to validate the answers of others.

Fig. 8. Ratio of total words per round, grouped by the use of the button to view the answers of others

Fig. 9. Ratio of new to total words, grouped by the use of the button to view the answers of others

When looking at the answers in the questionnaire related to the usefulness of seeing other people's annotations, we found that most people (67%) were ambivalent to having the option of checking other answers. Some users reported using this feature as a tool for better understanding the task, while others claimed it validated the answers they had already chosen. Overall, it seems that having access to all the other answers makes users less likely to find and annotate new words, which could mean a loss in the ambiguity of the annotation. It also provides an unfair advantage to the first users to annotate a paragraph, as their score would likely keep increasing as other people keep selecting their answers.

Finally, we analyzed whether immersion in the game occurred for the users involved, and how each individual game feature was rated. The flow of the game was reported to be good, with 83% of the users saying they were neither too bored nor overwhelmed. Most users found the levels to be a useful addition, with 50% being satisfied with the level progression, and 33% being ambivalent to it.
However, some users pointed out that they expected more challenge from the advanced level. As the difficulty is currently computed based only on textual metrics, the game could potentially get boring for users. For this reason, domain difficulty should be incorporated in future versions of the game. The scoring part of the game was less well received, with 83% of the users declaring they found the way their score is computed only somewhat clear. Therefore, in future game versions, a more detailed scoring breakdown should be implemented, with users being able to access the history of the cases they solved. Finally, most users reported having enjoyed the game, and expressed an interest in returning to play, provided they can solve more difficult cases and get more feedback. The full game version was almost universally preferred by the users.

7 Conclusion and Future Work

This paper proposes a design for Dr. Detective – a gamified crowdsourcing platform to extract annotations from medical text. Dr. Detective was developed in the context of Crowd-Watson, a general crowdsourcing framework for extracting text annotations by engaging both a general crowd and a domain expert crowd. The gaming platform was designed taking into account the requirements of the expert crowd, and their implementation is illustrated in a clue-finding game. Specific gamification elements were incorporated, such as difficulty, scoring, immersion, and levels.

A first version of Dr. Detective was implemented and tested. The pilot run showed that the quality of the results of the crowd is comparable to that of an NLP parser. Allowing users to see the answers of others resulted in increased agreement, and thus decreased the desired diversity in answers. The overall user feedback for the application was positive. However, it was clear that users desire more complex challenges in order to keep them engaged.

An important next step is to define and test disagreement metrics that are specific to the gaming environment. As we have seen in previous research, a promising starting point is the set of disagreement metrics developed for the data collected through the micro-task platform. We also plan to further test how each of the gaming features performs individually, in order to better understand their influence on the quality and volume of the end result and to fine-tune the application to the needs of the users. Finally, we will explore how to further integrate the gaming and the micro-task crowdsourcing workflows, by using the output from one workflow to enhance the input for the other (e.g. asking one crowd to perform the term extraction and the other crowd the relation extraction), or by asking one crowd to validate the output of the other crowd.

Acknowledgements

The authors would like to thank Kathrin Dentler and Dr. Petra Wolffs for their help with finding participants for both the interviews and the application pilot run, as well as the students and medical professionals who were involved in these activities, and who provided their feedback.

References

1. Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A.A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J., Schlaefer, N., Welty, C.: Building Watson: An Overview of the DeepQA Project. AI Magazine 31 (2010) 59–79
2. Kalyanpur, A., Boguraev, B., Patwardhan, S., Murdock, J.W., Lally, A., Welty, C., Prager, J.M., Coppola, B., Fokoue-Nkoutche, A., Zhang, L., et al.: Structured data and inference in DeepQA. IBM Journal of Research and Development 56(3.4) (2012) 10–1
3. Aroyo, L., Welty, C.: Harnessing disagreement for event semantics. Detection, Representation, and Exploitation of Events in the Semantic Web 31
4. Aroyo, L., Welty, C.: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. WebSci2013. ACM (2013)
5. Inel, O., Aroyo, L., Welty, C., Sips, R.J.: Exploiting Crowdsourcing Disagreement with Various Domain-Independent Quality Measures. Technical report, VU University Amsterdam (July 2013). http://crowd-watson.nl/tech-reports/20130702.pdf
6. Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM (2004) 319–326
7. Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: reCAPTCHA: Human-based character recognition via web security measures. Science 321(5895) (2008) 1465–1468
8. Lintott, C.J., Schawinski, K., Slosar, A., Land, K., Bamford, S., Thomas, D., Raddick, M.J., Nichol, R.C., Szalay, A., Andreescu, D., et al.: Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society 389(3) (2008) 1179–1189
9. Siorpaes, K., Hepp, M.: Games with a Purpose for the Semantic Web. Intelligent Systems, IEEE 23(3) (2008) 50–60
10. Chamberlain, J., Poesio, M., Kruschwitz, U.: Phrase Detectives - A Web-based Collaborative Annotation Game. In: Proceedings of I-Semantics. (2008)
11. Scharl, A., Sabou, M., Gindl, S., Rafelsberger, W., Weichselbraun, A.: Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources. In: Proc. 8th LREC - International Conference on Language Resources and Evaluation. (2012)
12. Lin, H.: Crowd Watson: Crowdsourced Text Annotations. Technical report, VU University Amsterdam (July 2013). http://crowd-watson.nl/tech-reports/20130704.pdf
13. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(suppl 1) (2004) D267–D270
14. Goldberg, H.S., Hsu, C., Law, V., Safran, C.: Validation of clinical problems using a UMLS-based semantic parser. In: Proceedings of the AMIA Symposium, American Medical Informatics Association (1998) 805
15. Aroyo, L., Welty, C.: Measuring crowd truth for medical relation extraction. In: AAAI 2013 Fall Symposium on Semantics for Big Data (in print). (2013)
16. Soberón, G., Aroyo, L., Welty, C., Inel, O., Lin, H., Overmeen, M.: Crowd Truth Metrics. Technical report, VU University Amsterdam (July 2013). http://crowd-watson.nl/tech-reports/20130703.pdf
17. Tokarchuk, O., Cuel, R., Zamarian, M.: Analyzing crowd labor and designing incentives for humans in the loop. IEEE Internet Computing 16(5) (2012) 45–51
18. McLaughlin, G.H.: SMOG grading: A new readability formula. Journal of Reading 12(8) (1969) 639–646
19. Doak, C.C., Doak, L.G., Root, J.H.: Teaching patients with low literacy skills. AJN The American Journal of Nursing 96(12) (1996) 16M
20. Von Ahn, L., Dabbish, L.: Designing games with a purpose. Communications of the ACM 51(8) (2008) 58–67
21. Sherry, J.L.: Flow and media enjoyment. Communication Theory 14(4) (2004) 328–347

SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing

Umair ul Hassan, Sean O’Riain, Edward Curry

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland.

{umair.ul.hassan, sean.oriain, ed.curry}@deri.org

Abstract. Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support matching of tasks with appropriate users, across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology that aims to model users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real world systems with the SLUA ontology.

Keywords: crowdsourcing, human computation, users, tasks, ontology

1 Introduction

Collective intelligence systems [1] have demonstrated the use of networked humans and computers for solving complex problems, by applying techniques such as crowdsourcing [2], social computing [3] and human computation [4]. Online marketplaces like Amazon Mechanical Turk1 provide access to a large pool of human workers willing to perform a variety of micro-tasks for money, whilst other platforms focus on domain-specific crowd services, e.g. uTest2 provides software testing services. Most of the existing crowdsourcing platforms are isolated in terms of their users and tasks. People contribute either to a few popular platforms or to the systems relevant to their specific domain of knowledge. Hence human resources may be underutilized due to a lack of tools that help people in finding tasks across multiple crowdsourcing platforms. Similarly, task requesters are unable to query across multiple platforms for their tasks to find appropriate workers with the required skills or knowledge.

1 http://www.mturk.com
2 http://www.utest.com

The main objective of the SLUA (Semantically Linked Users and Actions) ontology is to define a lightweight model for describing crowdsourcing tasks and users with regard to human capabilities, actions and rewards. The scope of the ontology is limited to micro-tasks that can be performed within minutes. The specific aims of the SLUA ontology are:

– To enable interoperability and reuse among crowdsourcing platforms across the web. For example, an active user on Quora3 for the topic of Cloud Computing might be the right person to edit a Wikipedia article on the same topic.
– To support people in finding online tasks according to their capabilities and motivation. For example, if a person is knowledgeable about the city of New York, then she can help fix problems in Wikipedia4 articles or tag images of buildings in New York in Amazon Mechanical Turk.
– To facilitate algorithmic matching of tasks and users according to human capabilities, actions, and rewards. For example, a human computation platform might need to verify the chemical formula of a drug with a chemist who has the relevant education listed on LinkedIn5.

The main contributions of this paper are the initial description of the SLUA ontology and its mappings to other existing ontologies, and examples of various tasks described using SLUA. The rest of this paper is organized as follows. Section 2 motivates the need for an ontology for describing users and tasks in crowdsourcing platforms. Section 3 highlights the requirements of the ontology according to relevant concepts found in the literature. Section 4 provides the description of classes and properties in the SLUA ontology. Section 5 details some example usage of SLUA for the semantic description of tasks and users. Section 6 discusses related work and Section 7 concludes the paper.

2 Motivation

Crowdsourcing platforms differ from each other in terms of the tasks that humans can perform and the characteristics of human contributors [5]. Wikipedia requires the crowd to create or edit articles by contributing textual content and references. Quora is powered by questions and answers contributed by online users. Both Wikipedia and Quora rely on the fact that people are motivated to contribute to the crowdsourcing efforts because of social good or self-serving motivations. Amazon Mechanical Turk serves as a marketplace of human services for performing small online tasks in exchange for money. TaskRabbit6 allows people to outsource their small physical tasks to other people against a small monetary price.

3 http://www.quora.com
4 http://www.wikipedia.org
5 http://www.linkedin.com
6 http://www.taskrabbit.com

Table 1. Common terminology used for the concepts of tasks and users in the documentation of popular online marketplaces for crowdsourcing.

Concept      MTurk          Mobileworks   Shorttask    CrowdFlower
Task         HIT            Task          ShortTask    Microtask
Worker       Worker                       Solver       Contributor
User         Requester      Developer     Seeker       Customer
Reward       Payment        Payment       Reward       Payment
Capability   Qualification  Filter

Microtask7 uses online gamers to solve problems typically not solvable by computers. In short, the heterogeneity of crowdsourcing systems exists at task, user, and platform levels. Some tasks require cognitive skills while others need physical abilities from humans. Some tasks reward in terms of money while others compensate through enjoyment.

The heterogeneity of crowdsourcing platforms limits the interoperability of applications that access more than one crowdsourcing platform. Furthermore, the development of cross-platform services becomes difficult due to variations of data semantics for each platform. For instance, there is a lack of search engines for microtasks, and existing general search engines fail to address this problem. Similarly, existing crowdsourcing platforms do not support application interfaces for user search based on human capabilities. Recently there has been effort to describe crowdsourcing platforms with the help of taxonomies [6]. However, they do not cover the modeling of human tasks, actions, and capabilities, focusing instead on concepts associated with the design aspects of crowdsourcing platforms. Therefore we observe that there is a need for a common language for describing human tasks, actions, rewards, and capabilities in crowdsourcing platforms, as well as their relationships. An appropriate ontology may serve this purpose, thereby facilitating interoperability and supporting a broad range of computation services. In the next section we summarize the conceptual requirements of such an ontology and assess the coverage of these requirements by existing ontologies.

3 Ontology Requirements

In this section we analyze the requirements of an ontology for human tasks in crowdsourcing. The requirements are based on the common terminology found in current crowdsourcing platforms. Table 1 shows the variety of terminology and concepts used among major crowdsourcing marketplaces. This heterogeneity of terminology creates a gap in terms of a common understanding of crowdsourcing concepts among users and developers [7]. Additionally, the heterogeneity is reflected in the application programming interfaces offered by crowdsourcing platforms, resulting in interoperability issues in terms of the semantics of data structures and algorithms [7].

7 http://www.microtask.com

The limitations due to the heterogeneity of crowdsourcing platforms necessitate the development of a domain ontology. In this work we focus on defining a lightweight domain ontology for crowdsourcing platforms, specifically for microtasks.

3.1 Core Concepts

We define the requirements of the ontology in terms of the core concepts used by major crowdsourcing marketplaces. The existing literature in human computation and crowdsourcing has mainly described the concepts related to platform design in the form of taxonomies [6, 8]. By comparison, our objective is to define the ontology in terms of what actions people can do for crowdsourcing systems and what human characteristics they need to perform those actions. Therefore, the following concepts constitute the main requirements of the ontology:

– Task: This concept is commonly used in the literature and in crowdsourcing platforms to describe a unit of work to be performed by people in the crowd [2, 4]. Sometimes complex tasks are divided into smaller simple tasks to increase crowd participation [9].
– Action: The cognitive or psychomotor action or activity that leads towards the completion of a task [10]. A task can include one or more actions; for instance, an audio transcription task includes the activities of listening and writing.
– User: Commonly described as a “worker” in crowdsourcing marketplaces due to the monetary payments earned by users [9]. However, other crowdsourcing systems like wikis and question answering systems use the concept of user to describe contributors.
– Reward: The concept of reward is popular in crowdsourcing marketplaces, and the existing literature considers it a core concept related to the motivation of people in the crowd [9, 11, 12]. Although monetary rewards are common in marketplaces, other motivating factors such as altruism, fun, learning, and reputation are also considered rewards.
– Capability: The human ability, knowledge, or skill that allows a user to perform the necessary actions for task completion [4, 13]. The availability and location of a person may also be part of the requirements for some tasks.

3.2 Existing Ontologies

We have mapped the concepts described in the ontology requirements to existing ontologies such as FOAF8, SIOC9, HRMO10, PIMO11, and TMO12. Table 2 shows how classes in the existing ontologies map to the concepts required for the ontology.

8 http://www.foaf-project.org/
9 http://sioc-project.org/ontology
10 http://mayor2.dia.fi.upm.es/oeg-upm/index.php/en/ontologies/99-hrmontology
11 http://www.semanticdesktop.org/ontologies/pimo/
12 http://www.semanticdesktop.org/ontologies/2008/05/20/tmo/

Table 2. Mapping of concepts described in the ontology requirements with existing taxonomies and ontologies.

Concept         PIMO       TMO       HRM-O          FOAF      SIOC
Task            Task       Task
Action
User            Person               Job Seeker     Person    UserAccount
Reward                               Compensation
  Reputation
  Money                              Salary
  Fun
  Altruism
  Learning
Capability
  Location      Location
  Skill                              Skill
  Knowledge
  Ability                            Ability
  Availability             Interval

The Personal Information Management Ontology (PIMO) and the Task Management Ontology (TMO) model tasks and skills of users of semantic desktops. The mappings with other ontologies or vocabularies can be defined using owl:equivalentClass and owl:equivalentProperty in the Web Ontology Language (OWL). Standard reasoning engines can then be used to carry over the mappings to the instances of the mapped ontologies. The coverage gap highlighted in Table 2 underlines the need for a separate ontology for human tasks, actions, rewards, and capabilities in crowdsourcing systems.
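For illustration, such equivalence mappings could be declared programmatically, for example with rdflib in Python (a minimal sketch; the slua: namespace URI is a placeholder since the paper does not state a published one, the PIMO URI is the commonly used one, and only two of the Table 2 mappings are shown):

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, FOAF

# Namespace URIs are indicative; SLUA's is a placeholder.
SLUA = Namespace("http://example.org/slua#")
PIMO = Namespace("http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#")

g = Graph()
g.bind("slua", SLUA)
g.bind("pimo", PIMO)

# Two of the mappings from Table 2, expressed as OWL class equivalences.
g.add((SLUA.Task, OWL.equivalentClass, PIMO.Task))
g.add((SLUA.User, OWL.equivalentClass, FOAF.Person))

print(g.serialize(format="turtle"))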

4 SLUA Ontology

The Semantically Linked Users and Actions (SLUA) ontology contains 5 main classes and 10 sub-classes that describe users and tasks in crowdsourcing systems. In the previous section we identified the main concepts found in the literature for describing the tasks and users in crowdsourcing systems. These concepts form the set of core classes in the SLUA ontology, as shown in Figure 1. Although similar concepts are captured by other ontologies, it is the relationships, class hierarchy, and properties of these concepts that are unique to SLUA. In the rest of this section the classes and their relationships are described in more detail.

Fig. 1. Overview of the 5 main classes and 10 sub-classes defined in the SLUA ontology

4.1 Main Classes

The classes in the SLUA ontology are the following:

– Action class represents a specific act that is performed by the members of the crowd. An action can be cognitive or physical. For example, the comparison of two images involves a cognitive action from the user.
– Task defines the unit of work resulting in a desired outcome that is assigned to the members of the crowd. A task may require one or more actions to produce the outcome. Therefore a task at the lowest level is composed of actions. The Task class has a composition relationship with itself because complex tasks can be broken down into small simple tasks.
– User is the class that describes the human contributor in crowdsourcing. The user serves as an intelligent agent that is able to perform actions for successful completion of assigned tasks.
– Reward is associated with a task as the incentive for the human contribution. As noted earlier, there are currently five types of reward classes:
  ─ Fun class represents rewards involving entertainment value such as games.
  ─ Money class represents monetary rewards.
  ─ Fame class represents rewards that benefit people in terms of recognition, such as top contributors in Wikipedia.
  ─ Altruism class represents rewards involving social good.
  ─ Learning class represents rewards resulting in personal improvement in skill or knowledge.
– Capability defines the human characteristics that are necessary for performing a task. For instance, one system might specify a user's location capability while another system utilizes this description to assign tasks relevant to the same location. There are five sub-classes defining different capabilities:
  ─ Ability class represents the stable capacity of users to engage in a specific behavior.
  ─ Knowledge class represents a body of information accumulated by users through education or experience.
  ─ Skill class represents the proficiency of a user in performing a task. Skill is acquired through training and practice.
  ─ Location class represents the specific place where a user is or will be physically present. This type of capability enables crowd contributions that are related to a physical place.
  ─ Availability class represents the time interval or time instant during which a user can perform a task.
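To make the class hierarchy concrete, the skeleton of these classes could be generated as RDF, for example with rdflib in Python (a minimal sketch; the slua: namespace URI is a placeholder, and the recognition-type reward is named Fame as in the list above, although the examples later in the paper also use slua:Reputation):

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

SLUA = Namespace("http://example.org/slua#")  # placeholder namespace URI

g = Graph()
g.bind("slua", SLUA)

# The five main classes.
for name in ("User", "Task", "Action", "Reward", "Capability"):
    g.add((SLUA[name], RDF.type, OWL.Class))

# Reward sub-classes.
for name in ("Fun", "Money", "Fame", "Altruism", "Learning"):
    g.add((SLUA[name], RDF.type, OWL.Class))
    g.add((SLUA[name], RDFS.subClassOf, SLUA.Reward))

# Capability sub-classes.
for name in ("Ability", "Knowledge", "Skill", "Location", "Availability"):
    g.add((SLUA[name], RDF.type, OWL.Class))
    g.add((SLUA[name], RDFS.subClassOf, SLUA.Capability))

print(g.serialize(format="turtle"))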

4.2 Important Properties

This sub-section describes the properties of SLUA concepts that are important for extracting meaning from the classes:

– domain: A domain definition applies to most of the classes defined above. This property can be helpful for domain-specific algorithms. A common categorization system could be used to specify domains in general crowdsourcing systems; however, for specific areas purpose-built taxonomies can be more effective.
– offers: This property defines the relationship of Reward with Task. For example, some tasks might be rewarded with money. By comparison, a user who is interested in a particular reward can be described with the earns property.
– requires: A Task can define requirements of one or more human capabilities using this property. By contrast, a User can be described as having similar capabilities using the possesses property.
– includes: A Task can define one or more actions that a User performs for generating the desired outcome of a task.
– isPartOf: A complex Task can be decomposed into small manageable tasks. Therefore this property helps in describing the composition relationship between tasks.
– hasDeadline: This property can be used to specify time limitations of a Task, which is specifically important for real-time systems employing crowds.
– isConnectedWith: In the context of social networks, users are connected with other users through various relations. This property captures the network structure of users to enable social network based analysis of actions and users. For example, the network structure can be exploited to recommend actions to neighbor nodes in a network.

Fig. 2. Example of an article with a cleanup task that suggests that users add verifiable references to meet the quality standards of Wikipedia.

There are also domain-specific properties that can be used to describe SLUA instances, as exemplified later in this paper.

5 Using SLUA

The core objective of SLUA is to provide a simple language to describe human tasks in crowdsourcing platforms in order to facilitate the connectivity of tasks with users who can perform them. The SLUA ontology enables the exchange of information on tasks, actions, users, rewards, and capabilities across crowdsourcing platforms. In the rest of this section we illustrate the use of SLUA in describing the semantics of tasks and users in different crowdsourcing systems.

5.1 Describing Tasks

Collaborative Information Management. Wikipedia is a large collection of textual articles edited collaboratively by users on the Web. Articles in Wikipedia are routinely tagged for cleanup tasks due to issues with content or style. These tasks include adding new references, revising articles, merging sections, etc. For example, Figure 2 shows the alert message for an article about the “A3 road” in the City of London. The message suggests an action is required to remedy the quality issue with the article.

The alert message of a Wikipedia article and the associated task can be described in Resource Description Framework (RDF) format using the SLUA ontology. This allows machine-readable access to human actions for improving the quality of content in Wikipedia. The following code gives the example of the Wikipedia cleanup task converted to an instance of the Task class in SLUA.

# The subject and object URIs were elided in the extracted text; the
# <http://example.org/...> placeholders below are illustrative only
# (prefix declarations omitted, as in the original listing).
<http://example.org/tasks/elided-wikipedia-cleanup>
    a slua:Task ;
    rdfs:label "Please consider adding full citations to the Wikipedia article" ;
    slua:requires [ a slua:Location ; slua:locatedIn <http://example.org/elided-location> ] ;
    slua:requires [ a slua:Knowledge ; slua:locatedIn <http://example.org/elided-topic> ] ;
    slua:offers [ a slua:Reward ; a slua:Reputation ; slua:amount "1 star" ] ;
    slua:includes [ a slua:Action ; rdfs:label "Wiki page edit" ] .

Online Crowdsourcing Marketplace. Amazon Mechanical Turk is an online marketplace where requesters submit human intelligence tasks (HITs) to be performed by workers (i.e. users) in return for small amounts of money. Figure 3 shows an example of a task requiring users to describe a video with short sentences. Using SLUA, the task can be described as an instance of the Task class. The task requires two human capabilities: the capability of Location, having the locatedIn property with value “United States”, and the capability of Availability, with the availableFor property with a value of “60” minutes. By performing the task, workers can earn a Reward of type Money with the amount property having the value “$0.15”.
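A corresponding RDF description could be produced, for instance, with rdflib in Python (a minimal sketch; the task URI and the slua: namespace are placeholders, while the property names and values follow the description above):

from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

SLUA = Namespace("http://example.org/slua#")          # placeholder namespace URI
task = URIRef("http://example.org/tasks/video-hit")   # placeholder task URI

g = Graph()
g.bind("slua", SLUA)

g.add((task, RDF.type, SLUA.Task))
g.add((task, RDFS.label, Literal("Describe a video with short sentences")))

# Required capability: located in the United States.
loc = BNode()
g.add((task, SLUA.requires, loc))
g.add((loc, RDF.type, SLUA.Location))
g.add((loc, SLUA.locatedIn, Literal("United States")))

# Required capability: available for 60 minutes.
avail = BNode()
g.add((task, SLUA.requires, avail))
g.add((avail, RDF.type, SLUA.Availability))
g.add((avail, SLUA.availableFor, Literal("60")))

# Offered reward: money, $0.15.
reward = BNode()
g.add((task, SLUA.offers, reward))
g.add((reward, RDF.type, SLUA.Money))
g.add((reward, SLUA.amount, Literal("$0.15")))

print(g.serialize(format="turtle"))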

Cyber-Physical System. Next generation building management systems [14] involve a human-in-the-loop for performing physical actions in the environment [15]. These systems ask occupants to perform environmental actions such as closing windows to reduce energy usage. A human action in building energy management serves as another use case for the SLUA ontology.

Fig. 3. Example of a human intelligence task (HIT) on Amazon Mechanical Turk

In this case the window-closing action can be described as a Task, which requires a location capability: the capability of Location having the locateNear property with value “Room A1” and the locationTime property with value “10:00PM”.

5.2 Describing Users

Similar to the description of tasks, the users of crowdsourcing platforms can be described using SLUA. Users can be described in terms of the actions they perform, the rewards they earn, and the capabilities they possess. The connections between various users can also be described to facilitate social network analysis. The following code gives an example of a Wikipedia user described with SLUA in RDF Turtle format.

# The subject and object URIs were elided in the extracted text; the
# <http://example.org/...> placeholders below are illustrative only
# (prefix declarations omitted, as in the original listing).
<http://example.org/users/elided-user>
    a slua:User ;
    foaf:name "Umair ul Hassan" ;
    slua:possess [ a slua:Location ; slua:locatedIn <http://example.org/elided-location> ] ;
    slua:possess [ a slua:Knowledge ; slua:locatedIn <http://example.org/elided-topic> ] ;
    slua:earns [ a slua:Reputation ; slua:amount "4 star" ] .

5.3 Leveraging Semantic Descriptions

Improving the routing of tasks to appropriate users is another objective of SLUA. In this regard the semantic descriptions of users and tasks can be used to perform the routing process. There are three major components of a task routing system, as shown in Figure 4.

– Task Modeling: Uses SLUA to describe tasks. The capabilities required by tasks can be discovered using methods such as cognitive task analysis [16].
– Worker Profiling: Uses SLUA to describe profiles of Users. The profiles can be generated using techniques such as expertise retrieval, behavior analysis, performance analysis, etc.
– Task Routing: Given task and user descriptions, this process involves finding the suitability of a user for a task. Depending on the tasks and users, a variety of semantic similarity13 approaches can be used for the purpose of matching (a minimal sketch is given below).
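As an illustration of the matching step, the sketch below scores a user against a task by checking which of the task's required capabilities the user possesses (illustrative Python; the data values are hypothetical and the exact-match test stands in for a richer semantic similarity measure):

def match_score(task_requirements, user_capabilities):
    # Fraction of the task's required capabilities satisfied by the user.
    # Both arguments map a capability type (e.g. "Knowledge", "Location")
    # to a set of values; a real system would use semantic similarity
    # instead of exact set intersection.
    if not task_requirements:
        return 1.0
    satisfied = sum(
        1
        for cap_type, required in task_requirements.items()
        if required & user_capabilities.get(cap_type, set())
    )
    return satisfied / len(task_requirements)

# Illustrative task and user profiles.
task = {"Knowledge": {"City of London"}, "Location": {"United Kingdom"}}
users = {
    "alice": {"Knowledge": {"City of London"}, "Location": {"United Kingdom"}},
    "bob": {"Knowledge": {"Cloud Computing"}},
}
best = max(users, key=lambda name: match_score(task, users[name]))
print(best)  # alice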

13 http://en.wikipedia.org/wiki/Semantic_similarity

Fig. 4. Architecture of the task routing system for heterogeneous tasks and users in crowdsourcing

6 Related Work

There has been considerable work on studying various dimensions of systems combining the efforts of networked humans and computers. Malone et al. [1] described a framework for understanding how systems based on collective intelligence work. Doan et al. [2] discussed the application of crowdsourcing to various domains. Quinn and Bederson [6] developed a taxonomy of human computation systems. Kearns [17] described tasks suited for social computing. These studies describe various aspects of human actions from a research perspective. By comparison, this paper attempts to describe actions and users for interoperability.

Bernstein et al. [13] called for the development of “social operating systems” for managing and allocating tasks to human resources at the global scale. Kittur et al. [9] highlight task assignment as the main research challenge for crowd work. Similarly, task routing has been defined as a fundamental aspect of human computation [4]. Ul Hassan et al. have studied the relationship between user expertise and task routing in collaborative data quality management [18–20]. Difallah et al. used social network profiles of users for assigning crowdsourcing tasks [21]. In this regard, SLUA provides a common language for the matching of tasks and users in crowdsourcing systems.

Existing ontologies, such as the Personal Information Management Ontology [22] and the Task Management Ontology [23], model some aspects of human actions and human capabilities. However, these ontologies focus on task management from a desktop applications perspective. By comparison, SLUA specifies terms for crowdsourcing systems including rewards, capabilities, and actions.

7 Summary and Future Work

Semantically Linked Users and Actions is an initial step towards defining a lightweight ontology for describing tasks, actions, users, rewards, and capabilities in crowdsourcing platforms. This paper describes the core concepts and properties of the SLUA ontology, and gives example uses of SLUA to describe actions in different crowdsourcing scenarios. Future work includes the development of a prototype for exporting SLUA data from crowdsourcing platforms and of a system that performs matchmaking between users and tasks using SLUA descriptions.

Acknowledgement. The work presented in this paper has been partially funded by Science Foundation Ireland Grant No. SFI/08/CE/I1380 (Lion-2).

References

1. Malone, T.W., Laubacher, R., Dellarocas, C.N.: Harnessing Crowds: Mapping the Genome of Collective Intelligence, http://www.ssrn.com/abstract=1381502, (2009). 2. Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing systems on the World-Wide Web. Communications of the ACM. 54, 86–96 (2011). 3. Schuler, D.: Social computing. Communications of the ACM. 37, 28–29 (1994). 4. Law, E., Ahn, L. von: Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. pp. 1–121 (2011). 5. Curry, E., Freitas, A., Riain, S.O.: The Role of Community-Driven Data Curation for Enterprises. In: Wood, D. (ed.) Linking Enterprise Data. pp. 25–47. Springer US, Boston, MA (2010). 6. Quinn, A.J., Bederson, B.B.: Human computation: a survey and taxonomy of a growing field. Proceedings of the 2011 annual conference on Human factors in computing systems - CHI ’11. pp. 1403–1412. ACM Press, New York, New York, USA (2011). 7. Uschold, M., Gruninger, M.: Ontologies and semantics for seamless connectivity. ACM SIGMOD Record. 33, 58 (2004). 8. Geiger, D., Seedorf, S., Nickerson, R.C., Schader, M., Nickerson, R.: Managing the Crowd : Towards a Taxonomy of Crowdsourcing Processes. (2011). 9. Kittur, A., Nickerson, J. V., Bernstein, M., Gerber, E., Shaw, A., Zimmerman, J., Lease, M., Horton, J.: The future of crowd work. Proceedings of the 2013 conference on Computer supported cooperative work - CSCW ’13. pp. 1301–1318. ACM Press, New York, New York, USA (2013). 10. Sakamoto, Y., Tanaka, Y., Yu, L., Nickerson, J. V: The Crowdsourcing Design Space. 346–355 (2011). 11. Simperl, E., Cuel, R., Stein, M.: Incentive-Centric Semantic Web Application Engineering. Synthesis Lectures on the Semantic Web: Theory and Technology. 3, 1–117 (2013). 12. Tokarchuk, O., Cuel, R., Zamarian, M.: Analyzing Crowd Labor and Designing Incentives for Humans in the Loop. IEEE Internet Computing. 45–51 (2012). 13. Bernstein, A., Klein, M., Malone, T.W.: Programming the global brain. Communications of the ACM. 55, 41 (2012). 14. Curry, E., Hasan, S., O’Riain, S.: Enterprise Energy Management using a Linked Dataspace for Energy Intelligence. Second IFIP Conference on Sustainable Internet and ICT for Sustainability. , Pisa, Italy (2012). 15. Crowley, D.N., Curry, E., Breslin, J.G.: Closing the Loop-From Citizen Sensing to Citizen Actuation. 7th IEEE International Conference on Digital Ecosystem Technologies. (2013). 16. Schraagen, J.M., Chipman, S.F., Shalin, V.L.: Cognitive Task Analysis. Taylor & Francis (2000). 17. Kearns, M.: Experiments in social computation. Communications of the ACM. 55, 56 (2012). 18. Ul Hassan, U., O’Riain, S., Curry, E.: Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers. Proceedings of the 17th International Conference on Information Quality. , Paris, France (2012). 19. Ul Hassan, U., O’Riain, S., Curry, E.: Effects of Expertise Assessment on the Quality of Task Routing in Human Computation. Proceedings of the 2nd International Workshop on Social Media for Crowdsourcing and Human Computation. , Paris, France (2013). 20. Ul Hassan, U., O’Riain, S., Curry, E.: Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications. Proceedings of the Ninth International Workshop on Information Integration on the Web. pp. 1–6. ACM Press, New York, New York, USA (2012). 21. Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-a-crowd: tell me what you like, and i’ll tell you what to do. 
Proceedings of the 22nd international conference on World Wide Web. pp. 367–374. International World Wide Web Conferences Steering Committee (2013). 22. Sauermann, L., Elst, L. van, Dengel, A.: PIMO - a Framework for Representing Personal Information Models. Proceedings of I-Semantics’ 07. pp. 270–277. JUCS, Graz, Austria (2007). 23. Ong, E., Riss, U. V., Grebner, O., Du, Y.: Semantic Task Management Framework. Proceedings of I-KNOW ’08. pp. 387–394. JUCS, Graz, Austria (2008).

Measuring Crowd Truth: Disagreement Metrics Combined with Worker Behavior Filters

Guillermo Soberón1, Lora Aroyo1, Chris Welty2, Oana Inel1, Hui Lin3, and Manfred Overmeen3

1 VU University, Amsterdam, The Netherlands
[email protected], [email protected], [email protected]
2 IBM Research, New York, USA
[email protected]
3 IBM Netherlands, Amsterdam, The Netherlands
[email protected], [email protected]

Abstract. When crowdsourcing gold standards for NLP tasks, the workers may not reach a consensus on a single correct solution for each task. The goal of Crowd Truth is to embrace such disagreement between individual annotators and harness it as useful information to signal vague or ambiguous examples. Even though the technique relies on disagreement, we also assume that the differing opinions will cluster around the more plausible alternatives. Therefore it is possible to identify workers who systematically disagree, both with the majority opinion and with the rest of their co-workers, as low quality or spam workers. We present in this paper a more detailed formalization of metrics for Crowd Truth in the context of medical relation extraction, and a set of additional filtering techniques that require the workers to briefly justify their answers. These explanation-based techniques are shown to be particularly useful in conjunction with disagreement-based metrics, and achieve 95% accuracy for identifying low quality and spam submissions in crowdsourcing settings where spam is quite high.

Keywords: crowdsourcing, disagreement, quality control, relation extraction

1 Introduction

The creation of gold standards by expert annotators can be a very slow and expensive process. When it comes to NLP tasks like relation extraction, annotators have to deal with the ambiguity of the expressions in the text at different levels, frequently leading to disagreement between annotators. To overcome this, detailed guidelines for annotators are developed, in order to handle the different cases that have been observed, through practice, to generate disagreement. However, the process of avoiding disagreement has led in many cases to brittleness and over-generality in the ground truth, making it difficult to transfer annotated data between domains or to use the results for anything practical.

In comparison with expert-generated ground truth, a crowdsourced gold standard can be a cheaper and more scalable solution. Crowdsourced gold standards typically show lower overall κ scores [3], especially for complex NLP tasks such as relation extraction, since the workers perform small, simple (micro) tasks and cannot be relied on to read a long guideline document. Rather than eliciting an artificial agreement between workers, in [1] we presented “Crowd Truth”, a crowdsourced gold standard technique in which, instead of considering the lack of agreement something to be avoided, it is used as something informative from which characteristics and features of the annotated content may be inferred. For instance, a high disagreement for a particular sentence may be a sign of ambiguity in the sentence.

As the final Crowd Truth is a by-product of the different contributions of the members of the crowd, being able to identify and filter possible low quality contributors is crucial to reduce their impact on the overall quality of the aggregate result. Most of the existing approaches for detecting low quality contributions in crowdsourcing tasks are based on the assumption that for each task there is a single correct answer, enabling distance and clustering metrics to detect outliers [14] or using gold units [15], establishing an equivalency between disagreement with the majority and low quality contributions.

For Crowd Truth the initial premise is that there is not only one right answer, and the diversity of opinions is to be preserved. However, disagreement with the majority can still be used as a way to distinguish low quality annotators. For each task, it may be assumed that the workers' answers will be distributed among the possible options, with the most plausible answers concentrating the highest number of workers, and the improbable answers being stated by none or very few workers. That way, workers whose opinions are different from those of the majority are still likely to find other workers with similar views on the issue. On the other hand, the answers of workers who complete the task randomly or without understanding the task or its content tend not to be aligned with those of the rest. Hence, it is possible to filter by identifying those workers who not only disagree with the majority opinion of the crowd on a per-task basis, but whose opinions are systematically not shared by many of their peers. The initial definition of the content-based disagreement metrics was introduced by [1] to identify and filter low quality workers for relation extraction tasks, establishing metrics for the inter-worker agreement and the agreement with the crowd opinion.

While filtering workers by disagreement has been shown to be an effective way of detecting low quality contributors, achieving high precision, we demonstrate that it is not sufficient to filter out all of them. We have extended the relation extraction task by asking the workers to provide a written justification for their answers, and the manual inspection of the results revealed several instances of badly formed, incomplete or even random-text explanations, which can be safely attributed to low quality workers or even automated spam bots.

In order to complement the disagreement filters, we propose several ways to use the explanations provided by the contributors to implement new low quality worker filters that extend and complement the recall of the disagreement filter.

2 Related Work

2.1 Disagreement

In the absence of a gold standard, different evaluation schemes can be used for worker quality evaluation. For instance, the results among workers can be compared and the agreement in their responses can be used as a quality estimator. As is well known [4], the frequency of disagreement can be used to estimate worker error probabilities. In [9] the computation of disagreement-based estimators for worker quality is proposed as part of a set of techniques to evaluate workers, along with confidence intervals for each of these schemes, which allows estimating the "efficiency" of each of them. A simpler method is proposed in [16], which assumes a "plurality of answers" for a task, and estimates the quality of a worker based on the number of tasks for which the worker agrees with "the plurality answer" (i.e. the one from the majority of the workers). While these disagreement-based schemes do not rely on the assumption that there is only one single answer per task (thus allowing room for disagreement between worker responses), they still assume a correlation between disagreement and low quality of the worker. Crowd Truth not only allows but fosters disagreement between the workers, as it is considered informative.

2.2 Filtering by explanations

As stated in [6], cheaters tend to avoid tasks that involve creativity and abstract thinking, and even for simple straightforward tasks, the addition of non-repetitive elements discourages low quality contributions and automation of the task. Apart from the dissuasive effect on spammers of introducing these non-repetitive elements in the task design, our work additionally tries to use this as a basis for filtering once the task is completed. Previous experiences [12] have shown that workers tend to provide good answers to open-ended questions when those are concrete, and response length can be used as an indicator of the participant's engagement in the task.

2.3 Crowd Watson

Watson [7] is an artificially intelligent system designed by IBM that is capable of answering questions posed in natural language. To build its knowledge base, Watson was trained on a series of databases, taxonomies, and ontologies of publicly available data [10]. Currently, IBM Research aims at adapting the Watson technology for question-answering in the medical domain. For this, large amounts of training and evaluation data (ground truth medical text annotations) are needed; the traditional ground-truth annotation approach is slow and expensive, and constrained by too restrictive annotation guidelines that are necessary to achieve good inter-annotator agreement, which results in the aforementioned over-generalization.

The Crowd Watson project [11] implements the Crowd Truth approach to generate a crowdsourced gold standard for training and evaluation of IBM Watson NLP components in the medical domain. Complementary to the Crowd Truth implementation, and within the general Crowd Watson architecture, a gaming approach for crowdsourcing has been proposed [5], as a way to enhance the engagement of the expert annotators.
Also, within the context of the Crowd Watson project, [8] has shown how the worker metrics initially set up for the medical domain can be adapted to other domains and tasks, such as event extraction.

3 Representation

CrowdFlower workers were presented with sentences in which the argument words were highlighted, and with 12 relations (manually selected from UMLS [2]), as shown below in Fig. 1; they were asked to choose all the relations from the set of 12 that related the two arguments in the sentence. They were also given the options to indicate that the argument words were not related in the sentence (NONE), or that the argument words were related but not by one of the 12 relations (OTHER). They were also asked to justify their choices by selecting the words in the sentence that they believed “signaled” the chosen relations or, in case they chose NONE or OTHER, to provide the rationale for that decision.

Fig. 1. Relation Extraction Task Example

Note that the process and the choices for setting up the annotation template are out of scope for this paper. The relation extraction task is part of the larger crowdsourcing framework, Crowd-Watson, which defines the input text, the templates and the overall workflow [1]. In this paper we only focus on the representation and analysis of the collected crowdsourced annotations. The information gathered from the workers is represented using vectors whose components are all the relations given to the workers (including the choices for NONE and OTHER). All metrics are computed from three vector types:

1. Worker-sentence vector V_{s,i}: the result of a single worker i annotating a single sentence s. For each relation that the worker annotated in the sentence, there is a 1 in the corresponding component, otherwise a 0.
2. Sentence vector V_s: the vector sum of the worker-sentence vectors for each sentence, V_s = Σ_i V_{s,i}.
3. Relation vector R_r: a unit vector in which only the component for relation r is 1, the rest 0.
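To make these definitions concrete, the following minimal sketch builds the three vector types in Python with numpy. The toy relation inventory and all names are our own illustrative assumptions, not part of the Crowd-Watson implementation.

import numpy as np

# Hypothetical relation inventory: 12 UMLS-style relations plus NONE and OTHER.
RELATIONS = ["treats", "causes", "prevents", "diagnoses", "is_a",
             "part_of", "location_of", "symptom_of", "manifestation_of",
             "contraindicates", "associated_with", "side_effect_of",
             "NONE", "OTHER"]
IDX = {r: i for i, r in enumerate(RELATIONS)}

def worker_sentence_vector(chosen_relations):
    """V_{s,i}: 1 for each relation the worker selected, 0 otherwise."""
    v = np.zeros(len(RELATIONS))
    for r in chosen_relations:
        v[IDX[r]] = 1.0
    return v

def sentence_vector(worker_vectors):
    """V_s: component-wise sum of all worker-sentence vectors for one sentence."""
    return np.sum(worker_vectors, axis=0)

def relation_vector(relation):
    """R_r: unit vector for a single relation."""
    v = np.zeros(len(RELATIONS))
    v[IDX[relation]] = 1.0
    return v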

We collect two different kinds of information: the annotations and the explanations of the annotations (i.e., the selected words that signal the chosen relation, or the rationale for selecting NONE or OTHER). We try to identify behaviour that can be associated with low quality workers from the perspective of these two domains: disagreement metrics rely on the content of the annotations to identify workers that systematically disagree with the rest; explanation filters aim at identifying individual behaviours that can be attributed to spammers or careless workers.

4 Disagreement metrics

As with the semiotic triangle [13], there are three parts to understanding a linguistic expression: the sign, the thought or interpreter, and the referent. We instrument the crowdsourcing process in three analogous places: the micro-task, which for the relation extraction case is a sentence; the workers, who interpret each sentence; and the task semantics, which in the case of relation extraction is the intended meaning of the relations.

4.1 Sentence Metrics

Sentence metrics are intended to measure the quality of sentences for the relation extraction task. These measures are our primary concern: we want to provide the highest quality of training data to machine learning systems.

Sentence-relation score is the core Crowd Truth metric for relation extraction; it can be viewed as the probability that the sentence expresses the relation. It is measured for each relation on each sentence as the cosine of the unit vector for the relation with the sentence vector: srs(s, r) = cos(V_s, R_r). The relation score is used for training and evaluation of the relation extraction system. This is a fundamental shift from the traditional approach, in which sentences are simply labelled as expressing, or not expressing, the relation, and it presents new challenges for the evaluation metric and especially for training.

Sentence clarity is defined for each sentence as the maximum sentence-relation score for that sentence: scs(s) = max_r srs(s, r). If all the workers selected the same relation for a sentence, the maximum relation score will be 1, indicating a clear sentence. Sentence clarity is used to weight sentences in training and evaluation of the relation extraction system: if annotators have a hard time classifying a sentence, the machine should not be penalized as much for getting it wrong in evaluation, nor should it treat such training examples as exemplars.
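A minimal sketch of the two sentence metrics, assuming the vector helpers from the previous sketch (our own illustrative code, not the authors' implementation; whether NONE/OTHER take part in the maximum is also our assumption):

import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(np.dot(a, b) / denom)

def sentence_relation_score(v_s, relation):
    """srs(s, r) = cos(V_s, R_r)."""
    return cosine(v_s, relation_vector(relation))

def sentence_clarity(v_s):
    """scs(s) = max_r srs(s, r), maximized over all relation components."""
    return max(sentence_relation_score(v_s, r) for r in RELATIONS)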

4.2 Worker Metrics

Worker metrics serve primarily to establish worker quality; low quality workers and spammers should be eliminated as they contribute only noise to the disagreement scores, while high quality workers may get paid more as an incentive to return. We investigated several dimensions of worker quality for the relation extraction task:

Number of annotations per sentence is a worker metric indicating the average number of different relations per sentence used by a worker when annotating a set of sentences. Unambiguous sentences should ideally be annotated with one relation and, generally speaking, each worker interprets a sentence in their own way, but a worker who consistently annotates individual sentences with multiple relations usually does not understand the task.

Worker-worker agreement is the asymmetric pairwise agreement between two workers across all sentences they annotate in common:

wwa(w_i, w_j) = ( Σ_{s ∈ S_{i,j}} RelationsInCommon(w_i, w_j, s) ) / ( Σ_{s ∈ S_{i,j}} NumAnnotations(w_i, s) )

where S_{i,j} is the subset of all sentences S annotated by both workers w_i and w_j, RelationsInCommon(w_i, w_j, s) is the number of identical annotations (relations selected) on a sentence between the two workers, and NumAnnotations(w_i, s) is the number of annotations by a worker on a sentence.

Average worker-worker agreement is a worker metric based on the average worker-worker agreement between a worker and the rest of the workers, weighted by the number of sentences in common. While we intend to allow disagreement, it should vary by sentence. Workers who consistently disagree with other workers usually do not understand the task:

avg_wwa(w_i) = ( Σ_{j ≠ i} |S_{i,j}| · wwa(w_i, w_j) ) / ( Σ_{j ≠ i} |S_{i,j}| )

Worker-sentence similarity is the vector cosine similarity between the annotations of a worker and the aggregated annotations of the other workers on a sentence, reflecting how close the relation(s) chosen by the worker are to the opinion of the majority for that sentence. This is simply wss(w_i, s) = cos(V_s − V_{s,i}, V_{s,i}).

Worker-sentence disagreement is a measure of the quality of the annotations of a worker for a sentence. It is defined, for each sentence and worker, as the difference between the sentence clarity (q.v. above) for the sentence and the worker-sentence similarity for that sentence: wsd(w_i, s) = scs(s) − wss(w_i, s). Workers who differ drastically from the most popular choices will have large disagreement scores, while workers who agree with the most popular choice will score 0. The intuition for using the difference from the clarity score rather than the cosine similarity, as originally proposed in [1], is to capture worker quality on a sentence relative to the quality of the sentence itself. In uni-modal cases, e.g. where a sentence has one clear majority interpretation, the cosine similarity works well, but when a sentence has a bimodal distribution, e.g. multiple popular interpretations, the worker's cosine similarity will not be very high even for workers that agree with one of the two most popular interpretations, which seems less desirable.

Average worker-sentence disagreement is a worker metric based on the average worker-sentence disagreement score across all sentences:

avg_wsd(w_i) = ( Σ_{s ∈ S_i} wsd(w_i, s) ) / |S_i|

where S_i is the subset of all sentences annotated by worker w_i.

The worker-worker and worker-sentence scores are clearly similar, as they both measure deviation from the crowd, but they differ in emphasis. The wsd metric simply measures the average divergence of a worker from the crowd on a per-sentence basis; someone who tends to disagree with the majority will have a low score. For wwa, workers who may not always agree with the crowd on a per-sentence basis might be found to agree with a group of people that disagree with the crowd in a similar way, and would have a low score. This could reflect different cultural or educational perspectives, as opposed to simply a low quality worker.
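The worker metrics above can be sketched as follows. This is again illustrative Python under the same assumptions as the earlier sketches (it reuses cosine and sentence_clarity); the data layout, a dict mapping sentence ids to per-worker vectors, is our own choice.

import numpy as np

def wwa(annotations, wi, wj):
    """Worker-worker agreement of wi with wj over their common sentences.
    annotations: dict sentence_id -> dict worker_id -> binary relation vector."""
    common, num = 0.0, 0.0
    for s, by_worker in annotations.items():
        if wi in by_worker and wj in by_worker:
            # relations selected by both workers on this sentence
            common += float(np.sum(np.minimum(by_worker[wi], by_worker[wj])))
            num += float(np.sum(by_worker[wi]))
    return common / num if num else 0.0

def avg_wwa(annotations, wi, workers):
    """Average wwa of wi with all other workers, weighted by shared sentences."""
    num, den = 0.0, 0.0
    for wj in workers:
        if wj == wi:
            continue
        shared = sum(1 for by_w in annotations.values() if wi in by_w and wj in by_w)
        num += shared * wwa(annotations, wi, wj)
        den += shared
    return num / den if den else 0.0

def wsd(annotations, wi, s):
    """Worker-sentence disagreement: scs(s) - cos(V_s - V_{s,i}, V_{s,i})."""
    v_si = annotations[s][wi]
    v_s = np.sum(list(annotations[s].values()), axis=0)
    return sentence_clarity(v_s) - cosine(v_s - v_si, v_si)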

4.3 Relation Metrics

Relation clarity is defined for each relation as the maximum sentence-relation score for the relation over all sentences: rcs(r) = max_s srs(s, r). If a relation has a high clarity score, it means that it is at least possible to express the relation clearly. We find in our experiments that many relations that exist in structured sources are very difficult to express clearly in language, and are not frequently present in textual sources. Unclear relations may indicate unattainable learning tasks.

5 Explanation filters

In [1] we showed results of using the worker metrics to detect low quality workers. In order to evaluate our results, we had workers justify their answers. The explanations of the annotation tasks are not strictly necessary for the Crowd Truth data, and represent additional time and therefore cost to gather. In this section we analyze the value of this information.

We examined whether this additional effort dissuaded workers from completing the task. Two different implications need to be distinguished here: a positive one, driving away low quality workers or spammers, whose main objective is to maximize their economic reward with the minimum possible effort; and a negative one, as it may induce some good contributors to choose easier, less demanding tasks. To prevent the latter, it might be necessary to increase the economic reward to make up for the extra effort, so, in the end, adding explanations implies an increase in the task price. Finally, we want to test whether the explanations, apart from discouraging low quality workers from completing the task, contain information that is useful for detecting low quality workers.

Apart from the presence of explanations, another variable to take into account for spam detection is the channel of the workers. CrowdFlower has over 50 labor channel partners, i.e., external pools of workers such as Amazon Mechanical Turk and TrialPay, which can be used (individually or combined) to run crowdsourcing processes. Our intuition was that different channels have different spam control mechanisms, which may result in different spammer ratios depending on the channel. To explore these variables, we set up an experiment to annotate the same 35 sentences under different configurations:

1. Without explanations, using workers from multiple CrowdFlower channels
2. Without explanations, using workers from Amazon Mechanical Turk (AMT)
3. With explanations, using workers from multiple CrowdFlower channels
4. With explanations, using workers from AMT

Note that AMT was among the multiple channels used in configurations 1 and 3, but AMT workers were a minority there. By comparing the pairs formed by 1 and 2, and by 3 and 4, we can test whether the channel has any influence on the low quality worker ratio. Likewise, the pairs formed by 1 and 3, and by 2 and 4, can be used to test the influence of the explanations, independently of the channel used. We collected 18 judgments per sentence (for a total of 2522 judgments), and workers were allowed to annotate a maximum of 10 different sentences. The number of unique workers per batch ranged between 67 and 77.

In the results we observed that the time to run the task using multiple channels was significantly lower than when using only AMT, independently of whether the explanations were required or not. The time spent annotating a sentence of the batch was, on average, substantially lower when explanations were not required. The number of workers labelled as possible low quality workers by the disagreement filters was low, and remained more or less within the same range for the four batches (between 6 and 9 filtered workers per batch); so we cannot infer whether including explanations discourages low quality workers from taking part. However, manual exploration of the annotations revealed four patterns that may be indicative of possible spam behaviour:

1. No valid words (No Valid in Table 1) were used, either in the explanation or in the selected words, using instead random text or characters.
2. Using the same text for both the explanation and the selected words (Rep Resp in Table 1). According to the task definition, the two fields are mutually exclusive: either the explanation or the selected words that indicate the rationale of the decision are to be provided, so filling in both may be due to a poor understanding of the task definition. Moreover, the two fields have semantically different purposes, so it is unlikely that the same text is applicable to both.
3. Workers who repeated the same text (Rep Text in Table 1) for all their annotations, either justifying their choice using the exact same words or selecting the same words from the sentence.
4. [NONE] and [OTHER] used together with other relations (None/Other in Table 1). NONE and OTHER are intended to be exclusive: according to the task definition, by selecting them the annotator states that none of the other relations is applicable to the sentence. Hence, it is semantically incorrect to choose [NONE] or [OTHER] in combination with other relation(s), and doing so may reflect a poor understanding of the task definition.

The degree to which these patterns may indicate spam behaviour differs: in most cases, "No valid words" is a strong indicator of a low quality worker, while a bad use of [NONE] or [OTHER] may reflect a misunderstanding of the task (i.e., when one text box should be filled and when the other), rather than a bad worker.
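A minimal sketch of how such explanation patterns could be detected automatically (our own illustrative heuristics; the field names and thresholds are assumptions, not the authors' implementation):

import re

def no_valid_words(text):
    """Pattern 1: the explanation/selection contains no recognizable word."""
    return not re.search(r"[A-Za-z]{2,}", text or "")

def repeated_response(explanation, selected_words):
    """Pattern 2: the same text pasted in both mutually exclusive fields."""
    return bool(explanation) and \
        explanation.strip().lower() == (selected_words or "").strip().lower()

def repeated_text(texts):
    """Pattern 3: a worker reuses the same justification for all annotations."""
    cleaned = {t.strip().lower() for t in texts if t and t.strip()}
    return len(texts) > 1 and len(cleaned) == 1

def none_other_misuse(relations):
    """Pattern 4: NONE or OTHER combined with other relations."""
    chosen = set(relations)
    return bool(chosen & {"NONE", "OTHER"}) and len(chosen) > 1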

Channel    Disag. filters   Explanation filters: # Spam (% overlap w/ disagr. filters)      # Spam detected exclusively
           (# Spam)         No Valid    None/Other    Rep Resp    Rep Text                  by exp. filters
Multiple   9                7 (29%)     14 (29%)      3 (33%)     11 (36%)                  18
AMT        6                9 (22%)     2 (0%)        2 (50%)     1 (0%)                    11

Table 1. Results from 35 sentences with explanation-based filters

Table 1 gives an overview of the number of occurrences of each of the previous patterns in the batches with explanations. For each pattern, the percentage of those workers that are also identified as low quality by the disagreement filters is indicated. This percentage, together with the last column, which indicates the number of workers for which at least one of the low quality patterns was observed but who are not labelled as low quality by the disagreement filters, shows that there is little overlap between these patterns and what the disagreement filters consider low quality "behaviour". Therefore, it seems reasonable to further explore the use of these patterns as "explanation filters" for low quality workers. Also, the number of potential low quality workers according to the spam patterns seems larger when the task is run on multiple channels rather than only on AMT. This observation cannot be considered conclusive, but it seems reasonable to explore it further.

6 Experiments

We designed a series of experiments to gather evidence in support of our hypothesis that the disagreement filters may not be sufficient and that the explanations can be used to implement additional filters that improve spam detection.

6.1 Data

The data for the main experiments consist of two different sets of 90 sentences. The first set (Experiment 2 or EXP2) was annotated only by workers from Amazon Mechanical Turk (AMT), and the second (Experiment 3 or EXP3) by workers from multiple channels among those offered by CrowdFlower (including AMT, though the AMT workers were a minority). To optimize the time and worker dynamics we split the 90-sentence sets into batches of 30 sentences. The batches of the first set were run on three different days, and the batches of the second were all run on the same day. Workers were not allowed to annotate more than 10 sentences in the first set, and no more than 15 in the second. We collected 450 judgments (15 per sentence) in each batch (for a total of 1350 per set), from 143 unique workers in the first set and 144 in the second. From our previous experience, judgments from workers who annotated two or fewer sentences were uninformative, so we removed these, leaving 110 and 93 workers and a total of 1292 and 1302 judgments on each set.

We manually went through the data and identified low quality workers from their answers. 12 workers (out of 110) were identified as low quality workers for EXP2 and 20 (out of 93) for EXP3. While all the spammers in EXP2 were identified as such by the disagreement filters, only half of the low quality workers in EXP3 were detected. It is also important to notice that the number of annotations by workers identified as spammers is much higher for EXP3 (386 out of 1291, 30%) than for EXP2 (139 out of 1302, 11%).

6.2 Filtering low quality workers

In this section, we address our hypotheses by, first, describing the disagreement performance for EXP3 and showing how it is not sufficient by itself; and, second, showing how the explanation filters are informative and disjoint from the disagreement filters (they indicate something, and that "something" is different from what disagreement points to). A sense of how the different disagreement metrics perform in detecting low quality workers is given in Figures 2 and 3. Each metric is plotted against overall accuracy at different confidence thresholds, for each experiment. Clearly, the overall accuracy of the disagreement metrics is lower for EXP3.
While it is possible to achieve 100% accuracy for EXP2 by linearly combining the disagreement metrics, only 89% is achieved for EXP3 by this means.

Fig. 2. Accuracy of worker metrics for predicting low quality workers at all thresholds, for Experiment 2

Fig. 3. Accuracy of worker metrics for predicting low quality workers at all thresholds, for Experiment 3

In order to make up for this, we analyzed the explanation filters, exploring whether they provide information about possible spammer behaviour that is not already contained in the disagreement metrics. The explanation filters are not very effective by themselves: their recall is quite low (in all cases, below 0.6), and it is not substantially improved by combining them. Tables 2 and 3 present an overview of the workers identified as possible spammers by each filter, reflecting the intersections and differences between the disagreement filters and the explanation filters.

Note that we analyze the experiments both on a "job" basis and on an aggregate "experiment" basis. This shows that the jobs are more or less "homogeneous" (for instance, that one of them is not clearly biased in one particular batch, thereby biasing the aggregated experiment). However, for filtering purposes, we treat the experiments as atomic units.

Exp 2        Disag. filters   Explanation filters: # Spam (% overlap w/ disagr. filters)      # Spam detected exclusively
             (# Spam)         No Valid    None/Other    Rep Resp    Rep Text                  by exp. filters
Batch 2.1    8                5 (40%)     2 (100%)      4 (75%)     0                         4
Batch 2.2    6                3 (33%)     3 (66%)       0           0                         3
Batch 2.3    5                2 (0%)      4 (25%)       0           1 (0%)                    6
Total        12               10 (20%)    7 (43%)       4 (75%)     1 (0%)                    14

Table 2. Filters Overview - Experiment 2 (AMT)

It can be observed that the overlap (i.e., the number of workers identified as possible spammers by two different filters) between the disagreement filters and each of the explanation filters is not really significant.

Exp 3        Disag. filters   Explanation filters: # Spam (% overlap w/ disagr. filters)      # Spam detected exclusively
             (# Spam)         No Valid    None/Other    Rep Resp    Rep Text                  by exp. filters
Batch 3.1    4                6 (33%)     7 (57%)       5 (20%)     6 (17%)                   13
Batch 3.2    4                11 (18%)    0             7 (14%)     6 (33%)                   15
Batch 3.3    6                8 (37.5%)   0             5 (60%)     1 (0%)                    8
Total        10               22 (18%)    8 (37%)       14 (36%)    12 (42%)                  30

Table 3. Filters Overview - Experiment 3 (Multiple channels)

On the other hand, the number of workers identified as possible spammers exclusively by the explanation filters is quite large for EXP3. It is not only higher than for EXP2, but also higher in comparison with the number of workers filtered by the disagreement filters. This is coherent with the manual identification of spammers, which revealed 26 spammers.

7 Results and future work

By linearly combining the filters, we have obtained a classifier with 95% accuracy and F-measure 0.88, improving on the disagreement-only filtering (88% accuracy and F-measure 0.66) for EXP3. More data is needed to improve and rigorously validate this approach, but these initial results are already promising.

This linear combination of filters serves the purpose of complementing disagreement filters with explanation filters. In future work, we will further explore different ways of combining these filters to improve quality, such as bagging. In the current implementation, we have ignored the differences in the prediction power of the individual explanation filters, even though it can reasonably be assumed that they are not equally good indicators of spam behaviour. It is also worth considering a boosting approach to improve this.

Also, disagreement filters may be complemented by other kinds of information. For instance, for EXP3, the workers completing the task come from different channels. In future work, we will further explore whether worker provenance is significant for low quality detection. While sentence and worker metrics have proven to be informative, the available data is not sufficient to reach similar conclusions for the relation metrics, as the different relations are unevenly represented. We will try to collect more data in order to further explore these metrics.

8 Conclusions

We presented formalizations of sentence and worker metrics for Crowd Truth, and showed how the worker metrics can be used to detect low quality workers. We then introduced a set of explanation-based filters based on the workers' justification of their answers, and we ran experiments on various crowdsourcing "channels". The experiments seem to indicate that, in the presence of a small number of low quality annotations, disagreement filters are sufficient to preserve data quality. On the other hand, in the presence of a higher number of low quality annotations, the effectiveness of disagreement filters diminishes, and they are not enough to detect all the possible low quality contributions. We have shown how the explanations provided by the workers about their answers can be used to identify patterns that can reasonably be associated with spamming or low quality annotation behaviours. We used these patterns combined with the worker metrics to detect low quality workers with 95% accuracy in a small cross-validation experiment.

References

1. Lora Aroyo and Chris Welty. Crowd truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In Proc. WebSci 2013. ACM Press, 2013.
2. Olivier Bodenreider. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl 1):D267–D270, 2004.
3. Jacob Cohen et al. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960.
4. Alexander Philip Dawid and Allan M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28, 1979.
5. Anca Dumitrache, Lora Aroyo, Chris Welty, and Robert-Jan Sips. Dr. Detective: combining gamification techniques and crowdsourcing to create a gold standard for the medical domain. Technical report, VU University Amsterdam, 2013.
6. Carsten Eickhoff and Arjen P. de Vries. Increasing cheat robustness of crowdsourcing tasks. Inf. Retr., 16(2):121–137, April 2013.
7. David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. Building Watson: An overview of the DeepQA project. AI Magazine, 31:59–79, 2010.
8. Oana Inel, Lora Aroyo, Chris Welty, and Robert-Jan Sips. Exploiting Crowdsourcing Disagreement with Various Domain-Independent Quality Measures. Technical report, VU University Amsterdam, July 2013.
9. Manas Joglekar, Hector Garcia-Molina, and Aditya Parameswaran. Evaluating the crowd with confidence. 2012.
10. Aditya Kalyanpur, BK Boguraev, Siddharth Patwardhan, J. William Murdock, Adam Lally, Chris Welty, John M. Prager, Bonaventura Coppola, Achille Fokoue-Nkoutche, Lei Zhang, et al. Structured data and inference in DeepQA. IBM Journal of Research and Development, 56(3.4):10–1, 2012.
11. Hui Lin. Crowd Watson: Crowdsourced Text Annotations. Technical report, VU University Amsterdam, July 2013.
12. C. Marshall and F. Shipman. Experiences surveying the crowd: Reflections on methods, participation, and reliability. In Proc. WebSci 2013. ACM Press, 2013.
13. C. K. Ogden and I. A. Richards. The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism. 8th ed. 1923. Reprint New York: Harcourt Brace Jovanovich, 1923.
14. Vikas C. Raykar and Shipeng Yu. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J. Mach. Learn. Res., 13:491–518, March 2012.
15. Cristina Sarasua, Elena Simperl, and Natalya Fridman Noy. CrowdMap: Crowdsourcing ontology alignment with microtasks. In International Semantic Web Conference (1), pages 525–541, 2012.
16. Petros Venetis and Hector Garcia-Molina. Quality control for comparison microtasks. In Proceedings of the First International Workshop on Crowdsourcing and Data Mining, pages 15–21. ACM, 2012.

Crowdsourced Entity Markup

Lili Jiang, Yafang Wang, Johannes Hoffart, Gerhard Weikum

Max Planck Institute for Informatics, Saarbruecken, Germany
{ljiang,ywang,jhoffart,weikum}@mpi-inf.mpg.de

Abstract. Entities, such as people, places, products, etc., exist in knowledge bases and linked data, on one hand, and in web pages, news articles, and social media, on the other hand. Entity markup, like Named Entity Recognition and Disambiguation (NERD), is the essential means for adding semantic value to unstructured web contents, thereby enabling the linkage between unstructured and structured data and knowledge collections. A major challenge in this endeavor lies in the dynamics of the digital contents about the world, with new entities emerging all the time. In this paper, we propose a crowdsourced framework for NERD, specifically addressing the challenge of emerging entities in social media. Our approach combines NERD techniques with the detection of entity alias names and with co-reference resolution in texts. We propose a linking-game based crowdsourcing system for this combined task, and we report on experimental insights with this approach and on lessons learned.

Keywords: Named Entity Recognition and Disambiguation, Crowdsourcing

1 Introduction

Knowledge bases, linked data, and other semantic web assets are flourishing [9, 23, 12, 26] and contribute to improved search, analytics, and recommendation services. These assets contain many billions of facts about many millions of entities like people, places, companies, music bands, songs, diseases, drugs, proteins, etc. Additional value is created by entity-level links that span collections, via RDF triples with the owl:sameAs predicate [9, 11]. This way, different collections complement each other. For example, while one data source knows everything about the musicians of a song, another one contains data about the sales of the song's album, and yet another one knows about the use of the song in movies or cover versions by other artists. Jointly, this allows analyzing a musician's influence on the entertainment industry.

Structured data will hardly ever be complete, as there is always some detail not captured in RDF triples and the world is rapidly evolving anyway. Therefore, it is crucial to establish also entity-level links between unstructured sources like news articles or social media and the web of linked open data. Manually creating microdata embedded in HTML pages is one approach, but this will still leave many gaps. To fill these gaps, largely automated methods are needed, discovering names of entities in text, tables, or lists of surface web contents and mapping them to entities in linked-data collections. As names can have many different meanings, this entails the need for Named Entity Recognition and Disambiguation (NERD). Fully automatic NERD is inherently difficult and may also be computationally expensive (see, e.g., [18, 15, 10, 22, 13, 2]). NERD methods perform very well for prominent entities in high-quality texts like news articles, but they degrade in precision and recall when dealing with long-tail entities and difficult inputs like social media. Since advanced methods utilize machine learning or extensive statistics for semantic relatedness measures among entities, the availability of labeled training data is usually a big bottleneck. This is one of the issues where crowdsourcing [4] can help to improve NERD quality.

Even if we had perfect NERD methods, the cross-linkage between unstructured web contents and semantic data collections would still have big gaps. The reason is the dynamics of the world: new entities come into existence (e.g., songs, hurricanes, scandals) and unnoted entities suddenly gain importance (e.g., Edward Snowden, Adele two years ago). When facing such emerging entities, we cannot map them to a knowledge base (yet) as there are no RDF triples about them. However, we can capture their mentions under different names and try to gather equivalence classes of text phrases that refer to the same entity. This is known as the task of coreference resolution (CR) (see, e.g., [8, 20, 21, 24]). For example, we should discover the mentions "Edward Snowden", "NSA agent Snowden", and "the Prism whistleblower" and infer that they denote the same emerging entity, while also inferring that "actress Snowden" and "CEO Snowden" are separate entities. CR methods can also help to increase the recall of NERD for known entities, by capturing more surface phrases (e.g., [17, 19]). For example, the German football team FC Bayern Munich may be known and detectable as "Bayern Munich", "FC Bayern", or as "Germany's most successful football club", but the additional name "triple winner" makes sense only since the end of May 2013 (when the team won three major championships).
If, for a given text, we infer that "triple winner" and "UEFA champion 2013" are the same entity, we can map more text mentions onto entities, thus improving NERD recall at high precision. Systematically gathering alias names for entities is the problem of alias detection (AD). It has been studied in the literature, harnessing href anchor texts, click logs, and other assets (see, e.g., [14, 25]). However, doing this for emerging entities that are not yet registered in a knowledge base is a largely unexplored task.

The goal of this paper is to address the above problems in creating semantic markup for entities. Our approach is unique in that we address the three problems NERD, CR, and AD in a joint manner. Our methodology is crowdsourcing: asking people to annotate text snippets (e.g., tweets). While this approach may seem straightforward, it does come with technical challenges. First, we need to cast the problem into simple user interactions so that laymen can contribute with little effort. Second, we need to cope with highly varying quality of user contributions. Third, we need to optimize the benefit/cost ratio, by making judicious choices about which text snippets are shown to which people. This paper presents a first cut at these problems, including experimental studies.

The benefit of our crowdsourcing architecture is twofold: i) we create semantic markup in the form of co-reference between mentions, which can be directly used as input for methods that connect the web of unstructured contents with the web of linked data at the entity level, and ii) we lay the foundation to use this annotated content to improve automated methods for NERD, CR, and AD. In the future, by continuously running a low-cost crowdsourcing process on news or social media, we can periodically re-train and re-configure automated methods and adjust them to the dynamics of web contents.

2 Related Work

NERD methods [18, 15, 10, 22, 13, 2] aim to identify entity mentions in natural-language text and weakly structured web contents like HTML tables and lists, and to link the mentions to entities registered in a knowledge base or linked data source. Coreference resolution (CR) identifies mentions in text that refer to the same entity [8, 20, 21, 24], but without mapping them onto data or knowledge bases. Note that these tasks are fairly different from the database-oriented task of entity resolution, a.k.a. entity matching or record linkage [7], which is solely focused on structured records (with known schema) as input.

Crowdsourcing [4, 16, 5, 3] harnesses human input for tasks that are inherently difficult for computers, such as image tagging or language understanding. Approaches along these lines come in two major families: i) explicit crowdsourcing with HITs (human intelligence tasks) assigned to paid workers on platforms like Amazon Mechanical Turk (www.mturk.com) or CrowdFlower (crowdflower.com), and ii) implicit crowdsourcing where the task is piggybacked on human-computer interactions or cast in the form of a game. Crowdsourcing was used for the problem of entity resolution [27] on structured database records. Recall that this task is quite different from our problem of NERD and CR over text snippets. That work also compares a list-wise with a pair-wise style user interface. In contrast, we aim to compare user behavior under different user interfaces (i.e., pair-wise and linking-game based interfaces).

3 Overview of Methodology

We have developed a framework for combining NERD, CR, and AD. Figure 1 gives a pictorial overview. The emphasis in this paper is on crowdsourcing the task of CR, in the form of a linking game, and on harnessing the user feedback obtained this way to improve AD and NERD.

Fig. 1. Framework for NERD, CR, and AD.

In the following we briefly characterize the functionality of each component, and explain the dataflow between components.

Named Entity Recognition (NER). The input text is processed to discover mentions of named entities, that is, surface phrases that are likely to denote individual entities (as opposed to common noun phrases). Our implementation currently uses the Stanford NER Tagger [6] for this purpose (a trained CRF).

Crowdsourced Coreference Resolution (CR). All mentions in the same input text are highlighted and presented to human players, using a game-like interface. The participating users are asked to connect mentions that refer to the same entity. This way we obtain equivalence classes of mentions. Note that this does not perform any disambiguation yet: we still do not know which entity an equivalence class of mentions refers to, and in the case of newly emerging entities the proper entity may not be registered in our knowledge base anyway.

Alias Detection (AD). The CR step has the benefit of providing us with alias names for the same entity. Some of these names may already be present in our dictionary of entity aliases (e.g., "the US president's wife" for Michelle Obama), but others are new discoveries (e.g., "the First Lady of the White House"). If we can later, in the NED step, map the entire equivalence class of coreferences to an entity, we can easily add the new aliases to the dictionary. This way, we improve the AD task and increase the coverage of our dictionary.

Named Entity Disambiguation (NED). Finally, we attempt to map all mentions to canonicalized entities registered in a knowledge base. We use the YAGO knowledge base for this purpose (http://yago-knowledge.org), but can easily switch to other choices like DBpedia or Freebase. The actual NED computation is based on the AIDA method [10] and its open-source software (https://github.com/yago-naga/aida). AIDA combines context-similarity measures with coherence measures for the entities chosen for different mentions. We have further extended AIDA to become aware of the coreference equivalence classes obtained in the CR step. This extension is presented in Section 5.
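The dataflow between these components can be sketched roughly as follows. This is illustrative Python only; the function and method names (tag, collect_equivalence_classes, disambiguate, add_alias) are our own placeholders and not the actual interfaces of the Stanford tagger or AIDA.

def process_snippet(text, ner_tagger, linking_game, alias_dict, nerd_linker):
    """Hypothetical wiring of the NER -> crowdsourced CR -> NED -> AD pipeline."""
    # 1. NER: detect candidate mention spans in the snippet.
    mentions = ner_tagger.tag(text)

    # 2. Crowdsourced CR: players group mentions into equivalence classes.
    coref_classes = linking_game.collect_equivalence_classes(text, mentions)

    # 3. NED: map each equivalence class to a knowledge-base entity (or None).
    linked = {}
    for cluster in coref_classes:
        entity = nerd_linker.disambiguate(text, cluster)
        linked[frozenset(cluster)] = entity

        # 4. AD: newly observed surface forms become aliases of the entity.
        if entity is not None:
            for mention in cluster:
                alias_dict.add_alias(entity, mention)
    return linked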

4 Crowdsourced Coreference Resolution

4.1 Mention Linking Game

We created a crowdsourcing interface that allows humans to highlight coreferenced mentions in a text snippet in a light-weight manner. To minimize the burden on humans and as an additional incentive, we developed a game-like interface inspired by the "Linking Game"1, in which players earn points by finding identical icons in an image. This in turn is reminiscent of the well-known Concentration Game, also known as Memory, just with all cards already open.

Figure 2 shows a sequence of three screenshots of our mention linking game. Players are asked to mark up all co-referent mentions for a given set of mentions highlighted in the text. The user receives hints about which mentions may possibly be equivalent, using simple heuristics for automated CR. All mentions are then presented as green blocks for markup by the user. When the user selects blocks, they are turned red. Once the user clicks on "Yes" to confirm that they are coreferences, these blocks are removed from the user's view. When players are very certain about one selection, they can select the same equivalent mention pair multiple times. This gives us an implicit way of estimating the confidence of a user's input.

1 http://www.appszoom.com/android_games/sports_games/cute-puppys-link-game_bsddz.html

Fig. 2. Linking-Game Interface

Fig. 3. Pair-wise User Interface

To compare the effectiveness of the linking-game based interface against more traditional crowdsourcing interfaces, we also designed a UI for judging each pair of mentions separately, as shown in Figure 3. A pair of mentions is presented, and the player has to make one of three choices: Yes, No, or Skip.

4.2 Quality Control

For assessing the quality of the players, we prepared a set of gold-standard texts for which we identified the correct equivalence classes of mentions. These gold-standard texts are occasionally presented as linking-game tasks, and a user’s performance on these is a first-cut estimate for the confidence in the user’s markup.
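A minimal sketch of such gold-based confidence estimation, together with the weighted voting used later to aggregate contributions (our own illustrative code; the 0-1 confidence scale, the prior for unseen workers, and tie-breaking are assumptions):

from collections import defaultdict

def worker_confidence(gold_answers, worker_answers):
    """Fraction of gold-standard linking tasks the worker solved correctly."""
    judged = [t for t in gold_answers if t in worker_answers]
    if not judged:
        return 0.5  # assumed prior for workers with no gold items yet
    correct = sum(1 for t in judged if worker_answers[t] == gold_answers[t])
    return correct / len(judged)

def weighted_vote(judgments, confidences):
    """judgments: worker_id -> label; confidences: worker_id -> weight in [0, 1]."""
    scores = defaultdict(float)
    for worker, label in judgments.items():
        scores[label] += confidences.get(worker, 0.5)
    return max(scores, key=scores.get)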

4.3 Feedback for Automated Coreference Resolution

High-confidence annotations obtained from the game are chosen as the crowdsourced results of CR. These results are directly used to enhance named entity disambiguation, as described in the following section. Additionally, high quality annotations can be used as training data. The samples will help to better learn feature weights, where features could be alias matching, abbreviation/acronym matching, string similarity, position relative to the two mentions of interest, part-of-speech tags, etc. Details of this enhancement and its performance are beyond the scope of this paper.

5 Combining NERD, CR, and AD

We used the AIDA tool [10] as a basis for our crowdsourcing-enhanced NERD method. AIDA works in four steps. First, it uses the Stanford NER Tagger to identify mentions in the input text. Second, it generates candidate entities by looking up the surface names in the dictionary and retrieving the associated entities from the knowledge base. Third, it builds a graph that connects mention nodes with candidate entity nodes by edges that are weighted with context-similarity scores, and connects pairs of candidate entity nodes by edges that are weighted with semantic coherence scores. Fourth and last, AIDA runs an algorithm for computing a dense subgraph whose entity nodes yield the desired disambiguation. Figure 4, upper part, shows an example graph with these two types of edges. The graph contains a third kind of edges, connecting pairs of mention nodes. These are actually added by our crowdsourced-CR process, as explained in Section 5.

Fig. 4. Example graph for combined NED and CR

In the example, "Michelle" is a highly ambiguous mention, which is difficult to map to the proper entity. Here, the crowdsourced CR yields valuable input by linking this mention with the other two mentions "president's wife" and "First Lady", thus easing the task of NED. Note that some of the mentions marked up in the NER step may not be in the dictionary; so usually no candidate entities would be generated for a mention such as "president's wife". Through the CR markup from the crowdsourcing phase, we can transfer the candidate entities from other mentions, "First Lady" and "Michelle", to this newly recognized phrase. We actually choose the entity that has the highest weight among all the candidates in the same CR equivalence class for all the mentions. Finally, note that one mention in the example text, "Malia Obama", is not linkable to the knowledge base at all, as there is no suitable entity there. Our enhancements of AIDA work by extending the mapping graph. For every set of mentions m1, m2, ..., mk that were combined into one equivalence class by the crowdsourced CR, we proceed as follows:

– Case 1: All of m1, m2, ..., mk have matches in the dictionary. In this case, we generate all respective candidate entities by lookups in the knowledge base, and then choose the highest weighted entity among all candidate sets, retaining only this entity for all mentions in the CR equivalence class.
– Case 2: The set M = {m1, m2, ..., mk} contains some mentions that do not have any matches in the alias dictionary, say a subset N ⊂ M. In this case, we determine the entity for the potentially linkable mentions, the subset L = M \ N, according to Case 1 and then add it to all mentions in N. In addition, we insert the mentions in N as new alias names for the retained candidate entities into the alias dictionary, thus enhancing the AD component of our framework.
– Case 3: None of the mentions in M = {m1, m2, ..., mk} has any match in the dictionary, so they are all non-linkable. In this case, we drop these mentions from the NED graph. However, we do insert this set of mentions into the alias dictionary as alias names for an unknown entity. This can pay off later, for a new input text, if that text has a CR equivalence class that includes both a name associated with a known entity and an alias from M. This way, we potentially improve both AD and NERD in the long run.

The lower part of Figure 4 shows the graph that results from these steps. After these graph-enhancement steps, all mention-mention edges are removed. The resulting graph can be directly fed into the AIDA tool for the actual NED computation.
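The three cases can be sketched in code roughly as follows. This is a simplified illustration under our own assumptions about the data structures (a dict-based alias dictionary, a candidate lookup function, and an entity scoring function); it is not the actual AIDA extension.

UNKNOWN = None  # placeholder for an entity not (yet) in the knowledge base

def enhance_equivalence_class(mentions, alias_dict, candidate_entities, score):
    """mentions: one CR equivalence class; alias_dict: surface form -> set of entities;
    candidate_entities(m): KB lookups for mention m; score(e): entity weight."""
    linkable = [m for m in mentions if m in alias_dict]          # L = M \ N
    unlinkable = [m for m in mentions if m not in alias_dict]    # N

    if linkable:  # Cases 1 and 2: at least one mention has dictionary matches
        candidates = {e for m in linkable for e in candidate_entities(m)}
        best = max(candidates, key=score)
        for m in unlinkable:              # Case 2: grow the alias dictionary
            alias_dict.setdefault(m, set()).add(best)
        return {m: best for m in mentions}   # retain only the best entity

    # Case 3: no mention is linkable; record aliases for an unknown entity
    for m in mentions:
        alias_dict.setdefault(m, set()).add(UNKNOWN)
    return {m: UNKNOWN for m in mentions}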

6 Experimental Results

6.1 Experimental Setup

In our preliminary studies reported here, we focus on two types of entities from tweets: persons and locations. We used lists of 50 US states and 50 celebrities from the prior work of [1] (http://www.iba.t.u-tokyo.ac.jp/~danushka/data/aliasdata.zip). Each entity comes with a small number of alias names. For example, Michael Jordan (the basketball player) has alias names "Air Jordan", "His Airness", and "MJ", and Whoopi Goldberg is also known as "Da Whoop" and "Caryn Elaine Johnson". We further extended this dataset in three ways. First, we included additional persons (all US presidents) and locations (a set of large cities around the world) as concerned entities. This led to a total of 93 person entities and 150 location entities. Second, we gathered tweets from Twitter (twitter.com) by generating queries with the entity names and their alias names. Third, we added tweets from the UK election 2009. We selected 140 tweets for the crowdsourcing experiment, and 100 tweets for the NED evaluation. The number of mentions was counted using a liberal NER method, combining the Stanford Tagger [6] and a dictionary-based matcher for entity names and aliases. Our complete experimental data is available at http://www.mpi-inf.mpg.de/yago-naga/aida/download/iswc-crowdsem2013.zip.

6.2 CR Performance

A total of 14 university students participated in our crowdsourcing experiment, 7 playing the linking game and 7 using the pair-wise UI. For evaluation, we manually annotated 140 tweets. We aggregated the human contributions for the same tweet by weighted voting, where weights reflect the confidence in a user (which in turn is based on how well the user performed for the occasional gold-standard inputs, see Section 4.2). We compared the two crowdsourcing settings against a fully automated heuristic algorithm for CR, based on the following simple rules:

– When two mentions exactly match aliases for the same entity in our dictionary, the algorithm connects them into a CR equivalence class.
– When two mentions have high string similarity above a threshold, the algorithm connects them.
– When the text between two mentions contains a strong pattern such as "also known as", "called", "referred to", etc., the algorithm connects them.
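A minimal sketch of such a rule-based CR heuristic, assuming a toy alias dictionary and a simple character-based similarity (illustrative only; the threshold value and pattern list are our own choices):

from difflib import SequenceMatcher

CONNECTOR_PATTERNS = ["also known as", "called", "referred to"]

def same_entity_rule_based(m1, m2, between_text, alias_dict, threshold=0.85):
    """Return True if the heuristic rules connect mentions m1 and m2."""
    # Rule 1: both mentions are known aliases of the same dictionary entity.
    e1 = alias_dict.get(m1.lower(), set())
    e2 = alias_dict.get(m2.lower(), set())
    if e1 & e2:
        return True
    # Rule 2: high surface string similarity.
    if SequenceMatcher(None, m1.lower(), m2.lower()).ratio() >= threshold:
        return True
    # Rule 3: a strong connector pattern appears between the two mentions.
    return any(p in between_text.lower() for p in CONNECTOR_PATTERNS)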

The results in terms of precision, recall, and F1 scores are shown in Table 1. We observe that the Linking-Game-based crowdsourcing clearly outperformed the pair-wise annotator UI. This is due to the vastly increased number of decisions necessary for pair-wise annotators, which increases the risk of making mistakes. The game-based crowdsourced CR also won against the rule-based algorithm by a large margin, in terms of F1 scores. However, the experiment also revealed trade-offs: the automatic algorithm did much better in terms of recall, but was much inferior to the crowdsourced CR in terms of precision.

Mention Type    Linking Game            Pair-wise UI            Algorithm
                Prec.  Rec.   F1        Prec.  Rec.   F1        Prec.  Rec.   F1
person          0.85   0.70   0.77      0.52   0.80   0.63      0.53   1.0    0.69
location        0.98   0.81   0.88      0.61   0.54   0.57      0.58   1.0    0.73
overall         0.92   0.76   0.83      0.56   0.67   0.60      0.55   0.99   0.71

Table 1. Linking-Game vs. Pair-wise-UI vs. Algorithm Results for CR

6.3 NED Performance

We manually mapped the mentions in 100 tweets onto proper entities as ground truth for experiments on NED performance. We compared three methods: the standard AIDA method, our enhancement using crowdsourced CR annotations (see Section 5), and an analogous enhancement of AIDA by CR annotations obtained from the rule-based heuristic algorithm (see CR experiments above). The results are shown in Figure 5. The results clearly show that the combined CR+NED approach (AIDA+alg_cr) achieves much better performance than the state-of-the-art NED method (AIDA) alone. When comparing the influence of crowdsourced CR vs. algorithmic CR, we see mixed results: neither of these two methods dominates the other. However, in terms of overall F1 score across all mentions, the crowdsourcing-enhanced method (AIDA+crowd_cr) is the overall winner.

[Figure 5 plots (a) precision, (b) recall, and (c) F1 measure over PER, LOC, and ALL mentions for AIDA+crowd_cr, AIDA+alg_cr, and AIDA.]

Fig. 5. NED Performance Comparison

7 Lessons Learned

This paper presented a new approach to combining NED (Named Entity Disambiguation), CR (Coreference Resolution), and AD (Alias Detection) with crowdsourcing-based CR. Our experiments are a first proof of concept that this direction is worth pursuing further at larger scale. The Linking-Game-based interface turned out to yield better results than a more traditional annotator UI. This is an encouragement towards intensifying and extending this game-based approach.

As for the overall improvement that CR contributes to NED performance, our experiments, albeit still small-scale, clearly indicate that CR annotations are very beneficial for NED. Moreover, they also contribute to maintaining the alias name dictionary and thus to handling emerging entities. As for crowdsourced vs. algorithmic CR (see Table 1), the situation is less clear, though. The crowdsourcing approach has both higher precision and recall; however, it still has weaknesses when text snippets are very demanding. For example, consider the tweet: "The Rich are Running from California. The once Golden State is trying to bail itself out by going after the rich." Realizing that "California" and "Golden State" denote the same entity was beyond what our crowdsourcing users could do, so our approach failed on this sample. Co-occurrence statistics for mentions, mined from Web and text corpora, could overcome this weakness. This calls for a new hybrid between crowdsourced and algorithmic methods.

8 Acknowledgements

This work is supported by the 7th Framework IST programme of the European Union through the focused research project (STREP) on Longitudinal Analytics of Web Archive data (LAWA) under contract no. 258105.

References

1. D. Bollegala, T. Honma, Y. Matsuo, and M. Ishizuka. Automatically extracting personal name aliases from the web. In Proceedings of the 6th International Conference on Advances in Natural Language Processing, GoTAL '08, pages 77–88, 2008.
2. M. Cornolti, P. Ferragina, and M. Ciaramita. A framework for benchmarking entity-annotation systems. In WWW, pages 249–260, 2013.
3. G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In WWW, pages 469–478, 2012.
4. A. Doan, R. Ramakrishnan, and A. Y. Halevy. Crowdsourcing systems on the world-wide web. Commun. ACM, 54(4):86–96, 2011.
5. T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau, and M. Dredze. Annotating named entities in twitter data with crowdsourcing. In CSLDAMT, pages 80–88, 2010.
6. J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In ACL '05, pages 363–370, 2005.
7. L. Getoor and A. Machanavajjhala. Entity resolution: theory, practice & open challenges. Proc. VLDB Endow., 5(12):2018–2019, Aug. 2012.
8. A. Haghighi and D. Klein. Simple coreference resolution with rich syntactic and semantic features. In EMNLP, pages 1152–1161, 2009.
9. T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers, 2011.
10. J. Hoffart, M. A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum. Robust disambiguation of named entities in text. In Proceedings of EMNLP, EMNLP '11, pages 782–792, 2011.
11. A. Hogan, J. Umbrich, A. Harth, R. Cyganiak, A. Polleres, and S. Decker. An empirical survey of linked data conformance. J. Web Sem., 14:14–44, 2012.
12. E. H. Hovy, R. Navigli, and S. P. Ponzetto. Special Issue on Artificial Intelligence, Wikipedia and Semi-Structured Resources, volume 194. 2013.
13. R. Isele and C. Bizer. Learning expressive linkage rules using genetic programming. PVLDB, 5(11), 2012.
14. L. Jiang, J. Wang, P. Luo, N. An, and M. Wang. Towards alias detection without string similarity: an active learning based approach. In SIGIR, pages 1155–1156, 2012.
15. S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of Wikipedia entities in web text. In KDD, pages 457–466, 2009.
16. E. Law and L. von Ahn. Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2011.
17. T. Lin, Mausam, and O. Etzioni. No noun phrase left behind: Detecting and typing unlinkable entities. In EMNLP-CoNLL, pages 893–903, 2012.
18. D. N. Milne and I. H. Witten. Learning to link with Wikipedia. In CIKM, pages 509–518, 2008.
19. N. Nakashole, T. Tylenda, and G. Weikum. Fine-grained semantic typing of emerging entities. In ACL, page to appear, 2013.
20. A. Rahman and V. Ng. Coreference resolution with world knowledge. In ACL, pages 814–824, 2011.
21. L.-A. Ratinov and D. Roth. Learning-based multi-sieve co-reference resolution with knowledge. In EMNLP-CoNLL, pages 1234–1244, 2012.
22. L.-A. Ratinov, D. Roth, D. Downey, and M. Anderson. Local and global algorithms for disambiguation to Wikipedia. In ACL, pages 1375–1384, 2011.
23. N. Shadbolt, K. O'Hara, T. Berners-Lee, N. Gibbins, H. Glaser, W. Hall, and m. c. schraefel. Linked open government data: Lessons from data.gov.uk. IEEE Intelligent Systems, 27(3):16–24, 2012.
24. S. Singh, A. Subramanya, F. C. N. Pereira, and A. McCallum. Large-scale cross-document coreference using distributed inference and hierarchical models. In ACL, pages 793–803, 2011.
25. V. I. Spitkovsky and A. X. Chang. A cross-lingual dictionary for English Wikipedia concepts. In LREC, pages 3168–3175, 2012.
26. F. M. Suchanek and G. Weikum. Knowledge harvesting in the big-data era. In SIGMOD Conference, pages 933–938, 2013.
27. J. Wang, T. Kraska, M. J. Franklin, and J. Feng. CrowdER: crowdsourcing entity resolution. Proc. VLDB Endowment, 5(11):1483–1494, July 2012.

Frame Semantics Annotation Made Easy with DBpedia

Marco Fossati, Sara Tonelli, and Claudio Giuliano
{fossati, satonelli, giuliano}@fbk.eu
Fondazione Bruno Kessler - via Sommarive, 18 - 38123 Trento, Italy

Abstract. Crowdsourcing techniques applied to natural language processing have recently experienced a steady growth and represent a cheap and fast, albeit valid, solution to create benchmarks and training data. Nevertheless, some particularly complex tasks such as semantic role annotation have rarely been conducted in a crowdsourcing environment, due to their intrinsic difficulty. In this paper, we present a novel approach to accomplish this task by leveraging information automatically extracted from DBpedia. We show that replacing role definitions, typically meant for expert annotators, with a list of DBpedia types makes the identification and assignment of role labels more intuitive also for non-expert workers. Results prove that such a strategy improves on the standard annotation workflow, both in terms of accuracy and of time consumption.

Keywords: Natural Language Processing, Frame Semantics, En- tity Linking, DBpedia, Crowdsourcing, Task Modeling

1 Introduction

Frame semantics [6] is one of the theories that originate from the long strand of linguistic research in artificial intelligence. A frame can be informally defined as an event triggered by some term in a text and embedding a set of participants. For instance, the sentence Goofy has murdered Mickey Mouse evokes the Killing frame (triggered by murdered) together with the Killer and Victim participants (respectively Goofy and Mickey Mouse). This theory has led to the creation of FrameNet [2], namely an English lexical database containing manually annotated textual examples of frame usage.

Annotating frame information is a complex task, usually modeled in two steps: given a sentence, annotators are first asked to choose the frame activated by a predicate (or lexical unit, LU, e.g. murdered in the example above, evoking Killing). Second, they assign the semantic roles (or frame elements, FEs) that describe the participants involved in the chosen frame. In this work, we focus on the second step, namely FE recognition.

Currently, FrameNet development follows a strict protocol for data annotation and quality control. The entire procedure is known to be both time-consuming and costly, thus representing a burden for the extension of the resource [1]. Furthermore, deep linguistic knowledge is needed to tackle this annotation task, and the resource developed so far would not have come to light without the contribution of skilled linguists and lexicographers. On one hand, the task complexity depends on the inherently complex theory behind frame semantics, with a repository of thousands of roles available for the assignment. On the other hand, these roles are defined for expert annotators, and their descriptions are often obscure to common readers. We report three examples below:

– Support: Support is a fact that lends epistemic support to a claim, or that provides a reason for a course of action. Typically it is expressed as an External Argument. (Evidence frame)
– Protagonist: A person or self-directed entity whose actions may potentially change the mind of the Cognizer. (Influence of Event on Cognizer frame)
– Locale: A stable bounded area. It is typically the designation of the nouns of Locale-derived frames. (Locale by Use frame)

Since we aim at investigating whether such an activity can be cast to a crowd of non-expert contributors, we need to reduce its complexity by intervening on the FE descriptions. In particular, we want to assess to what extent more information on the role semantics, coming from external knowledge sources such as DBpedia1, can improve non-expert annotators' performance. We leverage the CrowdFlower platform,2 which serves as a bridge to a plethora of crowdsourcing services, the most popular being Amazon's Mechanical Turk (AMT).3 We claim that providing annotators with information on the semantic types typically associated with FEs will enable faster and cheaper annotations, while maintaining an equivalent accuracy. The additional information is extracted in a completely automatic way, and the workflow we present can potentially be applied to any crowdsourced annotation task in which semantic typing is relevant.

1 http://dbpedia.org
2 https://crowdflower.com
3 https://www.mturk.com

2 Related Work

The construction of annotation datasets for natural language processing tasks via non-expert contributors has been approached in different ways, the most prominent being games with a purpose (GWAP) and micro-tasks. While the former technique leverages fun as the motivation for attracting participants, the latter mainly relies on a monetary reward. The effects of such factors on a contributor's behavior have been analyzed in the motivation theory literature, but are beyond the scope of this paper. The reader may refer to [10] for an overview focusing on AMT.

Games with a Purpose. Verbosity [17] was one of the first attempts at gathering annotations with a GWAP. Phrase Detectives [5,4] was meant to harvest a corpus with coreference resolution annotations. The game included a validation mode, where participants could assess the quality of previous contributions. A data unit, namely a resolved coreference for a given entity, is judged complete only if the agreement is unanimous. Disagreement between experts and
Nevertheless, the focus is narrowed to the frame discrimination task, namely selecting the correct frame evoked by a given LU. This task is comparable to the word sense disambiguation one as per [15], although the difficulty seems augmented, due to lower inter-annotator agreement values. The authors experienced issues that are related to our work with respect to the quality check mechanism in CrowdFlower, as well as the complexity of the frame names and definitions. Outsourcing the task to the CrowdFlower platform has two major drawbacks: (a) the proprietary nature of the aggregated inter-annotator agreement value provided in the response data, and (b) the need to manually simplify FE definitions that generated high disagreement. In this respect, our previous work [7] was the first attempt to address item (b) by manually simplifying the way FEs are described. In this work, we further investigate this aspect by exploiting automatically extracted links to DBpedia.

3 Annotation Workflow

Our goal is to determine whether crowdsourced annotation of semantic roles can be improved by providing non-expert annotators with information from DBpedia on the roles they are supposed to label. Specifically, instead of displaying the lexicographic definition for each possible role to be labeled, annotators are shown a set of semantic types associated with each role coming from FrameNet. Based on this, annotators should better recognize such roles in an unseen sentence. Evaluation is performed by comparing this annotation framework with a baseline, where standard FE definitions substitute the DBpedia information. Before performing the annotation task, we need to build the list of semantic types that best characterizes each FE in a frame. We extract these statistics by connecting the FrameNet database 1.5 [14] to DBpedia, after isolating a set of sentences to be used as test data (cf. Section 4). The workflow to prepare the input for the crowdsourced task is based on the following steps.

Linking to Wikipedia. For each annotated sentence in the FrameNet database, we first link each textual span labeled as FE to a Wikipedia page W. We employ The Wiki Machine, a kernel-based linking system (details on the implementation are reported in [16]), which was trained on the Wikipedia dump of March 2010.4 Since FEs can be expressed by both common nouns and real-world entities, we needed a linking system that satisfactorily processes both nominal types. A comparison with the state-of-the-art system Wikipedia Miner [12] on the ACE05-WIKI dataset [3] showed that The Wiki Machine achieved a suitable performance on both types (.76 F1 on real-world entities and .63 on common nouns), while Wikipedia Miner had a poorer performance on the second noun type (respectively .76 and .40 F1). These results were also confirmed in a more recent evaluation [11], in which The Wiki Machine achieved the highest F1 compared with an ensemble of academic and commercial systems, such as DBpedia Spotlight, Zemanta, Open Calais, Alchemy API, and Ontos. The system applies an 'all word' linking strategy, in that it tries to connect each word (or multiword) in a given sentence to a Wikipedia page. In case a linked textual span (partially) matches a string corresponding to an FE, we assume that one possible sense of the FE is represented in Wikipedia through W. The Wiki Machine also assigns a confidence score to each linked term.
This confidence is higher when the words occurring in the context of the linked term show high similarity, because the system considers the linking to be more likely accurate in that case. We illustrate in Figure 1 the Wikipedia pages (and confidence scores) that The Wiki Machine associates with the sentence Sardar Patel was assisting Gandhiji in the Salt Satyagraha with great wisdom, an example sentence for the Assistance frame originally annotated with four FEs, namely Helper, Benefited party, Goal and Manner. Since Wikipedia is a repository of concepts, which are usually expressed by nouns, we are able to link only nominal fillers.

Linking to DBpedia. In order to obtain the semantic types that are typical for each FE, linking to Wikipedia is not enough. In fact, too many different pages would be connected to an FE, making it difficult to generalize over the Wikipedia pages (i.e. concepts). This also emerges from the example above, where the pages linked to Sardar Patel, Gandhiji and Salt Satyagraha do not provide information on the typical fillers of Helper, Benefited party and Goal respectively. One possible option could be to resort to Wikipedia categories, which however are not homogeneous enough to allow for a consistent extraction of FE semantic types. We tackle this problem by using Wikipedia pages as a bridge to DBpedia. In fact, Wikipedia page URLs directly map to DBpedia resource URIs. Hence, for each linked FE, we query DBpedia for rdf:type objects. In this way, we are able to compute statistics on the most frequent semantic types associated with a given FE from a given frame.

4 http://download.wikimedia.org/enwiki/20100312

Fig. 1: Linking example with confidence scores. The FEs Helper ([Sardar Patel]), Benefited_party ([Gandhiji]), Goal ([Salt Satyagraha]) and Manner ([with great wisdom]) are linked to the Wikipedia pages Vallabhbhai_Patel (154.51), Mohandas_Karamchand_Gandhi (139.16), Salt_Satyagraha (197.54) and Wisdom (186.30).

For instance, the FE Victim from the Killing frame has a top DBpedia type Animal with a frequency of 38. We aim at investigating whether such top-occurring types represent both valid generalizations and simplifications of a standard FE definition, and may thus substitute it. At the end of this pre-processing step, we create a repository where, for each FE, a set of DBpedia types is listed and ranked by frequency.

Posting the Annotation Task on CrowdFlower. We finally set up a crowdsourced experiment where, in each test sentence, annotators have to choose the most appropriate FE given the most frequent DBpedia types (proper task) or the standard FE definition (baseline). Details are reported in the following section.

4 Experiments

We first provide an overview of critical aspects underpinning a generic crowdsourced experiment. Subsequently, we describe the anatomy and the modeling of the tasks we outsourced to the CrowdFlower platform. Input data, full results, interface code and screenshots are available at http://db.tt/iogsU7RI.

Golden Data. Quality control of the collected judgments is a key factor for the success of the experiments. The essential drawback of crowdsourcing services lies in the risk of cheating. Workers are generally paid a few cents for tasks which may only need a single click to be completed. Hence, it is highly probable to collect data coming from random choices that can heavily pollute the results. The issue is resolved by adding gold units, namely data for which the requester already knows the answer. If a worker misses too many gold answers within a given threshold, he or she will be flagged as untrusted and his or her judgments will be automatically discarded.

Worker Switching Effect. Depending on their accuracy in providing answers to gold units, workers may switch from a trusted to an untrusted status and vice versa. In practice, a worker submits his or her responses via a web page. Each page contains one gold unit and a variable number of regular units that can be set by the requester during the calibration phase. If a worker becomes untrusted, the platform collects another judgment to fill the gap. If a worker moves back to the trusted status, his or her previous contribution is added to the results as free extra judgments. This phenomenon typically occurs when the complexity of gold units is high enough to induce low agreement in workers' answers. Thus, the requester is constrained to review gold units and, where appropriate, forgive workers who missed them. This has not been a blocking issue in our experiments, since we assessed a relatively low average percentage of missed judgments for gold units, namely 28%.

Cost Calibration. The total cost of a crowdsourced task is naturally bound to a data unit. This represents an issue in our experiments, as the number of questions per unit (i.e. a sentence) varies according to the number of frames and FEs evoked by the LU contained in a sentence. Therefore, we need to use the average number of questions per sentence as a multiplier to a constant cost per sentence. We set the payment per working page to 3 US cents and the number of sentences per page to 3. Since most of the sentences in our annotation task have 3 FEs, the average cost per FE comes to 0.325 US cents, i.e. 3 cents divided by 3 sentences per page and by an average of 3.07 FEs per sentence (see Table 2 below).
Pre-processing of FrameNet Data for DBpedia Types Extraction. Table 1 provides some statistics on the processed FrameNet data that were leveraged to extract DBpedia types (cf. Section 3). More specifically:

1. From the FrameNet 1.5 database, The Wiki Machine managed to link 77% of the total number of FE instances. Unlinked data is skipped in the next step.
2. DBpedia provided type information for 42% of the total number of linked FE instances. Types occurring only once are ignored, as they reflect the content of a single sentence and are likely to convey misleading suggestions. The overly generic owl:Thing type is filtered out as well.

Table 1: FrameNet data processing details

Workflow step               FE instances
Raw FrameNet                148440
Linking to Wikipedia        114242
DBpedia types extraction     47732
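To make this pre-processing step concrete, the following minimal Python sketch (an illustration, not the authors' implementation) counts and ranks DBpedia types per frame element once FE fillers have been linked. It assumes the SPARQLWrapper library and the public DBpedia SPARQL endpoint; linked_fes is a hypothetical mapping from (frame, FE) pairs to lists of linked DBpedia resources, standing in for the output of The Wiki Machine. The filtering mirrors the two steps above: types occurring once and owl:Thing are dropped.

    from collections import Counter
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = SPARQLWrapper("http://dbpedia.org/sparql")

    def dbpedia_types(resource_uri):
        """Return the rdf:type objects of a DBpedia resource."""
        ENDPOINT.setQuery("SELECT ?type WHERE { <%s> a ?type }" % resource_uri)
        ENDPOINT.setReturnFormat(JSON)
        results = ENDPOINT.query().convert()
        return [b["type"]["value"] for b in results["results"]["bindings"]]

    def rank_types(linked_fes, top_n=5):
        """For each (frame, FE) pair, rank DBpedia types by frequency,
        dropping owl:Thing and types that occur only once."""
        ranked = {}
        for (frame, fe), resources in linked_fes.items():
            counts = Counter(t for r in resources for t in dbpedia_types(r))
            counts.pop("http://www.w3.org/2002/07/owl#Thing", None)
            ranked[(frame, fe)] = [(t, c) for t, c in counts.most_common(top_n) if c > 1]
        return ranked

    # Hypothetical input: FE fillers already linked to DBpedia resources.
    linked = {("Killing", "Victim"): ["http://dbpedia.org/resource/Deer",
                                      "http://dbpedia.org/resource/Rabbit"]}
    print(rank_types(linked))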

Test Data Preparation. Before linking the FrameNet database to DBpedia, we isolate a subset to be used as test data. From 500 randomly chosen sentences, we select those in which the number of FEs per frame is between 3 and 4. This small dataset serves as input for our experiments. Table 2 details the final settings. We hand-pick six sentences and for each of them we mark one question as gold for quality check. Almost all sentences contain three FEs with few exceptions (cf. the average value in Table 2). We extract the five most frequent DBpedia types from the statistics and assign them to the corresponding FEs in our input. Since not all FEs have exactly five associated types (cf. the average value in Table 2), we provide workers with variable suggestion sets. Finally, we ensure all workers are native English speakers.

Table 2: Experimental settings

Sentences                          43
Gold                                6
Frames                             24
Lexical Units                      41
Average FEs per sentence         3.07
Average cost per FE ($ cents)    .325
Average DBpedia types per FE     4.66
Workers nationality     United States

Modeling. Data units are delivered to workers via a web interface. Our task is illustrated in Figure 2 and is presented as follows:

(a) Workers are invited to read a sentence and to focus on the bolded word appearing as a title above the sentence (e.g. taste in the screenshot).
(b) A question concerning each FE is then shown together with a set of answers corresponding to the sentence chunks that may express the given FE. For instance, in Figure 2, the question Which is the Perceiver Passive? is coupled with multiple choices taken from the given sentence.
(c) For each question, a suggestion box displays the top types retrieved from DBpedia and connected to the given FE (cf. Section 3 for details). This should help annotators in choosing the text chunk that best fits the given FE.
(d) Finally, workers match each question with the proper text chunk.

On the other hand, the baseline differs from our strategy in that (i) it does not display the suggestion box and (ii) questions are replaced with the FE definition extracted from FrameNet. For instance, in Figure 2, the question about the Perceiver Passive would be replaced with This FE is the being who has a perceptual experience, not necessarily on purpose. The baseline is more compliant with the standard approach adopted to annotate FEs in the FrameNet project.

5 Results

Our main purpose is to evaluate the validity of the proposed approach against the conventional FrameNet annotation procedure. We leverage expert-annotated sentences and are thus able to directly measure workers' accuracy. Specifically, we compute two values:

– Majority vote. An answer is considered correct only if the majority of judgments are correct.
– Absolute. The total number of correct judgments divided by the total number of collected judgments.

Fig. 2: Worker interface unit screenshot
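The two measures defined above can be illustrated with a short sketch (not the authors' evaluation code); judgments maps each question to the list of collected answers and gold to the expert answer, both hypothetical structures.

    from collections import Counter

    def absolute_accuracy(judgments, gold):
        """Fraction of all collected judgments that match the gold answer."""
        correct = sum(ans == gold[q] for q, answers in judgments.items() for ans in answers)
        total = sum(len(answers) for answers in judgments.values())
        return correct / total

    def majority_vote_accuracy(judgments, gold):
        """Fraction of questions whose most frequent answer matches the gold answer."""
        hits = 0
        for q, answers in judgments.items():
            majority_answer, _ = Counter(answers).most_common(1)[0]
            hits += (majority_answer == gold[q])
        return hits / len(judgments)

    judgments = {"q1": ["Helper", "Helper", "Goal"], "q2": ["Manner", "Goal", "Goal"]}
    gold = {"q1": "Helper", "q2": "Goal"}
    print(majority_vote_accuracy(judgments, gold), absolute_accuracy(judgments, gold))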

The results of our experiments are detailed in Table 3. The number of untrusted judgments may be considered as a shallow indicator of the overall task complexity. In fact, we tried to maximize objectivity and simplicity when choosing gold units. Moreover, the input dataset (and gold units as well) is identical in both experiments. Therefore, we can infer that the number of workers who missed gold is directly influenced by the question model, which is the only variable parameter. We compute the execution time as the interval between the first and the last judged unit.

Table 3: Overview of the experimental results

Measure                   Baseline   DBpedia
Majority vote accuracy    .763       .803
Absolute accuracy         .646       .720
Untrusted judgments       90         82
Time (minutes)            160        106

Our approach outperformed the baseline both in terms of accuracy and time. While the majority vote accuracy values differ slightly, absolute accuracy clearly favors our strategy. This measure can be seen as a further indicator of the task complexity. A higher score implies a higher number of correct judgments, which may indicate better inter-worker agreement, and thus a more straightforward task. This claim is supported not only by the moderate decrease in untrusted judgments, but also by the dramatic reduction of the execution time. Consequently, the results we obtained demonstrate that entity linking techniques combined with DBpedia types simplify FE annotation.

6 Discussion and Conclusions

In this work, we present a novel approach to annotate frame elements in a crowdsourcing environment using information extracted from DBpedia. The task is simplified for non-expert annotators by replacing FE definitions, usually meant for linguistic experts, with semantic types obtained from DBpedia. This is accomplished without manual simplification, in a completely automatic fashion.

The results show that this method improves on the standard annotation workflow, both in terms of accuracy and of time consumption. Although the interconnection between FEs and DBpedia is semantically not perfect, extracting frequency statistics from the whole FrameNet database and considering only the most frequently occurring DBpedia types make the procedure quite robust to wrong links. Possible issues may arise when two or more frame elements in the same frame share the same semantic type. For instance, the Goal and Place FEs in the Arriving frame are both likely to be filled by elements describing a location. We also expect that our approach is less accurate with FEs that can be filled both by nouns and by verbs, for instance the Activity FE in the Activity finish frame. In such cases, information extracted from DBpedia would probably be inconsistent. Besides, DBpedia statistics are reliable when several annotated sentences are available for a frame, while they may be misleading if extracted from few instances. We plan to investigate these issues and to explore possible solutions to cope with data sparseness. Additional future work will involve the following aspects:

– Evaluation of an ad-hoc strategy for the extraction of semantic types, namely providing workers with suggestions by matching information that is dynamically derived from each given sentence with DBpedia types.
– Clustering of similar semantic types with respect to the meaning they convey and to their frequency, e.g. Place and Location Underspecified.

Finally, the overall effectiveness of our approach depends both on the performance of the entity linking system and on the coverage of the knowledge base. Hence, long-term research will focus on enhancing The Wiki Machine's precision and recall, and on extending DBpedia type coverage.

References

1. Baker, C.F.: FrameNet, current collaborations and future goals. Language Resources and Evaluation pp. 1–18 (2012)
2. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet Project. In: Proceedings of the 17th International Conference on Computational Linguistics, Volume 1. pp. 86–90. Association for Computational Linguistics (1998)
3. Bentivogli, L., Forner, P., Giuliano, C., Marchetti, A., Pianta, E., Tymoshenko, K.: Extending English ACE 2005 Corpus Annotation with Ground-truth Links to Wikipedia. In: 23rd International Conference on Computational Linguistics. pp. 19–26 (2010)
4. Chamberlain, J., Kruschwitz, U., Poesio, M.: Constructing an anaphorically annotated corpus with non-experts: Assessing the quality of collaborative annotations. In: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources. pp. 57–62. Association for Computational Linguistics (2009)
5. Chamberlain, J., Poesio, M., Kruschwitz, U.: Phrase Detectives: A web-based collaborative annotation game. In: Proceedings of I-Semantics, Graz (2008)
6. Fillmore, C.: Frame semantics. Linguistics in the Morning Calm pp. 111–137 (1982)

7. Fossati, M., Giuliano, C., Tonelli, S.: Outsourcing FrameNet to the Crowd. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. pp. 742–747. Sofia, Bulgaria (August 2013), http://www.aclweb.org/anthology/P13-2130
8. Guerini, M., Strapparava, C., Stock, O.: Ecological Evaluation of Persuasive Messages Using Google AdWords. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. ACL 2012 (July 2012)
9. Hong, J., Baker, C.F.: How Good is the Crowd at "real" WSD? In: Proceedings of the 5th Linguistic Annotation Workshop. pp. 30–37 (2011)
10. Kaufmann, N., Schulze, T., Veit, D.: More than fun and money. Worker motivation in crowdsourcing – A study on Mechanical Turk. In: Proceedings of the Seventeenth Americas Conference on Information Systems, Detroit, MI (2011)
11. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems. pp. 1–8. I-Semantics '11, ACM, New York, NY, USA (2011)
12. Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management. pp. 509–518. ACM, New York, NY, USA (2008)
13. Negri, M., Bentivogli, L., Mehdad, Y., Giampiccolo, D., Marchetti, A.: Divide and conquer: Crowdsourcing the creation of cross-lingual textual entailment corpora. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 670–679. EMNLP '11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011)
14. Ruppenhofer, J., Ellsworth, M., Petruck, M.R., Johnson, C.R., Scheffczyk, J.: FrameNet II: Extended Theory and Practice. Available at http://framenet.icsi.berkeley.edu/book/book.html (2006)
15. Snow, R., O'Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 254–263. Association for Computational Linguistics (2008)
16. Tonelli, S., Giuliano, C., Tymoshenko, K.: Wikipedia-based WSD for multilingual frame annotation. Artificial Intelligence 194, 203–221 (2013)
17. Von Ahn, L., Kedia, M., Blum, M.: Verbosity: A game for collecting common-sense facts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 75–78. ACM (2006)

Developing Crowdsourced Ontology Engineering Tasks: An iterative process

Jonathan M. Mortensen, Mark A. Musen, Natalya F. Noy

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford CA 94305, USA

Abstract. It is increasingly evident that the realization of the Semantic Web will require not only computation, but also human contribution. Crowdsourcing is becoming a popular method to inject this human element. Researchers have shown how crowdsourcing can contribute to managing semantic data. One particular area that requires significant human curation is ontology engineering. Verifying large and complex ontologies is a challenging and expensive task. Recently, we have demonstrated that online, crowdsourced workers can assist with ontology verification. Specifically, in our work we sought to answer the following driving questions: (1) Is crowdsourcing ontology verification feasible? (2) What is the optimal formulation of the verification task? (3) How does this crowdsourcing method perform in an application? In this work, we summarize the experiments we developed to answer these questions and the results of each experiment. Through iterative task design, we found that workers could reach an accuracy of 88% when verifying SNOMED CT. We then discuss the practical knowledge we have gained from these experiments. This work shows the potential that crowdsourcing has to offer other ontology engineering tasks and provides a template one might follow when developing such methods.

1 Background

Research communities have begun using crowdsourcing to assist with managing the massive scale of data we have today. Indeed, certain tasks are better solved by humans than by computers. In the life sciences, Zooniverse, a platform wherein citizen scientists contribute to large-scale studies, asks users to perform tasks such as classifying millions of galaxies or identifying cancer cells in an image [8]. In related work, Von Ahn and colleagues developed games with a purpose, a type of crowdsourcing where participants play a game, and as a result help complete some meaningful task. For example, in Fold.it, gamers assist with folding a protein, a computationally challenging task [4]. Further demonstrating the power of the crowd, Bernstein et al. developed a system that uses the crowd to quickly and accurately edit documents [1]. With crowdsourcing's popularity rising, many developer resources are now available, such as Amazon's Mechanical Turk, Crowdflower, oDesk, Houdini, etc. Finally, as evidenced by this workshop, CrowdSem, the Semantic Web community is beginning to leverage crowdsourcing. Systems such as CrowdMap, OntoGame, and ZenCrowd demonstrate how crowdsourcing can contribute to the Semantic Web [11, 2, 10]. Crowdsourcing enables the completion of tasks at a massive scale that cannot be done computationally or by a single human.

One area amenable to crowdsourcing is ontology engineering. Ontologies are complex, large, and traditionally require human curation, making their development an ideal candidate task for crowdsourcing. In our previous work, we developed a method for crowdsourcing ontology verification. Specifically, we sought to answer the following driving questions:

(1) Is crowdsourcing ontology verification feasible? (2) What is the optimal formulation of the verification task? (3) How does this crowdsourcing method perform in an application?

In this work, we briefly highlight each of the experiments we developed to answer our questions, and, with their results in mind, then discuss how one might approach designing crowdsourcing tasks for the Semantic Web. In previous work, we have published papers that explore each driving question in depth. The main contribution of this work is a unified framework that presents all of the experiments. This framework will enable us to reflect on current work and to ask new questions for crowdsourcing ontology engineering.

2 Ontology Verification Task

We have begun to reduce portions of ontology engineering into microtasks that can be solved through crowdsourcing. We devised a microtask method of ontology verification based on a study by Evermann and Fang [3] wherein participants answer computer-generated questions about ontology axioms. A participant verifies whether a sentence about two concepts that are in a parent-child relationship is correct or incorrect. For example, the following question is a hierarchy-verification microtask for an ontology that contains the classes Heart and Organ: Is every Heart an Organ? A worker then answers the question with a binary response of "Yes" or "No." This task is particularly useful in verifying ontologies because the class hierarchy is the main type of relationship found in many ontologies. For example, of 296 public ontologies in the BioPortal repository, 54% contained only SubClassOf relationships between classes. In 68% of ontologies, SubClassOf relationships accounted for more than 80% of all relationships. Thus, verifying how well the class hierarchy corresponds to the domain will enable the verification of a large fraction of the relations in ontologies.
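A minimal sketch of how such hierarchy-verification questions could be generated from SubClassOf pairs (hypothetical data and phrasing, not the tooling used in the experiments): correct statements come from asserted parent-child pairs, and incorrect ones from deliberately mismatched pairs, mirroring the setup described later for CARO and WordNet.

    import random

    def make_question(child, parent):
        """Render a hierarchy-verification microtask as a yes/no question."""
        return f"Is every {child} a kind of {parent}?"

    def build_tasks(subclass_pairs, seed=0):
        """Produce correct questions from asserted pairs and incorrect ones
        by re-pairing each child with a parent from a different branch."""
        rng = random.Random(seed)
        tasks = []
        parents = [p for _, p in subclass_pairs]
        for child, parent in subclass_pairs:
            tasks.append((make_question(child, parent), "Yes"))
            wrong = rng.choice([p for p in parents if p != parent])
            tasks.append((make_question(child, wrong), "No"))
        return tasks

    pairs = [("Heart", "Organ"), ("Neuron", "Cell"), ("Femur", "Bone")]
    for question, answer in build_tasks(pairs):
        print(question, "->", answer)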

3 Protocol & Experimental Design

We developed various experiments that use the hierarchy-verification task to answer our driving questions. Generally, each of these experiments follows the same basic procedure. First, we selected the ontology and axioms to verify. Next, we created a hierarchy-verification task formatted as HTML from these entities and submitted the task to Amazon Mechanical Turk. Finally, we obtained worker responses, removed spam, and compared the remaining responses to a gold standard using some analysis metric. Thus, in each experiment we used a standard set of basic components outlined in Table 1. Typically, we manipulated one of these components in each experiment. Figure 1 presents an example task as it appears to a worker on Amazon Mechanical Turk.

Table 1. Dimensions in which our crowdsourcing experiments vary.

4 Experiments

To answer the driving questions, we performed a series of experiments using the basic protocol. We began with the most basic question about feasibility of the method. Having shown that, we tested various parameters in order to optimize the method. Finally, we used the optimal method in verifying SNOMED CT. Table 2 summarizes these experiments and their parameters. In the following, we describe the specifics of each experiment and our conclusions for each.

4.1 Is crowdsourcing ontology verification feasible? [7]

In this first driving question, we wished to understand if it were possible for Amazon Mechanical Turk workers (turkers) to perform on par with other groups also performing the hierarchy-verification task.

Figure 1. Example task that a Mechanical Turk worker sees in a browser. In this example, workers are provided concept definitions and a Subsumption relation from the Common Anatomy Reference Ontology to verify.

Table 2. Experiments we performed in developing a method to crowdsource ontology verification.

Experiment 1: Students and the Crowd

Methods We determined whether turkers could recapitulate results from a study by Evermann and Fang [3]. In this study, after completing a training session, 32 students performed the hierarchy-verification task with 28 statements from the Bunge-Wand-Weber ontology (BWW) and 28 statements from the Suggested Upper Merged Ontology (SUMO), where half of the statements were true and half false in each set. As an incentive to perform well, students were offered a reward for the best performance. Knowing the results of that experiment, we asked turkers to verify the same statements. As in the initial study, we required turkers to complete a 12-question training qualification test. We asked for 32 turkers to answer each 28-question set and paid $0.10/set. Furthermore, we offered a bonus for good performance. After turkers completed the tasks, we removed spam responses from workers who responded with more than 23 identical answers. Finally, we compared the performance of the students with that of the turkers using a paired t-test.
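The spam rule described above can be sketched as follows (a hypothetical response structure, not the scripts used in the study): a worker's 28 answers are discarded if more than 23 of them are identical.

    from collections import Counter

    def is_spam(answers, threshold=23):
        """Flag a response set in which one answer dominates (e.g. all 'Yes')."""
        most_common_count = Counter(answers).most_common(1)[0][1]
        return most_common_count > threshold

    responses = {"workerA": ["Yes"] * 26 + ["No"] * 2,
                 "workerB": ["Yes", "No"] * 14}
    kept = {w: a for w, a in responses.items() if not is_spam(a)}
    print(sorted(kept))   # ['workerB']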

Results In both experiments, the average accuracy of the students was 3–4% higher than the accuracy of the turkers. However, the difference was not statistically significant.

Conclusion Turkers recapitulated the previous results and performed on par with students on the hierarchy-verification task.

Experiment 2: Verifying the Common Anatomy Reference Ontology (CARO)

Methods Verifying a domain ontology was the second component in showing the feasibility of our verification method. For this verification task, we used CARO, a well-curated biomedical ontology. We selected 14 parent-child relations from the ontology as correct relations. As with WordNet (cf. Experiment 3), we paired children with parents that were not in the same hierarchy to simulate incorrect relations. We then asked workers to verify these relations following the earlier experimental setup. In this situation, we had no qualification test. As a comparison, we asked experts on the obo-anatomy and National Center for Biomedical Ontology mailing lists to perform the same verification. Finally, we measured worker and expert performance, and compared the groups using a t-test.

Results With the proper task design of context and qualifications (addressed later), turkers performed 5% less accurately than experts, but the difference was not statistically significant.

Conclusions Workers performed nearly as well as experts in verifying a domain ontology. These results are quite encouraging. In addition, the results of this experiment led us to hypothesize that worker performance significantly depends on the task formulation. We address this next.

4.2 What is the optimal formulation of the hierarchy verification task? [5]

With the feasibility of crowdsourcing ontology verification established, we focused on formulating the task in an optimal fashion. There were four main parameters that we hypothesized would affect the method's performance: Ontology Type (i.e., the domain of the ontology being verified), Question Formulation (i.e., how should we ask a worker to verify a relationship?), Worker Qualification (i.e., how does the accuracy of a worker vary based on certain qualifications?), and Context (i.e., what information should be provided to assist a worker in answering the question?).

Experiment 3: WordNet and Upper Ontologies

Methods Having shown that turkers perform similarly to students and domain experts, we then analyzed how turker performance varied based on ontology selection. To do so, we compared worker performance in verifying BWW and SUMO to that in verifying WordNet. We created a set of WordNet statements to verify by extracting parent-child relationships in WordNet and also generating incorrect statements from incorrectly paired concepts (i.e., pairing concepts in parent-child relationships that are not actually hierarchically related). We then asked workers to verify the 28 WordNet, SUMO and BWW statements following the same setup as the first experiment (including the training qualification), paying workers $0.10/set, giving a bonus, and removing spam.

Results Echoing the first experiment, workers performed only slightly better than random on BWW and SUMO. However, workers had an average accuracy of 89% when verifying WordNet statements. There was a clear difference between worker performance on the upper ontologies and on WordNet.

Conclusion While workers struggle with verifying conceptually difficult relationships, such as those contained in upper-level ontologies, they perform reasonably well on tasks related to common-sense knowledge.

Experiment 4: Question Formulation

Methods We repeated the task of verifying the 28 WordNet statements but varied the polarity and mood of the verification question we asked the workers. In this case, we did not require qualifications as with the earlier experiments. Table 3 shows the six different question styles through examples.

Results Worker performance varied from 77% on negatively phrased statements to 91% with the positive, indicative mood (i.e., a True/False statement asserting the relationship). In addition, workers responded faster to positively phrased questions.

Table 3. Example question types we presented to users on Mechanical Turk.

Conclusion Generally for crowdsourcing, one should create tasks in as cognitively simple a format as possible. In this situation, asking the verification question as simply as possible (i.e., Dog is a kind of Mammal. True or False?) yielded the best accuracy and the fastest responses.

Experiment 5: Worker Qualification

Methods Having determined the optimal way to ask the verification question, we theorized that workers who could pass a domain-specific qualification test would perform better than a random worker on tasks related to that domain. We developed a 12-question, high-school-level biology qualification test. For turkers to access our tasks, they would have to pass this test. We assume that the ability to pass this test serves as a reasonable predictor of biology domain knowledge. We asked workers to complete the CARO verification (Experiment 2), but required them to first pass the qualification test, answering at least 50% of it correctly.

Results With qualifications, turkers improved their accuracy to 67% (from random without qualifications) when verifying CARO.

Conclusion When crowdsourcing, some method of selecting workers with expertise in the task domain is necessary to achieve reasonable performance. However, such low accuracy was still not satisfactory.

Experiment 6: Task Context

Methods Given the performance gains from proper question formulation and qualification requirements, we next proposed that concept definitions would assist workers in verifying a relation. In this experiment, we used CARO because the ontology has a complete set of definitions. We repeated Experiment 2, with qualifications and simply stated verification questions, varying whether workers and experts were shown definitions.

Results With definitions, workers performed with an average accuracy of 82%. Experts performed with an average accuracy of 86%. So, when providing workers and experts with definitions, there was no statistically significant difference.

Conclusion In crowdsourcing, context is essential, especially for non-domain experts. While workers might not have very specific domain knowledge, with proper context or training, they can complete the task. This experiment revealed that in some situations, a properly designed microtask can indeed provide results on par with experts.

4.3 How does this crowdsourcing method perform on an application? [6]

The previous experiments were all synthetic – turkers only found errors that we introduced. With the optimal task formulation in hand, we shifted our focus to a true ontology verification task of verifying a portion of SNOMED CT. We selected SNOMED CT because it is a heavily studied, large and complex ontology, making it an ideal candidate for our work.

Experiment 7: Verifying SNOMED CT

Methods In 2011, Alan Rector and colleagues identified entailed SubClass axioms that were in error [9]. In our final experiment, we evaluated whether our method could recapitulate their findings. To do so, we asked workers to perform the hierarchy-verification task on these 7 relations along with 7 related relations we already knew were correct. We used the optimal task formulation we determined in the earlier experiments and provided definitions from the Unified Medical Language System. In addition, we posted the task with 4 different qualification tests: biology, medicine, ontology, and none. Note that instead of asking workers to complete the task of verifying all 14 relations in one go, as in the earlier experiments, we broke the task up into smaller units, creating one task per axiom, and paid unqualified and qualified workers $0.02 and $0.03 per verification, respectively. We then compared workers' average performance to their aggregate performance (i.e., when we combined all workers' responses into one final response through majority voting [6]).
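As an illustration of the aggregation step (a sketch under the assumption of simple majority voting over binary verifications; not the authors' exact pipeline), per-axiom worker responses can be reduced to a single crowd answer as follows. The axiom labels are hypothetical placeholders.

    from collections import Counter

    def aggregate(responses):
        """Combine all worker responses for one axiom into a single crowd
        answer by majority vote over 'Yes'/'No' verifications."""
        return Counter(responses).most_common(1)[0][0]

    # Hypothetical responses for two SNOMED CT subclass axioms.
    axiom_responses = {
        "axiom_1 (known error)":   ["No", "No", "Yes", "No", "No"],
        "axiom_2 (known correct)": ["Yes", "Yes", "No", "Yes", "Yes"],
    }
    crowd_answers = {axiom: aggregate(r) for axiom, r in axiom_responses.items()}
    print(crowd_answers)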

Results The aggregate worker response was 88% accurate in differentiating correct versus incorrect SNOMED CT relations. On average, any single worker performed 17% less accurately than the aggregate response. Furthermore, there was no significant difference in performance for tasks with differing qualification tests.

Conclusion Individually, workers did not perform well in identifying errors in SNOMED CT. However, as a group, they performed quite well. The stark difference between average worker performance and aggregate performance reinforces the fact that the power of the crowd lies in the combined response, not in any worker alone.

5 Discussion

Each of the experiments we performed highlighted various lessons we learned in developing a method for crowdsourcing ontology verification. A few lessons are particularly useful for the Semantic Web community. First, many of our experiments focused on changing small components of the task. Even so, through this process we greatly improved crowd worker performance. It is clear that each task will be unique, but in most cases, extensive controlled trials will assist in identifying the best way to crowdsource a task. Following this strategy, we verified a complex ontology with relatively high accuracy. In addition, our current results only serve as a baseline – through additional iteration, we expect the increases in accuracy to continue.

Second, using the refined tasks, we showed that crowd workers, in aggregate, can perform on par with experts on domain-specific tasks when provided with simple tasks and the proper context. The addition of context was the single biggest factor in improving performance. In the Semantic Web, a trove of structured data is available, all of which may provide such needed context (and perhaps other elements, such as qualification tests). For example, when using the crowd to classify instance-level data, the class hierarchy, definitions, or other instance examples may all assist the crowd in their task.

Our results suggest that crowdsourcing might serve as a method to improve other ontology engineering tasks such as typing instances, adding definitions, creating ontology mappings and even ontology development itself. In fact, Sarasua and colleagues used crowdsourcing to improve automated ontology mapping methods [10]. ZenCrowd follows a similar paradigm, using crowdsourcing to improve machine-extracted links [2]. Indeed, crowdsourcing can serve as a human-curated step in ontology engineering that acts in concert with automated methods (e.g., terminology induction supplemented with the crowd).

5.1 Future Work

The results thus far serve only as a baseline for crowdsourcing an ontology engineering task. We plan to focus research on other elements of the crowdsourcing pipeline, including entity selection (e.g., selecting the axioms for verification that are most likely to be in error), generating context (e.g., how can we use the crowd to also supply context for workers downstream), and optimizing performance (e.g., developing aggregation strategies that maximize worker performance while minimizing task cost). We will also consider different incentive models, including reputation and altruism, like the successful Zooniverse platform [8]. Finally, we will investigate how to integrate this method into a true ontology engineering workflow with the Protege ontology development platform.

6 Conclusion

Crowdsourcing is now another tool for the Semantic Web researcher and developer. In this work, we described various experiments we performed to refine a methodology to crowdsource ontology verification. In summary, we arrived at a highly accurate method through iterative, controlled development of the crowdsourcing task. In doing so, we gained valuable knowledge about method design for crowdsourcing. For example, providing task context is key to enabling accurate crowd workers. Finally, our results suggest that crowdsourcing can indeed contribute to ontology engineering.

Acknowledgements

This work is supported in part by Grant GM086587 from the National Institute of General Medical Sciences and by The National Center for Biomedical Ontology, supported by grant HG004028 from the National Human Genome Research Institute and the National Institutes of Health Common Fund. JMM is supported by National Library of Medicine Informatics Training Grant LM007033.

References

1. Bernstein, M., Little, G., Miller, R., Hartmann, B., Ackerman, M., Karger, D., Crowell, D., Panovich, K.: Soylent: A word processor with a crowd inside. In: Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. pp. 313–322. ACM (2010)
2. Demartini, G., Difallah, D.E., Cudré-Mauroux, P.: ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: 21st World Wide Web Conference (WWW 2012). pp. 469–478. Lyon, France (2012)
3. Evermann, J., Fang, J.: Evaluating ontologies: Towards a cognitive measure of quality. Information Systems 35, 391–403 (2010)
4. Khatib, F., DiMaio, F., Cooper, S., Kazmierczyk, M., Gilski, M., Krzywda, S., Zabranska, H., Pichova, I., Thompson, J., Popović, Z., Jaskolski, M., Baker, D.: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nat Struct Mol Biol 18(10), 1175–1177 (Oct 2011)
5. Mortensen, J.M., Alexander, P.R., Musen, M.A., Noy, N.F.: Crowdsourcing Ontology Verification. In: International Conference on Biomedical Ontologies (2013)
6. Mortensen, J.M., Musen, M.A., Noy, N.F.: Crowdsourcing the Verification of Relationships in Biomedical Ontologies. In: AMIA Annual Symposium. Accepted (2013)
7. Noy, N.F., Mortensen, J.M., Alexander, P.R., Musen, M.A.: Mechanical Turk as an Ontology Engineer? Using Microtasks as a Component of an Ontology Engineering Workflow. In: Web Science (2013)
8. Raddick, M.J., Bracey, G., Gay, P.L., Lintott, C.J., Cardamone, C., Murray, P., Schawinski, K., Szalay, A.S., Vandenberg, J.: Galaxy Zoo: Motivations of Citizen Scientists. p. 41 (Mar 2013)
9. Rector, A.L., Brandt, S., Schneider, T.: Getting the foot out of the pelvis: Modeling problems affecting use of SNOMED CT hierarchies in practical applications. Journal of the American Medical Informatics Association 18(4), 432–440 (Apr 2011)
10. Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing Ontology Alignment with Microtasks. In: 11th International Semantic Web Conference (ISWC). Springer, Boston, MA (2012)
11. Siorpaes, K., Hepp, M.: Games with a Purpose for the Semantic Web. IEEE Intelligent Systems 23(3), 50–60 (May 2008)

Information Reputation

Peter Davis1 and Salman Haq2

1 Distinguished Engineer, Neustar, 21575 Ridgetop Circle, Sterling, Virginia, USA [email protected], 2 Research Architect, Neustar, 21575 Ridgetop Circle, Sterling, VA, USA [email protected]

Abstract. In this paper we describe the design of a reputation framework for an information management system under active development. The integration of a reputation framework with an IMS is a novel combination that can produce a distinctly more effective business intelligence tool.

1 Introduction

Neustar is a data analytics and intelligence services company that operates several large database systems. To efficiently manage these numerous, disparate systems, we are developing an Information Management System (IMS) that maps technical data models using a standard set of ontologies. The IMS is an online community for employees where they can share, classify and discover metadata about various Neustar data sources. Its main purpose is to assist users in achieving two main objectives: a) reducing costs by utilizing existing information and b) increasing revenues by creating new information [3]. With these objectives in mind, users must have the ability to make value judgments about data sources relative to one another. Such data sources may number in the hundreds and the datum contained therein may number in the tens of thousands. The majority of these entities will lack significant value for data science, and those that are valuable will risk being lost in a deluge of information. Therefore it is imperative that the system establish a bias towards meaningful datum by highlighting interestingness. A well-crafted reputation framework can excel at doing exactly this.

2 Terminology

Glossary

claim One or more assertions made of a datum.
data source A computer system that stores data, such as a database or file system.
data steward An individual or group of individuals holding domain-specific knowledge of an information system.
datum An instance of metadata mapped to an atomic data field. This includes, for example, columns in a relational database or entities defined in an XML schema.
interestingness A scalar value indicating the suitability for inclusion in further analysis.
reputation A qualitative measure that informs a value judgment about a datum or user.

Acronyms

IMS Information Management System.

3 Framework Description

The framework is comprised of several reputation models, each of which computes one or more scores for a resource type. A fixed set of claims serves as input to each model, which assigns numerical values to them and passes them through a series of mathematical filter processes. Models are distinguished by their input selection, process configuration, and output scores. The IMS utilizes a fixed ontology to define claims that include appropriate business and technical classifications for data within the subject systems. The essential claims of the datum model are summarized in Table 1. The IMS also incorporates techniques to simplify crowdsourcing the classification of datum by data stewards. However, we have concluded from early usage that a simple classification process is insufficient. Classifications can be subjective, and classification sparseness results in underutilization of the system. As a consequence, methods to encourage accurate and complete classification will be implemented to enrich the overall efficacy of the system.

Table 1. Essential Claims for the Datum reputation model

Name        Description
classified  Data steward classified datum from the business domain ontology
described   Data steward entered a description
discussed   User participated in a discussion topic about datum
emailed     User emailed the link to the datum page to another user
flagged     User informed the data steward about insufficient or inaccurate details
watched     User will be notified of future updates by other users
wanted      User requested access to the datum from the data steward

4 Interestingness Reputation Model for Datum

In the IMS, a datum is an atomic unit of data. Its classification results in queryable metadata, and it can relate to a column in a relational database or an element, attribute or phrase in a document. At the time of writing, the system had over 7,000 fields from merely four data sources. Even at this watermark, the task of finding interesting datum is impractical for any user community. As more data sources are imported into the system, this task will become impossible even if the datum population grows sub-linearly. Therefore it is imperative that the system is capable of identifying and highlighting interesting datum to facilitate user objectives.

In Figure 1 we describe the simplified model for calculating the interestingness reputation score for datum. Our approach is informed by [1], which applies a similar methodology for surfacing interesting media objects. The score is an indicator of the likelihood that a particular datum has potential value. User interactions with the IMS are interpreted as claims from Table 1. The figure shows claims as they are consumed by various processes. The intermediate processes (boxes 7, 8, 9, and 10) compute normalized counts of the claim interactions. These counts are fed to the terminal process, InterestingnessCustomMixer (box 12), which scales and reduces the values into the scalar interestingness score. This score can be used as a predictor for search and recommendation systems. Omitted from this simplified model are lag and decay filters necessary to counteract volatility and freshness bias, respectively [2].
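A minimal sketch of the mixing step described above (the claim names follow Table 1, but the weights and the normalisation against corpus maxima are illustrative assumptions, not the production configuration): normalized claim counts are scaled by per-claim weights and reduced to a single interestingness score; the lag and decay filters of the full model are omitted.

    CLAIM_WEIGHTS = {            # illustrative weights, not the deployed values
        "classified": 0.30, "described": 0.20, "discussed": 0.15,
        "emailed": 0.10, "watched": 0.10, "wanted": 0.10, "flagged": 0.05,
    }

    def normalize(count, max_count):
        """Scale a raw claim count into [0, 1] relative to the corpus maximum."""
        return count / max_count if max_count else 0.0

    def interestingness(claim_counts, max_counts):
        """Weighted mix of normalized claim counts -> scalar score in [0, 1]."""
        return sum(w * normalize(claim_counts.get(c, 0), max_counts.get(c, 1))
                   for c, w in CLAIM_WEIGHTS.items())

    # Hypothetical claim counts for one datum and maxima across the corpus.
    datum = {"classified": 1, "discussed": 4, "watched": 2}
    corpus_max = {"classified": 1, "described": 3, "discussed": 10,
                  "emailed": 5, "watched": 8, "wanted": 6, "flagged": 2}
    print(round(interestingness(datum, corpus_max), 3))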

5 Conclusion And Future Work

We have described a realistic blueprint for a reputation system that is on the roadmap of our IMS. Once implemented, we think that it will dramatically improve the quality of information that is retrievable by users, thus increasing its effectiveness as a platform for information management and data science. We have left outcome analysis of the approach and results for a future paper. Also on the roadmap is a meaningful gamification system inspired by [4] to further enhance user engagement.

References

1. Butterfield, Daniel, et al. Interestingness Ranking of Media Objects. Yahoo! Inc., assignee. Patent US20060242139A1. 8 Feb. 2006. Print.
2. Farmer, F. Randall, and Bryce Glass. Building Web Reputation Systems. Sebastopol, CA: O'Reilly, 2010. Print.
3. Sveiby, K.E. Knowledge Management: Lessons from the Pioneers. Tech. Sveiby Knowledge Associates, 2001. Web. 12 Mar. 2013. http://www.sveiby.com/articles/KM-lessons.doc
4. Nicholson, Scott. Strategies for Meaningful Gamification: Concepts behind Transformative Play and Participatory Museums. Tech. Meaningful Play 2012. Web. 13 Mar. 2013. http://scottnicholson.com/pubs/meaningfulstrategies.pdf

Fig. 1. Interestingness Reputation Model for Datum

A Role for Provenance in Social Computation

Milan Markovic, Peter Edwards, and David Corsar

Computing Science & dot.rural Digital Economy Hub, University of Aberdeen, Aberdeen, AB24 5UA
{m.markovic, p.edwards, dcorsar}@abdn.ac.uk

Abstract. We argue that existing systems to support social computation suffer from a lack of transparency and that this can be addressed by integrating provenance capture mechanisms into such systems. We discuss how Semantic Web technologies can be used to facilitate this, and how the provenance record could be used to support various forms of decision-making about tasks such as workforce selection.

1 Introduction

The widespread use of online interactive technologies has enabled new forms of computation based on the principles of collective intelligence [4, 2]. Robertson and Giunchiglia [4] define one such approach, social computation, as: "a computation for which an executable specification exists, but the successful implementation of this specification depends upon computer-mediated social interaction between the human actors and its implementation". However, the use of humans in such computations introduces several issues including: reliability of workers, workforce selection, and quality of the generated results. To address these issues, current platforms such as Amazon's Mechanical Turk1 provide basic reputation scores for workers based on acceptance of their product, tools for workforce selection based on workers' attributes (e.g. geolocation, qualifications) and means to assess results (e.g. by comparison with a gold standard).

We argue that recording the provenance of such activities and other aspects of social computation (e.g. formation of a group of participants) will increase the transparency of such systems, and so enable more sophisticated means of control. Such a provenance record would describe the activities performed throughout the computation, the entities (things) used and generated by those activities, and the agents associated with those activities [3]. This can then be used to enhance assessments of: workers' reliability (e.g. forming beliefs about their trustworthiness based on their motives, past performance, capabilities, and relationships to trusted workers); results (e.g. by reconstructing and inspecting the events that led to result generation); and the process of the execution itself (e.g. how the group of workers necessary to complete the computation was formed).

The executable specification of a social computation can include social properties that define: "the drivers for the adoption and spread of the computation through the social group with which it engages" [4].

1 https://www.mturk.com/mturk/

For example, consider a system requiring a worker to provide a photograph of a current event; a social property could be: "to secure a reward, provide a photograph of the event or delegate the task to a trusted friend able to provide one". We argue that it is possible to use the provenance record generated by a social computation to infer worker compliance with those properties (i.e. to check whether a worker's behaviour during the computation was consistent with such properties). Provenance information would also permit assessment of a worker's effect on the formation of the group of human participants performing the social computation (e.g. trusted worker Bob delegated the task to his friend Jack, whom he trusted and knew was at the event). Provenance can also be used to infer information about workers' motives (e.g. Bob was motivated to delegate the task in order to receive a reward). In addition, the provenance record can include information enabling the identification of workers' attributes such as their skills (e.g. Jack knew how to take a photograph) and capabilities (e.g. Jack was at the event and had a smartphone). We argue that using provenance to enable the kinds of reasoning highlighted here would enhance the capabilities of decision-making processes such as trust assessment of workers and workforce selection.

2 Our Approach

We are investigating the development of a provenance model for social computation that is aligned with Prov-DM2, the current W3C provenance recommendation. An analysis of six platforms3 identified aspects of social computation that a provenance model should describe: the task execution process; links to the social properties applicable to a task; how workers were motivated to participate; what skills and capabilities were associated with a worker when they performed a task; and constraints that were associated with the task description (e.g. a requirement for photographs to be stamped by the device with its timestamp and geolocation). Prov-DM does not currently support explicit modelling of these aspects, and therefore one of our goals is to investigate and design a set of appropriate Prov-DM extensions to accommodate them.

Hendler and Berners-Lee [2] have previously argued that the fundamental role of Semantic Web technologies in social computation-like systems is to enable them to easily share data. For example, a process assessing the trustworthiness of Jack, based on the photograph he supplied, might consider Jack more trustworthy if it can determine that the picture was taken at the same time and place as the event. To do so, a system would compare the time and location associated with the photograph with those provided by a description of the event obtained from other data sources on the Web of Linked Data. In addition, such technologies provide a range of reasoning techniques that can be used to support automated decision-making processes.

2 http://www.w3.org/TR/prov-dm/
3 These were: Amazon Mechanical Turk, CrowdFlower (http://crowdflower.com/), Zooniverse (http://zooniverse.org/), Passbrains (http://passbrains.com/), oDesk (http://odesk.com/) and InnoCentive (http://innocentive.com/).

For example, the fact that Bob delegated a task to Jack and did not provide the photograph could result in a naive system excluding Bob from future task assignments. However, Bob might be an important element contributing towards the formation of a group necessary to perform tasks (e.g. delegating to trusted friends that provide results). We argue that enhanced trust assessments of workers could lead to reductions in the number of workers required to perform additional result validation. Such validation steps are typical for current design patterns such as Find-Fix-Verify [1]. Furthermore, better understanding of the process of worker group formation and worker motivations could allow for the selection of smaller groups that perform computations resulting in the same or better results as larger groups. To evaluate our approach, we aim to develop a computational framework that utilises our extended provenance model, supported by semantic technologies. The framework should operate alongside existing platforms using an API to facilitate the capture and use of provenance.
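The kind of reasoning sketched above could, for example, be approximated over a captured provenance record as follows (a purely illustrative sketch with hypothetical record fields, not a proposed Prov-DM extension): check whether a worker either produced the requested photograph or delegated the task to a trusted friend who did, per the social property quoted in Section 1.

    def complies_with_social_property(record, worker, trusted_friends):
        """True if the worker produced the photograph or delegated the task
        to a trusted friend who did, per the example social property.
        Cycles in delegation chains are not guarded against in this sketch."""
        produced = any(e["type"] == "photograph" and e["attributedTo"] == worker
                       for e in record["entities"])
        delegations = [d for d in record["delegations"] if d["from"] == worker]
        delegated_ok = any(d["to"] in trusted_friends and
                           complies_with_social_property(record, d["to"], trusted_friends)
                           for d in delegations)
        return produced or delegated_ok

    # Hypothetical provenance record: Bob delegated to Jack, who took the photo.
    record = {
        "entities": [{"type": "photograph", "attributedTo": "Jack"}],
        "delegations": [{"from": "Bob", "to": "Jack"}],
    }
    print(complies_with_social_property(record, "Bob", trusted_friends={"Jack"}))  # True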

3 Conclusions

In this paper we have argued that the introduction of provenance capture mechanisms will not only increase the transparency of social computations, but will also permit reasoning about aspects such as the trustworthiness of workers and workforce recruitment. We suggest an approach to facilitate the capture and use of such provenance, with the support of semantic technologies and via extensions to Prov-DM. We are aware that there are a number of possible limitations of the proposed approach, including scalability issues associated with processing large provenance records, and difficulties in capturing certain aspects of provenance (e.g. a worker's motivation). These remain interesting questions for our future work.

Acknowledgements The research described here is supported by the award made by the RCUK Digital Economy programme to the dot.rural Digital Economy Hub; award reference: EP/G066051/1.

References

1. M.S. Bernstein, G. Little, R.C. Miller, B. Hartmann, M.S. Ackerman, D.R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, pages 313–322. ACM, 2010.
2. J. Hendler and T. Berners-Lee. From the semantic web to social machines: A research challenge for AI on the world wide web. Artificial Intelligence, 174(2):156–161, 2009.
3. L. Moreau and P. Missier. PROV-DM: The PROV data model. W3C Recommendation (April 2012), http://www.w3.org/TR/prov-dm/, 2012.
4. D. Robertson and F. Giunchiglia. Programming the social computer. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1987), 2013.