Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering

Total Page:16

File Type:pdf, Size:1020Kb

Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering by Amna Basharat (Under the Direction of Khaled Rasheed and I. Budak Arpinar) Abstract Formalized knowledge engineering activities including semantic annotation and linked data management tasks in specialized domains suffer from considerable knowledge acquisition bottleneck - owing to the lack of availability of experts and in-efficacy of automated approaches. Human Computation & Crowdsourcing (HC&C) methods successfully advocate leveraging the human intelligence and pro- cessing power to solve problems that are still difficult to be solved computationally. Contextualized to the domain of Islamic Knowledge, this research investigates the synergistic interplay of these HC&C methods and the semantic web and proposes a semantics driven human-machine computation framework for knowledge engineer- ing in specialized and knowledge intensive domains. The overall objective is to augment the process of automated knowledge extraction and text mining methods using a hybrid approach for combining collective intelligence of the crowds with that of experts to facilitate activities in formalized knowledge engineering - thus overcoming the so-called knowledge acquisition bottleneck. As part of this framework, we design and implement formal and scalable knowl- edge acquisition workflows through the application of semantics driven crowdsourc- ing methodology and its specialized derivative, called learnersourcing. We evaluate these methods and workflows for a range of knowledge engineering tasks including thematic classification, thematic disambiguation, thematic annotation and contex- tual interlinking for two primary Islamic texts, namely the Qur’an and the books of Prophetic narrations called the Hadith. This is done at various levels of granu- larity, including atomic and composite task workflows, that existing research fails to address. We leverage primarily upon students and learners engaging in typical knowledge seeking and learning scenarios. The chosen method ensures annotation reliability by introducing an ’expert sourcing’ workflow tightly integrated within the system. Therefore, quantitative measures of ensuring annotation quality are woven into the very fabric of the human computation framework. The results of our evaluation demonstrate that our proposed methods are robust and are capable of generating high quality and reliable annotations, while significantly reducing the need for expert contributions. Index words: Qur’an, Quran, Hadith, Semantic, Islam, Islamic, Crowdsourcing, Learnersourcing, Expert-sourcing, Human Computation, Thematic Annotation, Thematic Disambiguation Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering by Amna Basharat B.E(SE)., National University of Science and Technology, 2005 M.Sc., Brunel Univeristy, 2007 A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Athens, Georgia 2016 © 2016 Amna Basharat All Rights Reserved Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering by Amna Basharat Approved: Major Professors: Khaled Rasheed I. Budak Arpinar Committee: Krys Kochut Thiab Taha Electronic Version Approved: Suzanne Barbour Dean of the Graduate School The University of Georgia December 2016 Semantics Driven Human-Machine Computation Framework for Linked Islamic Knowledge Engineering Amna Basharat December 2016 Inspiration In the name of Allah, the most merciful, the most beneficent. And those who strive for Us - We will surely guide them to Our ways. And indeed, Allah is with the doers of good. — [Al-Quran, Al-Ankaboot,29:69] May Allah accept this humble effort, purely for His sake, and make this a source of elevating my ranks in this world and hereafter, along with all those who have contributed in making this effort possible, foremost amongst whom are my parents, teachers, siblings, family and friends. v vi Dedicated to Ammi and Abbu Ji, with all my love vii Acknowledgments All thanks and praise be to Almighty Allah, the Lord of the Worlds. It is only due to the grace and blessings of Almighty Allah (God), that I have been blessed with the poise, the will and the perseverance to see this milestone to an end. It was only a dream when I came. Only Allah has made its realization possible. After the prayers and the support of all my loved ones, I am deeply beholden to my advisor, Dr. Khaled Rasheed, who placed a great deal of confidence and trust viii in my ideas and abilities. He gave me both independence of thought and decision making, with the perfect balance of guidance and empathy, to help me realize my dream goals that I was most passionate about. When I look back at my years spent at UGA, Dr. Rasheed stands out as someone who played the most crucial role towards my successes and growth as an individual, and towards achieving my research vision. I will always continue to derive inspiration and learning from my time spent under his fatherly supervision. I would like to convey my immense gratitude to my co-advisor, Dr. I. Budak Arpinar, who also played a crucial role in driving my research objective and path. I am extremely grateful for the valuable suggestions, unprecedented guidance, appreciation and encouragement I received from him. He always encouraged me and challenged me to think with high level impact in mind and it certainly helped me explore avenues that I would not have been able to otherwise. I would also like to thank my committee members Dr. Krys Kochut and Dr. Thiab Taha, for their valuable time and feedback towards refining my research objectives and direction. My years of learning at UGA will have a profound impact on my life to come. I would like to express thanks to my colleagues Usman Nisar, Pan Huang, Kyle Krafka and Delaram Yazdansepas for their support and help. I would also like to thank the support staff at CS UGA, especially Suzi, Lafarrah, Timothy and Melody for their help and facilitation in all matters. I would also like to express my gratitude to my Fulbright advisor, Brian Diffley, for his support always. I would like to thank the IIE and USEFP and all those who ix enabled this scholarship opportunity and helped me see through the completion of this degree. I would like to express my special gratitude to Ms. Rita Akhtar (Executive Director, USEFP) and those along her side, and Ms. Angela P. Aggeler (US Embassy, Pakistan), who helped me at a difficult stage in my program and ensuring everything that would help reach me a successful completion. I am ex- tremely grateful to Dr. Aftab Maroof and Dr. Arshad Ali Shahid (FAST-NUCES, Islamabad) for facilitating, guiding and mentoring me in this time. My very special thanks to my Bhaiii who took care of me all these years, played the role of ’choti ammi’ and ’chota daddy’, and who has always placed great pride and confidence in me and also put up with my three year old acts. This PhD would simply not have been the same without you. It makes me really proud to have you as my brother and I am grateful for all the valuable life lessons I have learnt from you bro!. I would also like to say special thanks to my little sister, An’amta, who specially played a huge role in ensuring ease and comfort for the last few months of my PhD, when I needed all the more support and help. Her presence was indeed a source of Barakah in my time and efforts, Alhamdulillah. A shout out to Aapi’s little brothers Hebban and Zahir. Zahir, thank you for the great company, the forest walks, the great food and not to forget the endless questions and conversations. Hebban, thank you for all the wish lists and for always making sure that my luggage is loaded on my trips home. I am extremely grateful, from the bottom of my heart, especially to so many members of my extended family, my khalas, mamus, cousins and others who always x load me with duas, love and pampering throughout my life. My cousins Murrium and Umar have especially been a huge support for my travel trips throughout this period of five years. I feel extremely fortunate and blessed to have loving friends, who may not have been physically present, yet who stood by me, supported me, cheered me on, and helped me in every possible manner they could, with their efforts, duas and foremost their love and encouragement. Mahwash, Humaira, Tahreem, Sana, Amina, Zainab Abaid, Zainab Jamil and the Baqais to name a few, I owe it to all of you. So many others who always loaded me with duas and well wishes, may Allah reward you with the very best in both worlds. During these years, Allah also brought some special people into my life, as a special gift and mercy. Eternal friends, whom I feel so blessed to have in my life. My two pseudo moms are truly exceptional. I thank you both for bringing so much more joy, positivity and enlightenment that you bring, for all the duas, mentoring and support and for making my life and the process of my PhD in particular all the more special and cherished. I also would like to express great admiration and love for my friend, SNG. She has been a role model, an inspiration and someone I look up to a lot and has been a mentor for me in this PhD journey. I cannot forget all my friends here in Athens and the Atlanta community, both within and outside UGA, who made my stay comfortable and helped me in every possible way they can. I would specially like to mention Amena and Ayesha for always being there. xi I also acknowledge the efforts of my ex-students. Muhammad Shoaib, was always a great source of help and support in the initial stages of my PhD.
Recommended publications
  • Lecture Notes in Computer Science 7032 Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan Van Leeuwen
    Lecture Notes in Computer Science 7032 Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany Lora Aroyo Chris Welty Harith Alani Jamie Taylor Abraham Bernstein Lalana Kagal Natasha Noy Eva Blomqvist (Eds.) The Semantic Web – ISWC 2011 10th International Semantic Web Conference Bonn, Germany, October 23-27, 2011 Proceedings, Part II 13 Volume Editors Lora Aroyo VU University Amsterdam, The Netherlands; [email protected] Chris Welty IBM Research, Yorktown Heights, NY, USA; [email protected] Harith Alani The Open University, Milton Keynes, UK; [email protected] Jamie Taylor Google, Mountain View, CA, USA; [email protected] Abraham Bernstein University of Zurich,
    [Show full text]
  • An Overview of Ontoclean
    8 An Overview of OntoClean Nieola Guarino1 and Christopher A. Welty lLaboratory for Applied Ontology (ISTC-CNR) Polo Tecnologico, Via Solteri 38, 38100 Trento, ITAL Y [email protected] 2IBM Watson Research Center 19 Skyline Dr., Hawthome, NY 10532, USA [email protected] Summary. OntoClean is a methodology for validating the ontologie al adequaey of taxonomie relationships. It is based on highly general ontologieal notions drawn from philosophy, like essence, identity, and unity, whieh are used to ehar­ aeterize relevant aspeets of the intended meaning of the properties, classes, and relations that make up an ontology. These aspeets are represented by formal metaproperties, whieh impose several eonstraints on the taxonomie strueture of an ontology. The analysis of these eonstraints helps in evaluating and validating the ehoiees made. In this ehapter we present an informal overview ofthe philoso­ phieal notions involved and their role in OntoClean, review some eommon onto­ logieal pitfalls, and walk through the example that has appeared in pieees in pre­ vious papers and has been the basis of numerous tutorials and talks. 8.1 Introduction The OntoClean methodology was first introdueed in aseries of eonferenee-Iength papers in 2000 [Guarino and Welty, 2000a-e; Welty and Guarino, 2001], and re­ eeived mueh attention and use in subsequent years. The main eontribution of On­ toClean was the beginning of a formal foundation for ontologieal analysis. Alan Reetor, a seasoned veteran at ontologieal analysis in the medieal domain, said of OntoClean, " ... what you have done is reduee the amount of time I spend arguing with doetors that the way I want to model the world is right..." [Reetor, 2002].
    [Show full text]
  • Cristobalthesis
    Open Personalization: Involving Third Parties in Improving the User Experience of Websites Dissertation presented to the Department of Computer Languages and Systems of the University of the Basque Country in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (“international” mention) Cristóbal Arellano Bartolomé Supervisors: Prof. Dr. Oscar Díaz García Dr. Jon Iturrioz Sánchez San Sebastián, Spain, 2013 This work was hosted by the University of the Basque Country (Faculty of Computer Sciences). The author enjoyed a doctoral grant under de FPI (Formacion de Personal Investigador) from the Spanish Ministry of Science & Education during the years 2007 to 2011. The work was was co- supported by the Spanish Ministry of Education, and the European Social Fund under contracts (TIN2005-05610), MODELINE (TIN2008-06507- C02-01) and Scriptongue (TIN2011-23839). Summary Traditional software development captures the user needs during the requirement analysis. The Web makes this endeavour even harder due to the difficulty to determine who these users are. In an attempt to tackle the heterogeneity of the user base, Web Personalization techniques are proposed to guide the users’ experience. In addition, Open Innovation allows organisations to look beyond their internal resources to develop new products or improve existing processes. This thesis sits in between by introducing Open Personalization as a means to incorporate actors other than webmasters in the personalization of web applications. The aim is to provide the technological basis that builds up a trusty environment for webmasters and companion actors to collaborate, i.e. "an architecture of participation". Such architecture very much depends on these actors’ profile.
    [Show full text]
  • Semantic Web: a Review of the Field Pascal Hitzler [email protected] Kansas State University Manhattan, Kansas, USA
    Semantic Web: A Review Of The Field Pascal Hitzler [email protected] Kansas State University Manhattan, Kansas, USA ABSTRACT which would probably produce a rather different narrative of the We review two decades of Semantic Web research and applica- history and the current state of the art of the field. I therefore do tions, discuss relationships to some other disciplines, and current not strive to achieve the impossible task of presenting something challenges in the field. close to a consensus – such a thing seems still elusive. However I do point out here, and sometimes within the narrative, that there CCS CONCEPTS are a good number of alternative perspectives. The review is also necessarily very selective, because Semantic • Information systems → Graph-based database models; In- Web is a rich field of diverse research and applications, borrowing formation integration; Semantic web description languages; from many disciplines within or adjacent to computer science, Ontologies; • Computing methodologies → Description log- and a brief review like this one cannot possibly be exhaustive or ics; Ontology engineering. give due credit to all important individual contributions. I do hope KEYWORDS that I have captured what many would consider key areas of the Semantic Web field. For the reader interested in obtaining amore Semantic Web, ontology, knowledge graph, linked data detailed overview, I recommend perusing the major publication ACM Reference Format: outlets in the field: The Semantic Web journal,1 the Journal of Pascal Hitzler. 2020. Semantic Web: A Review Of The Field. In Proceedings Web Semantics,2 and the proceedings of the annual International of . ACM, New York, NY, USA, 7 pages.
    [Show full text]
  • Introduction to Semantic Web Technologies & Linked Data
    IntroductionIntroduction toto SemanticSemantic WebWeb TechnologiesTechnologies && LinkedLinked DataData OktieOktie HassanzadehHassanzadeh UniversityUniversity ofof TorontoToronto March 2011 CS 443: Database Management Systems - Winter 2011 Outline 2 Introduction Semantic Web Technologies Resource Description Framework (RDF) Querying RDF data (SPARQL) Linked Data Linked Data Principles Linking Open Data Community Project Example Data Sources Example Applications 3 Introduction Web of Documents vs. Web of Data Web of Documents 4 Untyped Untyped Untyped Links Links Links API/ HTML HTML HTML XML A B C D Primary objects: documents Links between documents (or parts of them) Degree of structure in data: fairly low Implicit semantics of contents Designed for: human consumption Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials Web of Documents: Problem 5 ? thing thing Are two documents talking about the same ? ? “thing”? ? ? ? Untyped Untyped Untyped Links Links Links API/ HTML HTML HTML XML A B C D Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials Example Query 6 Elvis Presley 1935 - 1977 Will there ever be someone like him again? Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/ Example Query 7 Another Elvis Elvis Presley: The Early Years Elvis spent more weeks at the top of the charts than any other artist. www.fiftiesweb.com/elvis.htm Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/ Example Query 8 Another singer called Elvis, young Personal relationships of Elvis Presley – Wikipedia ...when Elvis was a young teen...
    [Show full text]
  • An Overview of Ontoclean
    8 An Overview of OntoClean Nicola Guarino1 and Christopher A. Welty2 1Laboratory for Applied Ontology (ISTC-CNR) Polo Tecnologico, Via Solteri 38, 38100 Trento, ITALY [email protected] 2IBM Watson Research Center 19 Skyline Dr., Hawthorne, NY 10532, USA [email protected] Summary. OntoClean is a methodology for validating the ontological adequacy of taxonomic relationships. It is based on highly general ontological notions drawn from philosophy, like essence, identity, and unity, which are used to characterize relevant aspects of the intended meaning of the properties, classes, and relations that make up an ontology. These aspects are represented by formal metaproperties, which impose several constraints on the taxonomic structure of an ontology. The analysis of these constraints helps in evaluating and validating the choices made. In this chapter we present an informal overview of the philosophical notions in- volved and their role in OntoClean, review some common ontological pitfalls, and walk through the example that has appeared in pieces in previous papers and has been the basis of numerous tutorials and talks. 8.1 Introduction The OntoClean methodology was first introduced in a series of conference-length papers in 2000 [Guarino and Welty, 2000a-c; Welty and Guarino, 2001], and re- ceived much attention and use in subsequent years. The main contribution of On- toClean was the beginning of a formal foundation for ontological analysis. Alan Rector, a seasoned veteran at ontological analysis in the medical domain, said of OntoClean, “…what you have done is reduce the amount of time I spend arguing with doctors that the way I want to model the world is right…” [Rector, 2002].
    [Show full text]
  • Ontology-Driven Conceptual Modeling
    Ontology-Driven Conceptual Modeling Chris Welty IBM Watson Research Center CAISE-02 Acknowledgements People Organizations Vassar College, USA Nicola Guarino LADSEB-CNR, Padova Cladio Masolo CNR Cognitive Science Aldo Gangemi Institute, Trento Alessandro Oltramari Bill Andersen OntologyWorks, Inc. CAISE-02 2 1 Outline • Setting the record straight • Motivation • Formal foundation • “Upper Level” distinctions • Common pitfalls CAISE-02 3 What is Ontology? • A discipline of Philosophy – Meta-physics dates back to Artistotle • Meta (after) + physica (physical, real) – Ontology dates back to 17th century • Ontos (that which exists) + logos (knowledge of) • As in TorONTO, ONTario, ON TOp – The science of what is (in the universe) – “One universe, One ontology” • Quine, 1969: “To exist is to be the value of a quantified variable” CAISE-02 4 2 What is Ontology? • Borrowed by AI community – McCarthy (1980) calls for “a list of things that exist” – Specify all the kinds of things that can be the values of variables • Evolution of meaning in CS – Now refers to domain modeling, conceptual modeling, knowledge engineering, etc. • Note: not a “new name for an old thing” CAISE-02 5 What is an Ontology? • Poor definition: “Specification of a conceptualization” [Gruber, 1993] • Better: “Description of the kinds of entities there are and how they are related.” • Good ontologies should provide: – Meaning –Agreement – Organization – Common Understanding – Taxonomy – Vocabulary – Connection to the “real world” CAISE-02 6 3 What is an Ontology? a set of a collection
    [Show full text]
  • IBM Research Report Harnessing Disagreement in Crowdsourcing A
    RC25371 (WAT1304-058) April 23, 2013 Computer Science IBM Research Report Harnessing Disagreement in Crowdsourcing a Relation Extraction Gold Standard Lora Aroyo VU University Amsterdam, The Netherlands Chris Welty IBM Research Division Thomas J. Watson Research Center P.O. Box 208 Yorktown Heights, NY 10598 USA Research Division Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. , payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home . Harnessing disagreement in crowdsourcing a relation extraction gold standard Lora Aroyo Chris Welty VU University Amsterdam IBM Watson Research Center The Netherlands USA [email protected] [email protected] ABSTRACT are described in natural language. The importance of rela- One of the first steps in any kind of web data analytics tions and their interpretation is widely recognized in NLP, is creating a human annotated gold standard. These gold but whereas NLP technology for detecting entities (such as standards are created based on the assumption that for each people, places, organizations, etc.) in text can be expected annotated instance there is a single right answer.
    [Show full text]
  • Crowdsourcing,The,Semantic,Web
    ! ! ! ! ! ! Proceedings! of!the! First!International!Workshop! on! ! Crowdsourcing,the,Semantic,Web, held%in%conjunction%with! the!12th!International!Semantic!Web!Conference! ! Sydney,!Australia! ! Editors:! Maribel!Acosta! Lora!Aroyo! Abraham!Bernstein! Jens!Lehmann! Natasha!Noy! Elena!Simperl! ! Copyright!©!2013!for!the!individual!papers!by!the!papers’!authors.!Copying! permitted!only!for!private!and!academic!purposes.!This!volume!is!published!and! copyrighted!by!its!editors.! Preface This volume contains the papers presented at the 1st International Workshop on ”Crowdsourcing the Semantic Web” that was held in conjunction with the 12th International Semantic Web Conference (ISWC 2013), 21-25 October 2013, in Sydney, Australia. This interactive workshop takes stock of the emergent work and chart the research agenda with interactive sessions to brainstorm ideas and potential applications of collective intelligence to solving AI hard semantic web problems. There were 12 submissions. Each submission was reviewed by at least 2, and on the average 3, program committee members. The committee decided to accept 9 papers. Our special thanks goes to the reviewers who diligently reviewed the papers within this volume. September 3, 2013 Maribel Acosta Lora Aroyo Abraham Bernstein Jens Lehmann Natasha Noy Elena Simperl v Table of Contents Full Papers Crowdsourced Semantics with Semantic Tagging: “Don’t just tag it, LexiTag it!” ...................................................... 1 Csaba Veres ”Dr. Detective”: combining gamification techniques and crowdsourcing to create a gold standard in medical text ............................. 16 Anca Dumitrache, Lora Aroyo, Chris Welty, Robert-Jan Sips and An- thony Levas SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing 32 Umair Ul Hassan, Sean O’Riain and Edward Curry Content and Behaviour Based Metrics for Crowd Truth ...............
    [Show full text]
  • The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics in Line with Reality
    The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality Mitchell L. Gordon Kaitlyn Zhou Kayur Patel Stanford University Stanford University Apple Inc. [email protected] [email protected] [email protected] Tatsunori Hashimoto Michael S. Bernstein Stanford University Stanford University [email protected] [email protected] ABSTRACT 1 INTRODUCTION Machine learning classifiers for human-facing tasks such as com- The machine learning classifiers that underpin modern social com- ment toxicity and misinformation often score highly on metrics puting systems such as Facebook, Wikipedia, and Twitter are a such as ROC AUC but are received poorly in practice. Why this gap? study in contrasts. On one hand, current performance metrics for Today, metrics such as ROC AUC, precision, and recall are used to popular tasks such as comment toxicity classification and disin- measure technical performance; however, human-computer inter- formation detection are extremely high, featuring up to .95 ROC action observes that evaluation of human-facing systems should AUC [11, 72], making the problem appear nearly solved. On the account for people’s reactions to the system. In this paper, we intro- other hand, audits suggest that these algorithms’ performance is duce a transformation that more closely aligns machine learning in reality much poorer than advertised [56]. Even well-intentioned classification metrics with the values and methods of user-facing platforms will continue to make highly-publicized mistakes [16] performance measures. The disagreement deconvolution takes in until they can better align metrics with reality. any multi-annotator (e.g., crowdsourced) dataset, disentangles sta- This disconnect between classifier performance and user-facing ble opinions from noise by estimating intra-annotator consistency, performance is indicative of a larger disconnect between how ma- and compares each test set prediction to the individual stable opin- chine learning (ML) and human-computer interaction (HCI) re- ions from each annotator.
    [Show full text]
  • Evaluating Ontology Cleaning
    Evaluating Ontology Cleaning Christopher Welty, Ruchi Mahindru, and Jennifer Chu-Carroll IBM T.J. Watson Research Center 19 Skyline Dr. Hawthorne, NY 10532 {welty,jencc}@us.ibm.com, [email protected] Abstract very little evidence that it can have impact on Ontology as a discipline of Computer Science has made knowledge-based systems. In fact, there appears many claims about its usefulness, however to date there has to be a significant obstacle in understanding the been very little evaluation of those claims. We present the methodology, and even without this “learning results of an experiment using a hybrid search system with a curve”, significant manual effort must be significant knowledge-based component to measure, using expended to employ the methodology to develop precision and recall, the impact of improving the quality of actual “clean” ontologies. Furthermore, there an ontology on overall performance. We demonstrate that has been no clear argument that such an improving the ontology using OntoClean (Guarino and expenditure will pay for itself in the long run. Welty, 2002), does positively impact performance, and that Indeed, “Why does it matter?” has been the most having knowledge of the search domain is more effective than domain-knowledge-free search techniques such as link frequent criticism of the OntoClean approach. analysis. We report here on a series of experiments using a Content Areas: Ontologies hybrid search system with a significant knowledge-based component to test the impact of improving the quality of ontologies on system Introduction performance. The use of search as a test system provides a well understood framework for Ontologies have been proposed as a solution to empirical evaluation, and gives an excellent numerous problems in areas such as search, opportunity to address the “Why does it matter?” semantic integration, agents, human-computer question.
    [Show full text]
  • Crowd Truth and the Seven Myths of Human Annotation Aroyo, Lora; Welty, Chris
    VU Research Portal Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation Aroyo, Lora; Welty, Chris published in The AI Magazine 2015 DOI (link to publisher) 10.1609/aimag.v36i1.2564 document version Publisher's PDF, also known as Version of record Link to publication in VU Research Portal citation for published version (APA) Aroyo, L., & Welty, C. (2015). Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. The AI Magazine, 36(1), 15-24. https://doi.org/10.1609/aimag.v36i1.2564 General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal ? Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. E-mail address: [email protected] Download date: 30. Sep. 2021 Articles Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation Lora Aroyo, Chris Welty n Big data is having a disruptive impact amar prestar aen … “The world is changed.” In the past across the sciences.
    [Show full text]