
University of Central Florida STARS

Electronic Theses and Dissertations, 2004-2019

2013

Automatically Acquiring A Semantic Network Of Related Concepts

Sean Szumlanski, University of Central Florida

Part of the Computer Sciences Commons, and the Engineering Commons

This Doctoral Dissertation (Open Access) is brought to you for free and open access by STARS. It has been accepted for inclusion in Electronic Theses and Dissertations, 2004-2019 by an authorized administrator of STARS. For more information, please contact [email protected].

STARS Citation: Szumlanski, Sean, "Automatically Acquiring A Semantic Network Of Related Concepts" (2013). Electronic Theses and Dissertations, 2004-2019. 2585. https://stars.library.ucf.edu/etd/2585

AUTOMATICALLY ACQUIRING A SEMANTIC NETWORK OF RELATED CONCEPTS

by

SEAN SZUMLANSKI
B.S. University of Central Florida, 2004
M.S. University of Central Florida, 2005

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical Engineering and Computer Science in the College of Engineering and Computer Science at the University of Central Florida, Orlando, Florida

Spring Term 2013

Major Professor: Fernando Gomez

© 2013 Sean Szumlanski

ABSTRACT

We describe the automatic acquisition of a semantic network in which over 7,500 of the most frequently occurring nouns in the English language are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from lexical co-occurrence in Wikipedia texts using a novel adaptation of an information-theoretically inspired measure. Our algorithm then capitalizes on salient sense clustering among these semantic associates to automatically disambiguate them to their corresponding WordNet noun senses (i.e., concepts). The resultant concept-to-concept associations, from 7,593 target nouns, with 17,104 distinct senses among them, constitute a large-scale semantic network with 208,832 undirected edges between related concepts. Our work can thus be conceived of as augmenting the WordNet noun ontology with RelatedTo links.

The network, which we refer to as the Szumlanski-Gomez Network (SGN), has been subjected to a variety of evaluative measures, including manual inspection by human judges and quantitative comparison to gold standard data for semantic relatedness measurements. We have also evaluated the network’s performance in an applied setting on a word sense disambiguation

(WSD) task in which the network served as a knowledge source for established graph-based spreading activation algorithms, and have shown: a) the network is competitive with WordNet when used as a stand-alone knowledge source for WSD, b) combining our network with

WordNet achieves disambiguation results that exceed the performance of either resource individually, and c) our network outperforms a similar resource, WordNet++ (Ponzetto & Navigli,

2010), that has been automatically derived from annotations in the Wikipedia corpus.

Finally, we present a study on human perceptions of relatedness. In our study, we elicited quantitative evaluations of semantic relatedness from human subjects using a variation of the classical methodology that Rubenstein and Goodenough (1965) employed to investigate human perceptions of similarity. Judgments from individual subjects in our study exhibit high average correlation to the elicited relatedness means using leave-one-out sampling (r = 0.77,

σ = 0.09, N = 73), although not as high as average human correlation in previous studies of similarity judgments, for which Resnik (1995) established an upper bound of r = 0.90 (σ = 0.07,

N = 10). These results suggest that human perceptions of relatedness are less strictly constrained than evaluations of similarity, and establish a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We also contrast the performance of a variety of similarity and relatedness measures on our dataset to their performance on similarity norms and introduce our own dataset as a supplementary evaluative standard for relatedness measures.

That I am able to enjoy a life of safety, liberty, and broad acceptance as an openly gay man is a privilege that has been purchased for me by the suffering of countless other human beings. This dissertation is dedicated to those who found the courage to lead open and honest lives in the face

of tremendous adversity, in memory of those who lost their lives for doing so, and to all who

have stood with us in the protracted struggle for LGBT equality.

ACKNOWLEDGMENTS

I would first like to thank my committee for their countless contributions to my graduate career over the years. Charlie Hughes, Annie Wu, and Valerie Sims have been my teachers, collaborators, co-authors, and mentors. They have impressed me not only with their invaluable intellectual contributions to my work, but also with how tirelessly they work to set their students up for success. The encouraging and giving nature of these individuals has made me want to work harder and give more to my own students and colleagues.

I am particularly grateful to my advisor, Fernando Gomez. He has spent countless hours in conversation with me over the years, passing on his knowledge not only of artificial intelligence and computational linguistics, but also of art, philosophy, history, literature, and so many other things. In my early years under his advisement, one was as likely to find us discussing Chomsky as one was to find us discussing Lorca and Franco, Picasso’s Guernica,

Bach’s cantatas, or Goya’s witches. I found in him a veritable Abbé Faria (in the Dumasian sense) whose breadth and depth of knowledge helped make me a more educated and well-rounded person. My life is richer for our time together, and I am grateful.

Maxine Najle helped facilitate data collection for the perceptions of relatedness study that is included as part of this dissertation. I owe her huge thanks for that and for giving so generously of her time during one of the most demanding semesters of her undergraduate career.

I am also grateful to my colleagues from the UCF AI Lab whose discussions, counsel, distractions, and friendship over the years contributed to my education and personal wellbeing.

In lexicographic order, they are: Adam Campbell, who inspired me with his diligence and unassuming competence; Adelein Rodriguez, who helped keep me grounded with her perspectives on life, AI, and so many things in between; Andy Schwartz, who forged ahead of me on this incredible journey and, in doing so, cleared away some of the thorny brush along the trail and showed me it was possible to reach the goal; Chris Millward, who impressed me with his knack for coming up with creative solutions to problems large and small and by being one of the most level-headed and genuine people I know; Nadeem Mohsin, who made the AI Lab immeasurably more interesting, geeky, and fun by sharing just some of what is stored in his amazingly encyclopedic brain; and Ramya Pradhan, who reminded me that sometimes we have to make sacrifices for the things we want.

I would like to extend my thanks to Arup Guha and Ali Orooji, who have been long-time teaching mentors and who played critical roles in getting me into the classroom and eventually having me teach classes on my own. I would also like to thank Ali Orooji and the UCF

Programming Team for welcoming me as a guest at their practices in the fall semesters of 2011 and 2012, where I acquired invaluable experience honing my craft as a programmer and benefited from their collective knowledge.

I am grateful to the Office of Student Conduct for providing me with so many opportunities to engage in challenging, meaningful work as a member of UCF’s Student Conduct

Review Board—work that broadened my mind, brought much-needed balance to my life, and offered me tremendous personal growth. I am particularly grateful to Dana Juntunen, Director of the Office of Student Rights and Responsibilities, for the special role she played as a mentor to me during my time on the Board.

I am profoundly grateful to my parents, who have always been there for me, who made incredible sacrifices to give me a good life, and who told me from a young age that I could

achieve anything I set my mind to, and to my brother, who is a tremendous friend, whom I admire for the kindness and compassion in his heart, and who makes me feel understood and valued for the person I am.

Finally, I would like to thank NASA and the Division of Computer Science at UCF for their generous financial support over the years. This research was supported in part by the NASA

Engineering and Safety Center under Grant/Cooperative Agreement NNX08AJ98A.

TABLE OF CONTENTS

LIST OF FIGURES...... xiv

LIST OF TABLES...... xvi

LIST OF ACRONYMS AND ABBREVIATIONS...... xx

1 INTRODUCTION...... 1

1.1 Semantic Memory and WordNet...... 2

1.2 Our Contribution...... 5

1.3 Corpus Considerations and Co-occurrence...... 8

1.4 Corpus Context and the Limitations of Pre-Specified Relations...... 11

1.5 Using Semantic Resources to Discover Relatedness...... 13

1.6 Other Parts of Speech...... 15

1.7 Outline...... 18

1.8 Style Conventions: Words and the Concepts They Denote...... 20

2 LITERATURE REVIEW...... 21

2.1 WordNet...... 21

2.2 WordNet-Based Measures of Similarity and Relatedness...... 25

2.2.1 Preliminaries...... 25

2.2.2 Path Length and the Uniformity Problem...... 26

2.2.3 Information Content...... 30

2.2.4 Gloss-Based Measures...... 32

2.2.5 Evaluation...... 34

2.3 Lexical Co-occurrence and Semantic Association...... 38

2.3.1 Distributional Approaches to Semantic Similarity...... 38

2.3.2 Co-occurrence Approaches to Semantic Relatedness...... 42

2.4 Lexico-Syntactic Pattern Matching...... 50

2.4.1 VerbOcean and the Never-Ending Language Learner...... 56

2.4.2 ConceptNet and the Open Mind Common Sense Project...... 59

2.5 Wikipedia-Based Approaches to Relatedness...... 65

2.5.1 Path-Based Relatedness Measures Using Wikipedia...... 68

2.5.2 Relatedness via Explicit Semantic Analysis with Wikipedia...... 69

2.5.3 Measuring Relatedness from Inter-Article Links in Wikipedia...... 70

2.5.4 WordNet++...... 73

2.5.5 Directly Extracting Semantic Relationships from Wikipedia...... 77

2.6 Hand-Crafted Knowledge Networks...... 78

3 CONSTRUCTING THE NETWORK: SEMANTIC ASSOCIATES OF NOUNS...... 82

3.1 Preliminaries: Corpus and Co-occurrence...... 83

3.2 From Co-occurrence to Relational Strength...... 86

3.2.1 Evaluation...... 95

3.3 From Relational Strength to Categorical Relatedness...... 99

3.3.1 Evaluation...... 102

4 CONSTRUCTING THE NETWORK: FROM NOUNS TO CONCEPTS...... 106

4.1 Preliminaries...... 106

4.2 Subsumption Method...... 107

4.3 Gloss Method...... 107

4.4 Selectional Preference Method...... 109

4.5 Extended Gloss Method...... 113

4.6 Evaluation...... 113

4.7 Excerpts and Explication: Selected Views of the Semantic Network...... 117

4.8 Completing the Network: Resolving Ambiguity with Polysemous Noun Targets...... 122

5 COARSE-GRAINED WORD SENSE DISAMBIGUATION: AN APPLICATION...... 130

5.1 WordNet++...... 130

5.2 Coarse-Grained WSD Experiments...... 131

5.3 WSD with Extended Gloss Overlaps (ExtLesk)...... 133

5.3.1 Results...... 137

5.4 WSD with Degree Centrality...... 139

5.4.1 Results...... 141

5.5 Discussion...... 143

5.6 Summary...... 144

6 MEASURING HUMAN PERCEPTIONS OF RELATEDNESS...... 146

6.1 Mean Similarity Scores as Gold Standard Datasets...... 146

6.1.1 WordSim353 as a Gold Standard...... 147

6.1.2 The R&G Methodology and the Reliability of Human Judgments...... 150

6.2 Methodology...... 152

6.2.1 Experimental Conditions...... 154

6.2.2 Participants...... 155

6.3 Results...... 157

6.3.1 Mean Relatedness Scores...... 157

6.3.2 Distribution of Standard Deviations...... 162

6.3.3 Human Correlation to Relatedness Means...... 163

6.3.4 Correlation of Similarity and Relatedness Measures to Rel-122 Norms...... 164

6.4 Summary...... 165

7 CONCLUSIONS...... 167

7.1 Acquisition...... 167

7.2 Evaluation...... 169

7.3 Human Perceptions of Relatedness...... 171

7.4 Discussion...... 172

LIST OF REFERENCES...... 176

LIST OF FIGURES

1.1 Partial spreading activation view of the concepts related to astronomer...... 7

2.1 Lexical entries for “rook” in WordNet 3.0...... 23

2.2 Lexical entries for flamingo#1, penguin#1, and seagull#1 in WordNet 3.0. The concepts are increasingly distant from the superordinate concept aquatic_bird#1, highlighting uniformity disparity in the ontology, where not all edges convey equal semantic distance between concepts...... 27

3.1 Prior distribution sample from Wikipedia co-occurrence (not to scale)...... 86

3.2 Posterior distribution sample for co-targets of “astronomer” (not to scale)...... 86

3.3 Log ratio of the posterior and prior distributions (to scale)...... 86

3.4 Algorithm for establishing categorical relatedness from mutual relatedness...... 100

4.1 Inflected variants of monosemous associates of “astronomer” occurring in glosses of polysemous associates of “astronomer.”...... 107

4.2 Sample judge’s evaluation indicating the degree to which each sense of “dissociation” relates to “nucleotide.”...... 115

4.3 Partial spreading activation view of concepts related to tennis in our network...... 117

4.4 Partial spreading activation view of concepts related to astronomer in our network...... 118

4.5 Monosemous associates of “virus” that also appear as targets in our network...... 122

4.6 Partial view of the WordNet graph, showing subsumption clusters formed by a subset of the semantic associates of “batter” in our network...... 124

5.1 Example sentences from SemEval-2007, showing target words to be disambiguated (highlighted in blue) and their lemmatized forms...... 131

5.2 All hyponyms of celestial_body#1 in WordNet and their concatenated glosses, glossHYPO(celestial_body#1)...... 133

5.3 The overlap function counts content words common to two strings...... 134

6.1 Instructions for assigning scores for the WordSim353 word pairs of Finkelstein et al. (2002). The task is framed as being intended to elicit similarity scores...... 148

6.2 Procedure used by Rubenstein and Goodenough (1965) to elicit similarity scores for their 65 word pairs...... 150

6.3 Instructions presented to participants in our study...... 152

6.4 Standard deviations of relatedness scores from our study range from 0.13 to 1.40 and are lowest for pairs that are strongly related or strongly unrelated...... 161

LIST OF TABLES

2.1 Summary of similarity and relatedness measures presented in this section...... 35

2.2 Correlations of various similarity and relatedness measures to M&C and R&G similarity scores. Correlation data come from four comparative studies that replicated several measures. Self-reported results are included where available...... 36

2.3 Subjective similarity score judgments from Rubenstein and Goodenough (1965)...... 40

2.4 Corpus co-occurrence and adjusted co-occurrence (relSO) frequencies from Spence and Owens (1990) on select noun pairs from the Palermo and Jenkins (1964) association norms. Window size is 250 characters...... 44

2.5 Strong and weak associates of “doctor” using the association ratio (relCH). Data is taken from Church and Hanks (1990)...... 47

2.6 Direct objects of the verb “drink” and verbs with “telephone” as a direct object. The re-ordering effect of the association ratio is evident in comparison to corpus co-occurrence frequencies. Data are excerpted from Church and Hanks (1990)...... 48

2.7 Hearst’s lexico-syntactic patterns for hyponymic relationships, with examples extracted from Wikipedia. NP↑ indicates a hypernym; NP↓ indicates a hyponym...... 50

2.8 Berland and Charniak’s lexico-syntactic patterns for meronymy, with examples extracted

from Wikipedia. NP↑ indicates a whole; NP↓ indicates a part...... 51

2.9 Hyponymic (P3) and meronymic (P7, P8) extraction patterns sometimes identify context-specific relationships or typify other relations, such as PropertyOf...... 52

2.10 Precision of relation instance extraction by Espresso (Pantel & Pennacchiotti, 2006)...... 55

2.11 Accuracy of relation labeling on the five relations covered in VerbOcean...... 57

2.12 Examples of OMCS frames and the relations they express (Singh, 2002)...... 59

2.13 Three extraction patterns for mining relation instances from OMCS (Singh et al., 2002)...... 61

2.14 Relations expressed in ConceptNet 2.0, with examples (Liu & Singh, 2004a)...... 62

2.15 Relations in ConceptNet 5.1 by frequency (core assertions only)...... 63

2.16 Comparison of WordNet- and Wikipedia-based similarity measures in Strube and Ponzetto (2006) showing correlation (r-values) to human similarity judgments...... 67

2.17 Comparison of three Wikipedia-based relatedness measures on the basis of their correlation to human similarity judgments...... 71

2.18 Results (F1 scores) for WordNet and WN++ on the SemEval-2007 coarse-grained WSD task, as reported by Ponzetto and Navigli (2010)...... 75

2.19 Results (F1 scores) for WN++ (with WordNet) on domain-specific WSD in the domains of sports and finance, as reported by Ponzetto and Navigli (2010)...... 76

3.1 Co-occurrence frequency distributions derived from sentences (1) and (2) above...... 84

3.2 Co-occurrence frequency distributions derived from sentence (3) above...... 84

3.3 60 nouns most frequently co-occurring with “astronomer” in Wikipedia...... 92

3.4 60 co-targets most strongly related to “astronomer” by Srel(t, c)...... 93

3.5 Coefficients of correlation with human similarity judgments. Figures in starred rows are taken from Budanitsky and Hirst (2006)...... 95

3.6 Comparison of score function to subjective similarity score judgments from Rubenstein and Goodenough (1965) (R&G). Correlation: r = 0.824...... 96

3.7 Comparison of score function to subjective similarity score judgments from Miller and Charles (1991) (M&C). Correlation: r = 0.852...... 98

3.8 Summary of statistics for the semantic network of related nouns...... 101

3.9 Exemplars of semantic relatedness, hand-picked from our network...... 103

3.10 Judges’ evaluations of precision on related and unrelated noun pairs...... 103

4.1 All selectional preferences derived from monosemous associates of “unicorn.”...... 109

4.2 All semantic associates of “unicorn” in our network...... 110

4.3 Summary of statistics for the semantic network of related concepts (monosemous targets only)...... 113

4.4 Scale used by judges to rate acceptability of disambiguation results...... 114

4.5 Disambiguation precision, as compared to judges’ manual sense annotations...... 116

4.6 All semantic associates of astronomer in our network...... 119

4.7 All semantic associates of “batter” in our network...... 123

4.8 Summary of statistics for the semantic network of related concepts (SGN). Includes monosemous and polysemous targets...... 127

5.1 ExtLesk disambiguation results on the SemEval-2007 all-words coarse-grained WSD task (nouns only)...... 136

5.2 Degree Centrality disambiguation results on the SemEval-2007 all-words coarse-grained

WSD task (nouns only) with maximum path lengths 1 ≤ Lmax ≤ 5...... 141

6.1 Mean relatedness scores for the 122 noun pairs in our study...... 157

6.2 Mean human correlation to relatedness norms from each condition...... 162

6.3 Coefficients of correlation to mean relatedness scores (Rel-122) and mean similarity scores (M&C, R&G) for various measures. Pearson’s product-moment correlations (r-values) and Spearman’s rank correlations (ρ-values) are reported...... 163

LIST OF ACRONYMS AND ABBREVIATIONS

JJ Adjective (POS tag)

M&C Miller and Charles (1991)

NELL Never-Ending Language Learner

NLP Natural Language Processing

NP Noun Phrase

OMCS Open Mind Common Sense

R&G Rubenstein and Goodenough (1965)

SGN Szumlanski-Gomez Network

VB Verb (POS tag)

WN WordNet

WN++ WordNet++

WSD Word Sense Disambiguation

CHAPTER 1: INTRODUCTION

(1) The astronomer photographed the star.

When faced with a sentence like the one above, the human mind seamlessly navigates a complex mindscape of lexical and syntactic ambiguity in its quest to ascribe meaning—first to individual words, and then, ultimately, to the sentence as a whole. In our resultant understanding of (1), we know, for example, that “star” refers to a celestial body. We have excluded the possibility of “star” being an adjective or verb, or of it denoting a movie star, an asterisk, or any of the myriad other possible senses of the noun.

The human cognitive processes that give rise to this understanding are highly automatized. Few people even notice (consciously, at least) the lexical ambiguity of “star” in (1), despite cognitive evidence that the human mind accesses all possible meanings of ambiguous nouns during semantic interpretation, even when a sentence contains strong contextual clues as to the intended meaning of an ambiguous noun (Swinney, 1979), as is the case in (1), as well as the following, contrasting sentence:

(2) The paparazzi photographed the star.

Despite the syntactic equivalence of (1) and (2), it is clear that the “star” in (2) denotes not a celestial body, but a celebrity. While it is conceivable that a paparazzo would photograph a celestial object, or that an astronomer would photograph a celebrity, the “stars” here are preferentially disambiguated by the strong semantic relatedness between paparazzi1 and the celebrity sense of “star,” and astronomer and the celestial body sense of “star,” respectively, à la mechanisms of spreading activation through semantic memory (Collins & Loftus, 1975; Quillian, 1968).

1 In distinguishing between words and the concepts they denote, we adopt the convention of quoting the former and italicizing the latter. (See Section 1.8, “Style Conventions: Words and the Concepts They Denote,” below.)

The ease with which the human mind resolves such natural language ambiguities belies the complexity of the cognitive processes and lexical semantic resources that drive semantic interpretation. Over half a century of artificial intelligence research has taught us as much. So intricate, so vast, and so deep is the semantic knowledge that resides in the human mind, that we have yet to see the creation of a comprehensive computational model of semantic memory, or an artificially intelligent agent that is capable of semantically interpreting arbitrary natural language utterances—a machine that we can confidently claim is able to process a sentence and subsequently understand.

1.1 Semantic Memory and WordNet

Quillian (1968) posited a theory of semantic memory that accounts for the disambiguation of our stars in the sentences above. In his model, concepts are represented as nodes in a semantic network and related to other concepts by way of labeled edges between nodes. These relations establish semantic relatedness between concepts and allow for the codification of attributes of individual concepts. During interpretation, concept nodes are activated by lexical stimuli, and that activation spreads in parallel to adjacent nodes in a breadth-first manner, with diminishing strength at each level of activation. For example, the word

“astronomer” in The astronomer photographed the star causes activation of the astronomer node in memory, which then spreads to related concepts (e.g., telescope, observatory, the astronomer Galileo, and several others,2 including a concept node for the celestial body sense of “star”).

When the word “star” is subsequently encountered in the sentence, the celestial body sense is already partially activated in memory. (In cognitive terms, the concept has been primed.) Its activation indicates an intersection of word meanings in the sentence, and the noun is disambiguated to its celestial body sense accordingly.3
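To make the mechanics of this account concrete, the following is a minimal sketch of breadth-first spreading activation over a toy, unlabeled semantic network. The graph contents, decay factor, and depth limit are illustrative assumptions; they are not part of Quillian's formulation or of the network developed in this dissertation.

```python
from collections import deque

# Toy semantic network as an undirected adjacency list (illustrative only).
GRAPH = {
    "astronomer": {"telescope", "observatory", "star (celestial body)"},
    "telescope": {"astronomer", "observatory"},
    "observatory": {"astronomer", "telescope"},
    "star (celestial body)": {"astronomer", "sky"},
    "sky": {"star (celestial body)"},
    "paparazzi": {"star (celebrity)", "camera"},
    "star (celebrity)": {"paparazzi", "movie"},
}

def spread_activation(source, initial=1.0, decay=0.5, max_depth=3):
    """Breadth-first activation that diminishes in strength at each level."""
    activation = {source: initial}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in GRAPH.get(node, ()):
            strength = activation[node] * decay
            if strength > activation.get(neighbor, 0.0):
                activation[neighbor] = strength
                frontier.append((neighbor, depth + 1))
    return activation

# Activating "astronomer" primes the celestial sense of "star" but not the
# celebrity sense, mirroring the disambiguation of sentence (1).
print(spread_activation("astronomer"))
```

In a fuller implementation, sense assignment would additionally check the contextual and syntactic validity of the primed senses, as noted in the extended accounts cited above.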

An important feature of Quillian’s model is that it allows for two concepts to be related by any other concept in the network. Typically, this relation takes the form of a verbal concept, 4 but Quillian also uses a canonical IsA relation to indicate superordinate and subordinate relationships between concepts (e.g., IsA(ASTRONOMER, SCIENTIST); IsA(PAPARAZZO, PHOTOGRAPHER)).

Because relations themselves are concepts, they are also subject to the effects of spreading activation through the network. For example, if astronomer and star are conjoined in the network via a study relation (i.e., study(ASTRONOMERS, STARS)), spreading activation from the astronomer node activates not just the star node, but also the verbal concept, study.

The WordNet noun ontology (Miller, 1998) is one of the most sophisticated attempts to implement Quillian’s ideas of semantic memory to date. It constitutes a partial realization of

Quillian’s dream through its instantiation of a variety of labeled edges indicating, inter alia, subsumptive IsA relationships between concepts. The lexical inventory of WordNet enumerates individual senses of English language nouns and relates them to synonymous senses of other

2 Quillian’s theory assumes that the network expresses comprehensive semantic knowledge about the concepts it contains. An implementation with complete fidelity to Quillian’s view of semantic memory would contain an inordinate amount of information about astronomers. 3 This is a simplified account of spreading activation. Extended versions call for sense assignment to occur only after candidate senses have been subjected to elaborate evaluative procedures that determine contextual and syntactic validity (Collins & Quillian, 1972; Quillian, 1969; for an overview of Quillian’s various presentations of the model, and an extended account of spreading activation theory, see Collins & Loftus, 1975). 4 Because the model uses verbal concepts to coordinate related concepts, Quillian’s semantic memory can be seen as an early attempt to create a common sense (cf. Liu & Singh, 2004a; McCarthy, 1959; Minsky, Singh, & Sloman, 2004), although he does not explicitly frame his work in those terms. 3 nouns in the ontology. The resulting sets of , or synsets, form the basic concept nodes of WordNet. Each of these concepts is manually assigned a superordinate concept (hypernym) and, where applicable, subordinate concepts (hyponyms), resulting in a hand-crafted of semantic classes. WordNet tells us, for example, that an astronomer is a physicist,5 a physicist is a scientist, a scientist is a person, and so on, all the way up to physical object, which is an entity.6

(Entity is the root node of the ontology, and the only node without a superordinate concept.) The ontology also codifies a small, closed set of additional relations, such as antonymy, holonymy and meronymy (part-whole relations), instance-of relationships, and domain terms.

The subsumptive architecture of the ontology serves as an indication of semantic similarity—which is a particular type of relatedness (Resnik, 1999)—between concepts.

Hyponymic relationships reflect semantic similarity directly; that a penguin is an aquatic bird implies strong similarity between the two concepts. Through transitive subsumption, we can also infer the similarity of penguin and animal, although the distance between these nodes (they are interceded by aquatic bird, bird, vertebrate, and chordate) suggests weaker similarity than that of penguin to aquatic bird. From WordNet we can also infer the similarity of, e.g., penguins and flamingos, by virtue of their shared subsumption by the superordinate concept aquatic bird.

Notably absent from the ontology, however, are indications of general relatedness, as with, e.g., penguins and icebergs, or polar bears and global warming.
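The contrast can be seen directly with NLTK's WordNet interface. The sketch below (assuming the nltk package and its WordNet data are installed) walks penguin's hypernym chain and compares path-based similarity scores; it is offered only as an illustration of what the ontology does and does not encode, not as one of the measures reviewed in Chapter 2.

```python
from nltk.corpus import wordnet as wn

penguin = wn.synset("penguin.n.01")
flamingo = wn.synset("flamingo.n.01")
iceberg = wn.synset("iceberg.n.01")

# Transitive subsumption: penguin's hypernym chain up to the root, entity.n.01.
print([s.name() for s in penguin.hypernym_paths()[0]])

# Shared subsumption under aquatic_bird#1 yields a comparatively high
# path-based similarity score for penguin and flamingo...
print(penguin.path_similarity(flamingo))

# ...but nothing in the ontology links penguin to iceberg, so a path-based
# score (their nearest common ancestor sits far up the hierarchy) says little
# about their obvious real-world relatedness.
print(penguin.path_similarity(iceberg))
```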

In some cases, subsumption and similarity suffice to resolve lexical ambiguity, as in the following sentences:

5 This can be expressed equivalently by any of the following three binary relations: IsA(ASTRONOMER, PHYSICIST), hyponym(ASTRONOMER, PHYSICIST), and hypernym(PHYSICIST, ASTRONOMER). 6 The IsA relation is transitive; if an astronomer IsA physicist and a physicist IsA scientist, it follows that an astronomer IsA scientist, as well.

(3) We tried it once in A-flat minor, but the key proved too difficult.

(4) There were no queens, rooks, or knights remaining.

In (3), the subsumption of A-flat minor by the musical sense of “key” helps us disambiguate the latter term. (Miller (1998) points out that this results from “a linguistic convention that accepts anaphoric nouns that are hypernyms of the antecedent.”) In (4), similarity (i.e., shared subsumption in WordNet) establishes that the “queens, rooks, or knights” being discussed are chess pieces. (Cf. There were no kings or queens remaining, which leaves us wondering whether the kings and queens are monarchs, chess pieces, or something else.)
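As a toy illustration of how shared subsumption can resolve the ambiguity in (4), the sketch below (again using NLTK's WordNet data) scores every combination of noun senses for "queen," "rook," and "knight" by their summed pairwise path similarity and keeps the most coherent assignment. This is only a schematic illustration, not one of the disambiguation algorithms developed later in this dissertation.

```python
from itertools import product
from nltk.corpus import wordnet as wn

words = ["queen", "rook", "knight"]
candidate_senses = [wn.synsets(w, pos=wn.NOUN) for w in words]

def coherence(senses):
    """Sum of pairwise path similarities within one sense assignment."""
    total = 0.0
    for i in range(len(senses)):
        for j in range(i + 1, len(senses)):
            total += senses[i].path_similarity(senses[j]) or 0.0
    return total

# The chess-piece senses cluster tightly under a common hypernym, so they
# should emerge as the most coherent joint assignment.
best = max(product(*candidate_senses), key=coherence)
for word, sense in zip(words, best):
    print(word, "->", sense.name(), "-", sense.definition())
```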

In other cases, however, semantic interpretation requires more general indications of relatedness than those that are provided by WordNet; notice that if we relied on semantic similarity to disambiguate The astronomer photographed the star, the path in WordNet connecting astronomer and the celebrity sense of “star” (in that both are people) would lead us astray.

1.2 Our Contribution

The focus of this dissertation is the automatic, unsupervised acquisition of a semantic network that indicates general semantic relatedness between concepts denoted by nouns. This is the specific type of lexical semantic knowledge that enables interpretation of sentences like (1) and (2) above, and is a critical component of semantic memory and mechanisms of natural language understanding, such as word sense disambiguation.

Constructing such a network comprises two phases: association and disambiguation. The goal of association is to take as input a target noun (a stimulus), and return a list of strongly related nouns—nouns that one might reasonably expect would come to mind if a person were presented with the same stimulus in a word association game.

In the association phase of network acquisition, we establish semantic relatedness between nouns by applying a novel adaptation of an information theoretic measure to co-occurrence data extracted from Wikipedia. This is a context-sparse affair that takes place in absentia of the semantic annotations of Wikipedia, such as inter-article links, entries in disambiguation pages, the title of the article from which a sentence is extracted, and so on.

In the disambiguation phase, we capitalize on salient sense clustering among related nouns to automatically resolve them to their appropriate noun senses (i.e., concepts). For our concepts, we use the noun senses defined in WordNet 3.0; thus, our work can be conceived of as augmenting the WordNet noun ontology with RelatedTo links. This seems an obvious choice for our noun sense inventory, given the WordNet ontology’s sophistication and ubiquitous use in computational linguistics and artificial intelligence.

The edges between concepts in our network indicate general semantic relatedness. Rather than tie edges to weights that we derive from co-occurrence data, which are susceptible to corpus biases, we create a network in which relatedness is represented categorically, without weight.

This mirrors the unweighted structure of WordNet. However, our network could presumably be used as a kernel to infer quantitative relatedness scores, in the same way that WordNet has been used to derive semantic similarity scores between concepts (cf. Section 2.2 below).
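As a concrete picture of what a categorical, unweighted network looks like in memory, one plausible representation is an undirected adjacency set keyed by sense labels of the noun#n form used throughout this dissertation. The class and method names below are hypothetical conveniences, not the actual storage format of the SGN.

```python
from collections import defaultdict

class RelatednessNetwork:
    """Undirected, unweighted RelatedTo edges between WordNet sense labels
    such as 'astronomer#1' and 'star#1' (illustrative data structure only)."""

    def __init__(self):
        self._adjacent = defaultdict(set)

    def add_edge(self, sense_a, sense_b):
        # Relatedness is symmetric, so store the edge in both directions.
        self._adjacent[sense_a].add(sense_b)
        self._adjacent[sense_b].add(sense_a)

    def related_to(self, sense):
        return frozenset(self._adjacent[sense])

    def is_related(self, sense_a, sense_b):
        # Categorical membership test: there is no weight to return.
        return sense_b in self._adjacent[sense_a]

network = RelatednessNetwork()
network.add_edge("astronomer#1", "star#1")
network.add_edge("astronomer#1", "telescope#1")
print(network.is_related("astronomer#1", "star#1"))   # True
```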

Figure 1.1 offers a preview of the contribution of our research. In the semantic network we have created, astronomer is related to 45 distinct concepts. A sampling of those concepts is shown in Figure 1.1, denoted by medium blue nodes adjacent to the dark blue astronomer#1 node.7 In turn, those concepts are related to concepts in light blue, and those terms are related to concepts in white. This gives an idea of spreading activation through the network.

[Figure 1.1 is a node-and-edge diagram centered on astronomer#1; the nodes shown include telescope#1, observatory#{1,2}, star#{1,3}, sky#1, solar_system#1, black_hole#1, minor_planet#1, astronomy#1, astrophysics#1, radio_astronomy#1, astrology#1, astrologer#1, horoscope#{1,2}, zodiac#2, physicist#1, nuclear_physicist#1, chemist#1, mathematician#1, mathematics#1, pure_mathematics#1, computer_science#1, logician#1, amateur#2, and others.]

Figure 1.1: Partial spreading activation view of the concepts related to astronomer.

7 We denote sense n of a noun in WordNet by noun#n, or multiple senses with, e.g., noun#{m,n}. (See Section 1.8, “Style Conventions: Words and the Concepts They Denote,” below.)

A solid edge in our graphical depiction indicates that astronomer#1 is related to the farther node incident to that edge. For example, the solid edge from star#{1,3} to sky#1 indicates that astronomer#1 is related to sky#1, too. The dotted edge from astrology#1 to horoscope#{1,2} indicates that astronomer#1 is not related to horoscope#{1,2} in our network. Many of the concepts in Figure 1.1 are interrelated in our network, but here we have omitted edges between them in order to avoid messy edge crossings.

1.3 Corpus Considerations and Co-occurrence

The correlation of lexical co-occurrence frequency to semantic association strength is well established in the literature (Church & Hanks, 1990; Spence & Owens, 1990; Wettler &

Rapp, 1993); strongly related terms tend to co-occur more frequently in texts than unrelated or weakly related terms, and those that co-occur frequently tend to be related. However, investigations into this correlation typically compare adjusted co-occurrence counts to limited sets of association norms—quantitative measurements of how strongly humans judge two words to be related (cf. Palermo & Jenkins, 1964). These studies remain silent on the question of how to establish categorical relatedness, or how to deal with spurious cases where co-occurrence frequency is incongruent with relatedness. One of the primary contributions of our work is to resolve these limitations and adapt the measurement of co-occurrence frequency for building a large-scale network of semantic relatedness.
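To give a flavor of how co-occurrence counts are turned into association scores, the sketch below computes pointwise mutual information over sentence-level co-occurrence in an invented toy corpus. PMI is used here only as a familiar stand-in; it is not the adapted measure developed in Chapter 3, and real input would be sentences drawn from Wikipedia rather than hand-picked noun lists.

```python
import math
from collections import Counter
from itertools import combinations

# Invented toy corpus: each entry stands in for the nouns of one sentence.
sentences = [
    ["astronomer", "telescope", "star"],
    ["astronomer", "star", "constellation"],
    ["astronomer", "observatory"],
    ["paparazzi", "celebrity", "camera"],
    ["convention", "position", "delegate"],
    ["convention", "star", "position"],
    ["star", "sky", "night"],
]

word_freq = Counter()
pair_freq = Counter()
for nouns in sentences:
    unique = set(nouns)
    word_freq.update(unique)
    pair_freq.update(frozenset(pair) for pair in combinations(sorted(unique), 2))

n = len(sentences)

def pmi(w1, w2):
    """Pointwise mutual information from sentence-level co-occurrence."""
    p_pair = pair_freq[frozenset((w1, w2))] / n
    if p_pair == 0.0:
        return float("-inf")
    return math.log2(p_pair / ((word_freq[w1] / n) * (word_freq[w2] / n)))

print(pmi("astronomer", "star"))   # positive: frequent, systematic co-occurrence
print(pmi("convention", "star"))   # negative: incidental co-occurrence
```

Chapter 3 refines this basic idea and adds a procedure for deciding when a quantitative score is strong enough to count as categorical relatedness.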

The corpus we use in our research, Wikipedia,8 is an online encyclopedia that has been collaboratively constructed by volunteers and contains over 4 million articles. Stripped of all

8 http://en.wikipedia.org 8 markup, , and duplicate sentences, our version of the corpus from August 2009 is 10

Gigabytes on disk and contains nearly 1.5 billion words. We have chosen Wikipedia as our target corpus from which to extract co-occurrence data primarily for its large size, broad coverage of the English language, and free availability for download on the Web. However, the co- occurrence approach we develop is not specific to Wikipedia; it can be applied to any large corpus, either to augment our existing semantic network, or to create a new one.

The encyclopedic nature of Wikipedia’s text does, however, contribute to its appeal as a candidate for relatedness mining. In order to achieve its goal of giving informative overviews of the broad range of topics it covers, an encyclopedia must explicitly articulate relationships between many strongly related entities (and so, related entities must be mentioned together in the corpus; they must co-occur). Such is the case in the following sentences from Wikipedia, which establish the relationships between, e.g., woodpeckers and tree trunks (5), pianos and keys (6), astronomers and stars (7), and rooks and the game of chess (8):9

(5) Many woodpeckers have the habit of tapping noisily on tree trunks with their beaks.

(6) The white keys of the piano correspond to the C major scale.

(7) By convention, astronomers grouped stars into constellations and used them to

track the motions of the planets and the inferred position of the sun.

(8) In chess, a rook may move any distance along a row or column.

9 Notice that many of these nouns are ambiguous: “trunk” can refer to the trunk of a car; “key” can refer to a device for opening locks (among many other things), “rook” can refer to a bird, “chess” can refer to a type of grass, and we have already discussed the ambiguity of “star.” Yet, the intended meanings of these words are clear from the noun pairs listed above, even before we examine the sentential contexts in which they co-occur.

Incidental co-occurrence of unrelated terms, like that of “convention” and “position” in

(7), is ubiquitous in natural language texts, but the relative infrequency with which any two particular unrelated terms co-occur will, in most cases, protect us from false indications of semantic relatedness. Furthermore, the related nouns in (5) through (8) must continue to co-occur throughout the corpus in order for their strong semantic association to be discovered. Their co-occurrence will likely take many different forms; related terms frequently appear together in contexts that do not explicitly articulate commonsense knowledge about their relationships, as in the co-occurrence of “ascension,” “throne,” and “regency” in the following sentence from

Wikipedia:

(9) Following the murder of King Henry IV and the ascension to the French throne by

Louis XIII, under Marie de’ Medici’s regency, Biencourt and his father were

authorized to return to Acadia.

Although (9) is not intended to inform the reader about thrones or the act of ascension, one might argue that the sentence establishes a relation implicitly through frame semantics:

Louis XIII is categorized as a monarch in WordNet, and the implicit relationship is

[[Agent Monarchs (monarch.n#1)] ascend to (ascend.v#3) [Goal thrones (throne.n#3)]]. Of course, this is not the only

(or even necessarily the best) way to define the relationship between monarchs and thrones, and the sentence gives no clear indication of how either concept relates to regency. Ultimately, it is the co-occurrence of the terms in (9)—not the context in which they appear—that contributes to the cumulative evidence found within the corpus for their relatedness.

1.4 Corpus Context and the Limitations of Pre-Specified Relations

Whereas the lexical co-occurrence approach to relatedness discovery remains agnostic to the particular context in which that co-occurrence is manifest, previous methods have capitalized on context (both lexical and syntactic) to establish particular kinds of relatedness between nouns.

Hearst (1992) established the tradition of using lexico-syntactic patterns to mine corpora for examples of a pre-specified relation (namely, in the case of Hearst, hyponymy). For example, she used the pattern NP{, NP}*{,} or other NP to establish all the former noun phrases (NPs) as hyponyms of the latter, as in “...wastebasket, trashcan, or other garbage receptacle,” where we see that wastebasket and trashcan are hyponyms of garbage receptacle. Berland and Charniak

(1999) used a similar method to discover meronymic part-whole relationships for a set of six hand-chosen wholes. Subsequent methods have used automatic pattern induction to induce search patterns from manually provided seed sets of noun pairs that typify a given relation. The induced patterns have then been used to discover new instances of the seeded relations, such as meronymy (Girju, Badulescu & Moldovan, 2006), hyponymy, and relations specific to the domain of chemistry, such as chemical reaction and production relations, among others (Pantel

& Pennacchiotti, 2006).
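A minimal sketch of the pattern matching idea: the regular expression below approximates Hearst's "NP {, NP}* {,} or other NP" pattern over plain text, using a crude two-word stand-in for real noun phrase chunking. The example sentence adapts the quoted fragment above and is otherwise invented; a practical system would rely on a part-of-speech tagger or chunker instead.

```python
import re

# Crude stand-in for an NP chunk: one or two word tokens.
NP = r"[A-Za-z][A-Za-z-]+(?: [A-Za-z][A-Za-z-]+)?"

# Approximation of Hearst's pattern: NP {, NP}* {,} or other NP
# The final NP is treated as the hypernym; the preceding NPs as its hyponyms.
PATTERN = re.compile(rf"({NP}(?:, {NP})*),? or other ({NP})")

sentence = "Please use a wastebasket, trashcan, or other garbage receptacle."

for match in PATTERN.finditer(sentence):
    hyponyms = [np.strip() for np in match.group(1).split(",")]
    hypernym = match.group(2)
    for hyponym in hyponyms:
        print(f"IsA({hyponym.upper()}, {hypernym.upper()})")
```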

Pattern matching is also the driving force behind current large-scale knowledge network acquisition projects: ConceptNet (Havasi, Speer, & Alonso, 2007; Liu & Singh, 2004a, 2004b) uses manually derived patterns to extract relationships from the semi-structured text of the Open

Mind Common Sense corpus, which contains statements of commonsense knowledge acquired from over 10,000 contributors via a Web interface, often in forms that are particularly amenable to relation instance extraction via pattern matching (Singh et al., 2002). Similarly, the Never-

Ending Language Learner (henceforth NELL) (Carlson et al., 2010) automatically induces search patterns from example seed sets to learn relationships from unstructured text from the Web. Both of these projects rely on large, pre-determined sets of relations (e.g., EffectOf, CapableOf, and

LocationOf, in the case of ConceptNet), and focus heavily on IsA relationships. Neither resource, however, attempts to establish a sophisticated ontology like that of WordNet, or to methodically delineate relationships according to individual noun senses (i.e., both resources relate words, not concepts, the name of ConceptNet notwithstanding).

The major limitation of the pattern matching approach is that it requires the pre- specification of the relation(s) to be mined. This ultimately precludes discovery of relatedness in the general case; as Quillian (1968) aptly points out, “in natural language text almost anything can be considered as a relationship, so that there is no way to specify in advance what relationships are to be needed” (p. 230, emphasis in original). Consider, for example, the strong semantic relationship between penguin and tuxedo, which defies labeling by any conventional relations. It seems unlikely that the relation that binds these concepts generalizes to a pattern that can capture other instances of the relation in a large corpus; indeed, it seems unlikely that many other examples of this particular relation exist at all. Yet, the pattern matching approach requires such examples if it is to have any chance of automatically discovering the relatedness between these two concepts.

Furthermore, Hearst (1992) found that some relations simply are not amenable to the pattern matching approach, either because they do not generalize well to patterns with broad coverage of the relation, or because they do not induce patterns that are exclusive enough to the relation to yield high precision extraction results.

In contrast to these context-driven pattern matching approaches, our focus on lexical co-occurrence allows us to establish relatedness between noun concepts regardless of the particular relation that binds them. While eschewing labeled relations in our network gives us the freedom and flexibility to associate nouns regardless of whether we can neatly articulate the relation between them, it also changes the fundamental nature of our contribution. The knowledge embedded in our network reflects a different type of commonsense knowledge than many of the associations in ConceptNet, NELL, and other networks that employ labeled relations, which explicitly codify statements of commonsense knowledge through binary relations (e.g., the correspondence of study(ASTRONOMERS, STARS) to the commonsense assertion Astronomers study stars). The relationships we discover reflect a different aspect of commonsense and lexical semantic knowledge, and can be thought of as a collection of relational kernels that underlie commonsense assertions (e.g., the relatedness of penguin and tuxedo, which stems from, but does not fully express, the commonsense assertion that The penguin’s black and white coat of feathers makes it look like it is wearing a tuxedo).

On the discovery of labeled relations between nouns, we defer to existing methods (cf. Fader, Soderland, & Etzioni, 2011); insofar as these relations tend to be expressed by verbs, discerning the relation between two arbitrary, related entities falls slightly outside the purview of our current research into more general semantic relatedness.

1.5 Using Semantic Resources to Discover Relatedness

In recent years, the availability of robust semantic resources has enabled new approaches to relatedness mining. Wikipedia has seen widespread use on this front. It might at first seem unusual to classify Wikipedia not as a corpus, but as a semantic resource. Yet, it has several structural properties and annotations that qualify it as such and make it useful for relatedness mining, including: inter-article links that can be conceived of as edges connecting article nodes in a Wikipedia graph; disambiguation pages that enumerate distinct senses of articles that share the same title; the loose organization of its articles into an informal taxonomy (a folksonomy); and structured factual assertions in articles’ info boxes, indicating, e.g., publication dates of books, movies in which actors have appeared, population sizes of cities, and so on.

These structural semantic attributes have been used to quantitatively measure relatedness between nouns or concepts (sometimes using disambiguation pages to derive concept inventories) (Gabrilovich & Markovitch, 2007; Milne & Witten, 2008a; Strube & Ponzetto,

2006; Zaragoza et al., 2007), as well as to learn categorical relationships between WordNet noun senses (Ponzetto & Navigli, 2010). Some of the semantic relations underlying Wikipedia have also been extracted to large-scale knowledge networks like DBpedia (Bizer et al., 2009) and

YAGO (Suchanek, Kasneci, & Weikum, 2007). As with lexico-syntactic pattern matching, these approaches are limited by the restricted set of semantic relations expressed in Wikipedia. Relying on links between articles as an indication of relatedness is also problematic, given the ubiquity of cross-references to tangentially related topics in Wikipedia (e.g., the link from the glacier

Wikipage to the article on Vulgar Latin).
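As a schematic example of link-based relatedness, the sketch below scores article pairs by the Jaccard overlap of their inbound-link sets. This is a deliberately simplified stand-in for measures like that of Milne and Witten (2008a), and the article names and link sets are invented; real data would come from a parsed Wikipedia dump.

```python
# Invented inbound-link sets; real ones would be extracted from a Wikipedia dump.
inlinks = {
    "Penguin": {"Antarctica", "Flightless bird", "Krill", "Zoo", "Emperor penguin"},
    "Iceberg": {"Antarctica", "Glacier", "Krill", "Titanic", "Sea ice"},
    "Vulgar Latin": {"Romance languages", "Latin", "Roman Empire"},
}

def link_overlap(article_a, article_b):
    """Jaccard overlap of inbound-link sets: a crude proxy for relatedness."""
    a, b = inlinks[article_a], inlinks[article_b]
    return len(a & b) / len(a | b)

print(link_overlap("Penguin", "Iceberg"))       # shared in-links -> nonzero score
print(link_overlap("Penguin", "Vulgar Latin"))  # no shared in-links -> 0.0
```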

Other approaches have turned to WordNet to search for relatedness. Navigli (2005) has developed a semi-automated method for creating a semantic network by disambiguating terms in definitions extracted from various semantically annotated resources, including WordNet and the

Longman Language Activator, while Hughes and Ramage (2007) and Patwardhan and Pedersen

(2006) have used IsA relations and sense glosses from WordNet to quantitatively measure semantic relatedness between concepts. These WordNet-based approaches are inherently limited by the fact that, while the ontology serves as a rich taxonomy of semantic similarity, it lacks general indications of semantic relatedness. Consider, for example, how WordNet-based approaches would discover the strong semantic relationship between ontologically disparate entities like penguins and tuxedos. For this purpose, the minimalistic glosses of WordNet are simply insufficient; if we want to discover relatedness beyond semantic similarity, beyond the most obvious examples of relatedness, we need the assistance of a sizable corpus.

In general, hand-crafting useful semantic resources is laborious work and requires some degree of training or expertise on the part of those who construct them. Such resources must provide massive amounts of data to be useful to learning algorithms, and require maintenance in order to remain relevant as language use shifts and changes over time. These limitations explain the field’s predominant focus on unsupervised or weakly supervised learning algorithms for constructing semantic resources, and inform our lexical co-occurrence approach, which does not rely on a corpus that has been semantically annotated.

1.6 Other Parts of Speech

We have thus far limited our discussion to relatedness between concepts denoted by nouns. This is not to denigrate the lexical semantic contributions of other parts of speech to semantic interpretation. Rather, verbs and adjectives are excluded from consideration in our network on the grounds that their semantic associates typically take the form of entire semantic classes rather than lexical entries (cf. Katz & Fodor, 1963). For example, the verb “eat” has a strong preference for themes (to use the technical term from ) that are categorized as food, and the adjective “tasty” has a similar preference with respect to its arguments (cf. Tanner & Gomez, 2010). Selectional constraints on arguments (often called selectional restrictions or selectional preferences) can override strong semantic relatedness between nouns, as in (10) and (11) below (from Waltz & Pollack, 1985, p. 53, and Charniak,

1983, p. 175, respectively):

(10) The sailor ate a submarine.

(11) The astronomer married the star.

In (10), the strong preference of the verb “eat” for food disambiguates the “submarine” to the hoagie sense, despite the strong relationship between sailors and the warship sense of

“submarine.” A similar restriction on the arguments of “marry” overrides the semantic relatedness between the astronomer and the celestial body sense of “star” in (11), selecting instead the sense that is a person (a movie star, celebrity, etc.). We do, however, see some interference from the astronomer–star and sailor–submarine relationships in these examples;

Waltz and Pollack (1985) report that most people perform a “cognitive doubletake” (p. 62) when encountering these sentences, which initially lead us down a “semantic garden path” (p. 64) before selectional restriction ultimately resolves the ambiguities.

Some verbs constrain their arguments more weakly than others (Resnik, 1997). Such is the case with “photograph,” which reveals some of the motivation behind our illustrative use of

The astronomer photographed the star throughout this chapter: the verb’s weak preference for people and landscapes is easily overridden by the association of the astronomer and the celestial

body sense of “star.” Even the verb’s ultimate preference for physical objects can be set aside in natural language use, as in the following sentence from Wikipedia, where the theme takes the form of an abstraction:

(12) They photographed the fall of Sevastopol in September 1855.

It is true that there are cases in which verbs strongly associate with specific nouns rather than entire noun classes, but these are typically collocative associations or lexicalized verb phrases that warrant their own entries in the lexicon. For example, the idiomatic “eat crow” has its own lexical entry in Wiktionary, but not WordNet; it would be unusual to associate the verb

“eat” in WordNet with the noun “crow,” rather than instantiating a new, metaphorical sense of the verb. This, however, falls outside the aim of our research.

Clearly, establishing relatedness between noun senses is not a panacea for all our semantic interpretation problems. However, there is already a considerable body of research on associating verbs with selectional constraints. Resnik (1997) has had some success automatically abstracting from verb-noun lexical associations to verbs’ selectional preferences for entire classes of nouns from WordNet, while Gomez (2001, 2004) has hand-crafted an ontology of verbal predicates with over 3000 verbs, associating them with selectional restrictions (in the form of

WordNet noun classes, with some modifications to the ontology; see Gomez, 2007) that are bound to thematic roles and the syntactic relations that realize them. Verb senses have been arranged into classes (Levin, 1993) and organized taxonomically in WordNet (Fellbaum, 1998).

Chklovski and Pantel (2004) have used a lexico-syntactic pattern approach to automatically acquire a semantic network called VerbOcean, which indicates labeled relations between verbs,

while Baker, Fillmore, and Lowe (1998) have produced a semantically annotated corpus of verbal frames (cf. Fillmore, 1976) called FrameNet, which has enabled natural language processing tasks such as semantic role labeling (Gildea & Jurafsky, 2002).

1.7 Outline

The remainder of this dissertation is structured as follows.

In Chapter 2, “Literature Review,” we present related research. We discuss WordNet in greater detail and present computational approaches to measuring similarity and relatedness that rely on the ontology. Other computational approaches are presented in relation to some of the cognitive literature on semantic associates and the relationship between corpus co-occurrence and semantic relatedness. We review pattern-based extraction methods for relation instance mining and examine how the semantic annotations and structural semantics of Wikipedia have been used in computational approaches to relatedness. We also discuss previous efforts to establish large-scale knowledge networks that exhibit characteristics of Quillian’s semantic memory, such as ConceptNet, NELL, DBpedia, YAGO, and CYC, to contextualize the novel contributions of our work.

In Chapters 3 and 4, “Constructing the Network: Semantic Associates of Nouns” and

“Constructing the Network: From Nouns to Concepts,” we detail the automatic, unsupervised acquisition of our semantic network. We begin in Chapter 3 with an examination of corpus co- occurrence and develop an information theoretic measure that gives a better indication of quantitative relatedness than simply counting the co-occurrence of words. An algorithm for determining categorical semantic association from these quantitative measurements of 18 relatedness is developed. Then, in Chapter 4, we present an elaborate suite of disambiguation methods that resolves related nouns in our network to noun senses in WordNet. Rather than defer evaluation of our network entirely to a separate chapter, we pause after each of these three steps

(quantitative measurement of relatedness, categorical association, and disambiguation) to perform in loco evaluation of our progress so far.

In Chapter 5, “Coarse-Grained Word Sense Disambiguation: An Application,” we evaluate the performance of our network on a word sense disambiguation task. The network is used as a plug-in knowledge source for two graph-based WSD algorithms. We compare the performance of our network on this task to that of two similar resources: WordNet (Miller, 1998) and WordNet++ (Ponzetto & Navigli, 2010). The evaluation of our network on this task serves as a supplement to the in loco evaluation performed throughout the network acquisition processes of Chapters 3 and 4.

In Chapter 6, “Measuring Human Perceptions of Relatedness,” we present a study in which we establish a new set of relatedness norms for 122 noun pairs. The relatedness scores in our study are elicited from human participants using an established methodology that has been used in multiple studies to compile gold standard similarity norms. In this chapter, we also discuss existing gold standards for quantitative, computational measures of semantic relatedness and motivate the need for a new gold standard.

In Chapter 7, “Conclusions,” we summarize the contributions of our work. We close with a discussion of the current state of our semantic network, including elaborations on some of the relationships it contains and ideas for future work.

1.8 Style Conventions: Words and the Concepts They Denote

Throughout this dissertation, we use the terms “concepts” and “word senses” interchangeably. In distinguishing between words and the concepts they denote, we quote the former and italicize the latter, appending a sense number from WordNet when appropriate. For example, we might speak of “astronomer” co-occurring with “star” frequently in a corpus, or discuss the semantic relatedness of astronomer#1 to both star#1 and star#3, the two celestial body senses of the noun “star” in WordNet. We sometimes find it convenient to refer to multiple senses of a word in a more condensed format, in which case we adopt the convention of appending a set of sense numbers (as with, e.g., star#{1,3} to refer to both star#1 and star#3).

In cases where a word’s part of speech might not be clear from the context in which it appears, we append a tag before the sense number(s): .n for nouns, .v for verbs, .a for adjectives, and .r for (as with, e.g., run.n#2 (a trial or test run) or beach.v#1 (to land on a beach)).

For typographical reasons, we present the arguments of binary relations in small caps

(e.g., PartOf(SPINDLE, SPINNING WHEEL)). This helps to distinguish arguments from surrounding copy text and from the relation itself, which is always italicized. This convention also frees us from the awkward and ungainly presentation of quoted arguments in roman type when dealing with relations in resources that associate words instead of concepts.

CHAPTER 2: LITERATURE REVIEW

In this chapter, we review existing literature on computational approaches to relatedness.

We present several WordNet-based measures of semantic similarity and relatedness (Section

2.2), studies that establish the relationship between lexical co-occurrence and semantic association (Section 2.3), and corpus-based methods that rely on lexico-syntactic pattern matching (Section 2.4) and the semantic annotations of the Wikipedia corpus (Section 2.5) to discover relatedness in semi-supervised and unsupervised settings.

In many cases, these methods have been used to acquire large-scale semantic networks.

Networks that we discuss throughout this chapter include: VerbOcean (Section 2.4.1), the Never-Ending Language Learner (also Section 2.4.1), ConceptNet (Section 2.4.2), WordNet++ (Section 2.5.4), YAGO (Section 2.5.5), and DBpedia (also Section 2.5.5). The chapter concludes with a discussion of the hand-crafted knowledge networks CYC and Freebase (Section 2.6). We begin, however, with an overview of the WordNet noun ontology.

2.1 WordNet

The WordNet noun ontology (Miller, 1998) is a hand-crafted lexical database in which noun senses are organized into an inheritance system of semantic classes. Through its instantiation of a variety of labeled edges indicating, inter alia, subsumptive IsA relationships between noun senses, WordNet constitutes a partial realization of Quillian’s (1968) dream of a computational model of semantic memory. It is by far the most extensive implementation of Quillian’s ideas to date, and is one of the most widely used resources in natural language processing (NLP). It has been employed in a variety of NLP applications, such as word sense disambiguation, coreference resolution, and measurement of semantic similarity and relatedness between words, word senses, and documents; incorporated into large-scale commonsense knowledge networks (most notably by ConceptNet); and recreated in several other languages.

Nouns in WordNet are broken up into noun senses that are then grouped into synsets—sets of noun senses grouped by synonymy.10 These synsets form the basic concept nodes of WordNet. For example, “rook” has two senses in WordNet (see Figure 2.1 below). Its first sense, rook#1, is synonymous with castle#3 (the chess piece), and the two noun senses compose the synset {castle#3, rook#1}. The second sense, rook#2, is synonymous with Corvus_frugilegus#1, a species of bird resembling a crow; together, they compose the synset {rook#2, Corvus_frugilegus#1}. As with traditional dictionaries, each synset is associated with a gloss that provides a definition of the concept it denotes.

Synsets in WordNet are coordinated through a lexical inheritance hierarchy that indicates subsumptive relationships between concepts. Superordinate terms in the taxonomy are referred to as hypernyms, while subordinate terms are called hyponyms. For example, that a rook is a bird can be expressed by an IsA relationship between the concepts: IsA(ROOK, BIRD). We say that rook is a hyponym of bird, and bird is a hypernym of rook; the hypernym is a super-class of the hyponym, and is said to subsume the hyponym.

Hypernymy and hyponymy are transitive relations, so that if rook is a hyponym of bird and bird is a hyponym of animal, we also have that rook is a hyponym of animal. The inheritance structure of WordNet thus implies that rook should inherit all general properties of animals (they eat food to acquire energy, they reproduce, and so on), as well as properties specific to birds (they have wings, lay eggs, and so on). Although WordNet does not express all of these properties of animals and birds, the ontology establishes a general inheritance framework that can be incorporated into any knowledge base that does.

10 Because indications of synonymy are incorporated into the structure of WordNet in such a fundamental way, the ontology is sometimes referred to in the literature as a thesaurus—a label that belies the power and sophistication of WordNet.

==========================================================================
rook#1: (chess) the piece that can move any number of unoccupied squares
        in a direction parallel to the sides of the chessboard

    castle, rook
        => chessman, chess piece
        => man, piece
        => game equipment
        => equipment
        => instrumentality, instrumentation
        => artifact, artefact
        => whole, unit
        => object, physical object
        => physical entity
        => entity
==========================================================================
rook#2: common gregarious Old World bird about the size and color of the
        American crow

    rook, Corvus frugilegus
        => corvine bird
        => oscine, oscine bird
        => passerine, passeriform bird
        => bird
        => vertebrate, craniate
        => chordate
        => animal, animate being, beast, brute, creature, fauna
        => organism, being
        => living thing, animate thing
        => whole, unit
        => object, physical object
        => physical entity
        => entity
==========================================================================

Figure 2.1: Lexical entries for “rook” in WordNet 3.0.

All concepts in the ontology are ultimately hyponyms of entity, which serves as the root of the hierarchy, and which has as its immediate hyponyms the dichotomous upper-level ontological concepts physical_entity#1 and abstraction#6 (as well as a third, more nebulous hyponym, thing#8, which can refer to physical or abstract things). From this dichotomous distinction between entities that are either physical or abstract flow further distinctions that categorize word senses into upper-level ontological concepts called “unique beginners.” The taxonomic categorization of these unique beginners provides the basic framework for the categorization of all nouns in WordNet.

The subsumptive structure of WordNet serves as an indication of semantic similarity between concepts. Hyponymic relationships reflect similarity directly; that a penguin is an aquatic bird implies strong similarity between the two concepts. Through transitive subsumption, we can also infer the similarity of penguin and bird, although the increased distance between these nodes suggests weaker similarity than that of penguin to aquatic bird. Sister synsets in the ontology—those that are hyponyms of the same hypernym—also bear similarity to one another.

For example, the shared subsumption of flamingo and penguin by aquatic bird suggests that they are similar entities, and that they are more similar to each other than either of them is to, say, a crow, which shares their subsumption by bird, but not by the more specific category, aquatic bird.

The noun ontology expresses other relationships between concepts in addition to synonymy and hyponymy, including antonymy, meronymy (part-whole relationships), attributes, derived forms, and domain terms, although these do not provide comprehensive indications of semantic relatedness. Miller (1998) points out, for example, that information about the game of tennis is spread across the lexical database, with nothing to link together ontologically disparate concepts like tennis players, tennis equipment, tennis courts, and so on.

2.2 WordNet-Based Measures of Similarity and Relatedness

We have already informally observed that the WordNet ontology indicates semantic similarity between concepts through shared subsumption, and that it does not always indicate more general relatedness between concepts. This distinction—that similarity is only one particular type of relatedness, and that similarity relationships expressed in WordNet give us only a restricted view of the broader landscape of semantic relatedness—is a key idea that we return to throughout this dissertation, and a fact that is well established in the literature (Agirre, Alfonseca, et al., 2009; Budanitsky & Hirst, 2006; Resnik, 1999). There are many types of relatedness beyond semantic similarity, including, but not limited to, the antonymic and meronymic part-whole relations expressed in WordNet. As Budanitsky and Hirst (2006) observe, “any kind of functional relationship or frequent association” (p. 13) can relate two entities. In this section, we review several approaches that use the WordNet ontology to quantitatively measure semantic similarity and relatedness of words and concepts.

2.2.1 Preliminaries

WordNet-based measures of similarity and relatedness are typically divided into three categories: path-based measures that treat WordNet as a graph and examine the semantic distance between concept nodes (synsets); information content measures that incorporate occurrence probabilities estimated from corpora; and gloss-based measures that turn to WordNet glosses for textual clues about relationships between synsets.

In the sections that follow, we denote measures of similarity between two concepts, c1 and c2, as sim(c1, c2), with subscripts on the function name (sim) to distinguish between measures. Relatedness between concepts is similarly denoted rel(c1, c2). For any such measure, it is common (Budanitsky & Hirst, 2006; Resnik, 1995) to find the similarity or relatedness between two words by choosing the two word senses that maximize the function’s value, i.e.:

    rel(w_1, w_2) = \max_{c_1 \in s(w_1),\, c_2 \in s(w_2)} rel(c_1, c_2)    (1)

where s(w_i) is the set of w_i’s word senses (restricted to the appropriate part of speech).
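To make the convention in (1) concrete, the following minimal sketch lifts a concept-level measure to the word level by maximizing over all sense pairs. It assumes NLTK and its WordNet data are installed, and uses NLTK’s built-in path similarity merely as a stand-in for an arbitrary rel(c1, c2); any of the measures discussed below could be substituted.

    # Minimal sketch of equation (1): lift a concept-level measure to the word
    # level by maximizing over all pairs of noun senses. Assumes NLTK's WordNet
    # interface; path_similarity is only a stand-in for an arbitrary rel(c1, c2).
    from nltk.corpus import wordnet as wn

    def word_relatedness(w1, w2, rel=lambda c1, c2: c1.path_similarity(c2)):
        """Return max rel(c1, c2) over all noun senses c1 of w1 and c2 of w2."""
        scores = [rel(c1, c2)
                  for c1 in wn.synsets(w1, pos=wn.NOUN)
                  for c2 in wn.synsets(w2, pos=wn.NOUN)]
        scores = [s for s in scores if s is not None]
        return max(scores) if scores else None

    print(word_relatedness("astronomer", "star"))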

2.2.2 Path Length and the Uniformity Problem

The simplest path-based approach to similarity is to take the length of the shortest path between two concepts in a network as a direct measure of the semantic distance between them (Lee, Kim, & Lee, 1993; Rada & Bicknell, 1989; Rada, Mili, Bicknell, & Blettner, 1989). The shorter the semantic distance, the greater the conceptual similarity. If we denote this shortest path length len(c1, c2), then we have a simple path length (PL) similarity measure:

    sim_{PL}(c_1, c_2) = L_{max} - len(c_1, c_2)    (2)

where L_max is the maximum possible path length. In the case of WordNet, this is sometimes estimated as twice the depth of the hierarchy. Jarmasz and Szpakowicz (2003) notably used (2) not with WordNet, but with the hierarchical organization of classes in Roget’s Thesaurus.
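A small sketch of the measure in (2) is given below, assuming NLTK’s WordNet interface; the constant L_MAX is an assumption standing in for twice the depth of the hierarchy, and the synsets chosen for the demonstration are arbitrary.

    # Sketch of the simple path-length similarity in (2). L_MAX is an assumed
    # constant (roughly twice the depth of the WordNet noun hierarchy); the
    # exact value depends on the WordNet version and how edges are counted.
    from nltk.corpus import wordnet as wn

    L_MAX = 40

    def sim_pl(c1, c2):
        length = c1.shortest_path_distance(c2)  # len(c1, c2), counted in edges
        if length is None:                      # no connecting path was found
            return 0
        return L_MAX - length

    dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
    print(sim_pl(dog, cat))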

==========================================================================
flamingo#1: large pink to scarlet web-footed wading bird with down-bent
            bill; inhabits brackish lakes

    flamingo
        => wading bird, wader
        => aquatic bird
        => bird
        => ...
==========================================================================
penguin#1: short-legged flightless birds of cold southern especially
           Antarctic regions having webbed feet and wings modified as
           flippers

    penguin
        => sphenisciform seabird
        => seabird, sea bird, seafowl
        => aquatic bird
        => bird
        => ...
==========================================================================
seagull#1: mostly white aquatic bird having long pointed wings and short
           legs

    gull, seagull, sea gull
        => larid
        => coastal diving bird
        => seabird, sea bird, seafowl
        => aquatic bird
        => bird
        => ...
==========================================================================

Figure 2.2: Lexical entries for flamingo#1, penguin#1, and seagull#1 in WordNet 3.0. The concepts are increasingly distant from the superordinate concept aquatic_bird#1, highlighting uniformity disparity in the ontology, where not all edges convey equal semantic distance between concepts.

As it relates to WordNet, a widely recognized problem with the path length approach is that different sections of the ontology make more fine-grained vertical distinctions between superordinate and subordinate classes than others do. Resnik (1999) frames this as the uniformity problem, because the underlying assumption of the path length measure in (2) is that all edges in the ontology indicate uniform semantic distance between concepts. This simply is not the case in WordNet. For example, flamingo, penguin, and seagull are increasingly distant from the superordinate aquatic bird class in WordNet, despite the intuitive notion that they ought to be (at least approximately) semantically equidistant from aquatic bird (see Figure 2.2 above). (It is also of peripheral interest that the distance of these concepts from bird is misaligned with the prototypicality effect (Rosch, 1978). Intuitively speaking, we would expect seagull to be closer to bird than penguin and flamingo are, because the latter two are less prototypical examples of birds.)

To address the uniformity problem, Wu and Palmer (1994) introduced a path-based measure that scaled path distance between concepts by the depth of their lowest common subsumer (LCS) in the ontology. The LCS of two concepts, denoted lcs(c1, c2), is the deepest concept in the ontology categorizing both c1 and c2, where depth is defined as the distance of a concept from the root of the ontology (entity). The scaled similarity measure of Wu and Palmer is commonly given in the form:

    sim_{WP}(c_1, c_2) = \frac{2 \times depth(lcs(c_1, c_2))}{depth(c_1) + depth(c_2)}    (3)

The role of the LCS as a scaling factor may at first seem unclear in the presentation of (3), until we realize that the denominator accounts twice for the depth of lcs(c1, c2) and once for the distance of the shortest path between c1 and c2, which must necessarily go through lcs(c1, c2), and is therefore simply len(c1, c2) (the distance of the shortest path between the two concepts). Thus, (3) can be rewritten as:

    sim_{WP}(c_1, c_2) = \frac{2 \times depth(lcs(c_1, c_2))}{2 \times depth(lcs(c_1, c_2)) + len(c_1, c_2)}    (4)
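The following sketch computes (3) directly from WordNet depths, again assuming NLTK; it takes the first lowest common hypernym NLTK returns and uses each synset’s own maximum depth, which is a simplification of the original formulation. NLTK’s wup_similarity implements the same idea with its own conventions and is printed for comparison.

    # Sketch of the Wu and Palmer measure in (3). Simplification: depths are the
    # synsets' maximum depths and are not forced to pass through the chosen LCS.
    from nltk.corpus import wordnet as wn

    def sim_wp(c1, c2):
        lcs_list = c1.lowest_common_hypernyms(c2)
        if not lcs_list:
            return 0.0
        lcs = lcs_list[0]
        return (2.0 * lcs.max_depth()) / (c1.max_depth() + c2.max_depth())

    flamingo = wn.synsets('flamingo', pos=wn.NOUN)[0]
    penguin = wn.synsets('penguin', pos=wn.NOUN)[0]
    print(sim_wp(flamingo, penguin))
    print(flamingo.wup_similarity(penguin))  # NLTK's built-in counterpart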

In a similar vein, Leacock and Chodorow (1998) developed a normalized path length measure that took into account the maximum depth of the ontology:

    sim_{LC}(c_1, c_2) = -\log \frac{len(c_1, c_2)}{2 \times \max_{c \in WordNet} depth(c)}    (5)
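A sketch of (5) appears below, under the assumption that the maximum depth of the WordNet noun hierarchy is about 20 (the exact value depends on the WordNet version and on whether nodes or edges are counted); NLTK’s lch_similarity realizes the same idea with its own conventions.

    # Sketch of the Leacock and Chodorow measure in (5). D is an assumed maximum
    # taxonomy depth; a path length of 0 is clamped to 1 to avoid log(0).
    import math
    from nltk.corpus import wordnet as wn

    D = 20

    def sim_lc(c1, c2):
        length = c1.shortest_path_distance(c2)
        if not length:
            length = 1
        return -math.log(length / (2.0 * D))

    dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
    print(sim_lc(dog, cat))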

Hirst and St-Onge (1998) developed a more elaborate path-based measure that relied not only on the IsA taxonomy of WordNet, but also on meronymic and antonymic relationships.

Because the measure includes relations beyond hyponymy, it is often considered to capture not just similarity, but relatedness. Loosely speaking, it finds paths between concepts that have no more than five edges, and those edges are assigned orientations, or directions, based on the relations they denote: hypernymic and meronymic relations are considered upward links, hyponymic and holonymic relations are considered downward links, and antonymic relations are considered horizontal. A turns function establishes a shortest path between two concepts (subject to certain technical restrictions) and indicates the number of directional changes from edge to edge along the path. The resulting measure of relatedness is given as:

    rel_{HS}(c_1, c_2) = C - len(c_1, c_2) - k \times turns(c_1, c_2)    (6)

In (6), C and k are constants; Hirst and St-Onge used 8 and 1, respectively.

2.2.3 Information Content

In contrast to the path length normalization approaches of Wu and Palmer (1994) and Leacock and Chodorow (1998), Resnik (1995) observed that similarity between concepts can be measured by “the extent to which they share information in common” (p. 448) and established information content (i.e., negative log likelihood) as a direct measure of similarity:

    sim_R(c_1, c_2) = -\log p(lcs(c_1, c_2))    (7)

where p(c) is the probability of c or one of its instances (i.e., hyponymic terms) occurring in a corpus. To estimate the probability of each WordNet noun class’s occurrence, Resnik used noun frequencies from the Brown Corpus and, in the case of polysemous nouns, distributed occurrence frequency evenly across all possible senses of those nouns:

    p(c) = \frac{freq(c)}{N}    (8)

    freq(c) = \sum_{n \in words(c)} count(n)    (9)

where words(c) is the set of all words categorized by c, count(n) is the number of times the word n occurs in the corpus, and N is the total number of nouns in the corpus, excluding those that are not represented in WordNet. For example, since the root of the ontology, entity, categorizes all nouns in the ontology, p(entity) = 1. Thus, entity has no information content; i.e., log(p(entity)) = 0, and concepts that have entity as their lowest common subsumer in the ontology are considered maximally dissimilar, while those with more specific and less frequently occurring lowest common subsumers are considered to exhibit stronger similarity.
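In practice, equations (7) through (9) are often computed from precomputed information content tables. The sketch below assumes NLTK’s wordnet_ic package and its Brown-derived IC file; any corpus-derived IC table could be substituted.

    # Sketch of Resnik's measure in (7), using a precomputed information content
    # table (an assumption about available NLTK data files: ic-brown.dat).
    from nltk.corpus import wordnet as wn, wordnet_ic

    brown_ic = wordnet_ic.ic('ic-brown.dat')

    doctor = wn.synset('doctor.n.01')
    nurse = wn.synset('nurse.n.01')

    # sim_R(c1, c2) = -log p(lcs(c1, c2)), computed here by NLTK from the table.
    print(doctor.res_similarity(nurse, brown_ic))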

One of the nice features of Resnik’s information content approach is that it can be adapted to any corpus while still leveraging the full power of the WordNet ontology, simply by recalculating p(c) from the frequency distribution of the new corpus. One of the shortcomings of the approach, however, is the even distribution of occurrence frequency across all senses of polysemous nouns. The model does not take into account the fact that some senses of a noun are more common than others, or that polysemous nouns tend to keep the same meaning when repeated throughout the same discourse (Gale, Church, & Yarowsky, 1992; Yarowsky, 1993).

Another limitation of the Resnik measure is that it does not account for semantic distance between concepts at all, so that two pairs of concepts with the same LCS are considered equally similar. For example, lcs(currency, credit card) = lcs(currency, dissertation) = abstraction, and therefore simR(currency, credit card) = simR(currency, dissertation).

In light of the latter shortcoming, Jiang and Conrath (1997) adjusted the information content framework of Resnik to re-weight the semantic distance between a child and its parent node in the WordNet graph in terms of the conditional probability p(c|parent(c)):

    dist_{JC}(c, parent(c)) = -\log p(c \mid parent(c))    (10)

Taken as a measure of semantic distance between two arbitrary concepts, Jiang and Conrath’s function accounts once again for the distance of the shortest path between those concepts. In its most common form, the measure reduces to:

    dist_{JC}(c_1, c_2) = 2 \times \log p(lcs(c_1, c_2)) - (\log p(c_1) + \log p(c_2))    (11)

Since (11) is a measure of semantic distance, smaller values of dist_{JC}(c_1, c_2) indicate greater semantic similarity.

Our final information content measure comes from Lin (1998), whose theoretical work, when applied to an IsA taxonomy like WordNet, yields the following:

    sim_{Lin}(c_1, c_2) = \frac{2 \times \log p(lcs(c_1, c_2))}{\log p(c_1) + \log p(c_2)}    (12)

One might notice that Lin’s measure bears striking similarity to Wu and Palmer’s measure, particularly when viewed in the form of (3). Lin showed that Wu and Palmer’s measure was actually a special case of (12) in which all edges between nodes in the IsA taxonomy are equally weighted. Because simLin can be considered a generalization of simWP, some comparative studies of relatedness and similarity measures (e.g., Budanitsky & Hirst, 2006) evaluate only the former and not the latter directly.
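The sketch below contrasts (11) and (12) using NLTK’s IC-based implementations (again assuming the Brown-derived IC file is available). Note that NLTK’s jcn_similarity reports the reciprocal of the Jiang-Conrath distance, so larger values indicate greater similarity for both measures.

    # Sketch contrasting Jiang-Conrath (11) and Lin (12). Assumes NLTK's
    # wordnet_ic data; jcn_similarity returns 1/dist_JC rather than the distance.
    from nltk.corpus import wordnet as wn, wordnet_ic

    brown_ic = wordnet_ic.ic('ic-brown.dat')
    c1, c2 = wn.synset('currency.n.01'), wn.synset('credit_card.n.01')

    print(c1.jcn_similarity(c2, brown_ic))  # inverse Jiang-Conrath distance
    print(c1.lin_similarity(c2, brown_ic))  # Lin similarity, in [0, 1]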

2.2.4 Gloss-Based Measures

Lesk (1986) presented a dictionary-based approach for measuring relatedness that has since been applied to WordNet sense glosses to glean more general indications of relatedness than the similarity expressed through the ontology’s IsA taxonomy. The sense glosses of WordNet serve as traditional dictionary definitions, and as such, they contain content words (nouns, adjectives, verbs, and adverbs) to which those noun senses otherwise have no direct links in the ontology. For example, the gloss of penguin#1 mentions “Antarctic regions,” providing a loose semantic link between the penguin and its natural habitat that is otherwise absent from the ontology. The relatedness measure of Lesk simply counts the number of content words in common between two dictionary definitions (in our case, WordNet sense glosses), with the natural assumption that more strongly related concepts will have more words in common between their glosses:

    rel_{Lesk}(c_1, c_2) = overlap(gloss(c_1), gloss(c_2))    (13)

where gloss(c_i) is the WordNet gloss of c_i, and overlap(str_1, str_2) counts the number of content words in common between strings str_1 and str_2. The measure can of course be used to disambiguate nouns by maximizing rel_{Lesk}(c_1, c_2) over all senses of w_1 and w_2.
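A toy rendering of (13) appears below; the stopword list is a small assumed stand-in for genuine content-word filtering, and WordNet glosses are accessed through NLTK.

    # Toy version of the Lesk overlap in (13): count content words shared by the
    # WordNet glosses of two concepts. STOPWORDS is an assumed, abbreviated list.
    from nltk.corpus import wordnet as wn

    STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "on", "to", "as",
                 "that", "with", "for", "by", "is", "are", "usually", "having"}

    def content_words(text):
        words = {w.strip('.,;()').lower() for w in text.split()}
        return {w for w in words if w and w not in STOPWORDS}

    def rel_lesk(c1, c2):
        return len(content_words(c1.definition()) & content_words(c2.definition()))

    penguin = wn.synsets('penguin', pos=wn.NOUN)[0]
    antarctica = wn.synsets('antarctica', pos=wn.NOUN)[0]
    print(rel_lesk(penguin, antarctica))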

Banerjee and Pedersen (2003) developed an extended version of Lesk’s overlap measure, sometimes referred to as “Extended Lesk” or “ExtLesk,” that works as follows: an extended gloss string, gloss_{R_j}(c_i), is defined for relation R_j and concept c_i as the concatenated glosses of all concepts (synsets) related to c_i through the relation R_j (e.g., hypernymic, hyponymic, meronymic, holonymic, troponymic, attribute, and gloss relations in WordNet). (When R_j is the gloss relation, gloss_{R_j}(c_i) returns the gloss of c_i.) Given two concepts, c_1 and c_2, the extended gloss strings gloss_{R_i}(c_1) and gloss_{R_j}(c_2) are compared for every pair of relations R_i and R_j from the set of relations given above. Instead of the traditional overlap measure of Lesk, which simply counts the number of words that the two strings have in common, Banerjee and Pedersen introduced an overlap function that awarded more points for multi-word substrings (such as open-form nouns) common to both strings (namely by squaring the number of words in each substring overlap and returning the sum of those as the relatedness score). Thus, we have:

    rel_{BP}(c_1, c_2) = \sum_{R_i \in R} \sum_{R_j \in R} overlap_{sq}(gloss_{R_i}(c_1), gloss_{R_j}(c_2))    (14)

where overlap_{sq}(str_1, str_2) implements the multi-word square-of-word-count scoring mechanism described above, and R is the set of relations listed above. We discuss the ExtLesk algorithm in further detail in Section 5.3, where we use it in conjunction with our semantic network on a word sense disambiguation task.

Patwardhan and Pedersen (2006) also used WordNet glosses to measure relatedness between concepts, but they took a more geometric approach than the overlap methods described above. Patwardhan and Pedersen established a second-order vector space from the WordNet glosses, and measured the relatedness of two concepts as the cosine of their respective gloss vectors in that space. In a more graph-oriented approach, Hughes and Ramage (2007) used random walks on the WordNet graph (a Markov chain with transition probabilities between nodes) to measure relatedness between concepts. WordNet relations between synsets were used to establish edges in their graph, and glosses were also used to induce edges between nodes.

2.2.5 Evaluation

Resnik (1999) observed that “the worth of a similarity measure is in its fidelity to human behavior, as measured by predictions of human performance on experimental tasks” (p. 95).

Toward this end, the most common gold standard evaluation of a similarity measure is its comparison to human judgments of similarity. For this purpose, most studies turn to the data from Rubenstein and Goodenough (1965) and Miller and Charles (1991). In these studies, participants rated the “similarity of meaning” of noun pairs on a scale of 0.0 (“semantically unrelated”) to 4.0 (“highly synonymous”). Rubenstein and Goodenough had participants evaluate 65 word pairs in this manner. Miller and Charles then replicated the experiment using 30 of the original 65 word pairs. Comparison to mean similarity scores from these studies has also emerged as a standard evaluation of relatedness measures in the literature, despite the fact that the authors specifically elicited similarity ratings of the noun pairs in their studies, and not ratings of semantic relatedness. We defer our critique of this particular method of evaluating relatedness measures to Chapter 6.

The similarity and relatedness measures we have discussed in this section are listed below in Table 2.1. Table 2.2 (further below) presents the results of several comparative studies evaluating the correlation of these measures to the Miller and Charles (M&C) and Rubenstein and Goodenough (R&G) data.

Table 2.1: Summary of similarity and relatedness measures presented in this section.

    Type of Measure                  Measure    Authors
    Path-Based Similarity            simPL      Rada et al. (1989)
                                     simWP      Wu and Palmer (1994)
                                     simLC      Leacock and Chodorow (1998)
                                     simJS      Jarmasz and Szpakowicz (2003)
    Information Content Similarity   simR       Resnik (1995)
                                     distJC     Jiang and Conrath (1997)
                                     simLin     Lin (1998)
    Gloss-Based Relatedness          relLesk    Lesk (1986)
                                     relBP      Banerjee and Pedersen (2003)
                                     relPP      Patwardhan and Pedersen (2006)
    Graph/Path-Based Relatedness     relHS      Hirst and St-Onge (1998)
                                     relHR      Hughes and Ramage (2007)

Table 2.2: Correlations of various similarity and relatedness measures to M&C and R&G similarity scores. Correlation data come from four comparative studies that replicated several measures. Self-reported results are included where available.

From Jarmasz and Szpakowicz (2003) (using Pearson’s product-moment correlation, r)

    Data   simPL   simLC   simJS   simR    distJC  simLin  relBP   relPP   relHS   relHR
    M&C    0.732   0.821   0.878   0.775   0.695   0.823   --      --      0.689   --
    R&G    0.787   0.852   0.818   0.800   0.731   0.834   --      --      0.732   --

From Budanitsky and Hirst (2006) (using Pearson’s product-moment correlation, r)

    Data   simPL   simLC   simJS   simR    distJC  simLin  relBP   relPP   relHS   relHR
    M&C    --      0.816   --      0.774   0.850   0.829   --      --      0.744   --
    R&G    --      0.838   --      0.779   0.781   0.819   --      --      0.786   --

From Patwardhan and Pedersen (2006) (using Spearman’s rank correlation, ρ)

    Data   simPL   simLC   simJS   simR    distJC  simLin  relBP   relPP   relHS   relHR
    M&C    --      0.74    --      0.72    0.73    0.70    0.81    0.91    --      --
    R&G    --      0.77    --      0.72    0.75    0.72    0.83    0.90    --      --

From Hughes and Ramage (2007) (using Spearman’s rank correlation, ρ)

    Data   simPL   simLC   simJS   simR    distJC  simLin  relBP   relPP   relHS   relHR
    M&C    --      --      --      --      0.653   0.625   0.869   0.888   --      0.904
    R&G    --      --      --      --      0.584   0.599   0.829   0.789   --      0.817

Self-Reported (coefficient types as listed in the original table: r, r, r, r, n/a, ρ, ρ)

    Data   simPL   simLC   simJS   simR    distJC  simLin  relBP   relPP   relHS   relHR
    M&C    --      0.740   0.878   0.791   0.828   0.834   0.67    0.91    --      0.904
    R&G    --      --      0.818   --      --      --      0.60    0.90    --      0.817

Where available, the self-reported results of the measures’ original authors are also reported. (Some authors, such as Hirst and St-Onge, did not evaluate correlation to the M&C and R&G ratings in their original studies.) In Table 2.2, the presentation of simPL from Jarmasz and Szpakowicz is their own implementation of a simple shortest path length measure using WordNet. Recall that the Jarmasz and Szpakowicz measure, simJS, is also a simple shortest path length measure, but that it evaluates paths through Roget’s Thesaurus rather than WordNet.

Some of the studies cited in Table 2.2 use Spearman’s rank correlation (ρ), while others use Pearson’s product-moment correlation (r). Pearson’s correlation measures the strength of the linear relationship between two datasets, while Spearman’s evaluates the relationship between ordered rankings of the data points, without respect to a linear relationship between values. Both coefficients range from -1.0 to 1.0 inclusively; in this setting, higher positive values indicate better agreement with the human-assigned scores, and 1.0 would indicate perfect correlation.

The results reported for individual measures vary, sometimes widely, across the literature.

Budanitsky and Hirst (2006) offer possible explanations for these discrepancies between studies: a) variations in how authors count frequency with respect to compound nouns, b) the use of different versions of WordNet, and c) the use of different corpora when harvesting word frequency data and probability distributions for WordNet classes.

In addition to examining correlation to human judgments, surveys of similarity and relatedness measures typically select an applied task to provide further evaluation of those measures. These tasks vary widely in the literature. Jarmasz and Szpakowicz (2003) used a standardized synonymy test (for each question, the system attempted to identify which one of four multiple choice answers was “nearest in meaning” to a given target word), which is of course well suited to evaluating similarity measures, but does not provide a natural testbed for evaluating relatedness measures. Budanitsky and Hirst (2006) evaluated measures—both similarity and relatedness—on a malapropism detection task (if a word like dairy bore no semantic relation to nearby words, but a word with a very similar spelling (e.g., diary) did, the latter was to be suggested as a spelling correction). Patwardhan and Pedersen (2006) employed similarity and relatedness measures in a word sense disambiguation task (Senseval-2). Hughes and Ramage (2007) restricted their evaluation to human judgment correlation on three datasets: M&C, R&G, and a third dataset, WordSim353 (Finkelstein et al., 2002), which has certain limitations as a gold standard (see, e.g., the critique of Jarmasz & Szpakowicz, 2003), and which we discuss in more detail in Chapter 6.

2.3 Lexical Co-occurrence and Semantic Association

Several studies have examined the relationship between lexical co-occurrence and semantic association. In this section, we review corpus-based approaches to semantic similarity (which tend to be distributional in nature), and semantic relatedness (which tend to rely on lexical co-occurrence frequency).

2.3.1 Distributional Approaches to Semantic Similarity

Distributional approaches have been widely employed in the literature to measure similarity of meaning as a function of the similarity of contexts in which words occur throughout a corpus (Gorman & Curran, 2006; Grefenstette, 1994; Lin, Zhao, Qin, & Zhou, 2003; Rubenstein & Goodenough, 1965; Sahlgren, 2008; Weeds, 2003; Weeds & Weir, 2006). The observation that “words which are similar in meaning occur in similar contexts” (Rubenstein & Goodenough, 1965, p. 627) harks back to the Distributional Hypothesis of Harris (1954/1985) and the oft-quoted Firthian view that “[y]ou shall know a word by the company it keeps” (Firth, 1957/1968, p. 179).

In an early empirical investigation into the distributional hypothesis, Rubenstein and Goodenough developed a measure of word similarity based on contextual overlaps, as follows:

    sim_{RG}(a, b) = \frac{|ctx(a) \cap ctx(b)|}{\min\{|ctx(a)|, |ctx(b)|\}}    (15)

where ctx(w) is the context of w (i.e., the set of all words in all sentences containing w).
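A small sketch of (15) is given below, over a toy sentence list standing in for the kind of elicited corpus described next; the sentences are invented for illustration.

    # Sketch of the Rubenstein and Goodenough contextual overlap measure in (15).
    # The sentence list is a toy stand-in for a real corpus.
    def ctx(word, sentences):
        """Set of all words appearing in any sentence that contains `word`."""
        context = set()
        for sent in sentences:
            tokens = sent.lower().split()
            if word in tokens:
                context.update(tokens)
        return context

    def sim_rg(a, b, sentences):
        ctx_a, ctx_b = ctx(a, sentences), ctx(b, sentences)
        if not ctx_a or not ctx_b:
            return 0.0
        return len(ctx_a & ctx_b) / min(len(ctx_a), len(ctx_b))

    corpus = ["the old car rolled slowly down the gravel road",
              "an automobile raced down the road past the old house",
              "the gem sparkled like a jewel in the display case"]
    print(sim_rg("car", "automobile", corpus))
    print(sim_rg("car", "jewel", corpus))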

To evaluate their measure, Rubenstein and Goodenough first had 51 human participants rate the “similarity of meaning” of 65 noun pairs on a scale of 0.0 to 4.0, with higher values indicating stronger similarity. The mean similarity scores assigned by participants for the 65 noun pairs are given below in Table 2.3. As we saw in the previous section, this dataset has become a time-honored gold standard for evaluation of computational similarity measures.

Rubenstein and Goodenough then had a separate group of individuals create a corpus of sentences containing the 48 distinct nouns represented in their 65 pairs from Table 2.3. The nouns were divided into two sets of equal size, A and B, such that the R&G noun pairs always contained exactly one term from A and one term from B. One group of participants was given the nouns in set A and asked to produce two sentences for each of them. A second group performed the same task using the nouns in set B. Participants were instructed to write sentences at least ten words in length and to use the words they were given as nouns. The resulting corpus consisted of 4,800 sentences and approximately 64,800 words.

39 Table 2.3: Subjective similarity score judgments from Rubenstein and Goodenough (1965).

     #  Word Pair                Score     #  Word Pair                Score
     1  cord        smile         0.02    34  car         journey       1.55
     2  rooster     voyage        0.04    35  cemetery    mound         1.69
     3  noon        string        0.04    36  glass       jewel         1.78
     4  fruit       furnace       0.05    37  magician    oracle        1.82
     5  autograph   shore         0.06    38  crane       implement     2.37
     6  automobile  wizard        0.11    39  brother     lad           2.41
     7  mound       stove         0.14    40  sage        wizard        2.46
     8  grin        implement     0.18    41  oracle      sage          2.61
     9  asylum      fruit         0.19    42  bird        crane         2.63
    10  asylum      monk          0.39    43  bird        cock          2.63
    11  graveyard   madhouse      0.42    44  food        fruit         2.69
    12  glass       magician      0.44    45  brother     monk          2.74
    13  boy         rooster       0.44    46  asylum      madhouse      3.04
    14  cushion     jewel         0.45    47  furnace     stove         3.11
    15  monk        slave         0.57    48  magician    wizard        3.21
    16  asylum      cemetery      0.79    49  hill        mound         3.29
    17  coast       forest        0.85    50  cord        string        3.41
    18  grin        lad           0.88    51  glass       tumbler       3.45
    19  shore       woodland      0.90    52  grin        smile         3.46
    20  monk        oracle        0.91    53  serf        slave         3.46
    21  boy         sage          0.96    54  journey     voyage        3.58
    22  automobile  cushion       0.97    55  autograph   signature     3.59
    23  mound       shore         0.97    56  coast       shore         3.60
    24  lad         wizard        0.99    57  forest      woodland      3.65
    25  forest      graveyard     1.00    58  implement   tool          3.66
    26  food        rooster       1.09    59  cock        rooster       3.68
    27  cemetery    woodland      1.18    60  boy         lad           3.82
    28  shore       voyage        1.22    61  cushion     pillow        3.84
    29  bird        woodland      1.24    62  cemetery    graveyard     3.88
    30  coast       hill          1.26    63  automobile  car           3.92
    31  furnace     implement     1.37    64  midday      noon          3.94
    32  crane       rooster       1.41    65  gem         jewel         3.94
    33  hill        woodland      1.48

Rubenstein and Goodenough used their manually-generated corpus to compare the results of their overlap measure to the mean similarity scores from their human subjects, and found that their overlap measure reliably predicted strong synonymy (values greater than 3.0 on their scale).

However, the authors observed that dissimilar nouns exhibited too much homogeneity in their contexts for the measure to make useful distinctions between “low” and “medium” similarity.

Hindle (1990) showed that predicate-argument structure could also play a useful role in measuring semantic similarity. He found that nouns exhibiting high mutual information with many of the same verbs—and, importantly, via the same grammatical relation to those verbs, such as subject or object positions—tended to be semantically similar. For example, one can establish the similarity of apples and peaches by the fact that both can be bought, sold, baked, harvested, grown, picked, sliced, eaten, and so on.

Subsequent to these early approaches, one of the most common methods of measuring semantic similarity has been cosine similarity, often categorized in the literature as a geometric similarity measure (Sahlgren, 2008; Weeds, 2003): the context of a word is represented as a normalized co-occurrence frequency vector, and the similarity of two words is simply the cosine of the angle between their representative vectors, which ranges from 0.0 (completely dissimilar) to 1.0 (contextually identical).
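The sketch below shows the geometric measure on two toy co-occurrence count vectors; the counts are invented for illustration, whereas real vectors would be harvested from a corpus and typically normalized or weighted.

    # Sketch of cosine similarity between sparse co-occurrence count vectors,
    # represented here as dicts; the counts are invented for illustration.
    import math

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in set(u) & set(v))
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        if norm_u == 0 or norm_v == 0:
            return 0.0
        return dot / (norm_u * norm_v)

    car = {"road": 12, "drive": 8, "engine": 5}
    automobile = {"road": 6, "engine": 4, "factory": 2}
    print(cosine(car, automobile))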

2.3.2 Co-occurrence Approaches to Semantic Relatedness

Other studies have established co-occurrence as an indication of semantic association (Church & Hanks, 1990; McKoon & Ratcliff, 1992), and have shown that co-occurrence frequency correlates to association strength (Chaudhari, Damani, & Laxman, 2011; Spence & Owens, 1990; Wettler & Rapp, 1993).

Spence and Owens (1990) first established this correlation by developing a relatedness measure that relied on adjusted co-occurrence frequencies from a large corpus, as follows:

    rel_{SO}(x, y) = f_c(x, y) - f_c(x, y')    (16)

where f_c(x, y) is the frequency with which y follows x within a window of c characters, and y' is a matched control word with corpus frequency and character count approximately equal to that of y.
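A sketch of the character-window count underlying (16) follows; the toy text, the control word, and the window size are assumptions for illustration only.

    # Sketch of the windowed co-occurrence count behind (16): count how often y
    # appears within `window` characters after an occurrence of x.
    def f_c(text, x, y, window=250):
        count, start = 0, 0
        while True:
            i = text.find(x, start)
            if i == -1:
                break
            if text.find(y, i, i + len(x) + window) != -1:
                count += 1
            start = i + 1
        return count

    def rel_so(text, x, y, y_control, window=250):
        return f_c(text, x, y, window) - f_c(text, x, y_control, window)

    text = "the doctor called the nurse before the nurse left the clinic"
    print(rel_so(text, "doctor", "nurse", "basket"))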

Spence and Owens compared the values produced by their measure to a subset of the Palermo and Jenkins (1964) association norms. The Palermo and Jenkins data were one of the earliest collections of association norms, and were elicited from human participants in a free word association task. Individuals in their study were presented with lists of stimulus words (200 in total) and instructed to write the first response word that came to mind for each stimulus. Participants were restricted to one response word per stimulus. In their study, Palermo and Jenkins elicited participation from 500 students (250 male, 250 female) in each of grades 4 through 8, 10, and 12, as well as 1,000 college students (500 male, 500 female). The number of people responding to a given stimulus with a particular word was taken as a direct indication of the association strength between stimulus and response.

Spence and Owens restricted their consideration to the responses of college students. Thus, the theoretical maximum value of association strength was 1,000 (all college students responding to some stimulus with the same response). They also limited their consideration to a subset of 47 stimuli, choosing words that were concrete nouns, were not frequently used as adjectives, and occurred above a frequency threshold of 1/100,000 in the Brown Corpus. From these stimuli, Spence and Owens derived their 47 noun pairs by choosing the noun from Palermo and Jenkins with the greatest response frequency for each stimulus.

The associate pairs used by Spence and Owens are presented below in Table 2.4 along with their association strength from the Palermo and Jenkins norms. (For example, that the association strength of the pair baby—boy is 107 reflects the fact that 107 out of the 1,000 college students in Palermo and Jenkins’ study gave “boy” as their first response to the stimulus word, “baby.”) Frequency of co-occurrence of those nouns is derived from the Brown Corpus.

For each stimulus, an unrelated control word is also given, along with its frequency of co-occurrence with the stimulus noun. Spence and Owens’ measure of association strength (relSO) is given in the right-most column. Co-occurrence figures in Table 2.4 are derived using a 250-character window.

From their experiments, Spence and Owens established four key results: a) semantically related words tend to co-occur more frequently in a corpus than unrelated words; b) association strength correlates to adjusted co-occurrence frequency (relSO), albeit weakly (r = 0.42, p < 0.01); c) strongly associated nouns tend to co-occur more closely than weakly associated nouns (i.e., as association strength diminishes, lexical distance between stimulus and response in a corpus increases); and d) the effects of (a) and (c) are observable even when considering co-occurrence windows up to 1,000 characters in length, and the effect of (b) is observable up to window widths of 2,000 characters.

Table 2.4: Corpus co-occurrence and adjusted co-occurrence (relSO) frequencies from Spence and Owens (1990) on select noun pairs from the Palermo and Jenkins (1964) association norms. Window size is 250 characters.

    Stimulus   Response   Association  Freq.   Unrelated  Freq.   relSO
    Word       Word       Strength             Word
    Baby       Boy        107          1       Board      0       1
    Bath       Water      264          2       Hand       0       2
    Bible      God        316          10      Door       1       9
    Boy        Girl       705          20      Land       0       20
    Bread      Butter     466          0       Pistol     0       0
    Butter     Bread      575          0       Seed       1       -1
    Carpet     Rug        311          0       Map        0       0
    Cars       Trucks     107          3       Bombers    0       3
    Chair      Table      428          2       Road       0       2
    Child      Baby       173          4       Wind       1       3
    Children   Kids       188          2       Rice       0       2
    City       Town       232          8       Table      1       7
    Cottage    House      264          2       School     0       2
    Doctor     Nurse      173          2       Basket     0       2
    Dogs       Cats       679          2       Drops      0       2
    Doors      Windows    358          0       Troops     0       0
    Earth      Dirt       143          0       Meat       1       -1
    Fingers    Hand       341          5       Night      1       4
    Foot       Shoe       255          1       Purse      0       1
    Fruit      Apple      450          0       Heel       0       0
    Girl       Boy        598          12      Land       0       12
    Hand       Foot       228          1       Song       2       -1
    Head       Hair       194          13      Food       2       11
    House      Home       230          21      Year       14      7
    King       Queen      651          1       Seed       0       1
    Lamp       Light      706          6       Church     0       6
    Lion       Tiger      216          0       Canoe      0       0
    Man        Woman      624          28      Court      5       23
    Moon       Star       236          0       Vein       0       0
    Mountain   Hill       213          0       Boat       0       0
    Music      Song       164          7       Dust       0       7
    Needle     Thread     457          3       Stove      0       3
    Ocean      Water      362          7       Hand       0       7
    Priest     Church     225          0       Room       0       0
    River      Water      286          8       Hand       0       8
    Salt       Pepper     408          7       Posture    0       7
    Sheep      Lamb       182          0       Lace       0       0
    Shoes      Feet       358          3       Word       0       3
    Soldier    Man        177          1       Time       1       0
    Stem       Flower     398          0       Giant      0       0
    Stomach    Food       242          1       Club       0       1
    Street     Road       118          1       Table      1       0
    Table      Chair      691          3       Dream      0       3
    Tobacco    Smoke      482          2       Muscle     0       2
    Whiskey    Drink      328          0       Team       0       0
    Window     Glass      216          8       Bridge     1       7
    Woman      Man        528          25      Years      5       20

The word counting approach of Spence and Owens was applied to a limited set of noun pairs, and one of its weaknesses was that it required that both nouns be given a priori in order to measure relatedness. Although they provided some evidence that unrelated pairs of nouns co-occurred infrequently, their approach gave no indication of how to deal with spurious cases of high co-occurrence frequency. Consider, e.g., the co-occurrence frequency in Table 2.4 of related nouns “house” and “woman” with the unrelated, matched control word, “year(s)” (14 and 5, respectively). In comparison, the relatively low co-occurrence frequency of, e.g., related nouns “house” and “home,” suggests that “year(s)” might have a tendency to co-occur frequently with unrelated nouns in the corpus.

Church and Hanks (1990), in contrast, were interested in mining corpora for semantic association with only the stimulus, or target, pre-specified. They treated recovery of associates as an open-ended task, examining values of all words co-occurring with a target of interest (with the restriction that word pairs must co-occur at least five times, as their measure was prone to error with lower co-occurrence frequencies). To measure association strength, and to aid in quashing noise from spurious co-occurrence of unrelated terms, Church and Hanks introduced an association ratio, which was essentially an estimate of mutual information:

    rel_{CH}(x, y) = I(x, y) = \log_2 \frac{P_w(x, y)}{P(x) P(y)}    (17)

where P_w(x, y) is the normalized co-occurrence frequency of x and y within a window of w words:

    P_w(x, y) = \frac{f_w(x, y)}{N}    (18)

where f_w(x, y) is the co-occurrence frequency of x and y (the frequency with which y follows x within a window of w words), and N is the size of the corpus (in words). Similarly, P(x) and P(y) are the normalized unigram (occurrence) frequencies of x and y:

    P(x) = \frac{f(x)}{N}    (19)

where f(x) is the raw occurrence frequency of x in the corpus.
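The sketch below estimates the association ratio in (17) through (19) over a toy token list; a realistic application would use a large corpus and the minimum co-occurrence cutoff described above.

    # Sketch of the Church and Hanks association ratio, equations (17)-(19):
    # a windowed estimate of (pointwise) mutual information over a token list.
    import math

    def association_ratio(tokens, x, y, w=5):
        N = len(tokens)
        f_x, f_y = tokens.count(x), tokens.count(y)
        f_xy = sum(1 for i, t in enumerate(tokens)
                   if t == x and y in tokens[i + 1:i + 1 + w])
        if 0 in (f_x, f_y, f_xy):
            return float('-inf')               # undefined without co-occurrence
        p_xy = f_xy / N                        # equation (18)
        p_x, p_y = f_x / N, f_y / N            # equation (19)
        return math.log2(p_xy / (p_x * p_y))   # equation (17)

    tokens = "the doctor met the nurse and the doctor thanked the nurse".split()
    print(association_ratio(tokens, "doctor", "nurse"))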

Church and Hanks observed that their association ratio, when applied to the 15 million word 1987 AP corpus and the 36 million word 1988 AP corpus, produced results that seemed intuitively appealing. For example, some of the strongest and weakest associates of “doctor” inferred by their measure are presented in Table 2.5.

Table 2.5: Strong and weak associates of “doctor” using the association ratio (relCH). Data is taken from Church and Hanks (1990).

    relCH(x, y)  freq(x, y)  freq(x)  x          freq(y)  y
    11.3         12          111      honorary   621      doctor
    11.3         8           1105     doctors    44       dentists
    10.7         30          1105     doctors    241      nurses
    9.4          8           1105     doctors    154      treating
    9.0          6           275      examined   621      doctor
    8.9          11          1105     doctors    317      treat
    8.7          25          621      doctor     1407     bills
    8.7          6           621      doctor     350      visits
    8.6          19          1105     doctors    676      hospitals
    8.4          6           241      nurses     1105     doctors
    ...
    0.96         6           621      doctor     73785    with
    0.95         41          284690   a          1105     doctors
    0.93         12          84716    is         1105     doctors

Notice that some of the strong associates in Table 2.5 are verbs and adjectives; Church and Hanks observed that their association ratio was not just useful for discovering noun-noun associations, but that it was also useful for discovering associates of other parts of speech, and for function words in addition to content words (e.g., the preposition “to” was found to be strongly associated with the verbs “alluding,” “amounted,” “relating,” “reverted,” “resorting,” and so on; the infinitival “to” was found to be strongly associated with verbs like “obligated,” “trying,” “compelled,” “supposed,” “vowing,” “tends,” “tries,” and so on). Their mutual information approach was successful at discovering meaningful verb-object relationships in both directions, as well. For example, Table 2.6 below shows the re-ordering effect of mutual information as compared with co-occurrence frequency for direct objects of the verb “drink,” as well as the same for verbs that appeared in their corpus with “telephone” as a direct object.

Table 2.6: Direct objects of the verb “drink” and verbs with “telephone” as a direct object. The re-ordering effect of the association ratio is evident in comparison to corpus co-occurrence frequencies. Data are excerpted from Church and Hanks (1990).

“What Can You Drink?”

    verb – x    object – y    relCH(x, y)  freq(x, y)
    drink       martinis      12.6         3
    drink       cup water     11.6         3
    drink       champagne     10.9         3
    drink       beverage      10.8         8
    drink       cup coffee    10.6         2
    drink       cognac        10.6         2

“What Can You Do to a Telephone?”

    verb – x    object – y    relCH(x, y)  freq(x, y)
    sit by      telephone     11.78        7
    disconnect  telephone     9.48         7
    answer      telephone     8.80         98
    hang up     telephone     7.87         3
    tap         telephone     7.69         15
    pick up     telephone     5.63         11

Although Church and Hanks did not provide a quantitative analysis of their findings or compare their results to association norms, and although they provided limited anecdotal evidence for the success of their association ratio, their work established the usefulness of information theoretic measures in navigating the complexities and noise of a large corpus to discover indications of semantic relatedness.

2.4 Lexico-Syntactic Pattern Matching

Lexico-syntactic pattern matching has seen wide use in the literature for extracting semantic relationships from text (Berland & Charniak, 1999; Etzioni et al., 2004; Girju et al., 2006; Hearst, 1992; Moldovan, Badulescu, Tatu, Antohe, & Girju, 2004; Pantel & Pennacchiotti, 2006). The technique has also been employed in question answering systems (Fleischman, Hovy, & Echihabi, 2003; Ravichandran & Hovy, 2002) and used to generate semantic lexicons (Riloff & Jones, 1999), semantic relations between verbs (Chklovski & Pantel, 2004), and large-scale semantic networks (Carlson et al., 2004; Liu & Singh, 2004a).

An early approach to discovering relatedness between nouns saw the use of lexico-syntactic patterns to harvest specific types of relations from large corpora. Hearst (1992) was the first to embark on this approach, using pattern matching to automatically discover hyponymic relationships, many of which were not articulated in the WordNet ontology. For example, the pattern NP{, NP}*{,} and other NP was used to establish all the former NPs as hyponyms of the latter, as in, “... temples, treasuries, and other important civic buildings, …” where we see that temple and treasury are hyponyms of civic building. Hearst used a set of six lexico-syntactic patterns that she manually specified in a five-step process, as follows: a) decide on a relation of interest (in Hearst’s case, hyponymy); b) list examples that typify the relation, such as hyponym(ENGLAND, COUNTRY); c) search the corpus for sentences where these terms co-occur; d) inspect those sentences and infer general patterns that associate terms under this relation; and e) use those patterns to extract further examples that typify the relation. If additional patterns are desired, we can repeat the process from step (b) using the new examples garnered from step (e).

Table 2.7: Hearst’s lexico-syntactic patterns for hyponymic relationships, with examples extracted from Wikipedia. NP↑ indicates a hypernym; NP↓ indicates a hyponym.

#     Pattern / Examples

(P1)  NP↑ such as {NP↓, }*{or|and} NP↓
      Example: creatures such as minotaurs, werewolves, and hags
          → hyponym(MINOTAUR, CREATURE)
          → hyponym(WEREWOLF, CREATURE)
          → hyponym(HAG, CREATURE)

(P2)  such NP↑ as {NP↓, }*{or|and} NP↓
      Example: such cities as Berlin, Hamburg, Merseburg, Münster, Kassel, Hannover, and Cologne
          → hyponym(BERLIN, CITY)
          → hyponym(HAMBURG, CITY)
          → etc…

(P3)  NP↓{, NP↓}*{,} or other NP↑
      Example: wastebasket, trashcan, or other garbage receptacle
          → hyponym(WASTEBASKET, GARBAGE RECEPTACLE)
          → hyponym(TRASHCAN, GARBAGE RECEPTACLE)

(P4)  NP↓{, NP↓}*{,} and other NP↑
      Example: engineering, healthcare, and other professions
          → hyponym(ENGINEERING, PROFESSION)
          → hyponym(HEALTHCARE, PROFESSION)

(P5)  NP↑, including {NP↓, }*{or|and} NP↓
      Example: block ciphers, including MARS, RC6, and Twofish
          → hyponym(MARS, BLOCK CIPHER)
          → hyponym(RC6, BLOCK CIPHER)
          → hyponym(TWOFISH, BLOCK CIPHER)

(P6)  NP↑, especially {NP↓, }*{or|and} NP↓
      Example: fruits, especially peaches, apricots, and pears
          → hyponym(PEACH, FRUIT)
          → hyponym(APRICOT, FRUIT)
          → hyponym(PEAR, FRUIT)

Table 2.7 above shows the six lexico-syntactic patterns created and used by Hearst. The table also shows sample sentence fragments we skimmed from the Wikipedia corpus using each pattern, along with the hyponymic relationships established from each of those fragments.
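To give a flavor of how such patterns are applied, the sketch below implements a naive regular-expression version of the “and other” pattern (P4) over a single invented sentence. Real implementations, including Hearst’s, match over noun-phrase chunks produced by a parser or chunker rather than raw strings, so this is illustrative only.

    # Naive regular-expression sketch of pattern (P4), "NP{, NP}*{,} and other NP".
    # Matching raw text like this is only illustrative; real systems match over
    # noun-phrase chunks.
    import re

    P4 = re.compile(r"(\w+(?:, \w+)*),? and other (\w+(?: \w+)?)")

    sentence = "The shed held wastebaskets, trashcans, and other garbage receptacles."
    match = P4.search(sentence)
    if match:
        hypernym = match.group(2)
        for hyponym in (np.strip() for np in match.group(1).split(",")):
            print(f"hyponym({hyponym.upper()}, {hypernym.upper()})")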

In a similar vein, Berland and Charniak (1999) manually derived two11 lexico-syntactic patterns for extracting meronymic (part-whole) relationships from a corpus (see Table 2.8 below) and reported some success applying them to six hand-selected target wholes: “book,” “building,” “car,” “hospital,” “plant,” and “school.”

Table 2.8: Berland and Charniak’s lexico-syntactic patterns for meronymy, with examples extracted from Wikipedia. NP↑ indicates a whole; NP↓ indicates a part.

#     Pattern / Examples

(P7)  NP↑’s NP↓
          car’s radiator  → meronym(RADIATOR, CAR)
          car’s a-pillars → meronym(A-PILLAR, CAR)
          car’s window    → meronym(WINDOW, CAR)

(P8)  NP↓ of {the|a}{ JJ|NP}* NP↑
          chassis of the car → meronym(CHASSIS, CAR)

A novel contribution of Berland and Charniak’s work was the use of log-likelihood (Dunning, 1993) and probability distribution metrics to quantify the likelihood that each extracted relation was valid (whereas Hearst took a single occurrence of a matched pattern as evidence of a hyponymic relationship). Using these metrics to rank the results they extracted from their corpus, for each target whole, Berland and Charniak took the 50 strongest meronym candidates and presented them to human judges alongside 50 unrelated terms for evaluation.

11 Berland and Charniak originally defined five lexico-syntactic patterns, but eliminated three of them when they discovered they were performing poorly in preliminary extraction trials.

They found that, among the top 50 meronyms they discovered for each of their six target wholes, 55% were valid meronyms. When restricting their consideration to only the top 20 results for each seed, they found their precision to be approximately 70%.

Table 2.9: Hyponymic (P3) and meronymic (P7, P8) extraction patterns sometimes identify context-specific relationships or typify other relations, such as PropertyOf.

#     Pattern / Examples

(P3)  NP↓{, NP↓}*{,} or other NP↑
      Example: The weft threads are usually wool or cotton, but may include silk, gold, silver, or other alternatives.
          → *hyponym(SILK, ALTERNATIVE)
          → *hyponym(GOLD, ALTERNATIVE)
          → *hyponym(SILVER, ALTERNATIVE)

(P7)  NP↑’s NP↓
          car’s performance → *meronym(PERFORMANCE, CAR)
          car’s history     → *meronym(HISTORY, CAR)
          car’s ability     → *meronym(ABILITY, CAR)

(P8)  NP↓ of {the|a}{ JJ|NP}* NP↑
          velocity of the car → *meronym(VELOCITY, CAR)
          width of the car    → *meronym(WIDTH, CAR)

Hearst, in contrast, reported difficulty in mining meronymic relations, attributing the problem to the fact that “[t]he patterns for this [part-whole] relation do not tend to uniquely identify it, but can be used to express other relations as well” (p. 542). She also observed that extracted hyponymic relationships were sometimes invalid out of context, but typically reflected, at the very least, some form of semantic relatedness. Table 2.9 above shows examples of both of these problems using one of Hearst’s patterns (P3) and both of Berland and Charniak’s patterns (P7 and P8) to extract relationships from Wikipedia. The categorization of silk, gold, and silver as alternatives is certainly a context-specific discovery, and the meronymic patterns exhibit a propensity for extracting PropertyOf relationships.

In light of Hearst’s difficulty mining meronymic relationships, Berland and Charniak explicitly attributed their relative success to the size of their corpus (the 100 million word LDC North American News Corpus). However, the authors also pointed out that the generality of their approach could not be guaranteed because the six target wholes they used in their experiments were particularly amenable to part-whole relation mining. Specifically, the target wholes were selected by the authors on the grounds that they each exhibited high rates of occurrence in the corpus; each whole was, in fact, participant to part-whole relationships; and the authors perceived that there was a strong chance of those part-whole relationships being mentioned in the corpus.

Girju et al. (2006) observed the tendency, apparent from the results in Table 2.9, for extraction patterns to “express different semantic relations in different contexts” (p. 87). For example, the pattern X with Y can express relationships of meronymy (cf. It was the girl with blue eyes), possession (cf. The baby with the red ribbon is cute), or kinship (cf. The woman with triplets received a lot of attention) (examples from Girju et al., 2006, p. 94 & 96). To improve precision and recall of extraction patterns, they introduced an approach that incorporated selectional restrictions on WordNet classes. Their system was trained on positive and negative examples in which parts and wholes were manually annotated with their corresponding WordNet noun senses. The authors used a decision tree classifier, the C4.5 induction algorithm (Quinlan, 1993), to learn rules of the form, “if X is/is not of a WordNet semantic class A and Y is/is not of WordNet semantic class B, then the instance is/is not a part-whole relation” (Girju et al., 2006, p. 96). Their approach automatically induced extraction patterns with precision and recall performance of approximately 80% on two large corpora (the Wall Street Journal and LA Times collections), but at the cost of large amounts of manually annotated training data.

Recognizing both the appeal and limitations of Girju et al.’s semi-automated approach, Pantel and Pennacchiotti (2006) developed Espresso, a “minimally supervised bootstrapping algorithm” (p. 114) for labeled relation learning and instance extraction. Their algorithm begins with manually specified seed sets of instances that exemplify a relation—typically between 10 and 15 instances. An automatic pattern induction phase then iteratively produces sets of generic patterns that, like the patterns of Berland and Charniak, are sometimes overly inclusive. However, relation instances extracted by those patterns are also subjected to what the authors call “reliable patterns” (p. 115 & 116), which are too exclusive to be used to harvest instances from a corpus (i.e., they have low recall), but which exhibit high precision and are therefore useful for evaluating the quality of extracted relation instances. Thus, Pantel and Pennacchiotti alleviated the acquisition bottleneck of lexico-syntactic pattern extraction methods in two ways: they eliminated the need to manually specify extraction patterns, and they reduced the amount of supervision or semantic annotation required for automatic pattern induction by creating a method that relied on very small sets of seed instances for each relation.

The precision of relation instance extraction by Espresso from two corpora is presented below in Table 2.10. One corpus is a collection of newswire articles (TREC), while the other is an entire college-level chemistry textbook (CHEM). The authors extracted IsA, PartOf, and succession relationships from the TREC corpus. From the CHEM corpus, they extracted instances of the domain-specific reaction and production relations, as well as the IsA and PartOf relations. For extraction from a large corpus, no effective measure of recall is feasible, and so only precision results are reported in Table 2.10.

Table 2.10: Precision of relation instance extraction by Espresso (Pantel & Pennacchiotti, 2006).

    Corpus   IsA     PartOf   succession   reaction   production
    TREC     36.2%   69.9%    49.0%        --         --
    CHEM     76.0%   50.7%    --           91.4%      55.8%

2.4.1 VerbOcean and the Never-Ending Language Learner

In the context of constructing semantic networks, the pattern matching approach to discovering relation instances has certain limitations. The first is that one must begin by specifying a relation, or a set of relations, to be mined from the corpus. This seems to violate the intuition expressed by Quillian (1968) that “a memory model must provide a way to take any two tokens and relate them by any third token, which by virtue of this use becomes a relationship” (pp. 230-231, emphasis in original). The a priori articulation of a set of primitive relations upon which to focus one’s mining efforts restricts one’s ability to discover general semantic relatedness of the type that is either difficult to express through a binary relation, or has too few examples to induce reliable extraction patterns. Consider, for example, the clear relationship between penguin and tuxedo, and the relative obscurity of the actual relation that binds them.

Nevertheless, the lexico-syntactic pattern matching approach has proven useful in constructing semantic networks that aim to express specific, restricted sets of semantic relations between entities. Chklovski and Pantel (2004), for example, used pattern matching and Web queries to create a semantic network of verb relations called VerbOcean, which expresses some relations that are not included in WordNet’s verb ontology. In their work, 29,165 pairs of potentially associated verbs were identified by applying the DIRT algorithm of Lin and Pantel (2001) to a newspaper corpus. The authors then mined the Web for relations between those verb pairs using 35 manually constructed patterns typifying five specific verb relations (some of which were asymmetric): similarity (e.g., discover :: find), strength (e.g., muffle :: silence), antonymy (e.g., pass :: fail), enablement (e.g., try :: succeed), and happens-before (e.g., birth :: death). Their relation labeling algorithm also allowed for the possibility that no relationship existed between the verbs.

Their system achieved an estimated 65.5% accuracy in assigning “acceptable” relation labels to verb pairs based on the evaluations of two judges on a sample of their results. An estimated 53% of relation labels assigned by their system were the “preferred” labels indicated by the two judges. The accuracy on individual relation labels, as well as an estimated frequency of labeling based on the 100 randomly selected verb pairs evaluated by their judges, is given below in Table 2.11.

56 Table 2.11: Accuracy of relation labeling on the five relations covered in VerbOcean.

Relation         Example              Frequency   Acceptable Tags   Preferred Tag
similarity       discover :: find     41%         63.4%             40.2%
strength         muffle :: silence    14%         75.0%             75.0%
antonymy         pass :: fail         8%          50.0%             43.8%
enablement       try :: succeed       2%          100%              100%
happens-before   birth :: death       17%         67.6%             55.9%
no relation      n/a                  35%         72.9%             72.9%

Another notable application of lexico-syntactic pattern matching to network construction is the Never-Ending Language Learner (henceforth NELL) (Carlson et al., 2010). NELL covers

55 relations that have each been manually specified with 10 to 15 positive example instances and five negative instances. The network also includes 123 categories (semantic classes), each of which has been specified with 10 to 15 seeds (i.e., class members) and five manually defined lexico-syntactic patterns indicative of membership in that class. The learning algorithms of

NELL use these training data to automatically induce patterns from Web texts, and use those patterns to extract new instances of relations and class membership. NELL then uses that newly acquired information to improve its extraction performance in subsequent iterations.

Many of the binary relations covered in NELL are quite specific. Examples include athletePlaysForTeam, ceoOfCompany, teamWonTrophy, and cityInCountry. Examples of classes specified in NELL include scientist, restaurant, magazine, cardGame, mountain, lake, museum, and city. The assertions expressed in NELL constitute a large-scale factual knowledge base, and, taken together, the class memberships expressed in NELL form a shallow IsA taxonomy.

However, there is no methodical attempt to distinguish between concepts. There are, for example, three distinct nodes for the aquatic bird sense of “flamingo,” each expressing the discovery of a different class membership—mammal:flamingo, animal:flamingo, and bird:flamingo—but nothing to unify the nodes or to distinguish them from other senses of

“flamingo” (hotel:flamingo, river:flamingo, and so on).

At the time of this writing, NELL has acquired over two million facts about which it has

“high confidence,” and is continuing to learn new facts from the Web—most of which take the form of IsA relationships between nouns, with strong emphasis on proper nouns.

2.4.2 ConceptNet and the Open Mind Common Sense Project

The Open Mind Common Sense (OMCS) project (Singh, 2002; Singh et al., 2002) was a commonsense acquisition project that leveraged the Web to crowdsource statements of commonsense knowledge from a community of online contributors. The type of knowledge expressed in the corpus might seem blatantly obvious or perhaps even trivial to humans, but is the sort of knowledge that might be necessary for machines to comprehend and reason with natural language. The OMCS corpus includes statements like taking a shower will cause you to get wet, people often take pictures at special events, and helium balloons are used to decorate for parties. In the first two years following its conception (September 2000 to August 2002), the

OMCS project acquired over 450,000 such sentences from nearly 10,000 users. By 2004,

ConceptNet had over 14,000 contributors and more than 700,000 commonsense knowledge assertions (Liu & Singh, 2004a).

OMCS contributors entered information through a Web interface using free-form natural language sentences (sometimes to tell short stories about concepts already represented in the corpus, or to provide descriptions of photos or short movie clips). In all, 25 to 30 activities were used to elicit statements of commonsense knowledge from contributors. Some of those activities, instead of allowing free-form English responses, prompted users to complete fill-in-the-blank frames such as “[Something you find in a pantry is _____ ].” These frames expressed binary relations between entities and were developed by the OMCS authors in anticipation of using pattern matching to automatically extract relationships from their corpus (see Table 2.12). For example, the preceding frame was used to establish a spatial relationship, AtLocation, between pantries and things you might find there, such as flour, cereal, and spices.

Table 2.12: Examples of OMCS frames and the relations they express (Singh, 2002).

Frame Type Example Frame Binary Relation

Functional [A hammer is for _____ ] UsedFor(HAMMER, ____)

Goals [People want _____ ] DesireOf(PERSON, ____)

Scripts [The effect of eating a sandwich is _____ ] EffectOf (EAT SANDWICH, ____)

Location [Somewhere you find a bed is _____ ] LocationOf (BED, ____)

Ontology [A typical activity is _____ ] IsA(____, ACTIVITY)

Whether responding to frames or providing free-form sentences, OMCS contributors were encouraged to use language that would be comprehensible even to children. However, because frames are particularly effective at enabling relation mining via regular expression pattern matching, and free-form sentences in the corpus were found to be less amenable to pattern matching extractions, the OMCS Web elicitation system was eventually modified to

encourage conformity to frames in all tasks; those that allowed free-form input were redesigned to inform users when their input matched a frame in the system (Singh et al., 2002).

ConceptNet (Havasi et al., 2007; Liu & Singh, 2004a, 2004b) is a large-scale semantic network of commonsense knowledge built primarily around knowledge extracted from the

OMCS corpus. The current version of ConceptNet also incorporates external resources such as the WordNet ontology12 (in its entirety), Wiktionary (a collaboratively built dictionary that, as the name implies, is a sister project to Wikipedia), and DBpedia (a semantic network derived from

Wikipedia, which we discuss below in Section 2.5.5). Through the remainder of this work, when referring to ConceptNet, we restrict our consideration to the relationships derived automatically from OMCS, and not these independent projects that have been assimilated into the network.

Statements of commonsense knowledge, called “assertions” in ConceptNet, are expressed in the network using a limited set of relations, and are manifest as labeled edges (representing these relations) between adjacent nodes (representing “semi-structured natural language fragments” (Liu & Singh, 2004b, p. 293)). The textual fragments denoted by ConceptNet’s nodes conform to certain syntactic constraints (hence they are “semi-structured”), and include not just first-order lexical entities (e.g., nouns and verbs—potentially compound—such as “penguin” or

“piggy bank”), but also second-order phrases (e.g., “shop for food” and “capable of flight”). It is the ability of these second-order phrases to denote complex actions, entities, and properties that earns them the label of “concepts” in the ConceptNet literature, although they do not conform to the stricter definition of concepts used elsewhere in the literature and throughout this dissertation; neither the phrases nor phrasal constituents in ConceptNet are disambiguated to individual word senses. (E.g., the assertions ConceptuallyRelatedTo(MONEY, MINT) and ConceptuallyRelatedTo(CANDY, MINT) are both present in the current version of ConceptNet, but there is no indication that the two “mints” here refer to different concepts.)

12 WordNet is assimilated, but not fully integrated; concepts in ConceptNet are not mapped to WordNet senses.

The assertions in ConceptNet are extracted from sentences in the OMCS corpus using a set of approximately 50 regular expression patterns that correspond to the fill-in-the-blank frames used to elicit sentences from users during the corpus’s construction (see Table 2.13 below for examples). Extracted phrases undergo a normalization phase in which constituent words are stemmed, spelling is corrected, and modals and determiners are removed. Thus, a phrase like

“eating a sandwich” is mapped to an “eat sandwich” node in ConceptNet.

Table 2.13: Three extraction patterns for mining relation instances from OMCS (Singh et al., 2002).

Example                          Pattern                                           Relation Instance
Hurricanes are powerful storms   {a|an|the}? NN {is|are}{a|an|the}? JJ? NN      →  IsA(HURRICANE, POWERFUL STORM)
A person wants to be warm        A person {does not}? want{s}? to VB JJ         →  DesireOf(PERSON, BE WARM)
Bathing requires water           NN requires {a|an} JJ? NN                      →  HasPrerequisite(BATHING, WATER)
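To make this style of extraction more concrete, the following Python sketch applies one simplified, hypothetical IsA pattern in the spirit of Table 2.13 and performs a crude normalization; the pattern, the suffix-stripping "stemmer," and the example sentence are illustrative stand-ins for the roughly 50 patterns and the normalization pipeline used in practice, not a reproduction of them.

    import re

    # One simplified, hypothetical pattern in the spirit of Table 2.13:
    #   "<NP> is/are [a|an|the] <NP>"  ->  IsA(arg1, arg2)
    ISA_PATTERN = re.compile(
        r"^(?:a|an|the)?\s*(?P<arg1>[\w ]+?)\s+(?:is|are)\s+(?:a|an|the)?\s*(?P<arg2>[\w ]+?)\.?$",
        re.IGNORECASE,
    )

    DETERMINERS = {"a", "an", "the"}
    SUFFIXES = ("ing", "s")  # crude stand-in for a real stemmer

    def normalize(phrase):
        """Lowercase, drop determiners, and crudely stem each word of a phrase."""
        words = []
        for w in phrase.lower().split():
            if w in DETERMINERS:
                continue
            for suffix in SUFFIXES:
                if w.endswith(suffix) and len(w) > len(suffix) + 2:
                    w = w[: -len(suffix)]
                    break
            words.append(w)
        return " ".join(words)

    def extract_isa(sentence):
        """Return an (IsA, arg1, arg2) triple if the sentence matches the pattern."""
        match = ISA_PATTERN.match(sentence.strip())
        if not match:
            return None
        return ("IsA", normalize(match.group("arg1")), normalize(match.group("arg2")))

    print(extract_isa("Hurricanes are powerful storms"))
    # ('IsA', 'hurricane', 'powerful storm')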

The original set of 20 relations expressed in ConceptNet 2.0 (Liu & Singh, 2004a) is given below in Table 2.14. This set of relations has subsequently been relaxed and extended.

ConceptNet 5 at one point included relations on entities discovered automatically by ReVerb

(Fader et al., 2011), which were subsequently filtered out of ConceptNet 5.1 for introducing too many unreliable statements into the network (R. Speer, personal communication, June 4, 2012).

At the time of this writing, a new filter is being implemented to selectively reintegrate assertions

from ReVerb. Table 2.15 (below on page 63) lists the occurrence frequency of relations currently expressed in the core assertions of ConceptNet 5.1. The table excludes 23 relations that are expressed fewer than ten times in the network and tend to result from extraction errors in the construction of the network (e.g., anomalous one-off relations with names like e_size, nd_like, and d_of).

Table 2.14: Relations expressed in ConceptNet 2.0, with examples (Liu & Singh, 2004a).

Relation                   Type         Example
ConceptuallyRelatedTo      K-Line       bad breath—mint
ThematicKLine              K-Line       wedding dress—veil
SuperThematicKLine         K-Line       western civilisation—civilisation
IsA                        Thing        horse—mammal
PropertyOf                 Thing        fire—dangerous
PartOf                     Thing        butterfly—wing
MadeOf                     Thing        bacon—pig
DefinedAs                  Thing        meat—flesh of animal
CapableOf                  Agent        dentist—pull tooth
PrerequisiteEventOf        Event        read letter—open envelope
FirstSubeventOf            Event        start fire—light match
SubeventOf                 Event        play sport—score goal
LastSubeventOf             Event        attend classical concert—applaud
LocationOf                 Spatial      army—in war
EffectOf                   Causal       view video—entertainment
DesirousEffectOf           Causal       sweat—take shower
UsedFor                    Functional   fireplace—burn wood
CapableOfReceivingAction   Functional   drink—serve
MotivationOf               Affective    play game—compete
DesireOf                   Affective    person—not be depressed

Table 2.15: Relations in ConceptNet 5.1 by frequency (core assertions only).

Frequency   Relation                  Frequency   Relation
126450      IsA                       5301        Desires
101642      HasProperty               5086        PartOf
60669       UsedFor                   4287        NotDesires
54105       AtLocation                4242        HasFirstSubevent
52189       CapableOf                 3967        HasLastSubevent
51198       RelatedTo                 3834        NotIsA
46551       have_or_involve           2937        NotCapableOf
28576       HasSubevent               2701        SimilarSize
25660       HasA                      1806        MadeOf
25178       HasPrerequisite           1406        DesireOf
23465       ConceptuallyRelatedTo     1313        NotHasProperty
18955       Causes                    651         CreatedBy
17130       MotivatedByGoal           406         NotHasA
12256       be_in                     330         InheritsFrom
11404       be_near                   167         SymbolOf
11104       ReceivesAction            74          HasPainIntensity
11055       be_not                    71          InstanceOf
6665        DefinedAs                 45          LocationOfAction
6357        CausesDesire              34          HasPainCharacter
5487        LocatedNear               24          NotMadeOf

It has been observed (Tandon, Melo, & Weikum, 2011) that ConceptNet contains many unreliable assertions. These often result from the ambiguity of frames presented during OMCS elicitation tasks. For example, the frame “[looking through a telescope is for _____ ]” was used to elicit information about what looking through a telescope might be used for (e.g., observing stars or looking at faraway objects), but one person responded with “astronomers,” giving rise to the assertion UsedFor(LOOK THROUGH TELESCOPE, ASTRONOMER) in ConceptNet. The frame “[You are

likely to find the Moon in _____ ]” similarly resulted in an unintended relationship when one contributor responded with “orbit around earth.” While technically true when taken as a whole, the sentence does not reflect the stationary spatial relationship intended, and we find the assertion AtLocation(MOON, ORBIT AROUND EARTH) in ConceptNet. In some cases, the frames also resulted in ontologically infeasible assertions. For example, one contributor responded to the frame “[Something you find in a quandry is _____ ]” with “people.”13 The resulting assertion,

AtLocation(PERSON, QUANDRY), still present in ConceptNet 5.1, does not indicate a place where people can be found, since a quandary is an abstraction, not a physical location.

13 ConceptNet has (separate) nodes for both QUANDARY and the commonly misspelled form, QUANDRY.

Similarly, several instances of the frame “[ _____ are sometimes _____ ]” demonstrate its unreliability for establishing hyponymic relationships (e.g., “[tuxedos are sometimes _____ ],” to which a contributor responded “called penguin suits;” IsA(TUXEDO, CALL PENGUIN SUIT) was introduced into the network accordingly).

2.5 Wikipedia-Based Approaches to Relatedness

Wikipedia has been the focus of a large body of NLP research in recent years. In addition to its free availability for download from the Web and the vast amount of natural language text it contains, its inclusion of a rich set of semantic annotations has contributed to the corpus’s appeal among NLP researchers. These semantic annotations are largely derived from the structure of

Wikipedia. For example, disambiguation pages enumerate distinct senses of articles that share the same title, giving rise to a new concept inventory for use in NLP applications; inter-article links induce relationships between articles that can be conceived of as establishing edges

between concept nodes in a semantic network; and articles’ infoboxes indicate factual assertions about named entities in a highly structured manner, giving the corpus the makings of a nascent knowledge network.

Wikipedia’s semantic offerings also include the organization of its articles into a folksonomic taxonomy, which Strube and Ponzetto (2006) describe as follows: “[R]ather than being a well-structured taxonomy, the Wikipedia category tree is an example of a folksonomy, namely a collaborative tagging system that enables the users to categorize the content of the encyclopedic entries. Folksonomies as such do not strive for correct conceptualization in contrast to systematically engineered ontologies. They rather achieve it by collaborative approximation” (p.

1419, emphasis in original).

The use of Wikipedia in NLP tasks represents, in some cases, a departure from the field’s reliance upon carefully hand-crafted ontologies for sense inventories, as well as semantic resources—like sense-tagged corpora—that make use of those ontologies. As researchers turn to

Wikipedia, they cite the limitations of resources like WordNet; Gabrilovich and Markovitch

(2007) point out that “such resources contain few proper names, neologisms, slang, and domain-specific technical terms. Furthermore, these resources have strong lexical orientation and mainly contain information about individual words but little world knowledge in general” (p. 1609).

Wikipedia has been employed in a wide variety of NLP tasks over the past decade, such as question answering (Ahn et al., 2004), named entity disambiguation (Bunescu & Paşca, 2006), and topic identification (Coursey, Mihalcea, & Moen, 2009). Augmenting the structure of

Wikipedia itself has been the subject of research as well. Mihalcea and Csomai (2007) investigated the possibility of enhancing Wikipedia articles by adding links between pages after

automatically identifying keywords in each article and disambiguating those words to their appropriate Wikipedia concepts (i.e., articles). Mihalcea (2007) developed a method for producing sense-tagged corpora using articles from Wikipedia as a sense inventory. Similarly,

Milne and Witten (2008b) proposed a machine learning method for augmenting any document with relevant links to Wikipedia articles. Ponzetto and Navigli (2009) explored graph theoretic approaches for augmenting the taxonomic organization of Wikipedia articles.

Wikipedia has also been used in quantitative measures of semantic relatedness. In the sections that follow, we present three studies that have drawn on Wikipedia for that purpose:

Strube and Ponzetto (2006) replaced WordNet with Wikipedia in several traditional quantitative relatedness measures (Section 2.5.1). Gabrilovich and Markovitch (2007) used Wikipedia article texts to establish a new vector space of Wikipedia concepts, and employed a common distributional approach to measure relatedness: the cosine of the angle between any two vectors represented in that space was used as a direct measure of the relatedness between natural language text fragments of arbitrary and unlimited length (Section 2.5.2). Milne and Witten

(2008a) turned to inter-article links to measure relatedness between terms (Section 2.5.3).

We also present the work of Ponzetto and Navigli (2010), which used inter-article

Wikipedia links to relate articles automatically, and then mapped those relationships onto noun senses from WordNet (Section 2.5.4). We conclude our discussion of

Wikipedia-based approaches with a presentation of two large-scale semantic networks that have been created by extracting semantic annotations from Wikipedia articles: YAGO (Suchanek et al., 2007) and DBpedia (Bizer et al., 2009) (Section 2.5.5).

2.5.1 Path-Based Relatedness Measures Using Wikipedia

Strube and Ponzetto (2006) used Wikipedia as the knowledge source for several relatedness measures designed for use with WordNet, including path-based (Leacock &

Chodorow, 1998; Rada et al., 1989; Wu & Palmer, 1994), information content (Resnik, 1995), and gloss overlap measures (Banerjee & Pedersen, 2003). For their paths, they traversed folksonomic categorizations of Wikipedia articles, and for glosses, they experimented with using either the first paragraph of an article or its entire text.

Table 2.16 below shows how measures from Strube and Ponzetto’s study correlated to similarity data from the M&C, R&G, and WordSim353 datasets. The results exclude noun pairs that are not covered in the WordNet ontology. The simBP results are based on ExtLesk using only the first paragraph as an article’s gloss. (Performance was negligibly lower using the article’s entire text as a gloss.)

Table 2.16: Comparison of WordNet- and Wikipedia-based similarity measures in Strube and Ponzetto (2006) showing correlation (r-values) to human similarity judgments.

             WordNet-Based                              Wikipedia-Based
Data         simPL   simWP   simLC   simR   simBP      simPL   simWP   simLC   simR   simBP
M&C          0.71    0.77    0.82    0.78   0.37       0.49    0.45    0.46    0.29   0.47
R&G          0.78    0.82    0.86    0.81   0.34       0.56    0.52    0.54    0.34   0.47
WordSim353   0.27    0.32    0.36    0.36   0.21       0.46    0.48    0.48    0.38   0.20

Strube and Ponzetto found that the Wikipedia-based measures correlated weakly to the

M&C and R&G similarity ratings; of all the approaches they tried, they achieved their best Wikipedia-based results using a simple path length measure (simPL) that took the shortest folksonomic path between two articles as a measure of relatedness (r = 0.49 and 0.56 for M&C and R&G, respectively), but these fell drastically short of the top performing WordNet-based measure, simLC (r = 0.82). They also found fairly weak correlation to the WordSim353 data, in which their best results (r = 0.48) came from the Wikipedia-based adaptation of Leacock and

Chodorow’s path-based similarity function. However, the Wikipedia-based simLC outperformed all of the WordNet-based measures the authors evaluated on WordSim353, the best of which was the WordNet-based version of simLC (r = 0.36).

2.5.2 Relatedness via Explicit Semantic Analysis with Wikipedia

Gabrilovich and Markovitch (2007) introduced Explicit Semantic Analysis (ESA) to represent arbitrary words and text fragments of any length as vectors in Wikipedia concept-space, thus circumventing the need for carefully crafted semantic resources like WordNet to enable relatedness measurements. The authors first created a vector for each Wikipage with a distributional representation of its contents based on TF-IDF (Salton & McGill, 1983). TF-IDF essentially measures the relevance of an individual word to a document by taking the term’s frequency within the document (TF) and multiplying by the inverse of the proportion of documents containing the term, or inverse document frequency (IDF). The measure is commonly given as:

\mathrm{tfidf}(t, d, D) = \frac{f(t, d)}{\max_{w \in d} f(w, d)} \times \log \frac{|D|}{|\{d' \in D : f(t, d') > 0\}|}     (20)

where t is the term in question, d is the current document (a Wikipage), D is the collection of all documents (all pages in Wikipedia), and f(t, d) is the frequency of t in d.
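The following short Python sketch is a direct, simplified transcription of (20) over a toy token-list corpus; the three "documents" and the whitespace tokenization are placeholders, not part of Gabrilovich and Markovitch's system.

    import math
    from collections import Counter

    def tfidf(term, doc, docs):
        """TF-IDF as in (20): normalized term frequency times log inverse document frequency."""
        counts = Counter(doc)
        if not counts or counts[term] == 0:
            return 0.0
        tf = counts[term] / max(counts.values())        # f(t,d) / max_w f(w,d)
        df = sum(1 for d in docs if term in d)          # |{d' in D : f(t,d') > 0}|
        if df == 0:
            return 0.0
        return tf * math.log(len(docs) / df)

    # Toy corpus: each "document" is a list of tokens standing in for a Wikipage.
    docs = [
        "the astronomer observed the star through the telescope".split(),
        "the economist studied the market".split(),
        "the star is a luminous ball of plasma".split(),
    ]

    print(round(tfidf("telescope", docs[0], docs), 3))  # high: a rare, document-specific term
    print(round(tfidf("the", docs[0], docs), 3))        # 0.0: "the" occurs in every document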

To represent arbitrary words and text fragments as vectors, Gabrilovich and Markovitch multiplied weightings for each word of a fragment’s TF-IDF vector by pre-computed vectors indicating the word’s “strength of association” (p. 1607) with each Wikipage, based on the TF-

IDF representations constructed in the previous step. The resulting text fragment representation was a vector of N weights, where N is the number of Wikipedia articles represented in the system. The vector essentially oriented the text fragment in Wikipedia concept-space.

Gabrilovich and Markovitch then calculated the relatedness between two text fragments by taking the cosine of their representative vectors in this concept space.
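A minimal sketch of the ESA idea follows, with several simplifications: the per-article TF-IDF weight is used directly as a word's "strength of association" with that article, a fragment's vector is just the sum of its words' association vectors, and the toy articles are made up. Names like word_vector and fragment_vector are ours, not the authors'.

    import math
    from collections import Counter

    # Toy stand-ins for Wikipedia articles (each a list of tokens).
    articles = [
        "the astronomer observed the star through the telescope".split(),
        "the economist studied the market and the bank".split(),
        "the star is a luminous ball of plasma".split(),
    ]

    def tfidf(term, doc, docs):
        counts = Counter(doc)
        if counts[term] == 0:
            return 0.0
        df = sum(1 for d in docs if term in d)
        return (counts[term] / max(counts.values())) * math.log(len(docs) / df) if df else 0.0

    def word_vector(word):
        """The word's association with each article, i.e., its position in concept-space."""
        return [tfidf(word, a, articles) for a in articles]

    def fragment_vector(text):
        """ESA-style fragment vector: the sum of the concept vectors of its words."""
        vec = [0.0] * len(articles)
        for w in text.lower().split():
            for i, weight in enumerate(word_vector(w)):
                vec[i] += weight
        return vec

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Related fragments share supporting articles; unrelated ones do not.
    print(cosine(fragment_vector("telescope star"), fragment_vector("astronomer")))
    print(cosine(fragment_vector("telescope star"), fragment_vector("bank market")))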

The authors showed that their quantitative measurements of semantic relatedness between nouns correlated strongly to human similarity judgments from the M&C and R&G data

(r = 0.723 and 0.816, respectively, using Pearson’s correlation) and the WordSim353 collection

(ρ = 0.75 using Spearman’s rank correlation). The authors also found strong correlation

(r = 0.72) to human rankings of the relatedness of entire documents from the Australian

Broadcasting Corporation’s news mail service (Lee, Pincombe, and Welsh, 2005).

2.5.3 Measuring Relatedness from Inter-Article Links in Wikipedia

Milne and Witten (2008a) proposed two measures—one of similarity and one of relatedness—that, instead of relying on the folksonomic categorization of Wikipedia articles, capitalized exclusively on inter-article links. The first measure, like that of Gabrilovich and

Markovitch (2007), represented Wikipedia articles as weighted term vectors. In the Milne and Witten conception, an article’s vector is a sequence of weighted link probabilities. If a source article s links to a target article t, the weight of that link in the vector representation of s is given as:

w(s \to t) = \log \frac{|D|}{|D_{\to t}|} \text{ if } s \in D_{\to t}, \text{ and } 0 \text{ otherwise}     (21)

where, as before, D is the set of all documents in Wikipedia (all Wikipedia articles), and D→t is the collection of all documents linking to article t. The intuition behind (21) is that a link from s to t is more meaningful to the representation of s when there are few articles in Wikipedia that link to t. If links to t are common throughout the corpus, the weight of the link’s significance to the representational vector of s is diminished. Of course, if there is some article t that s does not link to, its corresponding weight in the vector representation of s is zero. The cosine of the angle between any two such vectors is then taken as a measure of the similarity between the articles they represent.
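As an illustration of (21) and the cosine step, the sketch below builds link-weight vectors over a tiny, invented link graph; the article names and link structure are hypothetical.

    import math

    # Hypothetical link graph: article -> set of articles it links to.
    links = {
        "Astronomer": {"Telescope", "Star", "Science"},
        "Astrologer": {"Star", "Zodiac", "Science"},
        "Economist":  {"Market", "Science"},
        "Telescope":  {"Star"},
        "Star": set(), "Science": set(), "Zodiac": set(), "Market": set(),
    }

    D = list(links)                                                   # all articles
    in_links = {t: {s for s in links if t in links[s]} for t in D}    # D_{->t} for each t

    def weight(s, t):
        """w(s -> t) from (21): log(|D| / |D_{->t}|) if s links to t, and 0 otherwise."""
        if t not in links[s] or not in_links[t]:
            return 0.0
        return math.log(len(D) / len(in_links[t]))

    def link_vector(s):
        return [weight(s, t) for t in D]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Articles with shared, informative outgoing links score higher than unrelated ones.
    print(round(cosine(link_vector("Astronomer"), link_vector("Astrologer")), 3))
    print(round(cosine(link_vector("Astronomer"), link_vector("Economist")), 3))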

For their second measure, Milne and Witten adapted the Normalized Google Distance measure of Cilibrasi and Vitanyi (2007) to measure the semantic distance between two articles as follows:

\mathrm{dist}_{MW}(a, b) = \frac{\log(\max(|D_{\to a}|, |D_{\to b}|)) - \log(|D_{\to a} \cap D_{\to b}|)}{\log(|D|) - \log(\min(|D_{\to a}|, |D_{\to b}|))}     (22)

where a and b are two Wikipedia articles. As before, D is the set of all articles in Wikipedia, D→a is the set of articles linking to a, and D→b is the set of articles linking to b. The intuition behind

(22) is that an article that links to both a and b provides evidence of the relatedness of a and b, whereas the frequent occurrence of articles linking to either a or b, but not the other, suggests

that articles a and b bear a weaker relationship to one another, or no relationship at all. Recall that semantic distance is inversely related to semantic relatedness.
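A corresponding sketch of (22), again over invented incoming-link sets; the article names, in-link sets, and corpus size are placeholders. When two articles share no in-links, the distance is treated here as infinite.

    import math

    # Hypothetical incoming-link sets, D_{->a}, for a few articles.
    in_links = {
        "Astronomer": {"Telescope", "Observatory", "Star", "Galaxy"},
        "Astrologer": {"Zodiac", "Star", "Horoscope"},
        "Economist":  {"Market", "Bank"},
    }
    TOTAL_ARTICLES = 1000   # |D|, a made-up corpus size

    def dist_mw(a, b):
        """Semantic distance from (22); smaller values indicate more closely related articles."""
        d_a, d_b = in_links[a], in_links[b]
        shared = d_a & d_b
        if not shared:
            return float("inf")   # no article links to both a and b
        numerator = math.log(max(len(d_a), len(d_b))) - math.log(len(shared))
        denominator = math.log(TOTAL_ARTICLES) - math.log(min(len(d_a), len(d_b)))
        return numerator / denominator

    print(round(dist_mw("Astronomer", "Astrologer"), 3))   # finite: they share the in-link "Star"
    print(dist_mw("Astronomer", "Economist"))              # inf: no shared in-links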

To measure the relatedness between two terms, Milne and Witten took the average values of the two functions above (adjusted, of course, to invert values of distMW). Compared to

Gabrilovich and Markovitch’s (2007) ESA approach, the link-based method of Milne and Witten is faster because the text of articles is ignored. The authors point out that the use of inter-article links is also more reliable than measuring distributional similarity of article text because links are manual annotations that have been explicitly inserted and disambiguated by human contributors.

However, Milne and Witten concede that an advantage of ESA is that it can measure the relatedness of natural language fragments of any length, and that the Strube and Ponzetto (2006) and Milne and Witten approaches “are not so easily extended” (Milne & Witten, 2008a, p. 29).14

Table 2.17: Comparison of three Wikipedia-based relatedness measures on the basis of their correlation to human similarity judgments.

Data         Strube and Ponzetto (2006)   Gabrilovich and Markovitch (2007)   Milne and Witten (2008a)
M&C          0.49                         0.72                                0.70
R&G          0.56                         0.82                                0.64
WordSim353   0.48                         0.75                                0.69

14 Gabrilovich and Markovitch (2007) used Spearman’s rank correlation to evaluate their results on WordSim353. All other values in the table are coefficients from Pearson’s product-moment correlation (r-values).

A summary of results for the three Wikipedia-based relatedness studies presented in this section is given above in Table 2.17. Values reported are coefficients of correlation between the

authors’ measures and the gold standard datasets of human similarity score judgments.

Gabrilovich and Markovitch’s correlation to the WordSim353 data was calculated using

Spearman’s rank correlation (ρ). All other coefficients of correlation are Pearson’s r-values. For

Strube and Ponzetto’s results, we present the top performing measure for each dataset. We see that the high precision of the inter-article links used by Milne and Witten does not offer a competitive advantage over the vast amount of textual data harnessed by Gabrilovich and

Markovitch’s ESA approach, which achieves the best performance on each dataset.

2.5.4 WordNet++

Ponzetto and Navigli (2010) developed a semantic network called WordNet++

(henceforth WN++) that, like ours, categorically relates WordNet noun senses. They used semantic annotations underlying the Wikipedia corpus to build the network, first mapping the titles of Wikipages (i.e., Wikipedia articles) to WordNet noun senses, and then establishing

semantic links between concepts in the following way: if some Wikipage, w1, links to a second

Wikipage, w2, and the pages have been disambiguated to WordNet noun senses μ(w1) and μ(w2)

respectively, then the edge (μ(w1), μ(w2)) is added to WN++.

Ponzetto and Navigli mapped Wikipages to WordNet concepts by first establishing disambiguation contexts for them. The disambiguation context of some Wikipage, w, is ctx(w) and consists of sense labels, links, and categories from Wikipedia. Sense labels are the parenthetical categories that follow article titles to distinguish them from articles with the same name (e.g., the “operating system” in “Android_(operating system),” which distinguishes the

article from “Android_(robot)”). If w has a sense label, all words from that label are included in ctx(w). Links are the lemmas (titles without sense labels) of all Wikipages to which w links.

Categories are the syntactic heads of folksonomic Wikipedia classes to which an article belongs.

For example, the article for “The Catcher in the Rye” is categorized by “Novels by J. D.

Salinger,” the syntactic head of which is simply “novel.”

The disambiguation context of a WordNet noun sense s, denoted ctx(s), consists of all nouns represented in s’s synset, all nouns represented in its hypernymic, hyponymic, and sister synsets,15 and all lemmatized content words (nouns, verbs, adjectives, and adverbs) from the gloss of s.

WN++ disambiguates w to μ(w), the sense of that noun in WordNet with the maximum number of content words in common between their respective contexts:

\mu(w) = \arg\max_{s \in Senses_{WN}(w)} p(s \mid w) = \arg\max_{s} \frac{p(s, w)}{p(w)} = \arg\max_{s} p(s, w)     (23)

In (23), p(w) is a normalization factor that can be discarded without impacting which sense s is returned. The probability function in (23) is given as:

p(s, w) = \frac{\mathrm{score}(s, w)}{\sum_{s' \in Senses_{WN}(w),\, w' \in Senses_{Wiki}(w)} \mathrm{score}(s', w')}     (24)

where score(s, w) returns the number of content words that strings ctx(s) and ctx(w) have in common, with an additive smoothing factor of 1:

\mathrm{score}(s, w) = |ctx(s) \cap ctx(w)| + 1     (25)

15 Sister concepts in WordNet are concepts that share the same immediate hypernym.

There are a few caveats to sense assignment by this method. First, (23) does not return a

result in the event of a tie, and a link from article w1 to w2 in Wikipedia can only induce a

relationship in WN++ if both μ(w1) and μ(w2) are non-empty. Second, if an article title w is unambiguous (i.e., monosemous) in both Wikipedia and WordNet, the Wikipage is mapped to the only possible WordNet noun sense. Furthermore, if d is such a page (monosemous in Wikipedia and WordNet), and d redirects to some Wikipage w, and w is one of the nouns represented in the synset μ(d), then μ(w) = μ(d).
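The following sketch illustrates the overlap-based mapping of (23)-(25) for a single ambiguous title, using hand-made toy contexts in place of the real ctx(·) construction and ignoring the tie and redirect caveats just described; the sense keys and the "operating system" sense of android are fabricated for the example (WordNet has no such sense).

    # Toy disambiguation contexts standing in for the real ctx(.) construction.
    wikipage_ctx = {
        "Android_(operating system)": {"mobile", "google", "software", "phone", "linux"},
        "Android_(robot)": {"robot", "humanoid", "machine", "fiction"},
    }

    wordnet_ctx = {
        "android%1": {"robot", "automaton", "humanoid", "machine"},        # the robot sense
        "android%2": {"software", "system", "google", "phone", "mobile"},  # a fabricated OS sense
    }

    def score(sense, page):
        """score(s, w) from (25): context overlap plus an additive smoothing factor of 1."""
        return len(wordnet_ctx[sense] & wikipage_ctx[page]) + 1

    def mu(page, candidate_senses):
        """mu(w) from (23): the candidate sense with the largest score.

        The normalization in (24) is omitted because it does not change the argmax.
        """
        return max(candidate_senses, key=lambda s: score(s, page))

    print(mu("Android_(operating system)", ["android%1", "android%2"]))   # -> android%2
    print(mu("Android_(robot)", ["android%1", "android%2"]))              # -> android%1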

Ponzetto and Navigli evaluated their disambiguation algorithm by comparing their results to a set of 505 manually disambiguated Wikipedia page titles.16 By this measure, the precision and recall of their disambiguation algorithm are P = 81.9% and R = 77.5%. Their algorithm mapped 81,533 Wikipages to WordNet noun senses and induced 1,902,859 links between

WordNet concepts. These links, combined with the entire WordNet ontology, constitute the semantic network called WN++, but in this section we distinguish between the links derived from

Wikipedia and the union of those links with WordNet, referring to the former as “WN++ (stand-alone)” and the latter as “WN++ (with WordNet).”

The authors evaluated the concept-to-concept relationships in their network by employing it in two word sense disambiguation tasks. The first task was the SemEval-2007 coarse-grained

English all-words task (Navigli, Litkowski, and Hargraves, 2007), in which WN++ was used in two graph-based disambiguation algorithms: ExtLesk (Banerjee & Pedersen, 2003) and Degree

Centrality (Navigli & Lapata, 2010). (These algorithms are described in detail in Chapter 5, where we subject our network to the same evaluation in order to compare its performance to that of WN++.)

16 The authors originally selected 1,000 articles for manual disambiguation. 495 of those had no correct corresponding noun sense in WordNet and were excluded from the gold standard dataset on those grounds. There is no indication of how frequently the authors’ approach assigns a WordNet noun sense to w when in fact no accurate mapping is possible.

Table 2.18 shows disambiguation results on this task for WordNet, WN++ (stand-alone),

and WN++ (with WordNet). The F1 measure given is the harmonic mean of precision and recall:

F1 = 2PR / (P + R). The results reported for Degree Centrality use a refined version of WN++ that contains only 79,422 of WN++’s strongest relationships. The refined subset was created in an unsupervised setting by Ponzetto and Navigli specifically for use with Degree Centrality when they discovered that WN++ had too many weak associations to perform well with the algorithm.17

Table 2.18:

Results (F1 scores) for WordNet and WN++ on the SemEval-2007 coarse-grained WSD task, as reported by Ponzetto and Navigli (2010).

Algorithm           Baselines   WordNet (stand-alone)   WN++ (stand-alone)   WN++ (with WordNet)
ExtLesk             --          68.3                    72.0                 75.4
Degree Centrality   --          74.5                    57.4                 79.4
MFS                 77.4        --                      --                   --
Random              63.5        --                      --                   --

17 These results come from the use of the refined version of WN++.

Ponzetto and Navigli’s second experimental task used the same disambiguation algorithms, but involved WSD in domain-specific corpora (sports and finance) from Koeling,

McCarthy, and Carroll (2005). WN++ (with WordNet) was evaluated, but not WN++ (stand-alone). The results on the domain-specific data, while markedly lower than those achieved on the

SemEval-2007 task, were in line with the performance of other knowledge-based WSD

algorithms on the same data, such as those of Agirre, de Lacalle, and Soroa (2009, as cited in

Ponzetto & Navigli, 2010) (see Table 2.19).

Table 2.19:

Results (F1 scores) for WN++ (with WordNet) on domain-specific WSD in the domains of sports and finance, as reported by Ponzetto and Navigli (2010).

Algorithm           Sports Domain Corpus   Finance Domain Corpus
ExtLesk             40.1                   45.6
Degree Centrality   42.0                   47.8
MFS Baseline        19.6                   37.1
Random Baseline     19.5                   19.6

2.5.5 Directly Extracting Semantic Relationships from Wikipedia

Many of the semantic annotations from Wikipedia have been extracted directly into large-scale knowledge networks. Suchanek et al. (2007), for example, derived a semantic network called YAGO from the underlying structure of Wikipedia articles. In particular, they derived facts from the IsA article folksonomy and assertions within articles’ infoboxes.

Infoboxes in Wikipedia provide structured information about the subject of an article. For example, the infobox for Wikipedia’s The Catcher in the Rye page explicitly lists the book’s author (J. D. Salinger), cover artist, country of publication, the novel’s original publication language (English), its genre (i.e., “Novel”), publisher, publication date, media type (i.e., “Print

(hardback & paperback)”), number of pages, and identifying ISBN and OCLC numbers.

Furthermore, The Catcher in the Rye is folksonomically categorized in Wikipedia under 1951

novels; American bildungsromans; Debut novels; Little, Brown and Company books; Novels by

J.D. Salinger; Novels set in New York City; Novels set in Pennsylvania; and 1949 in fiction.

Suchanek et al. manually established heuristics for 170 frequently occurring infobox attributes that allowed for the extraction of those data to their network. WordNet classes are also incorporated into YAGO, but the network excludes WordNet’s proper nouns, preferring instead to rely on Wikipedia as its source of information about named entities. YAGO then attempts to perform automatic hyponymic mappings of Wikipedia concepts to upper-level WordNet concepts.

Over 73% of the facts in YAGO are encompassed by its isCalled, type, and means relations, which are indicative of semantic similarity between entities. Among its most frequent relations beyond those indicating similarity are specific relationships such as bornOnDate, diedOnDate, hasPopulation, bornInLocation, actedIn, directed, and writtenInYear.

Bizer et al. (2009) similarly extracted structured information from Wikipedia into a semantic network called DBpedia. They established an ontology of infobox templates linking

350 frequently occurring infobox attributes into an ontology of 170 infobox classes, as some attributes are expressed in infoboxes in multiple ways. The resultant network establishes relationships between 2.6 million entities. Unlike YAGO, DBpedia does not incorporate

WordNet classes or attempt to perform mappings between WordNet and Wikipedia concepts.

2.6 Hand-Crafted Knowledge Networks

We have so far seen that large-scale knowledge networks can be created by a variety of unsupervised and semi-supervised methods, including the application of lexico-syntactic pattern matching to collaboratively constructed corpora (e.g., the acquisition of ConceptNet from the

OMCS corpus) and the Web (as with the acquisition of VerbOcean and the on-going development of NELL). Other approaches have leveraged the semantic annotations of

Wikipedia, such as inter-article links (as with WN++) and infobox attributes (as with YAGO and

DBpedia), to establish semantic networks, sometimes performing mappings of concepts to

WordNet classes (as with WN++ and, to some degree, YAGO). Despite the fact that many of these approaches rely on human contributions and manually annotated semantic resources, the machine learning methods used to construct them are error prone, as are, in some cases, the data being mined.

Many researchers have turned to hand-crafting knowledge bases, trading time-intensive knowledge crafting for assurances that their resources represent information with higher degrees of accuracy than automatically acquired resources. The WordNet ontology is perhaps the most obvious example of a hand-crafted semantic resource; it is the result of decades of careful knowledge crafting efforts, and has enjoyed ubiquitous use in the field of computational linguistics. In this section, we provide brief overviews of two other hand-crafted knowledge networks: CYC and Freebase.

CYC (Lenat, 1995) is a large-scale network with millions of assertions of commonsense knowledge. Much like ConceptNet, CYC expresses a variety of labeled relations between entities. However, CYC uses a deeper representation based on first-order predicate calculus and inference mechanisms, and therefore requires contributors to have some degree of expertise in formal logic. The knowledge in CYC has been hand-coded into the network over the course of nearly three decades and is less prone to the kinds of acquisition and extraction errors that

give rise to malformed natural language expressions in ConceptNet and result in mislabeled relationships between entities. The expressive power of CYC also exceeds that of

ConceptNet. It encodes commonsense assertions beyond ConceptNet’s restricted set of binary relations, including, for example, “You have to be awake to eat” and “You can usually see people’s noses, but not their hearts” (Lenat, 1995, p. 33). Another notable difference between

CYC and ConceptNet is that CYC establishes an upper ontology that provides for the sound categorization of entities through IsA relationships. CYC also includes broad coverage of named entities, much like YAGO and DBpedia. Naturally, CYC does not restrict itself to WordNet’s noun sense inventory, although preliminary attempts have been made at integrating WordNet into

CYC (Reed & Lenat, 2002), as well as mapping CYC concepts to Wikipedia articles (Medelyan

& Legg, 2008).

Freebase (Bollacker, Evans, Paritosh, Sturge, & Taylor, 2008; Bollacker, Tufts, Pierce, &

Cook, 2007) is a large-scale knowledge base constructed collaboratively by online contributors through a Web interface. By crowdsourcing information from a vast array of contributors through the Web, Freebase has alleviated the acquisition bottleneck associated with the hand-crafting of knowledge networks and has seen tremendous growth in the short time since its conception. At the time of this writing, the knowledge base provides information about over 23 million entities, expressed through structured node properties and relationships. Entities in

Freebase are organized into an upper ontology of classes (called “types” in Freebase) that rather resembles the folksonomic structure of Wikipedia; contributors are free to create new types as they see fit, and entities can be categorized by any number of types. In Freebase, “[r]ather than ontological correctness or logical consistency,” the focus is on “collaborative creation of

structure” (Bollacker et al., 2007, p. 24); this concession to folksonomy and structural flexibility is the payment Freebase has made for the prodigious rate at which it has expanded.

Contributors create new entities in Freebase first by assigning them type categorizations in the upper ontology. Each type is associated with a schema that indicates attributes of that type, and, in some cases, type restrictions that operate like selectional restrictions on values for those attributes. Users are prompted to fill in attribute values for new entities, and auto-complete fields help them to assign permissible values.

Node structure in the Freebase graph is highly flexible, as well; node properties and relationships in Freebase are open classes that can be modified by contributors. Since contributors are explicitly prompted to assign attribute values to entities, assertions in Freebase tend to have high accuracy. However, all modifications to the Freebase graph are attributed to the individuals who make them, so that material provided by abusive contributors can be filtered out by end users. (Compare this to the indirect approach used to elicit information in OMCS, from which ConceptNet derives its assertions. Frames presented to users in the acquisition of OMCS were often ambiguous. For example, one response to the OMCS frame “[You are likely to find the Moon in _____ ]” was “orbit around the earth,” which is true, but does not fulfill the frame’s purpose of eliciting an instance of a spatial AtLocation relationship.)

CHAPTER 3: CONSTRUCTING THE NETWORK: SEMANTIC ASSOCIATES OF NOUNS

In this chapter, we present our methodology for acquiring a semantic network of related concepts. The acquisition process is fully automated and unsupervised, and comprises two stages: association, which is the subject of this chapter, and disambiguation, which we discuss in

Chapter 4.

In the association phase, we discover semantic associates of common nouns using co-occurrence data extracted from Wikipedia. This discovery process is a context-sparse affair that takes place in the absence of the semantic annotations of Wikipedia, such as inter-article links, disambiguation page entries, the title of the article in which a sentence appears, and so on. The underlying assumption of our approach is that words that co-occur frequently in Wikipedia will bear semantic relation to one another, and, insofar as we consider the network to give a fairly comprehensive indication of semantic association, that the converse is true: that semantically related nouns will tend to appear in sentences together throughout the corpus. This correlation of lexical co-occurrence and semantic association is well established in the literature, most notably by Spence and Owens (1990) and Church and Hanks (1990). Of course, it is left to us to define what it means for two nouns to co-occur together “frequently” in the corpus, and to cull from consideration nouns that co-occur together frequently, yet do not bear semantic relation. Toward this end, we use a modified information theoretic approach to quantify the semantic relatedness of two nouns based on their frequency of co-occurrence in the corpus. These measurements are then used by an algorithm of our own creation that establishes relatedness between nouns categorically rather than quantitatively.

The second phase of network acquisition is a disambiguation phase, in which we capitalize on salient sense clustering among related nouns to automatically resolve them to individual senses from the WordNet 3.0 noun ontology (see Chapter 4).

Rather than defer evaluation of our methodology to a separate chapter, we pause after each step of the acquisition process to perform in loco evaluation of our progress so far.

Following are the three sub-phases of acquisition that garner their own evaluation in this and the following chapter: quantitative measurement of semantic relatedness (Section 3.2), establishing categorical relatedness (Section 3.3), and noun sense disambiguation (Section 4.6). We conclude our discussion of network construction with a detailed explication of select excerpts from the network (Section 4.7).

3.1 Preliminaries: Corpus and Co-occurrence

To facilitate the extraction of co-occurrence data from Wikipedia, we have part-of-speech tagged the entire corpus (stripped of markup, metadata, and semantic annotation) using Brill’s tagger (Brill, 1995). Throughout the remainder of this work, only intra-sentential co-occurrence of nouns is considered, and only between noun stems, rather than extracting separate data for distinct inflected forms. Any noise that results from considering co-occurrence at the sentence level, rather than employing a smaller or variable sized window, is generally quashed by the sheer magnitude of co-occurrence data available from the corpus.

Named entities are excluded from consideration in our research in part because WordNet lacks comprehensive coverage of proper nouns, which would leave many of our nouns without conceptual anchors in the ontology, or, worse yet, would anchor them to incorrect senses of proper nouns that have only partial coverage in WordNet. Furthermore, the relation of a named entity to another concept typically represents a factual assertion that falls slightly outside the realms of commonsense knowledge and basic semantic relatedness and into the realm of encyclopedic knowledge. Accordingly, we restrict our consideration to co-occurrence between common nouns.

From the tagged corpus, we establish co-occurrence frequency distributions for each noun, ni, indicating how many times every other noun occurs in sentences that contain ni. Our measurement of co-occurrence is independent of word order and intermediary word distance, and our resulting data are asymmetric.

Consider, for example, the dual occurrence of the noun (stem) “astronomer” in the first of the following three sentences (in which the stems of all common nouns are highlighted):

(1) Kamalakara (1616-1700), an Indian astronomer and mathematician, came from a

family of astronomers.

(2) This quartic curve was studied by the Greek astronomer and mathematician

Eudoxus of Cnidus.

(3) A school speed limit would be posted when entering the school zone.

From (1) and (2), we have frequency(astronomer|mathematician) = 3, since there are three occurrences of “astronomer” in sentences containing “mathematician.” However, frequency(mathematician|astronomer) = 2, and so the resulting frequency distributions are asymmetric (see Table 3.1 below).

Table 3.1: Co-occurrence frequency distributions derived from sentences (1) and (2) above.

Source Noun     Co-occurring Noun   Freq.
astronomer      mathematician       2
                family              1
                curve               1
mathematician   astronomer          3
                family              1
                curve               1
family          astronomer          2
                mathematician       1
curve            astronomer         1
                mathematician       1

Table 3.2: Co-occurrence frequency distributions derived from sentence (3) above.

Source Noun     Co-occurring Noun   Freq.
school          speed limit         1
                zone                1
speed limit     school              2
                zone                1
speed           school              2
                speed limit         1
                zone                1
limit           school              2
                speed limit         1
                zone                1
zone            school              2
                speed limit         1

Of interest is the fact that, in (2), our stemming algorithm reduces “quartic curve” to the head noun “curve” because the compound noun is not represented in the WN ontology. Consider,

in contrast, the occurrence of the compound “speed limit” in (3) (noting also that WN does not lexicalize “school zone,” and so its constituents are marked as disjoint nouns).

In instances of open-form (multi-word) compound nouns, each unique noun (e.g., those highlighted above in (3): “school,” “speed limit,” and “zone”) is counted in the co-occurrence frequency distribution for every other noun, as well as in the distributions for the constituents of any compound nouns. (Thus, the frequency distributions for nouns co-occurring with “school,”

“speed limit,” “speed,” “limit,” and “zone” are updated with counts of “school,” “speed limit,” and “zone;” see Table 3.2 above.) We afford compound nouns this special treatment to ensure their constituents also benefit from semantic association with the nouns co-occurring in these sentences.
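A simplified sketch of this counting scheme is given below, assuming the noun-stem extraction has already been done; the per-sentence noun lists correspond to examples (1)-(3), and the compound handling mirrors the treatment just described.

    from collections import Counter, defaultdict

    # Noun stems per sentence, as in examples (1)-(3); compounds are single items.
    sentences = [
        ["astronomer", "mathematician", "family", "astronomer"],   # sentence (1)
        ["curve", "astronomer", "mathematician"],                   # sentence (2)
        ["school", "speed limit", "school", "zone"],                # sentence (3)
    ]

    def with_constituents(noun):
        """A compound's counts are also credited to the distributions of its constituents."""
        parts = noun.split()
        return [noun] + (parts if len(parts) > 1 else [])

    cooc = defaultdict(Counter)   # cooc[source][co-occurring noun] = frequency

    for nouns in sentences:
        counts = Counter(nouns)
        sources = {c for n in set(nouns) for c in with_constituents(n)}
        for source in sources:
            for other, freq in counts.items():
                if other != source:
                    cooc[source][other] += freq

    print(cooc["mathematician"]["astronomer"])   # 3 (cf. Table 3.1)
    print(cooc["astronomer"]["mathematician"])   # 2 (the distributions are asymmetric)
    print(dict(cooc["speed"]))                   # {'school': 2, 'speed limit': 1, 'zone': 1} (cf. Table 3.2)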

3.2 From Co-occurrence to Relational Strength

We now adopt the following terminology: a target is any noun for which we would like to discover a set of semantic associates. Nouns co-occurring intra-sententially with a target are called its co-targets, all of which come under consideration for semantic association to the target.

We define relational strength, Srel(t, c), as a quantitative measure of the semantic relatedness of a target, t, to one of its co-targets, c. To gauge relational strength, we measure the distance between two probability distributions: a prior distribution (giving the relative frequency of occurrence of every noun in the corpus), and a posterior distribution for our target (giving the relative frequency of its various co-targets with respect to all sentences containing the target).

Figure 3.1: Prior distribution sample from Wikipedia co-occurrence (not to scale). [Bar values, P(c): article 2.520%, source 0.570%, star 0.040%, observatory 0.002%.]

Figure 3.2: Posterior distribution sample for co-targets of “astronomer” (not to scale). [Bar values, P(c|astronomer): star 1.280%, article 0.930%, observatory 0.570%, source 0.250%.]

Figure 3.3: Log ratio of the posterior and prior distributions (to scale). [Bar values, log2(P(c|astronomer)/P(c)): observatory 8.155, star 5.000, source -1.189, article -1.438.]

Figure 3.1 shows a sample from our prior distribution.18,19 We see that “article” (the most profuse noun in Wikipedia, accounting for 2.52% of all occurrences of common nouns) occurs much more frequently than, for example, “star” and “observatory.” In contrast, “star” and “observatory” occur with significantly elevated relative frequency in sentences containing the noun “astronomer” (Figure 3.2), respectively accounting for 1.28% and 0.57% of all its co-occurring noun tokens. The posterior distribution for “astronomer” reveals that we cannot rely on co-occurrence as a direct measure of semantic relatedness. This is clear from the fact that “article” co-occurs more frequently with “astronomer” than does “observatory,” although the latter clearly bears stronger semantic relation to the astronomer.

18 Figures 3.1, 3.2, and 3.3 are modeled after Resnik’s (1997) presentation of prior and posterior distributions.
19 The graphical representations for Figures 3.1 and 3.2 are not to scale; they are smoothed with an additive factor of 0.1% for readability. The numerical values above each bar do not include this smoothing factor.

Intuitively speaking, when P(c|t) is greater than P(c), c is co-occurring with t more frequently than dictated by chance, indicating heightened relational strength between the nouns.

Conversely, if P(c) is much greater than P(c|t), we see a negative semantic relationship between t and c. As shown in Figure 3.3, dividing the posterior distribution by the prior distribution and taking the log (to ensure that P(c) > P(c|t) yields negative values) gives us a reasonable initial view of relational strength.

We now formally define relational strength, the quantitative measure of the semantic relatedness of a target, t, to one of its co-targets, c, as follows:

S_{rel}(t, c) = P(t \mid c)\, P(c \mid t) \log \frac{P(c \mid t)}{P(c)}     (26)

P(c) is the relative frequency of c’s occurrence in the corpus, and for P(c|t) we use the relative frequency of c’s occurrence among all co-targets of t:

P(c) = \frac{\mathrm{frequency}(c)}{\sum_{n \in W} \mathrm{frequency}(n)}     (27)

P(c \mid t) = \frac{\mathrm{frequency}(c \mid t)}{\sum_{n \in C_t} \mathrm{frequency}(n \mid t)}     (28)

Here, W is the set of all nouns in Wikipedia, and Ct is the set of all co-targets of t.
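The definitions in (26)-(28) translate directly into code. The sketch below uses small, made-up counts (chosen only to echo the proportions in Figures 3.1 and 3.2) rather than real Wikipedia frequencies, and it takes the logarithm base 2, as in Figure 3.3.

    import math

    # Made-up corpus-wide noun frequencies (stand-ins for counts over all of Wikipedia).
    corpus_freq = {"article": 252000, "star": 4000, "observatory": 200, "astronomer": 3000}
    total_occurrences = sum(corpus_freq.values())

    # Made-up co-occurrence counts: cooc[t][c] = frequency(c | t), asymmetric by design.
    cooc = {
        "astronomer":  {"article": 93, "star": 128, "observatory": 57},
        "article":     {"astronomer": 90, "star": 300, "observatory": 10},
        "star":        {"astronomer": 120, "article": 310, "observatory": 40},
        "observatory": {"astronomer": 55, "article": 12, "star": 45},
    }

    def p(c):
        """P(c) from (27): relative frequency of c over all noun occurrences."""
        return corpus_freq[c] / total_occurrences

    def p_cond(c, t):
        """P(c | t) from (28): relative frequency of c among the co-targets of t."""
        return cooc[t][c] / sum(cooc[t].values())

    def s_rel(t, c):
        """Relational strength from (26): P(t|c) * P(c|t) * log(P(c|t) / P(c))."""
        return p_cond(t, c) * p_cond(c, t) * math.log2(p_cond(c, t) / p(c))

    for c in ("observatory", "star", "article"):
        print(c, round(s_rel("astronomer", c), 3))
    # "observatory" and "star" score well above "article", which co-occurs often
    # with "astronomer" but is profuse throughout the corpus (its score is negative).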

Our formulation of Srel(t, c) is an adaptation of Resnik’s (1999) selectional association measure:

A(w, c) = \frac{1}{D_{KL}}\, P(c \mid w) \log \frac{P(c \mid w)}{P(c)}     (29)

Resnik used (29) to measure the degree to which a word, w, selects a WordNet class, c, as an argument (e.g., the selectional preference of the adjective “wool” for nouns categorized as

clothing, or the verb “eat” for food). In Resnik’s formulation, DKL is the relative entropy, or

Kullback-Leibler divergence (Kullback & Leibler, 1951), between the probability distributions

P(C|t) and P(C), where C is a set of WN noun classes:

D_{KL} = \sum_{c \in C} P(c \mid t) \log \frac{P(c \mid t)}{P(c)}     (30)

DKL acts as a normalization factor in (29) and also gives an indication of how strongly the word w selects for its argument classes in general.

Bearing in mind that the selective power of a word reflects the degree to which it predicts the (co-)occurrence of a member of the class(es) for which it selects, Resnik’s formula provides a good launching point for a measure of relational strength. We have, however, made two pragmatic changes to Resnik’s formulation of A(w, c) to derive our definition of Srel(t, c):

The first modification is the omission of the DKL term. We are primarily interested in using Srel(t, c) to measure the relatedness of t to c relative to all other co-targets of t, rather than measuring relational strength in a global fashion. That is, given a target, t, and two of its co-

targets, c1 and c2, we are interested in the comparative values of Srel(t, c1) and Srel(t, c2); they reveal which co-target bears the stronger relation from t. We are not, however, interested in the

comparative values of Srel(t1, c1) and Srel(t2, c2), for two different targets, t1 and t2. To say that

one is greater than the other reveals nothing about the association of t1 to c1 or of t2 to c2. Indeed,

the extreme variability of DKL from target to target, as well as the exponential decay of values of

Srel(t, c) in practice, make it difficult to ascribe any meaning to the absolute values of the function. Accordingly, the function is used only to sort the list of t’s co-targets in decreasing order of relational strength, after which the usefulness of the measure is exhausted, and its values

are discarded. Thus, DKL, which is constant with respect to c, can be dropped from the definition of Srel(t, c); the ordering of t’s co-targets remains the same.

Our second modification is the inclusion of the P(t|c) term in (26) in order to account for the relatedness of c to t, which certainly plays some role in the relational strength of t to c. This is particularly useful in suppressing words like “article” and “year,” which tend to appear frequently with nouns that serve as titles of Wikipedia articles, despite the fact that those nouns are not generally semantically related to “article” or “year” at all.20

Intuitively speaking, A(w, c) indicates how likely we are to encounter a noun categorized by c as a result of encountering w. Srel(t, c) follows suit, indicating how likely we are to

encounter c as a consequence of encountering t. The highest values of Srel(t, c) are assigned when c’s relative frequency of co-occurrence with t is significantly higher than c’s relative frequency of occurrence in the corpus.

20 Although these problematic words are particular to our choice of corpus, our method for quashing them retains its generality for use with any corpus.

Given a target of interest, we sort all of its co-targets by descending order of Srel(t, c).

The notable exception is that if P(c|t) < 0.07%, we exclude c from consideration as one of t’s semantic associates outright. We previously reported that this was done primarily out of computational considerations (Szumlanski & Gomez, 2010); in our preliminary investigation into co-occurrence methods for discovering semantic associates, we assembled co-occurrence frequency distributions on demand, and only for a limited number of nouns. Since then, we have extracted co-occurrence frequency distributions for all nouns in the corpus, but we maintain the

0.07% threshold because the performance of our function degrades as P(c) approaches zero, assigning disproportionately high values of relational strength. (This is a known issue with related information theoretic measures. See, e.g., the remarks of Grefenstette (1994) regarding how mutual information “strongly favors rarely appearing words” (p. 31) when used to measure semantic similarity.)
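Ranking a target's semantic associates then reduces to filtering and sorting its co-targets. The fragment below continues the sketch above (reusing its cooc, p_cond, and s_rel definitions); the 0.0007 constant is the 0.07% threshold just discussed.

    THRESHOLD = 0.0007   # exclude co-targets with P(c | t) below 0.07%

    def ranked_associates(t):
        """Co-targets of t above the threshold, in decreasing order of S_rel(t, c)."""
        candidates = [c for c in cooc[t] if p_cond(c, t) >= THRESHOLD]
        return sorted(candidates, key=lambda c: s_rel(t, c), reverse=True)

    print(ranked_associates("astronomer"))   # ['observatory', 'star', 'article']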

Tables 3.3 and 3.4 (below on pages 92 and 93, respectively) demonstrate the re-ordering effect of our relational strength function. The first table shows co-targets of “astronomer” sorted by decreasing frequency of co-occurrence (the top 60 out of 224 nouns occurring above the

0.07% threshold); the second shows the top co-targets of “astronomer” sorted by decreasing value of relational strength. We observe that the reordering effect of our function is sometimes insignificant (e.g., the movement of “astronomy” from rank 6 to rank 5). In other cases, the re-ordering is more dramatic, acting to suppress frequently co-occurring nouns (e.g., the shift of

“article” from rank 7 to rank 174) or promote infrequently co-occurring nouns (as with the shift of “minor planet” from rank 64 to rank 4, or the movement of “astrophysicist” from rank 82 to rank 8). Moderate shifts occur, as well (e.g., the movement of “star” from rank 4 to rank 17).

The function does not provide a perfect measure of semantic relatedness. Certainly, few people would argue that "astronomer" bears stronger semantic relation to "geographer" than to "star," despite the results presented in Table 3.4. What is important, however, is that the overall ordering provided by the function is generally sound. Toward the top of the list, we see strongly related nouns, and in general this relatedness diminishes as we proceed through the list. Most importantly, the function washes out frequently co-occurring nouns that bear no semantic relation to the target. The most suspect nouns that co-occur frequently with "astronomer" are all removed to ranks greater than 60 when sorted by Srel(t, c); over half of the nouns from Table 3.3 do not appear in Table 3.4 because their resulting ranks in the re-ordering (indicated here in parentheses) place them so low in the sorted list of 224 co-targets: historian (62), light (73), work (78), model (79), definition (84), system (92), data (93), research (94), time (96), book (98), period (99), year (100), position (101), study (102), world (105), name (109), term (121), number (122), team (125), fact (126), use (127), way (128), group (130), reference (132), example (144), member (152), history (157), point (165), part (166), article (174), people (189), source (193).

Table 3.3: 60 nouns most frequently co-occurring with "astronomer" in Wikipedia.

 #  Co-Target       Frequency  P(c|t)      #  Co-Target     Frequency  P(c|t)
 1  mathematician         583  1.85%      31  world               113  0.36%
 2  amateur               569  1.81%      32  fact                109  0.35%
 3  planet                480  1.52%      33  asteroid            108  0.34%
 4  star                  403  1.28%      34  people              106  0.34%
 5  century               329  1.04%      35  universe            103  0.33%
 6  astronomy             318  1.01%      36  use                 100  0.32%
 7  article               292  0.93%      37  model               100  0.32%
 8  physicist             274  0.87%      38  number               99  0.31%
 9  time                  244  0.77%      39  sky                  96  0.30%
10  object                227  0.72%      40  light                94  0.30%
11  telescope             227  0.72%      41  research             93  0.30%
12  observation           222  0.71%      42  engineer             93  0.30%
13  years [21]            212  0.67%      43  definition           91  0.29%
14  work                  204  0.65%      44  group                90  0.29%
15  observatory           180  0.57%      45  position             86  0.27%
16  theory                179  0.57%      46  period               86  0.27%
17  galaxy                174  0.55%      47  reference            86  0.27%
18  scientist             165  0.52%      48  historian            82  0.26%
19  discovery             153  0.49%      49  example              81  0.26%
20  name                  149  0.47%      50  instrument           80  0.25%
21  philosopher           144  0.46%      51  part                 80  0.25%
22  book                  140  0.44%      52  source               79  0.25%
23  comet                 139  0.44%      53  study                79  0.25%
24  science               137  0.44%      54  point                79  0.25%
25  astrologer            135  0.43%      55  sun                  78  0.25%
26  year                  123  0.39%      56  team                 78  0.25%
27  way                   122  0.39%      57  term                 77  0.24%
28  system                117  0.37%      58  history              77  0.24%
29  moon                  116  0.37%      59  data                 75  0.24%
30  orbit                 114  0.36%      60  member               75  0.24%

21 "Years" is lexicalized in WN, and is therefore morphologically ambiguous; we do not stem it further.

Table 3.4: 60 co-targets most strongly related to "astronomer" by Srel(t, c).

 #  Co-Target        Srel(t,c)      #  Co-Target      Srel(t,c)
 1  mathematician        6.622     31  geologist          0.113
 2  amateur              2.634     32  moon               0.100
 3  observatory          2.467     33  sky                0.085
 4  minor planet         2.296     34  eclipse            0.083
 5  astronomy            2.150     35  object             0.063
 6  astrologer           1.906     36  chemist            0.060
 7  telescope            1.716     37  dwarf              0.053
 8  astrophysicist       1.596     38  scientist          0.052
 9  physicist            1.273     39  cosmology          0.049
10  planet               0.768     40  century            0.044
11  comet                0.734     41  black hole         0.042
12  asteroid             0.565     42  theologian         0.040
13  geographer           0.525     43  biologist          0.038
14  supernova            0.367     44  engineer           0.038
15  cartographer         0.338     45  crater             0.036
16  galaxy               0.313     46  physician          0.034
17  star                 0.276     47  sun                0.033
18  quasar               0.242     48  calendar           0.030
19  redshift             0.237     49  universe           0.028
20  constellation        0.232     50  inventor           0.023
21  cosmologist          0.225     51  treatise           0.019
22  solar system         0.205     52  calculation        0.019
23  observation          0.200     53  sphere             0.017
24  nebula               0.195     54  instrument         0.016
25  philosopher          0.167     55  educator           0.015
26  astrology            0.130     56  science            0.014
27  orbit                0.127     57  cluster            0.014
28  discovery            0.119     58  theory             0.014
29  discoverer           0.115     59  poet               0.012
30  meteorologist        0.114     60  motion             0.012

3.2.1 Evaluation

Although we ultimately discard values of Srel(t, c) in favor of constructing an unweighted semantic network, an objective evaluation of our function's performance is still in order. In the relatedness literature, a standard approach is to measure correlation with mean similarity scores elicited from human participants by Rubenstein and Goodenough (1965) and Miller and Charles (1991). In these studies, participants rated the "similarity of meaning" of noun pairs on a scale of 0.0 ("semantically unrelated") to 4.0 ("highly synonymous"). Rubenstein and Goodenough had participants evaluate 65 word pairs in this manner. Miller and Charles then replicated the experiment using only 30 of the original 65 word pairs.

Given that our measurement of relational strength, Srel(t, c), is used only to rank co-targets by their relative relatedness to a particular target, we now exploit those ranks to evaluate our function. We score relatedness between two words, a and b, as a scaled mean of their ranks in each other's list of co-targets, as follows:

\[ \mathrm{score}(a, b) = 4.0 \times \mathrm{AVG}\left( \frac{|C_a| + 1 - \mathrm{rank}_a(b)}{|C_a|},\; \frac{|C_b| + 1 - \mathrm{rank}_b(a)}{|C_b|} \right) \tag{31} \]

where rank_t(c) is the numerical rank of some co-target c among all of t's co-targets, as sorted by decreasing value of Srel(t, c), and |C_t| is the number of t's co-targets.22 That is, the most strongly related co-target of t has rank_t(c) = 1, and the least related co-target has rank_t(c) = |C_t|.

In the event that neither rank is defined, we let score(a, b) = 0. If exactly one of these ranks is defined, we take 75% of the defined term, rather than allowing it to be averaged with zero. Recall that rank_t(c) is undefined not only if t and c do not co-occur in the corpus, but also when P(c|t) < 0.07%.

22 This is a deviation from our definition of rank_t(c) in previous work (Szumlanski & Gomez, 2010). We adopt the present form to maintain an internally consistent definition of ranking, which is used elsewhere in this dissertation. The score(a, b) function has been modified accordingly, so the values it produces are consistent with previous work.
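To make the rank-based scoring concrete, here is a minimal Python sketch of Equation (31); the dictionaries ranks and num_cotargets are hypothetical data structures assumed to have been precomputed from the Srel-sorted co-target lists, and the handling of undefined ranks follows the 75% rule described above.

    def score(a, b, ranks, num_cotargets):
        """Scaled mean of mutual ranks, per Equation (31).

        ranks[t][c] gives rank_t(c) (1 = most strongly related co-target of t),
        and num_cotargets[t] gives |C_t|; both are hypothetical precomputed
        structures derived from the S_rel-sorted co-target lists."""
        def scaled_rank(t, c):
            if t not in ranks or c not in ranks[t]:
                return None  # rank_t(c) undefined: no co-occurrence, or P(c|t) < 0.07%
            return (num_cotargets[t] + 1 - ranks[t][c]) / num_cotargets[t]

        r_ab, r_ba = scaled_rank(a, b), scaled_rank(b, a)
        if r_ab is None and r_ba is None:
            return 0.0
        if r_ab is None or r_ba is None:
            defined = r_ab if r_ba is None else r_ba
            return 4.0 * 0.75 * defined          # 75% of the defined term
        return 4.0 * (r_ab + r_ba) / 2.0         # 4.0 x AVG of the two scaled ranks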

We evaluate the correlation of the scores produced by this function to the mean similarity scores of Rubenstein and Goodenough (henceforth R&G) and Miller and Charles (henceforth M&C). In Table 3.5, we compare our correlation results to those presented in a review by Budanitsky and Hirst (2006), as well as five semantic relatedness studies published since then (Gabrilovich & Markovitch, 2007; Hughes & Ramage, 2007; Milne & Witten, 2008a; Patwardhan & Pedersen, 2006; Strube & Ponzetto, 2006).23

Table 3.5: Coefficients of correlation with human similarity judgments. Figures in starred rows are taken from Budanitsky and Hirst (2006).

Measure                               M&C    R&G
Patwardhan and Pedersen (2006)        0.910  0.900
Hughes and Ramage (2007)              0.904  0.817
Relational Strength: Srel(t, c)       0.852  0.824
* Leacock and Chodorow (1998)         0.838  0.816
* Lin (1998)                          0.819  0.829
* Hirst and St-Onge (1998)            0.786  0.744
* Jiang and Conrath (1997)            0.781  0.850
* Resnik (1995)                       0.779  0.774
Gabrilovich and Markovitch (2007)     0.720  0.820
Milne and Witten (2008a)              0.700  0.640
Strube and Ponzetto (2006) [23]       0.490  0.560
Human Correlation (Resnik 1995)       0.885  n/a

23 These results are from the path length measure (simPL) and were misreported in Szumlanski and Gomez (2010).

Our results correlate strongly to both M&C (r = 0.852, p < 0.01) and R&G (r = 0.824, p < 0.01). The coefficients of correlation (r-values) are from Pearson's product-moment correlation, and measure the strength of the linear relationship between two sets of data. Higher values indicate better correlation to the human-assigned scores; 1.0 would indicate a perfect fit.
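As an illustration of how these r-values are computed, the short Python snippet below correlates our score(a, b) output with the human means for a handful of M&C pairs taken from Table 3.7; run over all 30 pairs, the same computation yields the r = 0.852 reported there. The use of scipy here is simply one convenient tool for Pearson's product-moment correlation, not something prescribed by the evaluation.

    from scipy.stats import pearsonr

    # A few (M&C mean, score) pairs copied from Table 3.7; the full comparison uses all 30.
    mc_means   = [0.08, 0.42, 3.08, 3.70, 3.84, 3.92]  # noon-string, coast-forest, food-fruit, coast-shore, gem-jewel, car-automobile
    our_scores = [0.00, 2.48, 3.23, 3.59, 3.85, 3.77]

    r, p = pearsonr(mc_means, our_scores)
    print(f"Pearson r = {r:.3f} (p = {p:.4f})")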

We find that, on this task, our lexical co-occurrence method produces results that are competitive with methods that draw on rich semantic resources like WordNet and the underlying structure of Wikipedia. Our results are also comfortably within the realm of human performance; the last row in Table 3.5 comes from a replication of the M&C study in which Resnik (1995) again had 10 human participants rate the similarity of the 30 word pairs used in the earlier study.

He then measured the correlation of each individual participant’s ratings to the M&C ratings. The figure presented in Table 3.5 (r = 0.885) is the arithmetic mean of the 10 resulting coefficients of correlation, which Resnik (1995) frames as “an upper bound on what one should expect from a computational attempt to perform the same task” (p. 450). Thus, we caution that high correlation on this task, and particularly scores that exceed average human correlation, might indicate that a measure is failing to capture semantic relatedness beyond that of similarity.

Below, we present the ratings from our score(a, b) function alongside the human ratings from the R&G (Table 3.6) and M&C (Table 3.7) studies.

Table 3.6: Comparison of score function to subjective similarity score judgments from Rubenstein and Goodenough (1965) (R&G). Correlation: r = 0.824.

 #  Word Pair             R&G   score      #  Word Pair              R&G   score
 1  cord smile            0.02   0.00     34  car journey            1.55   2.28
 2  rooster voyage        0.04   0.00     35  cemetery mound         1.69   2.27
 3  noon string           0.04   0.00     36  glass jewel            1.78   0.00
 4  fruit furnace         0.05   0.00     37  magician oracle        1.82   0.00
 5  autograph shore       0.06   0.00     38  crane implement        2.37   0.00
 6  automobile wizard     0.11   0.00     39  brother lad            2.41   1.87
 7  mound stove           0.14   0.00     40  sage wizard            2.46   0.00
 8  grin implement        0.18   0.00     41  oracle sage            2.61   0.00
 9  asylum fruit          0.19   0.00     42  bird crane             2.63   2.65
10  asylum monk           0.39   0.00     43  bird cock              2.63   2.68
11  graveyard madhouse    0.42   0.00     44  food fruit             2.69   3.23
12  glass magician        0.44   0.00     45  brother monk           2.74   2.38
13  boy rooster           0.44   1.68     46  asylum madhouse        3.04   3.73
14  cushion jewel         0.45   0.00     47  furnace stove          3.11   3.65
15  monk slave            0.57   0.00     48  magician wizard        3.21   3.85
16  asylum cemetery       0.79   0.00     49  hill mound             3.29   3.49
17  coast forest          0.85   2.48     50  cord string            3.41   2.26
18  grin lad              0.88   0.00     51  glass tumbler          3.45   2.82
19  shore woodland        0.90   0.00     52  grin smile             3.46   2.96
20  monk oracle           0.91   0.00     53  serf slave             3.46   2.89
21  boy sage              0.96   1.47     54  journey voyage         3.58   3.55
22  automobile cushion    0.97   0.00     55  autograph signature    3.59   2.92
23  mound shore           0.97   1.50     56  coast shore            3.60   3.59
24  lad wizard            0.99   0.00     57  forest woodland        3.65   3.85
25  forest graveyard      1.00   2.17     58  implement tool         3.66   2.88
26  food rooster          1.09   1.18     59  cock rooster           3.68   3.97
27  cemetery woodland     1.18   0.00     60  boy lad                3.82   2.97
28  shore voyage          1.22   1.96     61  cushion pillow         3.84   3.89
29  bird woodland         1.24   2.24     62  cemetery graveyard     3.88   3.79
30  coast hill            1.26   2.65     63  automobile car         3.92   3.77
31  furnace implement     1.37   0.00     64  midday noon            3.94   3.75
32  crane rooster         1.41   0.00     65  gem jewel              3.94   3.85
33  hill woodland         1.48   2.17

Table 3.7: Comparison of score function to subjective similarity score judgments from Miller and Charles (1991) (M&C). Correlation: r = 0.852.

 #  Word Pair           M&C   score      #  Word Pair           M&C   score
 1  noon string         0.08   0.00     16  lad brother         1.66   1.87
 2  rooster voyage      0.08   0.00     17  brother monk        2.82   2.38
 3  glass magician      0.11   0.00     18  tool implement      2.95   2.88
 4  chord smile         0.13   0.00     19  bird crane          2.97   2.65
 5  lad wizard          0.42   0.00     20  bird cock           3.05   2.68
 6  coast forest        0.42   2.48     21  food fruit          3.08   3.23
 7  monk slave          0.55   0.00     22  furnace stove       3.11   3.65
 8  shore woodland      0.63   0.00     23  midday noon         3.42   3.75
 9  forest graveyard    0.84   2.17     24  magician wizard     3.50   3.85
10  coast hill          0.87   2.65     25  asylum madhouse     3.61   2.73
11  food rooster        0.89   1.18     26  coast shore         3.70   3.59
12  cemetery woodland   0.95   0.00     27  boy lad             3.76   2.98
13  monk oracle         1.10   0.00     28  journey voyage      3.84   3.55
14  journey car         1.16   2.28     29  gem jewel           3.84   3.85
15  crane implement     1.68   0.00     30  car automobile      3.92   3.77

3.3 From Relational Strength to Categorical Relatedness

We now present an algorithm for discovering categorical semantic relatedness between nouns. We will write pairs of related nouns as, e.g., (astronomer, star), which indicates the relatedness of “astronomer” to “star;” the former is our target, and the latter is a co-target that we have found to be semantically related. The collection of all such pairs constitutes a semantic network of related nouns.

Intuitively speaking, the idea behind our algorithm is this: if t is strongly related to c and, conversely, c is strongly related to t, we include the ordered pair (t, c) in our semantic network.

For this purpose we rely on our measure of relational strength: once we have sorted a list of co-targets by decreasing value of relational strength with respect to some target, we have a good idea of which nouns are strongly related to the target (those at the top of the list) and which ones are strongly unrelated to the target (those at the bottom).

More formally, we introduce the notion of mutual relatedness between nouns, defined as follows: if c is in the top x% of t’s most strongly related co-targets (sorted by Srel(t, c)), and t is in the top x% of c’s most strongly related co-targets, we say that t and c are mutually related within x%. The set of all nouns mutually related to t within x% is denoted mx(t).

To find the nouns categorically related to a target, t, we let x = 20 and find the initial set mx(t). We then expand this set by incrementing x until 5 iterations pass without t being related to any additional co-targets (see Figure 3.4 below). Our experiments have shown that varying these parameters has negligible effects on the results of our algorithm, even if we allow the algorithm to proceed until as many as 10 iterations have passed without any new relationships being discovered.

Upon termination of the algorithm, we admit to the network all ordered pairs (t, c) such that c is in mx(t) (for the final value of x, which we call the admittance threshold of t). In our algorithm, this set of ordered pairs is denoted S0.

Input: A target noun, t.
Returns: Set of noun pairs (t, c) such that t and c are semantically related.

FindRelatedNouns( t ) {
    S0 = {}
    NoGain = 0

    for x = 20 to 100 do
        S = {(t, c) | c ∈ mx(t)}
        if |S| > |S0| then
            NoGain = 0
        else
            NoGain++

        if NoGain ≥ 5 then
            break
        end if

        S0 = S
    end for

    return S0
}

Figure 3.4: Algorithm for establishing categorical relatedness from mutual relatedness.
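A direct Python rendering of Figure 3.4 follows as a sketch; the helper mutually_related_within(t, x), which would return mx(t) from the precomputed Srel-sorted co-target lists, is assumed rather than implemented here.

    def find_related_nouns(t, mutually_related_within, max_no_gain=5):
        """Mutual relatedness algorithm of Figure 3.4 (sketch).

        mutually_related_within(t, x) is assumed to return m_x(t): the set of
        co-targets c such that c is in the top x% of t's S_rel-sorted co-targets
        and t is in the top x% of c's."""
        S0 = set()
        no_gain = 0
        for x in range(20, 101):          # x = 20 to 100
            S = {(t, c) for c in mutually_related_within(t, x)}
            if len(S) > len(S0):
                no_gain = 0               # new relationships were discovered
            else:
                no_gain += 1
            if no_gain >= max_no_gain:    # 5 iterations without gain: stop
                break
            S0 = S
        return S0                         # pairs admitted at t's admittance threshold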

The mutual relatedness algorithm exhibits several important properties worth mentioning.

First, it accounts for the fact that some nouns are more permissive with their semantic relatedness than others, and relates each target to as many or as few nouns as it deems fit, rather than using a single, arbitrary threshold to restrict relatedness to all targets.

Second, the algorithm is resilient to the gradated nature of the relational strength of a target to its co-targets. This gradation makes it impossible even for human judges to find a clear cutoff above which we can consider all nouns to be related to the target, and below which we can comfortably exclude their relatedness. However, our algorithm makes incisive decisions about relatedness without being lured down the slippery slope of over-inclusiveness.

A third notable feature of our algorithm is that it admits (t, c) only when the strength of t’s relatedness to c is reciprocated from c to t (as with “penguin” and “iceberg,” which are strongly related in both directions; compare this with “ice” and “penguin,” which are far more strongly related in one direction (penguin to ice) than the other (ice to penguin) and are therefore excluded from relation in the network).

3.3.1 Evaluation

We have constructed a semantic network of related nouns with this algorithm, using as our target nouns all those occurring between 1,500 and 100,000 times in Wikipedia. An overview of the resultant network is given in Table 3.8.

Table 3.8: Summary of statistics for the semantic network of related nouns.

Property                            Description                                                                                  Count
Target Nouns                        Number of nouns occurring between 1,500 and 100,000 times in Wikipedia.                      7,593
Nodes                               Number of nouns represented in network; includes both targets and co-targets.               25,142
Edges                               Number of related word pairs; (a, b) and (b, a) are not counted as distinct word pairs.    155,180
Average Threshold of Target Nouns   Mutual relatedness algorithm's average admittance threshold for target nouns in network.    28.19%
Average Degree of Target Nodes      Average number of nouns to which each target is related.                                     31.29

We restrict our consideration to nouns occurring between 1,500 and 100,000 times in the corpus primarily because of the limitations of our information theoretic approach mentioned above: our approach often assigns disproportionately high values of relational strength when considering nouns that occur infrequently in the corpus. In the case of nouns occurring fewer than 1,500 times, we thus avoid false positive associations that arise under conditions of data sparsity. In the case of nouns occurring more than 100,000 times in the corpus (of which there are 430), we avoid false positives resulting from their high rates of co-occurrence with nouns that occur comparatively rarely in the corpus.

For the 7,593 target nouns in our restricted range, our algorithm produces a semantic network relating 25,142 distinct nouns (most of which appear as co-targets, but not targets themselves, because of their low frequency of occurrence in the corpus), derived from 237,584 noun pairs. Of these noun pairs, 82,404 are redundant, in that they are the symmetric images of pairs already included in the network. Thus, the network has 155,180 distinct undirected edges.

Each target noun is related, on average, to 31.29 other nouns.

To evaluate the precision of the related noun pairs discovered by this procedure, we asked three judges with backgrounds in computational linguistics, none of whom had direct ties to this research, to evaluate 150 noun pairs and determine whether they would consider the nouns in those pairs to be semantically related or not. To prepare them for this task, we presented the judges with several exemplars of relatedness, which we hand picked from the network (see Table 3.9 below), and which exemplify a variety of relations (AtLocation, TypeOf, UsedFor, ConceptuallyRelatedTo, other functional relationships, collocations, and so on).

Of the 150 noun pairs presented to the judges for evaluation, 100 were chosen at random from the related pairs in our network. Additionally, 50 pairs of unrelated nouns were generated at random from among the nouns currently represented in the network. The 150 pairs were presented in random order to the judges. The results of their evaluations are summarized below in Table 3.10.

Table 3.9: Exemplars of semantic relatedness, hand-picked from our network.

#  Pair
1  (astronomer, observatory)
2  (crime, prevention)
3  (automobile, gasoline)
4  (phone, signal)
5  (penguin, tuxedo)
6  (prison, lawyer)
7  (tendon, cartilage)
8  (string, output)
9  (desert, habitat)

Table 3.10: Judges’ evaluations of precision on related and unrelated noun pairs.

Judge      Related Pairs Judged as Related    Unrelated Pairs Judged as Unrelated
#1         99%                                72%
#2         93%                                80%
#3         95%                                90%
Averages   95.66%                             80.66%

On average, the judges evaluated 95.66% of the pairs from our network to be semantically related. They also judged 80.66% of the unrelated pairs to be unrelated. (That is, they identified an average of 19.34% of the unrelated (randomly paired) nouns as being related.)

This domain is too open-ended for there to be any feasible measure of recall. However, the fact that our target nouns are related to an average of 31.29 nouns while maintaining precision in excess of 95% is indicative of broad and accurate coverage of semantic relatedness.

To further illustrate the quality of the relationships discovered by our approach, we have included a discussion of the semantic network surrounding the monosemous nouns (concepts) astronomer and tennis in the following chapter (see Section 4.7) and employed our network in a word sense disambiguation task to verify its utility as an applied resource (see Chapter 5).

CHAPTER 4: CONSTRUCTING THE NETWORK: FROM NOUNS TO CONCEPTS

Once we have established relatedness between nouns, we turn our attention to automatically disambiguating them to their corresponding noun senses in WordNet 3.0. In this chapter, we present our methodology for disambiguating nouns in our network (Sections 4.2 through 4.5) and provide an evaluation of our results (Section 4.6). In Section 4.7, we provide an explication of the semantic network surrounding the monosemous nouns (concepts) astronomer and tennis. In Section 4.8, we provide a discussion of the special considerations involved with disambiguating polysemous-to-polysemous pairs of associate nouns.

4.1 Preliminaries

To disambiguate the nouns in our network, we use a complex suite of disambiguation methods that work in tandem to support or refute one another’s results. Because each of these methods has certain weaknesses, a noun sense has to be verified by at least two of them in order to be admitted to the network when the methods produce conflicting results. Preference is given to results produced by these methods in order of their presentation below. If all the methods described below fail to disambiguate a noun, we default to its most frequent sense in WordNet.

It is possible for multiple senses of a noun to be verified by these methods and admitted to the network. This is often desirable; rather than restricting ourselves to one sense, we allow for the possibility of ambiguity within the relationship (e.g., the relationship of tax#1 (monosemous) to "administration," which could be either a presidential administration or, in the case of the nominalized form, the act of administering a tax), and to some degree ameliorate the problem of fine-grained polysemic distinctions in WordNet (e.g., the relationship of astronomer#1 (monosemous) to both star#1 ("a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior") and star#3 ("any celestial body visible (as a point of light) from the Earth at night"), both of which are celestial bodies).

4.2 Subsumption Method

Our first disambiguation method capitalizes on the sense similarity clustering that we have found to occur among related nouns. For example, concepts related to astronomer form one cluster beneath the umbrella of celestial_body#1 in WordNet (planet#{1,3}, star#{1,3}, minor_planet#1, quasar#1), another under the purview of scientist#1 (mathematician#1, physicist#1, chemist#1), and so on.24

Accordingly, we determine the most frequently occurring immediate hypernyms for all the senses of the nouns related to a given target, and allow them to disambiguate the concepts they subsume. Although accidental inclusion of fringe senses categorized by common hypernyms occurs in rare cases, this is the strongest of our methods for disambiguation.
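The core of this idea can be sketched in Python with NLTK's WordNet interface as follows; the min_support cutoff and the way salient hypernyms are selected are simplifying assumptions for illustration, not the exact procedure used in our system.

    from collections import Counter
    from nltk.corpus import wordnet as wn

    def subsumption_disambiguate(related_nouns, min_support=2):
        """Count immediate hypernyms over all senses of a target's associates,
        then keep the senses subsumed by hypernyms that recur (a sketch of the
        subsumption method; min_support is a hypothetical cutoff)."""
        sense_lists = {n: wn.synsets(n.replace(" ", "_"), pos=wn.NOUN)
                       for n in related_nouns}

        hypernym_counts = Counter()
        for senses in sense_lists.values():
            for s in senses:
                hypernym_counts.update(s.hypernyms())

        salient = {h for h, count in hypernym_counts.items() if count >= min_support}

        chosen = {}
        for noun, senses in sense_lists.items():
            picked = [s for s in senses if any(h in salient for h in s.hypernyms())]
            if picked:
                chosen[noun] = picked
        return chosen

For the associates of "astronomer," for example, the recurring hypernyms celestial_body#1 and scientist#1 would select the celestial-body senses of "planet" and "star" and the scientist senses of "mathematician" and "physicist."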

4.3 Gloss Method

Our gloss method gathers all monosemous nouns related to a target, as well as the target itself, and searches for these terms in the WordNet glosses of the target’s polysemous associates.

Search terms may be pluralized, and suffixes from the set {-y, -er, -ist, -ing} may be replaced with any suffix from the set {-s, -es, -ies, -y, -er, -ist, -ing}, so that, e.g., "biologist" can also be matched by the occurrence of "biology," or "engineering" by "engineers."

24 Recall that we denote sense n of a noun by noun#n, or multiple senses with, e.g., noun#{m, n}.

This method returns a list of all noun senses with at least one of the search terms occurring in their glosses. Even with target nouns that have a large number of related terms, this list is surprisingly concise, although the results are less reliable than those of the previous method. However, these results do not require verification by another method if a search term matches a topic word in a sense gloss, as with "astronomy" in the glosses of planet#1, galaxy#3, and star#1 (see Figure 4.1). Thus, any noun related to both "astronomy" (monosemous) and "star" will take star#1 as an intended meaning of "star." However, this does not preclude us from including additional senses of "star" if there is strong evidence from the other disambiguation methods to support their inclusion.

planet#1: (astronomy) any of the nine large celestial bodies in the solar system that revolve around the sun and shine by reflected light
    (from "astronomer"→"astronomy," "astronomy," and "solar system")

planet#3: any celestial body (other than comets or satellites) that revolves around a star
    (from "comet"→"comets")

galaxy#3: (astronomy) a collection of star systems; any of the billions of systems each having many stars and nebulae and dust
    (from "astronomer"→"astronomy" and "astronomy")

cosmology#2: the branch of astrophysics that studies the origin and evolution and structure of the universe
    (from "astrophysicist"→"astrophysics")

star#1: (astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior
    (from "astronomer"→"astronomy" and "astronomy")

Figure 4.1: Inflected variants of monosemous associates of “astronomer” occurring in glosses of polysemous associates of “astronomer.”
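A minimal Python sketch of the gloss method follows; the variant generation mirrors the suffix substitutions just described, while the simple regular-expression matching against glosses (and the handling of multiword search terms) is a simplification of our actual implementation.

    import re
    from nltk.corpus import wordnet as wn

    SUFFIXES = ("y", "er", "ist", "ing")
    REPLACEMENTS = ("s", "es", "ies", "y", "er", "ist", "ing")

    def search_variants(term):
        """Pluralize and swap suffixes, so 'biologist' also matches 'biology'
        and 'engineering' matches 'engineers' (per the substitutions above)."""
        variants = {term, term + "s", term + "es"}
        for suffix in SUFFIXES:
            if term.endswith(suffix):
                stem = term[: -len(suffix)]
                variants.update(stem + rep for rep in REPLACEMENTS)
        return variants

    def gloss_method(target, monosemous_associates, polysemous_noun):
        """Return senses of polysemous_noun whose gloss mentions the target or
        one of its monosemous associates (a sketch; the full method also notes
        whether the match was against a topic word such as '(astronomy)')."""
        terms = set()
        for word in [target] + list(monosemous_associates):
            terms |= search_variants(word.lower())

        hits = []
        for sense in wn.synsets(polysemous_noun, pos=wn.NOUN):
            gloss = sense.definition().lower()
            if any(re.search(r"\b" + re.escape(t) + r"\b", gloss) for t in terms):
                hits.append(sense)
        return hits

Calling gloss_method("astronomer", ["astronomy", "comet", "solar system"], "star"), for instance, would pick up star#1 through the topic word "astronomy" in its gloss.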

4.4 Selectional Preference Method

Next we use Resnik’s (1999) selectional association measure to build selectional preferences for the nouns related to a given target. Formally, we define the selectional association, A(t, c), of a target noun t with a WordNet class c as:

\[ A(t, c) = \frac{1}{D_{KL}} \, P(c \mid t) \, \log \frac{P(c \mid t)}{P(c)} \tag{32} \]

As before, D_KL is the Kullback-Leibler divergence between probability distributions P(C|t) and P(C):

\[ D_{KL} = \sum_{c \in C} P(c \mid t) \, \log \frac{P(c \mid t)}{P(c)} \tag{33} \]

Here, C is the set of WordNet classes denoted by monosemous associates of t, along with all the concepts in their hypernymic traces (all hypernyms of those concepts up to and including the root of the hierarchy, entity#1).

The posterior distribution, P(C|t), derives from the frequency of co-occurrence of t’s monosemous related nouns. To compute the prior distribution, P(C), we use the frequency data for all monosemous nouns occurring between 1,500 and 100,000 times in Wikipedia. This is a departure from the approach of Resnik, who includes polysemous nouns (and their hypernymic traces) in both probability distributions and apportions credit for a noun evenly across all its senses. By focusing only on monosemous nouns in this approach, we eliminate the noise introduced by the ambiguity of polysemous nouns.
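The arithmetic of Equations (32) and (33) is small enough to show directly; in the Python sketch below, posterior and prior are assumed to be plain dictionaries mapping WordNet classes to P(c|t) and P(c), estimated as described above.

    import math

    def selectional_associations(posterior, prior):
        """Compute A(t, c) for every class c with P(c|t) > 0 (Equations 32-33).
        Assumes the KL divergence is nonzero, i.e., the posterior actually
        deviates from the prior."""
        d_kl = sum(p * math.log(p / prior[c]) for c, p in posterior.items() if p > 0)
        return {c: p * math.log(p / prior[c]) / d_kl
                for c, p in posterior.items() if p > 0}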

Each concept in C, a category in WordNet, is thereby associated with a numerical value indicating the strength of its selectional association with the target, t. Higher values indicate stronger association. Once we have the selectional preferences derived from t's monosemous associates, we use them to preferentially disambiguate t's polysemous associates.

Table 4.1: All selectional preferences derived from monosemous associates of “unicorn.”

Rank  WordNet Class (c)        A(t, c)    Rank  WordNet Class (c)           A(t, c)
  1   monster#1                 12.350      20   chordate#1                   4.355
  2   mythical_being#1          12.350      21   vertebrate#1                 4.355
  3   mythical_monster#1        12.350      22   animal#1                     4.040
  4   mermaid#1                 10.734      23   container#1                  3.470
  5   goblin#1                  10.519      24   #1                           3.265
  6   utensil#1                  9.113      25   content#5                    2.499
  7   imaginary_being#1          8.703      26   psychological_feature#1      2.160
  8   imagination#1              8.703      27   activity#1                   2.129
  9   creativity#1               8.237      28   act#2                        0.920
 10   vessel#3                   7.495      29   event#1                      0.639
 11   evil_spirit#1              7.327      30   abstraction#6                0.523
 12   #4                         7.327      31   instrumentality#3            0.412
 13   spiritual_being#1          6.763      32   entity#1                     0.000
 14   ability#2                  6.547      33   organism#1                  -0.971
 15   creation#1                 6.490      34   living_thing#1              -0.980
 16   implement#1                5.763      35   whole#2                     -1.038
 17   placental#1                5.419      36   artifact#1                  -1.070
 18   mammal#1                   5.090      37   object#1                    -1.281
 19   belief#1                   4.688      38   physical_entity#1           -1.491

Consider, for example, the categories in WordNet with the highest selectional association with the monosemous noun "unicorn" (Table 4.1). Among these selectional preferences we find mythical_monster#1, imaginary_being#1, and spiritual_being#1, which do not appear as semantic associates of "unicorn," but do categorize many of the monosemous associates of "unicorn," such as "griffin," "goblin," "mermaid," "leprechaun," and "minotaur," among others. (The complete list of semantic associates of "unicorn" is presented in Table 4.2.)

Table 4.2: All semantic associates of “unicorn” in our network.

 #   Polysemous Associates    Monosemous Associates
 1   lion                     griffin
 2   dragon                   goblin
 3   nerd                     mermaid
 4   pony                     origami
 5   beast                    teapot
 6   satyr                    leprechaun
 7   phoenix                  minotaur
 8   tapestry                 mythical creature
 9   centaur                  manticore
10   li                       legendary creature
11   horn                     narwhal

The selectional preferences from Table 4.1 are applied, in decreasing order of selectional strength, to each sense of the target's polysemous associates, which are disambiguated to the sense or senses categorized by the first such selectional preference that subsumes them. Thus, "phoenix" (as it relates to "unicorn") is disambiguated to phoenix#3 in WordNet ("a legendary Arabian bird said to periodically burn itself to death and emerge from the ashes as a new phoenix") by virtue of its subsumption by mythical_being#1. (No senses of "phoenix" are subsumed by the stronger selectional preference, monster#1.) The three senses of "phoenix" that are excluded here are phoenix#1 (the capital city of Arizona), phoenix#2 (the taxonomic group genus Phoenix, which classifies many palm trees, including the date palm), and phoenix#4 (a constellation). These selectional preferences similarly succeed in disambiguating the polysemous "lion" to lion#1 (a feline, as opposed to the celebrity, astrological categorization of a person, or sign of the zodiac, denoted by senses 2, 3, and 4 of "lion," respectively), "beast" to beast#1 (the animal, as opposed to a cruel person, which is sense 2 of "beast"), and "satyr" to satyr#2 (the mythical woodland deity, as opposed to sense 1 of "satyr," which refers to a lecherous man).

If an upper-level ontological concept like physical_entity#1 or abstract_entity#1 performs the disambiguation in this method, we automatically dismiss the result as being too general to be reliable. More specifically, if c0 is the strongest selectional preference from our list that disambiguates some polysemous noun related to t, and A(t, c0) is less than the mean value of A(t, c) taken over all c ∈ C, then we discard the result and this method fails to disambiguate the polysemous noun in question. (For t = "unicorn," for example, this mean value of A(t, c) is 4.682. Thus, all WN classes in the right-hand column of Table 4.1 are prohibited from performing disambiguation by selectional preference.)
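How the preferences are then applied can be sketched as follows (again using NLTK's WordNet interface); the check that discards classes weaker than the mean A(t, c) is the one described above, while the rest is illustrative scaffolding.

    from nltk.corpus import wordnet as wn

    def disambiguate_by_preference(polysemous_noun, preferences):
        """Walk the WordNet classes in decreasing order of A(t, c) and return the
        senses subsumed by the first class that covers any sense of the noun.
        preferences maps WordNet synsets to A(t, c) values (cf. Table 4.1)."""
        mean_strength = sum(preferences.values()) / len(preferences)
        senses = wn.synsets(polysemous_noun.replace(" ", "_"), pos=wn.NOUN)

        for cls, strength in sorted(preferences.items(), key=lambda kv: -kv[1]):
            subsumed = [s for s in senses
                        if cls == s or cls in s.closure(lambda x: x.hypernyms())]
            if subsumed:
                # Classes weaker than the mean are considered too general to decide.
                return subsumed if strength >= mean_strength else []
        return []

With the preferences of Table 4.1, this resolves "phoenix" to phoenix#3 via mythical_being#1, since no sense of "phoenix" falls under the stronger preference monster#1.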

This method sometimes assigns disproportionately strong selective power to hypernyms that are particularly rare in the prior distribution. As such, this method defers to the subsumption and gloss methods when its results conflict with theirs.

4.5 Extended Gloss Method

In the event that none of the methods above produce verifiable results, we extend our gloss method from Section 4.3 by using as our search terms all semantic associates of the target (including polysemous associates), and all of their monosemous associates, in turn. In this case, we do not allow noun senses to be disambiguated by topic word matches, as the list of search terms has become too bloated. We do, however, allow this method to validate the results of the subsumption method, or, failing that, to support the results of the selectional preference method, or, as a last resort, to support the results of the original gloss method if it is supporting only one or two of the noun senses given by that method, and only if the list of noun senses given by this extended method exceeds that of the gloss method by no more than one or two terms.

That is to say, we treat the results of this extended gloss method with skepticism, and they are admitted to the semantic network only in rare cases. Barring the ability of this method, if it is called upon, to support a disambiguation result of one of the other methods given above, we default to the most frequent noun sense for the polysemous noun in question.

4.6 Evaluation

In our initial investigation into automatic semantic network construction (Szumlanski & Gomez, 2010), we only used these methods to disambiguate the polysemous associates of monosemous target nouns in our network. That is, we restricted our concept-network to pairs from the noun-network that included at least one monosemous noun. The intuition behind our approach was that monosemous nouns provide an unambiguous context in which disambiguation of a polysemous associate can take place (cf. the monosemous "rhinoceros" and the polysemous "horn," which brings to mind an animal appendage, but not a car horn or the kind of horn that is a musical instrument). We deferred the resolution of ambiguity in polysemous-to-polysemous relationships to later work (Szumlanski & Gomez, 2011).

There are 3,024 monosemous target nouns in our network, heading up 76,264 of our related noun pairs from the previous section. Of these pairs, 36,385 associate two monosemous nouns and are admitted to our network of related concepts without need for disambiguation. The remaining 39,879 noun pairs connect our monosemous targets to polysemous nouns that we disambiguated using the subsumption, gloss, and selectional preference methods described above. Statistics for the resulting semantic network of related concepts are given below in Table 4.3.

Table 4.3: Summary of statistics for the semantic network of related concepts (monosemous targets only).

Property                         Description                                                                                        Count
Target Nouns                     Number of monosemous nouns occurring between 1,500 and 100,000 times in Wikipedia.                 3,024
Nouns                            Number of nouns represented in network; includes both targets and co-targets.                     17,543
Nodes                            Number of noun senses represented in network; includes both target and co-target noun senses.     24,547
Edges                            Number of related noun sense pairs; (a, b) and (b, a) are not counted as distinct pairs.          74,166
Average Degree of Target Nodes   Average number of noun senses to which each monosemous target is related.                          27.80

To evaluate the precision of our disambiguation results, we randomly selected 50 monosemous-to-polysemous noun pairs from our network and presented them to our three judges, along with the gloss and taxonomic categorization for every sense of the polysemous noun in the pair. The judges were asked to grade the relation of each noun sense of the polysemous associate to the monosemous target using the scale presented below (Table 4.4). Figure 4.2 (below on page 115) shows how the data was presented to the judges, and gives one judge's ratings for all senses of "dissociation" as it relates to the monosemous noun "nucleotide."

Table 4.4: Scale used by judges to rate acceptability of disambiguation results.

Rating  Description
4       Primary intended sense or one of its synonyms.
3       Strongly related sense, but not the primary intended meaning.
2       Weakly related sense; could reasonably be included or excluded from relation to the target.
1       Unrelated sense.

We then measured how often the senses chosen by our disambiguation algorithm fell into each of these categories, and compared our results to the standard baseline of randomly selecting noun senses (see Table 4.5 below on page 116). The first column of the table (grade ≥ 4) indicates how frequently our system disambiguated to senses the judges considered to be the primary intended meanings of the related nouns. The last column (grade = 1) indicates how often our system selected senses that were unacceptable to the judges. The next-to-last column (grade ≥ 2) indicates how frequently our system chose senses that were acceptable to our judges.

======================================================================
TARGET (monosemous): nucleotide
COTARGET (polysemous): dissociation
======================================================================

[1] dissociation#1: the act of removing from association

    dissociation => separation => change of integrity => change => action
    => act, deed, human action, human activity => event
    => psychological feature => abstraction, abstract entity => entity

[1] dissociation#2: a state in which some integrated part of a person's life becomes separated from the rest of the personality and functions independently

    dissociation, disassociation => psychological state, psychological condition, mental state
    => condition, status => state => attribute
    => abstraction, abstract entity => entity

[4] dissociation#3: (chemistry) the temporary or reversible process in which a molecule or ion is broken down into smaller molecules or ions

    dissociation => chemical process, chemical change, chemical action
    => natural process, natural action, action, activity
    => process, physical process => physical entity => entity
======================================================================

Figure 4.2: Sample judge’s evaluation indicating the degree to which each sense of “dissociation” relates to “nucleotide.”

Table 4.5: Disambiguation precision, as compared to judges' manual sense annotations.

Judge      grade ≥ 4   grade ≥ 3   grade ≥ 2   grade = 1
#1         77%         79%         83%         17%
#2         65%         77%         90%         10%
#3         71%         79%         83%         17%
Average    71%         78%         85%         15%

Baseline   44%         53%         62%         38%

Given that 47.7% of the edges in our network connect two monosemous nouns (where there is no room for disambiguation error) and the remaining 52.3% have an average rate of acceptability of 85% as evaluated by our judges, we estimate the precision of the concept-to-concept associations in our semantic network to be 92.15%.

4.7 Excerpts and Explication: Selected Views of the Semantic Network

We now present abbreviated excerpts from the semantic network of related concepts for the monosemous nouns "tennis" (Figure 4.3) and "astronomer" (Figure 4.4). These excerpts come from the version of the network in which only associate pairs involving at least one monosemous noun have been disambiguated. In this network, astronomer#1 is related to 45 distinct concepts (Table 4.6), and tennis#1 is related to 80. For the sake of clarity, we present only a small sampling of those related concepts graphically. Furthermore, to avoid messy edge crossings in the graphs, we do not show interrelatedness between semantic associates of our targets. (For example, astronomy#1 and astrologer#1 are both related to astrology#1, but we instantiate the latter node twice in the graph to preserve clarity.)


Figure 4.3: Partial spreading activation view of concepts related to tennis in our network.

The target concepts’ nodes in the graphs are dark blue (astronomer#1 and tennis#1). We provide a sampling of their related terms in medium blue. In turn, those concepts are related to concepts in light blue, and those terms are related to concepts in white. This gives an idea of spreading activation through the semantic network.

In all cases, solid edges indicate that the target is related to the farther node incident to that edge. For example, the solid edge from star#{1,3} to sky#1 in Figure 4.4 indicates that astronomer#1 is related to sky#1, too. The dotted edge from astrology#1 to horoscope#{1,2} indicates that astronomer#1 is not related to horoscope#{1,2} in our network.


Figure 4.4: Partial spreading activation view of concepts related to astronomer in our network.

Some nouns are not yet disambiguated in these graphs because they are related to concepts denoted by polysemous nouns, but we see how these might easily be disambiguated.

Notice, for example, that tennis#1 is related to softball#2 (the game of softball, as opposed to the ball itself), which is in turn related to some (as yet undetermined) sense of "volleyball." Because tennis#1 is related to volleyball#1 (again, the game as opposed to the ball), this can be propagated through the network to disambiguate the relationship between softball#2 and "volleyball" as (softball#2, volleyball#1). Although this is not the approach we will take as we resolve remaining ambiguities in the following section, it provides insight into how we might subsequently resolve disambiguation errors in the network.

Table 4.6: All semantic associates of astronomer in our network.

minor_planet#1    constellation#2   orbit#{1,4}       discovery#1      geographer#1     astrologer#1
cosmologist#1     moon#6            theologian#1      asteroid#1       supernova#1      geologist#1
astronomy#1       astrophysicist#1  nebula#3          galaxy#3         quasar#1         biologist#1
observation#1     redshift#1        telescope#1       black_hole#1     solar_system#1   amateur#2
cartographer#1    astrology#1       meteorologist#1   cosmology#2      comet#1          mathematician#1
physicist#1       eclipse#1         planet#{1,3}      observatory#1    chemist#1        treatise#1
sky#1             philosopher#1     star#{1,3}        dwarf#2          discoverer#1     crater#3

There are also cases in which polysemous nouns are related to disambiguated concepts in the graphs, such as with the relation of star#{1,3} to solar_system#1. “Solar system” is monosemous in WordNet, and our disambiguation algorithm found it to be semantically related to star#{1,3}.

We note that while our algorithm discovers some semantic similarity relationships (e.g., the relation of astronomer#1 to mathematician#1 and astrophysicist#1), it also discovers many relationships beyond similarity, including concepts related through other kinds of relations (as with amateur#2, which, incidentally, is incorrectly disambiguated) and more general semantic relatedness (telescope#1, star#{1,3}, planet#{1,3}, galaxy#3, observatory#1, redshift#1, etc.).

Equally important is the absence of relationships to semantically similar concepts to which the targets are not strongly semantically related. Consider, for example, the fact that astronomer#1 is related to some hyponyms of scientist#1 (physicist#1, mathematician#1, chemist#1), but not others (linguist#1, psychologist#1, medical_scientist#1, etc.), despite the fact that quantitative relatedness measures based on their taxonomic categorizations in WordNet would erroneously relate astronomer#1 to all these terms with nearly equal strength.

The network also associates astronomer#1 with astrologer#1, which is clearly related, but is surprisingly far removed from astronomer#1 in WordNet. (Their first shared hypernym in the ontology is person#1.)

Finally, notice the relation of astronomer#1 to astrophysicist#1 and mathematician#1, but neither astrophysics#1 nor mathematics#1, although it is transitively related to the latter concepts by way of the former, as well as by way of astronomy#1. Similarly, mechanisms of spreading activation transitively relate astronomer#1 to additional concepts like light_year#1 by way of star#{1,3}, radio_astronomy#1 by way of astronomy#1, and so on. This is arguably quite ontologically sound. The astronomer himself is more strongly related to the astrophysicist and the celestial body senses of “star” than to the light year or the study of astrophysics, although he is indirectly related to the latter concepts.

4.8 Completing the Network: Resolving Ambiguity with Polysemous Noun Targets

With a monosemous target, the disambiguation methods described above (Sections 4.2 to 4.5) benefit from the fact that all semantic associates under consideration are related through the same sense (the only sense) of the target noun in question. High degrees of interrelatedness and shared subsumption among those co-targets thus ease the disambiguation task. In the case of a polysemous target, semantic associates are no longer bound together by that single common monosemous associate. Thus, the associate nouns, no longer necessarily being interrelated, exhibit greater entropy in terms of their ontological categorizations. In this section, we discuss the special considerations that therefore arise during the disambiguation of polysemous targets and their semantically related nouns in the network.

We first note that when dealing with a polysemous target, t, and a monosemous associate, c, sometimes it so happens that c is treated as a target in its own right elsewhere in the network (i.e., c occurs between 1,500 and 100,000 times in the corpus and has been associated with other nouns in addition to t). Since c is monosemous, we have already disambiguated all of its co-targets in the previous sections. Thus, there is nothing to be done for the noun pair (t, c); the ambiguity of t was resolved when considering the pair's symmetric image, (c, t) (and c, being monosemous, requires no disambiguation).

This forms an initial partitioning of nouns by their relation to individual senses of our polysemous target, t. Consider, for example, the polysemous "virus," which can refer to a computer virus (virus#3) or a microorganism (virus#1). In Figure 4.5 below, we show all monosemous associates of "virus" that also occur as targets in our network (light blue nodes). Among them is the monosemous spyware#1, shown in relation to its own semantic associates (white nodes in the figure). Many of the nouns related to "spyware" have senses categorized as software in WordNet. These include "freeware," "computer virus," "trojan," "trojan horse," and "virus." Our subsumption method (Section 4.2) disambiguates "virus" to virus#3, the computer virus, accordingly. Our disambiguation methods similarly relate the biologically oriented associates of "virus" to virus#1, the infectious agent, given their relatedness to other nouns that fall under various biological categorizations in WordNet.


Figure 4.5: Monosemous associates of “virus” that also appear as targets in our network.

Table 4.7: All semantic associates of "batter" in our network.

at-bat          baking          ball            base            baseball        bat
batsman         bowler          bunt            cake            catcher         center field
center fielder  double play     dough           dugout          fastball        fielder
fielding        first base      first baseman   flour           fly             fly ball
foul ball       ground ball     hit             hitter          home run        homer
infield         infielder       inning          major league    mound           no-hitter
outfield        outfielder      pan             pancake         perfect game    pitch
pitcher         pitching        plate           pudding         reliever        runner
second base     second baseman  shortstop       shutout         strike          strike zone
strikeout       swing           tempura         third base      third baseman   throw
thrower         triple          umpire          waffle          walk            wild pitch

If, on the other hand, c is a polysemous associate of t, our task is slightly more complex. Without a monosemous semantic anchor for the pair, we no longer have an unequivocal context in which disambiguation can take place. We have found, however, that semantic clusters still form among the semantic associates of polysemous nouns.

Consider, for example, the semantic associates of the polysemous "batter," which can refer to a baseball player (batter#1) or the kind of batter used to make cakes and other baked goods (batter#2). A list of all nouns related to "batter" in our network is given above in Table 4.7. A subset of these associates is shown below in Figure 4.6, where we see the clusters that form from shared hypernymic relationships between individual senses of these nouns. In the graph, blue nodes denote semantic associates of "batter;" white nodes are their hypernyms and are not semantic associates of "batter" in our network. The gray node in the center, entity#1, is the root of the hierarchy, and categorizes all adjacent concepts. Subsumption radiates outward from that node, so that, for example, food#1 subsumes dessert#1, which in turn subsumes pudding#{2,3}, and so on. Solid edges in the graph represent immediate subsumption by the more central node (e.g., the solid edge from cake#3 to waffle#1 establishes cake#3 as the immediate hypernym of waffle#1), whereas dotted edges represent eventual hypernymy (as with the edge from sports_equipment#1 to glove#3; sports_equipment#1 is a hypernym of glove#3, although there are other concepts between them in the ontology).


Figure 4.6: Partial view of the WordNet graph, showing subsumption clusters formed by a subset of the semantic associates of “batter” in our network.

We see from Figure 4.6 that many of the nouns related to "batter" have senses categorized by food#1, cake#3, pitch#2, ballplayer#1, or equipment#1, the heads of five distinct clusters by semantic similarity.

It is worth noting that some nouns related to "batter" (such as "baking," "swing," and "umpire") do not fall into any of these semantic clusters. In these cases, the WordNet glosses serve as our primary tool for disambiguation. (For example, the glosses of both swing#8 and umpire#1 include mention of "baseball," which is also related to "batter.")

Conversely, some of the polysemous nouns in our example have senses that join semantic clusters unintendedly. For instance, cake#2 (a "small flat mass of chopped food," according to WordNet) falls under the cluster headed by food#1. Although this is potentially problematic, cake#2 is discarded in this particular case in favor of cake#3 (the baked good), which has a greater mass because of its subsumption of waffle#1 and pancake#1, and is indeed the intended meaning of "cake" as it relates to "batter."

Another example of unintended cluster membership comes from bat#4 (the cricket bat), which is categorized by sports_equipment#1. In contrast, the baseball bat does not have its own entry in WordNet, and the most reasonable sense choice, bat#5 (“a club used for hitting a ball in various games”), is categorized as a stick (stick#1), and not as equipment, sports equipment, or game equipment.

These unintended cluster memberships are bound to cause minor errors in our disambiguation efforts. However, we do not find such high entropy among the relatives of a polysemous noun that the semantic clustering effect (which is necessary for the success of the disambiguation algorithms described above in Sections 4.2 to 4.5) is diminished. Thus, when confronted with a pair of semantically related polysemous nouns, we apply our disambiguation mechanism in both directions, and then fuse the results together. So, in one direction, the various baked goods related to "batter" help us to properly disambiguate "cake" to cake#3, yielding the pair (batter, cake#3). A similar scenario yields (cake, batter#2) when disambiguating in the other direction. We fuse the results together into the properly disambiguated pair (batter#2, cake#3).
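In code, the fusion step amounts to running the same disambiguation machinery in both directions and pairing the results; the disambiguate callable below stands in for the combined methods of Sections 4.2 through 4.5 and is an assumption of this sketch.

    def fuse_bidirectional(t, c, disambiguate):
        """Resolve a polysemous-polysemous pair by disambiguating in both
        directions and fusing the results.

        disambiguate(target, cotarget) is assumed to return the set of senses
        chosen for cotarget in the context of target."""
        senses_of_c = disambiguate(t, c)   # e.g., ("batter", "cake") -> {cake#3}
        senses_of_t = disambiguate(c, t)   # e.g., ("cake", "batter") -> {batter#2}
        return {(s_t, s_c) for s_t in senses_of_t for s_c in senses_of_c}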

This process assumes that we have already acquired the semantic associates of the co-target, c. Otherwise, our disambiguation methods have no way to resolve the meaning of the polysemous target. However, if frequency(c) < 1,500, then we have no associates for c other than those incidental targets (like our current t) that found association to c. In these cases, we use our mutual relatedness algorithm (Section 3.3) to derive a temporary set of associate nouns for c. These associates are not admitted to the network; they are simply used for disambiguation and then discarded, the idea being that even if association is over-inclusive in the case of infrequently occurring nouns, we will still see some clustering effects among an inflated set of temporary associates.

Using this method, we have resolved all nouns in our network to noun senses, giving rise to a semantic network that has 208,832 pairs of related noun senses: the most extensive semantic network between WordNet noun senses to be derived from a lexical co-occurrence measure. A summary of relevant statistics is given below in Table 4.8. Of the 7,593 target nouns for which we have acquired semantic associates, 3,024 are monosemous and represented by a single node in the network. The remaining 4,569 are polysemous and are represented by 14,080 distinct concepts, bringing the total number of target senses to 17,104. In all, the network contains 25,142 unique nouns, with 38,249 distinct senses among them. On average, target nodes in the network (those that represent individual senses of our 7,593 target nouns) are related to 19.06 other concepts.

Table 4.8: Summary of statistics for the semantic network of related concepts (SGN). Includes monosemous and polysemous targets.

Property                                                      Description                                                                                        Count
Target Nouns                                                  Number of nouns occurring between 1,500 and 100,000 times in Wikipedia.                           7,593
Monosemous Target Nouns                                       Number of monosemous target nouns for which our system has acquired relatedness data.             3,024
Polysemous Target Nouns                                       Number of polysemous target nouns for which our system has acquired relatedness data.             4,569
Target Nodes                                                  Number of target noun senses represented in the network.                                         17,104
Target Nodes (From Monosemous Nouns Only)                     Number of target noun senses represented in the network that are derived from monosemous          3,024
                                                              target nouns.
Target Nodes (From Polysemous Nouns Only)                     Number of target noun senses represented in the network that are derived from polysemous         14,080
                                                              target nouns.
Nouns                                                         Number of nouns represented in network; includes both targets and co-targets.                    25,142
Nodes                                                         Number of noun senses represented in network; includes both target and co-target noun senses.    38,249
Edges                                                         Number of related noun sense pairs; (a, b) and (b, a) are not counted as distinct pairs.        208,832
Average Degree of Target Nouns                                Average number of noun senses to which each (possibly ambiguous) target noun is related.          42.93
Average Degree of Target Nodes (From Monosemous Nouns Only)   For all nodes derived from monosemous target nouns, the average number of adjacent nodes.          28.33
Average Degree of Target Nodes (From Polysemous Nouns Only)   For all nodes derived from polysemous target nouns, the average number of adjacent nodes.          17.06
Average Degree of Target Nodes                                Average number of noun senses to which each target noun sense is related.                          19.06

For the remainder of this dissertation, we will refer to our network as the Szumlanski-Gomez Network, or SGN. In the following chapter, we evaluate our network by examining its performance on a word sense disambiguation task that relies on the concept-to-concept associations in SGN.

CHAPTER 5: COARSE-GRAINED WORD SENSE DISAMBIGUATION: AN APPLICATION

In the preceding chapters, we presented a method for automatically acquiring a semantic network of related concepts, or noun senses, from lexical co-occurrence in a large corpus. We applied our approach to Wikipedia, giving rise to a network that has relatedness data for over 7,500 of the most frequently occurring nouns in the corpus. Each target noun sense represented in our network is related to an average of 19.06 distinct noun senses. The network consists of 208,832 undirected edges indicating general semantic relatedness between concepts. We refer to the network as the Szumlanski-Gomez Network (henceforth SGN).

In this chapter, we evaluate the performance of our semantic network on a word sense disambiguation (WSD) task and show: a) the network is competitive with WordNet when used as a stand-alone plug-in knowledge source for two graph-based WSD algorithms, b) combining our network with WordNet achieves disambiguation results that exceed the performance of either resource individually, and c) our network outperforms a similar resource, WordNet++ (Ponzetto & Navigli, 2010), that has been automatically derived from semantic annotations in the Wikipedia corpus.

5.1 WordNet++

WordNet++ (henceforth WN++) (Ponzetto & Navigli, 2010) is constructed automatically from the semantic annotations and structural properties of Wikipedia. Links in WN++ are established from inter-article links in the encyclopedia. For example, the article on astronomy in Wikipedia links to the article on celestial navigation, so we find an edge from astronomy#1 to celestial_navigation#1 in WN++. The nouns related in WN++ are disambiguated automatically using further semantic annotations and metadata from Wikipedia, including sense labels, the titles of other pages linked to by any two related nouns, and the folksonomic categories to which articles belong. These serve as context words that are compared with context words from various WordNet relations in order to map the nouns to their appropriate WordNet senses. The resulting resource contains 1,902,859 unique edges between noun senses. The construction of WN++ is discussed in detail above in Chapter 2 (see Section 2.5.4).

Ponzetto and Navigli use "WN++" to refer to the union of all edges in WordNet and the set of additional edges they derived from Wikipedia. That is, WN++ is an augmented version of WordNet and contains the entire WordNet noun ontology. We depart from this convention for the remainder of this dissertation, instead using "WN++" to refer strictly to the RelatedTo links contributed by Ponzetto and Navigli. This gives us a convenient way to identify their resource as we evaluate it in comparison to SGN and in isolation from WordNet.

5.2 Coarse-Grained WSD Experiments

To evaluate our semantic network, and to provide fair comparison to related work, we take our cue from Ponzetto and Navigli (2010), who evaluated the performance of WN++ on the

SemEval-2007 (Navigli et al., 2007) coarse-grained all-words WSD task using the extended gloss overlaps measure (Banerjee & Pedersen, 2003) and the graph-based degree centrality algorithm (Navigli & Lapata, 2010).

In this particular SemEval task, we are presented with 237 sentences in which target words have been lemmatized (that is, reduced from morphologically inflected forms to their canonical WordNet forms and tagged with parts of speech) and flagged for disambiguation (see

Figure 5.1). For example, the sentence In quoting from our research, you emphasized the high prevalence of mental illness and alcoholism has the following lemmatized target words to be disambiguated: quote.v, research.n, emphasize.v, high.a, prevalence.n, mental.a, illness.n, and alcoholism.n.

d001.s006: In quoting from our research, you emphasized the high prevalence of mental illness and alcoholism. — Lemmas: quote.v, research.n, emphasize.v, high.a, prevalence.n, mental.a, illness.n, alcoholism.n

d001.s011: The interactions between health and homelessness are complex, defying sweeping generalizations as to “cause” and “effect.” — Lemmas: interaction.n, health.n, homelessness.n, be.v, complex.a, defy.v, sweeping.a, generalization.n, cause.n, effect.n

Figure 5.1: Example sentences from SemEval-2007, showing target words to be disambiguated (highlighted in blue) and their lemmatized forms.

In our experiments, we disambiguate nouns only (as did Ponzetto and Navigli), since both SGN and WN++ relate only concepts denoted by nouns, and no other parts of speech. In our experimental setup, each sentence is considered in isolation from the rest, and all lemmatized content words in a sentence are provided to the disambiguation algorithm; the verbs, adjectives, and adverbs, although we do not resolve their senses, lend additional context to the disambiguation algorithms.

The coarse-grained nature of the SemEval-2007 task provides that there may be more than one acceptable sense assignment for many of the targets. In the coarse-grained setting, an algorithm’s sense assignment is considered correct when it appears in the list of acceptable senses for the given target word. These lists of acceptable senses are provided with the dataset.

Both of the algorithms below allow for multiple disambiguation results to be returned in the event of a tie. In these cases (although they are rare), we adopt the approach of Banerjee and

Pedersen (2003), who award partial credit and discredit proportionally for all the senses returned by the algorithm.
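For concreteness, the following Python sketch shows one way to score a coarse-grained sense assignment with proportional partial credit when an algorithm returns several tied senses. It assumes the answer key gives the set of acceptable coarse senses for each target; the function and variable names are illustrative and are not drawn from the official SemEval scorer.

```python
def score_assignment(returned_senses, acceptable_senses):
    """Proportional partial credit for tied disambiguation results.

    returned_senses: set of sense labels returned by the algorithm (may be
        several, in the event of a tie).
    acceptable_senses: set of sense labels deemed acceptable for this target
        in the coarse-grained answer key.
    Returns a credit value in [0, 1], or None if no assignment was attempted
    (which hurts recall but not precision).
    """
    if not returned_senses:
        return None
    correct = len(returned_senses & acceptable_senses)
    return correct / len(returned_senses)

# A two-way tie in which one of the returned senses is acceptable earns 0.5:
print(score_assignment({"star#1", "star#3"}, {"star#1"}))
```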

5.3 WSD with Extended Gloss Overlaps (ExtLesk)

The first disambiguation algorithm we employ is the extended gloss overlaps measure

(henceforth ExtLesk) of Banerjee and Pedersen (2003), which is an extension of the Lesk (1986) gloss overlaps measure. The algorithm takes a target (our target noun to be disambiguated) and its surrounding context (in our case, all other lemmatized targets in the sentence under consideration), and proceeds as follows:

For each sense si of the target noun n, we find all word senses related to si in WordNet via some specific relation, Rx. We then concatenate the glosses of these noun senses into a single string. Let us denote the concatenation of the glosses of all noun senses related to si by the relation Rx as glossRx(si). Then, for each sense sj of each word in our surrounding context, we take all the word senses related to sj in WordNet via a particular relation Ry (which may or may not be the same relation used above), and concatenate the glosses of those word senses into a

string that is, following our notation above, denoted glossRy(sj). (For example, see Figure 5.2

below, which shows all hyponyms of celestial_body#1 and glossHYPO(celestial_body#1), the concatenation of their glosses.)

glossHYPO(celestial_body#1) is formed by concatenating the glosses of all hyponyms of celestial_body#1:

  minor_planet#1: "any of numerous small celestial bodies that move around the sun"
  planet#1: "(astronomy) any of the nine large celestial bodies in the solar system that revolve around the sun and shine by reflected light"
  planet#3: "any celestial body (other than comets or satellites) that revolves around a star"
  planetesimal#1: "one of many small solid celestial bodies thought to have existed at an early stage in the development of the solar system"
  primary#3: "(astronomy) a celestial body (especially a star) relative to other objects in orbit around it"
  quasar#1: "a starlike object that may send out radio waves and other forms of energy; many have large red shifts"
  satellite#3: "any celestial body orbiting around a planet or star"
  star#1: "(astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior"
  star#3: "any celestial body visible (as a point of light) from the Earth at night"

Figure 5.2: All hyponyms of celestial_body#1 in WordNet and their concatenated glosses,

glossHYPO(celestial_body#1).

We then count how many content words (nouns, verbs, adjectives, and adverbs) are common to both glossRx(si) and glossRy(sj). More formally, we define a function overlap(a, b)

that tells us how many content words two strings, a and b, have in common.25 We perform stemming and part-of-speech tagging in this function, so that, for example, an occurrence of

“wheel” in one gloss will match the occurrence of “wheels” in another, provided they both have the same part-of-speech tag (for an example, see Figure 5.3).26 If a content word is repeated in both strings, multiple points are awarded accordingly.27
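A minimal sketch of such an overlap function appears below. It assumes each gloss has already been tokenized, stemmed, and part-of-speech tagged into (stem, tag) pairs, so that "wheel" and "wheels" reduce to the same stem; these preprocessing assumptions and the names used are illustrative rather than a description of our exact implementation.

```python
from collections import Counter

CONTENT_TAGS = {"n", "v", "a", "r"}  # nouns, verbs, adjectives, adverbs

def overlap(gloss_a, gloss_b):
    """Count content words common to two preprocessed glosses.

    gloss_a, gloss_b: lists of (stem, tag) pairs, e.g. ("wheel", "n").
    Repeated content words earn multiple points via multiset intersection.
    """
    a = Counter(t for t in gloss_a if t[1] in CONTENT_TAGS)
    b = Counter(t for t in gloss_b if t[1] in CONTENT_TAGS)
    return sum((a & b).values())
```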

planet#1: (astronomy) any of the nine large celestial bodies in the solar system that revolve around the sun and shine by reflected light; Mercury, Venus, Earth, Mars, .... star#1: (astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior

overlap(glossGLOS(planet#1), glossGLOS(star#1)) = 3

planet#1: (astronomy) any of the nine large celestial bodies in the solar system that revolve around the sun and shine by reflected light; Mercury, Venus, Earth, Mars, .... star#2: someone who is dazzlingly skilled in any field

overlap(glossGLOS(planet#1), glossGLOS(star#2)) = 0

planet#1: (astronomy) any of the nine large celestial bodies in the solar system that revolve around the sun and shine by reflected light; Mercury, Venus, Earth, Mars, .... star#3: any celestial body visible (as a point of light) from the Earth at night

overlap(glossGLOS(planet#1), glossGLOS(star#3)) = 4

Figure 5.3: The overlap function counts content words common to two strings.

25 This is a slight departure from the traditional ExtLesk implementation, which awards more points for multi-word string overlaps. We have found that our approach offers substantial savings in running time while having only negligible effects on our overall results. In a subset of experimental runs of ExtLesk in which we used the traditional scoring mechanism, we found that F1 values varied on average by a mere 0.32% (absolute change) (0.42% relative change).

26 For clarity, Figures 5.2 and 5.3 do not show the stemmed, part-of-speech tagged text of these glosses.

27 Notice that in Figure 5.3 we have overloaded the gloss function so that, e.g., glossGLOS(star#1) simply returns the gloss of star#1. (Contrast this with the behavior of glossHYPO(celestial_body#1) in Figure 5.2.) That is to say, the glos relation in WordNet returns the gloss of a synset, which we use for direct comparison.

Finally, we say that the score of sense si of our target noun, denoted score(si), is the sum of these values for every possible sj from the surrounding context, and every possible relation Rx and Ry available to us:

score(s_i) = \sum_{R_x \in R} \sum_{R_y \in R} \sum_{s_j \in S} \mathrm{overlap}\big(\mathrm{gloss}_{R_x}(s_i),\ \mathrm{gloss}_{R_y}(s_j)\big) \qquad (34)

In (34), S is the context of si (all senses of all surrounding content words in the sentence), and R is our set of relations. In our implementation of ExtLesk, we use a standard, comprehensive set of relations from WordNet, R = {hype, hypo, holo, mero, attr, also, sim, enta, caus, pert, glos, example, syns},28 corresponding to the following relations from WordNet, respectively: hypernymy, hyponymy, holonymy, meronymy, attributes (for nouns and adjectives), also see (denoted within synset glosses), similar to (also taken from synset glosses), entailment

(for verbs), cause to (also for verbs), pertainymy (adjectives and adverbs), gloss (synset glosses, without also see and similar to or example annotations), example (examples taken directly from synset glosses), and other words represented in the concept’s synset. With SGN and WN++,

ExtLesk uses the single relation expressed by the networks: RelatedTo.

The sense of our target noun with the highest score from this function is used for sense assignment. In the event of a tie, multiple senses may be returned. ExtLesk does not attempt to perform sense assignment if the score for every sense of a target noun is zero, except when dealing with a monosemous noun, in which case we default to the only sense possible.
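Putting these pieces together, the sketch below mirrors equation (34) and the sense-assignment policy just described. The helpers gloss_of (which returns the preprocessed, concatenated glosses of all senses related to a sense via a given relation) and overlap (as sketched above) are assumed, and all names are illustrative.

```python
def extlesk_score(target_sense, context_senses, relations, gloss_of, overlap):
    """Score one candidate sense of the target noun, following equation (34)."""
    total = 0
    for r_x in relations:
        g_target = gloss_of(target_sense, r_x)
        for r_y in relations:
            for s_j in context_senses:
                total += overlap(g_target, gloss_of(s_j, r_y))
    return total

def assign_sense(candidate_senses, context_senses, relations, gloss_of, overlap):
    """Return the highest-scoring sense(s); ties yield several senses, and no
    assignment is made when every sense scores zero (unless monosemous)."""
    if len(candidate_senses) == 1:
        return set(candidate_senses)
    scores = {s: extlesk_score(s, context_senses, relations, gloss_of, overlap)
              for s in candidate_senses}
    best = max(scores.values())
    return {s for s, v in scores.items() if v == best} if best > 0 else set()
```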

28 glossGLOS(si) simply yields the gloss of si, since WordNet’s glos relation returns a string, not a synset which we can gloss further. glossEXAMPLE(si) behaves similarly, and glossSYNS(si) returns a concatenated string of part-of-speech tagged nouns that constitute the synset of si, rather than repeatedly concatenating the gloss of si.

5.3.1 Results

In our experimental setup, we use ExtLesk to disambiguate the nouns in the SemEval-

2007 dataset with five combinations of semantic resources: WordNet only, SGN only, SGN and

WordNet combined (that is, the union of all links contained in both networks), WN++ only, and

WN++ combined with WordNet. In our results (see Table 5.1), we include the traditional baselines of most frequent sense (MFS) assignment and random sense assignment for comparison, and measure precision (number of correct sense assignments divided by the number of attempted sense assignments), recall (number of correct sense assignments divided by the

number of target nouns to be disambiguated), and the harmonic mean of the two, F1, defined as:

F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (35)
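As a concrete illustration of how these figures are computed, the following minimal sketch derives precision, recall, and F1 from raw counts. The names are ours, and the correct count may be fractional when proportional partial credit is awarded for ties (see Section 5.2).

```python
def precision_recall_f1(correct, attempted, total_targets):
    """Precision, recall, and F1 as defined in the text and equation (35)."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total_targets if total_targets else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```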

Table 5.1: ExtLesk disambiguation results on the SemEval-2007 all-words coarse-grained WSD task (nouns only).

Resource                          Precision (%)   Recall (%)   F1 (%)
WordNet                           78.80           74.82        76.76
SGN                               78.64           72.82        75.62
SGN and WordNet                   82.35           78.11        80.18
WN++                              74.67           61.87        67.67
WN++ and WordNet                  77.35           73.38        75.31
Baseline: Most Frequent Sense     77.40           77.40        77.40
Baseline: Random                  63.50           63.50        63.50

On this task, our results with SGN as a stand-alone network (F1 = 75.62%) rival the

performance of WordNet (F1 = 76.76%).29 This result is particularly impressive given the fact that the relationships in SGN are derived automatically from a context-sparse lexical co-occurrence measure.

Equally impressive is the ability of SGN and WordNet, when used in combination, to

achieve results (F1 = 80.18%) that exceed what either network is able to accomplish as a stand-alone knowledge source. When combined, we see improvements of 3.42% and 4.56% (absolute

F1 values) over WordNet and SGN as stand-alone resources, respectively. It is also only with

these resources combined that we are able to outperform the redoubtable MFS baseline of F1 =

77.40%, and we do so by 2.78%.30

In contrast, WN++ (F1 = 67.67%) performs poorly as a stand-alone resource, falling behind the MFS baseline by 9.73%. Of all the resources tested, WN++ yields the lowest results.

When combined with WordNet, WN++ actually diminishes (rather than bolsters) the ability of

29 Ponzetto and Navigli (2010) report results of F1 = 68.3% and 72.0% using WordNet and WN++, respectively, as stand-alone knowledge sources for ExtLesk. In contrast, our experimentally derived values for those resources are F1 = 76.76% and 67.67%. In light of this disparity, we verified our results (as they pertain to WordNet as a stand-alone resource) using the WordNet::Similarity module (Pedersen, Patwardhan, and Michelizzi, 2004). The Perl module, which implements ExtLesk, produced results with P = 78.27%, R = 72.90%, and F1 = 75.49% on this task. Enabling and disabling stemming had a negligible impact on results, as did running the experiments with and without an extensive list of stop words. The WordNet::Similarity results vary slightly from those we obtained using our own implementation of ExtLesk with WordNet (P = 78.80%, R = 74.82%, F1 = 76.76%), but this difference can be explained by differences in parsing and stemming algorithms, as well as our use of the traditional overlap counting approach of Lesk (1986). Furthermore, working backward from the results reported by Ponzetto and Navigli for ExtLesk with WordNet reveals that their implementation only produced disambiguation results for 764 out of the 1108 nouns to be disambiguated, and provided no disambiguation results for the remaining 31% of target nouns in the task. In contrast, our experiments with WordNet::Similarity produced results for 1032 of the 1108 (some correct, some incorrect, of course). Intuitively, the 31% figure seems excessively high, because ExtLesk only fails to produce a disambiguation result for some noun si if there are no content words in common between its extended glosses (i.e., the glosses of all concepts related to si through every possible edge type in WordNet) and the extended glosses of any of the content words co-occurring in the sentence where si appears.

30 Other systems have obtained better results on the same dataset, but we focus only on SGN and WN++ because our aim is to compare the resources themselves.

WordNet to perform on this WSD task by 1.45%. We defer our discussion of factors impacting the performance of WN++ to Section 5.5.

5.4 WSD with Degree Centrality

The second disambiguation algorithm we use in our experiments, Degree Centrality, is a graph-based measure of semantic relatedness (Navigli & Lapata, 2010). The algorithm searches through a semantic network (using all possible relations as edges) for paths of length l ≤ maxLength between all sense nodes of all lemmas in our context. The edges along all such paths are added to a new graph, G', and for each target noun to be disambiguated, the sense node with the greatest number of incident edges (highest degree) in G' is taken as its intended sense.

In these graphs, nodes represent synsets, as opposed to instantiating separate nodes for different members of the same synset and allowing edges to be constructed between them. We include all lemmas from a sentence in our context, but only return disambiguation results for the nouns.

With SGN and WN++, the implementation of this algorithm is straightforward. We initiate a breadth-first search (BFS)31 at each target sense node in the network, and proceed through ⌊(maxLength + 1) / 2⌋ iterations of spreading activation. Whenever the tendrils of this spreading activation from one target sense node in the graph connect to those of another,32 we add the path between the nodes to our new graph, G', potentially incrementing the degree of the involved target sense nodes in G' as we do so.

31 This is in contrast to the DFS implementation of Navigli and Lapata (2010).

32 When maxLength is odd, this requires an additional check to ensure that the intersection is not taking place at a node that is exactly ⌊(maxLength + 1) / 2⌋ degrees removed from each of the two target nodes it is connecting, as this would result in a path with overall length (maxLength + 1) between the target nodes.

BFS, as an admissible algorithm, is guaranteed to find the shortest path from an initial state to a goal (e.g., from one target sense node in our graph to another). Therefore, because any node on a path of length l ≤ maxLength between two target nodes is at most ⌊l / 2⌋ nodes removed from at least one of those target sense nodes, we only need to perform a BFS of depth

⌊(maxLength + 1) / 2⌋ from every target sense node in order to guarantee that every such path between them will be discovered. Since the time complexity of BFS is exponential with respect to the depth of the search, cutting this depth in half (in comparison to performing a BFS of depth maxLength) greatly reduces the running time of our algorithm.
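The following simplified Python sketch illustrates this search strategy, assuming the semantic network is an adjacency-list dictionary mapping each sense node to its neighbors. The function names are ours, and the sketch glosses over the special handling of WordNet's gloss-induced edges discussed below.

```python
from collections import deque

def bfs_tree(graph, source, depth):
    """Breadth-first search to a fixed depth; maps node -> (distance, parent)."""
    seen = {source: (0, None)}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        d, _ = seen[u]
        if d == depth:
            continue
        for v in graph.get(u, ()):
            if v not in seen:
                seen[v] = (d + 1, u)
                frontier.append(v)
    return seen

def path_to_source(tree, node):
    """Walk parent pointers from node back to the BFS source."""
    path = [node]
    while tree[path[-1]][1] is not None:
        path.append(tree[path[-1]][1])
    return path

def build_g_prime(graph, target_senses, max_length):
    """Build G' from discovered paths of length <= max_length between target
    sense nodes; the sense with the highest degree in G' wins for each noun."""
    depth = (max_length + 1) // 2
    trees = {t: bfs_tree(graph, t, depth) for t in target_senses}
    g_prime = {}
    targets = list(target_senses)
    for i, u in enumerate(targets):
        for v in targets[i + 1:]:
            for w in set(trees[u]) & set(trees[v]):  # meeting points of the two searches
                if trees[u][w][0] + trees[v][w][0] <= max_length:
                    nodes = path_to_source(trees[u], w)[::-1] + path_to_source(trees[v], w)[1:]
                    for a, b in zip(nodes, nodes[1:]):
                        g_prime.setdefault(a, set()).add(b)
                        g_prime.setdefault(b, set()).add(a)
    return g_prime
```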

We take the same approach in traversing the WordNet noun graph, using all possible sense relations as edges. There is, however, one complication:

In keeping with the approach of Navigli and Lapata (2010), an edge is also induced between synsets if the gloss of one synset contains a monosemous content word. For example, the gloss for leprechaun#1, “a mischievous elf in Irish folklore,” contains the monosemous noun

“folklore;” thus, we have an edge between leprechaun#1 and folklore#1 in the WordNet graph.
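As an illustration of this kind of gloss-induced edge, the sketch below uses NLTK's WordNet interface (assuming the nltk package and its WordNet corpus are installed). It looks only for monosemous nouns among the alphabetic tokens of a definition; restricting the search to nouns and using naive tokenization are simplifications of the actual procedure.

```python
from nltk.corpus import wordnet as wn

def gloss_edges(synset):
    """Yield (synset, other) edges induced by monosemous nouns in the gloss."""
    for token in synset.definition().lower().split():
        word = "".join(ch for ch in token if ch.isalpha())
        senses = wn.synsets(word, pos=wn.NOUN)
        if len(senses) == 1 and senses[0] != synset:
            yield (synset, senses[0])

# The definition of leprechaun.n.01 mentions the monosemous noun "folklore",
# so the induced edges should include one to folklore.n.01:
print(list(gloss_edges(wn.synset("leprechaun.n.01"))))
```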

Unlike the other edges in these semantic graphs, this gloss relation cannot be discovered bidirectionally, even though the edge, once we encounter it and add it to our graph representation of WordNet, is considered undirected. Gloss edges can therefore spontaneously introduce a short path between two nodes if they are encountered along much longer paths, deep within a BFS.

Thus, when traversing the WordNet graph, we perform a preliminary BFS of depth maxLength in an expedition to discover these gloss relations. This still does not guarantee that all possible paths of length maxLength between two target sense nodes will be discovered. Some node lying along a path of length (maxLength + 1) from a target synset node could easily have

139 directed links (via its gloss) to two other target synset nodes, providing a hidden path of length two that cannot be discovered without traversing all paths of length (maxLength + 1) from our target nodes. However, our approach reduces our chances of missing a short, gloss-induced bridge in the graph.

Once we have our new graph, G', constructed in this manner, the vertex degree is considered an indication of the semantic relatedness of a particular synset to all other lemmas in our context. For each target noun, we use the sense node(s) with the highest degree in G' for sense assignment.

5.4.1 Results

In our experimental setup, we examine the performance of the Degree Centrality algorithm with the following combinations of semantic resources: WordNet, SGN, WN++,

Refined WN++, SGN and WordNet combined, and Refined WN++ and WordNet combined.

Refined WN++ consists of 79,422 of WN++’s strongest relations, and was created in an unsupervised setting by Ponzetto and Navigli specifically for use with Degree Centrality when they discovered that WN++ had too many weak relationships to perform well with the Degree

Centrality algorithm.

We have observed that the performance of Degree Centrality rapidly levels off as maxLength increases. Navigli and Lapata (2010) also reported this so-called “plateau” effect, and employed a maxLength of 6 in their experiments, despite finding that results leveled off around maxLength = 4. We, too, find that performance levels off around maxLength = 4 in almost all cases, and so only continue up to maxLength = 5.

Table 5.2: Degree Centrality disambiguation results on the SemEval-2007 all-words coarse-

grained WSD task (nouns only) with maximum path lengths 1 ≤ Lmax ≤ 5.

Resource Lmax P R F1 Resource Lmax P R F1 (%) (%) (%) (%) (%) (%)

Resource                        Lmax   P (%)   R (%)   F1 (%)
WordNet (stand-alone)           1      96.9    16.8    28.6
                                2      77.6    45.1    57.0
                                3      76.7    65.6    70.7
                                4      76.9    71.0    73.9
                                5      76.6    71.6    74.0
SGN (stand-alone)               1      79.7    32.9    46.6
                                2      72.0    64.6    68.4
                                3      68.7    63.5    66.0
                                4      68.0    63.9    65.9
                                5      68.0    64.2    66.1
SGN (with WordNet)              1      77.4    52.4    62.5
                                2      74.7    70.7    72.7
                                3      70.3    67.1    68.7
                                4      70.5    67.4    68.9
                                5      70.1    67.0    68.5
WN++ (stand-alone)              1      87.2    23.5    37.1
                                2      71.6    60.2    65.4
                                3      70.7    64.3    67.3
                                4      70.4    64.5    67.3
                                5      70.4    64.5    67.3
Refined WN++ (stand-alone)      1      98.3    15.3    26.5
                                2      91.4    23.4    37.3
                                3      88.7    29.9    44.7
                                4      83.7    32.3    46.7
                                5      80.2    35.3    49.0
Refined WN++ (with WordNet)     1      83.3    31.2    45.4
                                2      77.5    66.6    71.6
                                3      77.6    73.6    75.5
                                4      74.7    71.4    73.0
                                5      74.7    71.4    73.0
MFS Baseline                    --     77.4    77.4    77.4
Random Baseline                 --     63.5    63.5    63.5

We find that, in all cases tested, Degree Centrality is unable to outperform the MFS

baseline (with respect to F1) (see Table 5.2). SGN and WN++ exhibit comparable performance

with this algorithm, with maximum F1 values of 68.4% (at maxLength = 2) and 67.3% (at maxLength = 3 to 5), respectively. Neither achieves the performance of WordNet with Degree

Centrality (F1 = 74.0%),33 which under-performs the MFS baseline (F1 = 77.4%) by 3.4%.

Ponzetto and Navigli (2010) reported that only performing sense assignment when the maximum degree exceeded an empirically derived but non-disclosed threshold improved performance, but we have found that implementing such a threshold universally lowers results for all resources we tested with Degree Centrality.

The lowest performance using Degree Centrality comes from Refined WN++ as a stand-alone resource. We attribute this to the fact that Refined WN++ is so semantically sparse. On average, noun senses in Refined WN++ are related to only 3.42 other noun senses, while those in

WN++ and SGN relate to an average of 44.59 and 10.92 noun senses, respectively. Accordingly, the success of Refined WN++ and WordNet, when combined, is attributable mostly to the success of WordNet as a stand-alone resource; as maxLength increases, the contributions made by the sparse Refined WN++ network rapidly become negligible in comparison to those provided by the

WordNet ontology.

5.5 Discussion

The fact that the performance of Degree Centrality quickly plateaus hints at the root cause of its weak performance compared to ExtLesk and the MFS baseline. As the maximum path length is increased in a dense semantic network, all possible edges from our target sense nodes rapidly find themselves involved with paths to other target sense nodes. This is particularly true of WN++ (notice its rapid and stable convergence), where certain “sticky” nodes form

33 Although Ponzetto and Navigli (2010) reported similar results with WordNet (F1 = 74.5%), we have been unable to reproduce their results using Refined WN++, either combined with WordNet (F1 = 79.4% vs. 75.5%) or as a stand-alone resource (F1 = 57.4% vs. 49.0%).

bridges between seemingly unrelated concepts. For example, the frequent appearance of “United

States” in Wikipedia articles, and its tendency to be linked to the United States Wikipage when it occurs, causes the term to serve as a bridge between such diverse concepts as automaton#2 and burrito#1, which one would typically expect to be far removed from one another in a model of semantic relatedness (and which also bear questionable relatedness to United_States#1).

If it is indeed true that Degree Centrality’s plateau effect is a result of each target sense node’s edges rapidly becoming participants in paths to other sense nodes, then one would expect the algorithm to perform comparably to performing sense assignment based on the most semantically well-connected sense of each target noun in the network. That is, as path length increases, the results of Degree Centrality should converge to the results obtained by forgoing the algorithm altogether and simply disambiguating each noun to the sense with the most edges in the network (regardless of whether those edges ultimately connect two word senses from the disambiguation context). This is, in fact, the case: the expected values of convergence attained by defaulting to the semantically most well-connected sense of each target noun in each network are

F1 = 66.3%, 67.5%, and 74.6% for SGN, WN++, and WordNet, respectively, as compared to the

experimentally derived Degree Centrality results of F1 = 66.1%, 67.3%, and 74.0%.
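Computing this degenerate baseline requires nothing more than a lookup of each candidate sense's degree in the full network, as in the minimal sketch below (with illustrative names).

```python
def most_connected_sense(candidate_senses, network):
    """Ignore the disambiguation context entirely and pick the sense with the
    most edges in the semantic network."""
    return max(candidate_senses, key=lambda s: len(network.get(s, ())))
```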

5.6 Summary

We have evaluated our semantic network, SGN, in a coarse-grained WSD experiment setting (SemEval-2007) using two graph-based algorithms: ExtLesk and Degree Centrality. We

found that our network performs comparably to WordNet using ExtLesk (F1 = 75.62% and

76.76%, respectively), and that combining SGN and WordNet for use with ExtLesk yields results

that exceed the performance that either resource is able to attain individually (F1 = 80.18%).

With both ExtLesk and Degree Centrality, we observed that the performance of WN++ falls short of that of WordNet, and that combining the two resources negatively impacts the performance of WordNet. With Degree Centrality in particular, we discovered that the spurious relationships in WN++ hamper its performance, and that the smaller version of the network,

Refined WN++, is too semantically sparse to perform well as a stand-alone knowledge source or to significantly impact the performance of WordNet when the two resources are combined.

With regard to Degree Centrality, we observed that the algorithm has a strong disambiguation bias toward the sense of a word with the most incident edges in a semantic network, and that this bias accounts for the rapid convergence of its performance (i.e., plateau effect) as the algorithm’s maximum path length is increased.

CHAPTER 6: MEASURING HUMAN PERCEPTIONS OF RELATEDNESS

In this chapter, we present the results of our investigation into human perceptions of semantic relatedness. We have elicited quantitative human judgments of relatedness for 122 noun pairs. The mean relatedness scores have been compiled into a new dataset that can be used to supplement existing evaluative standards for computational measures of semantic relatedness.

In Section 6.1, we provide some background and motivation for this study: we discuss related work on gold standards for evaluating relatedness measures, address some shortcomings of those standards, and explain the need for datasets like the one we present here. In Section 6.2, we lay out our experimental procedure and explain how we chose the noun pairs in our dataset.

In Section 6.3, we provide analysis and discussion of our experimental results: the mean relatedness scores elicited from human participants are presented, and we evaluate the performance of a variety of similarity and relatedness measures on the new dataset. We then summarize the contributions of this study in Section 6.4.

6.1 Mean Similarity Scores as Gold Standard Datasets

With 65 and 30 noun pairs respectively, the Rubenstein and Goodenough (1965) (R&G) and Miller and Charles (1991) (M&C) datasets (discussed above in Section 2.2.5) are widely considered to be too small to provide adequate evaluation of similarity measures (Banerjee &

Pedersen, 2003; Budanitsky & Hirst, 2006; Milne & Witten, 2008a; Strube and Ponzetto, 2006).

Nonetheless, we have observed their ubiquitous use in the literature for evaluating not only similarity measures, but also relatedness measures. Only minor credit for their continued use as a

gold standard can be attributed to the fact that they provide a common point of comparison to previous work in the field. Resnik (1999) observed that “the worth of a similarity measure is in its fidelity to human behavior, as measured by predictions of human performance on experimental tasks” (p. 95). Budanitsky and Hirst similarly remarked that “comparison with human judgments is the ideal way to evaluate a measure of similarity or relatedness” (p. 32).

Thus, comparison to the kind of data provided by R&G and M&C enjoys a certain gold standard primacy in the literature, and we continue to employ these two particular datasets because no other datasets have yet emerged as reasonable candidates to replace them.

This is particularly problematic in the evaluation of relatedness measures, where perhaps the most obvious concern about the use of R&G and M&C as gold standards is that subjects were asked to evaluate “similarity of meaning” (Rubenstein & Goodenough, 1965, p. 628) in those studies—not semantic relatedness.

6.1.1 WordSim353 as a Gold Standard

WordSim353 (Finkelstein et al., 2002)34 has recently emerged as a potential surrogate dataset for evaluating relatedness measures. Several studies have reported correlation to the

WordSim353 data as part of their standard evaluation procedures, with some studies explicitly referring to it as a collection of human-assigned relatedness scores (Gabrilovich & Markovitch,

2007; Hughes & Ramage, 2007; Milne & Witten, 2008a). Finkelstein et al. reported that, in creating the dataset, they “employed 16 subjects to estimate the ‘relatedness’ of the word pairs on a scale from 0 (totally unrelated words) to 10 (very much related or identical words)” (p. 13).

34 http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html

However, the status of WordSim353 as a relatedness gold standard remains unclear because the instructions given to participants in its creation emphasized similarity, not relatedness (see Figure 6.1 below). The instructions opened with an explanation that the study was “aimed at estimating the similarity [emphasis added] of various words in the English language,” and that participants would “assign similarity [emphasis added] scores to pairs of words, so that machine learning algorithms [could] be subsequently trained and adjusted using human-assigned scores.” The full instructions that Finkelstein et al. provided to participants in their study repeatedly framed the task as one in which participants were expected to assign word similarity scores, and only twice mentioned relatedness. Furthermore, the Web page for downloading the WordSim353 collection frequently refers to it as a set of “similarity scores”

(Gabrilovich, 2006), and the name of the dataset itself stands for “Word Similarity.”

Jarmasz and Szpakowicz (2003) have notably raised methodological concerns about the acquisition of WordSim353 data, observing that: a) relatedness is rated on a scale of 0 to 10, which is intrinsically more difficult for humans to manage than the scale of 0 to 4 used by R&G and M&C; b) a certain amount of cultural bias is introduced into the data, particularly with respect to the inclusion of proper nouns (e.g., the evaluation of the pair Arafat–terror); and c) there is no indication of how the 353 word pairs were chosen, other than the fact that the 30

M&C pairs were included as a subset. We add to these concerns the fact that the instructions obfuscate whether subjects were expected to evaluate relatedness in the general case, or simply to extend their definition of similarity to encompass antonymy (thus using the term “relatedness” to denote two particular types of relatedness: similarity and antonymy).

Estimation of word similarity

Hello,

We kindly ask you to assist us in a psycholinguistic experiment, aimed at estimating the similarity of various words in the English language. The purpose of this experiment is to assign similarity scores to pairs of words, so that machine learning algorithms can be subsequently trained and adjusted using human-assigned scores.

Below is a list of pairs of words. For each pair, please assign a numerical similarity score between 0 and 10 (0 = words are totally unrelated, 10 = words are VERY closely related). By definition, the similarity of the word to itself should be 10. You may assign fractional scores (for example, 7.5).

Specific instructions:

1) The questionnaire starts on the next page.

2) Please fill in your full name at the beginning of the questionnaire. We need the names to ensure individual estimations do not get mixed, and to be able to contact you should any clarifications become necessary.

3) Please fill in the similarity scores in the appropriate column of the table. To facilitate processing your questionnaire, please do not print the document but rather type in the values in the table provided.

4) If you do not know the meaning of a particular word - please use a dictionary, or ask a native English speaker.

5) Please DO NOT consult your friends on assigning the similarity scores - it is highly important that the scores you assign be independent of someone else’s assessment.

6) When estimating similarity of antonyms, consider them “similar” (i.e., belonging to the same domain or representing features of the same concept), rather than “dissimilar”.

If you have any questions or require further clarifications (or if you have a suggestion), please do not hesitate to contact us.

Thank you for your assistance!

Figure 6.1: Instructions for assigning scores for the WordSim353 word pairs of Finkelstein et al. (2002). The task is framed as being intended to elicit similarity scores.

Agirre, Alfonseca, et al. (2009) recently attempted to disentangle the pairs of similar nouns in WordSim353 from those that were related but not similar, but did not assess the validity of the scoring distribution in the resulting relatedness subset to ensure that strongly related word pairs were not penalized by human subjects for being dissimilar. Perhaps not surprisingly, the highest scores in WordSim353 (all ten ratings between 9.0 and 10.0, inclusively) were assigned to pairs that Agirre, Alfonseca, et al. placed in their similarity subset. Agirre, Alfonseca, et al. showed that similarity and relatedness measures alike correlated better to the subset of similar entities than they did to the subset of related entities from WordSim353.

6.1.2 The R&G Methodology and the Reliability of Human Judgments

In contrast to the methodology of Finkelstein et al., the instructions of Rubenstein and

Goodenough were straightforward in their intent (see Figure 6.2 below). These instructions were also used in replications of the study by Miller and Charles (1991) and Resnik (1995).

Several studies have shown that human judgments of similarity are consistent within subjects, between subjects, and across groups of subjects using these instructions. Rubenstein and Goodenough established intra-judge reliability by having one group of subjects perform similarity evaluations twice—once using a set of 48 noun pairs, and again two weeks later using the full set of 65 R&G noun pairs. The two sets had 36 noun pairs in common. Rubenstein and

Goodenough measured how well each subject’s judgments correlated on those 36 pairs between the two experimental trials. Among 15 judges, the average intra-judge correlation was r = 0.85.

There were 65 pairs of nouns (theme pairs) presented for judgment. Each subject was given a shuffled deck of 65 slips of paper, each slip containing a different theme pair. The subject was given the following instructions:

1. After looking through the whole deck, order the pairs according to amount of “similarity of meaning” so that the slip containing the pair exhibiting the greatest amount of “similarity of meaning” is at the top of the deck and the pair exhibiting the least amount is on bottom.

2. Assign a value from 4.0–0.0 to each pair—the greater the “similarity of meaning,” the higher the number. You may assign the same value to more than one pair.

Figure 6.2: Procedure used by Rubenstein and Goodenough (1965) to elicit similarity scores for their 65 word pairs.

Rubenstein and Goodenough also had two separate groups of judges evaluate the full set of 65 noun pairs, and found the two resulting sets of mean similarity scores had very high correlation (r = 0.99). The Miller and Charles replication of Rubenstein and Goodenough’s study using 30 noun pairs from the R&G data also had mean similarity scores that correlated strongly to the results of R&G (r = 0.97).

Resnik (1995) replicated Miller and Charles’ study and found strong correlation

(r = 0.96) to the M&C means. Resnik also assessed inter-judge reliability, and found that the average of individual judges’ correlations to the M&C data was r = 0.885 (σ = 0.08). Within the data from his own replication, using leave-one-out sampling, Resnik found that individual judges’ scores correlated to mean similarity scores with r = 0.903 (σ = 0.07). Finkelstein et al.

(2002) included the 30 M&C noun pairs in their study, which had a total of 353 word pairs, and found their results correlated to M&C with r = 0.95.

These strong correlations hold even with wide variation in the number of subjects participating in each study. Rubenstein and Goodenough used 15 and 36 subjects in the two groups described above. Miller and Charles employed the help of 38 subjects. Resnik performed his replication using only 10 subjects, and each pair of words in Finkelstein et al.’s study was evaluated by 13 to 16 subjects.

6.2 Methodology

In our experiments, we elicited human ratings of semantic relatedness for 122 noun pairs.

In doing so, we followed the methodology of R&G (Figure 6.2 above) as closely as possible: participants were instructed to read through a set of noun pairs, sort them by how strongly related they were, and then assign each pair a relatedness score on a scale of 0.0 (completely unrelated) to 4.0 (very strongly related). We made two notable modifications to the experimental procedure of R&G. First, instead of asking participants to judge “amount of ‘similarity of meaning,’” we asked them to judge “how closely related in meaning” each pair of nouns was. Second, we used a

Web interface to collect data in our study; instead of reordering a deck of cards, participants were presented with a grid of cards that they were able to rearrange interactively with the use of a mouse or any touch-enabled device, such as a mobile phone or tablet PC.

Figure 6.3 below shows the instructions as they were presented to participants in our study, including excerpted screen captures that show how elements of the interface (e.g., the

“cards” containing each noun pair) were presented to and manipulated by users.

STEP 1. On the following page, you will be presented with a grid of cards, each containing a pair of words. After looking through the whole grid of cards, order the pairs according to how closely related in meaning each pair of words is, so that the card containing the most closely related pair is at the start of the grid (top-left) and the pair that is least closely related is at the end (bottom-right). To move a card, simply click and drag the part of the card containing the word pair, as shown below:

[Screen capture: noun-pair cards being dragged to new positions in the grid]

STEP 2. When you have finished rearranging the cards, assign a value from 0.0 (completely unrelated) to 4.0 (very strongly related) to each pair. The more closely related in meaning the words are, the higher the number. You may assign the same value to more than one pair. To assign a value, click the gray field at the bottom of a card and type a number. You can use the TAB key to quickly move to the next field, or SHIFT-TAB to return to the previous field.

[Screen capture: cards with gray score fields; one card shows a typed value of 3.9]

Examples: For example, cup and coffee are strongly related, and would be given a high score. Umbrella and rain are also strongly related. In contrast, printer and hippopotamus are strongly unrelated, and would be given a very low score, as would soil and telephone.

We want to gauge your gut reaction to how closely related each pair of nouns is. Therefore, we ask that you complete this task in one sitting, within 20 minutes, and do not consult a dictionary or ask others for help.

Figure 6.3: Instructions presented to participants in our study.

In early usability testing of our Web interface, we observed that large datasets (e.g., 40 to

65 noun pairs) made the sorting task too difficult for users to manage. This was a limitation not of the interface itself, but of the time and attention required to reorder so many pairs of nouns.

With large datasets, users were overwhelmed by the need to make so many fine-grained distinctions in their relatedness judgments. For this reason, we chose to present each user with only 32 noun pairs for evaluation. We have already seen that sets of noun pairs can be split into smaller subsets for evaluation by different groups of participants in order to keep the task to a manageable size without significantly impacting subjects’ score distributions; when Miller and

Charles replicated Rubenstein and Goodenough’s study with a subset of only 30 of the 65 original noun pairs, the resulting means from the two experiments exhibited strong correlation

(r = 0.97).

6.2.1 Experimental Conditions

Each participant in our study was randomly assigned to one of four conditions. Each condition contained 32 noun pairs for evaluation. Of those pairs, 10 were randomly selected from our network (SGN), 10 from WN++, and 10 were generated by randomly pairing words from a list of all nouns occurring in Wikipedia. All pairs were matched for frequency using the 100 nearest neighbors for each noun, as sorted by frequency of occurrence in Wikipedia. We manually selected two additional pairs that appeared across all four conditions: leaves–rake and lion–cage. These control pairs were included to ensure that each condition contained examples of strong semantic relatedness, and potentially to help identify and eliminate data from participants who assigned random relatedness scores. Within each condition, the 32 word pairs were presented to all subjects in the same random order. Across conditions, the two control pairs were always presented in the same positions in the word pair grid.

Each word pair was subjected to additional scrutiny before being included in our dataset.

We eliminated any pairs falling into one or more of the following categories: (1) pairs containing proper nouns (e.g., grape–Europe and Italian–kilobyte35); (2) pairs in which one or both nouns might easily be mistaken for adjectives or verbs (e.g., cement–pact and second–knight, where

“cement” might be taken as a verb, or “second” as an adjective or verb); (3) pairs with advanced vocabulary or words that might require domain-specific knowledge in order to be properly evaluated (e.g., soprano–falsetto, which might be difficult for anyone without a musical background to evaluate, and baronet–privy council, which might require basic knowledge of

British hereditary titles and monarchic government); and (4) pairs with shared stems or common head nouns (e.g., first cousin–second cousin and sinner–sinning). The latter were eliminated to prevent subjects from latching onto superficial lexical commonalities as indicators of strong semantic relatedness without reflecting upon meaning.

6.2.2 Participants

Participants in our study were recruited from introductory undergraduate courses in psychology and computer science at the University of Central Florida. Students from the psychology courses participated for course credit and accounted for 89% of respondents.

35 These pairs in particular were drawn randomly from WN++. Pairs with proper nouns account for at least 20% of relationships in the WN++ network—a lower bound that accounts only for proper nouns categorized using the InstanceOf relation in WordNet. The actual occurrence in WN++ of pairs with proper nouns is necessarily higher.

A total of 92 participants provided data for our study. Of these, we identified 19 as outliers, and their data were excluded from the results reported below to prevent interference from individuals who appeared to be assigning random scores to noun pairs. Here we consider an outlier to be any individual whose numeric relatedness ratings fell outside two standard deviations from the means for more than 10% of the word pairs they evaluated (i.e., for at least four word pairs, since each condition contained 32 word pairs). For outlier detection, means and standard deviations were computed using leave-one-out sampling. That is, data from individual J were not incorporated into means or standard deviations when considering whether to eliminate

J as an outlier. We used this sampling method to prevent extreme outliers from masking their own aberration during outlier detection, which is potentially problematic when dealing with small populations. Without leave-one-out sampling (i.e., comparing to means established from the whole population), we would have identified fewer outliers (14 instead of 19), but the resulting means would still have correlated strongly to the dataset presented below (r = 0.991, p < 0.01).
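The following sketch illustrates this outlier criterion, assuming the ratings are stored as a nested dictionary mapping each subject to the scores they assigned to the pairs in their condition. The data structure, the named threshold parameters, and the function names are ours, not part of any released code.

```python
from statistics import mean, stdev

def is_outlier(subject, ratings, z_threshold=2.0, max_fraction=0.10):
    """Flag a subject whose rating falls more than z_threshold standard
    deviations from the leave-one-out mean on more than max_fraction of
    the word pairs they evaluated."""
    flagged = 0
    pairs = ratings[subject]
    for pair, score in pairs.items():
        # Leave-one-out: exclude this subject when computing mean and stdev.
        others = [r[pair] for s, r in ratings.items() if s != subject and pair in r]
        if len(others) < 2:
            continue
        mu, sigma = mean(others), stdev(others)
        if sigma > 0 and abs(score - mu) > z_threshold * sigma:
            flagged += 1
    return flagged > max_fraction * len(pairs)
```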

Of the 73 participants remaining after outlier elimination, there was a near-even split between males (37) and females (35), with one individual declining to provide any demographic data. The average age of participants was 20.32 (σ = 4.08, N = 72). Most students were freshmen

(49), followed in frequency by sophomores (16), seniors (4), and juniors (3). The most common majors represented were computer science, computer engineering, and information technology

(12); psychology (11); and engineering (mechanical, electrical, aerospace, and industrial) (10).

Participants earned an average score of 42% on a standardized test of advanced vocabulary

(σ = 16%, N = 72) (Test I – V-4 from Ekstrom, French, Harman, and Dermen, 1976).

6.3 Results

6.3.1 Mean Relatedness Scores

The mean relatedness scores (μ) and standard deviations (σ) for all 122 noun pairs in our study are reported below in Table 6.1. Initially, each word pair was evaluated by at least 20 individuals. After outlier removal (described above), each word pair retained evaluations from 14 to 22 individuals.

In Table 6.1, we also indicate the source of each randomly selected word pair, although occurrence of these pairs is not necessarily exclusive to any one network. For example, the pairs apparel–jewellery and underwear–lingerie were randomly selected from SGN and WN++ respectively, but appear in both networks.

Table 6.1: Mean relatedness scores for the 122 noun pairs in our study.

#     Word Pair                        μ      σ      Source
1     underwear–lingerie               3.94   0.14   WN++
2     digital camera–photographer      3.85   0.32   WN++
3     tuition–fee                      3.85   0.24   SGN
4     leaves–rake                      3.82   0.34   control
5     symptom–fever                    3.79   0.33   SGN
6     fertility–ovary                  3.78   0.23   WN++
7     beef–slaughterhouse              3.78   0.34   WN++
8     broadcast–commentator            3.75   0.32   SGN
9     apparel–jewellery                3.72   0.49   SGN
10    arrest–detention                 3.69   0.28   SGN
11    hardware–pc                      3.61   0.46   SGN
12    street–neighborhood              3.60   0.59   WN++
13    pixel–digital camera             3.57   0.60   WN++
14    vehicle–trailer                  3.54   0.45   SGN
15    mathematics–method               3.47   0.48   SGN
16    draft–manuscript                 3.46   0.92   SGN
17    flavor–pepper                    3.45   0.68   SGN
18    defense–soldier                  3.39   0.60   WN++
19    seller–profit                    3.39   0.91   WN++
20    lion–cage                        3.36   0.68   control
21    treasure–hunter                  3.35   0.75   SGN
22    translation–meaning              3.33   0.84   WN++
23    bread–egg                        3.24   0.53   WN++
24    garage–opener                    3.21   0.66   SGN
25    prohibition–rum                  3.17   1.07   WN++
26    fax–e-mail                       3.17   0.46   WN++
27    captive–custody                  3.15   0.63   SGN
28    solar system–sphere              3.13   0.54   WN++
29    vegetation–pastureland           3.13   1.00   SGN
30    leather–pouch                    3.12   0.47   SGN
31    terrace–pavilion                 3.10   0.61   WN++
32    recycling–landfill               3.09   0.79   WN++
33    garden–art                       3.07   0.58   WN++
34    recording studio–loudspeaker     3.01   0.45   WN++
35    strike–enemy                     3.00   0.73   SGN
36    ethanol–benzene                  3.00   0.83   SGN
37    pepper–corn                      2.99   0.95   SGN
38    murder–gang                      2.93   0.93   SGN
39    multiple–coefficient             2.92   0.90   WN++
40    infantry–reconnaissance          2.90   1.16   WN++
41    density–electron                 2.90   0.75   SGN
42    representation–gender            2.86   0.74   WN++
43    palm–anatomy                     2.79   0.80   random
44    poem–singer                      2.78   0.79   random
45    maintenance–aviation             2.74   0.82   SGN
46    mushroom–herb                    2.73   1.29   SGN
47    yeast–lager                      2.71   1.08   SGN
48    headache–caffeine                2.71   0.97   SGN
49    truss–cantilever bridge          2.71   1.40   SGN
50    workplace–discrimination         2.67   1.09   SGN
51    hunter–squirrel                  2.66   1.05   WN++
52    disorder–abuse                   2.66   1.01   SGN
53    contract–legislation             2.65   0.58   WN++
54    motivation–morality              2.52   0.92   WN++
55    sewer–overflow                   2.51   1.02   SGN
56    blindness–placebo                2.48   1.22   WN++
57    agility–fox hunting              2.46   0.92   random
58    proportion–stability             2.39   1.12   WN++
59    drug–public school               2.34   0.92   random
60    resentment–instigator            2.28   0.69   random
61    pow–combatant                    2.26   1.23   SGN
62    banana–salad                     2.25   0.87   WN++
63    emphasis–newspaper               2.24   0.88   random
64    forestry–urban area              2.23   1.24   WN++
65    hypocrisy–condemnation           2.22   1.34   SGN
66    facility–activity                2.05   1.11   SGN
67    rendezvous–convoy                2.04   1.09   SGN
68    propagation–radio wave           2.01   1.13   SGN
69    puppetry–slapstick               2.00   1.35   WN++
70    brand                            1.99   1.04   WN++
71    cartridge–lid                    1.97   1.11   random
72    enclosure–mental health          1.92   1.15   random
73    credit–foundation                1.91   1.37   random
74    guardian–livestock               1.91   1.23   SGN
75    coronation–majority rule         1.89   1.10   random
76    enzyme–depression                1.88   1.23   random
77    precursor–prevention             1.86   1.17   random
78    arbitration–committee            1.84   1.38   SGN
79    lemon–coriander                  1.75   1.13   SGN
80    stock–bull                       1.66   1.34   random
81    vicar–archdiocese                1.66   1.29   SGN
82    scrap–chemical element           1.57   1.12   WN++
83    paranoia–newsroom                1.48   1.14   random
84    phase–consistency                1.47   1.17   random
85    hope–psychology                  1.44   0.91   WN++
86    juvenile–rope                    1.39   1.15   random
87    half-hour–weeknight              1.37   0.90   SGN
88    evolution–publicity              1.36   1.19   random
89    contestant–donor                 1.31   1.15   random
90    sheet–window                     1.21   1.13   WN++
91    robbery–mobile phone             1.21   1.00   WN++
92    metre–semifinal                  1.20   0.84   SGN
93    relief–total                     1.18   0.97   random
94    public service–array             1.18   0.93   random
95    fresco–modern times              1.11   0.98   random
96    summit–canal                     1.09   1.02   SGN
97    outlet–silk                      1.09   1.04   random
98    greed–vest                       1.08   0.98   random
99    eyeball–flatworm                 1.05   1.10   WN++
100   spreadsheet–silk                 0.99   0.95   WN++
101   distinction–sword                0.94   0.92   random
102   inclusion–career                 0.83   0.82   random
103   feud–programmer                  0.81   0.80   random
104   switch–glass                     0.79   0.71   WN++
105   penalty–programming              0.78   0.77   random
106   duty–verb                        0.77   0.71   random
107   catering–loan                    0.77   1.01   random
108   musical group–confession         0.76   0.71   random
109   complication–harp                0.74   0.84   random
110   female–insect                    0.74   0.69   WN++
111   seminar–fern                     0.71   1.00   random
112   home run–surfer                  0.71   0.94   random
113   mishap–cube root                 0.70   0.83   random
114   thriller–sunlight                0.61   0.66   WN++
115   crusade–catwalk                  0.59   0.74   random
116   jumper–furnishing                0.59   0.76   random
117   eclipse–cord                     0.57   0.80   random
118   fork–combination                 0.55   0.66   WN++
119   madness–nest                     0.46   0.55   random
120   gas–algebra                      0.41   0.51   WN++
121   hotel–bibliography               0.37   0.49   random
122   gladiator–plastic bag            0.13   0.32   random

6.3.2 Distribution of Standard Deviations

The standard deviations for relatedness scores ranged from 0.13 to 1.40 in our study, with strongly related and strongly unrelated pairs exhibiting the lowest variation (see Figure 6.4).

These results are consistent with the findings of Rubenstein and Goodenough, who reported standard deviations ranging from 0.70 to 1.30 for word pairs with similarity means from 1.0 to

3.0. In our data, pairs with relatedness means from 1.0 to 3.0 had standard deviations ranging from 0.58 to 1.40. This indicates that human perceptions of relatedness vary widely for moderately and weakly related nouns, but does not reveal the source of variation—whether some individuals are simply more liberal or more conservative than others with their relatedness ratings, or if the relative ordering of pairs’ relatedness also varies widely between individuals.

[Scatter plot: Standard Deviation (0.00 to 1.60) on the y-axis vs. Word Pair (#) (1 to 122) on the x-axis]

Figure 6.4: Standard deviations of relatedness scores from our study range from 0.13 to 1.40 and are lowest for pairs that are strongly related or strongly unrelated.

6.3.3 Human Correlation to Relatedness Means

Within each of our four experimental conditions, we computed how strongly each participant’s data correlated to the mean relatedness scores, again using leave-one-out sampling.

The means of these correlations are presented below in Table 6.2. Individual correlations for all

73 participants were significant at p < 0.01.
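The leave-one-out correlation computed here can be sketched as follows, reusing the nested ratings dictionary assumed in the outlier-detection sketch above; the helper names are illustrative.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def subject_correlation(subject, ratings):
    """Correlate one subject's scores with the leave-one-out mean scores."""
    own, loo_means = [], []
    for pair, score in ratings[subject].items():
        others = [r[pair] for s, r in ratings.items() if s != subject and pair in r]
        if others:
            own.append(score)
            loo_means.append(mean(others))
    return pearson(own, loo_means)
```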

We find that judgments from individual subjects in our study exhibit high average correlation to the elicited relatedness means (r = 0.769, σ = 0.09, N = 73). Resnik, in his replication of the M&C study, reported average individual correlation of r = 0.90 (σ = 0.07,

N = 10) to similarity means elicited from a population of 10 graduate students and postdoctoral researchers. Presumably Resnik’s subjects had advanced knowledge of what constitutes semantic similarity, as he established r = 0.90 as an upper bound for expected human correlation on that task. The fact that average human correlation in our study is weaker than in previous studies suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity, and that a reasonable computational measure of relatedness might only approach a correlation of r = 0.769 to relatedness norms.

Table 6.2: Mean human correlation to relatedness norms from each condition.

Condition   r       σ      N
1           0.774   0.09   20
2           0.773   0.08   22
3           0.802   0.06   17
4           0.759   0.10   14
All         0.769   0.09   73

6.3.4 Correlation of Similarity and Relatedness Measures to Rel-122 Norms

In Table 6.3 we present the performance of a variety of relatedness and similarity measures on our new set of relatedness means, listed here as Rel-122. With the exception of our own measure, which is the score function presented above in Section 3.2.1, the measures listed in

Table 6.3 are all discussed in detail above in Section 2.2, “WordNet-Based Measures of

Similarity and Relatedness.” Figures in starred rows are traditionally considered to be relatedness measures; the remaining rows are similarity measures. (For a summary of these measures, see

Table 2.1.) Coefficients of correlation are given for Pearson’s product-moment correlation (r), as well as Spearman’s rank correlation (ρ). For comparison, we include results for the correlation of these measures to the M&C and R&G similarity means.
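Both coefficients are readily computed with scipy.stats (assuming SciPy is available); the small wrapper below simply pairs a measure's scores with the corresponding relatedness means.

```python
from scipy.stats import pearsonr, spearmanr

def correlate(measure_scores, human_means):
    """Pearson's r and Spearman's rho between a measure and the norms."""
    r, _ = pearsonr(measure_scores, human_means)
    rho, _ = spearmanr(measure_scores, human_means)
    return r, rho
```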

Table 6.3: Coefficients of correlation to mean relatedness scores (Rel-122) and mean similarity scores (M&C, R&G) for various measures. Pearson’s product-moment correlations (r-values) and Spearman’s rank correlations (ρ-values) are reported.

                                     Rel-122          M&C              R&G
Measure                              r       ρ        r       ρ        r       ρ
* Szumlanski and Gomez (2010)        0.654   0.534    0.852   0.859    0.824   0.841
* Patwardhan and Pedersen (2006)     0.341   0.364    0.865   0.906    0.793   0.795
Path Length                          0.225   0.183    0.755   0.715    0.784   0.783
* Banerjee and Pedersen (2003)       0.210   0.258    0.356   0.804    0.340   0.718
Resnik (1995)                        0.203   0.182    0.806   0.741    0.822   0.757
Jiang and Conrath (1997)             0.188   0.133    0.473   0.663    0.575   0.592
Leacock and Chodorow (1998)          0.173   0.167    0.779   0.715    0.839   0.783
Wu and Palmer (1994)                 0.187   0.180    0.764   0.732    0.797   0.768
Lin (1998)                           0.145   0.148    0.739   0.687    0.726   0.636
* Hirst and St-Onge (1998)           0.141   0.160    0.667   0.782    0.726   0.797

Aside from the implementation of our own measure, the results reported above are derived from the standard implementations of these algorithms in version 2.05 of the

WordNet::Similarity Perl module (Pedersen et al., 2004), using WordNet version 3.0.

We note that the strength of our own measure’s correlation to the relatedness norms, r = 0.654, is encouraging, especially in light of the fact that our measure was only developed to produce a relative reordering of co-targets by relational strength to a target, and not to provide globally meaningful measurements of semantic relatedness.

The generally weak performance of the WordNet-based measures on this task is not surprising, given our observation that WordNet’s minimalistic sense glosses and strong disposition toward codifying semantic similarity make it an impoverished resource for discovering general semantic relatedness. Even the three measures that have been touted in the literature as relatedness measures (Banerjee & Pedersen, 2003; Hirst & St-Onge, 1998;

Patwardhan & Pedersen, 2006) have been hampered by their reliance upon WordNet.36

6.4 Summary

In this chapter, we presented a new set of relatedness norms for 122 noun pairs. The norms are offered as a new evaluative standard for quantitative computational measures of semantic relatedness, which have seen strong reliance on comparison to R&G and M&C

36 Recall that Hirst and St-Onge’s path-based measure (Section 2.2.2) is largely dependent on similarity relationships denoted by WordNet’s IsA relations, and that it earned more general classification as a relatedness measure for its incorporation of antonymic and meronymic (part-whole) relationships (see, e.g., Budanitsky & Hirst, 2006). We should note that there is some question about the accuracy of this classification, as some sources (cf. the comments of Resnik, 1999, p. 95) have pointed out that meronymic relations can be considered indications of similarity. Antonymy is similarly considered by some to capture notions of strong similarity, albeit with negative polarity.

similarity norms—despite widespread acknowledgement in the literature that similarity is only one specific type of relatedness. Our relatedness norms were elicited from human participants with minor modifications to an established methodology that has been used to acquire gold standard similarity norms, and which has been shown to yield consistent results across multiple similarity studies. The dissemination of this new dataset is the primary contribution of this study.

In analyzing the results of our study, we also arrived at several key findings: first, human participants exhibit lower degrees of correlation to relatedness norms than to similarity norms, suggesting that human perceptions of relatedness are less strongly constrained than those of similarity. Second, the average human correlation of r = 0.769 to our relatedness norms suggests that in order to achieve human-like performance at measuring semantic relatedness, a computational measure need not aspire to the same degree of alignment with relatedness norms as with similarity norms. Finally, we observed that WordNet-based measures of similarity and relatedness are indeed inhibited from discovering relatedness by the network’s strong emphasis on codifying similarity relationships, vis-à-vis its sophisticated IsA ontology. In order to achieve Quillian’s dream of a computational model of semantic memory, we must look beyond WordNet to find more general indications of semantic relatedness.

CHAPTER 7: CONCLUSIONS

In this chapter, we review the contributions and central findings of this dissertation. First we give an overview of how the network was automatically acquired (Section 7.1) and evaluated (Section 7.2). Then we discuss our investigation into human perceptions of relatedness and the implications our findings have for assessing computational relatedness measures (Section 7.3). We conclude with a discussion of the current state of the network, the kinds of relationships it expresses, and a few directions for future work (Section 7.4).

7.1 Acquisition

The primary limitation of WordNet as a model of semantic memory (cf. Quillian, 1968) is its strong focus on codifying semantic similarity, which, as we have seen, is only one particular type of semantic relatedness. While other semantic networks have attempted to represent a wider variety of semantic relations, they typically focus on surface relationships between words instead of concepts (cf. ConceptNet and NELL), or only attempt to measure relatedness quantitatively instead of constructing networks (vis-à-vis the Wikipedia-based measures discussed in Sections 2.5.1 through 2.5.3). A notable exception is WordNet++ (WN++), which derives semantic links between WordNet concepts from inter-article links in Wikipedia. However, links in Wikipedia are often capricious, which gives rise to many spurious relationships in WN++ (cf. the relationship of prostitution#1 to English_language#1 in WN++, or of United_States#1 to burrito#1; spurious relationships in the network are particularly common with proper nouns, which have a tendency to appear in articles about entities to which they bear no strong semantic relationship).

Cognizant of the limitations of these approaches to semantic relatedness, and of the importance of concept-level relationships to mechanisms of natural language understanding, we embarked upon the acquisition of a new semantic network. We first leveraged the tremendous amount of data available in the Wikipedia corpus to automatically discover the semantic associates of over 7,500 of the most common nouns in the English language. Toward this end, we adapted an information theoretic measure that took higher-than-expected co-occurrence frequencies as indications of relatedness between words. Relying on our asymmetric, quantitative measure, we then found pairs of nouns that exhibited strong, mutual relatedness and admitted them to a semantic network of related nouns. This first phase of network acquisition saw the creation of a semantic network with 155,180 edges indicating semantic relatedness between nouns.
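The following simplified sketch illustrates the general shape of this first acquisition phase. The scoring function shown is a generic observed-versus-expected co-occurrence ratio, not the adapted measure developed in this dissertation, and the data structures and rank cutoff K are illustrative assumptions.

```python
# Simplified sketch of phase one: score co-occurring nouns against each target
# by how much their observed co-occurrence exceeds chance, then admit an
# undirected edge only when the attraction is mutual. This is an illustrative
# stand-in for (not a reproduction of) the dissertation's adapted measure.

import math

def relatedness(cooc, freq, total, a, b):
    """Asymmetric score of b as an associate of a: log of observed vs. expected co-occurrence."""
    observed = cooc[a][b]
    if observed == 0:
        return float("-inf")
    expected = freq[a] * freq[b] / total   # chance co-occurrence under independence
    return math.log(observed / expected)

def build_network(cooc, freq, total, K=25):
    """cooc: {noun: {co-target: count}}; keep (a, b) iff each is among the other's top-K associates."""
    top = {}
    for a in cooc:
        ranked = sorted(cooc[a],
                        key=lambda b: relatedness(cooc, freq, total, a, b),
                        reverse=True)
        top[a] = set(ranked[:K])
    edges = set()
    for a in top:
        for b in top[a]:
            if b in top and a in top[b]:          # mutual strong relatedness
                edges.add(tuple(sorted((a, b))))  # undirected edge between nouns
    return edges
```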

In the second phase of network acquisition, we automatically disambiguated those nouns to noun senses (i.e., concepts) from WordNet 3.0. To do so, we employed a suite of disambiguation algorithms that capitalized on salient sense clustering (vis-à-vis categorization in WordNet) among related nouns. For example, the noun “pie” is strongly associated with several nouns that have senses categorized by baked_goods#1 in WordNet, which serves as a strong cue to preferentially disambiguate “cookie” (as it relates to “pie”) to cookie#1 (the baked good), as opposed to cookie#2 (a cook) or cookie#3 (a web browser cookie). Similarly, the higher-than-expected relationship of “astronomer” to nouns categorized by celestial_body#1 helps us disambiguate “star” to its celestial body senses, and the relationship of “unicorn” to several nouns categorized by mythical_being#1 informs our disambiguation of, e.g., “phoenix” to the legendary bird that rises from the ashes to be born anew (phoenix#3), as opposed to the constellation of the same name (phoenix#4), the capital city of Arizona (phoenix#1), or genus Phoenix (phoenix#2).
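The following sketch, written against NLTK’s WordNet interface, illustrates the intuition behind this sense-clustering step. It is a simplification of the disambiguation suite described above; the scoring heuristic is an assumption made for illustration only.

```python
# Illustrative sketch of sense clustering: prefer the sense of a word whose
# hypernym categories are best represented among its semantic associates.
# This is a simplified heuristic, not the dissertation's full algorithm suite.

from collections import Counter
from nltk.corpus import wordnet as wn

def hypernym_closure(synset):
    """The synset together with all of its transitive hypernyms."""
    return set(synset.closure(lambda s: s.hypernyms())) | {synset}

def category_counts(associates):
    """For each hypernym, count how many associates have at least one noun sense under it."""
    counts = Counter()
    for word in associates:
        covered = set()
        for synset in wn.synsets(word, pos=wn.NOUN):
            covered |= hypernym_closure(synset)
        counts.update(covered)
    return counts

def disambiguate(word, associates):
    """Pick the sense of `word` whose hypernyms are most heavily shared by the associates."""
    counts = category_counts(associates)
    senses = wn.synsets(word, pos=wn.NOUN)
    if not senses:
        return None
    return max(senses, key=lambda s: sum(counts[h] for h in hypernym_closure(s)))

# e.g., disambiguate("cookie", ["pie", "cake", "biscuit", "pastry"]) should
# favour the baked-good sense over the cook or web-browser-cookie senses.
```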

The resulting network, which we call the Szumlanski-Gomez Network (SGN), indicates semantic relatedness between concepts from the noun sense inventory of WordNet. It articulates 208,832 relationships between 38,249 distinct concepts. It is derived from the automatic discovery of semantic associates for over 7,500 target nouns, with 17,104 distinct senses among them. Mirroring the structure of WordNet, concepts are related categorically, rather than quantitatively. Furthermore, following the observation of Quillian that any concept can serve as a relationship between two entities, we have not restricted ourselves to mining instances of specific, labeled relations. Instead, our network represents relatedness in an unlabeled manner. The addition of labels to a network like ours is a potentially exciting avenue for future work, albeit a challenging one, as even humans sometimes have tremendous difficulty verbalizing the relation that binds strongly related entities.

7.2 Evaluation

Following standard procedure in the relatedness literature, we evaluated our network’s performance on two tasks: comparison to similarity norms and an applied task. With respect to similarity norms, we found high correlation of our quantitative relatedness scoring mechanism to the similarity norms of Rubenstein and Goodenough (1965) and Miller and Charles (1991) (r = 0.852 and r = 0.824, respectively). With respect to an applied task, we evaluated the performance of our network on the SemEval-2007 coarse-grained word sense disambiguation (WSD) task (Navigli et al., 2007) using two graph-based disambiguation algorithms: the extended gloss overlaps measure of Banerjee and Pedersen (2003) and the degree centrality algorithm of Navigli and Lapata (2010). We compared our results on this task to those achieved using WordNet and WN++ with the same algorithms, and presented three central findings with respect to results from the extended gloss overlaps measure: first, our network’s performance was comparable to that of WordNet. Second, the combination of SGN and WordNet produced results that out-performed what either network achieved as a stand-alone resource. Third, our network outperformed WN++, which we attributed to spurious relationships found in WN++.

With respect to the degree centrality algorithm, we found that neither SGN nor WN++ outperformed WordNet, and that WordNet itself was unable to surpass the most frequent sense baseline using that algorithm. We attributed the shortcomings of the degree centrality algorithm to the fact that semantic networks like WordNet, WN++, and SGN tend to be dense, and so it is often possible to find short paths between any two concepts in the networks. Thus, the degree centrality algorithm, which searches for short paths between co-occurring nouns in a context and disambiguates nouns to the sense(s) that are participant to the greatest number of such paths, essentially serves to disambiguate a noun to whichever sense has the greatest number of relationships in a network. This explains the “plateau effect” of the algorithm that we observed, and which Navigli and Lapata (2010) also reported, and is one of the novel findings of our research.
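The sketch below illustrates a degree-centrality-style disambiguation procedure in the spirit of Navigli and Lapata (2010), simplified for exposition, and makes the plateau effect easy to see: in a dense network, the induced subgraph contains most neighbors of every candidate sense, so the sense with the most edges overall tends to win regardless of context. The graph representation, candidate sense lists, and path-length bound are assumptions, not the exact implementation evaluated in this work.

```python
# Simplified degree-centrality WSD sketch over an adjacency-dict sense graph.
# 1) Collect nodes lying on short paths between senses of different context
#    nouns; 2) compute each node's degree within that induced subgraph;
# 3) assign each noun its highest-degree candidate sense.

from itertools import combinations

def short_paths(graph, start, goal, max_nodes):
    """Enumerate simple paths from start to goal spanning at most max_nodes nodes (DFS)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal and len(path) > 1:
            yield path
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def degree_centrality_wsd(graph, candidate_senses, max_nodes=4):
    """candidate_senses: {noun: [sense ids]}. Returns {noun: chosen sense id}."""
    subgraph_nodes = set()
    nouns = list(candidate_senses)
    for n1, n2 in combinations(nouns, 2):
        for s1 in candidate_senses[n1]:
            for s2 in candidate_senses[n2]:
                for path in short_paths(graph, s1, s2, max_nodes):
                    subgraph_nodes.update(path)
    degree = {v: sum(1 for u in graph.get(v, ()) if u in subgraph_nodes)
              for v in subgraph_nodes}
    return {noun: max(senses, key=lambda s: degree.get(s, 0))
            for noun, senses in candidate_senses.items()}
```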

In addition to these standard evaluative procedures, we subjected our network to manual inspection by independent judges who evaluated the precision of the noun-noun relationships that were admitted to our network, as well as the precision of our disambiguation results. Although manual inspection is tedious and does not allow for comprehensive evaluation of a network, the small samples of data that our judges evaluated yielded promising results. On average, they judged the precision of noun-noun relationships in our network to stand at 95.66% (out of 100 pairs evaluated), and an average of 85% of our disambiguation results were deemed acceptable to our judges (out of 50 pairs evaluated).

7.3 Human Perceptions of Relatedness

Although comparison to human judgments of semantic similarity has long served as a gold standard for evaluating similarity and relatedness measures, the distinction between similarity and relatedness (in that similarity is one particular type of relatedness) is well established in the field. To date, no viable gold standard of relatedness norms has emerged to supplement or supplant comparison to the similarity norms of Rubenstein and Goodenough (1965) (R&G) and Miller and Charles (1991) (M&C). Thus, we embarked on the creation of a set of relatedness norms. In our study, we followed the established methodology of R&G, adapted in our case to elicit relatedness scores instead of similarity scores. Our resulting set of relatedness norms for 122 noun pairs is the primary contribution of that study.

In analyzing the results of our study, we also presented three key findings with respect to relatedness norms. First, individuals in our study exhibited strong correlation to mean relatedness scores using leave-one-out sampling (so individuals were never compared to their own data) (r = 0.77, σ = 0.09, N = 73). Second, we found individual correlation to our relatedness norms to be lower than the expected human correlation to similarity norms, for which Resnik (1995) established an upper bound of r = 0.90. This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity, and that quantitative, computational measures of semantic relatedness need not aspire to r = 0.90 correlation with relatedness norms in order to claim human-like performance. Third, we evaluated WordNet-based quantitative measures of semantic similarity and relatedness that exhibited high average correlation to the R&G and M&C similarity norms (r = 0.711 and r = 0.689 on the two datasets, respectively, for the nine measures evaluated) and found that they exhibited low average correlation to our relatedness norms (r = 0.201). In comparison, our adapted scoring measure correlated to the relatedness norms from our study with r = 0.654. These results support our claim that WordNet—despite its sophisticated IsA ontology and the additional relations and glosses it provides—is insufficient to indicate semantic relatedness between concepts in the general case.

7.4 Discussion

The relationships in SGN reflect broad coverage of general human knowledge and perceptions of relatedness. The network codifies commonsense relationships, such as (lock#1, key#1), (pen#1, pocket#1), (camping#1, tent#1), and (camping#1, campfire#1), as well as basic, essential relationships, such as (elephant#1, tusk#{1,2}). (In WordNet, elephant#1 is the pachyderm sense of “elephant;” elephant#2 is the symbol of the United States’ Republican Party.) Of particular interest in the case of (elephant#1, tusk#{1,2}) is the fact that our network allows for relationships between multiple senses of the same words. In this case, our disambiguation methods have taken both tusk#1 and tusk#2 for relation to elephant#1. Indeed, both senses seem strongly related:

(tusk#1) a hard smooth ivory colored dentine that makes up most of the tusks of elephants and walruses

(tusk#2) a long pointed tooth specialized for fighting or digging; especially in an elephant or walrus or hog

The conflation of these senses of “tusk” in relation to elephant#1 is similar to the conflation of star#1 and star#3 that we saw earlier in relation to astronomer#1, in that it demonstrates the ability of our disambiguation methods to cope with the kinds of fine-grained polysemic distinctions we often see in WordNet. The concepts star#1 and star#3 are sisters in WordNet, both sharing celestial_body#1 as their immediate hypernym, and the distinction between the two is often difficult for humans to pinpoint:

(star#1) (astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior

(star#3) any celestial body visible (as a point of light) from the Earth at night

Compare this to WordNet’s delineation of the following senses of “key,” which can be viewed as an instance of homonymy rather than one of systemic polysemy, as the concepts bear no similarity:

(key#1) metal device shaped in such a way that when it is inserted into the appropriate lock the lock’s mechanism can be rotated

(key#4) any of 24 major or minor diatonic scales that provide the tonal framework for a piece of music

In addition to basic, commonsense relationships, our network also gives broad indication of relationships that are explicitly historical or cultural in nature, such as the pairwise association of communist#1, atheist#1, and homosexual#1, or the relationship of evolution#1 to both creationism#1 and public_school#1. Upon close examination, the network can even be found to contain traces of prevailing attitudes toward certain entities, such as the one reflected in the relationship of concentration_camp#1 to atrocity#1 in SGN. These subtle indications of how the human mind classifies or associates certain entities might eventually be useful in automatically assessing the polarity of nouns in the ontology (i.e., the positive and negative connotations of certain words).

Of course, for a complete understanding of why two entities are related in the network, we often have to analyze sentences in which they co-occur. This is particularly true of relationships in SGN that represent some of the specific, technical knowledge articulated in Wikipedia. For example, the pair (mansion#1, constellation#1), which at first glance seems spurious, represents a relationship from the domain of astrology; the gloss for mansion#1 in WordNet is “(astrology) one of 12 equal areas into which the zodiac is divided,” and the concept is synonymous with star_sign#1 and sign_of_the_zodiac#1. Similarly, the pair (canal#3, summit#1) might seem spurious and nonsensical to those of us without technical knowledge of the workings of canal locks; a summit level canal is a particular type of canal, and a summit pound is the highest pound (i.e., body of water between two canal locks) of a particular route along a canal. Unfortunately, neither summit level canal nor summit pound has a lexical entry in WordNet, and so our network cannot relate them to canal#3. We are instead left with the association of “canal” and “summit” from their frequent co-occurrence in Wikipedia. However, the augmentation of WordNet with new concepts, signified by significant collocation throughout a corpus like Wikipedia, is one exciting avenue for future research. For example, the frequent co-occurrence of “elephant” and “graveyard” in Wikipedia is represented in our network with the relationship (elephant#1, graveyard#1), since WordNet has no lexical entry for “elephant graveyard.” However, the terms’ collocative occurrence throughout the corpus as “elephant graveyard,” combined with the strong evidence for their relatedness discovered in our research, suggests the phrase should garner its own entry in the ontology.
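As a sketch of how such candidate concepts might be flagged automatically, the following illustration uses NLTK’s bigram collocation finder to surface frequent noun-noun collocations that have no existing WordNet entry. The tokenization, thresholds, and scoring function are assumptions; this is not the acquisition machinery described in this dissertation.

```python
# Illustrative sketch: flag frequent bigrams with no WordNet noun entry as
# candidate new concepts (e.g., "elephant graveyard"). Thresholds and the
# choice of association measure are arbitrary assumptions for illustration.

from nltk.corpus import wordnet as wn
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

def candidate_concepts(tokens, min_freq=10, top_n=100):
    """Return (lemma, score) pairs for strong collocations absent from WordNet."""
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_freq)
    scored = finder.score_ngrams(BigramAssocMeasures.likelihood_ratio)
    candidates = []
    for (w1, w2), score in scored[:top_n]:
        lemma = f"{w1}_{w2}"                       # WordNet's multiword convention
        if not wn.synsets(lemma, pos=wn.NOUN):     # no existing noun entry
            candidates.append((lemma, score))
    return candidates

# On a Wikipedia-scale token stream, a collocation such as "elephant_graveyard"
# would be expected to surface here, since it co-occurs often but has no
# WordNet noun entry.
```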

The method we have presented for network acquisition is general enough that it can be applied not only to Wikipedia, but to other large corpora, as well. This provides several avenues for future research. Of particular interest is the continued development and expansion of the network to include new relationships not yet discovered from our version of the Wikipedia corpus. This can perhaps be achieved by applying our methodology to new versions of Wikipedia, or even to other corpora. The former suggests another avenue for future research, which is an analysis of how semantic relatedness, as reflected by usage in a large corpus, changes over time. Because we have restricted our consideration to semantic associates of the most common nouns in the English language, our research also leaves room to explore the discovery of semantic associates of infrequently occurring nouns—possibly using the current network, which already contains relationships for some nouns outside of our target range, to bootstrap a new phase of acquisition.

LIST OF REFERENCES

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 19–27.

Agirre, E., de Lacalle, O. L., & Soroa, A. (2009). Knowledge-based WSD on specific domains: Performing better than generic supervised WSD. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 1501–1506.

Ahn, D., Jijkoun, V., Mishne, G., Müller, K., de Rijke, M., & Schlobach, S. (2004). Using Wikipedia at the TREC QA track. In Proceedings of the 13th Text Retrieval Conference (TREC 2004).

Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics and the 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), 86–90.

Banerjee, S., & Pedersen, T. (2003). Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 805–810.

Berland, M., & Charniak, E. (1999). Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), 57–64.

Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., & Hellmann, S. (2009). DBpedia – A crystallization point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 7, 154–165.

Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD 2008 International Conference on Management of Data, 1247–1250.

Bollacker, K., Tufts, P., Pierce, T., & Cook, R. (2007). A platform for scalable, collaborative, structured information integration. In Proceedings of the 6th International Workshop on Information Integration on the Web, Association for the Advancement of Artificial Intelligence (AAAI), 22–27.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21, 543–565.

Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.

Bunescu, R., & Paşca, M. (2006). Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, 9–16.

Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E. R., Jr., & Mitchell, T. M. (2010). Toward an architecture for never-ending language learning. In Proceedings of the 24th Conference on Artificial Intelligence (AAAI), 1306–1313.

Charniak, E. (1983). Passing markers: A theory of contextual influence in language comprehension. Cognitive Science, 7(3), 171–190.

Chaudhari, D. L., Damani, O. P., & Laxman, S. (2011). Lexical co-occurrence, statistical significance, and word association. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1058–1068.

Chklovski, T., & Pantel, P. (2004). VerbOcean: Mining the Web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), 33–40.

Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.

Cilibrasi, R. L., & Vitanyi, P. M. B. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370–383.

Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407–428.

Collins, A. M., & Quillian, M. R. (1972). How to make a language user. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 309–351). New York: Academic Press.

Coursey, K., Mihalcea, R., & Moen, W. (2009). Using encyclopedic knowledge for automatic topic identification. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL), 210–218.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.

Ekstrom, R. B., French, J. W., Harman, H. H., & Dermen, D. (1976). Manual for Kit of Factor-Referenced Cognitive Tests. Princeton, NJ: Educational Testing Service.

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A., Shaked, T., . . . Yates, A. (2004). Web-scale information extraction in KnowItAll. In Proceedings of the 13th International World Wide Web Conference, 100–110.

Fader, A., Soderland, S., & Etzioni, O. (2011). Identifying relations for open information extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1535–1545.

Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. MIT Press.

Fillmore, C. J. (1976). Frame semantics and the nature of language. Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, 280, 20–32.

Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems (TOIS), 20(1), 116–131.

Firth, J. R. (1957). A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, 1–32. Oxford: Philological Society. Reprinted in F. R. Palmer (Ed.), Selected Papers of J. R. Firth 1952-1959 (pp. 168–205). Bloomington and London: Indiana University Press (1968).

Fleischman, M., Hovy, E., & Echihabi, A. (2003). Offline strategies for online question answering: Answering questions before they are asked. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), 1–7.

Gabrilovich, E. (2006, October 4). The WordSimilarity-353 Test Collection. Retrieved December 6, 2012, from http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/.

Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 1606–1611.

Gale, W. A., Church, K. W., & Yarowsky, D. (1992). One sense per discourse. In Proceedings of the DARPA Speech and Natural Language Workshop, 233–237.

Gildea, D., & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28(3), 245–288.

Girju, R., Badulescu, A., & Moldovan, D. (2006). Automatic discovery of part-whole relations. Computational Linguistics, 32(1), 83–135.

Gomez, F. (2001). An algorithm for aspects of semantic interpretation using an enhanced WordNet. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), 87–94.

Gomez, F. (2004). Building verb predicates: A computational view. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL), 351–358.

Gomez, F. (2007). Semantic interpretation and the WordNet upper-level ontology. Journal of Intelligent Systems, 16(2), 93–116.

Gorman, J., & Curran, J. R. (2006). Scaling distributional similarity to large corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), 361–368.

Grefenstette, G. (1994). Explorations in Automatic Thesaurus Discovery. Boston, MA: Kluwer Academic Publishers.

Harris, Z. S. (1954). Distributional structure. Word, 10(1), 146–162. Reprinted in J. J. Katz (Ed.), The Philosophy of Linguistics (pp. 26–47). Oxford University Press (1985).

Havasi, C., Speer, R., & Alonso, J. B. (2007). ConceptNet 3: A flexible, multilingual semantic network for commonsense knowledge. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).

Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING), 539–545.

Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (ACL), 268–275.

Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database (pp. 305–332). MIT Press.

Hughes, T., & Ramage, D. (2007). Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 581–589.

Jarmasz, M., & Szpakowicz, S. (2003). Roget’s Thesaurus and semantic similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), 212–219.

Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING), 19–33.

Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39(2), 170–210.

Koeling, R., McCarthy, D., & Carroll, J. (2005). Domain-specific sense distributions and predominant sense acquisition. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), 60–67.

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79–86.

Leacock, C., & Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database (pp. 265–283). MIT Press.

Lee, J. H., Kim, M. H., & Lee, Y. J. (1993). Information retrieval based on conceptual distance in is-a hierarchies. Journal of Documentation, 49(2), 188–207.

Lee, M. D., Pincombe, B., & Welsh, M. (2005). An empirical evaluation of models of text document similarity. In Proceedings of the 27th Annual Cognitive Science Conference (CogSci), 1254–1259.

Lenat, D. B. (1995). CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11), 33–38.

Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC), 24–26.

Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. Chicago, IL: University of Chicago Press.

Lin, D. (1998). An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML), 296–304.

Lin, D., & Pantel, P. (2001). Discovery of inference rules for question answering. Natural Language Engineering, 7(4), 343–360.

Lin, D., Zhao, S., Qin, L., & Zhou, M. (2003). Identifying synonyms among distributionally similar words. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 1492–1493.

Liu, H., & Singh, P. (2004a). ConceptNet – a practical commonsense reasoning toolkit. BT Technology Journal, 22(4), 211–226.

Liu, H., & Singh, P. (2004b). Commonsense reasoning in and over natural language. In Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES), 293–306.

McCarthy, J. (1959). Programs with common sense. In Proceedings of the Teddington Conference on the Mechanization of Thought Processes, 756–791.

McKoon, G., & Ratcliff, R. (1992). Spreading activation versus compound cue accounts of priming: Mediated priming revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(6), 1155–1172.

Medelyan, O., & Legg, C. (2008). Integrating Cyc and Wikipedia: Folksonomy meets rigorously defined common-sense. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence, 13–18.

Mihalcea, R. (2007). Using Wikipedia for automatic word sense disambiguation. In Proceedings of Human Language Technology: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 196–203.

Mihalcea, R., & Csomai, A. (2007). Wikify!: Linking documents to encyclopedic knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM), 233–242.

Miller, G. A., & Charles, W. G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), 1–28.

Miller, G. A. (1998). Nouns in WordNet. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database (pp. 23–46). MIT Press.

Milne, D., & Witten, I. (2008a). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI), 25–30.

Milne, D., & Witten, I. (2008b). Learning to link with Wikipedia. In Proceedings of the 17th ACM International Conference on Information and Knowledge Management (CIKM), 509–518.

Minsky, M., Singh, P., & Sloman, A. (2004). The St. Thomas common sense symposium: Designing architectures for human-level intelligence. AI Magazine, 25(2), 113–124.

Moldovan, D., Badulescu, A., Tatu, M., Antohe, D., & Girju, R. (2004). Models for the semantic classification of noun phrases. In Proceedings of the HLT-NAACL 2004 Workshop on Computational Lexical Semantics, 60–67.

Navigli, R. (2005). Semi-automatic extension of large-scale linguistic knowledge bases. In Proceedings of the 18th Florida Artificial Intelligence Research Society Conference (FLAIRS), 548–553.

Navigli, R., & Lapata, M. (2010). An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 678–692.

Navigli, R., Litkowski, K. C., & Hargraves, O. (2007). SemEval-2007 task 07: Coarse-grained English all-words task. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval ’07), 30–35.

Palermo, D., & Jenkins, J. (1964). Word association norms: Grade school through college. Minneapolis, MN: University of Minnesota Press.

Pantel, P., & Pennacchiotti, M. (2006). Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), 113–120.

Patwardhan, S., & Pedersen, T. (2006). Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics Workshop on Making Sense of Sense, 1–8.

Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity – Measuring the relatedness of concepts. In Proceedings of the 5th Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), 38–41.

Ponzetto, S. P., & Navigli, R. (2009). Large-scale taxonomy mapping for restructuring and integrating Wikipedia. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2083–2088.

Ponzetto, S. P., & Navigli, R. (2010). Knowledge-rich word sense disambiguation rivaling supervised systems. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 1522–1531.

Quillian, M. R. (1968). Semantic Memory. In M. Minsky (Ed.), Semantic Information Processing. MIT Press.

Quillian, M. R. (1969). The Teachable Language Comprehender: A simulation program and theory of language. Communications of the ACM, 12(8), 459–476.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann.

Rada, R., & Bicknell, E. (1989). Ranking documents with a thesaurus. Journal of the American Society for Information Science (JASIS), 40(5), 304–310.

Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric to semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1), 17–30.

Ravichandran, D., & Hovy, E. (2002). Learning surface text patterns for a question answering system. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 41–47.

Reed, S. L., & Lenat, D. B. (2002). Mapping ontologies into Cyc. In Proceedings of the AAAI Workshop on Ontologies and the Semantic Web.

Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 448–453.

Resnik, P. (1997). Selectional preference and sense disambiguation. In Proceedings of the ANLP Workshop on Tagging Text with Lexical Semantics: Why, What, and How? 52–57.

Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11, 95–130.

Riloff, E., & Jones, R. (1999). Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the 16th National Conference on Artificial Intelligence and 11th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI), 474–479.

Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (27–48). Hillsdale, NJ: Erlbaum.

Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

Sahlgren, M. (2008). The Distributional Hypothesis. Rivista di Linguistica, 20(1), 33–53.

Salton, G., & McGill, M. J. (1983). An Introduction to Modern Information Retrieval. McGraw-Hill.

Singh, P. (2002). The public acquisition of commonsense knowledge. In Proceedings of AAAI Spring Symposium on Acquiring (and Using) Linguistic (and World) Knowledge for Information Access, 47–52.

Singh, P., Lin, T., Mueller, E. T., Lim, G., Perkins, T., & Zhu, W. L. (2002). Open Mind Common Sense: Knowledge acquisition from the general public. In Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems, 1223–1237.

Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength. Journal of Psycholinguistic Research, 19, 317–330.

Strube, M., & Ponzetto, S. P. (2006). Wikirelate! Computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 1419–1424.

Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). YAGO: A large scale ontology from Wikipedia and WordNet. (Research Report MPI-I-2007-5-003).

Szumlanski, S., & Gomez, F. (2010). Automatically acquiring a semantic network of related concepts. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), 19–28.

Szumlanski, S., & Gomez, F. (2011). Evaluating a semantic network automatically constructed from lexical co-occurrence on a word sense disambiguation task. In Proceedings of the 15th Conference on Computational Natural Language Learning (CoNLL), 190–199.

Tandon, N., de Melo, G., & Weikum, G. (2011). Deriving a Web-scale common sense fact database. In Proceedings of the 25th Conference on Artificial Intelligence (AAAI), 152–157.

Tanner, J. J., & Gomez, F. (2010). Extracting ontological selectional preferences for non-pertainym adjectives from the Google Corpus. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), 1033–1038.

Waltz, D. L., & Pollack, J. B. (1985). Massively parallel parsing: A strongly interactive model of natural language interpretation. Cognitive Science, 9(1), 51–74.

Weeds, J. (2003). Measures and Applications of Lexical Distributional Similarity (PhD thesis). University of Sussex, UK.

Weeds, J., & Weir, D. (2006). Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4), 439–475.

Wettler, M., & Rapp, R. (1993). Computation of word associations based on the co-occurrences of words in large corpora. In Proceedings of the 1st Workshop on Very Large Corpora, 84–93.

Yarowsky, D. (1993). One sense per collocation. In Proceedings of the ARPA Workshop on Human Language Technology, 266–271.

Zaragoza, H., Rode, H., Mika, P., Atserias, J., Ciaramita, M., & Attardi, G. (2007). Ranking very many typed entities on Wikipedia. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM), 1015–1018.