Coversheet for Thesis in Sussex Research Online
Total Page:16
File Type:pdf, Size:1020Kb
A University of Sussex DPhil thesis Available online via Sussex Research Online: http://sro.sussex.ac.uk/ This thesis is protected by copyright which belongs to the author. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given Please visit Sussex Research Online for more information and further details Graph-Based Approaches to Word Sense Induction David Richard Hope Submitted for the degree of Doctor of Philosophy University of Sussex December 2014 Declaration I hereby declare that this thesis has not been and will not be submitted in whole or in part to another University for the award of any other degree. Signature: David Richard Hope iii UNIVERSITY OF SUSSEX David Richard Hope, Doctor of Philosophy Graph-Based Approaches to Word Sense Induction Summary This thesis is a study of Word Sense Induction (WSI), the Natural Language Processing (NLP) task of automatically discovering word meanings from text. WSI is an open problem in NLP whose solution would be of considerable benefit to many other NLP tasks. It has, however, has been studied by relatively few NLP researchers and often in set ways. Scope therefore exists to apply novel methods to the problem, methods that may improve upon those previously applied. This thesis applies a graph-theoretic approach to WSI. In this approach, word senses are identified by finding particular types of subgraphs in word co-occurrence graphs. A number of original methods for constructing, analysing, and partitioning graphs are introduced, with these methods then incorporated into graph- based WSI systems. These systems are then shown, in a variety of evaluation scenarios, to return results that are comparable to those of the current best performing WSI systems. The main contributions of the thesis are a novel parameter-free soft clustering algorithm that runs in time linear in the number of edges in the input graph, and novel generalisations of the clustering coefficient (a measure of vertex cohesion in graphs) to the weighted case. Further contributions of the thesis include: a review of graph-based WSI systems that have been proposed in the literature; analysis of the methodologies applied in these systems; analysis of the metrics used to evaluate WSI systems, and empirical evidence to verify the usefulness of each novel method introduced in the thesis for inducing word senses. iv Acknowledgements Firstly, many thanks to my thesis supervisor Bill Keller, notably for his patience and ability to phrase complex problems in simple terms. Thanks also to my parents, and to those who helped in different ways along the way: Jonathon, Dan, Catherine, Isabelle, Diva, Steven, Diana, David, Simon, John, Rob, Patrick, Brian, and Adam - in no particular order. v Contents List of Tables xiv List of Figures xviii 1 Introduction 1 1.1 Word Sense Induction . .3 1.1.1 Graph-Based Word Sense Induction . .4 1.1.1.1 A Graph Model of Word Context . .4 1.1.1.2 Identifying Word Senses . .5 1.1.1.3 Assigning Induced Senses to Words . .6 1.2 Motivation . .7 1.2.1 Lexicography . .7 1.2.2 Information Retrieval . .7 1.2.3 Machine Translation . .8 1.2.4 Information Extraction . .8 1.2.5 Social and Information Network Analysis . 10 1.2.6 Motivation: Summary . 11 1.3 Research Objectives . 11 1.4 Contributions of the Thesis . 12 1.5 Outline of the Thesis . 12 I Graphs, Association Measures and Clustering 15 2 Graphs 16 2.1 Undirected and Directed Graphs . 16 2.1.1 Undirected Graphs . 16 2.1.2 Directed Graphs . 17 vi 2.2 Weighted Graphs . 18 2.3 Connectivity in Graphs . 19 2.3.1 Connected Graphs and Connected Components . 19 2.3.2 Complete Graphs . 20 2.3.3 Cliques . 21 2.3.4 Vertex Degree . 22 2.4 Vertex Neighbours . 23 2.4.1 Vertex Cohesion . 24 2.4.2 nth. Order Neighbours . 26 2.5 Additional Graph Properties . 27 2.5.1 Graph Centrality . 27 2.5.2 Global Measures . 28 2.6 Summary . 30 3 Measures of Vertex Association 31 3.1 Introduction . 31 3.1.1 Vertex Measures and Graph Measures . 31 3.2 Association Measures . 33 3.2.1 Frequency . 33 3.2.2 Conditional Probability . 33 3.2.3 The Log Likelihood Ratio . 34 3.2.3.1 Log Likelihood Ratio Definition . 35 3.2.4 Alternative Association Measures . 36 3.3 The Clustering Coefficient . 37 3.3.1 The Local Clustering Coefficient . 37 3.3.2 A Generalisation for Directed Graphs . 38 3.4 Summary of the Chapter . 39 4 Weighted Clustering Coefficients 40 4.1 Introduction . 40 4.2 A Review of Existing Weighted Generalisations . 40 4.2.1 Barrat et al. 40 4.2.2 Lopez-Fernandez et al. 41 4.2.3 Onnela et al. 42 4.2.4 Zhang and Horvath . 43 vii 4.2.5 Opsahl and Panzarasa . 43 4.2.6 Summary . 45 4.3 Three Novel Generalisations to the Weighted Case . 45 4.3.1 Generalisation 1 . 46 4.3.2 Generalisations 2 and 3 . 46 4.3.2.1 A Generalisation for Graphs Representing Probability Spaces 47 4.3.2.2 A Generalisation for Integer Weighted Graphs . 48 4.4 Summary of the Chapter . 49 5 Clustering 50 5.1 Introduction . 50 5.2 Clustering Solution Types . 51 5.3 Two Commonly Applied Clustering Methods . 52 5.3.1 Prototype-Based Clustering . 53 5.3.2 Density-Based Clustering . 54 5.3.2.1 DBSCAN . 54 5.3.2.2 Shared Nearest Neighbours (SNN) . 56 5.3.3 Summary . 57 5.4 Graph-Based Soft, Fuzzy, and Parameter-Free Clustering Methods for WSI . 58 5.4.1 Graph-Based Soft Clustering for WSI . 59 5.4.1.1 Hypergraphs . 59 5.4.1.2 Graphs of Collocations . 60 5.4.1.3 Soft Clustering of Semantically Similar Words . 62 5.4.2 Graph-Based Induction of Fuzzy Word Senses . 62 5.4.2.1 Borin and Forsberg (2010) . 62 5.4.2.2 Oliveira and Gomes (2011) . 63 5.4.2.3 Inducing Fuzzy Semantic Relations . 65 5.4.3 Chinese Whispers: a Parameter-Free Clustering Algorithm . 66 5.4.3.1 Chinese Whispers ....................... 66 5.4.3.2 Chinese Whispers and Non-Determinism . 68 5.4.3.3 Analysis of Chinese Whispers ................. 69 5.4.3.4 Chinese Whispers and an `Ideal' Clustering Algorithm . 70 5.5 The MaxMax Clustering Algorithm . 71 5.5.1 MaxMax .................................. 71 viii 5.5.2 Correctness of MaxMax ......................... 73 5.5.3 Time Complexity of MaxMax ...................... 74 5.5.4 Testing MaxMax: Graph Partitioning . 75 5.5.5 Testing MaxMax: Graph Clustering . 75 5.5.5.1 Introduction . 75 5.5.5.2 The Test Set: Shapes . 76 5.5.5.3 Applying the Clustering Algorithms to the Shapes Test Set 77 5.5.5.4 The Affinity Propagation Clustering Algorithm . 78 5.5.5.5 Evaluation Measures . 79 5.5.5.6 Results . 80 5.5.5.7 Results Discussion . 82 5.5.5.8 Conclusions . 85 5.5.6 MaxMax: Discussion . 85 II Word Sense Induction 88 6 Word Sense Induction 89 6.1 Word Sense . 89 6.2 Word Sense Induction . 92 6.2.1 Word Sense Disambiguation Systems . 93 6.2.2 Word Sense Induction Systems . 93 6.2.3 Graph-Based Word Sense Induction Systems . 94 6.2.3.1 Discussion . 96 7 A Review of Existing Approaches to Word Sense Induction 98 7.1 Introduction . 98 7.1.1 Lexical Co-occurrence Graphs . 98 7.2 Dorow and Widdows . 99 7.2.1 Dorow and Widdows (2003a) . 99 7.2.2 Dorow et al. (2005) . 101 7.3 HyperLex - V´eronis(2004) . 102 7.4 Agirre et al. 103 7.4.1 Summary and Discussion . 105 7.5 Klapaftis et al. 105 7.5.1 Klapaftis (2008) . ..