A Graph Model for Words and Their Meanings

Dissertation approved by the Faculty of Philosophy and History (Philosophisch-Historische Fakultät) of the Universität Stuttgart for the degree of Doctor of Philosophy (Dr. phil.), submitted by Beate Dorow from Albstadt.

First examiner: PD Dr. phil. habil. Ulrich Heid
Second examiner: Prof. Hinrich Schütze, Ph.D.
Date of oral examination: 03.03.2006

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 2006

Contents

1 Introduction
1.1 Motivation
1.2 Semantic similarity and ambiguity
1.2.1 Ambiguity
1.2.2 Semantic similarity
1.3 Idioms
1.4 Synonyms
1.5 Contributions of the thesis
1.6 Outline of the thesis
2 Graphs
2.1 Terminology and notation
2.2 Real-world graphs
2.2.1 Social networks
2.2.2 Technological networks
2.2.3 Biological networks
2.2.4 Graphs arising in linguistics
2.3 Curvature as a measure of local graph topology
2.3.1 Clustering in social networks
2.3.2 Graph-theoretic definition
2.3.3 Geometric interpretation
3 Data acquisition
3.1 Acquisition assumption
3.2 Text material
3.3 Building the graph
3.4 Preprocessing of the word graph
3.4.1 Conflating variants of the same word
3.4.2 Eliminating weak links
3.5 Weighting links
3.6 Spurious edges
3.6.1 Errors in data acquisition
3.6.2 Violation of the symmetry assumption
3.6.3 Semantically heterogeneous lists
3.6.4 Ambiguity due to conflation of variants
3.6.5 Statistical properties of the word graph
4 Discovering word senses
4.1 Intuition
4.2 Markov Clustering
4.3 Word Sense Clustering Algorithm
4.4 Experimental Results
5 Using curvature for ambiguity detection and semantic class acquisition
5.1 A linguistic interpretation of curvature
5.2 Curvature as a measure of ambiguity
5.3 Curvature as a clustering tool
5.3.1 Determining the curvature threshold
5.3.2 Allowing for overlapping clusters
5.4 Evaluation against WordNet
5.4.1 Precision and Recall
5.4.2 Mapping clusters to WordNet senses
5.4.3 Results and Conclusions
5.5 Uncovering the meanings of novel words
6 From clustering words to clustering word associations
6.1 Intuition
6.2 Link graph transformation
6.3 Some properties of the link graph
6.4 Clustering the link graph
6.5 Evaluation
6.6 Pairs of contrasting clusters
6.7 Conclusion
7 Idiomaticity in the word graph
7.1 Introduction
7.2 Asymmetric links
7.2.1 Testing the reversibility (symmetry) of links
7.2.2 Semantic constraints
7.3 Idiomatic constraints
7.4 Link analysis
7.5 Statistical analysis
7.5.1 Noncompositionality of meaning
7.5.2 Semantic anomaly
7.5.3 Limited syntactic variability
7.5.4 Variability of context
7.6 Conclusions
8 Synonyms in the word graph
8.1 Hypothesis
8.2 Methodology
8.3 Evaluation experiment
8.3.1 Test set
8.3.2 Experimental set-up
8.4 Revised Hypothesis
8.5 Conclusion
9 Related Work
9.1 Extraction of semantic relations
9.2 Semantic similarity and ambiguity
9.2.1 Bootstrapping semantic categories
9.2.2 Hard clustering of words into classes of similar words
9.2.3 Soft clustering of words into classes of similar words
9.2.4 Word sense induction by clustering word contexts
9.2.5 Identifying and assessing ambiguity
9.3 Idiomaticity
9.4 Synonymy
10 Summary, Conclusion and Future Research
10.1 Summary
10.2 Conclusion and Future Research
11 Summary in German (Zusammenfassung)
11.1 Semantic ambiguity and similarity (Semantische Ambiguität und Ähnlichkeit)
11.2 Idioms (Idiome)
11.3 Synonyms (Synonyme)
11.4 Situating the work and future research (Einordnung und zukünftige Forschungsarbeit)
Acknowledgements

List of Figures

1.1 Local graph around rock
1.2 Local graph around carrot and stick
1.3 Local graph around writer and author
2.1 A simple graph
2.2 Two subgraphs
2.3 A directed graph
2.4 A weighted graph
2.5 Complete graphs
2.6 A hierarchy
2.7 A parse tree
2.8 WordNet snippet
2.9 A social network
2.10 Curvature
2.11 Surfaces
2.12 Boundary strips
2.13 A planar triangle
2.14 A triangle on a sphere
2.15 A triangle on a saddle
2.16 The Poincaré disk
3.1 Building the word graph
3.2 Local graph around body
3.3 Conflating the variants of a word
3.4 The triangle filter
4.1 Local graph around mouse
4.2 Local graph around wing
5.1 Local graph around head (3D)
5.2 Curvature vs. frequency
5.3 Local graph around monaco
5.4 Evolution of the giant component
5.5 Evolution of small components
5.6 Curvature clustering
5.7 The clusters of mouse
5.8 The clusters of oil
5.9 The clusters of recorder
5.10 Precision, recall and F-score of the curvature clustering
5.11 Paths in WordNet
5.12 Length of the paths connecting a test word with its label
5.13 Christmas tree
6.1 Link graph transformation
6.2 Two triangles involving squash
6.3 Local graph around the word organ
6.4 Link graph associated with organ
6.5 A clique under the link graph transformation
6.6 Curvature of nodes in the link graph
6.7 Precision and recall of link graph clustering
6.8 F-score of link graph clustering
6.9 Two clusters of humour
6.10 Subgraphs corresponding to three clusters of plant
6.11 Contrasts between clusters
7.1 Directed graph of family relationships
7.2 Hierarchical relationships between aristocrats
7.3 Directed graph of life-events
7.4 Chronology of the seasons
7.5 Lexical substitution
7.6 Local graph around rhyme and reason
7.7 Local graph around rhyme and song
7.8 Local graph around man, woman and child
7.9 Local graph around lock, stock and barrel
7.10 Circular dispersion
8.1 Local graph around pedestrian and walker
8.2 First- and second-order Markov graph
8.3 Local graph around gull and mug
8.4 Target ranks under Hypothesis 1
8.5 Target ranks under Hypothesis 2
9.1 Bootstrapping semantic categories
9.2 Agglomerative clustering
9.3 A cooccurrence matrix
9.4 Word Space
9.5 Vectors in Word Space
9.6 Shared Nearest Neighbors Clustering
9.7 The neighborhood of chocolate in a dictionary
11.1 The neighborhood of Gericht (Die Nachbarschaft von Gericht)
11.2 Curvature (Krümmung)
11.3 Graph transformation (Graphtransformation)

List of Tables

3.1 Statistical properties of the word graph
4.1 Sample of test words and their discovered senses
4.2 German nominalizations and their discovered senses
5.1 Correlation between curvature and WordNet polysemy count
5.2 Properties of curvature clustering for different curvature thresholds

Details: PDF file, 187 pages, content language English.
