Coversheet for Thesis in Sussex Research Online

Total Page:16

File Type:pdf, Size:1020Kb

Coversheet for Thesis in Sussex Research Online A University of Sussex DPhil thesis Available online via Sussex Research Online: http://sro.sussex.ac.uk/ This thesis is protected by copyright which belongs to the author. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given Please visit Sussex Research Online for more information and further details Graph-Based Approaches to Word Sense Induction David Richard Hope Submitted for the degree of Doctor of Philosophy University of Sussex December 2014 Declaration I hereby declare that this thesis has not been and will not be submitted in whole or in part to another University for the award of any other degree. Signature: David Richard Hope iii UNIVERSITY OF SUSSEX David Richard Hope, Doctor of Philosophy Graph-Based Approaches to Word Sense Induction Summary This thesis is a study of Word Sense Induction (WSI), the Natural Language Processing (NLP) task of automatically discovering word meanings from text. WSI is an open problem in NLP whose solution would be of considerable benefit to many other NLP tasks. It has, however, has been studied by relatively few NLP researchers and often in set ways. Scope therefore exists to apply novel methods to the problem, methods that may improve upon those previously applied. This thesis applies a graph-theoretic approach to WSI. In this approach, word senses are identified by finding particular types of subgraphs in word co-occurrence graphs. A number of original methods for constructing, analysing, and partitioning graphs are introduced, with these methods then incorporated into graph- based WSI systems. These systems are then shown, in a variety of evaluation scenarios, to return results that are comparable to those of the current best performing WSI systems. The main contributions of the thesis are a novel parameter-free soft clustering algorithm that runs in time linear in the number of edges in the input graph, and novel generalisations of the clustering coefficient (a measure of vertex cohesion in graphs) to the weighted case. Further contributions of the thesis include: a review of graph-based WSI systems that have been proposed in the literature; analysis of the methodologies applied in these systems; analysis of the metrics used to evaluate WSI systems, and empirical evidence to verify the usefulness of each novel method introduced in the thesis for inducing word senses. iv Acknowledgements Firstly, many thanks to my thesis supervisor Bill Keller, notably for his patience and ability to phrase complex problems in simple terms. Thanks also to my parents, and to those who helped in different ways along the way: Jonathon, Dan, Catherine, Isabelle, Diva, Steven, Diana, David, Simon, John, Rob, Patrick, Brian, and Adam - in no particular order. v Contents List of Tables xiv List of Figures xviii 1 Introduction 1 1.1 Word Sense Induction . .3 1.1.1 Graph-Based Word Sense Induction . .4 1.1.1.1 A Graph Model of Word Context . .4 1.1.1.2 Identifying Word Senses . .5 1.1.1.3 Assigning Induced Senses to Words . .6 1.2 Motivation . .7 1.2.1 Lexicography . .7 1.2.2 Information Retrieval . .7 1.2.3 Machine Translation . .8 1.2.4 Information Extraction . .8 1.2.5 Social and Information Network Analysis . 10 1.2.6 Motivation: Summary . 11 1.3 Research Objectives . 11 1.4 Contributions of the Thesis . 12 1.5 Outline of the Thesis . 12 I Graphs, Association Measures and Clustering 15 2 Graphs 16 2.1 Undirected and Directed Graphs . 16 2.1.1 Undirected Graphs . 16 2.1.2 Directed Graphs . 17 vi 2.2 Weighted Graphs . 18 2.3 Connectivity in Graphs . 19 2.3.1 Connected Graphs and Connected Components . 19 2.3.2 Complete Graphs . 20 2.3.3 Cliques . 21 2.3.4 Vertex Degree . 22 2.4 Vertex Neighbours . 23 2.4.1 Vertex Cohesion . 24 2.4.2 nth. Order Neighbours . 26 2.5 Additional Graph Properties . 27 2.5.1 Graph Centrality . 27 2.5.2 Global Measures . 28 2.6 Summary . 30 3 Measures of Vertex Association 31 3.1 Introduction . 31 3.1.1 Vertex Measures and Graph Measures . 31 3.2 Association Measures . 33 3.2.1 Frequency . 33 3.2.2 Conditional Probability . 33 3.2.3 The Log Likelihood Ratio . 34 3.2.3.1 Log Likelihood Ratio Definition . 35 3.2.4 Alternative Association Measures . 36 3.3 The Clustering Coefficient . 37 3.3.1 The Local Clustering Coefficient . 37 3.3.2 A Generalisation for Directed Graphs . 38 3.4 Summary of the Chapter . 39 4 Weighted Clustering Coefficients 40 4.1 Introduction . 40 4.2 A Review of Existing Weighted Generalisations . 40 4.2.1 Barrat et al. 40 4.2.2 Lopez-Fernandez et al. 41 4.2.3 Onnela et al. 42 4.2.4 Zhang and Horvath . 43 vii 4.2.5 Opsahl and Panzarasa . 43 4.2.6 Summary . 45 4.3 Three Novel Generalisations to the Weighted Case . 45 4.3.1 Generalisation 1 . 46 4.3.2 Generalisations 2 and 3 . 46 4.3.2.1 A Generalisation for Graphs Representing Probability Spaces 47 4.3.2.2 A Generalisation for Integer Weighted Graphs . 48 4.4 Summary of the Chapter . 49 5 Clustering 50 5.1 Introduction . 50 5.2 Clustering Solution Types . 51 5.3 Two Commonly Applied Clustering Methods . 52 5.3.1 Prototype-Based Clustering . 53 5.3.2 Density-Based Clustering . 54 5.3.2.1 DBSCAN . 54 5.3.2.2 Shared Nearest Neighbours (SNN) . 56 5.3.3 Summary . 57 5.4 Graph-Based Soft, Fuzzy, and Parameter-Free Clustering Methods for WSI . 58 5.4.1 Graph-Based Soft Clustering for WSI . 59 5.4.1.1 Hypergraphs . 59 5.4.1.2 Graphs of Collocations . 60 5.4.1.3 Soft Clustering of Semantically Similar Words . 62 5.4.2 Graph-Based Induction of Fuzzy Word Senses . 62 5.4.2.1 Borin and Forsberg (2010) . 62 5.4.2.2 Oliveira and Gomes (2011) . 63 5.4.2.3 Inducing Fuzzy Semantic Relations . 65 5.4.3 Chinese Whispers: a Parameter-Free Clustering Algorithm . 66 5.4.3.1 Chinese Whispers ....................... 66 5.4.3.2 Chinese Whispers and Non-Determinism . 68 5.4.3.3 Analysis of Chinese Whispers ................. 69 5.4.3.4 Chinese Whispers and an `Ideal' Clustering Algorithm . 70 5.5 The MaxMax Clustering Algorithm . 71 5.5.1 MaxMax .................................. 71 viii 5.5.2 Correctness of MaxMax ......................... 73 5.5.3 Time Complexity of MaxMax ...................... 74 5.5.4 Testing MaxMax: Graph Partitioning . 75 5.5.5 Testing MaxMax: Graph Clustering . 75 5.5.5.1 Introduction . 75 5.5.5.2 The Test Set: Shapes . 76 5.5.5.3 Applying the Clustering Algorithms to the Shapes Test Set 77 5.5.5.4 The Affinity Propagation Clustering Algorithm . 78 5.5.5.5 Evaluation Measures . 79 5.5.5.6 Results . 80 5.5.5.7 Results Discussion . 82 5.5.5.8 Conclusions . 85 5.5.6 MaxMax: Discussion . 85 II Word Sense Induction 88 6 Word Sense Induction 89 6.1 Word Sense . 89 6.2 Word Sense Induction . 92 6.2.1 Word Sense Disambiguation Systems . 93 6.2.2 Word Sense Induction Systems . 93 6.2.3 Graph-Based Word Sense Induction Systems . 94 6.2.3.1 Discussion . 96 7 A Review of Existing Approaches to Word Sense Induction 98 7.1 Introduction . 98 7.1.1 Lexical Co-occurrence Graphs . 98 7.2 Dorow and Widdows . 99 7.2.1 Dorow and Widdows (2003a) . 99 7.2.2 Dorow et al. (2005) . 101 7.3 HyperLex - V´eronis(2004) . 102 7.4 Agirre et al. 103 7.4.1 Summary and Discussion . 105 7.5 Klapaftis et al. 105 7.5.1 Klapaftis (2008) . ..
Recommended publications
  • Brazoria County Master Gardener Association 2013 Citrus and Fruit Guide
    BRAZORIA COUNTY MASTER GARDENER ASSOCIATION 2013 CITRUS AND FRUIT GUIDE A Guide to Proper Selection and Planting Tips for Brazoria County Gardeners B.E.E.S—Brazoria Environmental Education Station B- Be aware of the environment E- Endeavor to protect our natural resources E- Educate our community S- Serve as stewards of the Earth FOREWORD The Brazoria County Master Gardener (BCMG) program is part of the state organization affiliated with the Texas A & M Extension System. We are a 501C3 organization under IRS statutes. Monies collected from this sale supports the B.E.E.S. educational and demonstration gardens located at the end of CR 171. One of this years’ major projects is to build walkways which meet handicap standards and for our senior citizens. The gardens are open on Tuesdays and Fri- days to the public. Special topic programs are offered on Saturdays for public attendance and are advertised in local newspapers and radio. The gardens include eating fruits (berries, fruit, etc.), herb garden, tropicals, veggies, rose garden, two greenhouses other demo/experimental plants. The contents of this brochure utilized multiple resources from leading agricultural universities, Texas and other state and national organizations. Fruit plants are selected on basis of Master Gardener feedback in our county as well as neighbor counties. Past demand & interviews of folks after each years’ February sale help us select new varieties and determine plant volume each year at the sale. BCMG makes genuine efforts to provide the public with information on plants offered. Other than assuring the public that we use only licensed nurseries, BCMG cannot assure success.
    [Show full text]
  • ASFIS ISSCAAP Fish List February 2007 Sorted on Scientific Name
    ASFIS ISSCAAP Fish List Sorted on Scientific Name February 2007 Scientific name English Name French name Spanish Name Code Abalistes stellaris (Bloch & Schneider 1801) Starry triggerfish AJS Abbottina rivularis (Basilewsky 1855) Chinese false gudgeon ABB Ablabys binotatus (Peters 1855) Redskinfish ABW Ablennes hians (Valenciennes 1846) Flat needlefish Orphie plate Agujón sable BAF Aborichthys elongatus Hora 1921 ABE Abralia andamanika Goodrich 1898 BLK Abralia veranyi (Rüppell 1844) Verany's enope squid Encornet de Verany Enoploluria de Verany BLJ Abraliopsis pfefferi (Verany 1837) Pfeffer's enope squid Encornet de Pfeffer Enoploluria de Pfeffer BJF Abramis brama (Linnaeus 1758) Freshwater bream Brème d'eau douce Brema común FBM Abramis spp Freshwater breams nei Brèmes d'eau douce nca Bremas nep FBR Abramites eques (Steindachner 1878) ABQ Abudefduf luridus (Cuvier 1830) Canary damsel AUU Abudefduf saxatilis (Linnaeus 1758) Sergeant-major ABU Abyssobrotula galatheae Nielsen 1977 OAG Abyssocottus elochini Taliev 1955 AEZ Abythites lepidogenys (Smith & Radcliffe 1913) AHD Acanella spp Branched bamboo coral KQL Acanthacaris caeca (A. Milne Edwards 1881) Atlantic deep-sea lobster Langoustine arganelle Cigala de fondo NTK Acanthacaris tenuimana Bate 1888 Prickly deep-sea lobster Langoustine spinuleuse Cigala raspa NHI Acanthalburnus microlepis (De Filippi 1861) Blackbrow bleak AHL Acanthaphritis barbata (Okamura & Kishida 1963) NHT Acantharchus pomotis (Baird 1855) Mud sunfish AKP Acanthaxius caespitosa (Squires 1979) Deepwater mud lobster Langouste
    [Show full text]
  • WO 2017/151835 Al 8 September 2017 (08.09.2017) P O P C T
    (12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) (19) World Intellectual Property Organization International Bureau (10) International Publication Number (43) International Publication Date WO 2017/151835 Al 8 September 2017 (08.09.2017) P O P C T (51) International Patent Classification: (81) Designated States (unless otherwise indicated, for every A61K 8/34 (2006.01) A61K 8/98 (2006.01) kind of national protection available): AE, AG, AL, AM, A61Q 19/00 (2006.01) AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, (21) Number: International Application DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, PCT/US20 17/0203 12 HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KH, KN, (22) International Filing Date: KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, 2 March 2017 (02.03.2017) MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, (25) Filing Language: English RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, (26) Publication Language: English TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW. (30) Priority Data: 62/302,972 3 March 2016 (03.03.2016) US (84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, (71) Applicant: ACCESS BUSINESS GROUP INTERNA¬ GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TIONAL LLC [US/US]; 7575 Fulton Street East, Grand TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, Rapids, Michigan 49355 (US).
    [Show full text]
  • Breeding and Analysis of Two New Grapefruit-Like Varieties with Low Furanocoumarin Content
    Food and Nutrition Sciences, 2016, 7, 90-101 Published Online February 2016 in SciRes. http://www.scirp.org/journal/fns http://dx.doi.org/10.4236/fns.2016.72011 Breeding and Analysis of Two New Grapefruit-Like Varieties with Low Furanocoumarin Content Lena Fidel1, Mira Carmeli-Weissberg1, Yosef Yaniv1, Felix Shaya1, Nir Dai1, Eran Raveh1, Yoram Eyal1, Ron Porat2, Nir Carmi1* 1Institute of Plant Sciences, Bet Dagan, Israel 2Institute of Postharvest and Food Sciences, Agricultural Research Organization, Volcani Center, Bet Dagan, Israel Received 28 December 2015; accepted 20 February 2016; published 23 February 2016 Copyright © 2016 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/ Abstract Furanocoumarins (FCs) are a group of related plant defense metabolites occurring in several plant families, including some species in the genus citrus, such as grapefruit and pummelo. FCs function as toxins against pathogens, insects and other plant pests and some are toxic to humans at high levels. Although the levels of FCs in grapefruits are non-toxic to humans, they inhibit the intestinal enzyme CYP3A, thus preventing degradation of medicines, such as statins, and causing dangerous overdose effects. This overdosing can cause devastating side effects, ranging from stomach bleed- ing to kidney problems, muscle aches and irregular heartbeats. In the present study, we utilize LC/MS to characterize the levels of FCs pathway intermediates and end products in twelve citrus cultivars, including mandarin (Citrus reticulata), orange [Citrus sinensis (L.) Osbeck], Pummelo [Citrus maxima (Burm.) Merr.], grapefruit (Citrus paradisi Macf.), and two newly selected grape- fruit like varieties [(Citrus reticulata) X [Citrus maxima (Burm.) Merr].
    [Show full text]
  • Le Catalogue Public Des Variétes
    DOCUMENT TECHNIQUE Réf. : DT-R-34 Version: C Création: Iatimad Bloquel Catalogue public des variétés Modification: Olivier Pailly Date d'application: 05/08/2020 Texte50* Il est recommandé de passer vos commandes avant le 31 janvier de la campagne en cours. Les quantités minimales acceptées sont de 10 yeux / variété, dans la limite de 5 variétés / an pour les particuliers * Aucune garantie ne peut être apportée quant à l'acceptation des commandes par nos services (priorités stratégiques, aléas climatiques, rendements, etc..) * Pour toute commande de greffons, un supplément de 10 € sera appliqué pour frais administratifs ainsi que 5 € / accession pour couvrir les frais d'analyse sanitaire * Le prix des greffons est de 2,00 € / œil * Pour les pépinieristes corses, le prix est de 1,10 € / œil Variété Espèce Aeglopsis chevalieri Aeglopsis chevalieri Swingle Alemow Citrus macrophylla Wester Atalantia buxifolia Severinia buxifolia (Poir.) Ten. Atalantia de Ceylan Atalantia ceylanica (Arn.) Oliv. balsamocitrus dawei Balsamocitrus dawei Stapf. Bergamote Castagnaro Citrus bergamia Risso & Poit. Bigarade à fleur Citrus aurantium L. Bigarade à fleur empereur Citrus aurantium L. Bigarade à fleur Ferando (B6c K37) Citrus aurantium L. Bigarade a fleur grosse Carle Citrus aurantium L. Bigarade Alibert Citrus aurantium L. Bigarade Alibert Hyb.12 Citrus aurantium L. Bigarade Apepu Citrus aurantium L. Bigarade Australian Citrus aurantium L. Bigarade Brazil Sour Citrus aurantium L. Bigarade cardosi A1 Citrus aurantium L. Bigarade cardosi A2 Citrus aurantium L. Bigarade chinensis Citrus myrtifolia Raf. Bigarade Corsigliese Citrus aurantium L. Bigarade Curaçao Citrus aurantium L. Bigarade d'algérie Citrus aurantium L. Bigarade de Floride Citrus aurantium L. Bigarade Donnet Citrus aurantium L.
    [Show full text]
  • Strain Guide
    Strain Guide DEPRESSION STRESS PAIN LACK OF APPETITE HEADACHES MUSCLE SPASMS SPASTICITY INFLAMMATION INSOMNIA NAUSEA ANXIETY FATIGUE Grape Stomper (GRST) Purple Elephant x Chemdawg Sour Diesel Queso Perro (QSPO) Stardawg x UK Cheese White 99 (WT99) The White x Cinderella 99 Stardawg (STDG) Chemdog 4 x Tres Dawg Durban Dawg (DBSD) Durban x Stardawg Good Dawg (GMSD) Good Medicine X Stardawg Frutful Watermelon (WTML) THC Blend x Fruit-derived terpenes 99 Valleys (99VA) 99 Problems x Silicon Valley OG Bubba Kush (BUBK) Hindu Kush x OG Kush Wookie Girl (WGRL) Girl Scout Cookies x The White Silicon Valley OG (SVOG) Valley Girl x Stardawg Brick House (BRKH) Ghost Train Haze x Oz Kush Bx1 Emerald City Kush (EMCK) Triangle Kush x Oz Kush Bx1 Frutful Blueberry (BBRY) THC Blend x Fruit-derived terpenes Mikan (MKAN) Mandarin Sunset Phenotype* 1 Herijuana x Orange Skunk Kinno (KNNO) Mandarin Sunset Phenotype* 2 Herijuana x Orange Skunk Mandelo (MDLO) Mandarin Sunset Phenotype* 3 Herijuana x Orange Skunk Citrus Slice (CTSL) Mandarin Sunset Phenotype* 4 Herijuana x Orange Skunk Master Grower Series™ Mandarin Sunset (MNSN) Herijuana x Orange Skunk Master Grower Series™ Florida Keys (FKLS) Bubba Kush x Key Lime Mints Duck Key (DCKY) Florida Keys Phenotype* 1 Bubba Kush x Key Lime Mints Islamorada (ISMD) Florida Keys Phenotype* 5 Bubba Kush x Key Lime Mints Big Pine Key (BPKY) Florida Keys Phenotype* 8 Bubba Kush x Key Lime Mints Key Largo (KYLG) Florida Keys Phenotype* 9 Bubba Kush x Key Lime Mints 99 Problems (99PR) White 99 x Stardawg Birds of Paradise
    [Show full text]
  • Fruit Trees Need a Period of Winter Rest Or Dormancy, When Temperatures Are Between 32°F and 45°F for Flowers and Leaf Buds to Develop Normally
    GARDENING FACT SHEET Harris County Cooperative Extension 3033 Bear Creek Drive, Houston, Texas 77084 281.855.5600 • http://harris-tx.tamu.edu/hort Recommended Fruit and Nut Varieties Harris County and Vicinity by William D. Adams, County Extension Agent—Horticulture, emeritus edited by Carol Brouwer, Ph.D., Harris County Extension Agent—Horticulture; Ethan Natelson, M.D., Gulf Coast Fruit Study Group; Robert A. Randall, Ph.D., Executive Director, Urban Harvest re-issued, July 2006, as a joint project of Texas Cooperative Extension, Urban Harvest, the Gulf Coast Fruit Study Group and the Harris County Master Gardeners any types of fruits and nuts grow well in home orchards in Harris County. The first key to a successful harvest is to select varieties best suited to our subtropical climate. MOne of the most important considerations in selecting an appropriate plant is its temperature requirements. Citrus trees are subtropical to tropical in nature and many may suffer severe damage or even death in freezing temperatures. However, several types of citrus are sufficiently cold hardy to survive most winters in our region, particularly as mature trees, and especially in the warmer areas of the county. Planting citrus trees on the south and southeast sides of the house or in other sheltered locations will provide some protection from northwesterly cold fronts. Aside from knowing how much cold a plant can stand, it is also important to know how much cold it needs. Stone and pome fruit trees need a period of winter rest or dormancy, when temperatures are between 32°F and 45°F for flowers and leaf buds to develop normally.
    [Show full text]