Aligning Wiktionary with Natural Language Processing Resources

Total Page:16

File Type:pdf, Size:1020Kb

Aligning Wiktionary with Natural Language Processing Resources Aligning Wiktionary With Natural Language Processing Resources Ben Casses Western Carolina University Cullowhee, NC 28723 [email protected] Abstract reinforce or partially automate mappings between some of these resources through the use of domain Recently, significant progress has been based disambiguation and gloss comparisons as in made towards mapping various natural the Lesk Algorithm (Lesk, 1986). The following language processing resources together in introduces some of the popular NLP resources and order to form more robust tools. While discusses their relative advantages. most efforts have gone towards connect- ing existing tools to each other, recently several projects have involved aligning 1.1 FrameNet the popular NLP resources to open col- FrameNet is a collection of semantic frames. It laborative projects such as Wikipedia. contains detailed breakdowns of the function and Such alignments are promising because contextual meaning of a given verb along with they link the specific but frequently all possible participating semantic participants narrow NLP data to high coverage open and examples. FrameNet is organized into a resources. This project explores the hypernym-hyponym hirearchy with more specific effectiveness of some variations of the terms inheriting structure from their hypernyms. Lesk Algorithm in connecting specific FrameNet’s exhaustive detail into semantic par- Wikipedia senses to corresponding senses ticipants makes it valuable for studying distance in other NLP resources. The purpose relationships between frames. Robbery, for ex- of this project is to present a potential ample, inherits directly from Committing crime, method of semiautonomous alignment indirectly from Misdeed and uses Theft. 1 for Wiktionary that will serve to augment other NLP resources. 1.2 OntoNotes OntoNotes is a collaborative project that aims to produce “richer model of text meaning” (Hovy et 1 Introduction al., 2006). OntoNotes contains 2,445 verb entries Existing natural language processing (NLP) re- organized into different senses with brief definitions sources can be used in each step of the language and examples. OntoNotes is the most externally comprehension task. Some resources also contain connected of the sense based NLP resources, it con- mappings to others. Since some extensive tasks like tains mappings to FrameNet, PropBank, VerbNet 2 language comprehension involve the use of multiple and WordNet. resources, such mappings can be very useful. This 1http://framenet.icsi.berkeley.edu/ project will study some methods that may help to 2http://verbs.colorado.edu/html groupings/ can also be a disadvantage. Its open nature leads to format inconsistencies between definitions that 1.3 PropBank make automated processing difficult. The potential PropBank contains information on 5,384 verbs. also exists for incorrect or deliberately misleading Each verb entry is divided into different syntactic entries such as those known to have occurred in structures based on common use. Structures are Wikipedia (Snyder, 2007) (Chesney, 2006). subdivided into components referred to as argu- ments. Propbank has value as a parsing and a disambiguating resource. A verb in a given sentence 1.6 WordNet could be matched to one syntactic structure by its WordNet is a long term project hosted by Princeton surrounding arguments. Once matched, the roles of University. Entries in WordNet are organized the arguments are disambiguated according to the into parts of speech and distingushed by sense. roleset. PropBank contains mappings to VerbNet. 3 Wordnet is highly interconnected with each term referring to related terms including hypernyms, 1.4 VerbNet hyponyms, troponyms, frames and peers. Because of its interconnectedness, WordNet is useful for VerbNet is an online database of verbs. Verbs are sense disambiguation. An unknown term could be collected into 274 Levin (1993) style verb classes generalized to its hypernym, for example: if the containing relevant semantic and syntactic infor- term “shuffle” is not understood, its hypernym, mation. The value of VerbNet lies in its depth of “walk” may be. 5 study into its verb members and their relationships. A given verb may be considered synonymous with its fellow members and hyponymous to its class. This can be useful for the purposes of translation 2 Related Work and comprehension if a given verb in VerbNet is not Several different mappings between existing NLP understood but one of its fellow members is. Verb- resources already exist. Combinations of resources Net contains mappings to FrameNet, OntoNotes, 4 have been created in different ways with most PropBank and WordNet. projects involving multiple alignment techniques to improve accuracy. 1.5 Wiktionary The effectiveness of Wiktionary in NLP tasks has al- Through common fields shared by both tools. ready been established by Zesch and Muller¨ (2008). • (Loper et al., 2007) (Pazienza et al., 2006) At the time of this writing, English Wiktionary had (Giuglea and Moschitti, 2004) “1,813,199 entries with English definitions from over 350 languages” (wik, 2010). The usefulness Through mutual restrictions, where a given en- of Wiktionary is in its open, collaborative nature. • try in one database could match with a given Because anyone can contribute to Wiktionary, it is entry in another database, but restrictions pre- far more encompassing than any project developed vent this combination, thus narrowing the pos- by an individual or more structured group could siblilities. (Shi and Mihalcea, 2005) (Loper et be. It is also current, while other dictionaries must al., 2007) (Giuglea and Moschitti, 2006) be revised and updated occasionally, Wiktionary entries are constantly being appended as the nature Through frequency analysis such as greatest of the language changes. The author observed • numbers of common synonyms. (Shi and Mi- the number of entries increasing by 20,000 over halcea, 2005) (Giuglea and Moschitti, 2006) a period of three weeks. Wiktionary’s advantage (Giuglea and Moschitti, 2004) 3http://verbs.colorado.edu/propbank/framesets-english/ 4http://verbs.colorado.edu/verb-index/index.php 5http://wordnetweb.princeton.edu/perl/webwn Through supervised learning techniques where effective when combined. There are structural con- • connections are trained. (Loper et al., 2007) cerns where two resources do not follow the same (Giuglea and Moschitti, 2004) layout or present the same information. There are also potential problems where two resources do not And through manual mapping of connections. • expose the same granularity. Only two resources (Pazienza et al., 2006) (Shi and Mihalcea, containing the same information could have a one- 2005). to-one cardinality. In general, these methods could be divided into three 3.1 Structure and Organization categories: manual methods that require human Not all of the NLP resources discussed in the control, statistical methods that involve matching introduction follow the same structure. WordNet, around comparisons, and learning methods that for example is categorized by frames where each involve machine learning techniques. frame contains multiple syntactic structures with interchangable member verbs. Entries in PropBank, Zesch and Gurevych (2010) compared the effec- however, are centered around a specific predicate or tiveness of several different semantic relatedness term and divided into usages. analysis methods. They considered four distinct se- mantic relatedness measures: Wiktionary is organized by term with each term Path based, where the distance between two divided into different senses. It will be more • terms in a graph is considered edge counting. meaningful to map the Wiktionary senses to the senses of another NLP resource that is organized Information Content based, where the number similarly. Of the resources organized in this way • of documents containing both terms is consid- that were discussed earlier, OntoNotes offers the ered. most advantages due to its interconnectedness. A given Wiktionary sense mapped to OntoNotes could Gloss based, where the quantity of common • be followed to each other resource that OntoNotes words in each term’s gloss is considered. is already connected to. Vector based, where vectors are constructed • from multiple documents and the frequency of 3.2 Granularity occurrence of the given term in each document Term-to-term matching is generally trivial, in- is considered. volving only a mutual lookup for a given term. Recently work has begun on mapping and utilizing Sense-to-sense mapping becomes more difficult the collaborative resources such as Wiktionary and for several reasons. A sense-to-sense connection Wikipedia. The Ubiquitous Knowledge Processing between two different resources indicates that both Lab has developed api’s for both Wiktionary6 and senses could be considered to be the “same”, but Wikipedia7 and made use of them as NLP resources the two senses will generally not contain the same (Zesch et al., 2008). information. For example, one Wiktionary sense of the verb make, “To indicate or suggest to be”, was aligned to “cause to become, or to have a certain 3 Problem Definition quality” in OntoNotes even though both senses con- tain different information. Additionally, because of The task of aligning NLP resources involves several the different ways these resources were constructed, issues. Each of the NLP resources considered in they feature a different degree of granularity around the introduction have different strengths, but some their terms. Arrive, for
Recommended publications
  • Abstraction and Generalisation in Semantic Role Labels: Propbank
    Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or both? Paola Merlo Lonneke Van Der Plas Linguistics Department Linguistics Department University of Geneva University of Geneva 5 Rue de Candolle, 1204 Geneva 5 Rue de Candolle, 1204 Geneva Switzerland Switzerland [email protected] [email protected] Abstract The role of theories of semantic role lists is to obtain a set of semantic roles that can apply to Semantic role labels are the representa- any argument of any verb, to provide an unam- tion of the grammatically relevant aspects biguous identifier of the grammatical roles of the of a sentence meaning. Capturing the participants in the event described by the sentence nature and the number of semantic roles (Dowty, 1991). Starting from the first proposals in a sentence is therefore fundamental to (Gruber, 1965; Fillmore, 1968; Jackendoff, 1972), correctly describing the interface between several approaches have been put forth, ranging grammar and meaning. In this paper, we from a combination of very few roles to lists of compare two annotation schemes, Prop- very fine-grained specificity. (See Levin and Rap- Bank and VerbNet, in a task-independent, paport Hovav (2005) for an exhaustive review). general way, analysing how well they fare In NLP, several proposals have been put forth in in capturing the linguistic generalisations recent years and adopted in the annotation of large that are known to hold for semantic role samples of text (Baker et al., 1998; Palmer et al., labels, and consequently how well they 2005; Kipper, 2005; Loper et al., 2007). The an- grammaticalise aspects of meaning.
    [Show full text]
  • A Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling
    VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling Andrea Di Fabio}~, Simone Conia}, Roberto Navigli} }Department of Computer Science ~Department of Literature and Modern Cultures Sapienza University of Rome, Italy {difabio,conia,navigli}@di.uniroma1.it Abstract The automatic identification and labeling of ar- gument structures is a task pioneered by Gildea We present VerbAtlas, a new, hand-crafted and Jurafsky(2002) called Semantic Role Label- lexical-semantic resource whose goal is to ing (SRL). SRL has become very popular thanks bring together all verbal synsets from Word- to its integration into other related NLP tasks Net into semantically-coherent frames. The such as machine translation (Liu and Gildea, frames define a common, prototypical argu- ment structure while at the same time pro- 2010), visual semantic role labeling (Silberer and viding new concept-specific information. In Pinkal, 2018) and information extraction (Bas- contrast to PropBank, which defines enumer- tianelli et al., 2013). ative semantic roles, VerbAtlas comes with In order to be performed, SRL requires the fol- an explicit, cross-frame set of semantic roles lowing core elements: 1) a verb inventory, and 2) linked to selectional preferences expressed in terms of WordNet synsets, and is the first a semantic role inventory. However, the current resource enriched with semantic information verb inventories used for this task, such as Prop- about implicit, shadow, and default arguments. Bank (Palmer et al., 2005) and FrameNet (Baker We demonstrate the effectiveness of VerbAtlas et al., 1998), are language-specific and lack high- in the task of dependency-based Semantic quality interoperability with existing knowledge Role Labeling and show how its integration bases.
    [Show full text]
  • Automatic Correction of Real-Word Errors in Spanish Clinical Texts
    sensors Article Automatic Correction of Real-Word Errors in Spanish Clinical Texts Daniel Bravo-Candel 1,Jésica López-Hernández 1, José Antonio García-Díaz 1 , Fernando Molina-Molina 2 and Francisco García-Sánchez 1,* 1 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain; [email protected] (D.B.-C.); [email protected] (J.L.-H.); [email protected] (J.A.G.-D.) 2 VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain; [email protected] * Correspondence: [email protected]; Tel.: +34-86888-8107 Abstract: Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model were implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to correct them. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing Citation: Bravo-Candel, D.; López-Hernández, J.; García-Díaz, with patient information.
    [Show full text]
  • An Arabic Wordnet with Ontologically Clean Content
    Applied Ontology (2021) IOS Press The Arabic Ontology – An Arabic Wordnet with Ontologically Clean Content Mustafa Jarrar Birzeit University, Palestine [email protected] Abstract. We present a formal Arabic wordnet built on the basis of a carefully designed ontology hereby referred to as the Arabic Ontology. The ontology provides a formal representation of the concepts that the Arabic terms convey, and its content was built with ontological analysis in mind, and benchmarked to scientific advances and rigorous knowledge sources as much as this is possible, rather than to only speakers’ beliefs as lexicons typically are. A comprehensive evaluation was conducted thereby demonstrating that the current version of the top-levels of the ontology can top the majority of the Arabic meanings. The ontology consists currently of about 1,300 well-investigated concepts in addition to 11,000 concepts that are partially validated. The ontology is accessible and searchable through a lexicographic search engine (http://ontology.birzeit.edu) that also includes about 150 Arabic-multilingual lexicons, and which are being mapped and enriched using the ontology. The ontology is fully mapped with Princeton WordNet, Wikidata, and other resources. Keywords. Linguistic Ontology, WordNet, Arabic Wordnet, Lexicon, Lexical Semantics, Arabic Natural Language Processing Accepted by: 1. Introduction The importance of linguistic ontologies and wordnets is increasing in many application areas, such as multilingual big data (Oana et al., 2012; Ceravolo, 2018), information retrieval (Abderrahim et al., 2013), question-answering and NLP-based applications (Shinde et al., 2012), data integration (Castanier et al., 2012; Jarrar et al., 2011), multilingual web (McCrae et al., 2011; Jarrar, 2006), among others.
    [Show full text]
  • Verbnet Based Citation Sentiment Class Assignment Using Machine Learning
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 VerbNet based Citation Sentiment Class Assignment using Machine Learning Zainab Amjad1, Imran Ihsan2 Department of Creative Technologies Air University, Islamabad, Pakistan Abstract—Citations are used to establish a link between time-consuming and complicated. To resolve this issue there articles. This intent has changed over the years, and citations are exists many researchers [7]–[9] who deal with the sentiment now being used as a criterion for evaluating the research work or analysis of citation sentences to improve bibliometric the author and has become one of the most important criteria for measures. Such applications can help scholars in the period of granting rewards or incentives. As a result, many unethical research to identify the problems with the present approaches, activities related to the use of citations have emerged. That is why unaddressed issues, and the present research gaps [10]. content-based citation sentiment analysis techniques are developed on the hypothesis that all citations are not equal. There are two existing approaches for Citation Sentiment There are several pieces of research to find the sentiment of a Analysis: Qualitative and Quantitative [7]. Quantitative citation, however, only a handful of techniques that have used approaches consider that all citations are equally important citation sentences for this purpose. In this research, we have while qualitative approaches believe that all citations are not proposed a verb-oriented citation sentiment classification for equally important [9]. The quantitative approach uses citation researchers by semantically analyzing verbs within a citation text count to rank a research paper [8] while the qualitative using VerbNet Ontology, natural language processing & four approach analyzes the nature of citation [10].
    [Show full text]
  • Generating Lexical Representations of Frames Using Lexical Substitution
    Generating Lexical Representations of Frames using Lexical Substitution Saba Anwar Artem Shelmanov Universitat¨ Hamburg Skolkovo Institute of Science and Technology Germany Russia [email protected] [email protected] Alexander Panchenko Chris Biemann Skolkovo Institute of Science and Technology Universitat¨ Hamburg Russia Germany [email protected] [email protected] Abstract Seed sentence: I hope PattiHelper can helpAssistance youBenefited party soonTime . Semantic frames are formal linguistic struc- Substitutes for Assistance: assist, aid tures describing situations/actions/events, e.g. Substitutes for Helper: she, I, he, you, we, someone, Commercial transfer of goods. Each frame they, it, lori, hannah, paul, sarah, melanie, pam, riley Substitutes for Benefited party: me, him, folk, her, provides a set of roles corresponding to the sit- everyone, people uation participants, e.g. Buyer and Goods, and Substitutes for Time: tomorrow, now, shortly, sooner, lexical units (LUs) – words and phrases that tonight, today, later can evoke this particular frame in texts, e.g. Sell. The scarcity of annotated resources hin- Table 1: An example of the induced lexical represen- ders wider adoption of frame semantics across tation (roles and LUs) of the Assistance FrameNet languages and domains. We investigate a sim- frame using lexical substitutes from a single seed sen- ple yet effective method, lexical substitution tence. with word representation models, to automat- ically expand a small set of frame-annotated annotated resources. Some publicly available re- sentences with new words for their respective sources are FrameNet (Baker et al., 1998) and roles and LUs. We evaluate the expansion PropBank (Palmer et al., 2005), yet for many lan- quality using FrameNet.
    [Show full text]
  • Unified Language Model Pre-Training for Natural
    Unified Language Model Pre-training for Natural Language Understanding and Generation Li Dong∗ Nan Yang∗ Wenhui Wang∗ Furu Wei∗† Xiaodong Liu Yu Wang Jianfeng Gao Ming Zhou Hsiao-Wuen Hon Microsoft Research {lidong1,nanya,wenwan,fuwei}@microsoft.com {xiaodl,yuwan,jfgao,mingzhou,hon}@microsoft.com Abstract This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirec- tional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-of- the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm. 1 Introduction Language model (LM) pre-training has substantially advanced the state of the art across a variety of natural language processing tasks [8, 29, 19, 31, 9, 1].
    [Show full text]
  • Towards a Cross-Linguistic Verbnet-Style Lexicon for Brazilian Portuguese
    Towards a cross-linguistic VerbNet-style lexicon for Brazilian Portuguese Carolina Scarton, Sandra Alu´ısio Center of Computational Linguistics (NILC), University of Sao˜ Paulo Av. Trabalhador Sao-Carlense,˜ 400. 13560-970 Sao˜ Carlos/SP, Brazil [email protected], [email protected] Abstract This paper presents preliminary results of the Brazilian Portuguese Verbnet (VerbNet.Br). This resource is being built by using other existing Computational Lexical Resources via a semi-automatic method. We identified, automatically, 5688 verbs as candidate members of VerbNet.Br, which are distributed in 257 classes inherited from VerbNet. These preliminary results give us some directions of future work and, since the results were automatically generated, a manual revision of the complete resource is highly desirable. 1. Introduction the verb to load. To fulfill this gap, VerbNet has mappings The task of building Computational Lexical Resources to WordNet, which has deeper semantic relations. (CLRs) and making them publicly available is one of Brazilian Portuguese language lacks CLRs. There are some the most important tasks of Natural Language Processing initiatives like WordNet.Br (Dias da Silva et al., 2008), that (NLP) area. CLRs are used in many other applications is based on and aligned to WordNet. This resource is the in NLP, such as automatic summarization, machine trans- most complete for Brazilian Portuguese language. How- lation and opinion mining. Specially, CLRs that treat the ever, only the verb database is in an advanced stage (it syntactic and semantic behaviour of verbs are very impor- is finished, but without manual validation), currently con- tant to the tasks of information retrieval (Croch and King, sisting of 5,860 verbs in 3,713 synsets.
    [Show full text]
  • Distributional Semantics, Wordnet, Semantic Role
    Semantics Kevin Duh Intro to NLP, Fall 2019 Outline • Challenges • Distributional Semantics • Word Sense • Semantic Role Labeling The Challenge of Designing Semantic Representations • Q: What is semantics? • A: The study of meaning • Q: What is meaning? • A: … We know it when we see it • These sentences/phrases all have the same meaning: • XYZ corporation bought the stock. • The stock was bought by XYZ corporation. • The purchase of the stock by XYZ corporation... • The stock purchase by XYZ corporation... But how to formally define it? Meaning Sentence Representation Example Representations Sentence: “I have a car” Logic Formula Graph Representation Key-Value Records Example Representations Sentence: “I have a car” As translation in “Ich habe ein Auto” another language There’s no single agreed-upon representation that works in all cases • Different emphases: • Words or Sentences • Syntax-Semantics interface, Logical Inference, etc. • Different aims: • “Deep (and narrow)” vs “Shallow (and broad)” • e.g. Show me all flights from BWI to NRT. • Do we link to actual flight records? • Or general concept of flying machines? Outline • Challenges • Distributional Semantics • Word Sense • Semantic Role Labeling Distributional Semantics 10 Learning Distributional Semantics from large text dataset • Your pet dog is so cute • Your pet cat is so cute • The dog ate my homework • The cat ate my homework neighbor(dog) overlaps-with neighbor(cats) so meaning(dog) is-similar-to meaning(cats) 11 Word2Vec implements Distribution Semantics Your pet dog is so cute 1. Your pet - is so | dog 2. dog | Your pet - is so From: Mikolov, Tomas; et al. (2013). "Efficient Estimation of Word 12 Representations in Vector Space” Latent Semantic Analysis (LSA) also implements Distributional Semantics Pet-peeve: fundamentally, neural approaches aren’t so different from classical LSA.
    [Show full text]
  • Analyzing Political Polarization in News Media with Natural Language Processing
    Analyzing Political Polarization in News Media with Natural Language Processing by Brian Sharber A thesis presented to the Honors College of Middle Tennessee State University in partial fulfillment of the requirements for graduation from the University Honors College Fall 2020 Analyzing Political Polarization in News Media with Natural Language Processing by Brian Sharber APPROVED: ______________________________ Dr. Salvador E. Barbosa Project Advisor Computer Science Department Dr. Medha Sarkar Computer Science Department Chair ____________________________ Dr. John R. Vile, Dean University Honors College Copyright © 2020 Brian Sharber Department of Computer Science Middle Tennessee State University; Murfreesboro, Tennessee, USA. I hereby grant to Middle Tennessee State University (MTSU) and its agents (including an institutional repository) the non-exclusive right to archive, preserve, and make accessible my thesis in whole or in part in all forms of media now and hereafter. I warrant that the thesis and the abstract are my original work and do not infringe or violate any rights of others. I agree to indemnify and hold MTSU harmless for any damage which may result from copyright infringement or similar claims brought against MTSU by third parties. I retain all ownership rights to the copyright of my thesis. I also retain the right to use in future works (such as articles or books) all or part of this thesis. The software described in this work is free software. You can redistribute it and/or modify it under the terms of the MIT License. The software is posted on GitHub under the following repository: https://github.com/briansha/NLP-Political-Polarization iii Abstract Political discourse in the United States is becoming increasingly polarized.
    [Show full text]
  • Foundations of Natural Language Processing Lecture 16 Semantic
    Language is Flexible Foundations of Natural Language Processing Often we want to know who did what to whom (when, where, how and why) • Lecture 16 But the same event and its participants can have different syntactic realizations. Semantic Role Labelling and Argument Structure • Sandy broke the glass. vs. The glass was broken by Sandy. Alex Lascarides She gave the boy a book. vs. She gave a book to the boy. (Slides based on those of Schneider, Koehn, Lascarides) Instead of focusing on syntax, consider the semantic roles (also called 13 March 2020 • thematic roles) defined by each event. Alex Lascarides FNLP Lecture 16 13 March 2020 Alex Lascarides FNLP Lecture 16 1 Argument Structure and Alternations Stanford Dependencies Mary opened the door • The door opened John slices bread with a knife • This bread slices easily The knife slices cleanly Mary loaded the truck with hay • Mary loaded hay onto the the truck The truck was loaded with hay (by Mary) The hay was loaded onto the truck (by Mary) John gave a present to Mary • John gave Mary a present cf Mary ate the sandwich with Kim! Alex Lascarides FNLP Lecture 16 2 Alex Lascarides FNLP Lecture 16 3 Syntax-Semantics Relationship Outline syntax = semantics • 6 The semantic roles played by different participants in the sentence are not • trivially inferable from syntactical relations . though there are patterns! • The idea of semantic roles can be combined with other aspects of meaning • (beyond this course). Alex Lascarides FNLP Lecture 16 4 Alex Lascarides FNLP Lecture 16 5 Commonly used thematic
    [Show full text]
  • Automatic Spelling Correction Based on N-Gram Model
    International Journal of Computer Applications (0975 – 8887) Volume 182 – No. 11, August 2018 Automatic Spelling Correction based on n-Gram Model S. M. El Atawy A. Abd ElGhany Dept. of Computer Science Dept. of Computer Faculty of Specific Education Damietta University Damietta University, Egypt Egypt ABSTRACT correction. In this paper, we are designed, implemented and A spell checker is a basic requirement for any language to be evaluated an end-to-end system that performs spellchecking digitized. It is a software that detects and corrects errors in a and auto correction. particular language. This paper proposes a model to spell error This paper is organized as follows: Section 2 illustrates types detection and auto-correction that is based on n-gram of spelling error and some of related works. Section 3 technique and it is applied in error detection and correction in explains the system description. Section 4 presents the results English as a global language. The proposed model provides and evaluation. Finally, Section 5 includes conclusion and correction suggestions by selecting the most suitable future work. suggestions from a list of corrective suggestions based on lexical resources and n-gram statistics. It depends on a 2. RELATED WORKS lexicon of Microsoft words. The evaluation of the proposed Spell checking techniques have been substantially, such as model uses English standard datasets of misspelled words. error detection & correction. Two commonly approaches for Error detection, automatic error correction, and replacement error detection are dictionary lookup and n-gram analysis. are the main features of the proposed model. The results of the Most spell checkers methods described in the literature, use experiment reached approximately 93% of accuracy and acted dictionaries as a list of correct spellings that help algorithms similarly to Microsoft Word as well as outperformed both of to find target words [16].
    [Show full text]