SLIDE – a Sentiment Lexicon of Common Idioms
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Dictionary Users Do Look up Frequent Words. a Logfile Analysis
Erschienen in: Müller-Spitzer, Carolin (Hrsg.): Using Online Dictionaries. Berlin/ Boston: de Gruyter, 2014. (Lexicographica: Series Maior 145), S. 229-249. Alexander Koplenig, Peter Meyer, Carolin Müller-Spitzer Dictionary users do look up frequent words. A logfile analysis Abstract: In this paper, we use the 2012 log files of two German online dictionaries (Digital Dictionary of the German Language1 and the German Version of Wiktionary) and the 100,000 most frequent words in the Mannheim German Reference Corpus from 2009 to answer the question of whether dictionary users really do look up fre- quent words, first asked by de Schryver et al. (2006). By using an approach to the comparison of log files and corpus data which is completely different from that of the aforementioned authors, we provide empirical evidence that indicates - contra - ry to the results of de Schryver et al. and Verlinde/Binon (2010) - that the corpus frequency of a word can indeed be an important factor in determining what online dictionary users look up. Finally, we incorporate word dass Information readily available in Wiktionary into our analysis to improve our results considerably. Keywords: log file, frequency, corpus, headword list, monolingual dictionary, multi- lingual dictionary Alexander Koplenig: Institut für Deutsche Sprache, R 5, 6-13, 68161 Mannheim, +49-(0)621-1581- 435, [email protected] Peter Meyer: Institut für Deutsche Sprache, R 5, 6-13, 68161 Mannheim, +49-(0)621-1581-427, [email protected] Carolin Müller-Spitzer: Institut für Deutsche Sprache, R 5, 6-13, 68161 Mannheim, +49-(0)621-1581- 429, [email protected] Introduction We would like to Start this chapter by asking one of the most fundamental questions for any general lexicographical endeavour to describe the words of one (or more) language(s): which words should be included in a dictionary? At first glance, the answer seems rather simple (especially when the primary objective is to describe a language as completely as possible): it would be best to include every word in the dictionary. -
The Generative Lexicon
The Generative Lexicon James Pustejovsky" Computer Science Department Brandeis University In this paper, I will discuss four major topics relating to current research in lexical seman- tics: methodology, descriptive coverage, adequacy of the representation, and the computational usefulness of representations. In addressing these issues, I will discuss what I think are some of the central problems facing the lexical semantics community, and suggest ways of best ap- proaching these issues. Then, I will provide a method for the decomposition of lexical categories and outline a theory of lexical semantics embodying a notion of cocompositionality and type coercion, as well as several levels of semantic description, where the semantic load is spread more evenly throughout the lexicon. I argue that lexical decomposition is possible if it is per- formed generatively. Rather than assuming a fixed set of primitives, I will assume a fixed number of generative devices that can be seen as constructing semantic expressions. I develop a theory of Qualia Structure, a representation language for lexical items, which renders much lexical ambiguity in the lexicon unnecessary, while still explaining the systematic polysemy that words carry. Finally, I discuss how individual lexical structures can be integrated into the larger lexical knowledge base through a theory of lexical inheritance. This provides us with the necessary principles of global organization for the lexicon, enabling us to fully integrate our natural language lexicon into a conceptual whole. 1. Introduction I believe we have reached an interesting turning point in research, where linguistic studies can be informed by computational tools for lexicology as well as an appre- ciation of the computational complexity of large lexical databases. -
Etytree: a Graphical and Interactive Etymology Dictionary Based on Wiktionary
Etytree: A Graphical and Interactive Etymology Dictionary Based on Wiktionary Ester Pantaleo Vito Walter Anelli Wikimedia Foundation grantee Politecnico di Bari Italy Italy [email protected] [email protected] Tommaso Di Noia Gilles Sérasset Politecnico di Bari Univ. Grenoble Alpes, CNRS Italy Grenoble INP, LIG, F-38000 Grenoble, France [email protected] [email protected] ABSTRACT a new method1 that parses Etymology, Derived terms, De- We present etytree (from etymology + family tree): a scendants sections, the namespace for Reconstructed Terms, new on-line multilingual tool to extract and visualize et- and the etymtree template in Wiktionary. ymological relationships between words from the English With etytree, a RDF (Resource Description Framework) Wiktionary. A first version of etytree is available at http: lexical database of etymological relationships collecting all //tools.wmflabs.org/etytree/. the extracted relationships and lexical data attached to lex- With etytree users can search a word and interactively emes has also been released. The database consists of triples explore etymologically related words (ancestors, descendants, or data entities composed of subject-predicate-object where cognates) in many languages using a graphical interface. a possible statement can be (for example) a triple with a lex- The data is synchronised with the English Wiktionary dump eme as subject, a lexeme as object, and\derivesFrom"or\et- at every new release, and can be queried via SPARQL from a ymologicallyEquivalentTo" as predicate. The RDF database Virtuoso endpoint. has been exposed via a SPARQL endpoint and can be queried Etytree is the first graphical etymology dictionary, which at http://etytree-virtuoso.wmflabs.org/sparql. -
User Contributions to Online Dictionaries Andrea Abel and Christian M
The dynamics outside the paper: User Contributions to Online Dictionaries Andrea Abel and Christian M. Meyer Electronic lexicography in the 21st century (eLex): Thinking outside the paper, Tallinn, Estonia, October 17–19, 2013. 18.10.2013 | European Academy of Bozen/Bolzano and Technische Universität Darmstadt | Andrea Abel and Christian M. Meyer | 1 Introduction Online dictionaries rely increasingly on their users and leverage methods for facilitating user contributions at basically any step of the lexicographic process. 18.10.2013 | European Academy of Bozen/Bolzano and Technische Universität Darmstadt | Andrea Abel and Christian M. Meyer | 2 Previous work . Mostly focused on specific type/dimension of user contribution . e.g., Carr (1997), Storrer (1998, 2010), Køhler Simonsen (2005), Fuertes-Olivera (2009), Melchior (2012), Lew (2011, 2013) . Ambiguous, partly overlapping terms: www.wordle.net/show/wrdl/7124863/User_contributions (02.10.2013) www.wordle.net/show/wrdl/7124863/User_contributions http:// 18.10.2013 | European Academy of Bozen/Bolzano and Technische Universität Darmstadt | Andrea Abel and Christian M. Meyer | 3 Research goals and contribution How to effectively plan the user contributions for new and established dictionaries? . Comprehensive and systematic classification is still missing . Mann (2010): analysis of 88 online dictionaries . User contributions roughly categorized as direct / indirect contributions and exchange with other dictionary users . Little detail given, since outside the scope of the analysis . We use this as a basis for our work We propose a novel classification of the different types of user contributions based on many practical examples 18.10.2013 | European Academy of Bozen/Bolzano and Technische Universität Darmstadt | Andrea Abel and Christian M. -
Wiktionary Matcher
Wiktionary Matcher Jan Portisch1;2[0000−0001−5420−0663], Michael Hladik2[0000−0002−2204−3138], and Heiko Paulheim1[0000−0003−4386−8195] 1 Data and Web Science Group, University of Mannheim, Germany fjan, [email protected] 2 SAP SE Product Engineering Financial Services, Walldorf, Germany fjan.portisch, [email protected] Abstract. In this paper, we introduce Wiktionary Matcher, an ontology matching tool that exploits Wiktionary as external background knowl- edge source. Wiktionary is a large lexical knowledge resource that is collaboratively built online. Multiple current language versions of Wik- tionary are merged and used for monolingual ontology matching by ex- ploiting synonymy relations and for multilingual matching by exploiting the translations given in the resource. We show that Wiktionary can be used as external background knowledge source for the task of ontology matching with reasonable matching and runtime performance.3 Keywords: Ontology Matching · Ontology Alignment · External Re- sources · Background Knowledge · Wiktionary 1 Presentation of the System 1.1 State, Purpose, General Statement The Wiktionary Matcher is an element-level, label-based matcher which uses an online lexical resource, namely Wiktionary. The latter is "[a] collaborative project run by the Wikimedia Foundation to produce a free and complete dic- tionary in every language"4. The dictionary is organized similarly to Wikipedia: Everybody can contribute to the project and the content is reviewed in a com- munity process. Compared to WordNet [4], Wiktionary is significantly larger and also available in other languages than English. This matcher uses DBnary [15], an RDF version of Wiktionary that is publicly available5. The DBnary data set makes use of an extended LEMON model [11] to describe the data. -
A Study of Idiom Translation Strategies Between English and Chinese
ISSN 1799-2591 Theory and Practice in Language Studies, Vol. 3, No. 9, pp. 1691-1697, September 2013 © 2013 ACADEMY PUBLISHER Manufactured in Finland. doi:10.4304/tpls.3.9.1691-1697 A Study of Idiom Translation Strategies between English and Chinese Lanchun Wang School of Foreign Languages, Qiongzhou University, Sanya 572022, China Shuo Wang School of Foreign Languages, Qiongzhou University, Sanya 572022, China Abstract—This paper, focusing on idiom translation methods and principles between English and Chinese, with the statement of different idiom definitions and the analysis of idiom characteristics and culture differences, studies the strategies on idiom translation, what kind of method should be used and what kind of principle should be followed as to get better idiom translations. Index Terms— idiom, translation, strategy, principle I. DEFINITIONS OF IDIOMS AND THEIR FUNCTIONS Idiom is a language in the formation of the unique and fixed expressions in the using process. As a language form, idioms has its own characteristic and patterns and used in high frequency whether in written language or oral language because idioms can convey a host of language and cultural information when people chat to each other. In some senses, idioms are the reflection of the environment, life, historical culture of the native speakers and are closely associated with their inner most spirit and feelings. They are commonly used in all types of languages, informal and formal. That is why the extent to which a person familiarizes himself with idioms is a mark of his or her command of language. Both English and Chinese are abundant in idioms. -
THE LEXICON: a SYSTEM of MATRICES of LEXICAL UNITS and THEIR PROPERTIES ~ Harry H
THE LEXICON: A SYSTEM OF MATRICES OF LEXICAL UNITS AND THEIR PROPERTIES ~ Harry H. Josselson - Uriel Weinreich /I/~ in discussing the fact that at one time many American scholars relied on either the discipline of psychology or sociology for the resolution of semantic problems~ comments: In Soviet lexicology, it seems, neither the tra- ditionalists~ who have been content to work with the categories of classical rhetoric and 19th- century historical semantics~ nor the critical lexicologists in search of better conceptual tools, have ever found reason to doubt that linguistics alone is centrally responsible for the investiga- tion of the vocabulary of languages. /2/ This paper deals with a certain conceptual tool, the matrix, which linguists can use for organizing a lexicon to insure that words will be described (coded) with consistency, that is~ to insure that questions which have been asked about certain words will be asked for all words in the same class, regardless of the fact that they may be more difficult to answer for some than for others. The paper will also dis- cuss certain new categories~ beyond those of classical rhetoric~ which have been introduced into lexicology. i. INTRODUCTION The research in automatic translation brought about by the introduction of computers into the technology has ~The research described herein has b&en supported by the In- formation Systems Branch of the Office of Naval Research. The present work is an amplification of a paper~ "The Lexicon: A Matri~ of Le~emes and Their Properties"~ contributed to the Conference on Mathematical Linguistics at Budapest-Balatonsza- badi, September 6-i0~ 1968. -
Experiments in Idiom Recognition
Experiments in Idiom Recognition Jing Peng and Anna Feldman Department of Computer Science Department of Linguistics Montclair State University Montclair, New Jersey, USA 07043 Abstract Some expressions can be ambiguous between idiomatic and literal interpretations depending on the context they occur in, e.g., sales hit the roof vs. hit the roof of the car. We present a novel method of classifying whether a given instance is literal or idiomatic, focusing on verb-noun constructions. We report state-of-the-art results on this task using an approach based on the hypothesis that the distributions of the contexts of the idiomatic phrases will be different from the contexts of the literal usages. We measure contexts by using projections of the words into vector space. For comparison, we implement Fazly et al. (2009)’s, Sporleder and Li (2009)’s, and Li and Sporleder (2010b)’s methods and apply them to our data. We provide experimental results validating the proposed techniques. 1 Introduction Researchers have been investigating idioms and their properties for many years. According to traditional approaches, an idiom is — in its simplest form— a string of two or more words for which meaning is not derived from the meanings of the individual words comprising that string (Swinney and Cutler, 1979). As such, the meaning of kick the bucket (‘die’) cannot be obtained by breaking down the idiom and an- alyzing the meanings of its constituent parts, to kick and the bucket. In addition to being influenced by the principle of compositionality, the traditional approaches are also influenced by theories of generative grammar (Flores, 1993; Langlotz, 2006) The properties that traditional approaches attribute to idiomatic expressions are also the properties that make them difficult for generative grammars to describe. -
CS474 Natural Language Processing Semantic Analysis Caveats Introduction to Lexical Semantics
CS474 Natural Language Processing Semantic analysis Last class Assigning meanings to linguistic utterances – History Compositional semantics: we can derive the – Tiny intro to semantic analysis meaning of the whole sentence from the meanings of the parts. Next lectures – Max ate a green apple. – Word sense disambiguation Relies on knowing: » Background from linguistics – the meaning of individual words Lexical semantics – how the meanings of individual words combine to form » On-line resources the meaning of groups of words » Computational approaches [next class] – how it all fits in with syntactic analysis Caveats Introduction to lexical semantics Problems with a compositional approach Lexical semantics is the study of – the systematic meaning-related connections among – a former congressman words and – a toy elephant – the internal meaning-related structure of each word – kicked the bucket Lexeme – an individual entry in the lexicon – a pairing of a particular orthographic and phonological form with some form of symbolic meaning representation Sense: the lexeme’s meaning component Lexicon: a finite list of lexemes Lexical semantic relations: Dictionary entries homonymy right adj. located nearer the right hand esp. Homonyms: words that have the same form and unrelated being on the right when facing the same direction meanings – Instead, a bank1 can hold the investments in a custodial account as the observer. in the client’s name. – But as agriculture burgeons on the east bank2, the river will shrink left adj. located nearer to this side of the body even more. than the right. Homophones: distinct lexemes with a shared red n. the color of blood or a ruby. pronunciation – E.g. -
Extracting an Etymological Database from Wiktionary Benoît Sagot
Extracting an Etymological Database from Wiktionary Benoît Sagot To cite this version: Benoît Sagot. Extracting an Etymological Database from Wiktionary. Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728. hal-01592061 HAL Id: hal-01592061 https://hal.inria.fr/hal-01592061 Submitted on 22 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Extracting an Etymological Database from Wiktionary Benoît Sagot Inria 2 rue Simone Iff, 75012 Paris, France E-mail: [email protected] Abstract Electronic lexical resources almost never contain etymological information. The availability of such information, if properly formalised, could open up the possibility of developing automatic tools targeted towards historical and comparative linguistics, as well as significantly improving the automatic processing of ancient languages. We describe here the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes. Keywords: Lexical resource development; etymology; Wiktionary 1. Introduction Electronic lexical resources used in the fields of natural language processing and com- putational linguistics are almost exclusively synchronic resources; they mostly include information about inflectional, derivational, syntactic, semantic or even pragmatic prop- erties of their entries. -
Idioms-And-Expressions.Pdf
Idioms and Expressions by David Holmes A method for learning and remembering idioms and expressions I wrote this model as a teaching device during the time I was working in Bangkok, Thai- land, as a legal editor and language consultant, with one of the Big Four Legal and Tax companies, KPMG (during my afternoon job) after teaching at the university. When I had no legal documents to edit and no individual advising to do (which was quite frequently) I would sit at my desk, (like some old character out of a Charles Dickens’ novel) and prepare language materials to be used for helping professionals who had learned English as a second language—for even up to fifteen years in school—but who were still unable to follow a movie in English, understand the World News on TV, or converse in a colloquial style, because they’d never had a chance to hear and learn com- mon, everyday expressions such as, “It’s a done deal!” or “Drop whatever you’re doing.” Because misunderstandings of such idioms and expressions frequently caused miscom- munication between our management teams and foreign clients, I was asked to try to as- sist. I am happy to be able to share the materials that follow, such as they are, in the hope that they may be of some use and benefit to others. The simple teaching device I used was three-fold: 1. Make a note of an idiom/expression 2. Define and explain it in understandable words (including synonyms.) 3. Give at least three sample sentences to illustrate how the expression is used in context. -
Wiktionary Matcher Results for OAEI 2020
Wiktionary Matcher Results for OAEI 2020 Jan Portisch1;2[0000−0001−5420−0663] and Heiko Paulheim1[0000−0003−4386−8195] 1 Data and Web Science Group, University of Mannheim, Germany fjan, [email protected] 2 SAP SE Product Engineering Financial Services, Walldorf, Germany [email protected] Abstract. This paper presents the results of the Wiktionary Matcher in the Ontology Alignment Evaluation Initiative (OAEI) 2020. Wiktionary Matcher is an ontology matching tool that exploits Wiktionary as exter- nal background knowledge source. Wiktionary is a large lexical knowl- edge resource that is collaboratively built online. Multiple current lan- guage versions of Wiktionary are merged and used for monolingual on- tology matching by exploiting synonymy relations and for multilingual matching by exploiting the translations given in the resource. This is the second OAEI participation of the matching system. Wiktionary Matcher has been improved and is the best performing system on the knowledge graph track this year.3 Keywords: Ontology Matching · Ontology Alignment · External Re- sources · Background Knowledge · Wiktionary 1 Presentation of the System 1.1 State, Purpose, General Statement The Wiktionary Matcher is an element-level, label-based matcher which uses an online lexical resource, namely Wiktionary. The latter is "[a] collaborative project run by the Wikimedia Foundation to produce a free and complete dic- tionary in every language"4. The dictionary is organized similarly to Wikipedia: Everybody can contribute to the project and the content is reviewed in a com- munity process. Compared to WordNet [2], Wiktionary is significantly larger and also available in other languages than English. This matcher uses DBnary [13], an RDF version of Wiktionary that is publicly available5.