Detecting Semantic Difference: a New Model Based on Knowledge and Collocational Association
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  The Generative LexiconThe Generative Lexicon James Pustejovsky" Computer Science Department Brandeis University In this paper, I will discuss four major topics relating to current research in lexical seman- tics: methodology, descriptive coverage, adequacy of the representation, and the computational usefulness of representations. In addressing these issues, I will discuss what I think are some of the central problems facing the lexical semantics community, and suggest ways of best ap- proaching these issues. Then, I will provide a method for the decomposition of lexical categories and outline a theory of lexical semantics embodying a notion of cocompositionality and type coercion, as well as several levels of semantic description, where the semantic load is spread more evenly throughout the lexicon. I argue that lexical decomposition is possible if it is per- formed generatively. Rather than assuming a fixed set of primitives, I will assume a fixed number of generative devices that can be seen as constructing semantic expressions. I develop a theory of Qualia Structure, a representation language for lexical items, which renders much lexical ambiguity in the lexicon unnecessary, while still explaining the systematic polysemy that words carry. Finally, I discuss how individual lexical structures can be integrated into the larger lexical knowledge base through a theory of lexical inheritance. This provides us with the necessary principles of global organization for the lexicon, enabling us to fully integrate our natural language lexicon into a conceptual whole. 1. Introduction I believe we have reached an interesting turning point in research, where linguistic studies can be informed by computational tools for lexicology as well as an appre- ciation of the computational complexity of large lexical databases.
- 
												  Deriving Programming Theories from Equations by Functional Predicate CalculusCalculational Semantics: Deriving Programming Theories from Equations by Functional Predicate Calculus RAYMOND T. BOUTE INTEC, Ghent University The objects of programming semantics, namely, programs and languages, are inherently formal, but the derivation of semantic theories is all too often informal, deprived of the benefits of formal calculation “guided by the shape of the formulas.” Therefore, the main goal of this article is to provide for the study of semantics an approach with the same convenience and power of discov- ery that calculus has given for many years to applied mathematics, physics, and engineering. The approach uses functional predicate calculus and concrete generic functionals; in fact, a small part suffices. Application to a semantic theory proceeds by describing program behavior in the simplest possible way, namely by program equations, and discovering the axioms of the theory as theorems by calculation. This is shown in outline for a few theories, and in detail for axiomatic semantics, fulfilling a second goal of this article. Indeed, a chafing problem with classical axiomatic semantics is that some axioms are unintuitive at first, and that justifications via denotational semantics are too elaborate to be satisfactory. Derivation provides more transparency. Calculation of formulas for ante- and postconditions is shown in general, and for the major language constructs in par- ticular. A basic problem reported in the literature, whereby relations are inadequate for handling nondeterminacy and termination, is solved here through appropriately defined program equations. Several variants and an example in mathematical analysis are also presented. One conclusion is that formal calculation with quantifiers is one of the most important elements for unifying contin- uous and discrete mathematics in general, and traditional engineering with computing science, in particular.
- 
												  Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers Seth Grimes A market study sponsored by Published July 9, 2014, © Alta Plana Corporation Text Analytics 2014: User Perspectives on Solutions and Providers Table of Contents Executive Summary .............................................................................................................. 3 Growth Drivers ...................................................................................................................................... 4 Key Study Findings ................................................................................................................................ 5 About the Study and this Report ......................................................................................................... 6 Text Analytics Basics ............................................................................................................ 7 Patterns ................................................................................................................................................. 7 Structure ............................................................................................................................................... 7 Metadata ............................................................................................................................................... 8 Beyond Text .........................................................................................................................................
- 
												  Semantic Role Labeling: an Introduction to the Special IssueSemantic Role Labeling: An Introduction to the Special Issue Llu´ıs Marquez` ∗ Universitat Politecnica` de Catalunya Xavier Carreras∗∗ Massachusetts Institute of Technology Kenneth C. Litkowski† CL Research Suzanne Stevenson‡ University of Toronto Semantic role labeling, the computational identification and labeling of arguments in text, has become a leading task in computational linguistics today. Although the issues for this task have been studied for decades, the availability of large resources and the development of statistical machine learning methods have heightened the amount of effort in this field. This special issue presents selected and representative work in the field. This overview describes linguistic background of the problem, the movement from linguistic theories to computational practice, the major resources that are being used, an overview of steps taken in computational systems, and a description of the key issues and results in semantic role labeling (as revealed in several international evaluations). We assess weaknesses in semantic role labeling and identify important challenges facing the field. Overall, the opportunities and the potential for useful further research in semantic role labeling are considerable. 1. Introduction The sentence-level semantic analysis of text is concerned with the characterization of events, such as determining “who” did “what” to “whom,” “where,” “when,” and “how.” The predicate of a clause (typically a verb) establishes “what” took place, and other sentence constituents express the participants in the event (such as “who” and “where”), as well as further event properties (such as “when” and “how”). The primary task of semantic role labeling (SRL) is to indicate exactly what semantic relations hold among a predicate and its associated participants and properties, with these relations Departament de Llenguatges i Sistemes Informatics,` Universitat Politecnica` de Catalunya, Jordi Girona ∗ Salgado 1–3, 08034 Barcelona, Spain.
- 
												  Ten Years of Babelnet: a SurveyProceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) Survey Track Ten Years of BabelNet: A Survey Roberto Navigli1 , Michele Bevilacqua1 , Simone Conia1 , Dario Montagnini2 and Francesco Cecconi2 1Sapienza NLP Group, Sapienza University of Rome, Italy 2Babelscape, Italy froberto.navigli, michele.bevilacqua, [email protected] fmontagnini, [email protected] Abstract to integrate symbolic knowledge into neural architectures [d’Avila Garcez and Lamb, 2020]. The rationale is that the The intelligent manipulation of symbolic knowl- use of, and linkage to, symbolic knowledge can not only en- edge has been a long-sought goal of AI. How- able interpretable, explainable and accountable AI systems, ever, when it comes to Natural Language Process- but it can also increase the degree of generalization to rare ing (NLP), symbols have to be mapped to words patterns (e.g., infrequent meanings) and promote better use and phrases, which are not only ambiguous but also of information which is not explicit in the text. language-specific: multilinguality is indeed a de- Symbolic knowledge requires that the link between form sirable property for NLP systems, and one which and meaning be made explicit, connecting strings to repre- enables the generalization of tasks where multiple sentations of concepts, entities and thoughts. Historical re- languages need to be dealt with, without translat- sources such as WordNet [Miller, 1995] are important en- ing text. In this paper we survey BabelNet, a pop- deavors which systematize symbolic knowledge about the ular wide-coverage lexical-semantic knowledge re- words of a language, i.e., lexicographic knowledge, not only source obtained by merging heterogeneous sources in a machine-readable format, but also in structured form, into a unified semantic network that helps to scale thanks to the organization of concepts into a semantic net- tasks and applications to hundreds of languages.
- 
												  ACNLP at Semeval-2020 Task 6: a Supervised Approach for DefinitionACNLP at SemEval-2020 Task 6: A Supervised Approach for Definition Extraction Fabien Caspani Pirashanth Ratnamogan Mathis Linger Mhamed Hajaiej BNP Paribas, France ffabien.caspani, pirashanth.ratnamogan, mathis.linger, [email protected] Abstract We describe our contribution to two of the subtasks of SemEval 2020 Task 6, DeftEval: Extracting term-definition pairs in free text. The system for Subtask 1: Sentence Classification is based on a transformer architecture where we use transfer learning to fine-tune a pretrained model on the downstream task, and the one for Subtask 3: Relation Classification uses a Random Forest classifier with handcrafted dedicated features. Our systems respectively achieve 0.830 and 0.994 F1-scores on the official test set, and we believe that the insights derived from our study are potentially relevant to help advance the research on definition extraction. 1 Introduction SemEval 2020 Task 6 (Spala et al., 2020) is a definition extraction problem divided into three different subtasks, in which the objective is to find the term-definition pairs in free text, possibly spanning across sentences boundaries. In Subtask 1 the system classifies a sentence as containing a definition or not. Subtask 2 consists of labeling each token with their corresponding BIO tags. Finally, in Subtask 3 the system pairs tagged entities, and labels the relation existing between them. The Definition Extraction from Texts (DEFT) corpus (Spala et al., 2019) is the dataset used for this task, and consists of human-annotated data from textbooks on a variety of subjects. Definition extraction methods in the research literature fall into one of the following categories.
- 
												  Semeval 2020 Task 1: Unsupervised Lexical Semantic Change DetectionSemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection November 11, 2020 Dominik Schlechtweg,| Barbara McGillivray,};~ Simon Hengchen,♠ Haim Dubossarsky,~ Nina Tahmasebi♠ [email protected] |University of Stuttgart, }The Alan Turing Institute, ~University of Cambridge ♠University of Gothenburg 1 Introduction I evaluation is currently our most pressing problem in LSC detection 1 I SemEval is competition-style semantic evaluation series I SemEval 2020 Task 1 on Unsupervised Lexical Semantic Change Detection (Schlechtweg, McGillivray, Hengchen, Dubossarsky, & Tahmasebi, 2020)2 I datasets for 4 languages with 100,000 human judgments I 2 subtasks I 33 teams submitted 186 systems 1https://semeval.github.io/ 2 https://languagechange.org/semeval/ 2 Tasks I comparison of two time periods t1 and t2 (i) reduces the number of time periods for which data has to be annotated (ii) reduces the task complexity I two tasks: I Subtask 1 { Binary classification: for a set of target words, decide which words lost or gained senses between t1 and t2, and which ones did not. I Subtask 2 { Ranking: rank a set of target words according to their degree of LSC between t1 and t2. I defined on word sense frequency distributions 3 Sense Frequency Distributions (SFDs) Figure 1: An example of a sense frequency distribution for the word cell in two time periods. 4 Corpora t1 t2 English CCOHA 1810-1860 CCOHA 1960-2010 German DTA 1800-1899 BZ+ND 1946-1990 Latin LatinISE -200{0 LatinISE 0{2000 Swedish Kubhist 1790-1830 Kubhist 1895-1903 Table 1: Time-defined subcorpora for each language. 5 Annotation 1.
- 
												  Conceptnet 5.5: an Open Multilingual Graph of General KnowledgeProceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) ConceptNet 5.5: An Open Multilingual Graph of General Knowledge Robyn Speer Joshua Chin Catherine Havasi Luminoso Technologies, Inc. Union College Luminoso Technologies, Inc. 675 Massachusetts Avenue 807 Union St. 675 Massachusetts Avenue Cambridge, MA 02139 Schenectady, NY 12308 Cambridge, MA 02139 Abstract In this paper, we will concisely represent assertions such Machine learning about language can be improved by sup- as the above as triples of their start node, relation label, and plying it with specific knowledge and sources of external in- end node: the assertion that “a dog has a tail” can be repre- formation. We present here a new version of the linked open sented as (dog, HasA, tail). data resource ConceptNet that is particularly well suited to ConceptNet also represents links between knowledge re- be used with modern NLP techniques such as word embed- sources. In addition to its own knowledge about the English dings. term astronomy, for example, ConceptNet contains links to ConceptNet is a knowledge graph that connects words and URLs that define astronomy in WordNet, Wiktionary, Open- phrases of natural language with labeled edges. Its knowl- Cyc, and DBPedia. edge is collected from many sources that include expert- The graph-structured knowledge in ConceptNet can be created resources, crowd-sourcing, and games with a pur- particularly useful to NLP learning algorithms, particularly pose. It is designed to represent the general knowledge in- those based on word embeddings, such as (Mikolov et al. volved in understanding language, improving natural lan- 2013). We can use ConceptNet to build semantic spaces that guage applications by allowing the application to better un- are more effective than distributional semantics alone.
- 
												  Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers Seth Grimes A market study sponsored by Published July 9, 2014, © Alta Plana Corporation Text Analytics 2014: User Perspectives on Solutions and Providers Table of Contents Executive Summary .............................................................................................................. 3 The Survey ............................................................................................................................................................. 5 Key Study Findings ................................................................................................................................................ 5 About the Study and this Report .......................................................................................................................... 6 Text Analytics Basics .............................................................................................................7 Patterns ................................................................................................................................................................. 7 Structure ................................................................................................................................................................ 7 Metadata ............................................................................................................................................................... 8 Beyond Text ..........................................................................................................................................................
- 
												  A Survey of Computational Semantics: Representation, Inference and Knowledge in Wide-Coverage Text Understanding Johan Bos* University of GroningenLanguage and Linguistics Compass 5/6 (2011): 336–366, 10.1111/j.1749-818x.2011.00284.x A Survey of Computational Semantics: Representation, Inference and Knowledge in Wide-Coverage Text Understanding Johan Bos* University of Groningen Abstract The aim of computational semantics is to capture the meaning of natural language expressions in representations suitable for performing inferences, in the service of understanding human language in written or spoken form. First-order logic is a good starting point, both from the representation and inference point of view. But even if one makes the choice of first-order logic as representa- tion language, this is not enough: the computational semanticist needs to make further decisions on how to model events, tense, modal contexts, anaphora and plural entities. Semantic representa- tions are usually built on top of a syntactic analysis, using unification, techniques from the lambda-calculus or linear logic, to do the book-keeping of variable naming. Inference has many potential applications in computational semantics. One way to implement inference is using algo- rithms from automated deduction dedicated to first-order logic, such as theorem proving and model building. Theorem proving can help in finding contradictions or checking for new infor- mation. Finite model building can be seen as a complementary inference task to theorem proving, and it often makes sense to use both procedures in parallel. The models produced by model generators for texts not only show that the text is contradiction-free; they also can be used for disambiguation tasks and linking interpretation with the real world. To make interesting inferences, often additional background knowledge is required (not expressed in the analysed text or speech parts).
- 
												  Proceedings of the LFG11 Conference Miriam Butt and Tracy Holloway King (Editors) 2011Proceedings of LFG11 Miriam Butt and Tracy Holloway King (Editors) 2011 CSLI Publications http://csli-publications.stanford.edu/ Contents 1 Editors’ Note 4 2 I Wayan Arka: Constructive Number Systems in Marori and Beyond 5 3 Doug Arnold and Louisa Sadler: Resource Splitting and Reintegration with Supplementals 26 4 Rajesh Bhatt, Tina Bögel, Miriam Butt, Annette Hautli, and Sebastian Sulger: Urdu/Hindi Modals 47 5 Adams Bodomo: Reflexivity without Apparent Marking: The Case of Mashan Zhuang 68 6 George Aaron Broadwell, Gregg Castellucci, and Megan Knickerbocker: An Optimal Approach to Partial Agreement in Kaqchikel 89 7 Maris Camilleri and Louisa Sadler: Restrictive Relative Clauses in Maltese 110 8 Damir Cavar and Melanie Seiss: Clitic Placement, Syntactic Disconti- nuity, and Information Structure 131 9 Mary Dalrymple: A Very Long-Distance Anaphor? 152 10 Mary Dalrymple and Louise Mycock: The Prosody-Semantics Inter- face 173 11 Yehuda Falk: Multiple-Gap Constructions 194 12 Anna Gazdik and András Komlósy: On the Syntax-Discourse Inter- face in Hungarian 215 13 Gianluca Giorgolo and Ash Asudeh: Multidimensional Semantics with Unidimensional Glue Logic 236 14 Gianluca Giorgolo and Ash Asudeh: Multimodal Communication in LFG: Gestures and the Correspondence Architecture 257 15 Dag Trygve Truslew Haug: Backward Control in Ancient Greek - Sub- sumption or Linearization? 278 16 Tibor Laczkó and György Rákosi: On Particularly Predicative Parti- cles in Hungarian 299 17 Leslie Lee and Farrell Ackerman: Mandarin Resultative Compounds: A Family
- 
												  Semeval-2013 Task 4: Free Paraphrases of Noun CompoundsSemEval-2013 Task 4: Free Paraphrases of Noun Compounds Iris Hendrickx Zornitsa Kozareva Radboud University Nijmegen & University of Southern California Universidade de Lisboa [email protected] [email protected] Preslav Nakov Diarmuid OS´ eaghdha´ QCRI, Qatar Foundation University of Cambridge [email protected] [email protected] Stan Szpakowicz Tony Veale University of Ottawa & University College Dublin Polish Academy of Sciences [email protected] [email protected] Abstract The frequency spectrum of compound types fol- lows a Zipfian distribution (OS´ eaghdha,´ 2008), so In this paper, we describe SemEval-2013 Task many NC tokens belong to a “long tail” of low- 4: the definition, the data, the evaluation and frequency types. More than half of the two-noun the results. The task is to capture some of the types in the BNC occur exactly once (Kim and Bald- meaning of English noun compounds via para- phrasing. Given a two-word noun compound, win, 2006). Their high frequency and high produc- the participating system is asked to produce tivity make robust NC interpretation an important an explicitly ranked list of its free-form para- goal for broad-coverage semantic processing of En- phrases. The list is automatically compared glish texts. Systems which ignore NCs may give up and evaluated against a similarly ranked list on salient information about the semantic relation- of paraphrases proposed by human annota- ships implicit in a text. Compositional interpretation tors, recruited and managed through Ama- is also the only way to achieve broad NC coverage, zon’s Mechanical Turk. The comparison of because it is not feasible to list in a lexicon all com- raw paraphrases is sensitive to syntactic and morphological variation.