Natural Language Generation in the Context of Machine Translation
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
An Arabic Wordnet with Ontologically Clean Content
Applied Ontology (2021) IOS Press The Arabic Ontology – An Arabic Wordnet with Ontologically Clean Content Mustafa Jarrar Birzeit University, Palestine [email protected] Abstract. We present a formal Arabic wordnet built on the basis of a carefully designed ontology hereby referred to as the Arabic Ontology. The ontology provides a formal representation of the concepts that the Arabic terms convey, and its content was built with ontological analysis in mind, and benchmarked to scientific advances and rigorous knowledge sources as much as this is possible, rather than to only speakers’ beliefs as lexicons typically are. A comprehensive evaluation was conducted thereby demonstrating that the current version of the top-levels of the ontology can top the majority of the Arabic meanings. The ontology consists currently of about 1,300 well-investigated concepts in addition to 11,000 concepts that are partially validated. The ontology is accessible and searchable through a lexicographic search engine (http://ontology.birzeit.edu) that also includes about 150 Arabic-multilingual lexicons, and which are being mapped and enriched using the ontology. The ontology is fully mapped with Princeton WordNet, Wikidata, and other resources. Keywords. Linguistic Ontology, WordNet, Arabic Wordnet, Lexicon, Lexical Semantics, Arabic Natural Language Processing Accepted by: 1. Introduction The importance of linguistic ontologies and wordnets is increasing in many application areas, such as multilingual big data (Oana et al., 2012; Ceravolo, 2018), information retrieval (Abderrahim et al., 2013), question-answering and NLP-based applications (Shinde et al., 2012), data integration (Castanier et al., 2012; Jarrar et al., 2011), multilingual web (McCrae et al., 2011; Jarrar, 2006), among others. -
Verbnet Based Citation Sentiment Class Assignment Using Machine Learning
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 VerbNet based Citation Sentiment Class Assignment using Machine Learning Zainab Amjad1, Imran Ihsan2 Department of Creative Technologies Air University, Islamabad, Pakistan Abstract—Citations are used to establish a link between time-consuming and complicated. To resolve this issue there articles. This intent has changed over the years, and citations are exists many researchers [7]–[9] who deal with the sentiment now being used as a criterion for evaluating the research work or analysis of citation sentences to improve bibliometric the author and has become one of the most important criteria for measures. Such applications can help scholars in the period of granting rewards or incentives. As a result, many unethical research to identify the problems with the present approaches, activities related to the use of citations have emerged. That is why unaddressed issues, and the present research gaps [10]. content-based citation sentiment analysis techniques are developed on the hypothesis that all citations are not equal. There are two existing approaches for Citation Sentiment There are several pieces of research to find the sentiment of a Analysis: Qualitative and Quantitative [7]. Quantitative citation, however, only a handful of techniques that have used approaches consider that all citations are equally important citation sentences for this purpose. In this research, we have while qualitative approaches believe that all citations are not proposed a verb-oriented citation sentiment classification for equally important [9]. The quantitative approach uses citation researchers by semantically analyzing verbs within a citation text count to rank a research paper [8] while the qualitative using VerbNet Ontology, natural language processing & four approach analyzes the nature of citation [10]. -
Thoughts on V. Mathesius' Conception of the Word-Order System in English Compared with That in Czech)
JAN FIRBAS FEOM COMPARATIVE WORD-ORDER STUDIES (Thoughts on V. Mathesius' Conception of the Word-Order System in English Compared with that in Czech) The study of word-order was a life-long interest of Professor Vilem MATHESIUS (1882—1945), the founder of the Prague Linguistic School. At the beginning of his academic career, he took it up in a series of articles (Slvdie), pursuing it later in a number of further papers (see Soupis), the last of which appealed three years before his death. It presented a summary of his researches into the word order of English as compared with that of Czech. It is this last paper of Prof. M., entitled Ze srovndvacich sludii slovoslednych (From Comparative Word-Order Studies), that will be the starting point of the present discussion. (1) Endeavouring to continue in Prof. M.'s word-order studies, wc propose to examine the solutions and suggestions offered by it. We should like to do so in the light of our own researches as well as of those of others, and in this way to survey the field that has been covered by these researches since the publication of the paper almost a quarter of a century ago. We do not, however, intend to submit an exhaustive treatment of all the problems touched upon'by Prof. M. It is the general conception of his paper, not the details, that we shall be concerned with here. (2) Nor can we discuss the place Prof. M.'s word-order studies occupy in the development of linguistic research. (3) Let us only briefly remark that Prof. -
Towards a Cross-Linguistic Verbnet-Style Lexicon for Brazilian Portuguese
Towards a cross-linguistic VerbNet-style lexicon for Brazilian Portuguese Carolina Scarton, Sandra Alu´ısio Center of Computational Linguistics (NILC), University of Sao˜ Paulo Av. Trabalhador Sao-Carlense,˜ 400. 13560-970 Sao˜ Carlos/SP, Brazil [email protected], [email protected] Abstract This paper presents preliminary results of the Brazilian Portuguese Verbnet (VerbNet.Br). This resource is being built by using other existing Computational Lexical Resources via a semi-automatic method. We identified, automatically, 5688 verbs as candidate members of VerbNet.Br, which are distributed in 257 classes inherited from VerbNet. These preliminary results give us some directions of future work and, since the results were automatically generated, a manual revision of the complete resource is highly desirable. 1. Introduction the verb to load. To fulfill this gap, VerbNet has mappings The task of building Computational Lexical Resources to WordNet, which has deeper semantic relations. (CLRs) and making them publicly available is one of Brazilian Portuguese language lacks CLRs. There are some the most important tasks of Natural Language Processing initiatives like WordNet.Br (Dias da Silva et al., 2008), that (NLP) area. CLRs are used in many other applications is based on and aligned to WordNet. This resource is the in NLP, such as automatic summarization, machine trans- most complete for Brazilian Portuguese language. How- lation and opinion mining. Specially, CLRs that treat the ever, only the verb database is in an advanced stage (it syntactic and semantic behaviour of verbs are very impor- is finished, but without manual validation), currently con- tant to the tasks of information retrieval (Croch and King, sisting of 5,860 verbs in 3,713 synsets. -
Identifying Basic Constituent Order in Old Tamil: Issues in Historical Linguistics with Special Reference to Tamil Epigraphic Texts (400-650 CE) Appasamy Murugaiyan
Identifying Basic Constituent Order in Old Tamil: Issues in historical linguistics with Special Reference to Tamil Epigraphic texts (400-650 CE) Appasamy Murugaiyan To cite this version: Appasamy Murugaiyan. Identifying Basic Constituent Order in Old Tamil: Issues in historical lin- guistics with Special Reference to Tamil Epigraphic texts (400-650 CE) . 2015. hal-01189729 HAL Id: hal-01189729 https://hal.archives-ouvertes.fr/hal-01189729 Preprint submitted on 1 Sep 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 1 International Journal of Dravidian Linguistics, Vol. 44 No. 2 June 2015, pp. 1-18. Identifying Basic Constituent Order in Old Tamil: Issues in historical linguistics with Special Reference to Tamil Epigraphic texts (400-650 CE) 1 Appasamy Murugaiyan Paris, France [email protected] Abstract Why and how languages change over time have been the major concerns of the historical linguistics. The Dravidian comparative linguistics in the last few decades has arrived at excellent results at different levels of language change: phonology, morphology and etymology. However the field of historical syntax remains to be explored in detail. Change or variation in word order type is one of the most important areas in the study of historical linguistics and language change. -
Expression of Information Structure in West Slavic: Modeling the Impact of Prosodic and Word Order Factors
EXPRESSION OF INFORMATION STRUCTURE IN WEST SLAVIC: MODELING THE IMPACT OF PROSODIC AND WORD ORDER FACTORS RADEK ŠIMÍK MARTA WIERZBA Humboldt-Universität zu Berlin University of Potsdam [Final draft, February 3, 2017] The received wisdom is that word order alternations in Slavic languages arise as a direct consequence of word order-related information structure constraints such as ‘place given expressions before new ones’. In this paper, we compare the word order hypothesis with a competing one, according to which word order alternations arise as a consequence of a prosodic constraint – ‘avoid stress on given expressions’. Based on novel experimental and modeling data, we conclude that the prosodic hypothesis is more adequate than the word order hypothesis. Yet, we also show that combining the strengths of both hypotheses provides the best fit for the data. Methodologically, our paper is based on gradient acceptability judgments and multiple regression, which allows us to evaluate whether violations of generalizations like ‘given precedes new’ or ‘given lacks stress’ lead to a consistent decrease in acceptability and to quantify the size of their respective effects. Focusing on the empirical adequacy of such generalizations rather than specific theoretical implementations also makes it possible to bridge the gap between different linguistic traditions, and to directly compare predictions emerging from formal and functional approaches.1 1. INTRODUCTION. This paper contributes to the long-standing discussion of how information structure is formally expressed. Our main research question is: To what extent do information structure-related word order alternations reflect an inherent connection between information structure and word order (the WORD ORDER HYPOTHESIS) and to what extent do they merely help to fulfill independent prosodic requirements (the PROSODIC HYPOTHESIS)? The word order hypothesis is incarnated in generalizations like ‘foci are sentence-final’, ‘topics are sentence-initial’, or ‘discourse given expressions precede new ones’. -
Haplology of Reflexive Clitics in Czech
Haplology of Reflexive Clitics in Czech Alexandr Rosen 1. Clitics in Czech As units exhibiting some properties of words and some properties of affixes, Slavic clitics (e.g. Franks & King, 2000) and clitics in general (e.g. Anderson, 1993; Zwicky, 1977) present a challenge to existing descriptive grammars and grammar formalisms. This is due mainly to their complex ordering properties, involving constraints of different types – phonological, morphological, syntactic, pragmatic and stylistic. Czech clitics are no exception – in many aspects they are similar to clitics in other languages (inventory, position, order, climbing) but their behavior may be surprising in a ‘free-word-order’ language, i.e. language with a (constituent) order determined by information structure rather than by syntactic functions. Our focus will be on special clitics (henceforth simply clitics), whose word-order position is determined by constraints different from those determining the position of non-clitic words.1 For space reasons and due to the focus of this paper we are not aiming at an exhaustive and precise coverage of all potential Czech clitics, but rather refer to previous work – Czech clitics have already attracted considerable attention (e.g. Fried, 1994; Avgustinova & Oliva, 1 On the other hand, the position of simple clitics is the same as the position of non-clitic words of the same class, while the only difference is phonological (Zwicky, 1977). - 97 - ALEXANDR ROSEN 1995; Toman, 1996; Rosen, 2001; Hana, 2007). They include short, mostly monosyllabic morphemes – auxiliaries, weak pronouns, short adverbs. Example (1) includes an auxiliary (jsem), a reflexive particle (se), a weak personal pronoun (mu ‘himD’), and a pronoun which can occur both in a clitical and non-clitical position (to ‘itA’). -
Leveraging Verbnet to Build Corpus-Specific Verb Clusters
Leveraging VerbNet to build Corpus-Specific Verb Clusters Daniel W Peterson and Jordan Boyd-Graber and Martha Palmer University of Colorado daniel.w.peterson,jordan.boyd.graber,martha.palmer @colorado.edu { } Daisuke Kawhara Kyoto University, JP [email protected] Abstract which involved dozens of linguists and a decade of work, making careful decisions about the al- In this paper, we aim to close the gap lowable syntactic frames for various verb senses, from extensive, human-built semantic re- informed by text examples. sources and corpus-driven unsupervised models. The particular resource explored VerbNet is useful for semantic role labeling and here is VerbNet, whose organizing princi- related tasks (Giuglea and Moschitti, 2006; Yi, ple is that semantics and syntax are linked. 2007; Yi et al., 2007; Merlo and van der Plas, To capture patterns of usage that can aug- 2009; Kshirsagar et al., 2014), but its widespread ment knowledge resources like VerbNet, use is limited by coverage. Not all verbs have we expand a Dirichlet process mixture a VerbNet class, and some polysemous verbs model to predict a VerbNet class for each have important senses unaccounted for. In addi- sense of each verb, allowing us to incorpo- tion, VerbNet is not easily adaptable to domain- rate annotated VerbNet data to guide the specific corpora, so these omissions may be more clustering process. The resulting clusters prominent outside of the general-purpose corpora align more closely to hand-curated syn- and linguistic intuition used in its construction. tactic/semantic groupings than any previ- Its great strength is also its downfall: adding ous models, and can be adapted to new new verbs, new senses, and new classes requires domains since they require only corpus trained linguists - at least, to preserve the integrity counts. -
Statistical Machine Translation with a Factorized Grammar
Statistical Machine Translation with a Factorized Grammar Libin Shen and Bing Zhang and Spyros Matsoukas and Jinxi Xu and Ralph Weischedel Raytheon BBN Technologies Cambridge, MA 02138, USA {lshen,bzhang,smatsouk,jxu,weisched}@bbn.com Abstract for which the segments for translation are always fixed. In modern machine translation practice, a sta- However, do we really need such a large rule set tistical phrasal or hierarchical translation sys- to represent information from the training data of tem usually relies on a huge set of trans- much smaller size? Linguists in the grammar con- lation rules extracted from bi-lingual train- struction field already showed us a perfect solution ing data. This approach not only results in space and efficiency issues, but also suffers to a similar problem. The answer is to use a fac- from the sparse data problem. In this paper, torized grammar. Linguists decompose lexicalized we propose to use factorized grammars, an linguistic structures into two parts, (unlexicalized) idea widely accepted in the field of linguis- templates and lexical items. Templates are further tic grammar construction, to generalize trans- organized into families. Each family is associated lation rules, so as to solve these two prob- with a set of lexical items which can be used to lex- lems. We designed a method to take advantage icalize all the templates in this family. For example, of the XTAG English Grammar to facilitate the extraction of factorized rules. We experi- the XTAG English Grammar (XTAG-Group, 2001), mented on various setups of low-resource lan- a hand-crafted grammar based on the Tree Adjoin- guage translation, and showed consistent sig- ing Grammar (TAG) (Joshi and Schabes, 1997) for- nificant improvement in BLEU over state-of- malism, is a grammar of this kind, which employs the-art string-to-dependency baseline systems factorization with LTAG e-tree templates and lexical with 200K words of bi-lingual training data. -
Verbnet Guidelines
VerbNet Annotation Guidelines 1. Why Verbs? 2. VerbNet: A Verb Class Lexical Resource 3. VerbNet Contents a. The Hierarchy b. Semantic Role Labels and Selectional Restrictions c. Syntactic Frames d. Semantic Predicates 4. Annotation Guidelines a. Does the Instance Fit the Class? b. Annotating Verbs Represented in Multiple Classes c. Things that look like verbs but aren’t i. Nouns ii. Adjectives d. Auxiliaries e. Light Verbs f. Figurative Uses of Verbs 1 Why Verbs? Computational verb lexicons are key to supporting NLP systems aimed at semantic interpretation. Verbs express the semantics of an event being described as well as the relational information among participants in that event, and project the syntactic structures that encode that information. Verbs are also highly variable, displaying a rich range of semantic and syntactic behavior. Verb classifications help NLP systems to deal with this complexity by organizing verbs into groups that share core semantic and syntactic properties. VerbNet (Kipper et al., 2008) is one such lexicon, which identifies semantic roles and syntactic patterns characteristic of the verbs in each class and makes explicit the connections between the syntactic patterns and the underlying semantic relations that can be inferred for all members of the class. Each syntactic frame in a class has a corresponding semantic representation that details the semantic relations between event participants across the course of the event. In the following sections, each component of VerbNet is identified and explained. VerbNet: A Verb Class Lexical Resource VerbNet is a lexicon of approximately 5800 English verbs, and groups verbs according to shared syntactic behaviors, thereby revealing generalizations of verb behavior. -
Single Classifier Approach for Verb Sense Disambiguation Based On
Single Classifier Approach for Verb Sense Disambiguation based on Generalized Features Daisuke Kawaharay, Martha Palmerz yGraduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan zDepartment of Linguistics, University of Colorado Boulder, Boulder, CO 80302, USA [email protected], [email protected] Abstract We present a supervised method for verb sense disambiguation based on VerbNet. Most previous supervised approaches to verb sense disambiguation create a classifier for each verb that reaches a frequency threshold. These methods, however, have a significant practical problem that they cannot be applied to rare or unseen verbs. In order to overcome this problem, we create a single classifier to be applied to rare or unseen verbs in a new text. This single classifier also exploits generalized semantic features of a verb and its modifiers in order to better deal with rare or unseen verbs. Our experimental results show that the proposed method achieves equivalent performance to per-verb classifiers, which cannot be applied to unseen verbs. Our classifier could be utilized to improve the classifications in lexical resources of verbs, such as VerbNet, in a semi-automatic manner and to possibly extend the coverage of these resources to new verbs. Keywords: verb sense disambiguation, single classifier, word representations 1. Introduction for each verb or use class membership constraints (Abend A verb plays a primary role in conveying the meaning of et al., 2008), which limit the class candidates of a verb to its a sentence. Since capturing the sense of a verb is essential seen classes in the training data. -
Nominal Structures: All in Complex
OMLM Vol. 2 Edited by Ludmila Veselovská and Markéta Janebová Nominal Structures: Nominal Nominal Structures: All in Complex DPs All in Complex DPs in Complex All OlomoUC MODERN LANGUAGE MONOGRAPHS Vol. 2 obalka_montaz_old.indd 2 7.5.2014 19:08:02 Nominal Structures: All in Complex DPs Edited by Ludmila Veselovská and Markéta Janebová Palacký University Olomouc 2014 monografie.indb 1 7.5.2014 9:31:04 OLOMOUC MODERN LANGUAGE MONOGRAPHS (OMLM) publishes themed monographs on linguistics, literature, and translation studies. Also in this series: OMLM, Vol. 1: Category and Categorial Changes (2014) monografie.indb 2 7.5.2014 9:31:04 OLOMOUC MODERN LANGUAGE MONOGRAPHS Vol. 2 Nominal Structures: All in Complex DPs Edited by Ludmila Veselovská and Markéta Janebová Palacký University Olomouc 2014 monografie.indb 3 7.5.2014 9:31:04 Reviewers: Petr Karlík (Masaryk University, Brno, Czech Republic) Jamal Ouhalla (University College Dublin, Ireland) Jeffrey Parrott (Palacký University, Olomouc, Czech Republic) FIRST EDITION Arrangement copyright © Ludmila Veselovská, Markéta Janebová Preface copyright © Ludmila Veselovská Papers copyright © Bożena Cetnarowska, Manuela Gonzaga, Andrea Hudousková, Pavel Machač, Katarzyna Miechowicz-Mathiasen, Petra Mišmaš, Elena Rudnitskaya, Branimir Stanković, Ludmila Veselovská, Marcin Wągiel, Magdalena Zíková Copyright © Palacký University, Olomouc, 2014 ISBN 978-80-244-4088-0 (print) ISBN 978-80-244-4089-7 (electronic version; available at http://olinco.upol.cz/assets/olinco-2013-monograph.pdf) Registration number: