Foreword to the Special Issue on Uralic Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Arxiv:1908.07448V1
Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing Milan Straka and Jana Strakova´ and Jan Hajicˇ Charles University Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics {strakova,straka,hajic}@ufal.mff.cuni.cz Abstract Shared Task (Zeman et al., 2018). • We report our best results on UD 2.3. The We present an extensive evaluation of three addition of contextualized embeddings im- recently proposed methods for contextualized 25% embeddings on 89 corpora in 54 languages provements range from relative error re- of the Universal Dependencies 2.3 in three duction for English treebanks, through 20% tasks: POS tagging, lemmatization, and de- relative error reduction for high resource lan- pendency parsing. Employing the BERT, guages, to 10% relative error reduction for all Flair and ELMo as pretrained embedding in- UD 2.3 languages which have a training set. puts in a strong baseline of UDPipe 2.0, one of the best-performing systems of the 2 Related Work CoNLL 2018 Shared Task and an overall win- ner of the EPE 2018, we present a one-to- A new type of deep contextualized word repre- one comparison of the three contextualized sentation was introduced by Peters et al. (2018). word embedding methods, as well as a com- The proposed embeddings, called ELMo, were ob- parison with word2vec-like pretrained em- tained from internal states of deep bidirectional beddings and with end-to-end character-level word embeddings. We report state-of-the-art language model, pretrained on a large text corpus. results in all three tasks as compared to results Akbik et al. -
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format Alina Wróblewska Institute of Computer Science Polish Academy of Sciences ul. Jana Kazimierza 5 01-248 Warsaw, Poland [email protected] Abstract even for languages with rich morphology and rel- atively free word order, such as Polish. The paper presents the largest Polish Depen- The supervised learning methods require gold- dency Bank in Universal Dependencies for- mat – PDBUD – with 22K trees and 352K standard training data, whose creation is a time- tokens. PDBUD builds on its previous ver- consuming and expensive process. Nevertheless, sion, i.e. the Polish UD treebank (PL-SZ), dependency treebanks have been created for many and contains all 8K PL-SZ trees. The PL- languages, in particular within the Universal De- SZ trees are checked and possibly corrected pendencies initiative (UD, Nivre et al., 2016). in the current edition of PDBUD. Further The UD leaders aim at developing a cross- 14K trees are automatically converted from linguistically consistent tree annotation schema a new version of Polish Dependency Bank. and at building a large multilingual collection of The PDBUD trees are expanded with the en- hanced edges encoding the shared dependents dependency treebanks annotated according to this and the shared governors of the coordinated schema. conjuncts and with the semantic roles of some Polish is also represented in the Universal dependents. The conducted evaluation exper- Dependencies collection. There are two Polish iments show that PDBUD is large enough treebanks in UD: the Polish UD treebank (PL- for training a high-quality graph-based depen- SZ) converted from Składnica zalezno˙ sciowa´ 1 and dency parser for Polish. -
Universal Dependencies According to BERT: Both More Specific and More General
Universal Dependencies According to BERT: Both More Specific and More General Tomasz Limisiewicz and David Marecekˇ and Rudolf Rosa Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics Charles University, Prague, Czech Republic {limisiewicz,rosa,marecek}@ufal.mff.cuni.cz Abstract former based systems particular heads tend to cap- ture specific dependency relation types (e.g. in one This work focuses on analyzing the form and extent of syntactic abstraction captured by head the attention at the predicate is usually focused BERT by extracting labeled dependency trees on the nominal subject). from self-attentions. We extend understanding of syntax in BERT by Previous work showed that individual BERT examining the ways in which it systematically di- heads tend to encode particular dependency verges from standard annotation (UD). We attempt relation types. We extend these findings by to bridge the gap between them in three ways: explicitly comparing BERT relations to Uni- • We modify the UD annotation of three lin- versal Dependencies (UD) annotations, show- ing that they often do not match one-to-one. guistic phenomena to better match the BERT We suggest a method for relation identification syntax (x3) and syntactic tree construction. Our approach • We introduce a head ensemble method, com- produces significantly more consistent depen- dency trees than previous work, showing that bining multiple heads which capture the same it better explains the syntactic abstractions in dependency relation label (x4) BERT. • We observe and analyze multipurpose heads, At the same time, it can be successfully ap- containing multiple syntactic functions (x7) plied with only a minimal amount of supervi- sion and generalizes well across languages. -
Hungarian Prehistory Series
Hungarian Prehistory Series The Hungarians moved to their later homeland, the Carpathian basin at the end of the ninth century. Prior to this period they lived in the western part of the southern Russian steppe as vassals of the Khazar Kaghanate. The ethnic envi- ronment of the Kaghanate had a great impact on the ethnogenesis of the Hun- garians as testified by the numerous Turkic and Iranian loan words as well as the art, the military and the political structure of the Hungarians in the period of the conquest. Therefore, from the point of view of Hungarian prehistory, it is crucial to be familiar with the history of the nomadic peoples, that is, with the "oriental background." The Hungarian Prehistory Series, launched in 1990, aimed to pub- lish source editions, collected papers and monographs in connection with the history of the Eurasian steppe. It includes historical, linguistical and archaeologi- cal studies. The Department of Medieval World History (University of Szeged) has played an active role in the publication of the series since 1994. The published volumes of the series until 2000 are the following: Vol. 1. Őstörténet és nemzettudat 1919-1931. [Prehistory and the National Con- sciousness.] Ed. Eva Kineses Nagy, Szeged 1991. Vol. 2. Sándor, Klára, A Bolognai Rovásemlék. [The Runic Inscription of Bologna.] Szeged 1991. Vol. 3. Szűcs, Jenő, A magyar nemzeti tudat kialakulása. [The Formation of Hungar- ian National Consciousness.] Ed. István Zimonyi, Szeged 1992. Vol. 4. Rovásírás a Kárpát-medencében. [Runic Scripts in the Carpathian Basin.] Ed. Klára Sándor, Szeged 1992. Vol. 5. Szádeczky-Kardoss, Samu, Az avar történelem forrásai. -
O Du Mein Österreich: Patriotic Music and Multinational Identity in The
O du mein Österreich: Patriotic Music and Multinational Identity in the Austro-Hungarian Empire by Jason Stephen Heilman Department of Music Duke University Date: _______________________ Approved: ______________________________ Bryan R. Gilliam, Supervisor ______________________________ Scott Lindroth ______________________________ James Rolleston ______________________________ Malachi Hacohen Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Music in the Graduate School of Duke University 2009 ABSTRACT O du mein Österreich: Patriotic Music and Multinational Identity in the Austro-Hungarian Empire by Jason Stephen Heilman Department of Music Duke University Date: _______________________ Approved: ______________________________ Bryan R. Gilliam, Supervisor ______________________________ Scott Lindroth ______________________________ James Rolleston ______________________________ Malachi Hacohen An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Music in the Graduate School of Duke University 2009 Copyright by Jason Stephen Heilman 2009 Abstract As a multinational state with a population that spoke eleven different languages, the Austro-Hungarian Empire was considered an anachronism during the age of heightened nationalism leading up to the First World War. This situation has made the search for a single Austro-Hungarian identity so difficult that many historians have declared it impossible. Yet the Dual Monarchy possessed one potentially unifying cultural aspect that has long been critically neglected: the extensive repertoire of marches and patriotic music performed by the military bands of the Imperial and Royal Austro- Hungarian Army. This Militärmusik actively blended idioms representing the various nationalist musics from around the empire in an attempt to reflect and even celebrate its multinational makeup. -
Roots of Modern Hungarian Nationalism: a Case Study and a Research Agenda
UvA-DARE (Digital Academic Repository) The roots of Modern Hungarian Nationalism: A Case Study and a Research Agenda Marácz, L. Publication date 2016 Document Version Final published version Published in The roots of nationalism: national identity formation in early modern Europe, 1600-1815 Link to publication Citation for published version (APA): Marácz, L. (2016). The roots of Modern Hungarian Nationalism: A Case Study and a Research Agenda. In L. Jensen (Ed.), The roots of nationalism: national identity formation in early modern Europe, 1600-1815 (pp. 235-250). (Heritage and Memory Studies). Amsterdam University Press. http://www.oapen.org/search?identifier=606242 General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:25 Sep 2021 The Roots of Nationalism National Identity Formation in Early Modern Europe, 1600‑1815 Edited by Lotte Jensen Amsterdam University Press This research has been made possible with the generous support of the Netherlands Organisation for Scientific Research (NWO). -
Austria-Hungary 1914: Nationalisms in Multi- National Nation-State Anthony M
Comparative Civilizations Review Volume 72 Article 8 Number 72 Spring 2015 4-1-2015 Austria-Hungary 1914: Nationalisms in Multi- National Nation-State Anthony M. Stevens-Arroyo [email protected] Follow this and additional works at: https://scholarsarchive.byu.edu/ccr Recommended Citation Stevens-Arroyo, Anthony M. (2015) "Austria-Hungary 1914: Nationalisms in Multi-National Nation-State," Comparative Civilizations Review: Vol. 72 : No. 72 , Article 8. Available at: https://scholarsarchive.byu.edu/ccr/vol72/iss72/8 This Article is brought to you for free and open access by the All Journals at BYU ScholarsArchive. It has been accepted for inclusion in Comparative Civilizations Review by an authorized editor of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. Stevens-Arroyo: Austria-Hungary 1914: Nationalisms in Multi-National Nation-State Comparative Civilizations Review 99 Austria-Hungary 1914: Nationalisms in a Multi-National Nation-State Anthony M. Stevens-Arroyo [email protected] “Austria is disunity and partition into petty states, darkness, Jesuitism, reaction and the whorish way of doing things of the patriarchal rule of the police.” - Ludwig Bamberger, Radical German émigré, 1859 “We shall have a little parliamentarianism, but power will remain in my hands and the whole thing will be adapted to Austrian realities.” - Emperor Frantz Josef, 1861 “…civilized states by and large have adopted that organization which, in the whole continent, rests on historical foundations only in Hungary.” - Ernő Nagy, Nagyvárad Law School Professor, 1887 Introduction “Austria is disunity and partition into petty states, darkness, Jesuitism, reaction and the whorish way of doing things of the patriarchal rule of the police,” wrote Ludwig Bamberger, an early radical, in 1859. -
Universal Dependencies for Japanese
Universal Dependencies for Japanese Takaaki Tanaka∗, Yusuke Miyaoy, Masayuki Asahara}, Sumire Uematsuy, Hiroshi Kanayama♠, Shinsuke Mori|, Yuji Matsumotoz ∗NTT Communication Science Labolatories, yNational Institute of Informatics, }National Institute for Japanese Language and Linguistics, ♠IBM Research - Tokyo, |Kyoto University, zNara Institute of Science and Technology [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Abstract We present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to the Japanese language in this paper. Since the Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the part-of-speech tagset in UniDic to the universal part-of-speech tagset, and converting a constituent-based treebank to a typed dependency tree. The conversion is not straightforward, and we discuss the problems that arose in the conversion and the current solutions. A treebank consisting of 10,000 sentences was built by converting the existent resources and currently released to the public. Keywords: typed dependencies, Short Unit Word, multiword expression, UniDic 1. Introduction 2. Word unit The definition of a word unit is indispensable in UD an- notation, which is not a trivial question for Japanese, since The Universal Dependencies (UD) project has been de- a sentence is not segmented into words or morphemes by veloping cross-linguistically consistent treebank annota- white space in its orthography. -
Universal Dependencies for Persian
Universal Dependencies for Persian *Mojgan Seraji, **Filip Ginter, *Joakim Nivre *Uppsala University, Department of Linguistics and Philology, Sweden **University of Turku, Department of Information Technology, Finland *firstname.lastname@lingfil.uu.se **figint@utu.fi Abstract The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the converted version of the Uppsala Persian Dependency Treebank (UPDT) to the universal dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens with an average sentence length of 25 words. In addition to the universal dependencies syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics) annotated with complex labels in the UPDT have been separated from the clitics and appear with distinct labels in the Persian UD. The treebank has its original syntactic annota- tion scheme based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD. Keywords: Universal Dependencies, Persian, Treebank 1. Introduction this paper, we present how we adapt the Universal Depen- In the past decade, the development of numerous depen- dencies to Persian by converting the Uppsala Persian De- dency parsers for different languages has frequently been pendency Treebank (UPDT) (Seraji, 2015) to the Persian benefited by the use of syntactically annotated resources, Universal Dependencies (Persian UD). First, we briefly de- or treebanks (Bohmov¨ a´ et al., 2003; Haverinen et al., 2010; scribe the Universal Dependencies and then we present the Kromann, 2003; Foth et al., 2014; Seraji et al., 2015; morphosyntactic annotations used in the extended version Vincze et al., 2010). -
Finnish and Hungarian
The role of linguistics in language teaching: the case of two, less widely taught languages - Finnish and Hungarian Eszter Tarsoly and Riitta-Liisa Valijärvi The School of Slavonic and East European Studies, University College London, London, United Kingdom The School of Slavonic and East European Studies, University College London, Gower Street, London, WC1E 6BT, United Kingdom; [email protected], [email protected] (Received xxx; final version received xxx) This paper discusses the role of various linguistic sub-disciplines in teaching Finnish and Hungarian. We explain the status of Finnish and Hungarian at University College London and in the UK, and present the principle difficulties in learning and teaching these two languages. We also introduce our courses and student profiles. With the support of examples from our own teaching, we argue that a linguistically oriented approach is well suited for less widely used and less taught languages as it enables students to draw comparative and historical parallels, question terminologies and raise their sociolinguistic and pragmatic awareness. A linguistic approach also provides students with skills for further language learning. Keywords: language teaching; less taught languages; LWUTL; Finnish; Hungarian; linguistic terminology; historical linguistics; phonology; typology; cognitive linguistics; contact linguistics; corpus linguistics; sociolinguistics; pragmatics; language and culture. Introduction The purpose of our paper is to explore the role of different sub-disciplines of linguistics in language teaching, in particular, their role in the teaching of less widely used and less taught (LWULT) languages. More specifically, we argue that a linguistic approach to language teaching is well suited for teaching morphologically complex less widely taught languages, such as Hungarian and Finnish, in the UK context. -
Language Technology Meets Documentary Linguistics: What We Have to Tell Each Other
Language technology meets documentary linguistics: What we have to tell each other Language technology meets documentary linguistics: What we have to tell each other Trond Trosterud Giellatekno, Centre for Saami Language Technology http://giellatekno.uit.no/ . February 15, 2018 . Language technology meets documentary linguistics: What we have to tell each other Contents Introduction Language technology for the documentary linguist Language technology for the language society Conclusion . Language technology meets documentary linguistics: What we have to tell each other Introduction Introduction I Giellatekno: started in 2001 (UiT). Research group for language technology on Saami and other northern languages Gramm. modelling, dictionaries, ICALL, corpus analysis, MT, ... I Trond Trosterud, Lene Antonsen, Ciprian Gerstenberger, Chiara Argese I Divvun: Started in 2005 (UiT < Min. of Local Government). Infrastructure, proofing tools, synthetic speech, terminology I Sjur Moshagen, Thomas Omma, Maja Kappfjell, Børre Gaup, Tomi Pieski, Elena Paulsen, Linda Wiechetek . Language technology meets documentary linguistics: What we have to tell each other Introduction The most important languages we work on . Language technology meets documentary linguistics: What we have to tell each other Language technology for the documentary linguist Language technology for documentary linguistics ... Language technology meets documentary linguistics: What we have to tell each other Language technology for the documentary linguist ... what’s in it for the language community I work for? I Let’s pretend there are two types of language communities: 1. Language communities without plans for revitalisation or use in domains other than oral use 2. Language communities with such plans . Language technology meets documentary linguistics: What we have to tell each other Language technology for the documentary linguist Language communities without such plans I Gather empirical material and do your linguistic analysis I (The triplet: Grammar, text collection and dictionary) I .. -
The Universal Dependencies Treebank of Spoken Slovenian
The Universal Dependencies Treebank of Spoken Slovenian Kaja Dobrovoljc1, Joakim Nivre2 1Institute for Applied Slovene Studies Trojina, Ljubljana, Slovenia 1Department of Slovenian Studies, Faculty of Arts, University of Ljubljana 2Department of Linguistics and Philology, Uppsala University [email protected], joakim.nivre@lingfil.uu.se Abstract This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoper- ability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirms significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, less and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality. Keywords: dependency treebank, spontaneous speech, Universal Dependencies 1. Introduction actually out-performs state-of-the-art pipeline approaches (Rasooli and Tetreault, 2013; Honnibal and Johnson, 2014). It is nowadays a well-established fact that data-driven pars- Such heterogeneity of spoken language annotation schemes ing systems used in different speech-processing applica- inevitably leads to a restricted usage of existing spoken tions benefit from learning on annotated spoken data, rather language treebanks in linguistic research and parsing sys- than using models built on written language observation.