Universal Morphology for Old Hungarian


Eszter Simon
Research Institute for Linguistics, Hungarian Academy of Sciences
Benczúr u. 33., H-1068 Budapest, Hungary
[email protected]

Veronika Vincze
MTA-SZTE Research Group for Artificial Intelligence
Tisza Lajos krt. 103., H-6720 Szeged, Hungary
[email protected]

Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 118–127, Berlin, Germany, August 11, 2016. © 2016 Association for Computational Linguistics

Abstract

This paper provides a description of the automatic conversion of the morphologically annotated part of the Old Hungarian Corpus. These texts are in the format of the Humor analyzer, which does not follow any international standards. Since standardization always facilitates future research, even for researchers who do not know the Old Hungarian language, we opted for mapping the Humor formalism to a widely used universal tagset, namely the Universal Dependencies framework. Using a shared tagset across languages enables interlingual comparisons from a theoretical point of view, and multilingual NLP applications can also profit from a unified annotation scheme. In this paper, we report the adaptation of the Universal Dependencies morphological annotation scheme to Old Hungarian, and we discuss the most important theoretical linguistic issues that had to be resolved during the process. We focus on the linguistic phenomena typical of Old Hungarian that required special treatment and we offer solutions to them.

1 Introduction

There is a growing interest, not only in the natural language processing (NLP) community but even among theoretical and historical linguists, in building and using databases of historical texts. High-quality historical corpora enriched with some kinds of linguistic information and metadata can provide a fertile ground for theoretical investigations. Several databases of historical texts have recently been created for various Indo-European languages, such as the Penn-Helsinki Parsed Corpus of Middle English (Kroch and Taylor, 2000), the Tycho Brahe Parsed Corpus of Historical Portuguese (Galves and Britto, 2002), or the Welsh Prose corpus (Thomas et al., 2007), and for non-Indo-European languages as well, such as the Old Hungarian Corpus (Simon, 2014).

Historical corpora represent a rich source of data, but only if the relevant information is specified in a computationally interpretable and retrievable way. Moreover, following the current standardization efforts allows for cross-lingual comparative studies, as well as for longitudinal investigations on language change. With the recent increase in the number of annotated corpora, it seems advisable to move towards a harmonized common framework and methodology. Standardization always facilitates future research, in this case even for researchers who do not know the Old Hungarian language.

Natural language processing activities in Hungary were not synchronized in the past, hence similar resources were developed in parallel at different locations. As a consequence, there are two morphological analyzers for Hungarian: Hunmorph (Trón et al., 2005) and Humor (Novák, 2003). The former has not been maintained recently, while the latter is not freely available. Moreover, they use different formalisms, which share only one common property: they do not follow any international standards. For the morphological annotation of Old Hungarian texts, the Humor analyzer was used, thus all of the morphologically annotated texts are in a special format, which is hard for a non-Hungarian researcher to interpret. That is the reason behind the need to map the Humor formalism to a widely used universal tagset, for which we chose the Universal Dependencies (UD) framework.

The UD tagset and annotation scheme have just been adapted to Modern Hungarian (Vincze et al., 2016). In this paper, we report the adaptation of the morphological annotation scheme to Old Hungarian, and we discuss the most important theoretical linguistic issues that had to be resolved during the process. Section 2 briefly presents the international project Universal Dependencies and Morphology, then we summarize the part-of-speech (POS) tags and morphological features that are relevant for Old Hungarian. Section 3 gives a brief introduction to the Old Hungarian language and describes the morphologically annotated part of the Old Hungarian Corpus which has been converted into the UD tagset. Section 4 reports on our experiences in the conversion and discusses the specific linguistic issues concerning parts-of-speech and features. In Section 5, we contrast the annotation schemes developed for Old and Modern Hungarian. Conclusions and the planned future work end the paper in Section 6.

2 Universal Dependencies and Morphology

Universal Dependencies is an international project that aims at developing a unified annotation scheme for dependency syntax and morphology in a language-independent framework (Nivre, 2015). Currently (as of June 2016), there are annotated datasets available for 45 languages, including modern languages such as English, German, French, Hungarian and Irish, and old languages such as Ancient Greek, Coptic, Latin and Old Church Slavic, among others.¹ Datasets from all these languages apply the same tagsets at the morphological and syntactic levels and are annotated on the basis of the same linguistic principles, to the widest extent possible; in some cases, however, language-specific decisions had to be made. Using a shared tagset across languages enables interlingual comparisons from a theoretical point of view, and multilingual NLP applications can also profit from a unified annotation scheme.

¹ http://universaldependencies.org

Standardized tagsets for both morphological and syntactic annotation have been constantly improved in the international NLP community. As for dependency syntax, Stanford Dependencies is one of the most widely used tagsets (de Marneffe and Manning, 2008). For morphology, the MSD coding system was developed for a number of Eastern European languages including Hungarian (Erjavec, 2012). Interset functions as an interlingua for different morphological tagsets and it enables the conversion of different tagsets to the same morphological representation (Zeman, 2008). Rambow et al. (2006) defined a multilingual tagset for POS tagging and parsing, while McDonald and Nivre (2007) identified eight POS tags based on data from the CoNLL-2007 Shared Task (Nivre et al., 2007). Petrov et al. (2012) offered a tagset of 12 POS tags and applied this tagset to 22 languages.

Now, Universal Dependencies is the latest standardized tagset that we are aware of. In its current form, morphological information is encoded in the form of POS tags and feature–value pairs. There is a fixed set of universal POS tags without the possibility of introducing new members, but features and values can have language-specific additions if needed. Features are divided into two categories: lexical features and inflectional features. Lexical features are characteristics of lemmas rather than word forms, whereas inflectional features are characteristics of word forms. Both lexical and inflectional features can have layered features: some features are marked more than once on the same word, e.g. a Hungarian noun may denote its possessor's number as well as its own number. In this case, the Number feature has an added layer, Number[psor].

As mentioned above, Universal Morphology annotates words with POS information and morphological features. Tables 1 and 2 summarize the POS tags and morphological features that are relevant for Old Hungarian, based on the annotation scheme created for Modern Hungarian, described at the UD website and in Vincze et al. (2016).

POS     description
ADJ     adjective
ADP     adposition
ADV     adverb
AUX     auxiliary
CONJ    coordinating conjunction
DET     determiner
INTJ    interjection
NOUN    noun
NUM     number
PART    particle
PRON    nominal pronoun
PROPN   proper noun
PUNCT   punctuation
SCONJ   subordinating conjunction
VERB    verb
X       other

Table 1: POS tags for Old Hungarian.

3 Old Hungarian

The Old Hungarian era lasted from 896 to 1526, the year of the occupation of the major part of the Hungarian Kingdom by the Ottoman Empire. The first part of this period (896–1350), documented by linguistic fragments and short coherent texts, is called the Early Old Hungarian period. The Late Old Hungarian period (1350–1526) is the period of codices.

The Old Hungarian Corpus (Simon, 2014) contains all codices from the Late Old Hungarian period and several minor texts from the Early Old Hungarian period in their original orthographic form. Because of the heterogeneity of the Old Hungarian orthographic system, the original tokens had to be transcribed into their modernized form during a normalization step (for more details, see Oravecz et al.

4 Language-specific extensions

Since the Old Hungarian period spans more than 600 years, several linguistic phenomena were in permanent change during this period. That is one of the reasons behind the heterogeneity of Old Hungarian texts. For instance, the process by which postpositions became verbal particles or adverbs dates back to the Proto-Hungarian period and continues even in the Modern Hungarian era, so deciding on their POS tag is far from trivial (discussed in more detail in Section 4.2). Such issues posed several problems during the conversion process, which are detailed in this section.

In examples throughout the section, the relevant parts are emboldened. For morphological description, we follow the standard Leipzig Glossing Rules. The source of the example is provided in brackets after the translation. If the example is part of the Bible, the translation is copied from the King James Bible, and its biblical locus (book, chapter, verse) is also provided.

First, we discuss general issues of the conversion, then we illustrate specific cases that are rel-