Grapheme-To-Phoneme Transcription in Hungarian

Attila Novák¹, Borbála Siklósi²

¹ MTA-PPKE Hungarian Language Technology Research Group
² Pázmány Péter Catholic University, Faculty of Information Technology and Bionics, 50/a Práter street, 1083 Budapest, Hungary

Abstract. A crucial component of text-to-speech systems is the one responsible for transcribing written text into its phonemic representation. Although the complexity of the relation between the written and spoken forms varies across languages, most languages have both regular and irregular phonological rules. In this paper, we present a system for the phonemic transcription of Hungarian. Besides implementing transcription rules, the tool incorporates the knowledge of a Hungarian morphological analyzer in order to detect morpheme and compound boundaries. We show that the system performs well even on texts containing a high number of foreign names, which could not be achieved by a lexicon-based method.

1 Introduction

In this study, our goal was to create a method that automatically transforms written Hungarian into its phonetic representation. The system was used to transcribe a database of Hungarian geographic terms.

Even though units of a written alphabet may correspond to phonetic units of the spoken language, the complexity of this mapping varies among languages. Even among languages using the Latin alphabet, there are significant, language-specific differences. Thus, a transcription system must be language-specific, and the applicability of a given method depends on both the morphosyntactic and the phonological characteristics of the language.

In English, orthographic standards were fixed quite early, while the pronunciation system has continued to evolve [10].
Thus, it is often quite difficult to predict the correspondence between written and spoken forms. However, since the number of English wordforms is limited, a lexicon containing both the written and the transcribed wordforms, whether created manually or generated automatically, can cover almost the whole vocabulary of the language. The main problem in English (in addition to occasional out-of-vocabulary items, such as names) is massive homography: words belonging to different parts of speech often have different pronunciations.

In the case of some other languages, such as Hungarian, the relation between written and spoken forms is much closer; the orthography is basically phonemic. In most cases, pronunciation is predictable from the orthographic form. Still, there are many exceptional phenomena, as well as restrictions that arise from articulatory constraints. Moreover, agglutination yields a huge number of wordforms, making it impossible to include the full vocabulary in a lexicon [8]. Thus, an automated method is necessary, and it is also favoured by technological considerations: it exploits processing capabilities instead of storing large amounts of offline data in the form of lexicons.

The structure of this paper is as follows: Section 2 briefly overviews related approaches. Then, our transcription method is described, including a detailed discussion of the language-specific difficulties of Hungarian phonology. Finally, we present an evaluation of our tool, together with an analysis of the most significant errors revealed during the experiments.

2 Related Work

There are three main branches of grapheme-to-phoneme transcription methods [4]:
– dictionary look-up,
– rule-based approaches,
– data-driven approaches.

Dictionary look-up is used when the mapping between the orthographic and phonological representations is based on conventions, and rules or generalizations are not applicable. The advantage of such methods is that other information (e.g.
lexical stress, part of speech) can also be stored in the dictionary. However, creating such dictionaries by hand is very expensive and tedious. Moreover, no matter how limited the agglutinating behaviour of a language is, there will always be new words or wordforms that are not covered by a predefined lexicon.

Rule-based approaches overcome this limitation by applying a set of predefined grapheme-to-phoneme transcription rules. These rules are language-specific and have to be defined manually by linguists; they can then be formulated, for example, in the framework of finite-state automata [7]. Such rule-based methods also require an exception lexicon for irregular wordforms.

Machine learning methods are also applied to grapheme-to-phoneme transcription. In [5], it has been shown that the generalization capability of such methods is better than that of rule-based approaches (at least for English). One of the most successful implementations is based on the idea of Pronunciation by Analogy (PbA) [6]. This approach is grounded in psycholinguistic models: the pronunciation of a word is predicted by finding similarities to words whose phonological representation is known. Joint-sequence models [4] aim at finding the most likely pronunciation for an orthographic form by using Bayes' decision rule. All data-driven approaches need a dictionary or a transcribed corpus for training the system or building statistical models.

For Hungarian, there is an online dictionary containing 1.5 million wordforms and their phonetic transcriptions [2]. The construction of this dictionary involved several main steps. First, wordforms were collected from a large written corpus, and the resulting word list was cleaned (i.e. foreign and misspelled words were removed). Then, transformation rules were applied. Finally, exceptions were defined and corrected manually.
The authors state that their dictionary can be considered a reference dictionary, providing the largest coverage of Hungarian wordforms and their IPA transcriptions. However, the dictionary includes only wordforms that appear in the original corpus, so it can neither transcribe other inflected forms (which are unavoidable in Hungarian) nor cover new words entering the language.

3 Method

In the case of phonetic languages, such as Finnish, Estonian or Hungarian, the transcription of a written wordform is almost always straightforward. For example, the word ablak ('window') is pronounced [ɔblɔk]. (Table 1 shows the standard pronunciation of each letter in the International Phonetic Alphabet, which is used in this research to represent phonetic transcriptions.) However, two types of phenomena make the transcription non-trivial: changes in pronunciation caused by the interaction of neighbouring sounds, and words with traditional or foreign spelling. Another problem is the normalization of semiotic systems.

letter  IPA   letter  IPA   letter  IPA   letter  IPA
á       a:    b       b     n       n     zs      ʒ
a       ɔ     p       p     ny      ɲ     s       ʃ
o       o     d       d     j       j     cs      t͡ʃ
u       u     t       t     h       h     l       l
ü       y     g       g     v       v     r       r
i       i     k       k     f       f     dz      d͡z
é       e:    gy      ɟ     z       z     dzs     d͡ʒ
ö       ø     ty      c     sz      s
e       ɛ     m       m     c       t͡s

Table 1: The phonemes of Hungarian

Our method is based on three components: a morphological analyzer, a lexicon of irregular stems, and an implementation of phonological rules formalized in XFST (Xerox Finite-State Tool) [3].

3.1 Morphological analysis

First, the morphological structure of each word is identified. This is necessary in order to find the morpheme boundaries to which certain morphophonological rules refer. Lexical palatalization, for example, applies only to some specific inflectional suffixes. In addition, certain phonemes are represented by digraphs (cs, gy, ty, ny, sz, zs, dz), by the trigraph dzs, and by their long forms.
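The regular part of the mapping in Table 1 can be sketched as a greedy longest-match lookup, so that dzs is preferred to dz and d, and a long consonant written by doubling the first letter of a digraph (ssz, ggy, nny) or a single letter (tt) is read as one long phoneme. This is a minimal Python illustration under stated assumptions, not the authors' XFST rule set: the names TABLE_1, longest_grapheme and transcribe_naive are hypothetical, affricates are written without tie bars, and the long vowels í, ó, ő, ú, ű and the digraph ly, which Table 1 does not list, are added as assumptions.

```python
import unicodedata
from typing import Optional

# Letter-to-IPA values from Table 1 (affricates written without tie bars).
# Assumption: long vowels and "ly" are not in Table 1 but are added here
# so that ordinary words can be processed.
TABLE_1 = {
    "a": "ɔ", "á": "a:", "e": "ɛ", "é": "e:", "i": "i", "í": "i:",
    "o": "o", "ó": "o:", "ö": "ø", "ő": "ø:", "u": "u", "ú": "u:",
    "ü": "y", "ű": "y:",
    "b": "b", "c": "ts", "cs": "tʃ", "d": "d", "dz": "dz", "dzs": "dʒ",
    "f": "f", "g": "g", "gy": "ɟ", "h": "h", "j": "j", "k": "k",
    "l": "l", "ly": "j", "m": "m", "n": "n", "ny": "ɲ", "p": "p",
    "r": "r", "s": "ʃ", "sz": "s", "t": "t", "ty": "c", "v": "v",
    "z": "z", "zs": "ʒ",
}
VOWEL_LETTERS = set("aáeéiíoóöőuúüű")
MAX_LEN = max(len(g) for g in TABLE_1)  # 3, for "dzs"


def longest_grapheme(word: str, i: int) -> Optional[str]:
    """Return the longest Table 1 grapheme starting at position i, or None."""
    for length in range(MAX_LEN, 0, -1):
        candidate = word[i:i + length]
        if len(candidate) == length and candidate in TABLE_1:
            return candidate
    return None


def transcribe_naive(word: str) -> str:
    """Greedy longest-match transcription that ignores morpheme boundaries."""
    word = unicodedata.normalize("NFC", word.lower())
    phones, i = [], 0
    while i < len(word):
        letter = word[i]
        # Long consonants double only the first letter: tt, ssz, ggy, ccs, ...
        if letter not in VOWEL_LETTERS and i + 1 < len(word):
            nxt = longest_grapheme(word, i + 1)
            if nxt is not None and nxt[0] == letter:
                phones.append(TABLE_1[nxt] + ":")
                i += 1 + len(nxt)
                continue
        g = longest_grapheme(word, i)
        if g is None:
            raise ValueError(f"no Table 1 grapheme at {word[i:]!r}")
        phones.append(TABLE_1[g])
        i += len(g)
    return "".join(phones)
```

For example, transcribe_naive("ablak") returns "ɔblɔk" and transcribe_naive("asszony") returns "ɔs:oɲ". Because the function knows nothing about morphology, it also reads the z + s of eszközsáv as the digraph zs and returns "ɛskøʒa:v".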
However, if a morpheme boundary intervenes, the individual consonants of these digraphs are pronounced as consonant clusters (other rules might affect their behaviour, resulting in partial or full assimilation). For example, the correct transcription of the word eszközsáv 'toolbar' is [ɛskøsʃa:v] rather than [ɛskøʒa:v].

Compounds, which are also quite frequent in Hungarian, may contain components with an irregular pronunciation. These should also be recognized by the morphological analyzer so that they are not transcribed by the regular phonological rules. In the system, we used the Humor morphological analyzer [9, 11].

3.2 Lexicon of irregular stems

In all natural languages, there are wordforms with irregular pronunciation. These are usually proper names and foreign words. Words of the latter category may adapt to the borrowing language to some extent. For example, the English word file may be written in Hungarian either in its original form, file, or in a form adapted to its pronunciation, fájl. In both cases, the phonetic form is [fa:jl]. However, the name New York is used only in its original form in written text and is pronounced [ɲu:jork].

In Hungarian, not only foreign words but also some words with traditional spelling fall into this category. Such irregularities occur in quite a few family names, geographical names, etc. In addition, there are cases where the standard pronunciation deviates from what the orthography suggests in terms of vowel and/or consonant length. For example, the word egyesület 'association' is pronounced [ɛɟ:ɛʃylɛt] rather than [ɛɟɛʃylɛt], as the orthographic form would suggest.

Another group of words included in the lexicon consists of elements of semiotic systems. These use the same set of characters and symbols as the writing system of the language, but assign meaning to units of text in a different manner. In order to produce the phonological transcription, these units must be normalized in a preprocessing step. Examples include numbers, abbreviations, acronyms, units of measurement, dates, mathematical expressions, e-mail addresses, etc. Although each of these involves a number of subproblems, going into detail is beyond the scope of this paper; see [13] instead.
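The interplay of Sections 3.1 and 3.2 can be illustrated with a small sketch, assuming the morphological analyzer's output is already available as a list of morpheme strings (we do not reproduce the Humor analyzer's interface; the segmentations below are supplied by hand). Each morpheme is first looked up in a hypothetical irregular-stem lexicon and otherwise mapped greedily with Table 1, so digraphs never span a morpheme boundary; a simplified regressive voicing-assimilation rule is then applied across the whole word. The lexicon entry, the rule and all function names are assumptions for illustration, not the paper's XFST rules, and long consonants and other processes are ignored.

```python
import unicodedata

# Table 1 values (affricates without tie bars); long vowels and "ly" are
# not in Table 1 and are added as assumptions.
TABLE_1 = {
    "a": "ɔ", "á": "a:", "e": "ɛ", "é": "e:", "i": "i", "í": "i:",
    "o": "o", "ó": "o:", "ö": "ø", "ő": "ø:", "u": "u", "ú": "u:",
    "ü": "y", "ű": "y:",
    "b": "b", "c": "ts", "cs": "tʃ", "d": "d", "dz": "dz", "dzs": "dʒ",
    "f": "f", "g": "g", "gy": "ɟ", "h": "h", "j": "j", "k": "k",
    "l": "l", "ly": "j", "m": "m", "n": "n", "ny": "ɲ", "p": "p",
    "r": "r", "s": "ʃ", "sz": "s", "t": "t", "ty": "c", "v": "v",
    "z": "z", "zs": "ʒ",
}

# Hypothetical irregular-stem lexicon (Section 3.2), stored as phoneme lists.
IRREGULAR = {"egyesület": ["ɛ", "ɟ:", "ɛ", "ʃ", "y", "l", "ɛ", "t"]}

# Simplified voicing pairs (an assumption of this sketch): voiced -> voiceless.
PAIRS = {"b": "p", "d": "t", "g": "k", "z": "s", "ʒ": "ʃ", "ɟ": "c",
         "dz": "ts", "dʒ": "tʃ", "v": "f"}
TO_VOICED = {vl: vd for vd, vl in PAIRS.items()}
VOICED_TRIGGERS = set(PAIRS) - {"v"}  # /v/ undergoes but does not trigger
VOICELESS_TRIGGERS = set(PAIRS.values())


def tokenize(morpheme: str) -> list[str]:
    """Greedy longest-match Table 1 lookup within a single morpheme."""
    out, i = [], 0
    while i < len(morpheme):
        for length in (3, 2, 1):
            g = morpheme[i:i + length]
            if len(g) == length and g in TABLE_1:
                out.append(TABLE_1[g])
                i += length
                break
        else:
            raise ValueError(f"unknown grapheme at {morpheme[i:]!r}")
    return out


def assimilate_voicing(phones: list[str]) -> list[str]:
    """Regressive voicing assimilation, applied right to left."""
    phones = list(phones)
    for i in range(len(phones) - 2, -1, -1):
        base = phones[i].rstrip(":")
        mark = phones[i][len(base):]  # keep a length mark if present
        nxt = phones[i + 1].rstrip(":")
        if nxt in VOICELESS_TRIGGERS and base in PAIRS:
            phones[i] = PAIRS[base] + mark
        elif nxt in VOICED_TRIGGERS and base in TO_VOICED:
            phones[i] = TO_VOICED[base] + mark
    return phones


def transcribe(morphemes: list[str]) -> str:
    """Transcribe a word given as morphemes (stand-in for analyzer output)."""
    phones: list[str] = []
    for m in morphemes:
        m = unicodedata.normalize("NFC", m.lower())
        phones += IRREGULAR[m] if m in IRREGULAR else tokenize(m)
    return "".join(assimilate_voicing(phones))
```

With the boundary supplied, transcribe(["eszköz", "sáv"]) returns "ɛskøsʃa:v", matching Section 3.1, whereas transcribe(["eszközsáv"]) falls back to "ɛskøʒa:v". For transcribe(["egyesület", "ben"]), the lexicon entry supplies the long [ɟ:] and the suffix-initial b voices the stem-final t, giving "ɛɟ:ɛʃylɛdbɛn". Doing the lexicon lookup per morpheme mirrors the requirement that irregular components of compounds be recognized before the regular rules apply.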