Quantitative Investigations in Hungarian Phonotactics and Syllable Structure

QUANTITATIVE INVESTIGATIONS IN HUNGARIAN PHONOTACTICS AND SYLLABLE STRUCTURE Stephen M. Grimes Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in the Department of Linguistics, Indiana University September 2010 Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Doctoral Committee (Stuart Davis, Ph.D.) (Kenneth de Jong, Ph.D.) (Sandra Kübler, Ph.D.) (Markus Dickinson, Ph.D.) January 30, 2009 © 2009 Stephen M. Grimes ALL RIGHTS RESERVED Stephen Grimes QUANTITATIVE INVESTIGATIONS IN HUNGARIAN PHONOTACTICS AND SYLLABLE STRUCTURE This dissertation investigates statistical properties of segment collocation and syllable geometry of the Hungarian language. A corpus and dictionary based approach to studying language phonologies is outlined. In order to conduct research on Hungarian, a phonological lexicon was created by compiling existing dictionaries and corpora and using a system of regular expression rewrite rules (based on letter-to-sound rules) in order to derive pronunciations for each word in the lexicon. The resulting pronunciation dictionary contains not only pronunciations in several transcription systems but also syllable counts and syllable boundaries, corpus frequencies, and vowel and consonant projections for each entry. The highlight of the dissertation is an investigation into whether the rhyme or body can be posited as an intermediate node in the structure of the Hungarian syllable. While both consonant-vowel and vowel-consonant sequences in Hungarian exhibit both attracting and repelling connections, on average the language behaves neither as a rhyme- type language (such as English) or a body-type language (such as Korean). It is suggested that a more-nuanced description of the Hungarian syllable is required, and an alternative representation is proposed. The dissertation makes several contributions to research in Hungarian phonology. Results include insights into the statistics of phone-based n-grams for Hungarian, previously unknown phonotactic restrictions, and data on syllable structure with consequences for modeling the structure of the Hungarian syllable. In the cross-linguistic context, the dissertation inspires related quantitative phonotactic research on disparate languages while simultaneously suggesting several caveats that should be taken into account when performing similar studies. In particular, the difficulties of quantitative studies of the syllable structure of multisyllabic words are addressed. _____________________________________________ _____________________________________________ _____________________________________________ _____________________________________________ Table of Contents 1 APPROACHES TO PHONOTACTICS............................................................................................1 1.1 PHONOTACTIC STUDIES ACROSS LINGUISTIC SUB-DISCIPLINES......................................................3 1.2 DOMAINS OF PHONOTACTICS ........................................................................................................7 1.2.1 Locality in phonotactics ..........................................................................................................9 1.2.2 The role of morphology in phonotactics................................................................................10 1.2.3 The syllable and phonotactics ...............................................................................................10 1.3 STATISTICAL AND GRADIENT PHONOTACTICS .............................................................................13 1.3.1 Gradient phonotactic grammaticality....................................................................................14 1.3.2 Defining phonotactic probability...........................................................................................17 1.4 SUMMARY...................................................................................................................................23 2 HUNGARIAN PHONOTACTICS AND LEXICAL STATISTICS..............................................25 2.1 HUNGARIAN SEGMENT INVENTORY.............................................................................................26 2.1.1 Segment length ......................................................................................................................26 2.1.2 Consonants ............................................................................................................................28 2.1.3 Vowels ...................................................................................................................................29 2.2 PHONOTACTIC CONSTRAINTS......................................................................................................31 2.2.1 Vowel phonotactics................................................................................................................31 2.2.2 Consonant phonotactics ........................................................................................................34 2.2.3 Vowel-consonant interactions ...............................................................................................35 2.3 PHONOTACTIC DOMAINS IN HUNGARIAN ....................................................................................37 2.3.1 Phonotactics of lexical subcategories ...................................................................................39 2.4 SYLLABLES AND SYLLABIFICATION IN HUNGARIAN....................................................................39 2.4.1 Syllabification algorithm .......................................................................................................40 2.4.2 Exceptional syllabifications...................................................................................................42 2.5 SEGMENT FREQUENCY IN HUNGARIAN........................................................................................43 2.5.1 Uniphone segment frequency.................................................................................................44 2.5.2 Biphone segment frequency...................................................................................................48 2.5.3 Triphone segment frequency..................................................................................................49 2.6 SUMMARY...................................................................................................................................50 3 THE CREATION OF A PRONUNCIATION DICTIONARY......................................................51 3.1 INTRODUCTION TO THE TASK ......................................................................................................52 3.1.1 Pronunciation dictionaries in other languages .....................................................................54 3.1.2 Limitations of this pronunciation dictionary of Hungarian...................................................57 3.2 DESIGN REQUIREMENTS FOR THE HUNGARIAN PRONUNCIATION DICTIONARY ............................58 3.2.1 Hungarian corpora and word lists ........................................................................................58 3.2.2 Contents of the Hungarian phonological lexicon..................................................................61 3.2.3 Dialects and Idiolects............................................................................................................62 3.2.4 Phonemic versus phonetic approaches..................................................................................63 3.2.5 Character representations of Hungarian sounds ..................................................................64 3.2.6 Encoding of long segments....................................................................................................69 3.3 CONVERTING ORTHOGRAPHY TO PRONUNCIATION......................................................................71 3.3.1 Many-to-one letter-to-sound relationships............................................................................72 3.3.2 Divergent spelling conventions in the development of an orthography.................................73 3.3.3 Digraphs and Trigraphs........................................................................................................75 3.4 PHONOLOGY AND MORPHOPHONOLOGY UNMARKED IN THE ORTHOGRAPHY...............................78 3.4.1 Assimilation of nasals to place of articulation ......................................................................78 3.4.2 Voicing assimilation ..............................................................................................................80 3.4.3 Coronal palatalization...........................................................................................................81 3.4.4 Alveolar plosive affrication ...................................................................................................81 3.4.5 Hiatus resolution ...................................................................................................................82 3.4.6 Phonotactics and syllable structure constraints....................................................................83 3.4.7 High vowel lengthening in the primary syllable....................................................................84 3.4.8 Consonant Shortening ...........................................................................................................85

Quantitative Investigations in Hungarian Phonotactics and Syllable Structure

Investigation of English Language Contact-Induced Features in Hungarian Cardiology Discharge Reports and Language Attitudes of Physicians and Patients

Using 'North Wind and the Sun' Texts to Sample Phoneme Inventories

Iso/Iec Jtc1/Sc2/Wg2 N4120 2011-07-05

Grapheme-To-Phoneme Transcription in Hungarian

Russian Voicing Assimilation, Final Devoicing, and the Problem of [V] (Or, the Mouse That Squeaked)*

[Paper on the Alignment of Low Postnuclear F0 Valleys in Dutch

The Syntax of Phonology a Radically Substance-Free Approach Sylvia Blaho

The Processing of Reduced Word Pronunciation Variants by Natives and Foreign Language Learners

“Case Suffixes”, Postpositions and the Phonological Word in Hungarian

Differential Object Marking in Hungarian and the Morphosyntax of Case and Agreement

ISO/IEC JTC1/SC2/WG2 N4367 2012-10-14 Tezevres Igwnavbas Izqktezmen

Orthographies in Early Modern Europe