Texts to Sample Phoneme Inventories
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
Department of English and American Studies
Masaryk University Faculty of Arts Department of English and American Studies English Language and Literature Petra Jureková The Pronunciation of English in Czech, Slovak and Russian Speakers Bachelor‟s Diploma Thesis Supervisor: PhDr. Kateřina Tomková, Ph.D. 2015 I declare that I have worked on this thesis independently, using only the primary and secondary sources listed in the bibliography. …………………………………………… Author‟s signature 2 I would like to express gratitude to my supervisor, PhDr. Kateřina Tomková, Ph.D., and thank her for her advice, patience, kindness and help. I would also like to thank all the volunteers that participated in the research project. 3 Table of contents List of tables ...................................................................................................................... 5 1. Introduction ............................................................................................................... 7 1.1. English as an important language for international communication .................. 7 1.2. Acquisition of a foreign language ...................................................................... 8 1.2.1. English as a foreign language ..................................................................... 8 1.2.2. Pronunciation .............................................................................................. 8 1.3. About the thesis .................................................................................................. 9 2. English phonetic system ........................................................................................ -
Using 'North Wind and the Sun' Texts to Sample Phoneme Inventories
Blowing in the wind: Using ‘North Wind and the Sun’ texts to sample phoneme inventories Louise Baird ARC Centre of Excellence for the Dynamics of Language, The Australian National University [email protected] Nicholas Evans ARC Centre of Excellence for the Dynamics of Language, The Australian National University [email protected] Simon J. Greenhill ARC Centre of Excellence for the Dynamics of Language, The Australian National University & Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History [email protected] Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these questions in a quantifiable way is tricky, but asking them is necessary. The cumulative col- lection of Illustrative Texts published in the Illustration series in this journal over more than four decades (mostly renditions of the ‘North Wind and the Sun’) gives us an ideal dataset for pursuing these questions. Here we investigate a tractable subset of the above questions, namely: What proportion of a language’s phoneme inventory do these texts enable us to recover, in the minimal sense of having at least one allophone of each phoneme? We find that, even with this low bar, only three languages (Modern Greek, Shipibo and the Treger dialect of Breton) attest all phonemes in these texts. -
University of Groningen Phonological Grammar and Frequency Sloos
University of Groningen Phonological grammar and frequency Sloos, Marjoleine IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2013 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Sloos, M. (2013). Phonological grammar and frequency: an integrated approach. s.n. Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 28-09-2021 Summary Phonological grammar and frequency: an integrated approach Language consists of two internal components: the mental lexicon, all memorized morphemes, and the grammar, a set of language rules. In phonology, these two parts are almost always studied independently, viz. usage-based phonology typically investigates the lexicon and generative phonology usually focuses on the grammar. During the last decades, the call for a combined model has gradually become stronger (Ernestus & Baayen (2011), Pierrehumbert (2002), Smolensky & Legendre (2006), van de Weijer (2012)). -
University of California Santa Cruz Minimal Reduplication
UNIVERSITY OF CALIFORNIA SANTA CRUZ MINIMAL REDUPLICATION A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in LINGUISTICS by Jesse Saba Kirchner June 2010 The Dissertation of Jesse Saba Kirchner is approved: Professor Armin Mester, Chair Professor Jaye Padgett Professor Junko Ito Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright © by Jesse Saba Kirchner 2010 Some rights reserved: see Appendix E. Contents Abstract vi Dedication viii Acknowledgments ix 1 Introduction 1 1.1 Structureofthethesis ...... ....... ....... ....... ........ 2 1.2 Overviewofthetheory...... ....... ....... ....... .. ....... 2 1.2.1 GoalsofMR ..................................... 3 1.2.2 Assumptionsandpredictions. ....... 7 1.3 MorphologicalReduplication . .......... 10 1.3.1 Fixedsize..................................... ... 11 1.3.2 Phonologicalopacity. ...... 17 1.3.3 Prominentmaterialpreferentiallycopied . ............ 22 1.3.4 Localityofreduplication. ........ 24 1.3.5 Iconicity ..................................... ... 24 1.4 Syntacticreduplication. .......... 26 2 Morphological reduplication 30 2.1 Casestudy:Kwak’wala ...... ....... ....... ....... .. ....... 31 2.2 Data............................................ ... 33 2.2.1 Phonology ..................................... .. 33 2.2.2 Morphophonology ............................... ... 40 2.2.3 -mut’ .......................................... 40 2.3 Analysis........................................ ..... 48 2.3.1 Lengtheningandreduplication. -
Phonetics and Phonology in Russian Unstressed Vowel Reduction: a Study in Hyperarticulation
Phonetics and Phonology in Russian Unstressed Vowel Reduction: A Study in Hyperarticulation Jonathan Barnes Boston University (Short title: Hyperarticulating Russian Unstressed Vowels) Jonathan Barnes Department of Modern Foreign Languages and Literatures Boston University 621 Commonwealth Avenue, Room 119 Boston, MA 02215 Tel: (617) 353-6222 Fax: (617) 353-4641 [email protected] Abstract: Unstressed vowel reduction figures centrally in recent literature on the phonetics-phonology interface, in part owing to the possibility of a causal relationship between a phonetic process, duration-dependent undershoot, and the phonological neutralizations observed in systems of unstressed vocalism. Of particular interest in this light has been Russian, traditionally described as exhibiting two distinct phonological reduction patterns, differing both in degree and distribution. This study uses hyperarticulation to investigate the relationship between phonetic duration and reduction in Russian, concluding that these two reduction patterns differ not in degree, but in the level of representation at which they apply. These results are shown to have important consequences not just for theories of vowel reduction, but for other problems in the phonetics-phonology interface as well, incomplete neutralization in particular. Introduction Unstressed vowel reduction has been a subject of intense interest in recent debate concerning the nature of the phonetics-phonology interface. This is the case at least in part due to the existence of two seemingly analogous processes bearing this name, one typically called phonetic, and the other phonological. Phonological unstressed vowel reduction is a phenomenon whereby a given language's full vowel inventory can be realized only in lexically stressed syllables, while in unstressed syllables some number of neutralizations of contrast take place, with the result that only a subset of the inventory is realized on the surface. -
A Note on the Phonology and Phonetics of CR, RC, and SC Consonant Clusters in Italian
A note on the phonology and phonetics of CR, RC, and SC consonant clusters in Italian Michael J. Kenstowicz 1. Introduction Previous generative research on Italian phonology starting with Vogel (1982) and Chierchia (1986) has proposed that intervocalic consonant clusters are parsed into contrasting tauto- vs. heterosyllabic categories based on several factors: phonotactic restrictions on word-initial consonant sequences, syllable weight as reflected in the distribution of stress and the length of a preceding tonic vowel, the distribution of prenominal allomorphs of various determiners, and the application of syntactic gemination (radoppiamento sintattico). Based on these criteria, clusters of rising sonority (in particular stop plus liquid) fall into the tautosyllabic category while falling sonority clusters composed of a sonorant plus obstruent are heterosyllabic. Clusters composed of /s/ plus a stop display mixed behavior but generally pattern with the heterosyllabic group. In her 2004 UCLA Ph.D. dissertation, Kristie McCrary investigated corpus-external reflexes of these cluster distinctions with a psycholinguistic test of word division and measurements of the phonetic duration of segments (both consonants and vowels). Her results support some aspects of the traditional phonological analysis but call into question others. In this squib we summarize the literature supporting the traditional distinction among these clusters and then review McCrary’s results. An important finding in McCrary’s study was that stops in VCV and VCRV contexts (R = a liquid) were significantly shorter than stops in VRCV contexts. She observed that these contexts align with the distribution of geminates in Italian and proposed that singleton stops are significantly shorter in the VCV and VCRV contexts in order to enhance their paradigmatic contrast with geminates. -
Russian Voicing Assimilation, Final Devoicing, and the Problem of [V] (Or, the Mouse That Squeaked)*
Russian voicing assimilation, final devoicing, and the problem of [v] (or, The mouse that squeaked)* Jaye Padgett - University of California, Santa Cruz "...the Standard Russian V...occupies an obviously intermediate position between the obstruents and the sonorants." Jakobson (1978) 1. Introduction Like the mouse that roared, the Russian consonant [v] has a status in phonology out of proportion to its size. Besides leaving a trail of special descriptive comments, this segment has played a key role in discussions about abstractness in phonology, about the manner in which long-distance spreading occurs, and about the the larger organization of phonology. This is largely because of the odd behavior of [v] with respect to final devoicing and voicing assimilation in Russian. Russian obstruents devoice word-finally, as in kniga 'book (nom. sg.) vs. knik (gen. pl.), and assimilate to the voicing of a following obstruent, gorodok 'town (nom. sg.)' vs. gorotka (gen. sg.). The role of [v] in this scenario is puzzling: like an obstruent, it devoices word-finally, krovi 'blood (gen. sg.)' vs. krofj (nom. sg.), and undergoes voicing assimilation, lavok 'bench (gen. pl.)' vs. lafka (nom. sg.). But like a sonorant, it does not trigger voicing assimilation: compare dverj 'door' and tverj 'Tver' (a town). As we will see, [v] behaves unusually in other ways as well. Why is Russian [v] special in this way? The best-known answer to this question posits that [v] is underlyingly /w/ and therefore behaves as a sonorant with respect to voicing assimilation (Lightner 1965, Daniels 1972, Coats and Harshenin 1971, Hayes 1984, Kiparsky 1985). -
The Syllable Structure of Bangla in Optimality Theory and Its Application to the Analysis of Verbal Inflectional Paradigms in Distributed Morphology
The syllable structure of Bangla in Optimality Theory and its application to the analysis of verbal inflectional paradigms in Distributed Morphology von Somdev Kar Philosophische Dissertation angenommen von der Neuphilologischen Fakultät der Universität Tübingen am 09. Januar 2009 Tübingen 2009 Gedruckt mit Genehmigung der Neuphilologischen Fakultät der Universität Tübingen Hauptberichterstatter : Prof. Hubert Truckenbrodt, Ph.D. Mitberichterstatter : PD Dr. Ingo Hertrich Dekan : Prof. Dr. Joachim Knape ii To my parents... iii iv ACKNOWLEDGEMENTS First and foremost, I owe a great debt of gratitude to Prof. Hubert Truckenbrodt who was extremely kind to agree to be my research adviser and to help me to formulate this work. His invaluable guidance, suggestions, feedbacks and above all his robust optimism steered me to come up with this study. Prof. Probal Dasgupta (ISI) and Prof. Gautam Sengupta (HCU) provided insightful comments that have given me a different perspective to various linguistic issues of Bangla. I thank them for their valuable time and kind help to me. I thank Prof. Sengupta, Dr. Niladri Sekhar Dash and CIIL, Mysore for their help, cooperation and support to access the Bangla corpus I used in this work. In this connection I thank Armin Buch (Tübingen) who worked on the extraction of data from the raw files of the corpus used in this study. And, I wish to thank Ronny Medda, who read a draft of this work with much patience and gave me valuable feedbacks. Many people have helped in different ways. I would like to express my sincere thanks and gratefulness to Prof. Josef Bayer for sending me some important literature, Prof. -
Anti-Romance Laryngeal Patterns in Italian Phonology
Anti-Romance laryngeal patterns in Italian phonology Bálint Huszthy Babes-Bolyai University [email protected] In the literature of laryngeal phonology all Romance languages are depicted as “voice languges”, exhibiting a binary laryngeal distinction between a voiced lenis and a voiceless fortis set of obstruents (Wetzels and Mascaró 2001; Petrova et al. 2006; etc.). Voice languages are characterised by regressive voice assimilation (RVA) due to the phonological activity of [voice] (Petrova et al. 2006; Cyran 2014). Italian manifests a process similar to RVA, called preconsonantal s-voicing; that is, /s/ becomes voiced before voiced consonantal segments; e.g., sparo [sp] ‘gunshot’ vs. sbarra [zb] ‘barrier’, sveglia [zv] ‘alarm clock’, smettere [zm] ‘to stop’, slitta [zl] ‘sled’, etc. (Nespor 1993; Bertinetto 2004; Krämer 2009). Since /sC/ is the only obstruent cluster in Italian phonotactics, Italian seems to fulfil the requirements for being a prototypical voice language. However, this paper argues that s-voicing is not an instance of RVA, at least from a synchronic phonological point of view. Data: This study is built on a loanword test: 15 Italian informants (from different dialectal zones) were recorded in a soundproof studio, who repeated five times 18 Italian sample texts containing 108 target loanwords (e.g., vo/dk/a, foo/tb/all, a/fɡ/ano, iceberg /sb/ etc.). The overall statistics reveal that the informants retain the underlying voice values in the respective obstruent clusters in 65% of the cases; that is, they avoid RVA in a two-thirds majority, which characterises the performance of all the informants rather evenly. Uniformity in voicing also occurs in the data: 20% out of the marked clusters is devoiced (e.g. -
Vocale Incerta, Vocale Aperta*
Vocale Incerta, Vocale Aperta* Michael Kenstowicz Massachusetts Institute of Technology Omaggio a P-M. Bertinetto Ogni toscano si comporta di fronte a una parola a lui nuova, come si nota p. es. nella lettura del latino, scegliendo costantamente, e inconsciamente, il timbro aperto, secondo il principio che il Migliorini ha condensato nella formula «vocale incerta, vocale aperta»…è il processo a cui vien sottoposto ogni vocabolo importato o adattato da altri linguaggi. (Franceschi 1965:1-3) 1. Introduction Standard Italian distinguishes seven vowels in stressed nonfinal syllables. The open ɛ,ɔ vs. closed e,o mid-vowel contrast (transcribed here as open è,ò vs. closed é,ó) is neutralized in unstressed position (1). (1) 3 sg. infinitive tócca toccàre ‘touch’ blòcca bloccàre ‘block’ péla pelàre ‘pluck’ * A preliminary version of this paper was presented at the MIT Phonology Circle and the 40th Linguistic Symposium on Romance Languages, University of Washington (March 2010). Thanks to two anonymous reviewers for helpful comments as well as to Maria Giavazzi, Giovanna Marotta, Joan Mascaró, Andrea Moro, and Mario Saltarelli. 1 gèla gelàre ‘freeze’ The literature uniformly identifies the unstressed vowels as closed. Consequently, the open è and ò have more restricted distribution and hence by traditional criteria would be identified as "marked" (Krämer 2009). In this paper we examine various lines of evidence indicating that the open vowels are optimal in stressed (open) syllables (the rafforzamento of Nespor 1993) and thus that the closed é and ó are "marked" in this position: {è,ò} > {é,ó} (where > means “better than” in the Optimality Theoretic sense). -
Why Dots and Dashes Matter: Writing Bengali in Roman Script Colonial Contexts, of Non-Roman Scripts Generally, Except for Greek
Carmen Brandt Why Dots and Dashes Matter Writing Bengali in Roman Script A friend recently sent me a photo of an Indian restaurant called “Vagina Tandoori” and asked for my ‘official statement’ as a scholar of South Asian studies. After we had had a good laugh – as many other Internet users had before us – I was not shy of providing an answer, as I explained the potential reason behind this awkward name: the restaurant owners could be of Bengali origin and might have transcribed the Bengali term ভািগনা/bhāginā1 “nephew”2, the way they thought it should be written in Roman script. I have come across this transcription several times in Bangladesh. Beyond that, a male colleague of mine regularly gets emails from his “vagina”, i.e. from a friend in Bangla- desh who is much younger to him and refers to himself as his “nephew”. Even though the awkward restaurant name turned out to be a hoax,3 the writing of South Asian lan- guages – especially Bengali – in Roman script is, nonetheless, often a challenge. Hence, this essay reflects on these challenges, discusses in general why South Asian languages are often written in Roman script, and which options might be best to do so. The Roman Script – a Symbol of Colonial Conquest As I have already discussed in detail (Brandt 2020), in South Asia, the Roman script is often perceived as a symbol of foreign domination in past and present. Its 1 All Bengali words are transliterated according to the rules in Table 4 below. 2 That means the son of one’s sister or the son of one’s husband’s sister. -
Kelantan and Sarawak Malay Dialects: Parallel Dialect Text Collection and Alignment Using Hybrid Distance-Statistical-Based Phrase Alignment Algorithm
Turkish Journal of Computer and Mathematics Education Vol.12 No.3 (2021), 2163-2171 Research Article Kelantan and Sarawak Malay Dialects: Parallel Dialect Text Collection and Alignment Using Hybrid Distance-Statistical-Based Phrase Alignment Algorithm Khaw, Jasmina Yen Min1, Tan, Tien-Ping2, Ranaivo-Malancon, Bali3 1Faculty of Information Communication and Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia 2School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia 3Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Malaysia [email protected] Article History: Received: 10 November 2020; Revised: 12 January 2021; Accepted: 27 January 2021; Published online: 05 April 2021 Abstract: Parallel texts corpora are essential resources especially in translation and multilingual information retrieval. However, the publicly available parallel text corpora are limited to certain types and domains. Besides, Malay dialects are not standardized in term of writing. The existing alignment algorithms that is used to analayze the writing will require a large training data to obtain a good result. The paper describes our methodology in acquiring a parallel text corpus of Standard Malay and Malay dialects, particularly Kelantan Malay and Sarawak Malay. Second, we propose a hybrid of distance-based and statistical-based alignment algorithm to align words and phrases of the parallel text. The proposed approach has a better precision and recall than the state-of-the-art GIZA++. In the paper, the alignment