Phonology-Augmented Statistical Transliteration for Low-Resource Languages


INTERSPEECH 2015, September 6-10, 2015, Dresden, Germany
Copyright © 2015 ISCA
DOI: 10.21437/Interspeech.2015-728

Hoang Gia Ngo¹, Nancy F. Chen², Nguyen Binh Minh¹, Bin Ma², Haizhou Li²
¹National University of Singapore, Singapore
²Institute for Infocomm Research, Singapore
[email protected], [email protected], [email protected], {mabin, hli}@i2r.a-star.edu.sg

Abstract

Transliteration converts words in a source language (e.g., English) into phonetically equivalent words in a target language (e.g., Vietnamese). This conversion needs to take into account the phonology of the target language, that is, the rules determining how phonemes can be organized. For example, a transliterated word in Vietnamese that begins with a consonant cluster is phonologically invalid. While statistical transliteration approaches have been widely adopted, most do not explicitly model the target language's phonology, and thus produce invalid outputs. The problem is compounded for low-resource languages where training data is scarce. In this work, we present a phonology-augmented statistical framework suitable for languages with minimal linguistic resources. We propose the concept of pseudo-syllables as structures representing how segments of a foreign word are arranged according to the target language's phonology. We use Vietnamese, a tonal language with monosyllabic structure, as an example. We show that the proposed system outperforms the statistical baseline by up to 70.3% relative when there are limited training examples (94 word pairs). We also investigate the trade-off between training corpus size and transliteration performance of different methods on two distinct corpora.

Index Terms: machine translation, under-resourced languages

1. Introduction

In every language, new words are constantly being invented or borrowed from foreign languages, especially in colloquial speech. For example, 'facebook' is now part of the vocabulary in many languages other than English. These new words present out-of-vocabulary (OOV) challenges to spoken language technologies such as automatic speech recognition [1], keyword search [2], and text-to-speech [3]. Transliteration is a mechanism for modeling OOVs adopted from foreign languages. Transliteration converts words written in one writing system (source language, e.g., English) into phonetically equivalent words in another writing system (target language, e.g., Vietnamese) [4], and is often used to translate foreign names of people, locations, organizations, and products. For example, the English word facebook can be transliterated to Vietnamese phonemes using X-SAMPA notation [5]: f @I _1 . b_< u k _2 ("." is a syllable delimiter).

Letter-to-sound tools employing statistical approaches are common solutions for OOVs [6][7]. Transliteration deals with two different language systems, with the input as letters from a source language and the output as phonemes of a target language. Transliteration outputs therefore need to comply with the phonological rules of the target language. Such phonological rules of OOVs adopted from foreign words are often difficult to model because, compared to OOVs originating from the native language, they are fewer in number. This often results in non-interpretable outputs from statistical transliteration models [8]. The problem is compounded in low-resource languages such as Vietnamese [9]. Given the limited corpus size of such languages, the performance of statistical transliteration approaches is suboptimal. On the other hand, rule-based transliteration approaches have been shown to produce phonologically valid outputs with few training resources [10]. However, rule-based approaches have their performance limited by the complexity of the predefined rules, and therefore under-perform statistical methods on larger datasets [10].

We propose a transliteration framework in which statistical n-gram language modeling is augmented with phonological knowledge. We propose the concept of pseudo-syllables in statistical models to impose phonological constraints of syllable structure in the target language, yet retain acoustic authenticity of the source language as closely as possible. We compare the transliteration performance of the proposed framework with existing baseline approaches on low-resource languages, using Vietnamese as an example. We further investigate their performance with corpora of different dialects and various training data sizes. Our proposed framework integrates advantages of rule-based approaches on top of classical statistical transliteration models. The proposed approach ensures phonologically valid outputs, while maintaining strengths of statistical models (e.g., language independence, and performance that scales up with increasing training data size). We are working on applying the framework to other languages such as Cantonese and Mandarin.

2. Background

2.1. Phonology

We introduce two phonological concepts essential to transliteration.

i. Syllable: A syllable is considered the smallest phonological unit of a word [11] with the following structure [12]:

$$[O]\ N\ [\mathrm{Cd}] + [T] \tag{1}$$

where "[ ]" specifies an optional unit. O denotes the Onset, which is a consonant or a cluster of consonants at the beginning of a syllable. N denotes the Nucleus, which contains at least a vowel. Cd denotes the Coda, which mostly contains consonants. T denotes lexical tone, a feature existing in many languages to distinguish different words. The syllabic structure above is shared across most languages [12]. However, how consonants (C) and vowels (V) constitute Onset, Nucleus, and Coda differs across languages. For example, in English, an Onset can be a consonant cluster, such as 'kl', while no consonant cluster can be the Onset of a syllable in Vietnamese [13][14].

ii. Lexical Tones: In tonal languages, pitch is used to distinguish the meaning of words which are otherwise phonetically the same. This distinctive pitch level or contour is referred to as a lexical tone [15]. For instance, there are 6 distinct lexical tones in Vietnamese [16], and 4 distinct lexical tones in Mandarin [17]. Each lexical tone is commonly encoded in phonetic representation with a number. For example, consider two different Vietnamese words: b_< O _3 (cow) and b_< O _6 (bug). The two words are represented in phonetic units using X-SAMPA notation [5], and have the same Onset (b_<) and Nucleus (O), but are distinguished by the two different lexical tones (tone 3 and tone 6). Around 70% of languages are tonal [15], concentrated in Africa, East Asia, and Southeast Asia [18].

2.2. Baseline Model: Joint Source-Channel Model

The joint source-channel model for transliteration was introduced in [19]. Similar approaches have also been proposed for grapheme-to-phoneme conversion in [20].

The transliteration problem can be formulated as follows: given a string of graphemes $f = (f_1, f_2, \ldots, f_m)$ from source language F, the objective is to convert this input string into a string of phonemes $e = (e_1, e_2, \ldots, e_n)$ in target language E. The string of phonemes is inferred as:

$$e^{*} = \arg\max_{e}\, p(e \mid f) = \arg\max_{e}\, p(e, f). \tag{2}$$

The transliteration process can be seen as finding the alignment for sub-sequences of the input string f and the output string e [21]. Let there be K aligned transliteration pairs $\langle x, y\rangle_1, \ldots, \langle x, y\rangle_K$, where x takes on sub-sequences of e and y takes on sub-sequences of f. The joint probability p(e, f) is estimated through an N-gram model, where the k-th alignment pair $\langle x, y\rangle_k$ depends on its N-1 predecessor pairs [19][20]:

$$p(e, f) = \prod_{k=1}^{K} P\big(\langle x, y\rangle_k \mid \langle x, y\rangle_{k-N+1}, \ldots, \langle x, y\rangle_{k-1}\big)$$

3. Proposed Phonology-Augmented Statistical Framework

3.1.

4. New phonemes can be added to the output to retain the phonological structure of the source language: a 'schwa' is added to imitate the pronunciation of the consonant cluster "KL" that does not have an equivalent in Vietnamese.

In existing statistical transliteration models, the phonological considerations listed above are implicitly modeled using n-gram language modeling of the source language graphemes [22], or joint sequences of graphemes and phonemes [19][20]. Due to limited training data in low-resource languages, phonological structure in the transliteration output is not well-modeled, resulting in a high rate of invalid syllables in the output [10]. For example, k l E i _1 . J a: n is another transliteration output in Vietnamese phonemes produced by a statistical model for the word "KLEINHANS" in Figure 1. The output is invalid in Vietnamese phonology since the first syllable has a consonant cluster, and the second syllable has no lexical tone. Empirically, at least 21% of transliterated entries lack lexical tones when we run Sequitur G2P [23] (a statistical system) on 100 Vietnamese transliteration pairs extracted from the OpenKWS13 corpus [24]. Previous rule-based approaches have been shown to improve transliteration performance by imposing phonological constraints on their outputs [10]. However, predefined rules are likely to make mistakes with words not observed in training data. Rule-based approaches are thus outperformed by statistical models on larger data sets [10].

Our proposed approach has the strengths of both the rule-based and statistical approaches by integrating the phonology of the target language explicitly with a statistical model. The proposed framework can thus generate transliterated outputs that are phonologically valid, while its performance still improves with more data. Figure 2 shows the three steps of the proposed approach: (1) pseudo-syllable formulation, (2) grapheme-to-phoneme mapping, and (3) lexical tone assignment.

3.2. Pseudo-Syllable Formulation

Pseudo-syllable is a representation of how segments of a foreign word are arranged according to the syllable structure specified by the target language's phonology. The concept is inspired by how native speakers process a foreign loan word by
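To make the syllable template of Section 2.1 concrete, the following is a minimal Python sketch that checks a space-separated X-SAMPA transliteration against two of the Vietnamese constraints mentioned in the text: no consonant cluster in the Onset, and a lexical tone on every syllable. It is an illustration only, not the validation procedure used in the paper. The vowel inventory `VOWELS` is a hypothetical, deliberately tiny subset covering just the examples quoted above, and the function names are invented for this sketch.

```python
import re

# ASSUMPTION: a tiny, hypothetical vowel inventory that only covers the
# example strings quoted in the text; a real system needs the full set.
VOWELS = {"@I", "u", "E", "i", "a:", "O"}

# Tone tokens are written "_1" .. "_6" (the text states Vietnamese has 6 tones).
TONE = re.compile(r"_[1-6]")


def check_syllable(tokens):
    """Return a list of problems for one syllable given as X-SAMPA tokens."""
    tones = [t for t in tokens if TONE.fullmatch(t)]
    segments = [t for t in tokens if not TONE.fullmatch(t)]

    # Onset: leading non-vowel segments. Nucleus: the vowel run that follows.
    # Whatever remains after the nucleus is treated as the Coda.
    i = 0
    while i < len(segments) and segments[i] not in VOWELS:
        i += 1
    j = i
    while j < len(segments) and segments[j] in VOWELS:
        j += 1
    onset, nucleus = segments[:i], segments[i:j]

    problems = []
    if len(onset) > 1:
        problems.append("onset cluster")
    if not nucleus:
        problems.append("no nucleus")
    if len(tones) != 1:
        problems.append("missing lexical tone" if not tones else "multiple tones")
    return problems


def check_word(phonemes):
    """Split on the '.' syllable delimiter and check each syllable."""
    return [check_syllable(s.split()) for s in phonemes.split(" . ")]


if __name__ == "__main__":
    print(check_word("f @I _1 . b_< u k _2"))   # [[], []]
    print(check_word("k l E i _1 . J a: n"))    # [['onset cluster'], ['missing lexical tone']]
```

On the two strings taken from the text, the 'facebook' transliteration passes both checks, while the statistical output for "KLEINHANS" is flagged for the consonant cluster in its first syllable and the missing tone in its second.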
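The N-gram factorization of p(e, f) in Section 2.2 can be sketched in a few lines of Python. This is a hedged illustration under strong simplifying assumptions: the alignments are taken as given, probabilities are unsmoothed maximum-likelihood estimates, and a hypothetical start symbol `BOS` pads the history of the first pairs. The toy aligned pairs are invented for the example and are not taken from the paper's corpora; the function names `train` and `joint_prob` are likewise illustrative.

```python
from collections import Counter

# ASSUMPTION: a start-of-sequence pair used to pad the history; the
# formulation in the text does not specify how sequence starts are handled.
BOS = ("<s>", "<s>")


def train(corpus, n=2):
    """Count N-grams over sequences of aligned <x, y> pairs."""
    ngram_counts, context_counts = Counter(), Counter()
    for pairs in corpus:
        padded = [BOS] * (n - 1) + list(pairs)
        for k in range(n - 1, len(padded)):
            context = tuple(padded[k - n + 1:k])   # the N-1 predecessor pairs
            ngram_counts[context + (padded[k],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts


def joint_prob(pairs, model, n=2):
    """Product over k of P(<x,y>_k | N-1 predecessor pairs), unsmoothed MLE."""
    ngram_counts, context_counts = model
    padded = [BOS] * (n - 1) + list(pairs)
    prob = 1.0
    for k in range(n - 1, len(padded)):
        context = tuple(padded[k - n + 1:k])
        seen = context_counts[context]
        if seen == 0:
            return 0.0  # unseen history; a real system would smooth or back off
        prob *= ngram_counts[context + (padded[k],)] / seen
    return prob


if __name__ == "__main__":
    # Hypothetical toy alignments: each pair is (phoneme chunk x, grapheme chunk y).
    corpus = [
        [("f", "f"), ("@I", "a"), ("k", "ke")],
        [("f", "f"), ("i", "i"), ("n", "n")],
    ]
    model = train(corpus, n=2)
    print(joint_prob(corpus[0], model, n=2))      # 0.5
    print(joint_prob([("k", "k")], model, n=2))   # 0.0
```

The second call returns zero because the pair never follows the start symbol in the toy data, which mirrors the sparsity problem the text describes for low-resource settings: with few training pairs, many valid pair sequences receive no probability mass unless smoothing or additional constraints are introduced.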