Multimodal Neural Pronunciation Modeling for Spoken Languages with Logographic Origin


Minh Nguyen (National University of Singapore), Gia H. Ngo (National University of Singapore), Nancy F. Chen (Institute for Infocomm Research, Singapore)

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2916–2922, Brussels, Belgium, October 31 - November 4, 2018. © 2018 Association for Computational Linguistics

Abstract

Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, (2) the closest graphemic origins are logographic Han characters, of which only a subset implicitly encodes pronunciation. In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% relative to unimodal and multimodal baselines, respectively.

1 Introduction

In phonographic languages, there is a direct correspondence between graphemes and phonemes (Defrancis, 1996), though this correspondence is not always one-to-one. For example, in English, the word table corresponds to the pronunciation [ˈteɪ.bl], in which each alphabetic character corresponds to one phoneme, and the character e is mapped to silence. However, in logographic languages, the correspondence between graphemes and phonemes is more ambiguous (Defrancis, 1996), as only some sub-units in a grapheme are indicative of its phonemes. Korean¹, Vietnamese² and Chinese languages (e.g. Cantonese) are examples of logographic languages, all belonging to the Han logographic family. Similar to pronunciation modeling in phonographic languages, in which words are broken down into characters and modeling is done at the character level, pronunciation modeling in logographic languages requires decomposing logographs into sub-units and extracting only sub-units carrying pronunciation hints. As the correspondence of Han logograph to phoneme is intricately complex with many sub-rules or exceptions (Hashimoto, 1978), it is challenging to computationally model these correspondences using white box approaches (e.g. graphical model). Instead, we exploit neural networks, as they (1) can flexibly model the implicit similarity of grapheme-phoneme relationships across languages with Han origin, (2) can automatically learn the most relevant knowledge representation with minimal feature engineering (LeCun et al., 2015), such as extracting pronunciation hints from logographic representations.

¹ A large portion of Korean vocabulary is Sino-Korean, written in Hanja (Korean logographs) (Sohn, 2001).
² Traditional Vietnamese vocabulary comprises Sino-Vietnamese words written in Chinese logographs and locally-invented Nom logographs (Alves, 1999).

Due to historical contact, there is much lexical overlap across Han logographic languages, as they borrowed words from one another (Rokuro, 1969; Miyake, 1997; Loveday, 1996; Sohn, 2001; Alves, 1999). As a result, cognates in different languages are written using identical graphemes but pronounced differently. For example, [she] in Mandarin and [sip] in Cantonese are cognates; their pronunciations are different yet they are written using the same logograph (懾), which represents "admire". Though Han logographic languages are mutually unintelligible (Tang and Van Heuven, 2009; Handel, 2015), the correspondence of Han logographic graphemes to phonemes across languages is often similar in systematic ways (Cai et al., 2011; Frellesvig and Whitman, 2008; Miyake, 1997). The shared characteristics in pronunciation of cognates could be leveraged in deciphering the pronunciation of Han logographs.

In this work, we propose a neural pronunciation model that exploits both embeddings of logographs and cognates' phonemes. The proposed model significantly improves pronunciation prediction of logographs in Cantonese.

2 Related Work

The basic units in writing (graphemes) of Han logographic languages are logographs. A word contains one or more logographs and a logograph consists of one or more radicals. The pronunciation of a logograph corresponds to a syllable which has three phonemes: onset, nucleus and coda.

Grapheme-to-phoneme (G2P) approaches such as (Xu et al., 2004; Chen et al., 2016) predicted a Han logograph's pronunciation from its local context in a phrase. This was similar to predicting a Latin word's pronunciation from its surrounding words: these approaches essentially treated individual logographs as the basic units of the model and did not delve further into the logographic sub-units (the radicals).

While we are unaware of any work that derives features for pronunciation prediction from logographs, there is recent work on deriving representations of logographs for various semantic tasks. Some methods (Shi et al., 2015; Ke and Hagiwara, 2017; Nguyen et al., 2017; Zhuang et al., 2017) decomposed logographs into sub-units using expert-defined rules and then extracted the relevant semantic features. Other methods used convolutional neural networks to extract features from the images of logographs (Dai and Cai, 2017; Liu et al., 2017; Toyama et al., 2017). Other works combined multiple levels of information for feature extraction, using both logographs and sub-units obtained from logograph decomposition (Dong et al., 2016; Han et al., 2017; Peng et al., 2017; Yu et al., 2017; Yin et al., 2016).

In this work, we explicitly looked at the relationship between a logograph's constituent radicals and its pronunciation. Among Han logographs, 81% of frequently used logographs are semantic-phonetic compounds (Li and Kang, 1993) which consist of radicals that might contain phonetic or semantic hints (Hsiao and Shillcock, 2006). The pronunciation of a logograph could conceivably be predicted from the phonetic radicals. Furthermore, the relative position of radicals in the logograph might also offer clues about its pronunciation. Table 1 shows an example of such intricate relationships between a logograph's pronunciation and its constituent radicals. All Han logographs in the table have a common phonetic radical (in red), which offers an inkling of the pronunciation of these logographs. For instance, logographs that have the phonetic radical on the left (剖 and 部) share a similar pronunciation in Korean (in blue) while logographs that have the phonetic radical on the right (培, 賠, and 蓓) share a similar pronunciation in Mandarin, Cantonese and Vietnamese. Note that for each logograph, the pronunciations across the different languages share similarities: when the phonetic radical is on the left, the nucleus ends in a back vowel like u or o, whereas when the phonetic radical is on the right, the nucleus ends in a front vowel like i.

Position of 咅    left    left    right   right   right
Logograph         剖      部      培      賠      蓓
Mandarin          pou     bu      pei     pei     bei
Cantonese         fau     bou     pui     pui     bui
Korean            pwu     pwu     pay     pay     pay
Vietnamese        phau    bo      boi     boi     bui

Table 1: The position of radicals affects pronunciations. All logographs share a common radical in red. Similar pronunciations for 剖 and 部 are bolded in blue. Similar pronunciations for 培, 賠, and 蓓 are bolded in green. The pronunciations of a logograph in Mandarin, Cantonese, Korean and Vietnamese are represented by Pinyin, Jyutping, Yale, and Vietnamese alphabet symbols respectively.

The example in Table 1 explains the motivation for our proposed approach to predict a logograph's pronunciation by modelling both the constituent radicals and their geometric positions. Furthermore, the proposed approach can generalize to unseen logographs if the co-occurrence patterns of their constituent radicals have been learnt.

3 Model

We first describe a geometric decomposition of logographs and then different neural pronunciation models for logographs. Finally, we present a multimodal neural model that incorporates both logographic input and the cognates' phonemes in predicting pronunciation of logographs.

Representation of Han logographs

The majority of logographs (characters) in the Han logographic language family comprise a radical that indicates the nominal semantic category and a phonetic radical that gives an inkling of the pronunciation (Defrancis, 1996). Thus, patterns of co-occurrence of radicals across logographs might be exploited to find the phonetic radicals, which in turn can suggest the corresponding pronunciation of a logograph. Using this intuition, we model the

Figure 1: Geometric representation of the logograph "admire" (懾). A, B and C are equivalent decompositions of the same logograph but with different levels of granularity. The geometric representation comprises both the radicals and geometric operators, which can be used to reconstruct the original logograph.
Vector forms: A: ⿰ 忄 聶; B: ⿰ 忄 ⿱ 耳 聑; C: ⿰ 忄 ⿱ 耳 ⿰ 耳 耳

The BoR is input to a multilayer perceptron (MLP) with three layers of size 750, 500, 250. L2 regularization of 1e-4 is applied to the hidden layers. The three dropout layers have dropout probabilities of 0.5, 0.5, and 0.2, respectively. As the output variables are categorical, cross-entropy loss was used.

We investigated two structures for predicting output phonemes (i.e. onset, nucleus, coda). In the first structure, output phonemes were predicted independently using the last hidden layer. The second structure made a sequential prediction: (1) the coda was first predicted using the last hidden layer, (2) the nucleus was predicted using both the last hidden layer and the predicted coda, and (3) the onset was predicted using the last hidden layer together with the predicted coda and nucleus. The second structure was motivated by a stronger dependency between the nucleus and coda.
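The second (sequential) structure described above can be sketched as a plain-Python forward pass. This is only an illustration under stated assumptions, not the authors' implementation: the ReLU activation, the one-hot encoding of each predicted phoneme before it is passed on, the random weights, and all names (`SequentialPhonemeMLP`, `dense`, the vocabulary sizes) are hypothetical. Dropout and L2 regularization apply only during training and are therefore left out.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Return a randomly initialised (weights, bias) pair for one layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x):
    """Affine transform: one dot product per output unit, plus bias."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss for one categorical output, given the index of the true class."""
    return -math.log(probs[target])


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class SequentialPhonemeMLP:
    """Hidden layers shared by three heads: coda, then nucleus, then onset."""

    def __init__(self, n_radicals, n_onsets, n_nuclei, n_codas,
                 hidden=(750, 500, 250), seed=0):
        rng = random.Random(seed)
        sizes = (n_radicals,) + tuple(hidden)
        self.hidden_layers = [dense(a, b, rng)
                              for a, b in zip(sizes, sizes[1:])]
        last = hidden[-1]
        self.n_nuclei, self.n_codas = n_nuclei, n_codas
        # Each later head also sees the phonemes predicted before it.
        self.coda_head = dense(last, n_codas, rng)
        self.nucleus_head = dense(last + n_codas, n_nuclei, rng)
        self.onset_head = dense(last + n_codas + n_nuclei, n_onsets, rng)

    def predict(self, bag_of_radicals):
        h = bag_of_radicals
        for layer in self.hidden_layers:
            h = relu(forward(layer, h))  # ReLU is an assumption
        coda = argmax(softmax(forward(self.coda_head, h)))
        coda_vec = one_hot(coda, self.n_codas)
        nucleus = argmax(softmax(forward(self.nucleus_head, h + coda_vec)))
        nucleus_vec = one_hot(nucleus, self.n_nuclei)
        onset = argmax(softmax(
            forward(self.onset_head, h + coda_vec + nucleus_vec)))
        return onset, nucleus, coda


# Toy run with made-up vocabulary sizes and an arbitrary input vector.
model = SequentialPhonemeMLP(n_radicals=20, n_onsets=19, n_nuclei=10, n_codas=9)
print(model.predict([1.0 if i in (2, 7) else 0.0 for i in range(20)]))
```

The first (independent) structure would correspond to giving all three heads only the last hidden layer as input; the sequential variant differs solely in the extra inputs concatenated to the nucleus and onset heads.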
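The relation between the tree forms and vector forms in Figure 1 can be illustrated with a short sketch. It assumes that a vector form is the prefix-order (operator first) flattening of the corresponding tree, which matches the three vector forms listed in the figure; the tuple encoding of trees, the function names, and the restriction to the two binary operators ⿰ and ⿱ are assumptions made for this example only.

```python
# Geometric operators used in Figure 1, each combining two parts:
# ⿰ places them left to right, ⿱ places them top to bottom.
ARITY = {"⿰": 2, "⿱": 2}

# Decompositions A, B and C of 懾 as nested tuples (operator, part, part).
TREE_A = ("⿰", "忄", "聶")
TREE_B = ("⿰", "忄", ("⿱", "耳", "聑"))
TREE_C = ("⿰", "忄", ("⿱", "耳", ("⿰", "耳", "耳")))


def to_vector_form(node):
    """Flatten a decomposition tree in prefix order."""
    if isinstance(node, str):
        return [node]
    operator, *parts = node
    tokens = [operator]
    for part in parts:
        tokens.extend(to_vector_form(part))
    return tokens


def from_vector_form(tokens):
    """Rebuild the tree from a prefix-order token list."""
    def parse(pos):
        token = tokens[pos]
        if token not in ARITY:
            return token, pos + 1  # a radical is a leaf
        parts = []
        pos += 1
        for _ in range(ARITY[token]):
            part, pos = parse(pos)
            parts.append(part)
        return (token, *parts), pos

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete decomposition")
    return tree


for name, tree in (("A", TREE_A), ("B", TREE_B), ("C", TREE_C)):
    print(name, " ".join(to_vector_form(tree)))
```

Because every operator has a fixed number of parts, the flat sequence can be parsed back into the tree without brackets, which is one way to read the caption's remark that the representation can be used to reconstruct the original logograph.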