Automatic Collecting of Text Data for Cantonese Language Modeling
Jiang CAO¹, Xiaojun WU¹, Yu Ting Yeung², Tan LEE², Thomas Fang Zheng¹*


S4-4

1. Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University
2. Department of Electronic Engineering, The Chinese University of Hong Kong
*. Contact: Room 4-416 Information Science and Technology Building, Tsinghua University, Beijing, 100084, China; +86-10-6279-6393, [email protected]; http://cslt.riit.tsinghua.edu.cn/~fzheng

Abstract

It is hard to collect corpora for training good language models for many minority languages. Cantonese, one of the most popular Chinese dialects, is such a language: it lacks language materials for language model training. This is a major obstacle to Cantonese language processing.

Unlike many other languages, Cantonese differs greatly between its written and colloquial forms. Moreover, people in Hong Kong mix Cantonese and English when they talk, which is another special characteristic of this language. In addition, materials collected from different sources contain different proportions of colloquial Cantonese sentences, which means that different sources should not be treated equally.

We developed a filter model built at the lexical and grammar levels. We trained this model using a development set and achieved a precision rate of 99.89% and a recall rate of 88.2% on the test set.

With this model, we devised a method to define the credibility of the different material sources. It is an iterative process, and its result decides the proportion of sentences chosen from each source for model training.

1 Introduction

Language modeling has been used successfully for speech recognition, part-of-speech tagging, syntactic parsing, information retrieval, and other tasks (Song 1999). In recent years, Grammar-Based Language Models (GLMs) and Statistical Language Models (SLMs) have been the most widely used types of language models (Hockey 2005). For large-vocabulary continuous speech recognition (LVCSR), the SLM, typically an N-gram model, plays an important role (Duchatcau 2005). To estimate the model parameters, a large database of text in the target language is necessary. A lack of language materials is generally a serious problem in statistical language model training and may affect the performance of a language processing system. For popular languages like English, Spanish and Chinese, there are many existing language databases available for language model training (e.g. ATIS (Ward 1990), CallHome (Ries 2000) and SogouT¹). It is also relatively easy to collect more text data for these languages. For languages that are mainly spoken, and for dialects, data availability is a nontrivial problem. In Cantonese, for example, there is very little speech data and even less text data available, simply because it is not an official written language (Fung 1999). The research progress of Cantonese language processing has been limited by the lack of text databases. Collecting more language materials is a straightforward and feasible way to improve the performance of existing systems.

In this study, we first developed a filter model, built at the lexical and grammar levels, to decide whether a retrieved sentence is in Cantonese or not. This model was trained using a development set made up of 500 colloquial Cantonese sentences and 500 standard Chinese sentences. These sentences were all collected from web pages. The test set contained 2,251 colloquial Cantonese sentences and 7,172 written sentences. The proposed filter model achieved a precision rate of 99.89% and a recall rate of 88.2%, which was very encouraging. It must be pointed out that the precision rate is more important than the recall rate in our intended application of data collection, because undesirable sentences would degrade the performance of language models.

We designed an iterative method to assess the credibility of a source of text materials, e.g., a website. The credibility level of each source was updated in every iteration according to the proportion of colloquial Cantonese sentences contained in the documents from this source. Across iterations, documents from different sources were treated differently. This method helped identify the more useful sources, which then received more attention, and thus improved the efficiency of data collection.

¹ http://www.sogou.com/labs/dl/t.html

2 Cantonese

2.1 Introduction to Cantonese

Cantonese or Yue (粤语) is one of the most popular Chinese dialects, and is a member of the Sino-Tibetan language family (Li 2006). It is also known as 广东话/廣東話 (Mandarin: Guǎngdōng huà, Cantonese: gwong dung waa). The population of Cantonese speakers is over 55 million. The areas with the highest concentration of speakers are Guangdong Province, some parts of Guangxi Province, Hong Kong and Macau (see the Introduction in Lau 1999). It is the de facto official spoken language in Hong Kong. Most Chinese people in Southeast Asia and North America also speak Cantonese. The name of this language, "Cantonese", has been stated as being derived from Canton, which is an English name for Guangdong Province and also an English name for Guangzhou (Graham 2006). Cantonese speech is generally unintelligible to people who live in other provinces.

Cantonese is a monosyllabic and tonal language. Each Chinese character is pronounced as one syllable in Cantonese (Chan 2005). There are about 5,500 commonly used characters in Cantonese, slightly fewer than in the standard Chinese used in Mainland China and Taiwan. Some of the characters are unique to Cantonese and are never used in standard Chinese. At the same time, some characters are used in standard Chinese but not in Cantonese. In this paper, we mainly focus on Cantonese as spoken in Hong Kong.

2.2 Written Cantonese and Written Chinese

Standard written Chinese is, in essence, in agreement with Mandarin in Taiwan. When standard written Chinese is read out with Cantonese pronunciations, the speech sounds strange and unnatural. It is very different from spoken Cantonese. If we transcribe spoken Cantonese into Chinese characters, the resulting text is referred to as written Cantonese.

There are many differences between Cantonese and standard Chinese. A standard Chinese speaker will have difficulty reading written Cantonese. Cantonese has some unique characters like "嘅" and "係", and unique words such as "尋日", "噚日", "擒日" and "琴日". There are also some special grammatical forms and phrase expressions in Cantonese. Cantonese is a language in which code-switching is quite common (Chan 2005). In Hong Kong, people habitually embed English words in a sentence. Although this is becoming popular in Mainland China in recent years, code-switching in Hong Kong Cantonese is considered an integral part and a special feature of contemporary Cantonese.

For language model training in Cantonese speech recognition, we need text materials that match the content of real-life spoken Cantonese. But in most Cantonese-speaking regions, including Hong Kong SAR, Cantonese is not an official written language (Zhang 1999). In Hong Kong, standard Chinese in traditional characters (繁体中文) is used in official communications. Most newspapers and formal publications are printed in standard Chinese. On the internet, which hosts a large amount of text materials, written Cantonese remains a minority.

3 The Cantonese Filter Model

As discussed earlier, there are quite a few differences between written Cantonese and standard Chinese. Our goal is to develop an effective method to distinguish written Cantonese from standard Chinese, so as to facilitate massive collection of Cantonese text materials from the internet.

3.1 Construction of the Filter Model

The basic function of the filter model is to determine whether a given sentence is in written Cantonese or not. This function can be implemented in two steps. The first step is to indicate whether the sentence is Chinese in a "broad" sense, i.e., written Cantonese, standard Chinese or another kind of Chinese text. This is done by checking the encoding method of the text. The second step is to indicate whether the sequence of Chinese characters is in Cantonese or not. Cantonese text contains a set of Cantonese-specific characters. It may also contain special grammar phrases and code-switching contents. For Cantonese language processing, methods of word segmentation and sentence parsing are not as well developed as for standard Chinese. Relevant research is rare. In our filter model, an obvious feature is used to detect Cantonese-specific content: a sentence that contains at least one of these characters is considered to be in Cantonese. This simple method was found to perform very well in our study.

In (Chan 2005), some representative Cantonese sentences are listed. We chose 500 sentences from the list and used them as the training data in our experiments. For standard Chinese, there is a "Common Chinese Character List" published by the government in 1988². We then defined the characters unique to Cantonese as the characters which appear in the training set but not in the Common Chinese Character List, and, of course, in the traditional Chinese form. These characters are called positive characters. We also tried to define negative characters as those which appear only in Chinese but not in Cantonese. However, a simple experiment showed that almost all Chinese characters can be found in Cantonese.

[...] different training sets, different numbers of positive characters were found, and thus the precision and recall rates were different. All these experiments were done on the same test set described above and without the help of the negative keywords, as shown in Table 1.

Training Set Sentences    No. of Positive Characters    Precision Rate    Recall Rate
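
The positive-character filter of Section 3.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy sentences and the tiny stand-in for the Common Chinese Character List are all hypothetical. The broad "is it Chinese" step is approximated here by a Unicode range check instead of the encoding check the paper describes. The last function only computes the per-source proportion of Cantonese sentences on which the credibility update is said to depend; the update rule itself is not given in this excerpt.

```python
def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def positive_characters(cantonese_sentences, common_chars):
    """Characters seen in the Cantonese training sentences but absent
    from the common-character list."""
    seen = {ch for s in cantonese_sentences for ch in s if is_cjk(ch)}
    return seen - set(common_chars)


def is_cantonese(sentence, positives):
    """Two-step filter: broadly Chinese first, then at least one positive character."""
    # Step 1 (assumed stand-in for the encoding check): needs some Chinese characters.
    if not any(is_cjk(ch) for ch in sentence):
        return False
    # Step 2: a single positive character marks the sentence as Cantonese.
    return any(ch in positives for ch in sentence)


def precision_recall(predictions, labels):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cantonese_proportion(sentences, positives):
    """Fraction of a source's sentences that the filter accepts as Cantonese."""
    if not sentences:
        return 0.0
    return sum(is_cantonese(s, positives) for s in sentences) / len(sentences)


# Hypothetical toy data; the real lists are far larger.
training = ["我係學生", "呢個係我嘅"]
common = "我是的學生個你好昨天他"
positives = positive_characters(training, common)

test_sentences = ["佢係邊個", "我是學生", "hello world", "佢哋好開心"]
labels = [True, False, False, True]
predictions = [is_cantonese(s, positives) for s in test_sentences]
precision, recall = precision_recall(predictions, labels)
```

In this toy run the last test sentence is Cantonese but contains none of the positive characters learned from the two training sentences, so it is missed. That is the kind of error that lowers recall while leaving precision untouched, which matches the trade-off the paper prefers for data collection.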