Automatic Collecting of Text Data for Cantonese Language Modeling Jiang CAO1, Xiaojun WU1, Yu Ting Yeung2, Tan LEE2, Thomas Fang Zheng1* 1
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
A Corpus Study of the 3rd Tone Sandhi in Standard Chinese
A Corpus Study of the 3rd Tone Sandhi in Standard Chinese. Yiya Chen (Department of Linguistics, Radboud University Nijmegen), Jiahong Yuan (Department of Linguistics, University of Pennsylvania). Abstract: In Standard Chinese, a Low tone (Tone3) is often realized with a rising F0 contour before another Low tone, known as the 3rd tone Sandhi. This study investigates the acoustic characteristics of the 3rd tone Sandhi in Standard Chinese using a large telephone conversation speech corpus. Sandhi Rising was found to be different from the underlying Rising tone (Tone2) in bi-syllabic words in two measures: the magnitude of the F0 rising and the time span of the F0 rising. We also found different effects of word frequency on Sandhi Rising and the underlying Rising tones. Finally, for tri-syllabic constituents with Low tone only, constituent
(Zhang 1988, Shih 1997, M. Chen 2000, Chen 2003, Chen 2004). Speer et al. (1989) show that listeners are indeed sensitive to a constituent’s phrasal structure in judging the application of the 3rd tone Sandhi to constituents which could be ambiguous between an underlying Rising tone and a Sandhi Rising tone. Their results suggest the possibility that the higher the linguistic boundary between two Low tones, the less likely the 3rd tone sandhi rule is to be applied. With regard to the difference between the underlying Rising tone and the Sandhi Rising tone, Peng (2000) shows that the F0 maximum of SR is lower than R. Furthermore, in fast speech, a Sandhi Rising tone may flatten and show no apparent F0 rise (Kuo, Xu, and Yip, to appear). -
The Status of Cantonese in the Education Policy of Hong Kong Kwai Sang Lee and Wai Mun Leung*
Lee and Leung, Multilingual Education 2012, 2:2. http://www.multilingual-education.com/2/1/2 RESEARCH, Open Access. The status of Cantonese in the education policy of Hong Kong. Kwai Sang Lee and Wai Mun Leung*. * Correspondence: waimun@ied.edu.hk, Department of Chinese, The Hong Kong Institute of Education, Hong Kong. Abstract: After the handover of Hong Kong to China, a first-ever policy of “bi-literacy and tri-lingualism” was put forward by the Special Administrative Region Government. Under the trilingual policy, Cantonese, the most dominant local language, equally shares the official status with Putonghua and English only in name but not in spirit, as neither the promotion nor the funding approaches on Cantonese match its legal status. This paper reviews the status of Cantonese in Hong Kong under this policy with respect to the levels of government, education and curriculum, considers the consequences of neglecting Cantonese in the school curriculum, and discusses the importance of large-scale surveys for language policymaking. Keywords: the status of Cantonese, “bi-literacy and tri-lingualism” policy, language survey, Cantonese language education. Background: The adjustment of the language policy is a common phenomenon in post-colonial societies. It always results in raising the status of the regional vernacular, but the language of the ex-colonist still maintains a very strong influence on certain domains. Taking Singapore as an example, English became the dominant language in the workplace and families, and the local dialects were suppressed. It led to the degrading of both English and Chinese proficiency levels according to scholars’ evaluation (Goh 2009a, b). -
Intonation in Hong Kong English and Guangzhou Cantonese-Accented English: a Phonetic Comparison
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 11, No. 5, pp. 724-738, September 2020 DOI: http://dx.doi.org/10.17507/jltr.1105.07 Intonation in Hong Kong English and Guangzhou Cantonese-accented English: A Phonetic Comparison Yunyun Ran School of Foreign Languages, Shanghai University of Engineering Science, 333 Long Teng Road, Shanghai 201620, China Jeroen van de Weijer School of Foreign Languages, Shenzhen University, 3688 Nan Hai Avenue, Shenzhen 518060, China Marjoleine Sloos Fryske Akademy (KNAW), Doelestrjitte 8, 8911 DX Leeuwarden, The Netherlands Abstract—Hong Kong English is to a certain extent a standardized English variety spoken in a bilingual (English-Cantonese) context. In this article we compare this (native) variety with English as a foreign language spoken by other Cantonese speakers, viz. learners of English in Guangzhou (mainland China). We examine whether the notion of standardization is relevant for intonation in this case and thus whether Hong Kong English is different from Cantonese English in a wider perspective, or whether it is justified to treat Hong Kong English and Cantonese English as the same variety (as far as intonation is concerned). We present a comparison between intonational contours of different sentence types in the two varieties, and show that they are very similar. This shows that, in this respect, a learned foreign-language variety can resemble a native variety to a great extent. Index Terms—Hong Kong English, Cantonese-accented English, intonation I. INTRODUCTION Cantonese English may either refer to Hong Kong English (HKE), or to a broader variety of English spoken in the Cantonese-speaking area, including Guangzhou (Wong et al. -
Pan-Sinitic Object Marking: Morphology and Syntax*
Breaking Down the Barriers, 785-816. 2013-1-050-037-000234-1. Pan-Sinitic Object Marking: Morphology and Syntax*. Hilary Chappell (曹茜蕾), EHESS. In Chinese languages, when a direct object occurs in a non-canonical position preceding the main verb, this SOV structure can be morphologically marked by a preposition whose source comes largely from verbs or deverbal prepositions. For example, markers such as kā 共 in Southern Min are ultimately derived from the verb ‘to accompany’, pau11 幫 in many Huizhou and Wu dialects is derived from the verb ‘to help’ and bǎ 把 from the verb ‘to hold’ in standard Mandarin and the Jin dialects. In general, these markers are used to highlight an explicit change of state affecting a referential object, located in this preverbal position. This analysis sets out to address the issue of diversity in such object-marking constructions in order to examine the question of whether areal patterns exist within Sinitic languages on the basis of the main lexical fields of the object markers, if not the construction types. The possibility of establishing four major linguistic zones in China is thus explored with respect to grammaticalization pathways. Key words: typology, grammaticalization, object marking, disposal constructions, linguistic zones. 1. Background to the issue. In the case of transitive verbs, it is uncontroversial to state that a common word order in Sinitic languages is for direct objects to follow the main verb without any overt morphological marking: * This is a “cross-straits” paper as earlier versions were presented in turn at both the Institute of Linguistics, Academia Sinica, during the joint 14th Annual Conference of the International Association of Chinese Linguistics and 10th International Symposium on Chinese Languages and Linguistics, held in Taipei in May 25-29, 2006 and also at an invited seminar at the Institute of Linguistics, Chinese Academy of Social Sciences in Beijing on 23rd October 2006. -
Cantonese Vs. Mandarin: a Summary
Cantonese vs. Mandarin: A summary. JMFT, October 21, 2015. This short essay is intended to summarise the similarities and differences between Cantonese and Mandarin. 1 Introduction. The large geographical area that is referred to as 'China' is home to many languages and dialects. Most of these languages are related, and fall under the umbrella term Hanyu, a term which is usually translated as 'Chinese' and spoken of as though it were a unified language. In fact, there are hundreds of dialects and varieties of Chinese, which are not mutually intelligible. With 910 million speakers worldwide, Mandarin is by far the most common dialect of Chinese. 'Mandarin' or 'guanhua' originally referred to the language of the mandarins, the government bureaucrats who were based in Beijing. This language was based on the Beijing dialect of Chinese. It was promoted by the Qing dynasty (1644–1912) and later the People's Republic (1949–) as the country's lingua franca, as part of efforts by these governments to establish political unity. Mandarin is now used by most people in China and Taiwan. Mandarin itself consists of many subvarieties which are not mutually intelligible. Cantonese (Yuetyu) is named after the city Canton, whose name is now transliterated as Guangdong. It is spoken in Hong Kong and Macau (with a combined population of around 8 million), and, owing to these cities' former colonial status, by many overseas Chinese. In the rest of China, Cantonese is relatively rare, but it is still sometimes spoken in Guangzhou. 2 History and etymology. It is interesting to note that the Cantonese name for Cantonese, Yuetyu, means 'language of the Yuet people'. -
Download Our Latest Allegravita Backgrounder Booklet Here (PDF)
Allegravita is an award-winning, multi-disciplinary public relations and strategic communications agency focused on supporting international clients in the China region and taking Chinese clients to the world. We were voted China's most entrepreneurial company by the Australian Chambers of Commerce in China in 2008.
ABOUT ALLEGRAVITA: Allegravita is a boutique global agency with personnel and offices in Beijing, Guangzhou, Kunming, Hong Kong, New York City and San Francisco. Since 2003 we have provided high-quality PR, marketing and corporate advisory services with a special focus on achieving excellent results for international clients in the China region and in Chinese speaking markets worldwide, and international results for our Chinese clients. We incorporate expert public relations abilities with a firm grasp of contemporary China-region marketplaces to help our clients communicate effectively,
A PORTFOLIO OF SERVICES TO HELP YOU SUCCEED IN CHINA: Public Relations for proactive and reactive messaging. Marketing and Communications Collateral to present your messages with excellent credibility. Media Relations & Media Training to insert your messages into Chinese and international media in the most compelling way possible. Corporate Identity localization to communicate your brand values and benefits to Chinese markets, and for Chinese clients, to international investors and influencers.
BORN IN CHINA, EFFECTIVE WORLDWIDE: Although our focus is on the China region, our services are very effective in markets worldwide, with proven outcomes. Allegravita works within a highly-accountable and disciplined Western management style, executing the highest quality of work for our clients, which we deliver with agility, flexibility, creativity and cultural savvy. Allegravita is an ethnically diverse, multi-cultural team of professionals of different cultural heritages. What we share in common is our passion for -
Language Specific Peculiarities Document for Cantonese As
Language Specific Peculiarities Document for Cantonese as Spoken in the Guangdong and Guangxi Provinces of China. 1. Dialects. The name "Cantonese" is used either for all of the language varieties spoken in specific regions in the Guangdong and Guangxi Provinces of China and Hong Kong (i.e., the Yue dialects of Chinese), or as one particular variety referred to as the "Guangfu group" (Bauer & Benedict 1997). In instances where Cantonese is described as 'Cantonese "proper"' (i.e. used in the narrower sense), it refers to a variety of Cantonese that is spoken in the capital cities Guangzhou and Nanning, as well as in Hong Kong and Macau. This database includes Cantonese as spoken in the Guangdong and Guangxi Provinces of China only (i.e. not in Hong Kong); five dialect groups have been defined for Cantonese (see the following table). Three general principles have been used in defining these dialect groupings: (i) phonological variation, (ii) geographical variation, and (iii) lexical variation. With relation to phonological variation, although Cantonese is spoken in all of the regions listed in the table, there are differences in pronunciation. Differences in geographic locations also correlate with variations in lexical choice. Cultural differences are also correlated with linguistic differences, particularly in lexical choices.
Area: Cities (examples)
Central Guangdong Group: Guangzhou, Conghua, Fogang (Shijiao), Longmen, Zengcheng, Huaxian
Northern Guangdong Group: Shaoguan, Qijiang, Lian Xian, Liannan, Yangshan, Yingde, Taiping
Northern -
Cifu: a Frequency Lexicon of Hong Kong Cantonese
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3069–3077, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC. Cifu: a frequency lexicon of Hong Kong Cantonese. Regine Lai (The Chinese University of Hong Kong, Department of Linguistics and Modern Languages), Grégoire Winterstein (Université du Québec à Montréal, Département de Linguistique). Abstract: This paper introduces Cifu, a lexical database for Hong Kong Cantonese (HKC) that offers phonological and orthographic information, frequency measures, and lexical neighborhood information for lexical items in HKC. The resource can be used for NLP applications and the design and analysis of psycholinguistic experiments on HKC. We elaborate on the characteristics and challenges specific to HKC that were relevant in the design of Cifu. This includes lexical, orthographic and phonological aspects of HKC, word segmentation issues, the place of HKC in written media, and the availability of data. We discuss the measure of Neighborhood Density (ND), highlighting how the analytic nature of Cantonese and its writing system affect that measure. We justify using six different variations of ND, based on the possibility of inserting or deleting phonemes when searching for neighbors and on the choice of data for retrieving frequencies. Statistics about the four genres (written, adult spoken, children spoken and child-directed) within the dataset are discussed. We find that the lexical diversity of the child-directed speech genre is particularly low, compared to a size-matched written corpus. The correlations of word frequencies of different genres are all high, but in general decrease as word length increases. -
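The Neighborhood Density measure mentioned in the Cifu abstract can be illustrated with a small sketch. It assumes a common definition (a neighbor differs from the target by exactly one phoneme substitution, optionally also by one insertion or deletion); the abstract does not spell out Cifu's exact procedure, and the names `is_neighbor`, `neighborhood_density` and `TOY_LEXICON` are hypothetical. The lexicon holds made-up segment sequences, not Cifu data, and tones and frequencies are ignored.

```python
from typing import Iterable, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical toy lexicon: made-up segment sequences, NOT taken from Cifu.
TOY_LEXICON = [
    ("s", "aa", "m"),
    ("s", "aa", "n"),
    ("s", "a", "m"),
    ("aa", "m"),
    ("k", "a", "n"),
]


def is_neighbor(a: Phonemes, b: Phonemes, allow_indel: bool = False) -> bool:
    """Return True if b is one phoneme edit away from a."""
    if a == b:
        return False  # a word is not its own neighbor
    if len(a) == len(b):
        # Exactly one substituted phoneme.
        return sum(x != y for x, y in zip(a, b)) == 1
    if allow_indel and abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        # Deleting one phoneme from the longer form must give the shorter one.
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False


def neighborhood_density(target: Phonemes,
                         lexicon: Iterable[Phonemes],
                         allow_indel: bool = False) -> int:
    """Count the lexicon entries that are neighbors of target."""
    return sum(is_neighbor(target, other, allow_indel) for other in lexicon)


if __name__ == "__main__":
    word = ("s", "aa", "m")
    print(neighborhood_density(word, TOY_LEXICON))
    print(neighborhood_density(word, TOY_LEXICON, allow_indel=True))
```

Switching `allow_indel` on or off corresponds to one of the two dimensions the abstract names for its ND variations; the other dimension, the choice of frequency data, is left out of this sketch.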
Language Management in the People's Republic of China
LANGUAGE AND PUBLIC POLICY. Language management in the People’s Republic of China. Bernard Spolsky, Bar-Ilan University. Since the establishment of the People’s Republic of China in 1949, language management has been a central activity of the party and government, interrupted during the years of the Cultural Revolution. It has focused on the spread of Putonghua as a national language, the simplification of the script, and the auxiliary use of Pinyin. Associated has been a policy of modernization and terminological development. There have been studies of bilingualism and topolects (regional varieties like Cantonese and Hokkien) and some recognition and varied implementation of the needs of non-Han minority languages and dialects, including script development and modernization. Asserting the status of Chinese in a globalizing world, a major campaign of language diffusion has led to the establishment of Confucius Institutes all over the world. Within China, there have been significant efforts in foreign language education, at first stressing Russian but now covering a wide range of languages, though with a growing emphasis on English. Despite the size of the country, the complexity of its language situations, and the tension between competing goals, there has been progress with these language-management tasks. At the same time, nonlinguistic forces have shown even more substantial results. Computers are adding to the challenge of maintaining even the simplified character writing system. As even more striking evidence of the effect of politics and demography on language policy, the enormous internal rural-to-urban rate of migration promises to have more influence on weakening regional and minority varieties than campaigns to spread Putonghua. -
Corpus of Hong Kong Cantonese 香港粵語語料庫
Corpus of Hong Kong Cantonese 香港粵語語料庫
• 180,000-word corpus of Cantonese speech
• 52 spontaneous conversations
• 42 radio programmes
• Transcribed (UTF-8); Transliterated; Segmented; POS tagged
• English translation described in paper, not in downloadable corpus
• Available directly for download (no explicit license): http://compling.hss.ntu.edu.sg/hkcancor/
• Produced by Luke Kang Kwong and ML Wong
Francis Bond, HG3051 lab2
Creation
• 30 hours of recordings (March 1997 to August 1998)
• Native speakers of Cantonese
• Ordinary settings with family members, friends and colleagues talking with each other freely on everyday topics such as current affairs, work and study, and personal hobbies
• Some parts selected
Meta-Data/Annotation
• Meta-Data: Tape number (of recording); Date of recording; Number of Speakers; List of Speakers (Code-Sex-Age-Origin) (e.g. A-M-22-HK says A is a 22-year-old male speaker from Hong Kong)
• Annotation: Each utterance has the speaker code. Utterances are segmented, POS tagged and transliterated: 基本上/d/gei1bun2soeng6/ 呢個/r/ni1go3/ze ...
• The whole corpus is wrapped in xml (but not very well)
Usage
• Used to examine the uses of the frequently used sentence final particles wǒ and bǒ in the 1990s in Hong Kong Cantonese by examining speech data.
• Question: are wǒ (喎) and bǒ (噃) variant forms?
• Answer: No. “[...] the two SFPs carry and serve different meanings and functions in modern Hong Kong Cantonese, and thus they are not exactly the same particles and not interchangeable as previously assumed.” (Leung, 2010, p21)
• Also used as a corpus in the PyCantonese Project: Working with Cantonese corpus data using Python, by Jackson L. -
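The speaker-list convention described in the corpus slides (Code-Sex-Age-Origin, e.g. A-M-22-HK) can be unpacked with a few lines of Python. This is a minimal sketch that assumes every tag has exactly four hyphen-separated fields and an integer age; the `Speaker` class and `parse_speaker` function are hypothetical names and are not part of the corpus distribution or of PyCantonese.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    code: str    # speaker label used on each utterance, e.g. "A"
    sex: str     # e.g. "M"
    age: int     # age in years
    origin: str  # e.g. "HK" for Hong Kong


def parse_speaker(tag: str) -> Speaker:
    """Split a Code-Sex-Age-Origin tag such as 'A-M-22-HK' into fields.

    Assumption: exactly four fields separated by hyphens; anything else
    raises ValueError instead of being guessed at.
    """
    parts = tag.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected Code-Sex-Age-Origin, got {tag!r}")
    code, sex, age, origin = parts
    return Speaker(code=code, sex=sex, age=int(age), origin=origin)


if __name__ == "__main__":
    print(parse_speaker("A-M-22-HK"))
```

Failing loudly on malformed tags seems the safer choice here, since the slides themselves note that the XML wrapping of the corpus is not very regular.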
LANGUAGE CONTACT and AREAL DIFFUSION in SINITIC LANGUAGES (Pre-Publication Version)
LANGUAGE CONTACT AND AREAL DIFFUSION IN SINITIC LANGUAGES (pre-publication version). Hilary Chappell. This analysis includes a description of language contact phenomena such as stratification, hybridization and convergence for Sinitic languages. It also presents typologically unusual grammatical features for Sinitic such as double patient constructions, negative existential constructions and agentive adversative passives, while tracing the development of complementizers and diminutives and demarcating the extent of their use across Sinitic and the Sinospheric zone. Both these kinds of data are then used to explore the issue of the adequacy of the comparative method to model linguistic relationships inside and outside of the Sinitic family. It is argued that any adequate explanation of language family formation and development needs to take into account these different kinds of evidence (or counter-evidence) in modeling genetic relationships. In §1 the application of the comparative method to Chinese is reviewed, closely followed by a brief description of the typological features of Sinitic languages in §2. The main body of this chapter is contained in two final sections: §3 discusses three main outcomes of language contact, while §4 investigates morphosyntactic features that evoke either the North-South divide in Sinitic or areal diffusion of certain features in Southeast and East Asia as opposed to grammaticalization pathways that are crosslinguistically common. 1. The comparative method and reconstruction of Sinitic. In Chinese historical -
Essentials of Standard Chinese Phonetics for Prosthetic Dentistry Xiulian Hu, DMD,1 Ye Lin, MD,1 Cordula Hunold, Phd,2 & Katja Nelson, DDS, Phd3
Essentials of Standard Chinese Phonetics for Prosthetic Dentistry. Xiulian Hu, DMD,1 Ye Lin, MD,1 Cordula Hunold, PhD,2 & Katja Nelson, DDS, PhD3. 1 Beijing University, School and Hospital of Stomatology, Beijing, China. 2 Goethe Institut, Beijing, China. 3 Department of Oral and Maxillofacial Surgery, Albert-Ludwigs Universität Freiburg, Freiburg, Germany. The article is associated with the American College of Prosthodontists’ journal-based continuing education program. It is accompanied by an online continuing education activity worth 1 credit. Please visit www.wileyonlinelearning.com/jopr to complete the activity and earn credit.
Keywords: Mandarin; Putonghua; denture; speech; pronunciation; palatogram.
Correspondence: Katja Nelson, Department of Oral and Maxillofacial Surgery, Albert-Ludwigs Universität Freiburg, Hugstetter Str. 55, Freiburg 79106, Germany. Xiulian Hu received a scholarship grant for this study from the Camlog Foundation.
Abstract: Speech adaptation after oral rehabilitation is based on a complex interaction of articulatory and myofunctional factors. The knowledge of basic phonetic principles may help clinicians identify phonetic problems associated with prosthodontic treatment. The purpose of this article is to illustrate basic phonetic terminology, standard Chinese (Putonghua) phonetics, and the anatomic structures relevant for dentistry. In cooperation with a Chinese linguistic specialist, Chinese articulators were selected and are described and compared with English phonetics. Established test words and sentences aid the identification of mispronounced articulators and their related dental structures. The pronunciation of most consonants and vowels in standard Chinese is similar to English, but some of them, such as the retropalatals (/zh/ [tʂ], /ch/ [tʂʰ], /sh/ [ʂ]), have notable differences.