Global TIMIT Mandarin Chinese-Guanzhong Dialect Authors: Yue Jiang, Juhong Zhan, Hongjian Han, Zuohao Xu, Haiyan Zhou, Jiahong Y

Total Page：16

File Type：pdf, Size：1020Kb

Global TIMIT Mandarin Chinese-Guanzhong Dialect Authors: Yue Jiang, Juhong Zhan, Hongjian Han, Zuohao Xu, Haiyan Zhou, Jiahong Yuan, Mark Liberman Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect is a TIMIT-like corpus in Guanzhong dialect of Mandarin (in Shannxi province), part of the “Global TIMIT” series. The design of “Global TIMIT” adopts a scheme different from that of the original TIMIT. Instead of having 630 speakers and 10 sentences per speaker, the new design has 50 speakers and 120 sentences per speaker. Among the 120 sentences, 20 are “Calibration” sentences, read by all speakers; 40 are “Shared” sentences, read by 10 speakers; and 60 are “Unique” sentences, read by only one speaker. The total number of sentence types is, therefore, 20 + 40*(50/10) + 60*50 = 3220. The sentences of Global TIMIT Mandarin Chinese-Guanzhong Dialect were selected from the corpus of Chinese Gigaword Fifth Edition (LDC2011T13). Twenty “Calibration” sentences were selected to cover the maximum number of syllable types in the language. “Shared” sentences were selected to cover the maximum number of tones and tonal combinations. Finally, “Unique” sentences were randomly selected. 50 high school students, 25 females and 25 males, were recruited to read the sentences. The speakers were born in Weinan, Shannxi, and speak the Guanzhong dialect of Mandarin. The recording was made in a quiet room. HMM/GMM-based forced alignment, with employment of explicit phone boundary models, was applied to obtain phonetic segmentation. Data The corpus contains 5999 utterances (one utterance was missing). Each utterance has file data files as listed below: *.flac: flac files *.phones: phone segmentation files *.words: word segmentation files *.tones: syllable/tone segmentation files *.TextGrid: Praat TextGrid files Base filenames have the form SP##_###, where the first digit string is a 0-padded subject number and the second is a 0-padded sentence number. Odd subject numbers (SP01, SP03, …) represent male speakers; and even subject numbers (SP02, SP04, …) represent female speakers. Sentences numbered from 001 to 020 are “Calibration” sentences, 021-060 are “Shared” sentences, and 061-120 are “Unique” sentences. .

Copy Link

Recommended publications

U.S. Plays Catch-Up in Africa, After China Gains New Ground Minnesota
C HINA Fostering business and culturalI harmonyNSIGHT between China and the U.S. VOL. 13 NO. 8 SEPT 2014 Minnesota Chinese program educators reflect on their China visit By Yongling Zhang-Gorke, contributor During the past several are: Kristine Schaefer, prin- years, we have witnessed the cipal, Woodbury Elementary Arboretum’s Chinese garden, page 2 growth of Chinese programs School; Karin Lopez, principal, in K-12 schools in Minnesota, Woodbury Middle School; Bob especially those in the 12 af- Bulthuis, certified employment filiated Confucius Classrooms. specialist, Hopkins Public As schools grow their Chinese Schools; Molly Wieland, co- program, so does the need for ordinator, Mandarin Immersion establishing stronger connec- Program at Hopkins Public tions with Chinese schools and Schools; Shirley Gregoire, obtaining deeper understanding principal, Hopkins West Junior of the Chinese education sys- High School; Todd Rounda- Chinese acquisitions, page 5 tem. The need is reflected in the bush, science teacher and IB following three areas: coordinator, Hopkins West 1) Partner with a sister school Junior High School; Joe Muel- in China for concrete activities such as the University of Minnesota (CIUMN) ler, curriculum coordinator, Forest Lake exchange of students and teachers in June. The CIUMN would like to Area Schools; Rob Rapheal, president, 2) Get knowledge about the latest trend acknowledge the gracious financial Forest Lake Area Schools Board; and in curriculum reform in China, especial- support given by Confucius Institute Susan Tennyson, strategic data analyst, ly on core subjects of Chinese language, Headquarters/Hanban, and the excellent Edina Public Schools. Yongling Zhang- math, and science, which have implica- logistics planning by one of our Chinese Gorke, assistant director of CIUMN, tions for Mandarin immersion programs partners, Capital Normal University, as was the coordinator and leader of the Dragon Festival 2014, page 10 3) Understand and compare teacher well as International Education Asso- group.
[Show full text]
る中日理論言語学(英語名：Chinese and Japanese
Chinese and Japanese Theoretical Linguistics (CJTL) 第二回中日理論言語学研究会 2005 年 4 月 24 日（日） 14:00～17:30 関西学院大学大阪梅田キャンパス空間移動表現のタイポロジーと限界性 Christine Lamarre （東京大学言語情報科学専攻） [email protected] 問題提起その１ Talmy (2000:214) は S-言語（satellite-framed language）と V-言語(Verb-framed language)を区別するときに、前者の satellite が空間移動の経路のほかに、状態変化や実現を表示することを指摘している：the correlation between the encoding of the path in a motion event (‘the ball rolled in’) and the encoding of fulfillment in an event of realization (the police hunted the fugitive down’) or the encoding of the changed property in an event of state change (‘the candle blew out’) . これは中国語に関しても簡単にみつける関連性：空間移動の経路を表すいわゆる「方向補語」は、このような状態変化を表す「派生的用法」を豊富にもっていることがよく知られている（たとえば：Path --- 球滚进去了 / fulfillment -- 把犯人抓起来了 / changed property 他昏过去了）状態変化といえば、アスペクトのタイプとしては、「限界性」（boundedness）とリンクする。しかし Talmy が自分の提示するタイポロジーのカギが限界性にあるとは明記していない（読んだ限りでは）。そタイポロジーが有効であるとすれば、S-言語と V-言語の差を、S-言語の satellite という形式の存在にあるのか、あるいはその形式が動詞句限界性というアスペクト特徴を付与することがカギなのか？中国語のデータは後者を支持する。問題提起その２典型的な VO 言語であるタイ語とちがって、そして典型的な OV 言語である日本語とちがって、中国語では場所名詞を動詞の前と動詞の後ろという二つの位置に置くことができる。その位置は複数のパラメータによって決まるが、その一つは情報構造で、フォーカスは後ろという傾向がよく知られている。動詞後に置かれる用言的要素（方向補語）や「x＋場所詞」という体言的要素が動詞句に限界性を付与する傾向を認めるなら、これは中国語のこのかなり変わっている類型的特徴の束が整えた環境の中で発展した秩序であるかもしれない。しかし動詞後の位置と限界性を結びつけるのにいくつか問題がある。標準中国語のほかに、その結びつきがより顕著である地域語のデータを取り入れて検証する。まず中国語が空間移動を表すときに用いる素材を紹介した上（I、英文）、II では方向詞がなぜ本来動作の方向と縁のない「限界性」を帯びるようになるかを考えて、方向補語のいくつかの特徴を取り上げる。III では、中国語をほかの S-言語と対照させて、S-言語と V-言語の決定的な差はなにかという問題にもどる。日本語の資料（別紙）は主に中国語の方言データを補充するものである。 11 第二回中日理論言語学研究会 2005 年 4 月 24 日（日）
[Show full text]
The Linguistic Categorization of Deictic Direction in Chinese – with Reference to Japanese – Christine Lamarre
The linguistic categorization of deictic direction in Chinese – With reference to Japanese – Christine Lamarre To cite this version: Christine Lamarre. The linguistic categorization of deictic direction in Chinese – With reference to Japanese –. Dan XU. Space in Languages of China, Springer, pp.69-97, 2008, 978-1-4020-8320-4. hal-01382316 HAL Id: hal-01382316 https://hal-inalco.archives-ouvertes.fr/hal-01382316 Submitted on 16 Oct 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Lamarre, Christine. 2008. The linguistic categorization of deictic direction in Chinese — With reference to Japanese. In Dan XU (ed.) Space in languages of China: Cross-linguistic, synchronic and diachronic perspectives. Berlin/Heidelberg/New York: Springer, pp.69-97. THE LINGUISTIC CATEGORIZATION OF DEICTIC DIRECTION IN CHINESE —— WITH REFERENCE TO JAPANESE —— Christine Lamarre, University of Tokyo Abstract This paper discusses the linguistic categorization of deictic direction in Mandarin Chinese, with reference to Japanese. It focuses on the following question: to what extent should the prevalent bimorphemic (nondeictic + deictic) structure of Chinese directionals be linked to its typological features as a satellite-framed language? We know from other satellite-framed languages such as English, Hungarian, and Russian that this feature is not necessarily directly connected to satellite-framed patterns.
[Show full text]
The Development of 'Wang+Path' Adverbials in Northern Chinese
Breaking Down the Barriers, 887-909 2013-1-050-041-000372-1 When Lexicalization Meets Grammaticalization: The Development of ‘wang+path’ Adverbials * in Northern Chinese Christine Lamarre (柯理思) INALCO-CRLAO This paper discusses a productive compounding pattern, which combines preposition 往 WANG (wàng or wǎng) ‘towards’, with a path-expressing element, either a monosyllabic localizer, e.g. 往裡 wǎnglǐ ‘in’, or a path verb, e.g. 往回 wǎnghuí ‘back’ or 往起 wǎngqǐ ‘up’. These compounds, like directional complements, express the core path meanings of a spatial motion event (‘up’, ‘down’, ‘in’, ‘out’, ‘across’ etc.), but they appear before the verb, whereas directional complements follow the verb. Only the latter (e.g. 拿進去 nájinqu ‘take in’, 拿回去 náhuiqu ‘take back’ or 拿起來 náqilai ‘take up’) have a bounding effect on the clause. We argue that one important motivation for this lexicalization process is the need for two complete symmetrical sets of path- marking elements which share the same repertory of path meanings, but have different aspectual implications for the clause. The second type of compounds (WANG + directional verb) appears later in history than the first type (WANG + localizer), and fills up the gaps existing in the repertory of core path meanings provided by localizers, e.g. huí ‘back’, qǐ ‘up (source-oriented)’, guò ‘over’ etc. We also discuss the role of grammaticalization in this evolution, and conclude on a few typological perspectives. Key words: path of motion, directionals, localizers, Northern Mandarin, lexicali- zation, grammaticalization 1. Introduction We discuss here a productive compounding pattern, which combines preposition 往 WANG (wàng or wǎng) ‘towards’, with a path-expressing element (hereafter path), * We develop here some sections of a paper read at the 4th Kentridge roundtable on Chinese Linguistics on grammaticalization and lexicalization (National Singapore University, September 2008).
[Show full text]
Wild Food Plants and Wild Edible Fungi in Two Valleys of the Qinling Mountains (Shaanxi, Central China) Kang Et Al
JOURNAL OF ETHNOBIOLOGY AND ETHNOMEDICINE Wild food plants and wild edible fungi in two valleys of the Qinling Mountains (Shaanxi, central China) Kang et al. Kang et al. Journal of Ethnobiology and Ethnomedicine 2013, 9:26 http://www.ethnobiomed.com/content/9/1/26 Kang et al. Journal of Ethnobiology and Ethnomedicine 2013, 9:26 http://www.ethnobiomed.com/content/9/1/26 JOURNAL OF ETHNOBIOLOGY AND ETHNOMEDICINE RESEARCH Open Access Wild food plants and wild edible fungi in two valleys of the Qinling Mountains (Shaanxi, central China) Yongxiang Kang1, Łukasz Łuczaj2*, Jin Kang1 and Shijiao Zhang1 Abstract Background: The aim of the study was to investigate knowledge and use of wild food plants in two mountain valleys separated by Mount Taibai – the highest peak of northern China and one of its biodiversity hotspots, each adjacent to species-rich temperate forest vegetation. Methods: Seventy two free lists were collected among the inhabitants of two mountain valleys (36 in each). All the studied households are within walking distance of primary forest vegetation, however the valleys differed in access to urban centers: Houzhenzi is very isolated, and the Dali valley has easier access to the cities of central Shaanxi. Results: Altogether, 185 wild food plant species and 17 fungi folk taxa were mentioned. The mean number of freelisted wild foods was very high in Houzhenzi (mean 25) and slightly lower in Dali (mean 18). An average respondent listed many species of wild vegetables, a few wild fruits and very few fungi. Age and male gender had a positive but very low effect on the number of taxa listed.
[Show full text]
3Rd International Conference on Humanities
• 3rd International Conference on Humanities Education and Social Sciences (ICHESS 2020) Advances in Social Science, Education and Humanities Res earch Volume 496 Chengdu, China 23 – 25 October 2020 Part 1 of 2 Editors: S. Bridges M. F. bin A. Ghani J.H. Du ISBN: 978-1-7138-2492-3 Printed from e-media with permission by: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 Some format issues inherent in the e-media version may also appear in this print version. Copyright© (2020) by Atlantis Press All rights reserved. Copyright for individual electronic papers remains with the authors. For permission requests, please contact the publisher: Atlantis Press Amsterdam / Paris Email: [email protected] Conference Website: http://www.atlantis-press.com/php/pub.php?publication=ichess-20 Printed with permission by Curran Associates, Inc. (2021) Additional copies of this publication are available from: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 USA Phone: 845-758-0400 Fax: 845-758-2633 Email: [email protected] Web: www.proceedings.com TABLE OF CONTENTS PART 1 UNIVERSITY STUDENTS’ MENTAL HEALTH AND INNOVATION OF CRISIS INTERVENTION MECHANISM ....................................................................................................................... 1 Cheng Tang, Xiao Jiao Cao, Juan Zhang A PROBE INTO THE BASIC COMPUTER TEACHING MODEL OF APPLIED UNDERGRADUATES BASED ON MIXED TEACHING ................................................................................ 5 Tao Jin, Hai-Jun Wang APPLICATION RESEARCH OF MIXED TEACHING MODE OF HIGHER MATHEMATICS BASED ON “SPOC+ FLIPPED CLASSROOM” ............................................................................................. 11 Ruiping Huang, Lei Deng, Xinshe Qi, Guo Li, Cuicui Gao, Xin Wang, Na Wang AN ANALYSIS OF THE DEVELOPMENT PROSPECT ABOUT BEIJING FASHION EDUCATION: TAKE BEIJING INSTITUTE OF FASHION TECHNOLOGY AS AN EXAMPLE .............
[Show full text]
The Linguistic Encoding of Motion Events in Chinese -- with Reference to Cross-Dialectal Variation --1
The Linguistic Encoding of Motion Events in Chinese -- with Reference to Cross-dialectal Variation --1 Christine Lamarre (University of Tokyo, Language & Information Sciences) ABSTRACT / INTRODUCTION • Aims: Chinese has been classified by Talmy (1985:106-7) as a satellite-framed language (S-language), in which the satellites correspond to what are usually called ‘directional complements’ (hereafter designated ‘D’). However, as was pointed out by Tai 2003 and Lamarre 2003c, Chinese does not fit very well in this category, since apart from VD constructions, it also often uses Path verbs to express spatial motion. On the other hand, even in cases when a VD construction is used, Slobin (2004) proposed, on the ground that Chinese directionals are verbal elements and not particles, to put Chinese together with Thai in a third category, called ‘equipollently-framed languages’, where the co-event verb and the directional verb would be of equivalent morphosyntactic weight. We address here the issue of Chinese’s status in Talmy’s typology, with reference to Goldberg & Jackendoff’s discussion (2003) on English resultative constructions, and to Cappelle & Declerck’s discussion of temporal boundedness in English motion events. • Content: In section I we give an overview of the various grammatical categories used in Chinese to express motion. In section II we turn to the controversial VD construction [Co-event verb + Path verb], to examine various prosodic, semantic and syntactic pieces of evidence for the grammaticalization of Path Verbs into Directionals (i.e. Path Satellites in Talmy’s framework). Such evidence goes against Tai 2003 and Slobin 2004’s suggestions to treat Directionals as full verbs functioning on the same level as the co-event verb they follow.
[Show full text]
The Plural Forms of Personal Pronouns in Modern Chinese Baoying Qiu University of Massachusetts Amherst
University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2013 The plural forms of personal pronouns in Modern Chinese Baoying Qiu University of Massachusetts Amherst Follow this and additional works at: https://scholarworks.umass.edu/theses Part of the Chinese Studies Commons Qiu, Baoying, "The lurp al forms of personal pronouns in Modern Chinese" (2013). Masters Theses 1911 - February 2014. 1150. Retrieved from https://scholarworks.umass.edu/theses/1150 This thesis is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Masters Theses 1911 - February 2014 by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. THE PLURAL FORMS OF PERSONAL PRONOUNS IN MODERN CHINESE A Dissertation Presented By BAOYING QIU Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment Of the requirements for the degree of MASTER OF ARTS September 2013 Department of Languages, Literatures & Cultures Asian Languages & Literatures © Copyright by Baoying Qiu 2013 All Rights Reserved The Plural Forms of Personal Pronouns in Modern Chinese A Dissertation Presented By BAOYING QIU Approved as to style and content by: _________________________________ Zhongwei Shen, Chair __________________________________ David K. Schneider, Member __________________________________ Elena Suet-Ying Chiu, Member _________________________________ Amanda C. Seaman, Director Asian
[Show full text]
Globaltimit: Acoustic-Phonetic Datasets for the World's Languages
GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages Nattanun Chanchaochai 1, Christopher Cieri 1, Japhet Debrah 1, Hongwei Ding 2, Yue Jiang 3, Sishi Liao 2, Mark Liberman 1, Jonathan Wright 1, Jiahong Yuan 1, Juhong Zhan3, Yuqing Zhan 2 1Linguistic Data Consortium, University of Pennsylvania, U.S.A. 2School of Foreign Languages, Shanghai Jiao Tong University, P.R.C. 3School of Foreign Studies, Xi'an Jiaotong University, P.R.C. [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Although Mandarin Chinese is one of the largest and best- Abstract documented languages in the world, he was not able to find any Although the TIMIT acoustic-phonetic dataset ([1], [2]) was available resources meeting his needs. Many experiences of created three decades ago, it remains in wide use, with more this kind over the past few years have motivated us to find a than 20000 Google Scholar references, and more than 1000 simple and inexpensive approach to designing, implementing, since 2017. Despite TIMIT’s antiquity and relatively small size, and distributing TIMIT-like resources that could plausibly be inspection of these references shows that it is still used in many generalized across all of the world’s languages. research areas: speech recognition, speaker recognition, speech This paper describes our exploration of this method in half a synthesis, speech coding, speech enhancement, voice activity dozen test cases.
[Show full text]
Download Article
Advances in Social Science, Education and Humanities Research, volume 85 4th International Conference on Management Science, Education Technology, Arts, Social Science and Economics (MSETASSE 2016) Phonemes of Xi’an Dialect and Statistic Study of Phonemic Combination Frequency Yonghong Li a, Han Wu b Key Lab of China's National Linguistic Information Technology, Northwest University for Nationalities, Lanzhou, Gansu 730000, China; [email protected], [email protected] Keywords: Phoneme; Xi’an; dialect, frequency Abstract. By using 3000 semantic units of Xi’an dialect from Characters Collection of Chinese Dialects as original database, the article calculates the frequency of initials, finals and tones, and also conducts a frequency statistic of the combination types of initials, finals and tones. Based on this study, the article probes into the syllable distribution, syllabic combination and the role of rhyme, offers a quantitative analysis of function of each phoneme and their combination capacity so as to provide fundamental data for the further study of phonemic theory study of Xi’an dialect. 1. Introduction Phonemes are the smallest speech units to differentiate the semantic meanings in linguistic system. The number and type of phonemes vary from language to language. In Chinese-Tibetan languages, the syllabic form are all single characters with a single sound. The three major elements of speech system are initials, finals and tones. In these languages, some have more than 100 phonemes, some have about 20 – 30 phonemes. All these phonemes are very different from each other in terms of frequency, function and roles. Xi’an dialect belongs to Guanzhong dialect of Shaanxi, is the representative of Dongfu dialect of Guanzhong and that of Shaanxi dialect.
[Show full text]
Download Article
Advances in Social Science, Education and Humanities Research, volume 289 5th International Conference on Education, Language, Art and Inter-cultural Communication (ICELAIC 2018) The Evolution of Zhizhuangzhang Initial Consonants in Zhiyan Cluster of Jin Dialect* Feng Gao School of Literature Xi'an University Xi'an, China 710065 Abstract—The ancient Zhizhuangzhang initial consonants the modern pronunciation of Zhizhuangzhang Jing group in Zhiyan cluster of Jin dialect belong to Changxu type. The initial consonant in Chinese: concourse type, dichotomy type, modern pronunciation pattern of Zhizhaozhang groups in and trisection type. Zhiyan cluster dialect includes Zhiyan cluster is in line with the dialect of “Zhongyuan Zhierzhuang and Zhisanzhang, belonging to the dichotomy Yinyun”, which was formed at the latest in the Yuan Dynasty. type. Xiong Zhenghui (1990) pointed out that in the northern dialect, there are the Jinan type with integration of Keywords—ancient Zhizhuangzhang; modern pronunciation Zhizhuangzhang and the Nanjing type and Changxu type type; formation time; Zhiyan cluster; Jin dialect with Zhi'erzhuang and Zhisanzhang opposite to each other. [2] The relationship between the word and sound of I. INTRODUCTION Zhizhuangzhang in Zhiyan cluster is the same as that of Zhiyan cluster of Jin dialect is located in Yan'an City, Changxu type, so it belongs to the typical Changxu type. Shaanxi Province, including dialects of six counties: Wuqi, Zhidan, Ansai, Yan'an, Yanchang and Ganquan. [1] Zhiyan III. THE DISTRIBUTION SCOPE OF THE TYPICAL cluster is in the transitional zone between Jin dialect and CHANGXU TYPE official language of central China. It has transitional features The dichotomy type of Zhizhuangzhang is widely in many aspects of the dialect.
[Show full text]
Langues Sinitiques Et Typologie : Deux Études De Cas
Langues sinitiques et typologie : deux études de cas Christine Lamarre* INTRODUCTION La linguistique chinoise connaît depuis une trentaine d’années un regain d’intérêt pour la variation spatiale - et sa signification du point de vue de la caractérisation typologique - de ce qu’on appelle «le chinois», ou parfois les «langues sinitiques», définies comme suit par Alain Peyraube (2011) dans le Dictionnaire des langues (Bonvini, Busuttil & Peyraube 2011). Les langues sinitiques ou langues chinoises constituent l’une des deux branches de la famille des langues sino-tibétaines, l’autre branche étant le tibéto-birman. Comme taxon ou sous-groupe des langues, elles sont aussi diverses que les langues romanes ou germaniques de la famille indo-européenne. Ainsi, le cantonais et le mandarin, sous leurs formes parlées, ne sont pas du tout mutuellement compréhensibles, au même titre que le roumain et le portugais, ou l’anglais et l’allemand. Pour illustrer l’apport de ces recherches à la réflexion typologique, nous abordons ici deux domaines pour lesquels la prise en compte de la variation interne aux langues sinitiques peut avoir une conséquence pour la caractérisation typologique du chinois : 1) la morphologie, 2) l’expression du déplacement. Avant de rentrer dans le vif du sujet, il nous faut revenir sur la notion de «mandarin». La variation la plus évidente au sein des langues sinitiques est celle opposant le mandarin aux langues sinitiques parlées au sud-est de la Chine, comme le min, le yue, le hakka1 ou le wu. Le chinois standard, fondé sur le dialecte de Pékin, est généralement adopté comme représentant du groupe mandarin, c’est le cas par exemple du Dictionnaire des langues cité plus haut, ou encore de l’Atlas des structures des langues du monde (WALS, Haspelmath et al.
[Show full text]