CASS-LING’s Linguistic Infrastructure: Resources, Platforms and Services

Wei WANG, Aijun LI and Danqing LIU
Institute of Linguistics, Chinese Academy of Social Sciences

Abstract

As the highest academic institution of linguistic research in the People’s Republic of China, the Institute of Linguistics of the Chinese Academy of Social Sciences (CASS-LING) has collected, compiled and created many language resources and established dedicated platforms to provide varied services, based on solid academic studies, to a society of high linguistic diversity. Such resources, platforms and services include: China’s most authoritative dictionaries, the Contemporary Chinese Dictionary and the Xinhua Dictionary, the latter being the world’s most popular dictionary according to Guinness World Records; the 41-volume Dictionaries of Contemporary Chinese Dialects and the 40-volume Speech Archives of Contemporary Chinese Dialects; a supportive online system for dictionary compilation; the Database of the Grammars of the Chinese Dialects, with data from 10 dialects; a benchmark pronunciation model of native Putonghua speakers based on speech data collected from more than 4,000 children; a visual 3D pronunciation model for English phonetic learning based on more than 10,000 hours of speech recordings of English learners across major Chinese dialects as well as languages of non-Han ethnic groups; and the state government’s examination and standardization of the pronunciation of characters and words.

Multimodal Corpus of Life Course from Womb to Tomb

• CASS-Child Corpus
• Spoken words of 4,000 children (1.5-6 y)
• Multimodal Corpus of Children from Newborns to ASD
• Language development
• CASS-Aging Corpus
• Multimodal Corpus of Aging from Old-Old to Oldest Old
• Language degradation
• Atypical aging
• Spoken discourses in real interaction
• Discourse-CASS with rich annotation
• 2,000 dialogues
• 3,000 hours of speech by 2,000 speakers from 10 dialectal regions
• Multimodal Corpus of Criminal Suspects

Corpus for L2 learners: AESOP-CASS, 10,000 hours of English speech

[Figure: map of China labeled with dialect groups (Mandarin, Jin, Wu, Hui, Xiang, Min, Yue, Pinghua), minority languages (Korean, Uyghur, Mongolian, Tibetan) and places (Beijing, Tianjin, Xi’an, Shandong, Taiyuan, Datong, Pingyao, Yuncheng, Shanghai, Hangzhou, Ningbo, Zhenjiang, Changsha, Shuangfeng, Xiamen, Fuzhou)]

Population Distribution of Languages and Dialects in China

Language Atlas of China

Ninety-five percent of the population speaks Chinese dialects, which fall into ten major groups: Mandarin, Jin, Wu, Hui, Xiang, Gan, Min, Yue, Hakka and Pinghua. The languages of non-Han ethnic groups are spoken by 5% of the population; Zhuang has the largest number of speakers and Hezhen the smallest.

Language Diversity: The Flourishing of Linguistic Typology Studies

• A Handbook for Grammatical Investigation and Research
• The Database of the Grammars of the Chinese Dialects

Xinhua Dictionary and Contemporary Chinese Dictionary in New China’s Cultural Development

Illiteracy rate when New China was founded in 1949: urban areas 0.8 (80%); rural areas 0.95 (95%)

Xinhua Dictionary

First released in December 1953, the Xinhua Dictionary is named after the Chinese word 新华, meaning 'a new China'. Shouldering the task of setting rules for contemporary Chinese and eliminating illiteracy in what was then an old rural country, it has been compared to a jack that lifted the whole country’s new cause of culture and education. As of July 28, 2015, its global circulation had reached 567 million copies, earning two Guinness World Records as the “most popular dictionary” and the “best-selling book (regularly updated)”. Following the Chinese-English edition published in 2013, versions of the dictionary in languages of China's non-Han ethnic groups will also be available soon, with the Uyghur and Kazakh ones being the first two.

Contemporary Chinese Dictionary

To date, a total of 70 million copies of the dictionary have been sold. While it has fulfilled its task of setting up lexical rules and enhancing the education of the nation's common language, it has also played a key role in helping overseas Chinese to learn and use the language and to identify with their motherland.

Translation of the Chinese-English bilingual version of the Contemporary Chinese Dictionary is almost finished, and translation of versions in Georgian, Arabic, Russian, Spanish and Persian will start soon.

The current global trend of digitization is leading our society into the era of Internet+ and artificial intelligence. The computer and smartphone applications of the Xinhua Dictionary and the Contemporary Chinese Dictionary were released in 2017 and 2019, respectively, to better meet our readers’ requirements for online language use.

The 40-volume Speech Archives of Contemporary Chinese Dialects

Covers 10 dialects from the Mandarin, Xiang, Gan, Hakka, Wu and Yue groups, with 10 more to come.

The 41-volume Dictionaries of Contemporary Chinese Dialects

Platforms and Apps

• Xinhua Dictionary (新华字典)
• Contemporary Chinese Dictionary (现代汉语词典)
• WeChat Applet JIUZHOU YINJI (九州音集): a network platform for collecting and demonstrating speech sounds from around the world
• 3D Pronunciation Training Platform

Linguistic and Cognitive Studies on Children and Aged People

• Online and offline tests for phonetic and phonological development, based on the pronunciation norm
• Phonetic and Phonologic Development Tests (picture books for 1.5-6-year-olds)

[Figure labels: Language development; Speech therapy and rehabilitation; Speech pathology]