Word Segmentation and Ambiguity in English and Chinese NLP & IR

by Jin Hu Huang, B.Eng.(Comp.Sc), Grad.Dip.(Soft.Eng.)
School of Computer Science, Engineering and Mathematics, Faculty of Science and Engineering
August 10, 2011

A thesis presented to the Flinders University of South Australia in total fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Adelaide, South Australia, 2012
© Jin Hu Huang, 2012

CONTENTS
Abstract
Certification
Acknowledgements
Preface
1. Introduction
   1.1 Context Sensitive Spelling Correction
   1.2 Chinese Pinyin Input
   1.3 Chinese Segmentation
   1.4 Chinese Information Retrieval (IR)
   1.5 Thesis Contribution
   1.6 Thesis Organization
Part I: Word Disambiguation for English Spelling Checking and Chinese Pinyin Input
2. Machine Learning for Context Sensitive Spelling Checking
   2.1 Introduction
   2.2 Confused Words
   2.3 Context-sensitive Spelling Correction
   2.4 Experiment and Result
   2.5 Interface
   2.6 Conclusion and Future Work
   2.7 Reflections
3. Statistical N-gram Language Modeling
   3.1 Statistical Language Modeling
   3.2 N-Gram Markov Language Models
   3.3 Smoothing Methods
      3.3.1 Add One Smoothing
      3.3.2 Interpolation Smoothing
      3.3.3 Absolute Discounting
      3.3.4 Good-Turing Discounting
      3.3.5 Katz Back-off Smoothing
      3.3.6 Witten-Bell Smoothing
      3.3.7 Kneser-Ney Smoothing
      3.3.8 Modified Kneser-Ney Smoothing
   3.4 Discussion
   3.5 Conclusion
4. Compression-based Adaptive Approach for Chinese Pinyin Input
   4.1 Introduction
   4.2 Statistical Language Modelling
      4.2.1 Pinyin-to-Character Conversion
      4.2.2 SLM Evaluation
   4.3 Compression Theory
   4.4 Adaptive Modelling
   4.5 Prediction by Partial Matching
   4.6 Experiment and Result
   4.7 Conclusion
5. Error-driven Adaptive Language Modeling for Pinyin-to-Character Conversion
   5.1 Introduction
   5.2 LM Adaption Methods
      5.2.1 MAP Methods
      5.2.2 Discriminative Training Methods
   5.3 Error-driven Adaption
   5.4 Experiment and Result
   5.5 Conclusion
Part II: Chinese Word Segmentation and Classification
6. Chinese Words and Chinese Word Segmentation
   6.1 Introduction
   6.2 The Definition of Chinese Word
   6.3 Chinese Word Segmentation
      6.3.1 Segmentation Ambiguity
      6.3.2 Unknown Words
   6.4 Segmentation Standards
   6.5 Current Research Work
   6.6 Conclusion
7. Chinese Word Segmentation Based on Contextual Entropy
   7.1 Introduction
   7.2 Contextual Entropy
   7.3 Algorithm
      7.3.1 Contextual Entropy
      7.3.2 Mutual Information
   7.4 Experiment Results
   7.5 Conclusion
   7.6 Reflections
8. Unsupervised Chinese Word Segmentation and Classification
   8.1 Introduction
   8.2 Word Classification
   8.3 Experiments and Future Work
   8.4 Conclusion
   8.5 Reflections
Part III: Chinese Information Retrieval
9. Using Suffix Arrays to Compute Statistical Information
   9.1 Suffix Trees
   9.2 Suffix Arrays
   9.3 Computing Term Frequency and Document Frequency
   9.4 Conclusion
10. N-gram based Approach for Chinese Information Retrieval
   10.1 Introduction
   10.2 Chinese Information Retrieval
      10.2.1 Single-character-based (Uni-gram) Indexing
      10.2.2 Multi-character-based (N-grams) Indexing
      10.2.3 Word-based Indexing
      10.2.4 Previous Works
   10.3 Retrieval Models
      10.3.1 Vector Space Model
      10.3.2 Term Weighting
      10.3.3 Query and Document Similarity
      10.3.4 Evaluation
   10.4 Experimental Setup
      10.4.1 TREC Data
      10.4.2 Measuring Retrieval Performance
   10.5 Experiments and Discussion
      10.5.1 Using Dictionary-based Approach
      10.5.2 Statistical Segmentation Approach
      10.5.3 Using Different N-grams
      10.5.4 Word Extraction
      10.5.5 Removing Stop Words
   10.6 Discussion
   10.7 Conclusion
11. Conclusions
   11.1 Thesis Review
   11.2 Future Work
Appendix
A. Tables for TREC 5 & 6 Chinese Information Retrieval Results
B. Examples of TREC 5 & 6 Chinese Queries
Bibliography

LIST OF FIGURES
1.1 Number of Homonyms for Each Pinyin
5.1 Perplexity comparison between static and adaptive models on Modern Novel
5.2 Perplexity comparison between static and adaptive models on Martial Arts Novel
5.3 Perplexity comparison between static and adaptive models on People's Daily 96
7.1 Contextual Entropy and Mutual Information for “The two world wars happened this century had brought great disasters to human being including China.”
10.1 Average Precision at Different Recall for Dictionary-based and Statistical Segmentation Approaches
10.2 Average Precision at X Documents Retrieved for Dictionary-based and Statistical Segmentation Approaches
10.3 Average Precision for 54 Queries Using 1-grams, 2-grams, 3-grams and 4-grams
10.4 The Impact of Extracted Words on 54 Queries
10.5 The Impact of Stop Words on 54 Queries

LIST OF TABLES
2.1 Diameter with occurrences, significance and probability - number of contexts (WSJ 87-89, 91-92)
2.2 Relationship between probability and significance - number of contexts (WSJ 87-89, 91-92)
2.3 False errors and coverage testing on test and validation corpora with no errors seeded but two real errors found
2.4 True errors detected (recall) and corrected when errors seeded randomly
2.5 Seeded errors of the confusion set of “from” and “form” (S, P ≥ 95%)
2.6 False positive rate (FPR), true positive rate (TPR) and informedness when errors seeded
2.7 Spelling Errors Found in WSJ0801 and WSJ1231
4.1 Compression models for the string “dealornodeal”
4.2 PPM model after processing the string “dealornodeal”
4.3 Compression results for different compression methods
4.4 Character Error Rates for Kneser-Ney, Static and Adaptive PPM
5.1 Witten-Bell smoothing model after processing the string “dealornodeal”
5.2 Comparing perplexity and CER using different smoothing methods on testing corpus
5.3 CER and percentage of data used for adaption
5.4 Testing on Xinhua 96 with different mixed models with adaption
6.1 Some differences between the segmentation standards
7.1 Validation results based on Recall, Precision and F-Measure for Eqs. 7.1, 7.2, 7.3 and 7.4
7.2 Validation results based on Recall, Precision and F-Measure for Eqs. 7.5, 7.6, 7.7 and 7.8
7.3 Validation results on Recall, Precision and F-measure according to Eqs. 7.9, 7.10 and 7.11
7.4.
Recommended publications
  • Research on the Time When Ping Split Into Yin and Yang in Chinese Northern Dialect
    Chinese Studies 2014. Vol.3, No.1, 19-23 Published Online February 2014 in SciRes (http://www.scirp.org/journal/chnstd) http://dx.doi.org/10.4236/chnstd.2014.31005 Research on the Time When Ping Split into Yin and Yang in Chinese Northern Dialect Ma Chuandong1*, Tan Lunhua2 1College of Fundamental Education, Sichuan Normal University, Chengdu, China 2Sichuan Science and Technology University for Employees, Chengdu, China Received January 7th, 2014; revised February 8th, 2014; accepted February 18th, 2014 Copyright © 2014 Ma Chuandong, Tan Lunhua. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In accordance with the Creative Commons Attribution License all Copyrights © 2014 are reserved for SCIRP and the owner of the intellectual property Ma Chuandong, Tan Lunhua. All Copyright © 2014 are guarded by law and by SCIRP as a guardian. The phonetic phenomenon “ping split into yin and yang” 平分阴阳 is one of the most important changes of Chinese tones in early modern Chinese, which is reflected clearly in Zhongyuan Yinyun 中原音韵 by Zhou Deqing 周德清 (1277-1356) in the Yuan Dynasty. The authors of this paper think the phenomenon “ping split into yin and yang” should not have occurred so late as in the Yuan Dynasty, based on previous research results and modern Chinese dialects, making use of historical comparative method and rhyming books. The changes of tones have close relationship with the voiced and voiceless initials in Chinese, and the voiced initials have turned into voiceless in Song Dynasty, so it could not be in the Yuan Dynasty that ping split into yin and yang, but no later than the Song Dynasty.
  • The Fundamentals of Chinese Historical Phonology
    ChinHistPhon – MA 1st yr Basics/ 1 Bartos The fundamentals of Chinese historical phonology 1. Old Mandarin (early modern Chinese; 14th c.) − 中原音韵 Zhongyuan Yinyun “Rhymes of the Central Plain”, written in 1324 by 周德清 Zhou Deqing: a pronunciation guide for writers and performers of 北曲 beiqu-verse in vernacular plays. − Arrangement: 19 rhyme categories, each named with two characters, e.g. 真文, 江阳, 先天, 鱼模; within each rhyme category, words are divided according to tone category (平声阴, 平声阳, 上声, 去声, 入声作 X 声); within each tone, words are divided into homophone groups separated by circles; an appendix lists pairs of characters whose pronunciation is frequently confused, e.g. 死有史, 米有美, 因有英. Old Mandarin tones: The tone categories were the same as for modern standard Mandarin, except: − The former 入-tone words joined the other tone categories in a more regular fashion. 2. The reconstruction of the Middle Chinese sound system 2.1. Main sources 2.1.1. Primary – rhyme dictionaries, rhyme tables (Qieyun 切韵, Guangyun 广韵, …, Jiyun 集韵, Yunjing 韵镜, Qiyinlüe 七音略) – a comparison of modern Chinese dialects – shape of Chinese loanwords in ‘sinoxenic’ languages (Japanese, Korean, Vietnamese) 2.1.2. Secondary – use of poetic devices (rhyming words, metric (= tonal patterns)) – transcriptions: contemporary alphabetic transcription of Chinese names/words, e.g. Brahmi, Tibetan, …; contemporary Chinese transcription of foreign words/names of known origin – the content problem of the Qieyun: Is it some ‘reconstructed’ pre-Tang variety, or the language of the capital (Chang’an 长安), or a newly created norm, based on certain ‘compromises’? – the classic problem of ‘time-span’: Qieyun: 601 … Yunjing: 1161 → Pulleyblank: the Qieyun and the rhyme tables (等韵图) reflect different varieties (both geographically, and diachronically) → Early vs.
  • Norman 1988 Chapter 3.Pdf
    3 The Chinese script 3.1 The beginnings of Chinese writing The Chinese script appears as a fully developed writing system in the late Shang dynasty (fourteenth to eleventh centuries BC). From this period we have copious examples of the script inscribed or written on bones and tortoise shells, for the most part in the form of short divinatory texts. From the same period there also exist a number of inscriptions on bronze vessels of various sorts. The former type of graphic record is referred to as the oracle bone script while the latter is commonly known as the bronze script. The script of this period is already a fully developed writing system capable of recording the contemporary Chinese language in a complete and unambiguous manner. The maturity of this early script has suggested to many scholars that it must have passed through a fairly long period of development before reaching this stage, but the few examples of writing which precede the fourteenth century are unfortunately too sparse to allow any sort of reconstruction of this development. On the basis of available evidence, however, it would not be unreasonable to assume that Chinese writing began sometime in the early Shang or even somewhat earlier in the late Xia dynasty or approximately in the seventeenth century BC (Qiu 1978, 169). From the very beginning the Chinese writing system has basically been morphemic: that is, almost every graph represents a single morpheme.
  • INFORMATION to USERS the Most Advanced Technology Has Been Used to Photograph and Reproduce This Manuscript from the Microfilm Master
    INFORMATION TO USERS The most advanced technology has been used to photograph and reproduce this manuscript from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book. These are also available as one exposure on a standard 35mm slide or as a 17" x 23" black and white photographic print for an additional charge. Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order. UMI University Microfilms International A Bell & Howell Information Company 300 North Zeeb Road.
  • A Study of the Standardization of Chinese Writing/ Ying Wang University of Massachusetts Amherst
    University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2008 A study of the standardization of Chinese writing/ Ying Wang University of Massachusetts Amherst Follow this and additional works at: https://scholarworks.umass.edu/theses Wang, Ying, "A study of the standardization of Chinese writing/" (2008). Masters Theses 1911 - February 2014. 2060. Retrieved from https://scholarworks.umass.edu/theses/2060 This thesis is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Masters Theses 1911 - February 2014 by an authorized administrator of ScholarWorks@UMass Amherst. A STUDY OF THE STANDARDIZATION OF CHINESE WRITING A Thesis Presented by YING WANG Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of MASTER OF ARTS May 2008 Asian Languages and Literatures © Copyright by Ying Wang All Rights Reserved STUDIES OF THE STANDARDIZATION OF CHINESE WRITING A Thesis Presented by YING WANG Approved as to style and content by: Zhongwei Shen, Chair; Donald E. Gjertson, Member; Enhua Zhang, Member; Zhongwei Shen, Director, Asian Languages and Literatures Program, Department of Languages, Literatures and Cultures; Julie Caii s, Chair, Department of Languages, Literatures and Cultures ACKNOWLEDGEMENTS I would like to earnestly thank my advisor, Professor Zhongwei Shen, for his helpful, patient guidance and support in all the stages of my thesis writing. Thanks are also due to my committee members Professor Donald Gjertson and Professor Enhua Zhang, for their generous help. My friends, Mathew Flannery and Charlotte Mason, have also edited my thesis at various stages, and to them I am truly grateful.
  • November 2008 Editorial Paris
    November 2008 Editorial On November 29 and 30, the first joint workshop of the École française d’Extrême-Orient and the Electronic Cultural Atlas Initiative (ECAI–University of California at Berkeley) on geographic information systems (GIS) and online databases will be held at the EFEO Center in Siem Reap. The objective of this workshop is to gather EFEO researchers and their collaborators together with specialists from the ECAI in order to present work in progress, to compare experiences, and to discuss and explore possibilities and perspectives for development and collaboration in the field of GIS and databases. Special attention will be paid to the EFEO/Agence Nationale de la Recherche (ANR) project L’espace khmer ancien: construction d’un corpus numérique de données archéologiques et épigraphiques [Ancient Khmer Territory: constructing a digital corpus of archaeological and epigraphic data]. Paris Colloquia, missions, and meetings At the annual meeting of the American Academy of Religion–AAR, November 1-3 in Chicago, Franciscus Verellen, Director, will take part in a round table entitled “A Conversation with Franciscus Verellen and Stephen F. Teiser [holder of the D.T. Suzuki chair in Buddhist studies at Princeton University],” to be chaired by James Robson (Harvard University). He will also be respondent to a panel on “Chronicling the Dao: A Critical Appraisal of Kristofer Schipper and Franciscus Verellen’s The Taoist Canon: A Historical Companion to the Daozang (University of Chicago Press, 2005),” to be chaired by Jonathan Herman (University of Georgia). From November 24 to 27, Franciscus Verellen will be at the EFEO Center in Hanoi where he will take part in the launch of the publication Champa and the Archaeology of My Son (Vietnam), edited by Andrew Hardy, Mauro Cucarzi, and Patrizia Zolese (Singapore, NUS Press, 2008).
  • The Plural Forms of Personal Pronouns in Modern Chinese Baoying Qiu University of Massachusetts Amherst
    University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2013 The plural forms of personal pronouns in Modern Chinese Baoying Qiu University of Massachusetts Amherst Follow this and additional works at: https://scholarworks.umass.edu/theses Part of the Chinese Studies Commons Qiu, Baoying, "The plural forms of personal pronouns in Modern Chinese" (2013). Masters Theses 1911 - February 2014. 1150. Retrieved from https://scholarworks.umass.edu/theses/1150 This thesis is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Masters Theses 1911 - February 2014 by an authorized administrator of ScholarWorks@UMass Amherst. THE PLURAL FORMS OF PERSONAL PRONOUNS IN MODERN CHINESE A Dissertation Presented By BAOYING QIU Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of MASTER OF ARTS September 2013 Department of Languages, Literatures & Cultures Asian Languages & Literatures © Copyright by Baoying Qiu 2013 All Rights Reserved The Plural Forms of Personal Pronouns in Modern Chinese A Dissertation Presented By BAOYING QIU Approved as to style and content by: Zhongwei Shen, Chair; David K. Schneider, Member; Elena Suet-Ying Chiu, Member; Amanda C. Seaman, Director Asian
  • On Chinese Traditional Linguistics Philology Or Linguistics
    2nd International Conference on Education, Language, Art and Intercultural Communication (ICELAIC 2015) On Chinese Traditional Linguistics Philology or Linguistics Haiyan Li School of Nationalities Huanghe Science and Technology College Zhengzhou, China Abstract—For a long time, different people hold different view for whether Chinese traditional linguistics is philology or linguistics. This paper analyzes language research in various periods of China. It is more practical that Chinese traditional linguistics belongs to philology. Meanwhile, it can be seen that Chinese traditional philology has laid a solid foundation for generation and development of Chinese linguistics. Keywords—linguistics; philology I. INTRODUCTION With a long history, Chinese language research has begun to sprout as early as the Spring and Autumn Period and the Warring States Period, such as discussion on language issues by pre-Qin thinkers, debate on “name” and “actuality” in the Spring and Autumn Period and the Warring States Period, Shuo Wen Jie Zi written by Xushen from the … proposed that, “linguistics has been regarded as an independent subject since Han Dynasty.” Pu Zhizhen believes that ancient Chinese language research belongs to linguistics. Then, ancient Chinese language research is on earth philology or linguistics? Which one is more practical? Through comprehensive opinions of scholars and combined with the author’s experience, here, I talk about my immature views. I believe that, to determine nature of ancient Chinese language research, firstly we must clarify difference between linguistics and philology. Linguistics is an independent science taking language as special research object, whose basic task is to research language law and make people understand conceptual knowledge of language. This research result can get scientific, systematic and comprehensive language theory.
  • Study on the Lexicographical Value of "Juyan Xinjian"
    Advances in Social Science, Education and Humanities Research, volume 233 3rd International Conference on Contemporary Education, Social Sciences and Humanities (ICCESSH 2018) Study on the Lexicographical Value of "Juyan Xinjian"* Hongli Ge Faculty of Liberal Arts Northwest University Xi'an, China 710127 Abstract—The value of unearthed literature is reflected in the addition of missing words in dictionaries. Also, it would sort out historical inheritance and changes between certain words and written records. It provides the reference for writing the complete lexicography. And it can also provide some reference for today's reform of writing system. Keywords—Juyan Xinjian; lexicography; reform of writing system I. INTRODUCTION The lexicographical value of the unearthed literature is embodied in many aspects such as adding meaning of a word or an item, pre-documentary evidence, supplement to … and Han Dynasties also verified this point. For example, in the bamboo slip of Qin tombs of Shuihudi—The way of being officials, the officials should control the emotions such as anger and pleasure, sadness, intelligence and foolishness. They should encourage themselves with braveness, softness and kindness. (怒能喜,乐能哀,智能愚,壮能衰,恿能屈,刚能柔,仁能忍.) In Han Bamboo Slips of Yinqueshan—Master Sun's Art of War, "the chaos was born in the rule. The timidity was born in braveness, and the weakness was born in the mightiness" (乱生于治,胁(怯)生于恿,弱生于强). However, this word-usage situation has changed in the Han Dynasty. For example, "恿" (brave) and "勇" (brave) appeared in "Juyan Xinjian". And the word "恿" (brave) appeared three times. However, it was not used as another form of "勇" (brave).
  • “Regularities” and “Irregularities” in Chinese Historical Phonology
    “Regularities” and “irregularities” in Chinese historical phonology Tianrang (Quain) Bu Honors Thesis Department of Anthropology Oberlin College April 2018 Advisor: Jason Haugen ABSTRACT With a combination of methodologies from Western and Chinese traditional historical linguistics, this thesis is an attempt to survey and synthetically analyze the major sound changes in Chinese phonological history. It addresses two hypotheses – the Neogrammarian regularity hypothesis and the unidirectionality hypothesis – and tries to question their validity and applicability. Drawing from fourteen types of “regular” and “irregular” processes, the thesis argues that the origins and impetuses of sound change are far from just phonetic environment (“regular” changes) and lexical diffusion (“irregular” changes), and that sound change is not unidirectional because of the existence and significance of fortifying and bi/multidirectional changes. The thesis also examines the sociopolitical aspect of sound change through the discussion of language changes resulting from social, geographical and historical factors, suggesting that the study of sound change should be more interdisciplinary and miscellaneous in order to explain the phenomena more thoroughly and reach a better understanding of how human languages function both synchronically and diachronically. KEY WORDS: Chinese, historical, phonology, sound change Table of contents: List of abbreviations and keys; Index of tables and figures; 1. Introduction; 2. Backgrounds; 2.1. Overview of historical linguistics; 2.1.1. A brief history of historical linguistics; 2.1.2. Neogrammarian regularity hypothesis and the comparative method; 2.1.3. Unidirectionality hypothesis and its application in phonology; 2.2. Overview of historical Chinese phonology; 2.2.1.
  • Journal of Chinese Linguistics
    JOURNAL OF CHINESE LINGUISTICS VOLUME 24, NUMBER 2 JUNE 1996 EDITED BY WILLIAM S-Y. WANG MATTHEW Y. CHEN TSU-LIN MEI CHIN-CHUAN CHENG ALAIN PEYRAUBE CHU-REN HUANG ZHONGWEI SHEN SHU-XIANG LYU JAMES H-Y. TAI OVID J.L. TZENG PALATALIZATION OF OLD CHINESE VELARS* Axel Schuessler Wartburg College ABSTRACT Qieyun system (QYS) palatal initials which are suspected of an Old Chinese (OC) velar origin are of two types: (1) Type I palatals occur in certain syllables with front vowels which are subject to the chongniu phenomenon: palatalized OC velars are in complementary distribution with chongniu division ("grade") 4 syllables. Therefore, such palatals can be reconstructed as ordinary velars in OC, followed by whatever gave rise to QYS div. 4 chongniu medial and/or vocalism, e.g. 1i zhi. < OC *ke. (2) Type II is the QYS initial tshj which goes back to some initial cluster involving a velar and *l, with any vowel, e.g. Ill cbuiin < OC *k'lun (?). I. INTRODUCTION Words which are reconstructed in the Qieyun system (QYS) with initial palatals such as tSj, tshj and fj (Karlgren as amended by Li 1971) are generally thought to derive from, or be phonetically close to, Old Chinese (OC) initial dental stop consonants because they alternate in phonetic series quite regularly with the QYS initial dentals t, th, d, and the supradentals tj- \hj. 9j. For example, il!t QYS ZjWJ is used as a phonetic element in the word tAl) 11', hence these two initials have at some time probably been close phonetically. Therefore, Karlgren reconstructed OC *d]ang for the former, and OC *tAng for the latter, while Li disregarded non-contrasting features and set up OC *djang and *tang respectively.
  • The Use of Embodied Animation for Beginning Learners of Chinese Characters
    The Effect of Instructional Embodiment Designs on Chinese Language Learning: The Use of Embodied Animation for Beginning Learners of Chinese Characters Ming-Tsan Pierre Lu Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2011 © 2011 Ming-Tsan Pierre Lu ALL RIGHTS RESERVED ABSTRACT The Effect of Instructional Embodiment Designs on Chinese Language Learning: The Use of Embodied Animation for Beginning Learners of Chinese Characters Ming-Tsan Pierre Lu The focus of this study was an investigation of the effects of embodied animation on the retention outcomes of Chinese character learning (CCL) for beginning learners of Chinese as a Foreign Language (CFL). Chinese characters have three main features: semantic meaning, pronunciation, and written form. Chinese characters are different from English words in that they are non-alphabetic orthographies. Though popular, they are deemed very hard to learn. However, Chinese character processing is found to be neurologically related to human body movements, or at least the imagination of them. Literature also indicated the importance of embodied cognition, imagination, and technology use in human language memory and learning. The design of embodied animation for a computer-based CCL program is developed which consists of three types of characters. The study used Between-Subject Post-test Only Control Group experimental design with sixty-nine adults. The study compared five learning conditions: embodied animation learning (EAL), human-image animation learning (HAL), object-image animation learning, no-animation etymology learning, and traditional learning (serving as a control group).