Chinese Classifiers (Measure Words): a Phenomenon That Is Hard to Translate
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Noun Group and Verb Group Identification for Hindi
Noun Group and Verb Group Identification for Hindi Smriti Singh1, Om P. Damani2, Vaijayanthi M. Sarma2 (1) Insideview Technologies (India) Pvt. Ltd., Hyderabad (2) Indian Institute of Technology Bombay, Mumbai, India ABSTRACT We present algorithms for identifying Hindi Noun Groups and Verb Groups in a given text by using morphotactical constraints and sequencing that apply to the constituents of these groups. We provide a detailed repertoire of the grammatical categories and their markers and an account of their arrangement. The main motivation behind this work on word group identification is to improve the Hindi POS Tagger’s performance by including strictly contextual rules. Our experiments show that the introduction of group identification rules results in improved accuracy of the tagger and in the resolution of several POS ambiguities. The analysis and implementation methods discussed here can be applied straightforwardly to other Indian languages. The linguistic features exploited here are drawn from a range of well-understood grammatical features and are not peculiar to Hindi alone. KEYWORDS: POS tagging, chunking, noun group, verb group. Proceedings of COLING 2012: Technical Papers, pages 2491–2506, COLING 2012, Mumbai, December 2012. 1 Introduction Chunking (local word grouping) is often employed to reduce the computational effort at the level of parsing by assigning partial structure to a sentence. A typical chunk, as defined by Abney (1994:257), consists of a single content word surrounded by a constellation of function words, matching a fixed template. Chunks, in computational terms, are considered the truncated versions of typical phrase-structure grammar phrases that do not include arguments or adjuncts (Grover and Tobin 2006). -
Linguistic Analysis of Chinese Verb Compounds and Measure Words to Cultural Values
Intercultural Communication Studies VIII-1 1998-9 Chu-hsia Wu [Special phonetic symbols do not appear in the online version] Linguistic Analysis of Chinese Verb Compounds and Measure Words to Cultural Values Chu-hsia Wu National Cheng Kung University, Taiwan Abstract Languages derived from different families show different morphological and syntactic structures, therefore, reflect different flavors in the sense of meaning. Inflections and derivatives do more work in Western languages than is asked of them in Chinese. However, the formation of verb compounds allows Chinese to attain the enlargement of lexicon. On the contrary, measure words do more work in Chinese but there is no exact equivalent of Chinese measure words in English. In this study, the Chinese verb compounds such as verb-object compound, cause-result compound, synonymous, and reduplication to cultural values will be discussed first. Measure Words to Cultural Symbolism will be rendered next from two aspects, pictorial symbolism and the characteristics to distinguish noun homophones. Introduction Languages derived from different families show different morphological and syntactic structures, therefore, reflect different flavors in the sense of meaning. Inflections and derivatives do more work in Western languages than is asked of them in Chinese. However, the formation of verb compounds allows Chinese to attain the enlargement of lexicon. On the contrary, measure words do more work in Chinese but there is no exact equivalent of Chinese measure words in English. Therefore, the purpose of this paper will be focused on the formation of verb compounds and the use of measure words relating to their corresponding meanings in Chinese respectively. -
Quantifying Constructions in English and Chinese a Corpus-Based Contrastive Study
Quantifying Constructions in English and Chinese A Corpus-Based Contrastive Study Tony McEnery1 and Richard Xiao1 Abstract Quantifiers are a linguistic concept that mirrors quantity in reality. They indicate ‘how many’ or ‘how much’, for example, the number of entities denoted by a noun, the count of actions or events, the length of time, and the distance in space. All human languages have linguistic devices that express such ideas, though the encoding of natural language semantics can vary from language to language. This paper compares quantifying constructions in English and Chinese on the basis of comparable corpora of spoken and written data in the two languages. We will focus on classifiers in Chinese and their counterparts in English, as well as the interaction between quantifying constructions and progressives, which is normally ruled out by aspect theory, with the aim of addressing the following research questions: • What linguistic devices are used in Chinese and English for quantification? • How different (or similar) are classifiers in Chinese as a classifier language and in English as a non-classifier language? • Can quantifiers interact with progressives in English and Chinese if such interactions are theoretically ruled out by aspect theory? Before these research questions are explored in detail, it is appropriate to first present the principal data used in this study, which includes two written corpora and two spoken corpora. The Freiburg-LOB (FLOB) corpus is a recent update of LOB, which is composed of approximately one million tokens of written British English sampled proportionally from fifteen text categories published in the early 1990s (Hundt et al. -
Tagging Guidelines for BOLT Chinese-English Word Alignment
Tagging Guidelines for BOLT Chinese-English Word Alignment Version 2.0 – 4/10/2014 Linguistic Data Consortium Created by: Xuansong Li With contributions from: Niyu Ge, Stephanie Strassel Table of Contents: 1 Introduction; 2 Types of links; 2.1 Semantic links; 2.2 Function links; 2.3 DE-clause links; 2.4 DE-modifier links; 2.5 DE-possessive links; 2.6 Grammatical inference semantic links; 2.7 Grammatical inference function links; 2.8 Contextual inference link; 3 Types of tags -
Chinese: Parts of Speech
Chinese: Parts of Speech Candice Chi-Hang Cheung 1. Introduction Whether Chinese has the same parts of speech (or categories) as the Indo-European languages has been the subject of much debate. In particular, while it is generally recognized that Chinese makes a distinction between nouns and verbs, scholars’ opinions differ on the rest of the categories (see Chao 1968, Li and Thompson 1981, Zhu 1982, Xing and Ma 1992, inter alia). These differences in opinion are due partly to the scholars’ different theoretical backgrounds and partly to the use of different terminological conventions. As a result, scholars use different criteria for classifying words and different terminological conventions for labeling the categories. To address the question of whether Chinese possesses the same categories as the Indo-European languages, I will make reference to the familiar categories of the Indo-European languages whenever possible. In this chapter, I offer a comprehensive survey of the major categories in Chinese, aiming to establish the set of categories that are found both in Chinese and in the Indo-European languages, and those that are found only in Chinese. In particular, I examine the characteristic features of the major categories in Chinese and discuss in what ways they are similar to and different from the major categories in the Indo-European languages. Furthermore, I review the factors that contribute to the long-standing debate over the categorial status of adjectives, prepositions and localizers in Chinese. 2. Categories found both in Chinese and in the Indo-European languages This section introduces the categories that are found both in Chinese and in the Indo-European languages: nouns, verbs, adjectives, adverbs, prepositions and conjunctions. -
Classifiers ≠ Determiners Yicheng Wu Adams Bodomo
REMARKS AND REPLIES Classifiers ≠ Determiners Yicheng Wu Adams Bodomo Cheng and Sybesma (1999, 2005) argue that classifiers in Chinese are equivalent to a definite article. We argue against this position on empirical grounds, drawing attention to the fact that semantically, syntactically, and functionally, Chinese classifiers are not on the same footing as definite determiners. We also show that compared with Cheng and Sybesma’s ClP analysis of Chinese NPs (in particular, Cantonese NPs, on which their proposal crucially relies), a consistent DP analysis is not only fully justified but strongly supported. Keywords: classifiers, open class, definite determiners, closed class, Mandarin, Cantonese 1 Introduction While it is often proposed that the category DP exists not only in languages with determiners such as English but also in languages without determiners such as Chinese (see, e.g., Pan 1990, Tang 1990a,b, Li 1998, 1999, Cheng and Sybesma 1999, 2005, Simpson 2001, 2005, Simpson and Wu 2002, Wu 2004), there seems to be no consensus about which element (if any) in Chinese is the possible counterpart of a definite determiner like the in English. In their influential 1999 article with special reference to Mandarin and Cantonese, Cheng and Sybesma (hereafter C&S) declare that “both languages have the equivalent of a definite article, namely, classifiers” (p. 522). Their treatment of Chinese classifiers as the counterpart of definite determiners is based on the following arguments: (a) both can serve the individualizing/singularizing function; (b) both can serve the deictic function. These arguments and the conclusion drawn from them have been incorporated into C&S 2005, C&S’s latest work on the classifier system in Chinese. -
An Introduction to Chaghatay: A Graded Textbook for Reading Central Asian Sources
An Introduction to Chaghatay: A Graded Textbook for Reading Central Asian Sources Eric Schluessel Copyright © 2018 by Eric Schluessel Some rights reserved This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, California, 94042, USA. Published in the United States of America by Michigan Publishing Manufactured in the United States of America DOI: 10.3998/mpub.10110094 ISBN 978-1-60785-495-1 (paper) ISBN 978-1-60785-496-8 (e-book) An imprint of Michigan Publishing, Maize Books serves the publishing needs of the University of Michigan community by making high-quality scholarship widely available in print and online. It represents a new model for authors seeking to share their work within and beyond the academy, offering streamlined selection, production, and distribution processes. Maize Books is intended as a complement to more formal modes of publication in a wide range of disciplinary areas. http://www.maizebooks.org Cover Illustration: "Islamic Calligraphy in the Nasta`liq style." (Credit: Wellcome Collection, https://wellcomecollection.org/works/chengwfg/, licensed under CC BY 4.0) Contents: Acknowledgments; Introduction; How to Read the Alphabet; 1 Basic Word Order and Copular Sentences; 2 Existence; 3 Plural, Palatal Harmony, and Case Endings; 4 People and Questions; 5 The Present-Future Tense; 6 Possessive -
A Context-Based Classifier Prediction System for Chinese Language Learners
ClassifierGuesser: A Context-based Classifier Prediction System for Chinese Language Learners Nicole Peinelt1,2 and Maria Liakata1,2 and Shu-Kai Hsieh3 1The Alan Turing Institute, London, UK 2Department of Computer Science, University of Warwick, Coventry, UK 3Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan {n.peinelt, m.liakata}@warwick.ac.uk Abstract Classifiers are function words that are used to express quantities in Chinese and are especially difficult for language learners. In contrast to previous studies, we argue that the choice of classifiers is highly contextual and train context-aware machine learning models based on a novel publicly available dataset, outperforming previous baselines. We further present use cases for our database and models in an interactive demo system. 1 Introduction Languages such as Chinese are characterized by the existence of a class of words commonly referred to as ‘classifiers’ or ‘measure words’. […] 2011), as well as an SVM with syntactic and ontological features (Guo and Zhong, 2005). However, without any context classifier assignment can be ambiguous. For instance, the noun 球 ‘ball’ can be modified by ke - a classifier for round objects - when referring to the object itself as in (1), but requires the event classifier chang in the context of a ball match as in (2). We argue that context is an important factor for classifier selection, since a head word may have multiple associated classifiers, but the final classifier selection is restricted by the context. (1) 一 颗 红色 的 球 one ke red DE ball ‘a red ball’ (2) 一 场 精彩 的 球 -
Classifier Effect in Early and Late Bilinguals
This document is downloaded from DR‑NTU (https://dr.ntu.edu.sg) Nanyang Technological University, Singapore. Language and thought : classifier effect in early and late bilinguals Tang, Huimin. 2011 Tang, H. (2011). Language and Thought : Classifier Effect in Early and Late Bilinguals. Final year project report, Nanyang Technological University. https://hdl.handle.net/10356/94061 Downloaded on 02 Oct 2021 00:47:01 SGT ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library NANYANG TECHNOLOGICAL UNIVERSITY SCHOOL OF HUMANITIES AND SOCIAL SCIENCES Language and Thought: Classifier Effect in Early and Late Bilinguals Name: Tang Huimin (088169D12) Supervisor: Prof. Nayoung Kwon A Final Year Project submitted to the School of Humanities and Social Sciences, Nanyang Technological University in partial fulfillment of the requirements for the Degree of Bachelor of Arts in Linguistics & Multilingual Studies 2011 ACKNOWLEDGEMENTS It has been an amazing learning journey working on this project. The final work would not have been possible if not for the following people who have offered invaluable help to me during the course of this project. First of all, I would like to extend my sincere thanks to my supervisor, Professor Nayoung Kwon for her enthusiastic and patient guidance. Her thoughtful comments, expertise in experimental design and statistical analyses have been extremely helpful towards the completion of this study. Also, her constant encouragement has been a great emotional support to me. I would like to thank all my friends and course mates from the Faculty of Linguistics and Multilingual Studies in Nanyang Technological University, who helped me in recruiting subjects for the experiments. -
Word Segmentation Standard in Chinese, Japanese and Korean
Word Segmentation Standard in Chinese, Japanese and Korean Key-Sun Choi (KAIST, Daejeon, Korea), Hitoshi Isahara (NICT, Kyoto, Japan), Kyoko Kanzaki (NICT, Kyoto, Japan), Hansaem Kim (National Inst. Korean Lang., Seoul, Korea), Seok Mun Pak (Baekseok Univ., Cheonan, Korea), Maosong Sun (Tsinghua Univ., Beijing, China) Abstract Word segmentation is a process to divide a sentence into meaningful units called “word unit” [ISO/DIS 24614-1]. What is a word unit is judged by principles for its internal integrity and external use constraints. A word unit’s internal structure is bound by principles of lexical integrity, unpredictability and so on in order to represent one syntactically meaningful unit. Principles for external use include language economy and frequency such that word units could be registered in a lexicon or any other storage for practical reduction of processing complexity for the further syntactic processing after word segmentation […] framework), and others in ISO/TC37/SC4. These standards describe annotation methods but not for the meaningful units of word segmentation. In this aspect, MAF and SynAF are to annotate each linguistic layer horizontally in a standardized way for the further interoperability. Word segmentation standard would like to recommend what word units should be candidates to be registered in some storage or lexicon, and what type of word sequences called “word unit” should be recognized before syntactic processing. In section 2, principles of word segmentation will be introduced based on ISO/CD 24614-1. Section 3 will describe the problems in word segmentation and what should be word units in each language of Chinese, Japanese and Korean. -
Combining Linguistics and Statistics for High-Quality Limited Domain English-Chinese Machine Translation
Combining Linguistics and Statistics for High-Quality Limited Domain English-Chinese Machine Translation By Yushi Xu B.Eng in Computer Science (2006) Shanghai Jiaotong University Submitted to Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Science at the Massachusetts Institute of Technology June, 2008 © 2008 Massachusetts Institute of Technology All rights reserved Signature of the author, Department of Electrical Engineering and Computer Science, June, 2008 Certified by Stephanie Seneff Principal Research Scientist Thesis Supervisor Accepted by Terry P. Orlando Chair, Department Committee on Graduate Students Combining Linguistics and Statistics for High-Quality Limited Domain English-Chinese Machine Translation By Yushi Xu Submitted to Department of Electrical Engineering and Computer Science On May 9, 2008 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical Engineering and Computer Science ABSTRACT Second language learning is a compelling activity in today's global markets. This thesis focuses on critical technology necessary to produce a computer spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and how these technologies can be integrated into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long distance constraint features. The results showed that with these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. -
The Study of Chinese Noun-Classifier Compounds
2019 2nd International Conference on Arts, Linguistics, Literature and Humanities (ICALLH 2019) The Study of Chinese Noun-Classifier Compounds Yanji Cui School of Foreign Languages, Yanbian University, Yanji, 133002, China Keywords: formative morpheme; noun-classifier compounds; word structure. Abstract: Chinese noun-classifier compounds have a special structure in the Chinese vocabulary system. They have their own features both in syntactic structure and in word meaning. The particularity of noun-classifier compounds lies in the complexity of their internal structure. The degree of grammaticalization of the classifiers explains the differences. They can be divided into three categories: real noun-classifier compounds, pseudo noun-classifier compounds and noun compounds. This paper studies the inner structure of noun-classifier compounds and the nature of the classifier within them. 1. Introduction In modern Chinese, there is a group of compounds which usually contain two elements, one of which is a noun and the other a classifier. This kind of noun structure is usually called the noun-classifier compound, such as: cheliang (cars), chuanzhi (ships), mapi (horses), huaduo (flowers), tianmu (field), zhizhang (paper), renkou (population), qiangzhi (guns), huaping (vase), shubao (school bag), xinfeng (envelope), yinliang (money) (1) These compounds have a long history: they appeared before the Southern and Northern Dynasties and have become richer since then. [1] It can be seen from the above examples that, of the two elements in these words, the first is the nominal morpheme and the second is the classifier morpheme.