A CRF Sequence Labeling Approach to Chinese Punctuation Prediction

Total Page:16

File Type:pdf, Size:1020Kb

A CRF Sequence Labeling Approach to Chinese Punctuation Prediction A CRF Sequence Labeling Approach to Chinese Punctuation Prediction Yanqing Zhao, Chaoyue Wang, Guohong Fu School of Computer Science and Technology, Heilongjiang University Harbin 150080, China [email protected], [email protected], [email protected] information extraction (Matusov et al., 2006; Lu Abstract and Ng, 2010). Over the past years, numerous studies have been This paper presents a conditional random performed on the insertion of punctuations in fields based labeling approach to Chinese speech transcripts using supervised techniques. punctuation prediction. To this end, we However, it is actually very difficult or even first reformulate Chinese punctuation impossible to develop a large high-quality corpus prediction as a multiple-pass labeling task to achieve reliable models for predicting on a sequence of words, and then explore punctuations in speech transcripts or ASR outputs various features from three linguistic levels, (Takeuchi et al., 2007). Furthermore, most namely words, phrase and functional previous research on punctuation prediction chunks for punctuation prediction under the exploited very shallow linguistic features such as framework of conditional random fields. lexical features or prosodic cues (viz. pitch and Our experimental results on the Tsinghua pause duration) (Lu and Ng, 2010), few studies Chinese Treebank show that using multiple have been done on the exploration of deeper deeper linguistic features and multiple-pass linguistic features like syntactic structural labeling consistently improves performance. information for punctuation prediction, particularly in Chinese (Guo et al., 2010). 1 Introduction In this paper we draw our motivation from speech transcripts to written texts. On the one hand, Punctuation prediction, also referred to as a number of large annotated corpora of written punctuation restoration, aims at inserting proper texts are available to date. On the other hand, we punctuation marks at right position of an intend to examine the role of different linguistic unpunctuated text (Gravano et al., 2009; Guo et al., features on Chinese punctuation prediction. To this 2010). Punctuation is obviously an essential end, we reformulate Chinese punctuation indicator for sentence construction. For Chinese, prediction as a multiple-pass labeling task on word adding proper punctuation marks can not only sequences, and then explore multiple features at enhance the readability of text, but also can three linguistic levels, namely words, phrases and provide additional information for further language functional chunks, for punctuation labeling under analysis, such as word segmentation, phrasing and the framework of conditional random fields syntactic analysis (Guo et al., 2010; Chen and (CRFs). Furthermore, we have also performed Huang, 2011; Xue and Yang, 2011). As such, evaluation on the Tsinghua Chinese Treebank punctuation prediction plays a critical role in many (Zhou, 2004). natural language processing applications such as The rest of the paper is organized as follows: In automatic speech recognition (ASR), machine Section 2, we will provide a brief review of the translation, automatic summarization, and related work on punctuation prediction. In Section 508 Copyright 2012 by Yanqing Zhao, Chaoyue Wang, and Guohong Fu 26th Pacific Asia Conference on Language,Information and Computation pages 508–514 3, we will describe in detail a labeling method to prediction. This might be that a well-annotated Chinese punctuation prediction. Section 4 will corpus of speech texts is not available to date. As summarize the experimental results. Finally in such, in the present study we address the problem Section 6, we will give our conclusion and some of Chinese punctuation prediction from the possible directions for future work. perspective of written texts. Specially, we attempt to exploit multiple levels of features under the 2 Related Work framework of CRF-based sequence labeling and thus examine the role of for Chinese punctuation Punctuation prediction has been well studied in the communities of ASR, and a variety of techniques 3 Approach have been attempted, including n-grams (Takeuchi et al., 2007; Gravano et al., 2009), maximum This section details the CRFs-based multiple pass entropy models (MEMs) (Huang and Zweig, 2002; labeling method to Chinese punctuation prediction. Guo et al., 2010), and CRFs (Liu et al., 2005; Tomanek et al., 2007; Lu and Ng, 2010). 3.1 Task Formulation Current research focuses on seeking informative Chinese punctuation prediction is a process of features for punctuation prediction. Huang and inserting proper punctuation marks into a raw Zweig (2002) attempted to explore POS features Chinese text without punctuation marks. In the and prosodic features for inserting punctuations in present study, we reformulate Chinese punctuation automatically recognized speech texts using prediction as a multiple-pass labeling task on an MEMs. Takeuchi et al. (2007) exploited silence unpunctuated word string with the help of word information from ASR systems and head or tail pattern tags defined in Table 1. Furthermore, we phrases within sentences. They showed that using consider eleven main punctuation marks as shown head and tail phrases could result in improvement in Table 2. of performance in sentence boundary detection. Gravano et al. (2009) examined the effect of Tag Definition different orders of n-grams on performance in B The preceding word of the current punctuation prediction. More recently, Huang and punctuation. Chen (2011) used CRFs to combine different A The following word of the current features for labeling pause and stop in Chinese punctuation. texts, including the beginning and end features of O Words not adjacent to the current voice fragments, character features, word features, punctuation. POS features, syntactic features and topic features. BOT The head word of a text. In addition to speech transcripts or ASR outputs, EOT The tail word of a text. recently a number of researchers start to study Table 1: Patterns of words in punctuation labeling punctuation prediction via written texts. Tomanek et al (2007) employed CRFs to phrase a biological No. Name Punctuation Tag article, and then inserted punctuation to sentences 1 Comma , COM during sentence segmentation. Xue and Yang 2 full stop 。 FUL (2011) used MEMs to explore contextual words, ! POS features and syntax trees for inserting 3 exclamation mark EXC : commas in Chinese texts. Laboreiro and Sarmento 4 Colon COL (2010) applied support vector machines to exploit 5 Bracket () {}[] BRA multiple cues, including such as characters, 6 question mark ? QUE character types, symbols and punctuations, for 7 Semicolon ; SEM sentence segmentation and punctuation correction 8 enumeration comma 、 ENU in micro-blog texts. 9 book title mark 《》〈〉 BOO From these studies, it is clear that systems with 10 quotation mark “”‘’ QUO more and deeper features outperform systems only 11 Ellipsis …… ELL using simple features. However, most previous studies only used lexical cues for punctuation Table 2: Types of Chinese punctuation marks 509 In order to reduce the interference between and the construction of a clean government.), along different types of punctuation marks and to with its unpunctuated word string, three levels of simplify the problem of punctuation prediction as linguistic annotations and the corresponding well, we take the following order to perform punctuation labeling representation. punctuation labelling: sentence-final delimiters It is worth noting the major motivation of this (viz. period, question mark and exclamation mark) study is to investigate the effects of different levels → sentence-internal delimiters (viz. comma, of linguistic cues on Chinese punctuation semicolon, colon, and enumeration comma) → prediction. To achieve this, we take the following indicators (viz. bracket, book title mark, quotation three steps: First, we remove all punctuation marks mark, and ellipsis). within a given original punctuated text like line (a) After punctuation labeling, each word within the in Figure 1 and reduce it to an unpunctuated text unpunctuated text will receive a hybrid punctuation (viz. line (b)) before punctuation labeling. Then, tag of the form t1-t2, if it is adjacent to a we explore three levels of linguistic information to punctuation mark, or is at the beginning or end of a restore the removed punctuation marks using the text. Otherwise, it will only receive a tag O. Here, CRF-based multiple-pass labeling strategy. Finally, t1 denotes the pattern of the current word in we evaluate punctuation prediction performance by punctuation labeling (as shown in Table 1), and t2 comparing the automatically restored punctuation stands for the type of the punctuation mark (as marks with the corresponding original ones. defined in Table 2) that precedes or follows the Considering the availability of linguistic current word if applicable. information, we perform punctuation prediction on the Tsinghua Chinese Treebank (Zhou, 2004), a (a) Punctuated text: 执法部门是反腐败斗争 、 corpus of written Chinese with a variety of 搞好廉政建设的重点部门之一。 linguistic annotation information, including word (b) Unpunctuated word string: 执法 /部门 /是/反 segmentation, POS, phrases and functional chunks. /腐败 /斗争 /搞好 /廉政 /建设 /的/重点 /部门 /之 Also, the relevant annotation scheme is used throughout our present study. 一/ (c) POS: 执法 /vN 部门 /n 是/vC 反/v 腐败 /a 3.2 CRFs for Punctuation Labeling 斗争 /vN 搞好 /v 廉政 /vN 建设 /vN 的 We choose CRFs as the basic framework for /uJDE
Recommended publications
  • Punctuate Your Translation Text Right: a View From
    Title of article: Decoding and Encoding the Discourse meaning of Punctuation: a Perspective from English-to-Chinese Translation Author: Dr Caiwen Wang, University of Westminster, London, UK Email: [email protected] - 1 - Decoding and Encoding the Discourse meaning of Punctuation: a Perspective from English-to-Chinese Translation Abstract: This exploratory research examines translation students’ use of punctuation, by applying Newmark’s (1988) classical idea of punctuation as a discourse unit for meaning demarcation. Data was collected from a group of 25 Chinese students studying specialised translation at a British university. The research focuses on the use of two punctuation marks: comma and period or full stop. The aim is to investigate how students of translation analyse the meaning of a source text with punctuation marks and how they subsequently convert this meaning into the target language again using punctuation marks. It is found that students generally do not mechanically copy the punctuation marks of a source text into the translation. They will customize or modify the original punctuation marks according to their meaning analysis of the text and their knowledge of punctuation in source and target languages. Finally, we will discuss the implications of the research for translation education. Key words: Punctuation; semantic relationship; discourse; translation pedagogy 1. Introduction This research is an attempt to enrich data for filling the gap summarised by Rodríguez-Castro, which is that ‘[i]n the scholarly research in Translation Studies, the study of punctuation has not attracted much attention either from professionals or from researchers’ (2011:43). The research especially draws inspiration from a Master student doing her end-of-year Translation Project, where she and I, as her supervisor, discussed punctuation use in depth.
    [Show full text]
  • |||GET||| Modern Korean Grammar a Practical Guide 1St Edition
    MODERN KOREAN GRAMMAR A PRACTICAL GUIDE 1ST EDITION DOWNLOAD FREE Andrew Sangpil Byon | 9781138931305 | | | | | Chinese characters It provides content covering all areas of health and wellness from mainstream medicine to the many perspectives of complementary, holistic and integrated medicine. Covers Responsa Literature - rabbinic case-law rulings which represent the historical-sociological milieu of real-life situations; the Bible, the Talmud and their principal commentaries; works about Jewish law and customs; codes of Jewish law, such as Maimonides' Mishneh Torah and the Shulchan Modern Korean Grammar A Practical Guide 1st edition with its principal commentaries; midrashim, Zohar, etc. Series 6 brings the previous published Series more firmly into the Modern Korean Grammar A Practical Guide 1st edition 20th Modern Korean Grammar A Practical Guide 1st edition, with titles up to Retrieved 22 July Series 1 covers through A digital repository for the nation's great research libraries that brings together the immense collections of its partner institutions. Provides searchable full text of several hundred classic Christian works in the public domain for theological study. Each volume contains an overview of a geographic region, a survey of its musical heritage, and a description of specific musical genres, practices, and performances. Verified purchase: Yes Condition: New. Cambridge: Cambridge University Press. New York: The Encyclopedia press, inc. Therefore, Socrates reasons, at least one thing exists that certain gods love but other gods hate. Consulates were listening posts reporting on the activities of the Italian colonial governments and later the mandate authorities, and the activities of the native peoples. Critical interpretations, theatre history surveys and major reference works on authors, movements, practitioners, periods and genres are included alongside performance and practitioner texts, acting and backstage guides.
    [Show full text]
  • |||GET||| Introduction to Classical Chinese Philosophy 1St Edition
    INTRODUCTION TO CLASSICAL CHINESE PHILOSOPHY 1ST EDITION DOWNLOAD FREE Bryan W Van Norden | 9781603846059 | | | | | Chinese philosophy Namespaces Article Talk. Among Chinese speakers, Literary Chinese has been largely replaced by written vernacular Chinesea style of writing that is similar to modern spoken Mandarin Chinesewhile speakers of non-Chinese languages have largely abandoned Literary Chinese in favor of local vernaculars. Two chiefly important monk philosophers were Sengzhao and Daosheng. As in Japanphilosophy in China has become a melting pot of ideas. My Cart. The Bible in China: the history of the Union Version or the culmination of protestant missionary Bible translation in China. The Analects. Poetry and other rhyme-based writing thus becomes less coherent than the original reading must have been. In terms of conciseness and compactness, Classical Chinese rarely uses words Introduction to Classical Chinese Philosophy 1st edition of two Chinese characters; nearly all words are of one Introduction to Classical Chinese Philosophy 1st edition only. Raymond Stanley Dawson JavaScript seems to be disabled in your browser. Wikibooks has more on the topic of: Classical Chinese. Buddhism Introduction to Classical Chinese Philosophy 1st edition in China around the 1st century AD, but it was not until the Northern and SouthernSui and Tang dynasties that it gained considerable influence and acknowledgement. The logicians School of Names were concerned with logic, paradoxes, names and actuality similar to Confucian rectification of names. Main article: List of Chinese philosophers. Philosopher Han Fei synthesized together earlier the methods of his predecessors, which famous historian Sima Tan posthumously termed Legalism. Mandarin and Cantonese, for example, also have words that are pronounced one way in colloquial usage and another way when used in Classical Chinese or in specialized terms coming from Classical Chinese, though the system is not as extensive as that of Southern Min or Wu.
    [Show full text]
  • An Interpretive Approach (Singapore). Siew Kim Lee University of Massachusetts Amherst
    University of Massachusetts Amherst ScholarWorks@UMass Amherst Doctoral Dissertations 1896 - February 2014 1-1-1987 A Chinese conception of "management" : an interpretive approach (Singapore). Siew Kim Lee University of Massachusetts Amherst Follow this and additional works at: https://scholarworks.umass.edu/dissertations_1 Recommended Citation Lee, Siew Kim, "A Chinese conception of "management" : an interpretive approach (Singapore)." (1987). Doctoral Dissertations 1896 - February 2014. 4291. https://scholarworks.umass.edu/dissertations_1/4291 This Open Access Dissertation is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations 1896 - February 2014 by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. A CHINESE CONCEPTION OF "MANAGEMENT” -AN INTERPRETIVE APPROACH (SINGAPORE) A Dissertation Presented By SIEW KIM LEE Submitted to the Graduate School of the University of Massachusetts in partial fulfillment of the requirements for the degree of DOCTOR OF EDUCATION September 1987 Education @ Copyright by Siew Kim Lee 1987 All Rights Reserved. 11 A CHINESE CONCEPTION OF "MANAGEMENT - AN INTERPRETIVE APPROACH (SINGAPORE) A Dissertation Presented By SIEW KIM LEE Approved as to style and content by: JL Dr. Linda Sraircich, Member £y e. GeorgeUrch, Acting Dean School of Education iii DEDICATION TO MY PARENTS for their endless love, support, confidence, endurance and faith. Without them, without me. ACKNOWLEDGEMENTS I would like to express my appreciation to people who have helped me in this project in different ways. First of all, my special thanks to Sheryl Riechmann Hruska, for her warmth, care, guidance, understanding, love, and both intellectual and emotional support.
    [Show full text]
  • NY IME Instructions
    Selecting Simplified Input Method To type in simplified characters, click on the arrow to the right of the selected input language at the top left corner of your screen. Then select "Chinese (Simplified)" from the drop-down list that appears, as shown below. Please note that after each time you select a language, you must use your mouse to reposition your cursor inside the response box before you can begin typing. Note: You may also toggle between your selected Chinese input method and English by using the key code CTRL+SPACEBAR. IMPORTANT: Once you have selected a Chinese input language, if you press SHIFT, your keyboard will type in English until you press SHIFT again to resume Chinese input. Text Entry and Character Selection 1. Use the keyboard to type the pronunciation of the desired Chinese character in Simplified Chinese. As you type, a long window will appear with numbered character options, as shown in the example below. If you do not see the desired character in the window, type additional letters or use the mouse to click on the black arrow at the end of the window to see another set of character options. 2. If the first character shown is the desired character, press SPACEBAR to select it. To select another character, use the mouse to click on the character in the window or type the number that corresponds to the desired character. 3. The character you selected will now appear in the response box. Press ENTER or SPACEBAR to confirm the character and move the cursor forward.
    [Show full text]
  • NY IME Instructions
    Selecting Simplified Input Method To type in simplified characters, click on the arrow to the right of the selected input language at the top left corner of your screen. Then select "Chinese (Simplified)" from the drop-down list that appears, as shown below. Please note that after each time you select a language, you must use your mouse to reposition your cursor inside the response box before you can begin typing. Note: You may also toggle between your selected Chinese input method and English by using the key code CTRL+SPACEBAR. IMPORTANT: Once you have selected a Chinese input language, if you press SHIFT, your keyboard will type in English until you press SHIFT again to resume Chinese input. Text Entry and Character Selection 1. Use the keyboard to type the pronunciation of the desired Chinese character in Simplified Chinese. As you type, a long window will appear with numbered character options, as shown in the example below. If you do not see the desired character in the window, type additional letters or use the mouse to click on the black arrow at the end of the window to see another set of character options. 2. If the first character shown is the desired character, press SPACEBAR to select it. To select another character, use the mouse to click on the character in the window or type the number that corresponds to the desired character. 3. The character you selected will now appear in the response box. Press ENTER or SPACEBAR to confirm the character and move the cursor forward.
    [Show full text]
  • Pause and Stop Labeling for Chinese Sentence Boundary Detection
    Pause and Stop Labeling for Chinese Sentence Boundary Detection Hen-Hsen Huang Hsin-Hsi Chen Department of Computer Science and Department of Computer Science and Information Engineering, Information Engineering, National Taiwan University, National Taiwan University, Taipei, Taiwan Taipei, Taiwan [email protected] [email protected] “這是有點霸道,但也有道理,因為他們是 Abstract 上市公司,每一季要向美國證管會報告總公 司、附屬公司及子公司的營運及財務狀況, The fuzziness of Chinese sentence boundary 帳都是照一套會計原則來做,所以很多時候 makes discourse analysis more challenging. Moreover, many articles posted on the Internet 他們的要求,是出自一種單純的需要,而並 are even lack of punctuation marks. In this pa- 不是故意要來欺負我們。” per, we collect documents written by masters as a reference corpus and propose a model to could be divided into three sentences such as label the punctuation marks for the given text. Conditional random field (CRF) models trained with the corpus determine the correct “這是有點霸道,但也有道理。” „This is a delimiter (a comma or a full-stop) between little overbearing, but is also reasonable.‟ each pair of successive clauses. Different tag- ging schemes and various features from differ- “因為他們是上市公司,每一季要向美國證 ent linguistic levels are explored. The results 管會報告總公司、附屬公司及子公司的營運 show that our segmenter achieves an accuracy 及財務狀況,帳都是照一套會計原則來做。 of 77.48% for plain text, which is close to the ” human performance 81.18%. For the rich for- „Because they are listed companies and should matted text, our segmenter achieves an even report a summary of operation and financial sta- better accuracy of 82.93%. tus of their corporation, subsidiaries, and affili- ates to the U.S. Securities quarterly, the ac- 1 Introduction counts are prepared in accordance with the same set of accounting principles.‟ To resolve sentence boundary is a fundamental issue for human language understanding.
    [Show full text]
  • Chinese Language Kit User's Manual
    Chinese Language Kit User’s Manual Includes instructions for installing and using your software K Apple Computer, Inc. © 1998 Apple Computer, Inc. All rights reserved. 1 Infinite Loop Cupertino, CA 95014-2084 408-996-1010 http://www.apple.com Under the copyright laws, this manual may not be copied, in whole or in part, without the written consent of Apple. Your rights to the software are governed by the accompanying software license agreement. The Apple logo is a trademark of Apple Computer, Inc., registered in the U.S. and other countries. Use of the “keyboard” Apple logo (Option-Shift-K) for commercial purposes without the prior written consent of Apple may constitute trademark infringement and unfair competition in violation of federal and state laws. Every effort has been made to ensure that the information in this manual is accurate. Apple is not responsible for printing or clerical errors. Apple, the Apple logo, AppleShare, Mac, Macintosh, TrueType, and WorldScript are trademarks of Apple Computer, Inc., registered in the U.S. and other countries. Balloon Help and Finder are trademarks of Apple Computer, Inc. Helvetica is a registered trademark of Linotype-Hell AG and/or its subsidiaries. PowerPC is a trademark of International Business Machines Corporation, used under license therefrom. Other company and product names mentioned herein are trademarks of their respective companies. Mention of third-party products is for informational purposes only and constitutes neither an endorsement nor a recommendation. Apple assumes no responsibility with regard to the performance or use of these products. Simultaneously published in the United States and Canada.
    [Show full text]
  • Zhon Documentation Release 1.1.5
    Zhon Documentation Release 1.1.5 Thomas Roten Mar 12, 2017 Contents 1 Introduction 1 2 Installation 3 3 Using Zhon 5 3.1 zhon.hanzi ..............................................6 3.2 zhon.pinyin .............................................7 3.3 zhon.zhuyin .............................................9 3.4 zhon.cedict ............................................. 10 Python Module Index 11 i ii CHAPTER 1 Introduction Zhon is a Python library that provides constants commonly used in Chinese text processing: • CJK characters and radicals • Chinese punctuation marks • Chinese sentence regular expression pattern • Pinyin vowels, consonants, lowercase, uppercase, and punctuation • Pinyin syllable, word, and sentence regular expression patterns • Zhuyin characters and marks • Zhuyin syllable regular expression pattern • CC-CEDICT characters 1 Zhon Documentation, Release 1.1.5 2 Chapter 1. Introduction CHAPTER 2 Installation Zhon supports Python 2.7 and 3. Install using pip: $ pip install zhon If you want to download the latest source code, check out Zhon’s GitHub repository. Be sure to report any bugs you find. Thanks! 3 Zhon Documentation, Release 1.1.5 4 Chapter 2. Installation CHAPTER 3 Using Zhon Zhon contains four modules that export helpful Chinese constants: • zhon.hanzi • zhon.pinyin • zhon.zhuyin • zhon.cedict Zhon’s constants are formatted in one of three ways: • Characters listed individually. These can be used with membership tests or used to build regular expression patterns. For example, 'aeiou'. • Character code ranges. These are used to build regular expression patterns. For example, 'u\0041-\u005A\u0061-\u007A'. • Regular expression pattern. These are regular expression patterns that can be used with the regular expression library directly. For example, '[u\0020-\u007E]+'. Using the constants listed below is simple.
    [Show full text]
  • Bibliography of Chinese Linguistics William S.-Y.Wang
    BIBLIOGRAPHY OF CHINESE LINGUISTICS WILLIAM S.-Y.WANG INTRODUCTION THIS IS THE FIRST LARGE-SCALE BIBLIOGRAPHY OF CHINESE LINGUISTICS. IT IS INTENDED TO BE OF USE TO STUDENTS OF THE LANGUAGE WHO WISH EITHER TO CHECK THE REFERENCE OF A PARTICULAR ARTICLE OR TO GAIN A PERSPECTIVE INTO SOME SPECIAL TOPIC OF RESEARCH. THE FIELD OF CHINESE LINGUIS- TICS HAS BEEN UNDERGOING RAPID DEVELOPMENT IN RECENT YEARS. IT IS HOPED THAT THE PRESENT WORK WILL NURTURE THIS DEVELOP- MENT BY PROVIDING A SENSE OF THE SIZABLE SCHOLARSHIP IN THE FIELD» BOTH PAST AND PRESENT. IN SPITE OF REPEATED CHECKS AND COUNTERCHECKS, THE FOLLOWING PAGES ARE SURE TO CONTAIN NUMEROUS ERRORS OF FACT, SELECTION AND OMISSION. ALSO» DUE TO UNEVENNESS IN THE LONG PROCESS OF SELECTION, THE COVERAGE HERE IS NOT UNIFORM. THE REPRESENTATION OF CERTAIN TOPICS OR AUTHORS IS PERHAPS NOT PROPORTIONAL TO THE EXTENT OR IMPORTANCE BIBLIOGRAPHY OF CHINESE LINGUISTICS ]g9 OF THE CORRESPONDING LITERATURE. THE COVERAGE CAN BE DIS- CERNED TO BE UNBALANCED IN TWO MAJOR WAYS. FIRST. THE EMPHASIS IS MORE ON-MODERN. SYNCHRONIC STUDIES. RATHER THAN ON THE WRITINGS OF EARLIER CENTURIES. THUS MANY IMPORTANT MONOGRAPHS OF THE QING PHILOLOGISTS. FOR EXAMPLE, HAVE NOT BEEN INCLUDED HERE. THOUGH THESE ARE CERTAINLY TRACE- ABLE FROM THE MODERN ENTRIES. SECOND, THE EMPHASIS IS HEAVILY ON THE SPOKEN LANGUAGE, ALTHOUGH THERE EXISTS AN ABUNDANT LITERATURE ON THE CHINESE WRITING SYSTEM. IN VIEW OF THESE SHORTCOMINGS, I HAD RESERVATIONS ABOUT PUBLISHING THE BIBLIOGRAPHY IN ITS PRESENT STATE. HOWEVER, IN THE LIGHT OF OUR EXPERIENCE SO FAR, IT IS CLEAR THAT A CONSIDERABLE AMOUNT OF TIME AND EFFORT IS STILL NEEDED TO PRODUCE A COMPREHENSIVE BIBLIOGRAPHY THAT IS AT ONCE PROPERLY BALANCED AMD COMPLETELY ACCURATE (AND, PERHAPS, WITH ANNOTATIONS ON THE IMPORTANT ENTRIES).
    [Show full text]
  • Simplification in Translated Chinese Texts: a Corpus-Based Study on Mean Sentence Length
    NCUE Journal of Humanities Vol. 9, pp. 253-271 March 2014 Simplification in Translated Chinese Texts: A Corpus-Based Study on Mean Sentence Length Ting-hui Wen Abstract Mean sentence length is proposed as a measure of simplification: if a text has shorter mean sentence length, it is assumed to be simpler for readers to comprehend. Since translated texts are hypothesized to be simpler than non-translated text, they would presumably exhibit shorter mean sentence length. The present research aims to investigate, using corpus-based methods, the phenomenon of simplification in translated, compared to non-translated, Chinese texts. This paper focuses on measuring sentence length of the translated texts in the Corpus of Comparable Mystery Fiction, analyzing the results of mean sentence length and its additional measures: mean sentence length in terms of characters, mean sentence sub-unit length in terms of words and mean sentence sub-unit length in terms of characters. The measure of mean sentence length and its additional measures render consistent results showing that the translated texts of the corpus under study exhibit shorter sentence length than the non-translated texts. Keywords: corpus-based translation studies, simplification, mean sentence length Assistant Professor, Graduate Institute of Translation and Interpretation, National Changhua University of Education. Received December 31, 2013. 253 ⚳䩳⼘⊾ⷓ䭬⣏⬠㔯⬠昊⬠⟙ 䫔ḅ㛇炻枩 253-271 Ḵ˕ᶨ⚃⸜ᶱ㚰 ѳ֡ѡߏزᙁϯ౜ຝǺаᇟ਑৤Бݤ௖ޑᅇᇟᙌ᝿Ўҁύ ྕ൛ඁ ! ᄔा ѡߏၨอǴ೯த೏ᇡࣁޑෳໆБݤϐ΋Ǻӵ݀΋ঁЎҁޑѳ֡ѡߏࢂᙁϯ౜ຝ ѳ֡ѡߏޑΑှǶଷ೛ᙌ᝿ЎҁКߚᙌ᝿ЎҁᙁൂǴᙌ᝿ЎҁܭܰޣၨᙁൂǴᡣ᠐ ᙁϯ౜ຝǴԶЪ੝ձଞޑᅇᇟᙌ᝿Ўҁύزԑӧаᇟ਑৤Бݤࣴز൩КၨอǶҁࣴ ᝌᅪλᇥёКᇟ਑৤ȐTheޑᇟ਑৤ࣁԾࡌޑ٬Ҕ܌زǶҁࣴزჹѳ֡ѡߏ଺ࣴ ෳޑ՜ۯځCorpus of Comparable Mystery FictionȑǴϩ݋ѳ֡ѡߏȐаຒीᆉȑаϷ аӷीᆉѳ֡λѡߏǶکѳ֡λѡߏޑѳ֡ѡߏǴаຒीᆉޑໆБݤǺаӷीᆉ ѡߏޑᙌ᝿ЎҁޑࣣᡉҢҁᇟ਑৤่݀ޑෳໆБݤޑ՜ۯځϩ݋ѳ֡ѡߏаϷ Кߚᙌ᝿ЎҁޑѡߏอǶ ᜢᗖຒ: ᇟ਑৤ᙌ᝿Ꮲǵᙁϯ౜ຝǵѳ֡ѡߏ ⚳䩳⼘⊾ⷓ䭬⣏⬠侣嬗䞼䨞㇨≑䎮㔁㌰ˤ ⇘䧧㖍㛇烉2013 ⸜ 12 㚰 31 㖍ˤ 254 Simplification in Translated Chinese Texts: A Corpus-Based Study on Mean Sentence Length Introduction In 1996, Mona Baker proposed four recurrent translation features: explicitation, simplification, normalisation and levelling out.
    [Show full text]
  • Chinese Characters in Academic Writing
    Chinese Characters in Academic Writing Institute of East Asian Studies University of Duisburg-Essen ⃝c 2014 Flemming Christiansen 1 General Principles 1. Hanyu Pinyin is a scheme for spelling Modern Standard Chinese with Latin let- ters. Hanyu Pinyin is the internationally accepted standard (ISO 7098:1991), and with the exception of some few cases outlined below, it is the only transcription to be used for transcribing Chinese characters. 2. Hanyu Pinyin in academic and general texts never carries tone markers. Used in linguistic studies, Chinese language textbooks and grammars to represent phonemes or to aid language learning, Hanyu Pinyin normally carries the five tone markers. Otherwise not. If a linguistic point or a linguistic clarification is made within a non-linguistic text, tone markers may be used, but only where absolutely relevant. 3. Always use the appropriate “Umlaut” (diaresis, trema) in lü, nü, lüe and so on. It is an integral part of Hanyu Pinyin. 妇女 is funü, not *funu, and 战略 is zhanlüe, not *zhanlue. It is a grave mistake to replace “ü” with “v”, for “v” is not a valid character in Hanyu Pinyin. 4. Hanyu Pinyin transcribes words. Although Chinese written by characters does not indicate word boundaries, the Hanyu Pinyin scheme is based on word formation. In Hanyu Pinyin the char- acters are linked to form words, so 社会 becomes shehui (not *she hui) and 共产主义 becomes gongchanzhuyi (not *gong chan zhu yi). See Chinese dictionaries like Xiandai Hanyu Cidian for word division. 5. In cases where ambiguity is possible, the apostrophe is used to mark the border between two characters (=syllables).
    [Show full text]