Consistent Word Segmentation, Part-Of-Speech Tagging and Dependency Labelling Annotation for Chinese Language
Total Page:16
File Type:pdf, Size:1020Kb

Recommended publications
-
Learn Pronouns As Part of Speech for Bank & SSC Exams
Learn Pronouns as Part of Speech for Bank & SSC Exams - English Notes in PDF Are you preparing for Banking or SSC Exams? If you aim at making a career in the government sector & getting a reputed job, it is very important to know the basics of the English language. To score maximum marks in this section with great accuracy, it is important for you to be prepared with the basic grammar & vocabulary. Here we are with a detailed explanatory article on Pronouns as Part of Speech with relevant examples. So, read the article carefully & then take our Online Mock Tests to check your level of preparation. Before moving ahead with Pronouns, let’s have a brief look at what parts of speech are: Parts of Speech Parts of speech are the basic categories of words according to their function in a sentence. A part of speech is a category to which a word is assigned in accordance with its syntactic functions. English has eight main parts of speech, namely, Nouns, Pronouns, Adjectives, Verbs, Adverbs, Prepositions, Conjunctions & Interjections. In grammar, the parts of speech, also called lexical categories, grammatical categories or word classes, are linguistic categories of words. Pronouns as Part of Speech Pronouns as part of speech are the words which are used in place of nouns like people, places, or things. They are used to avoid sounding unnatural by reusing the same noun in a sentence multiple times. In the sentence Maya saw Sanjay, and she waved at him, the pronouns she and him take the place of Maya and Sanjay, respectively. -
Typological Differences Between English and Chinese Multi-Verb Constructions
Linguistics and Literature Studies 7(3): 110-115, 2019 http://www.hrpub.org DOI: 10.13189/lls.2019.070303 Typological Differences between English and Chinese Multi-verb Constructions Mengmeng Tang School of Foreign Studies, China University of Petroleum, China Copyright©2019 by authors, all rights reserved. Authors agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License Abstract English and Chinese are different in the composition of Multi-verb Constructions (MVCs), which refer to a series of verbs appearing in a mono-clause, without pauses or conjunctions. English MVCs contain a finite verb which inflects with tense, combined with non-finite forms (e.g., The boss encouraged Jerry to attend the meeting). Chinese MVCs are in the form of bare verbs or verbs with aspectual morphemes. From the perspective of finiteness, this article analyzes the typological differences of morphological finiteness in English and semantic finiteness in Chinese. Keywords Multi-verb Constructions, Finiteness 1. [...] verbs in one temporal category, with fixed order and no conjunctions [25], a language phenomenon without a counterpart in English. There has been a long-running discussion on Chinese finite and non-finite verb distinctions (e.g., [5-8, 16-18, 21-24, 27-30]). Although there has been a considerable theoretical discussion of Chinese finiteness, no research has been undertaken to date from the perspective of the typological differences of semantic finiteness in Chinese and morphological finiteness in English in MVCs. 2. Multi-verb Constructions Verbs are the core of a sentence. Descriptive linguists and comparative syntacticians have examined the -
Instrumentality: the Core Meaning of the Coverb Yi 以 in Classical Chinese
Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20). 2008. Volume 1. Edited by Marjorie K.M. Chan and Hana Kang. Columbus, Ohio: The Ohio State University. Pages 489-498. Instrumentality: The Core Meaning of the Coverb Yi 以 in Classical Chinese Sue-mei Wu 吳素美 Carnegie Mellon University Yi 以 is one of the most important and commonly used function words in Classical Chinese, and many researchers (e.g. Hsueh 1997, Pulleyblank 1995, Sun 1991, Wu 1994 and Wu 1997) have contributed to the discussion on yi's meanings and functions. The main issues in the current understanding of yi include its verb and coverb usages. Both usages have been thought to exhibit a range of meanings, and yi has been thought to be able to occur flexibly either before or after a verb phrase. In addition, yi is also thought to be a conjunction indicating the reason, purpose or result of an action. This study proposes that the verb yi fundamentally indicates instrumentality, and that the coverb yi, which is derived from the verb yi, still retains that fundamental notion of instrumentality. The notion of instrumentality has been extended in yi's coverb usage to indicate the means or reason by which an action occurs, or the time which an action occurs. Moreover, I will also argue that the yi phrase, when it occurs before a verb phrase, functions syntactically as a modifier to the verb phrase. When the yi phrase occurs after another verb phrase, however, the semantic emphasis has been deliberately put on the yi phrase, which then serves as the nucleus of the predicate. -
Compound Word Formation.Pdf
Snyder, William (in press) Compound word formation. In Jeffrey Lidz, William Snyder, and Joseph Pater (eds.) The Oxford Handbook of Developmental Linguistics. Oxford: Oxford University Press. CHAPTER 6 Compound Word Formation William Snyder Languages differ in the mechanisms they provide for combining existing words into new, “compound” words. This chapter will focus on two major types of compound: synthetic -ER compounds, like English dishwasher (for either a human or a machine that washes dishes), where “-ER” stands for the crosslinguistic counterparts to agentive and instrumental -er in English; and endocentric bare-stem compounds, like English flower book, which could refer to a book about flowers, a book used to store pressed flowers, or many other types of book, as long as there is a salient connection to flowers. With both types of compounding we find systematic cross-linguistic variation, and a literature that addresses some of the resulting questions for child language acquisition. In addition to these two varieties of compounding, a few others will be mentioned that look like promising areas for coordinated research on cross-linguistic variation and language acquisition. 6.1 Compounding—A Selective Review 6.1.1 Terminology The first step will be defining some key terms. An unfortunate aspect of the linguistic literature on morphology is a remarkable lack of consistency in what the “basic” terms are taken to mean. Strictly speaking one should begin with the very term “word,” but as Spencer (1991: 453) puts it, “One of the key unresolved questions in morphology is, ‘What is a word?’.” Setting this grander question to one side, a word will be called a “compound” if it is composed of two or more other words, and has approximately the same privileges of occurrence within a sentence as do other word-level members of its syntactic category (N, V, A, or P). -
Hunspell – the Free Spelling Checker
Hunspell – The free spelling checker About Hunspell Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. Hunspell interfaces: Ispell-like terminal interface using Curses library, Ispell pipe interface, OpenOffice.org UNO module. Main features of Hunspell spell checker and morphological analyzer: - Unicode support (affix rules work only with the first 65535 Unicode characters) - Morphological analysis (in custom item and arrangement style) and stemming - Max. 65535 affix classes and twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.) - Support complex compoundings (for example, Hungarian and German) - Support language specific features (for example, special casing of Azeri and Turkish dotted i, or German sharp s) - Handle conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms. - Free software (LGPL, GPL, MPL tri-license) Usage The src/tools directory contains ten executables after compiling (or some of them are in the src/win_api): affixcompress: dictionary generation from large (millions of words) vocabularies analyze: example of spell checking, stemming and morphological analysis chmorph: example of automatic morphological generation and conversion example: example of spell checking and suggestion hunspell: main program for spell checking and others (see manual) hunzip: decompressor of hzip format hzip: compressor of -
6 the Major Parts of Speech
6 The Major Parts of Speech KEY CONCEPTS Parts of Speech Major Parts of Speech Nouns Verbs Adjectives Adverbs Appendix: prototypes INTRODUCTION In every language we find groups of words that share grammatical characteristics. These groups are called “parts of speech,” and we examine them in this chapter and the next. Though many writers on language refer to “the eight parts of speech” (e.g., Weaver 1996: 254), the actual number of parts of speech we need to recognize in a language is determined by how fine-grained our analysis of the language is—the more fine-grained, the greater the number of parts of speech that will be distinguished. In this book we distinguish nouns, verbs, adjectives, and adverbs (the major parts of speech), and pronouns, wh-words, articles, auxiliary verbs, prepositions, intensifiers, conjunctions, and particles (the minor parts of speech). Every literate person needs at least a minimal understanding of parts of speech in order to be able to use such commonplace items as dictionaries and thesauruses, which classify words according to their parts (and sub-parts) of speech. For example, the American Heritage Dictionary (4th edition, p. xxxi) distinguishes adjectives, adverbs, conjunctions, definite articles, indefinite articles, interjections, nouns, prepositions, pronouns, and verbs. It also distinguishes transitive, intransitive, and auxiliary verbs. Writers and writing teachers need to know about parts of speech in order to be able to use and teach about style manuals and school grammars. Regardless of their discipline, teachers need this information to be able to help students expand the contexts in which they can effectively communicate. -
Building a Treebank for French
Building a treebank for French Anne Abeillé, Lionel Clément, Alexandra Kinyon TALaNa, Université Paris 7, 75251 Paris cedex 05, FRANCE; University of Pennsylvania, Philadelphia, USA abeille, clement, [email protected] Abstract Very few gold standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French starting with a tagged newspaper corpus of 1 Million words (Abeillé et al., 1998), (Abeillé and Clément, 1999). Similarly to the Penn TreeBank (Marcus et al., 1993), we distinguish an automatic parsing phase followed by a second phase of systematic manual validation and correction. Similarly to the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations for which we define extensive guidelines. Our goal is to provide a theory neutral, surface oriented, error free treebank for French. Similarly to the Negra project (Brants et al., 1999), we annotate both constituents and functional relations. 1. The tagged corpus As reported in (Abeillé and Clément, 1999), we present the general methodology, the automatic tagging phase, the human validation phase and the final state of the tagged corpus. 1.1. Methodology 1.1.1. [...] pronoun (= him) or a weak clitic pronoun (= to him or to her), plus can either be a negative adverb (= not any more) or a simple adverb (= more). Inflectional morphology also has to be annotated since morphological endings are important for gathering constituents (based on agreement marks) and also because lots of forms in French are ambiguous with respect to mode, person, number or gender. For exam- -
TRADITIONAL GRAMMAR REVIEW I. Parts of Speech Traditional
Traditional Grammar Review Page 1 of 15 TRADITIONAL GRAMMAR REVIEW I. Parts of Speech Traditional grammar recognizes eight parts of speech:
Part of Speech | Definition | Example
noun | A noun is the name of a person, place, or thing. | John bought the book.
verb | A verb is a word which expresses action or state of being. | Ralph hit the ball hard. Janice is pretty.
adjective | An adjective describes or modifies a noun. | The big, red barn burned down yesterday.
adverb | An adverb describes or modifies a verb, adjective, or another adverb. | He quickly left the room. She fell down hard.
pronoun | A pronoun takes the place of a noun. | She picked someone up today.
conjunction | A conjunction connects words or groups of words. | Bob and Jerry are going. Either Sam or I will win.
preposition | A preposition is a word that introduces a phrase showing a relation between the noun or pronoun in the phrase and some other word in the sentence. | The dog with the shaggy coat. He went past the gate. He gave the book to her.
interjection | An interjection is a word that expresses strong feeling. | Wow! Gee! Whew! (and other four letter words.)
II. Phrases A phrase is a group of related words that does not contain a subject and a verb in combination. Generally, a phrase is used in the sentence as a single part of speech. In this section we will be concerned with prepositional phrases, gerund phrases, participial phrases, and infinitive phrases. Prepositional Phrases The preposition is a single (usually small) word or a cluster of words that shows the relationship between the object of the preposition and some other word in the sentence. -
Chinese: Parts of Speech
Chinese: Parts of Speech Candice Chi-Hang Cheung 1. Introduction Whether Chinese has the same parts of speech (or categories) as the Indo-European languages has been the subject of much debate. In particular, while it is generally recognized that Chinese makes a distinction between nouns and verbs, scholars’ opinions differ on the rest of the categories (see Chao 1968, Li and Thompson 1981, Zhu 1982, Xing and Ma 1992, inter alia). These differences in opinion are due partly to the scholars’ different theoretical backgrounds and partly to the use of different terminological conventions. As a result, scholars use different criteria for classifying words and different terminological conventions for labeling the categories. To address the question of whether Chinese possesses the same categories as the Indo-European languages, I will make reference to the familiar categories of the Indo-European languages whenever possible. In this chapter, I offer a comprehensive survey of the major categories in Chinese, aiming to establish the set of categories that are found both in Chinese and in the Indo-European languages, and those that are found only in Chinese. In particular, I examine the characteristic features of the major categories in Chinese and discuss in what ways they are similar to and different from the major categories in the Indo-European languages. Furthermore, I review the factors that contribute to the long-standing debate over the categorial status of adjectives, prepositions and localizers in Chinese. 2. Categories found both in Chinese and in the Indo-European languages This section introduces the categories that are found both in Chinese and in the Indo-European languages: nouns, verbs, adjectives, adverbs, prepositions and conjunctions. -
Copular Constructions in Colloquial Singaporean English
Copular Constructions in Colloquial Singaporean English Honour School of Psychology, Philosophy, and Linguistics Paper C: Linguistic Project 2020 Candidate no. 1025395 Word count: 10,000
Contents
List of abbreviations
1 Introduction
1.1 Language in Singapore
1.1.1 Early and colonial history
1.1.2 Independence and language policies
1.2 Colloquial Singaporean English
1.2.1 Evolution of CSE
1.2.2 Status of CSE
1.3 Summary
2 Copular Constructions in Lexical-Functional Grammar
2.1 LFG analyses of copular constructions
2.1.1 Single-tier analysis
2.1.2 Open complement double-tier analysis
2.1.3 Closed complement double-tier analysis
2.2 Copular constructions in SSE
2.3 Copular constructions in Chinese
2.3.1 NP predicatives in Chinese
2.3.2 AP predicatives in Chinese
2.3.3 PP predicatives in Chinese
2.3.4 LFG analyses of Chinese copular constructions
2.4 Copular constructions in Malay
2.4.1 NP predicatives in Malay
2.4.2 AP predicatives in Malay
2.4.3 PP predicatives in Malay
2.4.4 LFG analyses of Malay copular constructions
2.5 Copular constructions in CSE
2.6 Summary
3 Distribution of Copular Constructions in CSE
3.1 Introduction
3.2 Pilot study
3.2.1 Participants
3.2.2 Task
3.2.3 Results and discussion
3.3 Main study -
PARTS of SPEECH ADJECTIVE: Describes a Noun Or Pronoun; Tells
PARTS OF SPEECH ADJECTIVE: Describes a noun or pronoun; tells which one, what kind or how many. ADVERB: Describes verbs, adjectives, or other adverbs; tells how, why, when, where, to what extent. CONJUNCTION: A word that joins two or more structures; may be coordinating, subordinating, or correlative. INTERJECTION: A word, usually at the beginning of a sentence, which is used to show emotion: one expressing strong emotion is followed by an exclamation point (!); mild emotion followed by a comma (,). NOUN: Name of a person, place, or thing (tells who or what); may be concrete or abstract; common or proper, singular or plural. PREPOSITION: A word that connects a noun or noun phrase (the object) to another word, phrase, or clause and conveys a relation between the elements. PRONOUN: Takes the place of a person, place, or thing: can function any way a noun can function; may be nominative, objective, or possessive; may be singular or plural; may be personal (therefore, first, second or third person), demonstrative, intensive, interrogative, reflexive, relative, or indefinite. VERB: Word that represents an action or a state of being; may be action, linking, or helping; may be past, present, or future tense; may be singular or plural; may have active or passive voice; may be indicative, imperative, or subjunctive mood. FUNCTIONS OF WORDS WITHIN A SENTENCE: CLAUSE: A group of words that contains a subject and complete predicate: may be independent (able to stand alone as a simple sentence) or dependent (unable to stand alone, not expressing a complete thought, acting as either a noun, adjective, or adverb). -
Automatic Extraction of Compound Verbs from Bangla Corpora
Automatic Extraction of Compound Verbs from Bangla Corpora Sibansu Mukhopadhyay (1), Tirthankar Dasgupta (2), Manjira Sinha (2), Anupam Basu (2) (1) Society for Natural Language Technology Research, Kolkata 700091 (2) Indian Institute of Technology Kharagpur, Kharagpur 721302 {sibansu, iamtirthankar, manjira87, anupambas}@gmail.com ABSTRACT In this paper we present a rule-based technique for the automatic extraction of Bangla compound verbs from raw text corpora. In our work we have (a) proposed rules through which a system could automatically identify Bangla CVs from texts. These rules will be established on the basis of syntactic interpretation of sentences, (b) we shall explain problems of CV identification subject to the semantics and pragmatics of Bangla language, (c) finally, we have applied these rules on two different Bangla corpuses to extract CVs. The extracted CVs were manually evaluated by linguistic experts, where our system achieved an accuracy of around 70%. KEYWORDS: COMPOUND VERBS, AUTOMATIC EXTRACTION, VECTOR VERBS Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP), pages 153–162, COLING 2012, Mumbai, December 2012. 1 Introduction Compound verbs (henceforth CV) are a special type of complex predicate consisting of a sequence of two or more verbs acting as a single verb and expressing a single meaning. However, not all verb sequences are considered as compound verbs. A compound verb consists of a sequence of two verbs, V1 and V2, such that V1 is a common verb with /-e/ [non-finite] inflection marker and V2 is a finite verb that indicates orientation or manner of the action or process expressed by V1 (Dasgupta, 1977).