Syntactic-Semantic Analysis of Modern Chinese with Left-Associative Grammar
Inaugural Dissertation
Faculty of Social Sciences and Theology
The Friedrich-Alexander-University of Erlangen-Nuremberg
presented by Qiuxiang Feng from China
D29
Date of the oral test: 03.Feb.2012
Dean: University Professor Dr. Heidrun Stein-Kecks
Primary reviewer: University Professor Dr. Roland Hausser
Second reviewer: University Professor Dr. Michael Lackner

Preface
People have long sought to interact with computers in natural language, which is why natural language processing is significant in computer science and artificial intelligence. Before a computer, or a robot, can respond properly, it must first understand what we say. This is where parsing comes in. For a long time, parsing referred only to syntactic analysis, which shows the grammatical relations between sentence elements. Formal grammars such as Phrase Structure Grammar, Categorial Grammar and Dependency Grammar have been applied to syntactic parsing. However, syntax and semantics represent the duality of a language; their relation resembles that of form and content. When these grammars are applied to Chinese, the weakness of purely syntactic parsing becomes more apparent. Chinese is a paratactic language, in which the semantic relations between sentence elements play a prominent role, so additional semantic information about words is necessary for high-quality syntactic analysis. If the parsing result presents the semantic relations between words together with the syntactic relations, it will benefit language production, which is important for machine translation, artificial intelligence, etc. In addition, if the whole analysis process models the mechanism of a human agent's language understanding, it will be theoretically more solid and empirically more valuable. Under these circumstances, Left-Associative Grammar (LAG) is proposed.
LAG is grounded in the language theory of Surface compositional Linear Internal Matching (SLIM). SLIM is agent-oriented, aiming to explain the understanding and purposeful production of signs in terms of completely explicit, mechanical (logically electronic) procedures. Its technical implementation is called Database Semantics (DBS). LAG and the data structure of a word bank are the two foundations of DBS. As the particular algorithm of DBS, LAG follows the time-linear order of language input and allows parallel derivation, and is therefore computationally efficient. Guided by the procedural language theory of SLIM/DBS, this research applies LAG to the parsing of modern Chinese. The most basic and frequent patterns of Chinese phrases and sentences are analyzed automatically on both the syntactic and the semantic level. The discussion is organized into three parts, NOUN, VERB and ADJECTIVE, which correspond to basic parts of speech found in all languages. Linguistic analysis is also integrated in order to provide a solid foundation for the automatic parsing. SLIM/DBS proves complete, consistent and instructive for Chinese parsing, and LAG proves unique, efficient and competitive for the syntactic-semantic analysis of a natural language, here Chinese.

The main content of this dissertation is as follows. Part I consists of five chapters: after an overview of Chinese nouns, the following chapters focus on numerals and quantifiers, pronouns, and particular grammatical phenomena related to nouns. Part II introduces various Chinese verbs: common verbs, modal verbs, directional verbs, causative verbs and three-valence verbs; their general application is analyzed in LAG. Part III presents the analysis of adjectives, adverbs, prepositions and conjunctives. Throughout the research, some attention is also paid to the differences between Chinese and English.
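The time-linear, left-associative derivation order described above can be illustrated with a minimal Python sketch. It is not Hausser's LAG formalism, which works with categorial operations and rule packages; the toy lexicon `LEXICON`, the category labels and the combination table `RULES` are hypothetical. What the sketch does show is the order of composition: the current sentence start is combined with the next word, one word at a time, and keeping a set of states lets alternative readings be carried forward in parallel.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: surface word -> possible categories.
# A word with several categories would spawn parallel readings.
LEXICON = {
    "我": ["N"],  # wo, 'I'
    "看": ["V"],  # kan, 'read'
    "书": ["N"],  # shu, 'book'
}

# Hypothetical combination table: (category of sentence start,
# category of next word) -> category of the new sentence start.
RULES = {
    ("N", "V"): "N+V",  # subject + verb, object still expected
    ("N+V", "N"): "S",  # object completes the clause
}


@dataclass(frozen=True)
class State:
    words: tuple  # surface words consumed so far, in time order
    cat: str      # category of the current sentence start


def parse(words):
    """Left-associative, time-linear parse; returns (final states, trace)."""
    if not words:
        return set(), []
    # The first word on its own forms the initial sentence start.
    states = {State((words[0],), c) for c in LEXICON.get(words[0], [])}
    trace = [states]
    for word in words[1:]:
        successors = set()
        for state in states:  # every surviving reading is continued in parallel
            for cat in LEXICON.get(word, []):
                result = RULES.get((state.cat, cat))
                if result is not None:
                    successors.add(State(state.words + (word,), result))
        states = successors
        trace.append(states)
        if not states:  # no rule applies: the input is rejected at this word
            break
    return states, trace


final, trace = parse(["我", "看", "书"])
for step, states in enumerate(trace, start=1):
    print(step, sorted((" ".join(s.words), s.cat) for s in states))
# 1 [('我', 'N')]
# 2 [('我 看', 'N+V')]
# 3 [('我 看 书', 'S')]
```

Each step extends the previous sentence start by exactly one word, so the derivation never needs to look ahead or start from an abstract root symbol. An input such as 看 我 is rejected at the second word because the hypothetical table has no rule for a verb followed by a noun in sentence-initial position.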
Part IV presents the discussion and conclusion.

Acknowledgement
I thank Prof. Dr. Hausser for his kind supervision throughout the research. I would also like to thank Prof. Dr. Lackner, the second reviewer of this dissertation, for his comments and instructions. My thanks also go to my colleagues at the University of Erlangen: Johannes Handl, Besim Kabashi, Thomas Proisl and Dr. Carsten Weber, who have helped me a great deal since I came to Germany for research. In the past four months, I have had the chance to work with four nice students in the Chinese task group: Weiwei Zheng, Hsiaoyun Huang, Sina Graf and Laura Zischler. I very much enjoyed the time with them, and I thank them for their wonderful ideas and support. Many other students at CLUE have also helped me in different ways, and I am grateful to them, too. Prof. Xiwu Han (Heilongjiang University, China) and Dr. Xiao Sun (Dalian University of Technology, China) kindly answered my questions related to Chinese parsing, which I appreciate very much. Finally, I give my heartfelt gratitude to my husband. Without his encouragement, I would not have started this research. Without his help in writing the parser in Perl and formatting in LyX, I would not have finished the dissertation on time. He is also the first reader of this dissertation. Thanks for his listening and understanding, and for all his time, patience and love.

Erlangen, September 2011
Qiuxiang Feng

Contents
1 Introduction
  1.1 Motivation
  1.2 Chinese Syntax
  1.3 Chinese Morphology
  1.4 LAG-Chinese Lexicon
  1.5 Corpus and Technology
I NOUN Application and Analysis
2 Analysis of Nouns
  2.1 Inflectional Change
  2.2 Derivational Change
3 Analysis of Numerals and Quantifiers
  3.1 Combination of N-Q
  3.2 Quantifiers in Repetition
4 Analysis of Pronouns
  4.1 Personal Pronouns
  4.2 Demonstrative Pronouns
5 Analysis of Nouns Modified by Adverbial Adjectives
  5.1 A Syntactic View
  5.2 A Semantic View
6 Analysis of Nouns in Subject or Object Positions
  6.1 Temporal Nouns
  6.2 Location Nouns
II VERB Application and Analysis
7 Analysis of Common Verbs
  7.1 Verbs in Repetition
  7.2 Verbs in Phrases and Sentences
8 Analysis of Modal Verbs
  8.1 Modified by Adverbs
  8.2 Modal Verbs in Combination
9 Analysis of Directional Verbs
  9.1 Verb + Directional Verb
  9.2 Adjective + Directional Verb
10 Analysis of Causative and Three-valence Verbs
  10.1 Causative Verbs
  10.2 Three-valence Verbs
III ADJECTIVE Application and Analysis
11 Analysis of Adjectives
  11.1 Overview
  11.2 Adjectives as Predicators
  11.3 Adjectives in Repetition
12 Analysis of Adverbs
  12.1 Overview
  12.2 Adverbs as Complement
  12.3 Adverbs in Coordination
13 Analysis of Prepositions
  13.1 Overview
  13.2 Ba and Bei Constructions
  13.3 Analysis of Conjunctives
IV Conclusion and Prospects
14 Discussion and Conclusion
15 Future Prospects
16 Summary
V Appendix
Bibliography

Chapter 1 Introduction
Natural language processing (NLP) is concerned with the interaction between computers and natural languages. It is a part of computer science and artificial intelligence. NLP generally comprises language understanding and language production, and language understanding is mainly represented by parsing. One key approach to automatic syntactic parsing is rule-based.

1.1 Motivation
No matter how free the sentence structure is, the fundamental rules remain relatively stable. The rule-based approach starts from the most fundamental features of sentences, such as morphology and syntax, and generalizes syntactic rules from a macro perspective.
These rules are formalized mathematically and adapted for computer processing; the resulting grammars are therefore called "grammars for computational linguistics". They include Transformational Grammar (TG), Government & Binding (GB), Minimalist Program (MP), Phrase Structure Grammar (PSG), Generalized Phrase Structure Grammar (GPSG), Head-driven Phrase Structure Grammar (HPSG), Functional Unification Grammar (FUG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG), Categorial Grammar (CG), Dependency Grammar (DG), Link Grammar (LG), and so on. All these grammars aim to explain how larger language units are composed of smaller ones. They can be divided into two groups: phrase-based grammars, e.g. TG, GB, MP, PSG, GPSG, HPSG, FUG, LFG and TAG; and word-based grammars, e.g. CG, DG and LG. The traditional rule-based approach has also been applied to Chinese syntactic parsing, and much research has been done in this field (Zhao et al. [1992]; Zhou et al. [1999]; Zhou [1999]; Yang [2000]; Yuan et al. [2001]; Wang et al. [2003]; Liu and Zhao [2009]; Hu et al. [2010]). Rule-based Chinese syntactic parsers generally share the following features:
• to rely on rules to define the collocation relations between sentence elements;
• to generate a syntactic tree or an equivalent form of the input string;
• to exclude incorrect structures through a disambiguation mechanism;
• to be equipped with a rule base and an electronic dictionary.

1.1.1 Phrase Structure Grammar
Among all the grammars that have been applied to Chinese parsing, Phrase Structure Grammar (PSG) is the most classic. PSG (Chomsky [1957]; Chomsky [1965]) is further divided, according to the degree of restriction on its rules, into grammars for regular, context-free, context-sensitive and recursively enumerable languages. Context-free PSG is widely applied in natural language processing (Wang et al. [2003]; Feng [2000]).
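The levels of restriction can be made concrete by classifying rules by their form. The Python sketch below assumes a hypothetical encoding: a rule is a pair of symbol tuples, symbols beginning with an uppercase letter are nonterminals, and everything else (including Chinese characters) is a terminal. It uses right-linear rules for the regular type and the noncontracting condition |lhs| ≤ |rhs| for the context-sensitive type, and it ignores empty (ε) rules. A grammar's type is taken to be the most restrictive type that all of its rules satisfy; note that this classifies the grammar, not necessarily the language it generates.

```python
def is_nonterminal(symbol):
    # Assumed convention: nonterminals start with an uppercase letter.
    return symbol[:1].isupper()


def rule_type(lhs, rhs):
    """Most restrictive Chomsky type (3, 2, 1 or 0) that the rule lhs -> rhs fits."""
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # Right-linear: A -> a  or  A -> a B  (type 3, regular).
        right_linear = (
            (len(rhs) == 1 and not is_nonterminal(rhs[0]))
            or (len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]))
        )
        return 3 if right_linear else 2  # otherwise context-free
    if any(is_nonterminal(s) for s in lhs) and len(lhs) <= len(rhs):
        return 1  # noncontracting, i.e. context-sensitive
    return 0  # unrestricted (recursively enumerable)


def grammar_type(rules):
    return min(rule_type(lhs, rhs) for lhs, rhs in rules)


# Hypothetical example grammars.
CFG = [
    (("S",), ("NP", "VP")),
    (("NP",), ("我",)),
    (("NP",), ("书",)),
    (("VP",), ("V", "NP")),
    (("V",), ("看",)),
]
REGULAR = [(("S",), ("a", "S")), (("S",), ("b",))]
CONTEXT_SENSITIVE = [(("S",), ("a", "B")), (("a", "B"), ("a", "b"))]
UNRESTRICTED = [(("A", "B"), ("C",))]

for name, rules in [("CFG", CFG), ("REGULAR", REGULAR),
                    ("CONTEXT_SENSITIVE", CONTEXT_SENSITIVE),
                    ("UNRESTRICTED", UNRESTRICTED)]:
    print(name, grammar_type(rules))
```

The small Chinese grammar comes out as type 2 because rules such as S → NP VP rewrite a single category into several categories, which is exactly the context-free form most widely used in NLP.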
Transformational rules are applied in PSG, which aims to define the basic grammatical relations in the deep structure of the sentence. The derivation order of PSG is schematized as follows:

[Figure: derivation order of PSG, top-down expanding]

The time-linear structure of language was emphasized by de Saussure (Saussure [1974]). The term "time-linear" means linear like time and in the direction of time (Hausser [1992]; Hausser [2001]). Clearly, the top-down expanding derivation of PSG, which begins with the start symbol and rewrites categories from the top of the tree downward, is not based on time-linearity.
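The contrast can be made concrete with a small Python sketch of a top-down, leftmost PSG derivation. The toy grammar `GRAMMAR` is hypothetical, and the search is a simple backtracking procedure rather than any particular parser from the literature. The derivation starts from the start symbol S, and the first word 我 appears only in the third sentential form, after S and NP have been expanded. In a leftmost derivation the words do surface from left to right, but each step rewrites a category node from the top of the tree instead of adding the next word to what has already been read.

```python
# Hypothetical toy context-free grammar (no empty rules).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("我",), ("书",)],
    "VP": [("V", "NP")],
    "V": [("看",)],
}


def leftmost_derivation(target, form=("S",)):
    """Return the sentential forms of a leftmost derivation of target, or None."""
    target = tuple(target)
    # Locate the leftmost nonterminal.
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            break
    else:
        # Only terminals left: succeed iff they spell out the target.
        return [form] if form == target else None
    # Prune: terminals left of i are final and must match the target, and since
    # no rule is empty, a form longer than the target can never shrink back.
    if form[:i] != target[:i] or len(form) > len(target):
        return None
    for rhs in GRAMMAR[symbol]:  # top-down: expand the category node
        rest = leftmost_derivation(target, form[:i] + rhs + form[i + 1:])
        if rest is not None:
            return [form] + rest
    return None


for sentential_form in leftmost_derivation(["我", "看", "书"]):
    print(" ".join(sentential_form))
# S
# NP VP
# 我 VP
# 我 V NP
# 我 看 NP
# 我 看 书
```

Compared with a time-linear derivation, which would consume 我 at its first step, this derivation spends two steps building abstract structure before any word is reached, and it needs backtracking (here when NP is tried as 我 in object position) because the expansion is driven by the grammar rather than by the incoming words.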