Subcategorization Acquisition and Classes of Predication in Urdu
Total Page:16
File Type:pdf, Size:1020Kb
Subcategorization Acquisition and Classes of Predication in Urdu Dissertation zur Erlangung des akademischen Grades eines Doktors der Philosophie vorgelegt von Ghulam Raza an der Universit¨atKonstanz Geisteswissenschaftliche Sektion Fachbereich Sprachwissenschaft Tag der m¨undlichen Pr¨ufung:30 November 2011 Referentin (Chair): Dr. Heike Zinsmeister Referentin: Prof. Dr. Miriam Butt Referent: Prof. Dr. Rajesh Bhatt Dedicated to Abbaa Saeen and Ammaan Huzoor Acknowledgements Hinrich Sch¨utzewas my first mentor. He supervised my PhD research for more than two years. He spared enough time to let me discuss different topics of computational linguistics with him. It was a wonderful experience to learn the concepts of information retrieval from a very good book which Hinrich Sch¨utzeis a co-author of. In the beginning, he assigned me the task of improving the ranking of German parses by encoding lexical features. Al- though the results were not positive, it was a great opportunity for me to learn. Since I did not have a good knowledge of the German language, I shunned experimenting on German corpora further and changed my re- search topic to the automatic acquisition of subcategorization information of verbs from an Urdu corpus under the supervision of Miriam Butt. She exposed me to the Lexical Functional Grammar framework in which I later analyzed some Urdu constructions. Exploring interesting syntactic phenomena in Urdu language and sharing them with her has been a source of joy and happiness for me. She spent a lot of her precious time on com- menting and correcting many drafts of my presentations, papers and the thesis. My knowledge of English and linguistics should now definitely be better than it was three years before. Tikaram Poudel is the person from whom I learnt how linguists think and react to empirical observations in a language. His style of discussing a language and linguistics was inspiring. Unfortunately, he left Konstanz some months after I reached here. Acknowledgements are due to Helmut Schmidt, Martin Forst, Aoifa Cahill and Sabine Schulte im Walde who always readily responded whenever I asked them for some help during my stay at the University of Stuttgart. Kati Schweitzer was so nice to happily make a copy of a statistics book for me at times when I did not know how to use a copy machine. Alice Davison took the headache of sending me the printed copy of one of her articles from the USA when I requested her to send just an electronic copy. The colleagues who have been helpful at Konstanz are: Jaouad Mousser, Melanie Seiss, Nette Hautli, Qaiser Abbas, Sebastian Sulger, Tafseer Ahmed i and Tina B¨ogel.In particular, Jaouad Mousser provided me a Latex tem- plate to write the thesis; Melanie Seiss proved an angel on many occasions when I was stuck with Latex problems; Nette Hautli translated the sum- mary of the thesis in German. Rajesh Bhatt commented on the first draft of the thesis and identified some errors. The higher education commission (HEC) of Pakistan was generous to award me the PhD scholarship and the Pakistan Institute of Engineering and Ap- plied Sciences (PIEAS), where I am employed in Pakistan, was kind to grant me the study leave. Thanks to all! ii Transcription Scheme Consonants// Orthographic Phonetic (IPA) Transcription H. b b H p p H ”t t H ú t. H T s h. dZ j h tS c h è h p x x X ”t d X ã d. X D z P r r P ó r. P z z P Z y s s SS sQ s dQ z tQ t DQ z ¨ QA ¨ GG ¬ f f q q ¸ k k À g g È l l Ð m m à n n ð V v è h h ø j y ë- (Aspiration) -h -h iii Vowels// Orthographic Phonetic (IPA) Transcription @ @A @ I ı @ UU @ a: a e e þ@ þ@ æ E ø@ i: i o o ð@ ð@ OO ð@ u: u à (Nasalization) ~ ~ iv Morphemic Glosses Gloss Meaning 1 first person 2 second person 3 third person M male F female Neut neutral Sg singular Pl plural Pres present tense Past past tense Fut future Nom nominative Acc accusative Dat dative Erg ergative Gen genitive GenR reflexive genitive Refl reflexive Pron pronoun RelP relative pronoun Abl ablative Inst instrumental Loc locative Temp temporal Obl oblique form Dir direct form Inf infinitive Perf perfect aspect Imperf imperfect aspect Subjn subjunctive Prog progressive Caus causative Ez ezafe Emp emphatic Conj conjunction v Abbreviations NLP Natural Language Processing SCF Subcategorization Frame CLC Case Clitic and Complementizer Combination SASU Subcategorization Acquisition System for Urdu XLE Xerox Linguistic Environment vi Contents 1 Introduction1 1.1 Predication . .1 1.2 Subcategorization . .4 1.3 Subcategorization in LFG . .8 1.4 ParGram . 12 1.5 Subcategorization lexicons . 13 1.6 Contribution of the thesis . 16 1.7 Outline of the thesis . 17 2 Urdu verbs and challenges for lexical acquisition 19 2.1 Urdu verb types . 19 2.1.1 Simple predicates . 19 2.1.1.1 Paradigms of base form derivation . 21 2.1.1.2 Paradigms of stems' derivation . 23 2.1.2 Complex predicates . 26 2.1.2.1 Verb-Verb predicates . 27 2.1.2.2 Adjective-Verb predicates . 29 2.1.2.3 Noun-Verb predicates . 30 2.1.2.4 Compound-Verb predicates . 31 2.1.3 Even predicates . 34 2.1.4 Summary . 35 2.2 Challenges for subcategorization acquisition . 36 2.2.1 Absence of unique case clitic forms . 37 2.2.1.1 The case clitic ko ..................... 37 2.2.1.2 The case clitic se ...................... 39 2.2.2 Different marking of grammatical functions . 42 2.2.2.1 Different marking of subject . 43 2.2.2.2 Differential object marking . 44 vii CONTENTS 2.2.3 Free word order . 46 2.2.4 Multifunctionality of the complementizer kıh ........... 47 2.2.5 Argument attachment ambiguities . 51 2.3 Summary . 53 3 Automatic lexical acquisition 55 3.1 Corpus selection . 55 3.1.1 Corpora used in previous works . 55 3.1.2 Urdu corpus . 56 3.2 Identification of verbs . 58 3.2.1 Different methods of identifying verbs . 58 3.2.2 Urdu Verb Conjugator . 58 3.3 Types of SCFs . 60 3.3.1 Distinction between arguments and adjuncts . 61 3.3.2 Arguments and adjuncts in Urdu . 62 3.3.2.1 Case marked NPs as arguments/adjuncts . 62 3.3.2.2 Adposition marked NPs as arguments/adjuncts . 66 3.3.2.3 Infinitival arguments/adjuncts . 67 3.3.2.4 Clausal arguments/adjuncts . 72 3.3.3 Number and types of SCFs considered for Urdu verbs . 73 3.4 Subcategorization Acquisition System for Urdu (SASU) . 76 3.4.1 Candidate Finder and Scope Delimiter . 77 3.4.1.1 Initial screening phase . 78 3.4.1.2 Scope delimiting phase . 81 3.4.1.3 Final screening phase . 85 3.4.2 CLC Builder and Frequency Collector . 86 3.4.3 CLC Filtering . 86 3.4.3.1 Relative frequencies . 86 3.4.3.2 Log likelihood ratio . 87 3.4.3.3 T-scores . 88 3.4.3.4 Binomial filter . 88 3.4.4 SCF Induction . 89 3.4.4.1 Application of Metarules . 89 3.4.4.2 CLC Collapse . 90 3.4.4.3 SCF Information Collection . 90 3.5 Results and evaluation . 91 3.6 Usefulness of the SASU system . 92 viii CONTENTS 3.7 Limitations of the SASU system . 96 3.8 Conclusion . 97 4 The verb ho `be/become' 99 4.1 Infinitive ho .................................. 100 4.2 Non-aspectual ho .............................. 101 4.2.1 The verb ho as an intransitive verb . 101 4.2.2 The verb ho as copula . 102 4.2.2.1 Possession of abstract characteristics/properties . 106 4.2.2.2 Possession of concrete objects . 110 4.2.2.3 Syntactic frames of stative copula . 114 4.2.3 The verb ho as modal . 114 4.2.4 The verb ho as tense auxiliary . 116 4.2.5 Optionality of non-aspectual ho ................... 117 4.2.6 The verb ho with the future . 118 4.3 Aspectual ho ................................. 121 4.3.1 The imperfect form of ho ...................... 121 4.3.2 The perfect form of ho ........................ 124 4.3.2.1 Perfect ho with present and future interpretation . 125 4.3.2.2 Perfect ho as an emphasis on being . 126 4.4 Classification and distinction of the verb ho uses . 127 4.4.1 Aspectual distinction . 127 4.4.2 Light verb distinction . 129 4.4.3 Auxiliary distinction . 129 4.4.4 Complement distinction . 130 4.5 The verb ho as a light verb . 130 4.6 Various other uses of ho ........................... 132 4.7 Characterizing participles . 136 4.7.1 Concomitant participles . 136 4.7.2 Resultative participles . 139 4.8 Summary . 144 5 Arguments and syntax of nouns 145 5.1 Argument-taking nouns . 145 5.2 Genitive modifiers/arguments . 148 5.2.1 The genitive marker . 148 5.2.2 Structure of noun phrases with multiple genitive modifiers . 151 5.2.3 Attributive genitive modifiers . 155 ix CONTENTS 5.2.4 Nominals and genitive arguments . 162 5.2.4.1 Infinitives with genitive arguments . 162 5.2.4.2 Other nominals with genitive marked arguments . 167 5.2.5 Implementation of NPs with multiple genitives in LFG . 170 5.2.6 Summary . 175 5.3 Argument-taking adjectives . 175 5.3.1 Genitive marked complements of degree adjectives . 177 5.3.2 Dative marked complements of degree adjectives .