
Syntactic Knowledge based Framework for Resolving Reflexive and Distributive Anaphors in Urdu Discourse

By

JAMAL ABDUL NASIR

Registration No. 1079-D-83

A thesis submitted in partial fulfillment of the requirements for the degree of Ph.D. in Computer Science

INSTITUTE OF COMPUTING AND INFORMATION TECHNOLOGY GOMAL UNIVERSITY DERA ISMAIL KHAN, KPK, PAKISTAN

September, 2020

Dedicated to Humanity

List of Contents

Student’s Declaration
List of Tables
List of Figures
List of Illustrations
List of Abbreviations
List of Appendices
Acknowledgement
Abstract
Chapter 1: Introduction
1.1 Overview
1.2 Terminology
1.3 Anaphora Resolution
1.4 Aim and Objectives
1.5 Trends and Challenges
1.6 Reflexive and distributive anaphora in Urdu
1.7 Key Contributions
1.8 Significance of the Study
1.9 Thesis Organization
1.10 Summary
Chapter 2: Literature Review
2.1 Overview
2.2 Factors in Anaphora Resolution
2.2.1 Constraints
2.2.2 Preferences
2.3 Early AR systems
2.4 Modern Anaphora Resolution Systems
2.5 Machine Learning and Statistics based AR Systems
2.6 AR for Urdu and Indian Languages
2.7 Summary
Chapter 3: Reflexive and Distributive
3.1 Overview
3.2 Cases in Urdu
3.2.1 Nominative case
3.2.2 Ergative case
3.2.3 Accusative case
3.2.4 Dative case
3.2.5 Instrumental case
3.2.6 Genitive case
3.2.7 Locative case
3.2.8 Vocative case
3.2.9 Oblique case
3.3 Reflexive in Urdu (ضمائر معکوس)
3.4 Exploring (ضمائر معکوس)
3.4.1 Reflexive pronouns
3.4.1.1 Possessive reflexive pronoun preceded by a noun or a pronoun
3.4.1.2 Possessive reflexive pronoun preceded by ergative case
3.4.1.3 Possessive reflexive pronoun preceded by an adverb and a noun/pronoun
3.4.1.4 Possessive reflexive pronoun preceded by a dative case
3.4.1.5 The connector “اور” (Aur) between two possessive reflexive pronouns
3.4.2 Non-Possessive or Emphatic Reflexive pronoun “خود”
3.4.2.1 Emphatic Reflexive AD “خود” compound with a noun or pronoun
3.4.2.2 Emphatic Reflexive pronoun “خود” preceded by ergative case
3.4.2.3 Emphatic reflexive pronoun “خود” preceded by a dative case
3.4.2.4 Emphatic reflexive pronoun “خود” preceded by an adverb and a noun/pronoun
3.4.2.5 Emphatic reflexive pronoun “خود” preceded by “بذات” and noun/pronoun
3.4.3 Possessive and non-possessive reflexive pronouns together
3.4.4 Distributive Reflexive pronoun
3.5 Exploring Distributive Pronouns (ضمائر تقسیم)
3.5.1 Distributive Pronoun ہر ایک (Har Ek)
3.5.1.1 Group Reference
3.5.1.2 Role of …
3.5.1.3 Using properties/attributes/complements to identify referent
3.5.1.4 Pronoun ہر ایک (Har Ek) followed by Noun/Noun Phrase
3.5.1.5 Topicalized structure
3.5.1.6 Referent in Oblique form
3.5.1.7 Pronoun ہر ایک (Har Ek) referring to many entities of same category separated by comma
3.5.2 Distributive Pronoun کوئی ایک (Koi Ek)
3.5.2.1 Group formed by number
3.5.2.2 Group formed by a plural
3.5.2.3 Noun after the pronoun کوئی ایک (Koi Ek)
3.5.2.4 Personal Pronoun before pronoun کوئی ایک (Koi Ek)
3.5.3 Distributive Pronoun کوئی بھی (Koi Bhi)
3.5.3.1 Distributive pronoun کوئی بھی (Koi Bhi) in a negative sentence
3.5.3.2 Distributive pronoun کوئی بھی (Koi Bhi) in a positive sentence
3.5.4 Distributive pronoun کئی ایک (Kai Ek)
3.5.4.1 Reference to preceding group
3.5.4.2 Genitive or Possessive case
3.6 Summary
Chapter 4: Proposed Framework for Reflexive and Distributive Anaphora Resolution (RADAR)
4.1 Introduction to RADAR
4.2 General framework for anaphora resolution
4.3 Architecture of RADAR
4.4 Framework for resolving reflexive anaphora
4.5 Workflow of framework for reflexive anaphora
4.5.1 Reflexive Identifier
4.5.2 Noun Phrase Extractor
4.5.3 PR case Identifier
4.5.4 NPR case Identifier
4.5.5 PR Rules
4.5.6 NPR Rules
4.5.7 Anaphora Resolution
4.5.8 RA
4.5.9 Noun
4.5.10 PP
4.5.11 Noun Cases
4.6 Framework for resolving distributive anaphora
4.7 Workflow of framework for distributive anaphora
4.7.1 Distributive Anaphor Identifier
4.7.2 Mark Groups
4.7.3 Verb Identification
4.7.4 Distributive Anaphora Resolution
4.7.5 Resolve and mark case
4.7.6 Processes and resources for Distributive anaphora
4.7.6.1 QW()
4.7.6.2 Nmbr()
4.7.6.3 SP()
4.7.6.4 Verb()
4.7.6.5 Attrib()
4.7.6.6 Polarity()
4.7.6.7 NCase()
4.7.6.8 MVerb()
4.7.6.9 DA
4.7.6.10 Plurals
4.7.6.11 Numbers
4.7.6.12 Quantifier
4.7.6.13 …
4.7.6.14 Attribute
4.7.6.15 Category
4.7.6.16 PP
4.7.6.17 Oblique
4.7.6.18 Adverb
4.7.6.19 NounCase
4.8 Summary
Chapter 5: Evaluation and Results
5.1 Evaluation overview
5.2 Evaluation of RADAR
5.3 Overall Results
5.4 Evaluation of RADAR for Reflexive anaphora
5.4.1 Analysis of possessive reflexive pronoun
5.4.2 Analysis of Non-possessive reflexive pronoun
5.5 Evaluation of RADAR for Distributive anaphora
5.5.1 Analysis of possessive reflexive pronoun
5.5.1.1 Distributive pronoun ہر ایک (Har Ek) with various entities
5.5.1.2 Distributive pronoun کوئی ایک (Koi Ek) with different entities
5.5.1.3 Distributive anaphor کوئی بھی (Koi Bhi)
5.5.1.4 Distributive anaphor کئی ایک (Kai Ek)
5.6 Conclusion
5.7 Future Work
Chapter 6: References

Student’s Declaration

I, Jamal Abdul Nasir, do hereby state that my Ph.D. thesis titled “Syntactic Knowledge Based Framework for Resolving Reflexive and Distributive Anaphors in Urdu Discourse” is my own work and has not been submitted previously by me for taking any degree from Gomal University, Dera Ismail Khan or anywhere else in the country/world.

I understand the zero tolerance policy of the HEC and Gomal University, Dera Ismail Khan towards plagiarism. Therefore, I declare that no portion of my thesis has been plagiarized and any material used as reference is properly cited.

I undertake that if I am found guilty of any formal plagiarism in the above titled thesis even after award of Ph.D. degree, the university reserves the rights to withdraw/revoke my Ph.D. degree and that HEC has the right to publish my name on the website on which names of students are placed who submitted plagiarized work.

Name of Student: Jamal Abdul Nasir Signature______Date______

Name of Supervisor: Dr. Zia Ud Din Signature______Date______


List of Tables

2.1 Summary of related work in Indian languages
3.1 Singular-Plural
3.2 Noun Cases
3.3 Resolution rules for reflexive anaphora
3.4 Resolution rules for distributive anaphora
5.1 Results of Reflexive and Distributive Anaphora Resolution
5.2 Reflexive Pronoun Individually
5.3 Possessive Reflexive Pronoun with various entities
5.4 Non-Possessive Reflexive Pronoun with various entities
5.5 Distributive Anaphors individually
5.6 Distributive pronoun ہر ایک (Har Ek) with various entities
5.7 Distributive pronoun کوئی ایک (Koi Ek) with different entities
5.8 Distributive pronoun کوئی بھی (Koi Bhi) with different entities
5.9 Distributive pronoun کئی ایک (Kai Ek) with different entities


List of Figures

1.1 Knowledge required for Anaphora Resolution
3.1 Possessive reflexive pronoun preceded by noun
3.2 Possessive reflexive pronoun with ergative case
3.3 Possessive reflexive pronoun preceded by adverb and noun
3.4 Two possessive reflexive pronouns together
3.5 Non-possessive reflexive pronoun preceded by a noun
3.6 Non-possessive reflexive pronoun with ergative case
3.7 Non-possessive reflexive pronoun preceded by ergative case
3.8 NPRP with word بذات preceded by personal pronoun
3.9 PRP and NPRP together preceded by a personal pronoun
3.10 PRP twice preceded by entity to be distributed
3.11 Distributive pronoun referring to a noun in plural
3.12 Multiple antecedents and verbs as deciding factors
3.13 Selecting antecedent on the basis of attributes-1
3.14 Selecting antecedent on the basis of attributes-2
3.15 Distributive anaphor referring to a class
3.16 Distributive anaphor in group formed by number in words
3.17 Distributive pronoun preceded by personal pronoun
3.18 Distributive pronoun in negative sentence
3.19 Distributive pronoun in positive sentence
3.20 Distributive pronoun in genitive sentence
4.1 General Framework for anaphora resolution
4.2 Architecture for RADAR
4.3 Architecture of framework for resolution of reflexive pronoun
4.4 Architecture of framework for resolution of distributive pronoun
5.1 Results of reflexive and distributive anaphora
5.2 Reflexive anaphors individually
5.3 Possessive reflexive pronoun with various entities
5.4 Non-Possessive Reflexive Pronoun with various entities
5.5 Distributive pronoun individually
5.6 Distributive pronoun ہر ایک (Har Ek) with different entities
5.7 Distributive pronoun کوئی ایک (Koi Ek) with different entities
5.8 Distributive pronoun کوئی بھی (Koi Bhi) with different entities
5.9 Distributive pronoun کئی ایک (Kai Ek) with different entities


LIST OF ABBREVIATIONS

Acronym Definition

AD Anaphoric Device

AR Anaphora Resolution

HTML Hypertext Markup Language

IE Information Extraction

MT Machine Translation

NER Named Entity Recognition

NLP Natural Language Processing

NLGS Natural Language Generation System

NLTK Natural Language Toolkit

PoS Part of Speech

QA Question Answering

TS Text Summarization

ULP Urdu Language Processing

OPF Operational Research Framework

CRF Comprehensive Research Framework

PP Personal Pronoun

PRP Possessive Reflexive Pronoun

NPRP Non-Possessive Reflexive Pronoun


List of Appendices

Appendix A: Reflexive pronoun in Urdu
Appendix B: Noun
Appendix C: Personal Pronoun
Appendix D: Noun Cases
Appendix E: Distributive pronoun in Urdu
Appendix F: Plurals
Appendix G: Numbers
Appendix H: Quantifiers
Appendix I: Verbs
Appendix J: Attribute
Appendix K: Category/Class
Appendix L: Oblique
Appendix M: Adverb


Acknowledgement

All thanks to Almighty Allah, the most merciful and beneficent, and all blessings upon Prophet Hazrat Muhammad ﷺ, who is always a torch of guidance and knowledge for the whole of mankind on the earth.

I would like to thank Dr. Zia Ud Din, my supervisor, for his continuous support and encouragement and for the interest he has shown in my work during my PhD studies.

A big ‘Thank You’ to all my colleagues at the Institute of Computing and Information Technology for many helpful discussions that helped shape my ideas. I would like to pay special thanks to Dr. Shahid Kamal and Dr. Fazal Masood for their continuous support and suggestions during this research work.

Special thanks to Professor Dr. Muhammad Abid Khan, University of Peshawar, who first introduced me to the world of Computational Linguistics.

Finally, my gratitude goes to my family, especially my mother and my wife Dr. Fazeelat, for their continuous encouragement, which helped me complete this thesis, and special thanks to my kids Ahmad, Ayesha, Fatima, and Ali for their patience during this research work.


Abstract

In today’s global information society, fast and easy access to information in one’s language of choice is essential, and multilingual NLP applications are facilitating this access. Anaphora resolution is an essential part of these applications.

In computational linguistics, anaphora resolution has been a topic of research for more than 40 years. Natural languages contain anaphoric links, and a text cannot be fully understood without knowledge of the references or links between its various units. An anaphor is an expression that refers to another expression in the same context, called its antecedent. Identifying the antecedent is important because it contains the information needed to interpret the anaphor. For a human, interpreting and understanding these anaphoric links is an easy task, but for a machine it is a complex one.

Anaphora resolution is essential for numerous Natural Language Processing (NLP) applications, such as Machine Translation (MT), Text Summarization (TS), and Information Extraction (IE). It has been widely studied for English, with a number of different original approaches and implementations, but limited research has been done for Urdu, especially on anaphora resolution. In this work, we have developed a framework to resolve reflexive and distributive anaphors in Urdu discourse. The main contributions of this research concern how to manage the syntactic complexity of Urdu grammar when handling anaphoric tokens while resolving reflexive and distributive anaphors. Urdu text is analyzed syntactically for numerous variations of reflexive and distributive anaphors to find the features needed to define resolution rules for these anaphors. For complex linkages in Urdu text, mechanisms are devised to incorporate world knowledge. On the basis of the devised rules, algorithms are developed to resolve these anaphoric links. As constraints and preferences also play an important role in anaphora resolution, they are also made part of this framework.

The novelty of this work is that it is the first framework for the Urdu language that resolves reflexive and distributive anaphors using rules formulated by analyzing the syntactic structure of Urdu grammar. World knowledge is also made part of it to help resolve complex cases. This framework can be extended to other anaphora types in Urdu.


Urdu is a resource-poor language. The proposed framework specifies the kinds of resources required for resolving reflexive and distributive anaphors. For the experiments, limited resource sets were developed to test the algorithms. Our experiments showed encouraging results using these limited resources.


Chapter 1: Introduction

1.1 Overview

Human language is an extremely powerful communication system, as it gives its users the ability to signal an unlimited range of new meanings. The underlying complexity of natural language becomes apparent when we observe children learning to speak, when we miscommunicate with someone because of differences in opinion or assumptions, or when we interact with someone who has a language deficiency from birth or because of an accident. The complexity of language is also apparent when we communicate with someone who speaks a different language and has grown up in a different culture.

Natural language processing (NLP), a field of computer science, is concerned with the interaction between computers and humans through natural language. Its aim is to design and build software that understands, analyzes, and generates human language. In this global information society, it is essential to access and explore information that exists in different natural languages and in different formats. Natural language processing or understanding requires analysis and implementation at different levels, such as part-of-speech tagging, morphological analysis, clause identification, and semantic analysis at the word level, and discourse analysis at the discourse level. All these levels depend on each other. Because of this dependency, the available tools, techniques, and methods do not perform well, especially for a resource-poor language. This thesis discusses one discourse-level phenomenon, anaphora, and its resolution. Anaphora is the phenomenon where the interpretation of one expression depends on the interpretation of another [1].

The objective of natural language processing is to provide methods and tools to understand and generate natural text in different languages. NLP is concerned with the interaction between humans and computers using natural languages. A natural language understanding (NLU) system converts samples of human language into a representation that is easier for computer programs to understand and manipulate. Systems that convert information from a computer database into a human-understandable form are known as Natural Language Generation Systems (NLGS). Many problems in natural language processing apply to both the understanding and the generation of natural language. For example, to understand a natural language sentence, a computer must be able to apply a model of morphology (the structure of words), and the same model of morphology is also required to produce grammatically correct natural language sentences.

There has been significant research on anaphora resolution in English and other languages; however, very little work has been done for Urdu. The aim of this thesis is to study, explore, and analyze the linguistic features that help resolve anaphors in Urdu text.

A large amount of annotated data is required to implement anaphora resolution algorithms and to evaluate them statistically. However, no such corpus is available for Urdu. Therefore, for anaphora resolution in Urdu text, a small set of resources containing the relevant linguistic features was created first.

1.2 Terminology

This section defines important terms that are used frequently in the thesis; most other concepts are explained as they arise.

The notion of reference is the relationship of one linguistic expression to another, in which one provides the information necessary to interpret the other [2].

Anaphora is the central concept of this research work. The word is of Greek origin and came into use in the 16th century. It combines ‘ana’, meaning back, and ‘phora’, meaning carrying. Anaphora describes the relationship between a word or phrase, called the anaphor or anaphoric device, and an entity mentioned earlier in the discourse, called the antecedent or referent.

Cataphora is the reference of a word or phrase to a later word or phrase in a given discourse or text.

Exophora is a reference to something that is not in the same text.

Endophora refers to expressions that derive their reference from the surrounding text; anaphora and cataphora are both forms of endophora.

Machine Translation (MT) is the translation of text from one language into another by a computer.


Information extraction (IE) is the automatic extraction or retrieval of specific information about a selected topic from a given body of text.

The term potential antecedent or candidate antecedent for an anaphor means the words or phrases that can theoretically function as its antecedent.

Anaphora Resolution is the process of locating the correct antecedent of an anaphor or anaphoric device.

Coreference occurs when two expressions refer to the same real-world entity.

Automatic processing or fully automatic processing refers to the execution of a computer program without any human intervention.

NLP system means a system which performs natural language processing activities.

Question answering (QA) refers to a system that automatically answers questions by analyzing text.

A knowledge base (KB) is a technology for storing complex information in a computer system.

1.3 Anaphora Resolution

Anaphora resolution is the process of linking a referring word (the anaphor) to the entity it refers to (the antecedent), which occurs before the anaphor. For example, in English:

Peter went to the market. He purchased a T-shirt.

In this example, He is the anaphor, referring to its antecedent Peter. Once an entity has been introduced into a discourse and we continue to refer to it, we do not repeat its full descriptive noun phrase or name each time; instead, we use a shorthand expression, unless similar entities in the context might cause ambiguity. Therefore, a natural language understanding system that summarizes texts, answers questions, translates texts from one language to another, or analyzes text for relationships among entities must not only identify entities but also recognize when a new sentence presents facts about an entity that has already been processed.
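The idea of candidate antecedents can be made concrete with a small Python sketch. It resolves an English third-person pronoun using two generic ingredients. The first is an agreement constraint (gender and number) that filters the candidates. The second is a recency preference that picks the closest candidate that survives the filter. This is a toy illustration and not the RADAR framework developed in this thesis. The `Mention` class, the `PRONOUN_FEATURES` table, the feature labels, and the token positions are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Mention:
    """A candidate antecedent: a noun phrase with its position and agreement features."""
    text: str
    position: int   # token index in the discourse (illustrative)
    gender: str     # "m" (masculine), "f" (feminine) or "n" (neuter)
    number: str     # "sg" (singular) or "pl" (plural)


# Hypothetical agreement features for a few English pronouns (illustrative only).
PRONOUN_FEATURES: Dict[str, Tuple[Optional[str], str]] = {
    "he": ("m", "sg"),
    "she": ("f", "sg"),
    "it": ("n", "sg"),
    "they": (None, "pl"),  # None means gender is not constrained
}


def resolve(pronoun: str, position: int, candidates: List[Mention]) -> Optional[Mention]:
    """Pick an antecedent using agreement constraints and a recency preference."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    # Constraints: the antecedent must precede the anaphor and agree in number and gender.
    compatible = [
        c for c in candidates
        if c.position < position
        and c.number == number
        and (gender is None or c.gender == gender)
    ]
    # Preference: among the compatible candidates, choose the most recent one.
    return max(compatible, key=lambda c: c.position, default=None)


# Candidate mentions from the Peter example; token positions are illustrative.
mentions = [Mention("Peter", 0, "m", "sg"), Mention("market", 3, "n", "sg")]
print(resolve("He", 5, mentions).text)  # Peter
```

Here "market" is rejected by the gender constraint, so "He" resolves to "Peter" even though "market" is closer. Real systems, including rule-based frameworks for Urdu, combine many more constraints and preferences than these two.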


Anaphora resolution (AR) is an important procedure for various NLP applications. According to Halliday and Hasan,

Cohesion occurs where the interpretation of some element in the text is dependent on that of another. The one presupposes the other, in the sense that it cannot be effectively decoded except by recourse to it. When this happens, a relation of cohesion is set up, and the two elements, the presupposing and the presupposed, are thereby at least potentially integrated in the text. [2]

Referential expressions seem straightforward but are actually more complex than meets the casual eye. Anaphora has been a subject of research since the early sixties. Referential expressions are used extensively in natural languages and take different forms. In terms of the syntactic relation between antecedent and anaphor, the most common forms are noun phrase anaphora and verb phrase anaphora. When an anaphor refers to a noun phrase, it is noun phrase anaphora, and when the antecedent is a verb phrase, it is verb phrase anaphora.

Identifying the antecedent of an anaphor is an important concern in natural language processing systems. In machine translation, for instance, identifying the correct antecedent can be essential for producing accurate and meaningful target text. The use of anaphora is subject to definite limitations, as anaphors cannot be used just anywhere. Identifying the rules and cataloguing the patterns for the use of anaphors has proven a considerable challenge.

Extensive research has been done for English, French, Italian, German, Dutch, etc., compared with South Asian languages such as Hindi, Bengali, Tamil, Malayalam, and Punjabi, and especially Urdu, which is a widely spoken language of the subcontinent. Urdu has also become one of the South Asian languages present on the web [3].

Hobbs’ algorithm [4] is one of the earliest algorithms for resolving personal and possessive pronouns in English, and it is still used as a reference. The authors of [5] used Hobbs’ algorithm as a reference to resolve reflexive and reciprocal pronouns. Meng, W., et al. [6] worked on noun phrases and proper nouns in English.


The World Wide Web was initially dominated by English, but it is now multilingual, and multilingual content has been increasing rapidly over the last few years. Therefore, information retrieval in monolingual and cross-lingual domains, where queries are made in multiple languages, is gaining the attention of researchers [7].

Urdu and many Indian languages are increasingly becoming part of the Asian-language content on the internet [8]. Detailed knowledge of NLP is required for Information Retrieval (IR) and for many Data Mining (DM) tasks, e.g., event extraction, topic categorization, and relationship exploration. For such systems, NLP tasks such as stop-word removal, part-of-speech tagging, parsing, and shallow parsing are very important [9].

1.4 Aim and objectives

Natural Language Processing systems developed for English are well established; for Urdu, however, much effort is still required ([10]; [11]; [12]). Urdu is the state language of Pakistan and is spoken by approximately 11 million people in Pakistan and more than 300 million people around the globe [13]. It has roots in Persian, Sanskrit, Turkish, and Arabic, and it is structurally similar to Hindi ([14]; [15]). Urdu has a complex structure, as its syntax and morphology combine features of Turkish, Persian, Arabic, Sanskrit, and English [16]. Because few linguistic resources are available, little substantial work has been done on Urdu language processing. Nevertheless, Urdu language processing is gaining the attention of researchers for future applications in IR, clustering, classification, document summarization, and plagiarism detection in Urdu documents.

Researchers are working in areas such as the preparation of gold-standard datasets, NER, characterization of Urdu datasets, stemming, and Urdu preprocessing, but little substantial work has been done on anaphora resolution in Urdu.

Anaphora resolution has received the attention of researchers in other languages of the world, but not in Urdu. This is especially true of indefinite, relative, reflexive, and distributive anaphors, which are an integral part of an Urdu language processing (ULP) system. Since Urdu is the state language of Pakistan and among the most spoken languages of India, this gap in ULP systems is identified as the research problem.

Extensive work has been done in the past for different languages, but fewer efforts have been made for Urdu. Its distinctive syntactic structures made it the focus of this research study, and the resolution of reflexive and distributive anaphors was taken up as the research work.

The primary research question of this work is:

i. How can a model be developed to resolve reflexive and distributive anaphora?

The secondary research questions are:

ii. How can the syntactic complexity of Urdu grammar be managed for reflexive and distributive anaphora?
iii. How can anaphoric tokens be organized for reflexive anaphora resolution?
iv. How can anaphoric tokens be organized for distributive anaphora resolution?

1.5 Trends and Challenges

Developing systems that understand a natural language is a difficult problem. Natural languages contain a vast number of words and an unbounded number of phrases and sentences, with many ambiguities. Many words, such as orange, bear, and fly, have several meanings, and a sentence can have different meanings in different contexts. This makes building natural language understanding systems challenging.

Natural language texts contain links between their different units. Anaphora resolution (AR) is the process of identifying and resolving these links. Handling anaphora plays a vital role in understanding and generating natural language discourse.

The word anaphor comes from the Greek for “to carry back”. In computational linguistics, an anaphor cannot be interpreted by itself; rather, it depends on another expression in the given text for its meaning. That second expression is called the “antecedent”. Thus, understanding an anaphor in a text requires finding the expression in the preceding text that is the target for its meaning, and then interpreting the anaphor accordingly. Locating the antecedent of an anaphoric expression and then interpreting the anaphor is called “anaphora resolution”. Pronouns, for instance, depend on an antecedent expression in the text for their correct interpretation.


The resolution of anaphoric links is one of the most challenging tasks in natural language processing. An anaphor is an expression that refers back to some entity, called the antecedent. The process of locating, selecting, and binding the referring expression to the antecedent is called anaphora resolution. Halliday and Hasan [2] define anaphora as “The cohesion (presupposition) which points back to some previous item”. Anaphora resolution can also be defined as “the problem of identifying which noun phrases (NPs) or mentions refer to the same real-world entity in a text or dialogue” ([17]; [18]; [19]).

To be useful to humans, a natural language processing system requires a component that understands natural language as we do. When someone says something and we understand it, we have correctly identified the entities the speaker referred to and the relationships between them in the given context, and we can then act accordingly. Similarly, an NLP system that understands language must, at a minimum, be able to identify the entities and the relationships between them in a given discourse. Only then will the system be able to take further action accurately.

Anaphora resolution has been widely studied for English, with a number of different original approaches and implementations. For other languages, such as French, Spanish, Japanese, Italian, and German, a few approaches exist, but the research carried out for these languages is not comparable to that for English. In some cases, an anaphora resolution system for a new language has been developed simply by adapting a method originally developed for English.

G. Hirst [20] defines anaphora as “a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to dis-abbreviate the reference and, thereby, determine the identity of the entity”. Here, the abbreviated reference is the anaphor, and the entity to which it refers is the antecedent. It is easy for a human to understand and resolve the anaphoric links that occur in natural languages, but for a machine or an NLP system, understanding and resolving such links is quite a complex job. For the success of any NLP system, knowledge of anaphors and their resolution is very important.

Resolving anaphoric links in text is comparable to creating Artificial Intelligence, in which computers are made to think like humans. Anaphora resolution is a very complex task, as it concerns not only computational linguistics but also philosophy (for instance, logic), psychology, neurology, and communication theory.

Anaphora resolution is deeply linked with the principles of natural language and with human communication; therefore, it is needed in virtually all natural language applications. The main areas of NLP applications are Information Extraction (IE)/Question-Answering, Machine Translation (MT), and Dialogue Systems (DSs).

a) Information Extraction (IE)/Question-Answering: Information extraction or question answering is the task of extracting the answer to a natural language query from a given text [21]. In most question answering systems, the answer to a natural language query is obtained by simple string matching or by parsing the sentence. Anaphora resolution is an important subtask of question answering systems [22]. For example, consider the following two sentences.

Z. A. Bhutto was born on January 5, 1928. He served Pakistan as Prime Minister from 1973 to 1977.

Answering the query “Who served Pakistan as Prime Minister from 1973 to 1977?” by simple string matching or parse analysis of the second sentence yields ‘He’, which is an incomplete answer in itself. To get the complete answer, the system must know that the pronoun ‘He’ refers back to the noun ‘Z. A. Bhutto’ in the previous sentence. The system can extract the complete answer only after performing anaphora resolution.
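The effect of resolution on question answering can be sketched in a few lines. This is a minimal illustration, assuming the pronoun-to-antecedent map is already known (here it is supplied by hand rather than computed), and `substitute_antecedents` and `answer_who` are hypothetical helper names; the "answerer" is deliberately naive string matching, the kind of method described above.

```python
import re
from typing import Dict, Optional


def substitute_antecedents(sentence: str, resolved: Dict[str, str]) -> str:
    """Replace each resolved pronoun (whole word, case-insensitive) by its antecedent."""
    for pronoun, antecedent in resolved.items():
        pattern = r"\b" + re.escape(pronoun) + r"\b"
        # A function replacement avoids treating backslashes in the antecedent as escapes.
        sentence = re.sub(pattern, lambda _m: antecedent, sentence, flags=re.IGNORECASE)
    return sentence


def answer_who(sentence: str, predicate: str) -> Optional[str]:
    """Naive 'who' answering by string matching: return the text before the predicate."""
    idx = sentence.find(predicate)
    return sentence[:idx].strip() if idx > 0 else None


second = "He served Pakistan as Prime Minister from 1973 to 1977."
print(answer_who(second, "served Pakistan"))  # He
resolved = substitute_antecedents(second, {"he": "Z. A. Bhutto"})
print(answer_who(resolved, "served Pakistan"))  # Z. A. Bhutto
```

The same string-matching step returns the incomplete answer before resolution and the complete one after it; only the resolution step changes.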

b) Automatic summarization: Automatic summarization is the process of reducing a text document to a summary that retains its main points, using a computer program [23]. The most common method is to choose the important information and drop the additional information. Summarization is required for large texts; for example, one possible summary of the above example is:

Z. A. Bhutto, born on January 5, 1928, served Pakistan as Prime Minister from 1973 to 1977.

This shows that knowing the referent of the pronoun ‘He’ is important for producing a comprehensive and coherent summary.

c) Machine Translation: Machine Translation (MT) is the translation of text in one natural language into text in another by machine. To simplify the translation process, most research has focused on translation at the sentence level. Translating pronouns from one language to another requires knowledge of their referents [24], as pronouns in some languages take different forms on the basis of morphological properties (e.g., number, gender, etc.) of the words they refer to. Therefore, anaphora resolution is required to obtain a correct translation in the target language.
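As a concrete case of this dependency, the Urdu third-person pronoun وہ (vo) does not distinguish he, she, or it, so an Urdu-to-English system cannot choose the English pronoun without the antecedent's features. The sketch below is illustrative only: `Antecedent` and `render_vo` are hypothetical names, and the features are assumed to come from an earlier resolution step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antecedent:
    text: str
    animate: bool
    gender: str  # Urdu grammatical gender: "masc" or "fem"
    plural: bool


def render_vo(antecedent: Antecedent) -> str:
    """Choose an English pronoun for the gender-neutral Urdu pronoun 'vo'."""
    if antecedent.plural:
        return "they"
    if not antecedent.animate:
        # Inanimate referents become "it" whatever their Urdu grammatical gender.
        return "it"
    return "he" if antecedent.gender == "masc" else "she"


for a in [Antecedent("Ali", True, "masc", False),
          Antecedent("Sara", True, "fem", False),
          Antecedent("kitab (book)", False, "fem", False),
          Antecedent("khilari (players)", True, "masc", True)]:
    print(a.text, "->", render_vo(a))
```

Note that کتاب (kitab) is grammatically feminine in Urdu but still maps to "it", so the decision needs animacy as well as gender.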

If an NLP system cannot handle anaphora resolution, it is not processing an important part of the data, which can be considered a fault in any state-of-the-art NLP system.

Anaphora resolution requires different kinds of knowledge to resolve anaphoric links. Fig-1.1 presents the different kinds of knowledge required to resolve anaphoric links [25]. The shaded circles in the figure represent the kinds of knowledge focused upon in this research study.


Fig-1.1 Knowledge required for Anaphora Resolution (Ashima [25])

A number of interesting theoretical models have been developed as a result of research in AR; however, there are many barriers to making them functional. First, it is difficult to merge the different kinds of relevant knowledge needed for this purpose. Second, each application can utilize only very limited resources, and a resource that is essential for AR processing may not be available in a language at all. For instance, the general world knowledge required for AR processing may not be available for a particular language. For many languages, the unavailability of a syntactic analyzer or of a large corpus makes it difficult to build a functional AR system and leads to errors. Accessing such resources and overcoming these deficiencies is a challenging problem in anaphora resolution.

Similar problems exist for Urdu. Some theoretical work has been done on anaphora resolution in Urdu; however, due to the lack of required resources and tools, these efforts have not been very fruitful.


Syntactic approaches, specifically Hobbs [26], have been quite successful in resolving anaphoric references. Motivated by this approach, we aim to explore syntactic structures, relations, and linguistic features for resolving specific categories of anaphors. Many approaches use world knowledge for anaphora resolution, but the use of world knowledge does not explain how the disambiguation process works, so alternative explanations should be explored [27].

1.6 Reflexive and distributive anaphora in Urdu

The Urdu language started developing in the 12th century in north India, around Delhi, based on the language spoken there. It was heavily influenced by Persian, Arabic, and Turkish. Urdu shares its origin with Hindi, which is sometimes referred to as a ‘sister’ language of Urdu and has a similar grammar. Urdu became widely used in the Mughal period, but its history dates back as far as the era of the Sultans of Delhi.

According to Ethnologue's 2018 estimates, Urdu is the 11th most widely spoken language in the world, with 170 million total speakers, including those who speak it as a second language. After the creation of Pakistan in 1947, Urdu was chosen as its national language. Nowadays, Urdu is spoken in many countries of the world, including the USA, Canada, the United Kingdom, the Middle East, and of course India.

Urdu is a free word-order language that allows many possible word order structures. The most common sentence structure used by native speakers is Subject-Object-Verb (SOV).

The following are the different types of noun phrase anaphora in Urdu:

- Personal anaphora

- anaphora

- Indefinite anaphora

- Relative anaphora

- Compound Relative anaphora

- Reflexive anaphora

- Distributive anaphora


- Interrogative anaphora

This work is about resolution of reflexive and distributive anaphora in Urdu text.

This research addresses the challenge of finding the correct antecedent for reflexive and distributive anaphors in Urdu text and proposes solutions to resolve them. In English, reflexive pronouns are words ending in -self or -selves that are used when the subject and the object of a sentence are the same (e.g., I believe in myself). A reflexive pronoun is preceded or followed, within the same clause, by the noun, adverb, or pronoun to which it refers (its antecedent).

The following examples illustrate the research problem addressed in this study: the analysis and resolution of reflexive anaphors (also called reflexive anaphoric devices or reflexive pronouns) and distributive anaphors (also called distributive anaphoric devices or distributive pronouns) in Urdu. The reflexive anaphors are خود (khud), اپنا (apna), اپنے (apne), and اپنی (apni). The reflexive anaphor خود (khud) is combined with personal ADs, nouns, or noun phrases to say something emphatically, while اپنا (apna), اپنے (apne), and اپنی (apni) are used non-emphatically, for example:

(i) یہ آصف کا اپنا گھر ہے۔

(This is Asif's own house.)

Reflexive anaphor: اپنا; antecedent (noun): آصف

In the above example, the reflexive anaphor اپنا (apna) relates the possessed entity گھر (house) to the noun آصف (Asif).

(ii) مجھے اپنی غلطی کا احساس ہے۔

(I realize my own mistake.)

Reflexive anaphor: اپنی; antecedent (personal pronoun): مجھے

In the above example, the reflexive pronoun اپنی (apni) relates the entity غلطی (mistake) to the personal pronoun مجھے (me).
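The two examples can be reproduced with a small sketch. It assumes romanized, hand-tagged tokens (the tag names `NOUN`, `PRON`, `REFL`, `CM`, `DEM`, `AUX` and the functions `apna_form` and `resolve_reflexive` are hypothetical). The form of the reflexive agrees with the possessed noun (گھر is masculine singular, hence اپنا; غلطی is feminine, hence اپنی), while the antecedent here is found by a simple "nearest preceding noun or pronoun" heuristic, which is only a simplification of the rules developed in Chapter 3.

```python
from typing import List, Optional, Tuple

REFLEXIVE_FORMS = {"apna", "apne", "apni"}


def apna_form(gender: str, plural: bool, oblique: bool) -> str:
    """Form of the possessive reflexive, agreeing with the POSSESSED noun."""
    if gender == "fem":
        return "apni"
    if plural or oblique:
        return "apne"
    return "apna"


def resolve_reflexive(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Nearest noun or pronoun to the left of the first reflexive in the clause."""
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() in REFLEXIVE_FORMS:
            for prev_word, prev_tag in reversed(tagged[:i]):
                if prev_tag in ("NOUN", "PRON"):
                    return prev_word
            return None
    return None


ex_i = [("yeh", "DEM"), ("Asif", "NOUN"), ("ka", "CM"),
        ("apna", "REFL"), ("ghar", "NOUN"), ("hai", "AUX")]
ex_ii = [("mujhe", "PRON"), ("apni", "REFL"), ("ghalti", "NOUN"),
         ("ka", "CM"), ("ehsas", "NOUN"), ("hai", "AUX")]
print(resolve_reflexive(ex_i), apna_form("masc", False, False))   # Asif apna
print(resolve_reflexive(ex_ii), apna_form("fem", False, False))   # mujhe apni
```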

Distributive anaphors (also called distributive anaphoric devices or distributive pronouns) are used for entities, persons or things, taken individually or as a group. They are ہر ایک (Har Ek), کوئی ایک (Koi Ek), کوئی بھی (Koi Bhi), and کئی ایک (Kai Ek). For example:

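(iii) آسٹریلیا کے خلاف کھلاڑی بڑی محنت سے بیٹنگ کر رہے تھے۔ ہر ایک اچھی کارکردگی دکھانا چاہتا تھا۔

(The players were batting very hard against Australia. Each one wanted to give a good performance.)

Distributive anaphor: ہر ایک; antecedent (plural noun): کھلاڑی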

In the above example, the distributive anaphor ہر ایک (Har Ek) refers to the group کھلاڑی (players) in the previous sentence.

(iv) یہ میری تصویریں ہیں۔ کوئی ایک تم لے سکتے ہو۔

(These are my pictures. You can take any one of them.)

Distributive anaphor: کوئی ایک; antecedent (plural noun): تصویریں

In the above example, the distributive anaphor کوئی ایک (Koi Ek) refers to the group تصویریں (pictures) in the previous sentence.

In these examples, an anaphoric link is created between different units of text. These links need to be resolved for further natural language processing, such as machine translation and question answering. The above examples show the basic structure of reflexive and distributive anaphors; Chapter 3 investigates the variations in their syntactic use in detail.
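The basic pattern in examples (iii) and (iv), a distributive anaphor pointing to a plural noun in the previous sentence, can be sketched as follows. This assumes romanized input and hand-annotated `(word, tag, is_plural)` tokens; the function names are hypothetical, and choosing the plural noun nearest the end of the previous sentence is an illustrative heuristic, not the full rule set of this thesis.

```python
from typing import List, Optional, Tuple

DISTRIBUTIVE_PAIRS = {("har", "ek"), ("koi", "ek"), ("koi", "bhi"), ("kai", "ek")}


def find_distributive(sentence: str) -> Optional[str]:
    """Return the first two-word distributive anaphor in a romanized sentence."""
    tokens = sentence.lower().split()
    for first, second in zip(tokens, tokens[1:]):
        if (first, second) in DISTRIBUTIVE_PAIRS:
            return f"{first} {second}"
    return None


def resolve_distributive(previous: List[Tuple[str, str, bool]]) -> Optional[str]:
    """Pick the plural noun nearest the end of the previous sentence."""
    for word, tag, is_plural in reversed(previous):
        if tag == "NOUN" and is_plural:
            return word
    return None


prev_iii = [("Australia", "NOUN", False), ("ke", "CM", False), ("khilaf", "POST", False),
            ("khilari", "NOUN", True), ("bari", "ADJ", False), ("mehnat", "NOUN", False),
            ("se", "CM", False), ("batting", "NOUN", False), ("kar", "VERB", False),
            ("rahe", "AUX", False), ("the", "AUX", False)]
prev_iv = [("yeh", "DEM", False), ("meri", "PRON", False),
           ("tasveeren", "NOUN", True), ("hain", "AUX", False)]

print(find_distributive("Har ek achhi karkardagi dikhana chahta tha"), resolve_distributive(prev_iii))
print(find_distributive("Koi ek tum le sakte ho"), resolve_distributive(prev_iv))
```

Both calls should pair the anaphor with the plural noun of the previous sentence (khilari and tasveeren), matching the analysis of examples (iii) and (iv).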


1.7 Key Contributions

The key contributions of this research work are as follows.

- The main contribution is the development of a framework for resolving entity reflexive anaphora and distributive anaphora in Urdu by exploring different linguistic features.

- Based on syntactic structure and constraints, we derived rules and developed an algorithm to resolve different variations of reflexive and distributive anaphors.

- To resolve complex references, we incorporated world knowledge to select the correct antecedent when there are multiple candidates. Our results showed that promising performance can be achieved by using world knowledge and agreement features. This performance can be improved further by incorporating other semantic and salience features.

- To use world knowledge, we developed resources for storing commonly used entities and their relevant features/attributes, and developed a framework that combines world knowledge with syntactic knowledge for resolution.

- Developed a method to extract the noun case, both for the purpose of resolution and for accurate translation into the target language.

- Developed a method to explore the variations in the syntactic usage of reflexive and distributive anaphors.

1.8 Significance of the Study

This work addresses the resolution of reflexive and distributive anaphoric links in Urdu text. After these links are resolved, Urdu text can be used for Machine Translation (MT), Question Answering (QA), Text Summarization (TS), Information Retrieval (IR), and Information Extraction (IE). On a larger scale, this work can be a valuable part of a Natural Language Processing System (NLPS) for Urdu.


1.9 Thesis Organization

This thesis is organized into five chapters. A brief outline of these chapters is as follows:

Chapter 1 presents an introduction to the domain and the research problem, along with the terminology used in this research work.

Chapter 2 reviews the literature, starting from the historical background of the problem and the significant work done in the area of anaphora resolution. It covers work done for English using different techniques, as well as important work done for Asian languages. Very limited work has been done on anaphora resolution for Urdu; this work is also described in the chapter.

Chapter 3 investigates reflexive and distributive anaphors in detail, along with the variations in their use, and formulates rules for their resolution.

Chapter 4 presents the framework (RADAR) for resolving reflexive and distributive anaphors in Urdu text. It presents a model showing the different phases, modules, and resources required for this purpose.

Chapter 5 is dedicated to the evaluation of RADAR, the conclusion, and future work. It discusses and analyzes the evaluation results and presents the accuracy of the system as percentages and in the form of charts. It discusses the areas where the system's accuracy is lower and looks for the reasons, reviews the aims and objectives achieved in this research work, and describes different issues regarding the processing of linguistic entities during the evaluation phase and the various factors causing low accuracy.

1.10 Summary

This chapter presented an introduction to the research domain, the research problem, the research questions, the objectives of this research, its overall contribution, and its significance.


Chapter 2: Literature Review

2.1 Overview

Resolution of anaphors is considered a very complex problem in NLP [28]. For example:

Sara likes chocolate and she eats it often.

The words “she” and “it” are anaphors and refer to “Sara” and “chocolate”, respectively. Humans understand this phenomenon easily, but developing a natural language system based on it is not a simple task.

Anaphora resolution research has two main approaches:

a) Knowledge-rich approaches.

b) Knowledge-poor approaches.

a) Knowledge-rich approaches

These approaches were used in early research on resolving anaphors. They were based on heuristic observation and assumed correctly parsed input. These approaches are further divided into the following types:

i) Syntax-based approach

In this approach, a fully parsed syntactic tree is required, which is traversed to find the antecedent by applying syntactic and morphological constraints. Hobbs's algorithm [29], a standard method, is an example that uses this approach.

ii) Discourse-based approach

This approach locates and selects the referent of a pronoun using centering theory (CT). Approaches based on CT require information from the structural properties of utterances. A well-known example based on this method is that of Brennan, Friedman, and Pollard [30].


iii) Hybrid approaches

In these approaches, different sources of knowledge, e.g., syntactic, discourse, and morphological knowledge, are used to locate the antecedent. Lappin & Leass [31] is one of the important systems that use this approach; it finds the antecedent by applying different factors to the potential candidates.

iv) Corpus-based approaches

Charniak [32] used a statistical method for resolving anaphors, with a small corpus as training data, based on Hobbs’ algorithm [33]. Their model was based on different kinds of information, such as the distance between a pronoun and its antecedent and syntactic constraints.

b) Knowledge-poor approaches

These approaches use machine learning techniques. Training data are taken from an annotated corpus to create feature vectors, and the training examples from this data set are used to train a machine learning algorithm. Soon [34] is an example of this approach.

Section 2.2 presents various factors and features that affect the anaphoric reference relation and have been used by different anaphora resolution algorithms.

2.2 Factors in Anaphora Resolution

Mitkov [24] discussed various factors that affect anaphora resolution. The most frequently used factors are morphological agreement (gender and number), semantic similarity, binding and c-command constraints, grammatical and semantic parallelism, and nearness (proximity). These factors are classified into constraints and preferences. Some important factors are explained in the following sections.

2.2.1 Constraints

These are linguistic restrictions that determine which entities may or may not be referred to by a pronoun. The following English examples show gender agreement.


- Peter and Sara went to watch a movie. He is fond of movies.

- Peter and Sara went to watch a movie. She is fond of movies.

The first sentence is the same in both examples and contains two candidate antecedents. In the second sentence of the first example a masculine pronoun is used, so its referent is Peter; in the second sentence of the second example a feminine pronoun is used, so its referent is Sara.

Similarly, number agreement is another constraint on anaphoric reference, as shown in the following examples.

- There was a difference of opinion between the captain and the team. He wanted to bat first.

- There was a difference of opinion between the captain and the team. They wanted to bat first.

These examples differ only in the pronoun, “He” versus “They” (number agreement). The referent of the singular pronoun is the captain, and that of the plural pronoun is the team.

In some languages, there must also be semantic agreement between the anaphoric device and its antecedent. Consider the following examples:

- Sara has a laptop. It is attractive.

- Sara has a laptop. She is attractive.

The semantics of “she” relate it to Sara, an animate entity, and the semantics of “it” relate it to the laptop, an inanimate entity.
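The three constraints illustrated above (gender, number, and animacy agreement) act as a hard filter on the candidate list. A minimal sketch, assuming the features of each mention are already available (here they are annotated by hand) and using the hypothetical names `Mention` and `filter_candidates`; an unknown gender (`None`) is treated as compatible with any gender, and "team" is marked plural as in the example:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    gender: Optional[str]  # "masc", "fem", "neut", or None when unknown
    plural: bool
    animate: bool


def compatible(a: Optional[str], b: Optional[str]) -> bool:
    """Unknown gender (None) is compatible with any gender."""
    return a is None or b is None or a == b


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    """Hard constraints: gender, number and animacy must all be compatible."""
    return (compatible(pronoun.gender, candidate.gender)
            and pronoun.plural == candidate.plural
            and pronoun.animate == candidate.animate)


def filter_candidates(pronoun: Mention, candidates: List[Mention]) -> List[str]:
    return [c.text for c in candidates if agrees(pronoun, c)]


peter = Mention("Peter", "masc", False, True)
sara = Mention("Sara", "fem", False, True)
captain = Mention("captain", None, False, True)
team = Mention("team", None, True, True)
laptop = Mention("laptop", "neut", False, False)

print(filter_candidates(Mention("He", "masc", False, True), [peter, sara]))
print(filter_candidates(Mention("They", None, True, True), [captain, team]))
print(filter_candidates(Mention("It", "neut", False, False), [sara, laptop]))
```

Each call should leave exactly one candidate (Peter, team, laptop). When more than one candidate survives, the preferences discussed next are needed.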

2.2.2 Preferences

Preferences help in selecting antecedent from multiple candidate antecedents by giving relative weight and thus giving preference to a candidate over the others. Consider the following examples:

John contested Peter in the election. He was elected.

Peter contested John in the election. He was elected.


In the first example the subject of the sentence is John, and in the second example it is Peter. Therefore, John in the first example and Peter in the second example are the preferred antecedents.

These examples of preferences and constraints show that linguistic analysis is essential for obtaining the features or information required for anaphora resolution.

The following sections present an overview of research related to AR. First, the pioneering AR systems and their underlying theories are described. An overview of anaphora resolution systems based on machine learning and statistical techniques is then presented. The last section presents the work done on anaphora resolution for Urdu and some Indian languages.

2.3 Early AR Systems

Research on anaphora resolution started in the 1960s, and many projects set out to develop intelligent agents that could perform the task. These systems took instructions in natural language as input and were able to understand the pronouns used, e.g., STUDENT, developed by Bobrow [35], and SHRDLU, developed by Winograd [36]. The STUDENT system was designed to solve high-school algebra problems, while SHRDLU was designed to take commands for a robot moving objects in a “block world.” However, these systems worked in limited domains and used a few heuristic rules to resolve anaphoric references.

The algorithm proposed by Hobbs [37] was the first to be designed on the basis of well-defined linguistic knowledge. In this algorithm, the author used a syntactic approach to AR.
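One step of Hobbs's syntactic search, examining the parse tree of a previous sentence breadth-first and left-to-right and proposing each NP in turn, can be sketched as below. This is a simplified fragment, not the complete algorithm (the intra-sentential tree-walking steps are omitted). Trees are assumed to be nested tuples `(label, child, ...)`, and `accept` stands in for whatever agreement check is used; the animacy lexicon in the usage is hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple, Union

Tree = Union[str, Tuple]  # (label, child, child, ...); a bare str is a word


def leaves(tree: Tree) -> List[str]:
    """Words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def propose_np_bfs(tree: Tree, accept: Callable[[str], bool]) -> Optional[str]:
    """Breadth-first, left-to-right search for the first NP that passes `accept`."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue
        if node[0] == "NP":
            text = " ".join(leaves(node))
            if accept(text):
                return text
        queue.extend(node[1:])
    return None


# Prior context "Sara likes chocolate", parsed by hand.
prior = ("S", ("NP", "Sara"), ("VP", ("V", "likes"), ("NP", "chocolate")))
ANIMATE = {"Sara"}  # hypothetical animacy lexicon
print(propose_np_bfs(prior, lambda np: np in ANIMATE))      # candidate for "she"
print(propose_np_bfs(prior, lambda np: np not in ANIMATE))  # candidate for "it"
```

The breadth-first order is what makes the subject NP ("Sara") be proposed before the object NP ("chocolate"), which builds a subject preference into the syntactic search.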

In 1957, Chomsky presented “Generative grammar” [38], which was later refined by Chomsky [39], [40], [41] and Jackendoff [42]. These works offered many insights into issues related to anaphoric reference, which were formulated as “Government and Binding Theory” [38] in the 1980s.

Another theory, Centering Theory (CT), was presented by Joshi and Kuhn [43] and later by Joshi and Weinstein [44]. It relates to various aspects of the importance of discourse objects and studies anaphoric expressions and their referential properties by investigating local textual coherence. The best-known algorithm that uses centering theory to resolve pronominal anaphora is that of Brennan [45], called the BFP algorithm. This algorithm was extended by Strube [46]. Their work was largely inspired by Sidner [47], who presented a similar theory known as Focusing.

Rhetorical Structure Theory (RST), presented by Mann and Thompson [48], is another well-known theory used in anaphora resolution. A related theory based on intentionality was presented by Grosz and Sidner [49]; it concerns the relationships between three different structures: the intentional structure, the linguistic structure, and the attentional state. Cristea et al. [50] proposed a method that takes advantage of RST to resolve anaphoric expressions.

Kamp [51] and Kamp and Reyle [52] proposed Discourse Representation Theory in the area of dynamic discourse. It builds Discourse Representation Structures (DRSs) on the basis of the syntax of the sentences. A DRS stores information about all entities in the sentence and the relationships between them, and each DRS serves as the context in which the DRS of the subsequent sentence is built. The theory provides rules to resolve anaphora by considering the relevant context. Heim [53] presented a similar framework called File Change Semantics (FCS).

2.4 Modern Anaphora Resolution Systems

In the early stage of AR research, it was believed that within a few years resources would become available to handle all the important features needed to overcome the difficulties of AR implementation. However, after decades of thorough research, it became clear that these expectations were unrealistic, as world knowledge and various semantic issues are also required for AR systems. Therefore, in the 1990s, AR research shifted its focus to lower-level information, e.g., morphology and syntax. This section presents different approaches that are labeled “knowledge-poor”.

The Resolution of Anaphora Procedure (RAP), developed by Lappin & Leass [54], is the first prominent system of this kind. After candidates that violate morphological or syntactic reference constraints are filtered out, RAP assigns weights to all occurrences of the potential antecedents and uses these weights to select the most salient antecedent. The factors used in RAP are based on syntax and do not reflect the semantics or world knowledge required for AR.
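The salience weighting in RAP can be sketched as follows. The factor weights below are the values commonly cited for Lappin & Leass's salience factors (only a subset of the factors is shown) and should be checked against [54] before reuse; the names `Mention` and `salience` are hypothetical. As a simplification, every mention contributes its factors, and all values are halved once for each sentence boundary between the mention and the pronoun.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Subset of RAP salience factors with commonly cited weights (verify against [54]).
WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "accusative": 50,
    "head_noun": 80,
    "non_adverbial": 50,
}


@dataclass(frozen=True)
class Mention:
    referent: str
    sentence: int
    factors: Tuple[str, ...]


def salience(mentions: Iterable[Mention], current_sentence: int) -> Dict[str, float]:
    """Sum factor weights per referent, halving once per sentence boundary crossed."""
    scores: Dict[str, float] = defaultdict(float)
    for m in mentions:
        weight = sum(WEIGHTS[f] for f in m.factors)
        scores[m.referent] += weight * 0.5 ** (current_sentence - m.sentence)
    return dict(scores)


# "John contested Peter in the election. He was elected."
mentions = [
    Mention("John", 0, ("recency", "subject", "head_noun", "non_adverbial")),
    Mention("Peter", 0, ("recency", "accusative", "head_noun", "non_adverbial")),
    Mention("election", 0, ("recency", "head_noun")),  # inside an adverbial PP
]
scores = salience(mentions, current_sentence=1)
agreeing = ["John", "Peter"]  # "election" fails the animacy constraint for "He"
print(scores)
print(max(agreeing, key=scores.get))
```

Because the subject factor outweighs the accusative factor, John ends up with the higher score, reproducing the subject preference discussed in Section 2.2.2.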


Dagan et al. [55] extended RAP with a method (e-RAP) that statistically measures lexical preference patterns. These patterns overrule RAP's decision when several candidates have only slightly different weights but one of them is preferred because of its lexical content.

Kennedy & Boguraev [56] also proposed a variation of RAP, titled “Anaphora for everyone: pronominal anaphora resolution without a parser”. Because parser output was not reliable when the system was created, they built their system on the ENGCG POS tagger, described in Voutilainen et al. [57], which provides a POS tag and grammatical role for each token.

Baldwin [58] developed a prominent anaphora resolution system called CogNIAC, which resolves only unambiguous cases. It uses six rules, which are applied one by one; if no rule resolves the anaphor, it is left unresolved.
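The control structure described above, an ordered cascade of rules where the first rule that fires decides and otherwise the anaphor stays unresolved, can be sketched as follows. The two rules are hypothetical simplifications in the spirit of CogNIAC's uniqueness rules, not Baldwin's exact six rules. Each rule receives the agreeing candidates from the whole prior discourse and from the current sentence.

```python
from typing import Callable, List, Optional, Sequence

# A rule returns an antecedent, or None when it does not apply.
Rule = Callable[[Sequence[str], Sequence[str]], Optional[str]]


def unique_in_discourse(discourse: Sequence[str], current: Sequence[str]) -> Optional[str]:
    return discourse[0] if len(discourse) == 1 else None


def unique_in_current_sentence(discourse: Sequence[str], current: Sequence[str]) -> Optional[str]:
    return current[0] if len(current) == 1 else None


RULES: List[Rule] = [unique_in_discourse, unique_in_current_sentence]


def resolve(discourse: Sequence[str], current: Sequence[str],
            rules: Sequence[Rule] = RULES) -> Optional[str]:
    """Apply the rules in order; the first rule that fires decides."""
    for rule in rules:
        answer = rule(discourse, current)
        if answer is not None:
            return answer
    return None  # ambiguous: leave unresolved rather than guess


print(resolve(["Sara"], ["Sara"]))
print(resolve(["Peter", "John"], ["John"]))
print(resolve(["Peter", "John"], ["Peter", "John"]))
```

Returning `None` instead of a best guess trades recall for precision, which is the design choice CogNIAC is known for.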

Ruslan Mitkov has carried out some of the most significant research in the field of AR. He presented his initial AR system, which combines statistical and traditional linguistic methods, in Mitkov [59] and Mitkov [69].

Mitkov [61] claimed that the various constraints and preferences on coreference used for anaphora resolution are not reliable. He proposed a system that treats each antecedent candidate as a hypothesis and then uses various anaphora resolution indicators to determine its standing as the antecedent. He then combined his two previous approaches (Mitkov [62]) to improve the accuracy of the AR system.

Mitkov [63] presented a robust, knowledge-poor approach, which Mitkov et al. [64] later improved and implemented as MARS. In this approach, antecedent indicators with certain conditions are used to find the appropriate antecedent.

2.5 Machine Learning and Statistics based AR System

Connolly et al. [65] were the first to propose a machine learning approach to anaphora resolution. Each instance consisted of an anaphor and a pair of antecedent candidates, and the learner was trained to decide which of the two candidates is the correct antecedent.


McCarthy & Lehnert [66] proposed a system called RESOLVE that processes an anaphor-antecedent pair, together with some additional information, to decide whether the candidate is the appropriate antecedent for the anaphor.

Aone & Bennett [67] worked on a corpus of Japanese newspaper articles, defined 66 features, and used the C4.5 algorithm for anaphora resolution.

Soon et al. [68] presented a system based on the C4.5 algorithm. An important feature of the system is that it resolves anaphora in unrestricted text. Later, the authors updated the system by adding some extra features and using C5 (Soon et al. [69]).
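A central step in the approach of Soon et al. is turning an annotated corpus into training pairs for the classifier: each anaphor is paired with its closest preceding antecedent as a positive example, and with every mention in between as a negative example. The sketch below shows only that instance-creation step; feature extraction (distance, agreement, etc.) and the decision-tree learner are omitted, and `make_training_pairs` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def make_training_pairs(chain_ids: List[Optional[int]]) -> List[Tuple[int, int, bool]]:
    """Build (antecedent_index, anaphor_index, is_coreferent) pairs.

    chain_ids[k] is the gold coreference chain of the k-th mention (in textual
    order), or None for a mention that belongs to no chain.
    """
    pairs: List[Tuple[int, int, bool]] = []
    for j, chain in enumerate(chain_ids):
        if chain is None:
            continue
        # The closest preceding mention in the same chain gives the positive pair.
        i = next((k for k in range(j - 1, -1, -1) if chain_ids[k] == chain), None)
        if i is None:
            continue  # first mention of its chain: not an anaphor
        pairs.append((i, j, True))
        # Every mention strictly between them gives a negative pair.
        pairs.extend((k, j, False) for k in range(i + 1, j))
    return pairs


mentions = ["Sara", "chocolate", "she", "it"]
for i, j, label in make_training_pairs([0, 1, 0, 1]):
    print(mentions[i], "<-", mentions[j], label)
```

Each resulting pair would then be converted into a feature vector and labeled with the boolean, giving the training examples for the classifier.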

2.6 AR for URDU and Indian Languages

The previous sections presented valuable research on anaphora resolution; however, this work has been done for English. Very little work has been done on anaphora resolution in Urdu; it is described below.

Khan et al. [70] worked on anaphora resolution in Urdu and identified some factors that are important for resolving pronominal anaphors. Kalsoom et al. [71] worked on resolving anaphors in Urdu but did not provide a framework for anaphora resolution. Khan et al. [72] developed an algorithm for distributive anaphora resolution, but it does not cover the linguistic findings and the world knowledge required, which play a vital role in resolving anaphora.

Apart from English, some work on anaphora resolution has been done for Indian languages. For Hindi, one of the earliest works on anaphora resolution is by L. Sobha and B. Patnaik [73]. They discussed a possible algorithm, without actual implementation or evaluation, that makes limited use of grammatical rules to identify entities such as subject and object. Sinha [74] presented an English-to-Hindi translation system called AnglaHindi, which had problems selecting the correct reflexive pronoun. Pal TL [75] used ellipses to handle anaphora. The study by Lakhmani [76] focused on resolving pronominal anaphors by covering the syntactic and semantic structure of Hindi, and presented a computational model for anaphora resolution in Hindi based on the Gazetteer method. Chopra [77] showed how anaphora resolution is useful in NLP.


Dutta [78] also showed how reflexive and possessive pronouns can be resolved by applying the Hobbs algorithm; no results were reported, and the authors stated that the algorithm can be evaluated further once a sufficient amount of data is available. S. Agarwal et al. [79] matched constraints on the grammatical attributes of different words and used an animate/inanimate classification of entities to resolve references. They claimed an accuracy of 96% for simple sentences and 80% for complex sentences, using data taken from children's stories.

R. Prasad and Strube [80] applied discourse salience ranking to the BFP and S-List algorithms for anaphora resolution. The S-List algorithm uses a single data structure, the S-list, together with an insertion operation. They proposed some modifications for ranking the elements of the S-List for Hindi.

Uppalapu [81] attempted to resolve third-person pronouns in Hindi by extending the S-List algorithm to use two different lists instead of a single list. Lalitha [82] presented a generic anaphora engine for resource-poor Indian languages. Dakwale et al. [83] used dependency structures to resolve entity-pronoun references in Hindi. Lakhmani et al. [84] focused on pronominal anaphora resolution, using the gazetteer method to develop a model based on the recency factor. Dakwale, P et al. [83] utilized a machine learning approach for indirect anaphora in a Hindi corpus.

Table-2.1 below summarizes the related work in Indian languages, as these languages are very close to Urdu in grammatical structure.


Table-2.1 Summary of the related work in Indian languages

Issues Addressed | Method | Target Language | Contribution | Achievement | Future Direction | Reference

Gender agreement

Ashima & Recency affects Resolved the issue Mohana, Factor, Accuracy accuracy Hindi of Pronominal R. Anaphora Animistic 71% Reflexive and Resolution anaphora (2016) Knowledge distributive anaphors need to be solved

Hindi Semantic Information and Named- Entity- Dependency Dakwale, Resolution entity Pronoun Structure by P et al, Hindi/Urdu Performance for Accuracy 60% Categorization References Rule-based (2013) Dependency Hindi might Resolution Module Treebank Improve the (Bhatt et al., anaphora 2009) resolution accuracy

ANGLABHARTI Can be translation Hierarchical utilized for Methodology [ structuring domain Pseudo- Integration of of the lexical specific Sinha, information Language interlingua English to example-based database R.M.K et and target Translation rule-based Hindi approach with leading to al, (2003) language translation rule-based and preferences statistics human on lexical

engineered post- choice editing


Anaphora Indian Precision Resolution Languages (96.3%), Chopra, D Named- (Hindi, usefulness Improved Recall et al, Entity Telugu, Scalability in Transliteration (95.80%), F- (2013) Recognition Bengali, performing Measure English, and computation Urdu) (96.04%) linguistic tasks

Incorporation

Gazetteer 60-70% of gender and Lakhmani, Pronominal number Method, PART Model for Successful P. et al, Anaphora Hindi agreement Recency Hindi Language Identification (2014) Resolution factors to Factor of Anaphora increase success rate

Hindi Incorporate (Stories, Computational gender and Singh, S Pronominal Gazetteer News Model based on Improved number et al, Anaphora Method Articles and Animistic and accuracy agreement for (2014) Resolution Biography Recency Factors increased Contents) accuracy

Naïve Approach for Hobbs, Pronoun Antecedent Entity Algorithm, resolving J.R, References English identification Identification, Semantic pronoun (1978) Resolution 98% Scalability Analysis references

Semantic

Analysis [ Identification Agarwal, Matching Hindi ( Algorithm for of Cue Anaphora Accuracy S et al, Constraints Children’s Anaphora phrases and Resolution 96% (2007) with stories) Resolution semantic Grammatical indicators Attributes]


Anaphora Semantic SKA Method, Analysis, Resolution ( Based on Proposed Syntactic C&P Algorithm, world Reflexive Urdu Results after Study Knowledge and SKbAR knowledge, and testing Framework verb Distributive) identification

2.7 Summary

In this chapter, various research issues related to this study were analysed on the basis of the literature review, and the different approaches to anaphora resolution investigated in the literature were discussed.


Chapter 3: Reflexive and Distributive anaphors

3.1 Overview

As discussed earlier, anaphora is a phenomenon in which a word, called the anaphor, refers to another word, called the antecedent, which occurs before the anaphor. For example:

دریائے سندھ ایشیا کا ایک بڑا دریا ہے۔ اس کا بہاؤ تقریباً 243 مکعب کلومیٹر ہے۔

(The Indus is one of the longest rivers of Asia. Its estimated annual flow stands at around 243 km³.)

In the above example, the referent of the pronoun اس (it) is دریائے سندھ (Indus river). The resolution of anaphora involves identifying the correct antecedent/referent of a pronoun among the possible nouns or noun phrases. In the above example, the possible referents of the pronoun اس (it) are دریائے سندھ (Indus river), ایشیا (Asia), and دریا (river). The task of anaphora resolution is to identify the correct referent of the pronoun اس (it) automatically, which in this example is دریائے سندھ (Indus river). For a human, it is easy to locate the antecedent of the pronoun اس (it), as humans have the capability to understand the text. But it is complex for a machine to locate the correct antecedent from a list of multiple potential candidates. Therefore, a deep study of the structure of these anaphors is needed to devise methods and formulate rules that make a machine understand the text like a human.

In noun phrase anaphora, anaphors refer to nouns or noun phrases. A noun is a word that names something, for example a person, a place, an animal, a thing, a time, or a situation. Nouns are categorized into subtypes such as proper nouns, common nouns, and abstract nouns. Nouns inflect for number and case (e.g., the oblique case) and carry grammatical gender.


For example, in Urdu, لڑکا (boy) is singular-masculine and لڑکے (boys) is its plural-masculine form; لڑکی (girl) is singular-feminine and لڑکیاں (girls) is its plural-feminine form.

The use of a noun is associated with noun cases, and case markers, also called clitics or postpositions, are used to mark these cases. These postpositions cannot be handled at the lexical level and thus need to be handled at the syntactic level (Butt and King [85]). The following section describes the noun cases in Urdu and their types.

3.2 Noun cases in Urdu

Pronouns or anaphors refer to nouns or noun phrases, which occur in different cases that affect the syntax and semantics of the text. To locate the correct antecedent accurately, and to make the text usable for machine translation and other NLP areas, knowledge of noun cases is essential. Therefore, the following section discusses the use of noun cases in Urdu and their syntactic and semantic effects.

Urdu is an amalgamative language (Siddiqui [86]). It is a subject-object-verb language with relatively free word order and rampant pro-drop (Butt [87]). The function of a noun in an Urdu sentence is shown by clitics (postpositions) or case markers, which form a special word class in Urdu (Ijaz [88]). In Urdu, case markers or clitics are written as separate words, separated by a space. In some other languages, case marking is a morphological process. Nouns inflect for number and case. Table-3.1 shows the singular and plural forms of some nouns.

Table-3.1 Singular-Plural

Singular | Plural
کتاب (Book) | کتابیں (Books)
استاد (Teacher) | اساتذہ (Teachers)
پرندہ (Bird) | پرندے (Birds)


There are two opinions about the noun cases in Urdu:

Opinion No. 1: There are three noun cases: nominative, oblique, and vocative (Schmidt [89]). The nominative case is used for nouns that are not followed by clitics (postpositions). The oblique case is used for nouns followed by a postposition. Some nouns that appear in imperative sentences take the vocative case.

Opinion No. 2: There are eight noun cases according to Siddiqui [86], and seven cases according to Butt and King [5].

The two opinions do not conflict, so combining them yields the following nine noun cases (Humayoun et al. [90]). The discussion below shows how these cases are used and how they affect the syntax and semantics of a sentence.

3.2.1 Nominative case

It generally marks the subject of the verb, although it can also appear with the object of the verb. It has no clitic or postposition form. For example:

علی کیک کھاتا ہے۔

‘Ali cake khata ha’.

(Ali eats cake).

Here, علی (Ali) is the subject in the nominative case and کیک (cake) is the object in the nominative case.

3.2.2 Ergative case

It occurs with the subject of the sentence, and the clitic نے (ne) is used after the subject. For example:

علی نے کیک کھایا۔

‘Ali-ne cake khaya’.

(Ali ate cake).


Here, the noun Ali is ergative and is followed by the clitic نے (ne). Lexically this seems simple, but semantically it can alternate with some other cases (Butt & King [85]). For example, the ergative-dative alternation changes the meaning of the sentence.

علی نے پڑھنا ہے۔

‘Ali ne parhna ha’

(Ali wants to study)

Here, the noun Ali is the subject and is followed by the clitic نے (ne). If نے (ne) is replaced by the dative clitic کو (ko), the meaning changes.

علی کو پڑھنا ہے۔

‘Ali ko parhna ha’

(Ali has to study).

3.2.3 Accusative case

It appears with the object. The clitic کو (ko) is used after the object to mark the accusative case. For example:

علی نے کتاب کو پڑھا۔

‘Ali ne kitab ko parha’

(Ali read the book).

Here, Ali is the subject and the clitic کو (ko) is used after the object کتاب (kitab, book).

3.2.4 Dative case

It has the same form as the accusative case. The clitics کو (ko) and کے (ke) are used with the object. For example:

استاد نے شاگرد سے تصویر کو لگوایا۔

‘Ustad ne shagird se tasweer ko lagwaya’

(Teacher made the student to hang the picture).

The clitic کو is used after the object تصویر (picture).

The accusative and dative cases differ in which object they mark. The accusative case occurs with the object, while the dative case occurs with the second object. In the above example, تصویر (picture) is the second object.

3.2.5 Instrumental case

The instrumental case indicates that a noun is the instrument or means with or by which the subject accomplishes an action or achieves a goal. In Urdu, the clitic سے (se) is used after the noun. For example:

علی نے قلم سے لکھا۔

‘Ali ne qallam se likha’

(Ali wrote with pen)

The clitic سے (se) is preceded by the noun قلم (pen), which is the instrument/means of writing.

3.2.6 Genitive case

It is the most commonly used case to show the possession/ownership of something.

There are three forms of the genitive case in Urdu: کا (ka), کی (ki), and کے (ke). The clitic کا (ka) is used for singular masculine, کی (ki) for singular/plural feminine, and کے (ke) for plural masculine, in agreement with the possessed noun. For example:

یہ علی کا قلم ہے۔

‘Yeh Ali ka qallam ha’

(This is Ali’s pen).


Here, the clitic کا (ka) is preceded by the noun علی (Ali), showing his ownership of قلم (pen), which is singular and masculine.

یہ علی کی گاڑی ہے۔

‘Yeh Ali ki garri ha’

(This is Ali’s car)

Here, the clitic کی (ki) is preceded by the noun علی (Ali), showing his ownership of گاڑی (car), which is singular and feminine.

یہ علی کے کھلونے ہیں۔

‘Yeh Ali ke khilone hain’

(These are Ali’s toys)

Here, the clitic کے (ke) is preceded by the noun علی (Ali), showing his ownership of کھلونے (toys), which is plural and masculine.

3.2.7 Locative case

This case indicates location. The clitics میں (main), پر (par), تک (tak), تلے (talle), and تلک (talakk) are used for this purpose in Urdu. The following are examples of the different forms of the locative case.

میری جیب میں پچاس روپے ہیں۔

‘Meri jaib main pachas rupe hn’

(There are Rs. 50 in my pocket)

Here, the clitic میں (in) is preceded by جیب (pocket), which is a location.

گلاس میز پر ہے۔

‘Glass maiz par ha’


(Glass is on the table)

Here, the clitic پر (on) is preceded by میز (table), which is a location.

علی مارکیٹ تک گیا ہے۔

‘Ali market tak gya ha’

(Ali went up to the market)

Here, the clitic تک (to) is preceded by مارکیٹ (market), showing a location.

علی مارکیٹ تلک گیا ہے۔

‘Ali market talak gya ha’

(Ali went up to the market)

Here, the clitic تلک (up to) is preceded by مارکیٹ (market), which is a location.

علی درخت تلے پڑھ رہا ہے۔

‘Ali darakht talle parrh raha ha’

(Ali is studying under the tree)

Here, the clitic تلے (under) is preceded by درخت (tree), showing a location.

3.2.8 Vocative case

It is used to identify the person being addressed. The clitic اے (A) is used before the noun. The vocative can also be used without the clitic اے (A).

The following are examples of the vocative case:

اے لڑکو! اپنا کام ختم کرو۔

‘A larrko! Apna kaam khatam karo’

(O boys! Finish your task)

Here, the clitic اے (A) is followed by لڑکو (boys), who are being addressed.

The following is an example of the vocative case without the clitic.

لڑکو! اپنا کام ختم کرو۔

‘Larrko! Apna kaam khatam karo’

(Boys! Finish your task)

3.2.9 Oblique case

It is used when the noun or pronoun is the object of a verb or a postposition. It can appear in any role except the subject, where the nominative case is used. For example:

علی گھوڑے تک گیا ہے۔

‘Ali ghorrey tak gya ha’

(Ali has gone to the horse)

The original word for horse is گھوڑا (ghorra), and its oblique form is گھوڑے (ghorray). Here, گھوڑے تک (to the horse) shows the oblique form used with the locative clitic تک (to).
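Every clitic in Table 3.2 attaches to the oblique form. The Python function below is a minimal sketch of that step; it is an illustration, not part of the framework described in this chapter. It applies only one common rule: a marked masculine noun ending in ا (as in گھوڑا) or ہ (as in پرندہ) replaces that ending with ے before a clitic. The name `oblique_singular` and the exception set `UNMARKED_MASCULINE` are hypothetical. A real system would read the noun class from a lexicon, because nouns such as دریا keep their form in the oblique.

```python
# Hypothetical exception list: masculine nouns ending in ا whose oblique form is unchanged.
UNMARKED_MASCULINE = {"دریا", "ابا", "چچا"}


def oblique_singular(noun: str, gender: str) -> str:
    """Return the oblique singular form used before a clitic such as تک or نے.

    Sketch of a single rule: a marked masculine noun ending in ا or ہ
    replaces that ending with ے. All other nouns are returned unchanged.
    """
    if gender == "m" and noun and noun not in UNMARKED_MASCULINE and noun[-1] in ("ا", "ہ"):
        return noun[:-1] + "ے"
    return noun
```

For instance, `oblique_singular("گھوڑا", "m") + " تک"` yields گھوڑے تک, the form in the example above.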

Table-3.2 shows the clitics/postpositions used with nouns in Urdu. The first column gives the name of the case, and the second column gives the words used to form it. The third column shows the morphological effect, i.e., how the case is realized. For example, for the ergative case, the oblique form of the noun is followed by نے (ne).

Table 3.2 Noun Cases

Case | Clitic form | Morphological effect
Nominative | Nothing | No change
Oblique | Nothing | Nominative or its modified form
Ergative | Oblique + ne | نے (ne)
Accusative | Oblique + ko | کو (ko)
Dative | Oblique + [ko, ke] | کو (ko), کے (ke)
Instrumental | Oblique + se | سے (se)
Genitive | Oblique + [ka, ki, ke] | کا (ka), کی (ki), کے (ke)
Locative | Oblique + [main, par, tak, talle, tallak] | میں (main), پر (par), تک (tak), تلے (talle), تلک (tallak)
Vocative | (A) + Oblique or modified form of oblique | اے (A)
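Table 3.2 can be read as a lookup from the marker next to a noun to the case(s) it signals. The sketch below assumes romanized, whitespace-tokenized input that follows this chapter's transliterations. `CLITIC_CASES` and `candidate_cases` are illustrative names. Because ko and ke are shared by several cases, the function returns a set of candidate cases rather than a single label.

The sketch has two known limits. It cannot recognize the bare vocative (without اے). It also cannot separate the postposition میں (main) from the pronoun میں (I).

```python
from typing import List, Set

# Romanized clitic inventory taken from Table 3.2
# (both spellings "talak" and "tallak" occur in this chapter).
CLITIC_CASES = {
    "ne": {"ergative"},
    "ko": {"accusative", "dative"},
    "ke": {"dative", "genitive"},
    "se": {"instrumental"},
    "ka": {"genitive"},
    "ki": {"genitive"},
    "main": {"locative"},
    "par": {"locative"},
    "tak": {"locative"},
    "talle": {"locative"},
    "talak": {"locative"},
    "tallak": {"locative"},
}


def candidate_cases(words: List[str], i: int) -> Set[str]:
    """Return the case(s) Table 3.2 allows for the noun at words[i].

    The vocative particle "A" precedes the noun; every other marker follows it.
    A noun with no marker is treated as nominative (unmarked obliques are ignored).
    """
    if i > 0 and words[i - 1] == "A":
        return {"vocative"}
    if i + 1 < len(words) and words[i + 1] in CLITIC_CASES:
        return set(CLITIC_CASES[words[i + 1]])
    return {"nominative"}
```

For example, the word after "Ali" in "Ali ne cake khaya" is ne, so only the ergative reading is returned.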


Having presented the noun cases of Urdu, we now explore the syntactic structure of reflexive and distributive pronouns, taking different linguistic features into account. From this analysis, we formulate rules that accurately locate the antecedents of these anaphors. Section 3.3 introduces the reflexive anaphor, and Section 3.4 explores how it is used syntactically in Urdu text.

3.3 Reflexive pronoun in Urdu (ضمائر معکوس)

In general linguistics, a reflexive pronoun is an anaphoric pronoun that must be coreferential with another nominal, its antecedent, within the same clause [91]. Generally, it is a noun phrase that obligatorily gets its meaning from another noun or noun phrase in the same sentence [92].

Urdu uses various types of pronouns, one of which is the reflexive pronoun. In English, reflexive pronouns are words ending in -self (singular) or -selves (plural). They are used when the subject and the object of a sentence are the same: myself, yourself, himself, herself, oneself, itself, ourselves, yourselves, and themselves.

For example:

Sara introduced herself.

Peter cooked the meal himself.

Urdu has two types of reflexive pronouns.

• Possessive reflexive pronoun: possessive reflexive pronouns are used in possession relations within the same clause. They inflect for the gender and number of the possessee, not of the possessor. These are “اپنا” (apna), “اپنے” (apne), and “اپنی” (apni). Consider the following example:

عائشہ نے اپنی گاڑی چلائی۔

‘Ayesha ne apni garri chalai’

(Ayesha drove her (own) car)

In the above example, اپنی (apni) is a reflexive pronoun that refers to the subject of the clause, عائشہ (‘Ayesha’). It also shows that the subject possesses گاڑی (car).

• Non-possessive reflexive: non-possessive reflexive pronouns are used in any participant position, but mostly in object position. They are used to say something emphatically. They include “خود” (khudd), which represents ‘self’ for different persons and is preceded by a personal pronoun or noun. Consider the following example:

آصف خود وہاں گیا۔

‘Asif khudd waha gya’

(Asif himself went there)

In the above example, the reflexive pronoun خود (‘khudd’) refers to the subject آصف (‘Asif’) and is used to emphasize the subject.

As discussed in Section 2.1, syntactic approaches, specifically Hobbs [29], are quite successful in resolving anaphoric references. Motivated by this approach, we explore syntactic structures, relations, and linguistic features for resolving specific categories of anaphors.

3.4 Exploring Reflexive pronoun (ضمائر معکوس)

As discussed earlier, the reflexive pronoun has two types: possessive and non-possessive (emphatic). The following section discusses variations in how possessive and non-possessive anaphors are used structurally in Urdu sentences. It also explores how to identify the correct antecedent for resolution.

3.4.1 Possessive Reflexive pronouns

In Urdu, “اپنا” (Apna), “اپنے” (Apne), and “اپنی” (Apni) are possessive reflexive pronouns. They are preceded or followed by a noun, pronoun, adverb, or adjective, and they refer to the noun or pronoun. The following are different grammatical uses of these reflexive pronouns in Urdu.

3.4.1.1 Possessive reflexive pronoun preceded by a noun or a pronoun

Consider the following examples:

احمد اپنا کام ختم کر چکا ہے۔ (1)

‘Ahmad apna kaam khatam kar chukka ha’

(Ahmad has completed his (own) work)

In the above example, the pronoun اپنا (Apna) is reflexive and refers to the subject of the clause, احمد (‘Ahmad’). It also shows that the subject possesses کام (work). Fig-3.1 shows the syntax tree of the above example. In this example, the possessive reflexive pronoun is preceded by a noun; it may also be preceded by a personal pronoun. This structure makes resolution easy: once a possessive reflexive pronoun is located, its preceding noun or personal pronoun is its antecedent.

[Figure: syntax tree of احمد اپنا کام ختم کر چکا ہے, with nodes S, NP, VP and tags PRP, NN, N, V]

Fig-3.1 Possessive reflexive pronoun preceded by noun

Consider another example:

علی اپنے گاؤں چلا گیا ہے۔ (2)

‘Ali apne gaoon challa gya ha’

(Ali has gone to his village)

In the above example, the pronoun اپنے (Apne) is reflexive and refers to the subject of the clause, علی (‘Ali’). It also shows that the subject possesses گاؤں (village). In this example too, the possessive reflexive pronoun is preceded by a noun.

مجھے اپنی غلطی کا احساس ہے۔ (3)

‘mujhay apni ghalti ka ehsaas ha’

(I realize my mistake)

In the above example, the pronoun اپنی (Apni) is reflexive and refers to the subject of the clause, مجھے (‘I’). It also shows that the subject possesses غلطی (mistake).

In all the above examples, the noun or pronoun that precedes the possessive reflexive pronoun is the antecedent.

3.4.1.2 Possessive reflexive pronoun preceded by ergative case

The ergative case occurs when the clitic نے (ne) is preceded by a noun/pronoun. Consider the following example.

نوجوان نے اپنی گاڑی کو آگے بڑھایا ہی تھا کہ بتی سرخ ہو گئی۔ (4)

‘nojawan ne apni gharry ko aage barrhaya hi tha keh batti surkh ho gye’

(As soon as the young man moved his car forward, the signal turned red)

Fig-3.2 shows that the possessive reflexive pronoun is preceded by the ergative noun case, which is its antecedent. Only the relevant grammatical entities and the anaphoric link are shown in this diagram.

[Figure: نوجوان نے اپنی گاڑی کو آگے بڑھایا ہی تھا کہ بتی سرخ ہو گئی, with labels Noun, Clitic (Ergative case) and PRP]

Fig-3.2. Possessive reflexive pronoun with ergative case

میں نے اپنے نئے کپڑے پہنے اور روانہ ہو گیا۔ (5)

‘main ne apne nae kaprray pehne aur rawana ho gya’

(I put on my new dress and set off)

In example (4), the possessive reflexive pronoun اپنی (Apni) is preceded by the clitic نے (ne) and the noun نوجوان (young man). Together these form the ergative case of the noun, نوجوان نے, which is the subject of the clause. In example (5), the possessive reflexive pronoun اپنے (Apne) is preceded by the ergative case of the personal pronoun, میں نے, which is also the subject of the clause. In both examples, the antecedent is the subject without the clitic نے (ne).

3.4.1.3 Possessive reflexive pronoun preceded by an adverb and a noun/pronoun

An adverb may also appear between the possessive reflexive pronoun and a noun/pronoun. The following examples use the adverb بھی (bhi).

رحیم بھی اپنی غزل سنانے کے لیے بے چین تھا۔ (6)

‘Rahim bhi apni ghazal sunane k lye bechain tha’

(Rahim was also restless to recite his ghazal)

[Figure: رحیم بھی اپنی غزل سنانے کے لیے بے چین تھا, with labels Noun, Adverb (Adverb+Noun) and PRP]

Fig-3.3 Possessive reflexive pronoun preceded by adverb and noun

Fig-3.3 shows the structure of a possessive reflexive pronoun that is preceded by an adverb and a noun. The noun is the antecedent.

Consider another example:

وہ بھی اپنے ملک واپس جانا چاہتا تھا۔ (7)

‘who bhi apne mulk wapas jana chahta tha’

(He wanted to go back to his country too)

In example (6), the possessive reflexive pronoun اپنی (apni) is preceded by the adverb بھی (bhi) and the noun رحیم (Rahim), which is the subject of the clause. In example (7), the possessive reflexive pronoun اپنے (apne) is preceded by the adverb بھی (bhi) and the personal pronoun وہ (he), which is the subject of the clause. In both examples, the antecedent is the subject of the clause that precedes the adverb بھی (bhi).

3.4.1.4 Possessive reflexive pronoun preceded by a dative case

The dative case occurs when the clitic کو (ko) or کے (ke) is preceded by a noun/pronoun. Consider the following example.

حسین کو اپنے گھوڑے کے لیے گھاس خریدنی ہے۔ (8)

‘Hussain ko apne ghorray k lye ghaas kharidni ha’

(Hussain has to buy grass for his horse)

In the above example, the possessive reflexive pronoun اپنے (Apne) is preceded by the clitic کو (ko) and the noun حسین (Hussain). Hussain is the subject of the clause and the antecedent of the pronoun. The reflexive pronoun اپنے (Apne) shows that the subject حسین (Hussain) possesses گھوڑے (horse). The object of the clause, گھوڑے (horse), has accusative case, as it is followed by the clitic کے (ke).

Consider another example:

اس کو اپنے امتحان کی فکر ہو رہی تھی۔ (9)

‘uss ko apne imtihan ki fikar ho rahi thi’

(He was worried about his examination)

Here, the possessive reflexive pronoun اپنے (Apne) is preceded by the clitic کو (ko) and the pronoun اس (uss). This pronoun is the subject of the clause, in the dative case, and it is the antecedent of the reflexive pronoun اپنے (Apne). The reflexive shows that the subject اُس (uss) possesses امتحان (examination).

3.4.1.5 The connector “اور” (Aur) between two possessive reflexive pronouns

When two possessive reflexive pronouns connected by the connector اور (Aur) are used in a clause, they refer to the same antecedent and show the possession of the same entity. Consider the following example:

علی اپنی اور اپنے دوست کی کتابیں لے آیا۔ (10)

‘Ali apni aur apne dost ki kitben le aya’

(Ali brought his own and his friend’s books)

Here, the connector “اور” appears between the two possessive reflexive pronouns “اپنی” (Apni) and “اپنے” (Apne). The first pronoun, “اپنی” (Apni), refers to its preceding noun, “علی” (Ali). The second pronoun, “اپنے” (Apne), refers to the succeeding noun دوست (friend). Both pronouns are preceded by the accusative case.

Fig-3.4 shows two possessive reflexive pronouns joined by the connector اور (aur), preceded and followed by a noun.

[Figure: علی اپنی اور اپنے دوست کی کتابیں لے آیا, with labels Noun, PRP, connector, PRP, Noun]

Fig-3.4 Two possessive reflexive pronouns together

After analyzing and discussing different variations in the use of the possessive reflexive anaphor, we formulate the following heuristic rules for its resolution.

i) If a possessive reflexive anaphor is preceded by a noun or personal pronoun, select that noun or pronoun as the antecedent.
ii) If a possessive reflexive pronoun is preceded by a clitic/postposition and a noun or personal pronoun, select the noun/personal pronoun as the antecedent and mark the case.
iii) If a possessive reflexive pronoun is preceded by an adverb and a noun or personal pronoun, select the noun/personal pronoun as the antecedent.
iv) If two possessive reflexive pronouns are connected by the connector اور (and) and preceded by a noun or personal pronoun, select the noun or personal pronoun as the antecedent.
v) If a possessive reflexive pronoun is used twice in succession and the preceding text contains a plural noun, select this plural noun as the antecedent.
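Rules i to iv share one pattern. Starting from the reflexive, the resolver steps backwards over at most one connector pair, one adverb, and one clitic, and stops at a noun or personal pronoun. The Python sketch below encodes this reading over POS-tagged, romanized tokens.

Several details are assumptions made for illustration, not the thesis's implementation: the tag names (NN, PP, PRP, ADV, CLITIC, CONJ), the function name, and the order in which intervening words are skipped. Rule v is left out because it needs a plural noun; it is handled separately.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized word, hypothetical POS tag)

NOMINAL_TAGS = {"NN", "PP"}  # noun or personal pronoun


def resolve_possessive(tokens: List[Token], idx: int) -> Optional[Tuple[int, Optional[str]]]:
    """Apply rules i-iv to the possessive reflexive at tokens[idx].

    Returns (antecedent_index, clitic_or_None), or None if no rule fires.
    """
    j = idx - 1
    # Rule iv: in "apni aur apne", resolve the second pronoun through the first.
    if j >= 1 and tokens[j] == ("aur", "CONJ") and tokens[j - 1][1] == "PRP":
        j -= 2
    # Rule iii: an adverb such as "bhi" may stand between antecedent and pronoun.
    if j >= 0 and tokens[j][1] == "ADV":
        j -= 1
    # Rule ii: a case clitic (ne, ko, ke) may intervene; keep it to mark the case.
    clitic = None
    if j >= 0 and tokens[j][1] == "CLITIC":
        clitic = tokens[j][0]
        j -= 1
    # Rule i: the nearest preceding noun or personal pronoun is the antecedent.
    if j >= 0 and tokens[j][1] in NOMINAL_TAGS:
        return j, clitic
    return None
```

On example (4), `resolve_possessive([("nojawan", "NN"), ("ne", "CLITIC"), ("apni", "PRP")], 2)` returns `(0, "ne")`. The antecedent is nojawan, and the clitic is returned so that the case can be marked as rule ii requires.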

3.4.2 Non-possessive or Emphatic Reflexive pronoun “خود”

Non-possessive or emphatic reflexive pronouns are personal pronouns. They are used to stress that the action done by the noun is performed without anyone’s help. In English, myself, ourselves, yourself, himself, herself, itself, oneself, yourselves, and themselves are emphatic reflexive pronouns. In Urdu, the word “خود” (khudd) serves this purpose. It is compounded with a personal pronoun or noun to emphasize an action, for example میں خود (myself), آپ/تم خود (yourself), وہ خود (himself/herself), ہم خود (ourselves), etc.

The following are the different variations of the grammatical uses of the emphatic reflexive pronoun “خود”.

3.4.2.1 Emphatic Reflexive “خود” compound with a noun or personal pronoun

The emphatic reflexive pronoun is used as a compound, preceded by a noun or personal pronoun, to show that the action is performed emphatically. Its referent is the noun or pronoun that precedes it. Consider the following examples:

ابو خود جب سکول میں پڑھتے تھے تو وہ بھی بائیسکل پر آتے تھے۔ (11)

‘Abbu khudd jab school main parrte the tu woh bhi bicycle par aate the’

(When father himself studied in school, he used to come on bicycle)

In the above example, the word خود (self) is compounded with the noun ابو (father) to show that the action is done emphatically by the preceding entity (father).

Fig-3.5 shows the linkage between non-possessive reflexive pronoun and preceding noun.

[Figure: ابو خود جب سکول میں پڑھتے تھے تو وہ بھی بائیسکل پر آتے تھے, with labels Noun and NPRP]

Fig-3.5 Non-possessive reflexive pronoun preceded by a noun

The syntactic structure of the non-possessive reflexive pronoun shows that it is preceded by a noun, which is its antecedent.

Consider another example:

استاد کو آتے دیکھ کر میں خود حیران رہ گیا۔ (12)

‘Ustad ko aate dekh kar main khudd hairan reh gya’

(I myself was astonished to see the teacher coming)

The above example contains a noun, استاد (teacher), and a personal pronoun, میں (I). The syntactic structure shows that the noun or personal pronoun immediately preceding the non-possessive pronoun is its antecedent. Therefore, the personal pronoun میں (I) is the antecedent.

3.4.2.2 Emphatic Reflexive pronoun “خود” preceded by ergative case

When the emphatic reflexive “خود” is preceded by the clitic نے (ne) and a noun or pronoun, that noun or pronoun is its antecedent. This shows that the action is performed emphatically by the entity before the clitic نے (ne).

Consider the following example:

خوشی کی بات یہ ہے کہ عائشہ نے خود ہی گھر کا کام کر لیا۔ (13)

‘khushi ki baat yeh ha keh Ayesha ne khudd hi ghar ka kaam kar lya’

(Good news is that Ayesha herself has done the homework)

[Figure: خوشی کی بات یہ ہے کہ عائشہ نے خود ہی گھر کا کام کر لیا, with labels Noun, Clitic (Ergative case) and NPRP]

Fig-3.6 Non-possessive reflexive pronoun ergative case

Fig-3.6 shows that the non-possessive reflexive pronoun is preceded by the ergative case, i.e., a noun combined with the postposition نے (ne). The noun preceding the postposition نے (ne) is the antecedent.

Consider another example:

آج اس نے خود کمپیوٹر پروگرام لکھا۔ (14)

‘aj uss ne khudd computer program likha’

(He himself has written a computer program today)

In example (13), the emphatic reflexive خود (self) is preceded by the ergative case عائشہ نے (noun عائشہ + clitic نے). In example (14), it is preceded by the ergative case اس نے (personal pronoun اس + clitic نے). The noun and the personal pronoun, respectively, are therefore the antecedents.

Fig-3.7 shows the syntactic structure of example (14), where the non-possessive reflexive pronoun is preceded by the ergative case.

[Figure: آج اس نے خود کمپیوٹر پروگرام لکھا, with labels PP, clitic (Ergative case) and NPRP]

Fig-3.7 Non-possessive reflexive pronoun preceded by ergative case

3.4.2.3 Emphatic reflexive pronoun “خود” preceded by a dative case

For the dative case, the clitics کو (ko) and کے (ke) are used after the noun. Consider the following examples:

آصف کو خود رنز بنانے چاہئیں۔ (15)

‘Asif ko khudd runs banana chaheye’

(Asif himself should make runs)

اس کے خود اکاؤنٹ بند ہو گئے۔ (16)

‘uss ke khudd account band ho gye’

(He himself got accounts closed)

In example (15), the emphatic reflexive pronoun is preceded by the dative case, i.e., the noun آصف and the clitic کو. The noun آصف is the antecedent in this case. In example (16), the emphatic reflexive pronoun is preceded by the dative case, i.e., the personal pronoun اس and the clitic کے. The personal pronoun اس is the antecedent.

3.4.2.4 Emphatic reflexive pronoun “خود” preceded by an adverb and a noun/pronoun

In the following examples, the emphatic reflexive “خود” is preceded by the adverb بھی (bhi) and a noun or pronoun. That noun or pronoun is its antecedent. This shows that the action is performed emphatically by the entity preceding the adverb بھی (bhi). Consider the following examples:

علی بھی خود لاہور جانا چاہتا ہے۔ (17)

‘Ali bhi khudd Lahore jana chahta ha’

(Ali himself too wants to go to Lahore)

وہ بھی خود پروگرامنگ کرتا ہے۔ (18)

‘who bhi khudd programming karta ha’ (he himself too does the programming)

In examples (17) and (18), the emphatic reflexive pronoun “خود” is preceded by the adverb بھی (bhi). It is further preceded by the noun علی (Ali) and the pronoun وہ (he), respectively. In both examples, the entity preceding the adverb بھی (bhi), which is the subject of the clause, is the antecedent.

3.4.2.5 Emphatic reflexive pronoun “خود” preceded by “بذات” and noun/pronoun

The word “بذات” (in person) can also appear before the emphatic reflexive pronoun. It is used emphatically to show that someone has done/performed something in person. This combination is preceded by a noun/pronoun, for example علی بذات خود (Ali himself), وہ بذات خود (he himself), میں بذات خود (I myself), تم بذات خود (you yourself), etc. Consider the following examples:

اس موقع پر میں بذات خود موجود تھا۔ (19)

‘es moka par main bazat e khudd mojood tha’

(I myself was present at that time)

علی بذات خود اجلاس میں شریک ہوا۔ (20)

‘Ali bazat e khudd ijlas main shareek hua’

(Ali himself attended the meeting)

In examples (19) and (20), the combination of بذات and خود (بذات خود) is used to say that someone has performed an action in person. The noun/pronoun preceding the combination (بذات خود) is the antecedent. Fig-3.8 shows the syntactic structure of example (19), where the combination (بذات خود) is preceded by a personal pronoun.

[Figure: اس موقع پر میں بذات خود موجود تھا, with labels Personal pronoun and NPRP + بذات خود]

Fig-3.8 NPRP with word بذات preceded by personal pronoun

3.4.3 Possessive and non-possessive reflexive pronouns together

The non-possessive reflexive pronoun خود (khudd) can be followed by any of the possessive reflexive pronouns (اپنا / اپنے / اپنی). This combination is used to say something emphatically while also showing the possession of some entity. Consider the following examples:

میں خود اپنے شوق سے فزکس پڑھ رہا ہوں۔ (21)

‘main khudd apne shoke se physics parrh raha hoon’

(I am studying physics on my own will)

علی خود اپنے کام کا ذمہ دار ہے۔ (22)

‘Ali khudd apne kaam ka zima dar ha’

(Ali himself is responsible for his own work)

In the above two examples, the combination خود اپنے (myself/himself + my/his own) shows that the action is performed emphatically. It also shows possession. In example (21), the subject of the clause, میں (I), possesses شوق (will). In example (22), the subject of the clause, علی (Ali), possesses کام (work). The noun or pronoun preceding this combination is the referent. Fig-3.9 shows example (21), where possessive and non-possessive reflexive pronouns are used together, preceded by a personal pronoun.

[Figure: میں خود اپنے شوق سے فزکس پڑھ رہا ہوں, with labels PP, NPRP and PRP]

Fig-3.9 PRP and NPRP together preceded by a personal pronoun

3.4.4 Distributive Reflexive pronoun

In this case, a possessive reflexive pronoun is used twice and behaves like a reciprocal in the relevant respect. It refers to a noun in plural form and shows the distribution of some possessed entity.

For example, in اپنا اپنا (Apna Apna), اپنے اپنے (Apne Apne), and اپنی اپنی (Apni Apni), the possessive reflexive pronoun is used twice. Its behavior combines that of a distributive pronoun and a reflexive pronoun.

Consider the following example:

ڈائریکٹر کی تقریر کے بعد ملازمین خوش ہو گئے اور اپنے اپنے کام پر چلے گئے۔ (23)

‘Director ki takreer k baad mulazmeen khush ho gye aur apne apne kaam par chale gye’

(After the director’s speech, the employees became happy and went to their respective work)

The doubled possessive reflexive pronoun اپنے اپنے (Apne Apne) refers back to the noun ملازمین (employees), the subject of the clause. It also shows the possession and distribution of کام (work) to the subject. Fig-3.10 shows the doubled possessive reflexive pronoun, the plural noun it refers to, and the entity to be distributed.

[Figure: ڈائریکٹر کی تقریر کے بعد ملازمین خوش ہو گئے اور اپنے اپنے کام پر چلے گئے, with labels Noun in plural, PRP twice, and Entity to be distributed]

Fig-3.10 PRP twice preceded by entity to be distributed
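The resolution of a doubled possessive reflexive such as اپنے اپنے can be sketched as two searches. One search goes backwards for the nearest plural noun, which is the antecedent. The other goes forwards for the first noun after the pair, which is the entity distributed to that antecedent.

The function below is a hypothetical helper. It assumes a separate tag NNS for plural nouns, and it assumes that "nearest preceding plural noun" is the right choice when several plurals are available.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized word, hypothetical POS tag); "NNS" marks a plural noun


def resolve_distributive_reflexive(
    tokens: List[Token], idx: int
) -> Optional[Tuple[int, Optional[int]]]:
    """Resolve a doubled possessive reflexive (e.g. "apne apne") starting at idx.

    Returns (plural_antecedent_index, distributed_entity_index_or_None),
    or None if the pattern does not apply.
    """
    # The pattern needs two identical possessive reflexives in a row.
    if idx + 1 >= len(tokens) or tokens[idx][1] != "PRP" or tokens[idx + 1] != tokens[idx]:
        return None
    # The nearest preceding plural noun is taken as the antecedent.
    antecedent = next((j for j in range(idx - 1, -1, -1) if tokens[j][1] == "NNS"), None)
    if antecedent is None:
        return None
    # The first noun after the pair is the entity distributed to the antecedent.
    entity = next(
        (k for k in range(idx + 2, len(tokens)) if tokens[k][1] in ("NN", "NNS")), None
    )
    return antecedent, entity
```

On example (23), with mulazmeen tagged NNS, the function returns the positions of mulazmeen (antecedent) and kaam (distributed entity).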


After analyzing and discussing different variations in the use of the non-possessive reflexive anaphor, we formulate the following heuristic rules for its resolution.

i) If a non-possessive reflexive pronoun is preceded by a noun or personal pronoun, select that noun or pronoun as the antecedent.
ii) If a non-possessive reflexive pronoun is preceded by a clitic/postposition and a noun or personal pronoun, select the noun/personal pronoun as the antecedent and mark the case.
iii) If a non-possessive reflexive pronoun is preceded by an adverb and a noun or personal pronoun, select the noun/personal pronoun as the antecedent.
iv) If a non-possessive reflexive pronoun is preceded by the word بذات and a noun/personal pronoun, select the noun/personal pronoun as the antecedent. Take the combination بذات خود as myself, himself, or herself, depending upon gender and number agreement.
v) If possessive and non-possessive reflexive pronouns are used together and preceded by a noun/personal pronoun, select the noun/personal pronoun as the antecedent. Take the combination as myself, himself, or herself, depending upon gender and number agreement.
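The non-possessive rules follow the same backward pattern as the possessive ones, with one extra step: the word بذات can be skipped. Rule v is handled by letting a possessive that directly follows خود inherit خود's antecedent.

The sketch below is illustrative only. It assumes romanized tokens and hypothetical tags, including BAZAT for بذات. It omits the myself/himself/herself gloss, which would need gender and number features.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized word, hypothetical POS tag)

NOMINAL_TAGS = {"NN", "PP"}  # noun or personal pronoun


def resolve_emphatic(tokens: List[Token], idx: int) -> Optional[Tuple[int, Optional[str]]]:
    """Resolve "khudd" (rules i-iv) or a possessive directly after it (rule v).

    Returns (antecedent_index, clitic_or_None), or None if no rule fires.
    """
    # Rule v: in "khudd apne", the possessive inherits the antecedent of "khudd".
    if tokens[idx][1] == "PRP" and idx > 0 and tokens[idx - 1][0] == "khudd":
        idx -= 1
    if tokens[idx][0] != "khudd":
        return None
    j = idx - 1
    # Rule iv: step over "bazat" in "bazat e khudd".
    if j >= 0 and tokens[j][0] == "bazat":
        j -= 1
    # Rule iii: an adverb such as "bhi" may intervene.
    if j >= 0 and tokens[j][1] == "ADV":
        j -= 1
    # Rule ii: a clitic (ne, ko, ke) may intervene; keep it so the case can be marked.
    clitic = None
    if j >= 0 and tokens[j][1] == "CLITIC":
        clitic = tokens[j][0]
        j -= 1
    # Rule i: the preceding noun or personal pronoun is the antecedent.
    if j >= 0 and tokens[j][1] in NOMINAL_TAGS:
        return j, clitic
    return None
```

In example (12), the search stops at میں (main), the token immediately before خود. The noun استاد earlier in the sentence is never reached.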

The previous sections formulated heuristic rules for resolving the various uses of the possessive and non-possessive pronouns of Urdu. These rules are combined in Table-3.3.

Table-3.3 has four columns. The first gives the type of reflexive anaphor. The second gives the rule number, abbreviated R1, R2, etc. The third gives the syntactic usage of the reflexive anaphor in Urdu text. The last states how to select the antecedent and what action to take to resolve the anaphoric link.

Table-3.3 Resolution rules for reflexive anaphora

Pronoun/Anaphor | Rule # | Usage | Antecedent/Action
Possessive anaphor: اپنا (apna), اپنے (apne), اپنی (apni) | R1 | Preceded by Noun/PP | Noun/PP is antecedent
 | R2 | Preceded by clitic and Noun/PP | Noun/PP is antecedent; remove clitic, mark case
 | R3 | Preceded by adverb and Noun/PP | Noun/PP is antecedent
 | R4 | Two PRP connected by اور (and), preceded by Noun/PP | Noun/PP before the first PRP and noun/PP after the second PRP are antecedents; mark the case
Non-possessive anaphor: خود (khudd) | R1 | Preceded by Noun/PP | Noun/PP is antecedent
 | R2 | Preceded by case (clitic) and Noun/PP | Noun/PP is antecedent; remove clitic, mark case
 | R3 | Preceded by adverb and Noun/PP | Noun/PP is antecedent
 | R4 | Preceded by the word بذات and a Noun/PP | Noun/PP is antecedent; the combination بذات خود is taken as myself, himself, herself, depending upon gender and number agreement
Possessive and non-possessive anaphors together: خود (khudd) + اپنا (apna), اپنے (apne) | R1 | Combination preceded by Noun/PP | Noun/PP is antecedent; the combination is taken as myself, himself, herself, depending upon gender and number agreement
Distributive possessive anaphors: اپنا اپنا (Apna Apna), اپنے اپنے (Apne Apne), اپنی اپنی (Apni Apni) | R1 | Preceded by plural form | Plural form is antecedent, and the following entity is distributed to the antecedent
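Before a rule from Table-3.3 can be applied, the system has to decide which row group a reflexive belongs to. A minimal sketch, assuming romanized word lists and the hypothetical function name `classify_reflexive`, is shown below. It checks the doubled and combined forms first, because those forms contain the single-word forms and would otherwise be misclassified.

```python
from typing import List

POSSESSIVE = {"apna", "apne", "apni"}


def classify_reflexive(words: List[str], idx: int) -> str:
    """Return the Table-3.3 row group for the word at words[idx]."""
    w = words[idx]
    prev = words[idx - 1] if idx > 0 else None
    nxt = words[idx + 1] if idx + 1 < len(words) else None
    # Doubled form such as "apne apne": the distributive row.
    if w in POSSESSIVE and w in (prev, nxt):
        return "distributive possessive"
    # "khudd apne": possessive and non-possessive together.
    if (w == "khudd" and nxt in POSSESSIVE) or (w in POSSESSIVE and prev == "khudd"):
        return "possessive and non-possessive together"
    if w == "khudd":
        return "non-possessive"
    if w in POSSESSIVE:
        return "possessive"
    return "not reflexive"
```

In example (10), "apni aur apne" is classified as two ordinary possessives, since the forms differ and are separated by aur. It is then resolved by R4.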

In the next section, we explore the second type of pronoun, the distributive pronoun. We discuss its types and syntactic uses and formulate heuristic rules to resolve them.

3.5 Exploring Distributive Pronouns (ضمائر تقسیم)

A distributive pronoun considers members of a group separately, rather than collectively [93]. Distributive pronouns refer to persons, things, or places one at a time, out of a group. They are used with a singular noun and verb to describe all the members of a particular group.

The distributive pronouns show that persons, things, or places are taken singly or in a group. The following are the distributive pronouns in Urdu:

a) ہر ایک (Har Ek): every one

b) کوئی ایک (Koi Ek): any one

c) کوئی بھی (Koi Bhi): anyone

d) کئی ایک (Kai Ek): many out of many

In the following section, we explore and discuss the syntactical use of these distributive pronouns with the help of Urdu examples.

3.5.1 Distributive Pronoun ہر ایک (Har Ek)

It acts like ‘every one’ in English. It refers back to every member of a group of people or things, so each member of that group is its antecedent or referent.

The following are different uses of the pronoun ہر ایک (Har Ek).

3.5.1.1 Group Reference

In Urdu text, the distributive pronoun ہر ایک (Har Ek) refers to each individual in a specified group. That group is the antecedent or referent.

Consider the following examples:

آسٹریلیا کے خلاف لڑکے بڑی محنت سے کھیل رہے تھے۔ ہر ایک میچ جیتنا چاہتا تھا۔ (1)

‘Australia k khilaf larrke barri mehnat se khel rahe the. Har Ek match jeetna chahta tha’

(The boys were playing with great effort against Australia. Each one wanted to win the match)

Here, the pronoun ہر ایک (each one) in the second sentence refers to the group لڑکے (boys) in the first sentence, indicating each individual of the group (antecedent).

To resolve this distributive anaphoric link, the singular form of لڑکے (boys), which is لڑکا (boy), can be inserted after the anaphor ہر ایک (Har Ek).

Text after resolution will become:

آسٹریلیا کے خلاف لڑکے بڑی محنت سے کھیل رہے تھے۔ ہر ایک لڑکا میچ جیتنا چاہتا تھا۔ (2)

‘Australia k khilaf larrke barri mehnat se khel rahe the. Har Ek larrka match jeetna chahta tha’ (The boys were playing with great effort against Australia. Each boy wanted to win the match)

Fig-3.11 shows the distributive pronoun in the referring sentence and the group it refers to in the referred sentence. The group is a noun in plural form and serves as the antecedent.

[Figure: آسٹریلیا کے خلاف لڑکے بڑی محنت سے کھیل رہے تھے۔ ہر ایک میچ جیتنا چاہتا تھا, with labels Noun in plural and Distributive Anaphor]

Fig-3.11 Distributive pronoun referring to a noun in plural

Consider another example:

علی نے ساری تصویریں خرید لیں۔ ہر ایک منفرد تھی۔ (3)

‘Ali ne sari tasweeren khareed le- Har Ek munfarid thi’

(Ali purchased all pictures. Each one was unique)

Here, the pronoun ہر ایک (each one) in the second sentence refers to a group, the plural noun تصویریں (pictures) in the first sentence, indicating each individual of the group.

To resolve this distributive anaphoric link, the singular form of تصویریں (pictures), which is تصویر (picture), can be inserted after the anaphor ہر ایک (Har Ek).

Text after resolution will become:

علی نے ساری تصویریں خرید لیں۔ ہر ایک تصویر منفرد تھی۔ (4)

‘Ali ne sari tasweeren khareed le- Har Ek tasweer munfarid thi’

(Ali purchased all pictures. Each picture was unique)
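The basic step used in examples (2) and (4), where the preceding sentence contains exactly one plural noun, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the text is already tokenized and tagged, works on romanized tokens, and the tag `N_PL` and the `SINGULAR` lexicon are hypothetical stand-ins for a real Urdu morphological analyzer. It implements the insertion variant (noun after ہر ایک); the replacement variant used later (example (6)) differs only in dropping ایک.

```python
from typing import Dict, List, Tuple

# (romanized word, tag); "N_PL" marks a plural noun (hypothetical tag set)
Token = Tuple[str, str]

# Hypothetical plural -> singular lexicon; a real system would use a morphological analyzer.
SINGULAR: Dict[str, str] = {"larrke": "larrka", "tasweeren": "tasweer"}


def resolve_har_ek(previous: List[Token], referring: List[str]) -> List[str]:
    """Insert the singular of the only plural noun in `previous` after 'Har Ek'."""
    plurals = [word for word, tag in previous if tag == "N_PL"]
    if len(plurals) != 1:
        # No group, or several candidate groups: other rules have to decide.
        return list(referring)
    singular = SINGULAR.get(plurals[0], plurals[0])
    resolved: List[str] = []
    for i, word in enumerate(referring):
        resolved.append(word)
        # Insert the antecedent right after the two-word anaphor "Har Ek".
        if word == "Ek" and i > 0 and referring[i - 1] == "Har":
            resolved.append(singular)
    return resolved


previous = [("Ali", "N_PROP"), ("ne", "CASE"), ("sari", "ADJ"),
            ("tasweeren", "N_PL"), ("khareed", "V"), ("le", "V")]
print(" ".join(resolve_har_ek(previous, "Har Ek munfarid thi".split())))
# Har Ek tasweer munfarid thi
```

Returning the sentence unchanged when the count of plural candidates is not exactly one leaves the harder cases to the verb- and attribute-based rules discussed next.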

3.5.1.2 Role of verb

Verbs are vital to the construction of a sentence: they specify the action taking place. The subject of the sentence, that is, what the sentence is about, can be a noun or a pronoun. A sentence may contain more than one verb; one of them is the main verb, which expresses the action performed by the subject. The main verb can therefore be identified through the subject of the sentence. For example, in English:

Because it was snowing, Peter decided to bring an extra coat.

There are four verbs in this sentence: “was”, “snowing”, “decided” and “bring”. Which one is the main verb? The answer depends on the subject of the sentence and on what is being said about it. Here, Peter is being discussed, together with what Peter is doing: he is deciding, so “decided” is the main verb.

There can be more than one main verb in a sentence. Consider the following example:


Peter plays and runs in the football ground.

Here both verbs are in the present tense and carry equal weight. Both express the action of the subject, Peter, so both are main verbs.

The same holds in Urdu. A sentence may contain more than one candidate referent. The verb (in particular the main verb) of the referring sentence plays an important role, as it carries information about the action of the subject, which in turn helps in locating the correct referent.

Urdu pronouns are not marked for gender; it is the verb in the text that distinguishes masculine from feminine. Therefore, knowledge of the verb is also essential for correct identification of the antecedent.

The following example contains more than one candidate antecedent.

شہروں میں لڑکے گلیوں میں کرکٹ کھیلتے ہیں۔ ہر ایک چاہتا ہے کہ وہ قومی ٹیم میں شامل ہو جائے۔ (5)

‘shehro main larrke gallio main cricket khelte hain. Har Ek chahta ha keh woh komi team main shamil ho jai’

(In cities, boys play cricket in the streets. Each one wants to join the national team)

If we apply the previous rule, that the anaphor ہر ایک (Har Ek) refers to each member of a group, a complication arises in finding the correct referent in this example: the referent sentence contains three entities in group form. The entities شہروں (cities), لڑکے (boys) and گلیوں (streets) each represent a group, being in plural form, and each is a candidate referent. Here, the main verb decides the subject of the referring sentence. The referring sentence contains two verbs, چاہتا (wants) and شامل (join), and the verb used in the referent sentence is کھیلتے (play). These three verbs collectively indicate that the subject of the text is لڑکے (boys), since, on the basis of world knowledge, all of the verbs are associated with it. So لڑکے (boys) is the referent of the pronoun ہر ایک (Har Ek). Fig-3.12 shows example (5), its anaphor, the potential antecedents, and the three deciding factors used to find the subject of the sentence, which is the antecedent.

[Figure: example (5) annotated, marking the main verb, the three plural candidate antecedents and the distributive anaphor; panel title: Finding the subject]

Fig-3.12 Multiple antecedents and verbs as deciding factors

For resolution of the pronoun ہر ایک (Har Ek), the singular form of the subject will replace the word ایک (Ek). After resolution, the text will become:

شہروں میں لڑکے گلیوں میں کرکٹ کھیلتے ہیں۔ ہر لڑکا چاہتا ہے کہ وہ قومی ٹیم میں شامل ہو جائے۔ (6)

‘shehro main larrke gallio main cricket khelte hain. Har larrka chahta ha keh woh komi team main shamil ho jai’

(In cities, boys play cricket in the streets. Every boy wants to join the national team)
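The verb-based choice among several plural candidates can be approximated with selectional preferences. In the sketch below, which is an assumption rather than the thesis method, each candidate noun carries a semantic class and each verb lists the subject classes it accepts; the tables `NOUN_CLASS` and `VERB_SUBJECTS` are hypothetical, hand-made stand-ins for the world knowledge the text refers to, filled in only for example (5).

```python
from typing import Dict, List, Optional, Set

# Hypothetical semantic classes of the candidate nouns in example (5).
NOUN_CLASS: Dict[str, str] = {"shehro": "place", "larrke": "human", "gallio": "place"}

# Hypothetical selectional preferences: subject classes each verb accepts.
VERB_SUBJECTS: Dict[str, Set[str]] = {
    "khelte": {"human", "animal"},  # play
    "chahta": {"human"},            # want
    "shamil": {"human"},            # join
}


def select_by_verbs(candidates: List[str], verbs: List[str]) -> Optional[str]:
    """Return the only candidate that every verb accepts as subject, else None."""
    if not verbs:
        return None  # no verb evidence: fall back to attribute matching
    compatible = [
        noun for noun in candidates
        if all(NOUN_CLASS.get(noun) in VERB_SUBJECTS.get(verb, set()) for verb in verbs)
    ]
    return compatible[0] if len(compatible) == 1 else None


print(select_by_verbs(["shehro", "larrke", "gallio"], ["khelte", "chahta", "shamil"]))
# larrke
```

Once لڑکے (larrke) is selected, replacing ایک with its singular gives example (6). Returning None when zero or several candidates survive signals that another source of evidence is needed.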

3.5.1.3 Using properties/attributes/complements to identify the referent

The referring sentence may contain property or attribute words that help in locating the correct referent when there are multiple candidate referents. Consider the following example:

شادی میں باراتیوں کے لیے متعدد کھانے تھے۔ ہر ایک لذت سے بھرپور تھا۔ (7)

‘shadi main baratiyo k lye muta-addid khane the. Har Ek lazat se bharpur tha’

(There were several dishes for the participants at the wedding. Each one was full of taste)


In the referent sentence, there are two plural words (groups), that is, two candidate referents: باراتیوں (participants) and کھانے (dishes). The referring sentence has a complement, لذت سے بھرپور (full of taste), which is an attribute of one of the candidate referents, کھانے (dishes). A human reader with world knowledge easily decides that the referent is کھانے (dishes) and not باراتیوں (participants). This means that our prior assumptions are not sufficient, and even more knowledge must be stored in the computer to locate the correct referent. Fig-3.13 shows how attributes help in locating the correct antecedent.

[Figure: example (7) annotated, marking the two plural candidates, the distributive anaphor (D. Anaphor) and the attribute; panel title: Attribute Matching]

Fig-3.13 Selecting antecedent on the basis of attributes-1

To resolve the pronoun ہر ایک (Har Ek), the singular form of the selected candidate کھانے (dishes), which is کھانا (dish), will replace the word ایک (Ek) of the pronoun ہر ایک (Har Ek). After resolution, the text is clearly understandable:

شادی میں باراتیوں کے لیے متعدد کھانے تھے۔ ہر کھانا لذت سے بھرپور تھا۔ (8)

‘shadi main baratiyo k lye muta-addid khane the. Har khana lazat se bharpur tha’


(There were several dishes for the participants at the wedding. Each dish was full of taste)

Consider the same example with a change in attributes:

شادی میں باراتیوں کے لیے متعدد کھانے تھے۔ ہر ایک کو زور سے بھوک لگی ہوئی تھی۔ (9)

‘shadi main baratiyo k lye muta-addid khane the. Har Ek ko bhook lagi hui thi’

(There were several dishes for the participants at the wedding. Each one was extremely hungry)

In the above example, the referred sentence contains two candidate referents: باراتیوں (participants) and کھانے (dishes). Applying the same analogy, that is, world knowledge, the complement زور سے بھوک لگی (extremely hungry) after the pronoun ہر ایک is an attribute of باراتیوں (participants), so the correct referent is selected. Fig-3.14 shows how attributes can be used to select the accurate antecedent out of multiple candidates. To resolve the anaphoric link, the singular form of the selected referent باراتیوں (participants), which is باراتی (participant), will replace the word ایک (Ek) of the pronoun ہر ایک (Har Ek). After resolution, the text will become:

شادی میں باراتیوں کے لیے متعدد کھانے تھے۔ ہر باراتی کو زور سے بھوک لگی ہوئی تھی۔ (10)

‘shadi main baratiyo k lye muta-addid khane the. Har barati ko bhook lagi hui thi’

(There were several dishes for the participants at the wedding. Each participant was extremely hungry)

In examples (7) and (9), the candidate referents are the same, and it is world knowledge about the attributes that is used to select the correct referent.


[Figure: example (9) annotated, marking the two plural candidates and the distributive anaphor (D. Anaphor); panel title: Finding the subject]

Fig-3.14 Selecting antecedent on the basis of attributes-2
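Attribute matching, as used for examples (7) and (9), can be sketched as a lookup from attribute phrases to the semantic class they can describe. The tables below (`NOUN_CLASS`, `ATTRIBUTE_CLASS`, `SINGULAR`) are hypothetical world-knowledge entries written only for these two examples, and the substring test is a deliberate simplification of real phrase detection.

```python
from typing import Dict, List, Optional

# Hypothetical world-knowledge tables for examples (7) and (9).
NOUN_CLASS: Dict[str, str] = {"baratiyo": "human", "khane": "food"}
ATTRIBUTE_CLASS: Dict[str, str] = {"lazat se bharpur": "food", "bhook lagi": "human"}
SINGULAR: Dict[str, str] = {"baratiyo": "barati", "khane": "khana"}


def select_by_attribute(candidates: List[str], referring: str) -> Optional[str]:
    """Pick the candidate whose class matches an attribute found in the referring sentence."""
    for phrase, wanted in ATTRIBUTE_CLASS.items():
        if phrase in referring:  # naive substring match, enough for a sketch
            matches = [noun for noun in candidates if NOUN_CLASS.get(noun) == wanted]
            if len(matches) == 1:
                return matches[0]
    return None


def replace_ek(referring: str, antecedent: str) -> str:
    """Replace 'Ek' of 'Har Ek' with the singular of the antecedent."""
    return referring.replace("Har Ek", "Har " + SINGULAR.get(antecedent, antecedent), 1)


for sentence in ["Har Ek lazat se bharpur tha", "Har Ek ko bhook lagi hui thi"]:
    chosen = select_by_attribute(["baratiyo", "khane"], sentence)
    if chosen is not None:
        print(replace_ek(sentence, chosen))
# Har khana lazat se bharpur tha
# Har barati ko bhook lagi hui thi
```

The two outputs correspond to the resolved texts (8) and (10): the candidates are identical, and only the attribute changes the decision.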

3.5.1.4 Pronoun ہر ایک (Har Ek) followed by Noun/Noun Phrase

The pronoun ہر ایک (Har Ek) can be followed by a noun or noun phrase in an Urdu sentence. In this case, resolution is not required: ہر ایک (Har Ek) refers to every member of the set denoted by the following noun or noun phrase. Consider the following examples:

ہر ایک مسلمان کی یہ خواہش ہوتی ہے کہ وہ زندگی میں کم از کم ایک دفعہ ضرور حج کی سعادت حاصل کرے۔ (11)

‘Har Ek musalman ki yeh khuwahish hoti ha keh woh zindagi main kam az kam ek dafa zaroor haj ki saadat hasil kare’

(Every Muslim has the desire to perform Haj at least once in his life)

Here, the pronoun ہر ایک (Har Ek) is followed by the noun مسلمان (Muslim), which is its referent. The pronoun ہر ایک (Har Ek) is used here to say something more emphatically.


The above text can also be written in the form:

ہر مسلمان کی یہ خواہش ہوتی ہے کہ وہ زندگی میں کم از کم ایک دفعہ ضرور حج کی سعادت حاصل کرے۔ (12)

‘Har musalman ki yeh khuwahish hoti ha keh woh zindagi main kam az kam ek dafa zaroor haj ki saadat hasil kare’

(Every Muslim has the desire to perform Haj at least once in his life)

Instead of ہر ایک (Har Ek), the word ہر (Har) alone can also be used, but in this case the statement is not emphatic.

Consider another example:

آسٹریلیا پہنچتے ہی ہر ایک کھلاڑی نے عہد کیا کہ وہ فتح کے لیے جان کی بازی لگا دے گا۔ (13)

‘Australia pohnchte hi Har Ek khilarri ne ehd kya keh woh fateh k lye jaan ki baazi laga de ga’

(On reaching Australia, each player vowed that he would make every effort for victory)

In the above example, the pronoun ہر ایک (Har Ek) is followed by the noun کھلاڑی (player), describing each one emphatically.

3.5.1.5 Topicalized structure

In Urdu discourse, topicalized structures are frequently used. If the topic of the discourse is a plural noun or represents a group, and there is no plural noun or group in the discourse text itself, then the anaphoric link points to the topic of the discourse ([70], [72]).


Consider the following example:

کتے (Dogs) (14)

خصوصاً جب کسی دکان کے تختے کے نیچے سے ان کا ایک پورا خفیہ جلسہ باہر سڑک پر آ کر تبلیغ کا کام شروع کر دے تو آپ ہی کہیے ہوش ٹھکانے رہ سکتے ہیں؟ ہر ایک کی طرف باری باری متوجہ ہونا پڑتا ہے۔

‘khasoosan jab kisi dookan k takhte k neechay se en ka ek poora khufya jalsa bahar saddar par aa kar tableegh ka kaam shuroo kar de tu aap hi kehye hosh thikane reh sakte hn? Har Ek ki taraf bari bari mutawajah hona parrta ha’

(Especially when a whole secret assembly of them emerges from under the plank of a shop onto the street and starts preaching, then tell me, how can one keep one’s senses? You have to turn your attention towards each of them, one by one)

For the distributive anaphor ہر ایک (Har Ek), the antecedent cannot be identified in the preceding text. The topic of the discourse is a plural noun; therefore, the distributive anaphor ہر ایک (Har Ek) refers to the noun in the topic of the discourse. To resolve this anaphoric link, the singular form of the topic of the discourse can be inserted after the distributive anaphor ہر ایک (Har Ek). Here the topic of the text is کتے (Dogs), whose singular form is کتا (Dog). However, since the inserted noun is followed by the genitive clitic کی (ki), the singular oblique form is inserted after ہر ایک (Har Ek), as mentioned in Table 3.2. The singular oblique form is the same as the plural form, کتے.

Text after resolution will become:

کتے (Dogs) (15)

خصوصاً جب کسی دکان کے تختے کے نیچے سے ان کا ایک پورا خفیہ جلسہ باہر سڑک پر آ کر تبلیغ کا کام شروع کر دے تو آپ ہی کہیے ہوش ٹھکانے رہ سکتے ہیں؟ ہر ایک کتے کی طرف باری باری متوجہ ہونا پڑتا ہے۔


‘khasoosan jab kisi dookan k takhte k neechay se en ka ek poora khufya jalsa bahar saddar par aa kar tableegh ka kaam shuroo kar de tu aap hi kehye hosh thikane reh sakte hn? Har Ek kutte ki taraf bari bari mutawajah hona parrta ha’

(Especially when a whole secret assembly of them emerges from under the plank of a shop onto the street and starts preaching, then tell me, how can one keep one’s senses? You have to turn your attention towards each dog, one by one)

3.5.1.6 Referent in Oblique form

In Urdu text, the oblique form of a noun is followed by postpositions (clitics). If the referent is in the oblique form, then, to resolve the pronoun ہر ایک (Har Ek), its singular oblique form must be inserted after the pronoun ہر ایک (Har Ek) in the referring sentence to make the text easily understandable by the machine.

Consider the following example:

مجھے اپنے گھوڑوں سے بہت پیار ہے۔ میں ہر ایک کی خوراک کا خاص خیال رکھتا ہوں۔ (16)

‘mujhay apne ghorro se bahut pyar ha. Main Har Ek ki khorak ka khaas khyal rakhta hoon’

(I love my horses very much. I take special care of the food of each one)

Here ہر ایک (Har Ek) refers to the plural oblique form گھوڑوں (horses). The plural form of گھوڑا (horse) is گھوڑے (horses), the singular oblique form is گھوڑے (horse), and the plural oblique form is گھوڑوں (horses). After the antecedent is located, the singular oblique form can be inserted after the pronoun ہر ایک (Har Ek). After resolution, the text will become:

مجھے اپنے گھوڑوں سے بہت پیار ہے۔ میں ہر ایک گھوڑے کی خوراک کا خاص خیال رکھتا ہوں۔ (17)

‘mujhay apne ghorro se bahut pyar ha. Main Har Ek ghorray ki khorak ka khaas khyal rakhta hoon’

(I love my horses very much. I take special care of the food of each horse)


Here, the word گھوڑے, which is also the plural (direct) form, is used as the singular oblique form.
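The oblique insertion in examples (15) and (17) depends on Urdu noun morphology. For the regular class of marked masculine nouns ending in -a (گھوڑا, کتا, لڑکا), the singular oblique and plural direct forms end in -e and the plural oblique ends in -on. The sketch below handles that class only, uses its own simplified romanization (ghora, ghore, ghoron) rather than the thesis transliteration, and treats a following postposition as the trigger for the oblique form; the `POSTPOSITIONS` set is a hypothetical, incomplete list.

```python
from typing import Dict, List

# Hypothetical, incomplete set of postpositions (clitics) that require the oblique form.
POSTPOSITIONS = {"ka", "ki", "ke", "ko", "se", "ne", "main", "par"}


def marked_masculine_forms(singular: str) -> Dict[str, str]:
    """Case/number forms of a regular marked masculine noun ending in -a.

    Valid only for nouns that inflect like ghora/kutta; other noun classes
    (e.g. aadmi, kitab) need their own paradigms.
    """
    if not singular.endswith("a"):
        raise ValueError(f"{singular!r} is not a marked masculine noun")
    stem = singular[:-1]
    return {
        "sg_direct": singular,
        "sg_oblique": stem + "e",
        "pl_direct": stem + "e",
        "pl_oblique": stem + "on",
    }


def insert_after_har_ek(referring: List[str], singular: str) -> List[str]:
    """Insert the antecedent after 'Har Ek', choosing the oblique form before a postposition."""
    forms = marked_masculine_forms(singular)
    resolved: List[str] = []
    for i, word in enumerate(referring):
        resolved.append(word)
        if word == "Ek" and i > 0 and referring[i - 1] == "Har":
            following = referring[i + 1] if i + 1 < len(referring) else ""
            resolved.append(forms["sg_oblique"] if following in POSTPOSITIONS else singular)
    return resolved


print(" ".join(insert_after_har_ek("Main Har Ek ki khorak ka khaas khyal rakhta hoon".split(), "ghora")))
# Main Har Ek ghore ki khorak ka khaas khyal rakhta hoon
```

The same function covers the topicalized example (15): with "kutta" as the antecedent, "Har Ek ki taraf" becomes "Har Ek kutte ki taraf".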

3.5.1.7 Pronoun ہر ایک (Har Ek) referring to many entities of the same category separated by commas

The pronoun ہر ایک (Har Ek) can refer back to several entities of the same category separated by commas. In this case, the antecedent is not in plural form; rather, more than one antecedent in singular form is separated by commas to form one group. Fig-3.15 shows that these comma-separated nouns belong to one group or class. To resolve this type of anaphora, the category of all these singular antecedents can be inserted after the pronoun ہر ایک (Har Ek).

Consider the following example:

علی طبیعات، کیمیاگری، ریاضی اور حیاتیات پڑھا سکتا ہے۔ وہ ہر ایک کا ماہر ہے۔ (18)

‘Ali tibiyat, kimiya gari, riazi aur hayatiyat parrha sakta ha. Woh Har Ek ka mahir ha’

(Ali can teach physics, chemistry, mathematics and biology. He is an expert in each one)

In this example, multiple entities, طبیعات (physics), کیمیاگری (chemistry), ریاضی (mathematics) and حیاتیات (biology), separated by commas, are referred to by the pronoun ہر ایک (Har Ek). These entities belong to one category or class, ‘subject’. So, to resolve this anaphora and make the text easy to understand and translate, the category/class مضمون (subject) can be inserted after the pronoun ہر ایک (Har Ek).


[Figure: example (18) annotated, marking the distributive anaphor (D. Anaphor) and the comma-separated nouns forming one class/group]

Fig-3.15 Distributive anaphor referring to a class

Text after resolution will become:

علی طبیعات، کیمیاگری، ریاضی اور حیاتیات پڑھا سکتا ہے۔ وہ ہر ایک مضمون کا ماہر ہے۔ (19)

‘Ali tibiyat, kimiya gari, riazi aur hayatiyat parrha sakta ha. Woh Har Ek mazmoon ka mahir ha’

(Ali can teach physics, chemistry, mathematics and biology. He is an expert in each subject)
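Finding the shared class of comma-separated antecedents, as in examples (18) and (19), can be sketched as splitting the coordination and looking each conjunct up in a category lexicon. The `CATEGORY` table and the two helper names are hypothetical; a real system would need a proper taxonomy or ontology of Urdu nouns rather than a hand-written dictionary.

```python
import re
from typing import Dict, List, Optional

# Hypothetical class lexicon: the category each noun belongs to (romanized).
CATEGORY: Dict[str, str] = {
    "tibiyat": "mazmoon",      # physics -> subject
    "kimiya gari": "mazmoon",  # chemistry
    "riazi": "mazmoon",        # mathematics
    "hayatiyat": "mazmoon",    # biology
}


def split_coordination(phrase: str) -> List[str]:
    """Split 'a, b, c aur d' into its conjuncts."""
    return [part.strip() for part in re.split(r",|\baur\b", phrase) if part.strip()]


def shared_category(entities: List[str]) -> Optional[str]:
    """Return the single category shared by all entities, else None."""
    categories = {CATEGORY.get(entity) for entity in entities}
    if len(categories) == 1 and None not in categories:
        return categories.pop()
    return None


entities = split_coordination("tibiyat, kimiya gari, riazi aur hayatiyat")
category = shared_category(entities)
if category is not None:
    print("Woh Har Ek ka mahir ha".replace("Har Ek", f"Har Ek {category}", 1))
# Woh Har Ek mazmoon ka mahir ha
```

If the conjuncts do not share one category, the function returns None and no class name is inserted.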

After analyzing and discussing the different uses of the distributive anaphor ہر ایک (Har Ek), the following rules are formulated for its resolution.

i) If the anaphor ہر ایک (Har Ek) is followed by a noun or noun phrase, there is no need to resolve it.

ii) If there is only one plural form or group of entities, select it as the antecedent and replace the word ایک (Ek) with the singular form of the selected antecedent.

iii) If there is more than one plural noun or noun phrase in the antecedent sentence, relate the verb of the referring sentence to the characteristics of all the nouns or noun phrases. After finding the relation, replace the word ایک (Ek) with the singular form of the associated noun or noun phrase.

iv) If there is no main verb in the referring sentence, or the antecedent cannot be located on the basis of the verb, match the candidate antecedents against attributes or complements to find the correct antecedent.

v) If there is no plural noun or group in the discourse and the topic of the discourse is a count noun, select it as the antecedent and insert its singular form after the distributive anaphor ہر ایک (Har Ek) for its resolution.

vi) If the referent is in oblique form, use the singular oblique form for insertion after the distributive anaphor ہر ایک (Har Ek) for resolution.

vii) If there is a group of nouns separated by commas, use its class or group name for insertion after the distributive anaphor ہر ایک (Har Ek) for resolution.

3.5.2 Distributive Pronoun کوئی ایک (Koi Ek)

It refers back to one entity out of a group of two or more, which is its referent. It corresponds to English ‘any one’. The following are different uses of this pronoun and the ways to resolve them.

3.5.2.1 Group formed by number

The pronoun کوئی ایک (Koi Ek) can refer to one member of a group specified or formed by a number.

Consider the following example:

کانسٹیبل نے تھانیدار سے کہا، دو آدمی پکڑے گئے ہیں۔ کوئی ایک مجرم ہے۔ (20)

‘constable ne thanidar se kaha, do aadmi pakrray gye hn. Koi Ek mujrim ha’

(The constable told the inspector that two men had been arrested. One of them is the offender)


Here, the group is specified by a number in words, that is, دو (two). The pronoun کوئی ایک (Koi Ek) refers to one member of the group دو آدمی (two men). To resolve this anaphora, the singular form آدمی (man) can be used after the pronoun کوئی ایک (Koi Ek). Fig-3.16 shows the anaphor کوئی ایک (Koi Ek) and its antecedent; ‘D. Anaphor’ stands for distributive anaphor.

[Figure: example (20) annotated, marking the distributive anaphor (D. Anaphor), the number, the group/plural and the singular form]

Fig-3.16 Distributive anaphor in group formed by number in words

After resolution the text will become:

کانسٹیبل نے تھانیدار سے کہا، دو آدمی پکڑے گئے ہیں۔ کوئی ایک آدمی مجرم ہے۔ (21)

‘constable ne thanidar se kaha, do aadmi pakrray gye hn. Koi Ek aadmi mujrim ha’

(The constable told the inspector that two men had been arrested; one man is the offender)

3.5.2.2 Group formed by a plural

The pronoun کوئی ایک (Koi Ek) can refer to one member of a group specified or formed by the plural form of a noun.


Consider the following example:

یہ میری ذاتی کتابیں ہیں۔ ان میں سے کوئی ایک تم لے سکتے ہو۔ (22)

‘yeh meri zati kitaben hain. En main se Koi Ek tum le sakte ho’

(These are my personal books. You can take any of these)

Here, the pronoun کوئی ایک (Koi Ek) refers to any one member of a group in the previous sentence. The group is specified by the plural form کتابیں (books). To resolve this anaphora, the singular form of کتابیں (books), which is کتاب (book), can be inserted after the pronoun کوئی ایک (Koi Ek) to make the text easy to understand and translate.

The text after resolution will become:

یہ میری ذاتی کتابیں ہیں۔ ان میں سے کوئی ایک کتاب تم لے سکتے ہو۔ (23)

‘yeh meri zati kitaben hain. En main se Koi Ek kitab tum le sakte ho’

(These are my personal books. You can take any one book out of these)

3.5.2.3 Noun after the pronoun کوئی ایک (Koi Ek)

If the singular form of a noun is used after the pronoun کوئی ایک (Koi Ek), there is no need to resolve the anaphora: the pronoun کوئی ایک (Koi Ek) refers to the entity that follows it.

Consider the following example:

کپتان نے کہا کوئی ایک کھلاڑی فٹ بال کو کک لگائے۔ (24)

‘kaptan ne kaha Koi Ek khilarri football ko kick lagai’

(The captain said, “Any one player should kick the football.”)

Consider another example:


اس ملک میں کوئی ایک رہنما بھی مخلص نہیں ہے۔ (25)

‘es mulk main Koi Ek rehnuma bhi mukhlis nahi ha’

(There is not a single sincere leader in this country)

There is a noun, رہنما (leader), after the pronoun کوئی ایک (Koi Ek) in this text. The text is understandable and there is no need to resolve it.

3.5.2.4 Personal Pronoun before the pronoun کوئی ایک (Koi Ek)

A plural personal pronoun can be used before the pronoun کوئی ایک (Koi Ek) to denote a group. In this case, the pronoun کوئی ایک (Koi Ek) refers to one member of that group.

Consider the following example:

علی نے اپنے دوستوں سے کہا کہ تم میں سے کوئی ایک بل ادا کرے گا۔ (26)

‘Ali ne apne dosto se kaha keh tum main se Koi Ek bill ada kare ga’

(Ali said to his friends that any one of you will pay the bill)

Here, the second-person pronoun تم (you), in plural with the locative and instrumental case (تم میں سے), is used before the pronoun کوئی ایک (Koi Ek) to denote a group, and the pronoun کوئی ایک (Koi Ek) refers to one member of this group. This anaphora requires resolving the personal pronoun first and then the distributive pronoun کوئی ایک (Koi Ek). As the resolution of personal pronouns is outside the scope of this work, we assume that the personal pronoun has already been resolved, and we only need to resolve the distributive pronoun کوئی ایک (Koi Ek). Since the personal pronoun تم (tum) refers to the group دوستوں (friends), the pronoun کوئی ایک (Koi Ek) refers to any one of these friends. This anaphor can be resolved by simply inserting the group دوستوں (friends) after the distributive pronoun کوئی ایک (Koi Ek). Fig-3.17 shows this mechanism.


[Figure: example (26) annotated, marking the plural antecedent, the distributive anaphor (D. Anaphor), and the personal pronoun with locative and instrumental postpositions (PP with case)]

Fig-3.17 Distributive pronoun preceded by personal pronoun

After resolution, the text will become:

علی نے اپنے دوستوں سے کہا کہ تم دوستوں میں سے کوئی ایک بل ادا کرے گا۔ (27)

‘Ali ne apne dosto se kaha keh tum main se Koi Ek dost bill ada kare ga’

(Ali said to his friends that any one of you friends will pay the bill)

After analyzing and discussing the different uses of the distributive anaphor کوئی ایک (Koi Ek), the following rules are formulated for its resolution.

a) If the distributive anaphor کوئی ایک (Koi Ek) refers to a group in the preceding text, select it as the antecedent and insert its singular form after the anaphor کوئی ایک (Koi Ek) in the referring sentence.

b) If there is a noun or noun phrase after the anaphor کوئی ایک (Koi Ek), there is no need to resolve it.

c) If the distributive anaphor کوئی ایک (Koi Ek) is preceded by a personal pronoun and a clitic, and there is only one plural noun, insert its singular form after the distributive anaphor کوئی ایک (Koi Ek).

3.5.3 Distributive pronoun کوئی بھی (Koi Bhi)

The distributive pronoun کوئی بھی (Koi Bhi) may be used in Urdu text as the negative counterpart of the distributive pronoun کوئی ایک (Koi Ek). Mostly, it refers to none of two or more entities in a negative sentence. Sometimes, especially in a positive sentence, it refers to any one member of a group. The following are different uses of this distributive pronoun.

3.5.3.1 Distributive pronoun کوئی بھی (Koi Bhi) in a negative sentence

When the distributive pronoun کوئی بھی (Koi Bhi) is used in a negative sentence, it refers to ‘none of a particular specified group’. Consider the following sentence:

دس امیدوار انٹرویو کے لیے آئے تھے۔ ان میں سے کوئی بھی اہل نہیں ہے۔ (28)

‘das umeedwar interview k lye aaye the. en main se Koi Bhi ehal nahi ha’

(Ten candidates came for the interview. None of them is eligible)

The sentence containing the distributive pronoun کوئی بھی (Koi Bhi) is negative. The referred sentence contains a group specified by دس امیدوار (ten candidates). Here, the distributive pronoun کوئی بھی (Koi Bhi) refers to ‘none of the group دس امیدوار (ten candidates)’. Fig-3.18 shows the distributive pronoun in a negative sentence referring to a noun in plural form.


[Figure: example (28) annotated, marking the plural antecedent, the negative (-ve) polarity and the distributive anaphor (D. Anaphor)]

Fig-3.18 Distributive pronoun in negative sentence

To resolve this anaphora, the singular form of the plural noun specifying the group can be inserted either after the distributive pronoun کوئی بھی (Koi Bhi) or between its two words, that is, کوئی (Koi) and بھی (Bhi). After resolution, the text will become:

دس امیدوار انٹرویو کے لیے آئے تھے۔ ان میں سے کوئی بھی امیدوار اہل نہیں ہے۔ (29)

‘das umeedwar interview k lye aaye the. en main se Koi Bhi umeedwar ehal nahi ha’

(Ten candidates came for the interview. None of these candidates is eligible)

Or, by inserting it between the two words of the distributive pronoun کوئی بھی (Koi Bhi), that is, کوئی (Koi) and بھی (Bhi):

دس امیدوار انٹرویو کے لیے آئے تھے۔ ان میں سے کوئی امیدوار بھی اہل نہیں ہے۔ (30)

‘das umeedwar interview k lye aaye the. en main se Koi umeedwar Bhi ehal nahi ha’

(Ten candidates came for the interview. No candidate among them is eligible)


Consider another example:

ٹیم میں پانچ بیٹسمین تھے۔ کوئی بھی رنز نہیں بنا سکا۔ (31)

‘team main panch batsman the. Koi Bhi runs nahi bana saka’

(There were five batsmen in the team. None of them could score runs)

The referred sentence contains a group specified by پانچ بیٹسمین (five batsmen). The referring sentence is negative, and the distributive pronoun کوئی بھی (Koi Bhi) refers to ‘none of this group’. Therefore, to resolve this anaphora, the singular form, which is بیٹسمین, can be inserted at either of the two positions mentioned earlier. After resolution, the text will become:

ٹیم میں پانچ بیٹسمین تھے۔ کوئی بھی بیٹسمین رنز نہیں بنا سکا۔ (32)

‘team main panch batsman the. Koi Bhi batsman runs nahi bana saka’

(There were five batsmen in the team. None of the batsmen could score runs)

Or

ٹیم میں پانچ بیٹسمین تھے۔ کوئی بیٹسمین بھی رنز نہیں بنا سکا۔ (33)

‘team main panch batsman the. Koi batsman Bhi runs nahi bana saka’

(There were five batsmen in the team. No batsman could score any runs)

3.5.3.2 Distributive pronoun کوئی بھی (Koi Bhi) in a positive sentence

When the distributive pronoun کوئی بھی (Koi Bhi) is used in a positive sentence, it refers to any one member of a particular specified group. Consider the following example:

یہ ساری کتابیں اچھی ہیں۔ ان میں سے کوئی بھی خرید لو۔ (34)

‘yeh sari kitaben achi hain. en main se Koi Bhi khareed lo’

(All these books are good. Buy any one of these)


In the above example, the referring sentence is positive, and the distributive pronoun کوئی بھی (Koi Bhi) refers to any one member of the group, that is, کتابیں (books). Fig-3.19 shows the distributive pronoun and its antecedent in a positive sentence.

[Figure: example (34) annotated, marking the plural antecedent, the positive (+ve) polarity and the distributive anaphor (D. Anaphor)]

Fig-3.19 Distributive pronoun in positive sentence

To resolve it, the singular form of the group کتابیں (books), which is کتاب (book), can be inserted after the distributive pronoun کوئی بھی (Koi Bhi). After resolution, the text will become:

یہ ساری کتابیں اچھی ہیں۔ ان میں سے کوئی بھی کتاب خرید لو۔ (35)

‘yeh sari kitaben achi hain. en main se Koi Bhi kitab khareed lo’

(All these books are good. Buy any one book out of these)

Consider another example:

ان طلباء میں سے کوئی بھی پہلی پوزیشن لے سکتا ہے۔ (36)

‘en talaba main se Koi Bhi pehli position le sakta ha’

(Out of these students, anyone can get the first position)


The distributive pronoun کوئی بھی (Koi Bhi) and the referent طلباء (students) are both in a single positive sentence. The distributive pronoun کوئی بھی (Koi Bhi) refers to any one member of the group طلباء (students). To resolve this anaphora, the singular form of طلباء (students), which is طالب علم (student), can be inserted after the distributive pronoun کوئی بھی (Koi Bhi). After resolution, the text will become:

ان طلباء میں سے کوئی بھی طالب علم پہلی پوزیشن لے سکتا ہے۔ (37)

‘en talaba main se Koi Bhi talib ilam pehli position le sakta ha’

(Out of these students, any one student can get the first position)

After analyzing and discussing the different uses of the distributive anaphor کوئی بھی (Koi Bhi), the following rules are formulated for its resolution.

a) If the polarity of the referring sentence is negative and the referred sentence contains one plural noun, select it as the antecedent and insert its singular form after the distributive anaphor کوئی بھی (Koi Bhi) in the referring sentence for its resolution.

b) If the polarity of the referring sentence is negative and both the anaphor and the antecedent are in the same sentence, insert the singular form of the noun after the distributive anaphor کوئی بھی (Koi Bhi) for its resolution.

c) If the polarity of the referring sentence is positive and the referred sentence contains one plural noun, select it as the antecedent and insert its singular form after the anaphor کوئی بھی (Koi Bhi) for its resolution.

d) If the polarity of the referring sentence is positive and both the anaphor and the antecedent are in the same sentence, insert the singular form of the noun after the distributive anaphor کوئی بھی (Koi Bhi) for its resolution.


3.5.4 Distributive pronoun کئی ایک (Kai Ek)

In Urdu, the distributive pronoun کئی ایک (Kai Ek) is used to refer to many members of a group specified by a plural form. The following are different uses of this pronoun.

3.5.4.1 Reference to preceding group

The distributive pronoun کئی ایک (Kai Ek) refers to many or several members of a group mentioned in the preceding text. Consider the following example:

اس مصنف نے بہت سے ناول لکھے ہیں۔ ان میں سے کئی ایک بین الاقوامی سطح پر مشہور ہیں۔ (38)

‘es musanif ne bahut se naval likhe hain. en main se Kai Ek bain ul aqwami satah par mashhoor hain’

(This writer has written many stories. Many of them are famous at the international level)

Here, the pronoun کئی ایک (Kai Ek) in the second sentence refers to many members of the group specified by بہت سے ناول (many stories). To resolve this anaphora, the plural form of the noun ناول (story), which is ناول (stories), since the singular and plural forms are the same in this case, can be inserted after the pronoun کئی ایک (Kai Ek) or can replace the word ایک (Ek) of the pronoun کئی ایک (Kai Ek). After resolution, the text will become:

اس مصنف نے بہت سے ناول لکھے ہیں۔ ان میں سے کئی ایک ناول بین الاقوامی سطح پر مشہور ہیں۔ (39)

‘es musanif ne bahut se naval likhe hain. en main se Kai Ek novel bain ul aqwami satah par mashhoor hain’

(This writer has written many stories. Many of these stories are famous at the international level)


Or also in this form:

اس مصنف نے بہت سے ناول لکھے ہیں۔ ان میں سے کئی ناول بین الاقوامی سطح پر مشہور ہیں۔ (40)

‘es musanif ne bahut se naval likhe hain. en main se Kai novel bain ul aqwami satah par mashhoor hain’

(This writer has written many stories. Many of these stories are famous at the international level)

3.5.4.2 Genitive or Possessive case

When the pronoun کئی ایک (Kai Ek) is used in the genitive (possessive) case as the referring pronoun, the oblique plural form of the referent noun is used for its resolution. Consider the following example:

اس امتحان میں متعدد طالب علم ناکام ہو گئے۔ کئی ایک کا یہ آخری موقع تھا۔ (41)

‘es imtihan main mutaaddid talib ilam nakam ho gye. Kai Ek ka yeh aakhri moka tha’

(Several students failed this examination. For many of them, this was the last chance)

Here, the pronoun کئی ایک (Kai Ek) is in the genitive (possessive) form, as it is followed by the clitic کا (ka). The pronoun کئی ایک (Kai Ek) refers to the group متعدد طالب علم (several students) in the previous sentence. For its resolution, the oblique plural form of the referent, which is طالب علموں (students), shall be inserted after the pronoun کئی ایک (Kai Ek). Fig-3.20 shows the genitive case of the distributive pronoun.


[Figure: example (41) annotated, marking the plural antecedent, the genitive clitic and the distributive anaphor (D. Anaphor)]

Fig-3.20 Distributive pronoun in genitive sentence

After resolution, text will become:

اس امتحان میں متعدد طالب علم ناکام ہو گئے۔ کئی ایک طالب علموں کا یہ آخری موقع تھا۔ (42)

‘es imtihan main mutaaddid talib ilam nakam ho gye. Kai Ek talib ilmo ka yeh aakhri moka tha’

(Several students failed this examination. For many students, this was the last chance)

After analyzing and discussing the different uses of the distributive anaphor کئی ایک (Kai Ek), the following rules are formulated for its resolution.

a) If there is one plural noun in the referred sentence, select it as the antecedent and either insert its singular form after the anaphor کئی ایک (Kai Ek) or replace the word ایک (Ek) with the singular form of the entity in the referring sentence.

b) If there is one plural noun in the referred sentence and the referring sentence contains a genitive case, insert the oblique form after کئی ایک (Kai Ek) in the referring sentence for its resolution.


We have analyzed the syntactic structure of all the distributive anaphors with the help of examples, and discussed their usage and the ways to resolve them across a variety of cases.

On the basis of the various uses of distributive anaphors in Urdu text, heuristic rules were formulated in the previous sections to resolve them. These rules are combined and shown in Table-3.4. The first column of Table-3.4 gives the type of distributive anaphor, the second column gives the rule number (R1, R2, etc.), the third column shows the syntactic usage of the distributive anaphor in Urdu text, and the last column shows how to select the antecedent and what action to take to resolve the anaphoric link.

Table-3.4 Resolution rules for distributive pronouns

Distributive Anaphor | Rule # | Usage | Antecedent/Action
ہر ایک (Har Ek) | R1 | Preceded by a noun phrase | No need to resolve
ہر ایک (Har Ek) | R2 | One plural entity in the antecedent sentence | Replace the word ایک (Ek) with the singular form of the entity, in the referring sentence
ہر ایک (Har Ek) | R3 | More than one plural entity in the antecedent sentence | Select the entity associated with the main verb of the referring sentence, and replace the word ایک (Ek) with its singular form, in the referring sentence
ہر ایک (Har Ek) | R4 | More than one plural entity in the antecedent sentence (in case the main verb does not help) | Select the antecedent by matching attributes/complements/properties in the referring sentence, and replace the word ایک (Ek) with the singular form of the entity, in the referring sentence
ہر ایک (Har Ek) | R5 | No plural noun; the topic of the discourse is plural | The topic is the antecedent; insert its singular form after the pronoun ہر ایک (Har Ek), in the referring sentence
ہر ایک (Har Ek) | R6 | One entity in oblique plural form in the antecedent sentence | Replace the word ایک (Ek) with the singular oblique form of the entity, in the referring sentence
ہر ایک (Har Ek) | R7 | Entities separated by commas | Insert the category name after the pronoun ہر ایک (Har Ek), in the referring sentence
کوئی ایک (Koi Ek) | R1 | One plural entity in the antecedent sentence | Insert the singular form of the entity after کوئی ایک (Koi Ek), in the referring sentence
کوئی ایک (Koi Ek) | R2 | Preceded by a noun | No need to resolve
کوئی ایک (Koi Ek) | R3 | Preceded by a plural personal pronoun + clitic, and the referring sentence contains one plural entity | Insert the singular form of the entity after the personal pronoun, in the referring sentence, and note the case
کوئی بھی (Koi Bhi) | R1 | Referring sentence is negative (means none in the group); the antecedent sentence contains one plural entity | Insert the singular form of the entity after کوئی بھی (Koi Bhi), in the referring sentence
کوئی بھی (Koi Bhi) | R2 | Referring sentence is negative (means none in the group); anaphor and antecedent are in the same sentence | Insert the singular form of the entity after کوئی بھی (Koi Bhi), in the same sentence
کوئی بھی (Koi Bhi) | R3 | Referring sentence is positive (means anyone in the group); the antecedent sentence contains one plural entity | Insert the singular form of the entity after کوئی بھی (Koi Bhi), in the referring sentence
کوئی بھی (Koi Bhi) | R4 | Referring sentence is positive (means anyone in the group); anaphor and antecedent are in the same sentence | Insert the singular form of the entity after کوئی بھی (Koi Bhi), in the same sentence
کئی ایک (Kai Ek) | R1 | One plural entity in the antecedent sentence | Insert the singular form of the entity after کئی ایک (Kai Ek), or replace the word ایک (Ek) with the singular form of the entity, in the referring sentence
کئی ایک (Kai Ek) | R2 | One plural entity in the antecedent sentence, and a genitive case in the referring sentence | Insert the oblique form after کئی ایک (Kai Ek), in the referring sentence
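A partial sketch of how rules R1, R2, and R7 of Table-3.4 for ہر ایک (Har Ek) might be applied is given below. The function, its parameters, the category lookup, and the example sentences (ہر ایک خوش تھا, "each one was happy"; ہر ایک مزیدار تھا, "each one was tasty") are hypothetical; rules R3 to R6, which need verb, attribute, topic, or case information, are deliberately left out.

```python
def resolve_har_ek(tokens, preceded_by_np, plural_singulars, listed_entities, category_of):
    """Apply rule R1, R2 or R7 of Table-3.4 to the first ہر ایک (Har Ek) in tokens.

    preceded_by_np   -- True if a noun phrase already precedes the pronoun (R1)
    plural_singulars -- singular forms of the plural entities in the antecedent sentence
    listed_entities  -- entities listed with commas in the antecedent sentence (R7)
    category_of      -- hypothetical 'Category' lookup: entity -> class name
    """
    pos = next((k for k in range(len(tokens) - 1)
                if tokens[k] == "ہر" and tokens[k + 1] == "ایک"), None)
    if pos is None or preceded_by_np:        # no pronoun, or R1: nothing to resolve
        return list(tokens)
    if len(plural_singulars) == 1:           # R2: replace ایک with the singular form
        return tokens[:pos + 1] + plural_singulars + tokens[pos + 2:]
    if listed_entities:                      # R7: insert the category name after the pronoun
        return tokens[:pos + 2] + [category_of[listed_entities[0]]] + tokens[pos + 2:]
    return list(tokens)                      # R3-R6 are not covered by this sketch


print(resolve_har_ek("ہر ایک خوش تھا".split(), False, ["لڑکا"], [], {}))
```

With one plural antecedent such as لڑکے (boys), R2 turns ہر ایک خوش تھا into ہر لڑکا خوش تھا.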

With this analysis of reflexive and distributive anaphora in Urdu, the secondary objectives (research questions) specified in Section 1.4 have been addressed.

3.6 Summary

In this chapter, our secondary objectives were achieved. First, we presented an overview of anaphora resolution. In noun phrase anaphora, pronouns or anaphors refer to nouns or noun phrases. Nouns are associated with noun cases, marked with postpositions or clitics. We discussed the noun cases of Urdu, with examples, so that when an anaphoric link is resolved, the noun case can be translated accurately into the target language. We then investigated the structure and usage of reflexive and distributive pronouns one by one, with the help of examples and diagrams, and formulated heuristic rules to resolve reflexive and distributive anaphoric links in Urdu text. Anaphoric links between anaphors, antecedents, and other associated entities were shown in diagrams wherever required. On the basis of these rules, a framework was designed to resolve reflexive and distributive anaphora in Urdu text; it is presented in the next chapter.


Chapter 4:

Proposed Framework for Reflexive And Distributive Anaphora Resolution (RADAR)

4.1 Introduction to RADAR.

Understanding text is very useful for NLP applications such as machine translation, question-answering systems, and knowledge extraction. An NLP system that understands a text can identify the subject, object, and verb in it. Understanding and processing text has traditionally been a complex and difficult task for NLP systems. Anaphora resolution is an integral part of an NLP system, helping it understand the links between different units of text.

To develop an anaphora resolution system, it is essential to understand in depth the nature of the referential process in discourse and the problems behind it. Halliday and Hasan [2] stressed that anaphoric relations in a text are semantic rather than textual. However, developers of automatic anaphora resolution systems have avoided extensive use of semantic information (Lappin and Leass [5]; Mitkov [62]; Kameyama [94]).

Distributive and reflexive anaphors occur frequently in natural languages and create links between different parts of a text. Resolving them in a language is therefore an important step towards an efficient and accurate machine translation system for that language.

Our approach to developing a framework for reflexive and distributive anaphora resolution (RADAR) is based mainly on the syntactic structure of Urdu text, supplemented with world knowledge where it is needed.

In artificial intelligence (AI) research, commonsense knowledge consists of facts about the everyday world [95]. For example:

Lemons are sour.

All humans are expected to have such knowledge. An NLP process can be attached to a commonsense knowledge base so that it can attempt to answer questions about the world [96].


The work was carried out by first manually analyzing Urdu text to identify the distributive and reflexive anaphors that occur in it. The structure and use of these anaphors were then analyzed, and generic rules were defined for their resolution. Most of the rules are based on syntactic and grammatical features of the language; some are based on world knowledge.

4.2 General framework for anaphora resolution

The general framework for anaphora resolution consists of the following tasks:

i) Sentence analyzer

ii) Noun phrase tagger

iii) Anaphora resolution process

a. Determination of anaphors

b. Search scope

c. Antecedent Selection

d. Anaphora Resolution

The following section discusses these tasks, and Fig-4.1 shows their flow.

(i) Sentence Analyzer.

The sentence analyzer identifies sentence boundaries on the basis of delimiters such as full stops, commas, question marks, and exclamation marks, and splits each sentence into tokens up to the next delimiter.
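A minimal Python sketch of a sentence analyzer for Urdu text follows. It assumes that the Urdu full stop (۔), the Arabic question mark (؟), and the ASCII marks . ? ! end a sentence, and, as a simplification of the description above, keeps commas (، and ,) as separate tokens rather than treating them as sentence boundaries. The function names are hypothetical.

```python
import re

SENTENCE_END = "۔؟.?!"        # Urdu full stop (U+06D4), Arabic question mark (U+061F), ASCII marks
PUNCT = SENTENCE_END + "،,"   # commas (Arabic U+060C and ASCII) become separate tokens


def split_sentences(text):
    """Split text after each sentence-final mark; a trailing unterminated piece is kept."""
    pieces = re.findall(r"[^" + SENTENCE_END + r"]+[" + SENTENCE_END + r"]?", text)
    return [p.strip() for p in pieces if p.strip()]


def tokenize(sentence):
    """Whitespace tokenization with punctuation marks split off as their own tokens."""
    return re.findall(r"[^\s" + PUNCT + r"]+|[" + PUNCT + r"]", sentence)


text = "اس امتحان میں متعدد طالب علم ناکام ہو گئے۔ کئی ایک کا یہ آخری موقع تھا۔"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```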

(ii) Noun Phrase Tagger.

It assigns a grammatical category to each noun or noun phrase in the text.

(iii) Anaphora Resolution Process.

Anaphora resolution means identifying and selecting a suitable antecedent from a set of candidate antecedents. This process has the following subtasks:

a) Determination of anaphor.

The set of anaphors in the given discourse is determined.

b) Search Scope.

The search scope is the part of the discourse in which the antecedent of an anaphor is sought, usually from the current sentence back to the N preceding sentences, depending on the type of anaphor.
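The search scope can be expressed as a window of sentence indices. In this sketch the window sizes in `SCOPE_N` are assumptions for illustration (reflexives searched in their own sentence, distributives also one sentence back, as in the کئی ایک (Kai Ek) example of Chapter 3), not values taken from RADAR.

```python
# Hypothetical window sizes N per anaphor type.
SCOPE_N = {"reflexive": 0, "distributive": 1}


def search_scope(current, anaphor_type):
    """Indices of the sentences to search, nearest first: current, current-1, ..., current-N."""
    n = SCOPE_N[anaphor_type]
    return list(range(current, max(0, current - n) - 1, -1))


print(search_scope(3, "distributive"))   # sentences 3 and 2
```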

[Fig-4.1 flow: Text Document → Sentence Analyzer → Noun Phrase Tagger → Anaphora Resolution Process (Determination of Anaphors, Search Scope, Antecedent Selection, Anaphora Resolution, drawing on Resources) → Anaphora Resolved Text]

Fig-4.1 General Framework for anaphora resolution


c) Antecedent Selection

The appropriate antecedent is selected from the list of candidate antecedents by an algorithm based on rules and other factors.

d) Anaphora Resolution

Finally, the anaphor is linked to the antecedent selected from the list of candidates, on the basis of rules and factors drawn from the knowledge sources.
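Antecedent selection is often sketched as filtering candidates by constraints and ranking the survivors by preferences. The example below assumes a single preference, recency (the nearest candidate wins), and a number constraint for distributive anaphors; the `Candidate` type and both choices are hypothetical and only illustrate the step.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    sentence: int   # index of the sentence containing the candidate
    position: int   # token index inside that sentence
    plural: bool


def select_antecedent(candidates, constraints):
    """Keep candidates that pass every constraint, then prefer the most recent one."""
    valid = [c for c in candidates if all(rule(c) for rule in constraints)]
    if not valid:
        return None
    return max(valid, key=lambda c: (c.sentence, c.position))


cands = [Candidate("امتحان", 0, 1, False), Candidate("طالب علم", 0, 4, True)]
print(select_antecedent(cands, [lambda c: c.plural]).text)
```

A distributive anaphor such as کئی ایک (Kai Ek) needs a plural antecedent, so the plural constraint rules out امتحان (examination).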

4.3 Architecture of RADAR

In Chapter 3, we analyzed the syntax and structure of Urdu text for reflexive and distributive anaphora, covering the different features and variations in their usage, and formulated rules for their resolution. These rules are summarized in Table-3.3 and Table-3.4. The architecture of the framework for resolving reflexive and distributive anaphors (RADAR) is shown in Fig-4.2.


Fig-4.2. Architecture for RADAR

As the architecture shows, RADAR starts with preprocessing: text containing reflexive and distributive pronouns, collected from various sources, is prepared by marking discourse boundaries and inserting tokens.

Next, the reflexive and distributive pronouns and their potential antecedents are identified using various resources. Reflexive and distributive pronouns are then resolved by applying the rules formulated in Table-3.3 and Table-3.4, respectively.

This framework is divided into:

i) a framework for resolving reflexive anaphora, and

ii) a framework for resolving distributive anaphora.

The following sections explore these frameworks separately.

4.4 Framework for resolving Reflexive anaphora

Our goal is to develop a framework for resolving the reflexive anaphoric links present in Urdu text. Fig-4.3 shows the architecture for reflexive pronouns, which is part of the RADAR framework. This architecture consists of various modules and resources that perform the required task.

In the following section, the working of framework for reflexive anaphora is discussed.

4.5 Workflow of framework for reflexive anaphora

Urdu text containing reflexive anaphoric links is collected from various sources (internet sources, books, magazines, newspapers, children's stories, etc.) and given as input to this framework. The text passes through the various modules of the reflexive anaphora resolution process, which use various resources. These modules and resources are described below.

4.5.1 Reflexive Identifier

This module scans the input Urdu text document and identifies reflexive pronouns. It uses the resource RA (Reflexive Anaphors) and keeps a list of the identified reflexive pronouns along with other relevant information about them.
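The Reflexive Identifier amounts to a lexicon lookup. A minimal sketch, assuming the RA resource holds at least the four reflexive pronouns evaluated in Chapter 5 (اپنا, اپنے, اپنی as possessive and خود as non-possessive); the function name and the example sentence are hypothetical.

```python
# Assumed contents of the RA resource, limited to the pronouns evaluated in Table-5.2.
RA = {
    "اپنا": "possessive",
    "اپنے": "possessive",
    "اپنی": "possessive",
    "خود": "non-possessive",
}


def identify_reflexives(tokens):
    """Return (index, pronoun, type) for every reflexive pronoun in a tokenized sentence."""
    return [(i, tok, RA[tok]) for i, tok in enumerate(tokens) if tok in RA]


# "Ahmad read his own book himself."
print(identify_reflexives("احمد نے اپنی کتاب خود پڑھی".split()))
```

The type returned here is what routes each pronoun to the PR or NPR branch of Fig-4.3.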

4.5.2 Noun Phrase Extractor

This module extracts nouns and noun phrases from the input text as candidate antecedents. It uses the resources ‘Noun’ and ‘PP’ to identify nouns and personal pronouns. These resources contain common nouns and personal pronouns that are normally used with possessive and non-possessive anaphors in day-to-day language.

4.5.3 PR case Identifier

It identifies the noun case associated with the possessive reflexive pronoun, if any, for further NL processing. It uses the resource ‘Noun Cases’ for this identification.


[Fig-4.3 flow: Urdu Text Document → Reflexive Identifier (RA) → Noun Phrase Extractor (PP, Noun) → Type: Possessive or Non Possessive → PR Case Identifier / NPR Case Identifier (Noun Case) → PR Rules / NPR Rules (Select Antecedent / Mark Case) → Anaphora Resolution → Anaphora Resolved Text]

Fig-4.3 Architecture of framework for resolution of reflexive pronoun


4.5.4 NPR case Identifier

It identifies the noun case associated with the non-possessive reflexive pronoun, if any, for further NL processing. It uses the resource ‘Noun Cases’ for this identification.
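Both case identifiers can be sketched as a lookup of the clitic that follows a noun or pronoun. The clitic-to-case mapping below is an assumed subset of common Urdu case markers, not a copy of the ‘Noun Cases’ resource; an unmarked noun is treated as nominative, and the function name is hypothetical.

```python
# Assumed subset of the 'Noun Cases' resource (Appendix D is not reproduced here).
CASE_CLITICS = {
    "نے": "ergative",
    "کو": "accusative/dative",   # one clitic marks two cases: context must decide
    "سے": "instrumental",        # also used for ablative and other senses
    "کا": "genitive", "کی": "genitive", "کے": "genitive",
    "میں": "locative", "پر": "locative",
}


def noun_case(tokens, index):
    """Case of the noun/pronoun at `index`, read from the clitic after it (none -> nominative)."""
    following = tokens[index + 1] if index + 1 < len(tokens) else None
    return CASE_CLITICS.get(following, "nominative")


print(noun_case("احمد نے اپنی کتاب پڑھی".split(), 0))
```

Because a single clitic can stand for more than one case, a lookup like this can only narrow the case down; the ambiguity of کو is one source of the clitic-related errors discussed in Chapter 5.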

4.5.5 PR Rules

This module is responsible for selecting antecedent for possessive reflexive pronoun. It applies rules for resolving possessive reflexive pronouns, shown in Table-3.3, to locate the correct antecedent.

4.5.6 NPR Rules

This module is responsible for selecting antecedent for non-possessive reflexive pronoun. It applies rules for resolving non-possessive reflexive pronouns, shown in Table-3.3, to locate the correct antecedent.

4.5.7 Anaphora Resolution

After identification of antecedent by the previous modules, the module ‘Anaphora Resolution’ resolves the reflexive anaphoric link, both possessive and non-possessive, and makes appropriate changes, if required, to make the text easy to understand.

Noun cases are important in natural languages. The case of a noun indicates its grammatical function relative to the other entities in a sentence. Reflexive anaphors refer to nouns or pronouns in different cases, which affects the syntax and semantics of the text. Our approach not only locates the correct antecedent but also determines its noun case. This is important when the Urdu text is to be translated into another language, or used for question answering and text summarization.

The following sets of data (resources) were developed to help resolve reflexive anaphors.

4.5.8 RA

This resource contains both possessive and non-possessive reflexive anaphors. It is used to identify the reflexive pronouns present in the text. This resource is shown in Appendix A.


4.5.9 Noun

Collecting and storing all nouns is not an easy task. However, a reasonable number of commonly used nouns were collected and stored in this resource. This mini Urdu noun set is shown in Appendix B.

4.5.10 PP

This resource contains personal pronouns in all forms of the first, second, and third person. It helps identify the personal pronouns present in the text. It is shown in Appendix C.

4.5.11 Noun Cases

This resource contains the clitics/postpositions used in Urdu to mark noun cases, shown in Table-3.2. It is shown in Appendix D.


4.6 Framework for resolving distributive anaphora

After analyzing the variety of syntactical usage of various distributive anaphors of Urdu in section 3.5, a framework for their resolution is designed and presented here. This framework is based on the rules formulated for resolution of distributive anaphoric links in Urdu text, shown in Table-3.4. The architecture of the framework is shown in Fig-4.4.

4.7 Workflow of framework for distributive anaphora

This section discusses how the framework for resolving distributive anaphoric links works. Discourse units of Urdu text containing distributive anaphoric links are collected from various sources, such as internet sources, books, newspapers, and magazines. Discourse boundaries containing distributive anaphoric links are marked manually. This document is then given as input to RADAR for resolving the distributive anaphoric links.

The text is passed through various modules for distributive anaphora resolution process, utilizing various resources. The detail of these modules and resources is given below.

4.7.1 Distributive Anaphor Identifier

The first module, ‘Distributive Anaphor Identifier’, scans the input Urdu text document and identifies distributive pronouns using the set of distributive anaphors (DA), shown in Appendix E.

4.7.2 Mark Groups

Distributive pronouns refer to one or more entities out of a group specified by a plural noun. The module ‘Mark Groups’ identifies plural nouns or group entities. It uses three resources: ‘Plurals’, which contains the singular and plural forms of various entities; ‘Quantifier’, which contains Urdu words commonly used to represent a group or the plural form of an entity; and ‘Numbers’, which contains numeric values written as words, normally used to specify a plural or group. For example, in ‘تین آدمی’ (three men), the word آدمی (man) is used as singular in Urdu; to make it plural, or to show a group, the word تین (three) is used. Three processes, QW(), SP(), and Nmbr(), perform this task.
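The three processes can be sketched together as a single pass over the tokens. The lexicon samples and the function `mark_groups` are hypothetical; multiword nouns such as طالب علم are not handled, and a form such as بچے is ambiguous between the plural and the singular oblique, which a real SP() would have to resolve from context.

```python
# Tiny hypothetical samples of the 'Quantifier', 'Numbers' and 'Plurals' resources.
QUANTIFIERS = {"کچھ", "بہت", "متعدد"}
NUMBERS = {"دو", "تین", "چار"}
PLURALS = {"پرندے": "پرندہ", "بچے": "بچہ"}   # plural -> singular (direct case only)


def mark_groups(tokens):
    """Return (index, singular form, process) for each group found, mimicking QW(), Nmbr() and SP()."""
    groups, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and (tok in QUANTIFIERS or tok in NUMBERS):
            # QW() / Nmbr(): the word after the quantifier or number word names the group
            process = "QW" if tok in QUANTIFIERS else "Nmbr"
            groups.append((i + 1, PLURALS.get(nxt, nxt), process))
            i += 2
            continue
        if tok in PLURALS:
            # SP(): a noun that appears in its plural form
            groups.append((i, PLURALS[tok], "SP"))
        i += 1
    return groups


print(mark_groups("تین آدمی آئے".split()))
```

The singular form is stored with each group because the resolution rules of Table-3.4 insert the singular (or oblique) form, not the plural.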


[Fig-4.4 flow: Urdu Text Document → Distributive Anaphor Identifier (DA) → Mark Group(s)/Plural(s) (QW(), Nmbr(), SP(); Quantifier, Numbers, Plurals) → Verb Identification (Verb(); Verbs) → Distributive Anaphora Resolution (Attrib(), Polarity(), NCase(), MVerb(); Attribute, Category, Oblique, PP, Adverb, Nouncase) → Resolve and Mark Case → Anaphora Resolved Text]

Fig-4.4 Architecture of framework for resolution of distributive pronoun


4.7.3 Verb Identification

This module identifies the verb(s) in the text using the process Verb() and the resource ‘Verbs’. When there are multiple candidate antecedents, verbs may help in selecting the correct one.

After these identifications, the text is now ready for resolution.

4.7.4 Distributive Anaphora Resolution

Module ‘Distributive Anaphora Resolution Process’ identifies and selects correct antecedent with the help of rules formulated and summarized in Table-3.4. It uses various resources and processes for implementing the formulated rules of resolution.

4.7.5 Resolve and Mark Case

This module completes the process of resolution by inserting selected antecedent at a position where the anaphoric link becomes easily understandable.

4.7.6 Processes and resources for Distributive anaphora

The following processes and resources were developed for this framework.

4.7.6.1 QW()

This process identifies groups or plural nouns that are expressed with quantifier words. For example, in English, words such as many and few are used to specify a group. In Urdu, کچھ (kuch), بہت (bahut), etc. are used for this purpose.

4.7.6.2 Nmbr()

This process identifies groups specified by using quantity in numbers.

4.7.6.3 SP()

This process identifies plural nouns in the text using the resource ‘Plurals’.

4.7.6.4 Verb()

This process identifies the verbs (one or more) in the text, to help select the correct antecedent.


4.7.6.5 Attrib()

When there are multiple candidate antecedents then this process is activated. It helps in locating the correct antecedent by examining the attributes, related with candidate antecedents. A resource ‘Attribute’ is developed which contains world knowledge in the form of attribute/complements/properties, associated with the candidate entity. The entity, with matching attribute relations in the resource, is selected as antecedent.
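Attribute matching can be sketched as an overlap score between each candidate's attribute set and the words of the referring sentence. The resource sample, the scoring, and the tie rule (no unique winner means no decision) are assumptions for illustration; the thesis does not specify how matches are scored.

```python
# Hypothetical sample of the 'Attribute' resource: entity -> words associated with it.
ATTRIBUTES = {
    "گاڑی": {"پیٹرول", "پہیے"},   # car: petrol, wheels
    "گھوڑا": {"گھاس", "اصطبل"},   # horse: grass, stable
}


def select_by_attributes(referring_tokens, candidates):
    """Pick the candidate whose attributes overlap most with the referring sentence (unique winner only)."""
    words = set(referring_tokens)
    scores = {c: len(ATTRIBUTES.get(c, set()) & words) for c in candidates}
    best = max(scores.values(), default=0)
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


# "Each one was given grass." -> the horses, not the cars
print(select_by_attributes("ہر ایک کو گھاس دی گئی".split(), ["گاڑی", "گھوڑا"]))
```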

4.7.6.6 Polarity ()

This process identifies the modifiers used in Urdu to make the polarity of a sentence negative. The polarity of a sentence helps in locating the accurate antecedent for the distributive anaphor کوئی بھی (Koi Bhi). Normally, the word نہیں (not) is used to form a negative sentence in Urdu.
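The effect of polarity on کوئی بھی (Koi Bhi), following rules R1 to R4 of Table-3.4, can be sketched as below. The negator set (نہیں, نہ) and the function names are assumptions; the inserted noun is the singular form of the single plural entity in the antecedent sentence.

```python
NEGATORS = {"نہیں", "نہ"}   # assumed set of negation words


def koi_bhi_reading(tokens):
    """Negative sentence: 'none in the group'; otherwise: 'anyone in the group'."""
    return "none in the group" if NEGATORS & set(tokens) else "anyone in the group"


def resolve_koi_bhi(tokens, singular):
    """Insert the singular form of the group noun after each کوئی بھی (Koi Bhi)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok == "بھی" and i > 0 and tokens[i - 1] == "کوئی":
            out.append(singular)
    return out


sentence = "کوئی بھی نہیں آیا".split()    # "Nobody came."
print(koi_bhi_reading(sentence), resolve_koi_bhi(sentence, "طالب علم"))
```

The insertion itself is the same for positive and negative sentences; polarity only decides whether the resolved link means "none" or "anyone" in the group, which matters for translation.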

4.7.6.7 NCase ()

This process checks the clitics or postpositions, succeeding the pronoun, and decides whether simple singular form should be inserted after distributive pronoun or its oblique form. It uses resource ‘Nouncase’ for clitic/postpositions and for oblique form it uses resource ‘Oblique’.
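NCase() can be sketched as a check on the token after the pronoun. The clitic list and the oblique lexicon are assumed samples. Note that the example in 4.7.6.17 gives a plural oblique form (پرندوں); this sketch uses singular oblique forms because ہر ایک (Har Ek) pairs with a singular noun (rule R6).

```python
# Assumed samples of the 'Nouncase' and 'Oblique' resources.
CLITICS = {"نے", "کو", "سے", "کا", "کی", "کے", "میں", "پر"}
OBLIQUE_SG = {"لڑکا": "لڑکے", "پرندہ": "پرندے"}   # singular -> singular oblique


def form_after_pronoun(tokens, pronoun_end, singular):
    """Form of the antecedent to insert after a distributive pronoun whose last token is at pronoun_end."""
    following = tokens[pronoun_end + 1] if pronoun_end + 1 < len(tokens) else None
    if following in CLITICS:
        # a clitic follows, so the noun must appear in its oblique form
        return OBLIQUE_SG.get(singular, singular)   # nouns without a distinct oblique keep their form
    return singular


print(form_after_pronoun("ہر ایک نے انعام جیتا".split(), 1, "لڑکا"))
```

With ایک replaced by the returned form, ہر ایک نے انعام جیتا ("each one won a prize") becomes ہر لڑکے نے انعام جیتا.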

4.7.6.8 MVerb ()

This process finds the main verb in the referring sentence and relates it to one of the multiple groups/plurals in the referent sentence on the basis of shared attributes. For this purpose, a resource VAttrib() was developed, which contains verbs and the entities associated with them in the real world.
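MVerb() can be sketched as a lookup of the verb in a verb-to-entity table. The table contents, the function name, and the simplification that the first verb found in the table is the main verb are all hypothetical.

```python
# Hypothetical sample of the VAttrib resource: verb stem -> entities that typically perform it.
VATTRIB = {
    "اڑ": {"پرندہ"},    # fly: bird
    "تیر": {"مچھلی"},   # swim: fish
}


def select_by_main_verb(referring_tokens, candidates):
    """Return the single candidate tied to the first known verb, else None (rule R4 then applies)."""
    verb = next((t for t in referring_tokens if t in VATTRIB), None)
    if verb is None:
        return None
    matches = [c for c in candidates if c in VATTRIB[verb]]
    return matches[0] if len(matches) == 1 else None


# "Each one flew away." with fish and birds in the antecedent sentence
print(select_by_main_verb("ہر ایک اڑ گیا".split(), ["مچھلی", "پرندہ"]))
```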

To apply the formulated rules for distributive anaphora, RADAR needs resources for identifying different language entities, such as nouns, verbs, and adverbs. To fulfill this requirement, some resources were developed. These resources contain limited sets of data for testing purposes; developing complete and comprehensive resources is beyond the scope of this research work. The following resources were developed for implementing and testing RADAR.


4.7.6.9 DA

This resource contains distributive anaphors, and is used to identify the existence of distributive pronoun in the text. DA is shown in Appendix E.

4.7.6.10 Plurals

This resource contains the most commonly used singular and plural words of Urdu. It is used first to identify plurals/groups in the text, and then to get the singular form to be inserted in the text to resolve the anaphoric link. A portion of the resource is shown in Appendix F.

4.7.6.11 Numbers

It contains numbers written as words, which are normally used in daily life to represent plurals/groups. In Urdu, a plural form or group is often expressed by a number word together with the singular form of the noun, so the plural or group is identified only by the number. For example, in تین پھول (three flowers), the word پھول (flower) is in singular form; it is the word تین (three) that makes it plural. RADAR identifies a plural or group using this resource, and also gets the singular form that follows the number. This resource is shown in Appendix G.

4.7.6.12 Quantifier

This resource contains words which are used as quantifier in Urdu to specify plural or group. For example, in English, words like some, all, many, few, etc. are used as quantifier. Similarly, such words of Urdu are used to represent the things in group or plural form. These words are stored in this resource. RADAR uses these words to identify plural forms in the text. This resource is shown in Appendix H.

4.7.6.13 Verbs

Verbs are an important part of a sentence, expressing the action of the subject. When there are multiple candidate antecedents, RADAR uses verbs to identify the correct antecedent. Urdu has a rich verb morphology with various categories [97]. Building a complete verb resource for Urdu is beyond the scope of this research work. For implementing and testing RADAR, a mini resource was developed, which can be extended in future. A portion of this resource is shown in Appendix I.

4.7.6.14 Attribute

In case of multiple candidates for antecedent, RADAR uses some world knowledge to locate the correct antecedent. This world knowledge is stored as resource in ‘Attribute’. This resource contains the associated attributes of different entities, which are used in daily routine matters. Portion of this resource is shown in Appendix J.

4.7.6.15 Category

This resource contains grouping/classification of similar things under one title. In written text, when a group is specified by multiple entities of same nature, then in resolving anaphora, title or class of that group is used. This resource is shown in Appendix K.

4.7.6.16 PP

This resource contains personal pronouns, shown in Appendix C.

4.7.6.17 Oblique

An oblique case of a noun is used when the noun phrase is the object of either a verb or a preposition. In Urdu, the oblique form exists, without any clitic, for both singular and plural nouns. For example, the singular form is پرندہ (bird), the plural form is پرندے (birds), and the oblique plural form is پرندوں. This resource contains oblique forms of nouns and is shown in Appendix L.

4.7.6.18 Adverb

This resource contains Urdu adverbs and is shown in Appendix M.

4.7.6.19 Nouncase

This resource contains the clitics/postpositions, used to mark noun cases, shown in Appendix D.

This answers the primary research question specified in Section 1.4: the model for resolving reflexive and distributive anaphora has been developed.


4.8 Summary

In this chapter, we presented RADAR (Reflexive And Distributive Anaphora Resolution), our answer to the primary research question. First, the general architecture of anaphora resolution was presented, followed by the architecture of the RADAR framework. RADAR consists of two frameworks: one for resolving reflexive anaphoric links and one for resolving distributive anaphoric links in Urdu text.

The workflow of the reflexive anaphora framework was explained, along with the modules and resources used for resolution. Similarly, the distributive anaphora framework was explored: its workflow, modules, processes, and the resources needed to resolve distributive links. This chapter thus explains in depth the framework for resolving reflexive and distributive anaphora in Urdu text.


Chapter 5: Evaluation and Results

5.1 Evaluation overview

A major task in developing an NLP system is evaluating it on different sets of data. Evaluation helps assess the system and also indicates which parts of it need improvement to obtain better results in future.

There are three main methods for evaluating an NLP system. Adequacy evaluation concerns the functionality and usability of a system. Diagnostic evaluation is performed at different stages of implementation to check whether the development of the system is heading in the right direction. Performance evaluation measures a system, either on its own or by comparing it with other similar systems.

This chapter deals with the evaluation of RADAR, the anaphora resolution system for reflexive and distributive anaphoric links in Urdu text. The performance evaluation method is used here to address some general issues related to evaluating the system's performance. The components of the system are assessed individually, both to test their performance and to find ways of improving the system.

5.2 Evaluation of RADAR.

Anaphora resolution is an area of NLP that lends itself to automatic evaluation. Evaluating an anaphora resolution system requires a manually annotated corpus. For RADAR, we developed modules and resources to annotate the required entities, since not all entities require annotation. Text containing reflexive and distributive anaphors was collected from internet sources, books, magazines, newspapers, and children's stories. A set of resources with limited but sufficient data was also developed to evaluate the system. For discourse processing, Carletta [98], Carletta et al. [99], and Walker and Moore [100] argued that linguistic analyses should rest on the judgments of several subjects. Accordingly, more than one annotator was involved, as required, in developing the processes and resources.

RADAR was evaluated to assess the accuracy of the system and to identify the causes whenever it failed to perform the task. The resolution of each pronoun was evaluated to find out the difficulties in resolving it.


5.3 Overall Results

The system was evaluated on Urdu text documents collected from various sources ([101], [102], [103], [104], [105], [106], [107]). Two documents were prepared for this purpose: one containing text with reflexive anaphoric links, and the other containing distributive anaphoric links.

After the necessary preprocessing, these documents were tested separately. Table-5.1 shows the results in percent. The system achieved 85.71% accuracy in resolving reflexive anaphoric links and 78.67% accuracy in resolving distributive anaphoric links.

Table-5.1 Results of Reflexive and Distributive Anaphora Resolution

Anaphora | Total Anaphora | Correctly Resolved | Incorrectly Resolved | Accuracy%
Reflexive Anaphora | 140 | 120 | 20 | 85.71
Distributive Anaphora | 150 | 118 | 32 | 78.67

These results are shown in Fig-5.1

[Bar chart: accuracy (%) of reflexive and distributive anaphora resolution]

Fig-5.1 Results of reflexive and distributive anaphora
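Accuracy in Table-5.1 is the number of correctly resolved anaphors divided by the total, expressed as a percentage. A short Python check of the two rows:

```python
def accuracy(correct, total):
    """Percentage of correctly resolved anaphors, rounded to two decimals."""
    return round(100 * correct / total, 2)


# (correctly resolved, total) from Table-5.1
results = {"Reflexive": (120, 140), "Distributive": (118, 150)}
for name, (correct, total) in results.items():
    print(f"{name}: {accuracy(correct, total)}% ({total - correct} incorrect)")
```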


The accuracy of the system is higher for reflexive pronouns than for distributive pronouns. The results for both types of pronouns are discussed in detail in the following sections.

5.4 Evaluation of RADAR for Reflexive anaphora

Separate documents were prepared for reflexive and distributive anaphora. The overall accuracy of RADAR for reflexive anaphora, shown in Table-5.1, is 85.71%. The following section breaks this result down for each reflexive pronoun separately.

The accuracy for possessive reflexive pronouns ranges from 86.36% to 88%. The accuracy for the non-possessive reflexive pronoun خود (Khudd) is 84.84%, and it is 83.33% both for possessive and non-possessive pronouns used together and for the 12 cases in the last row of Table-5.2. Overall, the success rate lies between 83.33% and 88%. This breakdown is shown in Table-5.2.

Table-5.2 Reflexive pronouns individually

Pronoun Type | Pronoun | Total Pronouns | Correctly Resolved | Incorrectly Resolved | Accuracy%
Possessive | اپنا (Apna) | 30 | 26 | 4 | 86.67
Possessive | اپنے (Apne) | 25 | 22 | 3 | 88.00
Possessive | اپنی (Apni) | 22 | 19 | 3 | 86.36
Non-possessive | خود (Khudd) | 33 | 28 | 5 | 84.84
Possessive and non-possessive together | - | 18 | 15 | 3 | 83.33
Non-possessive with distributive | - | 12 | 10 | 2 | 83.33


These results are shown in Fig-5.2.

[Bar chart: accuracy (%) for each category of Table-5.2: اپنا (Apna), اپنے (Apne), اپنی (Apni), خود (Khudd), and the two combined categories]

Fig-5.2 Reflexive anaphors individually
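The per-pronoun counts of Table-5.2 add up to the reflexive row of Table-5.1 (140 anaphors, 120 correctly resolved). A short check, with the rows entered in table order:

```python
# (total, correctly resolved) for each row of Table-5.2, in table order
rows = [(30, 26), (25, 22), (22, 19), (33, 28), (18, 15), (12, 10)]

total = sum(t for t, _ in rows)
correct = sum(c for _, c in rows)
overall = round(100 * correct / total, 2)
print(total, correct, overall)
```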

5.4.1 Analysis of possessive reflexive pronoun

Table-5.2 shows the results for reflexive pronouns, both possessive and non-possessive. For a more in-depth analysis, we discuss the results of possessive and non-possessive pronouns separately.

Table-5.3 shows the cases of possessive reflexive pronouns used with various syntactic options or grammatical entities, i.e., when nouns or adverbs are used with a possessive reflexive pronoun in text. The second column, ‘Total’, shows that there are 30 cases in which the possessive reflexive pronoun is preceded by a noun or personal pronoun. In 23 cases, it is preceded by a clitic and a noun or personal pronoun; in 14 cases, by an adverb and a noun or personal pronoun; and in 10 cases, possessive and non-possessive pronouns together are preceded by a noun or personal pronoun. The third and fourth columns show the correctly and incorrectly resolved cases, and the last column shows the accuracy in percent.

The results in the first three rows are very encouraging. In the first row, where the possessive reflexive pronoun is preceded by a noun or personal pronoun, the accuracy is 90%; in the second row, with a clitic and a noun or personal pronoun, it is 86.96%; and in the third row, where the pronoun is preceded by an adverb, it is 92.86%.

The accuracy rate is comparatively low for cases, where combination of possessive reflexive pronoun and non-possessive reflexive pronoun is preceded by noun or personal pronoun.

Table-5.3 Possessive Reflexive Pronoun with various entities

Syntactical Use | Total | Correctly Resolved | Incorrectly Resolved | Accuracy%
Noun/PP | 30 | 27 | 3 | 90.00
Clitic & Noun/PP | 23 | 20 | 3 | 86.96
Adverb + Noun/PP | 14 | 13 | 1 | 92.86
PRP and NPRP + Noun/PP | 10 | 7 | 3 | 70.00


Accuracy rate of possessive reflexive pronoun with various grammatical entities is shown in Fig-5.3.

[Bar chart: accuracy (%) of possessive reflexive pronouns for Noun/PP, Clitic & Noun/PP, Adverb + Noun/PP, and PRP and NPRP + Noun/PP]

Fig-5.3 Possessive reflexive pronoun with various entities

The accuracy of resolving possessive reflexive pronouns used with a noun/PP, noun cases, or adverbs ranges from 86.96% to 92.86%, which is encouraging.

For PRP and NPRP with a noun/PP, the accuracy is 70%, which is low compared with the other entities. The lower success rate is due to the complexity of the structure: three entities are combined, and the system failed to locate the correct antecedent in these cases.

5.4.2 Analysis of Non-possessive reflexive pronoun

After analyzing and discussing the results of various cases of possessive reflexive pronoun, now we analyze and discuss the result of various cases of non-possessive reflexive pronoun. The total number of cases for non-possessive reflexive pronoun is 33. The distribution of these cases to various grammatical/syntactical uses is shown in Table-5.4.


Table-5.4 Non-Possessive Reflexive Pronoun with various entities

Syntactical Use | Total | Correctly Resolved | Incorrectly Resolved | Accuracy%
Noun/PP | 15 | 13 | 2 | 86.67
Clitic & Noun/PP | 8 | 6 | 2 | 75.00
Adverb + Noun/PP | 4 | 3 | 1 | 75.00
Noun/PP + بذات | 6 | 6 | 0 | 100.00

[Bar chart: accuracy (%) of the non-possessive reflexive pronoun for Noun/PP, Clitic & Noun/PP, Adverb + Noun/PP, and Noun/PP + بذات]

Fig-5.4 Non- Possessive Reflexive Pronoun with various entities


Fig-5.4 shows the accuracy of the non-possessive reflexive pronoun with various entities. When the non-possessive reflexive pronoun is preceded by a noun or personal pronoun, the accuracy is 86.67%, which is satisfactory. When it is preceded by a clitic and a noun or personal pronoun, the accuracy is 75%. This decrease is due to the complications of clitic usage: some clitics represent more than one noun case, and a noun may take more than one clitic. Similarly, with adverbs the accuracy is 75%, owing to the multiple uses of adverbs and the occurrence of more than one adverb at a time. When the non-possessive reflexive pronoun is preceded by the word بذات (bazat), the accuracy is 100% because the case is simple, as presented in Table-3.3: in Urdu, the combination بذات خود (bazat khud) is preceded by a noun or personal pronoun. However, this accuracy may decrease when nouns cannot be identified.
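The بذات خود case can be illustrated with a minimal sketch. It assumes the sentence is already tokenized and tagged; the tag names ("NN" noun, "PP" personal pronoun, and the others in the example) and the function name are hypothetical, not RADAR's actual interface. Following the description above, the noun or personal pronoun immediately before بذات is returned as the antecedent of خود, and an unidentified noun makes the rule fail.

```python
from typing import List, Optional, Tuple

# (word, tag) pairs; "NN" = noun, "PP" = personal pronoun (hypothetical tagset)
Token = Tuple[str, str]

def resolve_bazat_khud(tokens: List[Token]) -> Optional[str]:
    """Return the antecedent of خود in the pattern <noun/PP> بذات خود, else None."""
    for i in range(2, len(tokens)):
        if tokens[i][0] == "خود" and tokens[i - 1][0] == "بذات":
            word, tag = tokens[i - 2]
            # The antecedent is the noun or personal pronoun preceding بذات;
            # if that word was not recognized as one, the rule cannot fire.
            return word if tag in {"NN", "PP"} else None
    return None

# "علی بذات خود وہاں گیا" (Ali himself went there.)
sentence = [("علی", "NN"), ("بذات", "ADV"), ("خود", "RFX"), ("وہاں", "ADV"), ("گیا", "VB")]
print(resolve_bazat_khud(sentence))  # علی
```

The `None` branch corresponds to the failure mode mentioned above: when the preceding noun is not identified, the otherwise simple rule cannot resolve the pronoun.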

5.5 Evaluation of RADAR for Distributive anaphora

A test document containing 150 discourse units with distributive anaphoric links was prepared to evaluate the system on distributive anaphors. The overall accuracy for distributive anaphors is 78.67%, as shown in Table-5.1. The distribution of the 150 discourse units over the individual distributive anaphors is shown in Table-5.5, where the accuracy of the individual pronouns ranges from 80.0% to 86.11%.


Table-5.5 Distributive anaphors individually

Anaphor | Total | Correctly resolved | Incorrectly resolved | Accuracy (%)
ہر ایک (Har Ek) | 45 | 36 | 9 | 80.00
کوئی ایک (Koi Ek) | 36 | 31 | 5 | 86.11
کوئی بھی (Koi Bhi) | 37 | 31 | 6 | 83.78
کئی ایک (Kai Ek) | 32 | 27 | 5 | 84.37

Fig-5.5 shows the accuracy of each distributive pronoun individually.

[Figure: accuracy (%) of each distributive anaphor: ہر ایک (Har Ek), کوئی ایک (Koi Ek), کوئی بھی (Koi Bhi), کئی ایک (Kai Ek)]

Fig-5.5 Distributive pronouns individually

5.5.1 Analysis of distributive pronouns

Table-5.5 shows the results for each distributive pronoun individually. For a more in-depth analysis, we discuss the results for each distributive pronoun in turn.

5.5.1.1 Distributive pronoun ہر ایک (Har Ek) with various entities

The overall accuracy for the pronoun ہر ایک (Har Ek) is 80%. Here, we examine the accuracy of this pronoun according to its usage with different grammatical entities. The text document prepared for distributive pronouns contains 45 discourse units for testing ہر ایک (Har Ek). Their distribution is shown in Table-5.6.

Table-5.6 Distributive pronoun ہر ایک (Har Ek) with various entities

Entity | No. of anaphors | Correctly resolved | Incorrectly resolved | Accuracy (%)
NN/NP | 5 | 5 | 0 | 100.00
Single group | 10 | 9 | 1 | 90.00
Multiple groups | 9 | 5 | 4 | 55.56
Attribute based | 8 | 5 | 3 | 62.50
Single oblique | 5 | 5 | 0 | 100.00
Class/category | 8 | 4 | 4 | 50.00

The first row of Table-5.6 shows that when the distributive pronoun ہر ایک (Har Ek) is preceded by a noun or personal pronoun, the accuracy is 100%. This is due to the simplicity of the case: no resolution is needed. The accuracy may decrease if the system is unable to recognize the noun or personal pronoun.

When the antecedent sentence contains a single group, i.e., one noun in plural form, the accuracy is 90%, which is good. When the antecedent sentence contains multiple groups (several plural nouns), the accuracy drops to 55.56%. This low rate reflects the complexity of the problem: with multiple candidate antecedents, world knowledge about the main verbs is required, and this still needs attention from linguistic research. Results for attribute-based entities (62.50%) and class/category entities (50.00%) are also low, because entities are not properly grouped under a single class/category; providing sufficient knowledge could raise this accuracy. The oblique form is a simple case, and the resources contain enough data for the test text. The results are shown in Fig-5.6.
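How verb knowledge might separate competing plural groups can be sketched as follows. The table `VERB_AGENT_CLASSES`, the entity classes, and the function name are hypothetical stand-ins for RADAR's world-knowledge resource; the sketch only illustrates the selection logic described above: a single plural candidate is taken directly, and several candidates are filtered by what the main verb of the referring sentence can take as its subject.

```python
from typing import Dict, List, Optional, Set

# Hypothetical world-knowledge table: main verb (infinitive) -> entity classes
# that can act as its subject. A stand-in, not RADAR's actual resource.
VERB_AGENT_CLASSES: Dict[str, Set[str]] = {
    "پڑھنا": {"human"},            # to read
    "دوڑنا": {"human", "animal"},  # to run
}

def select_antecedent(candidates: List[Dict[str, str]], main_verb: str) -> Optional[str]:
    """Choose the plural group that ہر ایک (Har Ek) refers to.

    One candidate (single group): take it directly.
    Several candidates (multiple groups): keep those whose class the main verb
    of the referring sentence accepts; give up (None) if zero or several remain.
    """
    if len(candidates) == 1:
        return candidates[0]["word"]
    allowed = VERB_AGENT_CLASSES.get(main_verb, set())
    fitting = [c["word"] for c in candidates if c["class"] in allowed]
    return fitting[0] if len(fitting) == 1 else None

candidates = [
    {"word": "لڑکے", "class": "human"},    # boys
    {"word": "گھوڑے", "class": "animal"},  # horses
]
print(select_antecedent(candidates, "پڑھنا"))  # لڑکے
print(select_antecedent(candidates, "دوڑنا"))  # None (still ambiguous)
```

The second call returning `None` mirrors the weak multiple-groups result: when the verb does not discriminate between the candidates, more world knowledge is needed.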

[Figure: accuracy (%) of distributive pronoun ہر ایک (Har Ek) by entity: NN/NP, Single Group, Multiple Groups, Attribute Based, Single Oblique, Class/Category]

Fig-5.6 Distributive pronoun ہر ایک (Har Ek) with different entities

5.5.1.2 Distributive pronoun کوئی ایک (Koi Ek) with different entities

For the distributive pronoun کوئی ایک (Koi Ek), the test document contains 36 discourse units, distributed as shown in Table-5.7. RADAR resolved 31 of them correctly, an overall success rate of 86.11%.

Table-5.7 Distributive pronoun کوئی ایک (Koi Ek) with different entities

Entity | No. of anaphors | Correctly resolved | Incorrectly resolved | Accuracy (%)
NN/NP | 14 | 14 | 0 | 100.00
Single group | 12 | 11 | 1 | 91.67
PP + clitic + plural noun in referring sentence | 10 | 6 | 4 | 60.00

The first row of Table-5.7 shows that when a noun or personal pronoun is preceded by the distributive pronoun کوئی ایک (Koi Ek), the accuracy is 100%. The case is simple: the succeeding noun or pronoun is the antecedent, and no resolution is needed.

The second row shows an accuracy of 91.67% when the antecedent sentence contains a single group, i.e., one noun in plural form. This accuracy depends on how reliably plural nouns are recognized.

The third row shows a comparatively low accuracy of 60.0% for cases where a personal pronoun is used with one or more clitics in the antecedent sentence and the referring sentence contains a single noun in plural form. This low result is due to the complications that arise when more than one noun case is combined after a noun or personal pronoun. The graph of distributive pronoun کوئی ایک (Koi Ek) with different entities is shown in Fig-5.7.
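The first two rows of Table-5.7 correspond to two checks, which could be sketched as below. The tags ("NN" singular noun, "NNS" plural noun, "PP" personal pronoun, "VB" verb, and so on) and the function name are assumptions for illustration only. If a noun or personal pronoun directly follows کوئی ایک, it is taken as the referent and no search is needed; otherwise the sketch selects the single plural noun (single group) of the antecedent sentence, and gives up if there is none or more than one.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, hypothetical tag: "NN", "NNS", "PP", "VB", ...)
NOMINAL_TAGS = {"NN", "NNS", "PP"}

def koi_ek_antecedent(antecedent: List[Token], referring: List[Token]) -> Optional[str]:
    """Antecedent for کوئی ایک (Koi Ek) in the referring sentence, or None."""
    for i in range(len(referring) - 1):
        if referring[i][0] == "کوئی" and referring[i + 1][0] == "ایک":
            # Case 1 (NN/NP row): a noun or personal pronoun follows directly.
            if i + 2 < len(referring) and referring[i + 2][1] in NOMINAL_TAGS:
                return referring[i + 2][0]
            # Case 2 (single group row): exactly one plural noun in the previous sentence.
            plurals = [word for word, tag in antecedent if tag == "NNS"]
            return plurals[0] if len(plurals) == 1 else None
    return None

# "لڑکے کھیل رہے تھے" (The boys were playing.) / "کوئی ایک گر گیا" (One of them fell.)
prev = [("لڑکے", "NNS"), ("کھیل", "VB"), ("رہے", "AUX"), ("تھے", "AUX")]
curr = [("کوئی", "DST"), ("ایک", "DST"), ("گر", "VB"), ("گیا", "VB")]
print(koi_ek_antecedent(prev, curr))  # لڑکے
```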

[Figure: accuracy (%) of distributive pronoun کوئی ایک (Koi Ek) by syntactic entity: NN/NP, Single Group, PP + clitic]

Fig-5.7 Distributive pronoun کوئی ایک (Koi Ek) with different entities

5.5.1.3 Distributive anaphor کوئی بھی (Koi Bhi)

For the distributive pronoun کوئی بھی (Koi Bhi), the test document contains 37 discourse units. Their distribution over different entities is shown in Table-5.8. Our system, RADAR, resolved 31 units correctly, an overall success rate of 83.78%.

Table-5.8 Distributive pronoun کوئی بھی (Koi Bhi) with different entities

Entity | No. of anaphors | Correctly resolved | Incorrectly resolved | Accuracy (%)
Negative sentence | 9 | 8 | 1 | 88.89
Positive sentence | 8 | 7 | 1 | 87.50
Negative + same sentence | 10 | 8 | 2 | 80.00
Positive + same sentence | 10 | 8 | 2 | 80.00

The first row of Table-5.8 shows an accuracy of 88.89% for negative sentences. The system checks the polarity of the referring sentence and, if it is negative, inserts the singular form of the antecedent after the distributive anaphor کوئی بھی (Koi Bhi). Identifying the antecedent and resolving the anaphor are simple in this case, so the accuracy depends mainly on how reliably the polarity of the sentence is checked.

The case of sentences with positive polarity is similar, with an accuracy of 87.50%.
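The polarity check and insertion step for کوئی بھی can be sketched as follows, assuming whitespace-tokenized input. The negation list (نہیں, نہ, مت) is a minimal assumption, and the plural-to-singular dictionary is an illustrative stand-in for a lexicon such as Appendix F; the function names are hypothetical. Only the negative case, which the text describes explicitly, is implemented; other sentences are returned unchanged.

```python
from typing import Dict, List

# Urdu negation particles used for the polarity test (minimal, assumed list).
NEGATION_WORDS = {"نہیں", "نہ", "مت"}

# Plural -> singular entries in the style of Appendix F (illustrative only).
SINGULAR_OF: Dict[str, str] = {"لڑکے": "لڑکا", "گھوڑے": "گھوڑا", "لڑکیاں": "لڑکی"}

def is_negative(words: List[str]) -> bool:
    """Polarity check: a sentence is negative if it contains a negation particle."""
    return any(w in NEGATION_WORDS for w in words)

def resolve_koi_bhi(referring: List[str], plural_antecedent: str) -> List[str]:
    """For a negative referring sentence, insert the singular form of the
    antecedent right after کوئی بھی; other sentences are returned unchanged."""
    if not is_negative(referring):
        return list(referring)
    singular = SINGULAR_OF.get(plural_antecedent, plural_antecedent)
    out: List[str] = []
    for i, word in enumerate(referring):
        out.append(word)
        if word == "بھی" and i > 0 and referring[i - 1] == "کوئی":
            out.append(singular)
    return out

# Antecedent: لڑکے (boys); referring sentence: کوئی بھی نہیں آیا (nobody came)
print(" ".join(resolve_koi_bhi(["کوئی", "بھی", "نہیں", "آیا"], "لڑکے")))
# کوئی بھی لڑکا نہیں آیا
```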

The graph of distributive anaphor کوئی بھی (Koi Bhi) with various entities is shown in Fig-5.8.

[Figure: accuracy (%) of distributive pronoun کوئی بھی (Koi Bhi) by entity: Negative Sentence, Positive Sentence, Negative + Same Sentence, Positive + Same Sentence]

Fig-5.8 Distributive pronoun کوئی بھی (Koi Bhi) with different entities

5.5.1.4 Distributive anaphor کئی ایک (Kai Ek)

There are 32 discourse units for the distributive anaphor کئی ایک (Kai Ek), distributed as shown in Table-5.9. The system resolved 27 cases correctly, an overall accuracy of 84.37%.

Table-5.9 Distributive pronoun کئی ایک (Kai Ek) with different entities

Entity | No. of anaphors | Correctly resolved | Incorrectly resolved | Accuracy (%)
Plural entity | 18 | 15 | 3 | 83.33
Plural entity + noun case | 14 | 12 | 2 | 85.71

The first row of Table-5.9 shows that for a single plural entity in the antecedent sentence the accuracy is 83.33%, and the second row shows that for a single plural entity along with a noun case the accuracy is 85.71%. These results are satisfactory. The accuracy depends on how accurately plural forms are identified. Our system finds the noun in plural form in the antecedent sentence and inserts its singular form after کئی ایک (Kai Ek) in the referring sentence to resolve the anaphor. If a clitic or postposition, normally genitive or locative, is used, the oblique form of the plural entity is inserted after کئی ایک (Kai Ek) instead. These results are shown in Fig-5.9.
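The insertion step for کئی ایک can be sketched as below, assuming whitespace-tokenized input and that the plural antecedent has already been found. The singular and oblique dictionaries are hypothetical single-entry lexicons, the clitic set is a subset of the genitive and locative clitics listed in Appendix D, and the function name is ours. The oblique form is chosen when the word right after کئی ایک is one of these clitics; otherwise the singular form is inserted.

```python
from typing import Dict, List

# Illustrative lexicon entries (hypothetical; a real system would draw on Appendix F).
SINGULAR_OF: Dict[str, str] = {"لڑکے": "لڑکا"}
OBLIQUE_OF: Dict[str, str] = {"لڑکے": "لڑکوں"}

# Subset of the genitive and locative clitics listed in Appendix D.
GEN_LOC_CLITICS = {"کا", "کی", "کے", "میں", "پر"}

def resolve_kai_ek(referring: List[str], plural_antecedent: str) -> List[str]:
    """Insert the singular (or, before a genitive/locative clitic, the oblique
    plural) form of the antecedent right after کئی ایک."""
    out: List[str] = []
    for i, word in enumerate(referring):
        out.append(word)
        if word == "ایک" and i > 0 and referring[i - 1] == "کئی":
            following = referring[i + 1] if i + 1 < len(referring) else ""
            table = OBLIQUE_OF if following in GEN_LOC_CLITICS else SINGULAR_OF
            out.append(table.get(plural_antecedent, plural_antecedent))
    return out

print(" ".join(resolve_kai_ek(["کئی", "ایک", "کی", "کتابیں", "گم", "ہو", "گئیں"], "لڑکے")))
# کئی ایک لڑکوں کی کتابیں گم ہو گئیں
```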

[Figure: accuracy (%) of distributive pronoun کئی ایک (Kai Ek) by entity: Plural Entity, Plural Entity + Noun case]

Fig-5.9 Distributive pronoun کئی ایک (Kai Ek) with different entities

5.6 Conclusion

This work proposed a framework, RADAR, for resolving reflexive and distributive anaphora in Urdu discourse. The main aims of this research were to investigate the syntactic structure of reflexive and distributive pronouns in Urdu text, to formulate rules for their resolution, and to design a framework.

These aims were fulfilled by designing and implementing RADAR, an extendable system based on the formulated rules and supporting resources. The system was designed for reflexive and distributive anaphors in Urdu and has the potential to be extended to other types of anaphors in Urdu, as well as to other languages with the same grammatical structure.

Beyond these aims, a small set of resources was also developed to support and test the implementation of the proposed framework. These resources can be extended to hold more world knowledge in support of the anaphora resolution process.

A significant part of this research was the evaluation of the formulated rules in general and of RADAR in particular. The results were encouraging: they showed that the rules formulated for resolution work well, and they also revealed deficiencies and sources of error, including the structure and size of the supporting resources, which can be improved and extended in future for better results.

The evaluation of RADAR revealed issues regarding verb attachment, attributes of nouns, and the use of world knowledge in resolving anaphors. Several factors account for the lower results in some cases: lack of semantic information, limitations in using world knowledge, preprocessing errors, and the inherent ambiguity of natural language. Fully automatic NLP systems produce more errors, because errors produced at each stage propagate to the next stage.

The main contributions of this research are:

- Analyzed the grammatical and structural use of reflexive anaphors (both possessive and non-possessive) and distributive anaphors, and formulated rules to locate their antecedents when used with various language entities.
- Developed a general framework for the resolution of reflexive and distributive anaphors.
- Developed a separate framework for the resolution of reflexive anaphors.
- Developed a separate framework for the resolution of distributive anaphors.
- Presented and implemented a mechanism for resolving reflexive and distributive anaphors when there are multiple potential antecedent candidates.
- Presented and implemented how world knowledge can be used to locate the correct antecedent.
- Developed resources for different Urdu language entities.
- Developed resources depicting world knowledge.
- Investigated and identified different factors that cause errors in the anaphora resolution process.
- Identified factors that need improvement for better results in future.

5.7 Future Work

As described in Chapter 1, anaphora resolution is essential for various types of NLP systems, such as machine translation (MT) and information extraction (IE). The following areas can be addressed in future:

- There is a need to make the antecedent selection process more effective. In the case of multiple antecedent candidates, we used verbs and world knowledge to locate the correct antecedent. Verbs create ambiguities in complicated sentences, and Urdu verbs are particularly complex because they mark not only the action but also gender and number. Therefore, a deep study of verbs is required to support the selection of antecedents.
- RADAR could be extended to other types of anaphors, especially the most commonly used personal pronouns.
- World knowledge requires further attention and refinement to support the anaphora resolution process.
- Noun identification in Urdu needs more work, as some nouns are also used as verbs, which causes errors throughout the process.
- Identifying the plural form is also a problem in some cases, because it is often the verb that marks a noun as singular or plural. For example, consider the following sentences:

آسٹریلیا کے خلاف کھلاڑی بڑی محنت سے کھیل رہے تھے۔ ہر ایک میچ جیتنا چاہتا تھا۔
(The players were playing very hard against Australia. Each one wanted to win the match.)

چاہتا تھا: verb + singular auxiliary verb form

In the above example, the word کھلاڑی (player) has the same form in singular and plural in Urdu. It is the verb and the auxiliary verb that decide whether it is used as singular or plural in the sentence.

- The efficiency, accuracy and success rate of the system could be improved in future by applying automated NLP tools.
- Part-of-speech (POS) tagging is the process of marking up the words in a text as corresponding to a particular part of speech, based on both their definition and their context (Wikipedia). An Urdu POS tagger is therefore needed for future work.
- Text entered into an NLP system must be grammatically accurate; therefore, a parser is required to ensure that the input text is grammatically correct.
- World knowledge is also needed to resolve anaphoric links, and further research on a knowledge base is required.
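The plural-form problem in the list above, where the number of کھلاڑی is visible only on the verb, could be approached with a heuristic like the following sketch. It assumes whitespace tokenization and reads grammatical number from the last auxiliary in the clause (تھا, تھی, ہے singular; تھے, تھیں, ہیں plural). The word lists and the function name are assumptions; honorific ہیں (as with آپ) would be misread as plural, so this is not a complete account of Urdu agreement.

```python
from typing import List, Optional

# Auxiliary forms and the number they mark (masculine/feminine, past/present).
SINGULAR_AUX = {"تھا", "تھی", "ہے"}
PLURAL_AUX = {"تھے", "تھیں", "ہیں"}

def number_from_auxiliary(clause: List[str]) -> Optional[str]:
    """Return 'singular' or 'plural' from the last auxiliary, or None if absent."""
    for word in reversed(clause):
        if word in PLURAL_AUX:
            return "plural"
        if word in SINGULAR_AUX:
            return "singular"
    return None

first = "آسٹریلیا کے خلاف کھلاڑی بڑی محنت سے کھیل رہے تھے".split()
second = "ہر ایک میچ جیتنا چاہتا تھا".split()
print(number_from_auxiliary(first), number_from_auxiliary(second))  # plural singular
```

In the first clause the auxiliary تھے marks کھلاڑی as plural, while in the second the singular auxiliary تھا agrees with ہر ایک, which matches the explanation of the example.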


Chapter 6: References

[1] J. C. King. Anaphora. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Summer 2013 edition, 2013.

[2] M. A. K. Halliday and R. Hasan, "Cohesion in English," Longman English Language Series 9, Longman, 1976.

[3] Mukund, S., Srihari, R. and Peterson, E., 2010. An Information-Extraction System for Urdu: A Resource-Poor Language. ACM Transactions on Asian Language Information Processing (TALIP), 9(4), pp.1-43.

[4] Hobbs, J.R., 1978. Resolving pronoun references. Lingua, 44(4), pp.311-338.

[5] Lappin, S. and Leass, H.J., 1994. An algorithm for pronominal anaphora resolution. Computational linguistics, 20(4), pp.535-561.

[6] Soon, W.M., Ng, H.T. and Lim, C.Y., 1999. Corpus-based learning for noun phrase coreference resolution. In 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.

[7] Capstick, J., Diagne, A.K., Erbach, G., Uszkoreit, H., Leisenberg, A. and Leisenberg, M., 2000. A system for supporting cross-lingual information retrieval. Information processing & management, 36(2), pp.275-289.

[8] Mukund, S., Srihari, R. and Peterson, E., 2010. An Information-Extraction System for Urdu: A Resource-Poor Language. ACM Transactions on Asian Language Information Processing (TALIP), 9(4), pp.1-43.

[9] Riaz K. Rule-based named entity recognition in Urdu. In Proceedings of the 2010 named entities workshop 2010 Jul (pp. 126-135).

[10] Al-Shammari ET, Lin J. Towards an error-free Arabic stemming. In Proceedings of the 2nd ACM workshop on Improving non English web searching 2008 Oct 30 (pp. 9-16).

[11] Jawaid B, Ahmed T (2009) Hindi to Urdu conversion: beyond simple transliteration. In: Proceedings of the conference on language and technology, pp. 24–31.

[12] Adeeba, F. and Hussain, S., 2011, November. Experiences in building urdu wordnet. In Proceedings of the 9th workshop on Asian language resources (pp. 31-35).

[13] Riaz, K., 2008, October. Baseline for Urdu IR evaluation. In Proceedings of the 2nd ACM workshop on Improving non english web searching (pp. 97-100).

[14] Ahmed, T. and Hautli, A., 2010. Developing a basic lexical resource for Urdu using Hindi WordNet. Proceedings of CLT10, Islamabad, Pakistan.

[15] Visweswariah K, et al. (2010) Urdu and Hindi: translation and sharing of linguistic resources. In: Proceedings of the 23rd international conference on computational linguistics (COLING), pp 1283–1291.

[16] Adeeba F, Hussain S (2011) Experiences in building the UrduWordNet. In: Proceedings of the 9th workshop on Asian language resources, pp 31–35.

[17] Ng, V., 2009, June. Graph-cut-based anaphoricity determination for coreference resolution. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 575-583).

[18] Finkel JR, Manning CD. Enforcing transitivity in coreference resolution. In Proceedings of ACL-08: HLT, Short Papers 2008 Jun (pp. 45-48).

[19] Stoyanov, V., Gilbert, N., Cardie, C. and Riloff, E., 2009, August. Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (pp. 656-664).

[20] G. Hirst, "Anaphora in Natural Language Understanding," Springer-Verlag, Berlin, 1981.

[21] Hirschman, L. and Gaizauskas, R., 2001. Natural language question answering: the view from here. Natural language engineering, 7(4), p.275.

[22] Vicedo, J.L. and Ferrández, A., 2000, October. Importance of pronominal anaphora resolution in question answering systems. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (pp. 555-562).

[23] Wikipedia. Automatic summarization, Wikipedia, the free encyclopedia, 2014. [Online; accessed 17-January-2018].

[24] Mitkov, R., 1999. Anaphora resolution: the state of the art (pp. 1-34). School of Languages and European Studies, University of Wolverhampton.

[25] Ashima, A. and Mohana, C.R., 2016, September. Anaphora resolution in Hindi: A hybrid approach. In The International Symposium on Intelligent Systems Technologies and Applications (pp. 815-830). Springer, Cham.

[26] Hobbs, J.R., 1978. Resolving pronoun references. Lingua, 44(4), pp.311-338.

[27] Leffa, V.J., 2003. Anaphora resolution without world knowledge. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada, 19(1), pp.181-200.

[28] Denber, M., 1998. Automatic resolution of anaphora in English. Eastman Kodak Co.

[29] Hobbs, J.R., 1977. Pronoun resolution. ACM SIGART Bulletin, (61), pp.28-28.

[30] Brennan, S.E., Friedman, M.W. and Pollard, C., 1987, July. A centering approach to pronouns. In 25th Annual Meeting of the Association for Computational Linguistics (pp. 155-162).

[31] Lappin, S. and Leass, H.J., 1994. An algorithm for pronominal anaphora resolution. Computational linguistics, 20(4), pp.535-561.

[32] Niyu Ge, J. Hale, and E. Charniak. 1998. A Statistical Approach to Anaphora Resolution. In Proceedings of the Sixth Workshop on Very Large Corpora, COLING-ACL '98, pages 161–170, Montreal, Canada.

[33] Hobbs, J.R. and Shieber, S.M., 1987. An algorithm for generating quantifier scopings. Computational Linguistics, 13, pp.47-63.

[34] Soon, W.M., Ng, H.T. and Lim, D.C.Y., 2001. A machine learning approach to coreference resolution of noun phrases. Computational linguistics, 27(4), pp.521-544.

[35] Daniel G Bobrow. A question-answering system for high school algebra word problems. In Proceedings of the October 27-29, 1964, fall joint computer conference, part I, pp. 591–614. ACM, 1964.

[36] T. Winograd. 1972. Understanding natural language. Cognitive psychology, 3:1–191.

[37] J. Hobbs, "Resolving pronoun references," in Readings in Natural Language Processing (B. Grosz, K. Sparck Jones, and B. Lynn Webber, eds.), Morgan Kaufmann, 1978.

[38] Chomsky, N., 1957. Syntactic structures. The Hague: Mouton. (1965) Aspects of the theory of syntax. Cambridge, Mass.: MIT Press. (1981) Lectures on Government and Binding. Dordrecht: Foris. (1982) Some Concepts and Consequences of the Theory of Government and Binding. LI Monographs, 6, pp.1-52.

[39] Chomsky, N., Jacobs, R.A. and Rosenbaum, P.S., 1970. Remarks on nominalization. 1970, 184, p.221.

[40] Chomsky, N., 1993. Lectures on government and binding: The Pisa lectures (No. 9). Walter de Gruyter.

[41] Chomsky, N., 1995. The minimalist program (current studies in linguistics 28). Cambridge et al.

[42] Jackendoff, R., 1977. X-bar syntax: A study of phrase structure. Linguistic Inquiry Monographs Cambridge, Mass, (2), pp.1-249.

[43] Joshi, Aravind K. and Steve Kuhn. 1979. Centered logic: The role of entity centered sentence representation in natural language inferencing. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 435–439, Tokyo.

[44] Joshi, Aravind K. and Scott Weinstein. 1981. Control of inference: Role of some aspects of discourse structure – centering. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 385–387, Vancouver, B.C.

[45] S.E. Brennan, M.W. Friedmann, and C.J. Pollard. 1987. A Centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the ACL, pp. 155–162, Stanford.

[46] M. Strube. Never look back: An alternative to centering. In Proceedings of the 17th international conference on Computational Linguistics-Volume 2, pp. 1251–1257. Association for Computational Linguistics, 1998.

[47] Sidner, Candace L. 1979. Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse. Ph.D. thesis, Massachusetts Institute of Technology, Artificial Intelligence Laboratory.

[48] Mann, W.C. and Thompson, S.A., 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), pp.243-281.

[49] Grosz, B. and Sidner, C.L., 1986. Attention, intentions, and the structure of discourse. Computational linguistics.

[50] Cristea, Dan, Nancy Ide, and Laurent Romary. 1998. Veins theory: A model of global discourse cohesion and coherence. In Proceedings of the 17th International Conference on Computational Linguistics, pages 281–285, Montreal, Canada. Association for Computational Linguistics.

[51] Kamp, H., 1981. A theory of truth and semantic representation. Formal semantics-the essential readings, pp.189-222.

[52] Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.

[53] Heim, Irene. 1983. File change semantics and the familiarity theory of definiteness. In Rainer Bäuerle, Christoph Schwarze, and Arnim von Stechow, editors, Meaning, Use, and Interpretation of Language. Walter de Gruyter, Berlin, pages 164–189.

[54] S. Lappin and H. Leass, "An algorithm for pronominal anaphora resolution," Computational Linguistics, vol. 20, no. 4, pp. 535–561, 1994.

[55] Dagan, I., Justeson, J., Lappin, S., Leass, H. and Ribak, A., 1995. Syntax and lexical statistics in anaphora resolution. Applied Artificial Intelligence an International Journal, 9(6), pp.633-644.

[56] Kennedy, C. and Boguraev, B., 1996. Anaphora for everyone: Pronominal anaphora resolution without a parser. In COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics.

[57] Karlsson, F., Voutilainen, A., Heikkilä, J. and Anttila, A. eds., 2011. Constraint Grammar: a language-independent system for parsing unrestricted text (Vol. 4). Walter de Gruyter.

[58] B. Baldwin. CogNIAC: high precision coreference with limited knowledge and linguistic resources. In Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, pp. 38–45. Association for Computational Linguistics, 1997.

[59] Mitkov, Ruslan. 1994. An integrated model for anaphora resolution. In Proceedings of the 15th conference on Computational linguistics, pp. 1170–1176, Morristown, NJ, USA. Association for Computational Linguistics.

[60] Mitkov, Ruslan. 1996. Anaphora resolution: A combination of linguistic and statistical approaches. In Proceedings of the Discourse Anaphora and Anaphor Resolution (DAARC'96), Lancaster, UK.

[61] Mitkov, Ruslan. 1995. An uncertainty reasoning approach for anaphora resolution. In Proceedings of the Natural Language Processing Pacific Rim Symposium (NLPRS'95), pp. 149–154, Seoul, Korea.

[62] Mitkov, R., 1997. Two Engines are better than one: Generating more power and confidence in the search for the antecedent. Amsterdam Studies in the Theory and History of Linguistic Science Series 4, pp.225-234.

[63] Mitkov, Ruslan. 1998. Robust pronoun resolution with limited knowledge. In Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference, pp. 869–875, Montreal, Canada.

[64] Mitkov, R., Evans, R. and Orăsan, C., 2002. A new, fully automatic version of Mitkov's knowledge-poor pronoun resolution method. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002). Mexico City, Mexico.

[65] Connolly, D., Burger, J.D. and Day, D.S., 1997. A machine learning approach to anaphoric reference. In New methods in language processing (pp. 133-144).

[66] McCarthy, J.F. and Lehnert, W.G., 1995. Using decision trees for coreference resolution. In 14th International Joint Conference on Artificial Intelligence, pp 127-140, Montreal, Canada.

[67] Aone, C. and Bennett, S.W., 1995, August. Applying machine learning to anaphora resolution. In International Joint Conference on Artificial Intelligence (pp. 302-314). Springer, Berlin, Heidelberg.

[68] Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. 1999. Corpus-based learning for noun-phrase coreference resolution. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), pp. 285–291, College Park, Maryland.

[69] Soon, W.M., Ng, H.T. and Lim, D.C.Y., 2001. A machine learning approach to coreference resolution of noun phrases. Computational linguistics, 27(4), pp.521-544.

[70] Khan, M.A., Ali, M.N. and Khan, M.A., 2006, November. Treatment of pronominal anaphoric devices in Urdu discourse. In 2006 International Conference on Emerging Technologies (pp. 543-547). IEEE.

[71] Kalsoom et al. (1993), "Urdu Anaphora resolution in monologue", M.Sc. Computer Science Thesis, Department of Computer Science, University of Peshawar.

[72] Khan, M.A. and Nasir, J.A., 2008, October. Distributive anaphora resolution in Urdu discourse. In 2008 4th International Conference on Emerging Technologies (pp. 38-43). IEEE.

[73] L. Sobha and B. Patnaik. Vasisth: An anaphora resolution system for Malayalam and Hindi. In Symposium on Translation Support Systems, 2002.

[74] Sinha RM, Jain A. AnglaHindi: An English to Hindi machine-aided translation system. MT Summit IX, New Orleans, USA. 2003 Sep 23; 494–7.

[75] Pal TL, Dutta K, Singh P. Anaphora Resolution in Hindi: Issues and Challenges. International Journal of Computer Applications. 2012 Mar; 42(18).

[76] Singh S, Lakhmani P, Mathur P, Morwal S. Anaphora resolution in Hindi language using gazetteer method. International Journal on Computational Sciences and Applications IJCSA. 2014 Jun; 4:567–9.

[77] Chopra D, Purohit GN. Handling ambiguities and unknown words in named entity recognition using anaphora resolution. International Journal on Computational Sciences and Applications IJCSA. 2013 Oct; 3:456–63.

[78] Dutta K, Prakash N, Kaushik S. Resolving pronominal anaphora in Hindi using Hobbs algorithm. Web Journal of Formal Computation and Cognitive Linguistics. 2008 Jan; 1(10):5607.

[79] S. Agarwal, M. Srivastava, P. Agarwal, and R. Sanyal. Anaphora resolution in Hindi documents. In Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on, pages 452–458. IEEE, 2007.

[80] Prasad, R. and Strube, M., 2000. Discourse salience and pronoun resolution in Hindi. University of Pennsylvania Working Papers in Linguistics, 6(3), p.13.

[81] Uppalapu B, Sharma DM. Pronoun Resolution for Hindi. Proceedings DAARC2009. 2009 Apr 22; 5847.

[82] Lalitha Devi S, Sundar Ram V, Rao PRK. A Generic Anaphora Resolution Engine for Indian Languages. Proceedings 25th International Conference on Computational Linguistics, Coling. 2014. pp. 67–84.

[83] Dakwale P, Mujadia V, Sharma DM. A Hybrid Approach for Anaphora Resolution in Hindi. Proceeding 6th International Joint Conference on Natural Language Processing, IJCNLP, Nagoya, Japan. 2013 Oct 14-18:80–6.

[84] Lakhmani P, Singh S, Mathur P. Gazetteer Method for Resolving Pronominal Anaphora in Hindi Language. International Journal of Advances in Computer Science and Technology. 2014 Mar; 3.

[85] M. Butt and T. H. King. The status of case. Clause Structure in South Asian Languages, Vol. 61 Dayal, Veneeta; Mahajan, Anoop (Eds.), 2004.

[86] Siddiqui: Jamia Ul Qawaid by Dr. Abullais Siddiqui, Markazi Urdu Board, Lahore, 1971.

[87] Butt, M., Dyvik, H., King, T.H., Masuichi, H. and Rohrer, C., 2002. The parallel grammar project. In COLING-02: Grammar Engineering and Evaluation.

[88] Ijaz, M., Hussain, S. "Corpus Based Urdu Lexicon Development", in the Proceedings of Conference on Language Technology (CLT07), University of Peshawar, Pakistan, 2007.

[89] Schmidt: Urdu an essential grammar by Ruth Laila Schmidt, Routledge Grammars, ISBN: 0415163803, 1999.

[90] Humayoun, M., Hammarström, H. and Ranta, A., 2006. Urdu morphology, orthography and lexicon extraction. Chalmers tekniska högskola.

[91] Wikipedia (https://en.wikipedia.org/wiki/Reflexive_pronoun), Aug. 28, 2020

[92] Carnie, A., 2012. Syntax: A generative introduction (Vol. 18). John Wiley & Sons.

[93] Wikipedia (https://en.wikipedia.org/wiki/Distributive_pronoun), Aug. 28, 2020

[94] Kameyama, M., 1997. Recognizing referential links: An information extraction perspective. arXiv preprint cmp-lg/9707009.

[95] Wikipedia (https://en.wikipedia.org/wiki/Commonsense_knowledge_(artificial_intelligence)), Aug. 28, 2020

[96] Liu, H. and Singh, P., 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT technology journal, 22(4), pp.211-226.

[97] Rizvi, S.M.J., 2007. Development of algorithms and computational grammar for Urdu (Unpublished doctoral dissertation). PIEAS, Nilore, Islamabad.

[98] Carletta, J., 1996. Assessing agreement on classification tasks: the kappa statistic. arXiv preprint cmp-lg/9602004.

[99] Carletta, J., Isard, A., Isard, S., Kowtko, J.C., Doherty-Sneddon, G. and Anderson, A.H., 1997. The reliability of a dialogue structure coding scheme.

[100] Walker, M. and Moore, J.D., 1997. Empirical studies in discourse. Computational Linguistics, 23(1), pp.1-12.

[101] http://www.bbc.com/urdu

[102] www.urdupoint.com

[103] www.liveurdu.com

[104] www.hamariweb.com

[105] www.urduword.com

[106] www.kitabnagri.com

[107] https://www.rekhta.org/?lang=ur


Appendix A Reflexive pronoun in Urdu

Possessive Reflexive Non-Possessive Reflexive

وخد اانپ

ذبات وخد اےنپ

اےنپ اےنپ اینپ

اینپ اینپ اانپ اانپ

وخد اانپ وخد اےنپ

وخد اینپ


Appendix B Noun

وکیسکیم ریپ

ایلگن اڈنیلگن

انک ایلٹ

تشپ اڈنوایشین

اہھت انیپس

اپسہونی ااطولی

داوتنں اک ربش رتسب

دنہی ااطولی

ینہک اجاپن

وہٹن ارمہکی

ارمہکی داتن

ویانین ادنروین تھچ

ومیس رصم

نسح اڈنوایشین


اعمذ ریپو

دمرث ارگنزیی

رعامشن رٹیلنی/اہطرت اخہن

دعلی اڈنیلگن

ابعس اایلگنں

داشن اوگناھٹ

رخم ابزو

زرین ابل

دیحر نیچ

اضیفن اھبرت

دیعس اکدناھ

ارعش ہجنپ

اعرش اھبرت

اہجزنبی ریپو

دعس ایلٹ


اشرق اھتءڈنیل

واہب اکن

ربجان اٹگن

رعشی رفاسن

بیط اجاپن

شہ یمی ئ ر زیم

نسح اجاپین

ہحلط اھتءڈنیل

زملم رجنم

رہشوز تھچ

اایز رجینم

اشوہزی رماشک

یلع راض ینیچ

رہظم ہجنپ

اخدل وخااگبہ


ااجعز داتن

ارہظ داتن

اعرم ہنیس

ذبیہ دل

ااشتحم وکاہل

نیلقث دوااخہن

امحد رویس

ارنش مکش

رامحن زابن

اضنم اگل

افروق رس

زخہمی اصنب

اقمس ربعاین

واقص وکیسکیم

راض رعیب


رضخ اوگناھٹ

بیہص لسغ اخہن

رہشایر ابوریچ اخہن

ابہشز رفاسن

زارون انیپس

ارالسن رفایسیسن

رواشن زابن

دنمی رفش

ارالسن اعطم اگہ

اعبق ڑھکیک

رواحن اکاٹن

ریہش وکراییئ

ح نیئں رہچا

زادہ رگدن

رسرفاز دویار


ٰیفطصم رگسی

رصیق وھٹڑی

رفاقن انٹھگ

انسح رگسی

اسحن رماشک

واجتہ رمکہ

اشزنی رصم

ااطلف قلح

رایض اکمن

اربار آھکن

وتدیح م

رجینم


Appendix C Personal Pronoun

Pronouns اسم ضمیر

He وہ (مرد)

Her ان کی

Hers ان کی

Him ان کے

His ان کا

I میں

Me مجھے

Mine میرا

My میرا

Our ہمارا

Ours ہمارے

She وہ (عورت)

Their ان کے

Theirs انکے

Them ان کے

They وہ (جمع)

Us ہمارا

We ہم

You تو، تم، آپ

Your تیرا، تمہارا، آپ کا

Yours تیرا، تمہارا، آپ کا


Appendix D Noun Cases

Case Clitic form

Ergative نے

Accusative کو

Dative کے، کو

Instrumental سے

Genitive کا، کی، کے

Locative میں، پر، تک، تلے، تلک

Vocative اے
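Because the same clitic can mark more than one case (کو is listed under both accusative and dative, and کے under both dative and genitive), a lookup based on this table has to return candidate sets rather than a single case. The following is a minimal sketch under two stated assumptions: clitics appear as separate whitespace tokens, and each postpositional clitic attaches to the immediately preceding token. The function name `tag_case_markers` is hypothetical and is not taken from the thesis.

```python
# Clitic-to-case table copied from Appendix D. Several clitics are ambiguous,
# so each maps to a set of candidate cases rather than a single case.
CASE_CLITICS = {
    "نے": {"ergative"},
    "کو": {"accusative", "dative"},
    "کے": {"dative", "genitive"},
    "سے": {"instrumental"},
    "کا": {"genitive"},
    "کی": {"genitive"},
    "میں": {"locative"},
    "پر": {"locative"},
    "تک": {"locative"},
    "تلے": {"locative"},
    "تلک": {"locative"},
    "اے": {"vocative"},
}


def tag_case_markers(sentence):
    """Return (host_token, clitic, sorted_candidate_cases) for each clitic.

    Assumes clitics are separate whitespace tokens. Postpositional clitics are
    attached to the preceding token; the vocative particle اے precedes its
    noun, so it is attached to the following token instead.
    """
    tokens = sentence.split()
    tags = []
    for i, token in enumerate(tokens):
        cases = CASE_CLITICS.get(token)
        if cases is None:
            continue  # not a case clitic
        if token == "اے":
            host = tokens[i + 1] if i + 1 < len(tokens) else None
        else:
            host = tokens[i - 1] if i > 0 else None
        tags.append((host, token, sorted(cases)))
    return tags
```

The sketch over-generates on its own: کی is also a perfective form of کرنا and میں is also the first-person pronoun listed in Appendix C, so a practical tagger would need part-of-speech context to filter these readings.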


Appendix E Distributive pronoun in Urdu

ہر ایک، کوئی ایک، کوئی بھی، کئی ایک


Appendix F Plurals

Singular Plural

حجاج حاجی

ادبا ادیب

ابدان بدن

تحائف تحفہ

بکرے بکرا

تجربات تجربہ

جرائم جرم

اجسام جسم

حادثات حادثہ

بلّیاں بلّی

بندر بندر

بھالو بھالو

بھیڑیے بھیڑیا

بیل بیل

پرندے پرندہ

پنسلیں پنسل

چپلیں چپل

چوہے چوہا

چیتے چیتا

خرگوش خرگوش

خوشیاں خوشی

دوکانات دوکان

راستے راستہ

زرافے زرافہ

سانپ سانپ

شیر شیر

عقاب عقاب

عمارتیں عمارت

عورتیں عورت

قلمیں قلم

اقوال قول

اقوام قوم

کتے کتا

کھانے کھانا

گائے گائے

گدھے گدھا

گلیاں گلی

گھوڑے گھوڑا

لڑکے لڑکا

لڑکیاں لڑکی

مرد مرد

مکانات مکان

مگرمچھ مگرمچھ

ناول ناول

ہاتھی ہاتھی

کئی نہر نہر
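Number agreement between an anaphor and its antecedent is a standard filtering constraint, and the singular/plural pairs in this appendix can supply number features for nouns. The following is a minimal sketch, assuming a small hand-copied subset of the pairs; the function names `number_of` and `numbers_agree` are hypothetical. Nouns whose singular and plural forms are identical (for example بندر, شیر, مرد) receive both values, so they never wrongly eliminate a candidate.

```python
# A subset of (singular, plural) pairs copied from Appendix F.
PAIRS = [
    ("حاجی", "حجاج"), ("بدن", "ابدان"), ("بکرا", "بکرے"),
    ("لڑکا", "لڑکے"), ("لڑکی", "لڑکیاں"), ("عورت", "عورتیں"),
    ("بندر", "بندر"), ("شیر", "شیر"), ("مرد", "مرد"), ("ہاتھی", "ہاتھی"),
]
SINGULARS = {sg for sg, _ in PAIRS}
PLURALS = {pl for _, pl in PAIRS}


def number_of(noun):
    """Return the possible numbers of a noun: {"sg"}, {"pl"}, both, or an
    empty set when the noun is not in the lexicon."""
    result = set()
    if noun in SINGULARS:
        result.add("sg")
    if noun in PLURALS:
        result.add("pl")
    return result


def numbers_agree(first, second):
    """Number-agreement filter: reject only when both numbers are known
    and have no value in common."""
    a, b = number_of(first), number_of(second)
    if not a or not b:
        return True  # unknown words are not filtered out
    return bool(a & b)
```

This is deliberately simplified: in Urdu, لڑکے is also the oblique singular of لڑکا, so a real resolver would combine this lookup with case information rather than treat لڑکے as plural in every context.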


Appendix G Numbers

Numbers اعداد

one ایک

two دو

three تین

four چار

five پانچ

six چھ

seven سات

eight آٹھ

nine نو

ten دس

eleven گیارہ

twelve بارہ

thirteen تیرہ

fourteen چودہ

fifteen پندرہ

sixteen سولہ

seventeen سترہ

eighteen اٹھارہ

nineteen انیس

twenty بیس

hundred سو

one thousand ایک ہزار

million دس لاکھ


Appendix H Quantifiers

Quantities مقداریں

few چند

little چھوٹا

many بہت

much کثیر

parts حصے

some کچھ

a few چند ایک

whole تمام


Appendix I Verbs

English Urdu

Boil ابالنا

Hold اٹھانا

Fly اڑانا

Grow اگانا

Come آنا

Doze اونگھنا

Distribute بانٹنا

Tell بتانا

Make sit بٹھانا

Play بجانا

Save بچانا

Increase بڑھانا

بسانا

Call بلانا

Invite بلانا

Make بنانا

بھگونا

Speak بولنا

Raise پالنا

Find پانا

Cook پکانا

Feed پلانا

Blow پھونکنا

Ask پوچھنا

Watch تکنا

Fry تلنا

Break توڑنا

Weigh تولنا

Avoid ٹالنا

Invert الٹانا

Stitch ٹانکنا

Grope ٹٹولنا

Reject ٹھکرانا

Go جانا

Know جاننا

Freeze جمنا

Join جوڑنا

Like چاہنا

Leave چھوڑنا

Kiss چومنا

Buy خریدنا

Run دوڑنا

See دیکھنا

Give دینا

Put ڈالنا

Make sleep سلانا

Listen سننا

Think سوچنا

Dry سوکھنا

Sleep سونا

Do کرنا

Say کہنا

Get لینا

To be ہونا


Appendix J Attribute

Entity Attrib1 Attrib2 Attrib3 Attrib4

باسی بدذائقہ مزیدار لذیذ کھانا

گرم شاندار آرام دہ نرم بستر

چھوٹا لنڈا بڑا گرم کوٹ

مہنگا سستا قیمتی سادہ لباس

پرانا سستا نیا قیمتی فرنیچر

پرانا نیا چھوٹا بڑا مکان

ہوادار خوبصورت صاف بڑا باورچی خانہ

تازہ پرانا نیا اخبار

خراب پرانا اچھا نیا قلم

نادر قیمتی پرانی نئی تصویر

بڑی قیمتی پرانی نئی جرابیں

خالی بھرا پرانا نیا بٹوا

نیم سرکاری امیر سرکاری اچھا بینک

پاس فیل نالائق لائق طالبعلم

اچھا سخت محنتی قابل استاد


Appendix K Category/Class

Class Member1 Member2 Member3 Member4

بیالوجی میتھ کیمسٹری فزکس مضمون

کاغذ پنسل تراش پنسل ربڑ سٹیشنری

ٹرک جیپ کار بس گاڑی

نثر نگاری ڈرامہ نگاری شاعری ادب

مولوی استاد پروفیسر استاد

حساب سائنس انگریزی اردو مضمون

ناول میگزین رسالہ اخبار مطالعہ

سنوکر فٹبال ہاکی کرکٹ کھیل

یونیورسٹی مدرسہ کالج سکول تعلیمی ادارہ

ک وفوٹرگایف نئلنئگر ایف گنٹنیپ اطخیط آرٹ

نقش و نگاری نغمہ نگاری گلوکاری اداکاری فن


Appendix L Oblique

Word Oblique

پھولوں پھول

کاموں کام

ناولوں ناول

گھوڑوں گھوڑا

لڑکوں لڑکا

لڑکیوں لڑکی

آدمیوں آدمی

لڑکوں لڑکے

لڑکیوں لڑکیاں

مردوں مرد

عورتوں عورت

کمروں کمرہ

عمارتوں عمارت

عورتوں عورتیں

غلاموں غلام

نوکروں نوکر

امیدوں امید

امیدواروں امیدوار

نصیبوں نصیب

امیروں امیر

غریبوں غریب

فقیروں فقیر

نوابوں نواب

زمینوں زمین

شعبوں شعبہ

گدھوں گدھا

بھینسوں بھینس


Appendix M Adverb

English Adverbs Urdu Adverbs

Now ابھی

Recently ابھی

Right now ابھی ابھی

Still ابھی تک

Yet ابھی تک

Today آج

Tonight آج رات

This morning آج صبح

Well اچھا

There ادھر

Really اصل میں

Frequently اکثر

Together اکٹھے

Alone اکیلے

Next week اگلے ہفتے

Slowly آہستگی سے

Quite بالکل/تمام/سراسر

Out باہر

Later بعد میں

Hardly بمشکل

Barely بمشکل

Mostly بیشتر

Already پہلے سے

Then تب

Almost تقریباً

Fast تیز رو

Quickly تیزی سے

Pretty خوبصورت/حسین

Away دور

Hard سخت

Seldom شاذ و نادر

Rarely شاذ و نادر

Usually عموماً

Soon عنقریب

Immediately فوراً

Absolutely قطعاً

Occasionally کبھی کبھار ہونے والا

Sometimes کبھی کبھی

Never کبھی نہیں

Lately کچھ دن ہوئے

Anywhere کسی جگہ سے

Yesterday کل

Tomorrow کل (آنے والا)

Nowhere کہیں نہیں

Last night گزری ہوئی رات

Ago گزشتہ

Home گھر

Carefully محتاط انداز میں

Very نہایت/زیادہ

Everywhere ہر جگہ

Always ہمیشہ

Over there وہاں پر

Here یہاں
