
Animacy Annotation in the Hindi Treebank

Itisree Jena, Riyaz Ahmad Bhat, Sambhav Jain and Dipti Misra Sharma
Language Technologies Research Centre, IIIT-Hyderabad, India
{itisree|riyaz.bhat|sambhav.jain}@research.iiit.ac.in, [email protected]

Abstract

In this paper, we discuss our efforts to annotate nominals in the Hindi Treebank with the semantic property of animacy. Although the treebank already encodes lexical information at a number of levels, such as morph and part of speech, the addition of animacy information seems promising given its relevance to varied linguistic phenomena. The suggestion is based on the theoretical and computational analysis of the property of animacy in the context of anaphora resolution, syntactic parsing, verb classification and argument differentiation.

1 Introduction

Animacy can either be viewed as a biological property or as a grammatical category of nouns. In a strictly biological sense, all living entities are animate, while all other entities are seen as inanimate. In its linguistic sense, however, the term is synonymous with a referent's ability to act or instigate events volitionally (Kittilä et al., 2011). Although seemingly different, linguistic animacy can be implied from biological animacy. In linguistics, the manifestation of animacy and its relevance to linguistic phenomena have been studied quite extensively. Animacy has been shown, cross-linguistically, to control a number of linguistic phenomena: case marking, argument realization, topicality and discourse salience are some of the phenomena highly correlated with the property of animacy (Aissen, 2003). In linguistic theory, however, animacy is not seen as a dichotomous variable but as a range capturing finer distinctions of linguistic relevance. The hierarchy proposed in Silverstein's influential article on "animacy hierarchy" (Silverstein, 1986) ranks nominals on the following gradience: 1st pers > 2nd pers > 3rd anim > 3rd inanim. Several such hierarchies of animacy have been proposed following Silverstein (1986); the basic scale, taken from Aissen (2003), makes a three-way distinction: humans > animates > inanimates. These hierarchies can be said to be based on the likelihood of the referent of a nominal to act as an agent in an event (Kittilä et al., 2011). Thus, the higher a nominal is on these hierarchies, the higher the degree of agency/control it has over an action. In morphologically rich languages, the degree of control/agency is expressed by case marking: case markers capture the degree of control a nominal has in a given context (Hopper and Thompson, 1980; Butt, 2006), ranking nominals on a continuum of control as shown in (1)¹. Nominals marked with Ergative case have the highest control and the ones marked with Locative the lowest.

Erg > Gen > Inst > Dat > Acc > Loc    (1)

Of late, the systematic correspondences between animacy and linguistic phenomena have been explored for various NLP applications. It has been noted that animacy provides important information for, to mention a few, anaphora resolution (Evans and Orasan, 2000), argument disambiguation (Dell'Orletta et al., 2005), syntactic parsing (Øvrelid and Nivre, 2007; Bharati et al., 2008; Ambati et al., 2009) and verb classification (Merlo and Stevenson, 2001).

¹Ergative, Genitive, Instrumental, Dative, Accusative and Locative, in the given order.

Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, pages 159–167, Sofia, Bulgaria, August 8-9, 2013. © 2013 Association for Computational Linguistics

Despite the fact that animacy could play an important role in NLP applications, its annotation is not usually featured in treebanks or in other annotated corpora used for developing these applications. There are very few annotation projects that have included animacy in their annotation manual, following its strong theoretical and computational implications. One such work, motivated by the theoretical significance of the property of animacy, is Zaenen et al. (2004). They make use of a coding scheme drafted for a paraphrase project (Bresnan et al., 2002) and present an explicit annotation scheme for animacy in English. The annotation scheme assumes a three-way distinction, distinguishing Human, Other animates and Inanimates. Among the latter two categories, 'Other animates' is further sub-categorized into Organizations and Animals, while the category of 'Inanimates' further distinguishes between concrete and non-concrete, and time and place nominals. As per the annotation scheme, nominals are annotated according to the animacy of their referents in a given context. Another annotation work that includes animacy for nominals is Teleman (1974); however, the distinction made there is binary, between human and non-human referents of a nominal in a given context. In a recent work on animacy annotation, Thuilier et al. (2012) have annotated a multi-source French corpus with animacy and verb semantics, on the lines of Zaenen et al. (2004).

Apart from manual annotation for animacy, lexical resources like wordnets are an important source of this information, if available. These resources usually cover animacy, though indirectly (Fellbaum, 2010; Narayan et al., 2002). Although a wordnet is an easily accessible resource for animacy information, there are some limitations on its use, as discussed below:

1. Coverage: Hindi WordNet only treats common nouns, while proper nouns are excluded (except famous names); see Table 1. The problem is severe where the domain of the text includes more proper than common nouns, which is the case with the Hindi Treebank, as it is annotated on newspaper articles.

2. Ambiguity: Since words can be ambiguous, the animacy listed in a wordnet can only be used in the presence of a high-performance word sense disambiguation system. As shown in Table 2, only 38.02% of nouns have a single sense as listed in Hindi WordNet.

3. Metonymy or Complex Types: Domains like newspaper articles are filled with metonymic expressions like courts, institute names, country names etc., which can refer to a building, a geographical place or a group of people depending on the context of use. These are not ambiguous per se but show different aspects of their semantics in different contexts (they are logically polysemous). Hindi WordNet treats these types of nouns as inanimate.

Nominals in HTB    Hindi WordNet    Coverage
78,136             65,064           83.27%

Table 1: Coverage of Hindi WordNet on HTB nominals.

HTB Nominals with WN Semantics    Single Unique Sense in Hindi WordNet
65,064                            24,741 (38.02%)

Table 2: Nominals in HTB with multiple senses.

Given these drawbacks, we have included animacy information manually in the annotation of the Hindi Treebank, as discussed in this work. In the rest of the paper, we discuss the annotation of nominal expressions with animacy and the motivation for the same. Section 2 gives a brief overview of the Hindi Treebank with all its layers. Section 3 motivates the annotation of nominals with animacy, followed by the annotation efforts and the issues encountered in Section 4. Section 5 concludes the paper with a discussion of possible future directions.

2 Description of the Hindi Treebank

In the following, we give an overview of the Hindi Treebank (HTB), focusing mainly on its dependency layer. The Hindi-Urdu Treebank (Palmer et al., 2009; Bhatt et al., 2009) is a multi-layered and multi-representational treebank. It includes three levels of annotation, namely two syntactic levels and one lexical-semantic level. One syntactic level is a dependency layer which follows the Computational Pāṇinian Grammar, CPG (Begum et al., 2008).

The dependency layer is inspired by Pāṇinian grammatical theory. The other syntactic level is annotated with phrase structure inspired by the Chomskyan approach to syntax (Chomsky, 1981) and follows a binary branching representation. The third layer of annotation, a purely lexical-semantic one, encodes semantic relations following the English PropBank (Palmer et al., 2005).

In the dependency annotation, relations are mainly verb-centric. The relation that holds between a verb and its arguments is called a kāraka relation. Besides kāraka relations, dependency relations also exist between nouns (genitives), between nouns and their modifiers (adjectival modification, relativization) and between verbs and their modifiers (adverbial modification, including subordination). CPG provides an essentially syntactico-semantic dependency annotation, incorporating kāraka relations (e.g., agent, theme), non-kāraka relations (e.g., reason, purpose) and other (part-of) relations. A complete tag set of dependency relations based on CPG can be found in Bharati et al. (2009); the ones starting with 'k' are largely Pāṇinian kāraka relations and are assigned to the arguments of a verb. Figure 1 encodes the dependency structure of (5): the preterminal node is the part of speech of a lexical item (e.g. NN, VM, PSP). The lexical items with their part-of-speech tags are grouped into constituents called chunks (e.g. NP, VGF) as part of the sentence analysis. The dependencies are attached at the chunk level, marked with 'drel' in the SSF format. Here k1 is the agent of the action (KAyA 'eat'), whereas k2 is the object or patient.

(5) s\@yAn sº KAyA ।
    Sandhya-Erg apple-Nom eat-Perf
    'Sandhya ate an apple.'

Offset  Token   Tag
1       ((      NP
1.1     s\@yA   NNP
1.2     n       PSP
        ))
2       ((      NP
2.1     sº      NN
        ))
3       ((      VGF
3.1     KAyA    VM
        ))

Figure 1: Annotation of an example sentence in SSF.

Despite the fact that the Hindi Treebank already features a number of layers, as discussed above, there have been different proposals to enrich it further. Hautli et al. (2012) proposed an additional layer to the treebank, for the deep analysis of the language, incorporating the functional structure (or f-structure) of Lexical Functional Grammar, which encodes traditional syntactic notions such as subject, object, complement and adjunct. Dakwale et al. (2012) have also extended the treebank with anaphoric relations, with a motive to develop a data-driven anaphora resolution system for Hindi. Given this scenario, our effort is to enrich the treebank with animacy annotation. In the following sections, we discuss in detail the annotation of the animacy property of nominals in the treebank and the motivation for the same.

3 Motivation: In the Context of Dependency Parsing

Hindi is a morphologically rich language; grammatical relations are depicted by its morphology via case clitics. Hindi has a morphologically split-ergative case marking system (Mahajan, 1990; Dixon, 1994). Case marking is dependent on the aspect of the verb (progressive/perfective), its transitivity (transitive/intransitive) and the type of the nominal (definite/indefinite, animate/inanimate). Given this peculiar behavior of case marking in Hindi, the arguments of a (e.g. transitive) verb have a number of possible configurations with respect to case marking, as shown in the statistics drawn from the Hindi Treebank released for the MTPIL Hindi Dependency Parsing shared task (Sharma et al., 2012) in Table 3. In almost 15% of the transitive clauses, there is no morphological case marker on any of the arguments of the verb, which, in the context of data-driven parsing, means the lack of an explicit cue for machine learning. Although in the other cases there is a case marker on at least one argument of the verb, the ambiguity of case markers (the one-to-many mapping between case markers and grammatical functions presented in Table 4) further worsens the situation (however, see Ambati et al. (2010) and Bhat et al. (2012) for the impact of case markers on parsing Hindi/Urdu). Consider the examples in (6a–e): the instrumental marker se is extremely ambiguous. It can mark instrumental adjuncts as in (6a), source expressions as in (6b), material as in (6c), comitatives as in (6d), and causes as in (6e).

(7) EÊEwyA ÚAnA Ê g rhF h{ ।
    bird-Nom grain-Nom devour-Prog
    'A bird is devouring grain.'
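The "almost 15%" figure can be checked against the counts reported in Table 3. The snippet below is our own illustration (the dictionary layout and variable names are ours; the counts are copied from the table):

```python
# Counts of transitive clauses from Table 3: whether the k1 (agent)
# and k2 (patient) arguments carry an overt case marker.
counts = {
    ("k1_unmarked", "k2_unmarked"): 1276,
    ("k1_unmarked", "k2_marked"):   741,
    ("k1_marked",   "k2_unmarked"): 5373,
    ("k1_marked",   "k2_marked"):   966,
}

total = sum(counts.values())                      # 8356 clauses in all
no_cue = counts[("k1_unmarked", "k2_unmarked")]   # no overt case marker at all
share = 100.0 * no_cue / total

# 1276/8356 is roughly 15.3%, i.e. "almost 15%" of transitive clauses
# give a data-driven parser no morphological cue on either core argument.
print(f"{no_cue}/{total} = {share:.1f}%")
```

The same counts also show that in about 64% of the clauses (5373/8356) only the agent is marked, which is exactly where the one-to-many marker-to-function ambiguity of Table 4 comes into play.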

(6a) mohn n ÊAºF s tAlA KolA ।
     Mohan-Erg key-Inst lock-Nom open
     'Mohan opened the lock with a key.'

(6b) gFtA n EÚ¥F s sAmAn m\gvAyA ।
     Geeta-Erg Delhi-Inst luggage-Nom procure
     'Geeta procured the luggage from Delhi.'

(6c) m Et kArn p(Tr s m Et ºnAyF ।
     sculptor-Erg stone-Inst idol-Nom make
     'The sculptor made an idol out of stone.'

(6d) rAm kF [yAm s ºAt h ।
     Ram-Gen Shyaam-Inst talk-Nom happen
     'Ram spoke to Shyaam.'

(6e) ºAErf s kI Psl\ tºAh ho gyF\ ।
     rain-Inst many crops-Nom destroy happen-Perf
     'Many crops were destroyed due to the rain.'

              K2-Unmarked    K2-Marked
K1-Unmarked   1276           741
K1-Marked     5373           966

Table 3: Co-occurrence of marked and unmarked (core) verb arguments in HTB.

                  n/ne       ko/ko    s/se            m\/meN     pr/par     kA/kaa
                  (Ergative) (Dative) (Instrumental)  (Locative) (Locative) (Genitive)
k1 (agent)        7222       575      21              11         3          612
k2 (patient)      0          3448     451             8          24         39
k3 (instrument)   0          0        347             0          0          1
k4 (recipient)    0          1851     351             0          1          4
k4a (experiencer) 0          420      8               0          0          2
k5 (source)       0          2        1176            12         1          0
k7 (location)     0          1140     308             8707       3116       19
r6 (possession)   0          3        1               0          0          2251

Table 4: Distribution of case markers across case functions.

A conventional parser has no cue for the disambiguation of the instrumental case marker se in examples (6a–e); similarly, in example (7), it is hard for the parser to know whether 'bird' or 'grain' is the agent of the action 'devour'. Traditionally, syntactic parsing has largely been limited to the use of only a few lexical features. Features like POS-tags are way too coarse to provide deep information valuable for syntactic parsing, while on the other hand lexical items often suffer from lexical ambiguity or out-of-vocabulary problems. So, in order to assist the parser towards better judgments, we need to complement the morphology somehow. Careful observation shows that simple world knowledge about the nature (e.g. living/non-living, artifact, place) of the participants is enough to disambiguate such cases. For Swedish, Øvrelid and Nivre (2007) and Øvrelid (2009) have shown improvement, with animacy information, in the differentiation of the core arguments of a verb in dependency parsing. Similarly for Hindi, Bharati et al. (2008) and Ambati et al. (2009) have shown that even when the training data is small, simple animacy information can boost dependency parsing accuracies, particularly for the differentiation of core arguments. In Table 5, we show the distribution of animacy with respect to case markers and dependency relations in the annotated portion of the Hindi Treebank. The high rate of co-occurrence between animacy and dependency relations makes a clear statement about the role animacy can play in parsing. Nominals marked with the dependency relations k1 'agent', k4 'recipient' and k4a 'experiencer' are largely annotated as Human, while k3 'instrument' is marked as Inanimate, which confirms our conjecture that with animacy information a parser can reliably predict linguistic patterns. Apart from parsing, animacy has been reported to be beneficial for a number of natural language applications (Evans and Orasan, 2000; Merlo and Stevenson, 2001). Following these computational implications of animacy, we started encoding this property of nominals explicitly in our treebank. In the next section, we present these efforts followed by the inter-annotator studies.

Relation  Marker           Human  Other-Animate  Inanimate
k1        n/ne (Erg)       2321   630            108
          ko/ko (Dat/Acc)  172    8              135
          s/se (Inst)      6      0              14
          m\/me (Loc)      0      0              7
          pr/par (Loc)     0      0              1
          kA/kaa (Gen)     135    2              99
          φ (Nom)          1052   5              3072
k2        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  625    200            226
          s/se (Inst)      67     0              88
          m\/me (Loc)      2      0              6
          pr/par (Loc)     5      0              37
          kA/kaa (Gen)     15     0              14
          φ (Nom)          107    61             2998
k3        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  0      0              0
          s/se (Inst)      2      0              199
          m\/me (Loc)      0      0              0
          pr/par (Loc)     0      0              0
          kA/kaa (Gen)     0      0              0
          φ (Nom)          0      0              20
k4        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  597    0              13
          s/se (Inst)      53     0              56
          m\/me (Loc)      0      0              0
          pr/par (Loc)     0      0              0
          kA/kaa (Gen)     0      0              0
          φ (Nom)          7      0              8
k4a       n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  132    0              8
          s/se (Inst)      4      0              2
          m\/me (Loc)      0      0              0
          pr/par (Loc)     0      0              0
          kA/kaa (Gen)     1      0              0
          φ (Nom)          56     0              1
k5        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  0      0              0
          s/se (Inst)      7      0              460
          m\/me (Loc)      0      0              1
          pr/par (Loc)     0      0              0
          kA/kaa (Gen)     0      0              0
          φ (Nom)          0      0              2
k7        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  4      0              0
          s/se (Inst)      3      0              129
          m\/me (Loc)      0      1977           1563
          pr/par (Loc)     66     0              1083
          kA/kaa (Gen)     0      0              8
          φ (Nom)          5      0              1775
r6        n/ne (Erg)       0      0              0
          ko/ko (Dat/Acc)  0      0              0
          s/se (Inst)      1      0              0
          m\/me (Loc)      0      0              0
          pr/par (Loc)     0      0              0
          kA/kaa (Gen)     156    80             605
          φ (Nom)          13     3              25

Table 5: Distribution of semantic features with respect to case markers and dependency relations (k1 'agent', k2 'patient', k3 'instrument', k4 'recipient', k4a 'experiencer', k5 'source', k7 'location', r6 'possession').

4 Animacy Annotation

Following Zaenen et al. (2004), we make a three-way distinction, distinguishing between Human, Other-Animate and Inanimate referents of a nominal in a given context. The animacy of a referent is decided based on its sentience and/or its control/volitionality in a particular context. Since, prototypically, agents tend to be animate and patients tend to be inanimate (Comrie, 1989), higher animates such as humans, dogs etc. are annotated as such in all contexts, since they frequently tend to be seen in contexts of high control. However, lower animates such as insects, plants etc. are annotated as 'Inanimate' because human languages ascribe them little or no control, like inanimates (Kittilä et al., 2011). Non-sentient referents, except intelligent machines and vehicles, are annotated as 'Inanimate' in all contexts. Intelligent machines like robots and vehicles, although they lack any sentience, possess an animal-like behavior which separates them from inanimate nouns with no animal resemblance, reflected in human language as control/volitionality. These nouns, unlike humans and other higher animates, are annotated as per the context they are used in: they are annotated as 'Other-Animate' only in their agentive roles. Nominals that vary in sentience across contexts are annotated based on their reference in a given context, as discussed in Subsection 4.2. These nominals include country names referring to geographical places, teams playing for the country, governments or their inhabitants; and organizations, including courts, colleges, schools, banks etc. Unlike Zaenen et al. (2004), we do not further sub-categorize the 'Other-Animate' and 'Inanimate' classes.
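The category decisions described above can be condensed into a small rule sketch. This is our own illustration, not part of the treebank tooling; the boolean predicates (is_human, is_higher_animate, etc.) are hypothetical stand-ins for the lexical and contextual judgments an annotator makes:

```python
# Sketch of the annotation decision rules from Section 4 (assumed predicates,
# not treebank software): humans and higher animates keep their category in
# all contexts, lower animates and non-sentient nouns are Inanimate, and
# machines/vehicles are Other-Animate only when used agentively.
def animacy_label(is_human=False, is_higher_animate=False,
                  is_lower_animate=False, is_machine_or_vehicle=False,
                  agentive_in_context=False):
    if is_human:
        return "Human"                 # e.g. 'Mohan', in all contexts
    if is_higher_animate:
        return "Other-Animate"         # e.g. dogs, in all contexts
    if is_lower_animate:
        return "Inanimate"             # insects, plants: little/no control
    if is_machine_or_vehicle:
        # robots, vehicles: context-dependent, animate only in agentive roles
        return "Other-Animate" if agentive_in_context else "Inanimate"
    return "Inanimate"                 # all other non-sentient referents

print(animacy_label(is_machine_or_vehicle=True))  # → Inanimate (non-agentive)
```

Context-varying nominals such as country and organization names (Subsection 4.2) fall outside this sketch: for them the label follows the reference in the given context rather than any fixed lexical property.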

We do not distinguish between Organizations and Animals in 'Other-Animate', nor between Time and Place in 'Inanimate'.

The process of animacy annotation in the Hindi Treebank is straightforward. For every chunk in a sentence, the animacy of its head word is captured in an 'attribute-value' pair in the SSF format, as shown in Figure 3. Hitherto, around 6485 sentences of the Hindi Treebank have been annotated with animacy information.

Offset  Token   Tag   Feature structure
1       ((      NP    <semprop='human'>
1.1     mohn    NNP
1.2     n       PSP
        ))
2       ((      NP    <semprop='other-animate'>
2.1     Eº¥F    NN
2.2     ko      PSP
        ))
3       ((      NP    <semprop='inanimate'>
3.1     ºotl    NN
3.2     s       PSP
        ))
4       ((      NP    <semprop='inanimate'>
4.1     Ú D     NN
        ))
5       ((      VGF
5.1     EplAyA  VM
        ))

Figure 3: Semantic annotation in SSF.

(8) mohn n Eº¥F ko ºotl s Ú D EplAyA ।
    Mohan-Erg cat-Dat bottle-Inst milk-Nom drink-Perf
    'Mohan fed milk to the cat with a bottle.'

In the following, we discuss some of the interesting cross-linguistic phenomena which added some challenge to the annotation.

4.1 Personification

Personification is a type of meaning extension whereby an entity (usually non-human) is given human qualities. Personified expressions are annotated, in our annotation procedure, as Human, since that is the sense they carry in such contexts. However, to retain their literal sense, two attributes are added: one for their context-bound (metaphorical) sense and the other for their context-free (literal) sense. In example (9), 'waves' is annotated with literal animacy Inanimate and metaphoric animacy Human, as shown in Figure 4 (offset 2).

(9) sea coast-Loc waves-Nom dance-Prog
    'Waves are dancing on the sea shore.'

Offset  Token   Tag   Feature structure
1       ((      NP
1.1     sAgr    NNC
1.2     tV      NN
1.3     pr      PSP
        ))
2       ((      NP    <semprop='inanimate' metaphoric='human'>
2.1     lhr\    NN
        ))
3       ((      VGF
3.2     rhF     VAUX
3.3     h{\     AUX
        ))

Figure 4: Semantic annotation in SSF.

4.2 Complex Types

The Hindi Treebank is largely built on a newspaper corpus. Logically polysemous expressions (metonymies) such as government, court, newspaper etc. are very frequent in news reporting. These polysemous nominals can exhibit contradictory semantics in different contexts. In example (10a), court refers to a person (judge) or a group of persons (jury), while in (10b) it is a building (see Pustejovsky (1996) for the semantics of complex types). In our annotation procedure, such expressions are annotated as per the sense or reference they carry in a given context. So, in the case of (10a), court is annotated as Human, while in (10b) it is annotated as Inanimate.

(10a) aÚAlt n m kÚmkA P{\slA s nAyA ।
      court-Erg case-Gen decision-Nom declare-Perf
      'The court declared its decision on the case.'

(10b) m{\ aÚAlt m\ h  ।
      I-Nom court-Loc be-Prs
      'I am in the court.'

4.3 Inter-Annotator Agreement

We measured inter-annotator agreement on a set of 358 nominals (∼50 sentences) using Cohen's kappa. Three annotators annotated the same data set separately. The nominals were annotated in context, i.e., the annotation was carried out considering the role and reference of a nominal in a particular sentence. The kappa statistics, presented in Table 6, show a significant understanding among the annotators of the property of animacy. In Table 7, we report the confusion between the annotators on the three animacy categories. The confusion is highest for the 'Inanimate' class. Annotators disagree on this category because of its fuzziness. As discussed earlier, although the 'Inanimate' class enlists biologically inanimate entities, some entities may behave like animates in some contexts: they may be sentient and have high linguistic control. The difficulty in deciphering the exact nature of the reference of these nominals, as observed, is the reason behind the confusion. The confusion is observed for nouns like organization names, lower animates and vehicles. Apart from the linguistically and contextually defined animacy, there was no confusion, as expected, in the understanding of biological animacy.

Annotators   κ
ann1-ann2    0.78
ann1-ann3    0.82
ann2-ann3    0.83
Average κ    0.811

Table 6: Kappa statistics.

               Human  Other-Animate  Inanimate
Human          71     0              14
Other-Animate  0      9              5
Inanimate      8      10             241

Table 7: Confusion matrix.

5 Conclusion and Future Work

In this work, we have presented our efforts to enrich the nominals in the Hindi Treebank with animacy information. The annotation was followed by an inter-annotator agreement study evaluating the confusion over the categories chosen for annotation. The annotators have a significant understanding of the property of animacy, as shown by the high values of kappa (κ). In future, we plan to continue the animacy annotation for the whole Hindi Treebank. We also plan to utilize the annotated data to build a data-driven automatic animacy classifier (Øvrelid, 2006). From a linguistic perspective, an annotation of the type discussed in this paper will also be of great interest for studying information dynamics and seeing how semantics interacts with syntax in Hindi.

6 Acknowledgments

The work reported in this paper is supported by the NSF grant (Award Number: CNS 0751202; CFDA Number: 47.070).²

²Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

Judith Aissen. 2003. Differential object marking: Iconicity vs. economy. Natural Language & Linguistic Theory, 21(3):435–483.

B.R. Ambati, P. Gade, S. Husain, and GSK Chaitanya. 2009. Effect of minimal semantics on dependency parsing. In Proceedings of the Student Research Workshop.

B.R. Ambati, S. Husain, J. Nivre, and R. Sangal. 2010. On the role of morphosyntactic features in Hindi dependency parsing. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 94–102. Association for Computational Linguistics.

R. Begum, S. Husain, A. Dhwaj, D.M. Sharma, L. Bai, and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP. Citeseer.

Akshar Bharati, Samar Husain, Bharat Ambati, Sambhav Jain, Dipti Sharma, and Rajeev Sangal. 2008. Two semantic features make all the difference in parsing accuracy. Proceedings of ICON, 8.

A. Bharati, D.M. Sharma, S. Husain, L. Bai, R. Begum, and R. Sangal. 2009. AnnCorra: TreeBanks for Indian Languages, Guidelines for Annotating Hindi TreeBank (version 2.0).

R.A. Bhat, S. Jain, and D.M. Sharma. 2012. Experiments on Dependency Parsing of Urdu. In Proceedings of TLT11 2012, Lisbon, Portugal, pages 31–36. Ediçes Colibri.

R. Bhatt, B. Narasimhan, M. Palmer, O. Rambow, D.M. Sharma, and F. Xia. 2009. A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop, pages 186–189. Association for Computational Linguistics.

Joan Bresnan, Jean Carletta, Richard Crouch, Malvina Nissim, Mark Steedman, Tom Wasow, and Annie Zaenen. 2002. Paraphrase analysis for improved generation, LINK project.

Miriam Butt. 2006. The dative-ergative connection. Empirical Issues in Syntax and Semantics, 6:69–92.

N. Chomsky. 1981. Lectures on Government and Binding. Dordrecht: Foris.

Bernard Comrie. 1989. Language Universals and Linguistic Typology: Syntax and Morphology. University of Chicago Press.

Praveen Dakwale, Himanshu Sharma, and Dipti M. Sharma. 2012. Anaphora Annotation in Hindi Dependency TreeBank. In Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pages 391–400, Bali, Indonesia, November. Faculty of Computer Science, Universitas Indonesia.

Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, and Vito Pirrelli. 2005. Climbing the path to grammar: A maximum entropy model of subject/object learning. In Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition, pages 72–81. Association for Computational Linguistics.

R.M.W. Dixon. 1994. Ergativity. Number 69. Cambridge University Press.

Richard Evans and Constantin Orasan. 2000. Improving anaphora resolution by identifying animate entities in texts. In Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2000), pages 154–162.

Christiane Fellbaum. 2010. WordNet. Springer.

A. Hautli, S. Sulger, and M. Butt. 2012. Adding an annotation layer to the Hindi/Urdu treebank. Linguistic Issues in Language Technology, 7(1).

Paul J. Hopper and Sandra A. Thompson. 1980. Transitivity in grammar and discourse. Language, pages 251–299.

Seppo Kittilä, Katja Västi, and Jussi Ylikoski. 2011. Case, Animacy and Semantic Roles, volume 99. John Benjamins Publishing.

A.K. Mahajan. 1990. The A/A-bar distinction and movement theory. Ph.D. thesis, Massachusetts Institute of Technology.

Paola Merlo and Suzanne Stevenson. 2001. Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics, 27(3):373–408.

Dipak Narayan, Debasri Chakrabarty, Prabhakar Pande, and Pushpak Bhattacharyya. 2002. An experience in building the Indo WordNet: a WordNet for Hindi. In First International Conference on Global WordNet, Mysore, India.

Lilja Øvrelid and Joakim Nivre. 2007. When word order and part-of-speech tags are not enough – Swedish dependency parsing with rich linguistic features. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), pages 447–451.

Lilja Øvrelid. 2006. Towards robust animacy classification using morphosyntactic distributional features. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 47–54. Association for Computational Linguistics.

Lilja Øvrelid. 2009. Empirical evaluations of animacy annotation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL).

M. Palmer, D. Gildea, and P. Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31:71–106. MIT Press.

M. Palmer, R. Bhatt, B. Narasimhan, O. Rambow, D.M. Sharma, and F. Xia. 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In The 7th International Conference on Natural Language Processing, pages 14–17.

J. Pustejovsky. 1996. The Semantics of Complex Types. Lingua.

Dipti Misra Sharma, Prashanth Mannem, Joseph vanGenabith, Sobha Lalitha Devi, Radhika Mamidi, and Ranjani Parthasarathi, editors. 2012. Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages. The COLING 2012 Organizing Committee, Mumbai, India, December.

Michael Silverstein. 1986. Hierarchy of features and ergativity. Features and Projections, pages 163–232.

Ulf Teleman. 1974. Manual för grammatisk beskrivning av talad och skriven svenska. Studentlitteratur.

Juliette Thuilier, Laurence Danlos, et al. 2012. Semantic annotation of French corpora: animacy and verb semantic classes. In LREC 2012: The Eighth International Conference on Language Resources and Evaluation.

Annie Zaenen, Jean Carletta, Gregory Garretson, Joan Bresnan, Andrew Koontz-Garboden, Tatiana Nikitina, M. Catherine O'Connor, and Tom Wasow. 2004. Animacy Encoding in English: why and how. In Proceedings of the 2004 ACL Workshop on Discourse Annotation, pages 118–125. Association for Computational Linguistics.
