Towards a psycholinguistically motivated dependency grammar for Hindi

Samar Husain, Department of Linguistics, Universität Potsdam, Germany, [email protected]
Rajesh Bhatt, University of Massachusetts, Amherst, MA, USA, [email protected]
Shravan Vasishth, Department of Linguistics, Universität Potsdam, Germany, [email protected]

Abstract

The overall goal of our work is to build a dependency grammar based sentence processor for Hindi. As a first step towards this end, in this paper we present a dependency grammar that is motivated by psycholinguistic concerns. We describe the components of the grammar that have been automatically induced using a Hindi dependency treebank. We relate some aspects of the grammar to relevant ideas in the psycholinguistics literature. In the process, we also extract statistics and patterns for phenomena that are interesting from a processing perspective. We finally present an outline of a dependency grammar-based human sentence processor for Hindi.

1 Introduction

Human sentence processing proposals and modeling works are overwhelmingly based on phrase-structure and constituent based representations. This is because most modern linguistic theories (Chomsky, 1965), (Chomsky, 1981), (Chomsky, 1995), (Bresnan and Kaplan, 1982), (Sag et al., 2003) use phrase structure representation to analyze a sentence. There is, however, an alternative approach to sentential syntactic representation, known as dependency representation, that is quite popular in Computational Linguistics (CL). Unlike phrase structures, where the actual words of the sentence appear as leaves and the internal nodes are phrases, in a dependency grammar (Mel'čuk, 1988), (Bharati et al., 1995), (Hudson, 2010) a syntactic tree comprises the sentential words as nodes. These words/nodes are connected to each other with edges/arcs. The edges can be labeled to show the type of relation between a pair of nodes. For example, in the sentence John kissed Mary, 'John' and 'Mary' are connected via arcs to 'kissed'; the former arc bears the label 'subject' and the latter arc the label 'object'. Taken together, these nodes with their connections form a tree. Figure 1 shows the dependency tree and the phrase structure tree for the above sentence.

Figure 1: Dependency tree and phrase structure tree for John kissed Mary.
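Purely as an illustration (this snippet and its names are ours, not part of the original work), such a labeled dependency tree can be stored as a set of head-dependent-label triples:

```python
# A labeled dependency tree as (head, dependent, label) triples.
# The verb "kissed" heads both nouns; ROOT marks the sentence head.
tree = [
    ("ROOT", "kissed", "root"),
    ("kissed", "John", "subject"),
    ("kissed", "Mary", "object"),
]

# Every word except the root has exactly one head -- the tree property.
dependents = [d for _, d, _ in tree]
assert len(dependents) == len(set(dependents))
```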

There have been some previous attempts to use lexicalized grammars such as LTAG, CCG, etc. in psycholinguistics. These lexicalized grammars have been independently shown to be related to dependency grammar (Kuhlmann, 2007). For example, Pickering and Barry (1991) used categorial grammar to handle the processing of empty categories. Similarly, Pickering (1994) used dependency categorial grammar to process both local and non-local dependencies. More recently, Tabor and Hutchins (2004) used a lexical grammar based parser and Kim et al. (1998) used lexicalized tree-adjoining grammar (LTAG) to model certain processing results. Demberg (2010) has recently proposed a psycholinguistically motivated LTAG (P-LTAG). Despite the success of the dependency paradigm in CL, it has remained unexplored in psycholinguistics. To our knowledge, the work by Boston and colleagues (Boston et al., 2011), (Boston et al., 2008) is the only such attempt. There are some very interesting open questions with respect to using dependency representation and dependency parsers while building a human sentence processing system. Can a processing model based on the dependency parsing paradigm account for classic psycholinguistic phenomena? Can one adapt a high performance dependency parser for psycholinguistic research? If yes, then how? How will the differences between dependency parsing paradigms affect the predictive capacity of the models based on them?

This paper is arranged as follows. In Section 2 we mention some experimental works that have motivated the grammar design. Section 3 discusses the grammar induction process and lists the main components of the grammar. In Section 4 we present statistics of Hindi word order variations as found in the treebank and point out some patterns that are interesting from a processing perspective; we also discuss prediction rules, which are an important component of the grammar. Section 5 then presents a proposal for developing a human sentence processing system by adapting graph-based dependency parsing. Finally, in Section 6 we discuss some issues and challenges in using the dependency grammar paradigm for human sentence processing. We conclude in Section 7.

2 Hindi: Some relevant experimental work

Some crucial design decisions in our research have been inspired by psycholinguistic experimental work. In this section we mention these works. But before that, we briefly discuss Hindi, the language that we are working with.

Hindi is one of the official languages of India and is the fourth most widely spoken language in the world1. It is a free-word order language and is head final. It has relatively rich morphology, with verb-subject2 and noun-adjective agreement. Examples (2) to (6) below show some of the possible word order variations of (1). These permutations are not exhaustive; in fact all 4! permutations are possible.

(1) malaya ne abhiisheka ko kitaaba dii
    Malaya ERG Abhishek DAT book gave
    'Malaya gave a book to Abhishek.' (S-IO-O-V)
(2) malaya ne kitaaba abhiisheka ko dii (S-O-IO-V)
(3) abhiisheka ko malaya ne kitaaba dii (IO-S-O-V)
(4) abhiisheka ko kitaaba malaya ne dii (IO-O-S-V)
(5) kitaaba abhiisheka ko malaya ne dii (O-IO-S-V)
(6) kitaaba malaya ne abhiisheka ko dii (O-S-IO-V)

A great deal of experimental research has shown that working-memory limitations play a major role in sentence comprehension difficulty (e.g., Lewis and Vasishth (2005)). We find numerous instances in natural language where a word needs to be temporarily retained in memory before it can be integrated into a larger structure. Because working memory is limited, retaining a word for a longer period can make sentence processing difficult. Abstracting away from details, on this view one way in which processing complexity can be formulated is by using metrics that incorporate dependent-head distance (Gibson, 2000), (Grodner and Gibson, 2005). This idea manifests itself in various forms in the psycholinguistics literature. For example, Gibson (2000) proposes integration cost and storage cost to account for processing complexity. Lewis and Vasishth (2005) have proposed an activation-based theory that uses the notion of decay as one determinant of memory retrieval difficulty. Elements that exist in memory without being retrieved for a long time will decay more, compared to elements that have been retrieved recently or elements that are recent. In addition to decay, the theory also incorporates the notion of interference: memory retrievals are feature based, and feature overlap during retrieval, in addition to decay, will cause difficulty.

As opposed to the memory-based accounts mentioned above, expectation-based theories appeal to the predictive nature of the sentence processor. On this view, processing becomes difficult if the upcoming sentential material is less predictable. Surprisal (Hale, 2001), (Levy, 2008) is one such account.
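Formally, following Hale (2001) and Levy (2008) (this gloss is ours, added for reference), the surprisal of the i-th word can be written as

S(w_i) = -\log P(w_i \mid w_1 \ldots w_{i-1}),

i.e., its negative log-probability given the preceding words, so material that the incremental parser considers improbable incurs a high cost.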

1 http://www.ethnologue.com/ethno_docs/distribution.asp?by=size
2 This is the default agreement pattern. The complete agreement system is much more complex.

Informally, surprisal increases when a parser is required to build some low-probability structure. Expectation-based theories can successfully account for so-called anti-locality effects: it has been noted that in some language phenomena, increasing the distance between a dependent and its head speeds up reading time at the head (see Konieczny (2000) for such effects in German, and Vasishth and Lewis (2006) for Hindi). This is, of course, contrary to the predictions made by locality-based accounts, where such an increase should cause a slowdown at the head. There is considerable empirical evidence supporting the idea of predictive parsing in language comprehension (e.g. Staub and Clifton (2006)). There is also some evidence that the nature of predictive processing can be contingent on language specific characteristics. For example, Vasishth et al. (2010) argue that the verb-final nature of German subordinate clauses leads to the parser maintaining future predictions more effectively as compared to English. As Hindi is a verb-final language, these experimental results are pertinent for this paper. Locality-based results are generally formalized using the limited working memory model; such a model enforces certain resource limitations within which the human sentence processing system operates. Expectation/prediction-based results, on the other hand, have to be accounted for by appealing to the nature of the processing system itself.

Hindi being a free word order language, experimental work that deals with the processing cost of word-order variation is also important to us. Experimental work points to the fact that human sentence processing is sensitive to word order variation (e.g. Bader and Meng (1999), Kaiser and Trueswell (2004)). However, it is still not clear why or how word order variation influences processing costs. Processing costs could be due to a variety of reasons (such as syntactic complexity, frequency, information structure, prosody, memory constraints, etc.).

So, there are three streams of experimental research that are relevant for us: (a) locality effects, (b) anti-locality/expectation effects, and (c) word-order variation effects. In sections 3 and 4 we discuss how insights from (b) and (c) inform some of our design decisions. Later, in section 5, while discussing human sentence processing, we touch upon the notion of locality in our parsing approach.

3 Inducing a grammar

To develop a dependency grammar we make use of an already existing Hindi dependency treebank (Bhatt et al., 2009). The treebank data is a collection of news articles from a Hindi newspaper and has 400k words. The task of automatically inducing a grammar from a treebank can be thought of as making explicit the implicit grammar present in the treebank. This approach can be beneficial for a variety of tasks, such as complementing traditional hand-written grammars, comparing grammars of different languages, building parsers, etc. (Xia and Palmer, 2001), (Kolachina et al., 2010). Our task is much more focused: we want to bootstrap a grammar that can be used for a dependency-based human sentence processor for Hindi.

3.1 Lexicon

The lexicon comprises the syntactic properties of various heads (e.g. verbs). Based on an a priori selection of certain relations (subject, object, indirect object, experiencer verb subject, goal, noun complements of subject for copula verbs) we formed around 13 verb clusters3. These clusters were then merged into 6 super-clusters based on the previously mentioned relations (this time acting as discriminators4). These clusters correspond to: (1) intransitive verbs (e.g. so 'sleep', gira 'fall'), (2) transitive verbs (e.g. khaa 'eat'), (3) ditransitives (e.g. de 'give'), (4) experiencer verbs (e.g. dikha 'to appear'), (5) copula (e.g. hai 'is'), and (6) goal verbs (e.g. jaa 'go').

These 6 verb classes can be thought of as tree-templates and can be associated with various class specific constraints, such as the number of mandatory arguments, the part of speech category of the arguments, the canonical order of the arguments, the relative position of each argument with respect to the verb, agreement constraints, etc.

3 Clustering originally gave us 31 classes that were manually checked to give us 13 correct classes. Although this filtering was done manually, most of the remaining 18 clusters could have also been identified automatically: the proportion of verbs associated with them is very low, and these clusters are mainly due to annotation errors.
4 Each of these clusters was then compared to the others in order to remove common verbs.
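The clustering procedure above is described only at a high level; the following minimal sketch (ours, with an invented relation inventory and toy data) conveys the general idea of grouping verbs by the argument relations they are observed with:

```python
from collections import defaultdict

# Toy stand-in for treebank arcs: (verb, relation) pairs observed on
# finite verbal heads. Relation names are illustrative, not the
# treebank's actual label set.
observations = [
    ("so", "subject"),                            # 'sleep'
    ("gira", "subject"),                          # 'fall'
    ("khaa", "subject"), ("khaa", "object"),      # 'eat'
    ("de", "subject"), ("de", "object"), ("de", "indirect object"),  # 'give'
]

# Collect each verb's argument-relation signature.
signatures = defaultdict(set)
for verb, relation in observations:
    signatures[verb].add(relation)

# Verbs sharing a signature fall into one cluster; merging clusters that
# differ only in non-discriminating relations would yield the super-classes.
clusters = defaultdict(set)
for verb, signature in signatures.items():
    clusters[frozenset(signature)].add(verb)

for signature, verbs in clusters.items():
    print(sorted(signature), "->", sorted(verbs))
```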

Figure 2 shows a simplified transitive template that can be associated with all the transitive verbs in the lexicon. Its first argument (subject) is represented by a variable i, and its second argument (object) is represented by a variable j. The verb itself is shown as variable x. These variables (i, j, x) will be instantiated by those lexical items that are of a particular type. For example, x can only be instantiated by transitive verb class members5. Similarly, only those items that satisfy the dependent constraints can be instantiated as subject or object. These constraints can be of various kinds, such as the part of speech (POS) category, semantic type, etc. The tree-template in figure 2 also encodes the arity of the verbal head, as well as the canonical word order of its dependents (cf. figure 3). Lastly, the tree-template shows that adjuncts are not mandatory. Figure 3 shows a template for a transitive argument structure but with object fronting. In the figure, the object is indexed with j, and its canonical position is shown by øj. Note that the arc coming into the empty node (øj) is not a dependency relation; the arc and the node are just a representational strategy to show word order variation in the tree-template.

Figure 2: A simplified transitive tree-template (Subj i ... Obj j immediately before the verb x, with optional Adjunct* and Aux*). aux = Auxiliaries; * signifies 0 or more instances.

Figure 3: A simplified transitive tree-template showing object fronting (the fronted Obj j precedes Subj i; øj marks the object's canonical pre-verbal position). trans = Transitive, aux = Auxiliaries, øj = canonical position of object j.

3.2 Frame variations

The tree-templates we saw in section 3.1 have been induced automatically using the finite verb occurrences in the treebank. While inducing the clusters we neglected the differences in tense, aspect and modality (TAM) of the verbs (e.g. perfective, obligational) that sometimes lead to different case-markings on the arguments. This is because we are focusing on the number of arguments, not on the case-markings of the arguments. But finite templates cannot be used for non-finite verbs, because the surface requirements of non-finite verbs are different from those of finite verbs6. For example, when khaa 'eat' occurs as khaakara 'having eaten', its original requirement for a subject changes to mandatorily not taking any visible subject. In addition, it requires another finite or non-finite verb as its head.

Figure 4: A V-kara tree-template (Obj j and optional Adjunct* under the verb x, which attaches via a vmod arc to a finite or non-finite verbal head y). fin = Finite.

One way to think about the -kara template is that it is an outcome of transforming the transitive argument structure (Bharati et al., 1995). The transformation performs a series of add/modify/delete operations on the transitive template, leading to the template shown in figure 4. Another way to think about this would be a basically non-transformational account, where we assume that the non-finite templates are obtained independently of the transitive templates. This would amount to looking only at the non-finite verbs in the treebank and identifying generalizations about the various non-finite instances, i.e. creating classes based on inflections such as kara (encodes causality/sequentiality), e (hii) (sequentiality), taa huaa (signifies a co-occurring event), naa (gerundive form), etc.

From a grammar extraction perspective, the method used to get the non-finite templates is not that important; as long as one can extract the correct requirements of the non-finite verbs, it should suffice. On the other hand, such a distinction could in fact be very relevant from a lexical processing perspective. Do we build the kara template on the fly or do we just access its information from a stored entry? Such a design decision will have to await future investigation.

5 Only finite verbs were considered to form the verb clusters. Syntactically, non-finite verbs in Hindi behave differently than their finite counterparts. This is not only reflected in the inflection, but also in the number of visible arguments. We discuss such cases in Section 3.2.
6 This also holds true for passives.
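To make the tree-template idea concrete, here is a purely illustrative encoding (field names and values are our own, not the grammar's actual formalism) of a transitive template and its transformationally derived -kara variant:

```python
from dataclasses import dataclass, field

@dataclass
class TreeTemplate:
    """One verb-class tree-template: mandatory dependents in canonical
    order plus per-dependent constraints; adjuncts and auxiliaries are
    left implicit because they are optional."""
    verb_class: str
    args: tuple                 # canonical left-to-right argument order
    head: str = "finite"        # what may govern the verb itself
    constraints: dict = field(default_factory=dict)

transitive = TreeTemplate(
    verb_class="transitive",
    args=("subject", "object"),   # Subj ... Obj V (cf. figure 2)
    constraints={"subject": {"pos": "noun"}, "object": {"pos": "noun"}},
)

def kara_variant(template: TreeTemplate) -> TreeTemplate:
    """Derive the V-kara template transformationally (cf. figure 4):
    delete the visible subject and require a verbal head."""
    return TreeTemplate(
        verb_class=template.verb_class + "-kara",
        args=tuple(a for a in template.args if a != "subject"),
        head="finite-or-nonfinite-verb",
        constraints={k: v for k, v in template.constraints.items()
                     if k != "subject"},
    )

print(kara_variant(transitive))
```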

3.3 Probability distribution of dependency types

Each verb class i is associated with a probability distribution over its dependents (j = 1..n). The probability of a node x being a dependent of i with relation r is computed as:

P_{x,r,i} = \frac{\lambda_{(x,r,i)}}{\sum_{j=1}^{n} \lambda_{(j,i)}}    (1)

where \lambda_{(x,r,i)} is the count of i → x (with dependency relation r), and the denominator signifies the total count of all the dependents of i. Such probabilities can be used to score a dependency tree at any given point during the parsing process. This is of course a simplification and, as we will note in section 5, there are other ways to induce the probability model that is used to score a dependency tree.

The dependency grammar that we have been building will be used to model a human sentence parser. The incrementality of the parser, and the observation that many nouns can appear without any post-position, make it necessary to identify whether two nouns appearing together are a collocation or independent arguments. One way to compute the collocational strength between words x_0 and y_0 is by using pointwise mutual information (Manning and Schütze, 1999):

I(x_0, y_0) = \log \frac{p(x_0, y_0)}{p(x_0)\, p(y_0)}    (2)

Again, as we will see in section 5, there might be other ways to incorporate this knowledge for parsing purposes.

3.4 Prediction rules

As each incoming word is incorporated into the existing structure, predictions are made about upcoming words based on current information. In order for the parser to proceed in such a fashion it must have ready access to such information. The grammar that we propose provides this information in the form of prediction rules. The kind of information that the (implemented) parser utilizes to make such predictions can be influenced by various theoretical assumptions and/or experimental results. For illustrative purposes, while gathering the statistics used to formulate the predictions shown in this section, we considered only verbal arguments; the presence of adjuncts has been neglected.

We begin with one simple cue: the case of the arguments. Considering the occurrence of the first verbal argument in a sentence and its case, we tried to predict the verb class using the data in the Hindi dependency treebank. Here are some predictions based on frequent case-markings: ne (ERG) → transitive, ko (ACC) → transitive, se (INST) → ditransitive, 0 (NOM) → intransitive. The verb classes that we get for ne, se and 0 reflect the default distribution of the ERG, INST and NOM case-markers vis-à-vis the type of verbs they tend to occur with. Of course, predictions become more precise as more words are processed. When the first two argument case-markers are considered we get:

0 0: copula
0 -: intransitive
0 se: transitive
0 ko: transitive
0 ne: transitive
ne 0: transitive
ne ko: transitive/ditransitive
ne se: ditransitive
ko 0: ditransitive
ko ko: ditransitive
ko se: ditransitive
ko ne: transitive
ko -: transitive
se 0: ditransitive
se ne: ditransitive

And as we get more information, we might have to revise our previous predictions and make the necessary structural changes (or rerank multiple structures). For example, the first ko occurrence predicts a transitive template, but that is later revised to a ditransitive if we happen to see a 0 or a ko case-marker. The prediction rules shown above have been automatically extracted from the dependency treebank. Besides the case-marker, one can also use other features to make the predictions more realistic; for example, sentence position or an animacy feature (using a resource such as WordNet).
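Equations (1) and (2) translate directly into counting code. The sketch below is ours, with invented counts:

```python
import math
from collections import Counter

# Treebank arc counts: (head_class, relation, dependent_type) -> count.
arc_counts = Counter({
    ("transitive", "subject", "noun"): 900,
    ("transitive", "object", "noun"): 700,
    ("transitive", "adjunct", "adverb"): 150,
})

def p_dependent(x, r, i):
    """Equation (1): relative frequency of dependent x with relation r
    among all dependents of head class i."""
    total = sum(count for (head, _, _), count in arc_counts.items() if head == i)
    return arc_counts[(i, r, x)] / total

def pmi(p_xy, p_x, p_y):
    """Equation (2): pointwise mutual information, used to separate
    noun-noun collocations from independent arguments."""
    return math.log(p_xy / (p_x * p_y))

print(p_dependent("noun", "object", "transitive"))  # 700 / 1750 = 0.4
```

Likewise, the case-marker prediction rules behave like a lookup table that is re-consulted, and possibly overridden, as each new argument arrives (again a sketch of ours, using a subset of the mappings listed above):

```python
# Predicted verb class keyed by the case-marker sequence seen so far
# ('0' = bare nominal, no post-position).
RULES = {
    ("ko",): "transitive",
    ("ko", "ne"): "transitive",
    ("ko", "0"): "ditransitive",
    ("ko", "ko"): "ditransitive",
}

seen = []
for marker in ["ko", "0"]:
    seen.append(marker)
    print(seen, "->", RULES.get(tuple(seen), "unknown"))
# ['ko'] -> transitive
# ['ko', '0'] -> ditransitive   (the earlier prediction is revised)
```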
4 Processing concerns

Having discussed the main components of the grammar, in this section we raise some processing concerns based on the statistics gathered from the treebank.

4.1 Canonical and non-canonical structures

4.1.1 Argument structure variation

As mentioned previously, the tree-template for each verb class also encodes the canonical order in which its arguments should appear. For example, in the transitive class it is expected that the subject should precede the object, and that the object should immediately precede the verb. Such word order information can be extracted from the treebank. Reflecting each verb class, the following word orders were extracted from the treebank:

Transitive
- Subj Obj
- Subj Obj-cm (cm = case-marker; ko (ACC), se (INST))
Ditransitive
- Subj Indirect-Obj Obj
- Subj Indirect-Obj Obj-cm
- Subj Indirect-Obj Obj-LOC
- Indirect-Obj Obj (missing Subj)
- Indirect-Obj Obj-LOC (missing Subj)
- Subj Indirect-Obj (missing Obj)
Experiencer verbs
- DAT NOM
Copula
- Subj Noun-complement: Copula
Others
- Subj Obj-LOC: verbs such as jaa 'go'
- Object-verb collocation (whether the object appears immediately before the verb)
- Object-verb order (the relative position of the object with respect to the verb; left or right)

Based on the above patterns, the following are some main trends:

- Close to 11% of argument structures are non-canonical.
- Non-canonical order for "Subj Obj" and "Subj Noun-complement" (for copula verbs) is rare; "Obj Subj" = 6.7% of all "Subj Obj" instances, while "Noun-complement Subj" = 3.6% of all "Subj Noun-complement" instances.
- When Obj is not case-marked it is more likely to appear next to the verb (82.2%) than when it is case-marked (47.2%).
- Obj appearing after the verb is extremely rare (.35% of all object-verb instances)7; this does not count clausal objects that occur with verbs such as kaha 'say'.

7 We note here that this pattern, and the fact that non-canonical structures are rare, might be because our corpus is a written corpus. Spoken Hindi has more postverbal material and non-canonical structures than written Hindi.

The canonical order is encoded in the tree-template, and a non-canonical word order is therefore reflected in a separate tree. We can see this in the representations in figure 2 and figure 3: figure 2 represents the canonical order and figure 3 the order with object fronting. To reflect that the latter is a non-canonical word order, a null node is also shown in the object's canonical pre-verbal position and indexed to the fronted object.

As just noted, non-canonical word order due to changes in argument structure order is not very frequent. The occurrence of non-canonical word order has been attributed to information structure constraints. For example, Butt and King (1996) have argued for the following word position to discourse function mapping for Urdu/Hindi: sentence initial → TOPIC, pre-verbal (immediate) → FOCUS, post-verbal → BACKGROUND INFORMATION, pre-verbal (others) → COMPLETIVE INFORMATION. Other related work on Hindi word order (e.g. Kidwai (2000)8, Gambhir (1981), Kachru (2006)) also offers a discourse-centric explanation. There is some work that has investigated word order variation effects during sentence processing: Vasishth et al. (2012) and Vasishth (2004) investigated the effect of discourse on word order variation, while Patil et al. (2008) looked at the effects of word order and information structure on intonation.

8 Kidwai (2000) has a syntactic explanation, but her syntactic features have discourse correlates (topic, focus, etc.).

4.1.2 Non-projective dependencies

A dependent and its head can sometimes be discontiguous; the constraint that the head-dependent pair be contiguous is called the projectivity constraint. More formally, an arc i → j is projective if, for node i with j as its child, any node k such that i < k < j (or i > k > j) is dominated by i (i →* k) (Nivre and Nilsson, 2005). The dependency treebank being used has 3755 non-projective arcs. Amongst these, intra-clausal dependencies related to verbal heads account for 16.2% of the non-projective arcs, and 5% of such dependencies are intra-clausal dependencies with other heads.
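The projectivity condition just stated translates directly into a small check. The following self-contained sketch (ours) represents a tree as child-to-head links, with words indexed by linear position and 0 as the artificial root:

```python
def is_projective(arc, heads):
    """Check the projectivity of arc (i, j): every position k strictly
    between i and j must be dominated by i via the head links in `heads`
    (a dict mapping each node to its head; 0 is the artificial root)."""
    i, j = arc
    lo, hi = min(i, j), max(i, j)
    for k in range(lo + 1, hi):
        node = k
        while node not in (0, i):
            node = heads[node]          # climb towards the root
        if node != i:                   # k escaped i's subtree
            return False
    return True

# Toy non-projective example: arc (1, 4) crosses word 3, whose head chain
# (3 -> 2 -> 0) never passes through 1.
heads = {1: 0, 2: 0, 3: 2, 4: 1}
print(is_projective((1, 4), heads))  # False
print(is_projective((2, 3), heads))  # True
```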

Relative clauses (RCs), in turn, account for close to 25% of all non-projective arcs. For a more detailed classification of non-projective dependencies in Hindi, see Mannem et al. (2009). Embedded relative clauses and correlatives with canonical word order lead to projective structures, while right extraposed relative clauses such as the one shown in figure 5 are non-projective. A right extraposed relative clause can also be projective; this can happen in certain non-canonical configurations and is quite rare (see Table 1).

Figure 5: Right extraposed relative clause (non-projective): 'that boy thin is who there is standing' ('That boy who is standing there is thin'), with the relative clause attaching to the matrix subject across the clause boundary.

Type               Projective   Non-projective   Total
Embedded           2.4          .2               2.6
Correlative        17.3         .5               17.8
Right extraposed   2.5          76.7             79.2

Table 1: Relative clause types (occurrence in %). Total RC count = 1198.

Recently, Levy et al. (2012) have shown that extraposed RC structures in English are difficult to process. Such structures in English are non-projective and quite rare9. As opposed to this, right-extraposed RCs in Hindi are the most frequently occurring structures amongst all relative clause types (cf. table 1). Like English, these structures are also non-projective in Hindi. The question then arises whether these non-projective structures are also difficult to process in Hindi, and if yes, why Hindi speakers prefer such configurations over a projective structure such as an embedded relative clause or a correlative10.

9 Table 1 in Levy et al. (2012): P(extraposed RC | context) is 0.00004, while P(RC | context) is 0.00561.
10 Recently, Kothari (2010), in a judgement study, found no effect of discourse on Hindi native speakers' preference for correlatives over right extraposed RCs. In another judgement study involving non-restrictive RCs, Kothari shows that Hindi native speakers prefer embeddings for short RCs, while right extrapositions are preferred when the RC is long.

At this point in time, we can pose the following questions:

- What is the difference between the processing of a canonical structure and its non-canonical counterpart in Hindi?
- How can we quantify this difference?

To answer these questions one needs to conduct targeted experiments. We need to investigate these questions because the answers will help us make more informed decisions when implementing the grammar and the sentence processor. For instance, if it turns out that processing non-projective relative clause structures in Hindi is easy, then what does that say about the parser's adaptability to specific language patterns? And how will that knowledge affect the design of the parser?

4.2 Prediction rules

Given the observation that predictions made by the parser will sometimes go wrong and the parser will have to make revisions (or rerank), we need to ask:

- What is predicted?
- What are the different cues that are pooled to make a prediction?
- What is the processing cost when a prediction is incorrect?
- How can we quantify this cost?
- How does the prediction system interact with other aspects of the comprehension process?

Table 2 shows how our predictions based on case-markers (once the first two arguments have been seen) can vary in terms of correctness of word order and verb class. For example, the first ko that is seen predicts a canonical transitive template; this prediction changes to a non-canonical transitive template if ne happens to be the next case-marker, while if a 0 case-marker is encountered instead, the parser revises its prediction to a canonical ditransitive template. These predictions are made before arriving at the verb, which means that they can sometimes be incorrect; in such cases, the verb (or additional arguments) itself will eventually help make the final revision. So, based on the above discussion, there are two factors that will influence the processing cost of a prediction:

- correct/incorrect verb-template prediction,
- correct/incorrect word-order prediction.

Correct prediction
  CO:  Predicted 0 0 → copula; Correct 0 0 → copula
  NCO: Predicted ko ne → transitive; Correct ko ne → transitive
Incorrect prediction (incorrect class)
  CO:  Predicted 0 0 → copula; Correct 0 0 → transitive
  NCO: Predicted ko ne → transitive; Correct ko ne → ditransitive
Incorrect prediction (incorrect word order)
  CO:  Predicted ko ko → ditransitive; Correct ko ko → ditransitive (NCO)
  NCO: Predicted ?; Correct ?
Incorrect prediction (incorrect class and word order)
  CO:  Predicted ko 0 → ditransitive; Correct ko 0 → transitive (NCO)
  NCO: Predicted ?; Correct ?

Table 2: Different prediction scenarios. Canonical order: CO, non-canonical order: NCO.

Based on the presented grammar design, the processing hypothesis about the cost of such a prediction is:

Correct prediction < Incorrect prediction (argument structure order or verb class) < Incorrect prediction (argument structure order and verb class)

This hypothesis will of course need to be evaluated experimentally.

5 An outline of human sentence processing using dependency parsing

We will adapt the graph-based dependency parsing paradigm (Kübler et al., 2009) to model human sentence processing. The parser will be used to compute certain cognitive measures (such as surprisal and retrieval cost; cf. Boston et al. (2008), Demberg and Keller (2008)) that will in turn be used to predict processing difficulty. Graph-based data-driven parsing models parameterize directly over subtrees. Arc-factored models, which exploit only single head-child node pairs, will be implemented. The parsing algorithm comprises finding a maximal spanning tree (MST) in a complete graph using the arc parameters. Note that this formulation of the parsing algorithm (McDonald et al., 2005a), (McDonald et al., 2005b) needs to be modified in order to adapt it to the goals of this paper. In particular, the algorithm needs to be incremental. It is easy to see how this can be done: instead of starting with the complete sentence, one forms complete graphs out of all the words available so far. If the length of the sentence is n, this involves extracting an MST n-1 times, i.e., after hearing/reading each word. By doing so, the worst case complexity of the algorithm remains unchanged. Another modification that needs to be incorporated is the use of prediction rules within the parsing process; this will involve forming complete graphs that include unlexicalized tree-templates (predicted by the already seen tokens) and extracting the MST out of them. A sketch of this incremental loop is given at the end of this section. The other important task after implementing the parser will be to use it to compute certain measures (such as surprisal and locality-based costs), which can then be used to predict processing difficulty. Within the graph-based parsing paradigm, a probability model can be induced using the method proposed by McDonald et al. (2005a), McDonald et al. (2005b)11. Once we have such a probability model, surprisal and locality-based costs can be computed.

To the best of our knowledge, the work by Boston and colleagues (Boston et al., 2008), (Boston et al., 2011) is the only other work that has employed dependency parsing to model human sentence processing difficulty. Unlike what has been proposed here, they used a transition-based dependency parsing model (Nivre, 2003). However, their parser is not able to correctly analyse crossing/discontiguous dependencies. In addition, they have no notion of prediction explicitly built into their system. The other work that bears similarity to ours is that of Demberg (2010), who, unlike us, used a variant of lexicalized tree-adjoining grammar (a psycholinguistically motivated LTAG called P-LTAG). And although LTAG is related to dependency grammars (Kuhlmann, 2007), the choice of grammar has a considerable impact on the parsing system that one employs to model processing difficulty. Our predicted tree-template looks similar to the prediction tree of Demberg (2010). But again, the operations and mechanisms that we will employ to construct the syntactic structure will be shaped by the constraints imposed by the properties of dependency grammar and the graph-based parsing algorithm, and will be significantly different from the P-LTAG based operations such as substitution, adjunction (and verification).

11 Such a probability model can be used as an alternative to the one mentioned in section 3.3.
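To illustrate the incremental MST idea described above, here is a toy sketch (ours, not the proposed implementation); it uses networkx's maximum spanning arborescence routine in place of a trained arc-factored model:

```python
import networkx as nx

def arc_score(head, dep):
    """Toy stand-in for a learned arc-factored score; in practice these
    weights would come from a model trained as in McDonald et al. (2005a,b)."""
    return 0.5 if head == 0 else 1.0 / abs(head - dep)  # favour short arcs

def incremental_parse(words):
    """After each incoming word, build a complete digraph over node 0
    (ROOT) and all words seen so far, then re-extract the maximum
    spanning arborescence (Chu-Liu/Edmonds)."""
    analyses = []
    for n in range(1, len(words) + 1):
        g = nx.DiGraph()
        for head in range(n + 1):
            for dep in range(1, n + 1):
                if head != dep:
                    g.add_edge(head, dep, weight=arc_score(head, dep))
        mst = nx.maximum_spanning_arborescence(g)
        analyses.append(sorted(mst.edges()))
    return analyses

for step, tree in enumerate(incremental_parse(["malaya", "ne", "kitaaba", "dii"]), 1):
    print(f"after word {step}: {tree}")
```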

6 Issues and Challenges

Our dependency grammar based human sentence processing system presents itself as an attractive alternative to the phrase structure based models currently dominant in the psycholinguistic literature. This is because of its representational simplicity and the availability of efficient dependency parsing paradigms. It seems to be well suited to model expectation-based psycholinguistic theories. However, there are certain issues that will need to be addressed eventually in order for the dependency model to have comprehensive coverage.

The first issue is related to the representational aspect of dependency structures. There is considerable evidence that while processing some dependencies, for example filler-gap dependencies, anaphora resolution, etc., the human sentence comprehension system uses certain grammatical constraints (Phillips et al., 2011). These constraints (e.g. c-command) have traditionally been formalized using phrase-structure representation. If it is true that the parser does employ configurational constraints such as c-command, then it will be imperative to formulate a functionally equivalent definition of c-command within the dependency framework.

The second issue is related to parser adaptation. Adapting the graph-based dependency parser to effectively compute the cognitive measures will be the most challenging task of this work. In particular, the same parser has to be conceptualized to compute both locality-based as well as expectation-based measures (Boston et al., 2008). In addition, the prediction system needs to be seamlessly integrated within the parsing process.
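To give a flavour of what such a reformulation might involve, here is one purely illustrative approximation (ours, not a proposal made in this work): treat a word A as 'commanding' a word B when B lies inside the subtree of A's head but outside A's own subtree:

```python
def subtree(node, deps):
    """All nodes reachable from `node` via the dependent relation
    (`deps` maps each head to a list of its dependents)."""
    reached = {node}
    for child in deps.get(node, []):
        reached |= subtree(child, deps)
    return reached

def commands(a, b, heads, deps):
    """Illustrative dependency analogue of c-command: b lies in the
    subtree of a's head, but not in a's own subtree."""
    return b in subtree(heads[a], deps) and b not in subtree(a, deps)

# 1 <- 2 -> 3 -> 4 : word 2 heads 1 and 3; 3 heads 4; 0 is the root.
heads = {1: 2, 2: 0, 3: 2, 4: 3}
deps = {0: [2], 2: [1, 3], 3: [4]}
print(commands(1, 4, heads, deps))  # True: 4 is in 2's subtree, not 1's
print(commands(4, 1, heads, deps))  # False: 1 is outside 3's subtree
```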

7 Conclusion

In this paper we introduced our work towards building a psycholinguistically motivated dependency grammar for Hindi. We outlined the main components of such a dependency grammar that was automatically induced using a Hindi dependency treebank. We discussed certain language patterns that are interesting psycholinguistically. We sketched how a graph-based dependency parser can be used to model sentence processing difficulty. We finally mentioned some issues with using a dependency based human sentence processing model.

References

M. Bader and M. Meng. 1999. Subject-object ambiguities in German embedded clauses: An across-the-board comparison. Journal of Psycholinguistic Research, 28(2):121-143.

A. Bharati, V. Chaitanya, and R. Sangal. 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.

R. Bhatt, B. Narasimhan, M. Palmer, O. Rambow, D. Sharma, and F. Xia. 2009. A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop (LAW), ACL-IJCNLP '09, pages 186-189.

M. F. Boston, J. T. Hale, U. Patil, R. Kliegl, and S. Vasishth. 2008. Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye Movement Research, 2(1):1-12.

M. F. Boston, J. T. Hale, S. Vasishth, and R. Kliegl. 2011. Parallel processing and sentence comprehension difficulty. Language and Cognitive Processes, 26(3):301-349.

J. Bresnan and R. M. Kaplan. 1982. The Mental Representation of Grammatical Relations. The MIT Press, Cambridge, MA.

M. Butt and T. H. King. 1996. Structural topic and focus without movement. In M. Butt and T. H. King, eds., The First LFG Conference. CSLI Publications.

N. Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press.

N. Chomsky. 1981. Lectures on Government and Binding. Foris, Dordrecht.

N. Chomsky. 1995. The Minimalist Program. The MIT Press.

V. Demberg and F. Keller. 2008. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109:193-210.

V. Demberg. 2010. A Broad-Coverage Model of Prediction in Human Sentence Processing. Ph.D. thesis, The University of Edinburgh.

V. Gambhir. 1981. Syntactic Restrictions and Discourse Functions of Word Order in Standard Hindi. Ph.D. thesis, University of Pennsylvania, Philadelphia.

E. Gibson. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In A. Marantz, Y. Miyashita, and W. O'Neil (eds.), Image, Language, Brain: Papers from the First Mind Articulation Project Symposium. MIT Press, Cambridge, MA.

D. Grodner and E. Gibson. 2005. Consequences of the serial nature of linguistic input for sentenial complexity. Cognitive Science, 29(2):261-290.

J. Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of NAACL '01, pages 1-8.

R. Hudson. 2010. An Introduction to Word Grammar. Cambridge University Press.

Y. Kachru. 2006. Hindi. John Benjamins Publishing Company, Philadelphia.

E. Kaiser and J. C. Trueswell. 2004. The role of discourse context in the processing of a flexible word-order language. Cognition, 94:113-147.

A. Kidwai. 2000. XP-Adjunction in Universal Grammar: Scrambling and Binding in Hindi-Urdu. Oxford University Press, New York.

A. Kim, B. Srinivas, and J. Trueswell. 1998. The convergence of lexicalist perspectives in psycholinguistics and computational linguistics. In P. Merlo and S. Stevenson (eds.), Papers from the Special Section on the Lexicalist Basis of Syntactic Processing, CUNY Conference.

P. Kolachina, S. Kolachina, A. K. Singh, V. Naidu, S. Husain, R. Sangal, and A. Bharati. 2010. Grammar extraction from treebanks for Hindi and Telugu. In Proceedings of the 7th International Conference on Language Resources and Evaluation.

L. Konieczny. 2000. Locality and parsing complexity. Journal of Psycholinguistic Research, 29(6):627-645.

A. Kothari. 2010. Processing Constraints and Word Order Variation in Hindi Relative Clauses. Ph.D. thesis, Stanford University.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

M. Kuhlmann. 2007. Dependency Structures and Lexicalized Grammars. Ph.D. thesis, Saarland University, Saarbrücken, Germany.

R. Levy, E. Fedorenko, M. Breen, and E. Gibson. 2012. The processing of extraposed structures in English. Cognition, 122(1):12-36.

R. Levy. 2008. Expectation-based syntactic comprehension. Cognition, 106(3):1126-1177.

R. L. Lewis and S. Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29:1-45.

P. Mannem, H. Chaudhry, and A. Bharati. 2009. Insights into non-projectivity in Hindi. In Proceedings of the ACL-IJCNLP Student Research Workshop.

C. D. Manning and H. Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency parsers. In Proceedings of the 43rd ACL, ACL '05.

R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. 2005b. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP.

I. Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York Press.

J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of ACL 2005.

J. Nivre. 2003. An efficient algorithm for projective dependency parsing. In 8th International Workshop on Parsing Technologies (IWPT), pages 149-160.

U. Patil, G. Kentner, A. Gollrad, F. Kuegler, C. Fery, and S. Vasishth. 2008. Focus, word order and intonation in Hindi. Journal of South Asian Linguistics, 1(1):55-72.

C. Phillips, M. W. Wagers, and E. F. Lau. 2011. Grammatical illusions and selective fallibility in real-time language comprehension. In J. Runner (ed.), Experiments at the Interfaces, Syntax and Semantics, volume 37, pages 153-186.

M. Pickering and G. Barry. 1991. Sentence processing without empty categories. Language and Cognitive Processes, 6(3):229-259.

M. Pickering. 1994. Processing local and unbounded dependencies: A unified account. Journal of Psycholinguistic Research, 23(4):323-352.

I. A. Sag, T. Wasow, and E. M. Bender. 2003. Syntactic Theory: A Formal Introduction, 2nd edition. CSLI Publications.

A. Staub and C. Clifton. 2006. Syntactic prediction in language comprehension: Evidence from either ... or. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32:425-436.

W. Tabor and S. Hutchins. 2004. Evidence for self-organized sentence processing: Digging-in effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(2):431-450.

S. Vasishth and R. L. Lewis. 2006. Argument-head distance and processing complexity: Explaining both locality and antilocality effects. Language, 82(4):767-794.

S. Vasishth, K. Suckow, R. L. Lewis, and S. Kern. 2010. Short-term forgetting in sentence comprehension: Crosslinguistic evidence from head-final structures. Language and Cognitive Processes, 25(4):533-567.

S. Vasishth, R. Shaher, and N. Srinivasan. 2012. The role of clefting, word order and given-new ordering in sentence comprehension: Evidence from Hindi. Journal of South Asian Linguistics, volume 5.

S. Vasishth. 2004. Discourse context and word order preferences in Hindi. In R. Singh (ed.), The Yearbook of South Asian Languages and Linguistics, pages 113-127.

F. Xia and M. Palmer. 2001. Converting dependency structures to phrase structures. In Proceedings of the First International Conference on Human Language Technology Research.
