Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus

Shailly Goyal Niladri Chatterjee Department of Mathematics Indian Institute of Technology Delhi Hauz Khas, New Delhi - 110 016, India {shailly goyal, niladri iitd}@yahoo.com

Abstract approaches either use examples from the same lan- guage, e.g., (Bod et al., 2003; Streiter, 2002), or Example-based parsing has already been they try to imitate the parse of a given sentence proposed in literature. In particular, at- using the parse of the corresponding sentence in tempts are being made to develop tech- some other language (Hwa et al., 2005; Yarowsky niques for language pairs where the source and Ngai, 2001). In particular, Hwa et al. (2005) and target languages are different, e.g. have proposed a scheme called direct projection Direct Projection Algorithm (Hwa et al., algorithm (DPA) which assumes that the relation 2005). This enables one to develop parsed between two words in the source language sen- corpus for target languages having fewer tence is preserved across the corresponding words linguistic tools with the help of a resource- in the parallel target language. This is called Di- rich source language. The DPA algo- rect Correspondence Assumption (DCA). rithm works on the assumption of Di- However, with respect to Indian languages we rect Correspondence which simply means observed that the DCA does not hold good all the that the relation between two words of time. In order to overcome the difficulty, in this the source language sentence can be pro- work, we propose an algorithm based on a vari- jected directly between the correspond- ation of the DCA, which we call pseudo Direct ing words of the parallel target language Correspondence Assumption (pDCA). Through sentence. However, we find that this as- pDCA the syntactic knowledge can be transferred sumption does not hold good all the time. even if not all syntactic relations may be projected This leads to wrong parsed structure of the directly from the source language to the target lan- target language sentence. As a solution guage in toto. Further, the proposed algorithm we propose an algorithm called pseudo projects the relations between phrases instead of DPA (pDPA) that can work even if Direct projecting relations between words. Keeping in Correspondence assumption is not guaran- line with (Hwa et al., 2005), we call this algorithm teed. The proposed algorithm works in a as pseudo Direct Projection Algorithm (pDPA). recursive manner by considering the em- The present work discusses the proposed pars- bedded phrase structures from outermost ing scheme for a new (target) language with the level to the innermost. The present work help of a parser that is already available for a discusses the pDPA algorithm, and illus- language (source) and using word-aligned paral- trates it with respect to English-Hindi lan- lel corpus of the two languages under considera- guage pair. Link Grammar based pars- tion. We propose that the syntactic relationships ing has been considered as the underlying between the chunks of the input sentence T (of parsing scheme for this work. the target language) are given depending upon the 1 Introduction relationships of the corresponding chunks in the translation S of T . Along with the parsed struc- Example-based approaches for developing parsers ture of the input, the system also outputs the con- have already been proposed in literature. These stituent structure (phrases) of the given input sen-

301 Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 301–308, Sydney, July 2006. 2006 Association for Computational Linguistics tence. ture and constituent representation of the sentence In this work, we first discuss the proposed given below. scheme in a general framework. We illustrate the +------Ss------+ scheme with respect to parsing of Hindi sentences | +----Jp---+ | using the Link Grammar (LG) based parser for En- +--Ds-+-Mp-+ +-Dmc-+ +-Pa-+ | | | | | | | glish and the experimental results are discussed. the teacher of the boys is good Before that in the following section we discuss Link Grammar briefly. (S (NP (NP The teacher) (PP of (NP the boys))) (VP is) 2 Link Grammar and Phrases (ADJP good).) Link grammar (LG) is a theory of which It may be noted that in the phrase structure of builds simple relations between pairs of words, the above sentence, verb phrase as obtained from rather than constructing constituents in tree-like the phrase parser has been modified to some ex- hierarchy. For example, in an SVO language like tent. The algorithm discussed in this work as- English, the verb forms a subject link (S-) to some sumes verb phrases as the main verb along with word on its left, and an object link (O+) with some all the auxiliary verbs. word on its right. Nouns make the subject link For ease of presentation and understanding, we (S+) to some word (verb) on its right, or object classify phrase relations as Inter-Phrase and Intra- link (O-) to some word on its left. phrase relations. Since the phrases are often em- The English Link Grammar Parser (Sleator and bedded, different levels of phrase relations are ob- Temperley, 1991) is a syntactic parser of English tained. From the outermost level to the innermost, based on LG. Given a sentence, the system as- we call them as “first level”, “second level” of re- signs to it a syntactic structure, which consists of lations and so on. One should note that an ith level a set of labeled links connecting pairs of words. Intra-phrase relation may become Inter-phrase re- The parser also produces a “constituent” represen- lation at a higher level. tation of a sentence (showing noun phrases, verb As an example, consider the parsing and phrase phrases, etc.). It is a dictionary-based system in structure of the English sentence given above. which each word in the dictionary is associated In the first level the Inter-phrase relations (cor- with a set of links. Most of the links have some responding to the phrases “the teacher of associated suffixes to provide various information the boys”, “is” and “good”) are Ss and Pa (e.g., gender (m/f), number (s/p)), describing and the remaining links are Intra-phrase relations. some properties of the underlying word. The En- In the second level the only Inter-phrase rela- glish link parser lists total of 107 links. Table tionship is Mp (connecting “the teacher” and 1 gives a list of some important links of English “the boys”), and the Intra-phrase relations are LG along with the information about the words on Ds, Jp and Dmc. In third and the last level, Jp is their left/right and some suffixes. the Inter-phrase relationship and Dmc is the Intra- phrase relation (corresponding to “of” and “the Link Word in Left Word in Right Suffixes A Premodifier Noun - boys”). D Nouns s/m,c/u The algorithm proposed in Section 4 uses J Preposition Object of the prepo- s/p pDCA to first establish the relations of the tar- sition M Noun Post-nominal Modi- p/v/g/a get language corresponding to the first-level Inter- fier phrase relations of the source language sentence. MV Verbs/adjectives Modifying phrase p/a/i/ Then recursively it assigns the relations corre- l/x O Transitive verb Direct or indirect ob- s/p sponding to the inner level relations. ject P Forms of “be” Complement of “be” p/v/g/a 3 DCA vis-a-vis` pDCA PP Forms of “have” Past participle - S Subject Finite verb s/p, i, g Direct Correspondence Assumption (DCA) states that the relation between words in source language Table 1: Some English Links and Their Suffixes sentence can be projected as the relations between corresponding words in the (literal) translation in As an example, consider the syntactic struc- the target language. Direct Projection Algorithm

302 (DPA), which is based on DCA, is a straightfor- from the source language sentence. Thus we pro- ward projection procedure in which the dependen- pose pseudo Direct Correspondence Assumption cies in an English sentence are projected to the (pDCA) where not all relations can be projected sentence’s translation, using the word-level align- directly. The projection algorithm needs to take ments as a bridge. DPA also uses some monolin- care of the following three categories of links: gual knowledge specific to the projected-to lan- guage. This knowledge is applied in the form of Category 1: Relationship between two chunks Post-Projection transformation. in the source language can be projected to the tar- However with respect to many language pairs get language with minor or no changes (for ex- syntactic relationships between the words cannot ample, subject-verb, object-verb relationships in always be imitated to project a parse structure the above illustration). It may be noted that since from source language to target language. For il- except for some suffix differences (due to mor- lustration consider the sentence given in Figure 1. phological variations), the relation is same in the We try to project the links from English to Hindi source and the target language. in Figure 1(a) and Hindi to Bangla in Figure 1(b). Category 2: Relationship between two chunks For Hindi sentence, links are given as discussed by in the source language can be projected to the Goyal and Chatterjee (2005a; 2005b). target language with major changes. For ex- ample, in the English sentence given in Figure 2(a), the relationship between the girl and in the white dress is Mp, i.e. “nominal mod- ifier (preposition phrase)”. In the corresponding phrases ladkii and safed kapde waalii of Hindi, although the relationship is same, i.e., “nominal modifier”, the type of nominal modifier is chang- ing to waalaa/waale/waalii-adjective. If the dis- (a) tinction between the types of nominal modifiers is not maintained, the parsing will be very shallow. Hence the modification in the link is necessary. Category 3: Relationship between two chunks in the target language is either entirely different or can not be captured from the relationship be- tween the corresponding chunk(s) in the source (b) language. For example, the relationship between the main verb and the auxiliary verb of the Hindi Figure 1: Failure of DCA sentence in Figure 2(a) can not be defined us- ing the English parsing. Such phrases should be We observe that in the parse structure of the tar- parsed independently. get language sentences, neither all relations are The proposed algorithm is based on the above- correct nor the is complete. Thus, we described concept of pDCA which gives the parse observe that DPA leads to, if not wrong, a very structure of the sentences given in Fig. 2. shallow parse structure. Further, Figure 1(b) sug- While working with Indian languages, we found gests that DCA fails not only for languages be- that outermost Inter-phrase relations usually be- longing to different families (English-Hindi), but long to Category 1, and remaining relations be- also for languages belonging to the same family long to Category 2. Generally an innermost Intra- (Hindi-Bangla). phrase relation (like verb phrase) belongs to Cate- Hence it is necessary that the parsing algo- gory 3. Thus, outermost Inter-phrase relations can rithm should be able to differentiate between the usually be projected to target language directly, in- links which can be projected directly and the nermost Intra-phrase relations for the target lan- links which cannot. Further it needs to identify guage which are independent of the source lan- the chunks of the target language sentence that guage should be decided on the basis of language cannot be linked even after projecting the links specific study and remaining relationship should

303 in the target language are given independently. A list of link rules is maintained which keeps the in- formation about modification(s) required in a link while projecting from the source language to the target language. These rules are limited to closed category words, to parts of speech projected from source language, or to easily enumerated lexical categories. (a) Figure 3 describes the algorithm. The algorithm takes an input sentence (T ) and the parsing and the constituent structure of its parallel sentence (S). Further S and T are assumed to be word-aligned. Initially, S and T are passed to the module Project- From(), which identifies the constituent phrases of S and the relations between them. Then each set (b) of phrases and relations is passed to the module ParseFrom(). ParseFrom() module takes as input Figure 2: Parsing Using pDCA two source phrases/words, relation between them, and corresponding target phrases. It projects the be modified before projection from source to tar- corresponding relations in the target language sen- get language. tence T . ParseFromSpecial() module is required if the relation between phrases of source language 4 The Proposed Algorithm can not be projected so directly to the target lan- guage. Module Parse() assigns links between the DPA (Hwa et al., 2005) discusses projection pro- constituent words of the target language phrases cedure for five different cases of word align- ∈ P. Notations used in the algorithm are as fol- ment of source-target language: one-to-one, one- lows: to-none, one-to-many, many-to-one and many-to- many. As discussed earlier, DPA is not sufficient • By T 0 ∼ S0 we mean that T 0 is aligned with for many cases. For example, in case of one-to- S0, T 0 and S0 being some text in the target many alignment, the proposed solution is to first and source language, respectively. create a new empty word that is set as head of all multiply aligned words in target language sen- • Given a language, the head of a phrase is usu- tence, and then the relation is projected accord- ally defined as the keyword of the phrase. For ingly. But, in such cases, relations between these example, for a verb phrase, the head word is multiply-aligned words can not be given, and thus the main verb. the resulting parsing becomes shallow. The pro- • P is the exhaustive set of target language posed algorithm (pDPA) overcomes these short- phrases for which Intra-phrase relations are comings as well. independent of the corresponding source lan- The pDPA works in the following way. It re- guage phrases. cursively identifies the phrases of the target lan- guage sentence, and assigns the links between • Rule list R is the list of source-target lan- the two phrases/words of the target language sen- guage specific rules which specifies the mod- tence by using the links between the correspond- ifications in the source language relations to ing phrases/words in the source language sen- be projected appropriately in the target lan- tence. It may be noted that link between phrases guage. means link between the head words of the corre- sponding phrases. Assignment of links starts from • Given the parse and constituent structure of a the outermost level phrases. Syntactic relations text S, Ψij = hSi,Sj,Li, where L is the re- between the constituents of the target language lation between the constituent phrases/words 0 phrase(s) for which the syntactic structure does Si and Sj of S. Ψij = hTi,Tji, Ti ∼ Si and 0 not correspond with the corresponding phrase(s) Tj ∼ Sj. Further, Φij = hΨij, Ψiji.

304 0 0 0 ProjectFrom(S ,T ): // S is a source • S is a stack of Φijs. // language sentence or phrase, T 0 ∼ S0 { • L is the set of source language relations 0 IF T 0 ∈ P whose occurrence in parse of some S may 0 0 THEN Parse(T 0); lead to different structure of T , where T ∼ 0 ELSE S . 0 S = {S1,S2,...,Sn}; // Sis are In the following sections we discuss in detail the 0 //constituent phrases/words of S scheme for parsing Hindi sentences using parse 0 T = {T1,T2,...,Tn} // Ti ∼ Si structure of the corresponding English sentence. 0 Find all Ψij = hSi,Sj,Li from S and Along with the parse structure of the input, the 0 0 corresponding Ψij = {Ti,Tj} from T ; phrase structure is also obtained. 0 Φij = hΨij, Ψiji For all i, j, push (S , Φij); 5 Case study: English to Hindi While !empty(S ) Prior requirements for developing a parsing Φ = pop(S ); scheme for the target language using the proposed IF L/∈ L algorithm are: development of target language THEN ParseFrom(Φ); links, word alignment technique, phrase identifi- ELSE ParseFromSpecial(Φ); cation procedure, creation of rule set R, morpho- } logical analysis, development of ParseFromSpe- 0 0 Parse(T ): // T is a target language phrase cial() module. In this section we discuss these de- { tails for adapting a parser for Hindi using English 0 Assign links between constituent words of T LG based parser. using target language specific rules; } Hindi Links. Goyal and Chatterjee (2005a; ParseFrom(Φ): // Φ = hΨ, Ψ0i; 2005b) have developed links for Hindi Link Gram- 0 // Ψ = hS1,S2,Li;Ψ = hT1,T2i; mar along with their suffixes. Some of the Hindi { links are briefly discussed in the Table 2. It may IF T1 6= {φ} & T2 6= {φ} THEN be noted that due to the free of Hindi, Find head words t1 ∈ T1 and t2 ∈ T2; direction can not be specified for some links, i.e., 0 0 Assign relation L between t1 and t2; // L for such links “Word in Left” and “Word in Right” //is target language link corresponding (second and third column of Table 2) shall be read //to L identified using R as “Word on one side” and “Word on the other IF T1 is a phrase and not already parsed side”, respectively. THEN ProjectFrom(S1,T1); Link Word in Left Word in Right Directed IF T2 is a phrase and not already parsed S Subject Main verb NO THEN ProjectFrom(S2,T2); SN ne Main verb NO } O Object Main verb NO 0 J noun/pronoun postposition YES ParseFromSpecial(Φ): // Φ = hΨ, Ψ i; MV verb modifier Main verb NO 0 // Ψ = hS1,S2,Li;Ψ = hT1,T2i; MA Adjective Noun YES { ME aa-e-ii form of Noun YES verb Use target language specific rules to identify if MW waalaa/waale/ Noun YES 0 the relation between T1 and T2 is given by L ; waalii IF true THEN ParseFrom(Φ); PT taa-te-tii form of declension of YES verb verb honaa ELSE D Head noun YES Assign required relations using rules; IF T1 is a phrase and not already parsed Table 2: Some Hindi Links THEN ProjectFrom(S1,T1); IF T2 is a phrase and not already parsed THEN ProjectFrom(S2,T2); Word Alignment. The algorithm requires that } the source and target language sentences are word aligned. Some English-Hindi word align- Figure 3: pseudo Direct Projection Algorithm ment algorithms have already been developed, e.g.

305 (Aswani and Gaizauskas, 2005). However, for the translation of words of English NP1 is kamre mein current implementation alignment has been done baiThii huii aurat which is followed by postposi- manually with the help of an online English-Hindi tion ne, the Hindi phrase corresponding to NP1 dictionary1. is kamre mein baiThii huii aurat ne with ne as the head word. As huii and thii, which follow Identification of Phrases and Head Words. the verbs baiThii4 and banaayaa respectively, are Verb Phrases. Corresponding to any main verb present in the auxiliary verb list, Hindi VPs are obtained as baiThii huii and banaayaa thaa (cor- vi present in the Hindi sentence, a verb phrase is formed by considering all the auxiliary verbs fol- responding to VP1). lowing it. A list of Hindi auxiliary verbs, along Phrase Set P. Hindi verb phrase and postposi- with the linkage requirements has been main- tion phrases are linked independent of the corre- tained. This list is used to identify and link verb sponding phrases in the English sentence. Thus, phrases. Main verb of the verb phrase is consid- P = {VP,PP }. ered to be the head word. Rule List R. Below we enlist some of the rules 2 Noun and Postposition Phrases. English NP defined for parsing Hindi sentences using the En- 3 is translated in Hindi as either NP or PP . Also, glish links (E-links) of the parallel English sen- English PP can be translated as either NP or PP. If tences. Note that these rules are dependent on the the Hindi noun is followed by any postposition, target language. then that postposition is attached with the noun Corresponding to E-link S: If the Hindi subject is to get a PP. In this case the postposition is con- followed by ne, then the subject makes a Jn link sidered as the head. Hindi NP corresponding to with ne, and ne makes an SN link with the verb. some English NP is the maximal span of the words (in Hindi sentence) aligned with the words in the Corresponding to E-link O: If the Hindi object is corresponding English NP. The Hindi noun whose followed by ko, then the object makes a Jk link English translation is involved in establishing the with ko, and ko makes an OK link with the verb. Inter-phrase link is the head word. Note that if the Corresponding to E-links M, MX: English NPs last word (noun) in this Hindi NP is followed by may have preposition phrase, present participle, any postposition (resulting in some PP), then that past participle or adjective as postnominal modi- postposition is also included in the NP concerned . fiers which are translated as prenominal modifiers, In this case the postposition is the head of the NP. or as relative clause in Hindi. The structure of The system maintains a list of Hindi postpositions postnominal modifier, however, may not be pre- to identify Hindi PPs. served in the Hindi sentence. If the sentence is not For example, consider the translation pair the complex, then the corresponding Hindi link may lady in the room had cooked the be one of MA (adjective), MP (postposition phrase), food∼ kamre (room) mein (in) baiThii huii (-) MT (present participle), ME (past participle), or MW aurat (lady) ne (-) khaanaa (food) banaayaa (waalaa/waale/waalii-adjective). An appropriate (cooked) thaa (-). link is to be assigned in Hindi sentence after iden- The phrase structure of the English sen- tification of the structure of the nominal modifier. tence is (NP1 (NP2 the lady) (PP1 These cases are handled in the module ParseFrom- in (NP3 the room))) (VP1 had Special(). The segment of the module that handles cooked) (NP4 the food). English Mp link is given in Figure 4. Here, some of the Hindi phrases are as follows: Further, since morphological information of kamre mein and aurat ne are identified as Hindi Hindi words can not be always extracted using cor- PP corresponding to English PP1 and NP2. The responding English sentence, a morphological an- words mein and ne are considered as their head alyzer is required to extract the information5. For words, respectively. Since the maximal span of the current implementation, morphological infor-

1www.sanskrit.gde.to/hindi/dict/eng-hin-itrans.html 4We observe that English PP as postnominal modifier may 2In Hindi prepositions are used immediately after the be translated as verbal prenominal modifier in Hindi and in noun. Thus, we refer to them as “postposition”. such cases some unaligned word is effectively a verb. 3PP for English is preposition phrase and for Hindi it 5For Hindi, some work is being carried out in this direc- stands for postposition phrase. tion, e.g., http://ccat.sas.upenn.edu/plc/ tamilweb/hindi.html

306 Since Ss ∈/ L , ParseFrom(Φ ) is executed. ParseFromSpecial(Φ): // Φ = hΨ, Ψ0i; 12 0 // Ψ = hS1,S2,Li;Ψ = hT1,T2i; ParseFrom(Φ12): { The algorithm identifies t1 = ne, t2 = banaayaa. IF L = Mp THEN //S and S are NP and PP, resp. 1 2 The Hindi link corresponding to Ss will be SN. IF T2 is followed by some verb, v, not aligned with any word in S THEN The module ProjectFrom(S1, T1) is then called. T3 = VP corresponding to v; ProjectFrom(S1, T1): Parse(T3); Find head word t1 ∈ T1; S1 = {S11,S12}, where S11 and S12 are the Assign MT/ME link between v and t1; girl and in the room, respectively. Corre- Assign MVp link between postposition (in T2) and v; sponding T11 and T12 are ladkii ne and kamre ProjectFrom(S1, T1); ProjectFrom(S2, T2); mein. Thus, Φ = hhS11,S12, Mpi, hT11,T12ii. ELSE Since L = Mp ∈ L , ParseFromSpecial(Φ) is ParseFrom(Φ); ELSE called. Check for other cases of L; ParseFromSpecial(Φ): (Refer to Figure 4) } Since T2 is followed by an unaligned verb Figure 4: ParseFromSpecial() for ‘Mp’ Link baithii, the algorithm finds T3 as baithii, and t1 as ne. It assigns ME link between baithii and ne. Further, MVp link is assigned between mation is being extracted using some rules in sim- mein and baithii. Then ProjectFrom(S11,T11) and pler cases, and manually for more complex cases. ProjectFrom(S12,T12) are called. Since both T11 and T12 ∈ S , J and Jn links are assigned be- 5.1 Illustration with an Example tween constituent words of T11 and T12, respec- Consider the English sentence (S) the girl tively, using Hindi-specific rules. in the room drew a picture, its parsed Similarly, Φ23 is parsed. and constituent structure as given in Figure 5. Fur- The final parse and phrase structure of the sen- ther, the corresponding Hindi sentence (T ), and tence are obtained as given in Figure 6. the word-alignment is also given.

Figure 6: Parsing of Example Sentence

6 Experimental Results Figure 5: An Example Currently the system can handle the following The step-by-step parsing of the sentence as per types of phrases in different simple sentences. the pDPA is given below. Noun Phrase. There can be four basic elements ProjectFrom(S, T ): of an English NP6: determiner, pre-modifier, noun S = {S ,S ,S }, where S ,S ,S are the 1 2 3 1 2 3 (essential), post-modifier. The system can han- phrases the girl in the room, drew and dle any combination of the following: adjective, a picture, respectively. From the definition of noun, present participle or past participle as pre- Hindi phrases, corresponding T ’s are identified as i modifier, and adjective, present participle, past “kamre mein baithii laDkii ne”, “banaayaa” and participle or preposition phrase as post-modifier. “ek chitr”. From the parse structure of S, Φ’s are Note that some of these cases may be translated as obtained as Φ = hhS ,S , Ssi, hT ,T ii and 12 1 2 1 2 complex sentence in Hindi (e.g., (book on the Φ = hhS ,S , Osi, hT ,T ii. These Φ’s are 23 2 3 2 3 table ∼ jo kitaab mej par rakhii hai). We are pushed in the stack S and further processing is working upon such cases. done one-by-one for each of them. We show the 6 further process for the Φ12. Pronouns as NPs are simple.

307 Verb Phrase. The system can handle all the four provide more flexibility in the projection algo- aspects (indefinite, continuous, perfect and perfect rithm. The flexibility comes from the fact that un- continuous) for all three tenses. Other cases of like DPA the algorithm can project links from the VPs (e.g., modals, passives, compound verbs) can source language to the target language even if the be handled easily by just identifying and putting translations are not literal. Use of rules at the pro- the corresponding auxiliary verbs and their link- jection level gives more robust parsing and reduces ing requirements in the auxiliary verb list. the need of post-editing. The proposed scheme Since the system is not fully automated yet, we should work for other target languages also pro- could not test our system on a large corpus. The vided the relevant rules can be identified. Fur- system has been tested on about 200 sentences ther, since LG can be converted to Dependency following the specific phrase structures mentioned Grammar (DG) (Sleator and Temperley, 1991), above. These sentences have been taken randomly this work can be easily extended for languages for from translation books, stories books and adver- which DG implementation is available. tisement materials. These sentences were manu- At present, we have focused on developing ally parsed and a total of 1347 links were obtained. parsing scheme for simple sentences. Work has to These links were compared with the system’s out- be done to parse complex sentences. Once a size- put. Table 3 summarizes the findings. able parsed corpus is generated, it can be used for developing the parser for a target language using Correct Links : 1254 bootstrapping. We are currently working on these Links with wrong suffix : 47 lines for developing a Hindi parser. Wrong links : 22 Links missing : 31 References

Table 3: Experimental Results Niraj Aswani and Robert Gaizauskas. 2005. A hy- brid approach to aligning sentences and words in English-Hindi parallel corpora. In ACL 2005 Work- After analyzing the results, we found that shop on Building and Using Parallel Texts: Data- driven and Beyond. • For some links, suffixes were wrong. This Rens Bod, Remko Scha, and Khalil Sima’an, editors. was due to insufficiency of rules identifying 2003. Data-Oriented Parsing. Stanford: CSLI Pub- morphological information. lications.

• Due to incompleteness of some cases of Shailly Goyal and Niladri Chatterjee. 2005a. Study of Hindi noun phrase for developing a link ParseFromSpecial() module, some wrong grammar based parser. Language in India, 5. links were assigned. Also, some links which should not have been projected, were pro- Shailly Goyal and Niladri Chatterjee. 2005b. Towards developing a link grammar based parser for Hindi. jected in the Hindi sentence. We are working In Proc. of Workshop on Morphology, Bombay. towards exploring these cases in detail. Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara • Some links were found missing in the pars- Cabezas, and Okan Kolak. 2005. Bootstrap- ping parsers via syntactic projection across parallel ing since corresponding sentence structures texts. Natural Language Engineering, 11(3):311– are yet to be considered in the scheme. 325, September.

7 Concluding Remarks Daniel Sleator and Davy Temperley. 1991. Parsing English with a link grammar. Computer Science The present work focuses on development of Ex- technical report CMU-CS-91-196, Carnegie Mellon University, October. ample based parsing scheme for a pair of lan- guages in general, and for English to Hindi in par- Oliver Streiter. 2002. Abduction, induction and ticular. memorizing in corpus-based parsing. In ESSLLI- 2002 Workshop on Machine Learning Approaches Although the current work is motivated by in Computational Linguistics,, Trento, Italy. (Hwa et al., 2005), the algorithm proposed herein David Yarowsky and Grace Ngai. 2001. Inducing mul- provides a more generalized version of the projec- tilingual POS taggers and NP bracketers via robust tion algorithm by making use of some target lan- projection across aligned corpora. In NAACL-2001, guage specific rules while projecting links. This pages 200–207.

308