Literature Survey: Pivot-based Machine Translation

Rohit More
IIT Bombay
[email protected]

Abstract

Current statistical machine translation (SMT) systems rely heavily on the availability of parallel corpora for the language pair involved. Good quality parallel corpora are not always available, and this creates a bottleneck. One solution to this bottleneck is to introduce a third language, called the pivot language, for which good quality source-pivot and pivot-target bilingual corpora exist.

1 Related work

There is a substantial amount of work in the area of pivot strategies for SMT. (De Gispert and Marino, 2006) address the translation task between Catalan and English with Spanish as a pivot language. Pivoting is done using two techniques: concatenation of two SMT systems, and a direct approach in which a Catalan-English corpus is generated and trained upon. In (Utiyama and Isahara, 2007), the authors inspect the use of a pivot language through phrase translation (phrase table creation) and sentence translation. (Wu and Wang, 2007) discuss three methods for pivot strategies, namely phrase translation (i.e. triangulation), the transfer method, and the synthetic method. (Nakov and Ng, 2012) exploit the similarity between resource-poor and resource-rich languages for the translation task. (Dabre et al., 2014) used multiple decoding paths (MDP) to overcome the limitation of small-sized corpora. (Paul et al., 2013) debate the criteria to be considered for selecting a good pivot language. The use of source-side segmentation as a pre-processing technique is demonstrated by (Kunchukuttan et al., 2014). (Goldwater and McClosky, 2005) investigate several methods for incorporating morphological information to achieve better translation from Czech to English.

The pivot strategies mentioned above focus on the situation of resource-poor languages where direct translation is either very poor or not available. Our approach, on the other hand, tries to employ a pivot strategy to help improve the performance of an existing direct MT system. Our attempt to integrate word segmentation with pivot strategies is the first of its kind.

2 Approaches to Pivot based MT

There are several methods by which the resources of a pivot language can be utilized, as explained in (Wu and Wang, 2009), namely:

1. Sentence Translation or Transfer Method

2. Synthetic corpus synthesis

3. Phrase table construction or Triangulation Approach

These methods are explained briefly in the following sections.

2.1 Sentence Translation or Transfer Method

The transfer method first translates the source language into the pivot language using a source-pivot translation system, and then from the pivot language into the target language using a pivot-target translation system. Given a source sentence S, we can translate it into n pivot language sentences P_1, P_2, P_3, ..., P_n using the source-pivot translation system. Each of these n sentences P_i can then be translated into m target language sentences T_{i1}, T_{i2}, T_{i3}, ..., T_{im} using the pivot-target translation system. Thus, in total we have m x n target language sentences. These sentences can then be re-scored using the source-pivot and pivot-target translation scores according to the method described in (Utiyama and Isahara, 2007).

If we denote the source-pivot system features as h^{sp} and the pivot-target features as h^{pt}, the best scoring translation is calculated using the equation:

    \hat{t} = \arg\max_{t} \sum_{k=1}^{L} \left( \lambda_k^{sp} h_k^{sp}(s, p) + \lambda_k^{pt} h_k^{pt}(p, t) \right)    (1)

where L is the number of features used in the SMT systems and \lambda^{sp}, \lambda^{pt} are feature weights.
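To make the re-scoring in Equation 1 concrete, here is a minimal Python sketch of the transfer method. The sp_system and pt_system objects, their nbest method (returning (sentence, feature-vector) pairs), and their weights attribute are hypothetical stand-ins for any SMT engine exposing n-best lists:

    def transfer_translate(s, sp_system, pt_system, n=10, m=10):
        """Transfer method: generate m*n target candidates via the pivot
        and keep the one with the highest combined score (Equation 1)."""
        candidates = []
        for p, h_sp in sp_system.nbest(s, n):        # n pivot hypotheses
            sp_score = sum(l * h for l, h in zip(sp_system.weights, h_sp))
            for t, h_pt in pt_system.nbest(p, m):    # m target hypotheses each
                pt_score = sum(l * h for l, h in zip(pt_system.weights, h_pt))
                candidates.append((sp_score + pt_score, t))
        return max(candidates, key=lambda c: c[0])[1]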
2.2 Corpus Synthesis

There are two ways to obtain a source-target corpus. One is to translate the pivot language sentences of the source-pivot corpus into target language sentences using the pivot-target system. The other is to translate the pivot sentences of the pivot-target corpus into source sentences using a pivot-source system.

The source-target corpora created using the above two methods can then be combined to produce a final synthetic corpus.

2.3 Triangulation or Phrase table induction

The method of triangulation is described in (Wu and Wang, 2007). In this method, we train source-pivot and pivot-target models using the source-pivot and pivot-target corpora respectively. Using these two models, we induce a source-target model. The two important components to be induced are 1) the phrase translation probability and 2) the lexical weight.

The phrase translation probability is induced on the basis of the assumption that source and target phrases are conditionally independent when conditioned on pivot phrases. It can be given as

    \phi(\vec{s} \mid \vec{t}) = \sum_{\vec{p}} \phi(\vec{s} \mid \vec{p}) \, \phi(\vec{p} \mid \vec{t})    (2)

where \vec{s}, \vec{p}, \vec{t} are phrases in the languages L_s, L_p, L_t respectively.

The lexical weight, according to (Koehn et al., 2003), depends on 1) the word alignment information a in a phrase pair (s, t) and 2) the lexical translation probability w(s|t). To calculate the lexical weight, the word alignment is induced from the source-pivot and pivot-target alignments. Using the information from the induced word alignment, the lexical probabilities are estimated. Thus, the lexical weight is calculated using the induced alignment and the estimated lexical probabilities.

We will take a detailed look at the mathematics behind the triangulation approach.

3 Mathematics of Triangulation Approach

This section introduces the triangulation method that performs phrase-based SMT for the language pair L_f - L_e by using two bilingual corpora, L_f - L_p and L_p - L_e. Two translation models are trained for L_f - L_p and L_p - L_e. Based on these models, a pivot translation model is built for L_f - L_e, with L_p as a pivot language. The details are taken from Wu and Wang (Wu and Wang, 2007).

Among the features of the standard phrase-based log-linear model, only the phrase translation probability and the lexical weight are language dependent. They are introduced as follows.

3.1 Phrase Translation Probabilities

Using the L_f - L_p and L_p - L_e bilingual corpora, we train two phrase translation probabilities \phi(\vec{f}_i \mid \vec{p}_i) and \phi(\vec{p}_i \mid \vec{e}_i), where \vec{p}_i is the phrase in the pivot language L_p. We obtain the phrase translation probability \phi(\vec{f}_i \mid \vec{e}_i) according to the following model:

    \phi(\vec{f}_i \mid \vec{e}_i) = \sum_{\vec{p}_i} \phi(\vec{f}_i \mid \vec{p}_i, \vec{e}_i) \, \phi(\vec{p}_i \mid \vec{e}_i)    (3)

The phrase translation probability \phi(\vec{f}_i \mid \vec{p}_i, \vec{e}_i) does not depend on the phrase \vec{e}_i in the language L_e, since it is estimated from the L_f - L_p bilingual corpus. Thus, equation 3 can be rewritten as

    \phi(\vec{f}_i \mid \vec{e}_i) = \sum_{\vec{p}_i} \phi(\vec{f}_i \mid \vec{p}_i) \, \phi(\vec{p}_i \mid \vec{e}_i)    (4)

Are the probability calculations correct? Let us go step by step through the formulation of the phrase translation probability \phi(\vec{f}_i \mid \vec{e}_i). First, we marginalize over pivot phrases:

    \phi(\vec{f}_i \mid \vec{e}_i) = \sum_{\vec{p}_i} \phi(\vec{f}_i, \vec{p}_i \mid \vec{e}_i)    (5)

Now we apply the chain rule:

    \phi(\vec{f}_i \mid \vec{e}_i) = \sum_{\vec{p}_i} \phi(\vec{f}_i \mid \vec{p}_i, \vec{e}_i) \, \phi(\vec{p}_i \mid \vec{e}_i)    (6)

Since we have the L_f - L_p corpus available, the calculation of the first term in the above equation does not depend on \vec{e}_i, i.e. \phi(\vec{f}_i \mid \vec{p}_i, \vec{e}_i) reduces to \phi(\vec{f}_i \mid \vec{p}_i). Thus, the final equation is

    \phi(\vec{f}_i \mid \vec{e}_i) = \sum_{\vec{p}_i} \phi(\vec{f}_i \mid \vec{p}_i) \, \phi(\vec{p}_i \mid \vec{e}_i)    (7)
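As an illustration of Equation 7, the sketch below induces a source-target phrase table from two in-memory tables. The nested-dictionary layout is an assumption made for brevity; real phrase tables would normally be streamed from disk:

    from collections import defaultdict

    def triangulate(phi_f_given_p, phi_p_given_e):
        """Equation 7: phi(f|e) = sum over pivot phrases p of phi(f|p) * phi(p|e).
        phi_f_given_p[p][f] = phi(f|p); phi_p_given_e[e][p] = phi(p|e)."""
        phi_fe = defaultdict(dict)
        for e, pivots in phi_p_given_e.items():
            for p, prob_pe in pivots.items():
                # Only pivot phrases present in both tables contribute.
                for f, prob_fp in phi_f_given_p.get(p, {}).items():
                    phi_fe[e][f] = phi_fe[e].get(f, 0.0) + prob_fp * prob_pe
        return phi_fe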
3.2 Lexical Weight

According to (Koehn et al., 2003), the lexical weight can be estimated using the following model:

    p_w(\vec{f} \mid \vec{e}, a) = \prod_{i=1}^{n} \frac{1}{|\{j \mid (i, j) \in a\}|} \sum_{\forall (i, j) \in a} w(f_i \mid e_j)    (8)

In order to estimate the lexical weight for our model, we first need to obtain the alignment information a between the two phrases \vec{f} and \vec{e}, and then estimate the lexical translation probability w(f|e) according to the alignment information.

The alignment information for the phrase pair (\vec{f}, \vec{e}) can be induced from the two phrase pairs (\vec{f}, \vec{p}) and (\vec{p}, \vec{e}). Let a_1 and a_2 be the word alignment information inside the phrase pairs (\vec{f}, \vec{p}) and (\vec{p}, \vec{e}) respectively. Then

    a = \{(f, e) \mid \exists p : (f, p) \in a_1 \ \& \ (p, e) \in a_2\}    (9)

With this induced alignment information, the probability can be estimated directly from the induced phrase pairs; this is the phrase method. If we use K to denote the number of induced phrase pairs, we estimate the co-occurrence frequency of the word pair (f, e) according to the following model:

    count(f, e) = \sum_{k=1}^{K} \phi_k(\vec{f} \mid \vec{e}) \sum_{i=1}^{n} \delta(f, f_i) \, \delta(e, e_{a_i})    (10)

where \phi_k(\vec{f} \mid \vec{e}) is the phrase translation probability for phrase pair k, and \delta(x, y) = 1 if x = y and 0 otherwise. Thus, the lexical translation probability can be estimated as

    w(f \mid e) = \frac{count(f, e)}{\sum_{f'} count(f', e)}    (11)

w(f|e) can also be calculated using the word method as

    w(f \mid e) = \sum_{p} w(f \mid p) \, w(p \mid e) \, sim(f, e; p)    (12)

where w(f|p) and w(p|e) are two lexical probabilities, and sim(f, e; p) is the cross-language word similarity.
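The two steps above can be sketched as follows: the first function composes word alignments as in Equation 9, and the second turns a lexical translation table into a phrase-pair weight as in Equation 8. The treatment of unaligned source words (normally linked to a NULL token) is omitted for brevity:

    def induce_alignment(a1, a2):
        """Equation 9: compose source-pivot links a1 = {(f_idx, p_idx)}
        with pivot-target links a2 = {(p_idx, e_idx)}."""
        return {(f, e) for (f, p) in a1 for (q, e) in a2 if p == q}

    def lexical_weight(f_words, e_words, a, w):
        """Equation 8: p_w(f|e, a), where w[(f_word, e_word)] = w(f|e)."""
        weight = 1.0
        for i, f in enumerate(f_words):
            links = [j for (i2, j) in a if i2 == i]
            if links:  # average w(f_i|e_j) over the words f_i aligns to
                weight *= sum(w.get((f, e_words[j]), 0.0)
                              for j in links) / len(links)
        return weight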

3.3 Interpolated Model

If we have only a small L_f - L_e parallel corpus, training a translation model on this corpus alone will result in a poorly performing system, the reason being data sparseness. To improve performance, we can use additional L_f - L_p and L_p - L_e parallel corpora. Moreover, we can use more than one pivot language: different pivot languages may capture different language phenomena and can improve translation quality by adding good quality L_f - L_e phrase pairs.

If we include n pivot languages, n pivot models can be estimated as described in section 3. To combine all these models with the standard model trained on the L_f - L_e corpus, we use linear interpolation. The phrase translation probability and the lexical weight are estimated as shown in equations 13 and 14:

    \phi(\vec{f} \mid \vec{e}) = \sum_{i=0}^{n} \alpha_i \, \phi_i(\vec{f} \mid \vec{e})    (13)

    p_w(\vec{f} \mid \vec{e}, a) = \sum_{i=0}^{n} \beta_i \, p_{w,i}(\vec{f} \mid \vec{e}, a)    (14)

where \sum_{i=0}^{n} \alpha_i = 1 and \sum_{i=0}^{n} \beta_i = 1. \phi_0(\vec{f} \mid \vec{e}) and p_{w,0}(\vec{f} \mid \vec{e}, a) denote the phrase translation probability and lexical weight trained on the L_f - L_e corpus, while \phi_i(\vec{f} \mid \vec{e}) and p_{w,i}(\vec{f} \mid \vec{e}, a) (i = 1, 2, ..., n) are those estimated using the pivot languages. \alpha_i and \beta_i are interpolation coefficients.
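A minimal sketch of the linear interpolation in Equations 13 and 14, assuming each model is held as a dictionary keyed by phrase pairs; the coefficients are placeholders that would be tuned in practice:

    def interpolate(models, coeffs):
        """Equations 13-14: mix the direct model (models[0]) with n
        pivot-induced models using coefficients that sum to one."""
        assert abs(sum(coeffs) - 1.0) < 1e-9
        keys = set().union(*models)  # union of all phrase-pair keys
        return {k: sum(c * m.get(k, 0.0) for c, m in zip(coeffs, models))
                for k in keys}

The same routine serves for both the phrase translation probabilities (with the alpha coefficients) and the lexical weights (with the beta coefficients).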
4 Case Studies

4.1 Improving statistical machine translation for a resource disadvantaged language using related resource-rich languages

Due to recent developments in statistical machine translation (SMT), it is possible to build a prototype system for any language pair within hours. To achieve this, a large amount of parallel sentence-aligned text is required. Such high-quality bi-texts are rare except for a few pairs of languages.

Nakov and Ng (Nakov and Ng, 2012) propose a language-independent approach for improving machine translation for resource disadvantaged languages by exploiting their similarity to resource-rich ones. In other words, we have a resource disadvantaged language (say X1) which is closely related to a resource-rich language (say X2). X1 and X2 may have similarities in word order, vocabulary, spelling, syntax, etc. We improve translation from the resource disadvantaged language X1 into a resource-rich language Y using a bi-text containing a limited number of parallel sentences for X1-Y and a large bi-text for X2-Y. The approaches to achieve this are discussed below.

4.1.1 Method

In order to use the bi-text of one language to improve SMT for a related language, two general strategies are used: 1) bi-text concatenation, with possible repetitions of the original bi-text for balance, and 2) phrase table combination, where each bi-text is used to build a separate phrase table and the two phrase tables are then combined. We discuss these strategies below:

1. Concatenating Bi-texts
   In this approach, we simply concatenate the bi-texts for X1-Y and X2-Y into one large bi-text. This can improve the alignments obtained from the X1-Y bi-text, because the additional sentences provide context for rare words in that bi-text.
   Concatenation can also provide more source-side translation options, thereby increasing lexical coverage and reducing the number of Out-Of-Vocabulary (OOV) words. It can also introduce new non-compositional phrases on the source side to increase fluency, and it offers new target language phrases. Inappropriate phrases from X2 that do not exist in X1 will simply not match the test-time input.
   However, this approach of simple concatenation can be problematic. Since the X2-Y bi-text is much larger than the X1-Y bi-text, the former will dominate during word alignment and phrase extraction. This can distort the lexical and phrase translation probabilities, thus yielding poor performance. This imbalance of bi-texts can be corrected by repeating the smaller X1-Y bi-text several times so that the large one does not dominate.
   The original and additional training bi-texts are combined in the following ways:

   (a) cat x 1 - Simple concatenation of the original and additional bi-texts to form a new training bi-text, which is used to train a new phrase-based SMT system.

   (b) cat x k - Concatenation of k copies of the original bi-text and one copy of the additional bi-text to form a new training bi-text. The value of k is selected such that the original bi-text approximately matches the size of the additional bi-text.

   (c) cat x k:align - We concatenate k copies of the original bi-text and one copy of the additional bi-text to form a new training bi-text. Word alignments are generated from this new bi-text. Then all sentence pairs and word alignments are discarded except for one copy of the original bi-text. Thus, the word alignments for the original bi-text are induced using additional statistical information from the additional bi-text. These alignments are then used to build a phrase table.

2. Combining Phrase Tables
   The alternative way to use the additional training bi-text is to build separate phrase tables. These phrase tables can be used together, merged, or interpolated.
   The phrase table construction method has many advantages. The phrase pairs extracted from the X1-Y bi-text are clearly distinguished from the riskier ones from the X2-Y bi-text, and the lexical and phrase translation probabilities are combined in a proper manner. On the negative side, the word alignments for sentences in the X1-Y bi-text are not improved as they were in the first case. Below are the three phrase table construction strategies:

   (a) Two tables: Two separate phrase tables are built from the two bi-texts. These tables are used as alternative decoding paths.

   (b) Interpolation: From the original and additional bi-texts, two separate phrase tables, T_orig and T_extra, are built. To combine the corresponding conditional probabilities, linear interpolation is used: Pr(e|s) = alpha * Pr_orig(e|s) + (1 - alpha) * Pr_extra(e|s). The value of alpha is optimized over a development dataset.

   (c) Merge: From the original and additional bi-texts, two separate phrase tables, T_orig and T_extra, are built. We keep all source-target phrase pairs from T_orig, adding to them those source-target phrase pairs from T_extra that were not present in T_orig. For each added phrase pair, the associated lexical and phrase translation probabilities are retained.

3. Proposed Approach
   The approach proposed here tries to take into account the advantages and disadvantages of both the schemes discussed above. The word alignments for the X1-Y bi-text are improved by biasing the word alignment process with additional phrases from the X2-Y bi-text, and lexical coverage is increased by considering additional phrases from the X2-Y bi-text. The process is explained briefly below (a sketch of the final merge step follows this list):

   (a) Use the X1-Y bi-text k times and the X2-Y bi-text once to create a balanced bi-text B_rep. Create word alignments for B_rep and truncate them, keeping only one copy of the X1-Y bi-text. Using these alignments, the phrase table T_rep-trunc is created.

   (b) Using the simple concatenation of one copy of the X1-Y bi-text and one copy of the X2-Y bi-text, create a bi-text called B_cat. Create word alignments for B_cat and build a phrase table T_cat.

   (c) Now, using T_rep-trunc and T_cat, create a new phrase table by merging. Priority is given to T_rep-trunc during merging.
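The priority merge of step (c) reduces to a few lines if each phrase table is held as a dictionary keyed by phrase pairs; this representation is an assumption for illustration:

    def merge_with_priority(t_rep_trunc, t_cat):
        """Keep every entry of T_rep-trunc; add entries from T_cat only
        for phrase pairs that T_rep-trunc does not cover."""
        merged = dict(t_cat)
        merged.update(t_rep_trunc)  # priority entries overwrite the rest
        return merged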
4.1.2 Experiments and Analysis

In this paper, a number of experiments were done in order to test the effect of the similarity between the original languages (Indonesian and Spanish) and the auxiliary languages (Malay and Portuguese). Indonesian-English SMT is improved using Malay as the auxiliary language, while Spanish-English SMT is improved using Portuguese as the pivot.

Various conclusions are drawn from the results of the experiments. It is clear that related languages can help improve SMT: there was an improvement of over 3 BLEU points in Spanish-English using Portuguese, and around 1.5 BLEU points in Indonesian-English using Malay.

The method of simple concatenation helps, but it can be problematic when the additional sentences far outnumber the original ones. Concatenation works well if the original bi-text is repeated enough times to match the size of the additional bi-text. Giving additional weight to the original phrases in the merging method is a good strategy. The improvement in the system is due to improved word alignment as well as increased lexical coverage.

4.2 Catalan-English Statistical Machine Translation without parallel corpus: Bridging through Spanish

This paper (De Gispert and Marino, 2006) discusses experiments on Catalan-English statistical machine translation without an English-Catalan parallel corpus. Instead, the authors make use of an English-Spanish parallel corpus and a Spanish-Catalan parallel corpus. Since Spanish and Catalan are in close language proximity, promising results are achieved using Spanish as a bridge language.

4.2.1 Choice of Spanish as a bridge

Catalan is a Romance language spoken or understood by over 12 million people. In spite of this, there is not much parallel corpora available for Catalan-English. Spanish-English, on the other hand, has large, good quality parallel corpora. Spanish and Catalan belong to the same language family, showing morphological and grammatical similarity. Thus, it is natural to exploit Spanish as a bridge language.

4.2.2 Bridging Strategies

In order to carry out English-Catalan machine translation, two strategies are implemented:

1. Sequential Strategy
   This method simply concatenates two statistical machine translation systems, one between Catalan and Spanish, and the other between Spanish and English. This is an error-additive approach, as errors from one system propagate to the input of the following system.

2. Direct Strategy
   This strategy consists of translating the whole Spanish side of the English-Spanish corpus into Catalan by using a Spanish-to-Catalan SMT system of a more general domain. Then, an English-Catalan system is trained on this automatically translated, noisy Catalan text. The expectation is that the errors made by the Spanish-Catalan system will not correlate with the English text and will receive very low probabilities when training the English-Catalan system.

4.2.3 Baseline SMT system

The SMT system used for these experiments follows the maximum entropy framework. It maximizes a log-linear combination of feature functions, as described in the following equation:

    \hat{t}_1^I = \arg\max_{t_1^I} \sum_{m=1}^{M} \lambda_m h_m(s_1^J, t_1^I)    (15)

where \lambda_m are the weighting coefficients of the log-linear combination, and the feature functions h_m(s, t) correspond to a logarithmic scaling of the probabilities of each model. For this approach, one translation model and four additional feature models are used. In contrast to standard phrase-based models, the translation model used here is a bilingual n-gram model expressed over tuples as bilingual units. The additional features are a target language model, a word bonus model, a source-to-target lexicon model, and a target-to-source lexicon model. Once these models are computed, the optimal values of the log-linear coefficients are estimated using an in-house implementation of the Simplex algorithm.
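The in-house Simplex tuning mentioned above can be approximated with an off-the-shelf Nelder-Mead optimizer; this sketch is an illustrative substitute, not the authors' implementation, and dev_error is a hypothetical function returning a development-set error (e.g. 1 - BLEU) for a given weight vector:

    import numpy as np
    from scipy.optimize import minimize

    def tune_weights(dev_error, num_features=5):
        """Estimate the log-linear coefficients of Equation 15 by
        minimizing development-set error with the simplex method."""
        x0 = np.ones(num_features) / num_features  # uniform start
        result = minimize(dev_error, x0, method="Nelder-Mead")
        return result.x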

4.2.4 Results

The results, in general, show that the automatic evaluation scores achieved in the Catalan-English task are similar to those of the Spanish-English task. Also, sentence concatenation performs worse than the direct strategy in both directions, i.e. from English to Catalan and vice versa. Thus, nearly no loss is found when Spanish is used as a bridge to obtain the Catalan-English system. For automatic evaluation, Word Error Rate (WER), Position-independent word Error Rate (PER), and BLEU scores are used.

These experiments proved that it is possible to build a large-scale statistical machine translation system between Catalan and any other language as long as a huge parallel corpus is available between Spanish and that language. The translation strategy discussed in this paper is limited to resource disadvantaged languages which are closely related to resource-rich languages.

4.3 How to choose the best pivot language for automatic translation of low resource languages

When it comes to statistical machine translation, the first requirement is a good quality parallel corpus. Such corpora are often unavailable for low resource languages, including many Asian languages. Thus, recent research on multilingual statistical machine translation focuses on the use of a pivot language for the translation of such resource disadvantaged languages. English, owing to its richness in language resources, comes first as a possible pivot language. In this paper (Paul et al., 2013), the effect of various factors on the choice of pivot language is studied.

Generally, the choice of pivot language is made on two criteria: 1) availability of language resources and 2) relatedness between source and pivot language. However, these criteria might not be sufficient for choosing the best pivot language, especially for Asian languages. Recent research shows that the use of a non-English language as pivot can improve system performance.

4.3.1 Coupling Strategies

Pivot translation is a translation from a source language to a target language through an intermediate pivot language. Within the SMT framework, the following strategies have been investigated:

1. Cascading of two translation systems
   The first MT system translates the source language input into the pivot language, and the second MT system receives the pivot language output from the first system as input and translates it into the target language output.

2. Pseudo Corpus Approach
   In the first part, a "noisy" corpus between the source and target languages is created by translating the pivot language parts of the source-pivot corpus into the target language using the pivot-target MT system. In the second part, a single SMT system is trained on the "noisy" source-target corpus (De Gispert and Marino, 2006). (A sketch of this step follows this list.)

3. Phrase-Table Composition
   The translation models of the source-pivot and pivot-target MT systems are combined into a source-target MT system. This is done by creating a source-target phrase table: source-pivot and pivot-target phrase table entries with identical pivot language phrases are merged, multiplying their posterior probabilities (Wu and Wang, 2007).

4. Bridging at translation time
   The coupling is integrated into the SMT decoding process by modeling the pivot text as a hidden variable and assuming independence between the source and target language sentences.

5. Multi-pivot translation
   Intermediate translations into several pivot languages are used to generate a final target language translation by a probabilistic combination of translation models or by system combination techniques.
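As an example, the pseudo corpus approach (item 2 above) can be sketched in a few lines, assuming a pivot-target engine exposing a hypothetical translate method; the resulting noisy pairs then train a standard SMT system:

    def build_pseudo_corpus(source_pivot_corpus, pivot_target_system):
        """Translate the pivot side of each (source, pivot) sentence pair
        into the target language, yielding a noisy source-target corpus."""
        return [(src, pivot_target_system.translate(piv))
                for src, piv in source_pivot_corpus]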
4.3.2 Language resources and diversity

The scope of this article is limited to the investigation of the choice of pivot language using the cascading of two translation systems explained above. The article covers 22 Indo-European and Asian languages, using a multilingual Basic Travel Expressions Corpus (BTEC).

Information on the languages used is shown in Table 1. These languages differ largely in word order, segmentation unit, and degree of inflection. OOV in Table 1 is the percentage of OOV words, while Length is the average sentence length.

Language              Voc    Length  OOV  Order  Unit    Inflection
Danish                26.5K  7.2     1.0  SVO    word    high
German                25.7K  7.1     1.1  mixed  word    high
English               15.4K  7.5     0.4  SVO    word    moderate
Spanish               20.8K  7.4     0.8  SVO    word    high
French                19.3K  7.6     0.7  SVO    word    high
Hindi                 33.6K  7.8     3.8  SOV    word    high
Italian               23.8K  6.7     0.9  SVO    word    high
Dutch                 22.3K  7.2     1.0  mixed  word    high
Polish                36.4K  6.5     1.1  SVO    word    high
Portuguese            20.8K  7.0     1.0  SVO    word    high
Brazilian Portuguese  20.5K  7.2     1.0  SVO    word    high
Russian               36.2K  6.4     2.3  SVO    word    high
Arabic                47.8K  6.4     2.1  VSO    word    high
Indonesian            18.6K  6.8     0.8  SVO    word    high
Japanese              17.2K  8.5     0.5  SOV    none    moderate
Korean                17.2K  8.1     0.8  SOV    phrase  moderate
Malay                 19.3K  6.8     0.8  SVO    word    high
Thai                  7.4K   7.8     0.4  SVO    none    light
Tagalog               28.7K  7.4     0.7  CSO    word    high
Vietnamese            9.9K   9.0     0.2  SVO    phrase  light
Chinese               13.3K  6.8     0.5  SVO    word    light
Taiwanese             39.5K  5.9     0.6  SVO    word    light

Table 1: Language resources (BTEC)

4.3.3 Observations for pivot language selection

The results of the experiments show a large variation in BLEU score across pivot languages, indicating that there is no single best pivot language. The quality of a given translation system largely depends on the respective source and target languages. For Indo-European pivot languages, the best language combination scores are generally higher than those obtained for Asian pivot languages.

For languages that are closely related, such as Portuguese and Brazilian Portuguese, the related language should be chosen as pivot when translating from or into the respective language. The results, in general, show that pivot languages closely related to the source language have a larger impact on overall pivot translation quality than pivot languages related to the target language. For Indo-European-only language pairs, only Indo-European languages perform well as pivots.

4.3.4 Indicators of Pivot Translation quality

Based on the observations made in the experiments, the following eight factors are identified which make a language an effective pivot for a given language pair:

1. Language Family: A binary feature indicating whether source and target language belong to the same language family.

2. Vocabulary: The training data vocabulary size of the source and target languages, the ratio of source and target vocabulary sizes, and the overlap between source and target vocabulary.

3. Sentence length: The average sentence length (computed in terms of words) of the source and target training sets, and the ratio of source and target sentence lengths.

4. Reordering: The amount and span of word order differences (reordering) in the training data.

5. Language Perplexity: The perplexity of the utilized language models.

6. Translation model entropy: The amount of uncertainty involved in choosing candidate translation phrases (see the sketch after this list).

7. Engine performance: The BLEU scores of the respective SMT engines used for the pivot translation experiments.

8. Monotonicity: The BLEU score difference of a given SMT engine for decoding with and without a reordering model.
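As one concrete reading of factor 6, the sketch below computes the average entropy of the target-phrase distributions in a phrase table; the exact formulation used in the paper may differ, so this is only an illustration:

    import math

    def translation_model_entropy(phrase_table):
        """Average H(e|f) over source phrases; phrase_table maps each
        source phrase to a dict {target phrase: phi(e|f)} summing to 1."""
        entropies = [-sum(p * math.log2(p) for p in dist.values() if p > 0)
                     for dist in phrase_table.values()]
        return sum(entropies) / len(entropies) if entropies else 0.0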

4.3.5 Summary

To summarize, the effects of using non-English pivot languages for translation between 22 Indo-European and Asian languages were compared. Source language-related features are preferable for heterogeneous language pairs, while target language-related features matter more for homogeneous language pairs.

References

Raj Dabre, Fabien Cromieres, Sadao Kurohashi, and Pushpak Bhattacharyya. 2014. Leveraging small multilingual corpora for SMT using many pivot languages.

Adrià De Gispert and Jose B. Marino. 2006. Catalan-English statistical machine translation without parallel corpus: bridging through Spanish. In Proc. of 5th International Conference on Language Resources and Evaluation (LREC), pages 65-68. Citeseer.

Sharon Goldwater and David McClosky. 2005. Improving statistical MT through morphological analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 676-683. Association for Computational Linguistics.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 48-54. Association for Computational Linguistics.

Anoop Kunchukuttan, Ratish Pudupully, Rajen Chatterjee, Abhijit Mishra, and Pushpak Bhattacharyya. 2014. The IIT Bombay SMT system for ICON 2014 tools contest. In NLP Tools Contest at ICON 2014.

Preslav Nakov and Hwee Tou Ng. 2012. Improving statistical machine translation for a resource-poor language using related resource-rich languages. Journal of Artificial Intelligence Research, 44(1):179-222.

Michael Paul, Andrew Finch, and Eiichiro Sumita. 2013. How to choose the best pivot language for automatic translation of low-resource languages. ACM Transactions on Asian Language Information Processing (TALIP), 12(4):14.

Masao Utiyama and Hitoshi Isahara. 2007. A comparison of pivot methods for phrase-based statistical machine translation. In HLT-NAACL, pages 484-491.

Hua Wu and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. Machine Translation, 21(3):165-181.

Hua Wu and Haifeng Wang. 2009. Revisiting pivot language approach for machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 154-162. Association for Computational Linguistics.