Pivot-Based Machine Translation

Literature Survey: Pivot-based Machine Translation Rohit More IIT Bombay [email protected] Current statistical machine translation sys- the situation of resource-poor languages where tems heavily rely on the availability of par- direct translation is either very poor or not allel corpora between the language pair in- available. Our approach, on the other hand, volved. The good quality parallel corpus is not tries to employ pivot strategy to help improve always available. This creates a bottleneck. the performance of existing direct MT system. One solution to solve this bottleneck is to in- Our attempt to integrate word segmentation troduce third language, named pivot language with pivot strategies is first of a kind. for which there exist good quality source-pivot and pivot-target bilingual corpora. 2 Approaches to Pivot based MT There are methods by which the resources of 1 Related work pivot language can be utilized as explained in There is a substantial amount of work done in (Wu and Wang, 2009) - namely the area of pivot strategies for SMT. (De Gis- 1. Sentence Translation or Transfer Method pert and Marino, 2006) talk about translation task between Catalan and English with 2. Synthetic corpus synthesis use of Spanish as a pivot language. Pivot- 3. Phrase table construction or Triangula- ing is done using two techniques- concatena- tion Approach tion of two SMT systems and direct approach in which Catalan-English corpus is generated These methods are explained in brief in fol- and trained upon. In (Utiyama and Isahara, lowing sections. 2007), the authors inspect the use of pivot language through - phrase translation (phrase ta- 2.1 Sentence Translation or Transfer ble creation) and sentence translation. (Wu Method and Wang, 2007) discuss three methods for The transfer method first translates the source pivot strategies namely - phrase translation language into pivot language using source- (i.e. triangulation), transfer method and syn- pivot translation system, and then from pivot thetic method. (Nakov and Ng, 2012) try to language to target language through the pivot- exploit the similarity between resource-poor target translation system. Given a source languages and resource-rich languages for the sentence S, we can translate it into n pivot translation task. (Dabre et al., 2014) used language sentences P1;P2;P3; :::Pn using a multiple decoding paths (MDP) to overcome source-pivot translation system. Each of these the limitation of small sized corpora.(Paul et n sentence, Pi then can be translated into m al., 2013) debates over criteria to be consid- target language sentences Ti1;Ti2;Ti3; ::::Tim ered for selection of good pivot language. Use using pivot-target translation system. Thus, of source-side segmentation as pre-processing in total we will have m×n target language sen- technique is demonstrated by (Kunchukuttan tences.These sentences can then be re-scored et al., 2014). (Goldwater and McClosky, 2005) using source-pivot and pivot-target transla- investigates several methods for incorporating tion scores according to method described in morphological information to achieve better (Utiyama and Isahara, 2007) translation from Czech to English. If we denote source-pivot system features as Pivot strategies mentioned above focus on hsp and pivot-target features as hpt, the best scoring translation is calculated using equa- induced word alignment, lexical probabilities tion: are estimated. Thus, lexical weight is calculated using induced alignment and estimated XL ( ) lexical probabilities. ^ sp sp pt pt We will take a detailed look at the mathe- t = argmax λk hk (s; p) + λk hk (p; t) t k=1 matics behind triangulation approach (1) 3 Mathematics of Triangulation Where, L is the number of features used in Approach SMT systems and λsp, λpt are feature weights. This section will introduce the triangulation 2.2 Corpus Synthesis method that performs phrase-based SMT for the language pair L − L by using two bilin- In order to obtain source-target corpus, there f e gual corpora of L − L and L − L . Two are two ways. One is, we can translate pivot f p p e translation models are trained for L −L and language sentences from source-pivot corpus f p L −L . Based on these models, a pivot trans- into target language sentences using the pivot- p e lation model is built for L − L , with L as a target system. The other way is, translation f e p pivot language. The details are extracted from of pivot sentences from the pivot-target cor- Wu and Wang (Wu and Wang, 2007). pus into source sentences using pivot-source According to Equation ??, only phrase system. translation probability, and the lexical weight The source-target corpora created using are language dependent. They are introduced above two methods can then be combined to as follows: produce a final synthetic corpus. 3.1 Phrase Translation Probabilities 2.3 Triangulation or Phrase table − − induction Using Lf Lp and Lp Le bilingual corpora, we( train) two phrase translation probabilities The method of triangulation is described in ~ j j (?). In this method, we train source-pivot ϕ fi ~pi and ϕ (~pi ~ei), where pi is the phrase models and pivot-target models using source- in pivot language Lp. We( obtain) the phrase pivot and pivot-target corpora respectively. translation probability ϕ f~ij~ei according to Using these two models created so far, we in- the following model, duce a source-target model. The two impor- tant components to be induced are - 1) phrase ( ) X ( ) ~ j ~ j j translation probability and 2) lexical weight. ϕ fi ~ei = ϕ fi ~pi; ~ei ϕ (~pi ~ei) (3) Phrase translation probability is in- ~pi duced on the basis of assumption- that source (The phrase) translation probability and target phrases are conditionally indepen- ϕ f~ij~pi; ~ei does not depend on the phrase ~ei dent when conditioned on pivot phrases. It in the language Le, since it is estimated from can be given as, the Lf − Lp bilingual corpus. Thus, equation 3 can be rewritten as ( ) X ( ) j~ j j~ ϕ ~s t = ϕ (~s ~p) ϕ ~p t (2) ( ) X ( ) ~p ~ j ~ j j ϕ fi ~ei = ϕ fi ~pi ϕ (~pi ~ei) (4) ~p Where, ~s, ~p, ~t are phrases in the languages i Ls, Lp, Lt respectively. Are probability calculations correct? Lexical Weight, according to (Koehn et Let us go step by step through the formulation( ) al., 2003), depends on - 1) word alignment in- of phrase translation probability ϕ f~ij~ei . formation a in a phrase pair (s; t) and 2) lexical First, we marginalize, translation probability w(sjt). To calculate lexical weight, the word align- ( ) X ( ) ~ j ~ j ment is induced from source-pivot and pivot- ϕ fi ~ei = ϕ fi; ~pi ~ei (5) target alignment. Using the information from ~pi ( ) ~ Now we will use the chain rule, Where, ϕk fj~e is phrase translation prob- ( ) X ( ) ability for phrase pair k. δ (x; y) = 1 ifx = y; otherwise 0 ϕ f~ij~ei = ϕ f~ij~pi; ~ei ϕ (~pij~ei) (6) Thus, lexical translation probability can be ~pi estimated as Since, we have Lf −Lp corpus available with us, the calculation of first term in( the above) equation will not depend on p i.e. ϕ f~ j~p ; ~e Pcount (f; e) i i i w (fje) = 0 (11) ( ) 0 f count (f ; e) will now reduce to ϕ f~ij~pi . Thus, the final equation will be, w (fje) can also be calculated using word method as, ( ) X ( ) ~ j ~ j j ϕ fi ~ei = ϕ fi ~pi ϕ (~pi ~ei) (7) X ~pi w (fje) = w (fjp) w (pje) sim (f; e; p) 3.2 Lexical Weight p (12) According to (Koehn et al., 2003), lexical weight can be estimated using following Where, w (fjp) and w (pje) are two lexical model. probabilities, and sim (f; e; p) is the cross language word similarity. ( ) Yn 1 X p f~j~e;a = w (f je ) 3.3 Interpolated Model w jjj (i; j) 2 aj i j i=1 8(i;j)2a If we have a small Lf − Le parallel corpus, (8) training a translation model on this corpus In order to estimate lexical weight for our alone will result in the poorly performing sys- model, we first need to obtain the alignment tem. The reason behind the poor performance information a between two phrases f~ and ~e, is sparse data. In order to improve this per- − and then estimate the lexical translation prob- formance, we can use additional Lf Lp and − ability w (fje) according to the alignment in- Lp Le parallel corpora. Moreover, we can formation. also use more than one pivot languages to improve the translation performance. Differ- The( alignment) information for the phrase ent pivot language may catch different lan- pair f;~e~ can be induced from the two phrase ( ) guage phenomenon and can improve transla- pairs, f;~ ~p and (~p;~e). Let a1 and a2 be tion quality by adding quality Lf − Le phrase the word( alignment) information inside phrase pairs. pairs f;~ ~p and (~p;~e) respectively. If we include n pivot languages, n pivot models can be estimated as described in section 3. In order to combine all these mod- a = f(f; e) j9p :(f; p) 2 a1& (p; e) 2 a2g (9) els with the standard model trained with the L − Le corpus, we use linear interpolation. With this induced alignment information, f The phrase translation probability and the lex- there exists a method to estimate the prob- ical weight are estimated as shown in equation ability directly from the induced phrase pairs.

Load more