
Literature Survey: Pivot-based Machine Translation

Rohit More
IIT Bombay
[email protected]

Abstract

Current statistical machine translation systems rely heavily on the availability of parallel corpora between the language pair involved. A good quality parallel corpus is not always available, and this creates a bottleneck. One solution to this bottleneck is to introduce a third language, called the pivot language, for which good quality source-pivot and pivot-target bilingual corpora exist.

1 Related work

There is a substantial amount of work in the area of pivot strategies for SMT. (De Gispert and Marino, 2006) address the translation task between Catalan and English with the use of Spanish as a pivot language. Pivoting is done using two techniques: concatenation of two SMT systems, and a direct approach in which a Catalan-English corpus is generated and trained upon. In (Utiyama and Isahara, 2007), the authors inspect the use of a pivot language through phrase translation (phrase table creation) and sentence translation. (Wu and Wang, 2007) discuss three methods for pivot strategies, namely phrase translation (i.e. triangulation), the transfer method, and the synthetic method. (Nakov and Ng, 2012) try to exploit the similarity between resource-poor languages and resource-rich languages for the translation task. (Dabre et al., 2014) used multiple decoding paths (MDP) to overcome the limitation of small-sized corpora. (Paul et al., 2013) debates the criteria to be considered for the selection of a good pivot language. The use of source-side segmentation as a pre-processing technique is demonstrated by (Kunchukuttan et al., 2014). (Goldwater and McClosky, 2005) investigate several methods for incorporating morphological information to achieve better translation from Czech to English.

The pivot strategies mentioned above focus on the situation of resource-poor languages, where direct translation is either very poor or not available. Our approach, on the other hand, tries to employ the pivot strategy to improve the performance of an existing direct MT system. Our attempt to integrate word segmentation with pivot strategies is the first of its kind.

2 Approaches to Pivot-based MT

There are several methods by which the resources of a pivot language can be utilized, as explained in (Wu and Wang, 2009), namely:

1. Sentence translation or transfer method
2. Synthetic corpus synthesis
3. Phrase table construction or triangulation approach

These methods are explained briefly in the following sections.

2.1 Sentence Translation or Transfer Method

The transfer method first translates the source language into the pivot language using a source-pivot translation system, and then from the pivot language into the target language using a pivot-target translation system. Given a source sentence S, we can translate it into n pivot language sentences P_1, P_2, P_3, ..., P_n using the source-pivot translation system. Each of these n sentences P_i can then be translated into m target language sentences T_{i1}, T_{i2}, T_{i3}, ..., T_{im} using the pivot-target translation system. Thus, in total, we will have m x n target language sentences. These sentences can then be re-scored using the source-pivot and pivot-target translation scores, according to the method described in (Utiyama and Isahara, 2007).

If we denote the source-pivot system features as h^{sp} and the pivot-target features as h^{pt}, the best scoring translation is calculated using the equation

\hat{t} = \arg\max_t \sum_{k=1}^{L} \left( \lambda_k^{sp} h_k^{sp}(s, p) + \lambda_k^{pt} h_k^{pt}(p, t) \right)    (1)

where L is the number of features used in the SMT systems, and \lambda^{sp}, \lambda^{pt} are the feature weights.
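The m x n candidate generation and re-scoring of equation (1) can be sketched in code. This is a minimal illustration, not the authors' implementation: the n-best functions `sp_nbest` and `pt_nbest` are hypothetical stand-ins for the source-pivot and pivot-target decoders (in practice, n-best lists from any phrase-based decoder would serve), and the feature weights play the role of \lambda^{sp} and \lambda^{pt}.

```python
def rescore_transfer(src, sp_nbest, pt_nbest, lam_sp, lam_pt):
    """Return the best target sentence among the m*n pivot candidates.

    sp_nbest(src)   -> list of (pivot_sentence, h_sp feature values)
    pt_nbest(pivot) -> list of (target_sentence, h_pt feature values)
    lam_sp, lam_pt  -> feature weights (lambda^sp, lambda^pt in eq. 1)
    """
    best, best_score = None, float("-inf")
    for pivot, h_sp in sp_nbest(src):            # n pivot hypotheses
        for target, h_pt in pt_nbest(pivot):     # m target hypotheses each
            # Weighted sum of source-pivot and pivot-target features (eq. 1).
            score = sum(l * h for l, h in zip(lam_sp, h_sp)) \
                  + sum(l * h for l, h in zip(lam_pt, h_pt))
            if score > best_score:
                best, best_score = target, score
    return best
```

With single-feature toy systems, the candidate whose combined source-pivot and pivot-target score is highest wins, exactly as the argmax in equation (1) prescribes.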
2.2 Corpus Synthesis

There are two ways to obtain a source-target corpus. One is to translate the pivot language sentences of the source-pivot corpus into target language sentences using the pivot-target system. The other is to translate the pivot sentences of the pivot-target corpus into source sentences using a pivot-source system. The source-target corpora created by the above two methods can then be combined to produce a final synthetic corpus.

2.3 Triangulation or Phrase Table Induction

The method of triangulation is described in (Wu and Wang, 2007). In this method, we train source-pivot and pivot-target models using the source-pivot and pivot-target corpora respectively. Using these two models, we induce a source-target model. The two important components to be induced are 1) the phrase translation probability and 2) the lexical weight.

The phrase translation probability is induced on the basis of the assumption that source and target phrases are conditionally independent when conditioned on pivot phrases. It can be given as

\phi(\tilde{s} \mid \tilde{t}) = \sum_{\tilde{p}} \phi(\tilde{s} \mid \tilde{p}) \, \phi(\tilde{p} \mid \tilde{t})    (2)

where \tilde{s}, \tilde{p}, \tilde{t} are phrases in the languages L_s, L_p, L_t respectively.

The lexical weight, according to (Koehn et al., 2003), depends on 1) the word alignment information a in a phrase pair (s, t) and 2) the lexical translation probability w(s|t). To calculate the lexical weight, the word alignment is induced from the source-pivot and pivot-target alignments. Using the information from the induced word alignment, the lexical probabilities are estimated. Thus, the lexical weight is calculated using the induced alignment and the estimated lexical probabilities.

We will now take a detailed look at the mathematics behind the triangulation approach.

3 Mathematics of the Triangulation Approach

This section introduces the triangulation method, which performs phrase-based SMT for the language pair L_f - L_e by using two bilingual corpora, L_f - L_p and L_p - L_e. Two translation models are trained, for L_f - L_p and L_p - L_e. Based on these models, a pivot translation model is built for L_f - L_e, with L_p as the pivot language. The details are taken from Wu and Wang (Wu and Wang, 2007).

In the phrase-based model, only the phrase translation probability and the lexical weight are language dependent. They are introduced as follows.

3.1 Phrase Translation Probabilities

Using the L_f - L_p and L_p - L_e bilingual corpora, we train two phrase translation probabilities, \phi(\tilde{f}_i \mid \tilde{p}_i) and \phi(\tilde{p}_i \mid \tilde{e}_i), where \tilde{p}_i is a phrase in the pivot language L_p. We obtain the phrase translation probability \phi(\tilde{f}_i \mid \tilde{e}_i) according to the following model:

\phi(\tilde{f}_i \mid \tilde{e}_i) = \sum_{\tilde{p}_i} \phi(\tilde{f}_i \mid \tilde{p}_i, \tilde{e}_i) \, \phi(\tilde{p}_i \mid \tilde{e}_i)    (3)

The phrase translation probability \phi(\tilde{f}_i \mid \tilde{p}_i, \tilde{e}_i) does not depend on the phrase \tilde{e}_i in the language L_e, since it is estimated from the L_f - L_p bilingual corpus. Thus, equation (3) can be rewritten as

\phi(\tilde{f}_i \mid \tilde{e}_i) = \sum_{\tilde{p}_i} \phi(\tilde{f}_i \mid \tilde{p}_i) \, \phi(\tilde{p}_i \mid \tilde{e}_i)    (4)

Are the probability calculations correct? Let us go step by step through the formulation of the phrase translation probability \phi(\tilde{f}_i \mid \tilde{e}_i). First, we marginalize:

\phi(\tilde{f}_i \mid \tilde{e}_i) = \sum_{\tilde{p}_i} \phi(\tilde{f}_i, \tilde{p}_i \mid \tilde{e}_i)    (5)

Now we apply the chain rule:

\phi(\tilde{f}_i \mid \tilde{e}_i) = \sum_{\tilde{p}_i} \phi(\tilde{f}_i \mid \tilde{p}_i, \tilde{e}_i) \, \phi(\tilde{p}_i \mid \tilde{e}_i)    (6)

Since we have the L_f - L_p corpus available, the calculation of the first term does not depend on \tilde{e}_i, i.e. \phi(\tilde{f}_i \mid \tilde{p}_i, \tilde{e}_i) reduces to \phi(\tilde{f}_i \mid \tilde{p}_i). Thus, the final equation is

\phi(\tilde{f}_i \mid \tilde{e}_i) = \sum_{\tilde{p}_i} \phi(\tilde{f}_i \mid \tilde{p}_i) \, \phi(\tilde{p}_i \mid \tilde{e}_i)    (7)

3.2 Lexical Weight

According to (Koehn et al., 2003), the lexical weight can be estimated using the following model:

p(\tilde{f} \mid \tilde{e}, a) = \prod_{i=1}^{n} \frac{1}{|\{ j \mid (i, j) \in a \}|} \sum_{\forall (i, j) \in a} w(f_i \mid e_j)    (8)

In order to estimate the lexical weight for our model, we first need to obtain the alignment information a between the two phrases \tilde{f} and \tilde{e}, and then estimate the lexical translation probability w(f \mid e) according to that alignment information.

The alignment information for the phrase pair (\tilde{f}, \tilde{e}) can be induced from the two phrase pairs (\tilde{f}, \tilde{p}) and (\tilde{p}, \tilde{e}). Let a_1 and a_2 be the word alignment information inside the phrase pairs (\tilde{f}, \tilde{p}) and (\tilde{p}, \tilde{e}) respectively. Then

a = \{ (f, e) \mid \exists p : (f, p) \in a_1 \,\&\, (p, e) \in a_2 \}    (9)

With this induced alignment information, the probability can be estimated directly from the induced phrase pairs, by accumulating co-occurrence counts:

count(f, e) = \sum_{k} \phi_k(\tilde{f} \mid \tilde{e}) \sum_{(i, j) \in a} \delta(f, f_i) \, \delta(e, e_j)    (10)

where \phi_k(\tilde{f} \mid \tilde{e}) is the phrase translation probability for phrase pair k, and \delta(x, y) = 1 if x = y, otherwise 0. Thus, the lexical translation probability can be estimated as

w(f \mid e) = \frac{count(f, e)}{\sum_{f'} count(f', e)}    (11)

w(f \mid e) can also be calculated using the word method as

w(f \mid e) = \sum_{p} w(f \mid p) \, w(p \mid e) \, sim(f, e, p)    (12)

where w(f \mid p) and w(p \mid e) are the two lexical probabilities, and sim(f, e, p) is the cross-language word similarity.

3.3 Interpolated Model

If we have a small L_f - L_e parallel corpus, training a translation model on this corpus alone will result in a poorly performing system; the reason behind the poor performance is data sparsity. In order to improve the performance, we can use additional L_f - L_p and L_p - L_e parallel corpora. Moreover, we can also use more than one pivot language to improve the translation performance. Different pivot languages may capture different language phenomena, and can improve translation quality by adding good quality L_f - L_e phrase pairs.

If we include n pivot languages, n pivot models can be estimated as described in section 3. In order to combine all these models with the standard model trained on the L_f - L_e corpus, we use linear interpolation. The phrase translation probability and the lexical weight are estimated as shown in equation
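The triangulation of section 2.3 (equations 2 and 7) amounts to a join of two phrase tables over their shared pivot phrases. The following is a minimal sketch under the assumption that phrase tables are plain dictionaries mapping (phrase, phrase) pairs to probabilities; the toy entries are illustrative, not drawn from any real corpus.

```python
from collections import defaultdict

def triangulate(sp_table, pt_table):
    """Induce phi(f|e) = sum_p phi(f|p) * phi(p|e)  (equation 7).

    sp_table: {(f_phrase, p_phrase): phi(f|p)}
    pt_table: {(p_phrase, e_phrase): phi(p|e)}
    """
    # Index the source-pivot table by pivot phrase for the join.
    by_pivot = defaultdict(list)
    for (f, p), prob_fp in sp_table.items():
        by_pivot[p].append((f, prob_fp))
    # Marginalize over all pivot phrases shared by the two tables.
    st_table = defaultdict(float)
    for (p, e), prob_pe in pt_table.items():
        for f, prob_fp in by_pivot.get(p, []):
            st_table[(f, e)] += prob_fp * prob_pe
    return dict(st_table)
```

Because the sum ranges only over pivot phrases that appear in both tables, pivot phrases seen in one corpus but not the other contribute nothing, which is one source of the sparsity that the interpolated model of section 3.3 tries to offset.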
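The alignment induction of equation (9) and the lexical weight of equation (8) can likewise be sketched. Alignments are represented as sets of (i, j) word-index pairs, and `w` is a hypothetical word translation table; for simplicity this sketch ignores the NULL-alignment handling that (Koehn et al., 2003) apply to unaligned words.

```python
def induce_alignment(a1, a2):
    """Equation 9: compose source-pivot and pivot-target alignments.

    a = {(i, k) | exists j: (i, j) in a1 and (j, k) in a2}
    """
    return {(i, k) for (i, j) in a1 for (j2, k) in a2 if j == j2}

def lexical_weight(f, e, a, w):
    """Equation 8: p(f|e, a) as a product over source words.

    For each source word f[i], average w(f[i]|e[j]) over the target
    words it is aligned to. Unaligned words are skipped here (Koehn
    et al. align them to NULL instead).
    """
    weight = 1.0
    for i in range(len(f)):
        links = [j for (i2, j) in a if i2 == i]
        if links:
            weight *= sum(w[(f[i], e[j])] for j in links) / len(links)
    return weight
```

Composing a1 and a2 through the shared pivot indices is exactly the existential quantification in equation (9): a source word and a target word are linked if some pivot word connects them.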
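Finally, the linear interpolation of section 3.3 combines the direct model with the n pivot-induced models entry by entry. A minimal sketch, assuming each model is a dictionary of phrase-pair probabilities and the interpolation weights sum to 1:

```python
def interpolate(tables, weights):
    """Linearly interpolate phrase tables: phi(f|e) = sum_m alpha_m * phi_m(f|e).

    tables:  list of phrase tables, direct model first, then pivot models
    weights: one interpolation weight per table
    """
    # Union of phrase pairs: a pair kept by any model survives,
    # which is how pivot models contribute new L_f - L_e pairs.
    keys = set().union(*(t.keys() for t in tables))
    return {k: sum(a * t.get(k, 0.0) for a, t in zip(weights, tables))
            for k in keys}
```

A phrase pair absent from a given model simply contributes probability 0.0 for that model, so pivot languages that capture different phenomena each add their own phrase pairs to the combined table.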