Machine Translation with a Stochastic Grammatical Channel

Dekai Wu and Hongsing WONG
HKUST Human Language Technology Center
Department of Computer Science
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong
{dekai,wong}@cs.ust.hk

Abstract

We introduce a stochastic grammatical channel model for machine translation that synthesizes several desirable characteristics of both statistical and grammatical machine translation. As with the pure statistical translation model described by Wu (1996) (in which a bracketing transduction grammar models the channel), alternative hypotheses compete probabilistically, exhaustive search of the translation hypothesis space can be performed in polynomial time, and robustness heuristics arise naturally from a language-independent inversion-transduction model. However, unlike pure statistical translation models, the generated output string is guaranteed to conform to a given target grammar. The model employs only (1) a translation lexicon, (2) a context-free grammar for the target language, and (3) a bigram language model. The fact that no explicit bilingual translation rules are used makes the model easily portable to a variety of source languages. Initial experiments show that it also achieves significant speed gains over our earlier model.

1 Motivation

Speed of statistical machine translation methods has long been an issue. A step was taken by Wu (Wu, 1996), who introduced a polynomial-time algorithm for the runtime search for an optimal translation. To achieve this, Wu's method substituted a language-independent stochastic bracketing transduction grammar (SBTG) in place of the simpler word-alignment channel models reviewed in Section 2. The SBTG channel made exhaustive search possible through dynamic programming, instead of previous "stack search" heuristics. Translation accuracy was not compromised, because the SBTG is apparently flexible enough to model word-order variation (between English and Chinese) even though it eliminates large portions of the space of word alignments. The SBTG can be regarded as a model of the language-universal hypothesis that closely related arguments tend to stay together (Wu, 1995a; Wu, 1995b).

In this paper we introduce a generalization of Wu's method with the objectives of

1. increasing translation speed further,
2. improving meaning-preservation accuracy,
3. improving grammaticality of the output, and
4. seeding a natural transition toward transduction rule models,

under the constraint of

• employing no additional knowledge resources except a grammar for the target language.

To achieve these objectives, we:

• replace Wu's SBTG channel with a full stochastic inversion transduction grammar or SITG channel, discussed in Section 3, and
• (mis-)use the target language grammar as a SITG, discussed in Section 4.

In Wu's SBTG method, the burden of generating grammatical output rests mostly on the bigram language model; explicit grammatical knowledge cannot be used. As a result, output grammaticality cannot be guaranteed. The advantage is that language-dependent syntactic knowledge resources are not needed.

We relax those constraints here by assuming a good (monolingual) context-free grammar for the target language. Compared to other knowledge resources (such as transfer rules or semantic ontologies), monolingual syntactic grammars are relatively easy to acquire or construct. We use the grammar in the SITG channel, while retaining the bigram language model. The new model facilitates explicit coding of grammatical knowledge and finer control over channel probabilities. Like Wu's SBTG model, the translation hypothesis space can be exhaustively searched in polynomial time, as shown in Section 5. The experiments discussed in Section 6 show promising results for these directions.

2 Review: Noisy Channel Model

The statistical translation model introduced by IBM (Brown et al., 1990) views translation as a noisy channel process. The underlying generative model contains a stochastic Chinese (input) sentence generator whose output is "corrupted" by the translation channel to produce English (output) sentences. Assume, as we do throughout this paper, that the input language is English and the task is to translate into Chinese. In the IBM system, the language model employs simple n-grams, while the translation model employs several sets of parameters as discussed below. Estimation of the parameters has been described elsewhere (Brown et al., 1993).

Translation is performed in the reverse direction from generation, as usual for recognition under generative models. For each English sentence e to be translated, the system attempts to find the Chinese sentence c* such that:

    c* = argmax_c Pr(c|e) = argmax_c Pr(e|c) Pr(c)    (1)

In the IBM model, the search for the optimal c* is performed using a best-first heuristic "stack search" similar to A* methods.

One of the primary obstacles to making the statistical translation approach practical is slow speed of translation, as performed in A* fashion. This price is paid for the robustness that is obtained by using very flexible language and translation models. The language model allows sentences of arbitrary order and the translation model allows arbitrary word-order permutation. No structural constraints and explicit linguistic grammars are imposed by this model.

The translation channel is characterized by two sets of parameters: translation and alignment probabilities.[1] The translation probabilities describe lexical substitution, while alignment probabilities describe word-order permutation. The key problem is that the formulation of alignment probabilities a(i|j, V, T) permits the English word in position j of a length-T sentence to map to any position i of a length-V Chinese sentence. So V^T alignments are possible, yielding an exponential space with correspondingly slow search times.

[1] Various models have been constructed by the IBM team (Brown et al., 1993). This description corresponds to one of the simplest ones, "Model 2"; search costs for the more complex models are correspondingly higher.

3 A SITG Channel Model

The translation channel we propose is based on the recently introduced bilingual language modeling approach. The model employs a stochastic version of an inversion transduction grammar or ITG (Wu, 1995c; Wu, 1995d; Wu, 1997). This formalism was originally developed for the purpose of parallel corpus annotation, with applications for bracketing, alignment, and segmentation. Subsequently, a method was developed to use a special case of the ITG, the aforementioned BTG, for the translation task itself (Wu, 1996). The next few paragraphs briefly review the main properties of ITGs, before we describe the SITG channel.

An ITG consists of context-free productions where terminal symbols come in couples, for example x/y, where x is an English word and y is a Chinese translation of x, with singletons of the form x/ε or ε/y representing function words that are used in only one of the languages. Any parse tree thus generates both English and Chinese strings simultaneously. Thus, the tree:

(1) [I/… [[took/… [a/… book/…]NP ]VP [for/… you/…]PP ]VP ]S

produces, for example, the mutual translations:

(2) a. [… [[… [… …]NP ]VP [… …]PP ]VP ]S
    b. [I [[took [a book]NP ]VP [for you]PP ]VP ]S

An additional mechanism accommodates a conservative degree of word-order variation between the two languages. With each production of the grammar is associated either a straight orientation or an inverted orientation, respectively denoted as follows:

    VP → [VP PP]
    VP → <VP PP>

In the case of a production with straight orientation, the right-hand-side symbols are visited left-to-right for both the English and Chinese streams. But for a production with inverted orientation, the right-hand-side symbols are visited left-to-right for English and right-to-left for Chinese. Thus, the tree:

(3) [I/… <[took/… [a/… book/…]NP ]VP [for/… you/…]PP>VP ]S

produces translations with different word order:

(4) a. [I [[took [a book]NP ]VP [for you]PP ]VP ]S
    b. [… [[… …]PP [… [… …]NP ]VP ]VP ]S

The surprising ability of ITGs to accommodate nearly all word-order variation between fixed-word-order languages[2] (English and Chinese in particular) has been analyzed mathematically, linguistically, and experimentally (Wu, 1995b; Wu, 1997).

[2] With the exception of higher-order phenomena such as neg-raising and wh-movement.
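To make the orientation mechanism concrete, the following is a minimal sketch of our own (not code from the paper) showing how a single ITG parse tree yields both output strings; the `Node` class, the `yield_pair` helper, and the romanized Chinese placeholders are all assumptions for the illustration.

```python
# Minimal sketch of straight vs. inverted ITG productions (illustrative only).
# A node is either a lexical couple (english, chinese) or an internal node with
# an orientation: "[]" (straight) or "<>" (inverted).  An empty string on one
# side of a couple marks a singleton such as a/epsilon.

class Node:
    def __init__(self, orientation=None, children=None, couple=None):
        self.orientation = orientation    # "[]" or "<>" for internal nodes
        self.children = children or []
        self.couple = couple              # (english, chinese) for leaves

def yield_pair(node):
    """Return the (English, Chinese) word sequences generated by an ITG subtree."""
    if node.couple is not None:
        e, c = node.couple
        return ([e] if e else []), ([c] if c else [])
    e_out, c_out = [], []
    for child in node.children:
        e, c = yield_pair(child)
        e_out.extend(e)                   # the English side is always left-to-right
        if node.orientation == "[]":
            c_out.extend(c)               # straight: Chinese follows the same order
        else:
            c_out = c + c_out             # inverted: Chinese subconstituents reversed
    return e_out, c_out

# Toy tree for VP -> <V NP>: the Chinese NP ends up before the verb.
# Chinese words are romanized placeholders, not the paper's actual examples.
tree = Node("<>", [Node(couple=("took", "le")),
                   Node("[]", [Node(couple=("a", "")),          # singleton a/epsilon
                               Node(couple=("book", "shu"))])])
print(yield_pair(tree))                   # (['took', 'a', 'book'], ['shu', 'le'])
```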

Any ITG can be transformed to an equivalent binary-branching normal form.

A stochastic ITG associates a probability with each production. It follows that a SITG assigns a probability Pr(e, c, q) to all generable trees q and sentence-pairs. In principle it can be used as the translation channel model by normalizing with Pr(c) and integrating out Pr(q) to give Pr(e|c) in Equation (1). In practice, a strong language model makes this unnecessary, so we can instead optimize the simpler Viterbi approximation

    c* = argmax_c Pr(e, c, q) Pr(c)    (2)

To complete the picture we add a bigram model g(c_j | c_{j-1}) for the Chinese language model Pr(c).

This approach was used for the SBTG channel (Wu, 1996), using the language-independent bracketing degenerate case of the SITG:[3]

    A → [A A]    with probability a_[]
    A → <A A>    with probability a_<>
    A → x/y      with probability b(x,y), for all lexical translations x, y
    A → x/ε      with probability b(x,ε), for all language 1 vocabulary x
    A → ε/y      with probability b(ε,y), for all language 2 vocabulary y

[3] Wu (Wu, 1996) experimented with Chinese-English translation, while this paper experiments with English-Chinese translation.

In the proposed model, a structured language-dependent ITG is used instead.

4 A Grammatical Channel Model

Stated radically, our novel modeling thesis is that a mirrored version of the target language grammar can parse sentences of the source language.

Ideally, an ITG would be tailored for the desired source and target languages, enumerating the transduction patterns specific to that language pair. Constructing such an ITG, however, requires massive manual labor effort for each language pair. Instead, our approach is to take a more readily acquired monolingual context-free grammar for the target language, and use (or perhaps misuse) it in the SITG channel, by employing the three tactics described below: production mirroring, part-of-speech mapping, and word skipping.

In the following, keep in mind our convention that language 1 is the source (English), while language 2 is the target (Chinese).

    S  → NP VP Punc
    VP → V NP
    NP → N Mod N | Prn

    S  → [NP VP Punc] | <Punc VP NP>
    VP → [V NP] | <NP V>
    NP → [N Mod N] | <N Mod N> | [Prn]

    Figure 1: An input CFG and its mirrored ITG.

4.1 Production Mirroring

The first step is to convert the monolingual Chinese CFG to a bilingual ITG. The production mirroring tactic simply doubles the number of productions, transforming every monolingual production into two bilingual productions,[4] one straight and one inverted, as for example in Figure 1 where the upper Chinese CFG becomes the lower ITG. The intent of the mirroring is to add enough flexibility to allow parsing of English sentences using the language 1 side of the ITG. The extra productions accommodate reversed subconstituent order in the source language's constituents, at the same time restricting the language 2 output sentence to conform to the given target grammar whether straight or inverted productions are used.

[4] Except for unary productions, which yield only one bilingual production.

The following example illustrates how production mirroring works. Consider the input sentence He is the son of Stephen, which can be parsed by the ITG of Figure 1 to yield the corresponding Chinese output sentence, with the following parse tree:

(5) [[[He/…]Prn]NP [[is/…]V [the/ε]NOISE <[son/…]N [of/…]Mod [Stephen/…]N>NP ]VP [./…]Punc ]S

Production mirroring produced the inverted NP constituent which was necessary to parse son of Stephen, i.e., <son/… of/… Stephen/…>NP.

If the target CFG is purely binary branching, then the previous theoretical and linguistic analyses (Wu, 1997) suggest that much of the requisite constituent and word order transposition may be accommodated without change to the mirrored ITG. On the other hand, if the target CFG contains productions with long right-hand-sides, then merely inverting the subconstituent order will probably be insufficient. In such cases, a more complex transformation heuristic would be needed.
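As a rough sketch of the mirroring tactic (our own illustration under an assumed grammar encoding, not the paper's code), every non-unary production of the target CFG is emitted once in its original order as a straight production and once with reversed subconstituent order as an inverted production; inverted traversal of the reversed right-hand side leaves the Chinese order unchanged while letting the English constituents come out reversed, as in Figure 1.

```python
# Sketch of production mirroring: every monolingual production X -> Y1 ... Yn
# becomes a straight bilingual production X -> [Y1 ... Yn] and, unless it is
# unary (footnote 4), an inverted production X -> <Yn ... Y1>.
# The (lhs, rhs) tuple encoding is an assumption for the example.

def mirror_grammar(cfg_productions):
    """cfg_productions: list of (lhs, rhs_symbols) pairs for the target CFG."""
    itg = []
    for lhs, rhs in cfg_productions:
        itg.append((lhs, tuple(rhs), "[]"))                  # straight: original order
        if len(rhs) > 1:                                     # unary rules are not mirrored
            itg.append((lhs, tuple(reversed(rhs)), "<>"))    # inverted: reversed order
    return itg

chinese_cfg = [
    ("S",  ["NP", "VP", "Punc"]),
    ("VP", ["V", "NP"]),
    ("NP", ["N", "Mod", "N"]),
    ("NP", ["Prn"]),
]

for lhs, rhs, orientation in mirror_grammar(chinese_cfg):
    left, right = ("[", "]") if orientation == "[]" else ("<", ">")
    print(f"{lhs} -> {left}{' '.join(rhs)}{right}")
```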

Objective 3 (improving grammaticality of the output) can be directly tackled by using a tight target grammar. To see this, consider using a mirrored Chinese CFG to parse English sentences with the language 1 side of the ITG. Any resulting parse tree must be consistent with the original Chinese grammar: this follows from the fact that both the straight and inverted versions of a production have language 2 (Chinese) sides identical to the original monolingual production, since inverting the production orientation cancels out the mirroring of the right-hand-side symbols. Thus, the output grammaticality depends directly on the tightness of the original Chinese grammar.

In principle, with this approach a single target grammar could be used for translation from any number of other (fixed word-order) source languages, so long as a translation lexicon is available for each source language.

Probabilities on the mirrored ITG cannot be reliably estimated from bilingual data without a very large parallel corpus. A straightforward approximation is to employ EM or Viterbi training on just a monolingual target language (Chinese) corpus.

4.2 Part-of-Speech Mapping

The second problem is that the part-of-speech (PoS) categories used by the target (Chinese) grammar do not correspond to the source (English) words when the source sentence is parsed. It is unlikely that any English lexicon will list Chinese parts-of-speech.

We employ a simple part-of-speech mapping technique that allows the PoS tag of any corresponding word in the target language (as found in the translation lexicon) to serve as a proxy for the source word's PoS. The word view, for example, may be tagged with the Chinese tags nc and vn, if the translation lexicon holds both view_NN/…_nc and view_VB/…_vn.

Unknown English words must be handled differently since they cannot be looked up in the translation lexicon. The English PoS tag is first found by tagging the English sentence. A set of possible corresponding Chinese PoS tags is then found by table lookup (using a small hand-constructed mapping table). For example, NN may map to nc, loc and pref, while VB may map to vi, vl, vp, vv, vs, etc. This method generates many hypotheses and should only be used as a last resort.

4.3 Word Skipping

Regardless of how constituent-order transposition is handled, some function words simply do not occur in both languages, for example Chinese aspect markers. This is the rationale for the singletons mentioned in Section 3.

If we create an explicit singleton hypothesis for every possible input word, the resulting search space will be too large. To recognize singletons, we instead borrow the word-skipping technique from robust parsing. As formalized in the next section, we can do this by modifying the item extension step in our chart-parser-like algorithm. When the dot of an item is on the rightmost position, we can use such a constituent, a subtree, to extend other items. In chart parsing, the valid subtrees that can be used to extend an item are those located adjacent to the right of the item's dot position, and the anticipated category of the item must also be equal to that of the subtree.

If word-skipping is used, the valid subtrees can instead be located a few positions to the right of (or, for an item corresponding to an inverted production, to the left of) the item's dot position. In other words, words between the dot position and the start of the subtree are skipped, and considered to be singletons.

Consider Sentence 5 again. Word-skipping handled the the, which has no Chinese counterpart. At a certain point during translation, we have the following item: VP → [is/…]V • NP. With word-skipping, it can be extended to VP → [is/…]V NP • by the subtree <son/… of/… Stephen/…>NP, even though the subtree is not adjacent (but is within a certain distance, see Section 5) to the dot position of the item. The the located adjacent to the dot position of the item is skipped.

Word-skipping gives us the flexibility to parse the source input while skipping possible singletons; by doing so, the source input can be parsed with the highest likelihood, and grammatical output can be produced.

5 Translation Algorithm

The translation search algorithm differs from that of Wu's SBTG model in that it handles arbitrary grammars rather than binary bracketing grammars. As such it is more similar to active chart parsing (Earley, 1970) than to CYK parsing (Kasami, 1965; Younger, 1967). We take the standard notion of items (Aho and Ullman, 1972), and use the term anticipation to mean an item which still has symbols right of its dot. Items that do not have any symbols right of the dot are called subtrees.

As with Wu's SBTG model, the algorithm maximizes a probabilistic objective function, Equation (2), using dynamic programming similar to that for HMM recognition (Viterbi, 1967).
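The distinction between anticipations and subtrees can be pictured with a small data structure; this is a sketch of our own with assumed field names, since the paper does not specify an implementation. An item records a dotted production, its orientation, and the input span covered so far, and it counts as a subtree exactly when the dot has reached the end of the right-hand side.

```python
# Sketch of an item representation (illustrative assumptions, not the paper's code).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Item:
    lhs: str                  # category being built, e.g. "VP"
    rhs: Tuple[str, ...]      # right-hand-side symbols of the production
    orientation: str          # "[]" (straight) or "<>" (inverted)
    dot: int                  # number of RHS symbols already matched
    start: int                # first input position covered
    end: int                  # position just past the last word covered

    def is_subtree(self) -> bool:
        """A subtree has no symbols right of its dot; otherwise it is an anticipation."""
        return self.dot == len(self.rhs)

# An anticipation: VP -> [V . NP] covering input positions 1..2 ("is").
vp_item = Item("VP", ("V", "NP"), "[]", dot=1, start=1, end=2)
print(vp_item.is_subtree())   # False: it still anticipates an NP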
1. Initialization. For each word in the input sentence, put a subtree with category equal to the PoS of its translation into the agenda.

2. Recursion. Loop while the agenda is not empty:

   (a) If the current item is a subtree of category X, extend existing anticipations by calling ANTICIPATIONEXTENSION. For each rule of the form Z → X Y ... Y in the grammar, add an initial anticipation Z → X • Y ... Y to the agenda.

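The following sketch shows one way the anticipation-extension step with word-skipping could look; it is our own illustration of the straight-orientation case, and MAX_SKIP, the tuple layout, and the omission of probability scoring are all assumptions rather than details from the paper.

```python
# Sketch of anticipation extension with word-skipping (straight-orientation case
# only; an inverted anticipation would look to the left instead).  The real
# algorithm also scores each extension with production, lexical, and bigram
# probabilities; that bookkeeping is omitted here.

MAX_SKIP = 2                     # assumed window of skippable words

def extend(anticipation, subtree):
    """Return the extended anticipation, or None if the subtree cannot be used."""
    lhs, rhs, dot, start, end = anticipation      # dotted rule plus covered span
    category, sub_start, sub_end = subtree        # completed constituent and its span
    if dot == len(rhs) or rhs[dot] != category:
        return None                               # already complete, or wrong category
    gap = sub_start - end
    if gap < 0 or gap > MAX_SKIP:
        return None                               # not within the skip window
    # Words at positions end .. sub_start-1 are skipped, i.e. treated as singletons.
    return (lhs, rhs, dot + 1, start, sub_end)

# VP -> [V . NP] covering "is" (positions 1..2), extended over the skipped "the"
# by an NP subtree covering "son of Stephen" (positions 3..6).
anticipation = ("VP", ("V", "NP"), 1, 1, 2)
np_subtree = ("NP", 3, 6)
print(extend(anticipation, np_subtree))           # ('VP', ('V', 'NP'), 2, 1, 6)
```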
3. Reconstruction. The output string is recursively reconstructed from the highest likelihood subtree, with category S, that spans the whole input sentence.

6 Results

The grammatical channel was tested in the SILC translation system. The translation lexicon was partly constructed by training on government transcripts from the HKUST English-Chinese Parallel Bilingual Corpus, and partly entered by hand. The corpus was sentence-aligned statistically (Wu, 1994); Chinese words and collocations were extracted (Fung and Wu, 1994; Wu and Fung, 1994); then translation pairs were learned via an EM procedure (Wu and Xia, 1995). Together with hand-constructed entries, the resulting English vocabulary is approximately 9,500 words and the Chinese vocabulary is approximately 14,500 words, with a many-to-many translation mapping averaging 2.56 Chinese translations per English word. Since the lexicon's content is mixed, we approximate translation probabilities by using the unigram distribution of the target vocabulary from a small monolingual corpus. Noise still exists in the lexicon.

The Chinese grammar we used is not tight: it was written for robust parsing purposes, and as such it over-generates. Because of this we have not yet been able to conduct a fair quantitative assessment of objective 3. Our productions were constructed with reference to a standard grammar (Beijing Language and Culture Univ., 1996) and totalled 316 productions. Not all the original productions are mirrored, since some (128) are unary productions, and others are Chinese-specific lexical constructions, like S → … S NP … S, which are obviously unnecessary to handle English. About 27.7% of the non-unary Chinese productions were mirrored and the total number of productions in the final ITG is 368.

For the experiment, 222 English sentences with a maximum length of 20 words from the parallel corpus were randomly selected. Some examples of the output are shown in Figure 2. No morphological processing has been used to correct the output, and up to now we have only been testing with a bigram model trained on an extremely small corpus.

    Time (t)                SBTG Channel   Grammatical Channel
    t < 30 secs.            15.6%          83.3%
    30 secs. < t < 1 min.   34.9%           7.6%
    t > 1 min.              49.5%           9.1%

    Table 1: Translation speed.

    Sentence meaning preservation   SBTG Channel   Grammatical Channel
    Correct                         25.9%          32.3%
    Incorrect                       74.1%          67.7%

    Table 2: Translation accuracy.

With respect to objective 1 (increasing translation speed), the new model is very encouraging. Table 1 shows that over 90% of the samples can be processed within one minute by the grammatical channel model, whereas that for the SBTG channel model is about 50%. This demonstrates the stronger constraints on the search space given by the SITG.

The natural trade-off is that constraining the structure of the input decreases robustness somewhat. Approximately 13% of the test corpus could not be parsed in the grammatical channel model. As mentioned earlier, this figure is likely to vary widely depending on the characteristics of the target grammar. Of course, one can simply back off to the SBTG model when the grammatical channel rejects an input sentence.

With respect to objective 2 (improving meaning-preservation accuracy), the new model is also promising. Table 2 shows that the percentage of meaningfully translated sentences rises from 26% to 32% (ignoring the rejected cases).[7] We have judged only whether the correct meaning is conveyed by the translation, paying particular attention to word order and grammaticality, but otherwise ignoring morphological and function word choices.

[7] These accuracy rates are relatively low because these experiments are being conducted with new lexicons and grammar on a new translation direction (English-Chinese).

7 Conclusion

Currently we are designing a tight generation-oriented Chinese grammar to replace our robust parsing-oriented grammar. We will use the new grammar to quantitatively evaluate objective 3. We are also studying complementary approaches to the English word deletion performed by word-skipping, i.e., extensions that insert Chinese words suggested by the target grammar into the output.

The framework seeds a natural transition toward pattern-based translation models (objective 4). One can post-edit the productions of a mirrored SITG more carefully and extensively than we have done in our cursory pruning, gradually transforming the original monolingual productions into a set of true transduction rule patterns. This provides a smooth evolution from a purely statistical model toward a hybrid model, as more linguistic resources become available.

We have described a new stochastic grammatical channel model for statistical machine translation that exhibits several nice properties in comparison with Wu's SBTG model and IBM's word alignment model. The SITG-based channel increases translation speed, improves meaning-preservation accuracy, permits tight target CFGs to be incorporated for improving output grammaticality, and suggests a natural evolution toward transduction rule models. The input CFG is adapted for use via production mirroring, part-of-speech mapping, and word-skipping. We gave a polynomial-time translation algorithm that requires only a translation lexicon, plus a CFG and bigram language model for the target language. More linguistic knowledge about the target language is employed than in pure statistical translation models, but Wu's SBTG polynomial-time bound on search cost is retained and in fact the search space can be significantly reduced by using a good grammar. Output always conforms to the given target grammar.

    Input : I entirely agree with this point of view.
    Input : This would create a tremendous financial burden to taxpayers in Hong Kong.
    Input : The Government wants, and will work for, the best education for all the children of Hong Kong.
    Input : Let me repeat one simple point yet again.
    Input : We are very disappointed.

    Figure 2: Example translation outputs from the grammatical channel model; each English input is paired with the system's Chinese output and the corpus reference translation.

Acknowledgments

Thanks to the SILC group members: Xuanyin Xia, Daniel Chan, Aboy Wong, Vincent Chow and James Pang.

References

Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation, and Compiling. Prentice Hall, Englewood Cliffs, NJ.

G. Edward Barton, Robert C. Berwick, and Eric S. Ristad. 1987. Computational Complexity and Natural Language. MIT Press, Cambridge, MA.

Beijing Language and Culture Univ. 1996. Sucheng Hanyu Chuji Jiaocheng (A Short Intensive Elementary Chinese Course), volume 1-4. Beijing Language and Culture Univ. Press.

Peter F. Brown, John Cocke, Stephen A. DellaPietra, Vincent J. DellaPietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):29-85.

Peter F. Brown, Stephen A. DellaPietra, Vincent J. DellaPietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.

Jay Earley. 1970. An efficient context-free parsing algorithm. Communications of the Assoc. for Computing Machinery, 13(2):94-102.

Pascale Fung and Dekai Wu. 1994. Statistical augmentation of a Chinese machine-readable dictionary. In Proc. of the 2nd Annual Workshop on Very Large Corpora, pg 69-85, Kyoto, Aug.

T. Kasami. 1965. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Lab., Bedford, MA.

Andrew J. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory, 13:260-269.

Dekai Wu and Pascale Fung. 1994. Improving Chinese tokenization with linguistic filters on statistical lexical acquisition. In Proc. of 4th Conf. on ANLP, pg 180-181, Stuttgart, Oct.

Dekai Wu and Xuanyin Xia. 1995. Large-scale automatic extraction of an English-Chinese lexicon. Machine Translation, 9(3-4):285-313.

Dekai Wu. 1994. Aligning a parallel English-Chinese corpus statistically with lexical criteria. In Proc. of 32nd Annual Conf. of Assoc. for Computational Linguistics, pg 80-87, Las Cruces, Jun.

Dekai Wu. 1995a. An algorithm for simultaneously bracketing parallel texts by aligning words. In Proc. of 33rd Annual Conf. of Assoc. for Computational Linguistics, pg 244-251, Cambridge, MA, Jun.

Dekai Wu. 1995b. Grammarless extraction of phrasal translation examples from parallel texts. In TMI-95, Proc. of the 6th Intl. Conf. on Theoretical and Methodological Issues in Machine Translation, volume 2, pg 354-372, Leuven, Belgium, Jul.

Dekai Wu. 1995c. Stochastic inversion transduction grammars, with application to segmentation, bracketing, and alignment of parallel corpora. In Proc. of IJCAI-95, 14th Intl. Joint Conf. on Artificial Intelligence, pg 1328-1334, Montreal, Aug.

Dekai Wu. 1995d. Trainable coarse bilingual grammars for parallel text bracketing. In Proc. of the 3rd Annual Workshop on Very Large Corpora, pg 69-81, Cambridge, MA, Jun.

Dekai Wu. 1996. A polynomial-time algorithm for statistical machine translation. In Proc. of the 34th Annual Conf. of the Assoc. for Computational Linguistics, pg 152-158, Santa Cruz, CA, Jun.

Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377-404, Sept.

David H. Younger. 1967. Recognition and parsing of context-free languages in time n^3. Information and Control, 10(2):189-208.
