
A Model of Competence for Corpus-Based Machine Translation


Michael Carl

Institut für Angewandte Informationsforschung,
Martin-Luther-Straße 14,
66111 Saarbrücken, Germany,

[email protected]

Abstract

In this paper I elaborate a model of competence for corpus-based machine translation (CBMT) along the lines of the representations used in the translation system. Representations in CBMT-systems can be rich or austere, molecular or holistic, and they can be fine-grained or coarse-grained. The paper shows that different CBMT architectures are required depending on whether a better translation quality or a broader coverage is preferred, according to the formula of Boitet (1999): "Coverage * Quality = K".

1 Introduction

In machine translation (MT), it has often been argued that translations of natural language texts are valid if and only if the source language text and the target language text have the same meaning (cf. e.g. Nagao, 1989). If we assume that MT systems produce meaningful translations to a certain extent, we must assume that such systems have a notion of the source text meaning to a similar extent. Hence, the translation algorithm together with the data it uses encodes a formal model of meaning. Despite 50 years of intense research, there is no existing system that could map arbitrary input texts onto meaning-equivalent output texts. How is that possible?

According to (Dummett, 1975), a theory of meaning is a theory of understanding: having a theory of meaning means that one has a theory of understanding. In linguistic research, texts are described on a number of levels and dimensions, each contributing to their understanding and hence to their meaning. Traditionally, the main focus has been on semantic aspects. In this research it is assumed that knowing the propositional structure of a text means to understand it. Under the same premise, research in MT has focused on semantic aspects, assuming that texts have the same meaning if they are semantically equivalent.

Recent research in corpus-based MT has different premisses. Corpus-Based Machine Translation (CBMT) systems make use of a set of reference translations on which the translation of a new text is based. In CBMT-systems, it is assumed that the reference translations given to the system in a training phase have equivalent meanings. According to their intelligence, these systems try to figure out what the meaning invariance in the reference text consists of and learn an appropriate source language/target language mapping mechanism. A translation can only be generated if an appropriate example translation is available in the reference text.

An interesting question in CBMT systems is thus: what theory of meaning should the learning process implement in order to generate an appropriate understanding of the source text such that it can be mapped into a meaning-equivalent target text?

Dummett (Dummett, 1975) suggests a distinction of theories of meaning along the following lines:

- In a rich theory of meaning, the knowledge of the concepts is achieved by knowing the features of these concepts. An austere theory merely relies upon simple recognition of the shape of the concepts. A rich theory can justify the use of a concept by means of the characteristic features of that concept, whereas an austere theory can justify the use of a concept merely by enumerating all occurrences of the use of that concept.

- A molecular theory of meaning derives the understanding of an expression from a finite number of axioms. A holistic theory, in contrast, derives the understanding of an expression through its distinction from all other expressions in that language. A molecular theory, therefore, provides criteria to associate a certain meaning with a sentence and can explain the concepts used in the language. In a holistic theory nothing is specified about the knowledge of the language other than in global constraints related to the language as a whole.

In addition, the granularity of concepts seems crucial for CBMT implementations.

- A fine-grained theory of meaning derives concepts from single morphemes or separable words of the language, whereas in a coarse-grained theory of meaning, concepts are obtained from morpheme clusters. In a fine-grained theory of meaning, complex concepts can be created by hierarchical composition of their components, whereas in a coarse-grained theory of meaning, complex meanings can only be achieved through a concatenation of concept sequences.

The next three sections discuss the dichotomies of theories of meaning, rich vs. austere, molecular vs. holistic and coarse-grained vs. fine-grained, and classify a few CBMT systems according to the terminology introduced. This leads to a model of competence for CBMT. It appears that translation systems can either be designed to have a broad coverage or a high quality.

2 Rich vs. Austere CBMT

A common characteristic of all CBMT systems is that the understanding of the translation task is derived from the understanding of the reference translations. The inferred translation knowledge is used in the translation phase to generate new translations.

Collins (1998) distinguishes between Memory-Based MT, i.e. memory heavy and linguistics light, and Example-Based MT, i.e. memory light and linguistics heavy. While the former systems implement an austere theory of meaning, the latter make use of rich representations.

The most superficial theory of understanding is implemented in purely memory-based MT approaches, where learning takes place only by extending the reference text. No abstraction or generalization of the reference examples takes place.

Translation Memories (TMs) are such purely memory-based MT-systems. A TM, e.g. TRADOS's Translator's Workbench (Heyn, 1996) or STAR's TRANSIT, calculates the graphemic similarity of the input text and the source side of the reference translations and returns the target string of the most similar translation examples as output. TMs make use of a set of reference translation examples and a k-nn retrieval algorithm. They implement an austere theory of meaning because they cannot justify the use of a word other than by looking up all contexts in which the word occurs. They can, however, enumerate all occurrences of a word in the reference text.
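This retrieval step can be pictured with a short sketch. It is only a minimal illustration, not the matching procedure of any of the products named above: graphemic similarity is approximated by a normalized string-matching ratio, and the target sides of the k most similar examples are returned.

    import difflib

    def graphemic_similarity(a, b):
        # Normalized surface similarity between two strings (0.0 to 1.0).
        # Commercial TMs use their own fuzzy-match metrics; difflib's ratio
        # merely stands in for some graphemic measure.
        return difflib.SequenceMatcher(None, a, b).ratio()

    def tm_lookup(input_sentence, reference_translations, k=3):
        # reference_translations: list of (source, target) example pairs.
        # Rank the examples by the surface similarity of their source side
        # and return the target strings of the k nearest neighbours.
        ranked = sorted(reference_translations,
                        key=lambda pair: graphemic_similarity(input_sentence, pair[0]),
                        reverse=True)
        return [target for source, target in ranked[:k]]

    examples = [("The printer is out of paper.", "Dem Drucker ist das Papier ausgegangen."),
                ("Close the cover of the printer.", "Schliessen Sie die Abdeckung des Druckers.")]
    print(tm_lookup("The printer is out of toner.", examples, k=1))

Adding further example pairs only extends the list; the metric itself stays fixed, a point taken up again in Section 3.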

The TM distributed by ZERES (Zer, 1997) follows a richer approach. The reference translations and the input sentence to be translated are lemmatized and part-of-speech tagged. The source language sentence is mapped against the reference translations on a surface string level, on a lemma level and on a part-of-speech level. Those example translations which show the greatest similarity to the input sentence with respect to the three levels of description are returned as the best available translation.
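A rough sketch of such multi-level matching is given below. It is not ZERES's algorithm: the field names, the equal weighting of the three levels and the position-wise overlap measure are all assumptions made only to keep the illustration short.

    def overlap(xs, ys):
        # Proportion of positions on which two token sequences agree.
        if not xs or not ys:
            return 0.0
        return sum(1 for x, y in zip(xs, ys) if x == y) / max(len(xs), len(ys))

    def three_level_similarity(inp, example, weights=(1/3, 1/3, 1/3)):
        # inp and example are dicts with "surface", "lemma" and "pos" token lists,
        # assumed to come from some external lemmatizer and tagger.
        w_surface, w_lemma, w_pos = weights
        return (w_surface * overlap(inp["surface"], example["surface"]) +
                w_lemma * overlap(inp["lemma"], example["lemma"]) +
                w_pos * overlap(inp["pos"], example["pos"]))

    def best_example(inp, examples):
        # Return the reference translation whose source-side analysis is most
        # similar to the input on the three levels of description.
        return max(examples, key=lambda ex: three_level_similarity(inp, ex["source"]))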

Example Based Machine Translation (EBMT) systems (Sato and Nagao, 1990; Collins, 1998; Güvenir and Cicekli, 1998; Carl, 1999; Brown, 1997) are richer systems. Translation examples are stored as feature and tree structures. Translation templates are generated which contain - sometimes weighted - connections in those positions where the source language and the target language equivalences are strong. In the translation phase, a multi-layered mapping from the source language into the target language takes place on the level of templates and on the level of fillers.

The ReVerb EBMT system (Collins, 1998) performs sub-sentential chunking and seeks to link constituents with the same function in the source and the target language. A source language subject is translated as a target language subject and a source language object as a target language object. In case there is no appropriate translation template available, single words can be replaced as well, at the expense of translation quality.

The EBMT approach described in (Güvenir and Cicekli, 1998) makes use of morphological knowledge and relies on word stems as a basis for translation. Translation templates are generalized from aligned sentences by substituting differences in sentence pairs with variables and leaving the identical substrings unsubstituted. An iterative application of this method generates translation examples and translation templates which serve as the basis for an example-based MT system. An understanding consists of the extraction of compositionally translatable substrings and the generation of translation templates.
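The generalization step can be sketched as follows. This is only a toy version of the idea: it assumes that the two sentences of each pair have the same number of tokens so that differences can be located position by position, whereas the algorithm of (Güvenir and Cicekli, 1998) works on word stems and handles the general case.

    def generalize(pair_a, pair_b):
        # pair_a, pair_b: (source sentence, target sentence) tuples.  Differing
        # tokens are replaced by the variable X, yielding a translation template;
        # the replaced material is kept as filler translation examples.
        def split(sent_1, sent_2):
            a, b = sent_1.split(), sent_2.split()
            diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
            template = " ".join("X" if i in diff else tok for i, tok in enumerate(a))
            return template, " ".join(a[i] for i in diff), " ".join(b[i] for i in diff)

        src_template, src_fill_1, src_fill_2 = split(pair_a[0], pair_b[0])
        tgt_template, tgt_fill_1, tgt_fill_2 = split(pair_a[1], pair_b[1])
        return (src_template, tgt_template), [(src_fill_1, tgt_fill_1),
                                              (src_fill_2, tgt_fill_2)]

    template, fillers = generalize(("I saw the dog", "Ich sah den Hund"),
                                   ("I saw the cat", "Ich sah die Katze"))
    print(template)   # ('I saw the X', 'Ich sah X X')
    print(fillers)    # [('dog', 'den Hund'), ('cat', 'die Katze')]

Repeated over the whole reference text, such comparisons yield both the templates and the filler translation examples mentioned above.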

A similar approach is followed in EDGAR (Carl, 1999). Sentences are morphologically analyzed and translation templates are decorated with features. Fillers in translation template slots are constrained to unify with these features. In addition to this, a shallow linguistic formalism is used to percolate features in derivation trees.
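A minimal version of such a filler constraint is shown below; the flat feature structures and attribute names are invented for the illustration and stand in for EDGAR's actual formalism.

    def unify(slot_features, filler_features):
        # Flat feature unification: the filler is compatible with the slot if the
        # two structures assign no conflicting values to the same attribute.
        for attribute, value in filler_features.items():
            if slot_features.get(attribute, value) != value:
                return None                      # clash, e.g. number mismatch
        return {**slot_features, **filler_features}

    slot = {"cat": "NP", "num": "sg"}            # decoration of a template slot
    print(unify(slot, {"cat": "NP", "num": "sg", "gen": "fem"}))  # accepted
    print(unify(slot, {"cat": "NP", "num": "pl"}))                # None: rejected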

Sato and Nagao (1990) proposed still richer representations where syntactically analyzed phrases and sentences are stored in a database. In the translation phase, the most similar derivation trees are retrieved from the database and a target language derivation tree is composed from the translated parts. By means of a thesaurus, semantically similar lexical items may be exchanged in the derivation trees.

Statistics-based MT (SBMT) approaches implement austere theories of meaning. For instance, in Brown et al. (1990) a couple of models are presented, starting with simple stochastic translation models and getting incrementally more complex and rich by introducing more random variables. No linguistic analyses are taken into account in these approaches. However, in further research the authors plan to integrate linguistic knowledge such as inflectional analysis of verbs, nouns and adjectives.

[Figure 1: Atomicity, Granularity and Richness of CBMT. Three plots position nine CBMT systems along the axes Richness of Representation (graphemic, morphological, syntactic, semantic), Granularity of Representation (word, phrase, sentence) and Atomicity of Representation (molecular, mixed, holistic). Legend: (1) Sato and Nagao 1990, (2) EDGAR (Carl 1999), (3) GEBMT (Güvenir and Cicekli 1998), (4) ZERES (Zer 1997), (5) TRADOS (Heyn 1996), (6) ReVerb (Collins 1998), (7) Brown 1997, (8) Brown et al. 1990, (9) McLean 1992.]

McLean (McLean, 1992) has proposed an austere approach in which he uses neural networks (NN) to translate surface strings from English to French. His approach functions similarly to a TM: the NN is used to classify the sequences of surface word forms according to the examples given in the reference translations. On a small set of examples he shows that NN can successfully be applied to MT.

3 Molecular vs. Holistic CBMT

As discussed in the previous section, all CBMT systems make use of some text dimensions in order to map a source language text into the target language. TMs, for instance, rely on the set of graphemic symbols, i.e. the ASCII set. Richer systems use lexical, morphological, syntactic and/or semantic descriptions. The degree to which the set of descriptions is independent from the reference translations determines the molecularity of the theory. The more the descriptions are learned from, and thus depend on, the reference translations, the more the system becomes holistic. Learning descriptions from reference translations makes the system more robust and easy to adjust to a new text domain.

SBMT approaches, e.g. (Brown et al., 1990), have a purely holistic view on languages. Every sentence of one language is considered to be a possible translation of any sentence in the other language. No account is given for the equivalence of the source language meaning and the target language meaning other than by means of global considerations concerning frequencies of occurrence in the reference text. In order to compute the most probable translations, each pair of items of the source language and the target language is associated with a certain probability. This prior probability is derived from the reference text. In the translation phase, several target language sequences are considered and the one with the highest posterior probability is then taken to be the translation of the source language string.
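This selection can be written as the familiar noisy-channel maximization, choosing the target sentence that maximizes the product of a prior (language model) and a channel (translation model) probability, both estimated from the reference text. The sketch below only renders this rule schematically; the candidate set and the probability tables are toy values, not the models of Brown et al. (1990).

    def best_translation(source, candidates, prior, channel):
        # Noisy-channel decision rule: choose the target sentence e maximizing
        # prior(e) * channel(source | e).  prior plays the role of the language
        # model, channel that of the translation model.
        return max(candidates, key=lambda e: prior(e) * channel(source, e))

    # Toy probability tables, purely illustrative:
    lm = {"the house": 0.04, "house the": 0.0004}
    tm = {("la maison", "the house"): 0.2, ("la maison", "house the"): 0.2}

    print(best_translation("la maison",
                           candidates=["the house", "house the"],
                           prior=lambda e: lm.get(e, 1e-9),
                           channel=lambda f, e: tm.get((f, e), 1e-9)))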

Similarly, neural network based CBMT systems (McLean, 1992) are holistic approaches. The training of the weights and the minimization of the classification error rely on the reference text as a whole. Attempts to extract rules from the trained neural networks seek to isolate and make explicit aspects of how the net successfully classifies new sequences. The training process, however, remains holistic.

TMs implement the molecular CBMT approach as they rely on a static distance metric which is independent from the size and content of the case base. TMs are molecular because they rely on a fixed and limited set of graphic symbols. Adding further example translations to the database does not increase the set of graphic symbols, nor does it modify the distance metric. Learning capacities in TMs are trivial, as their only way to learn is through extension of the example base.

The translation templates generated by Güvenir and Cicekli (1998), for instance, differ according to the similarities and dissimilarities found in the reference text. Translation templates in this system thus reflect holistic aspects of the example translations. The way in which morphological analysis is processed is, however, independent from the translation examples and is thus a molecular aspect of the system.

Similarly, the ReVerb EBMT system (Collins, 1998) makes use of holistic components. The reference text is part-of-speech tagged. The length of translation segments as well as their most likely initial and final words are calculated based on probabilities found in the reference text.

[Figure 2: A Model of Competence for CBMT. The nine systems of Figure 1 are plotted along two axes, Expected Translation Quality (low to high) and Expected Coverage of the System (low to high): (1) Sato and Nagao 1990, (2) Carl 1999, (3) Güvenir and Cicekli 1998, (4) Zer 1997, (5) Heyn 1996, (6) Collins 1998, (7) Brown 1997, (8) Brown et al. 1990, (9) McLean 1992.]

4 Coarse vs. Fine Graining CBMT

One task that all MT systems perform is to segment the text to be translated into translation units which - to a certain extent - can be translated independently. The ways in which segmentation takes place and how the translated segments are joined together in the target language are different in each MT system.

In (Collins, 1998) segmentation takes place on a phrasal level. Due to the lack of a rich morphological representation, agreement cannot always be granted in the target language when translating single words from English to German. Reliable translation cannot be guaranteed when phrases in the target language - or parts of them - are moved from one position (e.g. the object position) into another one (e.g. a subject position).

In (Güvenir and Cicekli, 1998), this situation is even more problematic because there are no restrictions on possible fillers of translation template slots. Thus, a slot which has originally been filled with an object can, in the translation process, even accommodate an adverb or the subject.

SBMT approaches perform fine-grained segmentation. Brown et al. (1990) segment the input sentences into words, where for each source-target language word pair translation probabilities, fertility probabilities, alignment probabilities etc. are computed. Coarse-grained segmentations are unrealistic because sequences of 3 or more words (so-called n-grams) occur very rarely for n ≥ 3 even in huge learning corpora [1]. Statistical and probabilistic systems rely on word frequencies found in texts and usually cannot extrapolate from a very small number of word occurrences. A statistical language model assigns to each n-gram a probability which enables the system to generate the most likely target language strings.

[1] Brown et al. (1990) use the Hansard French-English text containing several million words.
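A bigram instance of such a language model can be sketched as follows. The unsmoothed maximum-likelihood estimates are used here only to keep the example short; they also show why sparse counts cannot extrapolate to unseen word sequences.

    from collections import Counter

    def train_bigram_model(corpus_sentences):
        # Maximum-likelihood bigram model: P(w2 | w1) = count(w1 w2) / count(w1).
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus_sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            unigrams.update(tokens[:-1])
            bigrams.update(zip(tokens[:-1], tokens[1:]))
        return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

    def sentence_probability(model, sentence):
        # Product of the bigram probabilities of a candidate target string.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        probability = 1.0
        for w1, w2 in zip(tokens[:-1], tokens[1:]):
            probability *= model(w1, w2)
        return probability

    model = train_bigram_model(["the house is small", "the house is big"])
    print(sentence_probability(model, "the house is big"))  # 0.5: all bigrams seen
    print(sentence_probability(model, "the big house"))     # 0.0: unseen bigram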

5 A Competence Model for CBMT

A competence model is presented as two independent parameters, i.e. Coverage and Quality (see Figure 2).

- Coverage of the system refers to the extent to which a variety of source language texts can be translated. A system has a high coverage if a great variety of texts can be translated. A low-coverage system can translate only restricted texts of a certain domain with limited terminology and linguistic structures.

- Quality refers to the degree to which an MT system produces successful translations. A system has a low quality if the produced translations are not even informative in the sense that a user cannot understand what the source text is about. A high-quality MT-system produces user-oriented and correct translations with respect to text type, terminological preferences, personal style, etc.

An MT system with low coverage and low quality is completely uninteresting. Such a system comes close to a random number generator as it translates few texts in an unpredictable way.

An MT system with high coverage and "not-too-bad" quality can be useful in a Web-application where a great variety of texts are to be translated for occasional users who want to grasp the basic ideas of a foreign text. On the other hand, a system with high quality and restricted coverage might be useful for in-house MT-applications or a controlled language.

An MT system with high coverage and high quality would translate any type of text to everyone's satisfaction. However, as one can expect, such a system seems not to be feasible.

Boitet (1999) proposes "the tentative formula: Coverage * Quality = K", where K depends on the MT technology and the amount of work encoded in the system. The question, then, is when the maximum K is possible and how much work we want to invest for what purpose. Moreover, a given K can mean high coverage and low quality, or it can mean the reverse.
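Read as a constraint, the formula says that for a fixed technology and effort the two parameters can only be traded against each other. The snippet below merely illustrates this reading; the value of K and the scales of both axes are hypothetical.

    # Hypothetical constant for one MT technology and a fixed amount of work.
    K = 0.5

    def attainable_quality(coverage):
        # Under the tentative formula Coverage * Quality = K, fixing the
        # coverage determines the quality that can be expected, and vice versa.
        return K / coverage

    for coverage in (0.5, 0.75, 1.0):
        print(f"coverage={coverage:.2f}  ->  quality={attainable_quality(coverage):.2f}")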

The expected quality of a CBMT system increases when the input text is segmented more coarsely. Consequently, a low coverage must be expected due to the combinatorial explosion of the number of longer chunks. In order for a fine-graining system to generate at least informative translations, further knowledge resources need to be considered. These knowledge resources may either be pre-defined and molecular, or they can be derived from reference translations and holistic.

TMs focus on the quality of translations. Only large clusters of meaning entities are translated into the target language, in the hope that such clusters will not interfere with the context from which they are taken. Broader coverage can be achieved through finer-grained segmentation of the input into phrases or single terms. Systems which finely segment texts use rich representation languages in order to adapt the translation units to the target language context or, as in the case of SBMT systems, use holistically derived constraints.

What can be learned and what should be learned from the reference text, how to represent the inferred knowledge, how to combine it with pre-defined knowledge, and the impact of different settings on the constant K in the formula of Boitet (1999) are all still open questions for CBMT-design.

6 Conclusion

Machine Translation (MT) is a meaning-preserving mapping from a source language text into a target language text. In order to enable a computer system to perform such a mapping, it is provided with a formalized theory of meaning.

Theories of meaning are characterized by three dichotomies: they can be holistic or molecular, austere or rich, and they can be fine-grained or coarse-grained.

A number of CBMT systems - translation memories, example-based and statistics-based machine translation systems - are examined with respect to these dichotomies. In a system that uses a rich theory of meaning, complex representations are computed, including morphological, syntactic and semantic representations, while with an austere theory the system relies on the mere graphemic surface form of the text. In a holistic implementation, meaning descriptions are derived from reference translations, while in a molecular approach the meaning descriptions are obtained from a finite set of predefined features. In a fine-grained theory, the minimal length of a translation unit is equivalent to a morpheme, while in a coarse-grained theory this amounts to a morpheme cluster, a phrase or a sentence.

According to the implemented theory of meaning, one can expect to obtain high quality translations or a good coverage of the CBMT system. The more the system makes use of coarse-grained translation units, the higher the expected translation quality. The more the theory uses rich representations, the more the system may achieve broad coverage. CBMT systems can be tuned to achieve either of the two goals.

References

Christian Boitet. 1999. A research perspective on how to democratize machine translation and translation aides aiming at high quality final output. In MT-Summit '99.

Peter F. Brown, J. Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, F. Jelinek, Robert L. Mercer, and P. S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16:79-85.

Ralf D. Brown. 1997. Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation. In TMI-97, pages 111-118.

Michael Carl. 1999. Inducing Translation Templates for Example-Based Machine Translation. In MT-Summit VII.

Bróna Collins. 1998. Example-Based Machine Translation: An Adaptation-Guided Retrieval Approach. Ph.D. thesis, Trinity College, Dublin.

Michael Dummett. 1975. What is a Theory of Meaning? In Mind and Language. Oxford University Press, Oxford.

Halil Altay Güvenir and Ilyas Cicekli. 1998. Learning Translation Templates from Examples. Information Systems, 23(6):353-363.

Matthias Heyn. 1996. Integrating machine translation into translation memory systems. In European Association for Machine Translation - Workshop Proceedings, pages 111-123, ISSCO, Geneva.

Ian J. McLean. 1992. Example-Based Machine Translation using Connectionist Matching. In TMI-92.

Makoto Nagao. 1989. Machine Translation: How Far Can It Go? Oxford University Press, Oxford.

S. Sato and M. Nagao. 1990. Towards memory-based translation. In COLING-90.

Zeres GmbH, Bochum, Germany. 1997. ZERESTRANS Benutzerhandbuch.