A Model of Competence for Corpus-Based Machine Translation

Michael Carl
Institut für Angewandte Informationsforschung,
Martin-Luther-Straße 14,
66111 Saarbrücken, Germany

Abstract

In this paper I elaborate a model of competence for corpus-based machine translation (CBMT) along the lines of the representations used in the translation system. Representations in CBMT systems can be rich or austere, molecular or holistic, and they can be fine-grained or coarse-grained. The paper shows that different CBMT architectures are required depending on whether a better translation quality or a broader coverage is preferred, according to Boitet (1999)'s formula: "Coverage * Quality = K".

1 Introduction

In the machine translation (MT) literature, it has often been argued that translations of natural language texts are valid if and only if the source language text and the target language text have the same meaning (cf. e.g. Nagao, 1989). If we assume that MT systems produce meaningful translations to a certain extent, we must assume that such systems have a notion of the source text meaning to a similar extent. Hence, the translation algorithm together with the data it uses encodes a formal model of meaning. Despite 50 years of intense research, there is no existing system that could map arbitrary input texts onto meaning-equivalent output texts. How is that possible?

According to Dummett (1975), a theory of meaning is a theory of understanding: having a theory of meaning means that one has a theory of understanding. In linguistic research, texts are described on a number of levels and dimensions, each contributing to their understanding and hence to their meaning. Traditionally, the main focus has been on semantic aspects. In this research it is assumed that knowing the propositional structure of a text means to understand it. Under the same premise, research in MT has focused on semantic aspects, assuming that texts have the same meaning if they are semantically equivalent.

Recent research in corpus-based MT has different premisses. Corpus-Based Machine Translation (CBMT) systems make use of a set of reference translations on which the translation of a new text is based. In CBMT systems, it is assumed that the reference translations given to the system in a training phase have equivalent meanings. According to their intelligence, these systems try to figure out what the meaning invariance in the reference text consists of, and they learn an appropriate source language/target language mapping mechanism. A translation can only be generated if an appropriate example translation is available in the reference text. An interesting question in CBMT systems is thus: what theory of meaning should the learning process implement in order to generate an appropriate understanding of the source text such that it can be mapped into a meaning-equivalent target text?

Dummett (1975) suggests a distinction of theories of meaning along the following lines:

In a rich theory of meaning, the knowledge of the concepts is achieved by knowing the features of these concepts. An austere theory merely relies upon simple recognition of the shape of the concepts. A rich theory can justify the use of a concept by means of the characteristic features of that concept, whereas an austere theory can justify the use of a concept merely by enumerating all occurrences of the use of that concept.

A molecular theory of meaning derives the understanding of an expression from a finite number of axioms. A holistic theory, in contrast, derives the understanding of an expression through its distinction from all other expressions in that language. A molecular theory, therefore, provides criteria to associate a certain meaning with a sentence and can explain the concepts used in the language. In a holistic theory nothing is specified about the knowledge of the language other than in global constraints related to the language as a whole.

In addition, the granularity of concepts seems crucial for CBMT implementations.

A fine-grained theory of meaning derives concepts from single morphemes or separable words of the language, whereas in a coarse-grained theory of meaning, concepts are obtained from morpheme clusters. In a fine-grained theory of meaning, complex concepts can be created by hierarchical composition of their components, whereas in a coarse-grained theory of meaning, complex meanings can only be achieved through a concatenation of concept sequences.

The next three sections discuss the dichotomies of theories of meaning (rich vs. austere, molecular vs. holistic, and coarse-grained vs. fine-grained), and a few CBMT systems are classified according to the terminology introduced. This leads to a model of competence for CBMT. It appears that translation systems can either be designed to have a broad coverage or a high quality.
2 Rich vs. Austere CBMT

A common characteristic of all CBMT systems is that the understanding of the translation task is derived from the understanding of the reference translations. The inferred translation knowledge is used in the translation phase to generate new translations.

Collins (1998) distinguishes between Memory-Based MT (i.e. memory heavy, linguistic light) and Example-Based MT (i.e. memory light and linguistic heavy). While the former systems implement an austere theory of meaning, the latter make use of rich representations.

The most superficial theory of understanding is implemented in purely memory-based MT approaches, where learning takes place only by extending the reference text. No abstraction or generalization of the reference examples takes place.

Translation Memories (TMs) are such purely memory-based MT systems. A TM, e.g. TRADOS's Translator's Workbench (Heyn, 1996) and STAR's TRANSIT, calculates the graphemic similarity of the input text and the source side of the reference translations and returns the target string of the most similar translation examples as output. TMs make use of a set of reference translation examples and a k-nn retrieval algorithm. They implement an austere theory of meaning because they cannot justify the use of a word other than by looking up all contexts in which the word occurs. They can, however, enumerate all occurrences of a word in the reference text.

The TM distributed by ZERES (Zer, 1997) follows a richer approach. The reference translations and the input sentence to be translated are lemmatized and part-of-speech tagged. The source language sentence is mapped against the reference translations on a surface string level, on a lemma level and on a part-of-speech level. Those example translations which show greatest similarity to the input sentence with respect to the three levels of description are returned as the best available translation.

Example-Based Machine Translation (EBMT) systems (Sato and Nagao, 1990; Collins, 1998; Güvenir and Cicekli, 1998; Carl, 1999; Brown, 1997) are richer systems. Translation examples are stored as feature and tree structures. Translation templates are generated which contain (sometimes weighted) connections in those positions where the source language and the target language equivalences are strong. In the translation phase, a multi-layered mapping from the source language into the target language takes place on the level of templates and on the level of fillers.

The ReVerb EBMT system (Collins, 1998) performs sub-sentential chunking and seeks to link constituents with the same function in the source and the target language. A source language subject is translated as a target language subject and a source language object as a target language object. In case there is no appropriate translation template available, single words can be replaced as well, at the expense of translation quality.

The EBMT approach described in Güvenir and Cicekli (1998) makes use of morphological knowledge and relies on word stems as a basis for translation. Translation templates are generalized from aligned sentences by substituting differences in sentence pairs with variables and leaving the identical substrings unsubstituted. An iterative application of this method generates translation examples and translation templates which serve as the basis for an example-based MT system. An understanding consists of the extraction of compositionally translatable substrings and the generation of translation templates.

A similar approach is followed in EDGAR (Carl, 1999). Sentences are morphologically analyzed and translation templates are decorated with features. Fillers in translation template slots are constrained to unify with these features. In addition to this, a shallow linguistic formalism is used to percolate features in derivation trees.

Sato and Nagao (1990) proposed still richer representations where syntactically analyzed phrases and sentences are stored in a database. In the translation phase, the most similar derivation trees are retrieved from the database and a target language derivation tree is composed from the translated parts. By means of a thesaurus, semantically similar lexical items may be exchanged in the derivation trees.
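The TM retrieval step described above can be sketched as a k-nearest-neighbour search under a graphemic (edit-distance) similarity. This is an illustrative sketch under simplifying assumptions, not the actual algorithm of any product named above; the example translation pairs are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a simple graphemic similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def tm_retrieve(source: str, memory: list, k: int = 1):
    """Return the target side(s) of the k graphemically most similar examples."""
    ranked = sorted(memory, key=lambda pair: edit_distance(source, pair[0]))
    return [tgt for _src, tgt in ranked[:k]]

# Invented German-English example base:
memory = [("Der Vertrag tritt in Kraft.", "The contract comes into force."),
          ("Der Vertrag ist ungültig.", "The contract is void.")]
print(tm_retrieve("Der Vertrag tritt am Montag in Kraft.", memory, k=1))
```

Note that the metric is static: adding examples to `memory` changes neither the symbol set nor the distance function, which is exactly the molecular property discussed in Section 3.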
Statistics-based MT (SBMT) approaches implement austere theories of meaning. For instance, in Brown et al. (1990) a number of models are presented, starting with simple stochastic translation models and getting incrementally more complex and rich by introducing more random variables. No linguistic analyses are taken into account in these approaches. However, in further research the authors plan to integrate linguistic knowledge such as inflectional analysis of verbs, nouns and adjectives.

McLean (1992) has proposed an austere approach where he uses neural networks (NNs) to translate surface strings from English to French. His approach functions similarly to a TM, where the NN is used to classify the sequences of surface word forms according to the examples given in the reference translations. On a small set of examples he shows that NNs can successfully be applied to MT.

[Figure 1: Atomicity, Granularity and Richness of CBMT. Three panels plot the richness of representation (graphemic, morphological, syntactic, semantic), the granularity of representation (word, phrase, sentence) and the atomicity of representation (molecular, mixed, holistic) for: (1) Sato and Nagao (1990), (2) EDGAR (Carl, 1999), (3) GEBMT (Güvenir and Cicekli, 1998), (4) ZERES (Zer, 1997), (5) TRADOS (Heyn, 1996), (6) ReVerb (Collins, 1998), (7) Brown (1997), (8) Brown et al. (1990), (9) McLean (1992).]

3 Molecular vs. Holistic CBMT

As discussed in the previous section, all CBMT systems make use of some text dimensions in order to map a source language text into the target language. TMs, for instance, rely on the set of graphemic symbols, i.e. the ASCII set. Richer systems use lexical, morphological, syntactic and/or semantic descriptions. The degree to which the set of descriptions is independent from the reference translations determines the molecularity of the theory. The more the descriptions are learned from, and thus depend on, the reference translations, the more the system becomes holistic. Learning descriptions from reference translations makes the system more robust and easier to adjust to a new text domain.

TMs implement the molecular CBMT approach as they rely on a static distance metric which is independent of the size and content of the case base. TMs are molecular because they rely on a fixed and limited set of graphic symbols. Adding further example translations to the database neither increases the set of graphic symbols nor modifies the distance metric. Learning capacities in TMs are trivial, as their only way to learn is through extension of the example base.

The translation templates generated by Güvenir and Cicekli (1998), for instance, differ according to the similarities and dissimilarities found in the reference text. Translation templates in this system thus reflect holistic aspects of the example translations. The way in which morphological analysis is processed is, however, independent from the translation examples and is thus a molecular aspect of the system.

Similarly, the ReVerb EBMT system (Collins, 1998) makes use of holistic components. The reference text is part-of-speech tagged. The length of translation segments as well as their most likely initial and final words are calculated based on probabilities found in the reference text.

SBMT approaches (e.g. Brown et al., 1990) have a purely holistic view on languages. Every sentence of one language is considered to be a possible translation of any sentence in the other language. No account is given for the equivalence of the source language meaning and the target language meaning other than by means of global considerations concerning frequencies of occurrence in the reference text. In order to compute the most probable translations, each pair of items of the source language and the target language is associated with a certain probability. This prior probability is derived from the reference text. In the translation phase, several target language sequences are considered, and the one with the highest posterior probability is then taken to be the translation of the source language string.

Similarly, neural network based CBMT systems (McLean, 1992) are holistic approaches. The training of the weights and the minimization of the classification error rely on the reference text as a whole. Attempts to extract rules from the trained neural networks seek to isolate and make explicit aspects of how the net successfully classifies new sequences. The training process, however, remains holistic.
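The template generalization attributed to Güvenir and Cicekli (1998) above can be illustrated with a toy sketch: differing substrings of two aligned example sentences are replaced by a variable, while identical substrings are kept. The sentences below are invented, and the sketch handles only a single, monolingual difference region, which is enough to show the idea.

```python
def generalize(sent1: str, sent2: str):
    """Replace the (single) differing substring of two sentences by a variable X.

    Returns (template, fillers) if the sentences share a common prefix and
    suffix, otherwise None. A much simplified illustration of inducing
    translation templates from similarities and differences in examples."""
    w1, w2 = sent1.split(), sent2.split()
    i = 0                                   # longest common prefix
    while i < min(len(w1), len(w2)) and w1[i] == w2[i]:
        i += 1
    j = 0                                   # longest common suffix after the prefix
    while j < min(len(w1), len(w2)) - i and w1[-1 - j] == w2[-1 - j]:
        j += 1
    if i == 0 and j == 0:
        return None                         # nothing in common: no template
    suffix = w1[len(w1) - j:]
    return (" ".join(w1[:i] + ["X"] + suffix),
            (w1[i:len(w1) - j], w2[i:len(w2) - j]))

# Invented example pair differing in one position:
template, fillers = generalize("I gave the book to Mary", "I gave the pen to Mary")
print(template)
print(fillers)
```

In a real system this generalization is applied iteratively and bilingually, so that the variable on the source side is linked with the corresponding variable on the target side; the template set thus reflects whatever similarities the reference text happens to contain, which is its holistic aspect.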
[Figure 2: A Model of Competence for CBMT. Expected coverage of the system (vertical axis, low to high) is plotted against expected translation quality (horizontal axis, low to high) for: (1) Sato and Nagao (1990), (2) Carl (1999), (3) Güvenir and Cicekli (1998), (4) Zer (1997), (5) Heyn (1996), (6) Collins (1998), (7) Brown (1997), (8) Brown et al. (1990), (9) McLean (1992).]
4 Coarse vs. Fine Graining CBMT

One task that all MT systems perform is to segment the text to be translated into translation units which, to a certain extent, can be translated independently. The ways in which segmentation takes place and how the translated segments are joined together in the target language are different in each MT system.

In Collins (1998), segmentation takes place on a phrasal level. Due to the lack of a rich morphological representation, agreement cannot always be granted in the target language when translating single words from English to German. Reliable translation cannot be guaranteed when phrases in the target language, or parts of them, are moved from one position (e.g. the object position) into another one (e.g. a subject position).

In Güvenir and Cicekli (1998), this situation is even more problematic because there are no restrictions on possible fillers of translation template slots. Thus, a slot which has originally been filled with an object can, in the translation process, even accommodate an adverb or the subject.

SBMT approaches perform fine-grained segmentation. Brown et al. (1990) segment the input sentences into words, where for each source-target language word pair translation probabilities, fertility probabilities, alignment probabilities etc. are computed. Coarse-grained segmentation is unrealistic because sequences of 3 or more words (so-called n-grams) occur very rarely for n >= 3 even in huge learning corpora (Brown et al. (1990) use the Hansard French-English text containing several million words). Statistical and probabilistic systems rely on word frequencies found in texts and usually cannot extrapolate from a very small number of word occurrences. A statistical language model assigns to each n-gram a probability which enables the system to generate the most likely target language strings.
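The interplay of translation probabilities and a statistical language model can be sketched as a toy noisy-channel decoder: each candidate target string is scored by the product of word-for-word translation probabilities and bigram language model probabilities, and the candidate with the highest posterior score is returned. All probabilities below are invented for illustration; systems such as Brown et al. (1990) estimate them from millions of aligned sentences and use far richer models.

```python
import math

# Invented word translation probabilities P(source_word | target_word):
t_prob = {("maison", "house"): 0.8, ("maison", "home"): 0.2,
          ("la", "the"): 0.9, ("la", "it"): 0.1}

# Invented bigram language model P(word | previous_word); "<s>" = sentence start:
lm = {("<s>", "the"): 0.5, ("the", "house"): 0.4, ("the", "home"): 0.1,
      ("<s>", "it"): 0.3, ("it", "house"): 0.01, ("it", "home"): 0.02}

def score(source_words, target_words):
    """log P(target) + log P(source | target), aligned word-for-word."""
    logp = 0.0
    prev = "<s>"
    for tw in target_words:                     # language model term
        logp += math.log(lm.get((prev, tw), 1e-9))
        prev = tw
    for sw, tw in zip(source_words, target_words):  # translation model term
        logp += math.log(t_prob.get((sw, tw), 1e-9))
    return logp

def decode(source_words, candidates):
    """Return the candidate target string with the highest posterior score."""
    return max(candidates, key=lambda c: score(source_words, c))

candidates = [["the", "house"], ["the", "home"], ["it", "house"]]
print(decode(["la", "maison"], candidates))
```

The prior (language model) and the translation probabilities are both derived from the reference text, which is why the approach counts as holistic in the sense of Section 3.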
5 A Competence Model for CBMT

A competence model is presented as two independent parameters, i.e. Coverage and Quality (see Figure 2).

Coverage of the system refers to the extent to which a variety of source language texts can be translated. A system has a high coverage if a great variety of texts can be translated. A low-coverage system can translate only restricted texts of a certain domain with limited terminology and linguistic structures.

Quality refers to the degree to which an MT system produces successful translations. A system has a low quality if the produced translations are not even informative, in the sense that a user cannot understand what the source text is about. A high-quality MT system produces user-oriented and correct translations with respect to text type, terminological preferences, personal style, etc.

An MT system with low coverage and low quality is completely uninteresting. Such a system comes close to a random number generator, as it translates few texts in an unpredictable way.

An MT system with high coverage and "not-too-bad" quality can be useful in a Web application where a great variety of texts are to be translated for occasional users who want to grasp the basic ideas of a foreign text. On the other hand, a system with high quality and restricted coverage might be useful for in-house MT applications or a controlled language.

An MT system with high coverage and high quality would translate any type of text to everyone's satisfaction. However, as one can expect, such a system seems not to be feasible.

Boitet (1999) proposes "the tentative formula: Coverage * Quality = K", where K depends on the MT technology and the amount of work encoded in the system. The question, then, is when the maximum K is possible and how much work we want to invest for what purpose. Moreover, a given K can mean high coverage and low quality, or it can mean the reverse.

The expected quality of a CBMT system increases when the input text is segmented more coarsely. Consequently, a low coverage must be expected, due to the combinatorial explosion of the number of longer chunks. In order for a fine-graining system to generate at least informative translations, further knowledge resources need to be considered. These knowledge resources may be either pre-defined and molecular, or they can be derived from reference translations and holistic.

TMs focus on the quality of translations. Only large clusters of meaning entities are translated into the target language, in the hope that such clusters will not interfere with the context from which they are taken. Broader coverage can be achieved through finer-grained segmentation of the input into phrases or single terms. Systems which finely segment texts use rich representation languages in order to adapt the translation units to the target language context or, as in the case of SBMT systems, use holistically derived constraints.

What can be learned and what should be learned from the reference text, how to represent the inferred knowledge, how to combine it with pre-defined knowledge, and the impact of different settings on the constant K in the formula of Boitet (1999) are all still open questions for CBMT design.
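Boitet's tentative formula can be made concrete with a toy calculation: for a fixed K, raising coverage forces quality down proportionally, and vice versa. The value of K below is invented purely for illustration.

```python
K = 0.36  # invented constant for some hypothetical MT technology

def quality_for(coverage: float) -> float:
    """Given Coverage * Quality = K, the quality attainable at a coverage level."""
    return K / coverage

# The same K can mean high coverage and low quality, or the reverse:
for coverage in (0.9, 0.6, 0.4):
    print(f"coverage={coverage:.1f}  quality={quality_for(coverage):.2f}")
```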
6 Conclusion

Machine Translation (MT) is a meaning-preserving mapping from a source language text into a target language text. In order to enable a computer system to perform such a mapping, it is provided with a formalized theory of meaning.

Theories of meaning are characterized by three dichotomies: they can be holistic or molecular, austere or rich, and they can be fine-grained or coarse-grained.

A number of CBMT systems (translation memories, example-based and statistics-based machine translation systems) are examined with respect to these dichotomies. In a system that uses a rich theory of meaning, complex representations are computed, including morphological, syntactic and semantic representations, while with an austere theory the system relies on the mere graphemic surface form of the text. In a holistic implementation, meaning descriptions are derived from reference translations, while in a molecular approach the meaning descriptions are obtained from a finite set of pre-defined features. In a fine-grained theory, the minimal length of a translation unit is equivalent to a morpheme, while in a coarse-grained theory it amounts to a morpheme cluster, a phrase or a sentence.

According to the implemented theory of meaning, one can expect to obtain high quality translations or a good coverage of the CBMT system. The more the system makes use of coarse-grained translation units, the higher is the expected translation quality. The more the theory uses rich representations, the more the system may achieve broad coverage. CBMT systems can be tuned to achieve either of the two goals.

References

Christian Boitet. 1999. A research perspective on how to democratize machine translation and translation aides aiming at high quality final output. In MT-Summit '99.

Peter F. Brown, J. Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, F. Jelinek, Robert L. Mercer, and P. S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16:79-85.

Ralf D. Brown. 1997. Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation. In TMI-97, pages 111-118.

Michael Carl. 1999. Inducing Translation Templates for Example-Based Machine Translation. In MT-Summit VII.

Bróna Collins. 1998. Example-Based Machine Translation: An Adaptation-Guided Retrieval Approach. Ph.D. thesis, Trinity College, Dublin.

Michael Dummett. 1975. What is a Theory of Meaning? In Mind and Language. Oxford University Press, Oxford.

Halil Altay Güvenir and Ilyas Cicekli. 1998. Learning Translation Templates from Examples. Information Systems, 23(6):353-363.

Matthias Heyn. 1996. Integrating machine translation into translation memory systems. In European Association for Machine Translation - Workshop Proceedings, pages 111-123, ISSCO, Geneva.

Ian J. McLean. 1992. Example-Based Machine Translation using Connectionist Matching. In TMI-92.

Makoto Nagao. 1989. Machine Translation: How Far Can It Go? Oxford University Press, Oxford.

S. Sato and M. Nagao. 1990. Towards memory-based translation. In COLING-90.

Zeres GmbH, Bochum, Germany. 1997. ZERESTRANS Benutzerhandbuch.