Context-Free Grammar Rewriting and the Transfer of Packed Linguistic Representations
Marc Dymetman Fred´ eric´ Tendeau Xerox Research Centre Europe Lernout & Hauspie 6, chemin de Maupertuis Koning Albert-I laan 64 38240 Meylan, France B-1780 Wemmel, Belgium [email protected] [email protected]
Abstract The representations of (Emele and Dorna, 1998) and We propose an algorithm for the transfer of packed linguistic (Kay, 1999) are based on a notion of propositional con- structures, that is, finite collections of labelled graphs which texts (see (Maxwell and Kaplan, 1991)), where each share certain subparts. A labelled graph is seen as a word over possible non-ambiguous reading included in the packed a vocabulary of description elements (nodes, arcs, labels), and source representation is extracted by selecting the value a collection of graphs as a set of such words, that is, as a lan- (true or false) of a certain number of propositional vari- guage over description elements. A packed representation for ables that index elements of the labelled source graph. the collection of graphs is then viewed as a context-free gram- Transfer is then seen as a process of rewriting source mar which generates such a language. We present an algorithm graph elements (e.g, nodes labelled with French lexemes) that uses a conventional set of transfer rules but is capable of into target graph elements (e.g. nodes labelled with En- rewriting the CFG representing the source packed structure into glish lexemes), while preserving the propositional con- a CFG representing the target packed structure that preserves texts in which these graph elements were selected. the compaction properties of the source CFG. In contrast, our approach, following (Dymetman, 1997), views a packed representation as being a gram- 1 Introduction mar (more specifically, a context-free grammar) over the vocabulary of graph elements (labelled nodes and edges), There is currently much interest in translation models where each word (in the sense of formal language theory) that support some amount of ambiguity preservation be- tween source and target texts, so as to minimize disam- generated by the grammar represents one of the possible biguation decisions that the system, or an interactive user, non-ambiguous readings of the packed representation. In has to make during the translation process (Kay et al., other terms, the collection of non-ambiguous graphs be- longing to the packed representation is seen as a lan- 1994). guage over a vocabulary of graph elements, and a packed An important aspect of such models is the ability to handle, during all the stages of the translation process, representation is seen as a grammar which generates such packed linguistic structures, that is, structures which fac- a language. Packing comes from the fact that a context- torize in a compact fashion all the different readings of free grammar is an efficient representation for the lan- guage it generates. Another essential feature of such a a sentence and obviate the need to list and treat all these representation is that it is interaction-free, that is, each readings in isolation of each other (as is standard in more nondeterministic top-down traversal of the grammar suc- traditional models for machine translation). In the case of parsing, and more specifically, parsing ceeds without ever backtracking and it results in a certain with unification-based formalisms such as LFG, tech- reading, without the need for checking the consistency of a set of associated propositional constraints: the repre- niques for producing packed structures have been in sentation for the collection of readings is as direct as can existence for some time (Maxwell and Kaplan, 1991; be while permitting a factorization of common parts. Maxwell and Kaplan, 1993; Maxwell and Kaplan, 1996; D¨orre, 1997; Dymetman, 1997). More recently, tech- Based on this notion, we present an algorithm for niques have been appearing for the generation from transfer which, starting from a finite set of rewriting packed structures (Shemtov, 1997), the transfer between patterns (the transfer lexicon), associates with a given context-free grammar representing the source packed packed structures (Emele and Dorna, 1998; Rayner and structure a context-free grammar representing the tar- Bouillon, 1995), and the integration of such mechanisms into the whole translation process (Kay, 1999; Frank, get packed structure. Therefore, the target representa- 1999). tion remains interaction-free and transparently encodes This paper focuses on the problem of transfer. The the target structures; furthermore, under certain natural “locality” conditions on the rewriting rules (the graph el- method proposed is related to those of (Emele and Dorna, ements in their left-hand sides tend be be “close” from 1998) and (Kay, 1999). As in these approaches, we view each other in the source grammar derivations), the target packed representations as being descriptions of a finite collection of directed labelled graphs (similar to the func- grammar preserves much of the factorization and com- tional structures of LFG), each representing a different paction properties of the source grammar. non-ambiguous reading, which share certain subparts. The paper is structured in the following way. Sec- tion 2 explains how ambiguous graphs can be seen as way to describe such a graph is by listing a collection of commutative languages over graph description elements, “description elements” for it, where each such element
and how context-free grammars provide concise specifi- is either a labelled node such as see 0 or a labelled edge
cations for these languages. Section 3 extends the stan- such as mod27 . Using this format, the pragmatically pre-
f 01
dard notion of non-ambiguous transfer to that of am- ferred analysis for our sentence is the set see 0 , arg1 ,
02 2 27 7 23 3 34
biguous transfer. Section 4 presents the basic language- i1 , arg2 , light , mod , green1 , mod , on , arg2 ,
g
05 5 56 6 theoretic formalism needed and introduces some opera- hill4 , mod , with , arg2 , telescope . tors on languages. Section 5 presents the detailed rewrit- If we consider the collection of all possible analyses, ing algorithm, which applies these operators not directly we then obtain a collection of sets of description ele- to languages, but to the context-free grammars specify- ments. It is convenient to view such a collection as a ing them. Section 6 gives an example of the algorithm in commutative language over the vocabulary of all possi- operation. ble description elements; each word in such a language corresponds to one analysis and is a list of description 2 Ambiguous structures as languages elements the order of which is considered irrelevant. The main advantage of taking this view of ambiguous
0: see / saw structures is that formal language theory provides stan- arg1 arg2 mod mod dard tools for representing languages compactly. Thus it is well-known in computational lexicography that a 1: i 2: light mod mod large list of word strings can be represented efficiently by mod 3: on means of a finite-state automaton which factorizes com- 7: green1 / green2 arg2 mon substrings. Such a representation is both compact 4: hill mod and “explicit”: accessing and using it is as direct as the 5: with flat list of words would be. arg2 Although one might think of using finite-state mod- 6: telescope els for representing compactly the language associated with a collection of graphs, they do not seem as relevant Figure 1: An informal graphical representation of the 20 as context-free models for our purposes. The reason is possible analyses for “I saw the green light on the hill that the source packed representations are typically ob- with a telescope”. tained as the results of chart-parsing processes. A chart Let’s consider the sentence “I saw the green light on used in the parsing of a context-free grammar can itself the hill with a telescope”. In Fig. 1, we have repre- be viewed as a context-free grammar, which is a spe- sented informally the set of possible analyses for this cialization of the original grammar for the string being sentence. Labels on the nodes correspond to predicate parsed, and which directly generates the derivation trees names (‘on’, ‘hill’, etc). A slash is used to indicate dif- for this string relative to the original grammar (Billot and ferent possible readings for a node; for instance, we as- Lang, 1989).1 The generalization of this approach to uni- sume that the surface form “saw” can correspond to the fication grammars (of the LFG or DCG type) proposed verbs “to see” or “to saw”, and that “green” is ambigu- in (Dymetman, 1997) shows that, in turn, chart-parsing ous between the color adjective “green1” and the noun with these unification grammars conducts naturally to “green2” (grassy lawn). Relations between nodes are in- packed representations for the parse results very close to
dicated by labels on the edges joining two nodes: ‘arg1’ the ones we are about to introduce. G and ‘arg2’ for first and second argument, ‘mod’ for mod- Let’s consider the CFG 0 : ifier. The solid edges correspond to relations which are S ! SAW ON WITH D3
satisfied in all the readings for the sentence, dotted edges
!
1 02 SAW D0 arg101 i arg2 LIGHT
to relations that are satisfied only for certain readings. ! 2
LIGHT GREEN mod27 light
! j 7
Thus, the preprositional phrase “on the hill” can modify GREEN green17 green2
!
34 4
either “light” or “see/saw”, the phrase “with a telescope” ON on3 arg2 hill
!
56 6
either “hill”, “light”, or “see/saw”. The informal picture WITH with5 arg2 telescope
j ! 0
of Fig. 1 does not make explicit exactly which structures D0 see0 saw
! j 23
are actually possible analyses of the sentence. For in- D3 mod03 D30 mod D32
! j 45
D30 mod05 mod 25
stance the two crossing edges mod 03 and mod (where
! j j
25 45 indices are used to denote the origin and destination of D32 mod05 mod mod the edge) cannot appear together in a reading of the given Nonterminals of that grammar are written in upper-
sentence. As a consequence only five of the apparent case, terminals (which are graph description elements) in
3 2 prepositional attachments combinations are possi- lowercase. It can be verified that the language generated ble, which multiplied by the four possible lexical variants by this grammar is the collection of commutative words for “saw” and “green” gives 20 possible readings for the sentence. 1This context-free grammar has polynomial size relative to the length of the string. While it is also possible in principle to use a finite- Each of these readings is a graph where nodes 0 and state model for representing the same set of derivation trees, it can be 7 now carry one label, and where one ‘mod’ edge has shown that such a model may be exponential relative to string length been selected for the attachment of nodes 3 and 5. One (remark due to John Maxwell). corresponding precisely to all the possible analyses for 4 Formal aspects the sentence. The commutative monoid over an alphabet A is denoted
The fact that there are 20 such words can be es-
A by C , and its words are represented by vectors of
tablished by a simple bottom-up computation involving A
A N w 2 N , indexed by and with entries in . For each
multiplications and sums. If we call ambiguity degree A
a 2 A N , the component indexed by is denoted by
ad(N) of a nonterminal N the number of words it gen-
w a w
a] [ and tells how many ’s occur in . The product
erates, then it is obvious that, for instance, ad(D30) = 2,
w w C A w 2 2
(concatenation) of 1 and in is the vector
ad(D3) = 2+3, ad(S) = 4 1 1 5 = 20. In fact, it is the mul-
A
+ w 8a 2A w = w N
2 1
[a] [a] a]
tiplications which appear in such computations which are s.t. : [ . A language of
A responsible for the compactness of the grammar as com- the commutative monoid is a subset of C .
pared to the direct listing of the words: each time a mul- The subword relation is denoted by . For a language
v L w 2 L v w
tiplication appears, a factorization is being cashed in.2 L, we write: iff there exists s.t. . L
The rewriting is performed from a source language S
L T
3 Transfer as language rewriting over an alphabet S to a target language over an al-
S
phabet T (disjoint from ) w.r.t. a set of rewriting
+
!
When working with non-ambiguous structures, transfer R T
rules S (rules have the form ). We
2 is a rewriting process which takes as input a source- a
assume in the sequel that any S appears at most
language graph and constructs a target-language graph once in any left-hand side of each rule of R and also at
lhs ! rhs
by applying transfer rules of the form ,where L
most once in any word of S . This property is preserved rhs
lhs and are finite sets of description elements for by all the rewritings that we are going to introduce.
!= R R
source graph and target graph respectively. In outline, the Let’s define LH S .For ,wedefine
R =fa 2 j9r 2 R s:t: aLH S r g “non-ambiguous”transfer process works in the following eft et
L S S . L
way: for each non-overlapping covering of the source S The rewriting is a function R taking and yielding
graph with left-hand sides of transfer rules, the corre- L
T ,definedas:
sponding right-hand sides are produced and taken to-
L = f j9w 2 L ;w = ^
S 1 p S 1 p
gether represent a target graph (this is a non-deterministic R
! 2R^::: ^ ! 2Rg:
1 p p function as there can be several such coverings). 1 In the case of ambiguous structures, the aim of transfer
is to take as input a language of source graphs and to pro- 5 Algorithm
duce a language of target graphs. The language of target In order to implement the function R ,itisusefulto
!
graphs should be equal to the union of all the graphs that introduce rewriting functions and a . They apply
L C = [ T would have obtained if one had enumerated one-by-one to any language over ,where S .
the source graphs, applied non-ambiguous transfer, and They are defined as:
L=fw j w 2 Lg
!
taken the collection of all target graphs obtained. The
L=fw 2 L j w =0g
a
a]
goal of ambiguous transfer is to perform the same task [ .
!
on the basis of a compact representation for the collec- The functions are applied so that source sym- L
tion of source graphs, yielding a compact representation bols are guaranteed to be removed one by one from S :
<
for the collection of target graphs. we consider S is totally ordered by and we write