EM-Training for Weighted Aligned Hypergraph Bimorphisms
Frank Drewes
Department of Computing Science
Umeå University
S-901 87 Umeå, Sweden

Kilian Gebhardt and Heiko Vogler
Department of Computer Science
Technische Universität Dresden
D-01062 Dresden, Germany

Abstract

We develop the concept of weighted aligned hypergraph bimorphism where the weights may, in particular, represent probabilities. Such a bimorphism consists of an R≥0-weighted regular tree grammar, two hypergraph algebras that interpret the generated trees, and a family of alignments between the two interpretations. Semantically, this yields a set of bihypergraphs, each consisting of two hypergraphs and an explicit alignment between them; e.g., discontinuous phrase structures and non-projective dependency structures are bihypergraphs. We present an EM-training algorithm which takes a corpus of bihypergraphs and an aligned hypergraph bimorphism as input and generates a sequence of weight assignments which converges to a local maximum or saddle point of the likelihood function of the corpus.

1 Introduction

In natural language processing alignments play an important role. For instance, in machine translation they show up as hidden information when training probabilities of dictionaries (Brown et al., 1993) or when considering pairs of input/output sentences derived by a synchronous grammar (Lewis and Stearns, 1968; Chiang, 2007; Shieber and Schabes, 1990; Nederhof and Vogler, 2012). As another example, in language models for discontinuous phrase structures and non-projective dependency structures they can be used to capture the connection between the words in a natural language sentence and the corresponding nodes of the parse tree or dependency structure of that sentence.

In (Nederhof and Vogler, 2014) the generation of discontinuous phrase structures has been formalized by the new concept of hybrid grammar. Much as in the mentioned synchronous grammars, a hybrid grammar synchronizes the derivations of nonterminals of a string grammar, e.g., a linear context-free rewriting system (LCFRS) (Vijay-Shanker et al., 1987), and of nonterminals of a tree grammar, e.g., regular tree grammar (Brainerd, 1969) or simple definite-clause programs (sDCP) (Deransart and Małuszyński, 1985). Additionally it synchronizes terminal symbols, thereby establishing an explicit alignment between the positions of the string and the nodes of the tree. We note that LCFRS/sDCP hybrid grammars can also generate non-projective dependency structures.

In this paper we focus on the task of training an LCFRS/sDCP hybrid grammar, that is, assigning probabilities to its rules given a corpus of discontinuous phrase structures or non-projective dependency structures. Since the alignments are first-class citizens, we develop our approach in the general framework of hypergraphs and hyperedge replacement (HR) (Habel, 1992). We define the concepts of bihypergraph (for short: bigraph) and aligned HR bimorphism. A bigraph consists of hypergraphs H1, λ, and H2, where λ represents the alignment between H1 and H2. A bimorphism B = (g, A1, Λ, A2) consists of a regular tree grammar g generating trees over some ranked alphabet Σ, two Σ-algebras A1 and A2 which interpret each symbol in Σ as an HR operation (thus evaluating every tree to two hypergraphs), and a Σ-indexed family Λ of alignments between the two interpretations of each σ ∈ Σ. The semantics of B is a set of bigraphs.
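As a rough illustration of these definitions, the following Python sketch shows one possible in-memory representation of bigraphs and aligned HR bimorphisms. It is not an implementation from the paper; every class and field name is illustrative, and the RTG and the HR operations are left abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Hypergraph:
    """A hypergraph: a set of vertices and labeled hyperedges,
    each hyperedge attached to an ordered sequence of vertices."""
    vertices: Set[str] = field(default_factory=set)
    # each hyperedge: (label, tuple of attached vertices)
    edges: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)

@dataclass
class Bigraph:
    """A bihypergraph (H1, lambda, H2): two hypergraphs plus an alignment
    part whose hyperedges connect vertices of H1 with vertices of H2."""
    h1: Hypergraph
    alignment: Hypergraph
    h2: Hypergraph

@dataclass
class AlignedHRBimorphism:
    """An aligned HR bimorphism B = (g, A1, Lambda, A2): a weighted regular
    tree grammar over a ranked alphabet Sigma, two Sigma-algebras that
    interpret every symbol as a hyperedge-replacement operation, and a
    Sigma-indexed family of alignments."""
    rtg: object                        # weighted RTG g; rule weights in R>=0
    algebra1: Dict[str, object]        # sigma -> HR operation building the H1 part
    algebra2: Dict[str, object]        # sigma -> HR operation building the H2 part
    alignments: Dict[str, Hypergraph]  # sigma -> alignment between the two parts
```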
For instance, each discontinuous phrase structure or non-projective dependency structure can be represented as a bigraph (H1, λ, H2) where H1 and H2 correspond to the string component and the tree component, respectively. Fig. 1 shows an example of a bigraph representing a non-projective dependency structure. We present each LCFRS/sDCP hybrid grammar as a particular aligned HR bimorphism; this establishes an initial algebra semantics (Goguen et al., 1977) for hybrid grammars.

Figure 1: (a) A sentence with non-projective dependencies (here, "A hearing is scheduled on the issue today.") is represented in (b) by a bigraph (H1, λ, H2). Both hypergraphs H1 and H2 contain a distinct hyperedge (box) for each word of the sentence. H1 specifies the linear order on the words. H2 describes parent-child relationships between the words, where children form a list to whose start and end the parent has a tentacle. The alignment λ establishes a one-to-one correspondence between the (input vertices of the) hyperedges in H1 and H2.

The flexibility of aligned HR bimorphisms goes well beyond hybrid grammars as they generalize the synchronous HR grammars of (Jones et al., 2012), making it possible to synchronously generate two graphs connected by explicit alignment structures. Thus, they can for instance model alignments involving directed acyclic graphs like Abstract Meaning Representations (Banarescu et al., 2013) or Millstream systems (Bensch et al., 2014).

Our training algorithm takes as input an aligned HR bimorphism B = (g, A1, Λ, A2) and a corpus c of bigraphs. It is based on the dynamic programming variant (Baker, 1979; Lari and Young, 1990; Prescher, 2001) of the EM-algorithm (Dempster et al., 1977) and thus approximates a local maximum or saddle point of the likelihood function of c.

In order to calculate the significance of each rule of g for the generation of a single bigraph (H1, λ, H2) occurring in c, we proceed as usual, constructing the reduct of B by (H1, λ, H2), which generates the singleton {(H1, λ, H2)} via the same derivation trees as B and preserves the probabilities. We show that the complexity of constructing the reduct is polynomial in the size of g and (H1, λ, H2) if B is an LCFRS/sDCP hybrid grammar. However, as the algorithm itself is not limited to this situation, we expect it to be useful in other cases as well.
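To make the training loop concrete, here is a generic Python sketch of the dynamic-programming EM re-estimation step over derivation forests, in the spirit of (Baker, 1979; Lari and Young, 1990; Prescher, 2001). It is not the paper's algorithm verbatim: it assumes that the derivation forest of the reduct for each corpus bigraph has already been computed and is acyclic, and that the rule weights form a probability distribution per left-hand nonterminal; the function and parameter names (em_step, rule_prob, rule_lhs, and the forest encoding) are illustrative.

```python
from collections import defaultdict
from math import prod

def em_step(forests, rule_prob, rule_lhs):
    """One EM re-estimation step over precomputed derivation forests.

    forests:   one derivation forest per corpus bigraph, encoded as a list of
               (item, incoming) pairs in topological order (children before
               parents); incoming is a list of (rule, child_items) hyperedges
               and the last item of the list is the root.
    rule_prob: dict mapping each rule to its current probability.
    rule_lhs:  dict mapping each rule to its left-hand nonterminal.
    Returns a dict with the re-estimated rule probabilities.
    """
    count = defaultdict(float)

    for forest in forests:
        # Inside pass: total weight of all sub-derivations rooted in an item.
        inside = {}
        for item, incoming in forest:
            inside[item] = sum(
                rule_prob[rule] * prod(inside[c] for c in children)
                for rule, children in incoming)

        root = forest[-1][0]
        z = inside[root]        # weight of this bigraph under the current model
        if z == 0.0:
            continue            # bigraph not derivable with the current weights

        # Outside pass (parents are visited before their children).
        outside = defaultdict(float)
        outside[root] = 1.0
        for item, incoming in reversed(forest):
            for rule, children in incoming:
                w = rule_prob[rule] * outside[item]
                inner = prod(inside[c] for c in children)
                count[rule] += w * inner / z        # expected rule count
                for i, c in enumerate(children):
                    others = prod(inside[d] for j, d in enumerate(children) if j != i)
                    outside[c] += w * others

    # M-step: renormalize the expected counts per left-hand nonterminal.
    lhs_total = defaultdict(float)
    for rule, cnt in count.items():
        lhs_total[rule_lhs[rule]] += cnt
    return {rule: count[rule] / lhs_total[rule_lhs[rule]]
            if lhs_total[rule_lhs[rule]] > 0.0 else old
            for rule, old in rule_prob.items()}
```

Iterating a step of this kind yields a sequence of weight assignments that converges to a local maximum or saddle point of the likelihood function of the corpus.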
2 Preliminaries

Basic mathematical notation
We denote the set of natural numbers (including 0) by N and the set N \ {0} by N+. For n ∈ N, we denote {1, . . . , n} by [n]. An alphabet A is a finite set of symbols. We denote the set of all strings over A by A∗, the empty string by ε, and A∗ \ {ε} by A+. We denote the length of s ∈ A∗ by |s| and, for each i ∈ [|s|], the ith item in s by s(i), i.e., s is identified with the function s: [|s|] → A such that s = s(1) ··· s(|s|). We denote the range {s(1), . . . , s(|s|)} of s by [s]. The powerset of a set A is denoted by P(A). The canonical extensions of a function f: A → B to f: P(A) → P(B) and to f: A∗ → B∗ are defined as usual and denoted by f as well. We denote the restriction of f: A → B to A′ ⊆ A by f|A′. For an equivalence relation ∼ on B we denote the equivalence class of b ∈ B by [b]∼ and the quotient of B modulo ∼ by B/∼. For f: A → B we define the function f/∼: A → B/∼ by f/∼(a) = [f(a)]∼. In particular, for a string s ∈ B∗ we let s/∼ = [s(1)]∼ ··· [s(|s|)]∼.

Terms, regular tree grammars, and algebras
A ranked alphabet is a pair (Σ, rk) where Σ is an alphabet and rk: Σ → N is a mapping associating a rank with each symbol of Σ. Often we just write Σ instead of (Σ, rk). We abbreviate rk⁻¹(k) by Σk. In the following let Σ be a ranked alphabet.

Let A be an arbitrary set. We let Σ(A) denote the set of strings {σ(a1, . . . , ak) | k ∈ N, σ ∈ Σk, a1, . . . , ak ∈ A} (where the parentheses and commas are special symbols not in Σ). The set of well-formed terms over Σ indexed by A, denoted by TΣ(A), is defined to be the smallest set T such that A ⊆ T and Σ(T) ⊆ T. We abbreviate TΣ(∅) by TΣ and write σ instead of σ() for σ ∈ Σ0. A regular tree grammar (RTG) (Gécseg and

with initial nonterminal S and the following rules:

S → σ1(A, B)    A → σ2
B → σ3(C, D)    C → σ4    D → σ5

We observe that S ⇒g∗ σ1(σ2, σ3(σ4, σ5)). Let η, ζ, and ζ′ be the following trees (in order):

η = (B → σ3(C, D))(C → σ4, D → σ5)
ζ = (S → σ1(A, B))(A → σ2, B)
ζ′ = (S → σ1(A, B))(A → σ2, η)

Then ζ ∈ DSg(TΣ, B) because ζ′ ∈ DSg(TΣ) and the left-hand side of the root of η is B.

A Σ-algebra is a pair 𝒜 = (A, (σ𝒜 | σ ∈ Σ)) where A is a set and σ𝒜 is a k-ary operation on A for every k ∈ N and σ ∈ Σk. As usual, we will sometimes use 𝒜 to refer to its carrier set A or, conversely, denote 𝒜 by A (and thus σ𝒜 by σA) if there is no risk of confusion. The Σ-term algebra is the Σ-algebra TΣ with σTΣ(t1, . . . , tk) = σ(t1, . . . , tk).
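To make the interplay between terms and Σ-algebras concrete, here is a small Python sketch of term evaluation, i.e., the unique homomorphism from the term algebra into a given algebra. The string-building operations below are toy stand-ins chosen for illustration, not the HR operations used in the paper, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class Term:
    """A term over a ranked alphabet: a symbol applied to rank-many subterms."""
    symbol: str
    children: Tuple["Term", ...] = ()

def evaluate(term: Term, operations: Dict[str, Callable]) -> object:
    """Evaluate a term in a Sigma-algebra given as a mapping from each symbol
    to a k-ary operation; the term algebra corresponds to operations that
    simply rebuild the term itself."""
    args = [evaluate(child, operations) for child in term.children]
    return operations[term.symbol](*args)

# Example: evaluating sigma1(sigma2, sigma3(sigma4, sigma5)) in a toy string algebra.
t = Term("sigma1", (Term("sigma2"),
                    Term("sigma3", (Term("sigma4"), Term("sigma5")))))
string_algebra = {
    "sigma1": lambda x, y: x + " " + y,
    "sigma2": lambda: "a",
    "sigma3": lambda x, y: "(" + x + " " + y + ")",
    "sigma4": lambda: "b",
    "sigma5": lambda: "c",
}
print(evaluate(t, string_algebra))   # -> "a (b c)"
```

Evaluating the same term in two different algebras, as the bimorphism does with A1 and A2, is what yields the two hypergraph components of a bigraph.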