EM-Training for Weighted Aligned Hypergraph Bimorphisms

Frank Drewes
Department of Computing Science, Umeå University, S-901 87 Umeå, Sweden

Kilian Gebhardt and Heiko Vogler
Department of Computer Science, Technische Universität Dresden, D-01062 Dresden, Germany

Abstract

We develop the concept of weighted aligned hypergraph bimorphism where the weights may, in particular, represent probabilities. Such a bimorphism consists of an $\mathbb{R}_{\geq 0}$-weighted regular tree grammar, two hypergraph algebras that interpret the generated trees, and a family of alignments between the two interpretations. Semantically, this yields a set of bihypergraphs each consisting of two hypergraphs and an explicit alignment between them; e.g., discontinuous phrase structures and non-projective dependency structures are bihypergraphs. We present an EM-training algorithm which takes a corpus of bihypergraphs and an aligned hypergraph bimorphism as input and generates a sequence

malized by the new concept of hybrid grammar. Much as in the mentioned synchronous grammars, a hybrid grammar synchronizes the derivations of nonterminals of a string grammar, e.g., a linear context-free rewriting system (LCFRS) (Vijay-Shanker et al., 1987), and of nonterminals of a tree grammar, e.g., regular tree grammar (Brainerd, 1969) or simple definite-clause programs (sDCP) (Deransart and Małuszyński, 1985). Additionally it synchronizes terminal symbols, thereby establishing an explicit alignment between the positions of the string and the nodes of the tree. We note that LCFRS/sDCP hybrid grammars can also generate non-projective dependency structures.

In this paper we focus on the task of training an LCFRS/sDCP hybrid grammar, that is, assigning probabilities to its rules given a corpus of discontinuous phrase structures or non-projective dependency structures.
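
To make "assigning probabilities to its rules" concrete, the following is a minimal sketch of the normalization (M-) step of a generic EM procedure for a weighted grammar. It is not the algorithm of this paper. The function name `normalize_rule_counts`, the encoding of a rule as a `(lhs, rhs)` pair, and the toy counts are all assumptions made for illustration. The expected counts would come from an E-step over the corpus, which is not shown here.

```python
from collections import defaultdict


def normalize_rule_counts(expected_counts):
    """Turn non-negative rule counts into probabilities.

    expected_counts maps a rule, encoded as (lhs, rhs), to a count.
    Each count is divided by the total count of all rules that share
    the same left-hand side, so the probabilities of the rules of one
    nonterminal sum to 1 (assumed encoding, for illustration only).
    """
    totals = defaultdict(float)
    for (lhs, _rhs), count in expected_counts.items():
        totals[lhs] += count

    probabilities = {}
    for (lhs, rhs), count in expected_counts.items():
        # If a nonterminal has total count 0, its rules get probability 0.
        total = totals[lhs]
        probabilities[(lhs, rhs)] = count / total if total > 0 else 0.0
    return probabilities


# Hypothetical toy counts, not taken from any corpus.
counts = {
    ("S", ("NP", "VP")): 3.0,
    ("NP", ("she",)): 1.5,
    ("NP", ("NP", "PP")): 0.5,
}
probs = normalize_rule_counts(counts)
```

In a full EM loop, this step would alternate with a recomputation of the expected counts under the current probabilities until the corpus likelihood stops improving.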