EM-Training for Weighted Aligned Hypergraph Bimorphisms

Frank Drewes

Department of Computing Science
Umea˚ University

Kilian Gebhardt and Heiko Vogler

Department of Computer Science Technische Universita¨t Dresden
S-901 87 Umea˚, Sweden

[email protected]

D-01062 Dresden, Germany

[email protected] [email protected]

Abstract

malized by the new concept of hybrid grammar.

Much as in the mentioned synchronous grammars,

a hybrid grammar synchronizes the derivations of nonterminals of a string grammar, e.g., a lin-

ear context-free rewriting system (LCFRS) (Vijay-

Shanker et al., 1987), and of nonterminals of a

tree grammar, e.g., regular tree grammar (Brainerd,

1969) or simple deﬁnite-clause programs (sDCP)

(Deransart and Małuszynski, 1985). Additionally it

synchronizes terminal symbols, thereby establish-

ing an explicit alignment between the positions of

the string and the nodes of the tree. We note that

LCFRS/sDCP hybrid grammars can also generate

non-projective dependency structures.

In this paper we focus on the task of training an

LCFRS/sDCP hybrid grammar, that is, assigning probabilities to its rules given a corpus of discon-

tinuous phrase structures or non-projective depen-

dency structures. Since the alignments are ﬁrst class citizens, we develop our approach in the

general framework of hypergraphs and hyperedge

replacement (HR) (Habel, 1992). We deﬁne the

concepts of bihypergraph (for short: bigraph) and

aligned HR bimorphism. A bigraph consists of
We develop the concept of weighted

aligned hypergraph bimorphism where the

weights may, in particular, represent proba-

bilities. Such a bimorphism consists of an

R

_≥0-weighted regular tree grammar, two

hypergraph algebras that interpret the gen-

erated trees, and a family of alignments between the two interpretations. Semantically, this yields a set of bihypergraphs each consisting of two hypergraphs and an explicit alignment between them; e.g., discontinuous phrase structures and non-

projective dependency structures are bihy-

pergraphs. We present an EM-training algorithm which takes a corpus of bihypergraphs and an aligned hypergraph bimorphism as input and generates a sequence

of weight assignments which converges to

a local maximum or saddle point of the

likelihood function of the corpus.

1 Introduction

hypergraphs H₁, λ, and H₂, where

the alignment between H₁and H₂. A bimorphism

λ

represents

In natural language processing alignments play an

important role. For instance, in machine translation

they show up as hidden information when training

probabilities of dictionaries (Brown et al., 1993) or

when considering pairs of input/output sentences derived by a synchronous grammar (Lewis and

Stearns, 1968; Chiang, 2007; Shieber and Schabes,

1990; Nederhof and Vogler, 2012). As another example, in language models for discontinuous phrase structures and non-projective dependency

structures they can be used to capture the connec-

B = (g, A₁, Λ, A₂) consists of a regular tree gram-

mar

, two

symbol in every tree to two hypergraphs), and a

family of alignments between the two interpreta-

g

generating trees over some ranked alphabet

Σ

-algebras A₁and A₂which interpret each

as an HR operation (thus evaluating
-indexed

Σ
Σ
Λ

tions of each σ ∈ Σ. The semantics of

B

is a set of

bigraphs.

For instance, each discontinuous phrase struction between the words in a natural language sen- ture or non-projective dependency structure can be

tence and the corresponding nodes of the parse tree

or dependency structure of that sentence.

represented as a bigraph (H₁, λ, H₂) where H₁and H₂correspond to the string component and the tree

component, respectively. Fig. 1 shows an example

In (Nederhof and Vogler, 2014) the generation of discontinuous phrase structures has been for- of a bigraph representing a non-projective depen-

60

Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata, pages 60–69,

c

Berlin, Germany, August 12, 2016. ꢀ2016 Association for Computational Linguistics

is
.

hearing

scheduled

A

on

today

issue

(a)

the

hearing is scheduled on the issue today

.

(y₁⁽⁰⁾

)

(y₁⁽⁰⁾

)

inp

out

is

.

hearing scheduled

H₂

today today
AAon on

(b)

issue issue the the

λ

hearing

is

scheduled

.

H₁

(y₁⁽⁰⁾

)

(y₁⁽⁰⁾

)

inp

out

Figure 1: (a) A sentence with non-projective dependencies is represented in (b) by a bigraph (H₁, λ, H₂)
.

Both hypergraphs H₁and H₂contain a distinct hyperedge (box) for each word of the sentence. H₁

speciﬁes the linear order on the words. H₂describes parent-child relationships between the words, where

children form a list to whose start and end the parent has a tentacle. The alignment

λ

establishes a

one-to-one correspondence between the (input vertices of the) hyperedges in H₁and H₂.

dency structure. We present each LCFRS/sDCP

hybrid grammar as a particular aligned HR bimor- ing the reduct is polynomial in the size of

phism; this establishes an initial algebra semantics (H₁, λ, H₂) if is an LCFRS/sDCP hybrid gram-

ities. We show that the complexity of construct-

g

and

B

(Goguen et al., 1977) for hybrid grammars.

mar. However, as the algorithm itself is not limited

to this situation, we expect it to be useful in other

cases as well.

The ﬂexibility of aligned HR bimorphisms goes

well beyond hybrid grammars as they generalize the synchronous HR grammars of (Jones et al., 2012), making it possible to synchronously generate two graphs connected by explicit alignment

structures. Thus, they can for instance model align-

ments involving directed acyclic graphs like Ab-

stract Meaning Representations (Banarescu et al.,

2013) or Millstream systems (Bensch et al., 2014).

Our training algorithm takes as input an aligned

HR bimorphism B = (g, A₁, Λ, A₂) and a corpus

2 Preliminaries

Basic mathematical notation We denote the set

of natural numbers (including

N \ {0} by . For n ∈ N, we denote {1, . . . , n} by [n]. An alphabet is a ﬁnite set of symbols.

0

) by

N

and the set

N

A

We denote the set of all strings over

A

by A^∗, the

empty string by

ε

, and A^∗\ {ε} by A⁺. We denote

the length of s ∈ A^∗by |s| and, for each i ∈ [|s|]

,

c

of bigraphs. It is based on the dynamic program-

the

i

th item in

s

by s(i), i.e., is identiﬁed with the

s

ming variant (Baker, 1979; Lari and Young, 1990;

Prescher, 2001) of the EM-algorithm (Dempster et

al., 1977) and thus approximates a local maximum

or saddle point of the likelihood function of c.

In order to calculate the signiﬁcance of each

function s: [|s|] → A such that s = s(1) · · · s(|s|).

We denote the range {s(1), . . . , s(|s|)} of s by [s].

The powerset of a set

A

is denoted by P(A). The

canonical extension of a function f : A → B to f : P(A) → P(B) and to f : A^∗→ B^∗are derule of

g

for the generation of a single bigraph

ﬁned as usual and denoted by

f

as well. We denote

0

(H₁, λ, H₂) occurring in c, we proceed as usual, the restriction of f : A → B to A ⊆ A by f|_A.

constructing the reduct B £ (H₁, λ, H₂) which

For an equivalence relation

∼

on

B

we denote

∼

generates the singleton (H₁, λ, H₂) via the same

the equivalence class of b ∈ B by [b] and the

derivation trees as

B

and preserves the probabil- quotient of

B

modulo

∼

by B/∼. For f : A →

61

B

we deﬁne the function f/∼: A → B/∼ by with initial nonterminal S and the following rules:

f/∼(a) = [f(a)] . In particular, for a string s ∈

∼

S → σ₁(A, B)
A→ σ₂

B^∗we let s/∼ = [s(1)] · · · [s(|s|)] .

∼

B → σ₃(C, D)

C→ σ₄

D → σ₅

We observe that S ⇒^∗_gσ₁(σ₂, σ₃(σ₄, σ₅)). Let

η

,

Terms, regular tree grammars, and algebras

A ranked alphabet is a pair (Σ, rk) where is

an alphabet and rk: Σ → N is a mapping associat-

ing a rank with each symbol of . Often we just

write

instead of (Σ, rk). We abbreviate rk⁻¹(k)

by Σ_k. In the following let be a ranked alphabet.

Let be an arbitrary set. We let Σ(A) denote

ζ, and ζ⁰be the following trees (in order):

Σ

B → σ₃(C, D)

S → σ₁(A, B)

A → σ₂
S → σ₁(A, B)

Σ

η

C → σ₄D → σ₅

A → σ₂

Σ

B

Σ

Then ζ ∈ D_g^S(T_Σ, B) because ζ⁰∈ D_g^S(T_Σ) and

A

the left-hand side of the root of η is B.

ꢀ

the set of strings {σ(a₁, . . . , a_k) | k ∈ N, σ ∈

A

Σ-algebra is a pair A = (A, (σ_A| σ ∈ Σ))

where is a set and σ_Ais a k-ary operation on

for every k ∈ N and σ ∈ Σ_k. As usual, we will sometimes use to refer to its carrier set or, conversely, denote by (and thus σ_Aby σ_A) if there is no risk of confusion. The -term algebra is the Σ-algebra T_Σwith σ_T(t₁, . . . , t_k) =
Σ_k, a₁, . . . , a_k∈ A} (where the parentheses and

A

commas are special symbols not in Σ). The set of

well-formed terms over

by T_Σ(A), is deﬁned to be the smallest set

Σ

indexed by A, denoted

such

A

A
T

A

that A ⊆ T and Σ(T) ⊆ T. We abbreviate T_Σ(∅)

Σ

by T_Σand write σ instead of σ() for σ ∈ Σ₀.

A regular tree grammar (RTG)¹(Ge´cseg and

Σ

σ(t₁, . . . , t_k) for every k ∈ N

,

σ ∈ Σ_k, and

Steinby, 1984) is a tuple g = (Ξ, Σ, ξ₀, R) where t₁, . . . , t_k∈ T_Σ. For each

Σ

-algebra

A

there is

Ξ

is an alphabet (nonterminals), Ξ ∩ Σ = ∅, el-

ements in

are called terminals, ξ₀∈ Ξ (initial

nonterminal), is a ranked alphabet (rules); each

exactly one Σ-homomorphism, denoted by [[.]]
,

A

Σ

from the Σ-term algebra to A (Wechler, 1992).

R

rule in R_khas the form ξ → σ(ξ₁, . . . , ξ_k) where

ξ, ξ₁, . . . , ξ_k∈ Ξ, σ ∈ Σ_k. We denote the set of all

rules with left-hand side ξ by R_ξfor each ξ ∈ Ξ.

Hypergraphs and hyperedge replacement In

the following let be a ﬁnite set of labels. A

hypergraph is a tuple H = (V, E, att, lab, ports)

Γ
Γ-

,

Since RTGs are particular context-free grammars, the concepts of derivation relation and gen-

erated language are inherited. The language of the

where

V

is a ﬁnite set of vertices,

E

is a ﬁnite set of

hyperedges, att: E → V ^∗\ {ε} is the attachment

of hyperedges to vertices, lab: E → Γ is the label-

ing of hyperedges, and ports ∈ V ^∗is a sequence of (not necessarily distinct) ports. The set of all

RTG

g

is the set of all well-formed terms in T_Σ

generated by g; this language is denoted by L(g).

We deﬁne the Ξ-indexed family (D_g^ξ| ξ ∈

Γ

-hypergraphs is denoted by

H

_Γ. The vertices in
Ξ) of mappings D_g^ξ: T_Σ→ P(T_R); for each
V \ [ports] are also called internal vertices and

denoted by int(H).

term t ∈ T_Σ

,

D_g^ξ(t) is the set of t’s deriva-

For the sake of brevity, we shall in the following

simply call Γ-hypergraphs and hyperedges graphs

and edges, respectively. We illustrate a graph in ﬁg-

tion trees in T_Rwhich start with

ξ

and yield t.
Formally, for each ξ ∈ Ξ and σ(t₁, . . . , t_k) ∈ T_Σ, the set D_g^ξ(σ(t₁, . . . , t_k)) contains each term

%(d₁, . . . , d_k) where % = (ξ → σ(ξ₁, . . . , ξ_k))

ures as follows (cf., e.g., graph((σ₂)_A) in Fig. 2a).

A vertex

v

is illustrated by a circle, which is ﬁlled

is in

R

and d_i∈ D_g^ξⁱ(t_i) for each i ∈ [k]

.

and labeled by

i

in case that ports(i) = v. An edge and att(e) = v₁. . . v_nis depicted as

γ-labeled rectangle with tentacles, lines point-

S

We deﬁne D_g(t) =

D_g^ξ(t) and D_g^ξ(T_Σ) =

ξ∈Ξ

e

with label

γ

S

D_g^ξ(t). Finally, D_g^ξ⁰(T_Σ, ξ) is the set of all

a

n

t∈T_Σ

ζ ∈ T_R({ξ}) such that there is a ζ⁰∈ D_g^ξ⁰(T_Σ)

ing to v₁, . . . , v_nwhich are annotated by 1, . . . , n

.

(We sometimes drop these annotations.)

which has a subtree whose root is in R_ξ, and

ζ

is

obtained from ζ

⁰by replacing exactly one of these

If we are not interested in the particular set of

labels , then we also call a -graph simply graph

and write instead of _Γ. In the following, we

will refer to the components of a graph by index-

ing them with H unless they are explicitly named.

Γ

subtrees by ξ.

H

Example 2.1. Let Σ = Σ₀∪ Σ₂where Σ₀

=

H

{σ₂, σ₄, σ₅} and Σ₂= {σ₁, σ₃}. Let be an RTG

g

Let

H

and

0

H

⁰be graphs.

H

and

0

H

⁰are disjoint

¹in this context we use “tree” and “term” as synonyms

if V_H∩ V_H= ∅ and E_H∩ E_H= ∅.

62

1

2

3

4

2
13
24

1

2

1

2

EM-Training for Weighted Aligned Hypergraph Bimorphisms

Domain Theory and the Logic of Observable Properties

Algebraic Implementation of Abstract Data Types*

Categories of Coalgebras with Monadic Homomorphisms Wolfram Kahl

Abstraction and Abstraction Refinement in the Verification Of

Syntax and Semantics in Algebra Jean-François Nicaud, Denis Bouhineau, Jean-Michel Gélis

Lecture Notes for MATH 770 : Foundations of Mathematics — University of Wisconsin – Madison, Fall 2005

Abstract Data Types

Algebraic Semantics Abstract Type Whose Values Are Lists of Algebraic Semantics Involves the Algebraic Integers: Specification of Data and Language Constructs

The Agda Universal Algebra Library Part 2: Structure Homomorphisms, Terms, Classes of Algebras, Subalgebras, and Homomorphic Images

Lectures on Universal Algebra

Domain Theory Corrected and Expanded Version Samson Abramsky1 and Achim Jung2

Arxiv:1611.02908V1 [Cs.LO] 9 Nov 2016 1