Hyp: a Toolkit for Representing, Manipulating, and Optimizing Hypergraphs

hyp: A Toolkit for Representing, Manipulating, and Optimizing Hypergraphs Markus Dreyer¤ Jonathan Graehl SDL Research SDL Research 6060 Center Drive Suite 150 6060 Center Drive Suite 150 Los Angeles, CA 90045 Los Angeles, CA 90045 [email protected] [email protected] Abstract tail1 w We present hyp, an open-source toolkit for tail2 head the representation, manipulation, and optimization of weighted directed hypergraphs. tail3 hyp provides compose, project, invert functionality, k-best path algorithms, Figure 1: An arc leading from three tail states to a head state, the inside and outside algorithms, and more. with weight w. Finite-state machines are modeled as a special case of directed hypergraphs. hyp con- sists of a C++ API, as well as a command 2 Definitions line tool, and is available for download at github.com/sdl-research/hyp. A weighted directed hypergraph (hereinafter hypergraph) is a pair H V,E , where V is a set of Æ h i vertices and E a set of edges. Each edge (also called 1 Introduction hyperedge) is a triple e T(e),h(e),w(e) , where Æ h i T(e) is an ordered list of tails (i.e., source vertices), We present hyp, an open-source toolkit that pro- h(e) is the head (i.e., target vertex) and w(e) is the vides data structures and algorithms to process semiring weight (see Section 3.4) of the edge (see weighted directed hypergraphs. Figure 1). Such hypergraphs are important in natural lan- We regard hypergraphs as automata and call the guage processing and machine learning, e.g., in vertices states and edges arcs. We add an optional parsing (Klein and Manning (2005), Huang and start state S V and a final state F V. Chiang (2005)), machine translation (Kumar et al., 2 2 Each state s has an input label i(s) (§ {∅}) 2009), as well as in logic (Gallo et al., 1993) and 2 [ and output label o(s) (§ {∅}); if o(s) ∅ then weighted logic programming (Eisner and Filardo, 2 [ Æ we treat the state as having o(s) i(s). The label 2011). Æ alphabet § is divided into disjoint sets of nonter- The hyp toolkit enables representing and ma- minal, lexical, and special {²,Á,½,σ} labels. The nipulating weighted directed hypergraphs, pro- input and output labels are analogous to those of viding compose, project, invert functional- a finite-state transducer in some hyp operations ity, k-best path algorithms, the inside and out- (Section 3.3). side algorithms, and more. hyp also implements The set of incoming arcs into a state s is called a framework for estimating hypergraph feature the Backward Star of s, or short, BS(s). Formally, weights by optimization on forests derived from BS(s) {a E: h(a) s}. A path ¼ is a sequence training data. Æ 2 Æ of arcs ¼ (a ...a ) E¤ such that a ¼, t Æ 1 k 2 8 2 8 2 Markus Dreyer is now at Amazon, Inc., Seattle, WA. T(a),( a0 ¼ : h(a0) t) BS(t) . Each tail state ¤ 9 2 Æ _ Æ; 11 Proceedings of NAACL-HLT 2015, pages 11–15, Denver, Colorado, May 31 – June 5, 2015. c 2015 Association for Computational Linguistics t of each arc on the path must be the head of some 0(S) <- 1("he") 2("eats") 3("rice") / 0.693 FINAL <- 0(S) arc on the path, unless t is the start state or has "he" no incoming arcs and a terminal (lexical or spe- 1 2 0.693 "eats" S cial) input label, in which case we call t an axiom. 3 The rationale is that each tail state of each arc on "rice" the path must be derived, by traveling an arc that leads to it, or given as an axiom. If the hypergraph Figure 2: The text and visual representation of a hypergraph has a start state, the first tail of the first arc of any with a single arc, similar to Figure 1. The visual representation path must be the start state. The head of the last leaves out the state IDs of labeled states. arc must always be the final state, h(ak ) F. Paths Æ 0(S) <- 1(NP) 2(VP) correspond to trees, or proofs that the final state 1(NP) <- 3(PRON) may be reached from axioms. 2(VP) <- 4(V) 5(NP) 6(PP) 3(PRON) <- 10("He") Hypergraph arcs have exactly one head; some 4(V) <- 11("eats") 5(NP) <- 7(N) authors permit multiple heads and would call our 6(PP) <- 8(PREP) 9(N) hypergraphs B-hypergraphs (Gallo et al., 1993). 7(N) <- 12("rice") 8(PREP) <- 13("with") 9(N) <- 14("sticks") FINAL <- 0(S) 3 Representing hypergraphs # These added arcs # make it intoa forest: Text representation. hyp uses a simple human- 15(NP) <- 7(N) 6(PP) 2(VP) <- 4(V) 15(NP) "He" PRON NP readable text format for hypergraphs. For exam- 1 2 S "eats" V 1 1 ple, see the first two lines in Figure 2. Each hyper- VP 2 "rice" N NP 2 1 graph arc has the following format: "with" PREP 1 NP 2 3 2 PP head <- tail1 tail2 ... tailn / weight "sticks" N Head and tail states are non-negative integers Figure 3: A packed forest. followed by an optional label in parentheses (or a pair of (input output) labels). If it is lexical (i.e., a word), then it is double-quoted with the Reducing redundancy. State labels need not be usual backslash-escapes; nonterminal and special repeated at every mention of that state’s ID; if a symbols are unquoted. Special symbols like ², Á, state has a label anywhere it has it always. For ex- ½, σ are written with brackets, as <eps>, <phi>, ample, we write the label S for state 0 in Figure 2 <rho>, <sigma>. Each arc may optionally have only once: 0(S) <- 1("he") 2("eats") 3("rice") / 0.693 a slash followed by a weight, which is typically a FINAL <- 0 negative log probability (i.e., the cost of the arc). A Similarly, state IDs may be left out wherever a final state n is marked as FINAL <- n. Figure 2 label uniquely identifies a particular state: shows the text and visual representation of a hy- 0(S) <- ("he")("eats")("rice") / 0.693 pergraph with only one arc; it represents and ac- FINAL <- 0 cepts the string he eats rice. hyp generates state IDs for these states automati- cally. Visual representation. A provided Draw command can render hypergraphs using Graphviz 3.1 Trees and forests (Gansner and North, 2000). Small gray numbers A forest is a hypergraph that contains a set of trees. indicate the order of arc tails. Axiom nodes are A forest may be packed, in which case its trees filled gray.1 The final state is drawn as a double cir- share substructure, like strings in a lattice. An ex- cle, following finite-state convention. ample forest in hyp format is shown in Figure 3. 1Gray is used analogously in graphical models for observed Any two or more arcs pointing into one state have nodes. OR semantics; the depicted forest compactly rep- 12 he eats rice 0 1 2 3 simple sentence hypergraph of Figure 5, we could arrive at a more interesting lattice or even an FSM Figure 4: A one-sentence finite-state machine in OpenFst. with cycles and so infinitely many paths. 3.3 Transducers START <- 0 0 1 1 <- 0 4("he") 2 1 2 <- 1 5("eats") 1 A leaf state s with an output label o(s) i(s) "he" 6Æ 3 <- 2 6("rice") 2 2 1 rewrites the input label. This applies to finite-state FINAL <- 3 "eats" 2 3 "rice" as well as general hypergraphs. The following arc, for example, reads “eats” and an NP and derives a VP; it also rewrites “eats” to “ate”: hyp Figure 5: A one-sentence finite-state hypergraph in . (V) <- ("eats""ate") (NP) If a state has an output label, it must then have an resents two interpretations of one sentence: (1) he input label, though it may be <eps>. The start eats rice using sticks OR he eats rice that has sticks. state conventionally has no label. Hypergraphs can represent any context-free grammar, where the strings in the grammar are the lex- 3.4 Semirings and features ical yield (i.e., leaves in order) of the hypergraph Each hypergraph uses a particular semiring, which trees. specifies the type of weights and defines how 3.2 Strings, lattices, and general FSMs weights are added and multiplied. hyp provides the standard semirings (Mohri, 2009), as well as In addition to trees and forests, hypergraphs can the expectation semiring (Eisner, 2002), and a new represent strings, lattices, and general finite-state “feature” semiring. The feature semiring pairs with machines (FSMs) as a special case. A standard tropical semiring elements a sparse feature vector finite-state representation of a string would look that adds componentwise in the semiring prod- like Figure 4, which shows a left-recursive bracket- uct and follows the winning tropical element in the ing as (((he) eats) rice), i.e., we read “he”, semiring sum. Features 0 and 8 fire with different combine it with “eats”, then combine the result strengths on this arc: with “rice” to accept the whole string (Allauzen et (V) <- 11("eats""ate") / 3.2[0=1.3,8=-0.5] al., 2007). We can do something similar in hyp using By using the expectation or the feature semiring, hypergraphs—see Figure 5.

Load more