Three Dependency-and-Boundary Models for Grammar Induction Valentin I. Spitkovsky Hiyan Alshawi Stanford University and Google Inc. Google Inc., Mountain View, CA, 94043
[email protected] [email protected] Daniel Jurafsky Stanford University, Stanford, CA, 94305
[email protected] Abstract DT NN VBZ IN DT NN [The check] is in [the mail]. We present a new family of models for unsu- | {z } | {z } pervised parsing, Dependency and Boundary Subject Object models, that use cues at constituent bound- Figure 1: A partial analysis of our running example. aries to inform head-outward dependency tree generation. We build on three intuitions that Consider the example in Figure 1. Because the are explicit in phrase-structure grammars but determiner (DT) appears at the left edge of the sen- only implicit in standard dependency formu- tence, it should be possible to learn that determiners lations: (i) Distributions of words that oc- may generally be present at left edges of phrases. cur at sentence boundaries — such as English This information could then be used to correctly determiners — resemble constituent edges. parse the sentence-internal determiner in the mail. (ii) Punctuation at sentence boundaries fur- ther helps distinguish full sentences from Similarly, the fact that the noun head (NN) of the ob- fragments like headlines and titles, allow- ject the mail appears at the right edge of the sentence ing us to model grammatical differences be- could help identify the noun check as the right edge tween complete and incomplete sentences. of the subject NP. As with jigsaw puzzles, working (iii) Sentence-internal punctuation boundaries inwards from boundaries helps determine sentence- help with longer-distance dependencies, since internal structures of both noun phrases, neither of punctuation correlates with constituent edges.