Starting From Scratch in Semantic Role Labeling

Michael Connor, University of Illinois, [email protected]
Yael Gertner, University of Illinois, [email protected]

Cynthia Fisher, University of Illinois, [email protected]
Dan Roth, University of Illinois, [email protected]

Abstract

A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argument-identification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part of speech tagger, and experiment with psycholinguistically-motivated ways to label the clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that the proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argument-identification stages.

1 Introduction

In this paper we present experiments with an automatic system for semantic role labeling (SRL) that is designed to model aspects of human language acquisition. This simplified SRL system is inspired by the syntactic bootstrapping theory, and by an account of syntactic bootstrapping known as 'structure-mapping' (Fisher, 1996; Gillette et al., 1999; Lidz et al., 2003). Syntactic bootstrapping theory proposes that young children use their very partial knowledge of syntax to guide sentence comprehension.

The structure-mapping account makes three key assumptions: First, sentence comprehension is grounded by the acquisition of an initial set of concrete nouns. Nouns are arguably less dependent on prior linguistic knowledge for their acquisition than are verbs; thus children are assumed to be able to identify the referents of some nouns via cross-situational observation (Gillette et al., 1999). Second, these nouns, once identified, yield a skeletal sentence structure. Children treat each noun as a candidate argument, and thus interpret the number of nouns in the sentence as a cue to its semantic predicate-argument structure (Fisher, 1996). Third, children represent sentences in an abstract format that permits generalization to new verbs (Gertner et al., 2006).

The structure-mapping account of early syntactic bootstrapping makes strong predictions, including predictions of tell-tale errors. In the sentence "Ellen and John laughed", an intransitive verb appears with two nouns. If young children rely on representations of sentences as simple as an ordered set of nouns, then they should have trouble distinguishing such sentences from transitive sentences. Experimental evidence suggests that they do: 21-month-olds mistakenly interpreted word order in sentences such as "The girl and the boy kradded" as conveying agent-patient roles (Gertner and Fisher, 2006).

Previous computational experiments with a system for automatic semantic role labeling (BabySRL: Connor et al., 2008) showed that it is possible to learn to assign basic semantic roles based on the shallow sentence representations proposed by the structure-mapping view. Furthermore, these simple structural features were robust to drastic reductions in the integrity of the semantic-role feedback (Connor et al., 2009). These experiments showed that representations of sentence structure as simple as 'first of two nouns' are useful, but the experiments relied on perfect knowledge of arguments and predicates as a starting point for classification.

Perfect built-in parsing finesses two problems facing the human learner. The first problem involves classifying words by part-of-speech. Proposed solutions to this problem in the NLP and human language acquisition literatures focus on distributional learning as a key data source (e.g., Mintz, 2003; Johnson, 2007). Importantly, infants are good at learning distributional patterns (Gomez and Gerken, 1999; Saffran et al., 1996). Here we use a fairly standard Hidden Markov Model (HMM) to generate clusters of words that occur in similar distributional contexts in a corpus of input sentences.

The second problem facing the learner is more contentious: Having identified clusters of distributionally-similar words, how do children figure out what role these clusters of words should play in a sentence interpretation system? Some clusters contain nouns, which are candidate arguments; others contain verbs, which take arguments. How is the child to know which are which? In order to use the output of the HMM tagger to process sentences for input to an SRL model, we must find a way to automatically label the clusters. Our strategies for automatic argument and predicate identification, spelled out below, reflect core claims of the structure-mapping theory: (1) The meanings of some concrete nouns can be learned without prior linguistic knowledge; these concrete nouns are assumed, based on their meanings, to be possible arguments; (2) verbs are identified, not primarily by learning their meanings via observation, but rather by learning about their syntactic argument-taking behavior in sentences.

By using the HMM part-of-speech tagger in this way, we can ask how the simple structural features that we propose children start with stand up to reductions in parsing accuracy. In doing so, we move to a parser derived from a particular theoretical account of how the human learner might classify words, and link them into a system for sentence comprehension.

2 Model

We model language learning as a Semantic Role Labeling (SRL) task (Carreras and Màrquez, 2004). This allows us to ask whether a learner, equipped with particular theoretically-motivated representations of the input, can learn to understand sentences at the level of who did what to whom. The architecture of our system is similar to a previous approach to modeling early language acquisition (Connor et al., 2009), which is itself based on the standard architecture of a full SRL system (e.g. Punyakanok et al., 2008).

This basic approach follows a multi-stage pipeline, with each stage feeding in to the next. The stages are: (1) parsing the sentence, (2) identifying potential predicates and arguments based on the parse, (3) classifying role labels for each potential argument relative to a predicate, and (4) applying constraints to find the best labeling of arguments for a sentence. In this work we attempt to limit the knowledge available at each stage to the automatic output of the previous stage, constrained by knowledge that we argue is available to children in the early stages of language learning.

In the parsing stage we use an unsupervised parser based on Hidden Markov Models (HMM), modeling a simple 'predict the next word' parser. Next, the argument identification stage identifies HMM states that correspond to possible arguments and predicates. The candidate arguments and predicates identified in each input sentence are passed to an SRL classifier that uses simple abstract features based on the number and order of arguments to learn to assign semantic roles.

As input to our learner we use samples of natural child directed speech (CDS) from the CHILDES corpora (MacWhinney, 2000). During initial unsupervised parsing we experiment with incorporating knowledge through a combination of statistical priors favoring a skewed distribution of words into classes, and an initial hard clustering of the vocabulary into function and content words. The argument identifier uses a small set of frequent nouns to seed argument states, relying on the assumptions that some concrete nouns can be learned as a prerequisite to sentence interpretation, and are interpreted as candidate arguments.

The SRL classifier starts with noisy, largely unsupervised argument identification, and receives feedback based on annotation in the PropBank style; in training, each word identified as an argument receives the true role label of the phrase that word is part of. This represents the assumption that learning to interpret sentences is naturally supervised by the fit of the learner's predicted meaning with the referential context. The provision of perfect 'gold-standard' feedback over-estimates the real child's access to this supervision, but allows us to investigate the consequences of noisy argument identification for SRL performance. We show that even with imperfect parsing, a learner can identify useful abstract patterns for sentence interpretation. Our ultimate goal is to 'close the loop' of this system, by using learning in the SRL system to improve the initial unsupervised parse and argument identification.

The training data were samples of parental speech to three children (Adam, Eve, and Sarah; Brown, 1973), available via CHILDES. The SRL training corpus consists of parental utterances in samples Adam 01-20 (child age 2;3 - 3;1), Eve 01-18 (1;6 - 2;2), and Sarah 01-83 (2;3 - 3;11). All verb-containing utterances without symbols indicating disfluencies were automatically parsed with the Charniak parser (Charniak, 1997), annotated using an existing SRL system (Punyakanok et al., 2008), and then errors were hand-corrected. The final annotated sample contains about 16,730 propositions, with 32,205 arguments.

3 Unsupervised Parsing

As a first step of processing, we feed the learner large amounts of unlabeled text and expect it to learn some structure over this data that will facilitate future processing. The source of this text is child directed speech collected from various projects in the CHILDES repository [1]. We removed sentences with fewer than three words or markers of disfluency. In the end we used 160 thousand sentences from this set, totaling over 1 million tokens and 10 thousand unique words.

[1] We used parts of the Bloom (Bloom, 1970; Bloom, 1973), Brent (Brent and Siskind, 2001), Brown (Brown, 1973), Clark (Clark, 1978), Cornell, MacWhinney (MacWhinney, 2000), Post (Demetras et al., 1986) and Providence (Demuth et al., 2006) collections.

The goal of the parsing stage is to give the learner a representation permitting it to generalize over word forms. The exact parse we are after is a distributional and context-sensitive clustering of words based on sequential processing. We chose an HMM-based parser for this since, in essence, the HMM yields an unsupervised POS classifier, but without names for the states. An HMM trained with expectation maximization (EM) is analogous to a simple process of predicting the next word in a stream and correcting connections accordingly for each sentence.

With an HMM we can also easily incorporate additional knowledge during parameter estimation. The first (and simplest) parser we used was an HMM trained using EM with 80 hidden states. The number of hidden states was made relatively large to increase the likelihood of clusters corresponding to a single part of speech, while preserving some degree of generalization.

Johnson (2007) observed that EM tends to create word clusters of uniform size, which does not reflect the way words cluster into parts of speech in natural languages. The addition of priors biasing the system toward a skewed allocation of words to classes can help. The second parser was an 80 state HMM trained with Variational Bayes EM (VB) incorporating Dirichlet priors (Beal, 2003) [2].

[2] We tuned the prior using the same set of 8 value pairs suggested by Gao and Johnson (2008), using a held out set of POS-tagged CDS to evaluate final performance.

In the third and fourth parsers we experiment with enriching the HMM POS-tagger with other psycholinguistically plausible knowledge. Words of different grammatical categories differ in their phonological as well as in their distributional properties (e.g., Kelly, 1992; Monaghan et al., 2005; Shi et al., 1998); combining phonological and distributional information improves the clustering of words into grammatical categories. The phonological difference between content and function words is particularly striking (Shi et al., 1998). Even newborns can categorically distinguish content and function words, based on the phonological difference between the two classes (Shi et al., 1999). Human learners may treat content and function words as distinct classes from the start.

To implement this division into function and content words [3], we start with a list of function word POS tags [4] and then find words that appear predominantly with these POS tags, using tagged WSJ data (Marcus et al., 1993). We allocated a fixed number of states for these function words, and left the rest of the states for the rest of the words. This amounts to initializing the emission matrix for the HMM with a block structure: words from one class cannot be emitted by states allocated to the other class. This trick has been used in previous work (Rabiner, 1989), and requires far fewer resources than the full tagging dictionary that is often used to intelligently initialize an unsupervised POS classifier (e.g. Brill, 1997; Toutanova and Johnson, 2007; Ravi and Knight, 2009).

[3] We also include a small third class for punctuation, which is discarded.
[4] TO, IN, EX, POS, WDT, PDT, WRB, MD, CC, DT, RP, UH
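To make the block-structured initialization concrete, the sketch below shows one way to build such an emission matrix with numpy. It is an illustration under stated assumptions, not the authors' code: the state counts and the random within-block values are placeholders, and the small punctuation class of footnote [3] is omitted.

import numpy as np

def init_block_emissions(vocab, function_words, n_states=80, n_function_states=30):
    # Reserve the first n_function_states states for function words and the
    # remaining states for content words; cross-class entries stay at zero,
    # so a content word can never be emitted by a function-word state.
    word_index = {w: i for i, w in enumerate(vocab)}
    emissions = np.zeros((n_states, len(vocab)))
    func_cols = [word_index[w] for w in vocab if w in function_words]
    cont_cols = [word_index[w] for w in vocab if w not in function_words]

    rng = np.random.default_rng(0)
    emissions[:n_function_states, func_cols] = rng.random((n_function_states, len(func_cols)))
    emissions[n_function_states:, cont_cols] = rng.random((n_states - n_function_states, len(cont_cols)))

    # Normalize each state's emission distribution; EM or VB estimation then
    # only reallocates probability mass inside each block.
    return emissions / emissions.sum(axis=1, keepdims=True)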

Because the function and content word preclustering preceded parameter estimation, it can be combined with either EM or VB learning. Although this initial split forces sparsity on the emission matrix and allows more uniform sized clusters, Dirichlet priors may still help, if word clusters within the function or content word subsets vary in size and frequency. The third parser was an 80 state HMM trained with EM estimation, with 30 states pre-allocated to function words; the fourth parser was the same except that it was trained with VB EM.

3.1 Parser Evaluation

[Figure 1 (plot): Variation of Information (y-axis, roughly 3.2 to 5.2) against the number of training sentences (x-axis, 100 to 1,000,000, log scale) for the EM, VB, EM+Funct, and VB+Funct parsers.]

Figure 1: Unsupervised part of speech results, matching states to gold POS labels. All systems use 80 states, and the comparison is to gold labeled CDS text, which makes up a subset of the HMM training data. Variation of Information is an information-theoretic measure summing mutual information between tags and states, proposed by Meilă (2002), and first used for unsupervised part of speech tagging by Goldwater and Griffiths (2007). Smaller numbers are better, indicating less information lost in moving from the HMM states to the gold POS tags. Note that incorporating function word preclustering allows both the EM and VB algorithms to achieve the same performance with an order of magnitude fewer sentences.

We first evaluate these parsers (the first stage of our SRL system) on unsupervised POS tagging. Figure 1 shows the performance of the four systems using Variation of Information to measure the match between the gold tags and the unsupervised parsers as we vary the amount of text they receive. Each point on the graph represents the average result over 10 runs of the HMM with different samples of the unlabeled CDS. Another common measure for unsupervised POS (when there are more states than tags) is a many to one greedy mapping of states to tags. It is known that EM gives a better many to one score than a VB trained HMM (Johnson, 2007), and likewise we see that here: with all data EM gives 0.75 matching, VB gives 0.74, while both EM+Funct and VB+Funct reach 0.80.

Adding the function/content word split to the HMM structure improves both EM and VB estimation in terms of both tag matching accuracy and information. However, these measures look at the parser only in isolation. What is more important to us is how useful the provided word clusters are for future semantic processing. In the next sections we use the outputs of our four parsers to identify arguments and predicates.
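Both evaluation measures are standard and easy to compute from parallel per-token lists of gold tags and induced HMM states; the functions below are a small self-contained sketch of them (assumed helper code, not the authors' evaluation scripts).

import math
from collections import Counter, defaultdict

def many_to_one(gold_tags, states):
    # Greedily map each HMM state to its most frequent gold tag,
    # then report tagging accuracy under that mapping.
    by_state = defaultdict(Counter)
    for tag, state in zip(gold_tags, states):
        by_state[state][tag] += 1
    best = {s: c.most_common(1)[0][0] for s, c in by_state.items()}
    return sum(t == best[s] for t, s in zip(gold_tags, states)) / len(gold_tags)

def variation_of_information(gold_tags, states):
    # VI(T, S) = H(T) + H(S) - 2 * I(T; S); smaller is better.
    n = len(gold_tags)
    p_t, p_s = Counter(gold_tags), Counter(states)
    joint = Counter(zip(gold_tags, states))
    h_t = -sum(c / n * math.log(c / n) for c in p_t.values())
    h_s = -sum(c / n * math.log(c / n) for c in p_s.values())
    mi = sum(c / n * math.log(c / n / (p_t[t] / n * p_s[s] / n))
             for (t, s), c in joint.items())
    return h_t + h_s - 2 * mi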

4 Argument Identification

The unsupervised parser provides a state label for each word in each sentence; the goal of the argument identification stage is to use these states to label words as potential arguments, predicates, or neither. As described in the introduction, core premises of the structure-mapping account offer routes whereby we could label some HMM states as argument or predicate states.

The structure-mapping account holds that sentence comprehension is grounded in the learning of an initial set of nouns. Children are assumed to identify the referents of some concrete nouns via cross-situational learning (Gillette et al., 1999; Smith and Yu, 2008). Children then assume, by virtue of the meanings of these nouns, that they are candidate arguments. This is a simple form of semantic bootstrapping, requiring the use of built-in links between semantics and syntax to identify the grammatical type of known words (Pinker, 1984). We use a small set of known nouns to transform unlabeled word clusters into candidate arguments for the SRL: HMM states that are dominated by known names for animate or inanimate objects are assumed to be argument states.

Given text parsed by the HMM parser and a list of known nouns, the argument identifier proceeds in multiple steps, as illustrated in figure 2. The first stage identifies as argument states those states that appear at least half the time in the training data with known nouns. This use of a seed list and distributional clustering is similar to Prototype-Driven Learning (Haghighi and Klein, 2006), except that we are only providing information on one specific class.

Figure 2: Argument identification algorithm. This is a two-stage process: argument state identification based on statistics collected over the entire text, and per-sentence predicate identification.

Algorithm ARGUMENT STATE IDENTIFICATION (Figure 2a)
INPUT: Parsed text T = list of (word, state) pairs; set of concrete nouns N
OUTPUT: Set of argument states A; argument count likelihood ArgLike(s, c)

  Identify argument states:
    Let freq(s) = |{(*, s) in T}|
    Let freqN(s) = |{(w, s) in T | w in N}|
    For each s:
      If freqN(s) >= freq(s)/2, add s to A

  Collect per-sentence argument count statistics:
    For each sentence S in T:
      Let Arg(S) = |{(w, s) in S | s in A}|
      For (w, s) in S such that s not in A:
        Increment ArgCount(s, Arg(S))
    For each s not in A and argument count c:
      ArgLike(s, c) = ArgCount(s, c) / freq(s)

Algorithm PREDICATE STATE IDENTIFICATION (Figure 2b)
INPUT: Parsed sentence S = list of (word, state) pairs; set of argument states A; argument count likelihood ArgLike(s, c)
OUTPUT: Most likely predicate (v, s_v)

  Find the number of arguments in the sentence:
    Let Arg(S) = |{(w, s) in S | s in A}|
  Find the non-argument state in the sentence most likely to appear with this number of arguments:
    (v, s_v) = argmax over (w, s) in S of ArgLike(s, Arg(S))

As a list of known nouns we collected all nouns that appear three times or more in the child directed speech training data and were judged to be either animate or inanimate nouns. The full set of 365 nouns covers over 93% of noun occurrences in our data. In upcoming sections we experiment with varying the number of seed nouns used from this set, selecting the most frequent set of nouns. Reflecting the spoken nature of the child directed speech, the most frequent nouns are pronouns, but beyond the top 10 we see nouns naming people ('daddy', 'ursula') and object nouns ('chair', 'lunch').

What about verbs? A typical SRL model identifies candidate arguments and tries to assign roles to them relative to each verb in the sentence. In principle one might suppose that children learn the meanings of verbs via cross-situational observation just as they learn the meanings of concrete nouns. But identifying the meanings of verbs is much more troublesome. Verbs' meanings are abstract, and therefore harder to identify based on scene information alone (Gillette et al., 1999). As a result, early vocabularies are dominated by nouns (Gentner, 2006). On the structure-mapping account, learners identify verbs, and begin to determine their meanings, based on sentence structure cues. Verbs take noun arguments; thus, learners could learn which words are verbs by detecting each verb's syntactic argument-taking behavior. Experimental evidence provides some support for this procedure: 2-year-olds keep track of the syntactic structures in which a new verb appears, even without a concurrent scene that provides cues to the verb's semantic content (Yuan and Fisher, 2009).

We implement this behavior by identifying as predicate states the HMM states that appear commonly with a particular number of previously identified arguments. First, we collect statistics over the entire HMM training corpus regarding how many arguments are identified per sentence, and which states that are not identified as argument states appear with each number of arguments. Next, for each parsed sentence that serves as SRL input, the algorithm chooses as the most likely predicate the word whose state is most likely to appear with the number of arguments found in the current input sentence. Note that this algorithm assumes exactly one predicate per sentence. Implicitly, the argument count likelihood divides predicate states into transitive and intransitive predicates based on appearances in the simple sentences of CDS.
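The two procedures of Figure 2 translate almost directly into code. The sketch below assumes each parsed sentence is a list of (word, state) pairs; the function names and data structures are illustrative choices, but the logic follows the algorithm as stated.

from collections import Counter, defaultdict

def identify_argument_states(parsed_text, known_nouns):
    # Figure 2(a): states that occur with known nouns at least half the time
    # become argument states; for every other state, record how often it
    # occurs in sentences containing c identified arguments.
    freq, freq_noun = Counter(), Counter()
    for sentence in parsed_text:              # sentence: list of (word, state)
        for word, state in sentence:
            freq[state] += 1
            if word in known_nouns:
                freq_noun[state] += 1
    arg_states = {s for s in freq if freq_noun[s] >= freq[s] / 2}

    arg_count = defaultdict(Counter)          # state -> {num_args: count}
    for sentence in parsed_text:
        n_args = sum(1 for _, s in sentence if s in arg_states)
        for _, s in sentence:
            if s not in arg_states:
                arg_count[s][n_args] += 1
    arg_like = {(s, c): count / freq[s]
                for s, counts in arg_count.items()
                for c, count in counts.items()}
    return arg_states, arg_like

def identify_predicate(sentence, arg_states, arg_like):
    # Figure 2(b): among non-argument words, pick the one whose state is most
    # likely to appear with this sentence's number of arguments.  Returns
    # None if every word was labeled an argument.
    n_args = sum(1 for _, s in sentence if s in arg_states)
    candidates = [(w, s) for w, s in sentence if s not in arg_states]
    if not candidates:
        return None
    return max(candidates, key=lambda ws: arg_like.get((ws[1], n_args), 0.0))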

4.1 Argument Identification Evaluation

Figure 3 shows argument and predicate identification accuracy for each of the four parsers when provided with different numbers of known nouns. The known word list is very skewed, with its most frequent members dominating the total noun occurrences in the data. The ten most frequent words [5] account for 60% of the total noun occurrences. We achieve the different occurrence coverage numbers of figure 3 by using the most frequent N words from the list that give the specific coverage [6]. Pronouns refer to people or objects, but are abstract in that they can refer to any person or object. The inclusion of pronouns in our list of known nouns represents the assumption that toddlers have already identified pronouns as referential terms. Even 19-month-olds assign appropriately different interpretations to novel verbs presented in simple transitive versus intransitive sentences with pronoun arguments ("He's kradding him!" vs. "He's kradding!"; Yuan et al., 2007). In ongoing work we experiment with other methods of identifying seed nouns.

[5] you, it, I, what, he, me, ya, she, we, her
[6] N of 5, 10, 30, 83, 227 cover 50%, 60%, 70%, 80%, 90% of all noun occurrences, respectively.

[Figure 3 (plot): primary argument and predicate identification F1 (y-axis, roughly 0.2 to 0.8) against the percentage of noun occurrences covered by the seed nouns (x-axis, 0.45 to 0.95), for the EM, VB, EM+Funct, and VB+Funct parsers.]

Figure 3: Effect of number of concrete nouns for seeding argument identification with various unsupervised parsers. Argument identification accuracy is computed against true argument boundaries from hand labeled data. The upper set of results shows primary argument (A0-4) identification F1, and the bottom lines show predicate identification F1.

Two groups of curves appear in figure 3: the upper group shows the primary argument identification accuracy and the bottom group shows the predicate identification accuracy. We evaluate against gold tagged data with true argument and predicate boundaries. The primary argument (A0-4) identification accuracy is the F1 value, with precision calculated as the proportion of identified arguments that appear as part of a true argument, and recall as the proportion of true arguments that have some state identified as an argument. F1 is calculated similarly for predicate identification, as one state per sentence is identified as the predicate. As shown in figure 3, argument identification F1 is higher than predicate identification (which is to be expected, given that predicate identification depends on accurate arguments), and as we add more seed nouns the argument identification improves. Surprisingly, despite the clear differences in unsupervised POS performance seen in figure 1, the different parsers do not yield very different argument and predicate identification. As we will see in the next section, however, when the arguments identified in this step are used to train SRL classifiers, distinctions between parsers reappear, suggesting that argument identification F1 masks systematic patterns in the errors.

5 Testing SRL Performance

Finally, we used the results of the previous parsing and argument-identification stages in training a simplified SRL classifier (BabySRL), equipped with sets of features derived from the structure-mapping account. For argument classification we used a linear classifier trained with a regularized perceptron update rule (Grove and Roth, 2001). In the results reported below the BabySRL did not use sentence-level inference for the final classification; every identified argument is classified independently, so multiple nouns can have the same role. In what follows, we compare the performance of the BabySRL across the four parsers.
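For concreteness, the following sketch shows one simple form such a classifier could take: a multiclass perceptron over sparse binary features, with each identified argument classified independently. It is an illustrative stand-in rather than a reproduction of the regularized perceptron update of Grove and Roth (2001) used in the actual system, and the role inventory and feature strings are assumptions.

from collections import defaultdict

class RolePerceptron:
    # Multiclass perceptron over sparse binary features; each identified
    # argument is classified on its own, so several nouns in one sentence
    # may receive the same role.
    def __init__(self, roles=("A0", "A1", "A2", "A3", "A4")):
        self.roles = roles
        self.weights = {r: defaultdict(float) for r in roles}

    def score(self, role, features):
        return sum(self.weights[role][f] for f in features)

    def predict(self, features):
        return max(self.roles, key=lambda r: self.score(r, features))

    def update(self, features, gold_role, rate=1.0):
        guess = self.predict(features)
        if guess != gold_role:                 # standard perceptron update
            for f in features:
                self.weights[gold_role][f] += rate
                self.weights[guess][f] -= rate

# During training, the gold role comes from the PropBank-style annotation of
# the phrase containing the identified word, e.g.
#   model.update(["word=you", "pred=dropped"], "A0")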

We evaluated SRL performance by testing the BabySRL with constructed sentences like those used for the experiments with children described in the Introduction. All test sentences contained a novel verb, to test the model's ability to generalize.

We examine the performance of four versions of the BabySRL, varying in the features used to represent sentences. All four versions include lexical features consisting of the target argument and predicate (as identified in the previous steps). The baseline model has only these lexical features (Lexical). Following Connor et al. (2008; 2009), the key feature type we propose is noun pattern features (NounPat). Noun pattern features indicate how many nouns there are in the sentence and which noun the target is. For example, in "You dropped it!", 'you' has a feature active indicating that it is the first of two nouns, while 'it' has a feature active indicating that it is the second of two nouns. We compared the behavior of noun pattern features to another simple representation of word order, position relative to the verb (VerbPos). In the same example sentence, 'you' has a feature active indicating that it is pre-verbal; for 'it' a feature is active indicating that it is post-verbal. A fourth version of the BabySRL (Combined) used both NounPat and VerbPos features.
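A small sketch of how the NounPat and VerbPos feature templates just described could be computed for one sentence; the function name, input format, and feature-string spellings are illustrative assumptions rather than the paper's implementation.

def structural_features(tokens, noun_indices, verb_index=None):
    # Return the NounPat and VerbPos features for each identified noun.
    # tokens: the sentence; noun_indices: positions of identified arguments;
    # verb_index: position of the identified predicate (None if unknown).
    n = len(noun_indices)
    features = {}
    for rank, i in enumerate(sorted(noun_indices), start=1):
        feats = [f"NounPat:{rank}_of_{n}"]     # e.g. '1_of_2' for 'you' below
        if verb_index is not None:
            feats.append("VerbPos:pre-verbal" if i < verb_index
                         else "VerbPos:post-verbal")
        features[i] = feats
    return features

# "You dropped it!" with arguments at positions 0 and 2 and the verb at 1:
# structural_features(["You", "dropped", "it", "!"], [0, 2], 1)
#   -> {0: ['NounPat:1_of_2', 'VerbPos:pre-verbal'],
#       2: ['NounPat:2_of_2', 'VerbPos:post-verbal']}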

994 to a lexical baseline. Like 21-month-old chil- feature sets. The top and bottom panels in Figure 4 dren (Gertner et al., 2006), the SRL should inter- differ in the number of nouns provided to seed the pret the first noun as an agent and the second as argument identification stage. The top row shows a patient. (2) Because NounPat features represent performance with 10 seed nouns (the 10 most fre- word order solely in terms of a sequence of nouns, quent nouns, mostly animate pronouns), and the an SRL equipped with these features will make the bottom row shows performance with 365 concrete errors predicted by the structure-mapping account (animate or inanimate) nouns treated as known. and documented in children (Gertner and Fisher, Relative to the lexical baseline, NounPat features 2006). (3) NounPat features permit the SRL to fared well: they promoted the assignment of A0- assign different roles to the subjects of transitive A1 interpretations to transitive sentences, across and intransitive sentences that differ in their num- all parser versions and both sets of known nouns. ber of nouns. This effect follows from the nature Both VB estimation and the content-function word of the NounPat features: These features partition split increased the ability of NounPat features to the training data based on the number of nouns, learn that the first of two nouns was an agent, and and therefore learn separately the likely roles of the second a patient. The NounPat features also the ‘1st of 1 noun’ and the ‘1st of 2 nouns’. promote the predicted error with two-noun intran- These patterns contrast with the behavior of the sitive sentences (Figures 4(b), 4(d)). Despite the VerbPos features: When the BabySRL was trained relatively low accuracy of predicate identification with perfect parsing, VerbPos promoted agent- noted in section 4.1, the VerbPos features did suc- patient interpretations of transitive test sentences, ceed in promoting an A0A1 interpretation for tran- and did so even more successfully than Noun- sitive sentences containing novel verbs relative to Pat features did, reflecting the usefulness of po- the lexical baseline. In every case the performance sition relative to the verb in understanding English of the Combined model that includes both Noun- sentences. In addition, VerbPos features elimi- Pat and VerbPos features exceeds the performance nated the errors with two-noun intransitive sen- of either NounPat or VerbPos alone, suggesting tences. Given test sentences such as ‘You and both contribute to correct predictions for transitive Mommy krad’, VerbPos features represented both sentences. However, the performance of VerbPos nouns as pre-verbal, and therefore identified both features did not improve with parsing accuracy as as likely agents. However, VerbPos features did did the performance of the NounPat features. Most not help the SRL assign different roles to the strikingly, the VerbPos features did not eliminate subjects of simple transitive and intransitive sen- the predicted error with two-noun intransitive sen- tences: ‘Mommy’ in ‘Mommy krads you’ and tences, as shown in panels 4(b) and 4(d). The ’Mommy krads’ are both represented simply as Combined model predicted an A0A1 sequence for pre-verbal. these sentences, showing no reduction in this error due to the participation of VerbPos features. 
To test the system’s predictions on transitive and intransitive two noun sentences, we constructed Table 1 shows SRL performance on the same two test sentence templates: ‘A krads B’ and ‘A transitive test sentences (‘A krads B’), compared and B krad’, where A and B were replaced with to simple one-noun intransitive sentences (‘A familiar animate nouns. The animate nouns were krads’). To permit a direct comparison, the table selected from all three children’s data in the train- reports the proportion of transitive test sentences ing set and paired together in the templates such for which the first noun was assigned an agent that all pairs are represented. (A0) interpretation, and the proportion of intran- Figure 4 shows SRL performance on test sen- sitive test sentences with the agent (A0) role as- tences containing a novel verb and two animate signed to the single noun in the sentence. Here we nouns. Each plot shows the proportion of test sen- report only the results from the best-performing tences that were assigned an agent-patient (A0- parser (trained with VB EM, and content/function A1) role sequence; this sequence is correct for word pre-clustering), compared to the same clas- transitive sentences but is an error for two-noun sifiers trained with gold standard argument iden- intransitive sentences. Each group of bars shows tification. When trained on arguments identified the performance of the BabySRL trained using one via the unsupervised POS tagger, noun pattern of the four parsers, equipped with each of our four features promoted agent interpretations of tran-

                        Two Noun Transitive, % Agent First        One Noun Intransitive, % Agent Prediction
                        Lexical  NounPat  VerbPos  Combine        Lexical  NounPat  VerbPos  Combine
VB+Funct, 10 seed        0.48     0.61     0.55     0.71           0.48     0.57     0.56     0.59
VB+Funct, 365 seed       0.22     0.64     0.41     0.74           0.23     0.33     0.43     0.41
Gold Arguments           0.16     0.41     0.69     0.77           0.17     0.18     0.70     0.58

Table 1: SRL result comparison when trained with the best unsupervised argument identifier versus trained with gold arguments. The comparison is between the agent-first prediction for two-noun transitive sentences and for one-noun intransitive sentences. The unsupervised arguments lead the classifier to rely more on noun pattern features; when the true arguments and predicate are known, the verb position feature leads the classifier to strongly predict agent-first in both settings.

[Figure 4 (bar charts): the proportion of test sentences assigned an agent-patient sequence (%A0A1, y-axis from 0 to 0.8) for the Lexical, NounPat, VerbPos, and Combine feature sets, grouped by parser (EM, VB, EM+Funct, VB+Funct, Gold). Panels: (a) Two Noun Transitive Sentence, 10 seed nouns; (b) Two Noun Intransitive Sentence, 10 seed nouns; (c) Two Noun Transitive Sentence, 365 seed nouns; (d) Two Noun Intransitive Sentence, 365 seed nouns.]

Figure 4: SRL classification performance on transitive and intransitive test sentences containing two nouns and a novel verb. Performance with gold-standard argument identification is included for comparison. Across parses, noun pattern features promote agent-patient (A0A1) interpretations of both transitive ("You krad Mommy") and two-noun intransitive sentences ("You and Mommy krad"); the latter is an error found in young children. Unsupervised parsing is less accurate in identifying the verb, so verb position features fail to eliminate errors with two-noun intransitive sentences.

This differentiation between transitive and intransitive sentences was clearer when more known nouns were provided. Verb position features, in contrast, promote agent interpretations of subjects weakly with unsupervised argument identification, but equally for transitive and intransitive sentences.

Noun pattern features were robust to increases in parsing noise. The behavior of verb position features suggests that variations in the identifiability of different parts of speech can affect the usefulness of alternative representations of sentence structure. Representations that reflect the position of the verb may be powerful guides for understanding simple English sentences, but representations reflecting only the number and order of nouns can dominate early in acquisition, depending on the integrity of parsing decisions.

6 Conclusion and Future Work

The key innovation in the present work is the combination of unsupervised part-of-speech tagging and argument identification to permit learning in a simplified SRL system. Children do not have the luxury of treating part-of-speech tagging and semantic role labeling as separable tasks.

Instead, they must learn to understand sentences starting from scratch, learning the meanings of some words, and using those words and their patterns of arrangement into sentences to bootstrap their way into more mature knowledge.

We have created a first step toward modeling this incremental process. We combined unsupervised parsing with minimal supervision to begin to identify arguments and predicates. An SRL classifier used simple representations built from these identified arguments to extract useful abstract patterns for classifying semantic roles. Our results suggest that multiple simple representations of sentence structure could co-exist in the child's system for sentence comprehension; representations that will ultimately turn out to be powerful guides to role identification may be less powerful early in acquisition because of the noise introduced by the unsupervised parsing.

The next step is to 'close the loop', using higher level semantic feedback to improve the earlier argument identification and parsing stages. Perhaps with the help of semantic feedback the system can automatically improve predicate identification, which in turn allows it to correct the observed intransitive sentence error. This approach will move us closer to the goal of using initial simple structural patterns and natural observation of the world (semantic feedback) to bootstrap more and more sophisticated representations of linguistic structure.

Acknowledgments

This research is supported by NSF grant BCS-0620257 and NIH grant R01-HD054448.

References

M.J. Beal. 2003. Variational Algorithms for Approximate Bayesian Inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.

L. Bloom. 1970. Language development: Form and function in emerging grammars. MIT Press, Cambridge, MA.

L. Bloom. 1973. One word at a time: The use of single-word utterances before syntax. Mouton, The Hague.

M.R. Brent and J.M. Siskind. 2001. The role of exposure to isolated words in early vocabulary development. Cognition, 81:31-44.

E. Brill. 1997. Unsupervised learning of disambiguation rules for part of speech tagging. In Natural Language Processing Using Very Large Corpora. Kluwer Academic Press.

R. Brown. 1973. A First Language. Harvard University Press, Cambridge, MA.

X. Carreras and L. Màrquez. 2004. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004, pages 89-97, Boston, MA, USA.

E. Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proc. National Conference on Artificial Intelligence.

E.V. Clark. 1978. Awareness of language: Some evidence from what children say and do. In A. Sinclair, R. Jarvella, and W. Levelt, editors, The child's conception of language. Springer Verlag, Berlin.

M. Connor, Y. Gertner, C. Fisher, and D. Roth. 2008. Baby SRL: Modeling early language acquisition. In Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL), pages xx-yy, Aug.

M. Connor, Y. Gertner, C. Fisher, and D. Roth. 2009. Minimally supervised model of early language acquisition. In Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL), Jun.

M. Demetras, K. Post, and C. Snow. 1986. Feedback to first-language learners. Journal of Child Language, 13:275-292.

K. Demuth, J. Culbertson, and J. Alter. 2006. Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49:137-174.

C. Fisher. 1996. Structural limits on verb mapping: The role of analogy in children's interpretation of sentences. Cognitive Psychology, 31:41-81.

Jianfeng Gao and Mark Johnson. 2008. A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. In Proceedings of EMNLP-2008, pages 344-352.

D. Gentner. 2006. Why verbs are hard to learn. In K. Hirsh-Pasek and R. Golinkoff, editors, Action meets word: How children learn verbs, pages 544-564. Oxford University Press.

Y. Gertner and C. Fisher. 2006. Predicted errors in early verb learning. In 31st Annual Boston University Conference on Language Development.

Y. Gertner, C. Fisher, and J. Eisengart. 2006. Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17:684-691.

J. Gillette, H. Gleitman, L. R. Gleitman, and A. Lederer. 1999. Human simulations of vocabulary learning. Cognition, 73:135-176.

Sharon Goldwater and Tom Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 744-751.

R. Gomez and L. Gerken. 1999. Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70:109-135.

A. Haghighi and D. Klein. 2006. Prototype-driven learning for sequence models. In Proceedings of NAACL-2006, pages 320-327.

Mark Johnson. 2007. Why doesn't EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 296-305.

M.H. Kelly. 1992. Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99:349-364.

J. Lidz, H. Gleitman, and L. R. Gleitman. 2003. Understanding how input matters: verb learning and the footprint of universal grammar. Cognition, 87:151-178.

B. MacWhinney. 2000. The CHILDES project: Tools for analyzing talk. Third Edition. Lawrence Erlbaum Associates, Mahwah, NJ.

M. P. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330, June.

Marina Meilă. 2002. Comparing clusterings. Technical Report 418, University of Washington Statistics Department.

T. Mintz. 2003. Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90:91-117.

P. Monaghan, N. Chater, and M.H. Christiansen. 2005. The differential role of phonological and distributional cues in grammatical categorisation. Cognition, 96:143-182.

S. Pinker. 1984. Language learnability and language development. Harvard University Press, Cambridge, MA.

V. Punyakanok, D. Roth, and W. Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2).

L. R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-285.

Sujith Ravi and Kevin Knight. 2009. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP).

J.R. Saffran, R.N. Aslin, and E.L. Newport. 1996. Statistical learning by 8-month-old infants. Science, 274:1926-1928.

Rushen Shi, James L. Morgan, and Paul Allopenna. 1998. Phonological and acoustic bases for earliest grammatical category assignment: a cross-linguistic perspective. Journal of Child Language, 25(1):169-201.

Rushen Shi, Janet F. Werker, and James L. Morgan. 1999. Newborn infants' sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72(2):B11-B21.

L.B. Smith and C. Yu. 2008. Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106:1558-1568.

Kristina Toutanova and Mark Johnson. 2007. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of NIPS.

S. Yuan and C. Fisher. 2009. "Really? She blicked the baby?": Two-year-olds learn combinatorial facts about verbs by listening. Psychological Science, 20:619-626.

S. Yuan, C. Fisher, Y. Gertner, and J. Snedeker. 2007. Participants are more than physical bodies: 21-month-olds assign relational meaning to novel transitive verbs. In Biennial Meeting of the Society for Research in Child Development, Boston, MA.
