<<

Issues in Synchronizing the English Treebank and PropBank

Olga Babko-Malayaa, Ann Biesa, Ann Taylorb, Szuting Yia, Martha Palmerc, Mitch Marcusa, Seth Kulicka and Libin Shena aUniversity of Pennsylvania, bUniversity of York, cUniversity of Colorado {malayao,bies}@ldc.upenn.edu, {szuting,mitch,skulick,libin}@linc.cis.upenn.edu, [email protected], [email protected]

that can be found, and our Abstract plans for reconciling them. We also investigate the sources of the disagreements, which types of The PropBank primarily adds semantic disagreements can be resolved automatically, role labels to the syntactic constituents in which types require manual adjudication, and for the parsed trees of the Treebank. The which types an agreement between syntactic and goal is for automatic semantic role label- semantic representations cannot be reached. ing to be able to use the domain of local- ity of a predicate in order to find its ar- 1.1 Treebank guments. In principle, this is exactly what The Penn Treebank annotates text for syntactic is wanted, but in practice the PropBank structure, including syntactic argument structure annotators often make choices that do not and rough semantic information. Treebank anno- actually conform to the Treebank parses. tation involves two tasks: part-of-speech tagging As a result, the syntactic features ex- and syntactic annotation. tracted by automatic semantic role label- The first task is to provide a part-of-speech tag ing systems are often inconsistent and for every token. Particularly relevant for Prop- contradictory. This paper discusses in de- Bank work, verbs in any form (active, passive, tail the types of mismatches between the gerund, infinitive, etc.) are marked with a verbal syntactic bracketing and the semantic part of speech (VBP, VBN, VBG, VB, etc.). role labeling that can be found, and our (Marcus, et al. 1993; Santorini 1990) plans for reconciling them. The syntactic annotation task consists of marking constituent boundaries, inserting empty 1 Introduction categories (traces of movement, PRO, pro), showing the relationships between constituents The PropBank corpus annotates the entire Penn (argument/adjunct structures), and specifying a Treebank with predicate argument structures by particular subset of adverbial roles. (Marcus, et adding semantic role labels to the syntactic al. 1994; Bies, et al. 1995) constituents of the Penn Treebank. Constituent boundaries are shown through Theoretically, it is straightforward for PropBank syntactic node labels in the trees. In the simplest annotators to locate possible arguments based on case, a node will contain an entire constituent, the syntactic structure given by a parse tree, and complete with any associated arguments or mark the located constituent with its argument modifiers. However, in structures involving syn- label. We would expect a one-to-one mapping tactic movement, sub-constituents may be dis- between syntactic constituents and semantic placed. In these cases, Treebank annotation arguments. However, in practice, PropBank represents the original position with a trace and annotators often make choices that do not shows the relationship as co-indexing. In (1) be- actually conform to the Penn Treebank parses. low, for example, the direct object of entail is The discrepancies between the PropBank and shown with the trace *T*, which is coindexed to the Penn Treebank obstruct the study of the syn- the WHNP node of the question what. tax and interface and pose an immedi- ate problem to an automatic semantic role label- (SBARQ (WHNP-1 (WP What )) ing system. A semantic role labeling system is (1) (SQ (VBZ does ) trained on many syntactic features extracted from (NP-SBJ (JJ industrial ) the parse trees, and the discrepancies make the (NN emigration )) training data inconsistent and contradictory. In (VP (VB entail) this paper we discuss in detail the types of mis- (NP *T*-1))) matches between the syntactic bracketing and the (. ?))

70 Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, pages 70–77, Sydney, July 2006. c 2006 Association for Computational In (2), the relative clause modifying a journal- tem (temporal –TMP, locative –LOC, manner ist has been separated from that NP by the prepo- –MNR, purpose –PRP, etc.). sitional phrase to al Riyadh, which is an argu- Inside NPs, the argument/adjunct distinction is ment of the verb sent. The position where the shown structurally. Argument constituents (S and relative clause originated or “belongs” is shown SBAR only) are children of NP, sister to the head by the trace *ICH*, which is coindexed to the noun. Adjunct constituents are sister to the NP SBAR node containing the relative clause con- that contains the head noun, child of the NP that stituent. contains both:

(2)(S (NP-SBJ You) (NP (NP head) (VP sent (PP adjunct)) (NP (NP a journalist) (SBAR *ICH*-2)) 1.2 PropBank (PP-DIR to (NP al Riyadh)) PropBank is an annotation of predicate-argument (SBAR-2 structures on top of syntactically parsed, or Tree- (WHNP-3 who) banked, structures. (Palmer, et al. 2005; Babko- (S (NP-SBJ *T*-3) Malaya, 2005). More specifically, PropBank (VP served annotation involves three tasks: argument (NP (NP the name) labeling, annotation of modifiers, and creating (PP of co-reference chains for empty categories. (NP Lebanon))) The first goal is to provide consistent argu- (ADVP-MNR magnificently)))))) ment labels across different syntactic realizations of the same verb, as in

Empty subjects which are not traces of move- (3) [ARG0 John] broke [ARG1 the window] ment, such as PRO and pro, are shown as * (see [ARG1 The window] broke. the null subject of the infinite clause in (4) be- low). These null subjects are coindexed with a As this example shows, semantic arguments governing NP if the allows. The null sub- are tagged with numbered argument labels, such ject of an infinitive clause complement to a noun as Arg0, Arg1, Arg2, where these labels are de- is, however, not coindexed with another node in fined on a verb by verb basis. the tree in the syntax. This coindexing is shown The second task of the PropBank annotation as a semantic coindexing in the PropBank anno- involves assigning functional tags to all modifi- tation. ers of the verb, such as MNR (manner), LOC The distinction between syntactic arguments (locative), TMP (temporal), DIS (discourse con- and adjuncts of the verb or verb phrase is made nectives), PRP (purpose) or DIR (direction) and through the use of functional dashtags rather than others. with a structural difference. Both arguments and And, finally, PropBank annotation involves adjuncts are children of the VP node. No distinc- finding antecedents for ‘empty’ arguments of the tion is made between VP-level modification and verbs, as in (4). The subject of the verb leave in S-level modification. All constituents that appear this example is represented as an empty category before the verb are children of S and sisters of [*] in Treebank. In PropBank, all empty catego- VP; all constituents that appear after the verb are ries which could be co-referred with a NP within children of VP. the same are linked in ‘co-reference’ Syntactic arguments of the verb are NP-SBJ, chains: NP (no dashtag), SBAR (either –NOM-SBJ or no dashtag), S (either –NOM-SBJ or no dashtag), (4) I made a decision [*] to leave -DTV, -CLR (closely/clearly related), -DIR with directional verbs. Rel: leave, Adjuncts or modifiers of the verb or sentence Arg0: [*] -> I are any constituent with any other adverbial dashtag, PP (no dashtag), ADVP (no dashtag). As the following sections show, all three tasks Adverbial constituents are marked with a more of PropBank annotation result in structures specific functional dashtag if they belong to one which differ in certain respects from the corre- of the more specific types in the annotation sys- sponding Treebank structures. Section 2 presents

71 our approach to reconciling the differences be- (7) What-1 do you like [*T*-1]? tween Treebank and PropBank with respect to the third task, which links empty categories with Revised PropBank annotation: their antecedents. Section 3 introduces mis- Rel: like matches between syntactic constituency in Tree- Arg0: you bank and PropBank. Mismatches between modi- Arg1: [*T*] fier labels are not addressed in this paper and are left for future work. As the second stage, syntactic chains will be re- constructed automatically, based on the 2 and syntactic chains coindexation provided by Treebank (note that the trace is coindexed with the NP What in (7)). And, PropBank chains include all syntactic chains finally, coreference annotation will be done on (represented in the Treebank) plus other cases of top of the resulting resource, with the goal of nominal semantic coreference, including those finding antecedents for the remaining empty in which the coreferring NP is not a syntactic categories, including empty subjects of infinitival antecedent. For example, according to PropBank verbs and gerunds. guidelines, if a trace is coindexed with a NP in One of the advantages of this approach is that Treebank, then the chain should be reconstructed: it allows us to distinguish different types of

chains, such as syntactic chains (i.e., chains (5) What-1 do you like [*T*-1]? which are derived as the result of syntactic

movement, or control coreference), direct Original PropBank annotation: coreference chains (as illustrated by the example Rel: like in (6)), and semantic type links for other ‘indi- Arg0: you rect’ types of links between an empty category Arg1: [*T*] -> What and its antecedent.

Syntactic chains are annotated in Treebank, Such chains usually include traces of A and A’ and are reconstructed automatically in PropBank. movement and PRO for subject and object con- The annotation of direct coreference chains is trol. On the other hand, not all instances of PROs done manually on top of Treebank, and is re- have syntactic antecedents. As the following ex- stricted to empty categories that are not ample illustrates, subjects of infinitival verbs and coindexed with any NP in Treebank. And, finally, gerunds might have antecedents within the same as we show next, a semantic type link is used for sentence, which cannot be linked as a syntactic relative clauses and a coindex link for verbs of chain. saying.

A semantic type link is used when the antece- (6) On the issue of abortion , Marshall Coleman dent and the empty category do not refer to the wants to take away your right [*] to choose same entity, but do have a certain kind of rela- and give it to the politicians . tionship. For example, consider the relative

clause in (8): ARG0: [*] -> your

REL: choose (8) Answers that we’d like to have

Given that the goal of PropBank is to find all Treebank annotation: semantic arguments of the verbs, the links be- (NP (NP answers) tween empty categories and their coreferring NPs (SBAR (WHNP-6 which) are important, independent of whether they are (S (NP-SBJ-3 we) syntactically coindexed or not. In order to recon- (VP 'd cile the differences between Treebank and Prop- (VP like Bank annotations, we decided to revise Prop- (S (NP-SBJ *-3) Bank annotation and view it as a 3 stage process. (VP to First, PropBank annotators should not recon- (VP have (NP *T*-6) struct syntactic chains, but rather tag empty cate- )))))))) gories as arguments. For example, under the new approach annotators would simply tag the trace In Treebank, the object of the verb have is a trace, as the Arg1 argument in (7): which is coindexed with the relative pronoun. In

72 the original PropBank annotation, a further link (VP would is provided, which specifies the relative pronoun (VP develop as being of “semantic type” answers. (NP (NP musical acts) (PP for

(NP a new record (9) Original PropBank annotation: label))))) Arg1: [NP *T*-6] -> which -> answers .) rel: have In PropBank, the different pieces of the utterance, Arg0: [NP-SBJ *-3] -> we including the trace under the verb said, were concatenated This additional link between which and answers is important for many applications that make use (12) Original PropBank annotation: of preferences for semantic types of verb argu- ARG1: [ Among other things] [ Mr. ments, such as Word Sense Disambiguation Azoff] [ would develop musical acts for a (Chen & Palmer 2005). In the new annotation new record label] [ [*T*-1]] scheme, annotators will first label traces as ar- ARG0: they guments: rel: said

(10) Revised PropBank annotation (stage 1): Under the new approach, in stage one, Tree- Rel: have bank annotation will introduce not a trace of the Arg1: [*T*-6] S clause, but rather *?*, an empty category indi- Arg0: [NP-SBJ *-3] cating ellipsis. In stage three, PropBank annota- tors will link this null element to the S node, but As the next stage, the trace [*T*-6] will be the resulting chain will not be viewed as ‘direct’ linked to the relative pronoun automatically (in coreference. A special tag will be used for this addition to the chain [NP-SBJ *-3] -> we being link, in order to distinguish it from other types of automatically reconstructed). As the third stage, chains. PropBank annotators will link which to answers. However, this chain will be labeled as a “seman- (13) Revised PropBank annotation: tic type” to distinguish it from direct coreference ARG1: [*?*] (-> S) chains and to indicate that there is no identity ARG0: they relation between the coindexed elements. rel: said Verbs of saying illustrate another case of links rather than coreference chains. In many sen- 3 Differences in syntactic constituency tences with direct speech, the clause which intro- duces a verb of saying is ‘embedded’ into the 3.1 Extractions of mismatches between utterance. Syntactically this presents a problem PropBank and Treebank for both Treebank and Propbank annotation. In In order to make the necessary changes to both Treebank, the original annotation style required a the Treebank and the PropBank, we have to first trace coindexed to the highest S node as the ar- find all instances of mismatches. We have used gument of the verb of saying, indicating syntactic two methods to do this: 1) examining the argu- movement. ment locations; 2) examining the discontinuous arguments. (11) Among other things, they said [*T*-1] , Mr. Azoff would develop musical acts for a new Argument Locations In a parse tree which ex- record label . presses the syntactic structure of a sentence, a semantic argument occupies specific syntactic Treebank annotation: locations: it appears in a subject position, a verb (S-1 (PP Among complement location or an adjunct location. (NP other things)) (PRN , Relative to the predicate, its argument is either a (S (NP-SBJ they) sister node, or a sister node of the predicate’s (VP said ancestor. We extracted cases of PropBank argu- (SBAR 0 ments which do not attach to the predicate spine, (S *T*-1)))) and filtered out VP coordination cases. For ex- ,) ample, the following case is a problematic one (NP-SBJ Mr. Azoff) because the argument PP node is embedded too

73 deeply in an NP node and hence it cannot find a cases where one-to-one mapping cannot be pre- connection with the main predicate verb lifted. served. This is an example of a PropBank annotation error. 3.2 Attachment ambiguities Many cases of mismatches between Treebank (14) (VP (VBD[rel] lifted) and PropBank constituents are the result of am- (NP us) ) biguous interpretations. The most common ex- (NP-EXT amples are cases of modifier attachment ambi- (NP a good 12-inches) guities, including PP attachment. In cases of am- (PP-LOC[ARGM-LOC] above (NP the water level)))) biguous interpretations, we are trying to separate cases which can be resolved automatically from However, the following case is not problem- those which require manual adjudication. atic because we consider the ArgM PP to be a sister node of the predicate verb given the VP PP-Attachment The most typical case of PP coordination structure: attachment annotation disagreement is shown in (17).

(15) (VP (VP (VB[rel] buy) (NP the basket of … ) (17) She wrote a letter for Mary. (PP in whichever market …)) (CC and) Treebank annotation: (VP (VBP sell) (VP wrote (NP them) (NP (NP a letter) (PP[ARGM] in the more (PP for expensive market))) (NP Mary))))

Discontinuous Arguments happen when Prop- PropBank annotation: Bank annotators need to concatenate several REL: write Treebank constituents to form an argument. Dis- Arg1: a letter continuous arguments often represent different Arg2: for Mary opinions between PropBank and Treebank anno- tators regarding the interpretations of the sen- In (17), the PP ‘for Mary’ is attached to the tence structure. verb in PropBank and to the NP in Treebank. For example, in the following case, the Prop- This disagreement may have been influenced by Bank concatenates the NP and the PP to be the the set of roles of the verb ‘write’, which in- Arg1. In this case, the disagreement on PP at- cludes a beneficiary as its argument. tachment is simply a Treebank annotation error. (18) Frameset write: Arg0: writer (16) The region lacks necessary mechanisms for Arg1: thing written handling the aid and accounting items. Arg2: beneficiary

Treebank annotation: Examples of this type cannot be automatically (VP lacks resolved and require manual adjudication. (NP necessary mechanisms) (PP for Adverb Attachment Some cases of modifier (NP handing the aid…))) attachment ambiguities, on the other hand, could be automatically resolved. Many cases of mis- PropBank annotation: matches are of the type shown in (19), where a REL: lacks directional adverbial follows the verb. In Tree- Arg1: [NP necessary mechanisms][PP for bank, this adverbial is analyzed as part of an handling the aid and accounting items] ADVP which is the argument of the verb in question. However, in PropBank, it is annotated All of these examples have been classified into as a separate ArgM-DIR. the following categories: (1) attachment ambi- (19) Everything is going back to Korea or Japan. guities, (2) different policy decisions, and (3)

74 Treebank annotation: terpretation of the argument, e.g. whether the (S (NP-SBJ (NN Everything) ) argument can be interpreted as an event or pro- (VP (VBZ is) position. (VP (VBG[rel] going) For example, causative verbs (e.g. make, get), (ADVP-DIR (RB[ARGM-DIR] back) verbs of perception (see, hear), and intensional (PP[ARG2] (TO to) verbs (want, need, believe), among others, are (NP (NNP Korea) analyzed as taking an S clause, which is inter- (CC and) preted as an event in the case of causative verbs (NNP Japan) and verbs of perception, and as a proposition in ))))) (. .)) the case of intensional verbs. On the other hand, ‘label’ verbs (name, call, entitle, label, etc.), do Original PropBank annotation: not select for an event or proposition and are Rel: going analyzed as having 3 arguments: Arg0, Arg1, ArgM-DIR: back and Arg2. Arg2: to Korea or Japan Treebank criteria for distinguishing arguments, on the other hand, were based on syntactic For examples of this type, we have decided to considerations, which did not always match with automatically reconcile PropBank annotations to Propbank. For example, in Treebank, evidence of be consistent with Treebank, as shown in (20). the syntactic category of argument that a verb can take is used as part of the decision process (20) Revised PropBank annotation: about whether to allow the verb to take a small Rel: going clause. Verbs that take finite or non-finite (verbal) Arg2: back to Korea or Japan clausal arguments, are also treated as taking small clauses. The verb find takes a finite clausal 3.3 Sentential complements complement as in We found that the book was Another area of significant mismatch between important and also a non-finite clausal comple- Treebank and PropBank annotation involves sen- ment as in We found the book to be important. tential complements, both infinitival clauses and Therefore, find is also treated as taking a small small clauses. In general, Treebank annotation clause complement as in We found the book allows many more verbs to take sentential com- important. plements than PropBank annotation. For example, the Treebank annotation of the (22) (S (NP-SBJ We) sentence in (21) gives the verb keep a sentential (VP found complement which has their markets active un- (S (NP-SBJ the book) der the S as the subject of the complement clause. (ADJP-PRD important)))) PropBank annotation, on the other hand, does not mark the clause but rather labels each subcon- The obligatory nature of the secondary predi- stituent as a separate argument. cate in this construction also informed the deci- sion to use a small clause with a verb like find. In (21) …keep their markets active (22), for example, important is an obligatory part of the sentence, and removing it makes the sen- Treebank annotation: tence ungrammatical with this sense of find (“We (VP keep found the book” can only be grammatical with a (S (NP-SBJ their markets) different sense of find, essentially “We located (ADJP-PRD active))) the book”). With verbs that take infinitival clausal com- PropBank annotation: plements, however, the distinction between a REL: keep single S argument and an NP object together Arg1: their markets with an S argument is more difficult to make. Arg2: active The original Treebank policy was to follow the criteria and the list of verbs taking both an NP In Propbank, an important criterion for decid- object and an infinitival S argument given in ing whether a verb takes an S argument, or de- Quirk, et al. (1985). composes it into two arguments (usually tagged Resultative constructions are frequently a as Arg1 and Arg2) is based on the semantic in- source of mismatch between Treebank annota-

75 tion as a small clause and PropBank annotation verb is labeled as a verb+particle combination in with Arg1 and Arg2. Treebank treated a number the Treebank, the PropBank annotators concate- of resultative as small clauses, although certain nate it with the verb as the REL. If Treebank la- verbs received resultative structure annotation, bels the second part of the ‘phrasal verb’ as part such as the one in (23). of a prepositional phrase, there is no way to re- solve the inconsistency. (23) (S (NP-SBJ They) (VP painted (25) But Japanese institutional investors are used (NP-1 the apartment) to quarterly or semiannual payments on their in- (S-CLR (NP-SBJ *-1) vestments, so … (ADJP-PRD orange)))) Treebank annotation: In all the mismatches in the area of sentential (VBN used) complementation, Treebank policy tends to (PP (TO to) overgeneralize S-clauses, whereas Propbank (NP quarterly or … leans toward breaking down clauses into separate on their investments)) arguments. This type of mismatch is being resolved on a PropBank annotation: verb-by-verb basis. Propbank will reanalyze Arg1: quarterly or … on their investments some of the verbs (like consider and find), which Rel: used to (‘used to’ is a separate predi- have been analyzed as having 3 arguments, as cate in PropBank) taking an S argument. Treebank, on the other hand, will change the analysis of label verbs like 4.2 Conjunction call, from a small clause analysis to a structure In PropBank, conjoined NPs and clauses are with two complements. usually analyzed as one argument, parallel to Our proposed structure for label verbs, for ex- Treebank. For example, in John and Mary came, ample, is in (24). the NP John and Mary is a constituent in Tree- bank and it is also marked as Arg0 in PropBank. (24) (S (NP-SBJ[Arg0] his parents) However, there are a few cases where one of the (VP (VBD called) conjuncts is modified, and PropBank policy is to (NP-1[Arg1] him) mark these modifiers as ArgMs. For example, in (S-CLR[Arg2] the following NP, the temporal ArgM now modi- (NP-SBJ *-1) fies a verb, but it only applies to the second con- (NP-PRD John)))) junct.

This structure will accommodate both Treebank (26) and PropBank requirements for label verbs. (NP (NNP Richard) (NNP Thornburgh) ) 4 Where Syntax and Semantics do not (, ,) match (SBAR (WHNP-164 (WP who)) Finally, there are some examples where the dif- (S ferences seem to be impossible to resolve with- (NP-SBJ-1 (-NONE- *T*-164)) out sacrificing some important features of Prop- (VP Bank or Treebank annotation. (VBD went) (PRT (RP on) ) 4.1 Phrasal verbs (S (NP-SBJ (-NONE- *-1)) PropBank has around 550 phrasal verbs like (VP (TO to) keep up, touch on, used to and others, which are (VP (VB[rel] become) analyzed as separate predicates in PropBank. (NP-PRD These verbs have their own set of semantic roles, (NP[ARG2] which is different from the set of roles of the cor- (NP (NN governor)) responding ‘non-phrasal’ verbs, and therefore (PP (IN of) they require a separate PropBank entry. In Tree- (NP bank, on the other hand, phrasal verbs are not (NNP distinguished. If the second part of the phrasal Pennsylvania))))

76 (CC and) mantics do not match. We have found that for the (PRN (, ,) most part, such mismatches arise because Tree- (ADVP-TMP (RB now)) bank decisions are based primarily on syntactic (, ,) ) considerations while PropBank decisions give (NP[ARG2] (NNP U.S.) (NNP Attorney) more weight to semantic representation.. (NNP General)) In order to reconcile these differences we have ))))))) revised the annotation policies of both the Prop- Bank and Treebank in appropriate ways. A In PropBank, cases like this can be decom- fourth source of mismatches is simply annotation posed into two propositions: error in either the Treebank or PropBank. Look- ing at the mismatches in general has allowed us (27) Prop1: rel: become to find these errors, and will facilitate their cor- Arg1: attorney general rection. Arg0: [-NONE- *-1] References Prop2: rel: become Olga Babko-Malaya. 2005. PropBank Annotation ArgM-TMP: now Guidelines. http://www.cis.upenn.edu/~mpalmer/ Arg0: [-NONE- *-1] project_pages/PBguidelines.pdf Arg1: a governor Ann Bies, Mark Ferguson, Karen Katz, Robert Mac- Intyre. 1995. Bracketing Guidelines for Treebank In Treebank, the conjoined NP is necessarily II Style. Penn Treebank Project, University of analyzed as one constituent. In order to maintain Pennsylvania, Department of Computer and Infor- the one-to-one mapping between PropBank and mation Science Technical Report MS-CIS-95-06. Treebank, PropBank annotation would have to Jinying Chen and Martha Palmer. 2005. Towards Ro- be revised in order to allow the sentence to have bust High Performance Word Sense Disambigua- one proposition with a conjoined phrase as an tion of English Verbs Using Rich Linguistic Fea- argument. Fortunately, these types of cases do tures. In Proceedings of the 2nd International Joint not occur frequently in the corpus. Conference on Natural Language Processing, IJCNLP2005, pp. 933-944. Oct. 11-13, Jeju Island, 4.3 Gapping Republic of Korea. Another place where the one-to-one mapping M. Marcus, G. Kim, M. Marcinkiewicz, R. MacIn- is difficult to preserve is with gapping construc- tyre, A. Bies, M. Ferguson, K. Katz & B. Schas- tions. Treebank annotation does not annotate the berger, 1994. The Penn Treebank: Annotating gap, given that gaps might correspond to differ- predicate argument structure. Proceedings of the ent syntactic categories or may not even be a Human Language Technology Workshop, San constituent. The policy of Treebank, therefore, is Francisco. simply to provide a coindexation link between M. Marcus, B. Santorini and M.A. Marcinkiewicz, the corresponding constituents: 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics. (28) Mary-1 likes chocolates-2 and Martha Palmer, Dan Gildea, and Paul Kingsbury. Jane=1 – flowers=2 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, This policy obviously presents a problem for 31(1). one-to-one mapping, since Propbank annotators R. Quirk, S. Greenbaum, G. Leech and J. Svartvik. tag Jane and flowers as the arguments of an im- 1985. A Comprehensive Grammar of the English plied second likes relation, which is not present Language. Longman, London. in the sentence. B. Santorini. 1990. Part-of-speech tagging guidelines 5 Summary for the Penn Treebank Project. University of Penn- sylvania, Department of Computer and Information In this paper we have considered several types Science Technical Report MS-CIS-90-47. of mismatches between the annotations of the English Treebank and the PropBank: coreference and syntactic chains, differences in syntactic constituency, and cases in which syntax and se-

77