
Automatic Extraction of Subcategorization from Corpora

Ted Briscoe
Computer Laboratory
University of Cambridge
Pembroke Street, Cambridge CB2 3QG, UK
ejb@cl.cam.ac.uk

John Carroll
Cognitive and Computing Sciences
University of Sussex
Brighton BN1 9QH, UK
john.carroll@cogs.susx.ac.uk

Abstract

We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount [1].

[1] This work was supported by UK DTI/SALT project 41/5808 'Integrated Language Database', CEC Telematics Applications Programme project LE1-2111 'SPARKLE: Shallow PARsing and Knowledge extraction for Language Engineering', and by SERC/EPSRC Advanced Fellowships to both authors. We would like to thank the COMLEX Syntax development team for allowing us access to pre-release data (for an early experiment), and for useful feedback.

1 Motivation

Predicate subcategorization is a key component of a lexical entry, because most, if not all, recent syntactic theories 'project' syntactic structure from the lexicon. Therefore, a wide-coverage parser utilizing such a lexicalist grammar must have access to an accurate and comprehensive dictionary encoding (at a minimum) the number and category of a predicate's arguments, and ideally also information about control with predicative arguments, semantic selection preferences on arguments, and so forth, to allow the recovery of the correct predicate-argument structure. If the parser uses statistical techniques to rank analyses, it is also critical that the dictionary encode the relative frequency of distinct subcategorization classes for each predicate.

Several substantial machine-readable subcategorization dictionaries exist for English, either built largely automatically from machine-readable versions of conventional learners' dictionaries, or manually by (computational) linguists (e.g. the Alvey NL Tools (ANLT) dictionary, Boguraev et al. (1987); the COMLEX Syntax dictionary, Grishman et al. (1994)). Unfortunately, neither approach can yield a genuinely accurate or comprehensive computational lexicon, because both rest ultimately on the manual efforts of lexicographers / linguists and are, therefore, prone to errors of omission and commission which are hard or impossible to detect automatically (e.g. Boguraev & Briscoe, 1989; see also section 3.1 below for an example). Furthermore, manual encoding is labour intensive, and it is therefore costly to extend it to neologisms, to information not currently encoded (such as the relative frequency of different subcategorizations), or to other (sub)languages. These problems are compounded by the fact that predicate subcategorization is closely associated with lexical sense, and the senses of a word change between corpora, sublanguages and/or subject domains (Jensen, 1991).

In a recent experiment with a wide-coverage parsing system utilizing a lexicalist grammatical framework, Briscoe & Carroll (1993) observed that half of the parse failures on unseen test data were caused by inaccurate subcategorization information in the ANLT dictionary. The close connection between sense and subcategorization, and between subject domain and sense, makes it likely that a fully accurate 'static' subcategorization dictionary of a language is unattainable in any case. Moreover, although Schabes (1992) and others have proposed 'lexicalized' probabilistic grammars to improve the accuracy of parse ranking, no wide-coverage parser has yet been constructed incorporating probabilities of different subcategorizations for individual predicates, because of the problems of accurately estimating them.
These problems suggest that automatic construction or updating of subcategorization dictionaries from textual corpora is a more promising avenue to pursue. Preliminary experiments acquiring a few verbal subcategorization classes have been reported by Brent (1991, 1993), Manning (1993), and Ushioda et al. (1993). In these experiments the maximum number of distinct subcategorization classes recognized is sixteen, and only Ushioda et al. attempt to derive the relative subcategorization frequency for individual predicates.

We describe a new system capable of distinguishing 160 verbal subcategorization classes, a superset of those found in the ANLT and COMLEX Syntax dictionaries. The classes also incorporate information about control of predicative arguments and alternations such as particle movement and extraposition. We report an initial experiment which demonstrates that this system is capable of acquiring the subcategorization classes of verbs and the relative frequencies of these classes with comparable accuracy to the less ambitious extant systems. We achieve this performance by exploiting a more sophisticated robust statistical parser which yields complete though 'shallow' parses, a more comprehensive subcategorization class classifier, and a priori estimates of the probability of membership of these classes. We also describe a small-scale experiment which demonstrates that subcategorization class frequency information for individual verbs can be used to improve parsing accuracy.

2 Description of the System

2.1 Overview

The system consists of the following six components, which are applied in sequence to sentences containing a specific predicate in order to retrieve a set of subcategorization classes for that predicate:

1. A tagger, a first-order HMM part-of-speech (PoS) and punctuation tag disambiguator, is used to assign and rank tags for each word and punctuation token in sequences of sentences (Elworthy, 1994).

2. A lemmatizer is used to replace word-tag pairs with lemma-tag pairs, where a lemma is the morphological base or dictionary headword form appropriate for the word, given the PoS assignment made by the tagger. We use an enhanced version of the GATE project stemmer (Cunningham et al., 1995).

3. A probabilistic LR parser, trained on a treebank, returns ranked analyses (Briscoe & Carroll, 1993; Carroll, 1993, 1994), using a grammar written in a feature-based unification grammar formalism which assigns 'shallow' phrase structure analyses to the tag networks (or 'lattices') returned by the tagger (Briscoe & Carroll, 1994, 1995; Carroll & Briscoe, 1996).

4. A patternset extractor which extracts subcategorization patterns, including the syntactic categories and head lemmas of constituents, from sentence subanalyses which begin/end at the boundaries of (specified) predicates.

5. A pattern classifier which assigns patterns in patternsets to subcategorization classes, or rejects patterns as unclassifiable, on the basis of the feature values of syntactic categories and the head lemmas in each pattern.

6. A patternset evaluator which evaluates sets of patternsets gathered for a (single) predicate, constructing putative subcategorization entries and filtering the latter on the basis of their reliability and likelihood.
For example, building entries for attribute, and given that one of the sentences in our data was (1a), the tagger and lemmatizer return (1b).

(1) a. He attributed his failure, he said, to no one buying his books.

    b. he_PPHS1 attribute_VVD his_APP$ failure_NN1 ,_, he_PPHS1 say_VVD ,_, to_II noone_PN buy_VVG his_APP$ book_NN2

(1b) is parsed successfully by the probabilistic LR parser, and the ranked analyses are returned. Then the patternset extractor locates the subanalyses containing attribute and constructs a patternset. The highest ranked analysis and pattern for this example are shown in Figure 1 [2]. Patterns encode the value of the VSUBCAT feature from the VP rule and the head lemma(s) of each argument. In the case of PP arguments, the pattern also encodes the value of PSUBCAT from the PP rule and the head lemma(s) of its complement(s). In the next stage of processing, patterns are classified, in this case giving the subcategorization class corresponding to transitive plus PP with non-finite clausal complement.

The system could be applied to corpus data by first sorting sentences into groups containing instances of a specified predicate, but we use a different strategy, since it is more efficient to tag, lemmatize and parse a corpus just once, extracting patternsets for all predicates in each sentence; then to classify the patterns in all patternsets; and finally, to sort and recombine patternsets into sets of patternsets, one set for each distinct predicate, containing patternsets of just the patterns relevant to that predicate. The tagger, lemmatizer, grammar and parser have been described elsewhere (see the references above), so we provide only brief relevant details here, concentrating on the description of the components of the system that are new: the extractor, classifier and evaluator.

[2] The analysis shows only category aliases rather than sets of feature-value pairs. Ta represents a text adjunct delimited by commas (Nunberg, 1990; Briscoe & Carroll, 1994). Tokens in the patternset are indexed by sequential position in the sentence so that two or more tokens of the same type can be kept distinct in patterns.
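To make the pattern representation concrete, the following minimal Python sketch shows the pattern extracted from (1b) and its classification. This is our own illustrative rendering, not the system's actual code: the class and field names, and the toy classifier with its invented class label, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PP:
    """A PP argument: PSUBCAT value, preposition head, complement heads."""
    psubcat: str            # e.g. 'SING' for a non-finite clausal complement
    prep: str               # preposition head lemma, e.g. 'to'
    comp_heads: tuple = ()  # head lemmas of the PP's complements

@dataclass
class Pattern:
    """One subcategorization pattern extracted from a ranked subanalysis."""
    predicate: str          # head lemma of the verb, e.g. 'attribute'
    vsubcat: str            # VSUBCAT value from the VP rule
    arg_heads: tuple = ()   # head lemmas of the non-PP arguments
    pps: tuple = ()         # PP arguments, if any

# The pattern extracted from the highest-ranked analysis of (1b):
pattern_1b = Pattern(predicate='attribute',
                     vsubcat='NP_PP',
                     arg_heads=('he', 'failure'),
                     pps=(PP('SING', 'to', ('noone', 'buy')),))

def classify(p: Pattern) -> Optional[str]:
    """Toy stand-in for the 160-class classifier: map feature values and
    head lemmas to a subcategorization class, or None (unclassifiable)."""
    if p.vsubcat == 'NP_PP' and p.pps and p.pps[0].psubcat == 'SING':
        # transitive plus PP with non-finite clausal complement
        return 'NP_PP(SING)'
    return None

assert classify(pattern_1b) == 'NP_PP(SING)'
```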

[Figure 1 appears here. It shows the highest-ranked parse tree for (1b), rooted in Tp, in which attribute_VVD heads a V1 whose complements are the N2 headed by failure_NN1 (carrying the comma-delimited text adjunct he said) and the P2 headed by to_II; and the extracted pattern, which records the indexed head tokens (he:1 PPHS1), (attribute:6 VVD), (failure:8 NN1), (to:9 II), (noone:10 PN), (buy:11 VVG), together with the feature values (VSUBCAT NP_PP) and (PSUBCAT SING).]

Figure 1: Highest-ranked analysis and patternset for (1b)

The grammar consists of 455 phrase structure rule schemata in the format accepted by the parser (a syntactic variant of a Definite Clause Grammar with iterative (Kleene) operators). It is 'shallow' in that no attempt is made to fully analyse unbounded dependencies. However, the distinction between arguments and adjuncts is expressed, following X-bar theory (e.g. Jackendoff, 1977), by Chomsky-adjunction of adjuncts to maximal projections (XP --> XP Adjunct) as opposed to 'government' of arguments (i.e. arguments are sisters within X1 projections; X1 --> X0 Arg1 ... ArgN). Furthermore, all analyses are rooted (in S), so the grammar assigns global, shallow and often 'spurious' analyses to many sentences. There are 29 distinct values for VSUBCAT and 10 for PSUBCAT; these are analysed in patterns along with specific closed-class head lemmas of arguments, such as it (dummy subjects), whether (wh-complements), and so forth, to classify patterns as evidence for one of the 160 subcategorization classes. Each of these classes can be parameterized for specific predicates by, for example, different prepositions or particles. Currently, the coverage of this grammar (the proportion of sentences for which at least one analysis is found) is 79% when applied to the Susanne corpus (Sampson, 1995), a 138K word treebanked and balanced subset of the Brown corpus. Wide coverage is important since information is acquired only from successful parses. The combined throughput of the parsing components on a Sun UltraSparc 1/140 is around 50 words per CPU second.

2.2 The Extractor, Classifier and Evaluator

The extractor takes as input the ranked analyses from the probabilistic parser. It locates the subanalyses around the predicate, finding the constituents identified as complements inside each subanalysis, and the subject clause preceding it. Instances of passive constructions are recognized and treated specially. The extractor returns the predicate, the VSUBCAT value, and just the heads of the complements (except in the case of PPs, where it returns the PSUBCAT value, the preposition head, and the heads of the PP's complements).

The subcategorization classes recognized by the classifier were obtained by manually merging the classes exemplified in the COMLEX Syntax and ANLT dictionaries and adding around 30 classes found by manual inspection of unclassifiable patterns for corpus examples during development of the system. These consisted of some extra patterns for phrasal verbs with complex complementation and with flexible ordering of the preposition/particle, some for non-passivizable patterns with a surface direct object, and some for rarer combinations of governed preposition and complementizer. The classifier filters out as unclassifiable around 15% of the patterns found by the extractor when run on all the patternsets extracted from the Susanne corpus. This demonstrates the value of the classifier as a filter of spurious analyses, as well as providing both a translation between extracted patterns and the two existing subcategorization dictionaries and a definition of the target subcategorization dictionary.
The evaluator builds entries by taking the patterns for a given predicate built from successful parses and recording the number of observations of each subcategorization class. Patterns provide several types of information which can be used to rank or select between patterns in the patternset for a given sentence exemplifying an instance of a predicate, such as the ranking of the parse from which a pattern was extracted, or the proportion of subanalyses supporting a specific pattern. Currently, we simply select the pattern supported by the highest ranked parse. However, we are experimenting with alternative approaches.

The resulting set of putative classes for a predicate is filtered, following Brent (1993), by hypothesis testing on binomial frequency data. Evaluating putative entries on binomial frequency data requires that we record the total number of patternsets n for a given predicate, and the number m of these patternsets containing a pattern supporting an entry for a given class. These figures are straightforwardly computed from the output of the classifier; however, we also require an estimate of the probability that a pattern for class i will occur with a verb which is not a member of subcategorization class i. Brent proposes estimating these probabilities experimentally on the basis of the behaviour of the extractor. We estimate this probability more directly: first, by extracting the number of verbs which are members of each class in the ANLT dictionary (with intuitive estimates for the membership of the novel classes) and converting this to a probability of class membership by dividing by the total number of verbs in the dictionary; and secondly, by multiplying the complement of these probabilities by the probability of a pattern for class i, defined as the number of patterns for i extracted from the Susanne corpus divided by the total number of patterns. So p(v¬i), the probability that a verb v which is not of class i occurs with a pattern for class i, is:

\[ p(v\,\neg i) \;=\; \left(1 - \frac{|\text{anlt-verbs-in-class-}i|}{|\text{anlt-verbs}|}\right) \cdot \frac{|\text{patterns-for-}i|}{|\text{patterns}|} \]

The binomial distribution gives the probability of an event with probability p happening exactly m times out of n attempts:

\[ P(m, n, p) \;=\; \frac{n!}{m!\,(n-m)!}\; p^m (1-p)^{n-m} \]

The probability of the event happening m or more times is:

\[ P(m{+}, n, p) \;=\; \sum_{i=m}^{n} P(i, n, p) \]

Thus P(m+, n, p(v¬i)) is the probability that m or more occurrences of patterns for i will occur with a verb which is not a member of i, given n occurrences of that verb. Setting a threshold of less than or equal to 0.05 yields a 95% or better confidence that a high enough proportion of patterns for i has been observed for the verb to be in class i [3].

[3] Brent (1993:249-253) provides a detailed explanation and justification for the use of this measure.
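The filter is simple to state in code. The sketch below is a minimal reconstruction of the binomial test just described; the counts used to estimate p(v¬i) are invented for illustration and are not figures from the paper.

```python
from math import comb

def binomial_tail(m, n, p):
    """P(m+, n, p): probability of m or more successes in n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m, n + 1))

def accept_class(m, n, p_error, threshold=0.05):
    """Accept class i for a verb observed n times, m of them with a pattern
    for i, if m+ occurrences are unlikely to be due to extraction error."""
    return binomial_tail(m, n, p_error) <= threshold

# p(v -i): probability that a verb not of class i occurs with a pattern for i.
# Hypothetical counts: 300 of 5000 ANLT verbs in class i; 1000 of 20000
# extracted patterns are patterns for i.
p_error = (1 - 300 / 5000) * (1000 / 20000)

print(accept_class(3, 50, p_error))   # False: too little evidence
print(accept_class(12, 50, p_error))  # True: enough evidence to accept
```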
2.3 Discussion

Our approach to acquiring subcategorization classes is predicated on the following assumptions:

- most sentences will not allow the application of all possible rules of English complementation;
- some sentences will be unambiguous even given the indeterminacy of the grammar [4];
- many incorrect analyses will yield patterns which are unclassifiable, and are thus filtered out;
- arguments of a specific verb will occur with greater frequency than adjuncts (in potential argument positions);
- the patternset generator will incorrectly output patterns for certain classes more often than others; and
- even a highest ranked pattern for i is only a probabilistic cue for membership of i, so membership should only be inferred if there are enough occurrences of patterns for i in the data to outweigh the error probability for i.

[4] In fact, 5% of sentences in Susanne are assigned only a single analysis by the grammar.

This simple automated, hybrid linguistic/statistical approach contrasts with the manual linguistic analysis of the COMLEX Syntax lexicographers (Meyers et al., 1994), who propose five criteria and five heuristics for argument-hood and six criteria and two heuristics for adjunct-hood, culled mostly from the linguistics literature. Many of these are not exploitable automatically because they rest on semantic judgements which cannot (yet) be made automatically: for example, optional arguments are often 'understood' or implied if missing. Others are syntactic tests involving diathesis alternation possibilities (e.g. passive, dative movement; Levin (1993)) which require recognition that the 'same' argument, defined usually by semantic class / thematic role, is occurring across argument positions. We hope to exploit this information where possible at a later stage in the development of our approach. However, recognizing same/similar arguments requires considerable quantities of lexical data or the ability to back off to lexical semantic classes. At the moment, we exploit linguistic information about the syntactic type, obligatoriness and position of arguments, as well as the set of possible subcategorization classes, and combine this with statistical inference based on the probability of class membership and the frequency and reliability of patterns for classes.

3 Experimental Evaluation

3.1 Lexicon Evaluation - Method

In order to test the accuracy of our system (as developed so far) and to provide empirical feedback for further development, we took the Susanne, SEC (Taylor & Knowles, 1988) and LOB corpora (Garside et al., 1987), a total of 1.2 million words, and extracted all sentences containing an occurrence of one of fourteen verbs, up to a maximum of 1000 citations of each. These verbs, listed in Figure 2, were chosen at random, subject to the constraint that they exhibited multiple complementation patterns. The sentences containing these verbs were tagged and parsed automatically, and the extractor, classifier and evaluator were applied to the resulting successful analyses. The citations from which entries were derived totaled approximately 70K words.

The results were evaluated against a merged entry for these verbs from the ANLT and COMLEX Syntax dictionaries, and also against a manual analysis of the corpus data for seven of the verbs. The process of evaluating the performance of the system relative to the dictionaries could, in principle, be reduced to an automated report of type precision (the percentage of correct subcategorization classes out of all classes found) and recall (the percentage of correct classes found in the dictionary entry).
However, since there are disagreements between the dictionaries, and there are classes found in the corpus data that are not contained in either dictionary, we report results relative both to a manually merged entry from ANLT and COMLEX, and also, for seven of the verbs, to a manual analysis of the actual corpus data. The latter analysis is necessary because precision and recall measures against the merged entry will still tend to yield inaccurate results, as the system cannot acquire classes not exemplified in the data, and may acquire classes incorrectly absent from the dictionaries.

We illustrate these problems with reference to seem, where there is overlap, but not agreement, between the COMLEX and ANLT entries. Thus, both predict that seem will occur with a sentential complement and dummy subject, but only ANLT predicts the possibility of a 'wh' complement, and only COMLEX predicts the (optional) presence of a PP[to] argument with the sentential complement. One ANLT entry covers two COMLEX entries, given the different treatment of the relevant complements, but the classifier keeps them distinct. The corpus data for seem contains examples of further classes which we judge valid, in which seem can take a PP[to] and infinitive complement, as in he seems to me to be insane, and a passive participle, as in he seemed depressed. This comparison illustrates the problem of errors of omission common to computational lexicons constructed manually and also to those derived from machine-readable dictionaries. All classes for seem are exemplified in the corpus data, but for ask, for example, eight classes (out of a possible 27 in the merged entry) are not present, so comparison only to the merged entry would give an unreasonably low estimate of recall.

3.2 Lexicon Evaluation - Results

Figure 2 gives the raw results for the merged entries and corpus analysis on each verb. It shows the number of true positives (TP), correct classes proposed by our system; false positives (FP), incorrect classes proposed by our system; and false negatives (FN), correct classes not proposed by our system, as judged against the merged entry and, for seven of the verbs, against the corpus analysis. It also shows, in the final column, the number of sentences from which classes were extracted.

Figure 3 gives the type precision and recall of our system's recognition of subcategorization classes as evaluated against the merged dictionary entries (14 verbs) and against the manually analysed corpus data (7 verbs).

              Dictionary    Corpus
              (14 verbs)    (7 verbs)
  Precision     65.7%         76.6%
  Recall        35.5%         43.4%

Figure 3: Type precision and recall

The frequency distribution of the classes is highly skewed: for example, for believe, there are 107 instances of the most common class in the corpus data, but only 6 instances in total of the least common four classes. More generally, for the manually analysed verbs, almost 60% of the false negatives have only one or two exemplars each in the corpus citations. None of them is returned by the system, because the binomial filter always rejects classes hypothesised on the basis of such little evidence.

In Figure 4 we estimate the accuracy with which our system ranks true positive classes against the correct ranking for the seven verbs whose corpus input was manually analysed. We compute this measure by calculating the percentage of pairs of classes at positions (n, m) such that n < m in the system ranking that are ordered the same in the correct ranking. This gives us an estimate of the accuracy of the relative frequencies of classes output by the system.

             Ranking Accuracy
  ask            75.0%
  begin         100.0%
  believe        66.7%
  cause         100.0%
  give           70.0%
  seem           75.0%
  swing          83.3%
  Mean           81.4%

Figure 4: Ranking accuracy of classes
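The ranking-accuracy measure is the proportion of pairwise orderings on which the two rankings agree. A minimal sketch follows; the class labels in the example are invented for illustration.

```python
from itertools import combinations

def ranking_accuracy(system_rank, correct_rank):
    """Percentage of class pairs ordered identically in both rankings.
    Both arguments are lists of classes, most frequent first."""
    pairs = list(combinations(system_rank, 2))  # pairs in system order
    agree = sum(1 for a, b in pairs
                if correct_rank.index(a) < correct_rank.index(b))
    return 100 * agree / len(pairs)

system = ['NP', 'NP_PP', 'SC', 'PP']
correct = ['NP', 'SC', 'NP_PP', 'PP']
print(ranking_accuracy(system, correct))  # 5 of 6 pairs agree: 83.3%
```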
For each of the seven verbs for which we undertook a corpus analysis, we calculate the token recall of our system as the percentage (over all exemplars) of true positives in the corpus. This gives us an estimate of the parsing performance that would result from providing a parser with entries built using the system, shown in Figure 5.

             Token Recall
  ask            78.5%
  begin          73.8%
  believe        34.5%
  cause          92.1%
  give           92.2%
  seem           84.7%
  swing          39.2%
  Mean           80.9%

Figure 5: Token recall

Further evaluation of the results for these seven verbs reveals that the filtering phase is the weak link in the system. There are only 13 true negatives which the system rightly failed to propose, each exemplified in the data by a mean of 4.5 examples. On the other hand, there are 67 false negatives, supported by an estimated mean of 7.1 examples, which should, ideally, have been accepted by the filter, and 11 false positives which should have been rejected. The performance of the filter for classes with less than 10 exemplars is around chance, and a simple heuristic of accepting all classes with more than 10 exemplars would have produced broadly similar results for these verbs. The filter may well be performing poorly because the probability of generating a subcategorization class for a given verb is often lower than the error probability for that class.

              Merged Entry         Corpus Data        No. of
            TP    FP    FN       TP    FP    FN     Sentences
  ask        9     0    18        9     0    10        390
  begin      4     1     7        4     1     7        311
  believe    4     4    11        4     4     8        230
  cause      2     3     6        2     3     5         95
  expect     6     5     3                             223
  find       5     7    15                             645
  give       5     2    11        5     2     5        639
  help       6     3     8                             223
  like       3     2     7                             228
  move       4     3     9                             217
  produce    2     1     3                             152
  provide    3     2     6                             217
  seem       8     1     4        8     1     4        534
  swing      4     0    10        4     0     8         45
  Totals    65    34   118       36    11    47       4149

Figure 2: Raw results for test of 14 verbs
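The type precision and recall figures in Figure 3 follow directly from the totals row of Figure 2; as a quick arithmetic check:

```python
def type_precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN), as percentages."""
    return round(100 * tp / (tp + fp), 1), round(100 * tp / (tp + fn), 1)

print(type_precision_recall(65, 34, 118))  # merged entry, 14 verbs: (65.7, 35.5)
print(type_precision_recall(36, 11, 47))   # corpus data, 7 verbs: (76.6, 43.4)
```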

3.3 Parsing Evaluation

In addition to evaluating the acquired subcategorization information against existing lexical resources, we have also evaluated the information in the context of an actual parsing system. In particular, we wanted to establish whether the subcategorization frequency information for individual verbs could be used to improve the accuracy of a parser that uses statistical techniques to rank analyses.

The experiment used the same probabilistic parser and tag sequence grammar as are present in the acquisition system (see the references above), although the experiment does not in any way rely on the parsers or grammars being the same. We randomly selected a test set of 250 in-coverage sentences (of lengths 3-56 tokens, mean 18.2) from the Susanne treebank, retagged with possibly multiple tags per word, and measured the 'baseline' accuracy of the unlexicalized parser on the sentences using the now standard PARSEVAL/GEIG evaluation metrics of mean crossing brackets per sentence and (unlabelled) bracket recall and precision (e.g. Grishman et al., 1992); see Figure 6 [5]. Next, we collected all words in the test corpus tagged as possibly being verbs (giving a total of 356 distinct lemmas) and retrieved all citations of them in the LOB corpus, plus Susanne with the 250 test sentences excluded. We acquired subcategorization and associated frequency information from the citations, in the process successfully parsing 380K words. We then parsed the test set, with each verb subcategorization possibility weighted by its raw frequency score, and using the naive add-one smoothing technique to allow for omitted possibilities.

[5] Carroll & Briscoe (1996) use the same test set, although the baseline results reported here differ slightly due to differences in the mapping from parse trees to Susanne-compatible bracketings.
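The weighting scheme can be sketched as follows. This is an illustrative reconstruction, not the parser's actual code; the counts and class labels are invented, and the real system combines these weights with the parse probabilities.

```python
def subcat_weights(counts, possible_classes):
    """Add-one smoothed relative frequencies for one verb's classes.
    Classes missed during acquisition still get a small non-zero weight."""
    smoothed = {c: counts.get(c, 0) + 1 for c in possible_classes}
    total = sum(smoothed.values())
    return {c: n / total for c, n in smoothed.items()}

# Hypothetical acquired counts for one verb:
print(subcat_weights({'NP': 107, 'SC': 9}, ['NP', 'SC', 'NP_PP']))
# {'NP': 0.908, 'SC': 0.084, 'NP_PP': 0.008} (approx.)
```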
Nevertheless, this rable accuracy to extant systems. We have also experiment demonstrates that lexicalizing a gram- demonstrated that a subcategorization dictionary mar/parser with subcategorization frequencies can built with the system can improve the accuracy of a appreciably improve the accuracy of parse ranking. probabilistic parser by an appreciable amount. The system we have developed is straightfor- wardly extensible to nominal and adjectival pred- 4 Related Work icates; the existing grammar distinguishes nominal and adjectival arguments from adjuncts structurally, Brent's (1993) approach to acquiring subcategoriza- tion is based on a philosophy of only exploiting un- so all that is required is extension of the classi- ambiguous and determinate information in unanal- fier. Developing an analogous system for another language would be harder but not infeasible; sim- ysed corpora. He defines a number of lexical pat- ilar taggers and parsers have been developed for a terns (mostly involving closed class items, such as number of languages, but no extant subcategoriza- pronouns) which reliably cue one of five subcatego- rization classes. Brent does not report comprehen- tion dictionaries exist to our knowledge, therefore the lexical statistics we utilize for statistical filter- sive results, but for one class, sentential complement verbs, he achieves 96% precision and 76% recall at ing would need to be estimated, perhaps using the classifying individual tokens of 63 distinct verbs as technique described by Brent (1993). However, the exemplars or non-exemplars of this class. He does entire approach to filtering needs improvement, as not attempt to rank different classes for a given verb. evaluation of our results demonstrates that it is the weakest link in our current system. Ushioda et al. (1993) utilise a PoS tagged corpus Our system needs further refinement to nar- and finite-state NP parser to recognize and calcu- row some subcategorization classes, for example, to late the relative frequency of six subcategorization choose between differing control options with pred- classes. They report an accuracy rate of 83% (254 icative complements. It also needs supplementing errors) at classifying 1565 classifiable tokens of 33 with information about diathesis alternation pos- distinct verbs in running text and suggest that in- sibilities (e.g. Levin, 1993) and semantic selection correct noun phrase boundary detection accounts for preferences on argument heads. Grishman & Ster- the majority of errors. They report that for 32 verbs ling (1992), Poznanski & Sanfilippo (1993), Resnik their system correctly predicts the most frequent (1993), Ribas (1994) and others have shown that it class, and for 30 verbs it correctly predicts the sec- is possible to acquire selection preferences from (par- ond most frequent class, if there was one. Our sys- tially) parsed data. Our system already gathers head tem rankings include all classes for each verb, from lemmas in patterns, so any of these approaches could a total of 160 classes, and average 81.4% correct. be applied, in principle. In future work, we intend to Manning (1993) conducts a larger experiment, extend the system in this direction. The ability to also using a PoS tagged corpus and a finite-state recognize that argument slots of different subcatego- NP parser, attempting to recognize sixteen distinct rization classes for the same predicate share seman- complementation patterns. 
He reports that for a test sample of 200 tokens of 40 verbs in running text, the acquired subcategorization dictionary listed the appropriate entry for 163 cases, giving a token recall of 82% (as compared with 80.9% in our experiment). He also reports a comparison of acquired entries for the verbs to the entries given in the Oxford Advanced Learner's Dictionary of Current English (Hornby, 1989), on which his system achieves a precision of 90% and a recall of 43%. His system averages 3.48 subentries (maximum 10), less than half the number produced in our experiment. It is not clear what level of evidence the performance of Manning's system is based on, but the system was applied to 4.1 million words of text (c.f. our 1.2 million words) and the verbs are all common, so it is likely that considerably more exemplars of each verb were available.

5 Conclusions and Further Work

The experiment and comparison reported above suggest that our more comprehensive subcategorization class extractor is able both to assign classes to individual verbal predicates and also to rank them according to relative frequency, with comparable accuracy to extant systems. We have also demonstrated that a subcategorization dictionary built with the system can improve the accuracy of a probabilistic parser by an appreciable amount.

The system we have developed is straightforwardly extensible to nominal and adjectival predicates; the existing grammar distinguishes nominal and adjectival arguments from adjuncts structurally, so all that is required is extension of the classifier. Developing an analogous system for another language would be harder, but not infeasible; similar taggers and parsers have been developed for a number of languages, but no extant subcategorization dictionaries exist to our knowledge, therefore the lexical statistics we utilize for statistical filtering would need to be estimated, perhaps using the technique described by Brent (1993). However, the entire approach to filtering needs improvement, as evaluation of our results demonstrates that it is the weakest link in our current system.

Our system needs further refinement to narrow some subcategorization classes, for example, to choose between differing control options with predicative complements. It also needs supplementing with information about diathesis alternation possibilities (e.g. Levin, 1993) and semantic selection preferences on argument heads. Grishman & Sterling (1992), Poznanski & Sanfilippo (1993), Resnik (1993), Ribas (1994) and others have shown that it is possible to acquire selection preferences from (partially) parsed data. Our system already gathers head lemmas in patterns, so any of these approaches could be applied, in principle. In future work, we intend to extend the system in this direction. The ability to recognize that argument slots of different subcategorization classes for the same predicate share semantic restrictions/preferences would assist recognition that the predicate undergoes specific alternations, this in turn assisting inferences about control, equi and raising (e.g. Boguraev & Briscoe, 1987).

References

Boguraev, B. & Briscoe, E. 1987. Large lexicons for natural language processing: utilising the grammar coding system of the Longman Dictionary of Contemporary English. Computational Linguistics 13.4: 219-240.

Boguraev, B. & Briscoe, E. 1989. Introduction. In Boguraev, B. & Briscoe, E. eds. Computational Lexicography for Natural Language Processing. Longman, London: 1-40.

Boguraev, B., Briscoe, E., Carroll, J., Carter, D. & Grover, C. 1987. The derivation of a grammatically-indexed lexicon from the Longman Dictionary of Contemporary English. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford, CA. 193-200.

Brent, M. 1991. Automatic acquisition of subcategorization frames from untagged text. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA. 209-214.

Brent, M. 1993. From grammar to lexicon: unsupervised learning of lexical syntax. Computational Linguistics 19.3: 243-262.

Briscoe, E. & Carroll, J. 1993. Generalised probabilistic LR parsing for unification-based grammars. Computational Linguistics 19.1: 25-60.

Briscoe, E. & Carroll, J. 1994. Parsing (with) punctuation. Rank Xerox Research Centre, Grenoble, MLTT-TR-007.

Briscoe, E. & Carroll, J. 1995. Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels. In Proceedings of the 4th ACL/SIGPARSE International Workshop on Parsing Technologies, Prague, Czech Republic. 48-58.

Carroll, J. 1993. Practical unification-based parsing of natural language. Cambridge University Computer Laboratory, TR-224.

Carroll, J. 1994. Relating complexity to practical performance in parsing with wide-coverage unification grammars. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, NMSU, Las Cruces, NM. 287-294.

Carroll, J. & Briscoe, E. 1996. Apportioning development effort in a probabilistic LR parsing system through evaluation. In Proceedings of the ACL SIGDAT Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania, Philadelphia, PA. 92-100.

Carroll, J. & Grover, C. 1989. The derivation of a large computational lexicon for English from LDOCE. In Boguraev, B. & Briscoe, E. eds. Computational Lexicography for Natural Language Processing. Longman, London: 117-134.

Cunningham, H., Gaizauskas, R. & Wilks, Y. 1995. A general architecture for text engineering (GATE): a new approach to language R&D. Research memo CS-95-21, Department of Computer Science, University of Sheffield, UK.

de Marcken, C. 1990. Parsing the LOB corpus. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, Pittsburgh, PA. 243-251.

Elworthy, D. 1994. Does Baum-Welch re-estimation help taggers? In Proceedings of the 4th ACL Conference on Applied Natural Language Processing, Stuttgart, Germany.

Garside, R., Leech, G. & Sampson, G. 1987. The computational analysis of English: a corpus-based approach. Longman, London.

Grishman, R., Macleod, C. & Meyers, A. 1994. Comlex syntax: building a computational lexicon. In Proceedings of the International Conference on Computational Linguistics, COLING-94, Kyoto, Japan. 268-272.

Grishman, R., Macleod, C. & Sterling, J. 1992. Evaluating parsing strategies using standardized parse files. In Proceedings of the 3rd ACL Conference on Applied Natural Language Processing, Trento, Italy. 156-161.

Grishman, R. & Sterling, J. 1992. Acquisition of selectional patterns. In Proceedings of the International Conference on Computational Linguistics, COLING-92, Nantes, France. 658-664.

Jackendoff, R. 1977. X-bar syntax. MIT Press, Cambridge, MA.

Jensen, K. 1991. A broad-coverage natural language analysis system. In Tomita, M. ed. Current Issues in Parsing Technology. Kluwer, Dordrecht.

Levin, B. 1993. Towards a lexical organization of English verbs. Chicago University Press, Chicago.

Manning, C. 1993. Automatic acquisition of a large subcategorisation dictionary from corpora. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio. 235-242.

Meyers, A., Macleod, C. & Grishman, R. 1994. Standardization of the complement adjunct distinction. New York University, Ms.

Nunberg, G. 1990. The linguistics of punctuation. CSLI Lecture Notes 18, Stanford, CA.

Poznanski, V. & Sanfilippo, A. 1993. Detecting dependencies between semantic verb subclasses and subcategorization frames in text corpora. In Boguraev, B. & Pustejovsky, J. eds. Proceedings of the SIGLEX ACL Workshop on the Acquisition of Lexical Knowledge from Text.

Resnik, P. 1993. Selection and information: a class-based approach to lexical relationships. University of Pennsylvania, CIS Dept, PhD thesis.

Ribas, P. 1994. An experiment on learning appropriate selection restrictions from a parsed corpus. In Proceedings of the International Conference on Computational Linguistics, COLING-94, Kyoto, Japan.

Sampson, G. 1995. English for the computer. Oxford University Press, Oxford, UK.

Schabes, Y. 1992. Stochastic lexicalized tree adjoining grammars. In Proceedings of the International Conference on Computational Linguistics, COLING-92, Nantes, France. 426-432.

Taylor, L. & Knowles, G. 1988. Manual of information to accompany the SEC corpus: the machine-readable corpus of spoken English. University of Lancaster, UK, Ms.

Ushioda, A., Evans, D., Gibson, T. & Waibel, A. 1993. The automatic acquisition of frequencies of verb subcategorization frames from tagged corpora. In Boguraev, B. & Pustejovsky, J. eds. SIGLEX ACL Workshop on the Acquisition of Lexical Knowledge from Text, Columbus, Ohio. 95-106.