The Berkeley FrameNet Project

Collin F. Baker and Charles J. Fillmore and John B. Lowe
{collinb, fillmore, jblowe}@icsi.berkeley.edu
International Computer Science Institute
1947 Center St. Suite 600
Berkeley, Calif., 94704

Abstract

FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns, adjectives, and verbs) in which the semantic portion makes use of frame semantics. The resulting database will contain (a) descriptions of the semantic frames underlying the meanings of the words described, and (b) the valence representation (semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type, and other syntactic traits). This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

1 Introduction

The Berkeley FrameNet project[1] is producing frame-semantic descriptions of several thousand English lexical items and backing up these descriptions with semantically annotated attestations from contemporary English corpora[2].

These descriptions are based on hand-tagged semantic annotations of example sentences extracted from large text corpora, and on systematic analysis, by lexicographers and linguists, of the semantic patterns these sentences exemplify. The primary emphasis of the project therefore is the encoding, by humans, of semantic knowledge in machine-readable form. The intuition of the lexicographers is guided and constrained by the results of corpus-based research using high-performance software tools.

The semantic domains to be covered are: HEALTH CARE, CHANCE, PERCEPTION, COMMUNICATION, TRANSACTION, TIME, SPACE, BODY (parts and functions of the body), MOTION, LIFE STAGES, SOCIAL CONTEXT, EMOTION and COGNITION.

1.1 Scope of the Project

The results of the project are (a) a lexical resource, called the FrameNet database[3], and (b) associated software tools. The database has three major components (described in more detail below):

• Lexicon containing entries which are composed of: (a) some conventional dictionary-type data, mainly for the sake of human readers; (b) FORMULAS which capture the morphosyntactic ways in which elements of the semantic frame can be realized within the phrases or sentences built up around the word; (c) links to semantically ANNOTATED EXAMPLE SENTENCES which illustrate each of the potential realization patterns identified in the formula;[4] and (d) links to the FRAME DATABASE and to other machine-readable resources such as WordNet and COMLEX.
• Frame Database containing descriptions of each frame's basic conceptual structure and giving names and descriptions for the elements which participate in such structures. Several related entries in this database are schematized in Fig. 1.
• Annotated Example Sentences which are marked up to exemplify the semantic and morphosyntactic properties of the lexical items. (Several of these are schematized in Fig. 2.) These sentences provide empirical support for the lexicographic analysis provided in the frame database and lexicon entries.

These three components form a highly relational and tightly integrated whole: elements in each may point to elements in the other two. The database will also contain estimates of the relative frequency of senses and complementation patterns, calculated by matching the senses and patterns in the hand-tagged examples against the entire BNC.

[1] The project is based at the International Computer Science Institute (1947 Center Street, Berkeley, CA). A fuller bibliography may be found in (Lowe et al., 1997).
[2] Our main corpus is the British National Corpus. We have access to it through the courtesy of Oxford University Press; the POS-tagged and lemmatized version we use was prepared by the Institut für Maschinelle Sprachverarbeitung of the University of Stuttgart. The European collaborators whose participation has made this possible are Sue Atkins, Oxford University Press, and Ulrich Heid, IMS-Stuttgart.
[3] The database will ultimately contain at least 5,000 lexical entries together with a parallel annotated corpus, these in formats suitable for integration into applications which use other lexical resources such as WordNet and COMLEX. The final design of the database will be selected in consultation with colleagues at Princeton (WordNet), ICSI, and IMS, and with other members of the NLP community.
[4] In cases of accidental gaps, clearly marked invented examples may be added.

1.2 Conceptual Model

The FrameNet work is in some ways similar to efforts to describe the argument structures of lexical items in terms of case-roles or theta-roles,[5] but in FrameNet, the role names (called frame elements or FEs) are local to particular conceptual structures (frames); some of these are quite general, while others are specific to a small family of lexical items.

For example, the TRANSPORTATION frame, within the domain of MOTION, provides MOVERS, MEANS of transportation, and PATHS;[6] subframes associated with individual words inherit all of these while possibly adding some of their own. Fig. 1 shows some of the subframes, as discussed below.

frame(TRANSPORTATION)
  frame_elements(MOVER(S), MEANS, PATH)
  scene(MOVER(S) move along PATH by MEANS)

frame(DRIVING)
  inherit(TRANSPORTATION)
  frame_elements(DRIVER (=MOVER), VEHICLE (=MEANS), RIDER(S) (=MOVER(S)), CARGO (=MOVER(S)))
  scenes(DRIVER starts VEHICLE, DRIVER controls VEHICLE, DRIVER stops VEHICLE)

frame(RIDING_1)
  inherit(TRANSPORTATION)
  frame_elements(RIDER(S) (=MOVER(S)), VEHICLE (=MEANS))
  scenes(RIDER enters VEHICLE, VEHICLE carries RIDER along PATH, RIDER leaves VEHICLE)

Figure 1: A subframe can inherit elements and semantics from its parent

The DRIVING frame, for example, specifies a DRIVER (a principal MOVER), a VEHICLE (a particularization of the MEANS element), and potentially CARGO or RIDER as secondary movers. In this frame, the DRIVER initiates and controls the movement of the VEHICLE. For most verbs in this frame, DRIVER or VEHICLE can be realized as subjects; VEHICLE, RIDER, or CARGO can appear as direct objects; and PATH and VEHICLE can appear as oblique complements.

Some combinations of frame elements, or Frame Element Groups (FEGs), for some real corpus sentences in the DRIVING frame are shown in Fig. 2.

A RIDING_1 frame has the primary mover role as RIDER, and allows as VEHICLE those driven by others.[7] In grammatical realizations of this frame, the RIDER can be the subject; the VEHICLE can appear as a direct object or an oblique complement; and the PATH is generally realized as an oblique.

The FrameNet entry for each of these verbs will include a concise formula for all seman- [...] of the full range of use possibilities for individual words, documented with corpus data, the model examples for each use, and the statistical information on relative frequency.

[5] The semantic frames for individual lexical units are typically "blends" of more than one basic frame; from our point of view, the so-called "linking" patterns proposed in LFG, HPSG, and Construction Grammar operate on higher-level frames of action (giving agent, patient, instrument), motion and location (giving theme, location, source, goal, path), and experience (giving experiencer, stimulus, content), etc. In some but not all cases, the assignment of syntactic correlates to frame elements could be mediated by mapping them to the roles of one of the more abstract frames.
[6] A detailed study of motion predicates would require a finer-grained analysis of the Path element, separating out Source and Goal, and perhaps Direction and Area, but for a basic study of the transportation predicates such refined analysis is not necessary. In any case, our work includes the separate analysis of the frame semantics of directional and locational expressions.
[7] A separate frame RIDING_2 that applies to the English verb ride selects means of transportation that can be straddled, such as bicycles, motorcycles, and horses.

FEG       Annotated Example from BNC
D         [D Kate] drove [P home] in a stupor.
V, D      A pregnant woman lost her baby after she fainted as she waited for a bus and fell into the path of [V a lorry] driven [D by her uncle].
D, P      And that was why [D I] drove [P eastwards along Lake Geneva].
D, R, P   Now [D Van Cheele] was driving [R his guest] [P back to the station].
D, V, P   [D Cumming] had a fascination with most forms of transport, driving [V his Rolls] at high speed [P around the streets of London].
D+R, P    [D We] drive [P home along miles of empty freeway].
V, P      Over the next 4 days, [V the Rolls Royces] will drive [P down to Plymouth], following the route of the railway.

Key: D = DRIVER, V = VEHICLE, R = RIDER, P = PATH.
Figure 2: Examples of Frame Element Groups and Annotated Sentences

2 Organization and Workflow

2.1 Overview

The computational side of the FrameNet project is directed at efficiently capturing human insights into semantic structure. The majority of the work involved is marking text with semantic tags, specifying (again by hand) the structure of the frames to be treated, and writing dictionary-style entries based on the results of annotation and a priori descriptions. With the exception of the example sentence extraction component, all the software modules are highly interactive and have substantial user interface requirements. Most of this functionality is provided by WWW-based programs written in PERL.
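The bracket notation used for the annotated sentences in Fig. 2, where each frame element is written as [label words], is regular enough to be processed mechanically. The Python sketch below is illustrative only and is not the project's Perl tooling: it assumes single-letter labels (D, V, R, P) and non-nested brackets, and the function names are hypothetical. It strips the markup, recovers the labeled spans, lists each sentence's FEG in order of first appearance (which matches the FEG column of Fig. 2 for the bracketed labels), and computes the relative frequency of each FEG over a set of sentences, in the spirit of the frequency estimates mentioned in Section 1.1. Conflated roles such as D+R cannot be recovered from the brackets alone and would need a separate annotation field.

```python
import re
from collections import Counter

# One bracketed span such as "[D Kate]" or "[P back to the station]":
# a single uppercase label, whitespace, then the words up to "]".
# Assumption (hypothetical format): labels are one letter and spans never nest.
SPAN_RE = re.compile(r"\[([A-Z])\s+([^\]]+)\]")


def parse_annotation(sentence):
    """Return (plain_text, [(label, span_text), ...]) for one annotated sentence."""
    spans = [(m.group(1), m.group(2)) for m in SPAN_RE.finditer(sentence)]
    # Replace each bracketed span by its words alone, dropping brackets and label.
    plain = SPAN_RE.sub(lambda m: m.group(2), sentence)
    return plain, spans


def frame_element_group(spans):
    """The FEG as a tuple of labels in order of first appearance, e.g. ('D', 'R', 'P')."""
    seen = []
    for label, _ in spans:
        if label not in seen:
            seen.append(label)
    return tuple(seen)


def feg_relative_frequencies(sentences):
    """Map each FEG to its share of the given annotated sentences."""
    counts = Counter(frame_element_group(parse_annotation(s)[1]) for s in sentences)
    total = sum(counts.values())
    return {feg: n / total for feg, n in counts.items()}


if __name__ == "__main__":
    examples = [
        "And that was why [D I] drove [P eastwards along Lake Geneva].",
        "Now [D Van Cheele] was driving [R his guest] [P back to the station].",
        "[D We] drive [P home along miles of empty freeway].",
    ]
    for s in examples:
        plain, spans = parse_annotation(s)
        print(frame_element_group(spans), plain)
    print(feg_relative_frequencies(examples))
```

Keeping the FEG in order of appearance preserves surface order, so "V, D" and "D, V" count as different patterns; sorting the labels instead would group sentences by frame elements alone.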

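The frame inheritance schematized in Fig. 1 can be modeled with a small data structure in which each subframe records, for every frame element it declares, the parent element it particularizes (the "=" links). The following Python sketch is a hypothetical illustration, not FrameNet's database format: the Frame class and its methods are invented for this example, and MOVER(S) and RIDER(S) are collapsed into the singular names MOVER and RIDER. all_elements() returns a frame's own elements plus every inherited element that none of them particularizes, so DRIVING gains PATH unchanged from TRANSPORTATION; root_element() follows the "=" links upward to the most general element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    name: str
    parent: Optional["Frame"] = None
    # Each frame element (FE) declared here, mapped to the parent FE it
    # particularizes (the "=" links of Fig. 1), or to None if it is new here.
    elements: Dict[str, Optional[str]] = field(default_factory=dict)
    scenes: List[str] = field(default_factory=list)

    def all_elements(self) -> List[str]:
        """Own FEs plus inherited FEs that no own FE particularizes or redeclares."""
        if self.parent is None:
            return list(self.elements)
        specialized = {p for p in self.elements.values() if p is not None}
        inherited = [
            fe for fe in self.parent.all_elements()
            if fe not in specialized and fe not in self.elements
        ]
        return list(self.elements) + inherited

    def root_element(self, fe: str) -> str:
        """Follow "=" links upward, e.g. DRIVER -> MOVER; unlinked FEs map to themselves."""
        frame, name = self, fe
        while frame.parent is not None and frame.elements.get(name) is not None:
            name = frame.elements[name]
            frame = frame.parent
        return name


# Entries transcribed from Fig. 1 (with MOVER(S) and RIDER(S) simplified).
TRANSPORTATION = Frame(
    "TRANSPORTATION",
    elements={"MOVER": None, "MEANS": None, "PATH": None},
    scenes=["MOVER move along PATH by MEANS"],
)
DRIVING = Frame(
    "DRIVING",
    parent=TRANSPORTATION,
    elements={"DRIVER": "MOVER", "VEHICLE": "MEANS", "RIDER": "MOVER", "CARGO": "MOVER"},
    scenes=["DRIVER starts VEHICLE", "DRIVER controls VEHICLE", "DRIVER stops VEHICLE"],
)
RIDING_1 = Frame(
    "RIDING_1",
    parent=TRANSPORTATION,
    elements={"RIDER": "MOVER", "VEHICLE": "MEANS"},
    scenes=["RIDER enters VEHICLE", "VEHICLE carries RIDER along PATH", "RIDER leaves VEHICLE"],
)

if __name__ == "__main__":
    for frame in (TRANSPORTATION, DRIVING, RIDING_1):
        print(frame.name, frame.all_elements())
    print(DRIVING.root_element("CARGO"), RIDING_1.root_element("VEHICLE"))
```

Storing the "=" link on the subframe, rather than a list of children on the parent, lets each word-specific frame be added without editing the more general frames it inherits from.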