A Freely Available Syntactic Lexicon for English
University of Pennsylvania ScholarlyCommons
IRCS Technical Reports Series, Institute for Research in Cognitive Science
April 1995

Dania Egedi, University of Pennsylvania
Paul Martin, SRA

Egedi, Dania and Martin, Paul, "A Freely Available Syntactic Lexicon for English" (1995). IRCS Technical Reports Series. 125. https://repository.upenn.edu/ircs_reports/125

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-95-11.

Abstract

This paper presents a syntactic lexicon for English that was originally derived from the Oxford Advanced Learner's Dictionary and the Oxford Dictionary of Current Idiomatic English, and then modified and augmented by hand. There are more than 37,000 syntactic entries from all 8 parts of speech. An X-windows based tool is available for maintaining the lexicon and performing searches. C and Lisp hooks are also available so that the lexicon can be easily utilized by parsers and other programs.
The Institute for Research in Cognitive Science
University of Pennsylvania
3401 Walnut Street, Suite 400C, Philadelphia, PA 19104-6228
Site of the NSF Science and Technology Center for Research in Cognitive Science
IRCS Report 95-11, April 1995 (originally published in August 1994)

Appears in the Proceedings of the International Workshop on Sharable Natural Language Resources, Nara, Japan, August 1994.

Dania Egedi and Patrick Martin
Institute for Research in Cognitive Science
University of Pennsylvania
Philadelphia, PA, USA
{egedi,martin}@unagi.cis.upenn.edu
(Patrick Martin is currently at SRA, Arlington, VA, USA; martinp@sra.com)

Abstract

This paper presents a syntactic lexicon for English that was originally derived from the Oxford Advanced Learner's Dictionary and the Oxford Dictionary of Current Idiomatic English, and then modified and augmented by hand. There are more than 37,000 syntactic entries from all 8 parts of speech. An X-windows based tool is available for maintaining the lexicon and performing searches. C and Lisp hooks are also available so that the lexicon can be easily utilized by parsers and other programs.

Introduction

One of the central needs of any wide-coverage parser is a large lexicon that contains the syntactic information for various lexical items. The creation of such a lexicon has traditionally been a very large and daunting task, and most universities have shied away from it, leaving the creation of wide-coverage parsers to commercial institutions that could afford the time and personnel to devote to the creation of such a lexicon. The release of several machine-readable dictionaries (MRDs) into the public domain has opened new possibilities to grammar developers at research institutions, but the task did not become trivial. The problem of creating large scale lexicons changed from the tiresome, painstaking task of trying to develop individual word lists for various syntactic phenomena to the task of "simply" extracting the information from the on-line dictionaries. This, however, has not turned out to be as simple or straightforward as researchers may have hoped. Machine readable dictionaries present numerous problems in terms of errors and inconsistencies in the various components of the lexical entries, making extraction quite difficult. Many researchers abandon the extraction process altogether because it consumes too many scarce resources.

Although a number of researchers have extracted information out of the various dictionaries available, the resulting lexicons have not, in general, been made freely available to the NLP research community. In at least some cases (Carroll and Grover; Guthrie et al.), this is due to licensing restrictions on the source dictionaries. In response to the related problems of duplication of effort and non-availability of needed lexicons, there are currently several ongoing projects to create syntactic lexicons and make them generally available:

- The Proteus Project at New York University is developing the Comlex Syntactic Dictionary from scratch for release as one of the lexical resources in COMLEX, available through the Linguistic Data Consortium (Macleod et al.).

- The IITLEX project at Illinois Institute of Technology has an ongoing project to extract and release the information in the Collins English Dictionary, along with information from various other word lists, that will include both syntactic and semantic information. That system is still under development, however, and currently uses an expensive relational database package, a drawback which they plan to correct (Conlon).

The syntactic lexicon described here contains approximately 37,000 entries extracted from the Oxford Advanced Learner's Dictionary of Current English (Hornby) and the Oxford Dictionary of Current Idiomatic English (Cowie and Mackin). It is available via FTP in both an ASCII and a database format. The database format uses a UNIX hash table facility (Seltzer and Yigit) that is freely distributed, and comes with an X-windows based interface for modifying the database and doing searches. C and Lisp hooks to allow other programs to use the database are also included.

Syntactic Lexicon

The syntactic lexicon has entries for 8 part-of-speech categories: Adjective, Adverb, Complementizer, Conjunction, Determiner, Noun, Preposition, and Verb. Each entry consists of the following required and optional fields:

- index field (required): the uninflected form under which the lexical item is compiled in the database.
- entry field (required): contains all of the lexical items associated with the index.
- pos field (required): gives the part-of-speech for the lexical items in the entry field.
- frame field (required): contains the syntactic information about that entry.
- fs field (optional): the Feature Structure field may provide additional information about the frame field.
- ex field (optional): may be used for any number of example sentences.

(Because the syntactic database is part of the XTAG project (Doran et al.), an ongoing project to develop a wide-coverage parser for English that is described in a later section, some entries in the syntactic lexicon reflect specific XTAG analyses. For example, a verb particle construction would be indexed under the verb but would contain both the verb and the verb particle in the entry field.)

Note that lexical items may have more than one entry in the database (e.g. have) and that they may select the same frame field more than once, using the fs to capture lexical idiosyncrasies (e.g. map). Table 1 shows selected entries from the database.

INDEX: have
ENTRY: have
POS:   Verb
FRAME: Auxiliary Verb
FS:    Goes on Infinitive
EX:    John has to go to the store.

INDEX: have
ENTRY: have
POS:   Verb
FRAME: Transitive Verb
FS:    Non-Ergative
EX:    John has a problem.

INDEX: map
ENTRY: map out
POS:   Verb, Verb Particle
FRAME: Transitive Verb Particle

INDEX: map
ENTRY: map
POS:   Noun
FRAME: Base Noun, Determiner required Noun, Noun Modifier
FS:    wh, reflexive

INDEX: map
ENTRY: map
POS:   Noun
FRAME: Determiner not required Noun
FS:    wh, reflexive, plural

Table 1: Selected Syntactic Database Entries
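As an illustration only, the six-field entry layout and the Table 1 entries can be modelled in a few lines of Python. This is a minimal sketch and not the distributed database format or the C and Lisp hooks: the names LexEntry, build_lexicon, frames_for, TABLE_1 and LEXICON are hypothetical, and the feature values are written without signs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    """One syntactic entry with the six fields described in the text."""
    index: str        # uninflected form the entry is filed under
    entry: tuple      # lexical items in the entry, e.g. ("map", "out")
    pos: tuple        # part of speech for each lexical item
    frame: tuple      # syntactic frames selected by the entry
    fs: tuple = ()    # optional feature-structure values
    ex: tuple = ()    # optional example sentences


def build_lexicon(entries):
    """Group entries by index; a single index may hold several entries."""
    lexicon = defaultdict(list)
    for item in entries:
        lexicon[item.index].append(item)
    return dict(lexicon)


def frames_for(lexicon, index, pos=None):
    """Collect frames for an index, optionally keeping one part of speech."""
    frames = []
    for item in lexicon.get(index, []):
        if pos is None or pos in item.pos:
            frames.extend(item.frame)
    return frames


# Entries transcribed from Table 1 (hypothetical in-memory encoding).
TABLE_1 = [
    LexEntry("have", ("have",), ("Verb",), ("Auxiliary Verb",),
             ("Goes on Infinitive",), ("John has to go to the store.",)),
    LexEntry("have", ("have",), ("Verb",), ("Transitive Verb",),
             ("Non-Ergative",), ("John has a problem.",)),
    LexEntry("map", ("map", "out"), ("Verb", "Verb Particle"),
             ("Transitive Verb Particle",)),
    LexEntry("map", ("map",), ("Noun",),
             ("Base Noun", "Determiner required Noun", "Noun Modifier"),
             ("wh", "reflexive")),
    LexEntry("map", ("map",), ("Noun",), ("Determiner not required Noun",),
             ("wh", "reflexive", "plural")),
]

LEXICON = build_lexicon(TABLE_1)
```

Keeping a list of entries per index mirrors the observation that a lexical item such as have may own more than one entry, while a verb particle construction such as map out is still filed under the verb.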
the syn ab out the frame eld tactic lexicon reect sp ecic XTAG analyses For example a verb particle construction would b e In fact the graphical interface for the syntac indexed under the verb but would contain b oth the tic lexicon describ ed in Section can run in verb and the verb particle in the entry eld in predicative sentences Other frames provide two mo des xtag and verb oseTables information ab out the use of the noun with and were all generated in verb ose mo de determiners when forming noun phrases The The vast ma jority of lexical items in the frames for noun are presented b elow database fall into just categories Adjectives Nouns and Verbs These three categories plus Base noun All nouns Adverbs are presented in more detail in the fol lowing subsections Noun Phrase with Determiner Nouns that can take a determiner when Adjectives forming a noun phrase Ex a mana jealousy There are lexical adjectives in the database of which are Prop er Name adjec Noun Phrase without Determiner tives suchas Chinese and American Adjec Nouns that can app ear without a deter tives have frames that they can select which miner when forming a noun phrase Ex are listed b elow Possible values for the fs eld envyplant are wh and wh Mo difying noun Nouns that can mo d Base adjective All adjectives ify other nouns Note that not all nouns can mo dify other nouns Prop er nouns in Mo difying adjective Adjectives that general cannot mo dify other nouns and can o ccur in direct mo dication contexts sp ecic lexical items may b e restricted as Ex the Chinese man well Ex basketball gameJohn car Predicative adjective Adjectives that Noun with sentential complement can o ccur as the complementofapredica Nouns that takesentential complements tiveverb Ex John was happy Ex the fact that Mary loves John e adjective w sentential Predicativ Predicative noun Nouns that can o ccur complement Adjectives that can o ccur as the complement of a predicativeverb as the complement of a predicativeverb Ex John 
was a man and that take a sentential complement Ex John was happy that Mary left Bil l tential sub Predicativenounwsen ject Nouns