The Institute for Research in Cognitive Science

Selection and Information: A Class-Based Approach to Lexical Relationships (Ph.D. Dissertation)
by Philip Stuart Resnik
University of Pennsylvania, 3401 Walnut Street, Suite 400C, Philadelphia, PA 19104-6228
December 1993
Site of the NSF Science and Technology Center for Research in Cognitive Science
University of Pennsylvania, IRCS Report 93-42
Founded by Benjamin Franklin in 1740

SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS
Philip Stuart Resnik
A dissertation in Computer and Information Science
Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
1993
Aravind Joshi, Supervisor of Dissertation
Mark Steedman, Graduate Group Chairperson

© Copyright 1993 by Philip Stuart Resnik

For Michael Resnik

Abstract
Selection and Information: A Class-Based Approach to Lexical Relationships
Philip Stuart Resnik
Supervisor: Aravind Joshi

Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement “The number two is blue” may be syntactically well formed, but at some level it is anomalous: BLUE is not a predicate that can be applied to numbers. According to the influential theory of Katz and Fodor (1964), a predicate associates a set of defining features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g., Johnson-Laird, 1983) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge.
This provides a better match for the empirical phenomena, but it opens up a different problem: if selectional constraints are the same as inferences in general, then accounting for them will require a much more complete understanding of knowledge representation and inference than we have at present. The problem, then, is this: how can a theory of selectional constraints be elaborated without first having either an empirically adequate theory of defining features or a comprehensive theory of inference? In this dissertation, I suggest that an answer to this question lies in the representation of conceptual knowledge. Following Miller (1990b), I adopt a “differential” approach to conceptual representation, in which a conceptual taxonomy is defined in terms of inferential relationships rather than definitional features. Crucially, however, the inferences underlying the stored knowledge are not made explicit. My hypothesis is that a theory of selectional constraints need make reference only to knowledge stored in such a taxonomy, without ever referring overtly to inferential processes. I propose such a theory, formalizing selectional relationships in probabilistic terms: the selectional behavior of a predicate is modeled as its distributional effect on the conceptual classes of its arguments. This is expressed using the information-theoretic measure of relative entropy (Kullback and Leibler, 1951), which leads to an illuminating interpretation of what selectional constraints are: the strength of a predicate's selection for an argument is identified with the quantity of information it carries about that argument. In addition to arguing that the model is empirically adequate, I explore its application to two problems. The first concerns a linguistic question: why some transitive verbs permit implicit direct objects (“John ate”) and others do not (“*John brought”).
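As a rough illustration of the relative-entropy formulation described above, the following Python sketch computes a predicate's selectional strength as D(P(c|p) || P(c)), summed over conceptual classes c, where P(c) is the prior distribution over argument classes and P(c|p) is the distribution given the predicate p. Everything concrete here is an assumption made for illustration: the toy taxonomy and its class names, the invented object counts, the function names `class_distribution` and `selectional_strength`, and the rule of splitting each noun's count evenly among the classes that contain it. It is a sketch of the general idea, not the dissertation's implementation, and the counts are not corpus data.

```python
import math
from collections import defaultdict

# Hypothetical toy taxonomy (an assumption for illustration, not WordNet):
# each noun lists the conceptual classes that contain it.
NOUN_CLASSES = {
    "apple": ["food", "plant_part"],
    "bread": ["food"],
    "soup": ["food"],
    "book": ["artifact"],
    "letter": ["artifact", "message"],
    "friend": ["person"],
}


def class_distribution(noun_counts):
    """Convert noun counts into a probability distribution over classes.

    Assumption: each occurrence of a noun is split evenly among the
    classes it belongs to, so an ambiguous noun is not counted twice.
    """
    class_counts = defaultdict(float)
    for noun, count in noun_counts.items():
        classes = NOUN_CLASSES[noun]
        for c in classes:
            class_counts[c] += count / len(classes)
    total = sum(class_counts.values())
    return {c: n / total for c, n in class_counts.items()}


def selectional_strength(p_class_given_pred, p_class):
    """Relative entropy D(P(c|p) || P(c)) in bits.

    Terms with P(c|p) = 0 contribute nothing; P(c) must be positive
    wherever P(c|p) is positive.
    """
    return sum(
        p * math.log2(p / p_class[c])
        for c, p in p_class_given_pred.items()
        if p > 0
    )


# Invented direct-object counts for two verbs (not corpus data).
objects_of = {
    "eat": {"apple": 4, "bread": 3, "soup": 3},
    "bring": {"apple": 1, "book": 3, "letter": 3, "friend": 3},
}

# Prior P(c): pool the objects of every verb in this toy sample.
pooled = defaultdict(int)
for counts in objects_of.values():
    for noun, n in counts.items():
        pooled[noun] += n
prior = class_distribution(pooled)

# Selectional strength of each verb for its direct object.
strengths = {
    verb: selectional_strength(class_distribution(counts), prior)
    for verb, counts in objects_of.items()
}
for verb, s in strengths.items():
    print(f"{verb}: {s:.3f} bits")
```

With these invented counts, "eat" comes out with a higher selectional strength than "bring", the direction that prediction (i) in the abstract describes. That ordering follows from how the toy counts were chosen and is not evidence for the claim. The measure is zero when a predicate leaves the class distribution unchanged and grows as the predicate concentrates probability on fewer classes.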
It has often been observed informally that the omission of objects is connected to the ease with which the object can be inferred. I have made this observation more formal by positing a relationship between selectional constraints and inferability. This predicts (i) that verbs permitting implicit objects select more strongly for (i.e., carry more information about) that argument than verbs that do not, and (ii) that strength of selection is a predictor of how often verbs omit their objects in naturally occurring utterances. Computational experiments confirm these predictions. Second, I have explored the practical applications of the model in resolving syntactic ambiguity. A number of authors have recently begun investigating the use of corpus-based lexical statistics in automatic parsing; the results of computational experiments using the present model suggest that many lexical relationships are better viewed in terms of underlying conceptual relationships. Thus the information-theoretic measures proposed here can serve not only as components in a theory of selectional constraints, but also as tools for practical natural language processing.

Acknowledgements

In the past, I have occasionally read the acknowledgements in other people's papers and dissertations and thought, well, they really do seem to have thrown in the kitchen sink, haven't they? Which of those people really had something significant to do with this work? I will never think that thought again. Having sat down to acknowledge my debt to the people around me in making this dissertation happen, I realize that the number of people who have had a real influence is enormous, and that simply listing their names is an expedient but criminally understated way of recognizing their contribution. I am very grateful to my advisor, Aravind Joshi, for his support, for his guidance, and for his role (together with his co-director, Lila Gleitman) in creating the Institute for Research in Cognitive Science.
I'm fortunate to have been a part of IRCS at such an exciting time. I would like to thank the members of my dissertation committee: Steve Abney, Lila Gleitman, Mark Liberman, and Mitch Marcus. They are individually extraordinary, and together they form a committee with enormous personality and intellect. I would like to thank the participants in Penn's Computational Linguistics Feedback Forum (CLiFF group), which is to say my fellow grad students, the IRCS postdocs, and the natural language faculty at Penn, for their support and constructive criticism. In particular, this work has profited from discussions with Eric Brill, Barbara DiEugenio, Bob Frank, Michael Hegarty, Jamie Henderson, Shyam Kapur, Libby Levison, Dave Magerman, Michael Niv, Sandeep Prasada, Owen Rambow, Robert Rubinoff, Giorgio Satta, Jeff Siskind, Mark Steedman, Lyn Walker, Mike White, Bonnie Webber, and David Yarowsky. I have also had extremely helpful discussions with Kevin Atteson, Ken Church, Ido Dagan, Christiane Fellbaum, Jane Grimshaw, Marti Hearst, Donald Hindle, Tony Kroch, Annie Lederer, Robbie Mandelbaum, Gail Mauner, George Miller, Max Mintz, Fernando Pereira, and Stuart Shieber. I know it's a long list, but each name I look at calls to mind an important discussion, a shared insight, or, more often than not, a whole blur of images and conversations over time. I'm grateful to have been a part of the Gleitmans' “cheese” seminar, with thanks especially to Henry Gleitman, Lila Gleitman, Mike Kelly, and Barbara Landau. Those meetings are, I think, the very essence of what research is about. I hope that someday I can manage to come close to recreating something like it for other generations of students. I owe a debt of gratitude to Nan Biltz, Carolyn Elken, Dawn Greisbach, Chris Sandy, Estelle Taylor, and Trisha Yannuzzi for all their help with the ins and outs of the department and the university.
Ditto for the computational support of Mark-Jason Dominus, Mark Foster, Ira Winston, and Martin Zaidel. I would like to thank George Miller and the WordNet “lexigang” for their continued interest and for making WordNet freely available. I would like to acknowledge helpful conversations with my IBM fellowship technical liaison, Wlodek Zadrozny, and to express my gratitude to Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Fred Jelinek, and Bob Mercer for the enormous amount I learned working with them at the IBM T. J. Watson Research Center. Those are just the debts I've incurred during four years at Penn. I also owe a great many thanks to the people who helped me get started in research as an undergraduate, in particular Barbara Grosz, John Whitman, and Bill Woods. (Thanks, too, to Laurence Bouvard, for helping to inspire my interest in linguistics.) And thanks to those I worked with and learned from at Bolt Beranek and Newman, in particular Rusty Bobrow, Bob Ingria, Lance Ramshaw, and Ralph Weischedel. As for the personal debts, words on paper seem especially inadequate. The support and love of my parents are constants that I could not have done without. And it feels to me as if my old friends (especially Debbie Co, Dan Josell, and Lynn Stein) and new friends (especially Howard Lang, Libby Levison, Robbie Mandelbaum, and Owen Rambow) have saved my life more times than I can count. Finally: Tracy and Benjamin. Words could not possibly say. Oh yes, and then there's money. This work was partially supported by the following grants: ARO DAAL 03-89-C-0031, DARPA N00014-90-J-1863, NSF IRI 90-16592, and Ben Franklin 91S.3078C-1, and by an IBM Graduate Fellowship.

Contents
Abstract
Acknowledgements
1 Introduction
1.1 Setting
1.2 Argument
1.3 Chapter
Recommended publications
  • The FINITE STRING Newsletter Programs 8000 Words), Double
    The FINITE STRING Newsletter Programs 8000 words), double-spaced, by 1 December 1985, to Bestougeff, Ligozat - Parametrised abstract objects for the Chairman of the Program Committee: linguistic information processing Salton - On the representation of query term relations by Prof. Makoto Nagao (Kyoto) soft Boolean operators Dept. of Electrical Engineering Kyoto University 29 MARCH Sakyo-ku, Kyoto, 606, Japan MORNING The Program Committee will respond before 15 March Altman - The resolution of local syntactic ambiguity by 1986. the human sentence processing mechanism The complete text of the revised papers in camera- Pulman - A parser that doesn't ready form should be sent before 1 May 1986 to Delmonte - Parsing difficulties and phonological processing Winfried Lenders in Italian Institut für Kommunikationsforschung und Phonetik Izumida et al. - A natural language interface using a world der Universität Bonn model Poppelsdorfer Allee 47 Berry-Rogghe - Interpreting singular definite descriptions D-5300 Bonn 1 in database queries Bree, Smit - Non-standard uses of if Wehrli - Design and implementation of a lexical data base PROGRAMS Maistros, Kotsanis - Lexifamis: A lexical analyser of modern Greek ACL EUROPEAN CHAPTER: Beale - Grammatical analysis by computer of the SECOND CONFERENCE Lancaster-Oslo/Bergen corpus 28 MARCH 1985 Fimbel et al. - Using a text model for analysis and generation MORNING Gillott - The simulation of stress patterns in synthetic Opening Session: Invited Speaker speech - a two level problem Kornai - Natural languages and the Chomsky hierarchy Johnston, Altman - Automatic speech recognition: a Hess - How does Natural Language Quantify framework for research Stifling - Distributives, quantifiers, and a multiplicity of AFTERNOON events Garside - A probabilistic parser Slocum and Bennett - An evaluation of METAL Boguraev, Briscoe - Toward a dictionary support environ- Root - A two-way approach to structural transfer in MT ment for real time parsing Boitet et al.
  • Download the Complete Issue
    ISSN 0976-0962 IJCLA International Journal of Computational Linguistics and Applications Vol. 3 No. 2 Jul-Dec 2012 Guest Editor Yasunari Harada Editor-in-Chief Alexander Gelbukh © BAHRI PUBLICATIONS (2012) ISSN 0976-0962 International Journal of Computational Linguistics and Applications Vol. 3 No. 2 Jul-Dec 2012 International Journal of Computational Linguistics and Applications – IJCLA (started in 2010) is a peer-reviewed international journal published twice a year, in June and December. It publishes original research papers related to computational linguistics, natural language processing, human language technologies and their applications. The views expressed herein are those of the authors. The journal reserves the right to edit the material. © BAHRI PUBLICATIONS (2013). All rights reserved. No part of this publication may be reproduced by any means, transmitted or translated into another language without the written permission of the publisher. Indexing: Cabell's Directory of Publishing Opportunities. Editor-in-Chief: Alexander Gelbukh Subscription: India: Rs. 2699 Rest of the world: US$ 249 Payments can be made by Cheques/Bank Drafts/International Money Orders drawn in the name of BAHRI PUBLICATIONS, NEW DELHI and sent to: BAHRI PUBLICATIONS 1749A/5, 1st Floor, Gobindpuri Extension, P. O. Box 4453, Kalkaji, New Delhi 110019 Telephones: 011-65810766, (0) 9811204673, (0) 9212794543 E-mail: [email protected]; [email protected] Website: http://www.bahripublications.com Printed & Published by Deepinder Singh Bahri, for and
  • Obituary Aravind K. Joshi
    Obituary Aravind K. Joshi Bonnie Webber University of Edinburgh School of Informatics [email protected] It might surprise some young researchers that the first recipient of the ACL LifeTime Achievement award, Aravind Joshi, was so often compared to “Yoda,” one of the oldest and most powerful of the Jedi Masters in the Star Wars universe. But Aravind was also one of the kindest, wisest, and most justly celebrated people that one was fortunate to know. When Aravind received the award in 2002, at the age of 73, he said that he hoped his lifetime wasn’t over. Fortunately, it wasn’t: For the next 15 years, Aravind continued to enjoy time spent on research; advising students and younger colleagues; attending ACL conferences both at home in the United States and in far-flung places such as Sydney, Singapore, and Jeju Island; and enjoying the company of his extraordinary wife, the embryologist Susan Heyner, his daughters, Meera and Shyamala Joshi, and his grandchildren, Marco and Ava. Then on 31 December 2017, Aravind died peacefully at home in Philadelphia, sitting in his favorite chair, at the age of 88. Aravind Joshi was born in Pune, India, on 5 August, 1929. He sailed to the United States in 1954 to study electrical engineering (EE) at the University of Pennsylvania, after he was rejected by Harvard because his application, mailed from India, arrived a day late. While completing his M.Sc. in EE, he worked as an engineer at RCA (Camden, NJ), and then while completing his Ph.D. in EE, as a research assistant at the University of Pennsylvania’s Department of Linguistics.
  • Cliff Notes on the Net
    CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania. Annual Report: 1994, No. 4. Technical Report MS-CIS-95-07, LINC Lab 282, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104-6389. Editors: Matthew Stone & Libby Levison. cmp-lg/9506008, 9 Jun 1995. Contents: I Introduction. II Abstracts: Breck Baldwin; Tilman Becker; Betty J. Birner; Sandra Carberry; Christine Doran; Dania Egedi; Jason Eisner; Christopher Geib; Abigail S. Gertner; James Henderson; Beth Ann Hockey; Beryl Hoffman; Paul S. Jacobs; Aravind K. Joshi; Jonathan M. Kaye; Albert Kim; Nobo Komagata; Seth Kulick; Sadao Kurohashi; Libby Levison; D. R. Mani; Mitch Marcus; I. Dan Melamed; Michael B. Moore; Charles L. Ortiz, Jr.; Martha Palmer; Hyun S Park; Jong Cheol Park; Scott Prevost; Ellen F. Prince; Lance A. Ramshaw; Lisa F. Rau; Jeffrey C. Reynar; Francesc Ribas i Framis; James Rogers; Bernhard Rohrbacher; Joseph Rosenzweig; Deborah Rossen-Knill; Anoop Sarkar; B. Srinivas; Mark Steedman; Matthew Stone; Anne Vainikka; Bonnie Lynn Webber; Michael White; David Yarowsky. III Projects and Working Groups: The AnimNL Project; The Gesture Jack project; Information Theory Reading Group; The STAG Machine Translation Project; The TraumAID Project; The XTAG Project. IV Appendix. Part I Introduction: This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty.
  • Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
    University of Pennsylvania ScholarlyCommons IRCS Technical Reports Series Institute for Research in Cognitive Science March 1995 Research in the Language, Information and Computation Laboratory of the University of Pennsylvania Matthew Stone University of Pennsylvania Libby Levison University of Pennsylvania Follow this and additional works at: https://repository.upenn.edu/ircs_reports Stone, Matthew and Levison, Libby, "Research in the Language, Information and Computation Laboratory of the University of Pennsylvania" (1995). IRCS Technical Reports Series. 120. https://repository.upenn.edu/ircs_reports/120 University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-95-05. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/ircs_reports/120 For more information, please contact [email protected]. Research in the Language, Information and Computation Laboratory of the University of Pennsylvania Abstract This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
  • Ms Niha Dingankar Assistant Professor and Dr Anita Chaware, Head PG Depa
    Prepared by First Year MCA Students. Compiled by Ms Niha Dingankar, Assistant Professor, and Dr Anita Chaware, Head, PG Department of Computer Science, SNDT WU. National Technology Day is celebrated on 11 May across India. The day marks the successful test of the Shakti-I nuclear device at the Indian Army's Pokhran Test Range in Rajasthan on 11 May 1998. The day's focus is on rebooting the economy through science and technology. Operation Shakti, also known as Pokhran-II, was India's second series of nuclear tests; the first, code-named ‘Smiling Buddha’, was carried out in May 1974. Pokhran-II was coordinated by the late president and aerospace engineer Dr APJ Abdul Kalam, when Atal Bihari Vajpayee was the Prime Minister. The students of the Post Graduate Department of Computer Science, SNDT University, have prepared a list of computer scientists from Maharashtra and Orissa. The song played in the background is Jayostute Shree Mahanmangale, with original lyrics by Shree Veer Savarkar, sung by Lata Mangeshkar. Chhatrapati Shivaji Maharaj. Mukteswar Temple, Odisha. Aravind Joshi (5 August 1929 - 31 December 2017): Aravind Krishna Joshi was the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania. Joshi defined the tree-adjoining grammar formalism, which is often used in computational linguistics and natural language processing. Awards: Guggenheim Fellow, 1971–72; Fellow of the Institute of Electrical and Electronics Engineers (IEEE), 1976; Founding Fellow of the American Association for Artificial Intelligence (AAAI), 1990; Benjamin Franklin Medal. Vijay P. Bhatkar (born 11 October 1946, Mumbai, India): Vijay Pandurang Bhatkar is an Indian computer scientist, IT leader and educationalist.