The Institute for Research in Cognitive Science
University of Pennsylvania
3401 Walnut Street, Suite 400C
Philadelphia, PA 19104-6228
Site of the NSF Science and Technology Center for Research in Cognitive Science
Founded by Benjamin Franklin in 1740
IRCS Report 93-42
December 1993

SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS

Philip Stuart Resnik

A dissertation in Computer and Information Science presented to the Faculties of the University of Pennsylvania in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 1993.

Aravind Joshi, Supervisor of Dissertation
Mark Steedman, Graduate Group Chairperson

Copyright 1993 by Philip Stuart Resnik

For Michael Resnik

Abstract

Selection and Information: A Class-Based Approach to Lexical Relationships
Philip Stuart Resnik
Supervisor: Aravind Joshi

Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement "The number two is blue" may be syntactically well formed, but at some level it is anomalous: BLUE is not a predicate that can be applied to numbers. According to the influential theory of Katz and Fodor (1964), a predicate associates a set of defining features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g., Johnson-Laird, 1983) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge.
This provides a better match for the empirical phenomena, but it opens up a different problem: if selectional constraints are the same as inferences in general, then accounting for them will require a much more complete understanding of knowledge representation and inference than we have at present. The problem, then, is this: how can a theory of selectional constraints be elaborated without first having either an empirically adequate theory of defining features or a comprehensive theory of inference? In this dissertation, I suggest that an answer to this question lies in the representation of conceptual knowledge. Following Miller (1990b), I adopt a "differential" approach to conceptual representation, in which a conceptual taxonomy is defined in terms of inferential relationships rather than definitional features. Crucially, however, the inferences underlying the stored knowledge are not made explicit. My hypothesis is that a theory of selectional constraints need make reference only to knowledge stored in such a taxonomy, without ever referring overtly to inferential processes. I propose such a theory, formalizing selectional relationships in probabilistic terms: the selectional behavior of a predicate is modeled as its distributional effect on the conceptual classes of its arguments. This is expressed using the information-theoretic measure of relative entropy (Kullback and Leibler, 1951), which leads to an illuminating interpretation of what selectional constraints are: the strength of a predicate's selection for an argument is identified with the quantity of information it carries about that argument. In addition to arguing that the model is empirically adequate, I explore its application to two problems. The first concerns a linguistic question: why some transitive verbs permit implicit direct objects ("John ate") and others do not ("*John brought").
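The relative-entropy formalization sketched in the abstract can be made concrete. The following is a minimal illustration only, not the dissertation's actual estimation procedure: the class inventory and every probability below are invented toy numbers. It computes a predicate's selectional strength as the relative entropy between the distribution over conceptual classes of its argument and the prior distribution over those classes, so that a strongly selecting verb carries more information about its argument.

```python
from math import log2

def selection_strength(p_class_given_pred, p_class):
    """Relative entropy D( P(c|p) || P(c) ): how far the predicate p
    shifts the prior distribution over its argument's conceptual classes.
    Measured in bits; zero when the predicate leaves the prior unchanged."""
    return sum(
        p * log2(p / p_class[c])
        for c, p in p_class_given_pred.items()
        if p > 0  # terms with P(c|p) = 0 contribute nothing
    )

# Hypothetical prior over conceptual classes of direct objects.
prior = {"food": 0.25, "artifact": 0.50, "abstraction": 0.25}

# A strongly selecting verb like "eat" concentrates mass on FOOD ...
eat = {"food": 0.90, "artifact": 0.05, "abstraction": 0.05}
# ... while a weakly selecting verb like "bring" barely shifts the prior.
bring = {"food": 0.30, "artifact": 0.45, "abstraction": 0.25}

print(selection_strength(eat, prior))    # large: "eat" is informative about its object
print(selection_strength(bring, prior))  # near zero: "bring" says little about its object
```

On these toy numbers, "eat" scores far higher than "bring", matching the abstract's identification of selection strength with the quantity of information a predicate carries about its argument.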
It has often been observed informally that the omission of objects is connected to the ease with which the object can be inferred. I have made this observation more formal by positing a relationship between selectional constraints and inferability. This predicts (i) that verbs permitting implicit objects select more strongly for (i.e., carry more information about) that argument than verbs that do not, and (ii) that strength of selection is a predictor of how often verbs omit their objects in naturally occurring utterances. Computational experiments confirm these predictions. Second, I have explored the practical applications of the model in resolving syntactic ambiguity. A number of authors have recently begun investigating the use of corpus-based lexical statistics in automatic parsing; the results of computational experiments using the present model suggest that many lexical relationships are better viewed in terms of underlying conceptual relationships. Thus the information-theoretic measures proposed here can serve not only as components in a theory of selectional constraints, but also as tools for practical natural language processing.

Acknowledgements

In the past, I have occasionally read the acknowledgements in other people's papers and dissertations and thought, well, they really do seem to have thrown in the kitchen sink, haven't they? Which of those people really had something significant to do with this work? I will never think that thought again. Having sat down to acknowledge my debt to the people around me in making this dissertation happen, I realize that the number of people who have had a real influence is enormous, and that simply listing their names is an expedient but criminally understated way of recognizing their contribution. I am very grateful to my advisor, Aravind Joshi, for his support, for his guidance, and for his role (together with his co-director, Lila Gleitman) in creating the Institute for Research in Cognitive Science.
I'm fortunate to have been a part of IRCS at such an exciting time. I would like to thank the members of my dissertation committee: Steve Abney, Lila Gleitman, Mark Liberman, and Mitch Marcus. They are individually extraordinary, and together they form a committee with enormous personality and intellect. I would like to thank the participants in Penn's Computational Linguistics Feedback Forum (CLiFF group), which is to say my fellow grad students, the IRCS postdocs, and the natural language faculty at Penn, for their support and constructive criticism. In particular, this work has profited from discussions with Eric Brill, Barbara DiEugenio, Bob Frank, Michael Hegarty, Jamie Henderson, Shyam Kapur, Libby Levison, Dave Magerman, Michael Niv, Sandeep Prasada, Owen Rambow, Robert Rubinoff, Giorgio Satta, Jeff Siskind, Mark Steedman, Lyn Walker, Mike White, Bonnie Webber, and David Yarowsky. I have also had extremely helpful discussions with Kevin Atteson, Ken Church, Ido Dagan, Christiane Fellbaum, Jane Grimshaw, Marti Hearst, Donald Hindle, Tony Kroch, Annie Lederer, Robbie Mandelbaum, Gail Mauner, George Miller, Max Mintz, Fernando Pereira, and Stuart Shieber. I know it's a long list, but each name I look at calls to mind an important discussion, a shared insight, or, more often than not, a whole blur of images and conversations over time. I'm grateful to have been a part of the Gleitmans' "cheese" seminar, with thanks especially to Henry Gleitman, Lila Gleitman, Mike Kelly, and Barbara Landau. Those meetings are, I think, the very essence of what research is about. I hope that someday I can manage to come close to recreating something like it for other generations of students. I owe a debt of gratitude to Nan Biltz, Carolyn Elken, Dawn Greisbach, Chris Sandy, Estelle Taylor, and Trisha Yannuzzi for all their help with the ins and outs of the department and the university.
Ditto for the computational support of Mark-Jason Dominus, Mark Foster, Ira Winston, and Martin Zaidel. I would like to thank George Miller and the WordNet "lexigang" for their continued interest and for making WordNet freely available. I would like to acknowledge helpful conversations with my IBM fellowship technical liaison, Wlodek Zadrozny, and to express my gratitude to Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Fred Jelinek, and Bob Mercer for the enormous amount I learned working with them at the IBM T. J. Watson Research Center. Those are just the debts I've incurred during four years at Penn. I also owe a great many thanks to the people who helped me get started in research as an undergraduate, in particular Barbara Grosz, John Whitman, and Bill Woods. (Thanks, too, to Laurence Bouvard, for helping to inspire my interest in linguistics.) And thanks to those I worked with and learned from at Bolt Beranek and Newman, in particular Rusty Bobrow, Bob Ingria, Lance Ramshaw, and Ralph Weischedel. As for the personal debts, words on paper seem especially inadequate. The support and love of my parents are constants that I could not have done without. And it feels to me as if my old friends (especially Debbie Co, Dan Josell, and Lynn Stein) and new friends (especially Howard Lang, Libby Levison, Robbie Mandelbaum, and Owen Rambow) have saved my life more times than I can count. Finally: Tracy and Benjamin. Words could not possibly say. Oh yes, and then there's money. This work was partially supported by the following grants: ARO DAAL 03-89-C-0031, DARPA N00014-90-J-1863, NSF IRI 90-16592, and Ben Franklin 91S.3078C-1, and by an IBM Graduate Fellowship.

Contents

Abstract
Acknowledgements

1 Introduction
1.1 Setting
1.2 Argument
1.3 Chapter