Cliff Notes on the Net

Total Page:16

File Type:pdf, Size:1020Kb

Cliff Notes on the Net CLiFF Notes Research in the Language, Information and Computation Laboratory of the University of Pennsylvania Annual Report: 1994, No. 4 Technical Report MS-CIS-95-07 LINC Lab 282 Department of Computer and Information Science University of Pennsylvania Philadelphia, PA 19104-6389 Editors: Matthew Stone & Libby Levison cmp-lg/9506008 9 Jun 1995 i Contents I Introduction v II Abstracts 1 Breck Baldwin 2 Tilman Becker 4 Betty J. Birner 6 Sandra Carberry 8 Christine Doran 10 Dania Egedi 13 Jason Eisner 15 Christopher Geib 17 Abigail S. Gertner 19 James Henderson 21 Beth Ann Hockey 23 Beryl Hoffman 26 Paul S. Jacobs 28 Aravind K. Joshi 29 Jonathan M. Kaye 32 Albert Kim 34 Nobo Komagata 36 Seth Kulick 37 Sadao Kurohashi 38 Libby Levison 39 D. R. Mani 41 Mitch Marcus 43 I. Dan Melamed 46 ii Michael B. Moore 47 Charles L. Ortiz, Jr. 48 Martha Palmer 49 Hyun S Park 52 Jong Cheol Park 54 Scott Prevost 55 Ellen F. Prince 58 Lance A. Ramshaw 60 Lisa F. Rau 62 Jeffrey C. Reynar 64 Francesc Ribas i Framis 66 James Rogers 68 Bernhard Rohrbacher 70 Joseph Rosenzweig 72 Deborah Rossen-Knill 73 Anoop Sarkar 76 B. Srinivas 77 Mark Steedman 80 Matthew Stone 82 Anne Vainikka 84 Bonnie Lynn Webber 86 Michael White 89 David Yarowsky 91 III Projects and Working Groups 93 The AnimNL Project 94 The Gesture Jack project 96 Information Theory Reading Group 99 iii The STAG Machine Translation Project 100 The TraumAID Project 102 The XTAG Project 104 IV Appendix 107 iv Part I Introduction This report takes its name fromtheComputationalLinguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at ®rst be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments,and the Institutefor Research in CognitiveScience. It includes prototypical Natural Language ®elds such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax- semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introductioncannot spell out all the connections between these abstracts; we invite you to explore themon your own. In fact, with thisissueit'seasier than ever to do so: this document is accessible on the ªinformation superhighwayº. Just call up http://www.cis.upenn.edu/˜cliff-group/94/cliffnotes.html In addition,you can ®nd many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors' abstracts in the web version of this report. The abstracts describe the researchers' many areas of investigation, explain their shared concerns, and present some interesting work in CognitiveScience. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn. v 1.1 Addresses Contributorsto this report can be contacted by writing to: contributor'sname department University of Pennsylvania Philadelphia, PA 19104 In addition, a list of publications from the Computer Science department may be obtained by contacting: Technical Report Librarian Department of Computer and Information Science University of Pennsylvania 200 S. 33rd St. Philadelphia, PA 19104-6389 (215)-898-3538 [email protected] 1.2 Funding Agencies This research is partially supported by: ARO including participation by the U.S. Army Research Laboratory (Aberdeen), Natick Laboratory, the Army Research Institute, the Institutefor Simulation and Training, and NASA Ames Research Center; Air Force Of®ce of Scienti®c Research; ARPA through General Electric Government Electronic Systems Division; AT&T Bell Labs; AT&T Fel- lowships; Bellcore; DMSO throughthe Universityof Iowa; General Electric Research Lab; IBM T.J. Watson Research Center; IBM Fellowships; National Defense Science and Engineering Graduate Fellowship in Computer Science; NSF, CISE, and Instrumentation and Laboratory Improvement Program Grant, SBE (Social, Behavioral, and Economic Sciences), EHR (Education & Human Re- sources Directorate); Nynex; Of®ce of Naval Research; Texas Instruments; U.S. Air Force DEPTH contract through Hughes Missile Systems. Acknowledgements Thanks to all the contributors. Comments about this report can be sent to: Matthew Stone or Libby Levison Department of Computer and Information Science University of Pennsylvania 200 S. 33rd St. Philadelphia, PA 19104-6389 [email protected] or [email protected] vi Part II Abstracts 1 Breck Baldwin Department of Computer and Information Science [email protected] Uniqueness, Salience and the Structure of Discourse Keywords: Discourse Processing, Anaphora Resolution, Centering, Pragmatics In spoken and written language, anaphora occurs when one phrase ªpoints toº another, where ªpoints toº means that the two phrases denote the same thing in one's mind. For example, consider the following quote from Kafka's The Metamorphosis: AsGregor Samsa awokeonemorningfromuneasy dreams hefound himself transformed in his bed into a gigantic insect. The proper noun Gregor Samsa isthe antecedent to theanaphor he, and the relationbetween them is termed anaphora. A thorny problem for those interested in modeling human language capabilities on computers is determining what an anaphor (the pointing phrase) picks out as its antecedent (the pointed-to phrase). Much of the dif®culty arises in picking the right antecedent when there are many to choose from. Since anaphora mediates the interrelatedness of language, accurately modeling this phenomenon paves the way to improvements in information retrieval, natural language interfaces and topic identi®cation. I have developed an approach that is particularly suitable for applications that require broad coverage and accuracy. My approach achieves broad coverage by applying simple and ef®cient algorithms to structurethepriordiscourse;it maintains accuracy by detecting the circumstances under which it is unlikely to make a good choice. A typical way to ®nd the antecedent of an anaphor is to impose a total order on the entities evoked by the prior discourse via some saliency metric, and then to take the descriptive content of the anaphor and search the saliency order for a matching antecedent Hobbs [1], Sidner [5], Luperfoy [4]. This approach has the ¯aw that for any two candidate antecedents, one is more salient than the other. But examples like; Earl and Ted were working together when ?he fell into the threshing machine. indicate that Earl and Ted are equally available as antecedents, hence the awkwardness of the example. My approach notes this problem by imposing a partial order on the entities of the prior discourse, which allows Earl and Ted to be equally salient, and requiring that there be a unique antecedent. These changes have greater import than addressing linguistic nit-picks. From an engineering perspective, the changes allow the algorithm to notice when it does not have enough information to ®nd a unique antecedent, and then appropriate fall back strategies can be used. For instance in quoted speech, it is better to wait for more information to arrive which follows the anaphor in the text so that the speaker is identi®ed, thereby resolving uses of I, you and me. This is particularly important when trying to ®nd inferred antecedents because it is a mechanism that can control search through knowledge bases±although I am just beginning to explore the possibilities in this area. The structuringportionof thetheory isachieved by assigninga part ofspeech tagsand identifying noun phrases with a simple regular expressions. Next, I assign a partial order to explicit and inferred entities using an amalgam of grammatical cues Hobbs [1], Lappin [3] which are triggered by regular 2 expression matches. A notion of local topic is provided by Centering Theory Grosz, Joshi and Weinstein [2]. Initial results show that the algorithm gets the correct antecedent for pronouns on the order of 85-90%, possessive pronouns 75% and the results for full de®nite NPs are still unknown± all percentages are for ambiguous contexts, i.e. two or more semantically acceptable candidate antecedents in the current and prior clause. References [1] Jerry Hobbs. Pronoun Resolution. Research Report No. 76-1, City College, City University of New York, 1976. [2] Barbara Grosz, Aravind Joshi and Scott Weinstein. Towards a Computational Theory of Discourse Interpretation. Manuscript, 1986. [3] Shalom Lappin and Herbert J. Leass. An Algorithmfor Pronominal Anaphora Resolution. To appear in Computational Linguistics. [4] Susan Luperfoy. The Representation of Multimodal User Interface Dialogues Using Discourse Pegs. Proceedings ofthe30thAnnualMeetingof theAssociationforComputationalLinguistics, 1992. [5] Candance L. Sidner. Focusing in the Comprehension of De®nite Anaphora. Readings in Natural Language Processing, eds Grosz, Jones, Webber. Morgan Kaufman, Los Altos CA. 1986. 3 Tilman Becker Postdoc, Institute for Research in Cognitive Science [email protected] Compact representation of syntactic knowledge Keywords: Syntax, Lexical rules In the framework of
Recommended publications
  • The FINITE STRING Newsletter Programs 8000 Words), Double
    The FINITE STRING Newsletter Programs 8000 words), double-spaced, by 1 December 1985, to Bestougeff, Ligozat - Parametrised abstract objects for the Chairman of the Program Committee: linguistic information processing Salton - On the representation of query term relations by Prof. Makoto Nagao (Kyoto) soft Boolean operators Dept. of Electrical Engineering Kyoto University 29 MARCH Sakyo-ku, Kyoto, 606, Japan MORNING The Program Committee will respond before 15 March Altman - The resolution of local syntactic ambuiguity by 1986. the human sentence processing mechanism The complete text of the revised papers in camera- Pulman - A parser that doesn't ready form should be sent before 1 May 1986 to Delmonte - Parsing difficulties and phonological processing Winfried Lenders in Italian Institut for Kommunikationsforschung und Phonetik Izumida et al. - A natural language interface using a world der Universit~it Bonn model Poppelsdorfer Allee 47 Berry-Rogghe - Interpreting singular definite descriptions D-5300 Bonn 1 in database queries Bree, Smit - Non-standard uses of if Wehrli - Design and implementation of a lexical data base PROGRAMS Maistros, Kotsanis - Lexifamis: A lexical analyser of modern Greek ACL EUROPEANCHAPTER: Beale - Grammatical analysis by computer of the SECOND CONFERENCE Lancaster-Oslo/Bergen corpus 28 MARCH 1985 Fimbel et al. - Using a text model for analysis and gener- ation MORNING Gillott - The simulation of stress patterns in synthetic Opening Session: Invited Speaker speech - a two level problem Kornai - Natural languages and the Chomsky hierarchy Johnston, Altman - Automatic speech recognition: a Hess - How does Natural Language Quantify framework for research Stifling - Distributives, quantifiers, and a multiplicity of .4 FTER NOON events Garside - A probabflistic parser Slocum and Bennett - An evaluation of METAL Boguraev, Briscoe - Toward a dictionary support environ- Root - A two-way approach to structural transfer in MT ment for real time parsing Boitet et al.
    [Show full text]
  • N N the Institute for Research in Cognitive Science
    The Institute For Research In Cognitive Science Selection and Information: A Class- Based Approach to Lexical Relationships (Ph.D. Dissertation) P by Philip Stuart Resnik E University of Pennsylvania 3401 Walnut Street, Suite 400C Philadelphia, PA 19104-6228 N December 1993 Site of the NSF Science and Technology Center for Research in Cognitive Science N University of Pennsylvania IRCS Report 93-42 Founded by Benjamin Franklin in 1740 SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS Philip Stuart Resnik A dissertation in Computer and Information Science Presented to the Faculties of the University of Pennsylvania in Partial Ful®llment of the Requirements for the Degree of Doctor of Philosophy 1993 Aravind Joshi Supervisor of Dissertation Mark Steedman Graduate Group Chairperson c Copyright 1993 by Philip Stuart Resnik For Michael Resnik Abstract Selection and Information: A Class-Based Approach to Lexical Relationships Philip Stuart Resnik Supervisor: Aravind Joshi Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement ªThe number two is blueº may be syntactically well formed, but at some level it is anomalous Ð BLUE is not a predicate that can be applied to numbers. According to the in¯uential theory of (Katz and Fodor, 1964), a predicate associates a set of de®ning features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g. (Johnson-Laird, 1983)) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge.
    [Show full text]
  • Download the Complete Issue
    ISSN 0976-0962 IJCLA International Journal of Computational Linguistics and Applicati ons Vol. 3 No. 2 Jul-Dec 2012 Guest Editor Yasunari Harada Editor-in-Chief Alexander Gelbukh © BAHRI PUBLICATIONS (2012) ISSN 0976-0962 International Journal of Computational Linguistics and App lications Vol. 3 No. 2 Jul-Dec 2012 International Journal of Computational Linguistics and Applications – IJCLA (started in 2010) is a peer-reviewed international journal published twice a year, in June and December. It publishes original research papers related to computational linguistics, natural language processing, human language technologies and their applications. The views expressed herein are those of the authors. The journal reserves the right to edit the material. © BAHRI PUBLICATIONS (2013). All rights reserved. No part of this publication may be reproduced by any means, transmitted or translated into another language without the written permission of the publisher. Indexing: Cabell's Directory of Publishing Opportunities. Editor-in-Chief: Alexander Gelbukh Subscription: India: Rs. 2699 Rest of the world: US$ 249 Payments can be made by Cheques/Bank Drafts/International Money Orders drawn in the name of BAHRI PUBLICATIONS, NEW DELHI and sent to: BAHRI PUBLICATIONS 1749A/5, 1st Floor, Gobindpuri Extension, P. O. Box 4453, Kalkaji, New Delhi 110019 Telephones: 011-65810766, (0) 9811204673, (0) 9212794543 E-mail: [email protected]; [email protected] Website: http://www.bahripublications.com Printed & Published by Deepinder Singh Bahri, for and
    [Show full text]
  • Obituary Aravind K. Joshi
    Obituary Aravind K. Joshi Bonnie Webber University of Edinburgh School of Informatics [email protected] It might surprise some young researchers that the first recipient of the ACL LifeTime Achievement award,1 Aravind Joshi, was so often compared to “Yoda,” one of the oldest and most powerful of the Jedi Masters in the Star Wars universe. But Aravind was also one of the kindest, wisest, and most justly celebrated people that one was fortunate to know. When Aravind received the award in 2002, at the age of 73, he said that he hoped his lifetime wasn’t over. Fortunately, it wasn’t: For the next 15 years, Aravind continued to enjoy time spent on research; advising students and younger colleagues; attending ACL conferences both at home in the United States and in far-flung places such as Sydney, Singapore, and Jeju Island; and enjoying the company of his extraordinary wife, the embryologist Susan Heyner, his daughters, Meera and Shyamala Joshi, and his grandchildren, Marco and Ava. Then on 31 December 2017, Aravind died peacefully at home in Philadelphia, sitting in his favorite chair, at the age of 88. Aravind Joshi was born in Pune, India, on 5 August, 1929. He sailed to the United States in 1954 to study electrical engineering (EE) at the University of Pennsylvania, after he was rejected by Harvard because his application, mailed from India, arrived a day late. While completing his M.Sc. in EE, he worked as an engineer at RCA (Camden, NJ), and then while completing his Ph.D. in EE, as a research assistant at the Univer- sity of Pennsylvania’s Department of Linguistics.
    [Show full text]
  • Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
    University of Pennsylvania ScholarlyCommons IRCS Technical Reports Series Institute for Research in Cognitive Science March 1995 Research in the Language, Information and Computation Laboratory of the University of Pennsylvania Matthew Stone University of Pennsylvania Libby Levison University of Pennsylvania Follow this and additional works at: https://repository.upenn.edu/ircs_reports Stone, Matthew and Levison, Libby, "Research in the Language, Information and Computation Laboratory of the University of Pennsylvania" (1995). IRCS Technical Reports Series. 120. https://repository.upenn.edu/ircs_reports/120 University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-95-05. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/ircs_reports/120 For more information, please contact [email protected]. Research in the Language, Information and Computation Laboratory of the University of Pennsylvania Abstract This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
    [Show full text]
  • Ms Niha Dingankar Assistant Professor and Dr Anita Chaware, Head PG Depa
    Prepared by First Year MCA Students Compiled by – Ms Niha Dingankar Assistant Professor and Dr Anita Chaware, Head PG Department Of Computer Science, SNDT WU National Technology Day is celebrated on 11 May across India. This day marks the successfully tested Shakti-I nuclear missile at the Indian Army’s Pokhran Test Range in Rajasthan. This day will be focusing on rebooting the economy through Science and Technology. Shakti also knows as the Pokhran Nuclear Test was the first nuclear test code-named ‘Smiling Buddha’ was carried out in May 1974. This operation was administered by late president and aerospace engineer Dr APJ Abdul Kalam Kalam when Atal Bihari Vajpayee was the Prime Minister. The students of Post Graduate Department Of Computer Science,SNDT University have prepared a list of Computer Scientists from Maharashtra and Orissa. The song which is played in the background is Jayostute Shree Mahanmangale With Original Lyrics by Shree Veer Savarkar Sung By Lata Mangeshkar Chhatrapati Mukteswar Shivaji Temple,Odisha Maharaj Aravind Joshi Born: 5 August 1929 - 31 December 2017 Aravind Krishna Joshi was the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania. Joshi defined the tree-adjoining grammar formalism which is often used in computational linguistics and natural language processing. Awards: Guggenheim fellow, 1971–72 Fellow of the Institute of Electrical and Electronics Engineers (IEEE), 1976 Founding Fellow of the American Association for Artificial Intelligence (AAAI), 1990 Award Benjamin Franklin Medal Vijay P. Bhatkar Born: 11 October 1946,Mumbai,India Vijay Pandurang Bhatkar is an Indian computer scientist, IT leader and educationalist.
    [Show full text]