Natural Language Processing and Information Retrieval


NATURAL LANGUAGE GENERATION

Title Generation for Machine-Translated Documents

Rong Jin, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]
Alexander G. Hauptmann, Dept. of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]

Abstract

In this paper, we present and compare automatically generated titles for machine-translated documents using several different statistics-based methods. A Naïve Bayesian, a K-Nearest Neighbour, a TF-IDF and an iterative Expectation-Maximization method for title generation were applied to 1000 original English news documents and again to the same documents translated from English into Portuguese, French or German and back to English using SYSTRAN. The AutoSummarization function of Microsoft Word was used as a baseline. Results on several metrics show that the statistics-based methods of title generation for machine-translated documents are fairly language independent and that title generation is possible at a level approaching the accuracy of titles generated for the original English documents.

1. Introduction

Before we discuss generating target language titles for documents in a different source language, let's consider in general the complex task of creating a title for a document: one has to understand what the document is about, one has to know what is characteristic of this document with respect to other documents, and one has to know how a good title sounds to catch attention and how to distill the essence of the document into a title of just a few words. Generating a title for a machine-translated document becomes even more challenging because we have to deal with syntactic and semantic translation errors generated by the automatic translation system.

Generating text titles for machine-translated documents is very worthwhile because it produces a very compact target language representation of the original foreign language document, which will help readers understand the important information contained in the document quickly, without requiring a translation of the complete document. From the viewpoint of machine learning, studies on how well general title generation methods can be adapted to errorful machine-translated documents, and which methods perform better than others, will be very helpful towards a general understanding of how to discover knowledge from errorful or corrupted data and how to apply learned knowledge in 'noisy' environments.

Historically, the task of title generation is strongly connected to more traditional document summarization tasks (Goldstein et al., 1999) because title generation can be thought of as extremely short summarization. Traditional summarization has emphasized the extractive approach, using selected sentences or paragraphs from a document to provide a summary (Strzalkowski et al., 1998; Salton et al., 1997; Mitra et al., 1997). Most notably, McKeown et al. (1995) have developed systems that extract phrases and recombine elements of phrases into titles.

More recently, some researchers have moved toward "learning approaches" that take advantage of training data (Witbrock and Mittal, 1999). Their key idea was to estimate the probability of generating a particular title word given a word in the document. In their approach, they ignore all document words that do not appear in the title: only document words that effectively reappear in the title of a document are counted when they estimate the probability of generating a title word wt given a document word wd as P(wt|wd) where wt = wd. While the Witbrock/Mittal Naïve Bayesian approach is not in principle limited to this constraint, our experiments show that it is a very useful restriction. Kennedy and Hauptmann (2000) explored a generative approach with an iterative Expectation-Maximization algorithm using most of the document vocabulary. Jin and Hauptmann (2000a) extended this research with a comparison of several statistics-based title word selection methods.
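To make the restricted estimate concrete, consider the following minimal Python sketch (illustrative only, not the authors' implementation; the function names and the fixed top-k selection are assumptions). Under the wt = wd constraint, P(wt|wd) collapses to the probability that a document word is reused in that document's title, which can be estimated by simple counting over the training corpus:

```python
from collections import defaultdict

def train_title_word_model(corpus):
    """Estimate P(document word is reused in the title) for every word.

    corpus: iterable of (document_words, title_words) pairs, each a list
    of tokens. Sketch only; tokenization and stop-word handling are left out.
    """
    reused_in_title = defaultdict(int)   # documents whose title reuses the word
    seen_in_document = defaultdict(int)  # documents containing the word
    for doc_words, title_words in corpus:
        title_set = set(title_words)
        for w in set(doc_words):         # wt = wd: only identical words count
            seen_in_document[w] += 1
            if w in title_set:
                reused_in_title[w] += 1
    return {w: reused_in_title[w] / seen_in_document[w] for w in seen_in_document}

def choose_title_words(doc_words, model, k=6):
    """Rank the words of a new document by their estimated title probability
    and keep the k best as candidate title words."""
    scores = {w: model.get(w, 0.0) for w in set(doc_words)}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Fixing k mirrors the paper's later choice of setting the generated title length to the average expected title length; arranging the chosen words into a readable sequence is a separate step, described below.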
In our approach to the title generation problem we will assume the following: First, the system will be given a set of target language training data. Each datum consists of a document and its corresponding title. After exposure to the training corpus, the system should be able to generate a title for any unseen document. All source language documents will be translated into the target language before any title generation is attempted.

We decompose the title generation problem into two phases:
• learning and analysis of the training corpus, and
• generating a sequence of words using learned statistics to form the title for a new document.

For learning and analysis of the training corpus, we present five different learning methods for comparison. All the approaches are described in detail in Section 2.
• A Naïve Bayesian approach with limited vocabulary. This closely mirrors the experiments reported by Witbrock and Mittal (1999).
• A Naïve Bayesian approach with full vocabulary. Here, we compute the probability of generating a title word given a document word for all words in the training data, not just those document words that reappear in the titles.
• A KNN (k nearest neighbors) approach, which treats title generation as a special classification problem. We consider the titles in the training corpus as a fixed set of labels, and the task of generating a title for a new document is essentially the same as selecting an appropriate label (i.e. title) from the fixed set of training labels. The task reduces to finding the document in the training corpus which is most similar to the current document to be titled. Standard document similarity vectors can be used. The new document title will be set to the title of the training document most similar to the current document.
• An iterative Expectation-Maximization approach. This duplicates the experiments reported by Kennedy and Hauptmann (2000).
• A term frequency and inverse document frequency (TFIDF) method (Salton and Buckley, 1988). TFIDF is a popular method used by the Information Retrieval community for measuring the importance of a term relative to a document. We use the TFIDF score of a word as a measure of the potential that this document word will be adopted into the title, since important words have a higher chance of being used in the title.

For the title-generating phase, we can further decompose the issues involved as follows:
• Choosing appropriate title words. This is done by applying the learned knowledge from one of the six methods examined in this paper.
• Deciding how many title words are appropriate for this document title. In our experiments we simply fixed the length of generated titles to the average expected title length for all comparisons.
• Finding the correct sequence of title words that forms a readable title 'sentence'. This was done by applying a language model of title word trigrams to order the newly generated title word candidates into a linear sequence (a sketch of this ordering step follows the list).
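The trigram-based ordering step in the last bullet can be sketched as follows. This is a hedged illustration under assumed details: the paper does not specify the smoothing or the search strategy, so a count-based model with crude additive smoothing and a greedy left-to-right arrangement stand in for whatever was actually used.

```python
from collections import defaultdict

def train_trigram_model(training_titles):
    """Build a count-based trigram scorer from training titles (token lists)."""
    tri, bi = defaultdict(int), defaultdict(int)
    for title in training_titles:
        padded = ["<s>", "<s>"] + title + ["</s>"]
        for i in range(2, len(padded)):
            tri[(padded[i - 2], padded[i - 1], padded[i])] += 1
            bi[(padded[i - 2], padded[i - 1])] += 1

    def score(w1, w2, w3):
        # crude additive smoothing so unseen trigrams still get a small score
        return (tri[(w1, w2, w3)] + 0.1) / (bi[(w1, w2)] + 1.0)

    return score

def order_title_words(candidate_words, score):
    """Greedily arrange the selected title words into a linear sequence,
    repeatedly appending the remaining word the trigram scorer likes best."""
    remaining = list(candidate_words)
    w1, w2, ordered = "<s>", "<s>", []
    while remaining:
        best = max(remaining, key=lambda w: score(w1, w2, w))
        ordered.append(best)
        remaining.remove(best)
        w1, w2 = w2, best
    return ordered
```

A beam search over candidate orderings would be an equally plausible reading of the text; the greedy arrangement is simply the smallest mechanism that turns an unordered set of title words into a single sequence.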
Finally, as a baseline, we also used the extractive summarization approach implemented as AutoSummarize in Microsoft Word, which selects the "best" sentence from the document as a title.

The outline of this paper is as follows: This section gave an introduction to the title generation problem. Details of our approach and experiments are presented in Section 2. The results and their analysis are presented in Section 3. Section 4 discusses our conclusions drawn from the experiment and suggests possible improvements.

2. Title Generation Experiment across Multiple Languages

We will first describe the data used in our experiments, and then explain and justify our evaluation metrics. The six learning approaches will then be described in more detail at the end of this section.

2.1 Data Description

The experimental dataset comes from a CD of 1997 broadcast news transcriptions published by Primary Source Media (1997). There were a total of roughly 50,000 news documents and corresponding titles in the dataset. The training dataset was formed by randomly picking four document-title pairs from every five pairs in the original dataset. The size of the training corpus was therefore 40,000 documents and their titles. We fixed the size of the test collection at 1,000 items from the unused document-title pairs. By separating training data and test data in this way, we ensure strong overlap in topic coverage between the training and test datasets, which gives the learning algorithms a chance to play a bigger role.

Since we did not have a large set of documents with titles in multiple parallel languages, we approximated this by creating machine-translated documents from the original 1000 test documents with titles as follows: Each of the 1000 test documents was submitted to the SYSTRAN machine translation system (http://babelfish.altavista.com) and translated into French. The French translation was again submitted to the SYSTRAN translation system and translated back into English. This final retranslation resulted in our French machine translation data. The procedure was repeated on the original documents for translation into Portuguese and German to obtain two more machine-translated sets of identical documents. For all languages, the average vocabulary overlap between the translated documents and the original documents was around 70%. This implies that at least 30% of the English words were translated incorrectly during translation from English to a foreign language and back to English.

2.2 Evaluation Metric

In this paper, two evaluations are used, one to measure selection quality and another to measure the accuracy of the sequential ordering of the title words.

... word DW, if DW is the same as TW (i.e. DW = TW). To generate a title, we merely apply the statistics to the test documents for generating titles and ...
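The excerpt breaks off before the two evaluation measures are defined. Purely as an illustration of what a selection-quality measure of this kind could look like (an assumption made for concreteness, not necessarily the metric used in the paper), the following compares the words of a generated title against the human-assigned reference title:

```python
def title_word_f1(generated_title, reference_title):
    """Precision, recall and F1 over title words, ignoring duplicates and order.

    Illustrative only: the paper's actual selection-quality metric is not
    shown in this excerpt.
    """
    gen, ref = set(generated_title), set(reference_title)
    if not gen or not ref:
        return 0.0, 0.0, 0.0
    overlap = len(gen & ref)
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A second, separate measure, for example an edit-distance or n-gram comparison against the reference word order, would be needed for the other evaluation, the accuracy of the sequential ordering of the title words.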