For the Abstract.The-Revision Deletion, Orcreation,Of Sentences Is
Total Page:16
File Type:pdf, Size:1020Kb
DOCUMENT REBUKE ED 071 886 LI.004 082 AUTHOR Mathis, Betty. Ann TITLE Techniques for the Evaluation and IMprovement of Computer-Produced Abstracts. INSTITUTION Ohio State Univ".Colulbus..Computer and Information Science, Research.Center.. SPONS AGENCY National Science Foundation, Washington, D.C. REPORT NO OSU-CISRC-TR-15 . PUB DATE Dec 72 NOTE 275p.;(202 References) EDRS PRICE MF-30.65 HC-49.87 DESCRIPTORS *Abstracting;,,Abstracts; Algorithms; *Automation; CoMputers; *Electrohit Data Processing; Evaluation; Periodicals IDENTIFIERS *Automatic Abstracting ABSTRACT An automatic abstracting system, named ADAM, implemented on the IBM 370, receives journal.articles as input and produces abstracts as output. An algorithm has- been.developed which considers every sentence in the input text and rejects sentences which, are not suitable for inclusion in the abstract. All sentences which are not rejected are included in the set of sentences which are candidates for inclusion in the abstract.,The quality of the abstracts can be evaluated by means of a two -step evaluation procedUre.,The 'first step determines the conforMity of the abstracts to the defined criteria for an acceptable abstract The aecond step provides an objective evaluation criterion for abstract quality based on a.compsrison of the abstract with 'its parent documere,-. Based on the results of this evaluation, several techniques have been developed to improve the quality of the abstracts. These procedures -modify the form, arrangement, and content of the sentences selected for the abstract.The-revision deletion, orcreation,of sentences is perfOrmed according to a number of generalized ruletwhich are based on the structural characteriitics of the sentences.. This modification produces' abstracts in which the flow of ideas is improved and which represent a more nearly cohcrent:whole..(AuthorISJ) - - TECIfigICAL REPORT SERIES 4 CO1PUTE13 1r1fiFilliTTO S. -SCIENCE FiESEFIliCii CENTER. , THE OHIO STATE UNIVERSITY COLUMBUS, OHIO U.S. DEPARTMENT OF HEALTH, (OSU:-CISRC-TR-72 -15) EDUCATION & WELFARE OFFICE OF EDUCATION THIS DOCUMENT HAS BEEN REPRO. OUCEO EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIG. MATING IT, POINTS OF VIEW OR OPIN. IONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EOU CATION POSITION OR POLICY. TECHNIQUES FOR THE EVALUATION AND IMPROVEMENT OF COMPUTER-PRODUCED ABSTRACTS by Betty Ann Mathis Work performed under Grant No. 534.1, National Science Foundation Computer and Information Science Research Center The Ohio State University Columbus,'Ohio 43210 ,December 1972 ABSTRACT An,authomatic abstracting system, named ADAM, has been implemented on the IBM 370. ADAM receives journal articles as input and produces abstracts as output. An-algorithm.has been developed which considers every sentence in the input text,and rejects sentences which are not suitable for inclusion in the abstract. All sentences which are not rejected are included in the set of sentences which are candidates for inclusion:in the abstract. The quality of the abstracts can be evaluated by means ofa two-step evaluation procedure. The first step,of the procedure determines the con- formity of the abstracts. to the defined- criteria for an acceptable abstract aSti for the given system. The second step provides an objective evaluation criterion for abstract quality based on a _comparison of the abstract with its parent document. The second step is based on the assumption that an abstract should present the maximum amount of data from the parent document in the minimum amount of length. Based on the results of this evaluation, several techniques hive been developed to improve the quality of the abstracts. These procedures modify the form, arrangement, and-content of the sentences selected for theab- stract. The revision,, deletion, or creation of sentences is performed according to a number of generalized rules whichare based on the structunAl chatacteristics of the sentences-. This modification produces abstracts in which the flow of ideas is improved and whichrepresent a more nearly coher- ent whole. The abstracts also show improvement accordingto the objective evaluation criteria: ii PREFACE This work was done in partial fulfillment of the requirements for a doctor of philosophy degree in Computer and Information Science from The Ohio State University. It was supported in part by Grant No. GN 534.1 from the Office of Science Information Seryice, National Science Founda tion, to the Computer and Information Science Research Center of The Ohio State.University The CoMPuter and Information Science Research Center of The Ohio State University is an interdisciplinary research organization which consists of the staff, graduate students, and faculty of many University departments and laboratories. This report is based on research accom plished in cooperation with the Department of Computer-and Information Science. The research was administered and monitored by The Ohio State University Research Foundation. 1 vso ACKNOWLEDGEMENTS I am indebted'to my adviser, Dr. James E. Rush, for his help and advice during the preparation qt this dissertition and throughout my graduate career. I am also grateful to the other two members of my reading committee, Dr. Marshal C. Yovitg. and Dr. Anthony E. Petrarca for their contributions to this work. My graduate studies have been enriched by courses taught by Professors Ernst, Foulk",*Lazorick, Liu, Montgomery, Petrarca, Rustagi, Reeker, Rush, Seltzer, White and Yovits. Richard Salvador and Antonio Zamora worked with Dr. Rush on the original design of ADAM and I wish to thank them for allowingme to continue work,on their system. I also'wish to thank Carol E. Young for allowing me to use her grammatical class assignment programs in my 'research. I am grateful to Dr. Bertrand C. Landry for reading one of the early drafts of this dissertation and for providing valuable Suggestions. Michael McCabe helped me with some of the programming of ADAM and Iam appreciative of his assistance. I also wish to thank the students in the undergraduate information storage and retrieval courses-who tested the abstracting system. Thanks is also due to my typiSts, Mrs. .Mary Kimball and Mrs. Constance Fitzpatrick. pilling my graduate-studies, I received financial aid froMa. University Fellowship and research support froM a grant (GN-534.1) from the National Science Foundation to the Computer and information Science Research Center. Computer time for this research was provided by the grant and The Ohio State University Instruction and Research Computer Center. A special wOrd of thanks' is due my parents, my brother and Mike for their help and encouragement. iv ,f TABLE OF CONTENTS ABSTRACT PREFACE ACKNOWLEDGEMENTS iv LIST OF TABLES i ix LIST OF FIGURES QUOTATION REFERENCE xiv Chapter I. Introduction 1 1. TheNeed for Abstracting 1 2, The Need for Abstracts in the Scientific Community. 2 3, The Production of Abstracts by Computer 3 4. The Scope of the Dissertation 4 Reference§ 6 II. Background and RevieW of Previous Work 7 1. The Nature and Definition of Abstraction 7 1.1. Introduction 7 1.2. Historical Development of Abstracting 8 1.3. Intensional Properties of Abstracts 12 1.4. Formal Definition of an "Abstract" 19 2. Methods of Producing Abstracts 22 2.1. Human Abstracting Systems 22 2.1:1. .0perational Systems 24 2.1.2: Selection and Consistency 29 2.2. Computerbased Abstracting Systems 39 2.2.1. Basic System Components 40 2.2.2. Major Research tffarts-in the Rroduction of Abstracts 42 2.2.2.1. The Luhn Study 42 2.2.2.2.The ACSIMatic Study . 44 v e, 2.2.2.3. The Oswald Study i.2.2.4.1 Word Association Research 50 , . 2.2.2.5.- The TRW Study ..... 53 2.2.2.6. The Earl Study 58 2.2.2.7, The Soviet Study 64 References 69 III. Des,ign of the Abstracting System 7'5 1. Philosophy Underlying the Design 75 2. Data Input 77 3. Methods of Sentence Selection and Rejection 78 3.1. Sentence Elimination 78 3.2. Location criteria 79 3.3. "Cue" Criteria 80 3.4. Intersentence Reference 82 3.5. Frequency Criteria 83 3.6. Coherence Considerations 84 4, The Word Control List 84 4.1. Hierarchical Rules for Implementing the Functions in the Word Control List 85 4.2. Syntactic Valuesand Their Use 85 5. Summary of the 'Operation of the Abstracting System , 87 References 91 IV. The Evaluation of'the Quality of Abstracts 92 1. Need for an Evaluation Procedure 92 2. 'Existing Methods of Abstract Evaluation 93 3. Analysis of Previous Evaluation Measures for Specific Uses of Abstracts 96 3.1. The Abstract as'an Alerting Tool 97 3.2. The Abstract as a Retrospective Search Tool . 99 3.3. The Abstract as-the Source of Index Entries 101 3.4. The Abstract as a Data Base for information Storage and Retrieval Systems 101 4. Desiderata of an EvaluaCion Criterion 104 4 t. Importance of Data 105 6. The Relationship'Between Data and Information . 107 6.1. On the Difficulty of Using "Information Content" as a Basis for Evaluation of Abstracts . 110 6.2. On the Usefulness of Data Content as a Basis for Evaluation of Abstracts 110 7. A Two-Step Procedure for the Evaluation of Abstracts 113 7.1. Step 1 114 7.2. Step 2 117 7.2.1. Definition of the Data Coefficient 117 7.2.1.1. Definition of the Units of Dateand Length 124 7.2.1.2. Implementation of the Data Coefficient 1;26 7.2..3. The Representation of Concepts by Name-Relation-Name Patterns 1z8 'e 8. Examples of the Application of the Content Evaluation Criterion 132 8.1. Sample DocUment with Six Possible Abstracts . 132 8.2. Fourteen Abstracts produced by the Abstracting System 144 8.3. Summary of Results of the Experimental-Application of The Data Content Criterion 167 References 169 V. Improvement of the Abstracting System $$ 171 1. Improvement of the Quality of Abstracts through Sentence Modification 171 1.1.