DOCUMENT RESUME

ED 029 679    52    LI 001 535
By-Cunningham, Jay L.; And Others
A Study of the Organization and Search of Bibliographic Holdings Records in On-Line Computer Systems: Phase I. Final Report.
California Univ., Berkeley. Inst. of Library Research.
Spons Agency-Office of Education (DHEW), Washington, D.C. Bureau of Research.
Bureau No-BR-7-1083
Pub Date Mar 69
Grant-OEG-1-7-071083-5068
Note-307p.
EDRS Price MF-$1.25 HC-$15.45
Descriptors-Automation, Bibliographic Citations, Catalogs, Computer Programs, Computer Storage Devices, Costs, *Information Processing, Information Retrieval, Information Storage, Information Systems, Libraries, Library Technical Processes, Search Strategies, Systems Development

This report presents the results of the initial phase of the File Organization Project, a study which focuses upon the on-line maintenance and search of the library's catalog holdings record. The focus of the project is to develop a facility for research and experimentation with the many issues of on-line file organizations and search. The first year has been primarily devoted to defining issues to be studied, developing the facility for experiment, and carrying out initial research on the issues. Achievements involved: (1) obtaining equipment; (2) programming and testing an initial software system, and then expanding it to supply access to the central processor from two different mechanical terminals at two remote locations; (3) planning for acquisition and incorporation of an existing machine file as well as bibliographic records which require original conversions; (4) developing software for data base preparation and for file handling and access; and (5) initiating analyses on issues such as optimum length of search keys. Appended are six reports which cover specific aspects of the project and an article entitled "The Organization, Maintenance and Search of Machine Files," reprinted from "The Annual Review of Information Science and Technology," volume 3.
(JB)

FINAL REPORT
Project No. 7-1083
Grant No. OEG-1-7-071083-5068

A STUDY OF THE ORGANIZATION AND SEARCH OF BIBLIOGRAPHIC HOLDINGS RECORDS IN ON-LINE COMPUTER SYSTEMS: PHASE I

By
Jay L. Cunningham
William D. Schieber
and
Ralph M. Shoffner

Institute of Library Research
University of California
Berkeley, California 94720

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

March 1969

The research reported herein was performed pursuant to a grant with the Office of Education, U.S. Department of Health, Education, and Welfare. Contractors undertaking such projects under Government sponsorship are encouraged to express freely their professional judgment in the conduct of the project. Points of view or opinions stated do not, therefore, necessarily represent official Office of Education position or policy.

U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
Office of Education
Bureau of Research

TABLE OF CONTENTS

List of Figures
Acknowledgments

I. INTRODUCTION AND SUMMARY
   A. The Research Problem
   B. Research Method
   C. Significant Findings and Achievements
   D. Future Directions

II. FACILITY ESTABLISHMENT
   A. General
   B. Equipment
   C. Computer Programs
   D. Data Base Development
   E. File Structure

III. THE BIBLIOGRAPHIC RECORD
   A. General
   B. Record Content
   C. Record Form
   D. The Representation of Typographical Characters
   E. Logical Similarity of Bibliographic Records

IV. DATA BASE DEVELOPMENT
   A. General
   B. Strategies of Conversion
   C. Translation of Existing Machine Files
   D. Data Base Production Procedure
   E. Issues of Cost and Quality

References

APPENDIX I: AN ALGORITHM FOR NOISY MATCHES IN CATALOG SEARCHING, By James L.
Dolby

APPENDIX II: USER'S GUIDE TO THE TERMINAL MONITOR SYSTEM (TMS), By William D. Schieber

APPENDIX III: A DESCRIPTION OF LYRIC, A LANGUAGE FOR REMOTE INSTRUCTION BY COMPUTER, By Steven S. Silver

APPENDIX IV: ILR PROCESSING RECORD SPECIFICATION, By Jay L. Cunningham

APPENDIX V: SUMMARY OF RECORD FORMATS FOR DATA BASES TO BE CONVERTED TO ILR PROCESSING RECORD FORMAT
   1. Santa Cruz Record Format
   2. ILR Input Record Format
   3. Experimental On-line Mathematics Citation Data Base

APPENDIX VI: SAMPLE SIZE DETERMINATION FOR DATA CONVERSION QUALITY CONTROL, By Jorge Rodriguez

APPENDIX VII: THE ORGANIZATION, MAINTENANCE AND SEARCH OF MACHINE FILES, By Ralph M. Shoffner (Published in the Annual Review of Information Science and Technology, v. 3, edited by Carlos A. Cuadra. Chicago, Encyclopaedia Britannica, Inc., 1968. pp. 137-167)

LIST OF FIGURES

SECTION II: FACILITY ESTABLISHMENT
1. Schematic Diagram of Project Facility
2. File Generation Process
3. Blocking Strategy
4. Block a Non-Keyed File
5. Uniqueness of Author Identification
6. Distribution of Number of Fields of Length N
7. Schematic of Multi-level File Structure Linkage

SECTION III: THE BIBLIOGRAPHIC RECORD
8. Functional System Components Related to Different Record Formats
9. MARC II Elements Deferred in File Organization Project Data Base
10. File Organization Project Data Elements Not Defined in MARC II
11. Coding Sheet - Monographs
12. Keying Blocks of Text
13. The Tentative Harvard List of Diacritics
14. Alphabetical Index of Diacritic Codes
15. Proposed Single Keying Codes Compatible with Transliteration Schemes for Modern Cyrillic
16. A Spelling Equivalent Abbreviation Algorithm for Personal Names (Dolby Version 1 - Variable Length)
17. Equivalence Class Computation (Manual)
18. Equivalence Class Computation (Computer)
19.
Abbreviation Algorithm for Personal Names (Version 2 - Fixed Length)

SECTION IV: DATA BASE DEVELOPMENT
20. Distribution of Duplicate Titles as a Function of Publication Date:
    A. Titles in English Language
    B. Titles in Languages Other Than English That Use a Roman Alphabet
    C. Titles in Languages That Use a Non-Roman Alphabet
21. Conventional Conversion Compared to Automatic Format Translation and Computer-Assisted Editing
22. Flow Chart of Personal Author Field Algorithm
23. Flow Chart of Title Field Algorithm
24. Summary Chart of Data Base Production
25. On-line Search for Duplicates
26. Verification of Match
27. Data Preparation and Transcription
28. Computer Edit, Correction Cycle, and File Update
29. Diagnostic Printout, Part 1 - Logical Field Listing
30. Diagnostic Printout, Part 2 - Card Image Listing
31. Schematic of Quality Control Subsystem
32. Relation of Initial Keying Cost to Accuracy
33. Acceptability in Terms of Accuracy and Cost for Three Price Quotations for Keying

APPENDIX IV: ILR PROCESSING RECORD SPECIFICATION
1. Indicator for Main Entry - Personal Name
2. Storage Record Organization
3. Schematic of ILR Storage Record, INFOCAL Version 1
4. ILR Processing Record - Segment 1, Leader
5. ILR Processing Record - Segment 2, Record Directory
6. ILR Processing Record - Segment 3, Fixed Length Data Elements
7. Variable Field Tags and Data Elements
8. Values for Indicator 1 in Applicable Fields
9. Sub-Field Delimiter Codes
10. Proposed Variable Field Header

APPENDIX V-1: SANTA CRUZ RECORD FORMAT
1. Sample Catalog Record in Original Santa Cruz Format

APPENDIX V-2: ILR INPUT RECORD FORMAT
1. Storage Record Components & Organization
2. Structural Patterns in MARC Record Data Definition
3. Example of Input Format Mapping Into Processing Format
4.
Example of Tab Card Decklet - ILR Input Format
5. ILR Input Record Format - I-Fields: Data Elements & Codes
6. ILR Input Record Format - A-Fields: Data Elements & Codes
7. ILR Input Record Format - B-Fields: Data Elements & Codes
8. Input Code Values Table for Type of Main Entry
9. Input Code Values Table for Type of Added Entries (Series Traced Same)
10. Input Code Values Table for Type of Added Entries (Subject Added Entries)
11. Input Code Values Table for Type of Added Entries (Other Added Entries)
12. Input Code Values Table for Type of Added Entries (Series Traced Differently)
13. Presence of Fields in an Input Record
14. INFOCAL Default Initializations
15. Default Settings for Indicator 1
16. List of Tag Numbers Which are Currently Repeatable
17. Revised Field Coding: A-Fields & B-Fields
18. Table of Valid Symbols

APPENDIX V-3: EXPERIMENTAL ON-LINE MATHEMATICS CITATION DATA BASE
1. Cards Punched for One Paper Published in Vol. 66 of the Communications in Pure and Applied Mathematics

ACKNOWLEDGMENTS

This report comprises the results of the first year of effort under a grant, OEG-1-7-071083-5068, from the Bureau of Research, Office of Education, U.S. Department of Health, Education, and Welfare. The content and conclusions presented in the report pertain to the period July 1, 1967 - June 30, 1968. The University of California also provided contributory support. M.E. Maron, Associate Director of the Institute, acted as Principal Investigator and Ralph M. Shoffner as Project Director. For constructive criticism concerning goals and methods, and for otherwise inaccessible information, we are especially grateful to the members of the