Database Machines in Support of Very Large Databases
Total Page:16
File Type:pdf, Size:1020Kb
Rochester Institute of Technology RIT Scholar Works Theses 1-1-1988 Database machines in support of very large databases Mary Ann Kuntz Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Kuntz, Mary Ann, "Database machines in support of very large databases" (1988). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Rochester Institute of Technology School of Computer Science Database Machines in Support of Very large Databases by Mary Ann Kuntz A thesis. submitted to The Faculty of the School of Computer Science. in partial fulfillment of the requirements for the degree of Master of Science in Computer Systems Management Approved by: Professor Henry A. Etlinger Professor Peter G. Anderson A thesis. submitted to The Faculty of the School of Computer Science. in partial fulfillment of the requirements for the degree of Master of Science in Computer Systems Management Approved by: Professor Henry A. Etlinger Professor Peter G. Anderson Professor Jeffrey Lasky Title of Thesis: Database Machines In Support of Very Large Databases I Mary Ann Kuntz hereby deny permission to reproduce my thesis in whole or in part. Date: October 14, 1988 Mary Ann Kuntz Abstract Software database management systems were developed in response to the needs of early data processing applications. Database machine research developed as a result of certain performance deficiencies of these software systems. This thesis discusses the history of database machines designed to improve the performance of database processing and focuses primarily on the Teradata DBC/1012, the only successfully marketed database machine that supports very large databases today. Also reviewed is the response of IBM to the performance needs of its database customers; this response has been in terms of improvements in both software and hardware support for database processing. In conclusion, an analysis is made of the future of database machines, in particular the DBC/1012, in light of recent IBM enhancements and its immense customer base. Table of Contents Chapter 1 - Introduction to Database Machines Introduction 1-1 Out of The Labs 1- IBM 1. Database Beginnings 1- Why Database? 1-1 Separation of Data 1 -2 Storage Considerations 1 -3 Data As A Resource 1 -3 Database Solutions 1-3 Data Independence 1 -3 Centralized Control 1-3 Problems Arising From Software Database Solutions 1-4 Referential Integrity 1-4 Cascading 1-5 Restriction 1 -5 Nullification 1-6 Cost 1-6 Purchase Price Plus 1 -7 Overhead 1 -7 Performance 1-8 Very Large Databases 1-8 Vulnerability 1-8 Database Machine Research 1-8 Hsiao's Classification Scheme 1-9 Application 1-9 Technology 1 -9 Architecture 1-10 Qadah's Taxonomy 1-10 Indexing Level 1-11 Query Processing Place 1 -1 1 Processing Multiplicity 1-12 A Chronological Categorization 1-12 Pre-1979 1-12 Post-1979 1-13 DBC/1012 - The Teradata Database Machine 1-13 An Alternative to Software Database Systems 1 -1 3 Cost 1-13 Performance 1-13 Very Large Databases 1 -1 4 Table of Contents-2 Multiple Hosts 1-14 Vulnerability 1-14 Other Features 1-14 Architectural Overview 1-16 Host System Communications Interface (HSCI) 1 -1 6 Interface Processor (IFP) 1-16 Ynet -I_16 Access Module Processor (AMP) 1 -1 6 Disk Storage Unit (DSU) 1-16 System Console 1-16 Communications Processor (COP) 1 -1 6 Open Issues 1-17 Referential Integrity 1-17 Conversion From Software Database Management Systems 1-17 Functionality 1-17 IBM 1-18 IBM's Response 1-18 System/38 1-1 8 Mainframe Trends 1-19 Intelligent Disk Controller 1 -1 9 Data Facility Product 1 -1 9 DB2 1-19 Data Facility/Hierarchical Storage Management 1 -1 9 Summit 1 -20 Summary 1-20 Chapter 2 - A History of Database Machine Research Introduction 2-1 Magnetic Disk Memories 2-2 Fixed-Head-Disks 2-2 Movable-Arm-Disks 2-3 Associative Memory 2-3 Slotnick 2-3 Database Machine Research 2-4 Information Systems for Associative Memories (IFAM). 2-4 Content Addressed Segment Sequential Memory (CASSM). 2-5 Relational Associative Processor - RAP.1 2-8 Rotating Associative Relational Store - RARES. 2-10 SURE - A Search Processor For Database Management Systems 2-12 Content Addressable File Store - CAFS. 2-14 RAP.2 2-18 Table of Contents-3 Direct 2-20 Database Computer - DBC 2-23 Distributed Associative Logic Database Machine - DIALOG 2-25 RAP.3 2-27 Conclusion 2-28 Chapter 3 - The Teradata DBC/1012 Introduction 3-1 Performance Evaluation of Computer Systems 3-1 Performance Indices 3-1 Evaluation Techniques 3-1 The Neches Study 32 DBC/1012 Concepts 33 Interconnect Network Requirements 3-3 Implementation of the Interconnect 3-4 DBC/1012 Subsystems 3-5 YNET 35 Host System Communication Interface (HSCI) 3-7 Processor Modules 3-9 Interface Processor (IFP) 3-9 Session Control 3-9 Host Driver 3-9 Parser 3-10 Dispatcher 3-10 Ynet Driver 3-10 Access Module Processor 3-1 1 Ynet Driver 3-12 Data Base Manager 3-1 2 Disk Driver 3-13 Communications Processor (COP). 3-14 Disk Storage Unit (DSU) 3-1 6 Other Host Resident Components 3-1 7 Language Preprocessor 3-17 Interactive Teradata Query (ITEQ) 3-1 7 Batch Teradata Query (BTEQ) 3-1 7 Database Integrity 3-1 8 Transaction Processing 3-1 8 Locking Mechanisms 3-1 8 Journaling 3-19 Transient Journals 3-1 9 Table of Contents-4 Permanent Journals 3- 1 9 Redundant Data Storage 3-20 Benchmarks 3-20 Liberty Mutual 3-20 Citibank N. A. 3-23 Insurance Application 3-24 Sales History Analysis 3-26 Banking Transaction Application 3-27 Data Loading Scenario 3-29 Chicago Board Options Exchange 3-30 Utility Benchmark 3-30 Bulk Data Load 3-30 Dump and Restore 3-3 1 Fasr Load 3-32 Application Benchmark. 3-33 Bank of America 3-33 Conclusion 3-34 Chapter 4 - Database Processing Solutions by IBM Introduction 4-1 370-XA Architecture 4-1 Main Storage 4-1 CPU 4-1 Instruction Execution 4-1 Interrupt Handling 4-3 Channel Subsystem (CSS) 4-3 Channel Control Element (CCE). 4-3 Channel Paths 4-3 Subchannels 4-4 Storage Subsystem 4-4 Storage Control Unit 4-5 Direct Access Storage Device (DASD) System 4-5 Sierra 4-6 3090 Processor Complex 4-6 3880 Storage Control Unit 4-6 3380 Direct Access Storage 4-7 Standard Models 4-9 Extended Capability Models 4-1 0 Disk Storage Management 4-10 I/O Processing on the Sierra 4-11 Initiation of I/O Operations 4-1 1 Table of Contents-5 Start Subchannel 4-1 1 Operation-Request Block (ORB) 4-1 2 Channel-Command Word (CCW) 4-12 START Function of the Channel Subsystem 4-13 Path Management 4-13 Channel Program Execution 4-13 Indicating Completion of Operations 4-1 4 I/O Interruptions 4-14 IBM Mainframe Directions 4-14 3990 Storage Control 4-14 3990 Components 4-15 Fast Write Capabilities 4-1 6 DASD Fast Write 4-16 Cache Fast Write 4-17 Dual Copy 4-17 Enhanced DASD Subsystem Model 4-1 8 Data Facility Product (DFP) 4-1 8 New IBM Announcements 4-19 Enterprise Systems Architecture (ESA/370) 4-1 9 MVS/Enterprise Systems Architecture 4-1 9 MVS/SP Version 3 4-19 Data Spaces 4-20 Hiperspaces 4-20 System Services 4-20 MVS/Data Facility Product (MVS/DFP) V 3 4-20 DB2 Considerations 4-21 Summit 4-21 System 38 (S/38) 4-22 High-Level Machine Interface 4-22 Control Program Facility (CPF) 4-23 User Interface 4-23 Database Data Management 4-23 Data Communications 4-24 S/38's Hardware Subsystem 4 24 Central Processing Unit 4-24 Storage Management 4-24 I/O Channel 4-24 Main Storage 4-25 Auxiliary Storage 4-25 Silverlake 4-25 Conclusion 4-26 Table of Contents-6 Chapter 5 - The Future of Database Machines Introduction 5-1 JASMIN 52 Software Components 5-2 Hardware Components 5-3 GRACE 5^ GAMMA 58 BUBBA 510 Teradata 5-11 Database Machine Design Conventions 5 15 Outstanding Issues 5-16 Referential Integrity 5-1 6 User Tools and Interfaces 5-17 System Tools and Utilities 5-1 7 Programming and Query Tools 5-1 7 Data Dictionary Standards 5-18 Conclusion 5-18 Table of Figures Chapter 1 Figure 1. Payroll Record 1-2 Figure 2. PL/I Payroll structure. 1-2 Figure 3. a new field. Adding 1-2 Figure 4. Database Relations. 1-5 Figure 5. a change. Cascading key 1-6 Figure 6. a value. Nullifying key 1-6 Figure 7. Hsiao's Classifications. 1-9 Figure 8. Application Category. .,_g Figure 9. Category. Technology i_g Figure 10. Architectural Category. ^10 Figure 1 1 . Qadah's Taxonomy. 1 ^ n Figure 1 2. Level. Indexing ., _., 1 Figure 13. Query Processing Place. 1^ 1 Figure 14. Processing Multiplicity. 1.12 Figure 15. The Teradata DBC/1012 Database Machine. [DBC\ 86] 1-15 Chapter 2 1 . Figure Database Machine Categories. 2-1 Figure 2. Magnetic Disk. [BAER 80] 2-2 Figure 3. Fixed-head Disk. [BAER 80] 2-2 Figure 4. Movable-arm Disk. [BAER 80] 2-3 Figure 5. Off-disk, database indexing level machine. [QADA 85] 2-4 Figure 6. IFAM Architecture. [DEFI 83] 2-5 Figure 7. A CASSM Module or Cell.