(DB): – Collection of Related Data • Has Following Characteristics

Intro: Databases • Database (DB): { Collection of related data • Has following characteristics: { Logically coherent collection of data with inherent meaning { Designed for specific purpose { Represents some aspect of the real world: the miniworld or universe of discourse • DB is not a random collection of facts • Concept of DB independent of Database Management System (DBMS) • Prior to DBMS concept, DBs maintained as flat file (traditional) systems 1 Intro: Flat File Systems • Flat file system: { One or more data files accessed via dedicated programs • Typical large organization (e.g., university) has many departments, each with specific needs { Each department has own set of data { Each department has own set of apps for processing data { Data stored in one or more data files accessed by app programs which define ∗ Record/field structure (object/attribute structure) { Apps written in high-level language { Generally, significant overlap in data stored in various departments • This approach leads to many problems: 1. Data redundancy { Data stored multiple places { Ramifications: (a) Wasted resources ∗ Wasted disk space ∗ Wasted effort ∗ Both result in wasted money (b) Inconsistent data (as a result of updates) ∗ Doubtful if a datum that occurs in multiple files will be updated simulta- neously · Files will be inconsistent for a time interval ∗ Clerical errors may result in permanent inconsistencies 2. Concurrency control { Have many independent programs with uncontrolled access to same data 2 Intro: Flat File Systems (2) 3. Interdependence of data and programs { Data structure of DB embedded in app programs { Based on structures defined using data types inherent in host language { Ramifications: (a) Difficulty in modifying DB/apps ∗ If modify record structure or field data type, all apps must be modified accordingly (b) Difficulty sharing data (among departments) ∗ Most likely incompatible record formats in different locations, so cannot share files easily (c) Difficulty extracting new info ∗ If want to extract info not anticipated at design time, either i. Extract manually using results from existing software ii. Create new app { The interdependence tends to promote a proliferation of apps ∗ This tends to result in much ad hoc code ∗ This in turn tends to result in problems related to (a) Data integrity · Checks that data meet certain constraints (b) Security · Limiting users to authorized data (c) Concurrency control · Limiting number of users accessing a given piece of data at any one time • These problems motivated the development of DBMSs 3 Intro: DBMS Overview • DBMS: { General purpose software system that facilitates definition, construction, manipulation, and maintenance of a DB { Definition: ∗ Specifying objects/entities, logical structure of data, data types, constraints, relationships, ... { Construction: ∗ Creating the DB - installing data { Manipulation: ∗ Insertion, deletion, retrieval, update of data { Maintenance: ∗ Monitoring performance, changing storage structures, changing access paths • DBMS Characteristics { General purpose ∗ Independent of any world of discourse or app ∗ Can represent any type of data (within restrictions of DBMS) { Single, central repository for data ∗ Eliminates redundancy problem (and associated problems, e.g., Cost, consistency) ∗ Data logically related ∗ Note: Some duplication required to create associations among data Referred to as controlled redundancy { Data independent of apps ∗ Data accessed by queries that are independent of physical record structure ∗ User deals with conceptual representation of data, not implementational details ∗ Achieved via data abstraction ∗ Implemented in terms of data structure/module called catalog (dictionary/data dictionary/data directory) 4 Intro: DBMS Overview (2) { Access controlled by a module called the DB Manager ∗ Manager responsible for · Concurrency control · Security · Backup and recovery · Integrity constraints ∗ Since all data access controlled by manager, problems associated with above eliminated { Multiple interfaces (usually provided) ∗ Each presents data in format specific to type of user ∗ Limits access to what is needed by that user 5 Intro: DBMS Architecture - History • Conference on Data Systems Languages (CODASYL) { Purpose was to establish standards for DBs { 1967 - created Database Task Group (DBTG) { DBTG charged with generating standards for environment for creation of DB and manipulation of data { 1969 - DBTG initial report • Codd { 1970 - seminal paper proposing relational model { Proposed 8 services that should be supported by any full DBMS: 1. Data storage, retrieval, and update ∗ Primary purpose of DBMS ∗ Physical details of data storage should be hidden from user 2. User-accessible catalog ∗ Stores metadata - data about data ∗ Stores info pertaining to all aspects of DB design, usage, and maintenance 3. Transaction support ∗ Transaction: an atomic operation on the DB · Consists of a read and/or write ∗ Transaction executes in its entirety, or not at all ∗ Insures consistency of DB in case of DBMS failure 4. Concurrency control 5. Recovery ∗ Enables DB to be returned to a consistent state in case of failure 6. Restriction of unauthorized access 7. Support for data communication ∗ Provide for remote access of DB 8. Integrity support ∗ Insure data is correct and consistent 6 Intro: DBMS Architecture - History (2) • DBTG { 1971 - formal proposal { DBMS should consist of 3 components: 1. Network schema ∗ Describes logical organization of DB ∗ Includes (a) DB name (b) Structure of each record type (c) Data types of each record field 2. Subschema ∗ Describes DB as seen by users 3. Data management language ∗ Used to define structure of data ∗ Used to manipulate data ∗ Proposed 3 sub-languages: (a) Schema data definition language (DDL) · Defines schema · Used by DBA (b) Subschema DDL · Defines parts of DB required by apps (c) Data manipulation language · For manipulation of data · Used by anyone querying DB { Not adopted by ANSI 7 Intro: DBMS Architecture - History (3) • ANSI Standards Planning and Requirements Committee (SPARC) { 1975 - proposed 3-level architecture with data dictionary { Based on IBM (Codd) proposals { Reflected need for independent layer between implementational and application levels { Purpose of 3-level architecture: ∗ Users should be able to access same data, but with customized view · Should be able to change one view without affecting others ∗ Physical data storage should be invisible to user ∗ Changes to physical storage should not affect user view ∗ Changes to physical storage should not affect internal structure of DB ∗ Changes to internal structure should not affect user view • ANSI-SPARC proposal did not become formal standard { Is basis for modern DBMS architectures 8 Intro: ANSI-SPARC Architecture • 3-level ANSI-SPARC Architecture: 1. External level { Presents subsets (views) of the DB { Each view customized for a particular user ∗ Limits what is accessible to the user ∗ Display of data may be different for same data in different views { May containing data not actually stored in DB (derived data) 2. Conceptual level { "Community" representation of entire DB { Data represented logically (structure in terms of components, data types, and sizes), independent of physical storage (not in terms of bytes) { Types of information represented: ∗ Entities, attributes, relations ∗ Constraints ∗ Semantic info ∗ Security and integrity info 9 Intro: ANSI-SPARC Architecture (2) 3. Internal level { Physical (implementational) representation of data ∗ Storage allocation ∗ Access mechanisms ∗ Record disk address ∗ Record structure ∗ Data compression and encryption 4. Physical level { Machine level 10 Intro: Schemas • Schema: { Logical structure of data • Need schema for each level of ANSI-SPARC architecture { Schema for a level describes data at that level only { One internal schema that describes 1. Data fields of records (physical) 2. Indices 3. Access methods 4. ... { One conceptual schema that describes 1. Data fields of records (logical) 2. Relationships among data 3. Constraints 4. ... { Multiple external schemas (subschemas) that describe 1. Same aspects as subset of conceptual level • Schema diagram used to represent structure graphically • Schemas created by DB designers for a particular domain • Schema is relatively static • Often referred to as an intension of the DB • A populated DB is often referred to as a(n) { Instance { State { Extension { Snapshot 11 Intro: Data Models • Data model: { Integrated collection of concepts for describing data, relationships between data, and constraints on data in an organization (Connelly and Begg) { Schemas represented in terms of a data model • Data models are used at each level of the DBMS • Models do not need to be the same across DBMS levels • Data models are categorized in terms of their degree of abstraction 1. High-level/conceptual/object-based models { Highest degree of abstraction { Describe DB in terms of (a) Entities ∗ Concepts to be represented (b) Attributes ∗ Characteristics of the entities (c) Relationships ∗ Associations among entities { Example paradigms: (a) Entity-relation (ER) model (b) Object-oriented (c) Semantic (d) Functional 12 Intro: Data Models (2) 2. Representational/Implementational/record-based models { Describe DB in terms of logical records { Correspond closely to way data represented physically, while hiding the details { Applicable to external and conceptual DBMS levels { Better than OO for representing structure { Poorer than OO for representing constraints {

Load more