(DB): – Collection of Related Data • Has Following Characteristics

Intro: Databases • Database (DB): { Collection of related data • Has following characteristics: { Logically coherent collection of data with inherent meaning { Designed for specific purpose { Represents some aspect of the real world: the miniworld or universe of discourse • DB is not a random collection of facts • Concept of DB independent of Database Management System (DBMS) • Prior to DBMS concept, DBs maintained as flat file (traditional) systems 1 Intro: Flat File Systems • Flat file system: { One or more data files accessed via dedicated programs • Typical large organization (e.g., university) has many departments, each with specific needs { Each department has own set of data { Each department has own set of apps for processing data { Data stored in one or more data files accessed by app programs which define ∗ Record/field structure (object/attribute structure) { Apps written in high-level language { Generally, significant overlap in data stored in various departments • This approach leads to many problems: 1. Data redundancy { Data stored multiple places { Ramifications: (a) Wasted resources ∗ Wasted disk space ∗ Wasted effort ∗ Both result in wasted money (b) Inconsistent data (as a result of updates) ∗ Doubtful if a datum that occurs in multiple files will be updated simulta- neously · Files will be inconsistent for a time interval ∗ Clerical errors may result in permanent inconsistencies 2. Concurrency control { Have many independent programs with uncontrolled access to same data 2 Intro: Flat File Systems (2) 3. Interdependence of data and programs { Data structure of DB embedded in app programs { Based on structures defined using data types inherent in host language { Ramifications: (a) Difficulty in modifying DB/apps ∗ If modify record structure or field data type, all apps must be modified accordingly (b) Difficulty sharing data (among departments) ∗ Most likely incompatible record formats in different locations, so cannot share files easily (c) Difficulty extracting new info ∗ If want to extract info not anticipated at design time, either i. Extract manually using results from existing software ii. Create new app { The interdependence tends to promote a proliferation of apps ∗ This tends to result in much ad hoc code ∗ This in turn tends to result in problems related to (a) Data integrity · Checks that data meet certain constraints (b) Security · Limiting users to authorized data (c) Concurrency control · Limiting number of users accessing a given piece of data at any one time • These problems motivated the development of DBMSs 3 Intro: DBMS Overview • DBMS: { General purpose software system that facilitates definition, construction, manipulation, and maintenance of a DB { Definition: ∗ Specifying objects/entities, logical structure of data, data types, constraints, relationships, ... { Construction: ∗ Creating the DB - installing data { Manipulation: ∗ Insertion, deletion, retrieval, update of data { Maintenance: ∗ Monitoring performance, changing storage structures, changing access paths • DBMS Characteristics { General purpose ∗ Independent of any world of discourse or app ∗ Can represent any type of data (within restrictions of DBMS) { Single, central repository for data ∗ Eliminates redundancy problem (and associated problems, e.g., Cost, consistency) ∗ Data logically related ∗ Note: Some duplication required to create associations among data Referred to as controlled redundancy { Data independent of apps ∗ Data accessed by queries that are independent of physical record structure ∗ User deals with conceptual representation of data, not implementational details ∗ Achieved via data abstraction ∗ Implemented in terms of data structure/module called catalog (dictionary/data dictionary/data directory) 4 Intro: DBMS Overview (2) { Access controlled by a module called the DB Manager ∗ Manager responsible for · Concurrency control · Security · Backup and recovery · Integrity constraints ∗ Since all data access controlled by manager, problems associated with above eliminated { Multiple interfaces (usually provided) ∗ Each presents data in format specific to type of user ∗ Limits access to what is needed by that user 5 Intro: DBMS Architecture - History • Conference on Data Systems Languages (CODASYL) { Purpose was to establish standards for DBs { 1967 - created Database Task Group (DBTG) { DBTG charged with generating standards for environment for creation of DB and manipulation of data { 1969 - DBTG initial report • Codd { 1970 - seminal paper proposing relational model { Proposed 8 services that should be supported by any full DBMS: 1. Data storage, retrieval, and update ∗ Primary purpose of DBMS ∗ Physical details of data storage should be hidden from user 2. User-accessible catalog ∗ Stores metadata - data about data ∗ Stores info pertaining to all aspects of DB design, usage, and maintenance 3. Transaction support ∗ Transaction: an atomic operation on the DB · Consists of a read and/or write ∗ Transaction executes in its entirety, or not at all ∗ Insures consistency of DB in case of DBMS failure 4. Concurrency control 5. Recovery ∗ Enables DB to be returned to a consistent state in case of failure 6. Restriction of unauthorized access 7. Support for data communication ∗ Provide for remote access of DB 8. Integrity support ∗ Insure data is correct and consistent 6 Intro: DBMS Architecture - History (2) • DBTG { 1971 - formal proposal { DBMS should consist of 3 components: 1. Network schema ∗ Describes logical organization of DB ∗ Includes (a) DB name (b) Structure of each record type (c) Data types of each record field 2. Subschema ∗ Describes DB as seen by users 3. Data management language ∗ Used to define structure of data ∗ Used to manipulate data ∗ Proposed 3 sub-languages: (a) Schema data definition language (DDL) · Defines schema · Used by DBA (b) Subschema DDL · Defines parts of DB required by apps (c) Data manipulation language · For manipulation of data · Used by anyone querying DB { Not adopted by ANSI 7 Intro: DBMS Architecture - History (3) • ANSI Standards Planning and Requirements Committee (SPARC) { 1975 - proposed 3-level architecture with data dictionary { Based on IBM (Codd) proposals { Reflected need for independent layer between implementational and application levels { Purpose of 3-level architecture: ∗ Users should be able to access same data, but with customized view · Should be able to change one view without affecting others ∗ Physical data storage should be invisible to user ∗ Changes to physical storage should not affect user view ∗ Changes to physical storage should not affect internal structure of DB ∗ Changes to internal structure should not affect user view • ANSI-SPARC proposal did not become formal standard { Is basis for modern DBMS architectures 8 Intro: ANSI-SPARC Architecture • 3-level ANSI-SPARC Architecture: 1. External level { Presents subsets (views) of the DB { Each view customized for a particular user ∗ Limits what is accessible to the user ∗ Display of data may be different for same data in different views { May containing data not actually stored in DB (derived data) 2. Conceptual level { "Community" representation of entire DB { Data represented logically (structure in terms of components, data types, and sizes), independent of physical storage (not in terms of bytes) { Types of information represented: ∗ Entities, attributes, relations ∗ Constraints ∗ Semantic info ∗ Security and integrity info 9 Intro: ANSI-SPARC Architecture (2) 3. Internal level { Physical (implementational) representation of data ∗ Storage allocation ∗ Access mechanisms ∗ Record disk address ∗ Record structure ∗ Data compression and encryption 4. Physical level { Machine level 10 Intro: Schemas • Schema: { Logical structure of data • Need schema for each level of ANSI-SPARC architecture { Schema for a level describes data at that level only { One internal schema that describes 1. Data fields of records (physical) 2. Indices 3. Access methods 4. ... { One conceptual schema that describes 1. Data fields of records (logical) 2. Relationships among data 3. Constraints 4. ... { Multiple external schemas (subschemas) that describe 1. Same aspects as subset of conceptual level • Schema diagram used to represent structure graphically • Schemas created by DB designers for a particular domain • Schema is relatively static • Often referred to as an intension of the DB • A populated DB is often referred to as a(n) { Instance { State { Extension { Snapshot 11 Intro: Data Models • Data model: { Integrated collection of concepts for describing data, relationships between data, and constraints on data in an organization (Connelly and Begg) { Schemas represented in terms of a data model • Data models are used at each level of the DBMS • Models do not need to be the same across DBMS levels • Data models are categorized in terms of their degree of abstraction 1. High-level/conceptual/object-based models { Highest degree of abstraction { Describe DB in terms of (a) Entities ∗ Concepts to be represented (b) Attributes ∗ Characteristics of the entities (c) Relationships ∗ Associations among entities { Example paradigms: (a) Entity-relation (ER) model (b) Object-oriented (c) Semantic (d) Functional 12 Intro: Data Models (2) 2. Representational/Implementational/record-based models { Describe DB in terms of logical records { Correspond closely to way data represented physically, while hiding the details { Applicable to external and conceptual DBMS levels { Better than OO for representing structure { Poorer than OO for representing constraints {

(DB): – Collection of Related Data • Has Following Characteristics

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support