<<

Week 1 – Part 1: An Introduction and DBMSs to Systems • Database: A very large, integrated collection of data. • Examples: databases of customers, products,... • There are huge databases out there, for satellite Databases and DBMSs and other scientific data, digitized movies,...; up Data Models and Data Independence to hexabytes of data (i.e., 1018 bytes) Concurrency Control and Database Transactions • A database usually models (some part of) a real- Structure of a DBMS world enterprise. DBMS Languages Œ Entities (e.g., students, courses) Œ Relationships (e.g., Paolo is taking CS564) • A Database Management System (DBMS) is a package designed to store and manage databases. cscC43/343 – Introduction to Databases Introduction — 1 cscC43/343 – Introduction to Databases Introduction — 2

Why Use a DBMS? Why Study Databases?? • Data independence and efficient access —You don’t need to know the implementation of the • Shift from computation to information: database to access data; queries are optimized. were initially conceived as neat • Reduced application development time — devices for doing scientific calculations; more Queries can be expressed declaratively, and more they are used as data managers. programmer doesn’t have to specify how they • Datasets increasing in diversity and volume: are evaluated. Digital libraries, interactive video, Human • and security —(Certain) Genome project, EOS project constraints on the data are enforced … need for DBMS technology is exploding! automatically. • DBMS technology encompasses much of • Uniform data administration. Science: • Concurrent access, recovery from crashes — Many users can access/update the database at OS, languages, theory, AI, multimedia, logic,... the same time without any interference. cscC43/343 – Introduction to Databases Introduction — 3 cscC43/343 – Introduction to Databases Introduction — 4 Data Models Levels of Abstraction • A data model is a collection of concepts for Many views, single logical describing data. schema and physical • A database schema is a description of the data schema. 1View View 2 View 3 that are contained in a particular database. • Views (also called external • The of data is the most widely schemas) describe how Logical Schema used data model today. users see the data. Œ Main concept: , basically a with • Logical schema* defines Physical Schema rows and columns. logical structure Œ A relation schema, describes the columns, or • Physical schema describes attributes, or fields of a relation. the files and indexes used. * Called conceptual schema back in the old days.

cscC43/343 – Introduction to Databases Introduction — 5 cscC43/343 – Introduction to Databases Introduction — 6

Example: University Database Tables Represent Relations • Logical schema: Students(Sid:String, Name:String, Login: Sid Name Login Age Gpa String, Age:Integer,Gpa:Real) 00243 Paolo pg 21 4.0 Courses(Cid:String, Cname:String, Credits: Students 01786 Maria mf 20 3.6 Integer) 02699 Klaus klaus 19 3.4 02439 Eric eric 19 3.1 Enrolled(Sid:String, Cid:String, Grade:String) • Physical schema: Cid Cname Credits Œ Relations stored as unordered files. Courses csc340 Rqmts Engineering 4 Œ Index on first of Students. csc343 Databases 6 ece268 Operating Systems 3 • (One) External Schema (View): csc324 Programming Langs 4 CourseInfo(Cid:String, Enrollment:Integer)

cscC43/343 – Introduction to Databases Introduction — 7 cscC43/343 – Introduction to Databases Introduction — 8 Data Independence Concurrency Control

Applications insulated from how data is • Concurrent execution of user programs is structured and stored: (See also 3-layer essential for good DBMS performance. schema structure.) Œ Because disk accesses are frequent, and Œ Logical data independence: Protection from relatively slow, it is important to keep the CPU changes in the logical structure of data. humming by working on several user programs Œ Physical data independence: Protection concurrently. from changes in the physical structure of • Interleaving actions of different user programs data. can lead to inconsistency: e.g., cheque is cleared One of the most important benefits while account balance is being computed. of database technology! • DBMS ensures that such problems don’t arise: users can pretend they are using a single-user system.

cscC43/343 – Introduction to Databases Introduction — 9 cscC43/343 – Introduction to Databases Introduction — 10

Database Transactions Scheduling Concurrent Transactions

• Key concept is transaction, which is an atomic DBMS ensures that execution of {T1, ... , Tn} is sequence of database actions (reads/writes). equivalent to some serial execution of T1, ... ,Tn. • Each transaction executed completely, must leave • Before reading/writing an object, a transaction the DB in a consistent state, if DB is consistent requests a on the object, and waits till the when the transaction begins. DBMS gives it the lock. All locks are released at • Users can specify some simple integrity the end of the transaction. (Strict 2-phase constraints on the data, and the DBMS will locking protocol.) enforce these constraints. • Idea: If an action of T (say, writing X) affects T • Beyond this, the DBMS does not really understand i k (which perhaps reads X), one of them, say Ti, will the semantics of the data. (e.g., it does not obtain the lock on X first and T is forced to wait understand how the interest on a bank account is k until Ti completes; this effectively orders the computed). transactions. • Thus, ensuring that a transaction (run alone) • What if Tk already has a lock on Y and Ti later preserves consistency is ultimately the user’s requests a lock on Y? (!) T or T is responsibility! i k cscC43/343 – Introduction to Databases Introduction — 11 cscC43/343aborted – Introductionand to Databases restarted! Introduction — 12 Ensuring Atomicity The Log • DBMSs ensure atomicity (all-or-nothing property), • The following actions are recorded in the log: even if system crashes in the middle of a Œ Ti writes an object: the old value and the new transaction. value; log record must go to disk before the • Idea: Keep a log (history) of all actions carried out changed page! by the DBMS while executing a set of Œ Ti commits/aborts: a log record indicating this transactions: action. Œ Before a change is made to the database, the • Log records chained together by transaction id, so corresponding log entry is forced to a safe it’s easy to undo a specific transaction (e.g., to location. (WAL protocol; OS support for this is resolve a deadlock). often inadequate.) • Log is often duplexed and archived on “stable” Œ After a crash, the effects of partially executed storage. transactions are undone using the log. (Thanks • All log related activities (and in fact, all CC-related to WAL, if log entry wasn’t saved before the activities such as lock/unlock, dealing with crash, corresponding change was not applied to etc.) are handled transparently by the database!) DBMS. cscC43/343 – Introduction to Databases Introduction — 13 cscC43/343 – Introduction to Databases Introduction — 14

Databases Make Folks Happy... Structure of a DBMS These layers must consider concurrency • End users and DBMS vendors control and recovery • Database application programmers, • A typical DBMS has a e.g. smart webmasters layered architecture. and Execution • Database administrators (DBAs) • The figure does not Relational Operators Œ Design logical /physical schemas show the concurrency Files and Access Methods Œ Handle security and authorization control and recovery Œ Data availability, crash recovery components. Buffer Management Œ Database tuning as needs evolve • This is one of several Disk Space Management possible architectures; each system has its Must understand how a DBMS works! own variation. DB

cscC43/343 – Introduction to Databases Introduction — 15 cscC43/343 – Introduction to Databases Introduction — 16 Database Languages SQL, an Interactive Language SELECT Course, Room, A DBMS supports several languages and Building several modes of use: FROM Rooms, Courses • Interactive textual languages, such as SQL; WHERE Code = Room • Interactive commands embedded in a host programming language (Pascal, C, Cobol, AND Floor=”Ground" Java, etc.) ROOMS Code Building Floor • Interactive commands embedded in ad-hoc DS1 Ex-OMI Ground development languages (known as 4GL), N3 Ex-OMI Ground usually with additional features (e.g., for the G Science Third production of forms, menus, reports, ...) COURSES Course Room Floor Networks N3 Ground • Form-oriented, non-textual user-friendly Systems N3 Ground languages such as QBE.

cscC43/343 – Introduction to Databases Introduction — 17 cscC43/343 – Introduction to Databases Introduction — 18

SQL Embedded in Pascal SQL Embedded in ad-hoc Language write(‘city name''?'); readln(city); declare Sal number; (Oracle PL/SQL) EXEC SQL DECLARE E FOR begin SELECT NAME, SALARY select Sal into Salary from Emp where FROM EMPLOYEES Code='5788' WHERE CITY = :city ; for update of Sal; if Salary>30M then EXEC SQL OPEN E ; update Emp set Sal=Salary*1.1 where EXEC SQL FETCH E INTO :name, :salary ; Code='5788'; while SQLCODE = 0 do begin else write(‘employee:', name, ‘raise?'); update Emp set Sal=Salary*1.2 where readln(raise); Code='5788'; end if; EXEC SQL UPDATE PERSON SET ; SALARY=SALARY+:raise exception WHERE CURRENT OF E when no_data_found then EXEC SQL FETCH E INTO :name, :salary insert into Errors end; values(‘No employee has given EXEC SQL CLOSE CURSOR E code',sysdate); end; cscC43/343 – Introduction to Databases Introduction — 19 cscC43/343 – Introduction to Databases Introduction — 20 Form-Based Interface DBMS Languages (in Access)

Host Programming Language DML — data manipulation language DDL — DBMS (allows defini-tion of database schema) DML DDL 4GL — fourth generation language, 4GL useful for declarative query proces- sing, report generation Database

cscC43/343 – Introduction to Databases Introduction — 21 cscC43/343 – Introduction to Databases Introduction — 22

DBMS Technology: Pros and Cons Conventional Files vs Databases Pros • Data are handled as a common resource. Databases • Centralized management and economy of scale. Files Advantages — Good for Advantages — many data integration; allow for • Availability of integrated services, reduction of already exist; good for more flexible formats (not redundancies and inconsistencies simple applications; very just records) • Data independence (useful for the development efficient Disadvantages — high cost; and maintenance of applications) Disadvantages — data drawbacks in a centralized facility Cons duplication; hard to facility evolve; hard to build for • Costs of DBMS products (and associated tools), complex applications also of data migration. • Difficulty in separating features and services The future is with databases! (with potential lack of efficiency.) cscC43/343 – Introduction to Databases Introduction — 23 cscC43/343 – Introduction to Databases Introduction — 24 Types of DBMSs The Hierarchical Data Model • Conventional — relational, network, hierarchical, consist of records of many different record types Database consists of hierarchical record (database looks like a collection of files) structures; a field may have as value a list of • Object-Oriented — database consists of objects records; every record has at most one parent (and possibly associated programs); database Book schema consists of classes (which can be objects B365 War & Peace $8.99 too). parent Borrower • Multimedia — database can store formatted data children (i.e., records) but also text, pictures,... 38 Elm Toronto • Active databases — database includes event- condition-action rules Borrowing • Deductive databases* — like large Prolog programs, not available commercially Jan 28, 1994 Feb 24, 1994

cscC43/343 – Introduction to Databases Introduction — 25 cscC43/343 – Introduction to Databases Introduction — 26

The Network Data Model Comparing Data Models • The oldest DBMSs were hierarchical, dating back to A database now consists of records with the mid-60s. IMS (IBM product) is the most popular pointers (links) to other records. Offers a among them. Many old databases are hierarchical. navigational of a database. • The network data model came next (early ‘70s). Views database programmer as “navigator”, chasing CustomerCustomer links (pointers, actually) around a database. • The network model was found to be too 1::n link implementation-oriented, not insulating sufficiently Order cycles of links are allowed the programmer from implementation features of Order network DBMSs. • The relational model is the most recent arrival. Ordered Ordered SalesSales Relational databases are cleaner because they don’t Part PartPart Part HistoryHistory allow links/pointers (necessarily implementation- dependent). • Even though the relational model was proposed in RegionRegion 1970, it didn’t take over the database market till the 80s. cscC43/343 – Introduction to Databases Introduction — 27 cscC43/343 – Introduction to Databases Introduction — 28 Summary • DBMSs used to maintain and query large datasets. • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. • Levels of abstraction give data independence. • A DBMS typically has a layered architecture. • DBAs hold responsible jobs and are well-paid ! • DBMS R&D is one of the broadest, most exciting areas in CS.

cscC43/343 – Introduction to Databases Introduction — 29