An Introduction to Database Systems
Week 1 – Part 1: An Introduction to Database Systems
cscC43/343 – Introduction to Databases

    Databases and DBMSs
    Data Models and Data Independence
    Concurrency Control and Database Transactions
    Structure of a DBMS
    DBMS Languages

Databases and DBMSs
• Database: A very large, integrated collection of data.
• Examples: databases of customers, products, ...
• There are huge databases out there, for satellite and other scientific data, digitized movies, ...; up to exabytes of data (i.e., 10^18 bytes).
• A database usually models (some part of) a real-world enterprise.
    Entities (e.g., students, courses)
    Relationships (e.g., Paolo is taking CS564)
• A Database Management System (DBMS) is a software package designed to store and manage databases.

Why Use a DBMS?
• Data independence and efficient access: You don't need to know the implementation of the database to access data; queries are optimized.
• Reduced application development time: Queries can be expressed declaratively, and the programmer doesn't have to specify how they are evaluated.
• Data integrity and security: (Certain) constraints on the data are enforced automatically.
• Uniform data administration.
• Concurrent access, recovery from crashes: Many users can access/update the database at the same time without any interference.

Why Study Databases??
• Shift from computation to information: Computers were initially conceived as neat devices for doing scientific calculations; more and more they are used as data managers.
• Datasets increasing in diversity and volume: Digital libraries, interactive video, Human Genome project, EOS project ... need for DBMS technology is exploding!
• DBMS technology encompasses much of Computer Science: OS, languages, theory, AI, multimedia, logic, ...
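The point under "Why Use a DBMS?" that queries are expressed declaratively can be illustrated with a small sketch. It uses Python's standard sqlite3 module as a stand-in DBMS; the Students table layout, the sample rows, and the function names declarative and procedural are assumptions made for illustration, not part of the course material.

```python
import sqlite3

# Hypothetical sample rows (sid, name, gpa), invented for this sketch.
ROWS = [("s1", "Ann", 3.9), ("s2", "Bob", 3.2), ("s3", "Cy", 3.6)]


def declarative(min_gpa):
    """State WHAT is wanted; the DBMS decides how to evaluate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", ROWS)
    result = conn.execute(
        "SELECT name FROM Students WHERE gpa >= ? ORDER BY name",
        (min_gpa,),
    ).fetchall()
    conn.close()
    return [name for (name,) in result]


def procedural(min_gpa):
    """Spell out HOW: scan every row, filter, then sort by hand."""
    names = []
    for _sid, name, gpa in ROWS:
        if gpa >= min_gpa:
            names.append(name)
    names.sort()
    return names


if __name__ == "__main__":
    print(declarative(3.5))
    print(procedural(3.5))
```

Both functions return the same names. The difference is that the SQL version leaves the scan, filter, and sort strategy to the DBMS, which may choose a different plan (for example, using an index) without the program changing.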
Data Models
• A data model is a collection of concepts for describing data.
• A database schema is a description of the data that are contained in a particular database.
• The relational model of data is the most widely used data model today.
    Main concept: relation, basically a table with rows and columns.
    A relation schema describes the columns, or attributes, or fields of a relation.

Levels of Abstraction
Many views, single logical schema and physical schema.
• Views (also called external schemas) describe how users see the data.
• Logical schema* defines the logical structure.
• Physical schema describes the files and indexes used.
* Called conceptual schema back in the old days.
[Figure: View 1, View 2, and View 3 sit above the Logical Schema, which sits above the Physical Schema.]

Example: University Database
• Logical schema:
    Students(Sid:String, Name:String, Login:String, Age:Integer, Gpa:Real)
    Courses(Cid:String, Cname:String, Credits:Integer)
    Enrolled(Sid:String, Cid:String, Grade:String)
• Physical schema:
    Relations stored as unordered files.
    Index on first column of Students.
• (One) External Schema (View):
    CourseInfo(Cid:String, Enrollment:Integer)

Tables Represent Relations
Students
    Sid    Name   Login  Age  Gpa
    00243  Paolo  pg     21   4.0
    01786  Maria  mf     20   3.6
    02699  Klaus  klaus  19   3.4
    02439  Eric   eric   19   3.1
Courses
    Cid     Cname              Credits
    csc340  Rqmts Engineering  4
    csc343  Databases          6
    ece268  Operating Systems  3
    csc324  Programming Langs  4

Data Independence
Applications are insulated from how data is structured and stored (see also the 3-layer schema structure):
    Logical data independence: Protection from changes in the logical structure of data.
    Physical data independence: Protection from changes in the physical structure of data.
One of the most important benefits of database technology!

Concurrency Control
• Concurrent execution of user programs is essential for good DBMS performance.
    Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving actions of different user programs can lead to inconsistency: e.g., a cheque is cleared while the account balance is being computed.
• DBMS ensures that such problems don't arise: users can pretend they are using a single-user system.

Database Transactions
• Key concept is transaction, which is an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
• Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
• Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
• Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!

Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ..., Tn} is equivalent to some serial execution of T1, ..., Tn.
• Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2-phase locking protocol.)
• Idea: If an action of Ti (say, writing X) affects Tk (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tk is forced to wait until Ti completes; this effectively orders the transactions.
• What if Tk already has a lock on Y and Ti later requests a lock on Y? (Deadlock!) Ti or Tk is aborted and restarted!

Ensuring Atomicity
• DBMSs ensure atomicity (all-or-nothing property), even if the system crashes in the middle of a transaction.
• Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:
    Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.)
    After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if a log entry wasn't saved before the crash, the corresponding change was not applied to the database!)

The Log
• The following actions are recorded in the log:
    Ti writes an object: the old value and the new value; the log record must go to disk before the changed page!
    Ti commits/aborts: a log record indicating this action.
• Log records are chained together by transaction id, so it's easy to undo a specific transaction (e.g., to resolve a deadlock).
• Log is often duplexed and archived on "stable" storage.
• All log-related activities (and in fact, all CC-related activities such as lock/unlock, dealing with deadlocks, etc.) are handled transparently by the DBMS.

Databases Make Folks Happy...
• End users and DBMS vendors
• Database application programmers, e.g. smart webmasters
• Database administrators (DBAs)
    Design logical/physical schemas
    Handle security and authorization
    Data availability, crash recovery
    Database tuning as needs evolve
Must understand how a DBMS works!

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variation.
[Figure: layered DBMS architecture, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]

Database Languages
A DBMS supports several languages and several modes of use:
• Interactive textual languages, such as SQL;
• Interactive commands embedded in a host programming language (Pascal, C, Cobol, Java, etc.)
• Interactive commands embedded in ad-hoc development languages (known as 4GL), usually with additional features (e.g., for the production of forms, menus, reports, ...)
• Form-oriented, non-textual user-friendly languages such as QBE.

SQL, an Interactive Language
    SELECT Course, Room, Building
    FROM Rooms, Courses
    WHERE Code = Room
      AND Rooms.Floor = 'Ground'
ROOMS
    Code  Building  Floor
    DS1   Ex-OMI    Ground
    N3    Ex-OMI    Ground
    G     Science   Third
COURSES
    Course    Room  Floor
    Networks  N3    Ground
    Systems   N3    Ground

SQL Embedded in Pascal
    write('city name?'); readln(city);
    EXEC SQL DECLARE E CURSOR FOR
        SELECT NAME, SALARY
        FROM EMPLOYEES
        WHERE CITY = :city ;
    EXEC SQL OPEN E ;
    EXEC SQL FETCH E INTO :name, :salary ;
    while SQLCODE = 0 do begin
        write('employee:', name, 'raise?');
        readln(raise);
        EXEC SQL UPDATE EMPLOYEES SET SALARY=SALARY+:raise
            WHERE CURRENT OF E ;
        EXEC SQL FETCH E INTO :name, :salary

SQL Embedded in ad-hoc Language (Oracle PL/SQL)
    declare Salary number;
    begin
        select Sal into Salary from Emp where Code='5788'
            for update of Sal;
        if Salary>30M then
            update Emp set Sal=Salary*1.1 where Code='5788';
        else
            update Emp set Sal=Salary*1.2 where Code='5788';
        end if;
        commit;
    exception
        when no_data_found then
            insert into