An Introduction to Relational Database Management System
Total Page:16
File Type:pdf, Size:1020Kb
An introduction to relational database management system Fabrice Rossi CEREMADE Université Paris Dauphine 2021 Database Management Systems Database (DB) An organized collection of data Database Management System (DBMS) A software used to manage databases, namely I creating a database I maintaining a database I interacting with a database 2 Advantages of DBMS I meta data management I integrity constraints I security (access control) I concurrency control I etc. DBMS versus Files File-Based data management I data in CSV files (or other format) I management with e.g. Python + Pandas (or R + tidyverse) 3 DBMS versus Files File-Based data management I data in CSV files (or other format) I management with e.g. Python + Pandas (or R + tidyverse) Advantages of DBMS I meta data management I integrity constraints I security (access control) I concurrency control I etc. 3 DBMS components DBMS architecture I storage manager I handles saving data and meta data on the storage system I manages part of the concurrency aspects I query processor I core part of the DBMS I handles computation on the database (e.g. search, insertion, deletion, modification, aggregation) I optimize queries for efficiency I high level tools I meta data management I connection and security manager 4 DBMS components DBMS languages I data definition language (DDL) I used to specify a data model I data types I links between data I meta data management I data manipulation language (DML) I used to query a database I specifies computation on a database 5 Types of DBMS Data model categorization I old approaches: hierarchical DBMS and network DBMS I current standard: relational DBMS I based on a relational data model I typical based on SQL (Structured Query Language, sequel) I niche approach: object-oriented DBMS I large scale or unstructured data: NoSQL I not-only SQL (or more specifically not relational) I document stores I graph oriented databases I column oriented databases 6 What’s missing? I A lot! I DBMS are a very rich and complex subject I numerous theoretical aspects I getting under the hood is yet another subject I practical aspects are also very complex In this course An introduction to Relational DBMS 7 In this course An introduction to Relational DBMS What’s missing? I A lot! I DBMS are a very rich and complex subject I numerous theoretical aspects I getting under the hood is yet another subject I practical aspects are also very complex 7 Outline Introduction Conceptual Data Modeling The Relational Model SQL 8 Outline Introduction Conceptual Data Modeling The Relational Model SQL 9 Data Management Specifying the data I ideally the first step of a data oriented project I important questions I what are the objects under study? I how are they described? I how are they related? Two steps process 1. conceptual/formal model 2. implementation in a meta data oriented language 10 Entity Relationship Model Proposed by Peter Chen in 1976 Concepts I Entity: uniquely identified object under study (e.g. a person) I Relationship: a way to relate entities (e.g. a has access to b) I Attribute: a property of an entity or of a relationship. An attribute has a domain (the set of values it can take) I an ER model describes types, e.g. entity type, not values 11 Example Loan application data set I https://relational.fit.cvut.cz/dataset/Financial I 8 tables including I client table I account table I credit card table I loan table I etc. 12 key attribute Example (part of the) Client table ER model client_id gender birth_date I entity type: client 1 F 1970-12-13 2 M 1945-02-04 I attributes 3 F 1940-10-09 4 M 1956-12-01 I gender with domain F and 5 F 1960-07-03 M I birth_date with a date domain I client_id 13 key attribute Example (part of the) Client table ER model client_id gender birth_date I entity type: client 1 F 1970-12-13 2 M 1945-02-04 I attributes 3 F 1940-10-09 4 M 1956-12-01 I gender with domain F and 5 F 1960-07-03 M I birth_date with a date domain I client_id Client gender birth_date client_id 13 Example (part of the) Client table ER model client_id gender birth_date I entity type: client 1 F 1970-12-13 2 M 1945-02-04 I attributes 3 F 1940-10-09 4 M 1956-12-01 I gender with domain F and 5 F 1960-07-03 M I birth_date with a date domain I client_id key attribute Client gender birth_date client_id 13 Graphical representation Entity type Entity types are represented by rectangles link to attributes Attribute type I an ellipse per type I derived attribute type I key attribute type I can be computed from I a unique identifier of the another one (e.g. age corresponding entity from birth date) I underlined in the I dashed border representation I multi-valued attribute type I composite attribute type I several values are I can be decomposed into authorized sub-attributes I double border I linked ellipses 14 Example Last name First name email name Person gender birth_date age client_id 15 Typical domains Numerical Textual I integers I words I decimal numbers I codes (such as post codes) I possible constraints: positive I structured strings (e.g. numbers, number of emails) significant digits, etc. Others Temporal I truth values (boolean) I dates I binary content (such as I times images) 16 Relationship Principle I a relationship represents an association between at least two entities I it can have attributes I it is characterized by cardinalities I minimum and maximum number of relationships to which a given entity can participate I asymmetric I a relationship type is graphically represented by a rhombus 17 Example Loan application data set I client entities and account entities I relationships I a client can be the owner of an account I a client can be allowed to use an account I cardinalities I owner: I each account has exactly one owner I each client can own at most one account (in this database) I user: I each account may have some users I each client can be the user of some accounts 18 Example Owns 0-1 1-1 client_id Client Account account_id 0-n 0-m Uses 19 Outline Introduction Conceptual Data Modeling The Relational Model SQL 20 Relational Model History I invented by Edgar F. Codd in 1969-1970 I based on a mathematical model, the relational algebra I associated relational calculus Main concepts I a domain: a set Qn I a relation: a (finite) subset of i=1(Di [ fNULLg) where each Di is a domain and NULL a special value (for missing values) I a tuple: an element of a relation I a database: a collection of relations 21 Examples Clients Accounts I Domains: I Domains: + + I N : positive integers I N : positive integers I G = fF; Mg: genders I D: dates I D: dates I An account relation: + + I A client relation: I a subset of N × N × D + I a subset of N × G × D I a set of tuples such as I a set of tuples such as (1; 8; 1995−03−24) (1; F; 1970−12−13) 22 Relation type/variable Specifying relations I somewhat imprecise vocabulary Qn I a relation (value) r 2 i=1(Di [ fNULLg) I a relation type/variable (relvar) I a named specification of the relation I domains and names: column types Notation I a relation type/variable is denoted R(A1;:::; An): I R: name of the relation type I Ak : column type names I e.g. client(client_id, gender, birth_date) I and account(account_id, district_id, date) 23 ER and relational model Correspondence I Entity type: relation type/variable, relvar I Entity: tuple I Attribute type: column type (frequently called attribute type) I Attribute: value in a tuple More about ER ! relational later 24 Keys Superkeys and keys I a relation is a set ) each tuple is unique I a superkey I any subset of the attribute types such that no tuples of the relation share exactly the same values for these attribute types I default superkey: all the attribute types! I a key: a minimal superkey (removing attribute types removes it superkey status) France data set I departement(region_id,departement_id,name) I superkeys I (region_id,departement_id,name) I (departement_id,name) I (departement_id): a key! I (name): another key! 25 Primary and foreign keys Primary key I several keys: candidate keys I one of the is the primary key I other are alternative keys Foreign keys I a set of attribute types FK in a relation type R1 is a foreign key of R1 if 1. a candidate key in a relation R2 has exactly the same domains as the ones of the attribute types in FK 2. values on FK in a tuple in r1 are either NULL or occur in a tuple in r2 I in simple terms: a foreign key links R1 to R2 26 Example Relvar I client(client_id, gender, birth_date) I account(account_id, district_id, date) I disp(disp_id, client_id, account_id, type) with type2 fOWNER; DISPONENT g Many keys I bold and underlined: primary keys I underlined: foreign keys I district_id is also a foreign key but for another relation I in this database, the relation between a client and an account is unique and thus (client_id, account_id) is candidate key 27 Integrity Constraints Relational constraints When implemented the relational model must enforce some constraints I Domain integrity: the values in a tuple are unique values from the associated domain or NULL (if allowed, i.e.