Review Entity-Relationship Diagrams and the • Why use a DBMS? OS provides RAM Relational Model and disk CS 186, Fall 2007, Lecture 2 R & G, Chaps. 2&3
A relationship, I think, is like a shark, you know? It has to constantly move forward or it dies. And I think what we got on our hands is a dead shark.
Woody Allen (from Annie Hall, 1979)
Review Data Models
• Why use a DBMS? OS provides RAM and disk • DBMS models real world • Data Model is link – Concurrency between user’s view of – Recovery the world and bits stored in computer – Abstraction, Data Independence • Many models exist Student (sid: string, name: string, login: string, age: integer, gpa:real) – Query Languages • We think in terms of.. – Relational Model (clean and – Efficiency (for most tasks) common) – Security – Entity-Relationship model (design) 10101 – Data Integrity – XML Model (exchange) 11101
Why Study the Relational Model? Steps in Database Design
• Requirements Analysis • Most widely used model. – user needs; what must database do? • “Legacy systems” in older models • Conceptual Design – e.g., IBM’s IMS – high level description (often done w/ER model) • Object-oriented concepts merged in – Rails encourages you to work here – “Object-Relational” – two variants • Logical Design • Object model known to the DBMS – translate ER into DBMS data model • Object-Relational Mapping (ORM) outside the DBMS – Rails requires you to work here too – A la Rails • Schema Refinement • XML features in most relational systems – consistency, normalization – Can export XML interfaces • Physical Design - indexes, disk layout – Can provide XML storage/retrieval • Security Design - who accesses what, and how Conceptual Design name ER Model Basics ssn lot
• What are the entities and relationships in the Employees enterprise? • Entity: Real-world object, distinguishable from • What information about these entities and other objects. An entity is described using a set of attributes. relationships should we store in the database? • Entity Set: A collection of similar entities. E.g., all employees. • What integrity constraints or business rules hold? – All entities in an entity set have the same set • A database `schema’ in the ER Model can be of attributes. (Until we consider hierarchies, anyway!) represented pictorially (ER diagrams). – Each entity set has a key (underlined). • Can map an ER diagram into a relational schema. – Each attribute has a domain.
ER Model Basics (Contd.) ER Model Basics (Cont.) name
since ssn lot name dname ssn lot did budget Employees since dname Employees Works_In Departments super- subor- did budget visor dinate
Reports_To • Relationship: Association among two or more Departments Works_In entities. E.g., Attishoo works in Pharmacy department. •Same entity set can participate in different – relationships can have their own attributes. relationship sets, or in different “roles” in • Relationship Set: Collection of similar relationships. the same set.
– An n-ary relationship set R relates n entity sets E1 ... En ; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En
name since dname ssn lot did budget …to be clear… Key Constraints Employees Manages Departments • Recall that each relationship has exactly An employee can one element of each Entity Set work in many Works_In – “1-M” is a constraint on the Relationship departments; a since dept can have Set, not each relationship many employees. • Think of 1-M-M ternary relationship
In contrast, each dept has at most one manager, according to the key constraint Many-to- 1-to Many 1-to-1 on Manages. Many Participation Constraints Weak Entities • Does every employee work in a department? A weak entity can be identified uniquely only by • If so, this is a participation constraint considering the primary key of another (owner) – the participation of Employees in Works_In is said to be entity. total (vs. partial) – Owner entity set and weak entity set must – What if every department has an employee working in it? participate in a one-to-many relationship set (one • Basically means “at least one” owner, many weak entities). since name dname – Weak entity set must have total participation in ssn lot did budget this identifying relationship set.
Employees Manages Departments name cost ssn lot pname age Works_In
Means: “exactly one” Employees Policy Dependents since Weak entities have only a “partial key” (dashed underline)
Binary vs. Ternary Relationships name Binary vs. Ternary Relationships (Contd.) ssn lot pname age
Employees Covers Dependents If each policy is owned by just 1 employee: • Previous example illustrated a case when two binary relationships were better than one ternary Bad design Policies Key constraint on relationship. Policies would policyid cost mean policy can name pname age only cover 1 ssn lot • An example in the other direction: a ternary relation dependent! Dependents Contracts relates entity sets Parts, Departments and Employees Suppliers, and has descriptive attribute qty. No Purchaser • Think through all Beneficiary combination of binary relationships is an adequate the constraints in substitute. (With no new entity sets!) the 2nd diagram! Better design Policies
policyid cost
Summary so far Binary vs. Ternary Relationships (Contd.) qty • Entities and Entity Set (boxes) Parts Contract Departments • Relationships and Relationship sets (diamonds) VS. – binary Suppliers – n-ary Parts needs Departments • Key constraints (1-1,1-N, M-N, arrows)
can-supply • Participation constraints (bold for Total) Suppliers deals-with • Weak entities - require strong entity for key
– S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S. – How do we record qty? Administrivia Other Rails Resources
• Blog online • Rails API: http://api.rubyonrails.org • Syllabus & HW calendar coming on-line • Online tutorials – Schedule and due dates may change (check – E.g. http://poignantguide.net/ruby frequently) – Screencasts: – Lecture notes are/will be posted http://www.rubyonrails.org/screencasts • HW 0 posted -- due Friday night! – Armando Fox’s daylong seminar: – Accts forms! http://webcast.berkeley.edu/event_details.php?we • Other textbooks bcastid=20854 – Korth/Silberschatz/Sudarshan – O’Neil and O’Neil • There are tons of support materials and fora – Garcia-Molina/Ullman/Widom on the web for RoR
Relational Database: Definitions Ex: Instance of Students Relation
• Relational database: a set of relations. • Relation: made up of 2 parts: sid name login age gpa – Schema : specifies name of relation, plus name 53666 Jones jones@cs 18 3.4 and type of each column. • E.g. Students(sid: string, name: string, login: string, age: 53688 Smith smith@eecs 18 3.2 integer, gpa: real) 53650 Smith smith@math 19 3.8
– Instance : a table, with rows and columns. • #rows = cardinality Cardinality = 3, arity = 5 , all rows distinct • #fields = degree / arity • • Can think of a relation as a set of rows or • Do all values in each column of a relation instance tuples. have to be distinct? – i.e., all rows are distinct
SQL - A language for Relational DBs SQL Overview
• SQL (a.k.a. “Sequel”), standard • CREATE TABLE
• Creates the Students relation. • Another example: the Enrolled table – Note: the type (domain) of each field is holds information about courses specified, and enforced by the DBMS students take. whenever tuples are added or modified. CREATE TABLE Enrolled CREATE TABLE Students (sid CHAR(20), (sid CHAR(20), cid CHAR(20), name CHAR(20), grade CHAR(2)) login CHAR(10), age INTEGER, gpa FLOAT)
Adding and Deleting Tuples Keys
• Can insert a single tuple using: • Keys are a way to associate tuples in INSERT INTO Students (sid, name, login, age, gpa) different relations VALUES (‘53688’, ‘Smith’, ‘smith@ee’, 18, 3.2) • Keys are one form of integrity constraint (IC) • Can delete all tuples satisfying some condition (e.g., name = Smith): Enrolled Students sid cid grade DELETE 53666 Carnatic101 C sid name login age gpa FROM Students S 53666 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53688 Smith smith@eecs 18 3.2 WHERE S.name = ‘Smith’ 53666 History105 B 53650 Smith smith@math 19 3.8
Powerful variants of these commands are available; more later! FOREIGN Key PRIMARY Key
Primary Keys Primary and Candidate Keys in SQL • Possibly many candidate keys (specified using • A set of fields is a superkey if: UNIQUE), one of which is chosen as the primary key. – No two distinct tuples can have same values in all key fields • A set of fields is a key for a relation if : • Keys must be used carefully! – It is a superkey • “For a given student and course, there is a single grade.” – No subset of the fields is a superkey CREATE TABLE Enrolled • what if >1 key for a relation? CREATE TABLE Enrolled (sid CHAR(20) (sid CHAR(20) – One of the keys is chosen (by DBA) to be the primary key. cid CHAR(20), Other keys are called candidate keys. cid CHAR(20), vs. grade CHAR(2), grade CHAR(2), • E.g. PRIMARY KEY (sid,cid)) PRIMARY KEY (sid), – sid is a key for Students. UNIQUE (cid, grade)) – What about name? – The set {sid, gpa} is a superkey. “Students can take only one course, and no two students in a course receive the same grade.” Foreign Keys, Referential Integrity Foreign Keys in SQL
• Foreign key: Set of fields in one relation • E.g. Only students listed in the Students relation that is used to `refer’ to a tuple in another should be allowed to enroll for courses. relation. – sid is a foreign key referring to Students: – Must correspond to the primary key of the other CREATE TABLE Enrolled relation. (sid CHAR(20),cid CHAR(20),grade CHAR(2), PRIMARY KEY (sid,cid), – Like a `logical pointer’. FOREIGN KEY (sid) REFERENCES Students )
Enrolled • If all foreign key constraints are enforced, sid cid grade Students sid name login age gpa referential integrity is achieved (i.e., no 53666 Carnatic101 C 53666 Jones jones@cs 18 3.4 53666 Reggae203 B dangling references.) 53688 Smith smith@eecs 18 3.2 53650 Topology112 A 53666 History105 B 53650 Smith smith@math 19 3.8 11111 English102 A
Enforcing Referential Integrity Integrity Constraints (ICs)
• IC: condition that must be true for any • Consider Students and Enrolled; sid in Enrolled is a instance of the database; e.g., domain foreign key that references Students. constraints. • What should be done if an Enrolled tuple with a non- – ICs are specified when schema is defined. existent student id is inserted? (Reject it!) – ICs are checked when relations are modified. • What should be done if a Students tuple is deleted? • A legal instance of a relation is one that – Also delete all Enrolled tuples that refer to it? satisfies all specified ICs. – Disallow deletion of a Students tuple that is referred to? – DBMS should not allow illegal instances. – Set sid in Enrolled tuples that refer to it to a default sid? • If the DBMS checks ICs, stored data is more – (In SQL, also: Set sid in Enrolled tuples that refer to it to a faithful to real-world meaning. special value null, denoting `unknown’ or `inapplicable’.) – Avoids data entry errors, too! • Similar issues arise if primary key of Students tuple is updated.
Where do ICs Come From? Relational Query Languages
• ICs are based upon the semantics of the real-world that is being described in the database relations. • A major strength of the relational model: • We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by supports simple, powerful querying of data. looking at an instance. • Queries can be written intuitively, and the – An IC is a statement about all possible instances! DBMS is responsible for efficient evaluation. – From example, we know name is not a key, but the – The key: precise semantics for relational queries. assertion that sid is a key is given to us. – Allows the optimizer to extensively re-order • Key and foreign key ICs are the most common; more operations, and still ensure that the answer does general ICs supported too. not change. • In the real world, sometimes the constraint should hold but doesn’t --> data cleaning! The SQL Query Language Querying Multiple Relations
• The most widely used relational query • What does the following query compute? language. – Current std is SQL:2003; SQL92 is a basic subset SELECT S.name, E.cid FROM Students S, Enrolled E • To find all 18 year old students, we can write: WHERE S.sid=E.sid AND E.grade='A' sid name login age gpa SELECT * Given the following instance of sid cid grade FROM Students S 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C WHERE S.age=18 Enrolled 53688 Smith smith@ee 18 3.2 53831 Reggae203 B
53650 Smith smith@math 19 3.8 53650 Topology112 A • To find just names and l ogins, replace the first line: 53666 History105 B SELECT S.name, S.login we get: S.name E.cid Smith Topology112
Semantics of a Query Cross-product of Students and Enrolled Instances
S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade • A conceptual evaluation method for the previous 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B query: 53666 Jones jones@cs 18 3.4 53650 Topology112 A 1. do FROM clause: compute cross-product of Students and 53666 Jones jones@cs 18 3.4 53666 History105 B Enrolled 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 2. do WHERE clause: Check conditions, discard tuples that fail 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B 3. do SELECT clause: Delete unwanted fields 53650 Smith smith@math 19 3.8 53831 Carnatic101 C • Remember, this is conceptual. Actual evaluation will 53650 Smith smith@math 19 3.8 53831 Reggae203 B 53650 Smith smith@math 19 3.8 53650 Topology112 A be much more efficient, but must produce the same 53650 Smith smith@math 19 3.8 53666 History105 B answers.
Relational Model: Summary
• A tabular representation of data. • Simple and intuitive, currently the most widely used – Object-relational support in most products – XML support added in SQL:2003, most systems GOSUB XML; • Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations. – Two important ICs: primary and foreign keys – In addition, we always have domain constraints.
• Powerful query languages exist. – SQL is the standard commercial one • DDL - Data Definition Language • DML - Data Manipulation Language Internet Moment Databases for Programmers
• Programmers think about objects (structs) – Nested and interleaved • Often want to “persist” these things • Options – encode opaquely and store – translate to a structured form • relational DB, XML file – pros and cons?
Remember the Inequality! \YUCK!!
• How do I “relationalize” my objects? dapp denv • Have to write a converter for each << class? dt dt • Think about when to save things into the DB? • If storing indefinitely…use a flexible representation • Good news: – Can all be automated ! – With varying amounts of trouble
Object-Relational Mappings Details, details
• Roughly: • We have to map this down to tables – Class ~ Entity Set • Which table holds which class of object? – Instance ~ Entity • What about relationships? – Data member ~ Attribute • Solution #1: Declarative Configuration – Reference ~ Foreign Key – Write a description file (often in XML) • E.g. Enterprise Java Beans (EJBs) • Solution #2: Convention – Agree to use some conventions • E.g. Rails Ruby on Rails Rails and ER
• Ruby: an OO scripting language • Models – and a pretty nice one, too – Employees • Rails: a framework for web apps – Departments – “convention over configuration” • great for standard web-app stuff! since – allows overriding as needed name dname • Very ER-like ssn lot did budget
Employees Works_In Departments
Some Rails “Models” A More Complex Example app/models/state.rb class State < ActiveRecord::Base has_many :cities end app/models/city.rb class City < ActiveRecord::Base belongs_to :state end
Further Reading
• Chapter 18 (through 18.3) in Agile Web Development with Rails