
Entity-Relationship Diagrams and the Relational Model

CS 186, Fall 2007, Lecture 2
R & G, Chaps. 2&3

A relationship, I think, is like a shark, you know? It has to constantly move forward or it dies. And I think what we got on our hands is a dead shark.

Woody Allen (from Annie Hall, 1979)

Review

• Why use a DBMS? OS provides RAM and disk
  – Concurrency
  – Recovery
  – Abstraction, Data Independence
  – Query Languages
  – Efficiency (for most tasks)
  – Security

Data Models

• DBMS models real world
• Data Model is link between user’s view of the world and bits stored in computer
• Many models exist
• We think in terms of..
  – Relational Model (clean and common)
  – Entity-Relationship model (design)
  – XML Model (exchange)

[Figure: the schema Student (sid: string, name: string, login: string, age: integer, gpa: real) sits between the user's view and the stored bits (10101, 11101)]

Why Study the Relational Model?

• Most widely used model.
• “Legacy systems” in older models
  – e.g., IBM’s IMS
• Object-oriented concepts merged in
  – “Object-Relational” – two variants
    • Object model known to the DBMS
    • Object-Relational Mapping (ORM) outside the DBMS
      – A la Rails
• XML features in most relational systems
  – Can export XML interfaces
  – Can provide XML storage/retrieval

Steps in Design

• Requirements Analysis
  – user needs; what must database do?
• Conceptual Design
  – high level description (often done w/ER model)
  – Rails encourages you to work here
• Logical Design
  – translate ER into DBMS data model
  – Rails requires you to work here too
• Schema Refinement
  – consistency, normalization
• Physical Design - indexes, disk layout
• Security Design - who accesses what, and how

Conceptual Design

• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What integrity constraints or business rules hold?
• A database `schema’ in the ER Model can be represented pictorially (ER diagrams).
• Can map an ER diagram into a relational schema.

ER Model Basics

[Figure: entity set Employees with attributes ssn, name, lot]

• Entity: Real-world object, distinguishable from other objects. An entity is described using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
  – All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!)
  – Each entity set has a key (underlined).
  – Each attribute has a domain.

ER Model Basics (Contd.)

[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by the relationship set Works_In, which has the attribute since]

• Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.
  – relationships can have their own attributes.
• Relationship Set: Collection of similar relationships.
  – An n-ary relationship set R relates n entity sets E1 ... En ; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En

ER Model Basics (Cont.)

[Figure: Employees (ssn, name, lot) participates in Reports_To in two roles, supervisor and subordinate; Employees also participates in Works_In (since) with Departments (did, dname, budget)]

• Same entity set can participate in different relationship sets, or in different “roles” in the same set.

Key Constraints

[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and by Works_In (since); small sketches of Many-to-Many, 1-to-Many and 1-to-1 relationship sets]

An employee can work in many departments; a dept can have many employees.

In contrast, each dept has at most one manager, according to the key constraint on Manages.

…to be clear…

• Recall that each relationship has exactly one element of each Entity Set
  – “1-M” is a constraint on the Relationship Set, not each relationship
• Think of 1-M-M ternary relationship

Participation Constraints

• Does every employee work in a department?
• If so, this is a participation constraint
  – the participation of Employees in Works_In is said to be total (vs. partial)
  – What if every department has an employee working in it?
• Basically means “at least one”

[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since); the key constraint combined with total participation means: “exactly one”]

Weak Entities

A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
  – Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
  – Weak entity set must have total participation in this identifying relationship set.

[Figure: Employees (ssn, name, lot), Policy (cost), Dependents (pname, age)]

Weak entities have only a “partial key” (dashed underline)
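The slides do not show how a weak entity set is turned into a table, so the following is only a sketch of one common translation, not the lecture's own. It assumes the Dependents weak entity set and its identifying relationship Policy are folded into a single table (here called `Dep_Policy`, a made-up name), keyed by the partial key `pname` plus the owner's key `ssn`. The ssn values and the second employee are invented for illustration, and SQLite (via Python's `sqlite3`) stands in for the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
conn.executescript("""
CREATE TABLE Employees (
    ssn  CHAR(11) PRIMARY KEY,
    name CHAR(20),
    lot  INTEGER
);
-- Weak entity set Dependents merged with its identifying relationship Policy.
-- Key = partial key (pname) + owner's key (ssn); NOT NULL gives total participation.
CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age   INTEGER,
    cost  REAL,
    ssn   CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
);
""")

# Hypothetical data: two employees who each have a dependent named Joe.
conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48)")
conn.execute("INSERT INTO Employees VALUES ('231-31-5368', 'Smiley', 22)")
conn.executemany(
    "INSERT INTO Dep_Policy VALUES (?, ?, ?, ?)",
    [("Joe", 5, 100.0, "123-22-3666"), ("Joe", 7, 120.0, "231-31-5368")],
)

# A weak entity cannot outlive its owner: deleting the owner removes its dependents.
conn.execute("DELETE FROM Employees WHERE ssn = '123-22-3666'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
```

The two dependents share the partial key `Joe` and are told apart only by their owner's `ssn`, which is the sense in which a weak entity has only a partial key.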

Binary vs. Ternary Relationships

If each policy is owned by just 1 employee:

[Figure, bad design: Employees (ssn, name, lot), Dependents (pname, age) and Policies (policyid, cost) joined by one ternary relationship set Covers]

Key constraint on Policies would mean policy can only cover 1 dependent!

[Figure, better design: Employees (ssn, name, lot), Purchaser, Policies (policyid, cost), Beneficiary, Dependents (pname, age)]

• Think through all the constraints in the 2nd diagram!

Binary vs. Ternary Relationships (Contd.)

• Previous example illustrated a case when two binary relationships were better than one ternary relationship.
• An example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute. (With no new entity sets!)
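To see why binary relationship sets cannot replace the ternary Contracts, here is a small sketch with made-up ids. It assumes each relationship set becomes one table keyed by the participating entities' ids (`pid`, `did`, `sid` are hypothetical column names), and it joins the three binary tables to show that they produce a combination that was never contracted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Ternary relationship set: one row per (part, department, supplier) contract.
CREATE TABLE Contracts (
    pid INTEGER, did INTEGER, sid INTEGER,
    qty INTEGER,
    PRIMARY KEY (pid, did, sid)
);
-- The three binary relationship sets from the alternative design.
CREATE TABLE CanSupply (sid INTEGER, pid INTEGER, PRIMARY KEY (sid, pid));
CREATE TABLE Needs     (did INTEGER, pid INTEGER, PRIMARY KEY (did, pid));
CREATE TABLE DealsWith (did INTEGER, sid INTEGER, PRIMARY KEY (did, sid));

-- Hypothetical data: dept 1 buys part 10 from supplier 100 and part 20 from supplier 200.
INSERT INTO Contracts VALUES (10, 1, 100, 5), (20, 1, 200, 7);
-- Supplier 200 happens to stock part 10 as well.
INSERT INTO CanSupply VALUES (100, 10), (200, 20), (200, 10);
INSERT INTO Needs     VALUES (1, 10), (1, 20);
INSERT INTO DealsWith VALUES (1, 100), (1, 200);
""")

contracts = set(conn.execute("SELECT pid, did, sid FROM Contracts"))

# Everything the binary design can tell us: D needs P, S can supply P, D deals with S.
implied = set(conn.execute("""
    SELECT n.pid, n.did, c.sid
    FROM Needs n
    JOIN CanSupply c ON c.pid = n.pid
    JOIN DealsWith d ON d.did = n.did AND d.sid = c.sid
"""))

# Combinations the binary tables suggest but that were never contracted.
spurious = implied - contracts
```

The binary tables also have nowhere to put `qty`, since it describes a (part, department, supplier) triple rather than any one pair.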

Binary vs. Ternary Relationships (Contd.)

[Figure: ternary relationship set Contract (qty) among Parts, Departments and Suppliers VS. three binary relationship sets: Departments needs Parts, Suppliers can-supply Parts, Departments deals-with Suppliers]

– S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S.
– How do we record qty?

Summary so far

• Entities and Entity Set (boxes)
• Relationships and Relationship sets (diamonds)
  – binary
  – n-ary
• Key constraints (1-1,1-N, M-N, arrows)
• Participation constraints (bold for Total)
• Weak entities - require strong entity for key

Administrivia

• Blog online
• Syllabus & HW calendar coming on-line
  – Schedule and due dates may change (check frequently)
  – Lecture notes are/will be posted
• HW 0 posted -- due Friday night!
  – Accts forms!
• Other textbooks
  – Korth/Silberschatz/Sudarshan
  – O’Neil and O’Neil
  – Garcia-Molina/Ullman/Widom

Other Rails Resources

• Rails API: http://api.rubyonrails.org
• Online tutorials
  – E.g. http://poignantguide.net/ruby
  – Screencasts: http://www.rubyonrails.org/screencasts
  – Armando Fox’s daylong seminar: http://webcast.berkeley.edu/event_details.php?webcastid=20854
• There are tons of support materials and fora on the web for RoR

Relational Database: Definitions

• Relational database: a set of relations.
• Relation: made up of 2 parts:
  – Schema : specifies name of relation, plus name and type of each column.
    • E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Instance : a table, with rows and columns.
    • #rows = cardinality
    • #fields = degree / arity
• Can think of a relation as a set of rows or tuples.
  – i.e., all rows are distinct

Ex: Instance of Students Relation

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

• Cardinality = 3, arity = 5 , all rows distinct
• Do all values in each column of a relation instance have to be distinct?
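The numbers on the slide can be checked mechanically. The sketch below loads the Students instance into an in-memory SQLite database (SQLite is just a convenient stand-in for a DBMS here) and computes cardinality, arity, and whether rows and column values are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

cur = conn.execute("SELECT * FROM Students")
instance = cur.fetchall()

cardinality = len(instance)        # number of rows
arity = len(cur.description)       # number of fields (degree)

# A relation is a set of rows, so every row must be distinct ...
all_rows_distinct = len(set(instance)) == cardinality

# ... but values within a single column may repeat (two students are named Smith).
names = [row[1] for row in instance]
name_values_distinct = len(set(names)) == len(names)
```

So the answer to the slide's question is no: rows must be distinct as a whole, but the `name` column here already contains a repeated value.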

SQL - A language for Relational DBs

• SQL (a.k.a. “Sequel”), standard language
• Data Definition Language (DDL)
  – create, modify, delete relations
  – specify constraints
  – administer users, security, etc.
• Data Manipulation Language (DML)
  – Specify queries to find tuples that satisfy criteria
  – add, modify, remove tuples

SQL Overview

• CREATE TABLE <name> ( <field> <domain>, … )
• INSERT INTO <name> (<field names>) VALUES (<field values>)
• DELETE FROM <name> WHERE <condition>
• UPDATE <name> SET <field name> = <value> WHERE <condition>
• SELECT <fields> FROM <name> WHERE <condition>

Creating Relations in SQL

• Creates the Students relation.
  – Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.

CREATE TABLE Students
  (sid CHAR(20),
   name CHAR(20),
   login CHAR(10),
   age INTEGER,
   gpa FLOAT)

Table Creation (continued)

• Another example: the Enrolled table holds information about courses students take.

CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2))

Adding and Deleting Tuples

• Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa)
  VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE
  FROM Students S
  WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!

Keys

• Keys are a way to associate tuples in different relations
• Keys are one form of integrity constraint (IC)

Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

[Figure label: PRIMARY Key]
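The INSERT and DELETE statements from the Adding and Deleting Tuples slide can be tried out with Python's built-in `sqlite3` module. This is a sketch rather than the course's setup: the two pre-existing rows are taken from the Students instance, and the table alias `S` is dropped from the DELETE to keep the statement portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Insert a single tuple, naming the fields explicitly as on the slide.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
count_after_insert = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Delete every tuple that satisfies the condition; rowcount reports how many went.
deleted = conn.execute("DELETE FROM Students WHERE name = 'Smith'").rowcount
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
```

Note that the DELETE removes both Smiths, since the condition is evaluated against every tuple rather than picking one.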

Primary Keys

• A set of fields is a superkey if:
  – No two distinct tuples can have same values in all key fields
• A set of fields is a key for a relation if :
  – It is a superkey
  – No proper subset of the fields is a superkey
• what if >1 key for a relation?
  – One of the keys is chosen (by DBA) to be the primary key. Other keys are called candidate keys.
• E.g.
  – sid is a key for Students.
  – What about name?
  – The set {sid, gpa} is a superkey.

Primary and Candidate Keys in SQL

• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
• Keys must be used carefully!
• “For a given student and course, there is a single grade.”

CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2),
   PRIMARY KEY (sid,cid))

vs.

CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2),
   PRIMARY KEY (sid),
   UNIQUE (cid, grade))

“Students can take only one course, and no two students in a course receive the same grade.”

Foreign Keys, Referential Integrity

• Foreign key: Set of fields in one relation that is used to `refer’ to a tuple in another relation.
  – Must correspond to the primary key of the other relation.
  – Like a `logical pointer’.
• If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references.)

Foreign Keys in SQL

• E.g. Only students listed in the Students relation should be allowed to enroll for courses.
  – sid is a foreign key referring to Students:

CREATE TABLE Enrolled
  (sid CHAR(20), cid CHAR(20), grade CHAR(2),
   PRIMARY KEY (sid,cid),
   FOREIGN KEY (sid) REFERENCES Students )

Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
11111  English102   A

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

The Enrolled row with sid 11111 matches no Students tuple: it is a dangling reference.
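The key and foreign key declarations above can be watched in action. The sketch below uses the Enrolled definition from the Foreign Keys in SQL slide; two things are assumptions on my part: Students is given `PRIMARY KEY (sid)` so there is a key for the foreign key to reference, and SQLite needs `PRAGMA foreign_keys = ON` before it enforces foreign keys at all. `try_insert` is a helper name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE Students
  (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT,
   PRIMARY KEY (sid));          -- assumed: the slides' Students has no key clause
CREATE TABLE Enrolled
  (sid CHAR(20), cid CHAR(20), grade CHAR(2),
   PRIMARY KEY (sid, cid),
   FOREIGN KEY (sid) REFERENCES Students);
INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4);
""")


def try_insert(row):
    """Return True if the DBMS accepts the Enrolled tuple, False if an IC rejects it."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(("53666", "Carnatic101", "C"))       # legal tuple
duplicate_key = try_insert(("53666", "Carnatic101", "B"))  # same (sid, cid): second grade
dangling = try_insert(("11111", "English102", "A"))        # no such student
```

The second insert fails because of the primary key ("for a given student and course, there is a single grade"), and the third because it would create a dangling reference.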

Integrity Constraints (ICs)

• IC: condition that must be true for any instance of the database; e.g., domain constraints.
  – ICs are specified when schema is defined.
  – ICs are checked when relations are modified.
• A legal instance of a relation is one that satisfies all specified ICs.
  – DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more faithful to real-world meaning.
  – Avoids data entry errors, too!

Enforcing Referential Integrity

• Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.
• What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)
• What should be done if a Students tuple is deleted?
  – Also delete all Enrolled tuples that refer to it?
  – Disallow deletion of a Students tuple that is referred to?
  – Set sid in Enrolled tuples that refer to it to a default sid?
  – (In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ or `inapplicable’.)
• Similar issues arise if primary key of Students tuple is updated.
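The four choices for deleting a referenced Students tuple correspond to SQL's `ON DELETE` clause. The sketch below tries each one in SQLite; the placeholder student `00000` used as the default sid is invented for the example, and `delete_student_with` is a hypothetical helper name.

```python
import sqlite3


def delete_student_with(policy):
    """Delete student 53666 under the given ON DELETE policy.

    Returns the sid values left in Enrolled, or "rejected" if the DBMS
    refuses the deletion.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(f"""
    CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
    CREATE TABLE Enrolled
      (sid CHAR(20) DEFAULT '00000', cid CHAR(20), grade CHAR(2),
       FOREIGN KEY (sid) REFERENCES Students ON DELETE {policy});
    -- '00000' is a made-up placeholder student for the SET DEFAULT case.
    INSERT INTO Students VALUES ('53666', 'Jones'), ('00000', 'Unknown');
    INSERT INTO Enrolled VALUES ('53666', 'History105', 'B');
    """)
    try:
        conn.execute("DELETE FROM Students WHERE sid = '53666'")
    except sqlite3.IntegrityError:
        return "rejected"
    return [row[0] for row in conn.execute("SELECT sid FROM Enrolled")]


results = {
    policy: delete_student_with(policy)
    for policy in ("CASCADE", "NO ACTION", "SET DEFAULT", "SET NULL")
}
```

`CASCADE` deletes the Enrolled tuple too, `NO ACTION` disallows the deletion, `SET DEFAULT` repoints the tuple at the default sid, and `SET NULL` stores null in its place.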

Where do ICs Come From?

• ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.
• We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.
  – An IC is a statement about all possible instances!
  – From example, we know name is not a key, but the assertion that sid is a key is given to us.
• Key and foreign key ICs are the most common; more general ICs supported too.
• In the real world, sometimes the constraint should hold but doesn’t --> data cleaning!

Relational Query Languages

• A major strength of the relational model: supports simple, powerful querying of data.
• Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
  – The key: precise semantics for relational queries.
  – Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.

The SQL Query Language

• The most widely used relational query language.
  – Current std is SQL:2003; SQL92 is a basic subset
• To find all 18 year old students, we can write:

SELECT *
FROM Students S
WHERE S.age=18

• To find just names and logins, replace the first line:

SELECT S.name, S.login

Students instance:

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8

Querying Multiple Relations

• What does the following query compute?

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='A'

Given the following instance of Enrolled

sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

we get:

S.name  E.cid
Smith   Topology112
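The three queries above can be run as written against the instances shown on the slides. This sketch uses Python's `sqlite3` as a stand-in DBMS; only the variable names are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT);
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2));
INSERT INTO Students VALUES
    ('53666', 'Jones', 'jones@cs',   18, 3.4),
    ('53688', 'Smith', 'smith@ee',   18, 3.2),
    ('53650', 'Smith', 'smith@math', 19, 3.8);
INSERT INTO Enrolled VALUES
    ('53831', 'Carnatic101', 'C'),
    ('53831', 'Reggae203',   'B'),
    ('53650', 'Topology112', 'A'),
    ('53666', 'History105',  'B');
""")

# All 18 year old students.
eighteen = conn.execute(
    "SELECT * FROM Students S WHERE S.age=18"
).fetchall()

# Just names and logins: only the first line of the query changes.
names_logins = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age=18"
).fetchall()

# Names of students paired with the courses in which they got an A.
a_grades = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid=E.sid AND E.grade='A'"
).fetchall()
```

SQL does not promise any row order without ORDER BY, so results with more than one row are best compared as sets or after sorting.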

Semantics of a Query

• A conceptual evaluation method for the previous query:
  1. do FROM clause: compute cross-product of Students and Enrolled
  2. do WHERE clause: Check conditions, discard tuples that fail
  3. do SELECT clause: Delete unwanted fields
• Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.

Cross-product of Students and Enrolled Instances

S.sid  S.name  S.login     S.age  S.gpa  E.sid  E.cid        E.grade
53666  Jones   jones@cs    18     3.4    53831  Carnatic101  C
53666  Jones   jones@cs    18     3.4    53831  Reggae203    B
53666  Jones   jones@cs    18     3.4    53650  Topology112  A
53666  Jones   jones@cs    18     3.4    53666  History105   B
53688  Smith   smith@ee    18     3.2    53831  Carnatic101  C
53688  Smith   smith@ee    18     3.2    53831  Reggae203    B
53688  Smith   smith@ee    18     3.2    53650  Topology112  A
53688  Smith   smith@ee    18     3.2    53666  History105   B
53650  Smith   smith@math  19     3.8    53831  Carnatic101  C
53650  Smith   smith@math  19     3.8    53831  Reggae203    B
53650  Smith   smith@math  19     3.8    53650  Topology112  A
53650  Smith   smith@math  19     3.8    53666  History105   B
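The three conceptual steps can be spelled out directly in plain Python, with no DBMS involved. This is a sketch of the semantics only (a real system would never materialize the whole cross-product); the `Student` and `Enrollment` tuple types are names chosen for this example.

```python
from collections import namedtuple
from itertools import product

Student = namedtuple("Student", "sid name login age gpa")
Enrollment = namedtuple("Enrollment", "sid cid grade")

students = [
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53688", "Smith", "smith@ee", 18, 3.2),
    Student("53650", "Smith", "smith@math", 19, 3.8),
]
enrolled = [
    Enrollment("53831", "Carnatic101", "C"),
    Enrollment("53831", "Reggae203", "B"),
    Enrollment("53650", "Topology112", "A"),
    Enrollment("53666", "History105", "B"),
]

# 1. FROM clause: cross-product of Students and Enrolled (3 x 4 = 12 pairs).
cross = list(product(students, enrolled))

# 2. WHERE clause: keep pairs with S.sid = E.sid AND E.grade = 'A'.
kept = [(s, e) for s, e in cross if s.sid == e.sid and e.grade == "A"]

# 3. SELECT clause: drop every field except S.name and E.cid.
answer = [(s.name, e.cid) for s, e in kept]
```

Any faster plan, such as filtering Enrolled on grade before joining, is acceptable precisely because it must return the same `answer` as these three steps.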

Relational Model: Summary

• A tabular representation of data.
• Simple and intuitive, currently the most widely used
  – Object-relational support in most products
  – XML support added in SQL:2003, most systems
• Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.
  – Two important ICs: primary and foreign keys
  – In addition, we always have domain constraints.
• Powerful query languages exist.
  – SQL is the standard commercial one
    • DDL - Data Definition Language
    • DML - Data Manipulation Language

GOSUB XML;

Internet Moment for Programmers

• Programmers think about objects (structs)
  – Nested and interleaved
• Often want to “persist” these things
• Options
  – encode opaquely and store
  – translate to a structured form
    • relational DB, XML file
  – pros and cons?

Remember the Inequality!

dapp/dt << denv/dt

• If storing indefinitely…use a flexible representation

YUCK!!

• How do I “relationalize” my objects?
• Have to write a converter for each class?
• Think about when to save things into the DB?
• Good news:
  – Can all be automated!
  – With varying amounts of trouble

Object-Relational Mappings

• Roughly:
  – Class ~ Entity Set
  – Instance ~ Entity
  – Data member ~ Attribute
  – Reference ~ Foreign Key

Details, details

• We have to map this down to tables
• Which table holds which class of object?
• What about relationships?
• Solution #1: Declarative Configuration
  – Write a description file (often in XML)
    • E.g. Enterprise Java Beans (EJBs)
• Solution #2: Convention
  – Agree to use some conventions
    • E.g. Rails

Ruby on Rails

• Ruby: an OO scripting language
  – and a pretty nice one, too
• Rails: a framework for web apps
  – “convention over configuration”
    • great for standard web-app stuff!
  – allows overriding as needed
• Very ER-like

Rails and ER

• Models
  – Employees
  – Departments

[Figure: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
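To make the "Class ~ Entity Set, Instance ~ Entity, Data member ~ Attribute" correspondence concrete, here is a toy convention-based mapper. It is emphatically not how Rails or any real ORM is implemented; the convention (table name = lowercased class name plus "s"), the `SQL_TYPES` table and all function names are assumptions made up for this sketch.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Assumed mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


@dataclass
class Employee:          # class ~ entity set
    ssn: str             # data member ~ attribute
    name: str
    lot: int


def table_name(cls):
    """Convention: the table is the lowercased class name, pluralized with 's'."""
    return cls.__name__.lower() + "s"


def create_table_sql(cls):
    """Derive a CREATE TABLE statement from the class's declared fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {table_name(cls)} ({cols})"


def save(conn, obj):
    """Store one instance (~ entity) as one row."""
    marks = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {table_name(type(obj))} VALUES ({marks})", astuple(obj))


def load_all(conn, cls):
    """Rebuild instances from every row of the class's table."""
    return [cls(*row) for row in conn.execute(f"SELECT * FROM {table_name(cls)}")]


conn = sqlite3.connect(":memory:")
ddl = create_table_sql(Employee)
conn.execute(ddl)
save(conn, Employee("123-22-3666", "Attishoo", 48))  # ssn value is invented
loaded = load_all(conn, Employee)
```

Because the table and column names follow from the class itself, nothing like an XML description file is needed, which is the point of "convention over configuration".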

Some Rails “Models”

A More Complex Example

app/models/state.rb

class State < ActiveRecord::Base
  has_many :cities
end

app/models/city.rb

class City < ActiveRecord::Base
  belongs_to :state
end
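By Rails convention, the State and City models above map to tables named `states` and `cities`, and `belongs_to :state` expects a `state_id` foreign key column in `cities`. The sketch below writes that implied schema out by hand in SQLite and shows roughly which queries the two associations stand for. The `name` columns and the sample rows are assumptions added for illustration, and the SQL is an approximation, not what Rails literally generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tables implied by the naming conventions: plural table names, integer id keys.
CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities (
    id       INTEGER PRIMARY KEY,
    name     TEXT,
    state_id INTEGER REFERENCES states(id)   -- belongs_to :state
);
-- Hypothetical sample data.
INSERT INTO states VALUES (1, 'California');
INSERT INTO cities VALUES (1, 'Berkeley', 1), (2, 'Oakland', 1);
""")

# Roughly what state.cities asks for (has_many :cities).
cities_of_state = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM cities WHERE state_id = ? ORDER BY name", (1,)
    )
]

# Roughly what city.state asks for (belongs_to :state).
state_of_city = conn.execute(
    "SELECT s.name FROM states s JOIN cities c ON c.state_id = s.id WHERE c.name = ?",
    ("Berkeley",),
).fetchone()[0]
```

In ER terms this is a one-to-many relationship set between State and City, stored as a foreign key on the "many" side.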

Further Reading

• Chapter 18 (through 18.3) in Agile Web Development with Rails