Database Management Systems s6
DATABASE MANAGEMENT SYSTEMS
UNIT II - RELATIONAL MODEL

Objectives
- To explain the relational model
- To know the definition of various keys
- To know the basic concepts of SQL
- To write queries in relational algebra and relational calculus
- To learn about SQL views and triggers
- To know the concepts of embedded SQL and dynamic SQL

2.1 Basic Structure
Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 × D2 × … × Dn. Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di.

Example: if
  customer-name = {Jones, Smith, Curry, Lindsay}
  customer-street = {Main, North, Park}
  customer-city = {Harrison, Rye, Pittsfield}
then
  r = {(Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)}
is a relation over customer-name × customer-street × customer-city.

Attribute Types
- Each attribute of a relation has a name.
- The set of allowed values for each attribute is called the domain of the attribute.
- Attribute values are (normally) required to be atomic, that is, indivisible.
  o E.g. multivalued attribute values are not atomic.
  o E.g. composite attribute values are not atomic.
- The special value null is a member of every domain.
- The null value causes complications in the definition of many operations.
  o We shall ignore the effect of null values in our main presentation and consider their effect later.

Relation Schema
- A1, A2, …, An are attributes.
- R = (A1, A2, …, An) is a relation schema.
  E.g. Customer-schema = (customer-name, customer-street, customer-city)
- r(R) is a relation on the relation schema R.
  E.g. customer (Customer-schema)

Relation Instance
- The current values (relation instance) of a relation are specified by a table.
- An element t of r is a tuple, represented by a row in a table.
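The definition of a relation as a subset of a Cartesian product can be checked directly. The following is a minimal Python sketch that uses the three domains from the example above; the variable names are chosen for illustration only.

```python
from itertools import product

# Domains from the example in the text.
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# The full Cartesian product D1 x D2 x D3: every possible 3-tuple.
all_tuples = set(product(customer_name, customer_street, customer_city))

# A relation is any subset of that product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

is_relation = r <= all_tuples  # subset test
print(len(all_tuples), is_relation)
```

The product contains 4 × 3 × 3 = 36 tuples, and r picks out 4 of them.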
2.2 Keys
Let K ⊆ R.
- K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R).
  o By "possible r" we mean a relation r that could exist in the enterprise we are modeling.
  o Example: {customer-name, customer-street} and {customer-name} are both superkeys of Customer, if no two customers can possibly have the same name.
- K is a candidate key if K is a minimal superkey, that is, no proper subset of K is a superkey.
  o Example: {customer-name} is a candidate key for Customer, since it is a superkey (assuming no two customers can possibly have the same name), and no proper subset of it is a superkey.

Determining Keys from E-R Sets
- Strong entity set. The primary key of the entity set becomes the primary key of the relation.
- Weak entity set. The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set.
- Relationship set. The union of the primary keys of the related entity sets becomes a superkey of the relation.
  o For binary many-to-one relationship sets, the primary key of the "many" entity set becomes the relation's primary key.
  o For one-to-one relationship sets, the relation's primary key can be that of either entity set.
  o For many-to-many relationship sets, the union of the primary keys becomes the relation's primary key.

[Figure: Schema Diagram for the Banking Enterprise]
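The superkey and candidate-key definitions can be tested against a single relation instance. The sketch below is illustrative only: the helper names `is_superkey` and `is_candidate_key` are hypothetical, and a check on one instance can refute a key but cannot prove it, because the definition quantifies over every possible relation r(R).

```python
def is_superkey(rows, key):
    """True if no two rows of this instance agree on all attributes in key."""
    seen = set()
    for row in rows:
        k = tuple(row[a] for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True


def is_candidate_key(rows, key):
    """A superkey that stops being one when any single attribute is dropped."""
    if not is_superkey(rows, key):
        return False
    return all(
        not is_superkey(rows, [a for a in key if a != drop]) for drop in key
    )


# The customer instance from section 2.1 (underscores replace hyphens).
customer = [
    {"customer_name": "Jones", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Smith", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Curry", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Lindsay", "customer_street": "Park", "customer_city": "Pittsfield"},
]

print(is_superkey(customer, ["customer_name", "customer_street"]))
print(is_candidate_key(customer, ["customer_name", "customer_street"]))
print(is_candidate_key(customer, ["customer_name"]))
```

Dropping one attribute at a time is enough to test minimality, because any superset of a superkey is itself a superkey.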
2.3 Query Languages
A query language is a language in which a user requests information from the database.
Categories of languages:
- procedural
- non-procedural
"Pure" languages:
- Relational Algebra
- Tuple Relational Calculus
- Domain Relational Calculus
Pure languages form the underlying basis of the query languages that people use.

2.3.1 Relational Algebra
Relational algebra is a procedural language with six basic operators:
- select
- project
- union
- set difference
- Cartesian product
- rename
Each operator takes one or two relations as input and gives a new relation as a result.

(i) Select Operation
Notation: σ_p(r), where p is called the selection predicate.
Defined as: σ_p(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not).
Each term is one of:
  <attribute> op <attribute>   or   <attribute> op <constant>
where op is one of =, ≠, >, ≥, <, ≤.
(ii) Project Operation
Notation: Π_{A1, A2, …, Ak}(r), where A1, A2, …, Ak are attribute names and r is a relation name.
- The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
- Duplicate rows are removed from the result, since relations are sets.
- E.g. to eliminate the branch-name attribute of account: Π_{account-number, balance}(account)
[Figure: Project Operation example]
(iii) Union Operation
Notation: r ∪ s
Defined as: r ∪ s = {t | t ∈ r or t ∈ s}
For r ∪ s to be valid:
- r and s must have the same arity (same number of attributes).
- The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s).
E.g. to find all customers with either an account or a loan: Π_{customer-name}(depositor) ∪ Π_{customer-name}(borrower)
[Figure: Union Operation example]

(iv) Set Difference Operation
Notation: r − s
Defined as: r − s = {t | t ∈ r and t ∉ s}
Set differences must be taken between compatible relations:
- r and s must have the same arity.
- The attribute domains of r and s must be compatible.
[Figure: Set Difference Operation example]
(v) Cartesian-Product Operation
Notation: r × s
Defined as: r × s = {t q | t ∈ r and q ∈ s}
- Assume that the attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).
- If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.
[Figure: Cartesian-Product Operation example]

(vi) Composition of Operations
Expressions can be built using multiple operations.
Example: σ_{A=C}(r × s)
[Figure: the tables r × s and σ_{A=C}(r × s)]
(vii) Rename Operation
- Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
- Allows us to refer to a relation by more than one name.
Example: ρ_x(E) returns the expression E under the name x.
If a relational-algebra expression E has arity n, then ρ_{x(A1, A2, …, An)}(E) returns the result of expression E under the name x, with the attributes renamed to A1, A2, …, An.
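The six basic operators can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not a database implementation: a relation is represented as a pair (attribute-name tuple, set of rows), all function names are hypothetical, and rename is simplified to prefixing each attribute with the new relation name.

```python
from itertools import product as cartesian


def select(rel, pred):
    """sigma_p(r): keep rows for which the predicate holds."""
    attrs, rows = rel
    return attrs, {t for t in rows if pred(dict(zip(attrs, t)))}


def project(rel, keep):
    """Pi_keep(r): keep the listed columns; the set removes duplicates."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in rows}


def union(r, s):
    """r U s, assuming r and s are compatible."""
    return r[0], r[1] | s[1]


def difference(r, s):
    """r - s, assuming r and s are compatible."""
    return r[0], r[1] - s[1]


def product(r, s):
    """r x s: concatenate every row of r with every row of s."""
    assert not set(r[0]) & set(s[0]), "attribute names must be disjoint; rename first"
    return r[0] + s[0], {t + q for t, q in cartesian(r[1], s[1])}


def rename(rel, name):
    """rho_name(r), simplified: qualify each attribute with the new name."""
    attrs, rows = rel
    return tuple(f"{name}.{a}" for a in attrs), rows


# Composition of operations: sigma_{A=C}(r x s)
r = (("A", "B"), {(1, 2), (3, 4)})
s = (("C", "D"), {(1, 10), (5, 20)})
composed = select(product(r, s), lambda t: t["A"] == t["C"])
print(composed)
```

Because rows are stored in a Python set, duplicate elimination in project and union comes for free, which mirrors the statement that relations are sets.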
Banking Example
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
Example Queries
1. Find all loans of over $1200.
   σ_{amount > 1200}(loan)
2. Find the loan number for each loan of an amount greater than $1200.
   Π_{loan-number}(σ_{amount > 1200}(loan))
3. Find the names of all customers who have a loan, an account, or both, from the bank.
   Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)
4. Find the names of all customers who have a loan and an account at the bank.
   Π_{customer-name}(borrower) ∩ Π_{customer-name}(depositor)
5. Find the names of all customers who have a loan at the Perryridge branch.
   Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan)))
6. Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.
   Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))) − Π_{customer-name}(depositor)
7. Find the names of all customers who have a loan at the Perryridge branch (two equivalent formulations).
   Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan)))
   (or)
   Π_{customer-name}(σ_{loan.loan-number = borrower.loan-number}((σ_{branch-name = "Perryridge"}(loan)) × borrower))
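Query 6 above can be traced on a tiny instance with Python set comprehensions. The rows below are invented purely for illustration and are not taken from the source.

```python
# Hypothetical sample data; column order follows the banking schema.
borrower = {("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15")}   # (customer-name, loan-number)
loan = {
    ("L-17", "Downtown", 1000),      # (loan-number, branch-name, amount)
    ("L-23", "Perryridge", 2000),
    ("L-15", "Perryridge", 1500),
}
depositor = {("Hayes", "A-102"), ("Johnson", "A-101")}                 # (customer-name, account-number)

# Pi_customer-name(sigma_branch-name="Perryridge"(sigma_loan numbers match(borrower x loan)))
perryridge_borrowers = {
    c
    for (c, l) in borrower
    for (l2, b, a) in loan
    if l == l2 and b == "Perryridge"
}

# Pi_customer-name(depositor)
account_holders = {c for (c, a) in depositor}

# Set difference gives the customers with a Perryridge loan but no account.
answer = perryridge_borrowers - account_holders
print(answer)
```

Hayes has a Perryridge loan but also holds an account, so the set difference removes that name and leaves only Smith.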
(viii) Formal Definition
A basic expression in the relational algebra consists of either one of the following:
- A relation in the database
- A constant relation
Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:
- E1 ∪ E2
- E1 − E2
- E1 × E2
- σ_P(E1), where P is a predicate on attributes in E1
- Π_S(E1), where S is a list consisting of some of the attributes in E1
- ρ_x(E1), where x is the new name for the result of E1

Additional Operations
We define additional operations that do not add any power to the relational algebra, but that simplify common queries:
- Set intersection
- Natural join
- Division
- Assignment

(a) Set-Intersection Operation
Notation: r ∩ s
Defined as: r ∩ s = {t | t ∈ r and t ∈ s}
Assume:
- r and s have the same arity
- attributes of r and s are compatible
Note: r ∩ s = r − (r − s)
[Figure: Set-Intersection Operation example]

(b) Natural-Join Operation
Notation: r ⋈ s
Let r and s be relations on schemas R and S respectively. Then r ⋈ s is a relation on schema R ∪ S obtained as follows:
- Consider each pair of tuples tr from r and ts from s.
- If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where
  o t has the same value as tr on R
  o t has the same value as ts on S
Example: R = (A, B, C, D), S = (E, B, D)
- Result schema = (A, B, C, D, E)
- r ⋈ s is defined as: Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
[Figure: Natural Join Operation example]
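The natural-join procedure described above (pair up tuples, keep pairs that agree on R ∩ S, emit one tuple over R ∪ S) can be sketched as follows. The function name and the sample rows are assumptions made for illustration; the schemas R = (A, B, C, D) and S = (E, B, D) are the ones from the text.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Join r and s on their common attributes; result schema is R followed by S - R."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in r_attrs]
    out_attrs = tuple(r_attrs) + tuple(s_only)
    out = set()
    for tr in r_rows:
        rd = dict(zip(r_attrs, tr))
        for ts in s_rows:
            sd = dict(zip(s_attrs, ts))
            # Keep the pair only if it agrees on every attribute in R ∩ S.
            if all(rd[a] == sd[a] for a in common):
                out.add(tuple(tr) + tuple(sd[a] for a in s_only))
    return out_attrs, out


R = ("A", "B", "C", "D")
S = ("E", "B", "D")
r = {(1, "a", "x", "p"), (2, "b", "y", "q")}
s = {(10, "a", "p"), (20, "a", "q"), (30, "b", "q")}

joined = natural_join(R, r, S, s)
print(joined)
```

The row (20, "a", "q") of s matches no row of r on both B and D, so it does not appear in the result.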
(c) Division Operation
Notation: r ÷ s
Suited to queries that include the phrase "for all".
Let r and s be relations on schemas R and S respectively, where
- R = (A1, …, Am, B1, …, Bn)
- S = (B1, …, Bn)
The result of r ÷ s is a relation on schema R − S = (A1, …, Am):
  r ÷ s = {t | t ∈ Π_{R−S}(r) ∧ ∀u ∈ s (tu ∈ r)}
[Figure: Division Operation example]
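The definition of r ÷ s can be turned into a short Python sketch: group the rows of r by their R − S part and keep the groups that are paired with every tuple of s. The function name `divide` and the sample rows are hypothetical.

```python
def divide(r_attrs, r_rows, s_attrs, s_rows):
    """r ÷ s: tuples t over R - S such that (t, u) is in r for every u in s."""
    a_attrs = [a for a in r_attrs if a not in s_attrs]
    a_idx = [r_attrs.index(a) for a in a_attrs]
    b_idx = [r_attrs.index(b) for b in s_attrs]
    # Map each candidate t (the R - S part) to the set of S-parts paired with it in r.
    paired = {}
    for row in r_rows:
        t = tuple(row[i] for i in a_idx)
        u = tuple(row[i] for i in b_idx)
        paired.setdefault(t, set()).add(u)
    s_set = set(s_rows)
    return tuple(a_attrs), {t for t, us in paired.items() if s_set <= us}


r_attrs = ("customer-name", "branch-name")
r_rows = {("Hayes", "Downtown"), ("Hayes", "Uptown"), ("Jones", "Downtown")}
s_attrs = ("branch-name",)
s_rows = {("Downtown",), ("Uptown",)}

quotient = divide(r_attrs, r_rows, s_attrs, s_rows)
print(quotient)
```

Jones is paired only with Downtown, so the "for all" condition fails for that name and only Hayes survives.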
Example Queries
1. Find all customers who have an account from at least the "Downtown" and the "Uptown" branches.
   Π_CN(σ_{BN = "Downtown"}(depositor ⋈ account)) ∩ Π_CN(σ_{BN = "Uptown"}(depositor ⋈ account))
   where CN denotes customer-name and BN denotes branch-name.
   (or)
   Π_{customer-name, branch-name}(depositor ⋈ account) ÷ ρ_{temp(branch-name)}({("Downtown"), ("Uptown")})
2. Find all customers who have an account at all branches located in Brooklyn city.
   Π_{customer-name, branch-name}(depositor ⋈ account) ÷ Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))

(ix) Extended Relational-Algebra Operations
- Generalized Projection
- Outer Join
- Aggregate Functions

Generalized Projection
Extends the projection operation by allowing arithmetic functions to be used in the projection list:
  Π_{F1, F2, …, Fn}(E)
- E is any relational-algebra expression.
- Each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E.
Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend:
  Π_{customer-name, limit − credit-balance}(credit-info)

Aggregate Functions and Operations
An aggregation function takes a collection of values and returns a single value as a result:
- avg: average value
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of values
Aggregate operation in relational algebra:
  _{G1, G2, …, Gn} 𝒢 _{F1(A1), F2(A2), …, Fn(An)} (E)
- E is any relational-algebra expression.
- G1, G2, …, Gn is a list of attributes on which to group (can be empty).
- Each Fi is an aggregate function.
- Each Ai is an attribute name.

Outer Join
An extension of the join operation that avoids loss of information. It computes the join and then adds to the result the tuples from one relation that do not match any tuple in the other relation.
It uses null values:
- null signifies that the value is unknown or does not exist.
- All comparisons involving null are (roughly speaking) false by definition. The precise meaning of comparisons with nulls is studied later.
[Figure: Outer Join example]
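A left outer join can be sketched in Python by padding unmatched left-hand tuples with None, which plays the role of null here. The function name and the sample rows are illustrative assumptions; lists are used instead of sets only to keep the output order readable.

```python
# Illustrative data: loan(loan-number, branch-name, amount), borrower(customer-name, loan-number)
loan = [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000), ("L-260", "Perryridge", 1700)]
borrower = [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")]


def left_outer_join(loan_rows, borrower_rows):
    """loan left-outer-join borrower on loan-number; unmatched loans get None."""
    result = []
    for (loan_no, branch, amount) in loan_rows:
        matches = [name for (name, b_loan_no) in borrower_rows if b_loan_no == loan_no]
        if matches:
            result.extend((loan_no, branch, amount, name) for name in matches)
        else:
            # No borrower row matches: keep the loan and pad with a null.
            result.append((loan_no, branch, amount, None))
    return result


joined = left_outer_join(loan, borrower)
for row in joined:
    print(row)
```

A plain join would drop loan L-260 entirely; the outer join keeps it with a null customer name. The borrower row for L-155 is still lost, because only the left relation is preserved in a left outer join.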
2.3.2 Domain Relational Calculus
A nonprocedural query language equivalent in power to the tuple relational calculus. Each query is an expression of the form:
  {<x1, x2, …, xn> | P(x1, x2, …, xn)}
  o x1, x2, …, xn represent domain variables
  o P represents a formula similar to that of the predicate calculus

Example Queries
1. Find the loan-number, branch-name, and amount for loans of over $1200.
   {<l, b, a> | <l, b, a> ∈ loan ∧ a > 1200}
2. Find the names of all customers who have a loan of over $1200.
   {<c> | ∃ l, b, a (<c, l> ∈ borrower ∧ <l, b, a> ∈ loan ∧ a > 1200)}
3. Find the names of all customers who have a loan from the Perryridge branch and the loan amount.
   {<c, a> | ∃ l (<c, l> ∈ borrower ∧ ∃ b (<l, b, a> ∈ loan ∧ b = "Perryridge"))}
   or
   {<c, a> | ∃ l (<c, l> ∈ borrower ∧ <l, "Perryridge", a> ∈ loan)}
4. Find the names of all customers having a loan, an account, or both at the Perryridge branch.
   {<c> | ∃ l (<c, l> ∈ borrower ∧ ∃ b, a (<l, b, a> ∈ loan ∧ b = "Perryridge")) ∨ ∃ a (<c, a> ∈ depositor ∧ ∃ b, n (<a, b, n> ∈ account ∧ b = "Perryridge"))}
5. Find the names of all customers who have an account at all branches located in Brooklyn.
   {<c> | ∃ s, n (<c, s, n> ∈ customer) ∧ ∀ x, y, z (<x, y, z> ∈ branch ∧ y = "Brooklyn" ⇒ ∃ a, b (<a, x, b> ∈ account ∧ <c, a> ∈ depositor))}

Safety of Expressions
{<x1, x2, …, xn> | P(x1, x2, …, xn)} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either in P or in a tuple of a relation mentioned in P).
2. For every "there exists" subformula of the form ∃x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
3. For every "for all" subformula of the form ∀x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).

2.3.3 Tuple Relational Calculus
A nonprocedural query language, where each query is of the form
  {t | P(t)}
- It is the set of all tuples t such that predicate P is true for t.
- t is a tuple variable; t[A] denotes the value of tuple t on attribute A.
- t ∈ r denotes that tuple t is in relation r.
- P is a formula similar to that of the predicate calculus.

Predicate Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨), not (¬)
4. Implication (⇒): x ⇒ y means that if x is true, then y is true
     x ⇒ y ≡ ¬x ∨ y
5. Set of quantifiers:
   o ∃ t ∈ r (Q(t)) ≡ "there exists" a tuple t in relation r such that predicate Q(t) is true
   o ∀ t ∈ r (Q(t)) ≡ Q is true "for all" tuples t in relation r

Banking Example
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)

Example Queries
1. Find the loan-number, branch-name, and amount for loans of over $1200.
   {t | t ∈ loan ∧ t[amount] > 1200}
2. Find the loan number for each loan of an amount greater than $1200.
   {t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
3. Find the names of all customers having a loan, an account, or both at the bank.
   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
4. Find the names of all customers who have a loan and an account at the bank.
   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
5. Find the names of all customers having a loan at the Perryridge branch.
   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ loan (u[branch-name] = "Perryridge" ∧ u[loan-number] = s[loan-number]))}
6. Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank.
   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ loan (u[branch-name] = "Perryridge" ∧ u[loan-number] = s[loan-number])) ∧ ¬ ∃ v ∈ depositor (v[customer-name] = t[customer-name])}
7. Find the names of all customers having a loan from the Perryridge branch, and the cities they live in.
   {t | ∃ s ∈ loan (s[branch-name] = "Perryridge" ∧ ∃ u ∈ borrower (u[loan-number] = s[loan-number] ∧ t[customer-name] = u[customer-name] ∧ ∃ v ∈ customer (u[customer-name] = v[customer-name] ∧ t[customer-city] = v[customer-city])))}
8. Find the names of all customers who have an account at all branches located in Brooklyn.
   {t | ∃ c ∈ customer (t[customer-name] = c[customer-name]) ∧ ∀ s ∈ branch (s[branch-city] = "Brooklyn" ⇒ ∃ u ∈ account (s[branch-name] = u[branch-name] ∧ ∃ d ∈ depositor (t[customer-name] = d[customer-name] ∧ d[account-number] = u[account-number])))}

Safety of Expressions
It is possible to write tuple calculus expressions that generate infinite relations.
For example, {t | ¬(t ∈ r)} results in an infinite relation if the domain of any attribute of relation r is infinite.
To guard against the problem, we restrict the set of allowable expressions to safe expressions.
An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P.
  o NOTE: this is more than just a syntax condition. E.g. {t | t[A] = 5 ∨ true} is not safe: it defines an infinite set with attribute values that do not appear in any relation or tuples or constants in P.

2.4 SQL Fundamentals
Basic Structure
- SQL is based on set and relational operations with certain modifications and enhancements.
- A typical SQL query has the form:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  Each Ai represents an attribute, each ri represents a relation, and P is a predicate.
- This query is equivalent to the relational algebra expression
    Π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))
- The result of an SQL query is a relation.

The select Clause
- The select clause lists the attributes desired in the result of a query. It corresponds to the projection operation of the relational algebra.
- E.g. find the names of all branches in the loan relation:
    select branch-name
    from loan
- In the "pure" relational algebra syntax, the query would be: Π_{branch-name}(loan)
- SQL allows duplicates in relations as well as in query results. To force the elimination of duplicates, insert the keyword distinct after select.
- Find the names of all branches in the loan relation, and remove duplicates:
    select distinct branch-name
    from loan
- The keyword all specifies that duplicates not be removed:
    select all branch-name
    from loan
- An asterisk in the select clause denotes "all attributes":
    select *
    from loan
- The select clause can contain arithmetic expressions involving the operations +, -, *, and /, and operating on constants or attributes of tuples.
The query:
    select loan-number, branch-name, amount * 100
    from loan
would return a relation which is the same as the loan relation, except that the attribute amount is multiplied by 100.

The where Clause
- The where clause specifies conditions that the result must satisfy. It corresponds to the selection predicate of the relational algebra.
- To find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200:
    select loan-number
    from loan
    where branch-name = 'Perryridge' and amount > 1200
- Comparison results can be combined using the logical connectives and, or, and not.
- Comparisons can be applied to results of arithmetic expressions.
- SQL includes a between comparison operator. E.g. find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000):
    select loan-number
    from loan
    where amount between 90000 and 100000

The from Clause
- The from clause lists the relations involved in the query. It corresponds to the Cartesian product operation of the relational algebra.
- Find the Cartesian product borrower × loan:
    select *
    from borrower, loan
- Find the name, loan number and loan amount of all customers having a loan at the Perryridge branch:
    select customer-name, borrower.loan-number, amount
    from borrower, loan
    where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'

The Rename Operation
- SQL allows renaming relations and attributes using the as clause:
    old-name as new-name
- Find the name, loan number and loan amount of all customers; rename the column name loan-number as loan-id:
    select customer-name, borrower.loan-number as loan-id, amount
    from borrower, loan
    where borrower.loan-number = loan.loan-number

Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Find the customer names and their loan numbers for all customers having a loan at some branch.
    select customer-name, T.loan-number, S.amount
    from borrower as T, loan as S
    where T.loan-number = S.loan-number
Find the names of all branches that have greater assets than some branch located in Brooklyn.
    select distinct T.branch-name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch-city = 'Brooklyn'

String Operations
- SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters:
  o percent (%): the % character matches any substring.
  o underscore (_): the _ character matches any single character.
- Find the names of all customers whose street includes the substring "Main":
    select customer-name
    from customer
    where customer-street like '%Main%'
- Match the name "Main%":
    like 'Main\%' escape '\'
- SQL supports a variety of string operations such as
  o concatenation (using "||")
  o converting from upper to lower case (and vice versa)
  o finding string length, extracting substrings, etc.

Ordering the Display of Tuples
- List in alphabetic order the names of all customers having a loan at the Perryridge branch:
    select distinct customer-name
    from borrower, loan
    where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'
    order by customer-name
- We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
  E.g. order by customer-name desc

Duplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result.
Multiset versions of some of the relational algebra operators, given multiset relations r1 and r2:
1. σ_θ(r1): If there are c1 copies of tuple t1 in r1, and t1 satisfies selection σ_θ, then there are c1 copies of t1 in σ_θ(r1).
2. Π_A(r1): For each copy of tuple t1 in r1, there is a copy of tuple Π_A(t1) in Π_A(r1), where Π_A(t1) denotes the projection of the single tuple t1.
3. r1 × r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple t1.t2 in r1 × r2.
Example: Suppose multiset relations r1(A, B) and r2(C) are as follows:
    r1 = {(1, a), (2, a)}
    r2 = {(2), (3), (3)}
Then Π_B(r1) would be {(a), (a)}, while Π_B(r1) × r2 would be
    {(a, 2), (a, 2), (a, 3), (a, 3), (a, 3), (a, 3)}
SQL duplicate semantics:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
is equivalent to the multiset version of the expression:
    Π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))

Set Operations
- The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations ∪, ∩, and −.
- Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all.
- Suppose a tuple occurs m times in r and n times in s; then it occurs:
  o m + n times in r union all s
  o min(m, n) times in r intersect all s
  o max(0, m - n) times in r except all s

Aggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a value:
- avg: average value
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of values

Aggregate Functions: Group By
Find the number of depositors for each branch.
    select branch-name, count (distinct customer-name)
    from depositor, account
    where depositor.account-number = account.account-number
    group by branch-name
Note: Attributes in the select clause outside of aggregate functions must appear in the group by list.
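The duplicate-handling and group by behaviour described above can be tried out with Python's standard sqlite3 module. This is a sketch under stated assumptions: SQLite is simply a convenient engine, the rows are invented for illustration, and underscores replace the hyphens in the notes' attribute names because hyphenated names are not plain SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    insert into account values ('A-101', 'Downtown', 500),
                               ('A-102', 'Perryridge', 400),
                               ('A-201', 'Perryridge', 900);
    insert into depositor values ('Johnson', 'A-101'), ('Hayes', 'A-102'),
                                 ('Hayes', 'A-201'), ('Turner', 'A-201');
""")

# Plain select keeps duplicates; select distinct removes them.
with_dups = conn.execute(
    "select branch_name from account order by branch_name").fetchall()
no_dups = conn.execute(
    "select distinct branch_name from account order by branch_name").fetchall()

# Number of distinct depositors per branch (the group by query from the text).
per_branch = conn.execute("""
    select branch_name, count(distinct customer_name)
    from depositor, account
    where depositor.account_number = account.account_number
    group by branch_name
    order by branch_name
""").fetchall()

print(with_dups)
print(no_dups)
print(per_branch)
```

Hayes holds two Perryridge accounts, so without the distinct inside count the Perryridge group would report 3 depositors instead of 2.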
Aggregate Functions: Having Clause
Find the names of all branches where the average account balance is more than $1,200.
    select branch-name, avg (balance)
    from account
    group by branch-name
    having avg (balance) > 1200
Note: predicates in the having clause are applied after the formation of groups, whereas predicates in the where clause are applied before forming groups.

Null Values
- It is possible for tuples to have a null value, denoted by null, for some of their attributes.
- null signifies an unknown value or that a value does not exist.
- The predicate is null can be used to check for null values. E.g. find all loan numbers which appear in the loan relation with null values for amount:
    select loan-number
    from loan
    where amount is null
- The result of any arithmetic expression involving null is null. E.g. 5 + null returns null.
- However, aggregate functions simply ignore nulls (more on this shortly).

Nested Subqueries
SQL provides a mechanism for the nesting of subqueries. A subquery is a select-from-where expression that is nested within another query. A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.

Example Queries
1. Find all customers who have both an account and a loan at the bank.
    select distinct customer-name
    from borrower
    where customer-name in (select customer-name from depositor)
2. Find all customers who have a loan at the bank but do not have an account at the bank.
    select distinct customer-name
    from borrower
    where customer-name not in (select customer-name from depositor)
3. Find all customers who have both an account and a loan at the Perryridge branch.
    select distinct customer-name
    from borrower, loan
    where borrower.loan-number = loan.loan-number
      and branch-name = 'Perryridge'
      and (branch-name, customer-name) in
          (select branch-name, customer-name
           from depositor, account
           where depositor.account-number = account.account-number)

Set Comparison
Find all branches that have greater assets than some branch located in Brooklyn.
    select distinct T.branch-name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch-city = 'Brooklyn'
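The having clause, a set-membership subquery, and the self-join used for set comparison can be run side by side with sqlite3. As before, this is only a sketch: the sample rows are invented, SQLite is an assumed engine, and underscores stand in for the hyphens used in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text, assets integer);
    create table account (account_number text, branch_name text, balance integer);
    create table borrower (customer_name text, loan_number text);
    create table depositor (customer_name text, account_number text);
    insert into branch values ('Brighton', 'Brooklyn', 7100), ('Downtown', 'Brooklyn', 9000),
                              ('Perryridge', 'Horseneck', 1700), ('Redwood', 'Palo Alto', 8000);
    insert into account values ('A-101', 'Downtown', 500), ('A-102', 'Perryridge', 400),
                               ('A-201', 'Perryridge', 2500);
    insert into borrower values ('Hayes', 'L-15'), ('Smith', 'L-23');
    insert into depositor values ('Hayes', 'A-102'), ('Johnson', 'A-101');
""")

# having filters groups after they are formed.
rich_branches = conn.execute("""
    select branch_name, avg(balance)
    from account
    group by branch_name
    having avg(balance) > 1200
""").fetchall()

# Set membership with a nested subquery: customers with both a loan and an account.
both = conn.execute("""
    select distinct customer_name
    from borrower
    where customer_name in (select customer_name from depositor)
""").fetchall()

# Set comparison via a self-join: assets greater than SOME Brooklyn branch.
bigger = conn.execute("""
    select distinct T.branch_name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch_city = 'Brooklyn'
    order by T.branch_name
""").fetchall()

print(rich_branches, both, bigger)
```

Downtown appears in the last result even though it is itself a Brooklyn branch, because its assets exceed those of Brighton; "some" only requires one such branch.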
2.5 Integrity
Domain Constraints
- Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency.
- Domain constraints are the most elementary form of integrity constraint.
- They test values inserted in the database, and test queries to ensure that the comparisons make sense.
- New domains can be created from existing data types. E.g.
    create domain Dollars numeric(12, 2)
    create domain Pounds numeric(12, 2)
- We cannot assign or compare a value of type Dollars to a value of type Pounds. However, we can convert types as below:
    (cast r.A as Pounds)
  (The value should also be multiplied by the dollar-to-pound conversion rate.)
- The check clause in SQL-92 permits domains to be restricted. For example, use a check clause to ensure that an hourly-wage domain allows only values greater than or equal to a specified value:
    create domain hourly-wage numeric(5, 2)
      constraint value-test check (value >= 4.00)
  o The domain has a constraint that ensures that the hourly-wage is at least 4.00.
  o The clause constraint value-test is optional; it is useful to indicate which constraint an update violated.
- Domain checks can have complex conditions:
    create domain AccountType char(10)
      constraint account-type-test check (value in ('Checking', 'Saving'))
    check (branch-name in (select branch-name from branch))

Referential Integrity
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.
Example: If "Perryridge" is a branch name appearing in one of the tuples in the account relation, then there exists a tuple in the branch relation for branch "Perryridge".

Formal Definition
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively. The subset α of R2 is a foreign key referencing K1 in relation r1, if for every t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[α].
A referential integrity constraint is also called a subset dependency, since it can be written as
    Π_α(r2) ⊆ Π_{K1}(r1)

Referential Integrity in the E-R Model
Consider relationship set R between entity sets E1 and E2. The relational schema for R includes the primary keys K1 of E1 and K2 of E2. Then K1 and K2 form foreign keys on the relational schemas for E1 and E2 respectively.
[Figure: relationship set R between entity sets E1 and E2]
Weak entity sets are also a source of referential integrity constraints, because the relation schema for a weak entity set must include the primary key attributes of the entity set on which it depends.

Checking Referential Integrity on Database Modification
The following tests must be made in order to preserve the referential integrity constraint
    Π_α(r2) ⊆ Π_K(r1)
- Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K] = t2[α]. That is, t2[α] ∈ Π_K(r1).
- Delete. If a tuple t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1:
    σ_{α = t1[K]}(r2)
  If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted (cascading deletions are possible).
- Update. There are two cases:
  o If a tuple t2 is updated in relation r2 and the update modifies values for the foreign key α, then a test similar to the insert case is made. Let t2' denote the new value of tuple t2. The system must ensure that t2'[α] ∈ Π_K(r1).
  o If a tuple t1 is updated in r1, and the update modifies values for the primary key (K), then a test similar to the delete case is made. The system must compute σ_{α = t1[K]}(r2) using the old value of t1 (the value before the update is applied). If this set is not empty, the update may be rejected as an error, or the update may be cascaded to the tuples in the set, or the tuples in the set may be deleted.

Referential Integrity in SQL
Primary and candidate keys and foreign keys can be specified as part of the SQL create table statement:
- The primary key clause lists attributes that comprise the primary key.
- The unique clause lists attributes that comprise a candidate key.
- The foreign key clause lists the attributes that comprise the foreign key and the name of the relation referenced by the foreign key.
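The insert and delete checks described above can be observed with sqlite3. This is a sketch, assuming SQLite as the engine: SQLite only enforces foreign keys after `pragma foreign_keys = on`, the `on delete cascade` option is the standard SQL way to request cascading deletions, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("pragma foreign_keys = on")
conn.execute(
    "create table branch (branch_name text primary key, branch_city text, assets integer)")
conn.execute("""
    create table account (
        account_number text primary key,
        branch_name text references branch on delete cascade,
        balance integer)
""")
conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700)")
conn.execute("insert into account values ('A-102', 'Perryridge', 400)")

# Insert case: the referenced branch does not exist, so the insert is rejected.
try:
    conn.execute("insert into account values ('A-999', 'Nowhere', 100)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete case: deleting the referenced branch cascades to its accounts.
conn.execute("delete from branch where branch_name = 'Perryridge'")
remaining = conn.execute("select count(*) from account").fetchone()[0]

print(insert_rejected, remaining)
```

Without the `on delete cascade` option the delete itself would be rejected, which is the other outcome the text allows for the delete case.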
- By default, a foreign key references the primary key attributes of the referenced table:
    foreign key (account-number) references account
- Short form for specifying a single column as foreign key:
    account-number char (10) references account
- Reference columns in the referenced table can be explicitly specified, but must be declared as primary/candidate keys:
    foreign key (account-number) references account(account-number)

2.6 Triggers
A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database. To design a trigger mechanism, we must:
- Specify the conditions under which the trigger is to be executed.
- Specify the actions to be taken when the trigger executes.
Triggers were introduced to the SQL standard in SQL:1999, but were supported even earlier using non-standard syntax by most databases.

Trigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by
- setting the account balance to zero
- creating a loan in the amount of the overdraft
- giving this loan a loan number identical to the account number of the overdrawn account
The condition for executing the trigger is an update to the account relation that results in a negative balance value.

Trigger Example in SQL:1999
    create trigger overdraft-trigger after update on account
    referencing new row as nrow
    for each row
    when nrow.balance < 0
    begin atomic
        insert into borrower
            (select customer-name, account-number
             from depositor
             where nrow.account-number = depositor.account-number);
        insert into loan values
            (nrow.account-number, nrow.branch-name, - nrow.balance);
        update account set balance = 0
            where account.account-number = nrow.account-number
    end

Triggering Events and Actions in SQL
- The triggering event can be insert, delete or update.
- Triggers on update can be restricted to specific attributes. E.g.
    create trigger overdraft-trigger after update of balance on account
- Values of attributes before and after an update can be referenced:
  o referencing old row as: for deletes and updates
  o referencing new row as: for inserts and updates
- Triggers can be activated before an event, which can serve as extra constraints. E.g. convert blanks to null:
    create trigger setnull-trigger before update on r
    referencing new row as nrow
    for each row
    when nrow.phone-number = ' '
    set nrow.phone-number = null

When Not To Use Triggers
Triggers were used earlier for tasks such as
- maintaining summary data (e.g. total salary of each department)
- replicating databases by recording changes to special relations (called change or delta relations) and having a separate process that applies the changes over to a replica
There are better ways of doing these now:
- Databases today provide built-in materialized view facilities to maintain summary data.
- Databases provide built-in support for replication.
Encapsulation facilities can be used instead of triggers in many cases:
- Define methods to update fields.
- Carry out actions as part of the update methods instead of through a trigger.
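The overdraft trigger from the Trigger Example can be approximated in SQLite through Python's sqlite3 module. This is a sketch, not the SQL:1999 form shown in the notes: SQLite uses `new.` instead of a `referencing` clause and a plain `begin ... end` body, identifiers use underscores, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text primary key, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    create table loan (loan_number text, branch_name text, amount integer);
    create table borrower (customer_name text, loan_number text);

    -- SQLite dialect of the overdraft trigger: fires only on updates of balance.
    create trigger overdraft_trigger after update of balance on account
    for each row when new.balance < 0
    begin
        insert into borrower
            select customer_name, account_number from depositor
            where depositor.account_number = new.account_number;
        insert into loan values (new.account_number, new.branch_name, -new.balance);
        update account set balance = 0 where account_number = new.account_number;
    end;

    insert into account values ('A-102', 'Perryridge', 400);
    insert into depositor values ('Hayes', 'A-102');
""")

# Withdraw 500 from an account holding 400: an overdraft of 100.
conn.execute("update account set balance = balance - 500 where account_number = 'A-102'")

balance = conn.execute(
    "select balance from account where account_number = 'A-102'").fetchone()[0]
loans = conn.execute("select * from loan").fetchall()
borrowers = conn.execute("select * from borrower").fetchall()
print(balance, loans, borrowers)
```

The update inside the trigger body does not fire the trigger again here: SQLite leaves recursive triggers off by default, and the new balance of 0 would fail the `when` condition anyway.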
2.7 Security
Security - protection from malicious attempts to steal or modify data.
o Database system level
  . Authentication and authorization mechanisms to allow specific users access only to required data
  . We concentrate on authorization in the rest of this chapter
o Operating system level
  . Operating system super-users can do anything they want to the database! Good operating system level security is required.
o Network level: must use encryption to prevent
  . Eavesdropping (unauthorized reading of messages)
  . Masquerading (pretending to be an authorized user or sending messages supposedly from authorized users)
o Physical level
  . Physical access to computers allows destruction of data by intruders; traditional lock-and-key security is needed
  . Computers must also be protected from floods, fire, etc. More in Chapter 17 (Recovery)
o Human level
  . Users must be screened to ensure that authorized users do not give access to intruders
  . Users should be trained on password selection and secrecy

Authorization
Forms of authorization on parts of the database:
o Read authorization - allows reading, but not modification of data.
o Insert authorization - allows insertion of new data, but not modification of existing data.
o Update authorization - allows modification, but not deletion of data.
o Delete authorization - allows deletion of data.
Forms of authorization to modify the database schema:
o Index authorization - allows creation and deletion of indices.
o Resources authorization - allows creation of new relations.
o Alteration authorization - allows addition or deletion of attributes in a relation.
o Drop authorization - allows deletion of relations.
Authorization and Views
o Users can be given authorization on views, without being given any authorization on the relations used in the view definition.
o The ability of views to hide data serves both to simplify usage of the system and to enhance security by allowing users access only to data they need for their job.
o A combination of relational-level security and view-level security can be used to limit a user's access to precisely the data that user needs.

Audit Trails
An audit trail is a log of all changes (inserts/deletes/updates) to the database along with information such as which user performed the change, and when the change was performed.
o Used to track erroneous/fraudulent updates.
o Can be implemented using triggers, but many database systems provide direct support.

Encryption
Data may be encrypted when database authorization provisions do not offer sufficient protection. A good encryption technique has the following properties:
o Relatively simple for authorized users to encrypt and decrypt data.
o Encryption scheme depends not on the secrecy of the algorithm but on the secrecy of a parameter of the algorithm called the encryption key.
o Extremely difficult for an intruder to determine the encryption key.
Data Encryption Standard (DES) substitutes characters and rearranges their order on the basis of an encryption key which is provided to authorized users via a secure mechanism. The scheme is no more secure than the key transmission mechanism, since the key has to be shared.
Advanced Encryption Standard (AES) is a new standard replacing DES. It is based on the Rijndael algorithm, but is also dependent on shared secret keys.
Public-key encryption is based on each user having two keys:
o public key: publicly published key used to encrypt data, but cannot be used to decrypt data
o private key: key known only to the individual user, and used to decrypt data. Need not be transmitted to the site doing encryption.
The encryption scheme is such that it is impossible or extremely hard to decrypt data given only the public key.
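As a toy illustration of the two-key idea, the sketch below uses textbook RSA arithmetic with deliberately tiny primes. The values p = 61, q = 53 and e = 17 are illustrative assumptions, as are the helper names encrypt and decrypt. Real keys use primes hundreds of digits long plus padding, so this only shows the mechanics and is not usable encryption.

```python
# Toy key pair built from two tiny primes (illustrative values only).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)      # used only while generating the keys
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: inverse of e mod phi (Python 3.8+)

public_key = (e, n)          # published; used to encrypt
private_key = (d, n)         # kept secret; used to decrypt


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, key: tuple) -> int:
    """Recover the integer with the private key."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65
cipher = encrypt(message, public_key)
recovered = decrypt(cipher, private_key)
print(cipher != message, recovered == message)
```

Anyone holding (e, n) can produce the ciphertext, but recovering d from the public key requires factoring n, which is trivial here only because n is tiny.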
The RSA public-key encryption scheme is based on the hardness of factoring a very large number (hundreds of digits) into its prime components.

Authentication
Password-based authentication is widely used, but is susceptible to sniffing on a network.
Challenge-response systems avoid transmission of passwords:
o DB sends a (randomly generated) challenge string to the user.
o User encrypts the string and returns the result.
o DB verifies identity by decrypting the result.
o It can use a public-key encryption system, with the DB sending a message encrypted using the user's public key, and the user decrypting and sending the message back.
Digital signatures are used to verify authenticity of data.
o E.g. use the private key (in reverse) to encrypt data, and anyone can verify authenticity by using the public key (in reverse) to decrypt data. Only the holder of the private key could have created the encrypted data.
o Digital signatures also help ensure nonrepudiation: the sender cannot later claim to have not created the data.

Digital Certificates
Digital certificates are used to verify authenticity of public keys.
o Every client (e.g. browser) has public keys of a few root-level certification authorities.
o A site can get its name/URL and public key signed by a certification authority: the signed document is called a certificate.
o The client can use the public key of the certification authority to verify the certificate.
o Multiple levels of certification authorities can exist. Each certification authority presents its own public-key certificate signed by a higher level authority, and uses its private key to sign the certificates of other web sites/authorities.

2.8 Embedded SQL
The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol. A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL. The basic form of these languages follows that of the System R embedding of SQL into PL/I. The EXEC SQL statement is used to identify an embedded SQL request to the preprocessor:
    EXEC SQL
2.9 Dynamic SQL
Dynamic SQL allows programs to construct and submit SQL queries at run time. Example of the use of dynamic SQL from within a C program:
    char *sqlprog = "update account set balance = balance * 1.05 where account-number = ?";
    EXEC SQL prepare dynprog from :sqlprog;
    char account[10] = "A-101";
    EXEC SQL execute dynprog using :account;
The dynamic SQL program contains a ?, which is a placeholder for a value that is provided when the SQL program is executed.

ODBC
Open DataBase Connectivity (ODBC) standard
o standard for application programs to communicate with a database server.
o application program interface (API) to
  . open a connection with a database,
  . send queries and updates,
  . get back results.
Applications such as GUIs, spreadsheets, etc. can use ODBC.
Each database system supporting ODBC provides a "driver" library that must be linked with the client program. When the client program makes an ODBC API call, the code in the library communicates with the server to carry out the requested action, and fetch results.
An ODBC program first allocates an SQL environment, then a database connection handle. It opens the database connection using SQLConnect(). Parameters for SQLConnect:
o connection handle,
o the server to which to connect,
o the user identifier,
o password
Must also specify types of arguments:
o SQL_NTS denotes that the previous argument is a null-terminated string.

ODBC Code
    int ODBCexample()
    {
        RETCODE error;
        HENV env;   /* environment */
        HDBC conn;  /* database connection */
        SQLAllocEnv(&env);
        SQLAllocConnect(env, &conn);
        SQLConnect(conn, "aura.bell-labs.com", SQL_NTS, "avi", SQL_NTS, "avipasswd", SQL_NTS);
        { …. Do actual work … }
        SQLDisconnect(conn);
        SQLFreeConnect(conn);
        SQLFreeEnv(env);
        return 0;
    }
The program sends SQL commands to the database by using SQLExecDirect. Result tuples are fetched using SQLFetch(). SQLBindCol() binds C language variables to attributes of the query result. When a tuple is fetched, its attribute values are automatically stored in the corresponding C variables.
Arguments to SQLBindCol():
o ODBC stmt variable, attribute position in query result
o The type conversion from SQL to C.
o The address of the variable.
o For variable-length types like character arrays,
  . the maximum length of the variable
  . location to store the actual length when a tuple is fetched.
  . Note: A negative value returned for the length field indicates a null value.
Good programming requires checking results of every function call for errors; we have omitted most checks for brevity.

JDBC
JDBC is a Java API for communicating with database systems supporting SQL. JDBC supports a variety of features for querying and updating data, and for retrieving query results. JDBC also supports metadata retrieval, such as querying about relations present in the database and the names and types of relation attributes.
Model for communicating with the database:
o Open a connection
o Create a "statement" object
o Execute queries using the Statement object to send queries and fetch results
o Exception mechanism to handle errors

JDBC Code
    public static void JDBCexample(String dbid, String userid, String passwd)
    {
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
            Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@aura.bell-labs.com:2000:bankdb", userid, passwd);
            Statement stmt = conn.createStatement();
            … Do Actual Work ….
            stmt.close();
            conn.close();
        }
        catch (SQLException sqle) {
            System.out.println("SQLException : " + sqle);
        }
    }
Update to database:
    try {
        stmt.executeUpdate(
            "insert into account values ('A-9732', 'Perryridge', 1200)");
    } catch (SQLException sqle) {
        System.out.println("Could not insert tuple. " + sqle);
    }
Execute query and fetch and print results:
    ResultSet rset = stmt.executeQuery(
        "select branch_name, avg(balance) from account group by branch_name");
    while (rset.next()) {
        System.out.println(rset.getString("branch_name") + " " + rset.getFloat(2));
    }

2.10 Views
Views provide a mechanism to hide certain data from the view of certain users.
To create a view we use the create view command.
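A minimal sketch of a view that hides data, assuming Python's built-in sqlite3 module as the database. The loan relation's contents and the view name branch_loan are illustrative choices, with underscores in place of hyphens because SQLite does not accept hyphenated identifiers. The view exposes loan numbers and branch names but not loan amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_number text primary key, branch_name text, amount real);
    insert into loan values ('L-17', 'Downtown', 1000), ('L-23', 'Redwood', 2000);

    -- The view is defined by a query; the amount column is left out on purpose.
    create view branch_loan as
        select loan_number, branch_name from loan;
""")

# Querying the view looks exactly like querying a relation.
cursor = conn.execute("select * from branch_loan order by loan_number")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

A user who is allowed to query only branch_loan can see which loans exist at which branch without ever seeing an amount, which is the data-hiding use of views described in this section.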
2.11 Distributed Databases
Data is stored at several sites, each managed by a DBMS that can run independently.
o Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles).
o Distributed Transaction Atomicity: Users should be able to write transactions accessing multiple sites just like local transactions.
In current practice, however, users often have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported.
o These properties are hard to support efficiently.
o For globally distributed sites, these properties may not even be desirable due to administrative overheads of making location of data transparent.

Types of Distributed Databases
1. Homogeneous: Every site runs the same type of DBMS.
2. Heterogeneous: Different sites run different DBMSs (different RDBMSs or even nonrelational DBMSs).
Distributed DBMS Architectures
(i) Client-Server: Client ships query to a single site. All query processing is done at the server.
    - Thin vs. fat clients.
    - Set-oriented communication, client-side caching.
(ii) Collaborating-Server: Query can span multiple sites.

Summary
SQL includes a variety of language constructs for queries on the database. All the relational-algebra operations, including the extended relational-algebra operations, can be expressed by SQL. SQL also allows ordering of query results by sorting on specified attributes.
View relations can be defined as relations containing the result of queries. Views are useful for hiding unneeded information, and for collecting together information from more than one relation into a single view. Temporary views defined by using the with clause are also useful for breaking up complex queries into smaller and easier-to-understand parts.
SQL provides constructs for updating, inserting, and deleting information. A transaction consists of a sequence of operations, which must appear to be atomic. That is, all the operations are carried out successfully, or none is carried out. In practice, if a transaction cannot complete successfully, any partial actions it carried out are undone.
Modifications to the database may lead to the generation of null values in tuples. We discussed how nulls can be introduced, and how the SQL query language handles queries on relations containing null values.

Key Terms
>> Table
>> Relation
>> Tuple variable
>> Atomic domain
>> Null value
>> Database schema
>> Database instance
>> Relation schema
>> Relation instance
>> Keys
>> Foreign key
>> Schema diagram
>> Query language
>> Procedural language
>> Nonprocedural language
>> Relational algebra

Key Term Quiz
1. The ------ defines a set of algebraic operations that operate on tables, and output tables as their results.
2. Views are ------
3. ------ is an example for nonprocedural languages
4. The set of allowed values for each attribute is called ------
5. ------ which signifies that the value is unknown or does not exist
6. A ------ language is a language in which a user requests information from the database.
7. The user instructs the system to perform a sequence of operations on the database to compute the desired result.
8. In a nonprocedural language, the user describes the desired information without giving a specific procedure for obtaining that information.

Objective Type Questions
1. What is the SQL command to end a transaction?
   a. COMMIT b. ROLLBACK c. END d. ENTER
2. Prime and non-prime attributes are
   a. part of any candidate key and not part of candidate key
   b. not part of any candidate key and part of candidate key
   c. both part of any candidate key
   d. both not part of candidate key
3. Any set of attributes of a relation schema is called
   a. an alternate key b. a candidate key c. a primary key d. a superkey
4. What does SQL stand for?
   a. Standard Query Language b. Structured Query Language c. Structured Questioning Language d. Structural Query Language e. None of the above.
5. SQL command SELECT is a
   a. DDL b. DML c. DCL
6. What is the SQL command to end a transaction?
   a. COMMIT b. ROLLBACK c. END d. ENTER
7. Which is the command to remove a database?
   a. DELETE b. DROP c. None of the above
8. Which of the following is true?
   (i) Character "_" stands for one character
   (ii) Character "%" stands for more than one character
   (iii) Character "_" stands for more than one character
   (iv) Character "%" stands for one character
   a. Both i and ii b. Only i c. Only ii d. Both iii and iv
9. Which one is valid?
   a. SELECT [ALL DISTINCT]column_list FROM table_list [WHERE conditional expression] [GROUP BY group_by_column_list] [HAVING conditional expression][ORDER BY order_by_column_list]
   b. SELECT [ALL DISTINCT]column_list FROM table_list [WHERE conditional expression] [GROUP BY group_by_column_list] [HAVING conditional expression][ORDER BY order_by_column_list];
   c. SELECT [ALL DISTINCT]column_list FROM table_list [WHERE conditional expression] [GROUP BY group_by_column_list] [HAVING conditional expression][ORDER order_column_list]
   d. SELECT [ALL DISTINCT]row_list FROM table_list [WHERE conditional expression] [GROUP BY group_by_column_list] [HAVING conditional expression][ORDER BY order_by_column_list]
   e. All of the above.
   f. None of the above.
10. Which of the following allows SQL statements to be inserted into host languages?
   a. Embedded SQL b. QBE c. C d. Linker e. All of the above
11. A host variable, when used within an embedded SQL statement, needs to be prefixed with a colon (:)
   a. True b. False
12. Pick out the correct format of a Date value
   a. yyyy-dd-mm b. yyyy-mm-dd c. dd-mm-yyyy d. mm-dd-yyyy
13. Which of the following is not a data type in SQL?
   a. Char b. Varchar c. String d. Long
14. The optional keyword that helps to get the answer without duplicates
   a. GROUP b. COUNT c. ORDER d. DISTINCT e. None of the above
15. Which of the following is a set comparison operator similar to IN?
   a. REFERENCE b. EXISTS c. LIKE d. GROUP e. All of the above
16. In SQL notation, lower case letters are used to represent reserved words
   a. True b. False
17. "A change in the organization of records in a table from clustered to nonclustered does not affect the front-end application program." This is an example of
   a. Data integrity b. Data independence c. Data isolation d. Data consistency
18. Consider the relation: Student_Course (Student#, Course, Marks). Assuming that the Marks column contains marks in percentage, which of the following queries can be written to retrieve the total marks of the students who have average marks > 75 %?
   a. SELECT Student#, SUM(Marks) FROM Student_Course HAVING Avg(Marks) > 75;
   b. SELECT Student#, SUM(Marks) FROM Student_Course GROUP BY Student# HAVING Avg(Marks) > 75;
   c. SELECT Student#, SUM(Marks) FROM Student_Course where Avg(Marks) > 75;
   d. None of the above

Review Questions
Two Mark Questions
1. What is the difference between the alter table and update commands?
2. What is the difference between the drop and truncate table commands?
3. What is the difference between the drop table and delete commands?
4. What are aggregate functions?
5. What is the use of the order by and group by clauses?
6. What is table aliasing?
7. What is a view and what are its advantages?
8. What are the join and division operations?
9. What comparison operations are used in the where clause?
10. How can you test for the absence of tuples?
11. What is a subquery?
12. What is the difference between static and dynamic SQL?
13. What is embedded SQL?
14. What is a trigger and what is its use?
15. What is an assertion statement?
16. What are the different types of authorization?
17. Write SQL commands for the grant and revoke operations.

Big Questions
1. What are nested relations? Give an example.
2. What are the relational algebra operations supported in SQL? Write the SQL statement for each operation.
3. Illustrate the issues to be considered while developing an ER-diagram. Consider the relational database
   employee (empname, street, city)
   works (empname, companyname, salary)
   company (companyname, city)
   manages (empname, managername)
   and give an expression in SQL for the following queries:
   (1) Find the names of all employees who work for First Bank Corporation.
   (2) Find the names, street addresses, and cities of residence of all employees who work for First Bank Corporation and earn more than 200000 per annum.
   (3) Find the names of all employees in this database who live in the same city as the companies for which they work.
   (4) Find the names of all the employees who earn more than every employee of small corporation.
------END OF SECOND UNIT ------