Database Systems Relational Algebra SL02 ◮ Attribute, Domain, Tuple, Relation, Database
Informatik für Ökonomen II, Fall 2010
Database Systems, SL02
Inf4Oec10, M. Böhlen, ifi@uzh

Outline
◮ The Relational Model
  ◮ Attribute, domain, tuple, relation, database
  ◮ Instance, schema
  ◮ Key constraints, entity constraints, foreign key
◮ Relational Algebra
  ◮ Basic relational algebra operators: σ, π, ∪, ×, −, ρ
  ◮ Additional relational algebra operators: ∩, ⋈, ←

The Relational Model
◮ The relational model is based on the concept of a relation.
◮ A relation is a mathematical concept based on the ideas of sets.
◮ The relational model was proposed by Codd from IBM Research in the paper:
  ◮ A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, June 1970
◮ The above paper caused a major revolution in the field of database management and earned Codd the coveted ACM Turing Award.
◮ The strength of the relational approach comes from the formal foundation provided by the theory of relations.
◮ In practice, there is a standard model based on SQL. There are several important differences between the formal model and the practical model, as we shall see.

Relation Schema
◮ R(A1, A2, ..., An) is a relation schema
  ◮ R is the name of the relation.
  ◮ A1, A2, ..., An are attributes.
◮ Example: Customer(CustName, CustStreet, CustCity)
The Customer Relation

customer
CustName   CustStreet   CustCity
Adams      Spring       Pittsfield
Brooks     Senator      Brooklyn
Curry      North        Rye
Glenn      Sad Hill     Woodside
Green      Walnut       Stamford
Hayes      Main         Harrison
Johnson    Alma         Palo Alto
Jones      Main         Harrison
Lindsay    Park         Pittsfield
Smith      North        Rye
Turner     Putnam       Stamford
Williams   Nassau       Princeton

Attribute
◮ Each attribute of a relation has a name.
◮ The set of allowed values for each attribute is called the domain of the attribute.
◮ Attribute values are required to be atomic; that is, indivisible.
  ◮ The value of an attribute can be an account number, but cannot be a set of account numbers.
◮ The attribute name designates the role played by a domain in a relation:
  ◮ It is used to interpret the meaning of the data elements corresponding to that attribute.
  ◮ Example: The domain Date may be used to define two attributes named "Invoice-date" and "Payment-date" with different meanings.

Domain
◮ A domain has a logical definition:
  ◮ Example: USA phone numbers are the set of 10-digit phone numbers valid in the U.S.
◮ A domain also has a data type or a format defined for it.
  ◮ The USA phone numbers may have a format: (ddd)ddd-dddd where each d is a decimal digit.
  ◮ Dates have various formats such as year, month, date formatted as yyyy-mm-dd, or as dd mm,yyyy etc.
◮ The special value null is a member of every domain.
◮ The null value causes complications in the definition of many operations.

Tuple
◮ A tuple is an ordered list of values.
◮ Angled brackets ⟨...⟩ are used as notation; sometimes regular parentheses (...) are used as well.
◮ Each value is derived from an appropriate domain.
◮ A customer tuple is a 3-tuple and would consist of three values, for example:
  ◮ ⟨Adams, Spring, Pittsfield⟩
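The notions of attribute, domain, and tuple can be made concrete with a small sketch. It is illustrative only and not part of the lecture: the names `NULL`, `DOMAINS`, `CUSTOMER`, `is_atomic`, and `conforms` are hypothetical, each domain is modelled as a Python predicate, and `None` stands in for the special value null.

```python
import re

NULL = None  # assumption: Python's None plays the role of the special value null

# Assumption: a domain is modelled as a predicate over atomic values.
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and v != "",
    # Format (ddd)ddd-dddd, where each d is a decimal digit.
    "USPhone": lambda v: isinstance(v, str)
    and re.fullmatch(r"\(\d{3}\)\d{3}-\d{4}", v) is not None,
}

# Schema Customer(CustName, CustStreet, CustCity) as an ordered list of
# (attribute name, domain name) pairs. Several attributes may share one
# domain; the attribute name states the role the domain plays.
CUSTOMER = (("CustName", "Name"), ("CustStreet", "Name"), ("CustCity", "Name"))


def is_atomic(value):
    """Treat Python collections as non-atomic (e.g. a set of account numbers)."""
    return not isinstance(value, (list, tuple, set, frozenset, dict))


def conforms(tup, schema):
    """Check that tup has one atomic value per attribute, each from its domain."""
    if len(tup) != len(schema):
        return False
    for value, (_attribute, domain) in zip(tup, schema):
        if value is NULL:
            continue  # null is a member of every domain
        if not is_atomic(value) or not DOMAINS[domain](value):
            return False
    return True
```

With these definitions, `conforms(("Adams", "Spring", "Pittsfield"), CUSTOMER)` holds, while a 2-tuple or a tuple whose street is a set of streets is rejected.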
Relational Instance
◮ r(R) denotes a relation r on the relation schema R.
  ◮ Example: customer(Customer)
◮ A relation instance is a subset of the Cartesian product of the domains of its attributes. Thus, a relation is a set of n-tuples (a1, a2, ..., an) where each ai ∈ Di.
◮ Formally, given sets D1, D2, ..., Dn, a relation r is a subset of D1 × D2 × ... × Dn.
◮ Example:
  D1 = CustName = {Jones, Smith, Curry, Lindsay, ...}
  D2 = CustStreet = {Main, North, Park, ...}
  D3 = CustCity = {Harrison, Rye, Pittsfield, ...}
  r = {(Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} ⊆ CustName × CustStreet × CustCity

Example of a Relation
◮ Relation name: account
◮ Attributes: AccNr, BrName, Balance (the column headers)
◮ Tuples: the rows of the table

account
AccNr   BrName       Balance
A-101   Downtown     500
A-215   Mianus       700
A-102   Perryridge   400
A-305   Round Hill   350
A-201   Brighton     900
A-222   Redwood      700
A-217   Brighton     750

Characteristics of Relations
◮ Relations are unordered, i.e., the order of tuples is irrelevant (tuples may be stored and retrieved in an arbitrary order).
◮ The attributes in R(A1, ..., An) and the values in t = ⟨v1, ..., vn⟩ are ordered.
◮ There exist alternative definitions of a relation where attributes in a schema and values in a tuple are not ordered.

Database
◮ A database consists of a set of relations.
◮ Example: Information about an enterprise is broken up into parts, with each relation storing one part of the information.
  ◮ account: stores information about accounts
  ◮ customer: stores information about customers
  ◮ depositor: stores information about which customer owns which account
◮ Storing all information as a single relation such as
  ◮ bank(AccNr, Balance, CustName, ...)
  results in
  ◮ repetition of information: e.g., if two customers own the same account
  ◮ the need for null values: e.g., to represent a customer without an account

The Depositor Relation

depositor
CuName    AccNr
Hayes     A-102
Johnson   A-101
Johnson   A-201
Jones     A-217
Lindsay   A-222
Smith     A-215
Turner    A-305

Checking...
1. What kind of object is X = {{(3)}} in the relational model?
   X is a database.
2. Is r = {(Tom, 27, ZH), (Bob, 33, Rome, IT)} a relation?
   No. Tuples have different schemas.
3. Are DB1 and DB2 identical databases?
   DB1 = {{(1, 5), (2, 3)}, {(4, 4)}}
   DB2 = {{(4, 4)}, {(2, 3), (1, 5)}}
   Yes. A database is a set of relations and a relation is a set of tuples; neither is ordered.
4. Illustrate the following relations graphically:
   r(X, Y) = {(1, a), (2, b), (3, c)}
   s(A, B, C) = {(1, 2, 3)}

   r            s
   X   Y        A   B   C
   1   a        1   2   3
   2   b
   3   c
5. Determine the following objects:
  ◮ the 2nd attribute of relation r? Y
  ◮ the 3rd tuple of relation r? There is no ordering.
  ◮ the tuple in relation r with the smallest value for attribute X? (1, a)
6. What is the difference between a set and a relation? Illustrate with an example.
   set: elements can be anything.
   relation: all tuples must have the same schema.
   S = {(1, r), (2, c, {⋆}), •} is a set but not a relation.

Summary of the Relational Data Model
◮ A domain D is a set of atomic data values.
  ◮ phone numbers, names, grades, birthdates, departments
  ◮ each domain includes the special value null
◮ With each domain a data type or format is specified.
  ◮ 5-digit integers, yyyy-mm-dd, characters
◮ An attribute Ai describes the role of a domain in a relation schema.
  ◮ PhoneNr, Age, DeptName
◮ A relation schema R(A1, ..., An) is made up of a relation name R and a list of attributes.
  ◮ employee(Name, Dept, Salary), department(DName, Manager, Address)
◮ A tuple t is an ordered list of values t = (v1, ..., vn) with vi ∈ dom(Ai).
  ◮ t = (Tom, SE, 23K)
◮ A relation r ⊆ D1 × ... × Dn over schema R(A1, ..., An) is a set of n-ary tuples.
  ◮ r = {(Tom, SE, 23K), (Lene, DB, 33K)} ⊆ Names × Departments × Integer
◮ A database DB is a set of relations.
  ◮ DB = {r, s}
  ◮ r = {(Tom, SE, 23K), (Lene, DB, 33K)}
  ◮ s = {(SE, Tom, Boston), (DB, Lena, Tucson)}

Constraints
◮ Constraints are conditions that must be satisfied by all valid relation instances.
◮ There are four main types of constraints in the relational model:
  ◮ Domain constraints: each value in a tuple must be from the domain of its attribute
  ◮ Key constraints
  ◮ Entity constraints
  ◮ Referential integrity constraints

Key Constraints/1
◮ Let K ⊆ R, i.e., K is a set of attributes of R.
◮ K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r.
  ◮ By "possible" we mean a relation r that could exist in the enterprise we are modeling.
  ◮ Example: {CustName, CustStreet} and {CustName} are both superkeys of Customer, if no two customers can possibly have the same name.
◮ In real life, an attribute such as CustID would be used instead of CustName to uniquely identify customers, but we omit it to keep our examples small, and instead assume customer names are unique.

Key Constraints/2
◮ K is a candidate key if K is a minimal superkey.
  ◮ Example: {CustName} is a candidate key for Customer, since it is a superkey and no proper subset of it is a superkey.
◮ Primary key: a candidate key chosen as the principal means of identifying tuples within a relation.
  ◮ Should choose an attribute whose value never, or very rarely, changes.
  ◮ E.g., an email address is unique, but may change.

bad:
CustName   CustStreet
N. Jeff    Binzmühlestr
N. Jeff    Hochstr

good:
ID   CustName   CustStreet
1    N. Jeff    Binzmühlestr
2    N. Jeff    Hochstr

Referential Integrity Constraint
◮ A relation schema may have an attribute that corresponds to the primary key of another relation.

Entity Constraints
◮ The entity constraint requires that the primary key attributes of