
Week 2 – Part 1: The Relational Model

- Mathematical Relations
- Attributes and Schema
- Null Values and Database Constraints
- Keys, Primary and Foreign Keys

The Relational Model

- Proposed by E. F. Codd in 1970 as a data model which strongly supports data independence.
- Made available in commercial DBMSs in 1981: it is not easy to implement data independence efficiently and reliably!
- It is based on (a variant of) the mathematical notion of relation.
- Relations are represented as tables.

CSC343 Introduction to Databases, University of Toronto

Mathematical Relations

- Given sets D1, D2, …, Dn, not necessarily distinct.
- The Cartesian product D1×D2×…×Dn is the set of all (ordered) n-tuples (d1, d2, …, dn) such that d1 ∈ D1, d2 ∈ D2, …, dn ∈ Dn.
- A mathematical relation on D1, D2, …, Dn is a subset of the Cartesian product D1×D2×…×Dn.
- D1, D2, …, Dn are the domains of the relation.
- n is the degree of the relation.
- The number of n-tuples in a given relation is the cardinality of that relation; the cardinality of a relation is always finite.

An Example

Games ⊆ String × String × Integer × Integer

Juve   Lazio  3  1
Lazio  Milan  2  0
Juve   Roma   1  2
Roma   Milan  0  1

- Note that String and Integer each play two roles, distinguished by means of position.
- The structure of a relation is positional.
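The definitions of Cartesian product, relation, degree, and cardinality can be sketched in a few lines of Python. This is only an illustration: the finite sets `teams` and `goals` are hypothetical stand-ins for the infinite domains String and Integer, chosen so that the product can actually be enumerated.

```python
from itertools import product

# Hypothetical finite domains standing in for String and Integer.
teams = {"Juve", "Lazio", "Milan", "Roma"}
goals = range(0, 4)  # the integers 0..3

# Cartesian product D1 x D2 x D3 x D4: every ordered 4-tuple.
cartesian = set(product(teams, teams, goals, goals))

# A relation on these domains is any subset of the product.
games = {
    ("Juve", "Lazio", 3, 1),
    ("Lazio", "Milan", 2, 0),
    ("Juve", "Roma", 1, 2),
    ("Roma", "Milan", 0, 1),
}

is_relation = games <= cartesian        # subset test
degree = len(next(iter(games)))         # number of domains, n
cardinality = len(games)                # number of n-tuples
```

Because the structure is positional, swapping the first two components of a tuple yields a different tuple, even though both components come from the same domain.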

Attributes

- We would like to have a non-positional structure for relations. To do so, we associate a unique name (attribute) with each domain of a relation, which describes the role of the domain.
- In the tabular representation, attributes are used as headings.

HomeTeam  VisitingTeam  HomeGoals  VisitorGoals
Juve      Lazio         3          1
Lazio     Milan         2          0
Juve      Roma          1          2
Roma      Milan         0          1

Notation

- t[A] (or t.A) denotes the value on attribute A of a tuple t.
- In the example, if t is the first tuple in the table, t[VisitingTeam] = Lazio.
- The same notation is extended to sets of attributes, thus denoting tuples: t[VisitingTeam, VisitorGoals] is a tuple on two attributes, ⟨Lazio, 1⟩.
- More generally, if X is a sequence of attribute names A1, …, An, t[X] is ⟨t[A1], …, t[An]⟩.
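As a rough sketch of the t[A] and t[X] notation, a tuple with named attributes can be modelled as a Python dictionary. The helper name `restrict` is hypothetical and only serves this illustration.

```python
# Each tuple maps attribute names to values, so position no longer matters.
games = [
    {"HomeTeam": "Juve", "VisitingTeam": "Lazio", "HomeGoals": 3, "VisitorGoals": 1},
    {"HomeTeam": "Lazio", "VisitingTeam": "Milan", "HomeGoals": 2, "VisitorGoals": 0},
    {"HomeTeam": "Juve", "VisitingTeam": "Roma", "HomeGoals": 1, "VisitorGoals": 2},
    {"HomeTeam": "Roma", "VisitingTeam": "Milan", "HomeGoals": 0, "VisitorGoals": 1},
]

def restrict(t, attributes):
    """t[X]: the values of tuple t on the attribute sequence X."""
    return tuple(t[a] for a in attributes)

t = games[0]
visiting_team = t["VisitingTeam"]                        # t[VisitingTeam]
pair = restrict(t, ["VisitingTeam", "VisitorGoals"])     # t[VisitingTeam, VisitorGoals]
```

Two dictionaries with the same attribute/value pairs compare equal regardless of the order in which the attributes were written, which mirrors the non-positional structure.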


Value-Based References

Students
RegNum  Surname  FirstName  BirthDate
6554    Rossi    Mario      5/12/1978
8765    Neri     Paolo      3/11/1976
9283    Verdi    Luisa      12/11/1979
3456    Rossi    Maria      1/2/1978

Exams
Student  Grade  Course
3456     30     04
3456     24     02
9283     28     01
6554     26     01

Courses
Code  Title    Tutor
01    Analisi  Neri
02    Chimica  Bruni
04    Chimica  Verdi

The Student and Course values in Exams refer to tuples of Students and Courses by value.
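A minimal sketch of how a value-based reference is followed: the Student and Course values of an Exams tuple are simply looked up in the other relations. The dictionaries below index Students by RegNum and Courses by Code; BirthDate is left out for brevity, and the function name `exam_report` is an assumption made for this example.

```python
# Students indexed by RegNum, Courses indexed by Code (values from the slide).
students = {
    6554: ("Rossi", "Mario"),
    8765: ("Neri", "Paolo"),
    9283: ("Verdi", "Luisa"),
    3456: ("Rossi", "Maria"),
}
courses = {
    "01": ("Analisi", "Neri"),
    "02": ("Chimica", "Bruni"),
    "04": ("Chimica", "Verdi"),
}
# Exams tuples: (Student, Grade, Course).
exams = [(3456, 30, "04"), (3456, 24, "02"), (9283, 28, "01"), (6554, 26, "01")]

def exam_report(exams, students, courses):
    """Resolve each reference by matching values, with no pointers involved."""
    report = []
    for student, grade, course in exams:
        surname, first_name = students[student]
        title, _tutor = courses[course]
        report.append((surname, first_name, title, grade))
    return report

report = exam_report(exams, students, courses)
```

Nothing in the Exams data depends on where the referenced tuples are stored, which is the independence from physical structures that value-based references provide.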


Advantages of Value-Based References

- Value-based references lead to independence from physical structures, such as pointers.
- Pointers are implemented differently on different hardware, which inhibits portability of a database.

Notes:
- Pointers usually exist at the physical level, but they are not visible at the logical level.
- Object identifiers in object databases share some features with pointers, at a higher level of abstraction.

Definitions

Relation schema: a name (of the relation) R with a set of attributes A1, …, An: R(A1, …, An)
Database schema: a set of relation schemas with different names: D = {R1(X1), …, Rn(Xn)}
Relation (instance) on a relation schema R(X): a set r of tuples on X
Database (instance) on a schema D = {R1(X1), …, Rn(Xn)}: a set of relations r = {r1, …, rn} (with ri a relation on Ri)

Examples

- Relations on a single attribute are admissible:

Students
RegNum  Surname  FirstName  BirthDate
6554    Rossi    Mario      5/12/1978
8765    Neri     Paolo      3/11/1976
9283    Verdi    Luisa      12/11/1979
3456    Rossi    Maria      1/2/1978

Workers
RegNum
6554
8765

Nested Structures


Representing Nested Structures

Receipts
Number  Date    Total
1357    5/5/92  29.00
2334    4/7/92  27.50
3007    4/8/92  29.50

Details
Number  Quantity  Description    Cost
1357    3         Covers         3.00
1357    2         Hors d'oeuvre  5.00
1357    3         First course   9.00
1357    2         Steak          12.00
2334    2         Covers         2.00
2334    2         Hors d'oeuvre  2.50
2334    2         First course   6.00
2334    2         Bream          15.00
2334    2         Coffee         2.00
3007    2         Covers         3.00
3007    2         Hors d'oeuvre  6.00
3007    3         First course   8.00
3007    1         Bream          7.50
3007    1         Salad          3.00
3007    2         Coffee         2.00

Questions

- Have we represented all details of receipts?
- Well, it depends on what we are really interested in:
  - does the order of lines matter?
  - could we have duplicate lines in a receipt?
- What else could we require?
- If needed, an alternative organization is possible.


More Detailed Representation

How could we preserve the detail sequence or allow duplicates? We could add a new attribute.

Receipts
Number  Date    Total
1357    5/5/92  29.00
2334    4/7/92  27.50
3007    4/8/92  29.50

Details
Number  Line  Quantity  Description    Cost
1357    1     3         Covers         3.00
1357    2     2         Hors d'oeuvre  5.00
1357    3     3         First course   9.00
1357    4     2         Steak          12.00
2334    1     2         Covers         2.00
2334    2     2         Hors d'oeuvre  2.50
2334    3     2         First course   6.00
2334    4     2         Bream          15.00
2334    5     2         Coffee         2.00
3007    1     2         Covers         3.00
3007    2     2         Hors d'oeuvre  6.00
3007    3     3         First course   8.00
3007    4     1         Bream          7.50
3007    5     1         Salad          3.00
3007    6     2         Coffee         2.00

Incomplete Information

- The relational model imposes a rigid structure on data:
  - information is represented by means of tuples;
  - tuples must conform to relation schemas.
- In practice, available data need not conform to the required formats. In particular, values of attributes may be missing for a particular tuple we want to add to a relation.


Incomplete Information: Motivation

(County towns have government offices, other towns do not.)

- Florence is a county town; so it has a government office, but we do not know its address.
- Tivoli is not a county town; so it has no government office.
- Prato has recently become a county town; has the government office been established? We don't know!

City      GovtAddress
Roma      Via IV novembre
Florence  ?
Tivoli    ??
Prato     ???

Sometimes we do not have the information, or it does not exist. When data is missing, we cannot discern why.

Incomplete Information: Solutions

- We should not use domain values (0, 99, empty string, etc.) to represent lack of information:
  - Using unused values may lead to ambiguity and confusion.
  - Unused values could become meaningful.
  - Within applications, we should be able to distinguish between actual values and placeholders.
  - For example, imagine calculating the average age of a set of people when 50 is used as the default value for unknown ages!
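The average-age example can be made concrete with a small sketch. The ages below are invented for illustration, and Python's `None` stands in for a missing value; the helper `average_known` is a hypothetical name.

```python
from statistics import mean

# Hypothetical data: two known ages and two unknown ones.
ages = [20, 30, None, None]

def average_known(values):
    """Average over the values that are actually present."""
    known = [v for v in values if v is not None]
    return mean(known) if known else None

# Placeholder approach: pretend every unknown age is 50.
with_placeholder = mean(50 if a is None else a for a in ages)

# Explicit "missing" marker: unknown ages are excluded, not invented.
without_placeholder = average_known(ages)
```

With the placeholder the result is 37.5, although no person in the data is known to be older than 30; with an explicit marker the application can at least tell real values from missing ones.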


Incomplete Information in the Relational Model

- A simple but effective technique is adopted by the Relational Model: use null values.
- A null value is a special value (i.e., not a value of the domain) which denotes the absence of a domain value.
- We could (and often should) put restrictions on the presence of null values in tuples (more on this later).

Types of Null Values

- At least three different types are useful:
  - unknown value: there is a domain value, but it is not known (Florence);
  - non-existent value: the attribute is not applicable for the tuple (Tivoli);
  - no-information value: we don't know whether a value exists or not (Prato); this is the disjunction (logical or) of the other two.
- DBMSs do not distinguish between these types: they implicitly adopt the no-information value.


A Meaningless Database

Exams
RegNum  Name   Course  Grade  Honours
6554    Rossi  B01     K
8765    Neri   B03     C
3456    Bruni  B04     B      honours
3456    Verdi  B03     A      honours

Courses
Code  Title
B01   Physics
B02   Calculus
B03   Chemistry

What are some problems with this database?

- Grades should be between A and F.
- Honours are awarded only if grade is A.
- Different students have different registration numbers.
- Exams refer to existing courses.

Integrity Constraints

- An integrity constraint is a property that must be satisfied by all meaningful database instances.
- A constraint can be seen as a predicate; a database is legal if it satisfies all integrity constraints.
- Types of constraints:
  - Intra-relational constraints, with domain constraints and tuple constraints as special cases;
  - Inter-relational constraints.

Rationale for Integrity Constraints

- Useful for describing the application in greater detail.
- Contribute to data quality.
- An element in the design process; we will discuss normal forms later.
- Used by the system in choosing a strategy for query processing.

Note: It is not the case that all desirable properties of the data in a database can be described by means of integrity constraints! E.g., "data in the relation Employee must be valid".

Tuple and Domain Constraints

- A tuple constraint expresses conditions on the values of each tuple, independently of other tuples.
- For example: (NOT (Honours = 'honours')) OR (Grade = 'A')
- Another example (derivation rule): Net = Amount - Deductions
- A domain constraint is a tuple constraint that involves a single attribute, e.g., (Grade ≤ 'A') AND (Grade ≥ 'F'), reading ≤ and ≥ on the grade scale, with 'A' the highest grade and 'F' the lowest.
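Since a constraint can be seen as a predicate, tuple and domain constraints can be sketched as Python functions applied to one tuple at a time. The tuples are the Exams rows from the "A Meaningless Database" slide; the set of valid grades (A to F, including E) and all function names are assumptions made for this example.

```python
# Exams tuples from the slide; None marks an empty Honours entry.
exams = [
    {"RegNum": 6554, "Name": "Rossi", "Course": "B01", "Grade": "K", "Honours": None},
    {"RegNum": 8765, "Name": "Neri", "Course": "B03", "Grade": "C", "Honours": None},
    {"RegNum": 3456, "Name": "Bruni", "Course": "B04", "Grade": "B", "Honours": "honours"},
    {"RegNum": 3456, "Name": "Verdi", "Course": "B03", "Grade": "A", "Honours": "honours"},
]

VALID_GRADES = {"A", "B", "C", "D", "E", "F"}  # assumed grade scale

def grade_domain(t):
    """Domain constraint: involves the single attribute Grade."""
    return t["Grade"] in VALID_GRADES

def honours_rule(t):
    """Tuple constraint: (NOT (Honours = 'honours')) OR (Grade = 'A')."""
    return t["Honours"] != "honours" or t["Grade"] == "A"

def violations(relation, constraints):
    """List (Name, constraint name) for every tuple failing a constraint."""
    return [
        (t["Name"], check.__name__)
        for t in relation
        for check in constraints
        if not check(t)
    ]

found = violations(exams, [grade_domain, honours_rule])
```

Each predicate looks at a single tuple only. The duplicate registration number 3456 and the reference to the missing course B04 cannot be detected this way, because they involve other tuples or other relations.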

Unique Identification for Tuples

RegNum  Surname  FirstName  BirthDate  DegreeProg
284328  Smith    Luigi      29/04/59   Computing
296328  Smith    John       29/04/59   Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       01/05/61   Fine Art
965536  Black    Lucy       05/03/58   Fine Art

- Registration number identifies students, i.e., there is no pair of tuples with the same value for RegNum.
- Personal data could identify students as well, i.e., there is no pair of tuples with the same values for all of Surname, FirstName, BirthDate.

Keys

- A key is a set of attributes that uniquely identifies tuples in a relation.
- More precisely:
  - A set of attributes K is a superkey for a relation r if r cannot contain two distinct tuples t1 and t2 such that t1[K] = t2[K];
  - K is a key for r if K is a minimal superkey (that is, there exists no other superkey K' of r that is contained in K as a proper subset).

An Example

RegNum  Surname  FirstName  BirthDate  DegreeProg
284328  Smith    Luigi      29/04/59   Computing
296328  Smith    John       29/04/59   Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       01/05/61   Fine Art
965536  Black    Lucy       05/03/58   Fine Art

- RegNum is a key: i.e., RegNum is a superkey and it contains a sole attribute, so it is minimal.
- Surname, FirstName, BirthDate is another key: the three attributes form a superkey and there is no proper subset that is also a superkey.

Beware!

RegNum  Surname  FirstName  BirthDate  DegreeProg
296328  Smith    John       29/04/59   Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       01/05/61   Fine Art
965536  Black    Lucy       05/03/58   Engineering

- There is no pair of tuples with the same values on both Surname and DegreeProg; i.e., in each programme students have different surnames; can we conclude that Surname and DegreeProg form a key for this relation?
- No! There could be students with the same surname in the same programme.
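The superkey and key definitions, and the warning above, can be sketched as checks on a single instance. The functions below (hypothetical names) only test whether the given tuples happen to satisfy the condition; they cannot establish that a set of attributes is a key of the relation, which is a property of every legal instance. The extra student added at the end is invented for illustration.

```python
ATTRS = ("RegNum", "Surname", "FirstName", "BirthDate", "DegreeProg")

def rows(*tuples):
    """Turn positional tuples into attribute-named dictionaries."""
    return [dict(zip(ATTRS, t)) for t in tuples]

def is_superkey(instance, attrs):
    """True if no two tuples of this instance agree on all of attrs."""
    seen = set()
    for t in instance:
        projection = tuple(t[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True

def is_key(instance, attrs):
    """Minimal superkey: dropping any one attribute breaks uniqueness."""
    if not is_superkey(instance, attrs):
        return False
    return all(
        not is_superkey(instance, [a for a in attrs if a != dropped])
        for dropped in attrs
    ) if len(attrs) > 1 else True

beware = rows(
    (296328, "Smith", "John", "29/04/59", "Computing"),
    (587614, "Smith", "Lucy", "01/05/61", "Engineering"),
    (934856, "Black", "Lucy", "01/05/61", "Fine Art"),
    (965536, "Black", "Lucy", "05/03/58", "Engineering"),
)

holds_here = is_superkey(beware, ["Surname", "DegreeProg"])

# A perfectly plausible new student (invented) breaks the apparent key.
extended = beware + rows((111111, "Smith", "Anna", "02/02/60", "Computing"))
holds_after_insert = is_superkey(extended, ["Surname", "DegreeProg"])
```

Checking only the subsets obtained by dropping one attribute is enough for minimality, because any superset of a superkey is itself a superkey.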


Existence of Keys

- Relations are sets; therefore each relation is composed of distinct tuples.
- It follows that the whole set of attributes for a relation defines a superkey.
- Therefore each relation has a key, which is the set of all its attributes, or a subset thereof.
- The existence of keys guarantees that each piece of data in the database can be accessed.
- Keys are a major feature of the Relational Model and allow us to say that it is "value-based".

Keys and Null Values

- If there are nulls, keys do not work that well:
  - They do not guarantee unique identification;
  - They do not help in establishing correspondences between data in different relations.

RegNum  Surname  FirstName  BirthDate  DegreeProg
NULL    Smith    John       NULL       Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       NULL       NULL
NULL    Black    Lucy       05/03/58   Engineering

- Are the third and fourth tuple the same?
- How do we access the first tuple?
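One way to see why nulls undermine identification is a three-valued comparison: two tuples are certainly different if they disagree on some known value, certainly equal if all values are known and agree, and otherwise undecided. This is a sketch under that assumption, with Python's `None` standing in for NULL; the function name `same_tuple` is hypothetical.

```python
def same_tuple(t1, t2):
    """Return True, False, or None when nulls leave the answer undecided."""
    undecided = False
    for a, b in zip(t1, t2):
        if a is None or b is None:
            undecided = True      # a missing value could match anything
        elif a != b:
            return False          # two known values disagree
    return None if undecided else True

# Tuples from the slide: (RegNum, Surname, FirstName, BirthDate, DegreeProg).
students = [
    (None, "Smith", "John", None, "Computing"),
    (587614, "Smith", "Lucy", "01/05/61", "Engineering"),
    (934856, "Black", "Lucy", None, None),
    (None, "Black", "Lucy", "05/03/58", "Engineering"),
]

third_vs_fourth = same_tuple(students[2], students[3])
second_vs_third = same_tuple(students[1], students[2])
```

For the third and fourth tuples the result is undecided: every attribute on which both are known agrees, so they may or may not describe the same student.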

Primary Keys

- The presence of nulls in keys has to be limited.
- Each relation must have a primary key on which nulls are not allowed.
- Notation: the attributes of the primary key are underlined (RegNum in the table below).
- References between relations are realized through primary keys.

RegNum  Surname  FirstName  BirthDate  DegreeProg
643976  Smith    John       NULL       Computing
587614  Smith    Lucy       01/05/61   Engineering
934856  Black    Lucy       NULL       NULL
735591  Black    Lucy       05/03/58   Engineering

Do We Always Have Primary Keys?

- In most cases we do have reasonable primary keys.
- In other cases we don't, so we need to introduce new attributes (identifying codes).
- Note that most of the "obvious" codes we have now (social security number, student number, area code, …) were introduced before the adoption of databases with the same goal in mind, i.e. to offer an unambiguous identification of things.

Referential Constraints (Foreign Keys)

- Pieces of data in different relations are correlated by means of values of (primary) keys.
- Referential constraints are imposed in order to guarantee that the values refer to existing tuples in the referenced relation.
- For example, if the manager of the employee with employee# 76544 is an employee with employee# 87233, there had better be an employee with such an employee number.
- Also called inclusion dependencies.

Example of Referential Constraints

Offences
Code    Date        Officer  Dept  Registration
143256  25/10/1992  567      75    5694 FR
987554  26/10/1992  456      75    5694 FR
987557  26/10/1992  456      75    6544 XY
630876  15/10/1992  456      47    6544 XY
539856  12/10/1992  567      47    6544 XY

Officers
RegNum  Surname  FirstName
567     Brun     Jean
456     Larue    Henri
638     Larue    Jacques

Cars
Registration  Dept  Owner            …
6544 XY       75    Cordon Edouard   …
7122 HT       75    Cordon Edouard   …
5694 FR       75    Latour Hortense  …
6544 XY       47    Mimault Bernard  …

Referential Constraints

- A referential constraint requires that the values on a set X of attributes of a relation R1 must appear as values for the primary key of another relation R2.
- In such a situation, we say that X is a foreign key of relation R1.
- In the previous example, we have referential constraints between the attribute Officer of the relation Offences and the relation Officers; also between the attributes Registration and Dept of Offences and the relation Cars.

Violation of Referential Constraints

Offences
Code    Date        Officer  Dept  Registration
987554  26/10/1992  456      75    5694 FR
630876  15/10/1992  456      47    6544 XY

Officers
RegNum  Surname  FirstName
567     Brun     Jean
638     Larue    Jacques

Cars
Registration  Dept  Owner            …
7122 HT       75    Cordon Edouard   …
5694 FR       93    Latour Hortense  …
6544 XY       47    Mimault Bernard  …

Here officer 456 does not appear in Officers, and the car 5694 FR of Dept 75 does not appear in Cars (it is registered in Dept 93).
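A referential constraint can be checked by collecting the primary-key values of the referenced relation and testing every foreign-key value against them. The sketch below uses the tuples of the violation example; the function name `dangling_references` is hypothetical, and the primary keys assumed are RegNum for Officers and (Registration, Dept) for Cars.

```python
def dangling_references(referencing, fk_attrs, referenced, pk_attrs):
    """Tuples of `referencing` whose foreign-key values match no primary key."""
    existing = {tuple(r[a] for a in pk_attrs) for r in referenced}
    return [
        t for t in referencing
        if tuple(t[a] for a in fk_attrs) not in existing
    ]

offences = [
    {"Code": 987554, "Officer": 456, "Dept": 75, "Registration": "5694 FR"},
    {"Code": 630876, "Officer": 456, "Dept": 47, "Registration": "6544 XY"},
]
officers = [
    {"RegNum": 567, "Surname": "Brun"},
    {"RegNum": 638, "Surname": "Larue"},
]
cars = [
    {"Registration": "7122 HT", "Dept": 75},
    {"Registration": "5694 FR", "Dept": 93},
    {"Registration": "6544 XY", "Dept": 47},
]

bad_officer = dangling_references(offences, ["Officer"], officers, ["RegNum"])
bad_car = dangling_references(
    offences, ["Registration", "Dept"], cars, ["Registration", "Dept"]
)
```

Both offences refer to the missing officer 456, while only offence 987554 refers to a car that does not exist with that combination of registration and department.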

Referential Constraints: Comments

- Referential constraints play an important role in making the relational model value-based.
- It is possible to have features that support the management of referential constraints ("actions" activated by violations).
- Care is needed in case of referential constraints that involve two or more attributes.

Complications with Constraints

Accidents
Code  Dept1  Registration1  Dept2  Registration2
6207  75     6544 XY        93     9775 GF
6974  93     5694 FR        93     9775 GF

Cars
Registration  Dept  Owner            …
7122 HT       75    Cordon Edouard   …
5694 FR       93    Latour Hortense  …
9775 GF       93    LeBlanc Pierre
6544 XY       75    Mimault Bernard  …

- Here we have two referential constraints for Accidents: from Registration1, Dept1 to Cars; also from Registration2, Dept2 to Cars.
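With multi-attribute foreign keys, the attributes must be paired with the right components of the referenced key. A small sketch with the Accidents and Cars tuples above shows that the two intended pairings hold, while a mismatched pairing (Registration1 with Dept2, chosen here purely as a hypothetical mistake) reports a violation. The function name `unmatched` is an assumption.

```python
# Cars primary-key values: (Registration, Dept).
cars = {("7122 HT", 75), ("5694 FR", 93), ("9775 GF", 93), ("6544 XY", 75)}

accidents = [
    {"Code": 6207, "Dept1": 75, "Registration1": "6544 XY",
     "Dept2": 93, "Registration2": "9775 GF"},
    {"Code": 6974, "Dept1": 93, "Registration1": "5694 FR",
     "Dept2": 93, "Registration2": "9775 GF"},
]

def unmatched(rows, registration_attr, dept_attr, car_keys):
    """Codes of accidents whose (registration, dept) pair is not a car."""
    return [
        r["Code"] for r in rows
        if (r[registration_attr], r[dept_attr]) not in car_keys
    ]

first_constraint = unmatched(accidents, "Registration1", "Dept1", cars)
second_constraint = unmatched(accidents, "Registration2", "Dept2", cars)
mismatched = unmatched(accidents, "Registration1", "Dept2", cars)
```

The order and pairing of attributes is part of the constraint: the same data satisfies both intended constraints but fails the mismatched one for accident 6207.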

