Chapter 3: The Relational Data Model and Relational Databases

The Relational Model of Data

[Figure: a relation between sets A (elements a, b, c, d, e) and B (elements l, m, n, s, t), shown as a subset of A × B]

The basis of the model is the concept of relation, as found in mathematics: in set theory and in mathematical logic, in particular in predicate logic.
A model is given in terms of relations between elements of a domain.
A relational schema contains the basic elements of a relational data model.
The schema is application dependent.

A relational schema S contains:

The data domain D, which is a possibly infinite set.

A finite collection of relations (relation names) R1, ..., Rn over D, each of finite and fixed arity.
That is, for each relation name R ∈ S, its potential extensions are subsets of D^k = D × ··· × D (k times), for some natural number k that depends on R. We write R(·, ..., ·), with k arguments.
A k-ary relation can be seen as a table with k columns:

    R
    ·    ···    ·
    ·    ···    ·
    (k columns)

A finite collection of attributes (attribute names) A1, ..., Am.
They are associated with the different relations to denote their arguments, or “columns”.
They can be identified with unary relations (unary predicates, properties) over D; that is, with subsets (sub-domains) of the domain D.

    R    A    ···    C
         ·    ···    ·
         ·    ···    ·

A, ..., C are attributes of relation R.

Example: Schema S with
Domain D = {john, peter, mary, ..., 1, 2, 3, 4, ...}
Binary relation People(·, ·)
Attributes for People, in this order: Name, Age

    People
    Name    Age
    ·       ·

No contents or extensions so far; the schema describes the structure of the model.
Schemas are domain/application dependent.
Attributes can be seen as (future) subsets of the domain.

Two attributes with different names can later have the same extension and still be different attributes (which is why treating them as functions is more precise).

Example: Schema with domain D = {john, peter, mary, ...} and relation

    Manager
    Boss    Subordinate
    ·       ·

Schemas can be filled with data in many different ways.
A database instance D compatible with a given schema S is a collection of finite extensions for the relation names in the schema.

Example: For the schema S with
Domain D = {john, peter, mary, ken, carol, steve, ..., 1, 2, 3, 4, ...}
Binary relations People(·, ·), Manager(·, ·)
Attributes for People, in this order: Name, Age
Attributes for Manager, in this order: Boss, Subordinate

This is an instance compatible with the schema:

    D1:
    People
    Name    Age
    john    35
    mary    25
    ken     40

    Manager
    Boss    Subordinate
    ken     john
    john    mary

This is another compatible instance:

    D2:
    People
    Name    Age
    mary    35
    mary    25
    peter   40

    Manager
    Boss    Subordinate
    ken     steve
    carol   steve
    john    mary

The sub-domains for the attributes Boss and Subordinate are the same, namely the subset {john, peter, mary, ken, carol, steve, ...} of the database domain D.

Example: (different notations for the same thing)

The table (the Accounts relation)

    Account#    Name       Balance
    12345       Raoul      400,00
    34567       Rupert     354,60
    12338       Rumilde    1234,30
    34561       Sulema     34445,23

is an instance of a relation between the attributes Account#, Name, and Balance.
Each attribute has an associated (sub)domain.
Here, the account number 12345, the name Raoul and the numerical value 400,00 are mutually related through the relation.
The schema of the relation is: Accounts(Account#, Name, Balance)

Bank Example:

Some abbreviations:
    clientn = client name
    cladd = client address
    clneigh = client neighborhood
    branch = branch name
    acc# = account number

Schema:
    Deposit(branch, acc#, clientn, balance)
    Client(clientn, cladd, clneigh)

An instance (page 12):

    Deposit
    branch       acc#    clientn    balance
    Carleton     101     Jim        500
    Downtown     215     Sandy      700
    Barrhaven    304     Alvin      1300

    Client
    clientn      cladd             clneigh
    Jim          101 Queensbury    Barrhaven
    Sandy        40 Stone          Nepean
    Hernandez    15 Laurier        Downtown
    Alvin        17 Clyde          Altavista
    John         89 Case           Centrepoint

What is the right schema? What about this one?
A single universal relation:

    Bank(branch, acc#, clientn, balance, cladd, clneigh)

Whether a schema is right depends on the application and on other practical, DB-oriented issues.
With the universal relation, if a client has several accounts, there is redundancy of information: the DB becomes unnecessarily large, and inconsistencies become more likely to occur.
If a client has an account but no address, we have to use more null values than desired, and null values are not easy to handle.

We will come back to design issues later on ...

For the moment, this one seems to be a better schema:

    Deposit(branch, acc#, clientn, balance)
    Client(clientn, cladd, clneigh)

Consider the relation Deposit.
We have 4 (sub)domains, D1, D2, D3, D4, one for each of its 4 attributes, where they take values (branch names, account numbers, client names, balances).
Any row in the table (extension of the relation) is a 4-tuple (v1, v2, v3, v4) with v1 ∈ D1, v2 ∈ D2, v3 ∈ D3, v4 ∈ D4.

That is,

    Deposit
    branch       acc#    clientn    balance
    Carleton     101     Jim        500
    Downtown     215     Sandy      700
    Barrhaven    304     Alvin      1300

is a subset of D1 × D2 × D3 × D4.
Any instance of the relation Deposit will be a subset of D1 × D2 × D3 × D4.

We use relation and table as synonyms, and likewise tuple and row.

If t is a tuple and R is a relation (extension), then:

We say that t ∈ R if the tuple belongs to the relation R (relation extensions are sets).
Let A be the name of the attribute in the nth column of relation R. If t ∈ R, then t[n] and t[A] both denote the value of the attribute A in the tuple t.
For example, if t denotes the first tuple in the table Deposit, then t[2] = t[acc#] = 101.

Useful notation: since the same attribute may appear in different tables, we distinguish the occurrences of the attribute by using the relation name followed by “.” as a prefix, e.g.

    Deposit.acc#    Deposit.clientn    Client.clientn

Queries

For the bank instance on page 12 (the Deposit and Client tables), give me the addresses, together with the balances, of the clients who have a balance higher than 600.

Answer:

    40 Stone    700
    17 Clyde    1300

The answer is a set of tuples, a new relation (extension).
We can say that a query is a mapping that sends DB instances to new DB instances (possibly with a different schema).

Several issues:
How to specify a query? How to write it? In what language?
What is the precise meaning of a query?
How to compute the answer?

There are several query languages for RDBs.
Some are more used in practice than others, but those of a more theoretical nature are the basis for the ones most used in practice.

The distinction between declarative and procedural query languages is always relevant.
The former express what the user wants to obtain from the database; the latter express a particular way to compute the answer.

Relational Algebra as a Query Language

Idea: Relations are sets (subsets of cartesian products) constructed on top of other sets (the domain or subdomains).
Query answers are new relations.
Thus, in order to obtain new relations (e.g. query answers), do set-theoretic algebra on existing relations.
Operate on sets and relations in order to obtain new sets or relations.

The Relational Algebra (RA)

Provides algebraic operations over relations that produce new relations.
The operations are based on set-theoretic operations.
Some of those operations come directly from set theory.
Others are specific, ad hoc, for the RA; these are applicable to relations (as opposed to sets in general).
Provides a procedural query language for RDBs (because it is based on explicit operations).
The RA is one of the strengths of the relational model.
RA can be used to give a precise, set-theoretic semantics to other query languages.

Queries in RA:

It is possible to answer a query by applying a sequence of algebraic (relational) operations, starting from the original database instance.
Even if the RDBMS offers a different query language, e.g. a declarative one, a query will be compiled into a sequence of algebraic operations on the DB.

Summary of basic operations of RA:

Union and Intersection: R1 ∪ R2, R1 ∩ R2
Can be applied to similar relations, i.e. relations with the same arity (and data types), as with normal sets.

Difference: R1 \ R2
Again, for similar relations, as with normal sets.

Product: R1 × R2
This is essentially the cartesian product of two relations taken as normal sets.
E.g. for R = {(a, b), (c, d)}, S = {(1, 2), (2, 3)}:
R × S = {(a, b, 1, 2), (a, b, 2, 3), (c, d, 1, 2), (c, d, 2, 3)}

[Figure: Venn diagrams of R1 ∪ R2, R1 ∩ R2 and R1 \ R2, each drawn inside the domain D]

Projection: Π_A R(···, A, ···), i.e. the projection of relation R on attribute A

[Figure: a relation R drawn in the plane with axes A and B, and its projection Π_A R on the A axis]

Here, A is one of the attributes of R.
The projection could be on several attributes of R.
This is a unary operation: it takes one relation as input (the previous ones are binary).
This is an operation special for relations.
It deletes, ignores, projects out entire “columns” from a relation.
It projects R over one (or several) “coordinates” (attributes).
It generates a new relation, with a subset of the attributes (columns).
Its logical counterpart is existential quantification.
For the relation in the figure:
Π_A R(A, B) = {a ∈ A | there exists b ∈ B such that (a, b) ∈ R}

Selection: σ_<condition>(R)

Unary operation, special for relations.
Selects the tuples of the relation R that satisfy the condition.
The condition can be expressed in a (limited) logical language.
It generates a new relation, with the same attributes, but possibly fewer tuples (rows).

Join: R1 ⋈ R2

A binary operator, essential in RA.
It allows composing two relations through the common values taken by a distinguished attribute that is shared by the two relations (or by two different attributes with the same data type or domain).
Similar to the operation of composition of two relations as seen in set theory: R◦S
It is essential to combine tables in a natural way, without ap-
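
The operations summarized above can be tried out on small relations. The sketch below models relations as Python sets of tuples, with attribute names kept in a separate tuple. All function names are illustrative assumptions, not part of any standard library. Since the description of the join is cut off in the text, the natural_join shown here (matching on attribute names shared by both relations) is one plausible reading rather than the author's definition.

```python
def project(rows, attrs, wanted):
    """Projection: keep only the columns named in `wanted`."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(row[i] for i in idx) for row in rows}


def select(rows, attrs, condition):
    """Selection: keep the tuples for which `condition` holds.
    `condition` receives a dict from attribute name to value."""
    return {row for row in rows if condition(dict(zip(attrs, row)))}


def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}


def natural_join(r, r_attrs, s, s_attrs):
    """Join on the attribute names shared by both relations (assumption)."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    joined = {
        a + tuple(b[s_attrs.index(x)] for x in s_rest)
        for a in r
        for b in s
        if all(a[r_attrs.index(c)] == b[s_attrs.index(c)] for c in common)
    }
    return joined, tuple(r_attrs) + tuple(s_rest)


# Union, intersection and difference are the built-in set operators.
r1, r2 = {(1, "x"), (2, "y")}, {(2, "y"), (3, "z")}
union, intersection, difference = r1 | r2, r1 & r2, r1 - r2

# Product example from the summary of basic operations.
R = {("a", "b"), ("c", "d")}
S = {(1, 2), (2, 3)}
RxS = product(R, S)

# The bank instance and the query "addresses with balances of the clients
# who have a balance higher than 600".
deposit_attrs = ("branch", "acc#", "clientn", "balance")
deposit = {
    ("Carleton", 101, "Jim", 500),
    ("Downtown", 215, "Sandy", 700),
    ("Barrhaven", 304, "Alvin", 1300),
}
client_attrs = ("clientn", "cladd", "clneigh")
client = {
    ("Jim", "101 Queensbury", "Barrhaven"),
    ("Sandy", "40 Stone", "Nepean"),
    ("Hernandez", "15 Laurier", "Downtown"),
    ("Alvin", "17 Clyde", "Altavista"),
    ("John", "89 Case", "Centrepoint"),
}

joined, joined_attrs = natural_join(deposit, deposit_attrs, client, client_attrs)
high = select(joined, joined_attrs, lambda t: t["balance"] > 600)
answer = project(high, joined_attrs, ("cladd", "balance"))
print(sorted(answer))  # [('17 Clyde', 1300), ('40 Stone', 700)]
```

The query is evaluated as a fixed sequence of operations (join, then selection, then projection), which is what makes the algebra a procedural language; other orders, such as selecting on Deposit before joining, give the same answer here.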
