
IS0: Relational Languages

Relational Languages

- Relational algebra

- SQL (Structured Query Language) I (’86) and II (’92) (we look at only a few ‘extra’ aspects of SQL ’92)

Literature: Halpin’s book, chapter 11. Lecture notes: several sections on SQL (I and II).

Relational Algebra

Studying this algebra first clarifies the basic query operations without getting distracted by the specific syntax of commercial query languages.

In the relational model of data, all facts are stored in tables (or relations).

New tables may be formed from existing tables by applying operations in the relational algebra.

The original relational algebra defined by Codd contained eight relational operators: four based on traditional set operations (union, intersection, difference, and Cartesian product) and four special operations (selection, projection, join, and division).

Each of these eight relational operators is a table-forming operator on tables.

Relational algebra includes six comparison operators (=, <>, <, >, <=, >=). These are proposition-forming operators on terms. For example, x <> 0 asserts that x is not equal to zero.

It also includes three logical operators (and, or, not). These are proposition-forming operators on propositions (e.g., x > 0 and x < 8).

Since the algebra does not include arithmetic operators (e.g., +) or functions (e.g., count), it is less expressive than SQL.

Union (“A ∪ B” or “A union B”)

Two tables are union-compatible if and only if they have the same number of columns, and their corresponding columns are based on the same domain (the columns may have different names).

The union of tables A and B is the set of all rows belonging to A or B (or both).

Intersection (“A ∩ B” or “A intersect B”) and Difference (“A - B” or “A minus B” or “A except B”)

The intersection of tables A and B is the set of rows common to A and B.

The difference between tables A and B is the set of rows belonging to A but not B.

Cartesian Product (Unrestricted Join) (“A × B” or “A times B”)

If A and B are tables, A × B is formed by pairing each row of A with each row of B. (Here “pairing” means “prepending”.)

The number of rows in A × B is the product of the number of rows in A and B.

The number of columns in A × B is the sum of the number of columns in A and B.
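The set operations and the Cartesian product described so far can be tried out on small data. The following is only a sketch: it assumes each table is modelled as a Python set of row tuples, and the tables `A` and `B` and their contents are hypothetical.

```python
from itertools import product

# Hypothetical union-compatible tables: a table is a set of rows,
# and each row is a tuple (number, name).
A = {(1, "Ann"), (2, "Bob")}
B = {(2, "Bob"), (3, "Cy")}

union = A | B          # rows belonging to A or B (or both)
intersection = A & B   # rows common to A and B
difference = A - B     # rows belonging to A but not B

# Cartesian product: prepend each row of A to each row of B.
times = {a + b for a, b in product(A, B)}

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(len(times), "rows,", len(next(iter(times))), "columns")
```

Because a set cannot hold duplicates, the union lists the shared row (2, "Bob") only once, which matches the definition of a table as a set of rows.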

The operations of union, intersection and Cartesian product are associative (e.g., A ∪ (B ∪ C) = (A ∪ B) ∪ C). Difference is not associative (e.g., A – (B – C) is not equal to (A – B) – C).

What is wrong with this Cartesian Product?

Table aliases are needed to multiply a table by itself:

Codd’s table operations: a) Relational Selection

The selection operation chooses just those rows that satisfy a specified condition. The selection operation may be specified using an expression of the form T where c. T denotes a table expression (i.e., an expression whose value is a table) and c denotes a condition. The “where c” part is called a where clause.
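A selection can be sketched as a filter over rows. The helper `select`, the table `Person` and its columns are hypothetical names chosen for illustration; the condition c is passed as a function of one row.

```python
# Hypothetical table: column names plus a set of row tuples.
columns = ("name", "sex", "iq")
Person = {("Ann", "F", 120), ("Bob", "M", 95), ("Cy", "M", 130)}


def select(rows, cols, condition):
    """T where c: keep just the rows for which the condition holds."""
    return {row for row in rows if condition(dict(zip(cols, row)))}


# Person where sex = 'M' and iq > 100
result = select(Person, columns, lambda r: r["sex"] == "M" and r["iq"] > 100)
print(result)
```

The result has the same columns as the input table; only the set of rows shrinks.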

The alternative notation σ_c(T) is often used in academic journals. The “σ” is sigma, the Greek s (which is the first letter of “selection”).

Examples

Unless parentheses are used, and is evaluated before or.

Two equivalent queries; if in doubt, include parentheses.

Three equivalent queries.
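The precedence rule, and the rewriting that makes differently written conditions equivalent (De Morgan's laws), can be checked by brute force over all truth values. This sketch assumes Python's `not`, `and` and `or`, which follow the same precedence as the algebra's logical operators.

```python
from itertools import product

for p, q, r in product([False, True], repeat=3):
    # "and" is evaluated before "or" unless parentheses say otherwise.
    assert (p or q and r) == (p or (q and r))
    # De Morgan's laws.
    assert (not (p and q)) == (not p or not q)
    assert (not (p or q)) == (not p and not q)

# A case where the placement of parentheses changes the answer.
without_parens = True or False and False    # read as: True or (False and False)
with_parens = (True or False) and False
print(without_parens, with_parens)
```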

De Morgan’s laws:

not ( p and q ) ≡ not p or not q
not ( p or q ) ≡ not p and not q

Codd’s table operations: b) Relational Projection

Relational projection involves choosing one or more columns from a table, and then eliminating any duplicate rows that might result.

We may represent the projection operation as T [ a, b, ...] where T is a table expression and a, b, ... are the names of the required columns.

The alternative notation π_{a,b,…}(T) is common in academic journals. The “π” is pi, the Greek p (the first letter of “projection”).
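Projection can be sketched in the same style. The helper `project` and the sample table are hypothetical; collecting the result in a set removes the duplicate rows that appear once columns are dropped.

```python
# Hypothetical table with three columns.
columns = ("name", "sex", "city")
Person = [("Ann", "F", "Oslo"), ("Bob", "M", "Oslo"), ("Eve", "F", "Oslo")]


def project(rows, cols, wanted):
    """T [a, b, ...]: keep the wanted columns, then drop duplicate rows."""
    positions = [cols.index(c) for c in wanted]
    return {tuple(row[i] for i in positions) for row in rows}


# Person [sex, city]
result = project(Person, columns, ["sex", "city"])
print(sorted(result))
```

Three input rows yield two result rows here, because Ann and Eve agree on both projected columns.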

Projection involves picking the columns and removing any duplicates.

Codd’s table operations: c) Relational Joins

We now consider the relational join operator between two tables, which compares attribute values from the tables, using the comparison operators (=, <, >, <>, <=, >=).

There are several kinds of join operations, and we discuss only some of these here.

Let θ (theta) denote any comparison operator (=, <, etc.). Then the θ-join of tables A and B on attributes a of A and b of B equals the Cartesian product A × B, restricted to those rows where A.a θ B.b. We write this as shown below; an academic notation is shown in braces.

A × B where c { or A ⋈_c B }

The condition c used to express this comparison of attributes between tables is called the join condition. The join condition may be composite (e.g., a1 < b1 and a2 < b2 ).

With most joins, the comparison operator used is = . The θ-join is then called an equijoin.
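A θ-join is a Cartesian product followed by a restriction, which the sketch below spells out. The helper `theta_join` and the one-column tables are hypothetical; the join condition is passed as a function of a row of A and a row of B.

```python
from itertools import product


def theta_join(A, B, condition):
    """A x B where c: pair every row of A with every row of B, keep matches."""
    return {a + b for a, b in product(A, B) if condition(a, b)}


# Hypothetical one-column tables: A has column a, B has column b.
A = {(1,), (5,)}
B = {(1,), (3,), (5,)}

equijoin = theta_join(A, B, lambda a, b: a[0] == b[0])    # A.a = B.b
less_than = theta_join(A, B, lambda a, b: a[0] < b[0])    # A.a < B.b

print(sorted(equijoin))
print(sorted(less_than))
```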

Examples: A × B where A.a = B.b

or: A × B where a = b { if a, b occur in only one of A, B}

Base for the following examples

The ORM schema (a) maps to the relational schema (b).

An example population for the schema.

Example: an equijoin

If the matching columns actually refer to the same thing in the UoD (and they typically do), then one of these columns is redundant. In this case we lose no information if we delete one of these matching columns (by performing a projection on all but the column to delete).

Example: from equijoin to natural join (a)

If the columns used for joining have the same name in both tables, then the unqualified name is used in the join result.

The resulting table is then said to be the natural inner join of the original tables.

Since ‘inner’ is assumed by default, the natural inner join may be expressed simply as “natural join”. This is by far the most common join operation in practice.

It may be written as: A ⋈ B, or in words: “A natural join B”

To compute A ⋈ B:

Form A × B
For each column name c that occurs in both A and B
  Apply the restriction A.c = B.c
  Remove B.c
  Rename A.c to c
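The steps above can be sketched as a small function. The helper `natural_join`, the tables `Employee` and `Drives` and their rows are hypothetical; columns are identified by name, so the renaming step is implicit in reusing A's column names for the result.

```python
from itertools import product


def natural_join(a_cols, a_rows, b_cols, b_rows):
    """Return (columns, rows) of the natural join of tables A and B."""
    common = [c for c in a_cols if c in b_cols]
    # Positions of the B columns that survive (those not shared with A).
    b_keep = [i for i, c in enumerate(b_cols) if c not in common]
    out_cols = list(a_cols) + [b_cols[i] for i in b_keep]
    out_rows = set()
    for a, b in product(a_rows, b_rows):                      # form A x B
        # Apply the restriction A.c = B.c for every common column c.
        if all(a[a_cols.index(c)] == b[b_cols.index(c)] for c in common):
            out_rows.add(a + tuple(b[i] for i in b_keep))     # remove B.c
    return out_cols, out_rows


# Hypothetical tables that share the column empNr.
emp_cols = ["empNr", "empName"]
Employee = {(1, "Ann"), (2, "Bob")}
drv_cols = ["empNr", "carRegNr"]
Drives = {(1, "ABC123"), (1, "XYZ999"), (3, "QQQ111")}

cols, rows = natural_join(emp_cols, Employee, drv_cols, Drives)
print(cols)
print(sorted(rows))
```

Employee 2 and driver 3 drop out because they have no partner row, which is the behaviour of an inner join.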

Note that “⋈” looks like a cross “×” with two vertical lines added, suggesting that a natural join is a Cartesian product plus two other operations (selection of rows with equal values for the common columns, followed by projection to delete redundant columns).

Example: from equijoin to natural join (b)

The duplicate column is removed by projection on the equijoin.

A natural join.

Example: with natural joins

Tables being joined may have zero, one, or more common columns.

Examples:

Account has a composite identification scheme.

Example: with natural joins (cont.)

Sample population for the bank schema:

Two queries using natural joins:

Other join types

In rare cases, comparison operators other than equality are used in joins. As a simple example: Suppose we want a list of drinker-smoker pairs, where the drinker and smoker are distinct persons.
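The drinker-smoker list is a join with the comparison operator <> instead of =. In this sketch the one-column tables `Drinker` and `Smoker` and the names in them are hypothetical.

```python
from itertools import product

# Hypothetical one-column tables, modelled as sets of person names.
Drinker = {"Ann", "Bob"}
Smoker = {"Bob", "Cy"}

# Drinker x Smoker where drinker <> smoker
pairs = {(d, s) for d, s in product(Drinker, Smoker) if d != s}
print(sorted(pairs))
```

Bob both drinks and smokes, so the pair (Bob, Bob) is the only one of the four combinations that the join condition removes.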

Other kinds of joins can be defined. For example, left, right, and full outer joins are used to include various cases with null values. An outer join is basically an inner join, with extra rows padded with nulls when the join condition is not satisfied. For example, Client left outer join AcUser includes a row to indicate that client 8005 has the name “Shankara, TA” but uses no account (branchNr and accountNr are assigned null values on this row).

Codd’s table operations: d) Relational Division

A table A is divisible by another table B only if A has more columns. Let B have n columns.

The operation A ÷ B (A divide-by B) is defined if and only if the domains of the last n columns of A match the domains of the columns in B (in order).

Examples:

Query strategies (in relational algebra . . . )

• Phrase the query in natural language, and understand what it means.
• Which tables hold the information?
• If you have table data, answer the query yourself, then retrace your mental steps.
• Divide the query up into steps or subproblems.
• If the columns to be listed are in different tables, declare joins between the tables.
• If you need to relate two or more rows of the same table, use an alias to perform a self-join.

Example: how can the next query be formulated in relational algebra?

Solution: query in relational algebra

(Account where balance > 700) [ branchNr, accountNr ] ⋈ AcUser ⋈ Client [ clientNr, clientName ]

This is not the only way we could express the query. For instance, using a top-down approach, both joins could have been done before selecting or projecting, thus:

( Account ⋈ AcUser ⋈ Client ) where balance > 700 [ clientNr, clientName ]

Although these two queries are logically equivalent, if executed in the evaluation order shown, the second query is less efficient because it involves a larger join.
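The equivalence of the two formulations, and the difference in intermediate join size, can be observed on a small population. This is a sketch under stated assumptions: the helpers `where`, `project` and `njoin` are hypothetical, a table is a pair (column names, set of rows), and the sample rows are invented for illustration.

```python
from itertools import product


def where(table, cond):
    cols, rows = table
    return cols, {r for r in rows if cond(dict(zip(cols, r)))}


def project(table, wanted):
    cols, rows = table
    idx = [cols.index(c) for c in wanted]
    return tuple(wanted), {tuple(r[i] for i in idx) for r in rows}


def njoin(a, b):
    """Natural join on all column names that the two tables share."""
    (ac, ar), (bc, br) = a, b
    common = [c for c in ac if c in bc]
    keep = [i for i, c in enumerate(bc) if c not in common]
    rows = {x + tuple(y[i] for i in keep)
            for x, y in product(ar, br)
            if all(x[ac.index(c)] == y[bc.index(c)] for c in common)}
    return ac + tuple(bc[i] for i in keep), rows


# Hypothetical population for the bank schema.
Account = (("branchNr", "accountNr", "balance"),
           {(10, 54, 3000), (10, 77, 500), (23, 54, 1000)})
AcUser = (("branchNr", "accountNr", "clientNr"),
          {(10, 54, 1001), (10, 54, 1002), (10, 77, 2013), (23, 54, 7654)})
Client = (("clientNr", "clientName"),
          {(1001, "Jones, ME"), (1002, "Jones, TA"), (2013, "Jones, ME"),
           (7654, "Seldon, H"), (8005, "Shankara, TA")})

rich = lambda r: r["balance"] > 700
wanted = ["clientNr", "clientName"]

# First formulation: select and project before joining.
small = njoin(project(where(Account, rich), ["branchNr", "accountNr"]), AcUser)
q1 = project(njoin(small, Client), wanted)

# Second formulation: join everything first, then select and project.
big = njoin(Account, AcUser)
q2 = project(where(njoin(big, Client), rich), wanted)

print(q1 == q2, len(small[1]), len(big[1]))
```

With this population the first join produces 3 rows in the first formulation and 4 in the second; on realistic table sizes the gap is what makes early selection worthwhile.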

Relational algebra can be used to specify transformation rules between equivalent queries to obtain an optimally efficient, or at least more efficient, formulation. SQL database systems include a query optimizer to translate queries into an optimized form before executing them.

Systems / SQL

We may now define a relational DBMS as a DBMS that has the relational table as its only essential data structure, and supports the selection, projection and join operations without needing specification of physical access paths.

A relational system that supports all eight table operations of the relational algebra is said to be relationally complete. This doesn’t entail eight distinct operators for these tasks; rather, the eight operations must be expressible in terms of the table operations provided by the system.

For version 1 of the relational model, Codd proposed 12 basic rules to be satisfied by a relational DBMS.

Version 2 of the relational model as proposed by Codd (1990) includes 333 rules …

In 1992 the SQL standard was substantially improved …

The latest standard SQL:1999 ….

Widely used RDBMSs: Oracle, DB2, Microsoft SQL Server, Ingres/Postgres, MS Access, FoxPro, Paradox, ….

SQL: Choosing Columns, Rows, and Order [where ...]

Equivalent formulations in relational algebra and SQL:

Relational algebra            SQL
T where c                     select * from T where c
T where c [ a, b, … ]         select distinct a, b, … from T where c
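The correspondence can be tried out with any SQL system. The sketch below assumes SQLite through Python's standard `sqlite3` module; the table `Person`, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Person (name text, starsign text, sex text)")
conn.executemany("insert into Person values (?, ?, ?)", [
    ("Ann", "Aquarius", "F"),
    ("Bob", "Aquarius", "M"),
    ("Eve", "Aquarius", "F"),
    ("Tom", "Leo", "M"),
])

# T where c
selected = conn.execute(
    "select * from Person where starsign = 'Aquarius'").fetchall()

# T where c [a, b]: 'distinct' removes the duplicate rows, as projection does.
projected = conn.execute(
    "select distinct starsign, sex from Person "
    "where starsign = 'Aquarius' order by sex").fetchall()

# Without 'distinct', SQL keeps duplicates, unlike relational projection.
with_duplicates = conn.execute(
    "select starsign, sex from Person where starsign = 'Aquarius'").fetchall()

print(len(selected), projected, len(with_duplicates))
```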

A where clause is used to select just the Aquarians.

SQL: Joins (a)

A cross join (Cartesian product) of tables pairs all the rows of one with all the rows of the other.

In SQL-89, a cross join of tables is specified by listing the tables in the from clause, using a comma to denote the Cartesian product operator.

A conditional join (θ-join) selects only those rows from the Cartesian product that satisfy a specified condition. In SQL-89, the condition is specified in the where clause.
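Both SQL-89 forms can be run as written. A minimal sketch, assuming SQLite through Python's `sqlite3` module and two hypothetical one-column tables `A` and `B`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table A (a integer);
create table B (b integer);
insert into A values (1), (5);
insert into B values (1), (3), (5);
""")

# Cross join, SQL-89 style: list the tables, separated by a comma.
cross = conn.execute("select * from A, B").fetchall()

# Conditional join, SQL-89 style: the join condition goes in the where clause.
conditional = conn.execute(
    "select * from A, B where a < b order by a, b").fetchall()

print(len(cross), conditional)
```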

SQL-92 (and SQL:1999) uses special syntax for various kinds of joins. In addition to supporting the SQL-89 syntax, these newer standards include special notations for the following types of joins (any text after two hyphens “--” is a comment):

cross join
qualified join:
  conditional join -- on clause
  column-list join -- using clause
natural join
union join

Qualified and natural joins may be further classified into the following types:

inner -- this is the default
outer:
  left
  right
  full

SQL: Joins (b) [syntax 89/92/1999]

Join syntax per join type: new syntax in SQL-92 and SQL:1999, followed by the SQL-89 syntax.

Join type: cross
  SQL-92, SQL:1999:
    select *
    from A cross join B
  SQL-89:
    select *
    from A, B

Join type: conditional
  SQL-92, SQL:1999:
    select *
    from A join B
    on condition
  SQL-89:
    select *
    from A, B
    where condition

Join type: column-list
  SQL-92, SQL:1999:
    select *
    from A join B
    using (c1, …)
    -- c1, … are unqualified
  SQL-89:
    select A.c1, … , … -- omit B.c1, …
    from A, B
    where A.c1 = B.c1 and …
    -- join columns are qualified

Join type: natural inner
  SQL-92, SQL:1999:
    select *
    from A natural [inner] join B
    -- join column in result is c
  SQL-89:
    select A.c, … , … -- omit B.c, …
    from A, B
    where A.c = B.c
    -- join column in result is A.c

Join type: left outer
  SQL-92, SQL:1999:
    select *
    from A natural left [outer] join B
    -- join columns are unqualified
    -- nulls generated for nonmatches
  SQL-89:
    select A.c, … { omit B.c }
    from A, B
    where A.c = B.c
    union all
    select c, …, '?', ..
    from A
    where c not in (select c from B)
    -- for composite c use exists with correlated subquery … etc.

SQL: Joins (c) [cross join // conditional join ]

Listing all possible male-female pairs.

A conditional join specifies the join condition in an on clause.

SQL: Joins (d) [ natural join ]

Currently, most SQL dialects (including SQL Server) do not support the natural join. Instead, the conditional join is often used to handle natural joins. For example, the query ‘above’ may be formulated thus:

select Employee.empNr, empName, carModel
from Employee join Drives on Employee.empNr = Drives.empNr
  join Car on Drives.carRegNr = Car.carRegNr

Alternatively, SQL-89 syntax can be used as follows:

select Employee.empNr, empName, carModel
from Employee, Drives, Car
where Employee.empNr = Drives.empNr
  and Drives.carRegNr = Car.carRegNr

SQL: Joins (e) [ natural left outer join ]
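A natural left outer join can be written with the SQL-92 keywords or emulated in SQL-89 style with union all. The sketch below assumes SQLite through Python's `sqlite3` module (SQLite happens to accept the natural join keywords) and a hypothetical two-row population of Client and AcUser; nulls are written as `null` rather than as a placeholder constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Client (clientNr integer primary key, clientName text);
create table AcUser (branchNr integer, accountNr integer, clientNr integer);
insert into Client values (1001, 'Jones, ME'), (8005, 'Shankara, TA');
insert into AcUser values (10, 54, 1001);
""")

# SQL-92 style: the join column clientNr is unqualified in the result.
new_style = conn.execute("""
    select clientNr, clientName, branchNr, accountNr
    from Client natural left outer join AcUser
    order by clientNr
""").fetchall()

# SQL-89 style: inner join, plus the non-matching clients padded with nulls.
old_style = conn.execute("""
    select Client.clientNr, clientName, branchNr, accountNr
    from Client, AcUser
    where Client.clientNr = AcUser.clientNr
    union all
    select clientNr, clientName, null, null
    from Client
    where clientNr not in (select clientNr from AcUser)
    order by 1
""").fetchall()

print(new_style)
print(new_style == old_style)
```

Client 8005 uses no account, so the row for that client carries nulls (shown by Python as `None`) in branchNr and accountNr.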

Two equivalent formulations of a natural left outer join.

SQL: Joins (f) [ self-join ]
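A self-join needs two aliases for the same table so that the two roles can be told apart. A minimal sketch, assuming SQLite through Python's `sqlite3` module; the table `Scientist`, its columns, its rows and the aliases `X` and `Y` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Scientist (name text primary key, sex text);
insert into Scientist values ('Ann', 'F'), ('Bob', 'M'), ('Eve', 'F');
""")

# X and Y are aliases for two copies of the same table. Fixing X as the
# female and Y as the male lists every opposite-sex pair exactly once.
pairs = conn.execute("""
    select X.name, Y.name
    from Scientist as X join Scientist as Y
      on X.sex = 'F' and Y.sex = 'M'
    order by X.name
""").fetchall()

print(pairs)
```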

A self-join is used to list pairs of scientists of opposite sex.

SQL: Grouping (a)

Partitioning rows into groups with the same value for column a.

Basic syntax:

select group-property1, ...
from ...
[ where row-criterion(s) ]
group by group-criterion1, ...

Example:

select sex, count(*), avg(iq)
from Pupils
where iq is not null
group by sex

SQL: Grouping (b) [ ‘division’ ]

We have seen the relational algebra ‘division’-example:

Speaks ÷ ( Speaks where country = ‘Canada’ [language] )

Sometimes, simple cases of relational division can be handled by grouping.
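One common way to handle such a case is to count, per group, how many rows match the divisor and compare that with the size of the divisor. The sketch below is an assumption-laden illustration, not the query from the slide: it uses SQLite through Python's `sqlite3` module, a hypothetical table `Speaks(person, "language")`, a hypothetical divisor table `Required`, and a `having` clause to filter groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Speaks (person text, "language" text,
                     primary key (person, "language"));
create table Required ("language" text primary key);   -- the divisor
insert into Speaks values
    ('Ann', 'English'), ('Ann', 'French'),
    ('Bob', 'English'),
    ('Cy', 'English'), ('Cy', 'French'), ('Cy', 'Dutch');
insert into Required values ('English'), ('French');
""")

# Who speaks every required language? Keep only the rows for required
# languages, group by person, and keep the groups that cover all of them.
speaks_all = conn.execute("""
    select person
    from Speaks
    where "language" in (select "language" from Required)
    group by person
    having count(*) = (select count(*) from Required)
    order by person
""").fetchall()

print(speaks_all)
```

The count comparison is only safe because the primary key rules out duplicate (person, language) rows.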

N.B. The column name “language” is double quoted because it is a reserved word.

SQL: Correlated and Existential Subqueries

Example: (members have a composite reference scheme)

Consider: who is not ranked in judo?

select surname, firstname, sex
from Member
where not exists
  ( select * from Ranked
    where surname = Member.surname
      and firstname = Member.firstname
      and art = 'judo' )

SQL: Data Definition [ create table / create view ]

create table tablename
  ( colname data-type [not null] [...| ..| .., .. , ...]
    [, ...]
    [, primary key (col-list) ]
    [, unique (col-list)]
    [, foreign key (col-list) references tablename (unique-collist)]
    [, ...]
    [, check (table-condition-on-same-row) [,...]])
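The create table clauses can be exercised on the judo example. This is a sketch, assuming SQLite through Python's `sqlite3` module; the column `grade`, the data types, the check conditions and the sample rows are hypothetical, and SQLite only enforces foreign keys after `pragma foreign_keys = on`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite-specific switch
conn.executescript("""
create table Member (
    surname   text not null,
    firstname text not null,
    sex       text not null check (sex in ('M', 'F')),
    primary key (surname, firstname)
);
create table Ranked (
    surname   text not null,
    firstname text not null,
    art       text not null,
    grade     integer not null check (grade between 1 and 10),
    primary key (surname, firstname, art),
    foreign key (surname, firstname) references Member (surname, firstname)
);
insert into Member values ('Adams', 'Ann', 'F'), ('Brown', 'Bob', 'M');
insert into Ranked values ('Adams', 'Ann', 'judo', 3);
""")

# Who is not ranked in judo? The subquery is correlated with the outer row.
not_ranked = conn.execute("""
    select surname, firstname, sex
    from Member
    where not exists
      ( select * from Ranked
        where surname = Member.surname
          and firstname = Member.firstname
          and art = 'judo' )
""").fetchall()

# A ranking for a non-member violates the composite foreign key.
try:
    conn.execute("insert into Ranked values ('Nobody', 'Ned', 'judo', 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(not_ranked, fk_rejected)
```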

Views, or “virtual tables”, are basically named, derived tables. Their definition is stored, but their contents are not.

create view viewname [ (col-list) ]
  as select-query
  [ with check option ]
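That only the definition of a view is stored can be seen by changing the base table after the view has been created. A minimal sketch, assuming SQLite through Python's `sqlite3` module; the table, the view `FemaleEmployee` and the rows are hypothetical, and the `with check option` clause is left out because, as far as I know, SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Employee (empNr integer primary key, surname text, sex text);
insert into Employee values (1, 'Jones', 'F'), (2, 'Smith', 'M');
create view FemaleEmployee as
    select empNr, surname from Employee where sex = 'F';
""")

before = conn.execute("select * from FemaleEmployee").fetchall()

# Only the base table is changed; the view is re-derived when queried.
conn.execute("insert into Employee values (3, 'Adams', 'F')")
after = conn.execute("select * from FemaleEmployee order by empNr").fetchall()

print(before, after)
```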

Also: drop table / drop view / alter table .. add ...

SQL: Updating Table Populations

insert into tablename [ (col-list) ]
  values ( constant-list )

Example:

insert into Employee
  values ( 715, 'Jones', 'Eve', 'F', null, null )

or:

insert into Employee (empNr, surname, firstname, sex)
  values ( 715, 'Jones', 'Eve', 'F' )

commit [work]
rollback [work]
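The effect of commit and rollback on inserted rows can be sketched as follows, assuming SQLite through Python's `sqlite3` module, where `conn.commit()` and `conn.rollback()` play the role of the SQL statements. The last two columns of Employee (`job`, `salary`) and the second employee are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Employee (
        empNr integer primary key, surname text, firstname text, sex text,
        job text, salary integer)
""")
conn.commit()

# Insert with a full value list, then make the change permanent.
conn.execute(
    "insert into Employee values (715, 'Jones', 'Eve', 'F', null, null)")
conn.commit()

# Insert with a column list (unlisted columns become null), then undo it.
conn.execute(
    "insert into Employee (empNr, surname, firstname, sex) "
    "values (716, 'Smith', 'Tom', 'M')")
conn.rollback()

rows = conn.execute("select empNr, job from Employee").fetchall()
print(rows)
```

Only employee 715 survives, because the second insert was rolled back before any commit.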

delete from tablename
  [ where condition ]

Examples:

delete from Item -- deletes all rows from the table

delete from Item
  where category = 'DB'

update tablename
  set colname = expression [, ... ]
  [ where condition ]

Example:

update Employee
  set salary = salary * 1.05
  where job = 'Modeler' and salary < 50000

SQL: Security and Metadata

Security: A database is secure if and only if operations on it can be performed only by users authorized to do so.

SQL provides a grant statement for granting various kinds of privileges to users. The SQL-92 syntax is:

grant all privileges | select | insert [(col)] | update [(col)] | delete | usage | .... on object to user-list [ with grant option ]

Privileges may be removed with the revoke statement:

revoke [ grant option for ] privilege-list on object from user-list [ restrict | cascade ]

Metadata (data about data): SQL systems automatically maintain a set of system tables holding information about the database itself (e.g., base tables, views, domains, constraints, and privileges). Users with access to the system tables may query them in SQL (just like the application tables).
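As one concrete instance, SQLite keeps its catalog in a system table named `sqlite_master`; that name is specific to SQLite, and other systems use different system tables. The sketch below assumes Python's `sqlite3` module and a hypothetical table and view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Item (itemNr integer primary key, category text);
create view DbItem as select * from Item where category = 'DB';
""")

# The system table is queried with ordinary SQL, like an application table.
catalog = conn.execute(
    "select type, name from sqlite_master order by name").fetchall()

print(catalog)
```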