Lecture 7: Relational Algebra and Select Queries
RNR/GEOG 417-517 Gary L. Christopherson

Introduction

Over the past several lectures we have been discussing the design and construction of relational databases. Last lecture we looked specifically at building relations in a database for restaurants. Today’s discussion looks at using relational algebra to extract information from relations. We will begin with two definitions, then describe each relational algebra operator, and conclude with a brief discussion of how these operators are used to make select queries.

Definitions

Relational Algebra: a collection of operators, such as join, union, and intersect, that take relations as their operands and return relations as their result.

Relational Closure: because the result of every operation is the same kind of object as its inputs (a relation), the output can become input for additional operations.

For example, the relational operator Union operates on relations A and B to produce a new relation, AB (relational algebra). Because AB is a relation, it can be used in subsequent operations (relational closure).

Relational Algebra Operators

In 1970, E.F. Codd identified eight relational algebra operators (Codd 1970). Today, these remain the basis for most database transactions. Codd categorized these eight as Traditional (Union, Intersection, Difference, and Product) and Special (Restrict, Project, Join, and Divide). The following table contains definitions and graphic aids for each of these operators.

Traditional Operators

Union

Returns a relation consisting of all tuples appearing in either or both of two specified relations. The relations must have the same shape. That is, the two tables must contain the same number of attributes, and each attribute in the tables must be defined in exactly the same way.

For example, a Union of the following tables results in a table that contains all the records from the two tables.
A
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S4   Clark   20       London

B
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   20       Paris

A union B
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S4   Clark   20       London
S2   Jones   20       Paris

Intersect

Returns a relation consisting of all tuples appearing in both of two specified relations. The relations must have the same shape.

For example, an intersect of tables A and B above results in a table that contains only those tuples appearing in both tables.

A intersect B
S#   SNAME   STATUS   CITY
S1   Smith   20       London

Difference

Returns a relation consisting of all tuples appearing in the first and not the second of two specified relations. The relations must have the same shape.

For example, a difference operation on tables A and B above results in a table that contains only those records in table A that don’t appear in table B.

A difference B
S#   SNAME   STATUS   CITY
S4   Clark   20       London

Product

Returns a relation consisting of all possible tuples that are a combination of two tuples, one from each of two specified relations. The cardinality of the result is the product of the cardinalities of the two relations, and the degree is the sum of their degrees. For example, the product of tables A and B above has 2 × 2 = 4 tuples and 4 + 4 = 8 attributes.

Special Operators

Restrict

Returns a relation consisting of all tuples from a specified relation that meet a specified condition. Usually expressed as a WHERE clause. A Restrict operation selects records (tuples), as in this example:

S WHERE City = London

Project

Returns a relation consisting of all tuples that remain as (sub)tuples in a specified relation after specified attributes have been eliminated.
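The operators defined so far can be sketched in a few lines of Python. This is only an illustration, not how a database engine implements them: it assumes each relation is modeled as a set of positional tuples (S#, SNAME, STATUS, CITY), and the variable names are chosen here for the example.

```python
from itertools import product

# Tables A and B from the Union example, as sets of (S#, SNAME, STATUS, CITY) tuples.
A = {("S1", "Smith", 20, "London"), ("S4", "Clark", 20, "London")}
B = {("S1", "Smith", 20, "London"), ("S2", "Jones", 20, "Paris")}

# Traditional operators map directly onto Python's set operators.
union = A | B          # tuples in either or both relations
intersect = A & B      # tuples in both relations
difference = A - B     # tuples in A that are not in B

# Product: every pairing of one tuple from A with one tuple from B.
prod = {a + b for a, b in product(A, B)}

# Restrict: keep only the tuples whose CITY (position 3) is London.
# The input is itself the result of a Union, which illustrates relational closure.
CITY = 3
restricted = {t for t in union if t[CITY] == "London"}

# Project: keep only STATUS and CITY (positions 2 and 3).
# Because the result is a set, duplicate (sub)tuples collapse automatically.
projected = {(t[2], t[3]) for t in union}

print(sorted(union))
print(sorted(projected))
```

Using sets means duplicate tuples disappear without extra work, which matches the definition of a relation; the trade-off is that attributes are addressed by position rather than by name.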
A Project operation selects attributes from a relation, as in this example:

Parts [color, city]

Join

Returns a relation consisting of all possible tuples that are a combination of two tuples, one from each of two specified relations, such that the two tuples contributing to any given combination have a common value for the common attribute(s) of the two relations (and that common value appears just once, not twice, in the resulting tuple). Join is used to connect information from two tables that have a common attribute containing common values.

Divide

Takes two relations, one binary and one unary, and returns a relation consisting of all values of one attribute of the binary relation that match (in the other attribute) all values in the unary relation. Divide is particularly helpful if you are asking for all of something.

For example, suppose you have a table containing a list of movies that you want to see, and a table containing names of theaters and the movies they are playing. Using the operator Divide, you can ask for the names of theaters that are playing all the movies from your list.

Theatre    Movie
Regal      Reservoir Dogs
Regal      Bourne Identity
Imperial   Bourne Identity
Imperial   My Little Pony
Imperial   Reservoir Dogs
Royal      My Little Pony
Royal      Bourne Identity

Movies I want to see
Reservoir Dogs
My Little Pony

Result of Divide
Imperial

Relational Algebra and Select Queries

Using these eight operators it is possible to carry out a variety of management tasks for a relational database, but for most users the most common task will be the select query. A select query asks for information based on the values of particular attributes in particular tuples of particular tables. For example, we might want to know the names of all the movies currently playing whose stars were born before 1940.
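As a rough sketch of the two examples above, the Python below models relations as lists of dictionaries. The helper functions and attribute names are illustrative assumptions, and the sample rows for the movie question (Movie A, Star X, and so on) are made up purely for demonstration; only the theatre data comes from the Divide example.

```python
def restrict(rel, pred):
    """Restrict: keep the tuples that satisfy the condition."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Project: keep only the named attributes, dropping duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def join(r, s):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})  # the shared value appears once
    return out

def divide(binary, unary, keep, match):
    """Divide: values of `keep` that are paired with every `match` value in `unary`."""
    wanted = {t[match] for t in unary}
    result = []
    for value in sorted({t[keep] for t in binary}):
        seen = {t[match] for t in binary if t[keep] == value}
        if wanted <= seen:  # this value covers the whole wanted list
            result.append({keep: value})
    return result

# Divide: which theatres are playing all the movies on the list?
showing = [
    {"theatre": "Regal", "movie": "Reservoir Dogs"},
    {"theatre": "Regal", "movie": "Bourne Identity"},
    {"theatre": "Imperial", "movie": "Bourne Identity"},
    {"theatre": "Imperial", "movie": "My Little Pony"},
    {"theatre": "Imperial", "movie": "Reservoir Dogs"},
    {"theatre": "Royal", "movie": "My Little Pony"},
    {"theatre": "Royal", "movie": "Bourne Identity"},
]
want_to_see = [{"movie": "Reservoir Dogs"}, {"movie": "My Little Pony"}]
theatres = divide(showing, want_to_see, "theatre", "movie")

# Select query: movies now playing whose stars were born before 1940.
# These three relations are hypothetical sample data.
playing = [{"movie": "Movie A"}, {"movie": "Movie B"}]
cast = [
    {"movie": "Movie A", "star": "Star X"},
    {"movie": "Movie B", "star": "Star Y"},
    {"movie": "Movie C", "star": "Star X"},
]
stars = [{"star": "Star X", "born": 1935}, {"star": "Star Y", "born": 1962}]

joined = join(join(playing, cast), stars)                 # Join
early = restrict(joined, lambda t: t["born"] < 1940)      # Restrict
answer = project(early, ["movie"])                        # Project

print(theatres)
print(answer)
```

Every helper takes relations in and returns a relation, so the calls can be nested freely; that is relational closure at work.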
Given the proper relations in a database, and using relational algebra, it is possible to search the values in the relations to identify all the movies that meet these conditions and return a relation containing their names.

Creating select queries requires a computer language that will interface with the database. The industry standard query language is Structured Query Language (SQL), developed by IBM in the 1970s. SQL uses the relational algebra operators discussed above to retrieve, add, modify, or delete data. A standard format for writing a select query in SQL (below) uses the operators Project, Join, and Restrict to return a table containing the information sought by the user. Because this begins with tables and ends with a table, you have relational closure.

Even simple select queries are often very complex, reflecting the complexity of the database structure. In this Microsoft Access example, the query is looking for restaurants where the cost is low or the dress code is casual: a fairly simple question given the complexity of the SQL statement.

Because this complexity frightens many users, application developers have designed aids for writing SQL statements. For example, Microsoft Access uses a method called Relational Query by Example (RQBE). This allows the user to drag and drop representations of relations, relationships, and attributes into a Query Grid. The figure below shows the same query, for restaurants where the cost is low or the dress code is casual, built in the Access Query Grid. Don’t be misled by the apparent simplicity of RQBE, because the power behind the scenes remains SQL. The software simply translates the drag-and-drop actions of the user into SQL statements.

Summary

Relational algebra allows users to manage relational databases: querying, adding, deleting, and modifying data in tables.
The most common of the eight relational algebra operators are restrict, project, and join. These three operators are particularly important in writing select queries. Using SQL statements, the user can create a new relation containing particular tuples from particular tables that answer particular questions. These SQL statements generally take the form of

    SELECT columns (Project)
    FROM tables (Join)
    WHERE row conditions are met (Restrict)

Reference Cited

Codd, E. F. 1970. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6):377-387.