Background
A database model contains the means for
¡ Specifying particular data structures
¡ Constraining the data sets Database Theory: Lecture 1 ¡ Manipulating the data
Introduction and the relational model A DDL (Data Definition Language) provides the means to specify the structures and constraints. Dr G.M. Bierman A DML (Data Manipulation Language) provides the means to manipulate the data; specifically querying and updating. www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 1 c GMB/AD Database Theory 2003 2 c GMB/AD ¢
Database models This lecture course
Database models are typically characterized by the prominent data structure, e.g. We will study some of these database models from a theoretical perspective. ¥ £ Graphs: Study closely the data model
Network model – ¥ Study closely the associated query languages:
– Semantic model – Choices of query languages
– Object model – Relationships between them
– XML (?) – Their characteristics (typically expressivity)
£ Trees: Three database models considered in this course: – Hierarchical model 1. Relational
£ Relations: 2. Object – Relational model 3. Semi-structured/XML
Database Theory 2003 3 c GMB/AD Database Theory 2003 4 c GMB/AD ¦ ¤ Warning Books
This is a brand new course! Recommended textbooks:
There will be eight lectures, two examples classes, and two lecturers! ¨ Foundations of databases Abiteboul, Hull and Vianu. Addison Wesley, 1995 Schedule: ¨ Data on the web Abiteboul, Buneman and Suciu. Morgan Kaufmann, 2000. 1. GMB - (now!) relational model Also: Some useful information in any of the database books from the IB 2. AD - Relational calculus Databases course. 3. AD - Deductive databases
4. AD - Recursion and negation
5. AD - Expressivity and complexity
6. GMB - Complex values
7. GMB - Object model
8. GMB - Semistructured/XML model
Database Theory 2003 5 c GMB/AD Database Theory 2003 6 c GMB/AD © §
What is the relational model?
Codd 70:
– Data structure: flat relations
– Simple algebra of queries The relational model – No constraints – No updates
Codd 72-:
– Same data structure
– Second query language (based on FOL)
– Proof of equivalence to algebra language
– Integrity constraints - functional dependencies
Database Theory 2003 7 c GMB/AD Database Theory 2003 8 c GMB/AD
What is the relational model? cont. Basics Therafter a whole host of extensions. Thus the relational model means the class Fix a countably infinite set of attribute names, with a total order of database models that have relations as the core data structure and incorporate Fix a single countably infinite set : the underlying domain some of Codd’s approach. Fix a countably infinite set of relation names Assume that there is a sort for each element of , i.e. a function
! . " # The arity of a relation $ is then defined %&' ( 01 & ( + + ) * ,.- / * , / A relation schema is just a relation name. We may write $ 3 to denote $ is 2 4 5 of sort 3 , and $ to denote it is of arity 265 4
A database schema is just a nonempty finite set of relation schema, e.g. 8 < < < 3 $ 3 $ 7 : : = = ; ; 4> 2 9 2 4
Database Theory 2003 9 c GMB/AD Database Theory 2003 10 c GMB/AD ?
Example relations Choicepoint: names
Q: Are the attribute names part of the explicit database schema or not? F H P AB CD E IJ A CK IQ ER A I B T AU C @ M M S M O O G L N L N L NV In SQL they are available, e.g. SELECT p.Name FROM Persons p Movies Title Director Actor
Magnolia Anderson Moore Guide Title Cinema Time But are they compiled away by the system to just integers?
Magnolia Anderson Cruise Rocky Warner 12:00 More theoretically: Spiderman Raimi Maguire Spiderman Picturehouse 19:00 X _ ]^ : A tuple over relation schema Y [ is a map from [ to
Named Z \ Spiderman Raimi Dunst ...
... Spiderman Phoenix 19:00 X : A tuple over relation schema Y is a element of the cartesian
Unnamed Z6` \ Rocky Avildsen Stallone Magnolia Picturehouse 22:00 _ba product ]^ RockyII Stalone Stallone The choice of model impacts on the query language.
Location Cinema Address Tel Having assumed a total order on the attribute names provides us with a Picturehouse Cambridge 504444 correspondence between the two. Thus we’ll flip between the two during this Phoenix Oxford 512526 course Warner Cambridge 560225
Database Theory 2003 11 c GMB/AD Database Theory 2003 12 c GMB/AD c W Unnamed: SPC algebra Example
(Simplest possible algebra first) List the name and addresses of cinemas playing a Almodovar film f e e g
SPC query d Base relation
Singleton constant relation h ij6k lm | | } } 6 6 6 } ~ Almodovar n p q
d Select (constant) h r s o Alternatively: n p t d Select (attribute) h r s o u x x x
d Projection h v r s y w w o o z | d d
Cartesian product h 6 6 } ~ Almodovar What is this? (Assume is of arity ) | 6 ¡¢ ~ ~
Database Theory 2003 13 c GMB/AD Database Theory 2003 14 c GMB/AD £ {
Arity judgements for SPC queries Semantics © ¤ ¦ ¨ Ö ÔÕ ¥ § × Ø Ñ Ñ Ó Ù Ú Ð Ð Ò Ò ª « ± © ® ¬ ¯° ª « © ¤ ¦ Ö ÔÕ × Ý Ý Ó ÛÜ Þß Ð ÐÛÜ Þß Ò Ò Ö ÔÕ × × å à å Ý â æ « ª Ó Ó « ª ¦ ¦ © ã ¦ ¦ » ® ¼ ¾ © ä ä è á Û Ð Ð Ù Ú Ò Ò Ð Ð Ò Ò ç Ù Ú ß ³ ³ ² ½ ² ´ ´ ¹ º Ö ÔÕ × × å å å ê à â æ Ó Ó é ä ä è Û Ù Ú Ò Ò Ð Ð Ò Ò ç Ù Ú Ù Úß Ð Ð á « ª ¦ © µ « ª ¦ © µ · · ¸ ¿ ² ² ¹ º ¶ ¹ º ¶ Ö ÔÕ × ò ò ò å ë å å æ ð Ó î î î Ó é ñ ñ ä ä è è ìí í á á Ú Þ ç Ð Ð ÛÜ Ù Ú Ù Ð Ð Ò Ò ß Ù Ú Ò Ò ï Ö ÔÕ Á Á Á « ª ª « ª « ¦ ¦ » ® ¼ © © ¦ © » À À Ç ¿ ³ ½ ½ ² ² ² ´ ´ º ¹ × ò ò ò ò ò ò ó å å ö ÷ ÷ ø å ÷ æ æ ð ô Ó ð Ó ô Ó ñ ñ ñ ñ ñ ä ä ä ñ ä ÛÜ Ù6õ Ú Ù Ú Ù6õ Ú Ù Ú Þ ç Ð Ð Ò Ò Ð Ð Ò Ò Ð Ð Ò Ò ß ª È « « ª © ¦ » ¾ © Â É Å Å Å À Ç ² ² ² ÃÄ Ä ¶ ¶ ¹ º ú û Æ ù ö ð
where ä ú û ù ø ð Í Î Ì Ì Ë Ë ä Ê A query Ê is well-formed wrt a schema if there exists such that .
Aside: Some queries can return ü for all instances, e.g.
þ ¤ ¥¦§
ý
ÿ ÿ ¨ ¡£¢ © â â ð ã Ó ð
Database Theory 2003 15 c GMB/AD Database Theory 2003 16 c GMB/AD Ï Generalized SPC algebra Normal forms 1. Intersection Define an SPC normal form to be a query of the form
2. Generalized Selection % % % " " U U U U U U ) ! ' , $ $ T T T T T ( & * G P P W Y Y +
Of form where # # and and # is or K K K Q X Q V Z I MNO RS N O RS [ [ M L J J H H " - + d d d d d d ^ \ ^ _ a e Q ` g ] ] ] ] ] ] ] where f f S N S Ncb 3. Equijoin
Proposition 1 For every (extended) SPC query h , there exists an extended % % % " " ! . / - $ $ * & + # where # # where is . i i
query h in normal form such that h h . k k l lnm k k l l j j 1 4 1 4 ? ? 5 5 5 8 9: <= > 5 8 9: A = > 0 0 3 3 6 6 3 6 2 2 7 7 ; ; @ @ ; ; @ @ 1 4 5 5 0 BC E 3 6 3 6 D 2 2
Exercise 1
1. Code up these operators
2. Formalise the semantics of the equijoin operator
Database Theory 2003 17 c GMB/AD Database Theory 2003 18 c GMB/AD o F
Named perspective Named: SPJR algebra p t rs q
Recall: Tuples are functions to
SPJR query Base relation p w z w x { Singleton constant relation c We write them as, e.g. y ucv | Select (constant) p Note: We do not allow repeated names in a relation sort
Select (attributes)
p We have to ensure that the result of a query does not have repeated names, Projection ~ e.g. } }
Renaming – We’ll need to be able to rename attributes
Natural join
p A natural join operator reveals itself: join on the common attributes
Database Theory 2003 19 c GMB/AD Database Theory 2003 20 c GMB/AD Sort judgements for SPJR queries Natural join ¡
È Ë È Ë Ç Ì Ç Ì Ê Í Ê Í É É ¢ £ £ ¦ ¡ ¨ ¤¥§¦ ©ª ¢ £ ¡ È ÎÏ Ë Ç Ì Ð Ì Ê Í Ê Í É É ¢ £ ¦ ¢ £ ¦ ³ ¡ ¡
« « ² Ó Ó Ñ Õ Î Ï Ì Ò Ì ¢ £ ¢ £ ¡ ¬ ¡ ¬ ® ® Ô ´ ¯ Ê Í
« « If then ° ± ° ± Ó Ó Ó Ñ Î Ï Ì Ò Ì Ì Ò Ì Í Í Ê If Í then ¢ £ ¦ ¶ ¶ ¶ ¦ ¦ ¡ µ · ¹ ¸ « ² ² distinct ¤ ª ¢ £ ¦ ¶ ¶ ¶ ¦ ¡ º ½ ½ ½ µ · « ² ² ¾ »¼ ¼ ° ± ¢ £ ¡ ¢ £ ¢ £ ¡ ¡ µ ¿ µ ¿ « « « ¢ £ ¡ Ã ¢ £ ÀÁ ¡ Â µ ¿ µ ¿ « Å « « ° ± ° ± Ä
Database Theory 2003 21 c GMB/AD Database Theory 2003 22 c GMB/AD Ö Æ
More Union
× Can define similar generalizations to SPC algebra We’d like to add some disjunctive flavour to our query language, e.g.
× Normal form: Where can I watch “Magnolia” or “Spiderman”?
1. Union é é é é é é çè çè ã çè çè ã çè í Ø ä ä ë ï ï þ ¦
â Ù Ü Ü Ü Ù â â ì ò ê ê ¡¢ £¤ ð Þ ð ð ð Ú Þß àcá åæ ß àcá å æ Þ§í Ú Þ ÿ ñ Ý
Û Û î î ¥ ¥§¦ ¨ ¥ © ¢ ¨ Magnolia © ¢ Spiderman á ø ø ø á ú ø ø ø ú á ó õ ÷ â ù â ö ê û ü ô ô ô ô ô where ô and are distinct ß æ ß æ 2. Extended (disjunctive) selection × þ A corresponding normal form theorem ¡¢ £ ¤ ÿ
¥ ¥§¦ ¨ ¨ © ¢ © ¢ Magnolia Spiderman
3. Non-singleton constant relations þ
¡¢ £ ¤ ÿ Magnolia Spiderman ¥
The addition of Union to our algebras, leads to SPCU and SPJRU algebra
Aside The addition of union to the corresponding calculus is non-trivial!
Database Theory 2003 23 c GMB/AD Database Theory 2003 24 c GMB/AD
ý Further power Difference operator
Even with Union, our algebra is quite weak, e.g. Unnamed Named Which movies where Stallone did not act were directed by Stallone? 0 3 0 3 0 3 0 3 Put more formally, our query algebras are monotonic. / 4 / 4 / 7 / 7 2 5 2 5 1 1 1 1 ( 6 6 0 3 0 3 / 4 / 7 ! ! % & & 2 5 2 5 " $ ' 1 1 1 1
Definition 1 An algebra is monotonic if # If then - + + ,
# # . ) ) * * ) ) * * Our example becomes, e.g.
To rectify this we’ll add the difference operator. 6 8 ? FGH IJ FGH IJ 8 ? 9 9 :;<= : ; < = D D > @ >§E K K >§E K K > L :A = B ;C A Stallone B ; C A Stallone
M The (unnamed) relational algebra is SPCU with difference operator.
M The (named) relational algebra is SPJRU with difference operator.
(It is traditional to add to these the generalizations, e.g. N for unnamed relational algebra)
Database Theory 2003 25 c GMB/AD Database Theory 2003 26 c GMB/AD O .
What’s coming up
P Relational model
– Different query language perspective: logic
– Obvious questions:
Q Equivalence with algebra Database Theory: Lecture 2
Q Expressivity?
Q Natural extensions and their expressivity The Relational Calculus
P Different data models Dr A. Dawar
www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 27 c GMB/AD Database Theory 2003 28 c GMB/AD S R Queries Queries in Algebra
A query is a formal syntactic rendering of a question we wish to pose to a In the (SPC or SPJR) relational algebra, a query is a term in the algebra. database. Each term denotes a relation. Examples Relations are combined using various unary operations (selection, projection) and
T Who is the director of “Spiderman”? binary operations (Cartesian product, union).
T Where is “Spiderman” playing?
A Boolean query is one where the output schema is a relation of arity Z .
T Is there a film directed by Almodovar playing in Cambridge? [ [ Formally, a query defines a query mapping U . It has an input schema (typically
So, for any instance , \ is either or . ] ^ _` _ab ` a database schema) and an output schema (typically a relation schema). V V
It maps an instance over the input schema to an instance U over the output W X schema.
Database Theory 2003 29 c GMB/AD Database Theory 2003 30 c GMB/AD c Y
Conjunctive Queries Conjunctive Calculus
The various relational calculi (of which the conjunctive calculus is the Formally, a conjunctive formula x is given by the following syntax z y y } { ~ x atom simplest) are based on first-order predicate logic. |§} x x
e l conjunction j k f h f h d d i
If is a relation of arity and g , then g is a fact that is either m n o
true or false in a given database instance . x existential quantification } { o p f p d where is a relation name, is a variable and each is either a variable or a
If is a variable, then g is true or false given an instance and a m n valuation for p . constant in . A conjunctive query has the form p f p d
g is a simple query. q r m ns } } ~ x Example
t Who is the director of “Spiderman”? where
x is a conjunctive formula; p v p v g Movies “Spiderman” g ns m q ru }
each is either a variable or a constant; and } } ~ x the set of variables in is exactly . |
Database Theory 2003 31 c GMB/AD Database Theory 2003 32 c GMB/AD w Semantics Active Domain º»¼ ½ ¹ The of a formula ¹ , denoted is the set of constants A valuation is a function , where is a set of variables. active domain ¾ ¿
appearing in ¹ .
If includes all free variables in a formula , we can define satisfaction À À inductively by: Similarly, the active domain of an instance , denoted º »¼ ½ is the set of all ¾ ¿
constants appearing in any relation of À . £ ¥ ¥ ¡ ¡ §
and ¦ ¢ ¦ ¢ ¢¤£ ¦ À Å Ã Ä Show, by induction, that if Ä , then for any free variable of , Á © ¨ ª ¨ ª and and À Æ º »¼ ½ Å Ç « ® ¬ § ¾ ¿È ¾ ¿
° ² and there is a with ¯ ± ³
£ Exercise! ¥ ´ The query mapping defined by the query maps the instance to µ ¶
It follows that in evaluating a query ¹ , we only need to consider valuations whose ¥
´ for some ¢ ¦ µ ¢·£ ¦ ¶ À º»¼ ½ º»¼ ½ É
range is in ¹ . ¾ ¿ ¾ ¿
Database Theory 2003 33 c GMB/AD Database Theory 2003 34 c GMB/AD Ê ¸
Normal Form Equality
A conjuctive formula is said to be in normal form if all quantifiers precede all Every query in the calculus defined so far is satisfiable. conjunctions. That is, it’s of the form ã ã â That is, for every â , there is an such that is not empty. ä å Ö Ö Ö Ï Ë Ï Ï Ï Ì Ì Õ Õ Ò Ò Í Í Í × × Ð Î Î æ Ñ§Ó Ô Ñ§Ó Ô Ô æ Ñ ë è è ç ç é Compare ê ä ä å å
Every query in the conjunctive calculus can be defined with a formula in normal The calculus could be extended to express such queries by allowing equalities ì ì form. ( çîí ï ) as atomic formulae.
Use, í í ñ õ ñ ô ñ ó ó ë ö å ð§ñ ò ä Ë Ë Ì Ì Ø Ù Ù Ø Ù 1. Ø is equivalent to if does not occur in . Ú Û Ü Ë Ë Ë Ë Ì Ì Ì Õ Õ Ý Ù Ù Ø Ý 2. Ø ÙÝ is equivalent to if is not free in , is not Ñ Ô Ñ Ô Ñ Ô Ì Ù Ø
free in and Þàß .
Database Theory 2003 35 c GMB/AD Database Theory 2003 36 c GMB/AD ÷ á Equivalence Algebra to Calculus ùûú Every query definable in SPC algebra is definable in the conjunctive calculus Given a query ù in the SPC algebra, we define a query in the calculus.
Moreover, if the query is satisfiable, then it is definable without the use of ÿ ýþ ú
¦ ¦ ¦ ¦ ¦ ¦ ¢ ¢ ¢ ü ü ¤ ¤ § § ¥ ¥ ¥ ¥
¨ © equality. ¡£¢ ÿ ýþ ú
true ¡ ¨ ¡
Every query definable in the conjunctive calculus is definable in the SPC algebra ú ú
¦ ¦ ¦ ¦ ¦ ¦ ¤ ¤ ¤ § ù ¥ ¥ ù ¥ ¥ If and and there are no ¨ ¨ ¤ ¡£ ¡£ variables in common, then ÿ ýþ ú
¦ ¦ ¦ ¦ ¦ ¦ ¤ ¤ ¤ ¤ § ¥ ¥ ù ¥ ¥ ¥ ù ¨ © ¡£
ÿ ýþ ú
¦ ¦ ¦ ¦ ¦ ¦ ¤ ¤ ¤ ¥ ¥ ¥ ¥ ù ¨ ¡£ © ©£ ¦ ¦ ¦ ¦ ¦ ¦ ¤ ¤ § ¥ ¥ ¥ ¥ where enumerates all variables among that are not in ¦ ¦ ¦
¥ ¥ .
Database Theory 2003 37 c GMB/AD Database Theory 2003 38 c GMB/AD
ø
Selection Calculus to Algebra $ ) ) ) & ' * !#" ( ( If , , Given a query %£& + - ; G G G = @ F F B B C C H H 798 A£D E A£D E EI :<; >? A " G G G 1 J J B B 2 ! .£/ . 3 3 C H 0 take the cross-product , 4 is obtained by replacing& by . O D K 0 M P apply a selection N for each constant appearing among the , L For D K M P apply a selection Q for every repeated variable among the , L " 1 5 ! .£/ . 3 3 0 ; R @
apply a projection U U U to get rid of the columns corresponding to , S V T T L L & &
if either or 5 is a variable, it can be replaced by the other. If they are both 0 apply further selections corresponding to constants and repeated variables in constants, we need equality! ;
= .
Database Theory 2003 39 c GMB/AD Database Theory 2003 40 c GMB/AD W 6 Disjunction Negation
To extend the conjunctive calculus to one equivalent to SPCU algebra, it would The full relational calculus is obtained by allowing negation in addition to appear we need disjunction. disjunction. This allows the full power of first-order predicate logic in specifying a query. Where can I watch “Magnolia” or “Spiderman”? k k k h i l e9f j j n m o g£h
with rules for formulae (in addition to those of conjunctive formulae): Y Y ` ^ ^ ^ ^ \ Guide “Magnolia” \ Guide “Spiderman” \ X£Y Z[ ] ] ] _ _ _a f q p p n n negation Aside: How would you prove this cannot be done in the conjunctive calculus? r
n n disjunction m t s
n universal quantification However, allowing unrestricted disjunction can result in unsafe queries, i.e. m queries with possibly infinite answers. In the presence of negation, the last two are not strictly necessary, but they are convenient. b Y b c ` c ^ ^ ^ ^ \ \ \ Z ] _a X£Y _ ]
It is no longer the case that a satisfying valuation is restricted to the active domain.
Database Theory 2003 41 c GMB/AD Database Theory 2003 42 c GMB/AD u d
Safety Domain Independence Using negation and disjunction, one can write queries that admit infinite relations If we fix a domain , the interpretation of a query over an instance
as answers. can be relativized to . w
Movies “Kika” | “Almodovar” | { }~ v£w xzy All valuations are restricted to have range in . w w | | | | | Movies “Kika” “Almodovar” Movies “Almodovar” “Maura” }~ } { v x { The resulting interpretation is .
One possibility might be to restrict to some fixed finite set. Still, some A query is domain independent if for any and any , queries are not domain independent. w | }~ v£w x { It is undecidable whether a given expression of the relational calculus defines a domain independent query.
If a query is domain independent, it can be evaluated by restricting all valuations to the active domain.
Database Theory 2003 43 c GMB/AD Database Theory 2003 44 c GMB/AD Equivalence
Every query expressible in the relational algebra is equivalent to a domain independent query in the relational calculus.
Every query in the relational calculus interpreted with domain restricted to Database Theory: Lecture 3
is equivalent to a query in the relational algebra. Deductive Databases
Dr A. Dawar
www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 45 c GMB/AD Database Theory 2003 46 c GMB/AD ¢ ¡
Rules Syntax
Rules provide an alternative syntax (inspired by logic programming) for Formally, a conjunctive rule is: conjunctive queries. ½ ½ ½ ¶ ¶ ¶ » » ¾ ¾ ¼ ¼ ·£¸ ¹ ·£¸ ¹£º ·£¸ ¹ £ ª ¥ ª ª ª ¬ ® « « ¦ ¦ ¦ ¦ ¦ ¦ ¦ ¦ ¦ § § Movies “Almodovar” Guide Location ¨ ¤ ¨ ¤£¥ ¨£© ¤£¥ ¨ ¤£ª where expresses the query ¿ ¶ is a relation name not in the input database schema. ² ² ² ª ª ª ª ¥ ª ª ³ ³ ® « ¬ « « ® ¦ ¦ ¦ ¦ ¦ ¦ ¦ ¦ ¦ § § Movies “Almodovar” Guide Location ¤£¥ ¨´ ¤£ª ¨ ¤£¥ ¨ ¯ °± ¿ ¶
Each À is a relation in the input schema. ¿ ¸ ¸
Each of and À is a tuple of terms (i.e. variables or constants). and, in addition, gives the resulting relation the name £ . ¿ ¸ ¸
Every variable occurring in occurs in at least one of the À .
Database Theory 2003 47 c GMB/AD Database Theory 2003 48 c GMB/AD Á µ Semantics Disjunction
The meaning of the rule Disjunction is easily added by allowing multiple rules for the same relation
É É É Where can I watch “Magnolia” or “Spiderman”? Â Â Â Ç Ç Ê Ê È È Ã£Ä Å£Æ Ã£Ä Å Ã£Ä Å
is given by Î Ö Û Ø Ý Ý Ü Guide “Magnolia” Ü ×£Ø Ù£Ú × × Ù É É É Ï Ä Ä Â Ð Ð Â Ç Ç Ê Ê Ã ÅÑ Ë ÌÍ Ã£Ä Å Î Ö Û Ø Ø Ý Ý Ü Ü Guide “Spiderman” × Ù × Ù£Ú × É É É Ä Ï Ä Ä Ç Ê where is the sequence of all variables in È È not occurring in .
Note: Occurrences of a variable in distinct rules are to be treated as distinct
 is called the head of the rule. Ã£Ä Å variables. É É É Â Â Ç Ç Ê Ê È È is the of the rule. Ã£Ä Å Ã£Ä Å body
Ó This form of disjunction is always safe. Ò
Aside: by allowing equalities ( Ô ) in the body, we can ensure that there are no constants or repeated variables in the head.
Database Theory 2003 49 c GMB/AD Database Theory 2003 50 c GMB/AD Þ Õ
Multiple Relations Negation
As the syntax of conjunctive rules allows us to name the result of the query, it is One possible extension would be to allow negation on atomic formulae in the easy to simultaneously define multiple relations. body of a rule.
This can allow unsafe queries. ä æ ä ê é ß à ç ã ã ã ã ã è è á£â å á£â å á å ô ä æ â ê ê ì ì ä ß ß é ß ï ñ ë à à ã ã ã ã ã ã ã ã è è õ õ á£â å á å á å á å Movies “Kika” “Almodovar” ð ò ð£ñ ò£ó
As long as the dependencies among the relations are non-circular, each However, there is a simple syntactic restriction that guarantees safety. individual relation is still a conjunctive query. For instance, ê ì ì ä ê ê ê æ â ä é é é ß ë ë ë à à à ë ã ã ã ã ã ã ã ã ã ã ã ã ç ç ã ã è è è è è è å á åí å á å á å á á£â å á é
Relations (such as ç ã ) occurring in the database schema are called extensional database relations. Relations that occur in the heads of rules are intensional database relations.
Database Theory 2003 51 c GMB/AD Database Theory 2003 52 c GMB/AD ö î Range Restriction Universal Quantification
A Range-restricted conjunctive rule with negation is If negation is restricted to edb (extensional database) relations, not all relational calculus queries are expressible. ÷ ü ÿ ÿ ÿ ü ý ý
þ þ ø£ù ú ø£ù ú£û ø£ù ú Who are the actors that appeared in every Almodovar film? where
Movies “Almodovar” Movies “Almodovar” ¤ ¡ ü £ £ £
Each ¢ is a literal, i.e. either or for some relation in the database schema. Informally, a set of conjunctive rules with negation only on edb relations is equivalent to a formula that is the disjunction of formulae of the form ¡ ù ù
Every variable occurring in occurs in at least one of the ¢ . " " "
! # # ¡ Every variable in the body appears in some positive literal.
¥ § ¥ § A literal is positive if it’s of the form ¨ . It is negative if it’s of the form ¨ . ¦ © ¦ © Simulating the universal quantifier requires placing a negation outside the existential quantifier.
Database Theory 2003 53 c GMB/AD Database Theory 2003 54 c GMB/AD $
Universal Quantification Example: Path Finding
Who are the actors that appeared in every Almodovar film? A railway database contains the relation Railway Service From To
WAGN Cambridge Ely & * ' ' ' ( Almo-actor ( Movies “Almodovar” % ) %& ) WAGN Cambridge King’s Cross
Central Ely Birmingham & & & * - , - ' ' ' ' '/. Not-all Almo-actor Almo-actor ( Almo-actor %+ ) %+ ) % ) %+ ) Virgin Euston Birmingham
... & * GNER Peterbrough Newcastle
All Almo-actor ' '/. Not-all ) %+ %+ )0 %+ ) Great Western Paddington Cardiff
This is dynamically updated, to reflect cancellations and line failures.
Is it possible to get from Cambridge to Glasgow?
Database Theory 2003 55 c GMB/AD Database Theory 2003 56 c GMB/AD 2 1 Path Finding Recursion
In the relational calculus, we can ask: We can allow recursion, by allowing a relation name to appear in both the head and body of a rule. Which stations are reachable from Cambridge without change? Example: 5 3 9 9 Reach Railway 8 “Cambridge” 6 45 67 4
Which stations are reachable from Cambridge with one change? ? E F B B C C @A D @A D ; 5 ; : < : ? ? 9 9 9 9 9 E G G 8 8 Reach Railway “Cambridge” Railway F 6 4 6 4 45 67 B B B B C C @A D D @ @A DH
Which stations are reachable from Cambridge with two changes? ? Here, F is a relation given in the database. The query defines a relation that is
the transitive closure of F . ; 5 ; ; ; < : : < : < < = 9 9 9 9 9 9 9 9 Reach Railway 8 “Cambridge” Railway 8 Railway 8 6 6 6 4 4 4 45 67
But, there is no way to ask “which stations are reachable from Cambridge?”. Datalog is the language of conjunctive rules with recursion.
Database Theory 2003 57 c GMB/AD Database Theory 2003 58 c GMB/AD I >
Route Finding Syntax
Which stations are reachable from Cambridge? A Datalog rule is an expression Z Z Z K M S S S X X [ [ O O Y Y Reach Railway N “Cambridge” L J JK L TU V TU VW TU V Q K Q M P O O O Reach Railway N Reach J L J L JK L where \ Z Z Z U U U X [
The tuples of terms Y Y Y satisfy the restrictions on conjunctive rules. Is it possible to get from Cambridge to Glasgow? \ S
Each ] is either a relation in the input database scheme, or appears in the
Camb-Glas M Reach “Glasgow” L J head of some rule.
A Datalog program ^ is a sequence of rules. ^ We write _`a for the database schema consisting of the relation names that T V ^ ^ occur only in the body of rules in . bcd is the schema consisting of all T V
relations occurring in ^ .
Note: no negation.
Database Theory 2003 59 c GMB/AD Database Theory 2003 60 c GMB/AD e R Model-Theoretic Semantics Model-Theoretic Semantics contd. With a rule If is a program, a database instance over is said to be a of if
model m m m f f f k k n n l l gh i gh ij gh i it satisfies . we associate the formula of first-order predicate logic: If is a database instance over , the semantics of over is defined to s s s m m m p p h f r r f f o k k k n n n l l q g gh i g i it gh iu be the minimum instance such that and is a model of . m m m p p k n where l l enumerates all the variables appearing in the rule.
How do we know such a model exists? v w
For a program , we write x for the conjunction of all formulae associated
with rules in v . v w y z{|
Since there are no free variables in x , for any instance over , we g i have either y w y w
x satisfies x . }~
or y w y w x x
does not satisfy . }~
Database Theory 2003 61 c GMB/AD Database Theory 2003 62 c GMB/AD
Fixed-point Semantics Fixed-point Semantics contd. °
Let be a program and an instance over . The function ± is a on instances.
monotone map ² ° ² ° Define the instance as follows ´ ´ If ³ , then ± ³ ± . ¶ µ ¶ µ
For every relation in , . Exercise: prove it! For every relation not in , a tuple is in if, and only if, there is a
Fixed-point theorem ² ² ´ ´ rule For any instance , there is a least such that ³ which is a fixed-point of ° ¨ ¨ ¨ £ ±
¥ ¥ . © © § § ¢¡ ¤ ¦ ¦ ¡ ª ª ® ¦ £ ¬ and a valuation such that and for each . « ° ¸ ´ ´ ± µ ¶·
That is, is formed of the set of facts that can be by
immediately justified the program and the instance .
Database Theory 2003 63 c GMB/AD Database Theory 2003 64 c GMB/AD ¹ ¯ Iteration
The least fixed point can be reached by iteration. º º » Given a program and an instance over ¼½¾ , take ¿ À » Á ÂÄÃ Ã È Á Á Ç É Å Å . Database Theory: Lecture 4 Æ À ¿ Ã Á Á Ç Å Repeat until Å . Æ Recursion and Negation
Dr A. Dawar
www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 65 c GMB/AD Database Theory 2003 66 c GMB/AD Ë Ê
Datalog Monotonicity
Datalog extends the conjunctive calculus with recursion, but does not allow Every query definable in Datalog is monotonic. negation. Í Î ÎÐ Í ÑÒÓ
If is a Datalog program and Ï instances over , then Ô Õ
We can express Ð Î ÎÐ Í Î Í Î
If Ö then Ö . Ô Õ Ô Õ Which stations are reachable from Cambridge?
but not Ð Î Î Ö For which pairs of stations is there no direct connection? Ð × Î × Î Ø Ø Ö Ô Õ Ô Õ Ð × × Î × × Î Ø Ø Ø Ø Ö Ô Ô Õ Õ The expressive power of Datalog and the relational calculus are incomparable. Ô Ô Õ Õ Ð ×ÚÙ Î ×ÚÙ Î Ö Ô Õ Ô Õ Ø Ø . .
Database Theory 2003 67 c GMB/AD Database Theory 2003 68 c GMB/AD Û Ì Example Adding Negation
Edge Source Target Allowing negation in datalog programs is not a trivial task, as the fixed-point
a c semantics was based on monotonicity. Given a graph as a binary relation a b
b d We now consider a series of ever more expressive query languages obtained by different ways of incorporating negation into datalog.
We can ask for which pairs of nodes there is a path: All have the proviso that negation is range-restricted. á Ý Þ Þ ß Path Edge ß ÜÝ à Ü à ç ê That is, if a è appears in the body of a rule, every variable in éê ë á Ý â â Þ Þ Þ Þ ß Path Edge Path ß ÜÝ à Ü à Ü àã appears in some positive literal in the body of the same rule. Ý Þ But we cannot ask: for which pairs ß is every path of even length. å
Adding the pair Þ to the above table would require removing it from the result à Üä of the query. This shows that the query is not monotone.
Database Theory 2003 69 c GMB/AD Database Theory 2003 70 c GMB/AD ì æ
Semipositive Datalog Semipositive Datalog contd.
ÿ
The simplest method of extending datalog is to allow negation only on edb For a semipositive datalog program , the map ¡ on instances is still monotone predicates. in a restricted sense. ¢ ¢¤£ ÿ A semipositive datalog program is a collection of rules If and are instances over ¥¦ § such that: ¨ ©
¢ ¢¤£
ò õ õ õ ò ï í , and ó ó ö ö ô ô îï ðñ îï ð î ð
ÿ ¢ ¢¤£ for each in , . ¨ © ¨ © ¨ © ø ò í í
where if ÷ is ÷ , then ÷ does not appear in the head of any rule. and the usual restrictions on variables apply. then £ ÿ ¢ ÿ ¢
¨ © ¨ © ø ù ñ ü ô ô û û îú ð îú ð This suffices for proving the existence of minimal fixed-points and therefore giving ø õ ù ù ý ñ ý ü ô ô ô ô û û ð îú ð î îú ð a semantics for semipositive datalog.
This computes the transitive closure of the complement of ü .
Database Theory 2003 71 c GMB/AD Database Theory 2003 72 c GMB/AD þ Semipositive Datalog contd. Stratified Datalog
While semipositive datalog can express some queries that are neither in the A stratified datalog program 2 is a set of rules relational calculus nor in datalog, it cannot express universal quantification. 8 ; ; ; 8 5 3 9 9 < < : : 45 67 45 6 4 6 ? ? ? = 2 2 2 > > 9 , which can be partitioned into sets < such that Movies “Almodovar” Movies “Almodovar” ! "# @ A For every relation 3 , there is an such that every rule such that all rules with 2 A 3 3
in the head are in B (we call the defining stratum for ; for edb $ $¤& '() +
Suppose % are two instances over such that for any tuple over *
& relations, the defining stratum is 0). $ $ $ + + , (- . 0 0 , / if, and only if, / . & @ 8 3 3 D * $ * $ If is a positive literal , then the defining stratum of the defining C C C Then % .
stratum of 3 . The proof of this is similar to monotonicity of ordinary datalog programs. @ 8 3 If is a negative literal 3 , then the defining stratum of the defining CFE To get universal quantification, we have to find a way of alternating rule C C
stratum of 3 . application with negation.
Database Theory 2003 73 c GMB/AD Database Theory 2003 74 c GMB/AD G 1
Example Semantics Z U If is a database instance over VWX , where is a stratified datalog program YZ [ _ _ _ \ Z Z Z ^ ^ ] ` , we define the following sequence of instances for I M J J J K Almo-actor K Movies “Almodovar” H L HI L c d a
b b . I I I M h fg P O P J J J J JRQ Not-all Almo-actor Almo-actor K Almo-actor HN L HN L H L HN L \ U U e h fg \ Z U U U ^ ] ] i i i i j j Y [ I M
All Almo-actor J JRQ Not-all HN L HN LS HN L h fg \ Z U U
Then, ` . Y [ This example does not involve recursion.
Indeed, stratified datalog without recursion is exactly equivalent to the relational For a given program k , there may be more than one valid stratification. Show that calculus. the semantics does not depend on which stratification is chosen.
Database Theory 2003 75 c GMB/AD Database Theory 2003 76 c GMB/AD l T Example: Game Fixed-Point Calculus
Relational equations such as q no p tu p w z {| m y v x v} x~ rs
Win-pos Win “White” Move Player Position Next Win Position Player Move “White” Move “Black” Win-pos
Black n White have solutions.
White n . . . If
White
¢ £ ¥
. . . ¡ is a relational expression over schema ; ¤ ¦
¥
no occurrence of in ¡ is in the scope of a negation; and q
Win Win y “White”
§ ¢ is an instance over schema . q y y y y Win Move “White” Move “Black” Win § ¨ ¥
There is a minimum extension of satisfying ¡ .
Each Win is definable in stratified datalog but the set of all winning positions for White is not definable.
Database Theory 2003 77 c GMB/AD Database Theory 2003 78 c GMB/AD ©
Fixed-Point Calculus Definition Extending the Algebra ¬ ª « « Predicate Base relation One way of extending relational algebra to express recursive queries like transitive closure ° ² ²
¯ where occurs only positively in ® ± ³ ½ Ã Ä À À Á Á ¾¿  ¾¿  ¬ ª « « ¶ ½ ½ Ã Å Å Ä ² with matching arities
Formula ±µ´ ³ À À À À Á Á ¾¿  ¾¿  ¾ ÂÆ · ² ² ® is to add while loops. ¸ ¹ ² ® È ½Ç Ä º ² ® while change do » begin ° ² ²
For the semantics, interpret ¯ as the least relation satisfying . ± ³ È Ñ ½ ½Ç ½ Ê Ä É Ì Ì Ë Í Ï Ð Â Â ¾ Every query definable in stratified datalog is expressible in the fixed-point calculus. ¾Î end
Database Theory 2003 79 c GMB/AD Database Theory 2003 80 c GMB/AD Ò ¼ While Programs
Every query definable in the fixed-point calculus is definable by a while program of the form. Õ Ô Ö Ó × while change do Database Theory: Lecture 5 begin Õ Ô Ö Ó
Ø Expressivity and Complexity end Dr A. Dawar Ö Ö
where × and Ø are terms of the relational algebra. Ú Ú Ú Ö ÓÙ
Moreover, if Ø is of the form , the converse holds as well. www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 81 c GMB/AD Database Theory 2003 82 c GMB/AD Ü Û
Evaluating a Query Running Time
Given a datalog program Ý , a na¨ıve method of evaluating it on a database Let
instance Þ would proceed as follows: ý û § ü ¡¢ ¦ þÿ £¥¤ ¨ þ á à Þ Ý é êë â ä ä è æ 1. ß . ì íî ã å ç ý û ¤ © the maximum number of distinct variables occurring in any rule in Ý Ý Þ ï ð êñò
2. For every valuation of the variables occurring in over ó , if ì í ý û ¤ the maximum arity of any relation in the head of a rule in ö ä body ìõô í÷ ¤ § ü Then, the na¨ıve evaluation of on takes steps. £ ¨ á û ß ø is a rule such that body, then ü ç We only need to consider distinct evaluations for any given rule. á û ü à ï ö ä â ä ß ß The number of iterations is at most . ì í ã ìõô íîù ì í ¤ ¤
For any program , there is a polynomial such that can be evaluated in time 3. Repeat step (2) until no new tuples are added. § §
on an instance . £ þ þ ¨
Database Theory 2003 83 c GMB/AD Database Theory 2003 84 c GMB/AD ú Running Time NP-complete Query
The upper bound on the running time also holds for stratified datalog, as each Is there a way to travel around Britain so that I visit every railway station stratum is evaluated to completion just once. exactly once?
Since the query expressed above is an NP-complete property of a database One can also derive such bounds for relational algebra. instance, we cannot hope to express it in datalog. For any relational algebra query , there is a polynomial-time algorithm to Indeed, one can prove that it is not definable. evaluate it on all instances.
The degree of the polynomial is bounded by the maximum arity of any Is every polynomial-time computable query expressible in datalog? In stratified datalog? In the fixed-point calculus?
sub-query of .
Many optimization techniques are designed to reduce the arities of intermediate relations.
Database Theory 2003 85 c GMB/AD Database Theory 2003 86 c GMB/AD
Polynomial-time Queries Generic Queries
Give me the set of elements in the active domain which are at even Given database instances over the same schema, a bijection positions when listed lexicographically. This query is computable in polynomial time. !
is an isomorphism from to if for any relation and any tuple : ! !
" "
Unless the lexicographical order is explicitly given as a relation, it is not definable iff # $ # $ # $¥% in any language we have considered. A query mapping & is generic if, whenever is an isomorphism from to , it is The query is not generic & also an isomorphism from & to . # $ # $ # $ ) ) ' ( '( denotes the query obtained from by replacing every constant by
* + .
Every query language we have considered only defines generic queries.
Database Theory 2003 87 c GMB/AD Database Theory 2003 88 c GMB/AD , Order Fixed-Point with Order 0 ./
Let - be a total order relation on . The fixed-point calculus with order can express non-generic queries: 0 (for instance, if ./ is the set of natural numbers) H E H J K
Is there a pair G in the relation with ? I DFE Allow queries in the fixed-point calculus to use the order. Immerman and Vardi showed: 3 1 2 2 Predicate 4 Base relation A query is expressible in fixed-point calculus with order if, and only if, it is 4 7 6 9 9 where occurs only positively in 5 8 : computable in polynomial time. 3 2 2 1 = In the absence of order, there are polynomial-time, queries that cannot Formula 9 with matching arities generic 8<; :
= = be expressed in the fixed-point calculus. > 5 ? 9 9 5 Is the number of tuples in J even? @ A 9 5 B 9 5 Open Question: Is there a language that can express exactly the polynomial-time computable generic queries?
Database Theory 2003 89 c GMB/AD Database Theory 2003 90 c GMB/AD L C
Inexpressivity
The transitive closure query: N T U Q Q R R OFP S OFP S N N T V V U Q Q Q Q R R OFP S OFP S O S¥W Proving Inexpressivity is not definable in the relational calculus, without recursion.
How do we prove this?
We can show that the query:
Is the graph U strongly connected?
is not definable. N N P X X Q If were definable, this query would be R R . OFP S
Database Theory 2003 91 c GMB/AD Database Theory 2003 92 c GMB/AD Y M Quantifier Rank Proving Inexpressibility t q q \ s s u r v [ For a formula Z , we define its Z by: For two instances , we write to denote that for any formula with
quantifier rank ] ^ x | { w v and no free variables: y z \ ` [ Z 1. if Z is atomic then , ] ^ _ ~ q s
v iff v } ~ } _ a \ \ [ [ b Z 2. if Z b then , ] ^ ] ^ _ _ _ f d c e c e Z b b 3. if Z b b or then g i \ h \ \ e c j [ [ [ b b Z . ] ^ ^ ] ^ ] ^ _ ] To prove that strong connectedness is not definable in relational calculus, we | u show that for each there are graphs and u such that _ _ o k \ l \ l m n [ [ b Z b Z 4. if Z b or then ] ^ ] ^ _ t u u u \ [ That is, Z is the depth of nesting of quantifiers in Z . ] ^
but u is strongly connected while u is not.
Database Theory 2003 93 c GMB/AD Database Theory 2003 94 c GMB/AD p
Ehrenfeucht Games Ehrenfeucht Games contd.
We define a game played between two players Spoiler and Duplicator on two If, for any relation and any tuple taken from we have
database instances and over the same schema. ¢ ¥ ¡ ¡ if, and only if, ¦ £ ¤ £ ¤ £ ¤ The game consists of rounds, where in each round : then Duplicator has won the game. Otherwise Spoiler has won. 1. Spoiler chooses an element from the active domain of one of the two
instances: either or .
Duplicator has a strategy for winning the § -round Ehrenfeucht game on 2. Duplicator responds with an element from the active domain of the other ¢
and ¦ if, and only if,
instance: or . ¨ ¢ ¦
At the end of play, we have two sequences and .
Let denote the mapping .
Database Theory 2003 95 c GMB/AD Database Theory 2003 96 c GMB/AD © Cycles « ¬ ª
Take « to be a graph consisting of a single cycle of length and « to be a «
graph consisting of two disconnected cycles, each of length ® . ¯ ª
We can show « « « Database Theory: Lecture 6
This proves that the transitive closure query is not definable in the relational Complex value model calculus. Dr G.M. Bierman
www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 97 c GMB/AD Database Theory 2003 98 c GMB/AD ± °
History Foundations
² First Normal Form: Assumption in original Codd 1970 paper The notions of relation names, attributes and constants are as in the relational model ² But...lots of application areas require more flexible model (even our Cinema ¸ · · » ¹º database from lecture 1!) Sorts ¶ Base relation · Á Á Á ¾ · ¶ ¶ Ç Æ ¿   ² À À Tuples È ÄFÅ First proposal: Makinouchi [1977] ¼ ½¥¾ Ã
¶ Sets ¼ É Ê ² Main activity in 80s:
– Jaeschke & Schell [1982] Ë An element of a sort is called a complex value Ë » – Thomas & Fischer [1983] Useful shorthand: Often eliminate occurrences of ¹º , e.g. – Abiteboul & Bidoit [1984] ÍÎ Ñ Ó Î Ñ Î ÍÎ Ñ Ö Î Ñ Í Ó Î Í Ö ÏÐ ÏÐ Ï Ð ÏÐ Ô Ô Ò Ò Ò Ò Ò Ò – Roth, Korth & Silberschatz [1986/8] Ì Õ Ì ×Ø ×ÚÙ Ì Õ Ì ×Ø ×
Ë Note that a complex value may belong to more than one sort – Schek & Scholl [1986]
Some differences, but basic idea: drop requirement that entries are atomic ³ Other names: NFNF, NF ,´ 1NF, nested relational model, ...
Database Theory 2003 99 c GMB/AD Database Theory 2003 100 c GMB/AD Û µ Extended language ALGcv: Complex value algebra é Ü è è ê
Relational algebra deals with sets of tuples. Thus complex value algebra ALGcv ç Base relation
í Constant values ë ì should deal with sets of complex values. î ï
ç ç Union ë Ü A complex value relation of sort Ý is a finite set of values of sort Ý .
çñð ç Set difference ë Ü á ò â õ ó ö ç Select (constant) Common confusion: A complex value relation of sort à à is a set of ë ÷ ø Þ¥ß ã ô ò õ ß á â ó ó
ç Select (attribute) ë ÷ ø
tuples over attributes . The entire relation is also a complex value of ô ù ò ó ó á ç Select (attribute mem) ú â ë ÷ ø ô
sort . ù ä Þ¥ß ãå ò õ ó ó û ü ç Select (attribute comp) ë ÷ ø ô ù
Ü Note: Sort of a relation need not be a tuple. ý ó û û û ó
ç Projection ë ÷ ø þÿ ÿ
£ ¥ ¢ ¤ ¦ ¤§
¡ ç Powerset ë ÷ ø ¥
§ ¨ © ¤ § ¤
¡ ç ç Tuple ë ÷ ø ó û û û ó þ ÿ ÿ
¥ ¦ ¤§ © ¤ § ¤
ç Set creation ë ÷ ø ¥ § ¨ ¤ ¦ § ¢ ¡ ç Tuple destroy ë ÷ ø ¥ ¦ ¤§ ¤ ¦ § ¢
ç Set destroy ë ÷ ø
Database Theory 2003 101 c GMB/AD Database Theory 2003 102 c GMB/AD æ
Semantics of ALGcv Sort judgements R T U U Q Q S means that S denotes a relation of sort , given database schema . X V [ Z W Y \ ] c _ ab [ ^ ` " " ! # ! #
\ ] X [ V $ $ ! # ! #
\ ] \ ] \ ] \ ] X X X X [ [ [ [ e f e f d d d d -. * * % + ( & / )
, ' h \ ] \ ] X X [ [ g e f e f d d d d -. - . % * * * + ( / & &
, 1 ' 0 -. -. % * * * + + / & &
2 , 1 ' 0 - . - -. % * * * 6 + ( & & / 5
, 1 ' 043 -. - . - - - . < . < * * * 7 + ! ! & 3 3 3 & > >
= = ? , ; 89 9 : B D C E CF G G A
H @ , ------. < < - - - . D G G G G CK F C FI J + + ! ! ! ! ! > > > > >
= = = = = @ ? , ; & 3 3 3 & 89 9 : D E CF J CK F C
N < D L G G C E F A FI +
M @ , ; ? D L C E F A E CF
M O
Database Theory 2003 103 c GMB/AD Database Theory 2003 104 c GMB/AD i P Sort judgements cont. Sort judgements cont. k m o q q q s m w q q q n n uv j t l r r p x k m y n j | ¢ ¢ ¢ z
} l ~ ¤ ¥ £ £ ¡ ¡ { k m o q q q s m w q q q s w q q q n n u v u v j t l r r r r p x m ª ¦§ © «¬ ¦ « k m y n j | ¨ z z ¤ ¥ ¢ ¢ ¢ l ~ £ ¡ ¡ { k m o q q q s m q q q s m q q q n n n n j t t t l r r r r p x ª ¦ § «® ¦ ¯ m k n y j ¨ ° ¤ ¥ z z l
~ { k m o q q q s m q q q s q q q m q q q q q q n n n n j t t t l r r r r r r p p x x m ª ® « ¦ © «¬ ¦ « ª ® « ¦ «® ¦ ¯ ° ¤ ¥ ¤ ¥ k m y n j | z z l ~ { ± ª ¯ « ® « ¦ ¨ ¤ ¥
Database Theory 2003 105 c GMB/AD Database Theory 2003 106 c GMB/AD ²
Generalized algebra Further extensions: Nest and Unnest
As in the relational algebra, there are a number of useful generalizations that can (Simplified forms) be coded up in ALGcv Ê Ê Â Ì Â À Â Â Ç Â ÅËÊ Å Å É ¿ Æ Á Æ ÍÎ Í ÃÄ È Ã ³ · º · ¸
Complex Constants, e.g. ¹ ´µ¶ ´4» ¼½ ¼ Ê Ê Ê À Ð Ð Â Â Â Ì Â Ï ÑÒÓ Å Å Å ¿ É Ô Á Æ Æ Õ Ö ÃÄ Í
³ Renaming Ê Ê Ê ³ À Â Â Ç Â Â Å Å Å ¿ É
Cross Product Á Æ Æ ÃÄ Í Ê À Ð Â Ä Â Ì Â Ç Â Â Ñ Ò Ó Å ÅËÊ ÅËÊ ¿ É Ø × Ô Ú Á Æ Æ ³ Õ Ö Ã È Ã ÍÎ Í Join Ù
³ N-ary set creation
Aside: Clear that ALGcv subsumes relational algebra.
Database Theory 2003 107 c GMB/AD Database Theory 2003 108 c GMB/AD Û ¾ Examples Further examples è è Ý § § ¦ ã æ ì ã æ ã æ â ã æ äå äå äå äå ë ¨ Ü ß
1. Renaming of attributes to © ç Assume ç ç . àáì Þ í éê í Þ àáâ éê ß # ! 1. Union of and some constant tuples: !
" " " " " ö ö # $% ! & & ( ( ' ' " " " ó ò ó ó ò ó îï ô ÷ ù ù õ õ õ ðñ4ò ø ñ4ò øú *,+ )
2. Tuples inë where the first component is containing in the second component 2. Unnesting of on ? þ û ü ü ý ÿ ¡
3. Cartesian product ofß andë ? Ý â ì 4. Join ofß andë on ¤ û î £ ¢ ü ÿ ¡
Database Theory 2003 109 c GMB/AD Database Theory 2003 110 c GMB/AD - ¥
CALCcv: Complex value calculus Extensions to CALCcv 0 / / Term . 1 Constant value Similar to relational calculus
3 Variable 2 G I I P P P K I N N H M M Q Q e.g. Assume O O
Complex constants JLK R 45 3 Tuple element 2 ^ ] d d d _ _ _ X X X ` X Xc Xc S U S b U U b b U V V V Y Y Y W W a a a a T T Z Z\[ 0 / / 6 . 7 Relation atoms
Literal 8 9 G O e.g. h egf i 0 Complex terms . . Equality 2 . . : Membership 2 G l j Complex literals e.g. k . .
; Subset 2 G P P P o m q M Q O O O h h Queries as terms e.g. where p i r s,t e egm nLo i 0 < / / Formula 6 Literal < < < < < = > Logical connectives 2 2@? 2 A 4 < 3 Powerset 2 4 < 3 B Tuple 2 0 < / / 3
Query C D 2 E
Database Theory 2003 111 c GMB/AD Database Theory 2003 112 c GMB/AD u F Examples Transitive closure! ¥ ¥ « §® ¥ « ©ª ©ª ¤ ¨
Recall ¬ . Define Again assume ¦L§ ¯ w } | } } } ~ ~ ~ ~ y v » ¹º
. x z{L| z{L x ¼ Â Å ÄÇÆ Å Å ÄÇÆ Å ½¾ À Á ¾ À ·Ê À Á · °±² ³´µ · É É ¿ ¿ ¿ ¿ ÃgÄ È ÃgÄ È ¶ ¸ y
1. Union of and some constant tuples: ¼ Å ÄÇÆ Å ¾ Á · Ë É ¿ ÃgÄ È g g » ¹º ¼ ¼ Â Ñ · ½Ñ Ð Ñ ° ² ÌÍÎ Ï Ì ³Ð · Ë É ¶ ¸ ¶ ¸
2. Tuples in where the first component is contained in the second component Then transitive closure can be defined as follows:
¼ Â ½ · °±² ³´µ · Ê ° ² ÌÍÎ Ï Ì ³ Ð · · Ë É Ó Ó Ò Ô ¶ ¸ ¶ ¸ Õ
3. Cartesian product ofy and ? Recall that this can not be expressed in the relational calculus/algebra!! w | 4. Join ofy and on
Where did this extra expressive power come from? ¡ ¡ g¢ g
Database Theory 2003 113 c GMB/AD Database Theory 2003 114 c GMB/AD Ö £
Equivalence: From ALGcv to CALCcv Equivalence: From CALCcv to ALGcv Ü Given an ALGcv expression × we can define a CALCcv query × . This proof proceeds in exactly the same way as for the relational calculus to ØgÙ ÚÛ Û Ý relational algebra ã áâ ä æ ß ß à Þ Þ å ç (Again we will only consider domain-independent queries) ã áâ ä ä é æ é à Þè ê Þ ã áâ The only extra complication is in the creation of the active domains ä ä òó ë æ é ñ î à à ì ô ï ð ð Þ å ç Þ Þ Þ å ç í ã áâ ä ä òó ò ó ë æ æ ñ î à à ì ì ô ð ð Þ å ç Þ Þ Þ å ç ö í õ ò ò ò ã áâ ä ä ü ò ó ò ó ò ò ò ó òó æ ÷ ñ ÿ ÿ à ì ú ú ú ì ¢ ¢ ð ý ý ¡ ¡ ý ð Þ ç Þ å þ £ Þ å ç Þ ¤ øù ù û ã áâ ä ¥ ñ ¦ à ÿ ¦ à ÿ à ð ð ð ð Þ Þ Þ Þ Þ Þ ã áâ ä ä ò ò ò ü ò ò ò ò ó ò ò ò ó § æ § ¨ à ÿ ÿ ÿ ÿ ¢ ¢ ¢ ¢ ð ¡ ¡ ð ý © ý ý ¡ ¡ ý ç Þ ì ú ú ú ì å å þ £ Þ øù ù û ñ ñ ñ ÿ ¢ ð ð Þ Þ Þ Þ ç ø ¤ ¤ û ã áâ ä ä ü ò æ § § ¨ ñ à ð ý ð ý © Þ ç Þ å þ £ å ç Þ Þ ¤ ò ò ò
Database Theory 2003 115 c GMB/AD Database Theory 2003 116 c GMB/AD Constructing the active domains Variants of Complex Value Model Define a function that maps a relation of sort to a relation of sort Nested relational model: Set and tuple constructors are required to alternate, (essentially just returns the ground elements). e.g. l i g i n h f f f f ok op op o j d+m j d+k d+e l g i h f f f not ok o o d+k d+e $ "# % ! ( &' ' $ "#
: further restriction of the nested relational model where at least % V-relation model ( 4 7 - &' . 12 56 1 8 ' . , , , 3 9 0 0 : : / )+* $ "#
one component of every tuple sort must be atomic, e.g. A A A % ( - - - ( ( ( * - &' . = ' @ @ . = . . ' , , > > , , < < < < ; ? 0 0 )+* / : 0 0 : : : 0 0 )+* / : 0 )+* / 0 : : : B $ "#
% l i g i n h ( 4 7 . &' . 6 5 1 5 6 1 8 ' f f f f ok op op o j d+m j d+k d+e 9 C D 0 0 : : g i l i n h f f f not ok o p op o j d j d+m d+e F M M M H E H I I N N
Then for a relation schema L L and instance G J K J KO Further assumption: atomic values in a tuple form a key F M M M Q P I N L L GRQ O Work on the complex value model has naturally lead to work on object-relational ] \ ` ` ` F Q Q P b ^ ^ STU V _ _ _ I N X X L a Y W Y W I W Y W Y[Z N and object models (next lecture!). b X X where a are the free elements in . W Y
Database Theory 2003 117 c GMB/AD Database Theory 2003 118 c GMB/AD q c
Manifestos
By the late 80s following from work on Complex Value Models, and from work in the OO community, came suggestions for adding objects to database systems.
1. The OODBMS manifesto Atkinson, Bancilhon, DeWitt, Dittrich, Maier and Database Theory: Lecture 7 Zdonik. 1989 Object model 2. The third-generation database system manifesto Committee for advanced DBMS function. 1990
Dr G.M. Bierman (Date and Darwen have subsequently suggested a “third manifesto”)
www.cl.cam.ac.uk/Teaching/2002/DBaseThy/
Database Theory 2003 119 c GMB/AD Database Theory 2003 120 c GMB/AD s r The OODBMS manifesto Mandatory commandments
v Thou shalt support complex objects t Paper attempts to give a generic definition of an OODBMS
v Thou shalt support object identity
t It describes the main features and characteristics for a system to qualify as an
v Thou shalt encapusulate thine objects OODBMS
v Thou shalt support types or classes
t These were grouped as follows:
v Thine classes or types shalt inherit from their ancestors 1. Mandatory Features that your system must have
v Thou shalt not bind prematurely 2. Optional Features that might make your system better v Thou shalt be computationally complete 3. Open Features where you have free design space v Thou shalt be extensible
These have had enormous influence both on OODBMSs and RDBMSs too! v Thou shalt remember thy data
v Thou shalt manage very large databases
v Thou shalt accept concurrent users
v Thou shalt recover from hardware and software failures
v Thou shalt have a simple way of querying data
Database Theory 2003 121 c GMB/AD Database Theory 2003 122 c GMB/AD w u
ODMG: history The ODMG architecture
x Essentially an industrial response to the manifesto & a number of competing OIF products
x Founded by Rick Cattell (Sun) in 1991 to produce an industry standard for object databases (but “avoid design by committee”!)
Object Model x Published various releases of the standard (1.1–Jan 94, 1.2–Jan 96, 2.0–May OQL ODL 97, 3.0–Jan 2000)
x Links with the OMG, and X3H2 (SQL) standard committee (amongst others) Language Binding
x Now (2001-) refocused on the JDO design effort (Persistent objects for Java)
x Whilst now dormant: still an influential design and most OODBMS products Java C++ Smalltalk are “compliant”
Database Theory 2003 123 c GMB/AD Database Theory 2003 124 c GMB/AD z y ODL: Object Definition Language ODL: Primitive types
{ Language for specifying the structure of database objects A whole host of primitive (non-object) types, e.g. } { Independent of your host programming language float, long, short, unsigned short, unsigned long, ... } { Class-based model boolean, char, string } { (Superset of IDL - CORBA definition language) date, time, timestamp, ... } { A class consists of enumerations and structs (like in C)
– Attributes (cf. fields) } set
Database Theory 2003 125 c GMB/AD Database Theory 2003 126 c GMB/AD ~ |
Example ODL definitions ODL methods Movie class ODL allows declarations of methods (code is written in your favourite host attribute string title; language) attribute integer year; class Movie Film color,BW filmType;
attribute enum attribute string title; ; attribute integer year; attribute enum Film color,BW filmType; Star class float lengthInHours(); attribute string name; set(string) starNames(); attribute Struct Addr ; string street, string city address; ;
Database Theory 2003 127 c GMB/AD Database Theory 2003 128 c GMB/AD ODL relationships Classes
ODL provides binary relationships. As in Java and C , we allow single inheritance between classes, and provide class Movie interfaces with mutiple inheritance. attribute string title; class Cartoon attribute integer year; extends Movie attribute enum Film color,BW filmType; relationship Set
attribute string name; attribute Struct Addr string street, string city address; relationship Set
Database Theory 2003 129 c GMB/AD Database Theory 2003 130 c GMB/AD
Extents Expressivity
ODL provides us with a handle on the collection of all current objects in a class: Consider the relational schema extent Cinema,Title,StartingTime from Lecture 1. In ODL: class Movie ( )
extent Movies attribute string title; class Movie (extent Movies) attribute integer year; { attribute enum Film color,BW filmType; string Cinema; float lengthInHours(); string Title; set(string) starNames(); relationship Set
Database Theory 2003 131 c GMB/AD Database Theory 2003 132 c GMB/AD More types An introduction to OQL
Rather confusingly, ODL also provides collection classes Based on SQL’s SELECT FROM WHERE construct Set
Database Theory 2003 133 c GMB/AD Database Theory 2003 134 c GMB/AD
Some example queries Further features
1. Movies; 1. Casting
2. select s.name from Stars as s; select (Movie)m from MurderMysteries as m 3. select m.year from Movies as m where where m.weapon="AK47"; m.title="Magnolia";
4. select s.name from Movies as m, 2. Object creation m.stars as s select new Studio(owner: s.name) where m.title="Magnolia"; from Stars as s;
5. select t.name from (select m.stars from Movies as m where m.title="Magnolia") as t where t.address.city="Hollywood";
Database Theory 2003 135 c GMB/AD Database Theory 2003 136 c GMB/AD Language binding Java language binding
Intention is to embed the database operations transparently into the host ODMG specify a host of interfaces, e.g. for ODL types, OQL queries, e.g.
programming language public interface OQLQuery public void create(String query) ... public void bind(Object parameter)
... public Object execute ...... ; Database programming now just looks like normal programming
Your vendor supplies the classes that implement them For most languages this is via an API
Java database programming now becomes just like normal programming Runtime language objects persist by backend magic OQLQuery query; query.create("select ... where p.salary>$1"); x=new Double(50000.0); query.bind(x); RichProfs=(DBag)query.execute();
Database Theory 2003 137 c GMB/AD Database Theory 2003 138 c GMB/AD
Formalizing: MiniODMG MiniODL: Design choices
¡ One of the big problems with the ODMG was a lack of formalization 1. No interfaces
¡ So let’s formalize! (Cutting edge [SIGMOD 2003!]) 2. Limited primitive types £ ¡ We’ll define Just integers and booleans
– MiniODL 3. No relationships
– MiniOQL 4. No collection classes (requires generics!)
¡ We’ll assume that methods have been written in our favourite programming language, MiniDFlat.
Database Theory 2003 139 c GMB/AD Database Theory 2003 140 c GMB/AD ¤ ¢ MiniODL: Definitions MiniODL: Example class Employee extends Person Class definition (extent Employees) ¨ § § ¥¦ class C © extends C ª { attribute int EmpID;
extent ¬ « attribute int GrossSalary; ° ° ° ¯ ¦ ¯ ¦ © ± attribute Manager UniqueManager; ® ° ° ° ² ¦ ² ¦ © ³ int NetSalary (int TaxRate); } ´
Attribute definition ¨ § § ¶ ¯ ¦
attribute µ ·
Method definition ¨ § § ° ° ° ² ¦ ¸ º º ¼ ¼ ¹ ¹ µ µ » » µ · «
Database Theory 2003 141 c GMB/AD Database Theory 2003 142 c GMB/AD ¾ ½
MiniOQL: Definition MiniOQL: Definition cont.
Query expression Query expression cont. Õ Á Ô Ô Ö Ö Ö À À Â Ó ¿ integer
true false booleans C Ó cast × Ø Ù Ã Ã Ö Ú Ä
identifier Ó attribute access × Ã Ö Ö Ö Ö È È È Û Ü Æ Þ É Ó Ó Ý Ý Ó ¿ Ç Ç ¿ set method invocation × Ø Ù Ã Å Ê Ô Ö Ö Ö Ô Ú Ú Ë Ì Ü Ü Þ Þ Ó Ý Ý Ó ¿ sop ¿ set ops new C object creation × Ø Ù Ã ß à Ë Ì á Ó Ó Ó ¿ iop ¿ int ops if then else conditional × Ã â Ë Ì Ó Ó Ó ¿ = ¿ int equality select from in where select × Ã Ë Ì
¿ == ¿ object equality à À È È È À Î Î Ë Ë É É ¿ Ç Ç ¿ record Ã Í Ï È Î
¿ record access Ã
size ¿ set size Ð Ñ Ã
Database Theory 2003 143 c GMB/AD Database Theory 2003 144 c GMB/AD ã Ò Type system for MiniOQL Type system
A query typing judgement is written
MiniOQL type ó ö ø ù æ ôõ ÷ ï ï ï å å å å ì ä ä ä ä í í ð ð î î ç set ñ é ê èë+ì è
where ú ó is a map from extent identifiers to their type ú
õ is a map from free identifiers of the query ÷ to their type
Database Theory 2003 145 c GMB/AD Database Theory 2003 146 c GMB/AD û ò
Typing judgements Typing judgements
(Int) (Bool) " " " ! ! ü ÿ ¡ ü ÿ ¡
# # ýþ
ýþ true false bool
int ¢ (Rec) ( ( ( ( ( ( % ! % ! # # # # ' ' ' ' $&% ) $&% ) (Id) (Extent) ü ¡ ÿ ¡ ¦ ¦ ü ¡ ÿ ¡ ¤ ¥ ¤ ¥ £ ýþ
ýþ £ C set C § ¨ ( ( ( , ( ( ! % ! * - + # # ' ' $&% ) (Rec access) ( % ! . . ü ÿ ¡ ¦ ¥
ýþ set § ¨ (Size) ¡ ü ÿ ¦
ýþ size int 0 § ¨ /
C C C (Upcast) 0 0
C C 2 1 ü ÿ ¡ ü ÿ ¡ ¥ ¥ ¥ ¥
ýþ © ýþ © (Set) ¡ ü ÿ ¥
ýþ © £ £ © set § ¨ ! 4 ! ! 4 ! 3 5 ' ' set bool 1 2 (Select)
! 4 3 5 select from in where set 1 2 ü ÿ ¡ ü ÿ ¡ ¥ ¥ ýþ © ýþ © set set § ¨ § ¨ (Union) ¡ ü ÿ ¥ © ý þ © set § ¨
Database Theory 2003 147 c GMB/AD Database Theory 2003 148 c GMB/AD 6 Observable nondeterminism Even worse... class P extends Object class F extends Object class P extends Object ( extent Ps) ( extent Fs) ( extent Ps) { attribute int name; { attribute int name; { attribute int name; }; attribute P pal;}; F loop(); /* This method always loops! */ }; SELECT (if size(Fs)<1 then (new F(name:007,pal:p)).name else p.name) SELECT (if size(Fs)<1 and p.name=001 FROM p in Ps; then p.loop() else new F(name:007,pal:p) Assume we have two P objects (001 and 002) and zero F objects. FROM p in Ps;
What is the result of this query? What is the result of this query?
Database Theory 2003 149 c GMB/AD Database Theory 2003 150 c GMB/AD 8 7
Can it be fixed? Method support
Yes! In two ways: Notice that MiniOQL allows you to invoke methods, which have been written in MiniDFlat. 1. Remove object creation from queries!
: ALL DBMS maufacturers think you need this! 2. Make object creation invisible to current query (!)
: What is the meaning of a query? 3. Use a fancy type-based analysis (GMB) – Dependent on meaning of arbitrary code!
– How does this tie-in with semantic techniques so-far in DBT?
– Queries can loop - which set is this denoted by?
Database Theory 2003 151 c GMB/AD Database Theory 2003 152 c GMB/AD ; 9 Semantics of queries revisited Useful URLs > < Dependent on meaning of arbitrary code! www.odmg.org Fine - we have operational semantics for Java/SML/...! > www.service-architecture.com/articles
< How does this tie-in with semantic techniques in DBT? > java.sun.com/products/jdo Maybe it’s been wrong all along!
< Queries can loop - which set is this denoted by? Either move to domain theory; or use operational semantics!
Database Theory 2003 153 c GMB/AD Database Theory 2003 154 c GMB/AD ? =
Schema-less data
A Underlying assumption so far: Data has a fixed schema declared in advance
A E.g. CREATE TABLE in SQL, ODL in ODMG object databases
Database Theory: Lecture 8 A Obvious question: What happens if we drop this assumption?
Semistructured model and XML A Note: This predates the web-based applications!
A First prototype system: TSIMMIS/Lore at Stanford [1995] Dr G.M. Bierman
A [Aside: Another assumption is persistent data sets. Dropping this yields a data stream model—a HOT area!] www.cl.cam.ac.uk/Teaching/2002/DBaseThy
Database Theory 2003 155 c GMB/AD Database Theory 2003 156 c GMB/AD B @ Is this a good idea? Is this a good idea?
The world is really quite unstructured, for example: To query the database one currently needs to understand the schema
C Data exchange formats (pre-XML) But...lots of users either don’t understand the schema or don’t want to know about the schema
C The World-Wide Web
E Where in the database is there information on “Magnolia”?
C ACeDB - a database used by biologists (from Sanger Centre, Cambridge)
E Are there integers less than -256?
E Which database objects have attribute names beginning with “Name”?
So let’s move to schema-less data model!
Database Theory 2003 157 c GMB/AD Database Theory 2003 158 c GMB/AD F D
Semistructured data SSD and graphs
G Rather than devising a syntax for types, we need a general syntax for values
G First start with simple idea: records {name:"Britney", email:"[email protected]"} person person
G Thus a value can be a collection of label-value pairs, e.g. {name:{first:"Britney",second:"Spears"}, email:"[email protected]"}
G (We’ll drop the standard convention that labels must be unique) {person:{name:"Britney", email:"[email protected]"}, name email name email person:{name:"Avril", email:"[email protected]"}}
"Britney" "[email protected]" "Avril" "[email protected]"
Notice that these values can be represented using edge-labelled trees
Database Theory 2003 159 c GMB/AD Database Theory 2003 160 c GMB/AD I H Relational model Object model
It’s pretty obvious that relational database instances can be represented using Z Objects have identity this model. Z Let’s extend our semi-structured model to label values with oids and allow K R W M S U J
For example Q Q Q Q . An instance could be stored as L NPO T NPV TX oids to be values, e.g. {R:{row{A:"Britney",B:"St Johns", C:"E1(a)"}, {person: &o1{name:"Britney", row{A:"Christina",B:"Trinity",C:"F7"}}, friend: &o2, S:{row{D:"Myleene",E:"Thursdays"}, ex-bf: &o3}, row{D:"Avril",E:"Fridays"} person: &o2{name:"Gavin", row{D:"Britney",E:"Saturdays"}} friend: &o2, } friend: &o3}, person: &o3{name:"Justin", friend: &o2, ex-gf: &o1} }
Database Theory 2003 161 c GMB/AD Database Theory 2003 162 c GMB/AD [ Y
Graphs! Semi-structured data model _ ^ ^ Drawing this example graphically: Data is now a graph! SSD ] ` Value
&o ` Value with identity a
&o Object identity a
person person person _ ^ ^ Value ` b Atomic values friend
c Complex value a &o1 &o2 &o3 _ ^ ^ ^ f f f ^ c d ] d ]
Complex value { e e } friend friend
ex−bf ex−gf friend name name
"Britney" "Gavin" "Justin"
Database Theory 2003 163 c GMB/AD Database Theory 2003 164 c GMB/AD g \ Coding up ODMG schema Some example SSD compliant data class Movie (extent Movies) h {Movies:{Movie:&m01{title:"Magnolia", attribute string title; year:1999, attribute integer year; attribute enum Film color,BW filmType; film:{colour}, h i relationship Set
Thus it is easy to embed ODMG data into SSD model
Database Theory 2003 165 c GMB/AD Database Theory 2003 166 c GMB/AD k j
Expressivity cont. SSD systems l We have seen that we can code up relational and object model instances into n A number of these flourished in the mid to late 1990s. SSD. What about the other direction? n Most notable work at Stanford:
l Not so easy! The point of the SSD model is its flexibility, e.g. – TSIMMIS [1995-] {Friend:{Name:"Britney", Tel:337890}, – Lore [1997-] Friend:{Name:"Emma",
n Other interesting systems include ACeDB: Email:"[email protected]"}, Friend:{Email:"[email protected]", – Built at the Sanger Centre (Cambridge) Tel:898989}, – Originally to store genetic data about C. elegans organism Friend:{Name:"Myleene"} } – Essentially a SSD model (has a schema language, but allows labels to be missing) l SSD model offers a very flexible data model!
l How might we store SSD values in a relational model?
Database Theory 2003 167 c GMB/AD Database Theory 2003 168 c GMB/AD o m XML XML graph model
p Standard for data exchange on web
p Textual format, consisting of elements wrapped with matching tags (mark-up), person person e.g.
r XML is a node-labelled graph
r SSD is an edge-labelled graph
Database Theory 2003 169 c GMB/AD Database Theory 2003 170 c GMB/AD s q
Building graphs in XML
Database Theory 2003 171 c GMB/AD Database Theory 2003 172 c GMB/AD u t Querying XML XPath x v There were a series of proposals, XML-QL, XQL, UnQL, Quilt... http://www.w3.org/TR/xpath x v Lead to current proposal: XQuery Building block for other W3C standards (not just XQuery) e.g. XLink, XPointer etc.
v XQuery consists of two parts
1. XPath: A language for describing nodes in an XML tree
2. XQuery: Query-like language surrounding XPath
Database Theory 2003 173 c GMB/AD Database Theory 2003 174 c GMB/AD y w
XPath Example XML document
z All paths start from a context node
z Result is (approximately) the set of nodes reachable from the context node
Database Theory 2003 175 c GMB/AD Database Theory 2003 176 c GMB/AD | { Example XPath XPath: Functions
Query: /bib/book/year Query: /bib/book/author/text()
Result:
Query: /bib/paper/year ~ text() – matches the text value Result: ~ node() – matches any node
~ name() – returns the name of the tag Query: //author
Result:
Query: /bib//first-name
Result:
Database Theory 2003 177 c GMB/AD Database Theory 2003 178 c GMB/AD }
XPath: Wildcard XPath: Attribute Nodes
Query: //author/* Query: /bib/book/@price
Result: Result: 55
NB: price is an attribute
Database Theory 2003 179 c GMB/AD Database Theory 2003 180 c GMB/AD XPath: Qualifiers XPath summary
Query: /bib/book/author[first-name] bib matches a bib element
/bib matches a bib element under root Query: /bib/book[@price<"60"] bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth Query: /bib/book[author/@age<"25"] //paper matches a paper at any depth Thus we get a primitive form of querying (although result is a set of nodes, not valid XML) paper|book matches a paper or a book @price matches a price attribute
bib/book/@price matches price attribute in book, in bib
Database Theory 2003 181 c GMB/AD Database Theory 2003 182 c GMB/AD
XQuery Simple XQuery
XPath is central to XQuery. In addition, XQuery provides FOR $x IN document("bib.xml")/bib/book WHERE $x/year > 1995 XML glue to turn XPath results back into XML RETURN $x/title
Variables to communicate between XPath and XQuery Returns:
Fancy database features, e.g. joins, aggregates etc.
Fundamental structure:
Database Theory 2003 183 c GMB/AD Database Theory 2003 184 c GMB/AD
Simple XQuery Bindings
FOR $a IN distinct(document("bib.xml") FOR $x IN e – binds $x to each element in the expression e /bib/book[publisher="Morgan Kaufmann"]/author)
LET $x = e’ – binds $x to the entire expression e’ RETURN
distinct is a function that removes duplicates
Database Theory 2003 185 c GMB/AD Database Theory 2003 186 c GMB/AD
Example LET/WHERE query XQuery
WHERE count($b) > 100 Really cool thing: RETURN $p – http://www.w3.org/TR/query-semantics/ – This contains an operational semantics and type system for the core count is an aggregate function that returns the number of elements fragment of XQuery! – Industrial application of 1B Semantics material!
Database Theory 2003 187 c GMB/AD Database Theory 2003 188 c GMB/AD