Datalog
Logical Rules
Recursion
SQL-99 Recursion

Logic As a Query Language
If-then logical rules have been used in many systems.
  Most important today: EII (Enterprise Information Integration).
Nonrecursive rules are equivalent to the core relational algebra.
Recursive rules extend relational algebra, and have been used to add recursion to SQL-99.

A Logical Rule
Our first example of a rule uses the relations Frequents(customer,rest), Likes(customer,soda), and Sells(rest,soda,price).
The rule is a query asking for “happy” customers: those that frequent a rest that serves a soda that they like.

Anatomy of a Rule
  Happy(d) <- Frequents(d,rest) AND Likes(d,soda) AND Sells(rest,soda,p)
Head = “consequent,” a single sub-goal: Happy(d).
Body = “antecedent” = AND of sub-goals.
Read the symbol <- as “if”.

Sub-goals Are Atoms
An atom is a predicate, or relation name, with variables or constants as arguments.
The head is an atom; the body is the AND of one or more atoms.
Convention: predicates begin with a capital, variables begin with lower-case.

Example: Atom
  Sells(rest, soda, p)
The predicate (Sells) = name of a relation.
The arguments (rest, soda, p) are variables.

Interpreting Rules
A variable appearing in the head is called distinguished; otherwise it is nondistinguished.
Rule meaning: the head is true of the distinguished variables if there exist values of the nondistinguished variables that make all sub-goals of the body true.
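The rule meaning above can be sketched in Python as a search for values of the nondistinguished variables. This is only an illustration: the sample tuples and the function name happy are assumptions, not data from the slides.

```python
# Hypothetical sample relations (assumed for illustration only).
frequents = {("ann", "r1"), ("bob", "r2")}
likes = {("ann", "cola"), ("bob", "lemonade")}
sells = {("r1", "cola", 1.50), ("r2", "cola", 1.25)}


def happy(frequents, likes, sells):
    """Happy(d) <- Frequents(d,rest) AND Likes(d,soda) AND Sells(rest,soda,p)."""
    result = set()
    for (d, rest) in frequents:
        for (d2, soda) in likes:
            for (rest2, soda2, p) in sells:
                # Shared variables must take the same value in every sub-goal.
                if d == d2 and rest == rest2 and soda == soda2:
                    # Only the distinguished variable d appears in the head.
                    result.add(d)
    return result


print(happy(frequents, likes, sells))
```

With this data only ann qualifies: bob frequents r2, but r2 does not sell a soda he likes.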
Example: Interpretation
  Happy(d) <- Frequents(d,rest) AND Likes(d,soda) AND Sells(rest,soda,p)
Distinguished variable: d.
Nondistinguished variables: rest, soda, p.
Interpretation: customer d is happy if there exist a rest, a soda, and a price p such that d frequents the rest, likes the soda, and the rest sells the soda at price p.

Arithmetic Sub-goals
In addition to relations as predicates, a predicate for a sub-goal of the body can be an arithmetic comparison.
We write such sub-goals in the usual way, e.g.: x < y.

Example: Arithmetic
A soda is “cheap” if there are at least two rests that sell it for under $1.
Figure out a rule that would determine whether a soda is cheap or not.
  Cheap(soda) <- Sells(rest1,soda,p1) AND Sells(rest2,soda,p2)
                 AND p1 < 1.00 AND p2 < 1.00 AND rest1 <> rest2

Negated Sub-goals
We may put “NOT” in front of a sub-goal, to negate its meaning.
Example: Think of Arc(a,b) as arcs in a graph.
  S(x,y) says the graph is not transitive from x to y; i.e., there is a path of length 2 from x to y, but no arc from x to y.
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)

Algorithms for Applying Rules
Two approaches:
1. Variable-based: Consider all possible assignments to the variables of the body. If the assignment makes the body true, add that tuple for the head to the result.
2. Tuple-based: Consider all assignments of tuples from the non-negated, relational sub-goals. If the body becomes true, add the head’s tuple to the result.
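A minimal Python sketch of the two approaches, applied to the rule S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y). The function names are hypothetical, and the variable-based version assumes it is enough to try values that occur somewhere in Arc, which holds here because every variable appears in a non-negated sub-goal.

```python
from itertools import product

# A small Arc relation, assumed for illustration.
arc = {(1, 2), (2, 3)}


def s_variable_based(arc):
    """Try every assignment of values to the variables x, y, z."""
    # Assumption: candidate values are those appearing in Arc.
    domain = {value for pair in arc for value in pair}
    result = set()
    for x, y, z in product(domain, repeat=3):
        if (x, z) in arc and (z, y) in arc and (x, y) not in arc:
            result.add((x, y))
    return result


def s_tuple_based(arc):
    """Assign Arc tuples to the two non-negated sub-goals."""
    result = set()
    for (x, z1), (z2, y) in product(arc, repeat=2):
        # The tuples must agree on z; then check the negated sub-goal.
        if z1 == z2 and (x, y) not in arc:
            result.add((x, y))
    return result


print(s_variable_based(arc), s_tuple_based(arc))
```

Both functions compute the same relation; the tuple-based one examines 4 pairs of tuples here, while the variable-based one examines 27 assignments.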
Example: Variable-Based
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
Arc(1,2) and Arc(2,3) are the only tuples in the Arc relation.
The only assignments that make the first sub-goal Arc(x,z) true are:
1. x = 1; z = 2
2. x = 2; z = 3

Example: Variable-Based; x=1, z=2
  S(1,y) <- Arc(1,2) AND Arc(2,y) AND NOT Arc(1,y)
3 is the only value of y that makes all three sub-goals true:
  S(1,3) <- Arc(1,2) AND Arc(2,3) AND NOT Arc(1,3)
Makes S(1,3) a tuple of the answer.

Example: Variable-Based; x=2, z=3
  S(2,y) <- Arc(2,3) AND Arc(3,y) AND NOT Arc(2,y)
No value of y makes Arc(3,y) true.
Thus, no contribution to the head tuples; S = {(1,3)}.

Tuple-Based Assignment
Start with the non-negated, relational sub-goals only.
Consider all assignments of tuples to these sub-goals.
  Choose tuples only from the corresponding relations.
If the assigned tuples give a consistent value to all variables and make the other sub-goals true, add the head tuple to the result.

Example: Tuple-Based
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
Only possible values: Arc(1,2), Arc(2,3).
Four possible assignments to the first two sub-goals:

  Arc(x,z)   Arc(z,y)   Outcome
  (1,2)      (1,2)      invalid: z can’t be 2 and 1 simultaneously
  (1,2)      (2,3)      only assignment with a consistent z-value; since it also makes NOT Arc(x,y) true, add S(1,3) to the result
  (2,3)      (1,2)      invalid: z can’t be 3 and 1 simultaneously
  (2,3)      (2,3)      invalid: z can’t be 3 and 2 simultaneously

Datalog Programs
A Datalog program is a collection of rules.
In a program, predicates can be either:
1. EDB = Extensional Database = stored table.
2. IDB = Intensional Database = relation defined by rules.
Never both! No EDB in heads.

Evaluating Datalog Programs
As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of each predicate’s rules have already been evaluated.
If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Example: Datalog Program
Using the following EDB, find all the manufacturers of sodas Joe doesn’t sell:
  Sells(rest, soda, price) and Sodas(name, manf).
  JoeSells(b) <- Sells('Joe''s rest', b, p)
  Answer(m) <- Sodas(b,m) AND NOT JoeSells(b)

Expressive Power of Datalog
Without recursion, Datalog can express all and only the queries of core relational algebra.
  The same as SQL select-from-where, without aggregation and grouping.
But with recursion, Datalog can express more than these languages.
Yet still not Turing-complete.

Recursive Example: Generalized Cousins
EDB: Parent(c,p) = p is a parent of c.
Generalized cousins: people with common ancestors one or more generations back.
Note: We are all cousins according to this definition.

Recursive Example
  Sibling(x,y) <- Parent(x,p) AND Parent(y,p) AND x<>y
  Cousin(x,y) <- Sibling(x,y)
  Cousin(x,y) <- Parent(x,xParent) AND Parent(y,yParent)
                 AND Cousin(xParent,yParent)

Definition of Recursion
Form a dependency graph whose nodes = IDB predicates.
Arc X -> Y if and only if there is a rule with X in the head and Y in the body.
Cycle = recursion; no cycle = no recursion.

Example: Dependency Graphs
Recursive: Cousin -> Cousin, Cousin -> Sibling.
Non-recursive: Answer -> JoeSells.

Evaluating Recursive Rules
The following works when there is no negation:
1. Start by assuming all IDB relations are empty.
2. Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB.
3. End when no change to IDB.
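The three steps above can be sketched in Python for the Sibling and Cousin rules. This is an illustrative sketch, not a general Datalog engine: the Parent tuples and the names apply_rules and naive_eval are assumptions.

```python
# Hypothetical EDB: Parent(c, p) means p is a parent of c.
parent = {("b", "a"), ("c", "a"), ("f", "b"), ("g", "c")}


def apply_rules(parent, sibling, cousin):
    """One round: evaluate every rule using the EDB and the previous IDB."""
    # Sibling(x,y) <- Parent(x,p) AND Parent(y,p) AND x<>y
    new_sibling = {(x, y) for (x, p) in parent for (y, q) in parent
                   if p == q and x != y}
    # Cousin(x,y) <- Sibling(x,y)   (uses the previous round's Sibling)
    new_cousin = set(sibling)
    # Cousin(x,y) <- Parent(x,xParent) AND Parent(y,yParent)
    #                AND Cousin(xParent,yParent)
    new_cousin |= {(x, y) for (x, xp) in parent for (y, yp) in parent
                   if (xp, yp) in cousin}
    return new_sibling, new_cousin


def naive_eval(parent):
    sibling, cousin = set(), set()          # 1. all IDB relations empty
    while True:
        new_sibling, new_cousin = apply_rules(parent, sibling, cousin)  # 2.
        if new_sibling == sibling and new_cousin == cousin:
            return sibling, cousin          # 3. no change to IDB
        sibling, cousin = new_sibling, new_cousin


sibling, cousin = naive_eval(parent)
print(sorted(sibling))
print(sorted(cousin))
```

On this data, b and c become siblings in the first round, cousins in the second, and their children f and g become cousins in the third; the fourth round changes nothing, so evaluation stops.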
The “Naïve” Evaluation Algorithm
Start: IDB = ∅.
Apply rules to IDB, EDB.
Change to IDB? If yes, apply the rules again; if no, done.

Example: Evaluation of Cousin
Remember the rules:
  Sibling(x,y) <- Parent(x,p) AND Parent(y,p) AND x<>y
  Cousin(x,y) <- Sibling(x,y)
  Cousin(x,y) <- Parent(x,xParent) AND Parent(y,yParent)
                 AND Cousin(xParent,yParent)

Semi-naive Evaluation
Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round.
Saves work; lets us avoid rediscovering most known facts.
  A fact could still be derived in a second way.

Example: Evaluation of Cousin
We’ll proceed in rounds to infer Sibling facts (red) and Cousin facts (green).

Parent Data: Parent Above Child
In the parent data, an edge goes downward from a parent to a child.
[Figure: family tree with nodes a, d; b, c, e; f, g, h; j, k, i, listed by row from top to bottom. The edges are not recoverable from the extracted text.]
Exercises:
1. List some of the parent-child relationships.
2. What is contained in the Sibling and Cousin data?
3. What do you expect after the first round?

Round 1
Sibling and Cousin are presumed empty.

Round 2
Sibling facts remain unchanged because Sibling is not recursive.