Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches

Total Page:16

File Type:pdf, Size:1020Kb

Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches Daniel Ting Facebook 1730 Minor Ave Seattle, WA [email protected] ABSTRACT operations on sketches and derive cardinality estimates that Estimating the cardinality of unions and intersections of sets are optimal or nearly optimal. is a problem of interest in OLAP. Large data applications This problem of estimating the number of distinct ele- often require the use of approximate methods based on small ments in a dataset in a memory efficient manner appears sketches of the data. We give new estimators for the cardi- in a wide range of applications. For example, in data ana- nality of unions and intersection and show they approximate lytics and OLAP, a web company may count distinct users an optimal estimation procedure. These estimators enable accessing a service [21]. The summarizations are used in the improved accuracy of the streaming MinCount sketch networking to detect denial of service attacks by counting to be exploited in distributed settings. Both theoretical network flows [15], and in databases to optimize query plans and empirical results demonstrate substantial improvements [22]. Other applications include graph analysis where they over existing methods. are used to estimate the diameter of a graph [9], [5]. The extended problem for unions has particular impor- tance in OLAP and distributed settings. The streaming CCS Concepts cardinality estimator given by [23], [10] is optimal in the •Mathematics of computing ! Probabilistic algo- sense that no unbiased estimator has lower asymptotic vari- rithms; •Theory of computation ! Sketching and ance. This is much stronger than typical space-complexity sampling; •Computing methodologies ! Distributed results in the theory literature [2], [19] that just guarantee algorithms; an optimal rate. It guarantees optimality of the constant in front of the rate. This optimality is evident in practice as the estimators require half the space of existing meth- Keywords ods to achieve the same error on MinCount sketches. The cardinality estimation; data sketching; randomized algorithms streaming estimator, however, cannot be applied directly in distributed settings. In a map-reduce setting, the map- pers computing efficient streamed sketch summaries need a 1. INTRODUCTION method to combine them at a reducer. A union operation Consider a dataset D. The basic approximate distinct on the sketches would allow the efficiency gains to be trans- count problem is to construct a memory efficient summa- fered to distributed settings and turn the streamed sketch rization S of D and to estimate the cardinality of the set into a mergeable summary [1]. These efficiency gains can of unique items A using just the sketch S. Under this ba- be tremendous as empirical results show efficient cardinal- sic formulation, a sketch answers a cardinality question that ity estimation for unions can reduce variance and the space must be specified before computing the sketch. We consider requirements by an order of magnitude when many sketches the extended problem of estimating the cardinality of set are merged. unions, jA1 [A2j, and intersections, jA1 \A2j, using multiple In OLAP settings, one is often interested in cardinality sketches. This has two important consequences. First, com- estimates on multi-dimensional data, for example counting binatorially many cardinality questions can be accurately the number of users accessing a web service by geographic answered by sketches. Second, the significant improvements location, time window, and other factors [21]. This results in accuracy using the HIPS or optimal martingale estimator in exponentially many cardinalities to estimate. Unions and of [10] and [23] can be realized in distributed settings. More intersections allow a limited number of sketches to answer a specifically, we wish to define union [~ and intersection \~ multitude of cardinality questions. Cardinality estimates for intersections are not as well stud- Permission to make digital or hard copies of all or part of this work for personal or ied as for unions. While most sketches for approximate dis- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation tinct counting have a natural but inefficient union operation, on the first page. Copyrights for components of this work owned by others than the they do not have one for intersections. Intersection car- author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or dinality estimates are often computed using the inclusion- republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. exclusion principle or by using Jaccard similarity [13], [4] KDD ’16, August 13 - 17, 2016, San Francisco, CA, USA when an estimator is available. These methods still require c 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. first accurately estimating the cardinality of the union or ISBN 978-1-4503-4232-2/16/08. $15.00 the Jaccard similarity of a set with the union. DOI: http://dx.doi.org/10.1145/2939672.2939772 Our contributions are as follows. We introduce two tech- [12] uses over twice the memory as that used in [8]. The dis- niques for computing unions and intersections on approx- advantage of using the optimal streaming methods described imate distinct counting sketches: pseudo-likelihood based in [23] and [10] is that they require the data to arrive in a methods and re-weighting component estimators. These are single stream. They are not immediately applicable in dis- applied to Streaming MinCount sketches in this paper but tributed settings or in OLAP settings where the sketches are may be applied to other sketches as well. The resulting es- pre-aggregations that need to be further merged. timators demonstrate state of the art performance on both real and synthetic datasets. The pseudo-likelihood based 3. NAIVE ESTIMATION MinCount method is conjectured to be asymptotically ef- ficient as it is nearly the same as the asymptotically ef- We first consider basic estimation procedures for unions ficient maximum likelihood estimator. This establishes a and intersections on distinct counting sketches. Many sketches nearly ideal baseline for comparison. Re-weighted compo- used in approximate distinct counting have a natural union nent estimators are much simpler to implement and gener- operation. For example, the LPCA sketch is a bitmap and alize, and they are empirically shown to be as efficient as is equivalent to a Bloom filter with the number of hashes the pseudo-likelihood based estimators. We also derive vari- k = 1. Taking the union of LPCA sketches simply takes the ance estimates which allow confidence intervals to be given bitwise OR of the sketches. The resulting sketch is identi- for the cardinality estimates. In the special case where the cal to the sketch computed on the union of the original sets. streams are different permutations of the same set, we show This property holds for other sketches such as HyperLogLog that merging the streaming estimates yields a more accu- [16] and MinCount [18] as well. In other words, there is an ~ ~ rate estimate than the estimator on a single stream. Thus, operation [ such that S(A1)[S(A2) = S(A1 [ A2) where S unlike existing methods, the union operation on streaming is the function that generates a sketch from a streamed set. sketches exploits information about the order of elements in For notational convenience, we will denote Si = S(Ai). each stream. The variance after averaging over all possible Unlike the union, typically no natural intersection oper- orderings is shown to be 2=3 the variance of the streaming ation exists for approximate distinct counting sketches. A MinCount estimator. In addition to improved estimation, simple method to obtain cardinality estimates for the inter- the resulting methods yield mergeable summaries [1] under section is to use union estimates and the inclusion-exclusion both union and intersection operations unlike existing meth- principle, jA1 \ A2j = jA1j + jA2j − jA1 [ A2j. In the case ^ ods. where there is an estimate J(S1;S2) for the Jaccard similar- ity, two existing intersection cardinality estimator are given 2. EXISTING BASIC METHODS by [4] and [13]. Throughout, we will use a hat to denote estimated quantities, with N^(S) denoting a cardinality esti- The basic approximate distinct counting problem has been mate using sketch S. The naive and Jaccard based estimates well studied in the literature beginning with the seminal pa- of the cardinality of the union and intersection are given by per by Flajolet and Martin [17]. Methods include Linear Probabilistic Counting (LPCA) [25], HyperLogLog [16], the N^naive(S1[~S2) = N^(S(A1 [ A2)) (1) Multiresolution Bitmap [15], the S-bitmap [7], α-stable dis- N^ (S \~S ) = N^(S ) + N^(S ) − N^(S(A [ A )) tribution methods [12], and MinCount [18]. Kane, et al naive 1 2 1 2 1 2 [19] also gave a method that is optimal in terms of space- N^Jaccard;1(S1\~S2) = J^(S1;S2)N^(S1[~S2) complexity. J^(S ;S ) As described in [23], all these methods generate a stochas- N^ (S \~S ) = 1 2 N^(S ) + N^(S ) Jaccard;2 1 2 ^ 1 2 tic process called the area process that is governed solely by 1 + J(S1;S2) the number of distinct elements encountered. A sketch S These estimation strategies turn out to be suboptimal. can be mapped to a set R(S) ⊂ (0; 1) called the remaining For the intersection estimators, one reason is that the error area. Given a random hash, each element in a stream is is often roughly proportional to the size of the union or the hashed to some value U. If U is still in the remaining area, larger set, while a good procedure should give error that is U 2 R(S), the sketch is updated by cutting out a portion bounded by the size of the smaller set.
Recommended publications
  • Fast and Private Computation of Cardinality of Set Intersection and Union
    Fast and Private Computation of Cardinality of Set Intersection and Union Emiliano De Cristofaroy, Paolo Gastiz, and Gene Tsudikz yPARC zUC Irvine Abstract. With massive amounts of electronic information stored, trans- ferred, and shared every day, legitimate needs for sensitive information must be reconciled with natural privacy concerns. This motivates var- ious cryptographic techniques for privacy-preserving information shar- ing, such as Private Set Intersection (PSI) and Private Set Union (PSU). Such techniques involve two parties { client and server { each with a private input set. At the end, client learns the intersection (or union) of the two respective sets, while server learns nothing. However, if desired functionality is private computation of cardinality rather than contents, of set intersection, PSI and PSU protocols are not appropriate, as they yield too much information. In this paper, we design an efficient crypto- graphic protocol for Private Set Intersection Cardinality (PSI-CA) that yields only the size of set intersection, with security in the presence of both semi-honest and malicious adversaries. To the best of our knowl- edge, it is the first protocol that achieves complexities linear in the size of input sets. We then show how the same protocol can be used to pri- vately compute set union cardinality. We also design an extension that supports authorization of client input. 1 Introduction Proliferation of, and growing reliance on, electronic information trigger the increase in the amount of sensitive data stored and processed in cyberspace. Consequently, there is a strong need for efficient cryptographic techniques that allow sharing information with privacy. Among these, Private Set Intersection (PSI) [11, 24, 14, 21, 15, 22, 9, 8, 18], and Private Set Union (PSU) [24, 15, 17, 12, 32] have attracted a lot of attention from the research community.
    [Show full text]
  • 1 Measurable Sets
    Math 4351, Fall 2018 Chapter 11 in Goldberg 1 Measurable Sets Our goal is to define what is meant by a measurable set E ⊆ [a; b] ⊂ R and a measurable function f :[a; b] ! R. We defined the length of an open set and a closed set, denoted as jGj and jF j, e.g., j[a; b]j = b − a: We will use another notation for complement and the notation in the Goldberg text. Let Ec = [a; b] n E = [a; b] − E. Also, E1 n E2 = E1 − E2: Definitions: Let E ⊆ [a; b]. Outer measure of a set E: m(E) = inffjGj : for all G open and E ⊆ Gg. Inner measure of a set E: m(E) = supfjF j : for all F closed and F ⊆ Eg: 0 ≤ m(E) ≤ m(E). A set E is a measurable set if m(E) = m(E) and the measure of E is denoted as m(E). The symmetric difference of two sets E1 and E2 is defined as E1∆E2 = (E1 − E2) [ (E2 − E1): A set is called an Fσ set (F-sigma set) if it is a union of a countable number of closed sets. A set is called a Gδ set (G-delta set) if it is a countable intersection of open sets. Properties of Measurable Sets on [a; b]: 1. If E1 and E2 are subsets of [a; b] and E1 ⊆ E2, then m(E1) ≤ m(E2) and m(E1) ≤ m(E2). In addition, if E1 and E2 are measurable subsets of [a; b] and E1 ⊆ E2, then m(E1) ≤ m(E2).
    [Show full text]
  • Julia's Efficient Algorithm for Subtyping Unions and Covariant
    Julia’s Efficient Algorithm for Subtyping Unions and Covariant Tuples Benjamin Chung Northeastern University, Boston, MA, USA [email protected] Francesco Zappa Nardelli Inria of Paris, Paris, France [email protected] Jan Vitek Northeastern University, Boston, MA, USA Czech Technical University in Prague, Czech Republic [email protected] Abstract The Julia programming language supports multiple dispatch and provides a rich type annotation language to specify method applicability. When multiple methods are applicable for a given call, Julia relies on subtyping between method signatures to pick the correct method to invoke. Julia’s subtyping algorithm is surprisingly complex, and determining whether it is correct remains an open question. In this paper, we focus on one piece of this problem: the interaction between union types and covariant tuples. Previous work normalized unions inside tuples to disjunctive normal form. However, this strategy has two drawbacks: complex type signatures induce space explosion, and interference between normalization and other features of Julia’s type system. In this paper, we describe the algorithm that Julia uses to compute subtyping between tuples and unions – an algorithm that is immune to space explosion and plays well with other features of the language. We prove this algorithm correct and complete against a semantic-subtyping denotational model in Coq. 2012 ACM Subject Classification Theory of computation → Type theory Keywords and phrases Type systems, Subtyping, Union types Digital Object Identifier 10.4230/LIPIcs.ECOOP.2019.24 Category Pearl Supplement Material ECOOP 2019 Artifact Evaluation approved artifact available at https://dx.doi.org/10.4230/DARTS.5.2.8 Acknowledgements The authors thank Jiahao Chen for starting us down the path of understanding Julia, and Jeff Bezanson for coming up with Julia’s subtyping algorithm.
    [Show full text]
  • Naïve Set Theory Basic Definitions Naïve Set Theory Is the Non-Axiomatic Treatment of Set Theory
    Naïve Set Theory Basic Definitions Naïve set theory is the non-axiomatic treatment of set theory. In the axiomatic treatment, which we will only allude to at times, a set is an undefined term. For us however, a set will be thought of as a collection of some (possibly none) objects. These objects are called the members (or elements) of the set. We use the symbol "∈" to indicate membership in a set. Thus, if A is a set and x is one of its members, we write x ∈ A and say "x is an element of A" or "x is in A" or "x is a member of A". Note that "∈" is not the same as the Greek letter "ε" epsilon. Basic Definitions Sets can be described notationally in many ways, but always using the set brackets "{" and "}". If possible, one can just list the elements of the set: A = {1,3, oranges, lions, an old wad of gum} or give an indication of the elements: ℕ = {1,2,3, ... } ℤ = {..., -2,-1,0,1,2, ...} or (most frequently in mathematics) using set-builder notation: S = {x ∈ ℝ | 1 < x ≤ 7 } or {x ∈ ℝ : 1 < x ≤ 7 } which is read as "S is the set of real numbers x, such that x is greater than 1 and less than or equal to 7". In some areas of mathematics sets may be denoted by special notations. For instance, in analysis S would be written (1,7]. Basic Definitions Note that sets do not contain repeated elements. An element is either in or not in a set, never "in the set 5 times" for instance.
    [Show full text]
  • Fast and Private Computation of Cardinality of Set Intersection and Union*
    Fast and Private Computation of Cardinality of Set Intersection and Union* Emiliano De Cristofaroy, Paolo Gastiz, Gene Tsudikz yPARC zUC Irvine Abstract In many everyday scenarios, sensitive information must be shared between parties without com- plete mutual trust. Private set operations are particularly useful to enable sharing information with privacy, as they allow two or more parties to jointly compute operations on their sets (e.g., intersec- tion, union, etc.), such that only the minimum required amount of information is disclosed. In the last few years, the research community has proposed a number of secure and efficient techniques for Private Set Intersection (PSI), however, somewhat less explored is the problem of computing the magnitude, rather than the contents, of the intersection – we denote this problem as Private Set In- tersection Cardinality (PSI-CA). This paper explores a few PSI-CA variations and constructs several protocols that are more efficient than the state-of-the-art. 1 Introduction Proliferation of, and growing reliance on, electronic information generate an increasing amount of sensitive data stored and processed in the cyberspace. Consequently, there is a compelling need for efficient cryptographic techniques that allow sharing information while protecting privacy. Among these, Private Set Intersection (PSI) [15, 29, 18, 26, 19, 27, 12, 11, 22], and Private Set Union (PSU) [29, 19, 21, 16, 36] have recently attracted a lot of attention from the research community. In particular, PSI allows one party (client) to compute the intersection of its set with that of another party (server), such that: (i) server does not learn client input, and (ii) client learns no information about server input, beyond the intersection.
    [Show full text]
  • Worksheet: Cardinality, Countable and Uncountable Sets
    Math 347 Worksheet: Cardinality A.J. Hildebrand Worksheet: Cardinality, Countable and Uncountable Sets • Key Tool: Bijections. • Definition: Let A and B be sets. A bijection from A to B is a function f : A ! B that is both injective and surjective. • Properties of bijections: ∗ Compositions: The composition of bijections is a bijection. ∗ Inverse functions: The inverse function of a bijection is a bijection. ∗ Symmetry: The \bijection" relation is symmetric: If there is a bijection f from A to B, then there is also a bijection g from B to A, given by the inverse of f. • Key Definitions. • Cardinality: Two sets A and B are said to have the same cardinality if there exists a bijection from A to B. • Finite sets: A set is called finite if it is empty or has the same cardinality as the set f1; 2; : : : ; ng for some n 2 N; it is called infinite otherwise. • Countable sets: A set A is called countable (or countably infinite) if it has the same cardinality as N, i.e., if there exists a bijection between A and N. Equivalently, a set A is countable if it can be enumerated in a sequence, i.e., if all of its elements can be listed as an infinite sequence a1; a2;::: . NOTE: The above definition of \countable" is the one given in the text, and it is reserved for infinite sets. Thus finite sets are not countable according to this definition. • Uncountable sets: A set is called uncountable if it is infinite and not countable. • Two famous results with memorable proofs.
    [Show full text]
  • Some Set Theory We Should Know Cardinality and Cardinal Numbers
    SOME SET THEORY WE SHOULD KNOW CARDINALITY AND CARDINAL NUMBERS De¯nition. Two sets A and B are said to have the same cardinality, and we write jAj = jBj, if there exists a one-to-one onto function f : A ! B. We also say jAj · jBj if there exists a one-to-one (but not necessarily onto) function f : A ! B. Then the SchrÄoder-BernsteinTheorem says: jAj · jBj and jBj · jAj implies jAj = jBj: SchrÄoder-BernsteinTheorem. If there are one-to-one maps f : A ! B and g : B ! A, then jAj = jBj. A set is called countable if it is either ¯nite or has the same cardinality as the set N of positive integers. Theorem ST1. (a) A countable union of countable sets is countable; (b) If A1;A2; :::; An are countable, so is ¦i·nAi; (c) If A is countable, so is the set of all ¯nite subsets of A, as well as the set of all ¯nite sequences of elements of A; (d) The set Q of all rational numbers is countable. Theorem ST2. The following sets have the same cardinality as the set R of real numbers: (a) The set P(N) of all subsets of the natural numbers N; (b) The set of all functions f : N ! f0; 1g; (c) The set of all in¯nite sequences of 0's and 1's; (d) The set of all in¯nite sequences of real numbers. The cardinality of N (and any countable in¯nite set) is denoted by @0. @1 denotes the next in¯nite cardinal, @2 the next, etc.
    [Show full text]
  • 06. Naive Set Theory I
    Topics 06. Naive Set Theory I. Sets and Paradoxes of the Infinitely Big II. Naive Set Theory I. Sets and Paradoxes of the Infinitely Big III. Cantor and Diagonal Arguments Recall: Paradox of the Even Numbers Claim: There are just as many even natural numbers as natural numbers natural numbers = non-negative whole numbers (0, 1, 2, ..) Proof: { 0 , 1 , 2 , 3 , 4 , ............ , n , ..........} { 0 , 2 , 4 , 6 , 8 , ............ , 2n , ..........} In general: 2 criteria for comparing sizes of sets: (1) Correlation criterion: Can members of one set be paired with members of the other? (2) Subset criterion: Do members of one set belong to the other? Can now say: (a) There are as many even naturals as naturals in the correlation sense. (b) There are less even naturals than naturals in the subset sense. To talk about the infinitely Moral: notion of sets big, just need to be clear makes this clear about what’s meant by size Bolzano (1781-1848) Promoted idea that notion of infinity was fundamentally set-theoretic: To say something is infinite is “God is infinite in knowledge” just to say there is some set means with infinite members “The set of truths known by God has infinitely many members” SO: Are there infinite sets? “a many thought of as a one” -Cantor Bolzano: Claim: The set of truths is infinite. Proof: Let p1 be a truth (ex: “Plato was Greek”) Let p2 be the truth “p1 is a truth”. Let p3 be the truth “p3 is a truth”. In general, let pn be the truth “pn-1 is a truth”, for any natural number n.
    [Show full text]
  • Session 5 – Main Theme
    Database Systems Session 5 – Main Theme Relational Algebra, Relational Calculus, and SQL Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Presentation material partially based on textbook slides Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant Navathe Slides copyright © 2011 and on slides produced by Zvi Kedem copyight © 2014 1 Agenda 1 Session Overview 2 Relational Algebra and Relational Calculus 3 Relational Algebra Using SQL Syntax 5 Summary and Conclusion 2 Session Agenda . Session Overview . Relational Algebra and Relational Calculus . Relational Algebra Using SQL Syntax . Summary & Conclusion 3 What is the class about? . Course description and syllabus: » http://www.nyu.edu/classes/jcf/CSCI-GA.2433-001 » http://cs.nyu.edu/courses/fall11/CSCI-GA.2433-001/ . Textbooks: » Fundamentals of Database Systems (6th Edition) Ramez Elmasri and Shamkant Navathe Addition Wesley ISBN-10: 0-1360-8620-9, ISBN-13: 978-0136086208 6th Edition (04/10) 4 Icons / Metaphors Information Common Realization Knowledge/Competency Pattern Governance Alignment Solution Approach 55 Agenda 1 Session Overview 2 Relational Algebra and Relational Calculus 3 Relational Algebra Using SQL Syntax 5 Summary and Conclusion 6 Agenda . Unary Relational Operations: SELECT and PROJECT . Relational Algebra Operations from Set Theory . Binary Relational Operations: JOIN and DIVISION . Additional Relational Operations . Examples of Queries in Relational Algebra . The Tuple Relational Calculus . The Domain Relational Calculus 7 The Relational Algebra and Relational Calculus . Relational algebra . Basic set of operations for the relational model . Relational algebra expression . Sequence of relational algebra operations . Relational calculus . Higher-level declarative language for specifying relational queries 8 Unary Relational Operations: SELECT and PROJECT (1/3) .
    [Show full text]
  • Understanding the New-Keynesian Model When Monetary Policy Switches Regimes
    NBER WORKING PAPER SERIES UNDERSTANDING THE NEW-KEYNESIAN MODEL WHEN MONETARY POLICY SWITCHES REGIMES Roger E.A. Farmer Daniel F. Waggoner Tao Zha Working Paper 12965 http://www.nber.org/papers/w12965 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 March 2007 We thank Zheng Liu and Richard Rogerson for helpful discussions. The views expressed herein do not necessarily reflect those of the Federal Reserve Bank of Atlanta nor those of the Federal Reserve System. Farmer acknowledges the support of NSF grant SBR 0418174. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research. © 2007 by Roger E.A. Farmer, Daniel F. Waggoner, and Tao Zha. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Understanding the New-Keynesian Model when Monetary Policy Switches Regimes Roger E.A. Farmer, Daniel F. Waggoner, and Tao Zha NBER Working Paper No. 12965 March 2007 JEL No. E3,E5,E52 ABSTRACT This paper studies a New-Keynesian model in which monetary policy may switch between regimes. We derive sufficient conditions for indeterminacy that are easy to implement and we show that the necessary and sufficient condition for determinacy, provided by Davig and Leeper, is necessary but not sufficient. More importantly, we use a two-regime model to show that indeterminacy in a passive regime may spill over to an active regime, no matter how active the latter regime is.
    [Show full text]
  • Relational Algebra & Relational Calculus
    Relational Algebra & Relational Calculus Lecture 4 Kathleen Durant Northeastern University 1 Relational Query Languages • Query languages: Allow manipulation and retrieval of data from a database. • Relational model supports simple, powerful QLs: • Strong formal foundation based on logic. • Allows for optimization. • Query Languages != programming languages • QLs not expected to be “Turing complete”. • QLs not intended to be used for complex calculations. • QLs support easy, efficient access to large data sets. 2 Relational Query Languages • Two mathematical Query Languages form the basis for “real” query languages (e.g. SQL), and for implementation: • Relational Algebra: More operational, very useful for representing execution plans. • Basis for SEQUEL • Relational Calculus: Let’s users describe WHAT they want, rather than HOW to compute it. (Non-operational, declarative.) • Basis for QUEL 3 4 Cartesian Product Example • A = {small, medium, large} • B = {shirt, pants} A X B Shirt Pants Small (Small, Shirt) (Small, Pants) Medium (Medium, Shirt) (Medium, Pants) Large (Large, Shirt) (Large, Pants) • A x B = {(small, shirt), (small, pants), (medium, shirt), (medium, pants), (large, shirt), (large, pants)} • Set notation 5 Example: Cartesian Product • What is the Cartesian Product of AxB ? • A = {perl, ruby, java} • B = {necklace, ring, bracelet} • What is BxA? A x B Necklace Ring Bracelet Perl (Perl,Necklace) (Perl, Ring) (Perl,Bracelet) Ruby (Ruby, Necklace) (Ruby,Ring) (Ruby,Bracelet) Java (Java, Necklace) (Java, Ring) (Java, Bracelet) 6 Mathematical Foundations: Relations • The domain of a variable is the set of its possible values • A relation on a set of variables is a subset of the Cartesian product of the domains of the variables. • Example: let x and y be variables that both have the set of non- negative integers as their domain • {(2,5),(3,10),(13,2),(6,10)} is one relation on (x, y) • A table is a subset of the Cartesian product of the domains of the attributes.
    [Show full text]
  • Set Notation and Concepts
    Appendix Set Notation and Concepts “In mathematics you don’t understand things. You just get used to them.” John von Neumann (1903–1957) This appendix is primarily a brief run-through of basic concepts from set theory, but it also in Sect. A.4 mentions set equations, which are not always covered when introducing set theory. A.1 Basic Concepts and Notation A set is a collection of items. You can write a set by listing its elements (the items it contains) inside curly braces. For example, the set that contains the numbers 1, 2 and 3 can be written as {1, 2, 3}. The order of elements do not matter in a set, so the same set can be written as {2, 1, 3}, {2, 3, 1} or using any permutation of the elements. The number of occurrences also does not matter, so we could also write the set as {2, 1, 2, 3, 1, 1} or in an infinity of other ways. All of these describe the same set. We will normally write sets without repetition, but the fact that repetitions do not matter is important to understand the operations on sets. We will typically use uppercase letters to denote sets and lowercase letters to denote elements in a set, so we could write M ={2, 1, 3} and x = 2 as an element of M. The empty set can be written either as an empty list of elements ({})orusing the special symbol ∅. The latter is more common in mathematical texts. A.1.1 Operations and Predicates We will often need to check if an element belongs to a set or select an element from a set.
    [Show full text]