Subsumption and Complementation As Data Fusion Operators

Subsumption and Complementation as Data Fusion Operators ∗ y Jens Bleiholder Sascha Szott Melanie Herschel Hasso-Plattner-Institut Konrad-Zuse-Zentrum Universität Tübingen Potsdam, Germany für Informationstechnik Berlin Tübingen, Germany [email protected] Berlin, Germany melanie.herschel@uni- potsdam.de [email protected] tuebingen.de Frank Kaufer Felix Naumann Hasso-Plattner-Institut Hasso-Plattner-Institut Potsdam, Germany Potsdam, Germany [email protected] [email protected] potsdam.de potsdam.de ABSTRACT Categories and Subject Descriptors The goal of data fusion is to combine several representations of one H.2.5 [Database Management]: Heterogeneous Databases real world object into a single, consistent representation, e.g., in data integration. A very popular operator to perform data fusion is the minimum union operator. It is defined as the outer union and the General Terms subsequent removal of subsumed tuples. Minimum union is used in Algorithms other applications as well, for instance in database query optimization to rewrite outer join queries, in the semantic web community in Keywords implementing SPARQL’s OPTIONAL operator, etc. Despite its wide applicability, there are only few efficient implementations, and un- minimum union, complement union, data integration, data quality til now, minimum union is not a relational database primitive. This paper fills this gap as we present implementations of sub- 1. INTRODUCTION sumption that serve as a building block for minimum union. Fur- thermore, we consider this operator as database primitive and show Data integration is the process of providing users of an integrated how to perform optimization of query plans in presence of sub- information system with a unified view of several data sources. sumption and minimum union through rule-based plan transforma- Three main challenges in data integration are (1) to transform the tions. Experiments on both artificial and real world data show that data stored in the sources to the target schema (schema match- our algorithms outperform existing algorithms used for subsump- ing [22] and mapping [14]), (2) to match object representations tion in terms of runtime and they scale to large volumes of data. from different sources representing the same real-world object (du- In the context of data integration, we observe that performing plicate detection [8]), and (3) to combine several existing repre- data fusion calls for more than subsumption and minimum union. sentations of one real world object into a single consistent repre- Therefore, another contribution of this paper is the definition of sentation (data fusion [3]). In this last step, which is the focus of the complementation and complement union operators. Intuitively, this paper, special care must be taken when handling data conflicts these allow to merge tuples that have complementing values and between the different object representations. thus eliminate unnecessary null-values. We focus on two alternative operators, namely minimum union and complement union. Both aim at resolving a special type of data conflict, called uncertainty. Intuitively, the operators consider cases where a concrete value in one tuple conflicts with a NULL value of ∗Research was partially performed while at Hasso-Plattner-Institut. another tuple, and replaces the NULL value with the NON-NULL yResearch was partially performed while at Hasso-Plattner-Institut value when merging the tuples. and IBM Almaden. Minimum union combines an outer union with the removal of tuples that are contained in other tuples, so called subsumed tuples [12]. For instance, tuple t1 = (a; b; c) subsumes tuple t2 = (a; b; ?), where ? denotes a NULL value. When applying subsumption, t2 is removed. This operator has been widely used Permission to make digital or hard copies of all or part of this work for in the literature [11, 12, 14], however it lacks efficient and gener- personal or classroom use is granted without fee provided that copies are ally applicable implementations of the subsumption part, a gap this not made or distributed for profit or commercial advantage and that copies paper aims to fill. bear this notice and the full citation on the first page. To copy otherwise, to Complement union combines object representations by first com- republish, to post on servers or to redistribute to lists, requires prior specific puting the outer union and then combining tuples that complement permission and/or a fee. EDBT 2010, March 22–26, 2010, Lausanne, Switzerland. each other. Consider above t2 and a tuple t3 = (a; ?; c): t2 and Copyright 2010 ACM 978-1-60558-945-9/10/0003 ...$10.00 t3 complement each other and combine to a single tuple (a; b; c) 513 Amazon.com: Comme Si de Rien N'Etait: Carla Bruni: Music 9/1/09 2:05 PM Result 2 when complementation is applied. To the best of our knowledge, B001AY2P26 Amazon.com: Comme Si de Rien N'Etait: Carla Bruni: Music 9/1/09 2:05 PM FREE 2-Day Shipping on college essentials this is the first proposal of an operator with these semantics. Hello. Sign in to get personalized recommendations. New customer? Start here. Sponsored by Canon Printers FREE 2-Day Shipping on college essentials Hello. Sign in to get personalized recommendations. New customer? Start here. Sponsored by Canon Printers Your Amazon.com | Today's DealsYour | Gifts & Wish Amazon.com Lists | Gift Cards | Today'sYour Account Deals | Help | Gifts & Wish Lists | Gift Cards Your Account | Help Police Hospital Shop All Departments Search Music Cart Wish List tid Name DOB Sex Address tid Name DOB Sex Blood ShopMusic All DepartmentsAdvanced Search Browse Genres New ReleasesSearchTop Sellers Music DealsMusicMusic You Should Hear Music Essentials MP3 Downloads Cart Wish List Result 1 See buying choices for this item toResult see if it's one of3 the millions that are eligible for Amazon Prime. 1 Miller 7/7/59 m 12 Main 4 Peters 1/1/53 AB ⊥ B001B9ZSO2 B001B16PGK Miller 12 Main Music Comme Si de RienAdvanced N'Etait [IMPORT] Search Browse Genres New Releases Top Sellers Music Deals Music You Should Hear Music Essentials MP3 Downloads 2 ⊥ ⊥ 5 Peters 1/1/53 m ⊥ Carla Bruni 8 used & new from $19.98 (24 customer reviews) | More about this product 3 Peters 1/1/53 m 34 First 6 Miller ⊥ f B See buying choices for this item to see if it's one of the millions that are eligible for Amazon Prime. Have one to sell? (1) Schema matching + outer union 7 Miller 7/7/59 m O Available from these sellers. + duplicate detection 5 new from $19.98 3 used from $36.00 Available to Download Now oid tid Name DOB Sex Address Blood Buy the MP3 album for $9.49 at the AmazonComme MP3 Si de Rien N'Etait [IMPORT] 1 1 Miller 7/7/59 m 12 Main ⊥ Downloads store. Carla Bruni Buy the MP3 album for $9.49 8 used & new from $19.98 1 2 Miller ⊥ ⊥ 12 Main ⊥ Result 5 Result 4 (24 customer reviews) | More about this product Free MP3 Samplers and Up to 30% off CDs See larger image and other views Download over two dozen free MP3 samplers and shop 2 3 Peters 1/1/53 m 34 First B001BN1V9O B001JINDU6hundreds of CDs at up to 30% off at our World Music Festival Share with Friends ⊥ event. Savings available for a limited time only. Shop now. 2 4 Peters 1/1/53 ⊥ ⊥ AB Have one to sell? Amazon.com:(a) Comme Relationship Si De Rien N'Etait: Carla Bruni: Music between five representations9/1/09 2:56 PM of the 2008 Share your own customer images Available from these sellers. 2 5 Peters 1/1/53 m ⊥ ⊥ Amazon.com: Comme Si de Rien N'Etait: Carla Bruni: Music 9/1/09 2:05 PM Miller f B Listen to samples 1 6 ⊥ ⊥ ImportAmazon.com: Comme Si CD De Rien N'Etait: “Comme Carla Bruni: Music si de RienFREE n’Etait” 2-Day Shipping on college essentials by Carla9/1/09 2:56 PM Bruni. Hello. Sign in to get personalized recommendations. New customer? Start here. FREE 2-Day Shipping on college essentials SponsoredHello. by Canon Sign inPrinters to get personalized recommendations. New customer? Start here. Amazon's Carla Bruni Store Sponsored by Canon Printers (2.1) Fusion using 1 7 Miller 7/7/59 m O (2.2) Fusion using Your Amazon.com | Today's Deals | Gifts & Wish Lists | Gift Cards Your Account | Help ⊥ Dashed arrows signify that the source isYour Amazon.com subsumed | Today's Deals | Gifts by & Wish Liststhe | Gift Cards Your Account | Help FindFREE all 2-Day the ShippingCDs, MP3s, on college and vinyl, essentials plus photos, videos, biographies, discussions, and more. Shop All Departments SearchHello.Music Sign in to get personalized recommendations. New customer?Shop Start All here Departments . Cart SearchSponsoredWishMusic List by Canon Printers Cart Wish List subsumption complementation 5 new from $19.98 3 used from $36.00 Your Amazon.com | Today's Deals | Gifts & Wish Lists | Gift Cards Your Account | Help Musictarget, whereasAdvanced Search Browse solid Genres New Releases edgesTop Sellers Music represent Deals MusicMusic You Should Hear› complementation.VisitMusic Amazon'sEssentialsAdvancedMP3 SearchCarla Downloads BruniBrowse GenresStore New Releases Top Sellers Music Deals Music You Should Hear Music Essentials MP3 Downloads oid tid Name DOB Sex Address Blood oid tid Name DOB Sex Address Blood Shop All DepartmentsThis item is not eligibleSearch for AmazonMusic Prime, but millions of other items are. Join Amazon Prime today. Already a member? Sign in.See buyingCart choices forWish this item List to see if it's one of the millions that are eligible for Amazon Prime. Available to Download Now 1 1+2 Miller 7/7/59 m 12 Main ⊥ 1 1+7 Miller 7/7/59 m 12 Main O Music AdvancedComme Search SiBrowse De GenresRien N'EtaitNew Releases [IMPORT]Top Sellers Music Deals Music You Should Hear Music EssentialsComme MP3Si deDownloads Rien N'Etait [IMPORT] Buy the MP3 album for $9.49 at the Amazon MP3 Get it for less! Carla Bruni (Artist) Carla Bruni 8 used & new from $19.98 This item is not eligible for Amazon Prime, but millions of other items are.

Subsumption and Complementation As Data Fusion Operators

TOPOLOGY and ITS APPLICATIONS the Number of Complements in The

Determinacy in Linear Rational Expectations Models

180: Counting Techniques

Set Difference and Symmetric Difference of Fuzzy Sets

E. A. Emerson and C. S. Jutla, Tree Automata, Mu-Calculus, And

Naïve Set Theory Basic Definitions Naïve Set Theory Is the Non-Axiomatic Treatment of Set Theory

Regularity Properties and Determinacy

Set (Mathematics) from Wikipedia, the Free Encyclopedia

1 Basics About Sets

DETERMINACY for MEASURES 1. Introduction We Say That a Positive

A Workshop for High School Students on Naive Set Theory

Dynamic Determinacy Analysis