Databases with Uncertainty and Lineage
The VLDB Journal manuscript No. (will be inserted by the editor)

Omar Benjelloun · Anish Das Sarma · Alon Halevy · Martin Theobald · Jennifer Widom

Abstract

This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately.

We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality, data-minimal and lineage-minimal, and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data.

Keywords Uncertainty in Databases · Lineage · Provenance · Probabilistic data management

1 Introduction

The problems faced when managing uncertain data, and those associated with tracking data lineage, have been addressed in isolation in the past (e.g., [2,4,21,27,30,31,34,41,45] for uncertain data and [11,17–19,39,40] for data lineage). Motivated by a diverse set of applications including data integration, deduplication, scientific data management, information extraction, and others, we became interested in the combination of uncertainty and lineage as the basis for a new type of data management system [46].

Intuitively, an uncertain database is one that represents