Believe It or Not: Adding Belief Annotations to Databases

Wolfgang Gatterbauer, Magdalena Balazinska, Nodira Khoussainova, and Dan Suciu
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
fgatter, magda, nodira, [email protected]

ABSTRACT

We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead.

We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technical result, stating that every belief database can be encoded as a canonical Kripke structure. We use this structure to describe a relational representation of belief databases, and give an algorithm for translating queries over the belief database into standard relational queries. Finally, we report early experimental results with our prototype implementation on synthetic data.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM.
VLDB '09, August 24-28, 2009, Lyon, France
Copyright 2009 VLDB Endowment, ACM 000-0-00000-000-0/00/00.

1. INTRODUCTION

In many sciences today, a community of users is working together to assemble, revise, and curate a shared data repository. Examples of such collaborations include identifying functions of particular regions of genetic sequences [39], curating databases of protein functions [10, 46], identifying astronomical phenomena on images [43], and mapping the diversity of species [37]. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members may disagree on the information it should store. Relational database management systems (DBMSs) today can help these communities manage their shared data, but provide limited support for managing conflicting facts and conflicting opinions about the correctness of the stored data.

The recent concept of database annotations aims to address this need: annotations are commonly seen as superimposed information that helps to explain, correct, or refute base information [36] without actually changing it. Annotations have been recognized by scientists as an essential feature for new generation database management systems [4, 8, 18], and efficient management of annotations has become the focus of much recent work in the database community [7, 10, 12, 14, 23, 24]. Still, the semantic distinction between base information and annotations remains blurred [9]. Annotations are simply additional metadata added to existing data [44] without unique and distinctive semantics.

In discussions with scientists from forestry and bioengineering, we have seen the need for an annotation semantics that helps collaborating community members engage in a structured discussion on both content and each other's annotations: scientists do not only want to insert their own annotations but also want to be able to respond to other scientists' annotations. Such annotation semantics creates several challenges for a database system. First, it needs to allow for conflicting annotations: Users should be able to use annotations to indicate conflicts between what they believe and what others believe. The database should allow and expose those conflicts. Second, it should also support higher-order annotations. Users should be able to annotate not only content but also other users' annotations. And, finally, the additional functionality should be supported on top of a standard DBMS with a simple extension of SQL. Any new annotation model should take advantage of existing state-of-the art in query processing.

To address these challenges, we introduce the concept of a belief database. A belief database contains base information in the form of ground tuples, annotated with belief statements. It represents a set of different belief worlds, each one for one type of belief annotation, i.e. the beliefs of a particular user on ground tuples, or on another user's beliefs. These belief worlds follow an open world assumption and may be overlapping and partially conflicting with each other. The formal semantics of belief annotations is defined in terms of multi-agent epistemic logic [20]. This semantics can be represented by an appropriate canonical Kripke structure which, in turn, can be represented in the standard relational model and, hence, on top of a standard RDBMS.

We also introduce belief conjunctive queries, a simple, yet versatile query language that serves as interface to a belief database and consists of conjunctive queries with belief assertions. In addition to retrieving facts believed or not believed by certain users, this language can also be used to query for agreements or disagreements between users. We describe an algorithm for translating belief conjunctive queries into non-recursive Datalog (and, hence, to SQL). We have implemented a prototype Belief Database Management System (BDMS), and describe a set of preliminary experiments validating the feasibility of translating belief queries into SQL.

The structure of this paper follows its contributions:
• We describe a motivating application, and give examples and a syntax for BeliefSQL (Sect. 2).
• We define a data model and a query language for belief databases (Sect. 3).
• We describe the canonical Kripke structure that enables implementing belief databases (Sect. 4).
• We describe a relational representation of belief databases and the translation of queries and updates over this canonical representation (Sect. 5).
• We validate our model and report on experiments with our prototype BDMS (Sect. 6).
The paper ends with an overview of related work (Sect. 7) and conclusions (Sect. 8).

2. MOTIVATING APPLICATION

In this section, we present a motivating application that we use as running example throughout this paper. The scenario is based on the NatureMapping project whose goal is to record biodiversity of species in the US state of Washington [37]. Participating community members volunteer to submit records of animal sightings from the field. Each observation includes user-id, date, location, species name, and various options to comment on the observation, such as details about how the animal was identified (e.g., animal tracks were found). As sightings are reported by non-experts, they can contain errors. In fact, even experts sometimes disagree on the exact species of a sighted animal.

In the current protocol, a single expert in forestry (the principal investigator) manually curates all the entries before inserting them into the database, which results in significant delays and does not allow the application to scale to a larger number of volunteers. In this setting, a Belief Database Management System (BDMS) can address this challenge by allowing multiple experts to annotate, thus streamlining the curation process.

    select selectlist
    from (((BELIEF user)+ not?)? relationname)+
    where conditionlist

    insert into ((BELIEF user)+ not?)? relationname
    values

    delete from ((BELIEF user)+ not?)? relationname
    where conditionlist

    update ((BELIEF user)+ not?)? relationname
    set value assignments
    where conditionlist

Figure 1: Syntax of query and data manipulation commands in BeliefSQL.

ingly. They can also correct a sighting by annotating it with corrected values they believe more plausible than those provided by the volunteers in the field. And they can also suggest explanations for other users' annotations, thus leading to higher-order annotations.

We now illustrate the use of a BDMS. We assume three users (Alice, Bob, and Carol) and a simplified database schema consisting of three relations:

    Sightings(sid, uid, species, date, location)
    Comments(cid, comment, sid)
    Users(uid, name)

We refer to this schema as external schema since it presents the way users enter and retrieve data. Beliefs, in contrast, are stored transparently from users and can be manipulated via natural extensions to standard SQL (Fig. 1). We illustrate its usage through examples next.

Little Carol sees a bald eagle during her school trip and reports her sighting with the following insert:

    i1: insert into Sightings
        values ('s1','Carol','bald eagle','6-14-08','Lake Forest')

Bob, a graduate student, however, does not believe that Carol saw a bald eagle:

    i2: insert into BELIEF 'Bob' not Sightings
        values ('s1','Carol','bald eagle','6-14-08','Lake Forest')

Additionally, Bob does not believe that Carol could have seen a fish eagle, which looks similar to a bald eagle:

    i3: insert into BELIEF 'Bob' not Sightings
        values ('s1','Carol','fish eagle','6-14-08','Lake Forest')

This ensures that Bob still disagrees even if Carol's tuple is updated to species='fish eagle'. In both cases, Bob uses the external key 's1' to refer to the tuple with which he disagrees.

Alice, a field technician, believes there was a crow at Lake Placid because she found some black feathers. She does not insert a regular tuple as Carol did, but inserts