Adding Belief Annotations to Databases

Believe It or Not: Adding Belief Annotations to Databases

Wolfgang Gatterbauer, Magdalena Balazinska, Nodira Khoussainova, and Dan Suciu
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
fgatter, magda, nodira, [email protected]

ABSTRACT

We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead. We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technical result, stating that every belief database can be encoded as a canonical Kripke structure. We use this structure to describe a relational representation of belief databases, and give an algorithm for translating queries over the belief database into standard relational queries. Finally, we report early experimental results with our prototype implementation on synthetic data.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM.
VLDB '09, August 24-28, 2009, Lyon, France
Copyright 2009 VLDB Endowment, ACM 000-0-00000-000-0/00/00.

1. INTRODUCTION

In many sciences today, a community of users is working together to assemble, revise, and curate a shared data repository. Examples of such collaborations include identifying functions of particular regions of genetic sequences [39], curating databases of protein functions [10, 46], identifying astronomical phenomena on images [43], and mapping the diversity of species [37]. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members may disagree on the information it should store. Relational database management systems (DBMSs) today can help these communities manage their shared data, but provide limited support for managing conflicting facts and conflicting opinions about the correctness of the stored data.

The recent concept of database annotations aims to address this need: annotations are commonly seen as superimposed information that helps to explain, correct, or refute base information [36] without actually changing it. Annotations have been recognized by scientists as an essential feature for new generation database management systems [4, 8, 18], and efficient management of annotations has become the focus of much recent work in the database community [7, 10, 12, 14, 23, 24]. Still, the semantic distinction between base information and annotations remains blurred [9]. Annotations are simply additional metadata added to existing data [44] without unique and distinctive semantics.

In discussions with scientists from forestry and bioengineering, we have seen the need for an annotation semantics that helps collaborating community members engage in a structured discussion on both content and each other's annotations: scientists do not only want to insert their own annotations but also want to be able to respond to other scientists' annotations. Such annotation semantics creates several challenges for a database system. First, it needs to allow for conflicting annotations: Users should be able to use annotations to indicate conflicts between what they believe and what others believe. The database should allow and expose those conflicts. Second, it should also support higher-order annotations. Users should be able to annotate not only content but also other users' annotations. And, finally, the additional functionality should be supported on top of a standard DBMS with a simple extension of SQL. Any new annotation model should take advantage of existing state-of-the-art in query processing.

To address these challenges, we introduce the concept of a belief database. A belief database contains base information in the form of ground tuples, annotated with belief statements. It represents a set of different belief worlds, each one for one type of belief annotation, i.e. the beliefs of a particular user on ground tuples, or on another user's beliefs. These belief worlds follow an open world assumption and may be overlapping and partially conflicting with each other. The formal semantics of belief annotations is defined in terms of multi-agent epistemic logic [20]. This semantics can be represented by an appropriate canonical Kripke structure which, in turn, can be represented in the standard relational model and, hence, on top of a standard RDBMS.

We also introduce belief conjunctive queries, a simple, yet versatile query language that serves as interface to a belief database and consists of conjunctive queries with belief assertions. In addition to retrieving facts believed or not believed by certain users, this language can also be used to query for agreements or disagreements between users. We describe an algorithm for translating belief conjunctive queries into non-recursive Datalog (and, hence, to SQL). We have implemented a prototype Belief Database Management System (BDMS), and describe a set of preliminary experiments validating the feasibility of translating belief queries into SQL.

The structure of this paper follows its contributions:
• We describe a motivating application, and give examples and a syntax for BeliefSQL (Sect. 2).
• We define a data model and a query language for belief databases (Sect. 3).
• We describe the canonical Kripke structure that enables implementing belief databases (Sect. 4).
• We describe a relational representation of belief databases and the translation of queries and updates over this canonical representation (Sect. 5).
• We validate our model and report on experiments with our prototype BDMS (Sect. 6).
The paper ends with an overview of related work (Sect. 7) and conclusions (Sect. 8).

2. MOTIVATING APPLICATION

In this section, we present a motivating application that we use as running example throughout this paper. The scenario is based on the NatureMapping project whose goal is to record biodiversity of species in the US state of Washington [37]. Participating community members volunteer to submit records of animal sightings from the field. Each observation includes user-id, date, location, species name, and various options to comment on the observation, such as details about how the animal was identified (e.g., animal tracks were found). As sightings are reported by non-experts, they can contain errors. In fact, even experts sometimes disagree on the exact species of a sighted animal.

In the current protocol, a single expert in forestry (the principal investigator) manually curates all the entries before inserting them into the database, which results in significant delays and does not allow the application to scale to a larger number of volunteers. In this setting, a Belief Database Management System (BDMS) can address this challenge by allowing multiple experts to annotate, thus streamlining the curation process. [...]ingly. They can also correct a sighting by annotating it with corrected values they believe more plausible than those provided by the volunteers in the field. And they can also suggest explanations for other users' annotations, thus leading to higher-order annotations.

We now illustrate the use of a BDMS. We assume three users (Alice, Bob, and Carol) and a simplified database schema consisting of three relations:

    Sightings(sid, uid, species, date, location)
    Comments(cid, comment, sid)
    Users(uid, name)

We refer to this schema as external schema since it presents the way users enter and retrieve data. Beliefs, in contrast, are stored transparently from users and can be manipulated via natural extensions to standard SQL (Fig. 1). We illustrate its usage through examples next.

    select selectlist
    from (((BELIEF user)+ not?)? relationname)+
    where conditionlist

    insert into ((BELIEF user)+ not?)? relationname
    values

    delete from ((BELIEF user)+ not?)? relationname
    where conditionlist

    update ((BELIEF user)+ not?)? relationname
    set value assignments
    where conditionlist

Figure 1: Syntax of query and data manipulation commands in BeliefSQL.

Little Carol sees a bald eagle during her school trip and reports her sighting with the following insert:

    i1: insert into Sightings
        values ('s1','Carol','bald eagle','6-14-08','Lake Forest')

Bob, a graduate student, however, does not believe that Carol saw a bald eagle:

    i2: insert into BELIEF 'Bob' not Sightings
        values ('s1','Carol','bald eagle','6-14-08','Lake Forest')

Additionally, Bob does not believe that Carol could have seen a fish eagle, which looks similar to a bald eagle:

    i3: insert into BELIEF 'Bob' not Sightings
        values ('s1','Carol','fish eagle','6-14-08','Lake Forest')

This ensures that Bob still disagrees even if Carol's tuple is updated to species='fish eagle'. In both cases, Bob uses the external key 's1' to refer to the tuple with which he disagrees.

Alice, a field technician, believes there was a crow at Lake Placid because she found some black feathers. She does not insert a regular tuple as Carol did, but inserts [...]