Distributed Data Management with the Rule-Based Language: Webdamlog Émilien Antoine


To cite this version: Émilien Antoine. Distributed data management with the rule-based language: Webdamlog. Databases [cs.DB]. Université Paris Sud - Paris XI, 2013. English. tel-00908155v1

HAL Id: tel-00908155
https://tel.archives-ouvertes.fr/tel-00908155v1
Submitted on 22 Nov 2013 (v1), last revised 20 Jan 2014 (v5)

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

UNIVERSITÉ PARIS-SUD
ÉCOLE DOCTORALE D'INFORMATIQUE DE PARIS-SUD
LABORATOIRE SPÉCIFICATION ET VÉRIFICATION (LSV) – ENS DE CACHAN
Discipline: Computer Science

Doctoral thesis
Defended on 5 December 2013 by Émilien ANTOINE

Gestion des données distribuées avec le langage de règles: Webdamlog
Distributed data management with the rule-based language: Webdamlog

Thesis advisor: Serge ABITEBOUL, D.R., Inria Saclay

Jury:
President: Nicole BIDOIT, Prof., Univ. Paris-Sud
Reviewers: Christine COLLET, Prof., Grenoble INP; Pascal MOLLI, Prof., Univ. Nantes
Examiners: Bogdan CAUTIS, Prof., Univ. Paris-Sud; David GROSS-AMBLARD, Prof., Univ. Rennes 1

Résumé

Our goal is to enable a Web user to organize the management of their distributed data in place, that is, without forcing them to centralize their data at a single host. Consequently, our system differs from Facebook and other centralized systems, and offers an alternative that lets users launch their own peers on their own machines, managing their personal data locally and possibly collaborating with external Web services.

In my thesis, I present Webdamlog, a language derived from datalog for managing distributed data and knowledge. The language extends datalog in several ways, chiefly with a new feature, delegation, which allows peers to exchange not only facts (data) but also rules (knowledge). I conducted a user study to demonstrate the use of the language. I describe the Webdamlog evaluation engine, which extends a distributed datalog evaluation engine named Bud with support for delegation and other innovations such as the possibility of using variables for the names of peers and relations. I discuss new optimization techniques, notably ones based on the provenance of facts and rules. I present experiments showing that the cost of supporting Webdamlog's new features remains reasonable even for large volumes of data. Finally, I present the implementation of a Webdamlog peer that provides the environment for the engine. In particular, certain wrappers allow Webdamlog peers to exchange data with other peers on the Internet. To illustrate the use of these peers, I implemented a photo-sharing application for a social network in Webdamlog.

Keywords: Distribution; Datalog; Knowledge base; Peer-to-peer; Web data management.

Abstract

Our goal is to enable a Web user to easily specify distributed data management tasks in place, i.e., without centralizing the data to a single provider.
Our system is therefore not a replacement for Facebook, or any centralized system, but an alternative that allows users to launch their own peers on their machines processing their own local personal data, and possibly collaborating with Web services.

We introduce Webdamlog, a datalog-style language for managing distributed data and knowledge. The language extends datalog in a number of ways, notably with a novel feature, namely delegation, allowing peers to exchange not only facts but also rules. We present a user study that demonstrates the usability of the language. We describe a Webdamlog engine that extends a distributed datalog engine, namely Bud, with the support of delegation and of a number of other novelties of Webdamlog such as the possibility to have variables denoting peers or relations. We mention novel optimization techniques, notably one based on the provenance of facts and rules. We exhibit experiments that demonstrate that the rich features of Webdamlog can be supported at reasonable cost and that the engine scales to large volumes of data. Finally, we discuss the implementation of a Webdamlog peer system that provides an environment for the engine. In particular, a peer supports wrappers to exchange Webdamlog data with non-Webdamlog peers. We illustrate these peers by presenting a picture management application that we used for demonstration purposes.

Keywords: Distribution; Datalog; Knowledge Base; Peer to Peer; Web Data Management.

Contents

Acknowledgement
Résumé en Français
1 Introduction
2 State of the Art
  2.1 Distributed Information Systems
    2.1.1 Distributed systems
    2.1.2 Distributed databases
    2.1.3 Data on the Web
    2.1.4 Peer-to-peer systems
    2.1.5 Social networks
    2.1.6 Contribution
  2.2 Knowledge bases
    2.2.1 Processing knowledge
    2.2.2 Datalog
    2.2.3 Distributed datalog
    2.2.4 Provenance and optimization
    2.2.5 Contribution
  2.3 Webdam exchange
3 Webdamlog language
  3.1 Model of data
    3.1.1 Informal presentation
    3.1.2 Formal definitions
  3.2 Key observations
  3.3 Expressive power
    3.3.1 Traces and simulations
    3.3.2 Expressivity results
  3.4 Convergence of Webdamlog
    3.4.1 Positive Webdamlog
    3.4.2 Strongly-stratified Webdamlog
4 Webdamlog rule engine
  4.1 Datalog inside
  4.2 Connection between Bud and Webdamlog
    4.2.1 Webdamlog computation on Bud
    4.2.2 Implementing Webdamlog rules
  4.3 Optimization of the evaluation
  4.4 Optimization for view maintenance
    4.4.1 Provenance graphs
    4.4.2 Deletions
    4.4.3 Running the fixpoint
  4.5 Performance evaluation
    4.5.1 Cost of delegation
    4.5.2 Cost of dynamism
5 Architecture of a Webdamlog peer
  5.1 Peer architecture
    5.1.1 Event-driven system
    5.1.2 Module interactions
  5.2 Wrappers
  5.3 Demonstration
    5.3.1 Wepic application
    5.3.2 Demonstration Scenario
    5.3.3 Access control
6 User Study
  6.1 Webdamlog tutorial
  6.2 Test
  6.3 Results
7 Conclusion
Self references
External references

Acknowledgement

[TODO]: acknowledgement

Résumé en Français (French summary)

[TODO]: reread all this section and complete todos

The volume of information on the Web is growing exponentially. Users and companies alike share more and more of their data, which ends up distributed across the many machines they own, or stored on external machines through Web services. In particular, the emergence of cloud computing and social networks lets users share even more personal data. The multitude of specialized services, each offering the user a different experience, greatly complicates the management of this information, and making these services work together quickly exceeds human expertise.

The information handled by users has many facets: it concerns personal data (photos, films, music, emails), social data (annotations, recommendations, social links), data localization (bookmarks), access control information (passwords, private keys), Web services (search engines, archives), semantics (ontologies), belief, and provenance. The tasks performed by users are very varied: keyword search, structured queries, updates, authentication, data mining, and knowledge extraction. In this thesis, we show that all this information should be modeled as a distributed knowledge base management problem. We also argue that datalog and its extensions form a sound formal basis for representing this information and these tasks.

This work is part of the ERC Webdam project [ERC13] on the foundations of Web data management. The goal of the project is to contribute to the development of unified formal foundations for distributed data management, since the current lack of such foundations slows progress in the field.

[TODO]: example

Contributions

The contributions of this thesis are the following:

• I present Webdamlog, a new rule-based language for distributed data management that combines, in a formal setting, the deductive rules of datalog with negation for defining intensional facts and the active rules of datalog¬¬ for updates and communications. The model places strong emphasis on the dynamics and interactions typical of Web 2.0, mainly through a novelty of the Webdamlog language, rule delegation, which allows peers to collaborate. The model is both powerful enough to specify complex distributed systems and simple enough to allow a formal study of distribution, concurrency, and expressiveness in a system of autonomous peers.
• I present the implementation of the Webdamlog program evaluation engine, which extends Bud, a distributed datalog engine with updates, in two ways.