
Manifesto from Dagstuhl Perspectives Workshop 16151

Research Directions for Principles of Data Management

Serge Abiteboul1, Marcelo Arenas2, Pablo Barceló3, Meghyn Bienvenu4, Diego Calvanese5, Claire David6, Richard Hull7, Eyke Hüllermeier8, Benny Kimelfeld9, Leonid Libkin10, Wim Martens11, Tova Milo12, Filip Murlak13, Frank Neven14, Magdalena Ortiz15, Thomas Schwentick16, Julia Stoyanovich17, Jianwen Su18, Dan Suciu19, Victor Vianu20, and Ke Yi21

1 ENS – Cachan, FR
2 Pontificia Universidad Católica de Chile, CL, [email protected]
3 DCC, University of Chile – Santiago de Chile, CL
4 University of Montpellier, FR
5 Free Univ. of Bozen-Bolzano, IT
6 University Paris-Est – Marne-la-Vallée, FR
7 IBM TJ Watson Research Center – Yorktown Heights, US, [email protected]
8 Universität Paderborn, DE
9 Technion – Haifa, IL
10 University of Edinburgh, GB
11 Universität Bayreuth, DE, [email protected]
12 Tel Aviv University, IL, [email protected]
13 University of Warsaw, PL
14 Hasselt Univ. – Diepenbeek, BE
15 TU Wien, AT
16 TU Dortmund, DE, [email protected]
17 Drexel University – Philadelphia, US
18 University of California – Santa Barbara, US
19 University of Washington – Seattle, US
20 University of California – San Diego, US
21 HKUST – Kowloon, HK

Abstract

The area of Principles of Data Management (PDM) has made crucial contributions to the development of formal frameworks for understanding and managing data and knowledge. This work has involved a rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation. We anticipate on-going expansion of PDM research as the technology and applications involving data management continue to grow and evolve. In particular, the lifecycle of Big Data Analytics raises a wealth of challenge areas that PDM can help with. In this report we identify some of the most important research directions where the PDM community has the potential to make significant contributions. This is done from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.

Perspectives Workshop April 10–15, 2016 – http://www.dagstuhl.de/16151

2012 ACM Subject Classification Theory of computation → Database theory

Keywords and phrases database theory, principles of data management, query languages, efficient query processing, query optimization, heterogeneous data, uncertainty, knowledge-enriched data management, machine learning, workflows, human-related data, ethics

Digital Object Identifier 10.4230/DagMan.7.1.1

Except where otherwise noted, content of this manifesto is licensed under a Creative Commons BY 3.0 Unported license: Research Directions for Principles of Data Management, Dagstuhl Manifestos, Vol. 7, Issue 1, pp. 1–29. Authors: S. Abiteboul et al. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

Executive Summary

In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) gathered at Dagstuhl Castle in Germany for a workshop organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify and explore some of the most important research directions that have high relevance to society and to Computer Science today, and where the PDM community has the potential to make significant contributions.

This report describes the family of research directions that the workshop focused on from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term. It organizes the identified research challenges for PDM around seven core themes, namely Query Processing at Scale, Multi-model Data, Uncertain Information, Knowledge-enriched Data, Data Management and Machine Learning, Process and Data, and Ethics and Data Management. Since new challenges in PDM arise all the time, we note that this list of themes is not intended to be exhaustive.

This report is intended for a diverse audience. It is intended for government and industry funding agencies, because it includes an articulation of important areas where the PDM community is already contributing to the key data management challenges in our era, and has the potential to contribute much more. It is intended for universities and colleges worldwide, because it articulates the importance of continued research and education in the foundational elements of data management, and it highlights growth areas for Computer Science and Management of Information Science research. It is intended for researchers and students, because it identifies emerging, exciting research challenges in the PDM area, all of which have very timely practical relevance. It is also intended for policy makers, sociologists, and philosophers, because it reiterates the importance of considering ethics in many aspects of data creation, access, and usage, and suggests how research can help to find new ways for maximizing the benefits of massive data while nevertheless safeguarding the privacy and integrity of citizens and societies.

Contents

Executive Summary ...... 2

Introduction ...... 4

Query Processing at Scale ...... 6

Multi-model Data: Towards an Open Ecosystem of Data Models ...... 8

Uncertain Information ...... 10

Knowledge-enriched Data Management ...... 13

Data Management and Machine Learning ...... 16

Process and Data ...... 18

Human-Related Data and Ethics ...... 21

Looking Forward ...... 22

References ...... 23


1 Introduction

In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) gathered at Dagstuhl Castle in Germany for a workshop organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify and explore some of the most important research directions that have high relevance to society and to Computer Science today, and where the PDM community has the potential to make significant contributions.

This report describes the family of research directions that the workshop focused on from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term. It organizes the identified research challenges for PDM around seven core themes, namely Query Processing at Scale, Multi-model Data, Uncertain Information, Knowledge-enriched Data, Data Management and Machine Learning, Process and Data, and Ethics and Data Management. Since new challenges in PDM arise all the time, we note that this list of themes is not intended to be exhaustive.

This report is intended for a diverse audience. It is intended for government and industry funding agencies, because it includes an articulation of important areas where the PDM community is already contributing to the key data management challenges in our era, and has the potential to contribute much more. It is intended for universities and colleges worldwide, because it articulates the importance of continued research and education in the foundational elements of data management, and it highlights growth areas for Computer Science and Management of Information Science research. It is intended for researchers and students, because it identifies emerging, exciting research challenges in the PDM area, all of which have very timely practical relevance. It is also intended for policy makers, sociologists, and philosophers, because it reiterates the importance of considering ethics in many aspects of data creation, access, and usage, and suggests how research can help to find new ways for maximizing the benefits of massive data while nevertheless safeguarding the privacy and integrity of citizens and societies.

The field of PDM is broad. It has ranged from the development of formal frameworks for understanding and managing data and knowledge (including data models, query languages, ontologies, and transaction models) to data structures and algorithms (including query optimizations, data exchange mechanisms, and privacy-preserving manipulations). Data management is at the heart of most IT applications today, and will be a driving force in personal life, social life, industry, and research for the foreseeable future. We anticipate on-going expansion of PDM research as the technology and applications involving data management continue to grow and evolve.

PDM played a foundational role in the relational database model, with the robust connection between algebraic and calculus-based query languages, the connection between integrity constraints and database design, key insights for the field of query optimization, and the fundamentals of consistent concurrent transactions. This early work included rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation.
Since the 1990s we have seen an overwhelming increase in both the production of data and the ability to store and access such data. This has led to a phenomenal metamorphosis in the ways that we manage and use data. During this time, we have gone (1) from stand-alone disk-based databases to data that is spread across and linked by the Web, (2) from rigidly structured towards loosely structured data, and (3) from relational data to many different data models (hierarchical, graph-structured, data points, NoSQL, text data, image data, etc.). Research on PDM has developed during that time, too, following, accompanying, and influencing this process. It has intensified research on extensions of the relational model (data exchange, incomplete data, probabilistic data, ...), on other data models (hierarchical, semi-structured, graph, text, ...), and on a variety of further data management areas, including knowledge representation and the semantic web, data privacy and security, and data-aware (business) processes. Along the way, the PDM community expanded its cross-fertilization with related areas, to include automata theory, web services, parallel computation, document processing, data structures, scientific workflow, business process management, data-centered dynamic systems, data mining, machine learning, information extraction, etc.

Looking forward, three broad areas of data management stand out where principled, mathematical thinking can bring new approaches and much-needed clarity. The first relates to the full lifecycle of so-called “Big Data Analytics”, that is, the application of statistical and machine learning techniques to make sense out of, and derive value from, massive volumes of data. The second stems from new forms of data creation and processing, especially as it arises in applications such as web-based commerce, social media applications, and data-aware workflow and business process management. The third, which is just beginning to emerge, is the development of new principles and approaches in support of ethical data management. We briefly illustrate some of the primary ways that these three areas can be supported by the seven PDM research themes that are explored in this report.

The overall lifecycle of Big Data Analytics raises a wealth of challenge areas that PDM can help with. As documented in numerous sources, so-called “data wrangling” can form 50% to 80% of the labor costs in an analytics investigation. The challenges of data wrangling can be described in terms of the “4 V’s” – Volume, Velocity, Variety, and Veracity – all of which have been addressed, and will continue to be addressed, using principled approaches. As we will discuss later, PDM is making new contributions towards managing the Volume and Velocity. As an example, Query Processing at Scale (Section 2) discusses recent advances in efficient n-way join processing in highly parallelized systems, which outperform conventional approaches based on a series of binary joins [18, 37]. This section also introduces different paradigms for approximate query processing, sometimes in an online or streaming setting, in which the user can terminate the computation as soon as they are satisfied with the quality of the answer.
PDM is contributing towards managing the Variety: Knowledge-enriched Data (Section 5) provides tools for managing and efficiently reasoning with industrial-sized ontologies [33], and Multi-model Data (Section 3) provides approaches for efficient access to diverse styles of data, from tabular to tree to graph to unstructured. Veracity is an especially important challenge when performing analytics over large volumes of data, given the inevitability of inconsistent and incomplete data. The PDM field of Uncertain Information (Section 4) provided a formal account of how to answer queries in the face of uncertainty some four decades ago [79], but its computational complexity has made mainstream adoption elusive – a challenge that the PDM community should redouble its efforts to resolve. Provocative new opportunities are raised in the area of Data Management and Machine Learning (Section 6), because of the unconventional ways in which feature engineering and machine learning algorithms access and manipulate large data sets. We are also seeing novel approaches to incorporate Machine Learning techniques into database management systems, e.g., to enable more efficient extraction and management of information coming from text [12].

The new forms of data creation and processing that have emerged have led to new forms of data updates, transactions, and data management in general. Web-based commerce has revolutionized how business works with supply chain, financial, manufacturing, and other kinds of data, and also how businesses engage with their customers, both consumers and other businesses. Social applications have revolutionized our personal and social lives, and are now impacting the workplace in similar ways. Transactions are increasingly distributed, customized, personalized, offered with more immediacy, and informed by rich sets of data and advanced analytics. These trends are being compounded as the Internet of Things becomes increasingly real and is leveraged to increase personal convenience and business efficiencies. A broad challenge is to make it easy to understand all of this data, and the ways that the data are being processed; approaches to this challenge are offered in both Multi-model Data (Section 3) and Knowledge-enriched Data (Section 5). Many forms of data from the Web – including data from social media, from crowd-sourced query answering, and unstructured data in general – create Uncertain Information (Section 4). Web-based communication has also enabled a revolution in electronically supported processes, ranging from conventional business processes that are now becoming partially automated, to consumer-facing e-commerce systems, to increasingly streamlined commercial and supply chain applications. Approaches have emerged for understanding and managing Process and Data (Section 7) in a holistic manner, enabling a new family of automated verification techniques [35]; these will become increasingly important as process automation accelerates.

While ethical use of data has always been a concern, the new generation of data- and information-centric applications, including Big Data Analytics, social applications, and also the increasing use of data in commerce (both business-to-consumer and business-to-business), has made ethical considerations more important and more challenging. At present there are huge volumes of data being collected about individuals, and being interpreted in many different ways by increasing numbers of diverse organizations with widely varying agendas. Emerging research suggests that the use of mathematical principles in research on Ethics and Data Management (Section 8) can lead to new approaches to ensure data privacy for individuals, and compliance with government and societal regulations at the corporate level. As just one example, mechanisms are emerging to ensure accurate and “fair” representation of the underlying data when analytic techniques are applied [50].

The findings of this report differ from, and complement, the findings of the 2016 Beckman Report [1] in two main respects. Both reports stress the importance of “Big Data” as the single largest driving force in data management usage and research in the current era. The current report focuses primarily on research challenges where a mathematically based perspective has had, and will continue to have, substantial impact. This includes, for example, new algorithms for large-scale parallelized query processing and Machine Learning, and models and languages for heterogeneous and uncertain information. The current report also considers additional areas where research into the principles of data management can make growing contributions in the coming years, including, for example, approaches for combining data structured according to different models, process taken together with data, and ethics in data management.

The remainder of this report includes the seven technical sections mentioned above, and a concluding section with comments about the road ahead for PDM research.

2 Query Processing at Scale

Volume is still the most prominent feature of Big Data. The PDM community, as well as the general theoretical computer science community, has made significant contributions to efficient query processing at scale (concerning both Volume and Velocity). This is evident from the tremendous success of parallel algorithms, external memory algorithms, streaming algorithms, etc., with their applications in large-scale database systems. Sometimes, the contributions of theoretical foundations might not be immediate; e.g., it took more than a decade for the MapReduce system to popularize Valiant’s theoretical bulk synchronous parallel (BSP) model [109] in the systems community. But this only underscores that one should never underestimate the value of theory. Next we review two of the most important practical challenges we face today concerning query processing at scale.

Developing New Paradigms for Multi-way Join Processing. A celebrated result by Atserias, Grohe, and Marx [18] has sparked a flurry of research efforts in re-examining how multi-way joins should be computed. In all current relational database systems, a multi-way join is processed in a pairwise framework using a binary tree (plan), which is chosen by the query optimizer. However, recent theoretical studies have discovered that for many queries and data instances, even the best binary plan is suboptimal by a large polynomial factor. Meanwhile, worst-case optimal algorithms have been designed in the RAM model [86], the external memory model [65], and BSP models [23, 5]. These new algorithms have all abandoned the binary tree paradigm, adopting a more holistic approach to achieve optimality (a toy illustration of the underlying attribute-at-a-time idea is given below). Encouragingly, there have been empirical studies [37] that demonstrate the practicality of these new algorithms. In particular, leapfrog join [111], a worst-case optimal algorithm, has been implemented inside a full-fledged database system. Therefore, we believe that the newly developed algorithms in the theory community have the potential to change how multi-way join processing is currently done in database systems. Of course, this can only be achieved with significant engineering efforts, especially in designing and implementing new query optimizers and cost estimation under the new paradigm.

Approximate query processing. Most analytical queries on Big Data return aggregated answers that do not have to be 100% accurate. The line of work on online aggregation [63] studies new algorithms that allow the query processor to return approximate results (with statistical guarantees) at early stages of the processing, so that the user can terminate it as soon as the accuracy is acceptable. This both improves interactivity and reduces unnecessary resource consumption. Recent studies have shown some encouraging results [62, 76], but there is still a lot of room for improvement: (1) The existing algorithms have only used simple random sampling or random walks to sample from the full query results. More sophisticated techniques based on Markov Chain Monte Carlo might be more effective. (2) The streaming algorithms community has developed many techniques to summarize large data sets into compact data structures while preserving important properties of the data. These data summarization techniques can be useful in approximate query processing as well. (3) Actually integrating these techniques into modern data processing engines is still a significant practical challenge.
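To make the contrast with pairwise join plans more concrete, the following toy sketch evaluates the triangle query attribute-at-a-time, intersecting the candidate values allowed by all atoms that mention the current variable. It only illustrates the high-level idea shared by worst-case optimal algorithms such as leapfrog join; it is not one of the cited implementations, and the relations and data are invented.

```python
# Attribute-at-a-time ("generic join") evaluation of the triangle query
# Q(a, b, c) :- R(a, b), S(b, c), T(a, c).
# Illustrative sketch only: real worst-case optimal algorithms rely on
# suitable index structures and variable orderings for their guarantees.

from collections import defaultdict

def triangle_join(R, S, T):
    """R, S, T are sets of pairs; return all (a, b, c) with
    (a, b) in R, (b, c) in S, and (a, c) in T."""
    R_by_a = defaultdict(set)   # a -> {b}
    S_by_b = defaultdict(set)   # b -> {c}
    T_by_a = defaultdict(set)   # a -> {c}
    for a, b in R:
        R_by_a[a].add(b)
    for b, c in S:
        S_by_b[b].add(c)
    for a, c in T:
        T_by_a[a].add(c)

    out = []
    # Variable order a, b, c: each step intersects the candidates allowed
    # by all atoms mentioning the current variable, so no pairwise
    # intermediate join result is ever materialized.
    for a in set(R_by_a) & set(T_by_a):
        for b in R_by_a[a] & set(S_by_b):
            for c in S_by_b[b] & T_by_a[a]:
                out.append((a, b, c))
    return out

if __name__ == "__main__":
    R = {(1, 2), (1, 3), (2, 3)}
    S = {(2, 3), (3, 1)}
    T = {(1, 3)}
    print(triangle_join(R, S, T))   # [(1, 2, 3)]
```

In contrast to a binary plan, no pairwise intermediate result is produced; the worst-case guarantees of the cited algorithms additionally depend on appropriate data structures and orderings, which this sketch omits.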
These practical challenges raise the following theoretical challenges.

The Relationship Among Various Big Data Computation Models. The theoretical computer science community has developed many beautiful models of computation aimed at handling data sets that are too large for the traditional random access machine (RAM) model, the most prominent ones being parallel RAM (PRAM), the external memory (EM) model, the streaming model, and the BSP model with its recent refinements that capture modern distributed architectures. Several studies seem to suggest that there are deep connections between seemingly unrelated Big Data computation models for streaming computation, parallel processing, and external memory, especially for the class of problems of interest to the PDM community (e.g., relational algebra) [54, 72]. Investigating this relationship would reveal the inherent nature of these problems with respect to scalable computation, and would also allow us to leverage the rich set of ideas and tools that the theory community has developed over the decades.

The Communication Complexity of Parallel Query Processing. New large-scale data analytics systems use massive parallelism to support complex queries on large data sets. These systems use clusters of servers and proceed in multiple communication rounds. In these systems, the communication cost is usually the bottleneck, and it has therefore become the primary measure of complexity for algorithms designed for these models. Recent studies (e.g., [23]) have established tight upper and lower bounds on the communication cost for computing some join queries (a toy sketch of one-round tuple routing is given at the end of this section), but many questions remain open: (1) The existing bounds are tight only for one-round algorithms. However, new large-scale systems like Spark have greatly improved the efficiency of multi-round iterative computation, so the one-round restriction seems unnecessary. The communication complexity of multi-round computation remains largely open. (2) The existing work has only focused on a small set of queries (full conjunctive queries), while many other types of queries remain unaddressed. Broadly, there is great interest in large-scale machine learning using these systems, so it is both interesting and important to study the communication complexity of classical machine learning tasks under these models. This is developed in more detail in Section 6, which summarizes research opportunities at the crossroads of data management and machine learning.

Large-scale parallel query processing raises many other (practical and foundational) research questions. As an example, recent frameworks for parallel query optimization need to be extended to the multi-round case [10]. We envision that the following theory techniques, not traditionally considered “classical” PDM or database theory, will be useful in addressing the challenges above: statistics, sampling theory, approximation theory, communication complexity, information theory, and convex optimization.
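As an illustration of the one-round setting analyzed in the studies cited above, the following sketch routes the tuples of the triangle query R(a, b), S(b, c), T(c, a) to servers arranged in a three-dimensional grid, in the spirit of hypercube-style shuffling schemes: each tuple is replicated along the one grid dimension its relation does not constrain, so every potential triangle ends up co-located on exactly one server. The grid size, hash function, and data are placeholders, and the sketch is not tied to any particular system.

```python
# One-round "hypercube" tuple routing for the triangle query on p servers
# arranged as a d x d x d grid (p = d**3). Illustrative sketch only.

def hypercube_destinations(p, rel, tup):
    """Return the grid coordinates of the servers that receive a tuple."""
    d = round(p ** (1 / 3))
    h = lambda v: hash(v) % d          # placeholder hash function
    if rel == "R":                     # R(a, b): the c coordinate is free
        a, b = tup
        return [(h(a), h(b), z) for z in range(d)]
    if rel == "S":                     # S(b, c): the a coordinate is free
        b, c = tup
        return [(x, h(b), h(c)) for x in range(d)]
    if rel == "T":                     # T(c, a): the b coordinate is free
        c, a = tup
        return [(h(a), y, h(c)) for y in range(d)]
    raise ValueError(rel)

if __name__ == "__main__":
    # With p = 8 servers (d = 2), each tuple is sent to d = p**(1/3)
    # servers rather than to all of them; the receiving servers can then
    # join their local fragments without further communication.
    print(hypercube_destinations(8, "R", (1, 2)))
```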

3 Multi-model Data: Towards an Open Ecosystem of Data Models

Over the past 20 years, the landscape of available data has dramatically changed. While the huge amount of available data is perceived as a clear asset, exploiting this data meets the challenges of the “4 V’s” mentioned in the Introduction. One particular aspect of the variety of data is the existence and coexistence of different models for semi-structured and unstructured data, in addition to the widely used relational data model. Examples include tree-structured data (XML, JSON), graph data (RDF, property graphs, networks), tabular data (CSV), temporal and spatial data, text, and multimedia. We can expect that in the near future, new data models will arise in order to cover particular needs. Importantly, data models include not only a data structuring paradigm, but also approaches for queries, updates, integrity constraints, views, integration, and transformation, among others.

Following the success of the relational data model, originating from the close interaction between theory and practice, the PDM community has been working for many years towards understanding each one of the aforementioned models formally. Classical DB topics – schema and query languages, query evaluation and optimization, incremental processing of evolving data, dealing with inconsistency and incompleteness, data integration and exchange, etc. – have been revisited. This line of work has been successful from both the theoretical and practical points of view. As these questions are not yet fully answered for the existing data models and will be asked again whenever new models arise, it will continue to offer practically relevant theoretical challenges.

But what we view as a new grand challenge is the coexistence and interconnection of all these models, complicated further by the need to be prepared to embrace new models at any time. The coexistence of different data models resembles the fundamental problem of data heterogeneity within the relational model, which arises when semantically related data is organized under different schemas. This problem has been tackled by data integration and data exchange, but since these classical solutions were proposed, the nature of available data has changed dramatically, making the questions open again. This is particularly evident in the Web scenario, where the data not only comes in huge amounts and different formats, is distributed, and changes constantly, but also comes with very little information about its structure and almost no control over the sources. Thus, while the existence and coexistence of various data models is not new, the recent changes in the nature of available data raise a strong need for a new principled approach for dealing with different data models: an approach flexible enough to allow keeping the data in their original format (and be open for new formats), while still providing a convenient unique interface to handle data from different sources. Such an approach faces the following four specific practical challenges.

Modelling data. How does one turn raw data into a database? This used to amount to designing the right structure within the relational model. Nowadays, one has to first choose the right data models and design interactions between them. Could we go even further and create methodologies allowing engineers to design a new data model?

Understanding data. How does one make sense of the data? Previously, one could consult the structural information provided with the data.
But presently data hardly ever comes with sufficient structural information, and one has to discover its structure. Could we help users and systems understand the data without first discovering its structure in full?

Accessing data. How does one extract information? For years this meant writing an SQL query. Currently the plethora of query languages is perplexing, and each emerging data model brings new ones. How can we help users formulate queries in a more uniform way? (A toy illustration of uniform access across models is given at the end of this section.)

Processing data. How does one evaluate queries efficiently? Decades of effort brought refined methods to speed up processing of relational data; achieving similar efficiency for other data models, even the most mature ones such as XML, is still a challenge. But it is time to start thinking about processing data combining multiple models (possibly distributed and incomplete).

These practical challenges raise concrete theoretical problems, some of which go beyond the traditional scope of PDM. Within PDM, the key theoretical challenges are the following.

Schema languages. Design flexible and robust multi-model schema languages. Schema languages for XML and RDF data are standardized, and efforts are being made to create standards for JSON [90], general graph data [100], and tabular data [82, 16]. Multi-model schema languages should offer a uniform treatment of different models, the ability to describe mappings between models (implementing different views on the same data, in the spirit of data integration), and the flexibility to seamlessly incorporate new models as they emerge.

Schema extraction. Provide efficient algorithms to extract schemas from the data, or at least discover partial structural information (cf. [27, 31]). The long-standing challenge of entity resolution is exacerbated in the context of finding correspondences between data sets structured according to different models [107].


Visualization of data and metadata. Develop user-friendly paradigms for presenting the metadata information and statistical properties of the data in a way that helps in formulating queries. In an ideal solution, users would be presented with relevant information about data and metadata as they type the query. This requires understanding and defining what the relevant information in a given context is, and representing it in a way that allows efficient updates as the context changes (cf. [36, 15]).

Query languages. Go beyond bespoke query languages for the specific data models [14] and design a query language suitable for multi-model data, either incorporating the specialized query languages as sub-languages or offering a uniform approach to querying, possibly at the cost of reduced expressive power or higher complexity.

Evaluation and Optimization. Provide efficient algorithms for computing meaningful answers to a query, based on structural information about data, both inter-model and intra-model; this can be tackled either directly [70, 58] or via static optimization [24, 40]. In the context of distributed or incomplete information, even formalizing the notion of a meaningful answer is a challenge [78], as discussed in more detail in Section 4.

All these problems require strong tools from PDM and theoretical computer science in general (complexity, logic, automata, etc.). But solving them will also involve knowledge and techniques from neighboring communities. For example, the second, third, and fifth challenges naturally involve data mining and machine learning aspects (see Section 6). The first, second, and third raise knowledge representation issues (see Section 5). The first and fourth will require expertise in programming languages. The fifth is at the interface between PDM and algorithms, but also between PDM and systems. The third raises human-computer interaction issues.
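As a toy illustration of the “convenient unique interface” idea, the following sketch exposes a tabular (CSV) source and a tree-structured (JSON) source as iterators of flat tuples and combines them with a single generic join operator. The data, field names, and flattening rule are invented, and the sketch sidesteps all the schema, language, and optimization questions raised above; it only shows the kind of uniform access such an approach would aim to provide.

```python
# Uniform access to two data models: a CSV table and a JSON document are
# both exposed as iterators of flat tuples (dicts) and joined generically.

import csv, io, json

CSV_DATA = "customer_id,name\n1,Alice\n2,Bob\n"
JSON_DATA = '{"orders": [{"customer_id": "1", "total": 30.0, "items": [{"sku": "x"}]}]}'

def csv_relation(text):
    """Tabular model: each CSV row becomes one flat tuple."""
    return csv.DictReader(io.StringIO(text))

def json_relation(text, record_path):
    """Tree model: flatten one level of the JSON tree, keeping atomic fields."""
    for node in json.loads(text)[record_path]:
        yield {k: v for k, v in node.items() if not isinstance(v, (dict, list))}

def nested_loop_join(left, right, key):
    """A model-agnostic join over iterators of flat tuples."""
    right = list(right)
    for l in left:
        for r in right:
            if l.get(key) == r.get(key):
                yield {**l, **r}

if __name__ == "__main__":
    customers = csv_relation(CSV_DATA)              # tabular source
    orders = json_relation(JSON_DATA, "orders")     # tree-structured source
    for tup in nested_loop_join(customers, orders, key="customer_id"):
        print(tup)   # {'customer_id': '1', 'name': 'Alice', 'total': 30.0}
```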

4 Uncertain Information

Incomplete, uncertain, and inconsistent information is ubiquitous in data management applications. This was recognized already in the 1970s [39], and since then the significance of the issues related to incompleteness and uncertainty has been steadily growing: it is a fact of life that the data we need to handle on an everyday basis is rarely complete. However, while the data management field developed techniques specifically for handling incomplete data, their current state leaves much to be desired, both theoretically and practically. Even when evaluating SQL queries over incomplete databases – a problem one would expect to be solved after 40+ years of relational technology – one gets results that make people say “you can never trust the answers you get from [an incomplete] database” [41]. In fact we know that SQL can produce every type of error imaginable when nulls are present [77].

On the theory side, we appear to have a good understanding of what is needed in order to produce correct results: computing certain answers to queries. These are answers that are true in all complete databases that are compatible with the given incomplete database. This idea, which dates back to the late 1970s as well, has become the way of providing query answers in all applications, from classical databases with incomplete information [67] to new applications such as data integration and exchange [74, 13], consistent query answering [26], ontology-based data access [33], and others. The reason these ideas have found limited application in mainstream database systems is their complexity. Typically, answering queries over incomplete databases with certainty can be done efficiently for conjunctive queries or some closely related classes, but beyond that the complexity quickly becomes intractable (sometimes even undecidable). Since this cannot be tolerated by real-life systems, they resort to ad hoc solutions, which go for efficiency and sacrifice correctness; thus bizarre and unexpected behavior occurs.

While even basic problems related to incompleteness in relational databases remain unsolved, we now constantly deal with more varied types of incomplete and inconsistent data. A prominent example is that of probabilistic databases [103], where the confidence in a query answer is the total weight of the worlds that support the answer. Just like certain answers, computing exact answer probabilities is usually intractable, and yet it has been the focus of theoretical research. The key challenge in addressing the problem of handling incomplete and uncertain data is to provide theoretical solutions that are usable in practice. Instead of proving more impossibility results, the field should urgently address what can actually be done efficiently.
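The gap between SQL’s null semantics and certain answers can already be seen on a two-row table. In every complete database compatible with the one below, each employee’s department either is 'HR' or is not, so both employees are certain answers to the query; SQL’s three-valued logic nevertheless drops the row whose department is null. The schema and data are invented for illustration (using SQLite).

```python
# SQL evaluation vs. certain answers in the presence of nulls.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp(name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Alice", None), ("Bob", "HR")])

# In every completion of the database the condition below holds for both
# rows, so {Alice, Bob} are the certain answers.
rows = conn.execute(
    "SELECT name FROM emp WHERE dept = 'HR' OR dept <> 'HR'").fetchall()
print(rows)   # [('Bob',)] -- Alice is missing under SQL's three-valued logic
```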

Making theoretical results applicable in practice is the biggest practical challenge for incomplete and uncertain data. To move away from the focus on intractability and to produce results of practical relevance, the PDM community needs to address several challenges.

RDBMS technology in the presence of incomplete data. It must be capable of finding query answers one can trust, and do so efficiently. But how do we find good-quality query answers with correctness guarantees when we have theoretical intractability? For this we need new approximation schemes, quite different from those that have traditionally been used in the database field. Such schemes should provide guarantees that answers can be trusted, and should also be implementable using existing RDBMS technology. To make these schemes truly efficient, we need to address the issue of the performance of commercial RDBMS technology in the presence of incomplete data. Even query optimization in this case is hardly a solved problem; in fact commercial optimizers often do not perform well in the presence of nulls.

Models of uncertainty. What is provided by current practical solutions is rather limited. Looking at relational databases, we know that they try to model everything with primitive null values, but this is clearly insufficient. We need to understand the types of uncertainty that need to be modeled and introduce appropriate representation mechanisms. This, of course, will lead to a host of new challenges. How do we store/represent richer kinds of uncertain information that go well beyond nulls in RDBMSs? Applications such as integration, exchange, ontology-based data access, and others often need more (at the very least, marked nulls), and one can imagine many other possibilities (e.g., intervals for numerical values). This is closely related to the modelling data task described in Section 3.

Benchmarks for uncertain data. What should we use as benchmarks when working with incomplete/uncertain data? Quite amazingly, this has not been addressed; in fact standard benchmarks tend to just ignore incomplete data, making it hard to test the efficiency of solutions in practice.

Handling inconsistent data. How do we make handling inconsistency (in particular, consistent query answering) work in practice? How do we use it in data cleaning? Again, there are many strong theoretical results here, but they concentrate primarily on tractability boundaries and various complexity dichotomies for subclasses of conjunctive queries, rather than the practicality of query answering techniques. There is promising work on enriching theoretical repairs with user preferences [101] or ontologies [51], along the lines of approaches described in Section 5, but much more foundational work needs to be done before they can get to the level of practical tools.


Handling probabilistic data. The common models of probabilistic databases are arguably simpler and more restricted than the models studied by the Statistics and Machine Learning communities. Yet common complex models can be simulated by probabilistic databases if one can support expressive query languages [69]; hence, model complexity can be exchanged for query complexity. Therefore, it is of great importance to develop techniques for approximate query answering, on expressive query languages, over large volumes of data, with practical execution costs. While the focus of the PDM community has been on deterministic and exact solutions [103], we believe that more attention should be paid to statistical techniques with approximation guarantees, such as the sampling approach typically used by the (Bayesian) Machine Learning and Statistics communities (a toy sampling sketch is given at the end of this section). In Section 6 we further discuss the computational challenges of Machine Learning in the context of databases.

The theoretical challenges can be split into three groups.

Modeling. We need to provide a solid theoretical basis for the practical modeling challenge above; this means understanding different types of uncertainty and their representations. As with any type of information stored in databases, there are lots of questions for the PDM community to work on, related to data structures, indexing techniques, and so on. There are other challenges related to modeling data. For instance, when can we say that some data is true? This issue is particularly relevant in crowdsourcing applications [95, 61]: having data that looks complete does not yet mean it is true, as is often assumed. Yet another important issue concerns modeling query answers. How do we rank uncertain query answers? There is a tendency to divide everything into certain and non-certain answers, but this is often too coarse. The Programming Languages and Machine Learning communities have been investigating probabilistic programming [56] as a paradigm for allowing developers to easily program Machine Learning solutions. The Database community has been leading the development of paradigms for easy programming over large data volumes. As discussed in detail later in Section 6, we believe that modern needs require the enhancement of database technology with machine learning capabilities. In particular, an important challenge is to combine the two key capabilities (machine learning and data) via query languages for building statistical models, as initial efforts have already begun to do [21, 32].

Reasoning. There is much work on this subject; see Section 5 concerning the need to develop next-generation reasoning tools for data management tasks. When it comes to using such tools with incomplete and uncertain data, the key challenges are: How do we do inference with incomplete data? How do we integrate different types of uncertainty? How do we learn queries on uncertain data? What do query answers actually tell us if we run queries on data that is uncertain? That is, how can results be generalized from a concrete incomplete data set?

Algorithms. To overcome high complexity, we often need to resort to approximate algorithms, but the approximation techniques needed here are different from the standard ones used in databases, as they must not just speed up evaluation but also ensure correctness. The need for such approximations leads to a host of theoretical challenges. How do we devise such algorithms? How do we express correctness in relational data and beyond? How do we measure the quality of query answers?
How do we take user preferences into account?

While all the above are important research topics that need to be addressed, there are several that can be viewed as a priority, not least because there is an immediate connection between theory and practice. In particular, we need to pay close attention to the following issues: (1) understand what it means for answers to be right or wrong, and how to adjust standard relational technology to ensure that wrong answers are never returned to the user; (2) provide, and justify, benchmarks for working with incomplete/uncertain data; (3) devise approximation algorithms for classes of queries known to be intractable; and (4) make an effort to achieve practicality of consistent query answering, and to apply it in data cleaning scenarios.

It is worth remarking that questions about uncertain data are often considered in the context of data cleaning, under the assumption that uncertainty is caused by dirty data. The focus of data cleaning is then on eliminating uncertainty, much less on querying data that we are not completely sure about. The latter, however, cannot be dismissed, because data cleaning techniques do not always allow us to deal with uncertain data. Indeed, the fact that data is unclean is only sometimes – but by no means always – the cause of uncertainty, and the field of uncertain data covers many scenarios that data cleaning does not handle. These include the treatment of nulls in databases (which are not always due to dirty data), and probabilistic data, where uncertainty is due to the nature of the data rather than it being dirty. The closest subject to data cleaning we cover here is consistent query answering, but even then the focus is different, as one tries to see what can be meaningfully extracted from data if it cannot be fully cleaned.
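To make the sampling direction mentioned under “Handling probabilistic data” concrete, the following toy sketch estimates the probability of a Boolean join query over a tuple-independent probabilistic database by sampling possible worlds. The relations, probabilities, and query are invented, and no optimization is attempted; it only illustrates how approximation with statistical guarantees can sidestep the intractability of exact probability computation.

```python
# Monte Carlo estimation of a query probability over a tuple-independent
# probabilistic database: sample possible worlds and count how often the
# Boolean query q :- R(x, y), S(y, z) is satisfied.

import random

# Each tuple carries its marginal probability of being present.
R = [(("a", "b"), 0.9), (("a", "c"), 0.5)]          # R(x, y)
S = [(("b", "d"), 0.7), (("c", "d"), 0.4)]          # S(y, z)

def query_true(r_world, s_world):
    """Is the join of the sampled worlds non-empty?"""
    ys = {y for (_, y) in r_world}
    return any(y in ys for (y, _) in s_world)

def estimate_probability(n_samples=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        r_world = [t for t, p in R if rng.random() < p]
        s_world = [t for t, p in S if rng.random() < p]
        hits += query_true(r_world, s_world)
    return hits / n_samples

if __name__ == "__main__":
    # Exact answer for this instance: 1 - (1 - 0.9*0.7)*(1 - 0.5*0.4) = 0.704
    print(estimate_probability())
```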

5 Knowledge-enriched Data Management

Over the past two decades we have witnessed a gradual shift from a world where most data used by companies and organizations was regularly structured, neatly organized in relational databases, and treated as complete, to a world where data is heterogeneous and distributed, and can no longer be treated as complete. Moreover, not only do we have massive amounts of data; we also have very large amounts of rich knowledge about the application domain of the data, in the form of taxonomies or full-fledged ontologies, and rules about how the data should be interpreted, among other things. Techniques and tools for managing such complex information have been studied extensively in Knowledge Representation, a subarea of Artificial Intelligence. In particular, logic-based formalisms, such as description logics and different rule-based languages, have been proposed, and associated reasoning mechanisms have been developed. However, work in this area did not put a strong emphasis on the traditional challenges of data management, namely huge volumes of data, and the need to specify and perform complex operations on the data efficiently, including both queries and updates. Both practical and theoretical challenges arise when rich domain-specific knowledge is combined with large amounts of data and the traditional data management requirements, and the techniques and approaches coming from the PDM community will provide important tools to address them. We discuss first the practical challenges.

Providing end users with flexible and integrated access to data. A key requirement in dealing with complex, distributed, and heterogeneous data is to give end users the ability to directly manage such data. This is a challenge since end users might have deep expertise about a specific domain of interest, but in general are not data management experts. As a result, they are not familiar with traditional database techniques and technologies, such as the ability to formulate complex queries or update operations, possibly accessing multiple data sources over which the data might be distributed, and to understand performance implications. Ontology-based data management has been proposed recently as a general paradigm to address this challenge. It is based on the assumption that a domain ontology capturing complex knowledge can be used for data management by linking it to data sources using declarative mappings [91]. Then, all information needs and data management requirements of end users are formulated in terms of such an ontology, instead of the data sources, and are automatically translated into operations (queries and updates) over the data sources (a toy sketch of this ontology-mediated rewriting idea is given at the end of this list of challenges). Open challenges relate to the need to deal with distribution of data, to handle heterogeneity at both the intensional and extensional levels, to perform updates to the data sources via the ontology and the mappings, and in general to achieve good performance even in the presence of large ontologies, complex mappings, and huge amounts of data [33, 57, 59].

Ensuring interoperability at the level of systems exchanging data. Enriching data with knowledge is not only relevant for providing end-user access, but also enables direct interoperation between systems, based on the exchange of data and knowledge at the system level. A requirement is the definition of and agreement on standardized ontologies covering all necessary aspects of specific domains of interest, including multiple modalities such as time and space. A specific area where this is starting to play an important role is e-commerce, where standard ontologies are already available [64].

Personalized and context-aware data access and management. Information is increasingly individualized, and only fragments of the available data and knowledge might be relevant in specific situations or for specific users. It is widely acknowledged that it is necessary to provide mechanisms on the one hand for characterizing contexts (as a function of time, location, involved users, etc.), and on the other hand for defining which fragments of data and/or knowledge should be made available to users, and how such data needs to be pre-processed/filtered/modified, depending on the actual context and the knowledge available in that context. The problem is further complicated by the fact that both data and knowledge, and also contextual information, might be highly dynamic, changing while a system evolves. Heterogeneity needs to be dealt with, both with respect to the modeling formalism and with respect to the modeling structures chosen to capture a specific real-world phenomenon.

Bringing knowledge to data analytics and data extraction. Increasing amounts of data are being collected to perform complex analysis and predictions. Currently, such operations are mostly based on data in “raw” form, but there is a huge potential for increasing their effectiveness by enriching and complementing such data with domain knowledge, and leveraging this knowledge during the data analytics and extraction process. Challenges include choosing the proper formalisms for expressing knowledge about both raw and aggregated/derived data, developing knowledge-aware algorithms for data extraction and analytics, in particular for overcoming low data quality, and dealing with exceptions and outliers.

Making the management user-friendly. Systems combining large amounts of data with complex knowledge are themselves very complex, and thus difficult to design and maintain. Appropriate tools that support all phases of the life-cycle of such systems need to be designed and developed, based on novel user interfaces for the various components.
Such tools should themselves rely on the domain knowledge and on sophisticated inference services over such knowledge to improve user interaction, in particular for domain experts as opposed to IT or data management experts. Supported tasks should include design and maintenance of ontologies and mappings (including debugging support), query formulation, explanation of inference, and data and knowledge exploration [55, 73, 48, 15].
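The following toy sketch illustrates the rewriting idea at the heart of the ontology-based data management paradigm mentioned above: a query posed against an ontology class is expanded, using the class hierarchy, into a union of queries over the classes the ontology subsumes under it, and only then evaluated over the (mapped) data. The ontology, assertions, and names are invented, and real systems of course support far richer ontology and mapping languages.

```python
# Toy ontology-mediated query answering: rewrite q(x) :- C(x) into the
# union of queries over all subclasses of C, then evaluate over the data.

SUBCLASS_OF = {                      # subclass -> superclass
    "PhDStudent": "Student",
    "MScStudent": "Student",
    "Student": "Person",
}

ASSERTIONS = [                       # instance data, e.g. produced by mappings
    ("alice", "PhDStudent"),
    ("bob", "MScStudent"),
    ("carol", "Professor"),
]

def subclasses(cls):
    """All classes whose instances the taxonomy entails to belong to cls."""
    result, changed = {cls}, True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result

def certain_instances(cls):
    """Certain answers to q(x) :- cls(x) under the taxonomy."""
    rewriting = subclasses(cls)
    return sorted({x for (x, c) in ASSERTIONS if c in rewriting})

if __name__ == "__main__":
    print(certain_instances("Student"))   # ['alice', 'bob']
```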

To provide adequate solutions to the above practical challenges, several key theoretical challenges need to be addressed, requiring a blend of formal techniques and tools traditionally studied in data management with those typically adopted in knowledge representation in AI.

Development of reasoning-tuned DB systems. Such systems will require new or improved database engines optimized for reasoning over large amounts of data and knowledge, able to compute both crisp and approximate answers, and to perform distributed reasoning and query evaluation. To tune such systems towards acceptable performance, new cost models need to be defined, and new optimizations based on such cost models need to be developed.

Choosing/designing the right languages. The languages and formalisms adopted in the various components of knowledge-enriched data management systems have to support different types of knowledge and data, e.g., mixing the open and closed world assumptions, and allowing for representing temporal, spatial, and other modalities of information [34, 19, 29, 17, 87]. It is well understood that the requirements in terms of expressive power for such languages would lead to formalisms that make the various inference tasks either undecidable or highly intractable. Therefore, the choice or design of the right languages has to be pragmatically guided by user and application needs.

New measures of complexity. To appropriately assess the performance of such systems and be able to distinguish easy cases that seem to work well in practice from difficult ones, alternative complexity measures are required that go beyond traditional worst-case complexity. These might include suitable forms of average-case or parameterized complexity, complexity taking into account data distribution (on the Web), and forms of smoothed analysis.

Next-generation reasoning services. The kinds of reasoning services that become necessary in the context of knowledge-enriched data management applications go well beyond traditional reasoning studied in knowledge representation, which typically consists of consistency checking, classification, and retrieval of class instances. The forms of reasoning that are required include processing of complex forms of queries in the presence of knowledge, explanation (which can be considered as a generalization of provenance), abductive reasoning, hypothetical reasoning, inconsistency-tolerant reasoning, and defeasible reasoning to deal with exceptions. Forms of reasoning with uncertain data, such as probabilistic or fuzzy data and knowledge, will be of particular relevance, as well as meta-level reasoning. Further, it will be necessary to develop novel forms of reasoning that are able to take into account non-functional requirements, notably various measures for the quality of data (completeness, reliability, consistency), and techniques for improving data quality. While such forms of reasoning have already begun to be explored individually (see, e.g., [52, 30]), much work remains to bring them together, to incorporate them into data-management systems, and to achieve the necessary level of performance.

Incorporating temporal and dynamic aspects. A key challenge stems from the fact that data and knowledge are not static, and change over time, e.g., due to updates on the data while taking into account knowledge, forms of streaming data, and, more generally, data manipulated by processes.
Dealing with dynamicity and providing forms of inference (e.g., formal verification) in the presence of both data and knowledge is extremely challenging and will require the development of novel techniques and tools [35, 17].

In summary, incorporating domain-specific knowledge into data management is both a great opportunity and a major challenge. It opens up huge possibilities for making data-centric systems more intelligent, flexible, and reliable, but entails computational and technical challenges that need to be overcome. We believe that much can be achieved in the coming years. Indeed, the increasing interaction of the PDM and the Knowledge Representation communities has been very fruitful, particularly by attempting to understand the similarities and differences between the formalisms and techniques used in both areas, and obtaining new results building on mutual insights. Further bridging this gap through close collaboration of both areas appears to be the most promising way of fulfilling the promises of Knowledge-enriched Data Management.

6 Data Management and Machine Learning

We believe that research that combines Data Management (DM) and Machine Learning (ML) is especially important, because these fields can benefit greatly from each other. Nowadays, systems that emerge from the ML community are strong in their capabilities of statistical reasoning, and systems that emerge from the DM community are strong in their support for data semantics, maintenance, and scale. This complementarity in assets is accompanied by a difference in the core mechanisms: the PDM community has largely adopted logic-based methodologies, while the ML community has centered on probability theory and statistics. Yet, modern applications require systems that are strong in both aspects, providing thorough and sophisticated management of data while incorporating its inherent statistical nature. We envision a plethora of research opportunities at the intersection of PDM and ML. We outline several directions, which we classify into two categories: DM for ML and ML for DM.

The category DM for ML includes directions that are aimed at the enhancement of ML capabilities by exploiting properties of the data. Key challenges are as follows.

Feature Generation and Engineering. Feature engineering refers to the challenge of designing and extracting signals to provide to the general-purpose ML algorithm at hand, in order to properly perform the desired operation (e.g., classification or regression). This is a critical and time-consuming task [71], and a central theme of modern ML methodologies, such as kernel-based ML, where complex features are produced implicitly via kernel functions [97], and deep learning, where low-level features are combined into higher-level features in a hierarchical manner [25]. Unlike the usual ML algorithms that view features as numerical values, the database has access to, and an understanding of, the queries that transform raw data into these features. Thus, PDM can contribute to feature engineering in various ways, especially on a semantic level, and provide solutions to problems such as the following: How to develop effective languages for query-based feature creation? How to use such languages for designing a set of complementary, non-redundant features optimally suited for the ML task at hand? Is a given language suitable for a certain class of ML tasks? Important criteria for the goodness of a feature language include the risks of underfitting and overfitting the training data, as well as the computational complexity of evaluation (on both training and test data). The PDM community has already studied problems of a similar nature [60].

The premise of deep (neural network) learning is that the model has sufficient expressive power to work with only raw, low-level features, and to realize the process of high-level feature generation in an automated, data-driven manner [25]. This brings substantial hope for reducing the effort of manual feature engineering. Is there a general way of solving ML tasks by applying deep learning directly to the database (as has already been done, for example, with semantic hashing [94])? Can database queries (of different languages) complement neural networks by means of expressiveness and/or efficiency? And if so, where lies the boundary between the level of feature engineering and the complexity of the network?
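As a small illustration of query-based feature creation, the following sketch defines each feature declaratively as an aggregation query over the raw data and assembles feature vectors by evaluating those queries. The schema, queries, and data are invented; the open questions above concern the design and analysis of such feature languages, not this particular encoding.

```python
# Query-based feature creation: features are named aggregation queries
# over the database rather than hand-coded extraction scripts.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(customer_id INTEGER, amount REAL, ts TEXT);
    INSERT INTO orders VALUES (1, 10.0, '2016-01-02'), (1, 25.0, '2016-02-10'),
                              (2, 5.0,  '2016-01-20');
""")

FEATURE_QUERIES = {
    "n_orders":    "SELECT COUNT(*)    FROM orders WHERE customer_id = ?",
    "total_spent": "SELECT SUM(amount) FROM orders WHERE customer_id = ?",
    "max_order":   "SELECT MAX(amount) FROM orders WHERE customer_id = ?",
}

def feature_vector(customer_id):
    """Evaluate every feature query for one entity."""
    return [conn.execute(q, (customer_id,)).fetchone()[0]
            for q in FEATURE_QUERIES.values()]

if __name__ == "__main__":
    print(feature_vector(1))   # [2, 35.0, 25.0]
    print(feature_vector(2))   # [1, 5.0, 5.0]
```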

Large-Scale Machine Learning. Machine learning is nowadays applied to massive data sets, including potentially unbounded streams of data. Under such conditions, effective data management and appropriate data structures that offer the learning algorithm fast access to the data are major prerequisites for realizing model induction (at training time) and inference (at prediction time) in a time- and space-efficient manner [92]. Research in this direction has intensified in recent years and includes, for example, the use of hashing [112], Bloom filters [38], and tree-based data structures [45] in learning algorithms. As another example, lossless compression of large datasets, as featured by factorized databases [89], has been shown to dramatically reduce the execution cost of machine-learning tasks. Also related is work on distributed machine learning, where data storage and computation are spread over a network of distributed units [6], and on the support of machine learning by data stream management systems [84].
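As one illustration of such data structures, the sketch below implements the classical feature-hashing ("hashing trick") idea in the spirit of [112]: feature names from a sparse, potentially unbounded vocabulary are folded into a fixed-size vector, so memory stays bounded even over a stream. The dimensionality and the toy records are invented; real systems use far larger vectors and faster hash functions.

```python
import hashlib

DIM = 16  # fixed number of buckets; real systems use 2**18 or more

def hash_features(token_counts, dim=DIM):
    """Map a sparse {feature_name: value} dict into a dense vector of size dim.

    A signed hash is used so that collisions tend to cancel in expectation,
    which is the standard trick from the feature-hashing literature.
    """
    vec = [0.0] * dim
    for name, value in token_counts.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign * value
    return vec

# Two toy records from a hypothetical stream of documents.
stream = [
    {"word:database": 3, "word:learning": 1},
    {"word:learning": 2, "word:stream": 5},
]
for record in stream:
    print(hash_features(record))
```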
Complexity Analysis. The PDM community has established strong machinery for fine-grained analysis of query complexity; see, e.g., [9]. Complexity analysis of this granularity is highly desirable for the ML community, especially for analyzing learning algorithms that involve various parameters, such as input/output dimension and the number of training examples [68]. Results along this direction, connecting DM query complexity and ML training complexity, have recently been shown [96].

The motivation for the directions in the second category, ML for DM, is that of strengthening core data-management capabilities with ML. Traditionally, data management systems have supported a core set of querying operators (e.g., relational algebra, grouping and aggregate functions, recursion) that are regarded as the common requirement of applications. We believe that this core set should be revisited and, specifically, that it should be extended with common ML operators. As a prominent example, motivated by the proliferation of available and valuable textual resources, various formalisms have been proposed for incorporating text extraction into a relational model [53, 98]. However, unlike structured data, textual resources are associated with a high level of uncertainty, due to the uncontrolled nature of the content and the imprecise nature of natural language processing. Therefore, ML techniques are required to distill reliable information from text. We believe that incorporating ML is a natural evolution for PDM. Database systems that incorporate statistics and ML have already been developed [99, 12]. Query languages have traditionally been designed with an emphasis on being declarative: a query states how the answer should logically relate to the database, not how it is to be computed algorithmically. Incorporating ML introduces a higher level of declarativity, where one states how the end result should behave (via examples), but not necessarily which query is deployed for the task. In that spirit, we propose the following directions for relevant PDM research.

Unified Models. An important role of the PDM community is to establish common formalisms and semantics for the database community. An important opportunity, therefore, is to establish the "relational algebra" of data management systems with built-in ML and statistics operators.

Lossy Optimization. From the early days, the focus of the PDM community has been on lossless optimization, that is, optimization that leaves the end result intact. As mentioned in Section 2, in some scenarios it makes sense to apply lossy optimization that guarantees only an approximation of the true answer. Incorporating ML into the query model gives further opportunities for lossy optimization, as training paradigms are typically associated with built-in quality (or "risk") functions. Hence, we may consider reducing the execution cost if it entails only a bounded impact on the quality of the end result [8]. For example, Riondato et al. [93] develop a method for random sampling of a database for estimating the selectivity of a query: given a class of queries, the execution of any query in that class on the sample provides an accurate estimate of the selectivity of the query on the original large database (a concrete sampling sketch appears at the end of this section).

Confidence Estimation. Once statistical and ML components are incorporated in a data management system, it becomes crucial to properly estimate the confidence in query answers [99], as such confidence estimates offer a principled way of controlling the balance between precision and recall. An important direction is therefore to establish probabilistic models that capture the combined process and make it possible to estimate the probabilities of end results. For example, by applying the notion of the Vapnik-Chervonenkis dimension, an important theoretical concept in generalization theory, to database queries, Riondato et al. [93] provide accurate bounds for their selectivity estimates that hold with high probability; moreover, they show that the error bound holds simultaneously for the selectivity estimates of all queries in the query class. In general, this direction can leverage the past decade of research on probabilistic databases [104], which can be combined with theoretical frameworks of machine learning, such as PAC (Probably Approximately Correct) learning [110].

Altogether, we have a plethora of research problems on improving machine learning with data management techniques (DM for ML), and on strengthening data management technologies with capabilities of machine learning (ML for DM). The required methodologies and formal foundations span a variety of related fields, such as logic, formal languages, computational complexity, statistical analysis, and distributed computing. We have phrased the directions in theoretical terms; obviously, however, each of them comes with the practical challenge of devising effective solutions over real systems and on real-life datasets and benchmarks.
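To make the sampling-based estimation discussed under Lossy Optimization concrete, here is a minimal sketch (a simplified stand-in, not the algorithm of [93]) that estimates the selectivity of a selection predicate from a uniform sample and attaches a standard Hoeffding confidence interval; the relation and the predicate are invented.

```python
import math
import random

random.seed(0)

# A hypothetical relation: 100,000 tuples with an "age" attribute.
relation = [{"age": random.randint(0, 99)} for _ in range(100_000)]

def predicate(t):
    # Hypothetical selection condition whose selectivity we want to estimate.
    return 30 <= t["age"] < 40

def estimate_selectivity(rel, pred, sample_size=2_000, delta=0.05):
    """Estimate selectivity from a uniform sample, with a Hoeffding bound.

    With probability at least 1 - delta, the true selectivity lies within
    eps = sqrt(ln(2/delta) / (2 * sample_size)) of the returned estimate.
    """
    sample = random.sample(rel, sample_size)
    estimate = sum(1 for t in sample if pred(t)) / sample_size
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))
    return estimate, eps

true_sel = sum(1 for t in relation if predicate(t)) / len(relation)
est, eps = estimate_selectivity(relation, predicate)
print(f"true selectivity = {true_sel:.4f}")
print(f"estimate         = {est:.4f}  (within +/- {eps:.4f} w.p. >= 0.95)")
```

The VC-dimension-based analysis of [93] strengthens such per-query guarantees so that a single sample is simultaneously accurate for every query in a class.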

7 Process and Data

Many forms of data evolve over time, and most processes access and modify data sets. Industry works with massive volumes of evolving data, primarily in the form of transactional systems and Business Process Management (BPM) systems. Research into basic questions about systems that combine process and data has been growing over the past decade, including the development of several formal models, frameworks for comparing their expressive power, approaches to support verification of behavioral properties, and query languages for process schemas and instances.

Over the past half century, computer science research has studied foundational issues of process and of data mainly as separate phenomena. In recent years, data and process have been studied together in two significant areas: scientific workflows and data-aware BPM [66]. Scientific workflows focus on enabling repeatability and reliability of processing flows involving large sets of scientific data. In the 1990s and the first decade of the 2000s, foundational research in this area helped to establish the basic frameworks for supporting these workflows, to enable the systematic recording and use of provenance information, and to support systems for exploration that involve multiple runs of a workflow with varying configurations [43]. The work on scientific workflows can also play a role in enabling process support for big data analytics, especially as industry begins to create analytics flows that can be repeated, with relatively minor variation, across multiple applications and clients.

Foundational work on data-aware BPM was launched in the mid-2000s [28, 47], enabled in part by IBM's "Business Artifacts" model for business processes [88], which combines data and process in a holistic manner. Deutch and Milo [46] provide a survey and comparison of several of the most important early models and results on process and data. One variant of the business artifact model, which is formally defined around logic rather than Petri nets, has provided the conceptual basis for the recent OMG Case Management Model and Notation standard [81]. Importantly, the artifact-based perspective has formed the basis for a vibrant body of work centered on verification of systems that support processes involving large-scale data [35, 47]. The artifact-based perspective is also beginning to enable a more unified management of the interaction between business processes and legacy data systems [105]. Notably, there is strong overlap between the artifact-based approach and core building blocks of the "shared ledger" approach to supporting business (and individual) interactions around the exchange of goods and services, as embodied initially by the Blockchain paradigm of Bitcoin [108].

Foundational work in the area of process and data has the potential for continued and expanded impact in the following six practical challenge areas.

Automating manual processes. Most business processes still rely on substantial manual effort. In the case of "back-office" processing, Enterprise Resource Planning systems such as SAP automatically perform the bulk of the work, e.g., for applications in finance and human resource management. But there are still surprisingly many "ancillary processes" that are performed manually, e.g., to process new bank accounts or newly hired employees. Moreover, business processes that involve substantial human judgement, such as complex sales activities or the transition of IT services from one provider to another, are handled today in largely ad hoc and manual ways, with spreadsheets as the workflow management tool of choice.

Evolution and migration of Business Processes. Managing change of business processes remains largely manual, highly expensive, time-consuming, and risk-prone. This includes deployment of new business process platforms, evolution of business processes, and integration of business processes after mergers.

Business Process compliance and correctness. Compliance with government regulations and corporate policies is a rapidly growing challenge, e.g., as governments attempt to enforce policies around financial stability and data privacy. Ensuring compliance is largely manual today, and involves understanding how regulations can impact or define portions of business processes, and then verifying that process executions will comply.

Business Process interaction and interoperation. Managing business processes that flow across enterprise boundaries has become increasingly important with the globalization of business and the splintering of business activities across numerous companies. While routine services such as banking money transfers are largely automated, most interactions between businesses are less standardized and require substantial manual effort to set up, maintain, and troubleshoot. The recent industrial interest in shared ledger technologies highlights the importance of this area and provides new motivation for developing foundational results for data-aware processes.
Business Process discovery and understanding. The field of Business Intelligence, which provides techniques for mining and analyzing information about business operations, is essential to business success. Today this field is based on a broad variety of largely ad hoc and manual techniques [44], with associated costs and potential for error. One important direction on understanding processes focuses on viewing process schemas and process instances as data, and on enabling declarative query languages against them [20]. More broadly, techniques from Multi-model Data Management (Section 3), Data Management and Machine Learning (Section 6), and Uncertain Data (Section 4) are all relevant here because of, respectively, the heterogeneity of data about and produced by processes, the importance of anticipating and mitigating undesirable outcomes, and the fact that the information stored about processes is often incomplete.

Workflow and Business Process usability. The operations of medium- and large-sized enterprises are highly complex, a situation enabled in part by the power of computers to manage huge volumes of data, transactions, and processing at tremendously high speeds. This raises questions relating to Managing Data at Scale (Section 2). Furthermore, enabling humans to understand and work effectively with large numbers of processes remains elusive, especially when considering the interactions between process, data (both newly created and legacy), resources, the workforce, and business partners.

The practical BPM challenges above raise key research challenges that need to be addressed using approaches that include mathematical and algorithmic frameworks and tools.

Verification and Static Analysis. Because of the infinite state space inherent in data-aware processes [35, 47], verification currently relies on faithful abstractions that reduce the problem to classical finite-state model checking. However, the work to date can only handle restricted classes of applications, and research is needed to develop more powerful abstractions enabling a variety of static analysis tasks for realistic data-aware processes. Incremental verification techniques are needed, as well as techniques that enable modular styles of verification supporting "plug and play" approaches. This research will be relevant to the first four practical challenges.

Tools for Design and Synthesis. Formal languages (e.g., context-free languages) had a profound impact on compiler theory and programming languages. Dependency theory and normal forms had a profound impact on relational database design. But there is still no robust framework that supports principled design of business processes in the larger context of data, resources, and workforce. Primitive operators for creating and modifying data-aware process schemas will be an important starting point; the ultimate goal is partial or full synthesis of processes from requirements, goals, and/or regulations. This research will be relevant to the first, second, fourth, and sixth practical challenges.

Models and semantics for views, interaction, and interoperation. The robust understanding of database views has enabled advances in simplification of data access, data sharing, exchange, integration, and privacy, as well as in query optimization. A robust theory of views for data-aware business processes has similar potential. For example, it could support a next generation of data-aware service composition techniques that include practical verification capabilities. Frameworks that enable comparison of process models (e.g., [3]) can provide an important starting point for this research. This research will be relevant to all of the practical challenges.

Analytics for Business Processes. The new, more holistic perspective of data-aware processes can help to provide a new foundation for the field of business intelligence.
This can include new approaches for instrumenting processes to simplify data discovery [80], and new styles of modularity and hierarchy in both the processes and the analytics on them.
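To make the idea of treating process executions as queryable data (mentioned under Business Process discovery above) slightly more concrete, here is a minimal sketch over an invented event-log schema; the table, activities, and queries are hypothetical and merely indicate the style of declarative analytics such a foundation could support.

```python
import sqlite3

# A hypothetical event log: one row per step executed by a process instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_log (case_id INTEGER, activity TEXT, ts INTEGER);
    INSERT INTO event_log VALUES
        (1, 'receive_order', 1), (1, 'check_credit', 2), (1, 'ship', 5),
        (2, 'receive_order', 1), (2, 'check_credit', 3),
        (3, 'receive_order', 2), (3, 'ship', 4);
""")

# Declarative questions over process instances, expressed as ordinary SQL:
# (1) how long did each case take so far, and
# (2) which cases shipped without a recorded credit check?
duration_query = """
    SELECT case_id, MAX(ts) - MIN(ts) AS duration
    FROM event_log GROUP BY case_id
"""
missing_check_query = """
    SELECT DISTINCT case_id FROM event_log AS e
    WHERE activity = 'ship'
      AND NOT EXISTS (SELECT 1 FROM event_log AS c
                      WHERE c.case_id = e.case_id AND c.activity = 'check_credit')
"""
print("case durations:", conn.execute(duration_query).fetchall())
print("shipped without credit check:", conn.execute(missing_check_query).fetchall())
```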

Research in process and data will require on-going extensions of the traditional approaches, on both the database and the process-centric sides. New approaches may include models for the creation and maintenance of interoperation between (enterprise-run) services; semi-structured and unstructured forms of data-aware business processes (cf. NoSQL); new abstractions to enable verification over infinite-state systems; and new ways to apply machine learning. More broadly, a new foundational model for modern BPM may emerge, one that builds on the artifact and shared-ledger approaches but facilitates a multi-perspective understanding, analogous to the way relational algebra and calculus provide two perspectives on data querying. One cautionary note is that research in the area of process and data is currently hampered by a lack of large sets of examples, e.g., sets of process schemas that include explicit specifications concerning data, and process histories that record how data sets were used and affected. Finally, increased collaboration between PDM researchers, applied BPM researchers, and businesses would enable more rapid progress towards resolving the concrete problems in BPM faced by industry today.

8 Human-Related Data and Ethics

More and more "human-related" data is being generated massively, in particular on the Web and in phone apps. Massive data analysis, using data parallelism and machine learning techniques, is applied to this data to generate yet more data. We, individually and collectively, are losing control over this data. We do not know the answers to questions as important as: Is my medical data really available so that I get proper treatment? Is it properly protected? Can a private company like Google or Facebook influence the outcome of national elections? Should I trust the statistics I find on the Web about the crime rate in my neighborhood?

Although we keep eagerly consuming and enjoying ever more Web services and phone apps, we have growing concerns about criminal behavior on the Web, including racist, terrorist, and pedophile sites; identity theft; cyber-bullying; and cyber crime. We also feel growing resentment against intrusive government practices, such as massive e-surveillance even in democratic countries, and against aggressive company behaviors, such as invasive marketing, unexpected personalization, and cryptic or discriminatory business decisions.

The societal impact of big data technologies is receiving significant attention in the popular press [11], and is under active investigation by policy makers [85] and legal scholars [22]. It is broadly recognized that this technology has the potential to improve people's lives, accelerate scientific discovery and innovation, and bring about positive societal change. It is also clear that the same technology can, in effect, undermine businesses' faithfulness to legal and ethical norms. And while many of the issues are political and economic, technology solutions must play an important role in enabling our society to reap ever-greater benefits from big data while keeping it safe from the risks. We believe that the main inspiration for the data management field in the 21st century comes from the management of human-related data, with an emphasis on solutions that satisfy ethical requirements. In the remainder of this section, we present several facets of ethical data management.

Responsible Data Analysis. Human-related data analysis needs to be "responsible", that is, guided by humanistic considerations and not simply by performance or by the quest for profit. The notion of responsible data analysis is considered in general terms in [102] and was the subject of a recent Dagstuhl seminar [4]. We now outline several important aspects of the problem, especially those where we see opportunities for involvement by PDM.


Fairness. Responsible data analysis requires that both the raw data and the computation be "fair", i.e., not biased [50]. There is currently no consensus as to which classes of fairness measures, and which specific formulations, are appropriate for various data analysis tasks. Work is needed to formalize the measures and to understand the relationships between them (a small illustrative computation of two common measures is sketched at the end of this section).

Transparency and accountability. Responsible data analysis practices must be transparent [42, 106], allowing a variety of stakeholders, such as end-users, commercial competitors, policy makers, and the public, to scrutinize the data collection and analysis processes, and to interpret the outcomes. Interesting research challenges that can be tackled by PDM include using provenance to shed light on data collection and analysis practices, supporting semantic interrogation of data analysis methods and pipelines, and providing explanations in various contexts, including knowledge-based systems and deep learning.

Diversity. Big data technology poses significant risks to those it overlooks [75]. Diversity [7, 49] requires that not all attention be devoted to a limited set of objects, actors, or needs. The PDM community can contribute, for instance, to understanding the connections between diversity and fairness, and to developing methods for managing trade-offs between diversity and conventional measures of accuracy.

Verifying Data Responsibility. A grand challenge for the community is to develop verification technology that enables a new era of responsible data. One can envision research towards developing tools that help users understand data analysis results (e.g., on the Web) and verify them. One can also envision tools that help analysts, who are typically neither computer scientists nor experts in statistics, to realize responsible data analysis "by design".

Data Quality and Access Control on the Web. The evaluation of data quality on the Web is an issue of paramount importance when our lives are increasingly guided and determined by data found on the Web. We would like to know whether we can trust particular data we find there. Research is also needed towards supporting access control on the Web. It may build, for instance, on cryptography, blockchain technology, or distributed access control [83].

Personal Information Management Systems. A Personal Information Management System is a (cloud) system that manages all the information of a person. By returning part of the control over data to the person, such systems tend to better protect privacy, to re-balance the relationship between individuals and the major internet companies in the individuals' favor, and, in general, to facilitate the protection of ethical values [2].

Ethical data management raises new issues for computer science in general and for data management in particular. Because the data of interest is typically human-related, the research also draws on other sciences, notably cognitive science, psychology, neuroscience, linguistics, sociology, and political science. The ethics component also leads to philosophical considerations. In this setting, researchers have a chance for major societal impact, and so they need to interact with policy makers and regulators, as well as with the media and user organizations.
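As mentioned under Fairness above, the following sketch computes two widely used (and sometimes conflicting) group-fairness measures, the demographic-parity difference and the disparate-impact ratio, for a toy set of binary decisions; the data and attribute names are invented, and these two measures are only examples of the many formulations discussed in the fairness literature [50].

```python
# Toy decisions: (protected_group_member, positive_outcome)
decisions = [
    (True, True), (True, False), (True, False), (True, True),
    (False, True), (False, True), (False, False), (False, True),
]

def positive_rate(records, in_group):
    """Fraction of positive outcomes among members (or non-members) of the group."""
    outcomes = [outcome for member, outcome in records if member == in_group]
    return sum(outcomes) / len(outcomes)

rate_protected = positive_rate(decisions, True)   # 2/4 = 0.50
rate_other = positive_rate(decisions, False)      # 3/4 = 0.75

# Demographic parity asks the two rates to be (nearly) equal;
# disparate impact compares them as a ratio (the "80% rule" uses 0.8 as a threshold).
parity_difference = rate_protected - rate_other   # -0.25
impact_ratio = rate_protected / rate_other        # ~0.67

print(f"positive rate (protected group): {rate_protected:.2f}")
print(f"positive rate (other group):     {rate_other:.2f}")
print(f"demographic parity difference:   {parity_difference:+.2f}")
print(f"disparate impact ratio:          {impact_ratio:.2f}")
```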

9 Looking Forward

As illustrated in the preceding sections, the principled, mathematically based approach to the study of data management problems is providing conceptual foundations, deep insights, and much-needed clarity. This report describes a representative, but by no means exhaustive, family of areas where research on the Principles of Data Management (PDM) can help to shape our overall approach to working with data as it arises across an increasingly broad array of application areas.

The Dagstuhl workshop highlighted two important trends that have been accelerating in the PDM community over the past several years. The first is the increasing embrace of neighboring disciplines, especially Machine Learning, Statistics, Probability, and Verification, both to help resolve new challenges and to bring new perspectives to them. The second is the increased focus on obtaining positive results that enable the use of mathematically based insights in practical settings. We expect and encourage these trends to continue in the coming years.

The PDM community should also continue reinforcing its mutually beneficial relationship with the Data Management Systems community. Our joint conferences (SIGMOD/PODS and EDBT/ICDT) put us in a unique situation in Computer Science, where foundational and systems researchers can interact and present their best work to each other. PDM researchers should redouble their efforts to actively search for important problems that need a principled approach. Likewise, the organisers of the respective conferences should continue to develop a forum that stimulates the interaction between foundational and systems research.

The need for precise and robust approaches to increasingly varied forms of data management continues to intensify, given the fundamental and transformational role of data in our modern society, and given the continued expansion of technical, conceptual, and ethical data management challenges. There is an associated and on-going expansion in the family of approaches and techniques that will be relevant to PDM research. The centrality of data management across numerous application areas is an opportunity both for PDM researchers to embrace techniques and perspectives from adjoining research areas, and for researchers from other areas to incorporate techniques and perspectives from PDM. Indeed, we hope that this report can substantially strengthen cross-disciplinary research between PDM and neighboring theoretical communities and, moreover, with the applied and systems research communities across the many application areas that rely on data in one form or another.

References

1 Daniel Abadi, Rakesh Agrawal, Anastasia Ailamaki, Magdalena Balazinska, Philip A. Bernstein, Michael J. Carey, Surajit Chaudhuri, Jeffrey Dean, AnHai Doan, Michael J. Franklin, Johannes Gehrke, Laura M. Haas, Alon Y. Halevy, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Donald Kossmann, Samuel Madden, Sharad Mehrotra, Tova Milo, Jeffrey F. Naughton, Raghu Ramakrishnan, Volker Markl, Christopher Olston, Beng Chin Ooi, Christopher Ré, Dan Suciu, Michael Stonebraker, Todd Walter, and Jennifer Widom. The Beckman report on database research. Commun. ACM, 59(2):92–99, 2016. doi:10.1145/2845915. 2 Serge Abiteboul, Benjamin André, and Daniel Kaplan. Managing your digital life. Commun. ACM, 58(5):32–35, 2015. 3 Serge Abiteboul, Pierre Bourhis, and Victor Vianu. Comparing workflow specification languages: A matter of views. ACM Trans. Database Syst., 37(2):10, 2012. 4 Serge Abiteboul, Gerome Miklau, Julia Stoyanovich, and Gerhard Weikum. Data, Responsibly (Dagstuhl Seminar 16291). Dagstuhl Reports, 6(7):42–71, 2016. doi:10.4230/DagRep.6.7.42. 5 Foto N. Afrati and Jeffrey D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282–1298, 2011. 6 Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, and John Langford. A reliable effective terascale linear learning system. Journal of Machine Learning Research, 15:1111–1133, 2014.


7 Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In International Conference on Web Search and Web Data Mining (WSDM), pages 5–14. ACM, 2009. 8 Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. The case for predictive database systems: Opportunities and challenges. In Conference on Innovative Data Systems Research (CIDR), pages 167–174. www.cidrdb.org, 2011. 9 Antoine Amarilli, Pierre Bourhis, and Pierre Senellart. Provenance circuits for trees and treelike instances. In International Colloquium on Automata, Languages, and Programming (ICALP), volume 9135 of LNCS, pages 56–68. Springer, 2015. 10 Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and transferability for conjunctive queries. In Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, pages 47–58, 2015. doi: 10.1145/2745754.2745759. 11 Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, May 2016. URL: https://www.propublica.org/article/ machine-bias-risk-assessments-in-criminal-sentencing. 12 Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. Design and implementation of the LogicBlox system. In International Conference on Management of Data (SIGMOD), pages 1371–1382. ACM, 2015. 13 Marcelo Arenas, Pablo Barceló, Leonid Libkin, and Filip Murlak. Foundations of Data Exchange. Cambridge University Press, 2014. 14 Marcelo Arenas, Georg Gottlob, and Andreas Pieris. Expressive languages for querying the semantic web. In Symposium on Principles of Database Systems (PODS), pages 14–26. ACM, 2014. 15 Marcelo Arenas, Bernardo Cuenca Grau, Evgeny Kharlamov, Sarunas Marciuska, and Dmitriy Zheleznyakov. Faceted search over RDF-based knowledge graphs. J. Web Sem., 37:55–74, 2016. 16 Marcelo Arenas, Francisco Maturana, Cristian Riveros, and Domagoj Vrgoc. A framework for annotating CSV-like data. Proceedings of the VLDB Endowment, 9(11), 2016. 17 Alessandro Artale, Roman Kontchakov, Vladislav Ryzhikov, and Michael Zakharyaschev. A cookbook for temporal conceptual data modelling with description logics. ACM Trans. on Computational Logic, 15(3):25:1–25:50, 2014. doi:10.1145/2629565. 18 Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. SIAM J. Comput., 42(4):1737–1767, 2013. 19 Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artificial Intelligence, 175(9–10):1620– 1654, 2011. 20 Eran Balan, Tova Milo, and Tal Sterenzy. BP-Ex: a uniform query engine for business process execution traces. In International Conference on Extending Database Technology (EDBT), pages 713–716. ACM, 2010. 21 Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative probabilistic programming with datalog. In International Conference on Database Theory (ICDT), volume 48 of LIPIcs, pages 7:1–7:19. Schloss Dagstuhl–LZI, 2016. 22 Solon Barocas and Andrew D. Selbst. Big data’s disparate impact. California Law Review, 104, 2016. URL: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899. 23 Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Symposium on Principles of Database Systems (PODS), pages 273–284. ACM, 2013. S. Abiteboul et al. 25

24 Michael Benedikt, Wenfei Fan, and Floris Geerts. XPath satisfiability in the presence of DTDs. J. ACM, 55(2), 2008. 25 Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013. 26 Leopoldo Bertossi. Database Repairing and Consistent Query Answering. Mor- gan&Claypool Publishers, 2011. 27 Geert Jan Bex, Frank Neven, Thomas Schwentick, and Stijn Vansummeren. Inference of concise regular expressions and DTDs. ACM Trans. Database Syst., 35(2), 2010. 28 K. Bhattacharya, C.E. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact-centric business process models. In International Conference on Business Process Management (BPM), volume 4714 of LNCS, pages 288–304. Springer, 2007. 29 Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-based data access: A study through Disjunctive Datalog, CSP, and MMSNP. ACM Trans. Database Syst., 39(4):33:1–33:44, 2014. doi:10.1145/2661643. 30 Stefan Borgwardt, Felix Distel, and Rafael Peñaloza. The limits of decidability in fuzzy description logics with general concept inclusions. Artificial Intelligence, 218:23–55, 2015. doi:10.1016/j.artint.2014.09.001. 31 Michael J. Cafarella, Dan Suciu, and Oren Etzioni. Navigating extracted data with schema discovery. In International Workshop on the Web and Databases (WebDB), 2007. 32 Zhuhua Cai, Zografoula Vagena, Luis Leopoldo Perez, Subramanian Arumugam, Peter J. Haas, and Christopher M. Jermaine. Simulation of database-valued markov chains using simsql. In International Conference on Management of Data (SIGMOD), pages 637–648. ACM, 2013. 33 Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric- cardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007. 34 Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Conjunctive query con- tainment and answering under description logics constraints. ACM Trans. on Computa- tional Logic, 9(3):22.1–22.31, 2008. 35 Diego Calvanese, Giuseppe De Giacomo, and Marco Montali. Foundations of data-aware process analysis: a database theory perspective. In Symposium on Principles of Database Systems (PODS), pages 1–12. ACM, 2013. doi:10.1145/2463664.2467796. 36 Sejla Cebiric, François Goasdoué, and Ioana Manolescu. Query-oriented summarization of RDF graphs. Proceedings of the VLDB Endowment, 8(12):2012–2015, 2015. URL: http: //www.vldb.org/pvldb/vol8/p2012-cebiric.pdf. 37 Shumo Chu, Magdalena Balazinska, and Dan Suciu. From theory to practice: Efficient join query evaluation in a parallel database system. In International Conference on Management of Data (SIGMOD), pages 63–78. ACM, 2015. 38 Moustapha Cissé, Nicolas Usunier, Thierry Artieres, and Patrick Gallinari. Robust Bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems (NIPS), 2013. 39 E. F. Codd. Understanding relations (installment #7). FDT - Bulletin of ACM SIGMOD, 7(3):23–28, 1975. 40 Wojciech Czerwinski, Wim Martens, Pawel Parys, and Marcin Przybylko. The (almost) complete guide to tree pattern containment. In Symposium on Principles of Database Systems (PODS), pages 117–130. ACM, 2015. 41 Chris J. Date. Database in Depth – Relational Theory for Practitioners. O’Reilly, 2005.


42 Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings. PoPETs, 2015(1):92–112, 2015. URL: http://www.degruyter.com/view/ j/popets.2015.1.issue-1/popets-2015-0007/popets-2015-0007.xml. 43 Susan B. Davidson and Juliana Freire. Provenance and scientific workflows: Challenges and opportunities. In International Conference on Management of Data (SIGMOD), pages 1345–1350. ACM, 2008. 44 Umeshwar Dayal, Malú Castellanos, Alkis Simitsis, and Kevin Wilkinson. Data integra- tion flows for business intelligence. In International Conference on Extending Database Technology (EDBT), pages 1–11. ACM, 2009. 45 K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In International Conference on Machine Learning (ICML), pages 279–286. Omnipress, 2010. 46 Daniel Deutch and Tova Milo. A quest for beauty and wealth (or, business processes for database researchers). In Symposium on Principles of Database Systems (PODS), pages 1–12. ACM, 2011. 47 Alin Deutsch, Richard Hull, and Victor Vianu. Automatic verification of database-centric systems. SIGMOD Record, 43(3):5–17, 2014. doi:10.1145/2694428.2694430. 48 Zlatan Dragisic, Patrick Lambrix, and Eva Blomqvist. Integrating ontology debugging and matching into the eXtreme design methodology. In Workshop on Ontology and Semantic Web Patterns (WOP), volume 1461 of CEUR Workshop Proceedings, 2015. URL: http: //ceur-ws.org/Vol-1461/WOP2015_paper_1.pdf. 49 Marina Drosou and Evaggelia Pitoura. DisC diversity: result diversification based on dissimilarity and coverage. Proceedings of the VLDB Endowment, 6(1):13–24, 2012. URL: http://www.vldb.org/pvldb/vol6/p13-drosou.pdf. 50 Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science (ITCS), pages 214–226. ACM, 2012. 51 Thomas Eiter, Thomas Lukasiewicz, and Livia Predoiu. Generalized consistent query an- swering under existential rules. In International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 359–368. AAAI Press, 2016. 52 Corinna Elsenbroich, Oliver Kutz, and Ulrike Sattler. A case for abductive reasoning over ontologies. In International Workshop on OWL (OWLED), volume 216 of CEUR Workshop Proceedings, 2006. URL: http://ceur-ws.org/Vol-216/submission_25.pdf. 53 Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document span- ners: A formal approach to information extraction. J. ACM, 62(2):12, 2015. 54 Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Clifford Stein, and Zoya Svitkina. On distributing symmetric streaming computations. In Symposium on Discrete Algorithms (SODA), pages 710–719. SIAM, 2008. 55 Enrico Franconi, Paolo Guagliardo, Marco Trevisan, and Sergio Tessaris. Quelo: an ontology-driven query interface. In Workshop on Description Logics (DL), volume 745 of CEUR Workshop Proceedings, 2011. URL: http://ceur-ws.org/Vol-745/paper_58.pdf. 56 Noah D. Goodman. The principles and practice of probabilistic programming. In Sympo- sium on Principles of Programming Languages (POPL), pages 399–402. ACM, 2013. 57 Georg Gottlob, Stanislav Kikot, Roman Kontchakov, Vladimir V. Podolskii, Thomas Schwentick, and Michael Zakharyaschev. The price of query rewriting in ontology-based data access. Artificial Intelligence, 213:42–59, 2014. doi:10.1016/j.artint.2014.04.004. 58 Georg Gottlob, Christoph Koch, and Reinhard Pichler. 
Efficient algorithms for processing XPath queries. ACM Trans. Database Syst., 30(2):444–491, 2005.

59 Georg Gottlob, Giorgio Orsi, and Andreas Pieris. Query rewriting and optimization for ontological databases. ACM Trans. Database Syst., 39(3):25:1–25:46, 2014. doi:10.1145/ 2638546. 60 Georg Gottlob and Pierre Senellart. Schema mapping discovery from data instances. J. ACM, 57(2), 2010. doi:10.1145/1667053.1667055. 61 Benoît Groz, Tova Milo, and Sudeepa Roy. On the complexity of evaluating order queries with the crowd. IEEE Data Eng. Bull., 38(3):44–58, 2015. URL: http://sites.computer. org/debull/A15sept/p44.pdf. 62 Peter J. Haas and Joseph M. Hellerstein. Ripple joins for online aggregation. In Interna- tional Conference on Management of Data (SIGMOD), pages 287–298. ACM, 1999. 63 Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online aggregation. In Interna- tional Conference on Management of Data (SIGMOD), pages 171–182. ACM, 1997. 64 Martin Hepp. The web of data for e-commerce: Schema.org and GoodRelations for re- searchers and practitioners. In International Conference on Web Engineering (ICWE), vol- ume 9114 of LNCS, pages 723–727. Springer, 2015. doi:10.1007/978-3-319-19890-3_66. 65 Xiao Hu and Ke Yi. Towards a worst-case i/o-optimal algorithm for acyclic joins. In Symposium on Principles of Database Systems (PODS). ACM, 2016. 66 R. Hull and J. Su. NSF Workshop on Data-Centric Workflows, May, 2009. URL: http: //dcw2009.cs.ucsb.edu/report.pdf. 67 Tomasz Imielinski and Witold Lipski. Incomplete information in relational databases. J. ACM, 31(4):761–791, 1984. 68 Kalina Jasinska, Krzysztof Dembczynski, , Robert Busa-Fekete, Karlson Pfannschmidt, Timo Klerx, and Eyke Hüllermeier. Extreme F-measure maximization using sparse prob- ability estimates. In International Conference on Machine Learning (ICML). JMLR.org, 2016. 69 Abhay Kumar Jha and Dan Suciu. Probabilistic databases with MarkoViews. Proceedings of the VLDB Endowment, 5(11):1160–1171, 2012. 70 Mark Kaminski and Egor V. Kostylev. Beyond well-designed SPARQL. In International Conference on Database Theory (ICDT), volume 48 of LIPIcs, pages 5:1–5:18. Schloss Dagstuhl – LZI, 2016. 71 Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. Enterprise data analysis and visualization: An interview study. IEEE Trans. Vis. Comput. Graph., 18(12):2917–2926, 2012. 72 Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-case optimal algorithms for parallel query processing. In International Conference on Database Theory (ICDT), volume 48 of LIPIcs, pages 8:1–8:18. Schloss Dagstuhl – LZI, 2016. 73 Domenico Lembo, José Mora, Riccardo Rosati, Domenico Fabio Savo, and Evgenij Thorstensen. Mapping analysis in ontology-based data access: Algorithms and complexity. In International Semantic Web Conference (ISWC), volume 9366 of LNCS, pages 217–234. Springer, 2015. doi:10.1007/978-3-319-25007-6_13. 74 Maurizio Lenzerini. Data integration: a theoretical perspective. In ACM Symposium on Principles of Database Systems (PODS), pages 233–246. ACM, 2002. 75 Jonas Lerman. Big data and its exclusions. Stanford Law Review Online, 66, 2013. 76 Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. Wander join: Online aggregation via random walks. In International Conference on Management of Data (SIGMOD), pages 615–629. ACM, 2016. 77 L. Libkin. SQL’s three-valued logic and certain answers. ACM Trans. Database Syst., 41(1):1, 2016. 78 Leonid Libkin. Certain answers as objects and knowledge. Artificial Intelligence, 232:1–19, 2016.


79 W. Lipski. On semantic issues connected with incomplete information databases. ACM Trans. Database Syst., 4(3):262–296, 1979. 80 Rong Liu, Roman Vaculín, Zhe Shan, Anil Nigam, and Frederick Y. Wu. Business artifact- centric modeling for real-time performance monitoring. In International Conference on Business Process Management (BPM), pages 265–280, 2011. 81 Mike Marin, Richard Hull, and Roman Vaculín. Data-centric BPM and the emerging Case Management standard: A short survey. In Business Process Management Workshops, pages 24–30, 2012. 82 Wim Martens, Frank Neven, and Stijn Vansummeren. SCULPT: A schema language for tabular data on the web. In International Conference on World Wide Web (WWW), pages 702–720. ACM, 2015. 83 Vera Zaychik Moffitt, Julia Stoyanovich, Serge Abiteboul, and Gerome Miklau. Collabora- tive access control in WebdamLog. In International Conference on Management of Data (SIGMOD), pages 197–211. ACM, 2015. 84 G. De Francisci Morales and A. Bifet. SAMOA: Scalable advanced massive online analysis. Journal of Machine Learning Research, 16:149–153, 2015. 85 Cecilia Muñoz, Megan Smith, and DJ Patil. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, The White House, May 2016. URL: https://www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_ data_discrimination.pdf. 86 Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algo- rithms: [extended abstract]. In Symposium on Principles of Database Systems (PODS), pages 37–48. ACM, 2012. 87 Nhung Ngo, Magdalena Ortiz, and Mantas Simkus. Closed predicates in description log- ics: Results on combined complexity. In International Conference on the Principles of Knowledge Representation and Reasoning (KR), pages 237–246. AAAI Press, 2016. URL: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12906. 88 A. Nigam and N.S. Caswell. Business Artifacts: An Approach to Operational Specification. IBM Systems Journal, 42(3), 2003. 89 Dan Olteanu and Jakub Závodný. Size bounds for factorised representations of query results. ACM Trans. Database Syst., 40(1):2, 2015. doi:10.1145/2656335. 90 Felipe Pezoa, Juan L. Reutter, Fernando Suarez, Martín Ugarte, and Domagoj Vrgoc. Foundations of JSON schema. In International Conference on World Wide Web (WWW), pages 263–273. ACM, 2016. 91 Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. on Data Semantics, X:133– 173, 2008. doi:10.1007/978-3-540-77688-8_5. 92 Yashoteja Prabhu and Manik Varma. FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 263–272. ACM, 2014. 93 Matteo Riondato, Mert Akdere, Ugur Cetintemel, Stanley B. Zdonik, and Eli Upfal. The vc-dimension of SQL queries and selectivity estimation through sampling. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), volume 6912 of LNCS, pages 661–676. Springer, 2011. 94 Ruslan Salakhutdinov and Geoffrey E. Hinton. Semantic hashing. Int. Journal of Approx- imate Reasoning, 50(7):969–978, 2009. 95 Akash Das Sarma, Aditya G. Parameswaran, and Jennifer Widom. Towards globally op- timal crowdsourcing quality management: The uniform worker setting. In International Conference on Management of Data (SIGMOD), pages 47–62, 2016. doi:10.1145/2882903. 2882953. S. Abiteboul et al. 29

96 Maximilian Schleich, Dan Olteanu, and Radu Ciucanu. Learning linear regression models over factorized joins. In International Conference on Management of Data (SIGMOD), pages 3–18. ACM, 2016. doi:10.1145/2882903.2882939. 97 B. Schölkopf and AJ. Smola. Learning with Kernels: Support Vector Machines, Regulariza- tion, Optimization, and Beyond. MIT Press, 2001. 98 Warren Shen, AnHai Doan, Jeffrey F. Naughton, and Raghu Ramakrishnan. Declarative information extraction using datalog with embedded extraction predicates. In International Conference on Very Large Data Bases (VLDB), pages 1033–1044. ACM, 2007. 99 Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. Incremental knowledge base construction using deepdive. Proceedings of the VLDB En- dowment, 8(11):1310–1321, 2015. URL: http://www.vldb.org/pvldb/vol8/p1310-shin.pdf. 100 Slawek Staworko, Iovka Boneva, José Emilio Labra Gayo, Samuel Hym, Eric G. Prud’hommeaux, and Harold R. Solbrig. Complexity and expressiveness of shex for RDF. In International Conference on Database Theory (ICDT), volume 31 of LIPIcs, pages 195– 211. Schloss Dagstuhl – LZI, 2015. 101 Slawek Staworko, Jan Chomicki, and Jerzy Marcinkowski. Prioritized repairing and con- sistent query answering in relational databases. Ann. Math. Artif. Intell., 64(2-3):209–246, 2012. 102 Julia Stoyanovich, Serge Abiteboul, and Gerome Miklau. Data responsibly: Fairness, neutrality and transparency in data analysis. In International Conference on Extending Database Technology (EDBT), pages 718–719. OpenProceedings.org, 2016. 103 D. Suciu, D. Olteanu, C. Re, and C. Koch. Probabilistic Databases. Morgan&Claypool Publishers, 2011. 104 Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. 105 Y. Sun, J. Su, and J. Yang. Universal artifacts. ACM Trans. on Management Information Systems, 7(1), 2016. 106 Latanya Sweeney. Discrimination in online ad delivery. Commun. ACM, 56(5):44–54, 2013. doi:10.1145/2447976.2447990. 107 Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. Learning schema mappings. ACM Trans. Database Syst., 38(4):28, 2013. 108 Florian Tschorsch and Björn Scheuermann. Bitcoin and beyond: A technical survey on decentralized digital currencies. Cryptology ePrint Archive, Report 2015/464, 2015. 109 Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103– 111, 1990. 110 LG. Valiant. A theory of the learnable. Commun. ACM, 17(11):1134–1142, 1984. 111 Todd L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In International Conference on Database Theory (ICDT), pages 96–106. OpenProceedings.org, 2014. 112 K.Q. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hash- ing for large scale multitask learning. In International Conference on Machine Learning (ICML), pages 1113–1120. ACM, 2009.
