A Survey of Current Property Graph Query Languages Peter Boncz (CWI)
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
IDC TechScape: Internet of Things Analytics and Information Management
IDC TechScape: Internet of Things Analytics and Information Management
Maureen Fleming, Stewart Bond, Carl W. Olofson, David Schubmehl, Dan Vesset, Chandana Gopal, Carrie Solinger
IDC TECHSCAPE FIGURE
FIGURE 1 IDC TechScape: Internet of Things Analytics and Information Management, Current Adoption Patterns
Note: The IDC TechScape represents a snapshot of various technology adoption life cycles, given IDC's current market analysis. Expect, over time, for these technologies to follow the adoption curve on which they are currently mapped. Source: IDC, 2016
December 2016, IDC #US41841116
IN THIS STUDY
Implementing the analytics and information management (AIM) tier of an Internet of Things (IoT) initiative is about the delivery and processing of sensor data, the insights that can be derived from that data and, at the moment of insight, initiating actions that should then be taken to respond as rapidly as possible. To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier needs to be designed for the shortest time window of IoT workloads running through the end-to-end system. It is also critical that the correct type of analytics is used to arrive at the insight. Over time, AIM technology adopted for IoT will be different from an organization's existing technology investments that perform a similar but less time-sensitive or data volume–intensive function. Enterprises will want to leverage as much of their existing AIM investments as possible, especially initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality gaps in how data is moved and managed, how analytics are applied, and how actions are defined and triggered at the moment of insight. -
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
MACIEJ BESTA, Department of Computer Science, ETH Zurich
ROBERT GERSTENBERGER, Department of Computer Science, ETH Zurich
EMANUEL PETER, Department of Computer Science, ETH Zurich
MARC FISCHER, PRODYNA (Schweiz) AG
MICHAŁ PODSTAWSKI, Future Processing
CLAUDE BARTHELS, Department of Computer Science, ETH Zurich
GUSTAVO ALONSO, Department of Computer Science, ETH Zurich
TORSTEN HOEFLER, Department of Computer Science, ETH Zurich
Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). -
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Towards Understanding Modern Graph Processing, Storage, and Analytics
MACIEJ BESTA, EMANUEL PETER, Department of Computer Science, ETH Zurich
ROBERT GERSTENBERGER, Department of Computer Science, ETH Zurich
MARC FISCHER, PRODYNA (Schweiz) AG
MICHAŁ PODSTAWSKI, Future Processing
CLAUDE BARTHELS, GUSTAVO ALONSO, Department of Computer Science, ETH Zurich
TORSTEN HOEFLER, Department of Computer Science, ETH Zurich
Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 45 graph database systems are presented and compared, including Neo4j, OrientDB, or Virtuoso. -
Storage, Indexing, Query Processing, and Partitioning in State-Of-The-Art Distributed RDF Engines
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 May 2020 doi:10.20944/preprints202005.0360.v2
STORAGE, INDEXING, QUERY PROCESSING, AND BENCHMARKING IN CENTRALIZED AND DISTRIBUTED RDF ENGINES: A SURVEY
A PREPRINT
Waqas Ali, Department of Computer Science and Engineering, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai, China
Muhammad Saleem, Agile Knowledge and Semantic Web (AKWS), University of Leipzig, Leipzig, Germany
Bin Yao, Department of Computer Science and Engineering, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai, China
Axel-Cyrille Ngonga Ngomo, University of Paderborn, Paderborn, Germany
May 30, 2020
ABSTRACT
The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. There is a huge adoption of the Resource Description Framework (RDF) format for saving of web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ different mechanisms to implement key components of the query processing engines such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on the query runtime. For example, the storage of RDF data in various ways significantly affects the data storage space required and the query runtime performance. The type of indexing approach used in RDF engines is key for fast data lookup. The type of the underlying querying language (e.g., SPARQL or SQL) used for query execution is a key optimization component of the RDF storage solutions. -
A Survey of RDF Stores & SPARQL Engines for Querying Knowledge
Noname manuscript No. (will be inserted by the editor)
A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs
Waqas Ali · Muhammad Saleem · Bin Yao · Aidan Hogan · Axel-Cyrille Ngonga Ngomo
Received: date / Accepted: date
Abstract The RDF graph-based data model has seen ever-broadening adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques, engines and benchmarks for querying RDF knowledge graphs. While other reviews on this topic tend to focus on the distributed setting, the main focus of the work is on providing a comprehensive survey of state-of-the-art storage, indexing and query processing techniques for efficiently evaluating SPARQL queries in a local setting (on one machine). To keep the survey self-contained, we also provide a short discussion on graph partitioning techniques used in the distributed setting.
1 Introduction
The Resource Description Framework (RDF) is a graph-based data model where triples of the form $(s, p, o)$ denote directed labeled edges $s \xrightarrow{p} o$ in a graph. RDF has gained significant adoption in the past years, particularly on the Web. As of 2019, over 5 million websites publish RDF data embedded in their webpages [30]. RDF has also become a popular format for publishing knowledge graphs on the Web, the largest of which – including Bio2RDF, DBpedia, PubChemRDF, UniProt, and Wikidata – contain billions of triples. These developments have brought about the need for optimized techniques and engines for querying large RDF -
Download (888.22
Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs
This paper was downloaded from TechRxiv (https://www.techrxiv.org).
LICENSE: CC BY 4.0
SUBMISSION DATE / POSTED DATE: 06-04-2021 / 09-04-2021
CITATION: Ali, Waqas; Yao, Bin; Saleem, Muhammad; Hogan, Aidan; Ngomo, A.-C. Ngonga (2021): Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.14376884.v1
DOI: 10.36227/techrxiv.14376884.v1
Noname manuscript No. (will be inserted by the editor)
A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs
Waqas Ali · Muhammad Saleem · Bin Yao · Aidan Hogan · Axel-Cyrille Ngonga Ngomo
Received: date / Accepted: date
Abstract Recent years have seen the growing adoption of non-relational data models for representing diverse, incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of local and distributed engines for processing queries over RDF graphs. These engines implement a diverse range of specialized techniques for storage, indexing, and query processing.
1 Introduction
The Resource Description Framework (RDF) is a graph-based data model where triples of the form $(s, p, o)$ denote directed labeled edges $s \xrightarrow{p} o$ in a graph. RDF has gained significant adoption in the past years, particularly on the Web. As of 2019, over 5 million websites publish RDF data embedded in their webpages [27]. RDF has also become a popular format for publishing knowledge graphs on the Web, the largest of which – -
Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: a Survey
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 28 August 2020 doi:10.20944/preprints202005.0360.v3
Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey
Waqas Ali, Department of Computer Science and Engineering, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai, China
Muhammad Saleem, Agile Knowledge and Semantic Web (AKWS), University of Leipzig, Leipzig, Germany
Bin Yao, Department of Computer Science and Engineering, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai, China
Aidan Hogan, IMFD; Department of Computer Science (DCC), Universidad de Chile, Santiago, Chile
Axel-Cyrille Ngonga Ngomo, University of Paderborn, Paderborn, Germany
ABSTRACT
The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. There is significant adoption of the Resource Description Framework (RDF) format for saving of web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ various mechanisms to implement critical components of the query processing engines such as data storage, indexing, language support, and query execution.
Keywords: Storage, Indexing, Language, Query Planning, SPARQL Translation, Centralized RDF Engines, Distributed RDF Engines, SPARQL Benchmarks, Survey.
1. INTRODUCTION
Over recent years, the simple, decentralized, and linked architecture of Resource Description Framework (RDF) data has greatly attracted different data providers who store their data in the RDF format.
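
Several of the excerpts above describe RDF data as triples of the form (s, p, o), each read as a directed labeled edge from s to o, and note that indexing matters for fast lookup. The following is a minimal Python sketch of that idea only, not the method of any listed paper or engine. The sample data and the names `TRIPLES`, `build_index`, and `match` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each tuple is (subject, predicate, object),
# i.e. a directed edge subject -> object labeled with the predicate.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "cwi"),
]


def build_index(triples):
    """Group triples by subject and by predicate so lookups avoid a full scan."""
    by_subject = defaultdict(list)
    by_predicate = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))
        by_predicate[p].append((s, p, o))
    return by_subject, by_predicate


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    by_subject, by_predicate = build_index(TRIPLES)
    print(match(TRIPLES, p="knows"))   # full scan with a predicate filter
    print(by_predicate["knows"])       # same result via the predicate index
```

The scan in `match` touches every triple, while the dictionaries built by `build_index` return the same matches by a single key lookup. This is a toy illustration of the storage versus lookup trade-off the abstracts mention, and real engines use far more elaborate index layouts.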