A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs Using Gremlin Traversals

Semantic Web 0 (0) 1, IOS Press

Harsh Thakkar a,*, Dharmen Punjani b, Yashwant Keswani c, Jens Lehmann a,d and Sören Auer e

a Smart Data Analytics, University of Bonn, Germany, E-mail: {thakkar, jens.lehmann}@cs.uni-bonn.de
b Department, National and Kapodistrian University of Athens, Greece, E-mail: [email protected]
c DA-IICT, India, E-mail: [email protected]
d Fraunhofer IAIS, Germany, E-mail: [email protected]
e TIB Technische Informationsbibliothek & L3S Research Center, Leibniz University of Hannover, Germany, E-mail: [email protected]

Abstract. Knowledge graphs have become popular over the past years and frequently rely on the Resource Description Framework (RDF) or Property Graphs (PG) as underlying data models. However, the query languages for these two data models, SPARQL for RDF and Gremlin for property graph traversal, lack interoperability. We present Gremlinator, a novel SPARQL to Gremlin translator. Gremlinator translates SPARQL queries to Gremlin traversals for executing graph pattern matching queries over graph databases. This makes it possible to access and query a wide variety of Graph Data Management Systems (DMS) using the W3C standardized SPARQL query language and to avoid the learning curve of a new Graph Query Language. Gremlin is a system-agnostic traversal language covering both OLTP graph databases and OLAP graph processors, thus making it a desirable choice for supporting interoperability with respect to querying Graph DMSs. We present a comprehensive empirical evaluation of Gremlinator and demonstrate its validity and applicability by executing SPARQL queries on top of the leading graph stores Neo4J, Sparksee and Apache TinkerGraph and compare the performance with the RDF stores Virtuoso, 4Store and JenaTDB.
Our evaluation demonstrates the substantial performance gain obtained by the Gremlin counterparts of the SPARQL queries, especially for star-shaped and complex queries.

Keywords: SPARQL, Gremlin, Pattern Matching, Graph Traversal, Query Translator, RDF Graph, Property Graph, Gremlinator

arXiv:1801.02911v2 [cs.DB] 12 Feb 2018

*Corresponding author. E-mail: {thakkar, jens.lehmann}@cs.uni-bonn.de.

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

1. Introduction

Knowledge graphs have become increasingly popular over the past years. The two most popular data models for representing and storing knowledge graphs are property graphs (PG) and the Resource Description Framework (RDF). For RDF, the SPARQL query language was standardized by W3C, whereas for PGs several languages are frequently used, including Gremlin [1]. Both data models and the corresponding data management techniques have distinct and complementary characteristics: RDF is suited for distributed data integration with built-in world-wide unique identifiers and the expressive SPARQL query language; PGs, on the other hand, support extremely scalable storage and querying and are by now widely used for modern Web applications.

In this article, we present Gremlinator, an approach for executing SPARQL queries over graph databases via Gremlin traversals. It builds a bridge between the currently still largely disjoint semantic and graph data technology ecosystems and thereby addresses the query interoperability problem.

A SPARQL-PG query translation renders several benefits: (1) Applications based on W3C Semantic Web standards, like SPARQL and RDF, can use property graph databases in a non-intrusive fashion. (2) The query translation lays the foundation for a hybrid use of RDF triple stores and property graph DMS, where a particular query can be dispatched to the DMS capable of answering the query more efficiently [2].
In particular, property graph databases have been shown to work very well for a wide range of queries which benefit from locality in a graph. Rather than performing expensive joins, property graph databases use micro indices to perform traversals. (3) Users familiar with the W3C SPARQL query language can avoid learning another query language.

To the best of our knowledge, this is the first work addressing the query interoperability (translation) problem. Related work (cf. Section 2) mostly covers the area of SPARQL to SQL conversion and vice versa. In contrast to those previous efforts, we have to overcome the challenge of mediating between two very different execution paradigms: while SPARQL uses pattern matching techniques, Gremlin is based on performing graph traversals. More specifically, previous efforts applied query rewriting techniques between formalisms which are ultimately rooted in relational algebra operations, whereas we had to bridge more disparate query paradigms. While this is a significant challenge, it is also the reason why substantial performance improvements can be made depending on the query characteristics: whereas direct SPARQL query execution can be expected to be suitable for large analytical joins over the entire dataset, the Gremlin conversion can significantly speed up all queries that require exploiting the graph locality.

We selected TinkerPop Gremlin as target language, since it is more general than, e.g., CYPHER, as it is supported by a wide range of property graph databases, including OLTP and OLAP processors (see Figure 1 (a)). Moreover, Gremlin supports both the imperative (graph traversal) and declarative (graph pattern matching) style [1], for addressing the query interoperability issue. Lastly, together with the Apache TinkerPop framework, Gremlin is a language and a virtual machine, which makes it possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the JVM), cf. Figure 1 (b).

Figure 1. The Gremlin Traversal Language and Machine.

We map SPARQL queries to the pattern matching Gremlin traversals (i.e. we map declarative SPARQL queries to declarative Gremlin constructs and not the imperative ones). This ensures a level of fairness when comparing the performance of both Graph Query Languages (GQLs). Furthermore, instead of translating SPARQL queries to a specific dialect of Gremlin (e.g. Gremlin-Java8, Gremlin-Python etc.), we map each corresponding operation within a SPARQL basic graph pattern (BGP) to its corresponding traversal step in the Gremlin instruction library (i.e. a single step traversal operation). As a result, we build complex pattern matching traversals, in an analogous fashion to SPARQL style querying wherein multiple BGPs form complex graph patterns (CGP). Thus, it is possible to construct a corresponding Gremlin traversal for each SPARQL query.

Overall, we make the following contributions:

– We propose a novel approach for mapping SPARQL queries to Gremlin pattern matching traversals, which is, to the best of our knowledge, the first work converting an RDF query language to a property graph query language.
– Our Gremlinator implementation for executing SPARQL queries over a plethora of third party graph DMS such as Neo4J, Sparksee, OrientDB, etc. using the Apache TinkerPop framework is openly available.
– We report the results of a comprehensive empirical evaluation of the proposed translation approach comprising a variety of state-of-the-art property graph databases and triple stores on the Northwind and BSBM datasets.

The remainder of the article is organized as follows: Section 2 covers related query conversion efforts. Section 3 introduces preliminary notions. Section 4 describes the relationship between SPARQL graph pattern matching and Gremlin traversal steps. Section 5 explains our mapping approach. Section 6 presents the experimental evaluation on two well-known datasets and discusses the results and observations. Finally, Section 7 concludes the article and describes future work.

2. Related Work

In this section we briefly survey the related work with regard to techniques and tools that support the translation and execution of GQLs.

SPARQL → SQL: A substantial amount of work has been done on the conversion of SPARQL queries to SQL queries [3–8]. Ontop [3] exposes relational databases as virtual RDF graphs by linking the terms (classes and properties) in the ontology to the data sources through mappings. This virtual RDF graph can then be queried using SPARQL by dynamically and transparently translating the SPARQL queries into SQL queries over the relational databases. The work presented in [4] generates SQL that is optimized and also provides a well-defined specification of the SPARQL semantics used in the translation. In addition, Ontop also supports R2RML mappings over general relational schemas. The authors show that their implementation can outperform other well known SPARQL-to-SQL systems, as well as commercial triple stores, by a large margin. In [5] a SPARQL-to-SQL translation technique is introduced that focuses on the generation of efficient SQL queries. It relies on a mapping language that lacks support for URI templates and is less expressive than R2RML. [6] proposes a translation function that takes a query and two many-to-one [...]

SQL → CYPHER: CYPHER is the graph query language used to query the Neo4j graph database. There has been no work yet aiming to convert the RDBMS to CYPHER. However, there are some examples that show the equivalent CYPHER queries for certain SQL queries.

3. Preliminaries

In this section, we recall and summarize the mathematical concepts which will be used in this article. Our notation closely follows [10] and extends [11] by introducing the notion of vertex labels, a detailed discussion of which can be found in [12].

3.1. Graph Data Models

3.1.1. Edge-labeled Graphs.

The Resource Description Framework (RDF) is a well-known W3C standard, which is used for data modeling and encoding machine readable content on the Web [13] and within intranets. An RDF graph can be seen as a set of triples, roughly analogous to nodes and edges in a graph database. However, RDF is more specific in defining disjoint vertex-sets of blank nodes, literals and IRIs.
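
The triple view of an RDF graph described above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the article: the class names `IRI`, `BlankNode` and `Literal`, the helper `vertex_sets`, and the `http://example.org/` namespace are all assumptions introduced for this example. It stores a graph as a set of (subject, predicate, object) triples and splits the terms found in subject or object position into three disjoint vertex sets.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


# Hypothetical term types; frozen dataclasses are hashable, and two terms
# of different types never compare equal, so the vertex sets stay disjoint.
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Subject = Union[IRI, BlankNode]
Object = Union[IRI, BlankNode, Literal]
Triple = Tuple[Subject, IRI, Object]  # (subject, predicate, object)

EX = "http://example.org/"  # assumed example namespace

# An RDF graph seen as a set of triples: each triple is one labeled edge.
graph: Set[Triple] = {
    (IRI(EX + "alice"), IRI(EX + "knows"), IRI(EX + "bob")),
    (IRI(EX + "alice"), IRI(EX + "name"), Literal("Alice")),
    (IRI(EX + "bob"), IRI(EX + "address"), BlankNode("b0")),
    (BlankNode("b0"), IRI(EX + "city"), Literal("Bonn")),
}


def vertex_sets(triples: Set[Triple]):
    """Split subject/object terms into IRIs, blank nodes, and literals."""
    iris, blanks, literals = set(), set(), set()
    for subject, _predicate, obj in triples:
        for term in (subject, obj):
            if isinstance(term, IRI):
                iris.add(term)
            elif isinstance(term, BlankNode):
                blanks.add(term)
            else:
                literals.add(term)
    return iris, blanks, literals


iris, blanks, literals = vertex_sets(graph)
print(len(iris), len(blanks), len(literals))
```

In this toy representation the predicate plays the role of the edge label, which is why the section heading speaks of edge-labeled graphs; a property graph would additionally attach key-value properties to vertices and edges.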