Practicalgremlin.Pdf

Total Page:16

File Type:pdf, Size:1020Kb

Practicalgremlin.Pdf PRACTICAL GREMLIN An Apache TinkerPop Tutorial Kelvin R. Lawrence Version 283-preview, August 2nd 2021 Table of Contents 1. INTRODUCTION . 1 1.1. How this book came to be. 1 1.2. Providing feedback. 2 1.3. Some words of thanks . 2 1.4. What is this book about?. 2 1.5. Introducing the book sources, sample programs and data . 4 1.6. A word about TinkerPop 3.4 . 5 1.7. Introducing TinkerPop 3.5 . 6 1.8. So what is a graph database and why should I care? . 6 1.9. A word about terminology . 8 2. GETTING STARTED . 10 2.1. What is Apache TinkerPop? . 10 2.2. The Gremlin console . 11 2.2.1. Downloading, installing and launching the console . 11 2.2.2. Saving output from the console to a file . 15 2.3. Introducing TinkerGraph . 16 2.4. Introducing the air-routes graph . 18 2.4.1. Updated versions of the air-route data . 22 2.5. TinkerPop 3 migration notes . 22 2.5.1. Creating a TinkerGraph TP2 vs TP3 . 22 2.5.2. Loading a graphML file TP2 vs TP3 . 22 2.5.3. A word about the TinkerPop.sugar plugin . 23 2.6. Loading the air-routes graph using the Gremlin console . 24 2.7. Turning off some of the Gremlin console’s output. 25 2.8. A word about indexes and schemas. 25 3. WRITING GREMLIN QUERIES . 27 3.1. Introducing Gremlin . 27 3.1.1. A quick look at Gremlin and SQL . 27 3.2. Some fairly basic Gremlin queries . 29 3.2.1. Retrieving property values from a vertex . 31 3.2.2. Does a specific property exist on a given vertex or edge? . 32 3.2.3. Counting things . 33 3.2.4. Counting groups of things. 34 3.3. Starting to walk the graph . 36 3.3.1. Some simple graph traversal examples . 37 3.3.2. What vertices and edges did I visit? - Introducing path . 38 3.3.3. Modifying a path using from and to modulators. 41 3.3.4. Does an edge exist between two vertices? . 44 3.3.5. Using as, select and project to refer to traversal steps . 44 3.3.6. Using multiple as steps with the same label. 47 3.3.7. Returning selected parts of a path . 49 3.3.8. Examining the edge between two vertices . 49 3.4. Limiting the amount of data returned . 50 3.4.1. Retrieving a range of vertices . 52 3.4.2. Removing duplicates - introducing dedup. 54 3.5. Using valueMap to explore the properties of a vertex or edge . 55 3.5.1. Changes to valueMap introduced in TinkerPop 3.4 . 57 3.6. An alternative to valueMap - introducing elementMap . 60 3.7. Assigning query results to a variable . 61 3.7.1. Introducing toList, toSet, bulkSet and fill . 63 3.8. Working with IDs . 65 3.9. Working with labels. 67 3.10. Using the local step to make sure we get the result we intended . 69 3.11. Basic statistical and numerical operations . 71 3.12. Testing values and ranges of values . 73 3.12.1. Using between to simulate startsWith . 77 3.12.2. Refining flight routes analysis using not, neq, within and without. 79 3.12.3. Using coin and sample to sample a dataset. 81 3.12.4. Using Math.random to more randomly select a single vertex . 82 3.13. New text search predicates added in TinkerPop 3.4 . 85 3.13.1. startingWith . 85 3.13.2. endingWith. 87 3.13.3. containing. 87 3.13.4. notStartingWith . ..
Recommended publications
  • Introduction to Hbase Schema Design
    Introduction to HBase Schema Design AmANDeeP KHURANA Amandeep Khurana is The number of applications that are being developed to work with large amounts a Solutions Architect at of data has been growing rapidly in the recent past . To support this new breed of Cloudera and works on applications, as well as scaling up old applications, several new data management building solutions using the systems have been developed . Some call this the big data revolution . A lot of these Hadoop stack. He is also a co-author of HBase new systems that are being developed are open source and community driven, in Action. Prior to Cloudera, Amandeep worked deployed at several large companies . Apache HBase [2] is one such system . It is at Amazon Web Services, where he was part an open source distributed database, modeled around Google Bigtable [5] and is of the Elastic MapReduce team and built the becoming an increasingly popular database choice for applications that need fast initial versions of their hosted HBase product. random access to large amounts of data . It is built atop Apache Hadoop [1] and is [email protected] tightly integrated with it . HBase is very different from traditional relational databases like MySQL, Post- greSQL, Oracle, etc . in how it’s architected and the features that it provides to the applications using it . HBase trades off some of these features for scalability and a flexible schema . This also translates into HBase having a very different data model . Designing HBase tables is a different ballgame as compared to relational database systems . I will introduce you to the basics of HBase table design by explaining the data model and build on that by going into the various concepts at play in designing HBase tables through an example .
    [Show full text]
  • Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects
    Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects Philipp Seifer Johannes Härtel Martin Leinberger University of Koblenz-Landau University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Software Languages Team Institute WeST Koblenz, Germany Koblenz, Germany Koblenz, Germany [email protected] [email protected] [email protected] Ralf Lämmel Steffen Staab University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Koblenz, Germany Koblenz, Germany University of Southampton [email protected] Southampton, United Kingdom [email protected] Abstract including project and domain specific ones. Common applica- Graph data models are interesting in various domains, in tion domains are management systems and data visualization part because of the intuitiveness and flexibility they offer tools. compared to relational models. Specialized query languages, CCS Concepts • General and reference → Empirical such as Cypher for property graphs or SPARQL for RDF, studies; • Information systems → Query languages; • facilitate their use. In this paper, we present an empirical Software and its engineering → Software libraries and study on the usage of graph-based query languages in open- repositories. source Java projects on GitHub. We investigate the usage of SPARQL, Cypher, Gremlin and GraphQL in terms of popular- Keywords Empirical Study, GitHub, Graphs, Query Lan- ity and their development over time. We select repositories guages, SPARQL, Cypher, Gremlin, GraphQL based on dependencies related to these technologies and ACM Reference Format: employ various popularity and source-code based filters and Philipp Seifer, Johannes Härtel, Martin Leinberger, Ralf Lämmel, ranking features for a targeted selection of projects.
    [Show full text]
  • Sample File CREDITS Lead Designer, Concept, Writing, and Art Direction Artwork Marc Altfuldisch the Artwork in This Handbook Is All Created by the Artists Below
    Sample file CREDITS Lead Designer, Concept, Writing, and Art Direction Artwork Marc Altfuldisch The artwork in this handbook is all created by the artists below. A huge thanks goes out to them, for allowing me to include their illustrations herein. Balance and Flavor If you find their artwork intriguing, you should check out their galleries, which are linked Marc Altfuldisch below. Thomas Thorhave Baltzer George Cameron Alecyl Sandara David Moore alecyl.deviantart.com sandara.deviantart.com Bog Hag Page 3: Tree with Animals Editing Tsunami George Cameron Arturo Delgado Shaman with Animal Spirits madstalfos.deviantart.com Goi-Kashu Playtesters Jiki-Ketsu-Gaki Two-Tailed White Inari Adam Ford Con-Tinh Oriental Sea Life Alejandro Villalon Celestial Dragon and Human Bailey Kellenberger Carp Dragon and Human Girl Branden Weaver Dave Melvin davesrightmind.deviantart.com Celestial Dragon Bryan Butler Typhoon Dragon Kumo Derik Snell Byoki Spawn Nian Elvin Johson Kyoso Spawn Ashi no Oni George Cameron Sanru no Oni Haino no Oni Gianfranco Abbatemarco Akuma Kamu no Oni James “Dragon Lover” Hudson Byoki Ugulu no Oni Jason Gyorog Nikoma Kyoso John “Crit God” Wilantowicz Onikage Shikibu Jonathan Butler Phoenix Tsuburo Joseph Miller Great Wyrm Torn Kenneth Robinson Spirit Wolf Urigarimono Parker Doiron Taka-onna Yaoguai Pete McCue Wang-Liang Raymond Govero Robert “Wrayyth” Whitsell Harley Dela Cruz Shizen1102 Steven “Nook” Anderson denzelberg.deviantart.com shizen1102.deviantart.com Samantha Christine Bajang Garegosu Tre Stoterau Ancient Dokufu Void Serpent Victor Vega Dokufu Spiderling Kirin Vijay Dukkipati Goblin Rat Orochi Hellbeast Manananggal ... thank you all very much! Your assistance made this all possible! Hsing-sing Matriarch Hundun Hyekuhei Teo Tei Drone A Very Special Thanks to Kappa Tao Tei Regent Eleazzaar’s Detect Balance Werebadger Taowu Drone with Taowu Scouts SwordMeow’s JOTUNGARD Maho-Tsukai Magus Qiongqui Mamono ..
    [Show full text]
  • American Motors ~I
    American Motors ~I 1972 AMASPECIFICATIONS FORM . PassengerCar 1.-ANU FA C T URER CAR NA ME A ME RICAN M OT ORS CORPORATION • Grem li n • Ma t a dor • Jave li n • Horne t • Ambassa dor IAA I L I N G A OORESS MODEL. YEAR ISSUED : 14250 P l ymouth Road Det r o it, Mich igan 4823 2 1972 Se d:ember 2 1. 1971 At tn: Car L Chakmak ian REVISED (e ) P r oduc t Informat ion Dep t., 493 - 2557 (3 13) The information contained herein Is prepared, distributed by, and Is solely the responsi­ bility of the automobile manufacturing company to whose products it relates. Questions concerning these specifications should be directed to the manufacturer whose address Is shown above. This specification form was developed by automobile manufacturing companies under the auspices of the Automobile Manufacturers Association. AMA-40A-72 AMA Spec ificat ions Form-Pas senger Car TABLE OF CONTENTS BODY MODEL . ... .... .. ............... .. .... ......... ...... .... ...... , .... .. 1 CAR AND BODY DIMENSIONS ...... • .. .• ... , • •. ..• •. •.•.••. • •• •• .. .•• • . , ••. .. , , .•• 2, 3 POWER TEAMS .. ... .. ..... ... ........ .. ..... .. ..... ............... .... ........ 4 ENGINE .. ..................................... .. ..... .. .. .. ..... .... ... .. .. 5-9 EXHAUST SYSTEM ............................ ... ......• . ......................•... 9 FUEL SYSTEM ........................................ ... ...... ...... ...... .•..... 10 COOL ING SYSTEM ........................ .. ....... .... .. ....... ..... ........ 11 VEHICLE EMISSION CONTROL .... .. • ...
    [Show full text]
  • Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions
    00 Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions MUTAZ BARIKA, University of Tasmania SAURABH GARG, University of Tasmania ALBERT Y. ZOMAYA, University of Sydney LIZHE WANG, China University of Geoscience (Wuhan) AAD VAN MOORSEL, Newcastle University RAJIV RANJAN, Chinese University of Geoscienes and Newcastle University Interest in processing big data has increased rapidly to gain insights that can transform businesses, government policies and research outcomes. This has led to advancement in communication, programming and processing technologies, including Cloud computing services and technologies such as Hadoop, Spark and Storm. This trend also affects the needs of analytical applications, which are no longer monolithic but composed of several individual analytical steps running in the form of a workflow. These Big Data Workflows are vastly different in nature from traditional workflows. Researchers arecurrently facing the challenge of how to orchestrate and manage the execution of such workflows. In this paper, we discuss in detail orchestration requirements of these workflows as well as the challenges in achieving these requirements. We alsosurvey current trends and research that supports orchestration of big data workflows and identify open research challenges to guide future developments in this area. CCS Concepts: • General and reference → Surveys and overviews; • Information systems → Data analytics; • Computer systems organization → Cloud computing; Additional Key Words and Phrases: Big Data, Cloud Computing, Workflow Orchestration, Requirements, Approaches ACM Reference format: Mutaz Barika, Saurabh Garg, Albert Y. Zomaya, Lizhe Wang, Aad van Moorsel, and Rajiv Ranjan. 2018. Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions.
    [Show full text]
  • The Gremlin: Transforming the Past's Failures 161 the Eagle Squadron, a Noted "Gremlinologist," Who Had Heard Many Such Tales and Was Beginning to Collect Them
    The Gremlin: Transfor111ing the Past's Failures BRUCE A. ROSENBERG A MERICAN MOTORS CORPORATION entered the sub-compact field in 1970 with a sturdy little car it named the Gremlin. Its rear door-window (the hatch-back) descends so abruptly from the vehicle's roof that it gives it the appearance of having been cut off before it finished growing (or perhaps the way that Sir Yvain's horse was cut off at the hind quarter by the falling portcullis at the castle of Esclados the Red). The company it- self exploits the car's looks in television commercials which depict a snide and burly service-station attendant asking a pretty young driver about the "rest" of her car. But the ad concludes with the important message that though the car might have unorthodox looks it is economical to drive: the girl smugly and tauntingly hands her torrnentor a dollar bill for gas. What is most incongruous in all this, particularly for those people with clear memories of World War II, is the very name of the car, Gremlin. Historically this is the least likely name for a machine imaginable. But apparently not for the general public, and presumably not for the staff at American Motors which named the car several years ago. The company must have assumed, or conducted extensive field research which demonstrated, that car buyers would think of pixies, sprites, or other varieties of the "wee folk" when they heard the name Gremlin. The name conjures up an image of a small, compact creature (verymuch like the car), who is harmless, playful, and whimsical, perhaps something like the Celtic elf who sells the breakfast cereal, "Lucky Charms." When they first became known to Americans in 1942 (Gremlins had already made the acquaintance of the R.A.F.), they were the subject of several articles in leading magazines, and their diminutive stature was usually featured.
    [Show full text]
  • An Evaluation of the Design Space for Scalable Data Loading Into Graph Databases
    Otto-von-Guericke-Universit¨at Magdeburg Faculty of Computer Science Databases D B and Software S E Engineering Master's Thesis An Evaluation of the Design Space for Scalable Data Loading into Graph Databases Author: Jingyi Ma February 23, 2018 Advisors: M.Sc. Gabriel Campero Durand Data and Knowledge Engineering Group Prof. Dr. rer. nat. habil. Gunter Saake Data and Knowledge Engineering Group Ma, Jingyi: An Evaluation of the Design Space for Scalable Data Loading into Graph Databases Master's Thesis, Otto-von-Guericke-Universit¨at Magdeburg, 2018. Abstract In recent years, computational network science has become an active area. It offers a wealth of tools to help us gain insight into the interconnected systems around us. Graph databases are non-relational database systems which have been developed to support such network-oriented workloads. Graph databases build a data model based on graph abstractions (i.e. nodes/vertexes and edges) and can use different optimizations to speed up the basic graph processing tasks, such as traversals. In spite of such benefits, some tasks remain challenging in graph databases, such as the task of loading the complete dataset. The loading process has been considered to be a performance bottleneck, specifically a scalability bottleneck, and application developers need to conduct performance tuning to improve it. In this study, we study some optimization alternatives that developers have for load data into a graph databases. With this goal, we propose simple microbenchmarks of application-level load optimizations and evaluate these optimizations experimentally for loading real world graph datasets. We run our tests using JanusGraphLab, a JanusGraph prototype.
    [Show full text]
  • Building Machine Learning Inference Pipelines at Scale
    Building Machine Learning inference pipelines at scale Julien Simon Global Evangelist, AI & Machine Learning @julsimon © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Real-life Machine Learning applications require more than a single model. • Data may need pre-processing: normalization, feature engineering, dimensionality reduction, etc. • Predictions may need post-processing: filtering, sorting, combining, etc. Our goal: build scalable ML pipelines with open source (Spark, Scikit-learn, XGBoost) and managed services (Amazon EMR, AWS Glue, Amazon SageMaker) © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark https://spark.apache.org/ • Open-source, distributed processing system • In-memory caching and optimized execution for fast performance (typically 100x faster than Hadoop) • Batch processing, streaming analytics, machine learning, graph databases and ad hoc queries • API for Java, Scala, Python, R, and SQL • Available in Amazon EMR and AWS Glue © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. MLlib – Machine learning library https://spark.apache.org/docs/latest/ml-guide.html • Algorithms: classification, regression, clustering, collaborative filtering. • Featurization: feature extraction, transformation, dimensionality reduction. • Tools for constructing, evaluating and tuning pipelines • Transformer – a transform function that maps a DataFrame into a new
    [Show full text]
  • Evaluation of SPARQL Queries on Apache Flink
    applied sciences Article SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink Oscar Ceballos 1 , Carlos Alberto Ramírez Restrepo 2 , María Constanza Pabón 2 , Andres M. Castillo 1,* and Oscar Corcho 3 1 Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Ciudad Universitaria Meléndez Calle 13 No. 100-00, Cali 760032, Colombia; [email protected] 2 Departamento de Electrónica y Ciencias de la Computación, Pontificia Universidad Javeriana Cali, Calle 18 No. 118-250, Cali 760031, Colombia; [email protected] (C.A.R.R.); [email protected] (M.C.P.) 3 Ontology Engineering Group, Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, 28660 Madrid, Spain; ocorcho@fi.upm.es * Correspondence: [email protected] Abstract: Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce Programming Model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise to deliver higher data processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to provide a mapping to translate a SPARQL query to a Flink program. The mapping was implemented in a prototype used to determine the correctness and performance of the solution. The source code of the Citation: Ceballos, O.; Ramírez project is available in Github under the MIT license.
    [Show full text]
  • Data-Centric Graphical User Interface of the ATLAS Event Index Service
    EPJ Web of Conferences 245, 04036 (2020) https://doi.org/10.1051/epjconf/202024504036 CHEP 2019 Data-centric Graphical User Interface of the ATLAS Event Index Service Julius Hrivnᡠcˇ1,∗, Evgeny Alexandrov2, Igor Alexandrov2, Zbigniew Baranowski3, Dario Barberis4, Gancho Dimitrov3, Alvaro Fernandez Casani5, Elizabeth Gallas6, Carlos Gar- cía Montoro5, Santiago Gonzalez de la Hoz5, Andrei Kazymov2, Mikhail Mineev2, Fedor Prokoshin2, Grigori Rybkin1, Javier Sanchez5, Jose Salt5, Miguel Villaplana Perez7 1Université Paris-Saclay, CNRS/IN2P3, IJCLab, 91405 Orsay, France 2Joint Institute for Nuclear Research, 6 Joliot-Curie St., Dubna, Moscow Region, 141980, Russia 3CERN, 1211 Geneva 23, Switzerland 4Physics Department of the University of Genoa and INFN Sezione di Genova, Via Dodecaneso 33, I-16146 Genova, Italy 5Instituto de Fisica Corpuscular (IFIC), Centro Mixto Universidad de Valencia - CSIC, Valencia, Spain 6Department of Physics, Oxford University, Oxford, United Kingdom 7Department of Physics, University of Alberta, Edmonton AB, Canada Abstract. The Event Index service of the ATLAS experiment at the LHC keeps references to all real and simulated events. Hadoop Map files and HBase tables are used to store the Event Index data, a subset of data is also stored in the Oracle database. Several user interfaces are currently used to access and search the data, from a simple command line interface, through a programmable API, to sophisticated graphical web services. It provides a dynamic graph-like overview of all available data (and data collections). Data are shown together with their relations, like paternity or overlaps. Each data entity then gives users a set of actions available for the referenced data. Some actions are provided directly by the Event Index system, others are just interfaces to different ATLAS services.
    [Show full text]
  • Flare: Optimizing Apache Spark with Native Compilation
    Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data Gregory Essertel, Ruby Tahboub, and James Decker, Purdue University; Kevin Brown and Kunle Olukotun, Stanford University; Tiark Rompf, Purdue University https://www.usenix.org/conference/osdi18/presentation/essertel This paper is included in the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’18). October 8–10, 2018 • Carlsbad, CA, USA ISBN 978-1-939133-08-3 Open access to the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data Grégory M. Essertel1, Ruby Y. Tahboub1, James M. Decker1, Kevin J. Brown2, Kunle Olukotun2, Tiark Rompf1 1Purdue University, 2Stanford University {gesserte,rtahboub,decker31,tiark}@purdue.edu, {kjbrown,kunle}@stanford.edu Abstract cessing. Systems like Apache Spark [8] have gained enormous traction thanks to their intuitive APIs and abil- In recent years, Apache Spark has become the de facto ity to scale to very large data sizes, thereby commoditiz- standard for big data processing. Spark has enabled a ing petabyte-scale (PB) data processing for large num- wide audience of users to process petabyte-scale work- bers of users. But thanks to its attractive programming loads due to its flexibility and ease of use: users are able interface and tooling, people are also increasingly using to mix SQL-style relational queries with Scala or Python Spark for smaller workloads. Even for companies that code, and have the resultant programs distributed across also have PB-scale data, there is typically a long tail of an entire cluster, all without having to work with low- tasks of much smaller size, which make up a very impor- level parallelization or network primitives.
    [Show full text]
  • Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin
    Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin Harsh Thakkar1, S¨orenAuer1;2, Maria-Esther Vidal2 1 Smart Data Analytics Lab (SDA), University of Bonn, Germany 2 TIB & Leibniz University of Hannover, Germany [email protected], [email protected] Abstract. Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differ- ently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted into an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flex- ible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. The consequence of this rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language, and machine provide a common platform for supporting any graph computing sys- tem (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern match- ing for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra. Keywords: Graph Pattern Matching, Graph Traversal, Gremlin, Graph Alge- bra 1 Introduction Upon observing the evolution of information technology, we can observe a trend arXiv:1908.06265v2 [cs.DB] 7 Sep 2019 from data models and knowledge representation techniques being tightly bound to the capabilities of the underlying hardware towards more intuitive and natural methods resembling human-style information processing.
    [Show full text]