DAT329: Building your first graph application with Neptune

Taylor Riggan, Senior Specialist Solutions Architect, Amazon Neptune
Kunal Sengupta, Software Dev. Engineer, Amazon Web Services

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

• What are graphs?
• Overview of Amazon Neptune
• Graph query languages
  • Apache TinkerPop/Gremlin
  • RDF/SPARQL
• Hands-on exercises
  • Loading data into an Amazon Neptune cluster
  • Using Gremlin or SPARQL from a Jupyter Notebook
• Q&A

Related breakouts

• DAT347: Neptune best practices: How to optimize your graph queries (3:15 p.m. Thursday)

Common data categories and use cases

• Relational: referential integrity, ACID transactions, schema-on-write. Use cases: lift and shift, ERP, CRM, finance.
• Key-value: high throughput, low-latency reads and writes, endless scale. Use cases: real-time bidding, shopping cart, social, product catalog, customer preferences.
• Document: store documents and quickly access querying on any attribute. Use cases: content management, personalization, mobile.
• In-memory: query by key with microsecond latency. Use cases: leaderboards, real-time analytics, caching.
• Graph: quickly and easily create and navigate relationships between data. Use cases: fraud detection, social networking, recommendation engine.
• Time-series: collect, store, and process data sequenced by time. Use cases: IoT applications, event tracking.
• Ledger: complete, immutable, and verifiable history of all changes to application data. Use cases: systems of record, supply chain, healthcare, registrations, financial.

AWS purpose-built databases

• Relational: Amazon RDS (Aurora, Community, Commercial)
• Key-value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache
• Graph: Amazon Neptune
• Time-series: Amazon Timestream
• Ledger: Amazon QLDB

Graphs are all around us

Graph data

• Relationships are first-class objects

• Vertices connected by edges

[Figure: example graph; labels shown: PRODUCT, SPORT, PURCHASED, KNOWS, FOLLOWS]

Graph use case

// Product recommendation to a user
gremlin> g.V().has('name','sara').as('customer').
           out('follows').in('follows').out('purchased').
           where(neq('customer')).dedup().by('name').properties('name')
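The recommendation traversal above walks out along follows, back in along follows, then out along purchased, and removes duplicates. A minimal Python sketch of that walk over an in-memory edge list may help make the hops concrete. The sample edges, the `build_index` and `recommend` helpers, and the choice to skip the starting customer are all assumptions made for illustration; none of this is a Neptune or TinkerPop API.

```python
from collections import defaultdict

# Hypothetical sample graph: (source, edge label, target) triples.
EDGES = [
    ("sara", "follows", "soccer"),
    ("mary", "follows", "soccer"),
    ("john", "follows", "soccer"),
    ("bill", "follows", "tennis"),
    ("mary", "purchased", "soccer ball"),
    ("john", "purchased", "soccer ball"),
    ("john", "purchased", "shin guards"),
    ("bill", "purchased", "racket"),
]


def build_index(edges):
    """Index edges by (vertex, label) in both directions."""
    out_index = defaultdict(list)
    in_index = defaultdict(list)
    for source, label, target in edges:
        out_index[(source, label)].append(target)
        in_index[(target, label)].append(source)
    return out_index, in_index


def recommend(customer, edges):
    """Products bought by others who follow what `customer` follows."""
    out_index, in_index = build_index(edges)
    seen = set()
    products = []
    for interest in out_index[(customer, "follows")]:            # out('follows')
        for follower in in_index[(interest, "follows")]:          # in('follows')
            if follower == customer:                              # assumption: skip the customer
                continue
            for product in out_index[(follower, "purchased")]:    # out('purchased')
                if product not in seen:                           # dedup()
                    seen.add(product)
                    products.append(product)
    return products


print(recommend("sara", EDGES))  # ['soccer ball', 'shin guards']
```

Each nested loop corresponds to one hop, which is why a graph database that stores adjacency directly can answer this kind of question without join tables.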

// Identify a friend in common and make a recommendation
gremlin> g.V().has('name','mary').as('start').
           both('knows').both('knows').
           where(neq('start')).
           dedup().by('name').properties('name')

Amazon Neptune
Fully managed graph

• Fast: Query billions of relationships with millisecond latency
• Reliable: Six replicas of your data across three AZs with full backup and restore
• Easy: Build powerful queries easily with Gremlin and SPARQL
• Open: Supports Apache TinkerPop and W3C RDF graph models

Amazon Neptune high-level architecture

[Figure: architecture diagram; labels shown: Bulk load from S3, Database Mgmt.]

Leading graph models and frameworks

Property graph
• Open-source Apache TinkerPop
• Gremlin traversal language

Resource description framework (RDF)
• W3C standard
• SPARQL drivers

The world’s most authoritative source of entertainment information. Now available for license.
• 6 million titles: Movies, TV, OTT and video games
• 10 million names: Celebrities, cast and crew
• Box office data: Worldwide release performance
• Ratings: Real ratings from true fans
Contact us at [email protected]

Data model

[Figure: data model diagram. Vertex types: Artist, Person, Genre, Place, and the title types movie, tvSeries, tvEpisode, tvMovie, tvSpecial, videoGame. Edge labels: follows, rated, genre, isLocatedIn, episode, and the Artist roles actress | actor | composer | director | archive_sound | writer | editor | archive_footage | production_designer | cinematographer | producer | self.]

Data model (property graph)

[Figure: the same diagram annotated with properties.]
• Artist: name: String, birthyear: Int, deathyear: Int
• movie, tvSeries, tvEpisode, tvMovie, tvSpecial, videoGame: runtime: Int, title: String, year: Int, averageRating: Float, numvotes: Int
• rated (edge): rating: Float
• Person: firstName: String, lastName: String, birthday: Date, gender: String
• Genre / Place: name: String, type: String

Data model (RDF)

[Figure: the same vertex types, edge labels, and properties, drawn for the RDF model.]

Learning graph traversals

Create/Filter | Find | Fetch

Apache TinkerPop/Gremlin

• Create/Filter: addV, addE, has, hasId, hasLabel, filter, property, where
• Find: both, bothV, bothE, choose, coalesce, in, inV, inE, out, outV, outE, repeat, until, times, select
• Fetch: aggregate, path, project, properties, store, values, valueMap
• Reduce: count, dedup

Apache TinkerPop Gremlin

Anatomy of a Gremlin query traversal

g.V().has('property','value').out('label').values('name','description').toList()
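To see why the traversal reads as a chain, here is a small Python sketch in which each step consumes the current set of traversers and returns a new one. `MiniGraph` and `Traversal` are hypothetical toy classes written for this illustration; they only borrow Gremlin's step names and are not the TinkerPop or gremlinpython API.

```python
class MiniGraph:
    """A toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> property dict
        self.edges = []      # (from id, label, to id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def traversal(self):
        # Plays the role of the graph traversal source `g`.
        return Traversal(self, [])


class Traversal:
    """Each step returns a new Traversal holding the current traversers."""

    def __init__(self, graph, items):
        self.graph = graph
        self.items = items

    def V(self):
        # Vertex selector: start from every vertex id.
        return Traversal(self.graph, list(self.graph.vertices))

    def has(self, key, value):
        # Filter: keep vertices whose property matches.
        kept = [v for v in self.items if self.graph.vertices[v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        # Navigation: follow outgoing edges with the given label.
        reached = [dst for v in self.items
                   for (src, lbl, dst) in self.graph.edges
                   if src == v and lbl == label]
        return Traversal(self.graph, reached)

    def values(self, *keys):
        # Fetch: replace vertices with their property values.
        found = [self.graph.vertices[v][k] for v in self.items for k in keys
                 if k in self.graph.vertices[v]]
        return Traversal(self.graph, found)

    def toList(self):
        # Terminal step: hand the results back to the caller.
        return list(self.items)


graph = MiniGraph()
graph.add_vertex(1, name="sara")
graph.add_vertex(2, name="soccer ball", description="size 5")
graph.add_vertex(3, name="racket")
graph.add_edge(1, "purchased", 2)

g = graph.traversal()
result = g.V().has("name", "sara").out("purchased").values("name", "description").toList()
print(result)  # ['soccer ball', 'size 5']
```

The toy version evaluates every step eagerly, which keeps the sketch short; nothing here should be read as a description of how Neptune executes a traversal.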

• g: graph traversal source
• V(): vertex selector
• has('property','value'): filter
• out('label'): navigation (“join”)
• values('name','description'): serialization (“fetch”)
• toList(): terminal step

RDF/SPARQL

PREFIX person:
PREFIX objProp:
PREFIX dataProp:
PREFIX resource:

[Figure: triples serialized with named graphs; labels shown: Named Graph URI (Edge Id), Default - Named Graph URI, resource:tt0026073-person10995116285432]

SPARQL query from triples serialization of the graph pattern

Query

PREFIX person:
PREFIX objProp:
PREFIX dataProp:
PREFIX resource:

SELECT ?title ?rating ?genre
WHERE {
  { GRAPH ?ratingEdge { person:10995116285432 objProp:rated ?movie } }
  ?ratingEdge dataProp:rating ?rating .
  ?movie dataProp:title ?title .
  ?movie objProp:genre ?genre .
}
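The query above relies on each rated statement living in its own named graph, so that the graph URI can act as an edge ID and carry the rating. A rough Python sketch of that idea over a list of quads is shown here. The quads, the identifiers such as `person:1` and `resource:m1`, and the `objects` and `ratings_by` helpers are hypothetical, and the property lookups ignore the graph component as a simplification.

```python
DEFAULT = None  # marker for statements that are not in a named graph

# Hypothetical sample quads: (subject, predicate, object, graph).
QUADS = [
    # The rated edge sits in its own named graph, which doubles as the edge ID.
    ("person:1", "objProp:rated", "resource:m1", "resource:m1-person1"),
    # The edge property hangs off the graph URI.
    ("resource:m1-person1", "dataProp:rating", 6, DEFAULT),
    ("resource:m1", "dataProp:title", "Sunflower", DEFAULT),
    ("resource:m1", "objProp:genre", "resource:Action", DEFAULT),
    ("person:2", "objProp:rated", "resource:m1", "resource:m1-person2"),
    ("resource:m1-person2", "dataProp:rating", 9, DEFAULT),
]


def objects(quads, subject, predicate):
    """All objects for a subject/predicate pair, regardless of graph."""
    return [o for (s, p, o, g) in quads if s == subject and p == predicate]


def ratings_by(person, quads):
    """Return (title, rating, genre) rows for everything `person` rated."""
    rows = []
    for (s, p, movie, rating_edge) in quads:
        # GRAPH ?ratingEdge { person objProp:rated ?movie }
        if s != person or p != "objProp:rated" or rating_edge is DEFAULT:
            continue
        # ?ratingEdge dataProp:rating ?rating
        for rating in objects(quads, rating_edge, "dataProp:rating"):
            # ?movie dataProp:title ?title
            for title in objects(quads, movie, "dataProp:title"):
                # ?movie objProp:genre ?genre
                for genre in objects(quads, movie, "objProp:genre"):
                    rows.append((title, rating, genre))
    return rows


print(ratings_by("person:1", QUADS))  # [('Sunflower', 6, 'resource:Action')]
```

Two people rating the same movie produce two named graphs, which is how the RDF model keeps their ratings apart even though the subject, predicate, and object shapes are identical.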

Result

?title                  ?rating   ?genre
"Goodbye, Dragon Inn"   2         resource:Drama
"Sunflower"             6         resource:Action

RDF/SPARQL

• Create/Select: INSERT, INSERT DATA, DELETE, DELETE DATA, SELECT, PREFIX
• Find: WHERE, OPTIONAL, UNION, FILTER, GRAPH
• Sort/Reduce: COUNT, DISTINCT, GROUP/ORDER BY, LIMIT, OFFSET
• Built-in Filters:
  • Logical: !, &&, ||
  • Math: +, -, *, /
  • Comparison: =, !=, >, <
  • isURI, isBlank, isLiteral, bound, str, lang, datatype, sameTerm, langMatches, regex

… but what about traversals across edges?
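SPARQL handles this with property paths, where for example a trailing + repeats a path one or more times. A rough Python sketch of what "one or more times" means operationally is given here, using a path equivalent to (^actor/actor)+, that is, person to movie and back to a person, repeated. The `ACTED_IN` pairs and the `same_movie_people` and `one_or_more` helpers are hypothetical and chosen only for illustration; this is not how any particular SPARQL engine is implemented.

```python
from collections import deque

# Hypothetical (movie, person) pairs, mirroring `?movie objprop:actor ?person`.
ACTED_IN = [
    ("movie:A", "person:kevin"),
    ("movie:A", "person:ann"),
    ("movie:B", "person:ann"),
    ("movie:B", "person:bob"),
    ("movie:C", "person:bob"),
    ("movie:C", "person:cat"),
    ("movie:D", "person:dan"),
]


def same_movie_people(person, acted_in):
    """One application of ^actor/actor: person -> movie -> person."""
    movies = {m for (m, p) in acted_in if p == person}       # ^actor (backwards)
    return {p for (m, p) in acted_in if m in movies}          # actor (forwards)


def one_or_more(start, step):
    """Evaluate `step+`: everything reachable by applying `step` at least once."""
    reached = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in step(current):
            if nxt not in reached:   # visit each node once so cycles terminate
                reached.add(nxt)
                queue.append(nxt)
    return reached


reachable = one_or_more("person:kevin", lambda p: same_movie_people(p, ACTED_IN))
print(sorted(reachable))
# ['person:ann', 'person:bob', 'person:cat', 'person:kevin']
```

The starting person appears in the result because the path leads back to them through a shared movie, while `person:dan` is never reached since no chain of shared movies connects him to the start.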

Construct      Meaning
path1/path2    Forwards path (path1 followed by path2)
^path1         Backwards path (object to subject)
path1|path2    Either path1 or path2
path1*         path1, repeated zero or more times
path1+         path1, repeated one or more times
path1?         path1, optionally
!uri           Any predicate except uri
!^uri          Any backwards (object to subject) predicate except uri

PREFIX dataprop:
PREFIX objprop:

SELECT DISTINCT ?names
WHERE {
  ?s1 dataprop:name "Kevin Bacon" .
  ?movies objprop:actor ?s1 .
  ?movies (objprop:actor/dataprop:name)+ ?names .
}

Workshop architecture

[Figure: workshop architecture; labels shown: User, SageMaker notebook instance, Elastic network interface, Neptune cluster, VPC (twice), Generic group (twice), Neptune snapshot on S3]

Workshop instructions

Browse to: neptune-deep-dive.workshop.aws

Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:
• Amazon Neptune
• Amazon RDS
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache

Validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!