DAT329
Building your first graph application with Amazon Neptune

Taylor Riggan, Senior Specialist Solutions Architect, Amazon Neptune, Amazon Web Services
Kunal Sengupta, Software Dev. Engineer, Amazon Neptune, Amazon Web Services
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
• What are graph databases?
• Overview of Amazon Neptune
• Graph query languages
  • Apache TinkerPop/Gremlin
  • RDF/SPARQL
• Hands-on exercises
  • Loading data into an Amazon Neptune cluster
  • Using Gremlin or SPARQL from a Jupyter Notebook
• Q&A

Related breakouts
• DAT347: Neptune best practices: How to optimize your graph queries (3:15 p.m. Thursday)

Common data categories and use cases
• Relational: Referential integrity, ACID transactions, schema-on-write. Use cases: lift and shift, ERP, CRM, finance.
• Key-value: High throughput, low-latency reads and writes, endless scale. Use cases: real-time bidding, shopping cart, social, product catalog, customer preferences.
• Document: Store documents and quickly access querying on any attribute. Use cases: content management, personalization, mobile.
• In-memory: Query by key with microsecond latency. Use cases: leaderboards, real-time analytics, caching.
• Graph: Quickly and easily create and navigate relationships between data. Use cases: fraud detection, social networking, recommendation engine.
• Time-series: Collect, store, and process data sequenced by time. Use cases: IoT applications, event tracking.
• Ledger: Complete, immutable, and verifiable history of all changes to application data. Use cases: systems of record, supply chain, healthcare, registrations, financial.

AWS purpose-built databases
• Relational: Amazon RDS (Aurora, Community, Commercial)
• Key-value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache (Redis, Memcached)
• Graph: Amazon Neptune
• Time-series: Amazon Timestream
• Ledger: Amazon QLDB

Graphs are all around us

Graph data
• Relationships are first-class objects
• Vertices connected by edges

[Figure: example graph. Surviving labels: PRODUCT, SPORT, PURCHASED, KNOWS, FOLLOWS.]

Graph use case
// Product recommendation to a user
gremlin> g.V().has('name','sara').as('customer').
           out('follows').in('follows').out('purchased').
           where(neq('customer')).dedup().by('name').properties('name')
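As a rough illustration of what the product-recommendation traversal walks over, the sketch below replays the same three hops (out 'follows', in 'follows', out 'purchased') on a tiny in-memory graph in plain Python. Only the name sara comes from the slide; the sample edges and the helpers `build_index` and `recommend` are hypothetical, and a real application would send the Gremlin query to Neptune instead.

```python
from collections import defaultdict

# Hypothetical sample edges as (source, label, target).
# Only 'sara' appears on the slide; everything else is made up.
EDGES = [
    ("sara", "follows", "cycling"),
    ("bob", "follows", "cycling"),
    ("amy", "follows", "cycling"),
    ("sara", "purchased", "bike"),
    ("bob", "purchased", "helmet"),
    ("amy", "purchased", "helmet"),
    ("amy", "purchased", "gloves"),
]


def build_index(edges):
    """Index edges by (vertex, label) for outgoing and incoming lookups."""
    out_index, in_index = defaultdict(list), defaultdict(list)
    for src, label, dst in edges:
        out_index[(src, label)].append(dst)
        in_index[(dst, label)].append(src)
    return out_index, in_index


def recommend(customer, edges=EDGES):
    """Replay out('follows').in('follows').out('purchased').dedup()."""
    out_index, in_index = build_index(edges)
    seen, products = set(), []
    for topic in out_index[(customer, "follows")]:               # out('follows')
        for follower in in_index[(topic, "follows")]:           # in('follows')
            for product in out_index[(follower, "purchased")]:  # out('purchased')
                if product not in seen:                         # dedup()
                    seen.add(product)
                    products.append(product)
    return products


print(recommend("sara"))
```

Because the walk goes out and back along 'follows', it also returns to the starting customer, so her own purchases show up in the result unless the query filters them out.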
// Identify a friend in common and make a recommendation
gremlin> g.V().has('name','mary').as('start').
           both('knows').both('knows').
           where(neq('start')).
           dedup().by('name').properties('name')

Amazon Neptune
Fully managed graph database
• Fast: Query billions of relationships with millisecond latency
• Reliable: Six replicas of your data across three AZs with full backup and restore
• Easy: Build powerful queries easily with Gremlin and SPARQL
• Open: Supports Apache TinkerPop and W3C RDF graph models

Amazon Neptune high-level architecture
Bulk load from S3
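For the "Bulk load from S3" path, a minimal sketch of preparing property-graph files may help. It assumes the CSV layout AWS documents for the Neptune loader (system columns `~id`, `~label`, `~from`, `~to`, and typed property headers such as `name:String`) and the request body fields `source`, `format`, `iamRoleArn`, `region`, and `failOnError`. The helper names, sample rows, bucket, and role ARN are placeholders, not values from the talk.

```python
import csv
import io


def to_csv(header, rows):
    """Serialize a header row plus data rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def vertices_csv(rows):
    """rows: iterable of (id, label, name)."""
    return to_csv(["~id", "~label", "name:String"], rows)


def edges_csv(rows):
    """rows: iterable of (id, from_id, to_id, label)."""
    return to_csv(["~id", "~from", "~to", "~label"], rows)


def loader_request(source, iam_role_arn, region):
    """Build the JSON body for a POST to the cluster's /loader endpoint."""
    return {
        "source": source,            # placeholder S3 URI
        "format": "csv",
        "iamRoleArn": iam_role_arn,  # placeholder role ARN
        "region": region,
        "failOnError": "FALSE",
    }


print(vertices_csv([("p1", "person", "sara"), ("s1", "sport", "cycling")]))
print(edges_csv([("e1", "p1", "s1", "follows")]))
print(loader_request("s3://example-bucket/graph/",
                     "arn:aws:iam::123456789012:role/ExampleLoadRole",
                     "us-east-1"))
```

The files would be uploaded to S3 first; the request body is then posted to the loader endpoint from inside the cluster's VPC.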
Database Mgmt.

Leading graph models and frameworks
Property graph
• Open-source Apache TinkerPop
• Gremlin traversal language
• Programming language drivers

Resource Description Framework (RDF)
• W3C standard
• SPARQL query language

The world's most authoritative source of entertainment information. Now available for license.
• 6 million titles: Movies, TV, OTT and video games
• 10 million names: Celebrities, cast and crew
• Box office data: Worldwide release performance
• Ratings: Real ratings from true fans
Contact us at imdb[email protected]

Data model
[Figure: data model diagram. Entity labels: Artist, Person, Genre, Place. Relationship labels: follows, rated, genre, isLocatedIn, episode. Role labels: actress | actor | composer | director | archive_sound | writer | editor | archive_footage | production_designer | cinematographer | producer | self. Title types: movie, tvSeries, tvEpisode, tvMovie, tvSpecial, videoGame.]

Data model (property graph)

[Figure: the same diagram annotated with properties. Properties shown: name: String, birthyear: Int, deathyear: Int; runtime: Int, title: String, year: Int, averageRating: Float, numvotes: Int; rating: Float; firstName: String, lastName: String, birthday: Date, gender: String; name: String, type: String.]

Data model (RDF)

[Figure: the same diagram with the same labels and properties, shown for the RDF model.]

Learning graph traversals
• Create/Filter
• Find
• Fetch

Apache TinkerPop/Gremlin

• Create/Filter: addV, addE; has; hasId; hasLabel; filter; property; where
• Find: both, bothV, bothE; choose, coalesce; in, inV, inE; out, outV, outE; repeat, until, times; select
• Fetch: aggregate; path, project; properties; store; values, valueMap
• Reduce: count; dedup

Apache TinkerPop Gremlin
Anatomy of a Gremlin query traversal

g.V().has('property','value').out('label').values('name','description').toList()
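To see how each part of that chain narrows or moves the set of results, here is a toy, list-backed imitation in Python. It is not TinkerPop and not how Neptune executes queries. The class `ToyTraversal`, its data layout, and the sample vertices are all hypothetical; only the step names mirror the query above.

```python
class ToyTraversal:
    """A toy imitation of a Gremlin traversal over in-memory data."""

    def __init__(self, vertices, edges, current=None):
        self.vertices = vertices      # {vertex_id: {property: value}}
        self.edges = edges            # [(from_id, label, to_id)]
        self.current = current or []  # vertex ids (or values) at this point

    def _next(self, current):
        return ToyTraversal(self.vertices, self.edges, current)

    def V(self):
        # Vertex selector: start from every vertex in the graph.
        return self._next(list(self.vertices))

    def has(self, key, value):
        # Filter: keep only vertices whose property matches.
        return self._next([v for v in self.current
                           if self.vertices[v].get(key) == value])

    def out(self, label):
        # Navigation ("join"): hop along outgoing edges with this label.
        return self._next([dst for v in self.current
                           for src, lbl, dst in self.edges
                           if src == v and lbl == label])

    def values(self, *keys):
        # Serialization ("fetch"): emit the requested property values.
        return self._next([self.vertices[v][k] for v in self.current
                           for k in keys if k in self.vertices[v]])

    def toList(self):
        # Terminal step: materialize the results as a list.
        return list(self.current)


# Hypothetical sample data shaped to match the query's placeholder names.
vertices = {
    1: {"property": "value", "name": "start"},
    2: {"name": "widget", "description": "a sample item"},
    3: {"name": "other"},
}
edges = [(1, "label", 2), (3, "label", 2)]

g = ToyTraversal(vertices, edges)  # stands in for the graph traversal source
result = g.V().has("property", "value").out("label").values("name", "description").toList()
print(result)
```

Each method returns a new traversal object, which is what lets the steps chain; nothing is returned to the caller until the terminal step.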
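• g: Graph traversal source
• V(): Vertex selector
• has('property','value'): Filter
• out('label'): Navigation ("Join")
• values('name','description'): Serialization ("Fetch")
• toList(): Terminal step

RDF/SPARQL

PREFIX person: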
Named Graph URI (Edge Id)
Default - Named Graph URI: tt0026073-person10995116285432
resource:tt0026073-person10995116285432

SPARQL query from triples serialization of the graph pattern
Query

PREFIX person:

SELECT ?title ?rating ?genre
WHERE {
  { GRAPH ?ratingEdge { person:10995116285432 objProp:rated ?movie } }
  ?ratingEdge dataProp:rating ?rating .
  ?movie dataProp:title ?title .
  ?movie objProp:genre ?genre .
}
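To make the join in this graph pattern explicit, the following Python sketch evaluates the same four patterns over a handful of in-memory quads. The person identifier, predicates, titles, ratings, and genres are taken from the slides; the movie and edge identifiers (`movie:m1`, `edge:r1`, and so on) and the helpers `objects` and `ratings_for` are hypothetical. It also assumes that patterns outside GRAPH match statements in any graph.

```python
PERSON = "person:10995116285432"

# Quads as (graph, subject, predicate, object).
# A named graph stands for the edge id; None means no named graph.
# Movie and edge identifiers are hypothetical placeholders.
QUADS = [
    ("edge:r1", PERSON, "objProp:rated", "movie:m1"),
    ("edge:r2", PERSON, "objProp:rated", "movie:m2"),
    (None, "edge:r1", "dataProp:rating", 2),
    (None, "edge:r2", "dataProp:rating", 6),
    (None, "movie:m1", "dataProp:title", "Goodbye, Dragon Inn"),
    (None, "movie:m2", "dataProp:title", "Sunflower"),
    (None, "movie:m1", "objProp:genre", "resource:Drama"),
    (None, "movie:m2", "objProp:genre", "resource:Action"),
]


def objects(quads, subject, predicate):
    """Objects of matching statements, regardless of graph (an assumption)."""
    return [o for _, s, p, o in quads if s == subject and p == predicate]


def ratings_for(person, quads=QUADS):
    """Return (title, rating, genre) rows for everything the person rated."""
    rows = []
    for graph, s, p, movie in quads:
        # GRAPH ?ratingEdge { person objProp:rated ?movie }
        if graph is None or s != person or p != "objProp:rated":
            continue
        # ?ratingEdge dataProp:rating ?rating
        for rating in objects(quads, graph, "dataProp:rating"):
            # ?movie dataProp:title ?title
            for title in objects(quads, movie, "dataProp:title"):
                # ?movie objProp:genre ?genre
                for genre in objects(quads, movie, "objProp:genre"):
                    rows.append((title, rating, genre))
    return rows


for row in ratings_for(PERSON):
    print(row)
```

The named graph plays the role of the edge identifier, which is why the rating can be attached to `?ratingEdge` rather than to the person or the movie.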
Result

?title                  ?rating  ?genre
"Goodbye, Dragon Inn"   2        resource:Drama
"Sunflower"             6        resource:Action

RDF/SPARQL
• Create/Select: INSERT; INSERT DATA; DELETE; DELETE DATA; SELECT; PREFIX
• Find: WHERE, OPTIONAL; UNION, FILTER; GRAPH
• Sort/Reduce: COUNT, DISTINCT; GROUP/ORDER BY; LIMIT; OFFSET
• Built-in Filters
  • Logical: !, &&, ||
  • Math: +, -, *, /
  • Comparison: =, !=, >, <
  • isURI, isBlank, isLiteral
  • bound
  • str, lang, datatype
  • sameTerm
  • langMatches
  • regex

… but what about traversals across edges?
Construct    Meaning

PREFIX dataprop:

[Figure: workshop architecture. Surviving labels: User, SageMaker notebook instance, Elastic network interface, Neptune cluster, Neptune snapshot on S3, and two VPCs.]

Workshop instructions
Browse to: neptune-deep-dive.workshop.aws

Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:
• Amazon Aurora
• Amazon Redshift
• Amazon Neptune
• Amazon RDS
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache
Validate expertise with the new AWS Certified Database - Specialty beta exam
Visit aws.training
Thank you!