
DAT220 - Real-world customer use cases with Neptune

Karthik Bharathy, Product Leader, Neptune
Elliott Foster, Software Architect, NBCUniversal
Yaz Shunnar, Software Engineer, Uber ATG

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Graphs are everywhere

• In the search engines you use
• In the social network you connect, read, or follow
• In the professional network you participate in
• In the travel you booked
• In the conference you are attending
• In the devices you hold

Internet Graph use cases

• Social networking
• Recommendations
• Knowledge graphs
• Fraud detection
• Life sciences
• Network & IT operations

Connected data

• Navigate (variably) connected structure
• Filter or compute a result based on strength, weight, or quality of relationships
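As a rough illustration of these two ideas, the following minimal Python sketch navigates a small, unevenly connected graph and filters the result by relationship weight. The graph contents, the `recommend` function, and the `min_weight` threshold are all made up for illustration; this is plain in-memory Python, not Neptune or Gremlin code.

```python
from collections import defaultdict

# Hypothetical weighted "follows" graph: person -> {neighbor: weight}.
# Weights stand in for relationship strength (illustrative values only).
GRAPH = {
    "alice": {"bob": 0.9, "carol": 0.2},
    "bob": {"dave": 0.8, "erin": 0.4},
    "carol": {"erin": 0.9},
    "dave": {},
    "erin": {},
}


def recommend(graph, start, min_weight=0.5):
    """Score friends-of-friends reachable over sufficiently strong edges.

    A candidate's score is the sum, over every two-hop path, of the
    product of the two edge weights. Edges weaker than min_weight are
    not followed.
    """
    direct = graph.get(start, {})
    scores = defaultdict(float)
    for friend, w1 in direct.items():
        if w1 < min_weight:
            continue  # filter on relationship strength
        for candidate, w2 in graph.get(friend, {}).items():
            if w2 < min_weight or candidate == start or candidate in direct:
                continue  # too weak, or already known to the start vertex
            scores[candidate] += w1 * w2
    # Strongest recommendations first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(recommend(GRAPH, "alice"))
print(recommend(GRAPH, "alice", min_weight=0.1))
```

With the default threshold only the strong path through `bob` to `dave` survives; lowering `min_weight` lets weaker paths contribute, so `erin` appears as well.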

Fast: Query billions of relationships with millisecond latency
Reliable: Six replicas of your data across three AZs with full backup and restore
Easy: Build powerful queries easily with Gremlin and SPARQL
Open: Supports Apache TinkerPop & W3C RDF graph models

New features in Amazon Neptune

• Neptune Streams: Generate a complete sequence of change-log entries that record every change made to the graph
• Transaction semantics: Formalized semantics to help you avoid data anomalies
• Gremlin/SPARQL Explain: Gain insights into the query plan and evaluation order
• SPARQL 1.1 Federated Query: Use SPARQL to express queries across diverse data sources
• Gremlin sessions: The client starts a session transaction. All queries run during the session are committed only when the connection is closed.
• Database cloning: Create multiple clones of a DB cluster using copy-on-write semantics
• Elasticsearch integration: Full-text search with graph data in Neptune
• Neptune Workbench: In-console notebook experience to query your graph

Neptune reference customers

From a NoSQL system to Amazon Neptune
Elliott Foster
Software Architect

Howdy, y’all!

Problems

• Legacy data store
• Year-over-year growth
• Operational complexity
• Use case requires scaling up and down

The Neptune decision

• Graph DB: right tool for the job
• Early adoption is hard
• Great support from AWS engineers
• Gremlin is natural and easy to use

Progressive migration

• Need to preserve existing functionality while improving performance
• Ability to prototype in parallel with live traffic
• Solutions faster to market

Simplified architecture

Architecture diagram (components shown): Users, CMSs, Traversal Builder (three instances), Amazon Kinesis, Amazon Simple Queue Service (Amazon SQS), Amazon Neptune

System performance during migration

Chart: average throughput (rpm x 1000, axis from 0 to 300) and average response time (ms, axis from 0 to 50) at POC (May 2018), Phase 1 (August 2018), Phase 2 (December 2018), and Phase 3 (July 2019)

Today

• Write throughput more than doubled over legacy system
• Read response times improved by an order of magnitude or more
• Easier to manage scaling and down-sync events
• ~40% cost reduction over legacy system
• Lots of room to grow!

What we learned

• It is easy to write traversals that bring back the whole graph
• Right-sizing the cluster is tricky due to fail-over logic
• Use your data model to drive application logic

Next steps

• Further personalization to drive higher user engagement
• Deeper insights into our data and how users are interacting with it
• GraphQL interface

Thank you!

Elliott Foster
[email protected]
@elliotttf

Running Amazon Neptune at scale
Yaz Shunnar
Software Engineer

Uber ATG
Map Versioning At Scale

Agenda
01 Map Versioning at Uber ATG
02 Map Versioning as a Graph
03 Map Versioning in Neptune
04 Map Versioning at Scale

Map Versioning at Uber ATG

Uber ATG Maps
Pittsburgh
Dallas

Pittsburgh Version Graph

Version Graph
Each dot is a version of the map. Each edge is a parent-child relationship between map versions.

Data Constraints
x K Map Versions of Pittsburgh
y TB of Data per Map of Pittsburgh

Access Constraints
< 1s Version Edits
< .01s Version Creation

Pittsburgh Version Graph

Version Graph
Each dot is a version of the map. Each edge is a parent-child relationship between map versions.

Map Versioning as a Graph

Modeling as a Graph

● Vertexes only store their edits!

● Edits on a vertex are sorted by timestamp

● Walk edges to find parent data

Versioning as a Graph
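The three modeling rules (vertices store only their own edits, edits on a vertex are sorted by timestamp, and parent data is found by walking edges) can be sketched in a few lines of Python. This is only an in-memory illustration, not Uber ATG's implementation or Neptune code: the `MapVersion` class, the edit tuple layout, and the feature names are hypothetical, and it assumes that a child version sees all of its ancestors' edits and that its own edits always take precedence over inherited values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One edit: (timestamp, feature_id, new_value). Hypothetical layout.
Edit = Tuple[int, str, str]


@dataclass
class MapVersion:
    """A vertex in the version graph. It stores only its own edits."""
    name: str
    parent: Optional["MapVersion"] = None  # parent-child edge
    edits: List[Edit] = field(default_factory=list)

    def edit(self, timestamp: int, feature_id: str, value: str) -> None:
        self.edits.append((timestamp, feature_id, value))
        self.edits.sort(key=lambda e: e[0])  # keep edits ordered by timestamp

    def view(self) -> Dict[str, str]:
        """Materialize this version by walking edges up to the root."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        state: Dict[str, str] = {}
        # Assumption: apply the oldest ancestor first, so that a
        # descendant's edits override whatever it inherits.
        for version in reversed(chain):
            for _, feature_id, value in version.edits:
                state[feature_id] = value
        return state


v1 = MapVersion("Map Version 1")
v1.edit(1, "lane-17", "two-way")

v2 = MapVersion("Map Version 2", parent=v1)  # creating a version copies nothing
v2.edit(2, "lane-17", "one-way")
v2.edit(3, "stop-4", "added")

print(v1.view())  # unaffected by edits made on Map Version 2
print(v2.view())  # inherited data plus Map Version 2's own edits
```

Because a new version starts with an empty edit list and only a pointer to its parent, creating one is cheap regardless of how much data the parent holds, which is consistent with the much tighter time budget for version creation than for edits.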

Diagram: Map Version 1 (Patrick edits). Patrick’s View.

Versioning as a Graph

Diagram: Map Version 1 (Patrick edits), Map Version 2 (John creates). Patrick’s View, John’s View.

Version Separation

Diagram: Map Version 1 (Patrick edits), Map Version 2 (John edits). Patrick’s View, John’s View.

Versioning Separation

Diagram: Map Version 1 (Patrick edits), Map Version 2 (John creates). Patrick’s View, John’s View.

Versioning Inheritance Extensibility

Diagram: Map Version 1 (Patrick creates), Map Version 2 (John creates), Map Version 3 (Mark creates). Patrick’s View, John’s View, Mark’s View.

Versioning Inheritance Extensibility

Diagram: Map Version 1 (Patrick creates), Map Version 2 (John creates), Map Version 3 (Mark edits). Patrick’s View, John’s View, Mark’s View.

Versioning Inheritance Extensibility

Diagram: Map Version 1 (Patrick creates), Map Version 2 (Balaji creates), Map Version 3 (Yaz edits).

Versioning Inheritance Extensibility

Diagram: Map Version 1, Map Version 2, Map Version 3.

Map Versioning in Neptune

Why Neptune?

● Versioning models easily as a graph

● Vertically scalable writes

● Horizontally scalable reads

● Managed by AWS

Experience with Neptune

● Gremlin query language is simple and maps naturally to our problem

● A wide variety of instance types to scale hardware with

● Creation of read replicas is simple

Experience with Neptune (cont.)

● Read replica propagation average < 20ms

● Extremely fast traversals

● Monitoring tools are simple and intuitive

Things to Watch For

● Neptune’s cluster endpoints propagate stickiness

● Adding data is faster than deleting data

● Gremlin Go (Golang) clients are still under development

Map Versioning at Scale

End Results

● > 1 PB of data managed in a Neptune graph

● < .1s avg data edits, < .01s avg version creation

● Our entire Map Storage is managed by Neptune!

Key takeaways

NBCUniversal: Processing more than 30K requests per second using Neptune. They reported 40% total cost savings by moving from a NoSQL system to Amazon Neptune.

Uber ATG: Built a microservice to manage high-definition maps. They use Amazon Neptune to process billions of relationships on a map and query them in milliseconds.

Related breakouts

ADM203 - Reimagining advertising analytics & identity resolution at scale
ADM301 - Best practices for identity resolution with Amazon Neptune
AIM341 - Deep learning on graphs
DAT220 - Real-world customer use cases with Amazon Neptune
DAT329 - Building your first graph application with Amazon Neptune
DAT341 - Best practices for graph data modeling and Amazon Neptune
DAT347 - Neptune best practices: How to optimize your graph queries
DAT361 - Deep dive on Amazon Neptune
DVC06 - Use Neptune to discover where & when events can impact local businesses
GAM303 - How Call of Duty uses ML to personalize player engagement
MOB318 - AWS AppSync does that?: Support for alternative data

Learn with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:

• Amazon ElastiCache
• Amazon Neptune
• Amazon DocumentDB
• Amazon RDS
• Amazon DynamoDB

Validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!