Sub-Second Analytics for User-Facing Applications with Apache Spark™ and Rockset
Venkat Venkataramani
CEO and Co-Founder, Rockset

About me

Venkat Venkataramani, Co-Founder & CEO
• 2016 - present
• 2007 - 2015
• 2002 - 2007

Agenda
• Large-scale data applications
• Rockset: designed to serve data applications
• Reference architectures with Apache Spark and Rockset
• Conclusion

Large-scale data applications

Building apps on Apache Spark
[Diagram: BI, Machine Learning, Data Pipelines; Apache Spark; Data Lake]

Building apps on Apache Spark
[Diagram: BI, Machine Learning, Data Pipelines, Data Apps (low latency, high concurrency); Serving Tier (MySQL, Postgres); Apache Spark; Data Lake]

What happens when we get to large scale?
[Diagram: BI, Machine Learning, Data Pipelines, Large-Scale Data Apps (TBs of data, low latency, high concurrency); Serving Tier (MySQL, Postgres); Apache Spark; Data Lake]

[Diagram: BI and SQL Analytics, Data Science and Machine Learning, Data Engineering, Real-Time Data Applications; Apache Spark, Rockset; Data Lake]

Example: Investment decisions at Sequoia Capital
• Data sets from multiple vendors loaded regularly into data lake
• Run data enrichment in Apache Spark/Databricks
• Entity 360: combine with internal data sources for complete view of potential investments
• Investment team and data scientists use app to help make investment decisions

Example: Personalized recommendations at Ritual
• Health technology company selling multivitamins online
• Customer data from Segment loaded into data warehouse and data lake
• Machine learning modeling in Apache Spark/Databricks
• Build personalized offers and bundles for their online portal and checkout page

Challenges when building large-scale data apps
• Speed
  ▪ Cannot power fast analytics on OLTP
  ▪ Cannot build indexes on data lakes/warehouses
• Scale
  ▪ Single-node systems cannot scale horizontally
  ▪ Bulk loads take too long
• Operational complexity
  ▪ Extensive performance engineering required
  ▪ Periodic reloads result in downtime

Modern data apps
Modern data applications demand speed and scale
But existing solutions force you to pick one
• Pick speed ➔ hard to scale (OLTP: MySQL, Postgres)
OR
• Pick scale ➔ slow and expensive (data warehouse, data lake)

Rockset: Designed to serve data apps

What is Rockset?
Real-time indexing database for modern data apps at massive scale without operational overhead

Speed - Converged Index
• All fields are indexed in inverted, columnar and row indexes
• Accelerates search, aggregation and join queries
• No index definition required

Speed - Converged Index
• Low latency for both highly selective queries and large scans
• Optimizer picks between
  ▪ inverted index (Index Filter operator)
  ▪ columnar format (Column Scan operator)
  ▪ inverted index (Index Scan operator)

Scale - Disaggregated, cloud-native architecture
[Diagram not recoverable]

Simplicity - Bulk ingestion
• Bulk ingest mode for large-scale data
• Scale out number of ingesters, use larger leaf pods
• Write to S3 instead of log store
• No downtime

Scale ingest
[Diagram not recoverable]

Reference architectures with Apache Spark and Rockset

Building data apps with Apache Spark and Rockset
[Diagram not recoverable]

Data architecture for personalized recommendations
[Diagram not recoverable]

Conclusion

Speed: Sub-second analytics for user-facing apps
• Star Schema Benchmark
  ▪ Industry-standard benchmark to measure database performance for analytical apps
  ▪ All queries ran in <1 sec
  ▪ Median runtime of 254 milliseconds

Scale: Lower cost with higher compute efficiency
• Customer reduced overall bill by 75%
• Data app running 24x7
• Converged indexing increases storage cost but significantly reduces compute required to run queries

Zero Ops Overhead: Compress development time
• Customer reduced development from 6 months to 3 days
• SQL analytics on semi-structured data without data prep
• No performance engineering required
• Serverless for low ops

“Our users want to search on any field, anywhere, and we needed to give them that ability. To have this unique capability offered as a service was exactly what we needed to deliver real-time search months ahead of plan.”
- Todd McPartlin, Command Alkon

Rockset: Serving tier to complement Apache Spark
[Diagram: BI, Machine Learning, Data Pipelines, Large-Scale Data Apps (TBs of data, low latency, high concurrency); Serving Tier (sub-second analytics, scale compute/storage as needed, serverless); Apache Spark; Data Lake]

Learn more
• Stop by the Rockset booth or
• Request Demo at Rockset.com ($300 in trial credits)

Thank you!
Venkat Venkataramani
[email protected]
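Illustrative note on the Converged Index slides: the idea of indexing every field in row, columnar and inverted form, and letting an optimizer choose an access path by selectivity, can be sketched in a few lines of Python. This is a toy model under stated assumptions and not Rockset's implementation. The class `ToyConvergedIndex`, its method names, the plan labels and the 10% selectivity threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyConvergedIndex:
    """Toy model: every field of every document is stored three ways."""

    def __init__(self):
        self.rows = {}                                          # doc_id -> full document (row index)
        self.columns = defaultdict(dict)                        # field -> {doc_id: value} (columnar)
        self.inverted = defaultdict(lambda: defaultdict(set))   # field -> value -> {doc_id} (inverted)

    def add(self, doc_id, doc):
        # No index definition is needed: all fields are indexed on write.
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[field][value].add(doc_id)

    def choose_plan(self, field, value, threshold=0.1):
        # Hypothetical rule: use the inverted index when the predicate
        # matches at most `threshold` of all documents, else scan the column.
        total = len(self.rows)
        matches = len(self.inverted[field].get(value, ()))
        if total and matches / total <= threshold:
            return "index_filter"
        return "column_scan"

    def find(self, field, value):
        plan = self.choose_plan(field, value)
        if plan == "index_filter":
            ids = sorted(self.inverted[field].get(value, ()))
        else:
            ids = sorted(i for i, v in self.columns[field].items() if v == value)
        # The row index returns whole documents for the matching ids.
        return plan, [self.rows[i] for i in ids]


index = ToyConvergedIndex()
for i in range(100):
    index.add(i, {"country": "US" if i % 2 else "DE", "user": f"u{i}"})

print(index.find("user", "u7")[0])      # highly selective predicate
print(index.find("country", "US")[0])   # large scan
```

In this sketch the selective lookup on `user` goes through the inverted index while the broad filter on `country` falls back to a column scan, which mirrors the "highly selective queries and large scans" distinction on the slide. The storage cost is visibly higher, since each value is kept three times, in line with the slide's remark that converged indexing trades storage for compute.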
