In-Memory Computing Platform: Data Grid Deep Dive
Total Page:16
File Type:pdf, Size:1020Kb
In-Memory Computing Platform: Data Grid Deep Dive Rachel Pedreschi Matt Sarrel Director of Solutions Architecture Director of Technical Marketing GridGain Systems GridGain Systems [email protected] [email protected] @rachelpedreschi @msarrel © 2016 GridGain Systems, Inc. GridGain Company Confidential Agenda • Introduction • In-Memory Computing • GridGain / Apache Ignite Overview • Survey Results • Data Grid Deep Dive • Customer Case Studies © 2016 GridGain Systems, Inc. Why In-Memory Now? Digital Transformation is Driving Companies Closer to Their Customers • Driving a need for real-time interactions Internet Traffic, Data, and Connected Devices Continue to Grow • Web-scale applications and massive datasets require in-memory computing to scale out and speed up to keep pace • The Internet of Things generates huge amounts of data which require real-time analysis for real world uses The Cost of RAM Continues to Fall • In-memory solutions are increasingly cost effective versus disk-based storage for many use cases © 2015 GridGain Systems, Inc. GridGain Company Confidential Why Now? Data Growth and Internet Scale Declining DRAM Cost Driving Demand Driving Attractive Economics Growth of Global Data 35. 26.3 17.5 ZettabytesData of 8.8 DRAM 0. Flash Disk 8 zettabytes in 2015 growing to 35 in 2020 Cost drops 30% every 12 months © 2016 GridGain Systems, Inc. The In-Memory Computing Technology Market Is Big — And Growing Rapidly IMC-Enabling Application Infrastructure ($M) © 2016 GridGain Systems, Inc. What is an In-Memory Computing Platform? Multi-Featured Solution • Supports data caching, massive parallel processing, in-memory SQL, streaming and much more Does Not Replace Existing Databases • Slides in between the existing application and data layers Supports OLTP and OLAP Use Cases • Offers ACID compliant transactions as well as analytics support Multi-Platform Integration • Works with all popular RDBMS, NoSQL and Hadoop databases and offers a Unified API with support for a wide range of languages Deployable Anywhere • Can be deployed on premise, in the cloud, or in hybrid environments © 2015 GridGain Systems, Inc. GridGain Company Confidential Apache Ignite Project • 2007: First version of GridGain • Oct. 2014: GridGain contributes Ignite to ASF • Aug. 2015: Ignite is the second fastest project to | graduate after Spark • Today: • 60+ contributors and rapidly growing • Huge development momentum - Estimated 192 years of effort since the first commit in February, 2014 [Openhub] • Mature codebase: 1M+ lines of code © 2016 GridGain Systems, Inc. Application In-Memory Always Available Scalable SQL 99 / ACID / No Rip & Replace MapReduce © 2016 GridGain Systems, Inc. GridGain Enterprise Monitoring and Security High Availability Management Services OLTP OLAP GaaS and Streaming Caching Caching Messaging Data Grid Compute Grid Apache Ignite / GridGain Professional © 2016 GridGain Systems, Inc. Survey Results: Primary Industries Software Banking IT Consulting Other Telecommunications Insurance Investment Management Healthcare FinTech Government Retail eCommerce Other Financial Services Other Online Services Manufacturing Online Gaming Social Media Media Logistics Other Electronics Online Business Services Computers Hospitality Pharmaceuticals Utilities © 2016 GridGain Systems, Inc. Survey Results: What uses were you considering for in-memory computing Database Caching Application Scaling High Speed Transactions Real Time Streaming Faster Reporting RDBMS Scaling HTAP Apache Spark Acceleration Web Session Clustering Hadoop Acceleration MongoDB Acceleration 0 18 35 53 70 Column1 © 2016 GridGain Systems, Inc. Survey Results: How important are each of the following product features to your organization? Data Grid Compute Grid In-memory File System Service Grid Streaming Grid ANSI SQL-99 Compliance Container Support In-Memory Hadoop File System Spark Shared RDDs Hadoop Acceleration Integration with Zeppelin 0. 1.2 2.3 3.5 4.6 5.8 Column1 © 2016 GridGain Systems, Inc. Survey Results: Where do you run GridGain and/or Apache Ignite? Private Cloud On Premise AWS Microsoft Azure Google Cloud Platform Softlayer Another Public Cloud 0 15 30 45 60 Column1 © 2016 GridGain Systems, Inc. Survey Results: Which of the following protocols do you use to access your data? Java SQL MapReduce .NET C++ Scala Node.js PHP Groovy 0 23 45 68 90 Column1 © 2016 GridGain Systems, Inc. Survey Results: Which data stores are you/would you likely use with GridGain/Apache Ignite? Oracle MySQL HDFS PostresSQL MongoDB Microsoft SQL Server Cassandra DB2 Other 0 13 25 38 50 Column1 © 2016 GridGain Systems, Inc. Data Grid: Cache modes & Horizontal Scaling A A B A A B Primary Primary Primary Primary B B C B D C D C A C D C Local Client Backup Backup Local Client Backup Backup JVM 1 JVM 2 JVM 1 JVM 2 A A D C A A D C B B Primary Primary B B Primary Primary C C A B C B A D C C B A Near Cache Near Cache Remote Client Backup Backup Remote Client Backup Backup Client JVM JVM 3 JVM 4 Client JVM JVM 3 JVM 4 Replicated Cache Partitioned Cache © 2016 GridGain Systems, Inc. Cluster Groups Data Cache Zone 1 Data & Compute Zone 2 Compute Grid Zone 3 Data read Data write Server Server Client Server Server Server Server Client Client Server Server Client Map/Reduce Data read Data write Job Data read Compute Job Client Client Client GridGain / Ignite Cluster © 2016 GridGain Systems, Inc. Data Grid: Tiered Memory & Local Store • Tiered Memory Latency • On-Heap -> • Off-Heap -> Swap Space • Disk (Swap) • Persistent On-Disk Store Off-Heap • Fast Recovery On-Heap • Local Data Reload • Eliminate Network and Db impacts Capacity when reloading in-memory store Local Recoverable Store © 2016 GridGain Systems, Inc. Data Grid: Off-Heap Memory • Unlimited Vertical Scale • Avoid Java Garbage Collection Pauses • Small On-Heap Footprint • Configurable eviction policies • Off-Heap Indexes • Full RAM Utilisation • Simple Configuration © 2016 GridGain Systems, Inc. Data Grid: DC Replication DC1 DC2 • Multiple (up to 32) Data Centres • Complex Replication Technologies • Active-Active & Active-Passive • Smart Conflict Resolution • Durable Persistent Queues Active • Automatic Throttling Transactional or Bi-Directional Eventually • GridGain Enterprise Replication Consistent DC3 © 2016 GridGain Systems, Inc. Data Grid: External Persistence Write Through • Read-through & Write- through • Support for Write-behind DB External • Configurable eviction policies Persistent • DB schema mapping wizard: Ignite Clients Store • Generates all the XML configuration and Read Java POJOs Through Ignite Cache Nodes © 2016 GridGain Systems, Inc. Data Grid: Cache APIs • Predicate-based Scan Queries • Text Queries based on Lucene indexing • Query configuration using annotations, Spring XML or simple Java code • SQL Queries: Automatic Group By, Aggregations, Sorting, Cross-Cache Joins, Unions • Memcached (PHP, Java, Python, Ruby) • HTTP REST API • JDBC & ODBC © 2016 GridGain Systems, Inc. Data Grid: SQL Support (ANSI 99) • ANSI-99 SQL • In-Memory Indexes (On and Off- Heap) • Automatic Group By, Aggregations, Sorting • Cross-Cache Joins, Unions • Use local H2 engine © 2016 GridGain Systems, Inc. Data Grid: Transactions • Fully ACID • Support for Transactional & Atomic • Cross-cache transactions • Optimistic and Pessimistic concurrency modes with multiple isolation levels • Deadlock protection • JTA Integration © 2016 GridGain Systems, Inc. 1000’s of Deployments Automated Trading Systems • Real time analysis of trading positions • Real time market risk assessment • High volume transactions • Ultra low latencies trading Financial Services • Fraud Detection • Risk Analysis Mobile & IoT • Insurance rating and modeling • Real-time streaming processing • Complex event processing Big Data Analytics • Real time analysis of inventory Biotech • Operational up-to-the-second BI • High performance genome data matching • Drug discovery © 2016 GridGain Systems, Inc. Case Study: • Connects over 650 lifestyle and consumer brands (more than 4.5 million products) with brand advocates • Brands use the network to identify experts and engage them in their sales channel • Member profiles (interests, experience, content viewed, purchases) • Real Time Analytics on "We need it to work fast and work at scale, thousands of active user and GridGain does. We couldn't do what we sessions for personalized do without GridGain, and we're confident we content can scale to meet the growth we anticipate.“ -Jeremy Knudsen, CTO © 2016 GridGain Systems, Inc. Case Study: • Background: • The Challenge – Intelligentpipe is a big data – Collect and analyze massive software company serving amounts of mobile user the global traffic data in real time telecommunications industry – Tens of millions of users by developing solutions for – Consumption of network mobile operators to improve resources their business and – Type of network traffic operational processes (voice or data) © 2016 GridGain Systems, Inc. Case Study: • GridGain Professional Edition used to build a high performance low latency analysis platform • “GridGain ensures responsiveness regardless of how much information we need to search through.” Sakari Paloviita, CTO, Intelligentpipe • Collect and analyze multiple terabytes per day © 2016 GridGain Systems, Inc. Case Study: • Real-time analytics provides fast insight • Easy integration with existing systems due to GridGain’s Unified API and ANSI SQL-99 support • Linear scaling across deployed server to keep up seamlessly as the business grows We’ll want to use technology