WHITE PAPER

Choosing the Right In-Memory Computing Solution

A GridGain Systems In-Memory Computing White Paper

May 2017


Contents

Why IMC Is Right for Today's Fast-Data and Big-Data Applications
Dispelling Myths About IMC
IMC Product Categories
Table of IMC Product Categories
Sberbank Case Study: In-Memory Computing in Action
GridGain Systems: A Leader in In-Memory Computing
Making the Choice
Contact GridGain Systems
About GridGain Systems


As in-memory computing (IMC) gains momentum across a wide range of applications, from fintech and ecommerce to telecommunications and IoT, many companies are looking to IMC solutions to help them process and analyze large amounts of data in real time. However, the variety of IMC product categories and solutions can be confusing, making it difficult to determine which solution is best for a particular application. Questions abound: In-memory database or in-memory database option? In-memory data grid or in-memory computing platform?

This white paper reviews why IMC makes sense for today's fast-data and big-data applications, dispels common myths about IMC, and clarifies the distinctions among IMC product categories to make the process of choosing the right IMC solution for a specific use case much easier.

Why IMC Is Right for Today's Fast-Data and Big-Data Applications

Many data problems can be split into two groups: fast data and big data. Fast data generally applies to online transaction processing (OLTP) use cases that are characterized by high throughput and low latency. It's important for OLTP solutions to be able to process hundreds of thousands of operations per second with latencies in milliseconds or nanoseconds. This represents operational data sets that must remain consistent while delivering reliable performance. There's also the big data side of the world, which is characterized by online analytical processing (OLAP) applications. In OLAP scenarios, high throughput isn't as important because these applications are designed for analyzing data and frequently process data offline in a batch-oriented fashion. Low latency still matters in OLAP scenarios, however, because we want queries to run fast. Before focusing on IMC product categories, let's review what IMC is and why now is the right time for memory-first data solutions.

What IMC is. In-memory computing involves storing data in RAM instead of on disk, optimally across a cluster of computers that can process the data in parallel (though some IMC product categories are more limited). Ideally, this architecture provides a way to compute and transact on data at speeds that are orders of magnitude faster than disk-based systems, providing response times in the range of milliseconds to nanoseconds. IMC can also provide other important benefits such as scalability and data consistency, depending on the product and category.

How storage technology evolved toward today's IMC. While in-memory technology has been around in some capacity for decades, its early forms were expensive and had limitations that made other storage technologies preferable: tape drives in the 1950s, hard drives in the 1970s, and flash drives in the 2000s. Not until the 2010s did two factors combine to make in-memory computing the technology of choice. Those two factors were RAM cost (now reduced to less than a penny per megabyte, compared to $35 per megabyte in the mid-1990s) and the advent of 64-bit processors.


The 32-bit processors of the past had a significant limitation: they could address only four gigabytes of RAM. Once the 64-bit CPUs came in, they could address much larger spaces: multiple terabytes of RAM. This ability to put much more data in RAM gave a big push to in-memory computing.

Why IMC performs better than disk-based architectures. The reason that memory-first architectures perform faster than disk-first architectures has to do with the way the data is accessed. Disk-first architectures use disk as primary storage and use memory only for caching, while memory-first architectures use memory as primary storage and disk for backup. This is important because there are several reasons why disk access is slower than memory access.

With disk-first architectures – such as RDBMS and NoSQL databases and file systems such as Hadoop's HDFS – there is a multi-step process that must occur in order to access a byte of data. The request starts with an API call, then it goes through the operating system's I/O layer and the I/O controller before finally accessing the disk to load the data. At this point, even if only one byte of data is needed, more must be loaded, because disk is block-addressable, not byte-addressable, storage. This is significant because an entire block (whichever block size has been configured in the OS; typical configurations for a data store volume are four or eight kilobytes) must be loaded. Each step introduces latency, from several milliseconds to hundreds of milliseconds.

With a memory-first architecture, the process is much simpler and faster. Accessing a byte of data requires only the referencing of a pointer (which specifies the location of that byte in memory). No loading is needed. The system can read kilobytes of data immediately without loading or processing it. There is direct access to any portion of the memory, and that access is measured in nanoseconds, not in milliseconds like the disk access described above.

For applications requiring truly fast performance – nanoseconds or microseconds instead of milliseconds – memory-first architectures are the clear choice.

Dispelling Myths About IMC

Of course, while performance is a key factor in deciding on a data architecture, there are other factors as well – and sometimes myths about in-memory computing come into play when those factors are considered. Let's look at four common myths about in-memory computing and why they are not based in fact.

Myth #1: Too expensive. In the mid-1990s, loading a terabyte of data in memory cost millions of dollars. However, the cost of RAM has been decreasing about 30% per year since then, so it is now much more affordable. Recently, Sberbank found that it could process one billion transactions per second with the GridGain in-memory computing platform using only 10 Dell blades with a combined memory of one terabyte – at a cost of about $25,000. So, memory is now very affordable, and it is being used as the main system of record in many use cases.


Myth #2: Not durable. While it is true that DRAM will not survive a server crash, most of today's systems are designed to solve that problem through fault-tolerance strategies such as automatic data replication and integration with persistent data stores. For example, the Apache® Ignite™ in-memory computing platform (and its commercial version, GridGain) will automatically persist data to, and load data from, any RDBMS or NoSQL database on disk.

Also, there are some promising developments on the horizon in the area of non-volatile memory, such as Intel's 3D XPoint: byte-addressable, non-volatile memory that will plug into PCI slots and act almost like regular DRAM. Because of its non-volatile nature, this type of memory will be able to survive server restarts. While it will not be as fast as DRAM, it will be faster than today's flash.

Myth #3: Flash is fast enough. While today's flash-based memory is definitely faster than spinning disks, it is still generally a block-level device and still must go through the OS I/O stack, the I/O controller, marshaling, and buffering. For applications that need to access individual bytes of data, DRAM is still about a hundred times – or, in some cases, a thousand times – faster than flash-based access. If a ten-millisecond latency is good enough for a particular application, then flash-based memory is fine. However, if the application requires latency of one millisecond or half a millisecond, there is a benefit to paying a little more and doing most of the processing in DRAM.

Myth #4: Memory is only for caching. While caching is an important use of memory, many systems use caching as only a small component of their architectures. Today, many types of processing happen directly in memory: machine learning, streaming, queries, transactions, and so on.

As awareness grows that these common misperceptions are not based in fact, interest in in-memory computing products is growing substantially.

IMC Product Categories

There are many different terms for the types of in-memory computing. It's important to understand where they came from, their capabilities, and how they can be unified into a complete in-memory computing platform.

In-memory computing products generally fit into the following four categories:

• In-memory database options
• In-memory databases
• In-memory data grids
• In-memory computing platforms

The first of these categories, in-memory database options, is very limited in scope and not scalable to fit with the distributed architectures of the future. The remaining three categories involve distributed-memory architectures, so they are better suited to the fast-data OLTP applications that are becoming more prevalent today. These categories also include some products that can capably address big-data OLAP issues by integrating with products such as Apache Hadoop®, Apache Spark®, and others.


The table "IMC Product Categories" later in this paper provides an overview of the advantages and disadvantages of the four product categories, which are described in more detail below.

In-memory database options. This category describes configuration options for existing databases that improve their performance by increasing the amount of data accessed in memory rather than on disk. Such options are available with Oracle® Database 12c, Microsoft SQL Server®, and many other RDBMS products.

The main advantage of using one of these products is that it works with a company's existing database, with no need to migrate data, change APIs, or change any code. Generally, all that is needed is a configuration change and the addition of more memory. Database performance will improve immediately.
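To make the idea concrete, here is a minimal sketch of such a configuration change, assuming Oracle Database 12c with the Database In-Memory option licensed and an INMEMORY_SIZE allocation already set on the instance; the connection details and the SALES table are hypothetical.

```java
// Minimal sketch: enabling an in-memory database option on an existing table.
// Assumes Oracle Database 12c with the Database In-Memory option; names are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InMemoryOptionSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/ORCL", "admin", "secret");
             Statement stmt = conn.createStatement()) {
            // Existing schema, queries, and application code stay exactly as they are.
            // The table is simply marked for population into the in-memory column store.
            stmt.execute("ALTER TABLE sales INMEMORY");
        }
    }
}
```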

However, this is a very limited solution, because it involves using the memory on only one server. There is no scalability, unlike other in-memory solutions that scale out in a distributed fashion and allow access to the total memory available on all of the servers in a cluster.

In-memory databases. Products in this category, such as MemSQL® and VoltDB®, behave just like databases that keep all of their data in memory (distributed across multiple servers) for fast access. While they have some persistence capabilities, they are basically memory-first architectures. They are called databases because the way to interact with them is to use SQL with standard database drivers. Users can create schemas, issue queries, and use DDL or DML statements such as inserts, updates, deletes, and so on.
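As a brief illustration of this SQL-only access model, the sketch below assumes a MemSQL cluster, which speaks the MySQL wire protocol, so an ordinary MySQL JDBC driver can be used; the host, credentials, and table are hypothetical.

```java
// Minimal sketch: talking to an in-memory database through a standard SQL driver.
// Assumes MemSQL (MySQL wire protocol) and the MySQL Connector/J driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InMemoryDatabaseSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://memsql-host:3306/trading", "app", "secret");
             Statement stmt = conn.createStatement()) {
            // Standard DDL and DML -- the database itself keeps the data in memory.
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS orders (id BIGINT PRIMARY KEY, amount DOUBLE)");
            stmt.executeUpdate("INSERT INTO orders VALUES (1, 42.50)");
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM orders")) {
                while (rs.next())
                    System.out.println("Orders: " + rs.getLong(1));
            }
        }
    }
}
```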

These systems have very high throughput because the data is accessed in memory. They also exhibit good scalability because they distribute their data across multiple servers. However, they also have some significant disadvantages.

With an in-memory database, SQL is the only way to talk to the system, and SQL does not fit all distributed use cases. Sometimes it is important to be able to perform computations directly on the server where the data resides in order to optimize performance.

Another major disadvantage is that a company's existing database must be set aside to install the new in-memory database, which then becomes the system of record. Most companies have a lot of infrastructure code around their existing databases, so migrating to a new database requires a massive effort. For this reason, in-memory databases are most feasible for companies that are either selecting a first database or needing to completely replace their existing database. Companies that instead seek to accelerate their existing databases are generally better off with products such as in-memory data grids or in-memory computing platforms.

It's worth noting that Apache Ignite / GridGain, a full-featured in-memory computing platform, also fully supports ANSI SQL-99, including DML and DDL.

In-memory data grids. This group includes products such as Hazelcast®, GigaSpaces®, Oracle Coherence®, Apache Geode® / Pivotal GemFire®, and Infinispan®. In some cases, like that of Apache


Ignite / GridGain, this category is a subset of the larger category of in-memory computing platforms (IMCP). IMCPs include in-memory data grids among their other in-memory components.

Like in-memory databases, products in this category also distribute data across multiple servers. However, the methods of accessing the distributed data are different: data grids typically provide data access through a key-value API and sometimes a custom query API, with limited or nonexistent SQL support. In addition, data grids generally provide the ability to collocate custom computations for specific data.

The ability to collocate computations with data provides some significant performance advantages. A typical application might involve, for example, accessing many accounts and closing their balances or performing other computations on them. By performing such actions directly on the server where the data resides, data grids can provide very high throughput and very low latencies. In a distributed system, the more the system can collocate computations with their associated data, the faster and more scalable the system will be because this eliminates network bottlenecks.
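To make this concrete, here is a minimal sketch of collocated processing using Apache Ignite's compute API as the example data grid; the cache name, key, and interest calculation are hypothetical.

```java
// Minimal sketch: run a closure on the cluster node that owns a given key, so the
// account data never crosses the network. Assumes an Apache Ignite node can start locally.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class CollocatedComputeSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Long, Double> balances = ignite.getOrCreateCache("balances");
        balances.put(42L, 1000.0);

        long accountId = 42L;
        // The closure is shipped to the primary node for key 42 and runs there.
        ignite.compute().affinityRun("balances", accountId, () -> {
            IgniteCache<Long, Double> local = Ignition.localIgnite().cache("balances");
            double balance = local.get(accountId);
            local.put(accountId, balance * 1.01); // e.g., apply interest in place
        });
    }
}
```

The same pattern extends to batches of accounts: the more of the computation that runs where the data lives, the less data moves across the network.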

The downside of this architecture is that it is necessary to develop code to integrate the logic with the data grid. However, the benefit is extremely fast performance, with throughput of hundreds of thousands of operations per second.

Another major benefit that data grids offer is compatibility with existing databases. Unlike in-memory databases, which must become the system of record, in-memory data grids can be inserted on top of an existing database and integrate with it. For example, if a company already has a database installed – such as Oracle®, Microsoft SQL Server, or Apache Cassandra® – the data grid provides an API that fits between the database and the application, providing large amounts of caching and data distribution.
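A minimal sketch of this read-through/write-through integration follows, using Apache Ignite's cache store interface as the example; the store is a stub, and the JDBC calls against the existing database are only indicated in comments.

```java
// Minimal sketch: a data grid cache layered over an existing RDBMS via a cache store.
// Assumes Apache Ignite; the AccountStore and its SQL are hypothetical placeholders.
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheOverRdbmsSketch {
    /** Bridges the grid to the existing database (JDBC calls stubbed out for brevity). */
    public static class AccountStore extends CacheStoreAdapter<Long, Double> {
        @Override public Double load(Long key) {
            // e.g., SELECT balance FROM accounts WHERE id = ? against the existing RDBMS
            return 0.0; // placeholder value
        }
        @Override public void write(Cache.Entry<? extends Long, ? extends Double> entry) {
            // e.g., UPDATE or INSERT the corresponding row in the existing RDBMS
        }
        @Override public void delete(Object key) {
            // e.g., DELETE the corresponding row in the existing RDBMS
        }
    }

    public static void main(String[] args) {
        CacheConfiguration<Long, Double> cfg = new CacheConfiguration<>("accounts");
        cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(AccountStore.class));
        cfg.setReadThrough(true);   // cache misses fall through to the database
        cfg.setWriteThrough(true);  // cache updates propagate back to the database

        Ignite ignite = Ignition.start();
        IgniteCache<Long, Double> accounts = ignite.getOrCreateCache(cfg);
        accounts.get(1L);          // loaded from the RDBMS on a miss
        accounts.put(1L, 750.0);   // written to both memory and the RDBMS
    }
}
```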

Because not all data-grid products are created equal, it's important to look for the following features (a combined code sketch appears after the list):

• JCache implementation. Garbage-collection pauses can add significant time delays in situations where hundreds of gigabytes of data are stored across clusters containing tens of nodes. Data grids that implement the JCache standard, JSR 107, can avoid the garbage-collection process by storing and managing data outside of Java in off-heap memory. For better performance, choose a data grid that implements the JCache JSR 107 standard, such as the data grid component of the in-memory computing platform Apache Ignite / GridGain, Oracle Coherence, or the enterprise version of Hazelcast.

• Transactional consistency. Managing transactions across a distributed grid is more challenging than managing them in a non-distributed architecture. In choosing a data grid, consider what level of control it offers over transactional consistency. This is particularly important for financial applications. Eventual consistency is not as strong a guarantee as fully ACID-compliant transactions. Look for a data grid with strong transactional semantics, such as the data grid component of the in-memory computing platform Apache Ignite / GridGain.

• Full SQL support with in-memory indexing. Some data grids offer much more substantial query support than just providing key-value access to the data. Some products provide limited SQL or custom APIs for querying data, and one – the data grid component of the in-memory computing platform


Apache Ignite / GridGain – supports full ANSI SQL-99, including DDL and DML. This support lets companies leverage their existing SQL code through JDBC and ODBC APIs. Also, Ignite / GridGain stores indexes in-memory, which is important because indexes substantially improve performance of queries and joins.
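The combined sketch below illustrates the three checklist items, using Apache Ignite as the example data grid; the Account class, cache name, and balances are hypothetical, and a production deployment would normally configure caches externally rather than inline.

```java
// Illustrative sketch of the three checklist features: JCache-style access, ACID
// transactions, and SQL with in-memory indexing. Assumes a locally started Ignite node.
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class DataGridChecklistSketch {
    /** Hypothetical value type; @QuerySqlField exposes fields to SQL and indexing. */
    public static class Account {
        @QuerySqlField(index = true)
        public long id;
        @QuerySqlField
        public double balance;

        public Account() { }
        public Account(long id, double balance) { this.id = id; this.balance = balance; }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, Account> cfg = new CacheConfiguration<>("accounts");
            cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL); // enables ACID transactions
            cfg.setIndexedTypes(Long.class, Account.class);         // enables SQL and in-memory indexes

            // 1. JCache (JSR 107): IgniteCache extends javax.cache.Cache, so the
            //    standard put/get/remove API applies directly.
            IgniteCache<Long, Account> accounts = ignite.getOrCreateCache(cfg);
            accounts.put(1L, new Account(1L, 500.0));
            accounts.put(2L, new Account(2L, 250.0));

            // 2. Transactional consistency: move money between two accounts atomically.
            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                Account from = accounts.get(1L);
                Account to = accounts.get(2L);
                from.balance -= 100.0;
                to.balance += 100.0;
                accounts.put(1L, from);
                accounts.put(2L, to);
                tx.commit(); // both updates commit or neither does
            }

            // 3. SQL with in-memory indexing: an ad-hoc aggregate over the cached data.
            List<List<?>> total =
                accounts.query(new SqlFieldsQuery("SELECT SUM(balance) FROM Account")).getAll();
            System.out.println("Total balance: " + total.get(0).get(0));
        }
    }
}
```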

Another important point to remember in choosing a data grid is that some data-grid products include other important in-memory components in addition to the data grid, placing them in the more encompassing category of in-memory computing platforms.

In-memory computing platforms. Anyone seriously evaluating in-memory computing should focus their attention on the ultimate form of this technology: in-memory computing platforms. With in-memory technology in particular, the platform approach holds greater value than the point-solution approach. Major data centers don't want to install and manage myriad point solutions. They want a single platform that supports all use cases. They want a strategic approach to in-memory computing, and that solution is an in-memory computing platform.

The category of in-memory computing platform includes data-grid products such as Apache Ignite, GridGain (the enterprise version of Apache Ignite), and Terracotta®. While data grids are essentially distributed caches with some basic transactional and compute capabilities, featuring key-value data access, in-memory computing platforms are more comprehensive products with much more extensive capabilities.

In the case of Apache Ignite and GridGain, the extra features putting these products in the category of in-memory computing platform include the following:

• Multiple, well-integrated in-memory components: Apache Ignite and GridGain include not only a data grid and compute grid, but also a SQL grid, service grid, a file system, and a streaming component – all in-memory, all well integrated with each other. For example, computations can be sent to the nodes where the data is, and a service can be attached to the data as well. With streaming, the data is streamed to the node where it will be eventually stored and processed. All of these components know how to work with each other automatically, without any additional effort from the user. GridGain Enterprise Edition integrates additional features with these components, including security management and monitoring, administration, data-center replication, and network-segmentation resolution.

• Multiple ways to talk to data: Data access is another area that distinguishes in-memory computing platforms from in-memory databases, which support data access only through SQL, and in-memory data grids, which provide data access primarily through a key-value API. In-memory computing platforms provide multiple ways to talk to the data. For example, Apache Ignite and GridGain provide both ANSI SQL-99 and key-value access, plus a wide range of other access mechanisms via a Unified API, including C++, .NET, Java, PHP, and ODBC and JDBC drivers, which allow integration with data visualization tools such as Tableau®. Plus, the Ignite file system integrates with the Hadoop file system, HDFS, for persistence to memory instead of disk, and there is a pluggable, in-memory MapReduce component that integrates with Spark for additional big-data cluster-computing power. A short JDBC sketch of this multi-path access follows.
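The short sketch below illustrates one of these additional access paths from the outside: a plain JDBC client issuing DDL and DML against the same cluster that other applications reach through the key-value and compute APIs. It assumes a running Apache Ignite / GridGain node reachable on localhost and the Ignite thin JDBC driver on the classpath; the table and column names are hypothetical.

```java
// Minimal sketch: plain SQL over JDBC against an in-memory computing platform.
// Assumes a running Apache Ignite / GridGain node and the Ignite thin JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcAccessSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement()) {

            // DDL and DML over JDBC, just as with a conventional SQL database.
            stmt.executeUpdate("CREATE TABLE trade (id BIGINT PRIMARY KEY, amount DOUBLE)");
            stmt.executeUpdate("INSERT INTO trade (id, amount) VALUES (1, 99.50)");

            // The same data remains reachable through the platform's other APIs.
            try (ResultSet rs = stmt.executeQuery("SELECT SUM(amount) FROM trade")) {
                while (rs.next())
                    System.out.println("Total traded: " + rs.getDouble(1));
            }
        }
    }
}
```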


Thanks to features such as these, in-memory computing platforms provide performance and scalability improvements and flexibility not available with a product that is simply a data grid. For example, with Apache Ignite and GridGain, data might be stored using the key-value API and accessed using the SQL API, or stored in Java and accessed in .NET – all with the streamlined performance of multiple integrated in-memory components.

Table of IMC Product Categories

In-memory database option
  Examples: Oracle Database 12c, Microsoft SQL Server, and others
  Description: A configuration option for an existing database that boosts performance by caching data in memory.
  Advantages: Works with the existing database.
  Disadvantages: Not scalable; limited to the memory on one server; not a distributed architecture.
  Best suited for: Companies that want to improve the performance of an existing database and don't anticipate scaling to work with large volumes of data at real-time speeds.

In-memory database
  Examples: MemSQL, VoltDB, SAP HANA®
  Description: An RDBMS that stores data in memory instead of on disk.
  Advantages: High throughput; high scalability; disk persistence (disk is a copy of memory); full SQL support; applications can remain unchanged.
  Disadvantages: Requires complete replacement of the current database; can interact only using SQL; no distributed parallel processing.
  Best suited for: Companies in the process of selecting a first database that require speed for fast-data (OLTP) situations.

In-memory data grid
  Examples: Hazelcast, GigaSpaces, Oracle Coherence, Apache Geode / Pivotal GemFire, Infinispan
  Description: A distributed in-memory key-value store.
  Advantages: High throughput; low latency; very high scalability; no need to replace existing databases.
  Disadvantages: Requires code to integrate logic with the data grid; key-value stores may not include complete SQL support.
  Best suited for: Companies requiring very high speed and scalability for fast-data (OLTP) applications and the flexibility to work with either new or existing databases.

In-memory computing platform
  Examples: Apache Ignite / GridGain In-Memory Computing Platform, Terracotta
  Description: Includes an in-memory data grid plus several fully integrated in-memory components and features.
  Advantages: Same as an in-memory data grid, plus multiple ways to access data and additional integrated components.
  Disadvantages: Requires code to integrate logic with data as it is stored in the in-memory computing platform.
  Best suited for: Companies that require a full-featured platform with support for OLTP plus SQL-based OLAP and HTAP applications.


Sberbank Case Study: In-Memory Computing in Action

One of the most noteworthy GridGain Systems financial services customers is Sberbank, the largest bank in Russia and the third largest in Europe. Sberbank was faced with a similar problem to the one currently facing companies engaged in building a modern in-memory computing architecture. The bank was switching from a more traditional, brick-and-mortar setup – one in which people would come into their offices and manually process a limited number of financial transactions each day, during a limited time period – to a new world with online and mobile customers transacting with them 24/7.

The company forecasted future throughput requirements and determined that it needed to move to a next generation data processing platform to handle the expected transaction volume. Sberbank analyzed more than ten potential solutions from vendors in the in-memory computing space and found that the GridGain in-memory computing platform was the most comprehensive solution. The bank concluded that GridGain would provide the next-generation platform with a significant improvement in performance and scalability.

The GridGain in-memory computing platform provided several other important capabilities that Sberbank's next-generation platform would require, such as machine learning and analytics, flexible pricing, artificial intelligence, ease of deployment, hardware independence of cluster components, and a rigorous level of transactional consistency. Of particular importance was the ability to conduct integrity checking and rollback on financial transactions. Sberbank could not find that level of consistency with other in-memory computing solutions.

In a January 2016 article in RBC, Herman Gref, the CEO of Sberbank, said that the bank selected the GridGain Systems technology to build a platform that will enable the bank to introduce new products within hours, not weeks. He went on to state that the GridGain in-memory computing platform enables Sberbank to provide unlimited performance and very high reliability while being much cheaper than the technology used previously. Sberbank is using GridGain's in-memory computing platform to implement capabilities that could not be provided by the other vendors evaluated – a group that included Oracle®, IBM® and others.

GridGain Systems: A Leader in In-Memory Computing

With companies grappling with the challenges resulting from increasing fast-data and big-data workloads, demand for the GridGain in-memory computing platform is growing dramatically. This comprehensive platform contains a complete feature set that surpasses the capabilities of in-memory database point solutions, making it well suited for OLAP, OLTP and HTAP use cases.

As a complete in-memory computing platform, GridGain helps users consolidate onto a single high performance and highly scalable big-data solution for transactions and analytics, resulting in lowered TCO. Advanced SQL functionality and API-based support for common programming languages enable


rapid deployment. This, coupled with the rapidly decreasing cost of memory, boosts ROI for in-memory computing initiatives, enabling businesses to build less expensive systems that perform thousands of times better.

Clients enjoy the following:

• A unified high-performance architecture. The GridGain in-memory computing platform consists of multiple grids connected by a clustered in-memory file system. The In-Memory Data Grid, Compute Grid, SQL Grid, Streaming Grid and Service Grid are interconnected. Computations occur as close as possible to the data used in the computation. Additional features such as high throughput, low latency, load balancing, caching, in-memory indexing, streaming, Hadoop acceleration and other performance improvements are crucial to success in real-time modeling, processing, and analytics.

• Scalability. The GridGain in-memory computing platform excels in terms of scalability, allowing companies to add cluster nodes and memory in real time with automatic data rebalancing. As a hardware-agnostic solution, clients can choose their preferred hardware for scaling up.

• Full SQL support. GridGain is ANSI SQL-99 compliant, and the In-Memory SQL Grid supports DML, so users can leverage their existing SQL code using the GridGain JDBC and ODBC APIs. Users with existing code bases that are not based on SQL can leverage their existing code through supported APIs for Java, .NET, C++, and more.

• High availability. The GridGain in-memory computing platform provides essential high availability features such as data-center replication, automatic failover, fault tolerance, and quick recovery on an enterprise-level scale.

• Transaction processing. The GridGain in-memory computing platform supports ACID-compliant transactions in a number of user-configurable modes.

• Security features. The GridGain in-memory computing platform supports authentication, authorization, multiple encryption levels, tracing, and auditing.

• Open source framework. GridGain is based on Apache® Ignite™, a popular open source project with many contributors that has been tested globally. GridGain Systems was the original creator of the code contributed to the Apache Software Foundation that became Apache Ignite and fully supports the technology behind Apache Ignite. The GridGain Enterprise Edition extends the features in Apache Ignite to provide enterprise-level capabilities and services, such as additional security, data center replication, auditing mechanisms, a GUI for management and monitoring, network segmentation, and a recoverable local store.

• Production support. GridGain Systems Support, available to GridGain Professional Edition and GridGain Enterprise Edition users, includes rolling updates, faster availability of all releases and patches, and 24/7 enterprise-level support.

Making the Choice

Companies facing the need to process massive amounts of data in real time can now benefit from a robust and affordable range of in-memory computing solutions. From the simple and limited performance boost of in-memory database options to the more scalable and high-performing distributed architectures of in-memory databases (which replace existing databases), in-memory data grids (which integrate with existing databases), and in-memory computing platforms (which include data


grids and much more), there is an appropriate product category for every use case. Understanding the distinctions among these categories is the key to making the best choice for any big-data or fast-data application requiring in-memory OLTP, OLAP or HTAP strategies.

Contact GridGain Systems

To learn more about how GridGain In-Memory Data Fabric can help your business, please email our sales team at [email protected], call us at +1 (650) 241-2281 (US) or +44 (0) 7775 835 770 (Europe), or complete our contact form to have us contact you.

About GridGain Systems

GridGain Systems is revolutionizing real-time data access and processing by offering enterprise-grade in-memory computing solutions built on Apache Ignite. GridGain solutions are used by global enterprises in financial, software, eCommerce, retail, online business services, healthcare, telecom and other major sectors. GridGain solutions connect data stores (SQL, NoSQL, and Apache™ Hadoop®) with cloud-scale applications and enable massive data throughput and ultra-low latencies across a scalable, distributed cluster of commodity servers. GridGain is the most comprehensive, enterprise-grade in-memory computing platform for high volume ACID transactions, real-time analytics, and hybrid transactional/analytical processing. For more information, visit gridgain.com.


COPYRIGHT AND TRADEMARK INFORMATION © 2017 GridGain Systems. All rights reserved. This document is provided "as is". Information and views expressed in this document, including URL and other web site references, may change without notice. This document does not provide you with any legal rights to any intellectual property in any GridGain product. You may copy and use this document for your internal reference purposes. GridGain is a trademark or registered trademark of GridGain Systems, Inc. Windows, .NET and C# are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. JEE and Java are either registered trademarks or trademarks of SUN Microsystems and/or Oracle Corporation in the United States and/or other countries. Apache, Apache Ignite, Ignite, the Apache Ignite logo, Apache Spark, Spark, Apache Cassandra, and Cassandra are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other trademarks and trade names are the property of their respective owners and used here for identification purposes only.
