High Performance with Distributed Caching

Total Page:16

File Type:pdf, Size:1020Kb

High Performance with Distributed Caching High Performance with Distributed Caching Key Requirements For Choosing The Right Solution High Performance with Distributed Caching: Key Requirements for Choosing the Right Solution Table of Contents Executive summary 3 Companies are choosing Couchbase for their caching layer, and much more 3 Memory-first 4 Persistence 4 Elastic scalability 4 Replication 5 More than caching 5 About this guide 5 Memcached and Oracle Coherence – two popular caching solutions 6 Oracle Coherence 6 Memcached 6 Why cache? Better performance, lower costs 6 Common caching use cases 7 Key requirements for an effective distributed caching solution 8 Problems with Oracle Coherence: cost, complexity, capabilities 8 Memcached: A simple, powerful open source cache 10 Lack of enterprise support, built-in management, and advanced features 10 Couchbase Server as a high-performance distributed cache 10 General-purpose NoSQL database with Memcached roots 10 Meets key requirements for distributed caching 11 Develop with agility 11 Perform at any scale 11 Manage with ease 12 Benchmarks: Couchbase performance under caching workloads 12 Simple migration from Oracle Coherence or Memcached to Couchbase 13 Drop-in replacement for Memcached: No code changes required 14 Migrating from Oracle Coherence to Couchbase Server 14 Beyond caching: Simplify IT infrastructure, reduce costs with Couchbase 14 About Couchbase 14 Caching has become Executive Summary a de facto technology to boost application For many web, mobile, and Internet of Things (IoT) applications that run in clustered performance as well or cloud environments, distributed caching is a key requirement, for reasons of both as reduce costs. performance and cost. By caching frequently accessed data in memory – rather than making round trips to the backend database – applications can deliver highly responsive experiences that today’s users expect. And by reducing workloads on backend resources and network calls to the backend, caching can significantly lower capital and operating costs. All distributed caching solutions solve for common problems – performance, manageability, scalability – in order to gain effective access to data in high-quality applications. Figure 1: The three common requirements for distributed caching solutions High performance is a given, because the primary goal of caching is to alleviate the bottlenecks that come with traditional databases. This is not limited to relational databases, however. NoSQL databases like MongoDB™ also have to make up for their performance problems by recommending a third-party cache, such as Redis, to service large numbers of requests in a timely manner. Caching solutions must be easy to manage, but often are not. Whether it’s being able to easily add a new node, or to resize existing services, it needs to be quick and easy to configure. The best solutions provide graphical user interfaces and REST APIs to help keep things manageable. Elastic scalability refers not only to the ability to grow a cluster as needed, but also refers to the ability to replicate across multiple wide area cluster networks. Cross datacenter replication is a feature that is either missing or very poor performing across many caching technologies. To achieve this scalability, several products often have to be glued together, thereby decreasing manageability and greatly increasing cost. Companies are choosing Couchbase for their caching layer, and much more Couchbase Server is a great fit for many caching scenarios. More than 400 enterprises – including eBay, PayPal, LinkedIn, Intuit, Viber, and many others – have deployed Couchbase Server for mission-critical applications. So why have they chosen Couchbase 3 over the alternatives? It’s not only for the caching capabilities but also for other powerful features, a few of which are described here. Caching solutions are built to solve the problems mentioned above. Many are merely key-value stores with in-memory capabilities and some ability to scale out. However, their overall architecture prevents them from being much more than that. Couchbase was architected to solve the core problems that are often related to caching but are really just standard performance and scalability issues. Additionally, Couchbase addresses challenges relating to high availability for applications as well as comprehensive durability for data with failover options. There are four particular capabilities that are core to Couchbase and that serve as the foundation for all the other capabilities of the database. Memory-first Couchbase has a memory-first architecture – it can run as a pure in-memory database, letting applications get the fastest possible access to data without hitting a disk. It can also service data beyond the size of memory, using efficient data access methods that keep memory loaded with the latest updates. Couchbase is a high-performance and flexible caching solution that makes the most out of the resources that are available on each node. Incidentally, it is also an easy upgrade for those moving from Memcached – they can simply drop-in Couchbase as a full replacement and instantly gain several advantages while also maintaining in-memory performance as well as adding many other advantages. Persistence Although in-memory capabilities are available, that does not mean that Couchbase only operates in memory. Eventual data persistence is integral to Couchbase and provides the added benefit of data durability that enterprises demand. For those who are simply using Couchbase as a cache, they benefit from not having to reload the cache every time they either need to adjust their infrastructure or respond to a system failure – the data remains available as expected. Elastic scalability In-memory caching solutions should not require you to scale up your hardware to keep pace with your data volumes. Couchbase allows you to easily scale out your cluster with commodity hardware when you want to share the workload or distribute data resources. Enterprise users demand this functionality as it minimizes infrastructure expenditures. Similarly, different kinds of workloads in Couchbase can be isolated from each other depending on need. If you need to improve performance for one aspect of Couchbase, for example the query service capabilities, then you only need to add more query nodes. Likewise, if you have more data but don’t need to scale querying capabilities, you can selectively add more data nodes. This is referred to as Multi-Dimensional Scaling (MDS) and allows users to isolate their workloads while also incrementally increasing access to more cluster resources as needed. Other solutions require you to overprovision new hardware resources in anticipation of future problems, but that is not required with Couchbase. 4 Replication When you combine the in-memory, elastic scalability, and persistence capabilities of Couchbase you have a very powerful platform. But today’s enterprise demands require even more. In order to service the high availability needs of large users, a solution for replicating not only across multiple nodes, but also across multiple clusters, is required. Intra-node replication is done at the speed of memory and local persistence of replicated data is performed rapidly. Couchbase also offers intra-cluster replication that is very efficient and serves as a way to keep data both highly available and local to application servers in different geographic regions. Cross datacenter replication (XDCR) is one of the most powerful features of Couchbase, and is easy to use and set up through a graphical web administration console. Varying types of replication topologies – including master-master – are supported, providing many options for keeping data highly available and easy to access. More than caching When you combine all the above architectural advantages of Couchbase you have the most comprehensive, high-performance NoSQL database platform to build on top of. While caching is an obvious use case to get started, there is so much that is possible once your data is in Couchbase. By caching frequently Other features include SQL-like querying using the Couchbase NoSQL query language used data in memory (N1QL) – effectively letting you query JSON data without having to enforce a schema or – rather than making database round trips transform your data to behave a certain way just to get answers to queries. and incurring disk io More advanced real-time analytic queries are also possible as well as full-text searches. overhead – application Many developer-centric features exist in Couchbase, from server-side event processing, response times can be dramatically improved, operation tracing, and automatic application failover between clusters. typically by orders of These are all features that the most demanding enterprises require for effective data magnitude. management. The remainder of this paper explores these concepts further and contrasts them with other solutions within the overall context of caching solutions. About this guide Based on Couchbase’s experience with leading enterprises, this guide: • Explains the value of caching and common caching use cases • Details the key requirements for an effective, highly available distributed cache • Explains differences in capabilities across Oracle Coherence, Memcached, and Couchbase Server • Describes how Couchbase Server provides a high-performance, low-cost, and easy-to- manage caching solution 5 Memcached and Oracle Coherence – two popular caching solutions Memcached and Oracle Coherence are a small part
Recommended publications
  • Redis and Memcached
    Redis and Memcached Speaker: Vladimir Zivkovic, Manager, IT June, 2019 Problem Scenario • Web Site users wanting to access data extremely quickly (< 200ms) • Data being shared between different layers of the stack • Cache a web page sessions • Research and test feasibility of using Redis as a solution for storing and retrieving data quickly • Load data into Redis to test ETL feasibility and Performance • Goal - get sub-second response for API calls for retrieving data !2 Why Redis • In-memory key-value store, with persistence • Open source • Written in C • It can handle up to 2^32 keys, and was tested in practice to handle at least 250 million of keys per instance.” - http://redis.io/topics/faq • Most popular key-value store - http://db-engines.com/en/ranking !3 History • REmote DIctionary Server • Released in 2009 • Built in order to scale a website: http://lloogg.com/ • The web application of lloogg was an ajax app to show the site traffic in real time. Needed a DB handling fast writes, and fast ”get latest N items” operation. !4 Redis Data types • Strings • Bitmaps • Lists • Hyperlogs • Sets • Geospatial Indexes • Sorted Sets • Hashes !5 Redis protocol • redis[“key”] = “value” • Values can be strings, lists or sets • Push and pop elements (atomic) • Fetch arbitrary set and array elements • Sorting • Data is written to disk asynchronously !6 Memory Footprint • An empty instance uses ~ 3MB of memory. • For 1 Million small Keys => String Value pairs use ~ 85MB of memory. • 1 Million Keys => Hash value, representing an object with 5 fields,
    [Show full text]
  • Modeling and Analyzing Latency in the Memcached System
    Modeling and Analyzing Latency in the Memcached system Wenxue Cheng1, Fengyuan Ren1, Wanchun Jiang2, Tong Zhang1 1Tsinghua National Laboratory for Information Science and Technology, Beijing, China 1Department of Computer Science and Technology, Tsinghua University, Beijing, China 2School of Information Science and Engineering, Central South University, Changsha, China March 27, 2017 abstract Memcached is a widely used in-memory caching solution in large-scale searching scenarios. The most pivotal performance metric in Memcached is latency, which is affected by various factors including the workload pattern, the service rate, the unbalanced load distribution and the cache miss ratio. To quantitate the impact of each factor on latency, we establish a theoretical model for the Memcached system. Specially, we formulate the unbalanced load distribution among Memcached servers by a set of probabilities, capture the burst and concurrent key arrivals at Memcached servers in form of batching blocks, and add a cache miss processing stage. Based on this model, algebraic derivations are conducted to estimate latency in Memcached. The latency estimation is validated by intensive experiments. Moreover, we obtain a quantitative understanding of how much improvement of latency performance can be achieved by optimizing each factor and provide several useful recommendations to optimal latency in Memcached. Keywords Memcached, Latency, Modeling, Quantitative Analysis 1 Introduction Memcached [1] has been adopted in many large-scale websites, including Facebook, LiveJournal, Wikipedia, Flickr, Twitter and Youtube. In Memcached, a web request will generate hundreds of Memcached keys that will be further processed in the memory of parallel Memcached servers. With this parallel in-memory processing method, Memcached can extensively speed up and scale up searching applications [2].
    [Show full text]
  • Release 2.5.5 Ask Solem Contributors
    Celery Documentation Release 2.5.5 Ask Solem Contributors February 04, 2014 Contents i ii Celery Documentation, Release 2.5.5 Contents: Contents 1 Celery Documentation, Release 2.5.5 2 Contents CHAPTER 1 Getting Started Release 2.5 Date February 04, 2014 1.1 Introduction Version 2.5.5 Web http://celeryproject.org/ Download http://pypi.python.org/pypi/celery/ Source http://github.com/celery/celery/ Keywords task queue, job queue, asynchronous, rabbitmq, amqp, redis, python, webhooks, queue, dis- tributed – • Synopsis • Overview • Example • Features • Documentation • Installation – Bundles – Downloading and installing from source – Using the development version 1.1.1 Synopsis Celery is an open source asynchronous task queue/job queue based on distributed message passing. Focused on real- time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on one or more worker nodes using multiprocessing, Eventlet or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready). Celery is used in production systems to process millions of tasks every hour. 3 Celery Documentation, Release 2.5.5 Celery is written in Python, but the protocol can be implemented in any language. It can also operate with other languages using webhooks. There’s also RCelery for the Ruby programming language, and a PHP client. The recommended message broker is RabbitMQ, but support for Redis, MongoDB, Beanstalk, Amazon SQS, CouchDB and databases (using SQLAlchemy or the Django ORM) is also available. Celery is easy to integrate with web frameworks, some of which even have integration packages: Django django-celery Pyramid pyramid_celery Pylons celery-pylons Flask flask-celery web2py web2py-celery 1.1.2 Overview This is a high level overview of the architecture.
    [Show full text]
  • An End-To-End Application Performance Monitoring Tool
    Applications Manager - an end-to-end application performance monitoring tool For enterprises to stay ahead of the technology curve in today's competitive IT environment, it is important for their business critical applications to perform smoothly. Applications Manager - ManageEngine's on-premise application performance monitoring solution offers real-time application monitoring capabilities that offer deep visibility into not only the performance of various server and infrastructure components but also other technologies used in the IT ecosystem such as databases, cloud services, virtual systems, application servers, web server/services, ERP systems, big data, middleware and messaging components, etc. Highlights of Applications Manager Wide range of supported apps Monitor 150+ application types and get deep visibility into all the components of your application environment. Ensure optimal performance of all your applications by visualizing real-time data via comprehensive dashboards and reports. Easy installation and setup Get Applications Manager running in under two minutes. Applications Manager is an agentless monitoring tool and it requires no complex installation procedures. Automatic discovery and dependency mapping Automatically discover all applications and servers in your network and easily categorize them based on their type (apps, servers, databases, VMs, etc.). Get comprehensive insight into your business infrastructure, drill down to IT application relationships, map them effortlessly and understand how applications interact with each other with the help of these dependency maps. Deep dive performance metrics Track key performance metrics of your applications like response times, requests per minute, lock details, session details, CPU and disk utilization, error states, etc. Measure the efficiency of your applications by visualizing the data at regular periodic intervals.
    [Show full text]
  • WEB2PY Enterprise Web Framework (2Nd Edition)
    WEB2PY Enterprise Web Framework / 2nd Ed. Massimo Di Pierro Copyright ©2009 by Massimo Di Pierro. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Copyright owner for permission should be addressed to: Massimo Di Pierro School of Computing DePaul University 243 S Wabash Ave Chicago, IL 60604 (USA) Email: [email protected] Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created ore extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Library of Congress Cataloging-in-Publication Data: WEB2PY: Enterprise Web Framework Printed in the United States of America.
    [Show full text]
  • LIST of NOSQL DATABASES [Currently 150]
    Your Ultimate Guide to the Non - Relational Universe! [the best selected nosql link Archive in the web] ...never miss a conceptual article again... News Feed covering all changes here! NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above. [based on 7 sources, 14 constructive feedback emails (thanks!) and 1 disliking comment . Agree / Disagree? Tell me so! By the way: this is a strong definition and it is out there here since 2009!] LIST OF NOSQL DATABASES [currently 150] Core NoSQL Systems: [Mostly originated out of a Web 2.0 need] Wide Column Store / Column Families Hadoop / HBase API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java, Concurrency: ?, Misc: Links: 3 Books [1, 2, 3] Cassandra massively scalable, partitioned row store, masterless architecture, linear scale performance, no single points of failure, read/write support across multiple data centers & cloud availability zones. API / Query Method: CQL and Thrift, replication: peer-to-peer, written in: Java, Concurrency: tunable consistency, Misc: built-in data compression, MapReduce support, primary/secondary indexes, security features.
    [Show full text]
  • Database Software Market: Billy Fitzsimmons +1 312 364 5112
    Equity Research Technology, Media, & Communications | Enterprise and Cloud Infrastructure March 22, 2019 Industry Report Jason Ader +1 617 235 7519 [email protected] Database Software Market: Billy Fitzsimmons +1 312 364 5112 The Long-Awaited Shake-up [email protected] Naji +1 212 245 6508 [email protected] Please refer to important disclosures on pages 70 and 71. Analyst certification is on page 70. William Blair or an affiliate does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. This report is not intended to provide personal investment advice. The opinions and recommendations here- in do not take into account individual client circumstances, objectives, or needs and are not intended as recommen- dations of particular securities, financial instruments, or strategies to particular clients. The recipient of this report must make its own independent decisions regarding any securities or financial instruments mentioned herein. William Blair Contents Key Findings ......................................................................................................................3 Introduction .......................................................................................................................5 Database Market History ...................................................................................................7 Market Definitions
    [Show full text]
  • Couchbase Vs Mongodb Architectural Differences and Their Impact
    MongoDB vs. Couchbase Server: Architectural Differences and Their Impact This 45-page paper compares two popular NoSQL offerings, diving into their architecture, clustering, replication, and caching. By Vladimir Starostenkov, R&D Engineer at Altoros Q3 2015 Table of Contents 1. OVERVIEW ......................................................................................................................... 3 2. INTRODUCTION ................................................................................................................. 3 3. ARCHITECTURE................................................................................................................. 4 4. CLUSTERING ..................................................................................................................... 4 4.1 MongoDB ............................................................................................................................................. 4 4.2 Couchbase Server .............................................................................................................................. 5 4.3 Partitioning .......................................................................................................................................... 6 4.4 MongoDB ............................................................................................................................................. 7 4.5 Couchbase Server .............................................................................................................................
    [Show full text]
  • White Paper Using Hazelcast with Microservices
    WHITE PAPER Using Hazelcast with Microservices By Nick Pratt Vertex Integration June 2016 Using Hazelcast with Microservices Vertex Integration & Hazelcast WHITE PAPER Using Hazelcast with Microservices TABLE OF CONTENTS 1. Introduction 3 1.1 What is a Microservice 3 2. Our experience using Hazelcast with Microservices 3 2.1 Deployment 3 2.1.1 Embedded 4 2.2 Discovery 5 2.3 Solving Common Microservice Needs with Hazelcast 5 2.3.1 Multi-Language Microservices 5 2.3.2 Service Registry 5 2.4 Complexity and Isolation 6 2.4.1 Data Storage and Isolation 6 2.4.2 Security 7 2.4.3 Service Discovery 7 2.4.4 Inter-Process Communication 7 2.4.5 Event Store 8 2.4.6 Command Query Responsibility Segregation (CQRS) 8 3. Conclusion 8 TABLE OF FIGURES Figure 1 Microservices deployed as HZ Clients (recommended) 4 Figure 2 Microservices deployed with embedded HZ Server 4 Figure 3 Separate and isolated data store per Service 6 ABOUT THE AUTHOR Nick Pratt is Managing Partner at Vertex Integration LLC. Vertex Integration develops and maintains software solutions for data flow, data management, or automation challenges, either for a single user or an entire industry. The business world today demands that every business run at maximum efficiency.T hat means reducing errors, increasing response time, and improving the integrity of the underlying data. We can create a product that does all those things and that is specifically tailored to your needs. If your business needs a better way to collect, analyze, report, or share data to maximize your profitability, we can help.
    [Show full text]
  • Alfresco Enterprise on AWS: Reference Architecture October 2013
    Amazon Web Services – Alfresco Enterprise on AWS: Reference Architecture October 2013 Alfresco Enterprise on AWS: Reference Architecture October 2013 (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper) Page 1 of 13 Amazon Web Services – Alfresco Enterprise on AWS: Reference Architecture October 2013 Abstract Amazon Web Services (AWS) provides a complete set of services and tools for deploying business-critical enterprise workloads on its highly reliable and secure cloud infrastructure. Alfresco is an enterprise content management system (ECM) useful for document and case management, project collaboration, web content publishing and compliant records management. Few classes of business-critical applications touch more enterprise users than enterprise content management (ECM) and collaboration systems. This whitepaper provides IT infrastructure decision-makers and system administrators with specific technical guidance on how to configure, deploy, and run an Alfresco server cluster on AWS. We outline a reference architecture for an Alfresco deployment (version 4.1) that addresses common scalability, high availability, and security requirements, and we include an implementation guide and an AWS CloudFormation template that you can use to easily and quickly create a working Alfresco cluster in AWS. Introduction Enterprises need to grow and manage their global computing infrastructures rapidly and efficiently while simultaneously optimizing and managing capital costs and expenses. The computing and storage services from AWS meet this need by providing a global computing infrastructure as well as services that simplify managing infrastructure, storage, and databases. With the AWS infrastructure, companies can rapidly provision compute capacity or quickly and flexibly extend existing on-premises infrastructure into the cloud.
    [Show full text]
  • Couchbase Server Manual 2.1.0 Couchbase Server Manual 2.1.0
    Couchbase Server Manual 2.1.0 Couchbase Server Manual 2.1.0 Abstract This manual documents the Couchbase Server 2.1.0 series, including installation, monitoring, and administration interface and associ- ated tools. For the corresponding Moxi product, please use the Moxi 1.8 series. See Moxi 1.8 Manual. External Community Resources. Download Couchbase Server 2.1 Couchbase Developer Guide 2.1 Client Libraries Couchbase Server Forum Last document update: 05 Sep 2013 23:46; Document built: 05 Sep 2013 23:46. Documentation Availability and Formats. This documentation is available online: HTML Online . For other documentation from Couchbase, see Couchbase Documentation Library Contact: [email protected] or couchbase.com Copyright © 2010-2013 Couchbase, Inc. Contact [email protected]. For documentation license information, see Section F.1, “Documentation License”. For all license information, see Appendix F, Licenses. Table of Contents Preface ................................................................................................................................................... xiii 1. Best Practice Guides ..................................................................................................................... xiii 1. Introduction to Couchbase Server .............................................................................................................. 1 1.1. Couchbase Server and NoSQL ........................................................................................................ 1 1.2. Architecture
    [Show full text]
  • Getting Started
    3/29/2021 Getting Started v2.2 Guides Getting Started This page will help you get started with Hazelcast Cloud. Here are the steps to set up your first cluster in Hazelcast Cloud: 1. Register Create your account on here. A confirmation will be sent to you. When you confirm the email, your account becomes ready for use. 2. Sign-in After setting your password via the link provided by email, you can log in with your email and password here. https://docs.cloud.hazelcast.com/docs/getting-started 1/4 3/29/2021 Getting Started v2.2 Guides 2.1 Sign-in with Social Providers (Optional) You can use also use sign-in with Github and Google options in order to sign-in easily without spending your time on email verification and filling registration forms. The only thing you should do is selecting your social provider and authorizing Hazelcast Cloud for registration purposes. Then it will directly redirect you to our console. 3. Create a Cluster After successfully logging in, you can create your first cluster by clicking the + New Cluster button in the top left corner. On the New Cluster page, provide a name for your cluster. You can leave the other options https://docs.cloud.hazelcast.com/docs/getting-started 2/4 3/29/2021 Getting Started as they are. Click + Create Cluster to create and start your new cluster. v2.2 Guides Once your cluster is running and ready, you will see the Cluster Memory and Client Count charts as well as lifecycle information about the cluster.
    [Show full text]