Couchbase Architecture

©2015 Couchbase Inc.

$whoami

Laurent Doguin, Couchbase Developer Advocate
@ldoguin | [email protected]

Big Data = Operational + Analytic (NoSQL + Hadoop)

OPERATIONAL (VELOCITY): real-time, interactive databases
- Online
- Web/Mobile/IoT apps
- Millions of customers/consumers

ANALYTICAL (VOLUME): batch-oriented analytic databases
- Offline, batch-oriented
- Analytics apps
- Hundreds of business analysts

Key Capabilities

Combines the flexibility of JSON, the power of SQL and the scale of NoSQL

Develop with Agility
- Multiple data models
- N1QL, a SQL-like query language
- Multiple indexes
- Languages, ODBC/JDBC drivers and frameworks you already know

Operate at Any Scale
- Push-button scalability
- Consistent high performance
- Always on 24x7 with HA and DR
- Easy administration with Web UI, REST API and CLI

Couchbase provides a complete Data Management solution

General-purpose capabilities support a broad range of apps and use cases:
- Highly available cache
- Key-value store
- Document database
- Embedded database
- Sync management

Enterprises use Couchbase to enable key objectives

- Profile Management
- 360 Degree Customer View
- Internet of Things
- Mobile Applications
- Personalization
- Content Management
- Catalog
- Real Time Big Data
- Digital Communication
- Fraud Detection

Develop with Agility

What does a JSON document look like?

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}

All data in a single document
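As a small illustration, the record above can be built and serialized with Python's standard json module. This is only a sketch: the key format customer::1 is a hypothetical naming convention, not something the slide specifies.

```python
import json

# The customer record from the slide, held as a plain Python dict.
customer = {
    "ID": 1,
    "FIRST": "Dipti",
    "LAST": "Borkar",
    "ZIP": "94040",
    "CITY": "MV",
    "STATE": "CA",
}

# Hypothetical document key; each document is stored under a key.
doc_key = "customer::{}".format(customer["ID"])

# Serialize to the JSON text that would be stored as the document body.
doc_body = json.dumps(customer, sort_keys=True)

# Round trip: parsing the stored text gives back the same record.
restored = json.loads(doc_body)
```

Everything about the customer travels together in one value, so a single key lookup is enough to read it back.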

Storing and retrieving documents

- Clients read and write documents (user/application data).
- Documents are read from / written to data buckets, which are based on hash partitioning.
- Data buckets live on server nodes.
- Server nodes form a Couchbase cluster, which is dynamically scalable.

Accessing Data in Couchbase

Multiple access paths

What the SDK layers do:
- Hold on to cluster information such as topology.
- Manage connections to the bucket within the cluster for different services.
- Provide a core layer where IO can be managed and optimized.
- Give the application developer a concurrent API for basic (k-v) or document management.
- Allow for view querying, building of queries and reasonable error handling from the cluster.
- Allow for querying and execution of other directives, such as defining indexes and checking on index state.
- Provide a way to manage buckets.

API calls shown on the slide:
- Key-value (CRUD): get(), insert(), upsert(), remove()
- View Query API: abucket.NewViewQuery().Limit().Stale()
- N1QL Query API: abucket.NewN1QLQuery("SELECT * FROM default LIMIT 5").Consistency(gocouchbase.RequestPlus());
- Cluster and bucket management: openBucket(), info(), disconnect(), insertDesignDocument(), flush(), listDesignDocuments()

Couchbase SDKs and Connectors

Operate at Any Scale

Couchbase Architecture – Single Node

- Data Service: builds and maintains distributed secondary indexes (MapReduce Views)
- Indexing Engine: builds and maintains Global Secondary Indexes
- Query Engine: plans, coordinates, and executes queries against either Global or Distributed indexes
- Cluster Manager: configuration, heartbeat, statistics, RESTful management interface

[Diagram: a Couchbase Server node with Data Service, Index Service, Query Service and Management (REST API, Web UI). Components shown include View Engine, Indexing Engine, Query Engine, Managed Cache, Storage, and the Erlang/OTP Cluster Manager with Node Manager and node/cluster orchestration.]

Data Service: Write Operation

Single-node type means easier administration and scaling.

- Writes are async by default.
- The application gets an acknowledgement once the write is successfully in RAM, and can trade off waiting for replication or persistence per write.
- Replication to 1, 2 or 3 other nodes.
- Replication is RAM-based, so extremely fast.
- Off-node replication is the primary level of HA.
- Disk is written to as fast as possible, with no waiting.

[Diagram: the application server sends DOC 1 to the managed cache; from there it goes to the replication/XDCR/connectors/views/indexing queue and to the disk queue, then to disk.]

Data Service: Read Operation

Single-node type means easier administration and scaling.

- Reads out of cache are extremely fast.
- No other process/system to communicate with.
- The data connection is a binary protocol over TCP.

[Diagram: the application server issues GET DOC 1 and DOC 1 is returned from the managed cache.]
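The write and read paths described on these two slides can be sketched with a toy in-memory model. This is not Couchbase code: the class DataNode and all of its methods are hypothetical names, and the background work (disk flushing, replication) is triggered by hand so the ordering is easy to see.

```python
from collections import deque


class DataNode:
    """Toy model of one data node: managed cache plus outbound queues."""

    def __init__(self):
        self.cache = {}                   # managed cache (RAM)
        self.disk = {}                    # stand-in for persistent storage
        self.disk_queue = deque()         # pending writes to disk
        self.replication_queue = deque()  # pending copies to replica nodes

    def write(self, key, doc):
        # The write lands in RAM and is queued for disk and replication.
        self.cache[key] = doc
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        # Acknowledge right away: the default does not wait for either queue.
        return "ack"

    def read(self, key):
        # Served straight from the managed cache when the document is there.
        return self.cache.get(key)

    def flush_disk_queue(self):
        # Would run in the background in a real system; called by hand here.
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication_queue(self, replicas):
        # Copy queued documents into the cache of each replica node.
        while self.replication_queue:
            key = self.replication_queue.popleft()
            for replica in replicas:
                replica.cache[key] = self.cache[key]


node, replica = DataNode(), DataNode()
status = node.write("doc1", {"name": "DOC 1"})
persisted_before_flush = "doc1" in node.disk   # still only in RAM
node.flush_disk_queue()
node.drain_replication_queue([replica])
```

The acknowledgement comes back before the document is on disk or on a replica, which is the trade-off the write slide describes; an application that needs stronger guarantees would wait for one of the queues per write.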

Data Service: Cache Miss

Single-node type means easier administration and scaling.

- Layer consolidation means a single interface for the app to talk to and get its data back as fast as possible.
- Separation of cache and disk allows for the fastest access out of RAM while pulling data from disk in parallel.

[Diagram: application server, GET DOC 1, managed cache, replication and disk queues, and disk holding DOC 1 to DOC 5.]
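A cache miss can be sketched as a read-through lookup: try RAM first, fall back to disk, then keep the document in RAM. The class CachedStore below is a hypothetical illustration, and it does not model the parallel disk fetches mentioned on the slide.

```python
class CachedStore:
    """Toy read path: RAM first, disk on a miss, then keep the result in RAM."""

    def __init__(self, disk):
        self.cache = {}
        self.disk = disk
        self.disk_reads = 0  # counts how often a read had to go to disk

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit
        self.disk_reads += 1            # cache miss: fetch from disk
        doc = self.disk[key]
        self.cache[key] = doc           # later reads are served from RAM
        return doc


store = CachedStore(disk={"doc1": "DOC 1", "doc2": "DOC 2"})
first = store.get("doc1")   # miss, loaded from disk
second = store.get("doc1")  # hit, served from cache
```

The caller uses the same get() in both cases, which is the "single interface" point: the application never has to know whether the document came from RAM or from disk.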

Couchbase Views

- Local index
  - Distributed indexing and scatter-gather querying
- Incremental Map-Reduce
  - Distributed simple real-time analytics
  - Only considers changes due to updated data
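The idea that an incremental view "only considers changes due to updated data" can be sketched as follows. This is a simplified illustration, not the view engine's implementation: IncrementalView and count_by_state are hypothetical names, and the reduce step is fixed to a per-key sum.

```python
from collections import defaultdict


class IncrementalView:
    """Toy view: map emits (key, value) rows, reduce keeps a sum per key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}            # doc id -> rows it last emitted
        self.totals = defaultdict(int)   # reduced value per emitted key

    def update(self, doc_id, doc):
        # Retract what this document contributed before the change...
        for key, value in self.rows_by_doc.get(doc_id, []):
            self.totals[key] -= value
        # ...then apply the rows emitted by its new version.
        rows = list(self.map_fn(doc))
        for key, value in rows:
            self.totals[key] += value
        self.rows_by_doc[doc_id] = rows


def count_by_state(doc):
    # Map function: one row per document, keyed by its STATE field.
    yield doc["STATE"], 1


view = IncrementalView(count_by_state)
view.update("c1", {"STATE": "CA"})
view.update("c2", {"STATE": "CA"})
view.update("c2", {"STATE": "NY"})  # only c2 is reprocessed
```

When c2 changes, the view touches only the rows c2 emitted; c1 is never mapped again, which keeps the cost proportional to the number of changed documents.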

Index Service

Couchbase Global Indexing Service

Global Secondary Index Service
- New in 4.0
- Indexes partitioned independently from data
- Each index receives only its own mutations
- Managed caching layer
- ForestDB storage engine
- B+ Trie optimized for very large data volumes
- Optimized for SSDs

[Diagram: the Indexing Service hosting Index#1 to Index#4, with Supervisor and Index maintenance & Scan coordinator components.]

Query Service

Query Execution Flow

Query:
  SELECT c_id, c_first, c_last, c_max
  FROM CUSTOMER
  WHERE c_id = 49165;

Result:
  {
    "c_first": "Joe",
    "c_id": 49165,
    "c_last": "Montana",
    "c_max": 50000
  }

1. Client submits the query over the REST API (to the Query Service)
2. Parse, analyze, create plan (Query Service)
3. Scan request with index filters (Query Service to Index Service)
4. Get qualified doc keys (from the Index Service)
5. Fetch request with doc keys (Query Service to Data Service)
6. Fetch the documents (from the Data Service)
7. Evaluate: documents to results (Query Service)
8. Query result returned to the client
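Steps 3 to 7 of the flow above can be sketched with plain dictionaries standing in for the services. Everything here is a simplified assumption: the key format cust::N, the second customer record, and the function run_query are hypothetical and exist only to make the example runnable.

```python
# Toy data service: documents stored by key.
documents = {
    "cust::49165": {"c_id": 49165, "c_first": "Joe", "c_last": "Montana", "c_max": 50000},
    "cust::10": {"c_id": 10, "c_first": "Ann", "c_last": "Lee", "c_max": 100},  # made-up row
}

# Toy index service: an index on c_id mapping values to document keys.
index_on_c_id = {}
for key, doc in documents.items():
    index_on_c_id.setdefault(doc["c_id"], []).append(key)


def run_query(fields, c_id):
    # Steps 3-4: scan the index with the filter and get qualified doc keys.
    doc_keys = index_on_c_id.get(c_id, [])
    # Steps 5-6: fetch the documents for those keys from the data service.
    fetched = [documents[k] for k in doc_keys]
    # Step 7: evaluate, here just projecting the requested fields.
    return [{f: doc[f] for f in fields} for doc in fetched]


# Stand-in for: SELECT c_id, c_first, c_last, c_max FROM CUSTOMER WHERE c_id = 49165;
result = run_query(["c_id", "c_first", "c_last", "c_max"], 49165)
```

The index returns only keys, so the data service is asked for exactly the documents that qualify rather than being scanned in full.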

Couchbase Clustering Architecture

Auto sharding – Bucket and vBuckets

[Diagram: each data bucket is split into virtual buckets (vB) numbered 1 to 1024, with one set of active virtual buckets and one set of replica virtual buckets.]

Cluster Map

[Diagram: each Couchbase SDK applies the CRC32 hashing algorithm and looks up the result in its copy of the cluster map, which maps vBucket1 to vBucket1024 onto the Couchbase cluster.]
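The lookup in the diagram can be sketched as "hash the key, pick a vBucket, read the owner from the cluster map". This is a deliberately simplified sketch: it reduces the CRC32 checksum with a plain modulo and does not claim to reproduce the exact function the SDKs use, the round-robin layout in build_cluster_map is hypothetical, and vBuckets are numbered from 0 here rather than from 1 as on the slide.

```python
import zlib

NUM_VBUCKETS = 1024  # number of vBuckets per bucket, as on the slide


def vbucket_for_key(key):
    # Simplified assumption: CRC32 of the key, reduced with a modulo.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes):
    # Hypothetical even layout: vBucket i is active on node i mod len(nodes).
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for_key(key, cluster_map):
    # The client resolves the owning node locally from its copy of the map.
    return cluster_map[vbucket_for_key(key)]


cluster_map = build_cluster_map(["server1", "server2", "server3"])
vb = vbucket_for_key("customer::1")
owner = node_for_key("customer::1", cluster_map)
```

Because the key-to-vBucket step never changes, moving data only requires updating the map from vBucket to node; the client keeps hashing keys the same way.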

Data Services – Sharding and Replication

- Application has a single logical connection to the cluster (client object)
- Multiple nodes added or removed at once
- One-click operation
- Incremental movement of active and replica vBuckets and data
- Client library updated via cluster map
- Fully online operation, no downtime or loss of performance

[Diagram: READ/WRITE/UPDATE traffic goes to Couchbase Servers 1 to 5; each server holds active shards and replica shards, with shards 1 to 9 spread across the servers.]
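The incremental movement of vBuckets during a rebalance can be illustrated with a small sketch. The function rebalance below is a hypothetical algorithm written for this example, not Couchbase's actual rebalance logic, and it models active vBuckets only (replicas are left out).

```python
def rebalance(cluster_map, nodes):
    """Return a new map and the vBuckets that move, keeping moves minimal."""
    target = len(cluster_map) // len(nodes)   # minimum vBuckets per node
    extra = len(cluster_map) % len(nodes)     # first `extra` nodes get one more
    quota = {n: target + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    new_map, moved = list(cluster_map), []
    held = {n: 0 for n in nodes}
    # First pass: every node keeps what it already owns, up to its quota.
    for vb, owner in enumerate(cluster_map):
        if owner in held and held[owner] < quota[owner]:
            held[owner] += 1
        else:
            moved.append(vb)
    # Second pass: hand the surplus vBuckets to nodes below quota.
    for vb in moved:
        dest = next(n for n in nodes if held[n] < quota[n])
        new_map[vb] = dest
        held[dest] += 1
    return new_map, moved


old_nodes = ["server1", "server2", "server3", "server4"]
old_map = [old_nodes[i % 4] for i in range(1024)]
new_map, moved = rebalance(old_map, old_nodes + ["server5"])
```

Only the vBuckets in moved change owner; every other vBucket stays where it was, and clients can keep serving requests from the old map until they receive the new one.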

What is Multi-Dimensional Scaling?

MDS is the architecture that enables independent scaling of data, query and indexing workloads while being managed as one cluster.

[Diagram: a Couchbase cluster with nodes labelled node1 and node8, running the Index Service, Query Service and Data Service.]

Modern Architecture

Independent scalability for best computational capacity per service:
- Heavier indexing (index more fields): scale up index service nodes
- More RAM for query processing: scale up query service nodes

[Diagram: a Couchbase cluster with nodes labelled node1, node8 and node9, running the Query Service, Index Service and Data Service.]
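Independent scaling can be pictured as a mapping from nodes to the services they run. The topology below is a hypothetical example (the assignment of services to node names is an assumption, not taken from the diagram), and it shows growing one service by adding a node, whereas the slide talks about scaling up the existing nodes.

```python
# Hypothetical topology: which services each node runs.
topology = {
    "node1": {"data"},
    "node2": {"data"},
    "node8": {"index"},
    "node9": {"query"},
}


def nodes_for(service):
    # All nodes that run the given service, in a stable (sorted) order.
    return sorted(n for n, services in topology.items() if service in services)


def add_node(name, services):
    # Growing one dimension means adding a node that runs only that service.
    topology[name] = set(services)


index_nodes_before = nodes_for("index")
add_node("node10", ["index"])   # heavier indexing: grow only the index tier
index_nodes_after = nodes_for("index")
data_nodes = nodes_for("data")
```

The data and query tiers are untouched by the change, which is the point of managing the three workloads separately inside one cluster.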

Cross Data Center Replication

Market-leading memory-to-memory replication

[Diagram: the NYC server cluster in New York (Couchbase Servers 1 to 4) and the SF server cluster in San Francisco (Couchbase Servers 1 to 3); each server has memory and disk.]

In summary

The best of both worlds

Develop with Agility
- Multiple data models
- N1QL, a SQL-like query language
- Multiple indexes
- Languages, ODBC/JDBC drivers and frameworks you already know

Operate at Any Scale
- Push-button scalability
- Consistent high performance
- Always on 24x7 with HA and DR
- Easy administration with Web UI, REST API and CLI

Thanks!