Nosql – Content: Everything – Closed Book – Can Bring 2 Letter-Size Pieces of Paper with Notes • Can Write on Both Sides

Announcements

• HW8 is due on Friday Introduction to Data Management CSE 344 • Final exam – Wednesday, December 16th @ 8:30am – In our regular classroom Lecture 28: NoSQL – Content: everything – Closed book – Can bring 2 letter-size pieces of paper with notes • Can write on both sides

CSE 344 - Fall 2015 1 CSE 344 - Fall 2015 2

How To Study

• Go over the lecture notes • Read the book • Go over the homeworks The New Hipster: NoSQL • Practice – Practice web quiz posted – Finals & midterms from past 344s • Ask course staff questions! • The goal of the final is to help you learn!

CSE 344 - Fall 2015 3 CSE 344 - Fall 2015 4

Optional References NoSQL Motivation

• Scalable SQL and NoSQL Data Stores, Rick • Originally motivated by Web 2.0 applications Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4) • Goal is to scale simple OLTP-style workloads • Bigtable: A Distributed Storage System for to thousands or millions of users Structured Data. Fay Chang, Jeffrey Dean, et. al. OSDI 2006 • Users are doing both updates and reads

• Online documentation: Amazon SimpleDB, Google App Engine Datastore, etc.

CSE 344 - Fall 2015 5 CSE 344 - Fall 2015 6

1 What is the Problem? Scale Through Partitioning

• Scaling a relational DBMS is hard • Partition the database across many machines in a cluster • Spread queries across these machines • Can increase throughput • We saw how to scale queries with parallel DBMSs • Easy for reads but writes become expensive! • Need 2PC (two phase commit) to ensure serializability • Much more difficult to scale transactions

Transaction • Because need to ensure ACID properties starts here Also touches data here – Hard to do beyond a single machine Three partitions CSE 344 - Fall 2015 7 CSE 344 - Fall 2015 8

Scale Through Replication Scaling Transactions

• Create multiple copies of each database partition • Need to partition the db across machines • Spread queries across these replicas • Can increase throughput and lower latency • If a transaction touches one machine • Can also improve fault-tolerance – Life is good • Easy for reads but writes become expensive!

• If a transaction touches multiple machines Some Other – ACID becomes extremely expensive! requests requests – Need two-phase commit Three replicas CSE 344 - Fall 2015 9 CSE 344 - Fall 2015 10

Two-Phase Commit: The Setting Two-Phase Commit: Motivation

Coordinator / Master • Data partitioned across multiple nodes Subordinate 1 1) User decides 2) COMMIT to commit • Query touches multiple partitions and commits 3) COMMIT 4) Coordinator Each subordinate crashes holds fraction of database • Lock multiple partitions at the same time What do we do now? Subordinate 2 But I already aborted!

Subordinate 3 Example: Each node holds some subset of bank accounts CSE 344 - Fall 2015 11 Transaction transfers money12

2 2PC: Phase 1 Illustrated 2PC: Phase 2 Illustrated

Coordinator / master Coordinator / master 2) PREPARE Subordinate 1 2) COMMIT Subordinate 1 1) User decides to commit 3) YES 3) ACK Transaction is 2) COMMIT 2) PREPARE now committed! 3) YES 3) ACK 3) YES 3) ACK 2) PREPARE Subordinate 2 2) COMMIT Subordinate 2

Subordinate 3 Subordinate 3

CSE 344 - Fall 2015 13 CSE 344 - Fall 2015 14

Cattell, SIGMOD Record 2010

NoSQL Key Feature Decisions “Not Only SQL” or “Not Relational”

• Want a data management system that is Six key features: – Elastic and highly scalable 1. Scale horizontally “simple operations” – Flexible (different records have different schemas) – key lookups, reads and writes of one record or a small number of records, simple selections 2. Replicate/distribute data over many servers • To achieve above goals, will give up All updates – Complex queries: e.g., give up on joins eventually reach 3. Simple call level interface (contrast w/ SQL) all replicas – Multi-object transactions 4. Weaker concurrency model than ACID – ACID guarantees: e.g., eventual consistency is OK 5. Efficient use of distributed indexes and RAM – Not all NoSQL systems give up all these properties 6. Flexible schema CSE 344 - Fall 2015 15 CSE 344 - Fall 2015 16

Cattell, SIGMOD Record 2010

Terminology ACID Vs BASE

• Sharding = horizontal partitioning by some • ACID = Atomicity, Consistency, Isolation, and key, and storing records on different servers Durability in order to improve performance • BASE = Basically Available, Soft state, • Horizontal scalability = distribute both data Eventually consistent

and load over many servers Scale-out

• Vertical scaling = when a dbms uses multiple cores and/or CPUs Scale-up CSE 344 - Fall 2015 17 CSE 344 - Fall 2015 18

3 Cattell, SIGMOD Record 2010

Data Models Different Types of NoSQL

• Tuple = row in a relational database Taxonomy based on data models: ☞ • Key-value stores • Key-value = records identified with keys have values that are opaque blobs – e.g., Project Voldemort, Memcached • Document stores • Document = nested values, extensible records (think – e.g., SimpleDB, CouchDB, MongoDB XML, JSON, attribute-value pairs) • Extensible Record Stores • Extensible record = families of attributes have a – e.g., HBase, Cassandra, PNUTS schema, but new attributes may be added

CSE 344 - Fall 2015 19 CSE 344 - Fall 2015 20

Key-Value Stores Features Key-Value Stores Internals

• Data model: (key,value) pairs • Data remains in main memory – A single key-value index for all the data • One type of impl.: distributed hash table • Operations • Most systems also offer a persistence option – Insert, delete, and lookup operations on keys • Others use replication to provide fault-tolerance • Distribution / Partitioning – Asynchronous or synchronous replication – Distribute keys across different nodes – Tunable consistency: read/write one replica or majority • Other features • Some offer ACID transactions others do not – Versioning • Multiversion concurrency control or locking – Sorting CSE 344 - Fall 2015 21 CSE 344 - Fall 2015 22

Cattell, SIGMOD Record 2010

Different Types of NoSQL Amazon SimpleDB (1/3)

Taxonomy based on data models: A Document Store • Key-value stores – e.g., Project Voldemort, Memcached • Partitioning – Data partitioned into domains: queries run within a domain ☞ • Document stores – Domains seem to be unit of replication. Limit 10GB – e.g., SimpleDB, CouchDB, MongoDB – Can use domains to manually create parallelism • Extensible Record Stores – e.g., HBase, Cassandra, PNUTS • Data Model / Schema – No fixed schema – Objects are defined with attribute-value pairs

CSE 344 - Fall 2015 23 CSE 344 - Fall 2015 24

4 Amazon SimpleDB (2/3) Amazon SimpleDB (3/3)

• Indexing • Availability and consistency – Automatically indexes all attributes – “Fully indexed data is stored redundantly across multiple servers and data centers” – “Takes time for the update to propagate to all storage • Support for writing locations. The data will eventually be consistent, but an – PUT and DELETE items in a domain immediate read might not show the change” – Today, can choose between consistent or eventually select output_list consistent read • Support for querying from domain_name [where expression] – GET by key [sort_instructions] – Selection + sort [limit limit] – A simple form of aggregation: count – Query is limited to 5s and 1MB output (but can continue) CSE 344 - Fall 2015 25 CSE 344 - Fall 2015 26

Cattell, SIGMOD Record 2010

Different Types of NoSQL Extensible Record Stores

Taxonomy based on data models: • Based on Google’s BigTable • Key-value stores – e.g., Project Voldemort, Memcached • Data model is rows and columns • Document stores • Scalability by splitting rows and columns over nodes – e.g., SimpleDB, CouchDB, MongoDB – Rows partitioned through sharding on primary key ☞ • Extensible Record Stores – Columns of a table are distributed over multiple nodes by – e.g., HBase, Cassandra, PNUTS using “column groups”

• HBase is an open source implementation of BigTable

CSE 344 - Fall 2015 27 CSE 344 - Fall 2015 28

Chang, OSDI 2006

What is Bigtable? Bigtable Data Model • Sparse, multidimensional sorted map • Distributed storage system (row:string, column:string, time:int64)è string • Designed to Notice how everything but time is a string – Hold structured data – Scale to thousands of servers • Example: Columns are grouped into families – Store up to several hundred TB (maybe even PB) – Perform backend bulk processing – Perform real-time data serving • To scale, Bigtable has a limited set of features

CSE 344 - Fall 2015 29 CSE 344 - Fall 2015 30

5 Chang, OSDI 2006 Chang, OSDI 2006

BigTable Key Features BigTable API

• Read/writes of data under single row key is atomic • Data definition – Only single-row transactions! – Creating/deleting tables or column families • Data is stored in lexicographical order – Changing access control rights – Improves data access locality • Data manipulation • Column families are unit of access control – Writing or deleting values • Data is versioned (old versions garbage collected) – Supports single-row transactions – Ex: most recent three crawls of each page, with times – Looking up values from individual rows – Iterating over subset of data in the table • Can select on rows, columns, and timestamps

CSE 344 - Fall 2015 31 CSE 344 - Fall 2015 32

Cattell, SIGMOD Record 2010 Cattell, SIGMOD Record 2010

Megastore Different Types of NoSQL

• BigTable is implemented, used within Google Taxonomy based on data models: • Key-value stores • Megastore is a layer on top of BigTable – e.g., Project Voldemort, Memcached – Transactions that span nodes • Document stores – A database schema defined in a SQL-like language – e.g., SimpleDB, CouchDB, MongoDB – Hierarchical paths that allow some limited joins • Extensible Record Stores – e.g., HBase, Cassandra, PNUTS • Megastore is made available through the Google App Engine Datastore CSE 344 - Fall 2015 33 CSE 344 - Fall 2015 38

NoSQL Systems Challenge

Hard to build applications when • Limited querying capabilities • Limited transactions • Weak consistency

Can we do better? • Yes with NewSQL systems!

CSE 344 - Fall 2015 39