Key-Value, Graph, Document, Column-family, Summary, Polyglot persistence
NoSQL Systems
February 15, 2016
Key-Value: Introduction; Consistency - Scaling; Transactions - Use cases; Memcached - Redis; Oracle NoSQL, DynamoDB, Aerospike, Riak
- Data in a key-value store is organized around the associative array (a.k.a. map, dictionary, hash).
- A key-value store contains a collection of key-value pairs, with each key being unique in the collection.
- Any value can be stored in the value field, including data structures such as JSON.
- Key-value stores can serve as caches or as data-structure stores.
- They do not use SQL and have a flexible schema.
- An important feature of key-value stores is the sharding of data, which delivers a scale-out architecture.
- Consistency applies only to operations on a single key, since operations are limited to get, put, or delete on a single key.
- Optimistic writes can be performed, but they are very expensive to implement because the data store cannot determine whether a value has changed.
- Eventual consistency in Riak leads to update conflicts: either the newest write wins, or all conflicting values are returned so that the client can decide.
- Scaling relies on sharding on the key.
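Sharding on the key can be illustrated with a minimal sketch. The shards here are plain dictionaries standing in for servers, and the function names (shard_for, put, get) and the choice of MD5 as a stable hash are assumptions made for the illustration, not the scheme of any particular product.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard is a plain dict standing in for one server (hypothetical setup).
shards = [dict() for _ in range(4)]

def put(key, value):
    # The key alone decides which shard receives the write.
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    # A read goes to the same shard, so no other shard is contacted.
    return shards[shard_for(key, len(shards))].get(key)

put("session:42", {"user": "alice"})
```

Because the shard is derived from the key only, every single-key operation touches exactly one server, which is also why consistency is limited to single-key operations.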
Transactions
- KV stores use different strategies. In general, there are no guarantees on writes.
- Riak uses the concept of quorum to obtain write tolerance (e.g., W=3, R=5).
- Write quorum: W > N/2. The number of nodes participating in a write (W) must be more than half the number of nodes involved in replication (N). Example: with a replication factor of 3, only 2 nodes need to confirm a write.
- Read quorum (how many nodes must be contacted to be sure of reading the most up-to-date value): R + W > N, so R > N − W.
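The two quorum inequalities can be turned into a short calculation. This is only arithmetic on the conditions W > N/2 and R + W > N stated above; the function names are hypothetical.

```python
def min_write_quorum(n: int) -> int:
    """Smallest integer W satisfying W > N/2."""
    return n // 2 + 1

def min_read_quorum(n: int, w: int) -> int:
    """Smallest integer R satisfying R + W > N."""
    return n - w + 1

n = 3                        # replication factor
w = min_write_quorum(n)      # nodes that must confirm a write
r = min_read_quorum(n, w)    # nodes that must answer a read
```

With N = 3 this gives W = 2, matching the example above, and R = 2: any 2 readers overlap with the 2 writers on at least one node.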
Use cases
- Storing session information, e.g., web sessions
- User profiles and preferences
- Shopping cart data
Do not use for: joins between keys, multi-operation transactions (support for rollback), secondary indexes, operations on sets of keys.
Memcached
- The name stands for Memory Cache Daemon.
- In-memory indexing
- Implementation started in 2003 at LiveJournal (a social blogging site)
- Project: Danga Interactive, BSD licence
- Implemented in C
- Used by YouTube, Facebook, Twitter, Orange, GAE, AWS, Flickr, Slashdot, etc.
- Characteristics: distributed memory caching system.
- Memcached is a server that caches name-value pairs (NVPs) in memory.
- The value can be anything: rows of data, HTML, binary objects.
- When some data is needed, the system checks whether it is available in Memcached. If it is not, the system retrieves it from disk and stores it in Memcached for future accesses.
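The lookup pattern described above (check the cache, fall back to disk, then populate the cache) can be sketched as follows. Two dictionaries stand in for the Memcached server and for the disk-backed store; all names are hypothetical and no Memcached client library is involved.

```python
cache = {}                     # stand-in for a Memcached server
disk = {"user:1": "Alice"}     # stand-in for the slow, disk-backed store
disk_reads = 0                 # counts how often the slow path is taken

def load_from_disk(key):
    global disk_reads
    disk_reads += 1
    return disk.get(key)

def fetch(key):
    # 1. Check whether the value is already cached.
    if key in cache:
        return cache[key]
    # 2. On a miss, retrieve it from disk.
    value = load_from_disk(key)
    # 3. Store it in the cache for future accesses.
    if value is not None:
        cache[key] = value
    return value

first = fetch("user:1")    # miss: goes to disk
second = fetch("user:1")   # hit: served from the cache
```

Only the first call reaches the disk; the second is answered from memory.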
Query model
Four main commands, sent over a TCP or UDP connection:
- SET: add a new item or replace an existing one with new data.
- ADD: only store the data if the key does not exist.
- REPLACE: only store the data if the key already exists.
- GET: return the data.
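The difference between the three storage commands is easiest to see side by side. The class below is a toy in-memory model of the semantics listed above; it is not a Memcached client, and the class name and return values are assumptions of the sketch.

```python
class TinyCache:
    """Toy model of SET / ADD / REPLACE / GET semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # SET always stores, whether or not the key exists.
        self._data[key] = value
        return True

    def add(self, key, value):
        # ADD stores only if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def replace(self, key, value):
        # REPLACE stores only if the key already exists.
        if key not in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        # GET returns the stored value, or None on a miss.
        return self._data.get(key)
```

For instance, calling replace on an empty cache returns False and stores nothing, whereas add succeeds once and then fails for the same key.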
Memcached:
- is not a persistent data store.
- cannot be dumped to disk.
- has no built-in security mechanism.
- does not support any fail-over or high-availability mechanism.
- uses a Least Recently Used (LRU) algorithm to remove data when space is needed.
- allows an expiration time to be associated with an NVP.
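LRU eviction combined with per-item expiration can be sketched with an ordered dictionary. This is a simplified model under stated assumptions: capacity is counted in items rather than bytes, the class and parameter names are hypothetical, and the real server's memory management is more elaborate.

```python
import time
from collections import OrderedDict

class LRUCache:
    """Toy cache with LRU eviction and optional per-item expiration."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock                 # injectable so tests can fake time
        self._items = OrderedDict()        # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self._items[key] = (value, expires_at)
        self._items.move_to_end(key)       # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self._items[key]           # expired: treat as a miss
            return None
        self._items.move_to_end(key)       # a read also refreshes recency
        return value
```

Reading an item protects it from eviction for a while, so rarely used entries are the first to go when space is needed.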
Components
- Server: an NVP server. It stores and retrieves data by key.
  Limitations: length(key) < 250 characters; size(value) < 1 MB.
  Each server is independent (it is unaware of the other servers).
- Client: knows which server contains an NVP.
  Client libraries exist for most languages.
  The client can compress NVPs whose values are larger than 1 MB.
Redis
- High-performance key-value store
- Flexible schema
- Supports publish-subscribe messaging
- Supports many data structures: lists, sets, sorted sets, hashes, hyperloglogs, etc.
Hashes
- Collections of key-value pairs
- Map string keys to string values
- Efficient for representing objects and tabular data
- Operations: HLEN, HKEYS, HVALS, HDEL, HEXISTS, HINCRBY, HINCRBYFLOAT
Sets
- Unordered collections of strings
- Similar to lists, but all members are unique
- The same element can be added multiple times without first checking whether it is already in the set.
- Adding, removing and testing the existence of members take constant time.
- Set-based operations (union, intersection, difference)
- Operations: SADD, SREM, SISMEMBER, SMEMBERS, SDIFF, SDIFFSTORE, SINTER, SINTERSTORE, SPOP, etc.
Sorted sets
- Ordered collections of strings
- A score field determines the order; members are automatically sorted by score.
- Members must be unique.
- Elements can be retrieved by position (rank) or by score.
- Set-based operations (union, intersection, difference)
- Adding, removing and updating members take O(log(n)).
- Operations: ZADD, ZREM, ZSCORE, ZRANGE, ...
Bitmaps
- Bit operations
- Can count the number of bits set to 1
- Perform AND, OR, XOR, NOT operations
- Operations: SETBIT, GETBIT, BITCOUNT, BITOP AND, BITOP OR, BITOP XOR, BITOP NOT
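A typical use of bitmaps is to record one bit per user and combine days with bitwise operations. The sketch below models this with Python integers; it illustrates the idea behind SETBIT, GETBIT, BITCOUNT and BITOP but does not talk to Redis, and the bit numbering of a Python integer is an assumption that differs from Redis's byte layout.

```python
def setbit(bitmap: int, offset: int, value: int) -> int:
    """Return a copy of the bitmap with the bit at `offset` set or cleared."""
    if value:
        return bitmap | (1 << offset)
    return bitmap & ~(1 << offset)

def getbit(bitmap: int, offset: int) -> int:
    return (bitmap >> offset) & 1

def bitcount(bitmap: int) -> int:
    """Number of bits set to 1."""
    return bin(bitmap).count("1")

# One bit per user id: which users were active on each day (made-up data).
monday = 0
for user_id in (1, 5, 9):
    monday = setbit(monday, user_id, 1)

tuesday = 0
for user_id in (5, 9, 12):
    tuesday = setbit(tuesday, user_id, 1)

active_both_days = monday & tuesday      # same idea as BITOP AND
active_either_day = monday | tuesday     # same idea as BITOP OR
```

Counting the bits of the combined bitmaps then answers questions such as "how many users were active on both days" without storing a list of user ids.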
Hyperloglogs
- A probabilistic data structure for counting unique elements; it gives an estimate.
- There is no need to keep a copy of all members.
- Operations: PFADD, PFCOUNT, PFMERGE
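The idea of estimating a distinct count without keeping the members can be sketched in a few lines. This is a simplified HyperLogLog under stated assumptions: SHA-1 as the hash, 4096 registers, and no small-range or large-range correction, so it is only reasonable for cardinalities well above the number of registers. It is not the Redis implementation, and the function name is hypothetical.

```python
import hashlib

def hll_estimate(items, p=12):
    """Estimate the number of distinct items using 2**p small registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - p)                        # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)             # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)                 # bias constant for m >= 128
    return alpha * m * m / sum(2.0 ** -r for r in registers)

estimate = hll_estimate(range(50000))
```

Only the 4096 registers are kept in memory, whatever the number of items, and feeding the same items twice leaves the registers (and therefore the estimate) unchanged.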
Oracle NoSQL
- Uses the BerkeleyDB storage engine.
- Provides transactional semantics with fine-grained concurrency.
- Primary and secondary indexes
- High availability
Amazon DynamoDB
- Automatic sharding
- Client: AdRoll serves 100 billion ad impressions per day.
Aerospike
- Hybrid in-memory: dynamic random access memory (DRAM) and SSD
- Self-healing architecture
Riak
- Riak: secondary indexes, full-text search, MapReduce compliant
- Couchbase: membase
Graph: Introduction; Consistency; Scaling - Use cases; Neo4J
- A graph database stores entities (aka nodes) and relationships (aka edges) between these entities.
- Nodes and edges can have properties (aka attributes) in the property graph model.
- Edges may be directed or not.
- A query on a graph is known as traversing the graph.
- Traversing is fast because joins are not calculated but persisted.
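A traversal over a small property graph can be sketched with adjacency lists. The node names, properties and the traverse function are made up for the illustration; a real graph database stores relationships on disk and exposes its own query interface.

```python
from collections import deque

# Nodes with properties, and directed edges stored next to their source node,
# so following a relationship is a lookup rather than a join.
nodes = {
    "barbara": {"location": "Paris"},
    "carol": {"location": "Lyon"},
    "dave": {"location": "Nice"},
}
edges = {
    "barbara": [("FRIEND", "carol")],
    "carol": [("FRIEND", "dave")],
    "dave": [],
}

def traverse(start, rel_type, max_depth):
    """Breadth-first traversal along edges of one type, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for rel, target in edges.get(node, []):
            if rel == rel_type and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found

friends = traverse("barbara", "FRIEND", 1)
friends_of_friends = traverse("barbara", "FRIEND", 2)
```

The cost of the traversal depends on the number of edges actually followed, not on the total size of the graph.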
- Some solutions do not support distributing the nodes, so there is no consistency problem and ACID properties hold.
- Otherwise, systems adopt a master-slave approach.
- In some systems, slaves also accept writes and synchronize with the master.
- Scaling: partition the graph using round-robin, range, hash, graph partitioning, or clustering approaches from machine learning (K-means).
- Use cases: connected data (social networks), routing and location-based services, recommendation engines.
- Not suited to use cases where many nodes need to be modified at once.
Neo4J
- Project: NeoTechnology, AGPL/VGPL licence
- Started in 2009
- Implemented in Java
- Characteristics: represents everything with nodes and relationships, persisted, fully transactional.
- Nodes
- Relationships between nodes (can be both directed and bidirectional)
- Data on nodes and relationships (arbitrary number of key/value pairs)
- Graph traversal based: navigate from a starting node via relationships to the nodes matching a criterion.
- Usually takes the form of a Java API.
- Can be queried with Cypher or Gremlin [1].
- Cypher (declarative) vs Gremlin (procedural)
- OpenCypher (http://www.opencypher.org/)
[1] https://github.com/tinkerpop/gremlin/wiki
Cypher example
  START barbara = node:nodeIndex(name = "Barbara")
  MATCH (barbara)-[:FRIEND]->(friend_node)
  RETURN friend_node.name, friend_node.location
- Every operation must happen inside a transaction.
- Query model: graph traversal based.
- Persisted on disk with a custom binary disk format.
- Sharding is hard and requires intervention from the end user.
- Architecture: based on master-slave
- Concurrency control: locks
- Support for MapReduce: no
Document: Introduction; Transactions; Querying; Use cases; MongoDB
- Documents are the main concept in document databases.
- The database stores and retrieves documents in XML, JSON, BSON, etc.
- The documents are stored in the value part of a key-value store.
- The documents are self-describing, hierarchical tree data structures consisting of maps, collections and scalar values.
- From one document to another, the "schema" can differ, yet the documents can still belong to the same collection (like a table in an RDBMS).
Example:
  { "name": "ACTIFED ALLERGIE", "price": 5.61 }
  { "name": "HUMEX ALLERGIE", "rembRate": 0 }
- There are no empty or null-valued attributes in documents (contrary to an RDBMS). If an attribute is not found in a document, we assume that it is not set or not relevant to the document.
- Documents allow new attributes to be created without the need to define them or to change the existing documents.
- Popular document stores are MongoDB, CouchDB, Terrastore, OrientDB, RavenDB and ArangoDB.
- Like other NoSQL stores, they differ from one another (transactions, consistency, representation, query model, etc.).
- As aggregate-oriented stores, document database systems support transactions at the level of a single document (atomic transactions).
- RavenDB supports transactions across multiple operations.
- Compared to KV stores, it is frequently possible to query the data inside the document without having to retrieve the whole document by its key.
- CouchDB: queries go through views, which can be materialized (automatically updated when queried if any data has changed since the last update) or dynamic.
Use cases
- Event logging, especially when events keep changing
- Content management systems
- Blogging platforms
- Web and real-time analytics
- E-commerce applications, thanks to the schema flexibility for products and orders
Do not use for: complex transactions spanning different operations, queries against varying aggregate structures.
Presentation
- Document-oriented database
- Project: 10gen, AGPL licence
- Started in 2009
- Implemented in C++
- Characteristics: scale out, MapReduce-style aggregation, geospatial indexes, with features of an RDBMS: secondary indexes, range queries, sorting.
Data model
- The document is the basic unit.
- Support for embedded documents and arrays, which allows complex hierarchical relationships within a single document.
- Every document has a special key, "_id", that is unique across the document's collection.
- A collection is like a table but schema-free. It is a group of documents.
- Subcollections can be defined, e.g. blog.posts.
- Collections are grouped in databases.
- Document format: JSON (i.e. key/value pairs), stored as BSON (16 MB max for a document).
- MongoDB is type-sensitive and case-sensitive.
- No duplicate keys in a document.
- The MongoDB shell is a full-featured JavaScript interpreter.
- JSON has only 6 types. MongoDB extends that: null, boolean, 32-bit and 64-bit integer, 64-bit floating point number, string, object id, date, regular expression, code, binary data, array, embedded document, etc.
- Queries are expression objects written as BSON documents, e.g. db.users.find({}) or db.users.find({'name': 'smith'})
- Support for conditions, e.g. db.users.find({'age': {'$gte': 18, '$lte': 25}}), disjunction (with $in and $nin), negation ($not), regular expressions, sorting, skip, limit
- Querying arrays ($all, $size, $slice to get a subset of the values stored in an array)
- Querying embedded documents, e.g. db.people.find({"name": {"first": "Joe", "last": "Schmoe"}})
- API and drivers for many languages
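How a query document selects documents can be sketched with a tiny matcher over Python dictionaries. It is a toy model of four of the operators mentioned above ($gte, $lte, $in, $nin) plus exact matching; the matches function and the sample data are hypothetical, and this is not how MongoDB evaluates queries internally.

```python
# Comparison operators supported by this toy matcher.
OPERATORS = {
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}

def matches(doc, query):
    """Return True if the document satisfies every condition of the query."""
    for field, condition in query.items():
        value = doc.get(field)
        is_operator_doc = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:       # plain value: exact match
            return False
    return True

users = [
    {"name": "smith", "age": 20},
    {"name": "jones", "age": 30},
    {"name": "lee"},                   # no "age" attribute at all
]
young_adults = [u["name"] for u in users
                if matches(u, {"age": {"$gte": 18, "$lte": 25}})]
```

Documents that lack the queried attribute simply do not match a range condition, which fits the schema-free model: no null placeholder is needed.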
- Aggregation tools: count (db.foo.count()), distinct (finds all the distinct values for a given key), group (divides the collection according to the values of the chosen key, similar to GROUP BY).
- MapReduce: map and reduce functions are written in JavaScript. It does not use a framework like Hadoop.
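The map and reduce steps can be sketched in-process. The example counts blog posts per category; the map_reduce helper and the sample documents are hypothetical, and the sketch is written in Python for illustration, whereas MongoDB expects JavaScript functions.

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    """Apply map_fn to each document, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):      # map may emit several pairs
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

posts = [
    {"category": "tech"},
    {"category": "music"},
    {"category": "tech"},
]

counts = map_reduce(
    posts,
    map_fn=lambda doc: [(doc["category"], 1)],   # emit (category, 1)
    reduce_fn=lambda key, values: sum(values),   # add up the ones
)
```

Here the result is equivalent to a GROUP BY category with a COUNT, which shows why group and MapReduce are presented together.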
- GridFS stores large documents (up to 2 GB): large documents are split into chunks, which are stored in collections.
- Indexes on collections
Replication
- A replica set is basically a master-slave cluster with automatic failover.
- The master is elected by the cluster and may change if the current one goes down.
- Replication is asynchronous. Be careful if slaves are used for reads (stale data).
- Servers are started as master or slave.
- The local database stores the operation log (oplog) of the master.
- Slaves synchronize by asking for the oplog.
- Replication is used for scaling reads (from slaves) or for data processing.
Sharding
- MongoDB's solution to scale out.
- Principle: break up collections into smaller chunks, which are distributed over shards.
- mongos is started and handles the distribution of queries. The client connects to mongos instead of mongod; the rest is transparent to the client.
- When to shard: disk space problems on the current machine, need for faster writes, need to keep more data in memory.
- How: select a shard key (e.g. username). MongoDB takes care of shard balancing.
- Architecture: based on master-slave or master-master (not recommended)
- Concurrency control: last update wins
- Support for MapReduce: yes, but a built-in one (not Hadoop)
Column-family: Introduction; Cassandra
- Also known as wide-row stores and wide-column stores.
- Aggregate-oriented stores where the value can have multiple columns.
- A column is a set of data values of a particular type.
- Column-family databases store and process data by column instead of by row.
- Popular systems are Cassandra, HBase and Google's BigTable.
Use cases
- Event logging
- CMS
- Blogging platforms
- Counters: count and categorize the visitors of a page to calculate analytics.
- Expiring usage: you can define expiring columns (e.g. show ad banners on a website for a specific time). Columns are removed automatically after a given time (Time To Live = TTL).
Not suited to: early prototypes, because of schema changes (Cassandra); ACID transactions.
Cassandra
- Column-family database
- Apache project (since 2008), started at Facebook in 2007
- Started in 2009
- Implemented in Java
- Used at Digg, Twitter, Reddit, Rackspace, Netflix, etc.
- Characteristics: fault tolerant, decentralized (P2P), shared-nothing architecture, tuneable consistency, elastic
- Lineage: Dynamo (for the distribution model) and Bigtable (for the data model and storage architecture)
- Fully distributed (no single point of failure)
- Fast reads and writes ('optimize for reads, writes are cheap')
- Eventually consistent
Distributed multi-level hash map
- A column is a triple: a key, a value and a timestamp.
- Columns and super columns are sorted (the sort order is customizable and defined per column family).
- Predefined sorts are: BytesType, LongType, AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType
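The "multi-level hash map" view of the data model, with columns as (key, value, timestamp) triples, can be sketched with nested dictionaries. This is an in-memory toy: the function names echo the API vocabulary but are hypothetical, columns are sorted by plain string order, and resolving concurrent writes by keeping the newest timestamp is an assumption of the sketch.

```python
from collections import defaultdict

# column family -> row key -> column name -> (value, timestamp)
store = defaultdict(lambda: defaultdict(dict))

def insert(column_family, row_key, column, value, timestamp):
    """Insert or update one column; an older timestamp never overwrites a newer one."""
    current = store[column_family][row_key].get(column)
    if current is None or timestamp >= current[1]:
        store[column_family][row_key][column] = (value, timestamp)

def get_slice(column_family, row_key, start, finish):
    """Return the (name, value) pairs of one row whose names fall in [start, finish]."""
    row = store[column_family][row_key]
    return [(name, row[name][0]) for name in sorted(row) if start <= name <= finish]

insert("users", "jbellis", "name", "Jonathan", timestamp=2)
insert("users", "jbellis", "state", "TX", timestamp=2)
insert("users", "jbellis", "name", "Jon", timestamp=1)   # stale write, ignored

columns = get_slice("users", "jbellis", "a", "z")
```

Because the columns of a row are kept sorted, a range of column names can be read as one contiguous slice.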
API
Writes
- insert(): insert/update a single column
- remove(): remove a column/super column/row
- batch_mutate(): update/remove several columns
Reads
- get(): retrieve a single column
- get_slice(): retrieve a group of columns (by names or range)
- get_range_slices(): retrieve a set of slices for a range of row keys
- count(): the number of columns in a row
CQL
- CQL: Cassandra Query Language
- SQL-like language with DDL (CREATE, ALTER, DROP) and DML (INSERT, UPDATE, DELETE, SELECT) operations.
- Supported datatypes: numerical, character, date, unstructured and specialized datatypes (JSON).
- Transactions are more AID than ACID: data consistency is tunable across a database (from strong to eventual).
Replication
- Set per keyspace
- Specified in the server's config file
- Specifies how many copies of the data one wants
- Architecture: based on a ring (consistent hashing)
- Concurrency control: OCC (no locking)
- Support for MapReduce: yes
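A ring based on consistent hashing can be sketched as a sorted list of hash points. The Ring class, the use of MD5, and the 64 points per node are assumptions chosen for the illustration; they do not describe Cassandra's actual partitioner or token assignment.

```python
import bisect
import hashlib

def _hash(text: str) -> int:
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

class Ring:
    """Toy consistent-hashing ring with several points per node."""

    def __init__(self, nodes, points_per_node=64):
        # Each node owns several positions on the ring to spread the load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the next point,
        # wrapping around at the end of the ring.
        i = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[i][1]

ring_before = Ring(["node-a", "node-b", "node-c"])
ring_after = Ring(["node-a", "node-b", "node-c", "node-d"])

keys = [f"row-{i}" for i in range(1000)]
moved = [k for k in keys if ring_before.node_for(k) != ring_after.node_for(k)]
```

When a node joins, the only keys that change owner are those taken over by the new node; keys never move between the existing nodes, which is what makes the ring elastic.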
Exercise
We consider a simple blog application containing the following information. Blog entries are written by users, who are characterized by a userId, a name and an email address. For each blog entry, we store the content of the entry, its storage date (that is, the date at which it is stored in the database), the user who produced the entry and the category of the entry. A category corresponds to a subject area (e.g. sports, music, computer science). Finally, each user can subscribe to the blogs of other users. Apart from joinless queries (e.g. display blog entries in chronological order), the application needs to answer questions such as 'which users subscribe to one's blog' and 'show the most recent entries for the blogs one has subscribed to'. Model this application for the relational, document, column-family and graph models.
user (userid, username, state)
category (catId, catName)
blog (blogid, blogContent, blogDate, #userid, #catID)
follow (#follower, #followed, followDate)
Schema from http://www.datastax.com/docs/0.8/ddl/index
users {
  id: "jbellis",
  name: "Jonathan",
  state: "TX",
  following: ["dhutch", "egilmore"],
  followers: ["dhutch", "egilmore"],
  blogs: [ { date: "128..", body: "Today ..", category: "tech" }, .. ]
}
with secondary indexes on following, followers and the date of blogs
Polyglot persistence: Introduction; Scenario; Example
Polyglot persistence
- The term was coined after Neal Ford's polyglot programming, which advocates writing programs with a mix of programming languages.
- Polyglot persistence aims to use different data stores in your applications.
- Imagine an e-commerce application. What would you use for the shopping cart, the completed orders and the session data?
- The shopping cart and the session data can be efficiently stored in a key-value store. Their keys are, respectively, userID and sessionID.
- Once an order is completed, its data can be stored in an RDBMS or a document store.
- What if we want to add a product recommendation service? Think of collaborative filtering: "those who bought that product also like this product" or "your friends bought ..".
- What about inventory and item prices?
- A graph database is well suited to storing recommendation data.
- Inventory and item prices fit nicely in an RDBMS.
- If we have a lot of text, we can index that text using a store like Solr (part of the Lucene project).
- With polyglot persistence, one has to be careful with deployment complexity: all databases are needed in production at the same time.
- It may be a good solution to design services on top of these databases. This reduces the impact of data storage choices.
- Is it good practice to use an RDBMS for every aspect of an application's storage?
- With the various NoSQL stores available now, it seems wiser to implement applications that access data stored in RDBMS, NoSQL stores, RDF stores, etc.
- The key to this decision is to understand the pros and cons of each DB system and to identify the storage issues related to your application's functionalities.
Twitter
- When a user writes a tweet, the tweet enters the 'WriteAPI', which calls the Fanout module to send it to all followers, i.e. it is stored in each follower's array of tweets (in Redis).
- The Redis cluster stores all users' timelines (not persisted, everything in RAM, replicated 3 times). In case of failover, a timeline can be reconstructed.
- The last 800 tweets of each user are kept in RAM.
- Fanout asks the Social Graph service to know who is following whom.
- In Redis, the data model is tweetId (8 bytes), UserID (8 bytes), bits (4 bytes), plus retweet (tweetID).
- The Timeline service provides the Redis server where your home timeline is stored.
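The fanout-on-write idea can be sketched with bounded in-memory timelines. The follower table, the fanout function and the use of Python deques are assumptions made for the illustration; only the 800-tweet limit comes from the description above, and nothing here reflects Twitter's actual code.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 800   # number of tweets kept per user, as stated above

# Stand-in for the Social Graph service: author -> followers (made-up data).
followers = {"alice": ["bob", "carol"]}

# Stand-in for the Redis cluster: one bounded timeline per user, newest first.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))

def fanout(author, tweet_id):
    """Push a new tweet id onto the home timeline of every follower."""
    for follower in followers.get(author, []):
        # appendleft on a full deque silently drops the oldest entry.
        timelines[follower].appendleft(tweet_id)

for tweet_id in range(1000):
    fanout("alice", tweet_id)
```

The work is done at write time, so reading a home timeline is a single lookup; the price is one write per follower for every tweet.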
Twitter (2)
- The WriteAPI also sends tweets to the 'Search Ingester', which stores them in a modified Lucene index (named Earlybird). The index is in memory.
- Blender is the service that gives access to Earlybird.
- Twitter also has a push solution (it pushes tweets to users). The WriteAPI sends tweets to HTTP Push, which contains 'Hosebird', which works out how to send each tweet. A similar service, named Mobile Push, exists for mobile devices.
- The WriteAPI also sends all tweets to HDFS to run MapReduce jobs.