Twitter Platform Structure

Twitter Platform Structure for scalable SNS platform
Date: 2011.01.11
박혜웅
written by BAGE

Twitter vs Facebook
• Twitter (2010.06)
  – 2006 ~ 2009
  – 190,000,000 users
  – 750 tweets/second, 3,000 tweets/second at peak
  – 600 TPS, 1,000 TPS at peak
  – 600,000,000 searches/day
• Facebook (2010.07)
  – 500,000,000 users
  – 200,000,000 users on mobile
  – 90 new pieces of content per month

Twitter Architecture
• Front-end
  – Ruby on Rails
• Caching
  – Varnish (for static)
  – Memcached (for dynamic)
  – CacheMoney
• Message Queue
  – Kestrel
• Logging
  – Scribe
• Monitoring for system & network
  – Munin & Nagios
• Statistics
  – Google Analytics(?)
• Storage
  – Hadoop, Cassandra(?)
• Searching
  – Lucene(?), HBase
• Analysis
  – Pig on Hadoop
• Graphing
  – FlockDB

Twitter Architecture (diagram labels: cache layers)
• invalidated each time a tweet comes in
• read-through: page per user (50% hit rate)
• read-through: cache from cache (95% hit rate)
• write-through: real data of tweet and user (95% hit rate)
• write-through: an array of tweet IDs (99% hit rate)
• from MySQL to Cassandra

Front-end
• Ruby on Rails
  – Ruby: scripting language
  – Rails: web framework
• Mongrel
  – web server

Caching
• from MySQL to Memcached, CacheMoney, Varnish
• build each user’s timeline using fanout.
• Memcached
  – “There are only two hard things in CS: cache invalidation and naming things.” Phil Karlton, via Tim Bray
• CacheMoney
  – write-through caching
  – developed by Twitter
• Varnish
  – caches web pages (reverse proxy)
  – uses virtual memory (mapping memory to file)
  – better than Squid

Message Queue
• from DRb, Rinda, Erlang, Starling to Kestrel
• Kestrel
  – Simplest Message Queue
  – based on Starling
    • Starling uses eventmachine in a single-thread, single-process form.
    • Kestrel uses multiple I/O threads and a pool of worker threads.
  – written in Scala
  – developed by Twitter
  – built with Apache Mina
    • a network application framework
• Features of Kestrel
  – fast
    • queues are stored in memory.
    • runs on the JVM, which has a good garbage collector.
  – small
    • only 2K lines of Scala.
  – durable
    • queues are logged into a journal on disk.
  – reliable
    • if a client fails, its item is automatically handed to another client.
• Anti-Features of Kestrel (what it is not)
  – not strongly ordered
    • each queue is ordered on each machine.
    • a queue cluster will appear “loosely ordered”.
  – not transactional
    • item ownership is transferred with acknowledgement.
• Use messages to invalidate cache.
• Simplest Message Queue
  – Gives up constraints for scalability.
  – No strict ordering of jobs.
  – No shared state among servers.
  – Uses memcached protocol.

Searching
• from Memcached to Lucene...
  – low hit rate problem.
  – term_id, doc_id schema + partitioning by time

Storage
• from MySQL to Cassandra
  – for huge data and a free licence
    • only for backup
• Table schema for partitioning and replication
  – Commutative
    • the order of operations can change without changing the result. ex: sum(x,y)
  – Idempotent
    • performing the same operation several times gives the same result. ex: max(x,y)
  – Denormalization
  – make duplicated tables with different primary keys.
• Multi functional masters.
• Database Table Partitioning
  – partitioning by time for tweets.
  – partitioning by user for timeline.
  – “The Next Application I Build is Going to Be Easily Partitionable” - S. Butterfield
• Index Everything
  – Do not use “like %keyword%”
• Lightweight Locking
• Use EXPLAIN to see how your queries are running.
• Avoid complex joins.
• Avoid scanning large sets of data.

Lessons Learned
• Talk to the community.
• Treat your scaling plan like a business plan.
• Build it yourself.
  – Don’t rely on memcache and database.
• Build in user limits.
• Don’t make the database the central bottleneck of doom.
  – Not everything needs to require a gigantic join.
• Make your application easily partitionable from the start.
• Realize your site is slow.
  – Instrument everything.
  – Start graphing early.
• Prepare a full test suite.
  – Test everything.
• Don’t make services dependent.
• Process asynchronously when possible.
• Cache as much as possible.
  – Databases are not always the best store.
• Start working on scaling early.
• Long running processes should be abstracted to daemons.
• Most performance comes from application design.
  – not from the language.
• Turn your website into an open service by creating an API.
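
Example: commutative and idempotent updates

The Storage slide names two properties that make partitioned, replicated tables easier to keep consistent: commutative operations (sum) and idempotent operations (max). The Python sketch below is only an illustration of those two definitions, not code from Twitter. The helper names (apply_sum, apply_max, replay) and the sample updates are hypothetical. It replays the same updates in order, in reverse order, and with one update delivered twice, as might happen when a queue is only loosely ordered or a message is redelivered.

```python
def apply_sum(state, delta):
    """Commutative, but not idempotent: order is free, duplicates are not."""
    return state + delta


def apply_max(state, value):
    """Commutative and idempotent: neither order nor duplicates matter."""
    return max(state, value)


def replay(apply, initial, updates):
    """Fold a sequence of updates into a state, one update at a time."""
    state = initial
    for update in updates:
        state = apply(state, update)
    return state


# Hypothetical updates arriving at a replica.
updates = [3, 7, 5]
reordered = list(reversed(updates))  # same updates, different arrival order
duplicated = updates + [7]           # one update delivered twice

sum_in_order = replay(apply_sum, 0, updates)
sum_reordered = replay(apply_sum, 0, reordered)
sum_duplicated = replay(apply_sum, 0, duplicated)

max_in_order = replay(apply_max, 0, updates)
max_reordered = replay(apply_max, 0, reordered)
max_duplicated = replay(apply_max, 0, duplicated)

print("sum:", sum_in_order, sum_reordered, sum_duplicated)
print("max:", max_in_order, max_reordered, max_duplicated)
```

With these sample values, reordering leaves both results unchanged, while the duplicate delivery changes the sum but not the max. An update that is both commutative and idempotent can therefore be retried or delivered out of order without extra coordination, which fits the message queue slides' point about giving up strict ordering for scalability.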
