Twitter Platform Structure
Twitter Platform Structure for scalable SNS platform
Date: 2011.01.11
박혜웅
written by BAGE

Twitter vs Facebook
• Twitter (2010.06)
  – 2006 ~ 2009
  – 190,000,000 users
  – 750 tweets/second, 3,000 tweets/second peak
  – 600 TPS, 1,000 TPS peak
  – 600,000,000 searches/day
• Facebook (2010.07)
  – 500,000,000 users
  – 200,000,000 users on mobile
  – 90 new pieces of content per month (presumably per user)

Twitter Architecture
• Front-end
  – Ruby on Rails
• Caching
  – Varnish (for static)
  – Memcached (for dynamic)
  – CacheMoney
• Message Queue
  – Kestrel
• Searching
  – Lucene(?), HBase
• Graphing
  – FlockDB
• Logging
  – Scribe
• Monitoring for system & network
  – Munin & Nagios
• Statistics
  – Google Analytics(?)
• Storage
  – Hadoop, Cassandra(?)
• Analysis
  – Pig on Hadoop

Twitter Architecture (continued: cache layers from the diagram)
• read-through: page per user (50% hit rate)
  – invalidated each time a tweet comes in
• read-through: cache from cache (95% hit rate)
• write-through: real data of tweet and user (95% hit rate)
• write-through: an array of tweet IDs (99% hit rate)
• from MySQL to Cassandra

Front-end
• Ruby on Rails
  – Ruby: scripting language
  – Rails: web framework
• Mongrel
  – web server

Caching
• from MySQL to Memcached, CacheMoney, Varnish
• build each user’s timeline using fanout.
• Memcached
  – “There are only two hard things in CS: cache invalidation and naming things.” Phil Karlton, via Tim Bray
• CacheMoney
  – write-through caching
  – developed by Twitter
• Varnish
  – caches web pages (reverse proxy)
  – uses virtual memory (mapping memory to file)
  – better than Squid

Message Queue
• from DRb, Rinda, Erlang, Starling to Kestrel
• Kestrel
  – Simplest Message Queue
  – based on Starling
    • Starling uses EventMachine in a single-thread, single-process form.
    • Kestrel uses the same event-driven approach, but with multiple I/O threads and a pool of worker threads.
  – written in Scala
  – developed by Twitter
  – with Apache Mina
    • a network application framework
• Features of Kestrel
  – fast
    • queues are stored in memory.
    • runs on the JVM, which has a good garbage collector.
  – small
    • only 2K lines of Scala.
  – durable
    • queues are logged into a journal on disk.
  – reliable
    • if a client fails, its item is automatically handed to another client.

Message Queue (continued)
• Anti-Features of Kestrel (what Kestrel is not)
  – strongly ordered
    • each queue is ordered on each machine.
    • a queue cluster will appear “loosely ordered”.
  – transactional
    • item ownership is transferred with acknowledgement.
• Use messages to invalidate cache.
• Simplest Message Queue
  – Gives up constraints for scalability.
  – No strict ordering of jobs.
  – No shared state among servers.
  – Uses memcached protocol.

Searching
• from Memcached to Lucene...
  – low hit rate problem.
  – term_id, doc_id scheme + partitioning by time

Storage
• from MySQL to Cassandra
  – for huge data volumes and a free license
    • only for backup
• Table schema for partitioning and replication
  – Commutative
    • the order of operations can change without changing the result, e.g. sum(x, y)
  – Idempotent
    • applying the same operation several times gives the same result, e.g. max(x, y)
  – Denormalization
  – make duplicated tables with different primary keys.
• Multi-functional masters.
• Database Table Partitioning
  – partitioning by time for tweets.
  – partitioning by user for timelines.
  – “The Next Application I Build is Going to Be Easily Partitionable” - S. Butterfield
• Index Everything
  – Do not use “like %keyword%”
• Lightweight Locking

Storage (continued)
• Use EXPLAIN to see how your queries are running.
• Avoid complex joins.
• Avoid scanning large sets of data.

Lessons Learned
• Talk to the community.
• Treat your scaling plan like a business plan.
• Build it yourself.
  – Don’t rely on memcache and database.
• Build in user limits.
• Don’t make the database the central bottleneck of doom.
  – Not everything needs to require a gigantic join.
• Make your application easily partitionable from the start.
• Realize your site is slow.
  – Instrument everything.
  – Start graphing early.
• Prepare a full test suite.
  – Test everything.
• Don’t make services dependent.
• Process asynchronously when possible.
• Cache as much as possible.
  – Databases are not always the best store.
• Start working on scaling early.

Lessons Learned (continued)
• Long-running processes should be abstracted to daemons.
• Most performance comes from application design.
  – not from the language.
• Turn your website into an open service by creating an API.
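
The fanout, partition-by-user, and idempotent-operation ideas from the Caching and Storage slides can be sketched together in a few lines. This is only a minimal in-memory illustration under stated assumptions, not Twitter's implementation: the class `TimelineStore`, its method names, the shard count, and the timeline cap are all hypothetical, and plain Python dictionaries stand in for the database and cache layers.

```python
import zlib
from collections import defaultdict, deque

NUM_SHARDS = 4       # assumption: number of user partitions
TIMELINE_CAP = 800   # assumption: max tweet IDs kept per timeline


class TimelineStore:
    """Toy fanout-on-write store (hypothetical; dicts replace real backends)."""

    def __init__(self):
        self.tweets = {}                    # tweet_id -> (author, text)
        self.followers = defaultdict(set)   # author -> set of followers
        # One dict per shard: user -> deque of tweet IDs, newest first.
        self.shards = [
            defaultdict(lambda: deque(maxlen=TIMELINE_CAP))
            for _ in range(NUM_SHARDS)
        ]
        self.next_id = 1

    def _shard_for(self, user):
        # Partition timelines by user, using a hash that is stable across runs.
        return self.shards[zlib.crc32(user.encode("utf-8")) % NUM_SHARDS]

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def deliver(self, user, tweet_id):
        # Idempotent: delivering the same ID twice leaves the timeline
        # unchanged, so a redelivered queue message is harmless.
        timeline = self._shard_for(user)[user]
        if tweet_id not in timeline:
            timeline.appendleft(tweet_id)   # oldest ID drops off when full

    def post(self, author, text):
        tweet_id = self.next_id
        self.next_id += 1
        self.tweets[tweet_id] = (author, text)
        # Fanout on write: push only the ID to the author and every follower.
        for user in self.followers[author] | {author}:
            self.deliver(user, tweet_id)
        return tweet_id

    def timeline(self, user, limit=20):
        # Reading touches a single shard and needs no join.
        ids = list(self._shard_for(user)[user])[:limit]
        return [(i, *self.tweets[i]) for i in ids]
```

A caller would create a `TimelineStore`, register followers with `follow`, write with `post`, and read with `timeline`. The work is done at write time, so a read is a lookup of one user's array of tweet IDs, which matches the "array of tweet IDs" cache layer described in the slides.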