what happens when you type en.wikipedia.org
effie mouzeli • alexandros kosiaris
SREcon19 Dublin @kosiaris • @manjiki
About
Image: Niccolò Caranti, CC BY-SA 4.0
Did you know...
● … the Wikipedia infrastructure is run by the Wikimedia Foundation, an American nonprofit charitable organisation?
● … and we are ~370 people?
● … and we have no affiliation with Wikileaks?
● … all content is managed by volunteers?
● … we support 304 languages?
● … Wikipedia is 18 years old?
● … Wikipedia hosts some really really weird articles?
● … which can’t be read in Turkey (2017) nor China (2019)?
Wikimedia Projects
Wikimedia Infrastructure
✺ Open source software
✺ 2 Primary Data Centres
✺ 3 Caching Points of Presence
✺ ~17 billion pageviews per month*
✺ ~300k new editors per month
✺ ~1300 bare metal servers
* it’s complicated
Site Reliability Engineering
✺ Datacenter Operations
✺ Data Persistence
✺ Infrastructure Foundations
✺ Service Operations
✺ Traffic
The SRE team is a globally distributed team of 26 people responsible for developing and maintaining Wikimedia's production systems.
The Foundation has more SREs in other teams as well!
Application Layer
Image: Arthur Dunn, CC BY-SA 2.0
MediaWiki
✺ Our core application
✺ PHP, Apache, MySQL. Yes.*
  ✴ PHP7.2 since Sept 2019
✺ Wiki web pages - app servers cluster
✺ API cluster
✺ Jobrunners/Videoscalers cluster
MediaWiki is a free server-based software, licensed under the GNU GPL. It is an extremely powerful, scalable software, and a feature-rich wiki implementation that uses PHP to process and display data stored in a database, such as MySQL.
* it’s complicated
Application Layer Caches
From a Monolith (2014) to Microservices (2019)
From a Monolith to Microservices
✺ Elasticity
✺ Hardware fault mitigation
✺ Deployments
✺ Migration is not easy, and still ongoing
Microservices!
✺ Thumbor
✺ Mathoid
✺ ORES
✺ Mobile Content Service (MCS)
✺ And many more
Thumbor is used for image scaling
Mathoid renders LaTeX, and returns JSON with PNG, SVG or MathML renderings of the formula
ORES scores edits using Machine Learning (anti-vandalism effort)
MCS modifies page content on the fly, tailoring it for mobile
Kubernetes
Image: Public Domain
✺ Bare metal
✺ Calico as a CNI plugin
✺ Helm for deployments
✺ 2 clusters + 1 staging one
✺ Docker as a CRE
We have been running it successfully for the last 2 years! Currently, 11 services on it. Got a pipeline in the works.
Powers all mathematical formulas on Wikipedia!!!
Message Queueing
Image: bootbearwdc, CC BY 2.0
✺ Yes, we use Apache Kafka
✺ We are sending events like:
  ✳ wikitext templates refresh
  ✳ edge caches purging
  ✳ cross wiki links
  ✳ create new thumbnails
  ✳ re-encoding videos to open source formats
Apache Kafka: stream processing platform for real-time data feeds
One message queue to rule them all; started as a service for Analytics only. Now, it is our de facto solution.
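The event types listed above can be pictured as small JSON messages, each published to its own topic. The following is a minimal sketch and not the actual Wikimedia event schema: the event kinds are paraphrased from the bullets, and the `Event` fields, the `example.` topic prefix, and the function names are all hypothetical. No Kafka client is used; the sketch only shows the payload a producer might send.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical event kinds, paraphrased from the bullet list on the slide.
EVENT_KINDS = {
    "template_refresh",
    "cache_purge",
    "cross_wiki_links",
    "thumbnail_create",
    "video_reencode",
}


@dataclass(frozen=True)
class Event:
    kind: str   # one of EVENT_KINDS
    wiki: str   # illustrative wiki identifier, e.g. "enwiki"
    title: str  # page or file the event refers to


def topic_for(event: Event) -> str:
    """Map an event to a hypothetical topic name, one topic per kind."""
    if event.kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {event.kind}")
    return f"example.{event.kind}"


def serialize(event: Event) -> bytes:
    """Encode the event as UTF-8 JSON, the bytes a producer would publish."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")
```

A consumer such as a job runner or a cache purger would subscribe to the topic it cares about and decode the same JSON.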
Databases
Image: RageZ, CC BY 2.0
MariaDB*
✺ Database clusters are divided into sections
✺ Sections have masters and replicas*
✺ MediaWiki reads from replicas and writes to master
✺ Clusters:
  ✳ Wikitext (compressed)
  ✳ Metadata
  ✳ Parsercache
MariaDB: fork of MySQL, migrated from MySQL in 2013*
Have a go at https://quarry.wmflabs.org
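The "reads from replicas, writes to master" rule can be sketched as a small router for one section. This is an illustration only, not MediaWiki's load balancer, which is considerably more involved. The class name, the host names in the usage note, and the "starts with SELECT" test are assumptions made for the example.

```python
import random


class SectionRouter:
    """Route statements for one database section to its master or a replica."""

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        # An injectable random source keeps the sketch testable.
        self.rng = rng or random.Random()

    def host_for(self, sql):
        """Return the host that should run this statement."""
        # Simplifying assumption: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return self.rng.choice(self.replicas)
        # Writes, and reads when no replica is available, go to the master.
        return self.master
```

For example, `SectionRouter("db-master", ["db-replica-1", "db-replica-2"]).host_for("UPDATE page SET ...")` returns `"db-master"`, while a `SELECT` lands on one of the two replicas.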
* it’s complicated
MariaDB
✺ Online schema migrations*
✺ Cross DC replication
✺ TLS across all DBs
✺ Snapshots and local dumps for Backups
✺ ~570 TB total data
✺ ~150 DB servers
✺ ~350k queries per second (qps)
✺ ~70 TB of RAM
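To get a feel for these numbers, the per-server averages can be worked out directly from the four figures above. This is back-of-the-envelope arithmetic only: it assumes data, queries, and RAM are spread evenly over the ~150 servers, which the slide does not claim, and it uses decimal units (1 TB = 1000 GB).

```python
# Approximate figures taken from the slide.
TOTAL_DATA_TB = 570
DB_SERVERS = 150
QUERIES_PER_SECOND = 350_000
TOTAL_RAM_TB = 70

# Averages under the even-spread assumption stated above.
data_per_server_tb = TOTAL_DATA_TB / DB_SERVERS
qps_per_server = QUERIES_PER_SECOND / DB_SERVERS
ram_per_server_gb = TOTAL_RAM_TB * 1000 / DB_SERVERS
ram_to_data_ratio = TOTAL_RAM_TB / TOTAL_DATA_TB

print(f"data per server: {data_per_server_tb:.1f} TB")
print(f"queries per second per server: {qps_per_server:.0f}")
print(f"RAM per server: {ram_per_server_gb:.0f} GB")
print(f"RAM to data ratio: {ram_to_data_ratio:.2f}")
```

Under these assumptions an average server holds a few terabytes of data and serves a few thousand queries per second, with RAM covering roughly an eighth of the data.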
* it’s complicated
Elasticsearch
You guessed it right, we use it for search. That box on your top right. Run by a team surprisingly called Search Platform!
Storage
Image: Gail Thomas, CC BY-NC 2.0
Swift
✺ All our media are stored on Swift
✺ It has frontends … and backends
✺ 1 billion objects
✺ ~390 TB of media!
OpenStack Object Storage: a scalable storage system that stores and retrieves data via HTTP
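"Stores and retrieves data via HTTP" means every object has a URL. OpenStack Swift addresses objects as `/v1/{account}/{container}/{object}`; the sketch below only builds such a URL and performs no request. The endpoint, account, and container names in the usage note are made up for illustration and are not Wikimedia's.

```python
from urllib.parse import quote


def swift_object_url(endpoint, account, container, obj):
    """Build an object URL using Swift's /v1/{account}/{container}/{object} layout."""
    # Account and container are single path segments, so "/" is escaped in them.
    prefix = "/".join(quote(part, safe="") for part in (account, container))
    # Object names may contain "/", which quote() leaves intact by default.
    return f"{endpoint.rstrip('/')}/v1/{prefix}/{quote(obj)}"
```

For example, `swift_object_url("https://swift.example.org", "AUTH_media", "thumbs", "a/b/Dublin map.png")` yields `https://swift.example.org/v1/AUTH_media/thumbs/a/b/Dublin%20map.png`; a `GET` on that URL would fetch the object and a `PUT` would store it.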
Traffic
Image: Public Domain
Network
✺ We have our own content delivery network
✺ We direct traffic to a location on demand (via GeoDNS)
  ✳ Pooling/Depooling DCs
  ✳ 10 min TTL
✺ LVS as a Layer 3/4 Linux loadbalancer*
gdnsd: GeoDNS is written and maintained by one of us
peering: interconnection with other internet networks
Linux Virtual Server: an advanced L3/L4 load balancing solution for linux, supports consistent hashing
pybal: LVS manager, developed in-house
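Pooling and depooling via GeoDNS can be pictured as a lookup that returns the nearest site still in the pool. The sketch below is not gdnsd's configuration or logic: the region names, site names, and preference table are hypothetical, and only the 10 minute TTL comes from the slide.

```python
DNS_TTL_SECONDS = 10 * 60  # the 10 min TTL from the slide

# Hypothetical preference order per client region, nearest site first.
PREFERENCES = {
    "europe": ["pop-eu", "dc-east", "dc-west"],
    "asia": ["pop-asia", "dc-west", "dc-east"],
    "americas": ["dc-east", "dc-west", "pop-eu"],
}


def resolve(region, pooled):
    """Return (site, ttl) for the nearest site that is currently pooled."""
    for site in PREFERENCES[region]:
        if site in pooled:
            return site, DNS_TTL_SECONDS
    raise LookupError("no pooled site available")
```

Depooling a site is then just removing it from `pooled`: new lookups move to the next site in the list, while resolvers that cached the old answer may keep using it until the TTL expires.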
* it’s complicated
LVS-DR
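Consistent hashing, mentioned for Linux Virtual Server, keeps a client on the same backend and moves few clients when the backend set changes. The sketch below is a generic hash ring with virtual nodes, given as an illustration of the idea. It is not the scheduler LVS implements, and the class name, backend names, and `vnodes` parameter are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash of a string (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes per backend."""

    def __init__(self, backends, vnodes=100):
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def backend_for(self, client_ip):
        """Pick the backend owning the first ring point after the client's hash."""
        if not self._ring:
            raise LookupError("no backends")
        index = bisect.bisect(self._points, _hash(client_ip)) % len(self._ring)
        return self._ring[index][1]
```

Removing one backend from the ring only remaps the clients that were assigned to it; everyone else keeps their backend, which is what makes the scheme friendly to caches.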
CDN
✺ Nginx- for TLS termination
✺ Varnish frontend
  ✳ in memory
✺ Varnish backend
  ✳ local stores
✺ Varnish text
  ✳ HTML, CSS, JS etc
✺ Varnish upload
  ✳ media, media, media
Nginx-: Highly performant HTTP webserver/proxy with excellent TLS support
Varnish: Reverse HTTP caching proxy
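The frontend and backend layers above form a two-tier cache: a request is answered from memory if possible, then from the local store, and only then by the application. The sketch below models that lookup order with plain dictionaries. It is a simplification, not Varnish configuration: there is no TTL, eviction, or request coalescing, and the class and method names are hypothetical.

```python
class TieredCache:
    """Two cache layers in front of an origin, checked in order."""

    def __init__(self, origin):
        self.frontend = {}    # stands in for the in-memory frontend layer
        self.backend = {}     # stands in for the backend layer on local storage
        self.origin = origin  # callable standing in for the application layer

    def get(self, url):
        """Return (body, layer), where layer names the tier that answered."""
        if url in self.frontend:
            return self.frontend[url], "frontend"
        if url in self.backend:
            body = self.backend[url]
            self.frontend[url] = body  # promote to the faster layer
            return body, "backend"
        body = self.origin(url)
        self.backend[url] = body
        self.frontend[url] = body
        return body, "origin"

    def purge(self, url):
        """Drop a URL from both layers, e.g. after the page was edited."""
        self.frontend.pop(url, None)
        self.backend.pop(url, None)
```

A first `get` for a URL reaches the origin and later ones are served from the frontend; calling `purge` after an edit forces the next read back to the origin.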
CDN (coming soon)
✺ ATS TLS
  ✳ in memory
✺ ATS backend
  ✳ local store (SSDs)
✺ ATS text
  ✳ HTML, CSS, JS etc
✺ ATS upload
  ✳ media, media, media
✺ ACME-chief
Apache Traffic Server: Reverse and forward proxy with excellent caching support
ACME-chief: handles all the process of issuing and renewing Let’s Encrypt certificates (dns-01)
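The dns-01 challenge mentioned for ACME-chief proves control of a domain by publishing a TXT record. Per the ACME protocol (RFC 8555), the record lives at `_acme-challenge.<domain>` and holds the unpadded base64url SHA-256 digest of `token.thumbprint`. The sketch below computes that record and shows the protocol step only, not ACME-chief's code. The function name is hypothetical, and the token and thumbprint in the usage note are placeholders.

```python
import base64
import hashlib


def dns01_record(domain, token, account_thumbprint):
    """Return (record_name, txt_value) for an ACME dns-01 challenge."""
    # A wildcard certificate is validated at the base domain.
    if domain.startswith("*."):
        domain = domain[2:]
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without "=" padding, as the protocol requires.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", value
```

For example, `dns01_record("en.wikipedia.org", "placeholder-token", "placeholder-thumbprint")` returns the name `_acme-challenge.en.wikipedia.org` and a 43-character value to publish as a TXT record before asking the CA to validate.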
what happens when you type en.wikipedia.org
Image: WikiReader, CC BY 3.0
Read (cached)
Read (uncached)
Edit - Media Upload
Managing to Manage
Image: Getty Images
✺ Infrastructure as code
✺ Configuration management
✺ Kubernetes
✺ Testing/CI/CD
✺ Orchestration tooling
Puppet: configuration management system for servers/services
...~50k lines of puppet code
...~100k lines of Ruby/ERB
Cumin: in-house automation and orchestration tool
In a Nutshell
Image: Peter Trimming, CC BY 2.0
Want to sell encyclopedias?
https://jobs.wikimedia.org
https://grafana.wikimedia.org/
https://github.com/wikimedia/operations-puppet
https://phabricator.wikimedia.org/
https://wikitech.wikimedia.org/