Procella: Unifying Serving and Analytical Data at YouTube
Biswapesh Chattopadhyay, Priyam Dutta, Weiran Liu, Ott Tinn, Andrew Mccormick, Aniket Mokashi, Paul Harvey, Hector Gonzalez, David Lomax, Sagar Mittal, Roee Ebenstein, Nikita Mikhaylin, Hung-ching Lee, Xiaoyan Zhao, Tony Xu, Luis Perez, Farhad Shahmohammadi, Tran Bui, Neil McKay, Selcuk Aya, Vera Lychagina, Brett Elliott
Google LLC
[email protected]

ABSTRACT
Large organizations like YouTube are dealing with exploding data volume and increasing demand for data driven applications. Broadly, these can be categorized as: reporting and dashboarding, embedded statistics in pages, time-series monitoring, and ad-hoc analysis. Typically, organizations build specialized infrastructure for each of these use cases. This, however, creates silos of data and processing, and results in a complex, expensive, and harder to maintain infrastructure. At YouTube, we solved this problem by building a new SQL query engine, Procella. Procella implements a superset of the capabilities required to address all four of these use cases, with high scale and performance, in a single product. Today, Procella serves hundreds of billions of queries per day across all four workloads at YouTube and several other Google product areas.

PVLDB Reference Format:
Biswapesh Chattopadhyay, Priyam Dutta, Weiran Liu, Ott Tinn, Andrew Mccormick, Aniket Mokashi, Paul Harvey, Hector Gonzalez, David Lomax, Sagar Mittal, Roee Ebenstein, Hung-ching Lee, Xiaoyan Zhao, Tony Xu, Luis Perez, Farhad Shahmohammadi, Tran Bui, Neil McKay, Nikita Mikhaylin, Selcuk Aya, Vera Lychagina, Brett Elliott. Procella: Unifying serving and analytical data at YouTube. PVLDB, 12(12): 2022-2034, 2019.
DOI: https://doi.org/10.14778/3352063.3352121

1. INTRODUCTION
YouTube, as one of the world's most popular websites, generates trillions of new data items per day, based on billions of videos, hundreds of millions of creators and fans, billions of views, and billions of watch time hours. This data is used for generating reports for content creators, for monitoring our services' health, for serving embedded statistics such as video view counts on YouTube pages, and for ad-hoc analysis. These workloads have different sets of requirements:

• Reporting and dashboarding: Video creators, content owners, and various internal stakeholders at YouTube need access to detailed real time dashboards to understand how their videos and channels are performing. This requires an engine that supports executing tens of thousands of canned queries per second with low latency (tens of milliseconds), while queries may be using filters, aggregations, set operations and joins. The unique challenge here is that while our data volume is high (each data source often contains hundreds of billions of new rows per day), we require near real-time response time and access to fresh data.

• Embedded statistics: YouTube exposes many real-time statistics to users, such as likes or views of a video, resulting in simple but very high cardinality queries. These values are constantly changing, so the system must support millions of real-time updates concurrently with millions of low latency queries per second.

• Monitoring: Monitoring workloads share many properties with the dashboarding workload, such as relatively simple canned queries and the need for fresh data. The query volume is often lower since monitoring is typically used internally by engineers. However, there is a need for additional data management functions, such as automatic downsampling and expiry of old data, and additional query features (for example, efficient approximation functions and additional time-series functions).

• Ad-hoc analysis: Various YouTube teams (data scientists, business analysts, product managers, engineers) need to perform complex ad-hoc analysis to understand usage trends and to determine how to improve the product. This workload has a low query volume (at most tens of queries per second) and tolerates moderate latency (seconds to minutes) for complex queries (multiple levels of aggregations, set operations, analytic functions, joins, unpredictable patterns, manipulating nested / repeated data, etc.) over enormous volumes (trillions of rows) of data. Query patterns are highly unpredictable, although some standard data modeling techniques, such as star and snowflake schemas, can be used.

Historically, YouTube (and other similar products at Google and elsewhere) have relied on different storage and query backends for each of these needs: Dremel [33] for ad-hoc analysis and internal dashboards, Mesa [27] and Bigtable [11] for external-facing high volume dashboards, Monarch [32] for monitoring site health, and Vitess [25] for embedded statistics in pages. But with exponential data growth and product enhancements came the need for more features, better performance, larger scale and improved efficiency, making it increasingly challenging to meet these needs with the existing infrastructure. Some specific problems we were facing were:

• Data needed to be loaded into multiple systems using different Extract, Transform, and Load (ETL) processes, leading to significant additional resource consumption, data quality issues, data inconsistencies, slower loading times, high development and maintenance cost, and slower time-to-market.

• Since each internal system uses different languages and APIs, migrating data across these systems, to enable the utilization of existing tools, resulted in reduced usability and high learning costs. In particular, since many of these systems do not support full SQL, some applications could not be built on some backends, leading to data duplication and accessibility issues across the organization.

• Several of the underlying components had performance, scalability and efficiency issues when dealing with data at YouTube scale.

To solve these problems, we built Procella, a new distributed query engine. Procella implements a superset of features required for the diverse workloads described above:

• Rich API: Procella supports an almost complete implementation of standard SQL, including complex multi-stage joins, analytic functions and set operations, with several useful extensions such as approximate aggregations, handling complex nested and repeated schemas, user defined functions, and more (see the illustrative query after this list).

• High Scalability: Procella separates compute (running on Borg [42]) and storage (on Colossus [24]), enabling high scalability (thousands of servers, hundreds of petabytes of data) in a cost efficient way.

• High Performance: Procella uses state-of-the-art query execution techniques to enable efficient execution of high query volumes (millions of QPS) with very low latency (milliseconds).

• Data Freshness: Procella supports high volume, low latency data ingestion in both batch and streaming modes, the ability to work directly on existing data, and native support for lambda architecture [35].
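To ground the "Rich API" point above, the sketch below shows the shape of a canned dashboard query such an engine would serve, issued from application code. This is an illustration only: the table, columns, client object, and the APPROX_COUNT_DISTINCT function name are hypothetical stand-ins for whatever the real schema and SQL dialect provide; the paper itself states only that approximate aggregations and near-complete standard SQL are supported.

```python
# Illustrative only: a dashboard-style query combining a filter, grouping,
# and an approximate distinct-count aggregation. Table, column, and client
# names are hypothetical, not Procella's actual schema or API.
ILLUSTRATIVE_QUERY = """
SELECT
  video_id,
  SUM(views) AS total_views,
  APPROX_COUNT_DISTINCT(viewer_id) AS approx_unique_viewers
FROM VideoEvents
WHERE event_date = '2019-06-01'
GROUP BY video_id
ORDER BY total_views DESC
LIMIT 100
"""

def run_report(sql_client):
    """Send the query through whatever SQL client library the application uses."""
    return sql_client.query(ILLUSTRATIVE_QUERY)
```

At the volumes described above (tens of thousands of such canned queries per second, with tens-of-milliseconds latency, over data sources gaining hundreds of billions of rows per day), the interesting problem is not the SQL itself but executing it at that scale and freshness.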
2. ARCHITECTURE

[Figure 1: Procella System Architecture. Components shown include the query client, Root Server (RS), Metadata Server (MDS), Data Server (DS), Ingestion Server (IgS), Registration Server, Compaction Server, a remote memory store, and the distributed non-volatile file system (Colossus), connected by query, metadata, batch data, and realtime data flows.]

2.1 Google Infrastructure
Procella is designed to run on Google infrastructure. Google has one of the world's most advanced distributed systems infrastructures, with several notable features:

• Disaggregated storage: All durable data resides on Colossus, the distributed file system; servers have no durable local storage. Colossus differs from a local file system in a few important ways:

  - Data is immutable. A file can be opened and appended to, but cannot be modified. Further, once finalized, the file cannot be altered at all.

  - Common metadata operations such as listing and opening files can have significantly higher latency (especially at the tail) than local file systems, because they involve one or more RPCs to the Colossus metadata servers.

  - All durable data is remote; thus, any read or write operation can only be performed via RPC. As with metadata, this results in higher cost and latency when performing many small operations, especially at the tail (see the read-coalescing sketch below).
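Because every read or write is an RPC with tail-heavy latency, a query engine on this infrastructure is pushed toward batching its I/O. The sketch below is not Procella's I/O layer; it is a minimal, hypothetical illustration of coalescing nearby byte-range reads so they share one round trip, with read_range standing in for a remote read RPC.

```python
# Minimal sketch (not Procella code): coalescing small reads on a remote,
# RPC-only file system so that nearby byte ranges share one round trip.
# `read_range(path, offset, length)` is a hypothetical stand-in for a remote read RPC.

def coalesce_ranges(ranges, max_gap=64 * 1024):
    """Merge (offset, length) ranges whose gaps are at most max_gap bytes."""
    merged = []
    for offset, length in sorted(ranges):
        if merged and offset - (merged[-1][0] + merged[-1][1]) <= max_gap:
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset,
                          max(prev_length, offset + length - prev_offset))
        else:
            merged.append((offset, length))
    return merged

def batched_read(path, ranges, read_range):
    """Issue one RPC per coalesced span, then slice out the requested pieces.

    Returns the requested byte ranges in offset order.
    """
    blocks = {}
    for offset, length in coalesce_ranges(ranges):
        blocks[offset] = read_range(path, offset, length)  # one remote RPC
    pieces = []
    for offset, length in sorted(ranges):
        for block_offset, block in blocks.items():
            if block_offset <= offset and offset + length <= block_offset + len(block):
                start = offset - block_offset
                pieces.append(block[start:start + length])
                break
    return pieces
```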
• Shared compute: All servers must run on Borg, our distributed job scheduling and container infrastructure. This has several important implications:

  - Vertical scaling is challenging. Each server runs many tasks, each potentially with different resource needs. To improve overall fleet utilization, it is better to run many small tasks than a small number of large tasks.

  - The Borg master can often bring down machines for maintenance, upgrades, etc. Well behaved tasks thus must learn to recover quickly from being evicted from one machine and brought up on a different machine. This, along with the absence of local storage, makes keeping large local state impractical, adding to the reasons for having a large number of smaller tasks.

  - A typical Borg cluster will have thousands of inexpensive machines, often of varying hardware configuration, each with a potentially different mix of tasks with imperfect isolation. The performance of a task, thus, can be unpredictable. This, combined with the factors above, means that any distributed