Procella: Unifying Serving and Analytical Data at YouTube


Biswapesh Chattopadhyay, Priyam Dutta, Weiran Liu, Ott Tinn, Andrew Mccormick, Aniket Mokashi, Paul Harvey, Hector Gonzalez, David Lomax, Sagar Mittal, Roee Ebenstein, Nikita Mikhaylin, Hung-ching Lee, Xiaoyan Zhao, Tony Xu, Luis Perez, Farhad Shahmohammadi, Tran Bui, Neil McKay, Selcuk Aya, Vera Lychagina, Brett Elliott
Google LLC
[email protected]

ABSTRACT

Large organizations like YouTube are dealing with exploding data volume and increasing demand for data-driven applications. Broadly, these can be categorized as: reporting and dashboarding, embedded statistics in pages, time-series monitoring, and ad-hoc analysis. Typically, organizations build specialized infrastructure for each of these use cases. This, however, creates silos of data and processing, and results in a complex, expensive, and harder-to-maintain infrastructure.

At YouTube, we solved this problem by building a new SQL query engine, Procella. Procella implements a superset of the capabilities required to address all four of the use cases above, with high scale and performance, in a single product. Today, Procella serves hundreds of billions of queries per day across all four workloads at YouTube and several other Google product areas.

PVLDB Reference Format:
Biswapesh Chattopadhyay, Priyam Dutta, Weiran Liu, Ott Tinn, Andrew Mccormick, Aniket Mokashi, Paul Harvey, Hector Gonzalez, David Lomax, Sagar Mittal, Roee Ebenstein, Hung-ching Lee, Xiaoyan Zhao, Tony Xu, Luis Perez, Farhad Shahmohammadi, Tran Bui, Neil McKay, Nikita Mikhaylin, Selcuk Aya, Vera Lychagina, Brett Elliott. Procella: Unifying serving and analytical data at YouTube. PVLDB, 12(12): 2022-2034, 2019.
DOI: https://doi.org/10.14778/3352063.3352121

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 12, No. 12, ISSN 2150-8097.

1. INTRODUCTION

YouTube, as one of the world's most popular websites, generates trillions of new data items per day, based on billions of videos, hundreds of millions of creators and fans, billions of views, and billions of hours of watch time. This data is used for generating reports for content creators, for monitoring the health of our services, for serving embedded statistics such as video view counts on YouTube pages, and for ad-hoc analysis. These workloads have different sets of requirements (representative query shapes are sketched after this list):

• Reporting and dashboarding: Video creators, content owners, and various internal stakeholders at YouTube need access to detailed real-time dashboards to understand how their videos and channels are performing. This requires an engine that supports executing tens of thousands of canned queries per second with low latency (tens of milliseconds), where queries may use filters, aggregations, set operations, and joins. The unique challenge here is that while our data volume is high (each data source often contains hundreds of billions of new rows per day), we require near real-time response times and access to fresh data.

• Embedded statistics: YouTube exposes many real-time statistics to users, such as likes or views of a video, resulting in simple but very high-cardinality queries. These values are constantly changing, so the system must support millions of real-time updates concurrently with millions of low-latency queries per second.

• Monitoring: Monitoring workloads share many properties with the dashboarding workload, such as relatively simple canned queries and the need for fresh data. The query volume is often lower, since monitoring is typically used internally by engineers. However, there is a need for additional data management functions, such as automatic downsampling and expiry of old data, and for additional query features (for example, efficient approximation functions and additional time-series functions).

• Ad-hoc analysis: Various YouTube teams (data scientists, business analysts, product managers, engineers) need to perform complex ad-hoc analysis to understand usage trends and to determine how to improve the product. This requires a low volume of queries (at most tens per second) and moderate latency (seconds to minutes) for complex queries (multiple levels of aggregation, set operations, analytic functions, joins, unpredictable patterns, manipulation of nested / repeated data, etc.) over enormous volumes of data (trillions of rows). Query patterns are highly unpredictable, although some standard data modeling techniques, such as star and snowflake schemas, can be used.
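To make the first two workload classes concrete, the sketch below shows the general shape of a canned dashboarding query and an embedded-statistics lookup. The table and column names (video_stats_fact, date_dim, video_counters, and so on), the @name parameter placeholders, and the schemas are illustrative assumptions, not taken from the paper; the syntax is generic SQL rather than a documented Procella dialect.

```sql
-- Canned dashboarding query: filters, a star-schema style join, and aggregation
-- over fresh data, issued tens of thousands of times per second with different
-- parameter bindings (shown here as @name placeholders).
SELECT d.calendar_date,
       SUM(f.views)           AS views,
       SUM(f.watch_time_secs) AS watch_time_secs
FROM   video_stats_fact AS f
JOIN   date_dim         AS d ON f.date_key = d.date_key
WHERE  f.channel_id = @channel_id
  AND  d.calendar_date BETWEEN @start_date AND @end_date
GROUP BY d.calendar_date
ORDER BY d.calendar_date;

-- Embedded-statistics query: a trivial but very high-cardinality point lookup,
-- served at millions of QPS against counters that receive realtime updates.
SELECT views, likes
FROM   video_counters
WHERE  video_id = @video_id;
```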
Historically, YouTube (and other similar products at Google and elsewhere) have relied on different storage and query backends for each of these needs: Dremel [33] for ad-hoc analysis and internal dashboards, Mesa [27] and Bigtable [11] for external-facing high-volume dashboards, Monarch [32] for monitoring site health, and Vitess [25] for embedded statistics in pages. But with exponential data growth and product enhancements came the need for more features, better performance, larger scale, and improved efficiency, making it increasingly challenging to meet these needs with the existing infrastructures. Some specific problems we were facing were:

• Data needed to be loaded into multiple systems using different Extract, Transform, and Load (ETL) processes, leading to significant additional resource consumption, data quality issues, data inconsistencies, slower loading times, high development and maintenance cost, and slower time-to-market.

• Since each internal system uses different languages and APIs, migrating data across these systems to enable the use of existing tools resulted in reduced usability and high learning costs. In particular, since many of these systems do not support full SQL, some applications could not be built on certain backends, leading to data duplication and accessibility issues across the organization.

• Several of the underlying components had performance, scalability, and efficiency issues when dealing with data at YouTube scale.

To solve these problems, we built Procella, a new distributed query engine. Procella implements a superset of the features required for the diverse workloads described above:

• Rich API: Procella supports an almost complete implementation of standard SQL, including complex multi-stage joins, analytic functions, and set operations, with several useful extensions such as approximate aggregations, handling of complex nested and repeated schemas, user-defined functions, and more (a sketch query follows this list).

• High Scalability: Procella separates compute (running on Borg [42]) from storage (on Colossus [24]), enabling high scalability (thousands of servers, hundreds of petabytes of data) in a cost-efficient way.

• High Performance: Procella uses state-of-the-art query execution techniques to enable efficient execution of a high volume of queries (millions of QPS) with very low latency (milliseconds).

• Data Freshness: Procella supports high-volume, low-latency data ingestion in both batch and streaming modes, the ability to work directly on existing data, and native support for the lambda architecture [35] (a plain-SQL sketch of this pattern appears at the end of Section 2.1).
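As a rough illustration of the Rich API extensions listed above, the hypothetical query below combines an approximate aggregation with nested / repeated data. The watch_events table and its repeated segments field are assumptions made for illustration; APPROX_COUNT_DISTINCT and UNNEST are written in the general style of Google SQL dialects, not as a documented Procella-specific API.

```sql
-- Ad-hoc style query combining an approximate aggregate with nested / repeated data.
SELECT e.country,
       APPROX_COUNT_DISTINCT(e.viewer_id) AS approx_unique_viewers,
       SUM(s.watch_time_secs)             AS total_watch_time_secs
FROM   watch_events AS e,
       UNNEST(e.segments) AS s            -- flatten a repeated (array-valued) field
WHERE  e.event_date >= '2019-01-01'
GROUP BY e.country
ORDER BY approx_unique_viewers DESC
LIMIT 10;
```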
2. ARCHITECTURE

Figure 1: Procella System Architecture. (The figure shows the Root Server (RS), Metadata Server (MDS), Data Server (DS), Ingestion Server (IgS), Registration Server, Compaction Server, and metadata store, with query, batch, and realtime data flows over the distributed non-volatile file system, Colossus.)

2.1 Google Infrastructure

Procella is designed to run on Google infrastructure. Google has one of the world's most advanced distributed systems infrastructures, with several notable features that shape Procella's design:

• Disaggregated storage: All durable data is stored on Colossus, the distributed non-volatile file system; there is no durable local storage. Colossus differs from local storage in several important ways:

  - Data is immutable. A file can be opened and appended to, but cannot be modified. Further, once finalized, the file cannot be altered at all.

  - Common metadata operations such as listing and opening files can have significantly higher latency (especially at the tail) than on local file systems, because they involve one or more RPCs to the Colossus metadata servers.

  - All durable data is remote; thus, any read or write operation can only be performed via RPC. As with metadata, this results in higher cost and latency when performing many small operations, especially at the tail.

• Shared compute: All servers must run on Borg, our distributed job scheduling and container infrastructure. This has several important implications:

  - Vertical scaling is challenging. Each server runs many tasks, each potentially with different resource needs. To improve overall fleet utilization, it is better to run many small tasks than a small number of large tasks.

  - The Borg master can often bring down machines for maintenance, upgrades, etc. Well-behaved tasks must therefore recover quickly from being evicted from one machine and brought up on a different one. This, along with the absence of local storage, makes keeping large local state impractical, adding to the reasons for having a large number of smaller tasks.

  - A typical Borg cluster will have thousands of inexpensive machines, often of varying hardware configurations, each with a potentially different mix of tasks with imperfect isolation. The performance of a task can thus be unpredictable. This, combined with the factors above, means that any distributed system running on Borg must be able to tolerate this kind of variability.
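Both the batch and realtime data flows in Figure 1 and the lambda-architecture support mentioned in Section 1 revolve around the same idea: recent data arrives through a streaming path while older data is compacted into batch files. Purely as an illustration of that pattern, and not of Procella's internal mechanism, the sketch below unions a hypothetical realtime table with a hypothetical batch table; all table names and the cutover predicate are assumptions.

```sql
-- Lambda-style read: fresh rows from a realtime (streaming-ingested) table,
-- historical rows from a compacted batch table, unified behind one query.
SELECT video_id,
       SUM(views) AS views
FROM (
  SELECT video_id, views, event_date
  FROM   video_stats_batch                 -- finalized, compacted data
  WHERE  event_date <  @cutover_date
  UNION ALL
  SELECT video_id, views, event_date
  FROM   video_stats_realtime              -- freshly ingested data
  WHERE  event_date >= @cutover_date
) AS combined
WHERE  video_id = @video_id
GROUP BY video_id;
```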