VStore: A Data Store for Analytics on Large Videos

Tiantu Xu (Purdue ECE), Luis Materon Botelho (Purdue ECE), Felix Xiaozhu Lin (Purdue ECE)

Abstract
We present VStore, a data store for supporting fast, resource-efficient analytics over large archival videos. VStore manages video ingestion, storage, retrieval, and consumption. It controls video formats along the video data path. It is challenged by i) the huge combinatorial space of video format knobs; ii) the complex impacts of these knobs and their high profiling cost; and iii) optimizing for multiple resource types. It explores an idea called backward derivation of configuration: in the opposite direction along the video data path, VStore passes the video quantity and quality expected by analytics backward to retrieval, to storage, and to ingestion. In this process, VStore derives an optimal set of video formats, optimizing for different resources in a progressive manner.

VStore automatically derives large, complex configurations consisting of more than one hundred knobs over tens of video formats. In response to queries, VStore selects video formats catering to the executed operators and the target accuracy. It streams video data from disks through the decoder to operators. It runs queries as fast as 362x of video realtime.

[Figure 1. The VStore architecture, showing the video data path (ingestion -> storage -> retrieval -> consumption) and the backward derivation of configuration. Each stage draws on a distinct resource: transcoder bandwidth, disk space, decoder/disk bandwidth, and CPU/GPU cycles.]

CCS Concepts  • Information systems -> Data analytics; • Computing methodologies -> Computer vision tasks; Object recognition.

Keywords  Video Analytics, Data Store, Deep Neural Networks

1 Introduction
Pervasive cameras produce videos at an unprecedented rate. Over the past 10 years, the annual shipments of surveillance cameras have grown by 10x, to 130M per year [28]. Many campuses are reported to run more than 200 cameras 24x7 [57]. In such a deployment, a single camera produces as much as 24 GB of encoded video footage per day (720p at 30 fps).

Retrospective video analytics  To generate insights from enormous video data, retrospective analytics is vital: video streams are captured and stored on disks for a user-defined lifespan; users run queries over the stored videos on demand. Retrospective analytics offers several key advantages that live analytics lacks. i) Analyzing many video streams in real time is expensive, e.g., running deep neural networks over live videos from a $200 camera may require a $4000 GPU [34]. ii) Query types may only become known after the video capture [33]. iii) At query time, users may interactively revise their query types or parameters [19, 33], which may not be foreseen at ingestion time. iv) In many applications, only a small fraction of the video will eventually be queried [27], making live analytics an overkill.

A video query (e.g., "what are the license plate numbers of all blue cars in the last week?") is typically executed as a cascade of operators [32-34, 60, 67]. Given a query, a query engine assembles a cascade and runs the operators. Query engines typically expose to users the trade-offs between operator accuracy and resource costs, allowing users to obtain inaccurate results with a shorter wait. Users can thus explore large videos interactively [33, 34]. Recent query engines show promise of high speed, e.g., consuming one day of video in several minutes [34].

Need for a video store  While recent query engines assume all input data is present in memory as raw frames, there lacks a video store that manages large videos for analytics. The store should orchestrate four major stages on the video data path: ingestion, storage, retrieval, and consumption, as shown in Figure 1. The four stages demand multiple hardware resources, including encoder/decoder bandwidth, disk space, and CPU/GPU cycles for query execution. The resource demands are high owing to the large video data, and demands for different resource types may conflict. Towards optimizing these stages for resource efficiency, classic video databases are inadequate [35]: they were designed for human consumers watching videos at 1x-2x the speed of video realtime; they are incapable of serving algorithmic consumers, i.e., operators, processing videos at more than 1000x video realtime. Shifting part of the query to ingestion [24] has important limitations and does not obviate the need for such a video store, as we will show in this paper.

Towards designing a video store, we advocate for taking a key opportunity: as video flows through its data path, the store should control video formats (fidelity and coding) through extensive video parameters called knobs. These knobs have significant impacts on resource costs and analytics accuracy, opening a rich space of trade-offs.

We present VStore, a system managing large videos for retrospective analytics. The primary feature of VStore is its automatic configuration of video formats. As video streams arrive, VStore saves multiple video versions and judiciously sets their storage formats; in response to queries, VStore retrieves stored video versions and converts them into consumption formats catering to the executed operators. Through configuring video formats, VStore ensures that operators meet their desired accuracies at high speed, prevents video retrieval from bottlenecking consumption, and keeps resource consumption within budgets.

To decide video formats, VStore is challenged by i) an enormous combinatorial space of video knobs; ii) complex impacts of these knobs and high profiling costs; and iii) optimizing for multiple resource types. These challenges were unaddressed: while classic video databases may save video contents in multiple formats, their format choices are oblivious to analytics and often ad hoc [35]; while existing query engines recognize the significance of video formats [32, 33, 67] and optimize them for query execution, they omit video coding, storage, and retrieval, which are all crucial to retrospective analytics.

To address these challenges, the key idea behind VStore is backward derivation, shown in Figure 1. In the opposite direction of the video data path, VStore passes the desired data quantity and quality from algorithmic consumers backward to retrieval, to storage, and to ingestion. In this process, VStore optimizes for different resources in a progressive manner; it elastically trades off among them to respect resource budgets. More specifically, i) from operators and their desired accuracies, VStore derives video formats for fastest data consumption, for which it effectively searches a high-dimensional parameter space with video-specific heuristics; ii) from the consumption formats, VStore derives video formats for storage, for which it systematically coalesces video formats to optimize for ingestion and storage costs; iii) from the storage formats, VStore derives a data erosion plan, which gradually deletes aging video data, trading off analytics speed for lower storage cost.

Through evaluation with two real-world queries over six video datasets, we demonstrate that VStore is capable of deriving large, complex configurations with hundreds of knobs over tens of video formats, which are infeasible for humans to tune. Following the configuration, VStore stores multiple formats for each video footage. To serve queries, it streams video data (encoded or raw) from disks through the decoder to operators, running queries as fast as 362x of video realtime. As users lower the target query accuracy, VStore elastically scales down the costs by switching operators to cheaper video formats, accelerating the query by two orders of magnitude. This query speed is 150x higher than that of systems lacking automatic configuration of video formats. VStore reduces the total configuration overhead by 5x.

Contributions  We have made the following contributions.
• We make a case for a new video store serving retrospective analytics over large videos. We formulate the design problem and experimentally explore the design space.
• To design such a video store, we identify the configuration of video formats as the central concern. We present a novel approach called backward derivation. With this approach, we contribute new techniques for searching large spaces of video knobs, for coalescing stored video formats, and for eroding aging video data.
• We report VStore, a concrete implementation of our design. Our evaluation shows promising results. To our knowledge, VStore is the first holistic system that manages the full video lifecycle optimized for retrospective analytics.

2 Motivations
2.1 Retrospective Video Analytics
Query & operators  A video query is typically executed as a cascade of operators. As shown in Figure 2, early operators in a cascade scan most of the queried video timespan at low cost; they activate late operators over a small fraction of the video for deeper analysis. Operators consume raw video frames. Within a cascade, the execution costs of operators can differ by three orders of magnitude [34]; operators also prefer different input video formats, catering to their internal algorithms.

[Figure 2. Video queries as operator cascades [34, 46]. (a) Car detector: Diff filters out similar frames; S-NN rapidly detects part of the cars; NN analyzes the remaining frames. (b) Vehicle license plate recognition: Motion filters frames with little motion; License spots plate regions for OCR to recognize characters.]
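To make the cascade structure concrete, the following is a minimal sketch in the spirit of Figure 2(a). The operator callables (diff, s_nn, nn) and the 0.1 threshold are illustrative placeholders, not VStore's or the ported engines' actual interfaces.

```python
# Sketch of an operator cascade: cheap operators screen every frame, and the
# expensive NN runs only on the residual frames the earlier stages cannot decide.
def run_cascade(frames, diff, s_nn, nn):
    detections = []
    prev = None
    for f in frames:
        if prev is not None and diff(prev, f) < 0.1:   # near-duplicate frame: skip
            prev = f
            continue
        prev = f
        verdict = s_nn(f)          # cheap specialized NN: "yes" / "no" / "unsure"
        if verdict == "unsure":
            verdict = nn(f)        # full NN only on the small residual fraction
        if verdict == "yes":
            detections.append(f)
    return detections
```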

Accuracy/cost trade-offs in operators  An operator's output quality is characterized by accuracy, i.e., how close the output is to the ground truth. We use a popular accuracy metric, the F1 score: the harmonic mean of precision and recall, F1 = 2·precision·recall/(precision + recall) [32]. At runtime, an operator's target accuracy is set in queries [32, 33, 67]. VStore seeks to provision minimum resources for operators to achieve the target accuracy.

Table 1. Knobs and their values considered in this work. Total: 7 knobs and 15K possible combinations of values. Note: no video quality or coding knobs for RAW.
  Fidelity knobs — Image quality: worst, bad, good, best (*); Crop factor: 50%, 75%, 100%; Resolution: 60x60 … 720p (10 total); Frame sampling: 1/30, 1/5, 1/2, 2/3, 1.
  Coding knobs — Speed step: slowest, slow, med, fast, fastest (**); Keyframe interval: 5, 10, 50, 100, 250; Bypass: Y or N (Y = raw).
  Equivalent FFmpeg options: (*) CRF = 50, 40, 23, 0; (**) preset = veryslow, medium, veryfast, superfast, ultrafast.

2.2 System model
We consider a video store running on one or a few commodity servers. Incoming video data flows through the following major system components. We assume a pre-defined library of operators, the number of which can be substantial; each operator may run at a pre-defined set of accuracy levels. By combining the existing operators at different accuracy levels, a variety of queries can be assembled. We discuss how operator addition/deletion may be handled in Section 7.
• Ingestion: Video streams continuously arrive. In this work, we consider the input rate of incoming video as given. The ingestion optionally converts the video formats, e.g., by resizing frames. It saves the ingested videos either as encoded videos (through transcoding) or as raw frames. The ingestion throughput is bound by transcoding bandwidth, typically one order of magnitude lower than disk bandwidth. This paper will present further experimental results on ingestion.
• Storage: Like other time-series data stores [6], videos have age-based values. A store typically holds video footage for a user-defined lifespan [54]. In queries, users often show higher interest in more recent videos.
• Retrieval: In response to operator execution, the store retrieves video data from disks, optionally converts the data format for the operators, and supplies the resultant frames. If the on-disk videos are encoded, the store must decode them before supplying. Data retrieval may be bound by decoding or disk read speed. Since decoding throughput (often tens of MB/sec) is far below disk throughput (at least hundreds of MB/sec), the disk becomes the bottleneck only when loading raw frames.
• Consumption: The store supplies video data to consumers, i.e., operators spending GPU/CPU cycles to consume the data.

Figure 1 summarizes the resource costs of the components above. The retrieval/consumption costs are reciprocal to the data retrieval/consumption speeds, respectively. An operator runs at the speed of retrieval or consumption, whichever is lower. To quantify operator speed, we adopt as the metric the ratio between video duration and video processing delay. For instance, if a 1-second video is processed in 1 ms, the speed is 1000x realtime.

Key opportunity: controlling video formats  As video data flows through, a video store is at liberty to control the video formats, as shown in Figure 1. At ingestion, the system decides fidelity and coding for each stored video version; at data retrieval, the system decides the fidelity of each raw frame sequence supplied to consumers.

Running operators at ingestion is not a panacea  Recent work runs early-stage operators at ingestion to save executions of expensive operators at query time [24]. This approach has important limitations.
• It bakes query types into the ingestion. Video queries and operators are increasingly rich [15, 21, 42, 56, 64]; one operator (e.g., a neural network) may be instantiated with different parameters depending on training data [18]. Running all possible early operators at ingestion is therefore expensive.
• It bakes specific accuracy/cost trade-offs into the ingestion. Yet users at query time often know better trade-offs, based on domain knowledge and interactive exploration [19, 33].
• It prepays computation cost for all ingested videos. In many scenarios such as surveillance, only a small fraction of ingested video is eventually queried [18, 57]. As a result, most operator execution at ingestion is in vain.
In comparison, by preparing data for queries, a video store supports richer query types, incurs lower ingestion cost, and allows flexible query-time trade-offs. Section 7 provides further discussion.

2.3 Video Format Knobs
The video format is controlled by a set of parameters, or knobs. Table 1 summarizes the knobs considered in this work, chosen for their high resource impacts.

Fidelity knobs  For video data, encoded or raw, fidelity knobs dictate i) the quantity of visual information, e.g., frame sampling, which decides the frame rate; and ii) the quality of visual information, which is subject to loss due to video compression. Each fidelity knob has a finite set of possible values. A combination of knob values constitutes a fidelity option f. All possible fidelity options constitute a fidelity space F.

"Richer-than" order  Among all possible values of one fidelity knob, one may establish a richer-than order (e.g., 720p is richer than 180p). Among fidelity options, one may establish a partial order of richer-than: option X is richer than option Y if and only if X has the same or richer values on all knobs and richer values on at least one knob. The richer-than order does not exist between all pairs of fidelity options, e.g., between good-50%-720p-1/2 and bad-100%-540p-1. One can degrade fidelity X to obtain fidelity Y only if X is richer than Y.
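A minimal sketch of this partial order follows, assuming an illustrative tuple representation of a fidelity option (the field names and rank encodings are placeholders, not VStore's code).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fidelity:
    quality: int      # image-quality rank: 0 = worst .. 3 = best
    crop: float       # crop factor: 0.5, 0.75, 1.0
    res: int          # resolution rank: 0 (60x60) .. 9 (720p)
    sampling: float   # frame sampling rate: 1/30 .. 1

def richer_than(x: Fidelity, y: Fidelity) -> bool:
    """x is richer than y iff x >= y on every knob and > y on at least one.
    Some pairs are incomparable (neither is richer than the other)."""
    xs = (x.quality, x.crop, x.res, x.sampling)
    ys = (y.quality, y.crop, y.res, y.sampling)
    ge = all(a >= b for a, b in zip(xs, ys))
    gt = any(a > b for a, b in zip(xs, ys))
    return ge and gt

# e.g., good-50%-720p-1/2 and bad-100%-540p-1 are incomparable:
a = Fidelity(quality=2, crop=0.50, res=9, sampling=0.5)
b = Fidelity(quality=1, crop=1.00, res=7, sampling=1.0)
assert not richer_than(a, b) and not richer_than(b, a)
```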

[Figure 3. Impacts of coding knobs. Video: 100 seconds from tucson. See Section 6 for dataset and test hardware. (a) Speed step: coding can be sped up at the expense of increased video size; encoding speed is mostly unaffected by the keyframe interval. (b) Keyframe interval: smaller intervals allow the decoder to skip more frames, if operators sample only a fraction of decoded frames.]

[Figure 4. Fidelity knobs have high, complex impacts on the costs of multiple components (ingestion, storage, retrieval, consumption; normalized on each axis) and on operator accuracy (annotated in the legends). Each plot varies one knob with all others fixed: (a) crop factor (op: motion detector); (b) image quality (op: license plate detector); (c) frame sampling (op: specialized NN); (d) frame sampling (op: NN). See Section 6 for methodology.]

Coding knobs  Coding reduces raw video size by up to two orders of magnitude [63]. Coding knobs control encoding/decoding speed and the encoded video size. Orthogonal to video fidelity, coding knobs provide valuable trade-offs among the costs of ingestion, storage, and retrieval. These trade-offs do not affect consumer behavior — an operator's accuracy and consumption cost.

While a modern encoder may expose tens of coding knobs (e.g., around 50 for x264), we pick three for their high impacts and ease of interpretation. Table 1 summarizes these knobs and Figure 3 shows their impacts. Speed step accelerates encoding/decoding at the expense of increased video size. As shown in Figure 3(a), it can lead to up to a 40x difference in encoding speed and up to a 2.5x difference in storage space. Keyframe interval: An encoded video stream is a sequence of chunks (also called "groups of pictures" [20]): beginning with a key frame, a chunk is the smallest data unit that can be decoded independently. The keyframe interval offers an opportunity to accelerate decoding when consumers sample only a fraction of frames. If the frame sampling interval N is larger than the keyframe interval M, the decoder can skip N/M chunks between two adjacent sampled frames without decoding them. In the example in Figure 3(b), smaller keyframe intervals increase decoding speed by up to 6x at the expense of larger encoded videos. Coding bypass: The ingestion may save incoming videos as raw frames on disks. The resultant, extremely low retrieval cost is desirable to some fast consumers (see Section 3).

A combination of coding knob values is a coding option c. All possible coding options constitute a coding space C.
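A back-of-envelope sketch of the keyframe-interval trade-off just described is below; the model is a deliberate simplification (it ignores partial chunks and decoder seek cost), not a measurement from VStore.

```python
def decoded_frames_per_sample(sample_interval_n: int, keyframe_interval_m: int) -> int:
    """Rough count of frames the decoder must decode to yield one sampled frame."""
    if sample_interval_n <= keyframe_interval_m:
        # the sampled frames fall within the same chunk region: decode everything
        return sample_interval_n
    # otherwise decode only the chunk holding the sampled frame (~M frames),
    # skipping roughly N/M intervening chunks entirely
    return keyframe_interval_m

# e.g., sampling 1 frame in 250 with keyframe interval 10 decodes ~10 frames
# per sample instead of 250 -- roughly a 25x reduction in decoding work.
print(decoded_frames_per_sample(250, 10))   # -> 10
print(decoded_frames_per_sample(250, 250))  # -> 250
```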
2.4 Knob impacts
As illustrated in Figure 1, for on-disk videos, fidelity and coding knobs jointly impact the costs of ingestion, storage, and retrieval; for in-memory videos to be consumed by operators, fidelity knobs impact the consumption cost and the consuming operator's accuracy. We make a few observations.

Fidelity knobs enable rich cost/accuracy trade-offs. As shown in Figure 4, one may reduce resource costs by up to 50% with minor (5%) accuracy loss. The knobs enable rich trade-offs among resource types. This is exemplified in Figure 5: although three fidelity options all lead to similar operator accuracy (0.8), there is no single most resource-efficient one; e.g., fidelity B incurs the lowest consumption cost but a high storage cost due to its high image quality. Each knob has significant impacts. Take Figure 4(b) as an example: one step change to image quality reduces accuracy from 0.95 to 0.85, the storage cost by 5x, and the ingestion cost by 40%. Omitting knobs misses valuable trade-offs. For instance, to achieve a high accuracy of 0.9, the license detector would incur 60% more consumption cost when the image quality of its input video changes from "good" to "bad". This is because the operator must consume a higher quantity of data to compensate for the lower quality. Yet storing all videos with "good" quality requires 5x the storage space. Unfortunately, most prior video analytics systems fix the image quality knob at its default value.

The quantitative impacts are complex. i) The knob/cost relations are difficult to capture in analytical models [67]. ii) The quantitative relations vary across operators and across video contents [32]. This is exemplified by Figures 4(c) and (d), which show the same knob's different impacts on two operators. iii) One knob's impact depends on the values of other knobs. Take the license detector as an example: as image quality worsens, the operator's accuracy becomes more sensitive to resolution changes. With "good" image quality, lowering the resolution from 720p to 540p slightly reduces the accuracy, from 0.83 to 0.81; with "bad" image quality, the same resolution reduction significantly reduces the accuracy, from 0.76 to 0.52. While prior work assumes that certain knobs have independent impacts on accuracy [32], our observation shows that such dependencies exist among a larger set of knobs.

[Figure 5. Disparate costs of fidelity options A–C ([quality-crop-resolution-sampling]: A = Bad-100%-100p-2/3, B = Best-100%-100p-1/30, C = Good-75%-100p-1/2), despite all leading to operator accuracy of roughly 0.8. Operator: License. Costs normalized on each axis. See Section 6 for methodology.]

[Figure 6. Video retrieval can bottleneck consumption, as shown by decoding speed vs. consumption speed for two operators. (a) Operator: License. Consumption can be faster than decoding if the on-disk video is stored with the richest fidelity as ingested; yet consumption is still slower than decoding video of the same fidelity. (b) Operator: Motion. Consumption is faster than decoding even if the on-disk video has the same fidelity as consumed. Operator accuracies are annotated on top. See Section 6 for test hardware.]

Summary & discussion  Controlling video formats is central to a video store design. The store should actively manage fidelity and coding throughout the video data path. To characterize knob impacts, the store needs regular profiling. Some video analytics systems recognize the significance of video formats [32, 33, 67]. However, they focus on optimizing query execution while omitting other resources, such as storage, which is critical to retrospective analytics. They are mostly limited to two fidelity knobs (resolution and sampling rate) while omitting others, especially coding. As we will show, a synergy between fidelity and coding knobs is vital.

3 A case for a new video store
We set out to design a video store that automatically creates and manages video formats in order to satisfy algorithmic video consumers with high resource efficiency.

3.1 The Configuration Problem
The store must determine a global set of video formats as follows. Storage format: the system may save one ingested stream in multiple versions, each characterized by a fidelity option f and a coding option c. We refer to SF<f,c> as a storage format. Consumption format: the system supplies raw frame sequences to different operators running at a variety of accuracy levels, i.e., consumers. The format of each raw frame sequence is characterized by a fidelity option f. We refer to CF<f> as a consumption format. We refer to the global set of video formats as the store's configuration of video formats.

Configuration requirements  These formats should jointly meet the following requirements:
R1. Satisfiable fidelity  To supply frames in a consumption format CF<f>, the system must retrieve video in a storage format SF<f',c>, where f' is richer than or the same as f.
R2. Adequate retrieving speed  Video retrieval should not slow down frame consumption. Figure 6 shows two cases where the slowdown happens. a) For fast operators sparsely sampling video data, decoding may not be fast enough if the on-disk video is in the original format as ingested (e.g., 720p at 30 fps from a surveillance camera). These consumers benefit from storage formats that are cheaper to decode, e.g., with reduced fidelity. b) For some operators quickly scanning frames for simple visual features, even the storage format that is cheapest to decode (i.e., f' is the same as f, with the cheapest coding option) is too slow. These consumers benefit from retrieving raw frames from disks.
R3. Consolidating storage formats  Each stored video version incurs ingestion and storage costs. The system should exploit a key opportunity: creating one storage format that supplies data to multiple consumers, as long as satisfiable fidelity and adequate retrieving speed are ensured.
R4. Operating under resource budgets  The store should keep the space cost of all videos under the available disk space, and the ingestion cost for creating all video versions under the system's transcoding bandwidth.
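A minimal sketch of checking R1 and R2 for one candidate storage format against its would-be consumers is shown below. Fidelity options are treated as tuples of comparable knob values and speeds as multiples of video realtime; the helper names are illustrative, not VStore's interfaces.

```python
def fidelity_at_least(sf_fid, cf_fid):
    """The stored fidelity matches or exceeds the consumed fidelity on every knob."""
    return all(a >= b for a, b in zip(sf_fid, cf_fid))

def satisfies(sf_fid, sf_retrieval_speed, consumers):
    """consumers: iterable of (cf_fid, consumption_speed) pairs served by this SF."""
    return all(
        fidelity_at_least(sf_fid, cf_fid)            # R1: satisfiable fidelity
        and sf_retrieval_speed >= consumption_speed  # R2: retrieval does not bottleneck
        for cf_fid, consumption_speed in consumers)
```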
3.2 Inadequacy of existing video stores
Computer vision research typically assumes all input data is present in memory as raw frames, which does not hold for retrospective analytics over large videos: a server with 100 GB of DRAM holds no more than two hours of raw frames, even in low fidelity (e.g., 360p at 30 fps).

Most video stores choose video formats in ad hoc manners without optimizing for analytics [3]. On one extreme, many save videos in one unified format (e.g., the richest fidelity expected by all operators). This minimizes storage and ingestion costs while incurring high retrieval cost; as a result, data retrieval may bottleneck operators. On the other extreme, one may incarnate all storage formats with fidelity exactly matching consumer expectations. This misses the opportunity to consolidate storage formats and leads to excessive storage and ingestion costs. We evaluate these two alternatives in Section 6.

Layered encoding cannot simplify the problem  Layered encoding promises space efficiency: it stores one video's multiple fidelity options as complementary layers [59]. However, layered encoding has important caveats. i) Each additional layer has non-trivial storage overhead (sometimes 40%–100%) [37], which may waste storage space compared to consolidated storage formats. ii) Decoding is complex and slow, due to the combination of layers and random disk access when reading the layers. iii) Invented two decades ago, its adoption and coding performance are yet to be seen. Even if it were eventually adopted and proven desirable, it would make the configuration more complex.

Table 2. The library of operators in the current VStore.
  Diff: difference detector that detects frame differences [34]
  S-NN: specialized NN to detect a specific object [34]
  NN: generic neural network, e.g., YOLO [53]
  Motion: motion detector using background subtraction [46]
  License: license plate detector [46]
  OCR: optical character recognition [46]
  Opflow: optical flow for tracking object movements [48]
  Color: detector for contents of a specific color [33]
  Contour: detector for contour boundaries [47]

4 The VStore Design
4.1 Overview
VStore runs on one or a few commodity servers. It depends on existing query executors, e.g., OpenALPR, and a pre-defined library of operators. From the executor, VStore expects an interface for executing individual operators for profiling, and a manifest specifying a set of accuracy levels for each operator. Table 2 lists the 9 operators supported by the current VStore prototype. VStore tracks the whole set of <operator, accuracy> tuples as consumers.

Operation  During operation, VStore periodically updates its video format configuration. For each ingested video stream, it periodically profiles operators and encoding/decoding, e.g., on a 10-second clip per hour. VStore splits and saves video footage in segments, which are 8-second video clips in our implementation; VStore retrieves or deletes each segment independently.

[Figure 7. VStore derives the configuration of video formats backward along the data path: 1) consumption formats from video consumers, 2) storage formats from consumption formats, 3) a data erosion plan from storage formats. Example consumption/retrieval speeds are shown.]

Challenges  The major challenges are in configuration. i) Exhaustive search is infeasible. A configuration consists of a set of consumption formats from the 4D space F and a set of storage formats from the 7D space F×C. In our prototype, the total number of possible global configurations is 2415150. ii) Exhaustive profiling is expensive, as discussed in Section 4.2. iii) Optimizing for multiple resource types further complicates the problem.

These challenges were unaddressed. Some video query engines seek to ease configuration and profiling (challenges i and ii), but they are limited to a few knobs [32, 67]. For the extensive set of knobs we consider, some of their assumptions, e.g., knob independence, do not hold. They also optimize for one resource type — GPU cycles for queries — without accounting for other critical resources, e.g., storage (challenge iii).

Mechanism overview — backward derivation  VStore derives the configuration backwards, in the direction opposite to the video data flow: from sinks, to retrieval, and to ingestion/storage. This is shown in Figure 7, steps 1–3. In this backward derivation, VStore optimizes for different resources in a progressive manner.
1) Section 4.2: From all given consumers, VStore derives video consumption formats. Each consumer consumes, i.e., subscribes to, a specific consumption format. In this step, VStore optimizes data consumption speed.
2) Section 4.3: From the consumption formats, VStore derives storage formats. Each consumption format subscribes to one storage format (along the reversed directions of the dashed arrows in Figure 7). The chosen storage formats ensure i) satisfiable fidelity: a storage format SF has fidelity richer than any of its downstream consumption formats (CFs); and ii) adequate retrieval speed: the retrieval speed of an SF exceeds the speed of any downstream consumer. In this step, VStore optimizes for storage cost and keeps the ingestion cost under budget.
3) Section 4.4: From all the derived storage formats, VStore derives a data erosion plan, gradually deleting aging video. In this step, VStore reduces the storage cost to stay under budget.

Resolution the dashed arrows in Figure 7). In this step, VStore optimizes Crop for storage cost and keeps ingestion cost under budget. 60x60 100x100 200x200 400x400 600x600 factor 3 Section 4.4: From all the derived storage formats, VStore 0.37 0.52 0.58 0.63 0.73 1/30 (11k x) (7k x) (5k x) (2k x) (300x) derives a data erosion plan, gradually deleting aging video. Rate 0.50 0.54 0.65 0.74 0.85 Each cell: In this step, VStore reduces storage cost to be under budget. 1/6 (2k x) (1k x) (900x) (468x) (60x) Accuracy Consumption speed Limitations i) VStore treats individual consumers as inde- 0.57 0.58 0.80 0.87 0.95 unit: x realtime 1/2 pendent without considering their dependencies in query (760x) (452x) (300x) (156x) (20x) Dark shade: profiled cascades. If consumer A always precedes B in all possible 0.57 0.59 0.81 0.97 0.98 2/3 (253x) (339x) (225x) (117x) (15x) Light shade: profiled; cascades, the speed of A and B should be considered in con- accuracy too low Frame Sampling 0.59 0.60 0.85 0.99 1 junction. This requires VStore to model all possible cascades, 1 (380x) (226x) (150x) (78x) (10x) Unshaded: profiling avoided which we consider as future work. ii) VStore does not man- Numbers for illustration only age algorithmic knobs internal to operators [32, 67]; doing so would allow new, useful trade-offs for consumption but Figure 8. Search in a set of 2D spaces for a fidelity option ≥ not for ingestion, storage, or retrieval. with accuracy 0.8 and max consumption speed (i.e., min consumption cost). 4.2 Configuring consumption formats Objective For each consumer ⟨op, accuracy⟩, the system decides a consumption format ⟨f0⟩ for the frames supplied spaces under search, VStore partitions along the shortest to op. By consuming the frames, op should achieve the target dimension, chosen as the crop factor which often has few ′ accuracy while consuming data at the highest speed, i.e., possible values (3 in our implementation). iv) The fidelity f0 with a minimum consumption cost. found from the 3D space already leads to adequate accuracy The primary overhead comes from operator profiling. Re- with the lowest consumption cost. While lowering the image ′ call the relation f → ⟨consumptioncost, accuracy⟩ has to be quality of f0 does not reduce the consumption cost, VStore profiled per operator regularly. For each profiling, the store still keeps doing so until the resultant accuracy becomes the prepares sample frames in fidelity f, runs an operator over minimum adequacy. It then selects the knob values as f0. them, and measures the accuracy and consumption speed. This reduces other costs (e.g., storage) opportunistically. If the store profiles all the operators over all the fidelity op- Efficient exploration of a 2D space The kernel of the tions, the total number of required profiling runs, even for above algorithm is to search each 2D space (resolution × our small library of 9 operators is 2.7K. The total profiling sampling rate), as illustrated in Figure 8. In each 2D space, time will be long, as we will show in the evaluation. VStore looks for an accuracy boundary. As shown as shaded Key ideas VStore explores the fidelity space efficiently and cells in the figure, the accuracy boundary splits the space into only profiles a small subset of fidelity options. It works based two regions: all points on the left have inadequate accuracies, on two key observations. O1. Monotonic impacts Increase while all on the right have adequate accuracies. 
To identify in any fidelity knob leads to non-decreasing change incon- the boundary, VStore leverages the fact that accuracy is sumption cost and operator accuracy – richer fidelity will monotonic along each dimension (O1). As shown in Figure 8, neither reduce cost nor accuracy. This is exemplified in Fig- it starts from the top-right point and explores to the bottom ure 4 showing the impact of changes to individual knobs. and to the left. VStore only profiles the fidelity options onthe O2. Image quality does not impact consumption cost. boundary. It dismisses points on the left due to inadequate Unlike other fidelity knobs controlling data quantity, image accuracies. It dismisses any point X on the right because X quality often does not affect operator workload and thus the has fidelity richer than one boundary point Y; therefore, X consumption cost, as shown in Figure 4(b). incurs no less consumption cost than Y. We next sketch our algorithm deciding the consumption This exploration is inspired by a well known algorithm in format for the consumer ⟨op, accuracy-t⟩: the algorithm aims searching in a monotone 2D array [16]. However, our prob- ′ finding f0 that leads to accuracy higher than accuracy-t (i.e., lem is different: f0 has to offer both adequate accuracy and adequate accuracy) with the lowest consumption cost. lowest consumption cost. Therefore, VStore has to explore Partitioning the 4D space i) Given that image quality the entire accuracy boundary: its cannot stop at the point does not impact consumption cost (O2), VStore starts by where the minimum accuracy is found, which may not result temporarily fixing the image quality knob at its highest value. in the lowest consumption cost. ii) In the remaining 3D space (crop factor × resolution × Cost & further optimization Each consumer requires ′ profiling runs as many as O((N + N ) ∗ N + sampling rate), VStore searches for fidelity f0 that leads to sample res crop adequate accuracy and the lowest consumption cost. iii) As Nquality ), where Nx is the number of values for knob x. shown in Figure 8, VStore partitions the 3D space into a This is much lower than exhaustive search which requires set of 2D spaces for search. To minimize the number of 2D (Nsample Nres Ncrop Nquality ) runs. Furthermore, in profiling 7 EuroSys ’19, March 25–28, 2019, Dresden, Germany Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin

What if a higher-dimensional fidelity space?  The above algorithm searches the 4D space of the four fidelity knobs we consider. One may consider additional fidelity knobs (e.g., color channels). To search such a space, we expect partitioning along shorter dimensions to remain helpful; furthermore, the 2D-space exploration can be generalized to higher-dimensional spaces by retrofitting selection in a high-dimensional monotone array [16, 40].

4.3 Configuring storage formats
Objective  For the chosen consumption formats and their downstream consumers, VStore determines storage formats with satisfiable fidelity and adequate retrieval speed.

Enumeration is unaffordable  One may consider enumerating all possible ways to partition the set of consumption formats (CFs) and determining a common storage format for each subset of CFs. This enumeration is very expensive: the number of possible ways to partition a CF set is 4x10^6 for 12 CFs, and 4x10^17 for the 24 CFs in our implementation [10, 11].

[Figure 9. Iterative coalescing of storage formats: pairs of formats are merged until the ingestion cost falls under the budget.]

Algorithm sketch  VStore coalesces the set of storage formats iteratively. As shown on the right side of Figure 9, VStore starts from a full set of storage formats (SFs), each catering to a CF with identical fidelity. In addition, VStore creates a golden storage format SFg<fg,cg>: fg is the knob-wise maximum fidelity of all CFs; cg is the slowest coding option, incurring the lowest storage cost. The golden SF is vital to data erosion, discussed in Section 4.4. All these SFs participate in coalescing.

How to coalesce a pair?  VStore runs multiple rounds of pairwise coalescing. To coalesce SF0<f0,c0> and SF1<f1,c1> into SF2<f2,c2>, VStore picks f2 as the knob-wise maximum of f0 and f1 for satisfiable fidelity. Such coalescing impacts resource costs in three ways. i) It reduces the ingestion cost, as there are fewer video versions. ii) It may increase the retrieval cost, as SF2 with richer fidelity tends to be slower to decode than SF0/SF1; VStore therefore picks a cheaper coding option c2 for SF2, so that decoding SF2 is fast enough for all previous consumers of SF0/SF1. If even the cheapest coding option is not fast enough, VStore bypasses coding and stores raw frames for SF2. iii) The cheaper coding, in turn, may increase the storage cost.

How to select the coalescing pair?  Recall that the goal of coalescing is to bring the ingestion cost under the budget. We explore two alternative approaches.
• Distance-based selection. As this is seemingly a hierarchical clustering problem, one may coalesce formats based on their similarity, for which a common metric is Euclidean distance: normalize the values of each knob and coalesce the pair of formats with the shortest distance among all remaining pairs.
• Heuristic-based selection. We use the following heuristic: first harvest "free" coalescing opportunities, then coalesce at the expense of storage. Figure 9 illustrates this process. From the right to the left, VStore first picks pairs that can be coalesced to reduce ingestion cost without increasing storage cost. Once coalescing any remaining pair would increase storage cost, VStore checks whether the current total ingestion cost is under budget. If not, VStore picks cheaper coding options and continues to coalesce at the expense of increased storage cost, until the ingestion cost drops below the budget.

Overhead analysis  The primary overhead comes from profiling. Being simple, distance-based selection incurs lower overhead: in each round, it only profiles the ingestion cost of the coalesced SF. Given that VStore coalesces for at most N rounds (N being the number of CFs), the total number of profiling runs is min(O(N), |F x C|). By comparison, heuristic-based selection tests all possible pairs among the remaining SFs in each round; for each pair, VStore profiles a video sample with the would-be coalesced SF, measuring decoding speed and sample size. The total number of profiling runs is min(O(N^3), |F x C|). In our implementation, N is 24 and |F x C| is 15K. Fortunately, by memoizing previously profiled SFs within the same configuration process, VStore significantly reduces the profiling runs, as we evaluate in Section 6. Furthermore, Section 6 shows that heuristic-based selection produces a much more compact set of SFs.

4.4 Planning Age-based Data Erosion
Objective  In the previous steps, VStore plans multiple storage formats of the same content catering to a wide range of consumers. In this last step, VStore reduces the total space cost to below the system budget. Our insight is as follows: as video content ages, the system may slowly give up some of the formats, freeing space by relaxing the requirement for adequate retrieving speed (Section 3, R2) on aged data. We made the following choices. i) Gracefully degrade consumption speed: VStore controls the rate of decay in speed rather than in storage space, as operator speed is what users directly perceive. ii) Low aging cost: VStore avoids transcoding aged videos, which would compete with ingestion for the encoder; it hence creates no new storage formats for aging. iii) Never break fidelity satisfiability: VStore identifies some video versions as fallback data sources for others, ensuring that all consumers achieve their desired accuracies as long as the videos are still within their lifespan.

Data erosion plan  VStore plans erosion at the granularity of video ages. Recall that VStore saves video as segments on disks (each segment contains 8 seconds of video in our implementation). As shown in Figure 10, for each age (e.g., per day) and for each storage format, the plan dictates the percentage of deleted segments, which accumulates over ages.

[Figure 10. Data erosion decays operator speed and keeps storage cost under budget. Storage formats form a "richer-than" tree rooted at the golden format; segments (small cells) are deleted progressively as the video ages (e.g., day 1, day 7, day 10), and the target overall speed follows P(x) = (1 - P_min)·x^(-k) + P_min.]

How to identify fallback video formats?  VStore organizes all the storage formats of one configuration in a tree, where the edges capture richer-than relations between the storage formats, as shown in Figure 10. Consumers attempting to access deleted segments of a child format fall back to the parent format (or to higher-level ancestors). Since the parent format offers richer fidelity, the consumers are guaranteed to meet their accuracies; yet the parent's retrieval speed may be inadequate for them (e.g., due to costlier decoding), decaying the consumers' effective speed. If a consumer has to consume a fraction p of its segments from the parent format, on which its effective speed is only a fraction α of its original speed with no eroded data, the consumer's relative speed — the ratio between its decayed speed and its original speed — is α/((1 − p)α + p). VStore never erodes the golden format at the root node; with fidelity richer than any other format, the golden format serves as the ultimate fallback for all consumers.

How to quantify the overall speed decay?  Eroding one storage format may decay the speeds of multiple consumers to various degrees, necessitating a global metric that captures the overall consumer speed. Our rationale is for all consumers to experience the speed decay fairly. Following the principle of max-min fairness [12], we define the overall speed as the minimum relative speed over all consumers. By this definition, the overall speed P is also relative, in the range (0,1]: P is 1 when the video content is youngest and all versions are intact; it reaches the minimum P_min when all but the golden format are deleted.

How to set the overall speed target for each age?  We follow a power-law function, which gives a gentle decay rate and has been used for time-series data [6]. In the function P(x) = (1 − P_min)·x^(−k) + P_min, x is the video age. When x = 1 (the youngest video), P is 1 (the maximum overall speed); as x grows, P approaches P_min. Given a decay factor k (we show how to find its value below), VStore uses the function to set the target overall speed for each age in the video lifespan.

How to plan data erosion for each age?  For gentle speed decay, VStore always deletes from the storage format that would cause the minimum reduction in the overall speed. In the spirit of max-min fairness, VStore essentially spreads the speed decay evenly among consumers. VStore therefore plans erosion by resembling a fair scheduler [43]. For each video age, i) VStore identifies the consumer Q that currently has the lowest relative speed; ii) VStore examines all SFs in the "richer-than" tree, finding the one whose erosion has the least impact on the speed of Q; iii) VStore plans to delete a fraction of segments from the found format, such that another consumer R's relative speed drops below Q's. VStore repeats this process until the overall speed drops below the target of this age.

Putting it together  VStore generates an erosion plan by testing different values of the decay factor k. It finds the lowest k (the most gentle decay) that brings the total storage cost, accumulated over all video ages, under budget. For each tested k, VStore generates a tentative plan: it sets speed targets for each video age based on the power law, plans data erosion for each age, sums up the storage cost across ages, and checks whether the storage cost falls below the budget. As a higher k always leads to lower total storage cost, VStore uses binary search to quickly find a suitable k.
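The quantities above can be written directly from the stated formulas; a minimal sketch follows, with the helper names and the binary-search bounds being illustrative assumptions rather than VStore's code.

```python
def relative_speed(p: float, alpha: float) -> float:
    """Consumer speed relative to its un-eroded speed, when a fraction p of its
    segments must be served by a parent format at alpha of the original speed."""
    return alpha / ((1.0 - p) * alpha + p)

def overall_speed_target(age: float, k: float, p_min: float) -> float:
    """Power-law decay target P(x) = (1 - Pmin) * x^(-k) + Pmin, with P(1) = 1."""
    return (1.0 - p_min) * age ** (-k) + p_min

def find_decay_factor(storage_cost_for_k, disk_budget, k_max=16.0, iters=40):
    """Binary-search the smallest k (gentlest decay) whose accumulated storage
    cost fits the budget; assumes storage_cost_for_k(k) decreases with k and
    that the budget is attainable at k_max."""
    lo, hi = 0.0, k_max
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if storage_cost_for_k(mid) <= disk_budget:
            hi = mid      # fits: try a gentler decay
        else:
            lo = mid
    return hi
```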
5 Implementation
We built VStore in C++ and Python in 10K SLoC. Around its configuration engine, VStore orchestrates several major components. Coding and storage backend: VStore invokes FFmpeg, a popular software suite, for coding tasks. VStore's ingestion uses the libx264 software encoder, creating one FFmpeg instance to transcode each ingested stream; its retrieval invokes NVIDIA's NVDEC hardware decoder for efficiency. VStore uses LMDB, a key-value store [29], as its storage backend and stores 8-second video segments in it; we choose LMDB because it supports MB-size values well. Ported query engines: We ported two query engines to VStore, modifying both so that they retrieve data from VStore and provide interfaces for VStore's profiling. OpenALPR [46] recognizes vehicle license plates; its operators build on OpenCV and run on the CPU. To scale up, we create a scheduler that manages multiple OpenALPR contexts and dispatches video segments. NoScope [34] is a recent research engine; it combines operators that execute at various speeds and invoke deep NNs, using TensorFlow [5] as the NN framework, which runs on the GPU. Operator library: The two query engines provide the 6 operators shown in Figure 2. In particular, S-NN uses a very shallow AlexNet [38] produced by NoScope's model search, and NN uses YOLOv2 [53].

6 Evaluation
We answer the following questions in our evaluation:
§6.2 Does VStore provide good end-to-end results?
§6.3 Does VStore adapt configurations to resource budgets?
§6.4 Does VStore incur low overhead in configuration?

6.1 Methodology
Video datasets  We carried out our evaluation on six videos extensively used as benchmarks in prior work [24, 32–34]. We include videos from both dash cameras (which contain high motion) and surveillance cameras capturing traffic from heavy to light. The videos are: jackson, from a surveillance camera at Jackson Town Square; miami, from a surveillance camera at a Miami Beach crosswalk; tucson, from a surveillance camera at Tucson 4th Avenue; dashcam, from a dash camera while driving in a parking lot; park, from a stationary surveillance camera in a parking lot; and airport, from a surveillance camera at a JAC parking lot. The ingestion format of all videos is 720p at 30 fps, encoded in H.264.

VStore setup  We, as the system admin, declare a set of accuracy levels {0.95, 0.9, 0.8, 0.7} for each operator; these accuracies are typical in prior work [32]. In determining F1 scores, we treat the operator's output on videos in the ingestion format, i.e., the highest fidelity, as the ground truth. We run the two queries illustrated in Figure 2: query A (Diff + S-NN + NN) and query B (Motion + License + OCR). In running the queries, we, as the users, specify accuracy levels for the constituting operators. We run query A on the first three videos and query B on the remainder, following how these queries are benchmarked in prior work [34, 46]. To derive consumption formats, VStore profiles the two sets of operators on jackson and dashcam, respectively; each profiled sample is a 10-second clip, a typical length used in prior work [32]. VStore derives a unified set of storage formats for all operators and videos.

Hardware environment  We test VStore on a 56-core Xeon E7-4830v4 machine with 260 GB of DRAM, 4x1 TB 10K-RPM SAS 12 Gbps HDDs in RAID 5, and an NVIDIA P6000 GPU. By their implementation, the operators from OpenALPR run on the CPU; we limit them to at most 40 cores to keep the query speed comparable to commodity multicore servers. The operators from NoScope run on the GPU.

6.2 End-to-end Results
Configuration by VStore  VStore automatically configures video formats based on its profiling. Table 3 shows a snapshot of a configuration, including the whole set of consumption formats (CFs) and storage formats (SFs). For the 24 consumers (6 operators at 4 accuracy levels), VStore generates 21 unique CFs, as shown in Table 3(a). The configuration has 109 knobs over the 21 CFs (84 knobs) and 4 SFs (25 knobs), with each knob having up to 10 possible values. Manually finding the optimal combination would be infeasible, which warrants VStore's automatic configuration. In each column (a specific operator), although the knob values tend to decrease as accuracy drops, the trend is complex and can be non-monotone. For instance, in column Diff, from F1=0.9 to 0.8, VStore decreases the sampling rate while increasing the resolution and crop factor. This reflects the complex impacts of knobs discussed in Section 2.4. We also note that VStore chooses extremely low fidelity for Motion at all accuracies <= 0.9, suggesting that Motion could benefit from an even larger fidelity space with even cheaper fidelity options.

From the CFs, VStore derives 4 SFs, including one golden format (SFg), as listed in Table 3(b). Note that we, as the system admin, have not imposed any budget on ingestion cost; therefore, VStore by design chooses the set of SFs that minimizes the total storage cost (Section 4.3). The CF table tags each CF with the SF it subscribes to. As shown, the CFs and SFs jointly meet the design requirements R1–R3 of Section 3.1: each SF has fidelity richer than or equal to what its downstream CFs demand, and each SF's retrieval speed exceeds its downstream consumers' consumption speeds. Looking closer at the SFs: SFg mostly caters to consumers demanding high accuracies but low consumption speeds; SF3, stored as low-fidelity raw frames, caters to high-speed consumers demanding low image resolutions; between SFg and SF3, SF1 and SF2 fill in the wide gaps in fidelity and cost. Again, it would be difficult to manually determine such a complementary set of SFs without VStore's configuration.

Alternative configurations  We next quantitatively contrast VStore with the following alternative configurations:
• 1->1 stores videos in the golden format (SFg in Table 3). All consumers consume videos in this golden format. This resembles a video database oblivious to algorithmic consumers.
• 1->N stores videos in the golden format SFg. All consumers consume video in the CFs determined by VStore. This is equivalent to VStore configuring video formats for consumption but not for storage; the system therefore has to decode the golden format and convert it to the various CFs.
• N->N stores videos in 21 SFs, one for each unique CF. This is equivalent to VStore giving up its coalescing of SFs.

Table 3. A sample configuration of video formats automatically derived by VStore. (a) All consumption formats (CFs) for all operators (columns) at different accuracy levels (rows); 21 unique in total. Each cell shows fidelity [quality-resolution-sampling-crop], the subscribed storage format, and consumption speed (x realtime). (b) All storage formats (SFs). Each cell shows fidelity, coding (keyframe interval - speed step), coalesced video size per second of video, and retrieval speed. Operators in query A (Diff + S-NN + NN) are profiled on jackson and operators in query B (Motion + License + OCR) on dashcam; CFs and SFs may differ across videos.

(a) Consumption formats (CFs):
F1=0.95 — Diff: best-100p-2/3-75%, SF3, 3211x; S-NN: best-200p-1-50%, SF3, 600x; NN: good-600p-2/3-100%, SFg, 4x; Motion: bad-144p-1/30-75%, SF3, 25134x; License: best-540p-1-100%, SFg, 10x; OCR: best-720p-1/2-100%, SFg, 11x.
F1=0.90 — Diff: best-60p-2/3-75%, SF3, 4587x; S-NN: best-200p-1/2-75%, SF3, 1630x; NN: good-600p-2/3-75%, SFg, 5x; Motion: bad-180p-1/30-50%, SF3, 26117x; License: best-540p-1/2-100%, SFg, 20x; OCR: best-540p-1/2-100%, SFg, 13x.
F1=0.80 — Diff: best-200p-1/30-100%, SF3, 30585x; S-NN: best-200p-1/2-50%, SF3, 3680x; NN: good-400p-1/30-100%, SF2, 120x; Motion: bad-180p-1/30-50%, SF3, 26117x; License: good-540p-1/6-100%, SF1, 62x; OCR: best-540p-1/30-100%, SF2, 165x.
F1=0.70 — Diff: best-60p-1/30-75%, SF3, 34132x; S-NN: best-200p-1/6-75%, SF3, 8102x; NN: good-400p-1/30-75%, SF2, 134x; Motion: bad-180p-1/30-50%, SF3, 26117x; License: good-540p-1/30-75%, SF2, 314x; OCR: good-540p-1/30-100%, SF2, 165x.

(b) Storage formats (SFs):
SFg: best-720p-1-100%, 250-slowest, 1393 KB, 23x.
SF1: good-540p-1/6-100%, 250-slowest, 409 KB, 178x.
SF2: best-540p-1/30-100%, 10-fast, 92 KB, 331x.
SF3: best-200p-1-100%, RAW (1), 1843 KB, 1137–34132x (2).
(1) RAW frames are in the YUV420p pixel format. (2) RAW frames can be sampled individually from disk, hence the range of retrieval speeds.

Figure 11. End-to-end results. (a) Query speeds (y-axis: query speed as a multiple of video realtime, logarithmic scale) as functions of target operator accuracy (x-axis), for 1→1, 1→N, and VStore; query A on the left three videos (jackson, miami, tucson), query B on the right three (dashcam, park, airport). By avoiding the video retrieval bottleneck, VStore significantly outperforms the others. In this work, we assume the ingestion format is the ground truth (accuracy = 1). N→N is omitted since its speed is the same as VStore's. (b) Storage cost per video stream, measured as the growth rate of newly stored video size as ingestion goes on; VStore's coalescing of SFs substantially reduces storage cost. (c) Ingestion cost per video stream, measured as the CPU usage required for transcoding the stream into storage formats; VStore's SF coalescing substantially reduces ingestion cost. Note that this shows VStore's worst-case ingestion cost with no ingestion budget specified; see Table 4 for more.

Query speed As shown in Figure 11(a), VStore achieves good query speed overall, up to 362× realtime. VStore's speed is not directly comparable with the performance reported for retrospective analytics engines [33, 34]: while VStore streams video data (raw or encoded) from disks through the decoder to operators, the latter were tested with all input data preloaded into memory as raw frames. VStore offers flexible accuracy/cost trade-offs: for queries with lower target accuracies, VStore accelerates query speed by up to 150×. This is because VStore elastically scales down the costs: according to the lower accuracies, it switches the operators to CFs that incur lower consumption cost; those CFs subscribe to SFs that incur lower retrieval cost.

Figure 11(a) also shows the query speed under the alternative configurations. 1→1 achieves the best accuracy (treated as the ground truth) as it consumes video in the full fidelity as ingested. However, it cannot exploit accuracy/cost trade-offs, offering a fixed operating point. By contrast, VStore offers extensive trade-offs and speeds up queries by two orders of magnitude.

1→N customizes consumption formats for consumers while only storing the golden format. Although it minimizes the consumption costs for consumers, it essentially caps the effective speed of all consumers at the speed of decoding the golden format, which is about 23× of realtime. The bottleneck is more serious for lower accuracy levels (e.g., 0.8), where many consumers are capable of consuming data as fast as tens of thousands of times of realtime, as shown in Table 3(a). As a result, VStore outperforms 1→N by 3×–16×, demonstrating the necessity of the SF set.

Storage cost Figure 11(b) compares the storage costs. Among all configurations, N→N incurs the highest cost, because it stores 21 video versions in total. For dashcam, a video stream whose intensive motion makes video coding less effective, the storage cost reaches as high as 2.6 TB/day, filling a 10 TB hard drive in four days. In comparison, VStore consolidates the storage formats effectively and therefore reduces the storage cost by 2×–5×. 1→1 and 1→N require the lowest storage space as they only save one video version per ingestion stream; yet, they suffer from high retrieval cost and low query speed.
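A back-of-the-envelope way to read the query speeds in Figure 11(a) (our own simplification, not a formula from the system) is that each operator's effective speed is capped both by how fast its subscribed SF can be retrieved and by how fast the operator consumes its CF:

\[ S_{\mathrm{effective}} \;\approx\; \min\big( S_{\mathrm{retrieve}}(\mathrm{SF}),\; S_{\mathrm{consume}}(\mathrm{CF}) \big). \]

Under 1→N, the retrieval term is pinned at the golden format's roughly 23× realtime for every operator, so consumers capable of tens of thousands of times realtime (Table 3(a)) gain little; VStore's per-CF subscriptions lift this cap, which is where the 3×–16× advantage comes from.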

Ingestion cost Figure 11(c) demonstrates that VStore substantially reduces ingestion cost through consolidation of storage formats. Note that it shows VStore's worst-case ingestion cost: as stated earlier, no ingestion budget is imposed in the end-to-end experiment, so VStore reduces the ingestion cost without any increase in the storage cost. As we will show next, once an ingestion budget is given, VStore can keep the ingestion cost much lower than this worst case with only a minor increase in storage cost.

Overall, on most videos VStore requires around 9 cores to ingest one video stream, transcoding it into the 4 SFs in real time (30 fps). Ingesting dashcam is much more expensive, as the video contains intensive motion. VStore's cost is 30%–50% lower than N→N, which must transcode each stream to 21 SFs. 1→1 and 1→N incur the lowest ingestion cost as they only transcode the ingestion stream to the golden format, yet at the expense of costly retrieval and slow query speed.

We also find that the SFs, as well as the ingestion cost, quickly plateau as VStore's library includes more operators. Figure 12 shows how the ingestion cost increases as operators are sequentially added to VStore's library, following the order listed in Table 2. The ingestion cost stabilizes as the number of operators exceeds 5, as additional operators share existing SFs.

Figure 12. Transcoding cost does not scale up with the number of operators (x-axis: number of operators; y-axis: CPU utilization, %). Operator sequence follows Table 2.

6.3 Adapting to Resource Budgets

Ingestion budget VStore elastically adapts its configuration with respect to the ingestion budget. To impose a budget, we, as the system admin, cap the number of CPU cores available to the FFmpeg instance that transcodes each ingested stream. In response to the reduced budget, VStore gently trades off storage for ingestion. Table 4 shows that as the ingestion budget drops, VStore incrementally tunes up the coding speed (i.e., cheaper coding) for individual SFs. As a trade-off, the storage cost slowly increases by 17%. During this process, the increasingly cheaper coding overprovisions the retrieval speed to consumers and therefore will never fail the latter's requirements. Note that at this point the total ingestion output throughput is still less than 3.6 MB/s; even if the system ingests 56 streams with its 56 cores concurrently, the required disk throughput of 200 MB/s is still far below that of a commodity HDD array (1 GB/s on our platform).

Table 4. In response to an ingestion budget drop, VStore tunes coding and coalesces formats to stay under the budget, with a modest increase in storage cost (changed knobs were highlighted in the original). Coding option notation: "Keyframe Interval"-"SpeedStep".

| Ingestion budget (cores) | ≥7 | 6 | 3 | 2 | 1 |
|---|---|---|---|---|---|
| Output speed (MB/sec) | 3.039 | 3.042 | 3.094 | 3.273 | 3.561 |
| Storage (GB/day) | 250.4 | 250.7 | 254.9 | 269.7 | 293.4 |
| SFg coding | 250-slowest | 250-slowest | 250-slow | 250-med | 250-fast |
| SF1 coding | 250-slowest | 250-slow | 250-slow | 250-med | 250-fast |
| SF2 coding | 10-fast | 10-fast | 10-fast | 10-fast | 250-fast |
| SF3 coding | RAW | RAW | RAW | RAW | RAW |

Storage budget VStore's data erosion effectively respects the storage budget with gentle speed decay. To test VStore's erosion planning, we, as the system admin, set the video lifespan to 10 days; we then specify different storage budgets. With all 4 SFs listed in Table 3(b), a 10-day video stream will take up 5 TB of disk space. If we specify a budget above 5 TB, VStore determines not to decay (k=0), shown as the flat line in Figure 13(a). Further reducing the storage budget prompts data erosion. With a 4 TB budget, VStore decays the overall operator speed (defined in Section 4.4) following a power-law function (k=1). As we further reduce the budget, VStore plans more aggressive decays to respect the budget. Figure 13(b) shows how VStore erodes individual storage formats under a specific budget. On day 1 (youngest), all 4 SFs are intact. As the video ages, VStore first deletes segments from SF1 and SF2, which have lower impacts on the overall speed. For segments older than 5 days, VStore deletes all the data in SF1–SF3, while keeping the golden format intact (not shown).

Figure 13. Age-based decay in operator speed (a) and reduction of storage cost (b) to respect the storage budget. (a) Relative operator speed (y-axis) decays as the video ages (x-axis, days 1–10); for a lower storage budget, VStore chooses more aggressive decay (higher k): 2 TB (k=5), 3.5 TB (k=3), 4 TB (k=1), 5 TB (k=0). (b) Residual video size (GB) decreases as the video ages, with the storage budget set to 2 TB; the three eroded versions (SF1, SF2, SF3) and the total size are shown. The total size further includes the golden format (not shown), which is not eroded by design.
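The erosion planner itself (Section 4.4) is not reproduced here; the sketch below only illustrates the kind of selection loop described above, under our own simplifying assumption that a video of age a days retains a fraction a^(-k) of its erodible SFs (the real planner reasons about per-SF speed impact). All numbers are illustrative, not measurements.

```python
def plan_erosion(erodible_gb_per_day, golden_gb_per_day, lifespan_days, budget_gb,
                 ks=(0, 1, 2, 3, 4, 5)):
    """Pick the gentlest decay exponent k whose total footprint fits the budget.

    Assumes (our simplification) that a video of age `a` days keeps a fraction
    a**(-k) of its erodible SFs; k=0 means no erosion. The golden format is
    never eroded.
    """
    for k in ks:
        total = 0.0
        for age in range(1, lifespan_days + 1):
            retained = 1.0 if k == 0 else age ** (-k)
            total += golden_gb_per_day + retained * erodible_gb_per_day
        if total <= budget_gb:
            return k
    return ks[-1]  # even the most aggressive plan; caller may also shrink the SF set

# Illustrative split of a ~500 GB/day stream over a 10-day lifespan and a 4 TB budget.
k = plan_erosion(erodible_gb_per_day=360.0, golden_gb_per_day=140.0,
                 lifespan_days=10, budget_gb=4000.0)
```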

6.4 Configuration Overhead

VStore incurs moderate configuration overhead, thanks to our techniques in Sections 4.2 and 4.3. Overall, one complete configuration (including all required profiling) takes around 500 seconds, suggesting the store can afford a configuration process roughly every hour while online.

Configuring consumption formats Figure 14 shows the overhead in determining consumption formats. Compared to exhaustive profiling of all fidelity options, VStore reduces the number of profiling runs by 9×–15× and the total profiling delay by 5×, from 2000 seconds to 400 seconds. We notice that the License operator is slow, contributing more than 75% of the total delay, likely due to its CPU-based implementation.

Figure 14. Time spent on deriving consumption formats (seconds), per operator (License, OCR, NN, Motion, Diff, S-NN), for exhaustive profiling versus VStore. The number of profiling runs is annotated above each column; each profiling run uses a 10-second video segment. VStore reduces the overhead by 5× in total.

Configuring storage formats We have validated that VStore is able to find resource-efficient storage formats as exhaustive enumeration does.

Heuristic-based selection: We first test heuristic-based selection for producing SFs (Section 4.3). We compare it to exhaustive enumeration on deriving SFs from the 12 CFs used in query B; we cannot afford more CFs, which would make exhaustive enumeration very slow. Both methods result in identical storage formats, validating VStore's rationale behind coalescing. Yet, VStore's overhead (37 seconds) is two orders of magnitude lower than enumeration (5548 seconds). To derive the storage formats from all 21 unique consumption formats in our evaluation, VStore also incurs moderate absolute overhead (less than 1 minute). Throughout the 17 rounds of coalescing, it only profiles 475 (3%) storage formats out of all 15K possible ones. We observed that its memoization is effective: although 5525 storage formats are examined as possible coalescing outcomes, 92% of them have been memoized before and thus require no new profiling.

Distance-based selection: We then test the other strategy. We use Euclidean distance as the similarity metric. The configuration takes only 18 seconds, 2× shorter than the heuristic-based selection mentioned above. This is because calculating the distances requires no expensive profiling as heuristic-based selection does.

Comparison of resultant SFs: The two strategies also derive very different SF sets: while the SFs derived by heuristic-based selection are close to optimal as shown above, the SFs derived by distance-based selection incur 2.2× higher storage cost. This is because the latter strategy, while simple, overlooks the fact that different knobs have complex and varying resource impacts (Section 2.4), which cannot simply be normalized across knobs.
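Part of why the coalescing overhead above stays low is memoization of profiling results. The sketch below only illustrates that bookkeeping; the candidate generation and the measurement step of Section 4.3 are not reproduced, and the stand-in measurement function returns placeholder numbers purely to keep the sketch runnable.

```python
# Minimal sketch of memoized profiling during SF coalescing.
profiled = {}          # (fidelity, coding) -> measurement
real_profilings = 0    # how many times the expensive step actually ran

def measure_storage_format(fidelity, coding):
    """Stand-in for the expensive step: transcode a 10-second sample,
    then measure on-disk size and retrieval speed."""
    global real_profilings
    real_profilings += 1
    return {"size_kb_per_s": 100.0, "retrieval_speedup": 100.0}  # placeholder values

def profile(fidelity, coding):
    """Return the measurement for a candidate SF, profiling it at most once."""
    key = (fidelity, coding)
    if key not in profiled:
        profiled[key] = measure_storage_format(fidelity, coding)
    return profiled[key]

# The same candidate SF often reappears across coalescing rounds; repeated
# lookups hit the memo table instead of re-profiling.
for candidate in [("best-540p-1/30-100%", "10-fast")] * 5:
    profile(*candidate)
assert real_profilings == 1
```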
7 Discussion

Adapting to changes in operators and hardware VStore works with any possible queries composed from the operators and accuracies pre-defined in its library (Section 2.2). If users add a new operator (or a new accuracy level), VStore would need to profile the new operator and derive the corresponding CFs for it. If users change the platform hardware (e.g., adding a new GPU), VStore would need to re-profile all existing operators. Conceptually, this also triggers an update to the SFs. Since transcoding existing on-disk videos is expensive, VStore only applies the updated SFs to forthcoming videos; for existing videos, VStore makes each new CF subscribe to the cheapest existing SF with satisfiable fidelity (Section 3.1). As a result, on existing videos, operators run at the designated accuracies, albeit slower than optimal. As this portion of videos ages and retires, operators run at optimal speed on all videos.

Qualitative comparison against Focus [24] As stated in Section 2.2, Focus by design is limited to fixed query pipelines – object detection consisting of one cheap neural network (NN) and one full NN. This contrasts with VStore, which supports diverse queries. Nevertheless, we compare their resource costs on such an object detection pipeline.

Ingestion cost. VStore continuously runs transcoding. As already shown in Figure 12, the transcoding cost quickly plateaus as the number of operators grows. While the current VStore prototype runs transcoding on the CPU for development ease, low-cost hardware transcoders are pervasive: recent work showcases a transcoder farm of $20 Raspberry Pis, with each device transcoding 4 video streams in real time (720×480 at 30 fps) [41, 66]. We therefore estimate the hardware cost for each video ingestion to be less than a few dozen dollars. By comparison, at ingestion time, Focus continuously runs the cheap NN on a GPU. On a high-end GPU (Tesla P100, ~$4,000), the cheap NN is reported to run at 1.92K fps; assuming perfect scalability, this GPU supports up to 60 video streams. The hardware investment for ingesting each video stream is thus around $60, which is 2×–3× higher than VStore's. If the ingested streams are fewer (e.g., several or a few dozen, as typical for a small deployment), the GPU is underutilized, which further increases the per-stream investment. Running the ingestion on the public cloud helps little: Amazon EC2's single-GPU instance (P3) costs nearly $17.5K per year [1].

Query cost. At query time, VStore would run the cheap NN on all frames and the full NN on the frames selected by the cheap NN. By comparison, Focus only runs the full NN on the frames selected by the cheap NN (it already runs the cheap NN at ingestion). The comparison between VStore's query cost and that of Focus depends on two factors: (i) the frame selectivity f and (ii) the ratio α between the full NN speed and the cheap NN speed. Therefore, the ratio between VStore's query delay and that of Focus is given by r = 1 + α/f. With the NNs used by Focus, α = 1/48 [24]. When the frame selectivity is low, e.g., when the queried objects are sparse in the video, VStore's query delay is significantly longer (e.g., when f = 1%, r = 3). However, as the selectivity increases, the query delay difference between VStore and Focus quickly diminishes, e.g., when f = 10%, r = 1.2; when f = 50%, r = 1.04. Furthermore, as the speed gap between the two NNs enlarges, e.g., with an even cheaper NN, the query delay difference quickly diminishes as well.
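For completeness, the ratio r above follows from a simple per-frame cost count (our own spelling-out, not an equation quoted from Focus): writing c and C for the per-frame costs of the cheap NN and the full NN, so that α = c/C is also the speed ratio between the full and the cheap NN, the delays over N frames satisfy

\[ r \;=\; \frac{T_{\mathrm{VStore}}}{T_{\mathrm{Focus}}} \;=\; \frac{N\,c + f\,N\,C}{f\,N\,C} \;=\; 1 + \frac{c}{f\,C} \;=\; 1 + \frac{\alpha}{f}. \]

Plugging in α = 1/48 gives r ≈ 3 at f = 1%, r ≈ 1.2 at f = 10%, and r ≈ 1.04 at f = 50%, matching the numbers quoted above.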
8 Related Work

Optimizing video analytics Catering to retrospective video analytics, BlazeIt [33] proposes a query model and corresponding execution techniques. NoScope [34] reduces query cost with cheap early filters placed before expensive NNs. To run NNs on mobile devices, MCDNN [23] trades off between accuracy and resource constraints by model compression.

Optimizing live video analytics For distributed, live video analytics, VideoStorm [67] and VideoEdge [26] search for the best knobs and query placements over clusters to meet accuracy/delay requirements. For live video analytics on the edge, LAVEA [65] and Vigil [68] partition analytics pipelines between the edge and the cloud. Jain et al. [31] optimize video analytics over multiple cameras through cross-camera correlations. Pakha et al. [49] co-tune network protocols with video analytics objectives, e.g., accuracy. However, all these systems are incapable of optimizing ingestion, storage, retrieval, and consumption in conjunction.

Video/image storage Facebook's Haystack [9] accelerates photo access through metadata lookups in main memory. Intel's VDMS [22, 55] accelerates image data access through a combination of graph-based metadata and array-based images backed by TileDB [50]. They focus on images rather than videos. Targeting NN training, NVIDIA's Video Loader [14] (a wrapper over NVDEC and FFmpeg) optimizes random loads of encoded video frames. To support video analytics at scale, Scanner [51] organizes video collections and raster data as tables in a data store and executes costly pixel-level computations in parallel. All these systems fall short on controlling visual data formats according to analytics. The NVIDIA DeepStream SDK [2] supports video frames flowing from the GPU's built-in decoders to stream processors without leaving the GPU. It reduces memory movement, but brings no fundamental change to the trade-offs between retrieval and consumption.

Time-series database Recent time-series data stores co-design the storage format with queries [6, 7]. However, the data format/schema (timestamped sensor readings), the operators (e.g., aggregation), and the analytics structure (no cascade) are different from video analytics. While some databases [30, 61] provide benefits on data aging or frequent queries, they could not make storage decisions based on video queries, as they are oblivious to the analytics.

Multi-query optimization Relational databases [58] and streaming databases [8, 44] enable sharing of data and computation across queries with techniques such as scan sharing [39, 52, 69]. By doing so, they reduce data movement in the memory hierarchy and coalesce computation across queries. VStore, in a similar fashion, supports data sharing among multiple possible queries, albeit at configuration time instead of at run time. By doing so, VStore coalesces data demands across queries/operators and hence reduces the ingestion and storage costs. Through load shedding [4, 13, 62], streaming databases trade accuracy for lower resource consumption; VStore makes similar trade-offs for vision operators.

Video systems for human consumers Many multimedia server systems in the 90's stored videos on disk arrays in multiple resolutions or in complementary layers, in order to serve human clients [17, 36]. Since then, Kang et al. [35] optimize the placement of on-disk video layers in order to reduce disk seeks. Oh et al. [45] segment videos into shots, which are easier for humans to browse and search. More recently, SVE [25] provides a distributed service for fast transcoding of uploaded videos in datacenters, and ExCamera [20] uses Amazon Lambda functions for parallel video transcoding. These systems were not designed for, and therefore are oblivious to, algorithmic consumers. They cannot automatically control video formats for video analytics.

9 Conclusions

VStore automatically configures video format knobs for retrospective video analytics. It addresses the challenges posed by the huge combinatorial space of knobs, the complex knob impacts, and the high profiling cost. VStore explores a key idea called backward derivation of configuration: the video store passes the video quantity and quality desired by analytics backward to retrieval, to storage, and to ingestion. VStore automatically derives complex configurations. It runs queries as fast as 362× of video realtime.

Acknowledgments

The authors were supported in part by NSF Award 1619075 (including REU) and a Google Faculty Award. The authors thank the anonymous reviewers for their feedback and Dr. Johannes Gehrke for shepherding. The authors thank NVIDIA for their GPU donation.

References York, NY, USA, 401–409. https://doi.org/10.1145/166266.168438 [1] 2018. Amazon EC2 P3 Instances. https://aws.amazon.com/ec2/ [18] Ziqiang Feng, Shilpa George, Jan Harkes, Padmanabhan Pillai, Roberta instance-types/p3/. Klatzky, and Mahadev Satyanarayanan. 2019. Eureka: Edge-based [2] 2018. NVIDIA. https://developer.nvidia.com/deepstream-sdk. Discovery of Training Data for Machine Learning. IEEE Internet Com- [3] 2018. RollingDB Storage Library. https://github.com/openalpr/ puting PP (01 2019), 1–1. https://doi.org/10.1109/MIC.2019.2892941 rollingdb. [19] Ziqiang Feng, Junjue Wang, Jan Harkes, Padmanabhan Pillai, and Ma- [4] Daniel J. Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Chris- hadev Satyanarayanan. 2018. EVA: An Efficient System for Exploratory tian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Video Analysis. SysML (2018). Stan Zdonik. 2003. Aurora: A New Model and Architecture for Data [20] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Stream Management. The VLDB Journal 12, 2 (Aug. 2003), 120–139. Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivara- https://doi.org/10.1007/s00778-003-0095-z man, George Porter, and Keith Winstein. 2017. Encoding, Fast [5] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy and Slow: Low-Latency Video Processing Using Thousands of Tiny Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irv- Threads. In 14th USENIX Symposium on Networked Systems Design ing, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, and Implementation (NSDI 17). USENIX Association, Boston, MA, 363– Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay 376. https://www.usenix.org/conference/nsdi17/technical-sessions/ Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang presentation/fouladi Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learn- [21] Ross Girshick. 2015. Fast R-CNN. In Proceedings of the 2015 IEEE ing. In 12th USENIX Symposium on Operating Systems Design and International Conference on Computer Vision (ICCV) (ICCV ’15). IEEE Implementation (OSDI 16). USENIX Association, Savannah, GA, 265– Computer Society, Washington, DC, USA, 1440–1448. https://doi.org/ 283. https://www.usenix.org/conference/osdi16/technical-sessions/ 10.1109/ICCV.2015.169 presentation/abadi [22] Vishakha Gupta-Cledat, Luis Remis, and Christina R Strong. 2017. Ad- [6] Nitin Agrawal and Ashish Vulimiri. 2017. Low-Latency Analytics on dressing the Dark Side of Vision Research: Storage. In 9th USENIX Colossal Data Streams with SummaryStore. In Proceedings of the 26th Workshop on Hot Topics in Storage and File Systems (HotStorage 17). Symposium on Operating Systems Principles (SOSP ’17). ACM, New USENIX Association, Santa Clara, CA. https://www.usenix.org/ York, NY, USA, 647–664. https://doi.org/10.1145/3132747.3132758 conference/hotstorage17/program/presentation/gupta-cledat [7] Michael P Andersen and David E. Culler. 2016. BTrDB: Optimizing [23] Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agar- Storage System Design for Timeseries Processing. In 14th USENIX Con- wal, Alec Wolman, and Arvind Krishnamurthy. 2016. MCDNN: An ference on File and Storage Technologies (FAST 16). USENIX Association, Approximation-Based Execution Framework for Deep Stream Pro- Santa Clara, CA, 39–52. https://www.usenix.org/conference/fast16/ cessing Under Resource Constraints. 
In Proceedings of the 14th An- technical-sessions/presentation/andersen nual International Conference on Mobile Systems, Applications, and [8] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Services (MobiSys ’16). ACM, New York, NY, USA, 123–136. https: Jennifer Widom. 2002. Models and Issues in Data Stream Systems. In //doi.org/10.1145/2906388.2906396 Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Sympo- [24] Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Shivaram sium on Principles of Database Systems (PODS ’02). ACM, New York, Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, NY, USA, 1–16. https://doi.org/10.1145/543613.543615 and Onur Mutlu. 2018. Focus: Querying Large Video Datasets with Low [9] Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Latency and Low Cost. In 13th USENIX Symposium on Operating Sys- Vajgel. 2010. Finding a Needle in Haystack: Facebook’s Photo Storage. tems Design and Implementation (OSDI 18). USENIX Association, Carls- In Proceedings of the 9th USENIX Conference on Operating Systems bad, CA. https://www.usenix.org/conference/osdi18/presentation/ Design and Implementation (OSDI’10). USENIX Association, Berkeley, hsieh CA, USA, 47–60. http://dl.acm.org/citation.cfm?id=1924943.1924947 [25] Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav [10] E. T. Bell. 1934. Exponential Numbers. The American Mathematical Tverdokhlib, Amit Yajurvedi, Paul Dapolito, IV, Xifan Yan, Maxim Monthly 41, 7 (1934), 411–419. http://www.jstor.org/stable/2300300 Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulka- [11] E. T. Bell. 1934. Exponential Polynomials. Annals of Mathematics 35, 2 rni, Matthew Burke, and Wyatt Lloyd. 2017. SVE: Distributed Video (1934), 258–277. http://www.jstor.org/stable/1968431 Processing at Facebook Scale. In Proceedings of the 26th Symposium [12] Dimitri Bertsekas and Robert Gallager. 1992. Data Networks (2Nd Ed.). on Operating Systems Principles (SOSP ’17). ACM, New York, NY, USA, Prentice-Hall, Inc., Upper Saddle River, NJ, USA. 87–103. https://doi.org/10.1145/3132747.3132775 [13] Don Carney, Ugurˇ Çetintemel, Mitch Cherniack, Christian Convey, [26] Chien-Chun Hung, Ganesh Ananthanarayanan, Peter BodÃŋk, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, Leana Golubchik, Minlan Yu, Victor Bahl, and Matthai Philipose. and Stan Zdonik. 2002. Monitoring Streams: A New Class of Data 2018. VideoEdge: Processing Camera Streams using Hierarchical Management Applications. In Proceedings of the 28th International Clusters. https://www.microsoft.com/en-us/research/publication/ Conference on Very Large Data Bases (VLDB ’02). VLDB Endowment, videoedge-processing-camera-streams-using-hierarchical-clusters/ 215–226. http://dl.acm.org/citation.cfm?id=1287369.1287389 [27] IHS. 2016. Top Video Surveillance Trends for 2016. [14] Jared Casper, Jon Barker, and Bryan Catanzaro. 2018. NVVL: NVIDIA [28] IHS. 2018. Top Video Surveillance Trends for 2018. Video Loader. https://github.com/NVIDIA/nvvl. [29] iMatix Corporation. 2018. Lightning Memory-mapped Database. https: [15] Y. Chen, X. Zhu, W. Zheng, and J. Lai. 2018. Person Re-Identification by //symas.com/lmdb/. Camera Correlation Aware Feature Augmentation. IEEE Transactions [30] InfluxData. 2018. InfluxDB. https://www.influxdata.com/. on Pattern Analysis and Machine Intelligence 40, 2 (Feb 2018), 392–408. 
[31] Samvit Jain, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao https://doi.org/10.1109/TPAMI.2017.2666805 Shu, and Joseph E Gonzalez. 2018. Scaling Video Analytics Systems to [16] Yongxi Cheng, Xiaoming Sun, and Yiqun Lisa Yin. 2008. Searching Large Camera Deployments. arXiv preprint arXiv:1809.02318 (2018). monotone multi-dimensional arrays. Discrete Mathematics 308, 11 [32] Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha (2008), 2213–2221. Sen, and Ion Stoica. 2018. Chameleon: Scalable Adaptation of Video [17] Tzi-cker Chiueh and Randy H. Katz. 1993. Multi-resolution Video Analytics. In Proceedings of the 2018 Conference of the ACM Special Representation for Parallel Disk Arrays. In Proceedings of the First ACM Interest Group on Data Communication (SIGCOMM ’18). ACM, New International Conference on Multimedia (MULTIMEDIA ’93). ACM, New York, NY, USA, 253–266. https://doi.org/10.1145/3230543.3230574 15 EuroSys ’19, March 25–28, 2019, Dresden, Germany Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin

[33] Daniel Kang, Peter Bailis, and Matei Zaharia. 2018. BlazeIt: Fast VLDB Endow. 10, 4 (Nov. 2016), 349–360. https://doi.org/10.14778/ Exploratory Video Queries using Neural Networks. arXiv preprint 3025111.3025117 arXiv:1805.01046 (2018). [51] Alex Poms, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian. [34] Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei 2018. Scanner: Efficient Video Analysis at Scale. ACM Trans. Graph. Zaharia. 2017. NoScope: Optimizing Neural Network Queries over 37, 4, Article 138 (July 2018), 13 pages. https://doi.org/10.1145/ Video at Scale. Proc. VLDB Endow. 10, 11 (Aug. 2017), 1586–1597. 3197517.3201394 https://doi.org/10.14778/3137628.3137664 [52] Lin Qiao, Vijayshankar Raman, Frederick Reiss, Peter J. Haas, and [35] Sooyong Kang, Sungwoo Hong, and Youjip Won. 2009. Storage tech- Guy M. Lohman. 2008. Main-memory Scan Sharing for Multi-core nique for real-time streaming of layered video. Multimedia Systems 15, CPUs. Proc. VLDB Endow. 1, 1 (Aug. 2008), 610–621. https://doi.org/ 2 (01 Apr 2009), 63–81. https://doi.org/10.1007/s00530-008-0147-8 10.14778/1453856.1453924 [36] Kimberly Keeton and Randy H. Katz. 1995. Evaluating video layout [53] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. 2016. You Only Look strategies for a high-performance storage server. Multimedia Systems Once: Unified, Real-Time Object Detection. In 2016 IEEE Conference 3, 2 (01 May 1995), 43–52. https://doi.org/10.1007/BF01219800 on Computer Vision and Pattern Recognition (CVPR). 779–788. https: [37] Christian Kreuzberger, Daniel Posch, and Hermann Hellwagner. 2015. //doi.org/10.1109/CVPR.2016.91 A Scalable Video Coding Dataset and Toolchain for Dynamic Adaptive [54] Oracle. 2015. Dramatically Reduce the Cost and Complexity of Streaming over HTTP. In Proceedings of the 6th ACM Multimedia Video Surveillance Storage. https://www.oracle.com/assets/wp-video- Systems Conference (MMSys ’15). ACM, New York, NY, USA, 213–218. surveillance-storage-2288409.pdf. https://doi.org/10.1145/2713168.2713193 [55] Luis Remis, Vishakha Gupta-Cledat, Christina R. Strong, and Margriet [38] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Ima- IJzerman-Korevaar. 2018. VDMS: An Efficient Big-Visual-Data Access geNet Classification with Deep Convolutional Neural Networks. In for Machine Learning Workloads. CoRR abs/1810.11832 (2018). Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. [56] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., R-CNN: Towards Real-time Object Detection with Region Proposal 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification- Networks. In Proceedings of the 28th International Conference on Neu- with-deep-convolutional-neural-networks.pdf ral Information Processing Systems - Volume 1 (NIPS’15). MIT Press, [39] C. A. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Cambridge, MA, USA, 91–99. http://dl.acm.org/citation.cfm?id= Wong. 2007. Increasing Buffer-Locality for Multiple Relational Table 2969239.2969250 Scans through Grouping and Throttling. In 2007 IEEE 23rd International [57] Seagate. 2017. Video Surveillance Trends Report. https: Conference on Data Engineering. 1136–1145. https://doi.org/10.1109/ //www.seagate.com/files/www-content/solutions-content/ ICDE.2007.368972 surveillance-security-video-analytics/en-us/docs/video- [40] Nathan Linial and Michael Saks. 1985. Searching ordered structures. 
surveillance-trends-report.pdf. Journal of algorithms 6, 1 (1985), 86–103. [58] Timos K. Sellis. 1988. Multiple-query Optimization. ACM Trans. [41] Peng Liu, Jongwon Yoon, Lance Johnson, and Suman Banerjee. Database Syst. 13, 1 (March 1988), 23–52. https://doi.org/10.1145/ 2016. Greening the Video Transcoding Service with Low-Cost 42201.42203 Hardware Transcoders. In 2016 USENIX Annual Technical Confer- [59] Michael Seufert, Sebastian Egger, Martin Slanina, Thomas Zinner, ence (USENIX ATC 16). USENIX Association, Denver, CO, 407– Tobias Hossfeld, and Phuoc Tran-Gia. 2015. A survey on quality of 419. https://www.usenix.org/conference/atc16/technical-sessions/ experience of HTTP adaptive streaming. IEEE Communications Surveys presentation/liu & Tutorials 17, 1 (2015), 469–492. [42] Shayan Modiri Assari, Haroon Idrees, and Mubarak Shah. 2016. Human [60] Haichen Shen, Seungyeop Han, Matthai Philipose, and Arvind Krish- Re-identification in Crowd Videos Using Personal, Social and Environ- namurthy. 2017. Fast Video Classification via Adaptive Cascading of mental Constraints. Springer International Publishing, Cham, 119–136. Deep Models. In The IEEE Conference on Computer Vision and Pattern https://doi.org/10.1007/978-3-319-46475-68 Recognition (CVPR). [43] Ingo Molnár. 2007. [patch] Modular Scheduler Core and Completely [61] V. Srinivasan, Brian Bulkowski, Wei-Ling Chu, Sunil Sayyaparaju, Fair Scheduler. http://lwn.net/Articles/230501/. Andrew Gooding, Rajkumar Iyer, Ashish Shinde, and Thomas Lopatic. [44] Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shiv- 2016. Aerospike: Architecture of a Real-time Operational DBMS. Proc. nath Babu, Mayur Datar, Gurmeet Manku, Chris Olston, Justin Rosen- VLDB Endow. 9, 13 (Sept. 2016), 1389–1400. https://doi.org/10.14778/ stein, and Rohit Varma. 2003. Query Processing, Resource Manage- 3007263.3007276 ment, and Approximation in a Data Stream Management System. In [62] Nesime Tatbul, Uğur Çetintemel, Stan Zdonik, Mitch Cherniack, and IN CIDR. 245–256. Michael Stonebraker. 2003. Load Shedding in a Data Stream Manager. [45] JungHwan Oh and Kien A. Hua. 2000. Efficient and Cost-effective In Proceedings of the 29th International Conference on Very Large Data Techniques for Browsing and Indexing Large Video Databases. In Bases - Volume 29 (VLDB ’03). VLDB Endowment, 309–320. http: Proceedings of the 2000 ACM SIGMOD International Conference on Man- //dl.acm.org/citation.cfm?id=1315451.1315479 agement of Data (SIGMOD ’00). ACM, New York, NY, USA, 415–426. [63] Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. https://doi.org/10.1145/342009.335436 Smola, and Philipp Krähenbühl. 2018. Compressed Video Action [46] OpenALPR Technology, Inc. 2018. OpenALPR. https://github.com/ Recognition. In The IEEE Conference on Computer Vision and Pattern openalpr/openalpr. Recognition (CVPR). [47] OpenCV. 2018. Contours. [64] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. 2013. Online Object [48] OpenCV. 2018. Optical Flow. Tracking: A Benchmark. In Proceedings of the 2013 IEEE Conference on [49] Chrisma Pakha, Aakanksha Chowdhery, and Junchen Jiang. 2018. Computer Vision and Pattern Recognition (CVPR ’13). IEEE Computer Reinventing Video Streaming for Distributed Vision Analytics. In Society, Washington, DC, USA, 2411–2418. https://doi.org/10.1109/ 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud CVPR.2013.312 18). USENIX Association, Boston, MA. https://www.usenix.org/ [65] S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, and Q. Li. 2017. 
LAVEA: conference/hotcloud18/presentation/pakha Latency-Aware Video Analytics on Edge Computing Platform. In 2017 [50] Stavros Papadopoulos, Kushal Datta, Samuel Madden, and Timothy IEEE 37th International Conference on Distributed Computing Systems Mattson. 2016. The TileDB Array Data Storage Manager. Proc. (ICDCS). 2573–2574. https://doi.org/10.1109/ICDCS.2017.182


[66] J. Yoon, P. Liu, and S. Banerjee. 2016. Low-Cost Video Transcoding at the Wireless Edge. In 2016 IEEE/ACM Symposium on Edge Computing (SEC). 129–141. https://doi.org/10.1109/SEC.2016.8 [67] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Phili- pose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analyt- ics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/ conference/nsdi17/technical-sessions/presentation/zhang [68] Tan Zhang, Aakanksha Chowdhery, Paramvir (Victor) Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The Design and Implemen- tation of a Wireless Video Surveillance System. In Proceedings of the 21st Annual International Conference on Mobile Computing and Net- working (MobiCom ’15). ACM, New York, NY, USA, 426–438. https: //doi.org/10.1145/2789168.2790123 [69] Marcin Zukowski, Sándor Héman, Niels Nes, and Peter Boncz. 2007. Co- operative Scans: Dynamic Bandwidth Sharing in a DBMS. In Proceed- ings of the 33rd International Conference on Very Large Data Bases (VLDB ’07). VLDB Endowment, 723–734. http://dl.acm.org/citation.cfm?id= 1325851.1325934
