Copyright by Jian He, 2020

The Dissertation Committee for Jian He certifies that this is the approved version of the following dissertation:

Empowering Video Applications for Mobile Devices

Committee:

Lili Qiu, Supervisor

Mohamed G. Gouda

Aloysius Mok

Xiaoqing Zhu

Empowering Video Applications for Mobile Devices

by

Jian He

DISSERTATION

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT AUSTIN

May 2020

Acknowledgments

First and foremost, I want to thank my advisor, Prof. Lili Qiu, for the support and guidance I have received over the past few years. I appreciate all her contributions of time, ideas and funding to make my Ph.D. experience productive and stimulating. The enthusiasm she has for her research significantly motivated me to concentrate on my own research, especially during tough times in my Ph.D. pursuit. She taught me how to crystallize ideas into solid and compelling research work. I firmly believe that working with her will help me have a more successful career in the future.

I also want to thank all the members of my dissertation committee, Prof. Mohamed G. Gouda, Prof. Aloysius Mok and Dr. Xiaoqing Zhu. I owe many thanks to them for their insightful comments on my dissertation.

I was very fortunate to collaborate with Wenguang Mao, Mubashir Qureshi, Ghufran Baig, Zaiwei Zhang, Yuchen Cui, Sangki Yun, Zhaoyuan He, Chenxi Yang, Wangyang Li and Yichao Chen on many interesting projects. They always had the time and passion to devote to my research projects. Without their support, I could not have completed those projects smoothly. I also want to thank my colleagues Mei Wang, Wei Sun and Swadhin Pradhan for their great help in making my research life enjoyable.

I would like to thank Xiaoqing Zhu, Shruti Sanadhya, Sangki Yun, Christina Vlachou and Kyu-Han Kim, who were my mentors during my internships at Cisco and HP Labs. I had a lot of fun working with them and learning how to carry out projects in industry and research labs. They greatly encouraged me to seek further success in my career.

I am extremely grateful to have many friends who brought lots of joy to my life at UT: Chen Chen, Lei Xu, Yuxiang Lin, Wenhui Zhang, Hangchen Yu, Zhiting Zhu, Yuepeng Wang, Xinyu Wang, Ye Zhang, and many others. More importantly, I owe my sincere gratitude to Xiaoting Liu, who gave me her continuous care. I will never forget the days and nights we spent together getting through the hard times during the COVID-19 outbreak. I wish you all the best!

Finally, I would like to thank my family for all their love and encouragement. I dedicate this dissertation to you.

Empowering Video Applications for Mobile Devices

Jian He, Ph.D.
The University of Texas at Austin, 2020

Supervisor: Lili Qiu

The popularity of video applications has grown rapidly. There are two main trends in the development of video applications: (i) video streaming supporting higher-resolution videos and 360◦ videos, and (ii) providing video analytics (e.g., running object detection on video frames). In this dissertation, we focus on how to improve the performance of streaming 360◦ and 4K videos and of running real-time video analytics on mobile devices.

We identify a few major challenges in guaranteeing a high-quality user experience for video applications on mobile devices. First, existing video applications call for high-resolution videos (e.g., 4K). Due to limited hardware resources on mobile devices, coding high-resolution videos is slow. It is critical to design a light-weight video codec that provides fast video coding as well as high compression efficiency for mobile devices. Second, wireless channels have unpredictable throughput fluctuation. It is necessary to design a robust rate adaptation algorithm that adjusts video quality according to the varying network conditions. Third, streaming the entire panoramic video view wastes lots of bandwidth, while transmitting only the portion visible in the user's FoV significantly degrades video quality. It is hard to save bandwidth while maintaining high video quality in the presence of inevitable head movement prediction error. Last, motion based object tracking can speed up video analytics, but existing motion estimation is noisy due to the presence of complex background and changes in object size or shape.

In this dissertation, we show how to address the above challenges. We propose a new layered coding design to code high-resolution video data. It can effectively adapt to varying data rates on demand by first sending the base layer and then opportunistically sending more layers whenever the link allows. We further design an optimization algorithm to decide which video layers to send according to the available throughput. Compared with existing rate adaptation algorithms, our algorithm adds a new dimension: deciding the number of layers to transmit. We also design a novel layered tile-based encoding framework for 360◦ videos. It achieves efficient video coding, bandwidth savings, and robustness against head movement prediction error. Moreover, we design a robust technique to extract reliable motion from video frames. We use a combination of feature maps and motion to generate a representative mask which can reliably capture the motion of object pixels and the changes in the overall object shape or size.

First, we implement our tile-based layered encoding framework Rubiks on mobile devices for 360◦ video streaming. We exploit the spatial and temporal characteristics of 360◦ videos for encoding. Specifically, Rubiks splits the 360◦ video spatially into tiles and temporally into layers. The client runs an optimization routine to determine the video data that needs to be fetched to optimize user QoE. Using this encoding approach, we can send the video portions that have a high probability of being viewed at a higher quality and the portions that have a lower probability of being viewed at a lower quality. By controlling the amount of data sent, the data can be decoded in time. Rubiks saves significant bandwidth while maximizing the user's QoE and decoding the video in a timely manner. Compared with existing approaches, Rubiks achieves up to 69% improvement in user QoE and 49% in bandwidth savings.

Next, we design Jigsaw, a system that supports live 4K video streaming over wireless networks using commodity mobile devices. Given the high data rate requirement of 4K videos, 60GHz is appealing, but its large and unpredictable throughput fluctuation makes it hard to provide a desirable user experience. Jigsaw consists of (i) easy-to-compute layered video coding to seamlessly adapt to unpredictable wireless link fluctuations, (ii) an efficient GPU implementation of video coding on commodity mobile devices, and (iii) effective use of both WiFi and WiGig through delayed video adaptation and smart scheduling. Using real experiments and emulation, we demonstrate the feasibility and effectiveness of Jigsaw. Our results show that it improves PSNR by 6-15dB and SSIM by 0.011-0.217 over state-of-the-art approaches.

Finally, we develop Sight, a novel mobile video analytics system. Its unique features include (i) high accuracy, (ii) real-time operation, and (iii) running exclusively on a mobile device without the need for an edge/cloud server or network connectivity. At its heart lies an effective technique to reliably extract motion from video frames and use the motion to speed up video analytics. Unlike existing motion extraction, our technique is robust to background noise and changes in object sizes. Using extensive evaluation, we show that Sight can support real-time object tracking at 30 frames/second (fps) on Jetson TX2. For single-object tracking, Sight improves the average Intersection-over-Union (IoU) by 88%, improves the mean Average Precision (mAP) by 207% and reduces the average hardware resource usage by 45% over the state-of-the-art approach. For multi-object tracking, Sight improves IoU by 69%, improves mAP by 173% and reduces resource usage by around 32%.

Table of Contents

Acknowledgments

Abstract

List of Tables

List of Figures

Chapter 1. Introduction
1.1 Background
1.2 Motivation
1.2.1 Video Streaming
1.2.2 Mobile Video Analytics
1.3 Challenges
1.3.1 Video Streaming
1.3.2 Video Coding
1.3.3 Video Analytics
1.4 Our Approach
1.5 Summary of Contributions
1.6 Dissertation Outline

Chapter 2. Related Work
2.1 Video Streaming Algorithms
2.2 Wireless Technologies
2.3 Video Coding
2.4 Mobile Video Analytics

Chapter 3. Practical 360◦ Video Streaming for Smartphones
3.1 Background for 360◦ Video Streaming
3.1.1 Existing Streaming Framework
3.1.2 H.264 and HEVC Codecs
3.1.3 Scalable Video Coding
3.2 Motivation
3.2.1 Real-Time Media Codecs
3.2.2 Limitations of Existing Approaches
3.2.2.1 Decoding Time
3.2.2.2 Bandwidth Savings
3.2.2.3 Video Quality
3.2.3 Insights From Existing Approaches
3.3 Challenges
3.4 Our Approach
3.4.1 Video Encoding
3.4.2 360◦ Video Rate Adaptation
3.4.2.1 MPC-based Optimization Framework
3.4.2.2 User QoE
3.4.2.3 Estimate Video Quality
3.4.2.4 Decoding Time
3.4.2.5 Improving Efficiency
3.5 System Design for Rubiks
3.5.1 System Architecture
3.5.2 Server Side
3.5.3 Client Side
3.6 Evaluation for Rubiks
3.6.1 Evaluation Methodology
3.6.1.1 Experiment Setup
3.6.1.2 Experiment Settings
3.6.2 Micro Benchmarks
3.6.2.1 Head Movement Prediction Error
3.6.2.2 Decoding Time Modeling
3.6.3 System Results
3.6.3.1 Rubiks for 4K Videos
3.6.3.2 Rubiks for 8K Videos
3.6.3.3 Energy Consumption
3.6.4 Summary and Discussion of Results

Chapter 4. Robust Live 4K Video Streaming
4.1 Motivation
4.1.1 4K Videos Need Compression
4.1.2 Rate Adaptation Requirement
4.1.3 Limitations of Existing Video Codecs
4.1.4 WiGig and WiFi Interaction
4.2 Challenges
4.3 Our Approach
4.3.1 Light-weight Layered Coding
4.3.1.1 Our Design
4.3.2 Layered Coding Implementation
4.3.2.1 GPU Implementation
4.3.2.2 Jigsaw GPU Encoder
4.3.2.3 Jigsaw GPU Decoder
4.3.2.4 Pipelining
4.3.3 Video Transmission
4.4 Evaluation for Jigsaw
4.4.1 Evaluation Methodology
4.4.2 Micro-benchmarks
4.4.3 System Results
4.4.4 Emulation Results

Chapter 5. Real-Time Deep Video Analytics on Mobile Devices
5.1 Motivation
5.1.1 Deep Model Inference Latency
5.1.2 Motion Estimation Based Tracking
5.1.3 Object Size Changes
5.2 Challenges
5.3 Approach
5.3.1 Reliable Mask Extraction
5.3.2 Object Size Adaptation
5.3.3 Adaptive Inference
5.4 System Implementation
5.4.1 Inference Module
5.4.2 Tracking Module
5.5 Evaluation
5.5.1 Evaluation Methodology
5.5.2 Micro-benchmarks
5.5.3 Single Object Tracking
5.5.4 Robustness to Different Types of Videos
5.5.5 Multi-Object Tracking

Chapter 6. Conclusion

Bibliography

List of Tables

3.1 Decoding Time Lookup Table
3.2 Energy Consumption

4.1 Codecs' encoding and decoding time per frame
4.2 Performance over different GPUs
4.3 Video SSIM under various mobility patterns

5.1 Inference latency for a single frame

List of Figures

3.1 Architecture of Existing 360◦ Streaming Systems
3.2 Decoding time and bandwidth savings
3.3 Video quality of existing approaches
3.4 Design Space of 360◦ Video Streaming Algorithms
3.5 Spatial and temporal splitting of 360◦ chunk
3.6 Correlation between video quality and bitrate
3.7 System Architecture for Rubiks
3.8 Head Movement Prediction Analysis
3.9 Modeling Error
3.10 Performance of 4K videos (8 videos, 10 throughput traces, 80 head movement traces)
3.11 Performance of Rubiks under low throughput
3.12 Performance of Rubiks for high motion videos
3.13 Performance of 8K videos (8 videos, 10 throughput traces, 80 head movement traces)

4.1 Example 60GHz Throughput Traces
4.2 Example WiFi Throughput Traces
4.3 Our layered coding for Live 4K Video Streaming
4.4 Compression Efficiency
4.5 Codec GPU Modules Running Time
4.6 Jigsaw Pipeline
4.7 Video quality vs. number of received layers
4.8 Impacts of using WiGig and WiFi
4.9 Impacts of interface scheduler
4.10 Impacts of inter-frame scheduler
4.11 Impacts of frame deadline
4.12 Performance of Jigsaw under various mobility patterns (HR: High-Richness, LR: Low-Richness)

4.13 Frame quality correlation with throughput
4.14 Frame quality comparison

5.1 Successfully Tracked Frames
5.2 Impacts of object size changes (Red Box: ground-truth result. Green Box: tracking result.)
5.3 Calculating motion from feature maps (Left: raw frames or feature maps. Right: optical flow.)
5.4 Representative mask generated from the intersection between feature-map mask and optical-flow mask
5.5 Ratio of Object Pixels in Masks
5.6 Size Adaptive Box Update
5.7 Representative Mask based Tracking (Green Box: tracking result)
5.8 Stale Inference Update (Green: stale inference. Red: ground-truth. Yellow: updated stale inference.)
5.9 System Architecture for Sight
5.10 Micro-benchmark results over dataset D1
5.11 Performance of Sight over dataset D1
5.12 System latency
5.13 Performance of Sight using the dataset D3
5.14 Sight for multi-object tracking (dataset D2)

Chapter 1

Introduction

The popularity of videos has grown significantly in the past few years. Cisco [21] estimates that video traffic will constitute 82% of all consumer Internet traffic by 2021. Users demand an immersive experience from various video applications such as YouTube, Netflix, gaming, Augmented Reality, and Virtual Reality. There are two major trends in the development of video applications: (i) video streaming aiming for higher-resolution videos and 360◦ videos, both of which call for high bandwidth; even with recent 5G technologies, it is hard to satisfy the bandwidth requirement; and (ii) video analytics, which applies sophisticated computer vision technologies (e.g., object detection, activity analysis) to video data. Beyond viewing videos, video analytics provides a more interactive experience for users. In this dissertation, we propose systematic approaches to improve the performance of both.

1.1 Background

Mobile Video Applications: Mobile devices consume video data in three steps: (i) receiving video data, (ii) decoding and displaying video, and (iii) analyzing the received video content. Video streaming applications are responsible for transmitting high quality video content to users with low latency. Mobile devices need to decode the received video data and display the decoded video on the screen when running video streaming applications. Video analytics enables user interaction with video by running sophisticated computer vision technologies on the received video content. For example, users can place virtual objects into the real scene captured by an AR application.

360◦ Videos: A 360◦ video consists of panoramic video frames. To watch a 360◦ video, a user wears a headset that blocks the outside view so the user focuses only on what is being displayed on the smartphone or VR headset. 360◦ videos are shot using omni-directional cameras or multiple cameras whose images are stitched together. The resulting effect of either approach is a video consisting of spherical images. While watching a 360◦ video, a user views a defined portion of the whole image, usually 110◦ along the X axis and 90◦ along the Y axis. This portion is termed the Field-of-View (FoV). The view automatically adapts to the user's head movement: as the user moves his or her head along the X, Y, or Z axis, the video player automatically updates the FoV.
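To make the FoV geometry concrete, the following minimal Python sketch computes the angular region a viewer sees for a given head orientation; the 110◦ × 90◦ FoV and the simple yaw/pitch model are illustrative assumptions, not the player's actual implementation:

def fov_bounds(yaw, pitch, fov_w=110.0, fov_h=90.0):
    # Angular extent (degrees) of the user's FoV, centered on the current
    # head orientation; yaw wraps around, pitch is clamped to [-90, 90].
    yaw_range = ((yaw - fov_w / 2) % 360.0, (yaw + fov_w / 2) % 360.0)
    pitch_range = (max(pitch - fov_h / 2, -90.0), min(pitch + fov_h / 2, 90.0))
    return yaw_range, pitch_range

print(fov_bounds(30.0, 10.0))   # head turned 30 degrees right, tilted 10 degrees up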

4K Videos: Live 4K videos are generally generated by interactive video applications, such as VR and gaming. They demand not only high resolution but also low latency (e.g., within 60 ms [44, 57, 78]). The delay requirement for streaming a live 4K video at 30 fps means that we should finish encoding, transmitting, and decoding each video frame (4096 × 2160 pixels) within 60 ms [44, 57, 78]. Existing commodity devices are not powerful enough to perform encoding and decoding in real time.

Video Analytics: State-of-the-art video analytics approaches involve running deep models [54, 93, 119, 120]. There are different types of video analytics tasks, for example, tracking the positions of objects in video frames, detecting the types of activities performed by the subjects in the received video frames, and segmenting background and foreground. Executing a deep model on a video frame is defined as inference, which outputs the bounding boxes and labels of objects.
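As an illustration of a single inference step, the following Python sketch runs an off-the-shelf detector from torchvision on one frame; the specific model and input size are placeholders, not the models evaluated in this dissertation:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a pre-trained object detector and switch it to inference mode.
model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

frame = torch.rand(3, 480, 640)      # stand-in for a decoded video frame in [0, 1]
with torch.no_grad():
    detections = model([frame])[0]   # one result dict per input image

# Each detection comes with a bounding box, a class label, and a confidence score.
print(detections["boxes"].shape, detections["labels"], detections["scores"])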

1.2 Motivation

1.2.1 Video Streaming

Higher Video Resolution: The popularity of 4K videos has grown rapidly. Gaming, Virtual Reality (VR) and Augmented Reality (AR) all call for 4K videos, since resolution has a profound impact on the immersive user experience. Compared with regular videos, 4K videos have much higher resolution. On popular video sharing websites like YouTube and Netflix, the highest resolution for regular videos is 1920 × 1080 (1080p) or 1280 × 720 (720p), whereas the 4K resolution is 3840 × 2160. Due to the much higher resolution, transmitting 4K videos demands a high data rate. With existing video codecs (such as H.264 or HEVC), the throughput requirement for transmitting 1080p videos at 30 frames per second (fps) is 3Mbps, while transmitting 30fps 4K videos needs around 16Mbps. This requirement puts significant stress on the network. Even with rapid innovation in networking, it is still challenging to sustain such a high bandwidth requirement.

Fast Video Coding: Coding high-resolution videos is slow on commodity mobile devices due to their limited hardware resources. Existing video codecs (H.264 or HEVC) exploit complex compression algorithms to reduce the video size without significant quality degradation. Even with hardware acceleration for video codecs, existing mobile devices still have slow encoding/decoding speed. Scalable video coding (SVC) is the state-of-the-art approach to guarantee robustness to varying network conditions. It divides video data into multiple layers, and video quality increases with the number of received layers. However, running SVC on commodity mobile devices results in large delay due to its high complexity. Therefore, high-resolution videos demand a light-weight codec that can run on mobile devices in a timely manner.

Immersive 360◦ Videos: A significant recent advance in video technology is Virtual Reality (VR) or 360◦ video. It provides panoramic views to give an immersive user experience. The resolution of 360◦ videos can be as high as 7680 × 4320 (8K), which is even higher than the 4K resolution. Existing systems stream 360◦ videos in a similar way as regular videos, where all data of the panoramic view is transmitted. This is not practical, since the throughput demand of streaming entire 360◦ views can exceed 100Mbps. It is also wasteful, since a user only views a small portion of the 360◦ view. To save bandwidth, recent works propose tile-based streaming, which divides the panoramic view into multiple smaller tiles and streams only the tiles within a user's field of view (FoV) predicted based on the recent head position. Users suffer significant video quality degradation due to head movement prediction error, which is hard to avoid. Moreover, existing tile-based streaming approaches cannot run in real time even on the latest smartphones due to hardware and software limitations. It is very challenging to stream 360◦ content on smartphones while avoiding bandwidth wastage and maintaining high video quality.

1.2.2 Mobile Video Analytics

Video analytics requires more hardware resources than video streaming applications because analyzing video content is a computation intensive task. There is an increasing demand for real-time mobile video analytics (e.g., object tracking, new object detection, scene change detection) since it can enable a wide range of applications, such as VR/AR, cognitive assistance, video surveillance, smart driving, unmanned delivery, and cashier-free stores. To achieve seamless performance for these applications, video analytics must run in real time at 30 frames/second (fps) [48]. This poses an interesting system challenge: how to enable real-time video analytics on mobile devices with limited computation resources?

Video analytics has been a widely studied topic. Many existing works focus on accuracy (e.g., by developing better models). There are many deep models for object detection, including FasterRCNN [120], SSD [93], R-FCN [54], and Yolo [119]. These models have become increasingly complex and require very powerful hardware to run. Video analytics consists of two major tasks [88, 115, 146]: detecting where the object is and recognizing what the object is in video frames. In this dissertation, we focus on detecting the object position in the video.

One way to speed up video analytics is to offload it to an edge/cloud server. For example, recent work [88] runs deep models on edge servers equipped with a powerful Nvidia TITAN XP GPU [9], which draws 30× more power than a mobile GPU (e.g., TX2 [6]). However, it is not always practical to assume a powerful edge server nearby or good network connectivity to reach the cloud server. This is particularly challenging in remote areas or congested regions. Moreover, offloading is expensive and may incur large delay arising from the network and server processing. A large delay means that the analysis results are stale for the current video frame, which translates to low accuracy. In addition, offloading raises privacy concerns. Therefore, it is important to investigate how to run real-time video analytics exclusively on mobile devices.

1.3 Challenges

To improve the performance of mobile video applications, we need to handle the following challenges in video transmission, coding and analysis.

1.3.1 Video Streaming

Unpredictable Throughput Fluctuation: Mobile devices receive video data through wireless links. The rate adaptation algorithms used in existing video streaming applications adapt the video bitrate according to the available bandwidth. However, as many measurement studies have shown, the data rate of wireless links can fluctuate widely and is hard to predict. This makes it challenging to adapt video quality based on throughput prediction.

FoV Constrained Streaming: 360◦ videos are pre-encoded and stored at the video server, so their coding delay comes only from decoding. Streaming all pixels of a 360◦ video is wasteful since the user only views a small portion of the video due to the limited FoV. Decoding all pixels also incurs large decoding delay. To reduce bandwidth wastage and decoding delay, it is critical to decide which portion of each 360◦ video frame to transmit.

Streaming Live Video over WiGig: Live videos cannot be pre-encoded in advance, so the coding delay of live 4K videos includes both encoding and decoding. To remove the coding delay altogether, streaming raw video data is one option, but it requires an extremely high data rate. The recent 60GHz technology can provide higher bandwidth than existing wireless networks. Consider a raw 4K video streamed at 30 fps using a 12-bit YUV color space: without any compression, it requires ∼3 Gbps. Our commodity devices with the latest 60GHz technology can only achieve 2.4 Gbps in the best case. With mobility, the throughput can quickly diminish and sometimes the link can be completely broken. Therefore, we need a light-weight codec to compress live 4K video data. Moreover, 60GHz alone is often insufficient to support 4K video streaming since its data rate may drop by orders of magnitude even with small movement or obstruction. We need an effective way to exploit different wireless links (e.g., WiFi and 60GHz links) to improve the total throughput.

1.3.2 Video Coding

Hardware Resource Constrained Video Coding: While existing video codecs (e.g., H.264 and HEVC) achieve high compression rates, they are too slow for 4K or 360◦ video encoding and decoding due to the high resolution. Fouladi et al. [61] show that the YouTube H.264 encoder takes 36.5 minutes to encode a 15-minute 4K video at 24 fps, which is far too slow. For an 8K 360◦ video, decoding on Android phones takes more than 45ms per frame, which means the user can watch the 360◦ video at only about 22fps, below the 30fps playback rate. Therefore, a fast video coding algorithm is needed to stream 360◦ and 4K videos.

1.3.3 Video Analytics

Noisy Object Motion Estimation: Due to the high hardware resource demand, deep object tracking models run at less than 3 fps on mobile devices [90, 146, 153]. Existing works [88] propose using motion tracking to estimate the object position. Object motion estimation is noisy because it is challenging to separate background pixels from object pixels. Moreover, the object size or shape can change significantly due to the movement of the object or camera. Such changes can result in large error if the motion estimation cannot adapt to the object size robustly.

To solve the above mentioned challenges, we propose a novel layered tile-based streaming framework for 360◦ videos, a new layered coding approach for streaming 4K videos and representative mask based motion tracking for video analytics.

Layered Tile-Based 360◦ Video Streaming: We design a novel tile-based layered encoding framework for 360◦ videos. We exploit the spatial and temporal characteristics of 360◦ videos for encoding. Our approach splits the 360◦ video spatially into tiles and temporally into layers. The client runs an optimization routine to determine the video data that needs to be fetched to optimize user QoE. Using this encoding approach, we can send the video portions that have a high probability of being viewed at a higher quality and the portions that have a lower probability of being viewed at a lower quality. By controlling the amount of data sent, the data can be decoded without rebuffering. Our approach saves significant bandwidth while maximizing the user's QoE and decoding the video in a timely manner.

Layered 4K Video Streaming: We design a novel layered video coding scheme for transmitting high-quality video data to mobile devices with low delay.

• Layered Coding: To handle unpredictable data rates, we propose a simple yet effective layered coding design. It can effectively adapt to varying data rates on demand by first sending the base layer and then opportunistically sending more layers whenever the link allows.

• Fast Video Codec: We efficiently exploit the available hardware resources for our video codec, which incorporates our layered coding design. Compared with the existing default video codecs, our codec is fast to run on commodity mobile devices.

• Video Layer Transmission Optimization: With our layered coding, we need to decide how many layers to transmit in addition to the encoding bitrate. We develop efficient optimization algorithms to determine the number of layers to transmit and the encoding bitrate of those layers (a minimal sketch of the layer selection follows this list).
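As a minimal sketch of the layer selection, the client can greedily add enhancement layers while the per-frame transmission budget allows. The layer sizes, link estimate, and deadline below are hypothetical, and the full algorithm in Chapter 4 also chooses encoding bitrates and schedules data across WiFi and WiGig:

def select_layers(layer_sizes_bits, est_throughput_bps, frame_deadline_s):
    # Always send the base layer (index 0); add enhancement layers while the
    # estimated transmission time still fits within the per-frame deadline.
    budget_bits = est_throughput_bps * frame_deadline_s
    sent_bits = 0
    num_layers = 0
    for size in layer_sizes_bits:
        if num_layers > 0 and sent_bits + size > budget_bits:
            break
        sent_bits += size
        num_layers += 1
    return num_layers

# Example: a 4-layer frame, 400 Mbps estimated link, 30 ms per-frame budget.
print(select_layers([6e6, 3e6, 3e6, 3e6], 400e6, 0.030))   # -> 3 layers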

Robust Real-Time Video Analytics: We use estimated motion to speed up video analytics. Our approach is inspired by existing works [88, 123, 150] but goes beyond them in the following aspects: (i) we derive motion from feature maps instead of raw video frames and show that feature maps give more accurate motion estimation because they effectively filter out the background pixels, (ii) we use a combination of feature maps and motion to efficiently generate a mask for motion estimation and size adaptation, and (iii) we significantly speed up motion estimation so that we can run video analytics exclusively on a mobile device.

Specifically, we generate feature maps from the first convolutional layer of deep models [55, 99, 140]. A single convolutional layer has many convolutional filters, each of which produces a feature map. We design an effective metric to select the feature map that gives high tracking accuracy. Running one filter is much faster than executing the whole model; thus, we can extract the selected feature map within 10ms.

To adapt to changes in the object size, we develop an efficient way of generating a representative mask. The representative mask may only include a subset of object pixels, but it can reliably capture the changes in the overall object shape or size. We generate the mask based on both the feature map and optical flow. Instead of extracting the complete mask of the object, we can generate the representative mask in a few milliseconds. To the best of our knowledge, this is the first work that efficiently adapts the object size for motion based object tracking.
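The following Python sketch illustrates the idea of intersecting a motion mask with a feature-map mask; the thresholds and the Farneback optical-flow choice are illustrative assumptions, not Sight's exact parameters:

import cv2
import numpy as np

def representative_mask(prev_gray, curr_gray, feature_map,
                        flow_thresh=1.0, fmap_thresh=0.5):
    # Optical-flow mask: pixels whose motion magnitude exceeds a threshold.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flow_mask = np.linalg.norm(flow, axis=2) > flow_thresh

    # Feature-map mask: strong activations, resized to the frame resolution.
    fmap = cv2.resize(feature_map, (curr_gray.shape[1], curr_gray.shape[0]))
    fmap_mask = fmap > fmap_thresh * fmap.max()

    # Representative mask: intersection of the two masks.
    return np.logical_and(flow_mask, fmap_mask)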

1.5 Summary of Contributions

We summarize the major contributions of this dissertation as follows.

360◦ Video Streaming:

• We develop a novel tile-based layered coding that exploits spatial and temporal characteristics of 360◦ videos to reduce the decoding overhead while saving bandwidth and accommodating head movement prediction error.

• We implement our 360◦ video streaming scheme Rubiks as an Android app. Extensive experimental results show that Rubiks improves user QoE by 69% and saves bandwidth by 35% over existing approaches for 4K videos. It provides 36% improvement in QoE and 49% in bandwidth savings for 8K videos.

4K Video Streaming:

• We propose a new layered coding design for 4K video streaming. The layered design divides video data into layers. It can effectively adapt to unpredictable data rates on demand by first sending the base layer and then opportunistically sending more layers when the link allows. The layered coding is robust to throughput fluctuation.

• We design optimization algorithms to adapt the number of layers to transmit according to the available throughput. Our algorithms include a new control dimension: which video layers to transmit.

• We implement fast video codecs for our layered coding design. Our video codecs can compress video data and are fast to run.

• We implement our live 4K video streaming scheme Jigsaw on commodity mobile devices. Our evaluation results show that Jigsaw can stream 4K live videos at 30 FPS using WiFi and WiGig under various mobility patterns.

Video Analytics:

• We propose a robust motion estimation scheme. It can efficiently and accurately extract motion from feature maps. By filtering out the background pixels, feature maps provide more reliable motion estimation than the raw video frames. Our scheme can also efficiently adapt to changes in object size/shape by generating a representative mask and using the mask (which may only contain a subset of object pixels) to adjust the bounding box size.

• We implement our video analytics system Sight. It runs locally on mobile devices to perform real-time analytics on 30 fps videos without powerful edge/cloud servers or network connectivity. Sight reduces the average hardware resource usage by 45% for single-object tracking and 32% for multi-object tracking without degrading accuracy.

1.6 Dissertation Outline

Chapter 2 overviews the related work. Chapter 3 describes our 360◦ video streaming approach in detail. Chapter 4 fully explains our live 4K video streaming work. Chapter 5 shows the design of our real-time video analytics system. Chapter 6 concludes this dissertation and describes the future work.

Chapter 2

Related Work

In this chapter, we review existing works related to video streaming algorithms, wireless technologies, video coding and video analytics.

2.1 Video Streaming Algorithms

Video Adaptation for Regular Videos: There has been a lot of recent work on video streaming under limited and fluctuating network throughput [38, 71, 77, 87, 100, 104, 148]. A video is divided into chunks, each of which contains a few seconds of video content. The client adapts the bitrate of video chunks according to the network condition. These works try to maximize the user QoE, which can be defined using multiple metrics: video bitrate, bitrate changes between successive chunks, and rebuffering. Yin et al. [148] propose an MPC-based optimization framework for video streaming. It casts the problem as a utility optimization, where the utility is defined as the weighted sum of the above metrics over the next K video chunks. FESTIVE [77] balances stability and efficiency, and provides fairness among video players by performing randomized chunk scheduling and bitrate selection of future chunks. Pensieve [100] trains a neural network that selects the bitrate of future chunks based on the performance data collected from video players. However, none of these works considers the transmission of high-resolution videos.

Most of these works only investigate video-on-demand (VoD) services in which entire videos are encoded and stored at the server side before users start downloading. Compared with VoD, live video streaming [56, 111, 147] is more delay sensitive. Existing live video streaming approaches stream video content in chunks, and users have to wait a few seconds before a video chunk is ready to play. In our work on 4K video streaming, we focus on supporting live video streaming in which the delay of a video frame is as small as tens of milliseconds. Such live streaming is crucial for interactive applications, like gaming, virtual reality and augmented reality [44, 78].

360◦ Video Streaming Algorithms: Recently, there have been some works targeting 360◦ video content streaming. In a 360◦ video, a user looks at only some portion of the video at any given time, so there is an opportunity to save bandwidth without sacrificing quality. Qian et al. [114] propose a scheme that divides an entire 360◦ frame into several smaller rectangular tiles and only streams the tiles that overlap with the predicted FoV. This approach can lead to rebuffering or a blank screen in case of inaccurate head movement prediction. Hosseini et al. [70] propose an approach where the video is divided into multiple tiles and the tiles that are more likely to be viewed are streamed earlier. Bao et al. [40] propose an optimization framework that takes into account the head movement prediction error and requests some additional tiles to account for the prediction error. However, none of these approaches was implemented on smartphones, so their feasibility and performance on smartphones are unclear. POI360 [144] proposes adapting the compression ratio of video tiles according to the network throughput, but it still suffers from slow decoding since the encoded rate of the tiles does not affect decoding time. Moreover, POI360 is not implemented using the video codecs available on commercial smartphones. Recently, Liu et al. [96] propose using SVC for 360◦ videos. However, SVC is currently not supported on smartphones, so they do not have a real implementation.

2.2 Wireless Technologies

Streaming Regular Videos over Conventional Wireless Networks: Wireless links are not reliable and have large fluctuations in available throughput. Flexcast [37] incorporates a rateless code into the existing video codec such that video quality does not drop drastically when the network condition varies. Softcast [74] exploits 3D DCT to achieve a similar goal. Parcast [94] investigates how to transmit more important video data over more reliable wireless channels. PiStream [145] uses physical layer information from LTE links to enhance video rate adaptation. However, these works do not study high-resolution video streaming (e.g., 4K). Due to the large data size, existing video coding cannot code high-resolution video content in real time. Furion [83] uses multiple codec instances to encode and decode VR video content in parallel, but it supports a much lower resolution than 4K.

Some dedicated hardware can support high-resolution videos in real time [89], but it is generally not available on laptops and mobile devices. Our work focuses on supporting efficient high-resolution video encoding, transmission, and decoding on commodity devices.

Streaming High-Resolution Videos over 60GHz: Owing to the large bandwidth at 60GHz, streaming uncompressed video has become a key application for WiGig [132, 142]. Choi et al. [52] propose a link adaptation policy that allocates different amounts of resources to different data bits in a pixel such that the total amount of allocated resources is minimized while maintaining good video quality. He et al. [68] encode an uncompressed video into multiple descriptions using RS coding; the video quality improves as more descriptions are received. [52, 68] use unequal error protection to protect different bits in a pixel based on importance. Shao et al. [128] compress pixel difference values using run length coding, which is difficult to parallelize since the run length codes of different pixels cannot be known beforehand. Singh et al. [131] propose to partition adjacent pixels into different packets and adapt the number of pixels to send based on throughput estimation, but video quality degrades significantly when throughput drops since there is no compression. Li et al. [86] develop an interface that exposes WiGig and WiFi link information to the application layer so that the video server can determine an appropriate resolution of the video to send. In general, these works do not consider encoding time, which can be significant for 4K videos. They also do not explore how to leverage multiple links for video streaming, which our system addresses.

Other Wireless Approaches for High-Resolution Videos: Developing other high-bandwidth, reliable wireless networks is also an interesting direction for supporting 4K video streaming. A recent router [13] claims to support Gbps-level throughput by exploiting one 2.4GHz and two 5GHz bands. However, it still cannot support raw 4K video transmission. Our system can efficiently fit 4K video into this throughput range.

2.3 Video Coding

Layered Coding: Scalable video coding (SVC) is an extension of the H.264 standard. It divides the video data into a base layer and enhancement layers, and the video quality improves as more enhancement layers are received. SVC uses layered coding to achieve robustness under varying network conditions, but this comes at the cost of high computational complexity, since SVC needs additional prediction mechanisms such as inter-layer prediction [126]. Due to this higher complexity, SVC has rarely been used commercially even though it has been standardized as an H.264 extension [95]. None of today's smartphones has hardware SVC encoders/decoders. Therefore, running SVC on commodity mobile devices incurs large delay.

Video Encoding Schemes for 360◦ Videos: There have been many works (e.g., [46, 69, 75, 97, 133, 139]) on video encoding where some parts of the video, commonly referred to as the Region of Interest (ROI), are encoded at a higher bitrate while other parts are encoded at a lower bitrate. This is done for regular videos to account for the fact that some regions of the video contain more critical or useful information and should be encoded at a higher bitrate. This encoding is not suitable for 360◦ videos since the user can change the FoV, so it does not scale: one would have to handle a large number of possible ROIs. Some works [63, 108, 125] use SVC to encode the ROI with high quality. These works focus on optimizing user experience by exploiting SVC to reduce transmission delay, increase the quality of the region of interest, and avoid rebuffering.

Our work for 360◦ videos is inspired by these works, but differs from them in that it specifically targets 360◦ videos and incorporates tile-based coding with layered coding to achieve efficient decoding, bandwidth savings, and robustness against prediction error.

2.4 Mobile Video Analytics

Object Detection Acceleration: There have been many deep models for object detection (e.g., SSD [93], YOLO [118], R-FCN [54], Faster R-CNN [120], Mask R-CNN [66]). Higher object detection accuracy requires more complex models, which incur large latency on mobile devices with commodity hardware. We divide the existing deep model acceleration techniques into three classes: (i) Deep model compression: Weights in deep models have high redundancy. Compressing weights via pruning and quantization [64] can speed up deep model running time. Liu et al. [92] train a deep network to learn how to compress the layers in a deep model. (ii) Edge server offloading: Glimpse [48] offloads key frames to a nearby edge server which runs the expensive SIFT feature extraction. However, its end-to-end latency can be higher than 400ms because of the large offloading overhead. Liu et al. [88] offload the object detection task to the edge server, but their system needs a powerful edge server to support real-time object tracking. DARE [91] and DeepDecision [115] adapt the system configuration for offloading (e.g., video resolution, frame rate, deep model complexity) to find the best trade-off between object detection accuracy and end-to-end latency. MARVEL [47] solves the object detection problem by matching an offloaded image, captured by the mobile device camera, with a database of images tagged with object labels stored at the server. NeuroSurgeon [79] minimizes latency by splitting the deep model into two parts and running one part on the edge server. (iii) Cache-based acceleration: DeepMon [73] reuses the feature maps of regions similar to those in previous frames. DeepCache [146] finds similar regions using the motion compensation algorithm of the H.264 video codec, so it can also exploit similar regions at different locations.

Deep model compression and caching can be applied to our work to speed up inference. Offloading is not always feasible, as it needs a powerful edge server and good network connectivity, and it also raises privacy concerns. Therefore we focus on supporting real-time video analytics on mobile devices without offloading.

Motion-Assisted Object Tracking: It is challenging to run a deep model at high fps. However, video frames have high temporal locality since objects tend to have continuous motion across consecutive frames. Exploiting temporal information has the potential to reduce the inference frequency. Liu et al. [88] exploit this insight along with offloading to a powerful edge server to enable real-time video analytics. Since the inference runs on the server, its latency is short and the system can afford to run frequent inference. Therefore, it simply uses the average motion of all pixels in the bounding box. But as we show, average motion based tracking is insufficient when the inference is done on a mobile device, which incurs a large delay. In the context of video surveillance, [123, 150] use motion vectors in the H.264/AVC encoded bit-stream to track moving objects. However, using motion vectors does not work when the camera moves. [127, 129] use optical flow to track the movement of objects, but it is very sensitive to object deformation, camera movement, lighting condition changes, etc. Our work goes beyond the existing work by efficiently and reliably generating an object mask to achieve accurate motion estimation and accommodate changes in object size and shape.

Feature Map-Assisted Object Tracking: Convolutional filters [51, 55, 99, 113, 140] in deep models remove noise from the background, so object motion has less noise in the feature maps than in the original frame. Zhu et al. [153] warp feature maps from neighboring frames using optical flow and combine them with the feature maps of the current frame; combining feature maps improves object detection accuracy. Liu et al. [90] incorporate the feature maps from previous frames by adding LSTM layers into existing deep models. However, these works improve tracking accuracy at the cost of slower speed. It remains open how to achieve both high accuracy and fast speed, which is required for real-time video analytics on mobile devices.

Chapter 3

Practical 360◦ Video Streaming for Smartphones

3.1 Background for 360◦ Video Streaming

3.1.1 Existing Streaming Framework

DASH [134] is widely used for video streaming over the Internet due to its simplicity and compatibility with existing CDN servers. In this framework, a video is divided into multiple chunks with an equal play duration (e.g., a few seconds). A 360◦ video chunk is spatially divided into equal size portions, called tiles, generally 15-40 tiles [62]. Each tile is encoded independently at a few bitrates.

In tile based streaming frameworks, the client only requests a subset of tiles according to head movement prediction and throughput estimation. Due to independent encoding, the client can still decode this subset of tiles successfully. The client constructs the 360◦ frame from the decoded tiles and displays the FoV on the screen.

Fig. 3.1 shows the architecture of existing 360◦ video streaming systems like YouTube and Facebook. The user sends video requests to the video server. In the request, the user has to specify which video segment and which bitrate to request. In tile based streaming frameworks, the user also needs to specify the tiles to request. The video server transmits the 360◦ video data to the user, who is connected to the Internet via a WiFi access point. The user wears a mobile VR headset to watch 360◦ videos. After receiving the 360◦ video data, the smartphone displays the video content within the FoV, which is determined by the head orientation. The performance of existing 360◦ streaming systems can be affected by multiple factors, including smartphone computational resources, network throughput, and head movement prediction accuracy.

Figure 3.1: Architecture of Existing 360◦ Streaming Systems

3.1.2 H.264 and HEVC Codecs

H.264 [24] is an industry standard for video compression. Frames are encoded at the macroblock level; a typical macroblock size is 16 × 16 pixels. H.264 uses motion prediction to reduce the size of the encoded data. As frames within a chunk of a few seconds are highly correlated in the temporal domain, this greatly reduces the data size. HEVC [25] is the successor to H.264. It divides a video frame into independent rectangular regions, each of which can be encoded independently. A region is essentially a tile as used in tile based streaming frameworks. Each region is encoded at the 64 × 64 coding tree unit (CTU) level. Due to the larger block size, HEVC achieves higher compression than H.264.

HEVC is more suitable for tile based streaming approaches. All tiles in HEVC are contained in a single encoded video file and can be decoded in parallel using one decoder instance. The video file is still decodable even if we remove some tiles. In contrast, H.264 has to encode tiles into separate video files. If the user requests multiple tiles, H.264 needs to decode multiple video files. The smartphone only allows a small number of concurrent video decoders (e.g., 4 decoders on Samsung S7 and Huawei Mate 9), as explained in Sec. 3.2. When the number of video files is greater than the number of concurrent hardware decoders, some video files have to be decoded one after another, which results in longer decoding delay. Therefore, H.264 is not scalable for tile based streaming. Even for HEVC, our measurement results in Sec. 3.2 show that it cannot decode all video tiles in real time for 8K videos. Thus, it is necessary to explore how to avoid sending all video tiles without significant video quality degradation.

3.1.3 Scalable Video Coding

Scalable Video Coding (SVC) [42] is an extension of H.264. It is a layered scheme where a high quality video bit stream is composed of lower quality subset bit streams. A subset bit stream is obtained by removing some data from the higher quality bit stream such that it can still be played, albeit at a lower spatial and/or temporal resolution. SVC could potentially save bandwidth for tile based streaming by adapting the video quality of each tile based on the likelihood of viewing it. However, Android currently does not support this extension [20].

3.2 Motivation

In this section, we perform extensive measurements to understand the performance of existing 360◦ video streaming approaches on the Samsung S7. We identify several significant limitations of the existing approaches and leverage these insights to develop a practical tile based streaming scheme tailored for smartphones. First, we introduce the media codecs available on Android.

3.2.1 Real-Time Media Codecs

There are two main options for decoding videos on Android: ffmpeg-android and MediaCodec.

ffmpeg-android: This version of ffmpeg is tailored for Android and is completely software based. While ffmpeg supports an unlimited number of threads, it cannot decode videos in real time on smartphones because ffmpeg cannot perform hardware decoding [33]. Instead, it decodes everything in software, which is very slow. For example, decoding a 1-second video chunk with resolution 3840 × 1920 and 4 tiles takes more than 3 seconds, which causes long rebuffering time.

MediaCodec: Android provides the MediaCodec class for developers to encode/decode video data. It can access low-level multimedia infrastructure such as hardware decoders, which decode video much faster than ffmpeg-android. First, the decoder is set up based on the video format, such as H.264 or HEVC. Setting up the decoder enables access to the input and output buffers of the corresponding codec. The data that needs to be decoded is pushed into the input buffers and the decoded data is collected from the output buffers. MediaCodec is the best option for video decoding on Android as it uses dedicated hardware resources. With the HEVC decoder of MediaCodec, decoding a 1-second video chunk with resolution 3840 × 1920 and 36 tiles takes around 0.5 seconds.

Observation: Only hardware-accelerated media codecs can support real-time decoding.

3.2.2 Limitations of Existing Approaches

We focus on MediaCodec as ffmpeg-android is infeasible for real-time decoding. We use the following baselines in our analysis: (i) YouTube [152], which streams all data belonging to the whole 360◦ frames to the client; (ii) Naive Tile-Based, which divides 360◦ frames into 4 tiles and streams all tiles to the client; (iii) FoV-only [114], which divides the video into 36 tiles and only streams the tiles predicted to be in the user's FoV; and (iv) FoV+ [40], which divides the video into 36 tiles and streams data in both the FoV and the surrounding region, where the surrounding region is selected based on the estimated prediction error of the FoV: if the estimated head movement prediction error is e degrees along the X axis, it extends the FoV width on both sides of the X axis by e, and similarly for the Y axis. We quantify the performance using three metrics: (i) Decoding Time, (ii) Bandwidth Savings, and (iii) Video Quality.

In our experiments, we use HEVC as the decoder. A recent measurement study [152] finds that existing 360◦ streaming systems (e.g., YouTube and Oculus) stream the entire 360◦ frames. H.264 can only decode one tile from an input video file, while HEVC can include multiple video tiles in the input file and decode them in parallel. Due to the limited number of concurrent hardware decoders on smartphones, H.264 has to decode some tiles serially when decoding multiple video tiles. HEVC has faster decoding speed in tile based streaming frameworks than H.264 since it can decode all requested video tiles in parallel. Our YouTube baseline follows the existing 360◦ streaming systems by streaming the entire 360◦ frames and uses the faster decoder available on smartphones.

3.2.2.1 Decoding Time

The feasibility of current approaches for streaming high quality 360◦ videos depends on whether they can decode the data and display it in time. To answer this question, we decode 4K and 8K videos for both the YouTube and Naive Tile-Based approaches on a smartphone. The input videos have a frame rate of 30fps and 30 chunks each, where each chunk contains 1 second of video. We run our experiments 5 times for the same video.

The YouTube approach directly encodes the entire 360◦ chunk into a single tile. The Naive Tile-Based approach divides each 360◦ frame into 4 video tiles and encodes them into independent video files; we run 4 video decoders simultaneously to decode the video tiles in parallel. We only show the results of the Naive Tile-Based approach with 4 tiles, since this gives the best performance: further increasing the number of tiles can only be slower because at most 4 decoding threads can run in parallel.

Figure 3.2: Decoding time and bandwidth savings. (a) Decoding Time. (b) Bandwidth Savings.

We first measure the decoding time of 4K 360◦ videos. The resolution of the test video is 3840 × 1920. Fig. 3.2(a) shows that both YouTube and Naive Tile-Based approaches can decode a 4K video chunk in 0.55 seconds on average. So 4K videos can be decoded and displayed in time.

Next, we measure the decoding time of 8K 360◦ videos. The resolution of the test video is 7680 × 3840. Fig. 3.2(a) shows the average decoding time of each chunk. YouTube needs 1.4 sec on average to decode one chunk, which results in rebuffering since the decoding speed cannot keep up with the playback speed. The Naive Tile-Based approach needs 1.3 sec on average to decode one chunk. It speeds up decoding by utilizing parallel threads. However, this is still insufficient to support real-time decoding for 8K videos: using parallel threads cannot reduce the amount of data to read from the video decoder output buffers, which limits the performance gain.

Observation: Traditional decoding and existing tile-based decoding cannot support 8K or higher video resolution.

3.2.2.2 Bandwidth Savings

YouTube wastes a lot of bandwidth because it streams entire frames while a user views only a small portion. FoV-only and FoV+ can save bandwidth. Fig. 3.2(b) shows the bandwidth savings of FoV-only and FoV+ compared with YouTube. We test the bandwidth savings for the same video and 10 different head movement traces. For FoV-only, we use the oracle head movement, calculated from gyroscope readings, to estimate the maximum possible bandwidth savings. The video bitrate is set to a constant value for all chunks and approaches. Tiles with high motion have larger size than those with lower motion, so if a user views the tiles with larger motion, the bandwidth saving is smaller. Since users have different viewing behaviors, we observe different bandwidth savings across head movement traces. The FoV-only approach can save up to 80% bandwidth but may incur significant video quality degradation due to prediction error. The FoV+ approach saves 18% less bandwidth than FoV-only, but incurs smaller degradation in video quality.

Observation: Due to the limited FoV, a significant amount of bandwidth can be saved.

3.2.2.3 Video Quality

To save bandwidth, the tiles outside the predicted FoV are not streamed in the FoV-only approach. We use linear regression to predict head movement. As explained in Sec. 3.6, our predictor uses the head movement in the past 1-second window to predict the head movement for the future 2-second window. When the head movement prediction has large error, the user sees blank areas, which results in a very poor viewing experience. In the FoV+ approach, additional tiles are streamed to account for head movement prediction error, but this error estimate itself can be inaccurate.
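A minimal sketch of such a predictor for a single angle is shown below (ignoring the 360◦ wrap-around of yaw, which a real implementation must handle):

import numpy as np

def predict_head_angle(timestamps, angles, horizon):
    # Fit a line to the orientation samples from the past window (e.g., the
    # last 1 second) and extrapolate 'horizon' seconds beyond the last sample.
    slope, intercept = np.polyfit(timestamps, angles, 1)
    return slope * (timestamps[-1] + horizon) + intercept

# Example: yaw sampled every 100 ms over the past second, predicted 2 s ahead.
t = np.arange(0.0, 1.0, 0.1)
yaw = 5.0 * t + np.random.normal(0.0, 0.5, t.size)   # slow drift plus sensor noise
print(predict_head_angle(t, yaw, 2.0))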

Figure 3.3: Video quality of existing approaches. (a) Video quality; (b) Head prediction error; (c) Head trace (X Axis); (d) Head trace (Y Axis).

We quantify the video quality using SSIM [141], defined as the structural difference between the streamed and original user's FoV. Fig. 3.3(a) shows the SSIM for 10 different head movement traces and an example 4K video. The video bitrate is set to the highest value in our experiments. YouTube can always achieve SSIM close to 1, while the average SSIM of FoV-only and FoV+ is only 0.77 and 0.87, respectively. The average prediction error along the X axis and Y axis for the test head movement traces is 30 and 9 degrees, respectively. On average, our head movement traces have prediction error larger than 50◦ (or 10◦) along the X axis (or Y axis) for around 20% of the time. In the FoV-only approach, we observe that 20% of chunks have 23% blank areas on average along the X axis and 18% along the Y axis. By including extra tiles, the FoV+ approach can reduce the blank areas along the X axis to 14% and along the Y axis to 11% on average. Thus, even sending extra data cannot avoid blank areas, since the head movement prediction error is itself unpredictable. Fig. 3.3(c) and Fig. 3.3(d) show one example head movement trace and its corresponding prediction error. We observe that when the head moves quickly, the prediction error goes up significantly. For example, the prediction error for the X axis increases to 100 degrees at time 22 sec and that for the Y axis increases to 40 degrees at time 12 sec. We find that fast head movement mainly happens when the user randomly explores different scenes in the video or follows an interesting fast-moving object.

Observation: Streaming a few extra tiles is not robust enough to head movement prediction error.

Figure 3.4: Design Space of 360◦ Video Streaming Algorithms (axes: video quality, bandwidth savings, and decoding speed; schemes: YouTube, FoV-only, FoV+, Rubiks, and Optimal).

3.2.3 Insights From Existing Approaches

Figure 3.4 summarizes the existing algorithms. YouTube achieves the highest video quality but at the expense of bandwidth and decoding speed. FoV-only and FoV+ save bandwidth and increase decoding speed, but suffer from degraded video quality. A desirable algorithm should simultaneously optimize all three metrics: bandwidth saving, decoding speed, and video quality.

An important design decision is how to encode 360◦ videos to optimize these three metrics. Such a scheme should (i) adapt the data to stream based on the FoV to save bandwidth, (ii) support fast decoding using a limited number of threads, and (iii) tolerate significant head movement prediction error. (i) suggests a tile-like scheme is desirable. (ii) suggests we do not have the luxury of allocating a tile to each decoding thread, but should use a different task-to-thread assignment. (iii) suggests we should still stream entire video frames, albeit at a lower resolution, in case of unpredictable head movement.

3.3 Challenges

360◦ videos are streamed in a similar way as regular videos: the client requests a short duration of data, typically 1-2 seconds, at a time. However, 360◦ videos have much higher bitrates because the spherical images of 360◦ videos contain more pixels and higher fidelity in order to provide a good viewing experience when watched from a close distance. Typical resolutions are 4K to 8K. Streaming all pixels in 360◦ videos is wasteful since the user only views a small portion of the video due to the limited FoV. Moreover, streaming all pixels also creates a significant burden for a smartphone, which has limited storage, computation resources and power.

One recent approach [114] tailored for 360◦ videos is tile-based streaming. In this approach, each 360◦ frame is divided into smaller non-overlapping rectangular regions called tiles. Each tile can be decoded independently. The client requests those tiles that are expected to be in the FoV using head movement prediction techniques. This reduces the decoding and bandwidth usage at the receiver. However, whenever there is a prediction error, the user either sees part of the screen as blank or experiences rebuffering due to missing data. This can severely degrade the user QoE. Moreover, these works explore tile-based streaming either in simulation or in desktop implementations but not on smartphones. We observed in our experiments that 8K 360◦ videos (which are widely available on video websites like YouTube) streamed using either regular or tile-based streaming techniques cannot be decoded and displayed to the user in time due to resource constraints on smartphones. This limitation primarily stems from how the video data is encoded: encoding produces independently decodable data segments that are very rich in data, so decoding them takes a long time.

3.4 Our Approach

We propose a novel encoding framework for 360◦ videos to simultaneously optimize bandwidth saving, decoding time, and video quality. Each video chunk is divided spatially into tiles and temporally into layers. This two-dimensional splitting allows us to achieve the following two major benefits.

• We can stream different video portions at different bitrates and with different numbers of layers. The video portions with a high probability of viewing are streamed at a higher quality. The ones with a lower chance of viewing are sent at a lower quality instead of not sending them at all. This allows us to save network bandwidth while improving robustness against head movement prediction error.

• By managing the amount of data sent for tiles, we can control the decoding time. For 8K videos, we cannot decode all tiles in time due to hardware constraints, so we can selectively send the tiles with a lower viewing probability to improve efficiency.

The performance of 360◦ video streaming is determined by video coding and video rate adaptation. Below we examine them in turn.

3.4.1 Video Encoding

We propose a tile-based layered video encoding scheme. A 360◦ video chunk is spatially divided into tiles, which are further temporally split into layers. We remove redundant I-frames to eliminate the encoding overhead due to layering.

Spatial Splitting: As shown in Fig. 3.5, in the spatial domain, each 360◦ frame is divided into multiple equal-sized regions, called tiles. These tiles are encoded independently so that each tile can be decoded individually. Each tile is encoded at several bitrates so that a client can control the quality of the tiles.

Temporal Splitting: Each tile consists of N frames distributed across M layers. The base layer includes the first of every M frames, the second layer includes the second of every M frames, and so on. For example, for a video chunk consisting of 16 frames, frames 1, 5, 9, 13 form the base layer; frames 2, 6, 10, 14 form the second layer; frames 3, 7, 11, 15 form the third layer; and frames 4, 8, 12, 16 form the fourth layer. Fig. 3.5 shows how temporal splitting is applied to each tile; we consider the highlighted tile. Each tile can be decomposed into M layers by distributing its N frames as described above. We let n_{tl} denote the t-th tile at the l-th layer. This is the granularity at which we encode video data.
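To make the frame-to-layer assignment concrete, the following Python sketch (our own illustration, not the Rubiks encoder itself) distributes a tile's frames round-robin across M temporal layers, reproducing the example above for M = 4.

```python
def split_into_layers(frames, num_layers=4):
    """Distribute a tile's frames round-robin across temporal layers.

    Layer j (0-indexed) receives frames j, j+M, j+2M, ... so that the
    base layer (j=0) holds frames 1, 5, 9, 13 of a 16-frame chunk.
    """
    layers = [[] for _ in range(num_layers)]
    for idx, frame in enumerate(frames):
        layers[idx % num_layers].append(frame)
    return layers

# Example: a 16-frame chunk, frames numbered 1..16.
chunk = list(range(1, 17))
for j, layer in enumerate(split_into_layers(chunk), start=1):
    print("layer", j, ":", layer)   # layer 1 : [1, 5, 9, 13], ...
```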

Figure 3.5: Spatial and temporal splitting of a 360◦ chunk

Reducing Encoding Overhead: In an encoded video file, the first frame is encoded as an I-frame, which is decoded independently and is large in size. The subsequent frames are encoded as B-frames or P-frames, which reference neighboring frames to significantly reduce the frame size. We generate M independently decodable video files corresponding to the M layers of each chunk, and each layer has a separate I-frame. In comparison, the YouTube approach generates a single I-frame since it encodes all data in a single file.

To eliminate this coding overhead, we remove the I-frames from the enhancement layers as follows. We insert the first frame from the base layer at the beginning of each enhancement layer. After encoding, the I-frame of each enhancement layer can be removed since it is identical to the first frame in the base layer. When decoding the video, we just need to copy the I-frame from the base layer to the beginning of each encoded enhancement layer, thereby reducing the video size by removing the redundant I-frames.
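The sketch below illustrates this idea at a conceptual level, treating an encoded layer as a list of frame-level bitstream units rather than a real HEVC bitstream; the function and variable names are our own.

```python
def strip_redundant_iframe(encoded_enh_layer):
    """Server side: drop the leading I-frame unit from an encoded
    enhancement layer; it is identical to the base layer's I-frame
    because the same first frame was prepended before encoding."""
    return encoded_enh_layer[1:]

def restore_iframe(base_layer, stripped_enh_layer):
    """Client side: copy the base layer's I-frame unit to the front of
    the stripped enhancement layer so it becomes decodable again."""
    return [base_layer[0]] + stripped_enh_layer

# Toy example with frame-level units represented as strings.
base = ["I1", "P5", "P9", "P13"]                # base layer of a 16-frame chunk
enh2 = ["I1", "P2", "P6", "P10", "P14"]         # layer 2 encoded with the prepended frame
sent = strip_redundant_iframe(enh2)             # what is stored and transmitted
decodable = restore_iframe(base, sent)          # reconstructed before decoding
assert decodable == enh2
```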

3.4.2 360◦ Video Rate Adaptation

Despite significant work on rate adaptation, 360◦ video rate adaptation is a new and under-explored topic. Unlike traditional rate adaptation, where the user views the entire frame, a user only views a small portion of a 360◦ video. Therefore, the high-level problem is to select which portion to stream and at what rate, so as to maximize the user QoE. This is challenging because of unpredictable head movement. We use a Model Predictive Control (MPC) based framework [148], which can efficiently optimize the user QoE even when network throughput fluctuation is unpredictable. Our optimization takes the following inputs: the predicted FoV center, the estimated FoV center prediction error, the predicted throughput and the buffer occupancy. It outputs the number of tiles in each layer and the bitrate of each tile. We first introduce our MPC framework, and then describe how to compute each term in the optimization objective.

3.4.2.1 MPC-based Optimization Framework

To handle random throughput fluctuation, our optimization framework optimizes the QoE of multiple chunks in a future time window. Given the predicted network throughput during the next w video chunks, it optimizes the QoE of these w video chunks. The QoE is a function of the bitrate, the number of tiles to download for each layer, and the FoV. This can be formulated as follows:

    \max_{(r_i, e_i),\, i \in [t, t+w-1]} \; \sum_{i=t}^{t+w-1} QoE_i(r_i, e_i, c_i)    (3.1)

where QoE_i denotes the QoE of chunk i, w denotes the optimization window size, r_i denotes the bitrate of the tiles to download for chunk i, and e_i is a tuple whose element e_i^j denotes the number of tiles to download for the j-th layer in the i-th chunk. Since a user's FoV varies across frames in a chunk, we explicitly take that into account by computing the QoE based on c_i^k = (x_i^k, y_i^k), which denotes the X and Y coordinates of the FoV center of frame k in chunk i. We search for the (r_i, e_i) that maximizes the objective within the optimization window, and then request the data for chunk t according to the optimal solution. In the next interval, we move the optimization window forward to [t + 1, t + w] to optimize the next chunk t + 1.
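A minimal sketch of this rolling-horizon search is shown below; the helper `qoe_fn` and the candidate sets are assumptions, and for simplicity the same decision is applied to every chunk in the window (cf. the temporal constraint in Sec. 3.4.2.5).

```python
import itertools

def mpc_select(chunk_id, window, candidate_rates, candidate_tiles,
               predicted_fov, qoe_fn):
    """Pick a (rate, tile-count tuple) for chunk `chunk_id` by maximizing
    the summed QoE over the next `window` chunks.  `qoe_fn(i, r, e, fov)`
    is assumed to return the QoE of chunk i under decision (r, e)."""
    best_decision, best_qoe = None, float("-inf")
    for r, e in itertools.product(candidate_rates, candidate_tiles):
        total = sum(qoe_fn(i, r, e, predicted_fov[i])
                    for i in range(chunk_id, chunk_id + window))
        if total > best_qoe:
            best_decision, best_qoe = (r, e), total
    return best_decision  # only chunk `chunk_id` is requested with this decision
```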

3.4.2.2 User QoE

Next, we define the user QoE metric. It is widely recognized that the user perceived QoE for a video chunk is determined by the following factors: video quality, quality changes, and rebuffering time [100, 136, 148].

Video quality: Each video chunk has K frames. The quality of frame k in chunk i is a function of the bitrate r_i, the number of tiles to download e_i, and the FoV center c_i^k. We let h(r_i, e_i, c_i^k) denote this frame quality. By averaging the quality across all frames in the current chunk, we get the quality of chunk i as follows:

    f_1(r_i, e_i, c_i) = \frac{1}{K} \sum_{k=1}^{K} h(r_i, e_i, c_i^k)    (3.2)

Note that different frames may have different FoVs, so their quality is determined by their corresponding FoV center c_i^k.

Quality changes: The video quality change between two consecutive chunks is defined as

    f_2(r_i, r_{i-1}, e_i, e_{i-1}, c_i, c_{i-1}) = |f_1(r_i, e_i, c_i) - f_1(r_{i-1}, e_{i-1}, c_{i-1})|    (3.3)

where (r_i, e_i) and (r_{i-1}, e_{i-1}) represent the bitrates and numbers of tiles to download for chunk i and chunk i-1, respectively.

Rebuffering: To compute the rebuffering time, we observe that the chunk size depends on the bitrate and the set of tiles downloaded. Let v_i(r_i, e_i) denote the chunk size. We start requesting the next chunk after the previous chunk has been completely downloaded. The buffer occupancy has a unit of seconds. Let B_i denote the buffer occupancy at the time of requesting chunk i. Each chunk contains L seconds of video data. Let W_i denote the predicted network throughput while downloading chunk i. Then, the buffer occupancy when requesting chunk i+1 can be calculated as:

    B_{i+1} = \max\left(B_i - \frac{v_i(r_i, e_i)}{W_i}, 0\right) + L    (3.4)

The first term indicates that we play v_i(r_i, e_i)/W_i seconds of video data while downloading chunk i. Afterwards, L seconds of video data are added to the buffer. The rebuffering time of chunk i is

    f_3(r_i, e_i) = \max\left(\frac{v_i(r_i, e_i)}{W_i} - B_i, 0\right) + \max(\tau_i - L, 0)    (3.5)

where \tau_i denotes the decoding time of chunk i and is derived from our measurements. The first term denotes the rebuffering time incurred due to slow downloading and the second term denotes the rebuffering time incurred due to slow decoding. The expression reflects the fact that chunk i's decoding starts after everything ahead of the chunk has finished playing out. Note that we can start playing a chunk even if it is only partially decoded.

Putting them together, we compute the QoE of chunk i as follows:

    QoE_i(r_i, e_i, c_i) = \alpha f_1(r_i, e_i, c_i) - \beta f_2(r_i, r_{i-1}, e_i, e_{i-1}, c_i, c_{i-1}) - \gamma f_3(r_i, e_i)    (3.6)

where \alpha, \beta and \gamma are the weights of video quality, quality changes and rebuffering, respectively. The latter two terms are negative since we want to minimize them.
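The sketch below puts Equations (3.2)-(3.6) together in Python. It is illustrative only: `tile_quality` maps each tile to a quality score and stands in for the per-tile quality q(·) defined later in Eq. (3.7), and all inputs (FoV tile sets, throughput, decoding time) are assumed to be given.

```python
def chunk_quality(frame_fovs, tile_quality):
    """f1 (Eq. 3.2): average per-frame quality; each frame's quality is
    the mean quality of the tiles inside its FoV (cf. Eq. 3.7)."""
    per_frame = [sum(tile_quality[t] for t in fov) / len(fov)
                 for fov in frame_fovs]
    return sum(per_frame) / len(per_frame)

def quality_change(q_cur, q_prev):
    """f2 (Eq. 3.3): absolute quality change between consecutive chunks."""
    return abs(q_cur - q_prev)

def next_buffer(buffer_sec, chunk_bytes, throughput, L):
    """Buffer update (Eq. 3.4): drain while downloading, then add L sec."""
    return max(buffer_sec - chunk_bytes / throughput, 0.0) + L

def rebuffer_time(chunk_bytes, throughput, buffer_sec, decode_sec, L):
    """f3 (Eq. 3.5): rebuffering due to slow downloading plus slow decoding."""
    download_sec = chunk_bytes / throughput
    return max(download_sec - buffer_sec, 0.0) + max(decode_sec - L, 0.0)

def qoe(q_cur, q_prev, rebuf, alpha=1.0, beta=0.5, gamma=1.0):
    """Eq. 3.6: weighted combination (weights as in our evaluation)."""
    return alpha * q_cur - beta * quality_change(q_cur, q_prev) - gamma * rebuf
```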

3.4.2.3 Estimate Video Quality

The first and second terms in the user QoE are determined by the video quality. To support efficient optimization, we need to quickly compute the video quality for a given streaming strategy. That is, for a given FoV, we need to determine the video quality of streaming a selected set of tiles in all layers and bitrates.

Video Quality Metric Approximation: In our optimization problem, we need to estimate the video quality of the predicted FoV. Exactly computing the video quality metric for every candidate decision in the optimization is too expensive. Moreover, since user head movement is not known in advance, computing video quality offline would require computing it for all possible FoVs, which is also too expensive. We therefore develop an efficient scheme to approximate the video quality metric.

Before we introduce our algorithm, we first describe how the video is constructed from the various layers. For each tile, we extract the data from all layers associated with the tile. As described in Section 3.4.1, we divide videos temporally into layers, and layer j corresponds to the j-th frame in every group of 4 frames. Therefore, given a tile that has k layers, we put the downloaded frames of these k layers at the corresponding positions and fill in each missing frame by duplicating the most recent received frame. For example, if a tile only has the base layer, we duplicate the base-layer frame 3 times in every group of 4 frames. If a tile has the first 3 layers, we use the data from layers 1, 2 and 3 to form the first three frames and duplicate the third frame to form frame 4 of every group. According to the above video construction, we derive the following metric based on the observation that there exists a strong correlation between video quality and bitrate. This indicates an opportunity to use the video bitrate in the optimization. The quantization parameter (QP) [26] is used by HEVC to control the video bitrate; a larger QP indicates a lower bitrate. The quality of frame k in chunk i, denoted as h(r_i, e_i, c_i^k), is defined as follows:

    h(r_i, e_i, c_i^k) = \frac{1}{|FoV(c_i^k, e_i)|} \sum_{l \in FoV(c_i^k, e_i)} q(r_i^l, e_i)    (3.7)

where FoV(c_i^k, e_i) denotes the set of tiles within the FoV centered at c_i^k and q(r_i^l, e_i) represents the quality of tile l in the FoV. h(r_i, e_i, c_i^k) averages the quality over all tiles in the FoV, and the quality of each tile is determined by the number of layers streamed for the tile and the data rate at which it is streamed.
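The frame-construction rule above (duplicating the most recent received frame to fill the positions of missing layers) can be written as a small routine; the sketch below is our own illustration for M = 4.

```python
def reconstruct_tile_frames(received_layers, num_layers=4):
    """Rebuild a tile's frame sequence from the layers that arrived.

    received_layers[j] is the frame list of layer j+1 (or None if that
    layer was not downloaded).  Missing positions in each group of
    `num_layers` frames are filled by duplicating the latest received frame.
    """
    group_count = len(received_layers[0])        # base layer is always present
    frames = []
    for g in range(group_count):
        last = None
        for j in range(num_layers):
            layer = received_layers[j] if j < len(received_layers) else None
            if layer is not None:
                last = layer[g]
            frames.append(last)                  # duplicate when the layer is missing
    return frames

# Example: only the base layer and layer 2 were downloaded for this tile.
base = ["f1", "f5", "f9", "f13"]
layer2 = ["f2", "f6", "f10", "f14"]
print(reconstruct_tile_frames([base, layer2, None, None]))
# -> ['f1', 'f2', 'f2', 'f2', 'f5', 'f6', 'f6', 'f6', ...]
```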

Figure 3.6: Correlation between video quality (SSIM) and bitrate (QP). (a) 8K Videos; (b) 4K Videos.

The tiles in the predicted FoV have the highest probability of being viewed. Therefore, we stream all layers of the tiles in the predicted FoV, and evaluate q(r_i^l, e_i) assuming 4 layers.

We study the correlation between r_i^l and SSIM. We set r_i^l to different QP values and fix the number of layers to 4. The input video is divided into 36 tiles. We set the FoV center to the center of each video tile to calculate the video quality under 36 different FoVs. The FoV quality is the average SSIM of the same FoV across all video chunks. Fig. 3.6 shows the correlation between the average FoV quality and QP for both 4K and 8K videos. We observe that the average FoV quality decreases linearly with the normalized video QP. The average correlation coefficient among all test videos is 0.98. We approximate the FoV quality using -0.004 × QP + b. From Fig. 3.6, we can see that the value of b varies across different videos. However, b remains constant for all chunks of the same video. We set b to 0 since removing a constant from the objective function does not affect the optimization solution.
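As a concrete illustration, the following sketch evaluates Eq. (3.7) with the linear SSIM-QP approximation above; the per-tile QP assignment and tile indexing are our own simplifications.

```python
def tile_quality(qp, slope=-0.004, b=0.0):
    """Approximate a tile's quality from its quantization parameter (QP).
    b is a per-video constant; it is set to 0 because a constant offset
    does not change the optimization result."""
    return slope * qp + b

def fov_quality(fov_tiles, tile_qp):
    """Eq. (3.7): average approximate quality of the tiles inside the FoV."""
    return sum(tile_quality(tile_qp[t]) for t in fov_tiles) / len(fov_tiles)

# Example: 4 FoV tiles streamed at QP 22 and the remaining tiles at QP 37.
tile_qp = {t: 22 for t in range(4)}
tile_qp.update({t: 37 for t in range(4, 36)})
print(fov_quality(range(4), tile_qp))   # -> -0.088 (a relative quality score)
```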

3.4.2.4 Decoding Time

The third term in the user QoE is the rebuffering time, which is affected by both the downloading time and the decoding time. Existing rate adaptation schemes ignore the decoding time and only use the downloading time to determine the rebuffering time. This is acceptable for regular videos with much fewer pixels and for fast desktops. But the decoding time for 360◦ videos on smartphones can be significant and sometimes exceeds the downloading time. Therefore, it is important to estimate the decoding time for different streaming strategies to support the optimization.

Decoding time depends on the number of tiles in each layer. Moreover, there is also variation in decoding time even when decoding the same tile configuration due to other competing apps on a smartphone. We model this decoding time and take into account the variation.

To model the decoding time, we measure the decoding time for 8 videos by varying the bitrate and the number of tiles in each layer. We decode each configuration 3 times and record the results. Since we have 4 threads, each measurement has 4 decoding time values, and the overall decoding time is dictated by the thread that takes the longest time. Each thread decodes one layer. An underlying assumption here is that all decoding threads start at the same time. We observe that there is no significant variation in the start time of these threads, so this assumption works well in practice.

We implement a simple table lookup for the decoding time based on measurements, where the table is indexed by (# tiles layer-2, # tiles layer-3, # tiles layer-4). We do not consider the number of tiles in the base layer as it always contains all tiles. Moreover, we also do not consider the video bitrate because we observe that it does not impact the decoding time significantly. This is because the bitrate only affects the quantization level, while the video decoding complexity mainly depends on the resolution of the input video. The decoding time entries in the table are populated by averaging the maximum decoding time of each instance over all measurement sets. We use a simple table lookup for the decoding time because the variation in decoding time for the same configuration is not large (e.g., within 7% and 6% on average for the Samsung S7 and Huawei Mate9, respectively).
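A minimal version of this lookup could look as follows; the values are a few Samsung S7 entries from Table 3.1, and the 0.1 sec safety margin anticipates the modeling error discussed in Sec. 3.6.2.2.

```python
# Decoding-time lookup keyed by (# tiles in layer 2, layer 3, layer 4);
# the base layer always contains all 36 tiles.  Values are averaged
# measurements in seconds (Samsung S7 entries from Table 3.1).
DECODE_TIME_S7 = {
    (16, 12, 9): 0.71,
    (20, 12, 9): 0.79,
    (25, 16, 9): 0.91,
    (25, 16, 12): 0.95,
    (25, 20, 20): 1.05,
}

SAFETY_MARGIN = 0.1  # sec, absorbs the modeling error (Sec. 3.6.2.2)

def estimated_decode_time(tiles_per_enh_layer, table=DECODE_TIME_S7):
    """Return the predicted decoding time for a tile configuration,
    inflated by a small margin before it is used to estimate rebuffering."""
    return table[tuple(tiles_per_enh_layer)] + SAFETY_MARGIN

print(estimated_decode_time((25, 16, 12)))   # -> 1.05
```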

3.4.2.5 Improving Efficiency

The large search space poses a significant challenge for real-time optimization. To support efficient optimization, we identify the following important constraints that can be used to prune the search space.

Constraints on the numbers of tiles: e_i^j ≥ e_i^{j'} for any layers j < j'. This is intuitive, as the lower-layer tiles should cover no smaller an area in order to tolerate prediction errors.

Constraints on the bitrates: r_i^l ≥ r_i^{l'} for any tiles l ∈ FoV(c_i^k, e_i) and l' ∉ FoV(c_i^k, e_i), where r_i^l denotes the bitrate of tile l for chunk i. This means that tiles outside the predicted FoV should have no higher bitrate, since they are less likely to be viewed.

Temporal constraints: When the throughput is stable over the optimization window (i.e., there is no significant increasing or decreasing trend), all future chunks have the same streaming strategy (i.e., (r_i, e_i) = (r_{i'}, e_{i'}), where i and i' are any of the future w chunks).
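The sketch below shows how such constraints could be used to prune candidate decisions before their QoE is evaluated; it is a simplified check with our own variable names, where bitrates are represented as quality levels (higher is better).

```python
def satisfies_constraints(tiles_per_layer, rate_in_fov, rate_outside_fov):
    """Return True only for decisions that respect the pruning rules:
    lower layers cover at least as many tiles as higher layers, and tiles
    outside the predicted FoV get no higher bitrate than tiles inside it."""
    monotone_tiles = all(tiles_per_layer[j] >= tiles_per_layer[j + 1]
                         for j in range(len(tiles_per_layer) - 1))
    monotone_rates = rate_in_fov >= rate_outside_fov
    return monotone_tiles and monotone_rates

# Example: 36/25/16/12 tiles across the four layers with FoV tiles at a
# higher bitrate level than the remaining tiles is a valid candidate.
print(satisfies_constraints((36, 25, 16, 12), rate_in_fov=3, rate_outside_fov=1))  # True
print(satisfies_constraints((36, 16, 25, 12), rate_in_fov=3, rate_outside_fov=1))  # False
```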

3.5 System Design for Rubiks

3.5.1 System Architecture

Fig. 3.7 shows the two major components of our system: (i) the Client side estimates the predicted FoV and network throughput, runs the optimization, and generates requests accordingly; (ii) the Server side handles the video encoding (i.e., spatially partitioning into tiles and temporally splitting into layers) and streams data according to the client requests.

Figure 3.7: System Architecture for Rubiks

The Client side runs the optimization, which takes the head movement prediction of the user, the prediction error, the playback buffer and the network throughput as input, and outputs the data that needs to be requested next. The objective is to maximize the user QoE while taking into account the network throughput and the decoding time of incoming data.

The red arrows in Fig. 3.7 indicate the complete process from video chunk request generation to chunk playback.

3.5.2 Server Side

Fig. 3.7 shows the different modules of our server. We use the standard equirectangular projection to store raw 360◦ frames, which is also used by YouTube [35]. There are other ways to store raw 360◦ frames, such as the cubemap [22] proposed by Facebook. The cubemap projection tries to reduce the size of the raw 360◦ video without degrading video quality. Rubiks focuses on how to transmit tiles in the projected 360◦ frames. Projection methods cannot speed up video decoding since they do not reduce the video resolution, which determines the decoding speed.

Video Layer Extractor and Video Tile Extractor divide the video data spatially and temporally as described in Sec. 3.4. We use 36 tiles and 4 layers in our implementation. We use an open-source HEVC encoder kvazaar [29] for encoding. We let kvazaar restrict motion compensation within video tiles such that each tile can be decoded independently. Encoded video data is stored in a video database.

When the video request handler receives a request, it queries the video database to find the requested tiles. The video database sends the requested tiles to the tile merger to generate a merged chunk for each layer. We spatially merge the independently decodable tiles from the same layer into a video chunk file. The client can decode the portion of the 360◦ view covered by the tiles contained in the merged video chunk. Since the client needs the size of the video tiles to optimize video requests, the video request handler sends the tile sizes as metadata before sending the encoded video data.

3.5.3 Client Side

As shown in Fig. 3.7, the client first predicts head movement, and uses this information along with the network throughput to perform 360◦ video rate adaptation. Then it requests the corresponding video from the server, decodes each layer, merges all layers, and finally plays the resulting video. Next we describe each module.

Tracking and predicting head position: We need head movement prediction to determine which part of the video the user is going to view in the next few seconds. When watching a 360◦ video, the user's head position can be tracked using the gyroscope, which is widely available on smartphones. Note that the head position estimated from the gyroscope readings is considered the ground-truth head position in our system. We can then use this head position to get the center of the FoV. The X-axis can go from -180◦ to 180◦, while the Y-axis and Z-axis can go from -90◦ to 90◦. Head movement exhibits strong auto-correlation on small time scales [40], so we use past head motion to predict future head motion.

We collect head movement traces from 20 users for 10 videos, each lasting around 30 seconds. We randomly select half of the head movement traces as the training data and use the other half for testing. The sampling interval of the gyroscope measurements is set to 0.1 sec. We use least squares to solve Y = AX, where X is the past head movement and Y is the future movement. We learn A using the training data X and Y. We then apply the learned A and the past head movement X to predict the future movement Y. Moreover, we also estimate the error of our head movement prediction using least squares and use this error to determine the additional tiles to request for robustness. We train a separate model for each of the three axes. In our evaluation, we use the past 1-second head movement to predict the future 2-second movement. In our system, the time to predict the head position and error is only 2 ms.
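The prediction step amounts to an ordinary least-squares fit per axis. The numpy sketch below shows the idea in a row-vector form with synthetic data; it is our own simplification rather than the exact Rubiks implementation.

```python
import numpy as np

def fit_predictor(past_windows, future_windows):
    """Learn A in Y = A X by least squares (one axis).
    past_windows:   (n_samples, n_past)   past head positions
    future_windows: (n_samples, n_future) corresponding future positions
    """
    A, *_ = np.linalg.lstsq(past_windows, future_windows, rcond=None)
    return A            # shape (n_past, n_future)

def predict(A, past_window):
    """Predict the future positions from the most recent past window."""
    return past_window @ A

# Toy example: 1-sec history (10 samples at 0.1 s) -> 2-sec future (20 samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                # synthetic training histories
true_A = rng.normal(size=(10, 20))
Y = X @ true_A + 0.01 * rng.normal(size=(500, 20))
A = fit_predictor(X, Y)
print(predict(A, X[0]).shape)                 # -> (20,)
```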

Network throughput predictor: The client continuously monitors the network throughput when downloading video data. It records the network throughput in the previous 10-sec window, and uses the harmonic-mean predictor [77] to predict the network throughput in the next optimization window.
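A harmonic-mean predictor over the recent throughput samples can be written in a few lines; the sketch below is illustrative and the sample values are made up.

```python
def harmonic_mean_predictor(recent_throughputs):
    """Predict the next-window throughput as the harmonic mean of the
    samples observed in the previous window; the harmonic mean is robust
    to occasional throughput spikes."""
    samples = [t for t in recent_throughputs if t > 0]
    if not samples:
        return 0.0
    return len(samples) / sum(1.0 / t for t in samples)

print(harmonic_mean_predictor([20.0, 25.0, 5.0, 22.0]))  # ~11.9 (Mbps)
```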

Video request optimizer: Given the predicted head position and throughput, the optimization searches for the optimal decision. The decision specifies the set of tiles to download from each layer and their corresponding bitrates. We implement the optimization routine using the Android JNI since it is much faster than plain Java. The optimization window size w is set to 3. Due to the small search space, the optimization finishes within 12 ms on average.

Video downloader: It maintains HTTP connections with the server to download video. The optimization results are used to construct the HTTP requests.

Video decoder: We exploit the hardware-accelerated Android MediaCodec to decode video data. Four parallel video decoders are initialized before we start playing the video. Note that four threads are the maximum number of concurrent decoders we can run due to limited hardware resources. Each video layer has one merged video chunk file, so each decoder decodes all the requested tiles of one layer. When decoding a video chunk file, the corresponding video decoder runs as a background thread, which is implemented using an Android AsyncTask. Decoding has to run in the background to avoid blocking video playback in the UI thread.

Frame constructor: 360◦ frames are reconstructed from the downloaded tiles of each layer.

Video Player: The 360◦ video frames produced by the frame constructor are rendered through Android OpenGL ES, which uses the GPU to render video frames. The head monitor tells the video player which portion of the 360◦ frame should be displayed.

3.6 Evaluation for Rubiks

We implement Rubiks as an Android app and run our experiments on the Samsung Galaxy S7 and Huawei Mate9. We show that Rubiks can not only support 8K videos on smartphones, which is infeasible for the existing tile-based streaming approaches, but also enhance user experience and save bandwidth for both 4K and 8K videos.

3.6.1 Evaluation Methodology

In this section, we first introduce the experiment setup. Then, we explain the experiment configuration.

3.6.1.1 Experiment Setup

We run our video server on a laptop equipped with an Intel AC-8260 wireless NIC. The client runs on a Samsung Galaxy S7 and a Huawei Mate 9. We configure the laptop as an access point so that the smartphone has a wireless connection to the video server through WiFi. The smartphone remains close (e.g., <1 m) to the laptop so that it always has a stable wireless connection to the laptop. We use tc [30] to control the throughput between the smartphone and the laptop. For trace-driven experiments, the video player renders the video according to the user FoV in the head movement traces. We quantify the performance of Rubiks using the user QoE as we vary the videos, head movement traces and network throughput traces. We also examine the individual terms in the QoE metric, including video quality, video quality changes, rebuffering and bandwidth savings.

3.6.1.2 Experiment Settings

Video traces: We test the performance of our system on both 4K and 8K videos. We download 16 videos from YouTube: eight 4K videos and eight 8K videos. The resolution of the 4K videos is 3840 × 1920, while the 8K resolution is 7680 × 3840. Each video is divided into 30 video chunks, where each chunk has 32 frames. Among the 4K videos, 4 have fast motion (e.g., videos of football games [16], roller coasters [17], etc.), while the other 4 have slow motion (e.g., videos of an underwater scene [19], sailing [18], etc.). The 8K videos have similar motion characteristics.

A video chunk is further divided into 4 layers. Each layer divides a 360◦ frame into 6 × 6 uniform tiles. For 4K videos, the resolution of a video tile is 640 × 320, while a video tile from an 8K video has resolution 1280 × 640. We use the Quantization Parameter (QP) to specify the encoding bitrate of the videos. The QP value is chosen from 22, 27, 32, 37 and 42, as recommended by recent work [62].

Throughput traces: We select 10 throughput traces from the dataset HSDPA [27] with varying throughput patterns. To ensure sufficient throughput to support 4K and 8K videos, we scale up the traces according to the bitrate of the test videos. For 4K videos, we scale the traces to have average throughput in the range 0.15 MBps-2.1 MBps. For 8K, we scale to the range 3 MBps-27 MBps.

Head movement traces: We collect real head movement traces from users watching our test 360◦ videos with a Samsung Gear VR headset. For each video, we collect 20 head movement traces. We randomly select half of the traces to learn the weights of the linear predictor. The trained linear predictor is used by the head predictor module to estimate the head movement of users while watching a 360◦ video. We use the other half of the traces as realistic user head movement behaviors to evaluate the performance of Rubiks.

Video quality metric: In our experimental results, we report the actual video quality via SSIM [141] since it has a high correlation with actual user QoE. It is defined as the structural difference between the streamed and actual FoV. A higher SSIM indicates a higher user QoE.

QoE weights: We set the weights in the QoE definition as follows: α = 1, β = 0.5 and γ = 1, which is a commonly used setting in existing works [100, 136, 148].

3.6.2 Micro Benchmarks

In this section, we quantify the head movement prediction error and decoding time modeling error.

3.6.2.1 Head Movement Prediction Error

We want to understand how the head movement prediction error changes when we vary (i) the future window in which to predict head movement, called the prediction window, (ii) the algorithm used for prediction, and (iii) the amount of historical information used for prediction. We compare two algorithms: neural networks and linear regression. In our analysis, we use 100 head movement traces, collected from 10 users by showing them 10 videos. We train the model using 30% of the data and test on the remaining 70%. Only X- and Y-axis movement is considered because most of the head movement is along these dimensions.

Figure 3.8(a) shows how the prediction error changes along the X and Y axes when we vary the prediction window size. As expected, the prediction error increases with the prediction window size. For the 2-sec prediction window that we use, the average prediction errors along the X and Y axes are 25◦ and 8◦, respectively, and the 99th percentile errors are 60◦ and 20◦, respectively. Traditional tile-based streaming suffers from a poor viewing experience when the prediction error is very large, because the user either sees a black screen or waits for the missing data to be fetched, which incurs rebuffering. In our traces, the maximum error along the X-axis is 170◦ and along the Y-axis is 69◦. With Rubiks, however, the user can still see the content outside the predicted FoV, so it is more robust to large head movement prediction errors.

Figure 3.8: Head Movement Prediction Analysis. (a) Vary Prediction Window; (b) Regression vs. Neural Net.

We compare linear regression with a non-linear modeling tool: neural networks. We train a neural network with 3 hidden layers, each containing 30 neurons. As shown in Figure 3.8(b), there is no performance difference between neural networks and linear regression, so we opt for linear regression for its simplicity. We also vary the historical window size while fixing the prediction window to 1 sec. We observe that the past 0.5 seconds is enough to predict the future head movement because head movements in the distant past are not highly correlated with future movement. The distant head position variables are given low weights, so we use the past 1 sec of historical information.

3.6.2.2 Decoding Time Modeling

We evaluate how the decoding time is affected when we use different tile configurations. This helps estimate the maximum number of tiles that can be decoded in time. Moreover, we also quantify the accuracy of our model. We measure the decoding time of eight 8K videos. It is not necessary to model the decoding time for 4K videos, since all of the tested algorithms can decode them in real time. For tile configurations, we decode all tiles for the base layer. The number of tiles for each enhancement layer can take any value from the list {9, 12, 16, 20, 25, 30}. The test videos have 5 different bitrates. For a specific bitrate and tile configuration, we run the decoding time experiment 3 times. We avoid measuring the decoding time of tile configurations whose lower layers have fewer tiles than higher layers since that is not practical.

Table 3.1 shows a few example entries in the decoding time lookup table trained on the Samsung S7 and Huawei Mate9. Note that the tile configuration tuple indicates the number of tiles from layers 2, 3 and 4, respectively; the number of tiles from layer 1 is fixed at 36. There are 56 entries in the lookup table. We populate the lookup table by averaging the decoding time across videos with the same bitrate and tile configuration. We observe that the maximum number of tiles which can be decoded in real time is (25, 20, 20). To evaluate the impact of newer hardware on decoding time, we also include the decoding time of the Samsung S8, which is equipped with more recent hardware. We find that the S8 has very similar decoding time to the Mate9. In Sec. 3.6.4, we will discuss how Rubiks achieves significant improvements in video quality and bandwidth savings compared with existing state-of-the-art algorithms, in addition to speeding up decoding.

Figure 3.9: Modeling Error (CDF of the decoding time error for the same/different bitrate and same/different video cases).

Tile Conf.   16,12,9   20,12,9   25,16,9   25,16,12   25,20,20
S7           0.71s     0.79s     0.91s     0.95s      1.05s
Mate9        0.68s     0.75s     0.87s     0.90s      1.03s
S8           0.66s     0.74s     0.85s     0.89s      1.02s

Table 3.1: Decoding Time Lookup Table

Fig. 3.9 shows the CDF of the decoding time modeling error for the configuration (25, 16, 12) in the following cases: (1) applying the model to the same video at the same bitrate, (2) applying the model to the same video at a different bitrate, (3) applying the model to a different video at the same bitrate, and (4) applying the model to a different video at a different bitrate. We build the lookup table from one video to test the modeling accuracy across different videos. We can see that the 90th percentile error for all cases is less than 0.1 sec. To handle this modeling error, we inflate the decoding time by 0.1 sec when the optimization routine estimates the rebuffering time based on the decoding time for a given strategy. Since the cross-video and cross-bitrate cases do not increase the modeling error, we can populate the lookup table with similar modeling error as shown in Fig. 3.9 by measuring the decoding time for a single bitrate and one 30-sec video. Thus, it is easy to generate a lookup table for a new phone within half an hour. We can generate decoding tables for different phones and store them in the app database. A user downloads all these tables alongside the app, and the app then chooses the appropriate table based on the user's smartphone model.

Figure 3.10: Performance of 4K videos (8 videos, 10 throughput traces, 80 head movement traces). (a) QoE; (b) Video Quality; (c) Quality Changes; (d) Rebuffering; (e) Bandwidth Savings.

3.6.3 System Results

In this section, we evaluate our system. We compare it against the following three baseline schemes: (i) YouTube [152]: streaming the 360◦ video as a single-tile video; (ii) FoV-only [114]: the video is divided into 6 × 6 uniform tiles and only the tiles in the predicted FoV are streamed; (iii) FoV+ [40]: we enlarge the predicted FoV to include the surrounding parts according to the prediction error and stream the content in the enlarged FoV.

3.6.3.1 Rubiks for 4K videos

User QoE: Fig. 3.10(a) shows that on average Rubiks outperforms YouTube, FoV-only and FoV+ by 14%, 69%, and 26%, respectively. All schemes can decode 4K videos in real time. Rubiks improves QoE over YouTube because it reduces the amount of data to send, which leads to less rebuffering. Rubiks improves over FoV-only and FoV+ due to its robustness to head movement prediction errors.

We test the performance of Rubiks on both the Samsung S7 and the Huawei Mate 9. Because both phones can support real-time decoding for 4K videos, the difference between the QoE achieved by the two phones is within 1%, as shown in Fig. 3.10(a).

Rebuffering: Rubiks incurs less rebuffering time since it sends much less data than YouTube. Fig. 3.10(d) shows that the average rebuffering time of Rubiks and YouTube is 1.2 sec and 4.1 sec, respectively. The reduction in rebuffering time accounts for most of the QoE improvement in Rubiks.

Video Quality: Fig. 3.10(b) shows that the average video quality of Rubiks, YouTube, FoV-only, and FoV+ is 0.97, 0.98, 0.7 and 0.84, respectively. FoV-only is very vulnerable to prediction error because only the predicted FoV content is streamed. Even though FoV+ includes extra data to improve robustness against prediction error, it is still insufficient under large prediction errors. In comparison, Rubiks does not incur noticeable video quality degradation: its difference from YouTube is only 0.01. This is because it streams entire video frames, albeit at a lower quality for the tiles outside the predicted FoV.

Video Quality Changes: Fig. 3.10(c) shows that the video quality change is 0.15 and 0.28 for FoV-only and FoV+, respectively, while the average quality change is around 0.01 for both Rubiks and YouTube. FoV-only and FoV+ have higher video quality changes whenever the prediction error becomes large (e.g., the user looks at video portions that lie outside the streamed content, which leads to a large drop in viewing quality). Sending the tiles not included in the predicted FoV at a lower quality allows Rubiks to avoid such a large drop in video quality.

Bandwidth Savings: Fig. 3.10(e) shows that for 4K videos Rubiks saves 35% bandwidth compared with YouTube. Of this, 19% comes from Rubiks not sending all layers for all tiles, 11% comes from Rubiks sending the tiles outside the predicted FoV at a lower rate, and 5% comes from the removal of the redundant I-frames in the enhancement layers introduced in Section 3.4.1. FoV-only and FoV+ save 56% and 41% bandwidth compared with YouTube, respectively. The bandwidth saving of Rubiks is significant, but lower than that of FoV-only and FoV+ since Rubiks streams entire video frames to improve robustness to movement prediction error. We believe this is a reasonable trade-off.

Figure 3.11: Performance of Rubiks under low throughput. (a) QoE; (b) Bandwidth Savings.

Low Throughput: Fig. 3.11 shows the benefits of Rubiks when the throughput is lower than the lowest video bitrate. All approaches tend to select the lowest bitrate. YouTube has to send entire 360◦ frames, which results in larger rebuffering. The average QoE is 0.39, −0.52, 0.31 and 0.29 for Rubiks, YouTube, FoV-only and FoV+, respectively. Compared with YouTube, Rubiks improves QoE by 0.9 due to a significant reduction in rebuffering time. Even though YouTube can support real-time decoding for 4K videos, reducing the amount of data sent provides significant benefits for Rubiks when the network throughput is low. The average bandwidth savings of Rubiks, FoV-only, and FoV+ are 39%, 61% and 48%, respectively.

High-Motion Videos: Fig. 3.12 shows the benefits of Rubiks for high-motion videos, which have larger encoded sizes. When the user views high-motion tiles, there is less opportunity to save bandwidth. The average bandwidth saving of Rubiks is 24%. Rubiks improves QoE by 16% on average over YouTube.

Figure 3.12: Performance of Rubiks for high motion videos. (a) QoE; (b) Bandwidth Savings.

3.6.3.2 Rubiks for 8K Videos

Next, we evaluate Rubiks for 8K videos. YouTube cannot decode 8K video chunks in time. For Rubiks and FoV+, we use an upper bound, derived from our decoding time model, to limit the number of tiles sent to the client so that video chunks can be decoded before the playback deadline.

QoE: Fig. 3.13(a) shows that Rubiks always achieves the best user QoE. Rubiks outperforms YouTube, FoV-only and FoV+ by 36%, 40% and 20% in QoE, respectively. Compared with 4K videos, Rubiks achieves a smaller QoE improvement over FoV-only for 8K videos due to different video content and user head movement. Because YouTube cannot decode 8K video data in time, Rubiks achieves a larger QoE improvement over YouTube for 8K videos than for 4K videos.

The QoE of Rubiks on Huawei Mate9 is 2% higher than that on Samsung S7. From the decoding time model, we can see that Mate9 has slightly faster decoding speed than S7. Thus, Mate9 can decode more tiles to provide higher robustness to head movement prediction error, which results in slightly higher QoE for Mate9.

Figure 3.13: Performance of 8K videos (8 videos, 10 throughput traces, 80 head movement traces). (a) QoE; (b) Video Quality; (c) Rebuffering; (d) Bandwidth Savings.

Rebuffering: Fig. 3.13(c) shows the rebuffering time. The overall average rebuffering time of YouTube is 8.0 sec. Slow video decoding in the YouTube approach results in 7.1 sec of average rebuffering time; the rest comes from throughput fluctuation during downloading. This leads to large QoE degradation. Rubiks, FoV-only and FoV+ incur average rebuffering times in the range of 0.2-0.3 sec, which is very small compared to YouTube. About 0.01 sec of the average rebuffering time for Rubiks comes from inaccurate decoding time modeling. Compared with 4K videos, speeding up video decoding helps Rubiks achieve a larger reduction in rebuffering, which translates to a higher QoE improvement over YouTube.

Quality and Quality Changes: We observe similar video quality and quality change patterns as for 4K videos. The average video quality is 0.96, 0.98, 0.79 and 0.87 for Rubiks, YouTube, FoV-only and FoV+, respectively. Compared with 4K videos, FoV-only achieves higher video quality due to slightly better head movement prediction. Nevertheless, Rubiks still significantly outperforms both FoV-only and FoV+. Moreover, FoV-only and FoV+ experience quality changes of 0.21 and 0.13, which are larger than those of the other approaches due to movement prediction error, whereas for Rubiks it is 0.02.

Bandwidth Savings: Fig. 3.13(d) shows that Rubiks, FoV-only and FoV+ save 49%, 66%, and 54% bandwidth, respectively, compared with YouTube. The encoded file size difference between two consecutive bitrates for 8K videos is larger than that in 4K videos. This means when we switch between bitrates in 8K videos, there is a larger difference in the amount of data when compared to 4K. So Rubiks yields larger bandwidth savings for 8K videos.

3.6.3.3 Energy Consumption

We use the Google battery historian tool [36] to monitor the energy consumption of our system. We stream multiple 4K and 8K videos using various algorithms and record the energy consumption in each experiment, where each video lasts for 5 minutes. Table 3.2 shows the average total energy consumption across videos for each algorithm. Both FoV-only and FoV+ consume up to 33% less energy than YouTube and Rubiks since they decode fewer tiles, at the expense of much worse video quality. On average, Rubiks consumes 9% and 19% less energy than YouTube for 4K and 8K videos, respectively. This is because Rubiks does not decode all video tiles and takes a shorter time to finish decoding than YouTube. When the decoder finishes decoding one chunk, it goes to the idle state until the next chunk is ready to decode. As we would expect, all algorithms consume more energy in decoding the 8K videos than the 4K videos.

            YouTube    Rubiks     FoV-only   FoV+
4K (S7)     43.4mAh    39.8mAh    30.3mAh    34.5mAh
4K (Mate9)  41.9mAh    37.6mAh    29.2mAh    32.1mAh
8K (S7)     93.7mAh    76.2mAh    62.4mAh    68.6mAh
8K (Mate9)  91.8mAh    75.5mAh    61.7mAh    65.2mAh

Table 3.2: Energy Consumption.

3.6.4 Summary and Discussion of Results

Our main findings can be summarized as follows:

• Rubiks achieves significant QoE improvement due to the reduction in rebuffering time and its enhanced robustness against head movement prediction error.

• Rubiks saves substantial bandwidth by sending video tiles with a lower viewing probability at a lower quality.

• Users give higher ratings to Rubiks due to smaller rebuffering or higher video quality.

Though we evaluate Rubiks' performance on only a few smartphones, the techniques employed in Rubiks can improve the performance of streaming 360◦ videos on any smartphone hardware platform. In addition to speeding up the decoding process, Rubiks provides the following benefits:

• Rubiks is more robust to head movement prediction errors even if decoding is fast. Compared with the existing state-of-the-art tile-based streaming approaches FoV-only and FoV+, Rubiks improves video quality by 38% and 15% for 4K videos; the improvement is 22% and 10% for 8K videos. Note that both FoV-only and FoV+ can decode all 4K and 8K videos in real time. Thus, even if decoding is fast, the current tile-based streaming approaches are not robust enough to head movement prediction errors.

• Rubiks needs much less bandwidth to achieve high video quality. Compared with YouTube, Rubiks saves 35% bandwidth for 4K videos, while achieving similar average video quality (0.97 in Rubiks and 0.98 in YouTube).

Chapter 4

Robust Live 4K Video Streaming

4.1 Motivation

Uncompressed 4K video streaming requires a data rate of around 3 Gbps. WiGig is the commodity wireless technology that comes closest to such a high data rate. In this section, we examine the feasibility and challenges of streaming live 4K videos over WiGig from both system and network perspectives. This study identifies major issues that should be addressed in supporting live 4K video streaming.

4.1.1 4K Videos Need Compression

WiGig throughput in our wireless card varies from 0 to 2.4 Gbps. Even in the best case, it is lower than the data rate of 4K raw videos at 30 FPS, which is 3 Gbps. Therefore, sending 4K raw videos is too expensive, and video coding is necessary to reduce the amount of data to transmit.

4.1.2 Rate Adaptation Requirement

WiGig links are sensitive to mobility. Even minor movement at the transmitter (TX) or receiver (RX) side can induce a drastic change in throughput. In the extreme case, where an obstacle is blocking the line-of-sight path, throughput can reduce to 0. Evidently, such a throughput drop results in severe degradation in the video quality.

Figure 4.1: Example 60 GHz Throughput Traces. (a) Throughput Variation; (b) Prediction Error.

Large fluctuations: Fig. 4.1(a) shows the WiGig link throughput in our system when both the transmitter and receiver are static (Static) and when the TX rotates around a human body slowly while the RX remains static (Rotating). The WiGig link can be blocked when the human stands between the RX and TX. As we can see, the amount of throughput fluctuation can be up to 1 Gbps in the Static case. This happens due to multipath, beam searching and adaptation [116]. Since WiGig links are highly directional and have a large attenuation factor due to their high frequency, the impacts of multipath and beam direction on throughput are severe. In the Rotating case, the throughput reduces from 2 Gbps to 0 Gbps. We observe 0 Gbps when the user completely blocks the transmission between the TX and RX. Therefore, drastic throughput fluctuation is common at 60 GHz, and a desirable video streaming scheme should adapt to such fluctuation.

Large prediction error: In order to handle throughput fluctuation, existing streaming approaches use historical throughput measurements to predict the future throughput. The predicted throughput is then used to determine the amount of data to be sent. Due to the large throughput fluctuation of WiGig links, the throughput prediction error can be large. In Fig. 4.1(b), we evaluate the accuracy of using the average throughput of the previous 40 ms window to predict the average throughput of the next 30 ms window. We observe large prediction error when there is a drastic change in the throughput. Even though the throughput remains relatively stable in the Static case, the prediction error can reach up to 0.5 Gbps. When link blockage happens in the Rotating case, the prediction error can be even higher than 1 Gbps. Such large prediction errors cause the sender to select either too low or too high a video rate; the former degrades the video quality, while the latter results in partially blank video frames because only part of each frame can arrive in time.
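This error metric can be computed with a simple sliding-window routine such as the sketch below; the code and the synthetic trace are our own and serve only to illustrate the measurement.

```python
def window_prediction_errors(samples_gbps, sample_ms=10, hist_ms=40, pred_ms=30):
    """For each position in a throughput trace, predict the average
    throughput of the next `pred_ms` as the average of the previous
    `hist_ms`, and return the absolute prediction errors."""
    h, p = hist_ms // sample_ms, pred_ms // sample_ms
    errors = []
    for i in range(h, len(samples_gbps) - p + 1):
        predicted = sum(samples_gbps[i - h:i]) / h
        actual = sum(samples_gbps[i:i + p]) / p
        errors.append(abs(predicted - actual))
    return errors

# Synthetic trace with a sudden blockage-like drop from ~2 Gbps to 0.
trace = [2.0] * 20 + [0.0] * 10 + [2.0] * 20
print(max(window_prediction_errors(trace)))   # large error around the drop
```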

Insight: The large and unpredictable fluctuation of 60 GHz links suggests that we should adapt promptly to unanticipated throughput variation. Layered coding is promising since the sender can opportunistically send less or more data depending on the network condition instead of selecting a fixed video rate in advance.

4.1.3 Limitations of Existing Video Codecs

We explore the feasibility of using traditional codecs like H.264 and HEVC for 4K video encoding and decoding. These codecs are attractive due to their high compression ratios. We also study the feasibility of the state-of-the-art layered encoding scheme since it is designed to adapt to varying network conditions.

Cost of conventional software/hardware video codecs: Recent work [61] shows that YouTube's H.264 software encoder takes 36.5 minutes to encode a 15-minute 4K video, and the VP8 [39] software encoder takes 149 minutes to encode a 15-minute video. It is clear that these coding schemes are too slow to support live 4K video streaming.

To test the performance of hardware encoding, we use the NVIDIA NVENC and NVDEC [10] hardware codecs for H.264 encoding and decoding. We use a desktop equipped with a 3.6 GHz quad-core Intel Xeon processor, 8 GB RAM, a 256 GB SSD and an NVIDIA GeForce GTX 1050 GPU to collect the encoding and decoding times. We measure the 4K video encoding time where one frame is fed to the encoder every 33 ms. Table 4.1 shows the average encoding and decoding time of a single 4K video frame played at 30 FPS. We use ffmpeg [3] for encoding and decoding with the NVENC and NVDEC codecs. The maximum tolerable encoding and decoding time is 60 ms, as mentioned earlier. We observe that the encoding and decoding time is well beyond this threshold even with GPU support. This means that even with the latest hardware codecs and commodity hardware, video content cannot be coded in time, resulting in unacceptable user experience. Such a large coding time is not surprising: in order to achieve high compression rates, these codecs use sophisticated motion compensation [135, 143], which is computationally expensive.

Codec   Video      Enc (ms)   Dec (ms)   Total (ms)
H.264   Ritual     160        50         210
H.264   Barscene   170        60         230
HEVC    Ritual     140        40         180
HEVC    Barscene   150        50         200

Table 4.1: Codecs encoding and decoding time per frame

A recent work [89] uses a very powerful desktop equipped with an expensive NVIDIA GPU ($1200) to stream 4K videos in real time. This work is interesting, but such GPUs are not available on common devices due to their high cost, bulky size (e.g., 4.376″ × 10.5″ [8]) and large system power consumption (600W [8]). This makes it hard to deploy on laptops and smartphones. Therefore, we aim to bring the capability of live 4K video streaming to devices with limited GPU capabilities like laptops and other mobile devices.

Cost of layered video codecs: Scalable video coding (SVC) is an extension of the H.264 standard. It divides the video data into a base layer and enhancement layers. The video quality improves as more enhancement layers are received. SVC uses layered coding to achieve robustness under varying network conditions, but this comes at a high computational cost since SVC needs additional prediction mechanisms such as inter-layer prediction [126]. Due to its higher complexity, SVC has rarely been used commercially even though it has been standardized as an H.264 extension [95]. No mobile devices have hardware SVC encoders or decoders.

To compare the performance of SVC with standard H.264, we use OpenH264 [11], which has software implementations of both SVC and H.264. Our results show that SVC with 3 layers takes 1.3× the encoding time of H.264. Since H.264 alone is not feasible for live 4K encoding, we conclude that SVC is not suitable for live 4K video streaming.

Insight: Traditional video codecs are too expensive for 4K video encoding and decoding. We need a cheap 4K video codec that runs fast on commodity devices.

4.1.4 WiGig and WiFi Interaction

Since WiGig links may not be stable and can be broken, we should seek an alternative means of communication. As WiFi is widely available and low-cost, it is an attractive candidate. There have been several interesting works that use WiFi and WiGig together; however, they are not sufficient for our purpose as they do not consider delay-sensitive applications like video streaming.

Reactive use of WiFi: Existing works [137] use WiFi in a reactive manner (only when WiGig fails). Reactive use of WiFi falls short for two reasons. First, WiFi is not used at all as long as WiGig is not completely disconnected. However, even when WiGig is not disconnected, its throughput can be quite low, and the WiFi throughput becomes significant in comparison. Second, it is hard to accurately detect a WiGig outage. Too early a detection may result in an unnecessary switch to WiFi, which tends to have lower throughput than WiGig, while too late a detection may lead to a long outage.

WiFi throughput: Second, it is non-trivial to use WiFi and WiGig links simultaneously since the throughput of both may fluctuate widely. This is shown in Fig. 4.2. As we can see, the WiFi throughput also fluctuates in both static and mobile scenarios. In the static case, the fluctuation is mainly due to contention. In the mobile case, the fluctuation mainly comes from both wireless interference and signal variation due to the multipath effect caused by mobility.

Figure 4.2: Example WiFi Throughput Traces

Insight: It is desirable to use WiFi in a proactive manner along with the WiGig link. Moreover, it is important to carefully schedule the data across the WiFi and WiGig links to maximize the benefit of WiFi.

4.2 Challenges

Adapting to highly variable and unpredictable wireless links: A natural approach is to adapt the video encoding bitrate according to the available bandwidth. However, as many measurement studies have shown, the data rate of a 60 GHz link can fluctuate widely and is hard to predict. Even a small obstruction or movement can significantly degrade the throughput. This makes it challenging to adapt video quality in advance.

Fast encoding and decoding on commodity devices: It is too expensive to stream the raw pixels of 4K videos; even the latest 60 GHz links cannot meet the bandwidth requirement. On the other hand, the time to stream live videos includes not only the transmission delay but also the encoding and decoding delay. While existing video codecs (e.g., H.264 and HEVC) achieve high compression rates, they are too slow for real-time 4K video encoding and decoding. Fouladi et al. [61] show that the YouTube H.264 encoder takes 36.5 minutes to encode a 15-minute 4K video at 24 FPS, which is far from real time. Therefore, a fast video coding algorithm is needed to stream live 4K videos.

Exploiting different links: WiGig alone is often insufficient to support 4K video streaming since its data rate may drop by orders of magnitude even with small movement or obstruction. Sur et al. [137] have developed approaches that detect when WiGig is broken based on the throughput in a recent window and then switch to WiFi reactively. However, it is challenging to select the right window size: a window that is too small results in an unnecessary switch to WiFi and being constrained by the limited WiFi throughput, while a window that is too large results in a long link outage. In addition, even when the WiGig link is not broken, WiFi can complement WiGig by increasing the total throughput. The WiFi throughput can be significant compared with the WiGig throughput, since the latter can be arbitrarily small depending on the distance, orientation, and movement. Moreover, existing works mainly focus on bulk data transfer and do not consider delay-sensitive applications like live video streaming.

4.3 Our Approach

In this section, we describe the important components of our system Jigsaw for live 4K video streaming: (i) light-weight layered coding, (ii) efficient implementation using GPU and pipelining, and (iii) effective use of WiGig and WiFi.

4.3.1 Light-weight Layered Coding

Motivation: Existing video codecs, such as H.264 and HEVC, are too expensive for live 4K video streaming. A natural approach is to develop a faster codec, albeit with a lower compression rate, and pick a sending rate based on predicted throughput. However, if a sender sends more data than the network can support, the data arriving at the receiver before the deadline may be incomplete and insufficient to construct a valid frame. In this case, the user will see a blank screen. This can be quite common for WiGig links. On the other hand, if a sender sends at a rate lower than the supported data rate (e.g., also due to prediction error), the video quality degrades unnecessarily.

In comparison, layered coding is robust to throughput fluctuation. The base layer is small and usually delivered even under unfavorable network con- ditions so that the user can see some version of a video frame albeit at a lower resolution. Enhancement layers can be opportunistically sent based on network conditions. While layered coding is promising, the existing layered coding schemes are too computationally expensive as shown in Section 4.1. We seek a layered coding that can (i) be easily computed, (ii) support parallelism to leverage GPUs, (iii) compress the data, and (iv) take advantage of partial

layers, which are common since a layer can be large.

Figure 4.3: Our layered coding for Live 4K Video Streaming

4.3.1.1 Our Design

A video frame can be seen as a 2D array of pixels. There are two raw video formats: RGB and YUV where YUV is becoming more popular due to better compression than RGB. Each pixel in RGB format can be represented using three 8-bit unsigned integers in RGB (one integer for red, one for green, and one for blue). In the YUV420 planar format, four pixels are represented using four luminance (Y) values and two chrominance (UV) values. Our im- plementation uses YUV but the general idea applies to RGB.

We divide a video frame into non-overlapping 8x8 blocks of pixels. We further divide each 8x8 block into 4x4, 2x2 and 1x1 blocks. We then compute the average of all pixels in each 8x8 block which makes up the base layer or Layer 0. We round each average into an 8-bit unsigned integer, denoted as

A0(i, j), where (i, j) denotes the block index. Layer 0 has only 1/64 of the original data size, which translates to around ∼50 Mbps. Since we use both WiGig and WiFi, it is very rare for the total throughput to fall below ∼50 Mbps. Therefore, layer 0 is almost always delivered.1 While receiving only layer 0 gives a 512x270 video, which is a very low resolution, it is still much better than a partially blank screen, which may happen if we try to send a higher-resolution video than the link can support.

Next, we go down to the 4x4 block level and compute averages of these smaller blocks. Let A1(i, j, k) denote the average of a 4x4 block where (i, j, k) is the index of the k-th 4x4 block within the (i, j)-th 8x8 block and k = 1, 2, 3, 4. D1(i, j, k) = A1(i, j, k)−A0(i, j) forms layer 1. Using three of these differences, we can reconstruct the fourth one since the 8x8 block contains the average of the four 4x4 blocks. This reconstruction is not perfect due to rounding error. The rounding error is small: MSE is 1.1 in our videos where the maximum pixel value is 255. This has minimum impact on the final video quality. Therefore, the layer 1 consists of D1(i, j, k) for k = 1 to 3 from all 8x8 blocks. We call D1(i, j, 1) the first sublayer in layer 1, D1(i, j, 2) the second sublayer, and D1(i, j, 3) the third.

Following the same principle, we form layer 2 by dividing the frame into 2x2 blocks and computing A2 − A1, and form layer 3 by dividing into 1x1 blocks and computing A3 − A2. As before, we omit the fourth value in layer 2

1If the typical throughput is even lower, one can construct 16x16 blocks and use average of these blocks to form layer 0.

and layer 3 blocks. The complete layering structure of an 8x8 block is shown in Fig. 4.3. The values marked in grey are not sent and can be reconstructed at the receiver. This layering strategy gives us 1 average value for layer 0, 3 difference values for layer 1, 12 difference values for layer 2 and 48 difference values for layer 3, i.e., a total of 64 values per 8x8 block to be transmitted. Note that due to spatial locality, the difference values are likely to be small and can be represented using fewer than 8 bits. In the YUV format, an 8x8 block's representation requires 96 bytes; our strategy allows us to use fewer than 96 bytes.
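To make the layering concrete, here is a small Python/NumPy sketch (our illustration, not the dissertation's GPU implementation; the helper names are hypothetical) that computes layer 0 and the difference layers 1-3 for a single 8x8 Y block. A matching reconstruction sketch appears with the decoding discussion below.

import numpy as np

# Average a block over non-overlapping size x size sub-blocks.
def block_averages(block, size):
    h, w = block.shape
    return block.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

# Return layer 0 (one 8x8 average) and layers 1-3 (hierarchical differences).
def encode_block(block):
    a2 = block_averages(block, 2)              # 4x4 grid of 2x2 averages
    a1 = block_averages(block, 4)              # 2x2 grid of 4x4 averages
    a0 = block_averages(block, 8)              # 1x1 grid: the 8x8 average
    layer0 = np.round(a0).astype(np.uint8)     # 8-bit base layer
    d1 = np.round(a1) - np.repeat(np.repeat(layer0, 2, 0), 2, 1)
    d2 = np.round(a2) - np.repeat(np.repeat(np.round(a1), 2, 0), 2, 1)
    d3 = block.astype(np.int16) - np.repeat(np.repeat(np.round(a2), 2, 0), 2, 1)
    # On the wire, one of every four values in d1-d3 is omitted and
    # reconstructed at the receiver from the coarser average.
    return layer0, d1.astype(np.int16), d2.astype(np.int16), d3.astype(np.int16)

block = np.random.randint(0, 256, (8, 8), dtype=np.uint8)   # example input
layer0, d1, d2, d3 = encode_block(block)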

Encoding: The difference values calculated in layers 1-3 vary in magnitude. If we used the minimum number of bits to represent each difference value individually, the receiver would need to know that number; however, sending that number for every difference value defeats the purpose of compression and we would end up sending more data. To minimize the overhead, we use the same number of bits to represent the difference values that belong to the same layer of an 8x8 block. Furthermore, we group eight spatially co-located 8x8 blocks, referred to as a block-group, and use the same number of bits to represent their difference values for every layer. This serves two purposes: (i) it further reduces the overhead, and (ii) the compressed data per block-group is always byte-aligned. To understand (ii), consider that if b bits are used to represent the difference values for a layer in a block-group, the total of b ∗ 8 bits contributed across the 8 blocks is always byte-aligned. This enables more parallelism in GPU threads, as explained in Sec. 4.3.2. Our meta-data is less than 0.3% of the original data.
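As a hedged illustration of this bit-width selection and byte-aligned packing (the function names are ours; the real encoder performs this inside GPU threads), one layer's differences for a block-group could be packed as follows:

import numpy as np

# Minimum number of bits for the signed differences of one layer in a block-group.
def bits_needed(diffs):
    m = int(np.abs(diffs).max())
    return max(1, int(np.ceil(np.log2(m + 1))) + 1)     # +1 for the sign

# Pack the signed differences of all 8 blocks using nbits per value.
def pack_group(diffs, nbits):
    offset = 1 << (nbits - 1)                            # bias so values are non-negative
    bits = ''.join(format(int(v) + offset, '0{}b'.format(nbits)) for v in diffs.ravel())
    bits += '0' * (-len(bits) % 8)                       # padding is a no-op when all 8 blocks are present
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))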

Decoding: Each pixel at the receiver side is constructed by combining the data from all the received layers corresponding to each block. So the pixel value at location (i, j, k, l, m), where (i, j), k, l, m correspond to the 8x8, 4x4, 2x2 and 1x1 block indices respectively, can be reconstructed as

A0(i, j) + D1(i, j, k) + D2(i, j, k, l) + D3(i, j, k, l, m)

If some differences are not received, they are assumed to be zero. When partial layers are received, we first construct the pixels for which we received more layers and then use them to interpolate the pixels with fewer layers based on the average of the larger block and the current blocks received so far.
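A minimal reconstruction sketch matching the encoder sketch above (our illustration; any missing difference layer simply defaults to zero, as described):

import numpy as np

# Rebuild an 8x8 block from whichever layers were received.
def decode_block(layer0, d1=None, d2=None, d3=None):
    a = np.zeros((8, 8)) + layer0.astype(np.float64)     # layer 0 broadcast to all pixels
    if d1 is not None:
        a += np.repeat(np.repeat(d1, 4, 0), 4, 1)        # refine to 4x4 averages
    if d2 is not None:
        a += np.repeat(np.repeat(d2, 2, 0), 2, 1)        # refine to 2x2 averages
    if d3 is not None:
        a += d3                                          # refine to individual pixels
    return np.clip(np.round(a), 0, 255).astype(np.uint8)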

Encoding Efficiency: We evaluate our layered coding efficiency using 7 video traces. We classify the videos into two categories based on the spatial correlation of the pixels in the videos. A video is considered rich if it has low spatial locality. Fig. 4.4(a) shows the distribution across block-groups of the minimum number of bits required to encode difference values for all layers. For less rich videos, 4 bits are sufficient to encode more than 80% of the difference values. For more detailed or rich videos, 6 bits are needed to encode 80% of the difference values. Fig. 4.4(b) shows the average compression ratio for each video, where the compression ratio is defined as the ratio between the encoded frame size and the original frame size. Error bars represent the maximum and minimum compression ratio across all frames of a video. Our layering strategy can reduce the size of the data to be transmitted by 40-62%. Less rich videos achieve higher compression ratios due to relatively smaller difference values.


Figure 4.4: Compression Efficiency. (a) Difference bits; (b) Compression ratio.

4.3.2 Layered Coding Implementation

Our layered coding encodes a frame as a combination of multiple in- dependent blocks. This allows encoding of a full frame to be divided into multiple independent computations and makes it possible to leverage GPU. GPU consists of many cores with very simple control logic that can efficiently support thousands of threads to perform the same task on different inputs to improve throughput. In comparison, CPU consists of a few cores with com- plex control logic that can handle only a small number of threads running different tasks to minimize the latency of a single task. Thus, to minimize our encoding and decoding latency, we implement our schemes using GPU. However, achieving maximum efficiency over GPU is not straightforward. To maximize the efficiency of GPU implementation, we address several significant challenges.

GPU threads synchronization: An efficient GPU implementation should have minimal synchronization between different threads for independent tasks. Synchronization requires data sharing, which greatly impacts the performance.

78 We divide computation into multiple independently computable blocks. How- ever, blocks still have to synchronize since blocks vary in size and have variable writing offsets in memory. We design our scheme such that we minimize the synchronization need among the threads by putting the compression of a block- group in a single thread.

Memory copying overhead: We transmit encoded data as sublayers. The receiver has to copy the received data packets of sublayers from CPU to GPU memory. A single copy operation has non-negligible overhead. Copying pack- ets individually incurs too much overhead, while delaying copying packets until all packets are received incurs significant startup delay for the decoding pro- cess. We design a mechanism to determine when to copy packets for each sublayer. We manage a buffer to cache sublayers residing in GPU memory. Because copying a sublayer takes time, GPU is likely to be idle if the buffer is small. Therefore, we try to copy packets of a new sublayer while we are decoding a sublayer in the buffer. Our mechanism can reduce GPU idle time by parallelizing copying new sublayers and decoding buffered sublayers.

Our implementation works for a wide range of GPUs, including low to mid-range GPUs [4], which are common on commodity devices.

4.3.2.1 GPU Implementation

GPU background: To maximize the efficiency of GPU implementation, we follow the two guidelines below: (i) A good GPU implementation should divide the job into many small independently computable tasks. (ii) It should

minimize memory synchronization across threads, as memory operations can be expensive.

Memory optimization is critical to the performance of GPU imple- mentation. GPU memory can be classified into three types: Global, Shared and Register. Global memory is standard off-chip memory accessible to GPU threads through bus interface. Shared memory and register are located on- chip, so their access time is ∼100× faster than global memory. However, they are very small. But they give us an opportunity to optimize performance based on the type of computation.

In GPU terms, a function that is parallelized among multiple GPU threads is called kernel. If different threads in a kernel access contiguous memory, global memory access for different threads can be coalesced such that memory for all threads can be accessed in a single read instead of individual reads. This can greatly speed up memory access and enhance the performance. Multiple threads can work together using the shared memory, which is much faster than the global memory. So the threads with dependency can first read and write to the shared memory, thereby speeding up the computation. Ultimately, the results are pushed to the global memory.

Implementation Overview: To maximize the efficiency of GPU imple- mentation, we divide the encoder and decoder into many small independently computable tasks. In order to achieve independence across threads, we should satisfy the following two properties:

No write byte overlap: If no two threads write to the same byte, there is no need for memory synchronization among threads. This is a desired property, as thread synchronization in GPU is expensive. Since layers 1-3 may use fewer than 8 bits to represent the differences, a single thread processing an 8x8 block may generate output that is not a multiple of 8 bits. In this case, different threads may write to the same byte and

require memory synchronization. Instead, we use one thread to encode the difference values per block-group, where a block-group consists of 8 spatially co-located blocks. Since all blocks in the block-group use the same number of bits to represent difference values, the output size of a block-group is a multiple of 8 bits and always byte-aligned.

Read/write independence among blocks: Each GPU thread should know in advance where to read the data from or where to write the data to. As the number of bits used to represent the difference values varies across block-groups, a block-group's writing offset depends on how many bits were used by the previous block-groups. Similarly, the decoder should know the read offsets. Before spawning GPU threads for encoding or decoding, we derive the read/write memory offsets for each thread using a cumulative sum of the number of bits used per block-group.
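A small sketch of this offset derivation (our illustration of the cumulative-sum idea; the real version is computed before the kernels are launched):

import numpy as np

# Byte offset at which each block-group writes (or reads) one layer's data,
# given the per-block-group bit width and the number of values per group.
def layer_offsets(bits_per_group, values_per_group):
    sizes = np.asarray(bits_per_group) * values_per_group // 8   # byte-aligned by design
    return np.concatenate(([0], np.cumsum(sizes)[:-1]))

# Example: four block-groups of layer 1 (3 differences x 8 blocks = 24 values each).
print(layer_offsets([4, 6, 5, 4], 24))                            # -> [ 0 12 30 45]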

4.3.2.2 Jigsaw GPU Encoder

Overview: Encoder consists of three major tasks: (i) calculating averages and differences, (ii) constructing meta-data, and (iii) compressing differences.

Specifically, consider a YUV video frame of dimensions a×b. The encoder first spawns ab/64 threads for the Y values and 0.5ab/64 threads for the UV values. Each thread operates on an 8x8 block to (i) compute the average of the 8x8 block, (ii) compute the averages of all 4x4 blocks within the 8x8 block and take the differences between the average of the 8x8 block and those of the 4x4 blocks, and similarly compute the hierarchical differences for the 2x2 and 1x1 blocks, and (iii) coordinate with the other threads in its block-group using the shared memory to get the minimum number of bits required to represent the difference values for its block-group. Next, the encoder derives the meta-data and memory offsets. Then it spawns (1.5 · a · b)/(64 · 8) threads, each of which compresses the difference values for a group of 8 blocks.

Calculating averages and differences: The encoder spawns multiple threads to compute averages and differences. Each thread processes an 8x8 block. It calculates all the averages and difference values required to generate all layers for that block. It reads eight 8-byte chunks for a block one by one to compute the average. In GPU architecture, multiple threads of the same kernel execute the same instruction for a given cycle. In our implementa- tion, successive threads work on contiguous 8x8 blocks. This makes successive threads access and operate on contiguous memory, so the global memory reads are coalesced and memory access is optimized.

After computing the average at each block level, each thread computes the differences for each layer and writes it to an output buffer. The thread also keeps track of the minimum number of bits required to represent the

differences for each layer. Let b_i denote the number of bits required for layer i. All 8 threads for a block-group use atomic operations in a shared memory location to get the number of bits required to represent the differences for that block-group. We use B^i_j to denote the number of bits used by block-group j for layer i.

Meta-data processing: We compute the memory offset at which the compressed values for the i-th layer of block-group j should be written using a cumulative sum C^i_j = Σ_{k=0}^{j} B^i_k. Based on the cumulative sum, the encoding kernel generates multiple threads to process the difference values concurrently without write byte overlap. The B^i_j values are transmitted, based on which the receiver can compute the memory offsets of the compressed difference values. One meta-data value is transmitted per block-group for each layer except the base layer (i.e., 3 values per block-group), and we need 4 bits to represent one meta-data value. Therefore a total of (3 · 1.5 · a · b)/(64 · 8 · 2) bytes are sent as meta-data, which is within 0.3% of the original data.

Encoding: A thread is responsible for encoding all the difference values for

a block-group in a layer. A thread for block-group j uses B^i_j to compress and combine the difference values for layer i from all its blocks. It then writes the compressed values into an output buffer at the given memory offset. Our design ensures that consecutive threads read and write contiguous memory locations in the global GPU memory to take advantage of memory coalescing. Fig. 4.5(a) shows the average running time of each step in our compression algorithm. On average, all steps finish in around 10ms. Error bars represent the maximum and minimum running times across different runs.

4.3.2.3 Jigsaw GPU Decoder

Overview: The receiver first receives the layer 0 and meta-data. It then pro- cesses the meta-data to compute the read offsets. Each layer except the base layer has 3 sublayers as described in Section 4.3.1. Once all data corresponding to a sublayer are received, a thread is spawned to decode the sublayer and add the difference value to the previously reconstructed lower level average. This process is repeated for every sublayer that is received completely.

The decoder consists of the following steps: (i) processing meta-data, (ii) decoding, and (iii) reconstructing a frame.

Meta-data processing: Meta-data contains the number of bits used for dif- ference values per layer per block group. The kernel calculates the cumulative

sum C^i_j from the meta-data values, similarly to the encoder described previously. This cumulative sum indicates the read offset at which block-group j can read the compressed difference values for any sublayer in layer i. Since all sublayers in a layer are generated using the same number of bits, their relative offset is the same.

Decoding: A decoding thread decodes all the difference values for a block- group corresponding to a sublayer. This thread is responsible for decompress- ing and adding the difference value to the previously reconstructed lower level average. This process is repeated for every sublayer that is received completely. Each block is composed of 4 smaller sub-blocks. Difference values for three of these sub-blocks are transmitted. Once sublayers corresponding to the three


sub-blocks of the same block are received, the fourth sublayer is constructed based on the average principle.

Figure 4.5: Codec GPU Modules Running Time. (a) Encoder Modules; (b) Decoder Modules.

Received sublayers reside in the main memory and should be copied to GPU memory before they can be decoded. Each memory copy incurs overhead so copying every packet individually to GPU has significant overhead. On the other hand, memory copy for a sublayer cannot be deferred to the point at which its decoding kernel is scheduled. Otherwise, it would stall the kernel till the sublayer is completely copied to GPU memory.

We implement a smart delayed copy mechanism. It parallelizes the memory copy for one sublayer with the decoding of another sublayer. We always keep a maximum of 4 sublayers in GPU memory; they are next in line to decode. As soon as one of these 4 sublayers is scheduled to be decoded on the GPU, we choose a new complete sublayer to copy from CPU to GPU memory. If no new complete sublayer is available, we copy a partial sublayer to the GPU. In the latter case, all future incoming packets for that sublayer are directly copied to the GPU without any delay. Our smart delayed copy mechanism allows


us to reduce the memory copy time between GPU and main memory by 4x at the cost of only 1% of the kernels experiencing a 0.15ms stall on average due to delayed memory copying. The total running time of the decoding process for a sublayer consists of the memory copy time and the kernel running time. Because the memory copy time is large, our mechanism significantly reduces the total running time.

Figure 4.6: Jigsaw Pipeline
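The following host-side Python sketch (our simplified illustration; the class and method names are hypothetical and the real copies are asynchronous GPU transfers) captures the policy of keeping at most four sublayers staged in GPU memory and overlapping the next copy with the current decode:

from collections import deque

MAX_STAGED = 4          # sublayers kept ready in GPU memory

class DelayedCopy:
    def __init__(self, copy_to_gpu):
        self.copy_to_gpu = copy_to_gpu   # e.g., wraps an asynchronous host-to-device copy
        self.complete = deque()          # fully received sublayers still in CPU memory
        self.staged = deque()            # sublayers already copied to GPU memory

    def on_sublayer_received(self, sublayer):
        self.complete.append(sublayer)
        self._fill()

    def on_decode_scheduled(self):
        # Called when the GPU picks the next staged sublayer to decode.
        if not self.staged:
            return None
        decoding = self.staged.popleft()
        self._fill()                     # overlap the next copy with this decode
        return decoding

    def _fill(self):
        while len(self.staged) < MAX_STAGED and self.complete:
            self.staged.append(self.copy_to_gpu(self.complete.popleft()))

When no complete sublayer is available, the real system additionally starts copying a partial sublayer and streams its remaining packets directly to the GPU; that refinement is omitted here for brevity.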

Frame reconstruction: Once the deadline for a frame is imminent, we stop decoding its sublayers and prepare for reconstruction. We interpolate for the partial sublayers as explained in Section 4.3.1. The receiver reconstructs a pixel based on all the received sublayers. Finally, reconstructed pixels are organized as a frame and rendered on the screen.

Fig. 4.5(b) shows the average running time of each step in our decom- pression algorithm, where the error bars represent the maximum and minimum values. On average, the total running time is around 19ms.

4.3.2.4 Pipelining

A 4K video frame takes a significant amount of time to transmit. Starting transmission only after the whole frame finishes encoding increases delay; similarly, decoding only after receiving a whole frame is also inefficient. Pipelining can potentially be used to reduce the delay. However, not all encoders and decoders can be pipelined: many encoders have no intermediate point where transmission can be pipelined, and due to data dependency it is not possible to start decoding before all dependent data has been received.

Our layering strategy has nice properties that (i) the lower layers are independent of the higher layers and can be encoded or decoded independently and (ii) sublayers within a layer can be independently computed. This means our encoding can be pipelined with data transmission as shown in Fig. 4.6. Transmission can start as soon as the average and difference calculation is done, which can happen as soon as 2ms after the frame is generated. This is possible since the base layer consists of only average values and does not re- quire any additional computation. Subsequent sublayers are encoded in order and scheduled for transmission as soon as any sublayer is encoded. Average encoding time for a single sublayer is 200-300us.

A sublayer can be decoded once all its lower layers have been received. As the data are sent in order, by the time a sublayer arrives, its dependent data are already available. So a sublayer is ready to be decoded as soon as it is received. As shown in Fig. 4.6, we pipeline sublayer decoding with data reception. Final frame reconstruction cannot happen until all the data of the

frame have been received and decoded. As the frame reconstruction takes around 3-4ms, we stop receiving data for a frame at its playout deadline minus the reconstruction time. Pipelining reduces the delay from 10ms to 2ms at the sender, and from 20ms to 3ms at the receiver.

4.3.3 Video Transmission

We implement a driver module on top of the standard network interface bonding driver, which bonds the WiFi and WiGig interfaces. It is responsible for the following important tasks: (i) adapting video quality based on the wireless throughput, (ii) selecting the appropriate interface to transmit the packets (i.e., intra-video-frame scheduling), and (iii) determining how much data to send for each video frame before switching to the next one (i.e., inter-video-frame scheduling). Our high-level approach is to defer decisions until transmission time to minimize the impact of prediction errors. Below we describe these components in detail.

Delayed video rate adaptation: A natural approach to transmit data is to let the application decide how much data to generate and pass it to the lower network layers based on its throughput estimate. This is widely used in the existing video rate adaptation. However, due to unpredictable throughput fluctuation, it is easy to generate too much or too little data. This issue is exacerbated by the fact that the 60 GHz link throughput can fluctuate widely and rapidly.

To fully utilize network resources while minimizing the effect on user quality, we delay the transmission decision as late as possible to remove the need for throughput prediction at the application layer. Specifically, for each video frame we let the application generate all layers and pass them to the driver in the order of layers 0, 1, 2, 3. The driver will transmit as much data as the network interface card allows before the sender switches to the next video frame.

While the concept is intuitive, realizing it involves significant effort as detailed below. First, our module determines whether a packet belongs to a video stream based on the destination port. Instead of forwarding all video packets directly to the interfaces, they are added to a large circular buffer. This allows us to make decisions based on any optimization objective. Packets are dequeued from this buffer and added into the transmission queue of an interface whenever a transmission decision is made. Moreover, to minimize throughput wastage, this module drops the packets that are past their deadline. The module uses the header information attached to each video packet to determine which video frame the packet belongs to and its deadline. Before forwarding the packets to the interface, the module estimates whether that packet can be delivered within the deadline based on current throughput estimate and queue length of the interfaces. The interface queue depletion rate is used to estimate the achievable throughput. If a packet cannot make the deadline, all the remaining packets belonging to the same video frame are marked for deletion since they have the same deadline. The buffer head is moved to the

start of the next frame. Packets marked for deletion are removed from the socket buffer in a separate low-priority thread, since the highest priority is to forward packets to the interfaces to maintain throughput.
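A minimal sketch of the per-packet deadline check (our illustration; the real logic lives inside the bonding driver module and the names are hypothetical):

import time

# Decide whether a packet forwarded now can still arrive before its deadline,
# given the bytes already queued at the interface and its estimated throughput.
def can_meet_deadline(pkt_bytes, deadline_ts, queued_bytes, est_throughput_bps):
    tx_time = (queued_bytes + pkt_bytes) * 8.0 / est_throughput_bps
    return time.time() + tx_time <= deadline_ts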

Intra-frame scheduling: The interface firmware beneath the wireless driver is responsible for forwarding packets. Transmission is out of the driver's control once a packet is put into an interface's queue. This means that the packet cannot be dropped even if it misses its deadline. To minimize such waste, we enqueue only the minimum number of packets at both interfaces needed to sustain the maximum throughput. The interface card notifies the driver whenever a packet has been successfully delivered, and the driver removes it from that interface queue automatically.

Our module keeps track of the queue lengths of both the WiFi and WiGig interfaces. As soon as the queue size of an interface goes below the threshold, it forwards packets from the packet buffer to that interface. As shown in Section 4.1, both interfaces have frequent inactivity intervals during which they are unable to transmit any packets. This can result in significant delay for the packets sent to the inactive interface. Such delay can sometimes cause lower-layer packets to get stuck in one interface queue while higher-layer packets are transmitted on the other interface. Since our layered encoding requires the lower-layer packets to be received in order to decode the higher-layer packets, it is important to ensure that the lower-layer packets are transmitted before the higher layers.

To achieve this goal, the sender monitors the queue on both interfaces.

If no packet is removed from that queue for a duration T, it declares that interface inactive. When the queue of the active interface shrinks, which means that it is now sending packets and is ready to accept more, we compare the layer of the new packet with that of the packet queued at the inactive interface. If the latter belongs to a lower layer (which means it is required in order to decode the higher-layer packets), we move the packet from the inactive interface to the active interface. No rescheduling is done for packets in the same layer since these packets have the same priority. In this way, we ensure that all the lower-layer packets are received prior to the reception of higher-layer packets. T is adapted dynamically based on the current frame's deadline: if the deadline is more than 10 ms away, we set T = 4ms; otherwise it is set to 2ms. This is because we need to react more quickly when the deadline is closer, so that all the enqueued packets can be transmitted on the active interface and received before the deadline.
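A sketch of the inactivity check and lower-layer rescheduling (our illustration; the threshold values follow the text, while the packet data structures are hypothetical):

# Inactivity threshold T depends on how close the frame deadline is.
def inactivity_threshold(ms_to_deadline):
    return 0.004 if ms_to_deadline > 10 else 0.002        # 4 ms vs. 2 ms

# If the inactive interface holds a lower-layer packet than the one we are
# about to enqueue on the active interface, send that packet first instead.
def steal_lower_layer(next_pkt_layer, inactive_queue, idle_time, ms_to_deadline):
    if idle_time < inactivity_threshold(ms_to_deadline) or not inactive_queue:
        return None
    if inactive_queue[0].layer < next_pkt_layer:
        return inactive_queue.pop(0)                       # move it to the active interface
    return None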

Inter-frame scheduling: If the playout deadline for a frame is larger than the frame generation interval, the module can have packets from different video frames at a given instant. For example, in a 30 FPS video, a new frame is generated every 33 msec. If the deadline to play a frame is set to 60 ms from its generation as suggested in recent studies [44], two consecutive frames will overlap for 27 ms. In this case, we have an opportunity to decide when to start transmitting the next video frame. Certainly, no packets should be transmitted after their deadline. However, we sometimes need to move on to transmit the next video frame before the deadline of the previous frame arrives, to ensure

similar numbers of packets are transmitted for two consecutive frames and there is no significant variation in the quality of consecutive frames.

To achieve this, whenever the interface queue has room to accept a new packet, our module predicts the number of packets that can be transmitted for the next frame. If the estimate is smaller than the number of packets already sent for the current frame minus a threshold (80 packets in our system), we switch to transmitting the next video frame. This reduces variation in the quality of two consecutive video frames. Note that a frame is never preempted before its base layer finishes transmission (unless it passes the deadline), to ensure that at least the lowest-resolution frame is fully transmitted.
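A sketch of the switch decision (our illustration; the 80-packet threshold is the value reported for the prototype):

SWITCH_THRESHOLD = 80   # packets

# Switch to the next frame when it would otherwise receive far fewer packets
# than the current one; never preempt before the base layer is fully sent.
def should_switch_frame(pkts_sent_current, est_pkts_next, base_layer_sent):
    return base_layer_sent and est_pkts_next < pkts_sent_current - SWITCH_THRESHOLD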

4.4 Evaluation for Jigsaw

In this section, we first describe our evaluation methodology. Then we perform micro-benchmarks to understand the impact of our design choices. Finally, we compare Jigsaw with the existing approaches.

4.4.1 Evaluation Methodology

Evaluation methods: We classify our experiments into two categories: Real-Experiment and Emulation. Experiments are done on two laptops equipped with a QCA9008 AC+AD WiFi module that supports 802.11ad on the 60 GHz band and 802.11ac on the 2.4/5 GHz bands. Each laptop has a 2.4GHz dual-core processor, 8 GB RAM and an NVIDIA Geforce 940M GPU. One laptop serves as the sender and the other serves as the receiver. The sender encodes the data while the

receiver decodes the data and displays the video.

While real experiments are valuable, the network may vary significantly when we run different schemes even for the same movement pattern. This makes it hard to compare different schemes. Emulation allows us to run different schemes using exactly the same throughput trace. For emulation, we connect two desktops, each with a 3.6GHz quad-core processor, 8 GB RAM, a 256GB SSD and an NVIDIA Geforce GTX 1050 GPU, using a 10 Gbps fiber optic link. We collect packet traces over WiFi and WiGig using our laptops and use these traces to run trace-driven emulation. The sender and receiver code run on the two desktops in real time. We emulate two interfaces, and delay the packets according to the packet traces from WiFi and WiGig before sending them over the 10 Gbps fiber optic link. We verify the correctness of our emulation by comparing the instantaneous throughput between the real traces and the emulated experiment, and find that the emulated throughput is within 0.5% of the real trace's throughput.

Video Traces: We use 7 uncompressed videos in YUV420 format with resolution 4096x2160 from Derf's collection under Xiph [14]. We choose videos with different motion and spatial locality to evaluate their impact. Videos are streamed at 30 FPS, and each video is concatenated with itself to generate the desired duration. We classify the videos into two categories based on the spatial correlation of their pixels. We use the variance of the Y values in 8x8 blocks over all frames of a video to quantify its spatial correlation. Videos that have high variance are classified as high-richness (HR) videos and videos with

low variance are classified as low-richness (LR) videos. We use 2 HR videos and 5 LR videos.
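Our reading of this classification as a quick sketch (assumption: the richness score is the mean per-8x8-block variance of the Y plane; the HR/LR threshold is not specified in the text):

import numpy as np

# Mean variance of Y values over non-overlapping 8x8 blocks of one frame.
def frame_richness(y_plane, block=8):
    h, w = y_plane.shape
    crop = y_plane[:h - h % block, :w - w % block].astype(np.float64)
    blocks = crop.reshape(h // block, block, w // block, block)
    return float(blocks.var(axis=(1, 3)).mean())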

Mobility Patterns: We collect traces by fixing the sender and moving the receiver. We use the following movement patterns in our experiments: (i) Static: no movement of the sender or receiver; (ii) Watching a 360° video: a user watches a 360° video and rotates the receiver laptop up and down, left and right according to the video displayed on the laptop, with the static sender around 1.5m away; (iii) Walking: the user walks around with the receiver laptop in hand within a 3m radius of the sender; (iv) Blockage: the sender and receiver are static, but another user moves between them and may block the link from time to time, thus inducing environment mobility.

For our real experiments, we run each experiment 5 times for each mobility pattern and then average the results. For emulation, we collect 5 different packet level traces for each mobility pattern.

Performance metrics: We use Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) to quantify video quality. PSNR is a widely used video quality metric. Videos with PSNR greater than 45 dB are considered to have excellent quality, 33-45 dB good, and 27-33 dB fair. A 1 dB difference in PSNR is already visible, and a 3 dB difference indicates that the video quality is doubled. Videos with SSIM greater than 0.99 are considered to have excellent quality, 0.95-0.99 good, and 0.88-0.95 fair [103].
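For reference, PSNR for 8-bit frames is computed as follows (standard definition, not specific to this dissertation):

import numpy as np

# Peak Signal-to-Noise Ratio between a reconstructed frame and its reference.
def psnr(frame, reference, max_val=255.0):
    mse = np.mean((frame.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)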



Figure 4.7: Video quality vs no. of received layers.

In all our results, the error bars represent maximum and minimum values unless otherwise specified.

4.4.2 Micro-benchmarks

We use emulation to quantify the impact of different system compo- nents since we can keep the network condition identical.

Impact of layers on video quality: Our layered coding allows us to exploit spatial correlation. It can not only adapt the video quality on the fly but also reduce the amount of data to transmit. We quantify the compression rate for different types of videos. Fig. 4.7 shows the frame PSNR as we vary the number of layers. For the LR videos, receiving only 2 layers can achieve an average PSNR close to 40dB and receiving only 3 layers gives PSNR of 42 dB. For the HR videos, the average PSNR is around 7dB lower than the LR video when receiving 3 layers. Our scheme can achieve similar PSNR values for both kind of videos when all the layers are received. Moreover, as we would expect,

less rich videos have higher compression ratios due to smaller difference values, which can be represented using fewer bits.


Figure 4.8: Impacts of using WiGig and WiFi.

Impact of using WiGig and WiFi: Fig. 4.8 shows the PSNR per frame when using WiGig only. We use the throughput trace collected when a user is watching 360° videos. Without WiFi, PSNR drops below 10dB when a disconnection happens. When WiGig throughput changes drastically, PSNR can decrease by more than 10dB even if WiGig is not completely disconnected. WiFi improves PSNR by over 25dB when WiGig is disconnected, and by 2dB even when WiGig is not disconnected. A WiGig disconnection can result in partially blank frames because the available throughput may not be able to transmit the whole layer 0 in time. The complementary throughput from WiFi removes partially blank frames effectively, so we observe a large improvement in PSNR when a WiGig disconnection happens. We can also transmit more layers when using both WiGig and WiFi because of the higher total available throughput. Thus,

WiFi still improves PSNR even if WiGig is not disconnected.


Figure 4.9: Impacts of interface scheduler.

Impact of interface scheduler: When the WiGig throughput reduces, data at the WiGig interface gets stuck. Such situation is especially bad when the data packets queued at WiGig are from the lower layer than those queued at WiFi. In this case, even if WiFi successfully delivers the data, they are not decodable without the complete lower layer data. To avoid this situation, our scheduler can effectively move the data from the inactive interface to the active interface whenever the inactive interface has lower layer data. When Layer-0 data get stuck, the user will see part of the received frame in blank. Fig. 4.9 shows the percentage of partial frames that do not have complete Layer-0 under various mobility patterns. Our scheduler reduces the percentage of partial frames by 90%, 82% and 49% for watching, walking, and blockage traces, respectively. In the static case, we do not observe any partial frame. Our scheme gives a partially blank frame only when the throughput is below

the minimum required throughput of 50 Mbps. As shown in Fig. 4.9, the number of partially blank frames for our scheme is within 0.1% of what we receive in the ideal case for these traces.

GPU Cores   Power Rating (W)   Encoding Time (ms)   Decoding Time (ms)
384         30                 10.1                 19.3
768         75                 1.7                  5.7
2816        250                1.1                  5.3

Table 4.2: Performance over different GPUs

Impact of Inter-frame Scheduler: When throughput fluctuates, the video quality can vary significantly across frames. Our inter-frame scheduler tries to balance the throughput allocated to consecutive frames. Without this scheduling, we simply send a frame's data until its deadline. As shown in Fig. 4.10, the scheduler improves the quality of frames by around 10dB when the throughput is low. The impact is minimal when the throughput is already high and stable.


Figure 4.10: Impacts of inter-frame scheduler.

Impact of GPU: Next we study the impact of GPU on the encoding and


decoding time. Table 4.2 shows the performance of Jigsaw for three types of GPUs with different numbers of cores. A more powerful GPU has more cores and reduces the encoding and decoding time significantly, leaving more time to transmit data. However, this comes at the cost of higher power consumption. Even the low- to mid-end GPUs that are generally available on mobile devices can successfully encode and decode the data in real time.

Figure 4.11: Impacts of frame deadline.

Impact of Frame Deadline: The frame deadline is determined by the desired user experience of the application. Fig. 4.11 shows the video PSNR when varying the frame deadline from 16 ms to 66 ms. The performance of Jigsaw using a 36 ms deadline is similar to that using a 66 ms deadline, and 36 ms is much lower than the 60 ms delay threshold for interactive mobile applications [44, 57, 78]. We use 60ms as our frame deadline threshold because this is the threshold deemed tolerable by earlier works [44, 57, 78]. However, as we show, Jigsaw can tolerate frame deadlines as low as 36ms, so even if future applications require lower delay, our system can still support them.

4.4.3 System Results

We compare our system with the following three baseline schemes using real experiments as well as emulation.

• Pixel Partitioning (PP): The video frame is divided into 8x8 blocks. The first pixels of all blocks are transmitted, followed by the second pixels of all blocks, and so on. If not all pixels are received for a block, the remaining ones are replaced by the average of the pixels received so far for that block.

• Raw: Raw video frame is transmitted. This is uncompressed video streaming. Packets after their deadline are dropped to avoid wastage.

• Rate adaptation (Rate): Throughput is estimated using historical information. Based on this estimate, the uncompressed video is down- sampled accordingly before transmission. The receiver up-samples it to 4K and displays it. This is similar in concept to DASH streaming, which is the current standard of video-on-demand streaming.

Benefits of Jigsaw: Fig. 4.12 shows the video quality of all schemes under various mobility patterns and videos. We perform real experiments by running each video 5 times over each mobility trace and report the average. We make the following observations.


Figure 4.12: Performance of Jigsaw under various mobility patterns (HR: High-Richness, LR: Low-Richness). (a) HR Video, Static; (b) HR Video, Watching; (c) HR Video, Walking; (d) HR Video, Blockage; (e) LR Video, Static; (f) LR Video, Watching; (g) LR Video, Walking; (h) LR Video, Blockage.

First, Jigsaw achieves much higher PSNR than the other schemes across all mobility scenarios and videos. Raw always has partially blank frames and a very low PSNR of around 10dB in all settings, as the throughput cannot support full 4K frame transmission even in the best case. PP also transmits the complete 4K frame, so it is never able to deliver a complete frame in any setting; however, the impact of not receiving all the data is not as severe as for Raw. This is because some data from different parts of the frame is received, and frames are no longer partially blank. Moreover, using interpolation to estimate the unreceived pixels also improves quality. Rate achieves high PSNR in static scenarios since the throughput prediction is more accurate and it can adapt the resolution. However, it cannot transmit at full 4K resolution.

Second, the benefit of Jigsaw is higher under mobility. Jigsaw improves the median PSNR by up to 6 dB over Rate, 12 dB over PP and 35 dB over Raw in static settings. The corresponding improvements in mobile settings are 15 dB over Rate, 16 dB over PP and 38 dB over Raw, because Jigsaw can quickly adapt to throughput changes due to its layered coding design, delayed video rate adaptation, and smart scheduling.

Third, for all schemes, HR videos suffer more when less data can be transmitted, so all schemes achieve higher PSNR for the LR videos. The median video PSNR for the LR (HR) videos is 46dB (41dB), 38dB (31dB), 33dB (26dB) and 10dB (9dB) for Jigsaw, Rate, PP and Raw, respectively. These results show that Jigsaw is effective and significantly outperforms the existing approaches for a variety of network conditions and videos.

Table 4.3 shows the median video SSIM for all schemes. We observe a similar trend as with PSNR: Jigsaw achieves the highest SSIM, and it achieves at least good video quality for all videos and mobility traces.


Figure 4.13: Frame quality correlation with throughput.

Effectiveness of layer adaptation: Fig. 4.13 shows the correlation between throughput and video frame quality for Jigsaw. We can see that the changes in video quality follow a very similar pattern to the throughput changes. Jigsaw only receives layer 0 and a partial layer 1 when the throughput is close to 0; in those cases, the frame quality drops to around 30dB. When the throughput stays close to 1.8Gbps, the frame quality reaches around 49dB. Because we keep our interface queues small, our packet transmission rate closely follows the packet depletion rate of the interface queues. Hence, our layer adaptation can quickly respond to any throughput fluctuation.

Scenario        Jigsaw   Rate    PP      Raw
HR, Static      0.993    0.978   0.838   0.575
HR, Watching    0.965    0.749   0.818   0.489
HR, Walking     0.957    0.719   0.805   0.421
HR, Blockage    0.971    0.853   0.811   0.511
LR, Static      0.996    0.985   0.949   0.584
LR, Watching    0.971    0.785   0.897   0.559
LR, Walking     0.965    0.748   0.903   0.481
LR, Blockage    0.984    0.862   0.907   0.560

Table 4.3: Video SSIM under various mobility patterns.

4.4.4 Emulation Results

In this section, we compare the frame-quality time series of Jigsaw with the other schemes when using the same throughput trace. We use emulation to compare the different schemes under the same conditions. Fig. 4.14 shows the quality of each frame using an example throughput trace collected from the Watching mobility pattern. As we can see, Jigsaw achieves the highest frame quality and the least variation.


Figure 4.14: Frame quality comparison

Chapter 5

Real-Time Deep Video Analytics on Mobile Devices

5.1 Motivation

Applications that need real-time video analytics mostly rely on computation offloading to reduce inference latency. Computation offloading is extensively studied. DeepDecision [115] and DARE [91] offload computationally intensive inference to an edge/cloud server based on the network condition. Liu et al. [88] propose to offload inference of a few frames to an edge server and use motion to track objects for the frames between two offloaded frames. However, offloading (i) requires a powerful server (e.g., [88] uses an Nvidia TITAN XP GPU, which is far more powerful than the hardware typically present on mobile devices), (ii) requires good network connectivity, which may not be available in remote or crowded areas, and (iii) raises privacy concerns. Thus, there have been many recent works [73, 84, 102, 146] investigating how to run deep models on client devices only, but they can only support video analytics slower than 3 fps. To realize real-time video analytics on mobile devices, we first investigate the major challenges involved.

5.1.1 Deep Model Inference Latency

A natural way to detect objects in a video is to run a deep model on every frame. These models detect the objects in the frame and produce a bounding box for each object. The state of art deep models have many layers. For example, both FasterRCNN [120] and SSD [93] use VGG [130] as the base network and have more than 19 layers. R-FCN [54] uses ResNet-101 [67] and has 101 layers. Even the recent model Yolo [119] designed for real-time object detection has 24 layers. Having many layers means that inference takes a long time.

We evaluate the inference latency of several models, including Yolo [119], FasterRCNN [120], SSD [93], and R-FCN [54]. We run these models on two commodity GPUs: the Nvidia Jetson TX2 (a low-power embedded GPU [6]) and the Nvidia Geforce 940M (a typical mid-tier laptop GPU [5]). Table 5.1 shows the inference latency for a single frame and the accuracy of these models on the PASCAL VOC 2007 test set [58]. The results show that on the embedded GPU, the fastest model can process frames at slightly less than 3 fps, which is an order of magnitude too slow to support real-time object tracking in a typical 30 fps video.

Model            Yolo        SSD-300     F-RCNN      R-FCN
TX2 (GPU)        390ms       370ms       1082ms      1609ms
940M (GPU)       320ms       265ms       954ms       1049ms
Accuracy (mAP)   78.6 [119]  74.3 [93]   73.2 [120]  79.5 [54]

Table 5.1: Inference latency for a single frame

Insight: Commodity mobile hardware cannot run inference on every frame to support real-time object tracking. We need a fast way to adjust the inference results for the frames falling between consecutive inferences.

5.1.2 Motion Estimation Based Tracking

Instead of running inference on every frame, one can use the previous inference results and motion to derive the object position in the current frame. This idea has been explored in a recent work [88], which uses a combination of motion estimation and edge offloading to improve efficiency. Since inference runs on a very powerful edge server, its delay is within 30 ms, so complete inference runs frequently when motion-based tracking fails. Motion-based tracking is used for only a few frames (e.g., 3) between two inferences.

We examine the feasibility of motion tracking when the mobile device runs inference locally. In this case, it incurs a large delay (i.e., around 15 frames). We evaluate two approaches: (i) reuse the bounding box from the previous inference in the current frame and (ii) use the average motion of pixels in the bounding box to track the box from the previous frame to the current frame. To calculate motion, we use either optical flow or motion vectors.
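The avg-tracking baseline evaluated here can be sketched as follows (our illustration using OpenCV's Farneback optical flow; not the exact code used in the measurement):

import cv2

# Shift the previous bounding box by the mean optical flow of the pixels inside it.
def avg_track(prev_gray, cur_gray, box):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y, w, h = box                                   # box in (x, y, width, height)
    dx, dy = flow[y:y + h, x:x + w].reshape(-1, 2).mean(axis=0)
    return (int(round(x + dx)), int(round(y + dy)), w, h)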

We use a widely used accuracy metric IoU [59], which quantifies the ratio between the intersection of the tracked bounding box and ground-truth bounding box and their union. As in the existing work [80], we consider the tracking has failed if IoU is below 0.5.
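The IoU computation itself is the standard one for axis-aligned boxes given as (x1, y1, x2, y2):

# Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0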

107 1 0.8 0.6


Figure 5.1: Successfully Tracked Frames

In Fig. 5.1, we show the number of successfully tracked frames in a group of 15 consecutive frames. We test 5 videos in our experiments. Our results show that, on average, optical flow based tracking fails after 5 frames. The results for motion vector based tracking and the box-reusing approach are even worse. Motion-based tracking fails because the bounding box contains pixels from both the object and the background. The motion of the background pixels may be very different from that of the object, so the average motion of all pixels in the box may deviate from the object motion. Motion vectors estimate motion at a much coarser granularity than optical flow, hence their accuracy is worse.

Insight : The existing motion based schemes require frequent inference to

108 1 0.8 0.6


maintain high accuracy. It is infeasible to run frequent inference on mobile hardware, so we need better motion estimation.

Figure 5.2: Impacts of object size changes. (Red box: ground-truth result; green box: tracking result.) (a) Size-changes example; (b) IoU degradation.

5.1.3 Object Size Changes

When we use motion to track an object, the bounding box size remains the same. In practical applications (e.g., Augmented Reality, Autonomous Driving, etc), the movement from both object and camera can result in signif- icant changes in the object size. As a result, overestimating or underestimating the box size can significantly degrade the tracking accuracy.

Fig. 5.2(a) shows an example video with a significant change in the object size. Frame 0 has the ground-truth bounding box. The object size decreases significantly at frame 6. More background pixels are included in the tracked box, which causes a significant deviation from the ground-truth box for the subsequent frames.

To quantify the impact of object size changes on IoU, we analyze the

IoU for any group of 15 consecutive frames in 5 videos. The first frame in the group has the ground-truth box. For every subsequent frame, we move the box so that the center of the tracked box matches the center of the ground-truth box but its size remains the same. If the object size did not change, the IoU would be 1.0. Fig. 5.2(b) shows that the average IoU degrades by 0.25 due to the object size changes. This means that even an ideal tracking scheme that does not adapt the size can achieve only 75% accuracy.

Insight : It is important to adapt to object size changes. Otherwise, there will be severe accuracy degradation.

5.2 Challenges

Without offloading, state-of-the-art approaches [90, 146, 153] can only support object tracking at slower than 3 fps on mobile devices (e.g., the Nvidia Jetson TX2). Running object detection on a frame using deep models is called inference. Due to the large latency of running inference on a mobile device (i.e., the inference latency), many frames between two inferences do not have object detection results. Liu et al. [88] use motion tracking to estimate the object positions in those frames. Motion tracking updates the bounding box (generated by inference for each object) using the previous detection result and the average motion of the pixels in the object bounding box. Bounding box tracking based on the average motion of the pixels in the box is referred to as avg-tracking. However, motion estimation is noisy due to the presence of background pixels in the object bounding box. It is challenging to separate the object and background

pixels. A mask is the set of all pixels of the detected object. Existing object mask extraction approaches [66, 98, 110] use complex deep models and have high latency on mobile devices (e.g., Mask RCNN [66] takes more than 1 sec on the Nvidia Jetson TX2).

Moreover, object size or shape can change significantly due to the movement of the object or the camera. Over-estimating or under-estimating the object size can introduce significant error in motion estimation, which translates into a large tracking error. Existing works [91, 115, 140] can detect size changes by running inference on each frame, which is infeasible on mobile devices. Existing motion-based object tracking approaches [88, 123, 127, 129, 150] do not adapt the object size during tracking. It remains open how to design an efficient and robust mechanism to adapt object size for motion-based tracking.

5.3 Approach

As shown in Section 5.1, running deep models on mobile devices is com- putationally expensive and incurs large latency, which adversely impacts the accuracy of object tracking. Leveraging the motion estimation can potentially reduce the frequency of running deep models so that we can achieve both low latency and high accuracy.

In order to leverage motion estimation for this purpose, its accuracy should be high. Otherwise, motion estimation error may quickly accumulate across frames. We identify the following challenges in fully realizing the poten- tial of motion estimation: (i) we should reliably estimate the motion using only

the pixels belonging to the object and avoid contamination from the motion of background pixels, which could be quite different; (ii) it should adapt to changes in object size and shape, which are quite common (e.g., an object moving towards or away from the camera, or non-rigid objects); and (iii) it should automatically detect when motion estimation has accumulated a large error so that we can re-run the complete inference using a deep model.

To address (i), we seek an efficient and reliable method to generate an object mask that filters as many background pixels as possible and uses only the pixels on the mask to estimate motion. Existing methods of mask generation (e.g., [66, 98, 110]) are too expensive for real-time execution on mobile devices. Therefore, we develop a new mask generation method. Our key insight is that a convolutional feature map captures visual features of the object and filters out many background pixels, and is very helpful for mask generation and motion estimation. Combining the visual features with motion further improves the quality of the object mask.

To address (ii), we leverage the mask generated in the previous step and track how pixels on the mask change across frames to adapt the bounding box. However, since neither the mask nor the motion estimation is perfect, we develop a simple yet effective procedure to remove outliers and compute the new box in a way that is more resilient to these errors.

To address (iii), we design an adaptive strategy to decide when to run inference. We develop a simple metric based on the similarity between the tracked box and the ground-truth box from the latest inference. This allows the system to avoid running unnecessary inference.

Next we describe the three major components in detail: (i) mask generation, (ii) adaptation to object size and shape changes, and (iii) adaptive inference.

5.3.1 Reliable Mask Extraction

As mentioned earlier, accurate motion estimation needs efficient and reliable filtering of the background pixels, so that we only use object pixels for motion estimation. On one hand, many existing works can generate accurate masks (e.g., [66, 98, 110]); however, they are too expensive to run. For example, it takes 210 ms for Mask R-CNN [66] to generate a mask on a powerful desktop GPU. On the other hand, one can try to use cheaper clustering-based schemes to differentiate between the object and background from the raw image. However, they are sensitive to the colors and patterns of the object and background [49, 53, 109]. How to efficiently and reliably generate a mask is an important open problem.

Through experimenting with various options, we identify that feature maps can be used for this purpose. Feature maps are the outputs of convolutional layers in deep models. They are fast to compute and can distinguish between background and object pixels. However, not all feature maps are effective, and good feature maps vary across videos. Therefore, we need a way to select an appropriate feature map to use. Although a good feature map helps distinguish most background pixels from object pixels, quite a few background pixels may still look similar to object pixels (e.g., when the background has a similar appearance to the object). This may contaminate the object motion estimation. To remove the remaining background pixels, we observe that the object and background not only differ in their visual features but also in their motion. Therefore, we filter the remaining background pixels using motion. Our mask generation consists of (i) selecting an appropriate feature map, (ii) extracting the feature map, and (iii) computing the motion using the feature map.

Figure 5.3: Calculating motion from feature maps (Left: raw frames or feature maps, Right: optical flow)

Feature map: Feature maps are desirable for mask generation since they are fast to compute and can discriminate many background pixels from object pixels. A feature map is the output from applying a convolutional filter to the input image. A convolutional layer in a trained deep model consists of multiple convolutional filters. Each filter is small and generates one feature map. Refer to [1] for more details of the feature map generation procedure.

Recent works [55, 99, 140] show that the feature maps in the initial layers of deep models extract features correlated with object visual cues. Each element in a feature map is referred to as an activation value. Some feature maps have large activation in the object region and low activation in the background region, while others are the opposite. We prefer feature maps with low activation in the background region so that we can apply clustering to the feature map to effectively filter out background pixels.

Figure 5.4: Representative mask generated from the intersection between the feature-map mask and the optical-flow mask (panels: raw image, feature map, ground-truth mask, feature-map mask, optical-flow mask, representative mask)

As an example, consider tracking the bike in the video frame in Fig. 5.3. The feature map is generated from the first convolutional layer and the second max-pool layer in the Yolo model [119]. The max-pool layer downsamples the feature map output of the first layer by taking the maximum of each 2 × 2 grid. The first column illustrates the original frame and feature maps, and the second column shows the corresponding optical flow. We can see that filter 9 has very sparse activation and the object region is highly correlated with the visual features of the actual object. Applying clustering to this feature map removes most of the background pixels. This shows the potential of using feature maps to remove background pixels.

Estimating motion: Optical flow is widely used to estimate the motion between two consecutive frames [41]. It is generally calculated from raw video frames. However, computing optical flow directly on raw frames is error-prone when the background and object look similar. Fig. 5.3 shows that the optical flow from raw frames cannot effectively separate object and background motion even when they are quite different.

Since feature maps are effective in separating the background and object pixels, we compute the optical flow on feature maps. Fig. 5.3 shows the optical flow of a good feature map, which correctly captures the motion of object pixels. However, not all feature maps have good optical flow. For example, filter 10 does not generate a sparse feature map, so its optical flow is very noisy and fails to capture the object movement.

Feature Map Selection: A convolutional layer generates multiple feature maps (e.g., 32 in the first convolutional layer of Yolo [119]). Generating and processing all feature maps is not only expensive but can also introduce errors, since using bad feature maps can significantly degrade the performance.

We develop a simple metric to select the right feature map. It is based on the observation that a good feature map should have high activation in the object region and sparse activation in the background region. Therefore, our metric is the ratio between the total activation of the pixels in the ground-truth bounding box and the total activation of all other pixels. We call this metric contrast. A higher contrast is preferred.
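As a concrete illustration, the contrast metric can be computed directly from the activation values. The following Python/NumPy sketch (the helper names contrast and select_feature_map are ours, not part of the Sight implementation) assumes the feature map is a 2-D array and the box is given in feature-map coordinates:

import numpy as np

def contrast(feature_map, box):
    """Contrast of one feature map: total activation inside the
    ground-truth box divided by the total activation of all other
    pixels (higher is better).

    feature_map: 2-D array of activation values (H x W).
    box: (x1, y1, x2, y2) in feature-map coordinates.
    """
    x1, y1, x2, y2 = box
    inside = feature_map[y1:y2, x1:x2].sum()
    outside = feature_map.sum() - inside
    return inside / (outside + 1e-9)   # guard against an all-object map

def select_feature_map(feature_maps, box):
    """Index of the feature map with the highest contrast."""
    return int(np.argmax([contrast(fm, box) for fm in feature_maps]))

The metric only needs to be evaluated at inference time, when a fresh ground-truth box is available.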

The ground-truth bounding box is obtained by running a deep model. This means that the ground truth is available only when we run inference. A natural question is whether the feature map selected at the time of inference remains good for the subsequent frames until the next inference. To answer this question, we estimate our contrast metric over 20 frames, which is the average number of frames between two inferences across different videos. We find that the contrast metric remains stable until the next inference, staying within 17% of the contrast value of the frame at the time of inference.

Clustering: We use a combination of visual features and motion to group the pixels in the box into two clusters: object and background. This is inspired by the following observation. Object pixels differ from background pixels in both visual features [49, 53, 109] and motion [45, 85, 107]. However, neither alone is reliable, since objects can have similar visual features (e.g., color, texture) and similar motion to some part of the background. By using both, we can significantly enhance the reliability of clustering. The intuition is that pixels identified as object pixels by both visual features and motion are more likely to be true object pixels.

We generate the feature-map mask by simply clustering the pixels in the bounding box into two clusters based on their activation values in the feature map. We select as the object cluster the one whose average activation value differs more from the average activation of the pixels surrounding the box.

We obtain the optical-flow mask by performing similar clustering on the optical flow calculated from feature maps. To further remove background pixels, we take the intersection of the clusters generated from the feature map and the optical flow. We call this the representative mask.
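A minimal sketch of this clustering-and-intersection step is shown below, using scikit-learn's K-means for illustration (Sight itself uses Armadillo's K-means, as noted in Section 5.4). The helper names, and the use of flow magnitude as the clustering feature for the optical-flow mask, are our assumptions:

import numpy as np
from sklearn.cluster import KMeans

def two_way_mask(values, surround_mean):
    """Cluster per-pixel values inside the box into two groups and keep
    the cluster whose mean differs more from the mean of the pixels
    surrounding the box.

    values: 1-D array with one value per pixel in the box (activation
            for the feature-map mask, flow magnitude for the
            optical-flow mask).
    Returns a boolean array over the box pixels.
    """
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(values.reshape(-1, 1))
    means = np.array([values[labels == k].mean() for k in (0, 1)])
    keep = int(np.argmax(np.abs(means - surround_mean)))
    return labels == keep

def representative_mask(activations, act_surround_mean, flow, flow_surround_mean):
    """Intersection of the feature-map mask and the optical-flow mask."""
    fmap_mask = two_way_mask(activations.ravel(), act_surround_mean)
    flow_mag = np.linalg.norm(flow.reshape(-1, 2), axis=1)
    flow_mask = two_way_mask(flow_mag, flow_surround_mean)
    return fmap_mask & flow_mask   # boolean mask over the pixels in the box

The returned mask is defined over the pixels inside the current bounding box; only those pixels are used for motion estimation.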

Figure 5.5: Ratio of Object Pixels in Masks ((a) feature map vs. raw frame; (b) individual masks vs. their intersection)

The top row in Fig. 5.4 shows the representative mask generation on a feature map, while the bottom row shows the masks from raw video frames. The masks from the feature map and optical flow alone cannot completely remove background pixels. The intersection filters out most of the background pixels, but it can also remove some of the object pixels. We design our tracking algorithm such that it only requires the mask to contain enough object pixels for motion estimation and is robust to missing object pixels.

Next, we compare different mask generation schemes. We generate representative masks on feature maps and raw frames for bounding boxes having an IoU of 1.0 and 0.6. Fig. 5.5(a) summarizes the results for all frames. When the IoU is 1.0, on average 90% and 65% of the pixels are object pixels in the masks generated from feature maps and raw frames, respectively. When the IoU is 0.6, the box has more background pixels. The corresponding numbers become 82% and 53%, respectively. The reduction is expected since including more background pixels brings error into mask generation. Moreover, even when the box drifts from the ground-truth box, most pixels on the representative mask generated from feature maps still belong to the object. Fig. 5.5(b) shows that the ratio of object pixels on masks generated from the feature map or optical flow alone is only 64% and 51%, respectively. These results show that (i) generating the mask from feature maps is more reliable than from raw frames, and (ii) combining both visual features and motion improves the mask generation.

5.3.2 Object Size Adaptation

Figure 5.6: Size Adaptive Box Update

Object sizes may vary due to changes in their distance from the camera and/or because objects are non-rigid and different parts move differently. We develop Map&Box to address this issue.


Figure 5.7: Representative Mask based Tracking. (Green Box: tracking result)

Map&Box: Fig. 5.6 illustrates our method. First, we map all pixels in the mask (yellow pixels) to pixels in the current frame (green pixels) using estimated motion. Then we pick a new box that covers all the mapped pixels (dashed green box). This Map&Box technique allows the box to change its dimensions. In the figure, we can see that along the horizontal axis, some of the pixels have larger movement than others, which causes the new dashed green box to have larger width than the original dashed yellow box.

Each pixel is moved independently using its own motion estimate. This helps avoid errors due to averaging the motion over all pixels. However, it is sensitive to the presence of background pixels in the mask, since it keeps all the mapped pixels in the updated box. If the background and object have different motions, the box may grow too large and become inaccurate. To avoid this, we perform the following outlier removal. For each pixel in the desired cluster, we examine whether there are at least c pixels in its neighborhood that belong to the same cluster, where our implementation uses c = 5 and defines the neighborhood as within 2 pixels left, right, up, and down around the pixel. The intuition is that an object is not spatially discontinuous, so sparse points far away from the others in the mask are most likely noise. Furthermore, we also remove the pixels whose estimated motion is more than 1.5 standard deviations away from the average motion of all pixels in the mask.

Our outlier removal discards additional background pixels to improve the robustness of our tracking. While it also removes some object pixels, our tracking algorithm is robust to missing a few object pixels.
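The two outlier-removal rules can be sketched as follows in Python/NumPy; the pairwise neighbor count and the per-component motion test are simplifications chosen for clarity rather than the exact implementation:

import numpy as np

def remove_outliers(coords, flows, c=5, radius=2, k_std=1.5):
    """Two-rule outlier removal for mask pixels. A pixel is kept only if
    (i) at least `c` other mask pixels lie within `radius` pixels of it
    and (ii) its motion is within `k_std` standard deviations of the
    mean motion of all mask pixels.

    coords: (N, 2) integer pixel coordinates of mask pixels.
    flows:  (N, 2) per-pixel motion vectors (dx, dy).
    """
    coords, flows = np.asarray(coords), np.asarray(flows, dtype=float)

    # Rule (i): spatial density (Chebyshev distance <= radius).
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
    dense = (dist <= radius).sum(axis=1) - 1 >= c      # exclude the pixel itself

    # Rule (ii): motion consistency.
    mean, std = flows.mean(axis=0), flows.std(axis=0) + 1e-9
    consistent = (np.abs(flows - mean) <= k_std * std).all(axis=1)

    keep = dense & consistent
    return coords[keep], flows[keep]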

Representative Mask Tracking: Applying Map&Box to the representative mask produces a new box that covers all the mapped pixels; however, this does not give us the new object box. This is because a representative mask covers only part of the object pixels. Therefore, a box that covers only the mapped pixels will usually underestimate the object box.

Since the above procedure tends to underestimate the object box, we do not use its output directly as the object box. Instead, we use the changes in the box covering the representative mask to adapt the original object box.

As shown in Fig. 5.6, once we have the mapped box, we compute how much the mapped box moves relative to the original box. We compute ∆X1 and ∆X2, the changes of the two sides along the horizontal axis, and ∆Y1 and ∆Y2, the changes of the two sides along the vertical axis. We use these changes to update the object box and obtain the new box.
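A minimal sketch of the Map&Box update, assuming outliers have already been removed (the function and argument names are ours):

import numpy as np

def map_and_box(mask_coords, flows, mask_box, object_box):
    """Map&Box: move each mask pixel by its own motion, bound the mapped
    pixels, and apply the per-side changes to the object box.

    mask_coords: (N, 2) (x, y) of representative-mask pixels.
    flows:       (N, 2) per-pixel motion (dx, dy).
    mask_box:    (x1, y1, x2, y2) tight box around the mask pixels.
    object_box:  (x1, y1, x2, y2) current object bounding box.
    """
    mapped = np.asarray(mask_coords) + np.asarray(flows)

    # Box covering all the mapped pixels.
    nx1, ny1 = mapped.min(axis=0)
    nx2, ny2 = mapped.max(axis=0)

    # Per-side changes of the mapped box relative to the original mask box.
    mx1, my1, mx2, my2 = mask_box
    dX1, dY1 = nx1 - mx1, ny1 - my1
    dX2, dY2 = nx2 - mx2, ny2 - my2

    # Apply the same per-side changes to the object box.
    ox1, oy1, ox2, oy2 = object_box
    return (ox1 + dX1, oy1 + dY1, ox2 + dX2, oy2 + dY2)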

Note that our representative mask based tracking (referred to as mask-tracking) is designed to tolerate missing object pixels in the representative mask. It can track the object robustly and adapt the box to size changes as long as the mask contains enough pixels to capture the changes in object shape and size. Our evaluation shows the robustness of mask-tracking. We also note that this technique can tolerate some background noise in the representative mask. Since the box dimensions are chosen so that the box covers all mapped pixels, changes in the box size depend mainly on the motion of the pixels close to the boundary. In other words, it is the pixels with the lowest and highest coordinates that determine the dimensions. This makes Map&Box robust against the presence of background pixels in the representative mask that are far from the box boundary.

Fig. 5.7 shows an example of Sight tracking a person for 12 frames. The red points show the representative mask, which is generated using one of the feature maps of that frame. The green box is the tracked box. As it shows, Sight tracks the person effectively: it adapts the box dimensions accordingly when the person changes shape. This is achieved even when the mask covers only part of the person, so most object pixels are not part of the mask. The figure also shows Sight is robust against the presence of some background noise in the representative mask. For frames 3, 9 and 12, some of the background pixels are incorrectly added to the mask. However, the accuracy of the tracked box remains high.

5.3.3 Adaptive Inference

Sight needs an effective mechanism to decide when to run complete inference. One possibility is to run inference on the current frame whenever the previous one completes. However, running inference is expensive and should be avoided unless the accuracy is low. Our goal is to design an adaptive strategy that runs inference only when needed. This involves (i) designing a metric to determine when the accuracy is low, and (ii) developing an efficient way to use the inference result to update the tracking results for the subsequent frames. (ii) is needed because it takes several frames to run inference on a mobile device. By the time the inference result is available, several frames have already passed, and the inference results need to be updated before they can be used.

Inference Triggering Condition: Tracking accuracy is high if the tracked box contains all or most of the object. To quantify the accuracy, we use the similarity between the tracked box and ground-truth box from the last inference. We need a similarity measure that is robust to the relative position change inside the box.

The H.264/AVC [143] codecs encode video frames by exploiting similar macroblocks (with sizes from 4 × 4 to 16 × 16) across frames. For each macroblock, if we find a similar one in the previous frame, it is encoded using the inter-coded mode. The number of inter-coded macroblocks indicates how similar two frames are. For each tracked box, we calculate the number of inter-coded macroblocks when encoding it against the ground-truth box from the last inference. Let n denote the number of inter-coded macroblocks in the current tracked box, and N the maximum number of inter-coded macroblocks seen in any tracked box after the last inference. We run inference if we observe that n/N < α for t frames. Based on our experiments, we find that α = 0.4 and t = 2 give good performance.
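The triggering rule can be sketched as a small state machine; the class below is our illustration, with n supplied per frame by the H.264 encoder as described above:

class InferenceTrigger:
    """Request inference once n/N < alpha for t consecutive frames
    (alpha = 0.4, t = 2 in the text). n is the number of inter-coded
    macroblocks when encoding the current tracked box against the
    ground-truth box from the last inference; N is the largest n
    observed since that inference.
    """
    def __init__(self, alpha=0.4, t=2):
        self.alpha, self.t = alpha, t
        self.reset()

    def reset(self):
        """Call whenever a new inference completes."""
        self.max_n = 0          # N in the text
        self.low_streak = 0     # consecutive frames with n/N < alpha

    def update(self, n):
        """Feed n for the current frame; return True to run inference."""
        self.max_n = max(self.max_n, n)
        if self.max_n > 0 and n / self.max_n < self.alpha:
            self.low_streak += 1
        else:
            self.low_streak = 0
        return self.low_streak >= self.t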

To test our metric, we run tracking over all videos in our dataset and record the IoU of the tracked box whenever the system decides to run inference. We find that the average IoU when we run inference is 0.6. This indicates the metric can reliably predict when the system is about to fail tracking (e.g., when the IoU goes below 0.5).

Stale inference: Fig. 5.8 shows the process of running a single inference. It takes several frames (10 in our experiments) to complete an inference. The object position in the latest frame may change a lot from the inference result, as shown in the figure. If we simply reuse the inference result (green box) in the current frame, the accuracy can be very low. Mask-tracking cannot be used to update the stale inference across the intermediate frames in real time, because clustering every intermediate frame incurs too much delay.

Therefore, we use the average motion of pixels in the bounding box to derive the new box. The average motion is computed from the feature-map optical flow, which has already filtered out most of the noise from background pixels.

Figure 5.8: Stale Inference Update (Green: stale inference. Red: ground-truth. Yellow: updated stale inference.)

Furthermore, since we start from the ground-truth box, which has few background pixels, the error accumulates slowly. This is still less accurate than mask-tracking since there is no size adaptation for these frames, but we trade some accuracy for speed. Our experiments show that the average IoU of the frame after updating the stale inference is 0.7, which shows the effectiveness of this stale inference update strategy.
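A sketch of this stale-inference update, assuming the per-frame motion vectors inside the box have been cached from the feature-map optical flow while inference was running:

import numpy as np

def update_stale_inference(inferred_box, cached_flows):
    """Catch a stale inference result up to the latest frame by applying,
    for every frame that elapsed while inference ran, the average motion
    of the pixels inside the box (avg-tracking on feature-map flow).
    The box is only translated; no size adaptation is applied here.

    inferred_box: (x1, y1, x2, y2) box returned by the stale inference.
    cached_flows: one (N, 2) array of motion vectors per elapsed frame,
                  taken from the pixels inside the box on that frame.
    """
    x1, y1, x2, y2 = inferred_box
    for flows in cached_flows:
        dx, dy = np.asarray(flows, dtype=float).mean(axis=0)
        x1, y1, x2, y2 = x1 + dx, y1 + dy, x2 + dx, y2 + dy
    return (x1, y1, x2, y2)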

Tracking-Aided Inference: In our experiments, we observe that Sight never completely loses track of the object, and whenever it decides to run inference on a frame, the tracked bounding box still partially overlaps with the actual object. We use this observation to reduce the inference latency. Instead of running inference on the whole frame, we only need to run it on a cropped area that has a high probability of including the complete object. Since the tracked box partially overlaps with the object when running inference, we know that this region should be in the neighborhood of the box. However, if our estimate of this region is incorrect, this will increase the inference latency, as we will have to run inference over the whole image after we fail to detect the object in the smaller region. Therefore, we need to pick the dimensions of this region with care. We get the region by extending the tracked box equally in all directions.

5.4 System Implementation

Fig. 5.9 shows the overall workflow of Sight. There are two main modules running in parallel threads. First, the inference module runs inference on the first frame and produces a bounding box for the object. It also selects the feature map for tracking. Once we have the initial box, the tracking module updates the size and position of the box using mask-tracking for every new frame. The tracked box is used to calculate the similarity metric using the H.264 encoder [106]. As long as the similarity metric remains high, the inference module remains idle. Whenever the similarity falls below a threshold, the inference module starts. It runs inference on the current frame, selects the feature map, and updates the stale box using avg-tracking. In the meantime, the tracking module continues to track the object based on the previously tracked box until the inference task completes. Once the inference finishes, the tracking module takes the inferred results and updates the bounding box using the most recently selected feature map.
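The control flow of the two modules can be sketched as follows; the callables infer, mask_track, avg_track, and similarity_low are hypothetical stand-ins for the components described above, and the real system additionally shares the GPU between the two threads with priority given to tracking jobs:

from concurrent.futures import ThreadPoolExecutor

def sight_loop(frames, infer, mask_track, avg_track, similarity_low):
    """Control-flow sketch of Sight. infer(frame) -> (box, fmap_index) is
    the inference module, mask_track(frame, box, fmap_index) -> box is the
    tracking module, avg_track(stale_box, pending_frames) -> box is the
    stale-inference update, and similarity_low(frame, box) -> bool is the
    H.264-based similarity test.
    """
    pool = ThreadPoolExecutor(max_workers=1)       # inference thread
    box, fmap = infer(frames[0])                   # bootstrap on the first frame
    future, pending = None, []

    for frame in frames[1:]:
        box = mask_track(frame, box, fmap)         # every frame: mask-tracking
        if future is None:
            if similarity_low(frame, box):         # trigger a new inference
                future, pending = pool.submit(infer, frame), []
        else:
            pending.append(frame)
            if future.done():                      # inference finished:
                stale_box, fmap = future.result()  # update the stale box with
                box = avg_track(stale_box, pending)  # avg-tracking, then resume
                future, pending = None, []
        yield box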

Figure 5.9: System Architecture for Sight

Sight uses darknet [117] to run Yolo [119], for both inference and feature map extraction. Darknet is an open source neural network framework. It is built upon cuDNN [50], a GPU-accelerated library from Nvidia to run deep neural networks. Jobs from the tracking module always have high priority, so they can preempt any task from the inference module running on the GPU. Sight is flexible and can easily support other models trained with different resolutions. However, we choose Yolo because other models either have high latency (e.g., Faster RCNN [120], R-FCN [67]) or low inference accuracy (e.g., SSD [93]). In comparison, Yolo has low latency and comparable accuracy to R-FCN [54].

5.4.1 Inference Module

The inference module runs the Yolo model for inference. To speed up inference, it crops the input frame to a region centered at the center of the current tracked box, whose dimensions are the larger of 40% of the entire frame and twice the tracked box dimensions. If the model cannot detect the object in the cropped frame, it re-runs the model on the whole frame.
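A sketch of this cropping rule (the function name and the clipping to frame boundaries are our additions):

def crop_for_inference(frame_w, frame_h, box, frac=0.4, scale=2.0):
    """Region for tracking-aided inference: centered on the tracked box,
    with each dimension the larger of `frac` of the frame and `scale`
    times the box, clipped to the frame boundaries.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w = max(frac * frame_w, scale * (x2 - x1))
    h = max(frac * frame_h, scale * (y2 - y1))
    rx1, ry1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    rx2, ry2 = min(frame_w, int(cx + w / 2)), min(frame_h, int(cy + h / 2))
    return rx1, ry1, rx2, ry2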

While running the inference, it also ranks the feature maps from the first convolutional and pooling layer in terms of the contrast metric defined in Section 5.3. It then passes the index of the selected feature map to the tracking module along with the inference results.

5.4.2 Tracking Module

The tracking module consists of several subtasks and below we describe each of them.

Feature Map Extraction: We use a CNN with a single convolutional layer and a single max-pool layer to extract the feature map. The convolutional layer has a single filter, which is one of the 32 filters in Yolo's first layer. This filter is selected based on the index from the inference module. To minimize latency, we only compute the feature map of a small region that includes the current box. Specifically, our empirical evaluation suggests computing the feature map of the region centered at the center of the current box, whose dimensions are the larger of the current box enlarged by 20 pixels in both width and height and 0.7 times the original frame. Such a choice can completely cover the object in subsequent frames.
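A simplified sketch of this single-filter extraction is shown below; it uses a single-channel kernel and a plain ReLU for clarity, whereas the actual first-layer Yolo filter operates on three color channels:

import numpy as np
from scipy.signal import correlate2d

def extract_feature_map(region, kernel):
    """One convolution followed by 2x2 max-pooling over the cropped
    region, mirroring the single-filter CNN used by the tracking module.
    `kernel` stands in for one of Yolo's first-layer filters.
    """
    fmap = np.maximum(correlate2d(region, kernel, mode="same"), 0.0)  # conv + ReLU
    h, w = (fmap.shape[0] // 2) * 2, (fmap.shape[1] // 2) * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))  # 2x2 max-pool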

Motion Calculation: Sight uses the GPU-accelerated optical flow implementation from OpenCV [43] to extract motion across consecutive frames. We compute dense optical flow using Gunnar Farneback's algorithm [60], which produces the optical flow for all pixels in the frame. While the inference is running, the optical flow values are cached and used to update the box based on the inference results.
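For illustration, the flow computation on feature maps can be written with OpenCV's Python API as follows; Sight uses the GPU-accelerated implementation, and the Farneback parameters shown are illustrative rather than the tuned values of the real system:

import cv2
import numpy as np

def feature_map_flow(prev_fmap, curr_fmap):
    """Dense Farneback optical flow between two consecutive feature maps.
    Feature maps are rescaled to 8-bit images before flow computation.
    """
    def to_u8(fm):
        fm = fm.astype(np.float32)
        return cv2.normalize(fm, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    flow = cv2.calcOpticalFlowFarneback(
        to_u8(prev_fmap), to_u8(curr_fmap), None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow  # (H, W, 2) array of per-pixel (dx, dy)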

Representative Mask Generation: The major component of mask generation is clustering using both the feature map and the optical flow. We use K-means clustering due to its efficiency. For both the feature map and the optical flow, we set K = 2. We use the K-means implementation from Armadillo [124], which is a C++ library for scientific computation. We run clustering over the feature map in parallel with the optical flow computation.

5.5 Evaluation

In this section, we describe our evaluation methodology and present evaluation results.

5.5.1 Evaluation Methodology

Video Datasets: We use 3 video datasets in our evaluation. Videos in our datasets are captured by mobile cameras, and the objects in them have different sizes and move around the frame in all directions. Our video datasets include (i) D1: The dataset D1 consists of 15 videos, where 10 videos are from the DAVIS dataset [112] and 5 videos are from the VOT dataset [80]. The DAVIS and VOT datasets are designed to test object segmentation and tracking. Each video in D1 has a single object to track across all frames. There are 3000 frames in D1. (ii) D2: The dataset D2 consists of 10 videos from the DAVIS dataset. The number of objects in those videos varies from 2 to 6. It contains 1400 frames. (iii) D3: The dataset D3 has 10 videos from the ImageNet ILSVRC2017 dataset [122]. D3 has different object categories from those in D1. D3 consists of single-object videos and has 1080 frames. For all traces, we only extract video segments in which the target objects are not completely occluded. We resize the frame resolution to 544×544 to apply the Yolo model.

Ground-truth Object Bounding Box: We use the object detection results from Yolo [119] as the ground-truth. The original datasets, DAVIS and VOT, provide both video frames and manually labeled ground-truth bounding boxes. For fewer than 3% of frames, Yolo fails to detect the target object. For those cases, we use the object annotation as the ground-truth.

Tracking Accuracy Metric: We use Intersection over Union (IoU) [59] and mean Average Precision (mAP) [80] to quantify the accuracy. Both metrics are widely used. The IoU is the ratio of the area of overlap between the tracked bounding box and the ground-truth bounding box to the area of their union. The mAP [80] quantifies the percentage of frames in which the object is successfully tracked (i.e., IoU ≥ 0.5). For both metrics, higher values are preferred.
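Both metrics can be computed as follows (a straightforward sketch; box coordinates are (x1, y1, x2, y2)):

import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def mean_average_precision(tracked, ground_truth, thresh=0.5):
    """Fraction of frames whose tracked box overlaps the ground truth with
    IoU >= thresh (the mAP convention used in this chapter, following [80])."""
    hits = [iou(t, g) >= thresh for t, g in zip(tracked, ground_truth)]
    return float(np.mean(hits))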

Experiment Hardware: We run our experiments on the Nvidia Jetson TX2 [6], which is a power-efficient module for mobile devices like robots, drones, smart cameras and portable medical devices. It is equipped with an Nvidia Pascal GPU with 256 CUDA cores. It has 8GB memory shared by both the CPU and GPU. The maximum CPU frequency is 2.0GHz and the maximum GPU frequency is 1.3GHz. We use the default running mode, which uses Dynamic Voltage and Frequency Scaling (DVFS) [2] to adjust the clock frequency at run time according to the load and power consumption.

Figure 5.10: Micro-benchmark results over dataset D1 ((a) feature map benefits; (b) feature map selection; (c) representative mask; (d) motion calculation; (e) object size adaptation; (f) inference interval; (g) adaptive inference; (h) tracking-aided inference)

Algorithms: There are some related works on speeding up video analytics. Liu et al. [90] use an LSTM model to incorporate the feature maps of previous frames to track the object in the current frame. To achieve accuracy comparable to our approach Sight, their approach [90] only supports 3 fps. Using a simpler model, it can support 15 fps but with 30% accuracy degradation. Zhu et al. [153] use optical flow to generate the feature maps of the current frame from those of previous frames. The high complexity of feature map generation yields a large processing delay (e.g., > 700 ms per frame). Since these approaches are too slow for real-time use, we skip their implementation and compare with the following algorithms: (i) Baseline, which is implemented in [88] and uses the average motion calculated from raw frames, (ii) Baseline+Map, which uses the average motion calculated from the feature maps, and (iii) Sight, which uses the representative mask as described in Sec. 5.3.1. The benefit of (iii) over (ii) comes from using the representative mask and size/shape adaptation, and the benefit of (iii) over (i) further includes the use of feature maps over raw frames.

5.5.2 Micro-benchmarks

First, we evaluate the benefit of each individual component in Sight using the dataset D1.

Benefits of Feature Maps: Fig. 5.10(a) shows the accuracy of running mask-tracking on raw frame and feature map. The feature map improves the accuracy by around 20% as it can remove noise from background pixels and more reliably captures the object movement.

Feature Map Selection: There are around 8 feature maps whose accuracy is within 10% of the best one in our videos. Fig. 5.10(b) shows our selected feature map is within 3% of the best. It is 30% and 180% better than the median and worst feature maps. This demonstrates the effectiveness of our feature map selection.

Representative Mask: We compare generating a representative mask using the intersection between feature map and optical flow versus using only the feature map or the optical flow. Fig. 5.10(c) shows that the average accuracy of the optical flow mask and the feature map mask is 0.46 and 0.51, respectively. In comparison, the accuracy of Sight is 0.60. Note that the benefit of our mask generation varies across videos. For some videos with good feature maps (i.e., low activation for background pixels), the feature map mask is already good. Using the intersection still improves over the feature map mask, but not by a lot. For videos with complex background, the feature map alone produces a very noisy mask. The intersection can significantly improve the performance. For videos with large camera movement, the optical flow mask is very noisy and the intersection gives significant improvement.

Motion Calculation: In Sight, we use dense optical flow [60] to get motion estimates for all pixels. Another way of estimating motion is to use motion vectors, which is much faster than optical flow. However, motion vectors only contain coarse movement patterns at the macroblock level, which varies from 4 × 4 to 16 × 16. Moreover, the motion vector in the H.264 codec is designed for compression instead of accurate motion tracking [143]. Fig. 5.10(d) shows that optical flow outperforms the motion vector by 22%.

Object Size Adaptation: In Fig. 5.10(e), Sight outperforms the scheme that updates the bounding box using the average motion over all pixels in the representative mask by 14%. The benefit comes from adapting the bounding box size and allowing different parts of objects to move differently when adjusting the box.

Figure 5.11: Performance of Sight over dataset D1 ((a) GPU usage; (b) average IoU; (c) mAP; (d) frame IoU)

Adaptive Inference: To reduce resource usage, Sight runs inference adaptively. Fig. 5.10(f) shows that the average number of frames between two consecutive inferences in each video is between 14 and 29. Fig. 5.10(g) shows that adaptive inference performs similarly to the most frequent inference, which runs inference whenever the previous one finishes. The most frequent inference is sometimes less accurate because it more frequently uses avg-tracking to update the inference, and avg-tracking is less accurate than mask-tracking.

Tracking-Aided Inference: Fig. 5.10(h) shows the accuracy of Sight with and without using cropped frames for inference. If we use the entire input frame, the inference latency of Sight increases to the duration of 15 frames and the average accuracy reduces by 10%, because a larger inference latency means more frames use avg-tracking for update, and avg-tracking is less accurate than mask-tracking.

Resource Usage: Sight leverages the GPU to run inference and tracking. The average GPU usage is around 64% when running both inference and tracking simultaneously. In comparison, the usage is 4% when running tracking only. The inference latency in Sight is 10 frames, so an inference interval of 10 means the system keeps running inference all the time. Fig. 5.11(a) shows the GPU usage for different inference intervals. As we would expect, increasing the inference interval allows the GPU usage to stay low for a longer time. Therefore, the average usage decreases with the inference interval. Sight has an average inference interval of 20 frames, which shows that adaptive inference reduces the resource usage by around 45% over the most frequent inference.

5.5.3 Single Object Tracking

In this section, we show the performance of our complete system using the dataset D1.

Tracking Accuracy Improvement: Fig. 5.11(b) shows the average IoU for all schemes across 15 videos. The average IoU is 0.32, 0.44 and 0.60 for Baseline, Baseline+Map and Sight, respectively. Sight improves the IoU by 88% and 38% over Baseline and Baseline+Map, respectively. The benefit over Baseline+Map comes from size-adaptive mask-tracking, and the additional benefit over Baseline comes from reliable motion estimation using feature maps. Moreover, the high IoU of Sight shows that even with a large inference delay (equal to 10 frames) Sight can track objects accurately and adapt the box to object size and shape changes through mask-tracking.

Fig. 5.11(c) shows the average mAP across all videos is 0.24, 0.38 and 0.74 for Baseline, Baseline+Map and Sight, respectively. This translates to 207% and 95% improvement over Baseline and Baseline+Map, respectively. Baseline and Baseline+Map run inference whenever the previous inference completes, which avoids performance loss in these schemes due to delayed inference. In comparison, Sight runs inference adaptively. The average number of frames between two inferences is 20. Sight achieves much better performance despite running inference less frequently.

Next we look at the performance on videos where the object moves in such a way that the bounding box size changes a lot. We identify 5 videos in our dataset that have more than 20% average object size change between consecutive inferences. For these videos, the average IoU of Baseline is 0.22, and Sight improves the average IoU to 0.59, a 2.7× improvement. Thus, Sight yields larger improvement for videos that have large changes in object size.

We show the distribution of IoU for all frames in Fig. 5.11(d). Baseline has around 18% of frames with zero IoU, whereas Sight has no such frames. The median IoU improves from 30% in Baseline to 60% in Sight, which translates to a 2× improvement.

Figure 5.12: System latency ((a) inference latency for Baseline, Baseline+Map, and Sight; (b) tracking latency breakdown: map extraction, mask generation, motion calculation, and box tracking, for small and large objects)

Latency: Fig. 5.12(a) shows the average inference latency of a single frame across all videos. In Baseline, the tracked box drifts a lot from the ground-truth box, so running inference over the cropped frame fails often, which increases the inference latency. Therefore, we disable cropping in Baseline. The average inference latency for Baseline is 360 msec. Baseline+Map and Sight use GPU resources to compute the feature map and optical flow while running inference. Higher tracking accuracy allows us to speed up inference by running it over cropped frames. Thus, the inference latency is around 12, 11 and 10 frames for 30 fps videos in Baseline, Baseline+Map and Sight, respectively. Fig. 5.12(b) shows the latency of individual components in the tracking module. In mask generation, larger objects result in higher latency due to more points to cluster. We observe that DVFS keeps the board at the lowest power mode with the lowest clock frequency when only running tracking. We can see that even at the lowest power settings, the total tracking latency can support tracking at 30 fps. At the highest clock frequency, the tracking latency is less than 10 msec per frame, but this comes at the cost of high power consumption.

Figure 5.13: Performance of Sight using the dataset D3 ((a) average IoU; (b) mAP)

5.5.4 Robustness to Different Types of Videos

To validate the robustness of Sight to various types of videos, we apply Sight to the dataset D3, which has different object categories from D1. Fig. 5.13(a) shows that the average IoU of Baseline, Baseline+Map and Sight is 0.34, 0.47 and 0.62, respectively. This translates to 82% and 32% improvements over Baseline and Baseline+Map, respectively. Fig. 5.13(b) shows the mAP is 0.26, 0.36 and 0.72 for Baseline, Baseline+Map and Sight, respectively. Sight improves mAP by 177% and 100% over Baseline and Baseline+Map, respectively. Sight achieves substantial improvements across various types of videos.

5.5.5 Multi-Object Tracking

Sight can also track multiple objects in video frames. We evaluate multi-object tracking using the dataset D2.

Different objects may need different feature maps for tracking. However, extracting multiple feature maps cannot support object tracking at 30 fps. We extend our contrast metric to select a single feature map for all objects. For each feature map, we calculate the multi-object contrast metric by averaging the value of the contrast metric across all objects. We select the feature map with the highest average value. Fig. 5.14 shows the effectiveness of feature map selection for multi-object tracking. Sight(Multi Maps) uses the optimal feature map for each object, while Sight(One Map) uses a single feature map for all objects. Note that Sight(Multi Maps) cannot run in real time and its results are generated offline. The average IoU is 0.63 and 0.59 for Sight(Multi Maps) and Sight(One Map), respectively. Selecting a single feature map only degrades the tracking accuracy by around 6%.
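A sketch of this selection rule, reusing the contrast helper sketched in Section 5.3.1:

import numpy as np

def select_feature_map_multi(feature_maps, boxes):
    """Pick one shared feature map for all tracked objects by averaging
    the per-object contrast over all objects and taking the maximum.
    """
    scores = [np.mean([contrast(fm, box) for box in boxes]) for fm in feature_maps]
    return int(np.argmax(scores))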

Fig. 5.14(a) shows that the average IoU for Baseline, Baseline+Map, Sight(One Map) and Sight(Multi Maps) is 0.35, 0.44, 0.59 and 0.63, respectively. Sight(One Map) improves the average IoU by 69% and 34% over Baseline and Baseline+Map, respectively. Fig. 5.14(b) shows that the mAP for Baseline, Baseline+Map, Sight(One Map) and Sight(Multi Maps) is 0.26, 0.40, 0.71 and 0.77, respectively. Sight(One Map) achieves 173% and 77% improvements over Baseline and Baseline+Map, respectively. For multi-object tracking, the average number of frames between two inferences is 16, which is smaller than the value for single-object tracking.

Figure 5.14: Sight for multi-object tracking (dataset D2) ((a) average IoU; (b) mAP)

Sight reduces the hardware resource usage by around 32% compared with the most frequent inference. Thus, Sight achieves significant improvement over other approaches when tracking multiple objects.

Chapter 6

Conclusion

In this dissertation, we have explained our techniques to improve the performance of video applications for mobile devices.

We propose a novel layered coding design to improve the robustness to throughput fluctuation. With our layered coding, we can adapt to varying data rates by first sending the base layer and opportunistically sending enhancement layers when the network throughput allows. To speed up coding 360◦ and 4K videos, we implement our own video codecs, which incorporate our layered coding design and are fast to run. Our codecs can efficiently run on the available hardware of commodity mobile devices. To adapt to the available throughput, we further develop optimization algorithms to determine the number of layers to transmit. In addition to video bitrate, which is the only dimension used in existing rate adaptation algorithms, our algorithms include the new dimension of deciding the number of video layers to transmit. We implement our layered coding scheme Rubiks for 360◦ video streaming and Jigsaw for live 4K video streaming on commodity mobile devices. Extensive evaluation results demonstrate that Rubiks can achieve up to 69% improvement in user QoE and 49% in bandwidth savings compared with existing approaches. Jigsaw improves PSNR by 6-15dB and improves SSIM by 0.011-0.217 over state-of-the-art approaches.

Moreover, we design a system, Sight, to run real-time video analytics on mobile devices. It is efficient, accurate, and flexible, and runs exclusively on a mobile device without the need for an edge server or network connectivity. It achieves these desirable properties by (i) generating a representative mask, (ii) adapting to changes in object size and shape, and (iii) using the mask to accurately estimate object motion so that we can track an object even with infrequent inference. Our extensive evaluation results from the Nvidia Jetson TX2 demonstrate that we can track objects in real time for 30 fps videos. Compared with the state-of-the-art, Sight improves the average IoU by 88%, improves the average mAP by 207%, and reduces hardware resource usage by 45% for single-object tracking. Sight improves IoU by 69%, improves mAP by 173%, and reduces hardware usage by 32% for multi-object tracking.

Our work shows the following useful observations for the development of mobile video applications. (i) Video streaming and analytics applications are increasingly dominating the usage of mobile devices and it is critical to design effective techniques to guarantee good user experience. (ii) Designing low-cost layered coding for mobile devices enables us to stream high-resolution and 360◦ videos with low latency and high quality. (iii) Identifying when to skip analytics through efficiently and reliably reusing or adapting previous inferences works well for mobile video analytics.

Bibliography

[1] Convolutional filter. http://cs231n.github.io/convolutional-networks/ #conv.

[2] Dynamic voltage and frequency scaling on nvidia jetson tx2. https:// devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/.

[3] ffmpeg. https://www.ffmpeg.org/.

[4] Nvidia 940m. https://www.notebookcheck.net/NVIDIA-GeForce-940M. 138027.0.html.

[5] Nvidia geforce 940m. https://www.geforce.com/hardware/notebook-gpus/ geforce-940m.

[6] Nvidia jetson tx2. https://developer.nvidia.com/embedded/buy/ jetson-tx2.

[7] Nvidia jetson tx2 nvpmodel tool. https://www.jetsonhacks.com/ 2017/03/25/nvpmodel-nvidia-jetson-tx2-development-kit/.

[8] Nvidia titan x specs. https://www.nvidia.com/en-us/geforce/products/ 10series/titan-x-pascal/.

[9] Nvidia titan xp. https://www.nvidia.com/en-us/titan/titan-xp/.

[10] Nvidia video codec. https://developer.nvidia.com/nvidia-video-codec-sdk.

[11] Openh264. https://www.openh264.org/.

[12] Power management for jetson agx xavier devices. https://docs.nvidia. com/jetson/l4t/#page/\%2520Linux\%2520Driver\%2520Package\ %2520Development\%2520Guide\%2Fpower_management_jetson_xavier. html\%23wwpID0E0VO0HA.

[13] Tp-link archer c5400x mu-mimo tri-band gaming router. https:// venturebeat.com/2018/09/06/tp-link-launches-gaming-router-for-4k-video-stream-era/.

[14] Video dataset. https://media.xiph.org/video/derf/.

[15] Youtube 4k bitrates. https://support.google.com/youtube/answer/ 1722171?hl=en.

[16] 360-degree football game video, 2017. https://www.youtube.com/ watch?v=E0HUVPM_A00.

[17] 360-degree rollercoaster video, 2017. https://www.youtube.com/watch? v=8lsB-P8nGSM.

[18] 360-degree sailing video, 2017. https://www.youtube.com/watch?v= IJ_CwOFTZyM.

[19] 360-degree shark encounter video, 2017. https://www.youtube.com/ watch?v=rG4jSz_2HDY&t=15s.

[20] Android supported media formats, 2017. https://developer.android.com/guide/topics/media/media-formats.html.

[21] Cisco visual networking index report, 2017.

[22] Facebook cubemap for 360 degree videos, 2017. https://code.facebook. com/posts/1126354007399553/next-generation-video-encoding-techniques-for-360-video-and-vr/.

[23] Google cardboard, 2017. https://store.google.com/us/product/ google_cardboard.

[24] H264, 2017. https://www.itu.int/rec/T-REC-H.264.

[25] Hevc, 2017. https://www.itu.int/rec/T-REC-H.265.

[26] Hevc transform and quantization, 2017. https://link.springer.com/ chapter/10.1007/978-3-319-06895-4_6.

[27] Hsdpa tcp dataset, 2017. http://home.ifi.uio.no/paalh/dataset/ hsdpa-tcp-logs/.

[28] Htc vive, 2017. https://www.vive.com.

[29] Kvazaar, 2017. https://github.com/ultravideo/kvazaar.

[30] Linux tc, 2017. https://linux.die.net/man/8/tc.

[31] Oculus, 2017. https://www.oculus.com.

[32] Samsung gear vr, 2017. http://www.samsung.com/us/mobile/virtual-reality/ gear-vr.

[33] Video codec hardware acceleration, 2017. https://trac.ffmpeg.org/wiki/HWAccelIntro.

[34] Vr/ar market, 2017.

[35] Youtube encoder settings for 360 degree videos, 2017. https://support. google.com/youtube/answer/6396222?hl=en.

[36] Google battery historian tool, 2018. https://github.com/google/ battery-historian.

[37] Siripuram Aditya and Sachin Katti. Flexcast: Graceful wireless video streaming. In Proceedings of the 17th annual international conference on Mobile computing and networking, pages 277–288. ACM, 2011.

[38] Saamer Akhshabi, Lakshmi Anantakrishnan, Constantine Dovrolis, and Ali C. Begen. Server-based traffic shaping for stabilizing oscillating adaptive streaming players. In Proceeding of the 23rd ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, NOSSDAV ’13, pages 19–24, New York, NY, USA, 2013. ACM.

[39] J. Bankoski, J. Koleszar, L. Quillio, J. Salonen, P. Wilkins, and Y. Xu. Vp8 data format and decoding guide. RFC 6386, Google Inc., November 2011.

[40] Y. Bao, H. Wu, T. Zhang, A. A. Ramli, and X. Liu. Shooting a moving target: Motion-prediction-based transmission for 360-degree videos. In

2016 IEEE International Conference on Big Data (Big Data), pages 1161–1170, Dec 2016.

[41] John L Barron, David J Fleet, and Steven S Beauchemin. Performance of optical flow techniques. International journal of computer vision, 12(1):43–77, 1994.

[42] A. Bjelopera and S. Grgi. Scalable video coding extension of h.264/avc. In Proceedings ELMAR-2012, pages 7–12, Sept 2012.

[43] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.

[44] Chun-Ming Chang, Cheng-Hsin Hsu, Chih-Fan Hsu, and Kuan-Ta Chen. Performance measurements of virtual reality systems: Quantifying the timing and positioning accuracy. In Proceedings of the 2016 ACM on Multimedia Conference, pages 655–659. ACM, 2016.

[45] Michael M Chang, A Murat Tekalp, and M Ibrahim Sezan. Simultaneous motion estimation and segmentation. IEEE transactions on image processing, 6(9):1326–1333, 1997.

[46] H. Chen, G. Braeckman, S. M. Satti, P. Schelkens, and A. Munteanu. Hevc-based video coding with lossless region of interest for telemedicine applications. In 2013 20th International Conference on Systems, Signals and Image Processing (IWSSIP), pages 129–132, July 2013.

[47] Kaifei Chen, Tong Li, Hyung-Sin Kim, David E Culler, and Randy H Katz. Marvel: Enabling mobile augmented reality with low energy and low latency. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, pages 292–304. ACM, 2018.

[48] Tiffany Yu-Han Chen, Lenin Ravindranath, Shuo Deng, Paramvir Bahl, and Hari Balakrishnan. Glimpse: Continuous, real-time object recogni- tion on mobile devices. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, pages 155–168. ACM, 2015.

[49] Tse-Wei Chen, Yi-Ling Chen, and Shao-Yi Chien. Fast image segmentation based on k-means clustering with histograms in HSV color space. In 2008 IEEE 10th Workshop on Multimedia Signal Processing, pages 322–325. IEEE, 2008.

[50] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cudnn: Efficient primitives for deep learning. CoRR, abs/1410.0759, 2014.

[51] Zhizhen Chi, Hongyang Li, Huchuan Lu, and Ming-Hsuan Yang. Dual deep network for visual tracking. IEEE Transactions on Image Process- ing, 26(4):2005–2015, 2017.

[52] Munhwan Choi, Gyujin Lee, Sunggeun Jin, Jonghoe Koo, Byoungjin Kim, and Sunghyun Choi. Link adaptation for high-quality uncom- pressed video streaming in 60-ghz wireless networks. IEEE Transactions on Multimedia, 18(4):627–642, 2016.

[53] Guy Barrett Coleman and Harry C Andrews. Image segmentation by clustering. Proceedings of the IEEE, 67(5):773–785, 1979.

[54] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-fcn: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems, pages 379–387, 2016.

[55] Martin Danelljan, Gustav Hager, Fahad Shahbaz Khan, and Michael Felsberg. Convolutional features for correlation filter based visual tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 58–66, 2015.

[56] Luca De Cicco, Saverio Mascolo, and Vittorio Palmisano. Feedback control for adaptive live video streaming. In Proceedings of the second annual ACM conference on Multimedia systems, pages 145–156. ACM, 2011.

[57] Jonathan Deber, Ricardo Jota, Clifton Forlines, and Daniel Wigdor. How much faster is fast enough?: User perception of latency & latency improvements in direct and indirect touch. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1827–1836. ACM, 2015.

[58] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge 2007 (voc2007) results. 2007.

[59] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.

[60] Gunnar Farnebäck. Two-frame motion estimation based on polynomial expansion. In Josef Bigun and Tomas Gustavsson, editors, Image Analysis, pages 363–370, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.

[61] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Va- suki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivara- man, George Porter, and Keith Winstein. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 363–376, Boston, MA, 2017. USENIX Association.

[62] Mario Graf, Christian Timmerer, and Christopher Mueller. Towards bandwidth efficient adaptive streaming of omnidirectional video over http: Design, implementation, and evaluation. In Proceedings of the 8th ACM on Multimedia Systems Conference, pages 261–271. ACM, 2017.

[63] D. Grois, E. Kaminsky, and O. Hadar. Roi adaptive scalable video coding for limited bandwidth wireless networks. In 2010 IFIP Wireless Days, pages 1–5, Oct 2010.

[64] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.

[65] Jian He, Mubashir Adnan Qureshi, Lili Qiu, Jin Li, Feng Li, and Lei Han. Rubiks: Practical 360-degree streaming for smartphones. In Pro- ceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2018.

[66] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 2980–2988. IEEE, 2017.

[67] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[68] Zhifeng He and Shiwen Mao. Multiple description coding for uncom- pressed video streaming over 60ghz networks. In Proceedings of the 1st ACM workshop on Cognitive radio architectures for broadband, pages 61–68. ACM, 2013.

[69] I. Himawan, W. Song, and D. Tjondronegoro. Impact of region-of- interest video coding on perceived quality in mobile video. In 2012 IEEE International Conference on Multimedia and Expo, pages 79–84, July 2012.

[70] Mohammad Hosseini and Viswanathan Swaminathan. Adaptive 360 VR video streaming: Divide and conquer! CoRR, abs/1609.08729, 2016.

[71] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. A buffer-based approach to rate adaptation: Evidence from a large video streaming service. In Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM ’14, pages 187–198, New York, NY, USA, 2014. ACM.

[72] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. A buffer-based approach to rate adaptation: Evi- dence from a large video streaming service. ACM SIGCOMM Computer Communication Review, 44(4):187–198, 2015.

[73] Loc N Huynh, Youngki Lee, and Rajesh Krishna Balan. Deepmon: Mobile gpu-based deep learning framework for continuous vision appli- cations. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 82–95. ACM, 2017.

[74] Szymon Jakubczak and Dina Katabi. A cross-layer design for scalable mobile video. In Proceedings of the 17th annual international conference on Mobile computing and networking, pages 289–300. ACM, 2011.

[75] A. Jerbi, Jian Wang, and S. Shirani. Error-resilient region-of-interest video coding. IEEE Transactions on Circuits and Systems for Video Technology, 15(9):1175–1181, Sept 2005.

[76] Junchen Jiang, Vyas Sekar, Henry Milner, Davis Shepherd, Ion Stoica, and Hui Zhang. CFA: A practical prediction system for video QoE optimization. In NSDI, pages 137–150, 2016.

[77] Junchen Jiang, Vyas Sekar, and Hui Zhang. Improving fairness, effi- ciency, and stability in http-based adaptive video streaming with fes- tive. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, CoNEXT ’12, pages 97–108, New York, NY, USA, 2012. ACM.

[78] Teemu Kämäräinen, Matti Siekkinen, Antti Ylä-Jääski, Wenxiao Zhang, and Pan Hui. Dissecting the end-to-end latency of interactive mobile video applications. In Proceedings of the 18th International Workshop on Mobile Computing Systems and Applications, pages 61–66. ACM, 2017.

[79] Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGPLAN No- tices, 52(4):615–629, 2017.

[80] Matej Kristan, Jiri Matas, Aleš Leonardis, Tomáš Vojíř, Roman Pflugfelder, Gustavo Fernandez, Georg Nebehay, Fatih Porikli, and Luka Čehovin. A novel performance evaluation methodology for single-target trackers. IEEE transactions on pattern analysis and machine intelligence, 38(11):2137–2155, 2016.

[81] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[82] Robert Kuschnig, Ingo Kofler, and Hermann Hellwagner. An evaluation of tcp-based rate-control algorithms for adaptive internet streaming of h. 264/svc. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems, pages 157–168. ACM, 2010.

[83] Zeqi Lai, Y Charlie Hu, Yong Cui, Linhui Sun, and Ningwei Dai. Furion: Engineering high-quality immersive virtual reality on today’s mobile de- vices. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, pages 409–421. ACM, 2017.

[84] Nicholas D Lane, Sourav Bhattacharya, Petko Georgiev, Claudio For- livesi, Lei Jiao, Lorena Qendro, and Fahim Kawsar. Deepx: A software accelerator for low-power deep learning inference on mobile devices. In Proceedings of the 15th International Conference on Information Pro- cessing in Sensor Networks, page 23. IEEE Press, 2016.

[85] José Lezama, Karteek Alahari, Josef Sivic, and Ivan Laptev. Track to the future: Spatio-temporal video segmentation with long-range motion cues. In CVPR 2011, pages 3369–3376. IEEE, 2011.

[86] Yao-Yu Li, Chi-Yu Li, Wei-Han Chen, Chia-Jui Yeh, and Kuochen Wang. Enabling seamless wigig/wifi handovers in tri-band wireless systems. In

Network Protocols (ICNP), 2017 IEEE 25th International Conference on, pages 1–2. IEEE, 2017.

[87] Z. Li, X. Zhu, J. Gahm, R. Pan, H. Hu, A. C. Begen, and D. Oran. Probe and adapt: Rate adaptation for http video streaming at scale. IEEE Journal on Selected Areas in Communications, 32(4):719–733, April 2014.

[88] Luyang Liu, Hongyu Li, and Marco Gruteser. Edge assisted real-time object detection for mobile augmented reality. In ACM Mobicom, 2019.

[89] Luyang Liu, Ruiguang Zhong, Wuyang Zhang, Yunxin Liu, Jiansong Zhang, Lintao Zhang, and Marco Gruteser. Cutting the cord: Designing a high-quality untethered vr system with low latency remote rendering. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, pages 68–80. ACM, 2018.

[90] Mason Liu and Menglong Zhu. Mobile video object detection with temporally-aware feature maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5686–5695, 2018.

[91] Qiang Liu and Tao Han. Dare: Dynamic adaptive mobile augmented reality with edge computing. In 2018 IEEE 26th International Conference on Network Protocols (ICNP), pages 1–11. IEEE, 2018.

[92] Sicong Liu, Yingyan Lin, Zimu Zhou, Kaiming Nan, Hui Liu, and Junzhao Du. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In ACM Mobisys, 2018.

[93] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.

[94] Xiao Lin Liu, Wenjun Hu, Qifan Pu, Feng Wu, and Yongguang Zhang. Parcast: Soft video delivery in mimo-ofdm wlans. In Proceedings of the 18th annual international conference on Mobile computing and networking, pages 233–244. ACM, 2012.

[95] Xing Liu, Qingyang Xiao, Vijay Gopalakrishnan, Bo Han, Feng Qian, and Matteo Varvello. 360° innovations for panoramic video streaming. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks, HotNets-XVI, pages 50–56, New York, NY, USA, 2017. ACM.

[96] Xing Liu, Qingyang Xiao, Vijay Gopalakrishnan, Bo Han, Feng Qian, and Matteo Varvello. 360 innovations for panoramic video streaming. In Proc. of HotNets, pages 50–56, November 2017.

[97] Y. Liu, Z. G. Li, and Y. C. Soh. Region-of-interest based resource allocation for conversational video communication of h.264/avc. IEEE Transactions on Circuits and Systems for Video Technology, 18(1):134–139, Jan 2008.

[98] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.

[99] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE international conference on computer vision, pages 3074–3082, 2015.

[100] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. Neural adaptive video streaming with pensieve. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM ’17, pages 197–210, New York, NY, USA, 2017. ACM.

[101] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. Neural adaptive video streaming with pensieve. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, pages 197–210. ACM, 2017.

[102] Akhil Mathur, Nicholas D Lane, Sourav Bhattacharya, Aidan Boran, Claudio Forlivesi, and Fahim Kawsar. Deepeye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 68–81. ACM, 2017.

[103] A. Moldovan and C. H. Muntean. Qoe-aware video resolution thresholds computation for adaptive multimedia. In 2017 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pages 1–6, June 2017.

[104] C. Mueller, S. Lederer, J. Poecher, and Ch. Timmerer. libdash - an open source software library for the mpeg-dash standard. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2013, San Jose, USA, pages 1–2, 2013.

[105] T. Nitsche, C. Cordeiro, A. B. Flores, E. W. Knightly, E. Perahia, and J. C. Widmer. Ieee 802.11ad: directional 60 ghz communication for multi-gigabit-per-second wi-fi [invited paper]. IEEE Communications Magazine, 52(12):132–141, December 2014.

[106] Nvidia. Nvidia video codec sdk. https://developer.nvidia.com/nvidia-video-codec-sdk, 2019.

[107] Peter Ochs, Jitendra Malik, and Thomas Brox. Segmentation of moving objects by long term video analysis. IEEE transactions on pattern analysis and machine intelligence, 36(6):1187–1200, 2014.

[108] J. R. Ohm. Advances in scalable video coding. Proceedings of the IEEE, 93(1):42–56, Jan 2005.

[109] Thrasyvoulos N Pappas. An adaptive clustering algorithm for image segmentation. IEEE Transactions on signal processing, 40(4):901–914, 1992.

[110] Pedro O Pinheiro, Ronan Collobert, and Piotr Dollár. Learning to segment object candidates. In Advances in Neural Information Processing Systems, pages 1990–1998, 2015.

[111] Karine Pires and Gwendal Simon. Youtube live and twitch: a tour of user-generated live streaming systems. In Proceedings of the 6th ACM Multimedia Systems Conference, pages 225–230. ACM, 2015.

[112] Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alex Sorkine-Hornung, and Luc Van Gool. The 2017 davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675, 2017.

[113] Yuankai Qi, Shengping Zhang, Lei Qin, Hongxun Yao, Qingming Huang, Jongwoo Lim, and Ming-Hsuan Yang. Hedged deep tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4303–4311, 2016.

[114] Feng Qian, Lusheng Ji, Bo Han, and Vijay Gopalakrishnan. Optimizing 360 video delivery over cellular networks. In Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, ATC ’16, pages 1–6, New York, NY, USA, 2016. ACM.

[115] Xukan Ran, Haoliang Chen, Xiaodan Zhu, Zhenming Liu, and Jiasi Chen. Deepdecision: A mobile deep learning framework for edge video analytics. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pages 1421–1429. IEEE, 2018.

[116] Theodore Rappaport. Wireless Communications: Principles and Practice. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 2001.

[117] Joseph Redmon. Darknet: Open source neural networks in c. http://pjreddie.com/darknet/, 2013–2016.

[118] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, pages 779–788, 2016.

[119] Joseph Redmon and Ali Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017.

[120] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.

[121] Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, and Jian Sun. Object detection networks on convolutional feature maps. IEEE transactions on pattern analysis and machine intelligence, 39(7):1476–1481, 2017.

[122] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.

[123] Houari Sabirin and Munchurl Kim. Moving object detection and tracking using a spatio-temporal graph in h.264/avc bitstreams for video surveillance. IEEE Transactions on Multimedia, 14(3):657–668, 2012.

[124] C. Sanderson and R. Curtin. Armadillo: a template-based C++ library for linear algebra. The Journal of Open Source Software, 1:26, June 2016.

[125] T. Schierl, T. Stockhammer, and T. Wiegand. Mobile video transmission using scalable video coding. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1204–1217, Sept 2007.

[126] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the scalable video coding extension of the h.264/avc standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1103–1120, Sept 2007.

[127] Sanjivani Shantaiya, Kesari Verma, and Kamal Mehta. Multiple object tracking using kalman filter and optical flow. European Journal of Advances in Engineering and Technology, 2(2):34–39, 2015.

[128] Huai-Rong Shao, Julan Hsu, Chiu Ngo, and ChangYeul Kweon. Progressive transmission of uncompressed video over mmw wireless. In Consumer Communications and Networking Conference (CCNC), 2010 7th IEEE, pages 1–5. IEEE, 2010.

[129] Jeongho Shin, Sangjin Kim, Sangkyu Kang, Seong-Won Lee, Joonki Paik, Besma Abidi, and Mongi Abidi. Optical flow-based real-time object tracking using non-prior training active feature model. Real-Time Imaging, 11(3):204–218, 2005.

[130] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[131] Harkirat Singh, Jisung Oh, Changyeul Kweon, Xiangping Qin, Huai- Rong Shao, and Chiu Ngo. A 60 ghz wireless network for enabling uncompressed video communication. IEEE Communications Magazine, 46(12), 2008.

[132] Harkirat Singh, Xiangping Qin, Huai-rong Shao, Chiu Ngo, Chang Yeul Kwon, and Seong Soo Kim. Support of uncompressed video streaming over 60ghz wireless networks. In Consumer Communications and Networking Conference, 2008. CCNC 2008. 5th IEEE, pages 243–248. IEEE, 2008.

[133] P. Sivanantharasa, W. A. C. Fernando, and H. K. Arachchi. Region of interest video coding with flexible macroblock ordering. In First International Conference on Industrial and Information Systems, pages 596–599, Aug 2006.

[134] Thomas Stockhammer. Dynamic adaptive streaming over http–: standards and design principles. In Proceedings of the second annual ACM conference on Multimedia systems, pages 133–144. ACM, 2011.

[135] Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, Thomas Wiegand, et al. Overview of the high efficiency video coding (hevc) standard. IEEE Transactions on circuits and systems for video technology, 22(12):1649–1668, 2012.

[136] Yi Sun, Xiaoqi Yin, Junchen Jiang, Vyas Sekar, Fuyuan Lin, Nanshu Wang, Tao Liu, and Bruno Sinopoli. Cs2p: Improving video bitrate selection and adaptation with data-driven throughput prediction. In Proceedings of the 2016 conference on ACM SIGCOMM 2016 Conference, pages 272–285. ACM, 2016.

[137] Sanjib Sur, Ioannis Pefkianakis, Xinyu Zhang, and Kyu-Han Kim. Wifi-assisted 60 ghz wireless networks. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, MobiCom ’17, pages 28–41, New York, NY, USA, 2017. ACM.

[138] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[139] N. Tsapatsoulis, C. Loizou, and C. Pattichis. Region of interest video coding for low bit-rate transmission of carotid ultrasound videos over 3g wireless networks. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 3717–3720, Aug 2007.

[140] Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual tracking with fully convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 3119–3127, 2015.

[141] Zhou Wang, Ligang Lu, and Alan C Bovik. Video quality assessment based on structural distortion measurement. Signal processing: Image communication, 19(2):121–132, 2004.

[142] Teng Wei and Xinyu Zhang. Pose information assisted 60 ghz networks: Towards seamless coverage and mobility support. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, pages 42–55. ACM, 2017.

[143] Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. Overview of the h.264/avc video coding standard. IEEE Transactions on circuits and systems for video technology, 13(7):560–576, 2003.

[144] Xiufeng Xie and Xinyu Zhang. Poi360: Panoramic mobile video telephony over lte cellular networks. In Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies, pages 336–349. ACM, 2017.

[145] Xiufeng Xie, Xinyu Zhang, Swarun Kumar, and Li Erran Li. pistream: Physical layer informed adaptive video streaming over lte. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, pages 413–425. ACM, 2015.

[146] Mengwei Xu, Mengze Zhu, Yunxin Liu, Felix Xiaozhu Lin, and Xuanzhe Liu. Deepcache: Principled cache for mobile deep vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pages 129–144. ACM, 2018.

[147] Hao Yin, Xuening Liu, Tongyu Zhan, Vyas Sekar, Feng Qiu, Chuang Lin, Hui Zhang, and Bo Li. Design and deployment of a hybrid cdn-p2p system for live video streaming: experiences with livesky. In Proceedings of the 17th ACM international conference on Multimedia, pages 25–34. ACM, 2009.

[148] Xiaoqi Yin, Abhishek Jindal, Vyas Sekar, and Bruno Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over http. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pages 325–338, New York, NY, USA, 2015. ACM.

[149] Xiaoqi Yin, Abhishek Jindal, Vyas Sekar, and Bruno Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over http. In ACM SIGCOMM Computer Communication Review, volume 45(4), pages 325–338. ACM, 2015.

[150] Wonsang You, MS Houari Sabirin, and Munchurl Kim. Moving object tracking in h.264/avc bitstream. In Multimedia Content Analysis and Mining, pages 483–492. Springer, 2007.

[151] Zhilong Zhang, Danpu Liu, and Xin Wang. Real-time uncompressed video transmission over wireless channels using unequal power allocation. IEEE Systems Journal, 2016.

[152] Chao Zhou, Zhenhua Li, and Yao Liu. A measurement study of oculus 360 degree video streaming. In Proceedings of the 8th ACM on Multimedia Systems Conference, pages 27–37. ACM, 2017.

[153] Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. Flow-guided feature aggregation for video object detection. In Proceedings of the IEEE International Conference on Computer Vision, volume 3, 2017.

[154] Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, and Yichen Wei. Deep feature flow for video recognition. In CVPR, volume 1, page 3, 2017.
