Fusing Multimedia Data Into Dynamic Virtual Environments
ABSTRACT

Title of dissertation: FUSING MULTIMEDIA DATA INTO DYNAMIC VIRTUAL ENVIRONMENTS
Ruofei Du, Doctor of Philosophy, 2018
Dissertation directed by: Professor Amitabh Varshney, Department of Computer Science

In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments.

First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disk placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform that renders an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing culture, and crowd-sourced tourism.

We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects to explore applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality.
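The maximal Poisson-disk placement mentioned above can be illustrated with a minimal 2D sketch. The Bridson-style dart-throwing sampler below is an illustrative stand-in, not the dissertation's GPU-driven pipeline: the function name `poisson_disk`, the rectangular domain, and the parameters are assumptions introduced only for this example. It shows the core property the layout relies on: no two placed items are closer than a minimum radius r, while the domain is filled to near-maximality.

```python
import math
import random

def poisson_disk(width, height, r, k=30, seed=0):
    """Bridson-style maximal Poisson-disk sampling on a [0,width) x [0,height) rectangle.

    Returns a list of (x, y) samples with pairwise distance >= r.
    k is the number of candidate darts thrown around each active sample.
    """
    rng = random.Random(seed)
    cell = r / math.sqrt(2)                      # cell diagonal == r, so one sample per cell
    cols = int(math.ceil(width / cell))
    rows = int(math.ceil(height / cell))
    grid = [[None] * cols for _ in range(rows)]  # background grid for O(1) neighbor lookups

    def grid_pos(p):
        return int(p[1] // cell), int(p[0] // cell)

    def fits(p):
        # Check the 5x5 cell neighborhood: any existing sample within r rejects p.
        gy, gx = grid_pos(p)
        for y in range(max(0, gy - 2), min(rows, gy + 3)):
            for x in range(max(0, gx - 2), min(cols, gx + 3)):
                q = grid[y][x]
                if q and (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 < r * r:
                    return False
        return True

    first = (rng.uniform(0, width) % width, rng.uniform(0, height) % height)
    samples, active = [first], [first]
    gy, gx = grid_pos(first)
    grid[gy][gx] = first

    while active:
        i = rng.randrange(len(active))
        px, py = active[i]
        for _ in range(k):
            # Throw a dart in the annulus [r, 2r) around the chosen active sample.
            ang = rng.uniform(0, 2 * math.pi)
            d = rng.uniform(r, 2 * r)
            p = (px + d * math.cos(ang), py + d * math.sin(ang))
            if 0 <= p[0] < width and 0 <= p[1] < height and fits(p):
                samples.append(p)
                active.append(p)
                gy, gx = grid_pos(p)
                grid[gy][gx] = p
                break
        else:
            # No dart fit after k tries: this sample's neighborhood is saturated.
            active.pop(i)
    return samples
```

In a placement setting, each sample would anchor one social-media thumbnail; the minimum-distance guarantee is what prevents overlapping media, and the near-maximality is what fills the panorama without large gaps.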
Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.

FUSING MULTIMEDIA DATA INTO DYNAMIC VIRTUAL ENVIRONMENTS

by Ruofei Du

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2018

Advisory Committee:
Dr. Amitabh Varshney, Chair/Advisor
Dr. Matthias Zwicker
Dr. Furong Huang
Dr. Joseph F. JaJa
Dr. Ming Chuang

© Copyright by Ruofei Du 2018

Acknowledgments

With my deepest sincerity and gratitude, thank you to everyone I met during the incredible five years I have had at the University of Maryland, College Park, and at Microsoft Research.

First and foremost, a special thank you to my advisor, Professor Amitabh Varshney, for advising me, supporting me, and having faith in me through the creative and fascinating projects of the past years. I am not the most conventional of researchers, and there is often disruption along the path to a successful project, but Professor Varshney was always in the background supporting me. Thank you for helping me grow into the researcher I am today. With your encouragement, I have learned a great deal about interactive graphics and visualization, virtual and augmented reality, parallel computing, and, most importantly, how to conduct research and how to conduct myself.

I would like to deeply thank my committee members, Dr. Zwicker, Dr. Huang, Dr. JaJa, and Dr.
Chuang, for offering me invaluable advice and direction. I am also grateful to my advisors in Human-Computer Interaction, Dr. Froehlich and Dr. Findlater, for guiding me through Project Sidewalk and Project HandSight, teaching me how to organize a team project, how to take human factors into account, and how to think outside the box. I owe my gratitude to all the teachers and classmates I have learned from, because of whom my graduate experience has been one that I will cherish forever.

I owe my thanks to all my colleagues, collaborators, and mentors from Microsoft Research, Redmond: Hugues Hoppe, Wayne Chang, Sameh Khamis, Shahram Izadi, Mingsong Dou, Yury Degtyarev, Philip Davidson, Sean Fanello, Adarsh Kowdle, Sergio Orts Escolano, Christoph Rhemann, David Kim, Jonathan Taylor, Pushmeet Kohli, Vladimir Tankovich, Marek Kowalski, Qiyu Chen, Spencer Fowers, Jeff Foster, Norm Whittaker, and Ben Cutler. They live and breathe the spirit of research that I know and love: to work hard, to embrace science and engineering, to mix theory and practice, to think and work as a team rather than as individuals, and to always focus on big things.

I am grateful to all my dear friends, colleagues, collaborators, and lab-mates at the University of Maryland, College Park: Dr. Sujal Bista, Dr. Hsueh-Chien Cheng, Dr. Changqing Zou, Dr. Eric Krokos, Dr. Kotaro Hara, Dr. Lee Stearns, Dr. Uran Oh, Dr. Hui Miao, Dr. Hao Li, Dr. Fan Du, Dr. Awalin Sopan, Dr. Xintong Han, Dr. Zebao Gao, Dr. Jin Sun, Yu Jin, Xiaoxu Meng, Shangfu Peng, Hong Wei, Liang He, Tiffany Chao, Kent Wills, Max Potasznik, Zheng Xu, Xuetong Sun, Hao Zhou, Xiyang Dai, Weiwei Yang, Shuo Li, Sida Li, Eric Lee, Somay Jain, Mukul Agarwal, Patrick Owen, Tara Larrue, and David Li. I am lucky to have collaborated with you and learned from you. Your passion and dedication will always inspire me.

Finally, there are three people I have yet to thank, and whom I should thank the most: my parents and my wife, Sai Yuan.
They have supported my continual dedication to study and research during the late nights, the weekends, and the travel. Thank you all!

Dedication

2D justifies existence
3D validates identification
4D convinces living

To my advisors, teachers, friends, and families, who taught me theorems, algorithms, data structures, and the meaning of life, and to those who taught me to relish the moment in the true reality.

Table of Contents

Acknowledgements ii
Dedication iv
List of Tables x
List of Figures xi
List of Abbreviations xiv

1 Introduction 1
  1.1 Social Street View: Blending Immersive Maps with Geotagged Social Media 1
    1.1.1 TopicFields: Spatiotemporal Visualization of Geotagged Social Media with Hybrid Topic Models and Scalar Fields 3
    1.1.2 Geollery: Designing an Interactive Mirrored World with Geotagged Social Media 5
  1.2 Spherical Harmonics for Saliency Computation and Virtual Cinematography in 360° Videos 7
  1.3 Video Fields: Fusing Multiple Surveillance Videos into a Dynamic Virtual Environment 9
  1.4 Integrating Gesture Recognition, Tactile Feedback, and Visual Cryptography into Virtual Environments 11
    1.4.1 VRSurus: Enhancing Interactivity and Tangibility of Puppets in Virtual Reality 12
    1.4.2 ARCrypt: Visual Cryptography with Misalignment Tolerance using Augmented Reality Head-Mounted Displays 13
  1.5 Montage4D: Real-time Seamless Fusion and Stylization of Multiview Video Textures 14

2 Social Street View: Blending Immersive Street Views with Geotagged Social Media 18
  2.1 Introduction 18
  2.2 Background and Related Work 21
    2.2.1 Immersive Maps 21
    2.2.2 Visual Management of Geotagged Information 22
    2.2.3 Analysis of Geotagged Social Media 25
    2.2.4 Mixed Reality in Immersive Maps 26
  2.3 System Architecture 27
    2.3.1 Street View Scraper 27
    2.3.2 Mining Social Media 29
    2.3.3 Servers and Relational Databases 31
  2.4 Social Street View Interface 32
  2.5 Social Media Layout Algorithm 33
    2.5.1 Baseline: 2D Visualization 34
    2.5.2 Uniform Random Sampling 34
    2.5.3 Depth and Normal-map-driven Placement of Social Media 36
    2.5.4 Maximal Poisson-disk Sampling 37
    2.5.5 Placement of Social Media in Scenic Landscapes 38
    2.5.6 Post-processing, Rendering and Interaction 40
    2.5.7 Filtering of Social Media 40
  2.6 Experiments and Evaluation 41
    2.6.1 Dataset Acquisition and Hardware Setup 41
    2.6.2 Evaluation of Initialization and Rendering Time 42
    2.6.3 Evaluation of Saliency Coverage 43
  2.7 Use Cases and Discussion 45
    2.7.1 Storytelling 46
    2.7.2 Business Advertising 46
    2.7.3 Learning Culture and Crowd-sourced Tourism 47
  2.8 TopicFields: Spatiotemporal Visualization of Geotagged Social Media with Hybrid Topic Models and Scalar Fields 49
    2.8.1 System Overview 50
    2.8.2 Data Processing 53
      2.8.2.1 Data Mining 53
      2.8.2.2 Feature Extraction 55
      2.8.2.3 Spectral Clustering 58
    2.8.3 Topic Fields Visualization 59
    2.8.4 Use Cases 62
      2.8.4.1 Trip Planning 62
      2.8.4.2 Searching with Temporal Filters 64
    2.8.5 Discussion 65
  2.9 Geollery: Designing an Interactive Mirrored World with Geotagged Social Media 66
    2.9.1 Introduction 66
    2.9.2 System Overview 69
    2.9.3 Design Space 71
      2.9.3.1 Meshes and Textures 71
      2.9.3.2 Interactive Capabilities 73
      2.9.3.3 Virtual Representations of Social Media 75
      2.9.3.4 Aggregation Approaches 77
      2.9.3.5 Privacy 78
      2.9.3.6 Real-world Phenomena 79
      2.9.3.7 Filtering of Social Media 79
    2.9.4 User Study 79
      2.9.4.1 Background Interview 80
      2.9.4.2 Exploration of Geollery and Social Street View 81
      2.9.4.3 Quantitative Evaluation 83
      2.9.4.4 The Future of 3D Social Media Platforms 85
    2.9.5 Discussion 88
      2.9.5.1 Insights from User Study