Efficient Semantic Retrieval on K-Segment Coresets of User Videos

by Pramod Kandel

S.B., C.S. M.I.T., 2014

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

Massachusetts Institute of Technology

August 2015

All rights reserved.

Author: Signature redacted. Department of Electrical Engineering and Computer Science, August 31, 2015

Certified by: Signature redacted. Prof. Daniela Rus, CSAIL Director, Andrew (1956) & Erna Viterbi Professor, Thesis Supervisor, August 31, 2015

Certified by: Signature redacted. Guy Rosman, Postdoctoral Associate, Thesis Co-Supervisor, August 31, 2015

Accepted by: Signature redacted. Prof. Albert Meyer, Chairman, Masters of Engineering Thesis Committee


Efficient Semantic Retrieval on K-Segment Coresets of User Videos

by Pramod Kandel

Submitted to the Department of Electrical Engineering and Computer Science on August 24, 2015, in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science

ABSTRACT

Every day, we collect and store various kinds of data with our modern sensors, phones, cameras, and other gadgets. One of the richest kinds of data available is video. We take numerous hours of videos with our phones and cameras and store them on our computers or in the cloud. However, because recording videos produces large files, it is hard to search for and locate specific video segments within a video library. We might need the part where "Matt was playing guitar", or we might want to see "a glimpse of John's laptop" among the hours of video data that contain those pieces. The goal of this thesis is to create a system that efficiently retrieves the relevant segments (frames) of a video by allowing users to perform textual searches based on the objects in the video, such as "guitar" or "laptop".

A major challenge with videos is the huge space required to store them, which makes it difficult to retrieve and analyze them. This thesis presents an efficient compression method, which uses k-segment mean coresets to represent the video data using fewer frames while preserving the information content of the original data set. The system then uses a state-of-the-art object detector to analyze and detect objects in the reduced data. The objects and corresponding frames are stored and cross-linked to the original data to enable retrieval. The system allows users to pose text queries about objects in the videos. It is important that the retrieval of the stored objects be as efficient and meaningful as possible. This thesis presents a retrieval algorithm, also based on the k-segment mean coreset algorithm, which allows efficient any-time retrieval of the detected objects, retrieving the "more preferred" or "more important" frames earlier. The system presents the any-time results to the user incrementally.

This thesis describes the architecture and modules of the objects retrieval system for video data. The modules include the user interface for making the search query and displaying the results, the module for video compression with coresets, the object-detection module, and the retrieval module, as well as the data flow between them. This thesis describes an implementation of this system, the algorithms used, and a suite of experiments to validate and evaluate the algorithms. The results show that, using coresets, it is possible to identify, store, and efficiently retrieve video segments by specifying the objects in the video data.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my thesis supervisor Professor Daniela Rus and my thesis co-supervisor Guy Rosman for guiding me through my Master's program. Prof. Rus has a deep understanding of myriad topics in the field of robotics and beyond, and has a knack for giving clear and understandable guidelines and comments to her students. Despite her busy schedule, I met her every week, and every meeting with her positively impacted my understanding of the subject of my thesis and the direction I should be heading. She has shown me the way in the most difficult times.

Similarly, Guy has been almost my companion throughout this process, being there to guide and support me any time I needed him. He is one of the smartest people I have met, both in the theory and applications of various topics of computer vision, and more generally robotics and computer science as a whole. I am thankful for the opportunity to be supervised by such amazing and smart people.

Secondly, my family has always been an inspiration to me. My parents provided for me and sent me to good schools, even when their own circumstances were not great. Their sacrifices in difficult times are the primary reason I am here writing this thesis. Similarly, my loving brother Saroj has always been my best friend and has made me smile and laugh when I needed it.

Thirdly, I thank my close friends who have continuously offered assistance to make my life easier during the thesis write-up process, and have always been there to encourage me.

Finally, I am thankful to each and every person I have had the opportunity to know during my lifetime. I am confident that their positivity and their stories have inspired me to be where I am today, and will keep inspiring and guiding me in the days to come.

TABLE OF CONTENTS

Chapter 1 Introduction
1.1 Goal
1.2 Challenges
1.2.1 Large Data
1.2.2 Preserving Semantics on Compression
1.2.3 Object Detection on Videos
1.2.4 Efficient, Semantic, and Preferential Retrieval
1.3 Solution Approach
1.4 Contributions
1.4.1 System
1.4.2 Algorithms
1.4.3 Experiments
Chapter 2 Related Work
2.1 Life Logging Systems/Diaries
2.2 Coreset Algorithms
2.3 Semantic Retrieval for videos
Chapter 3 Video Summarization and Retrieval: Technical Approach
3.1 Objects Retrieval System Architecture: Introduction
3.2 Web Interface
3.3 Video Processing
3.3.1 Coreset Tree Creator
3.3.2 Object Detector
3.4 Storage
3.5 Retriever
Chapter 4 Video to Coreset Tree Creation
4.1 Coresets: Introduction
4.2 K-segment Mean Coreset Tree Construction (without key frames)
4.3 Summarization: Coreset Tree with key frames
4.4 Output/Populating information to DB
Chapter 5 Object Detection of Coreset Tree Leaves
5.1 Object Recognition and Detection
5.2 Caffe
5.3 RCNN (Regions with Convolutional Neural Network Features)
5.4 Choosing Key frames for Detection
5.5 Output/Populating DB
Chapter 6 Retrieval with Text-based Search
6.1 Inputs
6.1.1 Database
6.1.2 Search Query Input
6.2 Coreset Retrieval Algorithm with Preferential Sampling
6.2.1 Preferential Coreset Sampling Algorithm Description
6.2.2 Algorithm Correctness
6.2.3 Algorithm Complexity
6.3 Alternative Retrieval Algorithms
6.3.1 Direct Database Retrieval
6.3.2 Uniform Sampling of Coreset Leaves
6.4 Comparison of various retrieval algorithms
Chapter 7 System Implementation
7.1 Frontend Web UI
7.2 Django Server
7.2.1 Retrievers
7.3 Video Processor
7.4 PostgreSQL Database
Chapter 8 Experiments
8.1 End-to-end Experiment on a Synthetic Video with Known Ground Truth Segmentation
8.1.1 Coreset Tree Creation Module
8.1.2 Object Detector
8.1.3 Database Update
8.1.4 Retrieval
8.2 Timing Experiments
8.2.1 Timing for the Videos Taken in Natural Environments
8.2.2 Timing vs. Video Length
8.2.3 Timing vs. Video Quality
8.3 Retrieval Experiments
8.3.1 Retrieval Experiments on Real (Non-Synthetic) Data
8.3.2 Retrieval Experiments on Synthetic Data
8.4 Conclusion
Chapter 9 Future Extensions
9.1 Overall System
9.2 UI
9.3 Coreset Tree Creation
9.4 Object Detection
9.5 Retrieval

LIST OF FIGURES

Figure 3-1: System Architecture of the Objects Retrieval System. The front end is the web interface to upload, search, and view the results, and the server is where the video processing (i.e., compression of data and detection of objects) as well as the retrieval algorithms operate.

Figure 4-1: A simple example of a coreset tree, corresponding to a video which spans 100 frames. KF in the figure stands for the key frames of a node, and Range is the span covered by the leaf. All the leaves point to the database, meaning that the detections on the key frames of those leaves are passed to the database.

Figure 6-1: A simple example of a coreset tree corresponding to a video of 100 frames. Each leaf spans 25 frames. The key frames and frame span of each node are mentioned along with the node.

Figure 7-1: System diagram of the search-objects module with implementation details. The tools and languages used to implement each system component are mentioned on top of the component.

Figure 7-2: Web UI for the search-objects system to search and upload videos.

Figure 7-3: An example retrieval for the search query "guitar". Results appear as image thumbnails containing rectangles on the detected regions.

Figure 7-4: An example retrieval for the search query "computer". Two object categories match the query.

Figure 7-5: System diagram depicting two different retrieval systems -- coreset and database retrieval. Coreset retrieval loops around while the client continuously polls for the results, and the retriever incrementally provides the results. The DB retriever receives one client request and returns all results at once.

Figure 7-6: Process of going from retrieved regions to displaying on the UI.

Figure 7-7: Image saved in the file system for the detection of the object category "computer keyboard".

Figure 8-1: Eight still images used to make the small synthetic video for system evaluation experimentation. The video contains segments of multiple frames of each of these images.

Figure 8-2: a) Key frames and other descriptions for the first leaf node of the coreset tree, and b) for the root node of the tree. Coreset trees are shown towards the left in both a) and b), with the relevant node selected with a black circle.

Figure 8-3: A portion of the "regions" table in the database. It shows various columns in the table and example inputs.

Figure 8-4: Database retrieval of two queries: "guitar" (a) and "person" (b).

Figure 8-5: Uniform sampling retrieval of queries "guitar" (a) and "person" (b).

Figure 8-6: Preferential coreset sampling retrieval results for queries "guitar" (a) and "person" (b).

Figure 8-7: Line graph showing the trend of processing time vs. the length of the video for 4 different videos. The time taken increases with increasing length of the video.

Figure 8-8: Time taken per frame with increasing length of videos but the same quality. The figure shows that the time per frame is very similar (around 3 seconds) in all videos, implying that time per frame doesn't increase if quality is the same.

Figure 8-9: End-to-end time with increasing video resolution but fixed length.

Figure 8-10: Result from the retrieval experiment on real data when querying "car" in a large video taken in a natural environment. The x-axis is increasing time, while the y-axis is the portion of frames retrieved from the server. The red star is for the DB, the red line is for uniform sampling, and the blue line is for coreset sampling. We see that the DB returns all results at a certain point in time (11 secs), while the uniform and coreset sampling systems keep retrieving results in an incremental fashion. Uniform sampling retrieves more results at any given point than coreset sampling. It seems that preferential coreset sampling is the worst of all, but once we take into account the importance of frames, we will see its advantage (discussed later in Section 8.3.2).

Figure 8-11: Plots of cumulative importance of frames vs. time for a video with 10K frames. On the left are the normalized axes, showing the importance fraction against the fraction of time. This allows us to see which fraction of the time retrieved more important results. We can see that the [preferential] coreset graph is higher than the uniform graph in most of the earlier portion of the time, showing that more important frames are retrieved earlier in the results for preferential retrieval.

Figure 8-12: Plots of cumulative importance of frames vs. time for a video with 25K frames. Here, uniform and coreset intersect at around half the time, showing that uniform sampling retrieved a larger fraction of important results than preferential coreset retrieval in the later half of the results.

Figure 8-13: Plots of cumulative importance of frames vs. time for a video with 100K frames. With such a long video, even though the graph looks the same, half the retrieval time is about a minute. It is important that the user gets most of the important frames within a minute, because that is well beyond the tolerable wait time described in [3].

Figure 8-14: Portion of results returned from the synthetic image database, with various retrieval types. The images in ascending order of importance were: red, yellow, green, blue, purple. In a), we can see more purple and blue frames, which are the most and second-most important frames respectively. There are fewer red and yellow images retrieved. The DB in this case (b) happened to have more important frames earlier in the database. However, in uniform sampling (c), we can see all kinds of frames with no particular order or preference.

LIST OF TABLES

Table 3-1: Simplified schema of the "Regions" table in the database. Columns video_id and coreset_id link to the video and coreset tables respectively. The frame_num field indicates the frame where this detection was done, class_id is the id of the detected object class, (x1, y1) and (x2, y2) are the corners of the rectangular region of the detected object, and confidence is the score returned by the detection algorithm.

Table 3-2: Simplified schema of the "Videos" table. It has basic information about the uploaded videos such as path, width and height, and the number of frames each spans.

Table 3-3: Simplified schema of the "Coresets" table in the database. It is cross-linked with the video, and it has info about the coreset tree path in the file system.

Table 3-4: Simplified schema of the "Objects" table. It has the id and name of the object class.

Table 8-1: The expected objects in each of the 8 images in the video.

Table 8-2: False Positives (FP), False Negatives (FN), and True Positives (TP) marked by a human on 8 images of the small synthetic video, for confidence thresholds 0 and 0.5. True positive objects are marked in green, false positives in red, and false negatives in yellow. The FP, FN, and TP columns show the counts of false positives, false negatives, and true positives respectively.

Table 8-3: Precision and recall numbers for the detections in each image for confidence thresholds 0 and 0.5. Increasing the confidence threshold, we saw an average increase in precision but a decrease in recall.

Table 8-4: Time taken by various components of the system for the three videos. For each video and each component, the absolute time taken and the time taken per frame are shown.

Table 8-5: Absolute time taken to retrieve all results for the query "first synthetic object" on 3 videos, for three different retrieval systems.

Chapter 1 Introduction

1.1 Goal

The proliferation of devices such as phones, cameras, and various other sensors is enabling us to collect and store personal data in various forms: pictures, GPS data, videos, etc. The data is stored on our devices and in the cloud. Detailed data about our lives has the potential to answer questions such as "What was I doing last Monday at 2 pm?", "When did I play tennis?", "Get all videos of when I was cooking.", "Where did I leave my laptop?", and so on. We should be able to get answers about our activities, friends, events, and all other aspects of our lives that are captured by data.

Our objective is to develop an auto-generated, text-searchable personal "diary" that we could refer to for answering such questions. The challenge is searching through such a large amount of data in an efficient and effective manner. In this thesis we build on prior work on life logging as part of the iDiary [1] project and extend this system with the ability to collect, store, and search semantically over video data.

Objects Retrieval System of iDiary

This thesis implements a video storage and retrieval component within a life-logging system called iDiary [1]. The new module is called the "objects retrieval system". It allows users to answer questions about their videos based on the objects in the videos. Users can search with text to identify where in a video a certain object is, and the system then retrieves and displays the important frames of the video where the object is present, in preferential order.

Consider the following scenario. My friend Matt likes playing guitar. Once, he performed at his school during a "talent show" event. There was a video of the entire event, which lasted several hours, but he was struggling to find the 3-minute segment where he was playing guitar. With this new objects retrieval system, Matt would upload the video to the system. After some background processing, he would type "guitar" in the query window. The system would return the frames in the video where a guitar is present. Then, he would be able to navigate to that part of the video and watch himself play guitar.

However, since Matt doesn't like to wait, we wish for the system to return results right away, incrementally showing more video frames as they are identified. The results should be shown in preferential order, i.e., frames with higher "importance" or "preference" should be shown earlier. Which frames are considered more important may differ with the situation. For example, Matt might want bright and less blurry frames, or frames where no objects other than the guitar and himself are detected, or he might equally want frames where other band members and other instruments are visible as well. In whichever case, Matt wants the more important frames to be shown earlier in the results.

A primary goal in this thesis is to build a retrieval system that Matt would love. Ideally, he would get the frames he wanted as soon as possible with minimal wait.

1.2 Challenges

There are various technical and algorithmic challenges in creating such a system. Most of the challenges are related to segmenting, representing, and searching over video data.

1.2.1 Large Data

Video data comes in very large quantities, which makes efficient storage and retrieval from videos difficult. Say a user records a video of his activities continuously for the entire day, i.e., ~12 hours. A normal 720p video at 30 frames per second is on the order of 100 MB per minute. For 12 hours, the size is 100 x 60 x 12 = 72,000 MB = 72 GB. A week's worth of such video is 504 GB. Assuming a typical hard drive has a capacity of 500 GB, the user would need one hard drive to store every week of his video data. Similarly, it would be very difficult to retrieve a certain segment from that data; normally the user has to sift through all of it.

The challenge is to have an efficient compression method that reduces such a huge amount of data into a meaningful structure that preserves the information content of the original video. In this thesis, the challenge is to identify and extract the important and representative frames from the user video and store these key frames, and the objects in them, in an efficient form.

1.2.2 Preserving Semantics on Compression

"Semantics" refers to the meaning, or the information, contained in the data. As Sugaya describes in [2],

data we receive through modem sensors are of very large quantities, and not everything is of interest. An

important challenge is to map the data that comes from sensors such as video, GPS, or accelerometers to

information.

In this thesis, the challenge is to preserve such semantics in the video data while summarizing the video. The compression algorithm needs to identify semantically meaningful (key) frames in the video and store those in compressed form, although the video may not (and usually does not) have uniformly meaningful frames throughout. For example, the video frames contained in 3 hours of sitting at an office desk are quite similar to each other; however, the video frames in 3 hours of walking around Boston are likely very different from each other. There are fewer semantically important frames in the office-desk portion of the video, and more in the Boston-walk portion. Therefore, the compression algorithm needs to select more key frames from the Boston walk than from the office desk.

1.2.3 Object Detection on Videos

Object detection is a technique to locate and recognize objects in an image. State-of-the-art object detectors are focused on detecting objects in a single image and are not optimized for video, which is a collection of many images. Because videos have ~30 frames every second, running detection on every frame of a large video can be computationally expensive. Therefore, the challenge is to identify representative frames from the videos to run the detections on, such that those frames represent the entire video.

1.2.4 Efficient, Semantic, and Preferential Retrieval

Users do not want to wait too long for the system to load results, and prefer immediate results if possible. A research study [3] shows that the tolerable wait time for links without feedback is 5-8 seconds, while the tolerable wait time for web users in general peaks around 2 seconds. Therefore, a challenge is to retrieve results from potentially large video data as fast as possible for users to see. Similarly, we wish to show users the more preferred or important results earlier in the result set.

Tackling these challenges is important and interesting not only for everyday applications in which a user searches for objects in a video, but also for various other applications. A solution to these challenges would allow managing large quantities of data and efficiently retrieving semantically important aspects of it. Robots continuously collect large amounts of data, and need efficient semantic processing of the data for a wide variety of applications. As shown in [2], the capability of handling large amounts of data would be "valuable in commercial business, scientific research, government analyses, and many other applications."

1.3 Solution Approach

We use an efficient compression algorithm based on k-segment mean coresets [4] to compress the data and extract the semantically important (key) frames from the data. We then run a state-of-the-art detection system to detect the objects in those key frames and store them in the database. We then use an efficient retrieval algorithm that operates on the compressed data on an any-time basis, so that key frames with detections can be incrementally presented to the user with minimal wait time. The retrieval algorithm provides a probabilistic bias towards more representative key frames, so that they are retrieved earlier than less important ones. The solution also provides the tools for users to input their videos into the system, make search queries, and see the results retrieved by the system.

The solution is novel because this is the first system that allows users to make textual queries about the objects in their videos and shows them the detections in the videos. The process is also novel in that it is a unique application of the new coreset creation algorithm for data compression.

1.4 Contributions

1.4.1 System

* This thesis provides an implemented system to collect and store user videos, identify representative key frames that correspond to a semantic segmentation of the videos, detect objects in the representative frames, and efficiently retrieve and display results to the user in response to text queries about the objects in the videos.

* The thesis also implements a search query tool for users to search for objects in the videos.

1.4.2 Algorithms

* The algorithmic contributions include an efficient retrieval algorithm that takes compressed data as input and retrieves the key frames. The algorithm has a high probability of retrieving key frames in the order of their importance.

* The retrieval algorithm is any-time, returning relevant key frames incrementally.

* The thesis uses prior video segmentation and summarization algorithms that create a compressed structure called a coreset tree representing the entire video. These algorithms are described in detail in [4] and [5].

1.4.3 Experiments

* The thesis presents an end-to-end evaluation of the system on a synthetic video with known ground truth segmentation and object detections to evaluate the correctness of the system and its modules.

* The thesis shows the results of timing experiments on various modules as well as the entire system, and presents the performance data and its dependence on the size and quality of the videos.

* The final set of experiments evaluates the retrieval system. The thesis shows the pros and cons of various retrieval systems. The experiments demonstrate the advantage of the preferential retrieval system proposed in this thesis over other systems on large datasets where more preferred frames need to be retrieved earlier.

V Chapter 2 Related Work

In this chapter, we describe previous related work on life-logging systems, coreset algorithms for the k-segment mean problem and video summarization, and semantic retrieval systems for videos.

2.1 Life Logging Systems/Diaries

Life logging means storing the personal data and experiences of a person's life so that they can be referred to later. Life-logging systems/diaries have transformed in the past decade and a half from self-logging or self-creating methods such as text diaries and photo albums, to automatic logging of data through digital tools such as "recording computer and cellphone activity, mobile-contexts (e.g. GPS)", and automatic all-time recording with various wearable sensors and cameras [6]. All these collected data need to be processed to be useful to humans [6]. iDiary, described in [1], is an automatic life-logging system which infers user activities and meetings using the GPS information automatically logged by their phones. The term "lifelog" is nowadays used mostly for digital life-logging.

The idea of the lifelog was first envisioned by Vannevar Bush in 1945 [7]. He proposed a way to capture, collect, and store a person's personal information for an entire lifetime digitally, so that it could be retrieved later, whenever needed. In 2001, this notion was revived by Gordon Bell [8], when he scanned all his office paperwork, medical records, photographs, and similar other personal data and stored them digitally. Although it first started with desktops, the concept of lifelogging is currently advancing in the world of mobile smartphones [6]. [9] describes various kinds of life-logging systems according to the kind of data they collect - passive visual capture ("always-on" cameras), biometrics (wearable sensors to collect body information such as heart beat, temperature, etc.), mobile context (GPS, people connected to the same network, etc.), mobile activity (calls, SMS, social network sites, etc.), computer activity, and active capture (users adding photos, annotations, etc.). The GPS logging part of iDiary [1] works through mobile context and is automatic, while the objects retrieval system (this thesis) is active capture, i.e., users upload their videos into the system. However, both are designed to process and retrieve the data in a human-usable form, with humans making the text queries.

There are quite a few recent life-logging projects and products that use phones and/or cameras to log users' data and process it for users to interact with. Affective Diary [10] uses the data from the sensors in mobile phones, as well as the pictures that the user uploads to the system, to create an abstract colorful body shape. The purpose of the abstract body shape is to allow users to self-reflect and piece together their stories. Microsoft's SenseCam [11] is a passive (not requiring the user's active input) camera that users wear around their neck. The camera takes pictures periodically and automatically. The part that is similar to ours is the processing of these thousands of images taken per day. The pictures are segmented into various events, representative pictures of those events are extracted, and event-novelty is calculated for each event based on how unique it is. This is similar to the high-level approach we take, where we segment the video into various parts based on the semantic uniqueness of the scenes in the video, and we select the representative frames from those segments. In addition, we use object detection on the videos, and provide a text-based retrieval system where users can type queries.

There are several other life-logging devices and works such as Google Glass [12], MyLifeBits [13], FitBit [14], and more. All these tools process the users' data to provide them with useful information. To our knowledge, iDiary is the first project that aims to encompass such a wide variety of data, such as GPS and videos, and provide users a simple search tool to query the semantic content of their data.

2.2 Coreset Algorithms

In this thesis, we use the data-compression technique called coresets ([15], [16]) to conduct a fast, semantic (content-based) segmentation of the videos. "Informally, a coreset D is problem dependent compression of the original data P, such that running algorithm A on the coreset D yields a result A(D) that provably approximates the result A(P) of running the algorithm on the original data." [16] This thesis directly uses the coreset creation algorithms described in [5] and [4], which describe a new coreset algorithm for the k-segment mean problem [4] and build on this segmentation to create an online summarization tree from the video data stream, selecting representative frames (key frames) from each segment [5]. The coreset for the k-segment mean problem described in [4] is linear in the number of segments k, and independent of the dimension of the data and the number of points in the data. The processes of creating this coreset and the tree are described in more detail in Chapter 4.

There have been several works on summarizing high-dimensional data streams in various application domains. For example, [17] runs approximation clustering algorithms such as k-center to divide the data into k clusters. However, the clusters are not temporal and are therefore not the same as k segments. The computation of the clusters takes time exponential in both the dimension of the video (d) and k. The feature space they use for the video stream uses simple data structures. The algorithms in this thesis allow for complex features and on-line feature space updates using k-means clustering of the features seen so far.

Similarly, the k-segment mean problem has been tackled using various approaches. The problem can be solved exactly using dynamic programming [18], but this takes O(dn^2 k) time and O(dn^2) memory, which is impractical for large streaming data. However, many approximation algorithms have been introduced, including recent work such as [19] and [20]. [19] supports efficient streaming, but it is not parallel. Because it does not return a coreset but only the k-segmentation, it cannot solve other optimization problems with other priors or constraints. On the other hand, [20] returns a coreset, but the running time of the proposed algorithm is cubic in both d and k. [20] is the most recent research on solving the k-segment mean problem and its variations, and all the coreset creation solutions up to and including [20] take time and memory that are quadratic in d and cubic in k [4].

Our key tool in this thesis relies on video summarization based on coresets. Video summarization means extracting representative and meaningful frames from the video to produce a compact representation of the video itself. Analyzing ad-hoc videos and summarizing them for specific applications is a difficult task, as described and tackled before in [21], [22], [23]. The problem is very similar to action classification, scene classification, and object segmentation of videos [22]. Applications where life-long video stream analysis is crucial include mapping and navigation, medical/assistive interaction, and augmented-reality applications, among others [4]. Although we use the word "compression" in this thesis, this summarization is different in that compression can store semantically redundant content because it is geared towards preserving image quality for all frames, while we seek a summarization approach that allows us to represent the video content by a set of key segments, for a given feature space.

2.3 Semantic Retrieval for videos

With the recent growth of online videos, video search and semantic retrieval have been growing topics in computer vision, and various works have been done in this field. Some papers tackle semantic search in certain domains such as autonomous driving [24] and surveillance videos [25], while some are more general [26], [27]. Our work closely resembles [24] because it takes a natural text query as input and outputs a candidate image of the video segment with bounding boxes. However, in [24] the video segmentation and object labelling (putting bounding boxes on images) are done by humans. In our work, segmentation is done with efficient coreset algorithms, and detection is done automatically using an object detection algorithm. In [25], text input is allowed, but the application is focused on tracking vehicles in traffic surveillance videos, so it does not recognize other objects; it is mostly based on trajectory tracking of vehicles.

More general related work includes Video Google [26] and [27]. In these works, however, the input is not a text query about the objects or a description of the potential scenes. In [26], the user queries a region of a particular frame in the video, and the system searches for all the frames in the video that contain that region. In [27], an image is provided as input, and the system searches for a similar image match in the video. In [26], the key frames are simply the consecutive frames occurring every second, unlike in our work, where we segment the videos and then select meaningful key frames.

Ours is the first system we know of that uses k-segment mean coresets to smartly segment videos, identifies representative key frames from those segments, detects objects in those key frames, and allows users to type the names of objects in the videos to retrieve frames from the videos.

Chapter 3 Video Summarization and Retrieval: Technical Approach

Given multiple streams of videos, we wish to develop an efficient representation for storing them in a database and querying for them using textual semantic categories that describe the objects and places in the video streams. Our key insight is to segment each video stream into segments that have semantic coherence and to represent each segment using one frame (key frame) selected from that segment. This approach will greatly reduce the amount of data required to represent the video and speed up the search.

Because we wish to capture video summarization at multiple semantic levels, we capture an entire tree of semantic representations for the data. Intuitively, the leaf level of the tree contains the smallest video segments, close to individual video frames. At each level in the tree we represent increasingly larger segments that have semantic coherence according to some metric.

In order to enable semantic textual retrieval, we process each video frame in the database to extract the objects in it using Berkeley's object detection system RCNN (Regions with Convolutional Neural Network Features) [28], an extension of their neural-network-based object classification system, Caffe [29]. The output of RCNN is the list of objects that occur in each frame and their textual descriptions.

This solution pipeline requires (1) algorithms for video segmentation and summarization; (2) algorithms for object recognition and mapping objects to text; (3) a schema for storing the original video segments, their representative images, and their textual descriptions; and (4) a method for efficiently searching this data. We have developed a system that presents an end-to-end solution to searching efficiently over video data using text. The rest of the chapter describes the architecture of this system.

3.1 Objects Retrieval System Architecture: Introduction

The objects retrieval system is composed of several components - the web interface (frontend, i.e., user-facing), the coreset tree creator, the object detector, the retriever, and the storage (database (DB) and file system). These components can be broadly categorized into two parts - the front end, which includes the web interface, and the back end (server), as shown in Figure 3-1. The server broadly includes 3 modules - video processing, storage, and the retriever. The rest of this chapter briefly describes these components: their functions, inputs, and outputs, and the data flow between them.

Figure 3-1: System Architecture of the Objects Retrieval System. The front end is the web interface to upload, search, and view the results, and the server is where the video processing (i.e., compression of data and detection of objects) as well as the retrieval algorithms operate.

3.2 Web Interface

The web interface is the front end, or the user-facing portion of the system. This is the portal for users to upload their videos into the system, search for objects in the uploaded videos with text, and see the retrieved key frames (images) and the object detections.

In our user scenario from Chapter 1, our user Matt can upload the video of his school event using the upload interface, search for his performance on the basis of the objects in the video, i.e., by typing "guitar" in the search interface, and view the retrieved key frames with the relevant detections in them, i.e., a key frame containing a guitar, in the retrieved-results interface.

3.3 Video Processing

This component takes as input the raw video and processes the video to compress it, extract key frames, and detect objects in those key frames. As output, the database is populated with the processed information. This is a part of the server. The two modules of the video processing component are the coreset tree creator and the object detector.

3.3.1 Coreset Tree Creator

This component takes as input a raw video and scans the video frames in an online fashion, as they appear in the video, building up a compressed structure called "the coreset tree". The module uses the "k-segmentation coreset algorithm" [4] to divide the video into its semantic segments. Each segment can be represented by one frame extracted from the segment. By choosing different levels of granularity for segmenting the video data, the system creates an entire coreset tree of key frames, where each level in the tree corresponds to a different segmentation granularity. During the extraction of key frames, the segmentation algorithm preserves the overall information of the video and selects fewer key frames from the dormant parts of the video (fewer segments) and more from the active parts (more segments).

The output of this component, the coreset tree, contains the information about the key frames, which are much fewer in number than the original frames but represent the video well. We can then do object detection on these key frames.

3.3.2 Object Detector

Given the coreset tree representation, we wish to extract and index all the objects present in the key frames. This processing enables efficient retrieval using textual queries that refer to the extracted objects. The algorithm we use for object detection, RCNN (Regions with Convolutional Neural Network Features) [28], was developed as part of the Caffe system at Berkeley [29]. Caffe is a deep learning framework suitable for vision-based applications, and provides object classification models as well as feature extractors by default. RCNN is a "state-of-the-art visual object detection system" [28] based on Caffe models and feature extractors, and can detect objects in an image.

For each key frame, the object detector gives the exact regions of the detections in that frame, the detected object categories, and their confidence scores. For example, for a key frame containing Matt and his guitar, RCNN may provide the object category detected, e.g., "guitar", the coordinates of the rectangular region where the guitar is detected, and the confidence score of the detection.

All this information can then be stored in the database, cross-linked with the uploaded video, the created coreset tree, and the corresponding key frames of the coreset tree.

3.4 Storage

Storage includes mainly the database, but also the file system on the server. All the information about the detected regions in the key frames, cross-linked with the paths of the coreset trees and video files, is stored in the database. On the other hand, the actual coreset structures and the actual uploaded videos are stored in the file system of the server.

The database contains the following tables: 1) the Regions table, where information about the detected regions is stored; most other tables are linked from here; 2) the Videos table, where information about the uploaded videos is stored; 3) the Coresets table, where information about the paths of the coreset tree structures is stored and cross-linked with the Videos table; and 4) the Objects table, which has the object id and name. The following tables show the simplified schema of these tables, and an example row of data for each of them.

Table 3-1: Simplified schema of the "Regions" table in the database. Columns video_id and coreset_id link to the Videos and Coresets tables respectively. The frame_num field indicates the frame where this detection was done, class_id is the id of the detected object class, (x1, y1) and (x2, y2) are the corners of the rectangular region of the detected object, and confidence is the score returned by the detection algorithm.

region_id | video_id | coreset_id | frame_num | class_id | x1  | y1  | x2  | y2   | confidence
2322      | 201      | 302        | 232       | 79       | 250 | 900 | 600 | 1200 | 1.5

Table 3-2: Simplified schema of the "Videos" table. It has basic information about the uploaded videos such as path, width and height, and the number of frames each spans.

video_id | video_path              | width | height | num_frames
201      | /home/videos/video1.mp4 | 720   | 480    | 550

Table 3-3: Simplified schema of the "Coresets" table in the database. It is cross-linked with the video, and it has info about the coreset tree path in the file system.

coreset_id | video_id | coreset_tree_path
302        | 201      | /home/videos/coreset_video1.mat

Table 3-4: Simplified schema of the "Objects" table. It has the id and name of the object class.

class_id | class_name
79       | guitar

As shown in the tables above, the information about detected regions is linked to other tables in the database: video_id links to the Videos table, while coreset_id links to the Coresets table. Similarly, class_id links to the Objects table. Those tables contain file paths and other information about the videos and coresets respectively, and are linked between themselves as well. (x1, y1) and (x2, y2) are the corner coordinates of the rectangular detection region, and confidence is a score for the detection provided by the object detector.

Thus, when a user searches for "guitar", the class_id corresponding to that class name can be looked up in the Objects table. Then that class_id, because it is cross-referenced in the Regions table, can be used to find the relevant detected regions. Once we find those regions, we can determine which video they belong to through the video_id field that references the Videos table. Similarly, the relevant coreset tree can be found as well. In this way, we know exactly which videos and which frames contain the queried object, and exactly in which regions of those frames. Thus, we have enough information to return the results to the users.
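As a concrete illustration of this lookup chain, the sketch below recreates the simplified schema of Tables 3-1 to 3-4 in an in-memory SQLite database and resolves a text query by joining the Objects, Regions, and Videos tables. This is only a minimal sketch of the idea; the actual system uses PostgreSQL behind a Django server, and the helper function and sample values here are illustrative, not the thesis's code.

```python
# Minimal sketch (not the thesis code): the simplified schema from Tables 3-1..3-4
# recreated in an in-memory SQLite database, and a text query resolved by joining
# objects -> regions -> videos, as described above. Sample values are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE objects (class_id INTEGER PRIMARY KEY, class_name TEXT);
CREATE TABLE videos  (video_id INTEGER PRIMARY KEY, video_path TEXT,
                      width INTEGER, height INTEGER, num_frames INTEGER);
CREATE TABLE coresets(coreset_id INTEGER PRIMARY KEY, video_id INTEGER,
                      coreset_tree_path TEXT);
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, video_id INTEGER,
                      coreset_id INTEGER, frame_num INTEGER, class_id INTEGER,
                      x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER,
                      confidence REAL);
""")
db.execute("INSERT INTO objects VALUES (79, 'guitar')")
db.execute("INSERT INTO videos VALUES (201, '/home/videos/video1.mp4', 720, 480, 550)")
db.execute("INSERT INTO coresets VALUES (302, 201, '/home/videos/coreset_video1.mat')")
db.execute("INSERT INTO regions VALUES (2322, 201, 302, 232, 79, 250, 900, 600, 1200, 1.5)")

def find_regions(query: str):
    """Resolve a text query to (video_path, frame_num, bounding box, confidence)."""
    return db.execute("""
        SELECT v.video_path, r.frame_num, r.x1, r.y1, r.x2, r.y2, r.confidence
        FROM objects o
        JOIN regions r ON r.class_id = o.class_id
        JOIN videos  v ON v.video_id = r.video_id
        WHERE o.class_name = ?
        ORDER BY r.confidence DESC
    """, (query,)).fetchall()

print(find_regions("guitar"))
```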

3.5 Retriever

After the user inputs the search query in the query interface, the retriever component uses the database and the coreset tree structure to retrieve key frames containing the queried object from the video. The retriever uses a modified form of the sampling algorithm described in [5] to sample key frames from the coreset tree. The algorithm assigns a higher sampling probability to key frames that are more important than others. The importance function of a key frame can differ according to the application, and the retriever is independent of the nature of the importance function. The algorithm is an any-time algorithm, meaning that there are retrieved results no matter when the algorithm is stopped, and if the algorithm continues, it eventually returns all the results.

Making use of the any-time property of the algorithm, the retriever component takes the key frames retrieved at discrete time intervals, gets detections on those frames from the database, and returns the information to the web interface to be displayed to the user in an incremental manner.
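The sketch below illustrates only the any-time, importance-biased behavior described here; it is not the preferential coreset sampling algorithm of Chapter 6 and [5]. The importance function, batch size, and data layout are placeholder assumptions.

```python
# Illustrative sketch only: an "any-time" retrieval loop that repeatedly samples
# leaf key frames with probability proportional to an importance score and emits
# them in batches. The importance function and data layout are placeholders; the
# actual preferential coreset sampling algorithm is the one described in Chapter 6.
import random
from typing import Callable, Dict, Iterator, List

def anytime_retrieve(
    key_frames: List[int],                    # leaf-level key frame numbers
    importance: Callable[[int], float],       # application-defined importance score
    detections: Dict[int, list],              # frame_num -> detected regions (from the DB)
    batch_size: int = 5,
) -> Iterator[list]:
    """Yield batches of (frame, regions); more important frames tend to come first."""
    remaining = list(key_frames)
    while remaining:
        batch = []
        for _ in range(min(batch_size, len(remaining))):
            weights = [importance(f) for f in remaining]
            frame = random.choices(remaining, weights=weights, k=1)[0]
            remaining.remove(frame)
            batch.append((frame, detections.get(frame, [])))
        yield batch                            # client polls and renders incrementally

# Usage: stop whenever the user is satisfied; results returned so far remain valid.
frames = [2, 15, 30, 46, 66, 91]
dets = {30: ["guitar"], 66: ["guitar", "person"]}
for batch in anytime_retrieve(frames, importance=lambda f: 1.0 + len(dets.get(f, [])), detections=dets):
    print(batch)
```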

Chapter 4 Video to Coreset Tree Creation

As soon as the video is uploaded to the server, processing of the video starts for coreset tree creation. The constructed coreset tree is described in [5], on which most of this chapter is based.

4.1 Coresets: Introduction

In simple terms, coreset algorithms select a subset of data from a large data stream such that the result of running an algorithm on the reduced data set is approximately the same as the result produced by running the algorithm on the entire data set (which is usually intractable for large data). The coreset data structures approximate a larger dataset with a compressed structure that can be constructed in linear time using sublinear memory.

The coreset tree described in this chapter is also a compressed representation of the original data, based on a high-dimensional compression method that uses the k-segment mean coreset, described in [5]. The tree represents a compressed visual summarization of the input video data that can be constructed very efficiently.

4.2 K-segment Mean Coreset Tree Construction (without key frames)

Given a continuous video stream, we select a "representative over-segmentation of the video stream (along with a compact representation of each segment)" [5]. Each representative over-segment (containing smaller segments of the video) is a coreset because it is a compact representation of some portion/scene of the real video stream. The coreset provably approximates the real data, guaranteeing a "good trade-off between the size of the coreset and the approximation of each scene in the video stream." The output of the algorithm is called an ε-coreset, which is a set of roughly k/ε segments approximating the real data. The k-segment mean coreset algorithm used to construct the tree has 2 parts: first, the complexity of the data is determined using a bicriteria algorithm, and second, the data is partitioned into segments using a balanced partition algorithm, approximating each segment by the SVD [5], [4] of a matrix constructed to represent the segment (refer to the supplementary material of [4] for algorithm details). This algorithm produces a segmentation close to k segments with approximately optimal cost, and this set of roughly k segments is called an ε-coreset, or coreset in general. In our coreset tree (the final structure), each leaf is an ε-coreset, and the whole tree is an ε-coreset as well.

A coreset tree can be constructed from the coreset leaves by exploiting two important compositional properties of coresets: 1) the union of two ε-coresets is an ε-coreset, and 2) a δ-coreset computed from an ε-coreset is an (ε + δ + εδ)-coreset. Therefore, two coresets can be merged to form another coreset. (See the supplementary materials of [4] for detailed correctness proofs and complexity analyses.)

To create a tree, we first create a coreset for every block of consecutive points in the streaming data (these are frames in the case of video data) using the k-segment mean coreset algorithm. Once we create two such coresets, we can merge them using the first property, and recompress using the second property to avoid increasing the coreset size. Finally, a coreset streaming tree is created by recursively merging the coresets formed directly from the streaming data, i.e., the streaming leaves.
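To make the merge-and-recompress pattern concrete, here is a minimal sketch of a streaming coreset tree builder. It only mimics the structure of the construction: build_coreset and recompress are placeholders standing in for the k-segment mean coreset computation of [4] and [5], and a node's "coreset" is represented simply by a list of frame indices.

```python
# Sketch of the streaming merge-and-reduce pattern described above, not the actual
# k-segment mean coreset code from [4]/[5]: `build_coreset` is a stand-in for the
# segment-coreset construction, and "merge + recompress" keeps node sizes bounded.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    frames: List[int]          # frame indices this node spans (stand-in for the coreset)
    level: int                 # height in the streaming tree (leaves are level 0)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_coreset(block: List[int]) -> Node:
    # Placeholder: in the real system this runs the k-segment mean coreset algorithm.
    return Node(frames=block, level=0)

def recompress(a: Node, b: Node, max_size: int) -> Node:
    # Union of two coresets is a coreset; recompress so the size does not grow.
    merged = sorted(a.frames + b.frames)
    step = max(1, len(merged) // max_size)
    return Node(frames=merged[::step], level=a.level + 1, left=a, right=b)

def stream_tree(frames: List[int], block_size: int = 100, max_size: int = 25) -> Node:
    stack: List[Node] = []
    for start in range(0, len(frames), block_size):
        stack.append(build_coreset(frames[start:start + block_size]))
        # Recursively merge equal-level nodes, as in merge-and-reduce streaming.
        while len(stack) >= 2 and stack[-1].level == stack[-2].level:
            b, a = stack.pop(), stack.pop()
            stack.append(recompress(a, b, max_size))
    while len(stack) >= 2:     # fold any leftover nodes into a single root
        b, a = stack.pop(), stack.pop()
        stack.append(recompress(a, b, max_size))
    return stack[0]

root = stream_tree(list(range(1, 401)))
print(root.level, len(root.frames))
```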

4.3 Summarization: Coreset Tree with key frames

The goal of the summarization is to get representative frames from each segment of the video, so that they capture the entire structure of the video. We use the k-segment mean coreset tree constructed as above, and for each node of the tree, we select and store a set of images or frames from the video, called key frames. For each node, these key frames represent a summarization of the segment of the video that the node spans. The key frames are merged upwards in the tree according to image quality, their ability to represent other images in the video, and their representation of the video transitions, as captured by the segment merges in the k-segment mean coreset algorithm.

For each streaming leaf of the coreset streaming tree, we choose a set of K key frames. When two nodes merge, we select a new set of K key frames from the two child nodes by using a modified farthest-point-sampling (FPS) algorithm [30], [31]; the modified algorithm is described in [5]. To sample the next frame, the FPS algorithm chooses the frame whose feature vector x is farthest in distance from the feature vectors of the already sampled frames. The modified FPS algorithm also takes into account an "image relevance score", in addition to the feature vector distance. The image relevance score is a function of "image quality (sharpness and saliency), temporal information (time span and number of represented segments), and other quality and importance measures," described in more detail in [5].

Thus, the algorithm chooses the frame x_j according to the following rule:

x_j = argmax_x { d(x, S_{j-1}) + f*(x) }

where S_{j-1} is the set of previously chosen frames, so d(x, S_{j-1}) is the point-to-set distance between x and S_{j-1}, and f*(x) is the image relevance function, which depends on metrics such as the blur measure, video time, and the number of segments associated with the key frame. Therefore, the chosen frame is farthest from the already chosen frames in terms of feature vector distance and likely has better image quality. The algorithm chooses K such key frames at each merge action.
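A minimal sketch of this selection rule is shown below, assuming feature vectors and relevance scores are already available as arrays; it is not the implementation from [5], and the toy relevance values only stand in for the sharpness/saliency/temporal terms described above.

```python
# Sketch of the modified farthest-point-sampling rule above (not the code of [5]):
# each new key frame maximizes its distance to the already-selected set plus an
# image relevance term f*. The feature vectors and relevance scores are placeholders.
import numpy as np

def modified_fps(features: np.ndarray, relevance: np.ndarray, k: int) -> list:
    """Select k key-frame indices: argmax_x { d(x, S) + f*(x) } at every step."""
    selected = [int(np.argmax(relevance))]          # seed with the most relevant frame
    # d(x, S): distance from each frame to the nearest already-selected frame
    dist_to_set = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        score = dist_to_set + relevance             # combined objective
        score[selected] = -np.inf                   # never re-pick a chosen frame
        nxt = int(np.argmax(score))
        selected.append(nxt)
        dist_to_set = np.minimum(
            dist_to_set, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Toy usage: 9 key frames from 100 random feature vectors with random relevance.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
rel = rng.uniform(0.0, 0.5, size=100)               # e.g., sharpness/saliency proxy
print(modified_fps(feats, rel, k=9))
```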

4.4 Output/Populating information to DB

The output of the entire process is a coreset tree whose nodes contain key frames from the video segment they span. In our application, we chose each node to have 9 key frames, and the size of each coreset leaf to be 100 frames (except for very small videos), i.e., at the leaf level of the tree there are 9 key frames for each 100 frames in the original video. For a simple example, refer back to the coreset tree example from Chapter 2, shown again in Figure 4-1, where the leaf size was 25 and there were 3 key frames per node.

Figure 4-1: A simple example of a coreset tree corresponding to a video which spans 100 frames. KF in the figure stands for the key frames of a node, and Range is the span covered by the node. All the leaves point to the database, meaning that the detections on the key frames of those leaves are passed to the database.

This coreset tree structure is stored in the server file system, and its path is stored in the database, mapped to the path and the ID of the video that it corresponds to.

Chapter 5 Object Detection of Coreset Tree Leaves

This chapter describes how the coreset tree is used as input to select key frames and run object detection on them. It discusses the definition of object recognition, the tools we chose, and their outputs. We then describe what information is stored in the database after the detection of objects.

5.1 Object Recognition and Detection

Visual object recognition is a technique in computer vision for learning object categories and then identifying new instances of those categories [32]. There are two broad types of object recognition: 1) identifying instances of a particular object, e.g., 'Eiffel Tower' or 'Matt's acoustic guitar', and 2) recognizing various instances of a general object category or class, e.g., 'building' or 'guitar'. We use general object categories in this thesis.

Object recognition involves object classification and object detection. In object classification, an entire image is classified as one object category or no category. In object detection, the objects detected in the frame are localized within the image and classified into different object categories. Thus, in object detection, there is an additional step of localizing the regions of various object proposals within the image, and only then classifying each of those object proposals into an object category or no category. Caffe provides object classification models, while RCNN is an object detector that uses Caffe.

5.2 Caffe

Caffe [29] is a state-of-the-art framework for developing applications based on deep neural networks. It provides various deep learning algorithms, reference models, and object classifiers. At its core, it is a C++ library, but it has bindings to MATLAB and Python. Caffe provides capabilities, in all of these languages, to train, test, fine-tune, and deploy deep learning models and object classifiers, with examples of each. RCNN uses Caffe to build its detection models.

The aspects of Caffe most important to RCNN are its capability to extract deep features from images and to do pre-training and fine-tuning on large datasets. Caffe provides ways to train on huge datasets, such as those provided in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) [33] or PASCAL VOC (Pattern Analysis, Statistical Modeling and Computational Learning, Visual Object Classes) [34] challenges, which have millions of training images and standard object categories. RCNN uses the feature extractors and training modules for at least two challenges: the PASCAL VOC 2010 challenge (with 20 object categories) and the ILSVRC 2013 challenge (with 200 classes) [28].

5.3 RCNN (Regions with Convolutional Neural Network Features)

RCNN is an object detector that combines the region proposals with CNN features. It has three modules:

1) generate region proposals independent of object categories, 2) extract features from each region, and

3) classify each region with class-specific linear SVMs.

First, RCNN uses a standard selective search algorithm [35] to produce category-independent region proposals within an image. Then, it extracts features from each of those regions using a CNN implementation of a feature extractor in Caffe [36]. When a test image is presented, RCNN first extracts about 2000 region proposals using selective search, extracts features from each, and scores each of the extracted features using an SVM trained for that specific class. For training, RCNN first pre-trains for each object category using Caffe's CNN library, and fine-tunes it, again using Caffe, for the specific purpose of object detection. Then, using the CNN features for each class generated by Caffe's pre-training, class-specific SVMs are produced as binary classifiers, i.e. they decide whether a given region is a certain class or not, based on a score threshold. In summary, with RCNN, we can input an image, and it outputs the regions in the image, classifying each of those regions into one of a pre-defined list of object classes/categories, with a confidence score for each region.
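The sketch below illustrates this three-module flow. It is not the actual RCNN or Caffe API; the region proposer, the feature extractor, and the per-class SVM scorers are assumed to be supplied by the caller.

def detect_objects(image, propose_regions, extract_feature, class_svms, score_threshold=0.0):
    """Sketch of RCNN's three modules (all callables are provided by the caller):
    1) propose_regions(image)      -> category-independent boxes (selective search)
    2) extract_feature(image, box) -> fixed-length CNN feature for the region
    3) class_svms[name](feature)   -> SVM score; keep boxes above the threshold."""
    detections = []
    for box in propose_regions(image):
        feature = extract_feature(image, box)
        for class_name, score_fn in class_svms.items():
            score = score_fn(feature)
            if score > score_threshold:
                detections.append((box, class_name, score))
    return detections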

5.4 Choosing Key frames for Detection

The input to the object detection component is the coreset tree, which provides the key frames of the

video. We decided to use the key frames at the lowest level, i.e. the key frames of the leaf nodes. This choice trades off speed against video coverage: the leaf level has the largest number of key frames, so detection takes the longest there, but it also gives the highest video coverage, and those key frames are the best approximation of the original video.
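A minimal sketch of collecting the leaf-level key frames, assuming the hypothetical CoresetNode structure sketched at the end of Chapter 4:

def leaf_key_frames(node):
    # Gather the key frames of all leaves below this node; these are the frames
    # that are actually passed to the object detector.
    if node.is_leaf:
        return list(node.key_frames)
    frames = []
    for child in node.children:
        frames.extend(leaf_key_frames(child))
    return sorted(set(frames))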

5.5 Output/Populating DB

Each of the leaf-level key frames is passed as input to the detection system. For each detected region in the frame/image, we get the region coordinates in the image (the x1, y1, x2, and y2 coordinates), the object class assigned to the region, and the confidence score. After running detections on each key frame, we augment the "regions" table in the database, which stores the information about all the regions detected in all the key frames in all the videos.
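As an illustration, a detection could be written to the regions table along these lines. This is only a sketch: the column names follow the fields described above, but the exact schema in our database differs.

def store_detection(db, video_id, frame, box, label_id, confidence):
    # db is a DB-API style cursor; one row is inserted per detected region.
    x1, y1, x2, y2 = box
    db.execute(
        "INSERT INTO regions (scene_id, frame, x1, y1, x2, y2, label_id, confidence) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
        (video_id, frame, x1, y1, x2, y2, label_id, confidence),
    )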

Chapter 6 Retrieval with Text-based Search

In this chapter, we describe how retrieval is done using the coreset tree and the database populated with

the information about detected regions, after the server gets the user query. We discuss the coreset

retrieval algorithm with preferential sampling, its analysis, and compare it with other alternative

retrieval systems.

6.1 Inputs

6.1.1 Database

After the video processing is done, the database is populated with the information about the paths of the

uploaded videos, their corresponding coreset trees, and all the regions in all the key frames that were

detected by the object detection component. As previously mentioned, each region contains information

about its coordinates in the image, the object detected in that region, and the confidence score.

6.1.2 Search Query Input

From the search query, the inputs for the retrieval algorithms are the search term, for example "guitar", the video file name, e.g. "schoolevent.mp4", and the frame range, for example "35 - 280". The frame range is the user's area of interest within the video. If the frame range is not provided as input, we assume the retrieval is over the entire video. If the video filename is not provided, retrieval is done over all the videos in the database.
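A small sketch of how these inputs could be normalized with the defaults just described (function and field names are illustrative, not the actual parser used by the server):

def parse_search_query(term, filename=None, frame_range=None):
    if frame_range is None:
        frame_range = (0, float("inf"))      # default: the entire video
    else:
        start, end = frame_range.split("-")  # e.g. "35 - 280"
        frame_range = (int(start), int(end))
    # filename=None means: search across all videos in the database
    return {"term": term, "filename": filename, "range": frame_range}

# parse_search_query("guitar", "schoolevent.mp4", "35 - 280")
# -> {'term': 'guitar', 'filename': 'schoolevent.mp4', 'range': (35, 280)}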


Given the database information and the query terms, the system will perform text-based retrieval for the

objects in the videos.

6.2 Coreset Retrieval Algorithm with Preferential Sampling

The primary retrieval method proposed in this thesis is an algorithm that uses the coreset tree to sample

the leaves in an any-time fashion. The sampling prefers more important frames. The coreset creation algorithm (see Chapter 4) selects the key frames from the children nodes and gives higher weight to key frames that are less blurry and have higher image quality. Therefore, by default, the importance function is a function of the blur and quality measures of the image. However, this importance function can be modified as desired, and the retrieval algorithm is agnostic to the nature of the importance function.

As the leaves are sampled, the matching regions within the frame span of those leaves are extracted from the database and the results are presented as output in an incremental fashion.

Figure 6-1: A simple example of a coreset tree corresponding to a video of 100 frames. Each leaf spans 25 frames. The key frames and frame span of each node are noted alongside the node.

For example, consider the coreset tree example from Chapter 4, shown again in Figure 6-1. If the search

range is 1-74, the sampling algorithm samples leaf nodes 1, 2, and 4 in some order, because these nodes cover the query range. Say it sampled node 4 first. Then we search for matching regions in the database in node 4's range, i.e. [51, 75], and add them to our regions list. Next, we repeat until the algorithm samples the remaining leaves, i.e. nodes 1 and 2. At some point, the algorithm signals that all required leaves have been sampled, and we have a complete list of regions.

6.2.1 Preferential Coreset Sampling Algorithm description

The algorithm's input is the coreset tree, the query text, and the query range, and the output is the database regions. It can best be described using two functions. One function traverses the coreset tree, samples a new leaf in every loop, and returns the detected regions within the leaf span; let's call it sample_leaf_n_retrieve_regions. During this traversal, another algorithm called preference_sample_child is called. This algorithm samples one of the children of a node, weighting the sampling based on the importance of the child nodes.

SAMPLE_LEAF_N_RETRIEVE_REGIONS

The retrieval algorithm is a modified version of the sampling algorithm introduced in [5]. It runs on a loop until all the leaves in the provided frame range (or query range) are sampled, and all the matching detections or detected regions within corresponding leaf spans are returned. It guarantees that no leaf is sampled twice. With each sampling of a leaf, it "yields" any matching regions from the span of that leaf.

Yielding means returning the results, while still preserving the state of the algorithm, making sure the algorithm is any-time.

The pseudocode is shown below.

def sample_leaf_n_retrieve_regions(coreset_tree, q_text, q_range):
    done_nodes = []
    done_leaves = []
    init_node = get_last_leaf_in_range(coreset_tree, q_range)
    v = init_node
    num_leaves_in_range = get_num_leaves_in_range(coreset_tree, q_range)
    going_down = False
    while True:
        node_out_of_range = is_node_out_of_range(v, q_range)
        node_done = is_node_done(v) or node_out_of_range
        node_too_big = does_node_cover_q_range(v, q_range) or is_root(v)

        if node_done and v not in done_nodes:
            done_nodes.append(v)

        all_leaves_sampled = num_leaves_in_range == len(done_leaves)
        if (node_too_big and node_done) or all_leaves_sampled:
            yield None  # END CONDITION, all nodes finished
            return

        p = sample_uniform([0, 1])
        parent = get_parent(v)
        if (p <= alpha and not node_too_big and not going_down) or node_done:
            v = parent
        else:
            # sample a child in a preferential order
            child = preference_sample_child(v, done_nodes, q_range)
            if child == v:
                # leaf, so retrieve regions from the database
                regions = get_regions_from_db(v, q_range)
                done_leaves.append(v)
                done_nodes.append(v)
                v = init_node
                going_down = False
                yield regions
            else:
                going_down = True
                v = child

First, an initial node (leaf) is chosen for the algorithm. The initial node is the last leaf in the coreset tree that is within the query range. In our example at the beginning of Section 6.2, the init_node would be leaf number 4. From that initial node, the algorithm (almost) randomly determines whether to try to traverse up one level of the tree or to try to go down. In one complete traversal from the initial node, up the tree, and down to a leaf, a new leaf is sampled and returned.

The algorithm has the list variables "done_nodes" and "done_leaves". An intermediate node is considered done if all of its successors that fall in the query range are done as well. A leaf node is considered done if it has been sampled for retrieval. There is another variable named "num_leaves_in_range", which is the total number of leaves that are in the query range, and therefore the number of leaves that need to be sampled. These variables help the algorithm figure out when all the required nodes have been sampled so that the loop can be terminated.

To sample a leaf, the algorithm starts with init_node. At each iteration, it has a current node, denoted by 'v' in the pseudocode, and the initial value of v is init_node. During each iteration, the algorithm updates

its 'v', either going to the parent or a child node. At the beginning of every loop, the current node is

checked for whether it is out of range, or is done (meaning all its children are done), or is too big

(meaning the frame span of the node is larger than or equal to the query range), or if all required leaves

are already sampled. If the node is done or out of range, it is added to the "done_nodes" list. Notice that

the out-of-range node is considered done, because we do not want the algorithm to traverse that node.

Similarly, if the node is too big and is already done, it means that all required leaves have been sampled.

There is another function to check whether all leaves have been sampled, which calculates the number of leaves in range according to the query range and the coreset leaf size, and compares it with the length of done_leaves. If, at the start of any iteration, all required leaves have been sampled, the

algorithm yields None and returns.

Next, consider traversing up and down the tree. With probability α, the next v will be the current v's parent, i.e. the algorithm will traverse upwards in the tree, unless the current v is the root node, or it spans a range larger than or equal to the query range, or the algorithm had previously decided to traverse downwards instead, or the node is "done", or it doesn't fall within the query range. Otherwise, the algorithm goes down the tree.

To go down, the algorithm needs to sample one of the leaves. It calls the preferential sampling algorithm, preference_sample_child(), which either returns one of the children of the current node that has not been previously sampled, or, if the node is a leaf, returns the node itself. If the node itself is returned, it is a leaf, so the algorithm retrieves all the matching regions from the database that lie in the frame span of the leaf, adds the leaf to the "done_nodes" and "done_leaves" lists, and re-

initializes all the variables to start another upward traversal from the init_node. If not, the sampled child becomes the 'v' for the next iteration.

PREFERENCE_SAMPLE_CHILD

This algorithm samples one of the children that have not been sampled before, with a higher probability of sampling one of the more important children. In Chapter 4 we discussed the key-frame selection process during the merging of two nodes. During the selection, the variability of key frames was considered, with the importance function determined by the blurriness and the quality of the image. The key frames with a higher overall score were selected and propagated upwards to the merged node. Therefore, it was more likely that more important (in this case, less blurry and higher quality) images were propagated upwards. Now, looking at the tree top-down, we can say that a merged node contains on average more important frames than either of its children, and whichever child node contributed more key frames is the more important of the two. The preferential sampling algorithm is based on that fact. It gives a higher sampling probability to the child that contributed the largest number of key frames to the parent, or in other words, the child that has the largest key-frame intersection with the parent. The probability is proportional to the number of key frames a child has in common with the parent.
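A sketch of preference_sample_child under these assumptions; it reuses the node fields from the earlier tree sketch and the is_node_out_of_range helper from the pseudocode above, and the exact weighting in our implementation may differ in detail.

import random

def preference_sample_child(node, done_nodes, q_range):
    # A leaf returns itself; the caller then retrieves its regions.
    if node.is_leaf:
        return node
    # The caller only descends into nodes that are not done, so at least one
    # candidate child exists here.
    candidates = [c for c in node.children
                  if c not in done_nodes and not is_node_out_of_range(c, q_range)]
    # Weight = number of key frames the child shares with its parent.
    weights = [len(set(c.key_frames) & set(node.key_frames)) for c in candidates]
    # Guard against a child that contributed no key frames at all.
    weights = [w if w > 0 else 1 for w in weights]
    return random.choices(candidates, weights=weights, k=1)[0]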

6.2.2 Algorithm Correctness

The retrieval algorithm has two properties - first, it samples a new, previously not sampled, leaf every time it samples, and second, it terminates, or in other words, it samples all the leaves in range.

Let's tackle the first claim, i.e. that a new leaf is sampled on every traversal up and down the tree. Consider any current node during the algorithm. If the node is done, i.e. all its successors are done, the algorithm prunes that path, i.e. it does not go down towards that node. Thus, it considers going down only

if there are some children of the node that have not been done. This fact proves that the algorithm can

only reach a leaf that has not been sampled before, because if the leaf had been sampled before, the algorithm would have marked it under "done_nodes", and therefore the algorithm would only traverse

upward when this node is encountered again.

The second claim is that the algorithm eventually samples all the leaves in the range. First, let's prove

that it doesn't sample any leaf outside the range. At every iteration, the algorithm checks whether the

current node is out of the query range. In that case, the node is considered done, and as a result the

algorithm traverses back to the parent. Therefore, at no point is it possible to sample a leaf that is out of

range.

Now, we prove all leaves in range are eventually sampled. It can be proven that there is a finite

probability to reach any leaf (L) in range, starting from the last leaf in range (Lend). We can see that at

every node, if the node has not been marked "done" before, i.e. its children are not all done, there is a probability of 1 - α of deciding to traverse down to the children. With the algorithm preference_sample_child,

there is a finite probability for each child to be sampled, as long as the child is not already done. This is true for all the nodes during the traversal, until we reach the leaf. Because nothing in the algorithm prevents any "not done" node from being sampled, there is a finite probability of each leaf being sampled.

The longest possible traversal path from Lend to L is twice the distance from the leaf level to the earliest node that covers the entire query range, looking bottom-up starting from any leaf in the range. Let's call that node Vtop. If the algorithm reaches Vtop, it is prevented from going above it because Vtop covers the query range. Similarly, if there is a leaf that is still left to be sampled, it is guaranteed that Vtop is not in the "done_nodes" list, and therefore there is a path of nodes that are not done from Vtop to L. In this way, L is guaranteed to be reached.

Worst Case Probability Bound for Reachability of a Leaf

We calculate the smallest possible probability of that longest path, i.e. the worst-case scenario for a leaf to be reached. Let's say that the distance, measured in number of nodes, from one of the leaves in range to Vtop is Dv. Let's assume there are N leaves within the query range; therefore Dv is O(log N) because the tree is binary. The probability of the upward traversal reaching Vtop is therefore O(log N × α), because the probability of going up at each node is O(α). Now, let's say that at each level of child-node sampling, the nodes in that path each have a finite probability O(p) of being sampled. As discussed above, the probability of sampling a child at each node differs according to the children, but it is finite. Then, combining everything,

P(sample L) = O(α · log N · p · log N) = O(α · p · log² N)

6.2.3 Algorithm Complexity

Because a new leaf is sampled in every traversal, and every traversal, borrowing from the analysis above, takes O(log N), the complexity of the algorithm is O(N log N). Here, N denotes the number of leaves in the query range.

6.3 Alternative Retrieval Algorithms

6.3.1 Direct Database Retrieval

This is the usual method of retrieving regions from the database. We take the query range, and retrieve

all the regions from the database that are in that query range. All the results are passed to the user at

once.

6.3.2 Uniform Sampling of Coreset Leaves

This algorithm is similar to the algorithm proposed in this thesis, in that it is also an any-time algorithm.

This algorithm samples the leaves in the query range, one leaf at a time, with uniform probability. As the leaves are sampled, the regions within those leaves are retrieved from the database as in the preferential sampling algorithm.

6.4 Comparison of various retrieval algorithms

All retrieval algorithms have advantages and disadvantages. Direct database retrieval is ideal for smaller databases, where users may not have to wait much for all results. However, if the application requires retrieval of large datasets from large databases, the coreset sampling algorithms win. If we compare the total time to retrieve all the results though, database retrieval likely beats both others, because others have to do multiple database queries as opposed to one, which will likely take longer. Similarly, database retrieval likely retrieves the data in the order that is in the database, which may be important for some applications. In our case, the order may correspond to the key frame number ordering in a certain video, which may be important if we are trying to retrieve a key frames-summary of the video.

With coreset sampling, the results are not retrieved in the same order as in the database, therefore the key frames will not be shown in the order they appear in the video.

Between the two sampling methods, uniform sampling takes less time to complete sampling all the leaves, because the time complexity is, on average, linear with the number of leaves in the range, i.e.

O(N). On the other hand, as we analyzed before, the preferential sampling algorithm has an extra log N factor. The preferential retrieval algorithm, however, is the best choice if we want the important frames to be displayed to the users as fast as possible.



Chapter 7 System Implementation

In this chapter, we describe the implemented objects retrieval system. The system was implemented and tested on a machine with an Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz processor running Ubuntu 14.04. Figure 7-1 shows an architecture diagram of the objects retrieval system with labels for the implementation tools and languages used in each of the components. The web application was written using Django [37], and the video processing was done in MATLAB.

Figure 7-1: System diagram of the search-objects module with implementation details. The tools and languages used to implement each system component are noted above the component.

Django Framework

Because the iDiary interface, as implemented previously, was a Django-based web framework [2],

Django was used to implement the web portion of the search-objects module. It includes the user-facing interfaces as well as some portions of the server, including the uploader, search parser, and retriever.

Django is a Python-based web framework that provides an architecture to build web user interfaces (UI).

Django also connects with a Python server that can parse requests, communicate with a database, and send responses back to the UI. Django was used for the implementation of the frontend part and the

"search parser" and "retriever" modules in the backend.

7.1 Frontend Web UI

The front end of the system is a web UI that the user interacts with. With the UI, the user can upload videos, search using textual queries, and visualize the results of the search. Most of the user-facing UI uses basic web languages such as HTML and CSS. Bootstrap [38] is used for aesthetic forms and buttons. JavaScript, jQuery, and AJAX are used as client-side scripting languages to process forms and display results for preferential coreset retrieval.

7.1.1.1 Upload and Search Interfaces

Figure 7-2: Web UI for the search-objects system to search and upload videos. The search form has fields for the search word/object, the file name, the frame range, the confidence threshold, the skip frames, and the retrieval method (coreset tree or database); a separate form uploads a new video file.

The main page of the website is shown in Figure 7-2. The two major web interfaces, the query/search interface and the upload interface, are both on the same UI as shown in the figure. The user can upload videos to the system with the "upload" module. As soon as the user uploads a video, the server starts processing it, adding the detections to the database. Thus user searches can proceed right away.

To search, the user submits a search form where he can specify the query text (e.g. "guitar"), along with other parameters such as the full or partial name of the video file to search in, the frame range that the search should span (e.g. "200-650"), the confidence threshold of the returned results (e.g. 1.0), the skip-frames, i.e. the minimum frame gap between the returned video frames, and the type of retrieval (preferential coreset sampling, uniform sampling, or direct database retrieval). The confidence score is a number given by the object detection module and reflects how confident the detector is about a certain detection. For RCNN (the object detector), the scores usually span the range [-4.0, 4.0], although the actual range of the scores is not mentioned in the paper [28]. Any float or integer number is a valid input in this field.

7.1.1.2 Retrieved Results Interface

When the user submits the form, he can see the returned frames of the video in the "retrieved results" interface. The UI of this interface for the search query "guitar" in some sample videos is shown in

Figure 7-3, and the UI for the query "computer" is shown in Figure 7-4. There is just one object class matching the "guitar" query, namely "guitar". However, two object classes, namely "computer keyboard" and "computer mouse" match the query "computer". The results are shown as a thumbnail array for each object class, where each thumbnail contains a key frame (image), the green rectangles in the image denoting the regions of the detected object, and a brief description below each image. The description includes the frame number, the maximum confidence score among the regions of the object class detected in that image, and the file name of the video which contains that frame.

Figure 7-3: An example retrieval for the search query "guitar". Results appear as image thumbnails containing rectangles around the detected regions.

Figure 7-4: An example retrieval for the search query "computer". Two object categories match the query.

For direct database retrieval, the results are displayed on a new page all at once, while for coreset retrieval (both preferential and uniform sampling retrievals), the results are added to the same page without refreshing or redirecting to a new page, with the help of jQuery. This is done to seamlessly show the incremental results to the user without refreshing the page.

7.2 Django Server

The Django server contains three modules - uploader, search parser, and retriever. The uploader module was implemented as a standard Django file-upload system, similar to the one described in [39], and search parser is provided by Django. The most important component of the Django server is "retriever", which encapsulates different kinds of retrievers we have implemented.

7.2.1 Retrievers

When the user searches with a query text and the other information described above, the "search parser" parses the search and calls the "retriever" module to retrieve the results. We implemented three types of retrievers: 1) the "DB retriever", 2) the "coreset uniform retriever", and 3) the "coreset preferential retriever".

In Chapter 6, we discussed all these retrieval algorithms and saw that they return the regions from the database. The direct database retrieval returns all the regions at once, while both coreset retrieval algorithms return the regions in an any-time manner. However, there is more to the retrievers: these regions have to be passed to the front end for the UI to display the images to the user. For the sake of implementation design, both coreset tree-based retrievers can be put in one module, namely the "coreset retriever".


Figure 7-5: System diagram depicting the two different retrieval systems, coreset and database retrieval. In coreset retrieval, the client continuously polls for the results and the retriever incrementally provides them. The DB retriever receives one client request and returns all results at once.


As shown in Figure 7-5, the DB and coreset retrieval systems are different. The DB retriever receives one request from the client and sends one response back with all the retrieved results, while the coreset retriever receives continuous requests from the client every t milliseconds and sends incremental responses back to the client for each request. Therefore, on the client side, the DB search is implemented as a regular non-AJAX HTTP GET request, while the coreset search is implemented with jQuery and AJAX, whereby an AJAX request is sent to the server with the form details every few hundred milliseconds (t), so-called "polling". The client regularly polls the server for additional data and displays the results in the UI.

We discuss the details of these two kinds of retrieval systems next.

7.2.1.1 DB retrieval

First, the regions retrieved from the database are filtered according to the client's request. For example,

if skip-frames is 20, we make sure that no two adjacent frames for the same object class are less than

20 frames apart. Similarly, we ignore the regions that are of lower confidence than the threshold specified by the client.
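A sketch of these two filters is shown below; the dictionary keys are illustrative, not the exact field names used in our code.

def filter_regions(regions, confidence_threshold, skip_frames):
    # Keep only regions above the confidence threshold, and enforce the minimum
    # frame gap (skip_frames) separately for each object class.
    kept, last_frame_for_class = [], {}
    for r in sorted(regions, key=lambda r: r["frame"]):
        if r["confidence"] < confidence_threshold:
            continue
        last = last_frame_for_class.get(r["label"])
        if last is not None and r["frame"] - last < skip_frames:
            continue
        last_frame_for_class[r["label"]] = r["frame"]
        kept.append(r)
    return kept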

Figure 7-6: Process of going from the retrieved regions to displaying them on the UI.

Second, we save each of the final filtered frames, with the region rectangles drawn in them, in the server's file system. Each of the regions has the information about which video and frame number it belongs to, along with the corner coordinates of the region, the name of the object class, and the confidence score. This allows the system to extract the corresponding frame from the corresponding video, draw all rectangular regions for the same object class on top of that frame, with different color intensity according to the confidence of the detection, and save all of that as one image. Figure 7-7 shows an example of one such image saved in the file system. This image contains the regions for "computer keyboard" detections.

Figure 7-7: Image saved in the file system for the detection of the object category "computer keyboard".

Third, we build a data structure to send enough information to the UI so that it can display the images with descriptions. In that data structure, we store the paths of the saved images, along with the name of the object class, the maximum confidence among the regions in that image, and the name of the video file that the frame was extracted from.

Finally, the Django web server sends that data structure as a response to the client, which is then able to display the images saved in the server's file system. Figure 7-6 above shows this process.

7.2.1.2 Coreset Retrieval

The process of responding to the UI for coreset retrieval is similar to DB retrieval. The only difference is that the server goes through that same cycle multiple times. When a client sends a search request with "coreset" as the retrieval type, a thread starts in the background without affecting the UI. The thread does a similar process of taking regions, saving images, and creating a data structure to send to the client, but does that process for each of the sampled leaves one by one. Therefore, the data structure to send to the client keeps growing as time goes on, but there is always something to respond with. The client request is identified by Django's session ID. The data structure for each client is mapped to the session ID, so that two clients won't try to change the same data structure or interfere with one another's results. Because the client sends multiple requests in a certain time interval, we can identify the client with the session ID, and send the corresponding data structure back to the UI.

When the sampling algorithm is done, we also send the "done" flag to the client so that it stops sending requests. At the same time, we clear the data structure of that client in the server.
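The following is a minimal sketch of that polling contract, keyed by session ID. It is not the actual Django view, and it simplifies one thing: here each poll advances the sampler by one leaf, whereas our implementation runs the sampler in a background thread and each poll only reads the accumulated structure.

results_by_session = {}

def poll_coreset_results(session_id, sampler):
    """sampler is the generator returned by sample_leaf_n_retrieve_regions(...)."""
    batch = next(sampler, None)                 # one more sampled leaf, or None when finished
    store = results_by_session.setdefault(session_id, {"items": [], "done": False})
    if batch is None:
        store["done"] = True                    # tells the client to stop polling
    else:
        store["items"].extend(batch)
    response = {"items": list(store["items"]), "done": store["done"]}
    if store["done"]:
        results_by_session.pop(session_id, None)  # clear the client's data structure
    return response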

7.2.1.3 Comparison between DB and Coreset Retrieval

We discussed the comparisons among different retrieval algorithms in Section 6.4. Here we focus on additional differences as a result of the specifics of implementation.

Because of the nature of the implementation, the coreset tree-based algorithms may show different, and possibly more, results than DB retrieval. This is because the process of retrieving regions takes place only once with DB retrieval, but multiple times with coreset retrieval. Therefore, the entire region set is filtered with DB retrieval, but with coresets, smaller sets of regions are filtered at a time. This means that the filter does not really apply to all of the regions but only to a subset of the regions each time, which may cause more or different key frames to be retrieved. For example, if the skip-frames field in the search form is 15, and there are two frames with the same object closer to each other than 15 frames but in different leaf nodes of the coreset tree, both will be retrieved with coreset retrieval. However, with DB retrieval, this will not happen. In the same example, it is possible to get different key frames if one key frame is skipped in DB retrieval and another in coreset retrieval. For instance, for some query, DB retrieval may return the key frames 1, 17, 35, and 52. However, let's say the coreset leaves span 1-25, 26-50, 51-75, and so on.


Then, it is possible to get the following frames from coreset retrieval: 1 and 17 (from the first leaf), 28 and 48 (from the second), and 52 (from the third).

The experiments for the comparison are conducted and described in Chapter 8.

7.3 Video Processor

The video processor component takes the path of the uploaded video as an input, and produces - 1) a

coreset tree and 2) the database augmented with the information about the objects detected in the coreset

tree key frames, among other information. The video processor component is implemented in

MATLAB. This component contains two modules: coreset tree creator, and object detector.

7.3.1.1 Coreset Tree Creator

This component was previously implemented in MATLAB, and the system is described in [5]. The

implementation of the video segmentation and coreset tree creation was provided for use in this system.

7.3.1.2 Object Detector

The object detector we used was RCNN, which was based on the deep learning framework, Caffe.

RCNN is provided as an open source GitHub project [40]. There are two RCNN models provided,

PASCAL VOC 2010 and ILSVRC 2013, having 20 and 200 object categories respectively. We chose ILSVRC 2013 for our system because we wanted more categories to cover the various kinds of videos that the user may input into this system. RCNN is implemented in MATLAB, and because the coreset creation was implemented in MATLAB already, RCNN was readily integrated into the system.

7.4 PostgreSQL Database

The PostgreSQL database stores the information about the videos, coresets, object categories, and the regions detected as objects. PostgreSQL was used because it was used in the original iDiary system [2].

The database has one major table named "regions", which stores the information about the detected regions. It also has 3 other tables: 1) the table for object classes, 2) the table for video paths and general video properties, and 3) the table for coreset tree information. The regions table is linked to all the other tables.
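A simplified, hypothetical sketch of such a schema is shown below; the actual table and column names in our database differ (see Figure 8-3 for the real "regions" columns).

# Illustrative DDL only; in practice the tables were created via the Django/PostgreSQL setup.
SCHEMA_SKETCH = """
CREATE TABLE videos        (id SERIAL PRIMARY KEY, path TEXT, num_frames INTEGER);
CREATE TABLE coreset_trees (id SERIAL PRIMARY KEY, video_id INTEGER REFERENCES videos(id), tree_path TEXT);
CREATE TABLE labels        (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE regions       (id SERIAL PRIMARY KEY,
                            scene_id   INTEGER REFERENCES videos(id),
                            frame      INTEGER,
                            x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER,
                            label_id   INTEGER REFERENCES labels(id),
                            confidence DOUBLE PRECISION);
"""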

Chapter 8 Experiments

In this chapter we describe the experimental evaluation of the objects retrieval system for video data.

We also describe the evaluations of the system components, and the retrieval algorithms.

To verify the correctness of the system, we create a synthetic video where the segmentation and retrieval ground-truth is built-in. We describe the results of each step in the end-to-end process. Then, we conduct experiments to measure the time required by each of the components of the system and describe how performance varies according to the size and the quality of the videos. Next we conduct experiments with different retrieval methods: direct database retrieval, preferential sampling coreset retrieval, and uniform sampling coreset retrieval. Through empirical results, we analyze the strengths and weaknesses of these methods, and show that the retrieval system proposed in this thesis is efficient and displays more important frames earlier than other systems. We use both real and synthetic videos to showcase the results.

All the experiments were conducted on a desktop machine with an Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz processor running Ubuntu 14.04. The CPU was used for coreset creation, and the GPU was used for the object detections during each of the experiments.

8.1 End-to-end Experiment on a Synthetic Video with Known Ground Truth Segmentation

For this experiment, we created an 8-second synthetic video using 8 still images, each spanning 1 second and 30 frames, with the entire video containing 240 frames. Figure 8-1 shows the 8 still images used for the video. The resolution of the video was 720x480 and the video was 5.7 MB. This video was synthesized so that the segmentation ground truth was known. The video was used for the baseline correctness evaluation of the system.

Figure 8-1: Eight still images used to make the small synthetic video for the system evaluation experiment. The video contains segments of multiple frames of each of these images.

8.1.1 Coreset Tree Creation Module

The synthetic video was provided as input to the coreset tree creator for video segmentation,

compression, and coreset tree creation, with the leaf size parameter of the tree as 40. Because there were

only 8 images in the video, we expected 8 segments of the video, with the root containing all 8 different

images. We expected the overall number of key frames in the coreset tree to be much smaller than the number of frames in the original

video. We expected 6 leaves, each containing two key frames/images, since each leaf should span 2

images.

Results

After the video is processed by the coreset-creation module, a coreset tree is created, as shown on the left side of each of the sub-figures in Figure 8-2. The leaf size for the coreset was set to 40, so there are 6 coreset leaves, each spanning 40 frames, i.e. [1-40], [41-80], ..., [201-240]. In the figure, for leaf 1, there are 3 key frames, namely 17, 34, and 38. Here, 34 and 38 are the same image. Because each key frame is a representative of a coreset segment, we can tell that leaf 1 had 3 segments. The root of the coreset tree, as shown in Figure 8-2 (a), contains all of the 8 images, although one of them is repeated.

Figure 8-2: a) Key frames and other descriptions for the first leaf node of the coreset tree, and b) for the root node of the tree. The coreset trees are shown towards the left in both a) and b), with the relevant node marked with a black circle.

Analyses

Contrary to our initial expectation, we saw in the results that we had more segments in the video than 8, the number of still images in the video. This is due to slight imperfections of the video-creation process (which may slightly change the pixels when the same image is appended again); as a result, the features extracted from those images are slightly different as well, creating more segments than expected. However, the most distinct (in feature space) images were propagated upwards during coreset creation because of FPS (Farthest Point Sampling), described in Chapter 4. Therefore the root has all 8 images (with one duplicate, which is expected because we set a node to have 9 key frames if possible).

Looking at compression, there are 20 key frames total in the leaf-level of the coreset tree, which is 12

times less than the number of frames in the original video. The information of the video is preserved,

because no image is lost during the upward propagation in the coreset tree. In the root, we can see in

Figure 8-2 (b) that all the images in the video are stored as the key frames.

8.1.2 Object Detector

Given the coreset tree, the object detector detects objects in each key frame of the coreset tree. We have

the ground truth on the detections as well. The following table shows all the objects we expect (objects

in full view in the image), out of the 200 object categories that the detector should recognize, for each

image.

Table 8-1: The expected objects in each of the 8 images in the video.

Images | Expected Object Detections
Image1 | Guitar, chair, table, laptop, lamp, tv or monitor, person, filing cabinet
Image2 | Chair, tie, person
Image3 | Computer keyboard, computer mouse, water bottle, table, tv or monitor
Image4 | Chair, computer keyboard, computer mouse, tv or monitor, table
Image5 | Hat with wide brim
Image6 | Car
Image7 | Person, bicycle
Image8 | Person, bicycle, sunglasses

Results

The following table shows the object detections with confidence scores for each image, returned by the object detector. We mention the detections above the confidence score of 0, because that is what the default RCNN implementation [40] deems useful. We also show results with the confidence score threshold of 0.5, because this is a score empirically determined by us to be more useful in our videos.

The detections that were expected (True Positives, aka TP) are marked green, the ones that were not expected (a mistake in either the object category or the position of the object in the image; False Positives, aka FP) are marked in red, and the ones that were expected but not detected (False Negatives, aka FN) are marked in yellow. These were marked by a human judge.

Table 8-2: False positives (FP), false negatives (FN), and true positives (TP), as marked by a human judge, for the 8 images of the small synthetic video at confidence thresholds 0 and 0.5. Each column shows the counts at thresholds (0, 0.5) respectively.

Images | FP (0, 0.5) | FN (0, 0.5) | TP (0, 0.5)
Image1 | 3, 0        | 4, 5        | 5, 3
Image2 | 1, 0        | 1, 1        | 2, 2
Image3 | 7, 1        | 3, 3        | 3, 3
Image4 | 4, 3        | 0, 3        | 5, 3
Image5 | 7, 3        | 0, 0        | 1, 1
Image6 | 8, 0        | 0, 0        | 1, 1
Image7 | 0, 0        | 0, 0        | 1, 1
Image8 | 1, 0        | 0, 1        | 3, 2

Building on the table above, the following are the precision and recall numbers for all images for the two confidence thresholds, 0 and 0.5. We calculate precision and recall with these standard formulae:

Precision = TP / (TP + FP),   Recall = TP / (TP + FN)

Table 8-3: Precision and recall numbers for the detections in each image for confidence thresholds 0 and 0.5. Increasing the confidence threshold, we saw an average increase in precision but a decrease in recall.

Images | Precision @ 0 | Precision @ 0.5 | Recall @ 0 | Recall @ 0.5
Image1 | 0.625    | 1      | 0.555556 | 0.5
Image2 | 0.666667 | 1      | 0.666667 | 0.666667
Image3 | 0.3      | 0.75   | 0.5      | 0.5
Image4 | 0.555556 | 0.5    | 1        | 0.625
Image5 | 0.125    | 0.25   | 1        | 1
Image6 | 0.111111 | 1      | 1        | 1
Image7 | 1        | 1      | 1        | 1
Image8 | 0.75     | 1      | 1        | 0.75
Avg.   | 0.516667 | 0.8125 | 0.840278 | 0.755208
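As a quick sanity check of the formulae above, the Image1 counts at threshold 0 from Table 8-2 (TP = 5, FP = 3, FN = 4) reproduce the first row of Table 8-3:

def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP), Recall = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(5, 3, 4))   # -> (0.625, 0.5555...), matching Image1 at threshold 0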

Analyses

We can see that the detections are generally acceptable if we select a good confidence threshold. For a threshold of 0, we saw a lot of false positives in almost all images. When we increased the threshold to 0.5, many of the false positives were eliminated, but in some images the number of true positives declined. As a result, the precision score, which indicates the fraction of detected objects that were actually objects in the image, increased. However, the recall score, which indicates the fraction of the objects present in the image that were actually detected by the detector, decreased. Therefore, there is a trade-off when choosing the confidence threshold. For our application, we chose the threshold of zero for storing in the database, but allowed users to input the threshold themselves while searching for

the objects.

8.1.3 Database Update

After the entire backend processing is done, the database is populated with the information about the

detections and coresets. Figure 8-3 shows a part of the augmented "regions" table in the database with

the detected regions.

Figure 8-3: A portion of the "regions" table in the database, showing its columns (id, frame, x1, x2, y1, y2, features, label_version, label_id, scene_id, confidence) and example rows.

Description of the data

Let's look at a specific row (id = 23284) in the database. The first field, id, is the primary key of the row. The field "frame" (44) refers to the frame number in the video, and the "scene_id" (378) field in the second-to-last column is the id of the video. This field links to the table of information about the uploaded video. x1, y1, x2, and y2 refer to the coordinates of the detected object. The "confidence" field is self-explanatory, and the label id refers to the unique id of the object category.

This row corresponds to the "guitar" object from image 1.

8.1.4 Retrieval

Now that the database is populated, search queries can be performed, knowing exactly what results to

expect. We searched for "guitar" and "person" with all retrieval types, i.e. database retrieval, uniform sampling retrieval, and preferential sampling retrieval. We used 0.0 as the confidence threshold for "guitar" and 1.0 for "person", with skip-frames 15 for both. We expected one image to be retrieved for "guitar" (for all retrieval types), and 4 different images retrieved for "person". For "person", we expected random ordering for the coreset tree-based retrievals and ordered results for database retrieval. We also expected more results for "person" with the coreset tree-based retrievals.

Results

Figure 8-4 a) shows the image with the guitar retrieved, when "guitar" was searched, and figure b) shows the images when "person" was searched. Figure 8-5 and Figure 8-6 show the results for the same query with uniform sampling, and preferential sampling respectively.

Figure 8-4: Database retrieval of two queries: "guitar" (a) and "person" (b).

Figure 8-5: Uniform sampling retrieval of the queries "guitar" (a) and "person" (b).

Figure 8-6: Preferential coreset sampling retrieval results for the queries "guitar" (a) and "person" (b).

Observations/Analyses

As expected, for "guitar", one image was retrieved, with the detection and confidence (0.7) we expected, for all retrieval methods.

The query "person" also showed the expected results. The database retrieval has ordered frames (as in the database), while the two other retrievals have unordered frames, with more results than the database retrieval. However, we cannot yet see a difference in results between the preferential retrieval and the uniform sampling results. We will see differences in the further experiments.

8.2 Timing Experiments

We did three kinds of timing experiments, all on videos taken in natural environments. The first experiment used three videos filmed in our daily environment. The second experiment tested the timings of the system when increasing the size (length) of the videos, keeping the resolution of each video constant. The third experiment aimed to characterize timing when increasing the resolution of the video, while keeping the length of the video constant.

Metrics for Timing Experiments

We used two timing metrics for all the timing experiments, to analyze each component of the system as well as the entire end-to-end system. The first metric is time (in seconds), and the second metric is time per frame (also in seconds) calculated by dividing absolute time by the number of frames in the video.

Absolute time is useful for gauging how long the system takes on different kinds of videos, while time taken per frame lets us intuitively compare timing between videos that would be non-intuitive to compare with total time alone. For example, a small video might take a much shorter time to complete processing than a larger video (seconds vs. hours), but with the time-per-frame metric, we get comparable units. However, notice that this metric is not the average time taken for each key frame, but rather a normalized metric which indicates the time taken per frame of the original video. This may be misleading for the object detection module, where the detections are performed on key frames, not the original frames. The average time taken on a key frame would be ~10 times more than this metric, because there are 9 key frames chosen per 100 original frames.


8.2.1 Timing for the Videos Taken in Natural Environments

For this experiment, we took three videos of various sizes and qualities filmed in a natural environment.

We recorded the time taken by each component of the system for each of the videos.

8.2.1.1 Data

Video1 was a short HD-quality video taken inside a house. This video spanned 3 minutes and 21 seconds, with a frame rate of approximately 30 fps, totaling approximately 6000 frames. The resolution was 1920 x 1080, and the video was approximately 431 MB. This video was recorded using a Samsung

Galaxy Note 4.

Video2 was a medium-length, medium-quality video, which contained segments in a house environment, an outdoor environment (moving from the house to the office), and an office environment.

It spanned approximately 10 minutes, with a frame rate of approximately 30 fps, totaling approximately

18000 frames. This video was recorded using a Samsung Galaxy Note 4, but was down-sampled to the resolution of 1280 x 720. The size of the video was approximately 1 GB.

Video3 was a long, medium-quality video taken on a casual walk around the MIT campus, containing scenes inside different buildings and outside on the streets. It spanned approximately half an hour and contained approximately 60,000 frames. This video was recorded using a GoPro Hero 4 camera, had a resolution of 1280 x 720, and was approximately 2 GB.

8.2.1.2 Methodology

We provided the three videos as input to our system and logged the times taken by the following system components - video upload, coreset creation, object detection, database insertion, and end-to-end processing. For coreset creation, the leaf size parameter was 100 for all of these videos.

8.2.1.3 Results

Table 8-4 shows the absolute time (in secs) and time taken per frame for various kinds of videos and various components.

Table 8-4: Time taken by various components of the system for the three videos. For each video and each component, the absolute time taken (in seconds) and the time taken per frame (in seconds) are shown. Video1: short, 1080p, 431 MB, 6000 frames; Video2: medium, 720p, 1 GB, 18000 frames; Video3: large, 720p, 2 GB, 60000 frames.

Component          | Video1 total (s) | Video1 per frame (s) | Video2 total (s) | Video2 per frame (s) | Video3 total (s) | Video3 per frame (s)
Video upload       | 0.81   | 0.00014  | 1.75   | 9.7222E-05 | 4.48   | 7.46667E-05
Coreset creation   | 10220  | 1.70333  | 24602  | 1.36677778 | 109439 | 1.823983333
Object detection   | 38800  | 6.46667  | 33200  | 1.84444444 | 102000 | 1.7
Database insertion | 1350   | 0.225    | 2415   | 0.13416667 | 5240   | 0.087333333
End-to-end         | 50269  | 8.37817  | 60499  | 3.36105556 | 218346 | 3.6391

8.2.1.4 Discussion

As we can see in Table 8-4, the background processing of the videos takes a long time. Video upload and database insertion are done in a reasonable amount of time (the video upload took about 4 seconds for the largest video, while DB insertion generally took less than 1 second per key frame). On the other hand, the coreset creation and object detection are much slower. For detection, we see that the time taken per frame can be up to ~7 seconds, which seems to be the biggest bottleneck in the system. As mentioned previously, this means that detection on one key frame takes about 7 x 10 = 70 seconds, because the detections are only done on key frames, while the time-per-frame metric divides by the total number of frames of the video. However, the object detection system is an external, plugged-in tool, whose time can decrease with faster detectors. Similarly, given that the detections were done only on the key frames, this system is an order of magnitude faster than running detection on the original video.

We notice some interesting things. While the absolute times for each component are generally increasing with the size of the video, time per frame is not. Time per frame for object detection is the highest in the smallest video, because the resolution was the highest. We believe that it was because the object detection algorithm had to scan more pixels for finding the region proposals for potential objects.

Also, there were more region proposals per frame because the frame size was larger compared to the other videos, and therefore there were more detections. This increased the detection time and the database insertion time. In addition, these timings may change depending on the general CPU load. Because frames with larger resolution require more processing for object detection, this may also affect the insertion times.

Thus, we noticed different trends according to size and quality. Further experiments examine the effect of video size (length) and video quality on timing of the system.

8.2.2 Timing vs. Video Length

This experiment was run to measure how time taken by the system increases with the length of the

video, keeping the video quality constant.

8.2.2.1 Data and Methodology

In this experiment, we took 4 different videos whose resolution was fixed to be 1280 x 720. The other

quality measures such as fps (frames per second), bitrate, and so on were fixed as well. All videos were taken with a Samsung Galaxy Note 4 device, and down-sampled to 1280 x 720 with the same software.

The videos were of length 10 seconds, 1 minute, 5 minutes, and video2 above, which was ~10 minutes.

The videos of different lengths were run through the system and the end-to-end times were recorded.

8.2.2.2 Results and Discussion

Figure 8-7: Line graph showing the trend of processing time vs. the length of the video for 4 different videos. The time taken increases with the increasing length of the video.

Figure 8-8: Time taken per frame with increasing length of the videos but the same quality. The figure shows that the time per frame is very similar (around 3 seconds) for all videos, implying that time per frame does not increase if the quality is the same.

Figure 8-7 shows the trend of total end-to-end processing time in seconds (y-axis) with increasing length of the videos (x-axis). It shows, not surprisingly, that the end-to-end time increases linearly with the length of the video. Figure 8-8 shows the same graph, but the y-axis is time taken per frame. Here, we can see that all four videos have an end-to-end processing time per frame of ~3 seconds. This means that if the quality of the videos is the same, each frame takes a similar amount of processing time. We can see a slight increase in this graph because of overheads such as the merging of coreset nodes, which increases exponentially (not linearly) with the increasing number of leaves of the coreset tree. However, the overhead is too small to be significantly visible.

8.2.3 Timing vs. Video Quality

This experiment shows the trend of time with the increasing video quality. We keep the length of the video constant.

8.2.3.1 Data and Methodology

For this experiment, we took Video1 mentioned above. It is a video of length 3 minutes and 21 seconds, originally of resolution 1920x1080 and 30 fps. We created three more videos by down-sampling this video to each of the following three resolutions: 1280x720, 800x600, and 480x320. We then ran each of those videos through our system and recorded the end-to-end processing time.

8.2.3.2 Results and Discussion

Figure 8-9: End-to-end time with increasing video resolution but fixed length.

The trend in Figure 8-9 is greater than linear, which makes sense. When we go to a higher resolution, the number of pixels that the system has to process doesn't increase linearly but quadratically. Therefore, the processing time increases accordingly as well.

8.3 Retrieval Experiments

Retrieval experiments were conducted on two kinds of data: 1) data produced by processing the videos described in Sections 8.1 and 8.2, and 2) synthetically produced data. Here, synthetic data refers to the database and the coreset trees created with MATLAB scripts that generate synthetic frames and synthetic detections. The synthetic data was created in order to be able to generate data worth weeks or months of video, which is impractical with real videos. Also, in the real pipeline, we don't explicitly know the importance score of the frames, because the algorithm internally calculates the blur and the quality of the images, and it is hard to tell the quality apart with our eyes. With synthetic data, we can assign the importance score to each frame and verify whether more important frames are being sampled during the retrieval.
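A sketch of the idea behind this synthetic data generation is shown below; our actual scripts are in MATLAB, and the names, value ranges, and distributions here are illustrative only.

import random

def make_synthetic_detections(num_frames, labels, detections_per_frame=2, seed=0):
    # Assign each synthetic frame a known importance score, so that the retrieval
    # order can later be checked against this ground truth.
    rng = random.Random(seed)
    rows = []
    for frame in range(num_frames):
        importance = rng.random()                    # ground-truth importance of the frame
        for _ in range(detections_per_frame):
            rows.append({
                "frame": frame,
                "label": rng.choice(labels),
                "box": (rng.randint(0, 600), rng.randint(0, 400), 640, 480),
                "confidence": rng.uniform(-4.0, 4.0),  # RCNN-like score range
                "importance": importance,
            })
    return rows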

In these experiments, we compare preferential sampling coreset retrieval with two other alternatives: 1) database retrieval, and 2) retrieval with uniform sampling rather than with the sampling algorithm used for coreset retrieval.

8.3.1 Retrieval Experiments on Real (Non-Synthetic) Data

These experiments are based on the database created as a result of running all the different videos mentioned above in Sections 8.1 and 8.2, including the small synthetic video of 8 seconds. There are ~70,000 rows in the regions table, accumulated as a result of processing all the aforementioned videos. The regions table is the biggest table in the database, and it is where all the retrievals primarily come from.

8.3.1.1 Metrics for Comparison

The primary metric for comparing retrieval methods on the real data is the percentage of total relevant frames with detections (images) retrieved as time increases, until all relevant frames are retrieved.

This retrieval time is the time taken by the server, starting when it receives the query from the client and ending when it returns a set of results to the client. For database retrieval, all the detections are retrieved at once, hence we look at the total time taken. In the server, we record the number of images retrieved and the time taken to retrieve those images for every search request. Because the coreset retrieval and uniform-sampling retrieval receive multiple client requests at regular time intervals for the same search, the number of detections retrieved and the time taken for that retrieval are logged multiple times for one search, until all detections are returned. Here, we compare the percentage of total detections returned at each interval.

Ideally, we would weight the frames returned by their importance score, but because it is difficult to extract an importance score from the real frames, as mentioned above, we use that metric only for the experiments on the synthetically created data, described in Section 8.3.2.
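A minimal sketch of how such a percent-retrieved curve can be computed from the server-side logs is shown below; the log format here (one elapsed-time/cumulative-count pair per poll) is an assumption for illustration, not the actual logging schema of the system.

```python
# Hypothetical per-search log: (elapsed seconds, cumulative images retrieved),
# recorded each time the server answers a poll for the same search.
log = [(0.5, 12), (1.0, 25), (1.5, 33), (2.0, 41), (2.5, 47)]
total_relevant = 47  # total frames with detections for this query

# Percentage of all relevant frames retrieved at each logged time point.
curve = [(t, 100.0 * n / total_relevant) for t, n in log]
for t, pct in curve:
    print(f"{t:4.1f} s: {pct:5.1f}% of relevant frames retrieved")
```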

8.3.1.2 Methodology

The experiments were done on the search query "car". The "skip-frames" parameter was 15, which means the retrieved results were (mostly) more than 15 frames apart from each other. Mostly, because as discussed in Section 7.2.1, it is possible for the sampling algorithms to return results that are within 15 frames of each other in the video. The confidence threshold was 1.0. The retrieval was done on the largest video file (~35 minutes and ~60,000 frames), i.e. video 3 above. Then, during the retrieval, as described in the metrics section above, the percentage of retrieved images is recorded at discrete time intervals for each of the retrieval systems.
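As a concrete illustration of the "skip-frames" constraint, the check below looks at how far apart the returned frame indices are; the indices here are made up, and, as noted above, the sampling algorithms can occasionally violate the spacing.

```python
SKIP_FRAMES = 15  # minimum desired spacing between returned frames

# Hypothetical frame indices returned for the query "car", in retrieval order.
returned = [102, 260, 277, 430, 431, 598]

ordered = sorted(returned)
gaps = [b - a for a, b in zip(ordered, ordered[1:])]
violations = [g for g in gaps if g <= SKIP_FRAMES]
print(f"gaps between consecutive returned frames: {gaps}")
print(f"{len(violations)} pair(s) not more than {SKIP_FRAMES} frames apart")
```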

8.3.1.3 Results

Figure 8-10: Result from the retrieval experiment on real data when querying "car" in a large video taken in a natural environment. The x-axis is increasing time, while the y-axis is the portion of frames retrieved from the server. The red star is for DB retrieval, the red line is for uniform sampling, and the blue line is for coreset sampling. We see that DB retrieval returns all results at a certain point in time (11 seconds), while the uniform and coreset sampling systems keep retrieving results in an incremental fashion. Uniform sampling retrieves more results at any given point than coreset sampling. It seems that preferential coreset sampling is the worst of all, but once we take into account the importance of the frames, we will see its advantage (discussed later in Section 8.3.2).

Figure 8-10 shows the results of the experiment. The x-axis is time, with discrete time intervals, and the y-axis is the percentage of frames with detections retrieved at that time. Here, the '*' denotes the point for the database retrieval, because it returns all the images at a certain point in time, in this case 11 seconds. The red line is uniform sampling, and the blue line is coreset sampling.

8.3.1.4 Discussion

The figure shows that database retrieval beats the other two in time to retrieve 100% of the results. However, it takes multiple seconds (~11 seconds) to retrieve those results, during which time the user has to wait patiently. As previously mentioned, the tolerable wait time for web users peaks around 2 seconds. Therefore, database retrieval may not be the best idea. Both of the other retrieval methods start retrieving results within 2 seconds.

Between uniform sampling and preferential coreset sampling, we see that fewer results are retrieved at each time interval with coreset sampling, and the time taken to retrieve all the results is higher as well. This is because the importance of the frames is only implicit in the real data, so we do not see the advantage of preferential sampling here. In the following sections, we synthetically assign importance to the frames, and we see different results.

8.3.2 Retrieval Experiments on Synthetic Data

We carried out similar experiments for synthetic data as for the real data, but the metric was a bit different. After we queried for an object, we looked at the importance of frames retrieved as time goes on, rather than the absolute number of frames retrieved.

8.3.2.1 Synthetic Data Creation

To create synthetic data, we started out with a finite number (5) of synthetically created images (created with MATLAB), each assigned a distinct importance score uniformly in the range (0,1]. The assignments were: red (0.2), yellow (0.4), green (0.6), blue (0.8), and purple (1.0). We used these images to create a video of any length by taking the number of frames as an input parameter. Each segment consists of one image that spans multiple frames, and a different image is used for the next segment. Segment length is determined by sampling from a geometric distribution with parameter p = 0.025, so that on average the segment length is around 40, i.e. 1/p (the expectation of the geometric distribution). We sampled from a geometric distribution to make the video closer in nature to real videos, where the segment lengths are not uniform. There are 4 object categories, i.e. the first synthetic object, the second synthetic object, the third synthetic object, and the fourth synthetic object.

For each image, we create two synthetic detections of these objects at random regions and put the information into the database. This way, we can create large datasets, any kind of segmentation, and any kind of importance function. In the first set of retrieval experiments, we used real videos captured with cameras, so we were limited in size and could not control parameters of the video such as the segments, the images in the video, and the importance measure of the frames. Making synthetic videos helps us really understand the importance of the retrieval algorithm proposed in this thesis.
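A minimal sketch of this generation procedure is given below, using Python in place of the actual MATLAB scripts and database code. The importance scores and the geometric parameter follow the description above; the detection regions and the choice to attach two detections to every frame are illustrative simplifications.

```python
import random
import numpy as np

# Five synthetic images with their assigned importance scores.
IMAGES = {"red": 0.2, "yellow": 0.4, "green": 0.6, "blue": 0.8, "purple": 1.0}
P = 0.025  # geometric parameter; mean segment length = 1/P = 40 frames
CATEGORIES = ["first synthetic object", "second synthetic object",
              "third synthetic object", "fourth synthetic object"]

def generate_synthetic_video(n_frames):
    """Yield one record per frame: (frame index, image, importance, detections)."""
    frame = 0
    while frame < n_frames:
        image, importance = random.choice(list(IMAGES.items()))
        segment_len = int(np.random.geometric(P))  # ~40 frames on average
        for _ in range(min(segment_len, n_frames - frame)):
            # Two synthetic detections at random regions (x, y, width, height).
            detections = [
                (random.choice(CATEGORIES),
                 (random.randint(0, 600), random.randint(0, 400), 100, 100))
                for _ in range(2)
            ]
            yield frame, image, importance, detections
            frame += 1

# Example: a 10,000-frame synthetic video, as in the experiments below.
records = list(generate_synthetic_video(10_000))
```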

8.3.2.2 Metrics

The primary metric was the importance score of the frames retrieved. As new frames were retrieved, instead of considering the number of frames retrieved, we considered their importance. We cumulatively added the importance of the retrieved frames, so that as more frames were retrieved, the absolute importance score of the retrieved results increased as well. We also recorded the time taken for the retrieval in a similar manner to the previous retrieval experiments on real videos.
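A short sketch of the cumulative-importance metric is given below; the retrieval log format is again an illustrative assumption, not the system's actual logging schema.

```python
# Hypothetical retrieval log: (elapsed seconds, importance of the frame retrieved),
# in the order the frames came back from the server.
log = [(0.4, 1.0), (0.9, 0.8), (1.3, 0.8), (1.8, 0.4), (2.2, 0.6)]

cumulative = []
total = 0.0
for t, importance in log:
    total += importance
    cumulative.append((t, total))

# `cumulative` holds the points plotted as "cumulative importance vs. time".
print(cumulative)
```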

8.3.2.3 Experiment Methodology

We synthetically created videos of three different sizes - 10,000 frames, 25,000 frames, and 100,000 frames according to the creation method described above in section 8.3.2.1. Then, we queried each of those videos for the "first synthetic object". In the server, in addition to the information logged for the retrieval experiments on real videos, we also logged the importance of each frame retrieved. This way, we would be able to see which one of the retrieval systems retrieved the more important frames earlier.

Recall that preferential retrieval is slower than the other methods. Therefore, we normalize the time axis from 0 to 1. This is because if we look at absolute time, the other retrieval systems are expected to retrieve more results, and hence the sheer number will outweigh the importance of frames retrieved per unit time. For example, if system A retrieves 4 frames of 0.4 importance each in the first 1 ms, and system B retrieves 4 frames of 0.8 importance each in the first 4 ms, we consider system B more useful because, although it takes a bit more time, it returns more important frames earlier in the search. Therefore, we normalize both the time axis (x-axis) and the cumulative importance (y-axis) in the results. We also look at the non-normalized results for comparison.
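The normalization itself is simple; the sketch below rescales both axes of a cumulative-importance curve to [0, 1] so that retrieval systems with very different total runtimes can be compared on the same plot (the curve values are illustrative).

```python
# A cumulative-importance curve as (time in seconds, cumulative importance) points.
curve = [(0.4, 1.0), (0.9, 1.8), (1.3, 2.6), (1.8, 3.0), (2.2, 3.6)]

t_max = curve[-1][0]    # total retrieval time for this system
imp_max = curve[-1][1]  # total importance of everything it retrieved

# Rescale both axes to [0, 1] so curves from different systems are comparable.
normalized = [(t / t_max, imp / imp_max) for t, imp in curve]
print(normalized)
```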

8.3.2.4 Results

The following figures, Figure 8-11, Figure 8-12, and Figure 8-13, each show normalized results (left) and non-normalized results (right) for the videos with 10K, 25K, and 100K frames respectively. The x-axis is the time, and the y-axis is the cumulative importance as the frames are retrieved. Table 8-5 shows the absolute times taken by the three retrieval types on these videos, on the query "first synthetic object."

Figure 8-11: Plots of cumulative importance of frames vs. time for a video with 10K frames. On the left (a) are the normalized axes, showing the fraction of importance against the fraction of time; this allows us to see which fraction of the time retrieved more important results. We can see that the (preferential) coreset graph is higher than the uniform graph over most of the earlier portion of the time, showing that more important frames are retrieved earlier in the results for preferential retrieval. On the right (b) are the non-normalized results.

Figure 8-12: Plots of cumulative importance of frames vs. time for a video with 25K frames (a: normalized, b: non-normalized). Here, the uniform and coreset curves intersect at around half the time, showing that uniform sampling retrieved a larger fraction of the important results than preferential coreset retrieval in the later half of the results.

Figure 8-13: Plots of cumulative importance of frames vs. time for a video with 100K frames (a: normalized, b: non-normalized). With such a long video, even though the graph looks the same, half of the retrieval time is about a minute. It is important that the user gets most of the important frames within a minute, because that is already well beyond the tolerable wait time described in [3].

Table 8-5: Absolute time taken to retrieve all results for the query "first synthetic object" on 3 videos, for the three different retrieval systems.

                     DB Retrieval    Uniform Coreset            Preferential Coreset
                     Time (s)        Retrieval Total Time (s)   Retrieval Total Time (s)
10K Frames Video     3.9             21.6                       46.8
25K Frames Video     4.27            26.38                      63.74
100K Frames Video    16.17           107.66                     408.70

8.3.2.5 Analyses

In all the non-normalized results (right side of each figure), we can see that preferential coreset retrieval has less cumulative importance at each point. However, these plots show no information about the nature of the frames retrieved earlier in time vs. later in time. It is important to consider that, because even though frames may be retrieved in less time, the user may have to sift through hundreds of results to find the important frames, whereas it is easier for the user if the preferred frames are shown earlier in the results.

The normalized plots on the left of each figure show that, at least for the first half of the time, preferential retrieval returns more important frames, and uniform retrieval starts to catch up on cumulative importance in the later part of the time. In general, this shows that the results retrieved by preferential sampling are ordered roughly from more important frames to less important frames.

The importance of such ordering is more evident on large videos. While on the smaller videos (10K and 25K frames) the database might return all results "fast enough" (~4 seconds), on the larger video of 100K frames it took ~16 seconds. Because this happens with no feedback, it is well beyond the tolerable wait time of the user, as shown in [3]. In addition, the user has to look at all the frames to find the important ones, because there is no ordering by importance; the same is true for uniform sampling. In Figure 8-13 a), we can see that during at least the first half of the time, each point of the preferential retrieval curve is above the uniform sampling curve. The first half in this large video is about a minute, and it is important that users get most of the important frames by that time. The figure shows that the overall importance of the retrieved results per unit time was higher than that of uniform sampling for about a minute. This is important for users such as Matt, introduced in Chapter 1, because he would get the preferred frames of his guitar performance earlier on, without needing to sift through the results to find them.

8.3.2.6 Web UI results comparison for three retrieval types

Figure 8-14: Portion of results returned from the synthetic image database with the three retrieval types: a) preferential coreset sampling retrieval results, b) DB retrieval results, c) uniform coreset sampling retrieval results. The images in ascending order of importance were: red, yellow, green, blue, purple. In a), we can see more purple and blue frames, which are the most and second-most important frames respectively; fewer red and yellow images are retrieved. The DB in this case (b) happened to have more important frames earlier in the database. However, in uniform sampling (c), we can see all kinds of frames with no particular order or preference.

We can also see the important frames being returned visually in the web UI. The importance numbers were unique to the images so that the differences would be easier to see visually. The importance numbers were: 0.2 (red), 0.4 (yellow), 0.6 (green), 0.8 (blue), and 1.0 (purple). Figure 8-14 shows the UI retrieval results for "first synthetic object" in one of the synthetic videos, for all three retrieval methods. We can see that the preferential sampling algorithm retrieves a lot of purple and blue frames in the beginning, which are the two most important frames. Yellow and red frames, which are the least important, appear the least in the results. However, we do not see only purple and blue frames, because preferential coreset sampling cares not just about importance but also about the distinctness of the images retrieved. For DB retrieval, this instance of the database happened to have frames with higher importance earlier in the database, therefore we do not see many red or yellow frames. However, we can see that the DB results contain less important frames overall than the preferential coreset retrieval results. With the uniform sampling results, the visualization is much clearer: the uniform sampling results contain an almost equal number of each kind of frame and show no particular ordering. Thus, we can visually see the significance and advantage of the preferential sampling retrieval system.

8.4 Conclusion

In summary, we showed that the objects retrieval system allows users to search for objects in videos in various ways, and we showed the timings for each of the system modules. We discussed the pros and cons of the various retrieval systems. Finally, we demonstrated that the preferential retrieval system can be useful in scenarios where important frames need to be retrieved from large videos. Our hope is that, using the preferential object retrieval system, users such as Matt can find the preferred guitar frames early on in the results, so they can find what they are looking for as efficiently as possible.

Chapter 9 Future Extensions

There are many improvements that could be made to the current system. This chapter discusses them for the overall system as well as for each of its modules.

9.1 Overall System

- The system is completely independent of its parent system, iDiary. It needs to be incorporated into iDiary.

- Some modules of the system were slow, as we saw in the experiments, and measures can be taken to make them faster. For example, the slowest module is object detection: we saw that object detection amortizes to ~3 seconds per original video frame for a 720p video. Because we ran detections on data about 10 times smaller than the video, each frame that is actually processed takes ~30 seconds at that resolution. In the future, a faster detector could be added, or other measures could be taken to make it faster. Because the focus of this thesis was not to make the detection faster, this is left for the future.

9.2 UI

- There should be a way to see the uploaded and processed videos in the UI, and even better, the status of the processing of a video at any given moment, including how much time has passed and approximately how much remains.

- The returned results would be better if they were video clips rather than frames. Another option would be to play the original videos and skip to the parts with the desired frames.

9.3 Coreset Tree Creation

- For creating a coreset tree, there are certain default parameters to input, such as leaf size and number of segments. Right now, they are constants independent of the video. However, it would be better if these parameters, and all other relevant parameters, were decided automatically based on the input video. For instance, a video with a lot of long still scenes could use a larger leaf size than a video whose frames change faster.

9.4 Object Detection

- Our chosen object detector, RCNN, is too slow. We can apply measures to make it faster, or choose a faster detector, e.g. Fast R-CNN [41].

- Right now, we have implemented the system for 200 object categories. However, we could test with models that cover more object categories, such as LSDA [42], which has 7.5K categories. We have partially incorporated it into our system, but it needs more time for complete integration.

9.5 Retrieval

- The retrieval could benefit from low-level implementation improvements. Currently, for the coreset retrieval, each request spawns a thread in the server which updates results in a server-side variable. This was done as a "quick-and-dirty" trick to keep updating the results in the server and show the client results in an incremental way. However, the recommended way is to either store the information in the database or use some other form of storage. If possible, an alternative should also be explored for passing incremental results from the server to the client, instead of the polling method, which uses multiple HTTP requests and responses for the same original search request. (A minimal sketch of the current thread-plus-polling pattern is given below.)

REFERENCES

[1] D. Feldman, A. Sugaya, C. Sung and D. Rus, "iDiary: From GPS Signals to a Text-Searchable Diary," Cambridge, 2013.

[2] A. Sugaya, "iDiary: Compression, Analysis, and Visualization of GPS Data to Predict User Activities," Massachusetts Institute of Technology, Cambridge, 2012.

[3] D. F. Galletta, R. Henry, S. McCoy and P. Polak, "Web Site Delays: How Tolerant are Users?," Journal of the Association for Information Systems, vol. 5, no. 1, pp. 1-28, 2004.

[4] G. Rosman, M. Volkov, D. Feldman, J. W. Fisher III and D. Rus, "Coresets for k-Segmentation of Streaming Data," in Neural Information Processing Systems (NIPS), 2014.

[5] M. Volkov, G. Rosman, D. Feldman, J. W. Fisher III and D. Rus, "Coresets for visual summarization with applications to loop closure," in IEEE International Conference on Robotics and Automation (ICRA), Seattle, May 2015.

[6] J. Machajdik, A. Hanbury, A. Garz and R. Sablatnig, "Affective Computing for Wearable Diary and Lifelogging Systems: An Overview," in Machine Vision - Research for High Quality Processes and Products - 35th Workshop of the Austrian Association for Pattern Recognition, Graz, 2011.

[7] V. Bush, "As We May Think," The Atlantic Monthly, vol. 176, no. 1, pp. 101-108, 1945.

[8] G. Bell, "A Personal Digital Store," Communications of the ACM, vol. 44, pp. 86-91, 2001.

[9] D. Byrne, L. Kelly and G. J. Jones, "Multiple multimodal mobile devices: Lessons learned from engineering lifelog solutions," in Handbook of Research on Mobile Software Engineering: Design, Implementation and Emergent Applications, Engineering Science Reference (an imprint of IGI Global), 2012, pp. 706-724.

[10] M. Lindström, A. Ståhl, P. Sundström, K. Höök, J. Laaksolahti, M. Combetto, A. Taylor and R. Bresin, "Affective Diary - Designing for Bodily Expressiveness and Self-Reflection," in CHI '06 Extended Abstracts on Human Factors in Computing Systems, pp. 1037-1042, 2006.

[11] "Microsoft Research SenseCam," [Online]. Available: http://research.microsoft.com/en-us/um/cambridge/projects/sensecam/. [Accessed 27 August 2015].

[12] N. S. Pathkar, "Google Glass: Project Glass," International Journal of Application or Innovation in Engineering & Management (IJAIEM), vol. 3, no. 10, pp. 31-35, 2014.

[13] J. Gemmell, G. Bell and R. Lueder, "MyLifeBits: a personal database for everything," Communications of the ACM (CACM), vol. 49, no. 1, pp. 88-95, 2006.

[14] "Fitbit," [Online]. Available: https://www.fitbit.com/. [Accessed 27 August 2015].

[15] P. K. Agarwal, S. Har-Peled and K. R. Varadarajan, "Geometric Approximation via Coresets," in Combinatorial and Computational Geometry, vol. 52, MSRI Publications, 2005, pp. 1-30.

[16] D. Feldman and M. Langberg, "A Unified Framework for Approximating and Clustering Data," in STOC, 2010.

[17] R. Paul, D. Feldman, D. Rus and P. Newman, "Visual Precis Generation using Coresets," in ICRA, IEEE Press, 2014.

[18] R. Bellman, "On the approximation of curves by line segments using dynamic programming," Communications of the ACM, vol. 4, no. 6, p. 284, 1961.

[19] S. Guha, N. Koudas and K. Shim, "Approximation and streaming algorithms for histogram construction problems," ACM Transactions on Database Systems, vol. 31, no. 1, pp. 396-438, 2006.

[20] D. Feldman, C. Sung and D. Rus, "The single pixel GPS: learning big data signals from tiny coresets," in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, 2012.

[21] W. Churchill and P. Newman, "Continually Improving Large Scale Long Term Visual Navigation of a Vehicle in Dynamic Urban Environments," in 15th International IEEE Conference on Intelligent Transportation Systems, Alaska, 2012.

[22] Z. Lu and K. Grauman, "Story-Driven Summarization for Egocentric Video," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

[23] S. Bandla and K. Grauman, "Active Learning of an Action Detector from Untrimmed Videos," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.

[24] D. Lin, S. Fidler, C. Kong and R. Urtasun, "Visual Semantic Search: Retrieving Videos via Complex Textual Queries," in IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[25] W. Hu, D. Xie, Z. Fu, W. Zeng and S. Maybank, "Semantic-Based Surveillance Video Retrieval," IEEE Transactions on Image Processing, vol. 16, no. 4, pp. 1168-1181, 2007.

[26] J. Sivic and A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos," in Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003), 2003.

[27] B. V. Patel and B. B. Meshram, "Content Based Video Retrieval," The International Journal of Multimedia & Its Applications (IJMA), vol. 4, no. 5, pp. 77-98, 2012.

[28] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," 3 December 2013. [Online]. Available: http://arxiv.org/abs/1311.2524. [Accessed 2 August 2015].

[29] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," in Proceedings of the ACM International Conference on Multimedia, New York, ACM, 2014, pp. 675-678.

[30] T. Gonzalez, "Clustering to minimize the maximum intercluster distance," Theoretical Computer Science, vol. 38, pp. 293-306, 1985.

[31] D. Hochbaum and D. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.

[32] K. Grauman and B. Leibe, Visual Object Recognition: Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2011.

[33] "ImageNet," [Online]. Available: http://www.image-net.org/. [Accessed 2 August 2015].

[34] "The PASCAL Visual Object Classes Homepage," [Online]. Available: http://host.robots.ox.ac.uk/pascal/VOC/. [Accessed 2 August 2015].

[35] J. Uijlings, K. van de Sande, T. Gevers and A. Smeulders, "Selective Search for Object Recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.

[36] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS, Lake Tahoe, 2012.

[37] "Django: The web framework for perfectionists with deadlines," [Online]. Available: https://www.djangoproject.com/. [Accessed 20 August 2015].

[38] "Bootstrap: The world's most popular mobile-first and responsive front-end framework," [Online]. Available: http://getbootstrap.com/. [Accessed 2 August 2015].

[39] "GitHub," [Online]. Available: https://github.com/axelpale/minimal-django-file-upload-example. [Accessed 2 August 2015].

[40] "rbgirshick/rcnn - GitHub," [Online]. Available: https://github.com/rbgirshick/rcnn. [Accessed 2 August 2015].

[41] R. Girshick, "Fast R-CNN," arXiv preprint arXiv:1504.08083, 2015.

[42] J. Hoffman, S. Guadarrama, E. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell and K. Saenko, "LSDA: Large Scale Detection through Adaptation," in Neural Information Processing Systems (NIPS), 2014.

[43] "ImageNet Large Scale Visual Recognition Competition 2013 (ILSVRC2013)," [Online]. Available: http://www.image-net.org/challenges/LSVRC/2013/browse-det-synsets. [Accessed 2 August 2015].

[44] "The Psychology of Web Performance," Website Optimization, 30 May 2008. [Online]. Available: http://www.websiteoptimization.com/speed/tweak/psychology-web-performance/. [Accessed 23 August 2015].