Transfer Learning for Intelligent Systems in the Wild by Wei-Lun
Total Page:16
File Type:pdf, Size:1020Kb
Transfer Learning for Intelligent Systems in the Wild by Wei-Lun Chao A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) December 2018 Copyright 2018 Wei-Lun Chao Acknowledgments I would like to express my sincere appreciation to my fantastic mentors, colleagues, friends, and family for their generous contributions to the work presented in this thesis and toward com- pleting my Ph.D. degree. Special mention goes to my enthusiastic adviser, Prof. Fei Sha. My Ph.D. has been an amazing five-year journey, and I thank Fei for his tremendous academic guidance and support, giving me many wonderful opportunities to explore and grow. As a mentor, Fei is knowledgeable and inspiring, always motivating me to think out of the box and challenge myself to the limit. As a research scientist, Fei has an extremely high standard and never stop pursuing impactful work, from which I can always derive “gradients” to improve. It was my great honor to work with him and follow his career path to become a faculty, and I am especially thankful for the excellent support and suggestions he has provided during my job hunting season. Similarly, profound gratitude goes to my long-term collaborators and mentors Prof. Kristen Grauman and Dr. Boqing Gong. Kristen is a deep thinker, who can always point out key aspects to raise our work to the next level. She helped me set up a productive and solid path of research in my first half of Ph.D., and was enthusiastically supportive when I was on the job market. Boqing is my very best collaborator, friend, and teacher. It is he who step-by-step guided me on every aspect of doing research. I will never forget those paper deadlines we faced together and how we used to minimize the overlap of sleeping time to maintain progress. I am greatly inspired and motivated by watching him from being a hardworking senior Ph.D. to a productive young faculty. Like Fei and Kristen, Boqing offered tremendous support throughout my job search. Thank you. I thank Prof. Winston Hsu and Prof. Jian-Jiun Ding, who led me into research and prepared me for the program when I was a M.S. student in NTU, Taiwan. My Ph.D. journey would not have started without them. I would like to thank my dissertation defense committee members, Prof. Laurent Itti, Ja- son Lee, Panayiotis Georgiou, and Joseph Lim, for their interest, valuable time, and helpful comments. The same appreciation goes to my proposal committee members, Prof. Antonio Or- tega, Haipeng Luo, and Meisam Razaviyayn. Especially Haipeng, Meisam, and Joseph set up mock interviews and job talks for me, greatly improving my preparation for job interviews. I also want to thank Lizsl De Leon and Jennifer Gerson for their professional services and excellent consultations at the CS department and Viterbi school to resolve many issues during my Ph.D. My sincere gratitude also goes to Prof. Justin Solomon and Dominik Michels, and Dr. Hoifung Poon, Kris Quirk, and Xiaodong He, who provided me opportunities to collaborate on different projects, greatly expanding my research horizon and interest. The experience and skills I learned from them definitely facilitated my thesis work. I would specifically thank Justin for his help throughout my job search. During my Ph.D., I was so fortunate to join the TEDS lab and have many talented and fun lab mates to work and live with. Kuan and Ali brought me many precious memories at my first half ii of Ph.D.; the same appreciation goes to Dong, Yuan, Franziska, and Wenzhe. Thank Zhiyun for bringing me a great number of enjoyable moments at the lab, and thank Chao-Kai for many inspiring theoretical discussions. I also enjoyed chatting about lives with both of them. I want to give special thanks to Beer, Hexing (Frank), and Ke for their hardworking while we collaborated on zero-shot learning, visual question answering, and video summarization, respectively. Beer always works hard, thinks hard, plays hard, and sleeps hard. He cares a lot about people and everything but his daily routine. Frank has endless energy and passion for research and acquiring knowledge, and Ke is always humble and thoughtful. It was a tremendously productive period when working with them. Besides, I was so grateful to meet many new members at my final year, including Seb, Shariq, Aaron, Liyu, Ivy, Bowen, Chin-Cheng (Jeremy), Yuri, Yiming, Melissa, Han-Jia, and Marc. I learned so much from them, not only about research but about cultures and lifestyles. I also thank them for giving me opportunities to practice being a mentor. It was my great pleasure to discuss and work with so many gifted colleagues on diverse research problems, and thank all of them for helping me improve my job talks and defense presentation. Finally, but by no means least, thanks to my family for all their love and encouragement. This thesis is dedicated to them. My parents are always supportive of me and my younger sister Yung-Hsuan for every decision we made, especially for studying abroad. It is amazing that we both studied and obtained the Ph.D. degree at USC, and attended the commencement in 2018. My Ph.D. life was with many up and down moments, and I really thank their company to share my happiness and depression. In the end, I would like to thank my loving, encouraging, patient, and supportive wife Feng-Ju (Claire). We have been life partners for seven years since our M.S. studies in Taiwan and it is she who brings me love and joy, teaches me how to pursue and fulfill dreams and never giving up, and makes me the most fortunate person. Thank you. iii Table of Contents Acknowledgments ii List of Tables ix List of Figures xiii Abstract xviii I Background 1 1 Introduction 2 1.1 Machine learning for intelligent systems . .3 1.2 Challenges in the wild . .4 1.3 Transfer learning for intelligent systems . .5 1.4 Contributions and outline . .7 1.5 Published work . .7 1.5.1 Zero-shot learning . .7 1.5.2 Domain generalization for visual question answering . .7 1.5.3 Other work . .8 II Zero-shot Learning 9 2 Introduction to Zero-shot Learning 10 2.1 Definition . 11 2.1.1 Notations . 11 2.1.2 Problem formulations . 12 2.1.3 The framework of algorithms . 12 2.2 Challenges . 12 2.3 Contributions . 13 2.4 Outline of Part II . 13 3 Literature Survey on Zero-Shot Learning 15 3.1 Semantic representations . 15 3.1.1 Visual attributes . 15 iv 3.1.2 Vector representations (word vectors) of class names . 16 3.1.3 Textural descriptions of classes . 17 3.1.4 Hierarchical class taxonomies . 18 3.1.5 Other forms of semantic representations . 18 3.2 Algorithms for conventional ZSL . 19 3.2.1 Embedding-based methods . 19 3.2.2 Similarity-based methods . 22 3.2.3 Other approaches . 24 3.3 Algorithms for generalized ZSL . 24 3.4 Related tasks to zero-shot learning . 25 3.4.1 Transductive and semi-supervised zero-shot learning . 25 3.4.2 Zero-shot learning as the prior for active learning . 25 3.4.3 Few-shot learning . 25 4 Synthesize Classifier (SynC) for Zero-Shot Learning 26 4.1 Main idea . 26 4.2 Approach . 27 4.2.1 Notations . 27 4.2.2 Manifold learning with phantom classes . 27 4.2.3 Learning phantom classes . 28 4.3 Comparison to existing methods . 30 4.4 Hyper-parameter tuning: cross-validation (CV) strategies . 31 4.5 Empirical studies . 31 4.5.1 setup . 32 4.5.2 Implementation details . 33 4.5.3 Main results . 33 4.5.4 Large-scale zero-shot learning . 35 4.5.5 Detailed analysis . 35 4.5.6 Qualitative results . 38 4.6 Summary . 38 5 Generalized Zero-Shot Learning 40 5.1 Overview . 40 5.2 Generalized zero-shot learning . 41 5.2.1 Conventional and generalized zero-shot learning . 41 5.2.2 Classifiers . 42 5.2.3 Generalized ZSL is hard . 42 5.3 Approach for GZSL . 43 5.3.1 Calibrated stacking . 43 5.3.2 Area Under Seen-Unseen Accuracy Curve (AUSUC) . 44 5.3.3 Comparisons to alternative approaches . 45 5.4 Empirical studies . 45 5.4.1 Setup . 45 5.4.2 Hyper-parameter tuning strategies . 46 5.4.3 Which method to use to perform GZSL? . 47 v 5.4.4 Which zero-shot learning approach is more robust to GZSL? . 47 5.5 Summary . 49 6 From Zero-Shot Learning to Conventional Supervised Learning 50 6.1 Comparisons among different learning paradigms . 50 6.2 Empirical studies . 51 6.2.1 Overview . 51 6.2.2 Setup . 52 6.2.3 Results . 52 6.3 Summary . 54 7 Improving Semantic Representations by Predicting Visual Exemplars (EXEM) 55 7.1 Approach . 55 7.1.1 Learning to predict the visual exemplars from the semantic representations 56 7.1.2 Zero-shot learning based on the predicted visual exemplars . 56 7.1.2.1 Predicted exemplars as training data . 56 7.1.2.2 Predicted exemplars as the ideal semantic representations . 56 7.1.3 Comparison to related approaches . 57 7.2 Other details . 57 7.3 Empirical studies . 58 7.3.1 Setup . 58 7.3.2 Implementation details . 58 7.3.3 Predicted visual exemplars . 59 7.3.4 Results on the conventional setting .