
Advances in Generative Feature Learning

Zhilin Yang

CMU-LTI-19-010

Language Technologies Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213
www.lti.cs.cmu.edu

Thesis Committee:
Ruslan Salakhutdinov (Co-Chair), Carnegie Mellon University
William W. Cohen (Co-Chair), Carnegie Mellon University
Graham Neubig, Carnegie Mellon University
Jason Weston, Facebook AI Research

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2019 Zhilin Yang

Abstract

It is crucial to use unlabeled data to improve learning because unlabeled data is more accessible and more abundant than labeled data. There are two major learning paradigms towards this goal: the unsupervised pretraining method pretrains a language model on unlabeled data and performs finetuning on downstream tasks, while semi-supervised learning jointly optimizes loss functions on labeled and unlabeled data. In this thesis, we study the problem of generative feature learning, which aims to use generative modeling to leverage unlabeled data to boost target task performance. We propose novel methods to address critical challenges in generative feature learning.

In the context of unsupervised learning, Transformer-XL is a novel architecture that enables modeling longer-range dependency. Built upon it, XLNet employs a permutation language modeling objective to bridge the gap between language modeling and unsupervised pretraining, which opens the possibility of applying language modeling progress to improve downstream tasks. In the setting of semi-supervised learning, we theoretically answer two open questions of GAN-based semi-supervised learning, which provides a more fundamental understanding of why and how GANs work for semi-supervised learning. In addition, we pioneered research on generative-modeling-based semi-supervised learning for tasks with complex data structures, such as classification on graphs and question answering.

Empirically, our generative feature learning methods have played a leading role in the development of a variety of research topics. The proposed methods obtained state-of-the-art results on more than 30 benchmarks at the time of their publication, including natural language inference, question answering, text classification, language modeling, and semi-supervised learning, demonstrating significant practical value.

Acknowledgments

Having the opportunity to work with my advisors, Ruslan Salakhutdinov and William W. Cohen, is the luckiest thing that has ever happened to me. As a world-class expert in machine learning and deep learning, Russ has played an important role in my PhD research career. When I started my journey in deep learning, I did not have much experience or knowledge in the field. Russ guided me through the very first steps by pointing me to important papers and teaching me the core ideas of deep learning. As a result, I had the opportunity to work in the exciting area of developing deep learning methods to solve natural language understanding problems, and I was thus lucky enough to witness and even contribute to the rapid progress of the field. Moreover, Russ is very open-minded about what research topics to work on. Because of this, I was able to explore different research topics in my first year, such as image captioning, graph embedding learning, and transfer learning, and I benefited a lot from this diversity.
On one hand, it made me aware of the most challenging problems in each area, which essentially defined what the most important problems in machine learning are and gave me a vision of the future. On the other hand, different approaches in these areas inspired my later research once I focused on the framework of generative feature learning. Russ also taught me important lessons about research ideas, research methods, and paper writing.

William pays a lot of attention to details, both technical and editorial. He usually gives very detailed comments about wording and grammar, and carefully points out all the seemingly minor (but in fact very critical) issues about the organization of sections and the correct usage of mathematical symbols. He also has a very rigorous attitude towards technical details. This is a very important merit I learned from him, and I always try to ensure that every step in our papers and experiments is well motivated and logically solid. Our papers were seldom rejected by any conference, with an acceptance rate of over 90%, and I believe this is one of the most important reasons. Moreover, William made me believe that simplicity is among the most important things in research: we should never introduce unnecessary complexity or cosmetic math into our methods if it does not lead to practical improvement or theoretical insight. He also taught me important research methods such as ablation studies. These lessons are valuable and have paved the way for my research career.

Both Russ and William are very nice people to work with, and thus I was able to enjoy research as an important part of my life over the last four years without feeling stressed or discouraged. We celebrated every successful breakthrough and tackled the biggest challenges in the field together. They are also extremely intelligent and knowledgeable, and it has been my great pleasure and honor to work with two of the most wonderful minds in the world. In addition, they offered the most important advice for my career plans, for which I am very grateful.

I worked with Jason Weston at Facebook AI Research. I admire much of Jason's work because I believe it fundamentally shaped the field of NLP, with "NLP from scratch" and memory networks being the most prominent examples. I was therefore thrilled to have the opportunity to work with him. The most important thing I learned from Jason is the methodology of objective-driven research: think about an ultimate goal, such as solving dialog, identify the most critical challenges to reaching that goal, and then decouple the big challenges into smaller, reachable key results. I believe this leads to a good trade-off between the significance of ultimate goals and the technical feasibility of a single paper.

I was also honored to work with Quoc V. Le at Google Brain. Quoc is a strong believer in big data and large computation; in fact, his groundbreaking work on AutoML is the best example. The philosophy is that it is important to scale with big data and large computation so that we can focus on the most challenging problems, the ones that are not solvable even when we reach the limits of data and computation. I very much agree with this philosophy and have been deeply influenced by it. Another important lesson I learned from Quoc is optimism about the future of deep learning and AI: the golden era of AI has not passed, and the best is always yet to come.
Recent evidence seems to suggest that this might be true: just when we looked at the success of LSTMs and wondered whether there was any innovative work left to be done, the wave of Transformers swept through the entire field. Quoc also advocates for the generality of methods instead of being limited to specific applications. This is also an idea that I have been following in my research career.

I would like to thank Graham Neubig for being on my thesis committee. I very much appreciate his valuable advice on improving the quality of the thesis, which also led to deeper thinking about the relationship between our previous work and the future of the field.

Zihang Dai has so far been my most important coauthor, excluding my advisors and mentors. Together we created XLNet, Transformer-XL, complement GAN, and the high-rank language models, which I believe are among the best papers I have written so far. I am deeply impressed by his strong, full-stack research skills, including implementation, hyperparameter tuning, brainstorming about new ideas, paper writing, understanding and analyzing results, deriving equations and theorems, and visualization. Our collaboration has led to numerous interesting ideas and significant insights.

I would like to thank my best friends at CMU. I have been lucky to meet Zihang Dai, Jiateng Xie, Qizhe Xie, Dylan Du, Guokun Lai, and Jingzhou Liu. I don't think I need to say much here; you guys all understand. Edward Chen, Yutao Zhang, and Jeff Jie are valuable friends who give me guidance in the startup world. Their passion, leadership, and vision have greatly inspired me. I would like to thank Hanxiao Liu for giving me much useful advice, on both research and career plans; his opinions and experiences have been very inspiring. I would like to thank Ye Yuan for going to Chengdu Gourmet with me every week :), and of course for the great collaborations and discussions we had. I also had a lot of discussions and wrote a couple of papers with Bhuwan Dhingra. He often gives me new perspectives on a problem, and I really enjoyed working and brainstorming with him. Adams Yu has also been one of my good friends, and I have learned a lot from his experiences and insights. I would like to thank Peng Qi, Jake Zhao, and Saizheng Zhang for our great collaborations on HotpotQA and GLoMo, from which I have learned a lot. I am grateful for the suggestions and ideas Thang Luong gave me during my internship at Google, and it was my pleasure to work with him. I would also like to thank Guoqing Zheng, Fan Yang, Diyi Yang, Wei-Cheng Chang, Junjie Hu, Yuexin Wu, Yichong Xu, Simon Du, Han Zhao, Zhiting Hu, Pengtao Xie, Yao-Hung Tsai, Max Ma, Ruochen Xu, Pengcheng Yin, Xinyu Wang, Hector Liu, Shikun Zhang, Keyang Xu, Xingyu Lin, Di Wang, and many others for their support.