Topics at the Interface of Optimization and Statistics by Tu Nguyen a Dissertation Submitted to the Johns Hopkins University In

Topics at the interface of optimization and statistics by Tu Nguyen A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore, Maryland July, 2020 Abstract Optimization has been an important tool in statistics for a long time. For example, the problem of parameter estimation in a statistical model, either by maximizing a likelihood function [82, Chapter 8] or using least squares approach [46, Chapters 5-6], reduces to solving an optimization problem. Not only has optimization been utilized in solving traditional statistical problems, it also plays a crucial role in more recent areas such as statistical learning. In particular, in most statistical learning models, one learns the best parameters for the model through minimizing some cost function under certain constraints. In the past decade or so, there has been an increasing trend in going to reverse direction: Using statistics as a powerful tool in optimization. As learning algorithms become more efficient, researchers have focused on finding ways to apply learning models to improve the performance of existing optimization algorithms [91, 65, 11]. Following their footsteps, in this thesis, we study a recent algorithm for generating cutting planes in mixed integer linear programming problems and show how one can apply learning algorithms to improve the algorithm. In addition, we use the decision theory framework to evaluate whether the solution given by the sample average approximation, a commonly used method to solve stochastic programming problems, is \good". In particular, we show that the sample average solution is admissible for an uncertain linear objective over a fixed compact set and for a convex quadratic function with an uncertain linear term over box constraints when the dimension d ≤ 4: Finally, we combine tools from mixed integer programming and Bayesian statistics to solve the catalog matching problem in astronomy, which tries to associate an object's de- ii tections coming from independent catalogs. This problem has been studied by many researchers [26, 27, 88]. However, the most current algorithm to tackle the problem, as in [88], is only shown to work with 3 catalogs. In this thesis, we extend this algorithm to allow for matching across a higher number of catalogs. In addition, we introduce a new algorithm that is more efficient and scales much better with large number of catalogs. iii Primary Reader: Amitabh Basu Secondary Reader: Dan Naiman iv Acknowledgements In all honesty, writing the acknowledgement is one of the harder parts in writing this thesis. Expressing my gratitude to any of the following people in a couple of sentences is almost impossible, as there are not enough words to adequately express my appreciation for what they have done for me. It took me some time to ponder who I should put first. Is it the person who has helped me the most throughout my journey or should it be one that has influenced me most? Fortunately, there is a common answer to all those questions: my advisor, Amitabh Basu! So, thank you, Amitabh, for your endless support since the day we started working together. Thank you for your patience in guiding me along this path. As I did not have any formal background in optimization before coming to Hopkins, you really took your time to instruct me from the most basic concepts. Finally, thank you for being an outstanding role model. Your curiosity and your desire to learn new subjects that are outside of your area of expertise encourage me to always broaden my knowledge. I would like to thank my qualifying exams committee members and dissertation defense panels for their suggestions and feedbacks. In particular, thank you, Dan, for your formal introduction to the field of statistics through your class and for asking me insightful questions during my GBO exam. Your questions made me aware of the holes I have in my knowledge. To Tamas, thank you for being such a collaborator. Working with you in the catalog matching project not only helps me solidifying my programming skills but also teaches me a lesson that the presentation of the work is just as important as the work itself. At Hopkins, I have also had the opportunity to learn from many amazing faculty members. I want to especially thank faculty members John Miller, James Fill, Donniell Fishkind, v Avanti Athreya, Steve Hanke, Helyette Geman, and Fred Torcaso for letting me pick your brain. In addition, to Kristin, Lindsay, Sandy, Ann, Seanice, and Sonjala, thank you for being the unsung heroes in the department. Your work has made my life at Hopkins easier. To admission committee members, whom I do not know, thank you for giving me a chance to study at Hopkins. I graduated from a not-so-stellar college and honestly did not expect to get accepted here. Thank you for believing in me and I hope I proved you right! I feel very fortunate to call so many wonderful PhD students in the department my friends. To William, Wayne, Lingyao, Ke, Andrew, Hsi-Wei, Tianyu, Chu-chi, thank you being there with me through thick and thin. Without you all, my experience would not be as great and enjoyable. Finally, I would like to thank my family back home, epsecially my mom and my grandma, for their constant support and encouragement. They are the greatest source of my strength. Finally, this would not feel complete without mentioning my girlfriend. Thank you for always being by my side, for comforting me through hard times, and for listening to all my crazy research ideas. You know how much I love you. vi This thesis is dedicated to my whole family, especially my parents and my grandparents. Simply thinking about you gives me all the strength to keep going. vii Contents Abstract ii Acknowledgementsv 1 Introduction and Motivation1 1.1 Optimization...................................2 1.2 Statistics.....................................3 1.3 Contributions and Outline of the thesis....................5 2 Optimization Background6 2.1 Complexity Classes................................6 2.2 Convex Analysis.................................7 2.3 Mixed-integer Linear Programming....................... 15 2.3.1 Linear Programming........................... 15 2.3.2 Mixed-integer Linear Programming................... 19 2.4 Cut Generating Functions............................ 26 2.4.1 Mixed-integer set............................. 26 2.4.2 Cut Generating Function Pairs..................... 28 2.4.3 Example.................................. 30 3 Statistics Background 34 3.1 Point Estimation Methods............................ 34 3.2 Decision Theory Framework........................... 37 3.2.1 Components of the decision theory framework............. 38 viii 3.2.2 Comparing between decision rules................... 39 3.2.3 Improving a decision rule........................ 40 3.2.4 Admissibility results........................... 42 3.3 Statistical Learning Theory........................... 45 3.3.1 k Nearest Neighbor............................ 47 3.3.2 Logistic Regression............................ 48 3.3.3 Random Forests............................. 49 3.3.4 Support Vector Machine......................... 52 3.3.5 Neural Network.............................. 54 4 Learning to cut 57 4.1 Introduction.................................... 57 4.2 Problem setup.................................. 58 4.3 Finding the best parameters for generalized crosspolyhedral cuts...... 61 4.4 Classifying problem instances based on the effectiveness of generalized crosspolyhedral cuts.................................. 63 4.5 Future Work................................... 77 5 Admissibility of Solution Estimators for Stochastic Optimization 78 5.1 Introduction.................................... 78 5.1.1 Statistical decision theory and admissibility.............. 80 5.1.2 Admissibility of the sample average estimator and our results.... 81 5.1.3 Comparison with previous work..................... 85 5.1.4 Admissibility and other notions of optimality............. 86 5.2 Technical Tools.................................. 88 5.3 Proof of Theorem 5.1 (the scenario with linear objective).......... 90 5.3.1 When the covariance matrix is the identity.............. 90 5.3.2 General covariance............................ 95 5.4 An alternate proof for the linear objective based on Bayes' decision rules.. 96 5.5 Proof of Theorem 5.2 (the scenario with quadratic objective)........ 98 5.6 Future Work................................... 105 ix 6 Scalable N-way matching of astronomy catalogs 107 6.1 Motivation.................................... 107 6.2 Our Approach................................... 108 6.2.1 CanILP: Optimal Selection of Candidates............... 110 6.2.2 DirILP: Optimal Direct Associations.................. 112 6.2.3 DirILP in General............................ 117 6.3 Mock Objects and Simulations......................... 119 1 6.3.1 Special case: κic = σ2 for every detection (i; c)............ 119 6.3.2 General case: κic is different for every detection (i; c)......... 122 6.3.3 Multiple sources per catalog in each island............... 123 6.4 Software Tool................................... 124 6.5 Discussion and Conclusion............................ 124 Appendix A Appendix to Chapter5 126 n A.1 Proof of the consistency of δSA in Section 5.1.4................ 126 n A.1.1 Proof of consistency of δSA for the linear case............. 126 n A.1.2 Proof of consistency of δSA for the quadratic case........... 127 A.2 Proof of Lemma 5.1 in Section

Load more