UNIVERSITY OF CALIFORNIA, SAN DIEGO

Playing Games to Reduce Supervision in Learning

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Computer Science

by

Akshay Balsubramani

Committee in charge:

Professor Yoav Freund, Chair
Professor Sanjoy Dasgupta
Professor Patrick J. Fitzsimmons
Professor Alon Orlitsky
Professor Lawrence K. Saul

2016

EPIGRAPH

The most important questions of life...are indeed, for the most part, really only problems of probability. Pierre-Simon Laplace

Knowledge I possess of the game of dice, thus is my skill in numbers. Mahabharata (Rituparna to Nala, Vana Parva)

Confer with the ignorant man as with the learned. For knowledge has no limits, and none has yet achieved perfection in it. Ptahhotep, Maxim 1

Nothing has such power to broaden the mind as the ability to investigate systematically and truly all that comes under observation in life. Marcus Aurelius, “Meditations” (Book III)

From discrimination between this and that a host of demons blazes forth! Huángbò Xīyùn

To see what is in front of one’s nose needs a constant struggle. George Orwell, “In Front of Your Nose”

Picture all experts as if they were mammals. Christopher Hitchens, “Letters to a Young Contrarian”

If life is going to exist in a Universe of this size, then the one thing it cannot afford to have is a sense of proportion. Douglas Adams, “The Restaurant at the End of the Universe”

I’m not much but I’m all I have. Philip K. Dick, “Martian Time-Slip”

Since that’s the way we’re playing it...let’s play it that way... Samuel Beckett, “Endgame”

TABLE OF CONTENTS

Epigraph
Table of Contents
Abstract of the Dissertation

Chapter 1  Introduction
   1.1  Outline of Part I: “Muffled” Semi-Supervised Classifier Aggregation
   1.2  Outline of Part II: Martingales, Stopping Times, and Statistical Testing
   1.3  Notation and Preliminaries

Part I  Games over Unlabeled Data for Semi-Supervised Classification

Chapter 2  Prologue: A Formulation for Classifier Aggregation
   2.1  A Motivating Scenario
   2.2  Some Illustrative Examples
   2.3  Playing Against An Adversary
   2.4  Semi-Supervised Classifier Aggregation
   2.5  Advantages of the New Formulation

Chapter 3  Combining Binary Classifiers to Minimize Classification Error
   3.1  Introduction
   3.2  Mathematical Preliminaries
   3.3  The Transductive Binary Classification Game
   3.4  Bounding the Correlation Vector Using Labeled Data
   3.5  Interpretation and Discussion
      3.5.1  Subgradient Conditions
      3.5.2  Approximate Learning
      3.5.3  Characteristics of the Solution
      3.5.4  Independent Label Noise
   3.6  Computational Considerations
   3.7  Related Work

Chapter 4  Optimal Binary Classifier Aggregation for General Losses
   4.1  Setup
      4.1.1  Loss Functions
      4.1.2  Minimax Formulation
   4.2  Results for Binary Classification
      4.2.1  Solution of the Game
      4.2.2  The Ensemble Aggregation Algorithm
      4.2.3  Examples of Different Losses
      4.2.4  Technical Discussion
   4.3  Related Work and Extensions
      4.3.1  Weighted Test Sets, Covariate Shift, and Label Noise
      4.3.2  Uniform Convergence Bounds for b
   4.4  Constraints on General Losses for Binary Classification
      4.4.1  Matching Objective and Constraint Losses
      4.4.2  Beating the Best Classifier and the Best Weighted Majority
   4.5  Supporting Results and Proofs
      4.5.1  Proof of Theorem 4
      4.5.2  Other Proofs

Chapter 5  Improving Random Forests with Semi-Supervised Specialists
   5.1  Introduction
   5.2  Learning with Specialists
      5.2.1  Creating Specialists for an Algorithm
      5.2.2  Discussion
   5.3  Experimental Evaluation
   5.4  Related Work
   5.5  Additional Information on Experiments
      5.5.1  Datasets
      5.5.2  Algorithms

Chapter 6  Muffled Incremental Combination of Classifiers
   6.1  Introduction
   6.2  An Algorithm for Incrementally Aggregating Classifiers
   6.3  Maximizing the Performance of an Ensemble of Trees
      6.3.1  Line Search
      6.3.2  Wilson’s Score Interval for Estimating b
      6.3.3  Discussion
   6.4  Empirical Results
   6.5  Discussion
   6.6  Related and Future Work
   6.7  Derivation of the MARVIN Update Rule
   6.8  Generalization and Estimating b
      6.8.1  Wilson’s Interval
      6.8.2  Other Details
   6.9  Implementation Details

Chapter 7  Optimal Classifier Aggregation for Decomposable Multiclass Losses
   7.1  Preliminaries
   7.2  Decomposable Losses
   7.3  Main Result
      7.3.1  Example: Optimality of the Softmax Artificial Neuron
   7.4  Proof of Theorem 15
   7.5  Proof of Theorem 16

Chapter 8  Learning to Abstain from Binary Prediction
   8.1  Introduction
   8.2  Abstaining with a Fixed Cost
   8.3  Predicting with Constrained Abstain Rate
      8.3.1  Solving the Game
      8.3.2  The Pareto Frontier and Discussion
   8.4  Discussion and Related Work
   8.5  Abstaining with General Losses

Chapter 9  MARVIN as a Function Approximator
   9.1  Algorithm and Setup
      9.1.1  Definitions
      9.1.2  Greedy Coordinate Descent as Residual Fitting
   9.2  Convergence of the Algorithm
      9.2.1  Convergence Rate
      9.2.2  Discussion
   9.3  Hard Core Sets
      9.3.1  Discussion
      9.3.2  Proof of Theorem 20
   9.4  Proof of Theorem 19

Chapter 10  Perspectives, Extensions, and Open Problems

Part II  Sequential Processes and Stopping Times

Chapter 11  Prologue: A Game with Exponential Martingales
   11.1  Martingales Describe Fair Games
   11.2  A Family of Exponential Martingales
   11.3  Hoeffding’s Large Deviation Bound
   11.4  From Stopping Times to Uniform Concentration
   11.5  Uniform Concentration by Mixing Bets at a Stopping Time
   11.6  Related Work

Chapter 12  Sharp Uniform Martingale Concentration Bounds
   12.1  Introduction
      12.1.1  Optimality of Theorem 18
      12.1.2  Chapter Outline
   12.2  Uniform Concentration Bounds for Martingales
      12.2.1  Uniform Second-Moment Martingale Concentration
      12.2.2  Discussion of Results
      12.2.3  Discussion of Proof
   12.3  Proof of Theorem 18 and Concentration Bounds
      12.3.1  A Time-Uniform Law of Large Numbers
      12.3.2  Proof of Theorem 22

Chapter 13  Uniform PAC-Bayes Concentration of Martingale Mixtures
   13.1  Introduction
      13.1.1  Preliminaries
      13.1.2  Discussion
   13.2  Overview of Proof
   13.3  Proof of Theorem 29
      13.3.1  A PAC-Bayes Uniform Law of Large Numbers
      13.3.2  Proof of Theorem 29

Chapter 14  Sequential Nonparametric Testing with the Law of the Iterated Logarithm
   14.1  Introduction
      14.1.1  Overview of Our Approach
      14.1.2  Outline of the Chapter
   14.2  Detecting the Bias of a Coin
      14.2.1  A Sequential Test
      14.2.2  Discussion
   14.3  Two-Sample Mean Testing
      14.3.1  A Linear-Time Sequential Test
      14.3.2  A Linear-Time Batch Test
      14.3.3  Power and Stopping Time of Sequential Test
      14.3.4  Discussion
   14.4  Empirical Evaluation
      14.4.1  Running the Test and Type I Error
      14.4.2  Type II Error and Stopping Time
   14.5  Related Work
   14.6  Further Extensions to Other Tests
      14.6.1  A General Two-Sample Test
      14.6.2  A General Independence Test

Chapter 15  Preventing P-Value Manipulation by Stopping
   15.1  Approach: Robustness Against Any Sampling Plan
   15.2  Algorithms
      15.2.1  Standard z-Test
      15.2.2  The Introductory Example: Successive Conditional Estimation
      15.2.3  A Robust p-Value from General Test Statistics
      15.2.4  Validity and Optimality
   15.3  Discussion
      15.3.1  Examples of Martingale Test Statistics
      15.3.2  Related Work
      15.3.3  Proof Discussion
   15.4  Proof of Tight Finite-Time LIL Concentration (Theorem 45)
      15.4.1  Proof Overview
      15.4.2  Proof of Theorem 48 – Details

Appendix A  Minimax Theorem

Appendix B  Proofs from Chapter 8

Appendix C  Proofs from Chapter 12
   C.1  Proof of Theorem 19 (Anti-Concentration Bound)
      C.1.1  Preliminaries and Proof Overview
      C.1.2  Full Proof of Theorem 19
      C.1.3  Supporting Proofs
      C.1.4  Ancillary Results and Proofs
   C.2  Miscellaneous Results: Concentration Bounds and Extensions
      C.2.1  Proof of Lemma 28
      C.2.2  Hoeffding and Bernstein Concentration Bounds
      C.2.3  Initial Time Conditions

Appendix D  Deferred Material from Chapter 14
   D.1  Proof of Theorem 40
   D.2  Proof of Proposition 23
   D.3  Proof of Theorem 41
   D.4  Proportionality Constants and Guaranteed Correctness
   D.5  Type II Error Approximation
   D.6  Proof of Lemma 42
   D.7  Proof of Theorem 44
   D.8  Experimental Protocol
   D.9  Supplemental Graphs

Bibliography

ABSTRACT OF THE DISSERTATION

Playing Games to Reduce Supervision in Learning

by

Akshay Balsubramani

Doctor of Philosophy in Computer Science

University of California, San Diego, 2016

Professor Yoav Freund, Chair

In this dissertation, we explore two fundamental sets of inference problems arising in machine learning and statistics. We present robust, efficient, and straightforward algorithms for both, adapting sensitively to structure in data by viewing these problems as playing games against an adversary representing our uncertainty. In the problem of classification, there is typically much more unlabeled data than labeled data, but classification algorithms are largely designed to be supervised, only taking advantage of labeled data. We explore how to aggregate the predictions of an ensemble of such classifiers as accurately as possible in a semi-supervised setting, using both types of data. The insight is to formulate the learning problem as playing a game over the unlabeled data against an adversary, who plays the unknown true labels. This formulation uses unlabeled data to improve performance over labeled data alone in an extremely general and efficient way, without model assumptions or tuning parameters. We demonstrate this by devising and evaluating a number of practical, scalable semi-supervised learning algorithms. The theoretical contributions include a proof that the optimal aggregation rules in this semi-supervised setting are artificial neurons for many natural loss functions, with efficient convex algorithms for learning them.

We also provide fundamental results for a second set of problems relating to sequential learning and testing. Random variation in such situations can typically be described by a martingale, a generalization of a random walk that describes any repeated fair game. We describe the concentration behavior of a martingale’s sample path, extending to finite times the law of the iterated logarithm, a classic result of probability theory. With this powerful tool, we are able to show how to design simple sequential tests that use as few samples as possible to detect an effect, provably adapting to the unknown effect size. We also apply our results to optimally correct the p-values of many common statistical hypothesis tests, making them robust to the common practice of ‘peeking’ at test results and waiting for a significant one to report.

Chapter 1

Introduction

In this dissertation, we explore two fundamental sets of problems arising in machine learning and statistics. In each of them, we make basic theoretical contributions with demonstrably practical consequences, leading to scalable and simple algorithms for some basic problems in these areas. Though the two sets of problems seem quite different, they both involve learning to optimally cope with uncertainty, and so we take related approaches to both. The key conceptual idea is to represent the uncertainty in the problem by an adversary, which attempts to foil our algorithm in a two-player zero-sum game. In both sets of problems we study, this type of thinking underlies our contributions. Viewing these problems as games also allows us to prove some notion of optimality for all our solutions. The dissertation has two parts, one for each topic area. These are outlined here.

1.1 Outline of Part I: “Muffled” Semi-Supervised Classifier Aggregation

In Part I, we study the learning problem of classification, in which data belonging to different classes (categories) are presented to an algorithm that learns how to discriminate between the classes. This is typically done by presenting data to the algorithm with accompanying class labels, as a child learns how to identify certain animals as dogs with some initial supervision. Learning is not memorization, however, and a child can also generalize, correctly classifying as a dog even a breed they have never previously seen.

Machine learning has long explored methods to accomplish these objectives using labeled data. Classification algorithms such as the decision tree, perceptron, support vector machine (SVM), and artificial neural network are central to the field. They have enjoyed great practical success, especially when their predictions are aggregated, as with boosting and other ensemble learning methods. However, these algorithms are designed to utilize only labeled data, and providing reliably correct labels is often difficult, expensive, or slow – indeed, this typically motivates the adoption of machine learning classifiers in the first place. An analogous situation is faced by the curious child eliciting animal labels by asking incessant questions of their exasperated parents. But the child’s discriminative capability also depends on an abundant number of unlabeled observations. These unlabeled data are often relatively easy to obtain in machine learning applications as well, and many semi-supervised learning methods attempt to harness them in concert with the labeled data for better prediction.

Part I of the dissertation develops a formulation describing how to combine an ensemble of classifiers of varying competences using unlabeled data. The formulation leads to a basic algorithm with notable practical and theoretical advantages, achieved without making assumptions on the data or the ensemble predictions: it attains the best possible performance guarantee using the information it is given, very efficiently with no parameters to tune. We explain how this is done in detail over the course of Part I, and devise a number of algorithms for related problems with similar guarantees. Most of the chapters concern binary classification, i.e. with two categories of data, an important problem in applications and theory. The chapters can be read in order. Chapters 3-4, and the “specialist” part of Chapter 5, constitute the core of our formulation, which we call “muffled” learning for reasons discussed in Chapter 6. More details follow.

Chapter 2 contains a gentle informal introduction to our formulation, with minimal mathematics but retaining its core concepts fairly precisely.

Chapter 3 fully formalizes the main idea in the most basic case. We derive an efficient algorithm that combines an arbitrary finite set of binary classifiers in a way that achieves an optimal worst-case classification error bound, with the optimality guarantees on performance we have mentioned. This optimal algorithm’s decision rule resembles a weighted majority vote over the ensemble predictions, and is an artificial neuron. The unlabeled data serve to regularize the learning procedure, ensuring good generalization especially when unlabeled data are abundant. This chapter should be read before all the subsequent chapters in Part I.

Chapter 4 further extends the “muffled” formulation of Chapter 3 to other binary classification problems, in which the algorithm is judged using a general loss function instead of only “0-1” classification error. The loss functions we consider include not only convex functions as typically studied in machine learning, but also a large set of non-convex losses, for which we derive efficient and worst-case-optimal learning algorithms. The optimal decision rules for these general losses in our framework are all artificial neurons – sigmoid functions of linear combinations of their inputs.

Chapter 5 demonstrates the practical viability of the formulation with an algorithm. Here we extend the formulation to handle arbitrary “specialist” classifiers, so that the classifiers in the ensemble can abstain from predicting on any subset of the data. This allows us to devise an algorithm that aggregates the decision trees in a random forest, using our formulation with specialists that take advantage of the forest’s leaf structure, and achieving superior empirical performance to supervised ensemble algorithms with unlabeled data.

Chapter 6 presents more algorithms devised using our muffled formulation.
Here we focus primarily on learning a good aggregate of classifiers one at a time from a possibly very large ensemble, enabling for example algorithms that sequentially build an ensemble of decision trees. The scenario is superficially similar to supervised algorithms for boosting ([SF12]), with the ensemble being accessed by an “oracle” that is given a dataset and returns a classifier performing well on it, and the data repeatedly reweighted by the algorithm to improve performance. This similarity is somewhat misleading, as the new algorithm operates on the very different principles of our formulation, which crucially uses unlabeled data. Also included in this chapter is an improvement to the algorithm of Chapter 5 that utilizes the complex internal structure of the trees in the random forest to improve performance. This chapter evaluates the most empirically successful algorithms devised so far with the muffled formulation, at the time of writing.

Chapter 7 theoretically extends the formulation even further, to cases when the data fall into K > 2 classes. Subsuming the results of Chapter 4, we derive the minimax optimal learning algorithms for a large class of multiclass losses, including many non-convex losses. The optimal decision rule (and learning algorithm) takes a notably concise form for multiclass logarithmic loss – it is precisely a “softmax” artificial neuron of its inputs.

Chapter 8 describes how to learn a classifier capable of abstaining from making a label prediction, using our formulation for aggregating binary classifiers. Such a classifier hopes to abstain where it would be most inaccurate if forced to predict, so it has two goals in tension with each other: minimizing errors, and avoiding abstaining unnecessarily often. We exactly characterize the best achievable tradeoff between these two goals in the general semi-supervised setting of the earlier chapters, giving an algorithm for learning a classifier which trades off its errors with abstentions in a Pareto optimal manner. This algorithm is as efficient as linear learning and prediction, and comes with strong and robust theoretical guarantees. Our analysis extends to the general loss functions of Chapter 4.

Chapter 9 contains a theoretical exploration of the main sequential aggregation algorithm of Chapter 6. In particular, this chapter focuses on analyzing convergence to the optimum, despite the possibly very large base ensemble being aggregated, and the lack of a fixed-dimensional optimization space. The guarantees in this chapter are proved with elementary techniques, but are very general, handle ensembles of any complexity, and illuminate the existence of a “hard core” of difficult-to-classify examples which parallels the structure of boosting.

Chapter 10 summarizes the muffled learning formulation and lists some of the bevy of open problems arising from it.

1.2 Outline of Part II: Martingales, Stopping Times, and Statistical Testing

In Part II of the dissertation, we revisit our basic theme of playing games against an adversary representing uncertainty. Here, the adversary represents our uncertainty about the time at which to evaluate (and stop) a sequential process. We investigate this using a fundamental model of a repeated fair game called a martingale, a powerful generalization of a random walk.

Chapter 11 motivates and discusses martingales for the unfamiliar reader, providing an informal introduction to them and the related problems of stopping and bounding them. It also outlines a particular game-theoretic perspective on concentration that underlies all our technical contributions in the subsequent chapters.

Chapter 12 is a general application of these ideas to develop a characterization of the concentration behavior of martingales, showing that with high probability, they never stray outside a tight envelope around their mean at any time. This pathwise characterization of how a martingale concentrates uniformly over all times is the best possible, proved with matching anti-concentration bounds. It generalizes a classic result of probability, the law of the iterated logarithm (LIL), to incorporate finite times and failure probabilities. The proofs in this chapter are not the shortest possible, for reasons of illustration – they repeatedly use different versions of a stopped martingale mixing technique that underlies all the technical contributions of Part II.

Chapter 13 further extends these results to hold for mixtures of martingales, uniformly over the mixing distribution in an optimal “PAC-Bayes” sense. These bounds are the most powerful proved in Part II.

Chapter 14 develops sequential statistical tests using this finite-time LIL, which deal well with the common problem of detecting an effect when its size is unknown – here, a difference between two given sample distributions (two-sample testing). These allow the tester to run a single simple algorithm that stops with almost as few samples as possible under any effect of unknown size, while simultaneously enjoying favorable statistical guarantees with respect to all these possible alternatives. Further, this behavior is a direct consequence of the general concentration of measure phenomenon, so more such sequential tests can be similarly derived for other situations besides two-sample testing, including more provided in Chapter 14.

Chapter 15 takes a related perspective on these ideas, resulting in a widely applicable generalization of p-values in statistical testing. Our contribution is a p-value correction to protect against “data peeking”: the practice of using test results as the data is being collected to influence when to stop collecting data. This problem is a common and serious one – a seemingly significant peek is often used to justify stopping data collection, yet such a result is likely to be seen eventually after peeking long enough. Our solution is very non-invasive, exploiting the relationship between the finite LIL and the fixed-time concentration bounds in prevalent use for statistical tests. This chapter contains a tighter and more concise, but less instructive, proof of finite LIL concentration than Chapter 12.

As in the rest of this dissertation, the results in Part II have a minimax character, and characterize provably near-optimal behavior in the situations of their respective chapters. An implicitly game-theoretic outlook described in Chapter 11 allows us to be robust to “supervision” in the form of stopping times and effect sizes.

1.3 Notation and Preliminaries

In this dissertation, we use standard concepts, notation, and abbreviations from probability theory and basic linear algebra, including the notion of “independently and identically distributed” (i.i.d.) random variables. Only some chapters contain proofs, and they assume a level of mathematical maturity essentially equivalent to the machine learning textbook [SF12]. Each chapter can be read standalone, borrowing from the notation of previous chapters only when indicated.

We make a few further notes on mathematical notation and terminology. We call a continuous function f(x) decreasing if it is monotonically non-increasing, i.e. f(x) > f(y) ⟹ x < y; a decreasing function has a pseudoinverse f^{-1}(y) = min{x : f(x) ≤ y}. We use the notation [n] := {1, 2, ..., n}. Vectors are almost always written in bold. All vector inequalities, as well as functions like sgn(v), are componentwise. We also use the abbreviations w.(h.)p. – “with (high) probability,” w.l.o.g. – “without loss of generality,” and w.r.t. – “with respect to.”

Part I

Games over Unlabeled Data for Semi-Supervised Classification

Chapter 2

Prologue: A Formulation for Classifier Aggregation

In the spirit of this dissertation, we begin with a game that involves predicting binary labels associated with data. The prediction game and its formalization constitute a new “muffled” formulation of semi-supervised classification, a main contribution of this thesis which underpins the algorithms and theory in Part I. In this chapter, we describe and build intuition for the game in an informal but fairly precise way, highlighting the salient features and advantages of the formulation, which we will theoretically and practically demonstrate in the subsequent chapters.

2.1 A Motivating Scenario

Suppose a student Alice is sick on a school day, during which her class takes a multiple-choice exam. To her consternation, the teacher of the class requires her to make up the same exam in a couple of days’ time. Alice’s health recovers in time for her to attend school the next day, only to see her classmates receive their graded exams. The teacher Bob has marked the score on each exam (fraction of questions correct), but no other feedback like which answers were wrong. Alice, who has not prepared for the exam, sees this as a silver lining, giving her a chance to do well on her make-up test by gleaning information about the correct answers from her classmates. She opportunistically obtains their graded exams for this purpose.

The central question we ask is: how can Alice answer the exam to ensure the highest possible score, using her classmates’ graded exams? As described in this chapter, this is a learning problem. Alice is trying to use the information from her classmates’ imperfect approximations of the unknown correct answers, in order to most accurately predict the answer key. If her classmates’ exams were marked with the particular questions each erred on along with their correct answers, Alice could trivially use any classmate’s exam to completely determine the answer key, and score perfectly. However, each exam is only marked with the aggregate score, so Alice cannot find out the answer to any particular question by looking at the exam of one of her classmates.

Alice’s potential score intuitively depends on the overall difficulty of the exam; if her classmates do well, so can she, with access to their answers. For instance, if one of her classmates gets a perfect score, Alice will see this grade on their exam, and can simply copy their answers. In general, Alice can just copy the answers of the highest-scoring classmate, and guarantee herself a tie for the top score in her class. Can she do even better? How high can she potentially score with this information? We answer these and related questions in the rest of Part I, showing that Alice can score higher than any of her classmates, by making basic logical inferences about the correct answers using the predictions and grades of all her classmates jointly. This is possible because there can be significant variation in the competencies of her classmates; they may disagree on different questions, having mastered the range of tested topics to varying degrees. Alice can actually use such diversity to her advantage to reduce her uncertainty about the true answers.

2.2 Some Illustrative Examples

To straightforwardly illustrate how she can do this, we consider the test to have two choices for each answer, one correct and one incorrect. Denote the possible answers as “+” and “-” (e.g. corresponding to a true-false test).

Let us start with a simplified example. Suppose Alice has three classmates and the exam has four questions, with the graded exams shown in Table 2.1 as a matrix. Each of her classmates has made one mistake out of four, and Alice does not know on which questions. However, the information she has is still helpful. Notice that on the first question, there is one minority answer, with the other two in the majority; so regardless of the true answer, at least one (and possibly two) of the classmates has made a mistake. The same can be said about questions 2 and 3, so that at least three mistakes total must be made on these questions. Since each classmate only makes one mistake total, Alice can conclude that all the minority predictions are wrong, and that the actual answer key is (+,+,+,−) on the four questions respectively. So she can guarantee that she answers the exam perfectly!

Table 2.1. Basic example of Alice’s classmates’ exams; the unweighted majority vote makes no mistakes here.

      Q1  Q2  Q3  Q4  Mistakes
C1    -   +   +   -   1
C2    +   +   -   -   1
C3    +   -   +   -   1

This toy example demonstrates several features of the formulation that will be inherited by the machine learning algorithms devised in Part I of this dissertation. Alice is able to score higher than any of her classmates, even though she has no more information than they provide. The joint patterns of disagreement among her classmates’ predictions allow Alice to correct out the mistakes that they make. Intuitively, more diverse classmate predictions can potentially help more; if all the classmates had exactly the same answers, Alice clearly could not guarantee better performance than any of them. Also, she does not need to make modeling assumptions about the content of each question, or about the way her classmates predict.

This makes the setting very general and the formulation suitable for machine learning, in which Alice’s situation corresponds to a binary classification problem. In this setting, the task is to learn a classifier that most accurately predicts the label, or category, of data falling into two categories. The matrix of predictions of Table 2.1 represents a dataset. The four questions – the columns of the matrix – are data points. Alice’s classmates – the rows of the matrix – are then other classifiers, so the learning problem facing her is to aggregate their predictions on the data. The argument Alice uses does not assume any restrictions on the structure of the matrix of classifiers’ predictions (Table 2.1), and correspondingly the algorithms of the subsequent chapters are able to combine the predictions of any arbitrary set, or ensemble, of classifiers.

Another salient feature of Table 2.1 and Alice’s argument is that the correct answer key is a majority vote over the imperfect classmates. This may appear to result from the specific argument chosen for Alice above, which budgets the classmates’ errors according to majority and minority predictions. Such a view calls the argument’s usefulness into question, as majority voting is a very common way to combine predictions and correct out noise, and can be done without resorting to reasoning like Alice’s. However, this style of reasoning can produce better classifiers than a simple majority vote. Table 2.2 demonstrates this, written in a machine learning setting with six data (questions on the exam) x1,...,x6, and six classifiers (classmates) h1,...,h6 split into two blocs. One bloc {h1,h2,h3} is composed of classifiers which err more on the data but disagree with each other, while the classifiers in the other bloc {h4,h5,h6} err less but all agree with each other.

Table 2.2. Example with two classifier blocs.

      x1  x2  x3  x4  x5  x6  Mistakes
h1    -   -   +   +   +   +   2
h2    +   +   -   -   +   +   2
h3    +   +   +   +   -   -   2
h4    +   +   +   +   +   -   1
h5    +   +   +   +   +   -   1
h6    +   +   +   +   +   -   1

We apply Alice’s reasoning from the earlier example to Table 2.2. Observe that x1,x2,...,x5 all have one classifier in the minority, so there are at least five mistakes made among those data in total. If the majority prediction were wrong on any of these data (say x1), then there would be five mistakes occurring on x1, at least one each on x2,...,x5, and at least two on x6, for a total of 11. This is impossible, since we know there are only 2+2+2+1+1+1 = 9 mistakes made total. Therefore, the majority prediction of ‘+’ is correct on each of x1,...,x5, which accounts for exactly five of the nine mistakes; the remaining four mistakes must all fall on x6, where exactly four classifiers predict ‘-’, so x6 must be labeled ‘+’ as well. In other words, the correct labeling of the dataset (the teacher Bob’s answer key) must be ‘+’ for all six data points. Consequently, we conclude that the majority vote over {h1,h2,h3} has zero error, performing significantly better than any of the base classifiers. In contrast, a simple majority vote over all classifiers would err on x6, as would the best single classifier, and the seemingly natural strategy of taking a vote over the “good” classifiers {h4,h5,h6}. So Alice’s budgeting-type reasoning does result in interesting data-dependent behavior, not just a flat majority vote. The example also shows that it can be better to combine less accurate classifiers that disagree with each other than to combine more accurate classifiers that tend to agree.
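To make the budgeting argument concrete, the following short script (a minimal sketch in Python, not part of the original formulation) enumerates every candidate answer key for Table 2.2 and keeps only those consistent with each classifier’s reported number of mistakes; the all-‘+’ key is the unique survivor, matching the reasoning above.

```python
import itertools

# Rows h1..h6 of Table 2.2 as predictions on x1..x6; the last column of the
# table gives each classifier's number of mistakes.
predictions = [
    [-1, -1, +1, +1, +1, +1],  # h1: 2 mistakes
    [+1, +1, -1, -1, +1, +1],  # h2: 2 mistakes
    [+1, +1, +1, +1, -1, -1],  # h3: 2 mistakes
    [+1, +1, +1, +1, +1, -1],  # h4: 1 mistake
    [+1, +1, +1, +1, +1, -1],  # h5: 1 mistake
    [+1, +1, +1, +1, +1, -1],  # h6: 1 mistake
]
mistakes = [2, 2, 2, 1, 1, 1]

# Enumerate all 2^6 candidate answer keys and keep those consistent with
# every classifier's exact mistake count.
consistent = []
for key in itertools.product([-1, +1], repeat=6):
    if all(sum(p != k for p, k in zip(row, key)) == m
           for row, m in zip(predictions, mistakes)):
        consistent.append(key)

print(consistent)  # [(1, 1, 1, 1, 1, 1)] -- the all-'+' answer key is forced
```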

2.3 Playing Against An Adversary

A shortcoming of these simplified examples is that in general, we might expect there to be many possible answer keys consistent with Alice’s classmates’ exams – which one to choose? Can Alice guarantee good performance with such reasoning, as she could by just copying the best classmate? A related, “dual” set of questions also arises. How well can Alice potentially score with whatever approach she uses? Is she making the best use possible of the information in her classmates’ marked exams?

All these questions can be answered with our formulation, whose central idea we state in somewhat more detail. Each marked exam gives an approximate idea of the actual answer key, which constitutes a vector z with one coordinate corresponding to each of the questions. As we formalize subsequently, each classmate’s graded exam can be seen as a constraint on z – a classmate who scores 99% guarantees that z will be close to the vector of their predictions. The joint effect of these constraints is what leads to the interesting behavior seen in the examples, as the constraint sets associated with the ensemble classifiers intersect to form a smaller constraint set S. S contains exactly the information about z given by the ensemble, and represents the possible answer keys given the information in Alice’s classmates’ exams.

As Alice, we aim for the best possible guarantee on our score (i.e. on our error), minimizing our worst-case error on the exam. Let us think of this equivalently as playing a game against the teacher Bob, who must mark Alice’s make-up exam when she has completed it. Bob suspects Alice’s chicanery with other students’ exams, so he wishes to give her the lowest score in his power, but he has no proof she is cheating. Let us say that as a professional, he cannot confront her and just give her a zero; Bob is constrained to grade the exam consistently with the marks given to Alice’s classmates. But he otherwise has free rein to choose how to mark Alice’s exam, because the students do not know the answer key but only the exam scores. With these intentions, Bob is Alice’s worst possible adversary, and effectively controls z. In the game, Bob plays z with knowledge of Alice’s answers on the test, a vector g. He plays to maximize her error given g:

\[
\max_{z \in S} \ \mathrm{error}(z, g)
\]

(We have informally omitted any constraints on g for this exposition.) Alice, in trying to guarantee herself a good score, must play to minimize error, assuming the worst possible from Bob and playing

\[
g^* = \operatorname*{argmin}_{g} \ \max_{z \in S} \ \mathrm{error}(z, g) \tag{2.1}
\]

In these terms, our learning algorithm for this scenario (in Chapter 3) solves for g*. This is equivalent to solving a constrained optimization problem, which encodes Alice’s budgeting arguments of Sec. 2.2. Remarkably, the optimal prediction rule specified by (2.1) will turn out to be a “soft” weighted majority vote over the members of the class. Another suggestive interpretation of it is as an artificial neuron.

Note that this implies an extremely strong and robust error guarantee. For any predictions g played by Alice, the error suffered against the adversarial Bob is max_{z ∈ S} error(z, g), so Eq. (2.1) implies that no g can guarantee a better error bound. We stress this because of the lack of assumptions on g – even if Alice uses a complex learning algorithm like a deep network to predict the answers on the exam, she cannot do better than Eq. (2.1) in general, given only the other students’ exams marked in the specified manner. This is a very distinctive theoretical feature of our formulation, discussed further in Sec. 2.5 and later chapters.

2.4 Semi-Supervised Classifier Aggregation

As mentioned previously, the formulation of Alice’s situation corresponds precisely to the machine learning problem of binary classifier aggregation. This problem of aggregating an arbitrary set of classifiers is very practical: (binary) classification is one of the most fundamental problems in machine learning, and classification algorithms for it are abundant in common use. It has a vast range of applications, from spam filtering and advertisement matching to handwriting recognition, disease prediction, and more ([FHT01, Bis06]).

Most general-purpose binary classification algorithms for these applications are supervised, learning to classify correctly by training on a given set of data points with known labels (labeled data). But obtaining correct labels for data is typically expensive, often requiring direct human intervention. In contrast, unlabeled data are abundant and freely obtained ([ZG09]). In Alice’s case, a supervised approach would use only the grades of her classmates. The best approach in this case is clearly the simple one of copying her best classmate, as suggested earlier; this corresponds to the canonical supervised technique of empirical risk minimization (ERM).

Crucially, the disagreement patterns that Alice relies on to improve her performance are all visible in the matrix of her classmates’ predictions, which is unlabeled data – it does not depend on the actual answer key. The teacher Bob does use the answer key to mark the exams. This information – on classifier errors – can be estimated from labeled data in a standard machine learning setting in which labeled and unlabeled data are drawn from a distribution. So our formulation and the resulting algorithms are semi-supervised, using both labeled and unlabeled data for learning.

2.5 Advantages of the New Formulation

Though the semi-supervised classification algorithms of Part I have still to be specified, we have already seen enough of the core formulation to foreshadow a few of their advantages.

• Strongest possible error guarantee, without dependence on model parameters: As discussed earlier, this guarantee does not depend on a choice of model class – it applies to any classifier that uses the given semi-supervised information. As a corollary, we show that our method guarantees accuracy at least that of the best single classifier, corresponding to ERM, or Alice copying her best classmate; so using our method “cannot hurt” the error guarantee. Meanwhile, the lack of explicit model fitting means our formulation introduces no parameters to tune.

• Applies to arbitrary classifiers, specialists: As mentioned earlier, we assume nothing about the structure of the matrix of data (the class’s predictions). They may form voting blocs and collude in complex ways, but our formulation is sensitive to this because it sees these collusions in the unlabeled data, as in the toy examples of this chapter. In the example with Alice, the types of arguments she uses are also capable of handling classmates who leave exam questions blank, because Bob’s marks for such tests still give useful information about the answers that can be seen in the matrix of data.

We will see these characteristics in the semi-supervised learning algorithms that we develop with the “muffled” formulation.

Chapter 3

Combining Binary Classifiers to Minimize Classification Error

The intuition outlined in Chapter 2 can be formalized in a machine learning setting, which we address in this chapter. Subsequent chapters in Part I build on this chapter’s formulation. However, this chapter’s theoretical results are subsumed by those proved later in Chapter 4, so we defer proofs to the later chapter.

3.1 Introduction

Suppose that we have a finite set, or ensemble, of binary classifiers H = {h_1, h_2, ..., h_p}, with each h_i mapping data in some space X to a binary prediction {−1, +1}. Examples (x, y) ∈ X × {−1, +1} are generated i.i.d. according to some fixed but unknown distribution D, where y ∈ {−1, +1} is the class label of the example. We write the expectation with respect to D or one of its marginal distributions as E_D[·].

Consider a statistical learning setting, in which we assume access to two types of i.i.d. data: a small set of labeled training examples S = {(x′_1, y′_1), ..., (x′_m, y′_m)} drawn from D and a much larger set of unlabeled test examples U = {x_1, ..., x_n} drawn i.i.d. according to the marginal distribution over X induced by D. A typical use of the labeled set is to find an upper bound on the expected error rate of each of the classifiers in the ensemble. Accordingly, we assume a set of lower bounds {b_i}_{i=1}^p such that the correlation corr(h_i) := E_D[y h_i(x)] satisfies corr(h_i) ≥ b_i; this is equivalent to assuming an upper bound of ½(1 − b_i) on the error rate of classifier i.

If we ignore the test set, then the best achievable error guarantee, in the worst case, is achieved by predicting with the classifier with the largest correlation (smallest error). This corresponds to the common practice of empirical risk minimization (ERM). However, in many cases we can glean useful information from the distribution of the test set that will allow us to greatly improve over ERM.

We motivate this statement by contrasting two simple prediction scenarios, A and B. In both cases there are p = 3 classifiers and n = 3 unlabeled test examples. The correlation vector is b = (1/3, 1/3, 1/3); equivalently, the classifier error rates are 33%. Based on that information, the predictor knows that each classifier makes two correct predictions and one incorrect prediction. So far, both cases are the same.

The difference is in the relations between different predictions on the same example. In case A, each example has two predictions that are the same, and a third that is different. In this case it is apparent that the majority vote over the three classifiers has to be correct on all 3 examples, i.e. we can reduce the error from 1/3 to 0. In case B, all three predictions are equal for all examples. In other words, the three classification rules are exactly the same on the three examples, so there is no way to improve over any single rule. These cases show that there is information in the unlabeled test examples that can be used to reduce error – indeed, cases A and B can be distinguished without using any labeled examples.

In this chapter, we give a complete characterization of the optimal worst-case (minimax) predictions given the correlation vector b and the unlabeled test examples. Our development does not consider the feature space X directly, but instead represents the knowledge of the p ensemble predictions on U with a p × n matrix that we denote by F. Our focus is on how to use the matrix F in conjunction with the correlation vector b, to make minimax optimal predictions on the test examples.

The rest of the chapter is organized as follows. In Section 3.2 we introduce some additional notation. In Section 3.3 we formalize the above intuition as a zero-sum game between the predictor and an adversary, and solve it, characterizing the minimax strategies for both sides by minimizing a convex objective we call the slack function. This solution is then linked to a statistical learning algorithm in Section 3.4. In Section 3.5, we interpret the slack function and the minimax strategies, providing more toy examples following the one given above to build intuition. In Section 3.6 we focus on computational issues in running the statistical learning algorithm, and discuss related work in Section 3.7.

3.2 Mathematical Preliminaries

We use a combination of linear algebra notation and the probabilistic notation given in the previous section. The algorithm is first described in a deterministic context where some inequalities are assumed to hold; probabilistic arguments are used to show that these assumptions are correct with high probability. The ensemble’s predictions on the unlabeled data are denoted by F:

\[
F = \begin{pmatrix} h_1(x_1) & h_1(x_2) & \cdots & h_1(x_n) \\ \vdots & \vdots & \ddots & \vdots \\ h_p(x_1) & h_p(x_2) & \cdots & h_p(x_n) \end{pmatrix} \in [-1,1]^{p \times n} \tag{3.1}
\]

The true labels on the test data are nominally denoted by the binary values y_1, ..., y_n ∈ {−1, +1}, but we use the term to refer to a real vector z = (z_1; ...; z_n) ∈ [−1, 1]^n. This represents the expected values of randomized binary labels y_i. For example, a value of 1/2 indicates the corresponding binary label is {+1 w.p. 3/4, −1 w.p. 1/4}. (The same argument justifies why we take F to be in the continuous interval.) This interpretation extends to our definition of the correlation¹ on the test set, ĉorr_U(h_i) := (1/n) ∑_{j=1}^n h_i(x_j) z_j. The labels z are hidden from the predictor, but we assume the predictor has knowledge of a correlation vector b ≥ 0^p such that

\[
\forall i \in [p], \quad \widehat{\mathrm{corr}}_U(h_i) := \frac{1}{n} \sum_{j=1}^{n} h_i(x_j)\, z_j \ \geq\ b_i \tag{3.2}
\]

i.e. (1/n) F z ≥ b. These p inequalities represent upper bounds on individual classifier error rates.

Define 1^n = (1; 1; ...; 1) ∈ R^n, and 0^n similarly. Also, write I_n as the n × n identity matrix, and define clip(x) = max(min(1, x), −1). Finally, we use vector notation for the rows and columns of F: h_i = (h_i(x_1), h_i(x_2), ..., h_i(x_n))^⊤ and x_j = (h_1(x_j), h_2(x_j), ..., h_p(x_j))^⊤.

3.3 The Transductive Binary Classification Game

We now describe our prediction problem, and formulate it as a zero-sum game between two players: a predictor and an adversary.

In this game, the predictor plays first with g = (g_1; g_2; ...; g_n), a potentially randomized label prediction g_i ∈ [−1, 1] for each example {x_i}_{i=1}^n. The adversary then plays, setting the labels z ∈ [−1, 1]^n under ensemble test error constraints defined by b. The predictor’s goal is to minimize (and the adversary’s to maximize) the worst-case expected classification error on the test data (w.r.t. the randomized labelings z and g): ½(1 − (1/n) z^⊤ g). To summarize concretely, we study the following game:

\[
V := \min_{g \in [-1,1]^n} \ \max_{\substack{z \in [-1,1]^n, \\ \frac{1}{n} F z \geq b}} \ \frac{1}{2}\left(1 - \frac{1}{n} z^\top g\right) \tag{3.3}
\]

It is important to note that we are only performing “test time” prediction, and represent the information gleaned from the labeled data by the parameter b. Inferring the vector b from training data is a standard application of Occam’s Razor ([BEHW87]), which we provide in Section 3.4.

¹We are slightly abusing the term “correlation” here. Strictly speaking this is just the expected value of the product, without standardizing by mean-centering and rescaling for unit variance. We prefer this to inventing a new term.

The minimax theorem (Theorem 25) applies to the game (3.3), since the constraint sets are convex and compact and the payoff linear. Therefore, it has a minimax equilibrium and associated optimal strategies g*, z* for the two sides of the game, i.e.

\[
\max_{z} \ \frac{1}{2}\left(1 - \frac{1}{n} z^\top g^*\right) \ =\ V \ =\ \min_{g} \ \frac{1}{2}\left(1 - \frac{1}{n} z^{*\top} g\right).
\]

As we will show, optimal play depends on a particular weighting over the p hypotheses – a nonnegative p-vector. Define this weighting as follows.

Definition 1 (Slack Function and Optimal Weighting). Let σ ≥ 0^p be a weight vector over H (not necessarily a distribution). The slack function is

\[
\gamma(\sigma, b) = \gamma(\sigma) := -b^\top \sigma + \frac{1}{n} \sum_{j=1}^{n} \max\left( \left| x_j^\top \sigma \right|,\ 1 \right) \tag{3.4}
\]

An optimal weight vector σ* is any minimizer of the slack function: σ* ∈ argmin_{σ ≥ 0^p} γ(σ). The vector of ensemble scores w.r.t. σ is F^⊤σ = (x_1^⊤σ, ..., x_n^⊤σ), whose elements’ magnitudes are the margins. We call the convex function max(|x_j^⊤σ|, 1) the potential well.

The main result of this chapter uses these definitions to describe the solution of the game (3.3).

Theorem 2 (Minimax Optimal Strategy for (3.3)). The minimax value of the game (3.3) is

\[
V = \frac{1}{2}\, \gamma(\sigma^*)
\]

The minimax optimal predictor strategy² is defined as follows: for all i ∈ [n],

\[
g_i^* = g_i(\sigma^*) = \begin{cases} x_i^\top \sigma^* & \left| x_i^\top \sigma^* \right| < 1 \\ \mathrm{sgn}(x_i^\top \sigma^*) & \text{otherwise} \end{cases} \tag{3.5}
\]

The proof of this theorem is a standard application of Lagrange duality and the minimax theorem. Theorem 2 illuminates the importance of the optimal weighting σ* over hypotheses. This weighting σ* ∈ argmin_{σ ≥ 0^p} γ(σ) is the solution to a convex optimization problem, and therefore we can efficiently compute it and g* to any desired accuracy. The ensemble scores (w.r.t. this weighting) on the test set are represented by F^⊤σ*, which is the only dependence of the solution on F.

²For completeness,

\[
z_i^* = \begin{cases} 0 & \left| x_i^\top \sigma^* \right| < 1 \\ \mathrm{sgn}(x_i^\top \sigma^*) & \left| x_i^\top \sigma^* \right| > 1 \end{cases}
\]

For x_i with margin 1, z_i^* = c_i sgn(x_i^⊤σ*) for any c_i ∈ [0,1], with the proviso that (1/n) F z ≥ b. Our paper [BF15a] contains further discussion.

Figure 3.1. The optimal strategy for the predictor player on an example x plotted with the score-dependent term of the slack function, as a function of the score x^⊤σ*.

More specifically, the minimax optimal prediction in (3.5), for any test set example x_j, can be expressed as a function of the score x_j^⊤σ* on that test point alone without considering the others. The F-dependent part of the slack function also depends separately on each test point’s ensemble prediction (Figure 3.1). Even though we consider all the data and the ensemble’s interdependencies jointly to add power to the formulation, and must therefore search over a space of exponential size in n, the data are completely decoupled from each other given σ*, so we can solve the problem efficiently (see Sec. 3.6).
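To make this decoupling concrete, here is a minimal numerical sketch (in Python with NumPy; the function names are illustrative, not from the dissertation) of the slack function (3.4) and the prediction rule (3.5): given a weighting σ, every test point is handled independently through its score x_j^⊤σ.

```python
import numpy as np

def slack(sigma, F, b):
    """Slack function gamma(sigma) of Eq. (3.4).

    F: p x n matrix of ensemble predictions on the unlabeled test set,
    b: p-vector of correlation lower bounds, sigma: nonnegative p-vector.
    """
    scores = F.T @ sigma                                    # ensemble scores x_j^T sigma
    return -b @ sigma + np.mean(np.maximum(np.abs(scores), 1.0))

def predict(sigma, F):
    """Prediction rule g(sigma) of Eq. (3.5): the score itself when the
    margin is below 1, and its sign (i.e. the clipped score) otherwise."""
    return np.clip(F.T @ sigma, -1.0, 1.0)
```

By Theorem 2 (and Observation 3 below), ½·slack(σ, F, b) bounds the worst-case test error of predict(σ, F) for any σ ≥ 0, so any routine that drives the slack down yields a usable predictor.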

3.4 Bounding the Correlation Vector Using Labeled Data

In the analysis presented above, we assumed that a correlation vector b is given, and that each component is guaranteed to be a lower bound on the test correlation of the corresponding hypothesis. This section justifies that assumption, showing how b can be calculated from a labeled training set. The algorithm that we use is a natural one using uniform convergence: we compute the empirical correlations for each of the p classifiers, and add a uniform penalty term to guarantee that the b is a lower bound on the correlation of the test data. For each classifier, we consider three quantities:

• The true correlation: corr(h) = E_D[y h(x)]

• The correlation on the training set of labeled data: ĉorr_S(h) = (1/m) ∑_{i=1}^m h(x′_i) y′_i

• The correlation on the test set of unlabeled data: ĉorr_U(h) = (1/n) ∑_{i=1}^n h(x_i) z_i

Using Chernoff bounds, we can show that the training and test correlations are concentrated near the true correlation. Specifically, for each individual classifier h we have the two inequalities

\[
\Pr\left(\widehat{\mathrm{corr}}_S(h) > \mathrm{corr}(h) + \varepsilon_S\right) \leq e^{-2 m \varepsilon_S^2}, \qquad
\Pr\left(\widehat{\mathrm{corr}}_U(h) < \mathrm{corr}(h) - \varepsilon_U\right) \leq e^{-2 n \varepsilon_U^2}
\]

Let δ denote the probability we allow for failure of uniform convergence. If we set ε_S = √(ln(2p/δ) / (2m)) and ε_U = √(ln(2p/δ) / (2n)), we are guaranteed that all the 2p inequalities hold concurrently with probability at least 1 − δ. We thus set the correlation bound to:

\[
b_i := \widehat{\mathrm{corr}}_S(h_i) - \varepsilon_S - \varepsilon_U
\]

and have with probability ≥ 1 − δ that b is a good correlation vector, i.e. ĉorr_U(h_i) ≥ b_i ∀i ∈ [p]. This argument, using uniform convergence bounds on the classifiers to compute b, can be extended to more nuanced bounds than the union bound (Occam’s Razor) employed here, including VC and Rademacher inequalities ([DGL13]).
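A minimal sketch of this computation (Python/NumPy; the function name and interface are illustrative, not from the text): it forms the empirical training correlations and subtracts the two uniform-convergence penalties ε_S and ε_U.

```python
import numpy as np

def correlation_bounds(F_train, y_train, n_test, delta=0.05):
    """Lower bounds b_i on the test correlations, as in Sec. 3.4.

    F_train: p x m matrix of ensemble predictions on the labeled set,
    y_train: length-m vector of labels in {-1, +1},
    n_test:  number of unlabeled test examples, delta: failure probability.
    """
    p, m = F_train.shape
    corr_S = F_train @ y_train / m                          # empirical training correlations
    eps_S = np.sqrt(np.log(2 * p / delta) / (2 * m))        # training-set deviation
    eps_U = np.sqrt(np.log(2 * p / delta) / (2 * n_test))   # test-set deviation
    return corr_S - eps_S - eps_U                           # b_i, valid w.p. >= 1 - delta
```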

3.5 Interpretation and Discussion

Notions of margin are well-studied in the literature on learning algorithms (see Sec. 3.7), which generally view it as something to be maximized. In contrast, the slack function encourages us to minimize the margin that emerges from our formulation, tempering how aggressively we rely on the labeled data through b, which improves generalization by avoiding weaknesses which the adversary can exploit. So the unlabeled data has a regularizing effect, encouraging the minority label and thereby “muffling” the guidance given by labeled examples. This will recur in the algorithms developed later in Chapters 5 and 6.

Given a weighting σ, we partition the examples x into three subsets, depending on the value of the margin: the hedged set H(σ) := {x : |x^⊤σ| < 1}, the clipped set C(σ) := {x : |x^⊤σ| > 1}, and the borderline set B(σ) := {x : |x^⊤σ| = 1}. Using these sets, we now give some intuition regarding the optimal choice of g and z given in (3.5), for some fixed σ.

Consider first examples x_i in H(σ). Here the optimal g_i is to predict with the score x_i^⊤σ, a number in (−1, 1). Making such an intermediate prediction might seem to be a type of calibration, but this view is misleading. The optimal strategy for the adversary in this case is to set z_i = 0, equivalent to predicting ±1 with probability 1/2 each. The reason that the learner hedges is because if g_i < x_i^⊤σ, the adversary would respond with z_i = 1, and with z_i = −1 if g_i > x_i^⊤σ. In either case, the loss of the predictor would increase. In other words, our ultimate rationale for hedging is not calibration, but rather “defensive forecasting” in the spirit of [VTS05].

Next we consider the clipped set x_j ∈ C(σ). In this case, the adversary’s optimal strategy is to predict deterministically, and so the learner matches the adversary here. It is interesting to note that with all else held equal, increasing the margin |x_j^⊤σ| beyond 1 is suboptimal for the learner. Qualitatively, the reason is that while x_j^⊤σ continues to increase, the prediction for the learner is clipped, and so the value for the learner does not increase with the ensemble prediction. The intuition here is that the learner should not be too aggressive with its predictions, as this would leave a weakness that an optimal adversary could exploit to increase error.

3.5.1 Subgradient Conditions

For another perspective on the result of Theorem 2, consider the subdifferential set of the slack function γ at an arbitrary weighting σ:

\[
\partial\gamma(\sigma) = \left\{ \frac{1}{n}\left( \sum_{x_j \in C(\sigma)} x_j\, \mathrm{sgn}(x_j^\top \sigma) + \sum_{x_j \in B(\sigma)} c_j\, x_j\, \mathrm{sgn}(x_j^\top \sigma) \right) - b, \quad \forall c_j \in [0,1] \right\} \tag{3.6}
\]

Note that the hedged set plays no role in ∂γ(σ). Since the slack function γ(·) can be seen to be convex, the subdifferential set (3.6) at any optimal weighting σ* contains the zero vector, i.e.,

\[
\exists c_j \in [0,1] \ \text{ s.t. } \ n b \ -\ \sum_{j :\, x_j^\top \sigma^* > 1} x_j \ +\ \sum_{j :\, x_j^\top \sigma^* < -1} x_j \ =\ \sum_{j :\, |x_j^\top \sigma^*| = 1} c_j\, x_j\, \mathrm{sgn}(x_j^\top \sigma^*) \tag{3.7}
\]

The geometric interpretation of this equation is given in Figure 3.2. The optimal weighting σ ∗ partitions the examples into five sets: hedged, positive borderline and positive clipped, and negative borderline and negative clipped. Taking the difference between the scaled sums of the positive clipped and the negative clipped examples gives a vector that is approximately b. By adding a weighted sum of the borderline examples, b can be obtained exactly.
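Since the subdifferential (3.6) is available in closed form, σ* can be approximated by projected subgradient descent over σ ≥ 0^p. The following sketch (Python/NumPy; the step size and iteration count are arbitrary choices, and any off-the-shelf convex solver could be substituted) ignores the measure-zero borderline set and uses only the clipped examples, exactly as in (3.6).

```python
import numpy as np

def minimize_slack(F, b, steps=2000, lr=0.01):
    """Projected subgradient descent on the slack function over sigma >= 0.

    F: p x n matrix of test-set predictions, b: p-vector of correlation bounds.
    Returns an approximate minimizer sigma of gamma(sigma) from Eq. (3.4)."""
    p, n = F.shape
    sigma = np.zeros(p)
    for _ in range(steps):
        scores = F.T @ sigma
        clipped = np.abs(scores) > 1.0                                 # the clipped set C(sigma)
        subgrad = -b + F[:, clipped] @ np.sign(scores[clipped]) / n    # a subgradient from Eq. (3.6)
        sigma = np.maximum(sigma - lr * subgrad, 0.0)                  # project onto sigma >= 0
    return sigma
```

At an approximate optimum the returned σ satisfies the stationarity condition (3.7) up to the neglected borderline terms.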

3.5.2 Approximate Learning

Another consequence of our formulation is that predictions of the form g(σ) are closely related to dual optima and the slack function. Indeed, by definition of g(σ), the slack function can be rewritten in terms of g(σ) as

\[
\gamma(\sigma) = -b^\top \sigma + \frac{1}{n}\left\| F^\top \sigma - g(\sigma)\right\|_1 + 1 \ \geq\ \min_{\sigma' \geq 0^p}\left[ -b^\top \sigma' + \frac{1}{n}\left\| F^\top \sigma' - g(\sigma)\right\|_1 + 1 \right]
\]

Figure 3.2. A schematic illustration of the optimal σ* ≥ 0^p, partitioning examples into clipped, borderline, and hedged sets. The vector nb is the difference between the sums of two categories of clipped examples: those with high ensemble prediction (x^⊤σ* > 1) and low prediction (< −1). The effect of B(σ*) is neglected for simplicity.

The minimization is simply the dual problem of the worst-case correlation suffered by g(σ), i.e. max_{z ∈ [−1,1]^n, (1/n)Fz ≥ b} −(1/n) z^⊤[g(σ)] (this is proved in Lemma 11). We now state this formally.

Observation 3. For any weight vector σ ≥ 0^p, the worst-case error after playing g(σ) is bounded by

\[
\max_{\substack{z \in [-1,1]^n, \\ \frac{1}{n} F z \geq b}} \ \frac{1}{2}\left(1 - \frac{1}{n} z^\top [g(\sigma)]\right) \ \leq\ \frac{1}{2}\, \gamma(\sigma)
\]

Observation 3 shows that convergence guarantees for optimizing the slack function directly imply error guarantees on predictors of the form g(σ), i.e. prediction rules of the form in Fig. 3.1.

3.5.3 Characteristics of the Solution

We now make some brief observations about the minimax solution, which formalize some of the specific intuitions discussed in Chapter 2. First, note that no σ such that ‖σ‖_1 < 1 can be optimal, because in such a case γ(σ) ≥ γ(σ/‖σ‖_1) (every example is hedged at both weightings); therefore, we may take ‖σ*‖_1 ≥ 1.

Next, suppose we do not know the matrix F. Then ‖σ*‖_1 = 1. This can be shown by proving the contrapositive. Assuming the negation 1 < ‖σ*‖_1 =: a, there exists a vector x ∈ [−1,1]^p such that x^⊤σ* = ‖σ*‖_1 > 1. If each of the columns of F is equal to x, then by definition of the slack function, γ(σ*/a) < γ(σ*), so σ* cannot be optimal.

In other words, if we want to protect ourselves against the worst case F, then we have to set ‖σ‖_1 = 1 so as to ensure that C(σ) is empty. In this case, the slack function simplifies to γ(σ) = −b^⊤σ, over the probability simplex. Minimizing this is achieved by setting σ_i to be 1 at argmax_{i∈[p]} b_i and zero elsewhere. So, as we stated without proof earlier, in the case that F is unknown, the optimal strategy is simply the ERM strategy – to use the classifier with the best error guarantee.

A couple of examples illustrate the behavior of our formulation, with the first generalizing the example mentioned in Sec. 3.1. Take p to be odd and suppose that n = p. Then set F to be a matrix where each row (classifier) and each column (example) contains (p+1)/2 entries equal to +1 and (p−1)/2 entries equal to −1 (for instance, by setting F_ij = 1 if (i+j) is even, and −1 otherwise). Finally, choose an arbitrary subset of the columns (to have true label −1), and invert all their entries.

In this setup, all classifiers (rows) have the same error: b = (1/p) 1^p. The optimal weight vector in this case is σ* = 1^p, we observe x^⊤σ* = 1 for all x, and the minimax value is V = 1, which corresponds to zero error. Any single rule has an error of 1/2 − 1/(2p), so using F with p classifiers leads to a p-fold improvement over random guessing! Of course, this particular case is extremal in some ways; since no point is clipped, there must be many cancellations between ensemble predictions resulting in the scores F^⊤σ*. This echoes the common heuristic belief that, when combining an ensemble of classifiers, we want the classifiers to be "diverse" (e.g. [KW03]). The above example in fact has the maximal average disagreement between pairs of classifiers for a fixed p.

So our formulation recovers ERM without knowledge of F, and can recover an (unweighted) majority vote in cases where this provides dramatic performance improvements. It is also clear (from Obs. 3) that V is always at most the error of the best single classifier, which corresponds to σ being the coordinate basis vector in this classifier's direction (see also Prop. 9 of Ch. 4). The real algorithmic benefit of our unified formulation is in automatically interpolating between these extremes in a manner that minimizes worst-case error. To illustrate, suppose F is given in Table 2.2 from Chapter 2, in which there are six classifiers partitioned into two blocs, and six equiprobable test examples. In the earlier chapter, we argued that the true labeling must be + on all examples. It can easily be verified that our algorithm recovers this perfectly with a weighting of σ* = (1;1;1;0;0;0).
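To make the first example above concrete, the following small sketch (Python/NumPy) builds such an F and verifies the claims; it uses a circulant construction of our own choosing, rather than the parity-based instance mentioned above, so that every row and every column has exactly (p+1)/2 entries equal to +1.

    import numpy as np

    p = 7                                     # odd number of classifiers; n = p examples
    n = p
    # Circulant variant of the example: row i has +1 in the (p+1)/2 positions j with
    # (j - i) mod p < (p+1)/2 and -1 elsewhere, so every row and column has (p+1)/2 ones.
    F = np.array([[1 if (j - i) % p < (p + 1) // 2 else -1 for j in range(n)]
                  for i in range(p)])

    rng = np.random.default_rng(0)
    y = rng.choice([-1, 1], size=n)           # arbitrary true labels
    F = F * y                                 # invert the columns whose true label is -1

    b = (F @ y) / n                           # per-classifier correlations, all equal to 1/p
    sigma_star = np.ones(p)                   # the weighting claimed to be optimal above
    scores = F.T @ sigma_star                 # ensemble scores x_j^T sigma*, all of magnitude 1
    g = np.clip(scores, -1, 1)

    print(b)                                  # [1/p, ..., 1/p]
    print(bool(np.all(np.sign(g) == y)))      # True: the aggregated prediction makes no errors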

3.5.4 Independent Label Noise

An interesting variation on the game is to limit the adversary to z_i ∈ [−α_i, α_i] for some vector α = (α_1; ...; α_n) ∈ [0,1)^n. This corresponds to assuming a level 1 − α_i of independent label noise on example i: the adversary is not allowed to set the label deterministically, but is forced to flip example i's label independently with probability (1/2)(1 − α_i).

Solving the game in this case shows that if we know some of the ensemble's errors to be due to random noise, then we can find a weight vector σ that gives better worst-case performance than without such information.

Proposition 4 (Independent Label Noise).

  " n # 1 1 > 1 > 1  >  min max 1 − z g = min −b σ + α j max x j σ ,1 g∈[−1,1]n −α≤z≤α, 2 n 2 σ≥0p n ∑ 1 j=1 n Fz≥b 1 < min [γ(σ)] = V 2 σ≥0p Our prediction tends to clip – predict with the majority vote – more on examples with more known random noise, because it gains in minimax correlation by doing so. This mimics the Bayes-optimal classifier, which is always a majority vote. A similar result holds for the asymmetric-noise case.

3.6 Computational Considerations

The learning algorithm we presented has two steps. The first, calculating b, is efficient and straightforward: Section 3.4 shows that this can be done as efficiently as simply computing the average error over training examples. So our ability to produce g* depends on our ability to find the optimal weighting σ* by minimizing the slack function γ(σ) over σ ≥ 0^p, with b taken as given. Note that typically p ≪ n, and so it is a great computational benefit in this case that the optimization is over the dual variables. The slack function is an average over the n test examples, and has a natural limiting object −b^⊤σ + E_{x∼D}[max(|x^⊤σ|, 1)]; it is an ideal candidate for stochastic optimization algorithms. We suggest stochastic gradient descent (SGD) as a constant-memory algorithm, since the function is convex. We discuss this more in the next chapter, and all the novel practical algorithms that we experiment with later in Part I use some variation of SGD to optimize the slack function or similar.
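Concretely, here is a minimal sketch of the constant-memory SGD approach suggested above (Python/NumPy; the function and variable names are ours), assuming the p×n prediction matrix F and the vector b have already been estimated as in Section 3.4:

    import numpy as np

    def sgd_slack_minimizer(F, b, n_epochs=10, lr0=0.1, seed=0):
        """Projected SGD on gamma(sigma) = -b^T sigma + (1/n) sum_j max(|x_j^T sigma|, 1),
        keeping sigma >= 0 coordinatewise; one unlabeled example is touched per step."""
        rng = np.random.default_rng(seed)
        p, n = F.shape
        sigma = np.zeros(p)
        t = 0
        for _ in range(n_epochs):
            for j in rng.permutation(n):
                t += 1
                x = F[:, j]                       # ensemble predictions on one unlabeled example
                score = x @ sigma                 # the margin x_j^T sigma
                # subgradient of max(|score|, 1): sign(score) * x if |score| > 1, else 0
                grad = -b + (np.sign(score) * x if abs(score) > 1.0 else 0.0)
                sigma = np.maximum(sigma - (lr0 / np.sqrt(t)) * grad, 0.0)   # project onto sigma >= 0
        return sigma

    def predict(F, sigma):
        """The clipped predictions g(sigma) of Fig. 3.1: clamp each ensemble score to [-1, 1]."""
        return np.clip(F.T @ sigma, -1.0, 1.0)

Each step reads a single column of F, so the memory footprint is O(p) regardless of n.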

3.7 Related Work

Our work relates to the literature on boosting for forming ensembles, in which the noteworthy work of [SFBL98] shows general bounds on the error of a weighted majority vote ε_WMV(σ̂) under any distribution σ̂, based purely on the distribution of a version of the margin on labeled data. As mentioned earlier, such margins recur in many classification algorithms like support vector machines ([CV95]) and boosting ([SF12]), where they give rise to max-margin methods and analysis ([SFBL98]).

Interestingly, [AUL09] take a related approach to prove bounds on ε_WMV(σ̂) in a transductive setting, as a function of the average ensemble error b^⊤σ̂ and the test data margin distribution; but their budgeting is looser and deals purely with majority votes, in contrast to our g in a hypercube. The transductive setting has general benefits for averaging-based bounds also ([BL03]). One class of philosophically related methods to ours uses moments of labeled data in the statistical learning setting to find a minimax optimal classifier; notably among linear separators ([LGBJ01]) and conditional label distributions under log loss ([LZ14]). Our formulation instead uses only one such moment and focuses on unlabeled data, and is thereby more efficiently able to handle a rich class of dependence structure among classifier predictions, not just low-order moments. There is also a long tradition of analyzing worst-case binary prediction of online sequences, from which we highlight the work of Feder et al. ([FMG92]), which shows universal optimality for bit prediction of a piecewise linear "clipped" decision function very similar to ours in Fig. 3.1. The work of [CBFH+93] shows that this idea results in optimal error when predicting with expert advice as well, and similar results have been shown in related settings ([Vov90, AP13]). Our emphasis on the benefit of considering global effects (our transductive setting) even when data are i.i.d. is in the spirit of the idea of shrinkage, well known in the statistical literature as exemplified by the James-Stein estimator ([EM77]).

Acknowledgements

This chapter contains material from the Conference on Learning Theory, 2015 ("Optimally Combining Classifiers Using Unlabeled Data," Balsubramani and Freund). The dissertation author was the primary investigator and author of this paper.

Chapter 4

Optimal Binary Classifier Aggregation for General Losses

4.1 Setup

In Chapter 3, we described how to optimally aggregate binary classifiers to minimize classification error, known in machine learning and decision theory as zero-one loss. However, the zero-one loss is inappropriate for other common binary classification tasks, such as estimating label probabilities, and handling false positives and false negatives differently. Such goals motivate the use of different losses, like the logarithmic loss and cost-weighted misclassification loss respectively. In this chapter, we generalize the setup of Chapter 3 to these loss functions and a large class of others. Like the earlier work, we show that the choice of loss function completely governs an ensemble aggregation algorithm that is minimax optimal in our setting, and is very efficient and scalable to boot. The algorithm learns a weighting over ensemble classifiers by solving a convex optimization problem. The optimal prediction on each example in the test set turns out to be a sigmoid-like function of a linear combination of the ensemble predictions, using the learned weighting. The minimax structure ensures that this prediction function and the training algorithm are completely data-dependent without parameter choices, relying merely on the structure of the loss function. It also establishes the minimax optimal prediction to have structure reminiscent of a weighted majority vote over the ensemble, and exactly paralleling the prediction function of a generalized linear model ([MN89]). In short, the optimal prediction function we derive takes the form of an artificial neuron.

4.1.1 Loss Functions

Recall the notation of Chapter 3, defining the classifier ensemble H, unlabeled data F, and true labels z. In that setting, we suffer loss on each test point according


to our expected classification error on it, if our prediction g_j ∈ [−1,1] is interpreted as a randomized binary value in the same way as z_j. So if the true label for example j is y_j = 1, then the loss of predicting g_j on it is ℓ_+(g_j) = (1/2)(1 − g_j); this is {0,1} when g_j = {1,−1} respectively, and a convex combination of the two when g_j ∈ (−1,1). Similarly, if y_j = −1, then the loss is ℓ_−(g_j) = (1/2)(1 + g_j). We call ℓ_± the partial losses here, following earlier work (e.g. [RW10]). Since the true label can only be ±1, the partial losses fully specify the decision-theoretic problem we face, and changing them is tantamount to altering the prediction task.

To guide intuition as to what such partial losses could conceivably look like, observe that they intuitively measure discrepancy to the true label ±1. As a result it makes sense, e.g., for ℓ_+(g) to be decreasing as g increases toward the notional true label +1. This suggests that both partial losses ℓ_+(·) and ℓ_−(·) should be monotonic, which we assume in this chapter.

Assumption 1. Over the interval (−1,1), ℓ_+(·) is decreasing and ℓ_−(·) is increasing, and both are twice differentiable.

We view Assumption 1 as natural, as we have motivated (differentiability is convenient for our proofs, but most of our arguments do not require it; see Section 4.3 for details). Notably, we do not require convexity or symmetry of the losses. We refer to loss functions whose partial losses satisfy Assumption 1 as "general losses," to contrast them with convex losses or other less broad subclasses; our main learnability result holds for all such losses. The expected loss we suffer with respect to the randomized true label z_j is a linear combination of the partial losses:

\[
\ell(z_j, g_j) := \left(\frac{1 + z_j}{2}\right) \ell_+(g_j) + \left(\frac{1 - z_j}{2}\right) \ell_-(g_j) \tag{4.1}
\]

In decision theory and learning theory, there has been much investigation into the nature of the loss ℓ and its partial losses, particularly on how to estimate the "conditional label probability" z_j using ℓ(z_j, g_j). A natural operation to do this is to minimize the loss over g_j; accordingly, a loss ℓ such that argmin_{g∈[−1,1]} ℓ(z_j, g) = z_j (for all z_j ∈ [−1,1]) is called a proper loss ([BSS05, RW10]), which will be used in later discussions.

4.1.2 Minimax Formulation

The predictor's goal is to minimize the worst-case expected loss on the test data (w.r.t. the randomized labeling z), using the loss function ℓ(z,g) as defined earlier in Equation (4.1):
\[
\ell(z, g) := \frac{1}{n}\sum_{j=1}^{n} \ell(z_j, g_j)
\]

This goal can be written as the following optimization problem, a two-player zero-sum game:

\[
V := \min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \; \ell(z,g) \tag{4.2}
\]
\[
\phantom{V} = \min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \; \frac{1}{n}\sum_{j=1}^{n}\left[ \left(\frac{1+z_j}{2}\right)\ell_+(g_j) + \left(\frac{1-z_j}{2}\right)\ell_-(g_j) \right] \tag{4.3}
\]

In this chapter, we solve the learning problem faced by the predictor, finding an optimal strategy g* realizing the minimum in (4.2) for any given "general loss" ℓ. This strategy guarantees good worst-case performance on the unlabeled dataset, with an upper bound of V on the loss. This bound is perfectly tight, by virtue of the argument above – for all z_0 and g_0 obeying the constraints, our definitions give the tight inequalities

\[
\min_{g\in[-1,1]^n} \ell(z_0, g) \;\le\; V \;\le\; \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \ell(z, g_0) \tag{4.4}
\]

which are met with equality, respectively, by g* and some z*(g_0). In our formulation of the problem, the constraints on the adversary play a central role. As discussed previously, these constraints encode the information we have about the true labels. Without them, the adversary would find it optimal to trivially guarantee error (very close to) 1/2 by simply setting all labels uniformly at random (z = 0^n). It is easy to see that adding more information through constraints will never raise the error bound V.¹

So far, we have given no assumption about the characteristics of ℓ(z,g) other than Assumption 1. Many of our results will require only this, holding for these "general losses." This brings us to our contributions:

1. We give the exact minimax g* ∈ [−1,1]^n for general losses (in Section 4.2.1). The optimal prediction on each example j is a sigmoid function of a fixed linear combination of the ensemble's p predictions on it, so g* is a non-convex function of the ensemble predictions on x_j. This is also a constructive proof of a bound on the worst-case loss of any predictor constructed from the ensemble, by Equation (4.4).

2. We derive an efficient algorithm for finding g∗, by solving a p-dimensional convex optimization problem (Section 4.2.2). We prove this for a broad subclass of losses (the conditions of Lemma 2), including all convex ERM surrogate losses. Extensions to weighted problems and others are in Section 4.3.

¹ Though it may pose statistical difficulties to do with uniform convergence over the ensemble ([BF15b]).

3. The optimal g∗ and an efficient algorithm for it, as above, extended to the case when the constraints can arise from general loss bounds on ensemble classifiers (see appendices), rather than classifier error rates.

4.2 Results for Binary Classification

A few more quantities will be convenient to specify before discussing our main results. Based on the loss, define the score function Γ : [−1,1] → ℝ as

Γ(g) := ℓ_−(g) − ℓ_+(g)

(We will also write the vector Γ(g) componentwise with [Γ(g)]_j = Γ(g_j) for convenience, so that Γ(h_i) ∈ ℝ^n and Γ(x_j) ∈ ℝ^p.) Observe that by our assumptions, Γ(g) is increasing on its domain, so we can discuss its inverse Γ^{-1}(m), which is typically sigmoid-shaped.²

With these we will set up the solution to the game (4.2). The solution depends on the optimum of a convex function, defined here for further use.

Definition 1 (Potential Well). Define the potential well
\[
\Psi(m) :=
\begin{cases}
-m + 2\ell_-(-1) & \text{if } m \le \Gamma(-1) \\
\ell_+(\Gamma^{-1}(m)) + \ell_-(\Gamma^{-1}(m)) & \text{if } m \in (\Gamma(-1), \Gamma(1)) \\
m + 2\ell_+(1) & \text{if } m \ge \Gamma(1)
\end{cases} \tag{4.5}
\]

Lemma 2. The potential well Ψ(m) is continuous and 1-Lipschitz. It is also convex under any of the following conditions:

(A) The partial losses ℓ_±(·) are convex over (−1,1).

(B) The loss function ℓ(·,·) is a proper loss.

(C) ℓ_−'(x) ℓ_+''(x) ≥ ℓ_−''(x) ℓ_+'(x) for all x ∈ (−1,1).

(Indeed, the proof shows that the last condition is both sufficient and necessary for convexity of Ψ, under Assumption 1.) So the potential wells for different losses are shaped roughly similarly, as seen in Figure 4.1. Lemma 2 tells us that the potential well is easy to optimize under any of the given conditions. Note that these conditions encompass convex surrogate losses commonly used in ERM, including all such "margin-based" losses (convex univariate functions of z_j g_j). These constitute a large class of losses introduced primarily for their favorable computational properties relative to direct 0-1 loss minimization.

² If Γ does not have a unique inverse, our arguments also work, mutatis mutandis, with the pseudoinverse Γ^{-1}(m) = inf{g ∈ [−1,1] : Γ(g) ≥ m}.

Figure 4.1. At left are plots of potential wells. At right are optimal prediction functions g, as a function of ensemble score. Both are shown for various losses, as listed in Section 4.2.3.

An easily optimized potential well benefits us, because our learning problem basically consists of optimizing it over the unlabeled data, as we will soon make explicit. The function we will actually need to optimize is in terms of the dual parameters, so we call it the slack function, expanding the definition of Chapter 3 to general losses.

Definition 3 (Slack Function for General Losses). Let σ ≥ 0^p be a weight vector over H (not necessarily a distribution). The vector of ensemble scores is F^⊤σ = (x_1^⊤σ, ..., x_n^⊤σ), whose elements' magnitudes are the margins. The prediction slack function is

\[
\gamma(\sigma, b) := \gamma(\sigma) := -b^\top\sigma + \frac{1}{n}\sum_{j=1}^{n} \Psi(x_j^\top\sigma) \tag{4.6}
\]

An optimal weight vector σ* is any minimizer of the slack function: σ* ∈ argmin_{σ ≥ 0^p} [γ(σ)].

4.2.1 Solution of the Game

These are used to describe the minimax equilibrium of the game (4.2), in our main result.

Theorem 4. The minimax value of the game (4.2) is

" n # 1 ∗ 1 > 1 > min max `(z,g) = V = γ(σ ) = min −b σ + Ψ(x j σ) n n ≥ p ∑ g∈[−1,1] z∈[−1,1] , 2 2 σ 0 n j=1 1 n Fz≥b 30

The minimax optimal predictions are defined as follows: for all j ∈ [n],
\[
g_j^* := g_j(\sigma^*) =
\begin{cases}
-1 & \text{if } x_j^\top\sigma^* \le \Gamma(-1) \\
\Gamma^{-1}(x_j^\top\sigma^*) & \text{if } x_j^\top\sigma^* \in (\Gamma(-1), \Gamma(1)) \\
1 & \text{if } x_j^\top\sigma^* \ge \Gamma(1)
\end{cases} \tag{4.7}
\]

g_j^* is always an increasing sigmoid-shaped function, as shown in Figure 4.1. We can also redo the proof of Theorem 4 when g ∈ [−1,1]^n is not left as a free variable to be set in the game, but is instead preset to g(σ) as in (4.7) for some (possibly suboptimal) weight vector σ.

Observation 5. For any weight vector σ_0 ≥ 0^p, the worst-case loss after playing g(σ_0) is bounded by
\[
\max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \ell(z, g(\sigma_0)) \;\le\; \frac{1}{2}\gamma(\sigma_0)
\]

The proof is a simpler version of that of Theorem 4; there is no minimum over g to deal with, and the minimum over σ ≥ 0^p in Equation (4.16) is upper-bounded by using σ_0. This result is an expression of weak duality in our setting, and generalizes Observation 3 from the previous chapter.

4.2.2 The Ensemble Aggregation Algorithm

Theorem 4 defines a prescription for aggregating the given ensemble predictions on the test set. This can be stated in terms of a learning algorithm and a prediction method. Our analysis implies guarantees on the behavior of both.

Learning: Minimize the slack function γ(σ), finding the minimizer σ* that achieves V. This is a convex optimization under broad conditions (Lemma 2), and when the test examples are i.i.d., the Ψ term is a sum of n i.i.d. functions. As such it is readily amenable even to standard first-order optimization methods, which require only O(1) test examples at a time. In practice, learning employs such methods to approximately minimize γ, finding some σ_A such that γ(σ_A) ≤ γ(σ*) + ε for some small ε. Standard convex optimization methods will do this because the slack function is Lipschitz, as Lemma 2 shows (combined with the observation that ‖b‖_∞ ≤ 1).

Prediction: Predict g(σ*) on any test example, as indicated in (4.7). This decouples the prediction task on each test example, which is as efficient as p-dimensional linear prediction, requiring O(p) time and memory. After finding an ε-approximate minimizer σ_A in the learning step as above, Observation 5 tells us that the prediction g(σ_A) has loss guaranteed to be within ε/2 of V.

In particular, note that there is no algorithmic dependence on n in either step in a statistical learning setting, so our transductive formulation is no less tractable than a stochastic optimization setting in which i.i.d. data arrive one at a time.
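Before turning to specific losses, here is a minimal instantiation of these two steps for the log loss (Python/NumPy; the names are ours, and we take the log-loss partial losses to be ℓ_+(g) = −ln((1+g)/2) and ℓ_−(g) = −ln((1−g)/2), the usual rescaling of log loss to predictions in [−1,1]).

    import numpy as np

    # For the log loss, Gamma(g) = ln((1+g)/(1-g)) and Gamma^{-1}(m) = tanh(m/2).
    # Since Gamma(-1) = -inf and Gamma(1) = +inf, the potential well (4.5) has no
    # linear pieces: Psi(m) = 2 * ln(e^{m/2} + e^{-m/2}).

    def psi_log(m):
        return 2.0 * np.logaddexp(m / 2.0, -m / 2.0)

    def psi_log_grad(m):
        # d Psi / dm = tanh(m/2) lies in (-1, 1), so Psi is 1-Lipschitz, as Lemma 2 states.
        return np.tanh(m / 2.0)

    def slack_log(sigma, F, b):
        # gamma(sigma) from (4.6): -b^T sigma plus the average potential well of the margins.
        return -b @ sigma + np.mean(psi_log(F.T @ sigma))

    def g_log(F, sigma):
        # Minimax-optimal predictions (4.7): the sigmoid Gamma^{-1} applied to each margin.
        return np.tanh((F.T @ sigma) / 2.0)

The learning step is then the same projected SGD loop sketched in Section 3.6, with the clipped-margin subgradient replaced by psi_log_grad; the prediction step simply applies g_log to the learned weighting.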

4.2.3 Examples of Different Losses

To further illuminate Theorem 4, we detail a few special cases in which ℓ_+, ℓ_− are explicitly defined. These losses may be found throughout the literature on decision theory and learning; for further information, see [RW10]. The key functions Ψ and g* are listed for these losses in Table 4.1, and in most cases plotted in Figure 4.1. We can see that the nonlinearities used for g* are all sigmoids, which arises solely from the minimax structure and the box constraints inherent in the classification game (more details in Section 4.5.1).

• 0-1 Loss: Here g j is taken to be a randomized binary prediction; this case was developed in the previous chapter.

• Log Loss

• Square Loss

• Cost-Weighted Misclassification (Quantile) Loss: This is defined with a param- eter c ∈ [0,1] representing the relative cost of false positives vs. false negatives, making the Bayes-optimal classifier the c-quantile of the conditional label distribu- tion ([She05]).

• Exponential Loss

• Logistic Loss

• Hellinger Loss: This is typically given for p,y ∈ [0,1] as

\[
\frac{1}{2}\left[ \left(\sqrt{p} - \sqrt{y}\right)^2 + \left(\sqrt{1-p} - \sqrt{1-y}\right)^2 \right]
\]

Our formulation is equivalent when the prediction and label are rescaled to [−1,1].

• “AdaBoost Loss”: If the goal of AdaBoost ([SF12]) is interpreted as class proba- bility estimation, the implied loss is proper and given in [BSS05, RW10].

• Absolute Loss and Hinge Loss: The absolute loss can be defined by ℓ_∓^abs(g_j) = 1 ± g_j, and the hinge loss also has ℓ_∓(g_j) = 1 ± g_j, since the kink in the hinge loss only lies at g_j = ∓1. These partial losses are the same as for 0-1 loss up to scaling, and therefore all our results for Ψ and g* are as well. So these losses are not shown in Table 4.1.

Table 4.1. Some binary classification losses, as in Sec. 4.2.3. For convenience, we write clip(x) := min(max(x, −1), 1). (For each of the losses of this section – 0-1, log, square, logistic, Hellinger, "AdaBoost," exponential, and cost-weighted with parameter c – the table lists the partial losses ℓ_±, the potential well Ψ(m), and the optimal prediction function g.)

4.2.4 Technical Discussion

There are two notable ways in which the result of Theorem 4 is particularly convenient and general. First, the fact that ℓ(z,g) can be non-convex in g, yet solvable as a convex problem, is a major departure from previous work. Second, the solution has a convenient dependence on n like in Chapter 3 (cf. [BF15a]), simply averaging a function over the unlabeled data; this is not only mathematically convenient but also again makes stochastic optimization practical using only constant memory. Both these favorable properties stem from the structure of the binary classification problem.

We can describe this by examining the proof (Section 4.5.1), in particular the optimization problem we constructed in Equation (4.17). In it, the constraints which do not explicitly appear with Lagrange parameters are all box, or L_∞-norm, constraints. These decouple over the n test examples, which is a critical reason we are able to obtain results for such general non-convex problems. It allows us to reduce the problem to the one-dimensional optimization at the heart of Equation (4.17), which we can solve with ad hoc methods. When we do so, the g_i are "clipped" sigmoid functions because of the bounding effect of the [−1,1] constraint and the loss function structure. We introduce Lagrange parameters σ only for the p remaining constraints in the problem, which do not so decouple, applying globally over the n test examples. However, these constraints only depend on n as an average over examples (which is how they arise in dual form in Equation (4.20) of the proof), and the loss function itself is also an average (Equation (4.15)). Putting these together shows how the favorable dependence on n comes about here, making an efficient solution possible to a potentially flagrantly non-convex problem. The technique of optimizing only "halfway into" the dual allows us to readily manipulate the minimax problem exactly without using an approximation like weak duality, despite the lack of convexity in g. A final remaining bit of structure in the binary classification problem – the linearity of ℓ(z,g) in z – is used in Section 4.4 to make a further sweeping generalization of Theorem 4.

The conditions defining "general losses" in Assumption 1 are notable for their generality. Differentiability is convenient, but by no means necessary. We only choose to use it because first-order conditions are a convenient way to establish convexity of the potential well in Lemma 2. It is never used elsewhere, including in the minimax argument of Theorem 4. Also, we assert that the monotonicity condition on ℓ_± is natural for a loss function. If it were otherwise, for instance if there were g_1 < g_2 < g_3 s.t. ℓ_−(g_1) = ℓ_−(g_3) ≠ ℓ_−(g_2), then this would indicate that the loss function is not simply a function of the disparity between prediction and label; in this case perhaps a different loss function is warranted. In technical terms, the minimax manipulations we use to prove Theorem 4 are structured to be valid even if ℓ_± are non-monotonic; but in this case, g_j^* could turn out to be multi-valued and hence not a true function of the example's margin. So we consider there to be significant evidence that our assumption on the losses is necessary, but do not prove it here.

As we observed for the 0-1 loss in Chapter 3, the minimax structure's decoupling of the data for optimization purposes is remarkably powerful, because the original primal problem is jointly over the entire dataset, avoiding further independence or decoupling assumptions.

4.3 Related Work and Extensions

A number of comments are in order. Chapter 3 (cf. [BF15a]) addresses a problem, 0-1 loss minimization, that is well known to be strongly NP-hard when solved directly over labeled data ([GR09]). It is crucial here to formulate it in the transductive setting, in which the marginal distribution of the unlabeled data is known (reflected in the large amount of unlabeled data that we consider jointly). It gives the dual problem an independently interesting interpretation, so the learning problem is on the always-convex Lagrange dual function and is therefore tractable. This work generalizes that idea, as the possibly non-convex partial losses are minimized transductively via a straightforward convex optimization.

A similar formal technique, including the crucial use of L_∞-norm constraints and averaging over n (≫ p) to decompose optimization efficiently over many examples, is used implicitly, for a different purpose, in the "drifting" repeated game analysis of boosting ([SF12], Sec. 13.4.1). Existing boosting work ([SF12]) is loosely related to our approach in being a transductive game invoked to analyze ensemble aggregation, but it does not consider unlabeled data and draws its power instead from being a repeated game. Our transductive formulation involves no surrogates or relaxations of the loss, which we view as a significant advantage over previous work – it allows us to bypass the consistency and agnostic-learning discussions ([Zha04, BJM06]) common to ERM methods that use convex risk minimization. Convergence analyses of such methods make heavy use of convexity of the losses and are generally done presupposing a linear weighting on h ∈ H ([TDS15]), whereas in our work such structure emerges directly from Lagrange duality and involves no convexity to derive the worst-case-optimal predictions. Prior work does express the idea that is our explicit conclusion – the learning problem is completely determined by the choice of loss function.

Our assumptions on the loss, particularly condition (C) of Lemma 2, have arisen independently in the setting of online learning in earlier work of Haussler et al. ([HKW98]). This is suggestive of open connections between our techniques and theirs, though their learning setting does appear unrelated. We also note with interest that the same family of general losses was defined by [She05] in the ERM setting (dubbed "F-losses" there) – in which condition (C) of Lemma 2 also has significance ([She05], Prop. 2) – but has seemingly not been revisited thereafter. Indeed, we emphasize that our result on the minimax equilibrium (Theorem 4) holds for all such general losses – the slack function may not be convex and easy to optimize unless the further conditions of Lemma 2 are met, but the interpretation of the optimum decision rule in terms of margins and sigmoid functions remains. All this emerges from the intrinsic decision-theoretic structure of the problem. Beyond our related discussion in Section 4.5.1, another basic observation is that the function g(x_j^⊤σ) is always increasing in x_j^⊤σ for general losses, because the score function Γ is increasing. This monotonicity typically needs to be assumed in generalized linear models ([MN89]) and related settings, which use such objects for prediction. The score function can be thought of as analogous to the link function used by GLMs (log loss corresponds to the scaled and shifted logit link, as seen from Table 4.1), with its inverse being used for prediction.
Further fleshing out the links between our minimax analysis and GLM learning would be interesting future work.

4.3.1 Weighted Test Sets, Covariate Shift, and Label Noise

Though our results here deal with binary classification of a uniformly weighted test set, note that our formulation deals with a weighted test set with only a modification to the slack function.

Theorem 6. For any vector r ≥ 0^n,

n " n > !# 1 1 > 1 x j σ min max r j`(z j,g j) = min −b σ + r jΨ n n ∑ ≥ p ∑ g∈[−1,1] z∈[−1,1] , n j=1 2 σ 0 n j=1 r j 1 n Fz≥b

With σ_r^* defined as the minimizer of the above, the optimal predictions are g* = g(σ_r^*), as in Theorem 4.

Such weighted classification can be parlayed into algorithms for general supervised learning problems via learning reductions ([BLZ08]). Allowing weights on the test set for the evaluation is tantamount to accounting for known covariate shift in our setting; it would be interesting, though outside our scope, to investigate scenarios with more uncertainty.
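The change required by Theorem 6 is confined to the slack function; a minimal sketch of the weighted objective (Python/NumPy, names ours), to be minimized over σ ≥ 0^p exactly as before:

    import numpy as np

    def weighted_slack(sigma, F, b, r, psi):
        """-b^T sigma + (1/n) sum_j r_j * Psi(x_j^T sigma / r_j), as in Theorem 6.
        psi is the potential well of the chosen loss (e.g. psi_log above); r must
        have strictly positive entries so the rescaled margins are well defined."""
        margins = F.T @ sigma                 # length-n vector of ensemble scores x_j^T sigma
        return -b @ sigma + np.mean(r * psi(margins / r))

Its minimizer σ_r^* is then used for prediction via g(σ_r^*), exactly as in the unweighted case.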

4.3.2 Uniform Convergence Bounds for b

Given b as a lower bound on ensemble classifier losses, the slack function can be efficiently approximately optimized, translating into a worst-case prediction loss bound, as we have seen in Section 4.2.2. But there is typically error in estimating b, often quantified by uniform convergence (L_∞) bounds on b. If one would like to incorporate the two-sided constraint on b, the solution to our game involves the dual (L_1) norm of the dual vector σ.

Theorem 7. We have
\[
\min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \left\lVert \frac{1}{n}Fz - b \right\rVert_\infty \le \varepsilon}} \ell(z,g) \;=\; \frac{1}{2}\min_{\sigma\in\mathbb{R}^p}\left[ -b^\top\sigma + \frac{1}{n}\sum_{j=1}^{n}\Psi(x_j^\top\sigma) + \varepsilon\,\lVert\sigma\rVert_1 \right]
\]

Let σ_s^* be the minimizer of the right-hand side above. Then the optimal g* = g(σ_s^*), the same function of the optimal weighting as in (4.7).

(This is proved exactly like Theorem 4, but using Lemma 13 instead of Lemma 11 in that proof.) So when the ensemble losses are uniformly bounded, we are now searching over all vectors σ (not just nonnegative ones) in an L1-regularized version of the original optimization problem in Theorem 4. To our knowledge, this particular analysis has been addressed in prior work only for the special case of the entropy objective function on the probability simplex – [DPS04] discusses that special case further.

4.4 Constraints on General Losses for Binary Classification

In the previous sections of this chapter, we allow the evaluation function of the game to be a general loss, but assume that the constraints (our information about the ensemble) are in terms of zero-one loss. This section shows how to relax that assumption, allowing each classifier h_i to constrain the test labels z not with the zero-one loss of h_i's predictions, but rather with some other general loss.

This is possible because when the true labels are binary, all the losses we consider are linear in z. This is seen in (4.3), which can be rewritten as
\[
\ell(z,g) = \frac{1}{n}\sum_{j=1}^{n} \frac{1}{2}\left[\ell_+(g_j) + \ell_-(g_j)\right] - \frac{1}{2n}\, z^\top[\Gamma(g)].
\]
Accordingly, recall that h_i ∈ [−1,1]^n is the vector of test predictions of hypothesis h_i. Suppose we have an upper bound on the generalization loss of h_i, i.e. ℓ(z, h_i) ≤ ε_i^ℓ. If we define

\[
b_i^\ell := \frac{1}{n}\sum_{j=1}^{n}\left[\ell_+(h_i(x_j)) + \ell_-(h_i(x_j))\right] - 2\varepsilon_i^\ell \tag{4.8}
\]
then we can use the definition of ℓ(z,g) to write

\[
\ell(z, h_i) \le \varepsilon_i^\ell \;\iff\; \frac{1}{n}\, z^\top[\Gamma(h_i)] \ge b_i^\ell \tag{4.9}
\]
Now (4.9) is a linear constraint on z, just like each of the error constraints earlier considered in (3.2). We can derive an aggregation algorithm with constraints like (4.9), using essentially the same analysis as employed in Section 4.2 to solve the game (4.2). In summary, any classifier can be incorporated into our framework for aggregation if we have a generalization loss bound on it, where the loss can be any of the losses we have considered. This allows an enormous variety of constraint sets, as each classifier considered can have constraints corresponding to any number of loss bounds on it, including 0 (it can be omitted from the constraint set, providing no information about z), or > 1 (it can conceivably have multiple loss bounds using different losses). For

instance, h1 can yield a constraint corresponding to a zero-one loss bound, h2 can yield one constraint corresponding to a square loss bound and another corresponding to a zero-one loss bound, and so on.
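To illustrate how a general loss bound becomes a linear constraint on z, here is a small sketch (Python/NumPy; the names and the choice of log loss are ours) converting a bound ℓ(z, h_i) ≤ ε_i into the pair (Γ(h_i), b_i^ℓ) of Equations (4.8) and (4.9).

    import numpy as np

    # Log-loss partial losses, as in the earlier sketch; predictions must lie in (-1, 1).
    ell_plus  = lambda g: -np.log((1.0 + g) / 2.0)
    ell_minus = lambda g: -np.log((1.0 - g) / 2.0)
    Gamma     = lambda g: ell_minus(g) - ell_plus(g)

    def loss_bound_to_constraint(h_i, eps_i):
        """Turn a generalization bound ell(z, h_i) <= eps_i into the linear constraint
        (1/n) z^T row >= b_i of (4.9). h_i is the length-n vector of test predictions."""
        row = Gamma(h_i)                                               # a row of the redefined F
        b_i = np.mean(ell_plus(h_i) + ell_minus(h_i)) - 2.0 * eps_i    # Equation (4.8)
        return row, b_i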

4.4.1 Matching Objective and Constraint Losses

Despite this generality, we can glean some intuition about the aggregation method for general losses. To do so in the rest of this section, we only consider the case when each classifier contributes exactly one constraint to the problem, and the losses used for these constraints are all the same as each other and as the loss ℓ used in the objective function. In other words, the minimax prediction problem we consider is

\[
V^\ell := \min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \forall i\in[p]:\; \ell(z,h_i) \le \varepsilon_i^\ell}} \ell(z,g) \;=\; \min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \forall i\in[p]:\; \frac{1}{n}z^\top[\Gamma(h_i)] \ge b_i^\ell}} \ell(z,g) \tag{4.10}
\]
The matrix F and the slack function from earlier in this chapter are therefore redefined for the purposes of this section:

\[
F_{ij}^\ell := \Gamma(h_i(x_j)) = \ell_-(h_i(x_j)) - \ell_+(h_i(x_j)) \tag{4.11}
\]

\[
\gamma^\ell(\sigma, b^\ell) := \gamma^\ell(\sigma) := -[b^\ell]^\top\sigma + \frac{1}{n}\sum_{j=1}^{n} \Psi\!\left([\Gamma(x_j)]^\top\sigma\right) \tag{4.12}
\]

where b^ℓ = (b_1^ℓ, ..., b_p^ℓ) and the vectors x_j are now from the new unlabeled data matrix F_{ij}^ℓ. The game (4.10) is clearly of the same form as the earlier formulation (4.2). Therefore, its solution has the same structure as in Theorem 4, proved using that theorem's proof:

Theorem 8. The minimax value of the game (4.10) is V^ℓ := (1/2) γ^ℓ(σ^{ℓ*}) := (1/2) min_{σ ≥ 0^p} γ^ℓ(σ). The minimax optimal predictions are defined as follows: for all j ∈ [n],
\[
g_j^* := g_j(\sigma^{\ell*}) =
\begin{cases}
-1 & \text{if } [\Gamma(x_j)]^\top\sigma^{\ell*} \le \Gamma(-1) \\
\Gamma^{-1}\!\left([\Gamma(x_j)]^\top\sigma^{\ell*}\right) & \text{if } [\Gamma(x_j)]^\top\sigma^{\ell*} \in (\Gamma(-1), \Gamma(1)) \\
1 & \text{if } [\Gamma(x_j)]^\top\sigma^{\ell*} \ge \Gamma(1)
\end{cases} \tag{4.13}
\]

This provides a concise characterization of how to solve the semi-supervised binary classification game for general losses. Though on the face of it Theorem 8 is a much stronger result than even Theorem 4, we cannot neglect statistical issues. The loss bounds ε_i^ℓ on each classifier are estimated using a uniform convergence bound over the ensemble with loss ℓ, but the data now considered are not the ensemble predictions, but the predictions passed through the score function Γ. This can be impractical for losses like log loss, for which Γ is unbounded, and therefore uniform convergence to estimate b_i^ℓ in (4.9) is much looser than for 0-1 loss. But such issues are outside our scope here, and our constrained minimax results hold in any case. They may be useful to obtain semi-supervised learnability results for different losses from tighter statistical characterizations, which we consider an interesting open problem.

4.4.2 Beating the Best Classifier and the Best Weighted Majority

We re-emphasize that our guarantee here is the strongest possible worst-case guarantee, because our method is the minimax algorithm. This type of statement lacks precedent in this area, so we conclude our discussions by highlighting a few simple corollaries when the loss is general.

In minimizing the slack function over the dual parameters σ, we perform at least as well as the weighting σ^i ≥ 0^p that puts weight 1 on h_i and 0 on the remaining classifiers h_{i'≠i}. In other words, our predictor always has the option of simply choosing the best single classifier i* and guaranteeing its loss bound ε_{i*}^ℓ. Consequently, our predictor's loss bound is always at most that of any single classifier, proving the following observation.

Proposition 9. V^ℓ ≤ ε_i^ℓ for any classifier i ∈ [p] and any loss ℓ.

We provide a short alternative proof of this in Section 4.5.2, using the definitions of this section to better illuminate how they fit together, even though the proposition is evident from the fact that we are minimizing over {σ : σ ≥ 0^p} ∋ σ^i. Our minimax guarantee means that given the ensemble loss constraints b^ℓ, our algorithm automatically admits superior worst-case loss bounds to any predictor, including any weighted majority vote as well.

Proposition 10. V^ℓ is a better expected test loss guarantee than that of any weighted majority vote.

The precise meaning of Proposition 10 should be stressed. It is saying that any weighted majority vote could face a z, consistent with the information in the ensemble constraints, that leads to more errors than the error guarantee for g*. It does not imply that the actual performance of g* will beat every weighted majority, because the constraints could still be unhelpful and the bounds loose. The empirical results in the following chapters suggest that this and the statistical issues we have discussed in estimating b are major considerations in the empirical performance of such algorithms, but can be dealt with to realize significant practical benefits from unlabeled data.

4.5 Supporting Results and Proofs

4.5.1 Proof of Theorem 4

The main hurdle here is the constrained maximization over z. For this we use the following result, a basic application of Lagrange duality.

Lemma 11. For any a ∈ ℝ^n,
\[
\max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \frac{1}{n}\, z^\top a \;=\; \min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma + a \right\rVert_1 \right]
\]

With this we prove the main result, followed by some comments on the proof.

Proof of Theorem 4. First note that ℓ(z,g) is linear in z:

\[
V = (4.3) = \min_{g\in[-1,1]^n} \; \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \; \frac{1}{2n}\sum_{j=1}^{n}\left[ \ell_+(g_j) + \ell_-(g_j) + z_j\left(\ell_+(g_j) - \ell_-(g_j)\right) \right]
\]
Here we can rewrite the constrained maximization over z using Lemma 11:

\[
\max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \frac{1}{n}\sum_{j=1}^{n} z_j\left[\ell_+(g_j) - \ell_-(g_j)\right]
= \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} -\frac{1}{n}\, z^\top[\Gamma(g)]
= \min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma - \Gamma(g) \right\rVert_1 \right] \tag{4.14}
\]

Substituting (4.14) into (4.3) and simplifying,
\[
V = \frac{1}{2}\min_{g\in[-1,1]^n}\left[ \frac{1}{n}\sum_{j=1}^{n}\left(\ell_+(g_j) + \ell_-(g_j)\right) + \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \frac{1}{n}\sum_{j=1}^{n} z_j\left(\ell_+(g_j) - \ell_-(g_j)\right) \right] \tag{4.15}
\]
\[
= \frac{1}{2}\min_{g\in[-1,1]^n}\left[ \frac{1}{n}\sum_{j=1}^{n}\left(\ell_+(g_j) + \ell_-(g_j)\right) + \min_{\sigma\ge 0^p}\left( -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma - \Gamma(g)\right\rVert_1 \right) \right]
\]
\[
= \frac{1}{2}\min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \min_{g\in[-1,1]^n}\left( \frac{1}{n}\sum_{j=1}^{n}\left[\ell_+(g_j) + \ell_-(g_j)\right] + \frac{1}{n}\left\lVert F^\top\sigma - \Gamma(g)\right\rVert_1 \right) \right] \tag{4.16}
\]
\[
= \frac{1}{2}\min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\sum_{j=1}^{n}\min_{g_j\in[-1,1]}\left( \ell_+(g_j) + \ell_-(g_j) + \left| x_j^\top\sigma - \Gamma(g_j) \right| \right) \right] \tag{4.17}
\]

The absolute value breaks down into two cases, so the inner minimization's objective can be simplified:
\[
\ell_+(g_j) + \ell_-(g_j) + \left| x_j^\top\sigma - \Gamma(g_j) \right| =
\begin{cases}
2\ell_+(g_j) + x_j^\top\sigma & \text{if } x_j^\top\sigma \ge \Gamma(g_j) \\
2\ell_-(g_j) - x_j^\top\sigma & \text{if } x_j^\top\sigma < \Gamma(g_j)
\end{cases} \tag{4.18}
\]

Suppose g_j falls in the first case, so that x_j^⊤σ ≥ Γ(g_j). From Assumption 1, 2ℓ_+(g_j) + x_j^⊤σ is decreasing in g_j, so it is minimized for the greatest g_j^* ≤ 1 s.t. Γ(g_j^*) ≤ x_j^⊤σ. Since Γ(·) is increasing, exactly one of two subcases holds:

1. g_j^* is such that Γ(g_j^*) = x_j^⊤σ, in which case the minimand (4.18) is ℓ_+(g_j^*) + ℓ_−(g_j^*)

2. g_j^* = 1, so that Γ(g_j^*) = Γ(1) < x_j^⊤σ, in which case the minimand (4.18) is 2ℓ_+(1) + x_j^⊤σ

A precisely analogous argument holds if g_j falls in the second case, where x_j^⊤σ < Γ(g_j). Putting the cases together, we have shown the form of the summand Ψ, piecewise over its domain, so (4.17) is equal to (1/2) min_{σ ≥ 0^p} [γ(σ)]. We have proved the dependence of g_j^* on x_j^⊤σ*, where σ* is the minimizer of the outer minimization of (4.17). This completes the proof.

Proof of Lemma 11. We have
\[
\max_{\substack{z\in[-1,1]^n \\ Fz \ge nb}} \frac{1}{n}\, z^\top a = \frac{1}{n}\max_{z\in[-1,1]^n} \min_{\sigma\ge 0^p}\left[ z^\top a + \sigma^\top(Fz - nb) \right] \tag{4.19}
\]
\[
\stackrel{(a)}{=} \frac{1}{n}\min_{\sigma\ge 0^p} \max_{z\in[-1,1]^n}\left[ z^\top(a + F^\top\sigma) - n b^\top\sigma \right] \tag{4.20}
\]
\[
= \frac{1}{n}\min_{\sigma\ge 0^p}\left[ \left\lVert a + F^\top\sigma \right\rVert_1 - n b^\top\sigma \right] = \min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma + a \right\rVert_1 \right] \tag{4.21}
\]
where (a) is by the minimax theorem ([Sio58]).

4.5.2 Other Proofs

Lemma 12. The function ℓ_+(Γ^{-1}(m)) + ℓ_−(Γ^{-1}(m)) is convex for m ∈ (Γ(−1), Γ(1)) under any of the conditions of Lemma 2.

Proof of Lemma 12. Define F(m) = ℓ_+(Γ^{-1}(m)) + ℓ_−(Γ^{-1}(m)). By basic properties of the derivative,

\[
\frac{d\,\Gamma^{-1}}{dm} = \frac{1}{\Gamma'(\Gamma^{-1}(m))} = \frac{1}{\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))} \ge 0 \tag{4.22}
\]
where the last inequality follows by Assumption 1. Therefore, by the chain rule and (4.22),

\[
F'(m) = \frac{\ell_-'(\Gamma^{-1}(m)) + \ell_+'(\Gamma^{-1}(m))}{\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))} \tag{4.23}
\]

From this, we calculate F''(m), writing ℓ_±'(Γ^{-1}(m)) and ℓ_±''(Γ^{-1}(m)) as simply ℓ_±' and ℓ_±'' for clarity:

\[
F''(m) = \underbrace{\frac{\dfrac{d[\Gamma^{-1}]}{dm}}{\left(\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))\right)^2}}_{(a)} \left[ \left(\ell_-' - \ell_+'\right)\left(\ell_-'' + \ell_+''\right) - \left(\ell_-' + \ell_+'\right)\left(\ell_-'' - \ell_+''\right) \right]
\]

From (4.22), observe that the term (a) = (ℓ_−'(Γ^{-1}(m)) − ℓ_+'(Γ^{-1}(m)))^{-3} ≥ 0. Therefore, it suffices to show that the other term is ≥ 0. But this is equal to

\[
\left(\ell_-' - \ell_+'\right)\left(\ell_-'' + \ell_+''\right) - \left(\ell_-' + \ell_+'\right)\left(\ell_-'' - \ell_+''\right) = 2\left(\ell_-'\ell_+'' - \ell_-''\ell_+'\right) \tag{4.24}
\]

This proves that condition (C) of Lemma 2 is sufficient for convexity of F (and indeed necessary also, under Assumption 1 on the partial losses).

We now address the other conditions of Lemma 2. (A) implies (C), because by Assumption 1 and convexity, ℓ_−', ℓ_+'', ℓ_−'' are ≥ 0 and ℓ_+' ≤ 0, so (4.24) is ≥ 0 as desired. Finally we prove that (B) implies (C). If ℓ is proper, then it is well known (e.g. Thm. 1 of [RW10], and [BSS05]) that for all x ∈ (−1,1),

\[
\frac{\ell_-'(x)}{1 + x} = -\frac{\ell_+'(x)}{1 - x}
\]
(This is a simple and direct consequence of stationarity conditions from the properness definition.)

Define the function w(x) = ℓ_−'(x)/(1+x) = −ℓ_+'(x)/(1−x); we drop the argument and write it and its derivative as w and w' for clarity. By direct computation,

\[
\ell_-'\ell_+'' - \ell_-''\ell_+' = (1+x)w\left[w + (x-1)w'\right] - \left[w + (1+x)w'\right](x-1)w
= (1+x)w^2 + (x^2-1)ww' - \left[(x-1)w^2 + (x^2-1)ww'\right] = 2w^2 \ge 0
\]
so (C) is true as desired.

Proof of Lemma 2. Continuity follows by checking Ψ(m) at m = Γ(±1). For Lipschitzness, note that for m ∈ (Γ(−1), Γ(1)), by (4.23),

\[
\Psi'(m) = \frac{\ell_-'(\Gamma^{-1}(m)) + \ell_+'(\Gamma^{-1}(m))}{\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))} \tag{4.25}
\]
\[
= -1 + \frac{2\,\ell_-'(\Gamma^{-1}(m))}{\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))} \tag{4.26}
\]
\[
= 1 - \frac{2\left(-\ell_+'(\Gamma^{-1}(m))\right)}{\ell_-'(\Gamma^{-1}(m)) - \ell_+'(\Gamma^{-1}(m))} \tag{4.27}
\]
Using Assumption 1 on the partial losses, equations (4.26) and (4.27) respectively make clear that Ψ'(m) ≥ −1 and Ψ'(m) ≤ 1 on this interval. Since Ψ'(m) is −1 for m < Γ(−1) and 1 for m > Γ(1), Ψ is 1-Lipschitz. As for convexity, since Ψ is linear outside the interval (Γ(−1), Γ(1)), it suffices to show that Ψ(m) is convex inside this interval, which is shown in Lemma 12.

Lemma 13. For any a ∈ ℝ^n,
\[
\max_{\substack{z\in[-1,1]^n \\ \left\lVert \frac{1}{n}Fz - b \right\rVert_\infty \le \varepsilon}} \frac{1}{n}\, z^\top a = \min_{\sigma\in\mathbb{R}^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma + a \right\rVert_1 + \varepsilon\,\lVert\sigma\rVert_1 \right]
\]

Proof. We have
\[
\max_{\substack{z\in[-1,1]^n \\ \left\lVert \frac{1}{n}Fz - b\right\rVert_\infty \le \varepsilon}} \frac{1}{n}\, z^\top a
= \max_{\substack{z\in[-1,1]^n \\ Fz - nb \le n\varepsilon \mathbf{1}^p \\ -Fz + nb \le n\varepsilon \mathbf{1}^p}} \frac{1}{n}\, z^\top a
\]
\[
= \frac{1}{n}\max_{z\in[-1,1]^n} \min_{\lambda,\xi\ge 0^p}\left[ z^\top a + \lambda^\top(-Fz + nb + n\varepsilon\mathbf{1}^p) + \xi^\top(Fz - nb + n\varepsilon\mathbf{1}^p) \right]
\]
\[
= \frac{1}{n}\min_{\lambda,\xi\ge 0^p} \max_{z\in[-1,1]^n}\left[ z^\top\left(a + F^\top(\xi - \lambda)\right) - n b^\top(\xi - \lambda) + n\varepsilon\,\mathbf{1}^{p\top}(\xi + \lambda) \right]
\]
\[
= \frac{1}{n}\min_{\lambda,\xi\ge 0^p}\left[ \left\lVert a + F^\top(\xi - \lambda)\right\rVert_1 - n b^\top(\xi - \lambda) + n\varepsilon\,\mathbf{1}^{p\top}(\xi + \lambda) \right]
\]
where the interchanging of min and max is again justified by the minimax theorem ([Sio58]), since the objective is linear in each variable and one of the constraint sets is compact.

Suppose for some j ∈ [p] that ξ_j > 0 and λ_j > 0. Then subtracting min(ξ_j, λ_j) from both does not affect the value [ξ − λ]_j, but always decreases [ξ + λ]_j, and therefore always decreases the objective function. Therefore, we can w.l.o.g. assume that for all j ∈ [p]: min(ξ_j, λ_j) = 0. Defining σ_j = ξ_j − λ_j for all j (so that ξ_j = [σ_j]_+ and λ_j = [σ_j]_−), the last equality above becomes
\[
\frac{1}{n}\min_{\lambda,\xi\ge 0^p}\left[ \left\lVert a + F^\top(\xi - \lambda)\right\rVert_1 - n b^\top(\xi - \lambda) + n\varepsilon\,\mathbf{1}^{p\top}(\xi + \lambda) \right]
= \min_{\sigma\in\mathbb{R}^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert a + F^\top\sigma \right\rVert_1 + \varepsilon\lVert\sigma\rVert_1 \right]
\]

Proof of Theorem 6. The proof is quite similar to that of Theorem 4, but generalizes it. First note that we have
\[
\max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \frac{1}{n}\sum_{j=1}^{n} r_j z_j\left[\ell_+(g_j) - \ell_-(g_j)\right]
= \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} -\frac{1}{n}\, z^\top\left[r \circ \Gamma(g)\right]
= \min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\left\lVert F^\top\sigma - (r \circ \Gamma(g))\right\rVert_1 \right] \tag{4.28}
\]
where the last equality uses Lemma 11.

Therefore, using (4.28) on the left-hand side of what we wish to prove,
\[
V = \frac{1}{2}\min_{g\in[-1,1]^n}\left[ \frac{1}{n}\sum_{j=1}^{n} r_j\left(\ell_+(g_j) + \ell_-(g_j)\right) + \max_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Fz \ge b}} \frac{1}{n}\sum_{j=1}^{n} r_j z_j\left(\ell_+(g_j) - \ell_-(g_j)\right) \right]
\]
\[
= \frac{1}{2}\min_{g\in[-1,1]^n}\left[ \frac{1}{n}\sum_{j=1}^{n} r_j\left(\ell_+(g_j) + \ell_-(g_j)\right) + \min_{\sigma\ge 0^p}\left( -b^\top\sigma + \frac{1}{n}\sum_{j=1}^{n}\left| x_j^\top\sigma - r_j\Gamma(g_j)\right| \right) \right]
\]
\[
= \frac{1}{2}\min_{\sigma\ge 0^p}\left[ -b^\top\sigma + \frac{1}{n}\sum_{j=1}^{n}\min_{g_j\in[-1,1]}\left( r_j\left[\ell_+(g_j) + \ell_-(g_j)\right] + \left| x_j^\top\sigma - r_j\Gamma(g_j)\right| \right) \right] \tag{4.29}
\]

This is solved by a case analysis essentially identical to the argument used to prove Theorem 4 (given in full in [BF16b]).

Proof of Prop. 9. Consider the weighting σ^i as above. Then

\[
V^\ell = \frac{1}{2}\min_{\sigma\ge 0^p} \gamma^\ell(\sigma) \;\le\; \frac{1}{2}\gamma^\ell(\sigma^i) = \frac{1}{2}\left[ -b_i^\ell + \frac{1}{n}\sum_{j=1}^{n}\Psi\!\left(\Gamma(h_i(x_j))\right) \right] \tag{4.30}
\]

Since h_i(x_j) ∈ [−1,1] for all j, we have Γ(h_i(x_j)) ∈ [Γ(−1), Γ(1)]. Using the definitions of Ψ and b^ℓ (in Equation (4.8)), (4.30) can therefore be rewritten as

\[
2V^\ell \le -b_i^\ell + \frac{1}{n}\sum_{j=1}^{n}\left[ \ell_+(\Gamma^{-1}(\Gamma(h_i(x_j)))) + \ell_-(\Gamma^{-1}(\Gamma(h_i(x_j)))) \right]
\]
\[
= -\frac{1}{n}\sum_{j=1}^{n}\left[ \ell_+(h_i(x_j)) + \ell_-(h_i(x_j)) \right] + 2\varepsilon_i^\ell + \frac{1}{n}\sum_{j=1}^{n}\left[ \ell_+(h_i(x_j)) + \ell_-(h_i(x_j)) \right] = 2\varepsilon_i^\ell
\]

Acknowledgements

This chapter contains material to appear as a paper at the conference on Advances in Neural Information Processing Systems, 2016 ("Optimal Binary Classifier Aggregation for General Losses," Balsubramani and Freund). The dissertation author was the primary investigator and author of this paper.

Chapter 5

Improving Random Forests with Semi-Supervised Specialists

5.1 Introduction

Ensemble-based learning is a very successful approach to learning classifiers, including well-known methods like boosting ([SF12]), bagging ([Bre96]), and random forests ([Bre01]). The power of these methods has been clearly demonstrated in open large-scale learning competitions such as the Netflix Prize ([Kor09]) and the ImageNet Challenge ([RDS+15]). In general, these methods train a large number of "base" classifiers and then combine them using a (possibly weighted) majority vote. By aggregating over classifiers, ensemble methods reduce the variance of the predictions, and sometimes also reduce the bias ([SFBL98]).

The ensemble methods above rely solely on a labeled training set of data. In this chapter, we propose an ensemble method that uses a large unlabeled data set in addition to the labeled set, building on the core formulation of Chapter 3. Our main contributions here are to extend and apply those results with a new algorithm in the context of random forests ([Bre01]) and to perform experiments in which we show that, when the number of labeled examples is small, our algorithm's performance is at least that of random forests, and often significantly better.

To achieve this, we extend the "muffled" worst-case-optimal framework of Chapter 3 ([BF15a]) to handle specialists – ensemble classifiers which only venture to predict on a subset of the data, and abstain from predicting on the rest. Specialists can be very useful in targeting regions of the data on which to precisely suggest a prediction. The high-level idea of our algorithm is to artificially generate new specialists from the ensemble. We incorporate these, and the targeted information they carry, into the muffled framework. The resulting aggregated predictor inherits the advantages of the framework, which we recapitulate here:

(A) Efficient: Learning reduces to solving a scalable convex optimization, and test-time


prediction is as efficient and parallelizable as linear prediction.

(B) Versatile/robust: No assumptions about the structure or origin of the predictions or labels.

(C) No introduced parameters: The aggregation method is completely data-dependent.

(D) Safe: Accuracy guaranteed to be at least that of the best classifier in the ensemble (cf. Prop. 9 in Ch. 4).

We develop these ideas in the rest of this chapter, first specifying how to incorporate specialists and the resulting learning algorithm in Section 5.2, then performing an exploratory evaluation of the algorithm on data in Section 5.3. Though the muffled framework and our extensions to it can be applied to any ensemble of arbitrary origin, in this chapter we focus on random forests, which have been repeatedly demonstrated to have state-of-the-art, robust classification performance in a wide variety of situations ([CKY08]). We use a random forest as a base ensemble whose predictions we aggregate. But unlike conventional random forests, we do not simply take a majority vote over tree predictions, instead using an unlabeled-data-dependent aggregation strategy dictated by the worst-case framework we employ.

5.2 Learning with Specialists

Recall the notation of Chapter 3, defining the objects of study of our formulation, including the classifier ensemble H, unlabeled data F, and true labels z. We examine a generalized scenario in which each classifier in the ensemble can abstain on any subset of the examples instead of predicting ±1. It is a specialist that predicts only over a subset of the input, and we think of its abstain/participate decision as being randomized in the same way as the randomized label on each example. In this section, we extend the "muffled" framework of Chapter 3 to arbitrary specialists, and discuss the semi-supervised learning algorithm that results.

In our formulation, suppose that for a classifier i ∈ [p] and an example x, the classifier decides to abstain with probability 1 − v_i(x). But if the decision is to participate, the classifier predicts h_i(x) ∈ [−1,1] as previously. Our only assumption on {v_i(x_1), ..., v_i(x_n)} is the reasonable one that Σ_{j=1}^n v_i(x_j) > 0, so classifier i is not a useless specialist that abstains everywhere.

The constraint on classifier i is now not on its correlation with z on the entire test set, but on the average correlation with z restricted to occasions on which it participates. So for some [b_S]_i ∈ [0,1],
\[
\sum_{j=1}^{n}\left( \frac{v_i(x_j)}{\sum_{k=1}^{n} v_i(x_k)} \right) h_i(x_j)\, z_j \;\ge\; [b_S]_i \tag{5.1}
\]

Define ρ_i(x_j) := v_i(x_j) / Σ_{k=1}^n v_i(x_k) (a distribution over j ∈ [n]) for convenience. Now redefine our unlabeled data matrix as follows:
\[
S = n
\begin{pmatrix}
\rho_1(x_1)h_1(x_1) & \rho_1(x_2)h_1(x_2) & \cdots & \rho_1(x_n)h_1(x_n) \\
\vdots & \vdots & \ddots & \vdots \\
\rho_p(x_1)h_p(x_1) & \rho_p(x_2)h_p(x_2) & \cdots & \rho_p(x_n)h_p(x_n)
\end{pmatrix} \tag{5.2}
\]

Then the constraints (5.1) can be written as (1/n) S z ≥ b_S, analogous to the constraints (1/n) F z ≥ b of Chapter 3's prediction game. To summarize, our specialist ensemble aggregation game is stated as

\[
V_S := \min_{\substack{z\in[-1,1]^n \\ \frac{1}{n}Sz \ge b_S}} \; \max_{g\in[-1,1]^n} \; \frac{1}{n}\, z^\top g \tag{5.3}
\]

We can immediately solve this game from Thm. 4, with (S, b_S) simply taking the place of (F, b).

Theorem 5 (Solution of the Specialist Aggregation Game). The awake ensemble prediction w.r.t. weighting σ ≥ 0^p on example x_i is [S^⊤σ]_i = n Σ_{j=1}^p ρ_j(x_i) h_j(x_i) σ_j. The slack function is now
\[
\gamma_S(\sigma) := -b_S^\top\sigma + \frac{1}{n}\sum_{j=1}^{n} \max\left(\left|\left[S^\top\sigma\right]_j\right|,\, 1\right) \tag{5.4}
\]

The minimax value of this game is V_S := (1/2) γ_S(σ_S^*) := (1/2) min_{σ ≥ 0^p} [γ_S(σ)]. The minimax optimal predictions are defined as follows: for all i ∈ [n],

\[
[g_S^*]_i = g_S(\sigma_S^*) =
\begin{cases}
\left[S^\top\sigma_S^*\right]_i & \text{if } \left|\left[S^\top\sigma_S^*\right]_i\right| < 1 \\
\mathrm{sgn}\!\left(\left[S^\top\sigma_S^*\right]_i\right) & \text{otherwise}
\end{cases}
\]

In the no-specialists case, the vector ρ_i is the uniform distribution (1/n, ..., 1/n) for any i ∈ [p], and the problem reduces to the original prediction game of Chapter 3. As in the original prediction game, the minimax equilibrium depends on the data only through the ensemble predictions, but these are now of a different form. Each example is now weighted proportionally to ρ_j(x_i). So on any given example x_i, only hypotheses which participate on it will be counted; and those that specialize most narrowly, and participate on the fewest other examples, will have more influence on the eventual prediction g_i, ceteris paribus.

5.2.1 Creating Specialists for an Algorithm

We can now present the main ensemble aggregation method of this chapter, which creates specialists from the ensemble, adding them as additional constraints (rows of S). The algorithm, HEDGECLIPPER, instantiates our specialist learning framework with a random forest ([Bre01]). As an initial exploration of the framework here, random forests are an appropriate base ensemble because they are known to exhibit state-of-the-art performance ([CKY08]). Their well-known advantages also include scalability, robustness (to corrupt data and parameter choices), and interpretability; each of these benefits is shared by our aggregation algorithm, which consequently inherits them all. Furthermore, decision trees are a natural fit as the ensemble classifiers because they are inherently hierarchical. Intuitively (and indeed formally too, as in [LJ06]), they act like nearest-neighbor (NN) predictors w.r.t. a distance that is "adaptive" to the data. So each tree in a random forest represents a somewhat different, nonparametric partition of the data space into regions in which one of the labels ±1 dominates. Each such region corresponds exactly to a leaf of the tree.

The idea of HEDGECLIPPER is simply to consider each leaf in the forest as a specialist, which predicts only on the data falling into it. By the NN intuition above, these specialists can be viewed as predicting on data that is near them, where the supervised training of the tree attempts to define the purest possible partitioning of the space. A pure partitioning results in many specialists with [b_S]_i ≈ 1, each of which contributes to the awake ensemble prediction w.r.t. σ* over its domain, to influence it towards the correct label (inasmuch as [b_S]_i is high). Though the idea is complex in concept for a large forest with many arbitrarily overlapping leaves from different trees, it fits the worst-case specialist framework of the previous sections. So the algorithm is still essentially linear learning with convex optimization, as we have described; it is given as Alg. 1.

Algorithm 1. HEDGECLIPPER
1: Input: Labeled set L, unlabeled set U
2: Using L, grow trees T = {T_1, ..., T_p} (regularized; see Sec. 5.2.2)
3: Using L, estimate b_S on T and its leaves
4: Using U, (approximately) optimize (5.4) to estimate σ_S^*
5: Output: The estimated weighting σ_S^*, for use at test time
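To make the leaf-as-specialist construction concrete, the following sketch (Python, using scikit-learn and SciPy; the function name and the simple plug-in estimate of b_S are our own simplifications, not the exact implementation used in the experiments) builds the sparse specialist matrix S of (5.2) from the leaves of a fitted, regularized random forest.

    import numpy as np
    from scipy import sparse
    from sklearn.ensemble import RandomForestClassifier

    def leaf_specialists(forest, X_lab, y_lab, X_unl):
        """Treat each leaf of each tree as a specialist that predicts its majority label
        and abstains elsewhere. Returns the sparse matrix S of (5.2) (rows = leaf
        specialists, columns = unlabeled examples) and a rough plug-in estimate of b_S."""
        rows, b_S = [], []
        leaves_lab = forest.apply(X_lab)      # (n_lab, n_trees) leaf indices
        leaves_unl = forest.apply(X_unl)      # (n_unl, n_trees)
        n_unl = X_unl.shape[0]
        for t in range(leaves_lab.shape[1]):
            for leaf in np.unique(leaves_unl[:, t]):
                mask_lab = leaves_lab[:, t] == leaf
                if not mask_lab.any():
                    continue                  # no labeled data reaches this leaf; skip it
                pred = 1.0 if y_lab[mask_lab].mean() >= 0 else -1.0   # leaf's majority label
                mask_unl = leaves_unl[:, t] == leaf
                row = np.zeros(n_unl)
                # rho_i is uniform over the leaf's unlabeled examples, so the row of S is
                # n * (1/|leaf|) * prediction on those examples, and 0 (abstain) elsewhere.
                row[mask_unl] = n_unl * pred / mask_unl.sum()
                rows.append(sparse.csr_matrix(row))
                b_S.append(float(np.mean(y_lab[mask_lab] * pred)))    # crude correlation estimate
        return sparse.vstack(rows), np.array(b_S)

    # forest = RandomForestClassifier(min_samples_leaf=50).fit(X_lab, y_lab)  # regularized trees
    # S, b_S = leaf_specialists(forest, X_lab, y_lab, X_unl)

Optimizing the slack function (5.4) over σ ≥ 0 then proceeds as in the earlier SGD sketches, with S^⊤σ computed by sparse matrix-vector products.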

5.2.2 Discussion

Trees in random forests have thousands of leaves or more in practice. As we are advocating adding so many extra specialists to the ensemble for the optimization, it is natural to ask whether this erodes some of the advantages we have claimed earlier.

Computationally, it does not. When ρ_j(x_i) = 0, i.e. classifier j abstains deterministically on x_i, then the value of h_j(x_i) is irrelevant. So storing S in a sparse matrix

format is natural in our setup, with the accompanying performance gain in computing S^⊤σ while learning σ* and predicting with it. This turns out to be crucial to efficiency – each tree induces a partitioning of the data, so the set of rows corresponding to any tree contains n nonzero entries in total. This is seen in Fig. 5.1.

Figure 5.1. A schematic of how the forest structure is related to the unlabeled data matrix S, with a given example x highlighted. The two colors in the matrix represent ±1 predictions, and white cells abstentions.

Statistically, the situation is more complex. On one hand, there is no danger of overfitting in the traditional sense, regardless of how many specialists are added. Each additional specialist can only shrink the constraint set that the adversary must follow in the game (5.3). It only adds information about z, and therefore the performance V_S must improve, if the game is solved exactly. However, for learning we are only concerned with approximately optimizing γ_S(σ) and solving the game. This presents several statistical challenges. Standard optimization methods do not converge as well in high ambient dimension, even given the structure of our problem. In addition, random forests practically perform best when each tree is grown to overfit. In our case, on any sizable test set, small leaves would cause some entries of S to have large magnitude, ≫ 1. This can foil an algorithm like HEDGECLIPPER by causing it to vary wildly during the optimization, particularly since those leaves' [b_S]_i values are only roughly estimated. From an optimization perspective, some of these issues can be addressed by e.g. (pseudo-)second-order methods ([DHS11]), whose effect would be interesting to explore in future work. Our implementation opts for another approach – to grow trees constrained to have a nontrivial minimum weight per leaf. Of course, there are many other ways to handle this, including using the tree structure beyond the leaves; we just aim to conduct an exploratory evaluation here, as several of these areas remain ripe for future research.

5.3 Experimental Evaluation

We now turn to evaluating HEDGECLIPPER on publicly available datasets. Our implementation uses minibatch SGD with learning rate schedule ∝ 1/√t to optimize (5.3), runs in Python on top of the popular open-source learning package scikit-learn, and runs out-of-core (n-independent memory), taking advantage of the scalability of our formulation.^1 The datasets are drawn from UCI/LibSVM as well as data mining sites like Kaggle, and no further preprocessing was done on the data. We refer to "Base RF" as the forest of constrained trees from which our implementation draws its specialists. We restrict the training data available to the algorithm, using mostly supervised datasets because these far outnumber medium/large-scale public semi-supervised datasets. Unused labeled examples are combined with the test examples (and the extra unlabeled set, if any is provided) to form the set of unlabeled data used by the algorithm. Further information and discussion on the protocol is in the appendix.

Class-imbalanced and noisy sets are included to demonstrate the aforementioned practical advantages of HEDGECLIPPER. Therefore, AUC is an appropriate measure of performance, and these results are in Table 5.1. Results are averaged over 10 runs, each drawing a different random subsample of labeled data. The best results according to a paired t-test are in bold. We find that the use of unlabeled data is sufficient to achieve improvements over even traditionally overfitted RFs in many cases. Notably, in most cases there is a significant benefit given by unlabeled data in our formulation, as compared to the base RF used. The boosting-type methods also perform fairly well, as we discuss in the next section.
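As a rough illustration of this optimization (a sketch under our own naming, not the released implementation), projected minibatch SGD with an ∝ 1/√t step size on a slack-type objective −⟨b, σ⟩ + (1/n) ∑_j Ψ(⟨x_j, σ⟩) looks like the following; S may be the sparse specialist matrix sketched above and b the vector of estimated correlations:

```python
import numpy as np

def sgd_slack(S, b, n_steps=1000, batch=256, lr0=0.1, seed=0):
    """Minimize -<b, sigma> + mean_j Psi((S^T sigma)_j) over sigma >= 0,
    with Psi(s) = max(|s|, 1); S may be a dense array or a scipy.sparse matrix."""
    rng = np.random.RandomState(seed)
    p, n = S.shape
    sigma = np.zeros(p)
    for t in range(1, n_steps + 1):
        cols = rng.choice(n, size=min(batch, n), replace=False)
        scores = S[:, cols].T.dot(sigma)                   # minibatch scores
        dPsi = np.sign(scores) * (np.abs(scores) >= 1.0)   # subgradient of Psi at each score
        grad = -b + S[:, cols].dot(dPsi) / len(cols)
        sigma = np.maximum(sigma - (lr0 / np.sqrt(t)) * grad, 0.0)   # project onto sigma >= 0
    return sigma
```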

Figure 5.2. Class-conditional "awake ensemble prediction" (x^⊤σ^*) distributions, on SUSY. Rows (top to bottom): {1K, 10K, 100K} labels. Columns (left to right): α = {1.0, 0.3, 3.0}, and the base RF.

The awake ensemble prediction values x^⊤σ on the unlabeled set are a natural way to visualize and explore the operation of the algorithm on the data, in a way analogous to the margin distribution in boosting ([SFBL98]).

^1 It is possible to make this footprint independent of d as well by hashing features [WDL+09], not done here.

Table 5.1. Area under ROC curve for HEDGECLIPPER vs. supervised ensemble algorithms.

Dataset      | # training | HedgeClipper | Base RF | Logistic Regression | Random Forest | AdaBoost Trees | MART [Fre01]
kagg-prot    | 10         | 0.567        | 0.509   | 0.500               | 0.520         | 0.497          | 0.510
             | 100        | 0.714        | 0.665   | 0.656               | 0.681         | 0.666          | 0.688
ssl-text     | 10         | 0.586        | 0.517   | 0.512               | 0.556         | 0.553          | 0.501
             | 100        | 0.765        | 0.551   | 0.542               | 0.596         | 0.569          | 0.552
kagg-cred    | 100        | 0.558        | 0.518   | 0.501               | 0.528         | 0.542          | 0.502
             | 1K         | 0.602        | 0.534   | 0.510               | 0.585         | 0.565          | 0.505
             | 10K        | 0.603        | 0.563   | 0.535               | 0.586         | 0.566          | 0.510
a1a          | 100        | 0.779        | 0.619   | 0.525               | 0.680         | 0.682          | 0.725
             | 1K         | 0.808        | 0.714   | 0.655               | 0.734         | 0.722          | 0.768
w1a          | 100        | 0.543        | 0.510   | 0.505               | 0.502         | 0.513          | 0.509
             | 1K         | 0.651        | 0.592   | 0.520               | 0.695         | 0.689          | 0.671
covtype      | 100        | 0.735        | 0.703   | 0.661               | 0.709         | 0.732          | 0.515
             | 1K         | 0.764        | 0.761   | 0.715               | 0.730         | 0.761          | 0.524
             | 10K        | 0.809        | 0.822   | 0.785               | 0.759         | 0.788          | 0.515
ssl-secstr   | 10         | 0.572        | 0.574   | 0.535               | 0.563         | 0.557          | 0.557
             | 100        | 0.656        | 0.645   | 0.610               | 0.643         | 0.637          | 0.629
             | 1K         | 0.687        | 0.682   | 0.646               | 0.690         | 0.689          | 0.683
SUSY         | 1K         | 0.776        | 0.769   | 0.764               | 0.747         | 0.771          | 0.720
             | 10K        | 0.785        | 0.787   | 0.784               | 0.787         | 0.789          | 0.759
             | 100K       | 0.799        | 0.797   | 0.797               | 0.797         | 0.796          | 0.779
epsilon      | 1K         | 0.651        | 0.659   | 0.645               | 0.718         | 0.726          | 0.774
webspam-uni  | 1K         | 0.936        | 0.928   | 0.920               | 0.923         | 0.928          | 0.840
             | 10K        | 0.967        | 0.970   | 0.957               | 0.945         | 0.953          | 0.901

One representative sample is in Fig. 5.2, on SUSY, a dataset with many (5M) examples, roughly evenly split between ±1. These plots demonstrate that our algorithm produces much more peaked class-conditional ensemble prediction distributions than random forests, suggesting margin-based learning applications. Changing α alters the aggressiveness of the clipping, inducing a more or less peaked distribution. The other datasets without dramatic label imbalance show very similar qualitative behavior in these respects, and these plots help choose α in practice (see appendix). Toy datasets with extremely low dimension seem to exhibit little to no significant improvement from our method. We believe this is because the distinct feature splits found by the random forest are few in number, and it is the diversity in ensemble predictions that enables HEDGECLIPPER to clip (weighted majority vote) dramatically and achieve its performance gains. On the other hand, given a large quantity of data, our algorithm is able to learn significant structure; the minimax structure appears appreciably close to reality, as evinced by the results on large datasets.

5.4 Related Work

This paper's framework and algorithms are superficially reminiscent of boosting, another paradigm that uses voting behavior to aggregate an ensemble and has a game-theoretic intuition ([SF12, FS96]). There is some work on semi-supervised versions of boosting ([KMJJL09]), but it departs from this principled structure and has little in common with our approach. Classical boosting algorithms like AdaBoost ([FS97]) make no attempt to use unlabeled data. It is an interesting problem to incorporate boosting ideas into our formulation, particularly since the two boosting-type methods acquit themselves well in the results of this chapter, and can pack information parsimoniously into many fewer ensemble classifiers than random forests. We explore this in Chapter 6.

There is a long-recognized connection between transductive and semi-supervised learning, and our method bridges these two settings. Popular variants on supervised learning such as the transductive SVM ([Joa99]) and graph-based or nearest-neighbor algorithms, which dominate the semi-supervised literature ([ZG09]), have shown promise largely in data-poor regimes because they face major scalability challenges. Our focus on ensemble aggregation instead allows us to keep a computationally inexpensive linear formulation and avoid considering the underlying feature space of the data. Largely unsupervised ensemble methods have been explored especially in applications like crowdsourcing, in which the method of [DS79] gave rise to a plethora of Bayesian methods under various conditional independence generative assumptions on F ([PSNK14]). Using tree structure to construct new features has been applied successfully, though without guarantees ([MTJ07]). Learning with specialists has been studied in an adversarial online setting, as in the work of [FSSW97]. Though that paper's setting and focus are different from ours, the optimal algorithms it derives also depend on each specialist's average error on the examples on which it is awake.

5.5 Additional Information on Experiments

5.5.1 Datasets

All data from the challenges (e.g. kagg-cred) lacked test labels, so the results reported are averaged over 10 random splits of the training data. Information on the datasets used is in Table 5.2.

5.5.2 Algorithms

In all cases, the random forests were grown with default parameters for the feature and data splits (bootstrap data sample of the input data size, and ∼√d features considered per split), and 100 trees were standard. Varying these changes the induced diversity of the trees/partitions and may fundamentally affect the output of our algorithm, but exploration of such aspects is left to future work.

Table 5.2. Datasets used for HEDGECLIPPER experiments.

Dataset     | Data sizes (m /) n         | Dim. d | Comments
kagg-prot   | 3751                       | 1776   | Kaggle challenge [kag12]
ssl-text    | 1500                       | 11960  | [CSZ06]
kagg-cred   | 150000                     | 10     | Kaggle challenge [kag11]
a1a         | 1605 / 30956               | 123    | LibSVM
w1a         | 2477 / 47272               | 300    | LibSVM
covtype     | 581012                     | 54     | LibSVM
ssl-secstr  | 83679 (unlabeled: 1189472) | 315    | [CSZ06]
SUSY        | 5000000                    | 18     | UCI
HIGGS       | 11000000                   | 28     | UCI
epsilon     | 500000                     | 2000   | PASCAL Large Scale Learning Challenge 2008
webspam-uni | 350000                     | 254    | LibSVM

All the comparator algorithms were also run with scikit-learn's default parameters – in many cases, like RFs, they are fairly insensitive to parameter choice.

To overcome the statistical issues discussed in Sec. 5.2.2, we found we needed to enforce some regularization on the trees used. We chose to impose a constraint on the minimum number of training examples in any leaf of a tree. This constraint was imposed as a parameter to grow the forest; thereafter, we could use all resulting leaves as specialists. To avoid any leaf specialist weights being too large but still collect as many leaf specialists as possible, we set the minimum number of examples per leaf to 10 with ≥ 1K labeled examples, and to 4 otherwise. We also tried an alternative way of avoiding small specialists: to simply grow an unregularized forest and then filter out leaves, selecting only large enough leaves as specialists. This generally performed comparably or worse, consistent with the intuition that the diversity in unregularized tree predictions often manifests largely on small leaves.

As pointed out in the paper, estimating b is an important step in an implementation. Accordingly, we used a bootstrap sample to do so; this performed comparably to holding out a validation set from a constant fraction of the labeled data. We observe throughout that it seems to matter far less how well the ensemble is trained, and more how well b is estimated; so we elected to keep only a modicum of labeled data for actually training the ensemble, and most for estimation.

A notable issue we encountered is the setting of the "noise" rescaling factor α. We found HEDGECLIPPER to be relatively insensitive to the precise choice of α, so it essentially sufficed for our experiments to try three choices: {0.3, 1.0, 3.0}. The last is > 1.0, and therefore does not have an interpretation in terms of uniform label noise, but it is certainly a valid computational tool. Which of these three αs to choose? Generally, we found that choosing α = 0.3 does not hurt performance, because our performance goals are often met best by separating the class-conditional peaks as much as possible. This is dangerous for more class-unbalanced datasets like kagg-cred, however, for which the default α = 1.0 works best. A useful heuristic which we used to choose α is simply to look at the class-conditional awake ensemble prediction distribution as plotted in Fig. 5.2; the distribution can be roughly estimated and plotted on the fly, and we can quickly ascertain sensible choices of parameters like α. These choices appear to matter less at larger scale.

Acknowledgements

This chapter contains material from the conference on Advances in Neural Information Processing Systems, 2015 ("Scalable Semi-Supervised Classifier Aggregation," Balsubramani and Freund). The dissertation author was the primary investigator and author of this paper.

Chapter 6

Muffled Incremental Combination of Classifiers

6.1 Introduction

The boosting approach to learning binary classifiers is to construct a weighted-majority ensemble of them by incrementally adding base classifiers ([SF12]). This process is guided by a potential function that is an upper bound on the training error. The weights assigned to the training examples communicate the gradient of the potential function to the base learner. Like other supervised learning algorithms, boosting algorithms require a sufficiently large labeled training set in order to produce an accurate classifier. Such algorithms do not make use of unlabeled data, which is typically much more abundant. On the other hand, semi-supervised learning approaches ([CSZ06]) attempt to use both labeled and unlabeled training examples. The basic idea in many approaches is to augment the labeled set by inferring the labels of unlabeled examples from their labeled neighbors in some way. Such inference uses "hallucinated labels" on the unlabeled data that tend to agree with the labeled ones (e.g. [BNS06]). In this chapter, we explore a semi-supervised learning approach that uses the opposite strategy. Instead of using the unlabeled examples to augment the guidance of the labeled examples, we use the unlabeled examples to muffle the effect of the labeled examples, and hallucinate labels which tend to oppose the labeled ones.

In this chapter, we devise algorithms which build empirically on the semi-supervised "muffled" formulation we have developed in Chapter 3 and subsequent chapters. The intuition behind the basic algorithm for 0-1 loss in Chapter 3 is that the aggregated prediction on unlabeled examples should be between −1 and +1. The average potential Ψ(x) = max(|x|, 1) of the unlabeled data regularizes the problem by encouraging us to put weight on classifiers which disagree, so that the margin stays low. (Plotted earlier in Fig. 3.1, we reproduce the potential well Ψ as Fig. 6.1 for easy reference.)


Figure 6.1. Potential well as a function of score (reproduced from Fig. 3.1).

The "muffling" behavior occurs while minimizing the slack function when the aggregate prediction (score) on an unlabeled example is outside [−1, 1], because moving its score towards this interval tends to lower the slack function. As we will explore, muffling can be represented by assigning to the unlabeled example a hallucinated label that is the opposite of the predicted label. Thm. 4 provides a direct proof that this strategy generalizes well by directly addressing test error, even though it contrasts starkly with max-margin approaches known to generalize in fully supervised settings ([SFBL98]). We refer to the minimax framework of this and previous chapters as muffled learning to emphasize its learning principle of actively distrusting overly confident predictions, counteracting the guidance of the labeled examples. As a consequence of Thm. 4, it always performs at least as well as any single classifier when b is estimated accurately – in other words, unlabeled data do not hurt. We follow our previous work ([BF15a, BF15b]) in re-emphasizing that while the transductive setting is convenient for a clean muffled formulation, in this case it is not a restrictive assumption for high n, since the data are i.i.d. (see Sec. 6.9 for details).

For the remainder of this chapter, we investigate ways to minimize the slack function generally (for any ensemble) and practically. All of these are algorithms to find a weight vector σ that leads to a good predictor, and to learn the associated ensemble. So our algorithms always inherit the aforementioned prediction-based advantages of the muffled learning framework. We use this muffling principle to devise several simple algorithms, notably a sequential scheme (MARVIN) that incorporates ensemble classifiers one at a time into an aggregated classifier. At every step, it chooses a new classifier that tends to disagree with the current majority opinion on examples in the unlabeled set.

The rest of the chapter is organized as follows. In Section 6.2 we describe the algorithm MARVIN. In Section 6.3 we describe the algorithm HEDGEMOWER, a way of exploiting the partitionings used by ensemble classifiers like decision trees. Section 6.4 contains a comparative experimental evaluation of our algorithms on a number of datasets. In Section 6.5 we draw some conclusions from the experiments, and in Section 6.6 we make connections to past and future work.
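For reference, a minimal sketch (illustrative names) of the three quantities this discussion revolves around: the potential well Ψ, the clipped prediction, and the hallucinated label assigned to a clipped unlabeled example.

```python
import numpy as np

def Psi(score):
    # Potential well of Fig. 6.1: flat on [-1, 1], linear outside it.
    return np.maximum(np.abs(score), 1.0)

def clip(score):
    # Final prediction on a score <x, sigma>.
    return np.clip(score, -1.0, 1.0)

def hallucinated_label(score):
    # Opposite of the predicted label, but only where the example is clipped (|score| >= 1);
    # hedged examples (|score| < 1) get label 0, i.e. they are ignored.
    return -np.sign(score) * (np.abs(score) >= 1.0)
```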

6.2 An Algorithm for Incrementally Aggregating Classifiers

Directly minimizing the slack function with an ensemble generated a priori like a random forest can enjoy some practical success, but it can be too conservative (Chapter 5, see [BF15b]), because the bound V on error is too loose. However, adding more classifiers to any ensemble can only lower its V, because z is at least as constrained after the addition. Therefore, a natural strategy to mitigate the bound's looseness is to call upon a larger ensemble. So we elect to build our predictor incrementally, from classifiers in a very large ensemble H. Supervised boosting algorithms have long ([Fre95]) done this efficiently by accessing a learning algorithm that the booster calls as needed on a carefully constructed input dataset, to generate classifiers one at a time. We use a similar learning oracle as a subroutine – it returns a classifier from H that approximately minimizes error on its input data among h ∈ H. This is efficiently implemented for many hypothesis classes, like decision trees, linear classifiers, and other supervised learning approaches – our method is capable of using any of these.

So in this chapter, we modify the setting of Chapter 5 (from Chapter 3) slightly, so that we have two types of data drawn i.i.d. from the same distribution: a labeled set L = {(x_1^L, y_1^L), ..., (x_m^L, y_m^L)} and an unlabeled set U = {x_1^U, ..., x_n^U}, with |L| = m and |U| = n. We have at our disposal an ensemble H of classifiers, which may be very large, accessed through the learning oracle described earlier. Our algorithm, MARVIN, repeatedly requests the classifier in H that minimizes error on inputs comprised of m + n weighted examples: the m labeled examples in L, and the n unlabeled examples in U with purposefully hallucinated labels that change every iteration.^1 This means that the effective dimensionality of our weight vector (and therefore of every example x) is t on iteration t, rather than being a fixed value p. So we abuse notation somewhat by writing the score of example x with weight vector σ as ⟨x, σ⟩, where the inner product defined depends on the iteration number.

MARVIN is straightforward to specify (Alg. 2), with no parameters to tune. It ignores all currently hedged unlabeled examples because they are already minimizing Ψ, and sends every clipped unlabeled example to the error-minimizing oracle with a hallucinated label of the minority prediction, to encourage its margin towards zero. Labeled examples are sent to the oracle unchanged, and the data are weighted so that L and U have equal weights when no unlabeled data are hedged (see Eq. (6.1)). The MARVIN update can be seen as greedy coordinate descent on the slack function in the high-dimensional space spanned by H (Sec. 6.7). This dimensionality is a thorny theoretical issue, so that even though the slack function is convex, a practical step size schedule is not easily understood using optimization-based analysis of coordinate descent.

^1 The actual algorithm run in our experiments is a minibatch out-of-core version adapted slightly for the stochastic setting (Sec. 6.9).

Algorithm 2. MARVIN
Input: Labeled set L = {(x_1^L, y_1^L), ..., (x_m^L, y_m^L)}, unlabeled set U = {x_1^U, ..., x_n^U}
Initialize weights: σ^0 = 0, so that ⟨x, σ^0⟩ = 0 for all x ∈ U
for t = 1 to T do
  Hallucinate label ỹ_j^t for each x_j^U ∈ U:

    ỹ_j^t = −sgn(⟨x_j^U, σ^{t−1}⟩) · 1[ |⟨x_j^U, σ^{t−1}⟩| ≥ 1 ]

  Find a classifier h^t ∈ H that approximately minimizes weighted error over the combined data:

    h^t = argmax_{h ∈ H} [ (1/m) ∑_{i=1}^m y_i^L h(x_i^L) + (1/n) ∑_{j=1}^n ỹ_j^t h(x_j^U) ]    (6.1)

  Add h^t to the predictor with weight found by line search, forming σ^t from σ^{t−1}.
  For MARVIN-C only: Total correction – minimize the slack function over the ensemble so far: h^1, ..., h^t.
  For MARVIN-D only: If h^t is a decision tree, add all internal nodes of h^t too and then perform total correction (see Sec. 6.3).
end for
Output: Predictor g_T(x) = clip(⟨x, σ^T⟩) := max(−1, min(⟨x, σ^T⟩, 1))

Even seeming niceties like proportionality constants are of great importance in ensuring that the method converges quickly, and a good choice depends on the interactions between dimensions (ensemble classifiers) in complex ways. All these considerations motivate us to use line search in this chapter to find the appropriate step size, replacing the schedule ∝ 1/√t used in Chapter 5 ([BF15a]). This is crucial to achieving quicker convergence and enabling our total correction results (details in Sec. 6.3.1).

Another way to improve performance that we experiment with follows the example of the totally corrective technique for supervised boosting ([WLR06]). After adding each new ensemble classifier, this approach minimizes the objective function over the entire cumulative ensemble so far. It is especially appropriate in our case because the slack function is convex, so such a method will make progress. We return to this idea in Sec. 6.4, where we implement MARVIN-C, the totally corrective version of MARVIN.
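A single iteration of this loop might look as follows in a minimal sketch (our naming, with a depth-limited scikit-learn decision tree standing in for the approximate error-minimizing oracle; the actual implementation is minibatched and out-of-core, per Sec. 6.9, and uses unregularized trees):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def marvin_step(X_L, y_L, X_U, scores_U, ensemble):
    """One MARVIN iteration: hallucinate labels on clipped unlabeled examples,
    then call the oracle on the weighted combined data as in Eq. (6.1)."""
    m, n = len(y_L), len(scores_U)
    y_tilde = -np.sign(scores_U) * (np.abs(scores_U) >= 1.0)   # hedged examples get 0
    clipped = y_tilde != 0
    X_all = np.vstack([X_L, X_U[clipped]])
    y_all = np.concatenate([y_L, y_tilde[clipped]])
    w_all = np.concatenate([np.full(m, 1.0 / m),               # labeled weight 1/m
                            np.full(clipped.sum(), 1.0 / n)])  # unlabeled weight 1/n
    h = DecisionTreeClassifier(max_depth=3).fit(X_all, y_all, sample_weight=w_all)
    ensemble.append(h)
    # The new classifier's weight would then be chosen by line search on the slack
    # function (Sec. 6.3.1), using its predictions on L and U:
    return h.predict(X_L), h.predict(X_U)
```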

6.3 Maximizing the Performance of an Ensemble of Trees

We now address the vanilla non-sequential setting, as experimented with in Chapter 5. In that work (paper at [BF15b]), we augmented the ensemble with specialists constructed from the leaves of the trees, each of which predicts only on the data falling into it, contributing local information about the true labels. Recall that this algorithm, called HEDGECLIPPER, optimizes the slack function using standard stochastic gradient descent with a step size schedule ∝ t^{−1/2}.

6.3.1 Line Search

We improve the random forest aggregation method HEDGECLIPPER in several ways. One is to use line search. This is motivated by the reasons described in the previous section, but there is an even more important reason to use line search in the specialist setting – the slack function is scaled very differently along different dimensions, making a fixed step size schedule even less effective because of high variance in the direction being chosen. In this and later sections of the chapter, we use a modified (memoized) version of golden section search for our line search, which is crucial to the performance of our algorithms. It is described well in the original paper ([Kie53]) and commonly in textbooks; we use the version given in the scipy.optimize Python package.
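For illustration, choosing a newly added classifier's weight by golden-section search via scipy.optimize could look like the following sketch; here b_new, scores_old, and preds_new (our names) are the new coordinate's estimated correlation and the current and new predictions on the unlabeled set:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_search_weight(b_new, scores_old, preds_new):
    """Golden-section search for the weight c of a new coordinate, minimizing the
    slack function restricted to that direction (constant terms dropped)."""
    def slack_along_direction(c):
        s = scores_old + c * preds_new
        return -c * b_new + np.mean(np.maximum(np.abs(s), 1.0))
    res = minimize_scalar(slack_along_direction, bracket=(0.0, 1.0), method='golden')
    return max(res.x, 0.0)          # keep the weight nonnegative (sigma >= 0)
```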

6.3.2 Wilson’s Score Interval for Estimating b

We also make another improvement to HEDGECLIPPER, extending the idea to handle all internal nodes, not just the potentially prohibitively small-support leaves. So we calculate the components of b, i.e. the errors of all the specialists representing internal nodes, simultaneously.

Figure 6.2. Effect of the Wilson score interval (Sec. 6.3) for measuring b, on a decision tree with 9 internal nodes (left). In the middle, using 100 labeled data: for each internal node, the green bar is the fraction of labeled data in that node; the red bar is the plugin estimate of b; the blue bar is the Wilson lower bound, calculated from the green and red bars. Right: as middle, but with 400 labeled data.

These errors are close to their estimates from the labeled data with high probability – this uniform convergence of the estimates is deeply studied in learning theory ([Vap82]). The theoretical uniform bounds are too loose for direct use, though, so for empirical use we upper-bound each individually with some very high confidence (e.g. 99.9%); by uniform convergence, this probably constitutes a valid uniform bound on the vector b.

We now specify how each specialist's bound is computed. For each node, we are estimating a binomial proportion, say p, using an estimate p̂ computed with m samples. The natural option for this is the Wald confidence interval with width √(p̂(1 − p̂)/m). However, this fails to provide adequate coverage in two regimes of interest in decision tree partitionings: a small number of labeled data falling into a leaf, and very skewed leaves (p̂ ≈ 0 or 1). To maintain coverage of our interval in these situations, we calculate b using the lower bound provided by Wilson's score interval ([Wil27]; also [LR06a], Example 11.2.7) for each node of the tree. This follows accepted practice for estimating p in the aforementioned regimes of interest ([BCD01]). Figure 6.2 depicts the effect of using Wilson's interval, even on small pure leaves – it implements nonuniform shrinkage of all errors towards 1/2 to ensure good coverage, in keeping with the conservatism of muffling. The only parameter here is the confidence level (i.e. the allowed probability of failure) – a higher such probability makes the prediction more aggressive, resulting in most of the internal nodes getting "mowed down" to a Wilson lower bound of 0, as seen in Figure 6.2. We call the resulting algorithm HEDGEMOWER, minimizing the slack function over the random forest trees combined with all their internal nodes. See Sec. 6.8 for a full specification of the algorithm (Alg. 3) and of Wilson's score interval.

Figure 6.3. Score distributions of four different classifiers in the columns from left to right – respectively logistic regression, random forests, HEDGEMOWER-1, and HEDGEMOWER. Top row: covtype dataset. Bottom row: ssl-secstr. Blue dashed lines are at scores of ±1, the inflection points of Ψ.

6.3.3 Discussion

We find that line search and Wilson's interval are crucially important to our empirical performance. Line search results in significant improvements over HEDGECLIPPER's SGD with a stepsize schedule, even without any additional specialist nodes. To highlight this, in addition to HEDGEMOWER we implement a simplified algorithm we call HEDGEMOWER-1, which simply minimizes the slack function using the complete random forest trees, without further augmenting the ensemble with any specialists.

We are adding specialists of various sizes, representing variation at many scales and hence a different scaling factor for each column of F (recall the specialist setup of Sec. 5.2), so that dimensions are "unnormalized" by design. An open problem of Chapter 5 ([BF15b]) is to use second-order or other convex optimization methods, to continue to make progress in the optimization despite such multiscale issues. This is still unexplored as written, but we believe our approach of first-order line search works satisfactorily. Fig. 6.3 shows the effect of HEDGEMOWER and HEDGEMOWER-1 for a couple of datasets; the muffling effect of Ψ (recall Fig. 6.1) is readily apparent, particularly when specialist knowledge is incorporated (HEDGEMOWER, right column of Fig. 6.3).

6.4 Empirical Results

We now turn to implementing MARVIN and HEDGEMOWER on a variety of datasets. We summarize the algorithms we implement; the first four have each been described in previous sections, and MARVIN-D combines the two ideas.

• HEDGEMOWER-1 – Minimize slack function using just the non-specialist trees of a random forest (RF), and none of their specialists.

• HEDGEMOWER – Add all internal nodes of RF trees as specialists, minimize slack function.

• MARVIN – Add one (non-specialist) decision tree at a time.

• MARVIN-C – Like MARVIN, but with total correction each timestep.

• MARVIN-D – Similar to MARVIN-C (with total correction), but adding a (non- specialist) tree and its internal nodes each timestep, like HEDGEMOWER.

Our implementations of our new algorithms are in Python, and use the open-source package scikit-learn. We restrict the labeled data available to the algorithm by various orders of magnitude when feasible, to explore its effect. Unused labeled examples are combined with the test examples (and the extra unlabeled set, if any is provided) to form the set of unlabeled data used by the algorithm. All algorithms are run with 100 base unregularized decision trees as ensemble classifiers where applicable. Class-imbalanced and noisy datasets are included, so that AUC is an appropriate measure of performance. Results and 95% confidence intervals are given from 20 Monte Carlo trials, each starting with a different random subsample of labeled data. Further information^2 on the sources of the datasets can be found in Table 6.2.

We compare our algorithms' performance to that of standard supervised ensemble algorithms under the same conditions – AdaBoost and LogitBoost, random forests (100 trees, default parameters as in Chapter 5) as a high-performance supervised ensemble algorithm, and logistic regression. We find that one or more of our new algorithms is sufficient to achieve significant improvements over the baselines in all cases. The results are in Table 6.1, further discussed in Sec. 6.5.

Many of our datasets are large enough that U will not fit in memory, making the batch boosting method impractical. However, there is a fairly straightforward minibatch remedy: store only a fixed-size minibatch of unlabeled examples, and periodically replace this batch (or similar, e.g. in a streaming setting, replace a randomly selected example in the batch with each new example that arrives). This is explained in Sec. 6.9.

The only parameter adjustment done for the new algorithms is the failure probability used in calculating the Wilson bounds (details in Sec. 6.8). This applies to all our new algorithms, because all use line search, which requires b (just one component at a time for MARVIN, and many at once for the other four algorithms). The situation is particularly complex for MARVIN-D, which needs enough labeled data to estimate b for a growing ensemble including many specialists. It is certainly possible that further tuning of this nature will lead to better performance in the future, but we aim to highlight the approaches' simplicity and generality in this initial evaluation.

^2 Our code is available at https://github.com/aikanor/marvin.

Table 6.1. Area under ROC curve for HEDGEMOWER and MARVIN variants, and supervised ensemble algorithms, all run with 100 ensemble classifiers unless otherwise stated. All MARVIN variants are run with unregularized decision tree weak learners. The "# relevant nodes / # total nodes per tree" column refers to the average number of internal nodes not mowed down by Wilson's interval when running HEDGEMOWER, as a fraction of the total number of internal nodes. 95% confidence intervals are indicated using 20 Monte Carlo trials, with results in bold being for the "best" algorithm for each row (achieving the highest lower bound), along with any others with confidence intervals overlapping it. (Columns: Dataset, # labeled, HEDGEMOWER-1, HEDGEMOWER, # relevant nodes / # total nodes per tree, MARVIN, MARVIN-C, MARVIN-D, Random Forest, AdaBoost, LogitBoost, Logistic Regression. Rows: kagg-prot, ssl-text, kagg-cred, adult, covtype, cod-rna, ssl-secstr, and SUSY, each at several labeled-set sizes between 100 and 100K. The individual AUC entries are not reproduced here.)

6.5 Discussion

The results of Table 6.1 show that we achieve significant improvement over the baselines in all cases. The situation is more unclear when choosing between the new algorithms, an exciting source of future open problems. However, we can still deduce some statements and recommendations from Table 6.1. For a combination of simplicity, speed, and good performance with low variance, we recommend MARVIN-C. It only adds one new classifier per iteration, and its results dominate MARVIN across experiments; total correction with line search is effective on the convex slack function. As the culmination of our ideas in this chapter, MARVIN-D might be expected to perform best overall. This is possibly true for many datasets, but we cannot conclude this in general, because MARVIN-D often has high variance. We believe this has to do with the complex way in which it uses labeled data, both online and to estimate specialist errors. Further exploration appears warranted, because such optimization was out of our scope here.

We find that the algorithms here converge quickly in a number of ways. Our results typically can be achieved with a small fraction of the unlabeled data available; beyond this point, we believe that there is a statistical bottleneck in estimating the first term of the slack function, involving b. In addition, the table makes clear the profound effect of Wilson's interval on HEDGEMOWER, which uses all internal nodes. Other heuristics, like selecting just the top k nodes by Wilson score for some k, perform almost as well (not shown; see the full version ([BF16a])). All this makes the final decision rule of our algorithms essentially a thresholded linear combination of a few white-box tree learners, which has the advantageous side effect of being easily interpretable – the score is just an additive combination of specialist rules, each of which can be written as a decision rule (involving both the asleep/awake status and predictions when awake), similar to alternating decision trees ([FM99]) and related tree ensembles.

6.6 Related and Future Work

Semi-supervised learning has been an active area of research over the last decade ([CSZ06]), mostly involving graph-based methods like label propagation that operate pairwise on the data, and also including the transductive SVM [Joa99] and other algorithms [ZG09]. These generally try to locate the decision boundary at low-density regions of the unlabeled data ([CZ05]); when formulated as max-margin methods, they stand in stark contrast to the muffled min-margin idea for generalization. The hallucinated labels in such methods typically agree with the labeled data and with some type of cluster structure on the unlabeled data, used as a regularizer ([BNS06]); our regularizer is in the same spirit but encourages the opposite, muffling behavior. Other more coarse-grained methods using discriminative statistics [CSZ06, QSCL09] are more in the spirit of our algorithms.

Semi-supervised algorithms for boosting have previously drawn some attention for their applications, notably in [GLB08, KMJJL09], which also hallucinate labels over the unlabeled data; but they do not use the muffling framework. The only practical work which does is our aforementioned method of the previous chapter ([BF15b]). Another fascinating open problem, building further on these ideas, is how to target areas of the space with specialist classifiers as part of the incremental process, rather than just using the specialists provided by decision trees. This chapter is related to the significant existing supervised boosting literature ([SF12]). Such algorithms concern the incremental classifier aggregation idea, and generally attempt to minimize some convex upper bound on error on L. It would be of interest to incorporate other notions from boosting theory, like weak learnability, into our framework, or investigate if they are even necessary.

6.7 Derivation of the MARVIN Update Rule

The idea of hallucinating labels on unlabeled data is not new to semi-supervised learning (see Sec. 6.6). But in our case, we can show that MARVIN is approximately a greedy coordinate descent update on the slack function, as long as b can be estimated reasonably accurately using L:

    γ(σ_T) = −⟨b, σ_T⟩ + (1/n) ∑_{j=1}^n Ψ(⟨x_j^U, σ_T⟩)    (6.2)
           ≈ (1/m) ∑_{j=1}^m [ −y_j^L ∑_{t=1}^T h_t(x_j^L) ] + (1/n) ∑_{j=m+1}^{m+n} Ψ( ∑_{t=1}^T h_t(x_j^U) )
           = −∑_{t=1}^T (1/m) ∑_{j=1}^m y_j^L h_t(x_j^L) + (1/n) ∑_{j=m+1}^{m+n} Ψ( ∑_{t=1}^T h_t(x_j^U) )

Taking the partial derivative with respect to any of the coordinates {h_i}_{i=1}^p and minimizing over i to find the steepest descent direction, we get the MARVIN update.
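Spelling out that step (a sketch of the omitted computation, using the subgradient Ψ′(s) = sgn(s) · 1[|s| ≥ 1] of the potential well): for any coordinate h,

    ∂γ/∂σ_h ≈ −(1/m) ∑_{j=1}^m y_j^L h(x_j^L) + (1/n) ∑_{j=1}^n sgn(⟨x_j^U, σ⟩) · 1[ |⟨x_j^U, σ⟩| ≥ 1 ] h(x_j^U)
            = −[ (1/m) ∑_{j=1}^m y_j^L h(x_j^L) + (1/n) ∑_{j=1}^n ỹ_j h(x_j^U) ],

so the steepest-descent coordinate (the most negative partial derivative) is exactly the maximizer in Eq. (6.1).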

6.8 Generalization and Estimating b

Here we expand on the discussions of Section 6.3.

6.8.1 Wilson's Interval

Wilson's score interval is specified as follows, for a binomial proportion ([Wil27]). Our problem of bounding the error rate of a specialist from data is like determining the unknown bias p ∈ [0, 1/2] of a biased coin that comes up heads (i.e. 1) with probability p, using m random flips A_1, ..., A_m ∈ {0, 1} to estimate a high-probability upper bound for p. In our case, p is the error rate of the specialist, and m the number of labeled data predicted upon by the specialist and used to estimate its error. We would like a high-probability upper bound p_u for p, with specified failure probability α ∈ [0, 1]; we wish that p ≤ p_u w.p. 1 − α over the m coin flips (the labeled data).

The most apparent unbiased estimator of p is p̂ = (1/m) ∑_{i=1}^m A_i. The commonly used Wald confidence interval uses the fact that mp̂ is binomially distributed: mp̂ ∼ Bin(m, p), and plugs in p̂ instead of p for the bias of the binomial. This results in an upper bound on error of

    p_u = p̂ + σ̂ z_α,   where σ̂ = √( p̂(1 − p̂) / m ),

where z_α is the (1 − α) quantile of the standard normal distribution. Wilson's interval instead shows shrinkage toward 1/2, along with additive bias and variance corrections:

    p_u = p̃ + σ̃ z_α,   where p̃ = ( p̂ + z_α²/(2m) ) / ( 1 + z_α²/m ),   σ̃ = √( p̂(1 − p̂)/m + z_α²/(4m²) ) / ( 1 + z_α²/m )

The interval is derived by considering the behavior of the exact binomial tail for small m, the regime in which approximations to such an exact tail fail. Our usage often requires this, particularly when considering leaves of an unregularized decision tree as specialists. Wilson's interval is therefore numerically appropriate for our usage; further discussions on numerical stability can be found in the excellent overview of [BCD01]; see also [LR06a].
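As an illustration (a sketch with our own function names, not the dissertation's code), the Wilson upper bound on a specialist's error, and the corresponding lower bound on its correlation under the simplification that correlation = 1 − 2·(error rate) on the examples it is awake on, can be computed as:

```python
import numpy as np
from scipy.stats import norm

def wilson_upper_error(err_hat, m, alpha):
    """High-probability upper bound p_u on a binomial error rate p, from the
    plug-in estimate err_hat on m samples and failure probability alpha."""
    z = norm.ppf(1.0 - alpha)
    center = (err_hat + z**2 / (2.0 * m)) / (1.0 + z**2 / m)
    width = z * np.sqrt(err_hat * (1.0 - err_hat) / m + z**2 / (4.0 * m**2)) / (1.0 + z**2 / m)
    # Note: width stays > 0 even when err_hat is exactly 0 or 1 (the regime where Wald fails).
    return center + width

def wilson_lower_correlation(err_hat, m, alpha):
    # Upper-bounding the error gives a lower bound on the correlation 1 - 2*error.
    return 1.0 - 2.0 * wilson_upper_error(err_hat, m, alpha)
```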

6.8.2 Other Details

We selected the allowed failure probability for Wilson's score interval by cross-validating among only a few values (or not at all), depending on the number of labeled data m: {0.01} for m = 1K, {0.001, 0.005} for m = 10K, {0.001} for m = 100K. These work across datasets, and across our new algorithms, to give significant performance improvements. Small m values are more problematic to deal with – the predictor has naturally higher variance – so for those experiments we choose from {0.003, 0.01, 0.03, 0.1}.

All our algorithms put labeled data to two different uses: training the ensemble itself (RF or incrementally), and estimating b. For HEDGEMOWER and HEDGEMOWER-1, we recommend using less training data in general when m is large (as supervised generalization anyway limits the predictive power of each ensemble classifier), in favor of more accurately estimating b. For MARVIN, a similar situation occurs: on each iteration t, the classifier being added to the ensemble must have its b_t estimated in order to proceed with the line search. We use 1/4 of the labeled data to train and 3/4 to measure b; this works well across all experiments.

Algorithm 3. HEDGEMOWER
INPUT: Size-m labeled set L, size-n unlabeled set U, number of trees p, Wilson interval tail probability α.
Partition L at random into sets L_1 (of size m/4) and L_2.
Train a random forest with p trees using L_1.
Calculate b using L_2, for the p trees and all their internal nodes, using the lower bound of Wilson's interval with tail prob. α.
Prune b to leave only nodes with Wilson lower bounds ≥ 0.
(Optional: Further prune, by Wilson scores or otherwise. HEDGEMOWER-1 prunes away all except the p original non-specialist trees.)
Minimize the slack function (approximately) using b and U:

    σ* ≈ argmin_{σ ≥ 0^p} [ −⟨b, σ⟩ + (1/n) ∑_{j=1}^n Ψ(⟨x_j^U, σ⟩) ]    (6.3)

(Done in this chapter with gradient descent using line search.)
OUTPUT: Predictor g(x) = clip(⟨x, σ*⟩)

6.9 Implementation Details

Many of our datasets are large enough that U will not fit in memory, making the batch boosting method seem impractical. However, there is a fairly straightforward minibatch remedy: store only a fixed-size minibatch of unlabeled examples, and periodically resample this entire minibatch. Similar methods refine this further – in a streaming setting, one can replace a randomly selected example in the batch with each new example that arrives. This works because in the muffled formulation, the unlabeled data only enter into the optimization through the average value of the potential well. Since the data are i.i.d., this can be estimated with just a small sample, so roughly speaking, the transductive setting converges to the i.i.d. setting when n is high (abundant unlabeled data). This could conceivably be done for the labeled data as well, but to compare to batch supervised baselines we limit m to fit in our memory in this chapter.

There are potentially situations in which the hierarchical partitioning of the data is known by other means than a decision tree, such as an unsupervised clustering method. In such cases, the labeled data can exclusively be used to estimate b for each specialist, but the specialist predictions are not defined a priori. Here it is natural to associate each node with just one prediction: the majority label of the data falling within it. We experimented with this method using the partitioning defined by the decision tree (not shown), and it performs comparably to or slightly worse than the methods in this chapter; we believe this is because decision trees are learning a supervised hierarchical partition, so the granularity of their predictions is useful. Further discussion is outside our scope here, but this variant might be better for distributed applications, where each node can store its own one-bit prediction irrespective of the others.

Table 6.2. Information about the datasets used in Chapter 6.

Dataset     | # labeled                | # unlabeled | Dim.  | Comments
kagg-prot   | 3750                     |             | 1776  | Kaggle challenge [kag12]
ssl-text    | 1500                     |             | 11960 | [CSZ06]
kagg-cred   | 150K                     |             | 10    | Kaggle challenge [kag11]; imbalanced (< 10% positives)
adult       | 32561                    |             | 123   | LibSVM
covtype     | 581012                   |             | 54    | LibSVM
ssl-secstr  | 83679                    | 1189472     | 315   | [CSZ06]
cod-rna     | 59535 train, 271617 test | 157413      | 8     | LibSVM
SUSY        | 5M                       |             | 18    | UCI

Chapter 7

Optimal Classifier Aggregation for Decomposable Multiclass Losses

The ideas developed in the previous chapters for binary classification extend also to multiclass predictors. Here, the data are each labeled with one of K ≥ 2 classes. Due to the multiclass setting, we make some changes to the earlier formalism of Chapter 4 in Section 7.1, introducing a multiclass generalization of our ensemble aggregation game. Section 7.2 then defines a decomposability condition satisfied by a large class of losses, including multiclass log loss and all the general losses for the K = 2 case explored in Chapter 4. The game is solved for these losses in Section 7.3. A particularly notable corollary of the result is a convex-optimization learning algorithm for a “softmax” artificial neuron, as commonly used in much recent work on deep learning ([LBH15]). Strictly speaking, our results here subsume those of Chapter 4, but separate proofs are given for both, because the multiclass proof is significantly longer by necessity.

7.1 Preliminaries

In this setting, the labels are naturally encoded in a K-dimensional space. Exactly one label is correct, so after incorporating randomization of the labels as before, the label space is isomorphic to a probability simplex in K dimensions. We translate this into our notation, in which the randomized presence or absence of each class is indicated by a number in [−1, 1], so that example i's label z^i ∈ ∆^K := { z ∈ [−1, 1]^K : 1^⊤ z = −K + 2 }. The algorithm's prediction g^i is in ∆^K as well.

Again, our goal is to minimize the average worst-case loss over the data, this time using some multiclass loss function ℓ : [−1, 1]^K × [−1, 1]^K → ℝ. The adversary is still constrained by our information about the true labels. The unlabeled data F is now not only indexed by the classifier h ∈ [p] and datapoint i ∈ [n], but also the class k ∈ [K]; it can now be considered a third-order tensor


rather than a matrix. We write F_{k,h}^i as the value of the k-th coordinate of the prediction of classifier h on example i. Similarly, the incoming data can be represented as matrices, not vectors – example i ∈ [n] can be written as

    X^i = [ F_{1,1}^i  F_{2,1}^i  ···  F_{K,1}^i
            ⋮          ⋮          ⋱    ⋮
            F_{1,p}^i  F_{2,p}^i  ···  F_{K,p}^i ]  ∈ [−1, 1]^{p×K}

We write the k-th column of X^i as x_k^i ∈ [−1, 1]^p. Each coordinate k ∈ [K] takes on values in [−1, 1] for each data example. We therefore treat the situation similarly to the basic binary classification scenario for each example, so that the true labels z^1, ..., z^n are constrained for each coordinate k ∈ [K]:

    ∀h ∈ [p] :   (1/n) ∑_{i=1}^n z_k^i F_{k,h}^i ≥ [b_k]_h    (7.1)

for some known b_1, ..., b_K ∈ ℝ^p (e.g. measured from labeled data, as in previous chapters), and known F (from unlabeled data). The loss minimization game that results is therefore as follows:

    V := min_{g^1,...,g^n ∈ ∆^K}  max_{z^1,...,z^n ∈ ∆^K : ∀k∈[K], ∀h∈[p], (1/n) ∑_{i=1}^n z_k^i F_{k,h}^i ≥ [b_k]_h}  (1/n) ∑_{i=1}^n ℓ(z^i, g^i)    (7.2)

7.2 Decomposable Losses

The multidimensional label space is in general much more complex than the one-dimensional space that we considered for binary classification. However, most multiclass losses of interest have additional structure that we can exploit to vastly simplify the problem, and then apply our ideas from the two-class setting. Namely, the key observation is that many multiclass losses decompose additively over the K dimensions.

Definition 14 (Additively Decomposable Loss). Call a continuous loss ℓ(z^i, g^i) : ℝ^K × ℝ^K → ℝ additively decomposable if it can be written in the form

    ℓ(z^i, g^i) = ∑_{k=1}^K ℓ_k(z_k^i, g_k^i) = ∑_{k=1}^K ((1 + z_k^i)/2) ℓ_k(g_k^i)    (7.3)

for some univariate partial losses ℓ_k. We call the loss monotonically additively decomposable (MAD) if each ℓ_k(·) is decreasing.

One leading example of a MAD loss is the multiclass logarithmic loss between a true label vector z^i ∈ ∆^K and a prediction g^i ∈ ∆^K, written as

    ℓ^log(z^i, g^i) := ∑_{k=1}^K ((1 + z_k^i)/2) ln( 2 / (1 + g_k^i) )    (7.4)

Similarly, we can consider a large class of multiclass losses that can be encoded as a pairwise cost matrix between labels C ∈ ℝ^{K×K}, where C_{ij} is the loss (cost) suffered when i is the actual label and j the predicted label. One example in this family is the multiclass 0-1 loss, where C has zeroes on the diagonal and ones elsewhere. For such loss functions, the expected loss ℓ^C can be written as

    ℓ^C(z^i, g^i) := ∑_{k=1}^K ∑_{m=1}^K C_{k,m} ((1 + z_k^i)/2) ((1 + g_m^i)/2)    (7.5)

In this section, we develop powerful extensions of our results to multiclass losses that are decomposable, as in the above examples. The upshot is that any monotonically decomposable multiclass loss fits into our classifier aggregation framework, leading to optimal linear-time semi-supervised aggregation algorithms similar to those we have previously described. In fact, all the results for general binary classification losses, stated in Chapter 4, are subsumed by our results for MAD losses, because any binary classification loss with monotonic partial losses is a MAD loss with K = 2.
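To spell out the last claim (a worked special case rather than a new result): for K = 2, the constraint 1^⊤z = −K + 2 = 0 forces z_2^i = −z_1^i, so any MAD loss reduces to

    ℓ(z^i, g^i) = ((1 + z_1^i)/2) ℓ_1(g_1^i) + ((1 − z_1^i)/2) ℓ_2(g_2^i),

which is exactly the binary form with two monotonic partial losses studied in Chapter 4.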

7.3 Main Result

The main result of this chapter solves Eq. (7.2).

Theorem 15. Suppose ℓ is monotonically additively decomposable, i.e. each partial loss ℓ_k, k ∈ [K], is decreasing. The minimax value of the game (7.2) is

    V = min_{σ_1,...,σ_K ≥ 0^p} (1/2) [ ∑_{k=1}^K −( b_k + (1/n) ∑_{i=1}^n x_k^i )^⊤ σ_k + (2/n) ∑_{i=1}^n min_{g^i ∈ ∆^K} max_{k ∈ [K]} ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ]    (7.6)

This result (proved in Sec. 7.4) is structurally similar to our main result from Chapter 3, except that it replaces the potential well by a minimax problem depending on the partial losses ℓ_k and the scores σ_k^⊤ x_k^i. This has no simple analytical form in general (although it does in cases of interest – see Sec. 7.3.1), but we can express the optimal strategy g* as the result of an efficient algorithm.

Algorithm 4. Water-filling algorithm for Theorem 16

1: Initialize g_1 = g_2 = ··· = g_K = 0.
2: W.l.o.g. consider the coordinates ordered such that
       ℓ_1(g_1) + s_1 ≥ ℓ_2(g_2) + s_2 ≥ ··· ≥ ℓ_K(g_K) + s_K
3: if ℓ_K(g_K) + s_K is infinite then
4:     Initialize g_1 = g_2 = ··· = g_K = ε for some small ε → 0^+.
5: end if
6: for i = 1 to K − 1 do
7:     Define C_i := ℓ_i(g_i) + s_i − (ℓ_{i+1}(g_{i+1}) + s_{i+1}).
8:     Define λ_j := ℓ_j^{−1}( ℓ_j(g_j) − C_i ) for all j = 1, 2, ..., i.
9:     if ∑_{j=1}^i λ_j > 1 then
10:        Define C'_i := max{ C ≥ 0 : ∑_{j=1}^i ℓ_j^{−1}( ℓ_j(g_j) − C ) ≤ 1 }.
11:        Set g_j to ℓ_j^{−1}( ℓ_j(g_j) − C'_i ) for all j = 1, 2, ..., i.
12:        Return g* = (g_1; g_2; ...; g_K).
13:    else
14:        Set g_j = λ_j for all j = 1, 2, ..., i.
15:    end if
16: end for
17: // At this point, the values { ℓ_k(g_k) + s_k }_{k ∈ [K]} are equal.
18: Define C_K := max{ C ≥ 0 : ∑_{j=1}^K ℓ_j^{−1}( ℓ_j(g_j) − C ) ≤ 1 }.
19: Set g_j to ℓ_j^{−1}( ℓ_j(g_j) − C_K ) for all j = 1, 2, ..., K.
20: Return g* = (g_1; g_2; ...; g_K).

Theorem 16. For any scores { s_k : k = 1, ..., K }, consider the game

    min_{g ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k) + s_k ]    (7.7)

Then the minimax optimal predictions g* are specified by a "water-filling" algorithm written as Alg. 4.

This is proved in Sec. 7.5.

7.3.1 Example: Optimality of the Softmax Artificial Neuron

Theorem 16 and Alg. 4 may appear opaque at first. To illustrate the result's consequences, it can be instantiated with the example of the logarithmic loss from earlier. It turns out that the optimal decision function for multiclass log loss is a "softmax" artificial neuron, learnable with an efficient convex optimization:

Proposition 6. For the log loss ℓ^log as defined in Eq. (7.4), the value of the game (7.2) is

    V = min_{σ_1,...,σ_K ≥ 0^p} (1/2) [ ∑_{k=1}^K −( b_k + (1/n) ∑_{i=1}^n x_k^i )^⊤ σ_k + (2/n) ∑_{i=1}^n ln( ∑_{j=1}^K exp( σ_j^⊤ x_j^i ) ) ]    (7.8)

If σ_1^*, ..., σ_K^* are the minimizers above, then for k ∈ [K], the optimal prediction on example i is

    g_k^{i*} = −1 + 2 exp( σ_k^⊤ x_k^i ) / ∑_{j=1}^K exp( σ_j^⊤ x_j^i )

Note that the potential well becomes the log-sum-exp function, well known to be convex and 1-Lipschitz.

Proof of Prop. 6. We use Theorem 16 to solve the inner min-max problem of Theorem 15 for any example i ∈ [n], and then finish the proof by invoking Theorem 15.

First we run the algorithm of Theorem 16 (Alg. 4). Consider any example i, and define the scores s_k := σ_k^⊤ x_k^i for all k ∈ [K]. Alg. 4 essentially skips directly past the for loop to Line 19, putting nonzero weight on each class because the partial losses ℓ_k^log(g_k^i) = ln( 2 / (1 + g_k^i) ) diverge approaching zero (we omit the "log" and "i" superscripts hereafter when clear from context in discussing example i). Therefore, for some constant Z,

    ℓ_1(g_1^*) + s_1 = ··· = ℓ_K(g_K^*) + s_K = Z,

so for all k ∈ [K],

    g_k^* = ℓ_k^{−1}( Z − s_k ) = 2 exp( s_k − Z ) − 1

The vector g^* represents a probability distribution with probabilities that we write as p_k^* ∝ exp( s_k − Z ). Observe that Z corresponds to the normalization factor making ∑_k p_k^* = 1, so that

    p_k^* = exp( s_k ) / ∑_k exp( s_k ),    Z = ln( ∑_k exp( s_k ) ),

which is exactly the softmax function. This means that for any i ∈ [n],

    min_{g^i ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k^i) + σ_k^⊤ x_k^i ] = ln( ∑_{j=1}^K exp( σ_j^⊤ x_j^i ) )

Substituting this into Theorem 15 gives the result.
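A small sketch of this prediction rule (our naming), mapping the per-class scores s_k = σ_k^⊤ x_k^i to the optimal g^i ∈ ∆^K of Prop. 6:

```python
import numpy as np

def optimal_multiclass_prediction(scores):
    """scores[k] = <sigma_k, x_k^i>; returns g^i with g_k = -1 + 2 * softmax(scores)_k."""
    s = scores - np.max(scores)            # stabilize the softmax / log-sum-exp
    p = np.exp(s) / np.sum(np.exp(s))      # softmax probabilities p_k
    return 2.0 * p - 1.0                   # element of Delta^K (its entries sum to -K + 2)
```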

7.4 Proof of Theorem 15

Proof of Theorem 15. First note that by definition of a MAD loss,

    V = min_{g^1,...,g^n ∈ ∆^K}  max_{z^1,...,z^n ∈ ∆^K : ∀k∈[K], ∀h∈[p], (1/n) ∑_i z_k^i F_{k,h}^i ≥ [b_k]_h}  (1/n) ∑_{i=1}^n ∑_{k=1}^K ((1 + z_k^i)/2) ℓ_k(g_k^i)    (7.9)

      = min_{g^1,...,g^n} (1/2) [ (1/n) ∑_{i=1}^n ∑_{k=1}^K ℓ_k(g_k^i) + max_{z^1,...,z^n} (1/n) ∑_{i=1}^n ∑_{k=1}^K z_k^i ℓ_k(g_k^i) ]    (7.10)

where the constraints on g^i, z^i are omitted for convenience.

The adversary's constrained optimization over z can be solved by introducing a Lagrangian vector σ_k ≥ 0^p for the set of p constraints applying to each coordinate k ∈ [K]:

    max_{z^1,...,z^n ∈ ∆^K : ∀k,h, (1/n) ∑_i z_k^i F_{k,h}^i ≥ [b_k]_h}  (1/n) ∑_{i=1}^n ∑_{k=1}^K z_k^i ℓ_k(g_k^i)
      = max_{z^1,...,z^n ∈ ∆^K}  min_{σ_1,...,σ_K ≥ 0^p} [ (1/n) ∑_{i=1}^n ∑_{k=1}^K z_k^i ℓ_k(g_k^i) + ∑_{k=1}^K ∑_{h=1}^p σ_{k,h} ( (1/n) ∑_{i=1}^n z_k^i F_{k,h}^i − [b_k]_h ) ]
      (a)= min_{σ_1,...,σ_K ≥ 0^p}  max_{z^1,...,z^n ∈ ∆^K} [ −∑_{k=1}^K b_k^⊤ σ_k + (1/n) ∑_{i=1}^n ∑_{k=1}^K z_k^i ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ]
      = min_{σ_1,...,σ_K ≥ 0^p} [ −∑_{k=1}^K b_k^⊤ σ_k + (1/n) ∑_{i=1}^n max_{z^i ∈ ∆^K} ∑_{k=1}^K z_k^i ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ]    (7.11)

where (a) follows by an application of the minimax theorem (Theorem 25), and by rearranging terms.

Therefore, substituting (7.11) into (7.9),

    V = min_{g^1,...,g^n} (1/2) [ (1/n) ∑_{i=1}^n ∑_{k=1}^K ℓ_k(g_k^i) + min_{σ_1,...,σ_K ≥ 0^p} ( −∑_{k=1}^K b_k^⊤ σ_k + (1/n) ∑_{i=1}^n max_{z^i ∈ ∆^K} ∑_{k=1}^K z_k^i ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ) ]
      = (1/2) min_{σ_1,...,σ_K ≥ 0^p} [ −∑_{k=1}^K b_k^⊤ σ_k + (1/n) ∑_{i=1}^n min_{g^i ∈ ∆^K} max_{z^i ∈ ∆^K} ∑_{k=1}^K ( ℓ_k(g_k^i) + z_k^i ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ) ]    (7.12)

We have decomposed the problem to depend on an inner zero-sum game between the algorithm and the adversary over each data point, with each side playing a probability distribution:

    min_{g^i ∈ ∆^K} max_{z^i ∈ ∆^K} ∑_{k=1}^K [ ℓ_k(g_k^i) + z_k^i ( ℓ_k(g_k^i) + σ_k^⊤ x_k^i ) ]    (7.13)

The game (7.13) is dealt with by a separate argument, laid out in Lemma 17, which states that

    Eq. (7.13) = 2 min_{g^i ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k^i) + σ_k^⊤ x_k^i ] − ∑_{k=1}^K σ_k^⊤ x_k^i

Using this result to simplify (7.12),

    V = (1/2) min_{σ_1,...,σ_K ≥ 0^p} [ −∑_{k=1}^K b_k^⊤ σ_k + (1/n) ∑_{i=1}^n ( 2 min_{g^i ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k^i) + σ_k^⊤ x_k^i ] − ∑_{k=1}^K σ_k^⊤ x_k^i ) ]    (7.14)

as desired.

Lemma 17. For any constants { s_k : k = 1, ..., K },

    min_{g ∈ ∆^K} max_{z ∈ ∆^K} ∑_{k=1}^K [ ℓ_k(g_k) + z_k ( ℓ_k(g_k) + s_k ) ] = 2 min_{g ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k) + s_k ] − ∑_{k=1}^K s_k

Proof of Lemma 17. Recall that ∆^K := { z ∈ [−1, 1]^K : 1^⊤ z = −K + 2 }. We address the inner maximization over z by introducing a scalar Lagrange parameter η ∈ ℝ for the lone equality constraint on z, which decomposes the problem into a tractable one.

    max_{z ∈ ∆^K} ∑_{k=1}^K z_k ( ℓ_k(g_k) + s_k )
      = max_{z ∈ [−1,1]^K} min_η [ ∑_{k=1}^K z_k ( ℓ_k(g_k) + s_k ) + η ( −K + 2 − 1^⊤ z ) ]
      (a)= min_η max_{z ∈ [−1,1]^K} [ ∑_{k=1}^K z_k ( ℓ_k(g_k) + s_k ) + η ( −K + 2 − 1^⊤ z ) ]
      = min_η [ ∑_{k=1}^K max_{z_k ∈ [−1,1]} z_k ( ℓ_k(g_k) + s_k − η ) − η ( K − 2 ) ]
      = min_η [ ∑_{k=1}^K | ℓ_k(g_k) + s_k − η | − η ( K − 2 ) ]    (7.15)

where (a) follows by the minimax theorem (Theorem 25, valid because the constraint set for z is closed). Write the objective function of (7.15) as

    A(η) := ∑_{k=1}^K | ℓ_k(g_k) + s_k − η | − η ( K − 2 )

Note that this is convex in η, with subdifferential

    ∂A(η) = { −K + 2 + |{ k : η > ℓ_k(g_k) + s_k }| − |{ k : η < ℓ_k(g_k) + s_k }| + α |{ k : η = ℓ_k(g_k) + s_k }| : α ∈ [−1, 1] }

Define η_max := max_{k ∈ [K]} [ ℓ_k(g_k) + s_k ]. We now argue that

    η_max ∈ argmin_η A(η)

by using first-order derivative conditions on A(η). For any η > η_max, we can see that |{ k : η > ℓ_k(g_k) + s_k }| = K. So for any d ∈ ∂A(η),

    d ≥ −K + 2 + |{ k : η > ℓ_k(g_k) + s_k }| − |{ k : η ≤ ℓ_k(g_k) + s_k }| = 2

and we see that η ∉ argmin_η A(η).

Next, consider any η < η_max. Then there is at least one k* ∈ argmax_k [ ℓ_k(g_k) + s_k ] such that η < ℓ_{k*}(g_{k*}) + s_{k*}, and consequently |{ k : η < ℓ_k(g_k) + s_k }| ≥ 1. Therefore, from the definition of ∂A(η), we can conclude that for any d ∈ ∂A(η),

    d ≤ −K + 2 + |{ k : η ≥ ℓ_k(g_k) + s_k }| − |{ k : η < ℓ_k(g_k) + s_k }| ≤ −K + 2 + ( K − 1 ) − 1 = 0

Since A is convex in η, this means that when searching for some η* ∈ argmin_η A(η), we can w.l.o.g. consider η* ≥ η_max. From this and our previous considerations showing ∂A(η) ⊆ [2, ∞) for η > η_max, we can conclude that η_max ∈ argmin_η A(η).

Combining this information with (7.15),

    max_{z ∈ ∆^K} ∑_{k=1}^K z_k ( ℓ_k(g_k) + s_k ) = ∑_{k=1}^K | ℓ_k(g_k) + s_k − η_max | − η_max ( K − 2 )    (7.16)

Using (7.16), we can now address the outer minimization over g:

    min_{g ∈ ∆^K} max_{z ∈ ∆^K} ∑_{k=1}^K [ ℓ_k(g_k) + z_k ( ℓ_k(g_k) + s_k ) ]
      = min_{g ∈ ∆^K} [ ∑_{k=1}^K ( ℓ_k(g_k) + | ℓ_k(g_k) + s_k − η_max | ) − η_max ( K − 2 ) ]

Since η_max = max_{k=1,...,K} [ ℓ_k(g_k) + s_k ], the summand is

    ℓ_k(g_k) + | ℓ_k(g_k) + s_k − η_max | = ℓ_k(g_k) − ( ℓ_k(g_k) + s_k − η_max ) = η_max − s_k

and therefore, the game can be simplified:

    min_{g ∈ ∆^K} max_{z ∈ ∆^K} ∑_{k=1}^K [ ℓ_k(g_k) + z_k ( ℓ_k(g_k) + s_k ) ] = min_{g ∈ ∆^K} [ ∑_{k=1}^K ( η_max − s_k ) − η_max ( K − 2 ) ]
      = 2 min_{g ∈ ∆^K} η_max − ∑_{k=1}^K s_k = 2 min_{g ∈ ∆^K} max_{k ∈ [K]} [ ℓ_k(g_k) + s_k ] − ∑_{k=1}^K s_k

7.5 Proof of Theorem 16

Proof of Theorem 16. Define δ_k := ℓ_k(g_k) + s_k for convenience. Algorithm 4 seeks to minimize an objective τ := max_{k∈[K]} δ_k, by monotonically adding probability mass to the coordinates of g* until it runs out of its "budget" of unit probability mass. (The initialization step of the algorithm can be thought of as initializing g_1 = ⋯ = g_K = 0; the preprocessing if condition of Line 3 serves only to handle cases like log loss, for which the partial losses diverge at zero.) By always adding only to the coordinate(s) with maximum δ_k, the algorithm lowers ℓ_k(g_k) and lowers τ.
Observe that adding to any coordinate that does not currently maximize δ_k wastes some of its budget; that probability mass would be better used on the maximum coordinate(s). Similarly, when multiple coordinates are tied for the maximum δ_k, the algorithm should decrease them all at the same rate. Therefore, over the course of the algorithm as the δ_k values change, coordinates enter the set T := argmax_{k∈[K]} δ_k and subsequently never leave it.
So it is sufficient to prove that the algorithm always adds only to the coordinates in T, in such a way that T never shrinks. We do this for two phases of the algorithm: the first phase in which the first K − 1 coordinates join T one by one (the for loop), and the second phase in which the last coordinate joins T and all coordinates g_1, …, g_K are increased.

First phase: Define for i ∈ [K − 1] the following thresholds:

\[
\alpha_i = \min\{ \alpha \ge 0 : \ell_i(\alpha) + s_i = \ell_{i+1}(0) + s_{i+1} \}
\]

When coordinate i is being added to the set T for the first time, an amount αi can be added to it before another coordinate joins T. This is what is done by each iteration i of the for loop. On iteration i of the loop, if the budget of probability mass does not run out, Line 14 of the algorithm is reached, so gi is set to λi. Unraveling the definition of λi, this means that ∀ j ∈ [i],

\[
\ell_j(g_j) - \ell_j(\lambda_j) = C_i
\]
so that δ_1, …, δ_i are equal and none of them leave the set T (this corresponds to the set T = [i]). For coordinate j = i, we know that g_i = 0 and so

\[
\lambda_i = \ell_i^{-1}\big( \ell_i(0) - C_i \big) = \ell_i^{-1}\big( \ell_i(0) - \ell_i(0) - s_i + (\ell_{i+1}(0) + s_{i+1}) \big) = \ell_i^{-1}\big( \ell_{i+1}(0) + s_{i+1} - s_i \big)
\]

Matching definitions, λ_i is equivalent to α_i, so iteration i ∈ [K − 1] stops exactly when δ_i enters the set T, as desired.
Similarly, when the budget of probability mass runs out on iteration i of the loop, Line 11 of the algorithm is reached, and for all j ∈ [i] we can show that ℓ_j(g_j) − ℓ_j(λ′_j) = C_i, so that again no coordinate leaves the set T.

Second phase: If the last coordinate g_K is increased before the budget runs out, the algorithm reaches Line 19, and returns a value of $\xi_j := \ell_j^{-1}\big( \ell_j(g_j) - C_K \big)$. Similar to the first phase analysis,

it is clear from the definitions that for all j, C_K = ℓ_j(g_j) − ℓ_j(ξ_j), so the change in δ_j is a constant C_K for all j, and all coordinates remain in T for the second phase, as desired.
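To illustrate the allocation just analyzed (without reproducing Algorithm 4 itself), the following is a minimal discretized sketch under simplifying assumptions: log loss ℓ_k(g) = −log g and hypothetical constants s_k. It repeatedly adds a small increment of probability mass to the coordinate currently attaining the maximum δ_k = ℓ_k(g_k) + s_k until the unit budget is exhausted, which is the greedy behavior described in the proof.

```python
# Illustrative, discretized sketch of the greedy allocation analyzed above: add small
# increments of probability mass to the coordinate currently maximizing
# delta_k = ell_k(g_k) + s_k, until the unit budget is spent. Not Algorithm 4 itself
# (which computes the allocation in closed form); the loss and s_k here are hypothetical.
import numpy as np

def greedy_mass_allocation(s, step=1e-4, eps=1e-12):
    """Approximately minimize max_k [ell_k(g_k) + s_k] over the probability simplex,
    with ell_k(g) = -log(g) (log loss)."""
    K = len(s)
    g = np.full(K, eps)          # start near zero; log loss diverges at exactly zero
    budget = 1.0 - g.sum()
    while budget > 0:
        delta = -np.log(g) + s   # current delta_k values
        k = int(np.argmax(delta))
        add = min(step, budget)
        g[k] += add
        budget -= add
    return g

s = np.array([0.5, 0.0, -0.3])
g = greedy_mass_allocation(s)
print("g* ~", np.round(g, 3), " max_k delta_k ~", np.max(-np.log(g) + s))
```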

Chapter 8

Learning to Abstain from Binary Prediction

We address how to learn a binary classifier capable of abstaining from making a label prediction. Such a classifier hopes to abstain where it would be most inaccurate if forced to predict, so it has two goals in tension with each other: minimizing errors, and avoiding abstaining unnecessarily often. In this work, we exactly characterize the best achievable tradeoff between these two goals in a general semi-supervised setting, given an ensemble of classifiers of varying competence as well as unlabeled data on which we wish to predict or abstain. We give an algorithm for learning a classifier which trades off its errors with abstentions in a minimax optimal manner. This algorithm is as efficient as linear learning and prediction, and comes with strong and robust theoretical guarantees. Our analysis extends to a large class of loss functions and other scenarios, including ensembles comprised of “specialist” classifiers that can themselves abstain.

8.1 Introduction

Consider a general practice physician treating a patient with unusual or ambiguous symptoms. The general practitioner often does not have the capability to confidently diagnose such an ailment. The doctor is faced with a difficult choice: either commit to a potentially erroneous diagnosis and act on it, which can have catastrophic consequences; or abstain from any such diagnosis and refer the patient to a specialist or hospital instead, which is safer but will certainly cost extra time and resources. Such a situation motivates the study of classifiers which are able not only to form a hypothesis about the correct classification, but also abstain entirely from making a prediction if they are not confident enough. Abstaining can be very beneficial, because a sufficiently self-aware abstaining classifier might abstain on examples on which it is most unsure about the label; this lowers the prediction error it suffers when it does commit to a prediction. Like the doctor in the example above, however, there is typically no

use in abstaining on all data, so the amount of overall abstaining is somehow restricted. The classifier must essentially allocate a limited amount of abstaining where it will most reduce error.
In decision theory and machine learning, there has been much historical work on learning such abstaining classifiers (e.g. [Cho57, Cho70, HLS97, Tor00]), where the setting is often dubbed "classification with a reject option." This prior work has had a central focus: characterizing the tradeoff between the abstain rate, and the probability of error when committing to predict (the error rate). The optimal tradeoff is achieved by the set of classifiers which minimize error rate for a given abstain rate, and vice versa; such classifiers are Pareto optimal, and the set of them is called the Pareto frontier between abstain and error rates. The tradeoff has previously been examined theoretically using algorithms that are overly conservative by design in deciding to abstain, in order to prove error bounds. So the Pareto frontier between abstain rate and error rate has only been determined at a handful of points, despite a spate of recent activity on the problem ([BW08, WY11, EYW11, EYW12, ZC14]).
In this work, our main contribution is to describe the Pareto frontier completely for the very general semi-supervised learning scenario addressed by our muffled framework of previous chapters, in which labeled and unlabeled data are both available. So we assume we are given an ensemble of binary classifiers with which to build our aggregated abstaining classifier, along with (bounds on) the ensemble classifiers' error rates, estimated from labeled data. We generalize the formulation of Chapters 3 and 4 by exactly specifying the Pareto optimal frontier when learning an abstaining classifier in this setting. We also give an efficient algorithm for learning the abstaining classifiers realizing this optimal tradeoff, which achieve the lowest possible error rate for any given abstain rate. The resulting guarantees on error as a function of abstain rate, and their dependence on the structure of the unlabeled ensemble predictions, are without precedent in the literature and are all unimprovable in this setting, by virtue of the minimax arguments we use to derive them.
This chapter builds towards such results in Section 8.2, an introduction to our techniques on a common and natural model of abstaining: as a third possible outcome with a fixed cost independent of the true labels. We then arrive at the heart of the chapter in Section 8.3, building on the previous sections to establish the Pareto optimal tradeoff between abstaining and erring. Our analysis also shows that this Pareto frontier is exactly the same whether abstaining is penalized explicitly with a fixed cost for each abstained-upon example (as in Section 8.2) or is simply restricted to not occur too often overall (Section 8.3), the two prevailing models of abstaining in the literature.
In both Section 8.2 and Section 8.3, we also derive the minimax optimal algorithm that realizes this Pareto optimal tradeoff. To be specific, the tradeoff is a one-dimensional curve (Fig. 8.1, which we derive in Theorem 11), parametrized by the allowed abstain rate α. The algorithm takes α as an input, and outputs a decision rule with the lowest possible error bound among all classifiers with abstain rate α.

Figure 8.1. The abstain/error Pareto frontier in our setting: minimum achievable error rate V(α) for an aggregated classifier with allowed abstain rate ≤ α. Formalized in Theorem 11.

The method resembles that of Ch. 3, again involving a simple learning algorithm and prediction rule as scalable as a linear learning algorithm like the perceptron, and without parameters to tune. Having established the tradeoff and how to realize it optimally and efficiently, we discuss related work in Section 8.4. Finally, to show the robustness of our analysis, we re-analyze the fixed-cost model in Section 8.5 with the large class of other loss functions introduced in Chapter 4, which includes all convex surrogate losses used for ERM as well as many non-convex losses. Many of the proofs of this section are voluminous without adding much new insight, so we defer them to Sec. B.
As a last note, we fix our terminology. We build a classifier capable of two types of output on an input data point: abstaining, and predicting a label (that is, not abstaining). This is to avoid confusion, since there appear to be no standard names for these concepts in the abstaining literature.

8.2 Abstaining with a Fixed Cost

For the rest of this chapter, we build on the muffled learning setup of Chapter 3, extending it to be able to learn a classifier that can abstain. Recall the definitions from that setting, including the classifier ensemble H, unlabeled data F, true labels z, and ensemble correlation bounds b.
We first need to specify how to formalize the decision-making process of the abstaining classifier we generate. This is done w.l.o.g. by allowing the classifier to abstain on each test example j ∈ [n] with some probability 1 − p_j, and predict g_j with the remaining probability p_j ∈ [0,1]. Therefore, the classifier now plays two vectors, the probabilities of prediction p ∈ [0,1]^n and the predictions g ∈ [−1,1]^n, while the adversary again plays the true labels z constrained as in Chapter 3.

The objective function must reflect the tradeoff between abstaining and erring. In the literature, this is often modeled by assessing a pre-specified cost c ∈ [0, 1/2] for an abstention, somewhere between the loss of predicting correctly and guessing randomly. This induces the algorithm to be self-aware when it abstains – it will abstain whenever it would expect to incur a higher loss, > c, by predicting against the adversary. Formally, given a value of c, the game proceeds as follows:

• Algorithm plays probabilities of prediction p ∈ [0,1]^n, and predictions g ∈ [−1,1]^n.

• Adversary plays true labels z ∈ [−1,1]^n such that (1/n) F z ≥ b.

• Algorithm suffers expected loss p_j ℓ(z_j, g_j) + (1 − p_j) c on each test example, where ℓ(z_j, g_j) = (1/2)(1 − z_j g_j) is the zero-one loss as in Chapter 3.

We wish to minimize the worst-case expected loss on the test data:

\[
\begin{aligned}
V_c &:= \min_{g \in [-1,1]^n}\ \min_{p \in [0,1]^n}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \ell(z, g) \\
&= \min_{g \in [-1,1]^n}\ \min_{p \in [0,1]^n}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \frac{1}{2n} \sum_{j=1}^n \Big[ p_j (1 - z_j g_j) + 2(1 - p_j)\, c \Big] \qquad (8.1)
\end{aligned}
\]

Our goal is now to find the optimal g*_c, p*_c which minimize the above. To do this, we define a new potential function we will minimize over the unlabeled data.

Definition 7. Define the abstaining potential well given abstaining cost c ≤ 1/2:
\[
\Psi(m, c) = \begin{cases} |m| + 2c\,(1 - |m|) & |m| \le 1 \\ |m| & |m| > 1 \end{cases} \qquad (8.2)
\]

In particular, note that this generalizes the potential well of Chapter 3, which we continue to write as a univariate function Ψ(m); so we have Ψ(m) = Ψ(m, 1/2) = max(|m|, 1). The potential wells are plotted as a function of m for different values of c in Fig. 8.2. As c decreases with all else held equal, the potential well's shape changes so that more examples tend to have lower-magnitude margins at the optimum, and so they get abstained upon more. It is easy to see this in the extreme case – the c = 0 potential well is minimized by abstaining on all examples, so that they are all at the bottom of the well with zero margin.

Theorem 8. The minimax value of the game (8.1) for any positive cost c ≤ 1/2 is

" n # 1 > 1 > Vc = min −b σ + Ψ(x j σ,c) (8.3) ≥ p ∑ 2 σ 0 n j=1 84

Figure 8.2. Left: Ψ(m, c) for various c. Right: optimal predictions g*_j and abstain probabilities 1 − p*_j.

If we define σ*_c to be the minimizing weight vector in this optimization, the minimax optimal predictions g*_c and participation probabilities p*_c can be defined for each example j ∈ [n] in the test set as:
\[
p^*_{c,j} = \min\!\left( 1,\ \left| x_j^\top \sigma^*_c \right| \right), \qquad g^*_{c,j} = \operatorname{sgn}\!\left( x_j^\top \sigma^*_c \right)
\]
Just as in Chapter 3, we have reduced the learning problem to a convex optimization of a well-behaved function, and the optimal prediction on each test example depends only on that example's learned margin. This gives an efficient algorithm to optimally solve the cost-sensitive abstain scenario in the semi-supervised setting.
Observe that Ψ(m, c) is increasing in c for any m, so for any c ≤ 1/2,

" n # " n # 1 > 1 > 1 > 1 > Vc = min −b σ + Ψ(x j σ,c) ≤ min −b σ + Ψ(x j σ) = VPRED ≥ p ∑ ≥ p ∑ 2 σ 0 n j=1 2 σ 0 n j=1 which shows that allowing abstaining does help in the worst case; its benefit can be quantified in a way that depends intimately on the (unlabeled) data distribution. As in the prediction setting, the key quantity which encodes the dependencies ∗ ∗ ∗ of the ensemble on unlabeled data is σc , which is in general different from σ1/2 = σ in the prediction-only setting. The optimal strategies for the abstaining classifier are ∗ very simple, and are independent of c given the σc -weighted margin – in particular, after abstaining it is no longer optimal to “hedge,” or randomize, the predictions at all. Instead, the prediction is a simple majority vote, while the optimal classifier abstains with nonzero probability on exactly the examples it would hedge in the non- abstain case (those with margin in [−1,1]). Observe that this reduces to the non-abstaining 1 ∗ algorithm of Chapter 3 for c = 2 despite the apparent difference in g , since abstaining in this case is equivalent to predicting one of the two labels uniformly at random. To summarize, we have established a clean minimax solution to the problem of 85 abstaining with a specified cost, extending the classic fully supervised work of [Cho70]. To our knowledge, this and previous work in this model considered a supervised setting with no access to unlabeled data, so we believe we have contributed the first principled semi-supervised abstaining classifier in the fixed-cost model. However, we believe this model is still an unsatisfactory way to study the central abstain-error tradeoff, because there may be no clear way to choose c. We remedy this in the next section.

8.3 Predicting with Constrained Abstain Rate

In this section, we directly study the central tradeoff between abstain rate and error rate without using c, and derive Fig. 8.1. As shown in Fig. 8.1, classifiers on the Pareto frontier minimize their error rate under a given constraint on abstain rate. So the algorithm is now given a constant α ∈ [0,1]; we simply restrict our abstain rate to be ≤ α, and derive the minimax optimal error-minimizing strategy among all predictors with abstain rate thus restricted. The game protocol with the adversary is now a slightly altered version of the previous game:

• Algorithm plays probabilities of prediction p ∈ [0,1]^n such that (1/n) 1^⊤ p ≥ 1 − α, and predictions g ∈ [−1,1]^n.

• Adversary plays true labels z ∈ [−1,1]^n such that (1/n) F z ≥ b.

• Algorithm suffers expected error p_j ℓ(z_j, g_j) on each test example.

This deals directly with the abstain-error tradeoff that we have described earlier. Raising α always lowers the classifier's error rate in this scenario, because the examples on which it abstains by default contribute zero to the expected error, so abstaining can only lower the error contribution relative to predicting.
As in the previous section, our goal is to minimize the worst-case expected error when predicting on the test data (w.r.t. the randomized labeling z), which we write $\ell_p(z, g) := \frac{1}{n}\sum_{j=1}^n p_j \ell(z_j, g_j) = \frac{1}{2n}\sum_{j=1}^n p_j (1 - z_j g_j)$. We can again write our worst-case prediction scenario as a zero-sum game:

\[
\begin{aligned}
V(\alpha) &:= \min_{g \in [-1,1]^n}\ \min_{\substack{p \in [0,1]^n,\\ \frac{1}{n}\mathbf{1}^\top p \ge 1-\alpha}}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \ell_p(z, g) \qquad (8.4) \\
&= \min_{g \in [-1,1]^n}\ \min_{\substack{p \in [0,1]^n,\\ \frac{1}{n}\mathbf{1}^\top p \ge 1-\alpha}}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \frac{1}{2n} \sum_{j=1}^n p_j (1 - z_j g_j) \qquad (8.5)
\end{aligned}
\]

Next, we exactly compute the optimal g* (and V(α)) given α, and derive an efficient algorithm for learning them.

8.3.1 Solving the Game

The solution to the game (8.4) depends on the optimum of an appropriate abstaining potential as before (all proofs hereafter deferred to the appendix).

Theorem 9. Given some α ∈ (0,1), the minimax value of the game (8.4) is

" " n  # # 1 > 1 > λ V(α) = max min −b σ + Ψ x j σ, − λα (8.6) ≥ p ∑ 2 λ≥0 σ 0 n j=1 2

If we define σ*(α) to be the minimizing weight vector in this optimization, the minimax optimal predictions g* and participation probabilities p* can be defined for each example j ∈ [n] in the test set:
\[
p^*_j = \min\!\left( 1,\ \left| x_j^\top \sigma^*(\alpha) \right| \right), \qquad g^*_j = \operatorname{sgn}\!\left( x_j^\top \sigma^*(\alpha) \right)
\]
For the rest of this subsection, we elaborate on the structure of the solution in Theorem 9. Define the maximizing value of λ, written as a function of the abstain rate, to be λ*(α). Also define the minimand over σ as
\[
\gamma(\sigma, \lambda) := -b^\top \sigma + \frac{1}{n} \sum_{j=1}^n \Psi\!\left( x_j^\top \sigma,\, \frac{\lambda}{2} \right)
\]
and its minimizing σ as a function of λ, i.e. σ*(λ) = argmin_{σ ≥ 0^p} γ(σ, λ). Now
\[
\frac{\partial\, [\gamma(\sigma, \lambda)]}{\partial \lambda} = \frac{1}{n} \sum_{j=1}^n \frac{\partial \left[ \Psi\!\left( x_j^\top \sigma,\, \frac{\lambda}{2} \right) \right]}{\partial \lambda} = \frac{1}{n} \sum_{j=1}^n \left[ 1 - \left| x_j^\top \sigma \right| \right]_+
\]
so γ(σ, λ) is linear in λ. Therefore, the maximand over λ is concave, because it is a minimum of linear functions, so that we can define the concave function

\[
w(\lambda) := \min_{\sigma \ge 0^p} \gamma(\sigma, \lambda)
\]

This concavity means w′(λ) is decreasing¹, so we can deal with its pseudoinverse (w′)⁻¹(α) := inf{λ ≥ 0 : w′(λ) ≤ α}.
Theorem 9 presents the solution V(α) as a saddle point problem, and it may appear unclear how to efficiently solve this algorithmically. However, λ is the Lagrange parameter for the upper-bound constraint α, so that any α maps to a λ*(α), and vice versa.

Proposition 10. For a given α, the λ value corresponding to α is λ*(α) = (w′)⁻¹(α − 1), such that $\frac{1}{n}\sum_{j=1}^n p^*_j\big( \sigma^*[\lambda^*(\alpha)],\, \lambda^*(\alpha) \big) = 1 - \alpha$.

Prop. 10 implies that the maximizing value λ*(α) is decreasing in α, again identifying the bijection between λ ≥ 0 and α ∈ [0,1]. The bijection is such that the

abstain rate is exactly α, so that, as we might expect, there is never slack left in the abstain rate constraint at the optimum.
Returning to the saddle point problem in Theorem 9, it is not clear how to efficiently calculate λ*(α) for a given α and achieve a specific point on the optimal tradeoff curve. However, we can run the algorithm with λ as an input parameter, governing the optimal weight vector σ*(λ). Running it in parallel for a range of values of λ ≥ 0, we can learn a set of classifiers which approximates the optimal Pareto frontier, and then choose one with an abstain rate to our liking.

¹Strictly speaking, we mean any element of the subdifferential set; our proofs remain essentially unaltered with this change.
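To make the λ sweep just described concrete, here is a minimal sketch (an illustration under placeholder data, not the chapter's algorithm verbatim): for each λ on a grid, minimize γ(σ, λ) over σ ≥ 0, record the induced abstain rate and the corresponding error bound, and thereby trace out an approximate Pareto frontier. The use of scipy's L-BFGS-B solver here is a convenience assumption; any solver for this convex problem works.

```python
# Illustrative sketch of the lambda sweep described above: for each lambda on a grid,
# minimize gamma(sigma, lambda) = -b'sigma + (1/n) sum_j Psi(x_j'sigma, lambda/2)
# over sigma >= 0, and record the induced abstain rate and error bound.
import numpy as np
from scipy.optimize import minimize

def psi(m, c):
    m = np.abs(m)
    return np.where(m <= 1.0, m + 2.0 * c * (1.0 - m), m)

def gamma(sigma, F, b, lam):
    margins = F.T @ sigma
    return -b @ sigma + psi(margins, lam / 2.0).mean()

rng = np.random.default_rng(2)
F = rng.choice([-1.0, 1.0], size=(3, 200))          # ensemble predictions (placeholder)
b = np.array([0.2, 0.1, 0.05])                      # correlation bounds (placeholder)
p_dim = F.shape[0]

frontier = []
for lam in np.linspace(0.0, 1.0, 11):
    res = minimize(gamma, x0=np.zeros(p_dim), args=(F, b, lam),
                   bounds=[(0.0, None)] * p_dim, method="L-BFGS-B")
    margins = F.T @ res.x
    abstain_rate = np.mean(np.maximum(1.0 - np.abs(margins), 0.0))
    error_bound = 0.5 * (res.fun - lam * abstain_rate)  # following the form of (8.7)
    frontier.append((abstain_rate, error_bound, lam))

for a, v, lam in frontier:
    print(f"lambda={lam:.1f}  abstain rate~{a:.2f}  error bound~{v:.3f}")
```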

8.3.2 The Pareto Frontier and Discussion

Recall that the central tradeoff here is between V(α), the minimax attainable error rate, and α. The main result of this chapter describes this Pareto frontier in our general setting, and fully explains Fig. 8.1.

Theorem 11. For α ≥ 0, V(α) is the convex, decreasing, nonnegative function
\[
V(\alpha) = \frac{1}{2}\left( w(\lambda^*(\alpha)) - \alpha\, \lambda^*(\alpha) \right) \qquad (8.7)
\]

In particular, V(0) = V_PRED and V(1) = 0, and V′(α) = −½ λ*(α).
Since V(α) is convex and decreasing on [0,1],
\[
V'(\alpha) \le \frac{V(1) - V(\alpha)}{1 - \alpha} = -\frac{V(\alpha)}{1 - \alpha}
\]
So the curve V(α) is always decreasing at least as fast as its secant to the point (α = 1, V(1) = 0). At any α, this means there is a marginal benefit to abstaining, relative to the error incurred by predicting with α held constant.
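The identity V′(α) = −½λ*(α) can be seen directly from (8.6) and the definition of w; the following short derivation (a sketch via Danskin's envelope argument, not reproduced from the dissertation's appendix) spells out the step.

\[
V(\alpha) \;=\; \frac{1}{2}\max_{\lambda \ge 0}\bigl[\, w(\lambda) - \lambda\alpha \,\bigr]
\quad\Longrightarrow\quad
V'(\alpha) \;=\; \frac{1}{2}\,\frac{\partial}{\partial\alpha}\Bigl[\, w(\lambda) - \lambda\alpha \,\Bigr]\Big|_{\lambda = \lambda^*(\alpha)}
\;=\; -\frac{1}{2}\,\lambda^*(\alpha).
\]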

8.4 Discussion and Related Work

The solution to the game of Section 8.3 closely resembles that of Section 8.2, with λ replacing 2c in the same solution path. This close correspondence is perhaps not surprising when viewed as a consequence of Lagrange duality; the cost c is dual to the abstain rate α. However, it appears to be a new contribution to the abstaining literature.
There is a large, theoretically principled literature on expressing unsureness in binary classification in various settings, ranging from selective sampling in an online learning setting ([CBL06]) to more benign statistical learning settings like ours. We focus on the latter here as being more relevant; although our formulation is game-theoretic, the "first moment" constraints imposed by the ensemble b in this chapter successfully handle i.i.d. or similar data, and cannot be imposed in online learning scenarios. Though there are many practical approaches to abstaining (e.g. [HLS97]), they are most often Bayesian (similar to the early work which introduced the cost model ([Cho70])), and are not analyzed under model misspecification. Characterizing the abstain/error tradeoff has been a central goal of previous work since the early Bayesian and cost-based work on the topic ([Cho57, Cho70]).

In statistical learning, analyses of the tradeoff have only been conducted in restricted situations. Generally, they either hold for general ensembles in the realizable case, in which there is an error-free classifier in the ensemble ([EYW10, EYW12]); or hold for very specific hypothesis classes, as in regularized risk minimization approaches for linear learning ([BW08, WY11]). The possibility of zero-error learning with some nonzero abstention rate is one of the highlights of the former line of work, shown by [EYW10].² The algorithms in such work basically track disagreements among a version space – a set of "good" classifiers. Such algorithms hope to eventually converge on the best classifier in the space as ERM would, and as a consequence asymptotically cannot perform better than it as the muffled formulation can.
The work of [FMS04] stands apart from other prior work in analyzing an ensemble abstaining scheme that predicts only on high-margin examples. However, their analysis is fundamentally different from ours, being basically an ERM-type analysis that converges to the best single classifier's predictions, albeit with possibly faster convergence than ERM. As discussed at length in earlier chapters (papers [BF15a, BF15b]), our formulation incorporates voting behavior to cancel errors in the ensemble, and can therefore yield much better results than the best classifier h* while still guaranteeing that it will perform at least as well as h*.
In using the semi-supervised muffled formulation, we also inherit the approach's flexibility. Ch. 5 has already showed that just a small modification to F suffices to handle ensembles of specialists, each of which predicts only on some arbitrary subset of the data and abstains on the rest. Armed with this ability, we can use our formulation to form multilayer stacks of specialists, since the abstaining classifiers learned by our methods in this chapter are also specialists. The analysis generalizes in other ways as well, including to weighted datasets (Chapter 4) and of course to other loss functions, as we show in Section 8.5.
To our knowledge, there exist no theoretically motivated semi-supervised approaches to abstaining, let alone methods that work with general ensembles and perform better than ERM. There is also a dearth of work in the literature which attempts to address multicriterion optimization problems like this bicriterion one; though many learning problems feature multiple objectives, they are typically scalarized ([CBL06, JS08]).
Our algorithms are interpretable in terms of familiar notions like margin, which cannot be overlooked as a significant potential practical advantage. The abstaining probability is monotonic in the margin's magnitude, which is highly suggestive of margin-based approaches to "sureness" and their applications like active sampling ([TK02]). So we believe this work throws open directions of empirical inquiry, though an experimental evaluation of our proposed methods is outside our scope in this chapter.

²Following [EYW10], we can look for the tantalizing possibility of learning to predict with zero error for some α < 1, when the aggregated classifier is able to abstain exactly where it is unsure and predict perfectly otherwise. This is at the very least conceivable in our framework because g*_j is a weighted majority vote, and does not internally randomize its predictions at all. By Theorem 11, since V(1) = 0 and V(α) is convex and decreasing, zero-error prediction can happen exactly when V′(α) = −½λ*(α) = 0 for some α < 1. Working through the definition of λ*, this happens when w′(0) < 0, a general criterion based on the unlabeled data distribution ([Bal16]).

8.5 Abstaining with General Losses

In this final section, we show that the abstaining formulation set up in this chapter also extends to the very general class of loss functions introduced in Chapter 4. This section contains no new algorithmic ideas relative to the rest of this chapter, only demonstrating that our analysis is robust – it does not rely crucially on properties of the 0-1 loss.
Here, we will re-analyze the fixed cost model of Section 8.2, this time allowing the aggregated classifier's prediction performance to be measured by other losses than 0-1. This allows us to minimax optimally handle a variety of prediction tasks, such as predicting Bayesian-style probabilities (log loss) or coping with asymmetric misclassification costs. We omit the constrained-abstain-rate setting here.
Recall, just for this section, the setup and notation of Chapter 4, particularly the definition of the potential well Ψ and its properties, from Def. 1 and Lem. 2 respectively. Here we assume that one of the conditions of Lem. 2 holds and therefore that Ψ is convex. We also assume Ψ to be differentiable, purely for convenience in stating our results; the example of the 0-1 loss shows that a nondifferentiable Ψ is no obstacle to analysis. Using these assumptions, we now turn to analyze the fixed-cost model of abstaining.
We follow the protocol of Section 8.2, seeking to minimize the worst-case expected loss on the test data:

\[
\begin{aligned}
V_c &:= \min_{g \in [-1,1]^n}\ \min_{p \in [0,1]^n}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \ell(z, g) \qquad (8.8) \\
&= \min_{g,\, p}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \frac{1}{n} \sum_{j=1}^n \left[ p_j \left( \frac{1 + z_j}{2}\, \ell_+(g_j) + \frac{1 - z_j}{2}\, \ell_-(g_j) \right) + (1 - p_j)\, c \right] \\
&= \min_{g,\, p}\ \max_{\substack{z \in [-1,1]^n,\\ \frac{1}{n} F z \ge b}} \frac{1}{2n} \sum_{j=1}^n \left[ p_j \left( \ell_+(g_j) + \ell_-(g_j) + z_j \left( \ell_+(g_j) - \ell_-(g_j) \right) \right) + 2(1 - p_j)\, c \right] \qquad (8.9)
\end{aligned}
\]
where the later equalities omit the constraints on g, p for brevity.

Extending the previous model, our goal is now to find the optimal g*_c, p*_c which minimize the above. To do this, the following "potential"-like function and its derivatives play a major role.

Lemma 12. Define the function

\[
G(p_j, x_j, \sigma) = p_j\, \Psi\!\left( \frac{x_j^\top \sigma}{p_j} \right)
\]

Then $G(0, x_j, \sigma) = \lim_{p_j \to 0^+} p_j \Psi\!\left( \frac{x_j^\top \sigma}{p_j} \right) = \left| x_j^\top \sigma \right|$. Also, G is convex in $p_j$, so that

\[
K_{j,\sigma}(p_j) := \frac{\partial\, G(p_j, x_j, \sigma)}{\partial p_j} = \Psi\!\left( \frac{x_j^\top \sigma}{p_j} \right) - \frac{x_j^\top \sigma}{p_j}\, \Psi'\!\left( \frac{x_j^\top \sigma}{p_j} \right)
\]
is increasing in $p_j$ for $p_j \ge 0$.
Since this function is increasing in $p_j$, we can deal with its inverse $K_{j,\sigma}^{-1}(\lambda)$, also an increasing function. A few other definitions will be necessary to set up our main result. Define the function

\[
\phi(x) = \Psi(x) - x\, \Psi'(x)
\]

Now φ(x) is nonnegative, because $\phi\!\left( \frac{x_j^\top \sigma}{p_j} \right) = K_{j,\sigma}(p_j)$, which a straightforward calculation verifies is always nonnegative for all j. Also, φ′(x) = −x Ψ″(x), so φ(x) is decreasing for x ≥ 0 and increasing for x < 0.
Therefore, we can define $\phi_+^{-1}$ to be the inverse of the mapping {(x, φ(x)) : x ≥ 0}, and $\phi_-^{-1}$ to be the inverse of the mapping {(x, φ(x)) : x < 0}; $\phi_+^{-1}$ is decreasing and $\phi_-^{-1}$ is increasing. So if φ(x) = λ, then
\[
x = \begin{cases} \phi_+^{-1}(\lambda) & x \ge 0 \\ \phi_-^{-1}(\lambda) & x < 0 \end{cases}
\]

> Q j is clearly convex in x j σ, and therefore convex in σ. Finally, we can compute that for any positive λ0, ( ∂ Q (σ,λ) −K−1(λ ) λ ≤ K (1) j ∗ j,σ 0 0 j,σ = −p j(σ,λ0) = (8.11) ∂λ λ=λ0 −1 λ0 > Kj,σ (1)

−1 so since Kj,σ is increasing, Q j is concave in λ. 91

Theorem 14. The minimax value of the game (8.8) is

" n # 1 > 1 Vc = min −b σ + Q j(σ,2c) + c ≥ p ∑ 2 σ 0 n j=1

If we define σ*_c to be the minimizing weight vector in this optimization, the minimax optimal predictions g*_c and participation probabilities p*_c are defined as follows for each example j ∈ [n] in the test set.

\[
p^*_{c,j} = \min\!\left( 1,\ K_{j,\sigma^*_c}^{-1}(2c) \right) \qquad (8.12)
\]

\[
g^*_{c,j} = \begin{cases} -1 & \text{if } x_j^\top \sigma^*_c \le p^*_{c,j}\, \Gamma(-1) \\ \Gamma^{-1}\!\left( \dfrac{x_j^\top \sigma^*_c}{p^*_{c,j}} \right) & \text{if } x_j^\top \sigma^*_c \in p^*_{c,j}\, \big( \Gamma(-1),\, \Gamma(1) \big) \\ 1 & \text{if } x_j^\top \sigma^*_c \ge p^*_{c,j}\, \Gamma(1) \end{cases} \qquad (8.13)
\]

Chapter 9

MARVIN as a Function Approximator

In this section, we analyze the non-asymptotic convergence rate of a simplified version of MARVIN, the algorithm of Chapter 6, in a setting where the time horizon T is known in advance. This algorithm builds an ensemble sequentially, one classifier at a time, using labeled and unlabeled data in the muffled formulation. It builds a gradually improving approximation to the true labels, by combining classifiers in H.
The analysis in this chapter outlines a very general type of function approximation guarantee, allowing us to characterize the performance of MARVIN's iteratively constructed approximation. We take advantage of similar explorations in a line of theoretical work on pseudorandomness and complexity theory; in particular, our proof technique is essentially from Impagliazzo ([Imp95]).
As mentioned in Chapter 6, the MARVIN update is an instance of greedy coordinate descent on the slack function. When the dimension is moderate and finite, this procedure's rate of convergence has been explored in the optimization literature ([Nes12] and references therein, with asymptotic convergence results dating at least to [Zou60]). But such analysis depends on the ambient dimension ([NSL+15]) and is therefore not appropriate for MARVIN, which is motivated in Ch. 6 by the idea of efficiently aggregating a very large ensemble. This is the main technical challenge surmounted in this chapter: analyzing the convergence in a more meaningful manner that does not depend on the ambient dimension, but rather on the competence of the learner used by the algorithm.
For simplicity, we assume that each hypothesis in the ensemble predicts deterministically, so each h_i is a vector of n bits.

9.1 Algorithm and Setup

As described in Alg. 2, MARVIN works on a labeled set L and an unlabeled set U, drawn i.i.d. from the same distribution. Our simplified version is different in one significant way: it takes steps of constant size α = 1/√T instead of using line search. This

new algorithm is given here in Alg. 5, written to emphasize the learning oracle ("learner") that is called each iteration to generate a new classifier for the ensemble.

Algorithm 5. Fixed Step Size Version of MARVIN

Input: Labeled set L = {(x_1^L, y_1^L), …, (x_m^L, y_m^L)}, unlabeled set U = {x_1^U, …, x_n^U}, time horizon T, learner for ensemble H
Initialize weights: σ⁰ = 0, so that ⟨x, σ⁰⟩ = 0 for all x ∈ U
for t = 1 to T do
  Hallucinate labels for U: for j = 1, …, n, set
    ỹ_{t,j} = sgn(⟨x_j^U, σ^{t−1}⟩) · 1(|⟨x_j^U, σ^{t−1}⟩| ≥ 1)
  Call learner to find h_t ∈ H that approximately maximizes
    c(h) = (1/m) ∑_{i=1}^m y_i^L h(x_i^L) + (1/n) ∑_{j=1}^n (−ỹ_{t,j}) h(x_j^U)
  Add h_t to the predictor with weight 1/√T, so that σ^t has all t of its nonzero components equal to 1/√T
end for
Output: Predictor g_T(x) = clip(⟨x, σ^T⟩)
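To make the procedure concrete, here is a minimal runnable sketch of Alg. 5 (an illustration under simplifying assumptions, not the chapter's implementation): the ensemble H is a small finite pool of precomputed ±1 prediction vectors, and the "learner" oracle is an exhaustive scan of the pool for the classifier maximizing c(h).

```python
# Minimal illustrative sketch of Algorithm 5 (fixed step size MARVIN). Assumptions:
# the ensemble H is a finite pool of precomputed {-1,+1} prediction vectors, and the
# "learner" oracle is exhaustive search over that pool for the h maximizing c(h).
import numpy as np

def clip(v):
    return np.clip(v, -1.0, 1.0)

def marvin_fixed_step(HL, yL, HU, T):
    """HL: (|H|, m) pool predictions on labeled data; yL: (m,) labels in {-1,+1};
    HU: (|H|, n) pool predictions on unlabeled data; T: time horizon."""
    n = HU.shape[1]
    alpha = 1.0 / np.sqrt(T)
    score_U = np.zeros(n)                 # <x_j, sigma^t> on unlabeled data
    score_L = np.zeros(HL.shape[1])       # same scores on labeled data (for inspection)
    for _ in range(T):
        # hallucinate labels on U: sgn(score) where |score| >= 1, else 0
        y_tilde = np.sign(score_U) * (np.abs(score_U) >= 1.0)
        # learner: pick h in the pool approximately maximizing c(h)
        c = HL @ yL / HL.shape[1] + HU @ (-y_tilde) / n
        t_best = int(np.argmax(c))
        # add h_t with weight alpha
        score_U += alpha * HU[t_best]
        score_L += alpha * HL[t_best]
    return clip(score_U), clip(score_L)

# toy placeholder data: labels given by a majority vote of the first three pool classifiers
rng = np.random.default_rng(3)
pool_size, m, n = 20, 50, 500
HL = rng.choice([-1.0, 1.0], size=(pool_size, m))
HU = rng.choice([-1.0, 1.0], size=(pool_size, n))
yL = np.sign(HL[:3].sum(axis=0) + 1e-9)
gU, gL = marvin_fixed_step(HL, yL, HU, T=100)
print("mean |prediction| on U:", np.abs(gU).mean())
```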

The actual performance of this learning algorithm is limited by two factors:

1. Estimation: The labeled and unlabeled data are only finite random samples, so the slack function being optimized is a necessarily inexact estimate.

2. Approximation: The incremental addition of hypotheses should bestow gradually better approximation error over time, but the ultimate error is limited by the expressivity of H. The true labels are not perfectly generated by any aggregate of classifiers in H.

This is a commonly occurring tradeoff, particularly in incremental learning ([BB08]). We typically have relatively scarce labeled data m ≪ n, so that the estimation error is governed by uniform convergence of the ensemble classifier errors over labeled data.
The approximation error means that with a general H, we cannot hope to guarantee convergence to a classifier with zero error, but rather only guarantee convergence to the best achievable aggregate of classifiers in H. If H is complex enough, it will achieve very low approximation error, but require a larger quantity of data to learn (estimation) accurately with. In this analysis, we skirt estimation issues by taking m, n sufficiently large, focusing only on approximation.

9.1.1 Definitions

The convexity of the slack function lends this optimization problem an interesting structure, different from traditional boosting. Consider the data distribution D over the data and label spaces X × Y, from which the labeled and unlabeled examples are sampled i.i.d. Denote the marginal distribution of D over X by ν(x).
In this chapter, the number of data is taken to be very large, so we use slightly different notation to represent functions of data – a classifier h is a binary-valued function h(x) over X instead of an n-dimensional vector. Similarly, we can define some other functions of an example x ∈ X, which were represented by vectors in previous chapters:

• Define the true label function z(x) ∈ {−1,1}, which we seek to predict. It can be any binary function on X, possibly with internal randomization.¹

• Define the step size α := 1/√T.

• Define the predicted labels of Alg. 5 as

\[
g_t(x) := \operatorname{clip}\!\left( \sum_{i=1}^t h_i(x)\, \sigma^t(h_i) \right) = \operatorname{clip}\!\left( \sum_{i=1}^t \alpha\, h_i(x) \right)
\]

• Rewrite the hallucinated labels ỹ_{t,1}, …, ỹ_{t,n} of Alg. 5 as

\[
\zeta_t(x) := \begin{cases} 0 & \left| \sum_{i=1}^t \alpha h_i(x) \right| < 1 \\ \operatorname{sgn}\!\left( \sum_{i=1}^t \alpha h_i(x) \right) & \left| \sum_{i=1}^t \alpha h_i(x) \right| \ge 1 \end{cases}
\]

Also define the inner product of bounded functions f ,g ∈ L1(ν) on the data space X as:

\[
\langle f(x), g(x) \rangle := \mathbb{E}_{x \sim \nu}\left[ f(x)\, g(x) \right] = \mathbb{E}_x\left[ f(x)\, g(x) \right]
\]

(In previous chapters when we considered n unlabeled data, the equivalent operation would be taking the average $\frac{1}{n}\sum_{i=1}^n f(x_i)\, g(x_i)$.) In a similar spirit, since there are many classifiers in H, in this chapter we write the label correlation bound for classifier h as b(h). The weight put by the vector σ^t on classifier h is σ^t(h); for Alg. 5, it is nonzero only in multiples of α, on classifiers returned by the learner.

¹In Bayesian classification terms, this represents the conditional label distribution.

9.1.2 Greedy Coordinate Descent as Residual Fitting

In the chosen limit m, n → ∞, the goal is to minimize the slack function, which stands in for the minimax test error:

t " t !# t γ(σ ) = − ∑ αb(hi) + Ex Ψ ∑ αhi(x) i=1 i=1 t " t !# ≈ − ∑ α hz(x),hi(x)i + Ex Ψ ∑ αhi(x) i=1 i=1

When the ensemble H is finite and moderate, existing guarantees tell us that greedy coordinate descent converges to the minimum of the slack function, because it is convex. Each step corresponds to adding a classifier to the ensemble, in a similar way to boosting. Now we broaden our focus to general, possibly infinite ensembles. Taking the partial derivative of γ(σ^{t−1}) w.r.t. any classifier h ∈ H,²

\[
-\frac{\partial\, \gamma(\sigma^{t-1})}{\partial h} = \langle z(x), h(x) \rangle - \langle \zeta_{t-1}(x), h(x) \rangle = \langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle
\]
Maximizing over h, a greedy coordinate descent update on the slack function is in the direction of the classifier

\[
h_t = \operatorname*{argmax}_{h \in \mathcal{H}}\ \langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle \qquad (9.1)
\]
which is the MARVIN update direction.
This has a facile interpretation in terms of residual fitting. Recall that the task here is to iteratively build an approximation to z(x). Consider ζ_{t−1}(x) as this approximation after t − 1 steps. Then z(x) − ζ_{t−1}(x) can be thought of as the residual label signal after t − 1 iterations, and the algorithm's update at time t (in Eq. (9.1)) is in the direction which best fits the residual. In fact, the label approximation ζ_{t−1} is essentially the adversary's optimal label strategy given σ^{t−1}, as mentioned in Ch. 3 ([BF15a]); in the muffled formulation, the algorithm plays against this labeling. So the greedy coordinate update rule for selecting each new classifier h_t corresponds exactly to an iterative procedure of fitting the residual with respect to the minimax optimal adversarial labeling.

²γ(·) is not smooth, so strictly speaking we must optimize over elements of the subdifferential set, which means considering margin-1 examples x_i with randomized labels with any bias ∈ [0,1]·sgn(x_i^⊤σ*). We instead only consider them to have deterministic labels sgn(x_i^⊤σ*), meaning we are not exactly performing greedy coordinate descent on the slack function. However, the distinction is not material to us, because the results are stable to perturbations. So "general position" can be enforced, for instance by adding an unbiased Gaussian perturbation with small variance ε to the score of each example; this will w.h.p. result in almost all of the examples x having margin x^⊤σ* ≠ 1, and eliminate the problem at an arbitrarily small additive cost of ≤ ε.

9.2 Convergence of the Algorithm

By definition of the update in (9.1),

\[
\langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \le \varepsilon \iff \forall h \in \mathcal{H} : \langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle \le \varepsilon
\]

For small ε, think of this as a nearly hopeless situation for learning: no classifier in H fits the residual z(x) − ζ_{t−1}(x) well, as none have correlation more than ε with it. In this situation, when classifiers are added to the ensemble, progress is quite slow, with the slack function decreasing at a rate ≤ ε.
As time increases, the best available classifier h_t can be expected to have progressively smaller correlation with the residual (⟨h_t(x), z(x) − ζ_{t−1}(x)⟩), because the objective function is convex and continuous. So convergence to the optimum can be studied through this quantity ⟨h_t(x), z(x) − ζ_{t−1}(x)⟩.

9.2.1 Convergence Rate

Specifically, for any target error ε, we can define the last time that the average residual correlation is at least ε:
\[
T_\varepsilon := \sup\left\{ T : \frac{1}{T} \sum_{t=1}^T \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \ge \varepsilon \right\}
\]

The main result of this chapter shows that this time is fairly small – for any ε > 0, after O(ε⁻²) iterations, the average residual correlation never exceeds ε.

Theorem 18 (Convergence Rate of Alg. 5). Algorithm 5 guarantees that for any ε > 0, $T_\varepsilon \le \frac{81}{\varepsilon^2}$. Equivalently,

\[
\forall T > \frac{81}{\varepsilon^2} : \quad \forall h \in \mathcal{H} : \quad \frac{1}{T} \sum_{t=1}^T \langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle < \varepsilon
\]
Theorem 18 follows from upper and lower bounds on a "potential function" which plays a central role: the time-averaged correlation of the new classifier with the residual labels.

\[
\phi(T) := \frac{1}{T} \sum_{t=1}^T \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \qquad (9.2)
\]

φ takes two averages: over times t ∈ [T], and over data x. It can correspondingly be bounded in two separate ways. First, the contribution over all x can be lower-bounded when averaged over times t ≤ T_ε; and second, the contribution for each x over all t can be upper-bounded using the structure of the successive residual fitting procedure (Theorem 19). Putting together these bounds on φ(T_ε) gives the main result.

Proof of Theorem 18. By definition of $T_\varepsilon$, observe that $\frac{1}{T_\varepsilon} \sum_{t=1}^{T_\varepsilon} \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \ge \varepsilon$. Meanwhile, Theorem 19 upper bounds the potential φ(T) at any T, in particular $T_\varepsilon$. Combining these two bounds on the potential $\phi(T_\varepsilon)$,

\[
\varepsilon\, T_\varepsilon \le \sum_{t=1}^{T_\varepsilon} \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \le 9\sqrt{T_\varepsilon}
\]

Solving for $T_\varepsilon$ shows that $T_\varepsilon \le \frac{81}{\varepsilon^2}$. The remaining proposition follows because

\[
\langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle \le \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle
\]
by definition of $h_t$ for any t.

It therefore remains to upper-bound the potential φ(Tε ), done in the following theorem.

Theorem 19. For any example x and time horizon T,

\[
\sum_{t=1}^T \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \le 9\sqrt{T}
\]
The proof of Theorem 19 is the heart of our argument, and is in Sec. 9.4. It uses the core potential decomposition and bounding technique of Impagliazzo ([Imp95]), adapted to our variant setting and potential function.

9.2.2 Discussion

Theorem 18 can be seen in a slightly different light by averaging over times [T] in a different order. To see this, define the average hallucinated labels
\[
\bar{\zeta}_T(x) := \frac{1}{T} \sum_{t=1}^T \zeta_{t-1}(x) = \frac{1}{T} \sum_{t=1}^T \mathbf{1}\!\left( \left| \sum_{i=1}^{t-1} \alpha h_i(x) \right| \ge 1 \right) \operatorname{sgn}\!\left( \sum_{i=1}^{t-1} \alpha h_i(x) \right)
\]

Corollary 15 (Convergence of $\bar{\zeta}_T$). Algorithm 5 guarantees that for any ε > 0,

\[
\forall T > \frac{81}{\varepsilon^2} : \quad \forall h \in \mathcal{H} : \quad \left\langle h(x),\, z(x) - \bar{\zeta}_T(x) \right\rangle < \varepsilon
\]
Cor. 15 casts $\bar{\zeta}_T(x)$ as the algorithm's approximation to z(x), constructed over T timesteps. It follows directly from Theorem 18 by averaging over times inside the upper bound ε.
The paper [Imp95] shows that every example's total contribution to the potential over time is relatively low. From a slightly different perspective, this is key to a very general approximation result about Boolean functions, as discussed in detail in the

powerful work of [TTV09]. There, a very similar result (to our Lemma 21) is used in a constructive proof of existence of a function approximator in an exceptionally general setting. Compared to that work, we take more interest in the details of the algorithm used in that construction, which remarkably corresponds almost exactly to MARVIN as given in Alg. 5.
The resulting analysis in this chapter inherits the generality of the theoretical techniques of [Imp95, TTV09], and their approximation convergence rate of O(T^{−1/2}). This may be statistically interesting if there is a large population of labeled and unlabeled data from which L and U are sampled using O(m + n) memory each iteration, a solution recommended in Ch. 6 when the data are abundant but memory is not. In this case, the statistical lower bound on convergence from anti-concentration may match the approximation rate; we leave the formalization of this argument as an interesting open problem.

9.3 Hard Core Sets

If run long enough, boosting algorithms ([SF12]) can be considered to separate out a “hard core” of the data – a subset of the data whose labels have low correlation with any classifier in the ensemble ([Imp95]). Indeed, the constructive technique of [Imp95] is used in that paper to prove a fundamental result about hard core sets. So a natural question is whether a similar situation arises with the iterative approximation procedure of MARVIN. Here we prove that this is the case. We specify “hard core” distributions over the data which arise from the dynamics of Algorithm 5, and over which the true label is almost perfectly decorrelated with every ensemble classifier. These distributions tend to put weight on examples which remain in the hedged region for many iterations. Define the set of all label approximations of the form of ζt that are constructible from some t functions h1,...,ht ∈ H :

\[
S_t = \left\{ H(x) : \exists h_1, \dots, h_t \in \mathcal{H}\ \text{s.t.}\ H(x) = \begin{cases} 0 & \left| \sum_{i=1}^t \alpha h_i(x) \right| < 1 \\ \operatorname{sgn}\!\left( \sum_{i=1}^t \alpha h_i(x) \right) & \left| \sum_{i=1}^t \alpha h_i(x) \right| \ge 1 \end{cases} \right\}
\]

Note that $S_{t-1} \subseteq S_t$ under broad conditions on the ensemble H.³ Then if the label function is weakly hard to predict for any aggregate of classifiers in H, we show that a distribution constructed according to the dynamics of Algorithm 5 decorrelates any h ∈ H.

Theorem 20 (Weak Hardness Implies Hard Core Distributions). Fix any ε, δ > 0 and

³E.g. it suffices for H to contain the all-zeroes classifier (attaining ∼50% error regardless of the data).

time horizon T. Suppose that for any f ∈ ST ,

\[
\Pr\left( f(x) \ne z(x) \right) \ge \delta
\]
i.e. z(x) is hard on average to predict with classifiers in $S_T$.

(I) There exists a "hard core" distribution $\mu(x) \propto \frac{1}{T}\sum_{t=1}^T |z(x) - \zeta_{t-1}(x)|$ over X such that
\[
\forall h \in \mathcal{H} : \quad \Pr_\mu\left( h(x) \ne z(x) \right) \ge \frac{1}{2} - \frac{9}{4\delta\sqrt{T}}
\]

(II) Define the stopping time $\tau_\varepsilon := \inf\{ t \le T : \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle < \varepsilon \} \le T_\varepsilon$.

Then a hard core distribution $\mu_t(x) \propto |z(x) - \zeta_{t-1}(x)|$ can be defined at time t over X, so that
\[
\forall h \in \mathcal{H} : \quad \Pr_{\mu_{\tau_\varepsilon}}\left( h(x) \ne z(x) \right) \ge \frac{1}{2} - \frac{9}{4\delta\sqrt{\tau_\varepsilon}}
\]

The parts of Theorem 20 illustrate the versatility of the proof technique. Part (I) shows that there is a hard core distribution at all times, although it is not clear whether µ(x) is sparse because of the averaging over t ∈ [T]. Part (II)’s hard core distribution µt(x) remedies this since its support is clearly only on hedged examples and on clipped incorrect predictions.

9.3.1 Discussion

A similar object to our hard core distribution has been shown to play a crucial role in the convergence dynamics of boosting ([KS99, Tel12]), where it can be understood to emphasize examples that are unavoidably difficult or noisy: the asymptotic dregs of the iterative fitting procedure when it is run for many iterations. Our results contrast somewhat with this and are more like [Imp95] – they are non-asymptotic, and quantify the degree of decorrelation in the hard core as a function of the expressivity of H with respect to the label function z(x), and the size of the hard core.
Practical supervised boosting algorithms will tend to put more weight on the hard core, but the min-margin muffled formulation prescribes the opposite, and puts no weight on hedged unlabeled examples in determining h_t. Instead, the dataset given to the learner is made more difficult by hallucinating the minority label on clipped examples, which similarly emphasizes the hedged examples when combined with the effect of labeled data.
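As a concrete illustration of the construction in Theorem 20 (a sketch on placeholder arrays, not taken from the text), the hard-core distribution of part (II) simply reweights examples by the magnitude of the current residual, after which the decorrelation of any ensemble classifier can be measured under that reweighting:

```python
# Illustrative construction of the hard-core distribution mu_t(x), proportional to
# |z(x) - zeta_{t-1}(x)| as in Theorem 20, on a finite sample. Arrays are placeholders.
import numpy as np

def hard_core_error(h, z, zeta):
    """Weighted error Pr_mu(h != z), with mu proportional to |z - zeta|."""
    w = np.abs(z - zeta)
    if w.sum() == 0:
        return 0.0           # zeta already matches z everywhere; mu is degenerate
    mu = w / w.sum()
    return float(mu @ (h != z))

rng = np.random.default_rng(4)
n = 1000
z = rng.choice([-1.0, 1.0], size=n)              # true labels
zeta = rng.choice([-1.0, 0.0, 1.0], size=n)      # clipped/hedged label approximation
h = rng.choice([-1.0, 1.0], size=n)              # some ensemble classifier
print("error of h under the hard-core reweighting:", hard_core_error(h, z, zeta))
```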

9.3.2 Proof of Theorem 20

Proof of Theorem 20. Let h ∈ H be any ensemble classifier.

Proving (I). The key to this result is the decomposition
\[
|z(x) - \zeta_{t-1}(x)|\, \mathbf{1}(h(x) = z(x)) = \frac{1}{2}\left[ h(x)\left( z(x) - \zeta_{t-1}(x) \right) + |z(x) - \zeta_{t-1}(x)| \right] \qquad (9.3)
\]
which can be verified to hold for any h, z predicting binary labels (e.g. by enumerating the possible values of the variables h(x) ∈ {−1,1}, z(x) ∈ {−1,1}, ζ_{t−1}(x) ∈ {−1,0,1}). Define a probability distribution over the data $\mu_t(x) \propto |z(x) - \zeta_{t-1}(x)|$. Taking expectations of both sides over x, Eq. (9.3) implies the following convenient identity:

\[
\Pr_{\mu_t}(h(x) = z(x)) = \frac{\mathbb{E}_x\left[ |z(x) - \zeta_{t-1}(x)|\, \mathbf{1}(h(x) = z(x)) \right]}{\mathbb{E}_x\left[ |z(x) - \zeta_{t-1}(x)| \right]} = \frac{1}{2}\left( 1 + \frac{\mathbb{E}_x\left[ h(x)\left( z(x) - \zeta_{t-1}(x) \right) \right]}{\mathbb{E}_x\left[ |z(x) - \zeta_{t-1}(x)| \right]} \right) \qquad (9.4)
\]
The denominator of (9.4) can be rewritten, because z(x) is binary:

\[
\mathbb{E}_x\left[ |z(x) - \zeta_{t-1}(x)| \right] = 2\Pr\left( z(x) \ne \zeta_{t-1}(x) \right) \ge 2\delta
\]
where the inequality follows by the average-case hardness property of z(x) since $\zeta_{t-1} \in S_t$. Substituting this into (9.4) and averaging both sides over t ∈ [T],
\[
\begin{aligned}
\frac{1}{T} \sum_{t=1}^T \Pr_{\mu_t}(h(x) = z(x)) &\le \frac{1}{2} + \frac{1}{4\delta}\left( \frac{1}{T} \sum_{t=1}^T \langle h(x),\, z(x) - \zeta_{t-1}(x) \rangle \right) \\
&\le \frac{1}{2} + \frac{1}{4\delta}\left( \frac{1}{T} \sum_{t=1}^T \langle h_t(x),\, z(x) - \zeta_{t-1}(x) \rangle \right) \le \frac{1}{2} + \frac{9}{4\delta\sqrt{T}}
\end{aligned}
\]
where the second inequality uses the definition of $h_t$ in the update rule.

Proving (II). The proof is similar to that of part (I) of this result, bounding (9.4) at time t = τ_ε using the definitions of τ_ε and $h_{\tau_\varepsilon}$:
\[
\Pr_{\mu_{\tau_\varepsilon}}(h(x) = z(x)) \le \frac{1}{2}\left( 1 + \frac{\varepsilon}{\mathbb{E}_x\left[ |z(x) - \zeta_{\tau_\varepsilon - 1}(x)| \right]} \right) \le \frac{1}{2}\left( 1 + \frac{\varepsilon}{2\delta} \right)
\]

It is clear that $\tau_\varepsilon < T_\varepsilon$, so using Theorem 18, $\varepsilon \le \frac{9}{\sqrt{\tau_\varepsilon}}$, giving the result.
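The case analysis behind the decomposition (9.3) used above is small enough to check mechanically; the following snippet (an illustration, not part of the original proof) enumerates all combinations of h(x) ∈ {−1,1}, z(x) ∈ {−1,1}, ζ_{t−1}(x) ∈ {−1,0,1}.

```python
# Exhaustive check of the decomposition (9.3):
#   |z - zeta| * 1(h == z)  ==  0.5 * ( h*(z - zeta) + |z - zeta| )
# over all values h, z in {-1,+1} and zeta in {-1,0,+1}.
from itertools import product

for h, z, zeta in product([-1, 1], [-1, 1], [-1, 0, 1]):
    lhs = abs(z - zeta) * (1 if h == z else 0)
    rhs = 0.5 * (h * (z - zeta) + abs(z - zeta))
    assert lhs == rhs, (h, z, zeta)
print("decomposition (9.3) verified for all 12 cases")
```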

9.4 Proof of Theorem 19

Theorem 19 is proved with two lemmas. These decompose the potential around the continuous, Lipschitz decision rule $g_{t-1}(x)$.

Lemma 21. For any x and T,

\[
\sum_{t=1}^T h_t(x)\left( z(x) - g_{t-1}(x) \right) \le \frac{4}{\alpha} + \frac{\alpha T}{2}
\]

Lemma 22. For any x and T,

\[
\sum_{t=1}^T h_t(x)\left( g_{t-1}(x) - \zeta_{t-1}(x) \right) \le \frac{4}{\alpha} + \frac{\alpha T}{2}
\]

Proof of Theorem 19. Adding Lemma 21 and Lemma 22 gives that for any x,

\[
\sum_{t=1}^T h_t(x)\left( z(x) - \zeta_{t-1}(x) \right) \le \frac{8}{\alpha} + \alpha T
\]
The result follows from substituting α and taking expectations of both sides w.r.t. x.

We adapt the construction used to prove Theorem 3.1 in [TTV09] (see discussion for more details), in order to prove Lemma 21, and then Lemma 22.

Proof of Lemma 21. This proof follows that of Claim 3.4 of [TTV09] almost exactly. Fix any x and define $\Delta_t(x) = z(x) - g_t(x)$. Observe that since the $h_t(x)$ are binary ∈ {−1,1}, we have $\Delta_t(x) - \Delta_{t+1}(x) \in \{\alpha, -\alpha, 0\}$.
Now, we partition the timesteps according to the value of $\Delta_t(x)$. For integer $r \in \left[ -\frac{1}{\alpha}, \frac{1}{\alpha} \right]$, define ascending and descending level sets as follows:

\[
\begin{aligned}
U_r &= \{ t : \Delta_{t-1} = z(x) + (r-1)\alpha,\ \Delta_t = z(x) + r\alpha \} \\
L_r &= \{ t : \Delta_{t-1} = z(x) + r\alpha,\ \Delta_t = z(x) + (r-1)\alpha \}
\end{aligned}
\]

We write L = ∪rLr,U = ∪rUr. We break up our potential accordingly:

\[
\sum_{t=1}^T h_t(x)\Delta_{t-1}(x) = \underbrace{\sum_{t \in L \cup U} h_t(x)\Delta_{t-1}(x)}_{(A)} + \underbrace{\sum_{t \notin L \cup U} h_t(x)\Delta_{t-1}(x)}_{(B)} \qquad (9.5)
\]

The rest of the proof consists of bounding these two terms.

Bounding (A). First consider term (A) of (9.5). We know for all r that $\big| |U_r| - |L_r| \big| \le 1$, because $\Delta_t$ only changes in steps of α, and must be less than, equal to, or greater than

z(x) + rα for any r. The key observation is that we can pair up times t1 ∈ Ur with t2 ∈ Lr, because for any such t1,t2, ht1 (x) = −ht2 (x). Therefore,

\[
h_{t_1}(x)\Delta_{t_1 - 1}(x) + h_{t_2}(x)\Delta_{t_2 - 1}(x) = -h_{t_1}(x)\,\alpha \le \alpha
\]
by definition. So for every r, pairing up times in this fashion, we find that
\[
\sum_{t \in U_r \cup L_r} h_t(x)\Delta_{t-1}(x) = \sum_{t \in U_r} h_t(x)\Delta_{t-1}(x) + \sum_{t \in L_r} h_t(x)\Delta_{t-1}(x) \le \frac{\alpha}{2}\left( |U_r| + |L_r| \right) + 2
\]
where the bound is tight, and the additive constant 2 accounts for the unpaired value of $h_t(x)\Delta_{t-1}(x)$ – there can be at most one. Consequently,

\[
\sum_{t \in L \cup U} h_t(x)\Delta_{t-1}(x) = \sum_r \sum_{t \in U_r \cup L_r} h_t(x)\Delta_{t-1}(x) \le \sum_r \left[ \frac{\alpha}{2}\left( |U_r| + |L_r| \right) \right] + 2\left( \frac{2}{\alpha} \right) \qquad (9.6)
\]
\[
= \frac{\alpha}{2}\left( |U| + |L| \right) + \frac{4}{\alpha} \le \frac{\alpha T}{2} + \frac{4}{\alpha} \qquad (9.7)
\]
Bounding (B). Next, consider term (B) of (9.5), i.e. times t ∉ L ∪ U; by definition, here $g_{t-1}(x) = g_t(x)$. We partition such times into sets $T_P = \{t : g_{t-1}(x) = g_t(x) = 1\}$ and $T_N = \{t : g_{t-1}(x) = g_t(x) = -1\}$ corresponding to our predicted label on x.
Consider first $T_P$. All times in $T_P$ fall into intervals of the form $(t_1, t_2] \subseteq T_P$ for some $t_1, t_2$. We will show that the contribution of each such interval to (9.5) is nonpositive. W.l.o.g., we assume that $t_1 \notin T_P$ – that is, we widen the interval as much as possible. Now the contribution of such an interval $(t_1, t_2]$, by definition of $T_P$, is

\[
\sum_{t \in (t_1, t_2]} h_t(x)\Delta_{t-1}(x) = \sum_{t \in (t_1, t_2]} h_t(x)\left( z(x) - g_t(x) \right) = \left( z(x) - 1 \right) \sum_{t \in (t_1, t_2]} h_t(x)
\]

But for at least half the times in any such interval $(t_1, t_2]$, $h_t$ must be positive, because by definition of $T_P$ and the interval, $g_{t_1 - 1}(x) \ne g_{t_1}(x) = \cdots = g_{t_2}(x) = 1$.

This implies that $\sum_{t \in (t_1, t_2]} h_t(x) \ge 0$. Since $z(x) - 1 \le 0$, we see that for all such intervals $(t_1, t_2]$,
\[
\sum_{t \in (t_1, t_2]} h_t(x)\Delta_{t-1}(x) \le 0
\]

Summing over all such intervals in TP, observe that the contribution of TP is nonpositive:

∑t∈TP ht(x)∆t−1(x) ≤ 0.

The argument for t ∈ T_N is exactly analogous, so $\sum_{t \in T_P \cup T_N} h_t(x)\Delta_{t-1}(x) = (B) \le 0$. Combining with (9.7) and substituting into (9.5) finishes the proof.
The argument of the proof of Lemma 21 can now be adapted to prove Lemma 22.

Proof of Lemma 22. Fix any x. First observe that either $\zeta_{t-1}(x) = 0$ or $\zeta_{t-1}(x) = g_{t-1}(x)$, so that

\[
\sum_{t=1}^T h_t(x)\left( g_{t-1}(x) - \zeta_{t-1}(x) \right) = \sum_{t :\ \zeta_{t-1}(x) = 0} h_t(x)\, g_{t-1}(x)
\]

To prove the result, we will bound the right-hand expression. Define, for integer $r \in \left\{ -\left\lceil \frac{1}{\alpha} \right\rceil, \dots, \left\lceil \frac{1}{\alpha} \right\rceil \right\}$,

\[
\begin{aligned}
U_r &= \{ t : g_{t-1}(x) = (r-1)\alpha,\ g_t(x) = r\alpha \} \\
L_r &= \{ t : g_{t-1}(x) = r\alpha,\ g_t(x) = (r-1)\alpha \}
\end{aligned}
\]
and $L = \cup_r L_r$, $U = \cup_r U_r$. We know for all r that $\big| |U_r| - |L_r| \big| \le 1$, because $g_{t-1}$ only changes in steps of α. The key observation is that we can pair up times $t_1 \in U_r$ with $t_2 \in L_r$, because $h_{t_1}(x) \ne h_{t_2}(x)$ for any such $t_1, t_2$. Therefore,

\[
h_{t_1}(x)\, g_{t_1 - 1}(x) + h_{t_2}(x)\, g_{t_2 - 1}(x) = \alpha
\]
by definition. So for any r, pairing up times in this fashion, we find that
\[
\sum_{t \in U_r \cup L_r} h_t(x)\, g_{t-1}(x) = \sum_{t \in U_r} h_t(x)\, g_{t-1}(x) + \sum_{t \in L_r} h_t(x)\, g_{t-1}(x) \le \frac{\alpha}{2}\left( |U_r| + |L_r| \right) + 2 \qquad (9.8)
\]
where the bound accounts for the unpaired value of $h_t(x)\, g_{t-1}(x)$ – there can be at most one. Using this and (9.8), we conclude that

\[
\sum_{t \in L \cup U} h_t(x)\, g_{t-1}(x) = \sum_r \sum_{t \in U_r \cup L_r} h_t(x)\, g_{t-1}(x) \le \sum_r \left[ \frac{\alpha}{2}\left( |U_r| + |L_r| \right) \right] + 2\left( \frac{2}{\alpha} \right) \le \frac{\alpha T}{2} + \frac{4}{\alpha} \qquad (9.9)
\]

Finally, note that the set L ∪ U can be partitioned into three sets: $U_{\lceil -1/\alpha \rceil}$, $L_{\lceil 1/\alpha \rceil}$, and $\{t : \zeta_{t-1}(x) = 0\}$. For $t \in U_{\lceil -1/\alpha \rceil}$, we have $-h_t(x) = -1 = g_{t-1}(x) = \zeta_{t-1}(x)$, so

\[
\sum_{t \in U_{\lceil -1/\alpha \rceil}} h_t(x)\left( g_{t-1}(x) - \zeta_{t-1}(x) \right) = 0
\]

Exactly analogously, $\sum_{t \in L_{\lceil 1/\alpha \rceil}} h_t(x)\left( g_{t-1}(x) - \zeta_{t-1}(x) \right) = 0$.

Putting these together with (9.9),
\[
\begin{aligned}
\frac{\alpha T}{2} + \frac{4}{\alpha} &\ge \sum_{t \in L \cup U} h_t(x)\, g_{t-1}(x) \\
&= \sum_{t :\ \zeta_{t-1}(x) = 0} h_t(x)\, g_{t-1}(x) + \sum_{t \in U_{\lceil -1/\alpha \rceil}} h_t(x)\, g_{t-1}(x) + \sum_{t \in L_{\lceil 1/\alpha \rceil}} h_t(x)\, g_{t-1}(x) \\
&= \sum_{t=1}^T h_t(x)\left( g_{t-1}(x) - \zeta_{t-1}(x) \right)
\end{aligned}
\]
which finishes the proof.

Chapter 10

Perspectives, Extensions, and Open Problems

In this chapter, we collect a number of striking features of the "muffled" formulation and algorithms we have introduced, and state some open problems that we believe are of particular interest.
Notably, the muffled formulation applies well beyond the setting of binary (or multiclass) classifier aggregation. In the previous chapters, an unlabeled data example is considered to be a vector of the ensemble's predictions, i.e. a vector of binary features. But nowhere in any of the proofs is a restriction on F used, so the muffled algorithms we have developed handle even data with real-valued features, not just bit-vector data. This is implicit in our specialist formulation, in which each "feature" (classifier) is scaled differently. The muffled formulation is robust to this because each feature represents a different constraint. So as long as the components of b are scaled correctly to be feasible and useful constraints, our algorithms will automatically adapt to features with different scales.
Another important perspective on our muffled formulation traces back to its origination in a linear program (our original muffled analysis of the material of Ch. 3, in [BF15a]), which we showed to be efficiently solvable in O(p) dimensions despite considering all 2^n possible labelings! It is the joint consideration of these, through efficient duality-based methods generalizing linear programming, that gives the muffled method power and flexibility, and its strongest possible loss guarantee. We view it as remarkable that binary classification, in as general a setting as Chapter 4, has the intrinsic additive structure required to admit efficient minimax solutions with highly nontrivial behavior (this structure is discussed when explicating our duality-based proof techniques, in Chapter 4).
There are a number of extensions we are currently working on, notably to unsupervised autoencoding (building on Ch. 4), to regression, and to active learning. In particular, we believe the muffled formulation has significant potential for the latter, in quantifying the benefits of unlabeled data in a fine-grained manner.


The fine structure of unlabeled data plays a notable role in the uniform concentration of estimated classifier errors in a function class H, through the Rademacher complexity of H and similar objects ([BM02]). This directly impacts how we estimate b, but we have avoided the problem in this dissertation. Addressing this rigorously with our formulation is one of the more interesting learning-theoretic open problems we suggest.
Finally, we plan to explore practical applications at larger scale to investigate the space of ensembles that can be aggregated. For instance, decision trees can be inappropriate in high dimension, and efficient linear classifiers could be used instead, as when the data are sparse. The specialists we generate in algorithms like MARVIN are not restricted to being generated by tree partitionings. Other ways of dividing the data like clustering methods can provide valuable unsupervised information that our muffled algorithms can use efficiently and directly.
The optimal decision rules in muffled learning can also be viewed as artificial neurons, as we have pointed out. These originated from neuroscience in very early work ([MP43]), but have more recently enjoyed tremendous success as building blocks of deep networks for prediction with the use of labeled and unlabeled data ([LBH15]). Our work suggests possible connections to deep learning – concretely, it would be of interest to implement the learning algorithm for the "softmax" artificial neuron (Chapter 7), and also try to aggregate deeply learned representations of data with the muffling framework.
A tremendous influence on modern deep learning has been the Parallel Distributed Processing (PDP) connectionist approach to neuroscience, developed in the 1980s ([RM87]). To the reader who has indulged our speculations so far, we close with a passage from this work that we find suggestive:

“We see the kinds of phenomena we have been studying as products of a kind of constraint satisfaction procedure in which a very large number of constraints act simultaneously to produce the behavior. Thus, we see most behavior not as the product of a single, separate component of the cognitive system, but as the product of large [sic] set of interacting components, each mutually constraining the others [sic] and contributing in its own way to the globally observable behavior of the system.”

– Rumelhart & McClelland ([RM87], p. 76) Part II

Sequential Processes and Stopping Times

Chapter 11

Prologue: A Game with Exponential Martingales

In this part of the dissertation, we contribute to the theory of martingales, stochastic processes with nearly arbitrary dependence on the past. The full mathematical definition of these requires measure theory (e.g. [Kal06], Ch. 6). We defer this to the later chapters as needed for formal proofs, in favor of a somewhat more intuitive and less rigorous description in this chapter, neglecting some technical niceties. The intuitions developed in this chapter are closely linked to our technical contributions which follow in succeeding chapters. This chapter also serves as a gentle primer to martingales for the unfamiliar reader, though some familiarity with probability theory is assumed.

11.1 Martingales Describe Fair Games

A martingale $M_t$ can be thought of as describing the winnings of a player betting repeatedly in a fair game against a casino ([SV01]). The word origin is instructive: it arises from a betting strategy popular in 18th century France, in which a gambler bets on repeated flips of a fair coin, winning their stake if they call correctly and losing it otherwise. A gambler following this betting strategy bets a dollar on the first round and repeatedly doubles the amount staked until they win a round, at which point they stop. If this occurs at time 1, the gambler wins \$1; and indeed at any time $N$, the total amount won is $-\$1 - \$2 - \$4 - \cdots - \$2^{N-1} + \$2^N = \$1$. And since a head will always eventually come up, this strategy appears at first to turn a fair coin toss into a guaranteed payoff of \$1. Of course, this is intuitively impossible – “you can’t beat the system!” But the gaps in this informal description suggest how to define such betting strategies formally, so that they can be analyzed with probability theory.
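To make the doubling strategy and its hidden cost concrete, here is a minimal simulation sketch (our own illustration, not part of the original text): every run indeed nets \$1, but the credit the gambler must be able to stake is unbounded across runs.

```python
# Minimal sketch (ours): the classical "martingale" doubling strategy on a
# fair coin. Every run ends with a net win of $1, but the credit needed to
# keep doubling grows without bound over many runs -- the gap probability
# theory must close when formalizing "fair games".
import random

def doubling_strategy(rng):
    stake, net = 1, 0
    max_credit = 0                         # largest amount ever at risk
    while True:
        max_credit = max(max_credit, stake - net)
        if rng.random() < 0.5:             # heads: win the current stake and stop
            return net + stake, max_credit
        net -= stake                       # tails: lose the stake, double it
        stake *= 2

rng = random.Random(0)
runs = [doubling_strategy(rng) for _ in range(10_000)]
print({payoff for payoff, _ in runs})      # {1}: the strategy always nets $1
print(max(credit for _, credit in runs))   # but the required credit can be huge
```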

Specifically, the martingale betting strategy requires an arbitrarily high bankroll (line of credit), which we might reasonably forbid in gambling scenarios. So in fair games amenable to analysis, it is reasonable to suppose that the gambler can never have an infinite stake $M_t$ in expectation. Formally, $\mathbb{E}[|M_t|] < \infty$ for all times (that is, each $M_t$ is integrable). Also, because the game (coin toss) is fair, the gambler should break even, on average, given the past; we require

$$\mathbb{E}\left[M_t \mid \text{information up to time } t-1\right] = M_{t-1}$$

Any set of random variables $\{M_t\}_{t=0,1,\ldots}$ satisfying these two conditions is what we call a martingale. We use this as a springboard to a few further definitions. Define $\mathcal{F}_t$ as the set of all measurable (colloquially, observable) events up to time $t$ (later this will be formally defined as the $\sigma$-algebra, with $\{\mathcal{F}_t\}_{t\ge 0}$ the natural filtration); this can be thought of as our information at time $t$, so that $\mathbb{E}[M_t \mid \mathcal{F}_{t-1}]$ is the expected value of $M_t$ “given the past.” Then as we stated, $M_t$ is a martingale (resp. supermartingale, submartingale) if it has $\mathbb{E}[M_t \mid \mathcal{F}_{t-1}] = M_{t-1}$ (resp. $\le M_{t-1}$, $\ge M_{t-1}$) for all $t$. So from our interpretation of $M_t$ as a gambler’s winnings, we see that supermartingales are unfavorable games to the player, and submartingales favorable.

There is another important characteristic of such games which is also motivated by the betting system discussed above – the choice of a stopping time for the betting, which depends on the winnings from previous rounds and is therefore random. In the example above, the stopping time of the betting strategy is the first time $M_t = 1$. We consider a general such stopping time $\tau$, which is allowed to depend in any way on the results of previous rounds (the past), but not the future. Then the relevant intuition, from the classic text by Doob [Doo84], is:

If a game is unfavorable to a player, it looks unfavorable to him at random times chosen without foreknowledge.

Expressing this intuition with the expected winnings, we see that for any supermartingale $M_t$ and any stopping time $\tau$,

$$\mathbb{E}[M_\tau] \le \mathbb{E}[M_0] \tag{11.1}$$

The inequality becomes an equality for martingales Mt, representing fair games. In probability theory, this is known as the Optional Stopping Theorem.

11.2 A Family of Exponential Martingales

As a working example, consider a very basic stochastic process, the Gaussian random walk $M_t := \sum_{i=1}^t \sigma_i$, where each $\sigma_i$ is an independent standard normal random variable (and therefore $\mathbb{E}\left[e^{\lambda\sigma_i}\right] = \exp\left(\frac{1}{2}\lambda^2\right)$). Since this takes i.i.d. zero-mean steps, it is clearly a martingale.

We start with a basic fact about Mt: for all λ ∈ R,

$$\mathbb{E}\left[e^{\lambda M_t - \frac{1}{2}\lambda^2 t} \,\Big|\, \mathcal{F}_{t-1}\right] = \mathbb{E}\left[e^{\lambda\sigma_t - \frac{1}{2}\lambda^2}\right] e^{\lambda M_{t-1} - \frac{1}{2}\lambda^2(t-1)} = e^{\lambda M_{t-1} - \frac{1}{2}\lambda^2(t-1)} \tag{11.2}$$

This means that the process $X_t^\lambda = e^{\lambda M_t - \frac{1}{2}\lambda^2 t}$ is a martingale – an exponential martingale – for any $\lambda \in \mathbb{R}$. Eq. (11.2) remains true as time becomes continuous and the random walk becomes a Brownian motion, in which case it characterizes the geometric Brownian motion (GBM). One consequence is that at any fixed time $t$,

$$\mathbb{E}\left[e^{\lambda M_t - \frac{1}{2}\lambda^2 t}\right] = 1 \tag{11.3}$$

which characterizes all moments of $M_t$ at once, since it gives the moment-generating function (m.g.f.) of $M_t$. So when proving concentration bounds on $M_t$, it is enough to consider the exponential martingales indexed by $\lambda$. The proofs of later chapters involve betting on this (uncountable) slate of fair games, all of which are functions of the underlying martingale $M_t$. Call this slate of games “the λ-casino.”
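As a quick sanity check of (11.3), the following sketch (ours, with arbitrarily chosen $\lambda$ and $t$) estimates $\mathbb{E}\left[e^{\lambda M_t - \frac{1}{2}\lambda^2 t}\right]$ by Monte Carlo for the Gaussian random walk; the average is 1 up to sampling error, for any fixed $\lambda$ and $t$.

```python
# Sketch (ours): Monte Carlo check that the exponential martingale
# X_t^lambda = exp(lambda*M_t - lambda^2*t/2) has mean 1 at a fixed time,
# as in Eq. (11.3). lambda and t are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
t, lam, n_paths = 50, 0.2, 200_000
M_t = rng.standard_normal((n_paths, t)).sum(axis=1)   # Gaussian random walk at time t
X_t = np.exp(lam * M_t - 0.5 * lam**2 * t)
print(X_t.mean())                                      # approximately 1.0
```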

11.3 Hoeffding’s Large Deviation Bound

First, we explore how to prove the standard bounds on the concentration of such a random walk. (As we will see in the formal proofs in succeeding chapters, what we will say is roughly true for many other random processes besides the Gaussian random walk, such as the sum of fair coin flips and any other process with bounded increments.) The basic concentration bound for large deviations here expresses the same phenomenon as the central limit theorem (CLT) – that for any fixed time t, Mt concentrates around its mean of 0 with tails of probability mass like those of a Gaussian distribution.

Lemma 16 ([Hoe63]). Suppose $M_t$ is a Gaussian random walk. Then for any fixed time $T$, w.p. $\ge 1-\delta$,
$$|M_T| \le \sqrt{2T\ln(1/\delta)}$$

To get concentration from the m.g.f. of (11.3) and prove Lemma 16, the free parameter λ must be chosen.

Proof of Lemma 16. Set $\lambda = \frac{M_T}{T}$ to optimize the exponent of (11.2). This gives
$$\mathbb{E}\left[\exp\left(\frac{M_T^2}{2T}\right)\right] = 1 \tag{11.4}$$

Defining the event $A = \left\{\exp\left(\frac{M_T^2}{2T}\right) \ge \frac{1}{\delta}\right\}$,

$$\mathbb{E}\left[\exp\left(\frac{M_T^2}{2T}\right)\right] \ge \mathbb{E}\left[\exp\left(\frac{M_T^2}{2T}\right) \,\Big|\, A\right]\Pr(A) \ge \frac{1}{\delta}\Pr(A)$$
where the last inequality is by definition of the “bad event” $A$. Combining this with (11.4) gives the result.

This result is tight – there are well-known matching anti-concentration bounds ([Slu77]). These constitute a convenient finite-time version of the CLT’s tail distributions.

This proof can be recast in betting language. We initially have \$1 to bet at the λ-casino from earlier. The casino is offering a λ-indexed set of fair games, each of which offers a payout of $X_t^\lambda = e^{\lambda M_t - \frac{1}{2}\lambda^2 t}$ per dollar invested, which evolves over time. In this case we know the time $T$ when we are to collect, so we can invest all our money in the game that pays most at $T$, which has parameter $\lambda^* = \frac{M_T}{T} = \operatorname{argmax}_\lambda X_T^\lambda$. Betting to maximize our winnings $X_T^{\lambda^*}$ leads to the concentration result.
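For completeness, the fixed-time bound of Lemma 16 is easy to check numerically; the sketch below (ours, with assumed parameters) samples $M_T \sim \mathcal{N}(0,T)$ directly and reports the empirical probability that $|M_T|$ exceeds $\sqrt{2T\ln(1/\delta)}$.

```python
# Sketch (ours): empirical check of Lemma 16 for a Gaussian random walk.
# M_T has the same law as sqrt(T) * N(0,1), so we sample it directly.
import numpy as np

rng = np.random.default_rng(1)
T, delta, n_paths = 1_000, 0.05, 1_000_000
M_T = np.sqrt(T) * rng.standard_normal(n_paths)
bound = np.sqrt(2 * T * np.log(1 / delta))
print((np.abs(M_T) > bound).mean())   # empirical failure rate; at most delta = 0.05
```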

11.4 From Stopping Times to Uniform Concentration

In the next chapter, we prove powerful pathwise results that extend this concentration proof to incorporate general stopping times – random times $\tau$ whose choice only depends on the past (i.e. for any time $t$, $\{\tau \le t\}$ can be determined from the history up to $t$). This requires two modifications to the proof of Lemma 16, which illustrate the relationships we explore in the rest of Part II.

1. The moment bound is stronger, holding for any stopping time $\tau$, rather than any fixed time $T$. The integrand is not $X_t^{\lambda^*} = \exp\left(\frac{M_t^2}{2t}\right)$, but another similar process $Y_t$ which is a fair game, so that for any stopping time $\tau$,

$$1 = \mathbb{E}[Y_\tau] \tag{11.5}$$

2. The “bad event” is defined differently, as $A = \left\{\exists t : Y_t \ge \frac{1}{\delta}\right\}$. Correspondingly, the stopping time we use is the first time this occurs (or $\infty$ otherwise): $\tau_D := \min\left\{t : Y_t \ge \frac{1}{\delta}\right\}$, so that $\tau_D < \infty$ occurs exactly when $A$ occurs. So writing out the rest of the proof,
$$\mathbb{E}[Y_{\tau_D}] \ge \mathbb{E}[Y_{\tau_D} \mid \tau_D < \infty]\Pr(\tau_D < \infty) \ge \frac{1}{\delta}\Pr(\tau_D < \infty)$$

because when $\tau_D < \infty$, $Y_{\tau_D} \ge \frac{1}{\delta}$ by definition of $\tau_D$.

Putting this together with (11.5),

$$\delta \ge \Pr(\tau_D < \infty) = \Pr\left(\exists t : Y_t \ge \frac{1}{\delta}\right)$$

so the process $Y_t$ w.h.p. is uniformly concentrated (below $\frac{1}{\delta}$) for all time. This derivation shows that stopping times and uniform concentration are related. It also demonstrates how we might go about generalizing the Hoeffding concentration bound – find a process $Y_t$ playing the role of $\exp\left(\frac{M_t^2}{2t}\right)$ in that proof, such that for any stopping time $\tau$, $\mathbb{E}[Y_\tau] = 1$. In betting terms, we would like the winnings process $Y_t$ to break even at any stopping time. The arguments of this section show that such a $Y_t$, if defined appropriately in terms of $M_t$, can be instrumental in uniformly concentrating $M_t$ for all times simultaneously.
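Before introducing the mixture that will play the role of $Y_t$, it may help to see the stopping-time argument at work with a single fixed $\lambda$, taking $Y_t = X_t^\lambda$: there, optional stopping already gives $\Pr\left(\exists t : X_t^\lambda \ge \frac{1}{\delta}\right) \le \delta$ (sometimes called Ville's inequality). The simulation sketch below (ours; the parameters and the finite horizon standing in for "all time" are assumptions of the illustration) checks this crossing probability.

```python
# Sketch (ours): for one fixed lambda, the exponential martingale crosses the
# level 1/delta at *some* time with probability at most delta -- the argument
# of this section applied with Y_t = X_t^lambda. A finite horizon T stands in
# for "all time" in this illustration.
import numpy as np

rng = np.random.default_rng(0)
lam, delta, T, n_paths = 0.1, 0.05, 5_000, 5_000
t = np.arange(1, T + 1)
crossings = 0
for _ in range(n_paths):
    M = np.cumsum(rng.standard_normal(T))          # one Gaussian random walk
    X = np.exp(lam * M - 0.5 * lam**2 * t)         # exponential martingale path
    crossings += bool((X >= 1 / delta).any())
print(crossings / n_paths)                         # at most delta = 0.05
```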

11.5 Uniform Concentration by Mixing Bets at a Stopping Time

To design such a process $Y_t$, recall our discussion at the end of Sec. 11.3, giving the betting interpretation of the Hoeffding bound proof. We had bet in the λ-casino, which offers a slate of fair games characterizing the process $M_t$ whose concentration we proved. Betting so as to maximize our winnings – i.e., on a single well-chosen game indexed by $\lambda^*$ – gave the Hoeffding concentration bound. The only difference here is that we are required to provide winnings $Y_t$ for a game that is fair at any stopping time. Since the λ-casino’s exponential martingale games are all fair, we can distribute our money in any way among them and break even at any stopping time – the winnings will look fair to us “at random times chosen without foreknowledge.” More formally, for any univariate distribution $P$ and stopping time $\tau$,

$$\mathbb{E}\left[\mathbb{E}_{\lambda\sim P}\left[X_\tau^\lambda\right]\right] = \mathbb{E}_{\lambda\sim P}\left[\mathbb{E}\left[X_\tau^\lambda\right]\right] = 1$$
where the unmarked expectation is over the randomness in $M_t$, and the expectations can be interchanged because $X_\tau^\lambda$ is nonnegative (Tonelli's Theorem). This suggests that our required $Y_t$, as discussed in Section 11.4, can be defined as

$$Y_t := \mathbb{E}_{\lambda\sim P}\left[X_t^\lambda\right]$$
our winnings after betting our dollar over the set of games in the λ-casino according to distribution $P$. Just as with the Hoeffding bound in Sec. 11.3, we aim to maximize our winnings with an appropriately chosen $P$, to get the tightest possible bound. Therefore, we are interested in playing a distribution $P$ that maximizes $\mathbb{E}[Y_\tau]$ against a worst-case $\tau$, which we can roughly rewrite as the following.

$$\sup_P \inf_\tau \mathbb{E}\left[\mathbb{E}_{\lambda\sim P}\left[X_\tau^\lambda\right]\right] \tag{11.6}$$
Recall that to prove the Hoeffding bound, we essentially deal with a version of (11.6) with the order of the optimizations interchanged to $\inf_\tau \sup_P$; this reflects our knowledge of the stopping time, written there as $T$. Our winnings here can only be less than in that previous case, because we try to perform reasonably well against any unknown $\tau$, as written in (11.6). If we choose any particular deterministic $\lambda_0$ as we did for the Hoeffding bound, the adversary chooses $\tau$ with knowledge of $\lambda_0$. It only needs to consider stopping the process $X_t^{\lambda_0} = \exp\left(\lambda_0 M_t - \frac{1}{2}\lambda_0^2 t\right)$, which can be driven to any small constant by waiting long enough.¹ So we must hedge our bets over different $\lambda$ values, according to the distribution $P$. We prove in Chapter 12 that an essentially optimal choice of $P$ is to have probability density function

$$\frac{1}{|\lambda|\ln^2\left(\frac{1}{|\lambda|}\right)} \qquad \text{over } \lambda \in [-e^{-2}, e^{-2}] \setminus \{0\}$$

This choice results in

$$Y_t = \mathbb{E}_{\lambda\sim P}\left[X_t^\lambda\right] \gtrsim \frac{\exp\left(\frac{M_t^2}{2t}\right)}{\ln^2(t)}$$
which is slightly lower than for the Hoeffding bound because of the denominator being $\ln^2(t)$ instead of 1. Putting this together with the argument of Section 11.4, we have that w.p. $\ge 1-\delta$, for all times simultaneously, $Y_t \le \frac{1}{\delta} \iff |M_t| \lesssim \sqrt{t\ln\left(\frac{\ln^2(t)}{\delta}\right)}$.² This iterated logarithm bound extends the Hoeffding bound, which gives a slightly superior rate of $\sqrt{t\ln\frac{1}{\delta}}$ at the cost of requiring a fixed time horizon. As $t\to\infty$, it matches the rate of one of the fundamental results of probability theory, the law of the iterated logarithm (LIL; [Khi24]), as is further described in the succeeding chapters.

¹ Due to the strong law of large numbers ($\frac{M_t}{t} \to 0$ almost surely).
² Here $\lesssim$ represents $\le$ in an approximate sense (up to absolute constants).
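The mixing distribution $P$ is easy to work with numerically: on $(0, e^{-2}]$ its mass up to $\lambda$ is $\frac{1}{\ln(1/\lambda)}$, so $\lambda$ can be drawn by inverse-CDF sampling. The sketch below (ours; the sample sizes and horizon are arbitrary choices) draws $\lambda$ from $P$, tracks $Y_t = \mathbb{E}_{\lambda\sim P}[X_t^\lambda]$ along one simulated Gaussian random walk, and reports its largest value, which should typically stay below $\frac{1}{\delta}$.

```python
# Sketch (ours): sample lambda from the mixing density 1/(|l| ln^2(1/|l|)) on
# [-e^-2, e^-2] \ {0} via inverse-CDF sampling, and track the mixed process
# Y_t = E_{lambda~P}[X_t^lambda] along one Gaussian random walk. Under this
# "null" walk, Y_t should typically remain below 1/delta for all t shown.
import numpy as np

rng = np.random.default_rng(0)
n_lam, T, delta = 4_000, 5_000, 0.05

u = rng.uniform(0.0, 1.0, size=n_lam)          # inverse-CDF sample of |lambda|
mag = np.exp(-2.0 / np.clip(u, 1e-12, 1.0))    # magnitude in (0, e^-2]
lam = np.where(rng.random(n_lam) < 0.5, -mag, mag)

M = np.cumsum(rng.standard_normal(T))          # one Gaussian random walk
Y_max = 0.0
for t in range(1, T + 1):
    Y_t = np.mean(np.exp(lam * M[t - 1] - 0.5 * lam**2 * t))
    Y_max = max(Y_max, Y_t)
print(Y_max, 1 / delta)                        # Y_max is typically well below 1/delta
```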

11.6 Related Work

In the intuition provided in this chapter, our bets to maximize expected winnings in the λ-casino can be viewed as yielding concentration bounds. We have viewed the betting strategy as being a distribution P over the range of λ, chosen oblivious to τ and

all else in the problem, which is why such a strategy can show optimal concentration bounds over time. This technique has broad precedents, which we discuss here.

The general strategy of averaging exponential supermartingales (in our view, betting in the λ-casino of [slightly] unfair games) holds key insights into the uniform concentration of measure over finite times, and is an approach that was introduced by Robbins and colleagues ([DR67a, DR68b, RS70]). In the following chapters, we point out how it can be used to prove, improve, and subsume the standard uniform concentration bounds in the literature. As such, we give different but somewhat redundant proofs even of related results in the following chapters, for a better overall view of how the formal techniques can be used together. This is done in Chapter 12, where we fill in the technical gaps in the intuitive derivation of this chapter to prove sharp finite-time uniform concentration bounds for essentially any martingale addressed by the CLT.

The link between repeated fair games and martingales has been explored many times in depth. Central in modern probability are measure-theoretic treatments, which recognize the fundamental nature of martingales (e.g. [Kal06, Dur10]); some ([RW00, Wil91]) place them especially prominently. A distinct approach taken by Shafer and Vovk ([SV01]) is to consider repeated prediction games as the basic concepts of probability instead of the standard measure-theoretic axioms and probability spaces. This is related to the work [Vov93], and describes how to prove the LIL by investing winnings against an adversary.

On a related note, there is a long line of work concerning martingales in online learning, in which the idea of a game-theoretic supermartingale recurs. We highlight work on online linear(/convex) optimization, when the martingale is directly observed as here ([FS97, MA13]). There is also a line of work on so-called drifting games ([Fre95, Sch01, LS14]), which generalize a backward-induction, dynamic programming-style argument to solve repeated games where the winnings are only calculated at the end, and the supermartingale represents the potential winnings under optimal future play. A separate but related tack is taken by work in which the player controls the stopping time of the process, in which case the problem is one of “optimal stopping.” Such problems have a well-developed theory also heavily involving backward induction and martingales ([PS06]), and there is even a formal duality relation between optimal stopping and uniform concentration over time ([DK94, Rog02]).

Chapter 12

Sharp Uniform Martingale Concentration Bounds

12.1 Introduction

Martingales are indispensable in studying the temporal dynamics of stochastic processes arising in a multitude of quantitative fields ([Hey72, Per09]). When such processes have complex long-range dependences such as arise in learning, it is often of interest to concentrate martingales uniformly over time (e.g. [BDF13]). On the theoretical side, a fundamental limit to such concentration is expressed by the law of the iterated logarithm (LIL). However, this only concerns asymptotic behavior. In many applications, it is more natural to instead consider concentration that holds uniformly over all finite times. This chapter presents such bounds for the large classes of martingales which are addressed by Hoeffding ([Hoe63]) and Bernstein ([Fre75]) inequalities. These new results are optimal within small constants, and can be viewed as finite-time generalizations of the upper half of the LIL.

To be concrete, the simplest nontrivial martingale for such purposes is the discrete-time random walk $\{M_t\}_{t=0,1,2,\ldots}$ induced by flipping a fair coin repeatedly. It can be written as $M_t = \sum_{i=1}^t \sigma_i$, where $\sigma_i$ are i.i.d. Rademacher-distributed random variables ($\Pr(\sigma_i = -1) = \Pr(\sigma_i = +1) = 1/2$), so we refer to it as the “Rademacher random walk”; take $M_0 = 0$ w.l.o.g. The LIL was first proved for the Rademacher random walk, by Khinchin:

Theorem 17 (Law of the iterated logarithm ([Khi24])). Suppose Mt is a Rademacher random walk. Then with probability 1,

$$\limsup_{t\to\infty} \frac{|M_t|}{\sqrt{t\ln\ln t}} = \sqrt{2}$$
Our main concentration result for the Rademacher random walk generalizes this

to hold over finite times.

Theorem 18. Suppose $M_t$ is a Rademacher random walk. Then for any $\delta < 1$, with probability $\ge 1-\delta$, for all $t \ge 173\ln\frac{4}{\delta}$ simultaneously, the following are true:
$$|M_t| \le \frac{t}{e^2\left(1+\sqrt{1/3}\right)} \qquad \text{and} \qquad |M_t| \le \sqrt{3t\left(2\ln\ln\left(\frac{5t}{2|M_t|}\right) + \ln\frac{2}{\delta}\right)}$$

(The latter implies $|M_t| \le \max\left\{1, \sqrt{3t\left(2\ln\ln\left(\frac{5t}{2}\right) + \ln\frac{2}{\delta}\right)}\right\}$.)

Theorem 18 takes the form of the LIL upper bound as $t\to\infty$ for any fixed $\delta > 0$. Interestingly, it also captures a finite-time tradeoff between $t$ and $\delta$. The $\ln\ln t$ term is dominated by the $\ln\frac{1}{\delta}$ term for $t \lesssim e^{1/\delta}$. In this regime, the bound is $O\left(\sqrt{t\ln\frac{1}{\delta}}\right)$, a time-uniform central limit theorem (CLT)-type bound below the LIL rate for small enough $t$ and $\delta$.
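As an illustration of how uniform (and how conservative) the guarantee is, the following sketch (ours; the path count, horizon, and the focus on the explicit form of the second bound are assumptions of the illustration) simulates Rademacher random walks and reports how often that bound is violated at any time $t \ge 173\ln\frac{4}{\delta}$.

```python
# Sketch (ours): Monte Carlo check of the explicit form of Theorem 18's second
# bound for Rademacher random walks. The fraction of paths violating the bound
# at any time t >= 173*ln(4/delta) should be at most delta.
import numpy as np

rng = np.random.default_rng(0)
delta, T, n_paths = 0.05, 10_000, 1_000
t0 = int(np.ceil(173 * np.log(4 / delta)))             # initial-time condition

steps = rng.choice(np.array([-1, 1], dtype=np.int32), size=(n_paths, T))
M = np.cumsum(steps, axis=1)                            # Rademacher random walks
t = np.arange(1, T + 1)

bound = np.maximum(1.0, np.sqrt(3 * t * (2 * np.log(np.log(5 * t / 2)) + np.log(2 / delta))))
violated = (np.abs(M[:, t0 - 1:]) > bound[t0 - 1:]).any(axis=1)
print(violated.mean())                                  # empirical failure rate <= delta
```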

12.1.1 Optimality of Theorem 18

We now show that Theorem 18 is optimal in a very strong sense. Suppose we are concerned with concentration of the random walk uniformly over time up to some fixed finite time $T$. If the failure probability $\delta \lesssim \frac{1}{\ln T}$, then the $\ln\frac{1}{\delta}$ term dominates the $\ln\ln t$ term for all $t < T$. In this case, the bound of Theorem 18 is $O\left(\sqrt{t\ln\frac{1}{\delta}}\right)$ uniformly over $t < T$. This is optimal even for a fixed $t$ by binomial tail lower bounds.

The more interesting case for our purposes is when $\delta \gtrsim \frac{1}{\ln T}$, in which case Theorem 18 gives a concentration rate of $O\left(\sqrt{t\ln\ln t + t\ln\frac{1}{\delta}}\right)$, uniformly over $t < T$. As $T$ and $t$ increase without bound for any fixed $\delta > 0$, this rate becomes $O\left(\sqrt{t\ln\ln t}\right)$, which is unimprovable by the LIL. But the tradeoff between $t$ and $\delta$ given in Theorem 18 is also essentially optimal, as shown by the following result.

Theorem 19. There are absolute constants $C_1, C_2$ such that the following is true. Fix a finite time $T > C_2\ln\frac{2}{\delta}$, and fix any $\delta \in \left[\frac{4}{\ln((T-1)/3)}, \frac{1}{C_1}\right]$. Then with probability at most $1-\delta$, for all $t \in \left[C_2\ln\frac{2}{\delta}, T\right]$ simultaneously, $|M_t| \le \frac{t}{3e^2}$ and
$$|M_t| \le \sqrt{2t\left(\ln\ln\left(\left(\frac{3t}{|M_t| + 2\sqrt{t/3}}\right)^2\right) + \ln\frac{1}{C_1\delta}\right)}$$

(It suffices if $C_1 = \frac{420}{11}$ and $C_2 = 164$.)

There are no previous results in this vein, to our knowledge. So a principal contribution of this chapter is to characterize the tradeoff between t and δ in uniform concentration of measure over time, which is not addressed by the classical LIL of Theorem 17. Theorems 18 and 19 constitute a sharp finite-time version of the LIL, analogous to how the Hoeffding bound for large deviations is a sharp finite-time version of the CLT’s Gaussian tail for a fixed time.

12.1.2 Chapter Outline

The proofs of both Theorems 18 and 19 are possibly of independent interest, and the rest of this chapter details and builds on them. The proof of Theorem 18 extends the exponential moment method, the standard way of proving classical Chernoff-style bounds which hold for a fixed time. We use a technique, manipulating stopping times of a particular averaged supermartingale, which generalizes easily to many discrete- and continuous-time martingales, allowing us to prove iterated-logarithm concentration bounds for them as well. These martingale generalizations are given in Section 12.2, with discussion of even more general settings like continuous time. The proof of Theorem 18 is of interest in demonstrating our techniques, so we include it here in Section 12.3. It has a loose proportionality constant and is not particularly direct, to better illustrate our technique. But the tightest and sharpest proof we know for the concentration bounds can be found as the main proof of Ch. 15. Other proof details are deferred to the appendices. The proof of the anti-concentration bound of Theorem 19 basically inverts the argument used to prove the concentration bounds; the details involve some lengthy analytic estimates and so are deferred to Section C.1. Finally, Section C.2 contains some ancillary results, and formalizes extensions discussed in the previous sections.

12.2 Uniform Concentration Bounds for Martingales

Here we present extensions of the random walk concentration result of Theorem 18 to broader classes of martingales. Some notation must be established first, recapitulating our setup using martingales from earlier.

We study the behavior of a real-valued stochastic process $M_t$ in a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P})$, where $M_0 = 0$ w.l.o.g. For simplicity, only the discrete-time case $t \in \mathbb{N}$ is considered hereafter; the results and proofs in this chapter extend to continuous time as well (Remark 1). Define the difference sequence $\xi_t = M_t - M_{t-1}$ for all $t$ ($\xi_t$ being $\mathcal{F}_t$-measurable), and the cumulative conditional variance and quadratic variation: $V_t = \sum_{i=1}^t \mathbb{E}\left[\xi_i^2 \mid \mathcal{F}_{i-1}\right]$ and $Q_t = \sum_{i=1}^t \xi_i^2$ respectively. Also recall the following standard definitions. A martingale $M_t$ (resp. supermartingale, submartingale) has $\mathbb{E}[\xi_t \mid \mathcal{F}_{t-1}] = 0$ (resp. $\le 0$, $\ge 0$) for all $t$. A stopping

time τ is a function on Ω such that {τ ≤ t} ∈ Ft ∀t; notably, τ can be infinite with positive probability ([Dur10]; [Doo84], Sec. 2.III.7).

12.2.1 Uniform Second-Moment Martingale Concentration

A few pertinent generalizations of Theorem 18 are now presented. The first is a direct iterated-logarithm analogue of Hoeffding's inequality ([Hoe63]) for martingales.

Theorem 20 (Uniform Hoeffding Bound). Let $M_t$ be a martingale, and suppose there are constants $\{c_i\}_{i\ge 1}$ such that for all $t \ge 1$, $|M_t - M_{t-1}| \le c_t$ w.p. 1. Fix any $\delta < 1$ and define $\tau_0 = \min\left\{s : \sum_{i=1}^s c_i^2 \ge 173\ln\frac{4}{\delta}\right\}$. Then with probability $\ge 1-\delta$, for all $t \ge \tau_0$ simultaneously, $|M_t| \le \frac{\sum_{i=1}^t c_i^2}{e^2\left(1+\sqrt{1/3}\right)}$ and

$$|M_t| \le \sqrt{3\left(\sum_{i=1}^t c_i^2\right)\left(2\ln\ln\left(\frac{3\sum_{i=1}^t c_i^2}{2|M_t|}\right) + \ln\frac{2}{\delta}\right)}$$

A uniform counterpart to Bernstein’s inequality can be derived similarly.

Theorem 21 (Uniform Bernstein Bound). Let $M_t$ be a martingale with uniformly bounded differences: $|M_t - M_{t-1}| \le e^2$ w.p. 1 for all $t \ge 1$. Fix any $\delta < 1$ and define $\tau_0 = \min\left\{s : 2(e-2)V_s \ge 173\ln\frac{4}{\delta}\right\}$. Then with probability $\ge 1-\delta$, for all $t \ge \tau_0$ simultaneously, $|M_t| \le \frac{2(e-2)}{e^2\left(1+\sqrt{1/3}\right)}V_t$ and

$$|M_t| \le \sqrt{6(e-2)V_t\left(2\ln\ln\left(\frac{3(e-2)V_t}{|M_t|}\right) + \ln\frac{2}{\delta}\right)}$$

As with other Bernstein-type inequalities, the boundedness assumption on ξt can be replaced by higher moment conditions (e.g. Lemma 61 and the preceding discussion). The proofs of Theorems 20 and 21 are nearly identical to that of Theorem 18. Further details on these topics are given in Section C.2.2.

12.2.2 Discussion of Results

Some remarks on our results are in order.

Remark 1 (Extension to Continuous Time). In many cases, our uniform results can be generalized to continuous-time martingales with an almost unchanged proof (e.g., this is true for the Wiener process $W_t$). Further explanation of this depends on the proof details, and therefore is deferred to Section 12.2.3.

Remark 2 (Extension to Super(sub)martingale Bounds). One-sided variants of Theorem 18 hold in many cases for super- (resp. sub-) martingales, giving a uniform upper (resp. lower) bound identical to that in Theorem 18. When the Doob-Meyer decomposition ([Dur10]) applies, such bounds are immediate.

Remark 3 (Removal of Initial Time Conditions). The upper concentration bounds in this chapter include an initial time condition $t \ge \tau_0$. For Theorem 18, it is straightforward to remove this condition without degrading the result: if $t < \tau_0$, an explicit union bound over fixed-time Hoeffding bounds immediately gives that $|M_t| \le O\left(\ln\frac{1}{\delta}\right)$ w.h.p. for all $t < \tau_0$. Combining this with Theorem 18 gives a uniform bound, over all times $t \ge 0$, of $|M_t| \le O\left(\sqrt{t\ln\ln t + t\ln\frac{1}{\delta}} + \ln\frac{1}{\delta}\right)$. The $t$-independent additive $\ln\frac{1}{\delta}$ term matches standard Bernstein/Bennett concentration inequalities ([BLM13a]) for a fixed time. To extend this argument to the upper bounds of Theorems 20 and 21, such an explicit union bound is no longer usable. But our proof techniques using stopping times extend naturally to these cases – see Section C.2.3.

When considering uniform martingale concentration over all times without an explicit union bound, the basic tools are Doob’s maximal inequality for nonnegative supermartingales ([Dur10], Exercise 5.7.1), Hoeffding’s maximal inequality ([Hoe63]), and Freedman’s Bernstein-type inequality ([Fre75]). These can all be easily proved with the techniques of this chapter (similar to the proof of Theorem 26). However, the latter two results are fundamentally weaker than ours, in that they only hold uniformly over a finite time interval, and degrade to triviality as the interval grows infinite. Our uniform Hoeffding/Bernstein-type bounds in Section 12.2.1 achieve optimal rates in the variance and $\delta$ parameters as well. These generalize martingale LILs like the classic result of Stout ([Sto70]), which for large classes of martingales makes a statement similar to Theorem 17, except concerning the ratio $|M_t|/\sqrt{V_t\ln\ln V_t}$. The finite-time upper Hoeffding-bound LIL can be proved with an epoch-based approach ([JMNB14]) standard in proofs of the (asymptotic) LIL ([Dur10]). Our technique can be viewed as generalizing that idea using stopping time manipulations.

Theorem 18’s tradeoff between $t$ and $\delta$ describes some of the interplay between the CLT and the LIL when uniform bounds are taken of partial sums of suitable i.i.d. variables. A similar question has been explored with a different statistical emphasis by the classic paper [DE56] and a line of subsequent work, though only as $t \to \infty$ to our knowledge.

12.2.3 Discussion of Proof

Most of the tools used in this proof, particularly optional stopping as in Theorem 23, extend seamlessly to the continuous-time case. The main potential obstacle to this is in the first step – establishing an exponential supermartingale construction of the form of Lemma 24. This is easily done in many situations of interest, as demonstrated by the archetypal result that the standard geometric Brownian motion $\exp\left(\lambda W_t - \frac{\lambda^2}{2}t\right)$ is precisely a martingale for any $\lambda \in \mathbb{R}$, where $W_t$ is the standard Wiener process.

A direct antecedent to this chapter is a pioneering line of work by Robbins and colleagues ([DR67a, RS70, Rob70a]) that investigates the powerful method of averaging martingales. For the most part, it only considers the asymptotic regime, though Darling and Robbins ([DR67a]) do briefly treat finite times (with far weaker $\delta$ dependence). More recently, de la Peña et al. ([dlPnKL07] and references therein) revisit their techniques with a different emphasis and normalization for $M_t$. The idea of using stopping times in the context of uniform martingale concentration goes back at least to work of Robbins ([Rob70a]) with Siegmund ([RS70]), and was then notably used by Freedman ([Fre75]).

Our proof techniques are conceptually related to ideas from Shafer and Vovk ([SV01], Ch. 5), who describe how to view the LIL as emerging from a game. Departing from traditional approaches, they motivate the exponential supermartingale construction directly using Taylor expansions and prove the (asymptotic) LIL by averaging such supermartingales.

Two final connections bear mentioning. First, our proof technique incorporates relative variation ($|M_t|/U_t$) at multiple scales at once. There is an analogy to well-developed general chaining techniques ([Tal05]) that have been used to great effect to uniformly bound processes indexed on metric spaces; this is possible for us because the index set (time) is totally ordered. Second, it has been previously noted ([Rog02, HK04]) that there is a duality relationship between martingale stopping times and uniform concentration, and our technique provides another connection between them.

12.3 Proof of Theorem 18 and Concentration Bounds

Define the (deterministic) process $U_t = t$ (a notational convenience, to ease extension of this proof to the martingale case discussed in Section 12.2). Also define $k := \frac{1}{3}$ and $\lambda_0 := \frac{1}{e^2\left(1+\sqrt{k}\right)}$. In this section, we prove the following bound, which is a slightly more precise version of Theorem 18:

Theorem 22. Let $M_t$ be a Rademacher random walk. Fix any $\delta < 1$ and define the time $\tau_0 = \min\left\{s : U_s \ge \frac{2}{\lambda_0^2}\ln\frac{4}{\delta}\right\}$. Then with probability $\ge 1-\delta$, for all $t \ge \tau_0$ simultaneously, $|M_t| \le \lambda_0 U_t$ and
$$|M_t| \le \sqrt{\frac{2U_t}{1-k}\ln\left(\frac{2\ln^2\left(\frac{U_t}{(1-\sqrt{k})|M_t|}\right)}{\delta}\right)}$$

The proof invokes the Optional Stopping Theorem.

Theorem 23 (Optional Stopping for Nonnegative Supermartingales ([Dur10], Theorem 5.7.6)). Let $M_t$ be a nonnegative supermartingale. Then if $\tau$ is a (possibly infinite) stopping time, $\mathbb{E}[M_\tau] \le \mathbb{E}[M_0]$.

The version we use here explicitly exploits the favorable convergence properties of nonnegative supermartingales when the stopping time is infinite. Our argument begins by appealing to a standard exponential supermartingale construction.

Lemma 24. The process $X_t^\lambda := \exp\left(\lambda M_t - \frac{\lambda^2}{2}U_t\right)$ is a supermartingale for any $\lambda \in \mathbb{R}$.

Proof. Using Hoeffding's Lemma, for any $\lambda \in \mathbb{R}$ and $t \ge 1$, $\mathbb{E}[\exp(\lambda\xi_t) \mid \mathcal{F}_{t-1}] \le \exp\left(\frac{\lambda^2}{8}(2)^2\right) = \exp\left(\frac{\lambda^2}{2}\right)$. Therefore, $\mathbb{E}\left[\exp\left(\lambda\xi_t - \frac{\lambda^2}{2}\right) \,\Big|\, \mathcal{F}_{t-1}\right] \le 1$, so $\mathbb{E}\left[X_t^\lambda \mid \mathcal{F}_{t-1}\right] \le X_{t-1}^\lambda$.

The result is derived through various manipulations of this supermartingale $X_t^\lambda$. For the rest of the proof, for all $t$, assume that $M_t \ne 0$. This is with full generality, because when $M_t = 0$, the bound of Theorem 18 trivially holds.

12.3.1 A Time-Uniform Law of Large Numbers

The desired result, Theorem 22, uniformly controls $|M_t|/\sqrt{U_t\ln\ln U_t}$, but we first control $|M_t|/U_t$. This generalizes the (strong) law of large numbers ([S]LLN), for any failure probability $\delta > 0$, uniformly over finite times. While a weaker result than Theorem 22, this concisely demonstrates our principal proof techniques, and we independently also use it to prove the main bound. The first step is to establish a moment bound which holds at any stopping time, by averaging supermartingales from the family $\left\{\exp\left(\lambda M_t - \frac{\lambda^2}{2}U_t\right)\right\}_{\lambda\in\mathbb{R}}$ using a particular weighting over $\lambda$.

Lemma 25. For any stopping time $\tau$, $\mathbb{E}\left[\exp\left(\lambda_0|M_\tau| - \frac{\lambda_0^2}{2}U_\tau\right)\right] \le 2$.

Proof. Recall the definition of $X_t^\lambda$ from Lemma 24. Here we set the free parameter $\lambda$ in the process $X_t^\lambda$ to get a process $Y_t$. $\lambda$ is set stochastically: $\lambda \in \{-\lambda_0, \lambda_0\}$ with probability $\frac{1}{2}$ each. After marginalizing over $\lambda$, the resulting process is

$$Y_t = \frac{1}{2}\exp\left(\lambda_0 M_t - \frac{\lambda_0^2}{2}U_t\right) + \frac{1}{2}\exp\left(-\lambda_0 M_t - \frac{\lambda_0^2}{2}U_t\right) \ge \frac{1}{2}\exp\left(\lambda_0|M_t| - \frac{\lambda_0^2}{2}U_t\right) \tag{12.1}$$

Now take $\tau$ to be any stopping time as in the lemma statement. Then $\mathbb{E}\left[\exp\left(\lambda_0 M_\tau - \frac{\lambda_0^2}{2}U_\tau\right)\right] = \mathbb{E}\left[X_\tau^{\lambda=\lambda_0}\right] \le 1$, where the inequality is by the Optional Stopping Theorem (Theorem 23). Similarly, $\mathbb{E}\left[X_\tau^{\lambda=-\lambda_0}\right] \le 1$.

So $\mathbb{E}[Y_\tau] = \frac{1}{2}\left(\mathbb{E}\left[X_\tau^{\lambda=-\lambda_0}\right] + \mathbb{E}\left[X_\tau^{\lambda=\lambda_0}\right]\right) \le 1$. Combining this with (12.1) gives the result.

A particular setting of $\tau$ extracts the desired uniform LLN bound from Lemma 25.

Theorem 26. Fix $\delta > 0$. With probability $\ge 1-\delta$, for all $t \ge \min\left\{s : U_s \ge \frac{2}{\lambda_0^2}\ln\frac{2}{\delta}\right\}$ simultaneously, $|M_t| \le \lambda_0 U_t$.

Proof. For convenience define $\tau_1 = \min\left\{t : U_t \ge \frac{2}{\lambda_0^2}\ln\frac{2}{\delta}\right\}$. Define the stopping time $\tau = \min\left\{t \ge \tau_1 : \frac{|M_t|}{U_t} > \lambda_0\right\}$. Then it suffices to prove that $\mathbb{P}(\tau < \infty) \le \delta$. On the event $\{\tau < \infty\}$, we have $\frac{|M_\tau|}{U_\tau} > \lambda_0$ by definition of $\tau$. Therefore, using Lemma 25,

$$2 \ge \mathbb{E}\left[\exp\left(\lambda_0|M_\tau| - \frac{\lambda_0^2}{2}U_\tau\right)\right] \ge \mathbb{E}\left[\exp\left(\lambda_0|M_\tau| - \frac{\lambda_0^2}{2}U_\tau\right) \,\Big|\, \tau < \infty\right]\mathbb{P}(\tau < \infty) \stackrel{(a)}{>} \mathbb{E}\left[\exp\left(\lambda_0^2 U_\tau - \frac{\lambda_0^2}{2}U_\tau\right) \,\Big|\, \tau < \infty\right]\mathbb{P}(\tau < \infty) \stackrel{(b)}{\ge} \frac{2}{\delta}\mathbb{P}(\tau < \infty)$$

where (a) uses that $\frac{|M_\tau|}{U_\tau} > \lambda_0$ when $\tau < \infty$, and (b) uses $U_\tau \ge U_{\tau_1} \ge \frac{2}{\lambda_0^2}\ln\frac{2}{\delta}$. Therefore, $\mathbb{P}(\tau < \infty) \le \delta$, as desired.

The process Ut is increasing in any case of interest, implying that |Mt|/Ut ≤ λ0 uniformly in t after some finite initial time. The setting of λ0 happens to fit with the rest of our main proof, but this choice of λ0 in Theorem 26 is arbitrary. The same proof method in fact defines a family of bounds parametrized by λ0; collectively, these express the SLLN for finite times.

12.3.2 Proof of Theorem 22

We proceed to prove Theorem 22, using the SLLN bound of Theorem 26 and its proof techniques.

Preliminaries

A little further notation is required for the rest of the proof. For any event $E \subseteq \Omega$ of nonzero measure, let $\mathbb{E}_E[\cdot]$ denote the expectation restricted to $E$, i.e. $\mathbb{E}_E[f] = \frac{1}{\mathbb{P}(E)}\int_E f(\omega)\,\mathbb{P}(d\omega)$ for a measurable function $f$ on $\Omega$. Similarly, dub the associated measure $\mathbb{P}_E$, where for any event $\Xi \subseteq \Omega$ we have $\mathbb{P}_E(\Xi) = \frac{\mathbb{P}(E\cap\Xi)}{\mathbb{P}(E)}$.

Consider the “good” event of Theorem 26, in which its uniform deviation bound holds w.p. $\ge 1-\delta$ for some $\delta$; call this event $A_\delta$. Formally,
$$A_\delta = \left\{\omega\in\Omega : \frac{|M_t|}{U_t} \le \lambda_0 \;\;\forall t \ge \min\left\{s : U_s \ge \frac{2}{\lambda_0^2}\ln\frac{2}{\delta}\right\}\right\} \tag{12.2}$$

Theorem 26 states that P(Aδ ) ≥ 1 − δ. It will be necessary to shift sample spaces from Aδ to Ω. The shift should be small in measure because P(Aδ ) ≥ 1 − δ; this is captured by the following simple observation.

Lemma 27. Define $A_\delta$ as in (12.2). For any nonnegative random variable $X$ on $\Omega$,
$$\mathbb{E}_{A_\delta}[X] \le \frac{1}{1-\delta}\mathbb{E}[X]$$

Proof. Since $X \ge 0$, using Theorem 26, $\mathbb{E}[X] = \mathbb{E}_{A_\delta}[X]\mathbb{P}(A_\delta) + \mathbb{E}_{A_\delta^c}[X]\mathbb{P}(A_\delta^c) \ge \mathbb{E}_{A_\delta}[X](1-\delta)$.

Define $X_t^\lambda$ as in Lemma 24. The idea of the proof is to choose $\lambda$ stochastically from a probability space $(\Omega_\lambda, \mathcal{F}_\lambda, \mathbb{P}_\lambda)$. The parameter $\lambda$ is chosen independently of $\xi_1, \xi_2, \ldots$, so that $X_t^\lambda$ is defined on the product space. Write $\mathbb{E}^\lambda[\cdot]$ to denote the expectation with respect to $(\Omega_\lambda, \mathcal{F}_\lambda, \mathbb{P}_\lambda)$. To be consistent with previous notation, we continue to write $\mathbb{E}[\cdot]$ to denote the expectation w.r.t. the original probability space $(\Omega, \mathcal{F}, \mathbb{P})$, which encodes the stochasticity of $M_t$. As mentioned earlier, we use subscripts for expectations conditioned on events in this space, e.g. $\mathbb{E}_{A_\delta}[X]$. As an example, $\mathbb{E}_\Omega[\cdot] = \mathbb{E}[\cdot]$.

Proof of Theorem 22

The main result can now be proved. The first step is to choose $\lambda$ stochastically in the supermartingale $X_t^\lambda$ and bound the effect of averaging over $\lambda$ (analogous to Lemma 25 in the proof of the bootstrap bound).

Lemma 28. Define $\tau_0$ as in Theorem 22, and $A_\delta$ as in (12.2) for any $\delta$. Then for any

stopping time $\tau \ge \tau_0$,
$$\mathbb{E}_{A_\delta}\left[\mathbb{E}^\lambda\left[X_\tau^\lambda\right]\right] \ge \mathbb{E}_{A_\delta}\left[\frac{2\exp\left(\frac{M_\tau^2}{2U_\tau}(1-k)\right)}{\ln^2\left(\frac{U_\tau}{(1-\sqrt{k})|M_\tau|}\right)}\right]$$

The proof relies on estimating an integral for any outcome in the “good” event $A_\delta$, and is deferred to Section C.2. Lemma 28 can be converted into the desired uniform bound using a particular choice of stopping time, analogously to how the bootstrap bound Theorem 26 is derived from Lemma 25. However, this time a shift in sample spaces is also needed to yield Theorem 22, since Lemma 28 uses $A_\delta$ instead of $\Omega$.

Proof of Theorem 22. Define the stopping time
$$\tau = \min\left\{t \ge \tau_0 : \Big(|M_t| > \lambda_0 U_t\Big) \,\vee\, \left(|M_t| \le \lambda_0 U_t \,\wedge\, |M_t| > \sqrt{\frac{2U_t}{1-k}\ln\left(\frac{2\ln^2\left(\frac{U_t}{(1-\sqrt{k})|M_t|}\right)}{\delta}\right)}\right)\right\} \tag{12.3}$$

It suffices to prove that $\mathbb{P}(\tau = \infty) \ge 1-\delta$. On the event $\{\tau < \infty\}\cap A_{\delta/2}$, we have
$$|M_\tau| > \sqrt{\frac{2U_\tau}{1-k}\ln\left(\frac{2\ln^2\left(\frac{U_\tau}{(1-\sqrt{k})|M_\tau|}\right)}{\delta}\right)} \iff \frac{2\exp\left(\frac{M_\tau^2}{2U_\tau}(1-k)\right)}{\ln^2\left(\frac{U_\tau}{(1-\sqrt{k})|M_\tau|}\right)} > \frac{4}{\delta}$$

Therefore, using Lemma 28 and the nonnegativity of $\frac{2\exp\left(\frac{M_t^2}{2U_t}(1-k)\right)}{\ln^2\left(\frac{U_t}{(1-\sqrt{k})|M_t|}\right)}$ on $A_{\delta/2}$,

$$2 \ge \frac{1}{1-\frac{\delta}{2}} = \frac{\mathbb{E}^\lambda\left[\mathbb{E}\left[X_0^\lambda\right]\right]}{1-\frac{\delta}{2}} \stackrel{(a)}{\ge} \frac{\mathbb{E}^\lambda\left[\mathbb{E}\left[X_\tau^\lambda\right]\right]}{1-\frac{\delta}{2}} \stackrel{(b)}{\ge} \mathbb{E}^\lambda\left[\mathbb{E}_{A_{\delta/2}}\left[X_\tau^\lambda\right]\right] \stackrel{(c)}{=} \mathbb{E}_{A_{\delta/2}}\left[\mathbb{E}^\lambda\left[X_\tau^\lambda\right]\right] \stackrel{(d)}{\ge} \mathbb{E}_{A_{\delta/2}}\left[\frac{2\exp\left(\frac{M_\tau^2}{2U_\tau}(1-k)\right)}{\ln^2\left(\frac{U_\tau}{(1-\sqrt{k})|M_\tau|}\right)}\right]$$
$$\ge \mathbb{E}_{A_{\delta/2}}\left[\frac{2\exp\left(\frac{M_\tau^2}{2U_\tau}(1-k)\right)}{\ln^2\left(\frac{U_\tau}{(1-\sqrt{k})|M_\tau|}\right)} \,\Bigg|\, \tau < \infty\right]\mathbb{P}_{A_{\delta/2}}(\tau < \infty) \stackrel{(e)}{>} \frac{4}{\delta}\,\mathbb{P}_{A_{\delta/2}}(\tau < \infty)$$
where (a) is by Optional Stopping (Theorem 23; note that $\tau$ can be unbounded), (b) is by Lemma 27, (c) is by Tonelli's Theorem, (d) is by Lemma 28, and (e) is by the definitions of $A_{\delta/2}$ in (12.2) and of $\tau$ in (12.3). After simplification, this gives

δ P (τ < ∞) ≤ δ/2 =⇒ P (τ = ∞) ≥ 1 − (12.4) Aδ/2 Aδ/2 2 Therefore,

$$\mathbb{P}(\tau = \infty) \ge \mathbb{P}(\{\tau = \infty\}\cap A_{\delta/2}) \stackrel{(a)}{=} \mathbb{P}_{A_{\delta/2}}(\tau = \infty)\,\mathbb{P}(A_{\delta/2}) \stackrel{(b)}{\ge} \left(1-\frac{\delta}{2}\right)\left(1-\frac{\delta}{2}\right) \ge 1-\delta$$

where (a) uses the definition of $\mathbb{P}_{A_{\delta/2}}(\cdot)$ and (b) uses (12.4) and Theorem 26. The result follows.

Chapter 13

Uniform PAC-Bayes Concentration of Martingale Mixtures

13.1 Introduction

We have seen that despite their mighty generality, martingales exhibit essentially the same well-understood concentration behavior as simple random walks. Even more powerful concentration results can be obtained by considering aggregates of many martingales, averaged according to some distribution. Though these too have long been studied asymptotically ([Tom75]), their non-asymptotic study was only initiated by a recent paper of Seldin et al. ([SLCB+12]), which proves the sharpest known results on concentration of martingale mixtures, uniformly over the mixing distribution in a “PAC-Bayes” sense which is essentially optimal for such bounds ([Ban06]). This is motivated by and originally intended for applications in learning theory, as further discussed in that paper.

In this chapter, we simplify, strengthen, and subsume results of [SLCB+12]. While that paper follows classical central-limit-theorem-type concentration results in focusing on an arbitrary fixed time, we instead leverage a recent method in Chapter 12 to achieve concentration that is uniform over finite times, extending the law of the iterated logarithm (LIL), with a rate at least as good as [SLCB+12] and often far superior. In short, our bounds on mixtures of martingales are uniform both over the mixing distribution in a PAC-Bayes sense, and over finite times, all simultaneously (and optimally). This has no precedent in the literature.

13.1.1 Preliminaries

To formalize this, consider a set of discrete-time stochastic processes $\{M_t(h)\}_{h\in\mathcal{H}}$, in a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P})$, where $\mathcal{H}$ is possibly


uncountable.¹ Define the corresponding difference sequences $\xi_t(h) = M_t(h) - M_{t-1}(h)$ (which are $\mathcal{F}_t$-measurable for any $h, t$) and conditional variance processes $V_t(h) = \sum_{i=1}^t \mathbb{E}\left[\xi_i^2(h) \mid \mathcal{F}_{i-1}\right]$. The mean of the processes at time $t$ w.r.t. any distribution $\rho$ over $\mathcal{H}$ (any $\rho \in \Delta(\mathcal{H})$, as we write) is denoted by $\mathbb{E}_\rho[M_t] := \mathbb{E}_{h\sim\rho}[M_t(h)]$, with $\mathbb{E}_\rho[V_t]$ being defined similarly. For brevity, we drop the index $h$ when it is clear from context.

Also recall the following standard definitions from martingale theory. For any $h \in \mathcal{H}$, a martingale $M_t$ (resp. supermartingale, submartingale) has $\mathbb{E}[\xi_t \mid \mathcal{F}_{t-1}] = 0$ (resp. $\le 0$, $\ge 0$) for all $t$. A stopping time $\tau$ is a function on $\Omega$ such that $\{\tau \le t\} \in \mathcal{F}_t$ for all $t$; notably, $\tau$ can be infinite with positive probability ([Dur10]). It is now possible to state our main result.

Theorem 29 (PAC-Bayes Martingale Bernstein LIL Concentration). Let $\{M_t(h)\}_{h\in\mathcal{H}}$ be a set of martingales and fix a distribution $\pi \in \Delta(\mathcal{H})$ and a $\delta < 1$. Suppose the difference sequence is uniformly bounded: $|M_t(h) - M_{t-1}(h)| \le e^2$ for all $t \ge 1$ and $h \in \mathcal{H}$ w.p. 1. Then with probability $\ge 1-\delta$, the following is true for all $\rho \in \Delta(\mathcal{H})$. Suppose

$$\tau_0(\rho) = \min\left\{s : 2(e-2)\mathbb{E}_\rho[V_s] \ge 2e^4\left(1+\sqrt{1/3}\right)^2\left(\ln\frac{4}{\delta} + KL(\rho\,\|\,\pi)\right)\right\}$$

Then for all $t \ge \tau_0(\rho)$ simultaneously, $\mathbb{E}_\rho[M_t] \le \frac{2(e-2)}{e^2\left(1+\sqrt{1/3}\right)}\mathbb{E}_\rho[V_t]$ and

$$\mathbb{E}_\rho[M_t] \le \sqrt{6(e-2)\mathbb{E}_\rho[V_t]\left(2\ln\ln\left(\frac{3(e-2)\mathbb{E}_\rho[V_t]}{\mathbb{E}_\rho[M_t]}\right) + \ln\frac{2}{\delta} + KL(\rho\,\|\,\pi)\right)}$$

(This bound is implicit in $\mathbb{E}_\rho[M_t]$, but notice that either $\mathbb{E}_\rho[M_t] \le 1$ or the iterated-logarithm term can simply be treated as $2\ln\ln\left(3(e-2)\mathbb{E}_\rho[V_t]\right)$, making the bound explicit.)

As we mentioned, Theorem 29 is uniform over $\rho$ and $t$, allowing us to track mixtures of martingales tightly as they evolve. The martingales indexed by $\mathcal{H}$ can be highly dependent on each other, as they share a probability space. For instance, $M_{t_0}(h_0)$ can depend in arbitrary fashion on $\{M_{t\le t_0}(h)\}_{h\ne h_0}$; it is only required to satisfy the martingale property for $h_0$, as further discussed in [SLCB+12]. So these inequalities have found use in analyzing sequentially dependent processes such as those in reinforcement learning and bandit algorithms, where the choice of prior $\pi$ can be tailored to the problem ([SASt+11]) and the posterior can be updated based on information learned up to that time.

The method of proof is essentially that used in Chapter 12. Our main observation in this chapter is that this proof technique is, in a technical sense, quite complementary

to the fundamental method used in all PAC-Bayes analysis ([Ban06]). This allows us to prove our results in a sharper and more streamlined way than previous work ([SLCB+12]).

¹ As in Chapter 12, we just consider discrete time for convenience; the core results and proofs here extend to continuous time as well, along with the other generalizations discussed in that chapter.

13.1.2 Discussion

Let us elaborate on these claims. Theorem 29 can be compared directly to the following PAC-Bayes Bernstein bound from [SLCB+12] that holds for a fixed time:

Theorem 30 (Restatement² of Thm. 8 from [SLCB+12]). Fix any $t$. For $\rho$ such that $KL(\rho\,\|\,\pi)$ is sufficiently low compared to $\mathbb{E}_\rho[V_t]$,

$$\mathbb{E}_\rho[M_t] \le (1+e)\sqrt{(e-2)\mathbb{E}_\rho[V_t]\left(2\ln\ln\left(\frac{(e-2)t}{\ln(2/\delta)}\right) + \ln\frac{2}{\delta} + KL(\rho\,\|\,\pi)\right)}$$

This bound is inferior to Theorem 29 in two significant ways:³ it holds for a fixed time rather than uniformly over finite times, and has an iterated-logarithm term of $\ln\ln t$ rather than $\ln\ln V_t$. The latter is a very significant difference when $V_t \ll t$, which is precisely when Bernstein inequalities would be preferred to more basic inequalities like Chernoff bounds.

Put differently, our non-asymptotic result, like those of Chapter 12, adapts correctly to the scale of the problem. Indeed, Theorem 29 is optimal by the (asymptotic) martingale LIL, e.g. the seminal result of Stout ([Sto70]); this is true non-asymptotically too, by the main anti-concentration bound of Ch. 12. All these optimality results are for a single martingale, but suffice for the PAC-Bayes case as well; and the additive $KL(\rho\,\|\,\pi)$ cost of uniformity over $\rho$ is unimprovable in general also, by standard PAC-Bayes arguments.

Proof Overview

Our method follows that of Ch. 12, departing from the more traditionally learning-theoretic techniques used in [SLCB+12]. We embark on the proof by introducing a standard exponential supermartingale construction that holds for any of the martingales $\{M_t(h)\}_{h\in\mathcal{H}}$ (this is identical to Lemma 59).

Lemma 31. Suppose $|\xi_t| \le e^2$ a.s. for all $t$. Then for any $h \in \mathcal{H}$, the process $X_t^\lambda(h) := \exp\left(\lambda M_t(h) - \lambda^2(e-2)V_t(h)\right)$ is a supermartingale in $t$ for any $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$.

² The actual statement of the theorem in [SLCB+12], though not significantly different, is more complicated because of a few inconvenient artifacts of that paper's more complicated analysis, none of which arise in our analysis.
³ As discussed in previous chapters (cf. [Bal15b, Bal15a]), the result can be extended to less general settings as for Hoeffding bounds, though our comparison with the results of [SLCB+12] remains the same.

The classical martingale Bernstein inequality for a given $h$ and fixed time $t$ can be proved by using Markov's inequality with $\mathbb{E}\left[X_t^{\lambda^*}\right]$, where $\lambda^* \propto \frac{|M_t|}{V_t}$ is tuned for the tightest bound, and can be thought of as setting the relative scale of variation being measured. The proof technique of this chapter and its advantages over previous work are best explained by examining how to set the scale parameter $\lambda$.

Choosing the Scale Parameter

On a high level, the main idea of Chapter 12 is to average over a random choice of the scale parameter $\lambda$ in the supermartingale $X_t^\lambda$. This allows manipulation of a stopped version of $X_t^\lambda$, i.e. $X_\tau^\lambda$ for a particular stopping time $\tau$. So the averaging technique in Chapter 12 can be thought of as using “many values of $\lambda$ at once,” which is necessary when dealing with the stopped process because $\tau$ is random, and so is $\frac{|M_\tau|}{V_\tau}$.

All existing PAC-Bayes analyses achieve uniformity in $\rho$ through the Donsker-Varadhan variational characterization of the KL divergence:

Lemma 32 (Donsker-Varadhan Lemma ([DV76])). Suppose $\rho$ and $\pi$ are probability distributions over $\mathcal{H}$, and let $f(\cdot): \mathcal{H} \mapsto \mathbb{R}$ be a measurable function. Then

$$\mathbb{E}_\rho[f(h)] \le KL(\rho\,\|\,\pi) + \ln\mathbb{E}_\pi\left[e^{f(h)}\right]$$
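The lemma is straightforward to verify numerically when $\mathcal{H}$ is finite; the sketch below (ours, with an arbitrary discrete prior, posterior, and function) checks the inequality directly.

```python
# Sketch (ours): numerical check of the Donsker-Varadhan inequality
#   E_rho[f] <= KL(rho || pi) + ln E_pi[exp(f)]
# for discrete distributions rho, pi on a finite H and an arbitrary f.
import numpy as np

rng = np.random.default_rng(0)
K = 10
pi = np.full(K, 1.0 / K)                  # prior over H
rho = rng.dirichlet(np.ones(K))           # arbitrary posterior over H
f = rng.normal(size=K)                    # arbitrary function on H

lhs = np.dot(rho, f)
kl = np.sum(rho * np.log(rho / pi))
rhs = kl + np.log(np.dot(pi, np.exp(f)))
print(lhs <= rhs + 1e-12)                 # True
```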

This introduces a $KL(\rho\,\|\,\pi)$ term into the bounding of $X_t^\lambda$. However, the optimum $\lambda^*$ is then dependent on the unknown $\rho$. The solution adopted by existing PAC-Bayes martingale bounds ([SLCB+12] and variants) is again to use “many values of $\lambda$ at once.” In prior work, this is done explicitly by taking a union bound over a grid of carefully chosen $\lambda$s. Our main technical contribution is to recognize the similarity between these two problems, and to use the (tight) stochastic choice of $\lambda$ in Chapter 12 as a solution to both problems at once, achieving the optimal bound of Theorem 29.

13.2 Overview of Proof

We now outline the complete proof of Theorem 29, following the presentation of Chapter 12 closely. Define $U_t := 2(e-2)V_t$. As in Chapter 12, our proof invokes the Optional Stopping Theorem, in particular a version for nonnegative supermartingales that neatly exploits their favorable convergence properties:

Theorem 33 (Optional Stopping for Nonnegative Supermartingales ([Dur10], Theorem 5.7.6)). If $M_t$ is a nonnegative supermartingale and $\tau$ a (possibly infinite) stopping time, $\mathbb{E}[M_\tau] \le \mathbb{E}[M_0]$.

We also use the exponential supermartingale construction of Lemma 31, which we assume to hold for $M_t(h)$ $\forall h \in \mathcal{H}$, since they are all martingales whose concentration we require. Our goal is to control $\mathbb{E}_\rho[M_t]$ in terms of $\mathbb{E}_\rho[U_t]$, so it is tempting to try to show that $e^{\lambda\mathbb{E}_\rho[M_t] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_t]}$ is an exponential supermartingale. However, this is not generally true; and even if it were, it would only control $\mathbb{E}\left[e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]}\right]$ for a fixed $\rho$, not in a PAC-Bayes sense.

We instead achieve uniform control over $\rho$ by using the Donsker-Varadhan lemma to mediate the Optional Stopping Theorem every time it is used in the proof of the single-martingale finite LIL in Chapter 12. This is fully captured in the following key result, a powerful extension of a standard moment-generating function bound that is uniform in $\rho$ and has enough freedom (an arbitrary stopping time $\tau$) to be converted into a time-uniform bound.

Lemma 34. Choose any probability distribution $\pi$ over $\mathcal{H}$. Then for any stopping time $\tau$ and $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$, simultaneously for all distributions $\rho$ over $\mathcal{H}$,

$$\mathbb{E}\left[e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]}\right] \le e^{KL(\rho\,\|\,\pi)}$$

Proof. Using Lemma 32 with the function $f(h) = \lambda M_\tau(h) - \frac{\lambda^2}{2}U_\tau(h)$, and exponentiating both sides, we have for all posterior distributions $\rho \in \Delta(\mathcal{H})$ that

$$e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]} \le e^{KL(\rho\,\|\,\pi)}\,\mathbb{E}_\pi\left[e^{\lambda M_\tau - \frac{\lambda^2}{2}U_\tau}\right] \tag{13.1}$$

Therefore,
$$\mathbb{E}\left[e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]}\right] \stackrel{(a)}{\le} e^{KL(\rho\,\|\,\pi)}\,\mathbb{E}_\pi\left[\mathbb{E}\left[e^{\lambda M_\tau - \frac{\lambda^2}{2}U_\tau}\right]\right] \stackrel{(b)}{\le} e^{KL(\rho\,\|\,\pi)}$$
where (a) is from (13.1) and Tonelli's Theorem, and (b) is by Lemma 31 and Optional Stopping (Thm. 33).

Just as a bound on the moment-generating function of a random variable is the key to proving tight Hoeffding and Bernstein concentration of that variable, Lemma 34 is, exactly analogously, the key tool used to prove Theorem 29. The remainder of the proof follows Chapter 12 extremely closely but uses the more general (13.1) instead of the supermartingale construction used there.

13.3 Proof of Theorem 29

For the rest of this section, define $U_t := 2(e-2)V_t$, $k := \frac{1}{3}$, and $\lambda_0 := \frac{1}{e^2\left(1+\sqrt{k}\right)}$.

13.3.1 A PAC-Bayes Uniform Law of Large Numbers

First, we define the stopping time

$$\tau_0(\rho) := \min\left\{t : \mathbb{E}_\rho[U_t] \ge \frac{2}{\lambda_0^2}\left(\ln\frac{2}{\delta} + KL(\rho\,\|\,\pi)\right)\right\}$$
and the following “good” event:
$$B_\delta = \left\{\omega\in\Omega : \forall\rho\in\Delta(\mathcal{H}),\; \frac{\mathbb{E}_\rho[M_t]}{\mathbb{E}_\rho[U_t]} \le \lambda_0 \;\;\forall t \ge \tau_0(\rho)\right\} \tag{13.2}$$

Our first result introduces the reader to our main proof technique; it is a generalization of the law of large numbers (LLN) to our PAC-Bayes martingale setting.

Theorem 35. Fix any $\delta > 0$. With probability $\ge 1-\delta$, the following is true for all $\rho$ over $\mathcal{H}$: for all $t \ge \tau_0(\rho)$,
$$\mathbb{E}_\rho[M_t] \le \lambda_0\,\mathbb{E}_\rho[U_t]$$

To prove this, we first manipulate Lemma 34 so that it is in terms of $\left|\mathbb{E}_\rho[M_\tau]\right|$.

Lemma 36. Choose any prior $\pi \in \Delta(\mathcal{H})$. For any stopping time $\tau$ and all distributions $\rho$ over $\mathcal{H}$,
$$\mathbb{E}\left[\exp\left(\lambda_0\left|\mathbb{E}_\rho[M_\tau]\right| - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_\tau]\right)\right] \le 2e^{KL(\rho\,\|\,\pi)}$$

Proof. Lemma 34 describes the behavior of the process $\chi_t^\lambda = e^{\lambda\mathbb{E}_\rho[M_t] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_t]}$. Define $Y_t$ to be the mean of $\chi_t^\lambda$ with $\lambda$ being set stochastically: $\lambda \in \{-\lambda_0, \lambda_0\}$ with probability $\frac{1}{2}$ each. After marginalizing over $\lambda$, the resulting process is

$$Y_t = \frac{1}{2}\exp\left(\lambda_0\mathbb{E}_\rho[M_t] - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_t]\right) + \frac{1}{2}\exp\left(-\lambda_0\mathbb{E}_\rho[M_t] - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_t]\right) \tag{13.3}$$
$$\ge \frac{1}{2}\exp\left(\lambda_0\left|\mathbb{E}_\rho[M_t]\right| - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_t]\right) \tag{13.4}$$

Now take τ to be any stopping time as in the lemma statement. Then

$$\mathbb{E}\left[\exp\left(\lambda_0\mathbb{E}_\rho[M_\tau] - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_\tau]\right)\right] \le e^{KL(\rho\,\|\,\pi)}$$
by Lemma 34. Similarly, $\mathbb{E}\left[\chi_\tau^{\lambda=-\lambda_0}\right] \le e^{KL(\rho\,\|\,\pi)}$. So $\mathbb{E}[Y_\tau] = \frac{1}{2}\left(\mathbb{E}\left[\chi_\tau^{\lambda=-\lambda_0}\right] + \mathbb{E}\left[\chi_\tau^{\lambda=\lambda_0}\right]\right) \le e^{KL(\rho\,\|\,\pi)}$. Combining this with (13.4) gives the result.

A particular setting of $\tau$ extracts the desired uniform LLN bound from Lemma 36.

Proof of Theorem 35. Define the stopping time
$$\tau = \min\left\{t : \exists\rho\in\Delta(\mathcal{H}) \text{ s.t. } t \ge \tau_0(\rho) \text{ and } \frac{\left|\mathbb{E}_\rho[M_t]\right|}{\mathbb{E}_\rho[U_t]} > \lambda_0\right\}$$

Then it suffices to prove that $\mathbb{P}(\tau < \infty) \le \delta$. On the event $\{\tau < \infty\}$, we have for some $\rho$ that $\frac{\left|\mathbb{E}_\rho[M_\tau]\right|}{\mathbb{E}_\rho[U_\tau]} > \lambda_0$ by definition of $\tau$. Therefore, using Lemma 36, we can conclude that for this $\rho$,

$$2e^{KL(\rho\,\|\,\pi)} \ge \mathbb{E}\left[\exp\left(\lambda_0\left|\mathbb{E}_\rho[M_\tau]\right| - \frac{\lambda_0^2}{2}\mathbb{E}_\rho[U_\tau]\right) \,\Big|\, \tau < \infty\right]\mathbb{P}(\tau < \infty) \stackrel{(b)}{\ge} \frac{2}{\delta}\,e^{KL(\rho\,\|\,\pi)}\,\mathbb{P}(\tau < \infty)$$

where (b) uses $\mathbb{E}_\rho[U_\tau] \ge \mathbb{E}_\rho[U_{\tau_0}] \ge \frac{2}{\lambda_0^2}\ln\left(\frac{2}{\delta}e^{KL(\rho\,\|\,\pi)}\right)$. Therefore, $\mathbb{P}(\tau < \infty) \le \delta$, as desired.

13.3.2 Proof of Theorem 29

For any event $E \subseteq \Omega$ of nonzero measure, let $\mathbb{E}_E[\cdot]$ denote the expectation restricted to $E$, i.e. $\mathbb{E}_E[f] = \frac{1}{\mathbb{P}(E)}\int_E f(\omega)\,\mathbb{P}(d\omega)$ for a measurable function $f$ on $\Omega$. Similarly, dub the associated measure $\mathbb{P}_E$, where for any event $\Xi \subseteq \Omega$ we have $\mathbb{P}_E(\Xi) = \frac{\mathbb{P}(E\cap\Xi)}{\mathbb{P}(E)}$.

Theorem 35 shows that $\mathbb{P}(B_\delta) \ge 1-\delta$. It is consequently easy to observe that the corresponding restricted outcome space can still be used to control expectations.

Proposition 37. For any nonnegative r.v. $X$, we have $\mathbb{E}_{B_\delta}[X] \le \frac{1}{1-\delta}\mathbb{E}[X]$.

Proof. Using Thm. 35, $\mathbb{E}[X] = \mathbb{E}_{B_\delta}[X]\mathbb{P}(B_\delta) + \mathbb{E}_{B_\delta^c}[X]\mathbb{P}(B_\delta^c) \ge \mathbb{E}_{B_\delta}[X](1-\delta)$.

Just as in Chapter 12, the idea of the main proof is to choose $\lambda$ stochastically from a probability space $(\Omega_\lambda, \mathcal{F}_\lambda, \mathbb{P}_\lambda)$ such that $\mathbb{P}_\lambda(d\lambda) = \frac{d\lambda}{|\lambda|\ln^2\left(\frac{1}{|\lambda|}\right)}$ on $\lambda \in [-e^{-2}, e^{-2}]\setminus\{0\}$. The parameter $\lambda$ is chosen independently of $\xi_1, \xi_2, \ldots$, so that $X_t^\lambda$ is defined on the product space. Write $\mathbb{E}^\lambda[\cdot]$ to denote the expectation with respect to $(\Omega_\lambda, \mathcal{F}_\lambda, \mathbb{P}_\lambda)$. To be consistent with previous notation, we continue to write $\mathbb{E}[\cdot]$ to denote the expectation w.r.t. the original probability space $(\Omega, \mathcal{F}, \mathbb{P})$. As mentioned earlier, we use subscripts for expectations conditioned on events in this space, e.g. $\mathbb{E}_{B_\delta}[X]$. (As an example, $\mathbb{E}_\Omega[\cdot] = \mathbb{E}[\cdot]$.)

Before going on to prove the main theorem, we need one more result that controls the effect of averaging over $\lambda$ as above.

Lemma 38. For all $\rho \in \Delta(\mathcal{H})$ and any $\delta$, the following is true: for any stopping time $\tau \ge \tau_0(\rho)$,
$$\mathbb{E}_{B_\delta}\left[\mathbb{E}^\lambda\left[e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]}\right]\right] \ge \mathbb{E}_{B_\delta}\left[\frac{2\exp\left(\frac{\mathbb{E}_\rho[M_\tau]^2}{2\mathbb{E}_\rho[U_\tau]}(1-k)\right)}{\ln^2\left(\frac{\mathbb{E}_\rho[U_\tau]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_\tau]\right|}\right)}\right]$$

Lemma 38 is precisely analogous to Lemma 28 and proved using exactly the same calculations, so its proof is omitted here. Now we can prove Theorem 29.

Proof of Theorem 29. The proof follows precisely the same method as that of Chapter 12, but with a more nuanced setting of the stopping time $\tau$. To define it, we first for convenience define the function
$$\zeta_t(\rho) = \sqrt{\frac{2\mathbb{E}_\rho[U_t]}{1-k}\ln\left(\frac{2\ln^2\left(\frac{\mathbb{E}_\rho[U_t]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_t]\right|}\right)e^{KL(\rho\,\|\,\pi)}}{\delta}\right)}$$

Now we define the stopping time
$$\tau = \min\Big\{t : \exists\rho\in\Delta(\mathcal{H}) \text{ s.t. } t \ge \tau_0(\rho) \text{ and } \Big(\left|\mathbb{E}_\rho[M_t]\right| > \lambda_0\mathbb{E}_\rho[U_t] \;\text{ or }\; \big(\left|\mathbb{E}_\rho[M_t]\right| \le \lambda_0\mathbb{E}_\rho[U_t] \text{ and } \left|\mathbb{E}_\rho[M_t]\right| > \zeta_t(\rho)\big)\Big)\Big\}$$

The rest of the proof shows that $\mathbb{P}(\tau = \infty) \ge 1-\delta$, and involves nearly identical calculations to the main proof of Chapter 12 (with $\mathbb{E}_\rho[M_t], \mathbb{E}_\rho[U_t], B_\delta$ replacing what that chapter writes as $M_t, U_t, A_\delta$). It suffices to prove that $\mathbb{P}(\tau = \infty) \ge 1-\delta$. On the event $\{\tau < \infty\}\cap B_{\delta/2}$, by definition there exists a $\rho$ s.t.

$$\left|\mathbb{E}_\rho[M_\tau]\right| > \zeta_\tau(\rho) = \sqrt{\frac{2\mathbb{E}_\rho[U_\tau]}{1-k}\ln\left(\frac{2\ln^2\left(\frac{\mathbb{E}_\rho[U_\tau]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_\tau]\right|}\right)e^{KL(\rho\,\|\,\pi)}}{\delta}\right)}$$

I.e. a $\rho$ such that
$$\frac{2\exp\left(\frac{\mathbb{E}_\rho[M_\tau]^2}{2\mathbb{E}_\rho[U_\tau]}(1-k)\right)}{\ln^2\left(\frac{\mathbb{E}_\rho[U_\tau]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_\tau]\right|}\right)} > \frac{4}{\delta}e^{KL(\rho\,\|\,\pi)} \tag{13.5}$$

Consider this $\rho$. Using the nonnegativity of $\frac{2\exp\left(\frac{\mathbb{E}_\rho[M_t]^2}{2\mathbb{E}_\rho[U_t]}(1-k)\right)}{\ln^2\left(\frac{\mathbb{E}_\rho[U_t]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_t]\right|}\right)}$ on $B_{\delta/2}$ and letting $Z_\tau^\lambda := e^{\lambda\mathbb{E}_\rho[M_\tau] - \frac{\lambda^2}{2}\mathbb{E}_\rho[U_\tau]}$,
$$2e^{KL(\rho\,\|\,\pi)} \ge \frac{e^{KL(\rho\,\|\,\pi)}}{1-\frac{\delta}{2}} \stackrel{(a)}{\ge} \frac{\mathbb{E}^\lambda\left[\mathbb{E}\left[Z_\tau^\lambda\right]\right]}{1-\frac{\delta}{2}} \stackrel{(b)}{\ge} \mathbb{E}^\lambda\left[\mathbb{E}_{B_{\delta/2}}\left[Z_\tau^\lambda\right]\right] \stackrel{(c)}{=} \mathbb{E}_{B_{\delta/2}}\left[\mathbb{E}^\lambda\left[Z_\tau^\lambda\right]\right] \stackrel{(d)}{\ge} \mathbb{E}_{B_{\delta/2}}\left[\frac{2\exp\left(\frac{\mathbb{E}_\rho[M_\tau]^2}{2\mathbb{E}_\rho[U_\tau]}(1-k)\right)}{\ln^2\left(\frac{\mathbb{E}_\rho[U_\tau]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_\tau]\right|}\right)}\right]$$
$$\ge \mathbb{E}_{B_{\delta/2}}\left[\frac{2\exp\left(\frac{\mathbb{E}_\rho[M_\tau]^2}{2\mathbb{E}_\rho[U_\tau]}(1-k)\right)}{\ln^2\left(\frac{\mathbb{E}_\rho[U_\tau]}{(1-\sqrt{k})\left|\mathbb{E}_\rho[M_\tau]\right|}\right)} \,\Bigg|\, \tau < \infty\right]\mathbb{P}_{B_{\delta/2}}(\tau < \infty) \stackrel{(e)}{>} \frac{4}{\delta}e^{KL(\rho\,\|\,\pi)}\,\mathbb{P}_{B_{\delta/2}}(\tau < \infty)$$
where (a) is by Lemma 34, (b) is by Prop. 37, (c) is by Tonelli's Theorem, (d) is by Lemma 38, and (e) is by (13.5). After simplification, this gives

$$\mathbb{P}_{B_{\delta/2}}(\tau < \infty) \le \frac{\delta}{2} \;\Longrightarrow\; \mathbb{P}_{B_{\delta/2}}(\tau = \infty) \ge 1 - \frac{\delta}{2} \tag{13.6}$$

and using Theorem 35 – that $\mathbb{P}(B_{\delta/2}) \ge 1 - \frac{\delta}{2}$ – concludes the proof.

Chapter 14

Sequential Nonparametric Testing with the Law of the Iterated Logarithm

This chapter concerns an application of the finite-time LIL results to nonparametric statistical hypothesis testing.

14.1 Introduction

Nonparametric statistical decision theory poses the problem of making a decision between a null ($H_0$) and alternate ($H_1$) hypothesis over a dataset with the aim of controlling both false positives and false negatives (in statistics terms, maximizing power while controlling type I error), all without making assumptions about the distribution of the data being analyzed. Such hypothesis testing is based on a “stochastic proof by contradiction” – the null hypothesis is thought of by default to be true, and is rejected only if the observed data are statistically very unlikely under the null. There is increasing interest in solving such problems in a “big data” regime, in which the sample size $N$ can be huge. We present a sequential testing framework for these problems that is particularly suitable for two related scenarios prevalent in many applications:

1. The dataset is extremely large and high-dimensional, so even a single pass through it is prohibitive.

2. The data is arriving as a stream, and decisions must be made with minimal storage.

Sequential tests have long been considered strong in such settings. Such a test accesses the data in an online/streaming fashion, assessing after every new datapoint whether it then has enough evidence to reject the null hypothesis. However, prior work tends to be univariate or parametric or asymptotic, while we are the first to provide non-asymptotic guarantees on multivariate nonparametric problems.


To elaborate on our motivations, suppose we have a gigantic amount of data from each of two unknown distributions, enough to detect even a minute difference in their means $\mu_1 - \mu_2$ if it exists. Further suppose that, unknown to us, deciding whether the means are equal is actually statistically easy ($|\mu_1 - \mu_2|$ is large), meaning that one can conclude $\mu_1 \ne \mu_2$ with high confidence by just considering a tiny fraction of the dataset. Can we take advantage of this, despite our ignorance of it?

A naive solution would be to discard most of the data and run a batch (offline) test on a small subset. However, we do not know how hard the problem is, and hence do not know how large a subset will suffice — sampling too little data might lead to incorrectly not rejecting the null, and sampling too much would unnecessarily waste computational resources. If we somehow knew $\mu_1 - \mu_2$, we would want to choose the fewest number of samples (say $n^*$) to reject the null while controlling type I error at some target level.

14.1.1 Overview of Our Approach

Our sequential test solves the problem by automatically stopping after seeing about $n^*$ samples, while still controlling type I and II errors almost as well as the equivalent linear-time batch test. Without knowing the true problem difficulty, we are able to detect it with virtually no computational or statistical penalty. We devise and formally analyze a sequential algorithm for a variety of problems, starting with a basic test of the bias of a coin, and developing this to nonparametric two-sample mean testing, with further extensions to general nonparametric two-sample and independence testing. Our proposed procedure only keeps track of a single scalar test statistic, which we construct to be a zero-mean random walk under the null hypothesis. It is used to test the null hypothesis each time a new data point is processed.

A major statistical issue is dealing with the apparent multiple hypothesis testing problem – if our algorithm observes its first rejection of the null at time $t$, it might raise suspicions of being a false rejection, because $t-1$ hypothesis tests were already conducted and the $t$-th may have been rejected purely by chance. Applying some kind of multiple testing correction, like the Bonferroni or Benjamini-Hochberg procedure ([BH95]), is exceedingly conservative and produces very suboptimal results over a large number of tests. However, since the random walk moves only a relatively small amount every iteration, the tests are far from independent. Formalizing this intuition requires adapting the law of the iterated logarithm (LIL), with which we control for type I error (when $H_0$ is true). The LIL can be described as follows. Imagine tossing a fair coin, assigning $+1$ to heads and $-1$ to tails, and keeping track of the sum $S_t$ of $t$ coin flips. The LIL asserts that asymptotically, $S_t$ always remains bounded between $\pm\sqrt{2t\ln\ln t}$ (and this “envelope” is tight).

When $H_1$ is true, we prove that the sequential algorithm does not need the whole dataset as a batch algorithm would, but automatically stops after processing “just enough” data points to detect $H_1$, depending on the unknown difficulty of the problem being solved. The near-optimal nature of this adaptive type II error control (when $H_1$ is true) is again due to the LIL. As alluded to earlier, all of our test statistics can be thought of as random walks, which behave like $S_t$ under $H_0$. The LIL then characterizes how such a random walk behaves under $H_0$ – our algorithm will keep observing new data since the random walk values will simply bounce around within the LIL envelope. Under $H_1$, the random walk is designed to have nonzero mean, and hence will eventually stray outside the LIL envelope, at which point the process stops and rejects the null hypothesis. For practically applying this argument to finite samples and reasoning about type II error and stopping times, we cannot use the classical asymptotic form of the LIL typically stated in textbooks such as [Fel50], instead adapting the finite-time LIL of Chapter 12 ([Bal15b]). As we will see, the technical contribution is necessary to investigate the stopping time, and control type I and II errors non-asymptotically and uniformly over all $t$.

In summary, our sequential testing framework has the following properties:

(A) Under H0, it controls type I error, using a finite-time LIL computable in terms of empirical variance.

(B) Under H1, and with type II error controlled at a target level, it automatically stops after seeing the same number of points as the corresponding computationally-constrained oracle batch algorithm.

(C) Each update takes O(d) time and constant memory.

In later sections, we develop formal versions of these statements. The statistical observations, particularly the one about the stopping time, follow from the finite-time LIL through simple concentration of measure arguments that extend to very general sequential testing settings, but have seemingly remained unobserved in the literature for decades because the finite-time LIL necessary to make them was unavailable.
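To make the structure of these claims concrete before the detailed development, the following Python sketch shows the generic loop that properties (A)–(C) describe; the names `stream` and `threshold` are placeholders for the increments and rejection boundary defined in later sections, not part of the formal algorithm.

```python
from itertools import islice

def sequential_test(stream, threshold, max_n):
    """Generic skeleton (a sketch, not the formal procedure): maintain a scalar
    random walk T_n with mean zero under H0, and stop the first time it exits
    a growing, LIL-style envelope q_n = threshold(n)."""
    T, n = 0.0, 0
    for n, h in enumerate(islice(stream, max_n), start=1):
        T += h                      # constant-time update of the single scalar statistic
        if T > threshold(n):        # compare the walk to the envelope
            return "reject H0", n   # early stop: the alternative was detected after n points
    return "fail to reject H0", n
```

Sections 14.2 and 14.3 instantiate the increments and the threshold for the biased coin and for two-sample mean testing, respectively.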

14.1.2 Outline of the Chapter

We begin by describing a sequential test for the bias of a coin in Section 14.2. We then provide a sequential test for nonparametric two-sample mean testing in Section 14.3. We run extensive simulations in Section 14.4 to bear out predictions of our theory, followed by a comparison to the extensive existing literature on the subject in Section 14.5. We also include extensions to general nonparametric two-sample and independence testing problems, in Section 14.6. Some less instructive proofs, and more detailed notes about the algorithm, are deferred to the appendices (Sec. D).

14.2 Detecting the Bias of a Coin

This section will illustrate how a simple sequential test can perform statistically as well as the best batch test in hindsight, while automatically stopping essentially as soon as possible.

Batch test:
1: Fix N and compute pN
2: if SN > pN then
3:   Reject H0
4: else
5:   Fail to reject H0
6: end if

Sequential test:
1: Fix N
2: for n = 1 to N do
3:   Compute qn
4:   if Sn > qn then
5:     Reject H0 and return
6:   end if
7: end for
8: Fail to reject H0

Figure 14.1. Batch (left) and sequential (right) tests.

We will show that such early stopping can be viewed as quite a general consequence of concentration of measure. Just for this section, let K represent a constant that may take different values on each appearance, but is always absolute.

Consider observing i.i.d. binary flips A1, A2, ··· ∈ {−1,+1} of a coin, which may be fair or biased towards +1, with P(Ai = +1) = ρ. We want to test for fairness, detecting unfairness as soon as possible. Formulated as a hypothesis test, we wish to test, for δ ∈ (0, 1/2]:

H0 : ρ = 1/2    vs.    H1(δ) : ρ = 1/2 + δ

For any sample size n, the natural test statistic for this problem is Sn = ∑_{i=1}^n Ai. Sn is a (scaled) simple mean-zero random walk under H0. A standard hypothesis testing approach to our problem is a basic batch test involving SN, which tests for deviations from the null for a fixed sample size N (Fig. 14.1, left). A basic Hoeffding bound shows that

SN ≤ √((N/2) ln(1/α)) =: pN

with probability ≥ 1 − α under the null, so type I error is controlled at level α:

PH0(reject H0) = PH0(SN > pN) ≤ exp(−2pN²/N) = α.
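As a concrete illustration (a minimal sketch, not code from the dissertation), the batch test of Fig. 14.1 (left) with the Hoeffding threshold pN can be written as:

```python
import numpy as np

def batch_coin_test(flips, alpha):
    """Batch test of Fig. 14.1 (left): reject H0 (fair coin) when S_N > p_N,
    with p_N = sqrt((N/2) ln(1/alpha)) from the Hoeffding bound above."""
    flips = np.asarray(flips)                       # entries in {-1, +1}
    N = len(flips)
    S_N = flips.sum()
    p_N = np.sqrt(0.5 * N * np.log(1.0 / alpha))
    return S_N > p_N

# Example: 10,000 flips of a coin biased towards +1 (rho = 0.52, i.e. delta = 0.02).
rng = np.random.default_rng(0)
flips = rng.choice([-1, 1], size=10_000, p=[0.48, 0.52])
print(batch_coin_test(flips, alpha=0.05))
```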

14.2.1 A Sequential Test

The main test we propose will be a sequential test as in Fig. 14.1. It sees examples as they arrive one at a time, up to a large time N, the maximum sample size we can afford. The sequential test is defined with a sequence of positive thresholds {qn}n∈[N]. We show how to set qn to justify statements (A) and (B) in Section 14.1.1.

Type I Error. Just as the batch threshold pN is determined by controlling the type I error with a concentration inequality, the sequential test also chooses q1,...,qN to control the type I error at α:

PH0 (reject H0) = PH0 (∃n ≤ N : Sn > qn) ≤ α (14.1)

This inequality concerns the uniform concentration of the tails of Sn over time, but what {qn}n∈[N] satisfies it? Asymptotically, the answer is governed by the LIL, reproduced here:

Theorem 39 (Law of the iterated logarithm ([Khi24])). With probability 1,

limsup_{n→∞} Sn / √(n lnlnn) = √2

The LIL says that qn should have a √(n lnlnn) asymptotic dependence on n, but does not specify its α dependence. Our sequential testing insights rely on the stronger non-asymptotic LIL of Ch. 12: with probability at least 1 − α, we have |Sn| ≤ √(Kn ln(lnn/α)) =: qn simultaneously for all n ≥ K ln(4/α) =: n0. This choice of qn satisfies (14.1) for n0 ≤ n ≤ N, and specifies the sequential test as in Fig. 14.1. (Choosing qn this way is unimprovable in all parameters up to absolute constants, by the anti-concentration bound of Chapter 12.)

Type II Error. For practical purposes, √(lnlnn) ≤ √(lnlnN) can be treated as a small constant (even when N = 10^20, √(lnlnN) < 2). Hence, qN ≈ pN (more discussion in the appendices), and the power is:

PH1(δ) (∃n ≤ N : Sn > qn) ≥ PH1(δ) (SN > qN) (14.2)

≈ PH1(δ)(SN > pN) (14.3)

So the sequential test is essentially as powerful as a batch test with N samples (and similarly the nth round of the sequential test is like an n-sample batch test). The iterated-logarithm correction is effectively in the argument of a Gaussian CDF, so our approximation loses a negligible amount.
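A sketch of the corresponding sequential test follows; the absolute constant K in the envelope qn = √(K n ln(lnn/α)) is only specified up to constants by the analysis, so the value used here (and the burn-in time n0) are illustrative placeholders rather than values the theory prescribes.

```python
import numpy as np

def sequential_coin_test(flips, alpha, K=2.0, n0=10):
    """Sequential test of Fig. 14.1 (right) with a finite-time-LIL envelope
    q_n = sqrt(K * n * ln(ln(n)/alpha)); K and n0 stand in for the absolute
    constants of the text."""
    S = 0.0
    for n, a in enumerate(flips, start=1):            # a in {-1, +1}
        S += a
        if n < n0:
            continue                                   # envelope is only claimed for n >= n0
        q_n = np.sqrt(K * n * np.log(np.log(n) / alpha))
        if S > q_n:
            return "reject H0", n                      # stop as soon as the walk escapes the envelope
    return "fail to reject H0", len(flips)

rng = np.random.default_rng(1)
flips = rng.choice([-1, 1], size=100_000, p=[0.45, 0.55])   # delta = 0.05
print(sequential_coin_test(flips, alpha=0.05))
```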

Early Stopping. The standard motivation for using sequential tests is that they often require few samples to reject statistically distant alternatives. To investigate this with our working example, suppose N is large and the coin is actually biased, with a fixed unknown δ > 0. Then, if we somehow had full knowledge of δ when using the batch test and wanted to ensure a desired type II error β < 1, we would use just enough samples n*_β(δ) (written as n* in context):

n*_β(δ) = min{n : PH1(δ)(Sn ≤ pn) ≤ β} (14.4)

so that for all n ≥ n*_β(δ), since pn = o(n),

β ≥ PH1(δ) (Sn ≤ pn) = PH1(δ) (Sn − nδ ≤ pn − nδ)

≥ PH1(δ) (Sn − nδ ≤ −Knδ) (14.5)

Examining (14.5), note that Sn − nδ is a mean-zero random walk. Therefore, standard lower bounds for the binomial tail tell us that n*_β(δ) ≥ (K/δ²) ln(1/β), and no test can statistically use much fewer than n*_β(δ) samples under H1(δ) to control type II error at β. How many samples does the sequential test use? The quantity of interest is the test's stopping time τ, which is < N when it rejects H0 and N otherwise. In fact, the expected stopping time is close to n* under any alternative hypothesis:

Theorem 40. For any δ and any β > 0, there exist absolute constants K1, K2 such that

EH1[τ] ≤ (1 + K1 β^{K2} / ln(1/β)) · n*_β(δ)

Theorem 40 shows that the sequential test stops roughly as soon as we could hope for, under any alternative δ, despite our ignorance of δ! We will revisit these ideas in our two-sample sequential test of Section 14.3.1.

14.2.2 Discussion

Note the generality of the ideas in this section. Theorem 40 is proved for biased coin flips, but it uses only basic concentration of measure ideas: upper and lower bounds on the tails of a statistic that is a cumulative sum incremented each timestep. Many natural test statistics follow this scheme, particularly those that can be efficiently updated on the fly. Our main sequential two-sample test in the next section does also.

Theorem 40 is notable for its uniformity over δ and β. Note that qn (and therefore the sequential test) is independent of both of these – we need only to set a target type I error bound α. Under any alternative δ > 0, the theorem holds for all β simultaneously. As β decreases, n*_β(δ) (written as n* in context) of course increases, but the leading multiplicative factor (1 + K1 β^{K2} / ln(1/β)) decreases. In fact, with an increasingly stringent β → 0, we see that EH1[τ]/n* → 1; so the sequential test in fact stops closer to n*, and hence τ is almost deterministically best possible. Indeed, the proof of Theorem 40 also shows that PH1(τ ≥ n) ≤ exp(−Knδ²) for n > n*, so the probability of lasting n steps falls off exponentially in n, and τ is therefore quite sharply concentrated near the optimum n* = n*_β(δ). We formalize this precise line of reasoning completely non-asymptotically in an even stronger high-dimensional setting, in the analysis of our main two-sample test in the next section.

14.3 Two-Sample Mean Testing

In this section, we present our main sequential two-sample test. Assume that we have samples X1,...,Xn,··· ∼ P and Y1,...,Yn,··· ∼ Q, with P, Q being unknown arbitrary continuous distributions on R^d with means µ1 = EX∼P[X], µ2 = EY∼Q[Y], and we need to test

H0 : µ1 = µ2   vs.   H1 : µ1 ≠ µ2   (14.6)

Denote the covariances of P, Q by Σ1, Σ2 and Σ := (1/2)(Σ1 + Σ2). Define δ := µ1 − µ2, so that δ = 0 under H0. Let Φ(·) denote the standard Gaussian CDF, and [lnln]+(x) := lnln(max(x, e^e)).

14.3.1 A Linear-Time Sequential Test

Our sequential test follows the scheme in Fig. 14.1, so we only need to specify a sequence of rejection thresholds qn. To do this, we denote

hi = (X2i−1 − Y2i−1)^⊤ (X2i − Y2i),

and define our sequential test statistic as the following stochastic process evolving with n:

Tn = ∑_{i=1}^n hi.
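As a small illustrative sketch (assuming numpy and 0-based indexing, so that rows 2i and 2i+1 form the i-th pair), the increments hi and the walk Tn can be computed as follows.

```python
import numpy as np

def pair_increments(X, Y):
    """h_i = (X_{2i-1} - Y_{2i-1})^T (X_{2i} - Y_{2i}) for consecutive pairs of rows;
    X and Y are (2n, d) arrays of samples from P and Q."""
    D = X - Y                                        # differences X_j - Y_j
    return np.einsum("id,id->i", D[0::2], D[1::2])   # dot product within each pair

rng = np.random.default_rng(2)
d, n = 10, 500
X = rng.normal(0.0, 1.0, size=(2 * n, d))
Y = rng.normal(0.2, 1.0, size=(2 * n, d))            # mean shift of 0.2 in every coordinate
h = pair_increments(X, Y)
T = np.cumsum(h)                                     # T_n = sum of the first n increments
```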

Under H0, E[hi] = 0, and Tn is a zero-mean random walk.

Proposition 23. E[Tn] = nE[h] = n‖δ‖², and

var(Tn) = n var(h) = n(4 tr(Σ²) + 4δ^⊤Σδ) =: nV0.

We assume – for now – that our data are bounded, i.e. ‖X‖, ‖Y‖ ≤ 1/2, so that by the Cauchy-Schwarz inequality, w.p. 1,

|Tn − Tn−1| = |(X2n−1 − Y2n−1)^⊤ (X2n − Y2n)| ≤ 1

Since Tn has bounded differences, it exhibits Gaussian-like concentration under the null. We examine the cumulative variance process of Tn under H0,

∑_{i=1}^n E[(Ti − Ti−1)² | h1:(i−1)] = ∑_{i=1}^n var(hi) = nV0

Using this, we can control the behavior of Tn under H0 as follows (proof in Sec. D.3).

Theorem 41. Take any ξ > 0. Then with probability ≥ 1 − ξ, for all n simultaneously,

|Tn| < C0(ξ) + √( 2C1 nV0 [lnln]+(nV0) + C1 nV0 ln(4/ξ) )

where C0(ξ) = 3(e − 2)e² + 2(1 + (1/3)√(ln(8/ξ))), and C1 = 6(e − 2).

Unfortunately, we cannot use the uniform Bernstein LIL of Thm. 21 (from Ch. 12) directly to get computable deviation bounds for type I error control, because the covariance matrix Σ is unknown a priori. nV0 = n(4 tr(Σ²) + 4δ^⊤Σδ) must instead be estimated on the fly as part of the sequential test, and its estimate must be concentrated tightly and uniformly over time, so as not to present a statistical bottleneck if the test runs for a long time. We prove such a result, necessary for sequential testing, relating nV0 to the empirical variance process V̂n = ∑_i hi².

Lemma 42. With probability ≥ 1 − ξ, for all n simultaneously, there is an absolute constant C3 such that

nV0 ≤ C3(V̂n + C0(ξ))

Its proof uses a self-bounding argument and is in Sec. D. Now, we can combine these to prove a novel uniform empirical Bernstein inequality to (practically) establish concentration of Tn under H0.

Theorem 43 (Uniform Empirical Bernstein Inequality for Random Walks). Take any ξ > 0. Then with probability ≥ 1 − ξ, for all n simultaneously,

|Tn| < C0(ξ) + √( 2V̂n* ([lnln]+(V̂n*) + ln(4/ξ)) )

where V̂n* := C3(V̂n + C0(ξ)), C0(ξ) = 3(e − 2)e² + 2(1 + (1/3)√(ln(8/ξ))), and C3 is an absolute constant.

Its proof follows from a union bound on the uniform Bernstein LIL (Thm. 21) and Lem. 42. Thm. 43 depends on V̂n, which is easily calculated by the algorithm on the fly in constant time per iteration. Ignoring constants for clarity, Thm. 43 effectively implies that our sequential test from Figure 14.1 controls type I error at α by setting

qn ∝ ln(1/α) + √( 2V̂n ln(lnV̂n / α) ) (14.7)

Practically, we suggest using the above threshold with a constant of 1.1 to guarantee type I error approximately α (this is all one often wants anyway, since any particular choice like α = 0.05 is itself arbitrary). This is what we do in our experiments, with excellent success in simulations. For exact or conservative control, consider using a small constant multiple of the above threshold, such as 2.

The above sequential threshold is remarkable, because within this practically useful and simple expression lies a deep mathematical result – the uniform Bernstein LIL effectively involves a union bound for the error probability over an infinite sequence of times. Other naive attempts to union bound the error probabilities for a possibly infinite sequential testing procedure are too loose and hence too conservative. Furthermore, the classical LIL is known to be asymptotically tight including constants, and our non-asymptotic LIL is also tight up to small constant factors. This type I error control with an implicit infinite union bound surprisingly does not lead to a loss in power. Indeed, our statistic possesses essentially the same power as the corresponding linear-time batch two-sample test, and also stops early for easy problems. We make this precise in the following two subsections.
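To make the resulting procedure concrete, here is a minimal sketch of the sequential test with the empirical-variance threshold of Eq. (14.7); the constants C and C0 below follow the practical recommendations discussed in Section 14.4.1 (C = √2, C0 = ln(1/α)) rather than values the theory pins down exactly, and the increments are assumed to be supplied by a generator such as the pairing construction sketched earlier.

```python
import numpy as np

def sequential_mean_test(increments, alpha, C=np.sqrt(2.0), C0=None):
    """Sequential two-sample mean test (sketch): track T_n and the empirical
    variance process Vhat_n = sum_i h_i^2, and reject when |T_n| exceeds
    q_n ~ C0 + sqrt(C * Vhat_n * ln(ln(Vhat_n)/alpha)), as in Eq. (14.7)."""
    if C0 is None:
        C0 = np.log(1.0 / alpha)
    T, Vhat, n = 0.0, 0.0, 0
    for n, h in enumerate(increments, start=1):
        T += h
        Vhat += h * h
        loglog = np.log(max(np.log(max(Vhat, np.e)), np.e) / alpha)   # [lnln]_+-style clamp
        q_n = C0 + np.sqrt(C * Vhat * loglog)
        if abs(T) > q_n:
            return "reject H0", n            # early stop: difference in means detected
    return "fail to reject H0", n
```

For example, feeding this function the `pair_increments` output from the earlier sketch runs the test on simulated Gaussian data.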

14.3.2 A Linear-Time Batch Test

Here we study a simple linear-time batch two-sample mean test, following the template in Fig. 14.1. Consider the linear-time statistic TN = ∑_{i=1}^N hi, where, as before, hi = (x2i−1 − y2i−1)^⊤ (x2i − y2i). Note that the hi's are also i.i.d., and TN relies on 2N data points from each distribution.

Let VN0, VN1 be var(TN) = N var(h) under H0, H1 respectively. Recalling Proposition 23:

VN0 := NV0 := 4N tr(Σ²),
VN1 := NV1 := N(4 tr(Σ²) + 4δ^⊤Σδ).

Then since TN is a sum of i.i.d. variables, the central limit theorem (CLT) implies that (where →d denotes convergence in distribution)

TN / √VN0 →d N(0,1) under H0   (14.8a)

(TN − N‖δ‖²) / √VN1 →d N(0,1) under H1   (14.8b)

Based on this information, our test rejects the null hypothesis whenever

TN > √VN0 · zα,   (14.9)

where zα is the 1 − α quantile of the standard normal distribution. So Eq. (14.8a) ensures that

PH0( TN / √VN0 > zα ) ≤ α,

giving us type I error control under H0. In practice, we may not know VN0, so we standardize the statistic using the empirical variance – since we assume N is large, these scalar variance estimates do not change the effective power analysis. For non-asymptotic type I error control, we can use an empirical Bernstein inequality [MP09, Thm. 11], based on an unbiased estimator of VN. Specifically, the empirical variance of the hi's (V̂N) can be used to reject the null whenever

TN > √(2V̂N ln(2/α)) + 7N ln(2/α) / (3(N − 1)).   (14.10)

Ignoring constants for clarity, the empirical Bernstein inequality effectively suggests that the batch test from Figure 14.1 will have type I error control of α on setting the threshold

pN ∝ ln(1/α) + √(2V̂N ln(1/α))   (14.11)

For immediate comparison, we copy below the expression for qn from Eq. (14.7):

qn ∝ ln(1/α) + √( 2V̂n ln(lnV̂n / α) ).

This similarity explains the optimal power and stopping time properties, detailed in the next subsection. One might argue that if N is large, then V̂N ≈ VN, and in this case we can simply derive the (asymptotic) power of the batch test given in Eq. (14.9) as

PH1( TN / √VN0 > zα )   (14.12)
= PH1( (TN − N‖δ‖²)/√VN1 > zα √(VN0/VN1) − N‖δ‖²/√VN1 )
= Φ( √N‖δ‖² / √(8 tr(Σ²) + 8δ^⊤Σδ) − zα √( tr(Σ²) / (tr(Σ²) + δ^⊤Σδ) ) )

Note that the second term is a constant less than zα. As a concrete example, when Σ = σ²I, and we denote the signal-to-noise ratio as Ψ := ‖δ‖/σ, then the power of the linear-time batch test is at least Φ( √N Ψ² / √(8d + 8Ψ²) − zα ).
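A sketch of the batch procedure and of this power expression follows (assuming numpy/scipy; V̂N is interpreted here as N times the sample variance of the hi, by analogy with the sequential V̂n, which is an assumption of this sketch rather than a statement from the text).

```python
import numpy as np
from scipy.stats import norm

def batch_mean_test(h, alpha):
    """Linear-time batch test: reject when T_N exceeds the empirical-Bernstein
    threshold of Eq. (14.10)."""
    h = np.asarray(h)
    N = len(h)
    T_N = h.sum()
    Vhat_N = N * h.var(ddof=1)               # assumed interpretation of Vhat_N
    thresh = np.sqrt(2 * Vhat_N * np.log(2 / alpha)) + 7 * N * np.log(2 / alpha) / (3 * (N - 1))
    return T_N > thresh

def batch_power_lower_bound(N, d, Psi, alpha):
    """Power lower bound Phi(sqrt(N) Psi^2 / sqrt(8d + 8 Psi^2) - z_alpha)
    for Sigma = sigma^2 I and signal-to-noise ratio Psi = ||delta|| / sigma."""
    z_alpha = norm.ppf(1 - alpha)
    return norm.cdf(np.sqrt(N) * Psi ** 2 / np.sqrt(8 * d + 8 * Psi ** 2) - z_alpha)

print(batch_power_lower_bound(N=5000, d=10, Psi=0.2, alpha=0.05))
```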

14.3.3 Power and Stopping Time of Sequential Test

The striking similarity of Eq. (14.11) and Eq. (14.7), mentioned in the previous subsection, is not coincidental. Indeed, both of these arise out of non-asymptotic versions of CLT-like control and LIL-like control, and we know that in the asymptotic regime for Bernoulli coin-flips, CLT thresholds and LIL thresholds differ by just ∝ √(lnlnn) factors. Hence, it is not surprising to see the empirical Bernstein LIL match empirical Bernstein thresholds up to ∝ √(lnlnV̂n) factors. Since the power of the sequential test is at least the probability of rejection at the very last step, and since √(lnlnn) < 2 even for n = 10^20, the power of the linear-time sequential and batch tests is essentially the same. However, a sequential test that rejects only at the last step is of little practical interest, bringing us to the issue of early stopping.

Early Stopping. The argument is again identical to that of Section 14.2, proving that

EH1[τ] is nearly optimal, and arbitrarily close to optimal as β tends to zero. Once more, note that the "optimal" above refers to the performance of the oracle linear-time batch algorithm that was informed about the right number of points to subsample and use for the one-time batch test. Formally, let n*_β(δ) denote this minimum sample size for the two-sample mean testing batch problem to achieve type II error β, the ∗ indicating that this is an oracle value, unknown to the user of the batch test. From Eq. (14.12), it is clear that for N ≥ (8 tr(Σ²) + 8δ^⊤Σδ)(zβ + zα)² / ‖δ‖⁴, the power is at least 1 − β. In other words,

n*_β(δ) ≤ 8(zβ + zα)² (tr(Σ²) + δ^⊤Σδ) / ‖δ‖⁴   (14.13)

Theorem 44. Under H1, the sequential algorithm of Fig. 14.1 using qn from Eq. (14.7) has expected stopping time ∝ n*_β(δ).

For clarity, we simplify (14.7) and (14.11) by dropping the initial ln(1/α) additive term, since it is soon dominated by the second term and does not qualitatively affect the conclusion.
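For intuition about the scale of n*_β(δ), the oracle bound (14.13) is easy to evaluate numerically (a sketch assuming numpy/scipy; the inputs below are illustrative, not values from the experiments).

```python
import numpy as np
from scipy.stats import norm

def oracle_sample_bound(Sigma, delta, alpha, beta):
    """Upper bound (14.13) on the oracle batch sample size:
    8 (z_beta + z_alpha)^2 (tr(Sigma^2) + delta^T Sigma delta) / ||delta||^4."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    numerator = np.trace(Sigma @ Sigma) + delta @ Sigma @ delta
    return 8 * (z_a + z_b) ** 2 * numerator / np.linalg.norm(delta) ** 4

d = 10
Sigma = np.eye(d)                               # sigma = 1
delta = np.r_[0.2, np.zeros(d - 1)]             # mean shift only in the first coordinate
print(oracle_sample_bound(Sigma, delta, alpha=0.05, beta=0.2))
```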

14.3.4 Discussion

This section's arguments have given an illustration of the flexibility and great generality of the ideas we used to test the bias of the coin. In the two-sample setting, we simply design the statistic Tn = ∑_{i=1}^n hi to be a mean-zero random walk under the null. As in the coin's case, our non-asymptotic LIL controls type I error optimally, and the remaining arguments are identical because of the common concentration properties of all random walks.

Our test statistic is chosen with several considerations in mind. First, the batch test is linear-time in the sample complexity, so we are comparing algorithms with the same computational budget, on a fair footing. There exist batch tests using U-statistics that have higher power than ours ([RRP+15]) for a given N, but they use more computational resources (O(N²) rather than O(N)). Also, the batch statistic is a sum of random increments, a common way to write many hypothesis tests, and one that can be computed on the fly in the sequential setting.

Note that Tn is a scalar, so our arguments do not change with d, and we inherit the favorable high-dimensional statistical performance of the statistic; [RRP+15] has more relevant discussion. The statistic has also been shown to have powerful generalizations in the recent statistics literature, which we discuss in the appendices.

Though we assume data scaled to have norm 1/2 for convenience, this can be loosened. Any data with bounded norm B > 1/2 can be rescaled by a factor 1/B just for the analysis, and then our results can be used. This results in an empirical Bernstein bound like Thm. 43, but of order O( C0(ξ) + √(V̂n ln(ln(BV̂n)/ξ)) ). The dependence on B is very weak, and is negligible even when B = poly(d).

In fact, we only require control of the higher moments (e.g. by Bernstein conditions, which generalize boundedness and sub-Gaussianity conditions) to prove the non-asymptotic Bernstein LIL in Ch. 12, exactly as is the case for the usual Bernstein concentration inequalities for averages ([BLM13b]). Therefore, our basic arguments hold for unbounded increments hi as well. These results apply to general martingales, so our ideas could conceivably be extended to this setting to devise more data-dependent tests, which would be interesting future work.

14.4 Empirical Evaluation

In this section, we evaluate our proposed sequential test on synthetic data, to validate the predictions made by our theory concerning its type I/II errors and the stopping time. We simulate data from two multivariate Gaussians (d = 10), motivated by our discussion at the end of Section 14.3.2: each Gaussian has covariance matrix Σ = σ²Id; one has mean µ1 = 0 and the other has µ2 = (δ,0,0,...,0) ∈ R^d for some δ ≥ 0. We keep σ = 1 here to keep the scale of the data roughly consistent with the biased-coin example, though we find the scaling of the data makes no practical difference, as we discussed.
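A minimal sketch of this simulation setup (assuming numpy; the seed and parameter values are arbitrary illustrations):

```python
import numpy as np

def simulate_increments(n_pairs, d=10, delta=0.2, sigma=1.0, seed=0):
    """Generate the increments h_i for the Gaussian simulation of this section:
    X ~ N(0, sigma^2 I_d) and Y ~ N((delta, 0, ..., 0), sigma^2 I_d)."""
    rng = np.random.default_rng(seed)
    mu2 = np.r_[delta, np.zeros(d - 1)]
    X = rng.normal(0.0, sigma, size=(2 * n_pairs, d))
    Y = mu2 + rng.normal(0.0, sigma, size=(2 * n_pairs, d))
    D = X - Y
    return np.einsum("id,id->i", D[0::2], D[1::2])

h = simulate_increments(n_pairs=10_000, delta=0.3)
print(h.mean())    # close to ||mu1 - mu2||^2 = 0.09 under this alternative
```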

14.4.1 Running the Test and Type I Error

Like typical hypothesis tests, ours is designed to control type I error. When implementing our algorithmic ideas, it suffices to set qn as in (14.7), where the only unknown parameters are the proportionality constants C, C0:

qn ∝ C0 + √( C V̂n ln(lnV̂n / α) ).

The theory suggests that C, C0 are absolute constants, and prescribes upper bounds for them, which can conceivably be loose because of the analytic techniques we have used. On the other hand, in the asymptotic limit the bounds become tight; the empirical V̂n converges quickly to its mean Vn, and we know from second-moment versions of the LIL that C = √2 and C0 = 0 are correct. However, as we consider smaller finite times, that bound must be relaxed (at the extremely low t = 1 or 2 when flipping a fair coin, for instance). Nevertheless, we find that in practice, for even moderate sample sizes like the ones we test here, the same reasonable constants suffice in all our experiments: C = √2 and C0 = ln(1/α), with C0 following Thm. 43 and similar fixed-sample Bennett bounds ([BLM13b]; also see the appendices). The situation is exactly analogous to how the Gaussian approximation is valid for even moderate sample sizes in batch testing, making possible a huge variety of common tests that are asymptotically and empirically correct, with reasonable constants to boot.

To be more specific, consider the null hypothesis for the example of coin bias testing given earlier; these fair coin flips are the most anti-concentrated possible bounded steps, and render our empirical Bernstein machinery ineffective, so they make a good test case. We choose C and C0 as above, and plot the cumulative probability of type I violations PrH0(τ ≤ n) up to time n for different α (where τ is the stopping time of the test), with the results in Fig. 14.2. To control type I error, the curves need to be asymptotically upper-bounded by the desired α levels (dotted lines). This does not appear to be guaranteed asymptotically for our recommended settings of C, C0, but the figure still indicates that type I error is controlled even for very high n with our settings. A slight further raise in C beyond √2 suffices to guarantee much stronger control. Fig. 14.2 also seems to contain linear plots, which we cannot fully explain. For more on provable correctness with low C, see the appendices.

Figure 14.2. PrH0 (τ ≤ n) for different α, on biased coin. Dotted lines of corresponding colors are the target levels α.

14.4.2 Type II Error and Stopping Time

Now we verify the results at the heart of the paper – uniformity over alternatives δ of the type II error and stopping time properties.

Figure 14.3. Power vs. ln(N) for different δ, on Gaussians. Dashed lines represent power of batch test with N samples.

Fig. 14.3 plots the power of the sequential test PH1(δ)(τ < N) against the maximum runtime N using the Gaussian data, at a range of different alternatives δ; the dashed and solid lines represent the power of the batch test (14.11) with N samples and of the sequential test with maximum runtime N, respectively. As we might expect, the batch test has somewhat higher power for a given sample size, but the sequential test consistently performs well compared to it. The role of N here is basically to set a desired tolerance for error; increasing N does not change the intermediate updates of the algorithm, but does increase the power by potentially running the test for longer. So each curve in Fig. 14.3 illustrates the statistical tradeoff inherent in hypothesis testing against a fixed simple alternative, but the great advantage of our sequential test is in achieving all of them simultaneously with the same algorithm.

To highlight this point, we examine the stopping time compared to the batch test for the Gaussian data, in Fig. 14.4. We see that the distributions of ln(τ) are all relatively concentrated around their medians (marked), which fit well to a slope-2 line. This shows the predicted 1/δ² dependence on δ. Some more experiments are in the appendices.

14.5 Related Work

Parametric or asymptotic methods. Our statements about the control of type I/II errors and stopping times are very general, following up on early sequential analysis work. Most sequential tests operate in Wald's framework, expounded in [Wal45]. In a seminal line of work, Robbins and colleagues delved into sequential hypothesis testing in an asymptotic sense ([Rob85]). Apart from being asymptotic, their tests were most often for simple hypotheses (point nulls and alternatives), were univariate, or parametric (assuming Gaussianity or known density). That said, two of their most relevant papers are [Rob70b] and [DR67b], which discuss statistical methods related to the LIL. They give an asymptotic version of the argument of Section 14.2, using it to design sequential Kolmogorov-Smirnov tests with power one.

Figure 14.4. Distribution of log1.25(τ) for δ ∈ {0.5(1.25)^c : c ∈ {7,6,...,0}}, so that the abscissa values {log1.25(1/δ)} are a unit length apart. Dashed line has slope 2.

Other classic works that mention using the LIL for testing various simple or univariate or parametric problems include [DR68a, DR68b, Lai77, Ler86]. These all operate in the asymptotic limit in which the classic LIL can be used to set qN. For testing a simple null against a simple alternative, the sequential probability ratio test (SPRT) was proved to be optimal by the seminal work of [WW48], but this applies when both the null and alternative have a known parametric form. The same authors also suggested a univariate nonparametric two-sample test in [WW40], but did not combine these two lines of work.

Bernstein-based methods. Finite-time uniform LIL-type concentration tools from Ch. 12 are the linchpins of our analysis, and we adapt them in new ways; but novelty in this respect is not our primary focus here, because less recent concentration bounds can also be used to yield similar results. It is always possible to use a weighted union bound (allocating failure probability ξ over time as ξn ∝ ξ/n²) over fixed-n Bernstein bounds, resulting in a deviation bound of O(√(Vn ln(n/ξ))). A more advanced "peeling" argument, dividing time n into exponentially growing epochs, improves the bound to O(√(Vn ln(lnn/ξ))). This suffices in many simple situations, but in general is still arbitrarily inferior to our bound of O(√(Vn lnln(Vn/ξ))), precisely in the case Vn ≪ n in which we expect the second-moment Bernstein bounds to be most useful over Hoeffding bounds. A yet more intricate peeling argument, demarcating the epochs by exponential intervals in Vn rather than n, can be used to achieve our iterated-logarithm rate, in conjunction with the well-known second-order uniform martingale bound due to [Fre75]. This serves as a sanity check on the non-asymptotic LIL bounds of [Bal15b], where it is also shown that these bounds have the best possible dependence on all parameters. However, it can be verified that even a suboptimal uniform concentration rate like O(√(Vn ln(Vn/ξ))) would suffice for the optimal stopping time properties of the sequential test to hold, with only a slight weakening of the power.

Bernstein inequalities that only depend on empirical variance have been used for stopping algorithms in Hoeffding races ([LN13]) and other even more general contexts ([MSA08]). This line of work uses the empirical bounds very similarly to us, albeit in the nominally different context of direct estimation of a mean. As such, they too require uniform concentration over time, but achieve it with a crude union bound (failure probability ξn ∝ ξ/n²), resulting in a deviation bound of O(√(V̂n ln(n/ξ))). Applying the more advanced techniques above, it may be possible to get our optimal concentration rate, but to our knowledge ours is the first work to derive and use uniform LIL-type empirical Bernstein bounds.

Practical Usage. To our knowledge, implementing sequential testing in practice has previously invariably relied upon CLT-type results patched together with heuristic adjustments of the CLT threshold (e.g. the widely-used scheme for clinical trials of [PPA+77] makes an arbitrary conservative choice of qn = 0.001 throughout the sequential process and qN = 0.05 = α at the last datapoint). These perform as loose functional versions of our uniform finite-sample LIL upper bound, though without theoretical guarantees.
In general, it is unsound to use an asymptotically normal distribution under the null at stopping time τ – the central limit theorem (CLT) applies to any fixed time t, but it may not apply to a random stopping time τ (see the random-sum CLT of [Ans52], and [Gut12] and references). This has caused myriad practical complications in implementing such tests (see [LS+08], Section 4). One of our contributions is to rigorously derive a directly usable finite-sample sequential test, in a way we believe can be extended to a large variety of testing problems.

We emphasize that there are several advantages to our proposed framework and analysis which, taken together, are unique in the literature. We tackle the multivariate nonparametric (possibly even high-dimensional) setting, with composite hypotheses. Moreover, we not only prove that the power is asymptotically one, but also derive finite-sample rates that illuminate the dependence of other parameters on β, by considering non-asymptotic uniform concentration over finite times. The fact that it is not provable via purely asymptotic arguments is why our optimal stopping property has gone unobserved for a wide range of tests, even ones as basic as the biased coin. In our more refined analysis, it can be verified (Thm. 40) that the stopping time diverges to ∞ when the required type II error → 0, i.e. power → 1.

We have presented a sequential scheme for multivariate nonparametric hypothesis testing against composite alternatives, which comes with a full finite-sample analysis in terms of on-the-fly estimable quantities. Its desirable properties include type I error control by considering finite-time LIL concentration; near-optimal type II error compared to linear-time batch tests, due to the iterated-logarithm term in the LIL; and most importantly, essentially optimal early stopping, uniformly over a large class of alternatives. We presented some simple applications in learning and statistics, but our design and analysis techniques are general, and their extensions to other settings are of continuing future interest.

14.6 Further Extensions to Other Tests

14.6.1 A General Two-Sample Test

Given two independent multivariate streams of i.i.d. data, instead of testing for differences in mean, we could also test for differences in any moment, i.e. differences in distribution, a subtler problem which may require much more data to ascertain differences in higher moments. In other words, we would be testing

H0 : P = Q versus H1 : P ≠ Q

One simple way to do this is by using a kernel two-sample test, like the Maximum Mean Discrepancy (MMD) test proposed by [GBR+12]. The population MMD is defined as

MMD(P,Q) = sup_{f∈Hk} ( EX∼P f(X) − EY∼Q f(Y) )

where Hk is the unit ball of functions in the Reproducing Kernel Hilbert Space corresponding to some positive semidefinite Mercer kernel k. One common choice is the Gaussian kernel k(a,b) = exp(−‖a − b‖²/γ²). With this choice, the population MMD has an interesting interpretation, given by Bochner's theorem [Rud87] as

MMD² = ∫_{R^d} |ϕX(t) − ϕY(t)|² e^{−γ²‖t‖²} dt

where ϕX(t), ϕY(t) are the characteristic functions of P, Q. This means that the population MMD is nonzero iff the distributions differ (i.e. the alternative holds). The authors of [GBR+12] propose the following (linear-time) batch test statistic after seeing 2N samples: MMDN = (1/N) ∑_{i=1}^N hi, where

hi = k(x2i,x2i+1) + k(y2i,y2i+1) − k(x2i,y2i+1) − k(x2i+1,y2i)

The associated test is consistent against all fixed (and some local) alternatives where P ≠ Q; see [GBR+12] for a proof, and [RRP+15] for a high-dimensional analysis of this test (in the limited setting of mean-testing that we consider earlier in this paper). Both properties are inherited by the following sequential test. The sequential statistic we construct after seeing n batches (2n samples) is the random walk Tn = ∑_{i=1}^n hi, which has mean zero under the null because E[MMDN] = E[hi] = 0. The similarity with our mean-testing statistic is not coincidental; when k(a,b) = a^⊤b, they coincide, further motivating our choice of test statistic Tn earlier in the paper. As before, we use the non-asymptotic LIL to get type I error control, nearly the same power as the linear-time batch test, and also early stopping – much before seeing N points – if the problem at hand is easy.
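For concreteness, a sketch of the linear-time MMD increments with a Gaussian kernel follows (0-based indexing is used for the arrays, so this mirrors rather than literally transcribes the index convention above; the data in the example are arbitrary).

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / gamma ** 2)

def mmd_increments(X, Y, gamma=1.0):
    """h_i = k(x_{2i}, x_{2i+1}) + k(y_{2i}, y_{2i+1}) - k(x_{2i}, y_{2i+1}) - k(x_{2i+1}, y_{2i});
    the sequential statistic T_n is the running sum of these increments."""
    h = []
    for i in range(len(X) // 2):
        x0, x1, y0, y1 = X[2 * i], X[2 * i + 1], Y[2 * i], Y[2 * i + 1]
        h.append(gaussian_kernel(x0, x1, gamma) + gaussian_kernel(y0, y1, gamma)
                 - gaussian_kernel(x0, y1, gamma) - gaussian_kernel(x1, y0, gamma))
    return np.asarray(h)

rng = np.random.default_rng(3)
X = rng.normal(0, 1, size=(1000, 5))
Y = rng.normal(0, 1, size=(1000, 5)) ** 2 - 1   # same mean as X, different distribution
T = np.cumsum(mmd_increments(X, Y))             # drifts upward in expectation under H1
```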

14.6.2 A General Independence Test

Given a single multivariate stream of i.i.d. data, where each datapoint is a pair (Xi,Yi) ∈ R^{p+q}, the independence testing problem involves testing whether X is independent of Y or not. More formally, we want to test

H0 : X ⊥ Y versus H1 : X ⊥̸ Y.   (14.14)

A test of linear correlation/covariance only detects linear dependence. As an alternative to this, [SRB07] proposed a population quantity called distance covariance, given by

dCov(X,Y) = E‖X − X′‖‖Y − Y′‖ + E‖X − X′‖ E‖Y − Y′‖ − 2E‖X − X′‖‖Y − Y″‖

where (X,Y), (X′,Y′), (X″,Y″) are i.i.d. pairs from the joint distribution on (X,Y). Remarkably, an alternative representation is

dCov(X,Y) = ∫_{R^{p+q}} |φX,Y(t,s) − φX(t)φY(s)|² w(t,s) dt ds

where φX, φY, φX,Y are the characteristic functions of the marginals and joint distribution of X, Y, and w(t,s) ∝ 1/(‖t‖p^{1+p} ‖s‖q^{1+q}). Using this, the paper [SRB07] concludes that dCov(X,Y) = 0 if and only if X ⊥ Y. One way to form a linear-time statistic to estimate dCov is to process the data in batches of size four, i.e. Bi = ⋃_{j=0}^{3} (X4i+j, Y4i+j), and calculate the scalar

hi = (1/6) ∑ ‖X − X′‖‖Y″ − Y‴‖ + (1/6) ∑ ‖X − X′‖‖Y − Y′‖ − (1/24) ∑ ‖X − X′‖‖Y − Y″‖

where the summations are over all possible ways of assigning (X,Y) ≠ (X′,Y′) ≠ (X″,Y″) ≠ (X‴,Y‴), each pair being one from Bi. The expectation of this quantity is exactly dCov, and the batch test statistic, given 4N datapoints, is simply dCovN = (1/N) ∑_{i=1}^N hi. As before, the associated test is consistent for any fixed alternative where X ⊥̸ Y. Noting that E[dCovN] = E[hi] = 0 under the null, our random walk after seeing n batches (i.e. 4n points) will just be Tn = ∑_{i=1}^n hi. As in previous sections, our finite LIL results can be used to get type I error control, and early stopping much before seeing N points, if the problem at hand is statistically "easy."

Acknowledgements

This chapter contains material from the Conference on Uncertainty in Artificial Intelligence, 2016 ("Sequential Nonparametric Testing with the Law of the Iterated Logarithm," Balsubramani and Ramdas). The dissertation author was a primary investigator and author of this paper.

Chapter 15

Preventing P-Value Manipulation by Stopping

Imagine a statistician trying to collect random samples X1,...,Xn for a hypothesis test, and willing to collect at most some large number n. The test uses some statistic of the data f(X1,...,Xn) (e.g. the sample mean) to calculate a p-value. Under the null hypothesis, f has expected value E[f(X1:n)], which we consider 0 w.l.o.g. To perform the test after having collected some number of samples t ≤ n, the statistician uses the data so far (X1:t) and approximates f by its conditional expectation E[f(X1:n) | X1:t], testing whether this has zero mean. For instance, they could be testing whether Gaussian unit-variance data X1,...,Xn has zero mean, using the sample mean f.

An unscrupulous statistician may have goals at odds with the assumptions of standard fixed-sample statistics. For instance, they may be incentivized to report a statistically significant result, which could cause them to choose to stop the test early with ≤ n samples if the result appears significant. A natural such approach is to repeatedly gather more samples and keep performing the test on all data gathered so far, "peeking" at the reported p-values until one test is low enough to be significant, and reporting that p-value.

This form of p-value hacking by data peeking is common even among well-intentioned statisticians and A/B-testers, because collecting more data after a test result that appears significant can be costly, and of seemingly questionable value. However, peeking can severely bias the reported p-value if the test is repeated sufficiently, since the statistician will only report a low enough p-value – they are "sampling to reach a foregone conclusion" ([Ans54]). This procedure biases the reported p-value downward, making it much more likely to report a significant result under the null hypothesis.

However, peeking is widespread for very intuitive reasons in empirical science. For instance, it has long been argued that the statistician's opinion about the conclusion, as reflected by their sampling rule, should not influence the evidence the data provides against the null hypothesis – "the rules governing when data collection stops are irrelevant to data interpretation" ([ELS63]). Collecting more data (and therefore more evidence)

should always help, not invalidate previous results. However, conventional p-value analyses "depend on the intentions of the investigator" ([Nic00]).

We present a solution to this problem – a way to correct the reported p-values when peeking is performed in the above scenario, along with many others. Our correction method typically requires no additional computational overhead, and we prove that the corrected p-value is basically the lowest possible, making our method optimal in a strong sense. Our approach is to report a strengthened form of the p-value that is valid for all times, not just the time at which it is computed – seeing at any point a p-value of δ allows us to reject the null at level δ. This holds regardless of the sampling procedure, and regardless of the outcomes of other tests seen while peeking.

In Section 15.1, we formalize the problem and lay out the argument underlying our claims. Section 15.2 details our correction methods in several successively more general settings, notably a method in Section 15.2.2 that solves our introductory problem, of the unscrupulous statistician stopping a successively conditioned expectation of the test statistic f(X1,...,Xn). The reported robust p-value is provably close to the lowest possible. This is also extended to a more general scenario in which the statistics observed constitute a nearly arbitrary stochastic process, in Section 15.2.3. Section 15.3 follows with an extended discussion of our results in the context of previous work. The main theoretical contribution of this chapter, a tight improvement of the loose concentration bounds of Ch. 12, is proved in Sec. 15.4.

15.1 Approach: Robustness Against Any Sampling Plan

We are guided by the conventional statistical testing setting, where standard tests are statistically valid without peeking. Here, some sample size N is prescribed in advance, so that the reported p-value PN is a random variable with

Pr (PN ≤ δ) ≤ δ (15.1)

This is the defining characteristic of the p-value that is used in interpreting the results of statistical tests ([LR06b]). In principle (as when peeking), such a p-value can be computed after any number of samples t ≤ N; write it as Pt. We have seen that for any fixed time t, Pt satisfies (15.1), i.e.

∀t : Pr (Pt ≤ δ) ≤ δ (15.2)

Instead of reporting these Pt values as a conventional statistical test would, we alter them to a stochastic process {At}t≥1 that is robust against all forms of peeking. The statistician gathering the data should be able to decide when to report the result, in any manner desired based on the past, without invalidating the reported robust p-value.

Figure 15.1. Plot of 1,000 trials of the reported values evolving over time up to N = 100, when running a z-test on the mean of Gaussian data from N(µ,1), under H0 : µ = 0. The green line is α = 0.05. Left: The uncorrected p-values Pt. In red are the α fraction of significant trials with PN ≤ α. In blue are the other trials reported as significant by peeking (where ∃t < N : Pt ≤ α), and the remaining trials are in yellow. Right: The robust p-values At, using the method in this paper. In red are the α fraction of trials reported significant by peeking. Note that At ≥ Pt, so the vertical scale of the graphs is different.

We can picture the situation during a given experiment – for instance, a z-test statistic that is a zero-mean Gaussian under H0 – with the reported Pt values oscillating stochastically with time. Plotting many Monte Carlo trials of such experiments under the null, we get the left side of Fig. 15.1. There are many seemingly significant trials reported by peeking (blue) compared to the type I error violators at t = N accounted for by PN (red), even though many of the blue trials appear significant (low) only very early in the process and not at t ≈ N. This shows the magnitude of the problem, even when sample sizes are relatively small, and the situation clearly worsens with a longer horizon, as there are more chances for random walks to dip below α.

We aim to inflate these reported p-values Pt to the robust process At, which can be thought of as collecting the data's evidence against the null. It starts at the non-significant A0 = 1, evolving while peeking occurs at some subset of times, until some time at which the peeking is stopped. As shown on the right side of Fig. 15.1, it is inflated such that the fraction of trials which ever dip below α = 0.05 is ≤ α, down from the large fraction of blue trials shown on the left using P instead of A.

This sequential view naturally suggests that the best strategy for the unscrupulous statistician, having peeked at a set of p-values so far, is to report the minimum one among them – the ostensibly most "significant" result seen. So intuitively, our task is to ensure that even this minimum reported value can be interpreted as a valid p-value. To reason about this intuition, we use the idea of a stopping time – a random time τ which must only depend on the past but can do so in any way (formally, {τ ≤ t} is measurable w.r.t. the σ-algebra generated by X1:t for all times t). We would like the robust p-values At to be such that for any δ, and any stopping time τ,

Pr (Aτ ≤ δ) ≤ δ

In particular, our statistician engaging in data peeking chicanery might set the stopping time to τp = min{t : At ≤ δ}, where we consider the minimum of an empty set of times to be ∞. This means that τp < ∞ ⇐⇒ Aτp ≤ δ. Consequently, {τp < ∞} = {∃t : At ≤ δ}, and

Pr (∃t : At ≤ δ) ≤ δ (15.3)

So we have argued that the desired robust p-value process At must at least satisfy (15.3), which matches our earlier intuition: At must rarely ever dip below any δ under H0, because the peeking statistician is sure to notice whenever it does and play τp. Observe that (15.3) represents a much stronger kind of concentration bound than (15.2). The stochastic process Pt itself is not suitable to use as At, as it may not satisfy (15.3); though the chance of rejecting the null is low at each time, the chance of ever rejecting it over time may be quite high, as shown in Fig. 15.1. To summarize, we find At satisfying Eq. (15.3), which we call a “valid” robust p-value.

15.2 Algorithms

These ideas are formalized for the (very general) scenario with the unscrupulous statistician in the introduction. We start by expressing the uncorrected p-values Pt explicitly. These depend on the test statistic Mt := E[f(X1:n) | X1:t]. The increments of Mt should be conditionally controllable, for which we introduce the following standard definition.

Definition 24 ([BLM13a]). A zero-mean random variable X is (ν,b)-sub-exponential if there exist ν, b ≥ 0 such that

E[exp(λX)] ≤ exp(ν²λ²/2)   ∀|λ| < 1/b

In this chapter, we take Mt to have conditionally sub-exponential increments (this occurs for example if f is bounded). Our analysis relies on the fact that Mt is a martingale w.r.t. the data X1,...,Xn – for all t,

E[Mt | X1:(t−1)] = E[ E[f(X1:n) | X1:t] | X1:(t−1) ] = E[f(X1:n) | X1:(t−1)] = Mt−1

This is a well-known construction, called the Doob (or Lévy) martingale for estimating f. In many common statistical tests like the z-test of Fig. 15.1 with standard normal increments ("the standard z-test"), the statistic is simply the sum of zero-mean i.i.d. random variables with finite higher moments. A final note: throughout the rest of this chapter, we write natural logarithms ln(x) to mean ln(max(e,x)), so that lnlnx = ln(ln(max(e^e,x))) and so on (i.e., there are no issues with the domain of the natural log that necessitate using complex numbers).

15.2.1 Standard z-Test

We use the standard z-test as a first concrete example, where Mt is the test statistic (i.e. where the Zi are i.i.d. standard normal variables, Mt = ∑_{i=1}^t Zi ∼ N(0,t)). The event Pt ≤ δ referred to in (15.2) is when the statistic Mt is far from its conditional expectation of zero under H0. This happens with low probability under the null, ≤ δ, due to concentration of measure. As a large-deviation bound, this can be expressed as a Hoeffding concentration inequality describing non-asymptotic CLT-type behavior ([BLM13a]), involving an absolute positive constant C:

Pr( |Mt| ≥ √(2t ln(1/δ)) + C ln(1/δ) ) ≤ δ   (15.4)

When Vt is sufficiently large compared to ln(1/δ), the term with C becomes negligible. The inequality then becomes equivalent to

∀t : Pr( exp(−Mt²/(2t)) ≤ δ ) ≤ δ

Comparing terms with (15.2), we see that we can think of Pt = exp(−Mt²/(2t)).

Our correction then simply multiplies Pt by ln(t), as shown in Algorithm 6.

This corresponds to the relative "correction" in the rates between the Bernstein bound and the new finite LIL, as viewed from the perspective of δ. A correction of O(t) instead of our ln(t) would correspond to taking a union bound over the peeking events that happen every timestep (known as the Bonferroni correction, in the parlance of multiple hypothesis testing). Our correction also increases with t but is far milder, taking advantage of the powerful dependences between peeked p-values on any given run to transform the concentration of Eq. (15.2) to that of Eq. (15.3).

Though the correction is relatively small, it is powerful enough to be robust to any sampling plan. This addresses the concern about peeking schemes "sampling to a foregone conclusion" – the At value at any point can be interpreted as a conventional p-value, regardless of what happened in the past or will happen in the future.

Algorithm 6. Robust p-Values for a Standard z-Test

Set M0 := 0
for t = 1, 2, ... do
    Observe Xt
    Compute Mt := ∑_{i=1}^t Xi
    Compute the A-value
        At := ln(t) / exp(Mt²/(2t))
    If peeking, report any At value seen so far (i.e., the minimum one).
end for
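A minimal Python sketch of Algorithm 6 follows (assuming numpy; the running minimum mimics what a peeker would report):

```python
import numpy as np

def robust_p_values_ztest(xs):
    """Algorithm 6 (sketch): A_t = ln(t) / exp(M_t^2 / (2t)) for the running sum
    M_t of the observations, using the chapter's convention ln(x) = ln(max(e, x))."""
    M, best, reported = 0.0, 1.0, []
    for t, x in enumerate(xs, start=1):
        M += x
        A_t = np.log(max(t, np.e)) / np.exp(M * M / (2 * t))
        best = min(best, A_t)          # the value a peeker would report after t samples
        reported.append(best)
    return np.asarray(reported)

rng = np.random.default_rng(4)
print(robust_p_values_ztest(rng.standard_normal(1000))[-1])   # under H0, rarely below alpha
```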

Figure 15.2. Comparing p-value peeking with the method of this paper, using the z-test of Fig. 15.1 (10K trials of 1K samples each). Left: Distribution (CDF) of the reported value when peeking each timestep (the minimum seen over all 1K timesteps). Plotted are the traditional uncorrected p-value P, our corrected robust version A, and the "ideal" uniform distribution of the reported value under the null. Right: Cumulative rejection probability under H0, at significance levels α ∈ {0.01,0.05}, for the uncorrected and corrected p-values.

This can make an important difference practically, as the peeking bias on the reported p-value can be extremely bothersome after even a moderate number of tests. Figure 15.2 uses Monte Carlo simulations to demonstrate the effect of peeking under the null when running a z-test, a common example in the literature. The actual type I error is clearly significantly above the uncorrected reported value for all δ, so the distribution of the reported values Pt under H0 is nowhere close to the ideal uniform distribution of the p-value under the null.

15.2.2 The Introductory Example: Successive Conditional Estimation

We can extend these ideas to the general example of the introduction, in which the test statistic Mt := E[f(X1:n) | X1:t]. As a large-deviation bound, this can be expressed in terms of the martingale variance¹

Vt = ∑_{i=1}^t E[(Mi − Mi−1)² | X1:(i−1)]

in a Bernstein concentration inequality that holds at any time t, describing non-asymptotic CLT-type behavior ([BLM13a]) just as for (15.4), but with 2(e−2)Vt in place of t. Therefore, we see that we can think of Pt = exp(−Mt²/(4(e−2)Vt)).

Our correction then multiplies Pt by ln(2(e − 2)Vt), shown in Algorithm 7.

Algorithm 7. Robust p-Values from Conditionally Estimating a Test Statistic

Set M0 := E[f(X1:n)]
for t = 1, 2, ... do
    Observe Xt
    Compute Mt := E[f(X1:n) | X1:t] and Vt := ∑_{i=1}^t E[(Mi − Mi−1)² | X1:(i−1)]
    Compute the A-value
        At := ln(2(e−2)Vt) / exp(Mt²/(4(e−2)Vt))
    If peeking, report any At value seen so far (i.e., the minimum one).
end for

15.2.3 A Robust p-Value from General Test Statistics

Suppose the unscrupulous statistician is not restricted to compute nested conditional expectations of a fixed function f, but rather is allowed to compute arbitrary² test statistics T1, T2, ..., Tn, and we are able to evaluate their conditional mean and variance given the past under the null distribution. We describe how to extend our approach to be robust to peeking in these extremely general conditions. This is simply a matter of

subtracting off conditional bias, and using our earlier method for the residual martingale variation.

¹ Just as in Ch. 14, it is typically possible to bound pathwise concentration of an empirically calculated variance process V̂t to this process Vt, and use that if necessary.
² Though we do require uniform integrability E[|Tt|] < ∞, as is standard in martingale theory.

We define some standard notation used to discuss the discrete-time real-valued stochastic processes we now deal with. Such a process Xt is defined in a filtered probability space (Ω, F, {Ft}t≥0, P). Define the difference sequence ξt = Xt − Xt−1 for all t (with ξt being Ft-measurable), and the cumulative conditional variance process Vt = ∑_{i=1}^t E[ξi² | Fi−1]. The process is a martingale (resp. supermartingale, submartingale) if E[ξt | Ft−1] = 0 (resp. ≤ 0, ≥ 0) for all t.

We use the Doob decomposition of the stochastic process T, where T0 := 0 for convenience. This decomposition is the observation that T can be written as

Tt = Dt + Mt where the process D (the compensator process, or “drift”) and the Doob martingale M (“diffusion”) are defined as

Dt := ∑_{i=1}^t (E[Ti | Fi−1] − Ti−1),    Mt := ∑_{i=1}^t (Ti − E[Ti | Fi−1])

We assume only that the differences Tt − E[Tt | Ft−1] are bounded, or have Ft−1-measurable conditions bounding their higher moments (e.g. they are conditionally sub-exponential for all t). Nothing more is assumed about the distribution of any Tt. The process D is predictable (Dt is Ft−1-measurable, so E[Dt | Ft−1] = Dt), and can be computed from the past. The process M is a martingale accounting for all the unpredictable variation in T, so Mt = Tt − Dt will be concentrated around zero under the null, and the reported At value is calculated as before. The resulting algorithm is in Alg. 8.

Algorithm 8. Robust p-Values from General Test Statistics

Initialize T0 = 0, D0 = 0, V0 = 0
for t = 1, 2, ... do
    Compute E[Tt | Ft−1] and E[Tt² | Ft−1], and thereby

        Dt = Dt−1 + (E[Tt | Ft−1] − Tt−1)

        Vt = Vt−1 + E[(Tt − E[Tt | Ft−1])² | Ft−1]

    Observe Tt
    Compute Mt := Tt − Dt and At := ln(2(e−2)Vt) / exp(Mt²/(4(e−2)Vt))
    If peeking, report any At value seen so far (i.e., the minimum one).
end for
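A sketch of Algorithm 8 in Python follows; the callables `cond_mean` and `cond_var`, which return the conditional mean and variance of Tt given the past under the null, are assumptions of this sketch supplied by the user rather than part of the pseudocode above.

```python
import math

def robust_p_values_general(stats, cond_mean, cond_var):
    """Algorithm 8 (sketch): split T_t into compensator D_t and Doob martingale
    M_t = T_t - D_t, track the conditional variance process V_t, and report
    A_t = ln(2(e-2)V_t) / exp(M_t^2 / (4(e-2)V_t))."""
    D = V = T_prev = 0.0
    past, A_vals = [], []
    for t, T_t in enumerate(stats, start=1):
        m = cond_mean(t, past)                 # E[T_t | F_{t-1}] under the null
        D += m - T_prev                        # compensator ("drift") update
        V += cond_var(t, past)                 # cumulative conditional variance
        M = T_t - D                            # martingale ("diffusion") part
        if V > 0:
            A_t = math.log(max(2 * (math.e - 2) * V, math.e)) / math.exp(M * M / (4 * (math.e - 2) * V))
        else:
            A_t = 1.0                          # no variability accumulated yet
        A_vals.append(A_t)
        past.append(T_t)
        T_prev = T_t
    return A_vals                              # a peeker reports the minimum seen so far
```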

Note that this algorithm handles as a special case all martingale test statistics, for which Dt = 0. An empirical evaluation of this method on nontrivial test statistic martingales is out of the scope of this thesis, but we believe it is of significant interest as future work.

15.2.4 Validity and Optimality

Validity

Our method's validity – that type I error is as advertised – follows from the non-asymptotic LIL discussed in previous chapters. We prove in the appendices a much tighter, cleaner version of the "concentration" part ("upper half") of this result:

Theorem 45. Choose any integer v ≥ 1. Suppose Mt is a martingale whose differences Mt − Mt−1 are (ν, exp(e²))-sub-exponential given Ft−1. Then w.p. ≥ 1 − δ, for all t ≥ 16 exp(e²) ln(1/δ) simultaneously, |Mt| ≤ Ut/(2 exp(e²)) and

|Mt| ≤ √( 2Ut ( ln(1/δ) + lnln(1/δ) + lnln(Ut) + 4 lnlnln(Ut) ) )   (15.5)

The proof of the result is a refinement of the one in Chapter 12, in order to derive a correction tight enough to be practically relevant. The rate is extremely tight; as t → ∞, the lnln(·) term dominates, and it has a leading constant of √2 for any v ≥ 2, unimprovable by the asymptotic LIL. Section 15.4 has more details.

Many of the terms are negligible at high times and fixed δ; the lnln(1/δ) term is dwarfed by ln(1/δ), and also 4 lnlnln(Ut) ≤ 8 for Ut < 10^700, so this triply iterated logarithm term can be taken as a constant (negligible) without numerical issues. All this suggests a uniform upper bound, for all sufficiently high t simultaneously, of:

|Mt| ≤ √( 2Ut ( ln(1/δ) + lnln(Ut) ) )   ⟺   ln(Ut) / exp(Mt²/(2Ut)) ≥ δ

Comparing this result to (15.3), we get our form for At, which we use throughout this chapter.

At := ln(Ut) / exp(Mt²/(2Ut))

Strictly speaking, At is only asymptotically valid.³ Our recommendation of it for practical use is analogous to using the CLT for fixed-time Pt in practice, instead of a large-deviation Hoeffding or Bernstein bound.

³ Such concerns may be allayed by working directly from Theorem 45 without dropping terms, making At the solution to

At / ln(1/At) = ln(Ut) (lnln(Ut))⁴ / exp(Mt²/(2Ut)).

This is a valid choice of At, satisfying (15.3), and still quite tight (relatively close to our recommended At, and it can be tightened further by taking more of the multiply iterated logarithm terms in Theorem 48).

Putting together our discussions and intuition so far, we also observe that our method is valid (in the sense that the reported value has CDF below the uniform distribution, as in Fig. 15.2) even if our setting is relaxed in some ways. Halting the test early, due to unwillingness to collect more than some number N of total samples, can only increase the corrected p-value min_{t≤N} At. This corresponds to collecting less evidence against the null, and we will be even more conservative in rejecting it, although Fig. 15.2 shows that this effect diminishes drastically with increasing N. Peeking less (such as only every few samples) also makes the concentration of Eq. (15.3) overly conservative, so our method is still valid.

We use large deviation bounds to get Pt = exp(−Mt²/(2Vt)) in our analysis, and in our experiments as well. This is itself more conservative than the common method of using the exact Gaussian CDF, so it is closer to being valid under peeking, yet Fig. 15.2 shows it is still clearly statistically unsafe. This justifies its use as a baseline instead of the Gaussian CDF.

Optimality

Increasing At sufficiently will always trivially lead to concentration like Eq. (15.3), so there is a potential issue that our correction method is too conservative. But for the statistics we address, no correction method can be much better, because the concentration bound of Theorem 45 enjoys a nearly matching anti-concentration bound. This was proved in Theorem 19, which we recapitulate below for ease of reference.

Theorem 46. There are absolute constants C1, C2 such that the following is true. Fix a finite time T > C2 ln(2/δ), and fix any δ ∈ [4/ln((T−1)/3), 1/C1]. Then with probability at most 1 − C1δ, for all t ∈ [C2 ln(2/δ), T] simultaneously, |Mt| ≤ 2t/(3e²) and

|Mt| ≤ √( (2/3) t ( lnln( 3t² / (|Mt| + 2√(t/3)) ) + ln(1/(C1δ)) ) )

We believe our method is provably optimal within an absolute constant on the value it reports, i.e. Pr (∃t : At ≤ δ) ≥ Cδ for some absolute constant C. There are possible indications of this in Fig. 15.2. But Theorem 46 results in a much lower At than recommended by the upper bound, and closing this gap remains an open problem.


15.3 Discussion

15.3.1 Examples of Martingale Test Statistics

Many of the most common types of tests use martingale statistics, and can therefore be viewed as using martingale concentration bounds for a fixed time, specialized to particular situations. As mentioned earlier, such constructions can roughly be thought of as describing statistics which are asymptotically Gaussian under H0, or have otherwise exponential tails (e.g. χ² distributions), because of the ubiquitous use of the martingale CLT in deriving such statistics' null distributions. For example, here are a few overlapping classes of such statistics:

• Doob Martingale Tests: When successive statistics are nested conditional expectations of a random variable f, the situation is like in the introductory example; the statistics form a Doob martingale. Any appropriately rescaled sum of i.i.d. random variables is also a Doob martingale.

• Asymptotically Normal Test Statistics: This is a massive class of statistics which prototypically includes z-statistics, which we consider in the running example used in the figures of previous sections. It also includes a variety of others like t- and other “studentized” tests ([dlPLS08]). Maximum-likelihood estimators are often asymptotically normal as a result of similar martingale constructions (e.g. [LR06b], Ch. 12).

• Other Asymptotic Test Statistic Distributions: Many other test statistic distributions, such as the χ² arising in goodness-of-fit tests, can be derived as combinations of (asymptotically) normally-distributed variables. Such variables are typically martingales following a CLT, so our methods can be used to derive their statistics. Our techniques can also be used on non-Gaussian variables directly: the broad class of sub-exponential distributions includes Gaussians and any other distribution with exponential tails, and their defining condition specifies a family of exponential supermartingales that we use in our proofs ([BLM13a]).

• Likelihood-Ratio/Score/Wald Tests and Bayes Factors: The likelihood ratio $L_n = \prod_{i=1}^n \frac{f(X_i)}{g(X_i)}$ between probability densities f and g on i.i.d. data $X_1, \ldots, X_n$ is a deeply-studied martingale under the null that the data are generated from g (a small simulation sketch of this martingale appears after this list). This has motivated very general extensions of the classic Neyman-Pearson theory of testing $L_n$ to scenarios in which n is variable ([WW48, HH80]), including tests against composite alternatives ([LR06b]); we discuss this further in Sec. 15.3.2. Related tests often used in Bayesian parameter estimation, the score and Wald tests, are also asymptotically normal and "studentized" respectively, and are classically analyzed with martingales as well, using second-order Taylor expansions of the log likelihood. Finally, integrating over a hyperparameter prior to form a Bayes factor is a related method that has been noted to quite naturally yield useful martingales robust to optional stopping ([Vov93, KR95, SSVV11]).

• U-Statistics and Scan Statistics: U-statistics are underpinned by a deeply explored martingale structure (see e.g. [KB13]), leading to many more useful applications of the CLT for statistics that we can convert into robust p-values. Like U-statistics, scan statistics by nature involve dependent random variables, and they have also been shown to be quite generally expressible in martingale form ([PGKS05]).

• Sampling Without Replacement: The classic work [Ser74] and more recently [BM15] derive Hoeffding- and Bernstein-type (first- and second-moment) concentration inequalities for sums when sampling without replacement. This obviates independence assumptions on the data being sampled, which is ideally suited for finite populations and robust learning applications.
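As a concrete illustration of the likelihood-ratio item above, the sketch below (with two illustrative Gaussian densities standing in for $f$ and $g$) checks numerically that $L_n = \prod_{i=1}^n f(X_i)/g(X_i)$ has mean one at every $n$ when the data are generated from $g$, which is the martingale property that sequential uses of $L_n$ rest on.

```python
import numpy as np

# Likelihood-ratio martingale sketch: L_n = prod_i f(X_i)/g(X_i) with data from
# the null g, so E[L_n] = 1 for every n (since E_g[f(X)/g(X)] = integral of f = 1).
# The Gaussian densities below are illustrative choices only.
rng = np.random.default_rng(1)

def f(x):   # alternative density: N(0.5, 1)
    return np.exp(-0.5 * (x - 0.5) ** 2) / np.sqrt(2 * np.pi)

def g(x):   # null density: N(0, 1)
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

n, trials = 20, 200_000
X = rng.normal(0.0, 1.0, size=(trials, n))     # data generated under the null g
L = np.cumprod(f(X) / g(X), axis=1)            # L_n along each sample path

print(np.round(L.mean(axis=0)[[0, 4, 9, 19]], 3))   # each entry should be close to 1
```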

15.3.2 Related Work

Treating the sample size as random is central to the area of sequential testing. A prototypical example is the Sequential Probability Ratio Test (SPRT) of Wald and Wolfowitz ([WW48]), which considers the likelihood-ratio martingale in a very specific problem where the null and alternative probability densities are both known, so that it stops as early as possible given particular type I and type II error constraints (a minimal sketch appears below). The likelihood-ratio martingale has been explored for stopping in other contexts as well ([DR68b, RS70, BBW97]), including for composite hypotheses. General martingales can be used in sequential tests, a possibility which has been explored since Wald (see [Rob70a, RS70]).

More recently, the work of Chapter 14 ([BR16]) presents sequential nonparametric two-sample tests in a framework related to ours. However, such work requires changing the algorithm itself to be a sequential test in an appropriate setting, with a specified level of α. In contrast, we crucially observe that in settings where conditional means and variances can be conveniently evaluated or bounded, the p-value can be reported uniformly over time in very general settings. Sequential testing involves specifying a type I error a priori (and sometimes also type II, e.g. for the SPRT), while what we report is a minimum significance level at which the data show a deviation from the null. This is exactly analogous to the relationship between Fisher-style significance testing and Neyman-Pearson hypothesis testing – the method of this paper can be considered a robust Fisher-style significance test under martingale nulls, just as sequential testing builds on the Neyman-Pearson framework. Similarly, we do not analyze any alternative hypothesis, which would affect the power of the test (though for some common methods, we have shown in Chapter 14 that power calculations stay roughly the same relative to the batch statistic, so the choice of test statistic governs the power).
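For reference, here is a minimal SPRT sketch. The thresholds $A = \beta/(1-\alpha)$ and $B = (1-\beta)/\alpha$ are Wald's classical approximations (not derived in this chapter), and the Gaussian null/alternative densities are illustrative stand-ins for the known $g$ and $f$.

```python
import numpy as np

# Minimal SPRT sketch: sample one observation at a time, update the likelihood
# ratio L_n, and stop when it leaves (A, B). Thresholds follow Wald's classical
# approximations A = beta/(1-alpha), B = (1-beta)/alpha; the densities are
# illustrative stand-ins for the known null g and alternative f.
rng = np.random.default_rng(2)
alpha, beta = 0.05, 0.10
A, B = beta / (1 - alpha), (1 - beta) / alpha

def log_lr(x, mu0=0.0, mu1=0.5):   # log f(x)/g(x) for unit-variance Gaussians
    return (mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)

def sprt(true_mu, max_n=100_000):
    log_L, n = 0.0, 0
    while A < np.exp(log_L) < B and n < max_n:
        n += 1
        log_L += log_lr(rng.normal(true_mu, 1.0))
    return ("reject H0" if np.exp(log_L) >= B else "accept H0"), n

print(sprt(true_mu=0.0))   # data from the null: should usually accept H0
print(sprt(true_mu=0.5))   # data from the alternative: should usually reject H0
```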

We follow a line of work by Vovk and coauthors ([Vov93, SSVV11]), which develops the idea of correcting p-values uniformly over time using a "test martingale," and contains further historical references on this idea, viewed within the context of Bayes factors and likelihood ratios. This has drawn recent attention for its robustness to optional stopping as well ([Grü16]). By introducing the finite-time LIL and also showing its anti-concentration, we are able to make the idea broadly applicable even to frequentist tests that use (martingale) CLTs, with a provably optimal method.

This issue's importance has long been recognized in the practice of statistical testing ([Rob52, AMR69, Nic00, Wag07, SNS11]). The statistician typically does not know their sampling plan in advance, which is necessary for standard hypothesis tests. The plan is subject to several sources of variation: it could be unethical to continue sampling when a significant effect is detected in a clinical trial ([Ioa08]), or the experimenter could run out of resources to gather more data. Solutions to this problem are often semi-heuristic and generally involve "spending a budget of α," the willingness to wrongly reject the null, over time. Such methods are widely used ([PPA+77, Poc77]) but are not uniformly robust to sampling strategies, and their execution suffers from many complications which depend on the application ([Poc05]).

15.3.3 Proof Discussion

The techniques we use were pioneered in [RS70] (cf. Example 4 there) in a strictly asymptotic setting for Brownian motion. Our arguments here generalize some of those made in that paper ([RS70], Section 4) to finite times.

Finally, we point out that the essential arguments linking weights in the "scale" λ-domain to the iterated-logarithm order of growth of the moment were noticed as early as Feller ([Fel43]). This links back to the remarkable "Erdős–Feller–Kolmogorov–Petrowsky LIL" ([Erd42]) characterization of the optimal rate of growth of the LIL envelope, which appears to bear a resemblance to the proof of Theorem 45. An interesting open problem would be recovering that result as the limit of a non-asymptotic bound proved with our techniques. Also open is whether we can tighten the anti-concentration bound, as discussed earlier.

15.4 Proof of Tight Finite-Time LIL Concentration (Theorem 45)

We prove a generalization of Theorem 45 in this section, which is stated below as Theorem 48. The proof is similar to the main proof of Ch. 12 (in Sec. 12.3), so we adopt the setting of that chapter here: we take $M_t$ to be a martingale and $U_t$ to be an increasing "variance" process such that the process
\[
X_t^\lambda := \exp\left(\lambda M_t - \frac{\lambda^2}{2} U_t\right)
\]
is a supermartingale for any $\lambda \in \left[-\frac{1}{\exp_v(2)}, \frac{1}{\exp_v(2)}\right] \setminus \{0\}$. Section C.2.2 contains more examples of appropriate $U_t$ for various classes of martingales, including any with sub-exponential steps. In particular, the z-test correction is derived with $U_t = t$, and the cumulative variance-based corrections of Sections 15.2.2 and 15.2.3 instead use $U_t = 2(e-2)V_t$.

15.4.1 Proof Overview

Recall the optional stopping theorem for positive supermartingales.

Theorem 47 (Optional Stopping for Nonnegative Supermartingales ([Dur10], Theorem 5.7.6)). Let $M_t$ be a nonnegative supermartingale. Then if $\tau$ is a (possibly infinite) stopping time, $\mathbb{E}[M_\tau] \le \mathbb{E}[M_0]$.

The leading proportionality constant on the iterated-logarithm term in the results of Ch. 12 is $\sqrt{6}$, a multiplicative $\sqrt{3}$ factor above the LIL's asymptotic $\sqrt{2}$. This deflates the reported p-value by an amount that is exponential in this factor, not multiplicative (recall $A_t \propto \exp\left(-\frac{M_t^2}{C^2 V_t}\right)$ for $C = \sqrt{2}$ or $\sqrt{6}$), which would profoundly worsen the empirical performance of our methods if used directly. The suboptimality is chiefly because the mixing distribution over $\lambda$ is not refined enough to get optimal constants.

Here, we give a countably infinite family of averaging distributions for $\lambda$ that generalize the one used in Ch. 12, and we use them in a more direct proof method with optimal constants as $t$ grows. Theorem 45 is a special case of this result when $v = 2$. To describe this set, define
\[
\ln_v(x) = \underbrace{\ln\ln\cdots\ln}_{v \text{ times}}(x) \qquad \text{and} \qquad \exp_v(x) = \underbrace{\exp\exp\cdots\exp}_{v \text{ times}}(x)
\]
for $v = 1, 2, \ldots$. The following family of probability distributions is indexed by $v$:
\[
P_\lambda^v(d\lambda) = \frac{d\lambda}{|\lambda|\,\ln_v\!\left(\frac{1}{|\lambda|}\right) \prod_{i=1}^{v} \ln_i\!\left(\frac{1}{|\lambda|}\right)} \qquad \text{for } \lambda \in \left[-\frac{1}{\exp_v(2)}, \frac{1}{\exp_v(2)}\right] \setminus \{0\}
\]

Note that $P^1_\lambda$ is the distribution used to mix over $\lambda$ in the main proof of Chapter 12. We use $P^v_\lambda$ (whose PDF we write as $P^v(\lambda)$) to average over the exponential supermartingales in this section, proving the following theorem.

Theorem 48. Choose any integer $v \ge 1$. Then w.p. $\ge 1 - \delta$, for all $t \ge 16(\exp_v(2))^2 \ln\frac{1}{\delta}$ simultaneously, $|M_t| \le \frac{U_t}{2\exp_v(2)}$ and
\[
|M_t| \le \sqrt{2 U_t \left( \ln\frac{1}{\delta} + \ln_2\frac{1}{\delta} + \ln_2(U_t) + 2\sum_{i=3}^{v+1}\ln_i(U_t) + 2\ln_{v+1}(U_t) \right)} \tag{15.6}
\]
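To get a feel for the size of the envelope in (15.6), the following sketch evaluates it for a few values of $U_t$ and a fixed $\delta$ (only where every iterated logarithm involved is positive, as in the initial-time condition); the helper functions implement $\ln_v$ and $\exp_v$ as defined above.

```python
import math

# Sketch evaluating the deviation envelope of Theorem 48, Eq. (15.6).
# ln_v / exp_v are the v-times iterated logarithm / exponential; the envelope is
# only evaluated where every iterated logarithm involved is positive.

def ln_v(x, v):
    for _ in range(v):
        x = math.log(x)
    return x

def exp_v(x, v):
    for _ in range(v):
        x = math.exp(x)
    return x

def envelope(U, delta, v):
    inner = (math.log(1 / delta) + ln_v(1 / delta, 2) + ln_v(U, 2)
             + 2 * sum(ln_v(U, i) for i in range(3, v + 2))
             + 2 * ln_v(U, v + 1))
    return math.sqrt(2 * U * inner)

delta, v = 0.05, 2
U_min = 16 * exp_v(2, v) ** 2 * math.log(1 / delta)   # initial-time condition of Thm. 48
for U in [U_min, 1e9, 1e12]:
    print(f"U_t = {U:.3g}:  envelope = {envelope(U, delta, v):.4g}")
```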

Define some stopping times associated with each $P^v_\lambda$. The first is the initial time after which our bounds hold uniformly (see Ch. 12 for details):
\[
\tau'_v := \min\left\{ t : U_t \ge 16(\exp_v(2))^2 \ln\frac{1}{\delta} \right\} \tag{15.7}
\]
and the second is the stopping time used to prove uniform concentration bounds:
\[
\tau_v := \min\Bigg\{ t \ge \tau'_v : |M_t| > \frac{U_t}{2\exp_v(2)} \;\vee\; \Bigg[ |M_t| \le \frac{U_t}{2\exp_v(2)} \;\wedge\; \frac{|M_t|}{\sqrt{U_t}} > \sqrt{2\left( \ln\frac{1}{\delta} + \ln_2\frac{1}{\delta} + \ln_2(U_t) + 2\sum_{i=3}^{v+1}\ln_i(U_t) + 2\ln_{v+1}(U_t) \right)} \Bigg] \Bigg\} \tag{15.8}
\]

$\tau_v$ is basically the stopping time used by a peeking adversary. It is designed so that whenever it is finite, the expected moment-generating function of the stopped martingale mixture is above the large constant $\frac{1}{\delta}$.

Lemma 49. Suppose $\tau_v < \infty$. Then if the averaging distribution over $\lambda$ is $P^v_\lambda$,
\[
\mathbb{E}^\lambda\left[ X^\lambda_{\tau_v} \right] \ge \frac{1}{\delta}
\]
The proof of Theorem 48 is then a textbook stopped-martingale mixing argument, using these lemmas.

Proof of Theorem 48. Let the averaging distribution over $\lambda$ be $P^v_\lambda$. It suffices to prove that $P(\tau_v < \infty) \le \delta$, which is done as follows:
\[
1 = \mathbb{E}^\lambda\left[\mathbb{E}\left[X^\lambda_0\right]\right] \overset{(a)}{\ge} \mathbb{E}^\lambda\left[\mathbb{E}\left[X^\lambda_{\tau_v}\right]\right] \overset{(b)}{=} \mathbb{E}\left[\mathbb{E}^\lambda\left[X^\lambda_{\tau_v}\right]\right] \ge \mathbb{E}\left[\mathbb{E}^\lambda\left[X^\lambda_{\tau_v}\right] \,\middle|\, \tau_v < \infty\right] P(\tau_v < \infty) \overset{(c)}{\ge} \frac{1}{\delta}\, P(\tau_v < \infty)
\]
where (a) is by Optional Stopping (Theorem 47), (b) is by Tonelli's Theorem, and (c) is by Lemma 49.

15.4.2 Proof of Theorem 48 – Details

Proof of Lemma 49. Recall that the probability density function over $\lambda$ is $P^v(\lambda)$. Write $\tau_v$ as $\tau$ and define $K_v := \frac{1}{\exp_v(2)}$ for convenience in this proof. Take $\tau < \infty$ throughout.

\[
\mathbb{E}^\lambda\left[X^\lambda_\tau\right] = \int_{-K_v}^{0} X^\lambda_\tau \, P^v(\lambda)\,d\lambda + \int_{0}^{K_v} X^\lambda_\tau \, P^v(\lambda)\,d\lambda
= \int_{-K_v}^{0} \exp\left(\lambda M_\tau - \tfrac{1}{2}\lambda^2 U_\tau\right) P^v(\lambda)\,d\lambda + \int_{0}^{K_v} \exp\left(\lambda M_\tau - \tfrac{1}{2}\lambda^2 U_\tau\right) P^v(\lambda)\,d\lambda
\ge \int_{0}^{K_v} \exp\left(\lambda |M_\tau| - \tfrac{1}{2}\lambda^2 U_\tau\right) P^v(\lambda)\,d\lambda \tag{15.9}
\]

The proof breaks into two cases, depending on how $\tau_v$ is triggered in Eq. (15.8): either $|M_\tau| > \frac{U_\tau}{2} K_v$, or $|M_\tau| \le \frac{U_\tau}{2} K_v$ but $|M_\tau|$ violates the iterated-logarithm bound.

Uτ First Case: |Mτ | > 2 Kv – Uτ If |Mτ | > 2 Kv, (15.9) becomes

Z Kv   λ h λ i λKvUτ 1 2 v E Xτ ≥ exp − λ Uτ P (λ) 0 2 2 2! U K2 Z Kv U  K  = exp τ v exp − τ λ − v Pv(λ) 8 0 2 2  2 r    Uτ Kv 2π 1 p = exp 1 − 2Φ − Kv Uτ 8 Uτ 2 where the last equality evaluates the Gaussian integral by changing variables to x := √   Kv Uτ λ − , and Φ is the standard normal CDF. Now Uτ is high enough such that √2 2Φ− 1 K U  ≤ √1 , giving 2 v τ e π

U K2  √ √ exp τ v √ ! √ !  2  λ h λ i 8 2 2 Uτ Kv E Xτ ≥ √ 2π − ≥ 2π − exp Uτ e e 16

Uτ Using U ≥ U 0 gives the result, when |M | > K . So for the rest of the proof we can τ τv τ 2 v Uτ assume |Mτ | ≤ 2 Kv.

Uτ Second Case: |Mτ | ≤ 2 Kv – Writing Z := √|Mτ | as the standardized process, Uτ

Z Kv   λ h λ i 1 2 v E Xτ ≥ exp λ |Mτ | − λ Uτ P (λ) 0 2 Z √2Z   Uτ p 1 2 v ≥ exp λZ Uτ − λ Uτ P (λ) (15.10) 0 2 170

Completing the square in Eq. (15.10) gives √  Z 2Z/ Uτ  2 λ h λ i 1 2 1  p  v E Xτ ≥ exp Z exp − λ Uτ − Z P (λ) (15.11) 2 0 2 √    Z 2Z/ Uτ  2 1 2 v 2Z 1  p  ≥ exp Z P √ exp − λ Uτ − Z dλ (15.12) 2 Uτ 0 2 where Eq. (15.12) follows by the monotonicity of Pv (λ). Here we can evaluate√ the inner Gaussian integral by transforming the integrand to be in terms of x := λ Uτ − Z: √ Z 2Z/ Uτ  2 r Z Z   1  p  2π 1 1 2 exp − λ Uτ − Z dλ = √ exp − x dx 0 2 Uτ −Z 2π 2 r2π r2π = (Φ(Z) − Φ(−Z)) = (1 − 2Φ(−Z)) Uτ Uτ

exp(−x2/2) Substituting this into Eq. (15.12) and using the bound Φ(x) ≤ √ , |x| 2π    r λ h λ i 1 2 v 2Z 2π E Xτ ≥ exp Z P √ (1 − 2Φ(−Z)) 2 Uτ Uτ √ 2 ! r 2 1   2Z  √ 2e−Z /2 ≥ exp Z2 Pv √ π − Uτ 2 Uτ |Z| (a) r 2 1   2Z √ 1 ≥ exp Z2 Pv √ π − Uτ 2 Uτ e 1 2   exp 2 Z √ 1 = √   √ h  √ i π − Uτ v Uτ e 2Z lnv 2Z ∏i=1 lni 2Z exp 1 Z2 √ 1 ≥ √ √ 2 π −  v e 2Z ln Uτ lnv (Uτ )(∏i=2 lni (Uτ )) √ √ ! 1 2 1 2 2 exp 2 Z exp 2 Z = 2π − v ≥ v e Z lnv (Uτ )(∏i=1 lni (Uτ )) Z lnv (Uτ )(∏i=1 lni (Uτ ))

√ √ 2 2e−Z /2 1 where (a) uses that Z ≥ 2, so |Z| ≤ e . Using the definition of τ, Lemma 50, and h i Uτ λ λ 1 the assumption |Mτ | ≤ 2 Kv, we see that E Xτ ≥ δ , as desired. 171

Lemma 50. For any (stopping) time τ, if v u ! u 1 1 v+1 Z ≥ t2 ln + ln2 + ln2 (Uτ ) + 2 ∑ lni (Uτ ) + 2lnv+1 (Uτ ) δ δ i=3

1 2 exp 2 Z 1 then v ≥ . Z lnv (Uτ )(∏i=1 lni (Uτ )) δ Proof of Lemma 50. Let Y be the lower bound on Z that we assume in the implication. We will prove 1 2 exp 2Y 1 v ≥ Y lnv (Uτ )(∏i=1 lni (Uτ )) δ from which the result follows because the left-hand side is increasing in Y, and Z ≥ Y. By the subadditivity of the ln(·) function, ! 1 1 v+1 2lnY = ln2 + ln ln + ln2 + 2 ∑ lni (Uτ ) + 2lnv+1 (Uτ ) δ δ i=2 1 1 v+2 ≤ ln2 + ln2 + ln3 + 2 ∑ lni (Uτ ) + 2lnv+2 (Uτ ) δ δ i=3 ! 1 v+2 ≤ 2 ln2 + ∑ lni (Uτ ) + lnv+2 (Uτ ) (15.13) δ i=3

1 2 1 where the last inequality uses the bound ln δ ≤ δ . Therefore, by bounding one of the terms in the definition of Y with the inequality lnv+1 (Uτ ) ≥ 2lnv+2 (Uτ ), v u ! u 1 1 v+1 Y ≥ t2 ln + ln2 + ln2 (Uτ ) + 2 ∑ lni (Uτ ) + lnv+1 (Uτ ) + 2lnv+2 (Uτ ) δ δ i=3 v u ! ! u 1 v+1 1 v+2 = t2 ln + ∑ lni (Uτ ) + lnv+1 (Uτ ) + 2 ln2 + ∑ lni (Uτ ) + lnv+2 (Uτ ) δ i=2 δ i=3 v u ! (a) u 1 v+1 ≥ t2 ln + lnY + ∑ lni (Uτ ) + lnv+1 (Uτ ) δ i=2

1 2 1 exp( 2Y ) 1 where (a) uses Eq. (15.13). Solving for δ , v ≥ δ . Y lnv(Uτ )(∏i=1 lni(Uτ )) Meeting this definition are all distributions with exponential tails, including the exponential and χ2 distributions; further characterization is in [BLM13a], where standard 172

Chernoff-type large deviation bounds are proved from the definition above. Notice the difference to sub-Gaussian random variables, for which the inequality is required to hold ∀λ. Appendix A

Minimax Theorem

The minimax theorem is used in several key places in this dissertation. For all these invocations, it suffices to use Sion’s powerful version:

Theorem 25 (Application of [Sio58], Cor. 3.3). Suppose $X, Y$ are convex subsets of a linear topological space (e.g., $\mathbb{R}^d$ for any finite $d$) and at least one is compact. If $f$ is a real-valued function on $X \times Y$ satisfying

• $\forall x \in X$: $f(x, y)$ is upper semicontinuous and quasiconcave on $Y$

• $\forall y \in Y$: $f(x, y)$ is lower semicontinuous and quasiconvex on $X$

then
\[
\inf_{x \in X} \sup_{y \in Y} f(x, y) = \sup_{y \in Y} \inf_{x \in X} f(x, y)
\]
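As a quick numerical illustration (not a proof), the sketch below grid-searches a small bilinear game $f(x,y) = x^\top A y$ over two probability simplices, a setting where Theorem 25 applies, and confirms that the two sides of the equality agree up to grid resolution; the payoff matrix is an arbitrary illustrative choice.

```python
import numpy as np

# Sanity-check sketch of the minimax equality for a small bilinear game
# f(x, y) = x^T A y over two 2-point probability simplices (an arbitrary
# illustrative payoff matrix A); Sion's theorem applies since f is bilinear
# and the simplices are convex and compact.
A = np.array([[1.0, -2.0],
              [-1.0, 3.0]])

p = np.linspace(0, 1, 2001)                      # x = (p, 1-p)
q = np.linspace(0, 1, 2001)                      # y = (q, 1-q)
X = np.stack([p, 1 - p], axis=1)                 # (2001, 2)
Y = np.stack([q, 1 - q], axis=1)                 # (2001, 2)
F = X @ A @ Y.T                                  # F[i, j] = f(x_i, y_j)

inf_sup = F.max(axis=1).min()                    # inf_x sup_y f
sup_inf = F.min(axis=0).max()                    # sup_y inf_x f
print(f"inf sup = {inf_sup:.4f},  sup inf = {sup_inf:.4f}")
```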

Appendix B

Proofs from Chapter 8

Proof of Theorem 8. The proof follows the analysis strategy introduced in Chapter 4 closely. Examining (8.1), note that `(z,g) is linear in z, so the constrained maximization over z is basically

1 n 1 max − p z g = max − z>[p ◦ g] (B.1) n ∑ j j j n z∈[−1,1] , n i=1 z∈[−1,1] , n 1 1 n Fz≥b n Fz≥b  1  = min −b>σ + F>σ − (p ◦ g) (B.2) σ≥0p n 1 where the last equality uses Lemma 11 (a basic application of Lagrange duality proved in Chapter 4). Substituting (B.1) into (8.1) and simplifying,   n n 1 1  1  Vc = min min p j + 2(1 − p j)c + max − p jz jg j n n  ∑ n ∑  2 g∈[−1,1] p∈[0,1] n j=1 z∈[−1,1] , n j=1  1 n Fz≥b " n  # 1 1 > 1 > = c + min min p j(1 − 2c) + min −b σ + F σ − (p ◦ g) n n ∑ ≥ p 2 g∈[−1,1] p∈[0,1] n j=1 σ 0 n 1 " n  # 1 > 1 > = min −b σ + min 2c + p j(1 − 2c) + min x j σ − p jg j (B.3) ≥ p n ∑ 2 σ 0 p∈[0,1] n j=1 g j∈[−1,1]

Now we consider only the innermost minimization over g j in (B.3) for any j (following the procedure of the main proofs of [BF15a, BF16b]).

> If p j = 0, then the inner minimization’s objective is simply x j σ . Otherwise, it

174 175 can be seen that

x>σ > j min x j σ − p jg j = p j min − g j (B.4) g j∈[−1,1] g j∈[−1,1] p j > ! ! x j σ h > i = p j Ψ − 1 = x j σ − p j (B.5) p j +

 x>σ  with the minimizer g∗ = clip j . So we can rewrite Equation (B.3) as j p j

" n # 1 > 1 h h > i i Vc = min −b σ + min 2c + p j(1 − 2c) + x j σ − p j (B.6) ≥ p ∑ 2 σ 0 n j=1 p j∈[0,1] +

1 Now the minimand over p j is convex in p j; since c ∈ (0, ], it is decreasing in 2 > > p j for p j < x j σ and increasing for p j > x j σ . Therefore, it is minimized when

∗ > ∗ p j = min( x j σ ,1). Substituting this p j into the minimization over p j in Equation (B.6) gives the form of Ψ(·,c), and therefore the result. Proof of Theorem 9. We begin exactly as in the proof of Theorem 8, by rewriting the constrained maximization over z in the definition of Vα :   n n 1 1 1  Vα = min min p j + max − p jz jg j n n  ∑ n ∑  2 g∈[−1,1] p∈[0,1] , n j=1 z∈[−1,1] , n j=1  1 > 1 n 1 p≥1−α n Fz≥b " n  # 1 1 > 1 > = min min p j + min −b σ + F σ − (p ◦ g) n n ∑ ≥ p 2 g∈[−1,1] p∈[0,1] , n j=1 σ 0 n 1 1 > n 1 p≥1−α   n   1  > 1 >  = min −b σ + min p j + min x j σ − p jg j  ≥ p n ∑ 2 σ 0  p∈[0,1] , n j=1 g j∈[−1,1]  1 > n 1 p≥1−α   n 1  > 1 h h > i i = min −b σ + min p j + x j σ − p j  (B.7) ≥ p n ∑ 2 σ 0  p∈[0,1] , n j=1 +  1 > n 1 p≥1−α where at each step we follow the sequence of steps in the proof of Theorem 8. Again, the  x>σ  minimizer g∗ = clip j . j p j 176

Now we simplify Equation (B.7), introducing a Lagrange parameter for the constraint on p involving α.

" n # 1 > 1 h h > i i Vα = min min −b σ + p j + x j σ − p j n ≥ p ∑ 2 p∈[0,1] , σ 0 n j=1 + 1 > n 1 p≥1−α " n # 1 > 1 h h > i i = min max min −b σ + λ(1 − α) + p j(1 − λ) + x j σ − p j n ≥ p ∑ 2 p∈[0,1] λ≥0 σ 0 n j=1 + (B.8)

For any j ∈ [n], the summand is convex in p j. So we can apply the minimax theorem ([SF12], p. 144) to (B.8), moving minp∈[0,1]n inside the maximization while maintaining equality. This yields

" n # 1 > 1 h h > i i Vα = max min −b σ + λ(1 − α) + min p j(1 − λ) + x j σ − p j ≥ p ∑ 2 λ≥0 σ 0 n j=1 p j∈[0,1] + using the fact that the L∞ constraints over p decouple over the coordinates.

> For any λ ≥ 0, the inner minimand over p j is decreasing for p j ≤ x j σ and

> ∗ > increasing for p j > x j σ. So the minimizing p j = min( x j σ ,1). Substituting this in the minimization gives

" n    # 1 > 1 > λ Vα = max min −b σ + λ(1 − α) + Ψ x j σ, − λ ≥ p ∑ 2 λ≥0 σ 0 n j=1 2 giving the result. Proof of Prop. 10. The result follows from checking the first-order optimality condition

∂ [w(λ) − λα] = 0 ∂λ λ=λ ∗(α)

Proof of Theorem 11. The definition of V(α) follows by substituting λ ∗(α) in the defi- nition of V in terms of λ. Nonnegativity is immediate by the primal definition of V. As 177 for convexity,

1 ∂ [λ ∗(α)] ∂ [λ ∗(α)]  V 0(α) = w0(λ ∗(α)) − α − λ ∗(α) 2 ∂α ∂α 1 ∂ [λ ∗(α)]  = w0(λ ∗(α)) − α − λ ∗(α) 2 ∂α 1 = − λ ∗(α) ≤ 0 2 where the last equality is by subgradient conditions defining λ ∗(α) as the maximizer over λ. We know λ ∗(α) is decreasing with α, so V 00(α) ≥ 0 and V is convex. The boundary conditions at α = 0,1 follow by straightforward computation.

 >  x j σ Ψ p  >  Proof of Lemma 12. From the definition of Ψ, we have lim > = sgn x j σ , p→0+ x j σ p > ! K(p,x j,σ)  >  0 x j σ so lim > = sgn x j σ − lim Ψ = 0, and so K(0,x j,σ) = 0. More- p→0+ x j σ p→0+ p p > 2  >  ∂[K(p j,x j,σ)] (x j σ) 00 x j σ over, p = 3 Ψ p ≥ 0 because Ψ is convex, which makes G convex ∂ j p j j in p j. Proof of Theorem 14. Note that `(z,g) is linear in z, so the constrained maximization over z is basically

1 n max (1 − a )z ` (g ) − ` (g ) (B.9) n ∑ j j + j − j z∈[−1,1] , n i=1 1 n Fz≥b 1  1  = max − z>[p ◦ Γ(g)] = min −b>σ + F>σ − (p ◦ Γ(g)) z∈[−1,1]n, n σ≥0p n 1 1 n Fz≥b (B.10) where the last equality uses Lemma 11 (a basic application of Lagrange duality proved in Chapter 4).   For convenience define ξ j := p j `+(g j) + `−(g j) + 2(1 − p j)c. Substituting 178

(B.9) into (8.9) and simplifying,   n n 1 1 1  V = min min ξ j + max p jz j `+(g j) − `−(g j) n n  ∑ n ∑  2 g∈[−1,1] p∈[0,1] n j=1 z∈[−1,1] , n j=1  1 n Fz≥b " n  # 1 1 > 1 > = min min ξ j + min −b σ + F σ − (p ◦ Γ(g)) n n ∑ ≥ p 2 g∈[−1,1] p∈[0,1] n j=1 σ 0 n 1 " " n ## 1 > 1 1 > = min −b σ + min min ξ j + F σ − (p ◦ Γ(g)) ≥ p n n ∑ 2 σ 0 p∈[0,1] g∈[−1,1] n j=1 n 1 " n  # 1 > 1   = min −b σ + min 2(1 − p j)c + min U(σ, p j,g j) (B.11) ≥ p n ∑ 2 σ 0 p∈[0,1] n j=1 g j∈[−1,1]

> where we define U(σ, p j,g j) := p j(`+(g j) + `−(g j)) + x j σ − p jΓ(g j) . Now we follow the procedure of the main proof of [BF16b], for now considering just the innermost minimization problem ming ∈[−1,1] U(σ, p j,g j) for any j. j > If p j = 0 in (B.11), then the inner minimization’s objective is simply x j σ . Otherwise, we note that the absolute value breaks down into two cases, so the inner minimization’s objective can be simplified:

  >  > x j σ x j σ p j 2`+(g j) + if ≥ Γ(g j)  p j p j U(σ, p j,g j) =  >  > (B.12) x j σ x j σ p j 2`−(g j) − if < Γ(g j)  p j p j

This is nearly the same situation faced in the proof of Theorem 4 in Chapter 4 x>σ ([BF16b]), except with x>σ replaced by j . So we proceed with that argument, finding j p j that:

> ! x j σ min U(σ, p j,g j) = p jΨ g j∈[−1,1] p j

∗ with the minimizing g j being of the same form as Theorem 4, but dependent on the x>σ quantity j instead of x>σ. p j j 179

So we can rewrite Equation (B.11) as

V = Eq. (B.11) " n  # 1 > 1 = min −b σ + min 2(1 − p j)c + min U(σ, p j,g j) ≥ p ∑ 2 σ 0 n j=1 p j∈[0,1] g j∈[−1,1] " n " > !## 1 > 1 x j σ = min −b σ + min 2(1 − p j)c + p jΨ ≥ p ∑ 2 σ 0 n j=1 p j∈[0,1] p j

Now the minimand over p j is convex in p j, by Lemma 12. Using first-order ∗ ∗ ∗ optimality conditions, observe that p j is such that 2c = Kj,σ (p j), if the implied p j is ≤ 1, and 1 otherwise. Therefore,

∗  −1  p j = min Kj,σ (2c),1 and so we have

" n # 1 > 1 V = Eq. (B.11) = min −b σ + Q j(σ,2c) + c (B.13) ≥ p ∑ 2 σ 0 n j=1 as desired. Appendix C

Proofs from Chapter 12

In these proofs, recall the setting of Sec. 12.3 and its definition of $U_t$, and set $k := \frac{1}{3}$ and $\lambda_0 := \frac{1}{e^2(1+\sqrt{k})}$.

C.1 Proof of Theorem 19 (Anti-Concentration Bound)

In this section, let Mt be the Rademacher random walk, and C1 be as defined in Theorem 19. Our proof will use submartingales rather than supermartingales; therefore, we will employ a standard optional stopping theorem for submartingales (paralleling Theorem 33):

Theorem 51 (Optional Stopping for Submartingales ([Dur10])). Let $M_t$ be a submartingale. Then if $\tau$ is an a.s. bounded stopping time, $\mathbb{E}[M_\tau] \ge \mathbb{E}[M_0]$.

We will also construct a family of exponential submartingales, analogous to the supermartingale construction of Lemma 24 in the concentration bound proof.

Lemma 52. The process $Z_t^\lambda := \exp\left(\lambda M_t - k\lambda^2 U_t\right)$ is a submartingale for $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$.

Proof. We rely on the inequality $\cosh(x) \ge e^{kx^2}$ over $x \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$, so that
\[
\mathbb{E}\left[Z_t^\lambda \mid \mathcal{F}_{t-1}\right] = \mathbb{E}\left[\exp(\lambda\xi_t) \mid \mathcal{F}_{t-1}\right] e^{-k\lambda^2} Z_{t-1}^\lambda = \cosh(\lambda)\, e^{-k\lambda^2} Z_{t-1}^\lambda \ge Z_{t-1}^\lambda.
\]

C.1.1 Preliminaries and Proof Overview

Many aspects of this proof parallel that of the concentration bound of Theorem 22. Again, the idea is to choose $\lambda$ stochastically from a probability space $(\Omega_\lambda, \mathcal{F}_\lambda, P_\lambda)$ such that $P_\lambda(d\lambda) = \frac{d\lambda}{|\lambda|\ln^2\left(\frac{1}{|\lambda|}\right)}$; the parameter $\lambda$ is chosen independently of $\xi_1, \xi_2, \ldots$, so that $Z_t^\lambda$ is defined on the product space. As in the concentration bound proof, write $\mathbb{E}^\lambda[\cdot]$ to denote the expectation with respect to $(\Omega_\lambda, \mathcal{F}_\lambda, P_\lambda)$. For consistency with previous notation, we continue to write


$\mathbb{E}[\cdot]$ to denote the expectation w.r.t. the original probability space $(\Omega, \mathcal{F}, P)$, which encodes the stochasticity of $M_t$.

Just as for the concentration bound, our proof of this anti-concentration bound reasons about the value of a particular stopping time. However, the stopping time used here is slightly different, because the conditions for convergence of our submartingales are more restrictive than those for supermartingales. To be concrete, define $\sigma_\delta := \frac{e^4}{k}\ln\frac{2}{\delta}$. For a given finite time horizon $T > \sigma_\delta$, we use a stopping time defined as
\[
\tau(T) := \min\left\{ \min\left\{ t \in [\sigma_\delta, T) : |M_t| > \frac{2kU_t}{e^2} \;\vee\; \left[ |M_t| \le \frac{2kU_t}{e^2} \;\wedge\; |M_t| > \sqrt{2kU_t \ln\!\left( \frac{\ln\!\left(\frac{2kU_t}{|M_t| + 2\sqrt{kU_t}}\right)}{C_1\delta} \right)} \right] \right\},\; T \right\} \tag{C.1}
\]

We also require a moment bound for the λ-mixed submartingales, whose proof is in Section C.1.3.   2  Mt  15exp kU  4 t 2k    if |Mt| ≤ 2 Ut h i  ln 2kU√t e λ λ |Mt |+2 kUt Lemma 53. For any t, Zt ≤ Gt :=  2  E Mt  7exp 4kU  t 2k    if |Mt| > 2 Ut  ln 2√kUt e 2 kUt Finally, we use a “one-sided Lipschitz” characterization of Mt – that Gt will not grow too fast when t ≈ τ(T): e4 14 Lemma 54. For any T > k ln2, Gτ(T) ≤ 11 Gτ(T)−1. With these tools, the proof can be outlined. Proof Sketch of Theorem 19. Here fix T and write τ = τ(T) as defined in (C.1). It suffices to prove that P(τ < T) ≥ δ under the given assumption on δ. We have

(a) λ h h λ ii (b) h λ h λ ii 1 ≤ E E Zτ = E E Zτ (c) ≤ E [Gτ ] ≤ E [Gτ | τ < T]P(τ < T) + E [Gτ | τ = T] (d) 14 ≤ ( [G | τ < T]P(τ < T) + [G | τ = T]) (C.2) 11 E τ−1 E τ−1 where (a) is by Optional Stopping (Theorem 51), (b) is by Tonelli’s Theorem, (c) is by Lemma 53, and (d) is by Lemma 54. The result is then proved by upper-bounding E [Gτ−1 | τ < T] and E [Gτ−1 | τ = T], using the upper bounds on |Mτ−1| given by the definition of τ. 182

C.1.2 Full Proof of Theorem 19 Proof of Theorem 19. Throughout this proof, fix T and write τ = τ(T) as defined in (C.1). Also let C1,C2 be the absolute constants defined in the theorem statement. It suffices to prove that P(τ < T) ≥ δ under the given assumption on δ. We do this by working with (C.2) from the proof sketch; this states that 11 ≤ [G | τ < T]P(τ < T) + [G | τ = T] (C.3) 14 E τ−1 E τ−1 By definition of τ, we have v u    u 2kUτ√−1 u ln 2k u  |Mτ−1|+2 kUτ−1  |Mτ−1| ≤ 2 Uτ−1 and |Mτ−1| ≤ u2kUτ−1 ln  e t  C1δ 

(C.4)

Therefore, substituting the definition in (C.4) into the definition of Gτ−1,

 M2  15exp τ−1 4kUτ−1 15 Gτ−1 =   ≤ s (C.5) 2kUτ−1   ln √ 2kUτ√−1 |Mτ−1|+2 kUτ−1 C1δ ln |Mτ−1|+2 kUτ−1

So from (C.5),      15  15 11 E [Gτ−1 | τ = T] ≤ E s | τ = T ≤ √ = (C.6)     C1 28  2kUT−1  C1δ ln √ |MT−1|+2 kUT−1 where the last inequality is by the assumption δ ≥ 4 and Lemma 55. ln(kUT−1) 183

Also, from (C.5),      15  E [Gτ−1 | τ < T] ≤ E  | τ < T s     2kUτ−1  C1δ ln √ |Mτ−1|+2 kUτ−1     (a)  15  ≤ E  | τ < T s     2kUτ−√1  C1δ ln −2 2ke Uτ−1+2 kUτ−1        15  (b) 15 = E v | τ < T ≤ √ u  −1!  C1δ u 1 1  tC1δ ln 2 + √  e kUτ−1 15 11 ≤ √ = δ C1 28δ

2k e4 where (a) uses |Mτ−1| ≤ e2 Uτ−1 (by (C.4)) and (b) uses Uτ−1 = τ − 1 ≥ k ln2 − 1. 1  1 Substituting this and (C.6) into (C.3) gives 1 ≤ 2δ P(τ < T) + 2 . Therefore, P(τ < T) ≥ δ, finishing the proof. The proof of Theorem 19 requires a supporting lemma, which is proved in Section C.1.4. Lemma 55. Within the event {τ(T) = T}, if δ ≥ 4 , then ln(kUT−1) 1 ≤ 1 s   δ ln 2kUT√−1 |MT−1|+2 kUT−1 184

C.1.3 Supporting Proofs Proof of Lemma 53. First note that

/e2 h i Z 0 2 d Z 1 2 d λ λ λMt −kλ Ut λ λMt −kλ Ut λ E Zt = e + e 2  2 2 −1/e 1 0 λ ln 1  −λ ln −λ λ "  2  0  M 2 M Z −kU λ− t dλ t t 2kUt = exp e 2 4kUt −1/e2  1  −λ ln −λ # 1/e2  M 2 Z −kU λ− t dλ t 2kUt + e 2 0 1  λ ln λ  2  1/e2  |M | 2 M Z −kU λ− t dλ t t 2kUt ≤ 2exp e 2 (C.7) 4kUt 0 1  λ ln λ

Suppose that |Mt | ≤ 1 . Define an integer N such that 2kUt e2 r r |Mt| N − 1 1 |Mt| N + ≤ 2 ≤ + (C.8) 2kUt kUt e 2kUt kUt

(Note that we are guaranteed N ≥ 1 by the assumption |Mt | ≤ 1 .) Combining this with 2kUt e2 (C.7),

|M |  2 " t  |M | 2 h i M Z 2kUt −kU λ− t dλ λ λ t t 2kUt E Zt ≤ 2exp e 2 4kUt 0 1  λ ln λ |M | q N−1 t + i+1  |M | 2 # Z 2kUt kUt −kU λ− t dλ + e t 2kUt (C.9) ∑ |M | q 2 t + i 1  i=0 2kUt kUt λ ln λ Now we have

|Mt | |M | |M | t  |M | 2 t " # 2kUt Z 2kUt −kU λ− t dλ Z 2kUt dλ 1 1 e t 2kUt 2 ≤ 2 = 1 = 2kU 0 1  0 1  ln ln t λ ln λ λ ln λ λ 0 |Mt | 185

Substituting this and Lemma 56 into (C.9), we get    h i  M2  1 1 e 3 λ Zλ = 2exp t + + E t  2kU     2  4kU t 2kUt e − 1 e t ln |M | ln √ 2ln √ t |Mt |+2 kUt 1+e2/ k  2     2  exp Mt 15exp Mt 4kUt 3 4kUt ≤ 6 + ≤ = G     2    t ln 2kU√t ln e √ ln 2kU√t |Mt |+2 kUt 1+e2/ k |Mt |+2 kUt

2kUt which proves the result when |Mt| ≤ e2 . 2kUt Alternatively, if |Mt| > e2 , from (C.7) we have

 2  1/e2  |M | 2 h i M Z −kU λ− t dλ λ λ t t 2kUt E Zt ≤ 2exp e 2 4kUt 0 1  λ ln λ  2  1/e2  |M | 2 ! 1/e2 ! (a) M Z −kU λ− t Z dλ 2 t t 2kUt ≤ 2e exp e dλ 2 4kUt 0 0 1  λ ln λ  2  1/e2  |M | 2 ! M Z −kU λ− t = e2 exp t e t 2kUt dλ 4kUt 0 |M |  2  t  |M | 2 ! M Z 2kUt −kU λ− t ≤ e2 exp t e t 2kUt dλ 4kUt −∞  2  √ √ Mt  2   (b)  2  7exp 2 Mt π 2 Mt π 4kUt = e exp √ ≤ e exp √ ≤   4kUt 2 kUt 4kUt 2ln( kUt) ln 2√kUt 2 kUt where√ (a) is by Chebyshev’s integral inequality (Lemma 58), and (b) is because ln√( kUt ) ≤ 1. kUt Lemma 56. Define N as in (C.8). Then

|M | q N−1 t + i+1  |M | 2 Z 2kUt kUt −kU λ− t dλ e t 2kUt ∑ |M | q 2 t + i 1  i=0 2kUt kUt λ ln λ   1 e 3 ≤ +     2  ln 2kU√t e − 1 2ln e √ |Mt |+2 kUt 1+e2/ k

Proof. For convenience, define µ := |Mt | and σ := √1 . In particular, this means that √ 2kUt kUt √ 1 µ + σ N − 1 ≤ e2 ≤ µ + σ N. 186

√ |Mt | q i+1 2 N−1 Z +  |Mt |  N−1 Z µ+σ i+1 (λ−µ)2 2kUt kUt −kUt λ− dλ − dλ e 2kUt = e σ2 ∑ |M | q 2 ∑ √ 2 t + i 1  µ+σ i 1  i=0 2kUt kUt λ ln λ i=0 λ ln λ √ 2 √ N−1 (σ i) µ+σ i+1 − Z dλ σ2 ≤ ∑ e √ 2 µ+σ i 1  i=0 λ ln λ   N−1 −i 1 1 = e  −  ∑  1   1  i=0 ln √ ln √ µ+σ i+1 µ+σ i √   µ+σ i+1   N−1 −i N−1 ln e −i 1 µ+σ ≤ = e  +  ∑  1  ∑  1   1   1  i=0 ln √ i=0 ln ln ln √ µ+σ i+1 µ+σ µ+σ µ+σ i+1 √   σ( i+1−1)  N−1 ln 1 + 1 −i µ+σ ≤ e 1 +   1  ∑  1  ln i=0 ln √ µ+σ µ+σ N  √   i  N−1 ln 1 + σ 1 −i  µ+σ  ≤ e 1 +   1  ∑   i=  1  ln µ+σ 0 ln 1 +σ e2   √  ∞ ∞  1 −i −i ln 1 + i ≤  e + e    1  ∑ ∑  e2  ln i=0 i=0 ln µ+σ 1+e2σ   ∞ √ 1 e 1 −i   ≤  + e ln 1 + i   1   e2  ∑ ln e − 1 ln √ i=0 µ+σ 1+e2/ k   1 e 3 ≤ + (C.10)     2  ln 1 e − 1 2ln e √ µ+σ 1+e2/ k where (C.10) follows from Lemma 57. √ ∞ −i  3 Lemma 57. ∑i=0 e ln 1 + i ≤ 2 √ Proof. Take f (x) = e−x/2 ln(1 + x) for x ≥ 0. Note that the derivative

1  1 √  f 0(x) = e−x/2 √ √ − ln1 + x 2 x(1 + x) √ √ 1 √ x f 0 x Now, x(1+ x) is monotone decreasing and ln(1 + ) is monotone increasing, so ( ) 187 has exactly one root, corresponding to the maximum of f (x). This can be numerically confirmed to occur at x∗ ≈ 0.745, and f (x∗) ≤ 0.5. So √ ∞  √  ∞   √  1 ∞ e 3 ∑ e−i ln 1 + i ≤ ∑ e−i/2 e−i/2 ln 1 + i ≤ ∑ e−i/2 = √ ≤ i=0 i=0 2 i=0 2( e − 1) 2

C.1.4 Ancillary Results and Proofs Proof of Lemma 54. Note that by definition of τ, v u    u 2kUτ√−1 u ln 2k u  |Mτ−1|+2 kUτ−1  |Mτ−1| ≤ 2 Uτ−1 and |Mτ−1| ≤ u2kUτ−1 ln  e t  C1δ 

(C.11)

Firstly, we have

 M2   M2  exp τ exp τ  2  (a)   4kUτ 4kU − 2M ξ + ξ 1 1 7 ≤ τ 1 = exp τ−1 τ τ ≤ exp + ≤  M2   M2  4kU e2 144 6 exp τ−1 exp τ−1 τ−1 4kUτ−1 4kUτ−1 (C.12)

2 2k ξτ where (a) is because M −1ξ ≤ |M −1| ≤ U −1 by (C.11), and because = τ τ τ e2 τ 4kUτ−1 1 ≤ 1 = 1 . 4kUτ−1 4k(108) 144 We write τ(T) more concisely as τ here, and note that the minimum value of T implies that τ(T) ≥ 108, a fact we will use throughout the proof. Also, we have   2kUτ−1     ln √ 2kUτ√ 2kUτ√ |M |+2 kU (a) ln ln τ−1 τ−1 |Mτ−1|+2 kUτ |Mτ−1|+2 kUτ   ≤   ≤   ln 2kU√τ ln 2kU√τ ln 2kUτ √ |Mτ |+2 kUτ |Mτ |+2 kUτ 1+|Mτ−1|+2 kUτ   ln 2kUτ√ |Mτ−1|+2 kUτ ≤   (C.13) ln 2kUτ√ − ln(13/12) |Mτ−1|+2 kUτ   where (a) is because f (x) = ln x√ is monotone increasing for any C ≥ 0; and C+ 2x √ √ √ 13 (b) is because |Mτ−1| + 2 kUτ ≥ 2 kUτ ≥ 12, so 1 + |Mτ−1| + 2 kUτ ≤ 12 (|Mτ−1| + 188

√ 2 kUτ ). Now we have

2kU (b) 2kU kU τ√ ≥ τ √ ≥ τ 2k  2k √  |Mτ−1| + 2 kUτ 2 Uτ−1 + 2 kUτ e max e2 Uτ−1,2 kUτ  e2 1p  (c) ≥ min , kU ≥ 3 2k 2 τ where (b) uses (C.11) and (c) uses Uτ = τ ≥ 108. This means that from (C.13),   ln 2kUτ√−1 |Mτ−1|+2 kUτ−1 ln(3) 12   ≤ ≤ (C.14) ln 2kU√τ ln(3) − ln(13/12) 11 |Mτ |+2 kUτ

 M2  τ−1 15exp 4kU Note that by (C.11), |M | ≤ 2kU , so G = τ−1 . τ−1 e2 τ−1 τ−1  2kU  ln τ√−1 |Mτ−1|+2 kUτ−1 2k Suppose |Mτ | ≤ e2 Uτ . Then using Lemma 53, (C.12), and (C.14),    2  2kUτ−1 exp Mτ ln √   Gτ 4kUτ |Mτ−1|+2 kUτ−1 7 12 14 =     ≤ = G M2 2kUτ τ−1 τ−1 ln √ 6 11 11 exp |M |+2 kU 4kUτ−1 τ τ

2k Alternatively, if |Mτ | > e2 Uτ , we can use Lemma 53 and (C.12) to conclude that    2  2kUτ−1 7exp Mτ ln √ Gτ 4kUτ |Mτ−1|+2 kUτ−1 =   √  G M2 ln kU τ−1 15exp τ−1 τ 4kUτ−1  2  exp Mτ √  4kUτ ln kUτ−1 7 14 ≤   √  ≤ (1) ≤ M2 ln kU 6 11 exp τ−1 τ 4kUτ−1 189

Proof of Lemma 55. The definition of τ(= T) and the fact that |Mτ−1| ≥ 0 imply that v u    u 2kUτ√−1 u ln u  |Mτ−1|+2 kUτ−1  |MT−1| = |Mτ−1| ≤ u2kUτ−1 ln  t  C1δ 

s   ln(kUT−1) ≤ 2kUT−1 ln 2C1δ

Consequently,     2kUT−1  2kUT−1  ln √ ≥ ln  r √ |MT−1| + 2 kUT−1   ln(kUT−1)   2kUT−1 ln + 2 kUT−1 2C1δ   √  √  (a)  2kUT−1  2kUT−1 = lnr  ≥ lnq √    ln(kU )  √  2  ln T−1 + 2 ln ln (kUT−1) + 2 2C1δ √ ! (b) kU (c) 1 ≥ ln T−1 ≥ ln(kU ) (C.15) 1/16 T−1 (kUT−1) + 1 4

4 1/8 where (a) uses that δ ≥ and 8C1 ≥ 1, (b) uses that ln(ln(kUT−1)) ≤ (kUT−1) , ln(kUT−1) 1/16 1/4 4 and (c) uses that (kUT−1) +1 ≤ (kUT−1) for kUT−1 = k(τ −1) ≥ e ln(2/δ)−k ≥ 26. Therefore, 1 1 s ≤ q ≤ 1   δ kU ln(kU ) δ ln 2 T√−1 4 T−1 |MT−1|+2 kUT−1 after substituting (C.15) and again using δ ≥ 4 . ln(kUT−1) Chebyshev’s Integral Inequality is a standard result, but we give a short proof for completeness. Lemma 58 (Chebyshev’s Integral Inequality). If f (x) and g(x) are respectively mono- R b tonically increasing and decreasing functions over an interval (a,b], and a f (x)dx and R b a g(x)dx are both defined and finite, then Z b 1 Z b Z b  f (x)g(x)dx ≤ f (x)dx g(x)dx a b − a a a 190

Proof. By the monotonicity properties of the functions, we know for any x,y ∈ (a,b] that ( f (x) − f (y))(g(x) − g(y)) ≤ 0. Therefore,

Z b Z b 0 ≥ ( f (x) − f (y))(g(x) − g(y)) dx dy a a Z b Z b Z b  = 2(b − a) f (x)g(x)dx − 2 f (x)dx g(x)dx a a a which yields the result upon simplification.

C.2 Miscellaneous Results: Concentration Bounds and Extensions

C.2.1 Proof of Lemma 28 As outlined in Section 12.3.2, we choose λ stochastically from a probability dλ −2 −2 space (Ωλ ,Fλ ,Pλ ) such that Pλ (dλ) = 2 on λ ∈ [−e ,e ] \{0}.  1  |λ| ln |λ| Take an arbitrary time t ≥ τ0. For outcomes within Aδ , we have the following:

Z 0 Z 1/e2 λ h λ i λ dλ λ dλ E Xt = Xt + Xt 2  2 2 −1/e 1 0 λ ln 1  −λ ln −λ λ Z 0 Z e−2 M − 1 2U dλ M − 1 2U dλ = eλ t 2 λ t + eλ t 2 λ t −2  2 2 −e 1 0 λ ln 1  −λ ln −λ λ e−2 Z 1 2 dλ λ|Mt |− 2 λ Ut ≥ e 2 0 1  λ ln λ  2  e−2  |M | 2 M Z − 1U λ− t dλ t 2 t Ut = exp e 2 (C.16) 2Ut 0 1  λ ln λ |M | √  2  t 1+ k  |M | 2 M Z Ut ( ) − 1U λ− t dλ ≥ exp t e 2 t Ut (C.17) |M | √ 2 2Ut t (1− k) 1  Ut λ ln λ 191

Simplifying (C.17) further, √  2   2! |Mt | 1+ k h i M 1 M Z Ut ( ) dλ λ Xλ ≥ exp t exp − kU t E t t |M | √ 2 2Ut 2 Ut t (1− k) 1  Ut λ ln λ √  1+ k   M2  ln √ = exp t (1 − k) 1− k 2U     t ln Ut √ ln Ut √ |Mt |(1+ k) |Mt |(1− k)  M2  1 ≥ 2exp t (1 − k) 2U     t ln Ut √ ln Ut √ |Mt |(1+ k) |Mt |(1− k)  2  2exp Mt (1 − k) 2Ut ≥   (C.18) ln2 √Ut (1− k)|Mt |

Take τ to be any stopping time as in the lemma statement. Then from (C.18),    2  2exp Mτ (1 − k) h λ h λ ii  2Uτ  EA E X ≥ EA   δ τ δ     ln2 √Uτ (1− k)|Mτ |

finishing the proof.

C.2.2 Hoeffding and Bernstein Concentration Bounds

In this section, we show that the results of Section 12.2 can be proved through simple extensions of the proof of Theorem 18. That proof is the subject of Section 12.3. It applies to the Rademacher random walk, but uses the i.i.d. Rademacher assumption only through an exponential supermartingale construction (Lemma 24). Theorem 18 can be generalized significantly beyond the Rademacher random walk by simply replacing the construction with other similar exponential constructions, leaving the remainder of the proof essentially intact as presented in Section 12.3. To be specific, the rest of that proof works unchanged if the construction has the following properties:

1. The construction should be of the same form as Lemma 24:
\[
X_t^\lambda = \exp\left(\lambda M_t - \frac{\lambda^2}{2} U_t\right)
\]

for some nondecreasing process Ut. (The proof of Theorem 18 sets Ut = t.)

2. $X_t^\lambda$ should be a supermartingale for $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right] \setminus \{0\}$.

The second condition – the supermartingale construction – is true whenever the martingale differences are conditionally sub-exponential; given the past, they have exponentially decaying tails of probability mass (see [BLM13a] for details). Now we give two standard exponential supermartingale constructions with these properties. We first give a construction leading directly to Theorem 21, when it is used to replace Lemma 24 in the proof of Theorem 18.

Lemma 59. Suppose the difference sequence is uniformly bounded, i.e. $|\xi_t| \le e^2$ a.s. for all $t$. Then the process $X_t^\lambda := \exp\left(\lambda M_t - \lambda^2(e-2)V_t\right)$ is a supermartingale for any $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$.

Proof of Lemma 59. It can be checked that $e^x \le 1 + x + (e-2)x^2$ for $x \le 1$. Then for any $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$ and $t \ge 1$,
\[
\mathbb{E}\left[\exp(\lambda\xi_t) \mid \mathcal{F}_{t-1}\right] \le 1 + \lambda\mathbb{E}\left[\xi_t \mid \mathcal{F}_{t-1}\right] + \lambda^2(e-2)\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right] = 1 + \lambda^2(e-2)\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right] \le \exp\left(\lambda^2(e-2)\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right]\right)
\]
using the martingale property on $\mathbb{E}\left[\xi_t \mid \mathcal{F}_{t-1}\right]$. Therefore, $\mathbb{E}\left[\exp\left(\lambda\xi_t - \lambda^2(e-2)\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right]\right) \mid \mathcal{F}_{t-1}\right] \le 1$, so
\[
\mathbb{E}\left[X_t^\lambda \mid \mathcal{F}_{t-1}\right] \le X_{t-1}^\lambda
\]

The second construction leads similarly to Theorem 20, when combined with the theorem's assumption of a uniformly bounded difference sequence; it has better constants than Lemma 59.

Lemma 60. The process $X_t^\lambda := \exp\left(\lambda M_t - \frac{\lambda^2}{6}(2V_t + Q_t)\right)$ is a supermartingale for any $\lambda \in \mathbb{R}$.

Proof of Lemma 60. Consider the following inequality: for all real $x$,
\[
\exp\left(x - \frac{1}{6}x^2\right) \le 1 + x + \frac{1}{3}x^2 \tag{C.19}
\]
Suppose (C.19) holds. Then for any $\lambda \in \mathbb{R}$ and $t \ge 1$, $\mathbb{E}\left[\exp\left(\lambda\xi_t - \frac{\lambda^2}{6}\xi_t^2\right) \mid \mathcal{F}_{t-1}\right] \le 1 + \lambda\mathbb{E}\left[\xi_t \mid \mathcal{F}_{t-1}\right] + \frac{\lambda^2}{3}\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right] = 1 + \frac{\lambda^2}{3}\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right] \le \exp\left(\frac{\lambda^2}{3}\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right]\right)$, by using the martingale property on $\mathbb{E}\left[\xi_t \mid \mathcal{F}_{t-1}\right]$. Therefore, we have that
\[
\mathbb{E}\left[\exp\left(\lambda\xi_t - \frac{\lambda^2}{6}\xi_t^2 - \frac{\lambda^2}{3}\mathbb{E}\left[\xi_t^2 \mid \mathcal{F}_{t-1}\right]\right) \,\middle|\, \mathcal{F}_{t-1}\right] \le 1
\]
so $\mathbb{E}\left[X_t^\lambda \mid \mathcal{F}_{t-1}\right] \le X_{t-1}^\lambda$ and the result is shown.

It only remains to prove (C.19), which is equivalent to showing that the function $f(x) = \exp\left(x - \frac{1}{6}x^2\right) - 1 - x - \frac{1}{3}x^2 \le 0$. This is done by examining derivatives. Note that $f'(x) = \left(1 - \frac{x}{3}\right)\exp\left(x - \frac{1}{6}x^2\right) - 1 - \frac{2}{3}x$, and $f''(x) = \left(-\frac{1}{3} + \left(1 - \frac{x}{3}\right)^2\right)\exp\left(x - \frac{1}{6}x^2\right) - \frac{2}{3} = \frac{2}{3}\left(e^y(1-y) - 1\right)$ where $y := x - \frac{1}{6}x^2$. Here $e^y \le \frac{1}{1-y}$ for $y < 1$, and $e^y(1-y) \le 0$ for $y \ge 1$, so $f''(x) \le 0$ for all $x$. Since $f'(0) = f(0) = 0$, the function $f$ attains a maximum of zero over its domain, proving (C.19) and the result.
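The scalar inequalities these two constructions rest on are easy to check numerically; the sketch below verifies $e^x \le 1 + x + (e-2)x^2$ for $x \le 1$ (Lemma 59) and Eq. (C.19) on grids whose ranges are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check (on a grid; ranges are illustrative) of the scalar inequalities
# behind the supermartingale constructions:
#   Lemma 59:  e^x <= 1 + x + (e-2) x^2         for x <= 1
#   Lemma 60:  exp(x - x^2/6) <= 1 + x + x^2/3  for all real x   (C.19)
e = np.e

x1 = np.linspace(-50.0, 1.0, 2_000_001)
gap1 = 1 + x1 + (e - 2) * x1 ** 2 - np.exp(x1)
print("Lemma 59 inequality holds on grid:", bool((gap1 >= -1e-12).all()))

x2 = np.linspace(-50.0, 50.0, 2_000_001)
gap2 = 1 + x2 + x2 ** 2 / 3 - np.exp(x2 - x2 ** 2 / 6)
print("Eq. (C.19) holds on grid:", bool((gap2 >= -1e-12).all()))
```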

In order for $X_t^\lambda$ to satisfy the conditions at the beginning of this section, the differences $\xi_t$ need not be uniformly bounded, but rather can simply satisfy conditions on their higher moments. For completeness, here is an example of much weaker sufficient conditions; this is a direct adaptation of the method conventionally used to prove a general version of Bernstein's inequality ([BLM13a], Thm. 2.10).

Lemma 61. Suppose that for all $k \ge 3$ and all $i$,
\[
\mathbb{E}\left[\left|\xi_i\right|^k \mid \mathcal{F}_{i-1}\right] \le \frac{1}{2}\, k!\, \left(e/\sqrt{2}\right)^{2(k-2)}\, \mathbb{E}\left[\xi_i^2 \mid \mathcal{F}_{i-1}\right]
\]
Then for any $\lambda \in \left[-\frac{1}{e^2}, \frac{1}{e^2}\right]$, the process
\[
X_t^\lambda := \exp\left(\lambda M_t - \lambda^2 V_t\right)
\]
is a supermartingale.

Proof of Lemma 61. Taking the Taylor expansion of the exponential function,

" ∞ k # λ k E [exp(λξt) | Ft−1] = 1 + λE [ξt | Ft−1] + E ∑ ξt | Ft−1 k=2 k! 2 ∞ k λ  2  λ h k i = 1 + E ξt | Ft−1 + ∑ E ξt | Ft−1 2 k=3 k! 2 ∞ √ λ  2  1  2  k 2(k−2) ≤ 1 + E ξt | Ft−1 + E ξt | Ft−1 ∑ λ (e/ 2) 2 2 k=3 2 ∞ λ  2  2 k−2 ≤ 1 + E ξt | Ft−1 ∑ (|λ/2|e ) 2 k=2 194 using the martingale property on E [ξt | Ft−1], monotone convergence, and the lemma 2 assumption. For all |λ| < e2 , the infinite geometric series is summable, giving

2 λ  2  E [exp(λξt) | Ft−1] ≤ 1 + E ξt | Ft−1 2(1 − |λ/2|e2)

1 So for all |λ| < e2 ,

2 λ  2  2  2  [exp(λξt) | F ] ≤ 1 + ξ | F ≤ exp λ ξ | F E t−1 2(1 − (1/2))E t t−1 E t t−1

h λ i λ  2  2   λ Therefore, E Xt | Ft−1 ≤ Xt−1E exp λξt − λ E ξt | Ft−1 | Ft−1 ≤ Xt−1.

C.2.3 Initial Time Conditions

As discussed in Section 12.2.2 (in Remark 3), this section outlines how to remove the initial time condition for our martingale concentration bounds, using the proof technique of this chapter. To demonstrate, we extend Theorem 22 here to hold over all times, using the placeholder variance process $U_t$ and other notation as in Section 12.3 (e.g. $\lambda_0$). The Hoeffding- and Bernstein-style results of Section 12.2 can be extended in exactly the same way, by redefining $U_t$. It suffices to show a uniform concentration bound for $|M_t|$ over $t < \tau_2(\delta) := \min\left\{ s : U_s \ge \frac{2}{\lambda_0^2}\ln\frac{2}{\delta} \right\}$.

Theorem 62. Fix any $\delta > 0$. With probability $\ge 1 - \delta$, for all $t < \tau_2(\delta)$ simultaneously,
\[
|M_t| \le \frac{2}{\lambda_0}\ln\frac{2}{\delta}
\]

Proof. Write $\tau_2(\delta)$ as $\tau_2$ for convenience. Define the stopping time
\[
\tau = \min\left\{ t \le \tau_2 : |M_t| > \frac{2}{\lambda_0}\ln\frac{2}{\delta} \right\}
\]
Then it suffices to prove that $P(\tau < \tau_2) \le \delta$. On the event $\{\tau < \tau_2\}$, we have $|M_\tau| > \frac{2}{\lambda_0}\ln\frac{2}{\delta}$ by definition of $\tau$. Therefore, using Lemma 25,
\[
2 \ge \mathbb{E}\left[\exp\left(\lambda_0|M_\tau| - \frac{\lambda_0^2}{2}U_\tau\right)\right] \ge \mathbb{E}\left[\exp\left(\lambda_0|M_\tau| - \frac{\lambda_0^2}{2}U_\tau\right) \,\middle|\, \tau < \tau_2\right] P(\tau < \tau_2) \overset{(a)}{\ge} \mathbb{E}\left[\exp\left(\lambda_0 \cdot \frac{2}{\lambda_0}\ln\frac{2}{\delta} - \frac{\lambda_0^2}{2}\cdot\frac{2}{\lambda_0^2}\ln\frac{2}{\delta}\right) \,\middle|\, \tau < \tau_2\right] P(\tau < \tau_2) = \exp\left(\ln\frac{2}{\delta}\right) P(\tau < \tau_2) = \frac{2}{\delta}\, P(\tau < \tau_2)
\]
where (a) uses that $|M_\tau| > \frac{2}{\lambda_0}\ln\frac{2}{\delta}$ when $\tau < \tau_2$, and that $U_\tau \le \frac{2}{\lambda_0^2}\ln\frac{2}{\delta}$ since $\tau < \tau_2$. Therefore, $P(\tau < \tau_2) \le \delta$, as desired.

Taking a union bound of Theorem 62 with Theorem 22 gives that w.h.p., for all $t$,
\[
|M_t| \le O\left( \sqrt{U_t \ln\ln U_t + \ln\frac{1}{\delta}} + \ln\frac{1}{\delta} \right)
\]

This matches the rate of Bennett/Bernstein inequalities which hold for a fixed time ([BLM13a]), except for an extra $\sqrt{\ln\ln U_t}$ factor on the Gaussian-regime term that accounts for the uniformity over time.

Appendix D

Deferred Material from Chapter 14

D.1 Proof of Theorem 40

Proof. Write K as a placeholder absolute constant in the sense of Sec. 14.2. Then for any sufficiently high n, our definitions for qn and pn tell us that

PH1 (τ ≥ n) = PH1 (∀t ≤ n : Sn ≤ qn) ≤ PH1 (Sn ≤ qn)

= PH1 (Sn − nδ ≤ qn − nδ)

≤ PH1 (Sn − nδ ≤ −Knδ) (D.1) ≤ β (D.2) for n ≥ n∗, from (14.5) and the definition of n∗. Also, using a Hoeffding bound on (D.1), −Knδ 2 we see that PH1 (τ ≥ n) ≤ e . So for any δ and β,

∞ ∞ ∗ EH1 [τ] = ∑ PH1 (τ ≥ n) ≤ n + ∑ PH1 (τ ≥ n) n=1 n=n∗ ∞ K 2 β ≤ n∗ + e−Knδ ≤ n∗ + (D.3) ∑ −Kδ 2 n=n∗ 1 − e !  1  K β K2 ≤ n∗ + βC + 1 ≤ 1 + 1 n∗ (D.4) Kδ 2 1 ln β

∗ 2 Here (D.3) sums the infinite geometric series with initial term (e−n δ )K ≤ β K, and (D.4) K ln 1 uses the inequality 1 ≤ 1 + 1 as well as n∗ ( ) ≥ β . 1−e−x x β δ δ 2

196 197

D.2 Proof of Proposition 23

Proof of Proposition 23. Since $x, x', y, y'$ are all independent,
\[
\mathbb{E}[h] = (\mathbb{E}[x] - \mathbb{E}[y])^\top (\mathbb{E}[x'] - \mathbb{E}[y']) = \delta^\top \delta
\]
Next,
\[
\mathbb{E}\left[h^2\right] = \mathbb{E}\left[\left((x-y)^\top(x'-y')\right)^2\right] = \mathbb{E}\left[(x-y)^\top(x'-y')(x'-y')^\top(x-y)\right] = \mathbb{E}\left[\mathrm{tr}\left((x-y)(x-y)^\top(x'-y')(x'-y')^\top\right)\right] = \mathrm{tr}\left( \mathbb{E}\left[(x-y)(x-y)^\top\right] \mathbb{E}\left[(x'-y')(x'-y')^\top\right] \right)
\]
Since $\mathbb{E}\left[(x-y)(x-y)^\top\right] = \Sigma_1 + \Sigma_2 + \delta\delta^\top = 2\Sigma + \delta\delta^\top$, we have
\[
\mathrm{var}(h) = \mathbb{E}\left[h^2\right] - (\mathbb{E}h)^2 = \mathrm{tr}\left[\left(2\Sigma + \delta\delta^\top\right)^2\right] - \|\delta\|^4 = 4\,\mathrm{tr}(\Sigma^2) + 4\,\delta^\top\Sigma\delta
\]
from which the result is immediate.
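A small Monte Carlo sketch corroborating these two formulas, with illustrative Gaussian choices for the two distributions (so that $\Sigma_1$, $\Sigma_2$, and $\delta$ are known exactly):

```python
import numpy as np

# Monte Carlo sanity check of Proposition 23 for h = (x - y)^T (x' - y'),
# with x, x' ~ P and y, y' ~ Q independent. The Gaussian choices of P and Q
# below are illustrative; Sigma denotes (Sigma_1 + Sigma_2)/2.
rng = np.random.default_rng(3)
d, n = 3, 400_000

delta = np.array([0.3, -0.1, 0.2])
S1 = np.diag([1.0, 2.0, 0.5])                      # Sigma_1 (covariance of P)
S2 = np.diag([0.5, 1.0, 1.5])                      # Sigma_2 (covariance of Q)
Sigma = (S1 + S2) / 2

x  = rng.multivariate_normal(delta, S1, size=n)
xp = rng.multivariate_normal(delta, S1, size=n)
y  = rng.multivariate_normal(np.zeros(d), S2, size=n)
yp = rng.multivariate_normal(np.zeros(d), S2, size=n)

h = np.einsum('ij,ij->i', x - y, xp - yp)          # h_i = (x_i - y_i)^T (x'_i - y'_i)

print("E[h]:   empirical", h.mean(), " vs  ||delta||^2 =", delta @ delta)
print("var(h): empirical", h.var(), " vs  4 tr(Sigma^2) + 4 delta^T Sigma delta =",
      4 * np.trace(Sigma @ Sigma) + 4 * delta @ Sigma @ delta)
```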

D.3 Proof of Theorem 41

We rely upon a variance-dependent form of the LIL. Upon noting that $\mathbb{E}[T_n - T_{n-1}] = 0$ and $\mathbb{E}\left[(T_n - T_{n-1})^2\right] = V_0$, it is an instance of a general martingale concentration inequality proved in Ch. 12.

Theorem 63 (Uniform Bernstein Bound (Instantiation of Thm. 21)). Suppose that $|T_n - T_{n-1}| \le 1$ w.p. 1 for all $n \ge 1$. Fix any $\xi < 1$ and define
\[
\tau_0(\xi) = \min\left\{ s : sV_0 \ge \frac{\left(1 + \sqrt{1/3}\right)^2}{e-2} \ln\frac{4}{\xi} \right\}
\]
Then with probability $\ge 1 - \xi$, for all $n \ge \tau_0$ simultaneously, $|T_n| \le \frac{2(e-2)}{1+\sqrt{1/3}}\, nV_0$ and
\[
|T_n| \le \sqrt{6(e-2)\, nV_0 \left( 2\ln\ln\left(\frac{3(e-2)e^2\, nV_0}{|T_n|}\right) + \ln\frac{2}{\xi} \right)}
\]

In principle this tight control by the second moment is enough to achieve our goals, just as the second-moment Bernstein inequality for random variables suffices for proving empirical Bernstein inequalities. However, the version we use for our empirical Bernstein bound is a more convenient though looser restatement of Theorem 63. To derive it, we reproduce the following result from the appendices of Ch. 12 concerning initial time conditions:

Proposition 64 (Theorem 62). Take any $\xi > 0$, and define $T_n$ and $\tau_0(\xi)$ as in Theorem 63. With probability $\ge 1 - \frac{\xi}{2}$, for all $n < \tau_0(\xi)$ simultaneously,
\[
|T_n| \le 2\left(1 + \sqrt{1/3}\right)\ln\frac{4}{\xi}
\]

Theorem 41 follows by loosely combining the above two uniform bounds.

ξ Proof of Theorem 41. Recall Vn := nV0. Theorem 63 gives that w.p. 1 − 2 , for all 2(e−2) n ≥ τ0(ξ/2) , |Tn| ≤  √ Vn and 1+ 1/3

s  ! 2 4 |Tn| ≤ max 3(e − 2)e , 2C Vn lnlnVn +C Vn ln (D.5) 1 1 ξ

Taking a union bound of (D.5) with Prop. 64 gives that w.p. ≥ 1 − ξ, the following is true for all n simultaneously:

  q     + 1 8 , n < ( / ) 2 1 3 ln ξ if τ0 ξ 2    r   |Tn| ≤ 2(e−2) 2 4  √ Vn and max 3(e − 2)e , 2C1Vn lnlnVn +C1Vn ln  1+ 1/3 ξ    if n ≥ τ0(ξ/2)

For all n we have |Tn| bounded by the maximum of the two cases above. The result can 2(e−2) be seen to follow, by relaxing the explicit bound |Tn| ≤  √ Vn to instead transform 1+ 1/3 lnln into [lnln]+.

D.4 Proportionality Constants and Guaranteed Cor- rectness

After observing the first few samples, regardless of how many, it is impossible to empirically conclude with certainty that the type I error of a sequential test (Fig. 14.2) has ever leveled off. And although our theory can guarantee type I error control, it is reasonable to question whether our empirically recommended prescription $C = \sqrt{2}$ is actually sound, even in the hypothetical case $n \to \infty$.

In fact, we can show that it is unsound. Consider first the biased coin example of Sec. 14.2. If $S_n$ is the test statistic, then under the null,
\[
\sup_{n \ge 1} \frac{S_n}{\sqrt{n\ln\ln n}} \ge \inf_{k \ge 1} \sup_{n \ge k} \frac{S_n}{\sqrt{n\ln\ln n}} = \lim_{k \to \infty} \sup_{n \ge k} \frac{S_n}{\sqrt{n\ln\ln n}} = \limsup_{n \to \infty} \frac{S_n}{\sqrt{n\ln\ln n}} = \sqrt{2}
\]
w.p. 1, from the asymptotic LIL of Thm. 39. So the sequential test will almost surely reject with $C = \sqrt{2}$, which is very undesirable.

We still recommend this for two reasons. Firstly, it appears not to be an empirical issue, because of $C_0$ and because of our finite $N$ needed in practice to detect the alternative. As evidence of this, we count type I violations under the fair-coin null (the maximally anti-concentrated stochastic process under the random walk) for a very large $N = 10^6$ with $C = 3$, $\alpha = 0.05$, $C_0 = \log\frac{1}{\alpha}$, repeatedly using $10^5$ Monte Carlo trials. We see an average of 3-5 type-I-violating sample paths (out of $10^5$) – almost no type I error, because $C$ is relatively high. Secondly, it is possible to set $C > 2$ and get provable type I error control, at the cost of a somewhat higher stopping time. In the theory, $C$ and $C_0$ can be tightened for sufficiently high sample sizes, as seen in Ch. 12 – the reason is that for sufficiently high $n$, the order of growth of the bound is dominated by $O(\sqrt{n\ln\ln n})$, and all the sources of looseness in the analysis leading to our final uniform empirical Bernstein inequality (Thm. 43) can therefore be bounded by increasing the proportionality constant on the iterated-logarithm term.

These ideas generalize cleanly beyond the biased-coin example to our other tests. Exactly the same argument as above can be used with our two-sample mean test statistic $T_n$, its variance process $V_n$, and the variance-based asymptotic LIL ([Sto70]) to give, w.p. 1,
\[
\sup_{n \ge 1} \frac{T_n}{\sqrt{V_n \ln\ln V_n}} \ge \sqrt{2}
\]
So even after taking into account the convergence rate of the empirical variance $\widehat{V}_n \to V_n$, our basic conclusions and recommendations remain the same for all our tests beyond the biased-coin setting.

D.5 Type II Error Approximation

We argue that the power of the sequential test with maximum runtime $N$ is approximately lower-bounded by the power of the batch test with $N$ examples (e.g. in (14.3) for the coin example). This argument can be made more exact. For the coin example, we can work with a more refined approximation via the CLT, when $N$ is high enough that $N\delta > q_N > p_N$. Defining $p_N^s = \frac{p_N - N\delta}{\sqrt{N}}$ and $q_N^s = \frac{q_N - N\delta}{\sqrt{N}}$,
\[
P_{H_1(\delta)}\left(\exists n \le N : S_n > q_n\right) \ge P_{H_1(\delta)}\left(S_N > q_N\right) = P_{H_1(\delta)}\left(S_N > p_N\right) - P_{H_1(\delta)}\left(q_N > S_N \ge p_N\right) = P_{H_1(\delta)}\left(S_N > p_N\right) - P_{H_1(\delta)}\left(q_N^s > \frac{S_N - N\delta}{\sqrt{N}} \ge p_N^s\right) \approx P_{H_1(\delta)}\left(S_N > p_N\right) - \left(\Phi(q_N^s) - \Phi(p_N^s)\right)
\]
When there is an abundance of data, the sequential test would typically be run with very large $N$, since it would typically stop much sooner (see Appendix D.8). So this CLT approximation is in fact extremely good, and it can be made into a lower bound if necessary with a negligible $N^{-O(1)}$ deviation term. A similar argument can be made for our more complex tests.

D.6 Proof of Lemma 42

2  2 Proof. Here, νi := hi − E hi has mean zero by definition. It has a cumulative variance process that is self-bounding:

n n  2 h 2  22i Bn := ∑ E νi = ∑ E hi − E hi i=1 i=1 n n   4  22  4 = ∑ E hi − E hi ≤ ∑ E hi i=1 i=1 (a) n  2 ≤ ∑ E hi = nV0 := An (D.6) i=1 where the last inequality (a) uses that |hi| ≤ 1, and we define the process An for conve- nience. n Applying Theorem 41 to the mean-zero random walk ∑i=1 νi gives (1 − ξ)-a.s. 201 for all t that:

n 2  2 Vbn−An = ∑ hi − E hi i=1 s  4  < C (ξ) + 2C Bn[lnln]+(Bn) +C Bn ln 0 1 1 ξ s  4  ≤ C (ξ) + 2C An[lnln]+(An) +C An ln 0 1 1 ξ

This can be relaxed to s  4  An − 2C An[lnln]+(An) +C An ln 1 1 ξ

−C0(ξ) −Vbn ≤ 0 (D.7)

 4  Suppose An ≥ 108ln ξ . Then a straightforward case analysis confirms that

  4  An ≥ 8max 2C [lnln]+(An),C ln 1 1 ξ

This is precisely the condition needed to invert (D.7) using Lemma 65. Doing this yields that s p    4  An ≤ 2C1[lnln]+ 2C0(ξ) + 2Vbn +C1 ln ξ q + C0(ξ) +Vbn (D.8)

  4  For sufficiently high Vbn (O ln ξ suffices), the first term on the right-hand side of (D.8) is bounded as s    4  r   2C1[lnln]+ 2C0(ξ) + 2Vbn +C1 ln ≤ 4C1[lnln]+ 2C0(ξ) + 2Vbn ξ r   ≤ 8C1 C0(ξ) +Vbn

Resubstituting into (D.8) and squaring both sides yields the result. It remains to check  4  the case An ≤ 108ln ξ . But this bound clearly holds in the statement of the result, so the proof is finished. 202

The following lemma is useful to invert inequalities involving the iterated logarithm.

Lemma 65. Suppose $b_1, b_2, c$ are positive constants, $x \ge 8\max\left(b_1[\ln\ln]_+(x),\, b_2\right)$, and
\[
x - \sqrt{b_1 x [\ln\ln]_+(x) + b_2 x} - c \le 0 \tag{D.9}
\]
Then
\[
\sqrt{x} \le \sqrt{b_1[\ln\ln]_+(2c) + b_2} + \sqrt{c}
\]

Proof. Suppose x ≥ 8max(b1[lnln]+(x),b2). Since x ≥ 8b2, we have x x  x  0 ≤ − b2 ≤ − b1 − b2 =⇒ 8 4 8b1 x2  x  0 ≤ − b1x − b2x 4 8b1

x Substituting the assumption ≥ [lnln]+(x) gives 8b1

x2 0 ≤ − b x[lnln] (x) − b x =⇒ 4 1 + 2 p 1 b x[lnln] (x) + b x ≤ x 1 + 2 2 Substituting this into (D.9) gives x ≤ 2c. Therefore, again using (D.9), p 0 ≥ x − b1x[lnln]+(x) + b2x − c p ≥ x − b1x[lnln]+(2c) + b2x − c √ √ This is now a quadratic in x. Solving it (using x ≥ 0) gives

√ 1 p p  x ≤ b [lnln] (2c) + b + b [lnln] (2c) + b + 4c 2 1 + 2 1 + 2 p √ ≤ b1[lnln]+(2c) + b2 + c √ using the subadditivity of ·.

D.7 Proof of Theorem 44

In this proof we use ≺,, ,, to denote <,≤,>,≥,= when ignoring con- stants. Let us first bound P(τ > n) for sufficiently large n such that Vbn  Vn and that 203 p2log(1/α)lnlnn  nkδk2:

PH1 (τ > n) = 1 − PH1 (τ ≤ n) = 1 − PH1 (∃n ≤ n : Tn > qn)

≤ 1 − PH1 (Tn > qn) p √ p We have qn  pn lnlnVbn  pn lnlnn; recalling pn  2Vn log(1/α) from Eq.(14.11), we get  p √  PH1 (τ > n)  1 − PH1 Tn 2Vn log(1/α) lnlnn 2 p 2 ! Tn − nkδk 2log(1/α)lnlnn − nkδk = PH1 √ √ Vn Vn

The above expression then corresponds to a tail inequality for the centered standardized random variable on the LHS, which is a sum of bounded random variables, and hence standard sub-Gaussian inequalities yield

2 4 PH1 (τ > n)  exp(−n kδk /Vn)  kδk4  = exp −n (D.10) 8Tr(Σ2) + 8δ T Σδ

Recalling from Eq.(14.13) that

8Tr(Σ2) + 8δ T Σδ n∗  (z + z )2, β kδk4 β α

∗ we infer from this and (D.10) that PH1 (τ > nβ ) is a small constant bounded away from 1. Since τ = ∑n I(τ > n), we have by summing a geometric series that

EH1 [τ] = ∑ PH1 (τ > n) n≥1 ∗ ≤ nβ + ∑ PH1 (τ > n) ∗ n≥nβ ∗ PH (τ > n )  n∗ + 1 β β  4  1 − exp − kδk 8Tr(Σ2)+8δ T Σδ

∗ Using the inequality 1 − exp(−x) ≤ x, i.e. exp(−x) ≥ 1 − x, and substituting for nβ , we 204 get

2 T ∗ 8Tr(Σ ) + 8δ Σδ ∗ EH1 [τ] ≤ n + PH1 (τ > n ) β kδk4 β n∗ ∗ β ∗  nβ + 2 PH1 (τ > nβ ) (zβ + zα ) ∗ = (1 + O(1))nβ

D.8 Experimental Protocol

This section contains some notes on the experiments. The graphs of Fig. 14.2 are each generated by 10,000 Monte Carlo trials. The remaining graphs all use α = 0.05. The graphs of Fig. 14.3 are each generated by 1,000 Monte Carlo trials on the data, and the solid lines are the resulting stopping time distribution of the sequential test. As for the dashed lines, the true power of the batch test is also estimated by 1,000 Monte Carlo trials. Note that these experiments are run with N = 50000, even though the tests always seem to stop much sooner because the δ are sufficiently high. When data are abundant enough to detect any discernible difference between the samples, we suggest setting N very high, as this gives better power. The graphs of Fig. 14.4 are each generated by 1,000 Monte Carlo trials. Evaluating the dependence on dimensionality d is outside our scope in this paper. The high-dimensional properties of our statistic are further evaluated and discussed in [RRP+15], which shows that it is possible to achieve better high-dimensional power with fewer samples than our test statistic. But our standout contribution is sequential, and we focus on these aspects.

D.9 Supplemental Graphs

Type I Error In Figure 14.2, we see that the cumulative type I error rate is increasing, not leveling off. To change this, the proportionality constant on the iterated logarithm, $C$, must be increased. The result of $C = 2.2$ is plotted to the right of Figure D.1, with the $C = 2$ random walk of Figure 14.2 at left on the same scale for comparison. We see that just a slight increase in $C$ lowers type I violations significantly; at every $\alpha$, the type I error is less than half of the desired tolerance. Extrapolating the linear graphs, we predict that type I error will be controlled up to the huge sample size $\sim e^{25} \approx 7.2 \times 10^{10}$ for every $\alpha$, and further increases in $C$ make it infeasible to run for long enough to break type I error control. This gives some empirical validation for our recommendations.

Figure D.1. PrH0 (τ ≤ n) for different α, on biased coin, for C = 2 (left) and C = 2.2 (right).

Figure D.2. Power vs. ln(N) for different δ, on Gaussians. Dashed lines represent power of batch test with N samples.

We can also look at the stopping time under the null with our simulated Gaussians; these show better empirical concentration than the coin, unsurprisingly, so we do not graph them here.

Type II Error For completeness, we give the equivalents to Figs. 14.3 and 14.4 here, as Figs. D.2 and D.3 respectively. Note that $\tau$ is $O\left(\frac{1}{\delta^2}\right)$ in Fig. D.3, as the theory for the coin predicts.

Figure D.3. Distribution of $\ln(\tau)$ for $\delta \in \{e^{-c} : c \in \{1, 2, \ldots, 5\}\}$, so that the abscissa values $\{\ln(1/\delta)\}$ are a unit length apart. Dashed line has slope 2.

Bibliography

[AMR69] Peter Armitage, CK McPherson, and BC Rowe. Repeated significance tests on accumulating data. Journal of the Royal Statistical Society. Series A (General), pages 235–244, 1969.

[Ans52] Francis J Anscombe. Large-sample theory of sequential estimation. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 48, pages 600–607. Cambridge Univ Press, 1952.

[Ans54] Francis J Anscombe. Fixed-sample-size analysis of sequential observations. Biometrics, 10(1):89–100, 1954.

[AP13] Alexandr Andoni and Rina Panigrahy. A differential equations approach to optimizing regret trade-offs. arXiv preprint arXiv:1305.1359, 2013.

[AUL09] Massih Amini, Nicolas Usunier, and François Laviolette. A transductive bound for the voted classifier with an application to semi-supervised learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 65–72, 2009.

[Bal15a] Akshay Balsubramani. PAC-Bayes iterated logarithm bounds for martingale mixtures. arXiv preprint arXiv:1506.06573, 2015.

[Bal15b] Akshay Balsubramani. Sharp finite-time iterated-logarithm martingale concentration. arXiv preprint arXiv:1405.2639, 2015.

[Bal16] Akshay Balsubramani. Learning to abstain from binary prediction. arXiv preprint, 2016. arXiv:1602.08151.

[Ban06] Arindam Banerjee. On Bayesian bounds. In Proceedings of the 23rd international conference on Machine learning, pages 81–88. ACM, 2006.

[BB08] Olivier Bousquet and Léon Bottou. The tradeoffs of large scale learning. In Advances in neural information processing systems, pages 161–168, 2008.

[BBW97] James O Berger, Ben Boukai, and Yinping Wang. Unified frequentist and Bayesian testing of a precise hypothesis. Statistical Science, 12(3):133–160, 1997.


[BCD01] Lawrence D Brown, T Tony Cai, and Anirban DasGupta. Interval estimation for a binomial proportion. Statistical science, pages 101–117, 2001.

[BDF13] Akshay Balsubramani, Sanjoy Dasgupta, and Yoav Freund. The fast convergence of incremental pca. In Advances in Neural Information Processing Systems, pages 3174–3182, 2013.

[BEHW87] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Occam’s razor. Information processing letters, 24(6):377–380, 1987.

[BF15a] Akshay Balsubramani and Yoav Freund. Optimally combining classifiers using unlabeled data. In Conference on Learning Theory, 2015.

[BF15b] Akshay Balsubramani and Yoav Freund. Scalable semi-supervised classifier aggregation. In Advances in Neural Information Processing Systems, 2015.

[BF16a] Akshay Balsubramani and Yoav Freund. Muffled semi-supervised learning. arXiv preprint, 2016. arXiv:1605.08833.

[BF16b] Akshay Balsubramani and Yoav Freund. Optimal binary classifier aggregation for general losses. In Advances in Neural Information Processing Systems, 2016. arXiv:1510.00452.

[BH95] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), pages 289–300, 1995.

[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[BJM06] Peter L Bartlett, Michael I Jordan, and Jon D McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.

[BL03] Avrim Blum and John Langford. Pac-mdl bounds. In Learning Theory and Kernel Machines, pages 344–357. Springer, 2003.

[BLM13a] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, Oxford, 2013.

[BLM13b] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.

[BLZ08] Alina Beygelzimer, John Langford, and Bianca Zadrozny. Machine learning techniques – reductions between prediction quality metrics. In Performance Modeling and Engineering, pages 3–28. Springer, 2008.

[BM02] Peter L Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.

[BM15] Rémi Bardenet and Odalric-Ambrym Maillard. Concentration inequalities for sampling without replacement. Bernoulli, 21(3):1361–1385, 2015.

[BNS06] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. The Journal of Machine Learning Research, 7:2399–2434, 2006.

[BR16] Akshay Balsubramani and Aaditya Ramdas. Sequential nonparametric testing with the law of the iterated logarithm. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence (UAI-16), 2016.

[Bre96] Leo Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.

[Bre01] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[BSS05] Andreas Buja, Werner Stuetzle, and Yi Shen. Loss functions for binary class probability estimation and classification: Structure and applications. 2005.

[BW08] Peter L. Bartlett and Marten H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

[CBFH+93] Nicolò Cesa-Bianchi, Yoav Freund, David P Helmbold, David Haussler, Robert E Schapire, and Manfred K Warmuth. How to use expert advice. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 382–391. ACM, 1993.

[CBL06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.

[Cho57] C K Chow. An optimum character recognition system using decision functions. Electronic Computers, IRE Transactions on, (4):247–254, 1957.

[Cho70] C K Chow. On optimum recognition error and reject tradeoff. Information Theory, IEEE Transactions on, 16(1):41–46, 1970.

[CKY08] Rich Caruana, Nikos Karampatziakis, and Ainur Yessenalina. An empirical evaluation of supervised learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 96–103. ACM, 2008.

[CSZ06] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-supervised learning. 2006.

[CV95] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273–297, 1995.

[CZ05] Olivier Chapelle and Alexander Zien. Semi-supervised classification by low density separation. In AISTATS, pages 57–64, 2005.

[DE56] D. A. Darling and P. Erdős. A limit theorem for the maximum of normalized sums of independent random variables. Duke Math. J., 23:143–155, 1956.

[DGL13] Luc Devroye, László Györfi, and Gábor Lugosi. A probabilistic theory of pattern recognition, volume 31. Springer Science & Business Media, 2013.

[DHS11] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159, 2011.

[DK94] MHA Davis and I Karatzas. A deterministic approach to optimal stopping. Probability, Statistics and Optimisation (ed. FP Kelly). New York, Chichester: John Wiley and Sons Ltd, pages 455–466, 1994.

[dlPLS08] Victor H de la Peña, Tze Leung Lai, and Qi-Man Shao. Self-normalized processes: Limit theory and Statistical Applications. Springer Science & Business Media, 2008.

[dlPnKL07] Victor H. de la Peña, Michael J. Klass, and Tze Leung Lai. Pseudo-maximization and self-normalized processes. Probability Surveys, 4:172–192, 2007.

[Doo84] Joseph L. Doob. Classical Potential Theory and Its Probabilistic Counterpart. Springer, 1st edition, 1984.

[DPS04] Miroslav Dudik, Steven J Phillips, and Robert E Schapire. Performance guarantees for regularized maximum entropy density estimation. In Learning Theory, pages 472–486. Springer, 2004.

[DR67a] D. A. Darling and Herbert Robbins. Iterated logarithm inequalities. Proc. Nat. Acad. Sci. U.S.A., 57:1188–1192, 1967.

[DR67b] DA Darling and Herbert Robbins. Iterated logarithm inequalities. Proceedings of the National Academy of Sciences of the United States of America, pages 1188–1192, 1967.

[DR68a] DA Darling and Herbert Robbins. Some further remarks on inequalities for sample sums. Proceedings of the National Academy of Sciences of the United States of America, 60(4):1175, 1968.

[DR68b] DA Darling and Herbert Robbins. Some nonparametric sequential tests with power one. Proceedings of the National Academy of Sciences of the United States of America, 61(3):804, 1968.

[DS79] Alexander Philip Dawid and Allan M Skene. Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics, pages 20–28, 1979.

[Dur10] Rick Durrett. Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010.

[DV76] M. D. Donsker and S. S. Varadhan. Asymptotic evaluation of certain markov process expectations for large time. iii. Communications on Pure and Applied Mathematics, 28:389–461, 1976.

[ELS63] Ward Edwards, Harold Lindman, and Leonard J Savage. Bayesian statistical inference for psychological research. Psychological review, 70(3):193, 1963.

[EM77] B. Efron and C. Morris. Stein’s Paradox in Statistics. Scientific American, 236:119–127, 1977.

[Erd42] Paul Erdős. On the law of the iterated logarithm. Annals of Mathematics, pages 419–436, 1942.

[EYW10] Ran El-Yaniv and Yair Wiener. On the foundations of noise-free selective classification. The Journal of Machine Learning Research, 11:1605–1641, 2010.

[EYW11] Ran El-Yaniv and Yair Wiener. Agnostic selective classification. In Neural Information Processing Systems (NIPS), 2011.

[EYW12] Ran El-Yaniv and Yair Wiener. Active learning via perfect selective classification. The Journal of Machine Learning Research, 13:255–279, 2012.

[Fel43] William Feller. The general form of the so-called law of the iterated logarithm. Transactions of the American Mathematical Society, 54(3):373–402, 1943.

[Fel50] Vilim Feller. An Introduction to Probability Theory and Its Applications: Volume One. John Wiley & Sons, 1950.

[FHT01] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning, volume 1. 2001.

[FM99] Yoav Freund and Llew Mason. The alternating decision tree learning algorithm. In Proceedings of the 16th international conference on Machine learning, pages 124–133. ACM, 1999.

[FMG92] Meir Feder, Neri Merhav, and Michael Gutman. Universal prediction of individual sequences. Information Theory, IEEE Transactions on, 38(4):1258–1270, 1992.

[FMS04] Yoav Freund, Yishay Mansour, and Robert E Schapire. Generalization bounds for averaged classifiers. The Annals of Statistics, 32:1698–1722, 2004.

[Fre75] David A. Freedman. On tail probabilities for martingales. Ann. Probability, 3:100–118, 1975.

[Fre95] Yoav Freund. Boosting a weak learning algorithm by majority. Information and computation, 121(2):256–285, 1995.

[Fre01] Yoav Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293–318, 2001.

[FS96] Yoav Freund and Robert E Schapire. Game theory, on-line prediction and boosting. In Proceedings of the ninth annual conference on Computational learning theory, pages 325–332. ACM, 1996.

[FS97] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997.

[FSSW97] Yoav Freund, Robert E Schapire, Yoram Singer, and Manfred K Warmuth. Using and combining predictors that specialize. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 334–343. ACM, 1997.

[GBR+12] A. Gretton, K. Borgwardt, M. Rasch, B. Schoelkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.

[GLB08] Helmut Grabner, Christian Leistner, and Horst Bischof. Semi-supervised on-line boosting for robust tracking. In Computer Vision–ECCV 2008, pages 234–247. Springer, 2008.

[GR09] Venkatesan Guruswami and Prasad Raghavendra. Hardness of learning halfspaces with noise. SIAM Journal on Computing, 39(2):742–765, 2009.

[Grü16] Peter Grünwald. Safe probability. arXiv preprint arXiv:1604.01785, 2016.

[Gut12] Allan Gut. Anscombe’s theorem 60 years later. Sequential Analysis, 31(3):368–396, 2012.

[Hey72] C. C. Heyde. Martingales: a case for a place in the statistician’s repertoire. Austral. J. Statist., 14:1–9, 1972.

[HH80] Peter Hall and Christopher C Heyde. Martingale limit theory and its application. New York: Academic Press, 1980.

[HK04] Martin B. Haugh and Leonid Kogan. Pricing American options: a duality approach. Oper. Res., 52(2):258–270, 2004.

[HKW98] David Haussler, Jyrki Kivinen, and Manfred K Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906–1925, 1998.

[HLS97] Lars Kai Hansen, Christian Liisberg, and Peter Salamon. The error-reject tradeoff. Open Systems & Information Dynamics, 4(2):159–184, 1997.

[Hoe63] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963.

[Imp95] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995., pages 538–545. IEEE, 1995.

[Ioa08] John PA Ioannidis. Why most discovered true associations are inflated. Epidemiology, 19(5):640–648, 2008.

[JMNB14] Kevin G. Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil’ ucb: An optimal exploration algorithm for multi-armed bandits. In Maria-Florina Balcan and Csaba Szepesvári, editors, Conference on Learning Theory, volume 35 of JMLR Proceedings, pages 423–439. JMLR.org, 2014.

[Joa99] Thorsten Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 200–209. Morgan Kaufmann Publishers Inc., 1999.

[JS08] Yaochu Jin and Bernhard Sendhoff. Pareto-based multiobjective machine learning: An overview and case studies. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 38(3):397–415, 2008.

[kag11] Give Me Some Credit. 2011. https://www.kaggle.com/c/GiveMeSomeCredit.

[kag12] Predicting a Biological Response. 2012. https://www.kaggle.com/c/bioresponse.

[Kal06] Olav Kallenberg. Foundations of modern probability. Springer Science & Business Media, 2006.

[KB13] Vladimir S Korolyuk and Yu V Borovskich. Theory of U-statistics, volume 273. Springer Science & Business Media, 2013.

[Khi24] A. Ya. Khinchin. Über einen satz der wahrscheinlichkeitsrechnung. Fundamenta Mathematicae, 6:9–20, 1924.

[Kie53] Jack Kiefer. Sequential minimax search for a maximum. Proceedings of the American Mathematical Society, 4(3):502–506, 1953.

[KMJJL09] P Kumar Mallapragada, Rong Jin, Anil K Jain, and Yi Liu. Semiboost: Boosting for semi-supervised learning. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(11):2000–2014, 2009.

[Kor09] Yehuda Koren. The bellkor solution to the netflix grand prize. 2009.

[KR95] Robert E Kass and Adrian E Raftery. Bayes factors. Journal of the american statistical association, 90(430):773–795, 1995.

[KS99] Adam R Klivans and Rocco A Servedio. Boosting and hard-core sets. In 40th Annual Symposium on Foundations of Computer Science, 1999., pages 624–633. IEEE, 1999.

[KW03] Ludmila I Kuncheva and Christopher J Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning, 51(2):181–207, 2003.

[Lai77] Tze Leung Lai. Power-one tests based on sample sums. The Annals of Statistics, pages 866–880, 1977.

[LBH15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[Ler86] Hans Rudolf Lerche. Sequential analysis and the law of the iterated logarithm. Lecture Notes-Monograph Series, pages 40–53, 1986.

[LGBJ01] Gert Lanckriet, Laurent E Ghaoui, Chiranjib Bhattacharyya, and Michael I Jordan. Minimax probability machine. In Advances in neural information processing systems, pages 801–807, 2001.

[LJ06] Yi Lin and Yongho Jeon. Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101(474):578–590, 2006.

[LN13] Po-Ling Loh and Sebastian Nowozin. Faster hoeffding racing: Bernstein races via jackknife estimates. In Algorithmic Learning Theory, pages 203–217. Springer, 2013.

[LR06a] Erich L Lehmann and Joseph P Romano. Testing statistical hypotheses. springer, 2006.

[LR06b] Erich L Lehmann and Joseph P Romano. Testing statistical hypotheses. springer, 2006.

[LS+08] Tze Leung Lai, Zheng Su, et al. Sequential nonparametrics and semiparametrics: Theory, implementation and applications to clinical trials. Institute of Mathematical Statistics, 2008.

[LS14] Haipeng Luo and Robert E Schapire. A drifting-games analysis for online learning and applications to boosting. In Advances in Neural Information Processing Systems, pages 1368–1376, 2014.

[LZ14] Anqi Liu and Brian Ziebart. Robust classification under sample selection bias. In Advances in Neural Information Processing Systems, pages 37–45, 2014.

[MA13] Brendan McMahan and Jacob Abernethy. Minimax optimal algorithms for unconstrained linear optimization. In Advances in Neural Information Processing Systems, pages 2724–2732, 2013.

[MN89] Peter McCullagh and John A Nelder. Generalized linear models, volume 37. CRC press, 1989.

[MP43] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.

[MP09] Andreas Maurer and Massimiliano Pontil. Empirical bernstein bounds and sample variance penalization. arXiv preprint arXiv:0907.3740, 2009.

[MSA08] Volodymyr Mnih, Csaba Szepesvári, and Jean-Yves Audibert. Empirical bernstein stopping. In Proceedings of the 25th international conference on Machine learning, pages 672–679. ACM, 2008.

[MTJ07] Frank Moosmann, Bill Triggs, and Frederic Jurie. Fast discriminative visual codebooks using randomized clustering forests. In Twentieth Annual Conference on Neural Information Processing Systems (NIPS’06), pages 985–992. MIT Press, 2007.

[Nes12] Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.

[Nic00] Raymond S Nickerson. Null hypothesis significance testing: a review of an old and continuing controversy. Psychological methods, 5(2):241, 2000.

[NSL+15] Julie Nutini, Mark Schmidt, Issam H. Laradji, Michael Friedlander, and Hoyt Koepke. Coordinate descent converges faster with the gauss-southwell rule than random selection. In International Conference on Machine Learning, 2015.

[Per09] Yuval Peres. The unreasonable effectiveness of martingales. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 997–1000. SIAM, Philadelphia, PA, 2009.

[PGKS05] Vladimir Pozdnyakov, Joseph Glaz, Martin Kulldorff, and J Michael Steele. A martingale approach to scan statistics. Annals of the Institute of Statistical Mathematics, 57(1):21–37, 2005.

[Poc77] Stuart J Pocock. Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2):191–199, 1977.

[Poc05] Stuart J Pocock. When (not) to stop a clinical trial for benefit. Journal of the American Medical Association, 294(17):2228–2230, 2005.

[PPA+77] R Peto, MC Pike, Philip Armitage, Norman E Breslow, DR Cox, SV Howard, N Mantel, K McPherson, J Peto, and PG Smith. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. ii. analysis and examples. British journal of cancer, 35(1):1, 1977.

[PS06] Goran Peskir and Albert Shiryaev. Optimal stopping and free-boundary problems. Springer, 2006.

[PSNK14] Fabio Parisi, Francesco Strino, Boaz Nadler, and Yuval Kluger. Ranking and combining multiple predictors without labeled data. Proceedings of the National Academy of Sciences, 111(4):1253–1258, 2014.

[QSCL09] Novi Quadrianto, Alex J Smola, Tiberio S Caetano, and Quoc V Le. Estimating labels from label proportions. The Journal of Machine Learning Research, 10:2349–2374, 2009.

[RDS+15] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 2015.

[RM87] David E Rumelhart and James L McClelland. Parallel distributed processing, explorations in the microstructure of cognition. vol. 1: Foundations. Computational Models of Cognition and Perception, Cambridge: MIT Press, 1987.

[Rob52] Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527–535, 1952.

[Rob70a] Herbert Robbins. Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist., 41:1397–1409, 1970.

[Rob70b] Herbert Robbins. Statistical methods related to the law of the iterated logarithm. The Annals of Mathematical Statistics, pages 1397–1409, 1970.

[Rob85] Herbert Robbins. Herbert Robbins Selected Papers. Springer, 1985.

[Rog02] L. C. G. Rogers. Monte Carlo valuation of American options. Math. Finance, 12(3):271–286, 2002.

[RRP+15] Sashank J. Reddi, Aaditya Ramdas, Barnabás Póczos, Aarti Singh, and Larry Wasserman. On the high dimensional power of a linear-time two sample test under mean-shift alternatives. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), 2015.

[RS70] Herbert Robbins and David Siegmund. Boundary crossing probabilities for the Wiener process and sample sums. Ann. Math. Statist., 41:1410–1429, 1970.

[Rud87] Walter Rudin. Real and complex analysis. Tata McGraw-Hill Education, 1987.

[RW00] L Chris G Rogers and David Williams. Diffusions, Markov processes and martingales. Cambridge university press, 2000.

[RW10] Mark D Reid and Robert C Williamson. Composite binary losses. The Journal of Machine Learning Research, 11:2387–2422, 2010.

[SASt+11] Yevgeny Seldin, Peter Auer, John S Shawe-Taylor, Ronald Ortner, and François Laviolette. Pac-bayesian analysis of contextual bandits. In Advances in Neural Information Processing Systems, pages 1683–1691, 2011.

[Sch01] Robert E. Schapire. Drifting games. Machine Learning, 43(3):265–291, 2001.

[Ser74] Robert J Serfling. Probability inequalities for the sum in sampling without replacement. The Annals of Statistics, pages 39–48, 1974.

[SF12] Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012.

[SFBL98] Robert E Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of statistics, pages 1651–1686, 1998.

[She05] Yi Shen. Loss functions for binary classification and class probability estimation. PhD thesis, University of Pennsylvania, 2005.

[Sio58] Maurice Sion. On general minimax theorems. Pacific J. Math., 8(1):171–176, 1958.

[SLCB+12] Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, and Peter Auer. Pac-bayesian inequalities for martingales. Information Theory, IEEE Transactions on, 58(12):7086–7093, 2012.

[Slu77] Eric V Slud. Distribution inequalities for the binomial law. The Annals of Probability, pages 404–412, 1977.

[SNS11] Joseph P Simmons, Leif D Nelson, and Uri Simonsohn. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological science, 22:1359–1366, 2011.

[SRB07] G.J. Székely, M.L. Rizzo, and N.K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 2007.

[SSVV11] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, bayes factors and p-values. Statistical Science, 26(1):84–101, 2011.

[Sto70] William F. Stout. A martingale analogue of Kolmogorov’s law of the iterated logarithm. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 15:279–290, 1970.

[SV01] Glenn Shafer and Vladimir Vovk. Probability and Finance: It’s Only a Game! Wiley, 1st edition, 2001.

[Tal05] Michel Talagrand. The generic chaining. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005. Upper and lower bounds of stochastic processes.

[TDS15] Matus Telgarsky, Miroslav Dudik, and Robert Schapire. Convex risk minimization and conditional probability estimation. In Conference on Learning Theory, 2015.

[Tel12] Matus Telgarsky. A primal-dual convergence analysis of boosting. The Journal of Machine Learning Research, 13(1):561–606, 2012.

[TK02] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002.

[Tom75] R. J. Tomkins. Iterated logarithm results for weighted averages of martingale difference sequences. Ann. Probab., 3(2):307–314, 1975.

[Tor00] Francesco Tortorella. An optimal reject rule for binary classifiers. In Advances in Pattern Recognition, pages 611–620. Springer, 2000.

[TTV09] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Regularity, boosting, and efficiently simulating every high-entropy distribution. In 24th Annual IEEE Conference on Computational Complexity, 2009. CCC’09., pages 126–136. IEEE, 2009.

[Vap82] Vladimir Naumovich Vapnik. Estimation of dependences based on empirical data, volume 41. Springer-Verlag New York, 1982.

[Vov90] Volodimir G Vovk. Aggregating strategies. In Proc. Third Workshop on Computational Learning Theory, pages 371–383. Morgan Kaufmann, 1990.

[Vov93] Vladimir G Vovk. A logic of probability, with application to the foundations of statistics. Journal of the Royal Statistical Society. Series B (Methodological), pages 317–351, 1993.

[VTS05] Vladimir Vovk, Akimichi Takemura, and Glenn Shafer. Defensive forecasting. AISTATS 2005, pages 365–372, 2005.

[Wag07] Eric-Jan Wagenmakers. A practical solution to the pervasive problems of p values. Psychonomic bulletin & review, 14(5):779–804, 2007.

[Wal45] Abraham Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945.

[WDL+09] Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1113–1120. ACM, 2009.

[Wil27] Edwin B Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158):209–212, 1927.

[Wil91] David Williams. Probability with martingales. Cambridge university press, 1991.

[WLR06] Manfred K Warmuth, Jun Liao, and Gunnar Rätsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd international conference on Machine learning, pages 1001–1008. ACM, 2006.

[WW40] Abraham Wald and Jacob Wolfowitz. On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11(2):147–162, 1940.

[WW48] Abraham Wald and Jacob Wolfowitz. Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, pages 326–339, 1948.

[WY11] Marten Wegkamp and Ming Yuan. Support vector machines with a reject option. Bernoulli, 17(4):1368–1385, 2011.

[ZC14] Chicheng Zhang and Kamalika Chaudhuri. Beyond disagreement-based agnostic active learning. In Advances in Neural Information Processing Systems, pages 442–450, 2014.

[ZG09] Xiaojin Zhu and Andrew B Goldberg. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 3(1):1–130, 2009.

[Zha04] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, pages 56–85, 2004.

[Zou60] Guus Zoutendijk. Methods of feasible directions: a study in linear and non-linear programming. 1960.