Prediction, Learning, and Games
NICOLÒ CESA-BIANCHI, Università degli Studi di Milano
GÁBOR LUGOSI, Universitat Pompeu Fabra, Barcelona

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521841085
© Nicolò Cesa-Bianchi and Gábor Lugosi
This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published in print format 2006
ISBN-13 978-0-511-19178-7 eBook (NetLibrary)
ISBN-10 0-511-19178-2 eBook (NetLibrary)
ISBN-13 978-0-521-84108-5 hardback
ISBN-10 0-521-84108-9 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Betta, Silvia, and Luca – NCB
To Arrate, Dani, and Frida; to the memory of my parents – GL

Contents
Preface page xi
1 Introduction 1
  1.1 Prediction 1
  1.2 Learning 3
  1.3 Games 3
  1.4 A Gentle Start 4
  1.5 A Note to the Reader 6
2 Prediction with Expert Advice 7
  2.1 Weighted Average Prediction 9
  2.2 An Optimal Bound 15
  2.3 Bounds That Hold Uniformly over Time 17
  2.4 An Improvement for Small Losses 20
  2.5 Forecasters Using the Gradient of the Loss 22
  2.6 Scaled Losses and Signed Games 24
  2.7 The Multilinear Forecaster 25
  2.8 The Exponential Forecaster for Signed Games 27
  2.9 Simulatable Experts 29
  2.10 Minimax Regret 30
  2.11 Discounted Regret 32
  2.12 Bibliographic Remarks 34
  2.13 Exercises 37
3 Tight Bounds for Specific Losses 40
  3.1 Introduction 40
  3.2 Follow the Best Expert 41
  3.3 Exp-concave Loss Functions 45
  3.4 The Greedy Forecaster 49
  3.5 The Aggregating Forecaster 52
  3.6 Mixability for Certain Losses 56
  3.7 General Lower Bounds 59
  3.8 Bibliographic Remarks 63
  3.9 Exercises 64
4 Randomized Prediction 67
  4.1 Introduction 67
  4.2 Weighted Average Forecasters 71
  4.3 Follow the Perturbed Leader 74
  4.4 Internal Regret 79
  4.5 Calibration 85
  4.6 Generalized Regret 90
  4.7 Calibration with Checking Rules 93
  4.8 Bibliographic Remarks 94
  4.9 Exercises 95
5 Efficient Forecasters for Large Classes of Experts 99
  5.1 Introduction 99
  5.2 Tracking the Best Expert 100
  5.3 Tree Experts 109
  5.4 The Shortest Path Problem 116
  5.5 Tracking the Best of Many Actions 121
  5.6 Bibliographic Remarks 124
  5.7 Exercises 125
6 Prediction with Limited Feedback 128
  6.1 Introduction 128
  6.2 Label Efficient Prediction 129
  6.3 Lower Bounds 136
  6.4 Partial Monitoring 143
  6.5 A General Forecaster for Partial Monitoring 146
  6.6 Hannan Consistency and Partial Monitoring 153
  6.7 Multi-armed Bandit Problems 156
  6.8 An Improved Bandit Strategy 160
  6.9 Lower Bounds for the Bandit Problem 164
  6.10 How to Select the Best Action 169
  6.11 Bibliographic Remarks 173
  6.12 Exercises 175
7 Prediction and Playing Games 180
  7.1 Games and Equilibria 180
  7.2 Minimax Theorems 185
  7.3 Repeated Two-Player Zero-Sum Games 187
  7.4 Correlated Equilibrium and Internal Regret 190
  7.5 Unknown Games: Game-Theoretic Bandits 194
  7.6 Calibration and Correlated Equilibrium 194
  7.7 Blackwell's Approachability Theorem 197
  7.8 Potential-based Approachability 202
  7.9 Convergence to Nash Equilibria 205
  7.10 Convergence in Unknown Games 210
  7.11 Playing Against Opponents That React 219
  7.12 Bibliographic Remarks 225
  7.13 Exercises 227
8 Absolute Loss 233
  8.1 Simulatable Experts 233
  8.2 Optimal Algorithm for Simulatable Experts 234
  8.3 Static Experts 236
  8.4 A Simple Example 238
  8.5 Bounds for Classes of Static Experts 239
  8.6 Bounds for General Classes 241
  8.7 Bibliographic Remarks 244
  8.8 Exercises 245
9 Logarithmic Loss 247
  9.1 Sequential Probability Assignment 247
  9.2 Mixture Forecasters 249
  9.3 Gambling and Data Compression 250
  9.4 The Minimax Optimal Forecaster 252
  9.5 Examples 253
  9.6 The Laplace Mixture 256
  9.7 A Refined Mixture Forecaster 258
  9.8 Lower Bounds for Most Sequences 261
  9.9 Prediction with Side Information 263
  9.10 A General Upper Bound 265
  9.11 Further Examples 269
  9.12 Bibliographic Remarks 271
  9.13 Exercises 272
10 Sequential Investment 276
  10.1 Portfolio Selection 276
  10.2 The Minimax Wealth Ratio 278
  10.3 Prediction and Investment 278
  10.4 Universal Portfolios 282
  10.5 The EG Investment Strategy 284
  10.6 Investment with Side Information 287
  10.7 Bibliographic Remarks 289
  10.8 Exercises 290
11 Linear Pattern Recognition 293
  11.1 Prediction with Side Information 293
  11.2 Bregman Divergences 294
  11.3 Potential-Based Gradient Descent 298
  11.4 The Transfer Function 301
  11.5 Forecasters Using Bregman Projections 307
  11.6 Time-Varying Potentials 314
  11.7 The Elliptic Potential 316
  11.8 A Nonlinear Forecaster 320
  11.9 Lower Bounds 322
  11.10 Mixture Forecasters 325
  11.11 Bibliographic Remarks 328
  11.12 Exercises 330
12 Linear Classification 333
  12.1 The Zero–One Loss 333
  12.2 The Hinge Loss 335
  12.3 Maximum Margin Classifiers 343
  12.4 Label Efficient Classifiers 346
  12.5 Kernel-Based Classifiers 350
  12.6 Bibliographic Remarks 355
  12.7 Exercises 356
Appendix 359
  A.1 Inequalities from Probability Theory 359
    A.1.1 Hoeffding's Inequality 359
    A.1.2 Bernstein's Inequality 361
    A.1.3 Hoeffding–Azuma Inequality and Related Results 362
    A.1.4 Khinchine's Inequality 364
    A.1.5 Slud's Inequality 364
    A.1.6 A Simple Limit Theorem 364
    A.1.7 Proof of Theorem 8.3 367
    A.1.8 Rademacher Averages 368
    A.1.9 The Beta Distribution 369
  A.2 Basic Information Theory 370
  A.3 Basics of Classification 371
References 373
Author Index 387
Subject Index 390

Preface
...beware of mathematicians, and all those who make empty prophecies.
St. Augustine, De Genesi ad Litteram libri duodecim. Liber Secundus, 17, 37.

Prediction of individual sequences, the main theme of this book, has been studied in various fields, such as statistical decision theory, information theory, game theory, machine learning, and mathematical finance. Early appearances of the problem go back as far as the 1950s, with the pioneering work of Blackwell, Hannan, and others. Even though the focus of investigation varied across these fields, some of the main principles have been discovered independently. Evolution of ideas remained parallel for quite some time. As each community developed its own vocabulary, communication became difficult. By the mid-1990s, however, it became clear that researchers of the different fields had a lot to teach each other. When we decided to write this book, in 2001, one of our main purposes was to investigate these connections and help ideas circulate more fluently. In retrospect, we now realize that the interplay among these many fields is far richer than we suspected. For this reason, exploring this beautiful subject during the preparation of the book became a most exciting experience – we really hope to succeed in transmitting this excitement to the reader. Today, several hundred pages later, we still feel there remains a lot to discover. This book just shows the first steps of some largely unexplored paths. We invite the reader to join us in finding out where these paths lead and where they connect. The book should by no means be treated as an encyclopedia of the subject. The selection of the material reflects our personal taste. Large parts of the manuscript have been read and constructively criticized by Gilles Stoltz, whose generous help we greatly appreciate. Claudio Gentile and András György also helped us with many important and useful comments.
We are equally grateful to the members of a seminar group in Budapest who periodically gathered in Laci Györfi's office at the Technical University and unmercifully questioned every line of the manuscript they had access to, and to György Ottucsák, who diligently communicated to us all these questions and remarks. The members of the group are András Antos, Balázs Csanád Csáji, Laci Györfi, András György, Levente Kocsis, György Ottucsák, Márti Pintér, Csaba Szepesvári, and Kati Varga. Of course, all remaining errors are our responsibility. We thank all our friends and colleagues who, often without realizing it, taught us many ideas, tricks, and points of view that helped us understand the subtleties of the material.
The incomplete list includes Peter Auer, Andrew Barron, Peter Bartlett, Shai Ben-David, Stéphane Boucheron, Olivier Bousquet, Miklós Csűrös, Luc Devroye, Meir Feder, Paul Fischer, Dean Foster, Yoav Freund, Claudio Gentile, Fabrizio Germano, András György, Laci Györfi, Sergiu Hart, David Haussler, David Helmbold, Marcus Hutter, Sham Kakade, Adam Kalai, Balázs Kégl, Jyrki Kivinen, Tamás Linder, Phil Long, Yishay Mansour, Andreu Mas-Colell, Shahar Mendelson, Neri Merhav, Jan Poland, Hans Simon, Yoram Singer, Rob Schapire, Kic Udina, Nicolas Vayatis, Volodya Vovk, Manfred Warmuth, Marcelo Weinberger, and Michael "Il Lupo" Wolf. We gratefully acknowledge the research support of the Italian Ministero dell'Istruzione, Università e Ricerca, the Generalitat de Catalunya, the Spanish Ministry of Science and Technology, and the IST Programme of the European Community under the PASCAL Network of Excellence. Many thanks to Elisabetta Parodi-Dandini for creating the cover art, which is inspired by the work of Alighiero Boetti (1940–1994). Finally, a few grateful words to our families. Nicolò thanks Betta, Silvia, and Luca for filling his life with love, joy, and meaning. Gábor wants to thank Arrate for making the last ten years a pleasant adventure and Dani for so much fun, and to welcome Frida to the family. Have as much fun reading this book as we had writing it!
Nicolò Cesa-Bianchi, Milano
Gábor Lugosi, Barcelona

1

Introduction
1.1 Prediction

Prediction, as we understand it in this book, is concerned with guessing the short-term evolution of certain phenomena. Examples of prediction problems are forecasting tomorrow's temperature at a given location or guessing which asset will achieve the best performance over the next month. Despite their different nature, these tasks look similar at an abstract level: one must predict the next element of an unknown sequence given some knowledge about the past elements and possibly other available information. In this book we develop a formal theory of this general prediction problem. To properly address the diversity of potential applications without sacrificing mathematical rigor, the theory will be able to accommodate different formalizations of the entities involved in a forecasting task, such as the elements forming the sequence, the criterion used to measure the quality of a forecast, the protocol specifying how the predictor receives feedback about the sequence, and any possible side information provided to the predictor.

In the most basic version of the sequential prediction problem, the predictor – or forecaster – observes one after another the elements of a sequence y1, y2, ... of symbols. At each time t = 1, 2, ..., before the tth symbol of the sequence is revealed, the forecaster guesses its value yt on the basis of the previous t − 1 observations. In the classical statistical theory of sequential prediction, the sequence of elements, which we call outcomes, is assumed to be a realization of a stationary stochastic process. Under this hypothesis, statistical properties of the process may be estimated on the basis of the sequence of past observations, and effective prediction rules can be derived from these estimates.
In such a setup, the risk of a prediction rule may be defined as the expected value of some loss function measuring the discrepancy between predicted value and true outcome, and different rules are compared based on the behavior of their risk.

This book looks at prediction from a quite different angle. We abandon the basic assumption that the outcomes are generated by an underlying stochastic process and view the sequence y1, y2, ... as the product of some unknown and unspecified mechanism (which could be deterministic, stochastic, or even adversarially adaptive to our own behavior). To contrast it with stochastic modeling, this approach has often been referred to as prediction of individual sequences. Without a probabilistic model, the notion of risk cannot be defined, and it is not immediately obvious how the goals of prediction should be set up formally. Indeed, several possibilities exist, many of which are discussed in this book. In our basic model, the performance of the forecaster is measured by the loss accumulated during many rounds of
prediction, where loss is scored by some fixed loss function. Since we want to avoid any assumption on the way the sequence to be predicted is generated, there is no obvious baseline against which to measure the forecaster's performance. To provide such a baseline, we introduce a class of reference forecasters, also called experts. These experts make their prediction available to the forecaster before the next outcome is revealed. The forecaster can then make his own prediction depend on the experts' "advice" in order to keep his cumulative loss close to that of the best reference forecaster in the class. The difference between the forecaster's accumulated loss and that of an expert is called regret, as it measures how much the forecaster regrets, in hindsight, not having followed the advice of this particular expert. Regret is a basic notion of this book, and a lot of attention is paid to constructing forecasting strategies that guarantee a small regret with respect to all experts in the class. As it turns out, the possibility of keeping the regrets small depends largely on the size and structure of the class of experts, and on the loss function. This model of prediction using expert advice is defined formally in Chapter 2 and serves as a basis for a large part of the book.

The abstract notion of an "expert" can be interpreted in different ways, also depending on the specific application that is being considered. In some cases it is possible to view an expert as a black box of unknown computational power, possibly with access to private sources of side information. In other applications, the class of experts is collectively regarded as a statistical model, where each expert in the class represents an optimal forecaster for some given "state of nature." With respect to this last interpretation, the goal of minimizing regret on arbitrary sequences may be thought of as a robustness requirement.
Indeed, a small regret guarantees that, even when the model does not describe perfectly the state of nature, the forecaster does almost as well as the best element in the model fitted to the particular sequence of outcomes. In Chapters 2 and 3 we explore the basic possibilities and limitations of forecasters in this framework.

Models of prediction of individual sequences arose in disparate areas motivated by problems as different as playing repeated games, compressing data, or gambling. Because of this diversity, it is not easy to trace back the first appearance of such a study. But it is now recognized that Blackwell, Hannan, Robbins, and the others who, as early as the 1950s, studied the so-called sequential compound decision problem were the pioneering contributors in the field. Indeed, many of the basic ideas appear in these early works, including the use of randomization as a powerful tool for achieving a small regret when it would otherwise be impossible. The model of randomized prediction is introduced in Chapter 4. In Chapter 6 several variants of the basic problem of randomized prediction are considered in which the information available to the forecaster is limited in some way.

Another area in which prediction of individual sequences appeared naturally and found numerous applications is information theory. The influential work of Cover, Davisson, Lempel, Rissanen, Shtarkov, Ziv, and others gave the information-theoretic foundations of sequential prediction, first motivated by applications for data compression and "universal" coding, and later extended to models of sequential gambling and investment. This theory mostly concentrates on a particular loss function, the so-called logarithmic or self-information loss, as it has a natural interpretation in the framework of sequential probability assignment.
In this version of the prediction problem, studied in Chapters 9 and 10, at each time instance the forecaster determines a probability distribution over the set of possible outcomes. The total likelihood assigned to the entire sequence of outcomes is then used to score the forecaster. Sequential probability assignment has been studied in different closely related models in statistics, including Bayesian frameworks and the problem of calibration in various forms. Dawid's "prequential" statistics is also close in spirit to some of the problems discussed here.

In computer science, algorithms that receive their input sequentially are said to operate in an online modality. Typical application areas of online algorithms include tasks that involve sequences of decisions, like when one chooses how to serve each incoming request in a stream. The similarity between decision problems and prediction problems, and the fact that online algorithms are typically analyzed on arbitrary sequences of inputs, have resulted in a fruitful exchange of ideas and techniques between the two fields. However, some crucial features of sequential decision problems that are missing in the prediction framework (like the presence of states to model the interaction between the decision maker and the mechanism generating the stream of requests) have so far prevented the derivation of a general theory allowing a unified analysis of both types of problems.
1.2 Learning

Prediction of individual sequences has also been a main topic of research in the theory of machine learning, more concretely in the area of online learning. In fact, in the late 1980s and early 1990s the paradigm of prediction with expert advice was first introduced as a model of online learning in the pioneering papers of De Santis, Markowski, and Wegman; Littlestone and Warmuth; and Vovk, and it has been intensively investigated ever since. An interesting extension of the model allows the forecaster to consider other information apart from the past outcomes of the sequence to be predicted. By considering side information taking values in a vector space, and experts that are linear functions of the side information vector, one obtains classical models of online pattern recognition. For example, Rosenblatt's Perceptron algorithm, the Widrow–Hoff rule, and ridge regression can be naturally cast in this framework. Chapters 11 and 12 are devoted to the study of such online learning algorithms.

Researchers in machine learning and information theory have also been interested in the computational aspects of prediction. This becomes a particularly important problem when very large classes of reference forecasters are considered, and various tricks need to be invented to make predictors feasible for practical applications. Chapter 5 gathers some of these basic tricks, illustrated on a few prototypical examples.
1.3 Games

The online prediction model studied in this book has an intimate connection with game theory. First of all, the model is most naturally defined in terms of a repeated game played between the forecaster and the "environment" generating the outcome sequence, thus offering a convenient way of describing variants of the basic theme. However, the connection is much deeper. For example, in Chapter 7 we show that classical minimax theorems of game theory can be recovered as simple applications of some basic bounds for the performance of sequential prediction algorithms. On the other hand, certain generalized minimax theorems, most notably Blackwell's approachability theorem, can be used to define forecasters with good performance on individual sequences.
Perhaps surprisingly, the connection goes even deeper. It turns out that if all players in a repeated normal form game play according to certain simple regret-minimizing prediction strategies, then the induced dynamics leads to equilibrium in a certain sense. This interesting line of research has been gaining ground in game theory, based on the pioneering work of Foster, Vohra, Hart, Mas-Colell, and others. In Chapter 7 we discuss the possibilities and limitations of strategies based on regret-minimizing forecasting algorithms that lead to various notions of equilibria.
1.4 A Gentle Start

To introduce the reader to the spirit of the results contained in this book, we now describe in detail a simple example of a forecasting procedure and then analyze its performance on an arbitrary sequence of outcomes.

Consider the problem of predicting an unknown sequence y1, y2, ... of bits yt ∈ {0, 1}. At each time t the forecaster first makes his guess pt ∈ {0, 1} for yt. Then the true bit yt is revealed and the forecaster finds out whether his prediction was correct. To compute pt the forecaster listens to the advice of N experts. This advice takes the form of a binary vector (f1,t, ..., fN,t), where fi,t ∈ {0, 1} is the prediction that expert i makes for the next bit yt. Our goal is to bound the number of time steps t in which pt ≠ yt, that is, to bound the number of mistakes made by the forecaster.

To start with an even simpler case, assume we are told in advance that, on this particular sequence of outcomes, there is some expert i that makes no mistakes. That is, we know that fi,t = yt for some i and for all t, but we do not know for which i this holds. Using this information, it is not hard to devise a forecasting strategy that makes at most log2 N mistakes on the sequence. To see this, consider the forecaster that starts by assigning a weight wj = 1 to each expert j = 1, ..., N. At every time step t, the forecaster predicts with pt = 1 if and only if the number of experts j with wj = 1 and such that fj,t = 1 is bigger than the number with wj = 1 and such that fj,t = 0. After yt is revealed, if pt ≠ yt, then the forecaster performs the assignment wk ← 0 on the weight of all experts k such that fk,t ≠ yt. In words, this forecaster keeps track of which experts make a mistake and predicts according to the majority of the experts that have been always correct.

The analysis is immediate. Let Wm be the sum of the weights of all experts after the forecaster has made m mistakes. Initially, m = 0 and W0 = N.
When the forecaster makes his mth mistake, at least half of the experts that have been always correct so far make their first mistake. This implies that Wm ≤ Wm−1/2, since those experts that were incorrect for the first time have their weight zeroed by the forecaster. Since the above inequality holds for all m ≥ 1, we have Wm ≤ W0/2^m. Recalling that expert i never makes a mistake, we know that wi = 1, which implies that Wm ≥ 1. Using this together with W0 = N, we thus find that 1 ≤ N/2^m. Solving for m (which must be an integer) gives the claimed inequality m ≤ log2 N.

We now move on to analyze the general case, in which the forecaster does not have any preliminary information on the number of mistakes the experts will make on the sequence. Our goal now is to relate the number of mistakes made by the forecaster to the number of mistakes made by the best expert, irrespective of which sequence is being predicted. Looking back at the previous forecasting strategy, it is clear that setting the weight of an incorrect expert to zero makes sense only if we are sure that some expert will never
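The zeroing strategy just described (often called the halving algorithm) fits in a few lines of Python. The sketch below is our own illustration: the function name and the input format (one list of expert predictions per round) are not notation from the text.

```python
def halving_forecaster(expert_advice, outcomes):
    """Predict with the majority vote of the experts that have been
    correct on every past round.  If some expert never errs, the
    forecaster makes at most log2(N) mistakes."""
    n = len(expert_advice[0])          # number of experts N
    weight = [1] * n                   # w_j = 1 while expert j is still perfect
    mistakes = 0
    for advice, y in zip(expert_advice, outcomes):
        ones = sum(w for w, f in zip(weight, advice) if f == 1)
        alive = sum(weight)
        p = 1 if ones > alive - ones else 0   # majority of surviving experts
        if p != y:
            mistakes += 1
            # zero the weight of every expert that was just incorrect
            weight = [0 if f != y else w for w, f in zip(weight, advice)]
    return mistakes
```

For instance, with N = 4 experts one of which is always correct, the forecaster can make at most log2 4 = 2 mistakes, no matter what the other three experts predict.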
make a mistake. Without this guarantee, a safer choice could be performing the assignment wk ← β wk every time expert k makes a mistake, where 0 < β < 1 is a free parameter. In other words, every time an expert is incorrect, instead of zeroing its weight we shrink it by a constant factor. This is the only modification we make to the old forecaster, and this makes its analysis almost as easy as the previous one. More precisely, the new forecaster compares the total weight of the experts that recommend predicting 1 with that of those that recommend 0 and predicts according to the weighted majority. As before, at the time the forecaster makes his mth mistake, the overall weight of the incorrect experts must be at least Wm−1/2. The weight of these experts is then multiplied by β, and the weight of the other experts, which is at most Wm−1/2, is left unchanged. Hence, we have Wm ≤ Wm−1/2 + β Wm−1/2 = (1 + β) Wm−1/2. As this holds for all m ≥ 1, we get Wm ≤ W0 (1 + β)^m / 2^m. Now let k be the expert that has made the fewest mistakes at the time the forecaster made his mth mistake. Denote this minimal number of mistakes by m*. Then the current weight of this expert is wk = β^(m*), and thus we have Wm ≥ β^(m*). This provides the inequality β^(m*) ≤ W0 (1 + β)^m / 2^m. Using this, together with W0 = N, we get the final bound

    m ≤ (log2 N + m* log2(1/β)) / log2(2/(1 + β)).

For any fixed value of β, this inequality establishes a linear dependence between the mistakes made by the forecaster, after any number of predictions, and the mistakes made by the expert that is the best after that same number of predictions. Note that this bound holds irrespective of the choice of the sequence of outcomes. The fact that m and m* are linearly related means that, in some sense, the performance of this forecaster gracefully degrades as a function of the "misfit" m* between the experts and the outcome sequence. The bound also exhibits a mild dependence on the number of experts: the log2 N term implies that, apart from computational considerations, doubling the number of experts causes the bound to increase by a small additive term.

Notwithstanding its simplicity, this example contains some of the main themes developed in the book, such as the idea of computing predictions using weights that are functions of the experts' past performance. In the subsequent chapters we develop this and many other ideas in a rigorous and systematic manner with the intent of offering a comprehensive view on the many facets of this fascinating subject.
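The shrinking rule above is the classical weighted majority forecaster, and it is a one-line change to the previous sketch. Again the function name, the default β = 0.5, and the input format are our own illustrative choices.

```python
def weighted_majority(expert_advice, outcomes, beta=0.5):
    """Predict with the weighted majority vote; an expert's weight is
    multiplied by beta (0 < beta < 1) each time it errs.  The mistake
    count m satisfies
        m <= (log2(N) + m_star * log2(1/beta)) / log2(2/(1 + beta)),
    where m_star is the best expert's mistake count."""
    n = len(expert_advice[0])
    weight = [1.0] * n
    mistakes = 0
    for advice, y in zip(expert_advice, outcomes):
        ones = sum(w for w, f in zip(weight, advice) if f == 1)
        zeros = sum(w for w, f in zip(weight, advice) if f == 0)
        p = 1 if ones > zeros else 0
        if p != y:
            mistakes += 1
        # shrink the weight of every incorrect expert
        weight = [w * beta if f != y else w for w, f in zip(weight, advice)]
    return mistakes
```

With β = 0.5 and N = 4, for example, the bound reads m ≤ (2 + m*) / log2(4/3), roughly 2.41 (2 + m*).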
1.5 A Note to the Reader

The book is addressed to researchers and students of computer science, mathematics, engineering, and economics who are interested in various aspects of prediction and learning. Even though we tried to make the text as self-contained as possible, the reader is assumed to be comfortable with some basic notions of probability, analysis, and linear algebra. To help the reader, we collect in the Appendix some technical tools used in the book. Some of this material is quite standard but may not be well known to all potential readers. In order to minimize interruptions in the flow of the text, we gathered bibliographical references at the end of each chapter. In these references we intend to trace back the origin of the results described in the text and point to some relevant literature. We apologize for any possible omissions. Some of the material is published here for the first time. These results
are not flagged. Each chapter is concluded with a list of exercises whose level of difficulty varies between distant extremes. Some of the exercises can be solved by an easy adaptation of the material described in the main text. These should help the reader in mastering the material. Some others summarize difficult research results. In some cases we offer guidance to the solution, but there is no solution manual. Figure 1.1 describes the dependence structure of the chapters of the book. This should help the reader to focus on specific topics and teachers to organize the material of various possible courses.
[Figure 1.1. The dependence structure of the chapters. The legend lists the chapters: 1 Introduction, 2 Prediction with expert advice, 3 Tight bounds for specific losses, 4 Randomized prediction, 5 Efficient forecasters for large classes of experts, 6 Prediction with limited feedback, 7 Prediction and playing games, 8 Absolute loss, 9 Logarithmic loss, 10 Sequential investment, 11 Linear pattern recognition, 12 Linear classification.]

2

Prediction with Expert Advice
The model of prediction with expert advice, introduced in this chapter, provides the foundations to the theory of prediction of individual sequences that we develop in the rest of the book.

Prediction with expert advice is based on the following protocol for sequential decisions: the decision maker is a forecaster whose goal is to predict an unknown sequence y1, y2, ... of elements of an outcome space Y. The forecaster's predictions p1, p2, ... belong to a decision space D, which we assume to be a convex subset of a vector space. In some special cases we take D = Y, but in general D may be different from Y. The forecaster computes his predictions in a sequential fashion, and his predictive performance is compared to that of a set of reference forecasters that we call experts. More precisely, at each time t the forecaster has access to the set {fE,t : E ∈ E} of expert predictions fE,t ∈ D, where E is a fixed set of indices for the experts. On the basis of the experts' predictions, the forecaster computes his own guess pt for the next outcome yt. After pt is computed, the true outcome yt is revealed. The predictions of forecaster and experts are scored using a nonnegative loss function ℓ : D × Y → R.

This prediction protocol can be naturally viewed as the following repeated game between "forecaster," who makes guesses pt, and "environment," who chooses the expert advice {fE,t : E ∈ E} and sets the true outcomes yt.
PREDICTION WITH EXPERT ADVICE

Parameters: decision space D, outcome space Y, loss function ℓ, set E of expert indices.

For each round t = 1, 2, ...
(1) the environment chooses the next outcome yt and the expert advice {fE,t ∈ D : E ∈ E}; the expert advice is revealed to the forecaster;
(2) the forecaster chooses the prediction pt ∈ D;
(3) the environment reveals the next outcome yt ∈ Y;
(4) the forecaster incurs loss ℓ(pt, yt) and each expert E incurs loss ℓ(fE,t, yt).
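The four numbered steps can be rendered as a generic simulation loop. The Python sketch below is only a schematic rendering of the protocol box: the callables environment, forecaster, and loss, and the dictionary format for the advice, are hypothetical interfaces introduced here for illustration.

```python
def run_protocol(forecaster, environment, loss, rounds):
    """Play `rounds` rounds of the prediction-with-expert-advice game;
    return the forecaster's cumulative loss and each expert's
    cumulative loss."""
    forecaster_loss = 0.0
    expert_loss = {}
    for t in range(rounds):
        advice, y = environment(t)        # step 1: outcome and advice chosen;
                                          #         advice revealed to forecaster
        p = forecaster(advice)            # step 2: the forecaster predicts
        forecaster_loss += loss(p, y)     # steps 3-4: outcome revealed,
        for e, f in advice.items():       #            losses scored
            expert_loss[e] = expert_loss.get(e, 0.0) + loss(f, y)
    return forecaster_loss, expert_loss
```

Subtracting an expert's cumulative loss from the forecaster's gives exactly the regret with respect to that expert.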
The forecaster's goal is to keep as small as possible the cumulative regret (or simply regret) with respect to each expert. This quantity is defined, for expert E, by the sum