Introduction to Multi-Armed Bandits

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Multi-Armed Bandits Introduction to Multi-Armed Bandits Aleksandrs Slivkins Microsoft Research NYC First draft: January 2017 Published: November 2019 This version: June 2021 Abstract Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a brief review of the further developments; many of the chapters conclude with exercises. The book is structured as follows. The first four chapters are on IID rewards, from the basic model to impossibility results to Bayesian priors to Lipschitz rewards. The next three chapters cover adversarial rewards, from the full-feedback version to adversarial bandits to extensions with linear rewards and combinatorially structured actions. Chapter 8 is on contextual bandits, a middle ground between IID and adversarial bandits in which the change in reward distributions is completely explained by observable contexts. The last three chapters cover connections to economics, from learning in repeated games to bandits with supply/budget constraints to exploration in the presence of incentives. The appendix provides sufficient background on concentration and KL-divergence. The chapters on “bandits with similarity information”, “bandits with knapsacks” and “bandits and agents” can also be consumed as standalone surveys on the respective topics. Published with Foundations and Trendsr in Machine Learning, November 2019. arXiv:1904.07272v6 [cs.LG] 26 Jun 2021 The present version is a revision of the “Foundations and Trends” publication. It contains numerous edits for presentation and accuracy (based in part on readers’ feedback), updated and expanded literature reviews, and some new exercises. Further comments, suggestions and bug reports are very welcome! © 2017-2021: Aleksandrs Slivkins. Author’s webpage: https://www.microsoft.com/en-us/research/people/slivkins. Email: slivkins at microsoft.com. Preface Multi-armed bandits is a rich, multi-disciplinary research area which receives attention from computer sci- ence, operations research, economics and statistics. It has been studied since (Thompson, 1933), with a big surge of activity in the past 15-20 years. An enormous body of work has accumulated over time, various subsets of which have been covered in several books (Berry and Fristedt, 1985; Cesa-Bianchi and Lugosi, 2006; Gittins et al., 2011; Bubeck and Cesa-Bianchi, 2012). This book provides a more textbook-like treatment of the subject, based on the following principles. The literature on multi-armed bandits can be partitioned into a dozen or so lines of work. Each chapter tackles one line of work, providing a self-contained introduction and pointers for further reading. We favor fundamental ideas and elementary proofs over the strongest possible results. We emphasize accessibility of the material: while exposure to machine learning and probability/statistics would certainly help, a standard undergraduate course on algorithms, e.g., one based on (Kleinberg and Tardos, 2005), should suffice for background. With the above principles in mind, the choice specific topics and results is based on the au- thor’s subjective understanding of what is important and “teachable”, i.e., presentable in a relatively simple manner. Many important results has been deemed too technical or advanced to be presented in detail. The book is based on a graduate course at University of Maryland, College Park, taught by the author in Fall 2016. Each chapter corresponds to a week of the course. Five chapters were used in a similar course at Columbia University, co-taught by the author in Fall 2017. Some of the material has been updated since then, so as to reflect the latest developments. To keep the book manageable, and also more accessible, we chose not to dwell on the deep connections to online convex optimization. A modern treatment of this fascinating subject can be found, e.g., in Hazan (2015). Likewise, we do not venture into reinforcement learning, a rapidly developing research area and subject of several textbooks such as Sutton and Barto (1998); Szepesvari´ (2010); Agarwal et al. (2020). A course based on this book would be complementary to graduate-level courses on online convex optimiza- tion and reinforcement learning. Also, we do not discuss Markovian models of multi-armed bandits; this direction is covered in depth in Gittins et al. (2011). The author encourages colleagues to use this book in their courses. A brief email regarding which chapters have been used, along with any feedback, would be appreciated. A simultaneous book. An excellent recent book on bandits, Lattimore and Szepesvari´ (2020), has evolved over several years simultaneously and independently with ours. Their book is much longer, providing a deeper treatment for a number of topics and omitting a few others. The two books reflect the authors’ somewhat differing tastes and presentation styles, and, I believe, are complementary to one another. Acknowledgements. Most chapters originated as lecture notes from my course at UMD; the initial versions of these lectures were scribed by the students. Presentation of some of the fundamental results is influenced by (Kleinberg, 2007). I am grateful to Alekh Agarwal, Bobby Kleinberg, Yishay Mansour, and Rob Schapire for discussions and advice. Chapters 9, 10 have benefited tremendously from numerous conversations with Karthik Abinav Sankararaman. Special thanks go to my PhD advisor Jon Kleinberg and my postdoc mentor Eli Upfal; Jon has shaped my taste in research, and Eli has introduced me to bandits back in 2006. Finally, I wish to thank my parents and my family for love, inspiration and support. ii Contents Introduction: Scope and Motivation 1 1 Stochastic Bandits 4 1.1 Model and examples . 4 1.2 Simple algorithms: uniform exploration . 6 1.3 Advanced algorithms: adaptive exploration . 8 1.4 Forward look: bandits with initial information . 13 1.5 Literature review and discussion . 13 1.6 Exercises and hints . 15 2 Lower Bounds 18 2.1 Background on KL-divergence . 19 2.2 A simple example: flipping one coin . 20 2.3 Flipping several coins: “best-arm identification” . 21 2.4 Proof of Lemma 2.8 for the general case . 22 2.5 Lower bounds for non-adaptive exploration . 24 2.6 Instance-dependent lower bounds (without proofs) . 25 2.7 Literature review and discussion . 26 2.8 Exercises and hints . 27 3 Bayesian Bandits and Thompson Sampling 28 3.1 Bayesian update in Bayesian bandits . 29 3.2 Algorithm specification and implementation . 33 3.3 Bayesian regret analysis . 35 3.4 Thompson Sampling with no prior (and no proofs) . 37 3.5 Literature review and discussion . 37 4 Bandits with Similarity Information 38 4.1 Continuum-armed bandits . 38 4.2 Lipschitz bandits . 42 4.3 Adaptive discretization: the Zooming Algorithm . 45 4.4 Literature review and discussion . 50 4.5 Exercises and hints . 55 5 Full Feedback and Adversarial Costs 59 5.1 Setup: adversaries and regret . 60 5.2 Initial results: binary prediction with experts advice . 62 5.3 Hedge Algorithm . 64 5.4 Literature review and discussion . 68 5.5 Exercises and hints . 68 6 Adversarial Bandits 70 6.1 Reduction from bandit feedback to full feedback . 71 6.2 Adversarial bandits with expert advice . 71 6.3 Preliminary analysis: unbiased estimates . 72 6.4 Algorithm Exp4 and crude analysis . 73 6.5 Improved analysis of Exp4 ........................................ 74 6.6 Literature review and discussion . 76 6.7 Exercises and hints . 79 iii CONTENTS iv 7 Linear Costs and Semi-bandits 80 7.1 Online routing problem . 81 7.2 Combinatorial semi-bandits . 83 7.3 Follow the Perturbed Leader . 85 7.4 Literature review and discussion . 89 8 Contextual Bandits 90 8.1 Warm-up: small number of contexts . 91 8.2 Lipshitz contextual bandits . 91 8.3 Linear contextual bandits (no proofs) . 93 8.4 Contextual bandits with a policy class . 94 8.5 Learning from contextual bandit data . 97 8.6 Contextual bandits in practice: challenges and a system design . 99 8.7 Literature review and discussion . 104 8.8 Exercises and hints . 106 9 Bandits and Games 107 9.1 Basics: guaranteed minimax value . 108 9.2 The minimax theorem . 110 9.3 Regret-minimizing adversary . 111 9.4 Beyond zero-sum games: coarse correlated equilibrium . 113 9.5 Literature review and discussion . 114 9.6 Exercises and hints . 116 10 Bandits with Knapsacks 118 10.1 Definitions, examples, and discussion . 118 10.2 Examples . 120 10.3 LagrangeBwK: a game-theoretic algorithm for BwK . 122 10.4 Optimal algorithms and regret bounds (no proofs) . 128 10.5 Literature review and discussion . 130 10.6 Exercises and hints . 139 11 Bandits and Agents 141 11.1 Problem formulation: incentivized exploration . 142 11.2 How much information to reveal? . 144 11.3 Basic technique: hidden exploration . 146 11.4 Repeated hidden exploration . ..
Recommended publications
  • We Can Go Anywhere': Understanding Independence Through a Case Study
    ‘We can go anywhere’: Understanding independence through a case study of ride-hailing use by people with visual impairments in metropolitan India VAISHNAV KAMESWARAN, University of Michigan JATIN GUPTA, University of Michigan JOYOJEET PAL, University of Michigan SILE O’MODHRAIN, University of Michigan TIFFANY C. VEINOT, University of Michigan ROBIN N. BREWER, University of Michigan AAKANKSHA PARAMESHWAR, University of Michigan VIDHYA Y, Microsoft Research India JACKI O’NEILL, Microsoft Research India Ride-hailing services have received attention as part of the growing work around the sharing economy, but the focus of these studies has largely been on drivers. In this paper, we examine how ride-hailing is transforming the transportation practices of one group of passengers - people with visual impairments in metropolitan India. Through a qualitative study consisting of interviews and observations, we examined the use and impact of these services on our target population, who otherwise contend with chaotic, unreliable, and largely inaccessible modes of transportation. We found that ride-hailing services positively affects participants’ notions of independence, and we tease out how independence for our participants is not just about ‘doing things alone, without help’ but is also situated, social and relative. Furthermore, we show how accessibility, in the case of ride-hailing in India, is a socio-technical and collaborative achievement, involving interactions between the passenger, the driver, and the technology. CCS Concepts: • Human-centered computing → Empirical studies in accessibility; 85 Additional Key Words and Phrases: Accessibility, social accessibility, collaborative accessibility, independence, stigma, social interactions, ridesharing, Uber, Ola, blind users ACM Reference Format: Vaishnav Kameswaran, Jatin Gupta, Joyojeet Pal, Sile O’Modhrain, Tiffany C.
    [Show full text]
  • A Toolkit for Partners of the CTC 2Nd Edition
    Experiences A toolkit for partners of the CTC Kraus Hotspring, Nahanni, Northwest Territories © Noel Hendrickson 2nd edition October 2011 1 Experiences October 2011 © Canadian Tourism Commission 2011. All rights reserved. Dear Colleagues I’m delighted to present Experiences – A toolkit for partners of the CTC (2nd Ed.) for industry. The release coincides with the launch of our new Signature Experiences Collection® and our deeper knowledge of the values, attitudes and behaviours of travellers to Canada based on our Explorer Quotient® (EQ®) research. Travellers around the world are telling We proudly support Canada’s small and We look forward to the innovation this us that they want to explore the unique, medium enterprises (SMEs) with tools, Toolkit stimulates, the current practices the exotic and the unexpected. We’ve research, digital asset sharing, programs it validates and the creative product promised them that Canada is the place and marketing campaigns. Our Brand development that will emerge. Together we where they can fulfill this dream. Our Experiences unit works directly with industry can welcome the world, increase demand tourism businesses are key to delivering on to support your product development, for travel to Canada, and strengthen our that promise. marketing and market development national brand: Canada. Keep Exploring. activities. Memorable and engaging visitor Sincerely yours experiences in Canada bring our brand to The Experiences - A toolkit for partners life. They also strengthen the perception of the CTC (2nd Ed.) for industry provides of Canada as an all-season, premier travel updated information that we hope clearly destination. explains experiential travel and the business Library and Archives Canada Cataloguing in Publication Our goal at the Canadian Tourism opportunity it represents.
    [Show full text]
  • The Complexity Zoo
    The Complexity Zoo Scott Aaronson www.ScottAaronson.com LATEX Translation by Chris Bourke [email protected] 417 classes and counting 1 Contents 1 About This Document 3 2 Introductory Essay 4 2.1 Recommended Further Reading ......................... 4 2.2 Other Theory Compendia ............................ 5 2.3 Errors? ....................................... 5 3 Pronunciation Guide 6 4 Complexity Classes 10 5 Special Zoo Exhibit: Classes of Quantum States and Probability Distribu- tions 110 6 Acknowledgements 116 7 Bibliography 117 2 1 About This Document What is this? Well its a PDF version of the website www.ComplexityZoo.com typeset in LATEX using the complexity package. Well, what’s that? The original Complexity Zoo is a website created by Scott Aaronson which contains a (more or less) comprehensive list of Complexity Classes studied in the area of theoretical computer science known as Computa- tional Complexity. I took on the (mostly painless, thank god for regular expressions) task of translating the Zoo’s HTML code to LATEX for two reasons. First, as a regular Zoo patron, I thought, “what better way to honor such an endeavor than to spruce up the cages a bit and typeset them all in beautiful LATEX.” Second, I thought it would be a perfect project to develop complexity, a LATEX pack- age I’ve created that defines commands to typeset (almost) all of the complexity classes you’ll find here (along with some handy options that allow you to conveniently change the fonts with a single option parameters). To get the package, visit my own home page at http://www.cse.unl.edu/~cbourke/.
    [Show full text]
  • Bon Jovi's This House Is Not for Sale Tour to Launch
    BON JOVI’S THIS HOUSE IS NOT FOR SALE TOUR TO LAUNCH FEBRUARY 2017 PRESENTED BY LIVE NATION New Album Release Date for This House Is Not for Sale set for Nov. 4th; Featured in The Ellen DeGeneres Show, ABC’s Good Morning America, Nightline, Charlie Rose, Howard Stern, People, Billboard American Express and Fan Club Ticket Pre-Sales Begin Oct. 10 at 10 a.m. Public Tickets On-Sale Oct. 15 at 10 a.m. October 5, 2016 – Grammy Award®-winning band, Bon Jovi today announced that the This House Is Not for Sale Tour, presented by Live Nation, will kick off in February 2017. Hitting arenas across the U.S., the iconic rock band will present anthems, fan favorites, and new hits from their upcoming 14th studio album, This House Is Not for Sale (out Nov. 4 on Island/UMG). As an added bonus, fans will receive a physical copy of This House Is Not For Sale with every ticket purchased. This will be the band’s first outing since Bon Jovi’s 2013 Because We Can World Tour, which was their third tour in six years to be ranked the #1 top-grossing tour in the world (a feat accomplished by only The Rolling Stones previously). Bon Jovi’s touring legacy will be recogniZed Nov. 9th with the 2016 “Legend of Live” award at the Billboard Touring Conference & Awards. As Bon Jovi rocks October to launch This House Is Not for Sale, the title track is already inside the Top Ten of the AC Radio Chart – it is Bon Jovi’s highest debut on that chart to date.
    [Show full text]
  • Algorithms and NP
    Algorithms Complexity P vs NP Algorithms and NP G. Carl Evans University of Illinois Summer 2019 Algorithms and NP Algorithms Complexity P vs NP Review Algorithms Model algorithm complexity in terms of how much the cost increases as the input/parameter size increases In characterizing algorithm computational complexity, we care about Large inputs Dominant terms p 1 log n n n n log n n2 n3 2n 3n n! Algorithms and NP Algorithms Complexity P vs NP Closest Pair 01 closestpair(p1;:::; pn) : array of 2D points) 02 best1 = p1 03 best2 = p2 04 bestdist = dist(p1,p2) 05 for i = 1 to n 06 for j = 1 to n 07 newdist = dist(pi ,pj ) 08 if (i 6= j and newdist < bestdist) 09 best1 = pi 10 best2 = pj 11 bestdist = newdist 12 return (best1, best2) Algorithms and NP Algorithms Complexity P vs NP Mergesort 01 merge(L1,L2: sorted lists of real numbers) 02 if (L1 is empty and L2 is empty) 03 return emptylist 04 else if (L2 is empty or head(L1) <= head(L2)) 05 return cons(head(L1),merge(rest(L1),L2)) 06 else 07 return cons(head(L2),merge(L1,rest(L2))) 01 mergesort(L = a1; a2;:::; an: list of real numbers) 02 if (n = 1) then return L 03 else 04 m = bn=2c 05 L1 = (a1; a2;:::; am) 06 L2 = (am+1; am+2;:::; an) 07 return merge(mergesort(L1),mergesort(L2)) Algorithms and NP Algorithms Complexity P vs NP Find End 01 findend(A: array of numbers) 02 mylen = length(A) 03 if (mylen < 2) error 04 else if (A[0] = 0) error 05 else if (A[mylen-1] 6= 0) return mylen-1 06 else return findendrec(A,0,mylen-1) 11 findendrec(A, bottom, top: positive integers) 12 if (top
    [Show full text]
  • Lecture 11 — October 16, 2012 1 Overview 2
    6.841: Advanced Complexity Theory Fall 2012 Lecture 11 | October 16, 2012 Prof. Dana Moshkovitz Scribe: Hyuk Jun Kweon 1 Overview In the previous lecture, there was a question about whether or not true randomness exists in the universe. This question is still seriously debated by many physicists and philosophers. However, even if true randomness does not exist, it is possible to use pseudorandomness for practical purposes. In any case, randomization is a powerful tool in algorithms design, sometimes leading to more elegant or faster ways of solving problems than known deterministic methods. This naturally leads one to wonder whether randomization can solve problems outside of P, de- terministic polynomial time. One way of viewing randomized algorithms is that for each possible setting of the random bits used, a different strategy is pursued by the algorithm. A property of randomized algorithms that succeed with bounded error is that the vast majority of strategies suc- ceed. With error reduction, all but exponentially many strategies will succeed. A major theme of theoretical computer science over the last 20 years has been to turn algorithms with this property into algorithms that deterministically find a winning strategy. From the outset, this seems like a daunting task { without randomness, how can one efficiently search for such strategies? As we'll see in later lectures, this is indeed possible, modulo some hardness assumptions. There are two main objectives in this lecture. The first objective is to introduce some probabilis- tic complexity classes, by bounding time or space. These classes are probabilistic analogues of deterministic complexity classes P and L.
    [Show full text]
  • Naiad: a Timely Dataflow System
    Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry Rebecca Isaacs Michael Isard Paul Barham Mart´ın Abadi Microsoft Research Silicon Valley {derekmur,mcsherry,risaacs,misard,pbar,abadi}@microsoft.com Abstract User queries Low-latency query are received responses are delivered Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput Queries are of batch processors, the low latency of stream proces- joined with sors, and the ability to perform iterative and incremental processed data computations. Although existing systems offer some of Complex processing these features, applications that require all three have re- incrementally re- lied on multiple platforms, at the expense of efficiency, Updates to executes to reflect maintainability, and simplicity. Naiad resolves the com- data arrive changed data plexities of combining these features in one framework. A new computational model, timely dataflow, under- Figure 1: A Naiad application that supports real- lies Naiad and captures opportunities for parallelism time queries on continually updated data. The across a wide class of algorithms. This model enriches dashed rectangle represents iterative processing that dataflow computation with timestamps that represent incrementally updates as new data arrive. logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism. requirements: the application performs iterative process- We show that many powerful high-level programming ing on a real-time data stream, and supports interac- models can be built on Naiad’s low-level primitives, en- tive queries on a fresh, consistent view of the results. abling such diverse tasks as streaming data analysis, it- However, no existing system satisfies all three require- erative machine learning, and interactive graph mining.
    [Show full text]
  • User's Guide for Complexity: a LATEX Package, Version 0.80
    User’s Guide for complexity: a LATEX package, Version 0.80 Chris Bourke April 12, 2007 Contents 1 Introduction 2 1.1 What is complexity? ......................... 2 1.2 Why a complexity package? ..................... 2 2 Installation 2 3 Package Options 3 3.1 Mode Options .............................. 3 3.2 Font Options .............................. 4 3.2.1 The small Option ....................... 4 4 Using the Package 6 4.1 Overridden Commands ......................... 6 4.2 Special Commands ........................... 6 4.3 Function Commands .......................... 6 4.4 Language Commands .......................... 7 4.5 Complete List of Class Commands .................. 8 5 Customization 15 5.1 Class Commands ............................ 15 1 5.2 Language Commands .......................... 16 5.3 Function Commands .......................... 17 6 Extended Example 17 7 Feedback 18 7.1 Acknowledgements ........................... 19 1 Introduction 1.1 What is complexity? complexity is a LATEX package that typesets computational complexity classes such as P (deterministic polynomial time) and NP (nondeterministic polynomial time) as well as sets (languages) such as SAT (satisfiability). In all, over 350 commands are defined for helping you to typeset Computational Complexity con- structs. 1.2 Why a complexity package? A better question is why not? Complexity theory is a more recent, though mature area of Theoretical Computer Science. Each researcher seems to have his or her own preferences as to how to typeset Complexity Classes and has built up their own personal LATEX commands file. This can be frustrating, to say the least, when it comes to collaborations or when one has to go through an entire series of files changing commands for compatibility or to get exactly the look they want (or what may be required).
    [Show full text]
  • Expander Flows, Geometric Embeddings and Graph Partitioning
    Expander Flows, Geometric Embeddings and Graph Partitioning SANJEEV ARORA Princeton University SATISH RAO and UMESH VAZIRANI UC Berkeley We give a O(√log n)-approximation algorithm for the sparsest cut, edge expansion, balanced separator,andgraph conductance problems. This improves the O(log n)-approximation of Leighton and Rao (1988). We use a well-known semidefinite relaxation with triangle inequality constraints. Central to our analysis is a geometric theorem about projections of point sets in d, whose proof makes essential use of a phenomenon called measure concentration. We also describe an interesting and natural “approximate certificate” for a graph’s expansion, which involves embedding an n-node expander in it with appropriate dilation and congestion. We call this an expander flow. Categories and Subject Descriptors: F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity; G.2.2 [Mathematics of Computing]: Discrete Mathematics and Graph Algorithms General Terms: Algorithms,Theory Additional Key Words and Phrases: Graph Partitioning,semidefinite programs,graph separa- tors,multicommodity flows,expansion,expanders 1. INTRODUCTION Partitioning a graph into two (or more) large pieces while minimizing the size of the “interface” between them is a fundamental combinatorial problem. Graph partitions or separators are central objects of study in the theory of Markov chains, geometric embeddings and are a natural algorithmic primitive in numerous settings, including clustering, divide and conquer approaches, PRAM emulation, VLSI layout, and packet routing in distributed networks. Since finding optimal separators is NP-hard, one is forced to settle for approximation algorithms (see Shmoys [1995]). Here we give new approximation algorithms for some of the important problems in this class.
    [Show full text]
  • Optimizing Social Welfare for Myopic Multi-Armed Bandits
    Noble Deceit: Optimizing Social Welfare for Myopic Multi-Armed Bandits SUBMISSION #553 In the information economy, consumer-generated information greatly informs the decisions of future consumers. However, myopic consumers seek to maximize their own reward with no regard for the information they generate. By controlling the consumers’ access to information, a central planner can incentivize them to produce more valuable information for future consumers. The myopic multi-armed bandit problem is a simple model encapsulating these issues. In this paper, we construct a simple incentive-compatible mechanism achieving constant regret for the problem. We use the novel idea of completing phases to selectively reveal information towards the goal of maximizing social welfare. Moreover, we characterize the distributions for which an incentive-compatible mechanism can achieve the first-best outcome and show that our general mechanism achieves first-best in such settings. Manuscript submitted for review to the 22nd ACM Conference on Economics & Computation (EC'21). 1 INTRODUCTION Submission #553 1 With society’s deeper integration with technology, information generated by con- sumers has become an incredibly valuable commodity. Entire industries solely exist to compile and distribute this information effectively. Some of this information, such as product ratings or traffic reports, is critical for informing a consumer’s decisions. For example, an agent may wish to buy the best phone case from an online retailer. With- out accurate product ratings, an agent might purchase a low-quality item. However, if every agent were to simply purchase the highest rated product, alternate options would never get explored and the truly best product might never be discovered.
    [Show full text]
  • ML Cheatsheet Documentation
    ML Cheatsheet Documentation Team Sep 02, 2021 Basics 1 Linear Regression 3 2 Gradient Descent 21 3 Logistic Regression 25 4 Glossary 39 5 Calculus 45 6 Linear Algebra 57 7 Probability 67 8 Statistics 69 9 Notation 71 10 Concepts 75 11 Forwardpropagation 81 12 Backpropagation 91 13 Activation Functions 97 14 Layers 105 15 Loss Functions 117 16 Optimizers 121 17 Regularization 127 18 Architectures 137 19 Classification Algorithms 151 20 Clustering Algorithms 157 i 21 Regression Algorithms 159 22 Reinforcement Learning 161 23 Datasets 165 24 Libraries 181 25 Papers 211 26 Other Content 217 27 Contribute 223 ii ML Cheatsheet Documentation Brief visual explanations of machine learning concepts with diagrams, code examples and links to resources for learning more. Warning: This document is under early stage development. If you find errors, please raise an issue or contribute a better definition! Basics 1 ML Cheatsheet Documentation 2 Basics CHAPTER 1 Linear Regression • Introduction • Simple regression – Making predictions – Cost function – Gradient descent – Training – Model evaluation – Summary • Multivariable regression – Growing complexity – Normalization – Making predictions – Initialize weights – Cost function – Gradient descent – Simplifying with matrices – Bias term – Model evaluation 3 ML Cheatsheet Documentation 1.1 Introduction Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It’s used to predict values within a continuous range, (e.g. sales, price) rather than trying to classify them into categories (e.g. cat, dog). There are two main types: Simple regression Simple linear regression uses traditional slope-intercept form, where m and b are the variables our algorithm will try to “learn” to produce the most accurate predictions.
    [Show full text]
  • Quantum Computing : a Gentle Introduction / Eleanor Rieffel and Wolfgang Polak
    QUANTUM COMPUTING A Gentle Introduction Eleanor Rieffel and Wolfgang Polak The MIT Press Cambridge, Massachusetts London, England ©2011 Massachusetts Institute of Technology All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. For information about special quantity discounts, please email [email protected] This book was set in Syntax and Times Roman by Westchester Book Group. Printed and bound in the United States of America. Library of Congress Cataloging-in-Publication Data Rieffel, Eleanor, 1965– Quantum computing : a gentle introduction / Eleanor Rieffel and Wolfgang Polak. p. cm.—(Scientific and engineering computation) Includes bibliographical references and index. ISBN 978-0-262-01506-6 (hardcover : alk. paper) 1. Quantum computers. 2. Quantum theory. I. Polak, Wolfgang, 1950– II. Title. QA76.889.R54 2011 004.1—dc22 2010022682 10987654321 Contents Preface xi 1 Introduction 1 I QUANTUM BUILDING BLOCKS 7 2 Single-Qubit Quantum Systems 9 2.1 The Quantum Mechanics of Photon Polarization 9 2.1.1 A Simple Experiment 10 2.1.2 A Quantum Explanation 11 2.2 Single Quantum Bits 13 2.3 Single-Qubit Measurement 16 2.4 A Quantum Key Distribution Protocol 18 2.5 The State Space of a Single-Qubit System 21 2.5.1 Relative Phases versus Global Phases 21 2.5.2 Geometric Views of the State Space of a Single Qubit 23 2.5.3 Comments on General Quantum State Spaces
    [Show full text]