Journal of Machine Learning Research 8 (2007) 2369-2403 Submitted 4/07; Revised 8/07; Published 10/07 The On-Line Shortest Path Problem Under Partial Monitoring Andras´ Gyor¨ gy
[email protected] Machine Learning Research Group Computer and Automation Research Institute Hungarian Academy of Sciences Kende u. 13-17, Budapest, Hungary, H-1111 Tamas´ Linder
[email protected] Department of Mathematics and Statistics Queen’s University, Kingston, Ontario Canada K7L 3N6 Gabor´ Lugosi
[email protected] ICREA and Department of Economics Universitat Pompeu Fabra Ramon Trias Fargas 25-27 08005 Barcelona, Spain Gyor¨ gy Ottucsak´
[email protected] Department of Computer Science and Information Theory Budapest University of Technology and Economics Magyar Tudosok´ Kor¨ utja´ 2. Budapest, Hungary, H-1117 Editor: Leslie Pack Kaelbling Abstract The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1=pn and depends only polynomially on the number of edges of the graph.