Theory and Algorithms for Modern Problems in Machine Learning and an Analysis of Markets
by Ashish Rastogi

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, May 2008.

Richard Cole, Advisor
Mehryar Mohri, Advisor

© Ashish Rastogi, All Rights Reserved, 2008

To the most wonderful parents in the whole world, Mrs. Asha Rastogi and Mr. Shyam Lal Rastogi

Acknowledgements

First and foremost, I would like to thank my advisors, Professor Richard Cole and Professor Mehryar Mohri, for their unwavering support, guidance and constant encouragement. They have been inspiring mentors and much of what lies in the following pages can be credited to them. Working under their supervision has been one of the most enriching experiences of my life.

I would also like to thank Professor Joel Spencer, Professor Arun Sundararajan, Professor Subhash Khot and Dr. Corinna Cortes for agreeing to serve as members on my thesis committee. Professor Spencer’s class on Random Graphs remains one of the most stimulating courses I undertook as a graduate student. Internships at Google through the summers of 2005, 2006 and 2007 were some of the most enjoyable periods of my graduate school life. Many thanks are due to Dr. Corinna Cortes for providing me with the opportunity to work on several challenging problems at Google. Research initiated during these internships culminated in the development of ideas that form the bulk of this thesis.

I would also like to thank my peers from the graduate school. Spirited discussions in the reading group meetings and during various seminars were directly responsible for the development of a “research attitude”, a critical ingredient for the successful completion of graduate studies. For all this, I thank Eugene Weinstein and Tyler Neylon.

Pursuing a Ph.D. can be an arduous affair. It entails everything from having to cope with failure to the exhilarating feeling of settling a research problem successfully. I would like to express my gratitude towards my friends, especially Shruti Haldea, Ritu Gupta, Abhimanyu Yadav, Usha Mallya, Hameer Ruparel, Chris Wu and Marjorie Levy, for providing an excellent web of support.

Finally, none of this would ever have transpired without the unselfish love and support of my precious parents and my delightful sister Ragini. Their constant support, patient love and care, and unswerving faith have gotten me where I am. This thesis is dedicated to them.

Abstract

The unprecedented growth of the Internet over the past decade, and of data collection more generally, has given rise to vast quantities of digital information, ranging from web documents and images to genomic databases and a vast array of business customer information. Consequently, it is of growing importance to develop tools and models that enable us to better understand this data and to design data-driven algorithms that leverage this information. This thesis provides several fundamental theoretical and algorithmic results for tackling such problems, with applications to speech recognition, image processing, natural language processing, computational biology and web-based algorithms.

• Probabilistic automata provide an efficient and compact way to model sequence-oriented data such as speech or web documents.
Measuring the similarity of such automata provides a way of comparing the objects they model, and is an essential first step in organizing this type of data. We present algorithmic and hardness results for computing various discrepancies (or dissimilarities) between probabilistic automata, including the relative entropy and the Lp distance; we also give an efficient algorithm to determine if two probabilistic automata are equivalent. In addition, we study the complexity of computing the norms of probabilistic automata.

• The widespread success of search engines and information retrieval systems has led to the large-scale collection of rating information, which is being used to provide personalized rankings. We examine an alternate formulation of the ranking problem for search engines, motivated by the requirement that, in addition to accurately predicting pairwise ordering, ranking systems must also preserve the magnitude of the preferences, or the difference between ratings. We present algorithms with sound theoretical properties, and verify their efficacy through experiments.

• Organizing and querying large amounts of digitized data such as images and videos is a challenging task because little or no label information is available. This motivates transduction, a setting in which the learning algorithm can leverage unlabeled data during training to improve performance. We present novel error bounds for a family of transductive regression algorithms and validate their usefulness through experiments.

• Finally, price discovery in a market setting can be viewed as an (ongoing) learning problem. Specifically, the problem is to find and maintain a set of prices that balance supply and demand, a core topic in economics. This appears to involve complex, implicit and possibly large-scale information transfers. We show that finding equilibrium prices, even approximately, in discrete markets is NP-hard, and complement the hardness result with a matching polynomial-time approximation algorithm. We also give a new way of measuring the quality of an approximation to equilibrium prices that is based on a natural aggregation of the dissatisfaction of individual market participants.

Contents

Dedication
Acknowledgements
Abstract

1 Distances Between Probabilistic Automata
   1.1 Introduction
   1.2 Preliminaries
      1.2.1 Semirings
      1.2.2 Probabilistic Automata and Shortest-Distances
      1.2.3 Algorithms for Computing Shortest-Distances
      1.2.4 Composition of Weighted Automata
      1.2.5 Distances Between Distributions and Norms
      1.2.6 Problems Studied
   1.3 Algorithms for Computation of Distances
      1.3.1 Relative Entropy of Unambiguous Automata
      1.3.2 L2p Distance
      1.3.3 Hellinger Distance
   1.4 Hardness Results
      1.4.1 L2p+1 Distance and L∞ Distance
      1.4.2 Relative Entropy of Arbitrary Automata
   1.5 Equivalence of Probabilistic Automata
   1.6 Relative Entropy as a Kernel
   1.7 Computation of the Norm
      1.7.1 Norm of an Unambiguous Automaton
      1.7.2 Norm of Arbitrary Automata
      1.7.3 Approximate Computation of Lp-norm
      1.7.4 Approximate Computation of Entropy
   1.8 Conclusion

2 Magnitude-Preserving Ranking Algorithms
   2.1 Introduction and Motivation
   2.2 Preliminaries
      2.2.1 Formulation of the Problem
      2.2.2 Cost Functions
      2.2.3 Kernels and Regularization
   2.3 Algorithms
      2.3.1 Objective Functions
      2.3.2 MPRank
      2.3.3 SVRank
      2.3.4 On-line Version of MPRank
      2.3.5 Leave-One-Out Analysis for MPRank
   2.4 Stability Bounds
      2.4.1 Magnitude-Preserving Regularization Algorithms
   2.5 Experiments
      2.5.1 MovieLens Dataset
      2.5.2 Jester Joke Dataset
      2.5.3 Netflix Dataset
      2.5.4 Book-Crossing Dataset
      2.5.5 Performance Measures and Results
      2.5.6 On-line Version of MPRank
   2.6 Conclusion

3 Transductive Regression
   3.1 Introduction
   3.2 Preliminaries
      3.2.1 Learning Setting
      3.2.2 Transductive Stability
   3.3 Transductive Regression Stability Bounds
      3.3.1 Bound for Sampling without Replacement
      3.3.2 Transductive Stability Bound
   3.4 Stability of Local Transductive Regression Algorithms
      3.4.1 Local Transductive Regression Algorithms
   3.5 Stability Based on Closed-Form Solution
      3.5.1 Unconstrained Regularization Algorithms
      3.5.2 Stability of Constrained Regularization Algorithms
      3.5.3 Making Seemingly Unstable Algorithms Stable
   3.6 Experiments
      3.6.1 Model Selection Based on Bound
   3.7 Conclusion

4 An Analysis of Discrete Markets
   4.1 Introduction
   4.2 Definitions
   4.3 Hardness of Computing Near-Equilibrium Prices
      4.3.1 The Balanced Max-3SAT-3 Problem
      4.3.2 Details of the Hardness Construction
   4.4 A Matching Algorithm
   4.5 A Local Tatonnement Algorithm
      4.5.1 Introduction and Motivation
      4.5.2 Analysis of Convergence
   4.6 Relationship Between Discontent and ε-closeness in Utility

Bibliography

Chapter 1
Distances Between Probabilistic Automata

1.1 Introduction

A probabilistic automaton is a finite automaton in which each transition carries a non-negative weight. The weight associated with a path π in the automaton is the product of the weights of the transitions that appear on π, and the probability associated with a string x accepted by the automaton is the sum of the weights of all the paths on which x is accepted. In addition, the sum of the weights of all strings accepted by the automaton is one. Thus, such an automaton represents a probability distribution over a regular set.

Probabilistic automata were introduced in [81] and are extensively used in a variety of areas of computer science, including text and speech processing [73], image processing [39], and computational biology [43]. In natural language processing applications, for example, probabilistic automata are used to describe morphological and phonological rules [61].

Closely related to probabilistic automata are Hidden Markov Models, which are also used extensively in statistical learning [42, 82]. In a Hidden Markov Model, the weight associated with a transition can itself be a random variable whose value is drawn from some distribution. However, when each transition has a fixed constant weight, a Hidden Markov Model is identical to a probabilistic automaton.
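The definition above can be restated as a single formula. The notation in this sketch (Acc(x) for the set of accepting paths labeled with x, w[e] for the weight of transition e, and [[A]](x) for the probability the automaton A assigns to x) is an assumption made here for concreteness, not necessarily the notation adopted later in the chapter:

```latex
% Probability assigned to a string x by a probabilistic automaton A.
% Acc(x): accepting paths labeled with x; w[e]: weight of transition e.
\[
  [\![ A ]\!](x) \;=\; \sum_{\pi \in \mathrm{Acc}(x)} \;\prod_{e \in \pi} w[e],
  \qquad
  \sum_{x \in \Sigma^*} [\![ A ]\!](x) \;=\; 1 .
\]
```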
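As a minimal executable illustration of the same computation, the Python sketch below represents a probabilistic automaton with dictionaries and evaluates string probabilities. The ProbabilisticAutomaton class, its state numbering, and the toy two-state automaton are all invented for this example and are not from the thesis; rather than enumerating paths explicitly, it uses the standard forward recursion, which sums the path weights implicitly.

```python
from collections import defaultdict

class ProbabilisticAutomaton:
    """Toy probabilistic automaton: weighted transitions plus initial
    and final (stopping) weights. Names are illustrative only."""

    def __init__(self, initial, transitions, final):
        self.initial = initial          # state -> initial weight
        self.transitions = transitions  # (state, symbol) -> [(next_state, weight)]
        self.final = final              # state -> final (stopping) weight

    def prob(self, x):
        """Probability of string x: the sum, over all accepting paths
        labeled with x, of the product of transition weights along the
        path, computed by the forward recursion."""
        forward = dict(self.initial)
        for symbol in x:
            nxt = defaultdict(float)
            for state, w in forward.items():
                for dest, tw in self.transitions.get((state, symbol), []):
                    nxt[dest] += w * tw
            forward = nxt
        return sum(w * self.final.get(s, 0.0) for s, w in forward.items())

# Toy two-state automaton over {a, b}: at each state, the outgoing
# transition weights plus the final weight sum to one.
pa = ProbabilisticAutomaton(
    initial={0: 1.0},
    transitions={
        (0, "a"): [(0, 0.3), (1, 0.2)],
        (0, "b"): [(1, 0.1)],
        (1, "a"): [(1, 0.4)],
    },
    final={0: 0.4, 1: 0.6},
)

print(pa.prob("a"))   # 0.3*0.4 + 0.2*0.6 = 0.24
print(pa.prob("ab"))  # 0.3*0.1*0.6 = 0.018 (only path: 0 -a-> 0 -b-> 1)
```

In this toy automaton the outgoing transition weights and the final weight at every state sum to one, and every state has positive stopping probability, so the probabilities assigned to all accepted strings sum to one, matching the normalization condition in the definition above.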