Learning Binary Relations and Total Orders
Sally A. Goldman, Department of Computer Science, Washington University, St. Louis, MO (sg@cs.wustl.edu)
Ronald L. Rivest, MIT Laboratory for Computer Science, Cambridge, MA (rivest@theory.lcs.mit.edu)
Robert E. Schapire, AT&T Bell Laboratories, Murray Hill, NJ (schapire@research.att.com)

Abstract

We study the problem of learning a binary relation between two sets of objects or between a set and itself. We represent a binary relation between a set of size n and a set of size m as an n x m matrix of bits whose (i, j) entry is 1 if and only if the relation holds between the corresponding elements of the two sets. We present polynomial prediction algorithms for learning binary relations in an extended online learning model, where the examples are drawn by the learner, by a helpful teacher, by an adversary, or according to a uniform probability distribution on the instance space.

In the first part of this paper we present results for the case that the matrix of the relation has at most k row types. We present upper and lower bounds on the number of prediction mistakes any prediction algorithm makes when learning such a matrix under the extended online learning model. Furthermore, we describe a technique that simplifies the proof of expected mistake bounds against a randomly chosen query sequence.

In the second part of this paper we consider the problem of learning a binary relation that is a total order on a set. We describe a general technique using a fully polynomial randomized approximation scheme (fpras) to implement a randomized version of the halving algorithm. We apply this technique to the problem of learning a total order, using an fpras for counting the number of extensions of a partial order, to obtain a polynomial prediction algorithm that with high probability makes at most n lg n + (lg e) lg n mistakes when an adversary selects the query sequence. We also consider the case that a teacher or the learner selects the query sequence.

This paper was prepared while all authors were at the MIT Laboratory for Computer Science, with support from an NSF grant, an ARO grant, a DARPA contract, and a grant from the Siemens Corporation. S. Goldman currently receives support from a GE Foundation Junior Faculty Grant and an NSF grant. Preliminary versions of this paper appeared in the Proceedings of the IEEE Symposium on Foundations of Computer Science and as an MIT Laboratory for Computer Science Technical Memo.

Keywords: Machine Learning, Computational Learning Theory, Online Learning, Mistake-bounded Learning, Binary Relations, Total Orders, Fully Polynomial Randomized Approximation Schemes

Introduction

In many domains it is important to acquire information about a relation between two sets. For example, one may wish to learn a "has-part" relation between a set of animals and a set of attributes. We are motivated by the problem of designing a prediction algorithm to learn such a binary relation when the learner has limited prior information about the predicate forming the relation. While one could model such problems as concept learning, they are fundamentally different problems. In concept learning there is a single set of objects and the learner's task is to classify these objects, whereas in learning a binary relation there are two sets of objects and the learner's task is to learn the predicate relating the two sets. Observe that the problem of learning a binary relation can be viewed as a concept learning problem by letting the instances be all ordered pairs of objects from the two sets. However, the ways in which the problem may be structured are quite different when the true task is to learn a binary relation as opposed to a classification rule. That is, instead of a rule that defines which objects belong to the target concept, the predicate defines a relationship between pairs of objects.
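To make the matrix view concrete before it is developed formally below, the following toy sketch (in Python, with invented animal and attribute names that do not come from the paper) stores a small "has-part" relation as a 3 x 4 matrix of bits whose (i, j) entry is 1 exactly when animal i has attribute j.

    # Toy illustration only: a "has-part" relation between 3 animals (rows)
    # and 4 attributes (columns), stored as an n x m matrix of bits.
    animals = ["sparrow", "trout", "bat"]          # hypothetical first set
    attributes = ["wings", "fins", "fur", "beak"]  # hypothetical second set

    has_part = [
        [1, 0, 0, 1],   # sparrow: wings, beak
        [0, 1, 0, 0],   # trout: fins
        [1, 0, 1, 0],   # bat: wings, fur
    ]

    def related(i, j):
        """The predicate for a pair of objects is a single matrix lookup."""
        return has_part[i][j] == 1

    assert related(0, 0) and not related(1, 2)   # sparrow has wings; trout has no fur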
A binary relation is defined between two sets of objects. Throughout this paper we assume that one set has cardinality n and the other has cardinality m. We also assume that for every possible pairing of objects, the predicate relating the two sets is either true or false.

Before defining a prediction algorithm, we first discuss our representation of a binary relation. Throughout this paper we represent the relation as an n x m binary matrix, where an entry contains the value of the predicate for the corresponding elements. Since the predicate is binary-valued, all entries in this matrix are either false or true. The two-dimensional structure arises from the fact that we are learning a binary relation. For the sake of comparison, we now briefly mention other possible representations. One could represent the relation as a table with two columns, where each entry in the first column is an item from the first set and each entry in the second column is an item from the second set; the rows of the table consist of the subset of the potential nm pairings for which the predicate is true. One could also represent the relation as a bipartite graph with n vertices in one vertex set and m vertices in the other set, with an edge placed between two vertices exactly when the predicate is true for the corresponding items.

Having introduced our method for representing the problem, we now informally describe the basic learning scenario. The learner is repeatedly given a pair of elements, one from each set, and asked to predict the corresponding matrix entry. After making its prediction, the learner is told the correct value of that entry. The learner wishes to minimize the number of incorrect predictions it makes. Since we assume that the learner must eventually make a prediction for each matrix entry, the number of incorrect predictions depends on the size of the matrix. Unlike typically studied problems, where the natural measure of the size of the learner's problem is the size of an instance or example, here it is the size of the matrix.

Such concept classes, with polynomial-sized instance spaces, are uninteresting in Valiant's probably approximately correct (PAC) model of learning. In this model, instances are chosen randomly from an arbitrary unknown probability distribution on the instance space, and a concept class is PAC-learnable if the learner, after seeing a number of instances that is polynomial in the problem size, can with high probability output a hypothesis that is correct on all but an arbitrarily small fraction of the instances. For concepts whose instance space has cardinality polynomial in the problem size, by asking to see enough instances the learner can see almost all of the probability weight of the instance space. Thus it is not hard to show that these concept classes are trivially PAC-learnable.

One goal of our research is to build a framework for studying such problems. To study learning algorithms for these concept classes, we extend the basic mistake-bound model to the cases in which a helpful teacher or the learner selects the query sequence, in addition to the cases in which instances are chosen by an adversary or according to a probability distribution on the instance space. Previously, helpful teachers have been used to provide counterexamples to conjectured concepts or to break up the concept into smaller subconcepts. In our framework the teacher only selects the presentation order for the instances.
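As a concrete rendering of this protocol, here is a minimal Python sketch of the extended mistake-bound setting described above. The interface (run_protocol, random_order, and the predict callback) is an illustrative assumption of ours rather than notation from the paper, and only the random query sequence is shown explicitly.

    import random

    def run_protocol(matrix, order, predict):
        """Count the learner's mistakes over one pass through the query sequence.

        matrix  -- the hidden n x m relation, a list of 0/1 rows
        order   -- the query sequence: every (i, j) pair exactly once
        predict -- learner callback mapping (i, j, entries seen so far) to 0 or 1
        """
        seen = {}        # feedback received so far: (i, j) -> true bit
        mistakes = 0
        for (i, j) in order:
            guess = predict(i, j, seen)
            truth = matrix[i][j]
            if guess != truth:
                mistakes += 1
            seen[(i, j)] = truth      # the learner is told the correct value
        return mistakes

    def random_order(n, m):
        """A uniformly random query sequence over all nm matrix entries."""
        cells = [(i, j) for i in range(n) for j in range(m)]
        random.shuffle(cells)
        return cells

An adversary or a helpful teacher would supply the query sequence in place of the random permutation above; in the learner-directed case the choice of the next query would instead be folded into the learner itself.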
If the learner is to have any hope of doing better than random guessing, there must be some structure in the relation. Furthermore, since there are so many ways to structure a binary relation, we give the learner some prior knowledge about the nature of this structure. Not surprisingly, the learning task depends greatly on the prior knowledge provided.

One way to impose structure is to restrict one set of objects to have relatively few types. For example, a circus may contain many animals but only a few different species. In the first part of this paper we study the case where the learner has a priori knowledge that there are a limited number of object types. Namely, we restrict the matrix representing the relation to have at most k distinct row types, where two rows are of the same type if they agree in all columns. We define a k-binary-relation to be a binary relation for which the corresponding matrix has at most k row types. This restriction is satisfied whenever there are only k types of objects in the set of n objects being considered in the relation. The learner receives no other knowledge about the predicate forming the relation.

With this restriction, we prove that any prediction algorithm makes Omega(km + (n - k)⌊lg k⌋) mistakes in the worst case against any query sequence. If computational efficiency is not a concern, the halving algorithm makes at most km + (n - k) lg k mistakes against any query sequence; the halving algorithm predicts according to the majority of the feasible relations (or concepts), and thus each mistake halves the number of remaining relations. We present an efficient algorithm making at most km + (n - k)⌊lg k⌋ mistakes in the case that the learner chooses the query sequence, and we prove a tight mistake bound of km + (n - k)(k - 1) in the case that the helpful teacher selects the query sequence. When the adversary selects the query sequence, we present an efficient algorithm for k = 2 that makes O(m + n) mistakes, and for arbitrary k we present an efficient algorithm making O(km + n√(km)) mistakes. We prove that any algorithm makes at least km + (n - k)⌊lg k⌋ mistakes in the case that an adversary selects the query sequence, and use the existence