A Geometric Approach to Betweenness
Total Page:16
File Type:pdf, Size:1020Kb
A GEOMETRIC APPROACH TO BETWEENNESS y z BENNY CHOR AND MADHU SUDAN Abstract An input to the betweenness problem contains m constraints over n real variables p oints Each constraint consists of three p oints where one of the p oint is sp ecied to lie inside the interval dened by the other two The order of the other two p oints ie which one is the largest and which one is the smallest is not sp ecied This problem comes up in questions related to physical mapping in molecular biology In Opatrny showed that the problem of deciding whether the n p oints can b e totally ordered while satisfying the m b etweenness constraints is NPcomplete Furthermore the problem is MAX SNP complete and for every nding a total order that satises at least of the m constraints is NPhard even if all the constraints are satisable It is easy to nd an ordering of the p oints that satises of the m constraints eg by cho osing the ordering at random This pap er presents a p olynomial time algorithm that either determines that there is no feasible solution or nds a total order that satises at least of the m constraints The algorithm translates the problem into a set of quadratic inequalities and solves a semidenite relaxation of n them in R The n solution p oints are then pro jected on a random line through the origin The claimed p erformance guarantee is shown using simple geometric prop erties of the SDP solution Key words Approximation algorithm Semidenite programming NPcompleteness Compu tational biology AMS sub ject classications Q Q Intro duction An input to the betweenness problem consists of a nite set of n elements or points S fx x g and a nite set of m constraints Each n constraint consists of a triplet x x x S S S A candidate solution to the i j k x b etweenness problem is a total order on its p oints A total order x i i 2 1 satises the constraint x x x if either x x x or x x x That is x i j k i j k k j i i n each constraint forces the second variable x to b e b etween the two other variables x j i and x but do es not sp ecify the relative order of x and x The decision version of k i k the b etweenness problem is to decide if all constraints can b e simultaneously satised by a total order of the variables In Opatrny showed that the decision version of the b etweenness prob lem is NPcomplete This problem arises naturally when analyzing certain mapping problems in molecular biology For example it arises when trying to order markers on a chromosome given the results of a radiation hybrid exp eriment A com putational task of practical signicance in this context is to nd a total ordering of the markers the x in our terminology that maximizes the numb er of satised con i straints Indeed b etweenness is central in the recent software package RHMAPPER At the heart of this package is a metho d for pro ducing the order of framework markers based on b etweenness constraints obtained from a statistical analysis of the biological data Slonim et al successfully employ two greedy heuristics for solving the b etweenness problem A preliminary version of this pap er app eared in the pro ceedings of the third annual Europ ean Symp osium on Algorithms ESA Lecture Notes in Computer Science Paul Spirakis Ed Springer Septemb er pp y Dept of Computer Science Technion Haifa Israel b ennycstechnionacil Research partially supp orted by Technion VPR funds z Lab oratory for Computer Science MIT Technology Square Cambridge MA madhulcsmitedu Part of the work was done when this author was at the IBM Thomas J Watson Research Center Chor and Sudan Opatrny gave two reductions in his pro of of NPcompleteness One of these reductions is from SAT Following his construction we show in Section an ap proximation preserving reduction from MAX SAT This implies that there exists an such that nding a total order that satises at least m of the constraints even if they are all satisable is NPhard In particular this holds for every see Corollary On the other hand it is easy to nd a total order that satises one third of the m constraints even if they are not all satisable Simply arrange the p oints in a random order along the line The probability that a sp ecic constraint x x x is satised by such a randomly chosen order is since exactly two of i j k the six p ermutations on i j k have j in the middle Thus the exp ected numb er of constraints satised by a random order is at least a third of the m constraints On the other hand it is easy to construct examples where at most m constraints are satisable Thus to achieve b etter approximation factors one needs to b e able to recognize instances of the b etweenness problems that are not satisable We present a p olynomial time algorithm that either determines that there is no feasible solution or nds a total order that satises at least of the m constraints Our algorithm translates the problem into a set of quadratic inequalities and solves n n a semidenite programming relaxation SDP of them in R Let v v R n b e a feasible solution to the SDP where each v corresp onds to the real variable x i i The n solution p oints are then pro jected on a random line through the origin We show that if x b etween x and x is one of the b etweenness constraints then the j i k n angle b etween the lines v v and v v in R is obtuse Using this prop erty we prove i j k j that the random pro jection satises each constraint with probability at least This gives a randomized algorithm with the claimed p erformance guarantee Next we show how to derandomize the algorithm In addition we demonstrate that our analysis of the semidenite program is tight There is an innite family of inputs to the b etweenness problem such that the resulting SDP is feasible but any total order of the variables satises at most o of the m constraints Our use of semidenite programming is inspired by the recent success in using this metho dology to nd improved approximation algorithms for several optimization problems The applicability of SDP in combinatorial optimization was demonstrated by Grotschel Lovasz and Schrijver to show that the Theta function of Lovasz was p olynomial time computable This application was then turned into exact coloring and indep endent set nding algorithms for p erfect graphs The use of SDP in approximation algorithms was innovated by the work of Go emans and Williamson who broke longstanding barriers in the approximability of MAX CUT and MAX SAT by their SDP based algorithm Further evidence of the applicability of the SDP approach is provided by the works of Karger Motwani and Sudan who use it to approximate graph coloring Alon and Kahale indep endent set approximation and by Feige and Go emans improvements to MAX SAT Thus the semidenite programming metho d has now b een used successfully to solve many optimization problems exactly and approximately However all the cases where SDP has b een used to nd approximation algorithms seem to b e essen tially partition problems MAX CUT Coloring Multicut etc Our solution seems to b e to the b est of our knowledge the only case where SDP has b een used to solve an ordering problem This syntactic dierence b etween ordered structures and unordered ones and the ability of SDP to help optimize over b oth oers critical additional evidence on the p ower of the SDP metho dology The remaining of this pap er is organized as following Section presents the A Geometric Approach to Betweenness approximation preserving reduction from MAX SAT as well as other observations ab out the b etweenness problem Semidenite programming is briey reviewed in Section The algorithm is presented in Section Section shows the tightness of our analysis Finally Section contains some concluding remarks and op en problems Preliminaries We start this section with some preliminary observations ab out the b etweenness problem We b egin by dening the notion of an approxi mate solution to the b etweenness problem and analyzing the complexity of nding such a solution Definition Given an instance of the betweenness problem on m constraints and an approximate solution is one that satises at least k constraints where k is the maximum number of constraints satised by any solution For the approximation version of the betweenness problem is the task of nding an approximate solution for every instance An algorithm which solves such a problem is said to be an approximation algorithm For the approximation problem for satisable instances is the task of nding a total order that satises m constraints or determining that the instance is not satisable An algorithm which solves this problem is an approximation algorithm for satisable instances The complexity of solving the b etweenness problem exactly ie for is wellsettled Opatrny has shown that it is NPhard to decide if a given instance of the b etweenness problem is satisable We now turn our attention to the complexity of the problem for We rst present a hardness result based on a simple reduction from MAX CUT due to Michel Go emans An instance of the MAX CUT problem is an undirected graph The goal of the problem is to nd a partition S S of the vertex set so as to maximize the numb er of edges with one endp oint in S and one in S This problem was shown to b e hard to approximate to within some factor by Arora et al The b est result known to date due to Hastad see also Trevisan et al is that approximating MAX CUT is NPhard for every Proposition For every the approximation version of the MAX CUT problem reduces