Satisfiability-Based Algorithms for Haplotype Inference

Total Page:16

File Type:pdf, Size:1020Kb

Satisfiability-Based Algorithms for Haplotype Inference UNIVERSIDADE TECNICA´ DE LISBOA INSTITUTO SUPERIOR TECNICO´ Satisfiability-based Algorithms for Haplotype Inference Ana Sofia Soeiro da Gra¸ca (Licenciada) Disserta¸c˜ao para obten¸c˜ao do Grau de Doutor em Engenharia Inform´atica e de Computadores Orientadora: Doutora Maria Inˆes Camarate de Campos Lynce de Faria Co-orientadores: Doutor Jo˜ao Paulo Marques da Silva Doutor Arlindo Manuel Limede de Oliveira J´uri Presidente: Presidente do Conselho Cient´ıfico do IST Vogais: Doutor Arlindo Manuel Limede de Oliveira Doutor Jo˜ao Paulo Marques da Silva Doutor Eran Halperin Doutor Ludwig Krippahl Doutor Lu´ıs Jorge Br´as Monteiro Guerra e Silva Doutora Maria Inˆes Camarate de Campos Lynce de Faria Janeiro de 2011 Abstract Identifying genetic variations has an important role in human genetics. Single Nucleotide Polymor- phisms (SNPs) are the most common genetic variations and haplotypes encode SNPs in a single chromosome. Unfortunately, current technology cannot obtain directly the haplotypes. Instead, genotypes, which represent the combined data of two haplotypes on homologous chromosomes, are acquired. The haplotype inference problem consists in obtaining the haplotypes from the genotype data. This problem is a key issue in genetics and a computational challenge. This dissertation proposes two Boolean optimization methods for haplotype inference, one po- pulation-based and another pedigree-based, which are competitive both in terms of accuracy and efficiency. The proposed method for solving the haplotype inference by pure parsimony (HIPP) approach is shown to be very robust, representing the state of the art HIPP method. The pedigree- based method, which takes into consideration both family and population information, shows to be more accurate than the existing methods for haplotype inference in pedigrees. Furthermore, this dissertation contributes to better understanding the HIPP problem by providing a vast comparison between HIPP methods. Keywords: haplotype inference, Boolean satisfiability (SAT), pseudo-Boolean optimization (PBO), maximum satisfiability (MaxSAT), pure parsimony (HIPP), minimum recombinant (MRHC). i ii Sum´ario A identifica¸c˜ao de varia¸c˜oes gen´eticas desempenha um papel fundamental no estudo da gen´etica humana. Os polimorfismos de nucle´otido ´unico (SNPs) correspondem `as varia¸c˜oes gen´eticas mais comuns entre os seres humanos e os hapl´otipos codificam SNPs presentes num ´unico cromossoma. Actualmente, por limita¸c˜oes de ´ındole tecnol´ogica, a obten¸c˜ao directa de hapl´otipos n˜ao ´eexequ´ıvel. Por outro lado, s˜ao obtidos os gen´otipos, que correspondem `ainforma¸c˜ao conjunta do par de cro- mossomas hom´ologos. O problema da inferˆencia de hapl´otipos consiste na obten¸c˜ao de hapl´otipos a partir de gen´otipos. Este paradigma representa um aspecto fundamental em gen´etica e tamb´em um interessante desafio computacional. Esta disserta¸c˜ao prop˜oe dois m´etodos de optimiza¸c˜ao Booleana para inferˆencia de hapl´otipos, um populacional e outro em linhagens geneal´ogicas, cujo desempenho ´eassinal´avel em eficiˆencia e precis˜ao. O m´etodo para inferˆencia de hapl´otipos por parcim´onia pura (HIPP) em popula¸c˜oes mostra-se muito robusto e representa o estado da arte entre os modelos HIPP. O modelo proposto para inferˆencia de hapl´otipos em linhagens geneal´ogicas, que combina informa¸c˜ao familiar e popula- cional, representa um m´etodo mais preciso que os congeneres. Ademais, esta disserta¸c˜ao representa um contributo relevante na compreens˜ao do problema HIPP, fornecendo uma extensa compara¸c˜ao entre m´etodos. Palavras chave: inferˆencia de hapl´otipos, satisfa¸c˜ao Booleana (SAT), optimiza¸c˜ao pseudo-Booleana (PBO), satisfa¸c˜ao m´axima (MaxSAT), parcim´onia pura (HIPP), recombina¸c˜ao m´ınima (MRHC). iii iv Acknowledgments “Bear in mind that the wonderful things that you learn in your schools are the work of many generations, produced by enthusiastic effort and infinite labor in every country of the world. All this is put into your hands as your inheritance in order that you may receive it, honor it, and add to it, and one day faithfully hand it on to your children.” Albert Einstein I have always been fascinated by science, its mysteries and curiosities. Understanding the world where we live and, in particular, the human beings, requires the effort and hard work of many generations. This thesis represents a modest contribution to the development of science, my little drop in the ocean. Studying such a multidisciplinary topic was a big challenge and achieving this goal represents a very important step for me. However, I could not have reached this point without the help of several people. First of all, I owe a special thanks to Professor Inˆes Lynce for being such a dedicated supervisor and for her contributions to this project. She taught me how to do research. She is an example of an excellent researcher and also a very good friend. She motivated me from the very beginning and always gave me the right words. I also thank Professor Jo˜ao Marques da Silva for his enthusiasm and motivation at work, for his interesting contributions and for sharing with me the vast experience he has in the SAT area. I am also grateful to Professor Arlindo Oliveira for suggesting me this PhD topic, for his inter- esting suggestions and his help with the biological issues - the hardest for me as a mathematician. I thank all my colleges and professors at Instituto Superior T´ecnico, INESC-ID, University of Southampton and University College of Dublin. I am eternally grateful to my parents, Ot´ılia and Ant´onio, who always help me following my dreams and support my decisions, sometimes even without understanding. I dedicate my thesis to them. v I owe a word of gratitude to my brother and to my sister-in-law. Paulo and Cristina gave me strength to follow this project and they gave me the best advices when I needed them. I admire their perseverance in life. Most of all, I thank them for giving me three lovely nephews that bring so much happiness into my life. A special thanks to my nephews for renewing in me the imagination and curiosity, typical from children, and so important for who works in research. I am very grateful to Renato. He is able to solve my problems even before I could realize them. He supported me in the hardest moments and always believed I could do a good work out of this project. Finally, I also thank all my friends for being present. “We ourselves feel that what we are doing is just a drop in the ocean. But the ocean would be less because of that missing drop.” Mother Teresa vi Contents 1 Introduction 1 1.1 Contributions................................... ..... 5 1.2 ThesisOutline ................................... .... 6 2 Discrete Constraint Solving 9 2.1 IntegerLinearProgramming(ILP) . ......... 9 2.1.1 Preliminaries ................................. ... 9 2.1.2 ILPAlgorithms................................. 10 2.2 BooleanSatisfiability(SAT) . ......... 12 2.2.1 Preliminaries ................................. 12 2.2.2 CNFEncodings.................................. 13 2.2.3 SATAlgorithms ................................. 14 2.3 Pseudo-BooleanOptimization(PBO). .......... 17 2.3.1 Preliminaries ................................. 17 2.3.2 PBOAlgorithms ................................. 19 2.4 MaximumSatisfiability(MaxSAT) . ........ 21 2.4.1 Preliminaries ................................. 21 2.4.2 MaxSATAlgorithms.............................. 23 2.5 Conclusions ..................................... 24 3 Haplotype Inference 27 3.1 Preliminaries ................................... ..... 28 3.1.1 SNPs,HaplotypesandGenotypes. ...... 28 3.1.2 Haplotyping................................... 30 3.1.3 Pedigrees ..................................... 31 vii 3.2 Population-based HaplotypeInference . ............ 32 3.2.1 CombinatorialApproaches. ...... 33 3.2.2 StatisticalApproaches . ...... 38 3.3 Pedigree-basedHaplotypeInference. ............ 42 3.3.1 CombinatorialApproaches. ...... 43 3.3.2 StatisticalApproaches . ...... 44 3.4 Conclusions ..................................... 45 4 Haplotype Inference by Pure Parsimony (HIPP) 47 4.1 PreprocessingTechniques . ........ 48 4.1.1 StructuralSimplifications . ....... 48 4.1.2 LowerBounds ................................... 50 4.1.3 UpperBounds ................................... 53 4.2 IntegerLinearProgrammingModels . ......... 58 4.2.1 RTIP ........................................ 58 4.2.2 PolyIP........................................ 60 4.2.3 HybridIP ...................................... 61 4.2.4 HaploPPH ..................................... 62 UB 4.2.5 Pmax ........................................ 64 4.3 Branch-and-BoundModels. ....... 66 4.3.1 HAPAR....................................... 66 4.3.2 SetCovering ................................... 68 4.4 BooleanSatisfiabilityModels . ......... 70 4.4.1 SHIPs........................................ 70 4.4.2 SATlotyper .................................... 72 4.5 AnswerSetProgrammingModels. ....... 73 4.5.1 HAPLO-ASP.................................... 73 4.6 CompleteHIPP.................................... 74 4.7 Complexity ...................................... 76 4.8 HeuristicMethods ................................ ..... 76 4.9 Conclusions ..................................... 78 5 A New Approach for HIPP 81 5.1 SolvingILPHIPPModelswithPBO . ...... 82 viii 5.2 RPolyModel ...................................... 83 5.2.1 EliminatingKeySymmetries . ..... 83 5.2.2 ReducingtheModel .............................. 84 5.3
Recommended publications
  • Approximability Preserving Reduction Giorgio Ausiello, Vangelis Paschos
    Approximability preserving reduction Giorgio Ausiello, Vangelis Paschos To cite this version: Giorgio Ausiello, Vangelis Paschos. Approximability preserving reduction. 2005. hal-00958028 HAL Id: hal-00958028 https://hal.archives-ouvertes.fr/hal-00958028 Preprint submitted on 11 Mar 2014 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Laboratoire d'Analyse et Modélisation de Systèmes pour l'Aide à la Décision CNRS UMR 7024 CAHIER DU LAMSADE 227 Septembre 2005 Approximability preserving reductions Giorgio AUSIELLO, Vangelis Th. PASCHOS Approximability preserving reductions∗ Giorgio Ausiello1 Vangelis Th. Paschos2 1 Dipartimento di Informatica e Sistemistica Università degli Studi di Roma “La Sapienza” [email protected] 2 LAMSADE, CNRS UMR 7024 and Université Paris-Dauphine [email protected] September 15, 2005 Abstract We present in this paper a “tour d’horizon” of the most important approximation-pre- serving reductions that have strongly influenced research about structure in approximability classes. 1 Introduction The technique of transforming a problem into another in such a way that the solution of the latter entails, somehow, the solution of the former, is a classical mathematical technique that has found wide application in computer science since the seminal works of Cook [10] and Karp [19] who introduced particular kinds of transformations (called reductions) with the aim of study- ing the computational complexity of combinatorial decision problems.
    [Show full text]
  • Constraint Satisfaction Problems and Descriptive Complexity
    1 Constraint Satisfaction Problems and Descriptive Complexity Anuj Dawar University of Cambridge AIM workshop on Universal Algebra, Logic and CSP 1 April 2008 Anuj Dawar April 2008 2 Descriptive Complexity Descriptive Complexity seeks to classify computational problems on finite structures (i.e. queries) according to their definability in different logics. Investigate the connection between the descriptive complexity of queries (i.e. the logical resources required to describe them) and their computational complexity (measured in terms of space, time, etc. on a suitable model of computation). Anuj Dawar April 2008 3 Logic and Complexity Recall: For a logic L The combined complexity of L is the complexity of determining, given a structure A and a sentence ϕ ∈ L, whether or not A |= ϕ. The data complexity of L is in the complexity class C, if every query definable in L is in C. We say L captures the complexity class C if the data complexity of L is in C, and every query whose complexity is in C is definable in L. Anuj Dawar April 2008 4 Fagin’s Theorem A formula of Existential Second-Order Logic (ESO) of vocabulary σ is of the form: ∃R1 ···∃Rmψ • R1,...,Rm are relational variables • ψ is FO formula in the vocabulary σ ∪ {R1,...,Rm}. Theorem (Fagin 1974) ESO captures NP. Corollary For each B, CSP(B) is definable in ESO. Anuj Dawar April 2008 5 3-Colourability The following formula is true in a graph (V, E) if, and only if, it is 3-colourable. ∃R∃B∃G ∀x(R(x) ∨ B(x) ∨ G(x))∧ ∀x(¬(R(x) ∧ B(x)) ∧¬(B(x) ∧ G(x)) ∧¬(R(x) ∧ G(x)))∧ ∀x∀y(E(x, y) → ((R(x) ∧ B(y)) ∨ (B(x) ∧ R(y))∨ (B(x) ∧ G(y)) ∨ (G(x) ∧ B(y))∨ (G(x) ∧ R(y)) ∨ (R(x) ∧ G(y)))) This example easily generalises to give a direct translation from B to a formula of ESO expressing CSP(B).
    [Show full text]
  • Parameterizing MAX SNP Problems Above Guaranteed Values
    Parameterizing MAX SNP problems above Guaranteed Values Meena Mahajan, Venkatesh Raman, and Somnath Sikdar The Institute of Mathematical Sciences, C.I.T Campus, Taramani, Chennai 600113. {meena,vraman,somnath}@imsc.res.in Abstract. We show that every problem in MAX SNP has a lower bound on the optimum solution size that is unbounded and that the above guarantee question with respect to this lower bound is fixed parameter tractable. We next introduce the notion of “tight” upper and lower bounds for the optimum solution and show that the parameterized version of a variant of the above guarantee question with respect to the tight lower bound cannot be fixed parameter tractable unless P = NP, for a class of NP-optimization problems. 1 Introduction In this paper, we consider the parameterized complexity of NP-optimization prob- lems Q with the following property: for non-trivial instance I of Q, the optimum opt(I), is lower-bounded by an increasing function of the input size. That is, there exists a function f : N → N which is increasing such that for non-trivial instances I, opt(I) ≥ f(|I|). For such an optimization problem Q, the standard parameterized ver- sion Q˜ defined below is easily seen to be fixed parameter tractable. For if k ≤ f(|I|), we answer ‘yes’; else, f(|I|) < k and so |I| < f −1(k)1 and we have a kernel. Q˜ = {(I, k): I is an instance of Q and opt(I) ≥ k} Thus for such an optimization problem it makes sense to define an “above guaran- tee” parameterized version Q¯ as Q¯ = {(I, k): I is an instance of Q and opt(I) ≥ f(|I|) + k}.
    [Show full text]
  • Algorithms for Optimization Problems in Planar Graphs
    Report from Dagstuhl Seminar 13421 Algorithms for Optimization Problems in Planar Graphs Edited by Glencora Borradaile1, Philip Klein2, Dániel Marx3, and Claire Mathieu4 1 Oregon State University, US, [email protected] 2 Brown University, US, [email protected] 3 HU Berlin, DE, [email protected] 4 Brown University, US, [email protected] Abstract This report documents the program and the outcomes of Dagstuhl Seminar 13421 “Algorithms for Optimization Problems in Planar Graphs”. The seminar was held from October 13 to October 18, 2013. This report contains abstracts for the recent developments in planar graph algorithms discussed during the seminar as well as summaries of open problems in this area of research. Seminar 13.–18. October, 2013 – www.dagstuhl.de/13421 1998 ACM Subject Classification F.2 Analysis of Algorithms and Problem Complexity Keywords and phrases Algorithms, planar graphs, theory, approximation, fixed-parameter tract- able, network flow, network design, kernelization Digital Object Identifier 10.4230/DagRep.3.10.36 Edited in cooperation with Kyle Fox 1 Executive Summary Glencora Borradaile Philip Klein Dániel Marx Claire Mathieu License Creative Commons BY 3.0 Unported license © Glencora Borradaile, Philip Klein, Dániel Marx, and Claire Mathieu Planar graphs, and more generally graphs embedded on surfaces, arise in applications such as road map navigation and logistics, computational topology, graph drawing, and image processing. There has recently been a growing interest in addressing combinatorial optimization problems using algorithms that exploit embeddings on surfaces to achieve provably higher-quality output or provably faster running times. New algorithmic techniques have been discovered that yield dramatic improvements over previously known results.
    [Show full text]
  • Satisfiability on Mixed Instances
    Electronic Colloquium on Computational Complexity, Report No. 192 (2015) Satisfiability on Mixed Instances Ruiwen Chen∗ Rahul Santhanamy November 26, 2015 Abstract The study of the worst-case complexity of the Boolean Satisfiability (SAT) problem has seen considerable progress in recent years, for various types of instances including CNFs [16, 15, 20, 21], Boolean formulas [18] and constant-depth circuits [6]. We systematically investigate the complexity of solving mixed instances, where different parts of the instance come from different types. Our investigation is motivated partly by practical contexts such as SMT (Satisfiability Modulo Theories) solving, and partly by theoretical issues such as the exact complexity of graph problems and the desire to find a unifying framework for known satisfiability algorithms. We investigate two kinds of mixing: conjunctive mixing, where the mixed instance is formed by taking the conjunction of pure instances of different types, and compositional mixing, where the mixed instance is formed by the composition of different kinds of circuits. For conjunctive mixing, we show that non-trivial savings over brute force search can be obtained for a number of instance types in a generic way using the paradigm of subcube partitioning. We apply this generic result to show a meta-algorithmic result about graph optimisation problems: any opti- misation problem that can be formalised in Monadic SNP can be solved exactly with exponential savings over brute-force search. This captures known results about problems such as Clique, Independent Set and Vertex Cover, in a uniform way. For certain kinds of conjunctive mixing, such as mixtures of k-CNFs and CNFs of bounded size, and of k-CNFs and Boolean formulas, we obtain improved savings over subcube partitioning by combining existing algorithmic ideas in a more fine-grained way.
    [Show full text]
  • ON the COMPLEXITY of the SINGLE INDIVIDUAL SNP HAPLOTYPING PROBLEM 2 at That Site
    USENGLISHON THE COMPLEXITY OF THE SINGLE INDIVIDUAL SNP HAPLOTYPING PROBLEM 1 On the Complexity of the Single Individual SNP Haplotyping Problem Rudi Cilibrasi, Leo van Iersel, Steven Kelk and John Tromp Abstract— We present several new results pertaining to of as a “fingerprint” for that individual. haplotyping. These results concern the combinatorial prob- lem of reconstructing haplotypes from incomplete and/or It has been observed that, for most SNP sites, only imperfectly sequenced haplotype fragments. We consider two nucleotides are seen; sites where three or four the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) for nucleotides are found are comparatively rare. Thus, different restrictions on the input data. Specifically, we from a combinatorial perspective, a haplotype can look at the gapless case, where every row of the input be abstractly expressed as a string over the alphabet corresponds to a gapless haplotype-fragment, and the 1- {0, 1}. Indeed, the biologically-motivated field of gap case, where at most one gap per fragment is allowed. SNP and haplotype analysis has spawned a rich We prove that MEC is APX-hard in the 1-gap case and variety of combinatorial problems, which are well still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input described in surveys such as [6] and [9]. matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard We focus on two such combinatorial problems, in the 1-gap case (and thus also in the general case), but both variants of the Single Individual Haplotyping is polynomial time solvable in the gapless case.
    [Show full text]
  • Decidable Relationships Between Consistency Notions for Constraint Satisfaction Problems
    Decidable Relationships between Consistency Notions for Constraint Satisfaction Problems Albert Atserias1 and Mark Weyer2 1 Universitat Polit`ecnica de Catalunya, Barcelona, Spain 2 Humboldt Universit¨at zu Berlin, Berlin, Germany Abstract. We define an abstract pebble game that provides game in- terpretations for essentially all known consistency algorithms for con- straint satisfaction problems including arc-consistency,(j, k)-consistency, k-consistency, k-minimality, and refinements of arc-consistency such as peek arc-consistency and singleton arc-consistency. Our main result is that for any two instances of the abstract pebble game where the first satisfies the additional condition of being stacked, there exists an al- gorithm to decide whether consistency with respect to the first implies consistency with respect to the second. In particular, there is a decid- able criterion to tell whether singleton arc-consistency with respect to a given constraint language implies k-consistency with respect to the same constraint language, for any fixed k. We also offer a new decidable crite- rion to tell whether arc-consistency implies satisfiability which pulls from methods in Ramsey theory and looks more amenable to generalization. 1 Introduction Comparing finite structures with respect to some partial order or equivalence relation is a classic theme in logic and algorithms. Notable examples include isomorphisms, embeddings, and homomorphisms, as well as preservation of for- mulas in various logics, and (bi-)simulation relations of various types. Let ≤ and ≤′ be partial orders on finite structures, where ≤ is a refinement of ≤′ which is however harder than ≤′. More precisely, ≤ is a refinement of ≤′ in that the implication A ≤ B =⇒ A ≤′ B (1) holds true, but the reverse implication is not true in general.
    [Show full text]
  • Approximate Constraint Satisfaction
    Approximate Constraint Satisfaction Lars Engebretsen Stockholm 2000 Doctoral Dissertation Royal Institute of Technology Department of Numerical Analysis and Computer Science Akademisk avhandling som med tillst˚and av Kungl Tekniska H¨ogskolan framl¨agges till offentlig granskning f¨or avl¨aggande av teknologie doktorsexamen tisdagen den 25 april 2000 kl 10.00 i Kollegiesalen, Administrationsbyggnaden, Kungl Tekniska H¨ogskolan, Valhallav¨agen 79, Stockholm. ISBN 91-7170-545-7 TRITA-NA-0008 ISSN 0348-2952 ISRN KTH/NA/R--00/08--SE c Lars Engebretsen, April 2000 H¨ogskoletryckeriet, Stockholm 2000 Abstract In this thesis we study optimization versions of NP-complete constraint satisfaction problems. An example of such a problem is the Maximum Satisfiability problem. An instance of this problem consists of clauses formed by Boolean variables and the goal is to find an assignment to the variables that satisfies as many clauses as pos- sible. Since it is NP-complete to decide if all clauses are simultaneously satisfiable, the Maximum Satisfiability problem cannot be solved exactly in polynomial time unless P = NP. However, it may be possible to solve the problem approximately, in the sense that the number of satisfied clauses is at most some factor c from the optimum. Such an algorithm is called a c-approximation algorithm. The foundation of the results presented in this thesis is the approximability of several constraint satisfaction problems. We provide both upper and lower bounds on the approx- imability. A proof of an upper bound is usually a polynomial time c-approximation algorithm for some c, while a proof of a lower bound usually shows that it is NP- hard to approximate the problem within some c.
    [Show full text]
  • Some Tight Results∗
    Genus Characterizes the Complexity of Certain Graph Problems: Some Tight Results∗ Jianer Chen† Iyad A. Kanj‡ Ljubomir Perkovic´‡ Eric Sedgwick‡ Ge Xia§ Abstract We study the fixed-parameter tractability, subexponential time computability, and approx- imability of the well-known NP-hard problems: independent set, vertex cover, and dom- inating set. We derive tight results and show that the computational complexity of these problems, with respect to the above complexity measures, is dependent on the genus of the underlying graph. For instance, we show that, under the widely-believed complexity assump- tion W [1] 6= FPT, independent set on graphs of genus bounded by g1(n) is fixed parameter 2 tractable if and only if g1(n) = o(n ), and dominating set on graphs of genus bounded by o(1) g2(n) is fixed parameter tractable if and only if g2(n) = n . Under the assumption that not all SNP problems are solvable in subexponential time, we show that the above three prob- lems on graphs of genus bounded by g3(n) are solvable in subexponential time if and only if g3(n) = o(n). We also show that the independent set, the kernelized vertex cover, and the kernelized dominating set problems on graphs of genus bounded by g4(n) have PTAS if g4(n) = o(n/ log n), and that, under the assumption P 6= NP, the independent set problem on graphs of genus bounded by g5(n) has no PTAS if g5(n) = Ω(n), and the vertex cover and Ω(1) dominating set problems on graphs of genus bounded by g6(n) have no PTAS if g6(n) = n .
    [Show full text]
  • On the Complexity of Constraint Satisfaction Problems a Model Theoretic Approach
    On the Complexity of Constraint Satisfaction Problems A Model Theoretic Approach CO401J Project - Final Report 19th of June 2019 Imperial College London Author A. Papadopoulos Supervisors Dr. C. Kestner Dr. Z. Ghadernezhad Second Marker Prof. K. Buzzard Abstract Constraint Satisfaction Problems (CSPs) have played a crucial role in recent developments in the field of Compu- tational Complexity. The Dichotomy Conjecture of Finite Domain CSPs by Feder and Vardi [1] was finally proved two years ago (independently in [2] and [3]). Ever since the conjecture was proposed, the study of CSPs using various techniques both from pure mathematics (such as Model Theory, Topology, and Group Theory) and theoretical computer science (for instance, Datalog and Descriptive Complexity) has flourished. Even though a vast amount of research is being carried out on the topic, the literature remains somewhat convoluted and in general inaccessible. In this report, we attempt to give a concise presentation of recent works on the model theoretic approach to constraint satisfaction. Since the literature on Model Theory and Complexity Theory has been standardised over the last 30 years or so (the standard references include [4] for Model Theory, [5] for Classical Complexity Theory and [6] for Descriptive Complexity), we only present a short summary of the results we need from these areas. A more in-depth presentation of CSPs is given, in an attempt to provide a clearer entry point to current research. Acknowledgements I would like, first of all, to thank my supervisor, Zaniar not only for proposing this topic, but also for the long and interesting discussions and his overall helpful comments and immense support.
    [Show full text]
  • Complexity, Combinatorial Positivity, and Newton Polytopes
    Complexity, Combinatorial Positivity, and Newton Polytopes Alexander Yong University of Illinois at Urbana-Champaign Joint work with: Anshul Adve (University of California at Los Angeles) Colleen Robichaux (University of Illinois at Urbana-Champaign) Complexity, Combinatorial Positivity, and Newton Polytopes Alexander Yong University of Illinois at Urbana-Champaign Computational Complexity Theory Poorly understood issue: Why are do some decision problems have fast algorithms and others seem to need costly search? Some complexity classes: I NP: LP (9x ≥ 0, Ax=b?) I coNP: Primes I P: LP and Primes! I NP-complete: Graph coloring Famous theoretical computer science problems: ? I P = NP ? I NP = coNP ? I NP \ coNP = P Complexity, Combinatorial Positivity, and Newton Polytopes Alexander Yong University of Illinois at Urbana-Champaign Polynomials In algebraic combinatorics and combinatorial representation theory we often study: α F = cα,x = wt(s) 2 Z[x1;:::; xn] α s2S X X Example 1: = λ = F = sλ (Schur), cα,G = Kostka coeff. Example 2: = G = (V ; E) = F = χG (Stanley's chromatic ) symmetric polynomial), cα,G = #proper colorings of G with αi -many colors i ) Example 3: = w 2 S = F = Sw (Schubert polynomial). More later. 1 ) Complexity, Combinatorial Positivity, and Newton Polytopes Alexander Yong University of Illinois at Urbana-Champaign Nonvanishing Nonvanishing: What is the complexity of deciding cα, 6= 0 as measured in the length of the input (α, ) assuming arithmetic takes constant time? I In general undecidable: G¨odelincompleteness '31, Turing's halting problem '36. I Our cases of interest have combinatorial positivity: 9 rule for cα, 2 Z≥0 = Nonvanishing(F) 2 NP.
    [Show full text]
  • 1 Introduction
    The Computational Structure of Monotone Monadic SNP and Constraint Satisfaction: A Study through Datalog and Group Theory Tom´as Feder Moshe Y. Vardi IBM Almaden Research Center 650 Harry Road San Jose, California 95120-6099 Abstract This paper starts with the project of finding a large subclass of NP which exhibits a dichotomy. The approach is to find this subclass via syntactic prescriptions. While, the paper does not achieve this goal, it does isolate a class (of problems specified by) “Monotone Monadic SNP without inequality” which may exhibit this dichotomy. We justify the placing of all these restrictions by showing that classes obtained by using only two of the above three restrictions do not show this dichotomy, essentially using Ladner’s Theorem. We then explore the structure of this class. We show all problems in this class reduce to the seemingly simpler class CSP. We divide CSP into subclasses and try to unify the collection of all known polytime algorithms for CSP problems and extract properties that make CSP problems NP-hard. This is where the second part of the title – “a study through Datalog and group theory” – comes in. We present conjectures about this class which would end in showing the dichotomy. 1 Introduction We start with a basic overview of the framework explored in this paper; for an accompanying pictorial description, see Figure 1. A more detailed presentation of the work and its relationship to earlier work is given in the next section. It is well-known that if P=NP, then NP contains problems that are neither solvable 6 in polynomial time nor NP-complete.
    [Show full text]