Applied Logic Exercises - Complexity of satisfiability

Marcin Szczuka

Institute of Informatics, The University of Warsaw

Monographic lecture, Spring semester 2018/2019

Marcin Szczuka (MIMUW) Applied Logic 2019

Lecture plan

1 Computational complexity
    Time complexity
    Space complexity
    Complexity in non-deterministic case

2 Complexity classes
    Complexity class PSPACE
    Complexity class PTIME
    Complexity class NPTIME

3 NP-complete problems
    Polynomial problem reduction
    NP completeness
    SAT problem and Cook’s theorem
    Other versions of SAT
    NP-hardness and MAX-SAT

Cost function

We will use the Turing Machine model (sometimes multi-tape) as our basic model of computation. Using Church’s Thesis, we will assume that it corresponds to “normal” algorithms.

Notational convention
Let $f : \mathbb{N} \to \mathbb{N}$ be a non-decreasing natural function. We say that a function $g : \mathbb{N} \to \mathbb{N}$ is of order $f(n)$, denoted by $O(f(n))$, if, for $f(n) \neq 0$, there exists a finite limit:

$$\lim_{n \to \infty} \frac{g(n)}{f(n)}$$

Note that with this convention we are always looking at the fastest growing component of the function g.
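The convention above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical cost function g(n) = 3n^2 + 10n + 7 (not taken from the lecture) and f(n) = n^2; it only shows how the ratio g(n)/f(n) settles near the coefficient of the fastest growing component.

```python
def ratio(g, f, n):
    """Return g(n) / f(n); f(n) is assumed to be non-zero."""
    return g(n) / f(n)


def g(n):
    # Hypothetical cost function, chosen only for illustration.
    return 3 * n**2 + 10 * n + 7


def f(n):
    # Candidate order of growth.
    return n**2


# The ratio approaches the finite limit 3 as n grows, so g is of order n^2.
ratios = [ratio(g, f, n) for n in (10, 100, 1000, 10**6)]
print(ratios)
```

The lower-order terms 10n + 7 contribute less and less to the ratio, which is why only the fastest growing component matters.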

Time complexity

Class TIME(.)
Let $f(n)$ be a non-decreasing natural function. By $\mathrm{TIME}(f(n))$ we denote the class of decision problems (languages) $\pi_L$ such that there exist constants $a, b \in \mathbb{R}$ and a Turing Machine M that can verify whether $w \in L$ using no more than $a f(n) + b$ computational steps, regardless of the choice of w as long as $|w| = n$.

In the definition above:

1 We mostly care about the fastest growing component.

2 We disregard the “overhead” associated with, e.g., data encoding. This is only represented by the constant b.

3 We assume that one computational step takes one unit of time, i.e., the time of computation is equivalent to the number of steps.

Space (memory) complexity

Class SPACE(.)
Let $f(n)$ be a non-decreasing natural function. By $\mathrm{SPACE}(f(n))$ we denote the class of decision problems (languages) $\pi_L$ such that there exist constants $a, b \in \mathbb{R}$ and a Turing Machine M that can verify whether $w \in L$ using no more than $a f(n) + b$ different positions (cells) on the tape, regardless of the choice of w as long as $|w| = n$.

In the definition above:

1 We mostly care about the fastest growing component.

2 We disregard the “overhead” associated with, e.g., storage of the input.

3 We assume that in one computational step we can “occupy” no more than one unit of memory (one cell on tape).

Non-deterministic Turing Machine

A Non-deterministic Turing Machine (NTM) is a tuple

$$M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$$

where:

Q – finite set of states, including the initial state $q_0$;
Γ – finite set of tape symbols, including the empty symbol B (Blank) and input symbols;
Σ – set of input symbols, such that B ∉ Σ and Σ ⊂ Γ;
F ⊂ Q – set of terminal states;
δ – transition relation δ ⊂ Q × Γ × Q × Γ × {L, R}, where L and R correspond to head movements (Left/Right).

An NTM in a given configuration may have more than one possible action for a given state and symbol on the tape.

Properties of NTM

In the case of deterministic TMs we can talk about a step going from configuration $(q_1, t_1, u)$ to $(q_2, t_2, u')$, given that $\delta(q_1, t_1) = (q_2, t_2, h)$ and the information $u'$ on the tape is the result of applying h to u. In an NTM we can talk about an admissible step going from configuration $(q_1, t_1, u)$ to $(q_2, t_2, u')$ if $(q_1, t_1, q_2, t_2, h) \in \delta$.

Language of NTM
A non-deterministic Turing Machine M decides (induces) the language $L = L(M) \subset \Sigma^*$ if for each $w \in L(M)$ there exists a finite sequence of admissible steps in M that ends in a terminal state, and each transition in this sequence complies with the relation $\delta_M$ of M.

Non-deterministic Turing Machine M accepts a word from language L(M) if there exists at least one finite path of computation in the computation tree for this word that is admissible and ends in a terminal state. How to select the right path in the computation tree?!
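One deterministic answer to the question of path selection is to try every path. The sketch below is an illustration under stated assumptions, not part of the lecture: the function name `ntm_accepts`, the step bound `max_steps` and the toy machine are all hypothetical. It explores the computation tree breadth-first, using the relation δ given as a set of tuples (q, a, q2, b, move).

```python
from collections import deque

BLANK = "B"


def ntm_accepts(delta, q0, finals, word, max_steps):
    """Breadth-first search over the computation tree of an NTM.

    delta is a set of tuples (q, a, q2, b, move) with move in {"L", "R"}:
    in state q reading a, the machine may write b, enter q2 and move.
    Returns True if some admissible path of at most max_steps steps
    reaches a state in finals.
    """
    start = (q0, 0, tuple(sorted(enumerate(word))))
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (state, head, tape_items), steps = frontier.popleft()
        if state in finals:
            return True
        if steps == max_steps:
            continue
        tape = dict(tape_items)
        symbol = tape.get(head, BLANK)
        for (q, a, q2, b, move) in delta:
            if q != state or a != symbol:
                continue  # this tuple is not admissible here
            new_tape = dict(tape)
            new_tape[head] = b
            new_head = head + (1 if move == "R" else -1)
            config = (q2, new_head, tuple(sorted(new_tape.items())))
            if config not in seen:
                seen.add(config)
                frontier.append((config, steps + 1))
    return False


# Toy machine (hypothetical): accepts words over {0, 1} that contain "11".
delta = {
    ("s", "0", "s", "0", "R"),
    ("s", "1", "s", "1", "R"),
    ("s", "1", "a", "1", "R"),    # guess: this 1 starts the pair
    ("a", "1", "acc", "1", "R"),
}
print(ntm_accepts(delta, "s", {"acc"}, "0110", max_steps=5))
print(ntm_accepts(delta, "s", {"acc"}, "0101", max_steps=5))
```

The first call finds an accepting path and the second does not. Note that the deterministic search may visit a number of configurations exponential in the path length, which is exactly the cost that the non-deterministic model does not count.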

Non-deterministic time complexity

Class NTIME(.)
Let $f(n)$ be a non-decreasing natural function. By $\mathrm{NTIME}(f(n))$ we denote the class of decision problems (languages) $\pi_L$ such that there exist constants $a, b \in \mathbb{R}$ and a non-deterministic Turing Machine M for which every computation path of M that can verify whether $w \in L$ contains no more than $a f(n) + b$ steps, regardless of the choice of w as long as $|w| = n$.

In the definition above:

1 We consider the longest admissible computation path.

2 We disregard the size of the (possibly enormous) computation tree.

3 We assume that a “Fairy Godmother” (an oracle) shows us which path to take.

Complexity classes

In the theory (and practice) of complexity we are usually most interested in classes (types) of problems that have a specific degree of hardness (complexity). Most commonly considered classes:

Problems with constant complexity, especially w.r.t. memory.
Linear time (and/or space) complexity.
Log-linear problems (n log n), e.g., sorting algorithms.
Polynomial problems – deterministic and non-deterministic.

The last of these classes (polynomial) will be of special interest to us, as it has many properties with practical ramifications.

Problems of polynomial memory cost

Note that if a problem is in $\mathrm{SPACE}(n^k)$ then there exists an algorithm that solves it using memory proportional to $n^k$. Hence, the estimate for the complexity of our problem is given by a polynomial of degree at most k.

Complexity class PSPACE
Polynomial space (memory) problems (denoted by PSPACE) are the class of decision problems (languages) $\pi_L$ such that for each $L \in \pi_L$ there exists $k \in \mathbb{N}$ for which $L \in \mathrm{SPACE}(n^k)$. Hence:

$$\mathrm{PSPACE} = \bigcup_{k \in \mathbb{N}} \mathrm{SPACE}(n^k)$$

In the non-deterministic case the definition of the NPSPACE class has more caveats. It is, however, possible to prove that PSPACE = NPSPACE. The proof is non-trivial.

Problems of polynomial (time) cost

Just like in the case of memory, the class of problems with polynomial time cost will be of special interest to us.

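Complexity class P = PTIME
Polynomial (time cost) problems, traditionally denoted by P = PTIME, are the class of decision problems (languages) $\pi_L$ such that for each $L \in \pi_L$ there exists $k \in \mathbb{N}$ for which $L \in \mathrm{TIME}(n^k)$. Hence:

$$\mathrm{P} = \mathrm{PTIME} = \bigcup_{k \in \mathbb{N}} \mathrm{TIME}(n^k)$$

Note that by assuming that in a single computational step only one memory cell can be altered we obtain:

$$\mathrm{P} = \mathrm{PTIME} \subseteq \mathrm{PSPACE}$$

Whether this inclusion is strict is an open problem.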

Problems of non-deterministic polynomial (time) cost

Polynomial time cost will be of special interest to us in the non-deterministic case, just as it was in the deterministic one.

Complexity class NP = NPTIME
Non-deterministic polynomial (time cost) problems, traditionally denoted by NP = NPTIME, are the class of decision problems (languages) $\pi_L$ such that for each $L \in \pi_L$ there exists $k \in \mathbb{N}$ for which $L \in \mathrm{NTIME}(n^k)$. Hence:

$$\mathrm{NP} = \mathrm{NPTIME} = \bigcup_{k \in \mathbb{N}} \mathrm{NTIME}(n^k)$$

Using the previous observations about containment of classes we get:

P ⊂ NP ⊂ PSPACE

Example of a problem in NP

Decision Travelling Salesman Problem - TSP(K)
The Decision Travelling Salesman Problem (TSP(K)) is described as:
GIVEN: Undirected weighted graph G; constant K > 0.
QUESTION: Is there a travelling salesman route (Hamiltonian cycle) in G with a total cost lower than K?

We do not know if this problem is in P, but we know that it is in NP. We can also investigate a problem complementary to a given problem in NP. For example:

Complementary (reverse) TSP(K)
GIVEN: Undirected weighted graph G; constant K > 0.
QUESTION: Is it the case that there is no travelling salesman route (Hamiltonian cycle) in G with a total cost lower than K?
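Membership of TSP(K) in NP rests on the fact that a proposed route can be checked quickly. The following is a minimal sketch of such a check, assuming the bound is read as "total cost lower than K" (as in the complementary formulation) and assuming a hypothetical representation of the graph as a dictionary from unordered vertex pairs to weights; the function name and the sample graph are illustrative only.

```python
def verify_tsp_certificate(weights, route, K):
    """Check in polynomial time that route is a Hamiltonian cycle of cost < K.

    weights maps frozenset({u, v}) to the weight of the undirected edge u-v;
    route is a list naming every vertex exactly once (the cycle closes back
    to route[0]). Vertices are taken to be the endpoints of the given edges.
    """
    vertices = {v for edge in weights for v in edge}
    if len(route) != len(vertices) or set(route) != vertices:
        return False  # not a permutation of the vertex set
    total = 0
    for u, v in zip(route, route[1:] + route[:1]):
        edge = frozenset((u, v))
        if edge not in weights:
            return False  # the route uses a missing edge
        total += weights[edge]
    return total < K


# Hypothetical sample graph on four vertices.
weights = {
    frozenset("AB"): 1, frozenset("BC"): 2, frozenset("CD"): 3,
    frozenset("DA"): 4, frozenset("AC"): 10, frozenset("BD"): 10,
}
print(verify_tsp_certificate(weights, ["A", "B", "C", "D"], K=11))
```

The check is linear in the length of the route. No comparably short certificate is known for the complementary problem, where one would have to rule out every route.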

Dependencies between classes

Complexity class coNP
A decision problem (language) belongs to the class coNP (complementary non-deterministic polynomial) if its reverse (complementary) problem is in NP.

The following dependencies hold:

P ⊂ NP ⊂ PSPACE

P ⊂ coNP ⊂ PSPACE

Multi-million dollar question
Is P = NP?

Problem reduction

Efficient problem reduction

We say that the language (problem) $L_1$ is reducible (Karp reducible) to the language (problem) $L_2$ if there exists a function R (reduction) that maps words to words, such that:

1 There exists a Turing Machine that computes R with memory cost at most O(log n);

2 $w \in L_1 \Leftrightarrow R(w) \in L_2$.

Note: Point 1 above prevents “cheating”, i.e., hiding the hard part of the computation inside the reduction, by limiting the resources that the reduction may use.

Another complexity class of some importance is the class L of problems solvable in logarithmic space.

$$\mathrm{L} \subseteq \mathrm{NL} \subseteq \mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE}$$

Hence, each properly defined Karp reduction from the definition above has at most polynomial complexity.

Hierarchy of problem complexity

Reducibility of problems
If there exists an efficient reduction of problem B to problem A and problem A has at least polynomial (deterministic or non-deterministic) complexity, then A is at least as hard as B.

[Figure: an algorithm for problem B, built by passing the input w through the reduction R and feeding R(w) to an algorithm for problem A, which answers yes/no.]
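The construction of an algorithm for B from a reduction R and an algorithm for A can be sketched in a few lines. The two languages below are toy examples chosen only for illustration and are not taken from the lecture; the helper names are hypothetical.

```python
def make_decider(reduction, decide_a):
    """Build an algorithm for problem B from a reduction R and an algorithm for A."""
    def decide_b(w):
        # w is in B exactly when R(w) is in A.
        return decide_a(reduction(w))
    return decide_b


# Toy languages (assumptions made for this sketch):
#   B = words with an even number of 'a' characters,
#   A = decimal numerals of even natural numbers.
def count_as_numeral(w):
    # The reduction R: a counter of O(log n) bits suffices to compute it.
    return str(w.count("a"))


def is_even_numeral(s):
    # An algorithm for problem A.
    return int(s) % 2 == 0


decide_b = make_decider(count_as_numeral, is_even_numeral)
print(decide_b("abba"), decide_b("banana"))
```

The cost of deciding B this way is the cost of computing R plus the cost of the algorithm for A on an input of size |R(w)|, which is why A cannot be easier than B when R is cheap.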

NP completeness

NP-complete problems
An algorithmic decision problem (language) L is complete in the class NP (NP-complete) if and only if:
1 $L \in \mathrm{NP}$;
2 Every language $L' \in \mathrm{NP}$ is reducible (in polynomial time) to L.
If in the definition above only condition 2 holds then L is NP-hard.

IMPORTANT NOTES:
1 In order to show that a given problem in NP is NP-complete it suffices to reduce some known NP-complete problem to it in polynomial time.

2 If we find a polynomial, deterministic algorithm solving any of the NP-complete problems then P = NP.

3 So far we have not shown that any NP-complete problem exists.

SAT problem and Cook’s theorem

SAT Problem – Satisfiability of propositional formulæ

GIVEN: Boolean formula f with variables $x_1, \dots, x_n \in \{0, 1\}$.
QUESTION: Is there a valuation for $x_1, \dots, x_n$ for which f is satisfied?

The SAT Problem has a special importance for us. We will show that every problem in NP can be reduced to it in polynomial time. Hence, SAT will be our seed example of an NP-complete problem.

Cook’s Theorem SAT problem is NP-complete.

Proof of Cook’s Theorem

We only present a very brief sketch of the proof. According to the definition of NP-completeness we need to make two steps in the proof.
1 Show that SAT is in NP.
2 Show that every problem in NP has a polynomial reduction to SAT.

Step 1: SAT is in NP.
An NTM can “guess” a solution (variable valuation) and then verify if the formula is satisfied for this valuation. Given a valuation, one can verify whether it satisfies the formula in polynomial time w.r.t. the length of the formula. Therefore, SAT is in NP.

Unfortunately, the second step in this proof is far less straightforward.
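The "guess and verify" argument of Step 1 can be sketched as follows. This is an illustration under assumptions: the formula is taken to be in CNF and encoded as a list of clauses of non-zero integers (k for x_k, -k for its negation), and the function names are hypothetical. The verification is the polynomial part; the exhaustive loop is only a deterministic stand-in for the non-deterministic guess.

```python
from itertools import product


def satisfies(clauses, valuation):
    """Verification step: one pass over the formula, polynomial in its size."""
    return all(
        any(valuation[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, n):
    """Deterministic stand-in for the 'guess': try all 2**n valuations."""
    for bits in product([False, True], repeat=n):
        valuation = dict(enumerate(bits, start=1))
        if satisfies(clauses, valuation):
            return valuation
    return None


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(clauses, 3))
```

Only `satisfies` corresponds to work done along a single computation path of the NTM; the loop over all valuations is the exponential price of simulating the guess deterministically.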

Completeness proof for SAT

We are going to demonstrate that for any language in NP there exists a polynomial-time encoding into the language of SAT. To do that we will show that any NTM $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$ that uses a polynomial number $p(n)$ of steps to process the word w ($|w| = n$) can be represented by a Boolean formula f such that M accepts w if and only if there exists a satisfying valuation for f.

Note that since the (time) complexity of M is no more than $p(n)$, it suffices to consider only the part of the machine’s tape that is no further than $p(n)$ from the position of the head in the initial state $q_0$. We denote the head’s position in state $q_0$ by 0 and then enumerate tape positions symmetrically towards both ends, using numbers between $-p(n)$ and $p(n)$. The operations L and R, i.e., head movement to the left or right, we will identify with adding −1 or +1, respectively, to the position counter.

NP-completeness of SAT – variables

Let $q \in Q$, $-p(n) \le i \le p(n)$, $j \in \Gamma$ and $0 \le k \le p(n)$. We begin the encoding of the NTM as a Boolean formula by establishing variables:

$T_{ijk}$ is true (equals 1) ⇔ the i-th cell on the tape contains symbol j in the k-th step of computation. There are $O(p(n)^2)$ such variables.

$H_{ik}$ is true ⇔ the head is over the i-th tape cell in the k-th step of computation. There are $O(p(n)^2)$ such variables.

$Q_{qk}$ is true ⇔ M is in state q in the k-th step of computation. There are $O(p(n))$ such variables.

With the use of these variables we construct a Boolean formula f that represents a complete computation in M. Altogether, this formula contains $O(p(n)^2)$ variables.

Formula f is a conjunction of sub-formulæ that encode:

$T_{ij0}$ – the i-th cell in the initial configuration (state $q_0$) contains symbol j. There are $O(p(n))$ such elements.

$Q_{s0}$ – initial state (unique).

$H_{00}$ – initial head position (unique).

$T_{ijk} \Rightarrow \lnot T_{ij'k}$ for $j \neq j'$ – at each step of computation each tape cell contains exactly one symbol. There are $O(p(n)^2)$ such elements.

$T_{ijk} \Rightarrow T_{ij(k+1)} \lor H_{ik}$ – a change of symbol on the tape can only happen at the position where the head is. There are $O(p(n)^2)$ such elements.

$Q_{qk} \Rightarrow \lnot Q_{q'k}$ for $q \neq q'$ – at each step of computation we are in exactly one state. There are $O(p(n))$ such elements.

$H_{ik} \Rightarrow \lnot H_{i'k}$ for $i \neq i'$ – at each step of computation the head is at exactly one position. There are $O(p(n)^3)$ such elements.

$(H_{ik} \land Q_{qk} \land T_{i\sigma k}) \Rightarrow \bigvee_{(q,\sigma,q',\sigma',d) \in \delta} (H_{(i+d)(k+1)} \land Q_{q'(k+1)} \land T_{i\sigma'(k+1)})$ – the transition to a new state at each step must comply with relation δ. There are $O(p(n)^2)$ such elements.

$\bigvee_{q_b \in F} Q_{q_b p(n)}$ – at the end of computation we must be in one of the terminal (accepting) states.

NP-completeness of SAT – formula

First, let us observe that if we know a legal computation of M accepting an input word w then formula f is satisfied. Conversely, if we have a satisfying valuation for f then from the values of $T_{ijk}$, $H_{ik}$, $Q_{qk}$ we can read step-by-step the sequence of states and actions of M that constitute a legal computation in M that accepts w. All in all, the cost of recoding the machine as a formula is a polynomial of $p(n)$, hence a polynomial of n.

Note – CNF-SAT
In the proof of Cook’s theorem we don’t care much about the form of f. However, in further considerations we will mostly consider the CNF-SAT problem, i.e., SAT for Boolean formulæ in Conjunctive Normal Form (CNF). f is in CNF if it is written as a conjunction of clauses (alternatives) made of literals (variables or their negations).

$$(l_{11} \lor l_{12} \lor \dots \lor l_{1k_1}) \land (l_{21} \lor l_{22} \lor \dots \lor l_{2k_2}) \land \dots \land (l_{n1} \lor l_{n2} \lor \dots \lor l_{nk_n})$$

3-SAT Problem

GIVEN: Boolean formula f with variables $x_1, \dots, x_n \in \{0, 1\}$ in 3-CNF, i.e., consisting of clauses that contain no more than three literals.
QUESTION: Is there a satisfying valuation for f?

We will show that 3-SAT is NP-complete by reducing the known NP-complete CNF-SAT problem to it.

1 Formula f for CNF-SAT is a conjunction of clauses with an arbitrary number of literals.

2 We transform f to f′, which is in 3-CNF, by repeated application of the following rewrite rule to clauses in f that have more than three literals:

$$(l_1 \lor l_2 \lor l_3 \lor l_4 \lor \dots) \;\longmapsto\; (l_1 \lor l_2 \lor z_1) \land (\lnot z_1 \lor l_3 \lor l_4 \lor \dots)$$

where $z_1$ is an added (new) variable.

3 Repeated application of the rule above yields a formula f′ that is satisfiable if and only if f is satisfiable. The size (number of clauses and variables) of f′ is polynomial w.r.t. the size of f.
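The rewrite rule can be tried out on small formulas. The sketch below is illustrative and rests on assumptions: clauses are lists of non-zero integers (k for x_k, -k for its negation), fresh variables are numbered after the existing ones, and the names `to_3cnf` and `satisfiable` are hypothetical. The exhaustive satisfiability check is included only to compare f and f′ on tiny inputs.

```python
from itertools import product


def to_3cnf(clauses, n):
    """Split clauses with more than three literals; n = number of variables.

    Returns the new clause list and the new number of variables.
    """
    result = []
    for clause in clauses:
        clause = list(clause)
        while len(clause) > 3:
            n += 1  # fresh variable z, numbered n
            result.append([clause[0], clause[1], n])
            clause = [-n] + clause[2:]
        result.append(clause)
    return result, n


def satisfiable(clauses, n):
    """Exhaustive check, usable only for tiny formulas."""
    return any(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=n)
    )


f = [[1, 2, 3, 4, 5]]
f_prime, n_prime = to_3cnf(f, 5)
print(f_prime, n_prime)
```

A clause with m > 3 literals is replaced by m − 2 clauses and m − 3 new variables, so the growth of the formula is linear.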

NP-hardness and optimisation problems

If we cannot (or are not required to) show that a given problem S is in the class NP, but we know that there exists a polynomial reduction to it from a known NP-complete (or NP-hard) problem, we state that S is NP-hard (hard in class NP). An NP-hard problem may be much harder than any of the NP-complete problems. For example, one can demonstrate a polynomial reduction of the SAT problem to the HALTING problem. Hence, the HALTING problem is NP-hard, yet we know that this problem is not even fully algorithmically decidable.

We frequently associate NP-hardness with optimisation versions of NP-complete problems. Indeed, optimisation versions of known NP-complete problems are NP-hard. However, we have to remember that they may not be in NP, meaning that they may be a lot harder than the corresponding decision problems.

Examples of NP-hard problems:
1 MAX-SAT – find a valuation that satisfies as many clauses as possible.

2 General TSP – find the shortest salesman route.

MAX-SAT problem

GIVEN: Boolean formula f with variables $x_1, \dots, x_n \in \{0, 1\}$ in CNF.
TASK: Find a valuation satisfying as many clauses in f as possible.

$$(x_1 \lor x_2) \land (\lnot x_1 \lor x_2) \land (x_1 \lor \lnot x_2) \land (\lnot x_1 \lor \lnot x_2)$$

The MAX-SAT problem is NP-hard, since if there exists a (complete) valuation satisfying f then a proper algorithm solving MAX-SAT will find it.
MAX-SAT is not NP-complete, since it is not a 0/1 (decision) problem.
The MAX-SAT problem does not behave too well w.r.t. approximate solutions. There is no polynomial approximation scheme for it if P ≠ NP.
The MAX-SAT problem (and its variants) frequently appears in the context of verifying the quality of programs for finding valuations, so-called SAT solvers.
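The four-clause example formula can be checked exhaustively. The sketch below is an illustration only: it assumes clauses encoded as lists of non-zero integers (k for x_k, -k for its negation), and the name `max_sat` is hypothetical. It enumerates all valuations, so it is exponential in n and says nothing about how practical solvers work.

```python
from itertools import product


def max_sat(clauses, n):
    """Exhaustive MAX-SAT: returns (best count, best valuation as a tuple)."""
    best_count, best_bits = -1, None
    for bits in product([False, True], repeat=n):
        # Count the clauses that contain at least one true literal.
        count = sum(
            any(bits[abs(l) - 1] == (l > 0) for l in clause)
            for clause in clauses
        )
        if count > best_count:
            best_count, best_bits = count, bits
    return best_count, best_bits


# (x1 or x2) and (not x1 or x2) and (x1 or not x2) and (not x1 or not x2)
example = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
best_count, best_bits = max_sat(example, 2)
print(best_count, "of", len(example), "clauses")
```

Every valuation of x1, x2 falsifies exactly one of the four clauses, so the formula is unsatisfiable and the optimum is 3 of 4 clauses.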
