
The Satisfiability Problem

Boolean Formula Assignment. Let F be a Boolean formula over variables x1, x2, . . . , xn, and Boolean operators {∧, ∨, ¬}. A truth assignment for F is a vector a ∈ {0, 1}^n. The evaluation of F under truth assignment a is an element of {0, 1}, and is denoted by F(a). It is defined recursively as follows.

• Base Case. If F = xi, then F(a) = ai. If F = ¬xi, then F(a) = ¬ai.

• Recursive Case. Assume F1(a) and F2(a) have been defined. Then (F1 ∨ F2)(a) = F1(a) ∨ F2(a), (F1 ∧ F2)(a) = F1(a) ∧ F2(a), and (¬F1)(a) = ¬F1(a).

If F(a) = 1, then we say that a satisfies F, and that F is satisfiable. F is called unsatisfiable if it is not satisfied by any truth assignment.

The SATisfiability Problem. Given a Boolean formula F(x1, x2, . . . , xn), does there exist a truth assignment a = (a1, a2, . . . , an) such that F(a1, a2, . . . , an) evaluates to 1?
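The recursive definition of F(a) and the SAT question can be sketched in Python. This is only an illustration: the nested-tuple encoding of formulas, the function names `evaluate` and `brute_force_sat`, and the sample formula are assumptions made for this sketch and do not come from the notes. The search tries all 2^n assignments, so it takes exponential time.

```python
from itertools import product

def evaluate(F, a):
    """Evaluate formula F under assignment a, where a[i-1] is the value of x_i.

    Assumed encoding: ("var", i), ("not", F1), ("and", F1, F2), ("or", F1, F2).
    """
    op = F[0]
    if op == "var":                       # base case: F = x_i
        return a[F[1] - 1]
    if op == "not":                       # (¬F1)(a) = ¬F1(a)
        return 1 - evaluate(F[1], a)
    if op == "and":                       # (F1 ∧ F2)(a) = F1(a) ∧ F2(a)
        return evaluate(F[1], a) & evaluate(F[2], a)
    if op == "or":                        # (F1 ∨ F2)(a) = F1(a) ∨ F2(a)
        return evaluate(F[1], a) | evaluate(F[2], a)
    raise ValueError(f"unknown operator: {op!r}")

def brute_force_sat(F, n):
    """Return the first satisfying assignment found, or None if F is unsatisfiable."""
    for a in product((0, 1), repeat=n):
        if evaluate(F, a) == 1:
            return a
    return None

# Hypothetical formula (x1 ∨ ¬x2) ∧ x2
F = ("and", ("or", ("var", 1), ("not", ("var", 2))), ("var", 2))
print(evaluate(F, (0, 1)))    # 0
print(brute_force_sat(F, 2))  # (1, 1)
```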

Example 1. Given Boolean formula F (x1, x2, x3, x4) =

(x1 ∨ (x2 ∧ x3)) ∧ (x2 ∨ x3 ∨ x4) ∧ (x2 ∧ (x3 ∨ x4)) ∧ (x2 ∨ (x3 ∧ x4)) ∧ (x2 ∨ x3 ∨ x4), evaluate F (a) for a = (1, 1, 0, 1). Is F satisfiable?

Literals. A literal is a Boolean variable, or the negation of a Boolean variable. For example, x12 and ¬x2 are examples of literals, with the first being called a positive literal, and the second being called a negative literal.

Conjunctive Normal Form. A Boolean formula F is said to be in conjunctive normal form (CNF) if F is of the form C1 ∧ C2 ∧ · · · ∧ Cm, where each clause Ci has the form

Ci = li1 ∨ li2 ∨ · · · ∨ limi,

which is the disjunction of a finite number of literals. F is called a CNF formula. Furthermore, in the case that each Ci has exactly k literals, for some constant k, then F is called a k-CNF formula. We can thus define the CNF-SAT problem as the problem of deciding if a given CNF formula is satisfiable. The k-CNF-SAT (or kSAT for short) problem is defined similarly. When k = 2, the problem is called 2SAT, while, when k = 3, it is called 3SAT.

Example 2. Provide a Boolean formula that is logically equivalent to the defining formula from Example 1, and is in conjunctive normal form. For what value of k is this formula an instance of the k-CNF-SAT problem?

Currently there is no known polynomial-time algorithm for deciding if a 3-CNF formula is satisfiable, and hence no known polynomial-time algorithm for SAT, since every 3-CNF formula is itself a Boolean formula. Conversely, an arbitrary Boolean formula can be reduced to a 3-CNF formula in polynomial time, so the two problems are equally hard up to polynomial factors. In fact, in a future lecture we shall demonstrate that 3SAT is one of the hardest problems to solve in a family of problems called NP, which consists of those problems that can be solved in nondeterministic polynomial time.

However, it turns out that the 2SAT problem can be solved in time that is quadratic in m, the number of clauses, and n, the number of variables. The algorithm relies on the fact that, for a directed graph G = (V,E), one can check in linear time (in |V| + |E|) if some vertex b ∈ V is reachable from another vertex a ∈ V. The reachability algorithm is now described.

• Name: Reach

• Input ⟨G = (V,E), a, b⟩, where a, b ∈ V.

• Output accept if and only if b is reachable from a.

• Begin Algorithm

• Initialize a FIFO queue Q = ∅

• Mark a as having been reached and enter a into Q

• While Q is nonempty

– Remove u from the front of Q

– For each directed edge of the form (u, v) ∈ E

∗ If v is unmarked, then mark v and enter v into Q

• If b is marked then accept

• Else reject

• End Algorithm
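The steps of Reach can be sketched in Python as a breadth-first search. The function name `reach`, the adjacency-list construction, and the small test graph are assumptions of this sketch; the graph is hypothetical and is not the one from the examples.

```python
from collections import deque

def reach(vertices, edges, a, b):
    """Return True if and only if b is reachable from a in G = (vertices, edges)."""
    # Build adjacency lists so each edge is inspected once.
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)

    marked = {a}            # mark a as having been reached
    Q = deque([a])          # FIFO queue, initially holding a
    while Q:
        u = Q.popleft()     # remove u from the front of Q
        for v in adj[u]:    # each directed edge (u, v)
            if v not in marked:
                marked.add(v)
                Q.append(v)
    return b in marked      # accept iff b is marked

# Hypothetical graph: 1 -> 2 -> 3, and 4 -> 1
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (4, 1)]
print(reach(V, E, 1, 3))  # True
print(reach(V, E, 1, 4))  # False
```

Every vertex enters the queue at most once and every edge is examined at most once, which gives the linear bound in |V| + |E|.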

Example 3. Show the contents of the queue Q during the execution of the above algorithm on the graph G = (V,E), where

V = {a, b, c, d, e, f, g, h} and the edges are given by

E = {(a, b), (a, c), (b, c), (b, d), (b, e), (b, g), (c, g), (c, f),

(d, f), (f, g), (f, h), (g, h)}. Decide if h is reachable from a.

CNF Notation. To simplify notation, a k-CNF formula of the form

(l11 ∨ · · · ∨ l1k) ∧ (l21 ∨ · · · ∨ l2k) ∧ · · · ∧ (lm1 ∨ · · · ∨ lmk)

will be written as {(l11, . . . , l1k), (l21, . . . , l2k), . . . , (lm1, . . . , lmk)}, which is a family of sets of literals.
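CNF notation maps naturally onto a list of tuples of literals. The sketch below assumes a signed-integer encoding that is not used in the notes: the integer i stands for xi and −i stands for ¬xi. The helper names `literal_value`, `satisfies`, and `show`, and the sample clauses, are likewise hypothetical.

```python
def literal_value(lit, a):
    """Value of a literal under assignment a, where a[i-1] is the value of x_i."""
    v = a[abs(lit) - 1]
    return v if lit > 0 else 1 - v

def satisfies(clauses, a):
    """True iff every clause has at least one literal that evaluates to 1 under a."""
    return all(any(literal_value(l, a) == 1 for l in C) for C in clauses)

def show(clauses):
    """Render a clause family back into ordinary CNF syntax."""
    def name(l):
        return f"x{l}" if l > 0 else f"¬x{-l}"
    return " ∧ ".join("(" + " ∨ ".join(name(l) for l in C) + ")" for C in clauses)

# Hypothetical 2-CNF formula in CNF notation: {(x1, ¬x2), (x2, x3), (¬x1, ¬x3)}
C = [(1, -2), (2, 3), (-1, -3)]
print(show(C))                  # (x1 ∨ ¬x2) ∧ (x2 ∨ x3) ∧ (¬x1 ∨ ¬x3)
print(satisfies(C, (1, 1, 0)))  # True
print(satisfies(C, (1, 0, 0)))  # False
```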

Example 4. Rewrite the CNF formula from Example 2 using CNF notation.

Implication Graph of a 2-CNF Formula. Let F be a 2-CNF formula over the variables x1, x2, . . . , xn. The implication graph of F is defined as the directed graph GF = (V,E), where V = {x1, x2, . . . , xn, ¬x1, ¬x2, . . . , ¬xn}, and for any two literals li and lj, (li, lj) ∈ E if and only if either (¬li, lj) is a clause of F or (lj, ¬li) is a clause of F. In other words, each clause (l, l′) of F contributes the two edges (¬l, l′) and (¬l′, l): if one literal of the clause is false, then the other must be true.
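A minimal sketch of building the implication graph, assuming a signed-integer encoding of literals (i for xi, −i for ¬xi) that is a convention of this sketch rather than of the notes. Each clause (l1, l2) contributes the edges ¬l1 → l2 and ¬l2 → l1. The function name `implication_graph` and the two sample clauses are hypothetical.

```python
def implication_graph(clauses, n):
    """Return (vertices, edges) of the implication graph of a 2-CNF formula.

    clauses: iterable of pairs of signed integers; n: number of variables.
    """
    # One vertex per literal: x_i and ¬x_i for i = 1..n.
    vertices = [lit for i in range(1, n + 1) for lit in (i, -i)]
    edges = set()
    for l1, l2 in clauses:
        edges.add((-l1, l2))   # if l1 is false, then l2 must be true
        edges.add((-l2, l1))   # if l2 is false, then l1 must be true
    return vertices, edges

# Hypothetical clauses {(x1, x2), (¬x1, x3)}
V, E = implication_graph([(1, 2), (-1, 3)], 3)
print(sorted(E))  # [(-3, -1), (-2, 1), (-1, 2), (1, 3)]
```

Note that the edge set is closed under contraposition: whenever (l, l′) is an edge, so is (¬l′, ¬l).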

Example 5. Draw the implication graph for the following set of CNF clauses.

(x2, x4), (x2, x3), (x2, x3), (x2, x3), (x2, x4), (x1, x4).

Theorem 1. A 2-CNF formula F is unsatisfiable iff there exists a variable x such that ¬x is reachable from x and x is reachable from ¬x in the implication graph GF.

Proof. ⇐. Let F be a 2-CNF formula and assume that there is some variable x such that ¬x is reachable from x and x is reachable from ¬x. Then there are edge sequences in GF of the form (x, l1), (l1, l2), . . . , (lr−1, lr), (lr, ¬x) and of the form (¬x, l′1), (l′1, l′2), . . . , (l′s−1, l′s), (l′s, x). The first edge sequence implies that F is not satisfiable when x is assigned 1. For example, (x, l1) ∈ E implies that either (¬x, l1) or (l1, ¬x) is a clause of F. Then the assignment of 1 to x forces an assignment of 1 to l1. Similar reasoning shows that this in turn forces an assignment of 1 to l2, and, using this reasoning through the entire edge sequence, we see that lr is forced to have an assignment of 1. But edge (lr, ¬x) corresponds with the clause (¬lr, ¬x). And if lr is forced to 1, then this clause cannot be satisfied, since x was already assigned 1. Hence, no satisfying assignment of F can assign x to 1. A similar argument using the second edge sequence shows that no satisfying assignment of F can assign x to 0. Therefore, F is unsatisfiable.

Proof. ⇒. Now assume that F is unsatisfiable. We prove that there is some variable x such that ¬x is reachable from x and x is reachable from ¬x. The proof is by induction on the number of variables in formula F.

Basis Step. F has one variable x. Since F is unsatisfiable, F must have the two clauses (x, x) and (¬x, ¬x). These yield implication graph edges (¬x, x) and (x, ¬x), and hence x is the desired variable.

Induction Step. Assume that any unsatisfiable 2-CNF formula with fewer than n variables has a variable x such that ¬x is reachable from x and x is reachable from ¬x in its implication graph, for some n ≥ 1. Let F be an unsatisfiable 2-CNF formula with n variables. Choose a variable w such that either ¬w is not reachable from w or w is not reachable from ¬w. If no such variable exists, then the statement is proved. Without loss of generality, assume ¬w is not reachable from w. Let R be the set of literals that are reachable from w. From what we just stated, ¬w ∉ R.

Claim 1: R is a consistent set of literals, meaning that, if l ∈ R, then ¬l ∉ R. By way of contradiction, assume otherwise; i.e., that l ∈ R and ¬l ∈ R. Then there is a path P1 from w to l and a path P2 from w to ¬l. By contraposing the edges in P2 (each edge (l1, l2) of GF has the companion edge (¬l2, ¬l1)), we obtain a path P3 from l to ¬w. Hence, P1 · P3 yields a path from w to ¬w, a contradiction.

Now let V(R) denote the set of variables x for which either x ∈ R or ¬x ∈ R. Without loss of generality, assume that V(R) = {x1, x2, . . . , x|R|}. Furthermore, let a ∈ {0, 1}^|R| be an assignment for which ai = 1 if xi ∈ R and ai = 0 if ¬xi ∈ R. Clearly, a satisfies all literals in R. Slightly more subtle is that a satisfies all clauses C that have at least one variable in V(R). For example, assume l ∈ R. Then if C = (l, l′), then C is satisfied by a since l is satisfied by a. On the other hand, if C = (¬l, l′), then this implies that l′ ∈ R, since C yields the edge (l, l′), and l is reachable from w. Thus l′ is also reachable from w and is thus in R. Thus a satisfies C since l′ is in R and a satisfies l′ by definition.

Now let F̂ denote the new 2-CNF formula that is F with all clauses removed that contain a variable in V(R). Notice that i) GF̂ is a subgraph of GF, ii) F̂ is unsatisfiable (otherwise F would be satisfiable), and iii) F̂ has |R| > 0 fewer variables than F. Hence, by the inductive assumption, there exists a variable x such that ¬x is reachable from x and x is reachable from ¬x in the implication graph GF̂. But since GF̂ is a subgraph of GF, we have that ¬x is reachable from x and x is reachable from ¬x in the implication graph GF, and the theorem is proved.

Corollary 1. 2SAT is solvable in quadratic time.
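One way to see the quadratic bound is to apply Theorem 1 directly: for each of the n variables, run two reachability searches on the implication graph, each taking time linear in n + m. The sketch below is an illustration under assumptions, not the notes' own procedure: literals are encoded as signed integers (i for xi, −i for ¬xi), and the names `reachable_set` and `is_satisfiable_2cnf` are hypothetical.

```python
from collections import deque

def reachable_set(adj, s):
    """Set of literals reachable from s by breadth-first search."""
    marked, Q = {s}, deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if v not in marked:
                marked.add(v)
                Q.append(v)
    return marked

def is_satisfiable_2cnf(clauses, n):
    """Decide 2SAT via Theorem 1 using 2n reachability searches."""
    adj = {lit: [] for i in range(1, n + 1) for lit in (i, -i)}
    for l1, l2 in clauses:
        adj[-l1].append(l2)   # edge ¬l1 -> l2
        adj[-l2].append(l1)   # edge ¬l2 -> l1
    for x in range(1, n + 1):
        # Unsatisfiable iff some x reaches ¬x and ¬x reaches x.
        if -x in reachable_set(adj, x) and x in reachable_set(adj, -x):
            return False
    return True

# Hypothetical instances
print(is_satisfiable_2cnf([(1, 2), (-1, 2)], 2))                     # True
print(is_satisfiable_2cnf([(1, 2), (-1, 2), (1, -2), (-1, -2)], 2))  # False
```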

The above proof suggests an algorithm for finding a satisfying assignment for a satisfiable instance C of 2SAT. Henceforth we refer to this algorithm as the 2SAT Algorithm.

The first step of the algorithm is to identify a literal l for which there is no path from l to its negation. Next compute the reachability set Rl. Then for each variable x ∈ var(Rl) assign x to true (respectively, false) if x occurs positively (respectively, negatively) in Rl. Note that this assignment will satisfy all clauses that have a variable in var(Rl). Remove these clauses from C and repeat the process for the reduced set of clauses.
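The 2SAT Algorithm described above can be sketched as follows. Several details are assumptions of this sketch rather than part of the notes: literals are signed integers (i for xi, −i for ¬xi), the root literal is chosen by trying the smallest unassigned variable first and then its negation, and the names `reach_set` and `solve_2sat` are hypothetical. If both a variable and its negation reach each other in the reduced graph, the sketch reports unsatisfiability, as Theorem 1 permits.

```python
from collections import deque

def reach_set(clauses, root):
    """Literals reachable from root in the implication graph of the given clauses."""
    adj = {}
    for l1, l2 in clauses:
        adj.setdefault(-l1, []).append(l2)   # edge ¬l1 -> l2
        adj.setdefault(-l2, []).append(l1)   # edge ¬l2 -> l1
    marked, Q = {root}, deque([root])
    while Q:
        u = Q.popleft()
        for v in adj.get(u, []):
            if v not in marked:
                marked.add(v)
                Q.append(v)
    return marked

def solve_2sat(clauses, n):
    """Return a satisfying assignment (a1, ..., an), or None if none exists."""
    assignment = {}
    remaining = list(clauses)
    unassigned = set(range(1, n + 1))
    while unassigned:
        x = min(unassigned)
        root = None
        for lit in (x, -x):                  # look for a literal that does not reach its negation
            R = reach_set(remaining, lit)
            if -lit not in R:
                root = lit
                break
        if root is None:
            return None                      # x and ¬x reach each other: unsatisfiable
        for lit in R:                        # satisfy every literal in R
            assignment[abs(lit)] = 1 if lit > 0 else 0
        assigned = {abs(lit) for lit in R}
        unassigned -= assigned
        # Remove every clause that contains a variable of R; all of them are now satisfied.
        remaining = [C for C in remaining if not any(abs(l) in assigned for l in C)]
    return tuple(assignment[i] for i in range(1, n + 1))

# Hypothetical instances
print(solve_2sat([(1, 2), (-1, 3), (-2, -3)], 3))  # (1, 0, 1)
print(solve_2sat([(1, 1), (-1, -1)], 1))           # None
```

Each round performs at most two linear-time searches and assigns at least one variable, so the sketch stays within the quadratic bound of Corollary 1.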

Example 6. Use the 2SAT Algorithm to find a satisfying assignment for the clauses in Example 5. Then add the clause (x2, x3) to the problem and verify that a cycle is created that contains a variable and its negation.

Exercises.

1. Given F (x1, x2, x3, x4) =

(x1 ∨ x3 ∨ x4) ∧ (x1 ∨ x3 ∨ x4) ∧ (x1 ∨ (x2 ∧ x3)) ∧ (x1 ∧ (x2 ∨ x4)) ∧ (x1 ∧ x2 ∧ x4),

evaluate F (a) for a = (1, 1, 1, 1). Is F satisfiable? If so, provide a satisfying assignment for F .

2. Provide a Boolean formula that is logically equivalent to the defining formula from the previous problem, but is in conjunctive normal form. For what value of k is this formula an instance of the k-CNF-SAT problem?

3. For the directed graph G = (V,E), where

V = {a, b, c, d, e, f, g, h, i, j, k}

and the edges are given by

E = {(a, b), (a, c), (b, c), (b, d), (b, e), (b, g), (c, g), (c, f),

(d, f), (f, g), (f, h), (g, h), (i, j), (i, k), (j, k)}, use the Reachability Algorithm to determine if vertex k is reachable from vertex a. Show the contents of the FIFO queue Q at each stage of the algorithm.

4. Re-write the CNF formula from Problem 2 in CNF notation.

5. Draw the implication graph for the following set of CNF clauses.

(x2, x3), (x2, x4), (x1, x3), (x2, x3), (x1, x4), (x1, x4), (x1, x2).

Perform the 2SAT Algorithm to determine a satisfying assignment for this set of clauses.

6. Repeat the previous problem, but now add the additional clause (x2, x3). Verify that there is now a cycle in the implication graph which contains a variable and its negation. Which variable is it?

Figure 1: Exercise 5: GC (implication graph on the literals of x1, x2, x3, x4)

Hints and Answers to the Exercises.

1. F(1, 1, 1, 1) = 0. F is satisfied by a = (0, 1, 0, 1).

2. (x1 ∨ x3 ∨ x4) ∧ (x1 ∨ x3 ∨ x4) ∧ (x1 ∨ x2) ∧ (x1 ∨ x3) ∧ x1 ∧ (x2 ∨ x4) ∧ x1 ∧ x2 ∧ x4.

3. Queue sequence: Q1 = {a}, Q2 = {b, c}, Q3 = {c, d, e, g}, Q4 = {d, e, g, f}, Q5 = {e, g, f}, Q6 = {g, f}, Q7 = {f, h}, Q8 = {h}, Q9 = ∅. Therefore, vertex k is not reachable from a.

4. C = {(x1, x3, x4), (x1, x3, x4), (x1, x2), (x1, x3), (x1), (x2, x4), (x1), (x2), (x4)}

5. See GC above.

Round    Root Vertex    Reachable Set R
1        x1             R = {x1, x2, x3, x4}

6. See Figure 2.

Figure 2: Exercise 6: GC and a “bad” cycle