<<

From: FLAIRS-02 Proceedings. Copyright © 2002, AAAI (www.aaai.org). All rights reserved.

An On-Line Satisfiability Algorithm for Expressions with Two Literals

John F. Kolen Institute of Human and Machine Cognition University of West Florida Pensacola, Florida 32501 [email protected]

Abstract so far. For instance, an on-line convex hull algorithm incre- mentally maintains the convex hull of a sequence of coplanar This paper describes two algorithms for determining points (Preparata 1979). the satisfiability of Boolean conjunctive normal form In this paper, I first describe an on-line algorithm for 2- expressions limited to two literals per clause (2-SAT) SAT. At each insertion of a clause, the algorithm condition- extending the classic effort of Aspvall, Plass, and Tar- jan. The first algorithm differs from the original in ally performs a depth first search of subset of the dual impli- that satisfiability is determined upon the presentation of cation graph and checks affected components for inclusion each clause rather than the entire clause set. This on- of literal pairs (x and x). A large performance savings is line algorithm experimentally exhibits average run-time made by not searching on pure variables, by making oppor- linear to the number of variables. This performance tunistic variable assignments, especially during depth first is achieved by performing a single depth first search search of the graph. These techniques are used to construct of one of the incoming literals. Additional search is an off-line algorithm, as well. The performance of the algo- avoided by excluding clauses containing pure variables rithms are demonstrated experimentally. or variables whose truth value has been explicitly pro- vided or can be inferred. An off-line algorithm is also described that incorporates these strategies. Preliminaries During the course of this paper, the following terms are used to describe satisfiability problems and their graph theoretic Introduction equivalents. Let X = x1, ..., xn be a set of Boolean vari- A conjunctive normal form Boolean expression (CNF) is a ables. A literal is either a variable or it’s , hence L = {x , ..., x }∪{x , ..., x } conjunction of disjunctive clauses. Determining the exis- 1 n 1 n is the set of literals. A c k tance of an assignment of truth values to the variables of clause, ,isamultiset representing the disjunction of L the CNF expression that makes it evaluate as true is the sat- members of .For this paper, we are only interested in the k =2 isfiability problem. When the clause size is greater than case where . This restriction implies that clauses will a, b ∈ L a ∨ a two, the problem is NP-Complete (Cook 1971). A linear exhibit one of three forms ( ): i) unit clauses , a ∨ a a ∨ b algorithm for 2-SAT was originally developed by Aspvall, ii) tautological clauses, , iii) and regular clauses, . C Lk Plass, and Tarjan (Aspvall, Plass, & Tarjan 1979) (APT al- A clause set is the multiset drawn from .Aninstance k F =(X, C) a gorithm). Consider a directed graph where the vertices are of a -SAT problem is .Aliteral, ,ispure, if C the set of literals from a 2-CNF formula. One could con- it’s negation does not appear in any clause in . struct a digraph on these vertices from the formula by in- The 2-SAT decision problem is intimately related to deci- terpreting each clause a ∨ b as a pair of directed edges sions problems on graphs (Aspvall, Plass, & Tarjan 1979). All graphs are assumed to be directed. Let G =(V,E) a → b and b → a on the graph. They proved that a designate a graph with V vertices and E edges. The graph formula is satisfiable if, and only if, no pair of literals, x dual, or implication graph, G,ofa2-SAT problem instance and x, appear in the same strongly connected component. (X, C) directly represents the implication relationship be- Since a linear time algorithm for determining the strongly tween literals of X induced by the disjunctive clauses of C. connected components of a digraph existed (Tarjan 1972; The vertices, V ,ofthe graph are the literals, L.Anedge Aho, Hopcroft, & Ullman 1974), the 2-SAT question could exists in the graph if and only if the implication relation- be answered in linear time as well. ship holds between the two vertices. Hence, E = {(a, b) | The APT algorithm requires the entire expression before (a ∨ b) ∈ C}. Note that both (a, b) ∈ E if and only if it can begin processing–an off-line algorithm. An on-line   b, a ∈ E. Literal and vertex will be used interchangeably algorithm, on the other hand, is one receives a sequence of through out the paper. inputs and performs some computation on the sequence seen Avertex b is reachable from vertex a if there exists a set Copyright c 2002, American Association for Artificial Intelli- (xi,xi+1) ⊂ V of vertices such that a → x1, x1 → x2, gence (www.aaai.org). All rights reserved. ..., xi → b. Reachability of two vertices is represented as

FLAIRS 2002 187 proc 2-SAT APT(F ) G.Either P = P or there exist two components S1,S2 ∈ P Construct the dual graph G=(V,E) of F such that S1,S2 ∈ P but S1 ∪ S2 ⊆ S, and S ∈ P . foreach v ∈ E do   if not(w.Mark) then SearchAPT(v); fi Theorem 3 Let a b and a b. There exists a path from return not(SearchFailed()) od. a to a through b (and b) with length | a  b | + | a  b |. proc SearchAPT(v) Theorem 4 S is a contradictory component if and only if Push(v); ∀b ∈ S, b ∈ S. v.Lowlink := v.Dfnum := count; count := count +1; v.Mark := true; Algorithms foreach w where (v, w) ∈ E do While the APT algorithm performs well–linear run time if not(w.Mark) with respect to the number of variables and edges–it requires then SearchAPT(w); the entire formula to begin it’s processing. There are situa- v.Lowlink := min(v.Lowlink, w.Lowlink) tions, however, that this may not be feasible or, at least, in- elsif w.Dfnum

188 FLAIRS 2002 proc InsertClause(a, b) edges appear in pairs, thus performing the backward push of if a.V alue = false ∧ b.V alue = false false values that this procedure would provide. then Fail(); // can’t satisfy The DFS procedure, Search, differs slightly from the elsif a.V alue = true ∨ b.V alue = true one implementing the APT algorithm. Note that the ver- then return ; // satisfied tex traversal mark is no longer Boolean. The initial call to elsif a.V alue = false Search instantiates a unique symbol to mark the current then AssignTrue(b); // forced traversal. More importantly, it takes advantage of the log- elsif b.V alue = false ical of the search vertex to prune the search then AssignTrue(a); // forced space. As before, it performs DFS of the graph from the elsif a = b start vertex, v.Itispossible, however, to eliminate certain then AssignTrue(a); // unit clause vertex expansions. Expanding a literal whose truth value is elsif a = b // not known serves no purpose. Pure literals, according to Theo- then Add edge (not(a),b) to E; rem 1, will always end up in a singleton component, so they Add edge (b, a) to E; should remain unexpanded. Search, unlike the generic con- if a and bare not pure nected component algorithm, is attempting to traverse the then Search(a, UniqueMark()); fi fi. single component containing the original vertex. The stack serves dual purposes in the new procedure: it proc AssignTrue(v) is a valid implication chain in addition to the component oneof v.V alue holding area. The literals on the stack were pushed in their true : return ; implication order, x1 → x2 → ... → xi → v, where false : Fail(); the index of xi reflects the order that literals were pushed unassigned : v.V alue := true; onto the stack and v is the current literal. Consider the event v.V alue := false; where the search algorithm stumbles upon a literal, w whose foreach w where (v, w) ∈ E truth value is false while at v.Given that w is a child of v, do AssignTrue(w); od foeno. v =⇒ w. The only way to avoid a contradiction is to assign v the value false. During the AssignT rue(v) call proc Search(v, mark) this assignment propagates down the stack, with eventually Push(v); all the stack literals, including the root literal, receive an as- v.Lowlink := v.Dfnum := count; signment. Hence, the entire traversal can be aborted. count := count +1; Another assignment can be determined from the stack. If v.Mark := mark; an implication chain exists from a literal to it’s negation, the foreach w where (v, w) ∈ E negation is asserted true. This rule can be implemented with ∧ w.V alue = true ∧ w is pure do an examination of the stack for the negation of the vertex if w.V alue = false whose expansion in question. then Empty the stack; If the vertex was not marked during the current traver- AssignTrue(v); sal, then it is expanded and v’s backedge is updated ac- AbortSearch(); cordingly. In the case of a backedge, a small simpli- elsif OnStack(w) then AssignT rue(w); fication can be performed. Since Lowlink is assigned elsif Mark(w) = mark the value of the vertex’s traversal number, Dfnum,or then Search(w, mark); the minimum of itself and some other value, Lowlink ≤ v.Lowlink := Dfnum for all vertices. The APT algorithm performs min(v.Lowlink, w.Lowlink) two comparisons involving Lowlink on the backlink test. elsif w.Dfnum < v.LowlinkandOnStack(w) First, the target must have appeared previously in the DFS, then v.Lowlink := w.Dfnum fi w.Dfnum < v.Dfnum. Second, v receives a new od Lowlink if w.Dfnum < v.Lowlink. The second compar- if v.Lowlink = v.Dfnum ison subtends the first, so the former can be replaced by the then repeat u := Pop(); until u = v; fi. latter and used to record the destination of the new backedge. The algorithm fills the stack with candidate literals dur- Figure 2: The on-line algorithm. The InsertClause proce- ing its DFS of the implication graph. When it encounters dure performs simple truth maintenence on the literals, adds a literal whose Lowlink points to itself after visiting it’s implications, and initiates component checking, when nec- progeny, it has discovered the root vertex of the current com- essary. Search recursively explores the dual graph, while ponent occupying the topmost portion of the stack. In the AssignT rue maintains truth assignments. APT algorithm, the stack is checked for the of all literals of the component. This check, however, can be re- duced to one comparison: the presence of the root’s nega- pruning the traversal at vertices with known truth values. tion. Theorem 4 supports this simplification. Truth values are assigned to the complementary pair of lit- Even this comparison can be eliminated. During DFS of erals: v is assigned true and v is assigned false.Arecursive acontradictory component, the negation of some literal, w, AssignF alse procedure is unnecessary because the graph will be found on the stack and trigger the assignment of true

FLAIRS 2002 189 to the newly found literal, w. Recall that truth assignments proc OffLine2-SAT(F ) are recursive, all literals found via DFS from the source are Construct the dual graph G =(V,E) of F , then assigned true and the negated literals are assigned false. The foreach non-negated v ∈ E complimentary pair assignments occur before vertex expan- ∧ w.V alue = unassigned sion in AssignT rue. Since w  w, the algorithm will ∧ w is not pure eventually try to make a contradictory assignment to w and ∧¬w.Mark ∧¬w.Mark do trigger a failure. When the algorithm safely returns to com- SearchOffLine(v); od ponent’s root vertex, it is guaranteed not to contain any con- return ¬SearchF ailed(); . tradictory literal pairs. Thus, the constituents need only be proc SearchOffLine(v) popped from the stack. Push(v); v.Lowlink := v.Dfnum := count; The Off-Line Algorithm count := count +1; v.Mark := true; The APT algorithm, separated the two abstractions, logical foreach w where (v, w) ∈ E and graph, that made their algorithm work. While convert- ∧ w.V alue = true ∧ OutEdges(w) > 0 do ing the formula to a graph and looking for contradictory if w.V alue = false components solves the satisfiability problem and achieving then EmptyStack(); asymptotic optimality, it fails to take advantage of regular- AssignT rue(v); ities of the graph submitted to component subroutine. The AbortSearch(); remainder of this section addresses optimization of the off- elsif OnStack(w) then AssignT rue(w); line algorithm in light of the on-line algorithm. elsif ¬w.Mark The new off-line 2-SAT algorithm still works depth-first, then Search(w); butitisable to prune a great many vertex expansions that v.Lowlink := the APT algorithm would follow. It can operate with a sin- min(v.Lowlink, w.Lowlink) gle vertex mark, as the graph is stable. It can not, however, elsif w.Dfnum < v.Lowlink ∧ OnStack(w) rely on truth assignments to pick up all contradictions. Test- then v.Lowlink := w.Dfnum fi ing a component for contradictions requires a single check od for the negation of the root literal. The main procedure dif- if v.Lowlink = v.Dfnum fers as well. Recall that pure literals are in singleton com- then if u is on the stack after v ponents, so they are skipped. Assigned literals need no fur- then F ail(); fi ther expansion, either. The algorithm expands literals only repeat u := Pop(); until u = v; fi. if they are not negated and neither it, nor its negation, are marked (contradictory components are closed under nega- Figure 3: The 2-SAT algorithm of Aspvall, Plass, and Tar- tion by Theorem 4). Expanding from the main loop only jan (Aspvall, Plass, & Tarjan 1979) embedded into Tarjan’s positive literals applies this reasoning to limit the loop con- strongly connected connected components algorithm (Tarjan struct and avoiding mark and purity tests. This optimization 1972). mirrors the single DFS invoked in InsertClause.

Analysis and Experimental Results experiments described below compare the run times of four The linear run time of the APT algorithm sets it apart from different versions of the algorithm: the off-line algorithm other satisfiability algorithms. In this section, the asymptotic where value truth values are identified and propagated (once run times of both algorithms are examined. assign) or not (once), the on-line approach utilizing variable Unfortunately, the off-line version worst-case run time is assignments (insert assign) or not (insert). Two hypotheses θ(n2). This behavior can be induced by a clause stream were tested: first, that run time grows linearly with prob- lem size; second, that variable assignments improved per- containing groups of three clause pairs of the form (xi−1 ∨ formance. The latter is of interest as truth maintenance adds xi) ∧ (xi ∨ y) ∧ (xi ∨ zi). The first clause inserts the edges significant complexity to the algorithm. (xi,xi−1) and (xi−1,xi). Since this is the first appearance of xi,it’s purity postpones it’s expansion to a future in- Each algorithm was ran on variable sets of size 1000, sert. When the second clause is inserted, the edge (zi,xi) 3500, 6500, 10000, 35000, 65000, and 100000. The ratio is added and xi is defiled. If z is impure as well (easily in- of clauses to variables examined were 0.9, 1.0, 1.1, 1.2, 2.0, duced by w ∨ y), then xi is traversed. Thus, there exists a and 5.0. For each condition, variable set size and clause ra- clause stream of length 3n +2whose run time is Θ(n2) as tio, 100 clause sets were generated and presented to each all xi are recursively visited on each insert. This run time of the algorithms. Clauses were independently generated is asymptotically the same as running the original algorithm by selecting, with replacement, two literals from L under on each iteration. Fortunately, such expressions are uncom- the assumption of uniform probability. Presentations to the mon. on-line versions maintained identical orderings of the clause The average run time is much more acceptable than this sets. quadratic upper bound. While a proof is elusive at this time, Table 1 displays the mean run times for the experimen- experimental results suggest linear expected run time. The tal conditions conditions. To test the linearity hypothesis,

190 FLAIRS 2002 Clause Variables Once Once Insert Insert Conclusion Ratio Assign (sec) (sec) Assign Satisfiability is an important problem for both AI and com- (sec) (sec) puter science. Many AI problems, such as planning, have 1000 0.005 0.005 0.005 0.004 been shown to be NP-complete. The algorithm described 3500 0.010 0.011 0.010 0.010 above, an on-line 2-SAT algorithm, can be brought to bear 6500 0.018 0.019 0.019 0.020 in restricted cases of various AI problems that can be trans- 0.9 10000 0.030 0.027 0.031 0.031 lated into 2-SAT instances. 35000 0.096 0.098 0.107 0.114 Future work includes theoretical validation of the experi- 65000 0.180 0.181 0.205 0.215 mental results described above. The on-line algorithm will 100000 0.274 0.280 0.318 0.333 also be enhanced to allow rollback. That is, the partially- 1000 0.005 0.004 0.005 0.005 dynamic version will be able to forget an arbitrary number 3500 0.011 0.011 0.012 0.014 of the most recent clauses and process new clause inserts 6500 0.020 0.024 0.024 0.024 as if the forgotten clauses never appeared. More impor- 1.0 10000 0.030 0.031 0.039 0.041 tantly, however, is the application of the on-line algorithm 35000 0.104 0.107 0.182 0.182 to a novel k-SAT algorithm. This new approach to k-SAT 65000 0.192 0.196 0.368 0.361 reduces the problem instance into an exponential number of 100000 0.296 0.301 0.592 0.578 (k-1)-SAT instances. A preprocessing step iteratively re- 1000 0.005 0.005 0.006 0.006 moves clauses from the set containing the most common 3500 0.012 0.011 0.017 0.015 literal. The removed clauses are associated with the lit- 6500 0.021 0.021 0.037 0.030 eral. The algorithm then examines all possible truth assign- 1.1 10000 0.030 0.030 0.077 0.053 ments to the common literals. The on-line version described 35000 0.100 0.097 0.394 0.238 above will allow the algorithm to prune the tree during the 65000 0.184 0.175 0.838 0.489 brute-force search by examining prefixes to the generated 100000 0.287 0.270 1.380 0.775 clauses. The envisioned partially-dynamic 2-SAT algorithm 1000 0.005 0.005 0.007 0.006 will help even more with rollback mimicking the variable 3500 0.013 0.012 0.020 0.016 assignments. 6500 0.021 0.022 0.045 0.032 1.2 10000 0.034 0.031 0.082 0.055 Acknowledgements 35000 0.115 0.103 0.397 0.238 This work was supported by the Office of Naval Research, 65000 0.203 0.184 0.837 0.492 grant number N00014-01-1-0926. 100000 0.322 0.287 1.383 0.776 1000 0.006 0.006 0.007 0.006 References 3500 0.017 0.015 0.020 0.016 Aho, A.; Hopcroft, J.; and Ullman, J. 1974. The Design 6500 0.028 0.026 0.044 0.032 and Analysis of Computer Algorithms. Addison-Wesley 2.0 10000 0.046 0.039 0.083 0.055 Publishing Co. 35000 0.164 0.134 0.398 0.239 Aspvall, B.; Plass, M.; and Tarjan, R. 1979. A linear-time 65000 0.271 0.230 0.839 0.493 algorithm for testing the truth of certain quantified boolean 100000 0.466 0.378 1.380 0.773 formulas. Information Processing Letters 8:121–123. 1000 0.007 0.006 0.008 0.007 Cook, S. A. 1971. The complexity of theorem-proving 3500 0.016 0.018 0.020 0.016 procedures. In Proceedings of the 3rd Annual ACM Sym- 6500 0.033 0.031 0.045 0.033 posium on the Theory of Computing, 151–158. Association 5.0 10000 0.055 0.050 0.083 0.055 for Computing Machinery. 35000 0.188 0.174 0.399 0.241 65000 0.309 0.282 0.845 0.497 Khanna, S.; Motwani, R.; and Wilson, R. 1996. On certifi- 100000 0.612 0.535 1.386 0.783 cates and lookahead in dynamic graph problems. In SODA: ACM-SIAM Symposium on Discrete Algorithms (A Confer- Table 1: Comparison of run-time’s of the algorithms. ence on Theoretical and Experimental Analysis of Discrete Algorithms). Preparata, F. P. 1979. An optimal real-time algorithm for planar convex hulls. Communications of the ACM 22:402– 405. variable count and mean run times with a ratio condition Tarjan, R. E. 1972. Depth-first search and linear graph were subjected to regression. Across all ratio conditions, the algorithms. SIAM Journal of Computing 1:146–160. smallest R2 value was 0.9941 strongly indicating a linear relationship between problem size to run time. Performing variable assignments had minimal impact on the offline al- gorithm, but significantly reduced the run time of the online version.

FLAIRS 2002 191