
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Dependency Stochastic Boolean Satisfiability: A Logical Formalism for NEXPTIME Decision Problems with Uncertainty

Nian-Ze Lee,1 Jie-Hong R. Jiang1,2
1 Graduate Institute of Electronics Engineering, National Taiwan University
2 Department of Electrical Engineering, National Taiwan University
No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
{d04943019, jhjiang}@ntu.edu.tw

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Stochastic Boolean Satisfiability (SSAT) is a logical formalism to model decision problems with uncertainty, such as Partially Observable Markov Decision Process (POMDP) for verification of probabilistic systems. SSAT, however, is limited by its descriptive power within the PSPACE complexity class. More complex problems, such as the NEXPTIME-complete Decentralized POMDP (Dec-POMDP), cannot be succinctly encoded with SSAT. To provide a logical formalism of such problems, we extend the Dependency Quantified Boolean Formula (DQBF), a representative problem in the NEXPTIME-complete class, to its stochastic variant, named Dependency SSAT (DSSAT), and show that DSSAT is also NEXPTIME-complete. We demonstrate the potential applications of DSSAT to circuit synthesis of probabilistic and approximate design. Furthermore, to study the descriptive power of DSSAT, we establish a polynomial-time reduction from Dec-POMDP to DSSAT. With the theoretical foundations paved in this work, we hope to encourage the development of DSSAT solvers for potential broad applications.

1 Introduction

Satisfiability (SAT) solvers (Biere, Heule, and van Maaren 2009) have been successfully applied to numerous research fields including artificial intelligence (Nilsson 2014; Russell and Norvig 2016), electronic design automation (Marques-Silva and Sakallah 2000; Wang, Chang, and Cheng 2009), software verification (Bérard et al. 2013; Jhala and Majumdar 2009), etc. The tremendous benefits have encouraged the development of more advanced decision procedures for satisfiability with respect to more complex logics beyond pure propositional. For example, solvers of the satisfiability modulo theories (SMT) (De Moura and Bjørner 2011; Barrett and Tinelli 2018) accommodate first order logic fragments; quantified Boolean formula (QBF) (Narizzano, Pulina, and Tacchella 2006; Büning and Bubeck 2009) allows both existential and universal quantifiers; stochastic Boolean satisfiability (SSAT) (Littman, Majercik, and Pitassi 2001; Majercik 2009) models uncertainty with random quantification; and dependency QBF (DQBF) (Balabanov, Chiang, and Jiang 2014; Scholl and Wimmer 2018) equips Henkin quantifiers to describe multi-player games with partial information. Due to their simplicity and generality, various satisfiability formulations are under active investigation.

Among the quantified decision procedures, QBF and SSAT are closely related. While SSAT extends QBF to allow random quantifiers to model uncertainty, they are both PSPACE-complete (Stockmeyer and Meyer 1973). A number of SSAT solvers have been developed and applied in probabilistic planning, formal verification of probabilistic design, partially observable Markov decision process (POMDP), and analysis of software security. For example, solver MAXPLAN (Majercik and Littman 1998) encodes a conformant planning problem as an exist-random quantified SSAT formula; solver ZANDER (Majercik and Littman 2003) deals with partially observable probabilistic planning by formulating the problem as a general SSAT formula; solver DC-SSAT (Majercik and Boots 2005) relies on a divide-and-conquer approach to speed up the solving of a general SSAT formula. Solvers ressat and erssat (Lee, Wang, and Jiang 2017, 2018) are developed for random-exist and exist-random quantified SSAT formulas, respectively, and applied to the formal verification of probabilistic design (Lee and Jiang 2018). POMDP has also been studied under the formalism of SSAT (Majercik and Littman 2003; Salmon and Poupart 2019). Recently, bi-directional polynomial-time reductions between SSAT and POMDP have been established (Salmon and Poupart 2019). The quantitative information flow analysis for software security is also investigated as an exist-random quantified SSAT formula (Fremont, Rabe, and Seshia 2017).

In view of the close relation between QBF and SSAT, we raise the question of what formalism would extend DQBF to the stochastic domain. We formalize the dependency SSAT (DSSAT) as the answer to the question. We prove that DSSAT has the same NEXPTIME-complete complexity as DQBF (Peterson, Reif, and Azhar 2001), and therefore it can succinctly encode decision problems with uncertainty in the NEXPTIME complexity class.

To highlight the benefits of DSSAT over DQBF, we note that DSSAT intrinsically represents an optimization problem (the answer is the maximum satisfying probability) while DQBF is a decision problem (the answer is either true or false). The optimization nature of DSSAT potentially allows broader applications of the formalism. Moreover, DSSAT is often preferable to DQBF in expressing problems involving

uncertainty and probabilities. As case studies, we investigate its applicability in probabilistic system design/verification and artificial intelligence.

In system design of the post Moore's law era, the practice of very large scale integration (VLSI) circuit design experiences a paradigm shift in design principles to overcome the obstacle of physical scaling of computation capacity. Probabilistic design (Chakrapani et al. 2008) and approximate design (Venkatesan et al. 2011) are two such examples of emerging design methodologies. The former does not require logic gates to be error-free, but rather allows them to function with probabilistic errors. The latter does not require the implementation circuit to behave exactly the same as the specification, but rather allows it to deviate to some extent. These relaxations to design requirements provide freedom for circuit simplification and optimization. We show that DSSAT can be a useful tool for the analysis of probabilistic design and approximate design.

The theory and applications of Markov decision process and its variants are among the most important topics in the study of artificial intelligence. For example, the decision problem involving multiple agents with uncertainty and partial information is often considered as a decentralized POMDP (Dec-POMDP) (Oliehoek, Amato et al. 2016). The independent actions and observations of the individual agents make POMDP for single-agent systems not applicable and require the more complex Dec-POMDP. Essentially the complexity is lifted from the PSPACE-complete policy evaluation of finite-horizon POMDP to the NEXPTIME-complete Dec-POMDP. We show that Dec-POMDP is polynomial time reducible to DSSAT.

To sum up, the main results of this work include:
• formulating the DSSAT problem (Section 3),
• proving its NEXPTIME-completeness (Section 4), and
• showing its applications in:
  – analyzing probabilistic/approximate design (Section 5)
  – modeling Dec-POMDP (Section 6).

Our results may encourage the development of DSSAT solvers to enable potential broad applications.

2 Preliminaries

In this section, we provide background knowledge about SSAT, DQBF, probabilistic design, and Dec-POMDP.

In the sequel, Boolean values TRUE and FALSE are represented by symbols $\top$ and $\bot$, respectively; they are also treated as 1 and 0, respectively, in arithmetic computation. Boolean connectives $\neg, \lor, \land, \Rightarrow, \equiv$ are interpreted in their conventional semantics. Given a set $V$ of variables, an assignment $\alpha$ is a mapping from each variable $x \in V$ to $\mathbb{B} = \{\top, \bot\}$, and we denote the set of all assignments over $V$ by $A(V)$. An assignment $\alpha$ satisfies a Boolean formula $\phi$ over a set $V$ of variables if $\phi$ yields $\top$ after substituting all occurrences of every variable $x \in V$ with its assigned value $\alpha(x)$ and simplifying $\phi$ under the semantics of Boolean connectives. A Boolean formula $\phi$ over a set $V$ of variables is a tautology if every assignment $\alpha \in A(V)$ satisfies $\phi$.

2.1 Stochastic Boolean Satisfiability

SSAT was first proposed by Papadimitriou and described as games against nature (Papadimitriou 1985). An SSAT formula $\Phi$ over a set $V = \{x_1, \ldots, x_n\}$ of variables is of the form $Q_1 x_1, \ldots, Q_n x_n.\phi$, where each $Q_i \in \{\exists, ꓤ^{p}\}$ and Boolean formula $\phi$ over $V$ is quantifier-free. Symbol $\exists$ denotes an existential quantifier, and $ꓤ^{p}$ denotes a randomized quantifier, which requires the probability that the quantified variable equals $\top$ to be $p \in [0, 1]$. Given an SSAT formula $\Phi$, the quantification structure $Q_1 x_1, \ldots, Q_n x_n$ is called the prefix, and the quantifier-free Boolean formula $\phi$ is called the matrix.

Let $x$ be the outermost variable in the prefix of an SSAT formula $\Phi$. The satisfying probability of $\Phi$, denoted by $\Pr[\Phi]$, is defined recursively by the following four rules:
a) $\Pr[\top] = 1$,
b) $\Pr[\bot] = 0$,
c) $\Pr[\Phi] = \max\{\Pr[\Phi|_{\neg x}], \Pr[\Phi|_{x}]\}$, if $x$ is existentially quantified,
d) $\Pr[\Phi] = (1 - p)\Pr[\Phi|_{\neg x}] + p\Pr[\Phi|_{x}]$, if $x$ is randomly quantified by $ꓤ^{p}$,
where $\Phi|_{\neg x}$ and $\Phi|_{x}$ denote the SSAT formulas obtained by eliminating the outermost quantifier of $x$ via substituting the value of $x$ in the matrix with $\bot$ and $\top$, respectively.

The decision version of SSAT is stated as follows. Given an SSAT formula $\Phi$ and a threshold $\theta \in [0, 1]$, decide whether $\Pr[\Phi] \geq \theta$. On the other hand, the optimization version asks to compute $\Pr[\Phi]$. The decision version of SSAT was shown to be PSPACE-complete (Papadimitriou 1985).

2.2 Dependency Quantified Boolean Formula

DQBF was formulated as multiple-person alternation (Peterson and Reif 1979). In contrast to the linearly ordered prefix used in QBF, i.e., an existentially quantified variable will depend on all of its preceding universally quantified variables, the quantification structure in DQBF is extended with Henkin quantifiers, where the dependency of an existentially quantified variable on the universally quantified variables can be explicitly specified. A DQBF $\Phi$ over a set $V = \{x_1, \ldots, x_n, y_1, \ldots, y_m\}$ of variables is of the form:

$$\forall x_1, \ldots, \forall x_n, \exists y_1(D_{y_1}), \ldots, \exists y_m(D_{y_m}).\phi, \tag{1}$$

where each $D_{y_j} \subseteq \{x_1, \ldots, x_n\}$ denotes the set of variables that variable $y_j$ can depend on, and Boolean formula $\phi$ over $V$ is quantifier-free. We denote the set $\{x_1, \ldots, x_n\}$ (resp. $\{y_1, \ldots, y_m\}$) of universally (resp. existentially) quantified variables of $\Phi$ by $V_\Phi^{\forall}$ (resp. $V_\Phi^{\exists}$).

Given a DQBF $\Phi$, it is satisfied if for each variable $y_j$, there exists a function $f_j : A(D_{y_j}) \to \mathbb{B}$, such that after substituting variables in $V_\Phi^{\exists}$ with their corresponding functions respectively, matrix $\phi$ yields a tautology over $V_\Phi^{\forall}$. The set of functions $F = \{f_1, \ldots, f_m\}$ is called a set of Skolem functions for $\Phi$. In other words, $\Phi$ is satisfied by $F$ if

$$\min_{\beta \in A(V_\Phi^{\forall})} \mathbb{1}_{\phi|_F}(\beta) = 1. \tag{2}$$
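As a rough illustration of Eq. (2), the sketch below checks a candidate set of Skolem functions by enumerating every assignment $\beta$ over the universally quantified variables and requiring the matrix to evaluate to true for each of them. The function names (`satisfied_by`, `all_skolem_sets`, `is_satisfiable`), the encoding of the matrix as a Python callable, and the representation of a Skolem function as a truth table are assumptions made for this example and do not come from the paper. The search over all Skolem functions is doubly exponential, so this is only meant for tiny formulas.

```python
from itertools import product

def satisfied_by(universals, deps, skolem, matrix):
    """Brute-force reading of Eq. (2): the minimum of the indicator over all beta is 1."""
    for values in product([False, True], repeat=len(universals)):
        beta = dict(zip(universals, values))
        # Each y_j only sees the universal variables in its dependency set D_{y_j}.
        ys = {y: skolem[y][tuple(beta[x] for x in deps[y])] for y in deps}
        if not matrix(beta, ys):
            return False  # the indicator is 0 for this beta, so the minimum is 0
    return True

def all_skolem_sets(deps):
    """Enumerate every combination of truth tables, one table per existential variable."""
    tables = []
    for y, d in deps.items():
        keys = list(product([False, True], repeat=len(d)))
        tables.append([dict(zip(keys, outs))
                       for outs in product([False, True], repeat=len(keys))])
    for choice in product(*tables):
        yield dict(zip(deps, choice))

def is_satisfiable(universals, deps, matrix):
    """A DQBF is satisfiable if some set of Skolem functions satisfies Eq. (2)."""
    return any(satisfied_by(universals, deps, f, matrix)
               for f in all_skolem_sets(deps))

# Hypothetical example: forall x1, forall x2, exists y1(D1), exists y2(D2).
# Matrix: (y1 == x1) and (y2 == x2).
universals = ["x1", "x2"]
phi = lambda b, y: (y["y1"] == b["x1"]) and (y["y2"] == b["x2"])

deps_ok = {"y1": ["x1"], "y2": ["x2"]}   # each y_j sees the variable it must copy
deps_bad = {"y1": ["x1"], "y2": ["x1"]}  # y2 cannot observe x2

print(is_satisfiable(universals, deps_ok, phi))   # True
print(is_satisfiable(universals, deps_bad, phi))  # False
```

The second call shows why dependency sets matter: with $D_{y_2} = \{x_1\}$, no function of $x_1$ alone can track $x_2$, so no set of Skolem functions turns the matrix into a tautology.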

In Eq. (2), $\mathbb{1}_{\phi|_F}(\cdot)$ is an indicator function to indicate whether an assignment over $V_\Phi^{\forall}$ belongs to the set of satisfying assignments of matrix $\phi$, when variables in $V_\Phi^{\exists}$ are substituted by their Skolem functions in $F$. That is, $\phi|_F = \{\beta \mid \phi(\beta(x_1), \ldots, \beta(x_n), f_1|_\beta, \ldots, f_m|_\beta) \equiv \top\}$, where $f_j|_\beta$ is the logical value derived by substituting every $x_i \in D_{y_j}$ with $\beta(x_i)$ in function $f_j$. The satisfiability problem of DQBF was shown to be NEXPTIME-complete (Peterson, Reif, and Azhar 2001).

2.3 Probabilistic Design

In this paper, a design refers to a combinational Boolean logic circuit, which is a directed acyclic graph $G = (V, E)$, where $V$ is a set of vertices, and $E \subseteq V \times V$ is a set of edges. Each vertex in $V$ can be a primary input, primary output, or an intermediate gate. An intermediate gate is associated with a Boolean function. An edge $(u, v) \in E$ signifies the connection from $u$ to $v$, denoting that the associated Boolean function of $v$ may depend on $u$. A circuit is called a partial design if some of the intermediate gates are black boxes, that is, their associated Boolean functions are not specified.

A probabilistic design is an extension of conventional Boolean logic circuits to model the scenario where intermediate gates exhibit probabilistic behavior. In a probabilistic design, each intermediate gate has an error rate, i.e., the probability for the gate to produce an erroneous output. An intermediate gate is erroneous if its error rate is nonzero. Using the distillation operation (Lee and Jiang 2018), an erroneous gate can be modeled by its corresponding error-free gate XORed with an auxiliary input, which evaluates to $\top$ with a probability equal to the error rate. As illustrated in Figure 1, a NAND gate with error rate $p$ is converted to an error-free NAND gate XORed with a fresh auxiliary input $z$ with $\Pr[z = \top] = p$ so that it triggers the error with probability $p$. After applying the distillation operation to every erroneous gate, all the intermediate gates in the distilled design become error-free, which makes the techniques for conventional reasoning applicable.

Figure 1: Conversion of the distillation operation.

2.4 Decentralized POMDP

Dec-POMDP is a formalism for multi-agent systems under uncertainty and with partial information. Its computational complexity was shown to be NEXPTIME-complete (Bernstein et al. 2002). In the following, we briefly review the definition, optimality criteria, and value function of Dec-POMDP from the literature (Oliehoek, Amato et al. 2016).

A Dec-POMDP is specified by a tuple $M = (I, S, \{A_i\}, T, \rho, \{O_i\}, \Omega, \Delta_0, h)$, where $I = \{1, \ldots, n\}$ is a finite set of $n$ agents, $S$ is a finite set of states, $A_i$ is a finite set of actions of Agent $i$, $T : S \times (A_1 \times \cdots \times A_n) \times S \to [0, 1]$ is a transition distribution function with $T(s, \vec{a}, s') = \Pr[s' \mid s, \vec{a}]$, the probability to transit to state $s'$ from state $s$ after taking actions $\vec{a}$, $\rho : S \times (A_1 \times \cdots \times A_n) \to \mathbb{R}$ is a reward function with $\rho(s, \vec{a})$ giving the reward for being in state $s$ and taking actions $\vec{a}$, $O_i$ is a finite set of observations for Agent $i$, $\Omega : S \times (A_1 \times \cdots \times A_n) \times (O_1 \times \cdots \times O_n) \to [0, 1]$ is an observation distribution function with $\Omega(s', \vec{a}, \vec{o}) = \Pr[\vec{o} \mid s', \vec{a}]$, the probability to receive observation $\vec{o}$ after taking actions $\vec{a}$ and transiting to state $s'$, $\Delta_0 : S \to [0, 1]$ is an initial state distribution function with $\Delta_0(s) = \Pr[s^0 \equiv s]$, the probability for the initial state $s^0$ being state $s$, and $h$ is a planning horizon, which we assume finite in this work.

Given a Dec-POMDP $M$, we aim at maximizing the expected total reward $\mathbb{E}[\sum_{t=0}^{h-1} \rho(s^t, \vec{a}^t)]$ through searching an optimal joint policy for the team of agents. Specifically, a policy $\pi_i$ of Agent $i$ is a mapping from the agent's observation history, i.e., a sequence of observations $o_i^t = o_i^0, \ldots, o_i^t$ received by Agent $i$, to an action $a_i^{t+1} \in A_i$. A joint policy for the team of agents $\vec{\pi} = (\pi_1, \ldots, \pi_n)$ maps the agents' joint observation history $\vec{o}^t = (o_1^t, \ldots, o_n^t)$ to actions $\vec{a}^{t+1} = (\pi_1(o_1^t), \ldots, \pi_n(o_n^t))$. We shall focus on deterministic policies only, as it was shown that every Dec-POMDP with a finite planning horizon has a deterministic optimal joint policy (Oliehoek, Spaan, and Vlassis 2008). To assess the quality of a joint policy $\vec{\pi}$, its value is defined to be $\mathbb{E}[\sum_{t=0}^{h-1} \rho(s^t, \vec{a}^t) \mid \Delta_0, \vec{\pi}]$. The value function $V(\vec{\pi})$ can be computed in a recursive manner, where for $t = h - 1$, $V^{\pi}(s^{h-1}, \vec{o}^{h-2}) = \rho(s^{h-1}, \vec{\pi}(\vec{o}^{h-2}))$, and for $t < h - 1$,

$$V^{\pi}(s^t, \vec{o}^{t-1}) = \rho(s^t, \vec{\pi}(\vec{o}^{t-1})) + \sum_{s^{t+1} \in S} \sum_{\vec{o}^t \in \vec{O}} \Pr[s^{t+1}, \vec{o}^t \mid s^t, \vec{\pi}(\vec{o}^t)]\, V^{\pi}(s^{t+1}, \vec{o}^t). \tag{3}$$

The probability $\Pr[s^{t+1}, \vec{o}^t \mid s^t, \vec{\pi}(\vec{o}^t)]$ in the above equation is the product of $T(s^t, \vec{\pi}(\vec{o}^t), s^{t+1})$ and $\Omega(s^{t+1}, \vec{\pi}(\vec{o}^t), \vec{o}^t)$. Eq. (3) is also called the Bellman Equation of Dec-POMDP. Denoting the empty observation history at the first stage (i.e., $t = 0$) with the symbol $\vec{o}^{-1}$, the value $V(\vec{\pi})$ of a joint policy equals $\sum_{s^0 \in S} \Delta_0(s^0) V^{\pi}(s^0, \vec{o}^{-1})$.

3 DSSAT Formulation

In this section, we extend DQBF to its stochastic variant, named Dependency Stochastic Boolean Satisfiability (DSSAT).

A DSSAT formula $\Phi$ over $V = \{x_1, \ldots, x_n, y_1, \ldots, y_m\}$ is of the form:

$$ꓤ^{p_1} x_1, \ldots, ꓤ^{p_n} x_n, \exists y_1(D_{y_1}), \ldots, \exists y_m(D_{y_m}).\phi, \tag{4}$$

where each $D_{y_j} \subseteq \{x_1, \ldots, x_n\}$ denotes the set of variables that variable $y_j$ can depend on, and Boolean formula $\phi$ over $V$ is quantifier-free. We denote the set $\{x_1, \ldots, x_n\}$ (resp. $\{y_1, \ldots, y_m\}$) of randomly (resp. existentially) quantified variables of $\Phi$ by $V_\Phi^{ꓤ}$ (resp. $V_\Phi^{\exists}$).

Given a DSSAT formula $\Phi$ and a set of Skolem functions $F = \{f_j : A(D_{y_j}) \to \mathbb{B} \mid j = 1, \ldots, m\}$, the satisfying

probability $\Pr[\Phi|_F]$ of $\Phi$ with respect to $F$ is defined by the following equation:

$$\Pr[\Phi|_F] = \sum_{\alpha \in A(V_\Phi^{ꓤ})} \mathbb{1}_{\phi|_F}(\alpha)\, w(\alpha), \tag{5}$$

where $\mathbb{1}_{\phi|_F}(\cdot)$ is the indicator function defined in Section 2.2 and $w(\alpha) = \prod_{i=1}^{n} p_i^{\alpha(x_i)} (1 - p_i)^{1 - \alpha(x_i)}$ is the weighting function for assignments. In other words, the satisfying probability is the summation of weights of satisfying assignments over $V_\Phi^{ꓤ}$. The weight of an assignment can be understood as its occurring probability in the space of $A(V_\Phi^{ꓤ})$.

The decision version of DSSAT is stated as follows. Given a DSSAT formula $\Phi$ and a threshold $\theta \in [0, 1]$, decide whether there exists a set of Skolem functions $F$ such that $\Pr[\Phi|_F] \geq \theta$. On the other hand, the optimization version asks to find a set of Skolem functions to maximize the satisfying probability of $\Phi$.

The formulation of SSAT can be extended by incorporating universal quantifiers, resulting in a unified framework named extended SSAT (Majercik 2009), which subsumes both QBF and SSAT. In the extended SSAT, besides the four rules discussed in Section 2.1 for calculating the satisfying probability of an SSAT formula $\Phi$, the following rule is added: $\Pr[\Phi] = \min\{\Pr[\Phi|_{\neg x}], \Pr[\Phi|_{x}]\}$, if $x$ is universally quantified. Similarly, an extended DSSAT formula $\Phi$ over a set of variables $\{x_1, \ldots, x_n, y_1, \ldots, y_m, z_1, \ldots, z_l\}$ is of the form:

$$Q_1 v_1, \ldots, Q_{n+l} v_{n+l}, \exists y_1(D_{y_1}), \ldots, \exists y_m(D_{y_m}).\phi, \tag{6}$$

where $Q_i v_i$ equals either $ꓤ^{p_k} x_k$ or $\forall z_k$ for some $k$ with $v_i \neq v_j$ for $i \neq j$, and each $D_{y_j} \subseteq \{x_1, \ldots, x_n, z_1, \ldots, z_l\}$ denotes the set of randomly and universally quantified variables which variable $y_j$ can depend on. The satisfying probability of $\Phi$ with respect to a set of Skolem functions $F = \{f_j : A(D_{y_j}) \to \mathbb{B} \mid j = 1, \ldots, m\}$, denoted by $\Pr[\Phi|_F]$, can be computed by recursively applying the aforementioned five rules to the induced formula of $\Phi$ with the existential variables $y_j$ being substituted with their respective Skolem functions $f_j$. Under the above computation scheme, both Eq. (2) and Eq. (5) are special cases, where the variables preceding the existential quantifiers in the prefixes are solely universally or randomly quantified, and hence the fifth or the fourth rule is applied to calculate $\Pr[\Phi|_F]$.

Note that in the above extension the Henkin-type quantifiers are only defined for the existential variables. Although the extended formulation increases practical expressive succinctness, the computational complexity is not changed, as will be shown in the next section.

4 DSSAT Complexity

In the following, we show that the decision version of DSSAT is NEXPTIME-complete.

Theorem 1. DSSAT is NEXPTIME-complete.

Proof. To show that DSSAT is NEXPTIME-complete, we have to show that it belongs to the NEXPTIME complexity class and that it is NEXPTIME-hard.

First, to see why DSSAT belongs to the NEXPTIME complexity class, observe that a Skolem function for an existentially quantified variable can be guessed and constructed in nondeterministic exponential time with respect to the number of randomly quantified variables. Given the guessed Skolem functions, the evaluation of the matrix, summation of weights of satisfying assignments, and comparison against the threshold $\theta$ can also be performed in exponential time. Overall, the whole procedure is done in nondeterministic exponential time with respect to the input size, and hence DSSAT belongs to the NEXPTIME complexity class.

Second, to see why DSSAT is NEXPTIME-hard, we reduce the NEXPTIME-complete problem DQBF to DSSAT as follows. Given an arbitrary DQBF:

$$\Phi_Q = \forall x_1, \ldots, \forall x_n, \exists y_1(D_{y_1}), \ldots, \exists y_m(D_{y_m}).\phi,$$

we construct a DSSAT formula:

$$\Phi_S = ꓤ^{0.5} x_1, \ldots, ꓤ^{0.5} x_n, \exists y_1(D_{y_1}), \ldots, \exists y_m(D_{y_m}).\phi$$

by changing every universal quantifier to a randomized quantifier with probability 0.5. The reduction can be done in polynomial time with respect to the size of $\Phi_Q$. We will show that $\Phi_Q$ is satisfiable if and only if there exists a set of Skolem functions $F$ such that $\Pr[\Phi_S|_F] \geq 1$.

The "only if" direction: As $\Phi_Q$ is satisfiable, there exists a set of Skolem functions $F$ such that after substituting the existentially quantified variables with the corresponding Skolem functions, matrix $\phi$ becomes a tautology over variables $\{x_1, \ldots, x_n\}$. Therefore, $\Pr[\Phi_S|_F] = 1 \geq 1$.

The "if" direction: As there exists a set of Skolem functions $F$ such that $\Pr[\Phi_S|_F] \geq 1$, after substituting the existentially quantified variables with the corresponding Skolem functions, each assignment $\alpha \in A(\{x_1, \ldots, x_n\})$ must satisfy $\phi$, i.e., $\phi$ becomes a tautology over variables $\{x_1, \ldots, x_n\}$. Otherwise, the satisfying probability $\Pr[\Phi_S|_F]$ must be less than 1 as the weight $2^{-n}$ of some unsatisfying assignment is missing from the summation. Therefore, $\Phi_Q$ is satisfiable.

When DSSAT is extended with universal quantifiers, its complexity remains in the NEXPTIME complexity class as the fifth rule of the satisfying probability calculation does not incur any complexity overhead. Therefore the following corollary is immediate.

Corollary 1. The decision problem of DSSAT extended with universal quantifiers of Eq. (6) is NEXPTIME-complete.

5 Application: Analyzing Probabilistic and Approximate Partial Design

After formulating DSSAT and proving its NEXPTIME-completeness, we show its application to the analysis of probabilistic design and approximate design. Specifically, we consider the probabilistic version of the topologically constrained logic synthesis problem (Sinha, Mishchenko, and Brayton 2002; Balabanov, Chiang, and Jiang 2014), or equivalently the partial design problem (Gitina et al. 2013).

In the (deterministic) partial design problem, we are given a specification function $G(X)$ over primary input variables

$X$ and a partial design $C_F$ with black boxes to be synthesized. The Boolean functions induced at the primary outputs of $C_F$ can be described by $F(X, T)$, where $T$ corresponds to the variables of the black box outputs. Each black box output $t_i$ is specified with its input variables (i.e., dependency set) $D_i \subseteq X \cup Y$ in $C_F$, where $Y$ represents the variables for intermediate gates in $C_F$ referred to by the black boxes. The partial design problem aims at deriving the black box functions $\{h_1(D_1), \ldots, h_{|T|}(D_{|T|})\}$ such that substituting $t_i$ with $h_i$ in $C_F$ makes the resultant circuit function equal $G(X)$. The above partial design problem can be encoded as a DQBF problem; moreover, the partial equivalence checking problem is shown to be NEXPTIME-complete (Gitina et al. 2013).

Specifically, the DQBF that encodes the partial equivalence checking problem is of the form:

$$\forall X, \forall Y, \exists T(D).\ (Y \equiv E(X)) \to (F(X, T) \equiv G(X)), \tag{7}$$

where $D$ consists of $(D_1, \ldots, D_{|T|})$, $E$ corresponds to the defining functions of $Y$ in $C_F$, and the operator "$\equiv$" denotes elementwise equivalence between its two operands.

The above partial design problem can be extended to its probabilistic variant, which is illustrated by the circuit shown in Figure 2. The probabilistic partial design problem is the same as the deterministic partial design problem except that $C_F$ is a distilled probabilistic design (Lee and Jiang 2018) with black boxes, whose functions at the primary outputs can be described by $F(X, Z, T)$, where $Z$ represents the variables for the auxiliary inputs that trigger errors in $C_F$ (including the errors of the black boxes) and $T$ corresponds to the variables of the black box outputs. Each black box output $t_i$ is specified with its input variables (i.e., dependency set) $D_i \subseteq X \cup Y$ in $C_F$. When $t_i$ is substituted with $h_i$ in $C_F$, the function of the resultant circuit is required to be sufficiently close to the specification with respect to some expected probability.

Figure 2: Circuit for the equivalence checking of probabilistic partial design.

The hardness of the problem is stated in Theorem 2, whose proof can be found in the full version of this work at https://arxiv.org/abs/1911.04112.

Theorem 2. The probabilistic partial design equivalence checking problem is NEXPTIME-complete.

Moreover, the probabilistic partial design problem can be encoded with the following DSSAT formula

$$ꓤX, ꓤZ, \forall Y, \exists T(D).\ (Y \equiv E(X)) \to (F(X, Z, T) \equiv G(X)), \tag{8}$$

where the primary input variables are randomly quantified with probability $p_{x_i}$ of $x_i \in X$ to reflect their weights, and the error-triggering auxiliary input variables $Z$ are randomly quantified according to the pre-specified error rates of the erroneous gates in $C_F$. Notice that the above DSSAT formula takes advantage of the extension with universal quantifiers as discussed previously.

In approximate design, a circuit implementation may deviate from its specification by a certain extent. The amount of deviation can be characterized in a way similar to the error probability calculation in probabilistic design. For approximate partial design, the equivalence checking problem can be expressed by the DSSAT formula:

$$ꓤX, \forall Y, \exists T(D).\ (Y \equiv E(X)) \to (F(X, T) \equiv G(X)), \tag{9}$$

which differs from Eq. (8) only in requiring no auxiliary inputs. The probabilities of the randomly quantified primary input variables are determined by the approximation criteria in measuring the deviation. For example, when all the input assignments are of equal weight, the probabilities of the primary inputs are all set to 0.5.

6 Application: Modeling Dec-POMDP

In this section we demonstrate the descriptive power of DSSAT to model NEXPTIME-complete problems by constructing a polynomial-time reduction from Dec-POMDP to DSSAT. Our reduction is an extension of that from POMDP to SSAT proposed in the previous work (Salmon and Poupart 2019).

In essence, given a Dec-POMDP $M$, we will construct in polynomial time a DSSAT formula $\Phi$ such that there is a joint policy $\vec{\pi}$ for $M$ with value $V(\vec{\pi})$ if and only if there is a set of Skolem functions $F$ for $\Phi$ with satisfying probability $\Pr[\Phi|_F]$, and $V(\vec{\pi}) = \Pr[\Phi|_F]$.

First we introduce the variables used in construction of the DSSAT formula and their domains. To improve readability, we allow a variable $x$ to take values from a finite set $U = \{x_1, \ldots, x_K\}$ (Salmon and Poupart 2019). Under this setting, a randomized quantifier $ꓤ$ over variable $x$ specifies a distribution $\Pr[x \equiv x_i]$ for each $x_i \in U$. We also define a scaled reward function:

$$r(s, \vec{a}) = \frac{\rho(s, \vec{a}) - \min_{s', \vec{a}'} \rho(s', \vec{a}')}{\sum_{s'', \vec{a}''} [\rho(s'', \vec{a}'') - \min_{s', \vec{a}'} \rho(s', \vec{a}')]}$$

such that $r(s, \vec{a})$ forms a distribution over all pairs of $s$ and $\vec{a}$, i.e., $\forall s, \vec{a}.\ r(s, \vec{a}) \geq 0$ and $\sum_{s, \vec{a}} r(s, \vec{a}) = 1$. We will use the following variables:
• $x_s^t \in S$: the state at stage $t$,
• $x_a^{i,t} \in A_i$: the action taken by Agent $i$ at stage $t$,
• $x_o^{i,t} \in O_i$: the observation received by Agent $i$ at stage $t$,

• $x_r^t \in S \times (A_1 \times \ldots \times A_n)$: the reward earned at stage $t$,
• $x_T^t \in S$: transition distribution at stage $t$,
• $x_\Omega^t \in O_1 \times \ldots \times O_n$: observation distribution at stage $t$,
• $x_p^t \in \mathbb{B}$: used to sum up rewards across stages.

We represent elements in the sets $S$, $A_i$, and $O_i$ by integers, i.e., $S = \{0, 1, \ldots, |S| - 1\}$, etc., and use indices $s$, $a_i$, and $o_i$ to iterate through them, respectively. On the other hand, a special treatment is required for variables $x_r^t$ and $x_\Omega^t$, as they range over Cartesian products of several sets. We will give a unique number to an element in a product set as follows. Consider $\vec{Q} = Q_1 \times \ldots \times Q_n$, where each $Q_i$ is a finite set. An element $\vec{q} = (q_1, \ldots, q_n) \in \vec{Q}$ is numbered by $N(q_1, \ldots, q_n) = \sum_{i=1}^{n} q_i \left(\prod_{j=1}^{i-1} |Q_j|\right)$. In the following construction, variables $x_r^t$ and $x_\Omega^t$ will take values from the numbers given to the elements in $S \times \vec{A}$ and $\vec{O}$ by $N_r(s, \vec{a})$ and $N_\Omega(\vec{o})$, respectively.

We begin by constructing a DSSAT formula for a Dec-POMDP with $h = 1$. Under this setting, the derivation of the optimal joint policy is simplified to finding an action for each agent such that the expectation value of the reward function is maximized, i.e.,

$$\vec{a}^* = \arg\max_{\vec{a} \in \vec{A}} \sum_{s \in S} \Delta_0(s)\, r(s, \vec{a}).$$

The DSSAT formula below encodes the above equation:

$$ꓤ x_s^0, ꓤ x_r^0, \exists x_a^{1,0}(D_{x_a^{1,0}}), \ldots, \exists x_a^{n,0}(D_{x_a^{n,0}}).\phi,$$

where the distribution of $x_s^0$ follows $\Pr[x_s^0 \equiv s] = \Delta_0(s)$, the distribution of $x_r^0$ follows $\Pr[x_r^0 \equiv N_r(s, \vec{a})] = r(s, \vec{a})$, each $D_{x_a^{i,0}} = \emptyset$, and the matrix:

$$\phi = \bigwedge_{s \in S} \bigwedge_{\vec{a} \in \vec{A}} \Big[x_s^0 \equiv s \land \bigwedge_{i \in I} x_a^{i,0} \equiv a_i \to x_r^0 \equiv N_r(s, \vec{a})\Big].$$

As the existentially quantified variables have no dependency on randomly quantified variables, the DSSAT formula is effectively an exist-random quantified SSAT formula.

For an arbitrary Dec-POMDP with $h > 1$, we follow the two steps proposed in the previous work (Salmon and Poupart 2019), namely policy selection and policy evaluation, and adapt the policy selection step for the multi-agent setting in Dec-POMDP.

In the previous work (Salmon and Poupart 2019), an agent's policy selection is encoded by the following prefix (use Agent $i$ as an example):

$$\exists x_a^{i,0}, ꓤ x_p^0, ꓤ x_o^{i,0}, \ldots, ꓤ x_p^{h-2}, ꓤ x_o^{i,h-2}, \exists x_a^{i,h-1}, ꓤ x_p^{h-1}.$$

In the above quantification, variable $x_p^t$ is introduced to sum up rewards earned at different stages. It takes values from $\mathbb{B}$, and follows a uniform distribution, i.e., $\Pr[x_p^t \equiv \top] = \Pr[x_p^t \equiv \bot] = 0.5$. When $x_p^t \equiv \bot$, the process is stopped and the reward at stage $t$ is earned; when $x_p^t \equiv \top$, the process is continued to stage $t + 1$. Note that variables $\{x_p^t\}$ are shared across all agents. With the help of variable $x_p^t$, rewards earned at different stages are summed up with an equal weight $2^{-h}$. Variable $x_o^{i,t}$ also follows a uniform distribution $\Pr[x_o^{i,t} \equiv o_i] = |O_i|^{-1}$, which scales the satisfying probability by $|O_i|^{-1}$ at each stage. Therefore, we need to re-scale the satisfying probability accordingly in order to obtain the correct satisfying probability corresponding to the value of a joint policy. The scaling factor, denoted $\kappa_h$, equals $2^h (|\vec{O}||S|)^{h-1}$ (derived in the proof of Theorem 3).

As the actions of the agents can only depend on their own observation history, for the selection of a joint policy it is not obvious how to combine the quantification, i.e., the selection of a policy, of each agent into a linearly ordered prefix required by SSAT, without suffering an exponential translation cost. On the other hand, DSSAT allows the dependency of an existentially quantified variable to be specified freely and is suitable to encode the selection of a joint policy. In the prefix of the DSSAT formula, variable $x_a^{i,t}$ depends on $D_{x_a^{i,t}} = \{x_o^{i,0}, \ldots, x_o^{i,t-1}, x_p^0, \ldots, x_p^{t-1}\}$.

Next, the policy evaluation step is exactly the same as that in the previous work (Salmon and Poupart 2019). The following quantification computes the value of a joint policy:

$$ꓤ x_s^t, ꓤ x_r^t, \quad t = 0, \ldots, h - 1$$
$$ꓤ x_T^t, ꓤ x_\Omega^t, \quad t = 0, \ldots, h - 2$$

Variables $x_s^t$ follow a uniform distribution $\Pr[x_s^t \equiv s] = |S|^{-1}$ except for variable $x_s^0$, which follows the initial distribution specified by $\Pr[x_s^0 \equiv s] = \Delta_0(s)$; variables $x_r^t$ follow the distribution of the reward function $\Pr[x_r^t \equiv N_r(s, \vec{a})] = r(s, \vec{a})$; variables $x_T^t$ follow the state transition distribution $\Pr[x_{T_{s,\vec{a}}}^t \equiv s'] = T(s, \vec{a}, s')$; variables $x_\Omega^t$ follow the observation distribution $\Pr[x_{\Omega_{s',\vec{a}}}^t \equiv N_\Omega(\vec{o})] = \Omega(s', \vec{a}, \vec{o})$. Note that these variables encode the random mechanism of a Dec-POMDP and are hidden from agents. That is, variables $x_a^{i,t}$ do not depend on the above variables.

$$\bigwedge_{0 \leq t \leq h-2} \Big[x_p^t \equiv \bot \to \bigwedge_{i \in I} x_o^{i,t} \equiv 0 \land x_s^{t+1} \equiv 0 \land x_p^{t+1} \equiv \bot\Big] \tag{10}$$

$$x_p^{h-1} \equiv \bot \tag{11}$$

$$\bigwedge_{s \in S} \bigwedge_{\vec{a} \in \vec{A}} \Big[x_p^0 \equiv \bot \land x_s^0 \equiv s \land \bigwedge_{i \in I} x_a^{i,0} \equiv a_i \to x_r^0 \equiv N_r(s, \vec{a})\Big] \tag{12}$$

$$\bigwedge_{1 \leq t \leq h-1} \bigwedge_{s \in S} \bigwedge_{\vec{a} \in \vec{A}} \Big[x_p^{t-1} \equiv \top \land x_p^t \equiv \bot \land x_s^t \equiv s \land \bigwedge_{i \in I} x_a^{i,t} \equiv a_i \to x_r^t \equiv N_r(s, \vec{a})\Big] \tag{13}$$

$$\bigwedge_{0 \leq t \leq h-2} \bigwedge_{s \in S} \bigwedge_{\vec{a} \in \vec{A}} \bigwedge_{s' \in S} \Big[x_p^t \equiv \top \land x_s^t \equiv s \land \bigwedge_{i \in I} x_a^{i,t} \equiv a_i \land x_s^{t+1} \equiv s' \to x_{T_{s,\vec{a}}}^t \equiv s'\Big] \tag{14}$$

$$\bigwedge_{0 \leq t \leq h-2} \bigwedge_{s' \in S} \bigwedge_{\vec{a} \in \vec{A}} \bigwedge_{\vec{o} \in \vec{O}} \Big[x_p^t \equiv \top \land x_s^{t+1} \equiv s' \land \bigwedge_{i \in I} x_a^{i,t} \equiv a_i \land \bigwedge_{i \in I} x_o^{i,t} \equiv o_i \to x_{\Omega_{s',\vec{a}}}^t \equiv N_\Omega(\vec{o})\Big] \tag{15}$$

Figure 3: The formulas used to encode a Dec-POMDP $M$.

The formulas to encode $M$ are listed in Figure 3. Formula (10) encodes that when $x_p^t \equiv \bot$, i.e., the process is stopped, the observation $x_o^{i,t}$ and next state $x_s^{t+1}$ are set to a preserved value 0, and $x_p^{t+1} \equiv \bot$. Formula (11) ensures the process is stopped at the last stage. Formula (12) ensures the reward at the first stage is earned when the process is stopped, i.e., $x_p^0 \equiv \bot$. Formula (13) requires the reward at stage $t > 0$ is earned when $x_p^{t-1} \equiv \top$ and $x_p^t \equiv \bot$.

Agent 1 and Agent 2, let the actions taken at $t = 0$ be $\vec{a}^0 = (a_1^0, a_2^0)$ and the actions taken at $t = 1$ under certain observations $\vec{o}^0 = (o_1^0, o_2^0)$ be $\vec{a}^1 = (a_1^1, a_2^1)$. The value of this joint policy is computed by Eq. (3) as

$$V(\pi) = \sum_{s^0 \in S} \Delta_0(s^0) \Big[r(s^0, \vec{a}^0) + \sum_{\vec{o}^0 \in \vec{O}} \sum_{s^1 \in S} T(s^0, \vec{a}^0, s^1)\, \Omega(s^1, \vec{a}^0, \vec{o}^0)\, r(s^1, \vec{a}^1)\Big].$$

The decision tree of the converted DSSAT formula is shown in Figure 4. Note that the randomized quantifiers over variables $x_p^t$, $x_s^t$, and $x_o^t$ will scale the satisfying probability by the corresponding factors labelled on the edges. Therefore, we have to re-scale the satisfying probability by $2^2 |S||O_1 \times O_2|$, according to the scaling factor $\kappa_h = 2^h (|\vec{O}||S|)^{h-1}$. (A more detailed explanation for this example can be found in the full paper.)

Figure 4: The decision tree of a Dec-POMDP example with two agents and $h = 2$.

7 Conclusions and Future Work

In this paper, we extended DQBF to its stochastic vari-
For- mula (14) encodes the transition distribution from state s to ant DSSAT and proved its NEXPTIME-completeness. Com- state s0 given actions ~a are taken. Formula (15) encodes the pared to the PSPACE-complete SSAT, DSSAT is more pow- observation distribution to receive observation ~o under the erful to succinctly model NEXPTIME-complete decision situation that state s0 is reached after actions ~a are taken. problems with uncertainty. The new formalism can be use- ful in applications such as artificial intelligence and sys- Theorem 3. The above reduction maps a Dec-POMDP M tem design. Specifically, we demonstrated the DSSAT for- to a DSSAT formula Φ, such that a joint policy ~π exists for mulation of the analysis to probabilistic/approximate par- M if and only if a set of Skolem functions F exists for Φ, tial design, and gave a polynomial-time reduction from with V (~π) = Pr[Φ|F ]. the NEXPTIME-complete Dec-POMDP to DSSAT. We Due to space limit, the proof is available in the full-paper envisage the potential broad applications of DSSAT and version at https://arxiv.org/abs/1911.04112. Note that the plan solver development for future work. We note that re- proposed reduction is a polynomial-time reduction, as the cent developments of clausal abstraction for QBF (Jan- numbers of variables and clauses in the resulting DSSAT for- ota and Marques-Silva 2015; Rabe and Tentrup 2015) and mula are polynomials of the input size of the Dec-POMDP. DQBF (Tentrup and Rabe 2019) might provide a promising Below we demonstrate the reduction with an example. framework for DSSAT solving. Clausal abstraction has been Example 1. Consider a Dec-POMDP with two agents and lifted to SSAT (Chen, Huang, and Jiang 2021), and we are planning horizon h = 2. Given a joint policy (π1, π2) for investigating its feasibility for DSSAT.
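As a small illustration of the bookkeeping used in the reduction, the following Python sketch implements the product-set numbering $N$, the scaling factor $\kappa_h$, and a brute-force version of the $h = 1$ maximization. The function names, the dictionary-based reward table, and the 0-based indexing of set elements are assumptions made for illustration only; they are not part of the construction in the paper or of any existing solver.

```python
from itertools import product


def number_element(q, sizes):
    """Number N(q_1..q_n) = sum_i q_i * prod_{j<i} |Q_j|.

    Assumption: each q_i is already a 0-based index into Q_i,
    and sizes[i] is |Q_i|.
    """
    if len(q) != len(sizes):
        raise ValueError("q and sizes must have the same length")
    total, weight = 0, 1
    for q_i, size in zip(q, sizes):
        if not 0 <= q_i < size:
            raise ValueError("component out of range")
        total += q_i * weight   # q_i times the product of earlier set sizes
        weight *= size
    return total


def scaling_factor(h, num_joint_obs, num_states):
    """kappa_h = 2^h * (|O| * |S|)^(h-1), as stated in the text."""
    return 2 ** h * (num_joint_obs * num_states) ** (h - 1)


def best_joint_action(delta0, reward, action_sizes):
    """Brute-force the h = 1 case: argmax_a sum_s delta0[s] * r(s, a).

    Assumption: reward maps (state index, joint-action tuple) to a
    value in [0, 1], and delta0[s] is the initial probability of s.
    """
    def value(a):
        return sum(p * reward[(s, a)] for s, p in enumerate(delta0))

    joint_actions = product(*(range(k) for k in action_sizes))
    return max(joint_actions, key=value)
```

For the two-agent example with $h = 2$, calling `scaling_factor(2, n_obs, n_states)` with `n_obs` $= |O_1 \times O_2|$ and `n_states` $= |S|$ evaluates to $2^2 |S| |O_1 \times O_2|$, which agrees with the factor quoted in Example 1.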

Acknowledgments

The authors are grateful to Christoph Scholl, Ralf Wimmer, and Bernd Becker for valuable discussions motivating this work. This work was supported in part by the Ministry of Science and Technology of Taiwan under Grant No. 108-2221-E-002-144-MY3, 108-2218-E-002-073, and 109-2224-E-002-008. JHJ was supported in part by the Alexander von Humboldt Foundation during this work.

References

Balabanov, V.; Chiang, H.-J. K.; and Jiang, J.-H. R. 2014. Henkin quantifiers and Boolean formulae: A certification perspective of DQBF. Theoretical Computer Science 523: 86–100.

Barrett, C.; and Tinelli, C. 2018. Satisfiability modulo theories. Handbook of Model Checking 305–343.

Bérard, B.; Bidoit, M.; Finkel, A.; Laroussinie, F.; Petit, A.; Petrucci, L.; and Schnoebelen, P. 2013. Systems and Software Verification: Model-checking Techniques and Tools. Springer Science & Business Media.

Bernstein, D. S.; Givan, R.; Immerman, N.; and Zilberstein, S. 2002. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research 27(4): 819–840.

Biere, A.; Heule, M.; and van Maaren, H. 2009. Handbook of Satisfiability. IOS press.

Büning, H. K.; and Bubeck, U. 2009. Theory of quantified Boolean formulas. Handbook of Satisfiability 185: 735–760.

Chakrapani, L. N. B.; George, J.; Marr, B.; Akgul, B. E. S.; and Palem, K. V. 2008. Probabilistic design: A survey of probabilistic CMOS technology and future directions for terascale IC design. In VLSI-SoC: Research Trends in VLSI and Systems on Chip, 101–118.

Chen, P.-W.; Huang, Y.-C.; and Jiang, J.-H. R. 2021. A sharp leap from quantified Boolean formula to stochastic Boolean satisfiability solving. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI).

De Moura, L.; and Bjørner, N. 2011. Satisfiability modulo theories: Introduction and applications. Communications of the ACM 54(9): 69–77.

Fremont, D. J.; Rabe, M. N.; and Seshia, S. A. 2017. Maximum model counting. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), 3885–3892.

Gitina, K.; Reimer, S.; Sauer, M.; Wimmer, R.; Scholl, C.; and Becker, B. 2013. Equivalence checking of partial designs using dependency quantified Boolean formulae. In Proceedings of the IEEE 31st International Conference on Computer Design (ICCD), 396–403.

Janota, M.; and Marques-Silva, J. 2015. Solving QBF by clause selection. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 325–331.

Jhala, R.; and Majumdar, R. 2009. Software model checking. ACM Computing Surveys 41(4): 21:1–21:54.

Lee, N.-Z.; and Jiang, J.-H. R. 2018. Towards formal evaluation and verification of probabilistic design. IEEE Transactions on Computers 67(8): 1202–1216.

Lee, N.-Z.; Wang, Y.-S.; and Jiang, J.-H. R. 2017. Solving stochastic Boolean satisfiability under random-exist quantification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 688–694.

Lee, N.-Z.; Wang, Y.-S.; and Jiang, J.-H. R. 2018. Solving exist-random quantified stochastic Boolean satisfiability via clause selection. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), 1339–1345.

Littman, M. L.; Majercik, S. M.; and Pitassi, T. 2001. Stochastic Boolean satisfiability. Journal of Automated Reasoning 27(3): 251–296.

Majercik, S. M. 2009. Stochastic Boolean satisfiability. Handbook of Satisfiability 185: 887–925.

Majercik, S. M.; and Boots, B. 2005. DC-SSAT: A divide-and-conquer approach to solving stochastic satisfiability problems efficiently. In Proceedings of the 19th AAAI Conference on Artificial Intelligence (AAAI), 416–422.

Majercik, S. M.; and Littman, M. L. 1998. MAXPLAN: A new approach to probabilistic planning. In Proceedings of the 4th International Conference on Artificial Intelligence Planning (AIPS), 86–93.

Majercik, S. M.; and Littman, M. L. 2003. Contingent planning under uncertainty via stochastic satisfiability. Artificial Intelligence 147(1-2): 119–162.

Marques-Silva, J. P.; and Sakallah, K. A. 2000. Boolean satisfiability in electronic design automation. In Proceedings of the 37th Annual Design Automation Conference (DAC), 675–680.

Narizzano, M.; Pulina, L.; and Tacchella, A. 2006. The QBFEVAL web portal. In Proceedings of the 10th European Conference on Logics in Artificial Intelligence (JELIA), 494–497.

Nilsson, N. J. 2014. Principles of Artificial Intelligence. Morgan Kaufmann.

Oliehoek, F. A.; Amato, C.; et al. 2016. A Concise Introduction to Decentralized POMDPs. Springer.

Oliehoek, F. A.; Spaan, M. T. J.; and Vlassis, N. 2008. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research 32: 289–353.

Papadimitriou, C. H. 1985. Games against nature. Journal of Computer and System Sciences 31(2): 288–301.

Peterson, G.; Reif, J.; and Azhar, S. 2001. Lower bounds for multiplayer noncooperative games of incomplete information. Computers & Mathematics with Applications 41(7-8): 957–992.

Peterson, G. L.; and Reif, J. H. 1979. Multiple-person alternation. In Proceedings of the 20th IEEE Symposium on Foundations of Computer Science (FOCS), 348–363.

Rabe, M. N.; and Tentrup, L. 2015. CAQE: A certifying QBF solver. In Proceedings of the 15th Conference on Formal Methods in Computer-Aided Design (FMCAD), 136–143.

Russell, S. J.; and Norvig, P. 2016. Artificial Intelligence: A Modern Approach. Pearson Education Limited.

Salmon, R.; and Poupart, P. 2019. On the relationship between stochastic satisfiability and partially observable Markov decision processes. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), 407:1–407:11.

Scholl, C.; and Wimmer, R. 2018. Dependency quantified Boolean formulas: An overview of solution methods and applications. In Proceedings of the 21st International Conference on Theory and Applications of Satisfiability Testing (SAT), 3–16.

Sinha, S.; Mishchenko, A.; and Brayton, R. K. 2002. Topologically constrained logic synthesis. In Proceedings of the 21st IEEE/ACM International Conference on Computer Aided Design (ICCAD), 679–686.

Stockmeyer, L. J.; and Meyer, A. R. 1973. Word problems requiring exponential time. In Proceedings of the 5th Annual ACM Symposium on Theory of Computing (STOC), 1–9.

Tentrup, L.; and Rabe, M. N. 2019. Clausal abstraction for DQBF. In Proceedings of the 22nd International Conference on Theory and Applications of Satisfiability Testing (SAT), 388–405.

Venkatesan, R.; Agarwal, A.; Roy, K.; and Raghunathan, A. 2011. MACACO: Modeling and analysis of circuits for approximate computing. In Proceedings of the 30th IEEE/ACM International Conference on Computer Aided Design (ICCAD), 667–673.

Wang, L.-T.; Chang, Y.-W.; and Cheng, K.-T. T. 2009. Electronic Design Automation: Synthesis, Verification, and Test. Morgan Kaufmann.
