<<

Journal of Articial Intelligence Research Submitted published

The Computational Complexity of Probabilistic Planning

Michael Littman mlittmancsdukeedu

Department of Computer Science Duke University

Durham NC USA

Judy Goldsmith goldsmitcsengrukyedu

Department of Computer Science University of Kentucky

Lexington KY USA

Martin Mundhenk mundhenktiunitrierde

FB Theoretische Informatik Universitat Trier

D Trier GERMANY

Abstract

We examine the computational complexity of testing and nding small plans in proba

bilistic planning domains with b oth at and prop ositional representations The complexity

of plan evaluation and existence varies with the plan typ e sought we examine totally

ordered plans acyclic plans and lo oping plans and partially ordered plans under three

natural denitions of plan value We show that problems of interest are complete for a

PP PP

variety of complexity classes PL P NP coNP PP NP coNP and PSPACE In

PP

the pro cess of proving that certain planning problems are complete for NP we intro duce

PP

a new basic NP complete problem EMajsat which generalizes the standard Bo olean

satisability problem to computations involving probabilistic quantities our results suggest

that the development of go o d heuristics for EMajsat could b e imp ortant for the creation

of ecient algorithms for a wide variety of problems

Intro duction

Recent work in articialintelligence planning has addressed the problem of nding eec

tive plans in domains in which op erators have probabilistic eects Drummond Bresina

Mansell Drap er Hanks Weld Ko enig Simmons Goldman

Bo ddy Kushmerick Hanks Weld Boutilier Dearden Goldszmidt

Dearden Boutilier Kaelbling Littman Cassandra Boutilier Dean

Hanks Here an eective or successful plan is one that reaches a goal state

with sucient probability In probabilistic propositional planning op erators are sp ecied

in a Bayes network or an extended STRIPSlike notation and the planner seeks a recip e

for cho osing op erators to achieve a goal conguration with some usersp ecied probability

This problem is closely related to that of solving a Markov decision pro cess Puterman

when it is expressed in a compact representation

In previous work Goldsmith Lusena Mundhenk Littman a we exam

ined the complexity of determining whether an eective plan exists for completely observable

domains the problem is EXPcomplete in its general form and PSPACEcomplete when lim

ited to p olynomialdepth plans A p olynomialdepth or p olynomialhorizon plan is one

that takes at most a p olynomial numb er of actions b efore terminating For these results

c

AI Access Foundation and Morgan Kaufmann Publishers All rights reserved

Littman Goldsmith Mundhenk

plans are p ermitted to b e arbitrarily large ob jectsthere is no restriction that a valid plan

need have any sort of compact p olynomialsize representation

Because they place no restrictions on the size of valid plans these earlier results are not

directly applicable to the problem of nding valid plans It is p ossible for example that for

a given planning domain the only valid plans require exp onential space and exp onential

time to write down Knowing whether or not such plans exist is simply not very imp ortant

b ecause they are intractable to express

In the present pap er we consider the complexity of a more practical and realistic

problemthat of determining whether or not a plan exists in a given restricted form and of a

given restricted size The plans we consider take several p ossible forms that have b een used

in previous planning work totally ordered plans partially ordered plans totally ordered

conditional plans and totally order lo oping plans In cases we limit our attention to

plans that can b e expressed in size b ounded by a p olynomial in the size of the sp ecication

of the problem This way once we determine that a plan exists we can use this information

to try to write it down in a reasonable amount of time and space

In the deterministic planning literature several authors have addressed the computa

tional complexity of determining whether a valid plan exists of determining whether a plan

exists of a given cost and of nding the valid plans themselves under a variety of assump

tions Chapman Bylander Erol Nau Subrahmanian Backstrom

Backstrom Neb el These results provide lower b ounds hardness results for

analogous probabilistic planning problems since deterministic planning is a sp ecial case In

deterministic planning optimal plans can b e represented by a simple sequence of op erators

a totally ordered plan In probabilistic planning a go o d conditional plan will often p er

form b etter than any totally ordered unconditional plan therefore we need to consider

the complexity of the planning pro cess for a richer set of plan structures

For ease of discussion we only explicitly describ e the case of planning in completely

observable domains This means that the state of the world is known at all times during

plan execution in spite of the uncertainty of state transitions We know that the state of the

system is sucient information for cho osing actions optimally Puterman however

representing such a universal plan is often impractical in prop ositional domains in which

the size of the state space is exp onential in the size of the domain representation For this

reason we consider other typ es of plan structures based on simple nitestate machines

Because the typ e of plans we consider do not necessarily use the full state of the system to

make every decision our results carry over to partially observable domains although we do

not explore this fact in detail in the present work

The computational problems we lo ok at are complete for a variety of complexity classes

ranging from PL probabilistic logspace to PSPACE Two results are deserving of sp ecial

mention b ecause they concern problems closely related to ones b eing actively addressed

by articialintelligence researchers rst the problem of evaluating a totally ordered plan

in a compactly represented planning domain is PPcomplete A compactly represented

P PP

The class PP is closely related to the somewhat more familiar P To da showed that P P

Roughly sp eaking this means that P and PP are equally p owerful when used as oracles The counting

class P has already b een recognized by the articialintelligence community as an imp ortant complexity

class in computations involving probabilistic quantities such as b eliefnetwork inference Roth

Complexity of Probabilistic Planning

planning domain is one that is describ ed by a twostage temp oral Bayes network Boutilier

et al or similar notation

Second the problem of determining whether a valid totally ordered plan exists for a

PP

compactly represented planning domain is NP complete Whereas the class NP can b e

thought of as the set of problems solvable by guessing the answer and checking it in p olyno

PP

mial time the class NP can b e thought of as the set of problems solvable by guessing the

answer and checking it using a probabilistic p olynomialtime PP computation It is likely

PP

that NP characterizes many problems of interest in the area of uncertainty in articial

intelligence this pap er and earlier work Goldsmith et al Mundhenk Goldsmith

Allender a Mundhenk Goldsmith Lusena Allender b give initial evidence

of this

PlanningDomain Representations

A probabilistic planning domain M hS s A t G i is characterized by a nite set of states

S an initial state s S a nite set of op erators or actions A and a set of goal states

G S The application of an action a in a state s results in a probabilistic transition

to a new state s according to the probability transition function t where ts a s is the

probability that state s is reached from state s when action a is taken The ob jective is to

cho ose actions one after another to move from the initial state s to one of the goal states

with probability ab ove some threshold The state of the system is known at all times

fully observable and so can b e used to cho ose the action to apply

We are concerned with two main representations for planning domains at represen

tations which enumerate states explicitly and propositional representations sometimes

called compact structured or factored representations which view states as assignments

to a set of Bo olean state variables or prop ositions Prop ositional representations can rep

resent many domains exp onentially more compactly than can at representations

In the at representation the transition function t is represented by a collection of

jS j jS j matrices one for each action In the prop ositional representation this typ e of

jS j jS j matrix would b e huge so the transition function must b e expressed another way

In the probabilistic planning literature two p opular representations for prop ositional plan

ning domains are probabilistic statespace op erators PSOs Kushmerick et al and

twostage temp oral Bayes networks TBNs Boutilier et al Although these repre

sentations dier in the typ e of planning domains they can express naturally Boutilier et al

they are computationally equivalent a planning domain expressed in one represen

tation can b e converted in p olynomial time to an equivalent planning domain expressed in

the other with at most a p olynomial increase in representation size Littman a

In this work we fo cus on a prop ositional representation called the sequentialeects

tree representation ST Littman a which is a syntactic variant of TBNs with

conditional probability tables represented as trees Boutilier et al This rep

resentation is equivalent to TBNs and PSOs and simplies the presentation of our results

It is also p ossible to formulate the ob jective as one of maximizing exp ected total discounted

ward Boutilier et al but the two formulations are essentially p olynomially equivalent Con

don The only diculty is that compactly represented domains may require discount factors

exp onentially close to one for this equivalence to hold This is discussed further in Section

We assume that the numb er of bits used to represent the individual probability values isnt to o large

Littman Goldsmith Mundhenk

In ST the eect of each action on each prop osition is represented as a separate decision

tree For a given action a the set of decision trees for the dierent prop ositions is ordered

so the decision tree for one prop osition can refer to b oth the new and old values of previ

ous prop ositions this allows ST to represent any probability distribution The leaves of a

decision tree describ e how the asso ciated prop osition changes as a function of the state and

action p erhaps probabilistically Section gives a simple example of this representation

As in other prop ositional representations the states in the set of goal states G are not

explicitly enumerated in ST Instead we dene a goal set which is a set of prop ositions

such that any state in which all the goalset prop ositions are true is considered a goal state

The set of actions A is explicitly enumerated in ST just as it is in the at representation

The ST representation of a planning domain M hS s A t G i can b e dened more

formally as M hP I A T G i we use blackb oardb old font to stand for an ST represen

tation on a domain Here P is a nite set of distinct prop ositions The set of states S is

the p ower set of P the prop ositions in s S are said to b e true in s The set I P is

the initial state The set G is the goal set so the set of goal states G is the set of states s

such that G s

The transition function t is represented by a function T which maps each action in

A to an ordered sequence of jPj binary decision trees Each of these decision trees has a

distinct lab el prop osition decision prop ositions at the no des optionally lab eled with the

sux new and probabilities at the leaves The ith decision tree Ta for action a

i

denes the transition probabilities ts a s as follows For the ith decision tree let p b e

i

its lab el prop osition Dene to b e the value of the leaf no de found by traversing decision

i

tree Ta taking the left branch if the decision prop osition is in s or s if the decision

i

prop osition has the new sux and the right branch otherwise Finally we let

Y

if p s

i i

ts a s

otherwise

i

i

This denition of t constitutes a welldened probability distribution over s for each a and

s

To insure the validity of the representation we only allow pnew to app ear as a

decision prop osition in Ta if p is the lab el prop osition for some decision tree Ta for

i j

j i For this reason the order of the decision trees in Ta is signicant To put this

another way a prop osition only has a new value after this new value has b een dened by

some decision tree

The complexity results we derive for ST apply also to PSOs TBNs and all other com

putationally equivalent representations They also hold for the succinct representation

a prop ositional representation p opular in the complexitytheory literature which captures

the set of transition matrices as a function most commonly represented by a Bo olean circuit

that computes that function ST can straightforwardly b e represented as a Bo olean circuit

and in the pro of of Theorem we show how to represent particular Bo olean circuits in

ST Thus although we have not shown that the succinct representation is formally equiv

alent to ST the two representations are closely related the pro ofs we give for ST need to

b e changed only slightly to work for the succinct representation Goldsmith Littman

Mundhenk a b Mundhenk et al b Our results require that we restrict

the succinct representation to generate transition probabilities with at most a p olynomial

Complexity of Probabilistic Planning

numb er of bits the results may b e dierent for other circuitbased representations that can

represent probabilities with an exp onential numb er of bits Mundhenk et al a

Example Domain

To help make these domainrepresentation ideas more concrete we present the following

simple probabilistic planning domain based on the problem of building a sand castle at the

b each There are a total of four states in the domain describ ed by combinations of two

Bo olean prop ositions moat and castle prop ositions app ear in b oldface The prop osition

moat signies that a moat has b een dug in the sand and the prop osition castle signies

that the castle has b een built In the initial state b oth moat and castle are false and the

goal set is fcastle g

There are two actions digmoat and erectcastle actions app ear in sans serif Figure

illustrates these actions in ST Executing digmoat when moat is false causes moat to

b ecome true with probability if moat is already true digmoat leaves it unchanged

The castle prop osition in not aected The digmoat action is depicted in the left half of

Figure

The second action is erectcastle which app ears in the right half of Figure The decision

trees are numb ered to allow sequential dep endencies b etween their eects to b e expressed

The rst decision tree is for castle which do es not change value if it is already true when

erectcastle is executed Otherwise the probability that it b ecomes true is dep endent on

whether moat is true the castle is built with probability if moat is true and only

probability if it is not The idea here is that building a moat rst protects the castle

from b eing destroyed prematurely by the o cean waves

The second decision tree is for the prop osition moat Because erectcastle cannot make

moat b ecome true there is no eect when moat is false On the other hand if the moat

exists it may collapse as a result of trying to erect the castle The lab el castlenew in the

diagram refers to the value of the castle prop osition after the rst decision tree is evaluated

If the castle was already built when erectcastle was selected the moat remains built with

probability If the castle had not b een built but erectcastle successfully builds it moat

remains true Finally if erectcastle fails to make castle true moat b ecomes false with

probability and everything is destroyed

Note that given an ST representation of a domain we can p erform a numb er of useful

op erations eciently First given a state s and action a we can generate a next state s with

the prop er probabilities This is accomplished by calculating the value of the prop ositions

of s one at a time in the order given in the representation of a ipping coins with the

probabilities given in the leaves of the decision trees Second given a state s action a

and state s we can compute ts a s the probability that state s is reached from state s

when action a is taken via Equation

Plan Typ es and Representations

We consider four classes of plans for probabilistic domains Total ly ordered plans are the

most basic typ e b eing a nite sequence of actions that must b e executed in order this

typ e of plan ignores the state of the system Acyclic plans generalize totally ordered plans

to include conditional execution of actions Partial ly ordered plans are a dierent way of

Littman Goldsmith Mundhenk

dig-moat erect-castle

1: moat 2: castle 1: castle 2: moat moat castle castle moat TF TF TF TF moat 1 T 1/2 T 1 T 0 T 1 T castle 0 T TF T F

1/2 T 1/4 T 3/4 T castle: new TF

1 T 1/2 T

Figure Sequentialeectstree ST representation for the sandcastle domain

generalizing totally ordered plans in which the precise sequence is left exible McAllester

Rosenblitt Looping plans generalize acyclic plans to the case in which plan steps

can b e rep eated Smith Williamson Lin Dean This typ e of plan is also

referred to as a plan graph or p olicy graph Kaelbling et al

In the following sections we prove computational complexity results concerning each

of these plan typ es The remainder of this section provides formal denitions of the plan

typ es illustrated in Figure with examples for the sandcastle domain

In its most general form a plan or p olicy controller or transducer is a program that

outputs actions and takes as input information ab out the outcome of these actions In this

work we consider only a particularly restricted nitestatecontrollerbased plan represen

tation

A plan P for a planning domain M hS s A t G i can b e represented by a structure

V v E consisting of a directed multi graph V E with initial no de v V a

lab eling V A of plan no descalled plan steps to domain actions and a lab eling

of edges with state sets E P S such that for every v V with outgoing edges

S

v v S and v v v v for all v v V v v Some plan

0 0

v V v v E

steps have no outgoing edges at allthese are the terminal steps Actions for terminal

steps are not executed Note that the function can b e represented in a direct manner for

at domains but for prop ositional domains a more compact representation is needed We

assume that for prop ositional domains edge lab els are given as conjunctions of literals

The b ehavior of plan P in domain M is as follows The initial time step is t At time

step t the domain is in state s and the plan is at step v s is dened by the planning

t t

domain v by the plan Action v is executed resulting in a transition to domain state

t

s with probability ts v s Plan step v is chosen so that s v v

t t t t t t t t

the function tells the plan where to go next At this p oint the timestep index t is

incremented and the pro cess rep eats This continues until a terminal step is reached in the

plan

One can understand the b ehavior of domain M under plan P in several dierent ways

The p ossible sequences of states of M can b e viewed as a tree each no de of the tree at

depth t is a state reachable from the initial state at time step t Alternatively one can view

the state of M at time step t under plan P as a probability distribution over S At time

Complexity of Probabilistic Planning

step with probability the pro cess is in state s The probability that M is in state s

at time step t Prs t is the sum of the probabilities of all length t paths from

s to s ie

t

X Y

ts a s

j j j

0

j

s s s s s s

t

t

where a is the action selected by plan P at time j given the observed sequence of state

j

transitions s s This view is useful in some of the later pro ofs

j

Next we formalize the probability that domain M reaches a goal state under plan P

We need to intro duce several notions A legal sequence of states and steps applied is

k

called a trajectory ie for M and P this is a sequence hs v i of pairs with

i i

i

ts v s for i k

i i i

s v v for i k and

i i i

v v are not terminal steps

k

A goal trajectory is a tra jectory that ends in a goal state of M s G Note that

k

each goal tra jectory is nite Thus we can calculate the probability of a goal tra jectory

Q

k

k

hs v i as Pr ts v s given that s G The probability that

i i i i i

k

i

i

M reaches a goal state under plan P is the sum of the probabilities of goal tra jectories for

M

X

PrM reaches a goal state under P Pr

goal tra jectory

we call this the value of the plan

We characterize a plan P V v E on the basis of the size and structure of its

underlying graph V E If the graph V E contains no cycles we call it an acyclic plan

otherwise it is a looping plan It follows that an acyclic plan has a terminal step and that

a terminal step will b e reached after no more than jV j actions are taken such plans can

only b e used for nitehorizon control A total ly ordered plan sometimes called a linear

plan or a straight line plan is an acyclic plan with no more than one outgoing edge for

each no de in V Such a plan is a simple path

In this work we also consider partial ly ordered plans sometimes called nonlinear

plans that express an entire family of totally ordered plans In this representation the steps

of the plan are given as a partial order sp ecied for example as a directed acyclic graph

This partial order represents a set of totally ordered plans all totally ordered sequences of

plan steps consistent with the partial order that consist of al l steps of the partially ordered

plan Each of these totally ordered plans has a value and these values need not all b e the

same As such we have a choice in dening the value for a partially ordered plan In this

work we consider the optimistic p essimistic and average interpretations Let P b e the

set of totally ordered sequences consistent with partial order plan P Under the optimistic

interpretation

The value of P max PrM reaches a goal state under p

pP

Littman Goldsmith Mundhenk

Under the pessimistic interpretation

The value of P min PrM reaches a goal state under p

pP

Under the average interpretation

X

The value of P PrM reaches a goal state under p

jP j

pP

To illustrate these notions Figure gives plans of each typ e for the sandcastle domain

describ ed earlier Initial no des are marked an incoming arrow and terminal steps are

represented as lled circles The step totally ordered plan in Figure a successfully

builds a sand castle with probability An acyclic plan is given in Figure b which

succeeds with probability and executes digmoat an average of times Note

that it succeeds more often and with fewer actions on average than the totally ordered plan

in Figure a

Figure c illustrates a partially ordered plan for the sandcastle domain While this

plan b ears a sup ercial resemblance to the acyclic plan in Figure b it has a dierent

interpretation In particular the plan in Figure c represents a set of totally ordered plans

with ve nonterminal plan steps digmoat steps and erectcastle steps In contrast

to the solid arrows in Figure b which indicate ow of control the dashed arrows in

Figure c represent ordering constraints each erectcastle step must b e preceded by at

least two digmoat steps for example

Although there are distinct ways of arranging the ve plan steps in Fig

ure c into a totally ordered plan only two distinct totally ordered plans are consistent

with the ordering constraints

digmoat digmoat digmoat erectcastle erectcastle

success probability and

digmoat digmoat erectcastle digmoat erectcastle

success probability Thus the optimistic success probability of this partially

ordered plan is the p essimistic Note that the p essimistic interpreta

tion is closely related to the standard interpretation in deterministic partial order plan

ning McAllester Rosenblitt in which a partially ordered plan is considered suc

cessful only if al l its consistent totally ordered plans are successful The average success

probability is here b ecause there are orderings that yield the p o orer plan

describ ed ab ove and that yield the b etter one

The lo oping plan in Figure d do es not terminate until it succeeds in building a sand

castle which it will do with probability eventually Of course not all lo oping plans

succeed with probability the totally ordered plan in Figure a and the acyclic plan in

Figure b are sp ecial cases of such lo oping plans for instance

We dene jP j the size of a plan P to b e the numb er of steps it contains We dene jM j

the size of a domain M to b e the sum of the numb er of actions and states for a at domain

and the sum of the sizes of the ST decision trees for a prop ositional domain

Complexity of Probabilistic Planning

dig-moat erect-castle dig-moat dig-moat erect-castle dig-moat (a) A totally ordered plan. dig-moat erect-castle (c) A partially ordered plan. not(moat) not(moat) dig-moat dig-moat dig-moat erect-castle not(moat) moat and not(castle) moat moat moat castle dig-moat erect-castle (b) An acyclic (conditional) plan. not(moat) and not(castle)

(d) A looping plan.

Figure Example plans for the sandcastle domain

We consider the following decision problems The planevaluation problem asks given

a domain M a plan P of size jP j jM j and threshold whether its value is greater than

ie whether

PrM reaches a goal state under P

Note that the condition that jP j jM j is just a technical onewe simply want to use jM j

to represent the size of the problem Given an instance in which jP j is larger than jM j we

simply imagine padding out jM j to make it larger The imp ortant thing is that we are

considering plans that are roughly the size of the description of the domain and not the

size of the numb er of states which might b e considerably larger

The planexistence problem asks given domain M threshold and size b ound z jM j

whether there exists a plan P of size z with value greater than Note that b ecause we

b ound the size of the target plan the complexity of plan generation is no more than that of

plan existence the technique of selfreduction can b e used to construct a valid plan using

p olynomially many calls to an oracle for the

Each of these decision problems has a dierent version for each typ e of domain at

and prop ositional and each typ e of plan category lo oping acyclic totally ordered and

partially ordered under each of the three interpretations We address all of these problems

in the succeeding sections

Complexity Classes

For denitions of complexity classes reductions and standard results from complexity

theory we refer the reader to Papadimitriou

Briey we are lo oking only at the complexity of decision problems those with yesno

answers The class P consists of problems that can b e decided in p olynomial time that is

given an instance of the problem there is a program for deciding whether the answer is yes

or no that runs in p olynomial time The class NP contains the problems with p olynomial

time checkable p olynomialsize certicates for any given instance and certicate it can b e

checked in time p olynomial in the size of the instance whether the certicate proves that

the instance is in the NP set This means that if the answer to the instance is yes this

Littman Goldsmith Mundhenk

can b e shown in p olynomial time given the right key The class co NP is the opp ositeif

the answer is no this can b e shown in p olynomial time given the right key

A problem X is C hard for some C if every problem in C can b e reduced

to it to put it another way a fast algorithm for X can b e used as a subroutine to solve any

problem in C quickly A problem is C complete if it is b oth C hard and in C these are the

hardest problems in the class

In the interest of b eing complete we next give more detailed descriptions of the less

familiar probabilistic and counting complexity classes we use in this work

The class L Alvarez Jenner is the class of functions f such that for some

nondeterministic logarithmically spaceb ounded machine N the numb er of accepting paths

of N on x equals f x The class P is dened analogously as the class of functions f

such that for some nondeterministic polynomialtime b ounded machine N the numb er of

accepting paths of N on x equals f x Typical complete problems are computing the

determinant for L and computing the p ermanent for P

A function f is dened to b e in GapL if it is the dierence f g h of L functions g

and h While L functions have nonnegative integer values by denition GapL functions

may have negative integer values for example if g always returns zero

Probabilistic logspace Gill PL is the class of sets A for which there exists a

nondeterministic logarithmically spaceb ounded machine N such that x A if and only if

the numb er of accepting paths of N on x is greater than its numb er of rejecting paths In

the original denition of PL there is no time b ound on computations Boro din Co ok and

Pipp enger later showed PL P Jung proved that any set computable in

probabilistic logspace is computable in probabilistic logspace where the PL machine has a

simultaneous p olynomialtime b ound In apparent contrast to Pcomplete sets sets in PL

are decidable using very fast parallel computations Boro din et al

Probabilistic p olynomial time PP is dened analogously A classic PPcomplete prob

lem is Majsat given a Bo olean formula in conjunctive normal form CNF do es the

ma jority of assignments satisfy it According to Balcazar Daz and Gabarro the

PPcompleteness of Majsat was shown in a combination of results from Gill and

Simon

For p olynomialspaceb ounded computations PSPACE equals probabilistic PSPACE

and PSPACE is the same as the class of p olynomialspacecomputable functions Ladner

Note that L NL L PL and GapL are to logarithmic space what P NP P PP and

GapP are to p olynomial time Also the notion of completeness we use in this pap er relies

on manyone reductions In the case of PL the reduction functions are logarithmic space

in the case of NP and ab ove they are p olynomial time

0

C

consists of those sets that are C Turing For any complexity classes C and C the class C

reducible to sets in C ie sets that can b e accepted with resource b ounds sp ecied by C

using some problem in C as a subroutine oracle with instantaneous output For any class

C PSPACE

C PSPACE it is the case that NP PSPACE and therefore NP PSPACE

PP

NP

closure of The primary oracledened class we consider is NP It equals the

m

PP Toran which can b e seen as the closure of PP under p olynomialtime disjunc

tive reducibility with an exp onential numb er of queries each of the queries computable in

p olynomial time from its index in the list of queries To simplify our completeness results

Complexity of Probabilistic Planning

for this class we intro duce a decision problem we call EMajsat exists Majsat which

generalizes the standard NPcomplete satisability problem and the PPcomplete Majsat

An EMajsat instance is dened by a CNF Bo olean formula on n Bo olean variables

x x and a numb er k b etween and n The task is to decide whether there is an

n

initial partial assignment to variables x x so that the ma jority of assignments that

k

PP

extend that partial assignment satises We prove that this problem is NP complete

in the App endix

The complexity classes we consider satisfy the following containment prop erties and

relations to other wellknown classes

PP

NP NP

L NL PL P PSPACE EXP PP

PP

co NP

co NP

Because P is prop erly contained in EXP EXPcomplete problems are provably intractable

the other classes may equal P although that is not generally b elieved to b e the case

PP

Several other observations are worth making here It is also known that PH NP

where PH represents the p olynomial hierarchy In a crude sense PH is close to PSPACE

PP

and thus our NP completeness results place imp ortant problems close to PSPACE

However some early empirical results Littman b show that random problem in

stances from PP have similar prop erties to random problem instances from NP suggesting

that PP might b e close enough to NP for NPtyp e heuristics to b e eective

Results Summary

Tables and summarize our results which are explained in more detail in later sections

The general avor of our main results and techniques can b e conveyed as follows To

show that a planevaluation problem is in a particular complexity class C we take the

cross pro duct of the steps of the plan and the states of the domain and then lo ok at the

complexity of evaluating the absorption probability of the resulting Markov chain ie the

directed graph with probabilitylab eled edges The complexity of the corresp onding plan

C

existence problem is then b ounded by NP b ecause the problem can b e solved by guessing

C

the correct plan nondeterministically and then evaluating it in many cases it is NP

complete The appropriate complexity class C dep ends primarily on the representation of

the crosspro duct Markov chain

Exceptions to this basic pattern are the results for partially ordered plans in Section

These app ear to require a distinct set of techniques

It is also worth noting that although prop ositional domains can b e exp onentially more

compact than at domains the computational complexity of solving problems in prop o

sitional domains is not always exp onentially greater in one instance evaluating partially

ordered plans under the average interpretation the complexity is actually the same for at

and prop ositional domains

We also prove results concerning plan evaluation and existence for compactly represented

PP

plans PPcomplete and NP complete Corollary plan existence of large enough

lo oping plans in at domains Pcomplete Theorem plan evaluation and existence

for lo oping plans in deterministic prop ositional domains PSPACEcomplete Theorems

and and plan existence for p olynomialsize lo oping plans in partially observable domains

NPcomplete Section

Littman Goldsmith Mundhenk

Plan Typ e Plan Evaluation Plan Existence Reference

unrestricted Pcomplete P T

p olynomialdepth Pcomplete P T

lo oping PLcomplete NPcomplete Section

acyclic PLcomplete NPcomplete Section

totally ordered PLcomplete NPcomplete Section

partially ordered optimistic NPcomplete NPcomplete Section

partially ordered average PPcomplete NPcomplete Section

partially ordered p essimistic co NPcomplete NPcomplete Section

Table Complexity results for at representations P T is Papadimitriou and

Tsitsiklis

Plan Typ e Plan Evaluation Plan Existence Reference

unrestricted EXPcomplete Littman a

p olynomialdepth PSPACEcomplete Littman a

lo oping PSPACEcomplete PSPACEcomplete Section

PP

acyclic PPcomplete NP complete Section

PP

totally ordered PPcomplete NP complete Section

PP PP

partially ordered optimistic NP complete NP complete Section

PP

partially ordered average PPcomplete NP complete Section

PP PP

partially ordered p essimistic co NP complete NP complete Section

Table Complexity results for prop ositional representations

Complexity of Probabilistic Planning

Acyclic Plans

In this section we treat the complexity of generating and evaluating acyclic and totally

ordered plans

Theorem The planevaluation problem for acyclic and total ly ordered plans in at do

mains is PLcomplete

Pro of First we show PLhardness for totally ordered plans Jung proved that a

set A is in PL if and only if there exists a logarithmically spaceb ounded and p olynomially

timeb ounded nondeterministic Turing machine N with the following prop erty For every

input x machine N must have at least half of its computations on input x b e accepting

if and only if x is in A The machine N can b e transformed into a probabilistic Turing

machine such that for each input x the probability that R x accepts x equals the

fraction of computations of N x that accepted Given R a planning domain M can b e

describ ed as follows The state set of M is the set of congurations of R on input x Note

that a conguration consists of the contents of the logarithmically spaceb ounded tap e the

state the lo cation of the readwrite heads and one symb ol each from the input and output

tap es Thus a conguration can b e represented with logarithmically many bits and there

are only p olynomially many such congurations The statetransition probabilities of M

under the unique action a are the conguration transition probabilities of R All states

obtained from accepting congurations are goal states The totally ordered plan consists

of a step counter for R on input x and each of its plan steps takes the only action a

The probability that the planning domain under this plan reaches a goal state is exactly

the probability that R x reaches an accepting conguration Thus evaluating this totally

ordered plan is PLhard

Since totally ordered plans are acyclic plans this also proves PLhardness of the plan

evaluation problem for acyclic plans

Next we show that the planevaluation problem is in PL for acyclic plans Let M

hS s A t G i b e a planning domain let P hV v E i b e an acyclic plan and let

threshold b e given We show how our question whether the probability that M under P

reaches a goal state with probability greater than can b e equivalently transformed into

the question of whether a GapL function is greater than The transformation can b e done

in logarithmic space As shown by Allender and Ogihara it follows that our question

is in PL

At rst we construct a Markov chain C from M and P which simulates the execution

or evaluation of M under P Note that a Markov chain can b e seen as a probabilistic

domain with only one action in its set of actions Since there is no choice of actions we do

not mention them in this construction The state space of C is S V the initial state is

s v the set of goal states is G V and the transition probabilities t for C are

C

ts v s if s v v

t s v s v if v is a terminal step no de and s v s v

C

otherwise

Let m b e the numb er of plan steps of P ie jV j the numb er of no des in the graph

representing P Since states of C that contain a terminal step of P are sinks in C it follows

Littman Goldsmith Mundhenk

that

PrM reaches a goal state under P PrC reaches a goal state in exactly m steps

Let

p s m PrC reaches a goal state in exactly m steps from initial state s

C

Then p s v m is the probability we want to calculate The standard inductive de

C

nition of p used to evaluate plans by dynamic programming is

C

if s is a goal state of C

p s

C

otherwise

X

p s k t s s p s k k m

C C C

0

s S V

Let h b e the maximum length of the representation of a statetransition probability t

C

Then for

if s is a goal state of C

p s

h

otherwise

X

h

p s k t s s p s k k m

C

h h

0

s S V

hm

it follows that p s v m p s v m Note that p s v m is an integer

C

h h

hm

value Therefore p s v m if and only if p s v m b c In order

C

h

to show that p s v m is decidable in PL it suces to show that p s v m

C

h

is in GapL Therefore we unwind the inductive denition of p Let T b e the integer

h

h

matrix obtained from t with T 0 t s s We intro duce the integervalued T to

C C

ss

show that p can b e comp osed from GapL functions using comp ositions under which GapL

h

is closed as t is not integer valued it cannot b e used to show this We can write

C

X

m

p s m T 0 p s

h h

ss

0

s S V

0

We argue that p is in GapL Each entry T is logspace computable from the do

h

ss

main M and plan P Therefore the p owers of the matrix are in GapL as shown by

Vinay Because GapL is closed under multiplication and summation of p olynomially

many summands it follows that p GapL Finally we use closure prop erties of GapL

h

from Allender and Ogihara since GapL is closed under subtraction it follows that

the planevaluation for acyclic plans is in PL

Because totally ordered plans are acyclic plans the planevaluation problem for totally

ordered plans is also in PL

The technique of forming a Markov chain by taking the cross pro duct of a domain and

a plan will b e useful later Planexistence problems require a dierent set of techniques

Theorem The planexistence problem for acyclic and total ly ordered plans in at do

mains is NPcomplete

Complexity of Probabilistic Planning

Pro of First we show containment in NP Given a planning domain M a threshold

and a size b ound z jM j guess a plan of the correct form of size at most z and accept if

and only if M reaches a goal state with probability greater than under this plan Note that

checking whether a plan has the correct form can b e done in p olynomial time Because the

planevaluation problem is in PL Theorem it follows that the planexistence problem

PL

is in NP ie it is in NP NP

To show the NPhardness of the planexistence problem we give a reduction from the

NPcomplete satisability problem for Bo olean formulae in conjunctive normal form We

construct a planning domain that evaluates a Bo olean formula with n variables where a

n step plan describ es an assignment of values to the variables In the rst step a

clause is chosen randomly At step i the planning domain checks whether the plan

satises the app earance of variable i in that clause If so the clause is marked as satised

After n steps if no literal was satised in that clause then no goal state is reached

through this clause otherwise a transition is made to the goal state Therefore the goal

state will b e reached with probability greater than m if and only if all clauses are

satisedthe plan describ es a satisfying assignment

We formally dene the reduction which is similar to one presented by Papadimitriou

and Tsitsiklis Let b e a CNF formula with n variables x x and m clauses

n

C C Let the sign of an app earance of a variable in a clause b e if the variable is

m

negated and otherwise Dene the planning domain M hS s A t G i where

S fsat i j unsat i j j i n j mg fs s s g

acc rej

A fassigni b j i n b f gg fstart end g

G fs g

acc

if s s a start s unsat j j m

m

if s s a start s s

rej

if s unsati j a assigni b s sati j i n

x app ears in C with sign b

i j

if s unsati j a assigni b s unsati j i n

x do es not app ear in C with sign b

i j

if s unsati j a assigni b or a start or a end

s s i i n b f g

rej

ts a s

if s unsatn j s s

rej

if s sati j a assigni b s sati j i n

if s sati j a assigni b or a start or a end

s s i i n

rej

if s satn j a end s s

acc

if s satn j a end s s

rej

if s s s or s s s

rej acc

otherwise

The meaning of the states in this domain is as follows When the domain is in state

sat i j for i n j m it means the formula has b een satised and we are

currently checking variable i in clause j State sat n j for all j m means that

weve nished verifying clause j and it was satised The meanings are similar for the

Littman Goldsmith Mundhenk

s

start start

unsat unsat

assign assign assign assign

sat unsat sat unsat

assign assign x assign x

assign

sat unsat sat unsat

assign

assign x assign x

assign

sat unsat sat unsat

end end end end

s s

rej acc

Figure A domain generated from the Bo olean formula x x x x

unsat states Of course s is the initial state and s and s are the accepting and

acc rej

rejecting states resp ectively

The actions in this domain are start and end which mark the b eginning and end of the

assignment and assigni b for i n b f g which assign the truth value b to

variable i Figure gives the domain generated by this reduction from a simple Bo olean

formula By the description of the reduction M can b e computed from in time

p olynomial in jj

By construction M under z n step plan P can only reach goal state s if

acc

P has the form

start assign b assign b assignn b end

n

P reaches s with probability if and only if b b is a satisfying assignment for the

acc n

n variables in This shows that Bo olean satisability p olynomialtime reduces to the

planexistence problem for totally ordered and acyclic plans showing that it is NPhard

Note that if we b ound the plan depth horizon instead of the plan size the plan

existence problem for acyclic plans in at domains is Pcomplete Goldsmith et al a

Papadimitriou Tsitsiklis Limiting the plan size makes the problem more dicult

b ecause it is p ossible to force the planner to take the same action from dierent states

guring out how to do this without sacricing plan quality is very challenging

In prop ositional domains plan evaluation is harder b ecause of the large numb er of states

Theorem The planevaluation problem for acyclic and total ly ordered plans in proposi

tional domains is PPcomplete

Pro of To show PPhardness for totally ordered plans we give a reduction from the

PPcomplete problem Majsat given a CNF Bo olean formula do es the ma jority of

assignments satisfy it

Complexity of Probabilistic Planning

evaluate

1: xi 2: xi ... n: xi

1/2 T 1/2 T 1/2 T

n+1: clause1 n+m: clausem n+m+1: satisfied n+m+2: done

xa1:new xam:new done ... 1 T TF TF TF

0 T clause1:new xb1:new 1 T 1 T xbm:new TF T F T F clause :new 2 0 T 1 T xc1:new 1 T xcm:new T F TF TF ... 0 T x :new d1 1 T 1 T 0 T clausem:new TF TF

0 T 1 T 1 T 0 T

Figure Sequentialeectstree representation for evaluate

Given we construct a planning domain M and a step plan such that the plan

achieves the goal with probability greater than if and only if the ma jority of

assignments satises The planning domain M consists of a single action evaluate

which is also the step plan to b e evaluated There are n m prop ositions in M

x through x which corresp ond to the n variables of clause through clause which

n m

corresp ond to the m clauses of satised which is also the sole element of the goal set

and done which insures that evaluate is only executed once this is imp ortant when this

domain is used later in Theorem to show the complexity of plan existence In the initial

state all prop ositions are false

The evaluate action generates a random assignment to the variables of evaluates the

clauses clause is true if any of the literals in the ith clause is true and evaluates the entire

i

formula satised is true if all the clauses are true Figure gives an ST representation

represent the variables in clause i x of evaluate in which x

i i

b a

By construction is in Majsat if and only if M reaches a goal state with probability

greater than under the plan consisting of the single action evaluate

We next show memb ership in PP for acyclic plans We do this by showing that a

planning domain M and an acyclic plan P induce a computation tree consisting of all

paths through M under P Evaluating this computation tree can b e accomplished by a PP

machine

Let b b e a b ound on the numb er of bits used to sp ecify probabilities in the leaves of the

decision trees representing M Consider a computation tree dened as follows It has ro ot

lab eled hs v i If in the planning domain M the probability of reaching state s from s

We represent numb ers in p olynomialprecision binary representation In principle this could intro duce

roundo errors if planning problems are sp ecied in some other form

Littman Goldsmith Mundhenk

b

given action v is equal to then hs v i will have children lab eled hs v s i

Each of the identically lab eled child no des is indep endent but is dened identically to the

others Thus the numb er of paths with a given set of lab els corresp onds to the probability

b h

of that tra jectory through the domain and plan multiplied by where h is the depth

of the plan

b h

The numb er of accepting computations is therefore more than if and only if

the probability of achieving the goal is more than Note that b is inherent in the planning

domain rather than in h A PP machine accepts if more than half of the nal states are

accepting so if it will b e necessary to pad the computation tree by intro ducing

dummy branches that accept or reject in the right prop ortions

The planexistence problem is essentially equivalent to guessing and evaluating a valid

plan

Theorem The planexistence problem for acyclic and total ly ordered plans in proposi

PP

tional domains is NP complete

PP

Pro of Containment in NP for b oth totally ordered and for acyclic plans follows from

the fact that a p olynomialsize plan can b e guessed in p olynomial time and checked in PP

Theorem

PP

Hardness for NP for b oth totally ordered and acyclic plans can b e shown using a

PP

reduction from EMajsat shown NP hard in the App endix The reduction echo es the

one used in the PPhardness argument in the pro of of Theorem

Given a CNF Bo olean formula with variables x x and a numb er k we construct

n

a planning domain M k such that a plan exists that can reach the goal with probability

greater than if and only if there is an assignment to the variables x x such

k

that the ma jority of assignments to the remaining variables satises The planning domain

M k consists of the action evaluate from Theorem and one action setx for each of

i

the rst k variables Just as in the pro of of Theorem there are n m prop ositions in

M k all initially false x through x which corresp ond to the n variables of clause

n

through clause which corresp ond to the m clauses of satised and done which

m

insures that evaluate is only executed once The goal set contains satised and done

For i k action setx makes prop osition x true Analogously to Theorem the

i i

evaluate action generates a random assignment to the remaining variables of evaluates

the clauses clause is true if any of the literals in the clause is true and evaluates the

i

entire formula satised is true if all the clauses are true and sets done to true If done

is true no further action can make satised true

If the pair k is in EMajsat then there exists an assignment b b to the rst k

k

variables of such that the ma jority of assignments to the rest of the variables satises

Therefore the plan applying steps setx for all i with b followed by an evaluate action

i i

reaches a goal state with probability greater than

Conversely assume M k under totally ordered plan P reaches a goal state with

probability greater than Since the evaluate action is the only action setting done to

true and since no action reaches the goal once done is set to true we can assume without

loss of generality that P consists of a sequence of steps setx that ends with evaluate By

i

construction the assignment to x x assigning exactly to those variables set by P

k

Complexity of Probabilistic Planning

is an assignment under which the ma jority of the assignments to the rest of the variables

satises and therefore k is in EMajsat

Since every totally ordered plan is acyclic the same hardness holds for acyclic plans

In the ab ove results we consider b oth at and compactly represented prop ositional

planning domains but only at plans Compactly represented plans are also quite useful

A compact acyclic plan is an acyclic plan in which the names of the plan steps

are enco ded by a set of prop ositional variables and the steptransition function

b etween plan steps is represented by a set of decision trees just as in ST We

require that the plan has depth p olynomial in the size of the representation

even though the total numb er of steps in the plan might b e exp onential due to

the logarithmic succinctness of the enco dings

Because the plandomain crosspro duct technique used in the pro of of Theorem gen

eralizes to compact acyclic plans the same complexity results apply This also holds true

for a probabilistic acyclic plan which is an acyclic plan that can make random transitions

b etween plan steps ie the steptransition function is sto chastic These insights can b e

combined to yield the following corollary of Theorems and

Corollary The planevaluation problem for compact probabilistic acyclic plans in propo

sitional domains is PPcomplete and the planexistence problem for compact probabilistic

PP

acyclic plans in propositional domains is NP complete

We mention probabilistic plans here for two reasons First the b ehavior of some plan

ning structures such as partially ordered plan evaluation under the average interpretation

discussed in Section can b e thought of as generating probabilistic plans Second there

are many instances in which simple probabilistic plans p erform nearly as well as much larger

and more complicated deterministic plans this notion is often exploited in the eld of ran

domized algorithms Work by Platzman describ ed by Lovejoy shows how the

idea of randomized plans can come in handy for planning in partially observable domains

Lo oping Plans

Lo oping plans can b e applied to innitehorizon control The complexity of plan existence

and plan evaluation in at domains Theorems and do es not dep end on the presence

or absence of lo ops in the plan

Theorem The planevaluation problem for looping plans in at domains is PLcomplete

Pro of Given a domain M and a lo oping plan P we can construct a pro duct Markov

chain C as in the pro of of Theorem As in the pro of of Theorem of Allender and

Ogihara this chain can b e constructed such that it has exactly one accepting and

exactly one rejecting state b oth of these states are absorbing The probability that M

reaches a goal state under P equals the probability that C reaches its accepting state if

started in its initial state which is the pro duct of the initial states of M and P In the

Littman Goldsmith Mundhenk

pro of of Theorem of Allender and Ogihara it is shown that the construction of

the Markov chain and the computation of whether it reaches its nal state with probability

greater than can b e p erformed in PL

PLhardness is implied by Theorem since acyclic plans are a sp ecial case of lo oping

plans

Theorem The planexistence problem for looping plans in at domains is NPcomplete

in general but Pcomplete if the size of the desired plan is at least the size of the state or

action space ie z minjS j jAj

Pro of sketch NPcompleteness follows from the pro of of Theorem containment and

hardness still hold if plans are p ermitted to b e lo oping

However this is only true if we are forced to sp ecify a plan whose size is small with

resp ect to the size of the domain If our lo oping plan is allowed to have a numb er of states

that is at least as large as the numb er of states or actions in the domain the problem can

b e solved in p olynomial time

It is known that for Markov decision pro cesses such as these the maximum probability

of reaching a goal state equals the maximum probability of reaching a goal state under

any innitehorizon stationary policy where a stationary p olicy is a mapping from states

to actions that is used rep eatedly to cho ose actions at each time step It is known that

such an optimal stationary p olicy can b e computed in p olynomial time via linear pro

gramming Condon Any stationary p olicy for a domain M hS s A G ti can b e

written as a lo oping plan although of course not all lo oping plans corresp ond to stationary

p olicies

We show that for any xed stationary p olicy p S A there are two simple ways a

lo oping plan P V v E can b e represented First let V A v ps v v

and v v fs S j ps v g It follows that whenever M reaches state s then the

action applied according to the lo oping plan is the same as according to P

Second let V S v s v pv and v v fv g It follows that whenever

M reaches state s the plan will b e at the no de corresp onding to that state and therefore

the appropriate action for that state will b e applied by the lo oping plan Therefore the

maximum probability of reaching a goal state can b e obtained by either of these lo oping

plans

Since the b est stationary p olicy can b e computed in p olynomial time the b est lo oping

plan can b e computed in p olynomial time to o Phardness follows from a theorem of

Papadimitriou and Tsitsiklis

In prop ositional domains the complexity of plan existence and plan evaluation of lo oping

plans is quite dierent from the acyclic case Lo oping plan evaluation is very hard

Theorem The planevaluation problem for looping plans in both deterministic and stochas

tic propositional domains is PSPACEcomplete

Pro of Recall that the planevaluation problem for at domains is in PL Theorem

n

For a planning domain with c states and a representation of size n a lo oping plan can

Complexity of Probabilistic Planning

n

b e evaluated in probabilistic space O log c Theorem which is to say probabilistic

space p olynomial in the size of the input This follows b ecause the ST representation of

the domain can b e used to compute entries of the transition function t in p olynomial space

Since probabilistic PSPACE equals PSPACE this shows that the planevaluation problem

for lo oping plans in sto chastic prop ositional domains is in PSPACE

It remains to show PSPACEhardness for deterministic prop ositional domains Let N

b e a deterministic p olynomialspaceb ounded Turing machine The momenttomoment

computation state conguration of N can b e expressed as a p olynomiallength bit string

that enco des the contents of the Turing machines tap e the lo cation of the readwrite head

the state of N s nitestate controller and whether or not the machine is in an accepting

state

For any input x we describ e how to construct in p olynomial time a deterministic plan

ning domain M x and a singleaction lo oping plan that reaches a goal state of M x if and

only x is accepted by Turing machine N

Given a description of N and x one can in time p olynomial in the size of the descriptions

of N and x pro duce a description of a Turing machine T that computes the transition

function for N In other words T on input c a conguration of N outputs the next

conguration of N In fact T can even check whether c is a valid conguration in the

computation of N x by simulating that computation By an argument similar to that

used in Co oks theorem T can b e mo deled by a p olynomialsize circuit This circuit takes

as input the bit string describing the current conguration of N and outputs the next

conguration

Next we argue that the computation of this circuit can b e expressed by an action com

pute in ST representation There is one prop osition in M x for each bit in the conguration

plus one for each gate of the circuit The three standard gates and or and not are

all easily represented as decision trees By ordering the decision trees in compute accord

ing to a top ological sort of the gates of the circuit a single compute action can compute

precisely the same output as the circuit Figure illustrates this conversion for a simple

circuit which gives the form of the not i and i and or i gates

We can now describ e the complete reduction The planning domain M x consists of

the single action compute and the set of prop ositions describ ed in the previous paragraph

The initial state is the initial conguration of the Turing machine N and the goal set is the

prop osition corresp onding to whether or not the conguration is an accepting state for N

Because all transitions are deterministic and only one action can b e chosen it follows

that the goal state is reached with probability greater than for example under

the plan that rep eatedly cho oses compute until an accepting state is reached if and only if

p olynomialspace machine N on input x accepts

A similar argument shows that lo oping plan existence is not actually any harder than

lo oping plan evaluation

Theorem The planexistence problem for looping plans in both deterministic and stochas

tic propositional domains is PSPACEcomplete

Littman Goldsmith Mundhenk

compute

1: i1 2: i2 3: i3

c2 i1:new c1 c1 c2 c3 TF TF TF

0 T 1 T c c not 3 0 T 1 T 2 T i F T F or 1 and i i2 3 1 T 0 T 1 T 0 T and not or 4: c1 5: c2 6: c3

c1 c2 c3 c1 i2:new i3:new TF TF TF

i2:new 0 T 0 T 1 T 1 T i2:new T F T F

1 T 0 T 1 T 0 T

Figure A circuit and its representation as a sequentialeects tree

Pro of Hardness for PSPACE follows from the same construction as in the pro of of

Theorem either the onestep lo oping plan is successful or it is not No other plan yields

a b etter result

Recall that we are only interested in determining whether there is a plan of size z where

z is b ounded by the size of the domain that reaches the goal with a given probability The

problem is in PSPACE b ecause the plan can b e guessed in p olynomial time and checked in

PSPACE

PSPACE Theorem Because NP PSPACE the result follows

As we mentioned earlier the unrestricted innitehorizon planexistence problem is

EXPcomplete Littman a this shows the problem of determining unrestricted plan

existence is EXPhard only b ecause some domains require plans that are larger than p olynomial

size lo oping plans

Because Theorem shows PSPACEcompleteness for determining plan existence in de

terministic domains it is closely related to the PSPACEcompleteness result of Bylan

der The main dierence b etween the two results is that our theorem applies to

more compact plans p olynomial instead of exp onential with more complex op erator de

scriptions conditional eects instead of preconditions with add and delete lists that can

include lo ops Also as the pro ofs ab ove show PSPACEhardness is retained even in plan

ning domains with only one action so it is the lo oping that makes lo oping plans hard to

work with

Partially Ordered Plans

Partially ordered plans are a p opular representation b ecause they allow planning algorithms

to defer a precise commitment to the ordering of plan steps until it b ecomes necessary in

Complexity of Probabilistic Planning

the planning pro cess A k step partially ordered plan corresp onds to a set of k step totally

ordered plansall those that are consistent with the given partial order The evaluation of

a partially ordered plan can b e dened to b e the evaluation of the b est worst or average

memb er of the set of consistent totally ordered plans these are the optimistic p essimistic

and average interpretations resp ectively

The planevaluation problem for partially ordered plans is dierent from that of totally

ordered plans This is b ecause a single partial order can enco de all totally ordered plans

Hence evaluating a partially ordered plan involves guring out the b est in case of optimistic

interpretation or the worst for p essimistic interpretation memb er or the average for

average interpretation of this combinatorial set

Theorem The planevaluation problem for partial ly ordered plans in at domains is

NPcomplete under the optimistic interpretation

Pro of sketch Memb ership in NP follows from the fact that we can guess any totally

ordered plan consistent with the given partial order and accept if and only if the domain

reaches a goal state with probability more than Rememb er that this evaluation can b e

p erformed in PL Theorem and therefore deterministically in p olynomial time

The hardness pro of is a variation of the construction used in Theorem The partially

ordered plan to evaluate has the form given in Figure the consistent total orders are of

the form

start assign b assign b assign b assign b

assignn b assignn b end

n n

where b is either or Each of the p ossible plans can b e interpreted as an assignment

i

to n Bo olean variables by ignoring every second assignment action The construction in

Theorem shows how to turn a CNF formula into a planning domain M and it

can easily b e mo died to ignore every second action Thus the b est totally ordered plan

consistent with the given partially ordered plan reaches the goal with probability if and

m

only if it reaches the goal with probability greater than if and only if it satises all

clauses of if and only if is satisable

Theorem The planevaluation problem for partial ly ordered plans in at domains is

coNPcomplete under the pessimistic interpretation

Pro of sketch Both the pro of of memb ership in co NP and the pro of of hardness are

very similar to the pro of of Theorem We show a reduction from the co NPcomplete set

Sat of unsatisable formulae in CNF The plan to evaluate has the form given in Figure

and is interpreted as ab ove As in the pro of of Theorem we construct a planning domain

M but we take G fs g as goal states where the state s is reached with probability

rej rej

greater than if and only if the assignment do es not satisfy one of the clauses of formula

A formula is unsatisable if and only if under every assignment at least one of the clauses

is not satised Therefore the probability that M reaches a goal state under a given

totally ordered plan is greater than if and only if the plan corresp onds to an unsatisfying

Littman Goldsmith Mundhenk

start

assign(1,1) assign(1,-1)

assign(2,1) assign(2,-1)

assign(3,1) assign(3,-1) ... assign(n,1) assign(n,-1)

end

Figure A partially ordered plan that can b e hard to evaluate

assignment Finally the minimum of that probability over all consistent partially ordered

plans is greater than if and only if is unsatisable

Theorem The planevaluation problem for partial ly ordered plans in at domains is

PPcomplete under the average interpretation

Pro of Under the average interpretation we must decide whether the average evaluation

over all consistent totally ordered plans is greater than threshold This can b e decided in

PP by guessing uniformly a totally ordered plan and checking its consistency with the given

partially ordered plan in p olynomial time If the guessed totally ordered plan is consistent

it can b e evaluated in p olynomial time Theorem and accepted or rejected as appropriate

If the guessed plan is inconsistent the computation accepts with probability and rejects

with probability leaving the average over the consistent orderings unchanged with

resp ect to the threshold

The PPhardness is shown by a reduction from the PPcomplete Majsat Let b e a

formula in CNF We show how to construct a domain M and a partially ordered plan

P such that Majsat if and only if the average p erformance of M under a totally

ordered plan consistent with P is greater than

Let consist of the m clauses C C which contain n variables x x Domain

m n

M hS s A t G i has actions

A fassigni b j i f ng b f gg fstart check endg

Action assigni b will b e interpreted as assign sign b to x The partially ordered plan

i

P has plan steps

V f i b h j i f ng b f g h f mgg fstart check endg

and mapping V A with

for fstart check end g and i b h assigni b

Complexity of Probabilistic Planning

The order E requires that a consistent plan has start as the rst and end as the last step

The steps in b etween are arbitrarily ordered More formally

E fstart q j q V fstart end gg fq end j q V fstart end gg

Now we dene how the domain M acts on a given totally ordered plan P consistent

with P Domain M consists of the cross pro duct of the following p olynomialsize

deterministic domains M and M to which a nal probabilistic transition will b e added

s

Before we describ e M and M precisely here are their intuitive denitions The domain

s

M is satised by plans that have the form of an assignment to the n Bo olean variables with

s

the restriction that the assignment is rep eated m times for easy checking The domain

M is satised by plans that corresp ond to satisfying assignments The comp osite of these

two domains is only satised by plans that corresp ond to satisfying assignments We will

now dene these domains formally

First M checks whether the totally ordered plan matches the regular expression

s

m m

start assign jassign

m m

assignn jassignn

m

check assign jassign assignn jassign n

Note that the m here is a constant Let go o d b e the state reached by M if the plan

s

matches that expression Otherwise the state reached is bad To clarify the actions b e

fore check are there simply to use the extra steps not used in sp ecifying the assignment

in the partially ordered plan

Next M checks whether the sequence of actions following the check action satises

the clauses of in the following sense Let a a b e this sequence M interprets each

k

subsequence a a with a assignx b as assignment b b

n

l

j n nj n l j m

to the variables x x and checks whether this assignment satises clause C If all

n j

single clauses are satised in this way then M reaches state satised

Note that M and M are dened so that they do not deal with the nal end action

s

M consists of the pro duct domain of M and M with the transitions for action end

s

as follows If M is in state bad q for any state q of M then action end lets M go

probabilistically to state accept or to state reject with probability each if M is

in state go o d satised the M under action end go es to state accept with probability

otherwise M under action end go es to state reject with probability The set of goal

states of M consists of the only state accept

We analyze the b ehavior of M under any plan P consistent with P If M under

s

P reaches state bad then M under P reaches a goal state with probability Now

consider a plan P under which M reaches the state go o dcalled a good plan Then

s

P matches the ab ove regular expression Therefore for every i f mg there exists

b f g such that all steps si b h are b etween start and check Thus all steps

i i

b etween check and end are

s i sn i s i sn i m

n n

Consequently the sequence of actions dened by the lab eling of these plan steps are

m

assign i assign i assignn i

n

Littman Goldsmith Mundhenk

This means that M checks whether all clauses of are satised by the assignment i i

n

ie M checks whether i i satises Therefore M accepts under plan P with

n

probability if the plan represents a satisfying assignment and with probability other

wise

Note that each assignment corresp onds to exactly one go o d plan Therefore the average

over all go o d plans that M accepts equals the fraction of satisfying assignments of

Since M accepts under bad plans with probability this yields that the average

over all plans consistent with P of the acceptance probabilities of M is greater than

if and only if Majsat

The complexity of the planexistence problem for partially ordered plans is identical to

that for totally ordered plans

Theorem The planexistence problem for partial ly ordered plans in at domains is NP

complete under the pessimistic optimistic and average interpretations The planexistence

PP

problem for partial ly ordered plans in propositional domains is NP complete under the

pessimistic optimistic and average interpretations

Pro of First note that a totally ordered plan is a sp ecial typ e of partially ordered plan

and its evaluation is unchanged under the p essimistic optimistic or average interpretation

In particular b ecause there is only one ordering consistent with a given totally ordered

plan the b est worst and average orderings are all the same Therefore if there exists a

totally ordered plan with value greater than then there is a partially ordered plan with

value greater than the same plan under all three interpretations

Conversely if there is a partially ordered plan with value greater than under any of

the three interpretations then there is a totally ordered plan with value greater than

This is b ecause the value of the b est worst and average ordering of a partially ordered

plan is always a lower b ound on the value of the b est consistent totally ordered plan

Given this strong equivalence the complexity of plan existence for partially ordered

plans is a direct corollary of Theorems and

The pattern for partially ordered plan evaluation in at domains is that the average

interpretation is no easier to decide than either the optimistic or p essimistic interpretations

In prop ositional domains the pattern is the opp osite the average interpretation is no harder

to decide than either the optimistic or p essimistic interpretations

Theorem The planevaluation problem for partial ly ordered plans in propositional do

PP PP

mains is NP complete under the optimistic interpretation coNP complete under the

pessimistic interpretation and PPcomplete under the average interpretation

PP

Pro of sketch For the optimistic interpretation memb ership in NP follows from the

fact that we can guess a single suciently go o d consistent total order and evaluate it in

PP

PP Theorem Hardness for NP can b e shown using a straightforward reduction from

EMajsat as in the pro of of Theorem

PP

For the p essimistic interpretation memb ership in co NP follows from the fact that we

can guess the worst consistent total order and evaluate it in PP Theorem Hardness for

Complexity of Probabilistic Planning

PP PP

EMajsat co NP can b e shown by reducing to it the co NP version of EMajsat

the pro of is a simple adaptation of the techniques used for example in Theorem ab ove

For the average interpretation the problem can b e shown to b e in PP by combining

the argument in the pro of of Theorem showing how to average over consistent totally

ordered plans with the argument in the pro of of Theorem showing how to evaluate a

plan in a prop ositional domain in PP Alternatively we could express the evaluation of a

partially ordered plan under the average interpretation as a compact probabilistic acyclic

plan Corollary states that such plans can b e evaluated in PP PPhardness follows directly

from Theorem b ecause totally ordered plans are a sp ecial case of partially ordered plans

and evaluating totally ordered plans is PPhard

Applications

To help illustrate the utility of our results this section cites several planners from the

literature and analyzes the computational complexity of the problems they attack We do

not give detailed explanations of the planners themselves for this we refer the reader to

the original pap ers We fo cus on three planning systems witness Brown University

buridan University of Washington and treeplan University of British Columbia In

the pro cess of making connections to these planners we also describ e how our work relates to

the discountedreward criterion partial observability other domain representations partial

order conditional planning p olicybased planning and approximate planning

Witness

The witness algorithm Cassandra Kaelbling Littman Kaelbling et al

solves at partially observable Markov decision pro cesses using a dynamicprogramming ap

proach The basic algorithm nds optimal unrestricted solutions to nitehorizon problems

Papadimitriou and Tsitsiklis showed that the planexistence problem for p olynomial

horizon partially observable Markov decision pro cesses is PSPACEcomplete

As an extension to their nitehorizon algorithm Kaelbling et al sketch a metho d

for nding optimal lo oping plans for some domains Although this is not presented as a

formal algorithm it is not unreasonable to say that the pure form of the problem that this

extended version of witness attacks is one of nding a valid p olynomialsize lo oping plan

for a partially observable domain The similarities b etween this problem and that describ ed

in Section are that the domains are at and that the plans are identical in form The

apparent dierences are that witness optimizes a reward function instead of probability

of goal satisfaction and that witness works in partially observable domains whereas our

results are dened in terms of completely observable domains Both of these apparent

dierences are insignicant however from a computational complexity p oint of view

First witness attempts to maximize the exp ected total discounted reward over an

innite horizon sometimes called optimizing a timeseparable value function As argued

by Condon any problem dened in terms of a sum of discounted rewards can b e

recast as one of goal satisfaction The argument pro ceeds roughly as follows Let

b e the discount factor and R s a b e the immediate reward received for taking action a in

state s

Littman Goldsmith Mundhenk

Dene

0 0

R s a R s a min

s a

R s a

0 0 0 0

R s a R s a min max

s a s a

From this we have that R s a for all s and a and that the value of any plan with

resp ect to the revised reward function is a simple linear transformation of its true value

Now we intro duce an auxiliary state g to b e the goal state and create a new transition

function t such that t s a g R s a and t s a s R s ats a s

for s g t is a welldened transition function and the probability of goal satisfaction for

any plan under transition function t is precisely the same as the exp ected total discounted

reward under reward function R and transition function t Thus any problem stated as

one of optimizing the exp ected total of discounted immediate rewards can b e turned into an

equivalent problem of optimizing goal satisfaction with only a slight change to the transition

function and one additional state This means there is no fundamental computational

complexity dierence b etween these two dierent typ es of planning ob jectives

The second apparent dierence b etween the problem solved by the extended witness

algorithm and that describ ed in Section is that of partial versus complete observability

In fact our results do address partial observability alb eit indirectly In our formulation of

the planexistence problem plans are constrained to make no conditional branches in the

totally ordered and partially ordered cases or to branch only on distinctions made by the

steptransition function in the acyclic and lo oping cases these two choices corresp ond

to unobservable and partially observable domains resp ectively In a partially observable

domain the planexistence problem b ecomes one of nding a valid p olynomialsize nite

state controller sub ject to the given observability constraints Nothing in our complexity

pro ofs dep ends on the presence or absence of additional observability constraints Therefore

it is a direct corollary of Theorem that the planexistence problem for p olynomialhorizon

plans in unobservable domains is NPcomplete Papadimitriou Tsitsiklis and of

Theorem that the planexistence problem for p olynomialsize lo oping plans in partially

observable domains is NPcomplete this is a new result

It is interesting to note that the computational complexity of searching for sizeb ounded

plans in partially observable domains is generally substantially less than that of solving the

corresp onding unconstrained partially observable Markov decision pro cess For example

we found that the planexistence problem for acyclic plans in prop ositional domains is

PP

NP complete Theorem The corresp onding unconstrained problem is that of deter

mining the existence of a historydep endent p olicy for a p olynomialhorizon compactly

represented partially observable Markov decision pro cess which is EXPSPACEcomplete

Theorem of Goldsmith et al or Theorem of Mundhenk et al b The

gap here is enormous EXPSPACE is to EXP what PSPACE is to P and EXP is already

provably intractable in the worst case In contrast to EXPSPACEcomplete problems it is

PP

conceivable that go o d heuristics for NP complete problems can b e created by extensions

of recent advances in heuristics for NPcomplete problems Therefore there is some hop e

of devising eective planning algorithms by building on the observations in this pap er and

searching for optimal sizeb ounded plans instead of optimal unrestricted plans in fact re

cent planners for b oth prop ositional domains Ma jercik Littman a b and at

domains Hansen are motivated by these results

Complexity of Probabilistic Planning

Domain Typ e Horizon Typ e SizeBounded Plan Unrestricted Plan

at p olynomial NPcomplete PSPACEcomplete

PP

prop ositional p olynomial NP complete EXPSPACEcomplete

at innite NPcomplete undecidable

prop ositional innite PSPACEcomplete undecidable

Table Complexity results for plan existence in partially observable domains

Table summarizes complexity results for planning in partially observable domains

The results for sizeb ounded plans are corollaries of Theorems and of this pa

p er The results for unrestricted plans are due to Papadimitriou and Tsitsiklis

at p olynomial Goldsmith et al prop ositional p olynomial and Hanks

innitehorizon This last result is derived by noting the isomorphism of the innite

horizon problem to the emptiness problem for probabilistic nitestate automata which is

undecidable Rabin

Buridan

The buridan planner Kushmerick et al nds partially ordered plans for prop osi

tional domains in the PSO representation There are two identiable dierences b etween

the problem solved by buridan and the problem analyzed in Section the representation

of planning problems and the fact that buridan is not restricted to nd p olynomialsize

plans We address each of these dierences b elow

Although on the surface PSO is dierent from ST either can b e converted into the

other in p olynomial time with at most a p olynomial increase in domain size In particular

the eect of an action in PSO is represented by a single decision tree consisting of prop osition

no des like ST and random no des easily simulated in ST using auxiliary prop ositions

At the leaves are a list of prop ositions that b ecome true and another list of prop ositions

that b ecome false should that leaf b e reached This typ e of correlated eect is also easily

represented in ST using the chain rule of probability theory to decomp ose the probability

distribution into separate probabilities for each prop osition and careful use of the new

sux Thus any PSO domain can b e converted to a similar size ST domain quickly

Similarly a domain in ST can b e converted to PSO with at most a p olynomial expansion

This conversion is to o complex to sketch here but follows from the pro of of equivalence b e

tween ST and a simplied representation called IF Littman a Given the p olynomial

equivalence b etween ST and PSO any complexity results for ST carry over to PSO

The results describ ed in this pap er concern planning problems in which a b ound is given

on the size of the plan sought Although Kushmerick et al do not explicitly describ e

their planner as one that prefers small plans to large plans the design of the planner as

one that searches through the space of plans makes the notion of plan size central to the

algorithm Indeed the publicdomain buridan implementation uses plan size as part of a

b estrst search pro cedure for identifying a suciently successful plan This means that all

other things b eing equal shorter plans will b e found b efore larger plans Furthermore to

assure termination the planner only considers a xed numb er of plans b efore halting thus

To b e more precise this is true for complexity classes closed under logspace reductions

Littman Goldsmith Mundhenk

putting a limit indirectly on the maximum allowable plan size So although buridan do es

not attempt to solve precisely the same problem that we considered it is fair to say that the

problem we consider is an idealization of the problem attacked by buridan Regardless

our lower b ounds on complexity apply to buridan

Kushmerick et al lo oked at generating suciently successful plans under b oth

the optimistic interpretation and the p essimistic interpretation They also explicitly ex

amined the planevaluation problem for partially ordered plans under b oth interpretations

Therefore Theorems and apply to buridan

The more sophisticated cburidan planner Drap er et al extends buridan to

plan in partially observable domains and to pro duce plans with conditional execution The

results of our work also shed light on the computational complexity of the problem addressed

by cburidan Drap er et al devised a representation for partially ordered acyclic

conditional plans In this representation each plan step generates an observation label as

a function of the probabilistic outcome of the step Each step also has an asso ciated set of

context labels dictating the circumstances under which that step must b e executed A plan

step is executed only if its context lab els are consistent with the observation lab els pro duced

in earlier steps In its totally ordered form this typ e of plan can b e expressed as a compact

acyclic plan Corollary can b e used to show that the planevaluation and planexistence

problems for a totally ordered version of cburidans conditional plan representation in

PP

prop ositional domains are PPcomplete and NP complete resp ectively

In our results ab ove we consider evaluating and searching for plans that are partially

ordered and plans that have conditional execution but not b oth at once Nonetheless the

same sorts of techniques presented in this pap er can b e applied to analyzing the problems

attacked by cburidan For example consider the planexistence problem for cburidans

partially ordered conditional plans under the optimistic interpretation This problem asks

whether there is a partially ordered conditional plan that has some total order that reaches

the goal with sucient probability This is equivalent to asking whether there is a totally

ordered conditional plan that reaches the goal with sucient probability Therefore the

PP

problem is NP complete by the argument in the previous paragraph

In spite of many sup ercial dierences b etween the problems analyzed in this pap er and

those studied by the creators of the buridan planners our results are quite relevant to

understanding their work

Treeplan

A family of planners have b een designed that generate a decisiontreebased representation

of stationary p olicies mappings from state to action Boutilier et al Boutilier

Po ole Boutilier Dearden in probabilistic prop ositional domains we refer

to these planners collectively as the treeplan planners Once again these planners solve

problems that are not identical to the problems addressed in this pap er but are closely

related

The planner describ ed by Boutilier et al nds solutions that maximize exp ected

total discounted reward in compactly represented Markov decision pro cesses the domain

representation used is expressively equivalent to ST As mentioned earlier the dierence

b etween maximizing goal satisfaction and maximizing exp ected total discounted reward is a

Complexity of Probabilistic Planning

sup ercial one so the problem addressed by this planner is EXPcomplete Littman a

Although the p olicies used by Boutilier et al app ears quite dissimilar from the nite

state controllers describ ed in our work p olicies can b e converted to a typ e of similarly

sized compact looping plan an extension of the typ e of plan describ ed in Corollary

The conversion from stationary p olicies to lo oping plans is as describ ed in the pro of of

Theorem except that the resulting plans are represented compactly

In later work Boutilier and Dearden show how it is p ossible to limit the size

of the representation of the p olicy in treeplan and still obtain approximately optimal

p erformance This is necessary b ecause in general the size of decision trees needed to

represent the optimal p olicies can b e exp onentially large By keeping the decision trees

from getting to o large the resulting planner b ecomes sub ject to an extension of Theorem

and therefore attacks a PSPACEcomplete problem

One emphasis of Boutilier and Dearden is on nding approximately optimal

solutions with the hop e that doing so is easier than nding optimal solutions We do

not explore the worstcase complexity of approximation in this pap er although Lusena

Goldsmith and Mundhenk have pro duced some strong negative results in this area

A related issue is one of using simulation random sampling to nd approximately optimal

solutions to probabilistic planning problems Some empirical successes have b een obtained

with the related approach of reinforcement learning Tesauro Crites Barto

but once again the worstcase complexity of probabilistic planning is not known to b e any

lower for approximation by simulation

Conclusions

In this pap er we explored the computational complexity of plan evaluation and plan ex

istence in probabilistic domains We found that in compactly represented prop ositional

domains restricting the size and form of the p olicies under consideration reduced the

computational complexity of plan existence from EXPcomplete for unrestricted plans to

PP

PSPACEcomplete for p olynomialsize lo oping plans and NP complete for p olynomial

size acyclic plans In contrast in at domains restricting the form of the p olicies under

consideration increased the computational complexity of plan existence from Pcomplete

for unrestricted plans to NP complete for totally ordered plans this is b ecause a plan that

is smaller than the domain in which it op erates is often unable to exploit imp ortant Markov

prop erties of the domain We were able to characterize precisely the complexity of all

problems we examined with regard to the current state of knowledge in complexity theory

PP PP

Several problems we studied turned out to b e NP complete The class NP promises

to b e very useful to researchers in uncertainty in articial intelligence b ecause it captures

the typ e of problems resulting from cho osing guessing a solution and then evaluating its

probabilistic b ehavior This is precisely the typ e of problem faced by planning algorithms in

probabilistic domains and captures imp ortant problems in other domains as well such as

constructing explanations in b elief networks and designing robust communication networks

PP

We provide a new conceptually simple NP complete problem EMajsat that may b e

useful in further explorations in this direction

The basic structure of our results is that if plan evaluation is complete for some class C

C

then plan existence is typically NP complete This same basic structure holds in determin

Littman Goldsmith Mundhenk

istic domains evaluating a totally ordered plan in a prop ositional domain is Pcomplete for

suciently p owerful domain representations and determining the existence of a p olynomial

P

size totally ordered plan is NP NPcomplete

From a pragmatic standp oint the intuition that searching for small plans is more ef

cient than searching for arbitrary size plans suggests that exact dynamicprogramming

algorithms which are so successful in at domains may not b e as eective in prop ositional

domains they do not fo cus their eorts on the set of small plans Algorithmdevelopment

PP

energy therefore might fruitfully b e sp ent devising heuristics for problems in the class NP

as this class captures the essence of searching for small plans in probabilistic domainssome

early results in this direction are app earing Ma jercik Littman a b Complex

PP

ity theorists have only recently b egun to explore classes such as NP that lie b etween the

p olynomial hierarchy and PSPACE and algorithm designers have come to these classes even

more recently As this pap er marks the b eginning of our exploration of this class of prob

lems much work is still to b e done in probing algorithmic implications but it is our hop e

PP

that heuristics for NP could lead to p owerful metho ds for solving a range of imp ortant

uncertaintysensitive combinatorial problems

Acknowledgements

This work was supp orted in part by grants NSFIRICAREER Littman and

NSF CCR Goldsmith We gratefully acknowledge Andrew Klapp er Anne Con

don Matthew Levy Steve Ma jercik Chris Lusena Mark Peot and our reviewers for helpful

feedback and conversations on this topic

App endix A Complexity of EMajsat

The EMajsat problem is given a pair k consisting of a Bo olean formula of n

variables x x and a numb er k n is there an assignment to the rst k variables

n

x x such that the ma jority of assignments to the remaining n k variables x x

n

k k

satises

For k n this is precisely Bo olean satisability a classic NPcomplete problem This

is b ecause we are asking whether there exists an assignment to al l the variables that makes

true For k EMajsat is precisely Majsat a wellknown PPcomplete problem

This is b ecause we are asking whether the ma jority of all total assignments makes true

Deciding an instance of EMajsat for intermediate values of k has a dierent character

It involves b oth an NPtyp e calculation to pick a go o d setting for the rst k variables and a

PPtyp e calculation to see if the ma jority of assignments to the remaining variables makes

true This is akin to searching for a go o d answer plan schedule coloring b elief network

explanation etc in a combinatorial space when go o d is determined by a computation

over probabilistic quantities This is just the typ e of computation describ ed by the class

PP PP

and we show next that EMajsat is NP complete NP

PP

Theorem EMajsat is NP complete

Complexity of Probabilistic Planning

PP

Pro of Memb ership in NP follows directly from denitions To show completeness of

PP

NP

closure of the PPcomplete EMajsat we rst observe Toran that NP is the

m

PP

set Majsat Thus any NP computation can b e mo deled by a nondeterministic machine

N that on each p ossible computation rst guesses a sequence s of bits that controls its

nondeterministic moves deterministically p erforms some computation on input x and s

and then writes down a formula q with variables in z z as a query to Majsat

xs

l

Finally N x with oracle Majsat accepts if and only if for some s q Majsat

xs

Given any input x like in Co oks Theorem we can construct a formula with variables

x

y y and z z such that for every assignment a a b b it holds that

k l k l

a a b b q b b Thus k EMajsat if and only if for

x xa a x

k l l

k

some assignment s to y y q Majsat if and only if N x accepts

xs

k

References

Allender E Ogihara M Relationships among PL L and the determinant

Theoretical Informatics and Applications

Alvarez C Jenner B A very hard logspace counting class Theoretical Computer

Science

Backstrom C Expressive equivalence of planning formalisms Articial Intel ligence

Backstrom C Neb el B Complexity results for SAS planning Computational

Intel ligence

Balcazar J Daz J Gabarro J Structural Complexity III EATCS

Monographs on Theoretical Computer Science Springer Verlag

Boro din A Co ok S Pipp enger N Parallel computation for wellendowed

rings and spaceb ounded probabilistic machines Information and Control

Boutilier C Dean T Hanks S Decision theoretic planning Structural as

sumptions and computational leverage In preparation

Boutilier C Dearden R Approximating value trees in structured dynamic pro

gramming In Saitta L Ed Proceedings of the Thirteenth International Conference

on Machine Learning

Boutilier C Dearden R Goldszmidt M Exploiting structure in p olicy con

struction In Proceedings of the Fourteenth International Joint Conference on Articial

Intel ligence

Boutilier C Po ole D Computing optimal p olicies for partially observable

decision pro cesses using compact representations In Proceedings of the Thirteenth

National Conference on Articial Intel ligence pp AAAI PressThe MIT

Press

Littman Goldsmith Mundhenk

Bylander T The computational complexity of prop ositional STRIPS planning

Articial Intel ligence

Cassandra A R Kaelbling L P Littman M L Acting optimally in partially

observable sto chastic domains In Proceedings of the Twelfth National Conference on

Articial Intel ligence pp Seattle WA

Chapman D Planning for conjunctive goals Articial Intel ligence

Condon A The complexity of sto chastic games Information and Computation

Crites R H Barto A G Improving elevator p erformance using reinforcement

learning In Touretzky D S Mozer M C Hasselmo M E Eds Advances in

Neural Information Processing Systems Cambridge MA The MIT Press

Dearden R Boutilier C Abstraction and approximate decisiontheoretic plan

ning Articial Intel ligence

Drap er D Hanks S Weld D Probabilistic planning with information gathering

and contingent execution In Proceedings of the AAAI Spring Symposium on Decision

Theoretic Planning pp

Drummond M Bresina J Anytime synthetic pro jection Maximizing the

probability of goal satisfaction In Proceedings of the Eighth National Conference on

Articial Intel ligence pp Morgan Kaufmann

Erol K Nau D S Subrahmanian V S Complexity decidability and unde

cidability results for domainindep endent planning Articial Intel ligence

Gill J Computational complexity of probabilistic Turing machines SIAM Journal

on Computing

Goldman R P Bo ddy M S Epsilonsafe planning In Proceedings of the th

Conference on Uncertainty in Articial Intel ligence UAI pp Seattle

WA

Goldsmith J Littman M Mundhenk M a The complexity of plan existence and

evaluation in probabilistic domains Tech rep CS Department of Computer

Science Duke University

Goldsmith J Littman M L Mundhenk M b The complexity of plan existence

and evaluation in probabilistic domains In Proceedings of the Thirteenth Annual Con

ference on Uncertainty in Articial Intel ligence UAI pp San Francisco

CA Morgan Kaufmann Publishers

Goldsmith J Lusena C Mundhenk M The complexity of deterministically

observable nitehorizon Markov decision pro cesses Tech rep Department

of Computer Science University of Kentucky

Complexity of Probabilistic Planning

Hanks S Decisiontheoretic planning in unobservable domains is undecidable

Personal communication

Hansen E A FiniteMemory Control of Partial ly Observable Systems PhD thesis

University of Massachusetts

Jung H On probabilistic time and space In Proceedings th ICALP pp

Lecture Notes in Computer Science SpringerVerlag

Kaelbling L P Littman M L Cassandra A R Planning and acting in

partially observable sto chastic domains Articial Intel ligence

Ko enig S Simmons R G Risksensitive planning with probabilistic decision

graphs In Proceedings of the th International Conference on Principles of Know ledge

Representation and Reasoning pp

Kushmerick N Hanks S Weld D S An algorithm for probabilistic planning

Articial Intel ligence

Ladner R Polynomial space counting problems SIAM Journal on Computing

Lin SH Dean T Generating optimal p olicies for highlevel plans with con

ditional branches and lo ops In Proceedings of the Third European Workshop on

Planning pp

Littman M L a Probabilistic prop ositional planning Representations and complex

ity In Proceedings of the Fourteenth National Conference on Articial Intel ligence

pp AAAI PressThe MIT Press

Littman M L b Solving large POMDPs Lessons from complexity theory Talk

presented at the DARPA AI Workshop in Providence RI Slides available at URL

httpwwwcsdukeedumlittmantalksdarpapomdpps

Lovejoy W S A survey of algorithmic metho ds for partially observable Markov

decision pro cesses Annals of Operations Research

Lusena C Goldsmith J Mundhenk M Nonapproximability results for Markov

decision pro cesses Tech rep UK CS Dept TR University of Kentucky

Ma jercik S M Littman M L a MAXPLAN A new approach to probabilistic

planning In Simmons R Veloso M Smith S Eds Proceedings of the Fourth

International Conference on Articial Intel ligence Planning pp AAAI Press

Ma jercik S M Littman M L b Using caching to solve larger probabilistic

planning problems In Proceedings of the Fifteenth National Conference on Articial

Intel ligence pp The AAAI PressThe MIT Press

Mansell T M A metho d for planning given uncertain and incomplete information

In Proceedings of the th Conference on Uncertainty in Articial Intel ligence pp

Morgan Kaufmann Publishers

Littman Goldsmith Mundhenk

McAllester D Rosenblitt D Systematic nonlinear planning In Proceedings of

the th National Conference on Articial Intel ligence pp

Mundhenk M Goldsmith J Allender E a The complexity of p olicyevaluation

for nitehorizon partiallyobservable Markov decision pro cesses In Proceedings of

nd Symposium on Mathematical Foundations of Computer Science published in

Lecture Notes in Computer Science SpringerVerlag

Mundhenk M Goldsmith J Lusena C Allender E b Encyclopaedia of com

plexity results for nitehorizon Markov decision pro cess problems Tech rep UK CS

Dept TR University of Kentucky

Papadimitriou C H Computational Complexity AddisonWesley Reading MA

Papadimitriou C H Tsitsiklis J N The complexity of Markov decision pro

cesses Mathematics of Operations Research

Platzman L K A feasible computational approach to innitehorizon partially

observed Markov decision problems Tech rep J Georgia Institute of Technology

Atlanta GA

Puterman M L Markov Decision ProcessesDiscrete Stochastic Dynamic Pro

gramming John Wiley Sons Inc New York NY

Rabin M O Probabilistic automata Information and Control

Roth D On the hardness of approximate reasoning Articial Intel ligence

Simon J On some central problems in computational complexity PhD thesis

Cornell University Also Cornell Department of Computer Science Technical Rep ort

TR

Smith D E Williamson M Representation and evaluation of plans with lo ops

Working notes for the Stanford Spring Symp osium on Extended Theories of

Action

Tesauro G TDGammon a selfteaching backgammon program achieves master

level play Neural Computation

To da S PP is as hard as the p olynomialtime hierarchy SIAM Journal on Com

puting

Toran J Complexity classes dened by counting quantiers Journal of the ACM

Vinay V Counting auxiliary pushdown automata and semiunb ounded arithmetic

circuits In Proc th Structure in Complexity Theory Conference pp IEEE