<<

Solving Multistage Influence Diagrams using Branch-and-Bound Search

Changhe Yuan, Xiaojian Wu and Eric A. Hansen Department of and Engineering Mississippi State University Mississippi State, MS 39762 {cyuan,hansen}@cse.msstate.edu, [email protected]

Abstract A branch-and-bound approach to influence diagram eval- uation appears to have been first suggested by Pearl [16]. He proposed it as an improvement over the classic method A branch-and-bound approach to solving influ- of unfolding an influence diagram into a decision and ence diagrams has been previously proposed in solving it using the rollback method, which itself is a form the literature, but appears to have never been of dynamic programming [7]. In Pearl’s words: implemented and evaluated – apparently due to the difficulties of computing effective bounds for the branch-and-bound search. In this paper, A hybrid method of evaluating influence dia- we describe how to efficiently compute effective grams naturally suggests itself. It is based on bounds, and we develop a practical implementa- the realization that decision trees need not ac- tion of depth-first branch-and-bound search for tually be generated and stored in their total- influence diagram evaluation that outperforms ity to produce the optimal policy. A decision existing methods for solving influence diagrams tree can also be evaluated by traversing it in a with multiple stages. depth-first, manner using a mea- ger amount of storage space (proportional to the depth of the tree). Moreover, branch-and-bound techniques can be employed to prune the search 1 Introduction space and permit an evaluation without exhaus- tively traversing the entire tree... an influence di- An influence diagram [7] is a compact representation of agram can be evaluated by sequentially instan- the relations among random variables, decisions, and pref- tiating the decision and observation nodes (in erences in a domain that provides a framework for decision chronological order) while treating the remain- making under uncertainty. Many have been de- ing chance nodes as a Bayesian network that sup- veloped to solve influence diagrams [2, 3, 8, 15, 17, 18, plies the probabilistic parameters necessary for 19, 21]. 
Most of these algorithms, whether they build a tree evaluation. (p. 311) secondary structure or not, are based on the bottom-up dy- namic programming approach. They start by solving small However, neither Pearl nor anyone else appears to have low-level decision problems and gradually build on these followed up on this suggestion and implemented such an results to solve larger problems until the solution to the . The apparent reason is the difficulty of com- global-level decision problem is found. The drawback of puting effective bounds to prune the search tree. Qi and these methods is that they can waste computation in solv- Poole [17] proposed a similar search-based method for ing decision scenarios that have zero probability or that are solving influence diagrams, but with no method for com- unreachable from any initial state by following an optimal puting bounds; in fact, their implementation relied on the decision policy. trivial infinity upper bound to guide the search. Recently, This drawback can be overcome by adopting a branch-and- Marinescu [12] proposed a related search-based approach bound approach to solving an influence diagram that uses to influence diagram evaluation. But again, he proposed no a search tree to represent all possible decision scenarios. method for computing bounds; his implementation relies This approach can use upper bounds on maximum utility on brute-force search. Even without bounds to prune the to prune branches of the search tree that correspond to low- search space, note that both Qi and Poole and Marinescu quality decisions that cannot be part of an optimal policy; argue that a search-based approach has advantages – for it can also prune branches that have zero probability. example, it can prune branches that have zero probability. 
In this paper, we describe an implemented depth-first assumed for an influence diagram, which means the infor- branch-and-bound search algorithm for influence diagram mation variables of earlier decisions are also information evaluation that includes efficient techniques for comput- variables of later decisions. We call these past information ing bounds to prune the search tree. To compute effec- variables the history, and, for convenience, we assume that tive bounds, our algorithm adapts and integrates two pre- there are explicit information arcs from history information vious contributions. First, we adapt the work of Nilsson variables to decision variables. Finally, each utility node and Hohle¨ [14] on computing an upper bound on the max- Ui ∈ U represents a function that maps each configuration imum expected utility of an influence diagram. The moti- of its parents to a utility value the represents the prefer- vation for their work was to bound the quality of strategies ence of the decision maker. (Utility variables typically do found by an approximation algorithm for solving limited- not have other variables as children except multi-attribute memory influence diagrams, and their bounds are not in utility/super-value variables.) a form that can be directly used for branch-and-bound The decision variables in an influence diagram are typically search. We show how to adapt their approach to branch- assumed to be temporally ordered, i.e., the decisions have and-bound search. Second, we adapt the recent work of to be made in a particular order. Suppose there are n deci- Yuan and Hansen [20] on solving the MAP problem for sion variables D ,D , ..., D in an influence diagram. The Bayesian networks using branch-and-bound search. Their 1 2 n decision variables partition the variables in X into a col- work describes an incremental method for computing upper lection of disjoint sets I , I , ..., I . 
For each k, where bounds based on join tree evaluation that we show allows 0 1 n 0 < k < n, I is the set of chance variables that must such bounds to be computed efficiently during branch-and- k be observed between D and D . I is the set of initial bound search. In addition, we describe some novel methods k k+1 0 evidence variables that must be observed before D . I is for constructing the search tree and computing probabilities 1 n the set of variables left unobserved when decision D is and bounds that contribute to an efficient implementation. n made. Therefore, a partial order ≺ is defined on the influ- Our experimental results show that this approach leads to ence diagram over X ∪ D, as follows: an exact algorithm for solving influence diagrams that out- performs existing methods for solving multistage influence I0 ≺ D1 ≺ I1 ≺ ... ≺ Dn ≺ In. (1) diagrams. A solution to the decision problem defined by an influence 2 Background diagram is a series of decision rules for the decision vari- ables. A decision rule for Dk is a mapping from each con- figuration of its parents to one of the actions defined by We begin with a brief review of influence diagrams and the decision variable. A decision policy (or strategy) is a algorithms for solving them. We also introduce an example series of decision rules with one decision rule for each de- of multi-stage decision making that will serve to illustrate cision variable. The goal of solving an influence diagram the results of the paper. is to find an optimal decision policy that maximizes the ex- pected utility. The maximum expected utility is equal to 2.1 Influence Diagrams ∑ ∑ ∑ ∑ max ... max P (X|D) Uj(P a(Uj)). D1 Dn An influence diagram is a directed acyclic graph G contain- I0 In−1 In j ing variables V of a decision domain. The variables can be In general, the summations and maximizations are not classified into three groups, V = X ∪ D ∪ U, where X is commutable. 
The methods presented in Section 2.3 differ the set of oval-shaped chance variables that specify the un- in the various techniques they use to carry out the summa- certain decision environment, D is the set of square-shaped tions and maximizations in this order. decision variables that specify the possible decisions to be made in the domain, and U are the diamond-shaped util- Recent research has begun to relax the assumption of or- ity variables representing a decision maker’s preferences. dered decisions. In particular, Jensen proposes the frame- As in a Bayesian network, each chance variable Xi ∈ X work of unconstrained influence diagrams to allow a partial is associated with a conditional probability distribution ordering among the decisions [9]. Other research relaxes P (Xi|P a(Xi)), where P a(Xi) is the set of parents of Xi the no-forgetting assumption, in particular, the framework in G. Each decision variable Dj ∈ D has multiple informa- of limited-memory influence diagrams [10]. Although the tion states, where an information state is an instantiation of approach we develop can be extended to these frameworks, the variables with arcs leading into Dj; the selected action we do not consider the extension in this paper. is conditioned on the information state. Incoming arcs into a decision variable are called information arcs; variables at 2.2 Example the origin of these arcs are assumed to be observed before the decision is made. These variables are called the infor- To illustrate decision making using multi-stage influence mation variables of the decision. No-forgetting is typically diagrams, consider a simple maze navigation problem [6, (1,1) ns_0 (2) ns_1 (9)

es_0 (3) es_1 (10) x_2 (14) x_0 (0) x_1 (7) d_0 (6) d_1 (13) u y_0 (1) y_1 (8) y_2 (15) ss_0 (4) ss_1 (11) (7,9) ws_0 (5) ws_1 (12) (a) (b) (a)

Figure 1: Two maze domains. A star represents the goal. R 2 3 4 5 6 7 8 9 10 11 12 13 14]. Figure 1 shows four instances of the problem. The shaded tiles represent walls, the white tiles represent mov- 7 8 13 0 1 2 3 4 able space, and the white tiles with a star represent goal 14 15 5 6 7 8 states. An agent is randomly placed in a non-goal state. It (b) has five available actions that it can use to move toward the goal: it can move a single step in any of the four compass directions, or it can stay in place. The effect of a movement Figure 2: (a) An example influence diagram and (b) its action is stochastic. The agent successfully moves in the strong join tree. The numbers in both figures stand for the intended direction with probability 0.89. It fails to move indices of the variables. The node with “R” is the strong with probability 0.089, it moves sideways with probabil- root. ity 0.02 (0.01 for each side), and it moves backward with probability 0.001. If movement in some direction would take it into a wall, that movement has probability zero, and the remaining probabilities are normalized. The agent has Bayesian networks by converting decision nodes into ran- four sensors, one for each direction, which sense whether dom variables such that the solution of an inference prob- the neighboring tile in that direction is a wall. Each sen- lem in the Bayesian network correspond to the optimal de- sor is noisy; it detects the presence of a wall correctly with cision policy for the influence diagram [3, 21]. Another probability 0.9 and mistakenly senses a wall when none is method [19] transforms an influence diagram into a valua- present with probability 0.05. The agent chooses an action tion network and applies variable elimination to solve the at each of a sequence of stages. If the agent is in the goal valuation network. 
Recent work compiles influence dia- state after the final stage, it receives a utility value of 1.0; grams into decision circuits and uses the decision circuits otherwise, it receives a utility value of 0.0. to compute optimal policies [2]; this approach takes advan- tage of local structure present in an influence diagram, such Figure 2(a) shows the influence diagram for a two-stage as deterministic relations. version of the maze problem. The variables xi and yi repre- sent the location coordinates of the agent at time i, the vari- In the following, we describe a state-of-the-art method for solving an influence diagram using a strong join tree [8]. ables {nsi, esi, ssi, wsi} are the sensor readings in four This method is viewed by many as the fastest general algo- directions at the same time point, and the variable di rep- resents the action taken by the agent. The utility variable rithm, and we use its performance as a benchmark for our u assigns a value depending on whether or not the agent branch-and-bound approach. A join tree is strong if it has is in the goal state after the last action is taken. Since the at least one clique, R, called the strong root, such that for same variables occur at each stage, we can make the influ- any pair of adjacent cliques, C1 and C2, with C1 closer to ∩ ence diagram arbitrarily large by increasing the number of R than C2, the variables in separator S = C1 C2 must stages. appear earlier in the partial order defined in Equation (1) than C2 \ C1. A strong join tree for the influence diagram in Figure 2(a) is shown in Figure 2(b). 2.3 Evaluation algorithms An influence diagram can be solved exactly by message Many approaches have been developed for solving influ- passing on the strong join tree. Each clique C in the join ence diagrams. The simplest is to unfold an influence di- tree contains two potentials, a probability potential ϕC and agram into an equivalent decision tree and solve it in that a utility potential ψC . 
For clique C2 to send a message to form [7]. Another approach called arc reversal solves an clique C1, ϕC1 and ψC1 should be updated as follows [8]: influence diagram directly using techniques such as arc- reversal and node-removal [15, 18]; when a decision vari- able is removed, we obtain the optimal decision rule for the ψ ϕ′ = ϕ × ψ ; ψ′ = ψ + S ; C1 C1 S C1 C1 decision. Several methods reduce influence diagrams into ϕS where information arcs from each variable in R to D is guaran- ∑ × teed to have a maximum expected utility that is greater than ϕS = ϕC2 ; ψS = max ϕC2 ψC2 . ∗ C2\S or equal to the maximum expected utility for G. We call G C \S 2 an upper-bound influence diagram for G.

In contrast to the join tree algorithm for Bayesian net- Use of an upper-bound influence diagram to compute works [11], only the collection phase of the algorithm is bounds only makes sense if the upper-bound influence dia- needed to solve an influence diagram. After the collection gram is simpler and much easier to solve than the original phase, we can obtain the maximum expected utility by car- influence diagram. In fact, adding information arcs to an rying out the remaining summations and maximizations in influence diagram can simplify its evaluation by making the root. In addition, we can extract the optimal decision some other information variables non-requisite. An infor- rules for the decision variables from some of the cliques mation variable Ii is said to be non-requisite [10, 13] for a that contain these variables. decision node D if Building a join tree for a Bayesian network may fail if the I ⊥ (U ∩ de(D))|D ∪ (P a(D) \{I }), (2) join tree is too large to fit in memory. This is also true i i for influence diagrams. In fact, the memory requirement of where de(D) are the descendants of D.A reduction of a strong join tree for an influence diagram is even higher an influence diagram is obtained by deleting all the non- because of the constrained order in Equation (1). Conse- requisite information arcs [14]. quently, the join tree algorithm is typically infeasible for solving all but very small influence diagrams. Because of the no-forgetting assumption, a decision vari- able at a late stage may have a large number of history in- formation variables. For decision making under imperfect 3 Computing bounds information, all of these information variables are typically requisite. 
As a result, the size of the decision rules grows To implement a branch-and-bound algorithm for solving exponentially as the number of decision stages increases, influence diagrams, we need a method for computing which is the primary reason multi-stage influence diagrams bounds – in particular, for computing upper bounds on the are very difficult to solve. utility that can be achieved beginning at a particular stage of the problem, given the history up to that stage. A triv- In constructing an upper-bound influence diagram, we want ial upper bound is the largest state-dependent value of the to add information arcs that make some or all of the history utility node of the influence diagram. In this section, we information variables for a decision node non-requisite, in discuss how to compute more informative bounds. There order to simplify the influence diagram and make it eas- has been little previous work on this topic. Nilsson and ier to solve. We adopt the strategy proposed by Nilsson Hohle¨ [14] develop an approach to bounding the subopti- and Hohle¨ [14]. Let nd(X) be the non-descendant vari- mality of policies for limited-memory influence diagrams ables of variable X, let fa(X) = P a(X) ∪ {X} be the that are found by an approximation algorithm. Dechter [4] family of variable X (i.e., the variable and its parents), let ∪ describes an approach to computing upper bounds in an fa(X) = Xi∈Xfa(Xi) be the family of the set of vari- △ influence diagram that is based on mini-bucket partition- ables X (i.e., the variables and their parents), and let j be { } ing. Neither work considers how to use these bounds in a D1, ...Dj be a sequence of decision variables from stage branch-and-bound search algorithm. 1 to stage j. The following theorem is proved by Nilsson and Hohle¨ [14]. The approach we develop in the rest of this paper is based Theorem 2. 
For an influence diagram with the constrained on the work of Nilsson and Hohle¨ [14], which we extend by order in Equation (1), if we add to each decision variable showing how it can be used to compute bounds for branch- D the following new information variables in the order of and-bound search. The general strategy is to create an in- j j = n, ..., 1, fluence diagram with a value that is guaranteed to be an upper bound on the value of the original influence diagram, {| || △ Nj = arg minB⊆(Bj ∩nd(Dj )) B fa( j) but that is also much easier to solve. We use the fact that ⊥ (U ∩ de(D ))|(B ∪ {D })}, (3) additional information can only increase the value of an in- j j fluence diagram. Since this reflects the well-known fact where that information value is non-negative, we omit (for space { U ∪ D j=k, reasons) a proof of the following theorem. B = j ∩k { ∈ | ⊥ ∩ | } Theorem 1. Let G be an influence diagram and D a de- i=j+1 n V n (U de(Dj)) fa(Di) j

ws_0 (5) ws_1 (12) Figure 3: Relations between past and future information (a) states and the minimum sufficient information set.

What this theorem means is that for each decision vari- (b) able Dj in an influence diagram, there is a set of informa- tion variables N such that the optimal policy for D de- j j Figure 4: (a) the upper-bound influence diagram for the pends only on these information variables, and is otherwise diagram in Figure 2(a), and (b) its strong join tree. history-independent. Note that the set Nj for decision vari- able Dj can contain both information variables from the original influence diagram and information variables cre- also find that the SIS set for d is {x , y }. By making ated by adding new information arcs to the diagram. The 0 0 0 {x , y } and {x , y } information variables for d and d set N of information variables can be interpreted as the 1 1 0 0 1 0 j respectively, and reducing the influence diagram, we obtain current “state” of the decision problem, such that if the de- the much simpler influence diagram in Figure 4(a). The cision maker knows the current state, it does not need to strong join tree for the new influence diagram is shown in know any previous history in order to select an optimal ac- Figure 4(b), which is also much smaller than the strong tion; in this sense, the state satisfies the Markov property. join tree for the original model. Since the upper-bound in- Consider a decision-making problem in a partially observ- fluence diagram assumes the actual location is directly ob- able domain, such as the maze domain of Section 2.2. The servable to the agent, it effectively transforms a partially agent cannot directly observe its location in the maze and observable decision problem into a fully observable one. must rely on imperfect sensor readings to infer its location. The resulting influence diagram and, hence, its join tree is In this domain, adding information arcs from the location much easier to solve. 
variables to a decision variable, so that the agent know the Finding the sufficient information set (SIS) for a decision current location at the time it chooses an action, makes both variable in an influence diagram is equivalent to finding a current and history sensor readings non-requisite, which re- minimum separating set in the moralized graph of the in- sults in a much simpler influence diagram, which in this fluence diagram [14]. Acid and de Campos [1] propose case serves as an upper-bound influence diagram. an algorithm based on the Max-flow Min-cut algorithm [5] We call Nj a sufficient information set (SIS) for Dj. The in- for finding a minimum separating set between two sets of tuition behind this approach is illustrated by Figure 3. The nodes in a Bayesian network with some of the separating shaded oval shows the SIS set ND for decision D. The variables being fixed. We use their algorithm to find the SIS past state affects the variables in ND ∪ {D}, illustrated by sets. The two sets of nodes are fa(△j) and U ∩ de(Dj). the arc labeled 1, and ND ∪ {D} affects the future state, The only fixed separating variable is Dj. The algorithm as illustrated by arc 2. The future state further determines first introduces two dummy variables, source and sink, to the values of the utility variables. ND ∪ {D} d-separates the moralized graph. The source is connected to the neigh- the past and future states and prevents the direct influence boring variables of fa(△j), and the sink to the variables in shown by arc 3. The concept of a sufficient information set de(Dj)∩an(U∩de(Dj)). We then create a max-flow net- is related to the concept of extremality, as defined in [21], work out of the undirected graph by assigning each edge and the concept of blocking, as defined in [13]. capacity 1.0. A solution gives a minimum separating set between the sink and source that contains D . 
To construct an upper-bound influence diagram, we find the j SIS set for each decision in the order of Dn, ..., D1 and We briefly mention some issues that are not described by make the variables in each SIS set information variables Nilsson and Hohle¨ [14], but that need to be considered in for the corresponding decisions. We then delete the non- an implementation. The first issue is how to define the size requisite information arcs. Consider the influence diagram of an SIS set. Theorem 2 uses the cardinality of the SIS in Figure 4(a) as an example. The information set for d1 is set as the minimization criterion. Another viable choice is originally {ns0, es0, ss0, ws0, d0, ns1, es1, ss1, ws1}. We to use weight, defined as the product of number of states, find that its sufficient information (SIS) set is {x1, y1}. We of the variables in an SIS set as the minimization criterion. The relation between these two criteria is similar to the re- lation between treewidth and weight in constructing a junc- ns0 tion tree. While treewidth tells us how complex a Bayesian es0 es0 network is at the structure level, weight provides an idea ss0 ss0 ss0 ss0 on how large the potentials of the junction tree are at the quantitative level. Both methods have been used. In our ws0 ws0 ws0 ws0 implementation, we use the cardinality. d0 d0 d0 d0 A second issue is that multiple candidate SIS sets may exist for a decision variable. In that case, we need some criterion Figure 5: The AND/OR tree used in our approach. Oval- for selecting the best one. In our implementation, we select shaped nodes are AND nodes, and square-shaped nodes are the candidate SIS set that is closest to the descendant utility OR nodes. variables of the decision. Note that other candidate sets are all d-separated from the utility node by the closest SIS set. This choice has the advantage that the resulting influence tion set can belong to different clusters of a join tree. 
For diagram is easiest to evaluate; however, other choices may computational convenience, our AND/OR tree is based on result in an influence diagram that gives a tighter bound. the structure of the strong join tree in Figure 4(b). For the maze example, we order the variables in the information set { } 4 Branch-and-bound search ns0, es0, ss0, ws0 according to the order in Equation (5). Note that the join tree does not have a clique that contains all four variables; in fact, they are all in different cliques. In this section, we describe how to use the upper-bound So we consider the variables one by one. That means that influence diagram to compute bounds for a depth-first our AND/OR tree allows several layers of AND nodes to branch-and-bound search algorithm that solves the origi- alternate with a layer of OR nodes. See Figure 5 for an nal influence diagram. We begin by showing how to rep- example of the kind of AND/OR tree constructed by our resent the search space as an AND/OR tree. A naive ap- search algorithm, and note that the first four layers of this proach to computing bounds requires evaluating the en- AND/OR tree are all AND layers. tire upper-bound influence diagram at each OR node of the search tree, which is computationally prohibitive. To make Each path from the root of the AND/OR tree to a leaf corre- branch-and-bound search feasible, we rely on an incremen- sponds to a complete instantiation of the information vari- tal approach to computing bounds proposed by Yuan and ables and decision variables of the influence diagram, that Hansen [20] for solving the MAP problem in Bayesian net- is, a complete history. Since we use the AND/OR tree to works using branch-and-bound search. We show how to find an optimal decision policy for the original influence adapt that approach in order to solve influence diagrams diagram, we have to construct an AND/OR tree that is con- efficiently. sistent with the original constrained order. 
For the influence diagram in Figure 2(a), the partial order is 4.1 AND/OR tree search {ns0, es0, ss0, ws0} ≺ {d0} ≺ {ns1, es1, ss1, ws1} ≺ { } ≺ { } We represent the search space for influence diagram evalu- d1 x0, y0, x1, y1, x2, y2, u , (5) ation as an AND/OR tree. The nodes in an AND/OR tree and the decision variables must occur in this order along are of two types: AND nodes and OR nodes. AND nodes any branch. correspond to chance variables; a probability is associated with each arc originating from an AND node and the prob- We define a valuation function for each node in an abilities of all the arcs from an AND node sum to 1.0. The AND/OR tree as follows: (a) for a leaf node, the value is OR nodes correspond to decision variables. Each of the its utility value; (b) for an AND node, the value of is the leaf nodes of the tree has a utility value that is derived from sum of the values of its child nodes weighted by the prob- the utility node of the influence diagram. abilities of the outgoing arcs; (c) for an OR node, the value is the maximum of the summed utility values of each child Qi and Poole [17] create an AND/OR tree in which each node and corresponding arc. We use this valuation function layer of AND nodes alternates with a layer of OR nodes. to find an optimal strategy for the influence diagram. Each AND node in this tree corresponds to the information set of a decision variable in the influence diagram, which We represent a strategy for a multi-stage influence diagram is a set of information variables. To compute the proba- as a policy tree that is defined as follows. 
…probability for each arc emanating from an AND node in this tree, however, it is necessary to have the joint probability distributions of all the information sets; these are often not readily available, since variables in the same information set…

A policy tree of an AND/OR tree is a subtree such that: (a) it consists of the root of the AND/OR tree; (b) if a non-terminal AND node is in the policy tree, all its children are in the policy tree; and (c) if a non-terminal OR node is in the policy tree, exactly one of its children is in the policy tree. Given an AND/OR tree that represents all possible histories and strategies for an influence diagram, the influence diagram is solved by finding a policy tree with the maximum value at the root, where the value of the policy tree is computed based on the valuation function. Depth-first branch-and-bound search can be used to find an optimal policy tree.

The AND/OR tree is constructed on-the-fly during the branch-and-bound search, and it is important to do so in a way that allows the probabilities and values to be computed as efficiently as possible. We use the maze problem and the AND/OR tree in Figure 5 as an example. (The upper-bound influence diagram and join tree are shown in Figure 4.)

We have already pointed out that including more layers in our AND/OR tree allows us to more easily use the probabilities and utilities computed by the join tree. If we start by expanding ns0 (where expanding a node refers to generating its child nodes in the AND/OR tree), we need the probabilities P(ns0) to label the outgoing arcs. We can readily look up these probabilities from clique (0, 1, 2) after an initial full join tree evaluation. Note that we use the join tree of the upper-bound influence diagram to compute the probabilities. We can do that because these probabilities are the same as those computed from the original influence diagram; the same set of actions reduces both models to the same Bayesian network. Adding information arcs to an influence diagram, in order to create an upper-bound influence diagram, only changes the expected utilities of the decision variables.

After expanding ns0, we expand any of {es0, ss0, ws0}. Suppose the next variable is es0; then we need the conditional probabilities of es0 given ns0. These probabilities can be computed by setting the state of ns0 as new evidence in the join tree and evaluating the join tree again. The same process is used in expanding {ss0, ws0}.

Note that we do not have to expand one variable at a time. If a clique contains multiple variables from the same information set, those variables can be expanded together, because a joint probability distribution over them can be computed easily; expanding them together also avoids the need for marginalization. For example, variables x1, y1, x2, y2 (7, 8, 14, 15) are in the same information set and also reside in the same clique. In this case, we can expand them as a single layer in the AND/OR tree.

After {ns0, es0, ss0, ws0} are expanded, we expand d0 as an OR layer. This is where the upper bounds are needed. We set the states of {ns0, es0, ss0, ws0} as evidence in the join tree and compute the expected utility values for d0 by reevaluating the join tree. The expected utilities of d0 are upper bounds for the same decision scenarios in the original model. We can use these upper bounds to select the most promising decision alternative to search first. The exact value is returned once the subtree is searched. If this value is higher than the upper bounds of the remaining decision alternatives, those alternatives are immediately pruned, because they cannot be part of an optimal decision policy. We repeat the above process until a complete policy tree is found.

4.2 Incremental join tree evaluation

It is clear that repeated join tree evaluation has to be performed in computing the upper bounds and conditional probabilities. A naive approach is to set the states of the instantiated variables as evidence and perform a full join tree evaluation each time; however, that is too costly. We can instead use an efficient incremental join tree evaluation method to compute the probabilities and upper-bound utilities, based on the incremental evaluation method proposed by Yuan and Hansen [20] for solving the MAP problem.

The key idea is that we can choose a particular order of the variables that satisfies the constraints of Equation (5) such that an incremental join tree evaluation scheme can be used to compute the needed information. Given such an order, we know exactly which variables have been searched and which variable will be searched next at each search step. We only need to send messages from the parts of the join tree that contain the already searched variables to a clique containing the next search variable. For example, after we search es0, the only message that needs to be sent to obtain P(ns0|es0) is the message from clique (0, 1, 3) to (0, 1, 2). There is no need to evaluate the whole join tree. If we choose the following search order for the maze problem,

    ns0, es0, ss0, ws0, d0, ns1, es1, ss1, ws1, d1, x0, y0, x1, y1, x2, y2,

we can use incremental message passing in the direction of the dashed arc in Figure 4(b) to compute the probabilities and upper bounds needed in one downward pass of a depth-first search.

Both message collection and distribution are needed in this new scheme, unlike when evaluating a strong join tree for the original influence diagram. The messages being propagated contain two parts: utility potentials and probability potentials. The distribution phase is typically needed to compute the conditional probabilities. For example, suppose we decide to expand es0 before ns0; then we have to send a message from clique (0, 1, 3) to (0, 1, 2) to obtain P(ns0|es0). We only need to send the probability potential in message distribution; we do not need to send utility potentials, because past payoffs do not count towards the expected utilities of future decisions.

            Join tree            DFBnB                                                  Exhaustive search
    stages  time    mem.         time        mem.    policy      #bounds      #zeros    time         mem.   #zeros
a   2           15    8.0           125       3.2        783         816           0          703     3.1        0
    3          640  238.4         1s812       4.7     12,559      13,104           0       43s218     4.7        0
    4            -      -        59s610      23.9    200,975     446,255           0    47m42s625    25.1        0
    5            -      -     32m55s766     343.0  3,215,631  14,546,815           0            -       -        -
b   2           16    8.0           109       3.2        783         816           0          688     3.1        0
    3          640  238.4         2s109       4.7     12,559      14,734           0       43s953     4.7        0
    4            -      -        43s641      23.9    200,975     325,820           0    49m00s516    25.1        0
    5            -      -     18m00s437     323.1  3,215,631   7,883,235           0            -       -        -

4.3 Efficient backtracking

We use depth-first branch-and-bound (DFBnB) to take advantage of this efficient incremental bound computation.
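The search loop described in Sections 4.1 and 4.2, expanding AND layers with probabilities from the join tree and pruning OR alternatives whose upper bounds cannot beat the best exact value found so far, can be sketched as follows. This is a hypothetical simplification, not the implementation described in the paper: the AND/OR tree is given explicitly as nested dictionaries, and the `upper_bound` argument stands in for the bounds that are really obtained by incremental message passing in the join tree of the upper-bound influence diagram.

```python
# Hypothetical sketch of depth-first branch-and-bound over an AND/OR tree.
# AND nodes correspond to observed chance variables, OR nodes to decisions.
# `upper_bound` stands in for the join-tree bound computation; any function
# that never underestimates a node's exact value keeps the search optimal.

def evaluate(node, upper_bound):
    """Return the exact maximum expected value rooted at `node`."""
    if node["type"] == "leaf":
        return node["value"]
    if node["type"] == "and":
        # Expectation over outcomes; zero-probability branches are skipped,
        # which is the pruning available even without bounds.
        return sum(p * evaluate(child, upper_bound)
                   for p, child in node["children"] if p > 0.0)
    # OR node: try the most promising alternative first; once an exact value
    # beats the upper bounds of the remaining alternatives, prune them.
    best = float("-inf")
    for child in sorted(node["children"], key=upper_bound, reverse=True):
        if upper_bound(child) <= best:
            break  # alternatives are sorted by bound, so the rest are pruned
        best = max(best, evaluate(child, upper_bound))
    return best

# Tiny worked example (invented numbers): two decision alternatives; the
# bound on "B" cannot beat the exact value of "A", so "B" is pruned.
A = {"type": "and", "name": "A",
     "children": [(0.5, {"type": "leaf", "value": 10.0}),
                  (0.5, {"type": "leaf", "value": 0.0})]}
B = {"type": "and", "name": "B",
     "children": [(1.0, {"type": "leaf", "value": 3.0})]}
root = {"type": "or", "name": "root", "children": [A, B]}
bounds = {"A": 6.0, "B": 4.0}   # assumed to be valid upper bounds
best_value = evaluate(root, lambda n: bounds[n["name"]])
```

With a valid bound the subtree under B is never expanded; with a trivial bound of positive infinity the same code degenerates to exhaustive search, which is exactly the comparison made in Section 5.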

Table 1: Comparison of three algorithms (the join tree algorithm, DFBnB using the join tree bounds, and exhaustive search of the AND/OR tree) in solving maze problems a and b for different numbers of stages. The symbol '-' indicates that the problem could not be solved, due to memory limits in the case of the join tree algorithm, or due to a three-hour time limit in the case of the search algorithms. Running time is given in hours (h), minutes (m), seconds (s), and milliseconds; memory (mem.) is in megabytes and is the peak amount of memory used by the algorithm. "policy" is the count of nodes (both OR and AND nodes) in the part of the search tree containing the optimal policy tree; it is the same for both DFBnB and exhaustive search. "#bounds" is the count of times a branch from an OR node was pruned based on bounds; "#zeros" is the count of times a branch from an AND node was pruned because it had zero probability.

Depth-first search ensures that the search need not jump to a different search branch before backtracking is needed. In other words, the join tree only needs to work with one search history at a time.

We do need to backtrack to a previous search node once we finish a search branch or realize that a search branch is not promising and should be pruned. We then need to retract all the evidence set since the generation of that search node and restore the join tree to a previous state. One way to achieve this is to reinitialize the join tree with the correct evidence and perform a full join tree evaluation, which we have already pointed out is too costly. Instead, we cache the potentials and separators along the message propagation path before changing them, whether by setting evidence or by overriding them with new messages. When backtracking, we simply restore the most recently cached potentials and separators in reverse order. The join tree is thus restored to the previous state with no additional computation. This backtracking method is much more efficient than reevaluating the whole join tree. For solving the MAP problem in Bayesian networks, Yuan and Hansen [20] show that the incremental method is more than ten times faster than full join tree evaluation in depth-first branch-and-bound search.

5 Empirical Evaluation

We compared the performance of our algorithm against both the join tree algorithm [8] and exhaustive search of the AND/OR tree (i.e., no bounds are used for pruning). Experiments were run on a 2.4 GHz Duo processor with 3 gigabytes of RAM running Windows XP.

The results in Table 1 are for the two maze problems in Figure 1, which we solved for different numbers of stages. When there are only two or three stages, the join tree algorithm is most efficient. This is because the strong join trees for these influence diagrams are rather small and can be built successfully. Once the join trees are built, only one collection phase is necessary to solve the influence diagram; by contrast, the depth-first branch-and-bound (DFBnB) algorithm must perform repeated message propagations to compute upper bounds and probabilities during the search. For more than three stages, however, the join tree algorithm cannot solve the maze models because the strong join trees are too large to fit in memory. Because the exhaustive search algorithm only needs to store the search tree and policy, it can solve the maze models for up to four stages, although it takes more than 45 minutes to do so. The DFBnB algorithm can solve the maze models for up to five stages in less time than it takes the exhaustive search algorithm to solve them for four stages, demonstrating the advantage of using bounds to prune the search tree. Table 1 includes the number of times a branch of the search tree is pruned based on bounds, as well as the number of times a branch with zero probability is pruned. For the maze models with their original parameter settings, every branch of the search tree has non-zero probability.

Maze domains modified by removing some noise from the sensors
            Join tree            DFBnB                                              Exhaustive search
    stages  time    mem.         time       mem.   policy   #bounds     #zeros     time          mem.     #zeros
a   2           16    8.0          140       3.1      224        34        246            62      3.1        312
    3          641  238.4        1s172       3.6      849       107      2,598           781      3.5      4,616
    4            -      -       15s281       4.3    2,843       798     28,282         9s828      4.1     61,320
    5            -      -     2m25s266       5.5    9,076     6,282    325,146      4m04s062      4.3    773,768
    6            -      -    12m03s828       6.3   28,413    41,921  2,885,925    1h0m26s078      7.9  9,652,872
b   2           15    8.0          109       3.2      261        51        175            79      3.1        292
    3          641  238.4        1s203       3.6    1,136       186      2,507         1s109      3.5      5,876
    4            -      -       14s203       4.4    4,244       660     29,458        16s391      4.3     88,948
    5            -      -     1m39s046       5.9   15,030     2,302    229,003      7m59s218      6.0  1,255,284
    6            -      -     5m14s047      10.0   52,240    58,058  1,442,212   1h57m22s625     10.9 17,418,100

Previous work has argued that one of the advantages of search algorithms for influence diagram evaluation is that they can prune branches of the search tree that have zero probability, even without bounds [12, 17]. To test this argument, as well as to illustrate the effect of different problem characteristics on algorithm performance, we modified the maze models described in Section 2.2. First, we removed some noise from the sensors. Each of the four sensors reports a wall in the corresponding direction of the compass. In the original problem, each sensor is noisy; it detects the presence of a wall correctly with probability 0.9 and mistakenly senses a wall when none is present with probability 0.05. As a result, every sensor reading is possible in every state and there are no zero-probability branches.
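Both kinds of pruning depend on being able to undo join tree changes cheaply when the search backtracks. The caching scheme of Section 4.3 amounts to an undo trail over clique and separator potentials; the following is a minimal sketch, with invented names and potentials represented as flat lists of numbers, not the paper's actual data structures.

```python
# Hypothetical sketch of the Section 4.3 backtracking scheme: before a
# potential table is changed (by setting evidence or by an incoming
# message), its current contents are cached; backtracking restores the
# cached tables in reverse order, with no join tree re-evaluation.

class UndoableJoinTree:
    def __init__(self, potentials):
        self.potentials = potentials  # id -> potential table (list of numbers)
        self._trail = []              # stack of (id, saved copy) pairs
        self._marks = []              # trail positions, one per search node

    def mark(self):
        """Record the current state when the search generates a new node."""
        self._marks.append(len(self._trail))

    def overwrite(self, pid, new_table):
        """Change a potential, caching the old table first."""
        self._trail.append((pid, self.potentials[pid][:]))
        self.potentials[pid] = new_table

    def backtrack(self):
        """Undo every change made since the most recent mark."""
        mark = self._marks.pop()
        while len(self._trail) > mark:
            pid, saved = self._trail.pop()
            self.potentials[pid] = saved

# Usage with invented clique/separator ids in the spirit of the maze example:
jt = UndoableJoinTree({"C(0,1,2)": [0.5, 0.5], "S(0,1)": [1.0]})
jt.mark()
jt.overwrite("C(0,1,2)", [1.0, 0.0])  # set evidence, e.g. a sensor reading
jt.overwrite("S(0,1)", [0.7])         # propagate one message
jt.backtrack()                        # both changes undone, in reverse order
```

Restoring from the trail costs only a copy per changed table, which is why it is so much cheaper than reinitializing the join tree and rerunning a full evaluation.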

Maze domains modified by removing some noise from both actions and sensors
            Join tree            DFBnB                                              Exhaustive search
    stages  time    mem.         time       mem.   policy   #bounds     #zeros     time          mem.     #zeros
a   2           16    8.0          110       3.2      177        30        206           109      3.1        276
    3          594  238.4          750       3.7      562        99      1,818         1s015      3.5      3,192
    4            -      -        5s359       4.1    1,504       276     13,276        10s234      4.0     33,660
    5            -      -       50s453       4.8    4,216       807    124,312      1m44s406      4.7    343,843
    6            -      -     5m09s328       6.0   11,198     2,236    767,084     17m39s812      5.9  3,490,703
    7            -      -    36m59s906       8.3   29,653     5,947  4,821,057             -        -          -
b   2           15    8.0           94       3.1      203        45        164           156      3.1        278
    3          500  238.4          672       3.7      727       165      1,602         1s391      3.5      4,047
    4            -      -        5s297       4.1    2,029       524     13,601        14s984      4.0     46,812
    5            -      -       27s688       5.1    5,540     1,557     72,518      2m36s313      4.9    504,264
    6            -      -     2m04s812       6.7   15,625     4,941    335,185     30m24s453      6.6  5,312,539
    7            -      -    18m13s157      10.5   43,673    16,639  2,984,966             -        -          -

Table 2: Comparison of three algorithms (the join tree algorithm, DFBnB using the join tree bounds, and exhaustive search of the AND/OR tree) in solving maze problems a and b with modified parameters. For the results in the first of these tables, some noise is removed from the sensors only; for the results in the second, some noise is removed from both the actions and the sensors. Removing some noise from the actions and sensors results in more zero-probability branches that can be pruned, allowing the search algorithms (but not the join tree algorithm) to solve the problem for a larger number of stages.

We modified the model so that each sensor accurately detects whether a wall is present in its direction of the compass. With this change, the maze problem remains partially observable, but the search tree contains many zero-probability branches, as can be seen from the results in Table 2. Since the search algorithms can prune zero-probability branches, the exhaustive search algorithm can now solve the problem for up to five stages and the DFBnB algorithm can solve the problem for up to six stages.

We next made an additional change, to the transition probabilities for actions. In the original problem, the agent successfully moves in the intended direction with probability 0.89 (as long as there is not a wall). It fails to move with probability 0.089, moves sideways with probability 0.02 (0.01 for each side), and moves backward with probability 0.001. We modified these transition probabilities so that the agent still moves in the intended direction with probability 0.89, but otherwise stays in the same position, with probability 0.11. The effects of the agent's actions are still stochastic, but they are more predictable, and this allows the search tree to be pruned even further. As a result, the exhaustive search algorithm can solve the problem for up to six stages and the DFBnB algorithm can solve the problem for up to seven stages.

Note that changing the problem characteristics has no effect on the performance of the join tree algorithm. The join tree algorithm solves the influence diagram for all information states, including those that have zero probability and those that are unreachable from the initial state; as a result, its memory requirements explode exponentially in the number of stages and the algorithm quickly becomes infeasible. Although the policy tree that is returned by the search algorithms can also grow exponentially in the number of stages, it does so much more slowly because so many branches can be pruned. As is vividly shown by the results for the two different mazes and for different parameter settings, the performance of the search algorithms is sensitive to problem characteristics – precisely because the search algorithms exploit a form of problem structure that is not exploited by the join tree algorithm. In addition, the results show the effectiveness of bounds in scaling up the search-based approach.

6 Conclusion

We have described the first implementation of a depth-first branch-and-bound search algorithm for influence diagram evaluation. Although the idea has been proposed before, we adapted and integrated contributions from related work and introduced a number of new ideas to make the approach computationally feasible. In particular, we described an efficient approach for using the join tree algorithm to compute upper bounds to prune the search tree. The idea is to generate an upper-bound influence diagram by allowing each decision variable to be conditioned on additional information that makes the remaining history nonrequisite, thus simplifying the influence diagram. Then a join tree of the upper-bound influence diagram is used to incrementally compute upper bounds for the depth-first branch-and-bound search. We have also described a new approach to constructing the search tree based on the structure of the strong join tree of the upper-bound influence diagram. Experiments show that the resulting depth-first branch-and-bound search algorithm outperforms the state-of-the-art join tree algorithm in solving multistage influence diagrams, at least when there are more than three stages.

We are currently considering how to extend this approach to solve limited-memory influence diagrams [10], which typically have many more stages. We are also exploring approaches to compute more accurate bounds for pruning the search tree. Finally, we are considering approximate and bounded-optimal search algorithms for solving larger influence diagrams using the same upper bounds and AND/OR search tree.

Acknowledgments

This research was supported in part by NSF grants IIS-0953723 and EPS-0903787, and by the Mississippi Space Grant Consortium and NASA EPSCoR program.

References

[1] S. Acid and L. de Campos. An algorithm for finding minimum d-separating sets in belief networks. In Proceedings of the 12th Annual Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 3–10, San Francisco, CA, 1996. Morgan Kaufmann.

[2] D. Bhattacharjya and R. Shachter. Evaluating influence diagrams with decision circuits. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pages 9–16. AUAI Press, 2007.

[3] G. Cooper. A method for using belief networks as influence diagrams. In Proceedings of the Fourth Conference on Uncertainty in Artificial Intelligence, pages 55–63, Minneapolis, MN, USA, 1988. Elsevier Science, New York, NY.

[4] R. Dechter. An anytime approximation for optimizing policies under uncertainty. In Workshop on Decision Theoretic Planning, at the 5th International Conference on Artificial Intelligence Planning Systems (AIPS-2000), Breckenridge, CO, USA, 2000.

[5] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Canadian Journal of Mathematics, 8:399–404, 1956.

[6] M. C. Horsch and D. Poole. An anytime algorithm for decision making under uncertainty. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), 1998.

[7] R. A. Howard and J. E. Matheson. Influence diagrams. In R. A. Howard and J. E. Matheson, editors, The Principles and Applications of Decision Analysis, pages 719–762, Menlo Park, CA, 1981.

[8] F. Jensen, F. V. Jensen, and S. L. Dittmer. From influence diagrams to junction trees. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 367–373. Morgan Kaufmann, 1994.

[9] F. V. Jensen. Unconstrained influence diagrams. In Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 234–241. Morgan Kaufmann Publishers, 2002.

[10] S. L. Lauritzen and D. Nilsson. Representing and solving decision problems with limited information. Management Science, 47:1235–1251, 2001.

[11] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2):157–224, 1988.

[12] R. Marinescu. A new approach to influence diagram evaluation. In Proceedings of the 29th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, UK, 2009.

[13] T. D. Nielsen and F. V. Jensen. Welldefined decision scenarios. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 502–511, 1999.

[14] D. Nilsson and M. Hohle. Computing bounds on expected utilities for optimal policies based on limited information. Technical Report 94, Dina Research, 2001.

[15] S. M. Olmsted. On representing and solving decision problems. PhD thesis, Stanford University, Engineering-Economic Systems Department, 1983.

[16] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.

[17] R. Qi and D. Poole. A new method for influence diagram evaluation. Computational Intelligence, 11:498–528, 1995.

[18] R. D. Shachter. Evaluating influence diagrams. Operations Research, 34(6):871–882, 1986.

[19] P. Shenoy. Valuation based systems for Bayesian decision analysis. Operations Research, 40:463–484, 1992.

[20] C. Yuan and E. A. Hansen. Efficient computation of jointree bounds for systematic MAP search. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09), pages 1982–1989, Pasadena, CA, 2009.

[21] N. L. Zhang. Probabilistic inference in influence diagrams. In Computational Intelligence, pages 514–522, 1998.