A GAME-CHAIN-BASED APPROACH FOR DECISION MAKING

An Tingyu, Watanabe Tsunemi, Kochi University of Technology

ABSTRACT: With the rapid development of the information society, decision-making problems are becoming more and more complicated, especially in large-scale systems such as infrastructure, environmental and industrial fields, where decisions are usually accompanied by psychological competition between the involved parties in a complicated, uncertain and dynamic situation. From a holistic perspective of the system, a specific decision-making method, described here as game-chain-based decision making, is proposed in this paper in order to seek a solution for these problems. It takes the other involved parties' thinking into consideration and explores the consequences of holistic instead of reductionist thinking in order to pursue a scientific basis and an optimal choice for decision makers. Based on systematic and game ideas, this paper attempts to find a solution by mathematical modeling. First, the characteristics of this sort of decision-making problem are summarized; such problems can be abstracted as a game chain. Through proper definitions of decision point, state, action, state transition probability and immediate utility, a mathematical model is set up which translates the game-chain decision-making process into a Markov decision process specified by a list of 5 objects. Second, with the support of game theory and the Markov decision process, the corresponding equilibrium (the system equilibrium) in a holistic view of the system is developed under the principle function of expected total utility, and is then proven to exist under certain conditions. A pathway to find the system equilibrium is then given. Finally, the proposed method is demonstrated through a tank virtual game.

KEYWORDS: game chain, system equilibrium, Markov decision process

1. INTRODUCTION

Currently, the management of large-scale, highly interconnected systems such as infrastructure, environmental and industrial systems seems to become more complicated, partly because it is usually required to balance the involved parties' requirements and benefits in complex, uncertain and dynamic situations from a holistic perspective. Take infrastructure for example, which herein refers to the technical structures that support a society, such as roads, water supply, sewers, telecommunications, power grids, and so forth. One typical attribute of infrastructure is that the system or network tends to evolve over time as it is continuously modified, improved and enlarged, and as various components are rebuilt, decommissioned or adapted to other uses. This usually brings a complex, uncertain and dynamic environment for infrastructure management. Another attribute is that system components are commonly interdependent and not usually capable of subdivision; moreover, the management process is composed of several stages or steps involving multiple parties. Consequently, management from holistic and game perspectives seems quite necessary, and an organization's success will largely depend on its ability to manage the problems induced by those attributes. One important activity during the management process is decision making, which is made especially difficult by the attributes mentioned above and is consequently accompanied by psychological competition between the involved parties in a complicated, uncertain and dynamic situation. This paper is an illustration of the application of system process modeling and the game idea to the resolution of complex decision problems.

A holistic illustration is needed to understand the overall situation by showing the interconnectivity between the involved parties, which is the foundation of system process modeling. The system thinking movement (Checkland, 1999; Senge, 2006) is an exploration of the consequences of holistic instead of reductionist ways of thinking. System approaches are a way of grasping and managing situations of complexity and uncertainty in which there are no simple answers, and in which people and their attitudes are an integral part of the problem. That is to say, such a decision-making method from a holistic perspective of the system should take the involved parties' thinking into account and allow complexity to be managed. Based on the above requirements, it seems reasonable to combine the systematic idea with the game idea in order to pursue the optimal choice for decision makers.

Game theory is considered a branch of applied mathematics which can be used in the social sciences, biology, engineering, political science, international relations, computer science (mainly for artificial intelligence), etc. Game theory attempts to capture behavior in strategic situations mathematically, in which an individual's success in making choices depends on the choices of others. It plays an important role in solving practical decision-making problems that exist in many fields. Nowadays, "game theory is a sort of umbrella or 'unified field' theory for the rational side of social science, where social is interpreted broadly, to include human as well as non-human players (computers, animals, plants)" (Aumann 1987). Traditional applications of game theory attempt to find equilibrium in these games, in which each player of the game has adopted a strategy that they are unlikely to change. Many equilibrium concepts have been developed (most famously the Nash equilibrium) in an attempt to capture this idea. However, as game theory is generally focused on one game, it seems necessary to add something to it in order to find a solution for problems with a sequence of games, which can be depicted as a game chain (Figure 1). This paper is just such an attempt, whose structure is shown in Figure 2.

2. CHARACTERISTICS OF GAME CHAIN DECISION PROBLEMS

The game chain decision-making problems have the following characteristics:

(1) The problem can be abstracted as a chain made up of a sequence of game units, in which the players each take control of their own decision variables.

(2) A game unit will exert an influence on the condition of the following game by the action taken, which requires that any decision-maker should not only consider the immediate reward but also take care of the long-term benefit. In other words, today's decision will influence tomorrow, and tomorrow's decision will influence the future too. If one disregards the impact on the future and only takes the interests of the current stage into account, one does not make a wise decision from the long-term perspective. In a word, decision makers face a decision process which is frequently made up of more than one related phase.

(3) The decision-making problem displays a hierarchically discrete structure; thus a system strategy for the game chain should be formed from a sequence of strategies, one for each stage.

(4) The decision-making process is an orderly process, which means that the players make decisions following the sequence of the game units.

(5) The final system strategy should be acceptable to every participant, which contains two layers of meaning: first, decision-makers seek their own "optimal" strategy on the premise of taking others' possible strategies into account; second, they balance the immediate reward with possible future rewards in various circumstances. As a result, all parties are unlikely to change, or else they will get less benefit.

In conclusion, the game chain decision problem is firstly a game problem, which means that we should take others' strategies into consideration. Secondly, it is a sequential decision-making process, usually under a complicated, uncertain and dynamic environment. As decision makers are frequently faced with the complexity of not only taking care of the immediate reward but also attempting to create opportunities for the future, it seems a challenge to find a solution for this problem.

3. MODELING FOR A CERTAIN KIND OF GAME CHAIN

3.1 Decision process description

The certain game chain to be analyzed is illustrated as GC = (G_1, G_2, …, G_N) (see Figure 3). G_ij (j = 1, 2) represent the two players of game unit i (i = 1, 2, …, N); U_i and R_i (i = 1, 2, …, N) represent the payoff and reward vectors of game unit i, respectively; F_i is the relation function which reflects the influence of the previous game on the one behind it.

The decision-making process based on the game chain is shown in Figure 4. Our concern is how to make a proper decision in such a process in a holistic view, which means taking all stages into consideration.

Figure 4 The game-chain decision process (states, actions and utilities of the game units at decision points t and t+1)

3.2 Elements of the game chain decision process

(1) Decision-point and stage

Each decision should be made at a certain moment, called a decision-point in this paper. The corresponding set of decision-points in the game chain GC = (G_1, G_2, …, G_N) is then denoted as T = {1, 2, …, N}, while the process between two adjacent decision-points is known as a stage.

(2) State and action

The condition at the moment when game unit t starts is defined as the state of decision-point t, and all possible states form the set S_t. Given i_t ∈ S_t, denote by A_t(i_t) the action set of game unit t, which gives all possible action profiles of the players in game unit t. Notice that under a certain state a certain action may occur, or uncertainty may be present, with a number of possible actions to be selected randomly. Thus for decision-point t we define Dis(A_t(i_t)) as the set of all probability distributions on A_t(i_t). Then selecting an action, or making a decision, under the state i_t ∈ S_t at decision-point t is equivalent to selecting a probability distribution π_t(i_t) ∈ Dis(A_t(i_t)), which gives the probability of action a_t ∈ A_t(i_t) as π_t(a_t), satisfying

    Σ_{a_t ∈ A_t(i_t)} π_t(a_t) = 1

If this distribution is degenerate, it means that a single action in A_t(i_t) is selected with certainty.

Assumption 1: ∀t ∈ T, S_t is a finite set.

Assumption 2: ∀t ∈ T, i_t ∈ S_t, the strategy sets of both players in game unit t are all finite sets, which guarantees that A_t(i_t) is a finite set too.

(3) State transition probability and immediate utility

We notice that the action a_t ∈ A_t(i_t) selected at decision-point t will influence the state of the next decision-point t+1. Naturally, we interpret what will be brought about by taking action a_t ∈ A_t(i_t) under the state i_t ∈ S_t at decision-point t as follows:

1) A utility profile at the t-th stage of the form r_t(i_t, a_t) = (r_t^(G_t1)(i_t, a_t), r_t^(G_t2)(i_t, a_t)), which is called the immediate utility in this paper. In each game unit, either player's immediate utility depends on the strategies of both players at the corresponding stage. Therefore, once the action a_t is known, we can determine the immediate utility of the t-th stage.

Note: As a matter of convenience, we suppose that the immediate utilities of all game units have the same dimension (otherwise we can achieve this by normalization).

2) An influence on the state of the next decision-point t+1, which is represented by the probability distribution p_ij^(t)(a_t) described as follows: if the system takes action a_t ∈ A_t(i_t) under the state i_t ∈ S_t at decision-point t, then it will transfer to state j ∈ S_{t+1} at decision-point t+1 with probability p_ij^(t)(a_t), satisfying

    Σ_{j ∈ S_{t+1}} p_ij^(t)(a_t) = 1    (1)

Denote P = {p^(t)(j | i_t, a_t) : i_t ∈ S_t, j ∈ S_{t+1}, a_t ∈ A_t(i_t), t ∈ T} as the state transition probability distribution family, which is in general related to the decision-point. The state transition probability distribution displays the relation between two adjacent game units, which was expressed as the relation function before. For example, if the state of game unit t+1 is uniquely determined by a_t as j_{a_t}, then the corresponding transition probability is

    p^(t)(j | i_t, a_t) = 1 if j = j_{a_t}, and 0 otherwise,  i_t ∈ S_t, a_t ∈ A_t(i_t), t ∈ T    (2)

Thus we have the relation function as

    F_t(i_t, a_t, j) = p_ij^(t)(a_t)    (3)

for i_t ∈ S_t, a_t ∈ A_t(i_t), j ∈ S_{t+1}. In game chain decision problems, the relationship between the action at decision-point t and the state at decision-point t+1 can be determined using game theory or obtained from experience; thus, given the state of decision-point t, the transition probability to the state of decision-point t+1 is independent of previous states and actions, which indicates that the game-chain-based decision process possesses the Markov property. Therefore, a mathematical model for the game chain based on the Markov decision process is set up with a list of 5 objects as follows:

    {T, S = ∪_{t∈T} S_t, A = ∪_{t∈T} A_t(i_t), p^(t)(j | i_t, a_t), r_t(i_t, a_t)}    (4)
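For concreteness, the five objects in (4) can be held in a small data structure. The sketch below is illustrative only and not part of the paper's model; the names GameUnit, payoff and transition are our own stand-ins for G_t, r_t and p^(t):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]        # an element of S_t
Action = Tuple[int, ...]       # a joint action profile in A_t(i)
Utility = Tuple[float, float]  # one component per player

@dataclass
class GameUnit:
    """One game unit G_t of the chain (illustrative container)."""
    states: List[State]                                         # S_t
    actions: Callable[[State], List[Action]]                    # i -> A_t(i)
    payoff: Callable[[State, Action], Utility]                  # r_t(i, a)
    transition: Callable[[State, Action], Dict[State, float]]   # p^(t)(. | i, a)

@dataclass
class GameChain:
    """The 5-object model {T, S, A, p^(t), r_t} of eq. (4)."""
    units: List[GameUnit]   # indexed by decision-point t = 1, ..., N
```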

(4) Decision policy and strategy

From the description above, we notice that the game chain decision process is composed of successive states and actions. Label one trace of the decision process as h_t, which denotes a series of states and actions from decision-point 1 to decision-point t, with the state of decision-point t being i_t:

    h_t = (i_1, a_1, i_2, a_2, …, i_t),  t ∈ T    (5)

in which i_k ∈ S_k, a_k ∈ A_k(i_k) (k = 1, …, t−1) and i_t ∈ S_t. The universal set of all such traces is denoted as H_t, which actually displays all possible decision pathways for reaching the state i_t at decision-point t.

A decision policy should give the principle to follow when making decisions or selecting actions under different states at the decision-points.

Definition 1: If a function f_t satisfies f_t(i_t) ∈ A_t(i_t) for any i_t ∈ S_t, then f_t is defined as a deterministic decision policy, or a decision function for short. The corresponding decision function sequence π = (f_1, f_2, …, f_N) is called a deterministic strategy.

Definition 2: If a family of probability distributions π_t satisfies: given any pathway h_t ∈ H_t approaching decision-point t with the state i_t ∈ S_t, π_t(h_t) ∈ Dis(A_t(i_t)) is a probability distribution on A_t(i_t) such that

    π_t(a_t | h_t) ≥ 0,  Σ_{a_t ∈ A_t(i_t)} π_t(a_t | h_t) = 1    (6)

then π_t is defined as a generic decision policy. The corresponding decision policy sequence π = (π_1, π_2, …, π_N) is called a generic strategy. The universal set of generic strategies is denoted as Π.

Definition 3: If the decision policy π_t at decision-point t is independent of previous states and actions, which means

    π_t(h_t) = π_t(i_t),  h_t ∈ H_t, t ∈ T    (7)

then π_t is a Markov decision policy, and the corresponding decision policy sequence π = (π_1, π_2, …, π_N) is called a random Markov strategy. The universal set of random Markov strategies is denoted as Π_m. If the π_t (∀t ∈ T) are all degenerate distributions, that is to say, there exists a decision function sequence (f_1, f_2, …, f_N) satisfying

    π_t(f_t(i_t) | i_t) = 1,  ∀i_t ∈ S_t    (8)

then π is called a deterministic Markov strategy, the universal set of which is denoted as Π_m^d.
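As a small illustration (ours, not the paper's), a deterministic Markov strategy of Definition 3 reduces to one lookup table f_t per decision-point, keyed only by the current state, whereas a generic policy of Definition 2 would have to be keyed by the whole trace h_t. The toy values anticipate the tank example solved in Section 5.4:

```python
# Deterministic Markov strategy: each f_t maps the current state i_t to an action.
f = {
    1: {(6, 4): (2, 1)},                     # f_1 on S_1
    2: {(4, 3): (2, 1), (4, 2): (3, 1)},     # a fragment of f_2 on S_2
}

# A generic (history-dependent) policy would instead be keyed by traces h_t,
# e.g. pi_2[((6, 4), (2, 1), (4, 3))] -> a distribution over A_2((4, 3)).
```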

(5) Stochastic process and principle function

For t ∈ T, let Y_t and Δ_t denote the state and action at decision-point t, respectively. Obviously, they are random variables depending on the strategy π. Now we have the random sequence (Y_1, Δ_1, Y_2, Δ_2, …, Y_N, Δ_N), denoted as L(π). As it depends on the initial probability distribution of Y_1, the transition probabilities and the strategy π, we call it the stochastic process induced by strategy π. Accordingly, denote the probability of occurrence of an event in L(π) as P_π(·), and the corresponding expected utility as E_π(·).

∀π ∈ Π, define a random variable sequence in the stochastic process induced by π as R(π) = (R_1(π), R_2(π), …, R_N(π)), in which the random variable R_t(π) = r_t(Y_t, Δ_t) (t ∈ T) represents the immediate utility of game unit t at decision-point t.

Now, our purpose is to define a principle function to discriminate between good and bad strategies, and then to try to find an "optimal" strategy in the sense of "equilibrium".

Definition 4: A finite game chain decision problem is, in some degree, a systematic decision process with finite stages. Therefore, we attempt to set up the expected total utility as a criterion for this game chain. Suppose that there is an immediate utility at decision-point t,

    r_t(i_t, a_t) = (r_t^(G_t1)(i_t, a_t), r_t^(G_t2)(i_t, a_t)),  ∀i_t ∈ S_t, a_t ∈ A_t(i_t), t ∈ T

Then define the expected total utility obtained by selecting action a_t ∈ A_t(i_t) at decision-point t under the strategy π and state i_t ∈ S_t as

    V_{t,N}(i_t, a_t, π) = r_t(i_t, a_t) ⊕ Σ_{n=t+1}^{N} E_π{r_n(Y_n, Δ_n)},  ∀i_t ∈ S_t, a_t ∈ A_t(i_t)    (9)

where

    E_π{r_n(Y_n, Δ_n)} = Σ_{i_n ∈ S_n, a_n ∈ A_n(i_n)} P_π{Y_n = i_n, Δ_n = a_n | h_n} · r_n(i_n, a_n),  n = t+1, …, N    (10)

in which h_n should meet the condition that it passes through (i_t, a_t) at decision-point t.

Notes:

1) Actually, V_{t,N}(i_t, a_t, π) is a vector of the form V_{t,N}(i_t, a_t, π) = (V_{t,N}^(G_t1)(i_t, a_t, π), V_{t,N}^(G_t2)(i_t, a_t, π)).

2) The expected total utility of decision-point N is V_{N,N}(i_N, a_N) = r_N(i_N, a_N) = (r_N^(G_N1)(i_N, a_N), r_N^(G_N2)(i_N, a_N)).

3) ⊕ and Σ differ from the ordinary summation notation in that only utilities of the identical player can be added.

4. SOLUTION FOR THE GAME-CHAIN DECISION PROBLEM

4.1 Existence of system equilibrium

Definition 5: At decision-point t, by employing the expected total utility as the payoff, we establish a new game G_t*(i_t) (t ∈ T). Then a strategy π* = (π_1*, π_2*, …, π_N*) ∈ Π is a system equilibrium strategy if it satisfies

    π_t*(i_t) ∈ arg Eq{G_t*(i_t)},  ∀t ∈ T, i_t ∈ S_t    (11)

where arg Eq means finding a Nash equilibrium of the new game G_t*(i_t). Here π_t*(i_t) represents the decision, or the selected action, at decision-point t under the state i_t ∈ S_t.

Given a fixed initial state i_1 ∈ S_1, we can find the system equilibrium E* = (E_1*, E_2*, …, E_N*), in which E_1* = π_1*(i_1); E_2* = π_2*(i_2) follows by combining i_1, E_1* = π_1*(i_1) and the state transition probability; and then E_3*, …, E_N* follow in the same way.

From Assumptions 1 and 2, we can see that the games at any decision-point are all finite games. As a result, they all have a Nash equilibrium, which means that ∀t ∈ T, i_t ∈ S_t,

    arg Eq{G_t*(i_t)} ≠ ∅    (12)

Then we can reach the conclusion that a finite game chain decision-making process under complete information can be defined as a Markov decision process, and that under the principle of expected total utility (9), the system equilibrium exists.
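Computing arg Eq{G_t*(i_t)} in (11) only requires finding a Nash equilibrium of a finite two-player game. A minimal sketch of a pure-strategy equilibrium search follows (our simplification: the tank example in Section 5 happens to be solvable in pure strategies, but a general finite game may only have mixed equilibria, which this sketch does not cover):

```python
from typing import Dict, List, Tuple

Payoff = Tuple[float, float]   # (player 1 utility, player 2 utility)

def pure_nash(game: Dict[Tuple[int, int], Payoff]) -> List[Tuple[int, int]]:
    """Return all pure-strategy Nash equilibria of a bimatrix game.

    `game` maps a joint action (s1, s2) to the payoff pair u(s1, s2).
    A cell is an equilibrium iff no player gains by a unilateral deviation.
    """
    rows = {s1 for s1, _ in game}
    cols = {s2 for _, s2 in game}
    eq = []
    for (s1, s2), (u1, u2) in game.items():
        best_row = all(game[(r, s2)][0] <= u1 for r in rows if (r, s2) in game)
        best_col = all(game[(s1, c)][1] <= u2 for c in cols if (s1, c) in game)
        if best_row and best_col:
            eq.append((s1, s2))
    return eq
```

A cell survives the test exactly when neither player can improve by a unilateral deviation, which is condition (11) restricted to pure strategies.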

4.2 Solution for system equilibrium

4.2.1 Theoretical foundation

For the generic strategies Π, as all possible pathways are considered, no strategy is left out; however, the generic strategy set is too difficult to operate on when encountering the large sets of states and actions brought by numerous game units, or a large strategy space:

(1) A large amount of calculation. When the game chain is on a large scale, "combinatorial explosion" will occur, with an exponential growth of the possible states, which will lead to inefficiency.

(2) A large calculation error. As the transition probabilities are on many occasions dependent on experience, errors are inevitable. Then, in the process of finding the system strategy in a holistic view with a larger time span, the accumulated error will grow larger, and may even result in a theoretically "optimal" solution that is not practical at all.

Theorem 1: For any π ∈ Π and i ∈ S_1, there exists a random Markov strategy π′ ∈ Π_m satisfying

    P_π′{Y_t = j, Δ_t = a | Y_1 = i} = P_π{Y_t = j, Δ_t = a | Y_1 = i},  ∀t ∈ T    (13)

Brief proof: Fix the state i ∈ S_1. For any j ∈ S_t, a ∈ A_t(j) and t ∈ T, define a random Markov policy as

    π′_t(a | j) ≡ P_π{Δ_t = a | Y_t = j, Y_1 = i}

Then we can prove that such a strategy π′ ∈ Π_m meets the condition above.

The meaning of this theorem lies in that searching for an optimal strategy in the generic strategy set is equivalent to searching in the random Markov strategy set. The conclusion holds only under the condition of a fixed initial state, which indicates that the random Markov strategy π′ depends on the initial state i ∈ S_1. Fortunately, this condition is met in the game chain decision problems described in this paper. Therefore, we confine the system equilibrium to the set of random Markov strategies and, in accordance with this, design the corresponding pathway to find the equilibrium.

Theorem 2: Given π = (π_1, π_2, …, π_N) ∈ Π_m, L(π) is a non-homogeneous Markov chain whose transition probability at decision point t is

    P_π{Y_{t+1} = j, Δ_{t+1} = b | Y_t = i, Δ_t = a} = p^(t)(j | i, a) · π_{t+1}(b | j)    (14)

Brief proof: For t ∈ T and h_t ∈ H_t, expand P_π{Y_{t+1} = j, Δ_{t+1} = b | Y_t = i, Δ_t = a} by the total probability formula. Combining the Markov property of π and the property of the single-step transition probability, we can get the conclusion.

The significance of this theorem lies in that it shows the way to calculate the single-step transition probability.

4.2.2 Pathway of the solution

Step 1: For t = N and i ∈ S_N, let u_N(i, a) = r_N(i, a), ∀a ∈ A_N(i). Based on the previous assumptions, A_N(i) is a finite set, ∀i ∈ S_N. Establish a new game G_N*(i) with the payoff u_N(i, a) under the state i ∈ S_N. Then, according to game theory, we can find the decision policy π_N*(a | i), ∀a ∈ A_N(i), and the corresponding system equilibrium value u_N*(i), which are denoted as arg Eq G_N*(i) and Eq G_N*(i), respectively. Find the equilibrium under all states i ∈ S_N, and then we obtain π_N*.

Step 2: If t = 1, stop; π* = (π_1*, …, π_N*) is the system equilibrium for this game chain under the given initial state. Otherwise, let t − 1 ⇒ t and transfer to Step 3.

Step 3: For every state i ∈ S_t, get u_t(i, a), ∀a ∈ A_t(i), through

    u_t(i, a) = r_t(i, a) ⊕ Σ_{j ∈ S_{t+1}} p^(t)(j | i, a) · u*_{t+1}(j)

Then establish the new game G_t*(i), ∀i ∈ S_t, with the payoff u_t(i, a). Find the decision policy π_t*(a | i), ∀a ∈ A_t(i), and the corresponding equilibrium value Eq G_t*(i) = u_t*(i) by employing game theory, and then obtain π_t*.

Step 4: Return to Step 2.
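The four steps admit a compact implementation for the degenerate-transition case of eq. (2), which is the case used in Section 5. The sketch below is a hypothetical rendering, not the authors' code; the stage data (states, actions, payoff, next_state) are assumed inputs, pure_nash is the helper from Section 4.1, and ⊕ is taken as player-wise addition:

```python
def solve_game_chain(N, states, actions, payoff, next_state, pure_nash):
    """Backward induction over a game chain with degenerate transitions.

    states[t]          : list of states S_t, t = 1..N
    actions(t, i)      : feasible joint actions A_t(i)
    payoff(t, i, a)    : immediate utility r_t(i, a) as a two-player pair
    next_state(t, i, a): the unique successor j in S_{t+1}, as in eq. (2)
    Returns per-stage decision functions f*_t and values u*_t.
    """
    u = {N + 1: {}}                        # u*_{t+1}; empty beyond the horizon
    f = {}
    for t in range(N, 0, -1):
        u[t], f[t] = {}, {}
        for i in states[t]:
            # Stage game: expected total utility of each joint action.
            game = {}
            for a in actions(t, i):
                r = payoff(t, i, a)
                if t < N:
                    v = u[t + 1][next_state(t, i, a)]
                    r = (r[0] + v[0], r[1] + v[1])   # player-wise "oplus"
                game[a] = r
            a_star = pure_nash(game)[0]    # nonempty by eq. (12), assuming pure
            f[t][i], u[t][i] = a_star, game[a_star]
    return f, u
```

Applied to the tank game of Section 5, f[1][(6, 4)] should reproduce the hand computation of Section 5.4.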

5. APPLICATION TO DECISION-MAKING IN A TANK VIRTUAL GAME

5.1 Background

The interdependence of the involved parties is usually overlooked in decision-making processes. However, an excellent decision maker should always take his opponents' thinking into consideration, just as a wise man would do in a competitive or fair game. Moreover, on most occasions decision making is carried out as a multi-stage operation, which indicates that a decision-maker is frequently faced with a multistep decision process, or multiple games with his opponents, which can be described as a game chain. It is unwise for him to make a decision focusing only on his own thinking or only on partial stages. For example, in a campaign virtual game, before fighting the first battle one must have a general idea of how the second, third, fourth and even the final battle will be fought, and consider all along what actions the opponent would probably take; that is, the decision should be made from holistic and game perspectives. In this part, we make up a scene of a tank virtual game to demonstrate the pathway for solving a game chain problem.

5.2 Game scenarios

5.2.1 Game stage scenario

Considering that there are many kinds of decisions to make during a tank game, we take only the decision of force deployment into consideration in this part. Before the game starts, a player should make an overall deployment of forces on the basis of the current situation of both sides. We simulate a game simply as follows (see Figure 5), in which the two players are called Red and Blue. They may fight battles in different battlefields, called in this part an encounter battle, a position battle and a target battle, successively.

Figure 5 Game stage scenario

5.2.2 Force deployment assumptions

(1) At the beginning of the game, Red has 6 platoons at its station, 18 tanks in all. According to present intelligence, there are 4 platoons at Blue's station, 12 tanks in all.

(2) Both players can only dispatch troops from their stations to each battlefield by platoon as a unit. Either side is required to have no less than 1 platoon in any battlefield.

(3) In the encounter battle, Red must meet the condition that its probability of winning is no less than a pre-determined acceptable value θ (suppose θ = 0.6).

(4) The remaining tanks of each battlefield are supposed to withdraw and never fight again in the following stages.

(5) Both players are rational, indicating that each will pursue the maximization of its utility under the given conditions.

5.3 A game chain model

The simulated process of this tank game can be abstracted as a game chain (see Figure 6).

Figure 6 Tank game chain

Before establishing a mathematical model of this game chain, we should first analyze the three battles in detail by setting up sub-models, which have already been shown in the first author's master thesis (An 2008). Here, only some necessary formulas and results are given.

5.3.1 Sub-models of the virtual game

(1) Encounter battle

For Red:
Initial number of tanks: x_0 = m, m ∈ {3, 6, 9, 12}
Winning probability: p_x ∈ [0, 1]
Number of remaining tanks after the t-th round of random fighting: x_t, t ∈ N+

For Blue:
Initial number of tanks: y_0 = n, n ∈ {3, 6}
Winning probability: p_y ∈ [0, 1]
Number of remaining tanks after the t-th round of random fighting: y_t, t ∈ N+

Table 1 Winning probability and average remaining tanks

Red     Blue    p_x      p_y      Avg. remaining (Red)   Avg. remaining (Blue)
m=3     n=3     0.5000   0.5000   1.1333                 1.1333
m=3     n=6     0.3603   0.6397   0.7724                 2.7113
m=6     n=3     0.6397   0.3603   2.7113                 0.7724
m=6     n=6     0.5000   0.5000   1.9295                 1.9295
m=9     n=3     0.7358   0.2642   4.6583                 0.5492
m=9     n=6     0.6179   0.3821   3.5120                 1.3887
m=12    n=3     0.8026   0.1974   6.8444                 0.4024
m=12    n=6     0.7098   0.2902   5.4134                 1.0113

Utility functions [10]:

Red:  r_1^(A) = min{ 1, (p_x/p_y)^2 · (y_0 e^{y_t/y_0}) / (x_0 e^{x_t/x_0}) · e^{(x_0−y_0)/30} }
Blue: r_1^(B) = min{ 1, (p_y/p_x)^2 · (x_0 e^{x_t/x_0}) / (y_0 e^{y_t/y_0}) · e^{(y_0−x_0)/30} }

Table 2 Payoff matrix of the encounter battle

            Blue: 1             Blue: 2
Red: 1      —                   —
Red: 2      0.4230, 0.5449      0.5000, 0.5000
Red: 3      0.4004, 0.4855      0.5144, 0.4590
Red: 4      0.3907, 0.4056      0.5358, 0.3845

Note: The strategies of both sides are expressed by platoon as a unit in all payoff matrices in this part.

(2) Position battle

Red is supposed to attack the Blue station.

For Red:
The number of platoons which can be used to attack: α_2, α_2 ∈ {1, 2, 3}
Attack strength: λ = 1.0/min

For Blue:
The number of defense platoons, each of which can be considered as a defense line: β_2, β_2 ∈ {1, 2}
The average shooting time of each defense line for one target: t_1 = 2 min (the service strength of a defense line is u = 1/t_1).
The shooting time t is subject to the negative exponential distribution.
The damage probability of a target under the condition of being shot: p = 0.7
The time for a tank to pass through the target region is 2 min.

If β_2 = 2, denote by p_i (i = 0, 1, 2) the probability that a penetrating tank is engaged by i defense lines. Then

    u p_1 = λ p_0
    λ p_0 + 2u p_2 = (λ + u) p_1
    2u p_2 = λ p_1

Combining these with Σ_{i=0}^{2} p_i = 1, we can find the solution as p_0 = 1/5, p_1 = 2/5, p_2 = 2/5. And for β_2 = 1, there is p_0 = 1/3, p_1 = 2/3.

Table 3 Penetration of Red in the position battle

Strategy profile   Blue defense lines β_2   p_c    Expected penetrating tanks N[1−(1−p_c)p]
(3,1)              1                        2/3    6.9000
(2,2)              2                        2/5    3.4800
(2,1)              1                        2/3    4.6000
(3,2)              2                        2/5    5.2200
(1,1)              1                        2/3    2.3000
(1,2)              2                        2/5    1.7400

Note: N denotes the number of tanks of Red.
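The defense-line probabilities just derived can be checked with exact arithmetic. A brief sketch (assuming our reading of the balance equations above, with λ = 1.0/min and u = 1/t_1 = 0.5/min):

```python
from fractions import Fraction

lam, u = Fraction(1), Fraction(1, 2)       # attack and service strength

# beta_2 = 2: u*p1 = lam*p0, 2u*p2 = lam*p1, p0 + p1 + p2 = 1
p0 = 1 / (1 + lam / u + lam ** 2 / (2 * u ** 2))
p1, p2 = (lam / u) * p0, (lam ** 2 / (2 * u ** 2)) * p0
print(p0, p1, p2)                          # 1/5 2/5 2/5

# beta_2 = 1: u*p1 = lam*p0, p0 + p1 = 1
q0 = 1 / (1 + lam / u)
print(q0, 1 - q0)                          # 1/3 2/3
```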

Utility functions:

Red:  r_2^(A) = min{ 3β_2 [1−(1−p_c)p] / N, 1 }
Blue: r_2^(B) = min{ N (1−p_c) p / (3β_2), 1 }

Table 4 Payoff matrix of the position battle

            Blue: 1             Blue: 2
Red: 1      0.2556, 0.7000      1.0000, 0.2100
Red: 2      0.3833, 0.4667      0.5800, 0.4200
Red: 3      0.7667, 0.2333      0.3867, 0.6300

(3) Target battle

Utility functions:

Red:  r_3^(A) = min{ Red force number / Blue force number, 1 }
Blue: r_3^(B) = min{ Blue force number / Red force number, 1 }

Table 5 Payoff matrix of the target battle

            Blue: 1             Blue: 2
Red: 1      1.0000, 1.0000      0.5000, 1.0000
Red: 2      1.0000, 0.5000      1.0000, 1.0000
Red: 3      1.0000, 0.3333      1.0000, 0.6667

5.3.2 The game chain model

On the basis of the previous sub-models, a game chain model is established based on the Markov decision process with the decision point set T = {1, 2, 3}. Then we try to find the system equilibrium of this game chain model.

(1) State and action

In this problem, the state is defined as the platoons which can still be assigned at decision point t (t = 1, 2, 3), while the action represents the platoons dispatched to the corresponding battlefield, which is related to the state of the corresponding decision point. Then

S_1 = {(6,4)}
A_1(i) = {(2,1),(2,2),(3,1),(3,2),(4,1),(4,2)} = {a_1, a_2, …, a_6},  i = (6,4) ∈ S_1

S_2 = {(4,3),(4,2),(3,3),(3,2),(2,3),(2,2)} = {j_1, j_2, …, j_6}
A_2(j_1) = {(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)}
A_2(j_2) = {(1,1),(2,1),(3,1)}
A_2(j_3) = {(1,1),(1,2),(2,1),(2,2)}
A_2(j_4) = {(1,1),(2,1)}
A_2(j_5) = {(1,1),(1,2)}
A_2(j_6) = {(1,1)}

S_3 = ∪_{j∈S_2, b∈A_2(j)} (j − b) = {(3,2),(3,1),(2,2),(2,1),(1,2),(1,1)}
A_3(k) = {k},  k ∈ S_3

(2) State transition probability and immediate utility

The state of decision point t+1 is decided by the state of decision point t and the corresponding action selected. For example, if the decision-maker selects action a_1 = (2,1), then the state of decision point 2 will be (6,4) − (2,1) = (4,3). Hence, the state transition probability of decision point 1 is

    p^(1)(j | i, a) = 1 if j = i − a, and 0 otherwise,  ∀i ∈ S_1, a ∈ A_1(i)

and the state transition probability of decision point 2 is

    p^(2)(k | j, b) = 1 if k = j − b, and 0 otherwise,  ∀j ∈ S_2, b ∈ A_2(j)

The immediate utilities are of the form

    r_1(i, a) = (r_1^(A)(i, a), r_1^(B)(i, a)),  i ∈ S_1, a ∈ A_1(i)
    r_2(j, b) = (r_2^(A)(j, b), r_2^(B)(j, b)),  j ∈ S_2, b ∈ A_2(j)
    r_3(k, k) = (r_3^(A)(k), r_3^(B)(k)),  k ∈ S_3
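The state set S_3 need not be listed by hand: it follows from S_2, the action sets A_2(j) and the subtraction transition defined above. A small illustrative check:

```python
S2 = [(4, 3), (4, 2), (3, 3), (3, 2), (2, 3), (2, 2)]
A2 = {
    (4, 3): [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)],
    (4, 2): [(1, 1), (2, 1), (3, 1)],
    (3, 3): [(1, 1), (1, 2), (2, 1), (2, 2)],
    (3, 2): [(1, 1), (2, 1)],
    (2, 3): [(1, 1), (1, 2)],
    (2, 2): [(1, 1)],
}

# S_3 = union over j in S_2, b in A_2(j) of (j - b), componentwise
S3 = {(j[0] - b[0], j[1] - b[1]) for j in S2 for b in A2[j]}
print(sorted(S3))   # yields exactly the six states listed above
```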

1 1.2556, 1.2000 Step2: For t = 2 , 2 1.3833, 1.4667 uia(, )=+− ria (, ) ui* ( a ) * * 223 Then fi 3 () = (2,1) , ui 3 ( ) = (1.3833,1.4667) . ()AB () =∈∈(uiauiaiSaAi22 (,), (,)), 22 , () Label the new game in any state iS∈ as Gi * () , 2 2 (5) For i = (2,3) , aAi ∈ 2 ( ) = {(1,1), (1, 2)} and the and the corresponding results are listed as follows: corresponding pay-off matrix is showed in Table 10. Table 10 Pay-off Matrix under the state of i = (2,3) (1) For i = (4,3) Blue 1 Red aAi∈=2 ( ) {(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)} and the 1 0.7556, 1.7000 corresponding pay-off matrix under the state 2 1.3833, 1.4667 i = (4,3) is showed in Table 6. * * Then fi 3 () = (2,1) , ui 3 ( ) = (1.3833,1.4667) . Table 6 Pay-off Matrix under the state of i = (4,3)

Blue 1 2 Red (6) For i = (2,2) , aAi ∈ 2 () = {(1,1)} . 1 1.2556, 1.3667 2.0000, 0.5433 Corresponding pay-off matrix is showed in Table 11. 2 1.3833, 1.4667 1.5800, 0.9200 Table 11 Pay-off Matrix under the state of i = (2,2) 3 1.2667, 1.2333 1.3867, 1.6300 Blue 1 * * Red Then fi3 ()= (2,1), ui3 ( )= (1.3833,1.4667) . 1 1.2556, 1.7000 Then fi * () = (1,1) , ui * ( ) = (1.2556,1.7000) . (2) For i = (4,2) , aAi∈=2 ( ) {(1,1), (2,1), (3,1)} and the 3 3 corresponding pay-off matrix is showed in Table 7. Table 7 Pay-off Matrix under the state of i = (4,2) The decision function and corresponding value Blue 1 function are as follows: Red ⎧(2,1), i = (4,3) 1 1.2556, 1.0333 ⎪ 2 1.3833, 0.9667 ⎪(3,1), i = (4,2)

3 1.7667, 1.2333 * ⎪(1,1), i = (3,3) fi2 ()= ⎨ * * (2,1), i = (3,2) Then fi 3 () = (3,1) , ui 3 ( ) = (1.7667,1.2333) . ⎪ ⎪(2,1),i = (2,3) ⎪ ⎩(1,1),i = (2, 2) (3) For i = (3,3) , aAi ∈=2 ( ) {(1,1),(1,2),(2,1),(2,2)} and the corresponding pay-off matrix is showed in ⎧(1.3833,1.4667),i = (4,3) Table 8. ⎪ ⎪(1.7667,1.2333),i = (4,2) Table 8 Pay-off Matrix under the state of i = (3,3) * ⎪(1.2556,1.7000),i = (3,3) Blue ui2 ()= ⎨ 1 2 (1.3833,1.4667), i = (3,2) Red ⎪ ⎪(1.3833,1.4667), i = (2,3) 1 1.2556, 1.7000 2.0000, 0.7100 ⎪ 2 0.8833, 1.4667 1.5800, 1.4200 ⎩(1.2556,1.7000), i = (2,2) * * Then fi 3 () = (1,1) , ui 3 ( ) = (1.2556,1.7000) . Step3: For t = 1 , the expected total utility is *()()AB u 11 (,) ia =+−= r (,) ia u 2 ( i a ) ( u 1 (,), ia u 1 (,)) ia , such (4) For i = (3,2) , aAi ∈= 2 () {(1,1),(2,1)} and the that . As i = (6,4) and corresponding corresponding pay-off matrix is showed in Table 9. iSaAi∈ 11,()∈ 11 Ai1 ( )= {(2,1),(2,2),(3,1),(3,2),(4,1),(4,2)}, we find Master Thesis, Beijing Institute of Technology, * * the solution Eq {()} G 1 i of new game Gi 1 () under the Beijing. principle of expected total utility which is showed in Table 12. Aumann, Robert J., 1987, “game theory”, The New
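As a spot check of Step 2, the Table 6 entries for i = (4,3) can be recomputed directly from Table 4 and Table 5 (a minimal sketch; the values are transcribed from those tables, the u_3* entries 0.3333 and 0.6667 are written as exact thirds, and ⊕ is taken as player-wise addition):

```python
# r_2(a): position-battle payoffs, Table 4 (rows: Red 1..3, cols: Blue 1..2)
r2 = {(1, 1): (0.2556, 0.7000), (1, 2): (1.0000, 0.2100),
      (2, 1): (0.3833, 0.4667), (2, 2): (0.5800, 0.4200),
      (3, 1): (0.7667, 0.2333), (3, 2): (0.3867, 0.6300)}

# u_3*(k) = r_3(k): target-battle payoffs, Table 5
u3 = {(1, 1): (1.0, 1.0), (1, 2): (0.5, 1.0), (2, 1): (1.0, 0.5),
      (2, 2): (1.0, 1.0), (3, 1): (1.0, 1 / 3), (3, 2): (1.0, 2 / 3)}

i = (4, 3)
for a in [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]:
    j = (i[0] - a[0], i[1] - a[1])            # successor state i - a
    u2 = (r2[a][0] + u3[j][0], r2[a][1] + u3[j][1])
    print(a, u2)   # matches Table 6, e.g. (2,1) -> (1.3833, 1.4667)
```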

Step 3: For t = 1, the expected total utility is

    u_1(i, a) = r_1(i, a) + u_2*(i − a) = (u_1^(A)(i, a), u_1^(B)(i, a)),  i ∈ S_1, a ∈ A_1(i)

As i = (6,4) and A_1(i) = {(2,1),(2,2),(3,1),(3,2),(4,1),(4,2)}, we find the solution Eq{G_1*(i)} of the new game G_1*(i) under the principle of expected total utility, as shown in Table 12.

Table 12 Payoff matrix of the encounter battle under the state i = (6,4)

            Blue: 1               Blue: 2
Red: 2      1.8132, 2.01161       2.2667, 1.7333
Red: 3      1.65596, 2.18551      1.89768, 1.92568
Red: 4      1.77398, 1.87225      1.79316, 2.08447

Then f_1*((6,4)) = (2,1), u_1*((6,4)) = (1.8132, 2.01161).

Above all, we find the system equilibrium of this game chain as E* = ((2,1), (2,1), (2,2)), that is, a_1 = f_1*((6,4)) = (2,1), a_2 = f_2*((4,3)) = (2,1) and a_3 = f_3*((2,2)) = (2,2), which could provide a proposal for the decision-maker from holistic and game perspectives. The meaning of the decision function also lies in the fact that, in the actual decision-making process, it can provide the system equilibrium of the remaining stages of the affair after being combined with the feedback of environmental information.

6. CONCLUSION

As demonstrated by a virtual game, it seems feasible to find a solution for the game chain decision problem through the pathway proposed in this paper. The result is more acceptable and reasonable from systematic and holistic perspectives. The most important purpose of this paper is to draw more attention to adding the systematic idea and the game idea into complex decision-making processes in such fields as infrastructure management, in order to pursue a more reasonable scheme for decision makers before taking action.

REFERENCES

An Tingyu, 2008. A Decision-Making Method of Game Chain Based on Markov Decision Processes, Master Thesis, Beijing Institute of Technology, Beijing.

Aumann, Robert J., 1987. "Game Theory", The New Palgrave: A Dictionary of Economics, Vol. 2, 460-482 p.

Checkland, P., 1999. Systems Thinking, Systems Practice: Includes a 30-Year Retrospective, John Wiley, Chichester.

Hou Guangming, 2006. An Introduction to Organizational Systems Science, Science Press, Beijing.

Hou Guangming, 2007. Game-chain and Its Application for Organizational Innovation for Defense R&D, Proceedings of the 288th XIANGSHAN Science Conferences, Beijing, 84-101 p.

Liu Ke, 2004. Applied Markov Decision Processes, Tsinghua University Press, Beijing.

Puterman, M. L., 1994. Markov Decision Processes, John Wiley & Sons, New York.

Senge, P. M., 2006. The Fifth Discipline: The Art and Practice of the Learning Organization, Random House Business Books, London, 6-7 p.