Stochastic Games
(in Formal Verification)

Antonín Kučera
Masaryk University Brno

SFM-10:QAPL 2010

Game theory studies the behavior of rational "players" who can make choices and attempt to achieve a certain objective. A player's success depends on the choices of the other players.

Stochastic games:
  - the impact of players' choices is uncertain;
  - the players' choices can be randomized.

Games in computer science:
  - formal semantics;
  - communication protocols;
  - Internet auctions;
  - ... and many other things.

Stochastic games in formal verification

Our setting:
  - state space: discrete
  - players: controller, environment
  - objectives: antagonistic
  - choice: turn-based, randomized
  - information: perfect

Is there a strategy for the controller such that the system satisfies a certain property no matter what the environment does?

Outline

  - Preliminaries.
      - Games, strategies, objectives.
  - Stochastic games with reachability objectives.
      - The (non)existence of optimal strategies.
      - Algorithms for finite-state games.
  - Stochastic games with branching-time objectives.
  - Stochastic games with time.

Markov chains

Definition 1 (Markov chain)
$\mathcal{M} = (S, \to, Prob)$
  - $S$ is an at most countable set of states;
  - $\to \subseteq S \times S$ is a transition relation;
  - $Prob$ is a probability assignment.

[Figure: an example Markov chain with states s, t, and u; the edge labels are transition probabilities.]

We want to measure the probability of certain subsets of $Run(s)$.
  - For every finite path $w$ initiated in $s$, we define the probability of $Run(w)$ in the natural way.
  - This assignment can be uniquely extended to the (Borel) σ-algebra $\mathcal{F}$ generated by all $Run(w)$.
  - Thus, we obtain the probability space $(Run(s), \mathcal{F}, \mathcal{P})$.
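The "natural way" of assigning a probability to Run(w) is usually read as the product of the transition probabilities along w. The following minimal sketch illustrates this reading; the function name, the dictionary encoding of Prob, and the three-state chain are assumptions made for illustration and are not taken from the slides.

```python
def cylinder_probability(prob, path):
    """Probability of Run(w): the product of transition probabilities along w.

    prob maps a pair (state, successor) to a probability; a missing pair
    is treated as "no transition" (an assumption of this sketch).
    """
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= prob.get((src, dst), 0.0)
    return p


# Hypothetical chain over the states s, t, u (probabilities are made up).
PROB = {
    ("s", "s"): 0.5, ("s", "t"): 0.5,
    ("t", "t"): 0.75, ("t", "u"): 0.25,
    ("u", "u"): 1.0,
}

if __name__ == "__main__":
    print(cylinder_probability(PROB, ["s", "t", "u"]))
```

A path consisting of a single state has an empty product, so Run(s) itself gets probability 1, as expected of the whole probability space.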

Turn-based stochastic games

Definition 2 (Turn-based stochastic game)
$G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$
  - the set $V$ is at most countable;
  - each vertex has a successor;
  - $Prob$ is positive;
  - $G$ is a Markov decision process (MDP) if $V_\Diamond = \emptyset$ or $V_\Box = \emptyset$.

[Figure: an example game graph; the edges leaving stochastic vertices are labeled with probabilities 0.2, 0.8 and 0.4, 0.6.]

Strategies

Definition 3 (Strategy)
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game. A strategy for player $\Box$ is a function $\sigma$ which to every $wv \in V^* V_\Box$ assigns a probability distribution over the set of outgoing edges of $v$.
A strategy for player $\Diamond$ is defined analogously.

We can classify strategies according to
  - memory requirements: history-dependent (H), finite-memory (F), memoryless (M)
  - randomization: randomized (R), deterministic (D)

Thus, we obtain the classes of MD, MR, FD, FR, HD, and HR strategies.

Plays

Definition 4 (Play)
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game. Each pair $(\sigma, \pi)$ of strategies for player $\Box$ and player $\Diamond$ determines a unique play $G^{(\sigma,\pi)}$, which is a Markov chain where $V^+$ is the set of states and transitions are defined accordingly.

  - Plays are infinite trees.
  - For a pair of memoryless strategies $(\sigma, \pi)$, the play $G^{(\sigma,\pi)}$ can be depicted as a Markov chain with the set of states $V$.
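For memoryless strategies, the collapse of a play to a Markov chain over V can be made concrete. The sketch below assumes a finite game encoded as a dictionary; the encoding, the names `induced_chain`, `GAME`, `SIGMA`, `PI`, and the example game are all hypothetical.

```python
def induced_chain(game, sigma, pi):
    """Markov chain on V induced by memoryless (MR) strategies sigma and pi.

    game maps a vertex to (kind, successors), where kind is "max" (player Box),
    "min" (player Diamond) or "rand" (stochastic). For "rand" vertices the
    successors are a dict successor -> probability; for player vertices
    they are a list. sigma and pi map player vertices to distributions.
    """
    chain = {}
    for v, (kind, succ) in game.items():
        if kind == "max":
            dist = sigma[v]
        elif kind == "min":
            dist = pi[v]
        else:
            dist = succ
        for w, p in dist.items():
            if p > 0:
                chain[(v, w)] = p
    return chain


# Hypothetical example game; t and z are absorbing.
GAME = {
    "v": ("max", ["a", "b"]),
    "a": ("min", ["t", "z"]),
    "b": ("rand", {"t": 0.5, "z": 0.5}),
    "t": ("rand", {"t": 1.0}),
    "z": ("rand", {"z": 1.0}),
}
SIGMA = {"v": {"a": 0.5, "b": 0.5}}  # randomized choice at v
PI = {"a": {"z": 1.0}}               # deterministic choice at a

if __name__ == "__main__":
    for edge, p in sorted(induced_chain(GAME, SIGMA, PI).items()):
        print(edge, p)
```

History-dependent strategies do not fit this encoding, because their choices depend on the whole history in $V^+$ rather than on the current vertex.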

Plays (2)

Example 5 (A game and its play)

[Figure: a game with two vertices v and u; v has edges to v and to u, and u loops to itself with probability 1.]

  - Is there a strategy $\sigma$ such that $v \models \mathcal{G}^{>0}(v)$ in $G^\sigma$?
  - Is there a strategy $\sigma$ such that $v \models \mathcal{G}^{>0}(v \wedge \mathcal{F}^{>0} u)$ in $G^\sigma$?
  - Obviously, there is no such MR (or even FR) strategy.
  - Let $\sigma(wv)$ select the edge $v \to u$ with probability $1/2^{|wv|}$ and the edge $v \to v$ with probability $1 - 1/2^{|wv|}$.

[Figure: the play $G^\sigma$ as an infinite tree. The spine is v, vv, vvv, vvvv, ... with transition probabilities 1/2, 3/4, 7/8, 15/16, ...; the side branches lead to vu, vvu, vvvu, vvvvu, ... with probabilities 1/2, 1/4, 1/8, 1/16, ...; every state ending in u continues with probability 1.]
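Under the strategy of Example 5, the probability of staying in v for the first n rounds is the product of the spine probabilities 1/2, 3/4, 7/8, ... The short sketch below (the function name is an assumption) evaluates these partial products numerically; they appear to settle at a strictly positive number, which is what the first formula requires.

```python
def stay_probability(n):
    """Probability that the play of Example 5 stays in v for n rounds.

    Round k (history of length k) moves to u with probability 1/2**k,
    so the play stays in v with probability 1 - 1/2**k.
    """
    p = 1.0
    for k in range(1, n + 1):
        p *= 1.0 - 0.5 ** k
    return p


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 60):
        print(n, stay_probability(n))
```

A memoryless strategy would use the same probability in every round, so the corresponding product would either be 1 (u is never reachable) or tend to 0, which matches the remark that no MR strategy works.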

A taxonomy of objectives

Each play of a game G is assigned a (numerical) yield. The goal of player $\Box$/$\Diamond$ is to maximize/minimize the yield.

Win-lose objectives assign either 1 or 0 to each play.
  - $\mathcal{P}^{\bowtie \varrho} \varphi$, where $\varphi$ is an LTL formula.
  - PCTL or PCTL* objectives.

Objectives specified by Borel measurable payoffs.
  - $yield(G^{\sigma,\pi}) = E(f^{\sigma,\pi})$, where $f : Run(G) \to \mathbb{R}$ is measurable.
  - Qualitative payoffs assign either 1 or 0 to each run: Büchi, parity, Rabin, Streett, Muller, etc.
  - Quantitative payoffs:
      - Mean payoff: $MP(w) = \lim_{n \to \infty} \frac{\sum_{i=0}^{n} rew(w(i))}{n}$
      - Discounted payoff: $DP(w) = \sum_{i=0}^{\infty} \lambda^i \cdot rew(w(i))$
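The two quantitative payoffs can be approximated on a finite prefix of a run. The sketch below is only an illustration: the function names and the reward sequences are assumptions, and the mean payoff is approximated by the plain average of the prefix, which has the same limit as the formula above whenever that limit exists.

```python
def discounted_payoff(rewards, lam):
    """Partial sum of DP(w): sum over i of lam**i * rew(w(i)) on a finite prefix."""
    return sum((lam ** i) * r for i, r in enumerate(rewards))


def mean_payoff(rewards):
    """Average reward over a finite, non-empty prefix (approximates MP(w))."""
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    print(discounted_payoff([1, 1, 1], 0.5))
    print(mean_payoff([0, 2, 0, 2]))
```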

The problems of interest

Win-lose objectives
  - Determinacy: does one of the two players always have a winning strategy? If so, what type of strategy?
  - Can we effectively determine the winner and compute a winning strategy for her?

Objectives specified by Borel measurable payoffs
  - Is there an equilibrium value? If so, do the players have optimal strategies? And of what type?
  - Can we compute the value and (ε-)optimal strategies?

The existence of an equilibrium value

Theorem 6 (Martin, 1998; Maitra & Sudderth, 1998)
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game, $v \in V$, and $f : Run(G) \to \mathbb{R}$ a bounded Borel measurable payoff. Then

$$\sup_{\sigma} \inf_{\pi} E(f_v^{\sigma,\pi}) = \inf_{\pi} \sup_{\sigma} E(f_v^{\sigma,\pi})$$

Thm. 6 does not impose any restrictions on G. The set of vertices and the branching degree of G can be infinite.

References:
  - D.A. Martin. The Determinacy of Blackwell Games. The Journal of Symbolic Logic, Vol. 63, No. 4 (Dec., 1998), pp. 1565–1581.
  - A. Maitra and W. Sudderth. Finitely Additive Stochastic Games with Borel Measurable Payoffs. International Journal of Game Theory, Vol. 27 (1998), pp. 257–267.

Optimal strategies

Definition 7
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game, $v \in V$, and $f : Run(G) \to \mathbb{R}$ a bounded Borel measurable payoff. Let $\varepsilon \in [0, 1]$, and let $val_f(v)$ denote the equilibrium value of Thm. 6.
  - An ε-optimal maximizing strategy is a strategy $\sigma$ for player $\Box$ such that for every strategy $\pi$ of player $\Diamond$ we have that $E(f_v^{\sigma,\pi}) \geq val_f(v) - \varepsilon$.
  - An ε-optimal minimizing strategy is a strategy $\pi$ for player $\Diamond$ such that for every strategy $\sigma$ of player $\Box$ we have that $E(f_v^{\sigma,\pi}) \leq val_f(v) + \varepsilon$.
  - An optimal maximizing/minimizing strategy is a 0-optimal maximizing/minimizing strategy.

According to Thm. 6, ε-optimal maximizing/minimizing strategies exist for every $\varepsilon > 0$.

... and we cannot say much more in the general setting.

Reachability objectives

Now we examine the properties of interest for reachability objectives in greater detail (and reveal some surprising facts).

  - Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game and $T \subseteq V$ a set of target vertices.
  - Let $Reach(T)$ be the set of all runs that visit $T$.
  - The goal of player $\Box$/$\Diamond$ is to maximize/minimize the probability of $Reach(T)$.

Reachability games have a value (1)

Theorem 8
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game and $T \subseteq V$ a set of target vertices. For every $v \in V$ we have that

$$\sup_{\sigma} \inf_{\pi} \mathcal{P}_v^{\sigma,\pi}(Reach(T)) = \inf_{\pi} \sup_{\sigma} \mathcal{P}_v^{\sigma,\pi}(Reach(T))$$

Reachability games have a value (2)

Proof sketch.
Let $\Gamma : [0,1]^{|V|} \to [0,1]^{|V|}$ be a (monotonic) function defined by

$$\Gamma(\alpha)(v) = \begin{cases}
1 & \text{if } v \in T; \\
\sup\{\alpha(v') \mid (v, v') \in E\} & \text{if } v \notin T \text{ and } v \in V_\Box; \\
\inf\{\alpha(v') \mid (v, v') \in E\} & \text{if } v \notin T \text{ and } v \in V_\Diamond; \\
\sum_{(v,v') \in E} Prob(v, v') \cdot \alpha(v') & \text{if } v \notin T \text{ and } v \in V_\bigcirc.
\end{cases}$$

Here $\mu\Gamma$ denotes the least fixed point of $\Gamma$.

  - $\mu\Gamma(v) \leq \sup_{\sigma} \inf_{\pi} \mathcal{P}_v^{\sigma,\pi}(Reach(T)) \leq \inf_{\pi} \sup_{\sigma} \mathcal{P}_v^{\sigma,\pi}(Reach(T))$
      - the second inequality holds for all Borel objectives;
      - the tuple of all $\sup_{\sigma} \inf_{\pi} \mathcal{P}_v^{\sigma,\pi}(Reach(T))$ is a fixed point of $\Gamma$.
  - It cannot be that $\mu\Gamma(v) < \inf_{\pi} \sup_{\sigma} \mathcal{P}_v^{\sigma,\pi}(Reach(T))$:
      - for all $\varepsilon > 0$ and $v \in V$, there is a strategy $\hat{\pi}$ such that $\sup_{\sigma} \mathcal{P}_v^{\sigma,\hat{\pi}}(Reach(T)) \leq \mu\Gamma(v) + \varepsilon$. □
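For a finite game, the least fixed point of Γ can be approximated from below by iterating Γ on the all-zero vector. The sketch below is an illustration under that finiteness assumption; the game encoding, the names `gamma`, `approximate_values`, `GAME`, and the example game are hypothetical, and sup/inf become max/min because every vertex has finitely many successors.

```python
def gamma(alpha, game, targets):
    """One application of the operator Gamma to the vector alpha."""
    new = {}
    for v, (kind, succ) in game.items():
        if v in targets:
            new[v] = 1.0
        elif kind == "max":      # player Box: best successor
            new[v] = max(alpha[w] for w in succ)
        elif kind == "min":      # player Diamond: worst successor
            new[v] = min(alpha[w] for w in succ)
        else:                    # stochastic vertex: weighted average
            new[v] = sum(p * alpha[w] for w, p in succ.items())
    return new


def approximate_values(game, targets, rounds):
    """Iterate Gamma from the zero vector; approaches the least fixed point from below."""
    alpha = {v: 0.0 for v in game}
    for _ in range(rounds):
        alpha = gamma(alpha, game, targets)
    return alpha


# Hypothetical finite game: t is the target, z is a losing sink.
GAME = {
    "v": ("max", ["m", "d"]),
    "m": ("min", ["a", "c"]),
    "a": ("rand", {"t": 0.5, "z": 0.5}),
    "c": ("rand", {"t": 0.9, "z": 0.1}),
    "d": ("rand", {"t": 0.3, "z": 0.7}),
    "t": ("rand", {"t": 1.0}),
    "z": ("rand", {"z": 1.0}),
}

if __name__ == "__main__":
    print(approximate_values(GAME, {"t"}, 10))
```

In this acyclic example the iteration stabilizes after a few rounds; in games with cycles the iterates only converge in the limit, so the result is an under-approximation of the value.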

Minimizing strategies (1)

Definition 9 (Locally optimal minimizing strategy)
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a game.
An edge $(v, v') \in E$ is value minimizing if

$$val(v') = \min\{\, val(\hat{v}) \mid \hat{v} \in V,\ (v, \hat{v}) \in E \,\}$$

A locally optimal minimizing strategy is a strategy which in every play selects only value minimizing edges.
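Definition 9 can be read as a purely local test on the successors of a vertex. The sketch below assumes the values are already known and that the vertex has finitely many successors (otherwise the minimum need not exist); the function name and the numbers are hypothetical.

```python
def value_minimizing_edges(edges, val, v):
    """Edges (v, w) whose target attains the minimal value among the successors of v."""
    best = min(val[w] for w in edges[v])
    return [(v, w) for w in edges[v] if val[w] == best]


# Hypothetical successors and values for a single vertex of player Diamond.
EDGES = {"v": ["a", "b", "c"]}
VAL = {"a": 0.5, "b": 0.25, "c": 0.25}

if __name__ == "__main__":
    print(value_minimizing_edges(EDGES, VAL, "v"))
```

Any strategy that always picks from the returned list, possibly at random, is locally optimal in the sense of the definition.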

Minimizing strategies (2)

Theorem 10
Every locally optimal min. strategy is an optimal min. strategy.

Proof.
Let $v \in V$ be an initial vertex, and $u \in V$ a target vertex.
(1) After playing $k$ rounds according to a locally optimal minimizing strategy, player $\Diamond$ can switch to ε-optimal minimizing strategies in the current vertices of the play. Thus, we always (for every $k$ and $\varepsilon > 0$) obtain an ε-optimal minimizing strategy for $v$.
(2) Let $\pi$ be a locally optimal min. strategy which is not optimal. Then there is a strategy $\sigma$ of player $\Box$ such that $\mathcal{P}_v^{\sigma,\pi}(Reach(T)) = val(v) + \delta$, where $\delta > 0$. This means that there is $k \in \mathbb{N}$ such that $\mathcal{P}_v^{\sigma,\pi}(Reach^k(T)) > val(v) + \frac{\delta}{2}$. Hence, if player $\Diamond$ switches to a $\frac{\delta}{4}$-optimal minimizing strategy after playing $k$ rounds according to $\pi$, we do not obtain a $\frac{\delta}{4}$-optimal minimizing strategy for $v$, which contradicts (1). □

Minimizing strategies (3)

Corollary 11 (Properties of minimizing strategies)
In every finitely-branching game, there is an optimal minimizing MD strategy.

Theorem 12
Every optimal min. strategy is a locally optimal min. strategy. Hence, if player $\Diamond$ has some optimal minimizing strategy, then she also has an MD optimal minimizing strategy.

Proof.
This is WRONG. Optimal minimizing strategies may require infinite memory. □

Minimizing strategies (4)

Theorem 13
Optimal minimizing strategies do not necessarily exist, and (ε-)optimal minimizing strategies may require infinite memory.

Proof.
[Figure: a game with vertices r and v and an infinite family of vertices s_1, s_2, s_3, ..., s_i, ...; the two edges leaving s_i are labeled 1/2^i and 1 - 1/2^i (that is, 1/2, 1/4, 1/8, ... and 1/2, 3/4, 7/8, ...), the edges leaving r are labeled 1/2, and the final vertex has a loop with probability 1.] □

Maximizing strategies (1)

Observation 14
A locally optimal maximizing strategy is not necessarily an optimal maximizing strategy. This holds even for finite-state MDPs.

Proof.
[Figure: a finite-state MDP with vertices v and t; the target t has a loop with probability 1.] □

Maximizing strategies (2)

Theorem 15
Let $v \in V_\Box$ be a vertex with finitely many successors $t_1, \ldots, t_n$. Then there is $1 \leq i \leq n$ such that $val(v)$ does not change if all edges $(v, t_j)$, where $i \neq j$, are deleted from the game.

Proof.

$$V_{t_k}^{(\sigma,\pi)} = \begin{cases}
\dfrac{\mathcal{P}(u)}{\mathcal{P}(u) + \mathcal{P}(\uparrow)} & \text{if } \mathcal{P}(u) + \mathcal{P}(\uparrow) > 0; \\
0 & \text{otherwise;}
\end{cases}$$

$$V_{t_k}^{\sigma} = \inf_{\pi} V_{t_k}^{(\sigma,\pi)}, \qquad V_{t_k} = \sup_{\sigma} V_{t_k}^{\sigma}$$

[Figure: from $t_k$, a play either returns to v, reaches u, or does neither (marked ↑).]

There must be some $k$ such that $V_{t_k} = val(v)$. We put $i = k$. □

Maximizing strategies (3)

Theorem 16
Optimal maximizing strategies may not exist, even in finitely-branching MDPs.

[Figure: an infinite-state, finitely-branching MDP with initial vertex v; the edges leaving stochastic vertices are labeled 1/2, and the terminal edges are labeled 1.]

Maximizing strategies (4)

Theorem 17
Optimal maximizing strategies may require infinite memory, even in finitely-branching games.

[Figure: an infinite-state, finitely-branching game with vertices $\hat{v}$ and $v$ and three infinite rows of vertices d_1, d_2, d_3, ..., e_1, e_2, e_3, ..., and s_1, s_2, s_3, ...; the edges leaving stochastic vertices are labeled 1/2.]

Summary

Minimizing strategies:
  - Optimal minimizing strategies may not exist. Optimal and ε-optimal minimizing strategies may require infinite memory.
  - In finitely-branching games, there are MD optimal minimizing strategies.

Maximizing strategies:
  - Optimal maximizing strategies may not exist, even in finitely-branching games. Optimal maximizing strategies may require infinite memory.
  - In finite-state games, there are MD optimal maximizing strategies.

References:
  - M.L. Puterman. Markov Decision Processes. Wiley, 1994.
  - T. Brázdil, V. Brožek, V. Forejt, A. Kučera. Reachability in recursive Markov decision processes. Information and Computation, vol. 206, pp. 520–537, 2008.

Reachability as a win-lose objective (1)

Let $\varrho \in [0, 1]$.

  - A strategy $\sigma \in \Sigma$ is $(\geq\varrho)$-winning in $v$ if for every $\pi \in \Pi$ we have that $\mathcal{P}_v^{(\sigma,\pi)}(Reach(T)) \geq \varrho$.
  - A strategy $\pi \in \Pi$ is $(<\varrho)$-winning in $v$ if for every $\sigma \in \Sigma$ we have that $\mathcal{P}_v^{(\sigma,\pi)}(Reach(T)) < \varrho$.

Is there a winning strategy for one of the two players?

Reachability as a win-lose objective (2)

Theorem 18
Turn-based stochastic games with reachability objectives are not necessarily determined. However, finitely-branching games are determined.

Reachability as a win-lose objective (3)

[Figure: a game in which a vertex s leads to the vertices u and v with probability 1/2 each; u and v are the roots of two infinite-state subgames whose stochastic edges are labeled 1/2 and whose terminal edges are labeled 1.]

Algorithms for finite-state MDPs and games

We show how to compute the values and optimal strategies for reachability objectives in finite-state games and MDPs.

  - For finite-state MDPs we have that
      - the values and optimal strategies are computable in polynomial time.
  - For finite-state games we have that
      - the values and optimal strategies are computable in polynomial space (for a fixed number of randomized vertices, the problem is in P).

Finite-state MDPs (1)

Theorem 19
Let $G = (V, E, (V_\Box, V_\bigcirc), Prob)$ be a finite-state MDP. Then

$$V^{=0} = \{v \in V \mid val(v) = 0\}$$
$$V^{=1} = \{v \in V \mid val(v) = 1\}$$

are computable in polynomial time.

Proof.
It suffices to realize that $V^{=1}$ is exactly the greatest $S \subseteq V$ satisfying the following conditions:
  - If $v \in S$, then there is a finite path from $v$ to the target vertex which visits only the vertices of $S$.
  - If $v \in S \cap V_\bigcirc$, then all successors of $v$ belong to $S$.
Hence, $V^{=1}$ is computable in polynomial time. The set $V^{=0}$ can be computed similarly. Note that the sets $V^{=1}$ and $V^{=0}$ depend only on the "topology" of $G$. □
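The characterization in the proof suggests a simple refinement loop: start with S = V and repeatedly discard vertices that violate one of the two conditions. The sketch below is one possible reading, not necessarily the algorithm the author has in mind; the graph encoding, the names `almost_sure_set`, `EDGES`, `STOCHASTIC`, and the example MDP are assumptions.

```python
def almost_sure_set(edges, stochastic, targets):
    """Greatest S such that every vertex of S reaches a target inside S and
    every stochastic vertex of S has all its successors in S."""
    S = set(edges)
    while True:
        # Backward reachability to the targets, using only vertices of S.
        reach = {t for t in targets if t in S}
        frontier = list(reach)
        while frontier:
            u = frontier.pop()
            for v in S:
                if v not in reach and u in edges[v]:
                    reach.add(v)
                    frontier.append(v)
        # Keep stochastic vertices only if they cannot leave the set.
        new_S = {v for v in reach
                 if v not in stochastic or all(w in reach for w in edges[v])}
        if new_S == S:
            return S
        S = new_S


# Hypothetical MDP: v, b belong to player Box; a, c, t, z are stochastic.
EDGES = {
    "v": ["v", "a"], "a": ["t", "z"],
    "b": ["c"], "c": ["t", "b"],
    "t": ["t"], "z": ["z"],
}
STOCHASTIC = {"a", "c", "t", "z"}

if __name__ == "__main__":
    print(sorted(almost_sure_set(EDGES, STOCHASTIC, {"t"})))
```

Only the edge relation is consulted, never the probabilities, which reflects the remark that the set depends only on the "topology" of G. Each round removes at least one vertex or stops, so the loop runs at most |V| times.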

Finite-state MDPs (2)

Theorem 20
Let $G = (V, E, (V_\Box, V_\bigcirc), Prob)$ be a finite-state MDP where $Prob$ is rational. The values $val(v)$, $v \in V$, are rational and computable in polynomial time. An optimal maximizing strategy is also constructible in polynomial time.

Proof.
Let $V = \{v_1, \ldots, v_n\}$, where $v_n$ is the (only) target vertex.

minimize $x_1 + \cdots + x_n$
subject to
  - $x_n = 1$
  - $x_i \geq x_j$ for all $(v_i, v_j) \in E$ where $v_i \in V_\Box$ and $i < n$
  - $x_i = \sum_{(v_i, v_j) \in E} Prob(v_i, v_j) \cdot x_j$ for all $v_i \in V_\bigcirc$, $i < n$
  - $x_i \geq 0$ for all $i \in \{1, \ldots, n\}$

An optimal strategy can be constructed by successively removing the outgoing edges of every $v \in V_\Box$ until only one such edge is left. □

Finite-state games (1)

Theorem 21
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a finite-state game. Then
$V^{=0} = \{v \in V \mid val(v) = 0\}$
$V^{=1} = \{v \in V \mid val(v) = 1\}$
are computable in polynomial time.

Proof.
$V^{>0} = \mu\Gamma$, where $\Gamma : 2^V \rightarrow 2^V$ is defined as follows:
$\Gamma(A) = T \cup \{v \in V_\Box \cup V_\bigcirc \mid \exists (v, v') \in E \text{ s.t. } v' \in A\} \cup \{v \in V_\Diamond \mid \forall (v, v') \in E \text{ we have that } v' \in A\}$
$V^{=0} = V \smallsetminus V^{>0}$
$V^{<1} = \mu\Gamma$, where $\Gamma : 2^V \rightarrow 2^V$ is defined as follows:
$\Gamma(A) = V^{=0} \cup \{v \in V_\Diamond \cup V_\bigcirc \mid \exists (v, v') \in E \text{ s.t. } v' \in A\} \cup \{v \in V_\Box \mid \forall (v, v') \in E \text{ we have that } v' \in A\}$
$V^{=1} = V \smallsetminus V^{<1}$
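The first fixed point in the proof of Theorem 21, $V^{>0} = \mu\Gamma$, can be computed by iterating $\Gamma$ from the target set until nothing changes. The sketch below covers only $V^{>0}$ and its complement $V^{=0}$; the function name `positive_value_set`, the encoding, and the example game are illustrative assumptions, and every vertex is assumed to have at least one outgoing edge.

```python
def positive_value_set(vertices, edges, min_vertices, targets):
    """Least fixed point of Gamma for V^{>0}.

    Vertices outside `min_vertices` (maximizing or random) join as soon as
    one successor is in the set; minimizing vertices join only when all
    successors are. Assumption: every vertex has an outgoing edge."""
    succ = {v: set() for v in vertices}
    for u, w in edges:
        succ[u].add(w)

    A = set(targets)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in A:
                continue
            if v in min_vertices:
                joins = succ[v] <= A        # all successors already in A
            else:
                joins = bool(succ[v] & A)   # some successor already in A
            if joins:
                A.add(v)
                changed = True
    return A


# Hypothetical game: m and m2 belong to the minimizing player.
V = ["t", "z", "a", "m", "m2", "r"]
E = [("t", "t"), ("z", "z"), ("a", "t"), ("a", "z"),
     ("m", "a"), ("m", "z"), ("m2", "a"), ("m2", "t"),
     ("r", "z"), ("r", "m")]
positive = positive_value_set(V, E, min_vertices={"m", "m2"}, targets={"t"})
zero = set(V) - positive
print(sorted(positive), sorted(zero))
```

Here `m` can escape to the sink `z`, so it has value 0, while `m2` has no such escape. The loop adds at least one vertex per pass, so it terminates after at most |V| passes.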

Finite-state games (2)

Theorem 22 (Anne Condon, 1992)
Let $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ be a finite-state game. The problem whether $val(v) > \frac{1}{2}$ for a given $v \in V$ is in NP $\cap$ coNP.

Proof.
Since both players have optimal MD strategies, it suffices to
"guess" an optimal MD strategy for player $\Box$ (or player $\Diamond$);
compute the value in the resulting MDP by solving the associated linear program.

Obviously, $val(v)$ and the optimal strategies for both players are computable by exhaustive search.

Finite-state games (3)

Theorem 23 (Gimbert, Horn, 2008)
The values and MD optimal strategies in a finite-state game $G = (V, E, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ are computable in
$O(|V_\bigcirc|! \cdot (\log(|V|) \cdot |E| + |p|))$
time, where $|p|$ is the maximal bit-length of an edge probability.

Remark 24
The question whether finite-state stochastic games are solvable in P is a longstanding open problem in algorithmic game theory.

References:
A. Condon. The Complexity of Stochastic Games. Information and Computation, 96(2):203–224, 1992.
L.S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences USA, 39:1095–1100, 1953.
H. Gimbert, F. Horn. Simple Stochastic Games with Few Random Vertices Are Easy to Solve. Proc. FoSSaCS 2008, pp. 5–19, LNCS 4962, Springer, 2008.


Infinite-state games

Interesting classes of infinite-state stochastic games are obtained by extending non-deterministic computational devices with randomized choice. So far, most of the results consider
pushdown automata (recursive state machines);
lossy channel systems.

There are some "new" problems:
The value can be irrational.
[Figure: an infinite-state example with a target vertex t and a vertex v; edge probabilities 1/2 and 1.]
$val(v)$ is the least solution of $x = \frac{1}{2} + \frac{1}{2} x^3$ in $[0, 1]$, i.e., $\frac{\sqrt{5} - 1}{2}$.
Even MD strategies may not be finitely representable.
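The irrational value can be checked numerically. The equation $x = \frac{1}{2} + \frac{1}{2}x^3$ factors as $(x - 1)(x^2 + x - 1) = 0$, so its least solution in $[0, 1]$ is $\frac{\sqrt{5} - 1}{2} \approx 0.618$. The sketch below approximates it by iterating the right-hand side from 0; the function name `least_solution` and the iteration count are arbitrary choices for illustration.

```python
import math


def least_solution(rounds=200):
    """Iterate x -> 1/2 + x^3/2 starting from 0.

    The map is monotone on [0, 1], so the iterates increase towards
    the least fixed point."""
    x = 0.0
    for _ in range(rounds):
        x = 0.5 + 0.5 * x ** 3
    return x


approx = least_solution()
exact = (math.sqrt(5) - 1) / 2
print(approx, exact)
```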

Stochastic BPA games (1)

Definition 25
A stochastic BPA game is a tuple $\Delta = (\Gamma, \hookrightarrow, (\Gamma_\Box, \Gamma_\Diamond, \Gamma_\bigcirc), Prob)$ where
$\Gamma$ is a finite stack alphabet,
$\hookrightarrow \subseteq \Gamma \times \Gamma^{\leq 2}$ is a finite set of rules,
$(\Gamma_\Box, \Gamma_\Diamond, \Gamma_\bigcirc)$ is a partition of $\Gamma$,
$Prob$ is a probability assignment which to each $X \in \Gamma_\bigcirc$ assigns a rational positive probability distribution on the set of all rules of the form $X \hookrightarrow \alpha$.

Stochastic BPA games (2)

Example 26
Let $\Gamma = \{X, Y, Z, P, R\}$, where $\Gamma_\Box = \{X, Y\}$, $\Gamma_\Diamond = \emptyset$, $\Gamma_\bigcirc = \{P, R, Z\}$, and
$X \hookrightarrow YZ$, $Y \hookrightarrow YP$, $Y \hookrightarrow P$, $P \stackrel{1/2}{\hookrightarrow} R$, $P \stackrel{1/2}{\hookrightarrow} \varepsilon$, $R \stackrel{1}{\hookrightarrow} R$, $Z \stackrel{1}{\hookrightarrow} Z$

[Figure: the initial part of the induced game. Top row: X, YZ, YPZ, YPPZ, YPPPZ. Middle row: Z, PZ, PPZ, PPPZ, PPPPZ, with transitions of probability 1/2. Bottom row: RZ, RPZ, RPPZ, RPPPZ, each with a self-loop of probability 1.]
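The configurations in Example 26 arise by rewriting the leftmost stack symbol according to the rules. The sketch below assumes this standard head-rewriting semantics of BPA, which the slide only shows through its figure; the names `RULES` and `successors` and the string encoding of configurations are hypothetical.

```python
# Rules of Example 26; the empty string stands for the empty word (epsilon).
RULES = {
    "X": ["YZ"],
    "Y": ["YP", "P"],
    "P": ["R", ""],
    "R": ["R"],
    "Z": ["Z"],
}


def successors(config, rules=RULES):
    """Configurations reachable in one step by rewriting the first symbol.
    Assumption: the empty configuration has no successors."""
    if not config:
        return []
    head, tail = config[0], config[1:]
    return [rhs + tail for rhs in rules[head]]


print(successors("X"))    # player-controlled head X
print(successors("YPZ"))  # player-controlled head Y
print(successors("PZ"))   # random head P, both rules have probability 1/2
```

Because rules push at most two symbols, every configuration has finitely many successors, but the set of reachable configurations (YZ, YPZ, YPPZ, ...) is infinite.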

BPA MDPs with reachability objectives (1)

Let $\Delta = (\Gamma, \hookrightarrow, (\Gamma_\Box, \Gamma_\bigcirc), Prob)$ be a BPA Markov decision process, and $T \subseteq \Gamma^*$ a regular set of target configurations.
Consider the sets
$W^{>0} = \{\alpha \in \Gamma^* \mid \exists \sigma : \mathcal{P}^\sigma_\alpha(Reach(T)) > 0\}$
$W^{=0} = \{\alpha \in \Gamma^* \mid \exists \sigma : \mathcal{P}^\sigma_\alpha(Reach(T)) = 0\}$
$W^{=1} = \{\alpha \in \Gamma^* \mid \exists \sigma : \mathcal{P}^\sigma_\alpha(Reach(T)) = 1\}$
$W^{<1} = \{\alpha \in \Gamma^* \mid \exists \sigma : \mathcal{P}^\sigma_\alpha(Reach(T)) < 1\}$
These sets are regular and the associated finite-state automata are computable in polynomial time. The corresponding winning strategies are regular and computable in polynomial time. Similar results hold for BPA games.

BPA MDPs with reachability objectives (2)

References:
K. Etessami, M. Yannakakis. Efficient Qualitative Analysis of Classes of Recursive Markov Decision Processes and Simple Stochastic Games. Proc. STACS 2006, pp. 634–645, LNCS 3884, Springer, 2006.
T. Brázdil, V. Brožek, A. Kučera, and J. Obdržálek. Qualitative Reachability in Stochastic BPA Games. Proc. STACS 2009, pp. 207–218, 2009.

Branching-time winning objectives

Specified by formulae of branching-time logics that are interpreted over Markov chains (such as PCTL or PCTL$^*$).

$\mathcal{G}^{=1}(p \Rightarrow \mathcal{F}^{\geq 0.1} q)$

The aim of player $\Box$ and player $\Diamond$ is to satisfy and falsify a given formula, respectively.

Properties of games with b.-t. objectives (I)

Memory and randomization help:
[Diagram: the hierarchy of strategy types, with HR at the top, MR and HD in the middle, and MD at the bottom.]

Consider the following game:
[Figure: a game with vertices v, p, and q; edge labels 1 and 1.]

$\mathcal{X}^{=1} p \wedge \mathcal{F}^{=1} q$. Requires memory.
$\mathcal{X}^{>0} p \wedge \mathcal{X}^{>0} q$. Requires randomization.
$\mathcal{X}^{>0} p \wedge \mathcal{X}^{>0} q \wedge \mathcal{F}^{=1} \mathcal{G}^{=1} q$. Requires both memory and randomization.
In some cases, infinite memory is required.

Properties of games with b.-t. objectives (II)

The games are not determined (for any strategy type).

$\mathcal{F}^{=1}(a \vee c) \vee \mathcal{F}^{=1}(b \vee d) \vee \left(\mathcal{F}^{>0} c \wedge \mathcal{F}^{>0} d\right)$

[Figure: a game with a vertex v and vertices a, b, c, d; two transitions of probability 1/2, and a self-loop of probability 1 on each of a, b, c, d.]

Who wins the game (MD strategies)?

Theorem 27 (Brázdil, Brožek, Forejt, K., 2006)
The existence of a winning MD strategy for player $\Box$ is $\Sigma_2 = \mathrm{NP}^{\mathrm{NP}}$-complete.

Proof.
The membership to $\Sigma_2$ follows easily. The $\Sigma_2$-hardness can be established as follows:
Let $\exists x_1, \ldots, x_n \, \forall y_1, \ldots, y_m \, B$ be a $\Sigma_2$ formula.
Consider the following game:
[Figure: a game with a vertex v and vertices $p_1, \hat{p}_1, \ldots, p_n, \hat{p}_n, q_1, \hat{q}_1, \ldots, q_m, \hat{q}_m$.]
Let $\varphi$ be the PCTL formula obtained from $B$ by substituting each occurrence of $x_i$, $\neg x_i$, $y_j$, and $\neg y_j$ with $\mathcal{F}^{>0} p_i$, $\mathcal{F}^{>0} \hat{p}_i$, $\mathcal{F}^{>0} q_j$, and $\mathcal{F}^{>0} \hat{q}_j$, respectively.

Who wins the game (MR strategies)?

Theorem 28 (Brázdil, Brožek, Forejt, K., 2006)
The existence of a winning MR strategy for player $\Box$ is $\Sigma_2$-hard and in EXPTIME. For the qualitative fragment of PCTL, the problem is $\Sigma_2$-complete.

Proof.
The $\Sigma_2$-hardness is established similarly as for MD strategies.
The membership to EXPTIME is obtained by encoding the condition into Tarski algebra.
The membership to $\Sigma_2$ for the qualitative PCTL follows easily.



Who wins the game (HD, HR, FD, FR)?

Theorem 29 (Brázdil, Brožek, Forejt, K., 2006)
The existence of a winning HD (or HR) strategy for player $\Box$ in MDPs is highly undecidable (and $\Sigma^1_1$-complete). Moreover, the existence of a winning FD (or FR) strategy is also undecidable.

The result holds for the $\mathcal{L}(\mathcal{F}^{=1/2}, \mathcal{F}^{=1}, \mathcal{F}^{>0}, \mathcal{G}^{=1})$ fragment of PCTL (the role of $\mathcal{F}^{=1/2}$ is crucial).
The proof is obtained by reduction of the problem whether a given non-deterministic Minsky machine has an infinite recurrent computation.

The undecidability proof

A non-deterministic Minsky machine $\mathcal{M}$ with two counters $c_1, c_2$:
$1 : ins_1, \; \ldots, \; n : ins_n$
where each $ins_i$ takes one of the following forms:
$c_j := c_j + 1$; goto $k$
if $c_j = 0$ then goto $k$ else $c_j := c_j - 1$; goto $m$
goto $\{k \text{ or } m\}$

The problem whether a given non-deterministic Minsky machine with two counters initialized to zero has an infinite computation that executes $ins_1$ infinitely often is $\Sigma^1_1$-complete.
For a given machine $\mathcal{M}$, we construct a finite-state MDP $G(\mathcal{M})$ and a formula $\varphi \in \mathcal{L}(\mathcal{F}^{=1/2}, \mathcal{F}^{=1}, \mathcal{F}^{>0}, \mathcal{G}^{=1})$ such that $\mathcal{M}$ has an infinite recurrent computation iff player $\Box$ has a winning HD (or HR) strategy for $\varphi$ in a distinguished vertex $v$ of $G(\mathcal{M})$.

The construction of $G(\mathcal{M})$ and $\varphi$

[Figure: a gadget of $G(\mathcal{M})$ with vertices t, u, v, r and vertices labelled p and q; one transition is chosen I times in the first part and J times in the second part.]

$I = J < \omega$ iff $v \models \mathcal{F}^{>0} r \wedge \mathcal{F}^{=1/2}(p \vee q)$
The probability of $\mathcal{F}(p \vee q)$: $0.01\underbrace{0 \cdots 0}_{I}01 + 0.001\underbrace{1 \cdots 1}_{J}1$
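The three instruction forms of the non-deterministic Minsky machine can be made concrete with a one-step successor function. This is only a sketch of the machine model, not of the reduction to MDPs; the tuple encoding of instructions and the names `step` and `program` are hypothetical.

```python
def step(program, state):
    """All possible next states of a two-counter Minsky machine.

    state = (label, c1, c2); program maps labels to instructions:
      ("inc", j, k)       c_j := c_j + 1; goto k
      ("dec", j, k, m)    if c_j = 0 then goto k else c_j := c_j - 1; goto m
      ("choice", k, m)    goto {k or m}"""
    label, counters = state[0], list(state[1:])
    ins = program[label]
    kind = ins[0]
    if kind == "inc":
        _, j, k = ins
        counters[j - 1] += 1
        return [(k, *counters)]
    if kind == "dec":
        _, j, k, m = ins
        if counters[j - 1] == 0:
            return [(k, *counters)]
        counters[j - 1] -= 1
        return [(m, *counters)]
    if kind == "choice":
        _, k, m = ins
        return [(k, *counters), (m, *counters)]
    raise ValueError(f"unknown instruction: {ins!r}")


# Hypothetical machine: pump c1 up, then count it down and restart.
program = {
    1: ("inc", 1, 2),
    2: ("choice", 1, 3),
    3: ("dec", 1, 1, 3),
}
print(step(program, (1, 0, 0)))
print(step(program, (2, 1, 0)))
print(step(program, (3, 0, 0)))
```

In this toy machine every run that keeps returning to label 1 executes $ins_1$ infinitely often, which is the recurrence condition used in the reduction. Deciding that condition for arbitrary machines is what makes the problem $\Sigma^1_1$-complete.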

Positive results (1)

We restrict ourselves to qualitative fragments of probabilistic branching time logics.

Even MDPs with qualitative PCTL objectives may require infinite memory.

$\mathcal{G}^{>0}(\neg stop \wedge \mathcal{F}^{>0} stop) \wedge \mathcal{G}^{=1}(s \Rightarrow (\mathcal{X}^{=1} v_1 \vee \mathcal{X}^{=1} v_2))$

[Figure: an MDP with vertices stop, s, $v_1$, $v_2$; transitions labelled left (probability 1/4) and right (probability 3/4); one transition is drawn in red and one in green.]

A winning strategy: if #left < #right use the red transition, otherwise use the green one.

Positive result (2)

Theorem 30 (Brázdil, Forejt, K., 2008)
The existence of a winning HD (or HR) strategy for player $\Box$ in MDPs with qualitative PECTL$^*$ objectives is decidable in time which is polynomial in the size of the MDP and doubly exponential in the size of the formula. The problem is 2-EXPTIME-hard.
Moreover, if there is a winning HD (or HR) strategy, there is also a one-counter winning strategy and one can effectively construct a one-counter automaton which implements this strategy (the associated complexity bounds are the same as above).

References:
T. Brázdil, V. Brožek, V. Forejt, and A. Kučera. Stochastic Games with Branching-Time Winning Objectives. Proc. of LICS 2006, pp. 349–358, 2006.
T. Brázdil, V. Forejt, and A. Kučera. Controller Synthesis and Verification for Markov Decision Processes with Qualitative Branching Time Objectives. Proc. of ICALP 2008, pp. 148–159, volume 5126 of LNCS. Springer, 2008.

Games with time

Games over continuous-time stochastic processes such as
continuous-time Markov chains;
semi-Markov processes;
generalized semi-Markov processes.

Time-dependent objectives such as
time-bounded reachability;
properties expressible in temporal logics with time;
properties encoded by timed automata.

Continuous-time Markov chains (1)

[Figure: a continuous-time Markov chain with states s, t, u, v, their rates, and transition probabilities.]

The probability that a transition occurs in a state $s$ before time $t > 0$ is equal to $1 - e^{-\lambda_s t}$.
A timed run is an infinite sequence $s_0, t_0, s_1, t_1, \ldots$ where $s_0, s_1, \ldots$ is a run and $t_i \in \mathbb{R}^{\geq 0}$.
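The exponential sojourn-time formula is easy to evaluate directly. The snippet below is a small numeric illustration; the function name `prob_transition_before` and the sample rate are assumptions, not values taken from the figure.

```python
import math


def prob_transition_before(rate, t):
    """Probability that a state with exit rate `rate` is left before time t,
    i.e. 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


# With rate 2, half a time unit suffices with probability 1 - 1/e.
p = prob_transition_before(2.0, 0.5)
print(p)
```

Doubling the rate has the same effect as doubling the time bound, since only the product $\lambda_s t$ enters the formula.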


Continuous-time Markov chains (2)

For every timed cylinder $w = s_0, I_0, \ldots, s_{n-1}, I_{n-1}, s_n$ we put

$\mathcal{P}(w) = \prod_{i=0}^{n-1} Prob(s_i, s_{i+1}) \cdot \int_{I_i} \lambda_{s_i} e^{-\lambda_{s_i} x} \, dx$

This assignment can be uniquely extended to the (Borel) $\sigma$-algebra $\mathcal{F}$ generated by all timed cylinders.
Thus, we obtain the probability space $(TRun(s), \mathcal{F}, \mathcal{P})$.
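For interval constraints $I_i = [a_i, b_i]$ the integral in the cylinder formula has the closed form $e^{-\lambda_{s_i} a_i} - e^{-\lambda_{s_i} b_i}$, so the probability of a timed cylinder is a finite product. The sketch below assumes such interval constraints; the function name `cylinder_probability`, the dictionary encoding, and the two-state chain are hypothetical.

```python
import math


def cylinder_probability(states, intervals, prob, rate):
    """Probability of the timed cylinder s_0, I_0, ..., s_{n-1}, I_{n-1}, s_n.

    states:    [s_0, ..., s_n]
    intervals: [(a_0, b_0), ..., (a_{n-1}, b_{n-1})], b_i may be math.inf
    prob:      {(s, s'): transition probability}
    rate:      {s: exit rate lambda_s}"""
    p = 1.0
    for i, (a, b) in enumerate(intervals):
        s, nxt = states[i], states[i + 1]
        # Integral of rate * exp(-rate * x) over [a, b].
        sojourn = math.exp(-rate[s] * a) - math.exp(-rate[s] * b)
        p *= prob[(s, nxt)] * sojourn
    return p


# Hypothetical chain: leave s within one time unit, then leave t at any time.
rate = {"s": 2.0, "t": 1.0}
prob = {("s", "t"): 0.5, ("t", "s"): 1.0}
value = cylinder_probability(["s", "t", "s"],
                             [(0.0, 1.0), (0.0, math.inf)],
                             prob, rate)
print(value)
```

An unbounded interval contributes a factor of 1, so the second step here only contributes its transition probability.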

Continuous-time stochastic games

[Figure: an example continuous-time stochastic game with rates and edge probabilities.]

A strategy of player $\Box$ assigns to each timed history $wv$ (where $v \in V_\Box$) a probability distribution over the outgoing edges of $v$.
In general, a play is a Markov process with uncountable state-space.
Time abstract strategies do not depend on time stamps, and the corresponding play is a continuous-time Markov chain.

Time-bounded reachability objectives (1)

The objective of player $\Box$/$\Diamond$ is to maximize/minimize the probability of reaching a target vertex before a time bound $t$.

Continuous-time stochastic games (CTSGs) with time-bounded reachability objectives have a value (w.r.t. time abstract strategies), i.e.,

$\sup_\sigma \inf_\pi \mathcal{P}^{\sigma,\pi}_v(Reach^{\leq t}(T)) = \inf_\pi \sup_\sigma \mathcal{P}^{\sigma,\pi}_v(Reach^{\leq t}(T))$

An optimal strategy for player $\Diamond$ exists in finitely-branching CTSGs.
An optimal strategy for player $\Box$ exists in finitely-branching CTSGs with bounded rates.
In finite uniform CTSGs, both players have FD optimal strategies which are effectively computable.

Time-bounded reachability objectives (2)

References:
C. Baier, H. Hermanns, J.-P. Katoen, and B.R. Haverkort. Efficient computation of time-bounded reachability probabilities in uniform continuous-time Markov decision processes. Theoretical Computer Science, 345:2–26, 2005.
M. Neuhäußer, M. Stoelinga, and J.-P. Katoen. Delayed nondeterminism in continuous-time Markov decision processes. In Proceedings of FoSSaCS 2009, volume 5504 of LNCS, pages 364–379. Springer, 2009.
T. Brázdil, V. Forejt, J. Krčál, J. Křetínský, and A. Kučera. Continuous-time stochastic games with time-bounded reachability. In Proceedings of FST&TCS 2009, pages 61–72, 2009.
M. Rabe and S. Schewe. Optimal time-abstract schedulers for CTMDPs and Markov games. In Eighth Workshop on Quantitative Aspects of Programming Languages, 2010.
T. Brázdil, J. Krčál, J. Křetínský, A. Kučera, and V. Řehák. Stochastic Real-Time Games with Qualitative Timed Automata Objectives. In Proceedings of Concur 2010, to appear.

Open problems

MANY...
Games with continuous time.
Hybrid games.
Games with multiple players (cooperative games).
Games over infinite-state computational models.