
Optimal Secure Control with Linear Temporal Constraints
Luyao Niu, Student Member, IEEE, and Andrew Clark, Member, IEEE

Abstract—Prior work on automatic control synthesis for cyber- cluding denial-of-service and injection of false sensor mea- physical systems under logical constraints has primarily focused surements and control inputs [10], [11]. For instance, power on environmental disturbances or modeling uncertainties, how- outages have been reported due to the penetration of attackers ever, the impact of deliberate and malicious attacks has been less studied. In this paper, we consider a discrete-time dynamical in power systems [10]. Attacks against cars and UAVs are system with a linear (LTL) constraint in the also reported in [11], [12]. Unlike stochastic errors/modeling presence of an adversary, which is modeled as a stochastic game. uncertainties, intelligent adversaries are able to adapt their We assume that the adversary observes the control policy before strategies to maximize impact against a given controller, and choosing an attack strategy. We investigate two problems. In thus exhibit strategic behaviors. Moreover, controllers will the first problem, we synthesize a robust control policy for the stochastic game that maximizes the probability of satisfying the have limited information regarding the objective and strategy LTL constraint. A value iteration based algorithm is proposed of the adversary, making techniques such as randomized to compute the optimal control policy. In the second problem, control strategies potentially effective in mitigating attacks. we focus on a subclass of LTL constraints, which consist of In this case, control strategies synthesized using existing an arbitrary LTL formula and an invariant constraint. We approaches may be suboptimal in the presence of intelligent then investigate the problem of computing a control policy that minimizes the expected number of invariant constraint violations adversaries because they are designed for CPS under errors and while maximizing the probability of satisfying the arbitrary uncertainties. However, automatic synthesis of control systems LTL constraint. We characterize the optimality condition for in adversarial scenarios has received limited research attention. the desired control policy. A policy iteration based algorithm In this paper, we investigate two problems for a probabilistic is proposed to compute the control policy. We illustrate the autonomous system in the presence of an adversary who proposed approaches using two numerical case studies. tampers with control inputs based on the current system state. Index Terms—Linear Temporal Logic (LTL), stochastic game, We abstract the system as a stochastic game (SG), which is a adversary. generalization of Markov decision process (MDP). We assume a concurrent Stackelberg information structure, in which the I.INTRODUCTION adversary and controller take actions simultaneously. Stack- Cyber-physical systems (CPS) are expected to perform in- elberg games are popular models in security domain [13]– creasingly complex tasks in applications including autonomous [16]. Turn-based Stackelberg games, in which a unique player vehicles, teleoperated surgery, and advanced manufacturing. 
takes action each time step, have been used to construct model An emerging approach to designing such systems is to specify checkers [17] and compute control strategy [18], however, to a desired behavior using formal methods, and then automat- the best of our knowledge, control synthesis in the concurrent ically synthesize a controller satisfying the given require- Stackelberg setting has been less investigated. ments [1]–[5]. We focus on two problems. In the first problem, we are Temporal such as linear temporal logic (LTL) and given an arbitrary LTL specification and focus on generating (CTL) are powerful tools to specify a control strategy such that the probability of satisfying the specification is maximized. In the second problem, we focus arXiv:1907.07556v1 [eess.SY] 17 Jul 2019 and verify system properties [6]. In particular, LTL, whose syntax and semantics have been well developed, is widely on a subclass of LTL specification that combine an arbitrary used to express system properties. Typical examples include LTL specification with an invariant constraint using logical liveness (e.g., “always eventually A”), safety (e.g., “always and connectives, where an invariant constraint requires the not A”), and priority (e.g., “first A, then B”), as well as more system to always satisfy some property. The specification of complex tasks and behaviors [6]. For systems operating in interest is commonly required for CPS, where the arbitrary stochastic environments or imposed probabilistic requirements LTL specification can be used to model properties such as (e.g., “reach A with probability 0.9”), probabilistic extensions liveness and the invariant property can be used to model safety have also been proposed such as safety and reachability games property. We consider the scenario where the specification can- that capture worst-case system behaviors [7]. not be satisfied. Hence, we relax the specification by allowing In addition to modeling uncertainties and stochastic errors violations on the invariant constraint and we select a control [7]–[9], CPS will also be subject to malicious attacks, in- policy that minimizes the average rate at which invariant property violations occur while maximizing the probability L. Niu and A. Clark are with the Department of Electrical and Computer of satisfying the LTL specification. We make the following Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA. {lniu,aclark}@wpi.edu specific contributions: This work was supported by NSF grant CNS-1656981. • We formulate an SG to model the interaction between 2

the CPS and adversary. The SG describes the system probabilistic satisfaction based [23]) have been proposed for dynamics and the effects of the joint input determined motion planning in robotics under temporal logic constraints. by the controller and adversary. We propose a heuristic Control synthesis for deterministic system and probabilistic algorithm to compute the SG given the system dynamics. system under LTL formulas are studied in [24] and [9], • We investigate how to generate a control policy that respectively. Switching control policy synthesis among a set maximizes the worst-case probability of satisfying an of shared autonomy systems is studied in [25]. When temporal arbitrary specification modeled using LTL. We prove logic constraints cannot be fulfilled [26], least-violating control that this problem is equivalent to a zero-sum stochastic synthesis problem is studied in [27]. These existing works do Stackelberg game, in which the controller chooses a not consider the impact of malicious attacks. policy to maximize the probability of reaching a desired Existing approaches of control synthesis under LTL con- set of states and the adversary chooses a policy to min- straints require a compact abstraction of CPS such as MDP [2], imize that probability. We give an algorithm to compute [8], [9], which models the non-determinism and probabilistic the set of states that the system desires to reach. We behaviors of the systems, and enables us applying off-the- then propose an iterative algorithm for constructing an shelf model checking algorithms for temporal logic [6]. Robust optimal stationary policy. We prove that our approach control of MDP under uncertainties has been extensively converges to a Stackelberg equilibrium and characterize studied [8], [28]. Synthesis of control and sensing strategies the convergence rate of the algorithm. under incomplete information for turn-based deterministic • We formulate the problem of computing a stationary game is studied in [29]. However, MDPs only model the control policy that minimizes the rate at which invari- uncertainties that arise due to environmental disturbances and ant constraint violations occur under the constraint that modeling errors, and are only suitable for scenarios with a an LTL specification must be satisfied with maximum single controller. probability. We prove that this problem is equivalent to a For CPS operating in adversarial scenarios, there are two zero-sum Stackelberg game in which the controller selects decision makers (the controller and adversary) and their de- a control policy that minimizes the average violation cisions are normally coupled. Thus, MDP cannot model the cost and the adversary selects a policy that maximizes system, and the robust control strategy obtained on MDP may such cost. We solve the problem by building up the be suboptimal to the CPS operated in adversarial environment. connections with a generalized average cost per stage To better formulate the strategic interactions between the problem. We propose a novel algorithm to generate an controller and adversary, SG is used to generalize MDP [30]. optimal stationary control policy. We prove the optimality Turn-based two-player SG, in which a unique player takes and convergence of the proposed algorithm. action at each time step, have been used to construct model • We evaluate the proposed approach using two numerical checkers [17] and abstraction-refinement framework for model case studies in real world applications. 
We consider a checking [7], [31], [32]. Unlike the literature using turn-based remotely controlled UAV under deception attack given games [7], [17], [31], [32], however, we consider a different different LTL specifications. We compare the perfor- information structure denoted as concurrent SG, in which both mance of our proposed approaches with the performance players take actions simultaneously at each system state [33]. obtained using existing approaches without considering Several existing works using SGs focus on characterizing the adversary’s presence. The results show that our pro- and computing Nash equilibria [34], [35], whereas in the posed approach outperforms existing methods. present paper we consider a Stackelberg setting in which the The remainder of this paper is organized as follows. Section adversary chooses an attack strategy based on the control II presents related work. Section III gives background on LTL, policy selected by the system. The relationship between Nash SGs, and preliminary results on average cost per stage and and Stackelberg equilibria is investigated in [36]. A hybrid average cost per cycle problem. Section IV introduces the SG with asymmetric information structure is considered in system model. Section V presents the problem formulation [18]. The problem setting in [18] is similar to turn-based on maximizing the probability of satisfying a given LTL stochastic game, while the concurrent setting considered in this specification and the corresponding solution algorithm. Section paper can potentially grants advantage to the controller (see VI presents the problem formulation and solution algorithm of [13] for a simple example). Moreover, the problem setting in the problem of minimizing the average cost incurred due to this paper leads to a more general class of control strategies. violating invariant property while maximizing the probability Specifically, mixed strategies are considered in this work. In of satisfying an LTL specification. Two numerical case studies particular, concurrent SGs played with mixed strategies gen- are presented in Section VII to demonstrate our proposed eralizes models including Markov chains, MDPs, probabilistic approaches. Section VIII concludes the paper. turn-based games and deterministic concurrent games [33]. The problem of maximizing the probability of satisfying a II.RELATED WORK given specification consisting of safety and liveness constraints Temporal logics such as LTL and CTL are widely used in the presence of adversary is considered in the preliminary to specify and verify system properties [6], especially com- conference version of this work [37]. Whereas only a restricted plex system behaviors. Multiple frameworks (e.g., receding class of LTL specifications is considered in [37], in this paper, horizon based [3], sampling based [4], sensor-based [19]– we derive results for arbitrary LTL specifications and we also [21], probabilistic map based [22], multi-agent based [5], and investigate the problem of minimizing the rate of violating 3 invariant constraints. finitely many times and intersects with K(z) infinitely often. CPS security is also investigated using game and control Denote the satisfaction of a formula φ by a run ρ as ρ |= φ. theoretic approaches. Secure state estimation is investigated in [38], [39]. CPS security and privacy using game theoretic ap- B. Stochastic Games proach is surveyed in [40]. Game theory based resilient control is considered in [41]. 
CPS security under Stackelberg setting A stochastic game is defined as follows [30]. and Nash setting are studied in [14] and [42], respectively. Definition 2. (Stochastic Game): A stochastic game (SG) SG Stochastic Stackelberg security games have been studied in is a tuple SG = (S, UC ,UA, P r, s0, Π, L), where S is a finite [15], [16]. set of states, UC is a finite set of actions of the controller, UA is a finite set of actions of an adversary, P r : S×UC ×UA×S → 0 III.PRELIMINARIES [0, 1] is a transition function where P r(s, uC , uA, s ) is the In this section, we present background on LTL, stochas- probability of a transition from state s to state s0 when the tic games, and preliminary results on the average cost per controller’s action is uC and the adversary’s action is uA. stage (ACPS) and average cost per cycle (ACPC) problems. s0 ∈ S is the initial state. Π is a set of atomic propositions. Throughout this paper, we assume that inequalities between L : S → 2Π is a labeling function, which maps each state to vectors and matrices are component wise comparison. a subset of propositions that are true at each state. Denote the set of admissible actions for the controller and A. Linear Temporal Logic (LTL) adversary at state s as UC (s) and UA(s), respectively. Given An LTL formula consists of [6] a finite set S, we use the Kleene star S∗ and the ω symbol Sω • a set of atomic propositions Π; to denote the set obtained by concatenating elements from S • Boolean operators: negation (¬), conjunction (∧) and finitely and infinitely many times, respectively. Given an SG, disjunction (∨).; the set of finite paths, i.e, the set of finite sequence of states, ∗ • temporal operators: next (X) and until (U). can be represented as S , while the set of infinite paths, i.e., ω An LTL formula is defined inductively as the set of infinite sequence of states, can be represented as S . The strategies (or policies) that players can commit to can be φ = T rue | π | ¬φ | φ1 ∧ φ2 | Xφ | φ1 U φ2. classified into the following two categories. In other words, any atomic proposition φ is an LTL formula. • Pure strategy: A pure strategy gives the action of the Any formula formed by joining atomic propositions using player as a deterministic function of the state. Suppose Boolean or temporal connectives is an LTL formula. Other the players commit to pure strategies. Then a pure control ∗ operators can be defined accordingly. In particular, implication strategy is defined as µ : S → UC , which gives a specific ( =⇒ ) operator (φ =⇒ ψ) can be described as ¬φ∨ψ; even- control action, and a pure adversary strategy is defined ∗ tually (3) operator 3φ can be written as 3φ = T rue U φ; as τ : S → UA. always (2) operator 2φ can be represented as 2φ = ¬3¬φ. • Mixed strategy: A mixed strategy determines a probability The semantics of LTL formulas are defined over infinite distribution over all admissible pure strategies. Suppose words in 2Π [6]. Informally speaking, φ is true if and only if the players commit to mixed strategy. Then a control pol- ∗ φ is true at the current time step. ψ U φ is true if and only if icy for the controller is defined as µ : S × UC → [0, 1], ψ ∧ ¬φ is true until φ becomes true at some future time step. which maps a finite path and the admissible action to a 2φ is true if and only if φ is true for the current time step probability distribution over the set of actions UC (sk) and all the future time. 3φ is true if φ is true at some future available at state sk. 
A policy τ for the adversary is ∗ time. Xφ is true if and only if φ is true in the next time step. defined as τ : S × UA → [0, 1]. A word η satisfying an LTL formula φ is denoted as η |= φ. In this paper we focus on computing the optimal mixed Given any LTL formula, a deterministic Rabin automaton strategy. When a specific action is assigned with probability (DRA) can be constructed to represent the formula. A DRA one, then mixed strategy reduces to pure strategy. A control is defined as follows. policy is stationary if it is only a function of the current state, i.e., µ : S × U → [0, 1] is only dependent on the last Definition 1. (Deterministic Rabin Automaton): A determinis- C state of the path. A stationary policy is said to be proper if tic Rabin automaton (DRA) is a tuple R = (Q, Σ, δ, q , Acc), 0 the probability of satisfying the given specification after finite where Q is a finite set of states, Σ is a finite set of steps is positive under this policy. Given a pair of policies µ symbols called alphabet, δ : Q × Σ → Q is the and τ, an SG reduces to a Markov chain (MC) whose state set transition function, q0 is the initial state and Acc = 0 µτ 0 is S and transition probability from state s to s is P (s, s ) , {(L(1),K(1)), (L(2),K(2)), ··· , (L(Z),K(Z))} is a finite P P 0 µ(s, uC )τ(s, uA)P r(s, uC , uA, s ). set of Rabin pairs such that L(z),K(z) ⊆ Q for all z = uC ∈UC (s) uA∈UA(s) Given a path β ∈ Sω, a word is generated as 1, 2, ··· ,Z with Z being a positive integer. ηβ = L(s0)L(s1) ··· . The probability of satisfying an A run ρ of a DRA over a finite input word η = η0η1 ··· ηn LTL formula φ under policies µ and τ on SG is denoted as µτ ω is a sequence of states q0q1 ··· qn such that (qk−1, ηk, qk) ∈ δ P rSG = P r{ηβ |= φ : β ∈ S }. for all 0 ≤ k ≤ n. A run ρ is accepted if and only if there In the following, we review a subclass of stochastic games, exists a pair (L(z),K(z)) such that ρ intersects with L(z) denoted as Stackelberg games, involving two players [30]. In 4

the Stackelberg setting, player 1 (also called leader) commits policies µ : S → UC and unichain MDP. An MDP is said to a strategy first. Then player 2 (also known as follower) to be unichain if for any control policy µ, the induced MC is observes the strategy of the leader and plays its best response. irreducible, i.e, the probability of reaching any state from any The information structure under Stackelberg setting can be state on the MC is positive. Denote the cost incurred at state classified into the following two categories. s by applying the deterministic control policy µ as g(s, µ(s)). 0 • Turn-based games: Exactly one player is allowed to take The transition probability from state s to s via action u on action at each time step. Turn-based games are used to MDP is denoted as P r(s, u, s0). Each transition to a new state model asynchronous interaction between players. is viewed as a completion of a stage. Then the objective of • Concurrent games: All the players take actions simulta- ACPS problem is to minimize neously at each time step. Concurrent games are used to ( N ) 1 X model synchronous interaction between players. Jµ(s) = lim sup E g(sk, µ(sk))|s0 = s (1) N→∞ N Unlike [7], [17], [31], [32], which are turn-based and played k=0 with pure strategies, in this paper, we focus on concurrent over all deterministic stationary control policies. games with players committing to mixed strategies. To demon- It has been shown that a gain-bias pair (Jµ, hµ) for ACPS strate the efficiency of mixed strategies in a concurrent game, problem satisfies the properties stated as follows. we consider a robot moving to the right in a 1D space. The action sets for the controller and adversary are {move, stay}. Proposition 1 ( [43]). Assume the MDP is unichain. Then: ∗ If the controller and action take the same action at a given time • the optimal ACPS Jµ(s0) associated with each control step, then the robot will follow the specified action, i.e., move policy µ is independent of initial state s0, i.e., there exists ∗ ∗ ∗ one step to the right under the action pair (move, move) and a constant Jµ such that Jµ(s0) = Jµ for all s0 ∈ S; ∗ stay in the current location under action pair (stay, stay). The • there exists a vector h such that J + hµ(s) = n µ o goal of the robot is to move to the location immediately to the P 0 0 minµ g(s, µ(s)) + s0∈S P r(s, µ(s), s )h(s ) . right of its starting location. When the controller commits to a pure strategy, say move, then the adversary will always take We present some preliminary results on the ACPC problem action stay, and the robot will remain at its starting location. on MDP in the following. Denote the set of states that satisfy On the other hand, if the controller plays a mixed strategy, LTL formula φ as Sφ. A cycle is completed when Sφ is visited. e.g., choosing move and stay with equal probability 1/2 at Therefore, a path starting from s0 and ending in Sφ completes each time step, then the robot has a 1/2 probability to reach the first cycle, and the path starting from Sφ after completing the desired location at each time step, and hence will reach the first cycle completes the second cycle when coming back the desired location within finite time with probability 1. On to Sφ. Denote the number of cycles that have been completed the other hand, in a turn-based setting where the adversary until stage N as C(N). 
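The advantage of mixed strategies in the concurrent setting discussed above can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and SciPy are available and using a matching-pennies-style payoff matrix as a stand-in for the robot example rather than its exact dynamics; it computes the max-min value of a one-shot zero-sum matrix game by linear programming, the same per-state computation that reappears in (5) and in line 8 of Algorithm 4.

```python
# Sketch: max-min value of a one-shot zero-sum matrix game under mixed
# strategies, solved as a linear program. The payoff matrix below is an
# illustrative matching-pennies-style stand-in, not the exact robot dynamics.
import numpy as np
from scipy.optimize import linprog

def maxmin_value(A):
    """Return (value, mixed row strategy) of max_x min_j x^T A e_j."""
    m, n = A.shape
    # Variables: x_1..x_m (row mixture) and v (game value); minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j: v - sum_i x_i A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Rows: controller actions, columns: adversary actions; entries are the
# controller's payoff (e.g., probability of progressing at this step).
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
value, x = maxmin_value(A)
print(value, x)  # ~0.5 with x ~ [0.5, 0.5]; any pure row guarantees only 0.
```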
The ACPC problem is described as observes the controller’s action before choosing its action at given an MDP and an LTL formula φ, find a control policy µ each time step, the adversary will always be able to choose that minimizes the opposite of the controller’s action and prevent the robot (PN ) k=0 g(sk, µ(sk)) from moving. Hence while mixed strategies are beneficial in Jµ(s0) = lim sup E ηµ |= φ , (2) N→∞ C(N) the concurrent game formulation, they are not beneficial in the turn-based game for this case. where ηµ = L(s0)L(s1) ··· is the word generated by the path The concept of Stackelberg equilibrium is used to solve s0s1 ··· induced by deterministic control policy µ. Stackelberg games. The Stackelberg equilibrium is defined It has been shown that the following proposition holds for formally in the following. the ACPC problem. Definition 3. (Stackelberg Equilibrium): Denote the utility Proposition 2 ( [9]). Assume the MDP is unichain. Then: that the leader gains in a stochastic game SG under leader ∗ • the optimal ACPC Jµ(s0) associated with each control follower strategy pair (µ, τ) and the utility that the follower policy µ is independent of initial state s0, i.e., there exists gains as Q (µ, τ) and Q (µ, τ), respectively. A pair of ∗ ∗ ∗ L F a constant Jµ such that Jµ(s0) = Jµ for all s0 ∈ S; leader follower strategy (µ, τ) is a Stackelberg equilibrium • there exists some vector h such that the following if leader’s strategy µ is optimal given that the follower ∗ n equation holds Jµ + h(s) = minµ g(s, µ(s)) + observes its strategy and plays its best response, i.e., µ = o 0 0 P P r(s, µ(s), s0)h(s0)+J ∗ P P r(s, µ(s), s0) . argmaxµ0∈µ QL(µ , BR(µ )), where µ is the set of all ad- s0∈S µ s0∈S missible policies of the controller and BR(µ0) = {τ : τ = 0 argmax QF (µ , τ)} is the best response to leader’s strategy IV. SYSTEM MODEL 0 µ played by the follower. In this section, we present the system model. We consider the following discrete-time finite state system C. ACPS and ACPC Problems x(t + 1) = f(x(t), u (t), u (t), ϑ(t)), ∀t = 0, 1, ··· , (3) We present some preliminary results on the average cost C A per stage (ACPS) problem and average cost per cycle (ACPC) where x(t) is the finite system state, uC (t) is the control problem on MDP without the presence of adversary in this input from the controller, uA(t) is the attack signal from the subsection. Both problems focus on deterministic control adversary, and ϑ(t) is stochastic disturbance. 5
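As a concrete, self-contained instance of (3), the following sketch simulates a scalar system under an additive actuator attack u(t) = uC(t) + uA(t); the map f, the disturbance model, and both policies are illustrative assumptions rather than the dynamics used in the case studies.

```python
# Minimal sketch of the discrete-time dynamics (3) under an additive
# actuator attack u(t) = uC(t) + uA(t). The map f, the noise model, and
# both policies below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def f(x, uC, uA, theta):
    # Example dynamics: stable scalar update driven by the joint input.
    return 0.9 * x + (uC + uA) + theta

x = 0.0
for t in range(20):
    uC = -0.5 * x                   # a simple stabilizing controller (assumed)
    uA = 0.3 * np.sign(x + 1e-9)    # an adversary pushing the state away (assumed)
    theta = rng.normal(0.0, 0.05)   # stochastic disturbance
    x = f(x, uC, uA, theta)
print(x)
```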

In system (3), there exists a strategic adversary that can tam- Algorithm 1 Algorithm for constructing a stochastic game per with the system transition. In particular, the controller and approximation of a system. adversary jointly determine the state transition. For instance, 1: procedure CREATE STOCHASTIC GAME( X1,...,Xn) an adversary that launches false data injection attack modifies 2: Input: Dynamics (3), set of subsets X1,...,Xn the control input as u(t) = uC (t) + uA(t); an adversary that 3: Output: Stochastic game SG = launches denial-of-service attack manipulates the control input (S, UC ,UA, P r, s0, Π, L) as u(t) = uC (t) · uA(t), where uA(t) ∈ {0, 1}. 4: Initialize K In security domain, Stackelberg game is widely used to 5: S = {X1,...,Xn} and L is determined accordingly model systems in the presence of malicious attackers. In 6: Generate control primitive sets UC =

Stackelberg setting, the controller plays as the leader and the {uC1 , uC2 ··· , uCΞ } and UA = {uA1 , uA2 ··· , uAΓ } adversary plays as the follower. In this paper, we adopt the 7: for i = 1, . . . , n do concurrent Stackelberg setting. The controller first commits 8: for all uC ∈ UC and uA ∈ UA do to its control strategy. The adversary can stay outside for 9: for k = 1,...,K do indefinitely long time to observe the strategy of the controller 10: x ← sampled state in Xi and then chooses its best response to the controller’s strategy. 11: uˆC , uˆA ← sampled inputs from uC , uA However, at each time step, both players must take actions 12: j ← region containing f(x, uˆC , uˆA, ϑ) simultaneously. The system is given some specification that is 13: Invoke particle filter to approximate transi- modeled using LTL. tion probabilities P r between sub-region i and j for all i To abstract system (3) as a finite state/action SG, we propose and j. a heuristic simulation based algorithm as shown in Algorithm 14: end for 1, which is generalized from the approaches proposed in [8], 15: end for [44]. The difference between Algorithm 1 and algorithms 16: end for in [8], [44] is that Algorithm 1 considers the presence of 17: end procedure adversary. Algorithm 1 takes the dynamical system (3), the set of sub-regions of state space {X1, ··· ,Xn} and actions as in dramatic change on state vector. Thus the states evolve inputs. We observe that the choice of subregions X1,...,Xn may affect the accuracy of the model, however, choice of the following the joint actions of administrator and adversary. subregions is beyond the scope of this work. For each sub- Moreover, the probability of the occurrence of events, i.e., the transmission line is out of service, is jointly determined region Xi and pair of (control, adversary) inputs (uC , uA), by the actions of adversary and administrator (or defender). we randomly select K sample states in Xi and adversary The specifications that can be given to the system might in- and control inputs that map to uC and uA. We compute clude reachability (e.g., ’eventually satisfy optimal power flow the probability distribution over the set of sub-regions {Xj} that the system can transition to following (3), and update equation’: 3OPF ) and reactivity (e.g., ’if voltage exceeds some threshold, request load shedding from demand side’: P r(Xi, uC , uA,Xj) accordingly for all Xj (Algorithm 1). To approximate the transition probability, Monte Carlo simulation 2(voltage alarm =⇒ XDR)). or particle filter can be used [8], [44], [45]. 2) Networked Control System under Attacks: In the follow- ing, we present an example on control synthesis for networked In the following, we present two applications in the security control system under deception attacks. domain that can be formulated using the proposed framework. The system is modeled as a discrete linear time invariant 1) Infrastructure Protection in Power System: The pro- system x(k + 1) = Ax(k) + Bu(k) + ϑ(k), k = 0, 1 ··· , posed framework can capture attack-defense problems on where x(k) is the system state, u(k) is the compromised power system as shown in the following. An attack-defense control input and ϑ(k) is independent Gaussian distributed problem on power system is investigated in [34]. disturbance. There exists an intelligent and strategic adver- The players involved in this example are the power system sary that can compromise the control input of the system administrator and adversary. 
The adversary aims to disrupt the by launching deception attack. When the adversary launches transmission lines in power network, while the administrator deception attack on system actuator [14], then the control input deploys resources to protect critical infrastructures or repair is represented as u(k) = uC (k)+uA(k). Typical specifications damaged infrastructures. The dynamics of power system is that are assigned to the system include stability and safety (e.g., ‘eventually reach stable status while not reaching unsafe modeled as x(t + 1) = f(x(t), uC (t), uA(t)), where x(t) state’: 32stable ∧ 2¬unsafe). is the state vector, uC (t) and uA(t) are the inputs from the administrator and adversary, respectively. Depending on the focus of the administrator, the state may contain bus V. PROBLEM FORMULATION -MAXIMIZING voltages, bus power injections, network frequency, and so SATISFACTION PROBABILITY on. The actions of the administrator UC and adversary UA, In this section, we formulate the problem of maximizing respectively, are the actions to protect (by deploying protection the probability of satisfying a given LTL specification in or repair resources) and damage (by opening the breakers the presence of an adversary. We first present the problem at ends of) the transmission lines. If an attack is successful, formulation, and then give a solution algorithm for computing then the transmission line is out of service, which will result the optimal control policy. 6
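Before formalizing the problem, the sampling step of Algorithm 1 can be made concrete. The sketch below estimates the transition probabilities Pr(Xi, uC, uA, Xj) for one region and one input pair by plain Monte Carlo counting; the one-dimensional partition, the dynamics f, and the sample budget K are illustrative assumptions, and a particle filter can replace the raw counts as noted above.

```python
# Sketch of the Monte Carlo estimation used in Algorithm 1: estimate
# Pr(X_i, uC, uA, X_j) by sampling states in region X_i, applying the
# input pair (uC, uA), and counting which region the successor lands in.
# The 1-D partition, dynamics f, and K are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
edges = np.linspace(-2.0, 2.0, 9)        # region boundaries: 8 regions X_1..X_8

def f(x, uC, uA, theta):                  # assumed dynamics, as in (3)
    return 0.9 * x + uC + uA + theta

def region(x):
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2))

def estimate_row(i, uC, uA, K=500):
    counts = np.zeros(len(edges) - 1)
    lo, hi = edges[i], edges[i + 1]
    for _ in range(K):
        x = rng.uniform(lo, hi)          # sampled state in X_i
        theta = rng.normal(0.0, 0.05)
        counts[region(f(x, uC, uA, theta))] += 1
    return counts / K                    # empirical Pr(X_i, uC, uA, .)

print(estimate_row(i=3, uC=0.5, uA=-0.25))
```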

A. Problem Statement We next show that the vector v satisfying (5) is unique. The problem formulation is as follows. Since every Stackelberg equilibrium satisfies (5), if the vector v is unique, then the vector v must be a Stackelberg equilib- Problem 1. Given a stochastic game SG and an LTL specifi- rium. Suppose that uniqueness does not hold, and let µ and µ0 cation φ, compute a control policy µ that maximizes the prob- be Stackelberg equilibrium policies with corresponding satis- 0 ability of satisfying the specification φ under any adversary faction probabilities v and v . We have that v = T v ≥ Tµ0 v. policy τ, i.e., Composing k times and taking the limit as k tends to infinity, µτ k k 0 max min P rSG(φ). (4) we have v = limk→∞ T v ≥ limk→∞ Tµ0 v = v . By the same µ τ argument, v0 ≥ v, implying that v = v0 and thus uniqueness Denote the probability of satisfying specification φ as satis- holds. faction probability. The policies µ and τ that achieve the max- By Lemma 1, we have that the satisfaction probability for min value of (4) can be interpreted as an equilibrium defined some state s can be computed as the linear combination of in Definition 3 in a zero-sum Stackelberg game between the the satisfaction probabilities of its neighbor states, where the controller and adversary, in which the controller first chooses coefficients are the transition probabilities jointly determined a randomized policy µ, and the adversary observes µ and by the control and adversary policies. Lemma 1 provides selects a policy τ to minimize P rµτ (φ|s). By Von Neumann’s SG us the potential to apply iterative algorithm to compute the theorem [46], the satisfaction probability at equilibrium must satisfaction probability. exist. We restrict our attention to the class of stationary policies, leaving the general case for future work. We have the following preliminary lemma. B. Computing the Optimal Policy Lemma 1. Let satisfaction probability v(s) = Motivated by model checking algorithms [6], we first con- µτ struct a product SG. Then we analyze Problem 1 on the maxµ minτ P rSG(φ|s). Then product SG. A product SG is defined as follows. X X X v(s) = max min µ(s, uC ) Definition 4. Given an SG SG = µ τ (Product SG): 0 uC ∈UC (s) uA∈UA(s) s ∈S (S, UC ,UA, P r, L, Π) and a DRA R = (Q, Σ, δ, q0, Acc), a 0 0 τ(s, uA)v(s )P r(s, uC , uA, s ). (5) (labeled) product SG is a tuple G = (SG,UC ,UA, P rG, AccG), where SG = S × Q is a finite set of states, UC is a Conversely, if v(s) satisfies (5), then v(s) = finite set of control inputs, U is a finite set of attack µτ A 0 0 0 maxµ minτ P rSG(φ|s). Moreover, the satisfaction probability signals, P rG((s, q), uC , uA, (s , q )) = P r(s, uC , uA, s ) 0 0 v is unique. if δ(q, L(s )) = q , AccG = {(LG(1),KG(1)), Proof. In the following, we will first show the forward direc- (LG(2),KG(2)), ··· , (LG(Z),KG(Z))} is a finite set n of Rabin pairs such that LG(z),KG(z) ⊆ SG for all tion. We let n = |S| and define three operators Tµτ : [0, 1] → n n n n n z = 1, 2, ··· ,Z with Z being a positive integer. In particular, [0, 1] , Tµ : [0, 1] → [0, 1] , and T : [0, 1] → [0, 1] . a state (s, q) ∈ LG(z) if and only if q ∈ L(z), and a state X 0 0 (Tµτ v)(s) = P r(s, µ, τ, s )v(s ) (s, q) ∈ KG(z) if and only if q ∈ K(z). s0 By Definition 2 and Definition 4, we have the following X 0 0 (Tµv)(s) = min P r(s, µ, τ, s )v(s ) τ observations. 
First, since the transition probability is deter- s0 mined by SG and the satisfaction condition is determined by X (T v)(s) = max min P r(s, µ, τ, s0)v(s0) R, the satisfaction probability of φ on SG is equal to the µ τ s0 satisfaction probability of φ on the product SG G. Second, 0 P P we can generate the corresponding path s0s1 ··· on SG where P r(s, µ, τ, s ) = µ(s, uC ) uC ∈UC (s) uA∈UA(s) given a path (s , q )(s , q ) ··· on the product SG G. Finally, τ(s, u )P r(s, u , u , s0). Suppose that µ is a Stackelberg 0 0 1 1 A C A given a control policy µ synthesized on the product SG G, a equilibrium with v(s) equal to the satisfaction probability for corresponding control policy µ on SG is obtained by letting state s, and yet (5) does not hold. We have that v = T v, since SG µ µ (s ) = µ((s , q)) for all time step i [6], [8]. Due to these v is the optimal policy for the MDP defined by the policy SG i i one-to-one correspondence relationships, in the following, we µ [43]. On the other hand, T v ≤ T v. Composing T and µ analyze Problem 1 on the product SG G and present an T k times and taking the limit as k tends to infinity yields µ algorithm to compute the optimal control policy. When the v = lim T kv ≤ lim T kv v∗. The convergence k→∞ µ k→∞ , context is clear, we use s to represent state (s, q) ∈ S . of T kv to a fixed point v∗ follows from the fact that T is a G We next introduce the concept of Generalized Accepting bounded and monotone nondecreasing operator. Furthermore, Maximal End Component (GAMEC), which is generalized choosing the policy µ(s) at each state as the maximizer of from accepting maximal end component (AMEC) on MDP. (5) yields a policy with satisfaction probability v∗. Hence v ≤ v∗. If v(s) = v∗(s) for all states s, then (5) is satisfied, Definition 5. (Sub-SG): A sub-SG of an SG SG = contradicting the assumption that the equation does not hold. (S, UC ,UA, P r, s0, Π, L) is a pair of states and actions On the other hand, if v(s) < v∗(s) for some state s, then µ (C,D) where ∅= 6 C ⊆ S is a set of states, and D : C → UC (s) is not a Stackelberg equilibrium. 2 is an enabling function such that D(s) ⊆ UC (s) for 7

0 0 all s ∈ C and {s |P r(s, uC , uA, s ) > 0, ∀uA ∈ UA(s), s ∈ Algorithm 2 Computing the set of GAMECs C. C} ⊆ C. 1: procedure COMPUTE GAMEC(G) 2: Input: Product SG G By Definition 5, we have a sub-SG is also an SG. Given 3: Output: Set of GAMECs C Definition 5, a Generalized Maximal End Component (GMEC) 4: Initialization: Let D(s) = UC (s) for all s ∈ SG . Let C = ∅ is defined as follows. and Ctemp = {SG } 5: repeat Definition 6. A Generalized End Component (GEC) is a sub- 6: C = Ctemp, Ctemp = ∅ 7: for C ∈ C do SG (C,D) such that the underlying digraph G(C,D) of sub- 8: R = ∅ .R SG (C,D) is strongly connected. A GMEC is a GEC (C,D) is the set of states that should be 0 0 removed such that there exists no other GEC (C ,D ) 6= (C,D), where 9: Let SCC1, ··· ,SCCn be the set of nontrivial C ⊆ C0 and D(s) ⊆ D0(s) for all s ∈ C. strongly connected components (SCC) of the underlying diagraph G(C,D) Definition 7. A GAMEC on the product SG G is a GMEC if 10: for i = 1, ··· , n do there exists some (LG(z),KG(z)) ∈ AccG such that LG(z) ∩ 11: for each state s ∈ SCCi do 0 C = ∅ and KG(z) ⊆ C. 12: D(s) = {uC ∈ UC (s)|s ∈ 0 C where P r(s, uC , uA, s ) > 0, ∀uA ∈ UA(s)} By Definition 7, we have a set of states constitutes a 13: if D(s) = ∅ then GAMEC if there exists a control policy such that for any initial 14: R = R ∪ {s} states in the GAMEC, the system remains in the GAMEC 15: end if with probability one and the specification is satisfied with 16: end for 17: end for probability one. We denote the set of GAMECs as C, and 18: while R 6= ∅ do the set of states that constitute GAMEC as accepting states. 19: dequeue s ∈ R from R and C 0 0 Algorithm 2 is used to compute the set of GAMECs. Given a 20: if there exist s ∈ C and uC ∈ UC (s ) such that 0 0 product SG G, a set of GAMECs C can be initialized as C = ∅ P r(s , uC , uA, s) > 0 under some uA ∈ UA(s ) then 21: D(s0) = D(s0) \{u } and D(s) = U (s) for all s. Also, we define a temporary set C C 22: if D(s0) = ∅ then Ctemp which is initialized as Ctemp = SG. Then from line 8 to 23: R = R ∪ {s0} line 17, we compute a set of states R that should be removed 24: end if from GMEC. The set R is first initialized to be empty. Then for 25: end if each state s in each nontrivial strongly connected component 26: end while 27: i = 1, ··· , n (SCC) of the underlying diagraph, i.e., the SCC with more for do 28: if C ∩ SCC 6= ∅ then s i than one states, we modify the admissible actions at state 29: C = Ctemp ∪ {C ∩ SCCi} by keeping the actions that can make the system remain in C 30: end if under any adversary action. If there exists no such admissible 31: end for action at state s, then the state s is added into R. From line 18 32: end for 33: C = C to line 26, we examine if there exists any state s0 in current until temp 34: for C ∈ C do R GMEC that will steer the system into states in . In particular, 35: for (LG (z),KG (z)) ∈ AccG do 0 by taking action uC at each state s , if there exists some 36: if LG (z) ∩ C 6= ∅ or KG (z) 6⊆ C then adversary action uA such that the system is steered into some 37: C = C\ C 0 38: end if state s ∈ R, then uC is removed from UC (s ). Moreover, if there exists no admissible action at state s0, then s0 is added 39: end for 40: end for to R. Then we update the GMEC set as shown from line 27 41: return C to line 32. This procedure is repeated until no further update 42: end procedure can be made on GMEC set. Line 34 to line 40 is to find the GAMEC following Definition 7. Given the set of GAMECs Proposition 3. 
For any stationary control policy µ and initial C = {(C ,D ), ··· , (C ,D ), ··· , (C ,D )} returned by 1 1 h h |C| |C| state s, the minimum probability over all stationary adversary Algorithm 2, the set of accepting states E is computed as |C| policies of satisfying the LTL formula is equal to the minimum E = ∪ C . h=1 h probability over all stationary policies of reaching E, i.e., The main idea to computing the solution to (4) is to show given any stationary policy µ, we have that the max-min probability of (4) is equivalent to maximizing µτ µτ (over µ) the worst-case probability of reaching the set of min P rG (φ|s) = min P rG (reach E|s), (6) accepting states E. Denote the probability of reaching the set of τ τ µτ accepting states E as reachability probability. In the following, where P rG (reach E) is the probability of reaching E under we formally prove the equivalence between the worst-case policies µ and τ. satisfaction probability of (4) and the worst-case reachability Proof. By Definition of E, if the system reaches E, probability. Then, we present an efficient algorithm for com- then φ is satisfied for a maximizing policy µ. Thus puting a policy µ that maximizes the worst-case probability of µτ µτ minτ P r (reach E) = minτ P r (φ). reaching E, with the proofs of the correctness and convergence G G Suppose that for some control policy µ and initial state s0, of the proposed algorithm. In particular, our proposed solution µτ µτ is based on the following. min P r (φ|s0) > min P r (reach E|s0), (7) τ G τ G 8 and let τ be a minimizing stationary policy for the adversary. Algorithm 3 Modifying product SG G. The policies µ and τ induce an MC on the state space. By 1: procedure CONSTRUCT SG(G, C) model checking algorithms on MC [6], the probability of 2: Input: Product SG G, the set of GAMECs C satisfying φ from s0 is equal to the probability of reaching 3: Output: Modified product SG G a bottom strongly connected component (BSCC) that satisfies 4: SG := SG ∪ {dest},UC (s) := UC (s) ∪ {d}, ∀s ∈ SG φ. By assumption there exists a BSCC, denoted SCC0, 5: P rG(s, d, uA, dest) = 1 for all s ∈ E ∪ {dest} and that is reachable from s0, disjoint from E, and yet satisfies uA ∈ UA(s) µτ P rG (φ|s) = 1 for all s ∈ S0 (if this were not the case, then 6: end procedure (7) would not hold). Choose a state s ∈ SCC0. Since s∈ / E, there exists a policy µτˆ s. Algorithm 4 gives a value iteration based algorithm for τˆ such that P rP (φ|s) < 1. Create a new adversary policy τ1 0 0 0 0 0 computing v. The idea of the algorithm is to initialize v to as τ1(s ) =τ ˆ(s ) for all s ∈ SCC0 and τ1(s ) = τ(s ) otherwise. This policy induces a new MC on the state space. be zero except on states in E, and then greedily update v(s) at each iteration by computing the optimal Stackelberg policy Furthermore, since only the outgoing transitions from SCC0 are affected, the success probabilities of all sample paths that at each state. The algorithm terminates when a stationary v is reached. do not reach SCC0 are unchanged. If there exists any state s0 that is reachable from s in the µτ1 0 Algorithm 4 Algorithm for a control strategy that maximizes new chain with P rG (φ|s ) < 1, then the policy τ1 strictly reduces the probability of satisfying φ, thus contradicting the the probability of satisfying φ. 1: procedure AX EACHABILITY G C assumption that τ is a minimizing policy. 
Otherwise, let SCC1 M R ( , ) denote the set of states that are reachable from s under µ 2: Input: product SG G, the set of GAMECs C 3: Output: v ∈ |SG | v(s) = and τ1 and are disjoint from E (this set must be non-empty; Vector R , where µτˆ µτ max min P r (reach dest|s0 = s) otherwise, the policy τˆ would lead to P rG (φ|s) = 1, a G 0 0 4: Initialization: v0 ← 0 v1(s) ← 1 s ∈ E v1(s) ← contradiction). Construct a new policy τ2 by τ2(s ) =τ ˆ(s ) , for , 0 0 0 0 k ← 0 if s ∈ S1 and τ2(s ) = τ(s ) otherwise. Proceeding in- otherwise, 5: while max {|vk+1(s) − vk(s)| : s ∈ S } > δ do ductively, we derive a sequence of policies τk that satisfy G µτk µτ 6: k ← k + 1 P rG (φ) ≤ P rG (φ). This process terminates when either µτk µτ 7: for s∈ / E do P r (φ|s0) < P r (φ|s0), contradicting the minimality of G G k+1 µτk 00 µτˆ 00 00 8: Compute v as v (s) ← maxµ minτ τ, or when P rG (φ|s ) = P rG (φ|s ) for all s that are  P P P 0 reachable from s under τˆ. The latter case, however, implies 0 v(s )µ(s, uC )τ(s, uA) µτˆ s uC ∈UC (s) uA∈UA(s) that P rG (φ|s)) = 1, contradicting the definition of τˆ.  0 Proposition 3 implies that the problem of maximizing the P rG(s, uC , uA, s ) worst-case success probability can be mapped to a reachability 9: end for problem on the product SG G, where G is modified following 10: end while Algorithm 3. A dummy state dest is added into the state space 11: return v of SG. All transitions starting from a state in GAMECs are 12: end procedure directed to state dest with probability one regardless of the actions taken by the adversary. The transition probabilities The following theorem shows that Algorithm 4 guarantees and action spaces of all other nodes are unchanged. We convergence to a Stackelberg equilibrium. observe that the reachability probability remains unchanged after applying Algorithm 3. Hence the satisfaction probability Theorem 1. There exists v∞ such that for any  > 0, there k ∞ remains unchanged. Moreover, the one-to-one correspondence exists δ and K such that ||v − v ||∞ <  for k > K. of control policy still holds for states outside E. Therefore, Furthermore, v∞ satisfies the conditions of v in Lemma 1. Problem 1 is then equivalent to Proof. We first show that, for each s, the sequence vk(s): k = µτ max min P rG (reach dest) (8) 1, 2,..., is bounded and monotone. Boundedness follows from µ τ the fact that, at each iteration, vk(s) is a convex combination of Then, the solution to (4) can be obtained from the solution to the states of its neighbors, which are bounded above by 1. To (8) by following the optimal policy µ∗ for (8) at all states not show monotonicity, we induct on k. Note that v1(s) ≥ v0(s) in E. The control policy for states in E can be any probability and v2(s) ≥ v1(s) since v1(s) = 0 for s∈ / E and vk(s) ≡ 1 distribution over the set of enabled actions in each GAMEC. for s ∈ E. Let µk denote the optimal control policy at step k. We have Due to Proposition 3, in the following we focus on solv- X X X vk+1(s) ≥ min vk(s0) (9) ing the problem (8). Our approach for solving (8) is to τ u ∈U (s) u ∈U (s) s0∈S first compute a value vector v ∈ R|SG |, where v(s) = C C A A µτ ·µk(s, u )τ(s, u )P r (s, u , u , s0) maxµ minτ P rG (reach dest|s). By Lemma 1, the optimal C A G C A X X X policy can then be obtained from v by choosing the distribution ≥ min vk−1(s0) (10) τ µ that solves the optimization problem of (5) at each state 0 uC ∈UC (s) uA∈UA(s) s ∈S 9

k 0 ·µ (s, uC )τ(s, uA)P rG(s, uC , uA, s ) such as liveness φ1 = 23π, while the invariant property = vk(s) (11) can be used to model collision avoidance requirements ψ = 2¬obstacle. Given an LTL specification on the dynamical Eq. (9) follows because the value of vk+1(s), which corre- system (3), it might be impossible for the system to satisfy sponds to the maximizing policy, dominates the value achieved the specification due to the presence of the adversary, i.e., by the particular policy µk. Eq. (9) holds by induction, µτ s maxµ minτ P rSG(φ) = 0. Thus, we relax the specification since vk(s0) ≥ vk−1(s0) for all s0. Finally, (11) holds by by allowing violations on invariant constraint ψ. To minimize k k construction of µs . Hence v (s) is monotone in k. the impact of invariant constraint violations, we investigate the We therefore have that vk(s) is a bounded monotone problem of minimizing the invariant constraint violation rate sequence, and hence converges by the monotone convergence in this section. In particular, given a specification φ = φ1 ∧ ψ, theorem. Let v∞ denote the vector of limit points, so that the objective is to compute a control policy that minimizes we can select δ sufficiently small (to prevent the algorithm the expected number of violations of ψ per cycle over all the from terminating before convergence) and K large in order to stationary policies that maximizes the probability of satisfying k ∞ satisfy ||v − v ||∞ < . φ1. We say that every visit to a state that satisfies φ1 completes We now show that v∞ is a Stackelberg equilibrium. Since a cycle. We still focus on the SG generated from (3). vk(s) converges, it is a Cauchy sequence and thus for any In the following, we first formulate the problem. Motivated  > 0, there exists K such that k > K implies that |vk(s) − by the solution idea of ACPC problem, we solve the problem vk+1(s)| < . By construction, this is equivalent to by generalizing the ACPS problem in [43] and establishing a connection between the problem we formulated and the k X X X  k−1 0 v (s)−max min v (s )µ(s, uC ) generalized ACPS problem. Finally, we present the optimality µ τ 0 uC ∈UC (s) uA∈UA(s) s ∈S conditions of the problem of interest, and propose an efficient

0  algorithm to solve the problem. τ(s, uA)P rG(s, uC , uA, s ) < , and hence v∞ is within  of a Stackelberg equilibrium for A. Problem Statement every  > 0. In the following, we focus on how to generate a control While this approach guarantees asymptotic convergence to policy that minimizes the rate at which ψ is violated while a Stackelberg equilibrium, there is no guarantee on the rate guaranteeing that the probability of satisfying φ1 is maxi- of convergence. By modifying Line 8 of the algorithm so that mized. The problem is stated as follows: k+1 v (s) is updated if Problem 2. Compute a secure control policy µ that minimizes the violation rate of ψ, i.e., the expected number of violations X X X h 0 max min v(s )µ(s, uC )τ(s, uA) µ τ of ψ per cycle, while maximizing the probability that φ1 is 0 uC ∈UC (s) uA∈UA(s) s ∈S satisfied under any adversary policy τ. 0 i k P r(s, uC , uA, s ) > (1 + )v (s) (12) To investigate the problem above, we assign a positive cost α s s 6|= ψ and is constant otherwise, we derive the following result on to every transition initiated from a state if . If state s |= ψ g(s) = 0 u u the termination time. , we let for all C and A. Thus we have for all uC and uA Proposition 4. The -relaxation of (12) converges to a value ( k+1 k α if s 6|= ψ of v satisfying max{|v (s) − v (s)| : s ∈ SG} <  within g(s) = (13) n  1  o 0 0 if s |= ψ. n maxs log v0(s) /log (1 + ) iterations, where v (s) is the smallest positive value of vk(s) for k = 0, 1,.... By Proposition 3, we have that two consecutive visits to E N complete a cycle. Based on the transition cost defined in (13), Proof. After N updates, we have that v (s) ≥ (1 + Problem 2 can be rewritten as follows. )N v0(s). Hence for each s, v(s) will be incremented at n  1  o Problem 3. Given a stochastic game SG and an LTL formula most maxs log 0 /log (1 + ) times. Furthermore, we v (s) φ in the form of φ = φ ∧ ψ, obtain an optimal control have that at least one v(s) must be updated at each iteration, 1 policy µ that maximizes the probability of satisfying φ while thus giving the desired upper bound on the number of itera- 1 minimizing the average cost per cycle due to violating ψ which tions. By definition of (12), the set that is returned satisfies is defined as |vk+1(s) − vk(s)| < vk(s) < . PN g(s )  µτ k=0 k JSG = lim sup E ηβ |= φ1 . (14) VI.PROBLEM FORMULATION-MINIMIZING INVARIANT N→∞ I(β, N) CONSTRAINT VIOLATION Since φ1 is required to be satisfied, similar to our analysis in In this section, we focus on a subclass of specifications of Section V, we first construct a product SG G using SG SG and the form φ = φ1∧ψ, where φ1 is an arbitrary LTL formula and the DRA converted from specification φ1. Then we have the ψ is an invariant constraint. An invariant constraint requires following observations. First, the one-to-one correspondence the system to always satisfy some property. The general LTL relationships between the control policies, paths, and associ- formula φ1 can be used to model any arbitrary properties ated expected cost due to violating ψ on SG and G hold. 10

Furthermore, we observe that if there exists a control policy be interpretated as the expected cost and expected number such that the specification φ can be satisfied, it is the optimal of stages to return to s0 for the first time from state s0, µτ solution to Problem 3 with JSG = 0. Finally, by our analysis in respectively. Based on the definitions above, we have the Section V, specification φ1 is guaranteed to be satisfied if there following equations: exists a control policy that can reach the set of accepting states µτ X µτ E. These observations provide us the advantage to analyze ξ(s) = g (s) + P (s, k)ξ(k), ∀s ∈ SG, (18) Problem 3 on the product SG G constructed using SG SG k∈S\s0 and DRA constructed using φ . Hence, in the following, we X µτ 1 o(s) = 1 + P (s, k)o(k), ∀s ∈ SG. (19) analyze Problem 3 on the product SG G. When the context 0 k∈SG \s is clear, we use s to refer to state (s, q) ∈ SG. Without loss of generality, we assume that E = {1, 2, ··· , l}, i.e., states Define ζµτ = ξ(s0)/o(s0). Multiplying (19) by ζµτ and {l + 1, ··· , n} ∩ E = ∅. subtracting the associated product from (18), we have

µτ µτ µτ B. Computing the Optimal Control Policy ξ(s) − ζ o(s) = g (s) − ζ X µτ µτ Due to the presence of an adversary, the results presented + P (s, k)(ξ(k) − ζ o(k)), ∀s ∈ SG. (20) 0 in Proposition 1 are not applicable. In the following we first k∈SG \s generalize the ACPS problem discussed in [43], which focused Define a bias term on systems without adversaries. Then we characterize the optimality conditions for Problem 3 by connecting it with the µτ µτ b (s) = ξ(s) − ζ o(s), ∀s ∈ SG (21) generalized ACPS problem. Generalized ACPS problem. The presence of adversary is Using (21), (20) can be rewritten as not considered in the ACPS problem considered in [43]. Thus n we need to formulate the ACPS problem with the presence of X ζµτ +bµτ (s) = gµτ (s)+ P µτ (s, k)bµτ (k), ∀s ∈ S (22) adversary and we denote it as the generalized ACPS problem. G k=1 The objective of generalized ACPS problem is to minimize ( N ) which completes our proof. 1 X J (s) = lim sup g(s) | s = s (15) µτ N E 0 N→∞ n=0 The result presented above generalizes the one in [43] in the sense that we consider the presence of adversary. The reason over all stationary control considering the adverary plays some that we focus on communicating SG is that we will focus on strategy τ against the controller. the accepting states which are strongly connected. Based on Optimality conditions for generalized ACPS problem. Lemma 2, we have the optimality conditions for generalized Given any stationary policies µ and τ, denote the induced ACPS problem expressed using the gain-bias pair (B, b): transition probability matrix as P µτ with P µτ (s, s0) = P P 0 µ(s, uC )τ(s, uA)P rG(s, uC , uA, s ). X X X uC ∈UC (s) uA∈UA(s) B(s) = min max µ(s, uC )τ(s, uA) Analogously, denote the expected transition cost µ τ u ∈U (s) u ∈U (s) s0 µτ C C A A starting from any state s ∈ SG as g (s) = 0 0 P P · P rG(s, uC , uA, s )B(s ) (23) µ(s, uC )τ(s, uA)g(s). Similar to uC ∈UC (s) uA∈UA(s) X X X [43], a gain-bias pair is used to characterize the optimality B(s) + b(s) = min max gµτ (s) + µ∈µ∗ τ∈τ ∗ µτ µτ 0 condition. The gain-bias pair (B , b ) under stationary uC ∈UC (s) uA∈UA(s) s policies µ and τ, where Bµτ is the average cost per stage  0 0 and bµτ is the differential or relative cost vector, satisfies the µ(s, uC )τ(s, uA)P rG(s, uC , uA, s )b(s ) (24) following proposition. where µ∗ and τ ∗ are the optimal policy sets obtained by solv- Let µ and τ be proper stationary policies for Lemma 2. ing (23). Eq. (23) can be shown using the method presented a communicating SG, where a communicating SG is an SG in Lemma 1, and (24) is obtained directly from (22). Given whose underlying graph is strongly connected. Then there the optimality conditions (23) and (24) for generalized ACPS exists a constant ζµτ such that problem, we can derive the optimality conditions for Problem µτ µτ B (s) = ζ , ∀s ∈ SG. (16) 3 by mapping Problem 3 to generalized ACPS problem. µτ µτ Optimality conditions for Problem 3. In the following, Furthermore, the gain-bias pair (B , b ) satisfies we establish the connection between the generalized ACPS n problem and Problem 3. Given the connection, we then derive µτ µτ µτ X µτ µτ B (s) + b (s) = g (s) + P (s, k)b (k) (17) the optimality conditions for Problem 3. k=1 Denote the gain-bias pair of Problem 3 on the prod- 0 n Proof. Suppose s is a recurrent state under policies µ and τ. uct SG G as (JG, hG), where JG, hG ∈ R . 
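For a fixed pair of stationary policies, the gain-bias equations of Lemma 2 reduce to a linear system. The sketch below solves ζ + b(s) = g(s) + Σ_k P(s, k) b(k) with one bias entry pinned to zero, assuming the induced chain is unichain; P and g are illustrative placeholders for the matrix and per-stage cost induced by (µ, τ), and this adversary-fixed evaluation is the same primitive exploited later in the policy-evaluation step of Algorithm 5.

```python
# Sketch: gain-bias evaluation for a fixed stationary policy pair on a
# unichain Markov chain (Lemma 2):
#   zeta + b(s) = g(s) + sum_k P(s, k) b(k),  with b(ref) = 0 as normalization.
# P and g are illustrative placeholders for the quantities induced by (mu, tau).
import numpy as np

def gain_bias(P, g, ref=0):
    n = P.shape[0]
    # Unknowns: [zeta, b(0), ..., b(n-1)], with b(ref) pinned to zero.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    for s in range(n):
        A[s, 0] = 1.0                    # coefficient of zeta
        A[s, 1:] = np.eye(n)[s] - P[s]   # b(s) - sum_k P(s, k) b(k)
        rhs[s] = g[s]
    A[n, 1 + ref] = 1.0                  # normalization b(ref) = 0
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # (average cost per stage, bias vector)

# Example: two-state chain with cost 1 incurred only in state 1.
P = np.array([[0.5, 0.5],
              [0.9, 0.1]])
g = np.array([0.0, 1.0])
zeta, b = gain_bias(P, g)
print(zeta, b)                           # zeta matches the stationary average cost
```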
Denote the gain-bias pair under policies µ and τ as (J_G^{µτ}, h_G^{µτ}), where J_G^{µτ} = [J_G^{µτ}(1), J_G^{µτ}(2), · · · , J_G^{µτ}(n)]^T and h_G^{µτ} = [h_G^{µτ}(1), h_G^{µτ}(2), · · · , h_G^{µτ}(n)]^T. Define ξ(s) as the expected cost to reach s_0 for the first time from state s, and o(s) as the expected number of stages to reach s_0 for the first time from s. Thus ξ(s_0) and o(s_0) can

We can express the transition probability matrix $P^{\mu\tau}$ induced by control and adversary policies $\mu$ and $\tau$ as $P^{\mu\tau} = P_{in}^{\mu\tau} + P_{out}^{\mu\tau}$, where

$$P_{in}^{\mu\tau}(s,s') = \begin{cases} P^{\mu\tau}(s,s') & \text{if } s' \in E \\ 0 & \text{otherwise} \end{cases} \qquad (25a)$$
$$P_{out}^{\mu\tau}(s,s') = \begin{cases} P^{\mu\tau}(s,s') & \text{if } s' \notin E \\ 0 & \text{otherwise} \end{cases} \qquad (25b)$$

Denote the probability that we visit some accepting state $s' \in E$ from state $s$ under policies $\mu$ and $\tau$ as $\hat{P}^{\mu\tau}(s,s')$. Then $\hat{P}^{\mu\tau}(s,s')$ is calculated as

$$\hat{P}^{\mu\tau}(s,s') = \sum_{u_C \in U_C(s)} \mu(s,u_C) \sum_{u_A \in U_A(s)} \tau(s,u_A)\, Pr_G(s,u_C,u_A,s') + \sum_{u_C \in U_C(s)} \mu(s,u_C) \sum_{u_A \in U_A(s)} \tau(s,u_A) \sum_{k=l+1}^{n} Pr_G(s,u_C,u_A,k)\, \hat{P}^{\mu\tau}(k,s'). \qquad (26)$$

The intuition behind (26) is that the probability that $s'$ is the first accepting state to be visited consists of two parts. The first term in (26) describes the probability that the next state is in $E$. The second term in (26) models the probability that, before reaching state $s' \in E$, the next visited state is some $k \notin E$. Denote the transition probability matrix formed by $\hat{P}^{\mu\tau}(s,s')$ as $\hat{P}^{\mu\tau}$. Since $P_{out}^{\mu\tau}$ is substochastic and transient, $I - P_{out}^{\mu\tau}$ is non-singular [47], where $I$ is the identity matrix of proper dimension. Thus $I - P_{out}^{\mu\tau}$ is invertible. Then, using (25), the transition probability matrix $\hat{P}^{\mu\tau}$ is represented as

$$\hat{P}^{\mu\tau} = (I - P_{out}^{\mu\tau})^{-1} P_{in}^{\mu\tau}. \qquad (27)$$

Denote the expected invariant property violation cost incurred when visiting some accepting state $s' \in E$ from state $s$ under policies $\mu$ and $\tau$ as $\hat{g}(s)$. The expected cost $\hat{g}(s)$ is calculated as follows:

$$\hat{g}(s) = g^{\mu\tau}(s) + \sum_{k=l+1}^{n} Pr^{\mu\tau}(s,k)\, \hat{g}(k). \qquad (28)$$

Under policies $\mu$ and $\tau$, denote the expected cost vector formed by $\hat{g}(s)$ as $\hat{g}^{\mu\tau}$. Then, using (25), the expected cost vector (28) can be rearranged as follows:

$$\hat{g}^{\mu\tau} = P_{out}^{\mu\tau}\hat{g}^{\mu\tau} + g^{\mu\tau} \;\Rightarrow\; \hat{g}^{\mu\tau} = (I - P_{out}^{\mu\tau})^{-1} g^{\mu\tau}. \qquad (29)$$

Using (27) and (29), we can rewrite (14) as

$$J_G^{\mu\tau} = \limsup_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} (\hat{P}^{\mu\tau})^k\, \hat{g}^{\mu\tau}. \qquad (30)$$

Proper policies $\mu$ and $\tau$ of the product SG $G$ for Problem 3 are related to proper policies $\hat{\mu}$ and $\hat{\tau}$ for the generalized ACPS problem as follows:

$$\hat{P}^{\mu\tau} = P^{\hat{\mu}\hat{\tau}}, \quad \hat{g}^{\mu\tau} = g^{\hat{\mu}\hat{\tau}}, \quad J_G^{\mu\tau} = B^{\hat{\mu}\hat{\tau}}. \qquad (31)$$

If we define a bias term $h_G^{\mu\tau} = b^{\hat{\mu}\hat{\tau}}$, then a gain-bias pair $(J_G^{\mu\tau}, h_G^{\mu\tau})$ is constructed for Problem 3. Under the worst-case adversary policy $\hat{\tau}$, the control policy that makes the gain-bias pair of the ACPS problem satisfy

$$B + b \leq g^{\hat{\mu}\hat{\tau}} + P^{\hat{\mu}\hat{\tau}} b \qquad (32)$$

is optimal. That is, the control policy $\mu^*$ that maps to $\hat{\mu}^*$ is optimal.

To obtain the optimal control policy, we need to characterize Problem 3 in terms of the control and adversary policies $\mu$ and $\tau$. The following lemma generalizes the results presented in [9], in which no adversary is considered. For completeness, we show its proof, which generalizes the proof in [9].

Lemma 3. The gain-bias pair $(J_G^{\mu\tau}, h_G^{\mu\tau})$ of Problem 3 under policies $\mu$ and $\tau$ satisfies the following equations:
$$J_G^{\mu\tau} = P^{\mu\tau} J_G^{\mu\tau}, \qquad (33)$$
$$J_G^{\mu\tau} + h_G^{\mu\tau} = g^{\mu\tau} + P^{\mu\tau} h_G^{\mu\tau} + P_{out}^{\mu\tau} J_G^{\mu\tau}, \qquad (34)$$
$$(I - P_{out}^{\mu\tau}) h_G^{\mu\tau} + v^{\mu\tau} = P^{\mu\tau} v^{\mu\tau}, \qquad (35)$$
for some vector $v^{\mu\tau}$.

Proof. Given the policies $\hat{\mu}$ and $\hat{\tau}$ for the ACPS problem, we have
$$J_G^{\hat{\mu}\hat{\tau}} = P^{\hat{\mu}\hat{\tau}} J_G^{\hat{\mu}\hat{\tau}}, \quad J_G^{\hat{\mu}\hat{\tau}} + h_G^{\hat{\mu}\hat{\tau}} = g^{\hat{\mu}\hat{\tau}} + P^{\hat{\mu}\hat{\tau}} h_G^{\hat{\mu}\hat{\tau}}, \quad h_G^{\hat{\mu}\hat{\tau}} + v^{\hat{\mu}\hat{\tau}} = P^{\hat{\mu}\hat{\tau}} v^{\hat{\mu}\hat{\tau}}.$$
Due to the connection between the control policy of Problem 3 and the generalized ACPS problem, we have
$$J_G^{\mu\tau} = \hat{P}^{\mu\tau} J_G^{\mu\tau} = (I - P_{out}^{\mu\tau})^{-1} P_{in}^{\mu\tau} J_G^{\mu\tau}.$$
By rearranging the equation above, we have $(I - P_{out}^{\mu\tau}) J_G^{\mu\tau} = J_G^{\mu\tau} - P_{out}^{\mu\tau} J_G^{\mu\tau} = P_{in}^{\mu\tau} J_G^{\mu\tau}$. Thus $J_G^{\mu\tau} = (P_{out}^{\mu\tau} + P_{in}^{\mu\tau}) J_G^{\mu\tau} = P^{\mu\tau} J_G^{\mu\tau}$. The expression $J_G^{\mu\tau} + h_G^{\mu\tau} = g^{\mu\tau} + P^{\mu\tau} h_G^{\mu\tau} + P_{out}^{\mu\tau} J_G^{\mu\tau}$ can be obtained using (27) and (29). We have
$$J_G^{\mu\tau} + h_G^{\mu\tau} = (I - P_{out}^{\mu\tau})^{-1}\big(g^{\mu\tau} + P_{in}^{\mu\tau} h_G^{\mu\tau}\big).$$
Manipulating the equation above, we see that $(I - P_{out}^{\mu\tau})(J_G^{\mu\tau} + h_G^{\mu\tau}) = g^{\mu\tau} + P_{in}^{\mu\tau} h_G^{\mu\tau}$. Then we can see that
$$J_G^{\mu\tau} + h_G^{\mu\tau} = g^{\mu\tau} + (P_{in}^{\mu\tau} + P_{out}^{\mu\tau}) h_G^{\mu\tau} + P_{out}^{\mu\tau} J_G^{\mu\tau} = g^{\mu\tau} + P^{\mu\tau} h_G^{\mu\tau} + P_{out}^{\mu\tau} J_G^{\mu\tau}.$$
Start from $h_G^{\hat{\mu}\hat{\tau}} + v^{\hat{\mu}\hat{\tau}} = P^{\hat{\mu}\hat{\tau}} v^{\hat{\mu}\hat{\tau}}$. We see that $h_G^{\mu\tau} + v^{\mu\tau} = (I - P_{out}^{\mu\tau})^{-1} P_{in}^{\mu\tau} v^{\mu\tau}$. Therefore we have
$$(I - P_{out}^{\mu\tau}) h_G^{\mu\tau} + v^{\mu\tau} = P^{\mu\tau} v^{\mu\tau},$$
which completes our proof.

Lemma 3 indicates that the gain-bias pair can be obtained as the solution of a linear system with $3n$ unknowns. Thus we can evaluate any pair of control and adversary policies using Lemma 3, which gives us the means to implement an iterative algorithm to compute the optimal control policy $\mu$.
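As a concrete illustration of this policy-evaluation step, the following sketch assembles the $3n \times 3n$ linear system of Lemma 3 for a fixed policy pair and solves it in a least-squares sense (the auxiliary vector $v^{\mu\tau}$ is not uniquely determined, so an exact solve may be ill-posed). This is a minimal numerical sketch, not the authors' implementation: it assumes the induced matrices $P^{\mu\tau}$ and $P_{out}^{\mu\tau}$ and the cost vector $g^{\mu\tau}$ have already been formed, and the function name evaluate_gain_bias is illustrative.

```python
import numpy as np

def evaluate_gain_bias(g, P, P_out):
    """Solve the linear system of Lemma 3 for a fixed policy pair (sketch).

    Unknowns x = [J, h, v] (each of length n), stacked equations:
        (I - P) J                  = 0     # eq. (33)
        (I - P_out) J + (I - P) h  = g     # eq. (34), rearranged
        (I - P_out) h + (I - P) v  = 0     # eq. (35), rearranged
    """
    n = len(g)
    I = np.eye(n)
    Z = np.zeros((n, n))
    A = np.block([
        [I - P,     Z,         Z],
        [I - P_out, I - P,     Z],
        [Z,         I - P_out, I - P],
    ])
    b = np.concatenate([np.zeros(n), g, np.zeros(n)])
    # Least-squares solve: v is only determined up to the system's null space.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    J, h = x[:n], x[n:2 * n]
    return J, h
```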

To compute the control policy $\mu$, we define two operators on $(J_G, h_G)$ in (36) and (37), denoted as $T^*(J_G, h_G)$ and $T_\mu(J_G, h_G)$. Generally speaking, we can view them as mappings from $(J_G, h_G)$ to $T^*(J_G, h_G) \in \mathbb{R}^n$ and $T_\mu(J_G, h_G) \in \mathbb{R}^n$, respectively. Note that in (37), the transition probability is the one induced under a certain control policy $\mu$.

$$(T^*(J_G, h_G))(s) = \min_{\mu}\max_{\tau} \Bigg[ \sum_{u_C \in U_C(s)} \sum_{u_A \in U_A(s)} \mu(s,u_C)\tau(s,u_A)\, g(s) + \sum_{u_C \in U_C(s)} \mu(s,u_C) \sum_{u_A \in U_A(s)} \tau(s,u_A) \sum_{s'=1}^{n} Pr_G(s,u_C,u_A,s')\, h_G(s') + \sum_{u_C \in U_C(s)} \sum_{u_A \in U_A(s)} \sum_{s'=l+1}^{n} \mu(s,u_C)\tau(s,u_A)\, Pr_G(s,u_C,u_A,s')\, J_G(s') \Bigg], \quad \forall s \qquad (36)$$

$$(T_\mu(J_G, h_G))(s) = \max_{\tau} \Bigg[ \sum_{u_C \in U_C(s)} \sum_{u_A \in U_A(s)} \mu(s,u_C)\tau(s,u_A)\, g(s) + \sum_{u_C \in U_C(s)} \sum_{u_A \in U_A(s)} \mu(s,u_C)\tau(s,u_A) \sum_{s'=1}^{n} Pr_G(s,u_C,u_A,s')\, h_G(s') + \sum_{u_C \in U_C(s)} \sum_{u_A \in U_A(s)} \sum_{s'=l+1}^{n} \mu(s,u_C)\tau(s,u_A)\, Pr_G(s,u_C,u_A,s')\, J_G(s') \Bigg], \quad \forall s \qquad (37)$$
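For fixed $(J_G, h_G)$ and a fixed state $s$, the inner $\min_\mu \max_\tau$ in (36) is the value of a finite zero-sum matrix game over the randomized decision rules $\mu(s,\cdot)$ and $\tau(s,\cdot)$, and can therefore be computed by linear programming in the spirit of von Neumann's minimax theorem [46]. The sketch below is illustrative only and assumes the local payoff matrix $M$, with entries $M[u_C,u_A] = g(s) + \sum_{s'=1}^{n} Pr_G(s,u_C,u_A,s')\, h_G(s') + \sum_{s'=l+1}^{n} Pr_G(s,u_C,u_A,s')\, J_G(s')$, has already been assembled; the use of SciPy's linprog is one possible solver choice, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def state_game_value(M):
    """Value and minimizing mixed strategy of min_mu max_tau mu^T M tau (sketch).

    M[i, j]: payoff when the controller plays action i and the adversary plays j.
    LP: minimize z subject to sum_i mu_i * M[i, j] <= z for every adversary
    column j, with mu a probability distribution.
    """
    m, k = M.shape
    # Decision variables: [mu_1, ..., mu_m, z]; objective is z.
    c = np.concatenate([np.zeros(m), [1.0]])
    # For each adversary action j: (M^T mu)_j - z <= 0.
    A_ub = np.hstack([M.T, -np.ones((k, 1))])
    b_ub = np.zeros(k)
    # mu must sum to one.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    mu = res.x[:m]
    value = res.x[-1]
    return value, mu

# Example: a 2x2 game; the controller randomizes to keep the worst case low.
print(state_game_value(np.array([[1.0, 4.0], [3.0, 2.0]])))
```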

Based on the definitions above, we present the optimality conditions for Problem 3 using the following theorem.

Theorem 2. The control policy $\mu$ with gain-bias pair $(J_G^{\mu\tau}, h_G^{\mu\tau})$ that satisfies
$$J_G^{\mu\tau} + h_G^{\mu\tau} = T^*(J_G^{\mu\tau}, h_G^{\mu\tau}) \qquad (38)$$
is the optimal control policy.

Proof. Consider an arbitrary control policy $\hat{\mu}$ and the worst-case adversary policy $\hat{\tau}$. By the definition of $T^*(\cdot)$ in (36), we have that (38) implies $J_G^{\mu\tau} + h_G^{\mu\tau} \leq g^{\hat{\mu}\hat{\tau}} + P^{\hat{\mu}\hat{\tau}} h_G^{\mu\tau} + P_{out}^{\hat{\mu}\hat{\tau}} J_G^{\mu\tau}$, where $P^{\hat{\mu}\hat{\tau}}$ and $P_{out}^{\hat{\mu}\hat{\tau}}$ are the transition probability matrices induced by policies $\hat{\mu}$ and $\hat{\tau}$. Then we have
$$J_G^{\mu\tau} + h_G^{\mu\tau} - P_{out}^{\hat{\mu}\hat{\tau}} J_G^{\mu\tau} \leq g^{\hat{\mu}\hat{\tau}} + P^{\hat{\mu}\hat{\tau}} h_G^{\mu\tau} = g^{\hat{\mu}\hat{\tau}} + \big(P_{in}^{\hat{\mu}\hat{\tau}} + P_{out}^{\hat{\mu}\hat{\tau}}\big) h_G^{\mu\tau}.$$
Thus we observe that $(I - P_{out}^{\hat{\mu}\hat{\tau}})(J_G^{\mu\tau} + h_G^{\mu\tau}) \leq g^{\hat{\mu}\hat{\tau}} + P_{in}^{\hat{\mu}\hat{\tau}} h_G^{\mu\tau}$. Note that $(I - P_{out}^{\hat{\mu}\hat{\tau}})$ is invertible. Thus the inequality above can be rewritten as $J_G^{\mu\tau} + h_G^{\mu\tau} \leq (I - P_{out}^{\hat{\mu}\hat{\tau}})^{-1}\big(g^{\hat{\mu}\hat{\tau}} + P_{in}^{\hat{\mu}\hat{\tau}} h_G^{\mu\tau}\big)$. Rewriting the inequality above according to (27) and (29), we have
$$J_G^{\mu\tau} + h_G^{\mu\tau} \leq g^{\tilde{\mu}\tilde{\tau}} + P^{\tilde{\mu}\tilde{\tau}} h_G^{\mu\tau},$$
where $\tilde{\mu}$ and $\tilde{\tau}$ are the control and adversary policies in the associated ACPS problem. Thus, $\tilde{\mu}$ satisfies (32) and is optimal over all the proper policies.

Algorithm 5 Algorithm for a control strategy that minimizes the expected number of invariant constraint violations.
1: procedure MIN VIOLATION($G$, $C$)
2:   Input: product SG $G$, the set of GAMECs $C$ associated with formula $\phi_1$
3:   Output: Control policy $\mu_{cycle}$
4:   Initialization: Initialize $\mu^0$ and $\tau^0$ to be proper policies.
5:   while $T^*(J_C^{\mu^k\tau^k}, h_C^{\mu^k\tau^k}) \neq T^*(J_C^{\mu^{k-1}\tau^{k-1}}, h_C^{\mu^{k-1}\tau^{k-1}})$ do
6:     Policy Evaluation: Given $\mu^k$ and $\tau^k$, calculate the gain-bias pair $(J_C^{\mu^k\tau^k}, h_C^{\mu^k\tau^k})$ using Lemma 3.
7:     Policy Improvement: Calculate the control policy $\mu$ using $\mu = \arg\min_{\mu} \arg\max_{\tau} \big\{ g^{\mu\tau} + P^{\mu\tau} h_C^{\mu^k\tau^k} + P_{out}^{\mu\tau} J_C^{\mu^k\tau^k} \big\}$.
8:     Set $\mu^{k+1} = \mu$.
9:     Set $k = k + 1$.
10:  end while
11: end procedure

Optimal control policy for Problem 3. In the following, we focus on how to obtain an optimal secure control policy. First, note that the optimal control policy consists of two parts. The first part, denoted as $\mu_{reach}$, maximizes the probability of satisfying specification $\phi_1$, while the second part, denoted as $\mu_{cycle}$, minimizes the violation cost per cycle due to violating the invariant property. Following the procedure described in Algorithm 4, we can obtain the control policy $\mu_{reach}$ that maximizes the probability of satisfying specification $\phi_1$. Suppose the set of accepting states $E$ has been reached. Then the control policy $\mu_{cycle}$ that optimizes the long-term performance of the system is generated using Algorithm 5. Algorithm 5 first initializes the control and adversary policies arbitrarily (e.g., if $\mu^0$ and $\tau^0$ are set as uniform distributions, then $\mu^0(s,u_C) = 1/|U_C(s)|$ and $\tau^0(s,u_A) = 1/|U_A(s)|$ for all $s$, $u_C$ and $u_A$). Then it follows a policy iteration procedure to update the control policy and the corresponding adversary policy until no more improvement can be made. Given $\mu_{reach}$ and $\mu_{cycle}$, we can construct the optimal control policy for Problem 3 as
$$\mu^* = \begin{cases} \mu_{reach}, & \text{if } s \notin E \\ \mu_{cycle}, & \text{if } s \in E \end{cases}. \qquad (39)$$

We finally present the convergence and optimality of Algorithm 5 using the following theorem.

Theorem 3. Algorithm 5 terminates within a finite number of iterations for any given accepting state set $E$. Moreover, the result returned by Algorithm 5 satisfies the optimality conditions for Problem 3.

Proof. In the following, we first prove that Algorithm 5 converges within a finite number of iterations. Then we prove that the result returned by Algorithm 5 satisfies the optimality conditions in Theorem 2. We denote the iteration index as $k$. The control policy at the $k$th iteration is denoted as $\mu^k$. The worst-case adversary policy associated with $\mu^k$ is denoted as $\tau^k$. Define a vector $\delta \in \mathbb{R}^n$ as $\delta = J_G^{\mu^k\tau^k}\mathbf{1} + h_G^{\mu^k\tau^k} - g^{\mu^{k+1}\tau^{k+1}} - P^{\mu^{k+1}\tau^{k+1}} h_G^{\mu^k\tau^k} - P_{out}^{\mu^{k+1}\tau^{k+1}} J_G^{\mu^k\tau^k}\mathbf{1}$.

By Lemma 3, we have $J_G^{\mu^k\tau^k}\mathbf{1} + h_G^{\mu^k\tau^k} = g^{\mu^k\tau^k} + P^{\mu^k\tau^k} h_G^{\mu^k\tau^k} + P_{out}^{\mu^k\tau^k} J_G^{\mu^k\tau^k}\mathbf{1}$. By the definition of $T^*(\cdot)$ in (36), the control policy at iteration $k+1$ is computed by optimizing $g^{\mu\tau} + P^{\mu\tau} h_G^{\mu^k\tau^k} + P_{out}^{\mu\tau} J_G^{\mu^k\tau^k}$. Thus we have that $\delta(s) \geq 0$ for all $s$. Moreover, we can rewrite the vector $\delta$ as
$$\delta = J_G^{\mu^k\tau^k}\mathbf{1} + h_G^{\mu^k\tau^k} - g^{\mu^{k+1}\tau^{k+1}} - P^{\mu^{k+1}\tau^{k+1}} h_G^{\mu^{k+1}\tau^{k+1}} - P_{out}^{\mu^{k+1}\tau^{k+1}} J_G^{\mu^{k+1}\tau^{k+1}}\mathbf{1} + P^{\mu^{k+1}\tau^{k+1}} h_G^{\mu^{k+1}\tau^{k+1}} + P_{out}^{\mu^{k+1}\tau^{k+1}} J_G^{\mu^{k+1}\tau^{k+1}}\mathbf{1} - P^{\mu^{k+1}\tau^{k+1}} h_G^{\mu^k\tau^k} - P_{out}^{\mu^{k+1}\tau^{k+1}} J_G^{\mu^k\tau^k}\mathbf{1}$$
$$= J_G^{\mu^k\tau^k}\mathbf{1} + h_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}}\mathbf{1} - h_G^{\mu^{k+1}\tau^{k+1}} - P^{\mu^{k+1}\tau^{k+1}}\big(h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}}\big) - P_{out}^{\mu^{k+1}\tau^{k+1}}\big(J_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}}\big)\mathbf{1},$$
where the second equality holds by Lemma 3. Thus $\delta$ can be represented as
$$\delta = \big(I - P_{out}^{\mu^{k+1}\tau^{k+1}}\big)\big(J_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}}\big)\mathbf{1} + \big(I - P^{\mu^{k+1}\tau^{k+1}}\big)\big(h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}}\big), \qquad (40)$$
where $I$ is the identity matrix. By multiplying both sides of (40) by $(P^{\mu^{k+1}\tau^{k+1}})^t$ and summing over $t$ from $0$ to $T-1$, we have that
$$\sum_{t=0}^{T-1} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t \delta = \sum_{t=0}^{T-1} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t \big(I - P_{out}^{\mu^{k+1}\tau^{k+1}}\big)\big(J_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}}\big)\mathbf{1} + \sum_{t=0}^{T-1} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t \big(I - P^{\mu^{k+1}\tau^{k+1}}\big)\big(h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}}\big). \qquad (41)$$
Divide both sides by $T$ and let $T \to \infty$. Then we have
$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t \delta = \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \Big[\big(P^{\mu^{k+1}\tau^{k+1}}\big)^t - \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t P_{out}^{\mu^{k+1}\tau^{k+1}}\Big]\big(J_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}}\big)\mathbf{1} \qquad (42)$$
since the second term of (41) is eliminated when $T \to \infty$. Since $P_{out}^{\mu^{k+1}\tau^{k+1}}$ is a substochastic matrix, we have $P_{out}^{\mu^{k+1}\tau^{k+1}}\mathbf{1} \leq \mathbf{1}$. Furthermore, since $P^{\mu^{k+1}\tau^{k+1}}$ is a stochastic matrix, we see that $\mathbf{1} - P_{out}^{\mu^{k+1}\tau^{k+1}}\mathbf{1} \geq \mathbf{0}$. Thus we have $\big[\big(P^{\mu^{k+1}\tau^{k+1}}\big)^t - \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t P_{out}^{\mu^{k+1}\tau^{k+1}}\big]\mathbf{1} \geq \mathbf{0}$. Given the inequality above and $\delta \geq 0$, we have that $J_G^{\mu^k\tau^k} - J_G^{\mu^{k+1}\tau^{k+1}} \geq 0$ by observing (42), which implies that $J_G^{\mu^k\tau^k} \geq J_G^{\mu^{k+1}\tau^{k+1}}$.

Consider the scenario where $J_G^{\mu^k\tau^k} = J_G^{\mu^{k+1}\tau^{k+1}}$. We further need to show that in this case $h_G^{\mu^k\tau^k} \leq h_G^{\mu^{k+1}\tau^{k+1}}$. For each state that belongs to the recurrent class, the corresponding entry of $\sum_{t=0}^{T-1} (P^{\mu^{k+1}\tau^{k+1}})^t$ is positive. By observing (42), we have $\delta(s) = 0$ for all $s$ belonging to the recurrent class. Thus, according to (41), we have that $h_G^{\mu^k\tau^k}(s) = h_G^{\mu^{k+1}\tau^{k+1}}(s)$ for all $s$ in the recurrent class.

By observing (41), we have
$$\lim_{T\to\infty} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^T \big(h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}}\big) = h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}} - \lim_{T\to\infty}\sum_{t=0}^{T-1} \big(P^{\mu^{k+1}\tau^{k+1}}\big)^t \delta \leq h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}} - \delta.$$
Note that the elements corresponding to the transient states in $(P^{\mu^{k+1}\tau^{k+1}})^t (h_G^{\mu^k\tau^k} - h_G^{\mu^{k+1}\tau^{k+1}})$ approach zero as $t \to \infty$. Thus we have $h_G^{\mu^k\tau^k}(s) - h_G^{\mu^{k+1}\tau^{k+1}}(s) \geq \delta(s) \geq 0$ for all transient states $s$. Combining all of the above, we have that $\mu^k = \mu^{k+1}$ when $\delta = 0$; otherwise $h_G^{\mu^k\tau^k}(s) - h_G^{\mu^{k+1}\tau^{k+1}}(s) \geq 0$ holds for some transient state $s$.

When Algorithm 5 terminates, we have that
$$T^*\big(J_G^{\mu^{k+1}\tau^{k+1}}, h_G^{\mu^{k+1}\tau^{k+1}}\big) = T^*\big(J_G^{\mu^k\tau^k}, h_G^{\mu^k\tau^k}\big). \qquad (43)$$
By using the policy iteration algorithm, the gain-bias pair $(J_G^{\mu^k\tau^k}, h_G^{\mu^k\tau^k})$ is first evaluated using Lemma 3 at each iteration $k$. Then, using the gain-bias pair obtained in the policy evaluation phase, the $T^*$ operator is calculated as shown in Algorithm 5. Thus, according to Lemma 3, we see that
$$\mu = \arg\min_{\mu} \max_{\tau} \Big\{ g^{\mu\tau} + P^{\mu\tau} h_G^{\mu^k\tau^k} + P_{out}^{\mu\tau} J_G^{\mu^k\tau^k} \Big\}. \qquad (44)$$
Note that the right-hand side of (44) is equivalent to how $T^*$ is calculated in Algorithm 5. Therefore, by combining (43) and (44), we obtain $J_G^{\mu^k\tau^k} + h_G^{\mu^k\tau^k} = T^*(J_G^{\mu^k\tau^k}, h_G^{\mu^k\tau^k})$. By Theorem 2, we see that $\mu^k$ is the optimal control policy.
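To make the structure of Algorithm 5 concrete, the following is a minimal sketch of its evaluate/improve loop. It is not the authors' implementation: the helpers induced_matrices (returning $g^{\mu\tau}$, $P^{\mu\tau}$, $P_{out}^{\mu\tau}$ for a fixed policy pair), evaluate_gain_bias (the Lemma 3 linear system), and improve_policy (the state-wise minimax step in (44)) are assumed to be supplied by the caller, for instance along the lines of the sketches above, and the stopping test compares successive gain vectors as a simple surrogate for comparing $T^*$ values.

```python
import numpy as np

def min_violation(mu0, tau0, induced_matrices, evaluate_gain_bias,
                  improve_policy, max_iter=100, tol=1e-9):
    """Policy-iteration loop in the spirit of Algorithm 5 (sketch).

    mu0, tau0          -- initial proper control/adversary policies
    induced_matrices   -- (mu, tau) -> (g, P, P_out) for the product SG
    evaluate_gain_bias -- (g, P, P_out) -> (J, h), as in Lemma 3
    improve_policy     -- (J, h) -> (mu, tau), minimax step of (44)
    """
    mu, tau = mu0, tau0
    J_prev = None
    for _ in range(max_iter):
        # Policy evaluation: gain-bias pair of the current policy pair.
        g, P, P_out = induced_matrices(mu, tau)
        J, h = evaluate_gain_bias(g, P, P_out)
        # Termination test: stop once the gain has stopped improving.
        if J_prev is not None and np.allclose(J, J_prev, atol=tol):
            break
        # Policy improvement: minimize (over mu) the worst-case (over tau)
        # one-step value g + P h + P_out J, state by state.
        mu, tau = improve_policy(J, h)
        J_prev = J
    return mu
```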

VII. CASE STUDY

In this section, we present two case studies to demonstrate the proposed method. In particular, we focus on the application of a remotely controlled UAV under deception attack. In the first case study, the UAV is given a specification modeling a reach-avoid requirement. In the second case study, the UAV is given a specification modeling surveillance and collision-free requirements. Both case studies were run on a Macbook Pro with a 2.6GHz Intel Core i5 CPU and 8GB RAM.

A. Case Study I: Remotely Controlled UAV under Deception Attack with Reach-Avoid Specification

In this case study, we focus on a remotely controlled UAV that conducts a package delivery service. The UAV carries multiple packages and is required to deliver the packages to pre-given locations in a particular order (e.g., the solution of a travelling salesman problem). The UAV navigates in a discrete bounded grid environment following the system model $x(t+1) = x(t) + (u_C(t) + u_A(t) + \vartheta(t))\Delta t$, where $x(t) \in \mathbb{R}^2$ is the location of the UAV, $u_C(t) \in U \subseteq \mathbb{R}^2$ is the control signal, $u_A(t) \in A \subseteq \mathbb{R}^2$ is the attack signal, $\vartheta(t) \in D \subseteq \mathbb{R}^2$ is the stochastic disturbance, and $\Delta t$ is the time interval. In particular, we let the control set $U = [-0.3, 0.3]^2$, the attack signal set $A = [-0.2, 0.2]^2$, and the disturbance set $D = [-0.05, 0.05]^2$. Also, the disturbance $\vartheta(t) \sim \mathcal{N}(0, \Gamma)$, where $\Gamma = \mathrm{diag}(0.15^2, 0.15^2)$.
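To illustrate how such a continuous model can be abstracted into the SG used below, the sketch simulates one step of the dynamics and maps the continuous position to a grid cell, so that transition probabilities between cells can be estimated empirically by sampling the disturbance. It is only an illustration of the idea, not Algorithm 1 itself: the $20 \times 20$ unit-cell grid over $[0,20]^2$ is taken from the case-study figures, while $\Delta t = 1$ and the clipping at the boundary are assumptions made for this sketch.

```python
import numpy as np

# Illustrative constants (assumed for this sketch, not taken from the paper).
DT = 1.0          # time interval Delta t
GRID_MAX = 20     # environment taken as [0, 20]^2, partitioned into 20x20 cells
SIGMA = 0.15      # per-axis disturbance standard deviation

def step(x, u_C, u_A, rng):
    """One step of x(t+1) = x(t) + (u_C + u_A + disturbance) * Delta t."""
    theta = rng.normal(0.0, SIGMA, size=2)            # disturbance sample
    return np.clip(x + (u_C + u_A + theta) * DT, 0.0, GRID_MAX)

def to_cell(x):
    """Map a continuous position to the index of its unit grid cell (SG state)."""
    col, row = np.minimum(np.floor(x).astype(int), GRID_MAX - 1)
    return row * GRID_MAX + col

def estimate_transitions(x, u_C, u_A, n_samples=10000, seed=0):
    """Monte Carlo estimate of Pr(s' | s, u_C, u_A) for the abstracted SG."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_samples):
        s_next = to_cell(step(x, u_C, u_A, rng))
        counts[s_next] = counts.get(s_next, 0) + 1
    return {s: c / n_samples for s, c in counts.items()}

# Example: successor-cell distribution from position (2.5, 2.5) under control
# input (0.3, 0.0) and attack input (-0.2, 0.0).
print(estimate_transitions(np.array([2.5, 2.5]),
                           np.array([0.3, 0.0]), np.array([-0.2, 0.0])))
```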

[Fig. 1: three 20x20 grid-world panels, (a) "Trajectory Comparison" and (b), (c) "Satisfaction Probability Starting from Intersection States"; labeled cells include 'home', 'dest1', 'dest2', 'dest3', and 'obstacle'.]

Fig. 1: Comparison of the proposed approach and the approach without considering the presence of the adversary. Fig. 1a gives the trajectories obtained using the two approaches: the solid blue line is the trajectory obtained using the proposed approach, while the dashed red line represents the trajectory obtained using the approach without considering the presence of the adversary. Fig. 1b and Fig. 1c present the probability of satisfying the LTL specification, using the proposed approach and the approach without considering the adversary respectively, when the initial state is set as each of the states located at the intersections of the grid world. The gray level at each intersection state corresponds to the satisfaction probability, with black being probability 0 and white being probability 1.

[Fig. 2: two panels, (a) "Trajectory Comparison" on the 20x20 grid world and (b) "Expected Invariant Constraint Violation Cost" versus "Iteration Index" for the proposed approach and the approach without considering the adversary.]

Fig. 2: Comparison of the proposed approach and the approach without considering the presence of the adversary. Fig. 2a gives the trajectories obtained using the two approaches: the solid blue line is the trajectory obtained using the proposed approach, while the dashed red line represents the trajectory obtained using the approach without considering the presence of the adversary. Fig. 2b shows the expected invariant constraint violation cost with respect to the iteration index.

State                          (7,8)    (8,8)    (13,8)   (14,8)   (7,13)   (8,13)   (13,13)  (14,13)
$Pr_G^{\mu\tau}$ (proposed)           0.6684   0.6028   0.5915   0.4893   0.8981   0.7126   0.6684   0.6028
$Pr_G^{\tilde\mu\tilde\tau}$ (no adversary)  0.3619   0.3182   0.2878   0.1701   0.6146   0.5112   0.3619   0.3182
Improvement                    84.69%   89.44%   105.52%  187.65%  46.13%   39.40%   84.69%   89.44%

TABLE I: Comparison of the probabilities of satisfying specification $\phi$ when starting from the states located at intersections, using the proposed approach and the approach without considering the adversary.

We abstract the system as an SG using Algorithm 1. In particular, given the location of the UAV, we can map the location of the UAV to a grid cell and simulate the cell it reaches at time $t+1$. Each cell in the environment can be mapped to a state in the SG. In this case study, there are 400 states in the SG. In the following, we use location and state interchangeably. The control actions and attack signals are sets of discrete control inputs. The label of each state is shown in Fig. 1a. The transition probabilities can be obtained using Algorithm 1.

The UAV is required to deliver packages to three locations 'dest1', 'dest2', and 'dest3' in this particular order after departing from its 'home'. Then it has to return to 'home' and stay there forever. Also, during this delivery service, the UAV should avoid colliding with the obstacle areas marked as black in Fig. 1a to Fig. 1c. The LTL formula is written as $\phi = home \wedge \Diamond(dest1 \wedge \Diamond(dest2 \wedge \Diamond dest3)) \wedge \Diamond\Box home \wedge \Box\neg obstacle$.

We compare the control policy obtained using the proposed approach with that synthesized without considering the presence of the adversary. In Fig. 1a, we present the sample trajectories obtained using these approaches. The solid line shows a sample trajectory obtained by using the proposed approach, and the dashed line shows the trajectory obtained by using the control policy synthesized without considering the presence of the adversary. To demonstrate the resilience of the proposed approach, we label the states located at the intersections as 'home' and hence set them as the initial states. We compare the probability of satisfying the specification $\phi$ using the proposed approach and the approach without considering the adversary in Fig. 1b and Fig. 1c, respectively. We observe that the control policy synthesized using the proposed approach has a higher probability of satisfying specification $\phi$. The detailed probabilities of satisfying specification $\phi$ are listed in Table I. Denote the probability of satisfying specification $\phi$ using the proposed approach and the approach without considering the adversary as $Pr_G^{\mu\tau}$ and $Pr_G^{\tilde\mu\tilde\tau}$, respectively. By using the proposed approach, the average improvement of the probability of satisfying the given specification starting from the intersection states is $(Pr_G^{\mu\tau} - Pr_G^{\tilde\mu\tilde\tau})/Pr_G^{\tilde\mu\tilde\tau} = 90.87\%$.

The computation of the transition probabilities took 890 seconds. Given the transition probabilities, the SG and the DRA associated with specification $\phi$ are created within 1 and 0.01 second, respectively. The computation of the product SG took 80 seconds. The product SG has 2000 states and 41700 transitions. It took 45 seconds to compute the control policy.

B. Case Study II: Remotely Controlled UAV under Deception Attack with Liveness and Invariant Specification

In this case study, we focus on the same UAV model as presented in Section VII-A. Let the UAV be given an LTL specification $\phi = \Box(\Diamond(dest1 \wedge \Diamond(dest2 \wedge \Diamond dest3))) \wedge \Box\neg obstacle$ consisting of liveness and invariant constraints. In particular, the liveness constraint $\phi_1 = \Box(\Diamond(dest1 \wedge \Diamond(dest2 \wedge \Diamond dest3)))$ models a surveillance task, i.e., the UAV is required to patrol three critical regions infinitely often following a particular order, and the invariant constraint $\psi = \Box\neg obstacle$ requires the UAV to avoid collisions with obstacles. Once the critical regions are visited, a cycle is completed. During each cycle, the rate of invariant constraint violation needs to be minimized. The cost incurred at each violation is assigned to be 20.

We compare the proposed approach with the approach without considering the adversary. The sample trajectories obtained using these approaches are presented in Fig. 2a. In particular, the solid line shows a sample trajectory obtained by using the proposed approach, and the dashed line shows the trajectory obtained by using the control policy synthesized without considering the presence of the adversary. We observe that the control strategy synthesized using the approach without considering the adversary uses less control effort compared to the proposed approach. However, the proposed approach is more resilient since it uses more control effort to deviate from the obstacles and thereby minimize the violation cost. We present the average invariant constraint violation cost incurred using the control policy obtained at each iteration in Fig. 2b. We observe that the proposed approach incurs a lower cost after convergence. In Fig. 2b, the approach that does not consider the adversary incurs a lower cost compared to the proposed approach during iterations 2 to 6. The reason is that although the proposed approach guarantees convergence to a Stackelberg equilibrium, it does not guarantee optimality of the intermediate policies. The average invariant constraint violation cost using the proposed approach is 23.90, while the average invariant constraint violation cost using the approach without considering the adversary is 40.52. The improvement achieved using the proposed approach is 28.67%.

Given the transition probabilities, the SG and the DRA associated with specification $\phi$ are created within 1 and 0.01 second, respectively. The computation of the product SG took 72 seconds. The product SG has 1600 states and 26688 transitions. It took 36 seconds to compute the control policy.

VIII. CONCLUSION

In this paper, we investigated two problems on a discrete-time dynamical system in the presence of an adversary. We assumed that the adversary can initiate malicious attacks on the system by observing the control policy of the controller and choosing an intelligent strategy. First, we studied the problem of maximizing the probability of satisfying given LTL specifications. A stochastic Stackelberg game was formulated to compute a stationary control policy. A deterministic polynomial-time algorithm was proposed to solve the game. Second, we formulated the problem of minimizing the expected number of invariant constraint violations while maximizing the probability of satisfying the liveness specification. We developed a policy iteration algorithm to compute an optimal control policy by exploiting connections to the ACPS problem. The bottleneck of the proposed framework is the computational complexity of the abstraction procedure; addressing it is beyond the scope of this work. Potential ways to reduce the computational complexity include exploiting the symmetry of the environment and applying a receding horizon based control framework. In future work, we will consider non-stationary control and adversary policies.

REFERENCES

[1] M. Lahijanian, S. B. Andersson, and C. Belta, "Temporal logic motion planning and control with probabilistic satisfaction guarantees," IEEE Transactions on Robotics, vol. 28, no. 2, pp. 396–409, 2012.
[2] D. Sadigh and A. Kapoor, "Safe control under uncertainty with probabilistic signal temporal logic," in Robotics: Science and Systems, 2016.
[3] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning for dynamical systems," in the Proc. of the 48th IEEE Conf. on Decision and Control (CDC), 2009, pp. 5997–6004.
[4] S. Karaman and E. Frazzoli, "Sampling-based motion planning with deterministic µ-calculus specifications," in the Proc. of the 48th IEEE Conf. on Decision and Control (CDC), 2009, pp. 2222–2229.
[5] S. G. Loizou and K. J. Kyriakopoulos, "Automatic synthesis of multi-agent motion tasks based on LTL specifications," in the Proc. of the 43rd IEEE Conf. on Decision and Control (CDC), vol. 1, 2004, pp. 153–158.
[6] C. Baier, J.-P. Katoen, and K. G. Larsen, Principles of Model Checking. MIT Press, 2008.
[7] M. Kattenbelt and M. Huth, "Verification and refutation of probabilistic specifications via games," in IARCS Annual Conf. on Foundations of Software Technology and Theoretical Computer Science. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2009.
[8] E. M. Wolff, U. Topcu, and R. M. Murray, "Robust control of uncertain Markov decision processes with temporal logic specifications," in the Proc. of the 51st IEEE Conf. on Decision and Control (CDC), 2012, pp. 3372–3379.

[9] X. Ding, S. L. Smith, C. Belta, and D. Rus, "Optimal control of Markov decision processes with linear temporal logic constraints," IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1244–1257, 2014.
[10] K. O'Connell, "CIA Report: Cyber extortionists attacked foreign power grid, disrupting delivery," http://www.ibls.com/internet law news portal view.aspx?id=1963&s=latestnews.
[11] K. Koscher, A. Czeskis, F. Roesner, S. Patel, T. Kohno, S. Checkoway, D. McCoy, B. Kantor, D. Anderson, H. Shacham, and S. Savage, "Experimental security analysis of a modern automobile," in IEEE Symp. on Security and Privacy, 2010, pp. 447–462.
[12] A. J. Kerns, D. P. Shepard, J. A. Bhatti, and T. E. Humphreys, "Unmanned aircraft capture and control via GPS spoofing," Journal of Field Robotics, vol. 31, no. 4, pp. 617–636, 2014.
[13] P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus, "Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games," in the Proc. of the Intl. Conf. on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2008, pp. 895–902.
[14] M. Zhu and S. Martinez, "Stackelberg-game analysis of correlated attacks in cyber-physical systems," in the Proc. of American Control Conf. IEEE, 2011, pp. 4063–4068.
[15] N. Basilico, N. Gatti, and F. Amigoni, "Patrolling security games: Definition and algorithms for solving large instances with single patroller and single intruder," Artificial Intelligence, vol. 184, pp. 78–123, 2012.
[16] M. Tambe, Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011.
[17] T. Chen, V. Forejt, M. Z. Kwiatkowska, D. Parker, and A. Simaitis, "PRISM-games: A model checker for stochastic multi-player games," in the Proc. of Intl. Conf. on TACAS. Springer, 2013, pp. 185–191.
[18] J. Ding, M. Kamgarpour, S. Summers, A. Abate, J. Lygeros, and C. Tomlin, "A stochastic games framework for verification and control of discrete time stochastic hybrid systems," Automatica, vol. 49, no. 9, pp. 2665–2674, 2013.
[19] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas, "Where's Waldo? sensor-based temporal logic motion planning," in the Proc. of IEEE Intl. Conf. on Robotics and Automation, 2007, pp. 3116–3121.
[20] A. Bhatia, L. E. Kavraki, and M. Y. Vardi, "Sampling-based motion planning with temporal goals," in the Proc. of IEEE Intl. Conf. on Robotics and Automation, 2010, pp. 2689–2696.
[21] E. Plaku, L. E. Kavraki, and M. Y. Vardi, "Motion planning with dynamics by a synergistic combination of layers of planning," IEEE Transactions on Robotics, vol. 26, no. 3, pp. 469–482, 2010.
[22] J. Fu, N. Atanasov, U. Topcu, and G. J. Pappas, "Optimal temporal logic planning in probabilistic semantic maps," in the Proc. of IEEE Intl. Conf. on Robotics and Automation, 2016, pp. 3690–3697.
[23] M. Lahijanian, J. Wasniewski, S. B. Andersson, and C. Belta, "Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees," in the Proc. of IEEE Intl. Conf. on Robotics and Automation, 2010, pp. 3227–3232.
[24] M. Kloetzer and C. Belta, "A fully automated framework for control of linear systems from temporal logic specifications," IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. 287–297, 2008.
[25] J. Fu and U. Topcu, "Synthesis of shared autonomy policies with temporal logic specifications," IEEE Transactions on Automation Science and Engineering, vol. 13, no. 1, pp. 7–17, 2016.
[26] V. Raman and H. Kress-Gazit, "Analyzing unsynthesizable specifications for high-level robot behavior using LTLMoP," in Computer Aided Verification. Springer, 2011, pp. 663–668.
[27] J. Tumová, L. I. R. Castro, S. Karaman, E. Frazzoli, and D. Rus, "Minimum-violation LTL planning with conflicting specifications," in the Proc. of American Control Conf. IEEE, 2013, pp. 200–205.
[28] A. Nilim and L. El Ghaoui, "Robust control of Markov decision processes with uncertain transition matrices," Operations Research, vol. 53, no. 5, pp. 780–798, 2005.
[29] J. Fu and U. Topcu, "Synthesis of joint control and active sensing strategies under temporal logic constraints," IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3464–3476, 2016.
[30] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
[31] T. Quatmann, C. Dehnert, N. Jansen, S. Junges, and J.-P. Katoen, "Parameter synthesis for Markov models: Faster than ever," in Intl. Symp. on Automated Technology for Verification and Analysis. Springer, 2016, pp. 50–67.
[32] M. Kattenbelt, M. Kwiatkowska, G. Norman, and D. Parker, "A game-based abstraction-refinement framework for Markov decision processes," Formal Methods in System Design, vol. 36, no. 3, pp. 246–280, 2010.
[33] L. de Alfaro and T. A. Henzinger, "Concurrent omega-regular games," in the Proc. of IEEE Symp. on Logic in Computer Science, 2000, pp. 141–154.
[34] C. Y. Ma, N. S. Rao, and D. K. Yau, "A game theoretic study of attack and defense in cyber-physical systems," in the Proc. of IEEE Conf. on Computer Communications Workshops, 2011, pp. 708–713.
[35] Y. Li, L. Shi, P. Cheng, J. Chen, and D. E. Quevedo, "Jamming attacks on remote state estimation in cyber-physical systems: A game-theoretic approach," IEEE Transactions on Automatic Control, vol. 60, no. 10, pp. 2831–2836, 2015.
[36] D. Korzhyk, Z. Yin, C. Kiekintveld, V. Conitzer, and M. Tambe, "Stackelberg vs. Nash in security games: An extended investigation of interchangeability, equivalence, and uniqueness," Journal of Artificial Intelligence Research, vol. 41, no. 2, pp. 297–327, 2011.
[37] L. Niu and A. Clark, "Secure control under linear temporal logic constraints," in the Proc. of American Control Conf. IEEE, 2018.
[38] Y. Shoukry and P. Tabuada, "Event-triggered state observers for sparse sensor noise/attacks," IEEE Transactions on Automatic Control, vol. 61, no. 8, pp. 2079–2091, 2016.
[39] H. Fawzi, P. Tabuada, and S. Diggavi, "Secure estimation and control for cyber-physical systems under adversarial attacks," IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454–1467, 2014.
[40] M. H. Manshaei, Q. Zhu, T. Alpcan, T. Başar, and J.-P. Hubaux, "Game theory meets network security and privacy," ACM Computing Surveys, vol. 45, no. 3, p. 25, 2013.
[41] Q. Zhu and T. Başar, "Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: games-in-games principle for optimal cross-layer resilient control systems," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 46–65, 2015.
[42] Q. Zhu and T. Başar, "Robust and resilient control design for cyber-physical systems with an application to power systems," in the Proc. of the 50th IEEE Conf. on Decision and Control and European Control Conf., 2011, pp. 4066–4071.
[43] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 1995, vol. 1, no. 2.
[44] M. Lahijanian, S. B. Andersson, and C. Belta, "A probabilistic approach for control of a stochastic system from LTL specifications," in the Proc. of the 48th IEEE Conf. on Decision and Control/Chinese Control Conf., 2009, pp. 2236–2241.
[45] O. Cappé, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.
[46] J. v. Neumann, "Zur Theorie der Gesellschaftsspiele," Mathematische Annalen, vol. 100, no. 1, pp. 295–320, 1928.
[47] L. Hogben, Handbook of Linear Algebra. Chapman and Hall/CRC, 2013.

Luyao Niu (SM'15) received the B.Eng. degree from the School of Electro-Mechanical Engineering, Xidian University, Xi'an, China, in 2013 and the M.Sc. degree from the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute (WPI) in 2015. He has been working towards his Ph.D. degree in the Department of Electrical and Computer Engineering at Worcester Polytechnic Institute since 2016. His current research interests include optimization, game theory, and control and security of cyber physical systems.

Andrew Clark (M'15) is an Assistant Professor in the Department of Electrical and Computer Engineering at Worcester Polytechnic Institute. He received the B.S. degree in Electrical Engineering and the M.S. degree in Mathematics from the University of Michigan - Ann Arbor in 2007 and 2008, respectively. He received the Ph.D. degree from the Network Security Lab (NSL), Department of Electrical Engineering, at the University of Washington - Seattle in 2014. He is author or co-author of the IEEE/IFIP William C. Carter award-winning paper (2010), the WiOpt Best Paper (2012), and the WiOpt Student Best Paper (2014), and was a finalist for the IEEE CDC 2012 Best Student-Paper Award. He received the University of Washington Center for Information Assurance and Cybersecurity (CIAC) Distinguished Research Award (2012) and Distinguished Dissertation Award (2014). His research interests include control and security of complex networks, submodular optimization, and control-theoretic modeling of network security threats.