Privacy in Stochastic Control: A Markov Decision Process Perspective

Parv Venkitasubramaniam1

*This work was supported in part by the National Science Foundation through the grants CCF-1149495 and CNS-1117701.
1P. Venkitasubramaniam is with the Electrical and Computer Engineering Department at Lehigh University, parv.v at lehigh.edu

Abstract: Cyber physical systems, which rely on the joint functioning of information and physical systems, are vulnerable to information leakage through the actions of the controller. In particular, if an external observer has access to observations in the system exposed through cyber communication links, then critical information can be inferred about the internal states of the system, compromising the privacy of system operation. In this work, a mathematical framework based on a Markov process model is proposed to investigate the design of controller actions when a privacy requirement is imposed as part of the system objective. Quantifying privacy using information theoretic equivocation, the tradeoff between achievable privacy and system utility is studied analytically. Specifically, for a sub-class of Markov Decision Processes (MDPs), where the system output is independent of present and future states, the optimization is expressed as a solution to a Bellman equation with convex reward functions. Further, when the state evolution is a deterministic function of the states, actions and inputs, the Bellman equation is reduced to a series of recurrence relations. For the general MDP with privacy constraints, the optimization is expressed as a Partially Observable Markov Decision Process with belief dependent rewards (ρ-POMDP). Computable inner and outer bounds are provided on the achievable privacy utility tradeoff using greedy policies and rate distortion optimizations respectively.

I. INTRODUCTION

Cyber-physical systems, as the name suggests, rely on the joint functioning of information systems and physical components. These systems of the future, which include the smart electric grid, smart transportation, advanced manufacturing and next generation air traffic management systems, are envisioned to transform the way engineered systems function, far exceeding the systems of today in capability, adaptability, reliability and usability. While the success of cyber physical systems relies on the power of information exchange, their fallibility lies in the power of information leakage. Despite tremendous advances in cryptography, communication over the Internet is far from being truly confidential. The recent NSA controversy notwithstanding, there are examples aplenty where the sensitive information of legitimate users is stolen using "visible" facets of communication such as the length of transmitted packets [1], the timing of transmitted packets [2], the routes of packet flow over a network [3] and suchlike. We emphasize the qualifier "visible" to indicate that the aforementioned features of communication cannot be hidden using encryption methodologies and are, in today's wireless communication medium, easily retrievable [4].

As cyber physical systems imminently begin to replace our existing basic infrastructures, leakage of such information can have critically damaging consequences, ranging from airline delays and power blackouts to malfunctioning nuclear reactors. It is, therefore, imperative that we understand the privacy implications of information driven physical systems, and expand the science of control systems to include privacy requirements. In this article, we investigate a formal mathematical framework using a specific class of Markov modelled systems, analyze it under favourable mathematical conditions, and provide insights into the design of privacy preserving controller policies.

Markov decision processes (MDPs) are a common discrete time mathematical framework for modelling decision making in systems where the evolution of internal states and the observable outputs depend partly on the external input and partly on the actions of the internal control mechanism. In an MDP, at each time step, the process is in some state s, and upon receiving an input x, the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s′ (depending on the values of s and x), and giving the decision maker a corresponding reward or utility u(s, a, x). Often, the action a in state s could result in an output y. In many cyber physical systems, it is fair to assume that the sequence of internal states is available to the decision maker to take the necessary action at each step. It is also essential that these internal states remain confidential to any external observer. In the context of cyber physical systems, particularly those implemented by wireless networks which are vulnerable to eavesdropping, several questions arise. If the sequence of inputs and outputs is available to an external observer, how much information can he/she obtain about the internal states of the system? If the internal system states are to remain confidential from an external observer, how does this change the decision maker's choice of actions and consequently the expected reward or utility? Is there a fundamental tradeoff between the degree of privacy achievable in the system and the total expected utility? In this article, we investigate the answers to these questions within the framework of Markov Decision Processes.

A quantitative model for information privacy is essential to this investigation, and we rely on the information-theoretic equivocation (conditional entropy) for this purpose. Using equivocation as a measure of privacy, we wish to study controller policies that maximize utility subject to a desired level of privacy. Specifically, consider the system as shown in Figure 1. Let $\mathbf{X} = \{X_1, \dots, X_n\}$, $\mathbf{A} = \{A_1, \dots, A_n\}$, $\mathbf{S} = \{S_1, \dots, S_n\}$, $\mathbf{Y} = \{Y_1, \dots, Y_n\}$ denote the respective sequences of input variables, actions, internal states and output variables over n time steps of system operation.

[Fig. 1. Markov Decision Process Model of a Cyber Physical System. Diagram labels: inputs $X_1, \dots, X_n$; actions $A_1, \dots, A_n$; internal states $S_1, \dots, S_n$; outputs $Y_1, \dots, Y_n$.]

The primary objective of the controller under no privacy restrictions is to maximize the expected reward/utility

$$U_n = E\left(\sum_{i=1}^{n} u(X_i, A_i, S_i)\right).$$

Designing controller policies to maximize the net reward defined as above over finite and infinite horizons is well understood [5]. In this work, we investigate the design of controller policies when a constraint is provided on the information leaked, measured using the equivocation

$$P_n = E\left(H(S_1, \dots, S_n \mid X_1, \dots, X_n, Y_1, \dots, Y_n)\right).$$

Specifically, we model the net reward as a weighted sum of the utility ($R_n$) and the privacy $P_n$, and study the design of optimal policies.

The confluence of communication and control has in general been a tremendous source of interest in the research community since Witsenhausen's famous counterexample was published [6]. More recently, the authors in [7] used an information theoretic perspective to shed new light on the counterexample. From the perspective of this work, we find it important to discuss in slight detail the problem of control under communication constraints [8]. Control under communication constraints studies decision making in systems where observations, feedback or state variables can be measured or are to be communicated under bandwidth limitations. Consequently, there is uncertainty in the [...]

[...] adversarial observer can utilize observations in a particular time step to improve his estimate of states in previous time steps; it is impractical to assume that eavesdroppers stop updating their estimates of a state after time has elapsed. An identity thief can and will use all possible information (past, present and future) to compromise a user's privacy. It is this important difference that makes the problem of designing optimal strategies quite challenging. It is therefore conceivable that although the decision maker has access to the true internal state, the strategies should take into account the belief from the external observer's point of view, and account for the possibility that actions in one time step can reveal information about states in past time steps.

A quantitative approach to privacy utility tradeoffs has previously been explored in [9], [10] using a notion the authors refer to as competitive privacy. In particular, they consider the privacy utility tradeoff resulting when information is shared between regional transmission operators on a smart grid for the purpose of joint state estimation. Using information theoretic equivocation to quantify privacy, they demonstrate the equivalence of the privacy utility tradeoff to a lossy source coding problem. As will be seen in Section IV-B of this paper, the connection to lossy source coding in our model is obtained in the form of an upper bound, rather than a direct equivalence. An alternative measure of privacy in control is differential privacy, as in [11], where the authors address the problem of releasing filtered signals that respect the privacy of the input data stream.

II. PRIVACY PRESERVING CONTROL: AN MDP FRAMEWORK

We describe the Markov Decision Process framework with relevance to the problem in consideration. Prior to the description of the model, we briefly discuss the notation that follows. Uppercase variables $X_1$, $S_1$, etc. denote random variables, while lowercase $x_1$, $a_1$ denote values. Vectors $\mathbf{X}$, $\mathbf{S}$ are represented using boldfaced letters. The notation $X_i^j$ refers to the sequence $X_i, X_{i+1}, \dots, X_j$.
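To make the utility and equivocation objectives from the Introduction concrete, the following is a minimal sketch on a one-step toy system. It is not the model analyzed in the paper. The uniform binary state, the absence of an external input, the utility that rewards an output equal to the state, the weight `lam`, and the names `equivocation`, `evaluate`, `reveal` and `hide` are all assumptions made only for illustration.

```python
import math
from collections import defaultdict


def equivocation(joint):
    """Conditional entropy H(S | Y) in bits for a joint pmf {(s, y): prob}."""
    marginal_y = defaultdict(float)
    for (_, y), p in joint.items():
        marginal_y[y] += p
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / marginal_y[y])
    return h


def evaluate(policy, lam):
    """Return (utility, privacy, utility + lam * privacy) for a toy system.

    Assumptions (hypothetical, not from the paper): the state S is uniform
    on {0, 1}, there is no external input, the policy maps each state to a
    distribution over outputs, and the utility is 1 when the output equals
    the state and 0 otherwise.
    """
    joint = defaultdict(float)
    utility = 0.0
    for s in (0, 1):
        for y, q in policy[s].items():
            p = 0.5 * q                      # P(S = s) * P(Y = y | S = s)
            joint[(s, y)] += p
            utility += p * (1.0 if y == s else 0.0)
    privacy = equivocation(joint)
    return utility, privacy, utility + lam * privacy


# Two extreme policies: the output copies the state, or is constant.
reveal = {0: {0: 1.0}, 1: {1: 1.0}}
hide = {0: {0: 1.0}, 1: {0: 1.0}}

if __name__ == "__main__":
    for name, policy in (("reveal", reveal), ("hide", hide)):
        for lam in (0.25, 1.0):
            print(name, lam, evaluate(policy, lam))
```

In this sketch the revealing policy earns full utility with zero equivocation, while the constant-output policy gives up half the utility to keep one full bit of uncertainty about the state. Which policy has the larger weighted reward depends on the assumed weight `lam`, which is the tradeoff the weighted-sum objective is meant to capture.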
