Information Transfer in Dynamical Systems and Optimal Placement of Actuators and Sensors for Control of Non-equilibrium Dynamics

Subhrajit Sinha, Umesh Vaidya and Enoch Yeung

Abstract: In this paper we develop the concept of information transfer between the Borel-measurable sets for a dynamical system described by a measurable space and a non-singular transformation. The concept is based on how Shannon entropy is transferred between the measurable sets as the dynamical system evolves. We show that the proposed definition of information transfer satisfies the usual notions of information transfer and causality, namely zero transfer and transfer asymmetry. Furthermore, we show how the information transfer can be used to characterize ergodicity and mixing. We also develop computational methods for information transfer and apply the framework to the optimal placement of actuators and sensors for control of non-equilibrium dynamics.

arXiv:1909.13369v1 [eess.SY] 29 Sep 2019

S. Sinha is with Pacific Northwest National Laboratory, Richland, Washington. U. Vaidya is with the Department of Electrical Engineering at Iowa State University, Ames, Iowa. E. Yeung is with the Department of Mechanical Engineering, University of California, Santa Barbara, California.

I. INTRODUCTION

The mathematical study of dynamical systems has a rich history, beginning with the work of Newton. A dynamical system is usually defined on a manifold $M$ with a family of smooth functions $\phi_t : M \to M$, where $t$ is time, generally belonging to a monoid $\mathbb{T}$ ($\mathbb{R}_{\geq 0}$ or $\mathbb{Z}_{\geq 0}$). The maps $\phi_t$ preserve the monoid structure, that is, $\phi_t \circ \phi_s = \phi_{t+s}$ [1]. In another approach, a dynamical system is defined on a measure space and time evolution is modelled through a collection of measurable functions which preserve the monoid structure of $\mathbb{T}$ [2], [3]. In recent times, operator-theoretic ideas are increasingly being used for the analysis of nonlinear dynamical systems [4]–[19]. The main advantage of this approach is that these operators, namely the Perron-Frobenius (P-F) and Koopman operators, are linear even though the underlying system may be nonlinear. Moreover, the operators are positive Markov operators, and these properties can be exploited to obtain probabilistic interpretations.

On the other hand, the concept of entropy was first introduced by Clausius, and later in the 19th century Ludwig Boltzmann interpreted entropy as a measure of disorder. In particular, Boltzmann entropy is given by the famous relation

$$S = k_B \ln W$$

where $S$ is entropy, $k_B$ is the Boltzmann constant and $W$ is the number of microstates associated with a given macrostate of a system. Claude Shannon generalized this notion of entropy when he provided his information-theoretic version of entropy [20] and defined entropy for a probability density. Shannon's information theory is symmetric, and in [21] it was generalized to give a sense of direction when the author defined bi-directional information. The idea was taken forward by Massey and Kramer [22], [23], who defined directed information.

In the dynamical systems setting, Liang and Kleeman introduced the concept of information transfer [24], which captures zero transfer and is inherently asymmetric. However, Liang-Kleeman information transfer is not designed to capture indirect influence. This was addressed in [25], and in [25], [26] the authors provided a new definition of information transfer between the state(s) of a dynamical system. It was shown that the information transfer thus defined captures the intuitions of causality, namely zero transfer and transfer asymmetry, and also clearly distinguishes between direct and indirect transfers. Furthermore, the utility of the information transfer measure for the analysis of nonlinear systems was demonstrated in [27]–[29], where the information measure was used for stability classification and causality analysis of power networks. Moreover, since the theory is formulated in terms of the P-F operator, it allows data-driven computation of the information transfer measure for a dynamical system [30], [31].

In this paper, we extend the definition of information transfer to define information transfer between the measurable sets of a dynamical system. In particular, we consider a dynamical system as a quadruple $(X, \mathcal{B}, \mu, T)$, where $X$ is the phase space, $\mathcal{B}$ is the Borel σ-algebra on $X$, $\mu$ a measure and $T : X \to X$ a transformation, and define the information transfer between the sets $A_i \in \mathcal{B}$ with respect to the measure $\mu$ and the map $T$. A direct consequence of the definition of the information content of any set in the σ-algebra is the recovery of the famous Liouville's Theorem [32] for measure-preserving systems. As with the definition of information transfer between the states [25], [26], the information transfer defined here is further generalized to information transfer over multiple iterates of the map $T$, and with this we provide necessary and sufficient conditions under which the dynamical system is either ergodic or mixing. We also discuss the applicability of the developed framework and provide an information-transfer-based solution to the problem of optimal placement of actuators and sensors for control of non-equilibrium dynamics [33]–[35].

The paper is organized as follows. In sections II and III we briefly review the concepts of transfer operators and information transfer between the states in a dynamical system. In section IV we define the information transfer measure between the Borel measurable sets and provide the main theoretical results, followed by a discussion of the computation of information transfer in section V. In section VI we address the problem of optimal placement of actuators and sensors and provide an information-transfer-based solution. Finally, we conclude the paper in section VII.

II. PRELIMINARIES

Consider a dynamical system

$$x_{n+1} = T(x_n) \tag{1}$$

where $T : X \to X$ is in general assumed to be at least differentiable and non-singular, with $X \subseteq \mathbb{R}^N$. The mapping $T$ is said to be non-singular with respect to a measure $m$ if $m(T^{-1}A) = 0$ for all $A \in \mathcal{B}(X)$ such that $m(A) = 0$. Here $\mathcal{B}(X)$ denotes the Borel
σ-algebra on $X$ and let $\mathcal{M}(X)$ be the vector space of real-valued measures on $\mathcal{B}(X)$.

A. Transfer Operators

Associated with the dynamical system (1) are transfer operators, namely the Perron-Frobenius (P-F) operator and the Koopman operator, which are used to study the dynamical system. Instead of studying trajectories on the state space, these operators describe the evolution of functions under the mapping $T$. Before defining the P-F and Koopman operators, we define the more general Markov operator [2].

Definition 1 (Markov Operator): Let $(X, \mathcal{A}, \mu)$ be a measure space. Any linear operator $P : L_1 \to L_1$ satisfying
• $Pf \geq 0$ for $f \geq 0$, $f \in L_1$; and
• $\|Pf\| = \|f\|$ for $f \geq 0$, $f \in L_1$
is called a Markov operator.

The Perron-Frobenius operator is a particular type of Markov operator and is used for the evolution of densities. It is defined as follows:

Definition 2 (Perron-Frobenius Operator): Let $(X, \mathcal{A}, \mu)$ be a measure space. The Perron-Frobenius operator $\mathbb{P} : L_1 \to L_1$ corresponding to the dynamical system (1) is a Markov operator satisfying

$$\int_A \mathbb{P}f(x)\,d\mu = \int_{T^{-1}(A)} f(x)\,d\mu. \tag{2}$$

For invertible non-singular transformations $T$, we have

$$[\mathbb{P}f](x) = f(T^{-1}(x))\left|\frac{dT^{-1}(x)}{dx}\right|,$$

where $|\cdot|$ is the determinant.

Another operator of interest is the Koopman operator $\mathbb{U} : L_\infty \to L_\infty$, which is defined as follows:

Definition 3 (Koopman Operator): Given any $h \in L_\infty$, $\mathbb{U} : L_\infty \to L_\infty$ is defined by

$$[\mathbb{U}h](z) = h(T(z)).$$

Both the P-F operator and the Koopman operator are linear operators, even if the underlying system is nonlinear. But while analysis is made tractable by linearity, the trade-off is that these operators are typically infinite-dimensional. In particular, the P-F and Koopman operators often lift a dynamical system on a finite-dimensional space to an infinite-dimensional linear system.

Properties 4: The following properties of the Koopman and Perron-Frobenius operators can be stated [2].

a) For the Hilbert space $\mathcal{F} = L_2(Z, \mathcal{B}, \bar{\mu})$,

$$\|\mathbb{U}h\|^2 = \int_Z |h(T(z))|^2\,d\bar{\mu}(z) = \int_Z |h(z)|^2\,d\bar{\mu}(z) = \|h\|^2,$$

where $\bar{\mu}$ is an invariant measure. This implies that the Koopman operator is unitary.

b) For any $h \geq 0$, $[\mathbb{U}h](z) \geq 0$, and hence the Koopman operator is a positive operator.

c) For an invertible system $T$, the P-F operator for the inverse system $T^{-1} : Z \to Z$ is given by $\mathbb{P}^*$, and $\mathbb{P}^*\mathbb{P} = \mathbb{P}\mathbb{P}^* = I$. Hence, the P-F operator is unitary.

d) If the P-F operator is defined to act on the space of densities, i.e., $L_1(Z)$, and the Koopman operator on the space of $L_\infty(Z)$ functions, then it can be shown that the P-F and Koopman operators are dual to each other$^1$:

$$\langle \mathbb{U}f, g\rangle = \int_Z [\mathbb{U}f](z)\,g(z)\,dz = \int_Z f(y)\,g(T^{-1}(y))\left|\frac{dT^{-1}}{dy}\right|dy = \langle f, \mathbb{P}g\rangle,$$

where $f \in L_\infty(Z)$ and $g \in L_1(Z)$, and the P-F operator on the space of densities $L_1(Z)$ is defined as

$$[\mathbb{P}g](z) = g(T^{-1}(z))\left|\frac{dT^{-1}(z)}{dz}\right|.$$

e) For $g(z) \geq 0$, $[\mathbb{P}g](z) \geq 0$.

f) Let $(Z, \mathcal{B}, \mu)$ be the measure space, where $\mu$ is a positive but not necessarily invariant measure of $T : Z \to Z$. Then the P-F operator $\mathbb{P} : L_1(Z, \mathcal{B}, \mu) \to L_1(Z, \mathcal{B}, \mu)$ satisfies the following property:

$$\int_Z [\mathbb{P}g](z)\,d\mu(z) = \int_Z g(z)\,d\mu(z).$$

$^1$ With some abuse of notation, we use the same notation for the P-F operator defined on the space of measures and on the space of densities.

III. INFORMATION TRANSFER BETWEEN STATES OF A DYNAMICAL SYSTEM

Information flow between the states of a dynamical system is a newly developed concept [25], [26] which studies how the evolution of a state (or states) of a dynamical system affects any other state (or states). For completeness, we briefly discuss the concept of information transfer between the states of a dynamical system.

Consider the discrete-time dynamical system

$$z(t+1) = T(z(t)) + \xi(t) \tag{3}$$

where $z = [x^\top\ y^\top]^\top \in \mathbb{R}^N$, $T : \mathbb{R}^N \to \mathbb{R}^N$ is assumed to be at least continuous, and $\xi(t)$ is independent and identically distributed additive noise drawn from the distribution $g$.

Information transfer from state (subspace) $x$ to state (subspace) $y$ gives a measure of how the evolution of the $x$ dynamics affects (influences) the evolution of the $y$ dynamics. In particular, we quantify this influence in terms of the entropy transferred from the $x$ dynamics to the $y$ dynamics as the system (3) evolves in time. Note that by entropy we mean the Shannon entropy.

The entropy of a distribution is the measure of the information content of the distribution. Suppose there are two agents (states, in the case of a dynamical system) which interact with each other. Each has its own entropy (information), and each transfers a part of its own information to the other agent via the interaction. We use this intuition to define the information transfer. In particular, the information transfer from $x$ to $y$ is the amount of information (entropy) of $x$ that is transferred to $y$ as the system (3) evolves in time. With this, we define the information transfer as follows.

Definition 5 (Information Transfer): The information transfer from a state $x$ to a state $y$ ($T_{x \to y}$), as the dynamical system $z(t+1) = T(z(t)) + \xi(t)$ evolves from time step $t$ to time step $t+1$, is defined as

$$T_{x \to y} = H(y(t+1) \mid y(t)) - H_{\not{x}}(y(t+1) \mid y(t)) \tag{4}$$

where $z = [x^\top\ y^\top]^\top$, $H(y(t+1) \mid y(t))$ is the entropy of $y$ at time $t+1$ conditioned on $y(t)$, and $H_{\not{x}}(y(t+1) \mid y(t))$ is the conditional entropy of $y(t+1)$, conditioned on $y(t)$, when $x$ is absent from the dynamics.

The intuition behind the definition of information transfer is that the total entropy of $y$ is the entropy of $y$ when $x$ is absent from the dynamics plus the entropy transferred from $x$ to $y$. Hence, the information transfer from $x$ to $y$ gives the amount of entropy flowing from $x$ to $y$ and thus quantifies how much the $x$ dynamics affects the $y$ dynamics. With this, we define influence in a dynamical system as follows.

Definition 6 (Influence): We say a state (or subspace) $x$ influences a state (or subspace) $y$ if and only if the information transfer from $x$ to $y$ is non-zero.

The larger the absolute value of $T_{x \to y}$, the greater the effect of the $x$ dynamics on $y$, and hence the greater the influence of $x$ on $y$.

The information transfer defined in (4) gives the entropy transferred as the system (3) evolves by one time step. This determines whether the state (or subspace) $x$ influences the state (or subspace) $y$ directly. The definition of information transfer can be generalized to information transfer over multiple time steps [25], and it has been shown to capture indirect influence in a dynamical system [25].

IV. INFORMATION TRANSFER BETWEEN SETS IN THE PHASE SPACE

In this section, we develop the notion of information transfer between the sets of a dynamical system.

A. Shannon Entropy and Dynamical Systems

Let $(X, \mathcal{B}, \mu)$ be a measure space such that $\mu(X) < \infty$. Hence, by normalizing with $\mu(X)$, one can define a probability measure on $(X, \mathcal{B})$ and treat $(X, \mathcal{B}, \mu)$ as a probability space. Let $T : X \to X$ be a transformation on $X$ and $\rho$ be a probability density on $X$, such that $\int_X \rho\,d\mu = 1$. Then the Shannon entropy of the density $\rho$ is given by

$$H(\rho) = -\int_X \rho \log \rho\,d\mu.$$

The evolution of $\rho$, as the dynamical system $(X, \mathcal{B}, \mu, T)$ evolves under the transformation $T$, is given by the Perron-Frobenius operator, which is defined as follows.

Definition 7 (Perron-Frobenius Operator): [2] The P-F operator $\mathbb{P} : \mathcal{M}(X) \to \mathcal{M}(X)$ is given by

$$[\mathbb{P}\mu](A) = \int_X \delta_{T(x)}(A)\,d\mu(x) = \mu(T^{-1}(A)) \tag{5}$$

where $\delta_{T(x)}(A)$ is the stochastic transition function, which measures the probability that the point $x$ will reach the set $A$ in one time step under the system mapping $T$, and $\mathcal{M}(X)$ is the space of signed measures on $X$.

With this we have

Theorem 8: Let $(X, \mathcal{B}, \mu)$ be a finite measure space, such that $\mu(X) < \infty$, and let $P : L_1 \to L_1$ be a Markov operator. If $P$ has a constant stationary density ($P1 = 1$), then

$$H(P\rho) \geq H(\rho).$$

Proof: For any $f \in L_1$ such that $f \geq 0$, define

$$\eta(f) = -f \log f.$$

Then $\eta$ is a concave function, and hence from Jensen's inequality we have

$$\eta(P\rho) \geq P\eta(\rho).$$

Integrating over the entire space $X$, we have

$$\int_X \eta(P\rho(x))\,d\mu \geq \int_X P\eta(\rho(x))\,d\mu = \int_X \eta(\rho(x))\,d\mu \tag{6}$$

since $P$ preserves the integral. Hence, we have

$$H(P\rho) \geq H(\rho).$$

Note that, for a finite measure space, when $P$ has a constant stationary density, the above theorem tells us that the entropy never decreases. This statement is similar to the second law of thermodynamics.

The above theorem states that under the transformation $T$, when the Markov operator has a constant stationary density, entropy never decreases. However, the entropy may not increase at all during iterations of $T$. This is the case when the Markov operator $P$ is the Perron-Frobenius operator $\mathbb{P}$ and $T$ is an invertible measure-preserving transformation.

Definition 9 (Measure Preserving Transformation): Let $(X, \mathcal{B}, \mu)$ be a measure space and $T : X \to X$ a measurable transformation. Then $T$ is said to be measure-preserving if

$$\mu(T^{-1}(A)) = \mu(A), \quad \forall A \in \mathcal{B}.$$

Theorem 10: Let $(X, \mathcal{B}, \mu)$ be a finite measure space and $T : X \to X$ be an invertible measure-preserving transformation. If $\mathbb{P}$ is the Perron-Frobenius operator corresponding to $T$, then $H(\mathbb{P}^n f) = H(f)$ for all $n$, where $f \in L_1$ and $f \geq 0$.

Proof: For $f \in L_1$, we have

$$[\mathbb{P}f](x) = f(T^{-1}(x))\left|\frac{dT^{-1}(x)}{dx}\right|.$$

Since $T$ is invertible and measure-preserving, $\left|\frac{dT^{-1}(x)}{dx}\right| = 1$. Hence,

$$[\mathbb{P}f](x) = f(T^{-1}(x)).$$

Let $\mathbb{P}_1$ be the P-F operator corresponding to $T^{-1}$. Then by similar arguments, $[\mathbb{P}_1 f](x) = f(T(x))$. Hence, $\mathbb{P}_1\mathbb{P}f = \mathbb{P}\mathbb{P}_1 f = f$. Thus,

$$\mathbb{P}_1 = \mathbb{P}^{-1}.$$

Hence, from Theorem 8,

$$H(\mathbb{P}_1\mathbb{P}f) \geq H(\mathbb{P}f) \geq H(f).$$

However, since $\mathbb{P}_1\mathbb{P}f = f$, we have $H(\mathbb{P}f) = H(f)$. Hence,

$$H(\mathbb{P}^n f) = H(f), \quad \forall n.$$

Hence, for invertible measure-preserving dynamical systems, the entropy remains constant. Hamiltonian systems are an example. However, for Hamiltonian systems, one has the celebrated Liouville's Theorem [32], which proves that the volume of any set in the phase space is conserved. In order to state the theorem in terms of entropy, it is not sufficient to consider the entropy over the entire phase space $X$; one has to define the entropy of any measurable set $A \in \mathcal{B}$.

This assignment of Shannon entropy to any measurable subset of the phase space $X$ forms the basis of the definition of information flow between the sets in the phase space of a dynamical system, and this is the main focus of the next subsection.

B. Entropy and Information Flow

In this subsection we define the information transfer between the sets in the phase space of a dynamical system. Let $(X, \mathcal{B}, \mu)$ be a probability space. Then the Shannon entropy of any measurable set can be defined as follows:

Definition 11: The Shannon entropy of any set $A \in \mathcal{B}$ with respect to the measure $\mu$ is defined as

$$H_\mu(A) = -\mu(A)\log\mu(A).$$

If $\mu(A)$ is zero, then $H_\mu(A) := 0$.

Lemma 12: Let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ be an invertible measure-preserving transformation. Let $A \in \mathcal{B}$. Then the Shannon entropy of $A$ remains constant under the action of the transformation $T$.

Proof: Since $T$ is an invertible measure-preserving transformation on $X$, we have

$$\mu(T^n(A)) = \mu(A), \quad \forall n \in \mathbb{Z}.$$

Hence, for $n \in \mathbb{Z}$,

$$H_\mu(T^n(A)) = -\mu(T^n(A))\log\mu(T^n(A)) = -\mu(A)\log\mu(A) = H_\mu(A).$$

Hence the entropy of any measurable set remains constant under the transformation $T$.

The above lemma gives a conservation law for the Shannon entropy, and in the framework of Hamiltonian systems it is Liouville's Theorem [32], which states that phase space volumes are conserved under a Hamiltonian flow.

With the Shannon entropy defined for any measurable set $A \in \mathcal{B}$ with respect to a measure $\mu$, we are now in a position to define the information flow between any two measurable sets $A_i, A_j \in \mathcal{B}$.

Fig. 1. Schematic showing intuition of information transfer.

Definition 13 (Information Transfer): Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a non-singular transformation. Let $A_i, A_j \in \mathcal{B}$ such that $\mu(A_i) > 0$. Define $\mu_{ij}$ as

$$\mu_{ij} = \frac{\mu(T^{-1}(A_j) \cap A_i)}{\mu(A_i)}. \tag{7}$$

Then the information transferred from set $A_i$ to $A_j$, with respect to the measure $\mu$, under one iterate of the transformation $T$ is

$$T^\mu_{i \to j} = -\mu_{ij}\log\mu_{ij}. \tag{8}$$

If $\mu(A_i) = 0$, then $T^\mu_{i \to j} := 0$.

For invertible measure-preserving transformations $T$, the quantity $\mu_{ij}$ can be equivalently defined as

$$\mu_{ij} = \frac{\mu(T(A_i) \cap A_j)}{\mu(A_i)}.$$

Intuitively, the quantity $\mu_{ij}$ quantifies the fraction of the volume of $A_i$ that ends up in the set $A_j$ under one iterate of the transformation $T$ (see Fig. 1), and the information transfer of Definition 13 gives the information flow from $A_i$ to $A_j$ under one iterate of the transformation $T$. Hence this can be thought of as one-step information transfer.

Proposition 14: Let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ a non-singular transformation. Then the information transfer defined in (8) satisfies the following:
1) $T^\mu_{i \to j} \geq 0$ for $A_i, A_j \in \mathcal{B}$ such that $\mu(A_i) > 0$ and $\mu(A_j) > 0$. Moreover, if $\mu(A_j)$ is zero, then the information transfer $T^\mu_{i \to j}$ is zero.
2) Information transfer, in general, is asymmetric.

Proof: 1) Consider $A_i, A_j \in \mathcal{B}$ such that $\mu(A_i) > 0$ and $\mu(A_j) > 0$. Then the non-negativity of the information transfer follows directly from the definition of Shannon entropy, and $T^\mu_{i \to j} = 0$ if and only if $\mu(T^{-1}(A_j) \cap A_i) = 0$. Moreover, let $\mu(A_j) = 0$. Since $T$ is non-singular, $\mu(T^{-1}(A_j)) = 0$. And since $0 \leq \mu(A) \leq 1$ for $A \in \mathcal{B}$ and $\mu(T^{-1}(A_j) \cap A_i) \leq \mu(T^{-1}(A_j))$, we have

$$\mu(T^{-1}(A_j) \cap A_i) = 0$$

and hence $T^\mu_{i \to j} = 0$.

2) In a system, it may happen that $T^{-1}(A_j) \cap A_i$ is a set of measure zero, but $T^{-1}(A_i) \cap A_j = A_k$ with $\mu(A_k) > 0$. In this case $T^\mu_{i \to j} = 0$ but $T^\mu_{j \to i} \geq 0$. Hence, information transfer in general is not symmetric.

Unlike mutual information between two measurable sets, which is symmetric and quantifies the information shared by two sets, it is the asymmetry of information transfer which allows us to characterize the influence of one measurable set on any other measurable set. In [25], the one-step information transfer between the states in a dynamical system was generalized to define information transfer over multiple time steps. In this set-theoretic setting, this amounts to defining information transfer from $A_i$ to $A_j$ under the transformation $T^n$, for $n \in \mathbb{N}$.

Definition 15 (n-step Information Transfer): Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a non-singular transformation. Let $A_i, A_j \in \mathcal{B}$ with $\mu(A_i) > 0$. Define $\mu^n_{ij}$ as

$$\mu^n_{ij} = \frac{\mu(T^{-n}(A_j) \cap A_i)}{\mu(A_i)}.$$

Then the information transferred from set $A_i$ to $A_j$, with respect to the measure $\mu$, under $n$ iterates of the transformation $T$ is

$$[T^\mu_{i \to j}]^n = -\mu^n_{ij}\log\mu^n_{ij}. \tag{9}$$

Hence, the total information transferred from $A_i$ to $A_j$ as the dynamical system $(X, \mathcal{B}, \mu, T)$ evolves $n$ times is

$$[T^\mu_{ij}]^n_1 = \sum_{k=1}^{n} [T^\mu_{i \to j}]^k. \tag{10}$$

C. Ergodicity, Mixing and Information Flow

The term "ergodic" was first used by Ludwig Boltzmann in relation to problems in statistical mechanics.

Definition 16 (Ergodicity [2]): Let $(X, \mathcal{B}, \mu)$ be a measure space and $T : X \to X$ a non-singular transformation. Then $T$ is called ergodic if every invariant set $A \in \mathcal{B}$ is such that $\mu(A) = 0$ or $\mu(X \setminus A) = 0$; that is, $T$ is ergodic if all invariant sets are trivial subsets of $X$.

The above is one of the many equivalent definitions of ergodicity, and depending on the underlying system and the problem at hand, a particular equivalent formulation of ergodicity is used. The equivalent definition of ergodicity relevant to the current study is the following:

Definition 17 (Ergodicity): Let $(X, \mathcal{B}, \mu)$ be a measure space and let $T : X \to X$ be a non-singular transformation.
Then $T$ is ergodic if for any two measurable sets $A, B \in \mathcal{B}$ with $\mu(A) > 0$ and $\mu(B) > 0$, there is an $n \in \mathbb{Z}$ such that $\mu(T^{-n}(A) \cap B) > 0$.

Theorem 18: Let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ be a non-singular measure-preserving transformation. Then $T$ is ergodic if and only if for any two sets $A, B \in \mathcal{B}$ with $\mu(A) > 0$ and $\mu(B) > 0$, the total information transferred (10) from $A$ to $B$, $[T^\mu_{AB}]^n_1$, is non-zero.

Proof: First assume that $T$ is ergodic. Hence, for any two sets $A, B \in \mathcal{B}$ with $\mu(A) > 0$ and $\mu(B) > 0$, there exists $n \in \mathbb{Z}$ such that

$$\mu(T^{-n}(A) \cap B) > 0.$$

Hence, for that $n$,

$$\mu^n_{AB} = \frac{\mu(T^{-n}(A) \cap B)}{\mu(A)} > 0.$$

Hence, $[T^\mu_{A \to B}]^n$ is non-zero, and since the information transfer is always non-negative, the total information flow from $A$ to $B$, given by

$$[T^\mu_{AB}]^n_1 = \sum_{k=1}^{n} [T^\mu_{A \to B}]^k,$$

is non-zero.

On the other hand, assume that for any two sets $A, B \in \mathcal{B}$ with $\mu(A) > 0$ and $\mu(B) > 0$, the total information transfer $[T^\mu_{AB}]^n_1 = \sum_{k=1}^{n} [T^\mu_{A \to B}]^k$ is non-zero. Since each term $[T^\mu_{A \to B}]^k$ is non-negative, there exists at least one $n$ such that $[T^\mu_{A \to B}]^n > 0$. For that $n$, $\mu^n_{AB} > 0$. Hence,

$$\mu^n_{AB} = \frac{\mu(T^{-n}(A) \cap B)}{\mu(A)} > 0 \implies \mu(T^{-n}(A) \cap B) > 0. \tag{11}$$

Hence $T$ is ergodic.

Although a plethora of ergodic systems can be constructed or can be abstractly shown to exist, it is extremely difficult to verify ergodicity for naturally occurring systems. Hence, in many cases, ergodicity is proved by showing that the system satisfies a "stronger" condition called mixing.

Definition 19 (Mixing [2]): Let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ a measure-preserving transformation. $T$ is called mixing if

$$\lim_{n \to \infty} \mu(T^{-n}(A) \cap B) = \mu(A)\mu(B), \quad \forall A, B \in \mathcal{B}.$$

As a direct consequence of the definition of mixing, we have the following lemma, which characterizes mixing using information transfer.

Lemma 20: Let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ a non-singular transformation. Let $A, B \in \mathcal{B}$ such that $\mu(A) > 0$ and $\mu(B) > 0$. Then $T$ is mixing if and only if the $n$-step information transfer from $A$ to $B$, as $n \to \infty$, is equal to the entropy of $B$, that is, $\lim_{n \to \infty} [T^\mu_{A \to B}]^n = -\mu(B)\log\mu(B) = H_\mu(B)$.

Proof: The proof is straightforward and follows from the fact that

$$\mu^n_{AB} = \frac{\mu(T^{-n}(A) \cap B)}{\mu(A)}$$

and for $T$ to be mixing one should have

$$\lim_{n \to \infty} \frac{\mu(T^{-n}(A) \cap B)}{\mu(A)} = \mu(B), \quad \forall A, B \in \mathcal{B}.$$

As stated earlier, mixing is a stronger condition than ergodicity, and mixing implies ergodicity. This fact can be understood intuitively using the concept of information transfer as follows:

For ergodicity, the necessary and sufficient condition is that there should be a non-zero information flow from any set of positive measure to any other set of positive measure. It does not impose any constraint on the quantity of information transferred. However, for the transformation to be mixing, the information transfer should not only be non-zero but should also be equal to a fixed entropy, determined by the measure of the set to which information is flowing. Thus mixing is a stronger condition, and mixing implies ergodicity. Hence, information transfer can be used to distinguish ergodic transformations from mixing ones.

V. FINITE-DIMENSIONAL APPROXIMATION

In this section, we develop the computational framework for computing the information transfer between sets in the phase space of a dynamical system. The computation is based on set-oriented methods for constructing a finite-dimensional approximation of the Perron-Frobenius operator associated with a dynamical system $(X, \mathcal{B}, \mu, T)$ [36]. To construct the finite-dimensional approximation of the P-F operator, consider a finite partition $\mathcal{X}$ of $X$,

$$\mathcal{X} = \{D_1, D_2, \cdots, D_N\}, \tag{12}$$

such that $\cup_i D_i = X$ and $D_i \cap D_j = \emptyset$ for $i \neq j$. Then the finite-dimensional P-F operator on the partition $\mathcal{X}$ is an $N \times N$ matrix $\mathbf{P}$ such that

$$[\mathbf{P}]_{ij} = \frac{m(T^{-1}(D_j) \cap D_i)}{m(D_i)}, \tag{13}$$

where $[\mathbf{P}]_{ij}$ is the $ij$-th entry of the finite-dimensional matrix $\mathbf{P}$ and $m$ is the Lebesgue measure. Computationally, several short-term trajectories are used to compute the matrix $\mathbf{P}$. In particular, the map $T$ is used to propagate $L$ "initial conditions", which are chosen to be uniformly distributed over each cell $D_i$, and the entry $[\mathbf{P}]_{ij}$ is approximated by the fraction of the initial conditions in $D_i$ that end up in cell $D_j$ after one iterate of the map $T$.

With the finite-dimensional P-F operator, the computation of information transfer between any two sets $D_i, D_j \in \mathcal{X}$ is straightforward. In particular, from (7) and (13) we have $\mu_{ij} = [\mathbf{P}]_{ij}$. Hence the information transfer from cell $D_i$ to cell $D_j$, with respect to the partition $\mathcal{X}$, is

$$[T_{D_i \to D_j}] = -[\mathbf{P}]_{ij}\log[\mathbf{P}]_{ij}. \tag{14}$$

The expression in (14) is the one-step information transfer, and higher-order transfers can be obtained similarly by considering powers of the finite-dimensional matrix $\mathbf{P}$. In particular, the information transferred from cell $D_i$ to cell $D_j$ under $n$ iterates of the map $T$ is

$$[T_{D_i \to D_j}]^n = -[\mathbf{P}^n]_{ij}\log[\mathbf{P}^n]_{ij} \tag{15}$$

where

$$[\mathbf{P}^n]_{ij} = \frac{m(T^{-n}(D_j) \cap D_i)}{m(D_i)} \tag{16}$$

is the $ij$-th entry of the matrix $\mathbf{P}^n$.

VI. OPTIMAL PLACEMENT OF ACTUATORS AND SENSORS

A. Problem Formulation

Motivated by the problem of control of an oil spill in a fluid flow, we consider the problem of optimal placement of static actuators and sensors. The spatial distribution of the oil spill is modelled as a scalar density function $\rho(x, t) \in L_2(X)$ which is advected by the differential equation

$$\dot{x} = f(x)$$

where $x \in X \subset \mathbb{R}^N$ and $X$ is assumed compact. The evolution of the density $\rho(x, t)$ is given by the linear advection PDE

$$\frac{\partial \rho(x, t)}{\partial t} = -\nabla \cdot (f(x)\rho(x, t)). \tag{17}$$

Let $A_k \subset X$, $k = 1, \cdots, p$, and $S_l \subset X$, $l = 1, \cdots, q$, be the locations of the actuators and sensors.

Assumption 21: We assume $0 < m(A_k) \ll m(X)$ and $0 < m(S_l) \ll m(X)$, where $m$ is the Lebesgue measure.

The controlled evolution of the linear advection PDE (17) with spatially located actuators and sensors can be described as follows:

$$\frac{\partial \rho(x, t)}{\partial t} = -\nabla \cdot (f(x)\rho(x, t)) + \sum_{k=1}^{p} \chi_{A_k}(x)\,u_k(x, t), \qquad y_\ell(x, t) = \chi_{S_\ell}(x)\rho(x, t), \tag{18}$$

for $k = 1, \ldots, p$ and $\ell = 1, \ldots, q$, where $\chi_{A_k}(x)$ and $\chi_{S_\ell}(x)$ are the indicator functions of the sets $A_k$ and $S_\ell$ respectively, $u_k(x, t)$ is the control input for the $k$-th actuator and $y_\ell$ is the output of the $\ell$-th sensor.

The objective is to provide an approach for choosing optimal locations of $A_k$ and $S_l$ such that certain controllability and observability conditions are satisfied for the controlled PDE (18).

B. Controllability and Observability of Controlled PDE

In this subsection, we briefly state the main results about controllability and observability of the controlled PDE (18). For details see [33]–[35].

In [34], [35], the authors provided results connecting the flow of the underlying vector field and infinite-time controllability and observability of the controlled PDE (18). The characterization is in terms of the P-F and the Koopman operators, and hence for implementation of the results we consider the finite-dimensional approximation of these operators on a partition of the space $X$.

Remark 22: For computation of the finite-dimensional P-F operator, the continuous-time system $\dot{x} = f(x)$ is discretized using Euler discretization.

The finite-dimensional approximation of the P-F operator leads to a coarser notion of controllability, which we refer to as coarse controllability. The definition of coarse controllability closely follows that of coarse stability as introduced in [12].

Definition 23 (Coarse Controllability [35]): Consider a space $X$ with a finite partition $\mathcal{X}$. $X$ is said to be coarsely controllable with respect to the partition $\mathcal{X}$ if, for an uncontrollable subset $A \subset U \subset X$, there exists no subpartition $\mathcal{S} = \{D_{s_1}, D_{s_2}, \cdots, D_{s_l}\}$ in $\mathcal{X}$ with domain $S = \cup_{k=1}^{l} D_{s_k}$ such that $A \subset S$ and $S$ is an uncontrollable subset of $X$.

Definition 24 (Coarse Controllability from $D_k$ [35]): Let $\mathcal{X}$ be the partition of the space and $D_k \in \mathcal{X}$. The system is said to be coarse controllable from cell $D_k$ if all the cells of the partition $\mathcal{X}$ are reachable from $D_k$.

For typical partitions, coarse controllability means controllability modulo the uncontrollable sets $A$, with the uncontrollable sets smaller than the size of the cells within the partition. When the cell size (measure) goes to zero, one recovers the uncontrollable sets fully. Without loss of generality, we make the following assumption on the location of the actuators and sensors.

Assumption 25: [...] where $N_a \subseteq \{1, \ldots, N\}$ is the admissible set, consisting of the admissible locations where the actuators can be placed.

The assumption essentially implies that the actuator occupies an entire cell of the partition $\mathcal{X}$. A similar assumption can also be made on the location of the sensor.

With this, we have the following characterization of the controllable region in terms of the finite-dimensional P-F operator $\mathbf{P}$.

Theorem 26: Let $\mathbf{P}$ be the finite-dimensional approximation of the P-F semigroup constructed on the finite partition $\mathcal{X} = \{D_1, D_2, \cdots, D_N\}$ of $X$. Let $A_i = D_{k_i}$, $i = 1, 2, \cdots, p$, correspond to the locations of the actuator cells. The following are true.
1) If the entire space $X$ is controllable with actuators located on cells $D_{k_i}$ for $i = 1, \ldots, p$, i.e.,

$$\mu_c(B) = \sum_{n=0}^{\infty} \alpha^n \left[\mathbb{P}^n \sum_{i=1}^{p} m_{D_{k_i}}\right](B) \neq 0$$

for every set $B \subset X$ with $m(B) > 0$, then the vector $\bar{\mu} = (\bar{\mu}_1, \ldots, \bar{\mu}_N) \in \mathbb{R}^N$ is nonzero, i.e., $\bar{\mu}_k \neq 0$ for $k = 1, \ldots, N$, where

$$\bar{\mu} = \sum_{n=0}^{\infty} \alpha^n \sum_{i=1}^{p} e_{k_i}^\top \mathbf{P}^n \tag{19}$$

and $e_{k_i} \in \mathbb{R}^N$ is a column vector consisting of all zeros with a 1 at the $k_i$-th location.
2) If $\bar{\mu}$ as defined in Eq. (19) is nonzero (element-wise), then the system is coarse controllable.

Proof: See [35].

Essentially, this theorem proves that if the actuators are located in the cells $D_{k_i}$, then under the flow all the other cells are reachable from these actuator cells. Similar results exist for sensor placement [35]: if the sensors are located in cells $D_i$, then these cells are reachable from all the other cells. For results on coarse observability, see [35].

C. Information Transfer and Optimal Placement

Let $\mathcal{X} = \{D_1, \cdots, D_N\}$ be a finite partition of the space $X$. Then from (15), it can be seen that the cell $D_j$ is reachable from cell $D_i$ if and only if there exists an $n$ such that the $n$-step information transfer is non-zero. Note that if the $n$-step information transfer from $D_i$ to $D_j$ is zero for every $n \leq N-1$, then it remains zero for all $n > N-1$. This follows from the fact that the finite partition can be thought of as a graph of $N$ nodes, and if one node ($D_j$) is not reachable from some other node ($D_i$) in at most $N-1$ steps, then $D_j$ is not reachable from $D_i$ at all. See [35] for a detailed discussion.

Hence, the problem of optimal placement of actuators reduces to finding the minimum number of cells in the partition such that there is non-zero information transfer from these cells to all the other cells. The optimization problem can be formulated in many different ways, and we discuss the most obvious formulation. For other formulations see [34], [35]. Let $e \in \mathbb{R}^{1 \times N}$ be a vector of length $N$ with entries zero and one. Then one can write the optimal placement problem as

$$\min \|e\|_0 \quad \text{subject to} \quad eT > 0, \quad e_i \in \{0, 1\} \tag{20}$$

where $T$ is the information transfer matrix such that $[T]_{ij}$ is the total information transfer from cell $D_i$ to cell $D_j$ in $N-1$ time

Ai = Dki , i = 1, . . . , p where ki ∈ Na. steps. However, the optimization problem (20) is non-convex. A relaxed version of (20) is a convex problem and the optimization 1 1 4 problem can be stated as 0.9 0.9 3 0.8 0.8 2 min ||e||1 0.7 0.7 1 0.6 0.6 0 subject to eT > 0 0.5 0.5 -1 (21) X 0.4 0.4 -2 ei ∈ [0, 1], ei = p, 0.3 0.3 -3 i 0.2 0.2 -4 0.1 0.1 -5 where p is the number of actuators. Note that the relaxed problem 0 0 -6 0 0.5 1 1.5 2 0 0.5 1 1.5 2 provides a sub-optimal solution and the optimal solution e?, in general, will not be binary and one can provide a sub-optimal (a) (b) solution to the actuator placement problem by considering the ? Fig. 3. (a) Position of the 20 actuators. (b) Total information transfer from largest p values of e and place the actuators in those cells. the actuator cells to all the cells in the state space in log scale. The relaxed optimization places the given number of actuators in those cells such that the non-zero information transfer region is maximized, thus maximizing the coarse controllable region. shown in Fig. 4. The dimensions of the room are as follows: 0 ≤ For the sensor placement problem the optimization problem, the x ≤ 1.52m and 0 ≤ y ≤ 1.68m. The order of magnitude for objective is to choose the minimum number of cells such that the velocity field is O(1). The Reynolds number of the flow is information transfer from each of the other cells to these cells is Re = 76725 and the Prandtl number P r = 0.729. non-zero, thus achieving coarse observability. Hence the relaxed optimization problem is

min ||e||1 subject to T >e> > 0 X ei ∈ [0, 1], ei = q i where q is the number of sensors. D. Simulation Results 1. Control of contaminants in Double Gyre flow field Fig. 4. Fluid flow vector field in building. We consider the double gyre fluid flow field given by

x˙ = −π sin(πx) cos(πy), y˙ = π sin(πy) cos(πx) For the finite-dimensional approximation of the P-F operator we where (x, y) ∈ X = [0, 2] × [0, 1] (Fig. 2). The Double Gyre flow divide the state space into 60 × 60 divisions and choose 6 sensors field is often used as a model for oceanographic flow field [37] and for maximizing the coarse observable region. the interest in this example comes from the problem of control of contaminants in oceanographic flows [38], [39]. 1 1 3 0.9 2 0.8 0.8 1 0.7 0 0.6 0.6

0.5 -1

0.4 0.4 -2

0.3 -3

0.2 0.2 -4

0.1 -5

0 0 -6 0 0.5 1 1.5 2 0 0.5 1 1.5 2

(a) (b)

Fig. 5. (a) Position of the 6 sensors. (b) Total information transfer to the Fig. 2. Double gyre velocity field sensor cells from all the cells in the state space in log scale.

For the finite-dimensional approximation we consider a 60 × 60 In Fig. 5(a) the position of the 6 sensors are shown (black boxes). partition of the state space and we choose a total of 20 actuators. The corresponding coarse observable region is shown in Fig. 5(b) The position of the actuators, obtained using the optimization where the information transfer to the sensor cells from all the problem (21) is shown in Fig. 3(a) and the information transfer other cells are plotted in log scale. It is observed that most of to all the cells from these 20 actuator cells are shown in Fig. 3(b). the state space is observable with the 6 sensors, with the separatrix It should be noted that the information transfer is not non-zero to not being observed. This is because the separatrix divides the state all the cells from these 20 cells and thus to achieve full coarse space into invariant regions and the 6 sensors are placed in each of controllability, one would require more actuators. these invariant subspaces. Hence these sensors can observe only the 2. Sensor placement in building systems corresponding invariant subspace and cannot observe the separatrix The vector field used in this example is the average velocity field and hence very little information flows from the separatirx to the obtained from a detailed finite element-based simulation of Navier sensors. This is indicated by the blue region in Fig. 5(b). Ideally Stokes equation. For the purpose of simulation, we only employ there should not be any information flow from the separatrix to a two dimensional slice of the three dimensional velocity field as the sensors, however in the simulations to to finite partitions, there is very small non-zero information flow from the separatrix to the [19] E. Yeung, S. Kundu, and N. Hodas, “Learning deep neural network sensors. representations for koopman operators of nonlinear dynamical sys- tems,” arXiv preprint arXiv:1708.06850, 2017. VII.CONCLUSIONS [20] C. E. 
Shannon, “A mathematical theory of communication,” Bell In this paper we extend the concept of information transfer between system technical journal, vol. 27, no. 3, pp. 379–423, 1948. [21] H. Marko, “The bidirectional communication theory-a generalization the states in a dynamical system to define information transfer of information theory,” IEEE Transactions on communications, vol. 21, between the measureable sets of a dynamical system and we show no. 12, pp. 1345–1351, 1973. the defined information transfer measure satisfy the intuitions of [22] G. Kramer, “Directed information for channels with feedback,” in PhD information flow, namely, zero transfer and transfer asymmetry. Thesis, Swiss Federal Institute of Technology Zurich, 1998. [23] J. L. Massey, “Causality, feedback and directed information.” in Proc. We further show how information transfer is connected with the Intl. Symp. on Info. th. and its Applications, Waikiki, Hawai, USA, dynamical system concepts of ergodicity and mixing and provide 1990. necessary and sufficient conditions in terms of information transfer [24] X. S. Liang and R. Kleeman, “Information transfer between dynamical measure for a system to be ergodic or mixing. Finally, we formulate system components,” Physical Review Letters, vol. 95, p. 244101, 2005. a convex optimization problem in terms of information transfer [25] S. Sinha and U. Vaidya, “Causality preserving information transfer to address the problem of optimal placement of actuators and measure for control dynamical system,” in 2016 IEEE 55th Conference sensors for control of non-equilibrium dynamics and demonstrate on Decision and Control (CDC). IEEE, 2016, pp. 7329–7334. the efficiency of the developed framework on two different systems. [26] ——, “On information transfer in discrete dynamical systems,” in 2017 Indian Control Conference (ICC). IEEE, 2017, pp. 303–308. REFERENCES [27] U. Vaidya and S. 
Sinha, “Information based causal measure for influence characterization in dynamical systems with applications,” Mathematical methods of classical mechanics [1] V. I. Arnol’d, . Springer Accepted for publication in IEEE Proceedings of American Control Science & Business Media, 2013, vol. 60. Conference, 2016. [2] A. Lasota and M. C. Mackey, Chaos, , and Noise: Stochastic [28] S. Sinha, P. Sharma, U. Vaidya, and V. Ajjarapu, “Identifying causal Aspects of Dynamics. New York: Springer-Verlag, 1994. interaction in power system: Information-based approach,” in 2017 [3] A. Katok and B. Hasselblatt, Introduction to the modern theory of IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, dynamical systems. Cambridge, UK: Cambridge University Press, 2017, pp. 2041–2046. 1995. [29] ——, “On information transfer based characterization of power system [4] M. Dellnitz and O. Junge, “On the approximation of complicated stability,” arXiv preprint arXiv:1809.07704, 2018. dynamical behavior,” SIAM Journal on Numerical Analysis, vol. 36, [30] S. Sinha and U. Vaidya, “Data-driven approach for inferencing pp. 491–515, 1999. causality and network topology,” in 2018 Annual American Control [5] I. Mezic and A. Banaszuk, “Comparison of systems with complex Conference (ACC). IEEE, 2018, pp. 436–441. behavior: spectral methods,” in Proceedings of the 39th IEEE Confer- [31] ——, “On data-driven computation of information transfer for causal ence on Decision and Control (Cat. No.00CH37187), vol. 2, 2000, pp. inference in dynamical systems,” arXiv preprint arXiv:1803.08558, 1224–1231 vol.2. 2018. [6] G. Froyland, “Extracting dynamical behaviour via Markov models,” [32] H. Goldstein, C. Poole, and J. Safko, “Classical mechanics,” 2002. in Nonlinear Dynamics and Statistics: Proceedings, Newton Institute, [33] U.Vaidya, R. Rajaram, and S. Dasgupta, “Actuator and sensor place- Cambridge, 1998, A. Mees, Ed. Birkhauser, 2001, pp. 283–324. 
ment in linear advection PDE with building system application,” [7] O. Junge and H. Osinga, “A set oriented approach to global optimal Journal of Mathematical Analysis and Application, vol. 394, pp. 213– control,” ESAIM: Control, Optimisation and Calculus of Variations, 224, 2012. vol. 10, no. 2, pp. 259–270, 2004. [34] S. Sinha, U. Vaidya, and R. Rajaram, “Optimal placement of acturators [8] I. Mezic´ and A. Banaszuk, “Comparison of systems with complex and sensors for the control of nonequilibrium dynamics,” in Proceed- behavior,” Physica D, vol. 197, pp. 101–133, 2004. ings of European Control Conference, 2013. [9] M. Dellnitz, O. Junge, W. S. Koon, F. Lekien, M. Lo, J. E. Marsden, [35] ——, “Operator theoretic framework for optimal placement of sensors K. Padberg, R. Preis, S. D. Ross, and B. Thiere, “Transport in and actuators for control of nonequilibrium dynamics,” Journal of dynamical astronomy and multibody problems,” International Journal Mathematical Analysis and Applications, vol. 440, no. 2, pp. 750–772, of Bifurcation and Chaos, vol. 15, pp. 699–727, 2005. 2016. [10] I. Mezic,´ “Spectral properties of dynamical systems, model reduction [36] M. Dellnitz and O. Junge, “Set oriented numerical methods for and decompositions,” Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309– dynamical systems,” Handbook of dynamical systems, vol. 2, pp. 221– 325, 2005. 264, 2002. [11] P. G. Mehta and U. Vaidya, “On stochastic analysis approaches for [37] A. M. Mancho, D. Small, and S. Wiggins, “A tutorial on dynamical comparing dynamical systems,” in Proceeding of IEEE Conference on systems concepts applied to Lagrangian transport in oceanic flows Decision and Control, Spain, 2005, pp. 8082–8087. defined as finite time data set: Theoretical and computational issues,” [12] U. Vaidya and P. G. Mehta, “Lyapunov measure for almost everywhere Physics Reports, vol. 437, pp. 55–124, 2006. stability,” IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. [38] I. Mezic,´ S. 
Loire, V. A. Fonoberov, and P. Hogan, “A new mixing 307–323, 2008. diagnostic and gulf oil spill movement,” Science, vol. 330, no. 6003, [13] A. Raghunathan and U. Vaidya, “Optimal stabilization using lyapunov pp. 486–489, 2010. measures,” IEEE Transactions on Automatic Control, vol. 59, no. 5, [39] M. J. Olascoaga and G. Haller, “Forecasting sudden changes in envi- pp. 1316–1321, 2014. ronmental pollution patterns,” Proceedings of the National Academy [14] Y. Susuki and I. Mezic, “Nonlinear koopman modes and coherency of Sciences, vol. 109, no. 13, pp. 4738–4743, 2012. identification of coupled swing dynamics,” IEEE Transactions on Power Systems, vol. 26, no. 4, pp. 1894–1904, 2011. [15] M. Budisic, R. Mohr, and I. Mezic, “Applied koopmanism,” Chaos, vol. 22, pp. 047 510–32, 2012. [16] A. Mauroy and I. Mezic, “A spectral operator-theoretic framework for global stability,” in Proc. of IEEE Conference of Decision and Control, Florence, Italy, 2013. [17] A. Surana and A. Banaszuk, “Linear observer synthesis for nonlinear systemsusing koopman operator framework,” in Proceedings of IFAC Symposium on Nonlinear Control Systems, Monterey, California, 2016. [18] E. Yeung, Z. Liu, and N. O. Hodas, “A koopman operator approach for computing and balancing gramians for discrete time nonlinear systems,” in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp. 337–344.
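As an illustration of the coarse-controllability test in Theorem 26 and of the graph view of the partition used for actuator placement, the following Python sketch works through a toy 5-cell partition. Everything named here is an assumption made for illustration and does not come from the paper: the functions `mu_bar`, `reachable_from` and `greedy_placement`, the row-stochastic matrix `P`, the value $\alpha = 0.9$, and the truncation of the series in Eq. (19) at 200 terms. The placement step uses a greedy set-cover heuristic for problem (20), swapped in for simplicity; it is not the convex relaxation (21), and the binary reachability pattern of `P` stands in for the information transfer matrix $T$, which is not computed here.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def mu_bar(P: Matrix, actuator_cells: Sequence[int],
           alpha: float = 0.9, n_terms: int = 200) -> List[float]:
    """Truncated Eq. (19): sum_{n < n_terms} alpha^n * sum_i e_{k_i}^T P^n.

    P is assumed row-stochastic (P[i][j] > 0 means cell j is reached from
    cell i in one step) and 0 < alpha < 1 is an assumed weight.
    """
    N = len(P)
    row = [0.0] * N
    for k in actuator_cells:
        row[k] = 1.0                      # row = sum_i e_{k_i}^T
    total = [0.0] * N
    weight = 1.0                          # alpha^n, starting at n = 0
    for _ in range(n_terms):
        total = [t + weight * r for t, r in zip(total, row)]
        # row <- row P  (row vector times matrix)
        row = [sum(row[i] * P[i][j] for i in range(N)) for j in range(N)]
        weight *= alpha
    return total


def reachable_from(P: Matrix, start: int) -> Set[int]:
    """Cells reachable from `start` (itself included) in the graph with an
    edge i -> j whenever P[i][j] > 0."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, pij in enumerate(P[i]):
            if pij > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def greedy_placement(P: Matrix) -> List[int]:
    """Greedy set-cover heuristic: repeatedly pick the cell that reaches the
    most still-uncovered cells. A heuristic for (20), not the LP (21)."""
    N = len(P)
    cover = [reachable_from(P, i) for i in range(N)]
    uncovered = set(range(N))
    chosen: List[int] = []
    while uncovered:
        best = max(range(N), key=lambda i: len(cover[i] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen


# Toy partition: 0 -> 1 -> 2 (2 absorbing) and 3 -> 4 (4 absorbing).
P = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

chosen = greedy_placement(P)
print("actuator cells:", chosen)
print("mu_bar with chosen cells:", mu_bar(P, chosen))
print("mu_bar with cell 0 only: ", mu_bar(P, [0]))
```

In this toy case a single actuator in cell 0 leaves the entries of $\bar{\mu}$ for cells 3 and 4 at zero, so the element-wise condition of Theorem 26 fails, while adding cell 3 makes every entry positive. For a partition of realistic size the same check would be run on the P-F matrix obtained from the discretized flow.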