Directed Information, Causal Estimation, and Communication in Continuous Time

Directed Information, Causal Estimation, and Communication in Continuous Time Young-Han Kim Haim H. Permuter Tsachy Weissman University of California, San Diego Ben Gurion University of the Negev Stanford University/Technion La Jolla, CA, USA Beer-Sheva, Israel Stanford, CA, USA/Haifa, Israel [email protected] [email protected] [email protected] Abstract—The notion of directed information is intro- the road to a computable characterization of feedback duced for stochastic processes in continuous time. Proper- capacity; see [7], [8] for examples. ties and operational interpretations are presented for this Directed information and its variants also characterize notion of directed information, which generalizes mutual (via multi-letter expressions) the capacity of two-way information between stochastic processes in a similar channels and multiple access channels with feedback [2], manner as Massey’s original notion of directed information generalizes Shannon’s mutual information in the discrete- [9], and the rate distortion function with feedforward time setting. As a key application, Duncan’s theorem is [10], [11]. In another context, directed information also generalized to estimation problems in which the evolution captures the difference in growth rates of wealth in horse of the target signal is affected by the past channel noise, race gambling due to causal side information [12]. This and the causal minimum mean squared error estimation is provides a natural interpretation of I(Xn → Y n) as related to directed information from the target signal to the the amount of information about Y n causally provided observation corrupted by additive white Gaussian noise. by Xn on the fly. A similar conclusion can be drawn An analogous relationship holds for the Poisson channel. for other engineering and science problems, in which The notion of directed information as a characterizing of directed information measures the value of causal side the fundamental limit on reliable communication for a wide class of continuous-time channels with feedback is information [13]. dicussed. In this paper, we extend the notion of directed information to continuous-time random processes. The I. INTRODUCTION contribution of this paper is twofold. First, the definition Directed information I(Xn → Y n) between two we give for directed information in continuous time is n n random n-sequences X = (X1,...,Xn) and Y = valuable in itself. Just as in the discrete-time setting, di- (Y1,...,Yn) is a natural generalization of Shannon’s rected information in continuous time generalizes mutual mutual information to random objects with causal struc- information between two stochastic processes. Indeed, tures. Introduced by Massey [1], this notion of directed when two processes do not have any causal dependence information has been shown to arise as the canonical among them, the two notions become identical. Directed answer to a variety of problems with causally dependent information in continuous time is also a generalization components. For example, it plays a pivotal role in char- of its discrete time counterpart. acterizing the capacity CFB of communication channels Second, we demonstrate the utility of this notion of with feedback. Massey [1] showed that the feedback directed information by generalizing classical results capacity is upper bounded by on the relationship between mutual information and causal estimation in continuous time. In particular, we 1 n n CFB ≤ lim max I(X →Y ), generalize Duncan’s theorem which relates the minimum n→∞ p(xn||yn−1) n mean squared error (MMSE) of a target signal based where the definition of directed information I(Xn → on an observation through an additive white Gaussian Y n) is given in Section II and p(xn||yn−1) = channel to directed information between the target signal n i−1 i−1 i=1 p(xi|x ,y ) is the causal conditioning notation and the observation. We similarly generalize the Poisson streamlined by Kramer [2], [3]. This upper bound is tight analogue of Duncan’s theorem. forQ a certain class of ergodic channels [4]–[6], paving The rest of the paper is organized as follows. Section II is devoted to the definitions of directed information Note that each sequence component is a continuous-time and directed information density in continuous time, stochastic process. Define now which is followed by key properties of continuous- T T It X → Y time directed information in Section III. Section IV 0 0 T,t T,t presents a generalization of Duncan’s theorem, and of := I X0 →Y0 (3) its Poisson counterpart, for target signals that depend n on the past noise. In Section V we present a feedback ti− ti− ti−1− T T tn− = I Yti−1 ; X0 |Y0 + I Ytn ; X0 |Y0 , communication setting in which our notion of directed " # Xi=1 information in continuous time emerges naturally as the (4) characterization of the capacity. We conclude with a where on the right side of (3) is the directed information few remarks in Section VI. This extended abstract is between two sequences of length n + 1 defined in taken essentially verbatim from [14], with the addition previous sections, in (4) we take t0 = 0, and the mutual of Section V. More details, proofs of the stated results, information terms between two continuous time pro- and additional related results will be given in [15]. cesses, conditioned on a third, are well defined objects, T T as developed in [16], [17]. It X0 → Y0 is monotone II. DEFINITION OF DIRECTED INFORMATION IN in t in the following sense: CONTINUOUS TIME Proposition 1. If t′ is a refinement of t, then n n Let be a pair of random -sequences. Di- ′ T T T T (X ,Y ) n It X0 → Y0 ≤ It X0 → Y0 . rected information (from Xn to Y n) is defined as The following definition is now natural: n n n i i−1 Definition 1. Directed information between T and T I(X →Y ) := I(X ; Yi|Y ). X0 Y0 is defined as Xi=1 T T T T I X → Y := inf It X → Y , (5) Note that unlike mutual information, directed informa- 0 0 t 0 0 n n tion is asymmetric in its arguments, so I(X → Y ) 6= where the infimum is over all n and t as in (1). I(Y n →Xn). b Note, in light of Proposition 1, that For a continuous-time process {Xt}, let Xa = {Xs : T T T T a ≤ s ≤ b} denote the process in the interval [a, b] I X → Y = lim inf It X → Y . 0 0 + t 0 0 when a ≤ b and the empty set otherwise. Let Xb− = ε→0 { :ti−ti−1≤ε} a (6) {X : a ≤ s < b} denote the process in the interval s Directed information can be given an integral repre- [a, b) if a < b and the empty set otherwise. Similarly, b sentation via the following notion of a density. let Xa+ = {Xs : a<s ≤ b} denote the process in T T the interval (a, b] if a < b and the empty set other- Definition 2. For 0 ≤ t < T , define it+ X0 → Y0 wise. Throughout this section, equalities and inequalities T T 1 t+δ t+δ t between random objects, when not explicitly indicated, it+ X0 → Y0 = lim I(Yt ; X0 |Y0 ), (7) δ→0+ δ are to be understood in the sure sense (i.e., hold for all and for0 <t ≤ T , define i XT → Y T by sample paths). Functions of random objects are assumed t− 0 0 measurable even though not explicitly indicated. T T 1 t t t−δ it− X0 → Y0 = lim I(Yt−δ; X0|Y0 ) (8) We now develop the notion of directed information δ→0+ δ T T between two continuous-time stochastic processes on the whenever the limits exist. When it+ X0 → Y0 and time interval [0, T ]. Let t = (t ,t ,...,t ) denote an n- i XT → Y T exist and are equal, we denote them 1 2 n t− 0 0 dimensional vector with components satisfying by i XT → Y T , which we refer to as the directed t 0 0 information density. 0 ≤ t ≤ t ≤···≤ t ≤ T. (1) 1 2 n T T Proposition 2. If i0+ X0 → Y0 and T,t T T exist and are finite, and the directed Let X0 denote the sequence of length n + 1 resulting iT − X0 → Y0 T T T from “chopping up” the continuous-time signal X into information density it X → Y exists for all 0 0 0 consecutive segments as follows 0 <t<T , then T t T T T T T, t1− t2 − tn− T (2) I X0 → Y0 = it X0 → Y0 dt (9) X0 = X0 , Xt1 ,...,Xtn−1 , Xtn . Z0 T and, for every 0 <t<T , X0 by δ to the right and filling the gap with 0, i.e.: [δ] [δ] d Xt = Xt−δ for t ∈ [δ, T ] and Xt ≡ 0 for t ∈ [0, δ). I Xt → Y t = i XT → Y T . (10) dt 0 0 t 0 0 Define now [δ],T Example 1. Let {Bt} be the standard Brownian motion T − T T I X0 → Y0 := lim sup I X0 → Y0 (12) and let A ∼ N(0, 1) be independent of {B }. Let δ→0+ t Xt ≡ A for all t and dYt = Xtdt + dBt. Letting 2 2 and 2 2 1 σX +σN J(σ ,σ )= ln 2 denote the mutual information X N 2 σN T − T [δ],T T 2 I X0 → Y0 := lim inf I X0 → Y0 , (13) between a Gaussian random variable of variance σX and δ→0+ itself corrupted by an independent Gaussian of variance where the directed information expressions on the right 2 , we have for every σN t ∈ [0, T ) sides of (12) and (13) are according to the definition we 1/t 1 1 δ already have for directed information between two pro- I(Y t+δ; Xt+δ|Y t)= J , = ln 1+ t 0 0 1 + 1/t δ 2 t + 1 cesses on [0, T ].

Directed Information, Causal Estimation, and Communication in Continuous Time

An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning

Task-Driven Estimation and Control Via Information Bottlenecks

Interpretations of Directed Information in Portfolio Theory, Data

Analogy Between Gambling and Measurements-Based Work Extraction

Universal Estimation of Directed Information Lei Zhao, Haim Permuter, Young-Han Kim, and Tsachy Weissman

Information Theoretic Causal Effect Quantification

Capacity of Continuous Channels with Memory Via Directed Information Neural Estimator

Information Structures for Causally Explainable Decisions

A Unified Bellman Equation for Causal Information and Value in Markov

Transfer-Entropy-Regularized Markov Decision Processes Takashi Tanaka1 Henrik Sandberg2 Mikael Skoglund3

Directed Information for Complex Network Analysis from Multivariate Time Series

Echo State Network Models for Nonlinear Granger Causality