IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013

Directed Information, Causal Estimation, and Communication in Continuous Time

Tsachy Weissman, Fellow, IEEE, Young-Han Kim, Senior Member, IEEE, and Haim H. Permuter, Member, IEEE

Abstract—A notion of directed information between two continuous-time processes is proposed. A key component in the definition is taking an infimum over all possible partitions of the time interval, which plays a role no less significant than the supremum over "space" partitions inherent in the definition of mutual information. Properties and operational interpretations in estimation and communication are then established for the proposed notion of directed information. For the continuous-time additive white Gaussian noise channel, it is shown that Duncan's classical relationship between causal estimation error and mutual information continues to hold in the presence of feedback upon replacing mutual information by directed information. A parallel result is established for the Poisson channel. The utility of this relationship is demonstrated in computing the directed information rate between the input and output processes of a continuous-time Poisson channel with feedback, where the channel input process is constrained to be constant between events at the channel output. Finally, the capacity of a wide class of continuous-time channels with feedback is established via directed information, characterizing the fundamental limit on reliable communication.

Index Terms—Causal estimation, conditional mutual information, continuous time, directed information, Duncan's theorem, feedback capacity, Gaussian channel, Poisson channel, time partition.

Manuscript received September 01, 2011; revised October 30, 2012; accepted November 05, 2012. Date of publication November 15, 2012; date of current version February 12, 2013. This work was supported in part by the National Science Foundation (NSF) under Grant CCF-0729195, in part by the U.S.–Israel Binational Science Foundation under Grant 2008402, and in part by the Center for Science of Information, an NSF Science and Technology Center, under Grant CCF-0939370. H. H. Permuter was supported in part by the Marie Curie Reintegration Fellowship. The material in this paper was presented in part at the 2009 IEEE International Symposium on Information Theory and the 2011 IEEE Information Theory Workshop.
T. Weissman is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
Y.-H. Kim is with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093-0407 USA (e-mail: [email protected]).
H. H. Permuter is with the Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel (e-mail: [email protected]).
Communicated by I. Kontoyiannis, Associate Editor At Large.
Digital Object Identifier 10.1109/TIT.2012.2227677

I. INTRODUCTION

THE directed information $I(X^n \to Y^n)$ between two random $n$-sequences $X^n$ and $Y^n$ is a natural generalization of Shannon's mutual information to random objects obeying causal relations. Introduced by Massey [1], this notion has been shown to arise as the canonical answer to a variety of problems with causally dependent components. For example, it plays a pivotal role in characterizing the capacity of a communication channel with feedback. Massey [1] showed that the feedback capacity $C_{\mathrm{FB}}$ is upper bounded as

$C_{\mathrm{FB}} \le \lim_{n\to\infty} \max_{p(x^n \| y^{n-1})} \frac{1}{n} I(X^n \to Y^n)$   (1)

where

$I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$   (2)

and the maximum in (1) is over causally conditioned input distributions

$p(x^n \| y^{n-1}) = \prod_{i=1}^{n} p(x_i \mid x^{i-1}, y^{i-1})$   (3)

(see also Kramer [2] that streamlines the notion of directed information by causal conditioning). The upper bound in (1) is tight for certain classes of ergodic channels, such as general nonanticipatory channels satisfying certain regularity conditions [3], channels with finite input memory and ergodic noise [4], and indecomposable finite-state channels [5], paving the road to a computable characterization of feedback capacity (see [6]–[8] for examples).

Directed information and its variants also characterize (via multiletter expressions) the capacity for two-way channels [2], multiple access channels with feedback [2], [9], broadcast channels with feedback [10], and compound channels with feedback [11], as well as the rate-distortion function with feedforward [12], [13]. In another context, directed information captures the difference in growth rates of wealth in horse race gambling due to causal side information [14]. This provides a natural interpretation of $I(X^n \to Y^n)$ as the amount of information about $Y^n$ causally provided by $X^n$ on the fly. Similar interpretations for directed information can be drawn for other problems in science and engineering [15].

This paper is dedicated to extending the mathematical notion of directed information to continuous-time random processes and to establishing results that demonstrate the operational significance of this notion in estimation and communication. Our contributions include the following.
1) We introduce the notion of directed information in continuous time. Given a pair of continuous-time processes in a time interval and its partition consisting of subintervals, we first consider the (discrete-time) directed information for the two sequences of length $n$ whose components are the sample paths on the respective subintervals. The resulting quantity depends on the specific partition of the time interval. We define directed information in continuous time by taking the infimum over all finite time partitions. Thus, in contrast to mutual information in continuous time, which can be defined as a supremum of mutual information over finite "space" partitions [16, Ch. 2.5], [17, Ch. 3.5], inherent to our notion of directed information is a similar supremum followed by an infimum over time partitions. We explain why this definition is natural by showing that the continuous-time directed information inherits key properties of its discrete-time origin and by establishing new properties that are meaningful in continuous time.


2) We show that this notion of directed information arises in extending classical relationships between information and estimation in continuous time—Duncan's theorem [18], which relates the minimum mean squared error (MMSE) in causal estimation of a target signal, based on an observation through an additive white Gaussian noise (AWGN) channel, to the mutual information between the target signal and the observation, and its counterpart for the Poisson channel—to scenarios in which the channel input process can causally depend on the channel output process, whereby corresponding relationships now hold between directed information and estimation.
3) We illustrate these relationships between directed information and estimation by characterizing the directed information rate and the feedback capacity of a continuous-time Poisson channel with inputs constrained to be constant between events at the channel output.
4) We establish the fundamental role of continuous-time directed information in characterizing the feedback capacity of a large class of continuous-time channels. In particular, we show that for channels where the output is a function of the input and some stationary ergodic "noise" process, the continuous-time directed information characterizes the feedback capacity of the channel.

The remainder of this paper is organized as follows. Section II is devoted to the definition of directed information and related quantities in continuous time, which is followed by a presentation of key properties of continuous-time directed information in Section III. In Section IV, we establish the generalizations of Duncan's theorem and its Poisson counterpart that accommodate the presence of feedback. In Section V, we apply the relationship between the causal estimation error and directed information for the Poisson channel to compute the directed information rate between the input and the output of this channel in a scenario that involves feedback. In Section VI, we study a general feedback communication problem in which our notion of directed information in continuous time emerges naturally in the characterization of the feedback capacity. Section VII concludes this paper with a few remarks.

II. DEFINITION AND REPRESENTATION OF DIRECTED INFORMATION IN CONTINUOUS TIME

Let $P$ and $Q$ be two probability measures on the same space and $\frac{dP}{dQ}$ be the Radon–Nikodym derivative of $P$ with respect to $Q$. The relative entropy between $P$ and $Q$ is defined as

$D(P \| Q) = \int \log \frac{dP}{dQ}\, dP$ if $\frac{dP}{dQ}$ exists, and $D(P \| Q) = \infty$ otherwise.   (4)

For jointly distributed random objects $X$ and $Y$, the mutual information between them is defined as

$I(X; Y) = D(P_{XY} \| P_X \times P_Y)$   (5)

where $P_X \times P_Y$ denotes the product distribution under which $X$ and $Y$ are independent but maintain their respective marginal distributions. As an alternative, the mutual information is defined [16, Ch. 2.5] as

$I(X; Y) = \sup I([X]; [Y])$   (6)

where the supremum is over all finite quantizations $[X]$ and $[Y]$ of $X$ and $Y$. That the two notions coincide has been established in, e.g., [17, Ch. 3.5] and [19]. We write $I(P)$ instead of $I(X;Y)$ when we wish to emphasize the dependence on the joint distribution $P = P_{XY}$.

For a jointly distributed random triple $(X, Y, Z)$ with components in arbitrary measurable spaces, we define the conditional mutual information between $X$ and $Y$ given $Z$ as

$I(X; Y \mid Z) = \sup I([X]; [Y] \mid Z)$   (7)

where the supremum is over all finite quantizations $[X]$ and $[Y]$ of $X$ and $Y$. This quantity, due to Wyner [20], is always well defined and satisfies all the basic properties of conditional mutual information for discrete and continuous random variables, in particular:
1) Nonnegativity: $I(X;Y\mid Z) \ge 0$, with equality iff $X - Z - Y$ form a Markov chain (that is, $X$ and $Y$ are conditionally independent given $Z$).
2) Chain rule:

$I(X; (Y, Z)) = I(X; Y) + I(X; Z \mid Y).$   (8)

3) Data processing inequality: If $X - Y - Z$ form a Markov chain, then $I(X; Z) \le I(X; Y)$, with equality iff $I(X; Y \mid Z) = 0$.

The definition in (7) coincides with Dobrushin's more restrictive definition [17, p. 29]

$I(X; Y \mid Z) = \int I\bigl(P_{X,Y \mid Z = z}\bigr)\, dP_Z(z)$   (9)

where $P_{X,Y\mid Z=z}$ is a regular version of the conditional probability law of $(X,Y)$ given $Z = z$ (cf. [21, Ch. 6]) if it exists.
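The supremum over finite quantizations in (6) can be made concrete numerically. The following is a minimal Monte Carlo sketch, not from the paper: for a jointly Gaussian pair with correlation coefficient rho, the mutual information of uniformly quantized versions increases as the quantizers are refined and approaches the closed-form value $-\tfrac{1}{2}\log(1-\rho^2)$. The sample size, bin counts, and random seed below are illustrative assumptions, and the plug-in histogram estimate carries a small finite-sample bias.

```python
# Monte Carlo sketch of (6): mutual information as a supremum over finite
# "space" quantizations of a jointly Gaussian pair.
import numpy as np

def empirical_mi(x, y, bins):
    """Plug-in mutual information (nats) of the quantized pair ([x], [y])."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rho = 0.8
rng = np.random.default_rng(1)
n = 500_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

print("closed form I(X;Y) =", -0.5 * np.log(1 - rho**2))
for bins in (2, 4, 8, 16, 32):
    print(f"{bins:2d} bins per axis -> I([X];[Y]) ~ {empirical_mi(x, y, bins):.4f}")
```

As the quantizers are refined, the estimates increase toward the closed-form value, illustrating the supremum in (6).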
Let $(X^n, Y^n)$ be a pair of random $n$-sequences. The directed information from $X^n$ to $Y^n$ is defined as

$I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}).$   (10)

Note that, unlike mutual information, directed information is asymmetric in its arguments, i.e., $I(X^n \to Y^n) \ne I(Y^n \to X^n)$ in general.

Let us now develop the notion of directed information between two continuous-time stochastic processes on the time interval $[0, T)$. For a continuous-time process $X = \{X_t\}$, let $X_a^b$ denote the process in the time interval $[a, b)$. Let $t^n = (t_1, \ldots, t_n)$ denote a vector with components satisfying

$0 = t_0 < t_1 < t_2 < \cdots < t_n = T.$   (11)
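Before moving to continuous time, here is a small numerical sketch, not from the paper, of the discrete-time definition (10) for the shortest interesting case $n = 2$, where $I(X^2 \to Y^2) = I(X_1; Y_1) + I(X_1, X_2; Y_2 \mid Y_1)$. The joint pmf used below is an arbitrary illustrative example over binary alphabets.

```python
# Sketch of (10) for n = 2 and finite alphabets, directly from a joint pmf
# p[x1, x2, y1, y2].
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def directed_information_n2(p):
    """I(X^2 -> Y^2) = I(X1; Y1) + I(X1, X2; Y2 | Y1)."""
    term1 = mutual_information(p.sum(axis=(1, 3)))        # I(X1; Y1)
    term2 = 0.0
    p_y1 = p.sum(axis=(0, 1, 3))
    for y1 in range(p.shape[2]):
        if p_y1[y1] == 0:
            continue
        cond = p[:, :, y1, :] / p_y1[y1]                  # P(x1, x2, y2 | y1)
        term2 += p_y1[y1] * mutual_information(cond.reshape(-1, cond.shape[-1]))
    return term1 + term2

# A hypothetical joint pmf over binary alphabets, for illustration only.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()

print("I(X^2 -> Y^2) =", round(directed_information_n2(p), 6))
# Swapping the roles of the two sequences illustrates the asymmetry noted above.
print("I(Y^2 -> X^2) =", round(directed_information_n2(np.transpose(p, (2, 3, 0, 1))), 6))
```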

Let $X^n$ denote the sequence of length $n$ resulting from "chopping up" the continuous-time signal $X_0^T$ into consecutive segments as

$X_i = X_{t_{i-1}}^{t_i}, \quad i = 1, \ldots, n$   (12)

and similarly for $Y^n$. Note that each component of the sequence is a continuous-time stochastic process. For a pair of jointly distributed stochastic processes $(X_0^T, Y_0^T)$, define

$I_{t^n}(X_0^T \to Y_0^T) = I(X^n \to Y^n)$   (13)
$\qquad\qquad\qquad\; = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$   (14)

where the right side is the directed information between two sequences of length $n$ defined in (10), and the conditional mutual information terms, defined as in (7), are between two continuous-time processes, conditioned on a third. We extend this definition to $(X_0^T, Y_0^T, W)$, where $W$ is a random object jointly distributed with $(X_0^T, Y_0^T)$, in the obvious way, namely

$I_{t^n}(X_0^T \to Y_0^T \mid W) = I(X^n \to Y^n \mid W)$   (15)
$\qquad\qquad\qquad\qquad\;\; = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}, W).$   (16)

We define $\mathcal{T}$ to be the set of all finite partitions of the time interval $[0, T)$, i.e., of all $t^n$ satisfying (11). The quantity $I_{t^n}(X_0^T \to Y_0^T)$ is monotone in $t^n$ in the following sense.

Proposition 1: Let $t^n$ and $s^m$ be partitions in $\mathcal{T}$. If $s^m$ is a refinement of $t^n$, i.e., $\{t_1, \ldots, t_n\} \subseteq \{s_1, \ldots, s_m\}$, then $I_{s^m}(X_0^T \to Y_0^T) \le I_{t^n}(X_0^T \to Y_0^T)$.

Proof: It suffices to prove the claim assuming $t^n$ is as in (11) and that $s^{n+1}$ is the $(n+1)$-dimensional vector obtained from $t^n$ by inserting a single additional point, the general case following by induction. For such $t^n$ and $s^{n+1}$, expanding both sides as in (14) and comparing term by term [see (17)–(22)], the claim follows from the chain rule (8) and the fact that the directed information between two sequences of length 2 is upper bounded by the mutual information between them [1, Th. 2].

The following definition is now natural.

Definition 1: Let $(X_0^T, Y_0^T)$ be a pair of stochastic processes. The directed information from $X_0^T$ to $Y_0^T$ is defined as

$I(X_0^T \to Y_0^T) = \inf_{t^n \in \mathcal{T}} I_{t^n}(X_0^T \to Y_0^T).$   (23)

If $W$ is another random object jointly distributed with $(X_0^T, Y_0^T)$, we define the conditional directed information $I(X_0^T \to Y_0^T \mid W)$ as

$I(X_0^T \to Y_0^T \mid W) = \inf_{t^n \in \mathcal{T}} I_{t^n}(X_0^T \to Y_0^T \mid W).$   (24)

Note that the definitions and conventions preceding Definition 1 imply that the directed information is a nonnegative extended real number (i.e., an element of $[0, \infty]$). It is also worth noting, by recalling (6), that each of the conditional mutual information terms in (14), and hence the sum, is a supremum over "space" partitions of the stochastic processes in the corresponding time intervals. Thus, the directed information in (23) is an infimum over time partitions of a supremum over space partitions.

Also note that

$I(X_0^T \to Y_0^T) = \lim_{\delta \to 0}\, \inf_{t^n \in \mathcal{T}_\delta} I_{t^n}(X_0^T \to Y_0^T)$   (25)

where the infimum is over all partitions in $\mathcal{T}_\delta$, the subset of $\mathcal{T}$ consisting of partitions with subinterval lengths uniformly bounded by $\delta$. Indeed, for any $\varepsilon > 0$ and any partition $t^n$, a refinement of $t^n$ lying in $\mathcal{T}_\delta$ does not increase the directed information, as seen in Proposition 1; by the arbitrariness of $\varepsilon$ this yields one direction, and the reverse inequality is immediate since the infimum on the right-hand side of (25) is over a subset of $\mathcal{T}$ [the omitted details are (26)–(29)].
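To make the partition-indexed quantity (13)–(14), the monotonicity of Proposition 1, and the time-partition infimum of Definition 1 concrete, the following is a small numerical sketch, not from the paper. It evaluates $I_{t^n}(X_0^T \to Y_0^T)$ on dyadic partitions for a sampled linear-Gaussian channel with delayed feedback, using the exact Gaussian formula for conditional mutual information in terms of covariance log-determinants. The discretization, feedback delay, and partition sizes are illustrative assumptions; by Proposition 1 the printed values should not increase as the partition is refined.

```python
# Sketch: I_{t^n}(X -> Y) on dyadic partitions of a sampled Gaussian channel
#   Y_k = Y_{k-1} + X_{k-1}*h + dW_k,   X_k = B_k + Y_{k-D},
# where B is a Brownian motion independent of the channel noise W and D is a
# feedback delay in grid steps. Everything is jointly Gaussian, so
#   I(A;B|C) = (logdet S_AC + logdet S_BC - logdet S_C - logdet S_ABC) / 2.
import numpy as np

T, m, D = 1.0, 32, 4          # horizon, grid size, feedback delay (in steps)
h = T / m

# Coefficients of each sample over the 2m i.i.d. N(0, h) increments
# (columns 0..m-1: dB, columns m..2m-1: dW).
X = np.zeros((m + 1, 2 * m))
Y = np.zeros((m + 1, 2 * m))
B = np.zeros((m + 1, 2 * m))
for k in range(1, m + 1):
    B[k] = B[k - 1]
    B[k, k - 1] = 1.0                                 # B_k = B_{k-1} + dB_k
    Y[k] = Y[k - 1] + h * X[k - 1]
    Y[k, m + k - 1] += 1.0                            # Y_k = Y_{k-1} + X_{k-1} h + dW_k
    X[k] = B[k] + (Y[k - D] if k >= D else 0.0)       # feedback with delay D

coeff = np.vstack([X[1:], Y[1:]])                     # rows: X_1..X_m, Y_1..Y_m
S = h * coeff @ coeff.T                               # exact joint covariance

def ld(idx):
    """log-determinant of the covariance restricted to the given indices."""
    return 0.0 if len(idx) == 0 else np.linalg.slogdet(S[np.ix_(idx, idx)])[1]

def cmi(a, b, c):
    """Gaussian I(A;B|C) in nats, with A, B, C given as index lists into S."""
    return 0.5 * (ld(a + c) + ld(b + c) - ld(c) - ld(a + b + c))

def partition_di(n_blocks):
    """Partition-indexed directed information of (13)-(14), uniform partition."""
    edges = [m * j // n_blocks for j in range(n_blocks + 1)]
    total = 0.0
    for i in range(1, n_blocks + 1):
        x_past = list(range(edges[i]))                          # X-blocks 1..i
        y_blk = [m + k for k in range(edges[i - 1], edges[i])]  # Y-block i
        y_past = [m + k for k in range(edges[i - 1])]           # Y-blocks 1..i-1
        total += cmi(x_past, y_blk, y_past)
    return total

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} blocks: I_t^n(X -> Y) = {partition_di(n):.4f}")
```

The one-block value is the full mutual information between the sampled processes; with feedback present, refining the partition strictly decreases the value, consistent with the infimum in Definition 1 being smaller than the mutual information.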

(17) (27)

For such and ,wehavefrom(16) whichinturnimplies

(28)

by the arbitrariness of . Since the reverse inequality (18) (29) (19) is immediate from the definition of ,wehave (25). (20) As is clear from its definition in (10), the discrete-time di- rected information satisfies

(21) (30) (22) A continuous-time analogue would be that, for small where the last inequality follows since directed information (be- tween two sequences of length 2 in this case) is upper bounded (31) by the mutual information [1, Th. 2]. Thus, if our proposed notion of directed information in contin- The following definition is now natural. uous time is to be a natural extension of that in discrete time, 1274 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013 one might expect the approximate relation (31) to hold in some where the equality in (40) follows since the infimum over all sense. Toward a precise statement, denote partitions does not change by restricting to partitions that have an interval up to time and from time and the last equality (32) follows by the definition of the function in (33). Now whenever the limit exists. Assuming exists, let

(33) (44) and note that (32) is equivalent to (45) (34) where the inequality in (44) is due to (36) and the monotonicity Proposition 2: Fix . Suppose that is continuous of , which implies ,as is the at and that the convergence in (34) is uniform in the interval length of a subinterval in . Bounding the terms in for some .Then (44) from the other direction, we similarly obtain

(35)

Note that Proposition 2 formalizes (31) by implying that the (46) left- and right-hand sides of (31), when normalized by , coin- cide in the limit of small . Combining (43), (45), and (46) yields ProofofProposition2: Note first that the stipulated uni- form convergence in (34) implies the existence of and a monotone function such that

(36) (47) (48) and

(37) The continuity of at implies , and thus, taking the limit in (48) and applying (37) finally yields Fix now and consider (49)

(38) which completes the proof of Proposition 2. Beyond the intuitive appeal of Proposition 2 in formalizing (39) (31), it also provides a useful formula for computing directed information. Indeed, the integral version of (35) is (40) (50)

As the following example illustrates, evaluating the right-hand side of (50) [via the definition of in (32)] can be simpler than tackling the left-hand side directly via Definition 1. (41) Example 1: Let be a standard Brownian motion and be independent of .Let for all and .Letting denote the mutual information between a Gaussian random vari- able of variance and its corrupted version by an independent (42) Gaussian noise of variance , we have for every

(51)

(52) (43) WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1275

With such an explicit expression for , can More generally, if is monotone strictly increasing and be obtained directly from its definition: continuous, and ,then

(53) (60) 3) Coincidence of directed information and mutual informa- We can now compute the directed information by applying tion: If the Markov relation holds for all Proposition 2: ,then

(61)

(54) 4) Equivalence between discrete time and piecewise con- Note that in this example stancy in continuous time: Let be a pair of , and thus, by (54), we have jointly distributed -tuples and suppose satisfy (11). Let the pair be defined as the (55) piecewise-constant process satisfying

This equality between mutual information and directed informa- if (62) tion holds in more general situations, as elaborated in the next section. for .Then The directed information we have just defined is between two processes on .Weextendthisdefinition to processes (63) of different durations by zero-padding at the beginning of the shorter process. For instance 5) Conservation law: For any ,wehave

(56) (64) where denotes a process on formed by con- catenating a process that is equal to the constant 0 for the time Further, if ,then exists interval and then the process . and Define now (65) (57) Remarks: and 1) The first, second, and fourth parts in the proposition present properties that are known to hold for mutual in- (58) formation (when all the directed information expressions in those items are replaced by the corresponding mutual Finally, define the directed information by information), which follow immediately from the data processing inequality and the invariance of mutual infor- (59) mation to one-to-one transformations of its arguments. That these properties hold also for directed information is not as obvious in view of the fact that directed information when the limit exists, or equivalently, when is, in general, not invariant to one-to-one transformations . As we shall see below (in the last part of nor does it satisfy the data processing inequality in its Proposition 3), is guaranteed to exist whenever second argument. . 2) The third part of the proposition is a natural analogue of the fact that whenever III. PROPERTIES OF DIRECTED INFORMATION IN form a Markov chain for all CONTINUOUS TIME . It covers, in particular, any scenario where and are the input and output of any channel of the form The following proposition collects some properties of di- , where the process (which can be rected information in continuous time. thought of as the internal channel noise) is independent of Proposition 3: Let be a pair of jointly distributed the channel input process . To see this, note that in this stochastic processes. Then: case, we have for all , 1) Monotonicity: is monotone nondecreasing in implying since is determined by the . pair . 2) Invariance to time dilation: For ,if and 3) Particularizing even further, we obtain ,then . whenever is the outcome of corrupting 1276 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013

with additive noise, i.e., ,where To prove the fourth part, consider first the case .Inthis and are independent. case, and for all . It is an imme- 4) The fifth part of the proposition can be considered the con- diate consequence of the definition of directed information that tinuous-time analogue of the discrete-time conservation and, therefore, law [22] that (77) (66) for all .Consequently, ,which It is consistent with, and in fact generalizes, the third part. establishes the case . For the general case ,note Indeed, if the Markov relation holds for firstthatitisimmediatefromthedefinition of all , then our definition of directed information and from the construction of based on in is readily seen to imply that for all (62) that for consisting of the time epochs in and therefore that exists and equals (62), we have . Thus, zero. Thus, (65) in this case reduces to (61). . We now argue that ProofofProposition3: The first part of the proposition follows immediately from the definition of directed information (78) in continuous time (see Definition 1) and from the fact that, in discrete time, for . for any partition . By Proposition 1, it suffices to establish (78) The second part follows from Definition 1 upon noting that, with equality assuming is a refinement of the particular just under a dilation as stipulated, due to the invariance of mutual discussed, that is, is of the form information to one-to-one transformations of its arguments, for any partition of

(67) (79) Then where is shorthand for .Thus (80) (68)

(69) (81) (70) (82) (71) (83) where (68) and (71) follow from Definition 1, (69) follows from (67), and (70) is due to the strict monotonicity and continuity of where (82) follows by applying a similar argument as in the case which implies that . Moving to the proof of the fifth part of the proposition, fix as in (11) with . Applying the (72) discrete-time conservation law (66), we have

Moving to the proof of the third part, assume that the Markov (84) relation holds for all and fix as in (11). Then and consequently, for any

(73)

(74) (85) (75) (86)

(76) (87) where (75) follows since for each where (88) , and (76) is due to the chain rule for mutual information. The proof of the third part of the proposition now follows from the and arbitrariness of . (89) WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1277

The equality in (85) follows since due to its definition in (56), which, by the arbitrariness of ,implies does not decrease by refining the time in- terval in the interval; the equality in (86) follows from the refinement property in Proposition 1, which implies that for (101) arbitrary processes , , , and partitions and , On the other hand, for any ,weclearlyhave there exists a third partition (which will be a refinement of both) such that (102)

as the right-hand side, by its definition, is an infimum over all (90) partitions in , while the left-hand side corresponds to an infimum over the subset consisting only of those partitions with and the equality in (87) follows since (84) holds for any . By the arbitrariness of in (102), we obtain with . Hence

(103) which, when combined with (101), finally implies

(91) (104) Existence of the limit in (104), when combined with (64) and (92) the added assumption , implies existence of the limit and that

(93) (105)

thus completing the proof.
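The discrete-time conservation law (66) of Massey and Massey [22] that this part of the proposition extends can be checked numerically. The following self-contained sketch, not from the paper, verifies for $n = 2$ and binary alphabets that $I(X^2 \to Y^2) + I((0, Y_1) \to X^2) = I(X^2; Y^2)$, where $(0, Y_1)$ is the unit-delayed output sequence; the joint pmf is an arbitrary illustrative example.

```python
# Brute-force check of the discrete-time conservation law for n = 2:
#   I(X^2 -> Y^2) + I((0, Y_1) -> X^2) = I(X^2; Y^2).
# The first term of I((0, Y_1) -> X^2) is I(const; X_1) = 0, leaving I(Y_1; X_2 | X_1).
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
p = rng.random((2, 2, 2, 2))      # p[x1, x2, y1, y2], a hypothetical joint pmf
p /= p.sum()

def marg(groups):
    """Joint pmf of the listed axis groups, keyed by grouped values."""
    out = {}
    for idx in product(range(2), repeat=4):
        key = tuple(tuple(idx[ax] for ax in g) for g in groups)
        out[key] = out.get(key, 0.0) + p[idx]
    return out

def cmi(A, B, C):
    """I(A;B|C) in nats; A, B, C are axis tuples (0=X1, 1=X2, 2=Y1, 3=Y2)."""
    pabc, pac, pbc, pc = marg([A, B, C]), marg([A, C]), marg([B, C]), marg([C])
    val = 0.0
    for (a, b, c), q in pabc.items():
        if q > 0:
            val += q * np.log(q * pc[(c,)] / (pac[(a, c)] * pbc[(b, c)]))
    return val

di_xy = cmi((0,), (2,), ()) + cmi((0, 1), (3,), (2,))   # I(X^2 -> Y^2)
di_yx_delayed = cmi((2,), (1,), (0,))                   # I((0, Y_1) -> X^2)
mi = cmi((0, 1), (2, 3), ())                            # I(X^2; Y^2)

print("I(X^2 -> Y^2)     =", round(di_xy, 6))
print("I((0,Y_1) -> X^2) =", round(di_yx_delayed, 6))
print("sum               =", round(di_xy + di_yx_delayed, 6))
print("I(X^2; Y^2)       =", round(mi, 6))
```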

(94)
(95)

where the equality in (91) follows by taking the limit from both sides of (87); the equality in (93) follows by writing out the relevant quantities explicitly and using (25) to equate the second limit in (92) with the corresponding term; and the equality in (95) follows by applying (25) to the conditional distribution of the pair given the past. We have thus proven (64) or, equivalently, the identity (96). Toward the proof of (65), let a given partition be refined by adding one additional point; then (97)–(99) hold, where the first inequality follows since the new partition is a refinement of the original one, the equality by writing out the sum that defines the partition-indexed directed information and isolating its first term, and the second inequality by the infimum over partitions inherent in the definition of directed information. The arbitrariness in (99) implies (100).

IV. DIRECTED INFORMATION, FEEDBACK, AND CAUSAL ESTIMATION

A. Gaussian Channel

In [18], Duncan discovered the following fundamental relationship between the MMSE in causal estimation of a target signal corrupted by an AWGN in continuous time and the mutual information between the clean and noise-corrupted signals.

Theorem 1 [18]: Let $X_0^T$ be a signal of finite average power $\int_0^T E[X_t^2]\, dt < \infty$, independent of a standard Brownian motion $\{W_t\}$. Let $Y_0^T$ satisfy $Y_t = \int_0^t X_s\, ds + W_t$. Then

$\frac{1}{2} \int_0^T E\bigl[(X_t - E[X_t \mid Y_0^t])^2\bigr]\, dt = I(X_0^T; Y_0^T).$   (106)

A remarkable aspect of Duncan's theorem is that the relationship (106) holds regardless of the distribution of $X_0^T$. Among its ramifications is the invariance of the causal MMSE to the flow of time, or more generally, to any reordering of time [23], [24]. It should also be mentioned that, although this exact relationship holds in continuous time, approximate versions that hold in discrete time can be derived from it, as is done in [24, Th. 9].

A key stipulation in Duncan's theorem is the independence between the noise-free signal $X_0^T$ and the channel noise $W_0^T$, which excludes scenarios in which the evolution of $X_0^T$ is affected by the channel noise, as is often the case in signal processing (e.g., target tracking) and communication (e.g., in the presence of feedback). Indeed, the identity (106) does not hold in the absence of such a stipulation.
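As a quick numerical illustration of (106), consider the simple case, not worked out in this form in the paper, of a constant Gaussian input $X_t \equiv X \sim \mathcal{N}(0, \sigma^2)$ independent of the noise, observed through $Y_t = Xt + W_t$. The causal MMSE at time $t$ is then $\sigma^2/(1 + \sigma^2 t)$ and $I(X_0^T; Y_0^T) = \tfrac{1}{2}\log(1 + \sigma^2 T)$, so both sides of (106) can be evaluated directly; the parameter values below are arbitrary.

```python
# Checking Duncan's identity (106) for a constant Gaussian input observed
# through dY = X dt + dW.
import numpy as np

sigma2, T = 2.0, 3.0                       # illustrative parameter values

def causal_mmse(t):
    """E[(X - E[X | Y_0^t])^2] for a constant input X ~ N(0, sigma2)."""
    return sigma2 / (1.0 + sigma2 * t)

# Left side of (106): one half of the time-integrated causal MMSE (midpoint rule).
n = 200_000
dt = T / n
tm = (np.arange(n) + 0.5) * dt
lhs = 0.5 * np.sum(causal_mmse(tm)) * dt

# Right side of (106): the mutual information for this channel.
rhs = 0.5 * np.log(1.0 + sigma2 * T)

print("(1/2) * integral of causal MMSE:", round(lhs, 6))
print("I(X_0^T ; Y_0^T)               :", round(rhs, 6))
```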

As an extreme example, consider the case where the channel input is simply the channel output with some delay, i.e.,

$X_t = Y_{t-d}$   (107)

for some $d > 0$ (and $X_t = 0$ for $t < d$). In this case, the causal MMSE on the left side of (106) is clearly 0, while the mutual information on its right side is infinite. On the other hand, in this case, the directed information $I(X_0^T \to Y_0^T) = 0$, as can be seen by noting that $I_{t^n}(X_0^T \to Y_0^T) = 0$ for all partitions $t^n$ whose subinterval lengths do not exceed $d$ (since for such $t^n$, $X^i$ is determined by $Y^{i-1}$ for all $i$).

The third remark following Proposition 3 implies that Theorem 1 could be equivalently stated with $I(X_0^T; Y_0^T)$ on the right side of (106) replaced by $I(X_0^T \to Y_0^T)$. Furthermore, such a modified identity would be valid in the extreme example in (107). This is no coincidence and is a consequence of the result that follows, which generalizes Duncan's theorem. To state it formally, we assume a probability space with an associated filtration $\{\mathcal{F}_t\}$ satisfying the "usual conditions" (right-continuous and $\mathcal{F}_0$ contains all the negligible events, cf., e.g., [25, Definition 2.25]). Recall also that when the standard Brownian motion is adapted to $\{\mathcal{F}_t\}$, by definition, it is implied that, for any $t$, the future increments of the Brownian motion are independent of $\mathcal{F}_t$ (rather than merely of its own past; cf., e.g., [25, Definition 1.1]).

Theorem 2: Let $(X_0^T, W_0^T)$ be adapted to the filtration $\{\mathcal{F}_t\}$, where $X_0^T$ is a signal of finite average power $\int_0^T E[X_t^2]\, dt < \infty$ and $\{W_t\}$ is a standard Brownian motion. Let $Y_0^T$ be the output of the AWGN channel whose input is $X_0^T$ and whose noise is driven by $W_0^T$, i.e.,

$Y_t = \int_0^t X_s\, ds + W_t.$   (108)

Suppose that the regularity assumptions of Proposition 2 are satisfied for all $t$. Then

$\frac{1}{2} \int_0^T E\bigl[(X_t - E[X_t \mid Y_0^t])^2\bigr]\, dt = I(X_0^T \to Y_0^T).$   (109)

Note that unlike in Theorem 1, where the channel input process is independent of the channel noise process, in Theorem 2 no such stipulation exists, and thus, the setting in the latter accommodates the presence of feedback. Furthermore, since $I(X_0^T \to Y_0^T)$ is not invariant to the direction of the flow of time in general, Theorem 2 implies, as should be expected, that neither is the causal MMSE for processes evolving in the generality afforded by the theorem.

That Theorem 1 can be extended to accommodate the presence of feedback has been established for a communication-theoretic framework by Kadota, Zakai, and Ziv [26]. Indeed, in communication over the AWGN channel where $X_0^T$ is the waveform associated with message $M$, in the absence of feedback, the Markov relation $M - X_0^T - Y_0^T$ implies that $I(X_0^T; Y_0^T)$ on the right-hand side of (106), when applying Theorem 1 in this restricted communication framework, can be equivalently written as $I(M; Y_0^T)$. The main result of [26] is that this relationship between the causal estimation error and $I(M; Y_0^T)$ persists in the presence of feedback, i.e., that

$\frac{1}{2} \int_0^T E\bigl[(X_t - E[X_t \mid Y_0^t])^2\bigr]\, dt = I(M; Y_0^T)$   (110)

with or without feedback, even though, in the presence of feedback, one no longer has $I(M; Y_0^T) = I(X_0^T; Y_0^T)$ and therefore (106) is no longer true. The combination of Theorem 2 with the main result of [26] [namely, with (110)] thus implies that in communication over the AWGN channel, with or without feedback, we have $I(X_0^T \to Y_0^T) = I(M; Y_0^T)$. This equality holds well beyond the Gaussian channel, as is elaborated in Section VI. Evidently, Theorem 2 can be considered an extension of the Kadota–Zakai–Ziv result as it holds in settings more general than communication, where there is no message but merely a signal observed through AWGN, adapted to a general filtration.

Theorem 2 is a direct consequence of Proposition 2 and the following lemma.

Lemma 1 [27]: Let $P$ and $Q$ be two probability laws governing $(X_0^T, Y_0^T)$, under which (108) and the stipulations of Theorem 2 are satisfied. Then

$D\bigl(P_{Y_0^T} \,\|\, Q_{Y_0^T}\bigr) = \frac{1}{2} E_P\!\left[\int_0^T \Bigl(\bigl(X_t - E_Q[X_t \mid Y_0^t]\bigr)^2 - \bigl(X_t - E_P[X_t \mid Y_0^t]\bigr)^2\Bigr) dt\right].$   (111)

Lemma 1 was implicit in [27]. It follows from the second part of [27, Th. 2], put together with the exposition in [27, Subsection IV-D] (cf., in particular, (148)–(161) therein).

Proof of Theorem 2: Consider (112)–(115) at the bottom of the page. Here, the equality in (114) follows by applying (111) to the integrand in (113) as follows: replacing the time interval by the short interval in question, substituting $P$ by the conditional law of the process pair on that interval given the past of both input and output (note that the past input is deterministic under this law), and substituting $Q$ by the corresponding conditional law given the past of the output alone. The last step is obtained by switching between the integral and the expectation and then using

(112)

(113)

(114)

(115) WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1279 the definition of conditional expectation. The switch between could equivalently be stated with directed information rather the integrals is possible due to Fubini’s theorem and the fact than mutual information on the right-hand side of (119). Indeed, that the signal has finite average power .It with continuous-time directed information replacing mutual in- follows that defined in (32) exists and is given by formation, this relationship remains true in much wider gener- ality, as the next theorem shows. In the statement of the theorem, we use the notions of a point process and its predictable inten- (116) sity, as developed in detail in, e.g., [30, Ch. II]. Theorem 4: Let be a point process and be its -pre- which completes the proof by an appeal to Proposition 2. dictable intensity, where is the -field generated by . Suppose that , and that the as- B. Poisson Channel sumptions of Proposition 2 are satisfied for all .Then Consider the function given by

(122) (117)

That this function is natural for quantifying the loss when esti- Paralleling the proof of Theorem 2, the proof of Theorem 4 mating nonnegative quantities is implied in [28, Sec. 2], where is a direct application of Proposition 2 and the following. some of its basic properties are exposed. Among them is that Lemma 2 [28]: Let and be two probability laws gov- conditional expectation is the optimal estimator not only under erning under the setting and stipulations of Theorem the squared error loss but also under , i.e., for any nonnegative 4. Then random variable jointly distributed with

(118) where the minimum is over all (measurable) maps from the do- (123) main of into . With this loss function, the analogue of Duncan’s theorem for the case of doubly stochastic Poisson Lemma 2 is implicit in [28], following directly from [28, Th. process (i.e., the intensity is a random process) can be stated as 4.4], and the discussion in [28, Subsection 7.5]. Equipped with it, the proof of Theorem 4 follows similarly as that of Theorem Theorem 3 [28], [29]: Let be a doubly stochastic Poisson 2, the role of (115) being played here by (123). process and be its intensity process (i.e., conditioned on , is a nonhomogenous Poisson process with rate function ) satisfying .Then V. E XAMPLE:POISSON CHANNEL WITH FEEDBACK The Poisson channel (see, e.g., [31]–[38]) is a channel where the input at time , , determines the intensity of the doubly (119) stochastic Poisson process occurring at the output of the channel. A Poisson channel with feedback refers to the case We remark that for ,onehas where the input signal may depend on the previous obser- vation of the output . In this section, we consider a special case of Poisson channel (120) with feedback. Let and be the input and output processes of the continuous-time Poisson channel with and thus (119) can equivalently be expressed as feedback, where each time an event occurs at the channel output, the channel input changes to a new value, drawn according to the distribution of a positive random variable , independently of the channel input and output up to that point in time. The (121) channel input remains fixed at that value until the occurrence of the next event at the channel output, and so on. Throughout as was done in [29] and other classical references. But it was not this section, the shorthand “Poisson channel with feedback” will until [28] that the left-hand side was established as the minimum refer to this scenario, with its implied channel input process. mean causal estimation error under an explicitly identified loss The Poisson channel we use here is similar to the well-known function, thus completing the analogy with Duncan’s theorem. Poisson channel model (see, e.g., [31]–[38]) with one difference The condition stipulated in the third item of Proposition 3 that the intensity of the Poisson channel changes according to is readily seen to hold when is a doubly stochastic Poisson the input only when there is an event at the output of the process and is its intensity process. Thus, the above theorem channel. Note that the channel description given here uniquely 1280 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013 determines the joint distribution of the channel input and output 2) Let denote the time of occurrence of the last processes. (most recent) event at the channel output prior to time 0 In the first part of this section, we derive, using Theorem 4, a and define . The density of is formula for the directed information rate of this Poisson channel with feedback. In the second part, we demonstrate the use of this formula by computing and plotting the directed information rate (129) for a special case in which the intensity alphabet is of size 2.

A. Characterization of the Directed Information Rate 3) For distributed as in (129) For jointly distributed processes ,define the directed information rate by (130)

(124) Proof: For the first part of the lemma, note that is an ergodic continuous-time Markov chain, and thus, is equal to the fraction of time that spends in state which when the limit exists. is proportional to , accounting for (127), which, in Proposition 4: Assume that is finite-valuedwithproba- turn, yields bility mass function (pmf) . The directed information rate between the input and output processes of the Poisson channel with feedback exists and is given by

(125) (131) where, in on the right-hand side, , i.e., the conditional density of given is accounting for (128). . To prove the second part of the lemma, observe that The key component in the proof of the proposition is the use (a) the interarrival times of the process are independent of Theorem 4 for directed information in continuous time as a and identically distributed (i.i.d.) copies of a random vari- causal mean estimation error. An intuition for the expression in able ; (125) can be obtained by considering rate per unit cost [39], i.e., (b) has a density ,where is the cost of the input. In our case, the “cost” of is proportional to the average duration (132) of time until the channel can be used again, i.e., . Finally, we remark that the assumption of discreteness of in Proposition 4 is made for simplicity of the proof, though the (c) the probability density of the length of the interarrival in- result carries over to more generally distributed . terval of the process around 0 is proportional to To prove Proposition 4, let us first collect the following ;and observations. (d) given the length of the interarrival interval around 0 is , its left point is uniformly distributed on . Lemma 3: Let and . Letting denote the density of a random variable Define uniformly distributed on ; it follows that the density of is

(126)

(133) Then, the following statement holds. 1) The marginal distribution of is (134)

(127) (135)

and consequently (136)

(128) (137) WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1281 where (133) follows by combining observations (c) and (d), and stationarity and martingale convergence. Specifically, by mar- (135) follows by substituting from (132). We have thus proven tingale convergence as the second part of the lemma. a.s. and thus ,which To establish the third part, let denote the cumulative is equal to by sta- distribution function of and consider tionarity, converges to

(154)

(138) by the bounded convergence theorem (recall that is finite valued); and (153) follows from the firstpartofLemma3.Now, recalling the definition of the function in (126), we note that

(155) (139) Thus

(156) (140) (157) (141) (158)

(142) (159)

(143) where (156) follows from the Markov relation , (157) follows from (155), and (159) from the last part of Lemma 3. Thus (144) (160) where (139) follows by substituting from the second part of the lemma and (141) follows by substituting from (132) and noting (161) that (162) (145) where (160) follows by combining (153) with (159), and (161) (146) follows by noting that

(163) (147) (164) (148) (165) (149) This completes the proof of Proposition 4. We have thus established the third and last part of the lemma. B. Evaluation of the Directed Information Rate ProofofProposition4: We have Fig. 1 depicts the directed information rate for the case where takes only two values and .Wehave used numerical evaluation of in the right-hand side (150) of (125) to compute the directed information rate. The figure shows the influence of on the directed in- formation rate where and . As expected, the maximum is achieved when there is higher probability that the (151) encoder output will be the higher rate , which would imply (152) more channel uses per unit time, but not much higher as other- wise the input value will be close to deterministic. (153) Fig. 2 depicts the maximal value (optimized w.r.t. ) of the directed information rate when is fixed and is where (151) follows from the relation between directed infor- equalto1and varies. This value is the capacity of the Poisson mation and causal estimation in (122); (152) follows from the channel with feedback, when the inputs are restricted to one of 1282 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013

Fig. 3. Continuous-time communication with delay and channel of the form ,where isablock-ergodicprocess.
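The channel input process of the Section V model above is easy to simulate, which gives a quick check of the renewal-theoretic facts used in the proof of Proposition 4: each fresh intensity is held for an exponential time with mean equal to its reciprocal, so by renewal-reward the long-run fraction of time spent at intensity $x$ is proportional to $p(x)/x$. The following sketch is not from the paper; the two-point intensity alphabet and pmf mirror the setting of Figs. 1 and 2 but with arbitrary illustrative numbers.

```python
# Simulation sketch of the Section V "Poisson channel with feedback": at each
# output event the input intensity is redrawn i.i.d. from a pmf p on {l1, l2}
# and held constant until the next output event (holding time ~ Exp(intensity)).
import numpy as np

rng = np.random.default_rng(3)
levels = np.array([0.5, 4.0])          # the two admissible intensities l1, l2
p = np.array([0.3, 0.7])               # pmf from which a fresh intensity is drawn

n_events = 200_000
draws = rng.choice(len(levels), size=n_events, p=p)   # intensity after each event
holds = rng.exponential(1.0 / levels[draws])          # time until the next event

time_at = np.bincount(draws, weights=holds, minlength=len(levels))
empirical = time_at / time_at.sum()
theory = (p / levels) / (p / levels).sum()            # proportional to p(x)/x

print("empirical fraction of time at each intensity:", np.round(empirical, 4))
print("renewal-reward prediction p(x)/x (normalized):", np.round(theory, 4))

# The mean number of channel uses (output events) per unit time is the
# reciprocal of the mean holding time E[1/X] under p.
print("events per unit time:", round(n_events / holds.sum(), 4),
      " vs  1 / E[1/X] =", round(1.0 / float(np.sum(p / levels)), 4))
```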

rate characterizes the capacity of the Poisson channel with feedback. In Section VI, we will see that the continuous-time directed information rate characterizes the capacity of a large Fig. 1. Directed information rate between the input and output processes for the continuous-time Poisson channel with feedback, as a function of ,the family of continuous-time channels. pmf of the input to the channel. The input to the channel is one of two possible values and , and it is the intensity of the Poisson process at the VI. COMMUNICATION OVER CONTINUOUS-TIME CHANNELS output of the channel until the next event. WITH FEEDBACK We first review the definition of a block-ergodic process as given by Berger [40]. Let denote a continuous-time process drawn from a space according to the prob- ability measure .For ,let be a -shift transformation, i.e., . A measurable set is -invariant if it does not change under the -shift transformation, i.e., .A continuous-time process is -ergodic if every mea- surable -invariant set of processes has either probability 1 or 0, i.e., for any -invariant set ,inotherwords, . The definition of -ergodicity means that if we take the process and slice it into time blocks of length , then the new discrete-time process is ergodic. A contin- uous-time process is block-ergodic if it is -ergodic Fig. 2. Capacity of the Poisson channel with feedback, in case where channel input is constrained to the binary set ,when is fixed and is equal to for every . Berger [40] showed that weak mixing (there- 1and varies. fore also strong mixing) implies block ergodicity. Now let us describe the communication model of our interest (see Fig. 3) and show that the continuous-time directed infor- the two values or .When , the capacity is obviously mation characterizes the capacity. Consider a continuous-time zero since any use of as input will cause the channel not channel that is specified by to change any further. It is also obviously zero at since 1) the channel input and output alphabets and , respec- in this case , so there is only one possible input to the tively, that are not necessarily finite, and channel. As increases, the capacity of the channel increases 2) the channel output at time without bound since, for , the channel effectively op- erates as a noise-free binary channel, where one symbol “costs” an average duration of 1 while the other a vanishing average du- (167) ration. Thus, the limiting capacity with increasing is equal to . corresponding to the channel input at time ,where One can consider a discrete-time memoryless channel, where is a stationary ergodic noise process on an alphabet the input is discrete ( or ) and the output is distributed and is a given measurable function. according to . Consider now a random cost , A code with delay for the channel consists where is the output of the channel. Using the result from of [39], we obtain that the capacity per unit cost of the discreet 3) a message set , memoryless channel is 4) an encoder that assigns a symbol

(166) (168) where the equality follows since to each message and past received . Finally, we note that the capacity of the Poisson output signal for ,where channel in the example above is the capacity per unit cost of is measurable, and the discrete memoryless channel. Thus, by Proposition 4, we 5) a decoder that assigns a message estimate can conclude that the continuous-time directed information to each received output signal WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1283

,where is Definition 4: Let be the information feedback ca- measurable. pacity defined as We assume that the message is uniformly distributed on and independent of the noise process . By the definition of the channel in (167), the definition of (175) the encoding function in (168), and the independence of and ,itfollowsthatforany and any where the supremum in (175) is over , which is the set of all channel input processes of the form

(169) (176) form a Markov chain. This is analogoustotheassumptioninthe discrete case that ; some family of measurable functions , and some process the analogy is exact when we convert a discrete-time channel which is independent of the channel noise process [ap- to a continuous-time channel with constant piecewise process pearing in (167)] and has a finite cardinality that may depend on between the time samples. Furthermore, for any , , . and , The limit in (175) is shown to exist in Lemma 4 using the superadditivity property. We now characterize in terms (170) of for the class of channels defined in (167). form a Markov chain. This is analogoustotheassumptioninthe Theorem 5: For the channel defined in (167) discrete case that whenever there is feedback of delay , (177) . Similar communication settings with feedback in continuous for all (178) time were studied by Kadota, Zakai, and Ziv [41] for contin- uous-time memoryless channels, where it is shown that feed- Since is a decreasing function in , (178) may back does not increase the capacity, and by Ihara [42], [43] for be written as , and the limit exists the Gaussian case. Our main result in this section is showing that because of the monotonicity. Since the function is monotonic, the operational capacity, defined below, can be characterized by then with a possible exception of the information capacity, which is the maximum of directed in- the points of of a set of measure zero [44, p. 5]. There- formation from the channel input process to the output process. fore, for any except of a set of Next we define an achievable rate, the operational feedback ca- points of measure zero. Furthermore, (177) and (178) imply pacity, and the information feedback capacity for our setting. that ; hence, we also have . Definition 2: Arate is said to be achievable with feedback Before proving the theorem, we show that the limits in (175) delay if for each , there exists a family of codes exist. such that Lemma 4: The term is superadditive, (171) namely,

Definition 3: Let

(172) (179) be the (operational) feedback capacity with delay , and let the (operational) feedback capacity be and therefore, the limit in (175) exists and is equal to

(173)

(180) From the monotonicity of in ,wehave To prove Lemma 4, we use the following result. Lemma 5: Let be a pair of discrete-time (174) processes such that Markov relation holds for .Then This definition coincides with the feedback capacity definition of continuous-time channels given in [41], where there also was assumed a positive but arbitrary small delay in the feedback capacity. (181) 1284 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013

Proof: The result is a consequence of the identity [4, Eq. . In addition, fixapartition of (11)] length such that for any and let . Consider (182) (188) Consider (189) (190) (191)

(183) (192)

(193)

(184) (194)

(185) (195) (186) where (183) follows from the identity given in (182), and (185) follows from the Markov chain assumption in the lemma. (196) Proof of Lemma 4: Firstnotethatwedonotincreasethe term by restricting the time parti- (197) tion to have an interval starting at point .Nowfixthreetime partitions: in , in ,and in (198) such that is a concatenation and .For and , fix the input functions of the form of (176) and fix the arguments where the equality in (188) follows since the message is dis- and which corresponds to and ,re- tributed uniformly, the inequality in (190) follows from Fano’s spectively. The construction is such that the random processes inequality, where , the equality in (193) fol- and are independent of each other. Let lows from the fact that is a deterministic function of be a concatenation of and . Applying Lemma 5 and , the equality in (194) follows from the assump- on the discrete-time process ,where tion that , the equality in (196) follows from for , we obtain that for (169), and the equality in (197) follows from (170). Hence, we any fixed , , , , ,and as described obtained that for every above, we have (199)

Since the number of codewords is finite, we may consider the (187) input signal of the form with ,where the cardinality of is bounded, i.e., for any given Note that the Markov condition (the bound may depend on ), independently of the partition indeed holds because of the construction of .Furthermore . Furthermore, because of the stationarity of the noise (187) implies (179). Finally, using Fekete’s lemma [45, Ch. (200) 2.6], and the superadditivity in (179) implies the existence of the limit in (180). (201) The proof of Theorem 5 consists of two parts: the proof of the converse, i.e., (177), and the proof of achievability, i.e., (178). Finally, for any that is achievable, there exists a sequence of Proof of the Converse for Theorem 5: Fix an encoding codes such that ; hence, and we have scheme with rate and probability of decoding error, established (178). WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1285

Note that as a byproduct of the sequence of equalities essential to divide into small time interval as well as increasing (190)–(198), we conclude that for the communication system the feedback delay by a small but positive value . depicted in Fig. 3 Proof of Achievability for Theorem 5: Let , where . In addition, let be such that for all .Let be of (202) the form

(203) (206)

The only assumptions that we used to prove (190)–(198) are that where the cardinality of is bounded. Then, we show that any the encoders uses a strictly causal feedback of the form given in rate (176) and that the channel satisfies the benign assumption given in (169). This might be a valuable result by itself that provides (207) a good intuition why directed information characterizes the ca- pacity of a continuous-time channel. Furthermore, the interpre- is achievable. tations of the measure , for instance, as given in [26], Assume that the communication is over the time interval should also hold for directed information and vice versa. ,where is fixed and may be chosen to be as large as For the proof of achievability, we will use the following result needed. Partition the time interval into subintervals for discrete-time channels. of length and in each subinterval ,whichwe Lemma 6: Consider the discrete-time channel, where the index by , fixtherelation input at time has a finite alphabet, i.e., ,andthe output at time hasanarbitraryalphabet . We assume that (208) the relation between the input and the output is given by . Note that this coding scheme is possible with feedback delay (204) since . This follows from the assumption that and .Now,letusdefine a discrete-time where the noise process is stationary and ergodic with channel where the input at time is (which an arbitrary alphabet . Then, any rate is achievable for this has an alphabet ), the output at time is the channel if vector and the (205) noise at time is .Notethatsince is a stationary and block-ergodic, the noise process is stationary and ergodic. Furthermore, the relation where the joint distribution of is induced by the input holds and the alphabet of is finite. Hence, distribution , the stationary distribution of , and (204). by Lemma 6, any rate Proof: Fix the pmf that attains the maximum in (205). Since can be approximated arbitrarily close by a finite (209) partition of [16], assume without loss of generality that is finite. The proof uses the random codebook generation and joint is achievable. Now using the definition of the discrete-time typicality decoding in [46, Ch. 3]. Randomly and independently channel and the properties of directed information, we obtain generate codewords , , each ac- cording to . The decoder finds the unique such (210) that is jointly typical. (For the definition and prop- (211) erties of joint typicality, refer to [46, Ch. 2] and [47].) Now, as- suming that is sent, the decoder makes an error only if (212) is not typical or is typical for some where the equality in (210) follows from the definition of the . By the packing lemma (see [46, Ch. 3]), the probability discrete-time channel and the equality in (212) follows from the of the second event tends to zero as if . same sequence of equalities as in (190)–(198). Since (212) holds To bound the probability of the first event, recall from [48, Th. for any such that ,weconcludethat 10.3.1], that if is i.i.d. and is stationary ergodic, inde- pendent of , then the pair is jointly stationary er- (213) godic. Consequently, from the definition of the channel in (204), is jointly stationary ergodic. Thus, by Birkhoff’s er- Finally, by the definition of directed information and by the fact godic theorem, the probability that is not typical that (213) holds for any , we have established (178). tends to zero as . Therefore, any rate is achievable. VII. 
CONCLUDING REMARKS The proof of achievability is based on the lemma above and We have introduced and developed a notion of directed the definition of directed information for continuous time. It is information between continuous-time stochastic processes. It 1286 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 3, MARCH 2013 emerges naturally in the characterization of the fundamental [15] H. H. Permuter, Y. H. Kim, and T. Weissman, “Interpretations of di- limit on reliable communication for a wide class of contin- rected information in portfolio theory, , and hypoth- esis testing,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3248–3259, uous-time channels with feedback, quite analogously to the Jun. 2011. discrete-time setting. It also arises in estimation-theoretic [16] R. G. Gallager, Information Theory and Reliable Communication. relations as the replacement for mutual information when New York: Wiley, 1968. [17] M. S. Pinsker, Information and Information Stability of Random Vari- extending the scope to the presence of feedback. In particular, ables and Processes. San Francisco, CA: Holden-Day, 1964. with continuous-time directed information replacing mutual [18] T. E. Duncan, “On the calculation of mutual information,” SIAM J. information, Duncan’s theorem generalizes to estimation prob- Appl. Math., vol. 19, no. 1, pp. 215–220, Jul. 1970. [19] A. Kolmogorov, “On the Shannon theory of information transmission lems in which the evolution of the target signal is affected by in the case of continuous signals,” IRE Trans. Inf. Theory, vol. 2, no. the past channel noise. An analogous relationship based on the 4, pp. 102–108, Dec. 1956. directed information holds for the Poisson channel. We have [20] A. D. Wyner, “A definition of conditional mutual information for arbi- trary ensembles,” Inf. Control, vol. 38, no. 1, pp. 59–61, Jul. 1978. illustrated the use of the latter in an explicit computation of [21] O. Kallenberg, Foundations of Modern Probability,ser.SpringerSe- the directed information rate between the input and output of a ries in Statistics, 2nd ed. New York: Springer, 2002. Poisson channel where the input intensity changes only when [22] J. Massey and P. Massey, “Conservation of mutual and directed infor- mation,” in Proc.Int.Symp.Inf.Theory, Sep. 2005, pp. 157–158. there is an event at the channel output. One important direction [23] A. Cohen, N. Merhav, and T. Weissman, “Scanning and sequential de- for future exploration is to use the “multiletter” characteriza- cision making for multidimensional data, Part II: The noisy case,” IEEE tion of capacity developed here to compute or approximate the Trans. Inf. Theory, vol. 54, no. 12, pp. 5609–5631, Dec. 2008. [24] D. Guo, S. Shamai, and S. Verdú, “Mutual information and minimum feedback capacity of interesting continuous-time channels. mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005. [25] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Cal- ACKNOWLEDGMENT culus. New York: Springer-Verlag, 1991. [26] T. T. Kadota, M. Zakai, and J. Ziv, “Mutual information of the white The authors thank the Associate Editor and the anonymous Gaussian channel with and without feedback,” IEEE Trans. Inf. Theory, reviewers for their careful reading of the original manu- vol. IT-17, no. 4, pp. 368–371, Jul. 1971. [27] T. 
Weissman, “The relationship between causal and noncausal mis- script and many valuable comments that helped improve the matched estimation in continuous-time AWGN channels,” IEEE Trans. presentation. Inf. Theory, vol. 56, no. 9, pp. 4256–4273, Sep. 2010. [28] R. Atar and T. Weissman, “Mutual information, relative entropy, and estimation in the Poisson channel,” IEEE Trans. Inf. Theory, vol. 58, REFERENCES no. 4, pp. 1302–1318, Mar. 2012. [29] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes II: [1] J. Massey, “Causality, feedback, and directed information,” in Proc. Applications. New York: Springer-Verlag, 2001. Int. Symp. Inf. Theory Appl., Nov. 1990, pp. 303–305. [30] R. S. Liptser and A. N. Shiryaev, Point Processes and Queues: Mar- [2] G. Kramer, “Capacity results for the discrete memoryless network,” tingale Dynamics. New York: Springer-Verlag, 1982. IEEE Trans. Inf. Theory, vol. 49, no. 1, pp. 4–21, Jan. 2003. [31] J. Mazo and J. Salz, “On optical data communication via direct detec- [3] S. Tatikonda and S. Mitter, “The capacity of channels with feedback,” tion of light pulses,” Bell Syst. Tech. J., vol. 55, pp. 347–369, 1976. IEEE Trans. Inf. Theory, vol. 55, no. 1, pp. 323–349, Jan. 2009. [32] Y. M. Kabanov, “The capacity of a channel of the Poisson type,” [4] Y.-H. Kim, “A coding theorem for a class of stationary channels with Theory Probab. Appl., vol. 23, no. 1, pp. 143–147, 1978. feedback,” IEEE Trans. Inf. Theory, vol. 54, no. 4, pp. 1488–1499, Apr. [33] M. Davis, “Capacity and cutoff rate for Poisson-type channels,” IEEE 2008. Trans. Inf. Theory, vol. IT-26, no. 6, pp. 710–715, Nov. 1980. [5] H. H. Permuter, T. Weissman, and A. J. Goldsmith, “Finite state [34] A. D. Wyner, “Capacity and error exponent for the direct detection channels with time-invariant deterministic feedback,” IEEE Trans. photon channel—part I,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. Inf. Theory, vol. 55, no. 2, pp. 644–662, Feb. 2009. 1449–1461, Nov. 1988. [6] J. Chen and T. Berger, “The capacity of finite-state Markov channels [35] A. D. Wyner, “Capacity and error exponent for the direct detection with feedback,” IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 780–789, photon channel—part II,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. Mar. 2005. 1462–1471, Nov. 1988. [7] Y.-H. Kim, “Feedback capacity of stationary Gaussian channels,” IEE [36] A. Lapidoth, “On the reliability function of the ideal Poisson channel Trans. Inf. Theory, vol. 57, no. 1, pp. 57–85, Jan. 2010. with noiseless feedback,” IEEE Trans. Inf. Theory,vol.39,no.2,pp. [8] H.H.Permuter,P.Cuff,B.V.Roy,andT.Weissman,“Capacityofthe 491–503, Mar. 1993. trapdoor channel with feedback,” IEEE Trans. Inf. Theory, vol. 54, no. [37] D. Guo, S. Shamai, and S. Verdu, “Mutual information and conditional 7, pp. 3150–3165, Jul. 2008. mean estimation in Poisson channels,” IEEE Trans. Inf. Theory, vol. [9] H. H. Permuter, T. Weissman, and J. Chen, “Capacity region of the 54, no. 5, pp. 1837–1849, May 2008. finite-state multiple access channel with and without feedback,” IEEE [38] S. Bross, A. Lapidoth, and L. Wang, “The Poisson channel with side Trans. Inf. Theory, vol. 55, no. 6, pp. 2455–2477, Jun. 2009. information,” in Proc. 47th Allerton Conf. Commun. Control Comput., [10]R.DaboraandA.J.Goldsmith,“Capacity theorems for discrete, fi- Sep. 2009, pp. 574–578. nite-state broadcast channels with feedback and unidirectional receiver [39] S. Verdú, “On channel capacity per unit cost,” IEEE. Trans. Inf. Theory, cooperation,” IEEE Trans. 
Inf. Theory, vol. 56, no. 12, pp. 5958–5983, vol. 36, no. 5, pp. 1019–1030, Sep. 1990. Dec. 2010. [40] T. Berger, “Rate distortion theory for sources with abstract alphabets [11] B. Shrader and H. Permuter, “Feedback capacity of the compound and memory,” Inf. Control, vol. 13, no. 3, pp. 254–273, Sep. 1968. channel,” IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3629–3644, [41] T. T. Kadota, M. Zakai, and J. Ziv, “Capacity of a continuous memo- Aug. 2009. ryless channel with feedback,” IEEE Trans. Inf. Theory, vol. 17, no. 4, [12] R. Venkataramanan and S. S. Pradhan, “Source coding with feed-for- pp. 372–378, Jul. 1971. ward: Rate-distortion theorems and error exponents for a general [42] S. Ihara, “Coding theorems for a continuous-time Gaussian channel source,” IEEE Trans. Inf. Theory, vol. 53, no. 6, pp. 2154–2179, Jun. with feedback,” IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 2041–2044, 2007. Nov. 1994. [13] S. Pradhan, “On the role of feedforward in Gaussian sources: Point-to- [43] S. Ihara, Information Theory for Continuous Systems. River Edge, point source coding and multiple description source coding,” IEEE NJ: World Scientific, 1993. Trans. Inf. Theory, vol. 53, no. 1, pp. 331–349, Jan. 2007. [44] F. Riesz and B. Sz.-Nagy, Functional Analysis, 2nd ed. New York: [14] H. H. Permuter, Y.-H. Kim, and T. Weissman, “On directed informa- Dover. tion and gambling,” in Proc.IEEEInt.Symp.Inf.Theory, Toronto, ON, [45] A. Schrijver, Combinatorial Optimization: Polyhedra and Effi- Canada, Jul. 2008, pp. 1403–1407. ciency. New York: Springer-Verlag, 2003. WEISSMAN et al.: DIRECTED INFORMATION, CAUSAL ESTIMATION, AND COMMUNICATION IN CONTINUOUS TIME 1287

[46] A. El Gamal and Y.-H. Kim, Network Information Theory.Cam- Young-Han Kim (S’99–M’06–SM’12) received the B.S. degree with honors bridge, U.K.: Cambridge Univ. Press, 2012. in electrical engineering from Seoul National University, Seoul, Korea, in 1996 [47] A. Orlitsky and J. R. Roche, “Coding for computing,” IEEE Trans. Inf. and the M.S. degrees in electrical engineering and in statistics, and the Ph.D. de- Theory, vol. 47, no. 3, pp. 903–917, Mar. 2001. gree in electrical engineering from Stanford University, Stanford, CA, in 2001, [48] J. Wolfowitz, Coding Theorems of Information Theory, 2nd ed. New 2006, and 2006, respectively. York: Springer-Verlag, 1964. In July 2006, he joined the University of California, San Diego, where he is an Associate Professor of Electrical and Computer Engineering. His research interests are in statistical signal processing and information theory, with appli- cations in communication, control, computation, networking, data compression, and learning. Dr. Kim is a recipient of the 2008 NSF Faculty Early Career Develop- ment (CAREER) Award the 2009 US-Israel Binational Science Foundation Bergmann Memorial Award, and the 2012 IEEE Information Theory Paper Award. He is currently on the Editorial Board of the IEEE TRANSACTIONS ON INFORMATION THEORY, serving as an Associate Editor for Shannon Theory. He is also serving as a Distinguished Lecturer for the IEEE Information Theory Tsachy Weissman (S’99–M’02–SM’07–F’13) graduated summa cum laude Society. with a B.Sc. in electrical engineering from the Technion in 1997, and earned his Ph.D. at the same place in 2001. He then worked at Hewlett-Packard Laboratories with the information theory group until 2003, when he joined Stanford University, where he is Associate Professor of Electrical Engineering Haim H. Permuter (M’08) received his B.Sc. (summa cum laude) and M.Sc. and incumbent of the STMicroelectronics chair in the School of Engineering. (summa cum laude) degrees in Electrical and Computer Engineering from the He has spent leaves at the Technion, and at ETH Zurich. Ben-Gurion University, Israel, in 1997 and 2003, respectively, and the Ph.D. Dr. Weissman’s research is focused on information theory, statistical signal degree in Electrical Engineering from Stanford University, California in 2008. processing, the interplay between them, and their applications. Between 1997 and 2004, he was an officer at a research and development unit He is a recipient of several best paper awards, and prizes for excellence in of the Israeli Defense Forces. He is currently a senior lecturer at Ben-Gurion research. University. He currently serves on the editorial boards of the IEEE TRANSACTIONS ON Dr. Permuter is a recipient of the Fulbright Fellowship, the Stanford Grad- INFORMATION THEORY and Foundations and Trends in Communications and uate Fellowship (SGF), Allon Fellowship, and the 2009 U.S.-Israel Binational Information Theory. Science Foundation Bergmann Memorial Award.