On Directed Information and Gambling

Haim H. Permuter (Stanford University, Stanford, CA, USA; [email protected])
Young-Han Kim (University of California, San Diego, La Jolla, CA, USA; [email protected])
Tsachy Weissman (Stanford University, Stanford, CA, USA / Technion, Haifa, Israel; [email protected])

Abstract—We study the problem of gambling in horse races with causal side information and show that Massey's directed information characterizes the increment in the maximum achievable capital growth rate due to the availability of side information. This result gives a natural interpretation of directed information I(Y^n → X^n) as the amount of information that Y^n causally provides about X^n. Extensions to stock market portfolio strategies and to data compression with causal side information are also discussed.

I. INTRODUCTION

Mutual information arises as the canonical answer to a variety of problems. Most notably, Shannon [1] showed that the capacity C, the maximum data rate for reliable communication over a discrete memoryless channel p(y|x) with input X and output Y, is given by

    C = \max_{p(x)} I(X; Y),                                        (1)

which leads naturally to the operational interpretation of mutual information I(X;Y) = H(X) − H(X|Y) as the amount of uncertainty about X that can be reduced by observing Y, or equivalently, the amount of information that Y can provide about X. Indeed, mutual information I(X;Y) plays the central role in Shannon's random coding argument because the probability that independently drawn X^n and Y^n sequences "look" as if they were drawn jointly decays exponentially with exponent I(X;Y). Shannon also proved a dual result [2] showing that the minimum compression rate R needed to satisfy a fidelity criterion D between the source X and its reconstruction X̂ is given by R(D) = \min_{p(x̂|x)} I(X; X̂). In another duality result (Lagrange duality this time) to (1), Gallager [3] proved the minimax redundancy theorem, connecting the redundancy of the universal lossless source code to the capacity of the channel whose conditional distributions are given by the set of possible source distributions.

Later on, it was shown that mutual information also plays an important role in problems that are not necessarily related to describing sources or transferring information through channels. Perhaps the most lucrative example is the use of mutual information in gambling.

Kelly showed in [4] that if each horse race outcome can be represented as an independent and identically distributed (i.i.d.) copy of a random variable X, and the gambler has some side information Y relevant to the outcome X of the race, then under some conditions on the odds, the mutual information I(X;Y) captures the difference between the growth rates of the optimal gambler's wealth with and without the side information Y. Thus, Kelly's result gives an interpretation of the mutual information I(X;Y) as the value of the side information Y for the horse race X.

In order to tackle problems arising in information systems with causally dependent components, Massey [5] introduced the notion of directed information as

    I(X^n \to Y^n) \triangleq \sum_{i=1}^n I(X^i; Y_i \mid Y^{i-1}),

and showed that the maximum directed information upper bounds the capacity of channels with feedback. Subsequently, it was shown that Massey's directed information and its variants indeed characterize the capacity of feedback and two-way channels [6]–[13] and the rate distortion function with feedforward [14].

The main contribution of this paper is showing that directed information I(Y^n → X^n) has a natural interpretation in gambling as the gain in growth rate due to causal side information. As a special case, if the horse race outcome and the corresponding side information sequences are i.i.d., then the (normalized) directed information becomes the single-letter mutual information I(X;Y), which coincides with Kelly's result.

The rest of the paper is organized as follows. We describe the notation of directed information and causal conditioning in Section II. In Section III, we formulate the horse-race gambling problem, in which side information is revealed causally to the gambler. We present the main result in Section IV and an analytically solved example in Section V. Finally, Section VI concludes the paper and states two possible extensions of this work, to the stock market and to data compression with causal side information.

II. DIRECTED INFORMATION AND CAUSAL CONDITIONING

Throughout this paper, we use the causal conditioning notation (·||·) developed by Kramer [6]. We denote by p(x^n||y^{n−d}) the probability mass function (pmf) of X^n = (X_1, ..., X_n) causally conditioned on Y^{n−d}, for some integer d ≥ 0, which is defined as

    p(x^n \| y^{n-d}) \triangleq \prod_{i=1}^n p(x_i \mid x^{i-1}, y^{i-d}).



(By convention, y^{i−d} is set to null whenever i − d ≤ 0.) In particular, we use extensively the cases d = 0, 1:

    p(x^n \| y^n) \triangleq \prod_{i=1}^n p(x_i \mid x^{i-1}, y^i),

    p(x^n \| y^{n-1}) \triangleq \prod_{i=1}^n p(x_i \mid x^{i-1}, y^{i-1}).

Using the chain rule, we can easily verify that

    p(x^n, y^n) = p(x^n \| y^n) \, p(y^n \| x^{n-1}).

The causally conditional entropy H(X^n||Y^n) is defined as

    H(X^n \| Y^n) \triangleq E[-\log p(X^n \| Y^n)] = \sum_{i=1}^n H(X_i \mid X^{i-1}, Y^i).

Under this notation, directed information can be written as

    I(Y^n \to X^n) = \sum_{i=1}^n I(Y^i; X_i \mid X^{i-1}) = H(X^n) - H(X^n \| Y^n),

which hints, in a rough analogy to mutual information, at a possible interpretation of directed information I(Y^n → X^n) as the amount of information that the causally available side information Y^n can provide about X^n.

Note that the channel capacity results involve the term I(X^n → Y^n), which measures the information flow over the forward link X → Y. In contrast, in gambling the gain in growth rate is due to the side information (the backward link), and therefore the expression I(Y^n → X^n) appears.
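For small alphabets and horizons, the quantities defined above can be evaluated by brute force. The following minimal Python sketch (ours, not part of the original paper) computes p(x^n||y^n) and I(Y^n → X^n) = H(X^n) − H(X^n||Y^n), assuming the joint pmf is given as a dictionary over pairs of tuples; the test case at the bottom checks the single-letter reduction I(Y^n → X^n) = n I(X;Y) for i.i.d. (X_i, Y_i) pairs mentioned in the introduction.

```python
import itertools
from collections import defaultdict
from math import log2

def causal_pmf(joint, x, y):
    """p(x^n || y^n) = prod_i p(x_i | x^{i-1}, y^i), obtained by
    marginalizing the joint pmf over futures at every step."""
    prob = 1.0
    for i in range(1, len(x) + 1):
        num = sum(p for (a, b), p in joint.items()       # p(x^i, y^i)
                  if a[:i] == x[:i] and b[:i] == y[:i])
        den = sum(p for (a, b), p in joint.items()       # p(x^{i-1}, y^i)
                  if a[:i - 1] == x[:i - 1] and b[:i] == y[:i])
        prob *= num / den
    return prob

def directed_information(joint):
    """I(Y^n -> X^n) = H(X^n) - H(X^n || Y^n), in bits."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    h_x = -sum(p * log2(p) for p in px.values() if p > 0)
    h_x_causal = -sum(p * log2(causal_pmf(joint, x, y))
                      for (x, y), p in joint.items() if p > 0)
    return h_x - h_x_causal

# Test: (X_i, Y_i) i.i.d., X a fair bit, Y = X with probability 1 - q.
# Then I(Y^n -> X^n) should equal n * I(X;Y) = n * (1 - h(q)).
n, q = 3, 0.1
pair = {(a, b): 0.5 * (1 - q if a == b else q)
        for a in (0, 1) for b in (0, 1)}
joint = {}
for xs in itertools.product((0, 1), repeat=n):
    for ys in itertools.product((0, 1), repeat=n):
        p = 1.0
        for ab in zip(xs, ys):
            p *= pair[ab]
        joint[(xs, ys)] = p
h = lambda t: -t * log2(t) - (1 - t) * log2(1 - t)
print(directed_information(joint), n * (1 - h(q)))  # both ~1.593 bits
```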

III. GAMBLING IN HORSE RACES WITH CAUSAL SIDE INFORMATION

Suppose that m horses run in an infinite sequence of horse races, and let X_i ∈ X ≜ {1, 2, ..., m}, i = 1, 2, ..., denote the horse that wins at time i. Before betting on the i-th horse race, the gambler knows some side information Y_i ∈ Y. We assume that the gambler invests all his capital in the horse race as a function of the information available to him at time i, i.e., the previous horse race outcomes X^{i−1} and the side information Y^i up to time i. Let b(x_i|x^{i−1}, y^i) be the proportion of wealth that the gambler bets on horse x_i given X^{i−1} = x^{i−1} and Y^i = y^i. The betting scheme should satisfy b(x_i|x^{i−1}, y^i) ≥ 0 (no shorting) and Σ_{x_i} b(x_i|x^{i−1}, y^i) = 1 for any history (x^{i−1}, y^i). Let o(x_i|x^{i−1}) denote the odds of horse x_i given the previous outcomes x^{i−1}, which is the amount of capital that the gambler receives for each unit of capital invested in the horse. We denote by S(x^n||y^n) the gambler's wealth after n races when the race outcomes were x^n and the side information y^n was causally available. The growth, denoted by W(X^n||Y^n), is defined as the expected logarithm (base 2) of the gambler's wealth, i.e.,

    W(X^n \| Y^n) \triangleq E[\log S(X^n \| Y^n)].                 (2)

Finally, the growth rate (1/n) W(X^n||Y^n) is defined as the normalized growth.

Here is a summary of the notation:
• X_i is the outcome of the horse race at time i.
• Y_i is the side information at time i.
• o(X_i|X^{i−1}) is the payoff at time i for horse X_i given that the horses X^{i−1} won the previous races.
• b(X_i|X^{i−1}, Y^i) is the fraction of the gambler's wealth invested in horse X_i at time i given that the outcomes of the previous races are X^{i−1} and the side information available at time i is Y^i.
• S(X^n||Y^n) is the gambler's wealth after n races when the outcomes of the races are X^n and the side information Y^n is causally available.
• (1/n) W(X^n||Y^n) is the growth rate.

Without loss of generality, we assume that the gambler's initial capital is 1; therefore S_0 = 1.

IV. MAIN RESULTS

In Subsection IV-A, we assume that the gambler invests all his money in the horse race, while in Subsection IV-B, we allow the gambler to invest only part of the money. Using Kelly's result, it is shown in Subsection IV-B that if the odds are fair with respect to some distribution, then the gambler should invest all his money in the race.

A. Investing all the money in the horse race

We assume that at any time n the gambler invests all his capital and, therefore,

    S(X^n \| Y^n) = b(X_n \mid X^{n-1}, Y^n) \, o(X_n \mid X^{n-1}) \, S(X^{n-1} \| Y^{n-1}).

This also implies that

    S(X^n \| Y^n) = \prod_{i=1}^n b(X_i \mid X^{i-1}, Y^i) \, o(X_i \mid X^{i-1}).

The following theorem characterizes the optimal betting strategy and the corresponding growth of wealth.

Theorem 1: For any finite horizon n, the maximum growth rate is achieved when the gambler invests money proportionally to the causal conditioning distribution, i.e.,

    b^*(x_i \mid x^{i-1}, y^i) = p(x_i \mid x^{i-1}, y^i), \quad \forall x^i, y^i, \ i \le n,   (3)

and the maximum growth is

    W^*(X^n \| Y^n) = E[\log o(X^n)] - H(X^n \| Y^n),

where o(X^n) denotes the accumulated odds \prod_{i=1}^n o(X_i \mid X^{i-1}).

Note that the sequence {p(x_i|x^{i−1}, y^i)}_{i=1}^n uniquely determines p(x^n||y^n). Moreover, for all pairs (x^n, y^n) such that p(x^n||y^n) > 0, the sequence {p(x_i|x^{i−1}, y^i)}_{i=1}^n is determined uniquely by p(x^n||y^n) through the identity

    p(x_i \mid x^{i-1}, y^i) = \frac{p(x^i \| y^i)}{p(x^{i-1} \| y^{i-1})}.

A similar argument applies to {b^*(x_i|x^{i−1}, y^i)}_{i=1}^n and b^*(x^n||y^n), and therefore (3) is equivalent to

    b^*(x^n \| y^n) = p(x^n \| y^n), \quad \forall x^n \in \mathcal{X}^n, \ y^n \in \mathcal{Y}^n.
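Before turning to the proof, here is a small numerical illustration of Theorem 1 (our sketch, not from the paper) in the i.i.d. special case: two horses, fair uniform odds o = 2, and Y_i a noisy copy of X_i with crossover probability q. Betting b^*(x_i|y_i) = p(x_i|y_i) attains W^* = E[log o(X^n)] − H(X^n||Y^n) = n(1 − h(q)), while ignoring the side information yields zero growth.

```python
from itertools import product
from math import log2

n, q, odds = 4, 0.2, 2.0   # races, side-info noise, fair uniform odds

def p_pair(x, y):          # joint pmf of one (X_i, Y_i) pair; X is a fair bit
    return 0.5 * (1 - q if x == y else q)

def growth(bet):
    """W = E[log S(X^n||Y^n)] for a memoryless betting rule bet(x, y)."""
    w = 0.0
    for xs in product((0, 1), repeat=n):
        for ys in product((0, 1), repeat=n):
            p = 1.0
            for x, y in zip(xs, ys):
                p *= p_pair(x, y)
            w += p * sum(log2(bet(x, y) * odds) for x, y in zip(xs, ys))
    return w

h = lambda t: -t * log2(t) - (1 - t) * log2(1 - t)
bet_opt = lambda x, y: p_pair(x, y) / 0.5        # b*(x|y) = p(x|y)
print(growth(bet_opt))                           # ~ n * (1 - h(q)) = 1.112
print(n * (1 - h(q)))                            # growth predicted by Theorem 1
print(growth(lambda x, y: 0.5))                  # no side information: 0.0
```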



Proof of Theorem 1: We have

    W^*(X^n \| Y^n) = \max_{b(x^n \| y^n)} E[\log b(X^n \| Y^n) \, o(X^n)]
                    = \max_{b(x^n \| y^n)} E[\log b(X^n \| Y^n)] + E[\log o(X^n)]
                    = -H(X^n \| Y^n) + E[\log o(X^n)],

where the last equality is achieved by choosing b(x^n||y^n) = p(x^n||y^n), and it is justified by the following upper bound:

    E[\log b(X^n \| Y^n)]
        = \sum_{x^n, y^n} p(x^n, y^n) \left[ \log p(x^n \| y^n) + \log \frac{b(x^n \| y^n)}{p(x^n \| y^n)} \right]
        = -H(X^n \| Y^n) + \sum_{x^n, y^n} p(x^n, y^n) \log \frac{b(x^n \| y^n)}{p(x^n \| y^n)}
        \overset{(a)}{\le} -H(X^n \| Y^n) + \log \sum_{x^n, y^n} p(x^n, y^n) \frac{b(x^n \| y^n)}{p(x^n \| y^n)}
        \overset{(b)}{\le} -H(X^n \| Y^n) + \log \sum_{x^n, y^n} p(y^n \| x^{n-1}) \, b(x^n \| y^n)
        = -H(X^n \| Y^n),                                           (4)

where (a) follows from Jensen's inequality and (b) from the chain rule p(x^n, y^n) = p(x^n||y^n) p(y^n||x^{n−1}) together with the fact that Σ_{x^n,y^n} p(y^n||x^{n−1}) b(x^n||y^n) = 1. All summations in (4) are over the arguments (x^n, y^n) for which p(x^n, y^n) > 0. This ensures that p(x^n||y^n) > 0, and therefore we can multiply and divide by p(x^n||y^n) in the first step of (4). ∎

In the case in which the odds are fair and uniform, i.e., o(X_i|X^{i−1}) = |X|, we obtain

    \frac{1}{n} W^*(X^n \| Y^n) = \log |\mathcal{X}| - \frac{1}{n} H(X^n \| Y^n).

Thus, the sum of the growth rate (1/n) W^*(X^n||Y^n) and the entropy rate (1/n) H(X^n||Y^n) of the horse race process, conditioned causally on the side information, is constant, and one can see a duality between H(X^n||Y^n) and W^*(X^n||Y^n); cf. [15, Th. 6.1.3].

Let us denote by ΔW the increase in the growth rate due to causal side information, i.e.,

    \Delta W \triangleq \frac{1}{n} W^*(X^n \| Y^n) - \frac{1}{n} W^*(X^n).   (5)

Thus ΔW characterizes the value of the side information Y^n. Theorem 1 leads to the following corollary, which gives a new operational meaning to Massey's directed information.

Corollary 1: The increase in growth rate due to causal side information Y^n for horse races X^n is

    \Delta W = \frac{1}{n} I(Y^n \to X^n).                          (6)

Proof: From Theorem 1, we have

    W^*(X^n \| Y^n) - W^*(X^n) = -H(X^n \| Y^n) + H(X^n) = I(Y^n \to X^n). ∎

B. Investing only part of the money

In this subsection, we consider the case wherein the gambler does not necessarily invest all his money in the race. Let b_0(x^{i−1}, y^i) be the portion of money that the gambler withholds from gambling at time i given that the previous race results were x^{i−1} and the side information is y^i. In this setting, the wealth is given by

    S(X^n \| Y^n) = \prod_{i=1}^n \left[ b_0(X^{i-1}, Y^i) + b(X_i \mid X^{i-1}, Y^i) \, o(X_i \mid X^{i-1}) \right],

and the growth W(X^n||Y^n) is defined as before in (2).

The term W(X^n||Y^n) obeys a chain rule similar to that defining the causally conditional entropy H(X^n||Y^n), i.e.,

    W(X^n \| Y^n) = \sum_{i=1}^n W(X_i \mid X^{i-1}, Y^i),

where

    W(X_i \mid X^{i-1}, Y^i) \triangleq E\left[ \log\left( b_0(X^{i-1}, Y^i) + b(X_i \mid X^{i-1}, Y^i) \, o(X_i \mid X^{i-1}) \right) \right].

Note that for any given history (x^{i−1}, y^i) ∈ X^{i−1} × Y^i, the betting scheme {b_0(x^{i−1}, y^i), b(x_i|x^{i−1}, y^i)} influences only W(X_i|X^{i−1}, Y^i), so that we have

    \max_{\{b_0(x^{i-1}, y^i), \, b(x_i \mid x^{i-1}, y^i)\}_{i=1}^n} W(X^n \| Y^n)
        = \sum_{i=1}^n \max_{b_0(x^{i-1}, y^i), \, b(x_i \mid x^{i-1}, y^i)} W(X_i \mid X^{i-1}, Y^i)
        = \sum_{i=1}^n \sum_{x^{i-1}, y^i} p(x^{i-1}, y^i) \max_{b_0(x^{i-1}, y^i), \, b(x_i \mid x^{i-1}, y^i)} W(X_i \mid x^{i-1}, y^i).

The optimization problem in the last equation is equivalent to the problem of finding the optimal betting strategy in the memoryless case, where the winning horse distribution is p(x) = Pr(X_i = x | x^{i−1}, y^i), the odds are o(x) = o(X_i = x | x^{i−1}), and the betting strategy (b_0, b(x)) is (b_0(x^{i−1}, y^i), b(X_i = x | x^{i−1}, y^i)). Hence, the optimization max W(X_i|x^{i−1}, y^i) is equivalent to the following convex problem:

    maximize    \sum_x p(x) \log(b_0 + b(x) o(x))
    subject to  b_0 + \sum_x b(x) = 1,
                b_0 \ge 0, \ b(x) \ge 0, \quad \forall x \in \mathcal{X}.
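Before recalling Kelly's characterization of the solution, note that this convex problem is easy to solve numerically. The following sketch (ours; it assumes scipy is available, and the parameter values are illustrative) maximizes the growth over (b_0, b) with SLSQP; with super-fair odds it returns b_0 ≈ 0 and proportional betting, and with sub-fair odds it keeps part of the wealth as cash, matching the case analysis below.

```python
import numpy as np
from scipy.optimize import minimize

def kelly_with_cash(p, o):
    """Numerically maximize sum_x p(x) log2(b0 + b(x) o(x))
    subject to b0 + sum_x b(x) = 1 and b0, b(x) >= 0."""
    m = len(p)
    def neg_growth(v):                        # v = [b0, b(1), ..., b(m)]
        wealth = v[0] + v[1:] * o             # payoff in each outcome
        return -np.dot(p, np.log2(np.maximum(wealth, 1e-12)))
    res = minimize(neg_growth, np.full(m + 1, 1.0 / (m + 1)),
                   method='SLSQP', bounds=[(0.0, 1.0)] * (m + 1),
                   constraints={'type': 'eq', 'fun': lambda v: v.sum() - 1.0})
    return res.x[0], res.x[1:]                # (b0, b)

p = np.array([0.7, 0.3])
# Super-fair odds (sum_x 1/o(x) < 1): bet everything, b0 ~ 0 and b ~ p.
print(kelly_with_cash(p, np.array([3.0, 3.0])))
# Sub-fair odds (sum_x 1/o(x) > 1): keep cash; here b0 ~ 0.675,
# b ~ (0.325, 0), i.e., bet only on the favorable horse.
print(kelly_with_cash(p, np.array([1.8, 1.8])))
```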
The solution to this optimization problem was given by Kelly [4]. If the odds are super-fair, namely \sum_x 1/o(x) \le 1, then the gambler will invest all his wealth in the race rather than leave some as cash, since by betting b(x) = c/o(x), where c = 1 / \sum_x 1/o(x), the gambler's money is multiplied by c ≥ 1 regardless of the race outcome. Therefore, in this case, the solution is given by Theorem 1, where the gambler invests proportionally to the causal conditioning distribution p(x^n||y^n).



If the odds are sub-fair, i.e., \sum_x 1/o(x) > 1, then it is optimal to bet only some of the money, i.e., b_0 > 0. The solution to this problem is given in terms of an algorithm in [4, p. 925].

V. AN EXAMPLE

Here we consider betting on a horse race where the winning horse can be represented as a Markov process and causal side information is available.

Example 1: Consider the horse race process depicted in Figure 1, where two horses race and the winning horse X_i behaves as a Markov process. A horse that won will win again with probability 1 − p and lose with probability p. At time zero, we assume that both horses have probability 1/2 of winning. The side information Y_i at time i is a noisy observation of the horse race outcome X_i: it equals X_i with probability 1 − q and differs from X_i with probability q.

[Fig. 1. The setting of Example 1. The winning horse X_i is represented as a Markov process with two states; the process switches states with probability p and stays with probability 1 − p. In state 1, horse number 1 wins, and in state 2, horse number 2 wins. The side information Y_i is a noisy observation of the winning horse X_i, with crossover probability q.]

For this example, the increase in growth rate due to side information as n goes to infinity is

    \Delta W = h(p * q) - h(q),

where h(·) denotes the binary entropy function, i.e., h(x) = −x log x − (1 − x) log(1 − x), and p ∗ q denotes the parameter of the Bernoulli distribution that results from convolving two Bernoulli distributions with parameters p and q, i.e., p ∗ q = (1 − p)q + (1 − q)p.

The increase in the growth rate ΔW for this example can be obtained from first principles as follows:

    \Delta W = \lim_{n \to \infty} \frac{1}{n} I(Y^n \to X^n)
             = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \left[ H(Y^i \mid X^{i-1}) - H(Y^i \mid X^i) \right]
             = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \left[ H(Y^i \mid X^{i-1}) - H(Y_2^i \mid X_2^i) - H(Y_1 \mid X_1) \right]
    \overset{(a)}{=} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \left[ H(Y^i \mid X^{i-1}) - H(Y^{i-1} \mid X^{i-1}) - H(Y_1 \mid X_1) \right]
             = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \left[ H(Y_i \mid Y^{i-1}, X^{i-1}) - H(Y_1 \mid X_1) \right]
    \overset{(b)}{=} H(Y_1 \mid X_0) - H(Y_1 \mid X_1) = h(p * q) - h(q),   (7)

where the third equality uses the fact that, given X^i, the observations Y_1, ..., Y_i are conditionally independent, and steps (a) and (b) are due to the stationarity of the process (X_i, Y_i). Alternatively, the sequence of equalities up to step (b) in (7) can be derived directly using

    \frac{1}{n} I(Y^n \to X^n) \overset{(a)}{=} \frac{1}{n} \sum_{i=1}^n I(Y_i; X_i^n \mid X^{i-1}, Y^{i-1}) \overset{(b)}{=} H(Y_1 \mid X_0) - H(Y_1 \mid X_1),   (8)

where (a) is the identity given in [11, eq. (9)] and (b) is due to the stationarity of the process.

If the side information is known with some lookahead k ∈ {0, 1, ...}, that is, if the gambler knows Y^{i+k} at time i, then the increase in growth rate is given by

    \Delta W = \lim_{n \to \infty} \frac{1}{n} I(Y^{n+k} \to X^n) = H(Y_{k+1} \mid Y^k, X_0) - H(Y_1 \mid X_1),   (9)

where the last equality is due to the same arguments as in (8).

Figure 2 shows the increase in growth rate ΔW due to side information as a function of the side information parameters (q, k). The left plot shows ΔW as a function of q, where p = 0.2 and there is no lookahead (k = 0). The right plot shows ΔW as a function of k, where p = 0.2 and q = 0.25.

[Fig. 2. Increase in the growth rate in Example 1 as a function of the side information parameters (q, k). The left plot shows ΔW as a function of q = Pr(Y_i ≠ X_i) with no lookahead; ΔW approaches H(X_1|X_0) = h(p) as q → 0 and vanishes at q = 0.5. The right plot shows ΔW as a function of the lookahead k, where q = 0.25; as k grows, ΔW increases toward the limit H(Y_1|Y_0, Y_{−1}, ...) − H(Y_1|X_1). The horse race outcome is a first-order binary symmetric Markov process with parameter p = 0.2.]

If the entire side information sequence Y_1, Y_2, ... is known to the gambler ahead of time, then we should have mutual information rather than directed information, i.e.,

    \Delta W = \lim_{n \to \infty} \frac{1}{n} I(Y^n; X^n) = \lim_{n \to \infty} \frac{H(Y^n)}{n} - H(Y_1 \mid X_1),   (10)

and this coincides with the fact that for the stationary hidden Markov process {Y_1, Y_2, ...}, the sequence H(Y_{k+1}|Y^k, X_0) converges to the entropy rate of the process.
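A quick numerical check of Example 1 (our sketch, not from the paper): assuming fair uniform odds (o = 2 for both horses), the optimal causal bet reduces to b(x_i|x_{i−1}, y_i) = p(x_i|x_{i−1}, y_i), and a Monte Carlo simulation of the resulting growth rates, with and without the side information, reproduces ΔW = h(p∗q) − h(q) ≈ 0.123 bits per race for p = 0.2, q = 0.25.

```python
import random
from math import log2

p_switch, q = 0.2, 0.25       # winner switch prob.; side-info noise
h = lambda t: -t * log2(t) - (1 - t) * log2(1 - t)
conv = lambda a, b: a * (1 - b) + (1 - a) * b         # p * q (Bernoulli)
print("analytic :", h(conv(p_switch, q)) - h(q))      # ~0.1228

def p_repeat(y_agrees):
    """Posterior p(X_i = x_{i-1} | x_{i-1}, y_i) that the winner repeats,
    given whether the noisy observation y_i agrees with x_{i-1}."""
    like_rep = (1 - q) if y_agrees else q
    like_sw = q if y_agrees else (1 - q)
    num = (1 - p_switch) * like_rep
    return num / (num + p_switch * like_sw)

random.seed(0)
n, w_side, w_blind, x = 200_000, 0.0, 0.0, 0
for _ in range(n):
    x_next = x if random.random() < 1 - p_switch else 1 - x
    y = x_next if random.random() < 1 - q else 1 - x_next
    b_rep = p_repeat(y == x)                 # optimal bet with side info
    b = b_rep if x_next == x else 1 - b_rep
    w_side += log2(2 * b)                    # fair uniform odds o = 2
    w_blind += log2(2 * ((1 - p_switch) if x_next == x else p_switch))
    x = x_next
print("simulated:", (w_side - w_blind) / n)           # ~0.123
```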
VI. CONCLUSION AND FURTHER EXTENSIONS

We have shown that directed information arises naturally in gambling as the gain in the maximum achievable capital growth due to the availability of causal side information. We now outline two extensions: stock market portfolio strategies and data compression in the presence of causal side information. Details are given in [16].

A. Stock market




Using notation similar to that in [15, Ch. 16], a stock market at time i is represented by a vector of stocks X_i = (X_{i1}, X_{i2}, ..., X_{im}), where m is the number of stocks and the price relative X_{ik} is the ratio of the price of stock k at the end of day i to its price at the beginning of day i. We assume that at time i there is side information Y^i that is known to the investor. A portfolio is an allocation of wealth across the stocks. A nonanticipating, or causal, portfolio strategy with causal side information at time i is denoted by b(x^{i−1}, y^i), and it satisfies Σ_{k=1}^m b_k(x^{i−1}, y^i) = 1 and b_k(x^{i−1}, y^i) ≥ 0 for all possible (x^{i−1}, y^i). We define S(x^n||y^n) as the wealth at the end of day n for the stock sequence x^n and causal side information y^n. We can write

    S(x^n \| y^n) = \left( b^t(x^{n-1}, y^n) \, x_n \right) S(x^{n-1} \| y^{n-1}),

where (·)^t denotes the transpose of a vector. The goal is to maximize the growth W(X^n||Y^n) ≜ E[log S(X^n||Y^n)]. We also define W(X_n|X^{n−1}, Y^n) ≜ E[log(b^t(X^{n−1}, Y^n) X_n)]. From this definition, we can write the chain rule

    W(X^n \| Y^n) = \sum_{i=1}^n W(X_i \mid X^{i-1}, Y^i).

Gambling on horse races with m horses, as studied in the previous sections, is a special case of investing in a stock market with m + 1 stocks. The first m stocks correspond to the m horses: at the end of the day, one of the stocks, say k ∈ {1, ..., m}, takes the value o(k) with probability p(k), and all other stocks become zero. The (m+1)-st stock is always one, and it allows the gambler to invest only part of the wealth in the horse race.

The developments in the previous sections can be extended to characterize the increase in growth rate due to side information, where again directed information emerges as the key quantity, upper-bounding the value of causal side information; cf. [17]. Details will be given in [16].
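To make the reduction concrete, here is a small simulation sketch (ours; all parameter values are illustrative) that embeds a two-horse race with a cash option into a market with m + 1 = 3 stocks, reusing the sub-fair Kelly allocation (b_0 ≈ 0.675, b ≈ (0.325, 0)) from Section IV-B and applying the wealth recursion S = (b^t x_n) S.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
p = np.array([0.7, 0.3])            # winning probabilities (i.i.d. races)
o = np.array([1.8, 1.8])            # sub-fair odds, so cash is attractive
b = np.array([0.325, 0.0, 0.675])   # allocation over m horses + cash

s = 1.0
for i in range(n):
    k = rng.choice(m, p=p)          # winning horse on day i
    x = np.zeros(m + 1)
    x[k] = o[k]                     # the winning stock pays its odds
    x[m] = 1.0                      # the (m+1)-st stock (cash) is always 1
    s *= b @ x                      # wealth recursion S = (b^t x_i) S
    print(f"day {i + 1}: wealth {s:.4f}")
```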
B. Instantaneous compression with causal side information

Let X_1, X_2, ... be a source and Y_1, Y_2, ... its side information sequence. The source is to be losslessly encoded instantaneously, with causally available side information. More precisely, an instantaneous lossless source encoder with causal side information consists of a sequence of mappings {M_i}_{i≥1} such that each M_i : X^i × Y^i → {0, 1}^* has the property that for every x^{i−1} and y^i, M_i(x^{i−1} ·, y^i) is an instantaneous (prefix) code for X_i.

An instantaneous lossless source encoder with causal side information operates sequentially, emitting the concatenated bit stream M_1(x_1, y_1) M_2(x^2, y^2) ···. The defining property that M_i(x^{i−1} ·, y^i) is an instantaneous code for every x^{i−1} and y^i is a necessary and sufficient condition for the existence of a decoder that can losslessly recover x^i based on y^i and the bit stream as soon as it sees M_1(x_1, y_1) M_2(x^2, y^2) ··· M_i(x^i, y^i), for all sequence pairs (x_1, y_1), (x_2, y_2), ... and all i ≥ 1. Using natural extensions of standard arguments, we show in [16] that I(Y^n → X^n) is essentially (up to terms that are sublinear in n) the rate savings in optimal sequential lossless compression of X^n due to the causal availability of the side information.
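As a toy illustration of such an encoder (our sketch; the paper's result concerns optimal sequential codes, not this particular construction), consider an i.i.d. source X uniform on four symbols with side information Y that equals X with probability 1 − q. At each time the encoder emits a Huffman codeword for the pmf p(x|y_i); since the decoder also knows y_i, it can rebuild the same code and parse the stream. The per-symbol rate lands within one bit of H(X|Y), versus 2 bits without side information; in this i.i.d. special case the ideal savings are (1/n) I(Y^n → X^n) = I(X;Y).

```python
import heapq
from math import log2

def huffman_lengths(pmf):
    """Codeword lengths of a Huffman code for pmf {symbol: prob}."""
    heap = [(prob, [s]) for s, prob in pmf.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in pmf}
    while len(heap) > 1:
        p1, grp1 = heapq.heappop(heap)
        p2, grp2 = heapq.heappop(heap)
        for s in grp1 + grp2:        # symbols under the merged node
            lengths[s] += 1          # go one level deeper in the code tree
        heapq.heappush(heap, (p1 + p2, grp1 + grp2))
    return lengths

q, syms = 0.1, range(4)              # X uniform on {0,1,2,3}; p(y) = 1/4
rate, h_cond = 0.0, 0.0
for y in syms:
    cond = {x: (1 - q) if x == y else q / 3 for x in syms}   # p(x|y)
    lens = huffman_lengths(cond)
    rate += 0.25 * sum(cond[x] * lens[x] for x in syms)
    h_cond += 0.25 * sum(-pr * log2(pr) for pr in cond.values())
print("Huffman rate with side info:", rate)    # ~1.17 bits/symbol
print("H(X|Y)                     :", h_cond)  # ~0.63 bits/symbol
print("rate without side info     :", 2.0)     # X uniform on 4 symbols
```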

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, 1948.
[2] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," in Information and Decision Processes, R. E. Machol, Ed. McGraw-Hill, 1960, pp. 93–126.
[3] R. G. Gallager, "Source coding with side information and universal coding," Sept. 1976, unpublished manuscript.
[4] J. L. Kelly, "A new interpretation of information rate," Bell Syst. Tech. J., vol. 35, pp. 917–926, 1956.
[5] J. Massey, "Causality, feedback and directed information," in Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), pp. 303–305, 1990.
[6] G. Kramer, "Directed information for channels with feedback," Ph.D. dissertation, Swiss Federal Institute of Technology (ETH) Zurich, 1998.
[7] S. Tatikonda, "Control under communication constraints," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, 2000.
[8] G. Kramer, "Capacity results for the discrete memoryless network," IEEE Trans. Inf. Theory, vol. IT-49, pp. 4–21, 2003.
[9] H. H. Permuter, T. Weissman, and A. J. Goldsmith, "Finite state channels with time-invariant deterministic feedback," Sept. 2006, submitted to IEEE Trans. Inf. Theory. Available at arxiv.org/pdf/cs.IT/0608070.
[10] S. C. Tatikonda and S. Mitter, "The capacity of channels with feedback," Sept. 2006, submitted to IEEE Trans. Inf. Theory. Available at arxiv.org/cs.IT/0609139.
[11] Y.-H. Kim, "A coding theorem for a class of stationary channels with feedback," Jan. 2007, submitted to IEEE Trans. Inf. Theory. Available at arxiv.org/cs.IT/0701041.
[12] H. H. Permuter and T. Weissman, "Capacity region of the finite-state multiple access channel with and without feedback," Aug. 2007, submitted to IEEE Trans. Inf. Theory. Available at arxiv.org/pdf/cs.IT/0608070.
[13] B. Shrader and H. H. Permuter, "On the compound finite state channel with feedback," in Proc. Int. Symp. Inf. Theory, Nice, France, 2007.
[14] R. Venkataramanan and S. S. Pradhan, "Source coding with feedforward: Rate-distortion theorems and error exponents for a general source," IEEE Trans. Inf. Theory, vol. IT-53, pp. 2154–2179, 2007.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.
[16] Y.-H. Kim, H. H. Permuter, and T. Weissman, "An interpretation of directed information in gambling, portfolio theory and data compression," Jan. 2007, in preparation.
[17] A. R. Barron and T. M. Cover, "A bound on the financial value of information," IEEE Trans. Inf. Theory, vol. IT-34, pp. 1097–1100, 1988.
