Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach

Matthew Johnston, Eytan Modiano, Yury Polyanskiy Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge, MA Email: {mrj, modiano, yp}@mit.edu

Abstract—We consider an opportunistic communication system in which a transmitter selects one of multiple channels over which to schedule a transmission, based on partial knowledge of the network state. We characterize a fundamental limit on the rate at which channel state information must be conveyed to the transmitter in order to meet a constraint on expected throughput. This problem is modeled as a causal rate distortion optimization of a Markov source. We introduce a novel distortion metric capturing the impact of imperfect channel state information on throughput. We compute a closed-form expression for the causal information rate distortion function for the case of two channels, as well as an algorithmic upper bound on the causal rate distortion function. Finally, we characterize the gap between the causal information rate distortion and the causal entropic rate-distortion functions.

This work was supported by NSF Grants CNS-0915988, CNS-1217048, ARO MURI Grant W911NF-08-1-0238, ONR Grant N00014-12-1-0064, and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-09-39370.

I. INTRODUCTION

Consider a transmitter and a receiver connected by two independent channels. The state of each channel is either ON or OFF: transmissions over an ON channel result in a unit throughput, and transmissions over an OFF channel fail. Channels evolve over time according to a Markov process. At the beginning of each time slot, the receiver measures the channel states in the current slot and transmits (some) channel state information (CSI) to the transmitter. Based on the CSI sent by the receiver, the transmitter chooses over which of the channels to transmit.

In a system in which an ON channel and an OFF channel are equally likely to occur, the transmitter can achieve an expected per-slot throughput of 1/2 without channel state information, and a per-slot throughput of 3/4 if it has full CSI before making scheduling decisions. However, the transmitter does not need to maintain complete knowledge of the channel state in order to achieve high throughput; it is sufficient to maintain knowledge of which channel has the best state. Furthermore, the memory in the system can be used to further reduce the required CSI.

We are interested in the minimum rate at which CSI must be sent to the transmitter in order to guarantee a lower bound on expected throughput. This quantity represents a fundamental limit on the overhead information required in this setting.

The above minimization can be formulated as a rate distortion optimization with an appropriately designed distortion metric. The opportunistic communication framework, in contrast to traditional rate distortion, requires that the channel state information sequence be causally encoded, since the receiver observes the channel states causally. Consequently, restricting the rate distortion problem to causal encodings provides a tighter lower bound on the required CSI that must be provided to the transmitter.

Opportunistic scheduling is one of many network control schemes that require network state information (NSI) in order to make control decisions. The performance of these schemes is directly affected by the availability and accuracy of this information. If the network state changes rapidly, there are more opportunities to take advantage of an opportunistic performance gain, albeit at the cost of additional overhead. For large networks, this overhead can become prohibitive.

This paper presents a novel rate distortion formulation to quantify the fundamental limit on the rate of overhead required for opportunistic scheduling. We design a new distortion metric for this setting that captures the impact of CSI quality on network performance, and incorporate a causality constraint into the rate distortion formulation to reflect the practical constraints of a real-time communication system. We analytically compute a closed-form expression for the causal rate distortion lower bound for a two-channel system. Additionally, we propose a practical encoding algorithm to achieve the required throughput with limited overhead. Moreover, we show that for opportunistic scheduling there is a fundamental gap between the mutual-information-based and entropy-rate-based rate distortion functions, and discuss scenarios under which this gap vanishes. Proofs have been omitted for brevity.

II. PROBLEM FORMULATION

Consider a transmitter and a receiver, connected through M independent channels. Assume a time-slotted system in which, at time slot t, each channel has a time-varying channel state S_i(t) ∈ {OFF, ON}, independent of all other channels. The notation S_i(t) ∈ {0, 1} is used interchangeably. Let X(t) = X_t = (S_1(t), S_2(t), ..., S_M(t)) represent the system state at time slot t.
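The throughput figures quoted in the introduction (1/2 without CSI, 3/4 with full CSI, and 3/4 again when only the index of the best channel is known) can be checked by direct enumeration in the memoryless special case. A minimal sketch, assuming two i.i.d. equiprobable ON/OFF channels:

```python
import itertools

# Sanity check of the introduction's throughput figures for the memoryless
# special case: M = 2 channels, each ON (1) or OFF (0) with probability 1/2.
states = list(itertools.product([0, 1], repeat=2))   # all (S1, S2) pairs

# No CSI: the transmitter picks channel 1 blindly.
thpt_no_csi = sum(s[0] for s in states) / len(states)

# Full CSI: transmit on an ON channel whenever one exists.
thpt_full_csi = sum(max(s) for s in states) / len(states)

# Index-only CSI: knowing only which channel is best is already enough.
thpt_index_only = sum(s[0] if s[0] >= s[1] else s[1] for s in states) / len(states)

print(thpt_no_csi, thpt_full_csi, thpt_index_only)   # 0.5 0.75 0.75
```

The last line illustrates why it suffices for the transmitter to track only the index of the best channel: the resulting throughput equals the full-CSI throughput.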
At each time slot, the transmitter chooses a channel over which to transmit, with the goal of opportunistically transmitting over an ON channel. Channel states evolve over time according to a Markov process described by the chain in Figure 1, with transition probabilities p and q satisfying p ≤ 1/2 and q ≤ 1/2, corresponding to channels with "positive memory."

Fig. 1: Two-state Markov chain describing the channel state evolution of each independent channel (transition probabilities p and q, self-loops 1 - p and 1 - q).

The transmitter does not observe the state of the system. Instead, the receiver causally encodes the sequence of channel states X_1^n into a sequence Z_1^n and sends the encoded sequence to the transmitter, where X_1^n denotes the vector of random variables [X(1), ..., X(n)]. The encoding Z(t) = Z_t ∈ {1, ..., M} represents the index of the channel over which to transmit. Since the throughput-optimal transmission decision is to transmit over the channel with the best state, it is sufficient for the transmitter to restrict its knowledge to the index of the channel with the best state at each time.

The expected throughput earned in slot t is E[thpt(t)] = E[S_Z(t)(t)], since the transmitter uses channel i = Z(t) and receives a throughput of 1 if that channel is ON, and 0 otherwise. Clearly, a higher throughput is attainable with more accurate CSI, determined by the quality of the encoding Z_1^n. The average distortion between the sequences x_1^n and z_1^n is defined in terms of the per-letter distortion,

  d(x_1^n, z_1^n) = (1/n) Σ_{i=1}^n d(x_i, z_i),   (1)

where d(x_i, z_i) is the per-letter distortion between the ith source symbol and the ith encoded symbol at the transmitter. For the opportunistic communication framework, the per-letter distortion is defined as

  d(x_i, z_i) ≜ 1 - E[thpt(t)] = 1 - S_Z(t)(t),   (2)

where S_Z(t)(t) is the state of the channel indexed by Z(t). Thus, an upper bound on expected distortion translates to a lower bound on expected throughput. Note that the traditional Hamming distortion metric is inappropriate in this setting, since the transmitter does not need to know the states of channels over which it will not transmit.
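The difference between the throughput metric (2) and Hamming distortion can be made concrete with a small sketch (channel indices are 1-based as in the text, states are given as (S1, S2) bits, and the helper names are illustrative):

```python
def thpt_distortion(x, z):
    """Per-letter throughput distortion (2): d(x, z) = 1 - S_z, i.e. zero
    exactly when the scheduled channel z is ON in state x."""
    return 1 - x[z - 1]

def block_distortion(xs, zs):
    """Average distortion of a block, as in (1)."""
    return sum(thpt_distortion(x, z) for x, z in zip(xs, zs)) / len(xs)

x = (0, 1)                       # channel 1 OFF, channel 2 ON
print(thpt_distortion(x, 2))     # 0: scheduled the ON channel
print(thpt_distortion(x, 1))     # 1: scheduled the OFF channel

# Hamming distortion on the full state would also penalize an encoding that
# misreports the unused channel; the metric (2) deliberately does not.
print(block_distortion([(0, 1), (1, 1), (0, 0)], [2, 1, 1]))
```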
A. Problem Statement

The goal in this work is to determine the minimum rate at which CSI must be conveyed to the transmitter to achieve a lower bound on expected throughput. In this setting, CSI must be conveyed to the transmitter causally; in other words, the ith encoding can only depend on the channel state at time i, and on previous channel states and encodings. Let Q_c(D) be the family of causal encodings q(z_1^n | x_1^n) satisfying the distortion constraint

  E[d(X_1^n, Z_1^n)] = Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) d(x_1^n, z_1^n) ≤ D,   (3)

where p(x_1^n) is the PDF of the source, and the causality constraint

  q(z_i | x_1^n) = q(z_i | y_1^n)  for all x_1^n, y_1^n s.t. x_1^i = y_1^i.   (4)

Mathematically, the minimum rate at which CSI must be transmitted is given by

  R_c^NG(D) = lim_{n→∞} inf_{q ∈ Q_c(D)} (1/n) H(Z_1^n),   (5)

where (1/n) H(Z_1^n) is the entropy rate of the encoded sequence in bits. Equation (5) is the causal rate distortion function as defined by Neuhoff and Gilbert [1], and is denoted using the superscript NG. This quantity is an entropy rate distortion function, in contrast to the information rate distortion function [2], [3], [4], which will be discussed in Section III. The decision to formulate this problem as a minimization of entropy rate is based on the intuition that the entropy rate should capture the average number of bits per channel use required to convey channel state information.

B. Previous Work

Among the earliest theoretical works to study communication overhead in networks is Gallager's seminal paper [5], in which fundamental lower bounds on the amount of overhead needed to keep track of source and destination addresses and message starting and stopping times are derived using rate-distortion theory. A discrete-time analog of Gallager's model is considered in [6]. A similar framework was considered in [7] and [8] for different forms of network state information.

The traditional rate-distortion problem [9] has been extended to bounds for Markov sources in [10], [11], [12]. Additionally, researchers have considered the causal source coding problem due to its application to real-time processing. One of the first works in this field was [1], in which Neuhoff and Gilbert show that the best causal encoding of a memoryless source is a memoryless coding, or a time sharing between two memoryless codes. Neuhoff and Gilbert focus on the minimization of entropy rate, as in (5).
The work in [13] studied the optimal finite-horizon sequential quantization problem, and showed that the optimal encoder for a kth-order Markov source depends on the last k source symbols and the present state of the decoder's memory (i.e., the history of decoded symbols). A causal (sequential) rate distortion theory was introduced in [3] and [14] for stationary sources. These works show that the sequential rate distortion function lower bounds the entropy rate of a causally encoded sequence, but that this inequality is strict in general. Despite this, operational significance for the causal rate distortion function is developed in [3]. Lastly, [4] studies the causal rate distortion function as a minimization of directed mutual information, and computes the form of the optimal causal kernels.

III. RATE DISTORTION LOWER BOUND

To begin, we review the traditional rate distortion problem and define the causal information rate distortion function, a minimization of mutual information which is known to lower bound R_c^NG(D) [14], and hence provides a lower bound on the required rate at which CSI must be conveyed to the transmitter to meet the throughput requirement.
A. Traditional Rate Distortion

Consider the well-known rate distortion problem, in which the goal is to find the minimum number of bits per source symbol necessary to encode a source while meeting a fidelity constraint. Consider a discrete memoryless source {X_i}_{i=1}^∞, where the X_i are i.i.d. random variables taking values in the set X according to distribution p_X(x). This source sequence is encoded into a sequence {Z_i}_{i=1}^∞, with Z_i taking values in Z. The distortion between a block of source symbols and encoded symbols is defined in (1) with per-letter distortion d(x_i, z_i). Define Q(D) to be the family of conditional probability distributions q(z|x) satisfying an expected distortion constraint (3). Shannon's rate-distortion theory states that the minimum rate R at which the source can be encoded with average distortion less than D is given by the information rate distortion function R(D), where

  R(D) ≜ min_{q(z|x) ∈ Q(D)} I(X; Z),   (6)

and I(·;·) represents mutual information.

B. Causal Rate Distortion for Opportunistic Scheduling

Consider the problem formulation in Section II. As discussed above, the information rate distortion is a minimization of mutual information over all stochastic kernels satisfying a distortion constraint. For opportunistic scheduling, this minimization is further constrained to include only causal kernels. Let Q_c(D) be the set of all stochastic kernels q(z_1^n | x_1^n) satisfying (3) and (4). The causal information rate distortion function is defined as

  R_c(D) = lim_{n→∞} inf_{q(z_1^n | x_1^n) ∈ Q_c(D)} (1/n) I(X_1^n; Z_1^n).   (7)

The function R_c(D) is a lower bound on the Neuhoff-Gilbert rate distortion function R_c^NG(D) in (5), and hence a lower bound on the rate of CSI that needs to be conveyed to the transmitter to ensure that the expected per-slot throughput is greater than 1 - D. In the traditional (non-causal) rate distortion framework, this bound is tight; however, in the causal setting this lower bound is potentially strict. Note that for memoryless sources, R_c(D) = R(D), where R(D) is the traditional rate distortion function; however, for most memoryless sources, R(D) < R_c^NG(D).

The optimization problem in (7) is solved using a geometric programming dual as in [15]. The following result gives the structure of the optimal stochastic kernel. Note that this result is also obtained in [4] for a similar formulation.

Theorem 1. The optimal kernel q(z_1^n | x_1^n) satisfies

  q(z_i | z_1^{i-1}, x_1^i) = Q(z_i | z_1^{i-1}) exp(-λ d(x_i, z_i)) / Σ_{z_i} Q(z_i | z_1^{i-1}) exp(-λ d(x_i, z_i)),   (8)

where for all z_1^i, Q(z_i | z_1^{i-1}) and λ satisfy

  1 = Σ_{x_1^n} P(x_1^n) exp(-λ Σ_{i=1}^n d(x_i, z_i)) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i-1}) exp(-λ d(x_i, z_i)).   (9)

Equation (9) holds for all z_1^n, and gives a system of equations from which one can solve for Q(z_i | z_1^{i-1}). Note that this holds in general for any number of Markovian channels, and can be solved numerically to determine R_c(D). Observe in (8) that q(z_i | z_1^{i-1}, x_1^i) = q(z_i | z_1^{i-1}, x_i). In other words, the solution to the rate distortion optimization is a distribution which generates Z_i depending on the source sequence only through the current source symbol.
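For a memoryless source, the kernel (8) loses its dependence on the past and reduces to the classical exponentially tilted form q(z|x) ∝ Q(z) exp(-λ d(x, z)), whose fixed point can be found by the standard Blahut-Arimoto-style alternation. A sketch under that simplification; the alphabet, distortion matrix, and λ below are illustrative choices, not values from the paper:

```python
import math

def tilted_kernel(p_x, d, lam, iters=500):
    """Fixed-point iteration for the memoryless specialization of (8):
    alternate q(z|x) ∝ Q(z) exp(-lam * d[x][z]) with Q(z) = Σ_x p(x) q(z|x)."""
    nx, nz = len(d), len(d[0])
    Q = [1.0 / nz] * nz                       # output marginal, init uniform
    q = []
    for _ in range(iters):
        # tilt and normalize over z, as in (8)
        q = [[Q[z] * math.exp(-lam * d[x][z]) for z in range(nz)] for x in range(nx)]
        for x in range(nx):
            s = sum(q[x])
            q[x] = [v / s for v in q[x]]
        # re-estimate the output marginal
        Q = [sum(p_x[x] * q[x][z] for x in range(nx)) for z in range(nz)]
    return q, Q

# Sanity check: binary Hamming distortion with a symmetric source, where the
# output marginal should stay uniform.
q, Q = tilted_kernel([0.5, 0.5], [[0, 1], [1, 0]], lam=2.0)
print(Q)
print(q[0][0])   # = 1 / (1 + exp(-2)) for this symmetric example
```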
C. Analytical Solution for Two-Channel System

Consider the system in Section II with two channels (M = 2) and a symmetric channel state Markov chain.

Theorem 2. For the aforementioned system, the causal information rate distortion function is given by

  R_c(D) = (1/2) H_b(2p - 4pD + 2D - 1/2) - (1/2) H_b(2D - 1/2)   (10)

for all D satisfying 1/4 ≤ D ≤ 1/2.

This result follows from evaluating (8) and (9) for a two-channel system, and showing the stationarity of the optimal kernel. The information rate distortion function in (10) is a lower bound on the rate at which information needs to be conveyed to the transmitter. A distortion D_min = 1/4 represents a lossless encoding, since in 1/4 of the time slots both channels are OFF and no throughput can be obtained. Additionally, D_max = 1/2 corresponds to an oblivious encoder, as transmitting over an arbitrary channel requires no rate and achieves distortion equal to 1/2. The function R_c(D) is plotted in Figure 2 as a function of D.
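A quick numerical check of the closed form (10), with entropy measured in bits: direct substitution shows that (10) evaluates to 0 at the oblivious point D = 1/2 and to H_b(p)/2 at the lossless point D = 1/4, consistent with the discussion above. The value p = 0.3 below is an illustrative choice:

```python
import math

def Hb(x):
    """Binary entropy in bits, with H_b(0) = H_b(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def Rc(D, p):
    """Causal information rate distortion function (10), valid for
    1/4 <= D <= 1/2 and p <= 1/2."""
    return 0.5 * Hb(2*p - 4*p*D + 2*D - 0.5) - 0.5 * Hb(2*D - 0.5)

p = 0.3
print(Rc(0.5, p))    # oblivious encoder: zero rate
print(Rc(0.25, p))   # lossless point: equals Hb(p)/2
print(Rc(0.3, p))    # strictly positive in between
```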
The function Rc(D) is a lower bound on the Neuhoff- the transmitter has two symmetric channels over which to NG Gilbert rate distortion function Rc (D) in (5), and hence transmit. Therefore, X(t) ∈ {00, 01, 10, 11}. Observe that a lower bound on the rate of CSI that needs to be conveyed when X(t) = 11, no distortion is accumulated regardless to the transmitter to ensure expected per-slot throughput of the encoding Z(t), and a unit distortion is always is greater than 1 − D. In the traditional (non-causal) rate accumulated when X(t) = 00. The minimum possible 1 distortion framework, this bound is tight; however, in the average distortion is Dmin = 4 , since the state of the 1 causal setting this lower bound is potentially strict. Note system is 00 for a fraction 4 of the time. that for memoryless sources, Rc(D) = R(D), where A. Minimum Distortion Encoding Algorithm R(D) is the traditional rate distortion function; however, causal f(·) Z(t) = for most memoryless sources, R(D) < RNG(D). Recall that a encoder satisfies c f(Xt,Zt−1). Consider the following encoding policy: The optimization problem in (7) is solved using a 1 1  geometric programming dual as in [15]. The following Z(t − 1) if X(t) = 00 or X(t) = 11  result gives the structure of the optimal stochastic kernel. Z(t) = 1 if X(t) = 10 Note that this result is also obtained in the work [4] for a 2 if X(t) = 01 similar formulation. (11) Lower and Upper Bounds on Required Overhead Rate Note that Z(t) is a function of Z(t − 1) and X(t), and is 0.5 R (D) c 0.45 therefore a causal encoding as defined in (4). The above Encoding Upper Bound encoding achieves expected distortion equal to 1 , the 0.4 4 0.35 minimum distortion achievable. Note that the transmitter is 0.3 unaware of the channel state; conveying full CSI requires 0.25 Rate additional rate at no reduction to distortion. Let K be a 0.2 random variable denoting the number of time slots since 0.15 the last change in the sequence Z(i), i.e., 0.1 0.05

Let K be a random variable denoting the number of time slots since the last change in the sequence Z(i), i.e.,

  K = min_j { j < i : Z(i-j) ≠ Z(i-j-1) }.   (12)

Thus, the transmitter can infer the state of the system K slots ago. Since the channel state is Markovian, the entropy rate of the sequence Z_1^∞ can be expressed as

  lim_{n→∞} (1/n) H(Z_1^n) = lim_{n→∞} (1/n) Σ_{i=1}^n H(Z(i) | Z_1^{i-1})   (13)
    = H(Z(i) | Z(i-1), K)   (14)
    = Σ_{k=1}^∞ P(K = k) H_b(P(Z(i) ≠ Z(i-1) | K = k)),   (15)

where H_b(·) is the binary entropy function. Note that by definition, Z(i-1) = Z(i-K) in (14). Equation (15) can be computed in terms of the transition probabilities of the Markov chain in Figure 1.

Fig. 2: The causal information rate distortion function R_c(D) (Section III) and the upper bound to the rate distortion function (Section IV), computed using Monte Carlo simulation.

B. Threshold-based Encoding Algorithm

In order to reduce the rate of the encoded sequence below that of the encoder in (11), a higher expected distortion must be accepted. A new algorithm is obtained by introducing a parameter T and modifying the encoding algorithm in (11) as follows: if K ≤ T, then Z(i) = Z(i-1), and if K > T, then Z(i) is assigned according to (11). As a result, for the first T slots after the Z(i) sequence changes value, the transmitter can determine the next element of the sequence deterministically, and hence the sequence can be encoded with zero rate. After T slots, the entropy in the Z(i) process is similar to that of the original encoding algorithm. As expected, this reduction in entropy rate comes at an increase in distortion. In the first T slots after a change to Z(i) = 1, every visit to state X(i) = 01 or X(i) = 00 incurs a unit distortion. The accumulated distortion is equal to the number of visits to those states in an interval of T slots.
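The threshold rule can be prototyped directly. A hedged sketch (illustrative p and horizon, distortion only; estimating the entropy rate would additionally require the conditional statistics in (15)) showing that expected distortion grows with T, as described above:

```python
import random

random.seed(2)

def simulate(T, p=0.3, n=200_000):
    """Empirical distortion of the threshold-T encoder: hold Z for T slots
    after each change (K <= T), otherwise apply rule (11)."""
    s1 = s2 = 1
    z, k, dist = 1, T + 1, 0         # k tracks slots since Z last changed
    for _ in range(n):
        s1 = 1 - s1 if random.random() < p else s1
        s2 = 1 - s2 if random.random() < p else s2
        if k > T:
            new_z = 1 if (s1, s2) == (1, 0) else (2 if (s1, s2) == (0, 1) else z)
        else:
            new_z = z                # hold window: these slots cost zero rate
        k = 1 if new_z != z else k + 1
        z = new_z
        dist += 1 - (s1 if z == 1 else s2)
    return dist / n

d0, d8 = simulate(0), simulate(8)
print(d0, d8)   # d0 near the minimum 0.25; d8 strictly larger
```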
Clearly, as the parameter T increases, the entropy rate decreases and the expected distortion increases. Consequently, T parameterizes the rate-distortion curve; however, due to the integer restriction, only a countable number of rate-distortion pairs are achievable by varying T, and time sharing is used to interpolate between these points. An example curve is shown in Figure 2. Note that as T increases, the corresponding points on the R(D) curve become more dense. Furthermore, for the region of R(D) parameterized by large T, the function R(D) is linear. The slope of this linear region is characterized by the following result.

Proposition 1. Let R(T) and D(T) denote the rate and expected distortion as functions of the parameter T, respectively. For large T, the achievable R(D) curve for the above encoding algorithm, denoted by the points (D(T), R(T)), has slope

  lim_{T→∞} [R(T+1) - R(T)] / [D(T+1) - D(T)] = -H(M) / (c + (1/4) E[M]),   (16)

where M is a random variable denoting the number of slots after the initial T slots until the Z_i sequence changes value, and c is a constant given by

  c = Σ_{i=1}^T ( E[1(X_i = 00 or X_i = 01)] - E[1(X_i = 00 or X_i = 01) | X_0 = 10] ).   (17)

The constant in (17) represents the difference in expected accumulated distortion over an interval of T slots between the state process beginning in steady state and beginning at X_0 = 10. Proposition 1 shows that the slope of R(D) is independent of T for T sufficiently large, as illustrated in Figure 2.

V. CAUSAL RATE DISTORTION GAP

Figure 2 shows a gap between the causal information rate distortion function and the heuristic upper bound to the Neuhoff-Gilbert rate distortion function computed in Section IV. In this section, we prove that for a class of distortion metrics including the throughput metric in (2), there exists a gap between the information and Neuhoff-Gilbert causal rate distortion functions, even at D = D_min.

For example, consider a discrete memoryless source {X_i}, drawing i.i.d. symbols from the alphabet {0, 1, 2}, and an encoding sequence {Z_i} drawn from {0, 1, 2}. Consider the following distortion metrics: d_1(x, z) = 1_{z ≠ x} and d_2(x, z) = 1_{z = x}, where 1 is the indicator function. The first metric d_1(x, z) is a simple Hamming distortion measure, used to minimize probability of error, whereas the second is such that there exist two distortion-free encodings for each source symbol. The causal rate distortion functions R_c(D) for d_1(x, z) and d_2(x, z) are

  R_1(D) = -H_b(D) - D log(2/3) - (1 - D) log(1/3),  0 ≤ D ≤ 2/3,   (18)
  R_2(D) = -H_b(D) - D log(1/3) - (1 - D) log(2/3),  0 ≤ D ≤ 1/3.   (19)
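These closed forms are easy to check numerically; bits are used as the log base here, an assumption since (18)-(19) leave the base implicit. For d_2, a deterministic minimum-distortion encoder must map each symbol to one of its two distortion-free outputs, e.g. f(0) = 1, f(1) = f(2) = 0, which costs H_b(1/3) bits, strictly more than R_2(0) = log2(3/2):

```python
import math

def Hb(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def R1(D):
    """(18): causal information rate distortion for the Hamming metric d1."""
    return -Hb(D) - D * math.log2(2/3) - (1 - D) * math.log2(1/3)

def R2(D):
    """(19): causal information rate distortion for d2."""
    return -Hb(D) - D * math.log2(1/3) - (1 - D) * math.log2(2/3)

print(R1(0.0))              # log2(3) ≈ 1.585: matched by a lossless encoder
print(R2(0.0), Hb(1/3))     # ≈ 0.585 vs ≈ 0.918: a strict gap at Dmin
```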
the information The information and Neuhoff-Gilbert rate distortion func- and Neuhoff-Gilbert rate distortion functions are equal. In tions for the two metrics are plotted in Figure 3. Note the opportunistic scheduling setting, this is equivalent to that for both distortion metrics, the causal rate distortion a transmitter sending N messages to the receiver, where function is not operationally achievable. Furthermore, in each transmission is assigned a disjoint subset of the the lossless encoding case (D = Dmin), there is a gap in channels over which to transmit. However, this restriction between the Neuhoff-Gilbert and information rate distor- can result in a reduced throughput. tion functions when using the second distortion metric. REFERENCES This gap arises when for a state x, there exist multiple encodings z that can be used with no distortion penalty. [1] D. Neuhoff and R. Gilbert, “Causal source codes,” , IEEE Transactions on, vol. 28, no. 5, pp. 701–713, 1982. This observation is formalized in the following result. [2] A. Gorbunov and M. S. Pinsker, “Nonanticipatory and prognos- tic epsilon and message generation rates,” Problemy Theorem 3. Let {Xi} represent an i.i.d. discrete memo- Peredachi Informatsii, vol. 9, no. 3, pp. 12–21, 1973. ryless source from alphabet X , encoded into a sequence [3] S. Tatikonda and S. Mitter, “Control under communication con- straints,” Automatic Control, IEEE Transactions on, vol. 49, no. 7, {Zi} taken from alphabet Z, subject to a per-letter dis- pp. 1056–1068, 2004. tortion metric d(xi, zi). Furthermore, suppose there exists [4] P. A. Stavrou and C. D. Charalambous, “Variational equali- x1, x2, y ∈ X and z1, z2 ∈ Z, such that z1 6= z2 and ties of directed information and applications,” arXiv preprint arXiv:1301.6520, 2013. a) P(x1) > 0, P(x2) > 0, P(y) > 0, [5] R. 
Gallager, “Basic limits on protocol information in data com- b) z1 is the minimizer z1 = arg minz d(x1, z), munication networks,” Information Theory, IEEE Transactions on, c) z is the minimizer z = arg min d(x , z), vol. 22, no. 4, pp. 385–398, 1976. 2 2 z 2 [6] B. P. Dunn and J. N. Laneman, “Basic limits on protocol infor- d) d(y, z1) = d(y, z2) = minz d(y, z). mation in slotted communication networks,” in ISIT 2008. IEEE, Then RNG(D ) > R (D ). 2008, pp. 2302–2306. c min c min [7] A. A. Abouzeid and N. Bisnik, “Geographic protocol information and capacity deficit in mobile wireless ad hoc networks,” Informa- Proof: By [1], there exists a deterministic function tion Theory, IEEE Trans. on, vol. 57, no. 8, pp. 5133–5150, 2011. f : X → Z such that [8] J. Hong and V. O. Li, “Impact of information on network performance-an information-theoretic perspective,” in GLOBECOM NG Rc (Dmin) = H(f(X)) (22) 2009. IEEE, 2009, pp. 1–6. [9] T. Berger, Rate-Distortion Theory. Wiley Online Library, 1971. E[d(X, f(X))] = Dmin (23) [10] R. Gray, “Information rates of autoregressive processes,” Informa- tion Theory, IEEE Trans. on, vol. 16, no. 4, pp. 412–421, 1970. Define a randomized encoding q(z|x), where z = f(x) [11] T. Berger, “Explicit bounds to r (d) for a binary symmetric markov for all x 6= y, and the source symbol y is encoded source,” Information Theory, IEEE Transactions on, vol. 23, no. 1, randomly into z or z with equal probability. Conse- pp. 52–59, 1977. 1 2 [12] S. Jalali and T. Weissman, “New bounds on the rate-distortion quently, H(Z|X) > 0, and H(Z) > Iq(X; Z) under function of a binary markov source,” in ISIT 2007. IEEE, 2007, encoding q(z|x). Note that the new encoding also satisfies pp. 571–575. [d(X,Z)] = D . To conclude, [13] H. Witsenhausen, “On the structure of real-time source coders,” Eq min Bell Syst. Tech. J, vol. 58, no. 6, pp. 1437–1451, 1979. RNG(D ) = H(f(X)) > I (X; Z) [14] V. Borkar, S. Mitter, A. Sahai, and S. 
Tatikonda, “Sequential source min q coding: an optimization viewpoint,” in CDC-ECC’05. IEEE, 2005, ≥ R(Dmin) = Rc(Dmin) (24) pp. 1035–1042. [15] M. Chiang and S. Boyd, “Geometric programming duals of and rate distortion,” Information Theory, IEEE Transac- tions on, vol. 50, no. 2, pp. 245–258, 2004. Theorem 3 shows that if there exists only one deter-