3690 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019 Control Over Gaussian Channels With and Without Source–Channel Separation Anatoly Khina , Member, IEEE, Elias Riedel Garding° , Gustav M. Pettersson , Victoria Kostina , Member, IEEE, and Babak Hassibi, Member, IEEE

Abstract—We consider the problem of controlling an un- first, the quantizer needs to be capable of zooming in and stable linear plant with Gaussian disturbances over an ad- out to be able to track unbounded system disturbances, and ditive white Gaussian noise channel with an average trans- second, the channel code must be capable of improving mit power constraint, where the signaling rate of commu- its estimates of the past transmissions exponentially with nication may be different from the sampling rate of the time, a characteristic known as anytime reliability. We im- underlying plant. Such a situation is quite common since plement a separated scheme by leveraging recently devel- sampling is done at a rate that captures the dynamics of oped techniques for control over quantized-feedback chan- the plant and that is often lower than the signaling rate of nels and for efficient decoding of anytime-reliable codes. the communication channel. This rate mismatch offers the We further propose an alternative, namely, to perform ana- opportunity of improving the system performance by us- log joint source–channel coding, by this avoiding the digital ing coding over multiple channel uses to convey a single domain altogether. For the case where the communication control action. In a traditional, separation-based approach signaling rate is twice the sampling rate, we employ ana- to source and channel coding, the analog message is first log linear repetition as well as Shannon–Kotel’nikov maps quantized down to a few bits and then mapped to a channel to show a significant improvement in stability margins and codeword whose length is commensurate with the num- linear-quadratic costs over separation-based schemes. We ber of channel uses per sampled message. Applying the conclude that such analog coding performs better than sep- separation-based approach to control meets its challenges: aration, and can stabilize all moments as well as guarantee almost-sure stability.

Manuscript received August 11, 2018; accepted November 3, 2018. Index Terms—Channel coding, combined source– Date of publication April 19, 2019; date of current version August 28, channel coding, Gaussian channel, Lloyd–Max algorithm, 2019. This work was supported by the European Union’s Horizon 2020 networked control systems, quantization, tree codes. research and innovation program under the Marie Skłodowska-Curie Grant 708932. The work of E. Riedel Garding° was supported by the National Science Foundation (NSF) under Grant CCF-1566567 through the SURF program. The work of G. M. Pettersson was supported by The I. INTRODUCTION Boeing Company under the SURF program. The work of V. Kostina was HE current technological era of ubiquitous wireless con- supported in part by the NSF under Grant CCF-1566567. The work of nectivity and the Internet of Things exhibits an ever- B. Hassibi was supported in part by the NSF under Grant CNS-0932428, T Grant CCF-1018927, Grant CCF-1423663, and Grant CCF-1409204, growing demand for new and improved techniques for stabi- by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory lizing cyber-physical and networked control systems, which as through the President and Director’s Fund, and by King Abdullah Univer- a result have been the subject of intense recent investigations sity of Science and Technology. This paper was presented in part at the [1]–[4]. Unlike traditional control with colocated plant, observer IEEE Conference on Decision and Control, Las Vegas, NV, USA, Dec. 2016. This work was done in part while A. Khina and V. Kostina were and controller, the components of such systems are separated by visiting the Simons Institute for the Theory of Computing. Recommended unreliable communication links. In many of these systems, the by Associate Editor K. Kashima. (Corresponding author: Anatoly Khina.) rate at which the output of the plant is sampled and observed, as A. Khina was with the Department of Electrical Engineering, California well as the rate at which control inputs are applied to the plant, Institute of Technology, Pasadena, CA 91125 USA. He is now with the Department of Electrical Engineering—Systems, Tel Aviv University, Tel is different from the signaling rate with which communication Aviv 6997801, Israel (e-mail:,[email protected]). occurs. The rate at which the plant is sampled and controlled is E. R. Garding° was with the Department of Applied Mathematics and often governed by how fast the dynamics of the plant is, whereas Theoretical Physics, Centre for Mathematical Sciences, University of the signaling rate of the communication depends on the band- Cambridge, Cambridge CB3 0WA, U.K. He is now with the School of width available, the noise levels, etc. As a result, there is no Engineering Sciences, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail:,[email protected]). inherent reason why these two rates should be related and, in G. M. Pettersson is with the School of Engineering Sciences, KTH fact, the communication signaling rate is almost always higher Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail:, than the control sampling rate. [email protected]). This latest fact clearly gives us the opportunity to improve the V. Kostina is with the Department of Electrical Engineering, Cal- ifornia Institute of Technology, Pasadena, CA 91125 USA (e-mail:, performance of the system by conveying the information about [email protected]). each sampled output of the plant, and/or each control signal, B. Hassibi is with the Department of Electrical Engineering, California through multiple uses of the communication channel. Institute of Technology, Pasadena, CA 91125 USA (e-mail:, hassibi@ The standard information-theoretic approach suggests quan- caltech.edu). tizing the analog messages (the sampled output or state signal) Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. and then protecting the quantized bits with a channel error- Digital Object Identifier 10.1109/TAC.2019.2912255 correcting code whose block length is commensurate with the

0018-9286 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3691

number of channel uses available per sample. This approach the digital domain altogether, and can therefore be viewed as a relies on the source–channel separation principle, which prof- simple instance of joint source–channel coding (JSCC) [24]. fers that quantization of the messages and channel coding of the Surprisingly, in the Gaussian rate-matched case, in which one quantized bits can be done independently of one another. additive white Gaussian noise (AWGN) channel use is avail- Nonetheless, while source–channel separation-based able per one white Gaussian source sample, a simple amplifier schemes become optimal in communication systems where achieves the Shannon limit with zero delay [25], [26]. The op- large blocks of the message and the channel code are processed timality of linear schemes extends further to the case where together (necessitating noncausal knowledge of all the message KC > 1 uses of an AWGN channel with perfect instantaneous signals and entailing large delays)—a celebrated result [5], [6, feedback are available per one white Gaussian source sam- Ch. 3.9]—it is not true for control systems that require real-time ple [26]–[30], the reason being that a Gaussian source is prob- (low-delay) communication of causally available messages. abilistically matched to a Gaussian channel [31]—uncommon Furthermore, since any error made in the past is magnified coincidence. Tatikonda and Mitter [32] exploited a special prop- in each subsequent time step due to the unstable nature of erty of the erasure channel with feedback, in which a retransmis- the plant, the source–channel separation principle requires a sion scheme attains its capacity without delay. A related example stronger notion of error protection, termed anytime reliability is control over a packet-drop channel, considered by Sinopoli by Sahai and Mitter [7]. Anytime reliability guarantees that the et al. [33]. There, a simple retransmission scheme attains the error probability of causally encoded quantized (“information”) optimum, as long as the packet drop probability is not too high. bits decays faster than the inflation factor at each step. Sahai Coding of Gauss–Markov sources over a packet erasure channel and Mitter [7] further observed that anytime-reliable codes with feedback is studied in [34]. have a natural tree code structure reminiscent of the codes de- JSCC in the absence of probabilistic matching is challenging. veloped by Schulman [8] for the related problem of interactive In the Gaussian rate-mismatched case with no communication communication. feedback, in which KC > 1 AWGN channel uses are available Sukhavasi and Hassibi [9] showed that anytime reliability can per one source sample, repetitive transmission of the source sam- be guaranteed with high probability by concentrating on the fam- ple is suboptimal. Non-linear mappings are known to achieve ily of linear time-invariant (LTI) codes and choosing their co- better performance, as noted originally by Shannon [35] and Ko- efficients at random. Unfortunately, maximum-likelihood (ML) tel’nikov [36], and in the context of control—in [37]–[39] (and (optimum) decoding of tree codes is infeasible.1 To overcome is akin to Witsenhausen’s celebrated counterexample [40]). this problem, a sequential decoder [10], [11, Ch. 6.4], [12, In this paper, we concentrate on the simple case of stabilizing Ch. 10], [13, Sec. 6.9], [14, Ch. 6], [15, Ch. 6] for tree codes a scalar discrete-time linear quadratic Gaussian (LQG) control was proposed in [16] and was shown to achieve anytime reliabil- system over an AWGN channel with KC =2channel uses per ity with high probability while maintaining bounded expected control sample, with a fixed SNR. As we show in the sequel, decoding complexity, albeit with some loss of performance. this SNR imposes an upper limit on the size of the maximum Tree codes transform the control task over a noisy channel unstable eigenvalue of the plant that can be stabilized. to that over a noiseless channel with finite-capacity C,imply- We develop end-to-end separation-based and JSCC schemes ing that the channel code needs to be supplemented with an and compare their LQG costs, as well as the minimum required adequate fixed-rate quantizer (a.k.a. fixed-length lossy source SNR for stabilizing the system; to the best of our knowledge, coder). Such a quantizer compresses the analog signal to pack- this is the first time the latter is devised for the scenario of ets of exactly C bits to be communicated from the observer to LQG control over an AWGN channel (cf., [7]), whereas the the controller at every time step. As unstable systems with dis- former (in the absence of instantaneous feedback) was offered turbances that have distributions with unbounded support can- in the conference version of this paper [41]. We show that JSCC not be stabilized by a static quantizer [17, Sec. III-A], adaptive schemes achieve far better performance while requiring far less uniform and logarithmic quantizers that establish stabilizability computational and memory resources. We further observe that guarantees were devised by Yuksel¨ [18] and Minero et al. [19] an inherent advantage of JSCC schemes is that they allow a (and others), respectively. A greedy algorithm that re-calculates, graceful improvement in performance with the SNR, while the at every time step, the probability density function (PDF) of the performance of separation-based schemes saturates due to their source conditioned on the previously transmitted packets (a digital nature. Moreover, whereas separation-based schemes can la sequential Bayesian filtering [20]) and applies Lloyd–Max guarantee only a finite number of bounded moments, certain quantization [21, Ch. 6] with respect to this PDF, was proposed JSCC schemes, e.g., linear ones (repetition-based included), by Bao et al. [22] (albeit without any optimality claims). For can stabilize all moments (as well as guarantee almost-sure the scenario of Gaussian disturbances and scalar measurements, stability). this algorithm has been recently shown to be greedily optimal Another important product of this paper is the implemen- in [23], i.e., it minimizes the linear-quadratic cost at each time tation and simulation of all the aforementioned schemes and step; it was further shown there to be nearly globally optimal. algorithms along with a comparison of their performance; the An obvious alternative strategy to separated source/channel Python 3 code is available online in [42]. coding is to simply repeat the transmitted (analog) signal—this An outline of the rest of the paper is as follows. We formu- adds a linear factor to the signal-to-noise ratio (SNR) (3 dB late the problem setup in Section II. Three different ingredients for a single repetition). This strategy maps the analog control that are used to construct the schemes for control over noisy signals directly into analog communication signals, avoiding channels of this paper, namely, a quantizer for control over a noiseless finite-rate channel, an anytime-reliable tree code, and 1Except over erasure channels, over which ML decoding amounts to solving an Archimedean bi-spiral-based JSCC map, are described in linear equations [9]. Sections III, IV, and V, respectively. They are subsequently used

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3692 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

a  a ,...,a T of notation, by t ( t;1 t;K C ) , where “T” denotes the transpose operation. The corresponding channel output vector b  b ,...,b T is denoted by t ( t;1 t;K C ) . Causal Transmitter: At time t, generates KC channel inputs t t−1 K C by applying a causal function Et : R × R → R to the t measured states x  (x1 ,...,xt ) and all past control signals t−1 u  (u1 ,...,ut−1 ):   t t−1 at = Et x ,u , (4) where the input is subject to an average power constraint (3). Remark 2.2: In this paper, we assume that the ob- t− Fig. 1. Scalar control system with an AWGN driving disturbance and server/transmitter knows all past control signals u 1 ;fora an AWGN communication channel. The dashed line represents the as- discussion of the scenario when such information is not avail- sumption that the past control signals are available at the transmitter. able at the observer, see Section IX-C. Causal Receiver: At time t, observes KC channel outputs and generates a control signal ut by applying a causal function tK in Sections VI and VII to develop source–channel separation- Dt : R C → R to all the available channel outputs based and JSCC-based schemes for LQG control over an AWGN   u D bt , channel, and are compared in terms of their LQG cost in t = t (5) Section VIII. We conclude the paper with Section IX, by dis- bt  bT,...,bT T cussing the principal differences between the proposed schemes where ( 1 t ) . along with possible extensions. Cost: Similarly to the classical LQG control setting (in which the controller and the observer are colocated), we wish to mini- mize the average-stage LQG cost at the time horizon T , II. PROBLEM SETUP  T−1  We now formulate the control–communications setting that 1 2 2 2 J¯T  E QT x + Qt x + Rt u , will be treated in this paper, depicted in Fig. 1. We concentrate T T t t t on the simple case of a scalar linear fully observable system. =1 In contrast to traditional control settings, the observer and the for known non-negative weights {Qt } and {Rt }, by designing controller are not colocated in this setting, and are connected appropriate operations at the observer, which also plays the role instead via a scalar AWGN channel. of the transmitter over the channel (2); and the controller, which Remark 2.1: The model and solutions can be extended also serves as the receiver over the channel (2). to more complex cases of vector states and channels; see For the important special case of fixed parameters, Section IX-B. Q ≡ Q, R ≡ R, The control and transmission duration spans the time interval t t [T ]  {1,...,T}. we further define the infinite-horizon cost Plant: A scalar discrete-time linear system dynamics J¯∞  lim J¯T , (6) T →∞ xt+1 = αxt + wt + ut ,t∈ [T − 1], (1) assuming the limit exists. where xt is the (scalar) state at time t, wt is an AWGN of power We recall next recently developed schemes for quantization W , α is a known scalar satisfying |α| > 1, and ut is the control signal. We further assume that x =0for simplicity.2 and channel coding for control as well as results from informa- 0 tion theory for JSCC design with low delay. Channel: We assume KC ∈ N channel uses are available per each control sample. Hence, at each time instant t, we can use the channel III. CONTROL WITH NOISELESS FINITE-RATE FEEDBACK In this section, we consider the model of Section II with the bt;i = at;i + nt;i ,i∈ [KC ],t∈ [T − 1] (2) AWGN channel (2) replaced with a noiseless channel of finite KC times, where bt;i is the ith channel output corresponding to capacity C, depicted in Fig. 2. That is, in this case, the channel, control sample t, at;i is the corresponding channel input subject transmitter, and receiver are as follows. C to a unit power constraint Channel: At time t, a packet t ∈{0,...,2 − 1} is sent   2 over a noiseless channel of capacity C, meaning that the receiver E at i ≤ 1, (3) ; obtains t at time t. 3 E and nt;i is an AWGN of power 1/SNR. We collect all the KC Transmitter: The function t (4), in this case, has a discrete codomain {0,...,2C − 1} (with no power constraints). channel uses in a column vector and denote it, with a slight abuse C Receiver: The domain of Dt (5) is {0,...,2 − 1}. Thus, the transmitter–receiver design amounts, in this case, 2 x This assumption can be replaced with an initial state 0 with a Gaussian to fixed-length sequential quantization. PDF. 3 A recent result [23] shows that an adaptive quantizer that suc- This representation is without loss of generality since the case of an average x t power PC and noise power N can always be transformed to an equivalent chan- cessively calculates the PDF of t given and applies Lloyd– nel with average power√ 1 and noise power N/PC  1/SNR by multiplying Max quantization with respect to it is greedily optimal and both sides of (2) by 1/ PC . close-to-globally optimal whenever fw is log-concave.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3693

and its quantization Q(w)   D  E (w −Q(w))2 (7a)

C 2−1 p[+1] 2 = (w − c[]) fw (w)dw. (7b) p  =0 [ ] Denoted by D∗, the minimal achievable distortion D;the optimal quantizer is the one that achieves D∗. f Fig. 2. Scalar control system with a driving white Gaussian distur- Remark 3.1: Since w is log-concave, it is continuous [43]. bance and a noiseless finite-capacity channel (“bit pipe”). The dashed Hence, the inclusion/exclusion of the boundary points of each line represents a bit-pipe of capacity C. cell does not affect the distortion of the quantizer, meaning that the boundary points can be broken systematically. Remark 3.2: We concentrate, in this paper, on input PDFs Definition 3.1 (Log-concave function; see [43]): A function with an infinite support. Consequently, p[0] = −∞ and p[2C ]= f R → R ◦f : ≥0 is said to be log-concave if its logarithm log ∞, and the leftmost and rightmost intervals are open. f x is concave; we use the extended definition that allows ( ) to The optimal quantizer satisfies the following necessary con- f x ∈ R ∪ {−∞} assign zero values, i.e., log ( ) is an extended ditions [21, Ch. 6.2]. −∞ real-value function that can take the value . Proposition 3.1 (Nearest neighbor condition): For a fixed We recall the Lloyd–Max algorithm and its optimality guaran- reproduction-point set c (fixed decoder), the partition-level set tees in Section III-A; the appropriate adaptive networked control p (encoder) that minimize the distortion D (7) is system is described in Section III-B. c[ − 1] + c[] p[]= ,=1, 2,...,2C − 1, (8) A. Quantizer Design 2 p −∞ p C ∞ The exposition in this section is based on [21, Part II]. and [0] = and [2 ]= . Definition 3.2 (Quantizer): A scalar quantizer Q of rate C Proposition 3.2 (Centroid condition): For a fixed partition- C p c is described by an encoder EQ : R →{0,...,2 − 1} and a level set (fixed encoder), the reproduction-point set (decoder) C C D decoder DQ : {0,...,2 − 1}→{c[0],...,c[2 − 1]}⊂R. that minimizes the distortion (7) is    With a slight abuse of notation, we shall define the quantization c[]=E w  p[] ≤ w

B. Controller Design 4The encoder and decoder that give rise to the same quantizer are unique up to a permutation of the labeling of the index . We now describe the optimal greedy control pol- 5If some inequalities are not strict, then the quantizer can be reduced to icy, the implementation of which is available in [42, another quantizer of lower rate. tree/master/code/separate/control]. To that end,

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3694 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

we make use of the following lemma that extends the separa- I   p  ,p  where [ t ] [ t [ t ] t [ t + 1]) as in Definition 3.3, and tion principle of estimation and control to networked control pt [t +1] t−1 t−1 γ  f t − t − α| ,u dα systems. xt | 1 ,u 1 ( ) . Lemma 3.1 (see [47], [29]): The optimal control law is pt [t ] u −L x given by 6) Calculates the next prior PDF using (1) and t = t ˆt t t fx |t ,ut xt | ,u ut = −Lt xˆt , t +1 ( +1 )     where x − u 1 t+1 t  t t−1 f t t −  ,u ∗ f x = xt | ,u 1  w ( t+1) St+1 |α| α  Lt = α (10) Rt + St +1 where “∗” denotes the convolution operation, and the two is the optimal linear quadratic regulator control gain, xˆt  convolved terms correspond to the PDFs of the quantiza-  t E xt  , and St satisfies the dynamic Riccati backward re- tion error α(xt − xˆt ) and the disturbance wt . cursion [48] with ST = QT and ST +1 = LT =0: Controller/Receiver. At time t ∈ [T − 1]: S R 1) Runs the Lloyd–Max algorithm (see Algorithm 3.1) with t+1 t 2 St Qt α . = + respect to the prior PDF fx |t −1 ,ut −1 as in Step 2 of the St+1 + Rt t observer/transmitter protocol. Moreover, this controller achieves the cost6 2) Receives the index t . 3) Reconstructs the quantized value: xˆt = DQ (t ). T    t 1 2 ut −Lt xt J¯T = St W + Gt E (xt − xˆt ) 4) Applies the control actuation = ˆ to the system. T f t t − t=1 5) Calculates the posterior PDF xt | ,u 1 and the next prior PDF fx |t ,ut as in Steps 5 and 6 of the ob- G S α2 − S Q t +1 with t = t+1 t + t . server/transmitter protocol. The optimal greedy algorithm minimizes the estimation dis- f E x − x 2 t Theorem 3.2 (see [23]): Let w be a Gaussian PDF. Then, tortion ( t ˆt ) at time , without regard to its effect on Algorithm 3.2 constitutes the optimal greedy control policy. future distortions. To that end, at time t, the transmitter and t−1 the receiver calculate the PDF of xt conditioned on  and t− u 1 f t − t − IV. ANYTIME-RELIABLE CODES , xt | 1 ,u 1 , and apply the Lloyd–Max quantizer to this 7 f t − t − f t t − PDF. We refer to xt | 1 ,u 1 and to xt | ,u 1 as the prior and We now describe causal error-correcting codes that allow us posterior PDFs , respectively. to meet the assumption of a noiseless finite-capacity channel of Although the optimal greedy algorithm does not achieve Section III over the AWGN channel (2). global optimality [49], its loss is negligible [23]. Since any decoding mistake is multiplied by α at every time Algorithm 3.2 (Optimal greedy control): step, and the corresponding second moment (power)—by α2 , Initialization. Both the transmitter and the receiver set the code should have an error probability that decays exponen- 2 8 1) 0 = x0 = u0 =0. tially with time with an exponent that is greater than α ; see [7], f x | , ≡ f x [9], and [16] for further details and discussion. 2) Prior PDF: x1 |0 ,u0 ( 1 0 0) x1 ( ). Observer/Transmitter. At time t ∈ [T − 1]: We construct such codes, termed anytime-reliable codes [7], 1) Observes xt . for memoryless binary-input output-symmetric (MBIOS) chan- 2) Runs the Lloyd–Max algorithm (Algorithm 3.1) with re- nels and then apply these results for the AWGN channel by employing appropriate digital constellations. f t − t − spect to the prior PDF xt | 1 ,u 1 to obtain the quan- Q x C Definition 4.1 (MBIOS channel): A binary-input channel is tizer t ( t ) of rate ; we denote its partition-level and a system with binary input alphabet {0, 1}, output alphabet Z, p c reproduction-point sets by t and t , respectively. and two probability transition functions: q(z|0) for input c =0 3) Quantizes the system state xt [recall Definition 3.2]: and q(z|1) for input c =1. The channel is said to be memoryless if the probability distribution of the output depends only on the t = EQ (xt ), t input at that time and is conditionally independent of previous x Q x D  . ˆt = t ( t )= Qt ( t ) and future channel inputs and outputs. It is further said to be output-symmetric if there exists an involution π : Z→Z, i.e.,  4) Transmits the quantization index t . a permutation that satisfies π−1 = π, such that q(π(z)|0) = 5) Calculates the posterior PDF: q(z|1) for all z ∈Z.9 t t−1 f t t − x | ,u The encoder and resulting code need to be causal, in our case, xt | ,u 1 ( t )  due to the sequential nature of the information stream. That is, at t−1 t−1 time instant t, k new information bits ıt are fed to the encoder; fx |t −1 ,ut −1 (xt | ,u )/γ, xt ∈I[t ] = t n c , the encoder, then, produces coded bits t by encoding all of 0 otherwise the available information bits ıt ,    t ct = Et ı , (11) 6We set RT =0and T =0for the definition of xˆT , as no transmission or control action are performed at time T . 7Since ut−1 is a deterministic function of t−1 , it suffices to condition 8To stabilize higher moments, one needs higher exponents. See the discussion on t−1 . However, when incorporating Algorithm 3.2 into a separation-based in Section IX-A below. scheme, this distinction becomes useful since ut−1 becomes a function of pos- 9This also extends to additive noise channels, such as the binary-input AWGN sibly corrupted channel outputs in this case. channel.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3695

with ⎡ ⎤ G1 00··· ··· ⎢ ⎥ ⎢ G2 G1 0 ··· ···⎥ ⎢ ⎥ ⎢ . . . . ⎥ Fig. 3. MBIOS channel with reconstructions of all past information bits. G ⎢ ...... ···⎥ n,R = ⎢ . . ⎥ (15a) ⎢ G G ··· G 0 ⎥ ⎣ t t−1 1 ⎦ . . . .  kt n ...... using an encoding function Et : {0, 1} →{0, 1} , agreed . . . . . upon by the encoder and the decoder before transmission.   The sequential encoding operation can be conveniently ıT ıT ıT ··· ıT ... = 1 2 t (15b) viewed as advancing over a prefix tree (trie) and the corre-   cT cT cT ··· cT ... . sponding codes are therefore referred to as tree codes. = 1 2 t (15c) t At time t, the decoder recovers estimates {ıi|t } of all i=1 We now define the random LTI tree code ensemble. the past information bits ıt by applying a causal function D Znt →{ , }kt ct Definition 4.3 (LTI tree code ensemble): An ensemble of : 0 1 to all the received channel outputs , LTI tree codes of rate R = k/n that map kt information bits to produce (see also Fig. 3) into n bits at every time step t, where the entries in all {Gi} of     Gn;R of (15a) are i.i.d. and uniform.  t q ı1|t , ı2|t ,...,ıt|t = Dt c . (12) Theorem 4.1 (Error exponent under ML decoding): Let be an MBIOS channel. Let further >0 and d0 ∈ N. Then, the One is then assigned the task of choosing a sequence of func- probability that a particular code from the random LTI tree   code ensemble of Definition IV.3 has an anytime exponent (13) tion pairs {(Et , Dt )|t ∈ N} that provides anytime reliability. of EG (R) − , for all t ∈ N and d>d0 , under optimal (ML) We recall this definition as stated in [9]. decoding, is bounded from below by Definition 4.2 (Anytime reliability): Define the probabil-   t d ∞ t   −nd ity of the first error event at time happening steps back 2 0 P t, d ≤ −[E G (R)−]nd ≥ − (delay) as Pr e ( ) 2 1 − −n t d d 1 2   =1 = 0 Pe (t, d)  P ıt−d = ˆıt−d|t , ∀δ>d,ıt−δ = ˆıt−δ|t , where EG is the block random-coding error exponent [50, Ch. 9], [13, Sec. 5.6], [12, Ch. 7] where the probability is over the randomness of the information EG (R)  max [E0 (ρ) − ρR] (16a) bits {ıt } and the channel noise. Suppose we are assigned a 0≤ρ≤1 n   budget of channel uses per time step of the evolution of the  1+ρ R, β 1 1 1 1 plant. Then, an encoder–decoder pair is called ( ) anytime E (ρ)  − log q 1+ρ (z|0) + q 1+ρ (z|1) . A ∈ R d ∈ N 0 reliable if there exist and 0 , such that z∈Z 2 2 (16b) −βnd Pe (t, d) ≤ A2 ∀t, d ≥ d , (13) 0 Thus, for any >0, however small, this probability can be made arbitrarily close to 1 by taking d0 to be large enough. where β is called the anytime exponent. Unfortunately, ML decoding requires searching over all pos- d ≥ d Remark 4.1: The requirement of 0 in (13) can always sible codewords—the number of which grows exponentially fast be dropped by replacing A by a larger constant. Conversely, A with time—rendering it infeasible except over erasure chan- can be replaced with 1 by reducing β by >0, however small, nels [9]. We therefore turn to sequential decoding, which trades d A and taking a large enough 0 . Nonetheless, we use both and some performance for feasible expected complexity. d0 in the definition for convenience. B. LTI Anytime-Reliable Codes Under Sequential A. LTI Anytime-Reliable Codes Under ML Decoding Decoding Instead of an exhaustive search over all possible codewords— Following Sukhavasi and Hassibi [9], we now present an LTI O kt anytime-reliable code ensemble under ML decoding. the complexity of which grows as (2 )—as is done in ML  decoding, one may restrict the search to only the most likely When restricted to an LTI (“tree”) code, each function εt {G ,...,G } codeword paths, such that their per letter complexity does not can be characterized by a set of matrices 1 t , where grow significantly with time. Such algorithms are known col- G ∈ Zn×k t 2 . The sequence of quantized measurements at time lectively as sequential decoding algorithms. t {b }t , i i=1, is encoded as In this paper, we shall concentrate on one of the two popular variants of this algorithm—the Stack Algorithm , the other being ct = G1 ı1 + G2 ı2 + ···+ Gt ıt (14) the Fano Algorithm . The former achieves better time complexity and smaller error probability but is more expensive in terms or equivalently in a matrix form of memory, compared to the latter. Nonetheless, the anytime exponent of both algorithms is the same; for a treatment of the Fano algorithm, which is similar to the one presented next, the c G ı, = n;R reader is referred to [16].

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3696 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

Algorithm 1: Sequential Decoding Stack Algorithm. Q ← MaxPriorityQueue(node → node.metric)  Leaf nodes, ordered by Fano metric Q.push with priority(root) while Q.top.depth

We next summarize the relevant properties of the stack de- Since EJ (B,R) is a monotonically increasing function of coding algorithm when using the generalized Fano metric (see, B, choosing B = E0 (1) maximizes the exponential decay of 12 e.g., [12, Ch. 10]) to compare possible codeword paths P¯e (d) in d. Interestingly, for this choice of bias, we have EJ (E (1),R)=EG (R) whenever EG (R) is achieved by ρ = T 0 1 in (16a), i.e., for rates below the critical rate. For other values M(c1 ,...,cN )= M(ct ) (17a) of ρ, EJ (E (1),R) is strictly smaller than EG (R). t 0 =1 The choice B = R, on the other hand, is known to minimize q(zt |ct ) the expected computational complexity (which has a Pareto M(ct )  log   n − nB 1  distribution; see [16], for details), and is therefore a popular c∈{ , }n q(zt |c ) 0 1 2 choice in practice. Moreover, for rates below the cutoff rate (17b) RE0 (1), with the highest metric is further explored and replaced with the expected complexity is known to grow rapidly with the its immediate descendants and their metrics. The stack al- code length for any metric [52], implying that the algorithm is gorithm is outlined in Algorithm 1 and implemented in [42, applicable only for rates below the cutoff rate E0 (1). tree/master/code/separate/coding], [51]; for a detailed description of the stack algorithm (as well as the Fano C. Modulation algorithm and variants thereof), see [11, Ch. 6.4], [12, Ch. 10], [13, Sec. 6.9], [14, Ch. 6], and [15, Ch. 6]. In order to support the transmission of more than one coded Theorem 4.2 (Error exponent under sequential decoding): bit per channel use, we modulate the bits using pulse-amplitude n/K Let q be an MBIOS channel. Let further >0 and d ∈ N. modulation (PAM). Specifically, we map every C consec- 0 T utive coded bits of ct =(ct;1,...,ct;n ) into a constellation Then, the probability that a particular code from the random k LTI tree code ensemble of Definition 4.3 has an anytime point of size 2 , where k is the number of information bits t exponent (13) of EJ (R) − , for all t ∈ N and d>d0 , under than need to be conveyed at each time step . We normalize the sequential stack decoding, is bounded from below by constellation to have an average unit power:   ∞ t    k   −nd0  −[E J (R)−]nd 2 3 j−1 ct j ki Pr Pe (t, d) ≤ A2 ≥ 1− at i = 2 (−1) ; + ,i∈ [KC ]. (18) 1 − 2−n ; 2k−1 − 1 t=1 d=d0 j=1

where EJ is Jelinek’s sequential decoding exponent ρ   V. L OW-DELAY JSCC EJ (B,R)  max E0 (ρ)+B − (1 + ρ)R , 0≤ρ≤1 1+ρ In this section, we review known results from information theory for transmitting an i.i.d. zero-mean Gaussian source st E0 is given in (16b), and A is finite for B0, however small, this probability can be appropriate distortion measure for our case of interest is the d made arbitrarily close to 1 by taking 0 to be large enough. mean square error (MSE) distortion.

10Note that optimizing (17a) in this case is equivalent to ML decoding. 11 12 Note that E0 (ρ)/ρ is a monotonically decreasing function of ρ, therefore For finite values of d a lower choice of B may be better, since the constant B 0. A might be smaller in this case.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3697

Transmitter: Similarly to (4), it generates KC channel inputs A. With Feedback by applying a function E : R → RK C s to the source sample , When perfect instantaneous feedback is available, the fol- a = E (s) , lowing simple scheme, due to Elias [26] (see also [31, Ch. 3.5] and the references therein), is known to achieve SDROPTA for where a is subject to an average power constraint (3). K ∈ N K b C . Receiver: Observes the C channel outputs [defined as in Scheme V.1 (JSCC with feedback): Section II], and constructs an estimate sˆ of s, by applying a Transmitter. At channel use i ∈ [KC ] function D : RK C → R: 1) Calculates the MMSE estimation error of the source s s D b . ˆ = ( ) given all past outputs (b1 ,...,bi−1 ) (available via the Cost: The cost, commonly referred to as average distortion in instantaneous feedback): the context of JSCC, is defined by sMMSE s − sMMSE, ˜i−1 = ˆi−1 D = E (s − sˆ)2 , where the MMSE estimate sˆi of st given (b1 ,...,bi ) is and the corresponding (source) signal-to-distortion ratio (SDR) equal to (clearly sˆMMSE =0) is defined as 0 ! PS P SDR  . sMMSE sMMSE S b . D ˆi =ˆi− +SNR i i (22) 1 (1 + SNR) +1 Our results here are more easily presented in terms of unbi- ased errors, as these can be regarded as uncorrelated additive 2) Transmits the estimation error s˜i− after a suitable power noise in the sequel (when used as part of the developed con- 1 trol scheme). Therefore, we consider the use of (sample-wise) adjustment: correlation-sense unbiased estimators (CUBE), namely, estima- i−1 tors that satisfy a (1 + SNR) sMMSE . i = ˜i−1 (23) PS E [s (s − sˆ)] = 0. B i ∈ K We note that any estimator sˆ can be transformed into a CUBE Receiver. At channel use [ C ] sMMSE s sˆ by multiplying by a suitable constant: 1) Calculates the MMSE estimate ˆi of from   b ,...,bi 2 ( 1 ) as in (22); E s B s s . 2) Calculates the CUBE estimate sˆi of s from (b1 ,...,bi ) ˆ = B ˆ (19) E [ssˆ ] using (19):

For a further discussion of such estimators and their use in i communications, the reader is referred to [24]. s (1 + SNR) sMMSE. ˆi = i ˆi Shannon’s celebrated result [5] states that the minimal achiev- (1 + SNR) − 1 able distortion, using any transmitter–receiver scheme, is dic- 13 tated, in the case of a Gaussian source, by Theorem 5.1 (see [26]): Scheme V.1 achieves the OPTA SDR (21). 1 K C log (1 + SDR) = R(D) ≤ KC C = log (1 + SNR) , 2 2 We provide a short proof, for completeness. (20) Proof: The transmitter calculates the MMSE estimate from the channel outputs, which are available to it via the feedback where R(D) is the rate–distortion function of the source and C and transmits the estimation error with a proper power adjust- is the channel capacity [5]; this result remains true even in the ment. presence of feedback—when the channel outputs are available at sMMSE E s Clearly, ˆt;0 = [ t ]=0. the transmitter [31, Ch. 1.5]. Thus, the optimal SDR, commonly At channel use i, the MMSE estimate is given by referred to as optimum performance theoretically achievable    (OPTA) SDR, is given by MMSE  sˆi  E s b1 ,...,bi (24a) K SDR =(1+SNR) C − 1. (21)    OPTA E sMMSE sMMSEb ,...,b = ˆi−1 +˜i−1 1 i (24b) Although (21) is attainable via separation for KS source sam-    K K K sMMSE E sMMSEb ,...,b ples and S C channel uses in the limit of large S ,itis =ˆi−1 + ˜i−1 1 i (24c) unknown, in general, how closely (21) can be approached at fi- ! nite delay. Here, we focus on the scenario of interest to control, P sMMSE SNR S b namely, the zero-delay case, in which a single Gaussian sample =ˆi−1 + i− i (24d) 1 + SNR (1 + SNR) 1 is instantaneously mapped to KC channel uses. We next concentrate on the case of KC > 1 and perfect in- sMMSE,b stantaneous feedback, in Section V-A. We further treat the case where (24c) holds since (˜i−1 i ) are independent of b ,...,b a of KC =2when no feedback is available, in Section V-B. 1 i−1 due to the structure of i (23), the fact that the MMSE estimation error is orthogonal to all the measurements, 13The rate–distortion function here is written in terms of the unbiased SDR, and hence also independent by Gaussianity, and (24d) holds in contrast to the more common-biased SDR expression log(SDR). since the MMSE estimator is linear in the Gaussian case.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3698 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

The MMSE is equal to the conditional MMSE in the Gaussian case, and is given by      2  2 E s˜i b1 ,...,bi = E s˜i

1 P . = i S (1 + SNR) This concludes the proof.  Remark 5.1 (Non-Gaussian noise): For the case of an addi- tive non-Gaussian noise channel with a given SNR, Scheme V.1 achieves an SDR of (1 + SNR)K C − 1. Since linear optimiza- tion is generally suboptimal in the non-Gaussian case, better performance can be attained using an appropriate scheme in- stead of Scheme V.1 (which is the optimal linear scheme); a notable attempt in this direction was made by Shayevitz and Feder [53]. In fact, for most noises, OPTA performance can be attained only in the limit of large information and code blocks, even in the presence of feedback [31, Ch. 3.5].

B. Without Feedback We now turn to the more involved case of low-delay JSCC Fig. 4. Linear and Archimedean bispiral curves. without feedback. We concentrate on the case of KC =2. That is, the case in which one source sample is conveyed over two where ω determines the rotation frequency, the factor creg is channel uses. chosen to satisfy the power constraint, and the sign(s) term Ana¨ıve approach is to send the source as is over both channel is needed to avoid overlap of the curve for positive and nega- uses, up to a power adjustment. The corresponding unbiased tive values of s (for each of which now corresponds a distinct SDR in this case is spiral, and the two meet only at the origin). This (bi)spiral al- lows to effectively improve the resolution with respect to small SDRlin = 2SNR noise values, since the 1-D source line is effectively stretched i.e., a linear improvement rather than an exponential one as compared to the noise, and hence the noise magnitude shrinks in (21). This scheme approaches (21) for very low SNRs, but when the source curve is mapped (contracted) back. However, suffers great losses at high SNRs. Note that the linear factor for large noise values, a jump to a different branch—referred to 2 comes from the fact that the total power available over both as a threshold effect —may occur, incurring a large distortion. ω channel uses has doubled, and the same performance can be Thus, the value needs to be chosen to be as large as possible to attained by allocating all of the available power to the first allow maximal stretching of the curve for the same given power, channel use while remaining silent during the second channel while maintaining a low threshold event probability. The SDRs ω use. for different values of are depicted in Fig. 5(a). This suggests that better mappings that truly exploit the extra Another ingredient that is used in conjunction with (25) is stretching s prior to mapping it to a bispiral using φλ(s)  channel use can be constructed. The first to propose an improve- λ ment for the 1:2 case were Shannon [35] and Kotel’nikov [36], sign(s)|s| :    in the late 1940s. In their works, the source sample is viewed as a stretch reg stretch λ λ a (s)=a (φλ(s)) = c |s| cos ω|s| sign(s) point on a single-dimensional line, whereas the two channel uses 1 1   . astretch s areg φ s cstretch|s|λ ω|s|λ s correspond to a two-dimensional (2-D) space. In these terms, 2 ( )= 2 ( λ( )) = sin sign( ) the linear scheme corresponds to mapping the one-dimensional (26) (1-D) source line to a straight line in the 2-D channel space (rep- resented by a dashed line in Fig. 4), and hence clearly cannot The choice λ =0.5 promises a great boost in performance in provide any improvement, as AWGN is invariant to rotations. the high-SNR regime, as is seen in Fig. 5(b). We further note E s|b ,b However, by mapping the 1-D source line into a 2-D curve that although the optimal decoder [ 1 2 ] is an MMSE esti- p b ,b |s that fills the space better, a great boost in performance can be mator, in this case, the ML decoder ( 1 2 ) achieves similar attained, as was demonstrated in [35], [36], [11, Ch. 8.2], [54]– performance for moderate and high SNRs. A joint optimization [58], and references therein, for different families of mappings. of λ and ω for each SNR, for both MMSE and ML decoding, In this paper, we concentrate on such a family that is was carried out in [59], with the latter depicted in Fig. 5. based on the Archimedean spiral, which was considered in A desired property of the linear JSCC schemes is that their several works [54], [58]–[61] (represented by the solid line SDR improves with the channel SNR (“SNR universality”). in Fig. 4): Such an improvement is not allowed by the separation-based  technique, as it fails when the actual SNR is lower than the areg s creg s ωs creg |s| ω|s| s 1 ( )= cos( )= cos( ) sign( ) design SNR, and does not promise (almost) any improvement for SNRs above it. This motivated much work in designing areg(s)=creg |s| sin(ωs)=creg |s| sin(ω|s|) sign(s) 2 JSCC schemes whose performance improves with the SNR, (25) even for the case of large block lengths [62]–[64]. The schemes

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3699

for some β>1. This has only a slight effect on the resulting SDRs, as is illustrated in Fig. 5. Finally, note that in no way do we claim that the spiral- based Shannon–Kotel’nikov (SK) scheme is optimal. Various other techniques exist, most using a hybrid of digital and analog components [56], [57], [65], which outperform the spiral-based scheme for various parameters. Nevertheless, this scheme gives good performance boosts, which suffice for our demonstration.

VI. CONTROL VIA SOURCE–CHANNEL SEPARATION The separation-based control scheme, outlined next, applies Algorithm 3.2 and sends the resulting quantization indices af- ter encoding with a tree code generated as in Section IV. The observer/transmitter knowingly ignores any decoding er- rors made by the controller/receiver by internally simulating the system without any decoding errors. On the other hand, the con- troller/receiver, upon detecting an error in the past, recalculates the steps of Algorithm 3.2 starting from this error and corrects for it in the following steps. This scheme is implemented in [42, tree/master/code/separate]. Scheme VI.1 (Separation-based): Initialization. 1) Selects the number of information bits k that are encoded at every time step t [recall (11), (14)]. 2) Sets the size M of the PAM constellation to be 2k and the number of coded bits, n,tobeKC k. 3) Generates G1 ,...,GT as in Definition IV.3. 4) Assigns the noiseless-channel capacity C of Algo- rithm 3.2 to equal k. 5) Initializes Algorithm 3.2. Observer/Transmitter. At time t ∈ [T ]: 1) Runs the observer/transmitter steps of Algorithm 3.2 with the control signal ut in Step 6 replaced by the signal generated by the controller in (28) below. 2) Maps the resulting quantization index t into the k-bit Fig. 5. Performances of the JSCC linear repetition scheme, OPTA λ ω input ıt of the tree encoder. bound, and the JSCC SK spiral scheme for optimized and ,forthe t standard case (β =1) and distortion-bounded case. The solid lines de- 3) Encodes ı into n coded bits ct according to (14). ω pict the performance of the standard spiral for various values of for two 4) Maps ct into KC constellation points at as in (18). stretch parameters λ =0.5 and 1, which perform better at high and low 5) Transmits the KC constellation points at over the KC SNRs, respectively. channel uses. Controller/Receiver. At time t ∈ [T ]: in these works achieve optimal performance (21) for a specific 1) Receives the KC channel outputs bt . design SNR (21), and improve linearly for higher SNRs. Similar 2) Recovers estimates of all information bits until time t, behavior is observed also in Fig. 5 where the optimal ω value ˆı |t ,ˆı |t ,...,ˆıt|t as in (12) using Algorithm 1. varies with the (design) SNR, and mimics closely the quadratic 1 2 3) Maps each ˆıτ |t (for τ ∈ [t]) into a quantization index esti- growth in the SDR. Above the design SNR, linear growth is ˆr achieved for a particular choice of ω. mate τ |t , where the superscript “r” stands for “receiver.” We further note that the distortion component due to the ˆr ˆr 4) Finds the earliest time τ ∈ [t − 1] for which τ |t = τ |t− . threshold event grows with |s|. To avoid this behavior, instead 1 " " We denote this time instant by t0 . If no such time instant of increasing the magnitude "astretch" proportionally to the phase   exists, set t0 = t. stretch ∠ a as in (26), we propose to increase it slightly faster at 5) Runs the controller/receiver steps of Algorithm 3.2 for a pace that guarantees that the incurred distortion does not grow time instants τ = t ,...,t− 1, with t replaced with |s| 0 with : (ˆr ,...,ˆr ) and the used control signals ut−1 .    1|t t|t abounded s cbounded|s|λβ ω|s|λ s 6) Runs Steps 1 and 3 of the controller/receiver of 1 ( )= cos sign( )   t t λβ λ (27) Algorithm 3.2 for time instant with replaced abounded(s)=cbounded|s| sin ω|s| sign(s) ˆr 2 with t|t .

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3700 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

7) Applies the control signal 3) Unnormalizes s¯ˆt to construct an estimate of st : # t−1  sˆt = P r s¯ˆt (31a) u −L xr αt−1−τ L xr − xr t|t−1 t = t ˆt|t + τ ˆτ |t ˆτ |t−1 (28a) τ =0 #   eff = P r s¯t + n (31b) t−1  t|t−1 t −L xr αt−1−τ L xr − xr = t ˆt|t + τ ˆτ |t ˆτ |t−1 # r eff τ =t x P r n . 0 =˜t|t−1 + t|t−1 t (31c) (28b) r t 4) Constructs an estimate xˆ of xt given b . Since sˆt ⊥ xr t|t to the system, where ˆτ |t denotes the estimate of the xr 14  ˆt|t−1 , the linear MMSE estimate amounts to x ˆr , ˆr ,...,ˆr source τ given 1|t 2|t τ |t at the receiver. xr xr SDR0 s ˆt|t =ˆt|t−1 + ˆt (32) 1 + SDR0 VII. CONTROL VIA LOW-DELAY JSCC with an MSE of In this section, we construct a Kalman-filter-like solution [48] P r by employing JSCC schemes. We note that the additional com- r t|t−1 Pt|t = . (33) plication here is due to the communication channel (2) and its 1 + SDR0 inherent input power constraint. u −L xr r t2 5) Generates the control signal t = t ˆt|t , and the re- Denote by xˆt |t , the estimate of xt at the receiver given b . 1 2 1 ceiver prediction of the next system state Denote further its MSE by r r   xˆ = αxˆ + ut− , 2 t|t−1 t−1|t−1 1 r r Pt |t  E x˜t |t , 1 2 1 2 where Lt is given as in Lemma 3.1. Using (33) and (1), the prediction error at the receiver is given xr  x − xr where ˜t |t t1 ˆt |t . by the following recursion: 1 2 1 2 t Then, the scheme works as follows. At time instant , the con- α2 P r r t|t−1 troller constructs an estimate xˆt|t of xt . It then applies the control P r W. t+1|t = + (34) r 1 + SDR signal ut = −Lt xˆt|t to the plant, with Lt given in (10). Note 0 that, since both the controller and the observer know the previ- The recursive relation (34) leads to the following condition ut xr xr ously applied control signals , they also know ˆt|t and ˆt+1|t . for the stabilizability of the control system and bound on the Hence, in order to describe xt , the observer can save transmit achievable cost. r power by transmitting the error signal (xt − xˆt|t− ), instead of Theorem 7.1 (Achievable): The scalar control system of 1 2 x xr Section II is stabilizable using Scheme VII.1 if α < 1 + SDR0 , t . The controller can then add back ˆt|t−1 to the received signal and its infinite-horizon average-stage LQG cost J¯∞ (6) is to construct xˆr . t|t bounded from above by Scheme VII.1:   Observer/transmitter: At time t Q + α2 − 1 S J¯ ≤ W. 2 (35) 1) Generates the desired error signal 1 + SDR0 − α s xr t =˜t|t−1 (29a) The following theorem is an adaptation of the lower bound in r [66] to our setting of interest. = xt − xˆ (29b) t|t−1 Theorem 7.2 (Lower bound): The scalar control system of 2 P r Section II is stabilizable only if α < 1 + SDROPTA, and the of average power t|t−1 (determined in the sequel). 2) Since the channel input is subject to a unit power con- optimal achievable infinite-horizon average-stage LQG cost is s bounded from below by straint (3), t is normalized:   Q + α2 − 1 S s # 1 s . J¯ ≥ W. (36) ¯t = t (30) − α2 P r 1 + SDROPTA t|t−1 s K By comparing (35) and (36), we see that the potential gap 3) Maps ¯t into C channel inputs, constituting the en- between the two bounds stems only from the gap between the a tries of t , using a bounded-distortion JSCC scheme of bounds on the achievable SDR over the AWGN channel (2). choice with (maximum given any input) average distor- Remark 7.1 (Stabilizability): The stabilizability condition of tion 1/SDR0 for the given channel SNR. Theorem. 7.1 is distant from that of Theorem 7.2 in this case K a 4) Sends the C channel inputs t over the channel (2). since SDR0 < SDROPTA. By improving SDR0 , one can im- Controller/Receiver: At time t ∈ [T ]: prove the achievable stabilizability of the system. 1) Receives the KC channel outputs bt . eff 2) Recovers a CUBE of the source signal s¯t : s¯ˆt =¯st + n , t 14 neff neff ⊥ s If the resulting effective noise t is not an AWGN with power that does where t ¯t is an additive noise of power of (at most) not depend on the channel input, then a better estimator than that in (32) may 1/SDR0 . be constructed.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3701

It is interesting to note that in this case, in stark contrast to the classical LQG setting in which the system is stabilizable for any values of α and W , low values of the SDR render the system unstable. Hence, it provides, among others, the minimal required transmit power for the system to remain stable. The difference from the classical LQG case stems from the additional input power constraint, which effectively couples the power of the effective observation noise with that of the estimation error, and was previously observed in, e.g., [28], [29],[66], and [67]. The existence of a threshold SDR below which the system cannot be stabilized parallels the result of Sinopoli et al. [33] for control over packet drop channels, which shows that the system cannot be stabilized if the packet drop probability exceeds a certain threshold. K We next discuss the special cases of C =1 in J¯ K ∈ N Fig. 6. Optimal average stage LQG cost of a single representative Section VII-A, C and instantaneous perfect output run for KC = KS =1, α =2, and SNRs 2 and 4, which correspond feedback—in Section VII-B, and KC =2in Section VII-C. to a stabilizable and an unstabilizable systems. The driving noise and observation noise powers and the LQG penalty coefficients are Qt ≡ Rt ≡ 1,W =1. A. Source–Channel Rate Match K In this section, we treat the case of C =1, namely, the case this lower bound is not achievable in general with or without where the sample rate of the control system and the signaling feedback, as is implied by [31, Ch. 3.5]. rate of the communication channel match. As we have seen in Section V, analog linear transmission of a Gaussian source over an AWGN channel achieves optimal C. Source–Channel Rate Mismatch Without Feedback performance (even when infinite delay is allowed), namely, the We now consider the case of KC =2channel uses per sam- OPTA SDR (21), for any given input value. Thus, the JSCC ple. As we saw in Section V, linear schemes are suboptimal out- scheme that we use in this case is linear transmission—the side the low-SNR regime. Instead, by using nonlinear maps, e.g., source is transmitted as is, up to a power adjustment [recall (29) the (modified) Archimedean spiral-based SK maps (27), bet- and (30)] ter performance can be achieved. This scheme is implemented tree/master/code/joint 1 in [42, ]. at =¯st = # st . We note that the improvement in the SDR of the JSCC scheme P r t|t−1 is substantial when the SDR of the linear scheme is close to α2 − 1, using an improved scheme with better SDR improves Since in this case SDR0 =SDROPTA, the upper and lower substantially the LQG cost. Unfortunately, the spiral-based SK bounds of Theorems 7.1 and 7.2 coincide, establishing the op- schemes do not promise any improvement for SNRs below 5 dB timum performance in this case. under ML decoding. Corollary 7.1: The scalar control system of Section II with Remark 7.4: By replacing the ML decoder with an MMSE 2 KC =1is stabilizable if only if α < 1 + SNR, and the optimal one, strictly better performance can be achieved over the linear achievable infinite-horizon average stage LQG cost satisfies (35) scheme for all SNR values. with equality with SDR0 =SNR. Remark 7.5: The resulting effective noise at the output of Remark 7.2: The stabilizability condition and optimum the JSCC receiver is not necessarily Gaussian, and hence the MMSE performance were previously established in [28] and resulting system state, xt , is not necessarily Gaussian either. [67] and extend also to the noisy-observation case [23]. Nevertheless, for the bounded-distortion scheme (27), this has no effect on the resulting performance, as is demonstrated next. B. Source–Channel Rate Mismatch With Feedback When the AWGN channel outputs bt (2) are available to the VIII. SIMULATIONS transmitter via an instantaneous feedback, we can incorporate A. Rate-Matched Case Scheme V.1 in Scheme VII.1 to attain the OPTA SDR and again establish the optimal LQG cost of this setting. The optimal average-stage LQG cost is illustrated in Fig. 6,for K ∈ a system with α =2and two SNRs—2 and 4. SNR = 4 satisfies Corollary 7.2: The scalar control system of Section II C 2 K the stabilizability condition α < 1 + SNR, whereas SNR = 2 N is stabilizable if only if α2 < (1 + SNR) C , and the optimal fails to do so. Unit LQG penalty coefficients Qt ≡ Rt ≡ 1 and achievable infinite-horizon average stage LQG cost satisfies (35) W =1 K C unit driving noise power are used. with equality with SDR0 =(1+SNR) − 1. Remark 7.3 (Non-Gaussian noise): Following Remark V.1, B. JSCC for the Rate-Mismatched Case the achievability of Theorem 7.1 is attainable with SDR0 = (1 + SNR)K c − 1 even when the noise is non-Gaussian. In this The effect of the SDR improvement is illustrated in Fig. 7 for a case, however, the variance W in the lower bound of Theorem system with α =3and W =1,forQt ≡ 1 and Rt ≡ 0, by com- 7.2 should be replaced with its entropy power (which is strictly paring the achievable costs and lower bound of Theorems 7.1 lower than the variance for non-Gaussian processes) [66], and and 7.2. We note that the JSCC scheme with (intermediate) therefore better performance might be achievable. We note that feedback always achieves OPTA.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3702 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

Fig. 7. Average-stage LQG costs when using the (distortion-bounded) SK Archimedean bispiral, repetition, and the lower bound of Theorem 7.2 Fig. 9. LQG cost comparison of the separation-based scheme with k for α =3,W =1,Qt ≡ 1,Rt ≡ 0. The vertical dotted lines represent the =2 (2-PAM constellation), the JSCC SK bispiral scheme, and the α . . ,W ,Q ≡ ,R ≡ minimum SNR below which the cost diverges to infinity. OPTA lower bound for =12, SNR = 4 5 dB =1 t 1 t 0. The schemes were simulated for the same disturbance and noise se- quences and their results are compared to the analytically derived cost of the JSCC scheme and the OPTA lower bound. Times when decoding errors in the separation-based scheme occur are marked by a thick line.

IX. DISCUSSION AND FUTURE RESEARCH A. Separation-Based and JSCC Schemes Comparison As is evident from the simulation results in Section VIII-C, in addition to demanding far less computation time and memory, and being considerably simpler to implement than separation- based schemes, the JSCC-based schemes also perform much better in terms of control cost. A key component behind this improvement is the fact that the JSCC schemes of Section V allow the (rare) utilization of large (unbounded) excess transmission power. The separation-based Fig. 8. Average-stage LQG costs averaged over 256 runs when using schemes, on the other hand, are limited by the transmission the (distortion-bounded) JSCC SK Archimedean bispiral scheme, linear power of the maximum constellation point, which increases as (repetition) scheme, separation-based scheme for 2-PAM and 4-PAM, 15 and the (OPTA) lower bound of Theorem 7.2 for α =1.2,W =1,Qt ≡ the square root of the average power. Namely, these schemes 1,Rt =0. The SNRs for which the separation based schemes were have a peak power constraint, which is known to have a detri- simulated are marked with squares and circles for 2-PAM and 4-PAM, mental effect on the performance [68]. respectively. Another unfortunate shortcoming of using separation-based schemes is their incompetence to stabilize higher moments. As was noted already in the seminal work of Sahai and Mitter [7], in C. Comparison of Separation-Based and JSCC order to stabilize higher moments, increased error exponents are Schemes for the Rate-Mismatched Case required, that need to grow linearly with the moment’s order— this behavior is manifested by the abrupt jumps in the cost of We now compare the performance of the separation-based this scheme in Fig. 9. In contrast, JSCC schemes (which in our scheme of Section VI with the JSCC schemes of Section VII. case enjoy an implicit feedback via the control-system loop) The implementations of these schemes are available in [42]. can attain a super exponential decay of the error probability, J¯ Fig. 8 shows a comparison of the control costs t achieved by when used to send bits [69] (cf., [68]). Thus, such schemes can α . these schemes for a fully observable scalar plant with =12, stabilize more and even all moments, and guarantee almost- W Q ≡ R ≡ =1, t 1, and t 0, over 256 runs. sure stability (by using, e.g., the simple linear/repetition based Clearly, the JSCC-based schemes outperform the separation- scheme), as was noted already by Sahai and Mitter [7, Sec. III-C] based schemes by a large margin. in the context of anytime reliability. Note that while larger constellations perform better for high On the other hand, the JSCC schemes developed in this paper SNRs, the situation is reversed when the SNR is low. assumed knowledge of the control objectives at the transmitter We further include a simulation of a single run of each of the in order to recover the estimates of the controller at the trans- schemes in Fig. 9. For the separation-based scheme, decoding mitter. The separation-based scheme on the other hand did not ˆr errors (times when t|t = t ) are highlighted. Their impact is clear: while the decoder is in error, it applies the wrong control signal, causing the cost function to rapidly deteriorate. In the 15In the limit of infinite-size constellations, the distribution of the con- instance shown, these decoding errors are clearly the major stellation tends to a continuous uniform PDF over [−M/2,M/2] of power factor degrading performance. P = M 2 /12.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3703

require such knowledge and only assumed knowledge of the We further note that for bounded noise (even worst- control signal ut . case/arbitrary), parallel results can be achieved.

B. Partially Observable Vector Systems D. Packeted Transmission With Erasures In this paper, we focused on the simplest case of scalar sys- In the separation-based schemes, following the work of Sahai tems, and KC =2. As implied by the JSCC theorem (20), an and Mitter [7], we used a decoder that ought to make a decision exponentially large (in KC ) gain in the cost can be achieved. on all information bits transmitted until that time, even if its Furthermore, the results of Theorems 7.1 and 7.2 readily extend “belief” of a particular bit—quantified by an appropriate metric, to systems with noisy observations [41] as well as vector states say Fano’s metric—is low. xt and vector control signals ut but scalar observed outputs yt . An alternative to this approach is to allow declaring an era- Interestingly, for the case of vector (observation, state and sure, for bits of “low belief.” This idea was advocated and ex- control) signals, even if the signaling rate of the channel and plored in the celebrated work of Fano [71] for block codes, the sample rate of the observer are equal (rate-matched case), where a tradeoff between the achievable error erasure exponents conveying several analog observations over a single channel in- was established. put may be of the essence. This is achieved by a compression Furthermore, by substantially increasing the error exponent JSCC scheme, e.g., by reversing the roles of the source and (lowering the error probability), at the expense of decreas- the channel inputs in the SK spiral-based scheme. Similarly to ing the erasure exponent (increasing the erasure probability), their expansion counterparts, such compression JSCC schemes one can drive the separation-based scheme toward that of a provide exponentially growing gains with the SNR and dimen- noiseless channel with occasional packet erasures. Interestingly, sion [35], [36], [54], [57]–[59], [61], and promise better LQG Algorithm 3.2 and the lower bound of Theorem 7.2 readily ex- costs than their linear counterparts [28]. tend to this case due to their “greedy nature” [34]. The vector case is much more challenging, in general. Beyond the difficulty of designing good higher-dimensional curves/surfaces (and more general geometric structures) [55], ACKNOWLEDGMENT [57] and lower bounds [66], there is a challenge of even de- The authors thank H. Yıldız for valuable discussions and help termining and prioritizing the subspaces of the state space of with parts of the simulation. xt that needs to be conveyed. To make this point more clear, consider a linear plant REFERENCES xt+1 = Axt + But + wt [1] J. P. Hespanha, P. Naghshtabrizi, and U. Xu, “A survey of recent results x y w in networked control systems,” Proc. IEEE, vol. 95, no. 1, pp. 138–162, with 2-D state ( t ), output ( t ), and i.i.d. Gaussian noise ( t ) Jan. 2007. signals, a scalar control signal (ut ), and known 2 × 2 and 2 × 1 [2] V. Gupta, A. F. Dana, J. P. Hespanha, R. M. Murray, and B. Hassibi, matrices A and B, respectively. Since ut is 1-D, optimizing “Data transmission over networks for estimation and control,” IEEE Trans. the cost of the next step mandates transmitting the information Autom. Control, vol. 54, no. 8, pp. 1807–1819, Aug. 2009. [3] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poola, and S. S. Sastry, that is needed to reconstruct/approximate the optimal ut (say T “Foundations of control and estimation over lossy networks,” Proc. IEEE, a 1-D linear combination of xt : ut = −Kt xt , where Kt is a vol. 95, no. 1, pp. 163–187, Jan. 2007. predetermined column vector). However, since A mixes both [4] S. Yuksel¨ and T. Bas¸ar, Stochastic Networked Control Systems: Stabiliza- the subspace dictated by KT and its perpendicular, better per- tion and Optimization Under Information Constraints. Boston, MA, USA: formance might be attained in future steps (“cost-to-go”) by Birkhauser,¨ 2013. [5] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. transmitting both subspaces. Tech. J., vol. 27, pp. 379–423, Jul. 1948. [6] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge Univ. Press, 2011. C. Oblivious Transmitter [7] A. Sahai and S. K. Mitter, “The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication In this paper, we assumed that the observer knows all past link—part I: Scalar systems,” IEEE Trans. Inf. Theory, vol. 52, no. 8, control signals. We note that such information is not needed for pp. 3369–3395, Aug. 2006. the JSCC schemes (without feedback), for the special case of [8] L. J. Schulman, “Coding for interactive communication,” IEEE Trans. Inf. variance control, i.e., Rt ≡ 0. Theory, vol. 42, no. 6, pp. 1745–1756, Nov. 1996. [9] R. T. Sukhavasi and B. Hassibi, “Linear time-invariant anytime codes for Rt ≡ For the more general LQG-cost setting ( 0), this assump- control over noisy channels,” IEEE Trans. Autom. Control, vol. 61, no. 12, tion can be viewed as a two-sided side-information scenario. pp. 3826–3841, Dec. 2016. Nevertheless, although this is a common situation in practice, [10] J. M. Wozencraft, “Sequential decoding for reliable communications,” there are scenarios in which the observer is oblivious of the Tech. Rep. 325, Res. Lab. Electron., Massachusetts Inst. Technol., Cam- control signal applied or has only a noisy measurement of the bridge, MA, USA, Aug. 1957. [11] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engi- actuation signal generated by the controller. Such settings can neering. New York, NY, USA: Wiley, 1965. be regarded as a JSCC problem with side information at the [12] F. Jelinek, Probabilistic Information Theory: Discrete and Memoryless receiver (only), and can be treated using JSCC techniques de- Models. New York, NY, USA: McGraw-Hill, 1968. signed for this case, some of which combine naturally with the [13] R. G. Gallager, Information Theory and Reliable Communication.New York, NY, USA: Wiley, 1968. JSCC schemes for rate mismatch [57], [64], [65]. In fact, this [14] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and idea was recently applied for the related problem of communi- Coding. New York, NY, USA: McGraw-Hill, 1979. cation over an AWGN channel with AWGN feedback channel [15] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional in [70]. Coding. New York, NY, USA: Wiley-IEEE Press, 1999.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. 3704 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 64, NO. 9, SEPTEMBER 2019

[16] A. Khina, W. Halbawi and B. Hassibi, “(almost) practical tree codes,” in [43] S. Dharmadhikari and K. Joag-Dev, Unimodality, Convexity, and Appli- Proc. IEEE Int. Symp. Inf. Theory, Barcelona, Spain, Jul. 2016, pp. 2404– cations Probability and Mathematical Statistics. Boston, MA, USA: Aca- 2408. demic, 1988. [17] G. N. Nair, F. Fagnani, S. Zampieri, and R. J. Evans, “Feedback control [44] P. E. Fleischer, “Sufficient conditions for achieving minimum distortion under data rate constraints: An overview,” Proc. IEEE, vol. 95, no. 1, in a quantizer,” in Proc. IEEE Int. Convention Rec., Part 1, Jan. 1964, pp. 108–137, Jan. 2007. pp. 104–111. [18] S. Yuksel,¨ “Stochastic stabilization of noisy linear systems with fixed-rate [45] J. C. Kieffer, “Uniqueness of locally optimal quantizer for log-concave limited feedback,” IEEE Trans. Autom. Control, vol. 55, no. 12, pp. 2847– density and convex error weighting function,” IEEE Trans. Inf. Theory, 2853, Dec. 2010. vol. 29, no. 1, pp. 42–47, Jan. 1983. [19] P.Minero, M. Franceschetti, S. Dey, and G. N. Nair, “Data rate theorem for [46] A. V. Trushkin, “Sufficient conditions for uniqueness of a locally optimal stabilization over time-varying feedback channels,” IEEE Trans. Autom. quantizer for a class of convex error weighting functions,” IEEE Trans. Control, vol. 54, no. 2, pp. 243–255, Feb. 2009. Inf. Theory, vol. 28, no. 2, pp. 187–198, Mar. 1982. [20] S. Sarkk¨ a,¨ Bayesian Filtering and Smoothing, vol. 3. Cambridge, U.K.: [47] T. Fischer, “Optimal quantized control,” IEEE Trans. Autom. Control, Cambridge Univ. Press, 2013. vol. 27, no. 4, pp. 996–998, Aug. 1982. [21] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. [48] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. I, 2nd Boston, MA, USA: Kluwer, 1992. ed. Belmont, MA, USA: Athena Sci., 2000. [22] L. Bao, M. Skoglund, and K. H. Johansson, “Iterative encoder–controller [49] M. Fu, “Lack of separation principle for quantized linear quadratic Gaus- design for feedback control over noisy channels,” IEEE Trans. Autom. sian control,” IEEE Trans. Autom. Control, vol. 57, no. 9, pp. 2385–2390, Control, vol. 56, no. 2, pp. 265–278, Feb. 2011. Sep. 2012. [23] A. Khina, Y. Nakahira, Y. Su, and B. Hassibi, “Algorithms for optimal [50] R. M. Fano, Transmission of Information. Cambridge, MA, USA: MIT control with fixed-rate feedback,” in Proc. IEEE Conf. Decision Control, Press, 1961. Melbourne, VIC, Australia, Dec. 2017, pp. 6015–6020. [51] A. Khina, W. Halbawi, and B. Hassibi, “Sequential stack decoder: Python [24] Y. Kochman, A. Khina, U. Erez, and R. Zamir, “Rematch-and-forward: implementation,” Jan. 2016. [Online]. Available: http://www.eng.tau.ac.il/ Joint source/channel coding for parallel relaying with spectral mismatch,” anatolyk/code/tree_codes.py IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 605–622, Jan. 2014. [52] E. Arıkan, “An upper bound on the cutoff rate of sequential decoding,” [25] T. Goblick, “Theoretical limitations on the transmission of data from IEEE Trans. Inf. Theory, vol. 34, no. 1, pp. 55–63, Jan. 1988. analog sources,” IEEE Trans. Inf. Theory, vol. 11, no. 4, pp. 558–567, [53] O. Shayevitz and M. Feder, “The posterior matching feedback scheme for Oct. 1965. joint source–channel coding with bandwidth expansion,” in Proc. Data [26] P. Elias, “Channel capacity without coding,” Proc. IRE, vol. 45, no. 3, Comput. Conf., Snowbird, UT, USA, 2009, pp. 83–92. pp. 381–381, Jan. 1957. [54] S. Y. Chung, “On the construction of some capacity-approaching coding [27] R. Bansal and T. Bas¸ar, “Simultaneous design of measurement and control schemes,” Ph.D. dissertation, , Dept. EECS, Massachusetts Inst. Technol., strategies for stochastic systems with feedback,” Automatica, vol. 25, no. 5, Cambridge, MA, USA, 2000. pp. 679–694, Sep. 1989. [55] P. A. Floor, “On the theory of Shannon–Kotelnikov mappings in [28] J. S. Freudenberg, R. H. Middleton, and V. Solo, “Stabilization and distur- joint source–channel coding,” Ph.D. dissertation, Dept. Electron. bance attenuation over a Gaussian communication channel,” IEEE Trans. Telecommun., Norwegian Univ. Sci. Technol., Trondheim, Norway, Autom. Control, vol. 55, no. 3, pp. 795–799, Mar. 2010. May 2008. [29] S. Tatikonda, A. Sahai, and S. K. Mitter, “Stochastic linear control over [56] M. Kleiner and B. Rimoldi, “Asymptotically optimal joint source–channel a communication channel,” IEEE Trans. Autom. Control, vol. 49, no. 9, with minimal delay,” in Proc. IEEE Global Telecommun. Conf., Honolulu, pp. 1549–1561, Sep. 2004. HI, USA, Nov./Dec. 2009, pp. 1–5. [30] N. Elia, “When Bode meets Shannon: Control-oriented feedback commu- [57] E. Akyol, K. B. Viswanatha, K. Rose, and T. A. Ramstad, “On zero- nication schemes,” IEEE Trans. Autom. Control, vol. 49, no. 9, pp. 1477– delay source–channel coding,” IEEE Trans. Inf. Theory, vol. 60, no. 12, 1488, Sep. 2004. pp. 7473–7489, Dec. 2014. [31] M. Gastpar, “To code or not to code,” Ph.D. dissertation, School Comput. [58] F. Hekland, P. A. Floor, and T. A. Ramstad, “Shannon–Kotel’nikov map- Commun. Sci., Ecole´ Polytechnique Fed´ erale´ de Lausanne, Lausanne, pings in joint source–channel coding,” IEEE Trans. Commun., vol. 57, Switzerland, Dec. 2002. no. 1, pp. 94–105, Jan. 2009. [32] S. Tatikonda and S. K. Mitter, “Control over noisy channels,” IEEE Trans. [59] Y. Hu, J. Garcia-Frias, and M. Lamarca, “Analog joint source–channel Autom. Control, vol. 49, no. 7, pp. 1196–1201, Jul. 2004. coding using non-linear curves and MMSE decoding,” IEEE Trans. Com- [33] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poola, M. I. Jordan, and S. mun., vol. 59, no. 11, pp. 3016–3026, Nov. 2011. S. Sastry, “Kalman filtering with intermittent observations,” IEEE Trans. [60] I. Kvecher and D. Rephaeli, “An analog modulation using a spiral map- Autom. Control, vol. 49, no. 9, pp. 1453–1464, Sep. 2004. ping,” in Proc. IEEE Convection Electr. Electron. Eng. Israel, Eilat, Israel, [34] A. Khina, V. Kostina, A. Khisti, and B. Hassibi, “Tracking and control Nov. 2006, pp. 194–198. of Gauss–Markov processes over packet-drop channels with acknowledg- [61] A. Ingber and M. Feder, “Power preserving 2:1 bandwidth reduction map- ments,” IEEE Trans. Control Netw. Syst.. pings,” in Proc. Data Compression Conf., Snowbird, UT, USA, Mar. 1997, [35] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, p. 385. vol. 37, no. 1, pp. 10–21, Jan. 1949. [62] U. Mittal and N. Phamdo, “Hybrid digital-analog (HDA) joint source– [36] V. A. Kotel’nikov, The Theory of Optimum Noise Immunity.NewYork, channel codes for broadcasting and robust communications,” IEEE Trans. NY, USA: McGraw-Hill, 1959. Inf. Theory, vol. 48, no. 5, pp. 1082–1102, May 2002. [37] S. Yuksel¨ and S. Tatikonda, “A counterexample in distributed optimal [63] Z. Reznic, M. Feder, and R. Zamir, “Distortion bounds for broadcast- sensing and control,” IEEE Trans. Autom. Control, vol. 54, no. 4, pp. 841– ing with bandwidth expansion,” IEEE Trans. Inf. Theory, vol. 52, no. 8, 844, Apr. 2009. pp. 3778–3788, Aug. 2006. [38] U. Kumar, V. Gupta, and J. N. Laneman, “On stability across a Gaussian [64] Y. Kochman and R. Zamir, “Analog matching of colored sources to col- product channel,” in Proc. IEEE Conf. Decision Control Eur. Control ored channels,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3180–3195, Conf., Orlando, FL, USA, Dec. 2011, pp. 3142–3147. Jun. 2011. [39] A. A. Zaidi, T. J. Oechtering, S. Yuksel,¨ and M. Skoglund, “Stabilization [65] X. Chen and E. Tuncel, “Zero-delay joint source–channel coding using and control over Gaussian networks,” in Information and Control in Net- hybrid digital–analog schemes in the Wyner–Ziv setting,” IEEE Trans. works, B. Bernhardsson, G. Como, and A. Rantzer, Eds. New York, NY, Commun., vol. 62, no. 2, pp. 726–735, Feb. 2014. USA: Springer, 2014, pp. 39–85. [66] V. Kostina and B. Hassibi, “Rate–cost tradeoffs in control,” in Proc. Aller- [40] H. S. Witsenhausen, “A counterexample in stochastic optimum control,” ton Conf. Commun., Control, Comput., Monticello, IL, USA, Sep. 2016, SIAM J. Control, vol. 6, no. 1, pp. 131–147, Feb. 1968. pp. 1157–1164. [41] A. Khina, G. M. Pettersson, V.Kostina, and B. Hassibi, “Multi-rate control [67] J. S. Freudenberg, R. H. Middleton, and J. H. Braslavsky, “Stabilization over AWGN channels via analog joint source–channel coding,” in Proc. and disturbance attenuation over a Gaussian communication channel,” in IEEE Conf. Decision Control, Las Vegas, NV, USA, Dec. 2016, pp. 5968– Proc. IEEE Conf. Decision Control, New Orleans, LA, USA, Dec. 2007, 5973. pp. 3958–3963. [42] E. Riedel Garding,˚ “Control over AWGN channels with and without [68] A. D. Wyner, “On the Schalkwijk–Kailath coding scheme with a peak source–channel separation: Python implementation,” Aug. 2017. [Online]. energy constraint,” IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 129–134, Available: https://github.com/eliasrg/SURF2017/ Jan. 1968.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply. KHINA et al.: CONTROL OVER GAUSSIAN CHANNELS WITH AND WITHOUT SOURCE–CHANNEL SEPARATION 3705

[69] J. P. M. Schalkwijk and T. Kailath, “A coding scheme for additive noise Victoria Kostina (S’12–M’14) received the channels with feedback–I: No bandwidth constraint,” IEEE Trans. Inf. Bachelor’s degree in applied mathematics and Theory, vol. IT-12, no. 2, pp. 172–182, Apr. 1966. physics from Moscow institute of Physics and [70] A. Ben-Yishai and O. Shayevitz, “Interactive schemes for the AWGN Technology, Dolgoprudny, Russia, in 2004, channel with noisy feedback,” IEEE Trans. Inf. Theory, vol. 63, no. 4, where she was affiliated with the Institute for pp. 2409–2427, Apr. 2017. Information Transmission Problems, Russian [71] G. D. Forney Jr., “Exponential error bounds for erasure, list, and decision Academy of Sciences, the Master’s degree in feedback schemes,” IEEE Trans. Inf. Theory, vol. IT-14, no. 2, pp. 206– electrical engineering from the University of 220, Mar. 1968. Ottawa, Ottawa, ON, Canada, in 2006, and the Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ, USA, 2013. Anatoly Khina (S’08–M’17) was born in She joined Caltech as an Assistant Professor of electrical engineer- Moscow, USSR, in 1984, and moved to Israel ing in the fall of 2014. Her research interests include information theory, in 1991. He received the B.Sc. (summa cum coding, control, and communications. laude), the M.Sc. (summa cum laude), and the Dr. Kostina received the Natural Sciences and Engineering Research Ph.D. degrees in 2006, 2010, and 2016, respec- Council of Canada postgraduate scholarship (2009–2012), the Prince- tively, all in electrical engineering from Tel Aviv ton Electrical Engineering Best Dissertation Award (2013), the Simons- University, Tel Aviv, Israel. Berkeley Research Fellowship (2015), and the NSF CAREER award He is currently a Senior Lecturer with the De- (2017). partment of Electrical Engineering-Systems, Tel Aviv University. He was a Postdoctoral Scholar with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA, USA, from 2015 to 2018, and a Research Fellow with the Simons Institute for the Theory of Computing, University of California, Berkeley, Berkeley, CA, USA, dur- ing the Spring of 2018. His research interests include information theory, , , and matrix analysis. In parallel to his Babak Hassibi (M’08) was born in Tehran, , studies, he has worked as an Engineer in various roles focusing on al- in 1967. He received the B.S. degree from the gorithms, software, and hardware R&D. , Tehran, Iran, in 1989, and Dr. Khina is a recipient of the Simons-Berkeley and Qualcomm Re- the M.S. and Ph.D. degrees from Stanford Uni- search Fellowships; Fulbright, Rothschild, and Marie Skłodowska-Curie versity, Stanford, CA, USA, in 1993 and 1996, Postdoctoral Fellowships; Clore Scholarship; Trotsky Award; Weinstein respectively, all in electrical engineering. Prize in signal processing; Intel award for Ph.D. research; and the He has been with the California Institute of first prize for outstanding research work in the field of communication Technology, Pasadena, CA, USA, since January technologies from the Advanced Communication Center’s Feder Family 2001, where he is currently the Mose and Lil- Award program. ian S. Bohn Professor of electrical engineering. From 20013 to 2016, he was the Gordon M. Binder/Amgen Professor of electrical engineering and from 2008 to 2015, he was an Executive Officer of electrical engineering, as well as an As- Elias Riedel Garding˚ received the B.Sc. degree sociate Director of information science and technology. From October in engineering physics from the Royal Institute of 1996 to October 1998, he was a Research Associate with the Informa- Technology (KTH), Stockholm, Sweden, in 2017 tion Systems Laboratory, Stanford University, and from November 1998 and the Master’s degree in applied mathematics, to December 2000, he was a Member of the Technical Staff with the focusing on theoretical physics, from the Univer- Mathematical Sciences Research Center, Bell Laboratories, Murray Hill, sity of Cambridge, Cambridge, U.K., in 2018. He NJ, USA. He has also held short-term appointments at Ricoh California is currently working toward the M.Sc. degree in Research Center, the Indian Institute of Science, and Linkoping¨ Univer- subatomic and astrophysics with KTH. sity, Sweden. He is the coauthor of the books (both with A. H. Sayed and T. Kailath) Indefinite Quadratic Estimation and Control: A Unified He contributed to this work as a Summer Un- ∞ Approach to H2 and H Theories (SIAM, 1999) and Linear Estimation dergraduate Research Fellow with the California (Prentice Hall, 2000). His research interests include communications and Institute of Technology (2017). He was a sum- information theory, control and network science, and signal processing mer student with the Joint Research Centre of Photonics, Zhejiang Uni- and machine learning. versity, Hangzhou, China, in 2016 and contributed to NMR spectroscopy Dr. Hassibi is a recipient of an Alborz Foundation Fellowship, the research with the Department of Biochemistry, University of Cambridge 1999 O. Hugo Schuck best paper award of the American Automatic in the summer of 2018. Control Council (with H. Hindi and S. P. Boyd), the 2002 National Sci- ence Foundation Career Award, the 2002 Okawa Foundation Research Grant for Information and Telecommunications, the 2003 David and Lu- Gustav M. Pettersson received the B.Sc. de- cille Packard Fellowship for Science and Engineering, the 2003 Presi- gree in engineering physics in 2017 and the dential Early Career Award for Scientists and Engineers, and the 2009 M.Sc. degree in aerospace engineering in 2018, Al-Marai Award for Innovative Research in Communications, and was a from the KTH Royal Institute of Technology, participant in the 2004 National Academy of Engineering “Frontiers in Stockholm, Sweden. Engineering” program. He has been a Guest Editor for the IEEE TRANS- He is currently with the European Space ACTIONS ON INFORMATION THEORY special issue on “space-time transmis- Agency, Noordwijk, the Netherlands. He has sion, reception, coding and signal processing,” was an Associate Editor previously been with the California Institute of for Communications of the IEEE TRANSACTIONS ON INFORMATION THEORY Technology, Pasadena, CA, USA, in 2016, the during 2004–2006, and is currently an Editor for the Journal “Founda- Science for Life Laboratory, Solna, Sweden, in tions and Trends in Information and Communication” and for the IEEE 2017, and at the NASA Ames Research Center, TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING. He was an IEEE Moffett Field, CA, USA, in 2018. Information Theory Society Distinguished Lecturer for 2016–2017.

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on February 16,2020 at 05:48:56 UTC from IEEE Xplore. Restrictions apply.