Joint Source-Channel Coding of Images with (not very) Deep Learning

David Burth Kurka and Deniz Gündüz
Department of Electrical and Electronic Engineering, Imperial College London, London, UK
{d.kurka, d.gunduz}@imperial.ac.uk

Abstract—Almost all wireless communication systems today are designed based on essentially the same digital approach, which separately optimizes the compression and channel coding stages. Using machine learning techniques, we investigate whether end-to-end transmission can be learned from scratch, thus using joint source-channel coding (JSCC) rather than the separation approach. This paper reviews and advances recent developments on our proposed technique, deep-JSCC, an autoencoder-based solution that generates robust and compact codes directly from image pixels, with performance comparable or even superior to state-of-the-art (SoA) separation-based schemes (BPG+LDPC). Additionally, we show that deep-JSCC can be expanded to exploit a series of important features, such as graceful degradation, versatility to different channels and domains, variable transmission rate through successive refinement, and the capability to exploit channel output feedback.

Fig. 1. Machine learning based communication system: the input image is mapped by a convolutional encoder to an encoded vector of channel symbols, corrupted by channel noise, and recovered by a deconvolutional decoder as the reconstructed input.

I. Introduction

Wireless communication systems have traditionally followed a modular model-based design approach, in which highly specialized blocks are designed separately based on expert knowledge accumulated over decades of research. This approach is partly motivated by Shannon's separation theorem [1], which gives theoretical guarantees that the separate optimization of source compression and channel coding can, in the asymptotic limit, approach the optimal performance. As a result, we have available today highly specialized source codes, e.g., JPEG2000/BPG for images, MPEG-4/WMA for audio, or H.264 for video, to be used in conjunction with near-capacity-achieving channel codes, e.g., Turbo, LDPC, or polar codes.
However, despite its huge impact, the optimality of separation holds only under unlimited delay and complexity assumptions; and, even under these assumptions, it breaks down in multi-user scenarios [2], [3], or for non-ergodic source or channel distributions [4], [5]. Moreover, unconventional communication paradigms have been emerging that demand extremely low end-to-end latency and low power (e.g., IoT, autonomous driving, the tactile Internet), and that operate in more challenging environments which might not follow the traditional models (e.g., channels under bursty interference).

In light of the above, our goal is to rethink the problem of wireless communication of lossy sources by using machine learning techniques, focusing particularly on image transmission. For this, we replace the modular separation-based design with a single neural network component for encoder and decoder (see Fig. 1 for an illustrative diagram), thus performing JSCC, whose parameters are trained from data rather than being designed. Our solution, deep-JSCC, is applied to the problem of image transmission and can learn strictly from data in an unsupervised manner, as we model our system as an autoencoder [6], [7] with the communication channel incorporated as a non-trainable layer. This approach is motivated by the recent developments in machine learning through the use of deep learning (DL) techniques, and their applications to communication systems in recent years [8]. Autoencoders in particular, due to the similarity between their architecture and digital communication systems [9], [10], have been used in related problems, pushing the boundaries of communications [11]–[16]. The use of DL for the separate problems of channel coding and image compression has been showing promising results, in some cases achieving performance superior to handcrafted algorithms [17], [18]. We show, however, that by performing JSCC we can further improve the end-to-end performance.

This paper reviews different features that were shown to be achieved with deep-JSCC, namely (a) performance comparable or superior to SoA separation-based schemes; (b) graceful degradation upon deterioration of channel conditions [19]; (c) versatility to adapt to different channels and domains [19]; (d) the capability of successive refinement [20]; and (e) the ability to exploit channel output feedback in order to improve the communication [21]. Thus, deep-JSCC presents itself as a powerful solution for the transmission of images, enabling communication with excellent performance while achieving low delay and low energy consumption, being robust to channel changes, and allowing small and flexible bandwidth transmissions, thus advancing the field of communications by improving on existing JSCC and separation-based methods.

This work was supported by the European Research Council (ERC) through project BEACON (No. 677854).

II. Problem Formulation and Model Description

Consider an input image with height H, width W and C color channels, represented as a vector of pixel intensities $\mathbf{x} \in \mathbb{R}^n$, $n = H \times W \times C$, to be transmitted over $k$ uses of a noisy channel, where $k/n$ is the bandwidth ratio. An encoder $f_{\theta_i} : \mathbb{R}^n \to \mathbb{C}^{k_i}$ maps $\mathbf{x}$ into channel input symbols $\mathbf{z}_i \in \mathbb{C}^{k_i}$ in $L$ blocks, where $\sum_{i=1}^{L} k_i = k$. These symbols are transmitted over a noisy channel, characterized by a random transformation $\eta : \mathbb{C}^{k_i} \to \mathbb{C}^{k_i}$, which may model physical impairments such as noise, fading or interference, resulting in the corrupted channel output $\hat{\mathbf{z}}_i = \eta(\mathbf{z}_i)$. We consider $L$ distinct decoders, where the channel outputs for the first $i$ blocks are decoded using $g_{\phi_i} : \mathbb{C}^{k_I} \to \mathbb{R}^n$ (where $k_I = \sum_{j=0}^{i} k_j$), creating reconstructions $\hat{\mathbf{x}}_i = g_{\phi_i}(\hat{\mathbf{z}}_1, \ldots, \hat{\mathbf{z}}_i) \in \mathbb{R}^n$, for $i \in \{1, \ldots, L\}$.

The encoder and decoder(s) are modelled as fully convolutional networks, using generalized normalization transformations (GDN/IGDN) [22], followed by a parametric ReLU (PReLU) [23] activation function (or a sigmoid, in the last decoder block). The channel is incorporated into the model as a non-trainable layer. Fig. 2 presents the architecture and the hyperparameters used in the experiments.

Fig. 2. Encoder and decoder architectures used in the experiments. The encoder $f_\theta$ is a stack of convolutional layers (9x9 and 5x5 kernels with 256 filters, ending in a 5x5 layer with $c$ output channels), each followed by GDN and PReLU, with a final reshape and power normalization; the decoder $g_\phi$ mirrors it with transposed convolutions (5x5 layers with 256 filters, ending in a 9x9 layer with 3 output channels), IGDN and PReLU activations, and a sigmoid at the output.
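To make this description concrete, the following is a minimal PyTorch sketch of such an encoder/decoder pair. It is an illustration under assumptions rather than the exact model of Fig. 2: the kernel sizes, strides and channel widths are placeholders chosen in the spirit of the figure, the GDN/IGDN layer is a simplified re-implementation of [22], and the reshaping of the encoder output into complex channel symbols together with the power normalization are handled outside these modules (see the sketch further below).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GDN(nn.Module):
    """Simplified generalized normalization: y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2)."""

    def __init__(self, channels: int, inverse: bool = False):
        super().__init__()
        self.inverse = inverse                          # inverse=True gives IGDN (multiplies instead of divides)
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(1e-2 * torch.eye(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gamma = self.gamma.view(*self.gamma.shape, 1, 1)            # used as 1x1 convolution weights
        norm = torch.sqrt(F.conv2d(x * x, gamma, self.beta).clamp(min=1e-6))
        return x * norm if self.inverse else x / norm


class Encoder(nn.Module):
    """f_theta: image -> real feature map (reshaped into complex channel symbols outside this module)."""

    def __init__(self, c_out: int = 16):                # c_out controls the bandwidth ratio k/n
        super().__init__()
        layers, in_ch = [], 3
        for kernel, stride in [(9, 2), (5, 2), (5, 2), (5, 1)]:     # placeholder strides/kernels
            layers += [nn.Conv2d(in_ch, 256, kernel, stride=stride, padding=kernel // 2),
                       GDN(256), nn.PReLU()]
            in_ch = 256
        layers.append(nn.Conv2d(256, c_out, 5, stride=1, padding=2))  # last layer: c output channels
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)                              # input (B, 3, H, W) with H, W divisible by 8


class Decoder(nn.Module):
    """g_phi: noisy feature map -> reconstructed image in [0, 1]."""

    def __init__(self, c_in: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(c_in, 256, 5, stride=1, padding=2),
            GDN(256, inverse=True), nn.PReLU(),
            nn.ConvTranspose2d(256, 256, 5, stride=2, padding=2, output_padding=1),
            GDN(256, inverse=True), nn.PReLU(),
            nn.ConvTranspose2d(256, 256, 5, stride=2, padding=2, output_padding=1),
            GDN(256, inverse=True), nn.PReLU(),
            nn.ConvTranspose2d(256, 3, 9, stride=2, padding=4, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, z_hat: torch.Tensor) -> torch.Tensor:
        return self.net(z_hat)
```

In this sketch the encoder's downsampling factor of 8 is mirrored by the decoder, so inputs whose height and width are divisible by 8 are reconstructed at their original resolution, and the number of encoder output channels c_out sets how many channel symbols are produced per image.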
Before transmission, the latent vector $\mathbf{z}_i$ generated at the encoder's last convolutional layer is normalized to enforce an average power constraint $\frac{1}{k_i}\mathbb{E}[\mathbf{z}_i^{\prime*}\mathbf{z}_i'] \leq P$, by setting $\mathbf{z}_i' = \sqrt{k_i P}\,\frac{\mathbf{z}_i}{\sqrt{\mathbf{z}_i^{*}\mathbf{z}_i}}$. The model can be optimized to minimize the average distortion between the input $\mathbf{x}$ and its reconstructions $\hat{\mathbf{x}}_i$ at each layer $i$:

$(\theta_i^{*}, \phi_i^{*}) = \arg\min_{\theta_i, \phi_i} \mathbb{E}_{p(\mathbf{x}, \hat{\mathbf{x}})}\big[d(\mathbf{x}, \hat{\mathbf{x}}_i)\big], \qquad (1)$

where $d(\mathbf{x}, \hat{\mathbf{x}}_i)$ is a specified distortion measure, usually the mean squared error (MSE), although other metrics are also considered. When $L > 1$, we have a multi-objective problem. However, we simplify it so that the optimization of multiple layers is done either jointly, by considering a weighted combination of losses, or greedily, by optimizing $(\theta_i, \phi_i)$ successively. Please see [20], [21] for more details.

III. Deep-JSCC

Our first set of results demonstrates the base case, in which an image $\mathbf{x}$ is encoded by a single encoder and a single decoder, thus $L = 1$. We consider a complex AWGN channel with transfer function given by

$\eta_n(\mathbf{z}) = \mathbf{z} + \mathbf{n}, \qquad (2)$

where $\mathbf{n} \in \mathbb{C}^k$ is independent and identically distributed (i.i.d.) with $\mathbf{n} \sim \mathcal{CN}(0, \sigma^2\mathbf{I})$, and $\sigma^2$ is the average noise power. We measure the quality of the channel by the average signal-to-noise ratio (SNR), given by $\mathrm{SNR} = 10\log_{10}\frac{1}{\sigma^2}$ (dB) when $P = 1$, and the system's performance by the peak SNR (PSNR), given by $\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}$ (dB), where MSE is the mean squared error between $\mathbf{x}$ and $\hat{\mathbf{x}}_i$.

Fig. 3 compares deep-JSCC with other well-established codecs (BPG, JPEG2000, WebP, JPEG) followed by LDPC channel coding (see [19], [24] for more information on the experimental setup, dataset and alternative schemes considered). We see that the performance of deep-JSCC is either above or comparable to that of the SoA schemes for a wide range of channel SNRs. These results are obtained by training a different encoder/decoder model for each evaluated SNR value in the case of deep-JSCC, and by considering the best performance achieved by the separation-based scheme at each SNR.

Fig. 3. Deep-JSCC performance compared to digital schemes (PSNR vs. channel SNR over the AWGN channel, Kodak dataset, k/n = 1/6; baselines: BPG, JPEG2000, WebP and JPEG, each combined with LDPC).

Fig. 4. (a) Effects of graceful degradation for deep-JSCC (models trained at SNR_train = 1, 7 and 19 dB) compared to the cliff effect in the separation-based scheme (AWGN channel, Kodak, k/n = 1/6); (b) performance of deep-JSCC on a bursty interference channel (k/n = 1/6); (c) performance of deep-JSCC trained with MS-SSIM as the objective function, compared to MSE-optimized deep-JSCC and BPG+LDPC (AWGN channel, Eurosat, k/n = 1/12).
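The non-trainable pieces just described, i.e., the power normalization of the latent vector, the complex AWGN channel of Eq. (2), and the PSNR metric, can be sketched as follows. This is a minimal illustration: the function names, the assumption of one complex vector of length k per image, and the per-example (rather than in-expectation) power normalization are choices made here for clarity, not the paper's reference implementation.

```python
import math
import torch


def power_normalize(z: torch.Tensor, P: float = 1.0) -> torch.Tensor:
    """Scale complex symbols z (shape: batch x k) so that (1/k) z'* z' = P for each example."""
    k = z.shape[-1]
    energy = torch.sum((z.conj() * z).real, dim=-1, keepdim=True)   # z* z per example
    return math.sqrt(k * P) * z / torch.sqrt(energy)


def awgn_channel(z: torch.Tensor, snr_db: float, P: float = 1.0) -> torch.Tensor:
    """Eq. (2): z_hat = z + n with n ~ CN(0, sigma^2 I), where SNR = 10 log10(P / sigma^2)."""
    sigma2 = P / (10.0 ** (snr_db / 10.0))
    noise = math.sqrt(sigma2 / 2.0) * (torch.randn_like(z.real) + 1j * torch.randn_like(z.real))
    return z + noise


def psnr(x: torch.Tensor, x_hat: torch.Tensor, max_val: float = 255.0) -> torch.Tensor:
    """PSNR = 10 log10(max_val^2 / MSE) in dB, for images given on a 0..max_val scale."""
    mse = torch.mean((x - x_hat) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)


# Illustrative usage: 128 complex symbols per image, channel SNR of 10 dB.
z = torch.randn(4, 128, dtype=torch.cfloat)
z_prime = power_normalize(z)                 # enforce the average power constraint
z_hat = awgn_channel(z_prime, snr_db=10.0)   # corrupted channel output
```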
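Finally, for the base case $L = 1$, the end-to-end optimization of Eq. (1) with the MSE distortion reduces to standard autoencoder training with the non-trainable channel inserted between encoder and decoder. The sketch below assumes the hypothetical Encoder and Decoder modules and the power_normalize and awgn_channel helpers from the earlier sketches, images scaled to [0, 1], and a recent PyTorch version with complex autograd support.

```python
import torch
import torch.nn as nn


def train_step(encoder, decoder, images, optimizer, snr_db=10.0):
    """One gradient step of Eq. (1) with d(x, x_hat) = MSE, for the case L = 1."""
    optimizer.zero_grad()
    feat = encoder(images)                               # f_theta(x): real-valued feature map
    shape = feat.shape                                    # remembered to undo the reshape below
    pairs = feat.reshape(shape[0], -1, 2).contiguous()    # pair real values (assumes an even count)
    z = torch.view_as_complex(pairs)                      # k complex channel symbols per image
    z_hat = awgn_channel(power_normalize(z), snr_db)      # power constraint + noisy channel, Eq. (2)
    feat_hat = torch.view_as_real(z_hat).reshape(shape)   # back to the decoder's input layout
    x_hat = decoder(feat_hat)                             # g_phi(z_hat): reconstruction in [0, 1]
    loss = nn.functional.mse_loss(x_hat, images)          # distortion d(x, x_hat) of Eq. (1)
    loss.backward()                                       # gradients flow through the channel layer
    optimizer.step()
    return loss.item()


# Illustrative usage (models and data loader as assumed above):
# enc, dec = Encoder(), Decoder()
# opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
# for images in loader:                                   # images: (B, 3, H, W) in [0, 1]
#     train_step(enc, dec, images, opt, snr_db=10.0)
```

Because the channel is a non-trainable layer, gradients from the reconstruction loss propagate through it back to the encoder, which is what allows the encoder and decoder to be trained jointly from data.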