Ding Liu, Zhaowen Wang and Thomas Huang Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Category: Computer Vision & Machine Vision - CV10

Poster contact Name P5195 Ding Liu: [email protected]

Efficient Image and Video Super-Resolution Using Deep Convolutional Neural Networks Ding Liu, Zhaowen Wang and Thomas Huang Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Abstract LISTA Algorithm [2] Experiments & Results We propose a GPU-parallelized algorithm using deep • ISTA is used to find the solution to the norm optimization problem in Average PSNR Average Time (sec) 18 convolutional neural network for single image super-resolution 36 15.76 35.50 16 testing phase. It iterates the following recursive equation to convergence: 35.5 1 14 (SR), in which the restoration of high-resolution image from low- 𝑙𝑙 35 12 34.5 10 resolution image is accomplished by a deep learning model. . ′ 34 33.66 8 33.5 6 𝜃𝜃 𝑒𝑒 4 Unlike the traditional sparse coding based method, our deep 33 where 𝛼𝛼 𝑘𝑘 + 1 = ℎ , 𝑊𝑊 𝑦𝑦 largest+ 𝑆𝑆𝑆𝑆 𝑘𝑘eigenvalue, 𝛼𝛼 0 = 0of , 2 1.41 32.5 0 learning approach parallelizes all the computation steps from 1 1 Bicubic Coupled SC [1] / 𝑇𝑇 𝑇𝑇 𝑇𝑇 Coupled SC [1] Proposed Proposed end to end, without sacrificing the performance of the current 𝑊𝑊𝑒𝑒 = 𝐿𝐿 𝐷𝐷𝑙𝑙 , 𝑆𝑆 = 𝐼𝐼 − 𝐿𝐿 𝐷𝐷𝑙𝑙 𝐷𝐷 and𝑙𝑙 𝐿𝐿 > 𝐷𝐷𝑙𝑙 𝐷𝐷𝑙𝑙

state-of-the-art SR methods. The GPU parallelization 𝜃𝜃 𝑖𝑖 𝑖𝑖 𝑖𝑖 𝑖𝑖 + 𝑖𝑖 accelerates our algorithm significantly, and enables us to build ℎ 𝑉𝑉 = sign 𝑉𝑉 𝑉𝑉 − 𝜃𝜃 𝜃𝜃 = 𝜆𝜆/𝐿𝐿 up a real-time video SR system for on-line applications.

Introduction Image super-resolution is usually cast as an inverse problem of

recovering the original high-resolution (HR) image from the low- resolution (LR) observation images. This technique can be utilized in the applications where high resolution is of • LISTA takes a fixed number of iterations in ISTA, and are learned importance, such as medical imaging, surveillance, satellite imaging and SDTV to HDTV conversion. The main difficulty in the process which is equivalent to a deep neural network 𝑊𝑊𝜃𝜃, 𝑆𝑆, 𝜃𝜃 resides in the loss of much information in the degradation process. Method

One representative of the current state-of-the-art SR methods End-to-end mapping between the LR patch and HR patch: is the sparse coding based method[1], which can be easily • Extract low-level features from each LR patch transformed into a deep convolutional neural network. • Utilize LISTA to find the sparse code and multiply it with LR video

• Scale it by the norm and add mean to it 𝐷𝐷ℎ frames Related Work • Average overlapping estimated patches to produce the HR output Sparse-Coding-Based Method [1] Model Architecture

Coupled Dictionary Training • Learn the transform from low-resolution (LR) feature space to high resolution (HR) feature space • Find , such that the sparse code of can well reconstruct ℎ 𝑙𝑙 𝑖𝑖 𝐷𝐷 𝐷𝐷 𝑦𝑦 𝑖𝑖 𝑥𝑥 2 2 𝑖𝑖 ℎ 𝑖𝑖 𝑖𝑖 𝑙𝑙 𝑖𝑖 HR video Dminl,Dh 𝑥𝑥 − 𝐷𝐷 𝑧𝑧 2 + 𝛾𝛾 𝑦𝑦 − 𝐷𝐷 𝑧𝑧 2 𝑖𝑖 frames 2 Testing 𝑖𝑖 i l 𝑖𝑖 𝑖𝑖 𝑠𝑠. 𝑡𝑡. 𝑧𝑧 = argmin𝛼𝛼𝑖𝑖 y − D 𝛼𝛼 2 + 𝜆𝜆 𝛼𝛼 1 • Find the sparse code for the LR patch • Obtain the estimated HR∗ patch by and′ 𝛼𝛼 𝑦𝑦 ∗ ℎ 𝑥𝑥 𝐷𝐷 𝛼𝛼 Real-time Video SR system Conclusions & Future Work • Real-time video SR: frame by frame process A new single image SR method is proposed based on deep convolutional • Frame size: 200x250 pixels -> 400x500 pixels neural network, in which the restoration of HR image from LR image is achieved by a deep learning model. It is implemented at a much higher • Frame rate: 0.07 sec / frame ( 14 frames per sec) speed utilizing GPU computing. In the future, parameters in all layers can be trained in order to improve the restoration performance. ≈ References [1] Yang, Jianchao, et al. "Coupled dictionary training for image super-resolution."Image Processing, IEEE Transactions on 21.8 (2012): 3467-3478. [2] Gregor, Karol, and Yann LeCun. "Learning fast approximations of sparse coding." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.