Deep Learning for Image Super-Resolution: a Survey
Total Page:16
File Type:pdf, Size:1020Kb
1 Deep Learning for Image Super-resolution: A Survey Zhihao Wang, Jian Chen, Steven C.H. Hoi, Fellow, IEEE Abstract—Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future. Index Terms—Image Super-resolution, Deep Learning, Convolutional Neural Networks (CNN), Generative Adversarial Nets (GAN) F 1 INTRODUCTION MAGE super-resolution (SR), which refers to the process and strategies [8], [31], [32], etc. I of recovering high-resolution (HR) images from low- In this paper, we give a comprehensive overview of re- resolution (LR) images, is an important class of image cent advances in image super-resolution with deep learning. processing techniques in computer vision and image pro- Although there are some existing SR surveys in literature, cessing. It enjoys a wide range of real-world applications, our work differs in that we are focused in deep learning such as medical imaging [1], [2], [3], surveillance and secu- based SR techniques, while most of the earlier works [33], rity [4], [5]), amongst others. Other than improving image [34], [35], [36] aim at surveying traditional SR algorithms or perceptual quality, it also helps to improve other computer some studies mainly concentrate on providing quantitative vision tasks [6], [7], [8], [9]. In general, this problem is very evaluations based on full-reference metrics or human visual challenging and inherently ill-posed since there are always perception [37], [38]. Unlike the existing surveys, this survey multiple HR images corresponding to a single LR image. takes a unique deep learning based perspective to review In literature, a variety of classical SR methods have been the recent advances of SR techniques in a systematic and proposed, including prediction-based methods [10], [11], comprehensive manner. [12], edge-based methods [13], [14], statistical methods [15], The main contributions of this survey are three-fold: [16], patch-based methods [13], [17], [18], [19] and sparse 1) We give a comprehensive review of image super- representation methods [20], [21], etc. resolution techniques based on deep learning, in- With the rapid development of deep learning techniques cluding problem settings, benchmark datasets, per- in recent years, deep learning based SR models have been formance metrics, a family of SR methods with deep actively explored and often achieve the state-of-the-art per- learning, domain-specific SR applications, etc. formance on various benchmarks of SR. A variety of deep 2) We provide a systematic overview of recent ad- learning methods have been applied to tackle SR tasks, rang- vances of deep learning based SR techniques in a ing from the early Convolutional Neural Networks (CNN) arXiv:1902.06068v2 [cs.CV] 8 Feb 2020 hierarchical and structural manner, and summarize based method (e.g., SRCNN [22], [23]) to recent promising the advantages and limitations of each component SR approaches using Generative Adversarial Nets (GAN) for an effective SR solution. [24] (e.g., SRGAN [25]). In general, the family of SR al- 3) We discuss the challenges and open issues, and gorithms using deep learning techniques differ from each identify the new trends and future directions to other in the following major aspects: different types of provide an insightful guidance for the community. network architectures [26], [27], [28], different types of loss functions [8], [29], [30], different types of learning principles In the following sections, we will cover various aspects of recent advances in image super-resolution with deep learning. Fig. 1 shows the taxonomy of image SR to be • Corresponding author: Steven C.H. Hoi is currently with Salesforce Research Asia, and also a faculty member (on leave) of the School covered in this survey in a hierarchically-structured way. of Information Systems, Singapore Management University, Singapore. Section 2 gives the problem definition and reviews the Email: [email protected] or [email protected]. mainstream datasets and evaluation metrics. Section 3 ana- • Z. Wang is with the South China University of Technology, China. E-mail: lyzes main components of supervised SR modularly. Section [email protected]. This work was done when he was a visiting student with Dr Hoi’s group at the School of Information Systems, Singapore 4 gives a brief introduction to unsupervised SR methods. Management University, Singapore. Section 5 introduces some popular domain-specific SR ap- • J. Chen is with the South China University of Technology, China. E-mail: plications, and Section 6 discusses future directions and [email protected]. open issues. 2 Image Super-resolution Supervised Image Super-resolution Model Frameworks Upsampling Methods Network Design Learning Strategies Other Improvements Residual Learning Loss Functions: Pre-upsampling SR Context-wise Network Interpolation-based Methods: Recursive Learning - Pixel Loss Fusion - Nearest Neighbor Multi-path Learning - Content Loss - Bilinear - Texture Loss Post-upsampling SR Dense Connections - Bicubic - Adversarial Loss Data Augmentation - Others Attention Mechanism - Cycle Consistency Loss Advanced Convolution - Total Variation Loss Progressive Upsampling SR Region-recursive Learning - Prior-based Loss Multi-task Learning Learning-based Methods: Pyramid Pooling Batch Normalization - Transposed Convolution Network Interpolation Iterative Up-and-down Wavelet Transformation - Sub-pixel Layer Curriculum Learning Sampling SR xUnit - Meta Upscale Module Self-ensemble Desubpixel Multi-supervision Performance Evaluation Unsupervised Image Super-resolution Domain-specific Applications Benchmark Datasets Depth Map Super-resolution Zero-shot Super-resolution Performance Metric: Face Image Super-resolution - Objective Methods: PSNR, SSIM, etc. Weakly-supervised Super-resolution: - Subjective Methods: MOS Hyperspectral Image Super-resolution - Learned Degradation - Task-based Evaluation Real-world Image Super-resolution - Learning-based Perceptual Quality - Cycle-in-cycle Super-resolution - Other Methods Video Super-resolution Operating Channels Deep Image Prior Other Applications Fig. 1. Hierarchically-structured taxonomy of this survey. 2 PROBLEM SETTING AND TERMINOLOGY downsampling operation is bicubic interpolation with anti- aliasing. However, there are other works [39] modelling the 2.1 Problem Definitions degradation as a combination of several operations: Image super-resolution aims at recovering the correspond- D(I ; δ) = (I ⊗ κ) # +n ; fκ, s; &g ⊂ δ; (4) ing HR images from the LR images. Generally, the LR image y y s & I x is modeled as the output of the following degradation: where Iy ⊗ κ represents the convolution between a blur kernel κ and the HR image I , and n is some additive I = D(I ; δ); y & x y (1) white Gaussian noise with standard deviation &. Compared to the naive definition of Eq. 3, the combinative degradation where D denotes a degradation mapping function, Iy is pattern of Eq. 4 is closer to real-world cases and has been the corresponding HR image and δ is the parameters of shown to be more beneficial for SR [39]. the degradation process (e.g., the scaling factor or noise). Generally, the degradation process (i.e., D and δ) is un- To this end, the objective of SR is as follows: known and only LR images are provided. In this case, also ^ θ = arg min L(I^y;Iy) + λΦ(θ); (5) known as blind SR, researchers are required to recover an θ HR approximation I^y of the ground truth HR image Iy from where L(I^y;Iy) represents the loss function between the the LR image Ix, following: generated HR image I^y and the ground truth image Iy, Φ(θ) λ I^y = F(Ix; θ); (2) is the regularization term and is the tradeoff parameter. Although the most popular loss function for SR is pixel-wise where F is the super-resolution model and θ denotes the mean squared error (i.e., pixel loss), more powerful models parameters of F. tend to use a combination of multiple loss functions, which Although the degradation process is unknown and can will be covered in Sec. 3.4.1. be affected by various factors (e.g., compression artifacts, anisotropic degradations, sensor noise and speckle noise), 2.2 Datasets for Super-resolution researchers are trying to model the degradation mapping. Most works directly model the degradation as a single Today there are a variety of datasets available for image downsampling operation, as follows: super-resolution, which greatly differ in image amounts, quality, resolution, and diversity, etc. Some of them provide D(Iy; δ) = (Iy) #s; fsg ⊂ δ; (3) LR-HR image pairs, while others only provide HR images, in which case the LR images are typically obtained by imre- where #s is a downsampling operation with the scaling size function with default settings in MATLAB (i.e., bicubic factor s. As a matter of fact, most datasets for generic SR are interpolation with anti-aliasing). In Table 1 we list a number built based on this pattern, and the most commonly used of image datasets commonly used by the SR community, 3 TABLE 1 List of public image datasets for super-resolution benchmarks. Dataset Amount Avg. Resolution Avg. Pixels Format Category Keywords BSDS300 [40] 300 (435; 367) 154; 401 JPG animal, building, food, landscape, people, plant, etc. BSDS500 [41] 500 (432; 370) 154; 401 JPG animal, building, food, landscape, people, plant, etc. DIV2K [42] 1000 (1972; 1437) 2; 793; 250 PNG environment, flora, fauna, handmade object, people, scenery, etc.