Video Super Resolution Based on Deep Learning: A Comprehensive Survey

Total pages: 16

File type: PDF, size: 1020 KB

Video Super Resolution Based on Deep Learning: A Comprehensive Survey

Hongying Liu, Member, IEEE, Zhubo Ruan, Peng Zhao, Chao Dong, Member, IEEE, Fanhua Shang, Senior Member, IEEE, Yuanyuan Liu, Member, IEEE, Linlin Yang

arXiv:2007.12928v2 [cs.CV] 20 Dec 2020

Abstract—In recent years, deep learning has made great progress in many fields such as image recognition, natural language processing, speech recognition and video super-resolution. In this survey, we comprehensively investigate 33 state-of-the-art video super-resolution (VSR) methods based on deep learning. It is well known that leveraging information within video frames is important for video super-resolution. Thus we propose a taxonomy and classify the methods into six sub-categories according to the ways of utilizing inter-frame information. Moreover, the architectures and implementation details of all the methods are described in detail. Finally, we summarize and compare the performance of representative VSR methods on some benchmark datasets. We also discuss some challenges that need to be further addressed by researchers in the VSR community. To the best of our knowledge, this work is the first systematic review of VSR tasks, and it is expected to contribute to the development of recent studies in this area and potentially deepen our understanding of VSR techniques based on deep learning.

Index Terms—Video super-resolution, deep learning, convolutional neural network, inter-frame information.

(H. Liu, Z. Ruan, P. Zhao, F. Shang, Y. Liu and L. Yang are with the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, School of Artificial Intelligence, Xidian University, China. E-mails: {hyliu, fhshang, yyliu}@xidian.edu.cn. C. Dong is with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. E-mail: chao.dong@siat.ac.cn. Manuscript received November 12, 2020.)

I. INTRODUCTION

Super-resolution (SR) aims at recovering a high-resolution (HR) image or multiple images from the corresponding low-resolution (LR) counterparts. It is a classic and challenging problem in computer vision and image processing, and it has extensive real-world applications, such as medical image reconstruction [1], face [2], remote sensing [3] and panorama video super-resolution [4, 5], UAV surveillance [6] and high-definition television [7]. With the advent of the 5th generation mobile communication technology, larger images or videos can be transmitted within a shorter time. Meanwhile, with the popularity of high definition (HD) and ultra high definition (UHD) display devices, super-resolution is attracting more and more attention.

Video is one of the most common types of multimedia in our daily life, and thus super-resolution of low-resolution videos has become very important. In general, image super-resolution methods process a single image at a time, while video super-resolution algorithms deal with multiple successive images/frames at a time, so as to utilize the relationship within frames to super-resolve the target frame. In a broad sense, video super-resolution (VSR) can be regarded as a class of image super-resolution and can be processed by image super-resolution algorithms frame by frame. However, the SR performance is often not satisfactory, as artifacts and jams may be introduced, which leads to unguaranteed temporal coherence within frames.

In recent years, many video super-resolution algorithms have been proposed. They mainly fall into two categories: traditional methods and deep learning based methods. In some traditional methods, the motions are simply estimated by affine models as in [8]. In [9, 10], non-local mean and 3D steering kernel regression are adopted for video super-resolution, respectively. Liu and Sun [11] proposed a Bayesian approach to simultaneously estimate the underlying motion, blur kernel and noise level, and reconstruct high-resolution frames. In [12], the expectation maximization (EM) method is adopted to estimate the blur kernel and to guide the reconstruction of high-resolution frames. However, these explicit models of high-resolution videos are still inadequate to fit various scenes in videos.

With the great success of deep learning in a variety of areas, super-resolution algorithms based on deep learning have been studied extensively. Many video super-resolution methods based on deep neural networks, such as convolutional neural networks (CNNs), generative adversarial networks (GANs) and recurrent neural networks (RNNs), have been proposed. Generally, they employ a large number of both low-resolution and high-resolution video sequences as input to the neural network for inter-frame alignment and feature extraction/fusion, and then produce the high-resolution sequences for the corresponding low-resolution video sequences. The pipeline of most video super-resolution methods mainly includes an alignment module, a feature extraction and fusion module, and a reconstruction module, as shown in Fig. 1. Because of the nonlinear learning capability of deep neural networks, the deep learning based methods usually achieve good performance on many public benchmark datasets.

Fig. 1: The general pipeline of deep learning methods for VSR tasks (LR frames t-1, t, t+1 pass through alignment, feature extraction and fusion, and reconstruction to produce the SR image). Note that the inter-frame alignment module can be either a traditional method or a deep CNN, while both the feature extraction & fusion module and the upsampling module usually utilize deep CNNs. The dashed-line box means that the module is optional.

So far, there are few works surveying video super-resolution tasks, though many works [13, 14, 15] investigating single image super-resolution have been published. Daithankar and Ruikar [16] presented a brief review of many frequency-spatial domain methods, in which deep learning methods are rarely mentioned. Unlike the previous work, we provide a comprehensive investigation of deep learning techniques for video super-resolution in recent years. It is well known that the main difference between video super-resolution and image super-resolution lies in the processing of inter-frame information. How to effectively leverage the information from neighboring frames is critical for VSR tasks. We therefore focus on the ways of utilizing inter-frame information in various deep learning based methods.

The contributions of this work are mainly summarized as follows. 1) We review recent works and progress on developing techniques for deep learning based video super-resolution. To the best of our knowledge, this is the first comprehensive survey on deep learning based VSR methods. 2) We propose a taxonomy for deep learning based video super-resolution methods by categorizing their ways of utilizing inter-frame information, and illustrate how the taxonomy can be used to categorize existing methods. 3) We summarize the performance of state-of-the-art methods on some public benchmark datasets. 4) We further discuss some challenges and perspectives for video super-resolution tasks.

The rest of the paper is organized as follows. In Section II, we briefly introduce the background of video super-resolution. Section III presents our taxonomy for recent works. In Sections IV and V, we describe the video super-resolution methods with and without alignment, respectively, according to the taxonomy. In Section VI, the performance of state-of-the-art methods is analyzed quantitatively. In Section VII, we discuss the challenges and prospective trends in video super-resolution. Finally, we conclude this work in Section VIII.

II. BACKGROUND

Video super-resolution stems from image super-resolution, and it aims at restoring high-resolution videos from multiple low-resolution frames. However,

Besides the RGB color space, the YUV (including YCbCr) color space is also widely used for VSR. $I_i \in \mathbb{R}^{H \times W \times 3}$ denotes the $i$-th frame in an LR video sequence $I$, and $\hat{I}_i \in \mathbb{R}^{sH \times sW \times 3}$ is the corresponding HR frame, where $s$ is the scale factor, e.g., $s = 2, 4$, or $8$. And $\{\hat{I}_j\}_{j=i-N}^{i+N}$ is a set of $2N+1$ HR frames centered on the frame $\hat{I}_i$, where $N$ is the temporal radius. Then the degradation process of HR video sequences can be formulated as follows:

$I_i = \phi(\hat{I}_i; \{\hat{I}_j\}_{j=i-N}^{i+N}, \theta_\alpha)$   (1)

where $\phi(\cdot)$ is the degradation function, and the parameter $\theta_\alpha$ represents various degradation factors such as noise, motion blur and downsampling factors. In most existing works such as [11, 12, 17, 18], the degradation process is expressed as:

$I_j = D B E_{i \to j} \hat{I}_i + n_j$   (2)

where $D$ and $B$ are the downsampling and blur operations, $n_j$ denotes image noise, and $E_{i \to j}$ is the warping operation based on the motion from $\hat{I}_i$ to $\hat{I}_j$.

In practice, it is easy to obtain the LR image $I_j$, but the degradation factors, which may be quite complex or a combination of several factors, are unknown. Different from single image super-resolution (SISR), which aims at restoring a single degraded image, VSR needs to deal with degraded video sequences and recover the corresponding HR video sequences, which should be as close as possible to the ground truth (GT) videos. Specifically, a VSR algorithm may use techniques similar to SISR for processing a single frame (spatial information), while it has to take the relationships among frames (temporal information) into consideration to ensure the motion consistency of the video. The super-resolution process, namely the reverse process of Eq. (1), can be formulated as follows:

$\tilde{I}_i = \phi^{-1}(I_i; \{I_j\}_{j=i-N}^{i+N}, \theta_\beta)$   (3)
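The widely assumed degradation of Eq. (2) can be sketched numerically. The following NumPy fragment is a simplified illustration under stated assumptions (integer-shift warping for E, a box blur for B, plain decimation for D), not the implementation of any particular method:

```python
import numpy as np

def degrade(hr_frame, shift=(1, 0), blur_size=3, scale=2, noise_std=0.01, seed=0):
    """Toy version of Eq. (2): I_j = D B E_{i->j} HR_i + n_j."""
    # E_{i->j}: warp the HR frame by an integer motion vector.
    warped = np.roll(hr_frame, shift, axis=(0, 1))
    # B: separable box blur (moving average over rows, then columns).
    k = np.ones(blur_size) / blur_size
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, warped)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    # D: downsample by keeping every `scale`-th pixel.
    lr = blurred[::scale, ::scale]
    # n_j: additive Gaussian noise.
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_std, lr.shape)

hr = np.random.default_rng(1).random((64, 64))  # synthetic HR frame
lr = degrade(hr)
print(hr.shape, lr.shape)  # (64, 64) (32, 32)
```

A VSR method then has to approximate the inverse mapping of Eq. (3) from such LR observations alone, without knowing the blur, motion, or noise parameters.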
Recommended publications
  • Digital Video Quality Handbook (May 2013)
    Digital Video Quality Handbook, May 2013. Executive Summary: Under the direction of the Department of Homeland Security (DHS) Science and Technology Directorate (S&T), First Responders Group (FRG), Office for Interoperability and Compatibility (OIC), the Johns Hopkins University Applied Physics Laboratory (JHU/APL) worked with the Security Industry Association (including Steve Surfaro) and members of the Video Quality in Public Safety (VQiPS) Working Group to develop the May 2013 Video Quality Handbook. This document provides voluntary guidance for providing levels of video quality in public safety applications for network video surveillance. Several video surveillance use cases are presented to help illustrate how to relate video component and system performance to the intended application of video surveillance, while meeting the basic requirements of federal, state, tribal and local government authorities. Characteristics of video surveillance equipment are described in terms of how they may influence the design of video surveillance systems. In order for the video surveillance system to meet the needs of the user, the technology provider must consider the following factors that impact video quality: 1) device categories; 2) component and system performance level; 3) verification of intended use; 4) component and system performance specification; and 5) best fit and link to use case(s). An appendix is also provided that presents content related to topics not covered in the original document (especially information related to video standards) and updates the material as needed to reflect innovation and changes in the video environment. The emphasis is on the implications of digital video data being exchanged across networks with large numbers of components or participants.
  • Multiframe Motion Coupling for Video Super Resolution
    Multiframe Motion Coupling for Video Super Resolution. Jonas Geiping,* Hendrik Dirks,† Daniel Cremers,‡ Michael Moeller§. December 5, 2017. arXiv:1611.07767v2 [cs.CV] 4 Dec 2017

    Abstract: The idea of video super resolution is to use different view points of a single scene to enhance the overall resolution and quality. Classical energy minimization approaches first establish a correspondence of the current frame to all its neighbors in some radius and then use this temporal information for enhancement. In this paper, we propose the first variational super resolution approach that computes several super-resolved frames in one batch optimization procedure by incorporating motion information between the high-resolution image frames themselves. As a consequence, the number of motion estimation problems grows linearly in the number of frames, as opposed to the quadratic growth of classical methods, and temporal consistency is enforced naturally. We use infimal convolution regularization as well as an automatic parameter balancing scheme to automatically determine the reliability of the motion information and reweight the regularization locally. We demonstrate that our approach yields state-of-the-art results and is even competitive with machine learning approaches.

    1 Introduction. The technique of video super resolution combines the spatial information from several low resolution frames of the same scene to produce a high resolution video. A classical way of solving the super resolution problem is to estimate the motion from the current frame to its neighboring frames, model the data formation process via warping, blur, and downsampling, and use a suitable regularization to suppress possible artifacts arising from the ill-posedness of the underlying problem.

    *University of Siegen, jonas.geiping@uni-siegen.de; †University of Münster, hendrik.dirks@wwu.de; ‡Technical University of Munich, cremers@tum.de; §University of Siegen, michael.moeller@uni-siegen.de
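The linear-versus-quadratic claim about motion estimation problems can be made concrete with a toy count. The function names below are illustrative, not from the paper; "classical" here means estimating flow between every ordered pair of frames in the batch, while the coupled approach only links consecutive HR frames:

```python
def pairwise_flow_problems(n):
    """All-pairs batch SR: one flow problem per ordered frame pair."""
    return n * (n - 1)

def consecutive_flow_problems(n):
    """Motion coupling between HR frames: only consecutive pairs."""
    return n - 1

for n in (5, 15, 30):
    print(n, pairwise_flow_problems(n), consecutive_flow_problems(n))
```

For a 30-frame batch the counts are 870 versus 29, which is why coupling motion between the HR frames themselves scales to longer batches.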
  • (A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
    What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
  • On the Accuracy of Video Quality Measurement Techniques
    On the accuracy of video quality measurement techniques. Deepthi Nandakumar (Amazon Video, Bangalore, India, nandakd@amazon.com), Yongjun Wu (Amazon Video, Seattle, USA, yongjuw@amazon.com), Hai Wei (Amazon Video, Seattle, USA, haiwei@amazon.com), Avisar Ten-Ami (Amazon Video, Seattle, USA, avisarta@amazon.com)

    Abstract—With the massive growth of Internet video streaming, it is critical to accurately measure video quality subjectively and objectively, especially for HD and UHD video, which is bandwidth intensive. We summarize the creation of a database of 200 clips, with 20 unique sources tested across a variety of devices. By classifying the test videos into two distinct quality regions, SD and HD, we show that the high correlation claimed by objective video quality metrics is led mostly by videos in the SD quality region. We perform detailed correlation analysis and statistical hypothesis testing of the HD subjective quality scores, and establish that the commonly used ACR methodology of subjective testing is unable to capture significant quality differences, leading to poor measurement accuracy for both subjective and objective

    and therefore measuring and optimizing for video quality in these high quality ranges is an imperative for streaming service providers. In this study, we lay specific emphasis on the measurement accuracy of subjective and objective video quality scores in this high quality range.

    Globally, a healthy mix of devices with different screen sizes and form factors is projected to contribute to IP traffic in 2021 [1], ranging from smartphones (44%), to tablets (6%), PCs (19%) and TVs (24%). It is therefore necessary for streaming service providers to quantify the viewing experience based on the device, and possibly optimize the encoding and delivery process accordingly.
  • The H.264 Advanced Video Coding (AVC) Standard
    Whitepaper: The H.264 Advanced Video Coding (AVC) Standard. What It Means to Web Camera Performance.

    Introduction. A new generation of webcams is hitting the market that makes video conferencing a more lifelike experience for users, thanks to adoption of the breakthrough H.264 standard. This white paper explains some of the key benefits of H.264 encoding and why cameras with this technology should be on the shopping list of every business.

    The Need for Compression. Today, Internet connection rates average in the range of a few megabits per second. While VGA video requires 147 megabits per second (Mbps) of data, full high definition (HD) 1080p video requires almost one gigabit per second of data, as illustrated in Table 1.

    Table 1. Display Resolution Format Comparison
    Format | Horizontal Pixels | Vertical Lines | Pixels    | Megabits per second (Mbps)
    QVGA   | 320               | 240            | 76,800    | 37
    VGA    | 640               | 480            | 307,200   | 147
    720p   | 1280              | 720            | 921,600   | 442
    1080p  | 1920              | 1080           | 2,073,600 | 995

    Video Compression Techniques. Digital video streams, especially at high definition (HD) resolution, represent huge amounts of data. In order to achieve real-time HD resolution over typical Internet connection bandwidths, video compression is required. The amount of compression required to transmit 1080p video over a three megabits per second link is 332:1! Video compression techniques use mathematical algorithms to reduce the amount of data needed to transmit or store video.

    Lossless Compression. Lossless compression changes how data is stored without resulting in any loss of information. Zip files are losslessly compressed so that when they are unzipped, the original files are recovered.
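The Mbps column in Table 1 follows from pixel count × bits per pixel × frame rate. The figures are consistent with 16 bits per pixel (e.g., 4:2:2 YUV) at 30 frames per second; note that these two parameters are inferred from the numbers, not stated in the whitepaper:

```python
def uncompressed_mbps(width, height, bits_per_pixel=16, fps=30):
    """Raw (uncompressed) video bitrate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1_000_000

formats = {"QVGA": (320, 240), "VGA": (640, 480),
           "720p": (1280, 720), "1080p": (1920, 1080)}
for name, (w, h) in formats.items():
    print(f"{name}: {uncompressed_mbps(w, h):.0f} Mbps")

# Compression ratio needed to fit 1080p into a 3 Mbps link:
print(round(uncompressed_mbps(1920, 1080) / 3))  # 332
```

Running this reproduces the table's 37, 147, 442 and 995 Mbps values and the 332:1 ratio quoted in the text.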
  • Video Super-Resolution Based on Spatial-Temporal Recurrent Residual Networks
    Computer Vision and Image Understanding, journal homepage: www.elsevier.com

    Video Super-Resolution Based on Spatial-Temporal Recurrent Residual Networks. Wenhan Yang (a), Jiashi Feng (b), Guosen Xie (c), Jiaying Liu (a,*), Zongming Guo (a), Shuicheng Yan (d). (a) Institute of Computer Science and Technology, Peking University, Beijing, P.R. China, 100871; (b) Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 117583; (c) NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China, 100190; (d) Artificial Intelligence Institute, Qihoo 360 Technology Company, Ltd., Beijing, P.R. China, 100015

    ABSTRACT. In this paper, we propose a new video Super-Resolution (SR) method by jointly modeling intra-frame redundancy and inter-frame motion context in a unified deep network. Different from conventional methods, the proposed Spatial-Temporal Recurrent Residual Network (STR-ResNet) investigates both spatial and temporal residues, which are represented by the difference between a high resolution (HR) frame and its corresponding low resolution (LR) frame, and the difference between adjacent HR frames, respectively. This spatial-temporal residual learning model is then utilized to connect the intra-frame and inter-frame redundancies within video sequences in a recurrent convolutional network, and to predict HR temporal residues in the penultimate layer as guidance to benefit estimating the spatial residue for video SR. Extensive experiments have demonstrated that the proposed STR-ResNet is able to efficiently reconstruct videos with diversified contents and complex motions, which outperforms the existing video SR approaches and offers new state-of-the-art performances on benchmark datasets.

    Keywords: Spatial residue, temporal residue, video super-resolution, inter-frame motion context, intra-frame redundancy.
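The two residues the abstract describes are simple pixel-wise differences. A definition-level NumPy sketch (nearest-neighbor upsampling is a stand-in assumption here, not the interpolation the network actually uses):

```python
import numpy as np

def spatial_residue(hr, lr, scale=2):
    """HR frame minus its upsampled LR counterpart (nearest-neighbor)."""
    up = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
    return hr - up

def temporal_residue(hr_t, hr_next):
    """Difference between adjacent HR frames."""
    return hr_next - hr_t

rng = np.random.default_rng(0)
hr0, hr1 = rng.random((8, 8)), rng.random((8, 8))
lr0 = hr0[::2, ::2]                      # toy LR frame
print(spatial_residue(hr0, lr0).shape)   # (8, 8)
print(temporal_residue(hr0, hr1).shape)  # (8, 8)
```

STR-ResNet learns to predict these residues rather than the HR frames directly, so the network only has to model the (sparser) missing detail.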
  • AVC, HEVC, VP9, AVS2 or AV1? A Comparative Study of State-of-the-Art Video Encoders on 4K Videos
    AVC, HEVC, VP9, AVS2 or AV1? A Comparative Study of State-of-the-art Video Encoders on 4K Videos. Zhuoran Li, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. University of Waterloo, Waterloo, ON N2L 3G1, Canada. {z777li,zduanmu,w238liu,zhou.wang}@uwaterloo.ca

    Abstract. 4K, ultra high-definition (UHD), and higher resolution video contents have become increasingly popular recently. The largely increased data rate casts great challenges to video compression and communication technologies. Emerging video coding methods are claimed to achieve superior performance for high-resolution video content, but thorough and independent validations are lacking. In this study, we carry out an independent and so far the most comprehensive subjective testing and performance evaluation on videos of diverse resolutions, bit rates and content variations, compressed by popular and emerging video coding methods including H.264/AVC, H.265/HEVC, VP9, AVS2 and AV1. Our statistical analysis, derived from a total of more than 36,000 raw subjective ratings on 1,200 test videos, suggests that significant improvement in terms of rate-quality performance against the AVC encoder has been achieved by state-of-the-art encoders, and such improvement is increasingly manifest with the increase of resolution. Furthermore, we evaluate state-of-the-art objective video quality assessment models, and our results show that the SSIMplus measure performs the best in predicting 4K subjective video quality. The database will be made available online to the public to facilitate future video encoding and video quality research.

    Keywords: Video compression, quality-of-experience, subjective quality assessment, objective quality assessment, 4K video, ultra-high-definition (UHD), video coding standard

    1 Introduction. 4K, ultra high-definition (UHD), and higher resolution video contents have enjoyed a remarkable growth in recent years.
  • Video Quality Assessment: Subjective Testing of Entertainment Scenes, Margaret H. Pinson, Lucjan Janowski, and Zdzisław Papir
    Video Quality Assessment: Subjective testing of entertainment scenes. Margaret H. Pinson, Lucjan Janowski, and Zdzisław Papir. This article describes how to perform a video quality subjective test. For companies, these tests can greatly facilitate video product development; for universities, removing perceived barriers to conducting such tests allows expanded research opportunities. This tutorial assumes no prior knowledge and focuses on proven techniques. (Certain commercial equipment, materials, and/or programs are identified in this article to adequately specify the experimental procedure. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the program or equipment identified is necessarily the best available for this application.) Video is a booming industry: content is embedded on many Web sites, delivered over the Internet, and streamed to mobile devices. Cisco statistics indicate that video exceeded 50% of total mobile traffic for the first time in 2012, and predicts that two-thirds of the world's mobile data traffic will be video by 2017 [1]. Each company must make a strategic decision on the correct balance between delivery cost and user experience. This decision can be made by the engineers designing the service or, for increased accuracy, by consulting users [2]. Video quality assessment requires a combined approach that includes objective metrics, subjective testing, and live video monitoring. Subjective testing is a critical step to ensuring success in operational management and new product development. Carefully conducted video quality subjective tests are extremely reliable and repeatable, as is shown in Section 8 of [3]. This article provides an approachable tutorial on how to conduct a subjective video quality experiment.
  • Using H.264 and H.265 Codecs for 4K Transmissions
    ISSN 1843-6188, Scientific Bulletin of the Electrical Engineering Faculty, Year 15, No. 1 (29)

    OBJECTIVE VIDEO QUALITY ASSESSMENT: USING H.264 AND H.265 CODECS FOR 4K TRANSMISSIONS. Nicoleta Angelescu, Department of Electronics, Telecommunications and Energetics, Valahia University of Targoviste. E-mail: nicoletaangelescu@yahoo.com

    Abstract. The H.265 codec has attracted a lot of attention for transmission of digital video content with 4K resolution (3840x2160 pixels), since it offers the necessary bandwidth savings in comparison with the H.264 codec. We compare both codecs using three objective video quality metrics on five standard 4K video test sequences for digital TV transmissions.

    Keywords: H.264, HEVC, VQM, SSIM, objective video quality

    1. INTRODUCTION. Nowadays, the majority of video content is stored in lossy compressed form.

    solution can be applied only to MPEG 2 video files to reduce storage space. In [5], the H.265 and H.264 codecs are evaluated on video quality for 4K resolution. The authors use three objective quality metrics for evaluation, and conclude that the H.265 codec offers the best improvement compared to the H.264 codec at low bit-rates. In [6], a solution to reduce the bit rate for MPEG 2 and H.264 transmission is proposed: the base layer at MPEG 2 standard resolution is multiplexed with the enhancement layer (the difference between the scaled MPEG 2 video content and the original high resolution video captured), compressed with the H.264 codec, and transmitted to the user destination. The same solution can be applied to H.264 and H.265 content. Users with receivers which support Full HD resolution view the 4K content scaled
  • Online Multi-Frame Super-Resolution of Image Sequences, Jieping Xu, Yonghui Liang, Jin Liu, Zongfu Huang and Xuewen Liu
    Xu et al., EURASIP Journal on Image and Video Processing (2018) 2018:136, https://doi.org/10.1186/s13640-018-0376-5. RESEARCH, Open Access.

    Online multi-frame super-resolution of image sequences. Jieping Xu1*, Yonghui Liang2, Jin Liu2, Zongfu Huang2 and Xuewen Liu2

    Abstract: Multi-frame super-resolution recovers a high-resolution (HR) image from a sequence of low-resolution (LR) images. In this paper, we propose an algorithm that performs multi-frame super-resolution in an online fashion. This algorithm processes only one low-resolution image at a time, instead of co-processing all LR images as is adopted by state-of-the-art super-resolution techniques. Our algorithm is very fast, memory efficient, and simple to implement. In addition, we employ a noise-adaptive parameter in the classical steepest gradient optimization method to avoid noise amplification and overfitting LR images. Experiments with simulated and real-image sequences yield promising results.

    Keywords: Super-resolution, Image processing, Image sequences, Online image processing

    1 Introduction. Images with higher resolution are required in most electronic imaging applications such as remote sensing, medical diagnostics, and video surveillance. For the past decades, considerable advancement has been realized in imaging systems. However, the quality of images is still limited by the cost and manufacturing technology [1]. There are a variety of SR techniques, including multi-frame SR and single-frame SR. Readers can refer to Refs. [1, 5, 6] for an overview of this issue. Our discussion below is limited to work related to fast multi-frame SR methods, as that is the focus of our paper. One SR category close to our algorithm is dynamic SR
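The online update the abstract describes can be sketched as plain steepest descent on the data term: each incoming LR frame y refines the running HR estimate x via x ← x − η·Up(Down(x) − y). This is a simplified sketch under stated assumptions (no warping, fixed step size, nearest-neighbor operators), not the authors' noise-adaptive algorithm:

```python
import numpy as np

def down(x, s=2):
    """Toy decimation: keep every s-th pixel."""
    return x[::s, ::s]

def up(e, s=2):
    """Adjoint-style upsampling: nearest-neighbor expansion."""
    return np.repeat(np.repeat(e, s, axis=0), s, axis=1)

def online_sr(lr_frames, shape, step=0.5, s=2):
    """Refine the HR estimate with one LR frame at a time."""
    x = np.zeros(shape)
    for y in lr_frames:                      # frames arrive one by one
        x = x - step * up(down(x, s) - y, s) # steepest-descent data-term step
    return x

rng = np.random.default_rng(0)
truth = rng.random((16, 16))
frames = [down(truth) for _ in range(20)]    # identical toy observations
est = online_sr(frames, truth.shape)
err = np.abs(down(est) - down(truth)).max()
print(err)  # shrinks geometrically with each frame
```

Because each step only touches the current frame, memory stays constant in the number of frames, which is the key property of the online formulation.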
  • On Evaluating Perceptual Quality of Online User-Generated Videos, Soobeom Jang and Jong-Seok Lee, Senior Member, IEEE
    On Evaluating Perceptual Quality of Online User-Generated Videos. Soobeom Jang and Jong-Seok Lee, Senior Member, IEEE

    Abstract—This paper deals with the issue of the perceptual quality evaluation of user-generated videos shared online, which is an important step toward designing video-sharing services that maximize users' satisfaction in terms of quality. We first analyze viewers' quality perception patterns by applying graph analysis techniques to subjective rating data. We then examine the performance of existing state-of-the-art objective metrics for the quality estimation of user-generated videos. In addition, we investigate the feasibility of metadata accompanying videos in online video-sharing services for quality estimation. Finally, various issues in the quality assessment of online user-generated videos are discussed, including difficulties and opportunities.

    Index Terms—User-generated video, paired comparison, quality assessment, metadata.

    quality assessment by employing multiple human subjects. However, subjective quality assessment is not feasible for online videos because of the tremendous number of videos in online video-sharing services. An alternative is objective quality assessment, which uses a model that mimics the human perceptual mechanism. Traditional objective quality metrics for estimating video quality have two limitations. First, the existence of the reference video, i.e., the pristine version of the given video, is important. In general, objective quality assessment frameworks are classified into three groups: full-reference (FR), reduced-reference (RR), and no-reference (NR). In the cases of FR and RR frameworks, full or partial information about the reference is provided.
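The full-reference (FR) case is the easiest to make concrete: a metric such as PSNR needs the pristine reference frame, which is exactly what user-generated content typically lacks. A minimal PSNR implementation (the standard textbook formula, not code from the paper):

```python
import numpy as np

def psnr(reference, distorted, peak=1.0):
    """Full-reference metric: requires the pristine reference frame."""
    mse = np.mean((reference - distorted) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
noisy = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(f"{psnr(ref, noisy):.1f} dB")  # higher is better
```

RR metrics replace `reference` with a compact feature summary of it, and NR metrics drop the reference argument entirely, which is why NR models are the ones applicable to online user-generated video.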
  • Versatile Video Coding and Super-Resolution for Efficient Delivery of 8K Video with 4K Backward-Compatibility
    VERSATILE VIDEO CODING AND SUPER-RESOLUTION FOR EFFICIENT DELIVERY OF 8K VIDEO WITH 4K BACKWARD-COMPATIBILITY. Charles Bonnineau, Wassim Hamidouche, Jean-François Travers, Olivier Déforges. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, Barcelona, Spain. HAL Id: hal-02480680, https://hal.archives-ouvertes.fr/hal-02480680, submitted on 16 Feb 2020.

    Affiliations: IRT b<>com, Cesson-Sévigné, France; TDF, Cesson-Sévigné, France; Univ Rennes, INSA Rennes, CNRS, IETR, UMR 6164, Rennes, France.

    ABSTRACT. In this paper, we propose, through an objective study, to compare and evaluate the performance of different coding approaches allowing the delivery of an 8K video signal with 4K backward-compatibility on broadcast networks.

    vary depending on the exploited transmission network, reducing the number of use-cases that can be covered using this approach. For instance, in the context of terrestrial transmission, the bandwidth is limited in the range 30-40 Mb/s using practical DVB-T2 [7] channels