


State-of-the-art in 360° Video/Image Processing: Perception, Assessment and Compression

Chen Li, Student Member, IEEE, Mai Xu, Senior Member, IEEE, Shanyi Zhang, Patrick Le Callet, Fellow, IEEE

Abstract—Nowadays, 360° video/image has been increasingly popular and drawn great attention. The spherical viewing range of 360° video/image accounts for huge data, which pose challenges to 360° video/image processing in solving the bottleneck of storage, transmission, etc. Accordingly, the recent years have witnessed the explosive emergence of works on 360° video/image processing. In this paper, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment and compression. First, this paper reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we overview the compression approaches for 360° video/image, which either utilize the spherical characteristics or visual attention models. Finally, we summarize this overview paper and outlook the future research trends on 360° video/image processing.

Index Terms—360° video/image, perception, assessment, compression.

I. INTRODUCTION

360° video/image, also known as panoramic, spherical or omnidirectional video/image, is a new multimedia type that provides immersive experience. The content of 360° video/image is on the sphere that covers the whole 360×180° viewing range. In other words, 360° video/image surrounds the viewer seamlessly and occupies the entire vision of the viewer, which is different from traditional 2-dimensional (2D) video/image that only covers a limited plane. Recent years have witnessed the rapid development of virtual reality (VR) technology. As an essential type of VR content, 360° video/image has been flooding into our daily life and drawing great attention. With the commercial head mount displays (HMDs) available recently, the viewer is allowed to freely make the viewport focus on the desired content via head movement (HM), just like humans do in the real world. In this way, the immersive and even interactive experiences are technically achieved. Meanwhile, new challenges have been raised to 360° video/image processing. To cover the whole 360×180° viewing range with high fidelity, the resolution of 360° video/image is extraordinarily high. Moreover, for 360° video, the frame rate should also be high to avoid motion sickness of viewers [1]. Consequently, heavy burdens are laid on the storage and transmission of 360° video/image.

To relieve the storage and transmission burdens, compression is in urgent need for saving bitrates of 360° video/image. In the past decades, many video/image coding standards have been developed for traditional 2D video/image by the organizations of the International Telecommunication Union (ITU), Joint Photographic Experts Group (JPEG), Moving Picture Experts Group (MPEG), etc. Although 360° images and video sequences are projected from sphere to 2D planes to be stored, processed and transmitted [1], these 2D standards do not fit 360° video/image well due to the spherical characteristics. Therefore, projects for improving compression efficiency on 360° video/image were started, such as MPEG-I [2] and JPEG 360 [3], and much effort has been made in 360° video/image compression [2]. To measure the compression performance, visual quality assessment (VQA) is needed to evaluate the quality degradation caused by compression. However, due to the spherical characteristics and the existence of sphere-to-plane projection, subjective VQA recommendations [4]–[6] and objective VQA approaches [7] for 2D video/image are not appropriate for 360° video/image. Under this circumstance, there have emerged several works for VQA on 360° video/image, from the aspects of effectively collecting subjective quality data [8], [9] and modelling the visual quality [1]. Moreover, the unique mechanism of viewing 360° video/image through the viewport in an HMD accounts for two facts: (1) The quality degradation in the viewport is more noticeable in 360° video/image, since the viewer focuses on the viewport, which is a small part of the whole 360° video/image. (2) There is massive redundancy in the encoded bits of 360° video/image, since the giant region outside the viewport is invisible to the viewer. Inspired by these facts, consideration of human perception may benefit VQA and compression on 360° video/image. Thus, there are also many works concentrating on visual attention modelling for 360° video/image to predict human perception [10], and even grand challenges were held [11], [12].

Even though plenty of works targeting 360° video/image processing have dramatically emerged in recent years, to the best of our knowledge, there lacks a survey that reviews these works and provides summary and outlook. In this paper, we survey works on 360° video/image processing from the aspects of visual attention modelling, VQA and compression, and we also reveal the relationships among these aspects. For visual attention modelling, we review both datasets and approaches. For VQA, subjective VQA methods for 360° video/image are first surveyed, which also provide datasets with subjective quality scores. Then, we focus on objective VQA approaches, which aim at being consistent with the subjective quality scores on 360° video/image. For compression, approaches that either incorporate the spherical characteristics or perception of 360° video/image are reviewed. At last, we summarize our paper and outlook the future research trends on 360° video/image processing.

TABLE I SUMMARY OF THE EXISTING 360° VIDEO/IMAGE DATASETS WITH THE VISUAL ATTENTION DATA OF SUBJECTS.

Dataset | Image/Video | Subjects | Dataset Size | Resolution | Duration* | HM/EM | Description
Abreu et al. [13] | Image | 32 | 21, indoor and outdoor images | 4096×2048 | 10 or 20 s | HM | HMD: Oculus Rift DK2. The subjects were equally divided into two groups. For one group, the 360° images were viewed for 10 seconds, and for the other group, the 360° images were viewed for 20 seconds.
Bao et al. [14] | Video | 153 (ages 20-50) | 16 from 3 categories | Full HD to 4K, mostly 4K | — | HM | HMD: Oculus DK2. 35 subjects viewed all 16 sequences in order, and the remaining 118 watched 3-5 randomly selected sequences.
Ozcinar et al. [15] | Video | 17 | 6 | 4K to 8K | 10 s | HM | HMD: Oculus Rift CV1.
Corbillon et al. [16] | Video | 59 (ages 6-62) | 7 | 3840×2048 pixels | 70 s | HM | HMD: Razer OSVR HDK2.
Lo et al. [17] | Video | 50 | 10 from 3 categories | 4K | 1 min | HM | HMD: Oculus Rift DK2.
Wu et al. [18] | Video | 48 (ages 20-26) | 18 from 5 categories | Full HD to 4K, mostly 2K | ≈ 3-11 min | HM | HMD: HTC Vive. Half of the sequences were freely viewed by subjects. Before viewing the other half, subjects were instructed with questions.
AVTrack360 [19] | Video | 48 (ages 18-65) | 20 | 4K | 30 s | HM | HMD: HTC Vive. Although free viewing was allowed, subjects were asked to fill in the simulator sickness questionnaire during the test session.
VR-HM48 [9] | Video | 40 | 48 from 8 categories | 3K to 8K | 20-60 s | HM | HMD: HTC Vive. No intercutting.
Wild-360 [20] | Video | 30 | 85 from 3 categories | — | — | HM-like | The sequences in this dataset were not viewed in HMD. Subjects viewed the sequences under projection in a typical 2D monitor and labelled the viewport positions with mouse. As the sequences were not mapped onto the sphere, the data collected in this dataset are not real HM data.
Sitzmann et al. [21] | Image | 169 (ages 17-59) | 22 | — | 30 s | HM+EM | HMD: Oculus Rift DK2. Eye-tracker: pupil-labs stereoscopic eye-tracker. There were three experiment conditions: the VR stand condition, the VR seated condition and the desktop condition. In the desktop condition, the data were collected using a Tobii EyeX eye-tracker, with mouse-controlled desktop panorama viewers in a typical 2D monitor.
Salient360 Image [10], [22] | Image | 63 (ages 19-52) | 98 (60 released) from 5 categories | 5376×2688 to 18332×9166 pixels | 25 s | HM+EM | HMD: Oculus Rift DK2. Eye-tracker: Sensomotoric Instruments (SMI) eye-tracker. Each subject viewed a subset of 60 images, so each image was viewed by 40-42 subjects.
Salient360 Video [23] | Video | 57 (ages 19-44) | 19, categorized by 3 groups of labels | 3840×1920 pixels | 20 s | HM+EM | HMD: HTC Vive. Eye-tracker: SMI eye-tracker. No intercutting.
PVS-HM [24] | Video | 58 (ages 18-36) | 76 from 8 categories | 3K to 8K | 10-80 s | HM+EM | HMD: HTC Vive. Eye-tracker: aGlass DKI.
Zhang et al. [25] | Video | 27 (ages 20-24) | 104 from 5 sports categories | — | 20-60 s | HM+EM | HMD: HTC Vive. Eye-tracker: aGlass DKI. The sequences are from the sports-360 dataset [26]. Each sequence was viewed by at least 20 subjects.
VR-EyeTracking [27] | Video | 45 (ages 20-24) | 208 from more than 6 categories | > 4K | 20-60 s | HM+EM | HMD: HTC Vive. Eye-tracker: aGlass DKI. The sequences were divided into 6 groups, and subjects viewed one group each time. Each sequence was viewed by at least 31 subjects.
VQA-ODV [28] | Video | 221 (ages 19-35) | 60 from more than 4 categories (the other 540 are with the same content as the 60 sequences) | 3840×1920 to 7680×3840 pixels | 10-23 s | HM+EM | HMD: HTC Vive. Eye-tracker: aGlass DKI. No intercutting. The task for subjects was assessing visual quality of the sequences instead of free viewing. The sequences were equally divided into 10 groups, and each subject only viewed 1 group of sequences (60 sequences in total including 6 reference sequences). Each sequence was viewed by at least 20 subjects.
* For videos, duration represents the temporal length of the videos themselves; for images, duration represents the time each subject views each image in the experiment.

II. VISUAL ATTENTION MODELLING

Different from traditional 2D video/image, 360° video/image can be perceived in the range of 360 × 180°. Humans can select the viewport to focus on the attractive content of 360° video/image through their HM within a sphere, while the eye movement (EM) determines which region can be captured at high resolution (i.e., fovea) within the viewport. In particular, the HM refers to the locations of the subjects' viewports, while the EM reflects the region-of-interest (RoI) inside the viewports. Therefore, the main difference between the perception models of 360° video/image and 2D video/image is the visual attention mechanism. In other words, it is necessary to predict both HM and EM for predicting visual attention on 360° video/image as the perception model. In this section, we review the datasets and approaches for visual attention modelling on 360° video/image.

A. Datasets

For establishing a visual attention dataset for 360° video/image, the HM and EM data can be collected in the VR environment (except Wild-360 [20]), i.e., subjects view 360° content by wearing the HMD. Specifically, the posture data of the HMD can be captured via the software development kit (SDK). Based on the posture data, the HM data of subjects can be calculated and recorded. In addition, the eye-tracker, which can be embedded into the HMD, can be used to track the pupil and corneal reflections for capturing the EM data.

Recently, there have emerged many datasets for visual attention modelling on 360° video/image, which contain the HM data and even the EM data of subjects. Table I summarizes these datasets. Note that all the datasets listed in Table I are public and can be downloaded online. The datasets of Table I enable the analysis on human attention when subjects view 360° video/image. Additionally, these datasets also boost the data-driven approaches for modelling human attention. Some widely used datasets are discussed in more detail as follows.

Abreu et al. [13] is one of the earliest attention datasets for 360° image, which was built in 2017. It contains 21 360° images, including indoor and outdoor scenes. All the images are with resolution of 4096 × 2048 pixels. In total, there are 32 subjects involved in viewing the 360° images. In the experiment, the HMD of Oculus Rift DK2 was used to collect the HM data of subjects. The subjects were divided into two groups, both with 16 subjects. For one group, the 360° images were viewed for 10 seconds, and for the other group, the 360° images were viewed for 20 seconds. Before the test session for data collection, there is a training session in their experiment to make subjects familiar with the HMD and 360° images. Then, they were allowed to freely view the images in the test session, and the posture data of the HMD were captured. Consequently, the HM data of 32 subjects were obtained for the dataset.

Salient360 datasets contain both the HM and EM data for 360° image [10], [22] and video [23].
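As described above, HM data are derived from HMD posture. The sketch below shows one common way this is done, assuming the SDK reports yaw/pitch angles in degrees and that the target is the viewport-center pixel on an ERP frame; the angle conventions are illustrative assumptions, not those of any specific dataset.

```python
import numpy as np

def hm_to_erp_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewport center given by HMD yaw/pitch (degrees) to ERP pixel coordinates.

    Assumed convention: yaw (longitude) in [-180, 180), 0 at the image center,
    increasing to the right; pitch (latitude) in [-90, 90], +90 at the top.
    """
    u = (yaw_deg + 180.0) / 360.0          # normalized longitude in [0, 1)
    v = (90.0 - pitch_deg) / 180.0         # normalized colatitude in [0, 1]
    col = int(np.clip(u * width, 0, width - 1))
    row = int(np.clip(v * height, 0, height - 1))
    return row, col

# Example: a viewer looking slightly left of the front direction and a bit upward.
print(hm_to_erp_pixel(yaw_deg=-10.0, pitch_deg=15.0, width=3840, height=1920))
```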

They are popular datasets, which have been widely used as training sets or benchmarks in many recent visual attention models [29]–[40]. In [22] and [10], 98 360° images are included from 5 categories: small indoor rooms, grand halls, natural landscapes, cityscapes and people. Among them, 60 are currently released. The resolution ranges from 5376 × 2688 pixels to 18332 × 9166 pixels. Then, 63 subjects were asked to view the 360° images for 25 seconds through the HMD, and each subject freely viewed a subset of 60 images. As such, each image was viewed by 40-42 subjects. In the Salient360 dataset of 360° video [23], 19 video sequences with resolution of 3840 × 1920 pixels are included, categorized by 3 groups of labels. Each sequence is with duration of 20 seconds. In total, 57 subjects freely viewed all sequences in the experiment. It is worth mentioning that subjects were seated in a swivel chair when they viewed the 360° videos/images. For both datasets, an HMD-embedded eye-tracker from SMI is utilized, so that not only the HM data but also the EM data of subjects were collected for the Salient360 datasets.

VR-EyeTracking [27] is a large-scale dataset for 360° video, containing both the HM and EM data of subjects. In total, 208 video sequences are included, with diverse content of indoor scenes, outdoor activities, music shows, sports games, documentation, short movies, etc. For each sequence, the resolution is at least 4K, and the duration ranges from 20 to 60 seconds. The HM and EM data were captured by the HMD of HTC Vive embedded with the eye-tracker of aGlass. In the experiment, 45 subjects were recruited to view the video sequences. The sequences were divided into 6 groups, and subjects freely viewed one group each time. The re-calibration of the eye-tracker was implemented in the interval between two groups. Each sequence was viewed by at least 31 subjects. Finally, the VR-EyeTracking dataset was established.

Based on the existing datasets, there are several works analyzing human attention on 360° video/image from the following aspects. Note that the dataset analysis can be further used to facilitate visual attention modelling.

1) Consistency among subjects. It is essential to analyze the consistency of visual attention among different subjects, when viewing 360° video/image. To this end, human attention is modelled in the form of 2D saliency maps, which are the heat maps of HM/EM fixations, projected from the sphere of 360° video/image to the 2D plane. Then, the similarity of saliency maps generated from different subjects indicates the consistency of visual attention among subjects. In [21], Sitzmann et al. compared the saliency maps of 360° images between each subject and the remaining subjects through the receiver operating characteristic (ROC) curve. The fast convergence of the ROC curve to the maximum rate of 1 indicates high consistency among subjects. Quantitatively, 70% of EM fixations fall within the 20% most salient regions in the analysis of [21]. For 360° video, Xu et al. [24] proposed calculating Pearson's linear correlation coefficient (PLCC) of the HM saliency maps between two groups of subjects, each of which includes half of the subjects. The PLCC result was reported to be 0.83 [24], which is significantly higher than the random baseline. In summary, the consistency of human attention on 360° video/image is high across subjects.

2) Equator and center bias. When viewing 2D video and image, humans prefer looking at the center of the video/image. In other words, there exists a center bias for the EM data on 2D video/image. Similarly, such statistical bias also holds for the HM/EM fixations on 360° video/image. For 360° image, it is found in [21] and [30] that subjects tend to view the regions near the equator more frequently, called the equator bias. The equator bias also exists for human attention on 360° video. In addition, it has been investigated in the latest works of [24] and [23] that subjects watch the front region of 360° video much more frequently. Therefore, human attention on 360° video is biased toward both the equator and the front region, which is different from 360° image. The front bias and the equator bias can be used as prior knowledge in the visual attention models of 360° video/image [24], [29]–[35], [39], [40] to improve their prediction accuracy.

3) Impact of content on attention. It has been illustrated in [41] that, in addition to the statistical bias, human attention is highly correlated with the content of 360° video. For example, the salient objects of 360° video have potential in attracting human attention. This was quantitatively verified in [21]. It has been further found in [21] that salient objects attract more attention when their number is less or their locations are close. In summary, the content of 360° video/image has a great impact on attention of subjects.

4) Relationship between HM and EM. For 360° video/image, HM reflects the position of the subject's viewport, while EM indicates where the subject fixates on. Thus, the distributions of HM and EM may not be identical. Rai et al. [42] found a volcano-like distribution of EM within the viewport on 360° image, showing the difference between the distribution of HM (center of the viewport) and EM. They also statistically evaluated the difference between saliency maps generated from HM and EM for 360° image. The quantitative results show that the distribution of HM is approximate to but still different from that of EM. As a result, several works developed one model [30], [33], [35] to predict both HM and EM saliency maps, but more works focused on either HM prediction [13], [24], [32], [40], [43] or EM prediction [25], [29], [31], [34], [36]–[39], or they proposed different models to predict HM and EM saliency maps, respectively. More details are discussed in the following.

B. Attention modelling approaches

In this section, we review the attention modelling works for 360° video/image. They mainly refer to predicting the HM/EM saliency maps of 360° video/image, which model the distribution of viewports/RoIs of multiple subjects. Some other attention modelling works are not discussed in this paper, e.g., scanpath prediction [44], [45] or HM/EM prediction of an individual subject [24], [27], [43], [46].
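The ground-truth HM/EM saliency maps targeted by these approaches, and the inter-group PLCC consistency check discussed in Section II-A, can both be computed from recorded fixations. The following is a simplified sketch (assuming SciPy is available); it blurs fixations directly on the ERP plane, whereas a rigorous version would blur on the sphere, and the blur width is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import pearsonr

def fixation_map_erp(fixations, width, height, sigma_px=30.0):
    """Accumulate (row, col) fixations on the ERP plane and blur them into a heat map."""
    heat = np.zeros((height, width), dtype=np.float64)
    for row, col in fixations:
        heat[row, col] += 1.0
    heat = gaussian_filter(heat, sigma=sigma_px)   # planar blur: a simplification of spherical blurring
    return heat / (heat.sum() + 1e-12)             # normalize to a probability map

def group_consistency_plcc(fix_group_a, fix_group_b, width, height):
    """PLCC between the saliency maps of two halves of the subjects (in the spirit of [24])."""
    map_a = fixation_map_erp(fix_group_a, width, height)
    map_b = fixation_map_erp(fix_group_b, width, height)
    plcc, _ = pearsonr(map_a.ravel(), map_b.ravel())
    return plcc
```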

[Fig. 1 block diagram: a 360° image/video is pre-processed by projection (e.g., ERP or CMP) or viewport extraction; handcrafted low-level features (e.g., edge) or high-level features (e.g., face, vehicle), or 2D saliency predictors (e.g., SALICON, SalGAN), are then applied; post-processing (e.g., center bias or equator bias) yields the final saliency map.]
Fig. 1. General framework of heuristic saliency approaches on 360° video/image.

The saliency prediction works can be roughly categorized into heuristic approaches and data-driven approaches.

1) Heuristic approaches: The heuristic approaches utilize handcrafted features for modelling attention on 360° video/image. Figure 1 shows a general framework of heuristic saliency approaches on 360° video/image. The heuristic approaches can be traced back to the year of 2008. In 2008, Bogdanova et al. [47], [48] proposed predicting saliency maps on 360° image through three stages: sphere processing, feature extraction and saliency map generation. First, [48] developed several data processing methods on the sphere, including filtering, Gaussian and Gabor pyramids, up-sampling and normalization on the sphere. These processing methods constituted the foundation of the spherical visual attention models, and even inspired the recent study on the spherical convolutional network [49]. Second, [48] followed the multi-scale center-surround mechanism [50] of 2D image to extract the attention-related features, and adapted these features to the sphere of 360° image. That is, 7 features of 3 types were extracted: 1 intensity feature, 2 chromatic features and 4 local orientation features. Corresponding to the extracted features, 7 spherical conspicuity maps are obtained, and then 3 cue conspicuity maps of intensity, chroma and orientation were calculated from the spherical conspicuity maps. Finally, the saliency map was generated by fusing the 3 cue conspicuity maps together after normalization. Afterwards, Bogdanova et al. proposed a saliency prediction approach [51] for 360° video. In addition to the static saliency map [48], they further proposed calculating the motion saliency maps for 360° video. To be specific, [48] first proposed block matching and motion pyramid calculation methods on the sphere, and then yielded the spherical motion conspicuity maps in the motion magnitude and phase. These conspicuity maps are fused to obtain the motion saliency map of each 360° video frame. Finally, the dynamic saliency maps are obtained by fusing the static and motion saliency maps. Unfortunately, no practical HMD had been released at that time, and thus it was impossible to collect HM and EM data of subjects. As a result, quantitative evaluation was not included in [47], [48], [51].

Recently, HM and EM data can be easily collected, and thus there have emerged many saliency prediction approaches for 360° video/image. Since saliency prediction approaches have been highly developed for 2D image and video [52], it is intuitive to adapt 2D saliency approaches for 360° video/image via making some modification. However, the adaption of 2D saliency approaches to 360° video/image may suffer from geometric and border artifacts, caused by sphere-to-plane projections. Figure 2 illustrates several common projections. Therefore, several 360° video/image saliency prediction works [13], [29], [30], [53] strove to solve these issues.

[Fig. 2 illustration: equirectangular projection (ERP), cubemap projection (CMP), truncated square pyramid projection (TSP) and Craster parabolic projection (CPP).]
Fig. 2. Examples of several types of projection for 360° video/image.
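All of the projections in Fig. 2 start from the sphere parameterized by longitude and latitude. For reference, a minimal sketch of the ERP pixel-to-sphere mapping used implicitly throughout this section (the pixel-center convention is an assumption):

```python
import numpy as np

def erp_pixel_to_sphere(row, col, width, height):
    """ERP pixel -> (longitude theta, latitude phi) in radians.

    theta in [-pi, pi), phi in [-pi/2, pi/2]; pixel centers are sampled at +0.5.
    """
    theta = ((col + 0.5) / width) * 2.0 * np.pi - np.pi
    phi = np.pi / 2.0 - ((row + 0.5) / height) * np.pi
    return theta, phi

def sphere_to_unit_vector(theta, phi):
    """(theta, phi) -> 3D unit vector (x, y, z), with z pointing to the north pole."""
    x = np.cos(phi) * np.cos(theta)
    y = np.cos(phi) * np.sin(theta)
    z = np.sin(phi)
    return np.array([x, y, z])
```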
Abreu et al. [13] proposed a fused saliency map (FSM) approach for HM saliency prediction on 360° image, in which they directly applied SALICON [54] (a 2D image saliency prediction approach) on 360° image under the sphere-to-plane projection of equirectangular projection (ERP). Specifically, the original 360° image is rotated by several different angles and then projected into several 2D images after the ERP mapping, as the variants of the original image. As such, the border artifacts in the two vertical sides of the original image are reduced in the variants. Then, SALICON is conducted to obtain the saliency maps of all 2D images, which are fused together to generate the final saliency map of the 360° image. On the basis of the FSM approach, Lebreton et al. [30] adapted the Boolean map based saliency (BMS) [55] approach for 360° image as the BMS360 approach. A cosine weight is further applied, considering the stretching effect near the horizontal borders of the original 360° image.

Maugey et al. [53] proposed applying the 2D saliency prediction approach on the faces of 360° image under the sphere-to-plane projection of cubemap projection (CMP). CMP has less geometric distortion than ERP, leading to more accurate saliency prediction in 360° image. In [53], the double cube projection is utilized for mapping the spherical image onto two cubes, with the face center of one cube pointing towards the corners of the other cube to minimize the overlap between the two CMP-mapped images. Then, the saliency maps of the faces in the two CMP-mapped images can be obtained by applying traditional 2D saliency prediction approaches. The saliency maps are weighted with the following weights:

w_{\mathrm{face}}(i_{\mathrm{face}}, j_{\mathrm{face}}) = \frac{1}{\left(1 + \left(\frac{\max(i_{\mathrm{face}}, j_{\mathrm{face}})}{0.3\,L_{\mathrm{face}}}\right)^{10}\right)^{2}},   (1)

to reduce border artifacts. In the above equation, (i_face, j_face) is the pixel coordinate with the origin at the center of the cube face; L_face is the width of the cube face. The final saliency map can be generated by merging the weighted saliency maps.

In addition to [13], [30], [53] that used only one type of projection, Startsev et al. [29] proposed combining the projections of ERP and CMP for better reducing the impact of border artifacts on EM saliency prediction of 360° image. In order to reduce the artifacts in the two vertical sides of an ERP-mapped 360° image, a variant of the 360° image is generated by swapping the left and right halves of the image. Then, 2D saliency prediction approaches are applied on both the original image and its variant, such that 2 saliency maps can be obtained. In order to reduce the artifacts in the two horizontal sides, CMP is incorporated in [29]. Specifically, after projecting the original 360° image through CMP, 2D saliency prediction approaches are applied to obtain 2 saliency maps, corresponding to the top and bottom faces of the cube. The two cube faces are padded with adjacent faces to reduce the border artifacts in the faces. Finally, the 2 saliency maps from ERP and 2 saliency maps from CMP are fused together via pixel-wise maximum in ERP. The equator bias mentioned in Section II-A is also incorporated in [29] as a post-processing step.
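A short sketch of the face weighting in (1), under the assumption that i_face and j_face are taken as absolute offsets from the face center (the equation does not state this explicitly):

```python
import numpy as np

def face_weight(l_face):
    """Weight map of Eq. (1) for an l_face x l_face cube face (origin at the face center)."""
    coords = np.arange(l_face) - (l_face - 1) / 2.0        # signed pixel offsets from the center
    j, i = np.meshgrid(np.abs(coords), np.abs(coords))     # |i_face|, |j_face| (absolute-value reading)
    m = np.maximum(i, j)
    return 1.0 / (1.0 + (m / (0.3 * l_face)) ** 10) ** 2

# Example: weights stay close to 1 in the central ~60% of the face and fall off near the borders.
w = face_weight(256)
print(w[128, 128], w[128, 250])
```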
Given a spherical location, the viewport image corresponding to this location can be obtained with the sphere-to-plane gnomonic projection [56]. For 360° video/image, some other works [30], [31] utilized 2D saliency prediction approaches on the viewport images, rather than the projected 2D images. The advantage is that the extracted viewport images of 360° video/image have little geometric distortion. However, the difficulty is how to integrate the saliency maps of different viewports together for generating the final spherical saliency maps. Lebreton et al. [30] utilized the graph-based visual saliency (GBVS) [57] approach in the proposed GBVS360 approach. GBVS is applied on several extracted viewport images to avoid the geometric distortion of ERP when extracting Gabor-based features. Subsequently, the saliency maps of viewports are projected back to the ERP plane and then fused together for obtaining the final saliency maps of both HM and EM. Luz et al. proposed a saliency detection model (SDM) approach [31] for modelling EM on 360° image, in which the viewport images are extracted at spherical locations corresponding to all pixels. Then, the authors applied the pre-trained multi-level network (ML-Net) [58] on viewport images, obtaining viewport saliency maps. The viewport saliency maps are integrated together, weighted by the equator bias, yielding the final EM saliency map of 360° image.
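A compact sketch of gnomonic viewport extraction from an ERP frame, not the exact implementation of [56] or of the surveyed works; the field of view, viewport size and nearest-neighbour sampling are illustrative choices.

```python
import numpy as np

def extract_viewport(erp, center_theta, center_phi, fov_deg=90.0, size=256):
    """Sample a square viewport from an ERP image via gnomonic (rectilinear) projection.

    erp: H x W (x C) array; center_theta/center_phi: viewport center in radians.
    """
    h, w = erp.shape[:2]
    f = (size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)      # focal length in pixels
    xs = np.arange(size) - (size - 1) / 2.0
    u, v = np.meshgrid(xs, xs)                                # image-plane coordinates (v grows downward)
    rho = np.sqrt(u ** 2 + v ** 2) + 1e-12
    c = np.arctan2(rho, f)                                    # angular distance from the viewport center
    phi = np.arcsin(np.cos(c) * np.sin(center_phi)
                    - (v / rho) * np.sin(c) * np.cos(center_phi))
    theta = center_theta + np.arctan2(
        u * np.sin(c),
        rho * np.cos(center_phi) * np.cos(c) + v * np.sin(center_phi) * np.sin(c))
    # Map (theta, phi) back to ERP pixel indices (nearest neighbour).
    col = ((theta + np.pi) % (2 * np.pi)) / (2 * np.pi) * w
    row = (np.pi / 2.0 - phi) / np.pi * h
    return erp[np.clip(row.astype(int), 0, h - 1), np.clip(col.astype(int), 0, w - 1)]
```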
Apart from applying 2D saliency prediction approaches on 360° video/image, some works [32]–[34] have been proposed to extract handcrafted features for saliency prediction of 360° video/image. Specifically, Battisti et al. [32] proposed an approach (named RM3) for HM saliency prediction on 360° image. Both low- and high-level feature maps are obtained based on the viewport images. In [32], the low-level maps of viewport images are based on the integration of some low-level features: hue, saturation and GBVS features. For high-level features, skin [59] and faces [60] are detected in each viewport for generating the high-level feature maps. Finally, the low- and high-level maps are integrated to obtain the final saliency map. Zhu et al. [33] proposed an approach for 360° image HM and EM saliency prediction, in which low- and high-level feature maps are also extracted from viewport images. Low-level features refer to orientations, frequency bandwidths, color channels and symmetrical balance, while the high-level features include cars and people detected in the viewport images. Similar to [32], [33] produces the final saliency map of 360° image through the integration of both low- and high-level features. Different from [32], [33], Fang et al. [34] proposed extracting handcrafted low-level features of the whole 360° image for EM saliency prediction. Specifically, the input 360° image is first segmented into super-pixels [61] at different levels. Then, the low-level features of luminance, texture, color and boundary connectivity are extracted on the super-pixels, which are used to generate the final saliency maps.

2) Data-driven approaches: The visual attention models for 360° video/image can also be learned from training data, benefiting from the existing datasets. Along with the established datasets, several saliency prediction approaches have been proposed for predicting saliency maps of either 360° image [35]–[38], [43] or 360° video [20], [24], [25], [39], [40].

For 360° image, Ling et al. proposed the color dictionary based sparse representation (CDSR) approach [35] for HM and EM saliency prediction. First, an over-complete color dictionary is trained over the 2D images of MIT1003 [62]. In the test stage, the input 360° image is first divided into sub-images. Subsequently, non-overlapping patches are extracted from the sub-images, and the sparse representations of the extracted patches are obtained with the trained color dictionary. For each patch, the center-surround differences between itself and other patches are calculated via the l2-norm of the differences between their sparse representations. The patches are further weighted by human visual acuity to obtain their saliency maps. The saliency maps of all patches are finally fused together, and then combined with an equator bias map to obtain the overall saliency map of the input 360° image.

In the new era of deep learning [63], there have emerged many effective architectures of deep neural networks (DNNs), such as the convolutional neural network (CNN), generative adversarial network (GAN), and long short-term memory (LSTM) network. The DNN-based approaches also achieve great success in saliency prediction for 2D image and video [64]. Such great success can also be brought to advance saliency prediction for 360° video/image. Assens et al. proposed a CNN-based approach (called SaltiNet) [36] with the same architecture as a CNN-based 2D image saliency prediction approach (called SalNet) [65], except the last layer, for EM saliency prediction on 360° image. The SaltiNet model is initialized with the pre-trained parameters of SalNet [65] and then trained over the Salient360 dataset. For training the SaltiNet model, the binary cross entropy (BCE) loss is applied, which is the same as that in 2D saliency prediction [66].

Monroy et al. proposed the SalNet360 [37] approach, applying CNN on the cube faces of 360° image under CMP. In [37], the first stage has the same architecture as SalNet, extracting the 2D saliency maps of the cube faces. Then, the spherical coordinates of the cube faces are concatenated with the extracted saliency maps, flowing into the second stage of SalNet360 with a fully convolutional network (FCN) architecture. The outputs of SalNet360 are converted back to the ERP-projected plane and fused together for obtaining the final saliency map. The first stage of SalNet360 is trained on SALICON [54], while the second stage is trained on Salient360 with randomly selected viewports, both with the Euclidean loss function.

In addition to CNN, Chao et al. [38] proposed applying GAN on cube faces. Different from [37], Chao et al. [38] proposed generating several 2D images from an input 360° image by rotating the cubes at different CMPs. In [38], the Saliency GAN (SalGAN) [66] model is fine-tuned over Salient360, such that [38] is named SalGAN360. Since Kullback-Leibler divergence (KLD), PLCC and normalized scanpath saliency (NSS) are widely applied in the evaluation of saliency prediction, these metrics are combined together in the loss function of the generator in SalGAN360. Finally, the saliency maps of 360° images under different CMPs are all converted back to the ERP plane and then fused together for producing the final saliency map of the input 360° image.

Suzuki et al. [39] proposed applying CNN on the extracted viewports of 360° image, rather than the cube faces of [37], [38]. Different from the previous works, [39] proposed to learn the equator bias with a layer in its CNN architecture. Finally, the saliency maps of viewports are fused together and masked with the learned equator bias, for generating the final saliency map of 360° image.
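Several of the above approaches, whether learning the equator bias [39] or imposing it as a prior [31], [35], effectively modulate an ERP saliency map by a latitude-dependent mask. A minimal hand-crafted sketch is given below; the Gaussian shape and its width are illustrative assumptions, since the actual bias is either fitted to data or learned.

```python
import numpy as np

def equator_bias_map(height, width, sigma_deg=20.0):
    """Latitude-only prior: a Gaussian centred on the equator, constant along longitude."""
    lat = 90.0 - (np.arange(height) + 0.5) / height * 180.0   # latitude of each ERP row (degrees)
    prior_row = np.exp(-0.5 * (lat / sigma_deg) ** 2)
    return np.tile(prior_row[:, None], (1, width))

def apply_equator_bias(saliency, sigma_deg=20.0):
    """Modulate a predicted ERP saliency map by the equator prior and renormalize."""
    prior = equator_bias_map(*saliency.shape, sigma_deg=sigma_deg)
    out = saliency * prior
    return out / (out.sum() + 1e-12)
```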
For 360° video, Nguyen et al. [43] applied a 2D image saliency prediction approach (i.e., SalNet) in their PanoSalNet approach [43] to predict the HM saliency map of each frame. However, the PanoSalNet model does not take into account the temporal features in saliency prediction. The model of PanoSalNet is fine-tuned over the 360° video datasets of Corbillon et al. [16] and Wu et al. [18], such that the PanoSalNet approach can be used for 360° video saliency prediction. In contrast to [43], other advanced approaches [20], [24], [25], [40] consider the temporal features of 360° video in HM/EM saliency prediction.

Specifically, Lebreton et al. proposed extending BMS360 to V-BMS360 [40], which extracts two motion-based features for 360° video, both based on optical flow. One motion-based feature is motion source, while the other one is called motion surroundness. The motion source feature is extracted with random walkers [67] moving in the opposite direction of the motion vector (MV) at the place where they are located. For extracting the motion surroundness feature, the optical flow image is normalized to reduce the impact of non-uniform sampling alongside latitude in ERP. Then, BMS360 is applied to the normalized optical flow image for extracting the motion surroundness feature. For adaptively fusing the two motion-based features, a CNN is trained to classify the motion type of the video. In post-processing, the authors proposed cutting off the saliency values located near the two vertical sides of the 360° video, which is similar to the center bias of 360° video discussed in Section II-A.

Cheng et al. [20] proposed a DNN-based saliency prediction approach for 360° video, consisting of both static and temporal models. The static model predicts the saliency map for each single frame, while the temporal model adjusts the outputs of the static model based on temporal features. The static model predicts the saliency maps on the cube faces of the input 360° video under CMP. However, cube padding, instead of zero padding, is applied, despite the fact that zero padding is generally adopted in CNN. In cube padding, the cube faces are padded with the content of adjacent cube faces, in order to reduce the border artifacts and preserve the receptive field of neurons across 360° content. The temporal model utilizes the network of convolutional LSTM (ConvLSTM), since 360° video can be regarded as a sequence of 360° frames. The DNN model of [20] is trained with 3 proposed temporal loss functions, considering the invariance, smoothness and motion in the spatial and temporal neighbourhood.

The aforementioned approaches directly apply planar CNN on saliency prediction for 360° video/image. In contrast, Zhang et al. [25] proposed a new type of spherical CNN

TABLE II SUMMARY OF EM/HM SALIENCY PREDICTION APPROACHES FOR 360° VIDEO/IMAGE.

Approach Image/Video HM/EM Bias Metrics Benchmark Availability Image [47], [48] — — — — No Bogdanova et al. Video [51] — — — — No FSM [13] Image HM — ROC, AUC-Borji Abreu et al. [13] No GBVS360, BMS360, ProSal [30] Image HM & EM Equator KLD, PLCC, NSS, AUC-Judd Salient360 [10] Yes Maugey et al. [53] Image — — — — Yes Heuristic Startsev et al. [29] Image EM Equator KLD, PLCC, NSS, AUC Salient360 [10] Yes SDM [31] Image EM Equator RMSE Salient360 [10] No RM3 [32] Image HM Equator KLD, PLCC Salient360 [10] No Zhu et al. [33] Image HM & EM Equator KLD, PLCC, NSS, AUC-Judd Salient360 [10] No Fang et al. [34] Image EM Equator KLD, PLCC Salient360 [10] No CDSR [35] Image HM & EM Equator KLD, PLCC, NSS, ROC Salient360 [10] No SaltiNet [36] Image EM — KLD, PLCC, NSS, ROC Salient360 [10] Yes SalNet360 [37] Image EM — KLD, PLCC, NSS, AUC Salient360 [10] Yes SalGAN360 [38] Image EM — KLD, PLCC, NSS, AUC Salient360 [10] Yes Suzuki et al. [39] Image EM Equator KLD, PLCC, NSS, AUC Salient360 [10] No Data-driven PanoSalNet [43] Video HM — PLCC, NSS, sAUC Corbillon et al. [16] and Wu et al. [18] Yes V-BMS360 [40] Video HM Center KLD, PLCC, NSS, AUC-Judd Salient360 [23] Yes Cheng et al. [20] Video — — PLCC, AUC-Judd, AUC-Borji Wild-360 [20] Yes Spherical U-Net [25] Video EM — PLCC, NSS, AUC-judd Zhang et al. [25] Yes DHP [24] Video HM Center PLCC, NSS, sAUC PVS-HM [24] Yes for 360° content. In the spherical CNN of [25], the kernel is there are several works evaluating their approaches on the defined on a spherical crown, and the corresponds Salient360 dataset, making these approaches comparable. Oth- to the rotation of the kernel on sphere. The spherical CNN ers evaluated their approaches on their own datasets. Table [25] is implemented under the plane of ERP, and the kernel II summarizes the above reviewed approaches for saliency is re-sampled to this plane. Based on the spherical CNN, a prediction on 360° video/image. spherical U-Net was proposed for EM saliency prediction on 360° video, and its inputs include one frame and the predicted III.VISUAL QUALITY ASSESSMENT saliency maps of several previous frames for better modelling There are two types of VQA, i.e., subjective and objective dynamic saliency. Based on traditional mean squared error VQA. In subjective VQA, subjects are required to rate scores (MSE) loss, the authors also defined a spherical MSE (S- for the viewed images or videos, so that their subjective quality MSE) loss function for training the spherical U-Net to reduce scores can be obtained. Since human is the ultimate receiver of the non-uniform sampling of ERP. Assume that one predicted video/image, the subjective quality scores can be seen as the ˆ saliency map is S and its ground-truth is S, which are defined ground truth quantitative representation of the visual quality. on the unit sphere with longitude and latitude of (θ, φ). Then, For objective VQA, many approaches have been proposed S-MSE can be calculated by to model the subjective quality scores of video/image. As Θ,Φ X Ω(θ, φ)  2 mentioned in Section II, the viewing mechanism is different L = S(θ, φ) − Sˆ(θ, φ) , (2) S-MSE 4π between 2D and 360° video/image, leading to different ex- θ=0,φ=0 perience. Therefore, the subjective VQA experiment methods where Ω(θ, φ) denotes the solid angle of the sampled area on and objective VQA approaches for 2D video/image are not the corresponding saliency map located at (θ, φ) under ERP. appropriate for 360° video/image. In this section, we survey Xu et al. 
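The spherical MSE in (2) weights the per-pixel squared error by the solid angle of the corresponding ERP cell, so that the over-sampled polar regions do not dominate the loss. A numpy sketch of that weighting (the original is a training loss inside the spherical U-Net, so this is only an illustration of the formula):

```python
import numpy as np

def spherical_mse(pred, gt):
    """Spherical MSE on ERP maps: squared error weighted by each cell's solid angle / (4*pi).

    pred, gt: H x W saliency maps defined on the ERP plane.
    """
    h, w = gt.shape
    lat = (np.pi / 2.0) - (np.arange(h) + 0.5) / h * np.pi   # latitude of each ERP row
    d_theta, d_phi = 2.0 * np.pi / w, np.pi / h
    omega = np.cos(lat)[:, None] * d_theta * d_phi           # solid angle per ERP cell
    return np.sum(omega / (4.0 * np.pi) * (pred - gt) ** 2)
```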
[24] proposed an HM saliency prediction approach on the works about VQA on 360° video/image, from the for 360° video, which utilizes deep reinforcement learning aspects of subjective and objective VQA. The subjective VQA (DRL), instead of supervised learning in the traditional works. can be directly used to obtain the subjective scores, and thus With the proposed DRL-based HM prediction (DHP) ap- it is normally used to establish the dataset for training and proaches, an agent is trained with the accumulated reward evaluating the objective VQA approaches. In the following, on its actions of HM on 360° video, so that the agent can we introduce subjective VQA methods and datasets together. mimic the long-term HM behavior of one subject. Note that the reward in the training stage is designed to measure the A. Subjective VQA and datasets difference between the predicted actions of the agent and the Considering the unique viewing mechanism of 360° actual HM of the subject. Then, a new framework of multi- video/image, there are several works designing dedicated workflow DRL was developed, in which each DRL workflow subjective VQA methods, mostly modified from those for 2D predicts the HM data of one subject, and the multiple work- content [6]. We survey them from the following aspects. flows yield the HM data of several subjects. Consequently, the 1) Test material display: For 2D video/image, the test saliency maps of 360° video can be obtained by convoluting materials are displayed by flat monitors. However, 360° the HM data of all DRL workflows. video/image should not be directly displayed by flat monitors The evaluation metrics for EM/HM saliency prediction under projection, because of the geometric distortion (e.g., on 360° video/image are the same as those of 2D saliency ERP) and spatial disorganization (e.g., CMP). To display prediction, including PLCC, NSS, KLD, area under ROC 360° video/image with flat monitors, Zakharchenko et al. [68] curve (AUC) etc. Thanks to the Salient360 grand challenges1, proposed rendering random viewports for 360° video/image 1https://salient360.ls2n.fr/ as the test materials, and different subjects view the same JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 8 materials in the experiments. In contrast, Boyce et al. [69] and baseline for comparison. The subject is allowed to view test Hanhart et al. [70] proposed to define certain static viewports sequences any times in any order. After the subject finishing for different projection types, and these viewports are rendered rating scores for all test sequences in the group, the experiment as the test materials. This is mainly for checking discontinuous moves on the next group. As a result, the sequences of different edges caused by projection. quality levels with tiny difference can be distinguished. Since 360° video/image is mostly viewed in the HMD, it is For the rating scale, either discrete or continuous quality intuitive to display test materials with the HMD in subjective scale is adopted in the above works, following the definition experiments for providing the identical environment as far as for 2D video/image [4]–[6]. possible. However, the difficulty is how to record the data with- 3) Subjective score processing: Since raw quality scores out making subjects take off the HMD during the experiment. rated by subjects are noisy and biased, post-processing is In some works [71]–[75], the subjective data were recorded needed to obtain the final quality scores for test materials. 
manually by an assistant during the experiment. In contrast, A majority of works for subjective VQA on 360° video/image Upenik et al. [8] proposed a testbed for subjective experiment simply adopted the widely used processed scores in 2D of 360° video/image using the HMD, in which a software was video/image, i.e., mean opinion score (MOS) [83] and dif- developed with the functions of 360° video/image playback, ferential MOS (DMOS) [7]. In contrast, Xu et al. [9], [41] subjective score rating, experiment instruction and HM data proposed calculating overall DMOS (O-DMOS) and vector- collection. Similarly, there are many other works [28], [41], ized DMOS (V-DMOS) for 360° video. The calculation of [76]–[78] that developed various software solving the prob- O-DMOS is identical to that of DMOS for 2D video/image, lems of data collection and process control. Different from the which indicates the overall quality of each test sequence. In subjective experiments with flat monitors, there is little restrict the calculation of V-DMOS, the HM data of subjects are also on the conditions of experiment with the HMD, since the used, which reflect the quality of different regions of 360° environment factors, such as room illumination and distance sequences. Specifically, the frequency ratio that subject m r to the screen, have no impact on viewing 360° video/image views region r in sequence n is calculated, denoted as fmn. r with the HMD. However, free viewing needs to be allowed for When fmn > f0, where f0 is a threshold, subject m is added r r subjects, and thus it is proposed in these works that subjects to collection Mn. For region r that Mn 6= ∅, assuming the stand [71] or sit on a swivel chair [8], [9], [28], [41], [72], re-scaled Z-score [7] of subject m rates for sequence n is zmn, [74]–[79] during the subjective experiment. the DMOS value in sequence n can be obtained by 2) Test method: There are 3 key factors defined in the test 1 X method: session setting, test material arrangement and rating r DMOSj = r zmn, (3) Mn r scale. Since 360° video/image is a new type of multimedia m∈Mn content, it is universally acknowledged that a training session where M r is the size of Mr . Finally, the vector of V-DMOS is indispensable before the test session in the subjective n n for sequence n can be represented by experiment, for making subjects familiar with 360° content and even the HMD. In [73], there is an extra pre-test session 1 r R [O-DMOSn DMOSn ··· DMOSn ··· DMOSn ], (4) between the training and test sessions, in order to guarantee the robustness of the rated quality scores. where R is the total number of regions in 360° sequences. For the test material arrangement, the majority of works di- 4) Datasets: Based on the subjective experiments, there rectly applied those designed for 2D video/image. Specifically, are many datasets on 360° video/image for VQA, which many works [8], [9], [28], [41], [71], [72], [74], [76]–[82] include 360° images or video sequence under different types adopted the absolute category rating (ACR) and ACR with hid- of distortion with the corresponding subjective quality scores. den reference (ACR-HR) methods [5], which are also known Table III summarizes several 360° video/image datasets for as the single-stimulus (SS) methods [6]. In these methods, only VQA. Among these datasets, the dataset proposed by Upenik one test material is presented at a time and the test materials et al. 
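The region-wise DMOS of (3) averages the re-scaled Z-scores over only those subjects whose viewing-frequency ratio for a region exceeds the threshold f0. The sketch below is one possible reading of that definition; the threshold value and the NaN convention for unviewed regions are my assumptions, not values from [9], [41].

```python
import numpy as np

def region_dmos(z_scores, view_freq, f0=0.15):
    """Region-wise DMOS in the spirit of Eq. (3).

    z_scores:  (M,) re-scaled Z-scores of the M subjects for one sequence.
    view_freq: (M, R) frequency ratio with which each subject viewed each region.
    Returns R region DMOS values (NaN where no subject exceeded the threshold f0).
    """
    z_scores = np.asarray(z_scores, dtype=float)
    mask = np.asarray(view_freq) > f0                     # subjects counted for each region
    counts = mask.sum(axis=0)
    sums = (mask * z_scores[:, None]).sum(axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```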
[8] in 2016 is one of the earliest VQA dataset for 360° are presented only once in a random order. As such, subjects image. It contains 6 reference images and 54 impaired images. can rate for each test material without direct comparison. In this dataset, two types of distortion are considered, i.e., the Singla et al. [75] proposed a modified ACR (M-ACR) method, compression distortion of JPEG and projection distortion of in which each test material is presented twice before rating. ERP and CMP. A total of 48 subjects participated the sub- In addition, Zhang et al. [73] proposed a method for fine- jective experiment, who were equally divided into 2 groups. grained subjective VQA on 360° video modified from the One group of subjects viewed sequences under ERP, while subjective assessment of multimedia video quality (SAMVIQ) the other group of subjects viewed sequences under CMP. In [4] method for 2D video, which is called subjective assessment addition to the MOS values, the HM data of subjects during of multimedia panoramic video quality (SAMPVIQ). In SAM- the experiment are also included. PVIQ, impaired sequences with the same content are grouped Zhang et al. [73] proposed a large-scale dataset for 360° in a scene, which are then divided into several non-overlapping video, in which 16 reference sequences and 384 impaired groups. The experiment is implemented group-by-group and sequences are included. In this dataset, multiple distortion scene-by-scene. For the assessment on one group, both explicit types are considered: compression distortion of VP9, H.264 and hidden reference sequences are included for collecting the and H.265, Gaussian noise and box blur. For each distortion quality score of the reference sequence but still providing a type, there are different levels covering a large quality span, JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 9

TABLE III SUMMARY OF 360° IMAGE OR VIDEO DATASETS FOR VQA.

Dataset Image/Video Subjects Dataset Size * Resolution Duration # Distortion Description Availability HMD: MergeVR + iPhone 6S. The ACR-HR method with 5 quality levels 3000x1500 JPEG with different qual- was adopted in the experiment. The subjects were equally divided into 2 48 (ERP) and 30 s Upenik et al. 2016 [8] Image 6/60 ity factors (QFs); different groups. One group of subjects viewed sequences under ERP, while the N/A Ages: 19-36 2250x1500 in average projection other group of subjects viewed sequences under CMP. MOS values and (CMP) HM data of subjects are also included. JPEG and JPEG 2000 with 45 3000×1500 different QFs; H.265 with The testbed and condition is the same as [8]. MOS values and HM data Upenik et al. 2017 [79] Image 4/104 — N/A Ages: 18-32 pixels different bitrate; different of subjects are also included. projection JPEG with different QFs; 4096×2048 H.264 and H.265 with dif- HMD: HTC Vive. The SS method with 10 quality levels was adopted in CVIQD [76] Image 20 5/170 — N/A pixels ferent quantization param- the experiment. MOS values are also calculated. eters (QPs) JPEG with different QFs; 20 4096×2048 HMD: HTC Vive. The SS method with 10 quality levels was adopted in CVIQD2018 [77] Image 16/544 — H.264 and H.265 with dif- N/A Ages: 21-25 pixels the experiment. MOS values are also calculated. ferent QPs http://vision. nju.edu.cn/ HMD: HTC Vive. The ACR method with continuous quality scale was 98 9104×4552 JPEG with different QFs; index.php/ Huang et al. [71] Image 12/156 20 s adopted in the experiment. Each subject viewed 36 images and each Ages: 18-25 pixels different resolution data-base/ sequence was viewed by at least 16 subjects. item/ 64-im-images 11332×5666 JPEG and JPEG2000 with HMD: HTC Vive. Eye-tracker: aGlass. The SS method with 5 quality to different QFs; different OIQA [78] Image 20 16/336 20 s levels and explicit references was adopted in the experiment. MOS values, N/A 13320×6660 levels of HM and EM data of subjects are also included. pixels and white Gaussian noise MPEG-4 with different bi- 4096×2048 HMD: HTC Vive. The SS method with 5 quality levels and explicit IVQAD2017 [72] Video 13 10/160 15 s trate; different resolution; N/A pixels references was adopted in the experiment. MOS values are also calculated. different HMD: HTC Vive. The ACR-HR method with 5 quality levels was adopted 30 3600×1800 Y.Zhang et al. [81] Video 10/60 10 s H.265 with different QPs in the experiment. DMOS values and HM data of subjects are also N/A Ages: 20-26 pixels included. VP9, H.264 and H.265 23 4096×2048 with different bitrate; dif- HMD: HTC Vive. The SAMPVIQ method proposed in this paper was B.Zhang et al. [73] Video 16/400 — Upon request Averate age: 22.3 pixels ferent levels of Gaussian adopted in the experiment. MOS and DMOS values are also calculated. noise and box blur 8192×4096 H.265 with different QPs; HMD: Oculus Rift. The ACR-HR and ACR methods with 5 quality 37 and Lopes et al. [74] Video 6/85 10 s different resolution and levels were adopted in the experiment. MOS and DMOS values are also Upon request Ages: 22-55 3840×1920 frame rate calculated. pixels HMD: Oculus Rift CV1. The M-ACR method with 5 quality levels 30 1080P and Singla et al. [75] Video 6/66 10 s H.265 with different bitrate proposed in this paper was adopted in the experiment. MOS values and N/A Ages: 19-36 4K HM data of subjects are also included. https: HMD: HTC Vive. 
The SS method with continuous quality scale was 4096×2048 //github.com/ VR-VQA48 [9], [41] Video 48 12/48 12 s H.265 with different QPs adopted in the experiment. MOS values, DMOS values and HM data of pixels Archer-Tatsu/ subjects are also included. head-tracking HMD: Samsung Gear VR + Samsung Galaxy S6. The ACR method with 37 H.265 with different QPs; 5 quality levels was adopted in the experiment. 18 subjects viewed half of Tran et al. [82] Video 6/126 — 30 s N/A Ages: 20-37 different resolution the sequences, and the remained 19 subjects viewed the other half. MOS values are also calculated. HMD: HTC Vive. Eye-tracker: aGlass DKI. The SS method with contin- 3840×1920 uous quality scale was adopted in the experiment. The sequences were https: 221 to H.265 with different QPs; equally divided into 10 groups, and each subject only viewed 1 group //github.com/ VQA-ODV [28] Video 60/600 10-23 s Ages: 19-35 7680×3840 different projection of sequences (60 sequences in total with 6 reference sequences). Each Archer-Tatsu/ pixels sequence was viewed by at least 20 subjects. MOS and DMOS values, VQA-ODV HM and EM data of subjects are also included. * Data in this column are in the following format: number of reference images or video sequences/number of all images or video sequences. # For videos, this represents the temporal length of the videos themselves; For images, this represents the time each subject view each image in the experiment. which results in the large gap in the amount between the to directly apply 2D VQA approaches on 360° video/image, reference and impaired sequences. since 360° images or video sequences are still encoded and VQA-ODV [28] is the largest VQA dataset in Table III transmitted in the 2D format under sphere-to-plane projection. for 360° video. In VQA-ODV, 540 impaired sequences from However, it has been verified that in existing projection types, 60 reference sequences are included, under the of the sampling density from sphere to plane is not uniform at H.265 at different quantization parameters (QPs) and different every pixel locations [84]. In this case, directly applying 2D projections. A total of 221 subjects participated in the subjec- VQA approaches brings bias on the contribution to the quality tive experiment. Therefore, each subject only viewed a subset score of distortion at different pixel locations. Moreover, since of 60 sequences and each sequence was viewed by at least VQA approaches predict the subjective quality scores rated by 20 subjects. In addition to the MOS and DMOS values for subjects, there have been some 2D VQA approaches introduc- subjective quality, the HM and EM data of subjects are also ing perception into VQA [85], [86]. As for 360° video, more included in VQA-ODV, enabling the study on the relationship than 30% of the content is not viewed by subjects [28], and between visual quality and attention of subjects. thus the distortion in the salient regions of 360° video/image Although there have existed several VQA datasets for 360° may contribute more to visual quality. Corresponding to the video/image, little of the datasets can be downloaded online. above two factors, VQA approaches for 360° video/image Therefore, more VQA datasets for 360° video/image openly can be roughly classified into two categories, one aiming available are urgently needed for evaluation and horizontal at solving non-uniform sampling and the other incorporating comparison of objective VQA approaches. human perception in VQA.

1) Sampling-related approaches: A majority of the B. Objective VQA approaches sampling-related VQA approaches for 360° video/image are In this section, we review the objective VQA approaches for based on the 2D approaches of peak to noise ratio 360° video/image. For conciseness, we refer to objective VQA (PSNR) and structural similarity (SSIM) [87]. Both PSNR approaches by mentioning VQA approaches. It is intuitive and SSIM quantify pixel-wise distortions, and then average JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 10 the quantified values over all pixel locations to obtain the al. proposed uniformly sampled spherical PSNR (USS-PSNR) objective VQA score for 360° image or video frame. For 360° [89], in which PSNR is calculated on the re-sampled 360° video, another averaging over all frames is also implemented. frames. The original 360° frames are re-sampled according Without loss of generality, both the calculation of PSNR and to the circumference of the latitude-wise slices on sphere. SSIM can be represented as the following equation ignoring Coincidentally, CPP derives from this re-sampling, such that temporal average: USS-PSNR is the same as CPP-PSNR.   Although these re-projection approaches completely solve 1 X the problem of non-uniform sampling density, the time com- Q = g D(i, j) , (5) N  plexity of these approaches is extremely high, since the pro- (i,j) cedure of projection is time-consuming [82]. Therefore, there where D(i, j) is the quantified distortion at the pixel location emerged other approaches incorporating weight allocation in of (i, j), N is the number of pixels and g(·) is a linear the calculation of PSNR and SSIM to balance the contribution or non-linear function. In the calculation of PSNR, D(i, j) of distortion at different pixel locations. A universal formula- is the squared error between the reference and impaired tion of the approaches with weight allocation can be extended images/frames of I and I0: from (5): P w(i, j)D(i, j)! D = (I(i, j) − I0(i, j))2 (i,j) PSNR , (6) Q = g P , (8) (i,j) w(i, j) and a logarithmic function is then applied as w(i, j)  2  where is the allocated weight at pixel location of Imax (i, j). It is obvious that (5) is a special case of (8) when gPSNR(x) = 10 log10 , (7) x w(i, j) always equals to a constant, implying that all pixels 2 are considered equally important in the origin calculation of where Imax is the maximum possible pixel value of the video/image. For video/image with 8-bit precision, Imax = PSNR and SSIM. However, in 360° video/image, distortion at 255. As for SSIM, the distortion is measured in the win- pixel locations with large sampling density is over-weighted, dow located at each pixel location. The similarity index and vice versa. between windows located at the same location in reference To solve the overweight problem, Sun et al. [90] calculated and impaired image/frame is measured as the quantification the stretching ratio in continuous spatial domain for different of distortion, from the aspects of luminance, contrast and types of projection, and then the weight maps were derived for structure. The function g(·) in SSIM is taken as an identity these projections to balance the non-uniform sampling density. function, which means that quantified distortion at all pixel The weighted-to-spherically-uniform PSNR (WS-PSNR) is locations is directly summed up to be the objective quality proposed to calculated the weighted PSNR with these weight score of the video/image. maps. 
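S-PSNR evaluates the error only at a fixed set of (nearly) uniformly distributed points on the sphere, looked up on the ERP plane. The sketch below uses a Fibonacci lattice as a stand-in for the official point set (which this survey notes contains 655,362 points) and nearest-neighbour lookup, i.e., it approximates S-PSNR-NN rather than reproducing the reference implementation.

```python
import numpy as np

def fibonacci_sphere(n):
    """Roughly uniform points on the unit sphere (stand-in for the official S-PSNR point set)."""
    k = np.arange(n) + 0.5
    phi = np.arcsin(1.0 - 2.0 * k / n)                              # latitude
    theta = (np.pi * (1.0 + 5 ** 0.5) * k) % (2 * np.pi) - np.pi    # longitude (spiral increment)
    return theta, phi

def s_psnr_nn(ref, dist, n_points=655362, peak=255.0):
    """S-PSNR with nearest-neighbour lookup on ERP reference/distorted frames."""
    h, w = ref.shape[:2]
    theta, phi = fibonacci_sphere(n_points)
    col = np.clip(((theta + np.pi) / (2 * np.pi) * w).astype(int), 0, w - 1)
    row = np.clip(((np.pi / 2 - phi) / np.pi * h).astype(int), 0, h - 1)
    mse = np.mean((ref[row, col].astype(float) - dist[row, col].astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```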
Taking ERP as an example, the weight map in WS- For solving the non-uniform sampling density of projection, PSNR can be represented as one solution is to re-project the 360° video/image onto other wERP (i, j) = cos (φ(j)) , (9) domains with uniform sampling density. One of the earliest WS-PSNR works is the spherical PSNR (S-PSNR) proposed by Yu et al. where φ(j) is the corresponding latitude of the ordinate j [84], in which PSNR is calculated in the spherical domain. in the pixel coordinate of ERP. Xiu et al. [91] derived the Specifically, (i, j) in (5) is replaced with a set of uniformly weight map from the area that each pixel cover on the sphere, distributed points P on sphere, and N is replaced with the proposing an area weighted spherical PSNR (AW-SPSNR). number of these points NS-PSNR. For each point location p on In AW-SPSNR, distortion is allocated large weight at pixel sphere, its value is obtained at the corresponding pixel location locations covering large area on the sphere, and vice versa. on the original projection plane via the nearest neighbour The weight map in AW-SPSNR can be formulated as or bilinear interpolation methods, generating two variants of w (i, j) = cos (φ(i, j)) · |dθ| · |dφ| , (10) S-PSNR-NN and S-PSNR-I [69]. However, in the official AW-SPSNR (i,j) (i,j) implementation of S-PSNR, NS-PSNR = 655 362, which is far where θ(i, j) and φ(i, j) is the corresponding longitude and less than the number of pixels in typical 360° video/image latitude of pixel (i, j). For ERP, (10) is equivalent to (9). with resolution higher than 2K. Instead of calculating PSNR, Lopes et al. proposed calcu- Zakharchenko et al. proposed an approach calculating lating SSIM and multi-scale SSIM (MS-SSIM) [92] on the PSNR on the projection plane of Craster parabolic projection ERP for 360° video/image. The proposed weighted SSIM (W- (CPP) instead of sphere, which is called CPP-PSNR [68], [88]. SSIM) and weighted MS-SSIM (WMS-SSIM) are calculated An illustration of CPP is shown in Figure 2. By incorporating with weight allocation, in which the same weight map as CPP, the uniform sampling density is guaranteed, and the (9) is adopted. Similarly, Zhou et al. proposed weighted-to- resolution of 360° video/image under CPP is hardly degraded, spherically-uniform SSIM (WS-SSIM) [93], in which the same so that massive information loss is avoided. In CPP-PSNR, weight maps as those in WS-PSNR [90] are applied in the only at the pixel locations with actual pixel values is PSNR calculation of SSIM. However, the WS-SSIM, W-SSIM and calculated, i.e., the black regions near the borders are ignored. WMS-SSIM directly calculate the similarity between windows Correspondingly, N in (5) for CPP-PSNR represents the on the 2D projection plane, which is affected by the geometric number of the pixels with actual pixel values. Youvalari et distortion brought by projection. In contrast, Chen et al. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 11 proposed a spherical SSIM (S-SSIM) [94] approach, which video/image. Yang et al. [98] proposed extracting multi- calculates the similarity between windows on the sphere. To level perceptual features for VQA on 360° video, including calculate the similarity on windows at each pixel location, pixel-level [99] and super-pixel-level [100] saliency maps, small viewport images centered at the spherical location cor- semantic segmentation [101] and equator bias. 
After temporal responding to the pixel are extracted for both reference and transformation, linear regression (LR) and back propagation impaired 360° image/frame. Similarity is calculated between (BP) neural network are utilized to fuse the features together the two extracted viewport images. In S-SSIM, the weight for obtaining the quality score, namely the LR- and BP-based allocation method is the same as [90]. Therefore, both re- quality assessment of 360° videos in the VR system (LR- and projection and weight allocation are incorporated in S-SSIM. BP-QAVR). Huang et al. [71] found that the subjective visual 2) Perceptual approaches: For VQA on 2D video/image quality is related to the spatial resolution and quality factor with perception, an intuitive and simple way is to apply (QF) of JPEG compression of 360° image. Therefore, they weight allocation in existing VQA approaches, using predicted proposed modelling these two factors via the inverted falling saliency maps as the weight maps [95], [96]. This idea is exponential function [102] with modification. Given the spatial also applicable to VQA on 360° video/image, which is based resolution N and QF q, the quality score can be modelled by et al. on attention models discussed in Section II. Yu [84] N 0.7 q −a1( ) −b(N)( ) proposed two variants of their S-PSNR approach for 360° 1 − e Nmax 1 − e qmax Q(N, q) = Qmax , (11) video. Two statistical distributions of HM data of subjects 1 − e−a1 1 − e−b(N) are obtained along with only latitude and both longitude and where latitude. Then, latitude-wise and pixel-wise weight maps are generated from the two distributions for weight allocation in Nmax = 4096 × 2160, qmax = 100, (12) the calculation of S-PSNR, in the manner of (8). Similarly,  N  et al. b(N) = a2 + a3,Qmax = Q(Nmax, qmax). (13) Xu [41] proposed a non-content-based perceptual PSNR N (NCP-PSNR) approach for 360° video, in which distribution max of HM data along longitude and latitude is also obtained on the In the above equations, a1, a2, a3 are learnable parameters VR-HM48 dataset to generate a weight map of HM for weight optimized on the dataset of [71] with the least square error allocation. Considering that the distribution of HM actually (LSE) criterion. represents the distribution of viewport regions, the HM weight Given the extraordinary performance in feature extraction map is further convolved by the viewport region to generate of DNN, there have emerged some DNN-based approaches the non-content-based weight map, which is utilized to allocate proposed for VQA on 360° video/image, which also take hu- weight in PSNR to calculate the NCP-PSNR. man perception into account. Kim et al. [103], [104] proposed The above approaches utilized the statistical distribution a CNN-based approach with adversarial learning, named VR of HM as the perception cue in weight allocation. Although image quality assessment framework (DeepVR- this kind of distribution is universal, it fails to model at- IQA). There are a predictor and a guider module in DeepVR- tention of subjects well for specific 360° images and video IQA. In the predictor module, the input impaired 360° image sequences. Therefore, other approaches proposed predicting is cropped into patches, and the quality score for each patch attention of subjects and then generating dynamic weight maps is predicted via extracting a quality-related feature vector with for different 360° images or video sequences, for calculating a trained CNN. 
The weight for each patch is also predicted the weighted PSNR. Luz et al. proposed a saliency biased via training another CNN, by extracting the positional feature PSNR (SAL-PSNR) [31] for 360° image, also allocating vector for the patch and combining it with the extracted weights in the calculation of PSNR. The predicted saliency quality-related feature vector. The quality score of the input maps from their SDM approach are utilized as the weight 360° image is obtained by summing up the weighted quality maps, multiplied by a distortion factor map that represents scores of all patches. In the guider module, given the input of the geometric distortion of projection. Xu et al. [41] proposed reference and impaired 360° image as well as the predicted and a content-based perceptual PSNR (CP-PSNR) approach for ground-truth quality score, a CNN is trained to distinguish the 360° video, calculating weighted PSNR, too. In CP-PSNR, predicted quality score and the score rated by human. Through a random forest model is trained on the VR-HM48 dataset, adversarial learning, the guider module acts as the supervision to predict the most attracted viewport region for each 360° of the predictor module. Finally, the visual quality of 360° video frame, based on the previous viewport region. Then, image can be effectively rated by predicting the subjective the non-content-based weight map used in the NCP-PSNR is scores of human. masked by the predicted viewport region, as the content-based Li et al. [28] proposed a CNN model for VQA on 360° weight map for calculating weighted PSNR, called CP-PSNR. video, in which predicted HM and EM are embedded in In addition, visual attention spherical weighted PSNR (VASW- the CNN structure. Specifically, the HM and EM maps of PSNR) [97] directly uses the ground-truth saliency maps to the input 360° video sequence are predicted with the visual weigh pixel-wise distortion, while viewport PSNR (V-PSNR) attention models of DHP [24] and SalGAN [66]. Then, 360° [84] only calculates PSNR in the actual viewports of subjects. video is cropped into patches. Patches are sampled with the Apart from the approaches that utilize attention models corresponding predicted HM values, i.e., patches with low in calculating weighted PSNR, there are other approaches probability of being viewed are discarded. The quality score that directly extract perceptual features for VQA on 360° of each patch is predicted with a trained CNN. Then, these JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 12 quality scores are weighted with the corresponding predicted on projected 2D planes for 360° video/image. Therefore, many EM values of the patches. The weighted quality scores are approaches have been proposed to compress 360° video/image summed up to obtain the final quality score of the input 360° for a better trade-off between compression efficiency and video, followed by two fully-connected layers modelling a visual quality. non-linear function. In the following sections, we classify and review the 360° The above approaches are based on patches, which are gen- video/image compression approaches from the following as- erally adopted in DNN-based approaches for 2D video/image pects: motion estimation (ME) adaptation, sampling density [106]–[109]. Considering that subjects see viewports rather correction, re-projection and perceptual compression. than patches, Li et al. proposed a viewport-based CNN (V- CNN) approach [105] for VQA on 360° video. 
In addition to the main task of VQA, V-CNN has two stages and several A. Motion estimation adaptation auxiliary tasks. In the first stage, for the input 360° video ME and the subsequent (MC) are frame, several viewport locations with their corresponding important steps in video compression for reducing temporal weights are proposed by a spherical CNN [49] model and the redundancy across frames, which are adopted by most of video viewport softer non maximum suppression (NMS) . coding standards (H.264, H.265, etc.). The traditional ME is In the second stage, given the input of each proposed viewport based on the block matching with two assumptions. image, the saliency map of the viewport is predicted with a The first one is that any motion of an object at one frame mini-DenseNet [110]. After multiplying the error map of the can be effectively represented by block translations in the viewport with the predicted saliency map, the quality score of 2D plane and the corresponding motion vectors (MVs). The the viewport is predicted by another CNN architecture. Finally, second one is that if ME uses any samples outside the borders, the predicted quality score of the input frame is obtained padding can handle this case by replicating sample values near via weighted average over the quality scores of the proposed frame borders. However, neither of these two assumptions viewports. remains true for 360° video. In fact, geometric distortion The same as those for 2D video/image, for evaluating brings non-linear motions like rotation and zooming [111], the performance of VQA approaches on 360° video/image, which is beyond the representation capability of the traditional the correlation and error between the objective and sub- motion models. Moreover, MV variation is more severe and jective quality scores are calculated as the evaluation met- thus implementing traditional motion models directly increases rics, including PLCC, Spearman rank-order correlation co- motion vectors difference (MVD) between blocks, which leads efficient (SROCC), Kendall rank-order correlation coefficient to the increase of bitrate [112]. Secondly, traditional padding is (KROCC), root-mean-square error (RMSE) and mean absolute not appropriate for 360° video due to the discontinuity caused error (MAE). Due to the lack of openly available VQA datasets by projection [113] and may cause serious quality degradation for 360° video/image mentioned in Section III-A, almost all [114]. Therefore, several approaches have been proposed to authors built their own datasets to evaluate the performance of modify the process of ME in 360° video coding. their approaches. Therefore, a benchmark with a large-scale 1) ME in spherical domain: One intuitive idea is to im- is in urgent need. Table IV summarizes the above reviewed plement ME in the spherical domain rather than the projected approaches for VQA on 360° video/image. 2D plane. Tosic et al. [115] proposed a pioneering approach for ME between 360° video frames. The first step is to IV. COMPRESSION generate multi-resolution representations of current blocks and In order to provide immersive experience and high visual their corresponding reference blocks in the spherical domain. quality for the viewers, 4K or even higher resolution is Then, ME is iteratively refined at successive resolution levels. required for 360° video/image. Therefore, it is crucial to Recently, Li et al. [116] proposed the similar idea for ME in develop efficient compression approaches to satisfy the need 360° video. 
Current blocks and their corresponding reference of storage and transmission for 360° video/image. Gener- blocks in the 360° frame under CMP are first projected back to ally, 360° images and video sequences are usually projected the sphere. Then, MC is performed considering 3-dimensional to 2D planes to utilize the existing traditional video/image (3D) displacement between spherical blocks. Vishwanath et compression standards [1]. However, projection may also al. [117] further advanced the approach of [116]. Instead of introduce two problems. First, projected 2D planes suffer from the block displacement of [116], rotations of pixels within a geometric distortion [111], [112] and redundancy [89] caused block on the sphere are adopted for better representation of by non-uniform sampling density. Second, unlike traditional motion. Most recently, Li et al. [114] further extended their 2D video/image, 360° video/image is border-less due to its approach [116] to different projection types. Also inspired by spherical characteristics2. Sphere-to-plane projection adds ar- [115], Simone et al. [118] introduced a motion plane that tificial borders and thus leads to discontinuity [113], [114]. tangent to the projected plane, in which any motion in the Due to these two problems, the traditional video/image com- 360° video can be approximated by translation movements pression standards are unsuitable to be implemented directly on the motion plane. An adapted block matching algorithm was also proposed on the basis of the spherical 2Topologically, a sphere is a two-dimensional closed surface embedded in a characteristics of the motion plane. Wang et al. proposed a three-dimensional Euclidean space, which is compact and without boundary; Geometrically, a sphere is one single surface so that there exists no intersection spherical coordinate transform-based motion model (SCTMM) line between surfaces. [119], in which the motion of each coding block is considered JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 13

TABLE IV SUMMARY OF VQA APPROACHESFOR 360° VIDEO/IMAGE.

Approach Year Image/Video Re-projection Weight allocation Data-driven Availability S-PSNR [84] 2015 Video X Yes CPP-PSNR [68], [88] 2016 Video Yes USS-PSNR [89] X WS-PSNR [90] 2017 Video X Yes Sampling-related AW-SPSNR [91] 2017 Video X No W-SSIM, WMS-SSIM [74] 2017 Video X No WS-SSIM [93] 2018 Video X No S-SSIM [94] 2018 Video X X No Latitude- & sphere-weighted S-PSNR 2015 Video Yes (lwS-PSNR & swS-PSNR) [84] X X X NCP-PSNR, CP-PSNR [41] 2018 Video X X Yes SAL-PSNR [31] 2017 Image X X No VASW-PSNR [97] 2019 Video X X No Perceptual V-PSNR [84] 2015 Video X X No LR-QAVR, BP-QAVR [98] 2017 Video X No Huang et al. [71] 2018 Image X No DeepVR-IQA [103], [104] 2019 Image X No Li et al. [28] 2018 Video X No V-CNN [105] 2019 Video X X Yes as 3D translation in the spherical domain. The 3D translation padding that simply performs pixel repetition [122], geometry can be represented with a 2D MV and a relative depth padding [113] for ERP is based on spherical projection to parameter, so that it is easy to integrate SCTMM into existing enhance continuity of neighboring samples and contain more 2D video coding schemes for improving the compression meaningful information from the spherical source. For CMP, efficiency. the proposed geometry padding is applied on each of 6 cube 2) MV mechanism modification: Another reasonable idea faces. Inspired by [113], [121], Li et al. [114] proposed is to tailor the MV mechanism according to the characteristics local 3D padding that takes cyclic characteristics of ERP into of projection formats. Zheng et al. [120] first proposed a account. Specifically, for 360° video under ERP, the left border motion-compensated prediction scheme for 360° video with is padded with pixels near the right border and vice versa. In two phases to extract MVs representing complex motions addition, Li et al. [123] considered the encoding direction and in ERP. The first phase is traditional ME, which generates location of the block in the 360° video under ERP. If a block block pairs and their corresponding MVs. In the second phase, at the right border is being processed, its top-right reference the affine motion model followed by the quadratic motion samples are reconstructed with pixels at the left border. model is performed to further refine MVs. Recently, Wang et al. [111] proposed enhancing the representation ability B. Sampling density correction of MV mechanism implemented on 360° video under ERP. As mentioned above, non-uniform sampling density causes Originally, MVs of all pixels within a block are identical. geometric distortion and unnecessary extra bitrate for over- In the approach of [111], pixels in a block are allowed to sampled areas in the projected 2D plane. In general, there are be assigned to different MVs. Taking the characteristics of two types of solutions, namely down-sampling and adaptive ERP into account, pixel-wise MVs from the block-level MV quantization for compression. The first solution is intuitive, are computed using spherical coordinates and relative depth which applies or blurring filters on the redundant estimation. Youvalari et al. [112] introduced geometry-based regions. The second one embeds sampling density correction MV scaling to mitigate divergence of MVs for 360° video in the process of quantization. Note that the down-sampling encoding. 
To be more specific, according to WS-PSNR [90], solution can be viewed as the pre-processing steps to save each pixel in ERP plane is allocated a weight to simulate bitrates in the subsequent compression procedures. motion behavior in the sphere. After that, all original MVs 1) Down-sampling: Budagavi et al. [124] first proposed a are re-scaled by their scaling factors (calculated according to variable smoothing approach, which applies Gaussian smooth- weights). Consequently, a more uniform MV distribution is ing filter on the top and bottom regions of 360° video under achieved. ERP. Similarly, Youvalari et al. [89] proposed splitting 360° 3) Padding method adaptation: It has been extensively video under ERP into several down-sampled strips according studied to modify the padding method for considering dis- to the latitude. Note that in both approaches [89], [124], down- continuity. Sauer et al. [121] proposed a cube face extension sampling becomes more harsh gradually in the two vertical approach for 360° video under CMP. Specifically, geometric directions towards top and bottom borders. Additionally, Lee et distortion between cube faces is first corrected based on cam- al. [125] proposed implementing down-sampling on rhombus- era calibration. Then, each cube face is padded by 4 isosceles shaped image transformed from 360° video/image under ERP. trapezoid extensions outside the 4 borders. In contrast, He et 2) Adaptive quantization: In H.265 coding, QP controls al. [113] proposed implementing geometry padding for 360° video quality in the corresponding coding unit (CU) as it de- video under ERP and CMP, respectively. Unlike conventional termines how much distortion can be incurred in the encoding JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 14 process. Note that if QP is fixed during encoding process on equi-angular mapping function and the proposed latitude- the projected 2D plane, this is equivalent to applying non- related mapping function are applied on different cube faces uniform quantization to the sphere [126]. Recently, the R-λ in HEC to solve the non-uniform sampling density within scheme was considered as a state-or-the-art method for rate cube faces of CMP. Based on the characteristic of content- distortion optimization (RDO) in the encoding process [127]. dependence, He et al. [136] developed a generalized version of Therefore, how to adjust RDO to 360° video was investigated. CMP called hybrid cubemap projection (HCP), which allows Tang et al. [128] proposed adding a multiplier concerning sampling function selection according to video content. In latitude in the original QP formula for 360° video under addition, some other re-projection approaches utilize the novel ERP, in order to achieve higher quality around the equator projection types that achieve more uniform sampling and con- regions. Liu et al. [127], [129] proposed optimizing 360° video tinuity, which can also facilitate 360° video/image compres- encoding through maximization on S-PSNR [84] at a given sion. Hybrid angular cubemap projection (HAP) proposed by bitrate, since S-PSNR is an effective objective VQA approach Hanhart et al. [137] inherits the content-adaptive property of for 360° video. Li et al. [130] proposed that the weight value HCP and further improves compression efficiency by keeping of WS-PSNR [90] in the center of each block is added as a sampling continuity at borders that connect two faces. Li et al. scalar of Lagrangian multiplier λ in RDO. 
Similarly, Xiu at [138] proposed a tile-based segmentation and projection type, al. [126] also took WS-PSNR [90] into account in RDO. In minimizing the pixel area as the quantity of tiles increases. [126], the QP values for luma and chroma components are Consequently, redundancy among faces can be mitigated for calculated independently, due to their difference of dynamic speeding up compression. Wu et al. [139] proposed a novel range. octagonal mapping (OGM) type and the corresponding pixel rearrangement scheme, and it achieves more uniform sampling without generating more seams compared with the traditional C. Re-projection approaches projection types. As a result, both geometric distortion and Recently, 360° video/image compression has benefited from compression complexity can be reduced. more compression-friendly re-projection approaches, which can be classified into two categories. One category is to perform modification based on the existing projection types D. Perceptual compression with rotation in spherical domain, in order to find the most 360° video/image compression can take advantage of hu- compressible spherical position. The other one is to design man perception from two aspects. First, viewers can only novel projection types that take full consideration of spherical see viewports through the HMD [140], [141], which merely characteristics of 360° video/image and compression for better occupies approximately 12.5% of the whole streaming data coherence with both the input and its subsequent coding [142]. Thus, it is not only a waste of bandwidth but also an process. unnecessary burden to encode and transmit the whole 360° 1) Rotation in spherical domain: Some re-projection ap- video/image with full resolution. The perceptual quality of proaches proposed rotating the 360° content in spherical 360° video/image mainly depends on the quality within the domain, such that the regions requiring high visual quality viewport. Consequently, it is intuitive to allocate more bitrate can be located in the less distorted areas in the projected to the scene in the viewport than other regions. Second, the 2D plane before compression. Specifically, Boyce et al. [131] human visual system (HVS) indicates that viewers tend to pay proposed rotating the regions containing high motion and attention to RoI inside the viewport. Therefore, RoIs deserve detailed texture to the front regions near the equator, which higher quality than other regions [31]. On one hand, it is corresponds to the center of the ERP frames with less ge- natural that saliency prediction approaches can be applied ometric distortion. On the other hand, the static regions are in intra-frame compression. On the other hand, as objects rotated to the poles, which correspond to the horizontal borders with high motion in salient regions are most likely to attract of the ERP frames. Consequently, those informative regions viewers’ attention, frame rate can be optimized in inter-frame can incur less distortion. Recently, Su et al. [132] extended compression, according to the principle that frames containing the spherical rotation-based re-projection to the CMP frames. dynamic salient regions deserve higher frame rate. As deep learning is introduced to the field of video/image 1) Viewport-based coding/streaming: The viewport-based compression [133], [134], Su et al. 
[132] designed a CNN fed approaches, especially streaming-aware coding approaches, by segmentation contours and MVs to learn the association mainly improve compression efficiency by allocating different between visual content and its compression rate at different encoding quality according to the HM of the viewer. They can rotations. In [132], the optimal rotation for compression can be classified into two categories: non-tile coding and tile-based be obtained by the single prediction of the CNN model. coding [142]. 2) Compression-friendly projection types: Xiu et al. [91] In the non-tile coding/streaming, 360° video/image is first investigated the impact of the projection types on compression projected and packed into the same frame. Then, more bitrate efficiency. It turns out that the choice of the best projection assgined to the primary viewport than the remaining regions type is highly content-dependent, and quality variation among [154]. Sreedhar et al. [140] first proposed a streaming ap- different types can be significantly large. Based on CMP, Lin proach to transmit the front viewport with high resolution et al. [135] proposed a modified projection type, called the and the remaining parts with relatively lower resolution. hybrid equi-angular cubemap (HEC) projection. A traditional Another contribution of [140] is that a streaming-aware HM JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 15

TABLE V SUMMARY OF 360° VIDEO/IMAGECOMPRESSIONAPPROACHES.

Approaches Year Image/Video Codec Pre-processing ME&MC Quantization Rate Control Streaming Projection Type Evaluation Method * VQA Metric Benchmark # Tosic et al. [115] 2005 Image — X X — Others — Self-defined JVET [69], Li et al. [116] 2017 Video H.265 CMP BD-BR — X GoPro [143] Vishwanath et al. [117] 2017 Video H.265 X ERP, CMP BD-BR WS-PSNR [90] JVET [69] ERP, CMP, WS-PSNR [90], S-PSNR [84], Li et al. [114] 2019 Video H.265 BD-BR JVET [69] X OHP CPP-PSNR [68] Simone et al. [118] 2017 Video — X ERP Others PSNR, SSIM [87], S-PSNR [84] Self-defined JVET [69], SCTMM [119] 2019 Video H.266 ERP BD-BR, complexity WS-PSNR [90] X AVS [144] Zheng et al. [120] 2007 Video H.264 X — RD curve, others PSNR Self-defined Wang et al. [111] 2017 Video H.265 ERP BD-BR PSNR, SSIM [87] AVS [144]

adaptation X PSNR, WS-PSNR [90], Youvalari et al. [112] 2018 Video H.266 ERP BD-BR, complexity JVET [69] Motion estimation X CPP-PSNR [68] JVET [69], Sauer et al. [121] 2017 Video H.265 CMP BD-BR, BD-PSNR PSNR X X GoPro [143] JVET [69], He et al. [113] 2017 Video H.265 ERP, CMP BD-BR S-PSNR [84] X X GoPro [143] Li et al. [123] 2018 Video H.265 X X ERP BD-BR PSNR JVET [69] Budagavi et al. [124] 2015 Video H.264 X ERP BD-BR — Self-defined S-PSNR, lwS-PSNR [84], Youvalari et al. [89] 2016 Video H.265 ERP BD-BR Self-defined X USS-PSNR [89] SUN360 [145], Lee et al. [125] 2017 Video H.265 ERP BD-BR S-PSNR, lwS-PSNR [84] X self-defined Tang et al. [128] 2017 Video H.265 X ERP BD-BR S-PSNR [84] AVS [144] RD curve, BD-BR, Liu et al. [127] 2017 Video H.265 ERP S-PSNR [84] AVS [144] correction X X BD-PSNR, others

Sampling density WS-PSNR [90], S-PSNR [84], Li et al. [130] 2017 Video H.265 ERP RD curve, BD-BR JVET [69] X CPP-PSNR [68] Xiu et al. [126] 2018 Video H.265 X ERP, CMP RD curve, BD-BR S-PSNR [84], WS-PSNR [90] JVET [69] WS-PSNR [90], S-PSNR [84], Boyce et al. [131] 2017 Video H.265, H.266 ERP RD curve, BD-BR JVET [69] X CPP-PSNR [68] H.264, H.265, Su et al. [132] 2018 Video CMP Others — Self-defined VP9 X HEC [135] 2019 Video H.265, H.266 X HEC [135] RD curve, BD-BR WS-PSNR [90] JVET [69] BD-BR, BD-PSNR, HCP [136] 2018 Video H.265 HCP [136] WS-PSNR [90] JVET [69] X complexity HAP [137] 2019 Video H.265, H.266 X HAP [137] BD-BR, complexity WS-PSNR [90] JVET [69] Re-projection SUN360 [145], Li et al. [138] 2016 Video H.265 Li et al. [138] RD curve, BD-BR S-PSNR, lwS-PSNR [84] X LetinVR [146] OGM [139] 2018 Video H.265 X OGM [139] RD curve, BD-BR WS-PSNR [90], S-PSNR [84] AVS [144] Sreedhar et al. [140] 2016 Video H.265 X Multiple RD curve, BD-BR PSNR Self-defined Corbillon et al. [147] 2017 Video H.265 X ERP Others PSNR, MS-SSIM [92] Jaunt Inc. Ozcinar et al. [142] 2017 Video H.264 X ERP Others PSNR, SSIM [87] MPEG [148] RD curve, BD-PSNR, WS-PSNR [90], JVET [69], Ozcinar et al. [97] 2019 Video H.265 ERP X others VASW-PSNR [97] MPEG [148] Nguyen et al. [149] 2019 Video H.265 X ERP Others V-PSNR [84] Self-defined Fuente et al. [150] 2019 Video H.265 X CMP BD-BR, others PSNR Self-defined Nasrabadi et al. [141], [151] 2017 Video H.265 X CMP RD curve, BD-BR, others PSNR Self-defined RD curve, BD-BR, JVET [69], Ozcinar et al. [152] 2017 Video H.264 ERP WS-PSNR [90] X complexity MPEG [148] Perceptual Sun et al. [153] 2019 Video H.265 X ERP RD curve, others WS-PSNR [90] JVET [69] WS-PSNR [90], V-PSNR [84], Luz et al. [31] 2017 Image H.265 ERP RD curve, BD-BR, others Salient360 [22] X SAL-PSNR [31] RD curve, BD-BR, Liu et al. [129] 2018 Video H.265 ERP S-PSNR [84] AVS [144] X X BD-PSNR, others Tang et al. [128] 2017 Video H.265 X CMP BD-BR S-PSNR [84] AVS [144] * In this column, we only focus on coding-related evaluation methods: Rate-distortion (RD) curve, Bjntegaard delta bitrate saving (BD-BR), Bjntegaard delta PSNR improvement (BD-PSNR) and complexity. # Due to the different versions of the documents in this column, the test sequences can be different despite on the same benchmark. tracking algorithm is developed for real-time adaptive efficient encoded with base quality and several enhancement layers. compression of 360° video. Likewise, in the recent approach Another highlight of [141] is that given the tiles with base proposed by Corbillon et al. [147], visual quality around quality, HM is predicted as the guidance of bitrate distribution the HM position is regarded as the maximum, and quality in future frames. In addition, inspired by [142], [159], Ozcinar degradation is implemented based on spatial distance from the et al. [152] proposed encoding ladder estimation, in which HM position in the process of streaming. the quality of tiles is determined by a trade-off between video content (spatial and temporal features) and network The tile-based coding/streaming that independently encodes condition. Sun et al. [153] proposed a two-tier system for and transmits each tile outweighs non-tile coding in terms of 360° video streaming, in which the base tier delivers the higher flexibility and less resource consumption [155], [156]. 
entire video content at a lower quality with a long buffer, It is worth mentioning that despite individual processing of while the enhancement tier delivers the predicted viewport at each tile, bitrate distribution is still determined by location a higher quality with a short buffer. Consequently, a trade- of the primary viewport in the frame. As a earlier work, off between reliability and the efficiency is achieved for 360° [157] proposed performing tile-based coding and streaming video streaming. on 360° video. Specifically, video tiles in two different levels of resolution are simultaneously transmitted. Frame recon- 2) Saliency-aware adaptive coding: Recently, the saliency- struction process is an integration of high resolution tiles aware compression approaches for traditional 2D video/image within the viewport and low resolution tiles outside of the have been extensively studied [160]. However, unlike the viewport. In recent approaches [97], [142], [149], [158], each prosperity of the aforementioned viewport-based approaches, tile has several hierarchical representation levels to be chosen, saliency-aware compression for 360° video/image is still an depending on the viewport of the viewer. As such, a more emerging and challenging topic [161]. Recently, there have smooth quality degradation can be achieved. Furthermore, been some pioneering approaches that tune the existing 2D ap- Fuente et al. [150] proposed predicting the HM of the viewer proaches according to the characteristics of 360° video/image. from his/her previous HM data, considering both angular For intra-frame compression of 360° video/image, saliency velocity and angular acceleration. According to the predicted can be blended into quantization and sampling process. Re- HM, different QP is allocated to each tile. Nasrabadi et al. cently, Luz et al. [31] proposed that the QP values of a [141], [151] inherited the idea of [159] and further proposed 360° image can be estimated based on its saliency map, a layered 360° video coding approach, in which each tile is which is the output of the proposed SDM reviewed in Section JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 16

II-B. To avoid drastic quality difference, the computation of 1) Consistency among Subjects 2) Equator and Center Bias Datasets QP is also aware of spatial activity, which contains spatial Sec.II Sec.II.A 3) Impact of Content on Attention Visual Attention 4) Relationship between HM and EM activity in the luma coding block, mean spatial activity of Modelling Attention Modeling 1) Heuristic Approaches all CUs etc. Finally, an integrated formula of QP combining Approaches Sec.II.B 2) Data-driven Approaches saliency and spatial activity is developed. In addition, Sitz- 1) Test Material Display mann et al. [21] implemented and validated the effectiveness Subjective VQA 2) Test Method Sec.III and Datasets of saliency-based sampling on 360° video via applying exist- Sec.III.A 3) Subjective Score Processing 360° Video/Image Visual Quality 4) Datasets Processing Assessment ing saliency prediction approaches. In detail, low-resolution Objective VQA 1) Sampling-related Approaches Approaches CMP is integrated with up-sampled salient regions for better Sec.III.B 2) Perceptual Approaches considering attention preference. Recently, Zhu et al. [160] 1) ME in Spherical Domain Motion Estimation have proposed a saliency guided RoI selection approach that Adaptation 2) MV Mechanism Modification Sec.IV.A combines segmentation-based salient objects detection with 3) Padding Method Adaptation Sampling Density 1) Down-sampling the saliency prediction model to determine the resolution of Correction Sec.IV Sec.IV.B 2) Adaptive Quantization regions. Biswas et al. [161] further refined the approach of Compression Re-projection 1) Rotation in Spherical Domain [21] by illumination normalization. Approaches Sec.IV.C 2) Compression-friendly Projection Types

Considering the fact that subject visual quality is mainly Perceptual 1) Viewport-based Coding/Streaming Compression influenced by salient regions containing objects with high Sec.IV.D 2) Saliency-aware Adaptive Coding motions, rate control based on saliency information is a - able idea for inter-frame compression of 360° video. Liu et al. Fig. 3. Overall organization of this paper. [129] introduced a rate control scheme for perceptual quality improvement of 360° video under ERP. To be more specific, bit allocation is determined by perceptual distortion using front- features that have been verified useful in 2D saliency predic- center-bias saliency map. Similarly, in the approach proposed tion, such that the well-studied visual attention mechanism on by Tang et al. [128], if a frame is with low saliency and 2D video/image can be utilized. However, there also emerged low motion, this frame can be dropped and replaced by the a handful of approaches [24], [25] developing brand-new previous frame encoded at low bitrate. Taking the multi- mechanism for 360° video/image. It is reasonable to believe face characteristics of CMP into account, the frame rate of that with more and larger datasets for 360° video/image being each face is optimized individually for higher compression available in the future, visual attention mechanism on 360° efficiency. video/image will be revealed in more detail. Consequently, the saliency prediction approaches for 360° video/image can be further advanced, with increasingly better performance. V. DISCUSSIONANDFUTUREWORK Moreover, this paper reviewed both subjective and objective Recently, 360° video/image has been in great popularity VQA approaches for 360° video/image. The subjective VQA and of broad prospects, which provides immersive experience approaches also provide the datasets with subjective quality but meanwhile requires high fidelity and low latency. Conse- scores. Then, the objective VQA approaches aim to predict the quently, the great challenges are posed to storage and transmis- subjective quality scores of 360° video/image. Several objec- sion, which calls for dedicated and more efficient processing tive VQA approaches solve the non-uniform density caused by for 360° video/image. In this paper, we surveyed state-of-the- projection on 360° video/image in the calculation of 2D VQA art works on 360° video/image processing, from the aspects approaches, with re-projection or weight allocation. These of visual attention modelling, VQA and compression. Figure sampling-based approaches keep the generialization ability 3 illustrates the outline of this paper. of the wide-spread 2D VQA approaches. Other objective For visual attention modelling, several public 360° VQA approaches for 360° video/image incorporate the visual video/image datasets with attention data were overviewed, and attention models in VQA, in order to make it consistent with based on these datasets, human attention on 360° video/image subjective quality scores. Currently, with the cutting-edge ad- was analyzed. Then, we reviewed saliency prediction ap- vances made in visual attention models, much more works re- proaches for 360° video/image, including both heuristic and lated to attention-based VQA on 360° video/image, especially data-driven approaches. Heuristic approaches explicitly extract considering viewport, remain to be developed. Additionally, hand-crafted features, so that the interpretability is guaranteed. 
most of the existing works focus on extracting spatial features Data-driven approaches, especially DNN-based approaches, for VQA on 360° video. Thus, extracting spatial-temporal automatically extract features for saliency prediction and features for VQA on 360° video shows a promising trend achieve considerable performance. However, compared to 2D in the future. In addition, since most of the existing 360° video/image datasets [64] that include up to 4000 images or video/image VQA works are full-reference (FR) approaches 1000 video sequences, 360° video/image attention datasets (except DeepVR-IQA [103], [104]), reduced-reference (RR) are of rather small size, which may seriously limit the up- or no-reference (NR, a.k.a, blind) VQA approaches on 360° bound of the data-driven approaches. Therefore, some large- video/image show another promising trend in the future. scale 360° video/image datasets are in urgent need for attention For compression, the mainstream is to apply the 2D video modelling. It is worth noting that a majority of approaches coding standards on 360° video/image with modification, in adopt and modify 2D saliency prediction approaches or extract order to take advantage of well developed 2D video coding JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 17 standards. Some approaches take consideration of the spher- [15] C. Ozcinar and A. Smolic, “Visual attention in omnidirectional video ical characteristics of 360° video/image in the modification, for virtual reality applications,” in International Conference on Quality of Multimedia Experience. IEEE, 2018, pp. 1–6. such as ME adaptation, sampling density correction and re- [16] X. Corbillon, F. De Simone, and G. Simon, “360-degree video head projection. Other approaches incorporate visual attention mod- movement dataset,” in ACM Multimedia Systems Conference. ACM, els in compressing 360° video/image, such that the bitrate 2017, pp. 199–204. [17] W.-C. Lo, C.-L. Fan, J. Lee, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, can be significantly saved at satisfactory visual quality. All “360 video viewing dataset in head-mounted virtual reality,” in ACM the existing works are based on the hybrid framework of 2D Multimedia Systems Conference. ACM, 2017, pp. 211–216. video coding. Some brand-new coding frameworks for 360° [18] C. Wu, Z. Tan, Z. Wang, and S. Yang, “A dataset for exploring user behaviors in VR spherical video streaming,” in ACM Multimedia video/image may see a great deal of future work. Systems Conference. ACM, 2017, pp. 193–198. In fact, the perception, VQA and compression of 360° [19] S. Fremerey, A. Singla, K. Meseberg, and A. Raake, “AVtrack360: an video/image do not exist in isolation. For example, the visual open dataset and software recording people’s head rotations watching 360° videos on an HMD,” in ACM Multimedia Systems Conference. attention models may benefit VQA on 360° video/image, as ACM, 2018, pp. 403–408. human is the ultimate end of quality assessment; VQA serves [20] H.-T. Cheng, C.-H. Chao, J.-D. Dong, H.-K. Wen, T.-L. Liu, and as an optimization goal of 360° video/image compression. M. Sun, “Cube padding for weakly-supervised saliency prediction in Most recently, multi-task learning [162], which refers to simul- 360° videos,” in IEEE Conference on Vision and . IEEE, 2018, pp. 1420–1429. taneously learning more than one tasks, has become mature. [21] V. Sitzmann, A. Serrano, A. Pavel, M. Agrawala, D. Gutierrez, B. Ma- In of multi-task learning, it is possible to simultaneously sia, and G. 
Wetzstein, “Saliency in VR: How do people explore virtual handle the tasks of perception, VQA and compression for environments?” IEEE Transactions on Visualization and , vol. 24, no. 4, pp. 1633–1642, 2018. 360° video/image, as these tasks are not isolated. There [22] Y. Rai, J. Gutierrez,´ and P. Le Callet, “A dataset of head and eye has been a pioneering work [105] that developed a multi- movements for 360 degree images,” in ACM Multimedia Systems task for predicting viewports, saliency maps and subjective Conference. ACM, 2017, pp. 205–210. [23] E. J. David, J. Gutierrez,´ A. Coutrot, M. P. Da Silva, and P. L. Callet, quality scores of 360° video. A long-term goal of future 360° “A dataset of head and eye movements for 360° videos,” in ACM video/image processing should include more multi-task works Multimedia Systems Conference. ACM, 2018, pp. 432–437. for perception, VQA, compression, etc. [24] M. Xu, Y. Song, J. Wang, M. Qiao, L. Huo, and Z. Wang, “Predicting head movement in panoramic video: A deep reinforcement learning approach,” IEEE Transactions on Pattern Analysis and Machine Intel- ligence REFERENCES , 2018. [25] Z. Zhang, Y. Xu, J. Yu, and S. Gao, “Saliency detection in 360° videos,” [1] Z. Chen, Y. Li, and Y. Zhang, “Recent advances in omnidirectional in European Conference on . Springer, 2018, pp. video coding for virtual reality: Projection and evaluation,” Signal 504–520. Processing, vol. 146, pp. 66–78, 2018. [26] H.-N. Hu, Y.-C. Lin, M.-Y. Liu, H.-T. Cheng, Y.-J. Chang, and M. Sun, [2] M. Wien, J. M. Boyce, T. Stockhammer, and W.-H. Peng, “Standard- “Deep 360 pilot: Learning a deep agent for piloting through 360 ization status of immersive video coding,” IEEE Journal on Emerging sports videos,” in IEEE Conference on Computer Vision and Pattern and Selected Topics in Circuits and Systems, 2019. Recognition. IEEE, 2017, pp. 1396–1405. [3] JPEG Requirements Subgroup, “Call for proposals on JPEG 360 [27] Y. Xu, Y. Dong, J. Wu, Z. Sun, Z. Shi, J. Yu, and S. Gao, “Gaze metadata,” ISO/IEC JTC 1/SC 29/WG 1, 2017. prediction in dynamic 360° immersive videos,” in IEEE Conference [4] ITU-R, “Methodology for the subjective assessment of video quality on Computer Vision and Pattern Recognition. IEEE, 2018, pp. 5333– in multimedia applications,” Recommendation ITU-R BT.1788, 2007. 5342. [5] ITU-T, “Subjective video quality assessment methods for multimedia [28] C. Li, M. Xu, X. Du, and Z. Wang, “Bridge the gap between VQA and applications,” Recommendation ITU-T P.910, 2008. human behavior on omnidirectional video: A large-scale dataset and a [6] ITU-R, “Methodology for the subjective assessment of the quality of deep learning model,” in ACM International Conference on Multimedia. television pictures,” Recommendation ITU-R BT.500-13, 2012. ACM, 2018, pp. 932–940. [7] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, [29] M. Startsev and M. Dorr, “360-aware saliency estimation with conven- “Study of subjective and objective quality assessment of video,” IEEE tional image saliency predictors,” : Image Communi- Transactions on Image Processing, vol. 19, no. 6, pp. 1427–1441, 2010. cation, vol. 69, pp. 43–52, 2018. [8] E. Upenik, M. Reˇ rˇabek,´ and T. Ebrahimi, “Testbed for subjective evalu- [30] P. Lebreton and A. Raake, “GBVS360, BMS360, ProSal: Extending ation of omnidirectional visual content,” in Picture Coding Symposium. existing saliency prediction models from 2D to omnidirectional im- IEEE, 2016, pp. 1–5. ages,” Signal Processing: Image , vol. 69, pp. 
69–78, [9] M. Xu, C. Li, Y. Liu, X. Deng, and J. Lu, “A subjective visual 2018. quality assessment method of panoramic videos,” in IEEE International [31] G. Luz, J. Ascenso, C. Brites, and F. Pereira, “Saliency-driven om- Conference on Multimedia and Expo. IEEE, 2017, pp. 517–522. nidirectional adaptive coding: Modeling and assessment,” in [10] J. Gutierrez,´ E. David, Y. Rai, and P. Le Callet, “Toolbox and dataset IEEE International Workshop on Multimedia Signal Processing. IEEE, for the development of saliency and scanpath models for omnidirec- 2017, pp. 1–6. tional/360° still images,” Signal Processing: Image Communication, [32] F. Battisti, S. Baldoni, M. Brizzi, and M. Carli, “A feature-based vol. 69, pp. 35–42, 2018. approach for saliency estimation of omni-directional images,” Signal [11] Y. Rai, P. Le Callet, and P. Guillotel, “Salient360!: Visual attention Processing: Image Communication, vol. 69, pp. 53–59, 2018. modeling for 360° images,” in IEEE International Conference on [33] Y. Zhu, G. Zhai, and X. Min, “The prediction of head and eye Multimedia and Expo 2017 Grand Challenge. IEEE, 2017. movement for 360 degree images,” Signal Processing: Image Com- [12] J. Gutierrez´ and P. Le Callet, “Salient360!: Visual attention modeling munication, vol. 69, pp. 15–25, 2018. for 360° content,” in IEEE International Conference on Multimedia [34] Y. Fang, X. Zhang, and N. Imamoglu, “A novel superpixel-based and Expo 2018 Grand Challenge. IEEE, 2018. saliency detection model for 360-degree images,” Signal Processing: [13] A. De Abreu, C. Ozcinar, and A. Smolic, “Look around you: Saliency Image Communication, vol. 69, pp. 1–7, 2018. maps for omnidirectional images in VR applications,” in International [35] J. Ling, K. Zhang, Y. Zhang, D. Yang, and Z. Chen, “A saliency predic- Conference on Quality of Multimedia Experience. IEEE, 2017, pp. tion model on 360 degree images using color dictionary based sparse 1–6. representation,” Signal Processing: Image Communication, vol. 69, pp. [14] Y. Bao, H. Wu, T. Zhang, A. A. Ramli, and X. Liu, “Shooting a moving 60–68, 2018. target: Motion-prediction-based transmission for 360-degree videos,” in [36] M. Assens, X. Giro-i Nieto, K. McGuinness, and N. E. OConnor, IEEE International Conference on Big Data. IEEE, 2016, pp. 1161– “Scanpath and saliency prediction on 360 degree images,” Signal 1170. Processing: Image Communication, vol. 69, pp. 8–14, 2018. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 18
