Bitrate Requirements of Non-Panoramic VR Remote Rendering

Viktor Kelkkanen, Blekinge Institute of Technology, Karlskrona, Sweden
Markus Fiedler, Blekinge Institute of Technology, Karlshamn, Sweden
David Lindero, Ericsson, Luleå, Sweden

ABSTRACT
This paper shows the impact of bitrate settings on objective quality measures when streaming non-panoramic remote-rendered Virtual Reality (VR) images. Non-panoramic here refers to the images that are rendered and sent across the network: they cover only the viewport of each eye, respectively. To determine the required bitrate of remote rendering for VR, we use a server that renders a 3D scene, encodes the resulting images using the NVENC H.264 codec and transmits them to the client across a network. The client decodes the images and displays them in the VR headset. Objective full-reference quality measures are taken by comparing the image before encoding on the server to the same image after it has been decoded on the client. By altering the average bitrate setting of the encoder, we obtain objective quality scores as functions of bitrate. Furthermore, we study the impact of headset rotation speed, since this also has a large effect on image quality. We determine an upper and a lower bitrate limit based on headset rotation speeds. The lower limit is based on a speed close to the average human peak head-movement speed, 360°/s. The upper limit is based on maximal peaks of 1080°/s. Depending on the expected rotation speeds of the specific application, we determine that a total of 20–38 Mbps should be used at resolution 2160×1200@90 fps, and 22–42 Mbps at 2560×1440@60 fps. These recommendations are given with the assumption that the image is split in two and streamed in parallel, since this is how the tested prototype operates.

CCS CONCEPTS
• Networks → Cloud computing; Network measurement; • Applied computing → Computer games; • Computing methodologies → Virtual reality.

KEYWORDS
remote rendering, game streaming, low-latency, 6-DOF, SSIM, VMAF

ACM Reference Format:
Viktor Kelkkanen, Markus Fiedler, and David Lindero. 2020. Bitrate Requirements of Non-Panoramic VR Remote Rendering. In Proceedings of the 28th ACM International Conference on Multimedia (MM '20), October 12–16, 2020, Seattle, WA, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3394171.3413681

[Figure 1: Example of the non-panoramic view and 3D scene that is rendered once per eye and streamed each frame.]

1 INTRODUCTION
Remote rendering for VR is commonly done by using panorama images that contain a 360-degree view from the current position of the Head Mounted Display (HMD). Such panorama techniques may reduce rotational latency and provide a more seamless experience than techniques that only render the current viewport, since the panorama image has information for every viewing direction and the local device can therefore easily select a new view depending on the look-vector of the HMD. However, as those images cover a much larger area, they require more data and rendering power to produce an image quality that matches that of the non-panoramic method. The technique is also not ideal for very interactive scenes, i.e. games, because any interaction with the world must be shown immediately anyway, either by downloading a new image or by rendering it on the client. Rendering new panorama images on the server in real time can also be very time consuming due to the stitching process and the need for six cameras. This is why we see potential in the simpler method where just the current viewport seen by the user is rendered, as would be the case in local rendering. Figure 1 shows an example of the non-panoramic view used in this work.
The method of rendering just the viewport is commonly used in short-range remote rendering for VR, for example when enabling wireless streaming to HMDs in the same room as the rendering computer, e.g. when using TPCAST [17]. It may be possible to expand this to networks with longer range in the future. It would, however, put immense requirements on those networks, because new images must constantly be downloaded at a high framerate to cover the current view-direction of the user. Naturally, many problems arise from this task; perhaps the most critical are those related to network latency and codec time consumption. Those issues will be covered in a different publication; in this work we focus on the issue of bandwidth requirements. The bandwidth may be more limited than on a LAN with a WiGig connection or similar. Thus, we need to determine in depth the bitrate requirements of this type of streaming, which is the scope of this work. We determine bitrate requirements by using an existing prototype, developed in-house, that remotely renders a VR scene once per eye at a common HMD resolution, 1080×1200@90 fps, and a mobile-phone resolution, 1280×1440@60 fps. From now on, the unit fps will be omitted when describing a resolution and its framerate.

The remainder of the paper is structured as follows: Section 2 presents previous work related to VR streaming and required bitrates, Section 3 describes the methods we have used to determine the bitrate requirements, Section 4 shows results from experiments with the prototype, Section 5 concludes the work and presents our most important findings, and Section 6 discusses potential future work.

2 RELATED WORK
Related work can be divided into three areas with a decreasing level of relevance: first, other studies of remote rendering solutions for non-panoramic VR; second, studies of remote rendering or video streaming of 360° panoramic images for VR; and finally, solutions for cloud gaming.

2.1 Non-Panoramic VR Remote Rendering
We have found two public works where authors have created systems for non-panoramic VR streaming and put them to the test [9, 20]. [20] does not bring up bitrate, and [9] leaves room for deeper investigation into bitrate requirements. In [9], the authors present results related to bitrate in the form of a maximum frame size setting that is set to slightly more than 200 KB to reach a Structural Similarity (SSIM) [21] above 0.98. This number, 0.98, is used as the limit for a visually lossless experience; the authors base this on related previous work [1]. It is not clear what a maximum frame size means in terms of overall bitrate requirements. To our understanding, 200 KB per frame would in the worst case be equal to 200 × 1024 × 8 × 90 = 147 Mbps, but is likely lower on average.
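This worst-case figure is a back-of-the-envelope conversion rather than a number reported in [9]; it assumes that every frame reaches the maximum frame size S_max at framerate f:

\[
R_{\max} = S_{\max} \cdot 8 \cdot f = 200 \cdot 1024\,\mathrm{B} \times 8\,\mathrm{bit/B} \times 90\,\mathrm{s^{-1}} \approx 147\,\mathrm{Mbps}.
\]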
[...] to achieve an SSIM above 0.98. The solution is intended for mobile phones and works at a resolution of 1280×720@60.

A work on foveated video encoding for cloud gaming on mobile phones presents a maximum bitrate of 10 Mbps that results in a Mean Opinion Score (MOS) of 69 on a scale of 0–100 [6]. 1920×1080@50 is reported as the used resolution.

3 METHOD
In order to determine the image quality as a function of bitrate and HMD movement, we conduct experiments with a VR remote rendering prototype where one parameter is altered at a time and the impact on two objective quality estimators is recorded: SSIM and Video Multi-Method Assessment Fusion (VMAF) [10].

3.1 Software
An existing solution for non-panoramic VR remote rendering is used. This solution was developed in-house and consists of a server that renders a 3D scene using C++ and OpenGL, and a client that displays the received images. The scene is rendered once per eye and thus creates two textures that are encoded, transmitted and decoded in parallel. When calculating quality scores, we only use the left-eye image, because the two images are very similar and are never merged in the program. An alternative would be to calculate the scores for both images and report the average, or to artificially merge the two into one image for comparison. Combining the two would, however, double the size of the recorded data and not make much difference in the analysis according to our observations. Therefore, we only use one of the images to determine the scores. As previously stated, the images are encoded in parallel; this happens in two separate hardware encoders that use exactly the same settings, including bitrate. This means that the total bitrate required by the system is twice the individual encoder bitrate setting. When a bitrate is presented in this document, it refers to the bitrate of one individual eye, unless specifically stated otherwise. It should thus be doubled to acquire the total bitrate of the system.
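To make the full-reference comparison concrete, the sketch below computes a simplified, single-window SSIM between a reference buffer (the left-eye image before encoding on the server) and a distorted buffer (the same image after decoding on the client). This is an illustration under simplifying assumptions, not the prototype's measurement code: the standard SSIM definition [21] averages many local (e.g. Gaussian-weighted 11×11) windows, and VMAF [10] fuses several elementary metrics, so an established implementation should be used for actual scoring.

```cpp
// Simplified, single-window SSIM over two grayscale frames of equal size.
// Illustrative sketch only; the published SSIM metric averages many local
// windows instead of using one global window over the whole image.
#include <cstdint>
#include <cstddef>
#include <vector>
#include <iostream>

double GlobalSsim(const std::vector<std::uint8_t>& ref,
                  const std::vector<std::uint8_t>& dist)
{
    const std::size_t n = ref.size();           // assumes ref.size() == dist.size()
    const double L  = 255.0;                    // dynamic range of 8-bit pixels
    const double c1 = (0.01 * L) * (0.01 * L);  // stabilizing constants from the SSIM definition
    const double c2 = (0.03 * L) * (0.03 * L);

    // Means of both frames.
    double muX = 0.0, muY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { muX += ref[i]; muY += dist[i]; }
    muX /= n; muY /= n;

    // Variances and covariance.
    double varX = 0.0, varY = 0.0, cov = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = ref[i]  - muX;
        const double dy = dist[i] - muY;
        varX += dx * dx; varY += dy * dy; cov += dx * dy;
    }
    varX /= n; varY /= n; cov /= n;

    // SSIM(x, y) = (2*mu_x*mu_y + c1)(2*cov + c2) /
    //              ((mu_x^2 + mu_y^2 + c1)(var_x + var_y + c2))
    return ((2.0 * muX * muY + c1) * (2.0 * cov + c2)) /
           ((muX * muX + muY * muY + c1) * (varX + varY + c2));
}

int main()
{
    // Dummy 1080x1200 left-eye frames; in the real pipeline these would hold the
    // pre-encode (server) and post-decode (client) versions of the same frame.
    const std::size_t w = 1080, h = 1200;
    std::vector<std::uint8_t> reference(w * h, 128);
    std::vector<std::uint8_t> distorted(w * h, 128);
    distorted[0] = 120;  // tiny artificial distortion

    std::cout << "SSIM (simplified): " << GlobalSsim(reference, distorted) << "\n";
    return 0;
}
```

In the measurement setup described above, such a score would be computed per frame on the left-eye image only, and the per-eye encoder bitrate would be doubled to obtain the system total.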