2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)

Evaluation of Simulator Sickness for 360° Videos on an HMD Subject to Participants’ Experience with Virtual Reality

Majed Elwardy* Hans-Jürgen Zepernick† Yan Hu‡ Thi My Chinh Chu§ Veronica Sundstedt¶

Blekinge Institute of Technology SE-37179 Karlskrona, Sweden

*e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]
§e-mail: [email protected]
¶e-mail: [email protected]

978-1-7281-6532-5/20/$31.00 ©2020 IEEE. DOI 10.1109/VRW50115.2020.00097

ABSTRACT
Virtual reality (VR) has seen tremendous advances in head-mounted displays (HMDs), optics, media quality, and other improvements that facilitate immersive experiences. With the occurrence of new technologies like Cloud VR and networked VR video services, applications such as 360° video streaming are becoming more popular within the broader consumer markets. As a result, VR content is accessible to customers with rather different levels of experience with immersive media, i.e., never, sometimes, or often use of VR. The question, therefore, arises to which degree simulator sickness is induced in viewers depending on their experience with VR on HMDs. In this paper, simulator sickness is evaluated for 360° videos that were shown on an HTC Vive Pro HMD to participants having different levels of experience with VR on HMDs. The modified absolute category rating with hidden reference (M-ACR-HR) method was used in a subjective experiment for video quality assessment within two subsequent sessions along with a simulator sickness questionnaire (SSQ). A statistical analysis of the SSQ scores is performed to reveal the relationship between simulator sickness and participants’ experiences with VR regarding: (1) Individual symptoms, (2) Pairwise comparison of symptoms, and (3) Symptom clusters of nausea, oculomotor, disorientation, and total score. It is shown that the simulator sickness symptoms, in general, are slightly or rarely perceived across the different experience levels for the selected 360° videos. The results indicate that the reported simulator sickness increases in the second session for participants that never used VR on HMDs. Sufficiently long breaks between sessions should therefore be accounted for in the M-ACR-HR method to avoid that simulator sickness influences quality rating.

Keywords: Immersive media, 360° videos, subjective experiments, M-ACR-HR method, simulator sickness questionnaire.

1 INTRODUCTION
In recent years, immersive multimedia such as virtual reality (VR) and augmented reality (AR) have seen increased applications ranging from digital games to the general consumer markets. The trend of viewing virtual environments (VEs) on head-mounted displays (HMDs) is due to large technological improvements in hardware, processors, and software. 5G mobile networks are being developed to provide the high data rates needed for this type of application with ultra-low latency and stringent reliability constraints [12]. Similarly, the visions for future 6G mobile networks foresee taking VEs even further to serve holographic verticals and societies [13].

In contrast to the large technological advancements supporting novel immersive visual and interactive applications, the understanding of the level of simulator sickness and cybersickness is less developed but similarly important to assure customer satisfaction. Given that viewing VEs on HMDs is a rather new option for many consumer verticals, studies on simulator sickness that engage participants with different levels of experience with VR are also needed.

1.1 Related Work
Related recent work is typically based on the simulator sickness questionnaire (SSQ) proposed in [17]. In [4], a VR simulator of a forestry crane for loading logs onto a truck was evaluated in terms of quality of experience (QoE) aspects and SSQ to reveal the effects of latency on task completion. It was shown that the display delay strongly influenced sickness symptoms, causing some test subjects to discontinue the experiment. The work reported in [5] evaluated the physical discomfort by recording the SSQ for the Oculus Rift HMD in three experiments against the Google Cardboard and 3D TV. Compared to the two low-cost devices, the Oculus Rift HMD turned out to have great potential for conducting basic research in this field. However, additional in-depth studies were suggested to better understand the relationship between exposure duration, experimental design, and simulator sickness as well as other HMDs. In [11], the impact of video content type on users’ VR sickness and physiological response was studied engaging 26 participants with visual stimuli shown on three generations of Oculus Rift HMDs, Samsung Gear VR HMD, and a 2D TV screen. Findings of this research include that participants’ personal content preferences affect their sickness perception, that skin conductance strongly correlates with sickness effects, and that both the SSQ and simpler questionnaires give good results. In [15], QoE and simulator sickness of a smart-exercise-bike VR system were evaluated using the SSQ and physiological signals. It was shown that texture quality and frame rate have a statistically significant impact on perceived quality but not on the SSQ scores. Further, the synthetic VEs presented to the participants appear to trigger higher levels of sickness symptoms compared to studies focusing on omnidirectional videos. The level of prior experience with VR also appeared to have a significant impact on simulator sickness, with more experienced participants being less affected. In [21], subjective quality of omnidirectional videos at different bit rates and resolutions was evaluated along with an analysis of SSQ scores and head motion behavior. It was found that simulator sickness increases with increased test time for the videos shown on an HTC Vive Pro HMD and Oculus Rift HMD but can be reduced by taking sufficiently long breaks between sessions. The finding that simulator sickness increases with increased time was confirmed in [25] and the role of extended breaks to reduce SSQ scores was pointed out. A similar experimental setting was studied in [22] with the obtained SSQ scores suggesting different levels of simulator sickness depending on the content. Further, it was shown that the female participants developed significantly higher simulator sickness (see also [10]). The work in [26] investigated visual and cognitive aftereffects of using HMDs and their relationship to the reporting of sickness on the SSQ in an application-based study. In [1], the relationship between SSQ scores and participants’ dropout rates due to sickness symptoms in driving simulators was studied.

Just as the SSQ [17] was developed from the Pensacola motion sickness questionnaire (MSQ) [16] to better capture symptoms of simulator-induced sickness compared to motion-induced sickness, several questionnaires have been proposed in recent years to better evaluate immersive media-induced sickness. A virtual reality sickness questionnaire (VRSQ) was proposed in [18] based on the responses of 24 participants that were given target selection tasks on a VR device. The results show that the SSQ proportional score of nausea-related symptoms is the lowest, which motivated considering only nine symptoms for the VRSQ that contribute to oculomotor and disorientation. In [23], the suitability of short, single-scale versions of the SSQ and presence questionnaire (PQ) for omnidirectional videos in comparison to the respective long versions was investigated. The results show that the short questionnaires can be used to assess strong effects such as the impact of content but cannot replace the long versions when it comes to assessing the impact of coding parameters such as resolution, bit rate, and frame rate on simulator sickness and presence. In [20], the SSQ is compared with the cybersickness questionnaire (CSQ) [6], VRSQ [18], and French SSQ (FSSQ) [3] in VEs developed for VR. The results obtained in this study with 32 participants and 7 different VEs presented in 9 sessions show that CSQ and VRSQ give better indicators of validity compared to SSQ and FSSQ when it comes to evaluating cybersickness.

1.2 Contribution and Paper Structure
In view of the above discussion, in this paper, we evaluate the impact of participants’ experience with VR on simulator sickness using the SSQ. The SSQ scores were obtained from participants that took part in a large subjective experiment on quality assessment of 360° videos using the modified absolute category rating with hidden reference (M-ACR-HR) method and stimuli shown on an HTC Vive Pro HMD (see [7], [28] for more details). It should be mentioned that the M-ACR-HR method has recently been introduced in [24] to account for users that have little or no experience with immersive multimedia on HMDs. As such, the SSQ analysis provided in this paper also contributes to the understanding of the usability of the M-ACR-HR method in terms of simulator sickness. The statistical analysis of the SSQ scores addresses the following:

• Mean SSQ scores and box plots over all participants to reveal the level of occurrence of the 16 individual symptoms.
• Pairwise comparison of symptoms using t-tests to identify symptoms that are statistically different from other symptoms.
• Mean weighted SSQ scores and total score for the symptom clusters of nausea, oculomotor, and disorientation over all participants to reveal the ranking among these clusters in contributing to simulator sickness.
• Mean SSQ scores and box plots when participants are grouped into experts, participants that sometimes used VR, and participants that never used VR.
• Mean weighted SSQ scores and total scores for the symptom clusters of nausea, oculomotor, and disorientation when participants are grouped into experts, participants that sometimes used VR, and participants that never used VR.

The rest of the paper is organized as follows. Section 2 describes the experimental design including the visual stimuli, software, equipment, participants, test method, and the SSQ. The statistical analysis of the SSQ scores and weighted SSQ scores for the individual symptoms, symptom clusters, and subject to participants’ experience with VR is given in Section 3. Discussions, conclusions, and future work are provided in Section 4.

2 EXPERIMENTAL DESIGN
The SSQ data was obtained as part of a larger subjective experiment on 360° video quality assessment. The stimuli were viewed on an HMD by participants having different levels of experience with VR. The subjective experiment was broken down into a pilot study with a small group of experts and a main study with participants not directly involved in quality assessment of visual stimuli. As noted in Recommendation ITU-T P.910 [14], the aim of a pilot study (4-8 experts) is to obtain indicative results before conducting the main study with a larger number of participants. Results of this subjective experiment addressing the impact of participants’ experience on 360° video quality assessment regarding opinion scores have been reported in [7]. In the following sections, the design of this experiment is described to the extent needed for the understanding of the SSQ analysis. Additional details about the design of this subjective experiment can be found in [7].

2.1 Stimuli, Software, and Equipment
Four natural 360° reference video scenes of 8K resolution (2D width) with different content, complexity, and motion were selected from the VQA-ODV database [19], [2]. For each of the 8K 360° video scenes, given in equirectangular projection format, additional reference videos with 6K, 4K, 2K, and optimal resolution (OR) [29] were produced. A set of 360° test videos was generated by compressing the 360° reference videos using five different quantization parameters (QPs), i.e., QP=22, 27, 32, 37, and 42. In this way, 120 360° videos with different quality levels were obtained: (1) 20 reference videos with five resolutions and four scenes, (2) 100 test videos representing five resolutions, five QPs, and four scenes. In other words, a set of 30 360° videos has been produced for each of the four scenes, comprising the respective reference and test videos. Appendix A provides sample frames of the four 360° reference video scenes and some details about the generation of the test videos.

The 360° videos were shown on an HTC Vive Pro HMD equipped with an integrated eye-tracker from Tobii Pro with gaze data output of 120 Hz. The HTC Vive Pro HMD provides a resolution of 1440 × 1600 pixels per eye, a 110° field of view (FoV), and runs with a refresh rate of 90 Hz. Interaction with the virtual world was provided through the HTC Vive controller which allowed the participants to follow the instructions during eye-tracker calibration and to rate the quality of the 360° videos. The heart rate and galvanic skin response (GSR) data were also recorded using the iMotion Software Version 7.1 with a wireless Shimmer GSR biosensor and a photoplethysmography sensor. At the end of a session, the participants answered the SSQ using a graphical user interface under iMotion on a standard screen.

2.2 Participants
A pilot study was conducted with five experts familiar with immersive multimedia and/or multimedia signal processing (2 females and 3 males). The main study engaged 30 participants (9 females and 21 males) having a variety of occupations as follows: 1 Bachelor student, 10 Master students, 8 Ph.D. students, 6 academic staff, 1 administrator, and 4 industry staff. The SSQs of one participant from the pilot study and one participant from the main study that had a break longer than one hour between the two subsequent sessions were not considered. Screening of the participants’ distribution of opinion scores for consistency was performed according to guidelines of the Video Quality Experts Group (VQEG) [27], which revealed four outliers in the main study. The SSQs associated with these four outliers were also removed from further statistical analysis. This left four participants (2 females and 2 males) from the pilot study and 25 participants (5 females and 20 males) from the main study. The average age of the experts in the pilot study was 38.2 years and that of the participants in the main study was 29.53 years. The breakdown of the total of 29 participants into classes of experience

with VR on HMDs is as follows: 4 (experts), 12 (sometimes used, i.e., a few times a year), and 13 (never used).

2.3 Test Method
The M-ACR-HR method [24] was used in this experiment where each 360° video of 10 s duration is presented twice with a three second mid-grey screen in between. In this method, participants give their opinion score on a five-level quality scale: (5) Excellent, (4) Good, (3) Fair, (2) Poor, and (1) Bad. All videos were shown in random order with the reference videos hidden among the test videos. In this study, two sessions were recorded for each participant with 60 of the total of 120 videos shown in each session. Thus, all participants have seen the entire set of 360° reference and test videos associated with the four different scenes. The schedule followed in each session of the subjective experiment is shown in Table 1.

Table 1: Session Schedule
Session | Duration | Comments
M-ACR-HR | 28-35 min. | Starts with eye-tracker calibration. Presentation and rating of 60×2 videos is scheduled as follows: (1) 10 s video, first time; (2) 3 s mid-grey screen; (3) 10 s video, second time; (4) quality rating, typically <10 s.
SSQ | <5 min. | Answered at the end of the session on a conventional screen.

The participants were informed about the risks of simulator sickness and were given instructions before the experiment. All participants were screened for color vision (Ishihara color blindness test plates) and visual acuity (Snellen charts, 20/20 normal or corrected to normal vision). Before starting with the subjective experiment, each participant was asked to confirm that he or she is in good health condition for the experiment. Then, a short training session was conducted presenting only 4 × 2 videos not shown in the actual test sessions. After a break of 5 min., the first session was conducted with the SSQ scores collected at the end of the session. In line with the ethical vetting application that has been approved for this subjective experiment, participants were given a break as desired before commencing with the second session. However, in this paper, the SSQ scores of the participants that had a break between sessions exceeding one hour were excluded from processing. In this way, we comply with the ethical vetting while keeping the effect of alleviated symptoms due to a long rest within limits. The average duration of each session and the average duration of the break between sessions for the pilot study (less than 15 min. break on average) and main study (less than 30 min. break on average) are shown in Fig. 1.

Figure 1: Average duration of each session and the break between sessions for the pilot study and main study.

2.4 Simulator Sickness Questionnaire
The SSQ in [17] accounts for 16 symptoms (see Table 2). The respondents to the SSQ can rate the presence of each symptom through a symptom variable score: (0) None, (1) Slight, (2) Moderate, (3) Severe.

Table 2: Computation of SSQ Scores [17].
i | SSQ Symptom | Weight N | Weight O | Weight D
1 | General discomfort | 1 | 1 | 0
2 | Fatigue | 0 | 1 | 0
3 | Headache | 0 | 1 | 0
4 | Eyestrain | 0 | 1 | 0
5 | Difficulty focusing | 0 | 1 | 1
6 | Increased salivation | 1 | 0 | 0
7 | Sweating | 1 | 0 | 0
8 | Nausea | 1 | 0 | 1
9 | Difficulty concentrating | 1 | 1 | 0
10 | Fullness of head | 0 | 0 | 1
11 | Blurred vision | 0 | 1 | 1
12 | Dizziness (eyes open) | 0 | 0 | 1
13 | Dizziness (eyes closed) | 0 | 0 | 1
14 | Vertigo | 0 | 0 | 1
15 | Stomach awareness | 1 | 0 | 0
16 | Burping | 1 | 0 | 0
Total | | Nw | Ow | Dw

The 16 symptoms are further organized in [17] into three symptom clusters, i.e., nausea (N), oculomotor (O), and disorientation (D). The scores of each symptom cluster are calculated as

$$N_w = \sum_{i=1}^{16} w_{N,i} \cdot s_i \tag{1}$$
$$O_w = \sum_{i=1}^{16} w_{O,i} \cdot s_i \tag{2}$$
$$D_w = \sum_{i=1}^{16} w_{D,i} \cdot s_i \tag{3}$$

where $w_{N,i}, w_{O,i}, w_{D,i} \in \{0,1\}$ are the weights given in the respective column of Table 2 that include/exclude the symptom scores $s_i$, $i = 1, \ldots, 16$, into/from the symptom cluster scores $N_w$, $O_w$, and $D_w$, respectively. Weighting of the symptom cluster scores $N_w$, $O_w$, and $D_w$ by empirically obtained constants is then performed to obtain the following weighted symptom cluster scores as given in [17]:

$$N = 9.54 \cdot N_w \tag{4}$$
$$O = 7.58 \cdot O_w \tag{5}$$
$$D = 13.92 \cdot D_w \tag{6}$$

and the total score (TS), also referred to as total severity, as

$$TS = 3.74 \cdot (N_w + O_w + D_w) \tag{7}$$
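The scoring in Eqs. (1)-(7) can be illustrated with a minimal Python sketch. The cluster memberships are transcribed from the weight columns of Table 2 and the constants from Eqs. (4)-(7); the function name `ssq_scores`, the dictionary layout of the result, and the example ratings are assumptions made for this illustration and are not taken from the experiment.

```python
# Symptom indices (1-based, ordered as in Table 2) whose weight is 1
# in the N, O, and D columns, respectively.
NAUSEA = {1, 6, 7, 8, 9, 15, 16}
OCULOMOTOR = {1, 2, 3, 4, 5, 9, 11}
DISORIENTATION = {5, 8, 10, 11, 12, 13, 14}


def ssq_scores(ratings):
    """Compute weighted SSQ scores from 16 symptom ratings (each 0..3).

    `ratings[i - 1]` is the score s_i of symptom i in Table 2.
    Returns the weighted cluster scores N, O, D and the total score TS.
    """
    if len(ratings) != 16:
        raise ValueError("expected 16 symptom ratings")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")

    # Unweighted cluster sums, Eqs. (1)-(3).
    n_w = sum(ratings[i - 1] for i in NAUSEA)
    o_w = sum(ratings[i - 1] for i in OCULOMOTOR)
    d_w = sum(ratings[i - 1] for i in DISORIENTATION)

    # Weighted cluster scores and total score, Eqs. (4)-(7).
    return {
        "N": 9.54 * n_w,
        "O": 7.58 * o_w,
        "D": 13.92 * d_w,
        "TS": 3.74 * (n_w + o_w + d_w),
    }


# Hypothetical respondent: slight eyestrain (i = 4) and slight blurred
# vision (i = 11), all other symptoms rated "None".
example = [0] * 16
example[4 - 1] = 1
example[11 - 1] = 1
result = ssq_scores(example)
print(result)
```

For this hypothetical respondent, eyestrain counts towards O only while blurred vision counts towards both O and D, so that Nw = 0, Ow = 2, and Dw = 1, which gives N = 0, O = 15.16, D = 13.92, and TS = 11.22.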

3 STATISTICAL ANALYSIS OF THE SSQ
A statistical analysis of the SSQ scores from the 360° video quality assessment test was conducted in terms of mean and mean weighted SSQ scores, 95% confidence intervals (CIs), box plots, and t-tests.

3.1 Statistical Analysis of SSQ Scores
Fig. 2(a) shows the mean SSQ scores and 95% CIs obtained for each symptom in the first and second session. In general, given that the symptom variable scores can assume discrete values (0, 1, 2, 3), the mean SSQ scores are kept below 1 (slight) for all symptoms and close to 0 (none) for many symptoms. Those symptoms that are experienced towards a slight level are general discomfort, fatigue, eyestrain, difficulty focusing, difficulty concentrating, fullness of head, and blurred vision. This result may be attributed to the relatively long duration of each of the two sessions of around 25 minutes. The observed ranking among the mean SSQ scores of symptoms is consistent for Session 1 and Session 2. Fig. 2(b), showing the box plots of the SSQ scores, confirms the above finding where those seven symptoms of slight simulator sickness obtain a distinct spread of scores and the remaining nine symptoms have rarely been experienced.

Figure 2: Statistical analysis of SSQ scores: (a) Summary statistics (mean and 95% CI), (b) Box plots ("+" denotes an outlier, "◦" denotes a median value).

To reveal the symptoms that contribute significantly differently to the TS compared to other symptoms, a t-test was performed for all pairs of symptoms using a significance level of α = 0.05. The p-values obtained for this pairwise parametric test are illustrated in Fig. 3. For example, eyestrain occurs statistically similarly to general discomfort, fatigue, and fullness of head but is statistically significantly different from the remaining 12 symptoms.

Figure 3: Illustration of p-values obtained from pairwise comparison of symptoms using t-tests with α = 0.05. The colour bar indicates the p-value level. The diameter and colour of the circle change with the p-value. The red asterisk indicates significant differences.

3.2 Statistical Analysis of Symptom Cluster Scores
Fig. 4(a) shows the summary statistics in terms of means and 95% CIs of the weighted SSQ scores obtained for the symptom clusters of nausea, oculomotor, disorientation, and the total score. As can be seen from the figure, the mean weighted SSQ scores related to nausea reach a value of around 15 in both sessions. In contrast, the mean weighted SSQ score of around 30 indicates an occurrence of oculomotor and disorientation related symptoms that is about twice as high in both sessions for the 360° videos shown in the experiment. It is also observed that the oculomotor and disorientation scores decrease slightly in the second session. The dominance of these two symptom clusters carries over to the total score which approaches 30 for both sessions.

Regarding the box plots shown in Fig. 4(b), it can be observed that the spread of weighted SSQ scores obtained from the responses of the 29 participants increases from nausea over oculomotor to disorientation. In particular, there exists a large diversity of weighted SSQ scores for disorientation in the first session, which is even more pronounced in the second session of the experiment.

These results suggest further study of suitable durations of exposure to this type of VR stimuli on HMDs in order to reduce symptoms like fatigue, headache, difficulty focusing, and difficulty concentrating. Further advancements in HMD technologies including higher resolutions beyond 8K may alleviate symptoms like eyestrain, blurred vision, dizziness, and vertigo.

3.3 Statistical Analysis of SSQ Scores Subject to Participants’ Experience with VR
Additional insights on simulator sickness related to 360° videos shown on HMDs can be gained by analysing the SSQ scores subject to participants’ experience levels with VR. In this experiment, the 29 participants are grouped into 4 experts, 12 participants that sometimes used VR, and 13 participants that have never used VR.
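The grouping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout `(experience, session, total_score)`, the function names `mean_ci` and `summarize`, and the example scores are hypothetical, and the Student-t based 95% CI is an assumption because the text does not state how the CIs were computed. The critical values are the standard two-sided 95% values for the three group sizes used here (n = 4, 12, 13).

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values for df = n - 1 with
# n = 4, 12, 13 (experts, sometimes used, never used).
# Assumption: a t-based interval; the CI method is not stated in the text.
T_CRIT_95 = {3: 3.182, 11: 2.201, 12: 2.179}


def mean_ci(scores):
    """Return (mean, half-width of the 95% CI) for one group of scores."""
    n = len(scores)
    if n - 1 not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for n = {n}")
    half_width = T_CRIT_95[n - 1] * stdev(scores) / sqrt(n)
    return mean(scores), half_width


def summarize(records):
    """Group (experience, session, total_score) records and summarize them.

    Returns a dict mapping (experience, session) to (mean, CI half-width).
    """
    groups = {}
    for experience, session, total_score in records:
        groups.setdefault((experience, session), []).append(total_score)
    return {key: mean_ci(values) for key, values in groups.items()}


# Hypothetical total scores of the four experts in Session 1.
records = [
    ("expert", 1, 0.0),
    ("expert", 1, 3.74),
    ("expert", 1, 7.48),
    ("expert", 1, 3.74),
]
summary = summarize(records)
for (experience, session), (avg, half_width) in summary.items():
    print(f"{experience}, session {session}: {avg:.2f} +/- {half_width:.2f}")
```

With only four values in a group, the critical value of 3.182 is considerably larger than for the two bigger groups, so a small group yields a comparatively wide interval even when its scores are low.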

Figure 4: Statistical analysis of symptom cluster scores: (a) Summary statistics (mean and 95% CI), (b) Box plots ("+" denotes an outlier, "◦" denotes a median value).

Fig. 5 shows the mean SSQ scores for the 16 symptoms obtained for the three groups of participant experiences with VR. It can be seen that some of the symptoms did not occur with the experts, i.e., increased salivation, dizziness (eyes closed), vertigo, and stomach awareness. Further, some symptoms occurred only in the first session (difficulty concentrating, fullness of head, burping) and others occurred only in the second session (nausea, blurred vision, dizziness (eyes open)). In general, the mean SSQ scores obtained from the responses of the experts are all quite low although with wide 95% CIs due to the pilot study having engaged only 4 participants. The comparison of the results for the participants that sometimes used VR (Fig. 5(b)) and those that have never used VR (Fig. 5(c)) shows similar trends for the respective symptoms. However, for the participants that never used VR, the mean SSQ scores increased in Session 2 for some symptoms, i.e., general discomfort, difficulty focusing, sweating, difficulty concentrating, fullness of head, dizziness (eyes open), dizziness (eyes closed), stomach awareness, and burping. In other words, while experts and participants with some VR experience were able to cope well with Session 2, participants with no VR experience tended to develop slightly increased levels of simulator sickness in Session 2.

Similarly, Figs. 6 and 7 provide the mean weighted SSQ scores and box plots of weighted SSQ scores, respectively, for the symptom clusters N, O, D, and the total score TS with respect to the three groups of participant experiences with VR. While experts and participants that sometimes used VR show similar results such as being less affected by simulator sickness in Session 2 as indicated by the total score, the participants that never used VR report increased symptom cluster scores and an increased total score for Session 2. Furthermore, for participants with no VR experience, the nausea symptom cluster received the lowest mean weighted SSQ score, the oculomotor symptom cluster a higher score, and the disorientation symptom cluster the highest mean weighted SSQ score. However, the 95% CIs widen accordingly from nausea over oculomotor to disorientation for the non-experienced participants. This finding is supported by the box plots shown in Fig. 7 with a wide spread of weighted SSQ scores for oculomotor and disorientation in the case of participants who have never used VR. In contrast, for experts and participants that sometimes used VR, the spread of weighted SSQ scores is relatively narrow.

The finding about simulator sickness being more pronounced for participants with no prior experience with VR, i.e., viewing natural 360° video scenes on an HMD in this study, is consistent with the conclusion given in [15] for synthetic VEs that more experienced participants are less affected by simulator sickness. The results also reveal that the statistical analysis provided in Section 3.1 and Section 3.2 performed over all participants does not provide complete insights into simulator sickness issues but should also consider participants’ experience levels with VR as presented in this section. In relation to the usability of the M-ACR-HR method for 360° video quality assessment, the results suggest accounting for sufficiently long breaks between sessions for inexperienced participants in order to avoid that simulator sickness influences the quality rating.

Figure 5: Mean SSQ scores and 95% CIs obtained for participants with different experience levels of VR: (a) Experts, (b) Sometimes used, (c) Never used.

Figure 6: Mean weighted SSQ scores and 95% CIs for symptom clusters N, O, D, and TS subject to participants’ experience with VR: (a) Experts, (b) Sometimes used, (c) Never used.

Figure 7: Box plots of weighted SSQ scores for symptom clusters N, O, D, and TS subject to participants’ experience with VR: (a) Experts, (b) Sometimes used, (c) Never used.

4 SUMMARY
In this paper, we have provided a statistical analysis of the SSQ scores obtained from 29 participants that took part in a subjective experiment on 360° video quality assessment following the M-ACR-HR method. In particular, 120 360° video sequences with different quality in terms of resolution and quantization parameter were shown on an HTC Vive Pro HMD over two sessions. The statistical analysis has focused on individual symptoms, symptom clusters, and the level of participants’ experience with VR.

4.1 Discussions
In general, the results have shown that the level of simulator sickness is kept low for the presented videos. Regarding the individual symptoms, seven symptoms have slightly been perceived while the remaining nine symptoms have rarely been experienced. The statistical analysis of the symptom clusters has revealed that nausea-related symptoms are rather low. On the other hand, the symptoms that relate to oculomotor and disorientation are perceived about twice as high compared to those related to nausea. The above findings apply to both the first and second session of the experiment with the mean weighted SSQ scores slightly lower in the second session. Further, the same statistical analysis has been performed with participants grouped into sets of experience levels with VR, i.e., experts, sometimes used, and never used VR. In contrast to experts and participants with some VR experience, it turns out that the participants that never used VR produce higher mean weighted SSQ scores for the three symptom clusters and the total score. In addition, the participants that never used VR report higher mean weighted SSQ scores in the second session while the scores for the other two experience levels have decreased in the second session.

4.2 Conclusions
Since the symptom clusters of oculomotor and disorientation have received higher scores compared to nausea-related symptoms, future work may be directed towards improving experimental designs such as finding a suitable duration of exposure to 360° videos. Regarding HMD technologies, the results indicate that improvements in resolution may be needed to further reduce symptoms associated with oculomotor and disorientation such as fatigue, eyestrain, difficulty focusing, blurred vision, dizziness, and vertigo. As far as the usability of the M-ACR-HR method for 360° video quality assessment is concerned, it is suggested to allow for sufficiently long breaks between sessions for inexperienced participants to avoid that simulator sickness influences their quality rating.

4.3 Future Work
Our future work aims to correlate the SSQ scores obtained in this experiment with the galvanic skin responses, eye-tracker data, and head movement behavior that were also recorded for the participants with different prior experience with VR. Additionally, a variety of parametric tests are planned in our future work to deduce statistically significant relationships between SSQ scores, the aforementioned sensory data, and participants’ experience levels.

ACKNOWLEDGMENTS
This work was supported in part by The Knowledge Foundation, Sweden, through the ViaTecH project (Contract 20170056). The study has been granted ethical approval (Dnr. 2018/863). We thank all volunteers who generously shared their time to participate in the subjective experiment. Special thanks go to Francisco Lopez Luro and Diego Navarro for their advice given to the development of the test platform.

REFERENCES

[1] S. A. Balk, M. A. Bertola, and V. W. Inman. Simulator sickness questionnaire: Twenty years later. In Proc. Int. Driving Symp. on Human Factors in Driver Assessment, Training, and Vehicle Design, pp. 257–263. Lake George, NY, USA, Jun. 2013.
[2] Beihang University, School of Electronic and Information Engineering, Beijing, China. VQA-ODV, 2017 (accessed Apr. 27, 2019). https://github.com/Archer-Tatsu/VQA-ODV.
[3] S. Bouchard, G. Robillard, and P. Renaud. Revising the factor structure of the simulator sickness questionnaire. Annual Review of Cybertherapy and Telemedicine, 5(Summer):128–137, Jan. 2007.
[4] K. Brunnström, M. Sjöström, M. Imran, M. Pettersson, and M. Johanson. Quality of experience for a virtual reality simulator. In Proc. Human Vision and Electronic Imaging, pp. 1–9. Burlingame, CA, USA, Jan. 2018.
[5] M. Chessa, G. Maiello, A. Borsari, and P. J. Bex. The perceptual quality of the Oculus Rift for immersive virtual reality. Human–Computer Interaction, 34(1):51–82, Dec. 2016.
[6] J. Drexler. Identification of system design features that affect sickness in virtual environments. PhD thesis, University of Central Florida, Orlando, USA, 2006.
[7] M. Elwardy, H.-J. Zepernick, V. Sundstedt, and Y. Hu. Impact of Participants’ Experiences with Immersive Multimedia on 360° Video Quality Assessment. In Proc. IEEE Int. Conf. on Signal Processing and Commun. Systems, pp. 40–49. Gold Coast, Australia, Dec. 2019.
[8] FFmpeg. FFmpeg and H.265 Encoding Guide, 2018 (accessed Jul. 4, 2019).
[9] FFmpeg. H.264 Video Encoding Guide, 2018 (accessed Jul. 4, 2019).
[10] S. Fremerey, A. Singla, K. Meseberg, and A. Raake. AVtrack360: An open dataset and software recording people’s head rotations watching 360° videos on an HMD. In Proc. ACM Multimedia Systems Conf., pp. 403–408. Amsterdam, The Netherlands, Jun. 2018.
[11] J. Guna, G. Geršak, I. Humar, J. Song, J. Drnovšek, and M. Pogačnik. Influence of video content type on users’ perception and physiological response. Future Generation Computer Systems, 91:263–276, Feb. 2019.
[12] Huawei iLab. Cloud VR network solution white paper. Huawei Technologies Co., Ltd., 2018.
[13] IEEE Vehicular Technology Society. Defining 6G: Challenges and opportunities. IEEE Vehicular Technology Magazine, 14(3), Sep. 2019.
[14] Int. Telecomm. Union. Subjective video quality assessment methods for multimedia applications. Recommendation ITU-T P.910, Apr. 2008.
[15] S. Katsigiannis, R. Willis, and N. Ramzan. A QoE and simulator sickness evaluation of a smart-exercise-bike virtual reality system via user feedback and physiological signals. IEEE Trans. on Consumer Electronics, 65(1):119–127, Feb. 2019.
[16] R. S. Kellogg, R. S. Kennedy, and A. Graybiel. Motion sickness symptomatology of labyrinthine defective and normal subjects during zero gravity maneuvers. Aerospace Medicine, 36:315–318, Apr. 1965.
[17] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal. Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. The Int. J. of Aviation Psychology, 3(3):203–220, 1993.
[18] H. K. Kim, J. Park, Y. Choi, and M. Choe. Virtual reality sickness questionnaire (VRSQ): Motion sickness measurement index in a virtual reality environment. Applied Ergonomics, 69:66–73, May 2018.
[19] C. Li, M. Xu, X. Du, and Z. Wang. Bridge the gap between VQA and human behavior on omnidirectional video: A large-scale dataset and a deep learning model. In Proc. ACM Int. Conf. on Multimedia, pp. 932–940. Seoul, Republic of Korea, Oct. 2018.
[20] V. Sevinc and M. I. Berkman. Psychometric evaluation of simulator sickness questionnaire and its variants as a measure of cybersickness in consumer virtual environments. Applied Ergonomics, 82:Article 102958, Jan. 2020.
[21] A. Singla, S. Fremerey, W. Robitza, P. Lebreton, and A. Raake. Comparison of subjective quality evaluation for HEVC encoded omnidirectional videos at different bit-rates for UHD and FHD resolution. In Proc. Thematic Workshops of ACM Multimedia, pp. 511–519. Mountain View, CA, USA, Oct. 2017.
[22] A. Singla, S. Fremerey, W. Robitza, and A. Raake. Measuring and comparing QoE and simulator sickness of omnidirectional videos in different head mounted displays. In Proc. Int. Conf. on Quality of Multimedia Experience, pp. 1–6. Erfurt, Germany, May 2017.
[23] A. Singla, R. R. R. Rao, S. Göring, and A. Raake. Assessing media QoE, simulator sickness and presence for omnidirectional videos with different test protocols. In Proc. IEEE Conf. on Virtual Reality and 3D User Interfaces, pp. 1163–1164. Osaka, Japan, Mar. 2019.
[24] A. Singla, W. Robitza, and A. Raake. Comparison of subjective quality evaluation methods for omnidirectional videos with DSIS and modified ACR. In Proc. Human Vision and Electronic Imaging, pp. 1–6. Burlingame, CA, USA, Jan. 2018.
[25] A. Singla, W. Robitza, and A. Raake. Comparison of subjective quality test methods for omnidirectional video quality evaluation. In Proc. IEEE Int. Workshop on Multimedia Signal Processing, pp. 1–6. Kuala Lumpur, Malaysia, Sep. 2019.
[26] A. Szpak, S. C. Michalski, D. Saredakis, C. S. Chen, and T. Loetscher. Beyond feeling sick: The visual and cognitive aftereffects of virtual reality. IEEE Access, 7:130883–130892, Sep. 2019.
[27] Video Quality Experts Group. Test plan for evaluation of video quality models for use with high definition TV content, 2009 (accessed Jun. 28, 2019). http://www.vqeg.org.
[28] H.-J. Zepernick, M. Elwardy, Y. Hu, and V. Sundstedt. On the Number of Participants Needed for Subjective Quality Assessment of 360° Videos. In Proc. IEEE Int. Conf. on Signal Processing and Commun. Systems, pp. 50–59. Gold Coast, Australia, Dec. 2019.
[29] Y. Zhang, Y. Wang, F. Liu, Z. Liu, Y. Li, D. Yang, and Z. Chen. Subjective panoramic video quality assessment database for coding applications. IEEE Trans. on Broadcasting, 64(2):461–473, Jun. 2018.
[30] Y. Zhang, Y. Wang, F. Liu, Z. Liu, Y. Li, D. Yang, and Z. Chen. Subjective panoramic video quality assessment database for coding applications. IEEE Trans. on Broadcasting, 64(2):42–51, Jun. 2018.

A SAMPLE FRAMES OF THE 360° REFERENCE VIDEOS

Four scenes from the publicly available VQA-ODV database [2, 19] have been selected to span over different contents and to reduce the potential risk of simulator sickness. Sample frames of these scenes in equirectangular projection format are shown in Fig. 8. The reference videos produced from these scenes are 10 s long with a frame rate of 29.97 frames per second (fps). Because of the excessively high bitrates of the uncompressed 8K reference videos ranging from 352.35 Mbps to 844.65 Mbps, perceptual near to lossless encoding was performed according to the recommendations given in [8, 9]. For this purpose, we used the constant rate factor (CRF) option of the H.265 (HEVC) encoder as suggested by the Fast Forward Motion Picture Experts Group (FFmpeg) Version 4.1.3. This reduced the bitrates of the 8K reference videos to range between 136.68 Mbps and 226.79 Mbps for CRF = 10.

Additional reference videos with 6K, 4K, and 2K resolution were generated by downsampling the 8K reference video using the bicubic scaling algorithm followed by near to lossless encoding with CRF = 10. Apart from the resolutions obtained by downsampling the 8K reference videos, the optimal resolution (OR) of 3600 × 1800 pixels suggested in [30] for the HTC Vive Pro HMD was also generated. Sampling the reference video to the OR before encoding is thought to alleviate interference on the quality assessment that could be inflicted when sampling is left to the HMD. Finally, the libx265 tool was used to compress the reference videos of each resolution with QP = 22, 27, 32, 37, and 42. This resulted in a set of test videos with a wide range of quality levels to be used with the M-ACR-HR method.

[Figure 8 panels: (a) Alcatraz, (b) BloomingAppleOrchards, (c) FormationPace, (d) PandaBaseChengdu]

Figure 8: Sample frames of the four 360° video scenes in equirectangular projection from the sphere to the plane.
