Linköping University | Department of Computer and Information Science
Bachelor's thesis, 16 ECTS | Computer Science
2020 | LIU-IDA/LITH-EX-G--20/062--SE

Cloud Gaming: A QoE Study of Fast-paced Single-player and Multiplayer Games
Molnspelande: En QoE Studie med Fokus på Snabba Single-player och Multiplayer Spel

Sebastian Flinck Lindström
Markus Wetterberg

Supervisor: Niklas Carlsson
Examiner: Marcus Bendtsen


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Sebastian Flinck Lindström, Markus Wetterberg

Students in the 5-year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students create small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.

Abstract

Cloud computing is a way to deliver high-performance services to clients who would not usually be able to handle the computations on their own. These services rely on computers in the cloud performing the calculations and therefore ease the load on the client side. The goal of this thesis is to find what factors affect the players' experience and how they affect the player. To handle this, we have done a user-based study on the cloud gaming service Steam Link. The users get to play a fast-paced single-player game and a fast-paced multiplayer game against each other, while we collect data about their experiences. During the tests, we manipulated the players' network conditions, and afterward, they answered questions regarding their quality of experience. From the data collected, we can see that the frame age is the most important measurement for determining the players' in-game performance as well as the quality of experience. We are also able to see that, of the quality of service measurements manipulated, latency is the one affecting the player the most. Results from the multiplayer test indicate that we can equalize the skill difference between the players without affecting the players' quality of experience too much. These results are based on the advantage in ping time as well as in frame age. From a developer's perspective, this thesis emphasizes the need to take frame age into account and to try to manipulate the different parts of the frame age. The goal would be to ultimately lower the frame age and make the gaming experience more enjoyable for the player.

Acknowledgments

The authors would like to extend their sincere thanks to Associate Professor Niklas Carlsson at Linköping University for his guidance and for playing a decisive role in how this thesis turned out. Special thanks should also go to our fellow students Erica Christensen Weistrand, Sophie Ryrberg, Alexandra Goltsis, Carl Hermod Ekblad and Rami Latif for their invaluable insight and help.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
   1.1 Aim
   1.2 Research Questions
   1.3 Delimitations
   1.4 Findings
   1.5 Structure

2 Background and Related Work
   2.1 Background
   2.2 Related Work
   2.3 Comparison

3 Method

4 Single-player Results
   4.1 Scenario-based QoE Analysis
   4.2 Correlation Analysis
   4.3 Age-based QoE Analysis
   4.4 Other QoS-based QoE analysis
   4.5 Model-based Parameter Selection
   4.6 Accounting for player differences

5 Multiplayer Results
   5.1 The impact of Winning on Quality of Experience
   5.2 Correlation Analysis for the Combined Data
   5.3 Model-based Parameter Selection

6 Discussion
   6.1 Results
   6.2 Method
   6.3 The Work in a Wider Context

7 Conclusion

Bibliography

8 Appendix
   8.1 QoS measurements Top 3

List of Figures

2.1 Steam Link Streaming Diagram
3.1 Screenshot Geometry Wars: Retro Evolved
3.2 Frame Age per Frame
3.3 Screenshot Speedrunners
4.1 QoE Metrics Across all Scenarios
4.2 Score Compared to Baselines Across Scenarios
4.3 Correlation Matrix for Pearson's r
4.4 Correlation Matrix for Kendall's τ
4.5 Correlation Matrix for MIC
4.6 In-game Performance and QoE Depending on Average Frame Age with a bin-size of 50 ms
4.7 In-game Performance and QoE Depending on Average Frame Age with a bin-size of 100 ms
4.8 The Smoothed Average for In-game Performance and QoE compared to Different QoS
4.9 Model Occurrences of Predictors
4.10 In-game Performance and QoE for two Players Depending on Average Frame Age with a bin-size of 50 ms
4.11 Individual Model Occurrences of Predictors
5.1 Win Rate Compared to the Advantage in Ping Time for Player 1
5.2 Win Rate Compared to the Advantage in Average Frame Age for Player 1
5.3 General opinion as ping changes
5.4 Graphics score as ping changes
5.5 Interactive Score as ping changes
5.6 General opinion as the average frame age changes
5.7 Graphics score as the average frame age changes
5.8 Interactive Score as the average frame age changes
5.9 QoE Depending on Winner
5.10 Correlation Matrix for Pearson's r
5.11 Correlation Matrix for Kendall's τ
5.12 Correlation Matrix for MIC
5.13 Model Occurrences of Predictors

List of Tables

3.1 Data Collected
3.2 Test Scenarios
3.3 The Lag Factor x Depending on the Difference in Score
4.1 Average Performance and QoE for Test Scenarios
4.2 Top 7 Best Pearson Correlations for QoE and Performance
4.3 Top 7 Best Kendall Correlations for QoE and Performance
4.4 Top 7 Best MIC Correlations for QoE and Performance
4.5 Model including categorical predictor
5.1 Models including winning categorical predictor
5.2 Top 7 Best Pearson Correlations for QoE and Performance
5.3 Top 7 Best Kendall Correlations for QoE and Performance
5.4 Top 7 Best MIC Correlations for QoE and Performance
6.1 Percentage of p > 0.05 for different correlation coefficients
6.2 p-values for Different Correlations Rounded to 4 Significant Figures (3 for MIC)
6.3 Cronbach's α for Different Games
8.1 Top 3 Best Pearson Correlations for General Opinion Score
8.2 Top 3 Best Pearson Correlations for Graphical Score
8.3 Top 3 Best Pearson Correlations for Interactive Score
8.4 Top 3 Best Pearson Correlations for In-game Performance relative to the Baseline
8.5 Top 3 Best Kendall Correlations for General Opinion Score
8.6 Top 3 Best Kendall Correlations for Graphical Score
8.7 Top 3 Best Kendall Correlations for Interactive Score
8.8 Top 3 Best Kendall Correlations for In-game Performance relative to the Baseline
8.9 Top 3 Best MIC Correlations for General Opinion Score
8.10 Top 3 Best MIC Correlations for Graphical Score
8.11 Top 3 Best MIC Correlations for Interactive Score
8.12 Top 3 Best MIC Correlations for In-game Performance relative to the Baseline

1 Introduction

Modern computer games are becoming more and more graphically demanding, requiring the user to have the latest hardware to play them. Meanwhile, our devices are only getting smaller and thinner, while the demand for performance keeps growing. Since the devices struggle to provide the required performance, a solution to the problem is to offload calculations to a computer in the cloud. This could open possibilities for services such as mobile gaming, where the heavy graphical calculations are done on a server, and the client device would only handle the inputs. The cloud computing/gaming concept is that the server computer leverages its computing capabilities to both play the game and video stream the gameplay to the client in real time. The client, in turn, sends its inputs back to the streaming computer. This would have positive effects for both the players and the developers. The reduction in demand for performance on the players' end would result in cheaper devices and longer battery life. The benefits would also extend to home entertainment, where the player would only require a "thin client" to play the latest games. This would also have the significant benefit of reducing piracy for the developer since the code would only be stored on the server the game is running on [29].

1.1 Aim

This thesis aims to test what factors affect the experience of playing streamed games. We will try to find what factors have the most significant impact and how they might be used to predict how good or bad the experience will be. While playing, we will record different Quality of Service (QoS) measurements, in particular the Average Frame Age, which is the time it takes for a video frame to be displayed to the user. We will also be measuring Quality of Experience (QoE) and in-game performance. During the test, we will manipulate the network conditions to affect the users' quality of experience [29]. The idea is to artificially introduce latency and packet loss to the network to see how this affects the users' in-game performance and experience.


1.2 Research Questions

In this thesis, we answer the following three questions.

1. How do different QoS measurements, especially Average Frame Age, affect the users' QoE?

2. How do different QoS measurements affect the user’s in-game performance?

3. Is it possible to compensate for unequal skill levels in a multiplayer session by introducing latency to the players, without affecting the QoE noticeably?

1.3 Delimitations

In our tests, we have focused only on the cloud gaming platform Steam Link from Valve. Even though there are plenty of similar services, the impact of changing different network conditions should be similar across them because the changes primarily affect the transfer between the units. A point in favor of Steam Link is that it provides detailed log files. In the thesis, we performed the game tests with university students. This limits our volunteer players to a particular age group and background. We have also chosen to limit our test to a small group of people for whom we collected and analyzed a large number of play sessions; in particular, it will only be us, the authors, who complete the tests. This allows us to account for individual skill differences between the players in our analysis but requires a significant time commitment from our volunteer players.

1.4 Findings

In this thesis, we have shown the correlation between the different QoS and QoE measurements and found an especially strong correlation between the QoE and the average frame age. This carries over to trying to model the QoE, where frame age proves to be one of the best predictors for all QoE metrics. When expanding this to data from two games simultaneously, we still see the same strong correlation between frame age and the QoE metrics. However, the model predictions are not as simple anymore. When trying to construct a model for both games, frame age is still among the most common predictors but is joined by several other predictors in order to get a good predictive model. Regarding equalizing multiplayer games, we have found it possible to bridge the gap for a specific skill level without affecting the enjoyment too much.

1.5 Structure

The rest of the thesis is organized as follows. Chapter 2 provides a background to the work as well as a review of related work. In Chapter 3, we introduce the method for the single-player and multiplayer tests. In Chapter 4, we present our findings from the user study and the polled results for the single-player tests. Chapter 5 covers the same analyses for the multiplayer tests as well as a combination of the results. In Chapter 6, our results and method are analyzed, and finally, we present our conclusion in Chapter 7.

2 Background and Related Work

The following chapter presents a background for the thesis and related work that is of interest.

2.1 Background

Steam Link

Steam Link is a software service providing the ability to stream content of a Steam library from one computer to another, developed and released by Valve in 2015 [25]. It was initially a hardware device that the user would connect to their network and then be able to stream the games in their Steam library to the device. However, the hardware has since been discontinued to pave the way for the software version of Steam Link, which is used in this thesis. This works by having either the Steam app downloaded on a computer or the Steam Link app on an Android or iOS device. Furthermore, it works by playing the game on the server computer and streaming the game, sending encoded audio and video, over to the desired device. In turn, the device sends back the inputs to the process that runs the game on the host computer. The idea is to run the game on a high-powered gaming computer while streaming the video feed to another unit.

Figure 2.1: Steam Link Streaming Diagram [25]

As can be seen from Figure 2.1, the inputs from the user go through the network to the streaming computer, which handles the inputs, game rendering, capture, and encoding. After the streaming computer encodes the captured frame, it sends it to the client, which needs to decode it before displaying it. The small box between "input" and "decode" represents the physical Steam Link hardware, which in our case is the client's computer. The stream is encoded and decoded because the frames sent from the server to the client are too big for the network to handle and, therefore, need to be made smaller by encoding them. When they arrive at the client, the client needs to decode them so that they can be displayed.

Interesting Network Properties

Some network properties used in our thesis might not be immediately apparent. The property frame age is often used, which is the time it takes for a frame to be displayed, starting from the time it is rendered on the streaming computer to when it is presented to the user. Unlike ping, which is the time it takes to send a practically empty packet to the host computer and to get an answer, frame age includes the time to capture, encode, transfer, decode and render the frame. Ping is also a great tool to compute the latency in the transfer between the computers. Latency is the delay from one action to the desired effect. Another measurement used in the thesis is packet loss. Packet loss happens when a packet of data is lost in the transfer to the client. It is typically caused by errors in the transmission or network congestion and is often expressed as the percentage of packets dropped.
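Put as a rough formula of our own (an approximation that ignores any queuing or waiting between the stages), the frame age of a frame is roughly the sum of its per-stage times:

$T_{\mathrm{frame\,age}} \approx T_{\mathrm{capture}} + T_{\mathrm{encode}} + T_{\mathrm{transfer}} + T_{\mathrm{decode}} + T_{\mathrm{render}}$

This is also why the frame age is typically larger than the ping, which only covers the network round trip.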

Plots and Correlation

When presenting our results, we have sometimes chosen to present them in the form of box plots. We have also decided to present correlation matrices with different correlation coefficients, which needs some background. The box plot is a statistical tool to display the results in the form of boxes. The box consists of a middle section and two smaller boxes going out from it in opposite directions. At the end of those boxes are lines which end in a T. The middle of the box is the median of the data, and the two smaller boxes span the interquartile range, meaning the top box reaches the 75th percentile while the bottom box reaches the 25th percentile. The lines going out of the boxes show the maximum and minimum values. In our plots, we also have an orange dot, which represents the mean of the data. Correlation is the statistical relationship between two variables; in our case, we will measure how closely two variables are related. One thing to keep in mind is that correlation does not imply causation. It is important to know that just because two variables are correlated, they do not need to have a cause and effect relationship. The correlation coefficients we will be using are Pearson's r [17], Kendall's τ [14] and the Maximal Information Coefficient (MIC) [20]. Pearson's r, or the Pearson product-moment correlation coefficient, is one of the most common correlation coefficients and is a linear correlation coefficient. Kendall's rank correlation coefficient (τ), on the other hand, is a rank correlation coefficient which concerns itself with whether an increase in variable X leads to an increase in variable Y, no matter the size of the change, whereas r takes the size of the change into account. Both r and τ have values between −1 and 1, where −1 means perfect negative correlation, i.e., an increase in X always leads to a decrease in Y, and vice versa for 1. A value of 0 means that there is no correlation between the variables. The MIC measures how strong the linear or non-linear relationship between the two variables is. One difference from r and τ is that the MIC has a value from 0 to 1 instead of −1 to 1, so we only get the strength of the correlation, not the direction or type of the relationship. The probability value, or p-value, is the probability of finding results at least as extreme as the actually observed ones. It is often used as evidence against the null hypothesis that there is no relation between two measured variables.
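As a minimal sketch of how such coefficients can be computed (an illustration, not our exact analysis pipeline; the file and column names are hypothetical), using SciPy:

    # Sketch: Pearson's r and Kendall's tau (with p-values) for one QoS/QoE pair.
    # File and column names are hypothetical examples.
    import pandas as pd
    from scipy.stats import pearsonr, kendalltau

    df = pd.read_csv("sessions.csv")        # one row per play session
    x = df["avg_frame_age"]                 # QoS metric
    y = df["general_opinion"]               # QoE score on the 1-7 scale

    r, p_r = pearsonr(x, y)                 # linear correlation, -1 to 1
    tau, p_tau = kendalltau(x, y)           # rank correlation, -1 to 1
    print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Kendall tau = {tau:.2f} (p = {p_tau:.3f})")
    # The MIC (0 to 1) is usually computed with a separate library such as minepy (not shown).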


Models and Parameter Selection

To predict how a variable affects the result, we can try to fit a mathematical model to the measured values. What is interesting is the use of as few variables as possible while still retaining a good model. However, to do that, we need some way to measure how good the model is. Two ways to do that are the R² value and Mallow's Cp. R², or the coefficient of determination, is a measure of how close the predictive model is to the measured values. R² is defined as R² = explained variation / total variation [19], which gives us a percentage from 0 % to 100 % of how well the model can explain variances in the measured data. Mallow's Cp is used to compare the precision and bias of the model with variables removed against the full model with all variables [27]. A smaller Cp indicates a model with higher precision. A good rule is that for a relatively precise and unbiased model, the value of Cp should be less than the number of predictors plus a constant, usually one. Stepwise regression is an automated way to select the predictor variables. We add or remove one variable at a time depending on whether its p-value is below or above a specified α-value. The resulting model is then evaluated with, for example, the R² value to see if it is an improvement or not. This continues until we cannot improve the model anymore. It is important to note that the final model is not necessarily a good model, but it gives a good hint.
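For reference, the standard definition of Mallow's Cp is

$C_p = \frac{SSE_p}{MSE_{\mathrm{full}}} - (n - 2p)$

where $SSE_p$ is the residual sum of squares of the candidate model with $p$ parameters (including the intercept), $MSE_{\mathrm{full}}$ is the mean squared error of the full model, and $n$ is the number of observations. A model with little bias has $C_p$ close to $p$, which is what motivates the rule of thumb above.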

2.2 Related Work

In the field of cloud computing, there has been plenty of research done. Several studies have examined the effects of latency on gaming, either regular gaming or cloud-based gaming. Others have focused on the video encoding of cloud gaming to deliver the highest quality of gameplay to the user [12, 23]. One study by Shea et al. [22] measured the interaction delay and the responsiveness of the cloud-based platform when adding latency, whereas another study by Chen et al. [1] studied the effect of latency, packet loss, and bitrate on frame rates and graphical quality for their cloud-based platforms. Another study, on regular gaming, by Raaen and Petlund [18] studies how much delay is added locally, where the results show that the local delay is so large that one has to keep it in mind when studying the effects of latency on games. However, these studies did not measure anything concerning the players' in-game performance under the added network changes.

User Studies on Cloud Gaming and What to Expect

Jarschel et al. [13] do this through a subjective user study, investigating the quality of experience reported by their users. One of the conclusions reached by Jarschel et al. [13] is that for the more fast-paced games, the user is much more tolerant to packet loss than in other, slower games such as roleplaying games. They also conclude that the delay is the decisive factor for the players when playing fast-paced games: "Players of fast games would rather accept higher packet loss rates than they would tolerate high delays...". Another approach is the model developed by Xu et al. [28] that shows the relationship between bit rate and perceived quality. They also conclude that the perceived quality depends on the type of content, in their case the motion intensity of the stream. There are also other studies focusing on cloud-based platforms, such as the study by Chen et al. [2], which shows the differences between the platforms in latency components. There is also Huang et al. [10], who made an open cloud gaming system, outperforming the ones displayed by Chen et al. [2]. One study by Yates et al. [29] focuses on optimizing the age of information in a cloud gaming system. They specifically mention in the conclusion that "...further study is needed to determine whether the average frame age is an effective measure of user-perceived QoE...", which is closely related to what this thesis sets out to do.


Claypool and Finkel [4] provide a study on "the effects of latency on player performance...", focusing on arcade-type games like "Crazy Taxi", by measuring the points the player got after a one-minute round as well as subjective opinions of the quality of experience. They concluded that these games are as sensitive as the FPS genre, "... the most sensitive class of games to latency...", and that the "...user performance degrad[ed] up to 25 % with each 100 milliseconds of latency". These tests are useful to compare with our tests when trying to measure the player's in-game performance. It is also a great example of what to expect from the outcome of our study, where Claypool and Finkel [4] concluded that the players' performance degrades linearly with the network conditions. Other studies, like Dick et al. [7], also show the effect of delay on the user, claiming that a Round Trip Time of less than 150 ms is required for a good quality gaming experience in First Person Shooters. Clincy and Wilgor [5] make a study similar to what we aim to achieve, studying the adverse effects of network conditions on the user experience. They look closely at network effects such as packet loss and latency and the role they play in determining the user's QoE. They come to the interesting conclusion that a packet loss of only 1 % impacts the player to the point where the experience is almost unplayable. This would be a great study to base our network conditions on.

Mean Opinion Score

For conducting the test with the players, Hsu et al. [9] have an approach in which they conduct an online survey measuring the users' Mean Opinion Score (MOS). The users play a game for 2-4 minutes after accepting the survey; after the gameplay is done, the player is asked to answer a questionnaire. The users are asked to play at least one of three games using eight different configurations. The configurations include different kinds of bitrate, frame rate and more. In the questionnaire, users are asked to score the game's graphical quality, interactive quality, and overall score from 1 to 5, where interactive quality is how responsive the game is. According to Miller [16], a person can generally distinguish between 7 ± 2 different levels when rating something. Any benefit from a higher resolution scale is lost in the noise stemming from humans' limited information processing capabilities. Streijl et al. [26] have compiled an excellent summary where they go through the strengths and weaknesses of using MOS. One important thing to note is that the subjective MOS "is not a precise number, but a statistical measurement." These studies should be a useful guide of what to expect from the MOS.

2.3 Comparison

What sets this work apart from the others is our focus on both QoE and in-game performance; while others have measured one of them, we measure both. Another difference is our collection of QoS metrics from the service itself in order to get accurate data and not rely solely on the values set artificially by us. A significant difference is also the focus on competitive play and how to equalize gameplay. Another difference in comparison to the mentioned studies is our result for the latency degradation, where for every 100 ms our results drop 50 %, which is double the percentage mentioned in Claypool and Claypool [3]. The study by Clincy and Wilgor [5] does not compare well with our results regarding packet loss either. They mention that a packet loss of 1 % makes the experience almost unplayable, while our results would say otherwise, but this could be explained by the earlier-mentioned findings regarding fast-paced games.

3 Method

In this chapter, we describe our method for testing the service. We go through how the testing was conducted, which games were used, how long they were played, and what questions were asked. The method is split into single-player and multiplayer tests.

Single-player tests

The single-player tests consist of participants taking part in a gaming session with a client computer, a laptop on which the participant played, and a host computer running the game over Steam Link. The Steam client is installed on both the server and the client. The client runs on a mid-power laptop with integrated graphics (Intel i5 8250U, 8 GB RAM) while the server runs on a high-powered desktop computer (Intel i7 8700K, Nvidia GTX 1080 Ti, 16 GB RAM). The participants were given time to familiarise themselves with the game, Geometry Wars: Retro Evolved [8]. Geometry Wars is a 2D game where the player plays as a small ship inside a larger box where enemies of different sizes and abilities spawn. The player can move the ship inside the perimeter of the box and shoot the enemies, earning the player points. The player starts the game with three lives and three bombs; the bombs clear the screen of enemies but earn the player no points. Every 10 000 points, the player's weapons are changed to be either faster, slower, broader, or more narrow. Figure 3.1 shows a screenshot taken during one of our tests displaying the game Geometry Wars: Retro Evolved. After familiarising themselves, the participants played the game until they ran out of lives, or for a maximum of 5 minutes, as a baseline. The time limit is to make sure that the players' experience is still relevant at the end of the test [21]. The baseline was repeated at the end of the test, and an average of the two baselines was used for comparison on how well the player did. Steam Link then gives us a summary of each session, providing data such as average bitrate for both the client and the server, average ping, average bandwidth, and much more. To gain more precise data, the participant's game is continually polled through Steam Link for a video trace. This video trace contains detailed frame-by-frame information for the last 10 seconds as well as network statistics every second. These video traces are saved as zip files locally on the server computer. The data we are collecting for every frame is shown in Table 3.1. In addition to the raw per-frame data, we also calculated the mean, median, max, min values, and the standard deviation for every scenario in Table 3.2.

Figure 3.1: Screenshot Geometry Wars: Retro Evolved

Data                      Comment
Ping                      The round trip time between client and server
Server Bandwidth          The data rate from server to client
Client Bandwidth          The data rate from client to server
Link Bandwidth            The maximum available bandwidth
Packet loss               Percentage of packets lost
Frame Size                The size of the received frame
Frame Age                 The age of the frame when displayed to the user
Capture time              The time it took to capture the frame
Encode time               The time it took to encode the frame
Transfer time             The time it took to transfer the frame from server to client
Decode time               The time it took to decode the frame on the client
Dropped slow network      Frames dropped because the transfer took too long (%)
Dropped lost network      Frames dropped because they were lost during transfer (%)
Dropped slow decode       Frames dropped because the decoding took too long (%)
Dropped corrupt decode    Frames dropped because of corruption during decoding (%)
Dropped late              Frames dropped because they took too long to display (%)
Dropped reset             Unknown (%)
Dropped total             Frames dropped in total (%)

Table 3.1: Data Collected
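As a small sketch of the per-scenario aggregation described above (the column names loosely mirror Table 3.1 but are hypothetical; the actual video traces are Steam Link zip files with their own format):

    # Sketch: aggregate per-frame trace data into per-scenario statistics.
    import pandas as pd

    frames = pd.read_csv("frame_trace.csv")   # hypothetical flattened trace, one row per frame
    stats = (frames
             .groupby("scenario")[["ping", "frame_age", "transfer_time", "packet_loss"]]
             .agg(["mean", "median", "min", "max", "std"]))
    print(stats.loc["L4"])                    # e.g. the statistics for scenario L4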

Figure 3.2: Frame Age per Frame

It is essential to make sure that the data collection does not affect the test to a significant degree. In order to assess the impact of the data collection, we have traced the frame age per frame during data capture for one of the baseline tests in Figure 3.2. Here we can see that while the collection has an impact, the frame age is around 90 ms for only two frames out of 1194, which is about 0.17 %, and the average remains around 23.58 ms, so the impact should be negligible. When the participant is done with the first baseline test, they go through the different scenarios shown in Table 3.2, in random order, where every scenario comes with different network conditions. Table 3.2 shows each scenario's nickname, the additional latency for that scenario, the additional packet loss for that scenario, and its full name. After the participant is done with a scenario, their score is recorded, and they are asked to rate their experience of the game. The score is then compared to the average baseline to get a proper measurement of how well the player performed compared to optimal network conditions. As mentioned before, the tests always start and end with a baseline scenario with no added latency or packet loss. We use the application Clumsy [6] to introduce artificial network conditions according to Table 3.2. For the constructed ranking system, we use the MOS, which ranks the users' subjective opinion on a 7-point scale, where 7 is the best and 1 is the worst. The players are asked to answer three questions, ranking the graphical quality, the quality of the interactivity, and the overall score.
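Written out (our reading of the comparison; the scores are per player and per test), the normalized score used in the results is

$\text{Norm. Score} = \dfrac{\text{Score}_{\text{scenario}}}{(\text{Score}_{BL1} + \text{Score}_{BL2})/2}$

so a value above 1 means the player scored better than their own baseline average.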

Scenario   Additional Latency   Additional Packet Loss   Comments
BL1        0 ms                 0 %                      Baseline
L1         25 ms                0 %                      Latency 1
L2         50 ms                0 %                      Latency 2
L3         75 ms                0 %                      Latency 3
L4         100 ms               0 %                      Latency 4
L5         125 ms               0 %                      Latency 5
L6         150 ms               0 %                      Latency 6
L7         175 ms               0 %                      Latency 7
L8         200 ms               0 %                      Latency 8
P1         0 ms                 0.25 %                   Packet loss 1
P2         0 ms                 0.5 %                    Packet loss 2
P3         0 ms                 0.75 %                   Packet loss 3
P4         0 ms                 1.0 %                    Packet loss 4
P5         0 ms                 1.25 %                   Packet loss 5
P6         0 ms                 1.5 %                    Packet loss 6
P7         0 ms                 1.75 %                   Packet loss 7
P8         0 ms                 2.0 %                    Packet loss 8
PL1        100 ms               0.5 %                    Packet loss & latency 1
PL2        100 ms               1.0 %                    Packet loss & latency 2
PL3        150 ms               1.0 %                    Packet loss & latency 3
PL4        200 ms               1.0 %                    Packet loss & latency 4
PL5        100 ms               1.5 %                    Packet loss & latency 5
PL6        100 ms               2.0 %                    Packet loss & latency 6
PL7        50 ms                1.0 %                    Packet loss & latency 7
BL2        0 ms                 0 %                      Baseline 2

Table 3.2: Test Scenarios

Multiplayer tests

The multiplayer tests were done similarly to the single-player tests. However, instead of having one participant, two participants competed against one another. Both participants played on a client consisting of a mid-power laptop with integrated graphics (Intel i5 8250U, 8 GB RAM). The participants played using Steam Link for the cloud computing, with the server consisting of high-powered desktop computers (Intel i7 8700K, Nvidia GTX 1080 Ti, 16 GB RAM). The participants were given 5 minutes to get familiar with the game, Speedrunners [24], and its controls. Speedrunners is a multiplayer racing game whose goal is to run a looping course faster than the opponents, where players get eliminated if they fall too far behind. At their disposal, the players have a hook-shot that can hook onto the ceiling in order to traverse gaps and gain speed, as well as speed-boosts scattered around the map and different kinds of weapons. A match is played first to three wins.

Figure 3.3: Screenshot Speedrunners

In Figure 3.3, we can see a screenshot from one of our tests. We can see the players racing along the course with player 2 (dark blue) in a slight lead over player 1 (light blue). The line going from player 1 to the roof is the hook-shot, while the blue ball above player 2 is a collectible item. The bars under the players' icons at the top of the screen show how much boost the players have left, while the figure to the left of player 2's icon is the current item they have. Both players start with 50 ms of additional latency. After each match, the person who won the match gets an additional delay, multiplying the current lag with a factor f = x^n, where x depends on the difference in the score at the end of the match (best of 5), see Table 3.3, and n is the number of consecutive wins the winning player currently has. The losing player then has their artificial latency divided by the same factor f. As in the single-player tests, the service is continuously polled for a video trace. After each round, the participants were asked the same questions as for the single-player tests: the overall score, the graphical score, and the interactive score.

Winning score   Losing score   x
3               0              1.20
3               1              1.15
3               2              1.10

Table 3.3: The Lag Factor x Depending on the Difference in Score

4 Single-player Results

In this chapter, we present our results, mainly in the form of graphs showing the data we deem most interesting to analyze. This chapter only contains data from the single-player tests and is followed by a chapter presenting our findings from the multiplayer tests. The chapter starts with an analysis of the QoE scores compared to the different scenarios presented in Table 3.2. After that, there is a correlation analysis of the QoE and QoS measurements, followed by an analysis of the frame age and other QoS metrics, and later a model-based parameter selection section. At the end, there is a section accounting for the differences between the players.

4.1 Scenario-based QoE Analysis

All the metrics taken into consideration are captured by Steam Link and compiled by us. Table 4.1 displays the average values for the different scenarios for the different QoE metrics, as well as the raw in-game score and the score normalized relative to the baseline. The QoE parameters opinion, graphics, and interactive have a scale of 1-7, where 1 is the worst and 7 is the best. The in-game performance is the score the player got at the end of their game, after they either lost three lives or their time ran out. It is interesting to point out the downward trend in both in-game performance and QoE as we progress through the scenarios. The latency scenarios L1-8 never reach under 3, and the packet loss scenarios P1-8 never reach under 4. It is only when we reach the combined scenarios PL1-7 that we can witness a score under 3, where PL4, the combination with the highest latency and an average packet loss, scores the lowest on all QoE measurements. It is also the scenario with the worst average in-game performance at 7,665, not even reaching the first milestone in the game at 10,000 points. It is important to note that the last scenario, PL7, is not inherently the scenario with the worst network conditions. PL7 has 50 ms latency and 1.0 % packet loss as per Table 3.2, which would explain the sudden increase in in-game performance and enjoyment.

Scenario   Performance   Overall opinion   Graphics   Interactive   Norm. Score
BL1        77,082        6.50              6.36       6.50          N/A
L1         76,266        6.14              5.93       6.29          1.01
L2         61,196        5.64              5.50       5.50          0.96
L3         52,662        4.79              4.64       4.71          0.91
L4         31,152        4.64              5.00       4.57          0.48
L5         36,096        4.29              4.36       4.43          0.58
L6         27,798        4.00              4.50       3.71          0.49
L7         24,920        3.71              4.64       3.50          0.30
L8         19,150        3.21              3.71       3.00          0.29
P1         46,695        5.14              5.21       5.21          0.63
P2         69,821        5.36              5.50       5.36          1.07
P3         58,513        5.07              5.21       5.07          0.86
P4         70,678        4.93              5.00       4.79          0.99
P5         63,031        5.14              5.14       5.07          0.97
P6         71,740        5.14              5.21       5.07          1.07
P7         70,538        5.21              5.57       5.14          1.09
P8         72,652        4.85              4.93       4.72          0.95
PL1        26,432        2.93              3.85       2.79          0.37
PL2        23,208        2.71              3.85       2.21          0.36
PL3        15,584        2.43              3.21       2.14          0.17
PL4        7,665         1.64              2.14       1.36          0.10
PL5        21,628        2.79              3.29       2.64          0.28
PL6        14,430        2.71              3.29       2.21          0.18
PL7        45,646        4.07              4.29       3.79          0.68
BL2        84,007        6.35              6.21       6.57          N/A

Table 4.1: Average Performance and QoE for Test Scenarios

To better visualize the downward trend through the scenarios, Figure 4.1 shows box plots of the different QoE scores throughout all the scenarios. The vertical axis shows the respective QoE score, and the horizontal axis shows the scenarios: starting from the left, we have the baseline, latency, packet loss, and combined scenarios, together with the respective network conditions for each scenario. In this plot, we can see the downward trend, with the lowest score for the scenario with 200 ms delay and 1.0 % packet loss rate (i.e., PL4). We are also more clearly able to see the higher overall opinion in the packet loss scenarios compared to the latency and the combination scenarios. It is also interesting to note that the interactive score is generally lower than the graphical score. This would indicate that the most important thing for the player is the interaction with the game, which is not surprising since the game is fast-paced and the user is more focused on surviving than on observing great visuals.

Figure 4.1: QoE Metrics Across all Scenarios: (a) General Opinion Score, (b) Graphics Score, (c) Interactive Score

Figure 4.2 shows the players' normalized score relative to the average of the two baseline tests across the different scenarios in the form of a box plot, which shows a similar result. Its vertical axis shows the normalized score relative to the baseline, while its horizontal axis is the same as for Figure 4.1c, displaying the scenarios. When measuring performance across the different scenarios, we see a steady decline in performance across the latency scenarios, a similar but not quite as steady decline for the packet loss scenarios, and overall poor in-game performance for the scenarios with both. We also see a higher impact when the latency is comparatively higher than the packet loss. Here it is also essential, as mentioned before, to note that the last scenario, PL7, is not the scenario with the worst conditions, but the one with 50 ms latency and 1.0 % packet loss. This would explain the sudden increase in both the enjoyment the player has during the scenario and the in-game performance. These results would indicate that latency has a more significant impact on the players' opinion and in-game performance compared to packet loss, as seen by the more prominent downward trend in the latency category, as well as the fact that the lowest average score is the one with the highest latency and packet loss combination.

Figure 4.2: Score Compared to Baselines Across Scenarios


Figure 4.3: Correlation Matrix for Pearson’s r

Rank   Pearson
1      Avg. Frame Age & Gen. Op. Score
2      Avg. Frame Age & Inter. Score
3      St. Dev. Frame Age & Inter. Score
4      St. Dev. Frame Age & Gen. Op. Score
5      Avg. Ping Time & Gen. Op. Score
6      Avg. Ping Time & Inter. Score
7      Avg. Frame Age & Graph. Score

Table 4.2: Top 7 Best Pearson Correlations for QoE and Performance

4.2 Correlation Analysis

For a more in-depth look at how the metrics can affect the experience, we have calculated the correlation between some of the available variables. The correlation coefficients we have calculated are Pearson's r, Kendall's τ, and the Maximal Information Coefficient [20]. The results are presented in Figures 4.3 to 4.5, where the numbers inside the boxes are the correlations, 0 means no correlation, and a larger absolute value means a stronger correlation. Looking at the matrices, we see a similar result across all three coefficients, with the average frame age having the strongest correlation for all three opinion scores as well as for in-game performance. Tables 4.2 to 4.4 show the top 7 strongest correlations between a QoS metric (first in each correlation pair) and a QoE or performance metric (second in each correlation pair). Here, we show results using the Pearson, Kendall, and MIC coefficients. We note that there is not much of a difference between the Pearson and Kendall rankings until we arrive at rank five and six, where the Pearson coefficient has average ping time while the Kendall has average transfer time. Worth noting is that both of these are strongly correlated with each other. While the MIC ranking differs slightly, it still retains the average frame age at the top and shares most of the correlation pairs with the two others. The things that stand out are the exclusion of the correlation of the standard deviation of frame age with the interactivity and general opinion scores, which is rank 3 and 4 for Pearson and Kendall, respectively, and the inclusion of the correlation between the standard deviation of packet loss and the general opinion score. It is interesting to note that the average frame age is in the top two of the seven measurements for all coefficients, and in five of all the presented measurements for both Pearson's and Kendall's. Interestingly, there are no combinations for the MIC which include the graphical score. These results would indicate that the frame age is a prominent measurement for determining the players' QoE. The next section will tackle the interesting topic of showing how different average frame age values compare to the in-game performance and QoE measurements.

Figure 4.4: Correlation Matrix for Kendall's τ

Rank   Kendall
1      Avg. Frame Age & Gen. Op. Score
2      Avg. Frame Age & Inter. Score
3      St. Dev. Frame Age & Inter. Score
4      St. Dev. Frame Age & Gen. Op. Score
5      Avg. Transfer Time & Gen. Op. Score
6      Avg. Transfer Time & Inter. Score
7      Avg. Frame Age & Graph. Score

Table 4.3: Top 7 Best Kendall Correlations for QoE and Performance


Figure 4.5: Correlation Matrix for MIC

Rank   MIC
1      Avg. Frame Age & Gen. Op. Score
2      Avg. Frame Age & Inter. Score
3      Avg. Transfer Time & Inter. Score
4      Avg. Ping Time & Gen. Op. Score
5      Avg. Transfer Time & Gen. Op. Score
6      Avg. Ping Time & Inter. Score
7      St. Dev. Packet Loss & Gen. Op. Score

Table 4.4: Top 7 Best MIC Correlations for QoE and Performance

4.3 Age-based QoE Analysis

In this section, we take a closer look at how the average frame age affected the QoE measurements and in-game performance. Figure 4.6 presents both the in-game performance and the QoE compared to different average frame ages. The box plots show the different frame ages in bins along the horizontal axis, and the corresponding QoE or in-game performance for that frame age along the vertical axis. The numbers in Figure 4.6d present the highest in-game performance for that particular frame age. We can see a downward trend as usual, albeit not as prominent as in other plots. One interesting aspect is the similarity between the opinion score and the interactive score, unlike the graphical score, which is not as similar. Another interesting aspect is the quick drop at around 850 ms, where the QoE scores are around three, and the median mostly stays low after that.


Figure 4.6: In-game Performance and QoE Depending on Average Frame Age with a bin-size of 50 ms: (a) Opinion Score, (b) Graphical Score, (c) Interactive Score, (d) In-game Performance

For the in-game performance, we see a similar downward trend, and when we reach 1000 ms, we reach a low point. It is interesting to note the generally good in-game performance from 100 ms to at least 550 ms. These results would further indicate that the general opinion score has a good correlation with the interactive score, and that the interactive score is the most important measurement for the player. The in-game performance would suggest that there is a certain threshold for the players at 400-600 ms where, under the threshold, the player still performs at their best. Figures 4.7a to 4.7d present the different QoE measurements and the in-game performance compared to the average frame age, but this time with a bin-size of 100 ms. This is to counteract the problems that can stem from a small sample size in some of the bins. As we can see, all the plots have a general downward trend. We can also see that the opinion score and interactive score share a similar appearance. The graphical score has a gentler slope compared to the rest of the QoE measurements, another indication that frame age affects the general opinion and interactive scores more than the graphical score. The reason the data at the end of the plot goes up again is that we might lack sufficient measurements despite the larger bin size.
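For illustration, the binning behind box plots like Figure 4.6 can be reproduced along the following lines (a sketch with hypothetical column names, not our exact plotting code):

    # Sketch: bin sessions by average frame age (50 ms bins) and summarize a QoE score per bin.
    import pandas as pd

    df = pd.read_csv("sessions.csv")
    bins = range(0, 1200, 50)                               # 0-50, 50-100, ... ms
    df["age_bin"] = pd.cut(df["avg_frame_age"], bins=bins, right=False)
    summary = df.groupby("age_bin", observed=True)["general_opinion"].describe()
    print(summary[["count", "25%", "50%", "75%"]])          # per-bin quartiles, as in a box plot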

4.4 Other QoS-based QoE analysis

Beyond the average frame age, several other QoS measurements are worth analyzing against the QoE scores. Other vital measurements are the average ping time, average transfer time, average packet loss, and average server bandwidth. Figure 4.8 shows a grid of plots where the rows are the different QoS measurements, from top to bottom: the average ping time, average transfer time, average server bandwidth and average packet loss. Meanwhile, the columns display the QoE measurements as well as the average in-game performance compared to the baseline. Each plot displays the respective smoothed average, with a window length of 10, from the lowest QoS value to its highest.


Figure 4.7: In-game Performance and QoE Depending on Average Frame Age with a bin-size of 100 ms: (a) Opinion Score, (b) Graphical Score, (c) Interactive Score, (d) In-game Performance

Figure 4.8 shows a consistent trend where the enjoyment and the in-game performance degrade as the values of the QoS measurements get higher and higher. The average server bandwidth shows the opposite: when the bandwidth increases, the enjoyment and in-game performance increase, which is expected since the more bandwidth the server has access to, the more it can send with better quality. It is interesting to note the similarities between the average ping time and average transfer time, as they degrade at the same rate. We can see a similar but not as prominent trend for the average packet loss, which would further indicate that packet loss does not play as important a role as latency in determining the enjoyment of the fast-paced game.
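The smoothed averages in Figure 4.8 are moving averages with a window length of 10; a minimal sketch of how such a curve can be computed (hypothetical column names):

    # Sketch: moving average of a QoE score over sessions sorted by a QoS value.
    import pandas as pd

    df = pd.read_csv("sessions.csv").sort_values("avg_ping")
    smoothed = df["general_opinion"].rolling(window=10, min_periods=1).mean()
    # 'smoothed' can then be plotted against df["avg_ping"], as in the first row of Figure 4.8.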

4.5 Model-based Parameter Selection

In order to find the best predictor variable for the different QoE metrics, we have performed a best subset regression analysis on the data. To ensure that collinearity has as small an impact as possible, we have used Mallow’s Cp to assess the models. For the remaining models, we have calculated how often each predictor variable occurs, and the variables occurring the most should be the best predictors. In Figure 4.9a, we can see the model occurrences of predictors for the different QoS mea- surements. Meaning the percentage of times the measurement showed up as a predictor for the result; in this case, the general opinion score. The measurements that are missing were not interesting enough, or their data did not have enough deviations to be displayed. The vertical axis is the percentage of model occurrences, and the horizontal axis is the different predictors, showing the name of the predictor variable, where BW is the bandwidth. As will be the trend throughout these model occurrence plots, the average frame age is one of the most prominent predictor. In this particular plot Figure 4.9a it is tied to two other measurements, the average packet loss, and its standard deviation. They are followed closely by the standard deviation for the frame age. This would have a strong indication that the frame age and the packet loss, along with their respective standard deviation, would be a good predictor for the players’ opinion score.
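A sketch of this procedure, under the assumptions that plain least squares is used and that a model is kept when its Cp is at most the number of predictors plus one (the variable names and the synthetic data are ours):

    # Sketch: best-subset selection with Mallow's Cp and predictor occurrence counting.
    from itertools import combinations
    from collections import Counter
    import numpy as np

    def sse(X, y):
        """Residual sum of squares of an ordinary least-squares fit (intercept added)."""
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return float(resid @ resid)

    def acceptable_models(X, y, names):
        """Yield predictor subsets whose Cp is at most (number of parameters + 1)."""
        n, k = X.shape
        mse_full = sse(X, y) / (n - k - 1)              # MSE of the full model
        for size in range(1, k + 1):
            for cols in combinations(range(k), size):
                p = size + 1                            # parameters incl. intercept
                cp = sse(X[:, cols], y) / mse_full - (n - 2 * p)
                if cp <= p + 1:
                    yield tuple(names[c] for c in cols)

    # Tiny synthetic demonstration; random data stands in for the collected QoS/QoE values.
    rng = np.random.default_rng(0)
    names = ["avg_frame_age", "avg_ping", "avg_packet_loss", "avg_server_bw"]
    X = rng.normal(size=(60, 4))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=60)
    occurrences = Counter(name for model in acceptable_models(X, y, names) for name in model)
    print(occurrences.most_common())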


Figure 4.8: The Smoothed Average for In-game Performance and QoE compared to Different QoS. Rows, top to bottom: Avg. Ping Time, Avg. Transfer Time, Avg. Server BW, Avg. Packet Loss. Columns, left to right: Opinion Score, Graphical Score, Interactive Score, Performance (panels (a) to (p)).

It is important to note that these plots show the model occurrences of predictors for single- player as we will present the same plot but including the multiplayer results later. Figure 4.9b shows the same results as Figure 4.9a, but instead of the general opinion, this is the graphics score. It is still the same vertical axis, showing the percentage of model occurrences, and the horizontal axis showing the different predictors. As we can see, there are some differences in the plots. However, they are similar in the top measurements, such as the average frame age, the average of packet loss, and its standard deviation. The significant difference is the occurrences of both the standard deviation of client bandwidth and the average encode time. The ping time and its standard deviation make a reasonably high leap as well. Indicating that the encode time and standard deviation for client bandwidth are measurements that are good predictors for the graphical score along with the predictors for the opinion score. In Figure 4.9c, we see the model occurrences of predictors for the interactive score. Where we can see just as for the opinion score, some of the best predictors are frame age, packet loss, and their standard deviations. In this plot, we can see that the interactive score compares to the general opinion much better than with the graphical score. The only noticeable difference is the average decode time. Comparing it to the graphical score, we can draw the observation that there is a significant difference for the encode time and the standard deviation for both client bandwidth but also frame age. This would indicate that the interactive score is much

20 4.6. Accounting for player differences

(a) Occurrences of predictors for opinion score (b) Occurrences of predictors for graphics score

(c) Occurrences of predictors for interactive score (d) Occurrences of predictors for in-game performance

Figure 4.9: Model Occurrences of Predictors more tied to the general opinion and that the standard deviation for frame age is not as crucial in the graphical opinion than for interactive and overall scores. The last plot for single-player results shows the model occurrences of predictors for in- game performance in single-player, where the horizontal axis is the different QoS measure- ments, and the vertical axis is the percentage of model occurrences. Figure 4.9d shows some similarities to the others in that the average packet loss and frame age are fairly high up in the plot. What is new is that the average link bandwidth, as well as the standard deviations of server bandwidth and frame size, rise to the top. This might suggest that clear graphics play a significant role for in-game performance, in order to identify the enemies. That the stan- dard deviations are high can indicate that players handle conditions with substantial changes worse than conditions with a worse average.

4.6 Accounting for player differences

In this section, we will examine how the different plots and models behave when looking at one player at a time. It is essential to make sure that our predictions still work for different players. Figures 4.10a to 4.10h present the different QoE measurements and the in-game perfor- mance compared to the frame age inside box plots for each player, where the bin-size is 50 ms. Here we can see how the experience differed for different players. As can be seen from the opinion scores, the results are relatively similar. The differences are the speed at which they descend, where player 2’s opinion dives more aggressively, and player 1’s does not. Player 2’s opinion also rises more than player 1 for the higher values. This trend continues for the graphical scores for the different players. Worth noticing is that the player 1 has a reasonably high score all the time until higher frame age values. This also continues for the interactive score, where the player 1 has a sharp change in opinion when it comes to high values. For the in-game performance, player 1 has a much wider range of performances than player 2 has, where player 1 goes above the baseline quite often while player 2 does not. Nevertheless, we can see the fall of the performance as we advance through the frame age.


Figure 4.10: In-game Performance and QoE for two Players Depending on Average Frame Age with a bin-size of 50 ms: (a) Opinion Score Player 1, (b) Opinion Score Player 2, (c) Graphical Score Player 1, (d) Graphical Score Player 2, (e) Interactive Score Player 1, (f) Interactive Score Player 2, (g) In-game Performance Player 1, (h) In-game Performance Player 2

Overall the plots are relatively similar with similar patterns and values, except in-game performance and the graphical score. One thing to notice is that player 2 has overall lower values than for player 1. Another thing to notice is the difference in in-game performance where player 1 does not have as even scores as player 2 and goes above the baseline reason- ably often. These results would indicate that there are small differences between the opinion score and interactive score for both of the players, where the graphical score is not as similar. How- ever, for the in-game performance, there are substantial differences that might have to do with the skill difference. In Figure 4.11 we can see how the model occurrences look when we are only using data from one player at a time. These plots show the percentage of occurrences for different QoS


(a) Occurrences of Predictors for Opinion Score Player 1 (b) Occurrences of Predictors for Opinion Score Player 2

(c) Occurrences of Predictors for Graphics Score Player 1 (d) Occurrences of Predictors for Graphics Score Player 2

(e) Occurrences of Predictors for Interactive Score Player 1 (f) Occurrences of Predictors for Interactive Score Player 2

(g) Occurrences of Predictors for In-game Performance Player 1 (h) Occurrences of Predictors for In-game Performance Player 2

Figure 4.11: Individual Model Occurrences of Predictors

measurements for the model. All the plots focus on either a QoE measurement or the in-game performance. The vertical axis shows the percentage of model occurrences, while the horizontal axis shows the different QoS measurements. As we can see, most of them differ profoundly between the players but are relatively similar for the same player. Player 1's model almost always includes the average ping time, the standard deviation of packet loss, and the average transfer time, while player 2's model generally includes server bandwidth and frame size. One thing consistent throughout the plots is that the average frame age is always used as a predictor. Meanwhile, the in-game performance differs significantly from the other plots, except for the average frame age; another QoS measurement common to both players for in-game performance is the server bandwidth. Additional insight can be gained by performing a stepwise regression analysis to select the best model while allowing for categorical predictors distinguishing the players. In Table 4.5, we can see whether the selected model includes a categorical predictor or not. As we can


Model for                  Includes Categorical Predictor
Opinion Score              No
Graphics Score             Yes
Interactive Score          No
Performance                No
Normalized Performance     Yes

Table 4.5: Model including categorical predictor

see, three out of the five models do not include a categorical predictor, which is promising, as it suggests that our predictors are more universal. Normalized performance does include a categorical predictor, which is unsurprising when looking at Figures 4.10g and 4.10h, where player 2 is much more consistent in their performance. We see something similar for the graphical score and Figures 4.10c and 4.10d, where player 1 is much more tolerant of the higher frame ages.
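The exact stepwise procedure used in the thesis is not restated here, but the following minimal sketch illustrates one common variant: forward selection driven by AIC, with a categorical player indicator offered as a candidate alongside the QoS measurements. The variable names and the AIC criterion are assumptions for illustration only.

```python
# Minimal forward stepwise selection sketch (AIC-based), allowing a categorical
# player indicator among the candidates. Illustration only; the thesis's actual
# stepwise procedure, selection criterion, and variable names may differ.
import pandas as pd
import statsmodels.formula.api as smf

def forward_stepwise(df: pd.DataFrame, response: str, candidates: list[str]) -> list[str]:
    remaining = list(candidates)
    selected: list[str] = []
    best_aic = smf.ols(f"{response} ~ 1", data=df).fit().aic  # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        scores = []
        for term in remaining:
            formula = f"{response} ~ " + " + ".join(selected + [term])
            scores.append((smf.ols(formula, data=df).fit().aic, term))
        aic, term = min(scores)
        if aic < best_aic:          # keep the term only if it lowers the AIC
            best_aic = aic
            selected.append(term)
            remaining.remove(term)
            improved = True
    return selected

# Hypothetical call; "C(player)" treats the player id as a categorical predictor.
# chosen = forward_stepwise(df, "opinion",
#                           ["frame_age_avg", "frame_age_std", "ping_avg",
#                            "packet_loss_avg", "C(player)"])
```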

5 Multiplayer Results

In this chapter, we present the results from the multiplayer tests as well as the results for the combination of the single-player and multiplayer tests. The chapter starts with an analysis of the QoE scores compared to the advantages in ping and average frame age between the players, as well as the win rate between them. After that comes an analysis of the impact of winning, followed by a correlation analysis of the QoE and QoS measurements, and finally a section on model-based parameter selection.

Figure 5.1: Win Rate Compared to the Advantage in Ping Time for Player 1

Figure 5.1 displays the number of matches played and the win rate of player 1, grouped by the advantage in ping enjoyed by player 1 in bins of 20 ms. So 0 means that player 1 had 0 to 19 ms less ping than player 2, and 20 means 20 to 39 ms less ping. The yellow and blue checker pattern is the win rate for that particular bin in percent, while the red and blue striped pattern is the

number of matches for that particular bin. The left vertical axis is the win rate, the right vertical axis is the number of matches played, and the horizontal axis is the different bins, where the number is the lowest value of the bin. We can see that player 2 proves to be quite dominant on an equal playing field. However, we can also see that it is possible to affect the in-game performance of the players by manipulating the latency. As the advantage in latency increases, the win rate for player 1 grows steadily, except for some categories with very few games played. For a ping difference between 60 and 99 ms, the match is almost equal.
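A short sketch of how such per-bin win rates can be computed is given below. It groups the matches by the lower edge of each 20 ms advantage bin and reports the win rate and match count; the column names (ping_advantage_ms, player1_won) are hypothetical, not the actual log fields.

```python
# Sketch: win rate and match count per 20 ms bin of player 1's ping advantage.
# Column names (ping_advantage_ms, player1_won) are hypothetical.
import pandas as pd

def win_rate_by_advantage(matches: pd.DataFrame, bin_ms: int = 20) -> pd.DataFrame:
    bins = (matches["ping_advantage_ms"] // bin_ms) * bin_ms  # lower edge of each bin
    grouped = matches.groupby(bins)["player1_won"]
    return pd.DataFrame({
        "win_rate_pct": grouped.mean() * 100,  # share of matches won by player 1
        "matches": grouped.size(),
    })
```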

Figure 5.2: Win Rate Compared to the Advantage in Average Frame Age for Player 1

Figure 5.2 displays the win rate for player 1 compared to the advantage in average frame age. It shows player 1's win rate in percent on the left vertical axis and the number of matches played at that specific frame-age advantage on the right vertical axis. The horizontal axis shows the bins the average frame age was categorized into. The yellow and blue checker pattern is the win rate, while the red and blue striped pattern is the number of matches played. We can see that the match is roughly even at zero frame-age advantage; player 2 has a slight edge, but the matchup is still fairly equal. When player 1 is at a 20 ms advantage, the win rate equalizes to an almost 50/50 matchup. As the frame-age advantage increases further, the win rate falls ever so slightly, and at higher frame ages with small pools of matches played, the win rate is unreliable. This would indicate that we can equalize the matchup at lower frame ages and still keep a reasonable win rate at moderately higher frame ages, especially at a 20 ms advantage. Another vital thing to consider is how the difference in latency affects the QoE of the players. The box plots in Figures 5.3 to 5.5 show the different QoE scores against the advantage in ping, where the vertical axis is the players' opinion and the horizontal axis is the different bins. The blue and red striped bins are for player 2, while the blue and yellow checker bins are for player 1. It is clear that the enjoyment for player 2 drops steadily as the latency increases. For high values, the experience becomes very poor for player 2, but if we look at ping differences of 60 ms through 99 ms, where the win rate is reasonably even, the players' QoE scores are still relatively high. The box plots in Figures 5.6 to 5.8 show the advantage in average frame age for player 1, where the vertical axis is the respective QoE measurement and the horizontal axis is the advantage, with the number representing the lowest value in the bin.

Figure 5.3: General opinion as ping changes

Figure 5.4: Graphics score as ping changes

Figure 5.5: Interactive Score as ping changes

Here we can see the opinion for player 2 falling more drastically than for the average ping. The bins above 100 ms do not contain enough matches to show any meaningful results. However, as we can see for the scores at 80 to 99 ms and 100 to 119 ms, the scores are not nearly as good as for ping and cannot be considered good enough to say that the player has not been affected noticeably. The bin 60 to 79 ms looks a bit better, but it is still not enough to say that the QoE has not changed noticeably. For the bins lower than 60 ms, which have a reasonably good median, we can say that the QoE has not changed noticeably. It is also for these bins

Figure 5.6: General opinion as the average frame age changes

Figure 5.7: Graphics score as the average frame age changes

Figure 5.8: Interactive Score as the average frame age changes

that we can see from Figure 5.2 that the win rate is relatively equal, meaning we can equalize the match without affecting the players' QoE too much. These results would indicate that it is possible to compensate for an unequal skill level by introducing latency for the winning player, without affecting the QoE noticeably. However, there is a clear difference in how much latency one has to apply before equalizing the matchup in terms of ping time versus frame age. It would, of course, be easier to compensate for the skill level without affecting the QoE noticeably if the players' skill difference were smaller.


(a) General Opinion as Ping Changes when Player 1 Wins (b) General Opinion as Ping Changes when Player 2 Wins

(c) Graphical Opinion as Ping Changes when Player 1 Wins (d) Graphical Opinion as Ping Changes when Player 2 Wins

(e) Interactive Opinion as Ping Changes when Player 1 Wins (f) Interactive Opinion as Ping Changes when Player 2 Wins

Figure 5.9: QoE Depending on Winner

5.1 The Impact of Winning on Quality of Experience

When measuring QoE and comparing it to QoS like this, it is essential to make sure that whether the player wins or not does not have a larger effect on the QoE than the QoS does. In Figure 5.9, we can see how the QoE scores differ depending on whether the player won. As we can see, both players generally give lower opinion scores when losing, including the graphics score, which ideally should be unaffected. However, we can still see the same general trend regardless of whether the player is winning or losing: a higher ping leads to a lower QoE. So although winning does make a difference in QoE, we can still see the impact of ping in both cases. Another way to take winning into account is to construct models using stepwise regression and allow the use of categorical predictors, in this case the winner of the match, as we have done before. We have tried to construct models for the different QoE scores and recorded whether the model differed depending on whether the player won. As we can see in Table 5.1, all the different models used the winner as a categorical variable. However, what was interesting was that the only thing that differed between the two resulting models was the constant at the start. An example is the two models for predicting player 2's interactive score, shown in Equations 5.1 and 5.2. So although winning has an impact on the QoE, the models still behave the same way, so it should not impact the result.


Model for                    Model uses winner of match   Difference in Constant
Player 1 Opinion Score       Yes                          0.911
Player 1 Graphics Score      Yes                          0.285
Player 1 Interactive Score   Yes                          0.884
Player 2 Opinion Score       Yes                          1.100
Player 2 Graphics Score      Yes                          1.010
Player 2 Interactive Score   Yes                          1.026

Table 5.1: Models including winning categorical predictor

Player 2 Winning:
Inter = 7.135 − 0.00573 · ClientAvg + 0.00290 · ClientStDev − 0.03242 · FrameAgeAvg + 0.00817 · FrameAgeStDev + 0.000104 · FrameSizeStDev    (5.1)

Player 2 Losing:
Inter = 6.109 − 0.00573 · ClientAvg + 0.00290 · ClientStDev − 0.03242 · FrameAgeAvg + 0.00817 · FrameAgeStDev + 0.000104 · FrameSizeStDev    (5.2)

So all QoE models behave the same whether the player wins or loses, and the only difference is a constant decrease in enjoyment when losing. If we take a look at player 2's scores in Table 5.1, the constant seems to be around 1 for all opinion scores, while for player 1, the opinion and interactive scores are around 0.9.
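The same structure can be expressed as a single model with a 0/1 winner indicator: the slope coefficients are shared between winning and losing, and the indicator only shifts the intercept (about 1.026 for player 2's interactive score). The sketch below shows this with hypothetical column names loosely mirroring the predictors in Equations 5.1 and 5.2; it is an illustration, not the thesis's actual fitting code.

```python
# Sketch: one OLS model per player where a 0/1 "won" indicator only shifts the
# intercept. The fitted coefficient on "won" then corresponds to the constant
# difference between the winning and losing equations. Column names are
# hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

def fit_winner_shift_model(df: pd.DataFrame) -> pd.Series:
    model = smf.ols(
        "interactive ~ won + client_bw_avg + client_bw_std"
        " + frame_age_avg + frame_age_std + frame_size_std",
        data=df,                      # one row per match for the player in question
    ).fit()
    return model.params               # slopes are shared; 'won' is the intercept shift
```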

5.2 Correlation Analysis for the Combined Data

In this section, we present results in a similar fashion as before, such as correlation matrices and model occurrences of predictors for the QoE measurements. However, here we show them for all the results, with the multiplayer and single-player results combined. Figures 5.10 to 5.12 show the correlation between different QoS and QoE measurements, where 0 means that there is no correlation and a higher absolute value means a stronger correlation. Studying the correlation matrices for both games in Figures 5.10 to 5.12, we see a similar result as in Figures 4.3 to 4.5, where the correlations for the single-player games are presented. The average frame age has the strongest correlation with all three QoE metrics according to all correlation coefficients, with its standard deviation not far behind. Just as in the single-player correlations, ping is strongly correlated with the frame age and is not far behind in its correlation with the QoE metrics. Tables 5.2 to 5.4 show the top seven correlations for the respective coefficients between QoS metrics (first in each pair) and QoE metrics (second in each pair) for the combined results. We note that there are some differences between the combined-results correlations and the single-player correlations, mostly in their order, but there are also some newcomers. Overall, the standard deviations are either removed or move down the list, while the average frame age & graphical score, the average transfer time, and the average ping time move up the list.
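For reference, the three association measures used throughout can be computed as in the sketch below. Pearson's r and Kendall's τ come directly from SciPy; the thesis does not state which MIC implementation was used, so the third-party minepy package (assumed installed) is shown as one commonly available option.

```python
# Sketch: the three association measures used in the correlation matrices.
# Pearson's r and Kendall's tau from SciPy; MIC via the third-party minepy
# package, which is assumed to be available.
import numpy as np
from scipy.stats import pearsonr, kendalltau
from minepy import MINE

def associations(qos: np.ndarray, qoe: np.ndarray) -> dict[str, float]:
    r, _ = pearsonr(qos, qoe)
    tau, _ = kendalltau(qos, qoe)
    mine = MINE(alpha=0.6, c=15)      # default MIC parameters from Reshef et al. [20]
    mine.compute_score(qos, qoe)
    return {"pearson_r": r, "kendall_tau": tau, "mic": mine.mic()}
```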


Figure 5.10: Correlation Matrix for Pearson’s r

Rank   Pearson
1      Avg. Frame Age & Gen. Op. Score
2      Avg. Frame Age & Inter. Score
3      Avg. Ping Time & Gen. Op. Score
4      Avg. Ping Time & Inter. Score
5      St. Dev. Frame Age & Gen. Op. Score
6      Avg. Frame Age & Graph. Score
7      St. Dev. Frame Age & Inter. Score

Table 5.2: Top 7 Best Pearson Correlations for QoE and Performance

Just as with the single-player correlations, the average frame age is at the top with the general opinion or the interactive score. This would indicate that the frame age is essential for determining the QoE for the player, both multiplayer and single-player. It would also suggest that the standard deviation of frame age is not as crucial for multiplayer as it is for single-player.


Figure 5.11: Correlation Matrix for Kendall’s τ

Rank   Kendall
1      Avg. Frame Age & Gen. Op. Score
2      Avg. Frame Age & Inter. Score
3      Avg. Frame Age & Graph. Score
4      Avg. Transfer Time & Gen. Op. Score
5      Avg. Transfer Time & Inter. Score
6      Avg. Ping Time & Gen. Op. Score
7      Avg. Ping Time & Inter. Score

Table 5.3: Top 7 Best Kendall Correlations for QoE and Performance


Figure 5.12: Correlation Matrix for MIC

Rank   MIC
1      Avg. Frame Age & Inter. Score
2      Avg. Frame Age & Gen. Op. Score
3      Avg. Transfer Time & Gen. Op. Score
4      Avg. Ping Time & Gen. Op. Score
5      Avg. Transfer Time & Inter. Score
6      Avg. Ping Time & Inter. Score
7      Avg. Frame Age & Graph. Score

Table 5.4: Top 7 Best MIC Correlations for QoE and Performance

5.3 Model-based Parameter Selection

In Figure 5.13a, we can see the model occurrences of predictors, in this case for the general opinion score across all tests. This means the percentage of times a measurement was included as a predictor for the score. The vertical axis is the percentage of model occurrences, and the horizontal axis shows the different predictors. The predictors not in the plot were either not interesting or did not have enough variation for proper measurements. Once again, the frame age is at the top, together with the other predictors displayed in Figure 4.9a, with the exceptions being the server bandwidth's average and standard deviation, the average encode time, and the frame size's average and standard deviation. The other apparent difference from the single-player results is that the average decode time falls. This would indicate that there are some similarities between the multiplayer and single-player results, but that the server bandwidth and frame size stand out more from the other variables when all tests are considered.
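The thesis does not spell out here exactly how the occurrence percentages are obtained, so the sketch below shows only one possible interpretation, stated as an assumption: refitting a model-selection routine on bootstrap resamples of the data and counting how often each candidate predictor is selected. The selection routine is passed in as a parameter (for example, a forward stepwise function like the one sketched earlier).

```python
# Sketch (assumption): selection frequency of each candidate predictor over
# bootstrap resamples. This is only one way occurrence percentages could be
# computed; the thesis's actual procedure may differ.
from collections import Counter
from typing import Callable
import pandas as pd

def occurrence_percentages(df: pd.DataFrame, response: str, candidates: list[str],
                           select: Callable[[pd.DataFrame, str, list[str]], list[str]],
                           n_boot: int = 200) -> pd.Series:
    counts: Counter = Counter()
    for _ in range(n_boot):
        sample = df.sample(frac=1.0, replace=True)        # bootstrap resample
        for term in select(sample, response, list(candidates)):
            counts[term] += 1
    # Percentage of resamples in which each predictor was selected.
    return (pd.Series(counts).sort_values(ascending=False) / n_boot) * 100
```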


(a) Occurrences of predictors for opinion score (b) Occurrences of predictors for graphics score

(c) Occurrences of predictors for interactive score

Figure 5.13: Model Occurrences of Predictors

One thing to consider here is that we have many more predictors compared to just one game. This tells us that in order to make a good predictive model for both games simultaneously, we need more variables than for one game alone. We see a similar pattern for the model occurrences of predictors for the graphics score in Figure 5.13b, where average frame age, packet loss, and server bandwidth take the lead, together with encode time and frame size. The significant disparities appear when we look at the average ping, the average and standard deviation of the link bandwidth, and the client bandwidth and its standard deviation, which all rise. These results could mean that the overall bandwidth for the client is an essential part of the interactive score for results that include multiplayer. We also see here that we need more predictors compared to single-player only. In Figure 5.13c we see similar results. The frame age, packet loss, and server bandwidth, together with frame size, stay at the top. There are not as many differences as for the other two opinion measurements, the only one being that the average ping falls while the average decode time rises for the combination. Interestingly, the standard deviation of frame age falls in the combined results compared to the single-player results, while we still need more predictors to get a good model.

6 Discussion

This chapter includes discussions of some of the previous sections, mainly a closer analysis of the results presented and what could have been done better in the method.

6.1 Results

The results presented show interesting data that need to be analyzed. The questions posed in Section 1.2 also need to be discussed and answered.

Single-player

Looking back at Table 4.1 and Figure 4.1, which show how the in-game performance and QoE measurements changed across the different scenarios, the results show a higher overall score for packet loss than for latency, and an even lower score, compared to the average, for the combination of the two. This result relates to the studies made by IJsselsteijn et al. [11], who present results showing that for more fast-paced games, the player is more tolerant to packet loss than to latency. This is unlike more slow-paced games, such as roleplaying games, where packet loss brings the user much more discomfort. Our results seem to agree well with what IJsselsteijn et al. [11] found regarding fast-paced games, since both games tested are fast-paced. The same can be seen in the QoS-based analysis of the QoE measurements in Section 4.4, where we show that, once again, packet loss does not play as vital a role as latency. We also observe this for the combinations, where the combination with the highest latency is the one where the player both performed the worst and gave the lowest QoE rating. It is also interesting to note that it is the interactive score that deteriorates the most across the scenarios. This result would be another indicator of the fast-paced nature of the games: the player is more likely to notice changes in responsiveness and most likely cares less about the visual aspects of the game, unlike in roleplaying games, just as Jarschel et al. [13] brought up. When looking at the complete picture of the players' in-game performance, it is clear that the players performed worse as the network conditions worsened. There is a common trend in Figure 4.2, where the players scored the lowest when we have high latency. However, the players scored the worst when there was a combination of latency and

packet loss. Comparing the scores of scenarios PL1-7, the combinations, to the worst scenario for latency, L8, every one of PL1-7 was almost as bad. It is also interesting to note that, as mentioned in Section 2.2, Figure 4.1, which shows the different scores for the scenarios, reveals a significant drop in the players' score when passing L4, where we have an additional latency of 100 ms, just as Claypool and Finkel [4] mentioned. We can also see a similar drop in in-game performance at 200 ms. Both times the drop in in-game performance is approximately 50 % compared to 100 ms less delay, doubling the in-game performance downgrade Claypool and Claypool [3] mentioned in their study. If we look solely at the results regarding the frame age, we can see that it is an excellent predictor of the user's QoE score. It is among the most common predictors for all QoE metrics and for the in-game performance, and its standard deviation is common among several of them. In Section 4.3, we see further indications that the general opinion correlates well with the interactive score, and that the frame age affects these scores more than the graphical score. It is also evident from the correlation matrix that the average frame age has a reasonably strong correlation with the different QoE measurements. From the correlation matrices, we can also see that ping correlates well wherever the average frame age correlates strongly; nevertheless, the values for the frame age are slightly better. Both measurements are closely related, measuring the time it takes for the user to see the picture, but the frame age is more precise as it covers more factors than just the round-trip time between the server and client. For the single-player results, it is essential to examine how the graphs and plots behave with the different players in mind, and whether the model occurrence plots and correlations still hold when looking at the players separately, which we do in Section 4.6. The findings from that section indicate that there are only small differences between the players' interactive and general opinion scores, but that the graphical scores differ more. We also make model occurrence plots for both players, which show the average frame age as one of the top predictors across the players' models.

Multiplayer

For the multiplayer results, it is easy to see that player 1 had a low win rate when the latency was equal. However, from Figure 5.1, we can see that when the ping advantage reaches 60-99 ms, the match becomes almost equal with a win rate of about 45 % for player 1. From Figure 5.2, we can also see that when the frame-age advantage reaches 20-39 ms, the match almost becomes a 50/50 split. So we can certainly say that it is possible to equalize the game using latency, but that is unusable if the other player has a miserable experience. We therefore need to take a look at how the players rated the experience. As we can see in Figures 5.3 to 5.5, where the QoE for different latency differences is presented, player 2 has a reasonably high general opinion with an 80-99 ms disadvantage. The player scored as high as 6 but also as low as 1. Predictably, we observe better results in the 60-79 ms bin, with an overall higher average and smaller spread. We see similar results for the graphics and interactive scores, where the score remains reasonably high for the 60-79 ms bin but starts to drop below the average after that. These results also present themselves in Figures 5.6 to 5.8, but for the average frame age instead of the ping time. This time there is a noticeable difference in QoE scores for the previously mentioned bins, 60-79 ms, 80-99 ms, and 100-119 ms. The scores in these bins are not nearly high enough to conclude that the QoE has not been affected too much. However, if we look at the bins lower than 60 ms, we can conclude that it is not affected too much; this is also where the win rate for the average frame age is fairly high. These results show that we can equalize the playing field without sacrificing too much enjoyment. It is also worth

noting that, since the enjoyment declines roughly linearly, this effect would be even more pronounced if the skill difference between the players were smaller. As mentioned in Section 5.1, we still need to take the impact of winning into account when predicting the QoE for the players. As we showed in that section, in Figure 5.9 and Table 5.1, winning does have a significant impact on the players' enjoyment of the game, but the effect seems to be fairly consistent and could be accounted for by subtracting a constant when losing.

Combined results

It is also interesting to note that the combined results show practically the same results as the single-player results alone, which means that there is not much of a difference in the results between the two games. If we compare the correlation matrices for one game in Figures 4.3 to 4.5 with Figures 5.10 to 5.12, which present the correlations for the combined results, they look very similar, so one could assume that both games behave similarly. However, if we look at Figures 4.9a to 4.9d and Figures 5.13a to 5.13c, which show the model occurrences of predictors for the different QoE metrics, we see that when trying to predict the results for both games at the same time we need more predictors. This suggests more substantial differences between the games and that our findings might not be as universal.

6.2 Method

We discussed the choice of test procedure in the method chapter, and since a study can never be perfect, there are some things that we could have done differently and some things that need to be analyzed.

Method Reasoning

This thesis is based on the idea of investigating the impact of QoS on the QoE for a player. To get real players' opinions, we had to do a user study. The questions we chose for the users to answer were based on the idea that we needed short answers that could be measured, since the users would answer them many times. They were also based on the study, mentioned earlier, by Hsu et al. [9], where they asked three simple questions measuring the players' MOS. Whereas Hsu et al. [9] chose a scale of 1-5, we chose a scale of 1-7 because of the findings of Miller [16]. That study mentions that a user can distinguish between 7 ± 2 different levels, so we chose the mean. Our idea from the beginning was to have the players do the test in both single-player and multiplayer. We were also interested in measuring the in-game performance of the players, which is why we chose Geometry Wars for single-player and Speedrunners for multiplayer. Geometry Wars is a game where the player gets a score at the end of the game, and it is sufficiently fast that the test would not take too long. The same goes for Speedrunners; it is fast enough for the players to still have the energy to play it, while it does an excellent job of distinguishing who won the game. The decision to design the scenarios the way they are has been explained rather thoroughly in Section 2.2. From other works, we have seen where they have put their limits on what the user can handle. The decision to place the packet-loss scenarios where they are was based on the study mentioned earlier [5], where it is noted that 1 % packet loss makes the experience almost unplayable. From our own practical experience with packet loss, we decided that 2 % would be a good cut-off, since the gaming experience fell drastically there. Similarly, we decided where the limits for latency were to be placed based on the previous studies where latency starts to affect the user considerably [3, 7]; from these studies, together with practical experience of a latency of 200 ms, we put together our scenarios.

37 6.2. Method

We only chose packet loss and latency because every other measurement does practically the same thing, as per Jarschel et al. [13]. For example, packets that arrive in the wrong order because of issues along the way are handled as packet loss. This is because, for cloud computing to meet the real-time requirements it poses, it cannot wait an arbitrary time for packets nor show them in a different order. Therefore, the software has no other choice but to drop the packets that arrive out of order. This is one example of how another network property behaves just like packet loss or latency. The platform we chose has also been discussed in Section 1.3, where we explain that it was chosen based on our previous experience. The platform has practically no impact on the results of the thesis, and since Steam Link has such thorough logs, it was an obvious choice for us.

Method Analysis

Regarding the sources we have chosen for our thesis, we handpicked them to be among the best we could find. There were clear guidelines that the sources needed to be of high quality, with a majority from reputable and credible conferences. The sources we have cited regarding cloud computing are also recent enough to keep up with a fast-moving field. When looking at Table 3.1, we can see that we are not sure what the "Dropped reset" data means. We have tried reaching out to the developers of the service but have received no answer. To ensure replicability, we will make all methods and tools used during testing available. Another measure related to replicability is the p-value. We test the null hypothesis of no correlation against the alternative hypothesis of a nonzero correlation; a small p-value (typically p < 0.05) indicates strong evidence against the null hypothesis. In Table 6.1, we can see the percentage of all correlations with p > 0.05, in other words, the correlations without strong evidence against the null hypothesis.

Percentage of p > 0.05      r         τ         MIC
Single-player               44.97 %   34.02 %   36.98 %
Combined                    46.24 %   34.56 %   43.20 %

Table 6.1: Percentage of p > 0.05 for different correlation coefficients
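For Pearson's r and Kendall's τ, SciPy returns these p-values directly alongside the coefficient. MIC has no closed-form p-value in the same call, so the sketch below uses a simple permutation test as one hedged option; this is an assumption for illustration, and the MIC p-values reported in Table 6.2 were likely obtained from the precomputed tables of Reshef et al. [20].

```python
# Sketch: p-values for the correlation tests. pearsonr and kendalltau return a
# p-value for the null hypothesis of no association. For statistics without a
# built-in p-value (e.g. MIC), a permutation test is sketched as one option.
import numpy as np
from scipy.stats import pearsonr, kendalltau

def correlation_p_values(qos, qoe) -> dict[str, float]:
    _, p_pearson = pearsonr(qos, qoe)
    _, p_kendall = kendalltau(qos, qoe)
    return {"pearson": p_pearson, "kendall": p_kendall}

def permutation_p_value(stat, qos, qoe, n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for any association statistic via label permutation."""
    rng = np.random.default_rng(seed)
    observed = stat(qos, qoe)
    hits = sum(stat(qos, rng.permutation(qoe)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```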


Correlation                                       Pearson     Kendall     MIC
Single-player
Average frame age vs General opinion score        5.828e−43   4.942e−39   ≤1.28e−6
Average frame age vs Graphics score               5.178e−26   1.379e−25   ≤1.28e−6
Average frame age vs Interactivity score          1.038e−43   1.686e−39   ≤1.28e−6
St. Dev. frame age vs General opinion score       3.676e−36   8.665e−33   ≤1.28e−6
St. Dev. frame age vs Interactivity score         1.417e−36   2.093e−33   ≤1.28e−6
Average Ping Time vs General Opinion Score        1.146e−28   8.322e−27   ≤1.28e−6
Average Ping Time vs Interactivity Score          1.119e−28   3.821e−26   ≤1.28e−6
Average Transfer Time vs General Opinion Score    9.869e−16   5.145e−32   ≤1.28e−6
Average Transfer Time vs Interactivity Score      5.502e−15   3.211e−32   ≤1.28e−6
St. Dev. Packet Loss vs General Opinion Score     0.05448     4.387e−26   ≤1.28e−6
Combined
Average frame age vs General opinion score        2.258e−75   8.9e−64     ≤1.28e−6
Average frame age vs Graphics score               7.885e−27   3.163e−48   ≤1.28e−6
Average frame age vs Interactivity score          1.366e−73   3.192e−60   ≤1.28e−6
Average Ping Time vs General Opinion Score        2.775e−57   1.745e−45   ≤1.28e−6
Average Ping Time vs Interactivity Score          1.307e−55   2.7e−44     ≤1.28e−6
Average Transfer Time vs General Opinion Score    8.764e−27   3.885e−47   ≤1.28e−6
Average Transfer Time vs Interactivity Score      1.017e−25   1.546e−46   ≤1.28e−6
St. Dev. frame age vs General opinion score       7.192e−53   5.791e−44   ≤1.28e−6
St. Dev. frame age vs Interactivity score         2.365e−50   1.49e−39    ≤1.28e−6

Table 6.2: p-values for Different Correlations Rounded to 4 Significant Figures (3 for MIC)

In Table 6.2 we can see the p-values for the correlations in Tables 4.2 to 4.4 and 5.2 to 5.4. As we can see, most of the p-values are very low. The only one not below 0.05 is the standard deviation of packet loss versus the general opinion score, and only for Pearson's r. So there is strong evidence that there is a correlation between the values and that the result is not based on chance. When it comes to reliability, one measure we can use is the internal consistency of the test. A measure of this is Cronbach's α, where values of α > 0.9 are considered excellent internal consistency. We have calculated the internal consistency of all the baselines for the single-player tests, as they are the most comparable. For the multiplayer tests, we calculated the α value for the first match of every test, where the ping is always the same. As we can see in Table 6.3, we have high values of α for both games, which are as such internally consistent, which is a form of reliability.

Game             Cronbach’s α   Comment
Geometry Wars    0.8830         Internal consistency for all baseline tests
Speedrunners     0.8769         Internal consistency for the first match of every test

Table 6.3: Cronbach’s α for Different Games
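For completeness, the textbook formula for Cronbach's α is sketched below for a matrix of item scores, for example one row per test session and one column per opinion question. This is the standard formula, not necessarily the authors' exact computation or data layout.

```python
# Sketch: textbook Cronbach's alpha for a matrix of item scores
# (rows = test sessions, columns = items such as the three opinion questions).
# Standard formula; the authors' exact computation may differ.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_vars = items.var(axis=0, ddof=1)           # per-item variance
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```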

Regarding validity, as we can see in Figures 4.3 to 4.5 and Figures 5.10 to 5.12, which show the correlation between the QoS and QoE, all three QoE values are strongly correlated

with each other. This is promising, as the enjoyment of a game could be said to depend, in part, on graphics and interactivity. We can also see that the QoE values are correlated with the in-game score, which could be interpreted as the user finding more enjoyment in the game when they perform well. All of this points to the validity of what we are measuring.

6.3 The Work in a Wider Context

The aspects of cloud computing discussed in this thesis, such as the frame age in single-player and multiplayer settings, do not directly have any ethical or societal impacts. However, if we look at it from a wider context and consider cloud gaming as a whole, there are some ethical and societal aspects worth considering as it grows in popularity. The positive aspect of storing the games in the cloud instead of storing them locally is that it will be cheaper for the players, as they are no longer required to keep expensive hardware up to date. As mentioned in the introduction, there would also be a significant drop in piracy, as the games would be stored in the cloud [29]. Nevertheless, it has some adverse effects, like the fact that the consumer is no longer the owner of the game. If, for example, a provider no longer sees a profit in running Minecraft on its servers, the users will lose all access to the game, or even worse, if the provider decides to shut down its entire operation, all users will lose access to all their games. Another negative aspect is that the users are dependent on the internet; they can no longer play locally if their internet goes down or is shut off. This assumes the game provider does not provide a local copy together with their cloud service.

7 Conclusion

In this thesis, our goal has been to find out how different factors affect the experience when streaming games. We have focused on how the players' in-game performance and quality of experience change with the network conditions. Our results show how the different quality of service measurements affect the users' in-game performance and opinion scores. They also show that the frame age is one of the best predictors for both in-game performance and all three opinion scores: general opinion, graphics, and interactivity. These results are consistent with what Yates et al. [29] theorized regarding frame age being an "...effective measure of user-perceived QoE...". Regarding our third question, about equalizing multiplayer matches, with a difference in ping time of 60-79 ms we raised the win rate of one player from 15 % at an equal ping time to 44 % with the additional latency. We see similar results for the average frame-age difference, where we raised the win rate of one player from 43 % to 48 %. Moreover, this is without lowering the opinion score of the affected player too much. With a more substantial difference in ping and frame age, we see the win rate rising for ping and the enjoyment falling for both the frame age and ping. So for our particular skill difference, we can say that we can equalize the match. The most important part of our findings is that the frame age is an essential factor in the players' experience and in-game performance. As the frame age combines different measures such as encode time, transfer time, and decode time, it is unsurprising that it is more accurate than any of them alone. For future work, it would be interesting to analyze the different parts of the life of a frame to see which parts we can affect. The goal would be to minimize the frame age and maximize the user's in-game performance and enjoyment of the game. Other interesting work includes determining how the frame age (and variation thereof) and the QoE may be impacted by the use of bandwidth caps [15]. Moreover, further studies should be conducted to see whether the frame age is still the best predictor for other, slower-paced games. A broader study is needed to ensure that the correlations apply to a more substantial subset of users. The same can be said for the multiplayer test, where we need to test a wider variety of users and skill levels.

Bibliography

[1] Kuan Ta Chen, Yu Chun Chang, Hwai Jung Hsu, De Yu Chen, Chun Ying Huang, and Cheng Hsin Hsu. “On the quality of service of cloud gaming systems”. In: IEEE Transactions on Multimedia 16.2 (Feb. 2014), pp. 480–495. DOI: 10.1109/TMM.2013.2291532.
[2] Kuan Ta Chen, Yu Chun Chang, Po Han Tseng, Chun Ying Huang, and Chin Laung Lei. “Measuring the latency of cloud gaming systems”. In: Proceedings of the ACM Multimedia Conference. 2011, pp. 1269–1272. DOI: 10.1145/2072298.2071991.
[3] Mark Claypool and Kajal Claypool. Latency and player actions in online games. Nov. 2006. DOI: 10.1145/1167838.1167860. URL: http://portal.acm.org/citation.cfm?doid=1167838.1167860.
[4] Mark Claypool and David Finkel. “The Effects of Latency on Player Performance in Cloud-Based Games”. In: Proceedings of the Annual Workshop on Network and Systems Support for Games (NetGames). 2014. DOI: 10.1109/NetGames.2014.7008964.
[5] Victor Clincy and Brandon Wilgor. “Subjective evaluation of latency and packet loss in a cloud-based game”. In: Proceedings of the International Conference on Information Technology: New Generations (ITNG). 2013, pp. 473–476. DOI: 10.1109/ITNG.2013.79.
[6] clumsy, an utility for simulating broken network for Windows Vista / Windows 7 and above. URL: https://jagt.github.io/clumsy/.
[7] Matthias Dick, Oliver Wellnitz, and Lars Wolf. “Analysis of Factors Affecting Players’ Performance and Perception in Multiplayer Games”. In: Proceedings of ACM SIGCOMM Workshop on Network and System Support for Games (NetGames). 2005, pp. 1–7. DOI: 10.1145/1103599.1103624.
[8] Geometry Wars: Retro Evolved on Steam. URL: https://store.steampowered.com/app/8400/Geometry_Wars_Retro_Evolved/.
[9] Cheng-Hsin Hsu, Hua-Jun Hong, Chih-Fan Hsu, Tsung-Han Tsai, Chun-Ying Huang, and Kuan-Ta Chen. “Enabling Adaptive Cloud Gaming in an Open-Source Cloud Gaming Platform”. In: IEEE Transactions on Circuits and Systems for Video Technology (Dec. 2015), pp. 2078–2091.
[10] Chun Ying Huang, Cheng Hsin Hsu, Yu Chun Chang, and Kuan Ta Chen. “GamingAnywhere: An open cloud gaming system”. In: Proceedings of the ACM Multimedia Systems Conference (MMSys). 2013, pp. 36–47. DOI: 10.1145/2483977.2483981.


[11] Wijnand IJsselsteijn, Yvonne de Kort, Karolien Poels, Audrius Jurgelionis, and Francesco Bellotti. “Characterising and Measuring User Experiences in Digital Games”. In: Proceedings of the International Conference on Advances in Computer Entertainment Technology (ACE) (2007), pp. 1–4.
[12] Gazi K. Illahi, Thomas Van Gemert, Matti Siekkinen, Enrico Masala, Antti Oulasvirta, and Antti Ylä-Jääski. “Cloud Gaming with Foveated Video Encoding”. In: ACM Transactions on Multimedia Computing, Communications, and Applications 16.1 (Mar. 2020), pp. 1–24. ISSN: 1551-6857. DOI: 10.1145/3369110. URL: https://dl.acm.org/doi/10.1145/3369110.
[13] Michael Jarschel, Daniel Schlosser, Sven Scheuring, and Tobias Hoßfeld. “An evaluation of QoE in cloud gaming based on subjective tests”. In: Proceedings of the International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). 2011, pp. 330–335. DOI: 10.1109/IMIS.2011.92.
[14] Maurice G. Kendall. “A New Measure of Rank Correlation”. In: Biometrika 30.1/2 (June 1938), p. 81. DOI: 10.2307/2332226.
[15] Vengatanathan Krishnamoorthi, Niklas Carlsson, and Emir Halepovic. “Slow but Steady: Cap-Based Client-Network Interaction for Improved Streaming Experience”. In: Proceedings of the IEEE/ACM International Symposium on Quality of Service (IWQoS). Jan. 2018. DOI: 10.1109/IWQoS.2018.8624170.
[16] George A. Miller. “The magical number seven, plus or minus two: some limits on our capacity for processing information”. In: Psychological Review 63.2 (Mar. 1956), pp. 81–97. DOI: 10.1037/h0043158.
[17] Karl Pearson. On further methods of determining correlation. 16. London: Dulau and Company, 1907.
[18] Kjetil Raaen and Andreas Petlund. “How much delay is there really in current games?” In: Proceedings of the ACM Multimedia Systems Conference (MMSys). Mar. 2015, pp. 89–92. DOI: 10.1145/2713168.2713188.
[19] Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit? URL: https://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit.
[20] David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti. “Detecting novel associations in large data sets”. In: Science 334.6062 (Dec. 2011), pp. 1518–1524. DOI: 10.1126/science.1205438.
[21] Saeed S. Sabet, Carsten Griwodz, and Sebastian Möller. “Influence of primacy, recency and peak effects on the game experience questionnaire”. In: Proceedings of the ACM Workshop on Immersive Mixed and Virtual Environment Systems (MMVE). June 2019, pp. 22–27. DOI: 10.1145/3304113.3326113.
[22] Ryan Shea, Jiangchuan Liu, Edith Ngai, and Yong Cui. “Cloud gaming: Architecture and performance”. In: IEEE Network 27.4 (2013), pp. 16–21. DOI: 10.1109/MNET.2013.6574660.
[23] Ivan Slivar, Mirko Suznjevic, and Lea Skorin-Kapov. “Game categorization for deriving QoE-driven video encoding configuration strategies for cloud gaming”. In: ACM Transactions on Multimedia Computing, Communications and Applications 14.3s (June 2018), pp. 1–24. ISSN: 1551-6865. DOI: 10.1145/3132041. URL: http://dl.acm.org/citation.cfm?doid=3233173.3132041.
[24] SpeedRunners on Steam. URL: https://store.steampowered.com/app/207140/SpeedRunners/.


[25] Steam - Remote Play - Knowledge Base - Steam Support. URL: https://support.steampowered.com/kb_article.php?ref=3629-riav-1617.
[26] Robert C. Streijl, Stefan Winkler, and David S. Hands. “Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives”. In: Multimedia Systems 22.2 (Mar. 2016), pp. 213–227. DOI: 10.1007/s00530-014-0446-1.
[27] What is Mallows’ Cp? - Minitab. URL: https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/regression/supporting-topics/goodness-of-fit-statistics/what-is-mallows-cp/.
[28] Yiling Xu, Qiu Shen, Xin Li, and Zhan Ma. “A Cost-Efficient Cloud Gaming System at Scale”. In: IEEE Network 32.1 (Jan. 2018), pp. 42–47. DOI: 10.1109/MNET.2018.1700153.
[29] Roy D. Yates, Mehrnaz Tavan, Yi Hu, and Dipankar Raychaudhuri. “Timely cloud gaming”. In: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM). Oct. 2017, pp. 1–9. DOI: 10.1109/INFOCOM.2017.8057197.

8 Appendix

8.1 QoS measurements Top 3

Pearson’s

Rank   Gen. Op. Score
1      Avg. Frame Age
2      St. Dev. Frame Age
3      Avg. Ping Time

Table 8.1: Top 3 Best Pearson Correlations for General Opinion Score

Rank   Graph. Score
1      Avg. Frame Age
2      St. Dev. Frame Age
3      Avg. Server BW

Table 8.2: Top 3 Best Pearson Correlations for Graphical Score

Rank   Inter. Score
1      Avg. Frame Age
2      St. Dev. Frame Age
3      Avg. Ping Time

Table 8.3: Top 3 Best Pearson Correlations for Interactive Score

Rank   Score Rel. to BL
1      Avg. Frame Age
2      Avg. Ping Time
3      St. Dev. Frame Age

Table 8.4: Top 3 Best Pearson Correlations for In-game Performance relative to the Baseline


Kendall’s

Rank   Gen. Op. Score
1      Avg. Frame Age
2      St. Dev. Frame Age
3      Avg. Transfer Time

Table 8.5: Top 3 Best Kendall Correlations for General Opinion Score

Rank   Graph. Score
1      Avg. Frame Age
2      Avg. Packet Loss
3      Avg. Frame Size

Table 8.6: Top 3 Best Kendall Correlations for Graphical Score

Rank   Inter. Score
1      Avg. Frame Age
2      St. Dev. Frame Age
3      Avg. Transfer Time

Table 8.7: Top 3 Best Kendall Correlations for Interactive Score

Rank   Score Rel. to BL
1      Avg. Frame Age
2      Avg. Transfer Time
3      St. Dev. Frame Age

Table 8.8: Top 3 Best Kendall Correlations for In-game Performance relative to the Baseline

MIC

Rank   Gen. Op. Score
1      Avg. Frame Age
2      Avg. Ping Time
3      St. Dev. Frame Age

Table 8.9: Top 3 Best MIC Correlations for General Opinion Score

Rank   Graph. Score
1      Avg. Frame Age
2      Avg. Frame Size
3      Avg. Transfer Time

Table 8.10: Top 3 Best MIC Correlations for Graphical Score


Rank   Inter. Score
1      Avg. Frame Age
2      Avg. Transfer Time
3      St. Dev. Frame Age

Table 8.11: Top 3 Best MIC Correlations for Interactive Score

Rank   Score Rel. to BL
1      Avg. Frame Age
2      Avg. Transfer Time
3      Avg. Ping Time

Table 8.12: Top 3 Best MIC Correlations for In-game Performance relative to the Baseline
