
Bachelor of Science in Digital Game Development September 2019

Evaluating the Impact of V-Ray Rendering Engine Settings on Perceived Visual Quality and Render Time: A Perceptual Study

Andreas Linné

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Digital Game Development. The thesis is equivalent to 10 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author: Andreas Linné
E-mail: [email protected]

University advisor:
Dr. Valeria Garro
Department of Computer Science

Abstract

Background. In computer graphics, it can be a time-consuming process to render photorealistic images. This rendering process, called "physically based rendering", uses complex algorithms to calculate the behavior of light. Fortunately, most renderers offer the possibility to alter the render settings, allowing for a decrease in render time, but this usually comes at the cost of a lower-quality image.

Objectives. This study aims to identify which setting has the highest impact on the rendering process in the V-Ray renderer. It also examines whether a perceived difference can be seen when reducing this setting.

Methods. To achieve this, an experiment was conducted in which 22 participants indicated their preference between rendered images. The images were rendered in V-Ray with different settings, each affecting the render time differently. Additionally, an objective image metric was used to analyze the images and to look for a correlation with the subjective results.

Results. The results show that the anti-aliasing setting had the highest impact on render time as well as on user preference. Participants preferred images with at least 25% to 50% anti-aliasing, depending on the scene. The objective results also coincided well enough with the subjective results that the metric could be used as a faster analytical tool to measure the quality of a computer-generated image. Prior knowledge of rendering was also taken into account but did not give conclusive results about user preferences.

Conclusions. From the results it can be concluded that anti-aliasing is the most important setting for achieving good subjective image quality in V-Ray. Additionally, the use of an objective image assessment tool can drastically speed up the process of targeting a specific visual quality goal.

Keywords: V-Ray, Computer-Generated Imagery, Rendering, Perception, Subjective Evaluation

Contents

Abstract

1 Introduction
  1.1 Aim, Objectives and Research Question

2 Related Work

3 Background
  3.1 Physically Based Rendering
    3.1.1 Biased Render Engines
    3.1.2 Unbiased Render Engines
  3.2 Autodesk Maya
  3.3 V-Ray
  3.4 V-Ray Quick Settings
    3.4.1 Render Settings Overview
  3.5 Image Quality Assessment
    3.5.1 Subjective IQA
    3.5.2 Objective IQA

4 Method
  4.1 Scenes
  4.2 V-Ray Quick Settings
  4.3 Preliminary Study
  4.4 Rendering
    4.4.1 Rendering Hardware
  4.5 Experiment Setup and Procedure
  4.6 Ethics
  4.7 Experiment Specifications
  4.8 Custom Software Implementation
  4.9 DSSIM IQA Metric

5 Results
  5.1 Preliminary Study
  5.2 Subjective Results
    5.2.1 How Render-Settings Affect Votes
    5.2.2 Survey Results
    5.2.3 Previous Experience vs No Experience
  5.3 Objective Results
    5.3.1 DSSIM Results
    5.3.2 Render Time

6 Analysis and Discussion

7 Conclusions and Future Work

References

A Supplemental Information
  A.1 Acronyms

Chapter 1 Introduction

Achieving photorealism in computer graphics has been widely pursued ever since the 1970s [9]. With current software and technology, this has been possible for the layman for almost 20 years. While big corporations take advantage of render farms with enormous computing capabilities, personal computers and workstations still lag behind when it comes to rendering times. Even so, results similar to those of render farms can be achieved on any hardware, given enough time.

As of today, a diverse number of rendering engines are used in computer graphics. One of the primary functions of these engines is to calculate the behavior of light using different algorithms, such as ray tracing [22, 24]. One of these engines is V-Ray, developed by Chaos Group [4], which is widely used in the field of computer graphics. V-Ray is available as a plugin for Autodesk Maya [1], Autodesk 3ds Max [1], and Maxon Cinema 4D [6], to name a few.

To reduce rendering times, image quality must be sacrificed. In V-Ray, different render settings can be used to modify the quality, thus changing the render time. These settings include different methods for computing primary diffuse bounces: irradiance map, light cache, and brute force. Each of these methods also has its own specific options that can be modified. Shading rate and anti-aliasing are two other settings that can also be modified.

While there have been many studies that focus on optimizing render times [15, 20, 26], they require the implementation of their respective techniques to take advantage of the improvements. Although these techniques decrease render times, they come with the disadvantage that an end-user cannot apply them in V-Ray. It should, however, still be possible to achieve lower render times with the software tools currently available, as is. In other words, using V-Ray with optimal settings could reduce render times without a perceived loss in quality.

With this in mind, a problem arises: which of these settings can be reduced without negatively affecting a user's perceived quality of the image? If one or more of these settings can be lowered without users noticing a difference in perceived quality, rendering times will surely decrease. Reduced rendering times save the artist time and possibly even money in the long run. Furthermore, this can also be of interest to those who do not have access to a render farm, for example, students, amateur artists, startup companies, indie game developers, and possibly even AAA studios.

1.1 Aim, Objectives and Research Question

The overall aim of this study is to gain a better understanding of how some of the rendering settings in V-Ray affect the rendering time as well as the perceived quality of the image. To achieve this, three objectives are the focus of this report.

• Determine what setting or settings in V-Ray have the most impact on render times.

• Conduct a perceptual image quality study in which participants compare images rendered with different settings.

• Analyze the gathered data and propose a solution to the problem above.

With these objectives in mind, a research question is posed: can a perceived difference in image quality be noticed when reducing the most impactful render settings in V-Ray? When determining the most impactful render setting, it was hypothesized that every setting would increase the render time in proportion to its value, but that global illumination would have more impact than shading quality and anti-aliasing.

Chapter 2 Related Work

This section provides a brief overview of the related work in this field.

Studies in visual quality perception in the area of computer graphics are not uncommon. Previous work includes [23] by Rademacher et al., in which the perception of visual realism in images was measured. Participants were instructed to rate an image as either real or not. Rademacher et al. point out the difficulty of communicating to the participants what they mean by the term "real", as the goal of the experiment itself was to find out what makes an image realistic. Therefore, one of the key points of the experiment design was to give participants as little information as possible about what was considered "real" and instead let the participants decide. The images used were a mix of real photographs and computer-generated images. Rademacher et al. state that physics is not the only key to photorealism, as the real photographs were not all rated as equally realistic. Finally, Rademacher et al. mention that if we can understand what visual factors have an impact on the perception of a photorealistic image, new rendering algorithms can be developed to take advantage of these factors [23].

A 2013 study by Pedersen [21] investigated whether it was possible to lower the level of detail (LOD) of a background object without users noticing, by drawing their attention to a moving object in the foreground, thus reducing render time. The idea was that when participants are drawn to a moving object, they spend less time and attention on the background object. Two groups were used during the experiment, one with a moving foreground object and the other with a static foreground object; both objects were otherwise identical. The results showed that visual movement influenced the overall perceived believability of the scene, but no conclusive evidence was found that it reduced attention to the background object.

In 2011, Perez [22] conducted a study comparing some of the physically based render engines of the time. The goal of the study was to show which render engines work better in situations where a certain technique is more prominent in the scene. Additionally, render time was taken into account. Perez surveyed the available render engines (biased and unbiased) on the market and analyzed their technical features, resulting in a final list of five renderers that matched the criteria. These renderers were then subjected to a side-by-side comparison of different techniques, including bump mapping, depth of field, global illumination, and normal mapping, to name a few. In his conclusion, Perez states that the renderer Reyes had the best

performance when it came to techniques such as geometry displacement, motion blur, and depth of field. However, when photon-calculation techniques such as global illumination, caustics, and subsurface scattering were measured, two other renderers, MentalRay and V-Ray, were preferred. Additionally, Perez noted that when comparing quality versus render time, unbiased render engines were not preferred, as they took longer for every technique to produce a good image; however, in applications without time restrictions, unbiased render engines would eventually produce the most physically accurate image.

In 2014, a study conducted by Hoerter analyzed the threshold at which surface detail of a mesh could no longer be subjectively discerned. The main question in Hoerter's study was to determine the degree to which the implementation of normal maps will "trick" the average viewer into being unable to tell the difference between variously detailed models of the same object. The study aimed to optimize the graphics development pipeline by establishing a clear and defined limit for the level of quality developers want to achieve [16]. A just-noticeable-difference approach was used in which participants rated two versions of a pre-rendered computer-generated character displayed side by side, one with a polygon count four times larger than the other. Using a forced-choice staircase procedure, participants indicated whether they could discern a difference between the models. Each time a difference was noted, a new set of more detailed models was shown; similarly, when a difference was not seen, a new set of less detailed models was shown. Hoerter suggests that increasing the level of detail beyond 14,000 polygons for detailed characters with normal maps yields diminishing returns. It is also stated that for non-normal-mapped models, the range where differentiating details were observed was around 240,000 to 950,000 polygons.

Chapter 3 Background

In this chapter, background information on the topics covered in this thesis is explained.

3.1 Physically Based Rendering

In computer graphics, the term "physically based rendering" implies that a render process uses a physically accurate model to interpret how light affects a scene. These models try to imitate how light behaves in the real world with the use of algorithms. These algorithms are computationally heavy, meaning that they require a large amount of time for accurate results, and can produce images that look lifelike. This is unlike the rendering in video games, where the same calculations are generalized and approximated to save time, which results in a less accurate image. These two render techniques are often referred to as offline and real-time rendering. However, some offline renderers do not exclude generalizations and approximations from their models. Offline renderers can therefore be divided into two categories: biased and unbiased. Bias is a term for the error in an algorithm.

3.1.1 Biased Render Engines

A biased render engine oftentimes does not rely solely on a purely physical model and instead uses optimization algorithms to speed up the render process. These optimizations are not always accurate and can introduce errors in the final image. One feature of biased render engines is that the user has control over the settings for these types of calculations and is able to specify to what degree the renderer should operate. These settings range from selecting the number of rays cast in a scene, to using a cache for interpolation of global illumination and adaptive sampling, to name a few [8]. Because of the shortcuts that biased render engines take, it is possible to achieve results that closely resemble those of an unbiased render in a shorter amount of time, granted that some amount of error will be present.

3.1.2 Unbiased Render Engines

An unbiased render engine objectively produces a more physically correct image than a biased one, because no shortcuts are taken during the render process.


This, however, introduces a major drawback: render time. Some render engines allow a certain amount of time or error to be reached before stopping the process. It can also be argued that every renderer is biased, since many of the reflection algorithms, such as Blinn or GGX, are themselves approximations [22].

3.2 Autodesk Maya

Autodesk is an American company that develops software solutions for engineers, 3D artists, and the entertainment industry in general. Through the Autodesk education community, students can use Autodesk software for free. One of these products is Maya, a 3D software package designed for animation, modeling, and simulation. It is widely used in Hollywood blockbusters as well as in game design. One reason Maya is extremely versatile is its plugin functionality: developers can create their own plugins to interact with the software or add new features. One such feature can be a render engine, and although Maya comes with built-in rendering engines, it has support for third-party solutions like V-Ray [10].

3.3 V-Ray

V-Ray, developed by the Bulgarian company Chaos Group, is a biased render engine that is available as a plugin for over 15 different software tools, including Autodesk Maya. The first version was released in 2002 for 3ds Max, which is developed by Autodesk. It was not until 2009 that a version for Maya was released. About a year later, in 2010, V-Ray 2.0 was released. The latest version as of 2019, V-Ray Next, was released in 2018, with a focus on improving the scene setup for rendering, which can give faster and cleaner results and cuts down on other time-consuming tasks for artists [13]. V-Ray has been used in many feature films and is also used in areas such as games, architecture, and automotive design, to name a few [5].

3.4 V-Ray Quick Settings

V-Ray has multiple settings that can be altered to one's liking. These include, but are not limited to: sampler type, minimum shading rate, min/max subdivisions, and global illumination. Each setting has an impact on the final image one way or another, but for this experiment it was chosen to use the "V-Ray quick settings". V-Ray quick settings is a separate window that can be accessed from the V-Ray shelf in Maya. It lets users control three aspects of the render process: GI quality, shading quality, and anti-aliasing quality [7].

Figure 3.1: V-Ray Quick Settings Interface.

3.4.1 Render Settings Overview

The V-Ray quick settings panel lets users adjust three sliders on a scale from 0 to 100. Moving one slider affects a number of different settings for the renderer; the settings affected by each slider are briefly explained below. First and foremost, V-Ray quick settings has four different presets: ArchViz Interior, ArchViz Exterior, VFX, and Studio setup. Each of these presets affects the options available to the user as well as the parameters they control. As both scenes used in this study are interior scenes, the ArchViz Interior preset was chosen.

Global Illumination (GI) Quality

The default preset for the global illumination engine is Irradiance Map + Light Cache (IM + LC), which is a faster and less accurate preset than Brute Force + Light Cache (BF + LC). BF + LC was therefore chosen in order to achieve more accurate results. The Global Illumination slider thus affects the following settings:

• Brute force subdivs: It determines the number of samples used to approximate GI. This value ranges from 8 at 0% to 64 at 100%.

• Light cache subdivs: It determines how many paths are traced from the camera. This value ranges from 500 to 3000.

• Light cache pre-filter samples: It controls the number of samples taken during prefiltering. Values range from 40 to 20. “Prefiltering is performed by exam- ining each sample in turn, and modifying it so that it represents the average of the given number of nearby samples. More prefilter samples mean a more blurry and less noisy light cache. Prefiltering is computed once after a new light cache is computed or loaded from disk” [7].

• Light cache retrace threshold: Controls the retrace threshold value. "When enabled, this option and its corresponding Retrace threshold value improve the precision of global illumination in cases where the light cache will produce too large an error" [7]. This value ranges from 2 to 8.

Shading Quality

The Shading Quality slider is described in the documentation as follows: "Controls the number of primary rays shot for AA (anti-aliasing) versus secondary rays for other effects like glossy reflections, GI, area shadows, etc. Higher values mean that less time will be spent on AA, and more effort will be put in the sampling of shading effects" [7]. This value ranges from 1 to 64.

Anti-Aliasing (AA) Quality

The Anti-Aliasing slider affects the following settings (an illustrative mapping sketch follows the list):

• Min Subdivs: Controls the initial number of samples per pixel. For cases with thin lines, it can be increased. Ranges from 1 to 2.

• Max Subdivs: Determines the maximum number of samples for a pixel. V-Ray may take fewer than the maximum number of samples if the difference in intensity of the neighboring pixels is small enough. Ranges from 1 to 50.

• Noise Threshold: The threshold that will be used to determine if a pixel needs more samples. Ranges from 0.05 to 0.002 (lower values being more sensitive).
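As an illustration of how such a percentage slider might map onto the underlying parameters, the sketch below linearly interpolates between the endpoint values listed above. The linear mapping is an assumption made for illustration only; V-Ray's actual interpolation curve is not documented here.

```python
# Illustrative sketch only: maps the AA quality slider (0-100) onto the
# documented endpoint values above, assuming a linear interpolation.
def aa_settings(slider_pct: float) -> dict:
    t = slider_pct / 100.0
    lerp = lambda lo, hi: lo + t * (hi - lo)
    return {
        "min_subdivs": lerp(1, 2),
        "max_subdivs": lerp(1, 50),
        # The noise threshold decreases as quality increases (more sensitive).
        "noise_threshold": lerp(0.05, 0.002),
    }

print(aa_settings(50))  # e.g., max_subdivs = 25.5 at the 50% setting
```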

Lastly, the option to select the sampler type is also available. V-Ray has two samplers: bucket and progressive. While they serve the same overall function, they work very differently. The progressive sampler gives a quick image early in the render process, as it renders the entire image in a single pass, with additional passes to enhance the image. Note that the first pass will not give satisfactory results, as images usually need a couple of passes before some of the details become noticeable. Additional passes then refine the image until satisfactory results are achieved.

The bucket sampler, on the other hand, works by rendering different sections, or buckets, of the image. Each bucket calculates the final pixels for its section and then moves on to another section where pixels need to be calculated. The bucket sampler takes a variable number of samples per pixel based on the difference in intensity between the pixel and its neighbors; the AA parameters directly affect the bucket sampler's performance when rendering. The bucket sampler is considered the preferred sampler when dealing with lots of small details and/or blurry effects such as depth of field, motion blur, and glossy reflections, to name a few.

3.5 Image Quality Assessment

Image quality assessment (IQA) is a process for determining the quality of an image. With the increase in media consumption, the bandwidth needed for transmitting images has also increased, which furthers the use of compression algorithms. These algorithms, while achieving their goal of reducing image size, can oftentimes cause distortions or other types of errors that affect the end-users, who are mostly human observers. To reduce this effect, IQA plays a big role in finding the balance between lower bandwidth and quality degradation. Furthermore, IQA is widely used in a broad range of fields where visual signal communication is important, including printing and information systems as well as biomedical imaging, to name a few [19]. IQA methods can be divided into two categories: subjective and objective.

3.5.1 Subjective IQA

Because these images are targeted at humans, it is only reasonable that their quality should be assessed by other humans for accurate results. This process is called subjective image quality assessment. It is, however, very expensive and time-consuming, since it requires a large number of participants to get a good estimate of the general consensus on image quality. Subjective evaluation methods differ, but they generally consist of a group of participants who are instructed to give ratings to a set of images. Some standardized subjective IQA methods are described in [19]; a brief summary is given below.

Single Stimulus Categorical Rating

Here, an image is displayed for a brief amount of time and users are then asked to rate it on a subjective scale of "excellent, good, fair, poor, or bad". The images are displayed in random order.

Double Stimulus Categorical Rating

This method is almost identical to single stimulus, with the exception that the reference image is shown at the same time as the condition image. Users are asked to rate the image on the same scale as above.

Pairwise Similarity Judgments

In pairwise similarity judgments, users are shown two images and must decide which one they prefer. Users are also required to indicate the level of difference between the images on a continuous scale, since an abstract scale like the one used in single and double stimulus rating is not preferred, as described in [18].

Two Alternative Forced-choice Comparison

Two-alternative forced-choice comparison (2AFC) is a method where users are shown two images from the same scene. They are then forced to make a choice based on some criterion, for example, image quality. Since it is a forced choice, a decision has to be made even if the user has no preference. This method is used in the experiment in this thesis.

3.5.2 Objective IQA

A significant limitation of subjective image evaluation stems from the use of real human participants, which requires time and resources. Instead, objective image evaluation algorithms can be used to quickly evaluate an image. Objective IQA, while having an advantage over subjective IQA in terms of speed, is limited by its mathematical model, which serves as a way to predict image quality as closely to human vision as possible. Objective IQA methods can be categorized in three ways. The first is full-reference image quality assessment (FR-IQA), where a reference image is available; this reference image is undistorted and of the highest quality available. One such objective metric is SSIM. The second is reduced-reference image quality assessment (RR-IQA), where the reference image is not perfect or complete. The third, no-reference image quality assessment (NR-IQA), completely lacks access to a reference image [19].

SSIM

The Structural Similarity Index (SSIM), developed by Wang et al. [27], is a method for objectively measuring the degradation of images by measuring the difference between two images. For this to work, a reference image is needed. This reference image is to be considered the better image of the two, as the method itself cannot determine which image is better, only how much they differ. Earlier IQA work by Wang includes the Universal Quality Index (UQI), also known as the Wang–Bovik Index, originally developed in 2001 [12]. In 2004, Wang et al. published their paper on SSIM [27].
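For reference, the SSIM index between two image patches $x$ and $y$, as defined by Wang et al. [27], is

\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]

where $\mu_x$, $\mu_y$ are local means, $\sigma_x^2$, $\sigma_y^2$ are variances, $\sigma_{xy}$ is the covariance, and $C_1$, $C_2$ are small constants that stabilize the division.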

MSSIM

Although SSIM has been shown to outperform other state-of-the-art perceptual image quality metrics, it is limited by its single-scale approach. The appropriate scale can depend on the viewing angle, resolution, or viewing distance from the screen, to which the single-scale SSIM cannot adapt. To combat this, different variations of SSIM exist, such as multiscale SSIM (MS-SSIM), where an image is sub-sampled multiple times. This means that multiple scales, accounting for factors such as resolution and viewing distance, are taken into account when calculating the perceived image quality [28].
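For completeness, the multiscale index combines luminance ($l$), contrast ($c$), and structure ($s$) comparisons computed at $M$ scales [28]:

\[ \text{MS-SSIM}(x, y) = [l_M(x, y)]^{\alpha_M} \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j} [s_j(x, y)]^{\gamma_j} \]

where the exponents $\alpha_M$, $\beta_j$, $\gamma_j$ weight the relative importance of each component and scale.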

DSSIM

Structural dissimilarity (DSSIM) is a method based on SSIM for calculating the dissimilarity between two images. The value returned is 1/SSIM − 1, where a result of 0 means that the image is identical to the reference, and results greater than 0 (up to infinity) indicate the amount of difference between the images.
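In equation form, with a hypothetical example value:

\[ \mathrm{DSSIM} = \frac{1}{\mathrm{SSIM}} - 1, \qquad \text{e.g. } \mathrm{SSIM} = 0.98 \;\Rightarrow\; \mathrm{DSSIM} = \frac{1}{0.98} - 1 \approx 0.0204 \]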

Chapter 4 Method

This chapter covers the methodology and experiment design used in this thesis. In brief, two scenes were used to render 50 images. One of these scenes also served as the test scene for a preliminary study to find the most impactful render setting. A 2AFC method was used in the experiment, where participants indicated their preference between two images. An objective image metric was used to quantitatively measure the difference between the images.

4.1 Scenes

Two scenes, see Figures 4.1 and 4.2, were used in this experiment. The first scene, a 1950s diner, was created by the author of this thesis during his first year at BTH. The scene resembles a restaurant from the mid-1900s with a white and red theme throughout. All materials in the scene were converted to V-Ray materials and adjusted to give an overall better result. The scene was also slightly adjusted to allow more sunlight to enter the room. The second scene, a hotel room with a modern Greek theme, is available online and was created by Naveen Raghul, a civil engineering graduate [2]. This scene was part of Raghul's portfolio and is completely free for anyone to use; no copyright infringement has been made.

Both scenes are interior scenes. Although they both use V-Ray materials, their complexity differs: the diner scene is less complex than the hotel room scene. This is because the diner was created with a real-time game in mind, which resulted in lower-resolution meshes in order to optimize the frame rate. The hotel room, on the other hand, consists of high-resolution meshes and textures. This difference in complexity also shows in the render times for these scenes. Although the diner scene is referred to here as a low-poly scene, it should not be confused with the traditional low-poly artistic style, where visible edges are usually a deliberate design choice; in relative terms, compared to the hotel scene, the diner scene can be called low-poly. The diner scene has a triangle count of 112,949; the hotel scene has a triangle count of 2,109,218.


Figure 4.1: Image of the diner scene used in the experiment.

Figure 4.2: Image of the hotel scene used in the experiment.

4.2 V-Ray Quick Settings

It was decided to use the V-Ray Quick Settings option because it provides an easy interface for achieving fast results. To quote the documentation from Chaos Group: "It is intended to give new V-Ray users the ability to set up scenes without worrying about all the different V-Ray options available in the Render Settings window" [7].

This also provides an easily accessible metric, the percentage scale, without risking out-of-range values for the settings it affects.

4.3 Preliminary Study

A preliminary study was conducted in order to complete the first objective: determining the most impactful render setting. The diner scene served as the test scene for this pre-study. To measure the impact of each setting in the V-Ray quick settings window, a total of 27 images were rendered, covering every combination of the global illumination, shading quality, and anti-aliasing settings. Given the ten weeks available for this thesis, it was decided to render only three conditions for each setting: 0%, 50%, and 100%. This ensured that the preliminary study would not take time needed for the final renders of the two scenes. From this pre-study it was determined to only render conditions where GI and AA had been changed. Details about the results are given in Section 5.1.

4.4 Rendering

The experiment required 25 images to be rendered per scene. One image served as the reference, with all settings at maximum, i.e., 100% global illumination, 100% shading quality, and 100% anti-aliasing. The other 24 images were compared to this reference; their global illumination and anti-aliasing settings were lowered in decrements of 25%, covering every combination of the two settings at 0%, 25%, 50%, 75%, and 100% (excluding the all-100% reference). A compact sketch of this grid is given below.
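The condition grid implied above can be expressed compactly; a minimal sketch in Python (variable names are illustrative):

```python
# The 5x5 grid of GI/AA levels: 25 images per scene, of which (100, 100)
# is the reference.
from itertools import product

levels = [0, 25, 50, 75, 100]                 # percentage steps per setting
conditions = list(product(levels, repeat=2))  # (gi, aa) pairs
assert len(conditions) == 25                  # 24 comparisons + 1 reference
reference = (100, 100)
```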

4.4.1 Rendering Hardware

• CPU: AMD Ryzen 7 1700X eight-core processor, 3750 MHz, 8 cores, 16 logical processors

• Motherboard: ASUS PRIME X370-PRO

• GPU: Nvidia GeForce GTX 1080 Ti

• RAM: G.Skill Trident Z 16 GB 3200 MHz CL14

• OS: Windows 10 version 1803

4.5 Experiment Setup and Procedure

The experiment took place at Blekinge Tekniska Högskola. Information letters had been put up before the experiment to attract participants. Over the course of three days, random students from the school were briefed verbally about the experiment and asked to participate.

The participants were led to a small room whose windows were covered with blinds to prevent disturbance from outside. Participants were asked to sign a consent form and were informed that they could end the experiment at any time without giving an explanation for doing so. Participants who normally use glasses or lenses were instructed to wear them to avoid any interference with viewing the images. Participants were then seated about 0.5 m in front of the screen and informed about the instructions for operating the program.

A two-alternative forced-choice (2AFC) method was used during the experiment, and participants were shown two images side by side on the screen. Both images were from the same scene and camera position: one was the reference and the other was one of the conditions with altered settings. Participants were instructed to select the image that they considered better looking. Since there were a total of 96 comparisons to be rated (24 conditions per scene, each presented twice, across two scenes), a timer of 10 seconds was used to encourage the participant to make a choice in order to keep the experiment within its time limit. After each choice, the images turned grey for about 2 seconds so as not to expose the user to constant stimuli; a new pair of images was then shown and the timer reset.

After reviewing and rating all of the images from both scenes, participants were asked to fill out a small questionnaire. This questionnaire gathered general information about the participants, such as age and previous experience with rendering or computer graphics, and asked whether and what they noticed as different between the two images.

4.6 Ethics

When considering ethical issues, it was determined that participants would not be exposed to any danger during the procedure, and there was no obligation to continue if a participant changed his or her mind.

4.7 Experiment Specifications

• Computer: Microsoft Surface Pro 4 – 256 GB / Intel Core i7

• Monitor: PHL 258B6QJEB 25” AH-IPS LCD 1440p

4.8 Custom Software Implementation

Custom software developed in C# was used during the experiment. The software implemented the 2AFC method, with a custom interface and logic for data capture. A reference image and the subsequent condition images are selected and loaded into the program, and the comparisons are shown when pressing start. The user adjusts a slider to indicate which image is preferred; the slider always points at one of the images. A checkbox under the slider is also available for the user to indicate that they cannot determine a preferred image. A timer shows how much time the user has to make a choice. After selecting a preferred image, the user clicks "next" and is taken to a new comparison.

When the images are first added, they are placed in random order in an "image queue" and each is assigned a position, either right or left. The queue is then generated again, adding the same images but with the positions reversed. The next image shown to the user after pressing "next" is the next one in the image queue. When all images have been shown, the message "done" is displayed. The results can then be saved to a text file, which lists the condition images and the score (preference) the user gave to each of them. These data files were coupled with the questionnaire that the participants answered after the experiment. Note that this data was anonymous and could not be associated with the consent form on which the participant's name was written. This data was then analyzed.
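A minimal sketch of the queue logic described above, given in Python for illustration (the actual tool was written in C#; all names here are hypothetical):

```python
import random

def build_image_queue(conditions):
    """Each condition appears twice: once with the reference on a random
    side, and once more with the sides reversed, so side bias cancels out."""
    first_pass = [(c, random.choice(["left", "right"])) for c in conditions]
    random.shuffle(first_pass)
    # Regenerate the queue with the same conditions but positions reversed.
    second_pass = [(c, "right" if side == "left" else "left")
                   for c, side in first_pass]
    random.shuffle(second_pass)
    return first_pass + second_pass
```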

Figure 4.3: Screenshot of the custom software that was used during the experiment.

4.9 DSSIM IQA Metric

An objective method was used to evaluate the image conditions, as this is a rather quick approach and could potentially provide interesting results when compared to the subjective evaluation done by the participants. An open-source tool [17], based on the MS-SSIM algorithm developed by Wang et al., was used to determine the dissimilarity between the reference image and the subsequent conditions for both scenes. The value returned is 1/SSIM − 1, where 0 means that the image is identical to the reference and values greater than 0 (up to infinity) indicate the amount of difference. The version used was 2.9.7.
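As an example of how such a tool can be driven in batch, the sketch below shells out to the `dssim` binary. It assumes the CLI form `dssim reference.png condition.png`, which prints a dissimilarity score (0 = identical) followed by the file name; all file names here are hypothetical.

```python
import subprocess

reference = "reference_gi100_aa100.png"       # hypothetical file names
conditions = ["cond_gi100_aa75.png", "cond_gi0_aa0.png"]

for image in conditions:
    result = subprocess.run(["dssim", reference, image],
                            capture_output=True, text=True, check=True)
    print(result.stdout.strip())              # e.g. "0.000029  cond_..."
```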

Chapter 5 Results

The results from the preliminary study and experiment are described in the sections below.

5.1 Preliminary Study

The results were unforeseen. It was hypothesized that increased settings would also increase the render time to a somewhat linear degree, meaning that all settings at 100% would have a higher render time than all settings at 0%. While this turned out to be true, there were some exceptions, specifically with the shading quality setting. In some cases, increasing the shading quality setting decreased the render time, contrary to the hypothesis, see Figure 5.1. Two examples of this behavior: when GI, SQ, and AA were all set to 100%, the measured render time was about 37 minutes, while the same settings with SQ at 0% resulted in a render time of 50 minutes. The same behavior was observed with all settings at 50%, resulting in an 11-minute render, whereas decreasing SQ to 0% resulted in a 14-minute render, see Table 5.1. This led to the decision to only render conditions where AA and GI had been changed. When lowering GI and AA to 0%, SQ did not affect the render time, and each render took about 30 seconds.

When measuring each specific setting individually, anti-aliasing had the most impact on render time by a large margin. With every other setting at 0%, 100% anti-aliasing resulted in a 39-minute render. Global illumination at 100% had a render time of about 5 minutes. Shading quality, as stated above, only took 30 seconds. A 3D scatter plot was created to visually show the difference in render time for each setting, see Figure 5.3.

Table 5.1: Shading quality proved unreliable, as render time increased when the SQ percentage was decreased.

Global Illumination   Shading Quality   Anti-Aliasing   Render Time
100                   100               100             36m 52s
100                   0                 100             50m
50                    50                50              10m 44s
50                    0                 50              14m 1s


Figure 5.1: Percentage of shading quality and its resulting render time.

Figure 5.2: Anti-aliasing and global illumination effect on render time at 0%, 50%, and 100%.

Figure 5.3: 3D scatter plots showing the combinations of the three render settings and the corresponding render times, from different perspectives.

5.2 Subjective Results

In total, 1056 votes were collected and analyzed for each scene. Participants could vote for the reference image in every single comparison, whereas each condition image could receive a maximum of two votes per participant. All votes were therefore combined and normalized to find the most preferred images. The results show that conditions with higher settings are preferred over lower settings. Tables 5.2 and 5.3 show each condition, its votes, the votes for the reference image, and the p-value; bold values indicate significant cases. Since each of the 22 participants could give a maximum of two votes per condition, the maximum number of votes for either the condition or the reference is 44. The p-value is derived from a chi-square test [3] in which the observed frequencies are the votes for the condition and reference images and the expected frequency is the maximum number of votes divided by two, i.e., 22, which is what would be expected if no difference could be observed and preference were random.
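The reported p-values can be reproduced with a standard goodness-of-fit test; a minimal sketch (Python/SciPy, with hypothetical variable names):

```python
# Chi-square goodness-of-fit test against the expected 22/22 vote split.
from scipy.stats import chisquare

condition_votes = 23   # e.g., diner scene, GI 100% / AA 75% (Table 5.2)
reference_votes = 21
result = chisquare([condition_votes, reference_votes])  # expected: [22, 22]
print(result.pvalue)   # ~0.7630, matching the table
```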

Global Illumination Anti-Aliasing Condition Votes Reference Votes p-value 100 75 23 21 0.7630246 100 50 23 21 0.7630246 75 50 23 21 0.7630246 25 75 21 23 0.7630246 25 25 21 23 0.7630246 0 100 21 23 0.7630246 75 100 20 24 0.5464936 50 75 20 24 0.5464936 0 25 20 24 0.5464936 50 50 19 25 0.3657123 0 75 25 19 0.3657123 50 100 18 26 0.2277999 25 100 26 18 0.2277999 0 50 18 26 0.2277999 100 25 17 27 0.1316680 75 75 28 16 0.0704404 75 25 16 28 0.0704404 50 25 16 28 0.0704404 25 50 14 30 0.0158613 100 0 0 44 0 75 0 1 43 0 50 0 1 43 0 25 0 0 44 0 0 0 0 44 0

Table 5.2: Every condition with number of votes and p-value for the diner scene.

Global Illumination Anti-Aliasing Condition Votes Reference Votes p-value 75 100 22 22 1 50 100 23 21 0.7630246 50 75 21 23 0.7630246 25 100 21 23 0.7630246 0 75 21 23 0.7630246 75 75 24 20 0.5464936 25 50 20 24 0.5464936 75 50 19 25 0.3657123 25 75 18 26 0.2277999 0 50 17 27 0.1316680 0 100 16 28 0.0704404 75 25 15 29 0.0348084 50 50 15 29 0.0348084 100 50 14 30 0.0158613 100 75 31 13 0.0066556 50 25 12 32 0.0025688 0 25 12 32 0.0025688 100 25 11 33 0.0009111 25 25 11 33 0.0009111 75 0 5 39 0.0000003 100 0 4 40 0 50 0 4 40 0 25 0 3 41 0 0 0 1 43 0

Table 5.3: Every condition with number of votes and p-value for the hotel scene.

5.2.1 How Render-Settings Affect Votes

As can be seen in Figure 5.8, conditions where AA is more prominent also gain more votes. Additionally, every condition with AA at 0% has the fewest votes in both scenes; the converse, however, does not hold for AA at 100%. Furthermore, analyzing the 2AFC results with a chi-square test indicates that every condition with 0% AA falls within the margin of significance (p < 0.05), specifically, that participants did have a preference between the reference image and the condition image. Together with the vote counts per image, this supports the conclusion that users did not prefer images with 0% AA. This was true for both scenes.

Global illumination had no significant influence on the preference of the participants, as can be seen in Figure 5.9. Although this was true for both scenes, a clear but small downward trendline can be seen for the hotel scene.

Diner Scene

Only one condition in the diner scene, with 25% GI and 50% AA, reached the same significance level (p < 0.05) as the 0% AA images. Although this case was significant, a clear distinction in the number of votes was observed compared to the 0% AA conditions: the aforementioned condition received 14 votes out of 44, whereas all five 0% AA conditions received a total of two votes combined.

Figure 5.4: Column chart representing the normalized votes for each condition for the diner scene.

Hotel Scene

Looking at the hotel scene, a rather different result was observed. Not only was every 0% AA condition significant, but every 25% AA condition was as well. Also surprising is that the condition with 100% GI and 75% AA fell within the margin of significance, but with a staggering 31 votes, meaning that this image was preferred over the reference image by a significant amount. The results also show that, for the hotel scene, two conditions with 50% AA fell within the margin of significance, with participants voting in favor of the reference image. No condition with 75% to 100% AA was significant.

Not Sure Indication

During the experiment, participants had the option to indicate if they were not sure about their decision in determining which image was better. Tables 5.4 and 5.5 list their "not sure" answers for each condition. The results are in line with the previously mentioned finding that participants preferred images where AA is more prominent, specifically, at least 25% for the diner scene and 50% for the hotel scene. In the hotel scene, conditions where AA approached 75% to 100% also had more participants indicate that they were not sure which image was better.

Global Illumination Anti-Aliasing Not Sure Indication 75 75 22 25 75 21 25 25 19 0 75 19 25 100 18 25 50 18 0 100 18 100 50 17 50 50 17 0 25 17 100 25 15 75 50 15 50 75 15 100 75 14 75 100 14 50 100 14 50 25 13 75 25 12 0 50 11 100 0 3 75 0 2 50 0 1 25 0 0 0 0 0

Table 5.4: The number of times participants indicated that they were not sure of being able to determine which image looked better when compared to the reference. Diner scene.

Global Illumination Anti-Aliasing Not Sure Indication 50 100 23 25 100 21 50 75 20 0 100 20 75 100 19 25 75 19 0 75 19 100 75 18 75 75 17 25 50 17 25 25 17 75 50 16 0 50 16 50 50 14 75 25 13 50 25 12 100 50 11 100 25 10 0 25 8 100 0 0 75 0 0 50 0 0 25 0 0 0 0 0

Table 5.5: The number of times participants indicated that they were not sure of being able to determine which image looked better when compared to the reference. Hotel scene.

Figure 5.5: Column chart representing the normalized votes for each condition for the hotel scene.

5.2.2 Survey Results

Out of the 22 participants, 5 answered that they had previous experience in computer graphics or rendering. When asked about noticeable differences between the images, a majority of the answers revolved around the effects of anti-aliasing; specifically, the consensus was that some of the images looked "blurry", "pixelated", or "grainy". Four participants noticed a difference in the shadows, five felt that there was a difference in the lighting, and six noted that most of the images were very hard to differentiate. This last statement was also brought forward verbally by the majority of participants after the experiment had ended, although no record of this exists.

5.2.3 Previous Experience vs No Experience

To look for a correlation between previous experience in computer graphics or rendering and the perceptual opinion of the rendered images, the votes were analyzed against the survey data in which participants indicated their previous experience with the subject.

Diner

Comparing results for the diner scene, both groups agreed that 0% AA was not preferred at all, with a score of 0 for experienced participants and 0.0117 for non-experienced. The biggest difference in perceived quality between the groups appeared at the 25% AA condition: experienced participants had an average score of 0.28, whereas non-experienced participants scored an average of 0.447, a difference of about 46%. The differences for the remaining conditions were about 4.67% for 100% AA, 7.90% for 75% AA, and 0.25% for 50% AA, see Figure 5.10.

Figure 5.6: Column chart representing the p-value for the diner scene.

Hotel

In the hotel scene, the difference at 100% AA was only about 2.25%; 75% AA had a difference of 21.7%; 50% and 25% AA were about 10% each; and 0% AA had a 129% difference, with a score of 0.02 for experienced participants and 0.094 for non-experienced, see Figure 5.11. As previously mentioned in Section 5.2.1, global illumination did not seem to have any effect on user votes, and it did not produce any reliable data, mostly because of the low sample size for the experienced participants, see Figure 5.12.

5.3 Objective Results

5.3.1 DSSIM Results

A DSSIM comparison was also done to test how an objective algorithm would rate the similarity, or in this case the dissimilarity (DSSIM), of the images. In both scenes, the conditions with 0% anti-aliasing scored the highest in dissimilarity. What was noticed in both scenes, especially the hotel scene, was that the range from

Figure 5.7: Column chart representing the p-value for the hotel scene.

25% AA to 100% AA differed drastically, scoring much lower in dissimilarity than 0% AA. For example, in the hotel scene, the percentage difference from the lowest-scoring 0% AA condition to the highest-scoring 25% AA condition was around 148%; for the diner scene, it was about 87%. These objective results coincided reasonably well with the participants' subjective preference for images with at least 25% AA.
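These percentages are consistent with the symmetric percentage-difference formula (an assumption about how they were computed), using the hotel-scene extremes from Table 5.7:

\[ \frac{|0.048827 - 0.007313|}{(0.048827 + 0.007313)/2} = \frac{0.041514}{0.028070} \approx 1.48 \approx 148\% \]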

5.3.2 Render Time

For every condition rendered, including the reference, the render time was recorded. Tables 5.8 and 5.9 list each image with its respective render time as well as p-value. From this data, it was clear that render time correlated with the participants' preferences. It was also clear that non-significant conditions had a lower render time than the reference. Comparing the render time of the reference image against the lowest non-significant condition image, the reference took 583% longer to render for the diner scene and 313% longer for the hotel scene.
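These figures follow directly from Tables 5.8 and 5.9, comparing the reference render times against the fastest non-significant conditions (324 s for the diner, 3601 s for the hotel):

\[ \frac{2212 - 324}{324} \approx 5.83 \;\Rightarrow\; 583\%, \qquad \frac{14880 - 3601}{3601} \approx 3.13 \;\Rightarrow\; 313\% \]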

Global Illumination Anti-Aliasing DSSIM 100 100 0 100 75 0.000029 75 75 0.000299 75 100 0.000327 50 100 0.000988 50 75 0.000988 100 50 0.001021 75 50 0.001176 25 100 0.001559 25 75 0.001565 50 50 0.001811 100 25 0.002155 75 25 0.002271 25 50 0.002365 50 25 0.002866 0 100 0.003115 0 75 0.003117 25 25 0.003412 0 50 0.003897 0 25 0.004928 100 0 0.012450 75 0 0.012583 50 0 0.015133 25 0 0.020295 0 0 0.049341

Table 5.6: DSSIM value for each condition. Diner scene.

Global Illumination Anti-Aliasing DSSIM 100 100 0 100 75 0.000019 75 75 0.000432 75 100 0.000479 50 100 0.001330 50 75 0.001331 100 50 0.001495 75 50 0.001756 25 100 0.001842 25 75 0.001842 50 50 0.002495 25 50 0.002980 100 25 0.003312 75 25 0.003451 50 25 0.004018 25 25 0.004449 0 100 0.004834 0 75 0.004837 0 50 0.005909 0 25 0.007313 100 0 0.048827 75 0 0.049354 50 0 0.050461 25 0 0.052755 0 0 0.071551

Table 5.7: DSSIM value for each condition. Hotel scene.

Figure 5.8: Normalized votes for each percentage of anti-aliasing.

Figure 5.9: Normalized votes for each percentage of global illumination.

Figure 5.10: Normalized votes on different anti-aliasing settings by experienced and non-experienced participants. Diner scene.

Figure 5.11: Normalized votes on different anti-aliasing settings by experienced and non-experienced participants. Hotel scene.

Figure 5.12: Average votes on different global illumination settings by experienced participants. Data is inconclusive.

Figure 5.13: DSSIM results for the diner scene. 0 means identical to reference image.

Figure 5.14: DSSIM results for the hotel scene. 0 means identical to reference image.

Global Illumination Anti-Aliasing Render Time (s) p-value 100 100 2212 100 75 2126 0.7630246 75 100 1974 0.5464936 75 75 1905 0.0704404 50 100 1784 0.2277999 25 100 1719 0.2277999 0 100 1668 0.7630246 50 75 1664 0.5464936 0 75 1642 0.3657123 25 75 1620 0.7630246 100 50 893 0.7630246 75 50 801 0.7630246 50 50 701 0.3657123 25 50 676 0.0158613 0 50 664 0.2277999 100 25 435 0.1316680 75 25 388 0.0704404 50 25 342 0.0704404 25 25 334 0.7630246 0 25 324 0.5464936 100 0 312 0 75 0 211 0 50 0 117 0 25 0 61 0 0 0 30 0

Table 5.8: Every condition and its respective render time as well as p-value. Diner scene.

Global Illumination Anti-Aliasing Render Time (s) p-value 100 100 14880 100 75 14520 0.0066556 75 100 13260 1 75 75 11820 0.5464936 50 75 9660 0.7630246 50 100 9600 0.7630246 25 100 9360 0.7630246 25 75 9180 0.2277999 0 100 9000 0.0704404 0 75 8820 0.7630246 100 50 6000 0.0158613 75 50 4920 0.3657123 25 50 3840 0.5464936 0 50 3601 0.1316680 50 50 3486 0.0348084 100 25 3067 0.0009111 75 25 2507 0.0348084 100 0 2301 0.0000000 0 25 1864 0.0025688 25 25 1794 0.0009111 50 25 1792 0.0025688 75 0 1317 0.0000003 50 0 483 0 25 0 298 0 0 0 148 0

Table 5.9: Every condition and its respective render time as well as p-value. Hotel scene.

Chapter 6 Analysis and Discussion

Three objectives were established to answer the research question of whether a perceived difference can be observed when lowering the most impactful render settings in V-Ray. With the method described above, not only was it determined which setting has the most significant effect on render times, but also that this setting was heavily correlated with user preference.

In the preliminary study, the anti-aliasing setting was determined to be the most impactful of the three "V-Ray quick settings" used in this study. The global illumination setting had a negligible effect on render time, see Figure 5.2. When determining the effect of the shading quality setting, it was observed that, contrary to the other settings, the render impact became higher when lowering the percentage. This led to the decision to leave it at 100%, because of the intricate nature of the V-Ray renderer and how it prioritizes anti-aliasing versus other effects, as described in Section 3.4.1. Figure 5.1 shows the varying render times for the different shading quality settings. It is, however, important to note that the shading quality setting could have affected the perceived visual quality if it had been altered.

The main experiment showed that a significant number of participants did not vote for the images where anti-aliasing had been turned down to 0%. This was the case for every such condition, regardless of the global illumination percentage. While this held for the diner scene alone, the hotel scene showed the same result for images with 25% anti-aliasing as well. There could be a couple of reasons for this. Firstly, the higher polygon count in the hotel scene allows for more smooth surfaces, where the effects of lower anti-aliasing are more noticeable. Secondly, which ties into the first point, the wall behind the beds, which is directly centered, makes up a large part of the image, and a large centered object can draw a lot of the viewer's attention. This wall is also heavily smoothed and features a dented pattern, which could further amplify the anti-aliasing effect.

Only two conditions with 50% anti-aliasing fell within the margin of significance where users preferred the reference image. Conversely, there was only one significant case where users preferred the condition over the reference image: the image with 100% GI and 75% AA. There is no clear explanation for this result. One potential argument is that an anomaly occurred during the render process and the image stood out from the reference in a way perceivable to the participants. However, this contradicts the results from the dissimilarity test, where the image in question scored the lowest of all the conditions, meaning that it was the most similar to the reference.

Regarding the quality perception of experienced versus non-experienced participants, the results show a small but noticeable difference between the two groups. Even so, it should be noted that the difference in sample size between the groups is rather large, with 5 experienced and 17 non-experienced participants. There is no certain way of telling whether the results from the 5 experienced individuals are due to their experience, the low sample size, or simply random coincidence.

The limitations of this study should not be dismissed. It can be argued that the low sample size of randomly selected participants could skew the data in an unfavorable direction. Although this cannot be measured either way, a larger sample size is generally always preferred. The use of a low-poly scene can also be considered misleading, as computer-generated imagery produced with tools such as V-Ray often strives for photorealism; with this in mind, a high-poly scene that fits this motive was also included, and data was gathered individually for both scenes. The use of two interior scenes could also skew the results toward similar scenes, potentially limiting how well they apply to exterior scenes. In other forced-choice studies, each condition is usually compared against every other condition, but this results in more trials, 0.5n(n − 1) for n conditions, as described by Mantiuk et al. [18]. Furthermore, Mantiuk et al. also outline different approaches to limit the number of trials, namely balanced incomplete block designs [14] or, even more effectively, the use of a sorting algorithm [25].

One of the motivations for this study was the difficulty of subjectively measuring the quality of a computer-generated image during the production phase. It is not uncommon to end up with mental fatigue after prolonged periods of cognitive activity [11], making it hard to spot easy mistakes or causing one to focus on the wrong part of the image. This study served as a way to get an understanding of the general perception of these images and how the different settings affect the perceived quality.

While the main focus of this study was the subjective results from the participants, the objective results coincided well enough that the objective metric could be used as a faster analytical tool to measure the quality of a computer-generated image. This is highly beneficial, since it is significantly easier to measure image quality with software than with human participants.

Chapter 7 Conclusions and Future Work

This study concludes that a difference can be noticed when reducing the most demanding render setting, anti-aliasing. The results show that a minimum of 25% AA was required for less complex scenes where smooth topology is sparse; for scenes with higher complexity, such as smooth topology and reflections, at least 50% AA was preferred. Additionally, the use of an objective image assessment tool can drastically speed up the process of targeting a specific visual quality. When comparing the subjective and objective results against each other, the objective threshold for the subjective preference of the images lay in the average range of 0.060446 to 0.002908; every condition below 0.002908 was not proven to be statistically significant in the subjective analysis. Lastly, it was observed that the reference image took 583% and 313% longer to render than the lowest non-significant image for the diner and hotel scene, respectively. This implies that, since those conditions were not significant, i.e., no difference could be observed by the participants, a lower render time can be achieved without affecting perceived quality.

One considerable limitation of this thesis was the low number of settings that could be tested without going out of scope given the time available for this project. Future work would consist of altering more settings, preferably individually and with a finer granularity of conditions per setting: instead of 5 steps ranging from 0% to 100%, 10 steps per setting could be tested. This also depends on the type of setting tested, as this thesis used the V-Ray quick settings window, which uses percentages of predetermined ranges, whereas individual settings mostly rely on user and/or software limitations. Furthermore, a larger sample size of participants would be preferred to get more accurate results.

Another interesting question is whether participants would notice that a computer-rendered picture was rendered with low quality without prior knowledge of it. This could consist of a website where either interior or exterior environments are shown, as a type of gallery. Would people notice or care if an image looked "bad" without first looking at a high-quality reference image? Such a study could also look closely at different target audiences and their respective impressions.


References

[1] 3d design, engineering construction software. https://www.autodesk.com/.

[2] 3d portrayer - my portfolio. http://3dportrayer.com/!portfolio.html/.

[3] Calculation for the chi-square test. http://www.quantpsy.org/chisq/chisq.htm/.

[4] Chaos group | rendering simulation software – v-ray, vrscans phoenix fd. https://www.chaosgroup.com/.

[5] Film projects – rendered with v-ray: Chaos group. https://www.chaosgroup.com/gallery/industry/feature-film/.

[6] Maxon computer gmbh. https://www.maxon.net/.

[7] Render settings: V-Ray tab. https://docs.chaosgroup.com/display/VRAY3MAYA/V-Ray+for+Maya+Help/.

[8] The truth about unbiased rendering. https://www.chaosgroup.com/blog/the-truth-about-unbiased-rendering/.

[9] J. Amanatides. Realism in computer graphics: A survey. IEEE Computer Graphics and Applications, 7(1):44–56, Jan 1987.

[10] Bethany. Autodesk – everything you need to know, Aug 2019. https://www.scan2cad.com/cad/autodesk/.

[11] Maarten A. S. Boksem, Theo F. Meijman, and Monicque M. Lorist. Effects of mental fatigue on attention: an ERP study. Cognitive Brain Research, 25(1):107–116, 2005.

[12] Brian L Evans and Wilson S Geisler. Rate scalable foveated image and video communications. 2001.

[13] Chaos Group. Chaos Group launches V-Ray Next for 3ds Max, May 2018. https://www.globenewswire.com/news-release/2018/05/18/1508937/0/en/Chaos-Group-Launches-V-Ray-Next-for-3ds-Max.html/.

[14] Harold Gulliksen and Ledyard R. Tucker. A general procedure for obtaining paired comparisons from multiple rank orders. Psychometrika, 26(2):173–183, Jun 1961.


[15] Robert Günther, Stefan Guthe, and Michael Guthe. A visual model for quality driven refinement of global illumination. Pages 1–8. ACM, 2017.

[16] Michael Edward Hoerter. Just noticeable difference survey of computer generated imagery using normal maps. 2014.

[17] Kornelski. kornelski/dssim, Jun 2019. https://github.com/kornelski/dssim/.

[18] Rafał K. Mantiuk, Anna Tomaszewska, and Radosław Mantiuk. Comparison of four subjective methods for image quality assessment. In Computer Graphics Forum, volume 31, pages 2478–2491. Wiley Online Library, 2012.

[19] Pedram Mohammadi, Abbas Ebrahimi-Moghadam, and Shahram Shirani. Subjective and objective quality assessment of image: A survey. Majlesi Journal of Electrical Engineering, 9(1):55–83, Dec. 2014.

[20] Karol Myszkowski, Takehiro Tawara, Hiroyuki Akamine, and Hans-Peter Seidel. Perception-guided global illumination solution for animation rendering. Pages 221–230. ACM, 2001.

[21] Louise Blom Pedersen. A study in perceived believability: Utilising visual movement to alter the level of detail. Aalborg University Copenhagen, 2013.

[22] Francisco Pérez Roig. Photorealistic physically based render engines: a comparative study. PhD thesis, 2012.

[23] Paul Rademacher, Jed Lengyel, Edward Cutrell, and Turner Whitted. Measuring the perception of visual realism in images. In Steven J. Gortler and Karol Myszkowski, editors, Rendering Techniques 2001, pages 235–247, Vienna, 2001. Springer Vienna.

[24] Zvonimir Sabati et al. Ray tracing algorithm rendering. In Central European Conference on Information and Intelligent Systems, page 221. Faculty of Organization and Informatics Varaždin, 2011.

[25] D. Amnon Silverstein and Joyce E. Farrell. Efficient method for paired comparison. Journal of Electronic Imaging, 10(2):394–399, 2001.

[26] Vladimir Volevich, Karol Myszkowski, Andrei Khodulev, and Edward Kopylov. Using the visual differences predictor to improve performance of progressive global illumination computation. ACM Transactions on Graphics (TOG), 19(2):122–161, 2000.

[27] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[28] Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. IEEE, 2003.

Appendix A Supplemental Information

A.1 Acronyms

• 2AFC - Two-Alternative Forced Choice

• AA - Anti-Aliasing

• BF - Brute Force

• DSSIM - Structural Dissimilarity

• GI - Global Illumination

• IM - Irradiance Map

• IQA - Image Quality Assessment

• LC - Light Cache

• LOD - Level of Detail

• SSIM - Structural Similarity

• SQ - Shading Quality

• UQI - Universal Quality Index

• VFX - Visual Effects

