ABSTRACT

HUSSAIN, SYED ASIF. Stereoscopic, Real-time, and Photorealistic Rendering of Natural Phenomena – A GPU based Particle System for Rain and Snow. (Under the direction of Dr. David F. McAllister and Dr. Edgar Lobaton.)

Natural phenomena exhibit a variety of forms such as rain, snow, fire, smoke, fog, and

clouds. Realistic stereoscopic rendering of such phenomena has implications for scientific visualization, entertainment, and the video game industry. However, among all natural

phenomena, creating a convincing stereoscopic view of computer-generated rain or snow,

in real-time, is particularly difficult. Moreover, the literature in rendering of precipitation

in stereo is non-existent and research in stereo rendering of other natural phenomena

is sparse. A survey of recent work in stereo rendering of natural phenomena, such as vegetation, fire, and clouds, is done to analyze how stereoscopic rendering is implemented.

Similarly, a literature review of monoscopic rendering of rain and snow is completed to

learn about the behavior and distribution of particles in precipitation phenomena. From

these reviews, it is hypothesized that the monoscopic rendering of rain or snow can be

extended to stereo with real-time and photorealistic results. The goal of this study is to validate this hypothesis and demonstrate it by implementing a particle system using a

graphics processing unit (GPU).

The challenges include modeling realistic particle distributions, use of illumination

models, and the impact of scene dynamics due to environmental changes. The modern

Open Graphics Library (OpenGL) Shading Language (GLSL) and the single instruction multiple threads (SIMT) architecture are used to take advantage of data-parallelism in a graphics processor.

The particle geometry is modeled by a few vertices, which are morphed into raindrop or snowflake shapes. Every vertex is processed in parallel using the SIMT GPU architecture. A compute shader, a program written for the GPU's compute mode, is used

to implement the effects of physical forces on rain or snow particles. Additionally, for rain,

the concept of retinal persistence is used to elongate the raindrop so that it appears as a

falling rain streak. Dynamic level of detail on rain streaks and snowflakes is implemented

so that particles closer to the viewer have more visual detail than particles farther away.

Illumination models are applied for photorealistic output. The scene is rendered for the left-

and right-eye views to produce stereoscopic output, while reducing rendering complexity

by drawing some features such as object shadows only once.

Additional experiments are performed to evaluate and compare various 2D-3D software video converters. The goal of these experiments is to determine the effectiveness of the

2D-3D converters in producing realistic stereoscopic output of scenes containing water

phenomena. Such scenes are challenging to convert due to scene complexity, such as

details in scene dynamics, illumination, and reflective distortion. Comparisons between

five 2D-3D software video converters are provided by using quantitative and subjective

evaluations. The study concludes with experiments on the visual factors necessary to produce photorealistic output. The experimental method uses a series of controlled human

experiments where participants are presented with video clips and still photographs of

real precipitation. The stimuli vary along three visual factors: the number of particles,

particle sizes, and their motion. The goal is to determine the statistical ranking and importance of these visual factors for producing a photorealistic output. The experiments

are extended to investigate if stereo improves photorealism. Experimental stimuli include

post-processing on rendered output to produce variable lighting, glow, and fog effects to

study their impact on photorealism as the camera moves in the scene.

© Copyright 2017 by Syed Asif Hussain

All Rights Reserved

Stereoscopic, Real-time, and Photorealistic Rendering of Natural Phenomena – A GPU based Particle System for Rain and Snow

by Syed Asif Hussain

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Engineering

Raleigh, North Carolina

2017

APPROVED BY:

Dr. David F. McAllister, Co-chair of Advisory Committee
Dr. Edgar Lobaton, Co-chair of Advisory Committee

Dr. Edward Grant Dr. Gregory T. Byrd

Dr. Theodore R. Simons

DEDICATION

This dissertation is dedicated to my family, whose unending support and love made this work possible, and to my committee, who gave me the necessary foundation to reach my goals.

BIOGRAPHY

Syed grew up in Pakistan. His favorite after school activity was to use his brother’s Sinclair ZX

Spectrum personal computer. This early introduction to computers inspired him to pursue higher education in computer engineering. As luck would have it, his brother won a scholarship to study at Duke University and encouraged Syed to follow his engineering dreams at NC

State University. In 1994, Syed graduated from NC State with a Bachelor of Science degree in computer engineering with a minor in mathematics. He continued his graduate studies at NC State while working as a software engineer, which resulted in a Master of Science in electrical engineering with thesis in 1998. He went on to work in the software industry and moved to Massachusetts. He was conferred an online degree from NC State in

2003, a Master of Engineering with a concentration in computer science. Syed came back to Raleigh to join NC State with an ultimate goal: to become a Doctor of Philosophy. Along with achieving his research objectives, Syed has been teaching at a local community college since 2010.

ACKNOWLEDGEMENTS

I would like to acknowledge and thank every member of my committee. Special thanks

to my academic advisor Dr. McAllister for motivating me to move onward and upward.

His expertise is paramount in my understanding of the field. My accomplishments are

incomplete without his support. My committee co-chair Dr. Lobaton has always been

understanding, accessible, and open for discussion. His insight has led to many improvements in my work. Many thanks to Dr. Grant for his engaging lectures and for always being

on my side. Always welcoming, Dr. Byrd has been instrumental in providing me with encouragement in times of need. Thank you for always keeping your door open and being

approachable, kind, and fair. Before anyone on my committee, there was Dr. Simons. He,

along with John Wettroth of Maxim Integrated, played a pivotal role in helping me find my way and pointing me in the direction of success. Many thanks to Dr. Perros, as without him

being approachable and open to discuss my plans, I would not have been able to form a

committee of exceptional faculty. I am forever in debt. Thank you.

Lastly, I must acknowledge my family. To my wife for giving me hope of success when I

needed it the most and to our daughters and son for giving meaning to our lives. Thank

you for all the joy and opportunities to provide unconditional love.

TABLE OF CONTENTS

LIST OF TABLES ...... viii

LIST OF FIGURES ...... ix

Chapter 1 Introduction ...... 1
    1.1 Research Objectives ...... 3
        1.1.1 Stereoscopic Rendering ...... 4
        1.1.2 Real-time Execution ...... 4
        1.1.3 Photorealistic Output ...... 5
        1.1.4 The 2D-3D Conversion ...... 5
        1.1.5 Perceptual Space and Measuring Photorealism ...... 6
    1.2 Research Contributions ...... 6
    1.3 Chapters Layout ...... 7

Chapter 2 Formation of Rain and Snow ...... 9
    2.1 Types of Rain and Snow ...... 10
        2.1.1 Convectional Rain ...... 11
        2.1.2 Frontal Rain ...... 12
        2.1.3 Relief or Orographic Rain ...... 13
        2.1.4 Dry Snow ...... 13
        2.1.5 Wet Snow ...... 14
    2.2 Rain Intensity, Size and Shape ...... 14
    2.3 Snow Intensity, Size and Shape ...... 15

Chapter 3 Literature Review ...... 19
    3.1 Stereoscopic Rendering of Natural Phenomenon ...... 20
    3.2 Monoscopic Rendering of Rain ...... 21
    3.3 Monoscopic Rendering of Snow ...... 30
    3.4 Computational Analysis ...... 38
        3.4.1 Rendering Performance – Rain ...... 38
        3.4.2 Rendering Performance – Snow ...... 43

Chapter 4 Stereo Rendering ...... 47
    4.1 ...... 48
    4.2 Psychological Depth Cues ...... 49
    4.3 Physiological Depth Cues ...... 51
    4.4 Creating Stereo Pairs ...... 55
    4.5 Viewing Stereo Pairs ...... 57
        4.5.1 Free Viewing ...... 59
        4.5.2 Time-parallel Viewing ...... 59
        4.5.3 Time-multiplexed Viewing ...... 62
    4.6 Challenges with Stereo Rendering ...... 63

Chapter 5 Implementation Framework ...... 66
    5.1 Graphics Hardware ...... 67
    5.2 Graphics Software ...... 68
    5.3 Stereoscopic Implementation ...... 69
    5.4 Real-time Implementation ...... 71
        5.4.1 Shader Mode ...... 72
        5.4.2 Compute Mode ...... 76
        5.4.3 Transform Feedback ...... 78
    5.5 Photorealistic Implementation ...... 80
        5.5.1 Illumination using a GPU ...... 81
    5.6 Particle System ...... 83
    5.7 Precipitation using the Particle System ...... 86
    5.8 Compute Mode Particle Simulation ...... 88
    5.9 Animation ...... 89

Chapter 6 Experiments and Results ...... 91
    6.1 Phase 1 – Real-time Stereo ...... 91
        6.1.1 Method 1 – Horizontal Parallax ...... 92
        6.1.2 Method 2 – Asymmetric View Frustum ...... 98
        6.1.3 Frame rate Comparison ...... 100
        6.1.4 Phase 1 – Conclusions ...... 102
    6.2 Phase 2 – Stereo from 2D-3D Converters ...... 103
        6.2.1 Overview ...... 104
        6.2.2 Depth Estimation Techniques ...... 105
        6.2.3 Quantitative Experiments and Results ...... 107
        6.2.4 Subjective Experiments and Results ...... 115
        6.2.5 Rain and Snow Rendering ...... 118
        6.2.6 Runtime ...... 121
        6.2.7 Phase 2 – Conclusions ...... 123
    6.3 Phase 3 – Measuring Photorealism ...... 123
        6.3.1 Visual Factors for Photorealism ...... 124
        6.3.2 Measuring Photorealism ...... 125
        6.3.3 Experiment 1 – Perceptual Space ...... 126
        6.3.4 Experiment 2 – Other Visual Factors ...... 132
        6.3.5 Experiment 3 – Photorealism and Stereo ...... 135
        6.3.6 Phase 3 – Conclusions ...... 137

Chapter 7 Future Enhancements ...... 139
    7.1 Summary ...... 139
    7.2 Future Extensions ...... 141

REFERENCES ...... 144

LIST OF TABLES

Table 2.1 Snowflake shapes. Image Credit: Snowcrystals.com ...... 17

Table 3.1 Monoscopic rain rendering literature ...... 29
Table 3.2 Monoscopic snow rendering literature ...... 39
Table 3.3 Performance summary of monoscopic rain rendering ...... 43
Table 3.4 Performance summary of monoscopic snow rendering ...... 46

Table 6.1 Frame rate comparison ...... 102
Table 6.2 Parallax (in Pixels) for objects closer to the camera ...... 112
Table 6.3 Parallax (in Pixels) for objects farther to the camera ...... 112
Table 6.4 Normalized MSE between baseline and 2D-3D converters ...... 115
Table 6.5 Survey questions for visual assessments ...... 116
Table 6.6 Frame rate comparison for rain and snow rendering ...... 120
Table 6.7 Survey questions to determine the perceptual space ...... 129
Table 6.8 Descriptive statistics – rain/snow to determine perceptual space ...... 130
Table 6.9 Survey questions to determine other important visual factors ...... 133
Table 6.10 Descriptive statistics – rain/snow to determine other visual factors ...... 135
Table 6.11 Response frequency of rain/snow ...... 136

LIST OF FIGURES

Figure 2.1 Conversion of water from one form to another ...... 10
Figure 2.2 Convectional Rain ...... 11
Figure 2.3 Frontal Rain ...... 12
Figure 2.4 Relief Rain ...... 13
Figure 2.5 Raindrop changes shape as it grows in size ...... 16

Figure 4.1 Stereo pair as viewed by the two eyes ...... 48
Figure 4.2 Gustave Caillebotte. Paris Street, Rainy Day, 1877. Image Credit: Art Institute of Chicago ...... 50
Figure 4.3 Binocular disparity ...... 52
Figure 4.4 Accommodation - changes in lens thickness accommodate focus on objects ...... 53
Figure 4.5 Vergence - inward or outward movement of both eyes to converge on objects ...... 54
Figure 4.6 Motion parallax - object closer to viewer appear to move faster ...... 55
Figure 4.7 Stereo visible where the two view frustums overlap (top view) ...... 56
Figure 4.8 Stereo window and horizontal parallax ...... 58

Figure 5.1 Transformation from 3D world space to 2D screen space ...... 70
Figure 5.2 Fragment shader (stereo view with red-cyan anaglyph glasses) ...... 74
Figure 5.3 Input and output of tessellation shader ...... 75
Figure 5.4 Modern graphics pipeline ...... 77
Figure 5.5 Implementation of a particle system using transform feedback ...... 79
Figure 5.6 Particle system block diagram ...... 84
Figure 5.7 Simulation and rendering loops ...... 88

Figure 6.1 Symmetric view frustum with stereo overlap (top view) ...... 93
Figure 6.2 Generating stereo rain streak (top view) ...... 94
Figure 6.3 Rain billboard and mipmaps (side view) ...... 95
Figure 6.4 Single camera setup to add parallax to rain streaks ...... 97
Figure 6.5 Method 1 output (stereo view with red-cyan anaglyph glasses) ...... 98
Figure 6.6 Symmetric view frustum (top view) ...... 99
Figure 6.7 Asymmetric view frustum (top view) ...... 100
Figure 6.8 Method 2 output (stereo view with red-cyan anaglyph glasses) ...... 101
Figure 6.9 Baseline input image to test depth from linear perspective ...... 108
Figure 6.10 Baseline output image: Test case C-1 ...... 108
Figure 6.11 Output of Axara 3D software video converter ...... 109
Figure 6.12 Depth from occlusion. Test case C-2 ...... 110
Figure 6.13 Depth from object placement. Test case C-3 ...... 110

Figure 6.14 Depth from stereoscopic camera: Test case C-4, C-5 and C-6 ...... 111
Figure 6.15 Axara 3D output for a monoscopic rendered rain scene ...... 117
Figure 6.16 Variation in rating from twenty five participants ...... 119
Figure 6.17 One of the eight visual stimuli for rain and snow scene ...... 128
Figure 6.18 Rain scenes - response to number of particles, size, and motion ...... 130
Figure 6.19 Snow scenes - response to number of particles, size, and motion ...... 131
Figure 6.20 Rain scenes - response to lighting, glow, and fog ...... 134
Figure 6.21 Snow scenes - response to lighting, glow, and fog ...... 134

CHAPTER 1

INTRODUCTION

Stereoscopic rendering produces the left- and right-eye views of the same scene from two different perspectives. True 3D animations and movie special effects demand a realistic viewing experience. Similarly, software applications in virtual reality (VR) systems, which have become mainstream, must produce stereoscopic output for the viewer to have an immersive feel. In movies, video games, and VR applications, the use of natural phenomena such as rain or snow often enhances the scene or the storyline. Stereoscopic rendering of precipitation is also relevant to scientific visualization applications, such as the study of various weather phenomena. However, creating a convincing stereoscopic rendering of rain or snow, in real-time, is particularly difficult. The dynamics of falling particles, their interaction with light, collision with other objects in the environment, and the effects of external forces such as gravity and wind make them complex phenomena to simulate and render.

The focus of this study is to develop techniques for the stereoscopic, real-time, and photorealistic rendering of natural phenomena, such as rain or snow. The dynamics of such

phenomena are simulated and rendered in stereo at a real-time frame rate with photorealistic

output by taking advantage of data-parallelism, programmability, and new features of

current Graphics Processing Units (GPU). Many commercially available 2D-3D software video converters can also produce stereoscopic outputs. How well these outputs compare to

rendered stereo output is examined. The visual factors necessary to produce photorealistic

output in precipitation scenes are also measured by performing subjective experiments.

Several studies have explored computer rendering of natural phenomena. A comprehensive survey of recent work done in data acquisition and simulation techniques needed

to render natural phenomena is provided by Zhao [1]. The majority of these studies use monoscopic rendering, where depth sensation is simulated by using monoscopic cues such

as perspective, occlusions, relative size, atmospheric haze, motion parallax, shading and

shadows. Although monoscopic rendering provides some information about depth, the viewer lacks the ability to look beyond or around objects in that scene. The use of stereoscopic rendering can remedy this shortfall by providing true 3D depth sensation, which

is important in many computer graphics applications. However, research in stereoscopic

rendering of natural phenomena has been limited because visually accurate stereoscopic

output is challenging to produce without any visual fatigue. These challenges include accounting for variations in stereo perception from person to person. Approximately five

percent of the general population is stereoscopically latent or stereo blind and cannot

sense depth [2]. Stereo is also difficult to perceive in quickly changing scenes, such as those with fast moving objects or a fast moving camera. Despite these challenges, a stereoscopic view of precipitation

greatly enhances a scene giving an immersive feel in applications of virtual reality, scientific visualization, video games and movies.

Stereoscopic rendering also presents many challenges. Due to the appearance of

depth in a stereo view, objects in the background become much more noticeable. Thus

scenes that may appear visually appealing in monoscopic view, because the background is ignored, may appear distracting in stereo. Therefore, stereo extensions of existing monoscopic rendering techniques are challenging. Additionally, the monoscopic and stereo

depth cues are additive. If these are combined incorrectly, they can create a conflict in

depth cues. This may result in eyestrain and an unpleasant viewing experience. Other

challenges include proper scene illumination such as use of light reflection and refraction

models, impact of scene dynamics due to environmental changes, such as wind speed, and

scene synchronization with natural sound effects.

Additionally, monoscopic techniques do not provide the necessary depth information

to permit rendering of left- and right-eye views with the correct parallax needed for stereo.

A technique that appears feasible to implement in real-time for a monoscopic environment

can become too computationally expensive to implement in stereo. Real-time applications

often opt for using heuristics to give an illusion of visual correctness instead of applying

physically accurate simulation models. Such rendering shortcuts may give seemingly realistic output on a monoscopic display. However, their visual accuracy in stereoscopic displays

is an open research question.

1.1 Research Objectives

The objective of this research is to demonstrate that stereoscopic, real-time, and photorealistic rendering of a rain or snow scene is achievable with a contemporary programmable GPU.

A major contribution of this study is to extend the monoscopic statistical precipitation

models to simulate the behavior and distribution of falling particles for stereo viewing.

The application must respond to new parameters and make necessary adjustments, in

real-time, so that the user continues to view a high quality, photorealistic, stereo output without any visual fatigue. An additional goal of this research is to measure the quality of

stereo output from various 2D-3D video software converters and compare it with the rendered results of this study by analyzing answers to survey questions. Moreover, a series of

subjective experiments are performed to determine the statistical ranking and importance

of visual factors for photorealism in precipitation scenes, such as the number of particles, their size, and their motion.

1.1.1 Stereoscopic Rendering

The stereo problem poses the biggest challenge since stereo perception varies from person

to person. Correct use of depth cues without conflict is important. Monoscopic rendering

methods are extended using the modern Open Graphics Library (OpenGL) and corresponding OpenGL

shading language (GLSL) programs that run directly on a GPU to get the best stereo results.

1.1.2 Real-time Execution

The definition of real-time is taken from Akenine-Möller et al. [3]. A viewpoint change should result in a redraw of the newly rendered scene at a minimum rate of 15 frames per second

(fps). This is a lower bound on real-time frame rates. Higher frame rates are desirable and

achievable by use of modern hardware. The user should not observe any flicker or discontinuous motion, since such visual artifacts would have a detrimental effect on stereo viewing. At least 120 fps are desired for flicker-free stereo viewing in an interactive gaming

environment. Graphics acceleration hardware is used to meet real-time requirements.

Current hardware offers a programmable graphics pipeline that runs on a GPU using

multiple independent parallel execution threads, i.e. a single instruction runs on multiple

independent threads at the same time in lockstep. This type of parallel architecture, referred

to as Single Instruction Multiple Threads (SIMT), is well suited for solving many graphics

problems in real-time.
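As a concrete illustration of these frame-rate targets, the per-frame time budget follows directly from the rates quoted above. The small calculation below is only a sketch (it is not part of the dissertation's implementation) and assumes both eye views share a single frame interval when computing the per-eye budget.

    #include <cstdio>

    int main() {
        // Frame-rate targets discussed above: 15 fps lower bound for real-time,
        // 60 fps typical display refresh, 120 fps desired for flicker-free stereo.
        const double rates[] = {15.0, 60.0, 120.0};
        for (double fps : rates) {
            double frame_ms = 1000.0 / fps;      // time budget per displayed frame
            double per_eye_ms = frame_ms / 2.0;  // if left and right views share one frame interval
            std::printf("%6.1f fps -> %6.2f ms per frame, %6.2f ms per eye\n",
                        fps, frame_ms, per_eye_ms);
        }
        return 0;
    }

At 120 fps the budget is roughly 8.3 ms per frame, which is why graphics acceleration is essential for stereo rendering of a dynamic particle scene.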

1.1.3 Photorealistic Output

Photorealism is the process of image generation by simulating the physical interaction of

light in an environment and how light from the source is reflected and scattered through

the environment. A related goal is to determine how local and global illumination models make precipitation rendering realistic. The challenge is to implement photorealistic output in

real-time with stereoscopic rendering, which is one of the major goals of this study.
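The local illumination components referred to here (ambient, diffuse, and specular terms) can be sketched with a standard Phong-style formulation. The snippet below is a minimal CPU-side illustration under that assumption, using scalar intensities and hypothetical coefficient values; it is not the dissertation's shading code.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static Vec3 normalize(const Vec3& v) {
        double len = std::sqrt(dot(v, v));
        return {v.x / len, v.y / len, v.z / len};
    }

    // Phong-style local illumination: ambient + diffuse + specular.
    // n: surface normal, l: direction to the light, v: direction to the viewer (all unit vectors).
    double phongIntensity(Vec3 n, Vec3 l, Vec3 v, double ka, double kd, double ks, double shininess) {
        double diffuse = std::max(0.0, dot(n, l));
        // Reflect l about n: r = 2(n.l)n - l
        Vec3 r = {2.0 * dot(n, l) * n.x - l.x,
                  2.0 * dot(n, l) * n.y - l.y,
                  2.0 * dot(n, l) * n.z - l.z};
        double specular = std::pow(std::max(0.0, dot(normalize(r), v)), shininess);
        return ka + kd * diffuse + ks * specular;
    }

    int main() {
        Vec3 n = normalize({0.0, 1.0, 0.0});  // upward-facing surface
        Vec3 l = normalize({0.3, 1.0, 0.2});  // assumed light direction
        Vec3 v = normalize({0.0, 1.0, 1.0});  // assumed view direction
        std::printf("intensity = %.3f\n", phongIntensity(n, l, v, 0.1, 0.7, 0.4, 32.0));
        return 0;
    }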

1.1.4 The 2D-3D Conversion

The alternative to simulating and rendering stereo precipitation is to apply 2D-3D conversion using commercially available software applications. The goal is to measure the

quality of stereo output of 2D-3D software video converters and compare the results with

the output of this study. The quality of the stereo output is measured in two ways. It is

quantitatively measured by comparing the horizontal disparity produced by various 2D-3D

converters with the known disparity taken from baseline images. It is also measured subjectively by asking participants survey questions regarding the quality of the visual experience when viewing 2D-3D video conversions in stereo.
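A quantitative comparison of this kind can be expressed as a normalized mean squared error between the horizontal disparities produced by a converter and the known baseline disparities. The function below is a hedged sketch of one plausible formulation; normalizing by the baseline's mean squared disparity is an assumption here, and the dissertation's exact definition accompanies the experiments in Chapter 6.

    #include <cstdio>
    #include <vector>

    // Normalized MSE between disparities estimated by a 2D-3D converter and a known baseline.
    // Normalization by the baseline's squared magnitude is one plausible choice (assumed here).
    double normalizedMSE(const std::vector<double>& estimated, const std::vector<double>& baseline) {
        double err = 0.0, ref = 0.0;
        for (std::size_t i = 0; i < baseline.size(); ++i) {
            double d = estimated[i] - baseline[i];
            err += d * d;
            ref += baseline[i] * baseline[i];
        }
        return ref > 0.0 ? err / ref : 0.0;
    }

    int main() {
        std::vector<double> baseline  = {12.0, 8.0, 4.0, 2.0};   // known parallax values in pixels (illustrative)
        std::vector<double> converter = {10.5, 9.0, 3.0, 2.5};   // parallax measured from a converter's output
        std::printf("normalized MSE = %.4f\n", normalizedMSE(converter, baseline));
        return 0;
    }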

1.1.5 Perceptual Space and Measuring Photorealism

The perceptual space is defined as the visual experience of a precipitation scene as observed

from the ground. From the literature review of related work in computer generation of rain

and snow it is concluded that there are three key visual factors influencing the perceptual

space, and thus important for producing photorealistic results. They are the number, size, and

movement of particles. The experiments performed in this study measure the statistical

ranking and importance of the three visual factors. Additionally, visual stimuli varying along

the three factors are used in subjective experiments where participants are asked to

respond to survey questions. Photorealism is quantified by analyzing the results from these

experiments.

1.2 Research Contributions

The major results presented in the latter chapters of this work are the following:

I. Stereoscopic rendering achieved by using GPU quad buffers. This type of display uses

active shutter goggles to view in stereo.

II. A real-time frame rate of 60 frames per second (fps) per eye is achieved when vertical

sync is turned on. The vertical sync synchronizes GPU output to the display screen

refresh rate, which is typically set at 60 fps.

III. Photorealistic output is achieved by simulating global light effects with reflection,

specular, ambient, and diffuse lighting components.

IV. Stereo output is compared and evaluated against commercially available 2D-3D

software video converters.

V. Perceptual space for rain and snow is determined and visual effects of photorealism

are measured by performing survey experiments.

1.3 Chapters Layout

A summary of the remaining chapters follows.

Chapter 2: In order to accurately simulate and render a stereoscopic scene with rain

or snow, it is important to understand the attributes and various properties associated with such natural phenomena. This chapter provides a general description of rain and

snow formation in nature. Various types of natural precipitations are discussed along with

attributes that define them, such as particle shape, size, and intensity.

Chapter 3: A review of recent literature in monoscopic rendering of rain and snow is

provided. Stereoscopic rendering of natural phenomena such as fire, trees, and clouds is also reviewed. The literature review provides an overview of various monoscopic and stereoscopic rendering techniques, including methods to simulate the dynamics of rain and

snowfall. It also details various algorithms for rain splash and snow accumulation along with listing computational performance data. The topics covered in the chapter provide a

comprehensive background needed to achieve the research objectives.

Chapter 4: This chapter details stereoscopic depth perception and compares it with

monoscopic depth cues. Methods of creating stereo pairs and their viewing techniques

are described. Various methods of displaying stereo rendering and related issues are also

discussed.

Chapter 5: An implementation framework of rendering precipitation phenomena in

stereo is given. Such implementation is computationally expensive and an application

running only on a CPU may not have the capacity to maintain a real-time frame rate. Therefore, the implementation takes advantage of hardware acceleration by a GPU. The graphics hardware and rendering pipeline used for implementation of the demonstration application are detailed in this chapter. It also provides detail about how to achieve real-time and

photorealistic implementation. Particle system implementation and animation using a

programmable GPU is described. Such particle systems can be modified to show other

natural phenomena such as mist, drizzle, hail or even falling autumn leaves.

Chapter 6: In this chapter various experiments and results are presented including:

computational complexity, a comparison between the outputs of various 2D-3D software video converters, and findings from the survey experiments to determine perceptual space

of rain and snow.

Chapter 7: The last chapter discusses future works and extensions to this study.

CHAPTER 2

FORMATION OF RAIN AND SNOW

This chapter provides background information needed to understand formation of rain

or snow in nature, thus providing a better understanding of how to accurately simulate and

render such processes. In nature, causes of formation of precipitation are complex but the

processes involved are always the same. The Earth’s atmosphere varies in temperature with

altitude. Generally, air at ground level is warmer. As altitude increases, the air temperature

falls. The cooling causes the moisture in the air to condense and form clouds. There are

three basic cloud types: cumulus, stratus, and cirrus meaning heaped, spread out, and

a lock of hair respectively. To describe clouds capable of precipitation the term nimbus,

meaning rain, is added [4]. When condensed moisture in nimbus clouds becomes heavy, it falls as rain. Rain is liquid precipitation distinct from drizzle. Relative to rain, drizzle droplets are far smaller and more numerous, which results in poor visibility and produces a fog-like

effect. If air temperature falls even further, the moisture in the air condenses into tiny ice

crystals to form snowflakes. If enough ice crystals stick together, they become heavy and fall

to the ground as snow. This cycle, in which surface moisture converts into water vapor and comes back down in various forms of precipitation, is called the water cycle, illustrated in Figure 2.1.

Figure 2.1 Conversion of water from one form to another

2.1 Types of Rain and Snow

Rain is classified into three types, namely convectional, frontal, and relief, also known as

Orographic, rain [5]. These types produce rain with variable intensity, droplet size, and shape. The same winds that form conditions for rain can form snow, but the air temperature must be close to freezing, between 0 and 2 °C (32 to 35.6 °F). Snow contains a certain amount of moisture. On average, 127 mm (5 inches) of snow melts into 12.7 mm (0.5 inches) of water.

Based on the amount of moisture it contains, snow can be characterized into two major

types, namely dry and wet snow [6]. The following sections further describe rain and snow types and associated properties.

2.1.1 Convectional Rain

This type of rain is commonly formed in the tropics. It tends to form when the sun is the warmest. Because land warms more quickly than the sea, convectional rain generally forms

over land. It produces cumulus and rapidly towering cumulonimbus clouds, which results

in short but heavy rain interspaced with sunny periods. Figure 2.2 illustrates this process.

As the air moves upward and away from the relatively warmer ground surface, it cools down.

Figure 2.2 Convectional Rain

The cooler air is unable to hold as much water vapor as it can when it is warmer. Eventually

the temperature of the rising air reaches the dew point, a temperature at which the air can hold no more water vapor. At the dew point, the water vapor in the cooler air condenses. Condensation is the process by which the water vapor held in the air is turned back into liquid water droplets, which fall as rain.

2.1.2 Frontal Rain

Frontal rain is common in temperate regions. This type of rain takes a long time to arrive.

Initially, high altitude cirrus clouds are formed. Subsequently, they become layered and turn into stratus clouds that eventually become rain producing nimbostratus clouds. Frontal rain usually starts slowly and remains steady for several hours. It forms when two air masses meet. If a warm air mass meets a cold air mass, a warm front forms. Conversely, a cold front is formed when a cold air mass meets a warm air mass as shown in Figure 2.3.

Figure 2.3 Frontal Rain

In a warm front, the warm air, being less dense, gently slides over the cold air. As it rises, it cools and condenses into clouds. In a cold front since cold air is denser, it forces its way under the warm air. Consequently, the warm air is forced up quickly which leads to large cumulonimbus clouds producing heavy rain, often with thunder and lightning.

2.1.3 Relief or Orographic Rain

This rain type is characterized by thick clouds that produce light rain conditions. It can occur at any time over mountainous terrain. The warm air over an ocean is forced to rise when it encounters a high land mass. As this moist air gains height, it cools. Water vapor in this air mass gradually condenses to form rain clouds. Figure 2.4 illustrates this

phenomenon.

Figure 2.4 Relief Rain

2.1.4 Dry Snow

Dry snow, sometimes called powdery snow, is formed when snowflakes pass through drier and cooler air at temperatures below or close to freezing. Dry snow contains below average moisture, sometimes producing 12.7 mm (0.5 inch) of water for every 508 mm (20 inches) of snow, thus accumulating to greater depths. This type of snow produces a large number of smaller snowflakes, resulting in a powdery texture that can drift more easily with wind. Such snow does not stick together easily, is easy to shovel, and is perfect for winter sports.

2.1.5 Wet Snow

Wet snow is created when the temperature is slightly warmer than freezing. At this temperature, snowflakes melt around the edges and stick together, resulting in a smaller number of big, denser snowflakes. Wet snow contains above average moisture, producing 25.4 mm

(1 inch) of water for every 127 mm (5 inches) of snow. Such snow is easy to pack together

into various snow sculptures but is much heavier to shovel.

2.2 Rain Intensity, Size and Shape

Variation in the number of raindrops impacts rain intensity, which is measured by a volume

of water accumulated per unit of time. The more raindrops, and therefore water, the greater

the rain intensity, which is classified into three categories: light, moderate, and heavy rain.

The rate of light rain varies between a trace and 2.5 mm/hr (millimeters per hour). Moderate

rain rate is between 2.5 and 7.5 mm/hr. Rain is classified as heavy if the rate is more than 7.5

mm/hr. The size of a raindrop is typically greater than 0.5 mm in diameter. In widely scattered

rain, the drops may be smaller, down to 0.1 mm. In general, a raindrop size can vary from 0.1

mm to 9 mm across. However, droplets larger than about 4 mm can become unstable and

split into smaller droplets. The probability of a raindrop breaking up is given as a function of its size, and the likelihood of breakup increases exponentially for droplet sizes beyond 3.5 mm [7].

The shape of a raindrop is the result of a tug-of-war between the surface tension of the raindrop and the pressure of the air pushing up against the bottom of the drop as it falls. When the drop is smaller than 2 mm, surface tension wins and pulls the drop into a spherical shape. The fall velocity increases as a raindrop gets bigger, which causes the pressure on the bottom to increase. The raindrop becomes more oblate, with a flatter surface facing the oncoming airflow. Larger drops become increasingly flattened at the bottom, forming a depression. In raindrops that are greater than 4 mm across, this depression grows to form a parachute shape that explodes into many smaller droplets. Examples of this phenomenon are given by Pruppacher et al. [8]. Figure 2.5 illustrates the change in droplet shape as its size increases. The final illustration on the right shows the breakup of a droplet larger than 4 mm into smaller spherical droplets.

Figure 2.5 Raindrop changes shape as it grows in size

2.3 Snow Intensity, Size and Shape

Like rain, snow is classified into light, moderate and heavy categories. Snow intensity is measured by considering how much equivalent liquid water a snowfall generates. This is called liquid water equivalent measurement. Light snow rate is about 1.0 mm/hr, moderate rate is between 1.0 to 2.5 mm/hr, and it is classified as heavy if the rate is more than 2.5 mm/hr. The size of a snowflake varies greatly. The smallest snowflakes are called diamond dust crystals. They are as small as 0.1 mm. These faceted crystals sparkle in sunlight as they float

15 Figure 2.5 Raindrop changes shape as it grows in size

through the air, which is how they got their name. They form rarely in very extremely cold weather. The largest snow crystal ever recorded, measured 10 mm (0.4 inches) from tip to

tip.
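The intensity thresholds for rain and snow and the size-dependent raindrop shape regimes quoted in Sections 2.2 and 2.3 can be summarized compactly in code. The sketch below simply encodes those published thresholds for illustration; it is not part of the rendering system itself.

    #include <cstdio>

    // Rain intensity categories from Section 2.2 (rates in mm/hr).
    const char* rainIntensity(double mmPerHour) {
        if (mmPerHour <= 2.5) return "light";
        if (mmPerHour <= 7.5) return "moderate";
        return "heavy";
    }

    // Snow intensity categories from Section 2.3, using the liquid water equivalent rate (mm/hr).
    const char* snowIntensity(double lweMmPerHour) {
        if (lweMmPerHour <= 1.0) return "light";
        if (lweMmPerHour <= 2.5) return "moderate";
        return "heavy";
    }

    // Raindrop shape regime by diameter (mm), following Section 2.2: small drops stay spherical,
    // larger drops flatten, and drops beyond about 4 mm become unstable and break up.
    const char* dropShape(double diameterMm) {
        if (diameterMm < 2.0) return "spherical";
        if (diameterMm <= 4.0) return "oblate / flattened";
        return "unstable (parachute shape, likely breakup)";
    }

    int main() {
        std::printf("5.0 mm/hr rain -> %s\n", rainIntensity(5.0));
        std::printf("0.8 mm/hr snow -> %s\n", snowIntensity(0.8));
        std::printf("3.0 mm drop    -> %s\n", dropShape(3.0));
        return 0;
    }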

The shape of snowflakes is influenced by the temperature and humidity of the atmo-

sphere. Snowflakes form in the atmosphere when water droplets freeze onto dust particles.

Depending on the temperature and humidity of the air where the snowflakes form, the

resulting ice crystals will grow into a myriad of different shapes. Snow crystals are classified

by Magono and Lee [9] into eight primary categories as explained below and illustrated in Table 2.1.

1. Stellar Dendrites are quite large and a common type of snowflake. The best specimens usually appear when the weather is very cold, close to -15 °C (5 °F). The name

comes from their star-shaped appearance, along with their branches, called dendrites, meaning tree-like.

Table 2.1 Snowflake shapes. Image Credit: Snowcrystals.com

No. Category Illustration

1 Stellar Dendrites

2 Columns and Needles

3 Capped Columns

4 Fernlike Stellar Dendrites

5 Diamond Dust Crystals

6 Triangular Crystals

7 Twelve-branched Snowflakes

8 Rimed Snowflakes and Graupel

2. Columns and Needles snow crystals form when the temperature is around -6 °C (21

°F). They are small in size and look like bits of white hair. Longer column crystals look

like needles.

3. A Capped Column forms when it travels through multiple temperature layers as it

grows. First a column forms at approximately -6 °C (21 °F). After that, plates grow

on the ends of the columns near -15 °C (5 °F). These types of snowflakes are very

uncommon and difficult to spot.

4. Fernlike Stellar Dendrites snowflakes are similar to the stellar dendrites but slightly

larger. The specimen size can be up to 5 mm. They have a leaf-like structure with side-branches parallel to neighboring branches. These crystals are not perfectly symmetrical. The side-branches on one arm are not usually the same as those on the

17 other branches.

5. Diamond Dust Crystals are the tiniest snow crystals no larger than the diameter of a

human hair forming in extreme cold. The basic ice crystal shape is that of a hexagonal

prism, governed by crystal faceting.

6. Aerodynamic effects help produce these Triangular Crystals. They are typically small, shaped like truncated triangles. Sometimes branches sprout from the six

corners, yielding an unusual symmetry.

7. When two small six-branched snow crystals collide in mid-air, they can stick together and grow into a Twelve-Branched Snowflake. This occurs frequently in a windy

environment.

8. Snow crystals grow inside clouds made of water droplets. Often a snow crystal will

collide with some water droplets, which freeze onto the ice. These droplets are called

rime. A snow crystal might have no rime, a few rime droplets, quite a few, and sometimes the crystals are completely covered with rime. Blobs of rime are called graupel

meaning soft hail.

CHAPTER 3

LITERATURE REVIEW

Literature on rendering natural phenomena in stereo is sparse, and work on rendering

rain and snow in stereo is non-existent. A fair amount of literature is focused on monoscopic

rendering of natural phenomena. This is because rendering techniques that work well in

a monoscopic view, such as bump mapping, which adds realism by modifying surface normal vectors to simulate bumps and wrinkles on the surface of an object, fail in stereo. The depth information is lost and objects may look flat, giving the rendered scene a cardboard effect.

The literature review of monoscopic rendering of rain and snow helps establish an understanding of realistic rain and snow distribution models in virtual space. The literature in

stereoscopic rendering of natural phenomena, such as fire, smoke, fog, or clouds, helps in

understanding stereoscopic rendering techniques in the context of natural phenomena.

Since there is no work in stereo rendering of precipitation, such as rain or snow, there is a

potential for research to extend monoscopic rain and snow models to stereo for use in VR

and true 3D animations, movies and applications. The focus of this chapter is to perform a

literature review on monoscopic rendering of precipitation and stereo rendering of other

natural phenomena, as it is hypothesized that monoscopic rendering of rain or snow can

be extended to stereo.

3.1 Stereoscopic Rendering of Natural Phenomenon

Natural phenomena such as fire, gaseous elements and vegetation have been the focus of

stereoscopic rendering. A real-time photorealistic and stereoscopic rendering method to

depict fire is proposed by Rose and McAllister [10]. Just like rain and snow, fire has dynamic characteristics that are challenging to reproduce in real-time with photorealistic effects.

The authors use pre-rendered high-quality images of fire as textures to attain photorealistic

effects. Real-time rendering is achieved by using billboards. Billboarding is a technique

that maps a texture onto a polygon, which orients itself according to the changing view direction. This polygon is called a billboard. The billboard rotates around an axis and aligns itself to face the viewer. Use of a single billboard is not enough to give an illusion of depth. Several layers

of billboards, called slices, are proposed to give depth to the rendered fire scene.
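Billboarding as described here amounts to rotating a textured quad about an axis so that it faces the camera. The sketch below computes such an orientation for a billboard constrained to rotate around the vertical (y) axis, which is one common variant and not necessarily the exact construction used by Rose and McAllister.

    #include <cmath>
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    const double kPi = 3.14159265358979323846;

    // Yaw angle (radians) that rotates a y-axis-aligned billboard at `billboardPos`
    // so its front face points toward `cameraPos`. The quad is rotated about the
    // vertical axis by this angle before its texture is applied.
    double billboardYaw(const Vec3& billboardPos, const Vec3& cameraPos) {
        double dx = cameraPos.x - billboardPos.x;
        double dz = cameraPos.z - billboardPos.z;
        return std::atan2(dx, dz);  // rotation about the y axis, measured in the xz plane
    }

    int main() {
        Vec3 fire   = {0.0, 0.0, 0.0};   // billboard (e.g., a fire slice) at the origin
        Vec3 camera = {3.0, 1.5, 4.0};   // assumed camera position
        double yaw = billboardYaw(fire, camera);
        std::printf("billboard yaw = %.2f degrees\n", yaw * 180.0 / kPi);
        return 0;
    }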

In another study, Johnson and McAllister [11] present stereo rendering of gaseous natural phenomena that vary in density, cast shadows, are transparent, and have dynamic behavior.

Such phenomena include fog, mist, and clouds. The authors apply a volume rendering

technique called splatting. This technique improves rendering time but is less accurate.

Splatting projects volume elements, called voxels, on the 2D viewing plane. It approximates

this projection by using a Gaussian splat, which depends on the opacity and on the color of

the voxel. Other splat types, such as linear splats, can also be used. A projection to the image

plane is made for every voxel and the resulting splats accumulate on top of each other in

back-to-front order to produce the final image.
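The back-to-front accumulation step of splatting follows the standard "over" compositing rule: each splat's color, weighted by its opacity, is blended on top of what has already been accumulated. The loop below is a one-pixel illustration of that rule under simplified assumptions (a scalar color and a precomputed per-splat opacity); it is not the authors' full splatting pipeline.

    #include <cstdio>
    #include <vector>

    struct Splat {
        double color;  // scalar intensity for simplicity; real splats carry RGB
        double alpha;  // opacity contributed by this splat at the pixel
    };

    // Composite splats that have been sorted back to front using the "over" rule:
    // result = splatColor * alpha + previousResult * (1 - alpha).
    double compositeBackToFront(const std::vector<Splat>& sortedBackToFront) {
        double accumulated = 0.0;
        for (const Splat& s : sortedBackToFront) {
            accumulated = s.color * s.alpha + accumulated * (1.0 - s.alpha);
        }
        return accumulated;
    }

    int main() {
        // Three splats covering one pixel, ordered from farthest to nearest (illustrative values).
        std::vector<Splat> splats = {{0.9, 0.3}, {0.5, 0.5}, {0.2, 0.4}};
        std::printf("final pixel intensity = %.3f\n", compositeBackToFront(splats));
        return 0;
    }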

A real-time stereo implementation of rendering vegetation is described by Borse and

McAllister [12], which uses pre-rendered images of vegetation. They apply image-based rendering in stereo, which is an enhancement of the monoscopic rendering of vegetation

proposed by Jakulin [13]. Rendering from an arbitrary viewpoint is achieved by using the composite of the nearest two image slices that are alpha-blended as the user changes the viewing position. Alpha blending is a combination of two colors allowing for transparency

effects. The value of alpha in the color code ranges from 0.0 to 1.0, where 0.0 represents a

fully transparent color, and 1.0 represents a fully opaque color. Their method improved

stereo quality and demonstrated reduction in visual artifacts.

3.2 Monoscopic Rendering of Rain

To analyze the effect of rainfall, it is necessary to understand the visual appearance of a

single raindrop. Several geometric and photometric models are proposed by Garg and Nayar

[14], which simulate light reflection and refraction through the raindrops. Such models are critical in understanding the visual representation of rain effects. The appearance of a

raindrop is a complex mapping of the light radiating from the environment. The results of

this technique showed that each drop acts like a wide-angle lens. The light redirects from a

large field of view towards the observer. The raindrop models created in this study provided

fundamental tools to analyze complex rain effects. Another raindrop shape model is

proposed by Roser et al. [15], which incorporates cubic Bezier curves. The authors provided validation of the model and show that the error between the shape fitted model and the

real raindrop is significantly less than when a spherical shape model is used. Such shape

models are useful for image correction by removing distortion created by raindrops.

In another study, Garg and Nayar [16] illustrated photorealistic rendering of rain with a system that measures rain streaks. A rain streak is formed when a raindrop looks like a bright stroke on an image due to its fast falling speed relative to the camera exposure. In humans,

falling raindrops are perceived as streaks due to retinal persistence. An image formed on

the retina takes about 60 to 90 milliseconds to fade away, during which the raindrop keeps

falling resulting in composite image on the retina producing this persistence effect. As a

raindrop falls, it undergoes shape distortions known as oscillations. The interaction of light with oscillating raindrops produces complex brightness patterns, such as speckles and

smeared highlights, within a single motion blurred rain streak as viewed by an observer. The

authors presented a model built for rain streak appearance that captures these complex

interactions between various raindrop oscillations, lighting, and viewing directions. To

fully capture all the possible illumination conditions, they constructed an apparatus to

photograph distortion in a falling raindrop from various angles. From these experiments

a large database consisting of thousands of rain streak images is created that captures variations and appearance of a falling raindrop with respect to light position and view

direction. Subsequently, an image-based rendering algorithm is applied to utilize their

database to add rain to a single image or video.
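The streak formation described above can be approximated with a simple relation: the apparent streak length is roughly the distance a drop falls during the retention interval (retinal persistence for a human observer, exposure time for a camera). The sketch below uses that relation with assumed values for fall speed and persistence; it is an illustration of the idea, not the appearance model of Garg and Nayar.

    #include <cstdio>

    // Approximate streak length: distance fallen during the retention interval.
    // velocityMS is the drop's (roughly terminal) fall speed in m/s and
    // retentionMs is retinal persistence or camera exposure time in milliseconds.
    double streakLengthMeters(double velocityMS, double retentionMs) {
        return velocityMS * (retentionMs / 1000.0);
    }

    int main() {
        // Assumed values: a medium-to-large raindrop falls at several m/s, and retinal
        // persistence is about 60 to 90 ms, so the perceived streak spans tens of centimeters.
        double v = 7.0;            // assumed fall speed, m/s
        double persistence = 75.0; // assumed persistence, ms
        std::printf("perceived streak length ~ %.2f m\n", streakLengthMeters(v, persistence));
        return 0;
    }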

A novel understanding of the visual effects of rain is also presented by Garg and Nayar

[17]. They analyzed various influential factors and proposed an effective photorealistic rendering algorithm. The authors first presented a photometric model that describes the

intensities produced by individual rain streaks. Then they produced a dynamic model that

captures the spatio-temporal properties of rain. These two models are used to describe

the visual appearance of rain. The authors then developed an algorithm to be used in

post-processing to remove rain effects from videos. They showed that properties unique to

rain, such as its small size, high velocity, and spatial distribution, make its visibility depend

strongly on camera parameters such as exposure time and depth of field.

Rain specific effects, such as ripples, splash, and drips add to the realism of a rainy

scene. Such effects are described by Tatarchuk and Isidoro [18] in a visually complex virtual city environment rendered in real-time. The authors gave an overview of the lighting system and

presented various approaches of rendering rain and dynamic water simulation using a GPU.

They presented a novel post-processing rain effect that is created by rendering a composite

layer of falling rain. The illumination of rain is computed by using refraction of individual

raindrops and reflection due to surrounding light sources and the Fresnel effect, which

describes the behavior of light when moving between media of differing refractive indices.

Rain splashing on various objects as it falls is rendered by using a billboard particle system.

Their technique works well for rainfall effects with streaks formed by falling raindrops.

However, this method does not produce the specular richness of a stationary raindrop or

photorealistic effect in a slow-motion video of rain. An alternative is to use hierarchical

refractive and reflective maps presented by Slomp et al. [19]. They use this technique to create a photorealistic real-time rendering of raindrops in a static or slow-motion scene

that are common in movies or instant-replays in video games. Multiple texture maps are

generated, each with decreasing resolution as distance from the viewpoint increases, called

hierarchical maps. These hierarchical maps are mapped on the raindrop for photorealistic

effect and raindrop billboards are used to achieve real-time rendering.

A similar method uses a multi-resolution technique proposed by Puig-Centelles et al. [20]. They apply a hierarchy of rain texture models to render rain. To obtain realistic results, the authors also use a level-of-detail technique, which decreases the complexity in the rain

model with an increase in distance from the viewer. Programmable units, called shaders, were used to upload input data to the GPU memory in order to improve rendering time by

taking advantage of the programmable graphics pipeline. In another study [21], the same authors extended their work to include control and management of rain scene by defining

and operating within a certain rain area. The physical properties of rain were analyzed

and incorporated in rendering of realistic rain within a predefined rain area with real-time

user controls. They further extended the work by taking advantage of a GPU compute

mode, where the GPU can be used as a general-purpose processor [22]. They achieved much higher frame rates in simulating rain dynamics and collision detection with CUDA

implementation. Presence of fog, halos, and light glows in a rendered scene enhances

photorealism. For greater photorealistic effects, Creus and Patow [23] extended existing monoscopic rendering of rain by adding splashes and illumination effects such as fog, halos,

and light glows. Their algorithm did not impose any restriction on rain area dimensions

but instead only rendered rain streaks in the area around the viewer. The simulation ran

entirely on the GPU that included collision detection with other scene objects and used

pre-computed images for illumination effects.
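Level-of-detail schemes like the one mentioned above typically bucket particles by their distance from the viewer and assign cheaper representations to the farther buckets. The function below is a generic sketch of such a policy with made-up distance thresholds and level names; the cited papers and the dissertation each define their own levels.

    #include <cstdio>

    enum class RainLOD {
        FullGeometry,  // close particles: detailed streak geometry and lighting
        TexturedQuad,  // mid-range particles: a simple textured billboard
        NoiseLayer     // distant rain: merged into an attenuated texture or noise layer
    };

    // Pick a representation from the particle's distance to the viewer.
    // The thresholds are illustrative only.
    RainLOD selectLOD(double distanceToViewer) {
        if (distanceToViewer < 10.0) return RainLOD::FullGeometry;
        if (distanceToViewer < 50.0) return RainLOD::TexturedQuad;
        return RainLOD::NoiseLayer;
    }

    int main() {
        const double distances[] = {5.0, 25.0, 120.0};
        for (double d : distances) {
            std::printf("distance %6.1f m -> LOD %d\n", d, static_cast<int>(selectLOD(d)));
        }
        return 0;
    }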

Real-time rendering with photorealistic output using a GPU brings user interactivity

to applications. Such a method is proposed by Rousseau et al. [24]. Refraction of the scene inside a raindrop is simulated by using a texture map that is distorted based on the optical

properties of a raindrop. The authors conclude that reflections are limited to its silhouette

and thus can be neglected without reducing photorealism. Similar results are presented

by Wang et al. [25] with a two-part approach. First, they use off-line image analysis of rain videos and, second, particle-based synthesis of rain. Images analyzed from rain videos are used to create a rain mask, which is then used for online synthesis. A pre-

computed radiance transfer function is used for scene radiance and illumination. The

radiance transfer depicts the mapping of the light in the surrounding environment to the

raindrop and how it will appear to the observer. The authors demonstrated photorealistic

results with low computational cost and small memory footprint. They applied results on a variety of real video and incorporated synthetic rain in real-time.

In another study Rousseau et al. [26] described a complete framework to simulate rainfall in a video game. The authors developed a particle system that was implemented on a GPU.

A particle system is a large set of simple primitive objects, such as a point or a triangle, which

are processed as a group. Each particle has its own attributes including position, velocity,

and lifespan that can be changed dynamically. It animated each raindrop considering it

as an individual particle. They also simulated the effect of light on raindrops by using a

refraction model developed in one of their previous studies. A retinal persistence model was also developed and included in final implementation. Another rain scene animation

framework that uses a particle system is presented by Coutinho et al. [27]. Each particle represents a raindrop. Environment lighting is implemented by using a pre-computed

radiance transfer function. The authors also employed smoothed particle hydrodynamics,

a computational method used for simulating fluid flows.
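The particle systems described in these studies share a common update step: each particle's position is advanced by its velocity under gravity and wind, and expired particles are respawned. The loop below is a minimal CPU-side sketch of that step with assumed spawn heights, speeds, and lifespans; the dissertation's own system performs this per-particle work on the GPU.

    #include <cstdio>
    #include <vector>

    struct Particle {
        double x, y, z;     // position (m)
        double vx, vy, vz;  // velocity (m/s)
        double life;        // remaining lifespan (s)
    };

    // Advance every particle by one time step; respawn particles whose lifespan
    // has expired or that have fallen below the ground plane (y = 0).
    void updateParticles(std::vector<Particle>& particles, double dt,
                         double gravity, double windX) {
        for (Particle& p : particles) {
            p.vy -= gravity * dt;  // gravity pulls particles down
            p.vx += windX * dt;    // simple constant wind acceleration (assumed)
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
            p.life -= dt;
            if (p.life <= 0.0 || p.y < 0.0) {
                // Respawn at the top of the precipitation volume (illustrative values).
                p = {p.x, 30.0, p.z, 0.0, -5.0, 0.0, 6.0};
            }
        }
    }

    int main() {
        std::vector<Particle> rain(4, Particle{0.0, 30.0, 0.0, 0.0, -5.0, 0.0, 6.0});
        for (int frame = 0; frame < 3; ++frame) {
            updateParticles(rain, 1.0 / 60.0, 9.81, 0.5);
        }
        std::printf("first particle height after 3 frames: %.3f m\n", rain[0].y);
        return 0;
    }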

A particle system is also used by Tariq [28] to animate and render rain streaks using only a GPU. In this process, rain particles were expanded into billboards to be rendered using

the geometry shader, which is a programmable stage in a modern GPU graphics pipeline

where object shape can be manipulated by adding or removing vertices. Subsequently,

a library of stored textures was used to render raindrops from different viewpoints and

lighting directions.

A video frame sequence of a real scene can be extracted and used as a background

scene in a rendered output. Such a method is proposed by Starik et al. [29]. They added synthetic rain to real video sequences by describing visual properties of rainfall in terms of

time and space. The authors assumed partial knowledge of intrinsic camera parameters,

such as focal length and camera exposure time, and user defined rain parameters, such

as intensity and velocity, to achieve photorealistic results. A similar method is proposed

by Mizukami et al. [30]. Their method is to render a realistic rain scene that represented environmental conditions that change from one scene to another. They modeled wind

effect, intensity and density of rainfall. They also proposed a technique to calculate raindrop

trajectory that made rainfall effect more realistic.

A technique that maps textures onto a double cone that is placed around the observer

is presented by Wang and Wade [31]. The orientation of the cones is determined by the position and speed of the camera movement. Several textures of rain or snow are simultaneously scrolled on the double cone at different speeds to give an impression of motion.

This method is faster than the particle system, but lacks interaction between raindrops or

snowflakes and the respective environment such as splashes, water puddle formation, and

snow accumulation. On the other hand, the method proposed by Wang et al. [32] simulated rain interaction with the ground, giving it a wet appearance and ripple formation

on water puddles. They presented a real-time realistic rain scene model that accounts for

changes in physical characteristics of a raindrop under various conditions. The authors

modeled changes in shapes, movements, and intensity of raindrops. They also presented a new method to calculate rain streaks that accounted for retinal persistence. Besides rain, they implemented a fog effect. The method also allowed a real-time change in scenery by implementing the algorithm on a GPU.

A rain model based on spectral analysis of real rain is presented by Weber et al. [33]. This model is used to determine rain distribution. Furthermore, the visibility and intensity of distant rain streaks are attenuated by using the sparse convolution noise technique, which is used in building new textures at any resolution in real-time. The authors also derived a rain density function to quantify visible rain streaks within the view frustum in terms of rain intensity and camera parameters, such as field of view and exposure time. The view frustum is defined as the volume containing visible scene objects when observed from the camera's perspective. The authors show that their technique is independent of scene complexity due to the use of image-space post-processing.

Rain clouds and the rendering of lightning from the clouds also add to realism in a rainy scene. This is demonstrated by Wang et al. [34]. They used rain cloud lightning and a wind field effect to enhance realism in a rainy scene. A particle system is used to simulate rainfall dynamics. Photorealistic output is achieved by implementing a ray-tracing algorithm, a method for calculating the path of light rays through a system, to render rain under different lighting conditions. The authors also simulate hazy atomization to create a mist effect by fusing raindrop textures into the scene background. On the other hand,

Feng et al. [35] implemented a non-photorealistic cartoon style animation of rain. The authors achieved a real-time rendering frame rate on a GPU-based particle system that includes collision detection and cartoon style splash effects. Particle motion was based

on Newtonian dynamics to simulate global changes in the rain direction due to wind-driven effects. Rendering runtime is compared between several scene complexities by using object models of various geometries.

The literature review of monoscopic rendering of rain highlights many different techniques. Some methods focused on real-time rendering, such as the use of scrolling textures and particle systems. Other methods discussed realism in terms of rainfall dynamics, raindrop shape, and factors that may improve photorealism such as reflection, refraction, wind, lightning, rainbow effects, and collision with other objects in the scene. A study also described non-photorealistic cartoon style rainfall effects.

In summary, the key realization is that a real-time photorealistic rendering of rain can be achieved in the monoscopic case. However, achieving the same results for a stereoscopic display is more challenging because, in theory, the data processing doubles. Therefore, twice as much time is needed to process the left- and right-eye images. However, there is redundancy and parallelism that can be exploited to improve rendering efficiency for stereoscopic images. Table 3.1 summarizes these results.

A total of 22 recent studies on monoscopic rendering of rain have been reviewed. In contrast, only 3 studies are found on rendering natural phenomena in stereo. None of the stereo studies addressed issues related to realistic rain and snow rendering, which are spread over a large 3D virtual space. Therefore, a solution is proposed to extend existing monoscopic techniques of rain or snow rendering to stereo while maintaining real-time and photorealistic output.

Table 3.1 Monoscopic rain rendering literature

No. Authors Realtime Photorealistic Other Attributes

1 Starik et al. (2003) Rain in video
2 Garg & Nayar (2004) Reflection/refraction
3 Wang & Wade (2004) Scrolling textures
4 Feng et al. (2005) Cartoon style
5 Garg & Nayar (2006) Rain streaks
6 Tatarchuk & Isidoro (2006) Cityscape rain
7 Rousseau et al. (2006) Reflection/refraction
8 Wang et al. (2006) Particle based rain
9 Garg & Nayar (2007) Rain distribution
10 Tariq (2007) GPU based rain
11 Rousseau et al. (2008) Rain streaks
12 Mizuka et al. (2008) Wind effect
13 Changbo et al. (2008) Ripples/puddles
14 Puig-Centelles et al. (2008) Particle based rain
15 Puig-Centelles et al. (2009) User controls
16 Roser et al. (2010) Raindrop model
17 Coutinho et al. (2010) Rain surface flow
18 Slomp et al. (2011) Reflection/refraction
19 Puig-Centelles et al. (2011) CUDA based rain
20 Creus et al. (2012) Rain splashes
21 Wang C. et al. (2015) Rainy scene
22 Weber et al. (2015) Rainfall modelling

3.3 Monoscopic Rendering of Snow

The methodologies used to simulate and render rainfall, such as scrolling texture-based or particle-based systems, can also be applied to snowfall. Wang and Wade [31] used a scrolling textures method for both rain and snow. The authors applied snow textures to a double cone that was placed around the observer. Applications such as flight simulation, where an airplane is in flight, can take advantage of such a method. Although it can be implemented in real-time, it lacks realism due to little or no interaction with other objects in the scene. On the other hand, particle-based systems are more physically realistic since each snowflake is simulated and rendered individually, with its motion determined by the influence of various forces, such as gravity, wind, and air resistance.

Another method to render rain and snow is proposed by Yang et al. [36] in which the authors combine two techniques, namely level-of-detail, which decreases complexity of a

3D object representation as it moves away from the viewer, and fuzzy-motion, the blurriness

of moving objects due to persistence of vision. This combination is used to increase particle system efficiency for rendering of natural phenomena. Instead of using a billboard method

to represent rain or snow, the particles were expanded to form a point eidolon that can

be texture mapped. A point eidolon is defined as a point that is stretched to conform to

a rain streak or snowflake shape, which is texture mapped with appropriate rain or snow

texture. The authors showed that the number of polygons required to create a point eidolon is lower than for a billboard, increasing the performance of the particle system. A

shape and appearance model of a single falling snowflake or rain streak is proposed by

Barnum et al. [37]. It is used to detect rain or snowfall in a video. The detection results are

identified in the frequency domain and then transferred to the image-space. Once detected,

the amount of snow or rain can be reduced or increased. The authors demonstrate that

the frequency-based approach had greater accuracy in the detection of dynamic snow and

rain particles as compared to a pixel-based approach.

The realism can also be enhanced by having snow accumulate on the ground and on

other scene objects. A method for generating such snow cover is presented by Fearing

[38]. The method consists of two models that address snow accumulation and stability. The accumulation model calculates how much snow each surface should receive. This is

based on a counter-intuitive idea where particles are emitted from upward-facing surfaces

towards the sky to determine exposure to falling snow. The amount of exposure determined

the amount of snow accumulating on the surface. To simulate wind effects, the author used

a global wind-direction that had an advantage of not requiring any fluid computations.

However, it did not produce fully convincing accumulation patterns. The stability model is applied when a layer of snow is added to the scene in an unstable area, to determine whether the snow will slide off. The method is based on calculating the angle of repose, which is used

to measure the static friction between piles of granular material. If the repose angle is too

steep, snow is redistributed. To render the scene, the author used commercial rendering

software. While the method is visually superior and photorealistic, it is computationally

expensive.
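
To make the stability test concrete, the following C++ sketch shows one common way an angle-of-repose check can be applied to a height field of accumulated snow; the grid layout, cell size, threshold, and transfer fraction are illustrative assumptions and do not reproduce Fearing's actual implementation.

    #include <cmath>
    #include <vector>

    // Illustrative sketch: redistribute snow when the local slope exceeds the
    // angle of repose. Grid resolution, cell size, and transfer fraction are
    // assumptions made only for this example.
    void relaxSnow(std::vector<std::vector<float>>& height, float cellSize,
                   float reposeAngleRad) {
        if (height.empty()) return;
        const int rows = static_cast<int>(height.size());
        const int cols = static_cast<int>(height[0].size());
        const int dr[4] = {1, -1, 0, 0};
        const int dc[4] = {0, 0, 1, -1};
        for (int r = 0; r < rows; ++r) {
            for (int c = 0; c < cols; ++c) {
                for (int k = 0; k < 4; ++k) {
                    int nr = r + dr[k], nc = c + dc[k];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    float drop = height[r][c] - height[nr][nc];
                    // Slope angle between the two neighboring snow columns.
                    float slope = std::atan2(drop, cellSize);
                    if (slope > reposeAngleRad) {
                        // Move half of the excess snow downhill (micro-avalanche).
                        float excess   = drop - cellSize * std::tan(reposeAngleRad);
                        float transfer = 0.5f * excess;
                        height[r][c]   -= transfer;
                        height[nr][nc] += transfer;
                    }
                }
            }
        }
    }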

Fearing’s work is extended by Feldman and O’Brien [39]. They apply fluid dynamics

techniques described in Fedkiw et al. [40] to create wind velocity fields and use them to drift accumulated snow in a more realistic manner. Snow is accumulated by storing its amount on the horizontal surfaces of a three-dimensional grid of voxels; a voxel is marked as occupied after enough snow has accumulated to fill the voxels beneath it. A real-time snow accumulation method is presented by Haglund et al. [41], which simulates different stages of the buildup of snow cover, starting with a snow-free environment

and ending with a completely snow covered scene. To store snow depth, height matrices

are placed on all surfaces that could receive snow. When snowflakes hit a surface, the

nearest height value is increased. To render the scene, triangulations are created from the

height matrices, and rendered using Gouraud shading, an interpolation method to produce

continuous shading of surfaces, by means of OpenGL functionality. Their focus is to find a

good trade-off between visual result and performance without physical correctness.
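
A minimal sketch of the height-matrix idea is given below; the grid dimensions, cell size, and the snapping of a snowflake hit to the nearest cell are assumptions made for illustration rather than details taken from Haglund et al.

    #include <vector>

    // Illustrative height matrix covering one receiving surface. When a snowflake
    // hits the surface, the nearest height entry is increased by the flake's
    // contribution. Grid size and contribution value are assumed for the example.
    struct HeightMatrix {
        int cols, rows;
        float cellSize;               // world-space size of one cell
        std::vector<float> height;    // row-major snow depth per cell

        HeightMatrix(int c, int r, float s)
            : cols(c), rows(r), cellSize(s), height(c * r, 0.0f) {}

        // (x, z) is the hit position in the surface's local coordinates.
        void addSnowflakeHit(float x, float z, float depthPerFlake) {
            int cx = static_cast<int>(x / cellSize);
            int cz = static_cast<int>(z / cellSize);
            if (cx < 0 || cx >= cols || cz < 0 || cz >= rows) return;
            height[cz * cols + cx] += depthPerFlake;   // raise nearest height value
        }
    };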

The type of snow, either wet or dry, is determined by the polygon count. A method that distinguishes between the two snow types is proposed by Moeslund et al. [42]. In this method, the rendering of snowfall and snow accumulation is based on a collection of randomized triangular polygons that distinguish between wet and dry snow. The appearance and movement of snowfall are based on the physics governing the real processes. The same holds for the accumulated snow, where a correctly modeled wind-field is important for producing realistic results. The wind also acts on individual snowflakes of various shapes, which results in snowflakes moving and tumbling differently from one another, based on their shapes. Their accumulation model allowed for the generation of snow-covered scenes

of any depth very rapidly. The results show that both the appearance and movement of the

snow, as well as the accumulated snow, are very similar to real snow. This method is used

by Zou et al. [43] to make the scene photorealistic by creating natural looking snowflakes. The algorithm to create realistic snowflakes uses tessellated triangles that are combined

together to form a snowflake. The sharp edges on the triangles are curved by applying

quadratic Bézier curves, implemented on a GPU, which gives the new snowflake a smoother and more natural appearance. Photorealistic heavy snowfall requires the use of a high number of

particles. A frequency domain spectral analysis of the rendered image to reduce the number

of particles in snowfall, yet keep it visually realistic, is proposed by Langer et al. [44]. They combine geometry-based falling particles with image-based spectral synthesis. The method

first renders an image with simple particle-based snowfall representation. It then analyzes

the movement of particles stored inside an image within the frequency domain. This is

used to produce an opacity map, which creates an illusion of denser snowfall than what was rendered initially. While this technique provides visually pleasing results for snowfall, it does not address the problem of rendering snow on the ground or snow

accumulation.

A real-time snow accumulation model is implemented by Ohlsson and Seipel [45]. The authors used an accumulation prediction model with two components, an exposure component and an inclination component, which determine how much snow a surface would receive at a specific point. This method creates snow accumulation on a per-pixel basis.

Their idea was to create an occlusion map to decide how the surface should be rendered in

terms of snow depth. A noise function is used to create surface snow textures. The algorithm

is implemented on the GPU to achieve real-time frame rates. Another real-time occlusion-

based snow accumulation algorithm is proposed by Reynolds et al. [46]. A surface-bound accumulation buffer is assigned to each object in the scene, which forms a height map

of accumulated snow cover. The authors use ashadow mapping technique, to render the

scene from above, to map directly visible surfaces to their corresponding accumulation

height-map. To reduce formation of sharp peaks and edges, blurring is performed to get a

smoother accumulation height-map and existing scene geometry is tessellated to add more

detail. The authors are able to achieve real-time snow accumulation on a dynamic, moving

scene. A shadow buffer technique is used by Tokoi [47] to render snowfall in real-time. This method generates a shadow map using the z-buffer, also called the depth buffer. Snow is

accumulated in areas that are not shadowed. The shadow map simulates obstruction to

snowfall. A snow stability method by Fearing [38] is used to create micro-avalanches to stabilize fallen snow on various objects. This method provided good results for small scenes of only up to 300 × 300 pixels. A procedural modeling method by Foldes and Beneš [48] is used for snow accumulation based on illumination. The authors assumed that there is a constant layer of snow on the surface. In their model, the snow is accumulated or

dissipated. The snow accumulation regions are defined by calculating ambient occlusion.

Snow dissipation is caused by either direct or indirect illumination. Previous work by Ohlsson and Seipel [45] used occlusion techniques for determining obstructions to snowfall, while this method uses occlusion to account for heat and dissipation.

Existing methods in snowfall are extended by Saltvik et al. [49] to include wind simulation and snow accumulation for parallel execution on symmetric multiprocessors and multi-

core systems. The data structures are divided among various parallel threads of executions.

For snowfall modeling, the authors extended the results from Moeslund et al. [42] by tracking the 3D positions of snowflakes to decide where to accumulate snow. In the model,

the position and velocity of each individual snowflake is tracked and updated according to

the forces described by Newtonian dynamics and the laws of motion. For wind simulation,

computational fluid dynamics is used, which is based on the Navier-Stokes equations used

in Fedkiw et al. [40] for smoke simulation. The authors extended Haglund et al. [41] for snow accumulation. For each frame, objects are checked for intersections with snowflakes.

If a hit occurs, the nearest snow height value is increased.

An efficient snow distribution method is presented by Festenberg and Gumhold [50] to give a realistic snow cover on object surfaces as an alternative to high-cost particle system simulation; their method produces a simplified snow surface representation. The authors

use photographs of snow-covered scenes as a primary source for the model development.

Inspired by the real world observations, they derived a statistical snow deposition model.

This work is extended by Hinks and Museth [51] to include a wind-driven snow distribution model that uses a dual level set, a data structure that represents the surfaces of the dynamic snow and the static boundaries of a scene. The authors introduce the concept of snow-packages, a representation of discrete volumes of snow, which are traced in a wind-field and

on the surfaces of scene objects.

A GPU-based particle system is implemented by Zhang et al. [52] for snow rendering that includes both snowfall and accumulation. Their method uses textures to store the data needed to simulate snow particles on the GPU, e.g., the RGB value of a texel represents the xyz position of a particle in space. For each update to the next frame, new data is written to a new texture. Height fields are used to establish

snow accumulation rules, such as how high snow can accumulate on a certain surface. The

implementation targets real-time rendering goals instead of achieving photorealism. A wind-effect model is developed by Tan et al. [53], which is applied to a particle system to simulate snowfall. To improve visual fidelity, the authors used eight different snowflake

textures and applied texture indexing to switch between them. The snowflake attributes

such as position, velocity, and rotational angles are computed by the particle system imple-

mented using a Direct3D library designed specifically for Microsoft platforms. However,

the implementation ran on a CPU and did not take advantage of the programmable graph-

ics pipeline. This method is extended by Tan and Fan [54] to include snow accumulation. They increased the number of textures used for snowflakes and changed the wind-field

model to use the lattice Boltzmann fluid dynamics equations, which simplified the calculations

and offered a more realistic particle motion with changes in wind directions. The rendering

efficiency is improved by Fan and Zhang [55]. They use OpenGL display lists, a series of graphics commands that define an output image. When a display list is referenced, a group

of stored commands execute in order efficiently. However, implementation of display lists

is still done on a CPU causing the GPU to reference the CPU for new data for each new

draw call, creating a bottleneck between the CPU and GPU interface.
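
As an illustration of the display-list mechanism, the legacy OpenGL calls below compile a group of drawing commands once and replay them for every particle; the drawSnowflakeGeometry routine is a hypothetical placeholder for the application's own immediate-mode drawing code.

    #include <GL/gl.h>

    // Hypothetical helper that issues the immediate-mode commands for one snowflake.
    void drawSnowflakeGeometry();

    GLuint buildSnowflakeList() {
        GLuint list = glGenLists(1);     // reserve one display-list name
        glNewList(list, GL_COMPILE);     // record commands without executing them
        drawSnowflakeGeometry();
        glEndList();
        return list;
    }

    // Per frame: replay the stored commands. The recorded commands are fixed,
    // which is why per-particle data still has to come from the CPU on each draw.
    void drawSnowflake(GLuint list) {
        glCallList(list);
    }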

Simulation of water due to rain and melting snow is added by Ding et al. [56]. The authors used a height-map, an array that stores height values, to describe various terrain

elevations. For ground water simulation, bump mapping is used to perturb the normal vectors for realistic reflection from a water surface. A study on animation of snow dynamics

is proposed by Stomakhin et al. [57]. The interactions between solid and fluid states produce realistic snow dynamics, especially in wet or dense snow, which exhibits both solid and

fluid-like properties. The authors implement a user-controllable variant of a hybrid of

an Eulerian and Lagrangian material-point-method (MPM). The MPM, first proposed by Sulsky et al. [58], is a method used in computational fluid and solid dynamics and is an extension of the particle-in-cell (PIC) method [59]. An Eulerian, or grid-based, approach to fluid dynamics is a way of looking at fluid motion that focuses on specific locations in the space through which the fluid flows as time passes. In contrast, a Lagrangian, or particle-based, approach is a way of looking at fluid motion where the observer follows an individual fluid parcel, a small amount of fluid, as it moves through space and time. In a hybrid Eulerian/Lagrangian MPM approach, instead of communicating directly with each other, particles communicate through

a background grid, making the method extremely efficient. However, like other methods without a rest state, MPM suffers from drift, which is exacerbated by a Taylor expansion

approximation of the deformation gradient, limiting the ability to simulate large elastic

deformations over long time frames. The authors also demonstrate that the MPM occupies an interesting middle ground among simulation techniques, especially for elasto-plastic materials undergoing fracture. By adding plasticity to the basic constitutive model, they show

that a range of compressible materials, such as snow, can be simulated.

A shell texture to render snow is implemented by Wong and Fu [60]. This type of texture is formed by creating a series of concentric, semi-transparent textures containing sample images of snowflakes. The shell textures are generated at the preprocessing stage and are

used in rendering snowflakes. The proposed method is based on a hybrid, particle (La-

grangian) and grid (Eulerian), structure for handling snow. The movable snow is represented

as particles, whereas static snow is modeled as grid cells. The snowflake particles are made of several shell textures that are bonded together by spring forces applied among them. The movement of these particles is simulated by a particle system. Static snow on the ground is simulated by grid cells, such that occupied grid cells are filled with snow and empty grid cells have no snow. The particles can still move freely inside each grid cell, which allows fallen snow on objects to look natural. The final resting

place of the snowflakes is computed according to gravity, collision, and spring forces.

Like rain, numerous studies exist on monoscopic rendering of snow. A total of 22 recent

studies have been reviewed. In addition to simulating snowfall, other problems are also

considered such as interaction of fallen snow with the ground and objects in the scene, which includes snow accumulation, formation of snow drift patterns due to wind effects,

and various shapes of snow piles as they form on objects. Notably, none of the studies

address stereoscopic rendering. Table 3.2 summarizes these results.

3.4 Computational Analysis

Rendering at a 1024 × 1024 screen resolution requires about 1 million pixels to be painted. A single pixel color contains 8 bits per channel for the red, green, and blue colors and another 8 bits for the alpha channel, which is used for transparency. To color a single pixel, a total of 32 bits are assigned. Therefore, rendering a single image requires about 4 MB of data. At a rate of 60 fps, the computer processes upward of 240 MB of information every second just to display the images. Processing scene content, object shapes, lighting, and other characteristics requires additional computational resources. The following sub-sections summarize the computational analysis performed by authors of recent literature in monoscopic rain and snow rendering.
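
The arithmetic behind these figures can be written out explicitly. The short C++ sketch below reproduces the estimates of roughly 4 MB per frame and 240 MB per second, assuming 32 bits per pixel and a 60 fps target.

    #include <cstdio>

    int main() {
        const double width  = 1024.0, height = 1024.0;   // screen resolution
        const double bitsPerPixel = 32.0;                // RGBA, 8 bits per channel
        const double fps = 60.0;

        double bytesPerFrame  = width * height * bitsPerPixel / 8.0;   // ~4 MB
        double bytesPerSecond = bytesPerFrame * fps;                   // ~240 MB/s

        std::printf("Per frame : %.1f MB\n", bytesPerFrame  / (1024.0 * 1024.0));
        std::printf("Per second: %.1f MB\n", bytesPerSecond / (1024.0 * 1024.0));
        return 0;
    }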

3.4.1 Rendering Performance – Rain

A scrolling texture approach instead of a particle system is used by Wang and Wade [31]. This resulted in the same performance overhead for heavy or light precipitation, independent of the number of particles rendered.

Table 3.2 Monoscopic snow rendering literature

No. Authors Realtime Photorealistic Other Attributes

1   Fearing (2000)               Accumulation
2   Feldman & O’Brien (2002)     Drifts
3   Haglund (2002)               Accumulation
4   Langer et al. (2004)         Snowfall
5   Ohlsson & Seipel (2004)      Accumulation
6   Moeslund et al. (2005)       Accumulation
7   Saltvik et al. (2006)        Snowfall
8   Tokoi (2006)                 Piles & shapes
9   Foldes & Beneš (2007)        Dissipation
10  Yang et al. (2008)           Snowfall
11  Festenberg et al. (2009)     Distribution
12  Hinks & Museth (2009)        Distribution/wind
13  Tan et al. (2009)            Snowfall/wind
14  Barnum et al. (2010)         Snow/rain removal
15  Zhang et al. (2010)          Accumulation
16  Zou et al. (2010)            Snowflakes model
17  Tan & Fan (2011)             Snowfall/wind
18  Fan & Zhang (2012)           Snowfall
19  Ding et al. (2013)           Accumulation
20  Stomakhin et al. (2013)      Distribution
21  Reynolds et al. (2015)       Accumulation
22  Wong et al. (2015)           Accumulation

Across a range of consumer graphics cards, their technique maintained frame rates of 15 to 60 fps. A cartoon style

non-photorealistic technique by Feng et al. [35] implemented a particle system. The au-

thors compared results between different scene geometries on an AMD Athlon XP 2500+ 1.83GHz processor with an NVIDIA GeForce 6800 LE GPU with 512 MB of main memory.

The performance varied between 10 and 45 fps for particle counts ranging from 6,000 to 20,000. Note that the frame rate increases with a decrease in the number of particles. The variation was due to changes in the quantity of geometry drawn and the complexity of

the particle system.

A particle system used by Rousseau et al. [24] included 5000 particles that are sufficient to provide a realistic rain impression when large raindrops or streaks are used. When using very small raindrops, below a radius of 1 mm, at least 10,000 particles are required for a

realistic rain impression. The authors ran their experiments on a PC with a 2600+ AMD CPU and an NVIDIA Geforce 6800 GT graphics card. Their method generated 100 fps for a

rain scene containing 5000 particles.

An ATI Radeon X1900 XT graphics card with 512 MB of video memory on a 1 GB Dual Core 3.2 GHz Pentium 4 PC is used by Tatarchuk [61] to render rain in real-time for a visually complex virtual city environment. The author was able to achieve frame rates of 26 fps for 20,000 rain particles. The frame rate increased to 69 fps when the number of particles decreased

to 5000. Photorealistic results with low computational cost are demonstrated by Wang et al.

[25]. The performance experiments ran on an Intel Pentium 4, 3.2 GHz PC with an NVIDIA GeForce 7800 GT graphics card. Their frame rate varied from 77 to 790 fps and the number

of particles were in the range of 10,000 to 80,000.

An NVIDIA GeForce 8800 GTX graphics card on a 2 GB Intel Core 2 processor is used by

Tariq [28] to implement a particle system to animate and render rain streaks. The author implemented experiments using the Direct3D 10 library designed specifically for the Microsoft platform and took advantage of the GPU geometry shader to achieve 26 to 545 fps for particle counts ranging from 200,000 to 5 million. A multiresolution technique for rain rendering is used by Puig-Centelles et al. [20], taking advantage of programmable GPU shader units. For the experiments, an NVIDIA GeForce 8800 GT graphics card on a Pentium D 2.8

GHz with 2 GB RAM was used. They achieved 187 fps for 50,000 rain particles. On the same

hardware, the authors compared their results with the earlier implementations by Rousseau et al. and Tariq and showed that their method provides a better rain appearance with fewer

particles.

A method that incorporates wet ground, ripples, puddles, and even a rainbow effect is

implemented by Wang et al. [32]. They used a Pentium 4, 3.2 GHz CPU with an NVIDIA GeForce FX 7900 GTX graphics card for implementation. The average rendering speed of a

dynamic rain scene reached 20 frames per second. Rain rendering with collision detection

is implemented by Coutinho et al. [27] to give visual effects like rain splashes. They also simulate rain water collecting into lakes and forming rivers. To model the water surface flow

they used a smoothed-particle hydrodynamics (SPH) technique, which is a computational

method used for simulating fluid flows. The framework used CUDA for rain simulation. The visualization was implemented in OpenGL and GLSL. The experiments were performed

using an Intel Quad Core 3.0 GHz, with 4 GB of RAM and an NVIDIA GeForce 9800 GTX,

running in an OpenSuse Linux platform. The experiments included simulation of rain,

collision detection and SPH with a total of 60,000 rain particles and achieved a frame rate

of 68 fps. The frame rate went down to 38 fps when the simulation also included the formation of river flows and rain splashes.

A technique consisting of two stages, a preprocessing stage that generated a raindrop

mask and a run-time stage that renders raindrops as screen-aligned billboards is used by

Slomp et al. [19]. The framework was implemented using C++, OpenGL, and GLSL version 1.20. The experiments were performed on an Intel Core 2 Quad 32-bit CPU running

at 2.66 GHz with 4 GB RAM with NVIDIA GeForce GTX 280, 1 GB VRAM graphics card.

A frame rate in the range of 46 to 794 fps was achieved for particle counts ranging from 125,000 to 4 million

raindrops. The authors also implemented tone mapping, a technique to approximate the

appearance of high dynamic range images by mapping one set of colors to another. When

tone mapping is enabled, the frame rate decreased to 595 fps for 125,000 raindrops. An

Intel Core 2 Duo running at 3.0 GHz with an NVIDIA GeForce GTX 280, 1 GB VRAM is

used to implement the work of Creus and Patow [23]. Along with rain rendering, the authors implemented rain splashes and various illumination effects such as fog, halos, and light glows. The frame rate varied from 4 to 56 fps for particle counts in the range of 430,000

to 7.6 million.

An image-space rain streak model is proposed by Weber et al. [33]. The experiments were performed on an NVIDIA GeForce GTX 980. Heavy rain is simulated using 8,000 visible streaks rendered at 30 fps; light rain, corresponding to 2,000 visible streaks, renders at 60 fps. Their results show a non-linear relationship between the number of rendered rain

streaks and the frame rate. A ray-tracing technique to render a realistic rain scene is used

by Wang et al. [34]. The authors used an Intel 2.4 GHz Core CPU with 4 GB RAM and an NVIDIA GeForce GT 540M graphics card. For particle counts ranging from 2,000 to 15,000, the frame rate varied from 76 to 478 fps. Table 3.3 summarizes performance results from recent work in monoscopic rendering of rain.

Table 3.3 Performance summary of monoscopic rain rendering

No.  Authors                        No. of Particles (in thousands)   Framerate (fps)

1    Wang & Wade (2004)             N/A                               15 - 60
2    Feng et al. (2005)             6 - 20                            10 - 45
3    Rousseau et al. (2006)         5                                 100
4    Tatarchuk & Isidoro (2006)     5 - 20                            26 - 69
5    Wang et al. (2006)             10 - 80                           77 - 790
6    Tariq (2007)                   200 - 5000                        26 - 545
7    Puig-Centelles et al. (2008)   50                                187
8    Wang et al. (2008)             N/A                               20
9    Coutinho et al. (2010)         60                                68
10   Slomp et al. (2011)            125 - 4000                        46 - 794
11   Creus et al. (2012)            430 - 7600                        4 - 56
12   Weber et al. (2015)            2 - 8                             30 - 60
13   Wang C. et al. (2015)          2 - 15                            76 - 478

3.4.2 Rendering Performance – Snow

Real-time rendering of snowfall is achieved by Langer et al. [44] using a static background image and combining it with snowflake textures using an image-based spectral synthesis

method. The implementation used the DirectX 9 library designed for Microsoft platforms

running on a Windows XP PC with an Intel Pentium 4, 2.4 GHz processor and 1 GB of RAM.

In addition, the PC had an ATI Radeon 9800 Pro graphics card with 256 MB of video memory.

The authors simulated light and heavy snow conditions with 2000 to 16000 snowflakes. The

frame rate varied between 10 and 60 fps.

An accumulation prediction model is used by Ohlsson and Seipel [45], which helped to compute how much snow a specific point on a surface would receive. The implementation was tested on an NVIDIA GeForce FX 5600 Ultra. The performance was directly dependent on the resolution and the amount of the screen covered with potential snow-covered surfaces. The authors used a 600 × 600 resolution image with 48,000 vertices. On average, a frame rate of 13 fps was achieved.

Snowfall accumulation and wind effects are implemented by Saltvik et al. [49] on a multiprocessor system. They compared results between task-parallel and data-parallel systems.

Task-parallelism is achieved when each processor executes different threads on the same

or different data. In contrast, data-parallelism performs the same task on different data

sets and a single thread controls operations on all pieces of data. The authors showed that

task parallelism gave a 29% performance gain. This is because in their data-parallel implementation, the OpenGL thread blocked other executing threads during the rendering cycle.

The experiments ran on an Intel Pentium Xeon 3.2 GHz dual CPU workstation with NVIDIA

Quadro FX 3400 graphics card. The number of simulated snowflakes ranged from 5000 to

40,000 with a frame rate varying from 23 to 133 fps.

A shadow buffer technique was used by Tokoi [47] to render snow cover and distribution. The implementation was on an Intel Pentium III, 800 MHz CPU, 384 MB main memory,

NVIDIA Geforce 2 with 32 MB video memory. The experimental software was developed on

Microsoft Windows XP Home Edition using Microsoft Visual C++ 6.0 and OpenGL version 1.1. The authors used a 300 × 300 image with about 40,000 vertices. On average, a frame rate of 4 fps was achieved.

Level-of-detail and retinal persistence are used by Yang et al. [36] to render falling

precipitation such as snow and rain. The experiments ran on an Intel Pentium 4 CPU at 2.80 GHz, 512 MB RAM, and an NVIDIA GeForce FX 5200 with 128 MB video memory. The authors

showed that their implementation needed approximately 56,000 particles to represent

precipitation with an average frame rate of 20 fps.

A GPU-based particle system is implemented by Zhang et al. [52] and compared to a CPU implementation of the same algorithm. The experiments ran on an Intel Pentium

4 2.8 GHz computer with 1 GB RAM, and NVIDIA GeForce 7650 graphics card with 256

MB of video memory under Windows XP, Visual C++ 6.0, and an OpenGL environment. Light snow was represented by 100,000 snowflakes and heavy snow contained 200,000

particles. On the CPU implementation, the frame rate ranged from 8 to 18 fps. However, the GPU implementation improved results by more than three times, with frame rates ranging from

36 to 60 fps.

A shadow-mapping technique is used by Reynolds et al. [46] to determine which areas on the ground are not occluded, and therefore will accumulate snow. The experiments were

performed on an Intel i7 PC with an NVIDIA GTX 580 GPU. Scenes of differing complexity were used with up to 750,000 vertices at 1024 × 1024 resolution. The results showed that the performance of snow accumulation was largely independent of scene complexity, with the frame rate varying between 65 and 75 fps. Table 3.4 summarizes performance results from recent work in monoscopic rendering of snow.

Table 3.4 Performance summary of monoscopic snow rendering

No.  Authors                   No. of Particles (in thousands)   Framerate (fps)

1    Langer et al. (2004)      2 - 16                            10 - 60
2    Ohlsson & Seipel (2004)   48                                13
3    Saltvik et al. (2006)     5 - 40                            23 - 133
4    Tokoi (2006)              40                                4
5    Yang et al. (2008)        56                                20
6    Zhang et al. (2010)       100 - 200                         36 - 60
7    Reynolds et al. (2015)    750                               65

CHAPTER 4

STEREO RENDERING

Rendering is a process of converting a three-dimensional world scene into a two-dimensional

image of that scene. Although this two-dimensional image can give some information about

depth using lighting and the viewer’s knowledge of the world, it loses the ability to look

beyond or around objects in that scene and makes some depth relationships ambiguous.

The use of stereovision can remedy this shortfall. The process of rendering in stereo requires

two subtly different views of the same scene, one for the left eye and another for the right

eye. This is analogous to stereophonics, where separate speakers play different sounds into

the left and the right ears. Stereovision requires two forward facing eyes separated by a

horizontal distance, called the inter-ocular distance. The light enters the two eyes and falls onto the two-dimensional surface in each eye called the retina. The two separate sets of image

data are sent from each retina to the back of the brain for processing by the visual system where the images merge to produce depth perception. Debate exists about how a vision

system combines various depth cues [62]. How the human visual system functions is an

open area of research and beyond the scope of this study. Suffice it to say that most viewers will perceive depth from planar left- and right-eye views with disparity. Figure 4.1 illustrates

the two views perceived by each eye.

Figure 4.1 Stereo pair as viewed by the two eyes

4.1 Depth Perception

The 3D structure of a scene is perceived from the 2D retinal images using various psychologi-

cal and physiological depth cues [63]. Other taxonomies of depth perception, such as monoc-

ular, binocular, and oculomotor depth cues, also exist in the literature [64]. Depth cues are additive. Correct combination of depth cues gives a better sense of a three-dimensional

environment, but a conflict in depth cues may result in eyestrain and an unpleasant viewing experience. Some cues are stronger than others in certain situations. For example, a sailor may rely on multiple psychological depth cues to determine the distance to a far-off buoy, such as linear perspective, aerial distortion, and relative size, to name a few. However, a person threading a needle primarily uses physiological depth cues, such as binocular disparity, accommodation, and convergence, to determine the location of the end of the thread and the eye of the needle. An important criterion for the dominance of one cue over another is the distance from the viewer to the objects of interest.

4.2 Psychological Depth Cues

Psychological depth cues include: linear perspective, lighting and shadows, aerial perspective, color, interposition or occlusion, texture gradient, and relative size. Such depth cues are considered monocular because they can be observed by one eye. They give an impression of depth even in a two-dimensional image. Artists have known about psychological depth cues since the Renaissance period and have used them to create depth perception. Since these depth cues are observable in a painting or a picture, they are also known as pictorial depth cues. As an example, in Figure 4.2 an 1877 painting by Gustave Caillebotte, Paris Street; Rainy Day, shows how the use of psychological depth cues can be effective in creating the illusion of depth even from a planar surface.

Aerial Distortion: Objects that are farther away from the viewer appear cloudy or behind a bluish haze because shorter blue wavelengths are scattered more strongly by the atmosphere. In computer graphics, depth cuing is used to reproduce the effect of aerial perspective, which reduces the intensity of an object in proportion to its distance from the viewer.

Figure 4.2 Gustave Caillebotte. Paris Street, Rainy Day, 1877. Image Credit: Art Institute of Chicago

Linear Perspective: The retinal image formed by an object becomes smaller as the object moves away from the viewer. In computer graphics, this phenomenon is modeled by linear perspective projection. All parallel lines that run towards the horizon line appear to converge at a single point called the vanishing point. This is why train tracks in the distance seem to come together.

Interposition or Occlusion: This is one of the simplest depth cues and works at any distance: foreground objects hide or overlap background objects, providing information about relative depth.

Relative Size: Certain objects are expected to be smaller than others. Knowledge of the relative size of an object also aids in determining depth. Prior knowledge of normal object size can be used to infer depth.

Color: When light of different wavelengths enters the eye, it is refracted by the fluids in the eye at various angles due to differences in the refractive index across wavelengths. This causes the color images on the two retinas to have disparity, thus producing depth perception. This is referred to as chromostereopsis. Typically, red or bright objects appear closer than blue or dull objects.

Texture Gradient: More details are perceived in an object that is closer to the viewer

but with distance, textures lose detail due to perspective transformation. For example, a brick or stone surface is coarse in the foreground but appears progressively finer with distance. This

texture gradient causes relative depth perception.

Lighting and Shadows: Scene illumination plays an important role in giving a realistic

depth sensation to an image. Accurate use of lighting and shadows also enhances photo-

realism. An object that is further away from the light source appears darker and casts a

smaller shadow. The effects of light and shadow on an object give cues about depth, shape,

relative position and size.

4.3 Physiological Depth Cues

Physiological depth cues are perceived when eye structure changes. Examples include variations in the thickness of the eye lens or convergence of two eyes on an object to bring

it into focus. The physiological depth cues are due to binocular disparity, ocular motion such as accommodation and convergence, or monocular cues as observed in motion parallax.

Binocular Disparity: One of the most noticeable and important depth cue is binocular

disparity, also known as retinal disparity. It refers to the difference between an image formed

on the retina of the left- and the right-eye. If the two images formed on each retina are

somehow superimposed, two horizontally displaced but overlapping images would be seen.

Depth perception is possible when the brain processes the existence of small differences between the two retinal images, a process known as stereopsis [65]. The differences between the two retinal images are due to the difference in the two angles formed, in each eye, between the retinal projection of the object in focus and any other object in the field of view, as illustrated in Figure 4.3. Two cubes are in the field of view, one farther away than the other. Assume both eyes are fixated on a corner of the nearer cube, labeled n1, such that the image of n1 is focused at corresponding points in the center of the fovea of each eye, where the visual acuity is highest. A corner of the farther cube, labeled n2, will be imaged on the retina of each eye at different distances from the respective fovea, therefore θl ≠ θr. This difference is binocular or retinal disparity. Moreover, retinal disparity in relation to an object of interest can be

defined as the difference between the convergence angle of that object and the convergence angle associated with the fixation target. From intersecting lines and opposite angles, it is deduced that the retinal disparity at point n2 can be expressed as θl − θr = ϕ − α. In other words, when the convergence angle of the object (α) is smaller than the convergence angle associated with the fixation target (ϕ), the object is farther than the fixation target and the retinal disparity is positive.

Figure 4.3 Binocular disparity
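
Under the small-angle approximation, this relation can also be expressed in terms of viewing distances. In the expressions below, the interocular separation I, the fixation distance D_f, and the object distance D_o are illustrative symbols introduced for this sketch rather than notation taken from [65]:

    \varphi \approx \frac{I}{D_f}, \qquad
    \alpha \approx \frac{I}{D_o}, \qquad
    \eta = \theta_l - \theta_r = \varphi - \alpha \approx I\left(\frac{1}{D_f} - \frac{1}{D_o}\right)

so that the disparity \eta is positive when D_o > D_f, i.e., when the object lies beyond the fixation target, and negative when the object is nearer.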

Accommodation: The contracting or relaxing of the eye muscles that changes the shape

of the eye lens is called accommodation. To see an object that is closer to the viewer, the

eye muscles relax and the lens thickens. Conversely, the eye muscles contract stretching

the lens making it thin to see distant objects as illustrated in Figure 4.4. The changes in the

lens shape focus incoming light rays onto the retina to form a clear image. Accommodation

is categorized as an oculomotor depth cue because eye muscles are used in changing the

focus. Out of focus information from different states of accommodation can also provide

useful depth information.

Figure 4.4 Accommodation - changes in lens thickness accommodate focus on objects

Convergence: It is also referred to as vergence, which is the inward or outward rotation

of the eyes towards a point of interest as it moves towards or away from the viewer as shown

in Figure 4.5. It is also an oculomotor depth cue because eye muscles are used in rotating

the eyeballs. As an object moves away from the viewer, the eyes diverge and move outward

until they reach the maximum parallel position. In normal human vision, the eyes cannot diverge beyond this point. As both eyes rotate inward or outward to converge on the object, the lens becomes thicker or thinner, respectively, to bring the object into focus and accommodate. Thus in real life, accommodation and convergence occur simultaneously when viewing the world with stereovision.

Figure 4.5 Vergence - inward or outward movement of both eyes to converge on objects

Motion Parallax: Motion parallax is a monocular depth cue that occurs when either

the observer or the scene move relative to each other. An image formed on the retina by

a distant object moves across the retina more slowly than an object’s image that is closer

to the viewer. Thus, objects closer to the viewer appear to move faster than objects that

are farther away. This allows relative depth judgments to be made. The effects of motion

parallax are evident when looking outside of a window from a moving car, where objects in

the distance appear stationary while objects close by rapidly travel across the viewer’s field

of view as depicted in Figure 4.6.

Figure 4.6 Motion parallax - objects closer to the viewer appear to move faster

4.4 Creating Stereo Pairs

The rendering of a stereo pair requires an understanding of the geometry of viewpoints

represented by two virtual cameras, each described by a viewing frustum in the right-handed Cartesian coordinate system. A frustum is the view volume formed by the eye’s field of view. In computer graphics, the frustum is clipped by near and far clipping planes. Objects appearing in front of the near plane or behind the far plane are not visible. The cameras

are horizontally separated by a distance, called the inter-axial distance, which represents

separation between the two eyes, the inter-ocular separation, thus simulating stereovision.

Figure 4.7 Stereo visible where the two view frustums overlap (top view)

The simplest method of rendering a stereoscopic image pair is to set up two virtual cameras using the parallel-axis configuration shown in Figure 4.7. This model has parallel axes with symmetric view frustums. It is recommended for creating stereo pairs because it does not generate

keystone distortion or vertical parallax and has zero disparity for points at infinity [66]. The field of view common to both cameras is where stereovision is observed. Objects that are visible in one camera but outside the field of view of the other camera fall in regions to be avoided. Such object placement causes visual discomfort due to the object’s visibility

in one eye but not the other. To perceive corresponding points in the left and the right

images correctly, it is important to adjust the image. This is accomplished by either using

an asymmetric view frustum during rendering or cropping the sides of the image during

post-production. In computer graphics, use of an asymmetric view frustum is common

because of simpler implementation.
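
As a concrete illustration, the C++ sketch below computes the asymmetric frustum bounds for each eye of a parallel-axis camera pair, following a commonly used off-axis formulation; the parameter names (vertical field of view, convergence distance, eye separation) are assumptions for the example and not values prescribed by this study.

    #include <cmath>

    // Frustum bounds for one eye of a parallel-axis, off-axis (asymmetric) setup.
    // fovY is the vertical field of view in radians; convergence is the distance
    // to the zero-parallax plane. All names are illustrative assumptions.
    struct EyeFrustum { double left, right, bottom, top; };

    EyeFrustum offAxisFrustum(bool leftEye, double fovY, double aspect,
                              double nearZ, double convergence, double eyeSep) {
        double top = nearZ * std::tan(fovY * 0.5);
        double a   = aspect * std::tan(fovY * 0.5) * convergence;
        double b   = a - eyeSep * 0.5;
        double c   = a + eyeSep * 0.5;
        EyeFrustum f;
        f.bottom = -top;
        f.top    =  top;
        if (leftEye) {  // frustum skewed toward the right for the left eye
            f.left  = -b * nearZ / convergence;
            f.right =  c * nearZ / convergence;
        } else {        // mirrored for the right eye
            f.left  = -c * nearZ / convergence;
            f.right =  b * nearZ / convergence;
        }
        return f;
    }
    // The bounds feed glFrustum(f.left, f.right, f.bottom, f.top, nearZ, farZ),
    // with the camera offset by half the eye separation along the x axis per eye.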

4.5 Viewing Stereo Pairs

The 2D display surface capable of forming a stereo image by projecting the left- and right-

eye views is called a stereo window. Consider looking through a real window on a building,

objects may appear behind, at, or in front of this window. Similarly, in a computer generated

stereoscopic view, looking at a stereo window is analogous to looking through a real window.

The intersection of two lines from a point in the scene to each eye produces what is called

homologous points on the stereo window. The horizontal distance between the homologous

points is called the horizontal parallax. Three types of horizontal parallaxes that induce

stereoscopic depth cues are shown in Figure 4.8. The sign of the difference between the

abscissas of two homologous points determines the type of parallax: positive, zero, or

negative. An object appears behind the stereo window when the horizontal parallax is positive. This happens when the projections of the object on the stereo window for each eye are on the same side as the respective eyes, which are uncrossed. The maximum positive parallax occurs when the object is at infinity, i.e., both eyes are looking straight ahead with parallel

line of sight. At this point the horizontal parallax is equal to the inter-ocular distance.

An extreme form of positive parallax occurs when the horizontal parallax is greater than

the viewer’s inter-ocular separation. This is known as diverging parallax resulting in a

phenomenon called wall-eyed vision. This phenomenon does not occur under natural viewing conditions for normal human vision.

Figure 4.8 Stereo window and horizontal parallax

When an object has no perceivable amount of horizontal parallax, it appears to be at the stereo window. The projection of the object on

the stereo window is coincident for both the left and the right eye, hence zero horizontal

parallax.

An object is located in front of the stereo window when the projection for the left eye is

on the right and the projection for the right eye is on the left, hence crossed also known as

negative parallax. Note that a negative parallax is equal to the inter-ocular distance when

the object is halfway between the stereo window and the viewer. As the object moves closer

to the viewer the negative parallax increases to infinity.
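
The sign convention above can be summarized in a few lines of C++; xLeft and xRight denote the screen-space abscissas of the two homologous points and are illustrative names only.

    enum class ParallaxType { Negative, Zero, Positive };

    // Positive (uncrossed) parallax places the object behind the stereo window,
    // zero parallax at the window, and negative (crossed) parallax in front of it.
    ParallaxType classifyParallax(float xLeft, float xRight) {
        float p = xRight - xLeft;        // horizontal parallax
        if (p > 0.0f) return ParallaxType::Positive;
        if (p < 0.0f) return ParallaxType::Negative;
        return ParallaxType::Zero;
    }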

Several techniques exist to view a stereo pair, with or without optical aid. Viewing a

stereo pair without assistance from any apparatus is called free viewing. With practice, most viewers can fuse stereo pairs using this technique. Viewing a stereo pair with an optical

aid can be grouped into two categories, time-parallel and time-multiplexed methods. The

following sub-sections describe these techniques.

4.5.1 Free Viewing

Parallel (uncrossed) and transverse (crossed) viewing are the two types of free viewing techniques.

In parallel viewing, the left eye image is placed to the left of the right eye image. When the viewer looks at the two images, the eyes are uncrossed. The lines of sight of the viewer’s

eyes move outward toward parallel and meet in the distance at a point well behind and

beyond the image.

In transverse viewing, the left and right eye images are reversed, requiring the viewer to cross their eyes to restore image placement and perceive depth. Most people are better at one type of free viewing than the other.

Free viewing skills require conscious effort, concentration, and practice to master. Consequently, these methods are used by experts in the field, enabling them to view stereo pairs without optical aid.

4.5.2 Time-parallel Viewing

In the time-parallel method both eyes are presented with stereo pairs simultaneously. Such

methods include anaglyphs, advanced wavelength multiplexing approach used by Dolby

3D, and auto-stereoscopic techniques.

Anaglyphs: Various techniques use filters to extract images for the left- and right-eye.

The anaglyph method is one such technique that has been used extensively in viewing

stereo pairs. In anaglyphs the left- and right-eye images are superimposed and the pixel values are computed from the combination of the left eye color and the right eye color.

Anaglyphs require the viewer to see the image with glasses that use complementary color

filters. The filters for viewing the anaglyph on an electronic display are typically designed to

block blue and green wavelengths for the left eye and to block red wavelength for the right

eye. Other common filter combinations include red-green and red-cyan. Anaglyphs are easy to

create and only require inexpensive color filter glasses to view either on a monitor or in

print. However, major drawbacks exist that make this technique unsuitable for mainstream

media. Since viewing of the anaglyph requires color filters, the true color fidelity of the

scene is often lost. In addition, it suffers from retinal rivalry that can create the appearance

of ghosting. Ghosting, also known as crosstalk, means that one eye can see part of the

image that is intended for the other eye. This causes an unpleasant viewing experience,

eye fatigue, and headaches after an extended period of looking at anaglyphs. Recent techniques for computing anaglyphs that improve color faithfulness, based on the transmission properties of the filters and the color characteristics of the display device, have been proposed. The anaglyph output can be improved by using algorithms like uniform

approximation to produce brighter output [67] or the CIELab approximation method to

preserve color fidelity [68]. However, such algorithms incur significant extra computational overhead in anaglyph calculation.
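
For reference, a minimal sketch of the basic red-cyan composition is shown below; it takes the red channel from the left-eye image and the green and blue channels from the right-eye image, without the color-preserving refinements of [67] or [68].

    struct RGB { unsigned char r, g, b; };

    // Basic red-cyan anaglyph: the left eye contributes red, the right eye
    // contributes green and blue. More faithful methods weight all channels
    // of both views.
    RGB anaglyphPixel(const RGB& left, const RGB& right) {
        RGB out;
        out.r = left.r;
        out.g = right.g;
        out.b = right.b;
        return out;
    }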

Advanced Wavelength Multiplexing: The classic anaglyph approach is a wavelength

multiplexing technique where the whole wavelength range of visible light from 400 to

700 nanometers (nm) is subdivided into two ranges: red and the complementary color to red, cyan. The advanced wavelength multiplexing approach, sometimes referred to

as super-anaglyph, is adopted by Dolby 3D that uses narrower wavelength bands. This

approach works for stereo projection systems. It uses interference narrowband spectral

filters to extract the left- and right-eye image [69]. The filters have to be selected such that

they fall within the red, green and blue sensitivity range of the cone cells, photoreceptors

responsible for color vision, of the human eye. Each RGB value is split into two channels with slightly different wavelengths. This set of RGB triplets is used to encode the stereo pair.

For example, the left-eye image may use: red = 620 nm, green = 530 nm, and blue = 440 nm wavelengths, while the right-eye image may use slightly different wavelength values:

red = 615 nm, green = 525 nm, and blue = 435 nm. The projected stereo pair is decoded by glasses with appropriate interference filters. Although the glasses are passive, without

any active electronic components, they are expensive and not disposable because they use

specific wavelengths to filter light. Unlike anaglyphs, an advanced wavelength multiplexing

approach has full color output. The projection screen is inexpensive, either simple matte

or low gain, and it is compatible with standard cinema screens.

Auto-stereoscopy: The necessary use of glasses is a hindrance to widespread consumer acceptance of viewing content in stereo. Auto-stereoscope techniques provide a glasses-free method for viewing stereo. Two ways of manufacturing a spatially multiplexed auto-stereoscope display are lenticular and parallax barrier [70]. A lenticular surface has an array of small cylindrical lenses in front of the pixel raster. These lenses are placed in such a way that light from adjacent pixel columns falls in different viewing slots at some ideal viewing distance. Each of the viewer’s eyes sees light from only every other pixel column. The

parallax barrier technique achieves the same goal by using small visual barriers in front of

the pixel raster. The use of the auto-stereoscope display is specialized with a narrow viewing

angle. To get an optimum viewing angle, the observer needs to sit directly in front of the

screen. Auto-stereoscope display technology is well suited for personal displays, such as

the Nintendo 3DS game console. Other drawbacks include cost, diminishing image quality,

and being limited to only viewing 3D images.

4.5.3 Time-multiplexed Viewing

In the time-multiplexed method a stereo pair is presented to both eyes in a sequence.

Optical techniques are used to block one eye while the other eye is shown the image and vice versa. This method is grouped into two types, one with passive and the other with

active viewing glasses. In both types, the images are delivered at a faster frame rate, at least

120 Hz, to avoid flicker. The use of passive polarized glasses and active shutter glasses

are examples of time-multiplexed methods. Time-multiplexed methods are sometimes

referred to as field sequential if the left- and right-eye images are shown using the interlaced

fields. The term frame sequential is used when display is progressive or non-interlaced.

Passive Polarized Glasses: In a system that uses polarized light to display a stereo pair,

the left- and right-eye images are polarized orthogonal to each other. A newer version of

this technology uses circular polarization where one eye lens is polarized clockwise and

the other counterclockwise. If the viewer tilts their head, the images will not separate as they would in the orthogonally polarized case. The viewer uses passive polarized glasses that also

have orthogonal axis of polarization. The combination of the two acts as a shutter. When

the left eye image is displayed, the light is polarized such that it is parallel to the axis of the

left eye lens. Since the right lens’ polarization axis is orthogonal to the left lens, the image

to the right eye is blocked. The passive system does not need any synchronization with

the display device. It also allows several viewers to simultaneously view in stereo and has

a relatively larger field of view. For these reasons this system is popular in movie theaters.

The disadvantage of this type of system is that the display device must produce a polarized

image. The projector must use polarizing lenses and the viewing screen must be coated with a special material to reflect polarized light. Since the passive polarized glasses essentially act as dark sunglasses, the intensity of the light that the viewer sees is low, which causes images to appear dark.

Active Shutter Glasses: The glasses used in this method do not have filters. Instead an

LCD acts as a blocking shutter. An electronic signal is used to make the lenses either clear or opaque. The signal alternates for each eye, causing the left eye to see the view while the right is blocked, and vice versa. The view from the glasses is actively synchronized with the current frame on the display via an infrared signal between the active shutter glasses and the display. This method is capable of delivering a full high-resolution progressive image to each eye. Since active shutter glasses do not use polarized light, the light intensity reaching the viewer is much higher. However, the major disadvantage of this system is that it requires additional logic to maintain synchronization between the display and the glasses.

Quad Buffer Stereo: Quad buffering is a technology for implementing stereoscopic rendering that uses double buffering with a front and back buffer for each eye, totaling four frame buffers. Quad buffering allows swapping the front and back buffers for both eyes in sync, allowing the display to seamlessly work with different rendering frame rates. A GPU that supports quad buffering also supports time-sequential images to be viewed by active glasses with liquid crystal shutters.
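
A typical render loop for quad-buffered stereo under OpenGL is sketched below; the applyEyeCamera and drawScene functions are hypothetical placeholders, and a stereo-capable pixel format must have been requested when the rendering context was created.

    #include <GL/gl.h>

    // Hypothetical helpers supplied by the application.
    void applyEyeCamera(bool leftEye);
    void drawScene();

    void renderStereoFrame() {
        glDrawBuffer(GL_BACK_LEFT);     // route output to the left back buffer
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        applyEyeCamera(true);
        drawScene();

        glDrawBuffer(GL_BACK_RIGHT);    // then render again into the right back buffer
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        applyEyeCamera(false);
        drawScene();

        // Swapping buffers (platform specific, e.g. SwapBuffers or glXSwapBuffers)
        // presents both eye images in sync.
    }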

4.6 Challenges with Stereo Rendering

Numerous experiments have shown that stereovision can provide significant advantages over monocular vision. Particularly, stereovision can aid in spatial localization and visu-

alizing large amounts of complex data [71]. However, mainstream acceptance of stereo requires a comfortable viewing experience. In most people the dominant depth cue for

stereovision is binocular disparity. The control of depth cues for producing a correct stereo

effect is critical. A conflict in depth cues has a strong detrimental visual effect resulting in

one or more of the following:

1. The dominant depth cue may not be the correct or intended depth cue.

2. The depth perception may get exaggerated or reduced.

3. The stereo view may cause eyestrain and become uncomfortable to watch.

4. An object at the edges of the stereo window with negative parallax results in conflict

in depth cues.

5. The stereo pair may not fuse at all and the viewer may observe two separate images.

In a stereo pair the only difference should be in the horizontal parallax. Any other

difference in the color, geometry, and brightness between the two images will result in visual fatigue. Some people experience eyestrain when viewing a stereoscopic image pair

on a flat display device, where the relationship between accommodation and convergence breaks down. In viewing a natural scene, both eyes accommodate and converge at the same point. This is a habitual and learned response. However, viewing an object on a flat screen

makes the two eyes always focus on the screen itself while their line of sight converges at

some point in space. The only image points for which accommodation and convergence coincide are those with zero parallax. This motivates the use of low horizontal parallax to

reduce viewer discomfort.

The goal in creating a stereo pair is to provide maximum depth effect with the minimum

amount of horizontal parallax. This can be achieved by varying inter-axial separation.

Bringing the two camera viewpoints towards each other reduces the amount of horizontal

parallax. Conversely, greater inter-axial separation results in greater parallax. As a general

rule, horizontal parallax should not exceed half an inch if viewed from a typical viewing

distance of eighteen inches for a desktop monitor. The leakage of the left-eye image into the

right-eye and vice versa results in crosstalk or ghosting. It can result from inaccurate shutter

synchronization in active shutter glasses or from phosphor afterglow in plasma or older CRT

displays. A difference in perception of color, contrast, brightness, and geometry between

the left- and the right-eye all result in a ghosting effect. Reducing horizontal parallax also

helps to diminish this effect.
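
As a worked example of this guideline, assume a desktop display density of 96 pixels per inch (an illustrative value); the half-inch limit then corresponds to a parallax budget of

    0.5\ \text{in} \times 96\ \text{px/in} = 48\ \text{px}

so homologous points on such a monitor should stay within roughly 48 pixels of each other.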

In creating a stereo view it is important to decide where to position the object with

respect to the stereo window. If an object has negative parallax and the edges of the stereo window cut off that object, then there will be a conflict in depth cues. The occlusion of the

object by the stereo window tells the viewer that the object is behind the window. On the

other hand, the stereo cue provided by the parallax tells the viewer that the object is leaping

out of the window. From experience, when looking through a window an object cannot be between the window and the viewer, and this mismatch causes a conflict in depth cues.

Strictly speaking, a stereo view is only correct for one viewing position. When looking

through the stereo window, if the viewer shifts their head from side to side, a shearing effect will be observed. If the viewer moves closer to or farther away from the screen, the scene will

compress or expand. This can be compensated in real-time systems by head tracking.

CHAPTER 5

IMPLEMENTATION FRAMEWORK

Algorithms that execute instructions while performing calculations on multiple indepen-

dent data sets in parallel are well suited for running on graphics hardware. Graphics pro-

cessing takes place on a programmable GPU. It is a specialized processor with dedicated

memory that performs floating point operations required for computer graphics. In re-

sponse to commercial demand for real-time rendering, the current generation of GPUs has

evolved into many-core processors that are specifically designed to perform data-parallel

computation. The single instruction multiple threads (SIMT) framework accomplishes

data-parallelism, where multiple processing cores in a GPU execute the same instructions on

different data sets using multiple threads executing in lockstep. Particle simulation is an

ideal computational task for a SIMT framework since it is assumed that every raindrop

or snowflake will behave independently. This chapter describes implementation details

needed to demonstrate stereoscopic, real-time, and photorealistic rendering of precipita-

tion.

5.1 Graphics Hardware

In graphics hardware terminology, the host is the CPU executing a graphics

application that is responsible for capturing events such as key presses and mouse clicks, setting up

the initial state of the graphics program, and creating an output window to draw results.

The term device refers to the graphics card that houses a GPU, which consists of thousands of

efficient data processing cores designed for handling multiple data sets simultaneously.

This increases throughput, the amount of data processed in a given unit of time, which is

ideal for high performance computing or executing graphics applications.

A prototype application is developed to test the performance of real-time rendering

of rain in stereo on an Intel Core-2 Duo CPU running at 2.00 GHz with 2.0 GB of RAM

installed with a generic graphics card designed for the Intel series express chip set family.

The experiments performed on this basic graphics hardware validate that stereoscopic

rendering of a precipitation scene is possible in real-time.

In later experiments the graphics card used is an NVIDIA QUADRO K5000, which has

3.54 billion transistors. This is powerful graphics hardware capable of supporting mod-

ern OpenGL, version 4.3 and above, by featuring a programmable GPU. It has a theoretical

peak single precision floating point rate of 3090 GFLOPS. It incorporates

1,536 compute unified device architecture (CUDA) cores clocked at 1.06 GHz. The term

CUDA is a marketing name NVIDIA uses to define a GPU parallel computing platform and

programming model, which enables improvement in computing performance by harness-

ing the data-parallelism provided by the GPU. A CUDA core, also known as a streaming

processor, is a pipelined hardware unit accepting instructions and executing them in paral-

lel. Other GPU vendors, such as AMD, refer to a CUDA Core as a single instruction multiple

data unit (SIMD). In OpenGL terminology, a CUDA Core is called a shader unit because

it executes a shader program. The QUADRO K5000 graphics card also supports 4 GB of

GDDR5 memory clocked at 6.08 GHz which produces memory bandwidth of 192.26 GB/s

and a texel (texture element, also known as texture pixel) fill rate of 128.8 GTexel/s.

5.2 Graphics Software

There are two major low-level graphics libraries available to facilitate GPU programming,

DirectX and OpenGL. DirectX is proprietary and specific to the Microsoft Windows operating

system. It is optimized to render video games for real-time performance, sacrificing pho-

torealism. However, with each new release of DirectX, visual quality and experience have

improved by implementing better algorithms that take advantage of the programmability of a

modern GPU. Windows 10 supports the latest version 12 of DirectX. On the other hand,

OpenGL is an open framework. An application that uses OpenGL can execute on various

platforms. It is shipped as part of the graphics hardware driver and therefore some features

can be vendor specific. It provides a good balance between real-time and photorealistic

output. Additionally, it supports stereoscopic rendering by using quad buffers. Until re-

cently DirectX did not support stereoscopic implementation. Since one major goal of this

study is to produce results in stereo, OpenGL is selected for software implementation.

As the GPU hardware evolved so did the programming model. The prototype application

developed with OpenGL 2.0 to implement stereo rendering of rain uses a fixed-function

graphics pipeline. This pipeline is configurable but can only use fixed functionality, such

as lighting models that run on a GPU but cannot be changed or programmed. Because of

this, it is limited in use and does not take advantage of modern hardware improvements.

In the modern GPU, the fixed-function graphics pipeline has been improved and replaced with

a programmable graphics pipeline. In OpenGL, such programs are written in a C-like

language called the OpenGL Shading Language (GLSL) and are called shader programs.

The name shader is a misnomer since it has little to do with various lighting or shading

models. Shader programs are executable programs that run on a GPU and process

data in parallel. The OpenGL driver and underlying hardware are responsible for parallel

execution, scheduling, and synchronization.
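As a minimal sketch of what such a shader program looks like, the following hypothetical GLSL vertex and fragment shaders form a complete pair: the vertex shader projects every incoming vertex with a combined model-view-projection matrix (assumed here to be supplied by the application in the uniform u_mvp), and the fragment shader writes a constant color for every fragment. The names are illustrative, not the exact ones used in this implementation.

    // Vertex shader: transforms each vertex from model space to clip space.
    #version 450 core
    layout(location = 0) in vec3 in_position;   // vertex position read from the VBO
    uniform mat4 u_mvp;                         // model-view-projection matrix (assumed name)
    void main()
    {
        gl_Position = u_mvp * vec4(in_position, 1.0);
    }

    // Fragment shader: colors every fragment produced by the rasterizer.
    #version 450 core
    out vec4 out_color;
    void main()
    {
        out_color = vec4(0.6, 0.6, 0.9, 1.0);   // constant bluish-gray color
    }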

5.3 Stereoscopic Implementation

The monoscopic particle system to simulate the behavior and distribution of falling par-

ticles is extended to stereo, where left- and right-eye views are calculated and rendered

separately. Thus, the scene is rendered from two different perspectives. The scene with

rain or snow particles is modeled using the world coordinate system while each object

is modeled using its respective model coordinates. Several matrix transformations are

required to transform all model coordinates to screen coordinates. This approach is typical

in a raster graphics pipeline where scene objects are used to form an image on the screen.

Figure 5.1 shows this transformation process. All operations shown in gray boxes are software

implementations while blue boxes are GPU hardware units.
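For reference, the full chain can be summarized by one expression. Using M, V, and P to denote the model, view, and projection matrices (symbols introduced here only for illustration), a model-space point p_model reaches clip space as

    p_clip = P · V · M · p_model,

after which the GPU performs the perspective divide by the clip-space w component and the viewport transformation to obtain 2D screen coordinates.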

The implementation to produce stereo output is described by Hussain and McAllister

[72], where precipitation is rendered within a bounding box. The farther the rain or snow particles are from the stereo camera, the larger the disparity between the two views and the smaller

the particle size. A particle at near infinity will approach maximum disparity and minimum

Figure 5.1 Transformation from 3D world space to 2D screen space

particle size. If the inter-axial distance, the distance between the left- and the right-view camera, is kept similar to the distance between the human eyes, which is about 6.5 cm, then stereovision will only be effective up to 30 meters (m) from the camera [73]. Since stereovision is only strong for precipitation forming close to the camera, the precipitation bounding box is adjusted to improve rendering speed such that any particle formed outside of this boundary need only be rendered once. Alternatively, the inter-axial distance can be increased to produce hyper-stereo output, which is suitable for viewing outdoor scenery and distant rain or snow particles in stereo. The greater the inter-axial separation, the greater the depth effect.

Initially, the OpenGL 2.0 graphics library is used for experiments. Later implementations use newer graphics hardware that supports OpenGL 4.5. The newer application executes on an Intel Quad Core i7 CPU running at 2.00 GHz with 16.0 GB of RAM installed with an

NVIDIA Quadro K5000 GPU. The graphics card supports an advanced graphics pipeline

featuring a programmable GPU. The hardware also supports quad buffer stereo with active

shutter glasses to view the stereo output. A GPU that supports quad buffering allows

time-sequential images to be viewed by active glasses with a liquid crystal display (LCD).

The glasses used in this method do not have filters. Instead an LCD acts as a blocking shutter.

An electronic signal is used to either make the lenses clear or opaque.

Alternatively, the frame buffer object (FBO) can be used to produce stereo anaglyphs.

In a fragment shader, instead of writing the pixel data to the frame buffer for rendering,

the FBO is used to save the pixel data. The fragment shader generates stereo anaglyphs

by rendering the left- and right-eye images to separate FBOs that are rendered to texture to

produce a composite of left- and right-eye views. The red component comes from the

left-eye image while the green and blue components come from the right-eye image.
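A minimal sketch of such a compositing pass is shown below, assuming the left- and right-eye renders have been attached as color textures of two FBOs and bound to the hypothetical samplers u_left and u_right, and that the shader is applied to a full-screen quad. This is an illustrative fragment shader, not the exact one used in this study.

    #version 450 core
    in vec2 v_texcoord;                // texture coordinate from the full-screen quad
    out vec4 out_color;
    uniform sampler2D u_left;          // left-eye render target (assumed binding)
    uniform sampler2D u_right;         // right-eye render target (assumed binding)
    void main()
    {
        vec3 left  = texture(u_left,  v_texcoord).rgb;
        vec3 right = texture(u_right, v_texcoord).rgb;
        // red from the left-eye image; green and blue from the right-eye image
        out_color = vec4(left.r, right.g, right.b, 1.0);
    }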

5.4 Real-time Implementation

An early goal of this study is to investigate whether real-time rendering in stereo is even viable.

Stereoscopic real-time output in other natural phenomena, such as fire [10], clouds [11],

and vegetation [12] have been studied. However, rain is distributed over a larger 3D space and therefore representing each particle to produce visually realistic animation of rain in

stereo is a challenge.

In a previous study [72], a method for real-time rendering of rain in stereo is presented. It does not consider complex illumination and therefore produces an output without

emphasis on photorealism. The OpenGL 2.0 application executes on the Intel Core-2 Duo

CPU installed with a generic graphics card, which is good for basic rendering but does not

support an advanced graphics pipeline or a programmable GPU. Using OpenGL 2.0, the

particle simulation is implemented on the CPU. For each new frame, updates to pixel data

are made and passed to the GPU for rendering. This creates a bottleneck between CPU and

GPU, which is a major reason for GPU hardware and software redesign. The experiments with the older hardware show that a real-time stereo implementation of rain for simple

scenes is possible.

Recent improvements in graphics hardware architecture and corresponding updates to

OpenGL have made it possible for the CPU to send data once to the GPU, which keeps the

data resident in local memory. Thus, all graphics related data processing and simulation

takes place on the GPU. This enables a particle system to be implemented on a GPU. The

desired video frame rate is attained by taking advantage of inherent parallel architecture of

a modern GPU, which can operate in two program modes, shader and compute modes.

5.4.1 Shader Mode

In the shader mode, the current OpenGL programmable graphics pipeline, version 4.5, is

divided into four shader stages. The two required shader stages are the vertex and fragment

shaders, while the other two stages, the tessellation and geometry shaders, are optional. Every

modern OpenGL program must have at least vertex and fragment shaders, which constitute

a shader program.

The data passed from the host to the device is first processed by a vertex shader. It

operates on the vertex data which is in the form of 3D geometric primitives, like points,

lines, or triangles. Basic object geometry is defined by a collection of vertices and associated

attributes, like position, color, texture coordinates, normal vectors, and other attributes

that may define that object. The vertex attributes are initially stored in the host system

memory in array form in a Vertex Buffer Array (VBA). This array is passed to the device

memory where it is referred to as a Vertex Buffer Object (VBO). Multiple vertex buffer

objects form a geometric scene. The vertex shader program reads each vertex from the vertex buffer object and processes it in parallel. At a minimum, a vertex

shader calculates the projected position of a transformed vertex in screen space. It can

also generate other outputs, such as a color or texture coordinates, for the rasterizer to

blend across the surface of the triangles connecting the vertex. The particle geometry, for

either raindrop or snowflake, and associated attributes are defined in the VBA. Instead of

using detailed geometric models, which may require hundreds of vertices to define, a coarse

icosahedron shape is used. An icosahedron shape only uses twelve vertices and takes far

less host and device memory to store. Using the optional shader stages, these initial vertices are later subdivided to form the desired particle shape of a raindrop or a snowflake, which can also be texture mapped for greater visual detail.

The output of the vertex shader is passed to the next stage in the graphics pipeline. In

the absence of optional shader stages, such as a tessellation or geometry shader, the data

enters a fragment shader after passing through a hardware unit called the rasterizer. The

rasterizer produces visible pixel-sized fragments. OpenGL stores depth values of objects in a

depth buffer, also called a z-buffer. These values are used in a depth test which checks for

objects to be within the near and far clipping planes. A fragment can be thought of as a candidate pixel that has

not yet been subjected to the depth test. The GPU passes these fragments to a fragment shader.

The fragment shader outputs a color for each fragment and applies depth values

that are then drawn into the framebuffer. Common fragment shader operations include

texture mapping and lighting. Since the fragment shader runs independently for every pixel

drawn, it can perform the most sophisticated special effects; however, it is also the most performance-sensitive part of the graphics pipeline. Since the fragment shader manipulates color values, it is used to produce stereo anaglyphs. It produces a composite image that incorporates both the left- and right-eye views. The red component of the composite image comes from the left-eye image and the green and blue components come from the right-eye image. Figure 5.2 shows the final output of the fragment shader, forming an anaglyph from an initial icosahedron shape.

Figure 5.2 Fragment shader (stereo view with red-cyan anaglyph glasses)

Between the two required vertex and fragment shader stages there are two optional stages, namely the tessellation and geometry shaders. The tessellation shader consists of three

sub-stages for creating more detail in geometry. The first tessellation stage takes the incoming vertices and passes them through a tessellation control (TC) stage. This stage determines the number of new vertices to generate. The output of TC passes to the tessellator, which is a graphics hardware unit responsible for generating new vertices by interpolating between the original and newly specified vertices. The position of these new vertices is determined by a tessellation evaluation (TE) stage, which defines the shape of the new geometry. Tessellation shader stages are used to produce a dynamic level-of-detail in a scene instead of using static texture maps. They are also used to make curved surfaces smoother or to produce sharp edges suitable for terrain rendering. Figure 5.3 shows an input icosahedron shape tessellated into a more refined spherical shape.

Figure 5.3 Input and output of tessellation shader
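A minimal sketch of the two programmable tessellation stages is given below, assuming the icosahedron is drawn as GL_PATCHES with three vertices per patch, that the vertex shader passes model-space positions through unchanged, and that the desired subdivision level is supplied in the hypothetical uniform u_tess_level. The evaluation stage pushes every generated vertex onto the unit sphere, rounding the coarse icosahedron as in Figure 5.3.

    // Tessellation control (TC): chooses how finely each triangle patch is subdivided.
    #version 450 core
    layout(vertices = 3) out;
    uniform float u_tess_level;                 // level of detail chosen per particle (assumed)
    void main()
    {
        gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
        if (gl_InvocationID == 0) {
            gl_TessLevelInner[0] = u_tess_level;
            gl_TessLevelOuter[0] = u_tess_level;
            gl_TessLevelOuter[1] = u_tess_level;
            gl_TessLevelOuter[2] = u_tess_level;
        }
    }

    // Tessellation evaluation (TE): positions the new vertices generated by the tessellator.
    #version 450 core
    layout(triangles, equal_spacing, cw) in;
    uniform mat4 u_mvp;                         // model-view-projection matrix (assumed)
    void main()
    {
        // interpolate inside the original triangle using the barycentric gl_TessCoord
        vec3 p = gl_TessCoord.x * gl_in[0].gl_Position.xyz
               + gl_TessCoord.y * gl_in[1].gl_Position.xyz
               + gl_TessCoord.z * gl_in[2].gl_Position.xyz;
        // project the vertex onto the unit sphere to round the icosahedron
        gl_Position = u_mvp * vec4(normalize(p), 1.0);
    }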

A geometry shader is the other optional shader stage that is responsible for modifying the geometry of objects before passing to a rasterizer. This stage is ideal for geometric instancing. For example, the geometry of an object is specified once but instead of making a draw call for each new object, instancing is used to make one draw call with multiple

instances of the same object drawn. This stage is useful in creating precipitation scenes

of a particle.
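As an illustration of this stage, the sketch below expands each incoming particle point into a camera-facing quad (emitted as a triangle strip) onto which an impostor texture can be mapped. The camera basis vectors, the billboard half-size, and the matrix are assumed uniforms; this is not necessarily the exact geometry shader used in this study.

    #version 450 core
    layout(points) in;
    layout(triangle_strip, max_vertices = 4) out;
    uniform mat4  u_viewproj;                   // view-projection matrix (assumed)
    uniform vec3  u_cam_right;                  // camera right vector in world space (assumed)
    uniform vec3  u_cam_up;                     // camera up vector in world space (assumed)
    uniform float u_size;                       // billboard half-size (assumed)
    out vec2 g_texcoord;                        // texture coordinate for the impostor texture

    void main()
    {
        vec3 center = gl_in[0].gl_Position.xyz; // particle position in world space
        const vec2 corners[4] = vec2[4](vec2(-1,-1), vec2(1,-1), vec2(-1,1), vec2(1,1));
        for (int i = 0; i < 4; ++i) {
            vec3 offset = (corners[i].x * u_cam_right + corners[i].y * u_cam_up) * u_size;
            gl_Position = u_viewproj * vec4(center + offset, 1.0);
            g_texcoord  = corners[i] * 0.5 + 0.5;
            EmitVertex();
        }
        EndPrimitive();
    }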

With a programmable GPU performing all simulation and rendering

computation on the GPU itself, real-time frame rates for stereoscopic rendering are achieved.

Figure 5.4 shows the modern GPU programmable shader stages for rendering and computa-

tion. Each stage runs on the GPU taking advantage of the parallel hardware architecture. The vertex shader operates on the vertex data which is in the form of 3D geometric primitives,

like points, lines, or triangles. The fragment shader operates on the fragments generated by

the rasterizer. The compute shader is not part of the programmable graphic pipeline but

runs in parallel to other shaders and is responsible for non-graphics related computations,

such as particle simulation.

5.4.2 Compute Mode

In the compute mode, the GPU can be used as a general purpose processor in which data

processing only takes place for simulation purposes and not for rendering. Programming

models defined by CUDA, OpenCL, and compute shader programs are used in GPU com-

pute mode. CUDA is vendor specific and can only be implemented on NVIDIA graphics

cards. OpenCL is an open standard designed to provide implementations across

all graphics cards.

The compute shader is a new programming stage introduced in OpenGL version 4.3 to

provide better interoperability between GPU rendering and simulation tasks. It enables a

GPU to be used for general purpose computing. The particle simulation for rain and snow

Figure 5.4 Modern graphics pipeline

is performed in GPU compute mode using a compute shader. It is also used to simulate

environment effects, such as gravity and wind to implement a particle system. The compute

shader uses two buffers; one stores the current velocity of each particle and a second stores

the current position. At each time step, a compute shader updates position and velocity values. Each invocation of a compute shader processes a single particle. The current velocity

and position are read from their respective buffers. A new velocity is calculated for the

particle and then this velocity is used to update the particle’s position. The new velocity

and position are then written back into the buffers. To make the buffers accessible to the

compute shader program, a shader storage buffer object (SSBO) is used. Each particle

is represented as a single element stored in an SSBO. This is memory reserved on the

graphics card. Each member has a position and a velocity that are updated by a compute

shader that reads the current values from one buffer and writes the result into another

buffer. That buffer is then bound as a vertex buffer and used as an instanced input to

the rendering vertex shader. The algorithm then iterates, starting again with the compute

shader, reusing the positions and velocities calculated in the previous pass. No data leaves

the GPU memory, and the CPU is not involved in any calculations. An alternative to pass

data to the GPU is via buffer textures that are used with image load and store operations.
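A minimal sketch of the compute shader pass described above is given below. The SSBO layout, the work-group size, and the force parameters (gravity plus a constant wind vector) are illustrative assumptions rather than the exact buffers used in this study.

    #version 450 core
    layout(local_size_x = 256) in;              // 256 particles per work group (assumed)

    struct Particle {
        vec4 position;                          // xyz position, w unused
        vec4 velocity;                          // xyz velocity, w unused
    };

    layout(std430, binding = 0) readonly  buffer ParticlesIn  { Particle p_in[];  };
    layout(std430, binding = 1) writeonly buffer ParticlesOut { Particle p_out[]; };

    uniform float u_dt;                         // time step for this frame (assumed)
    uniform vec3  u_wind;                       // constant wind velocity (assumed)
    const vec3 GRAVITY = vec3(0.0, -9.8, 0.0);

    void main()
    {
        uint i = gl_GlobalInvocationID.x;       // one invocation per particle
        if (i >= uint(p_in.length())) return;

        vec3 v = p_in[i].velocity.xyz;
        vec3 p = p_in[i].position.xyz;

        // simple Euler integration of the net force acting on the particle
        v += (GRAVITY + u_wind) * u_dt;
        p += v * u_dt;

        p_out[i].velocity = vec4(v, 0.0);
        p_out[i].position = vec4(p, 1.0);
    }

On the next frame the two buffer bindings are swapped, so the positions and velocities written here become the input of the following simulation pass.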

A particle system can also be implemented using a transform feedback buffer which is

implemented using vertex and geometry shader programs.

5.4.3 Transform Feedback

Transform feedback captures vertices as they are assembled into primitives (points, lines, or

triangles). Each time a vertex passes through primitive assembly, those attributes that have

been marked for capture are recorded into one or more buffer objects. Those buffer objects

can then be read back by the application. To implement a particle system, transform feed-

back requires two passes. On the first pass transform feedback is used to capture geometry

as it passes through the graphics pipeline. The captured geometry is then used in a second

pass along with another instance of transform feedback in order to implement a particle

system that uses the vertex shader to perform collision detection between particles and the

rendered geometry. An implementation of a particle system using transform feedback is

illustrated in Figure 5.5.

In the first pass, a vertex shader is used to transform object space geometry into both world space and eye space for rendering. The world space results are captured into a

buffer using transform feedback, while the eye space geometry is passed through to the

Figure 5.5 Implementation of a particle system using transform feedback

rasterizer. The buffer containing the captured world space geometry is attached to a texture

buffer object (TBO) so that it can be randomly accessed in the vertex shader that is used

to implement collision detection in the second simulation pass. Using this mechanism,

any object that would normally be rendered can be captured, so long as the vertex (or

geometry) shader produces world space vertices in addition to eye space vertices. This

allows the particle system to interact with multiple objects, potentially with each render

using a different set of shaders.
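The first-pass vertex shader can be sketched as follows. The world-space output (named world_position here) is the varying that would be registered for capture on the host with glTransformFeedbackVaryings, while the clip-space position continues down the pipeline for rendering; the uniform names are assumptions.

    #version 450 core
    layout(location = 0) in vec3 in_position;   // object-space vertex position
    uniform mat4 u_model;                       // object-to-world matrix (assumed)
    uniform mat4 u_viewproj;                    // view-projection matrix (assumed)
    out vec3 world_position;                    // recorded into the transform feedback buffer

    void main()
    {
        vec4 world = u_model * vec4(in_position, 1.0);
        world_position = world.xyz;             // captured world-space geometry
        gl_Position = u_viewproj * world;       // passed on to the rasterizer
    }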

The second pass is where the particle system simulation occurs. Particle position and velocity vectors are stored in a pair of buffers. Two buffers are used so that data can be

double-buffered, as it is not possible to update vertex data in place. Each instance of the vertex shader performs collision detection between the particle and all of the geometry

captured during the first pass. It calculates new position and velocity vectors, which are

captured using transform feedback, and written into a buffer object ready for the next step

in the simulation. To produce results in stereo two sets of transform feedback buffers are

maintained at the same time.

5.5 Photorealistic Implementation

There are several applications of synthetic precipitation phenomena found in movies, video games, and virtual reality. Special effects are often used in movies but they require

expensive and time consuming off-line processing to achieve photorealism. On the other

hand, video games require a balance between real-time user input and photorealistic

output. Evolution of software algorithms and recent improvements in computer hardware

have made it possible to produce real-time frame rates with emphasis on photorealism.

Photorealism is defined as a detailed representation like that obtained in a photograph in a

non-photographic medium such as a painting or, in our case, computer graphics.

References to photorealism appear in several studies. However, characterization of the

term varies. This is because human response to visual stimuli is as variable as individuals

themselves. Photorealism as defined by Rademacher et al. [74] is an image that is perceptu- ally indistinguishable from a photograph of a real scene. Since a photograph is planar, this

characterization of photorealism is well suited for monoscopic view. It is human conscious-

ness that identifies a scene as real or otherwise. Our brain creates a perception of reality

after processing visual information received by our two eyes, which are taking a snapshot

of the world around us from two different perspectives. A stereoscopic characterization

of photorealism is closer to visual realism where depth is perceived by presenting images

from two different perspectives. An alternate definition of photorealism is given by Ferw-

erda [75]. The author considers photorealism as images that are photo-metrically realistic, where photometry is the measure of the eye's response to light energy. Thus, photorealistic

rendering is about simulating light or how photons move around in a scene. The better

the approximation of this process, the closer we can get to photorealism. Techniques like

ray- or path-tracing, which simulate photons bouncing around in a scene, are inherently

better at producing photorealistic results. However, these techniques do not work very well for a dynamic scene with many thousands of moving particles. The implementation

is further compounded by stereoscopic output at video frame rates, where a minimum of

120 fps screen refresh is required to achieve a jitter free stereo animation of rain or snow.

Thus, an approximation to photorealistic results is desired. Fortunately, it is difficult for

the human eye to notice subtle differences in a dynamic scene. Therefore, ignoring certain

photorealistic effects, such as soft shadows, which may make a visual difference in an

otherwise static scene, is a viable option.

5.5.1 Illumination using a GPU

Fragment shader programs are used to create realistic illumination effects using a GPU.

Image-based lighting and environment mapping techniques are used to reflect and refract

light from scene objects including rain and snow particles. Cube maps are one commonly

used variant of environment mapping which maps the reflection and refraction vectors

from the surrounding texture on to the particle. The benefit of this approach is that it

requires less computation and is therefore good for real-time applications. However, the

problem with this approach is that it only deals with the front surface, not the back or any

other intersecting polygons in the main object. This means anything between the front face

of the particle and the background is not considered. Thus the reflection and refraction

of any moving objects will not appear in the particle; only the static surrounding will be

reflected and refracted.
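A fragment shader along these lines is sketched below, assuming the surrounding environment has been loaded into a cube map bound to the hypothetical sampler u_env and that the surface normal and view direction are interpolated from the vertex shader. The index-of-refraction ratio and the mix weight are illustrative values, not the exact parameters used here.

    #version 450 core
    in vec3 v_normal;                           // interpolated surface normal (world space)
    in vec3 v_view_dir;                         // direction from the eye to the fragment (world space)
    out vec4 out_color;
    uniform samplerCube u_env;                  // environment cube map (assumed binding)

    void main()
    {
        vec3 n = normalize(v_normal);
        vec3 i = normalize(v_view_dir);

        // sample the static surroundings along the reflected and refracted directions
        vec3 reflected = texture(u_env, reflect(i, n)).rgb;
        vec3 refracted = texture(u_env, refract(i, n, 1.0 / 1.33)).rgb;  // ratio for air to water

        // blend the two contributions; the weight is an illustrative constant
        out_color = vec4(mix(refracted, reflected, 0.3), 1.0);
    }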

For simple monoscopic scenes, the cube map approach can give some interesting

results. However, to get more realistic stereo results it requires multiple passes that create

a new cube map during each frame cycle. The other background objects in the scene are

rendered to an off-screen buffer, which is used as a cube map for scene illumination. The

stereo implementation of cube maps is performed for both the left- and the right-eye views. This potentially doubles computation. OpenGL is a rasterization-based system but

there are other methods for generating images such as ray-tracing. To take realism and

global illumination into account, a move towards ray-tracing is necessary. The ray-tracing

simulates natural reflection, refraction, and shadowing of light by 3D surfaces. In general

ray-tracing is computationally expensive due to many calculations required to determine

object-ray intersections. Parallel implementations using a GPU are ideal for ray-tracing

based image generation. In ray-tracing, since pixel color is determined by tracing a ray from

the viewpoint towards the light source, this approach inherently handles hidden surfaces in a

stereoscopic image, and the second view can be computed with as little as five percent of the effort required

to fully ray-trace it [76]. Distributed parallel ray-tracing implementations are best suited to a compute mode programming interface where GPUs are utilized as general

purpose processors for speed up. The real-time output from ray-tracing for an animated

scene with thousands of moving particles, such as rain or snow, is

an open area of research.

5.6 Particle System

In this study rain and snow particles are animated and rendered using a particle system.

Pre-computed images of rain streaks and snowflakes are used as billboards to render an

appropriate precipitation scene. This increases rendering speed and also enables the same

particle system to simulate other phenomena such as falling autumn tree leaves by changing

the billboard texture and modifying particle attributes, such as particle mass and velocity.

In nature, precipitations such as rain and snow consist of thousands of tiny particles.

The simulation and rendering of such natural phenomena require complex mathematical

models and computer algorithms. A particle system, which is first introduced by Reeves

[77], is a technique well suited to simulate and render particles that do not have smooth or well-defined surfaces but instead are irregular in shape and complex in behavior. A

particle system is a large set of simple primitive objects, such as a point or a triangle, which are processed as a group. Each particle has its own attributes including position, velocity, and lifespan that can be changed dynamically. The particles move and change

their attributes over time before their extinction, which occurs when the particle lifespan is

exhausted or where an attribute falls below a specified threshold. If the attributes of the

particles are coordinated, the collection of particles can represent an object, such as rain or

snow. To achieve the desired effects of rain or snowfall, many independent particles are

simulated and rendered. In a particle system four basic steps are performed: generation,

rendering, update, and removal of particles from the system. These steps are described in

the following list and illustrated in Figure 5.6.

1. Particle Emitter: the initial position of particles is specified on a plane in 3D space

Figure 5.6 Particle system block diagram

called an emitter, also known as a generator. It acts as a source of new particles. The initial number of particles generated defines the density of the desired precipitation.

Fewer initial particles will create an effect of light rain or snow as opposed to a larger number of particles, which will generate heavy precipitation. At a given time, a rendered frame f contains a total of p particles. Let µ and σ² be user-defined values representing the desired mean and variance of the distribution of the number of particles in the system. Let U(a, b) represent a random number of equal probability between a and b. Then the number of new particles generated every frame, n_f, is described by equation 5.1

n_f = µ + σ² ∗ U(a, b).    (5.1)

Once the number of generated particles is known, each particle is assigned attributes such as position in 3D space, initial velocity, air resistance, wind forces, size, color,

transparency, and particle lifespan.

2. Particle Rendering: the rendering of particles is complex because the particles can

overlap each other, be translucent, cast shadows, and interact with other objects in

the scene. Therefore, an image of a rain streak or snowflake is used as texture, called

an impostor texture or a sprite, which is mapped to a polygon called a quadrilateral

also known as a quad, that is made up of two triangles. This textured polygon forms

a billboard, which always faces the camera. It is rendered for each particle. The

use of billboards cuts back on the number of polygons required to model a scene by

replacing geometry with an impostor texture. The rain streak and snowflake billboards

are pre-computed and stored in a GPU texture memory unit for efficient processing.

3. Particle Update: after pixel data in the frame buffer is rendered, every particle is up-

dated for the next frame cycle. The update involves a change in particle position due

to the result of net forces acting on the particle. The particle velocity is updated, lifes-

pan is decreased, and color is changed depending on the environment illumination.

Additionally, an acceleration factor is a user supplied parameter to the particle system

that alters the velocity of each particle between frame cycles. This allows for simulating

effects of gravity and other external forces such as wind and air resistance, which

makes particle motion more realistic. When updating the state of a particle, for a

small time interval ∆t , Euler integration is applied. Given initial or previous particle

velocity v and acceleration a, the new velocity is calculated by equation 5.2

v = v + a ∗ ∆t.    (5.2)

This velocity is further integrated with the initial or previous position p to get the new

particle position to be rendered for the current frame as expressed by equation 5.3.

p = p + v ∗ ∆t.    (5.3)

4. Particle Lifespan: when an emitter generates the particles they are given a lifespan,

which is incrementally decreased every frame cycle. The particle is removed from

the particle system when the lifespan expires by reaching a certain threshold, which

is usually zero. An alternative to removing a particle from the system is to recycle it.

This is done by setting a flag when the lifespan expires so that the particle can be

reinitialized for the subsequent frame. The removal of a particle from the system can

have a performance penalty; therefore reusing it for the next frame is preferred.
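The update and recycling steps can be sketched in the same compute-shader style used earlier; the lifespan threshold, the emitter height, and the hash-based random number are illustrative assumptions, and recycling keeps the total particle count constant between frames.

    #version 450 core
    layout(local_size_x = 256) in;

    struct Particle {
        vec4 position;                          // xyz position, w = remaining lifespan in seconds
        vec4 velocity;                          // xyz velocity, w unused
    };

    layout(std430, binding = 0) buffer Particles { Particle particles[]; };

    uniform float u_dt;                         // frame time step (assumed)
    uniform float u_emitter_height;             // y coordinate of the emitter plane (assumed)
    uniform float u_max_life;                   // lifespan assigned when a particle is reborn (assumed)

    // simple hash-based pseudo-random number in [0, 1); a stand-in for U(a, b)
    float rand(vec2 seed)
    {
        return fract(sin(dot(seed, vec2(12.9898, 78.233))) * 43758.5453);
    }

    void main()
    {
        uint i = gl_GlobalInvocationID.x;
        if (i >= uint(particles.length())) return;

        // decrement the lifespan every frame cycle
        particles[i].position.w -= u_dt;

        // recycle instead of removing: reinitialize the particle at the emitter plane
        if (particles[i].position.w <= 0.0) {
            float x = rand(vec2(float(i), 0.17)) * 2.0 - 1.0;   // new horizontal position
            float z = rand(vec2(float(i), 0.71)) * 2.0 - 1.0;
            particles[i].position = vec4(x, u_emitter_height, z, u_max_life);
            particles[i].velocity = vec4(0.0);
        }
    }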

5.7 Precipitation using the Particle System

For precipitation effects, such as rain and snow, the running simulation must be maintained for the entire world, not just the portion that is within the field of view. This is because particle dynamics may cause particles to move from a portion of the world which is not currently visible to the visible portion or vice versa. In the case of snow, it is important to appropriately manage various scenarios dealing with the end-of-life of a particle, e.g., snowflake accumulation or depletion. In snow accumulation, a snow layer is formed over the objects upon which the snowflakes fall. In snow depletion the opposite happens: snow particles shrink or disappear due to heat from the sun, simulating a melting effect. The snow accumulation can be modeled by terminating the particle

dynamics when the particle strikes a surface, but continuing to draw it in its final position, which is determined by a collision detection algorithm. A difficulty with this solution is

that the number of particles which need to be drawn each frame will grow without bound.

Another solution is to draw the surfaces upon which the particles are falling as textured

surfaces. When a particle strikes the surface it is removed from the particle system and the

snow texture is added to the surface texture. However, this leads to a problem of efficiently

managing texture maps for the collision surfaces. One way to manage these texture maps is

to use the frame buffer object (FBO). Instead of writing the pixel data to the frame buffer for

rendering, the FBO is used to save the pixel data. When the simulation begins, the texture

map for a surface is without snow cover. At the end of each frame, expiring particles are

drawn on the surface using an orthographic projection, which is a viewpoint that is perpen-

dicular to the surface. The resulting texture is saved in the FBO and used in rendering the

surface during the next frame cycle. The process is repeated every frame cycle to simulate

snow accumulation on the surfaces. This method of using FBO for collided snow particles

provides an efficient mechanism for maintaining a constant number of particles in the

system. It works well for the initial snow accumulation on an uncovered surface. However, it

does not model continuous snow accumulation and growth in snow cover over time. Rain

particles are denser and heavier than snow particles. The

particle attributes are different from snow and therefore the effect of gravity and wind is

also different for the rain particles. Heavy rainfall is better simulated using a rain streak

texture while light rain is represented by motion blurred spherical mesh objects. The initial

accumulation of rain is a more complex problem than for snow. In the case of snow, an opaque

accumulation is built up over time, but rain is translucent; thus the shading of the col-

lision surface is more subtle. Like in snow accumulation, the FBO is used to texture map the collision surface. However, a multi-pass shading method is used, which partitions the scene into wet and dry pixels. The scene is drawn using two different shading models, one that renders a wet appearance and the other a dry appearance. The texture map is used to choose which output to store in the FBO on a pixel-by-pixel basis. A more efficient method increases the simulation performance by reducing the number of particles. The particles are only rendered if they are in front of the viewer. Motion blur on the particles, fog, and illumination effects are used to simulate an overcast sky and enhance realism.
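The pixel-by-pixel choice between the two shading models can be sketched as the following fragment shader, where the accumulated wetness mask kept in the FBO texture is assumed to be bound to the hypothetical sampler u_wet_mask; the simple darkening factor stands in for the full wet shading model.

    #version 450 core
    in vec2 v_texcoord;                         // surface texture coordinate
    out vec4 out_color;
    uniform sampler2D u_surface_tex;            // base surface texture (assumed)
    uniform sampler2D u_wet_mask;               // accumulation texture kept in the FBO (assumed)

    void main()
    {
        vec3 base = texture(u_surface_tex, v_texcoord).rgb;
        float wet = texture(u_wet_mask, v_texcoord).r;   // 0 = dry pixel, 1 = wet pixel

        // the two shading models: dry uses the base color, wet darkens it slightly
        vec3 dry_color = base;
        vec3 wet_color = base * 0.6;

        out_color = vec4(mix(dry_color, wet_color, wet), 1.0);
    }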

5.8 Compute Mode Particle Simulation

In the particle system implemented on a GPU, each particle cycles through the two key program stages running on a GPU as shown in Figure 5.7. One is responsible for particle simulation and the other is required for particle rendering in stereo.

Figure 5.7 Simulation and rendering loops

The particle simulation for rain and snow is performed in GPU compute mode using

a compute shader for better OpenGL interoperability. Each particle is represented as a

single element stored in a shader storage buffer object (SSBO). This is memory reserved

on the graphics card. Each member has a position and a velocity that are updated by a

compute shader that reads the current values from one buffer and writes the result into

another buffer. That buffer is then bound as a vertex buffer and used as an instanced input

to the rendering vertex shader. The algorithm then iterates, starting again with the compute

shader, reusing the positions and velocities calculated in the previous pass. No data leaves

the GPU memory, and the CPU is not involved in any calculations.

5.9 Animation

Animation is accomplished by using a particle system. It is a well suited animation solution

for scenes that contain many similar objects, such as rain or snow particles. The particle

system also enables us to implement laws of physics that are used to model complex

dynamics in an animated precipitation. Physical forces on particles, such as gravity, air

resistance, and wind are simulated. In the animation cycle, the position and parallax of a new

particle are determined for the left- and right-eye views, laws of physics are applied, and particle

attributes are updated before rendering takes place. Expired particles are reborn with initial

attributes and the cycle continues until the application stops.

It is assumed that all particles are moving towards the ground (bottom) plane at terminal velocity. The terminal velocity is the velocity at which the acceleration of the particle is

zero. This happens when force due to gravity cancels the effect of air resistance on the

particle; thus the particle appears to fall at a constant velocity. The effects of wind and gusts

are the only other external forces considered that can change this constant velocity. They are defined by a wind or gust velocity that includes direction and speed. This is implemented by forming a wind bounding box inside the particle boundary. For wind, the size of the wind bounding box is equal to the particle boundary. For gusts, the position and size of the wind bounding box are specified in a configuration file. The wind and gust effects are initiated by a key press. When a particle enters the bounding box, the calculation of its new position is also affected by the wind or gust direction and speed. The wind bounding box acts like a fan sitting in space: if a raindrop or a snowflake falls in front of the fan then it is blown according to the net result of the external forces due to the presence of wind.
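The wind test can be sketched as a small GLSL helper used inside the particle update shader; the box extents and the wind velocity are assumed to be supplied as uniforms with the illustrative names below.

    uniform vec3 u_wind_box_min;                // lower corner of the wind bounding box (assumed)
    uniform vec3 u_wind_box_max;                // upper corner of the wind bounding box (assumed)
    uniform vec3 u_wind_velocity;               // wind direction and speed (assumed)

    // returns the extra velocity contributed by wind, or zero if the particle
    // lies outside the wind bounding box
    vec3 wind_contribution(vec3 particle_pos)
    {
        bool inside = all(greaterThanEqual(particle_pos, u_wind_box_min)) &&
                      all(lessThanEqual(particle_pos, u_wind_box_max));
        return inside ? u_wind_velocity : vec3(0.0);
    }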

CHAPTER 6

EXPERIMENTS AND RESULTS

The study is grouped into three phases, each building upon the knowledge acquired from

the previous phase. This chapter describes setup of various experiments and their results.

6.1 Phase 1 – Real-time Stereo

Initial experiments are performed on generic graphics hardware to validate real-time

stereoscopic rendering of rain. The complex issues of scene illumination are ignored during

this initial phase of the study. Instead the emphasis is on the parameters that produce a

realistic rain distribution. The rain model used in monoscopic rendering of rain is extended

for stereo viewing. Stereo output is produced in two different ways. In the first method,

a symmetric view frustum is used to produce stereo by adding random horizontal parallax to

the rain streaks. The second method uses a stereo camera model with asymmetric view

frustums to produce the left- and the right-eye views by rendering the scene from

two different perspectives.

6.1.1 Method 1 – Horizontal Parallax

In this investigation pre-computed images of rain streaks are used as billboards to increase

rendering speed. A rain streak image represents retinal persistence in human vision when viewing a falling raindrop. The monoscopic statistical rain models to simulate the behavior

and distribution of falling rain are extended to stereo. The complex issues of scene illumina-

tion and hidden surface elimination problems are ignored. Rain streaks that have positive

parallax, the ones that appear behind the stereo window, are considered. The experiment

concentrates on the parameters that produce a stereo-realistic rain distribution. The al-

gorithm first determines the parameters of the stereo view frustum bounded by near and

far clipping planes. The top plane of the view frustum is the rain emitting plane where all

initial positions of rain streaks are formed. The symmetric view frustums of the left- and

the right-eye cameras form a stereo overlap area that can be represented as a view frustum

formed by a single camera as shown in Figure 6.1.

For each rain streak two uniformly distributed random numbers are generated that

represent position and parallax within the stereo view frustum as illustrated in Figure 6.2.

The horizontal position of a rain streak is determined by generating a random number, x , with a range between 1 and image width, w . This produces a rain streak position for the

left-eye view. To produce the position of the rain streak for the right-eye, the horizontal

parallax value, z , is generated. The range for z is a random number between 0 and maximum

parallax, m. The maximum parallax value is half of the inter-axial distance. The vertical

position of the streak is changed according to the speed of descent, while other environmental

Figure 6.1 Symmetric view frustum with stereo overlap (top view)

factors, such as wind gusts, are introduced by adding a bias to the horizontal position of the rain

streak.

For non-zero parallax, two homologous points are created and the rain streak image is

linearly scaled as an inverse function of depth. The entire process is repeated to animate

and render rain streaks at various depths. An increase in the number of rain streaks creates

a denser rainfall. The user can modify input parameters, such as inter-axial distance,

raindrop speed of descent, and number of rain streaks, interactively to observe changes in virtual rainfall in stereo.

An image of a rain streak is used as a texture map on a polygon, a quadrilateral or quad.

This forms a rain billboard, which always faces the camera. Since a stationary camera is

assumed, a cardboard effect associated with rendering of a billboard is not observed. Layers

of billboards, or slices, may be needed to avoid such artifacts. For each rain particle at the

Figure 6.2 Generating stereo rain streak (top view)

rain emitting plane, a rain billboard is rendered. Billboarding is used to cut back on the

number of polygons required to model a scene by replacing geometry with a texture map.

The rain billboards are pre-computed and stored in memory for efficient processing. The

rain billboards farther away from the camera are linearly scaled as an inverse function of

depth.

Rain streaks are scaled by a function of depth as illustrated in Figure 6.3, where n and

f are distances to the near and far clipping planes, respectively. Let s be the distance of

the rain streak from the camera; then the quad length is scaled by a factor (f − s)/(f − n), making the rain streak smaller when it forms closer to the far clipping plane. A smaller,

lower resolution, texture is needed for the scaled rain streaks. This requires the use of a texture

filtering technique called mipmapping, where a mipmap is a texture map created from the original texture

at a reduced resolution and size. Therefore, in a mipmap texture the level of detail decreases

as depth increases. When the rain streak is closer to the camera, the original texture is used

to render the rain streak in full detail. In OpenGL, a function is provided to render mipmaps

Figure 6.3 Rain billboard and mipmaps (side view)

that is responsible for selecting a suitable resolution of the texture based on the distance of the rain streak from the camera. The use of mipmaps increases rendering speed and reduces aliasing. A total of eight mipmaps are used with the original texture image size of 128 × 256, producing subsequent mipmaps of size 64 × 128, 32 × 64, 16 × 32, 8 × 16, 4 × 8, 2 × 4, 1 × 2, and 1 × 1. During the next rendering iteration, the vertical position (y) is determined based on the speed of descent and updated for all rain streaks to move them towards the bottom plane of the rain boundary.
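The per-eye placement and depth scaling described above can be sketched as the following GLSL helpers; the sign convention for the right-eye shift (x + z for positive parallax) and the uniform names are assumptions made for illustration.

    uniform float u_near;                       // distance n to the near clipping plane (assumed)
    uniform float u_far;                        // distance f to the far clipping plane (assumed)

    // scale factor applied to the rain streak quad at distance s from the camera
    float depth_scale(float s)
    {
        return (u_far - s) / (u_far - u_near);
    }

    // given the left-eye position x and the horizontal parallax z, the right-eye
    // position of the streak is shifted by the parallax value
    vec2 streak_positions(float x, float z)
    {
        float x_left  = x;
        float x_right = x + z;                  // assumed sign convention for positive parallax
        return vec2(x_left, x_right);
    }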

The length of the rain streaks and how they are distributed in the field of view are modeled using the Marshall-Palmer distribution [78]. The rain distribution has an inverse exponential relationship with the length of the rain streak. The rain streaks in the distance look smaller than those that are closer to the camera. Therefore, rain streaks farther from

the camera are exponentially greater in number than the rain streaks appearing close to the

camera. Let l be the rain streak length at the near clipping plane. As the rain streak forms

farther from the camera, this length is scaled to a value denoted by l_s as

l_s = l (f − s)/(f − n),    (6.1)

where s, n, and f are distances of the rain streak texture, near, and far clipping planes from

the camera, respectively, as shown in Figure 6.3. The rain distribution, R, is the inverse

exponential function of l_s and is defined as

R(l_s) = R_0 e^(−Λ |l_s|),    (6.2)

where l_s is normalized between 0 and 1, R_0 is the rain density given in terms of the number of

rain streaks, and Λ is the slope parameter. As the value of l_s approaches 0, the value of the

exponential approaches 1. Therefore, the rain streaks farther from the camera will appear

denser, with the density expressed by the value of R_0. A value of R_0 = 1000 results in light rain and a value approaching 10,000 results in heavy rain. The value of Λ is determined experimentally

and is independent of R_0. It affects the rain streak size and the rain distribution in the field of view. Incorrect selection of the slope parameter (Λ) results in smaller rain streaks closer to the

camera, producing conflicts in visual cues and an unnatural appearance. It is found that a value of Λ = 2 gives the best results. It is assumed that the 3D model for the background for the left- and right-eye views already

exists such that we can overlay rendered rain for each eye to get the final scene in stereo. Only

rain streaks are rendered without including any other scene elements such as backgrounds,

light and other environmental interactions. The number of rain streaks to render, which defines the rain density, is given as an input parameter, which is 1000 for light,

5000 for medium, and 10,000 for heavy rain. The geometry associated with rain streaks is drawn once per rain streak for the left-eye view. A new position is calculated for every rain streak and redrawn for the right-eye view, as illustrated in Figure 6.4.

Figure 6.4 Single camera setup to add parallax to rain streaks

The graphics card used in this experiment supports OpenGL 2.0, which is good for basic rendering but lacks support for an advanced, modern graphics pipeline featuring GPU programming. Each frame displays the current position of the rain streaks for the left- and the right-eye views. Figure 6.5 shows the anaglyph output.

Figure 6.5 Method 1 output (stereo view with red-cyan anaglyph glasses)

6.1.2 Method 2 – Asymmetric View Frustum

In this method, the left- and right-eye views are generated by two separate camera setups,

rendering the scene from two different perspectives. In setting up the left- and right-eye

cameras it is not sufficient to translate the camera position by the inter-axial distance

because this creates a large portion of the output image that is only visible to one eye, as

shown in Figure 6.6. This can cause viewer discomfort. To overcome this issue in stereo

photography, a physical stereo camera, such as Panasonic Lumix 3D, uses a wide angle lens

for both left- and right-eye views such that overlap in the two output images is maximized.

The non-overlapping areas at the boundaries of the two view frustums, where stereo does

not exist, are cropped to only record stereo output. However, in software it is easier to

Figure 6.6 Symmetric view frustum (top view)

define an asymmetric view frustum as shown in Figure 6.7. This follows from how human

eyes form a view frustum. Imagine standing at some distance from the center of a window

looking outside. The left-eye, being at an offset from the center, forms an asymmetric view

frustum with the edges of this window and so does the right-eye. This setup avoids any

post-processing on the output rendered image, such as cropping, because both the left- and

right-eye views are projected on the same screen. Additionally, an asymmetric view frustum works well for VR head-mounted displays because the view frustum remains the same even when the viewer tilts or moves his head.
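One common way to set up such frustums (a sketch following the standard off-axis stereo formulation, not necessarily the exact parameters used here) is as follows. Let e be the inter-axial distance, c the distance to the zero-parallax (screen) plane, n the near clipping distance, and w the half-width of the symmetric near-plane window, and define the shift

    d = (e / 2) · (n / c).

The left-eye camera then uses the horizontal frustum bounds [−w + d, w + d] and the right-eye camera uses [−w − d, w − d], while both views share the same top, bottom, near, and far planes.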

In method 2, the fixed function graphics pipeline of OpenGL 2.0 is used to animate

rain. The stereo camera model enables objects in the background to be rendered in stereo.

The background scene is created using a 3D modeling tool called Blender. The model and

related texture maps are imported into OpenGL during program initialization. The

trunk of the center tree is placed at zero parallax, on the stereo window. The leaves in front

Figure 6.7 Asymmetric view frustum (top view)

of that tree have a slight negative parallax and appear to come out of the screen. All other

objects, including animated rain streaks, have positive parallax. The tree on the left is set

back slightly while the tree on the right is the farthest away. These observations are visible

in the anaglyph image. The left-eye view is encoded in the red color channel while the green

and blue channels encode the right-eye view. The output is viewed using red-cyan anaglyph

glasses as shown in Figure 6.8.

6.1.3 Frame rate Comparison

The frame rates achieved by the two stereo rendering methods for light, medium and heavy

rain are compared. The results are an average over 10 experiments. The results are also

compared with monoscopic rendering. In monoscopic viewing, the geometry associated with rain streaks is drawn once per rain streak. For heavy rain 10,000 rain streaks are

Figure 6.8 Method 2 output (stereo view with red-cyan anaglyph glasses)

rendered per frame. Moreover, for each rain streak only the position is calculated. Like monoscopic viewing, method 1, which achieves stereo by creating parallax in the rain streaks, also uses a single camera setup. The position and parallax of each rain streak are calculated. This information is used to create the left- and right-eye output images.

However, the geometry associated with rain streaks is drawn twice per rain streak. In the other stereo rendering approach, method 2, the left- and right-eye views are separately generated by two camera setups, rendering the scene from two different perspectives.

The results in Table 6.1 show that the frame rate decreases with the increase in the number of rain streaks. For light rain, the frame rate of the two stereo implementations is slightly less than the frame rate of the monoscopic rendering. As the number of particles to

Table 6.1 Frame rate comparison

Precipitation Intensity   Number of Particles   Monoscopic (fps)   Stereoscopic Method 1 (fps)   Stereoscopic Method 2 (fps)
Light                     1000                  262                258                           259
Medium                    5000                  180                93                            93
Heavy                     10000                 91                 46                            45

render increases, the difference between the two stereo implementations and the mono-

scopic rendering also increases. At maximum rain density, the frame rate of both stereo

approaches is close to half of the monoscopic rendering rate.

The results from the two stereo implementations are very similar. It is shown that the

real-time stereo implementation of rain for simple scenes is possible on relatively simple

hardware. Method 1 adds parallax to the rain streaks and renders twice while method 2

renders twice based on two cameras position. In the proposed method 1, the right-eye rain

streak is derived from the left-eye rain streak by computing parallax. It is noted that there is

no measurable change in the frame rate when wind is added to the rain streaks. The frame

rate can be improved with implementation using contemporary graphics processors with a

newer version of the OpenGL programmable graphics pipeline.

6.1.4 Phase 1 – Conclusions

The initial phase of this study is inspired by existing techniques of stereoscopic rendering

of other natural phenomena such as fire and vegetation. The study extends monoscopic

techniques for rendering of rain and presents a solution for modeling real-time stereo

rain animation. A rain streak is rendered for left- and right-eye views based on randomly

generated parallax. A particle system is implemented for animating the rain scene. Wind

and gust effects are also modeled.

Several simplifying assumptions were made in this study: rain streaks only have nonneg-

ative parallax, the stereo cameras are stationary, rain streaks are moving at terminal velocity,

the 3D model for the background already exists, and complex issues of scene illumination

are ignored. Future research will address these assumptions and enhance this work by

including complex lighting interactions, object and camera motion, acceleration of rain

streaks, and inclusion of sound effects.

6.2 Phase 2 – Stereo from 2D-3D Converters

The alternative to modeling and rendering stereo rain is to apply 2D-3D conversion software

to a monoscopic rain scene. In this phase, the effectiveness of the 2D-3D converters in

producing stereoscopic natural scenes is studied. Five 2D-3D software applications that

convert 2D video into stereoscopic 3D are compared [79]. These five applications are Arcsoft, Axara, DIANA-3D, Leawo, and Movavi. The selection of these five applications is based on

the conversion techniques, ease of use, and software availability.

The Arcsoft Media Converter uses proprietary 3D simulation technology to turn 2D

pictures and movies into 3D format and is included to study how the algorithm compares to

documented methods used by other 2D-3D converters [80]. The Axara Media 2D-3D video converter software applies classifiers and automatic object detection in scenes to perform

transformations from 2D to 3D video files [81]. The DIANA-3D by Sea Phone implements

the method described above by Hattori [82]. The Leawo Video Converter [83] and Movavi

Video Converter 3D [84] both use parallax shift and perspective to provide 2D to 3D video

conversion support and are included to study how the two implementations compare.

In these experiments, the quality of the stereo output is measured in two different ways.

In the first method, two features in an input to the 2D-3D video converters are selected

such that one feature is closer to the viewer. Therefore, the correct output of the 2D-3D video converters has greater positive parallax between the left- and right-eye views in the

feature that is farther from the camera. The difference in the horizontal parallax between

actual values and the values obtained by the output of the 2D-3D video converters is

compared. The second method to evaluate the quality of stereo output of the five 2D-

3D video converters is based on subjective scoring by individuals who rate their overall visual experience. The quality of visual experience is measured by asking subjects to rate

converted output using three criteria: visual comfort, conflict between left- and right-views,

and observable depth in a given scene. The output produced by the 2D-3D video converters

is also compared subjectively with the results of the rendered output produced by method

2 for real-time rain rendering that uses a stereo camera model as described in the initial

phase of this study.

6.2.1 Overview

The problem of converting 2D to 3D addresses the generation of left- and right-eye views with correct horizontal parallax from a given 2D view or video. In the movie industry,

converting old movies to 3D is a meticulous, semi-automatic, and time consuming process.

Many television sets have a 2D-3D conversion mode, but the processing resources are

limited, resulting in a poor quality visual experience. For computers, including tablets and

hand-held devices, many fully automatic conversion algorithms have become available.

The alternative to simulating and rendering rain is to use video of rain scenery as an input to

2D-3D conversion software. Given accurate depth map estimation, such software applica-

tions may produce a stereo rain scene. However, in creating a depth map the 2D-3D video

converters make many assumptions about the 3D scene and visual cues that are often not

correct, resulting in conflicting 3D views. Also, the data available in the 2D input image of

natural phenomena may not have enough information to give a look-around feel to the

converted output image. It also does not solve hidden surface problems where changing

the viewpoint changes the occlusion relationship between objects in the scene.

6.2.2 Depth Estimation Techniques

The proliferation of depth estimation techniques has given rise to many practical software

applications for 2D-3D conversion. Existing 2D-3D conversion algorithms can be grouped

in two categories: algorithms based on a single image and methods that require a sequence

of multiple images such as videos. Depth from a single still image can be extracted by

employing monocular depth cues, such as linear perspective, shading, occlusion, relative

size, and atmospheric scattering. Other techniques like blur analysis and image based

rendering methods using bilateral symmetry also exist. McAllister uses linear morphing

between matching features to produce stereo output from a single image with bilateral

symmetry, such as the human face [85]. For methods that require a sequence of multiple images, several heuristics exist to

create depth information. These methods generate a depth map by segmenting the 2D

image sequences, estimating depth by using one or combination of many visual cues,

and augmenting the 2D images with depth to create left- and right-eye views. A detailed

description of the algorithms useful in computing dense or sparse depth maps is given by Scharstein and Szeliski [86]. The depth map is computed from multiple images of a scene either taken from similar vantage points or from a sequence of images acquired from a video.

In another method, Hattori describes real-time 2D-3D converter software that produces a 3D output viewable from different angles [87]. To accomplish this, the author applies the horopter circle projection to the right-eye image. The horopter is the locus of points in space that fall on corresponding points in the two retinas when the left and right eyes fixate on a given object in the scene. All points that lie on the horopter have no binocular disparity. In the absence of binocular disparity, other depth cues such as linear perspective, shading, shadows, atmospheric scattering, occlusion, relative size, texture gradient, and color become more relevant. The author relates the parallax shift to pixel illumination, assuming that brighter objects are closer to the viewpoint while darker objects are in the background. This parallax shift method is used to create the left-eye view. The author further shows that the anaglyph output generated by this real-time 2D-3D converter produces less fatigue due to a decrease in retinal rivalry [88].

Other techniques apply machine learning algorithms and a classifier to automatically detect objects and key features in a given scene to estimate depth. One such algorithm is described by Park et al., where for each video frame a potential stereo match is determined by the classifier [89]. This ensures that the proposed stereo pair meets certain geometric constraints for pleasant 3D viewing. In a sequence of multiple images, depth cues are also estimated from the presence of shadows, focus/defocus, disparity between two images, and motion parallax. There is extensive research on depth estimation in the context of

2D-3D conversion. An excellent overview of 2D-3D conversion techniques for 3D content

generation is provided by Zhang et al. [90]. In principle, depth can be recovered either from monocular or binocular depth cues.

Conventional methods for depth estimation have relied on multiple images using stereo

correspondence between two or more images to compute disparity. However, combining

monocular and binocular cues together can give more accurate depth estimates [91].
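To make the brightness-driven parallax-shift idea concrete, the following is a minimal Python/NumPy sketch of that class of conversion, not the DIANA-3D or any other converter's actual implementation; the luminance-as-depth assumption and the max_shift scale are illustrative choices.

    import numpy as np

    def parallax_shift_left_view(image, max_shift=8):
        # Create a synthetic left-eye view from a single 2D image by shifting
        # pixels horizontally in proportion to their luminance, under the
        # illustrative assumption that brighter pixels are closer to the viewer.
        # image: H x W x 3 uint8 array; max_shift: arbitrary depth scale (pixels).
        h, w, _ = image.shape
        luminance = image.astype(np.float32).mean(axis=2) / 255.0
        shift = (luminance * max_shift).astype(np.int32)
        cols = np.arange(w)
        left = np.zeros_like(image)
        for row in range(h):
            # Sample each output pixel from a column displaced by its disparity.
            src = np.clip(cols + shift[row], 0, w - 1)
            left[row] = image[row, src]
        return left

The right-eye view can be taken as the original image (or shifted in the opposite direction), which mirrors the single-view parallax-shift approach described above.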

6.2.3 Quantitative Experiments and Results

An alternative to simulating, rendering, and animating stereo precipitation is to apply 2D-3D video

conversion on existing 2D videos of precipitation scenes. The quality of stereo output of

such converters is measured and the results are compared among the selected 2D-3D converters. For this purpose, baseline synthetic videos are created by using a 3D modeling tool,

such as Blender. Three such videos are used to create test cases that consider depth from

monoscopic cues, such as linear perspective, occlusion, and depth from object placement

in the scene. Additionally, three more stereoscopic videos are used as baselines to test the

2D-3D converters. These videos are from a collection of downloadable stereoscopic videos

of natural scenes acquired from an integrated twin lens camera system [92]. Six test cases are considered for this experiment. Figure 6.9 shows one such test image emphasizing

linear perspective. The 3D virtual scene shown in this 2D image is captured by two identical parallel camera models in Blender, one for each eye, giving a true 3D stereoscopic output. The

scene consists of two identical spherical objects, representative of raindrops that are smaller

than 2 mm in size. The center of the sphere on the right is on the stereo window, while

the sphere on the left is the same object farther from the camera. The stereo window is

a plane perpendicular to the viewer's line of sight on which the left- and right-eye views are projected.

Figure 6.9 Baseline input image to test depth from linear perspective

The stereo output image acquired by using the parallel camera models is the baseline output image shown in Figure 6.10. The Axara 3D video converter outputs are discussed below.

Figure 6.10 Baseline output image: Test case C-1

Notice that the left- and right-eye views of the sphere on the left show greater positive parallax as it is placed away from the camera, while the center of

the sphere on the right has zero parallax and shows little disparity between the left- and

right-eye views. This baseline output image is compared with the output of the five 2D-3D video converters. The horizontal parallax value is measured by identifying key features such

as edges or corners of an object in the left- and right-eye views. For test cases where the

baseline image is from a stereo camera, a feature such as an edge or a corner of an object is

easily recognizable. The horizontal parallax of the selected feature from the baseline output

image is the correct value. The difference in this horizontal parallax for the same feature

in the output of the 2D-3D video converters is measured. The difference between the two values is the error in horizontal parallax introduced by the 2D-3D video converter.
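The measurement itself can be automated along these lines; this is a sketch assuming OpenCV is available, that the feature of interest is identified manually by its pixel position (x, y) in the left-eye view, and that the parallax of interest is purely horizontal so the search can stay on the same scan lines.

    import cv2

    def horizontal_parallax(left_img, right_img, x, y, patch=21):
        # Estimate the horizontal parallax (in pixels) of the feature centered at
        # (x, y) in the left-eye view by finding the best match for a small patch
        # along the same rows of the right-eye view.
        half = patch // 2
        template = left_img[y - half:y + half + 1, x - half:x + half + 1]
        band = right_img[y - half:y + half + 1, :]   # same scan-line band
        scores = cv2.matchTemplate(band, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(scores)
        matched_x = max_loc[0] + half                # center of the best match
        return matched_x - x                         # signed horizontal parallax

Repeating this for the baseline pair and for each converter's output, and differencing the two parallax values, gives the per-feature error used in the comparison.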

The output of a 2D-3D video converter (Axara) for the input baseline image emphasizing linear perspective is shown in Figure 6.11.

Figure 6.11 Output of Axara 3D software video converter

It is expected that a feature closer to zero parallax would show little disparity. However, in the actual output of the 2D-3D video converter, the sphere closer to the camera exhibits significant horizontal parallax. Furthermore, both spheres measured similar disparity, indicating that the 2D-3D converter did not correctly account for linear perspective.

These experiments are repeated for depth implied by occlusion. A baseline input model

is created with two spheres such that one is in front of the other. The expected baseline output

and corresponding output of a 2D-3D video converter is shown in Figure 6.12. In the

baseline output, the occluded sphere is farther away from the camera; therefore it has

positive parallax and appears to recede into the screen. The 2D-3D converter (Axara) output shows both the front and the occluded sphere with the same parallax, which is not expected.

Figure 6.12 Depth from occlusion. Test case C-2

For some methods, objects at the bottom or center of the scene are assumed to be closer to the camera than objects at the top or near the edges. To test this scenario, a baseline image with objects appearing throughout the scene is used. Figure 6.13 shows an expected output where the baseline image has all spheres on the stereo window.

Figure 6.13 Depth from object placement. Test case C-3

It is noted that the 2D-3D converter output shows spheres on the top of the screen with positive parallax, farther away from the camera, while the spheres on the lower half of the screen have negative parallax, coming out of the screen towards the camera.

Three videos consisting of a scene with water and wind effects are selected from a stereoscopic 3D HD video library [92]. These videos are used as baseline input to the 2D-3D

video converters. Recognizable features such as edges or corners are used as a reference

point to measure horizontal parallax between the left- and right-eye views in both the

baseline video and the output of the 2D-3D converters. The three baseline images taken

from videos of scenes from a stereoscopic camera and corresponding 2D-3D converters

output are shown in Figure 6.14.

Figure 6.14 Depth from stereoscopic camera: Test cases C-4, C-5, and C-6

The comparison between the positions of the selected features in the baseline image and the corresponding output from the five 2D-3D video converters is shown in Table 6.2. The columns titled C-1 to C-6 correspond to the six different test cases

Table 6.2 Parallax (in Pixels) for objects closer to the camera

                      Test Cases
No.  Converters    C-1   C-2   C-3   C-4   C-5   C-6
 1   Baseline       -2    -2    -2   -12   -18   -14
 2   Arcsoft        -6    -4    -4    -9    -6    -4
 3   Axara          20    20    20    60    55    64
 4   DIANA-3D        6     6     6    12    15    14
 5   Leawo          -4    -4    -4   -17   -20   -20
 6   Movavi         10    10    10    32    30    31

used. The first three tests, from C-1 to C-3, are results from synthetic baseline input images while the results from C-4 to C-6 are from baseline images acquired from a stereoscopic

camera. The values in Table 6.2 are horizontal parallax values for a selected feature that

is closer to the camera. The values for the objects that are farther from the camera are

Table 6.3 Parallax (in Pixels) for objects farther from the camera

                      Test Cases
No.  Converters    C-1   C-2   C-3   C-4   C-5   C-6
 1   Baseline       26    -2     3    -8     8    -5
 2   Arcsoft        -6    -8    -4    -4    -9    -8
 3   Axara          20    20    20    60    55    64
 4   DIANA-3D        6     6     6    12     7    12
 5   Leawo          -4    -4    -4   -17   -20   -20
 6   Movavi         10    10    10    32    30    31

provided in Table 6.3. These values are measured in pixels. Notice that some values are

negative. Negative values mean negative parallax. For example, in the baseline image, the

sphere is positioned with the stereo window passing through the center; a portion of the

sphere appears in front of the stereo window.

The test case C-1 corresponds to the depth from linear perspective. Arcsoft and Leawo

are the only two converters that place the sphere with negative parallax. However, these values are in error when compared to the true value. The test case C-2 corresponds to depth

due to variation of object placement in the scene. In this case, spheres are placed throughout

the scene at various locations all with centers in the stereo window. The expectation is

for the output image to be at the same parallax unless the 2D-3D video converter is using

scene placement to determine depth. Comparing C-2 with the same column in Table 6.3

should give the same values. Arcsoft is the only 2D-3D video converter that exhibits different

parallax values for objects placed at the bottom of the scene as opposed to the same object

placed on the top.

The C-3 test case includes occlusion. In the baseline image, the sphere farther from the

camera is placed behind the sphere that is near the camera so it is partially occluded. The

horizontal parallax value in column C-3 of the two tables for the baseline image confirms

this fact. The remaining values in the column C-3 show that none of the 2D-3D video

converters distinguished between the two spheres and the horizontal parallax values for

the two spheres are the same.

The test cases from C-4 to C-6 correspond to the videos of natural scenes taken from a

stereoscopic camera. The horizontal parallax in the three baseline images is negative. The

Arcsoft output for test cases C-5 and C-6 is visually conflicting as the feature farther from

the camera has less horizontal parallax than the feature closer to the camera. Axara, DIANA-3D, and Movavi outputs all have positive parallax and are therefore incorrect. Only the Leawo output exhibited negative parallax for all objects close to the camera.

An important note from the data in Table 6.2 and Table 6.3 is that, apart from Arcsoft, all other 2D-3D video converters showed no difference in horizontal parallax for features closer or farther from the camera, thus adding an equal amount of parallax to all objects. This simply gives a perception of the entire scene appearing behind or in front of the stereo window. The depth perception in these outputs is mainly due to monoscopic depth cues.

It is noted that out of five 2D-3D video converters, four converters (Axara, DIANA-3D,

Leawo, and Movavi) offer a user-adjustable 3D depth setting. For the experiments, this setting was set to the default value. The effect of changing the 3D depth setting is to shift all objects in a scene to appear either behind or in front of the stereo window, thus adding either positive or negative parallax to the entire scene. It is also noted that out of the five 2D-3D video converters, DIANA-3D is the only converter that can convert a video in real-time. All other 2D-3D converters first upload a 2D video file before writing the converted 3D output.

From the data in Table 6.2 and Table 6.3, a mean square error (MSE) value for each

2D-3D video converter for the feature closer and the feature farther from the camera is computed. For a given test case, the error is the difference between the parallax value of the baseline and the parallax value of the 2D-3D video converter. This error is squared and summed over all test cases for that particular 2D-3D video converter. A mean value is calculated by dividing the squared sum by the total number of test cases. Table 6.4 shows the normalized mean squared error for the five 2D-3D converters. The data shows that

Table 6.4 Normalized MSE between baseline and 2D-3D converters

No.  Converters    Near    Far
 1   Arcsoft       0.015   0.115
 2   Axara         1.000   1.000
 3   DIANA-3D      0.146   0.094
 4   Leawo         0.004   0.165
 5   Movavi        0.371   0.309

Leawo had the least amount of error while Axara had the highest error, followed by Movavi.

The errors in Arcsoft and DIANA-3D are close; Arcsoft performs better for closer objects, while DIANA-3D has a smaller error for objects farther from the camera.
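The error computation described above can be reproduced directly from the tables; the sketch below uses the near-feature values of Table 6.2 and normalizes each converter's MSE by the largest MSE, an assumption that is consistent with Axara appearing as 1.000 in Table 6.4.

    import numpy as np

    # Parallax values (pixels) for the feature nearer the camera (Table 6.2).
    baseline = np.array([-2, -2, -2, -12, -18, -14])
    converters = {
        "Arcsoft":  np.array([-6, -4, -4,  -9,  -6,  -4]),
        "Axara":    np.array([20, 20, 20,  60,  55,  64]),
        "DIANA-3D": np.array([ 6,  6,  6,  12,  15,  14]),
        "Leawo":    np.array([-4, -4, -4, -17, -20, -20]),
        "Movavi":   np.array([10, 10, 10,  32,  30,  31]),
    }

    # Mean squared error of each converter's parallax against the baseline.
    mse = {name: float(np.mean((vals - baseline) ** 2))
           for name, vals in converters.items()}

    # Normalize by the largest error so the worst converter scores 1.000.
    worst = max(mse.values())
    normalized = {name: round(err / worst, 3) for name, err in mse.items()}
    print(normalized)

With this normalization the computed values match the Near column of Table 6.4.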

6.2.4 Subjective Experiments and Results

The stereoscopic viewing experience is evaluated based on subjective scoring by individuals.

Twenty-five subjects participate in the study. The International Telecommunication Union

(ITU) has proposed recommendations for performing stereovision tests for participants in

subjective assessment [93]. Guidelines from the ITU recommendations help with screening subjects for visual acuity and stereo blindness.

The input to the 2D-3D video converters is a synthetic monoscopic precipitation scene.

Some 2D-3D video converters cannot convert 2D input in real-time; they upload the entire

2D input video file before writing the converted 3D output. Participants are shown the

resulting video clips produced by each method. It is a blind experiment in which participants

are not aware of the methods used to produce stereoscopic output. The quality of visual

experience is measured by asking subjects to observe output of the five 2D-3D converters

and rate them based on three criteria: 1) visual comfort, 2) conflict between left- and right-views, and 3) observable depth in the scene. The rating of each question is based on a five-point Likert scale, where 1 is poor, 2 is marginal, 3 is average, 4 is good, and 5 is excellent [94]. The questions asked to assess the output are given in Table 6.5.

Table 6.5 Survey questions for visual assessments

Answer each question by writing a number between 1 and 5

Note: 1 = Poor, 2 = Marginal, 3 = Average, 4 = Good, and 5 = Excellent

No.  Survey Question / Instruction
1    Is the scene comfortable to view?
     Instruction: Note for any physical discomfort such as headache or eye strain.
2    Do you see a ghost image or double vision?
     Instruction: Note for left-eye image leaking into right-eye and vice versa, resulting in a ghosting effect.
3    Is there any observable depth in the scene, specifically depth in particles?
     Instruction: The 3-D positions of stereoscopic objects are perceived stereoscopically but they appear unnaturally thin.

The outputs of the five 2D-3D converters are compared to each other for their effectiveness in producing stereo results. The Axara 3D video converter output produced for a monoscopic rendered rain scene is shown in Figure 6.15. By default, Axara 3D adds a negative parallax to each object. There is a 3D depth setting that can be adjusted from positive to negative parallax levels. Setting it at a positive level increases positive parallax in all objects by the same amount. The left-eye image is slightly shifted to the left. Similarly, the right-eye image is shifted to the right. The resolution has decreased a little, making the smaller rain streaks blur out of the scene. Additionally, output from the stereoscopic rain rendering described in phase 1 and shown in Figure 6.8 is also included in the comparison.

Figure 6.15 Axara 3D output for a monoscopic rendered rain scene

The box and whisker plots are drawn for the three survey questions. A box and whisker plot graphically represents several numeric quantities [95]. The vertical axis represents the Likert scale ordinal values, from “poor”, represented by a rank value of 1, up to “excellent”, with a rank value of 5. The box itself spans the first and the third quartiles of the data, which is the inter-quartile range (IQR), where the majority of the responses lie. The red line represents the median response. The whiskers, represented by the dashed lines, show the possible range of user responses.
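A plot of this kind can be produced, for example, with matplotlib; the sketch below assumes the Likert responses are collected per method, and the sample values shown are placeholders rather than the recorded survey data.

    import matplotlib.pyplot as plt

    # Placeholder Likert ratings (1-5) per method; not the recorded survey data.
    responses = {
        "Arcsoft":  [2, 3, 3, 2, 4, 3],
        "Axara":    [1, 2, 2, 1, 2, 3],
        "DIANA-3D": [3, 3, 4, 3, 2, 4],
        "Leawo":    [4, 4, 3, 5, 4, 4],
        "Movavi":   [2, 2, 3, 2, 3, 2],
        "Method 2": [4, 5, 4, 4, 5, 4],
    }

    fig, ax = plt.subplots()
    # whis=(0, 100) extends the whiskers over the full range of responses.
    ax.boxplot(list(responses.values()), whis=(0, 100))
    ax.set_xticklabels(list(responses.keys()))
    ax.set_ylabel("Likert rating (1 = poor, 5 = excellent)")
    ax.set_ylim(0.5, 5.5)
    plt.show()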

For responses to the three survey questions, Figure 6.16 compares the descriptive statistics,

such as median, range and inter-quartile range (IQR), for the five 2D-3D converters and

the stereo method 2 implemented during phase 1 of this study. It shows that among all

2D-3D converters Leawo produced the best results along with method 2. Since the IQR is

the minimum for method 2, it suggests that most participants preferred it over any other

2D-3D converters; the remaining 2D-3D converters are ranked followed by DIANA-3D,

Arcsoft, Movavi, and Axara. These results corroborate the normalized MSE values acquired

from the previous experiments. The Photoshop method is used to compute anaglyphs,

incurring minimal overhead [67]. Since viewing an anaglyph requires color filters, the color fidelity and luminosity of the scene are reduced. Due to a darker background, characteristic

of a rain environment, the luminance intensity in the experiment is already low.
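The text does not state the exact anaglyph formula; one common red-cyan construction, sketched below, simply takes the red channel from the left-eye image and the green and blue channels from the right-eye image, which also helps explain the loss of color fidelity noted here.

    import numpy as np

    def red_cyan_anaglyph(left_rgb, right_rgb):
        # Combine left/right RGB views (H x W x 3 uint8) into a simple red-cyan
        # anaglyph: red from the left eye, green and blue from the right eye.
        # This is one common formulation, not necessarily the Photoshop method
        # or the formula used by the 2D-3D converters.
        anaglyph = np.empty_like(left_rgb)
        anaglyph[..., 0] = left_rgb[..., 0]    # red   channel from the left view
        anaglyph[..., 1] = right_rgb[..., 1]   # green channel from the right view
        anaglyph[..., 2] = right_rgb[..., 2]   # blue  channel from the right view
        return anaglyph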

The participants viewing the anaglyph output produced by the 2D-3D video converters

expressed unpleasant viewing experiences such as eye fatigue and headaches that are side

effects of prolonged viewing. The average time spent looking at all output per participant was approximately 10 minutes. The participants also expressed difficulty in observing depth

in the output produced by some 2D-3D video converters. Common issues related to the

2D-3D video converters output are poor resolution, color fidelity, luminance, and lack

of observable depth in rain streaks. From the software specifications of the 2D-3D video

converters, it is not clear what algorithm they used to compute the anaglyphs.

Figure 6.16 Variation in rating from twenty-five participants

6.2.5 Rain and Snow Rendering

The left- and right-eye views are calculated and rendered separately. Thus, the scene is

rendered from two different perspectives. The scene with rain or snow particles is modeled using the world coordinate system, while each object is modeled using its respective model coordinates. Several matrix transformations are required to transform all model coordinates to screen coordinates. This approach is typical in the raster graphics pipeline, where scene objects are used to form an image on the screen.

The initial rain rendering is enhanced to the simulation and stereo rendering of rain and

snow using a programmable GPU. The experimentation focus is on photorealistic output, which is achieved by using environment lighting methods, such as cube maps. The OpenGL

4.5 graphics library with programmable graphics pipeline is used. The application executes

on the Intel Quad Core i7 CPU running at 2.00 GHz with 16.0 GB of RAM installed with an

NVIDIA Quadro K5000 GPU. The graphics card used in this study supports a modern, fully programmable graphics pipeline.

For stereo, the hardware used for the experiments supports quad buffers with active shutter glasses to view stereo output. This type of stereoscopic rendering requires a display refresh rate of at least 120 Hz, which leaves approximately 8.33 ms per refresh to complete writing data to the framebuffer.

Table 6.6 shows the frame rates achieved for light, medium and heavy precipitation with

1000, 5000, and 10,000 particles, respectively. The results are an average over

Table 6.6 Frame rate comparison for rain and snow rendering

Precipitation    Number of    Monoscopic    Stereoscopic    Stereoscopic
Intensity        Particles    (fps)         Rain (fps)      Snow (fps)
Light            1000         1685          825             830
Medium           5000         1260          660             664
Heavy            10000        1050          521             525

10 experiments. These results are compared with monoscopic rain rendering where only

one camera is used to render a scene from a single perspective. In the stereo rendering

approach, the left- and right-eye views are separately generated by two camera setups,

rendering the scene from two different perspectives.
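A minimal sketch of the two-camera setup follows, assuming a parallel (offset) camera pair, a column-vector matrix convention, and an illustrative 6.5 cm eye separation; the actual implementation may instead use off-axis (asymmetric frustum) projections.

    import numpy as np

    def stereo_view_matrices(view, eye_separation=0.065):
        # Given a 4x4 mono view matrix (world -> eye, column-vector convention),
        # return left- and right-eye view matrices for a parallel camera pair.
        half = eye_separation / 2.0
        offset_left = np.eye(4)
        offset_right = np.eye(4)
        # Moving the camera half the separation to the left shifts the scene by
        # +half in eye-space x, and vice versa for the right eye.
        offset_left[0, 3] = +half
        offset_right[0, 3] = -half
        return offset_left @ view, offset_right @ view

Each eye's view matrix is then paired with the projection matrix and drawn into the corresponding left or right back buffer of the quad-buffered framebuffer.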

6.2.6 Runtime

The running time of the algorithm depends upon a number of factors: single versus multiple processor machines, various versions of a processor, read/write memory or hard disk access speed, 32- versus 64-bit architecture, configuration of the machine, and input to

the algorithm. The time complexity analysis is only concerned with the behavior of the

algorithm in response to various inputs. The rate of growth of time taken by the algorithm with respect to the input is determined. In other words, the complexity of an algorithm is

a measure of how many steps the algorithm will require in the worst case for an instance

or an input of a given size. It is important to understand some basic terminology such as

problem, problem instance, and algorithm. Moreover, it is important to know how the size

of a problem instance is measured and what constitutes a step in an algorithm.

A problem is an abstract description coupled with a question requiring an answer. For

example, the real-time, photorealistic, stereo rendering of rain problem is: “given a scene

in real-time to give a visual sensation of seeing rainfall in stereo?” On the other hand, a

problem instance includes an exact specification of the data, for example: “a photorealistic

scene contains 10 textures, 100 polygons, 30,000 vertices, 10,000 raindrops...” and so on.

Stated more mathematically, a problem can be thought of as a function p that maps an

instance x to an output p(x), which is the answer to the question posed by the problem. An algorithm for a problem is a set of instructions guaranteed to find the correct solution to any problem instance in a finite number of steps. In other words, for a problem p an algorithm is a finite procedure for computing p(x) for any given input x. In a simple model of a computing device, a “step” consists of one of the following operations: addition, subtraction, multiplication, finite-precision division, and comparison of two numbers.

Time complexity is concerned with how long the algorithm takes as the size of a problem instance gets large. To resolve this, a function of the input size is formulated that is a reasonably tight upper bound on the actual number of steps. Such a function expresses the complexity or running time of the algorithm. An asymptotic analysis is required, which determines how the running time grows as the size of the instance gets very large. For this reason, it is useful to introduce Big-O notation. For two functions f(t) and g(t) of a nonnegative parameter t, f(t) = O(g(t)) if there is a constant c > 0 such that, for all sufficiently large t, f(t) ≤ c g(t). The function c g(t) is thus an asymptotic upper bound on f.

To calculate an expression for time complexity, a machine model is assumed. This is a hypothetical machine, which is assumed to have multiple processors and a 64-bit architecture, can process data in parallel, and takes one unit of time to complete simple arithmetic and logical operations such as addition, subtraction, multiplication, and division. It is also assumed that an assignment operation takes unit time to complete and all other computational costs are negligible.
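Restated as a displayed formula, together with a worked example whose constants are purely illustrative:

    \[
    f(t) = O\big(g(t)\big) \iff \exists\, c > 0,\ t_0 \ge 0 \ \text{such that} \ f(t) \le c\,g(t) \ \text{for all } t \ge t_0 .
    \]
    \[
    \text{For example, if updating } n \text{ particles costs } f(n) = 5n + 20 \text{ steps per frame, then } f(n) \le 6n \ \text{for all } n \ge 20, \ \text{so } f(n) = O(n).
    \]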

6.2.7 Phase 2 – Conclusions

The experiments show the relative performance among five selected commercially available

2D-3D video converters and stereo output of rain rendering from method 2 implemented

earlier. Six test cases are applied to measure horizontal parallax. Additionally, subjective

scoring by twenty-five participants measures the overall quality of visual experience. It is observed that the depth perception is mainly due to the presence of strong monoscopic depth

cues. In the majority of 2D-3D converters tested, the binocular disparity is equally applied

to all objects in the scene. This makes the entire 2D image plane shift into or out of the

screen. The 2D-3D video converters are making assumptions about the 3D scene that are

often not correct, thus giving conflicting visual cues. The quality of the visual experience for

scenes acquired from most 2D-3D converters is poor. Therefore, there is a need to develop

new methods to enable real-time photo-realistic rendering for stereo content generation.

This is achieved using a programmable GPU and environment lighting techniques.

6.3 Phase 3 – Measuring Photorealism

In the final phase of the study, current graphics hardware that supports programmable GPU

and modern graphics library is used. The initial implementation of stereo rain rendering

is extended to include photorealistic stereo rendering of rain and snow precipitation at video frame rates. The experiments in this phase of the study used input visual stimuli that vary along three visual factors: particle numbers, particle size, and motion [96]. The goal is to determine the statistical ranking and importance of these visual factors necessary

for producing a photorealistic output. The experiments were also extended to study the

impact on photorealism by use of stereo output and use of post-processing effects, such as variable lighting conditions, fog, and glow effect.

6.3.1 Visual Factors for Photorealism

Physically accurate environment lighting is one approach to produce visually realistic

results. However, such physically-based rendering alone is not sufficient to achieve photo-

realism. There are many visual factors that contribute to generate photorealistic results.

For example, sharp edges make objects look artificial as demonstrated by Rademacher et

al. [74]. Making sharp edges smooth and slightly rounded gives an object model a more realistic look. Similar conclusions are drawn for shadows. A sharp contrast of a dark shadow

looks artificial while soft shadows make a computer-generated scene look more realistic.

Imperfections in the camera lens can produce distortions such as lens flare, chromatic

distortion, or the formation of out-of-focus areas. Reproduction of these imperfections in a

computer-generated image adds to photorealism although this study does not consider

such distortions.

A fundamental challenge associated with achieving photorealism in a rain or snow

scene is lighting calculations. In previous studies involving particle systems, Garg and

Nayar [16] investigate photorealistic rain rendering with lighting effects. The study involves interaction of light with oscillating raindrops which produce complex brightness patterns

such as speckles and smeared highlights. The importance of light attenuation to achieve

photorealism is demonstrated by Tatarchuk and Isidoro [18]. Atmospheric effects such as fog and addition of glow effects around light sources such as misty halos around streetlights

for nighttime rendering are also considered.

Precipitation scenes consist of several thousand particles that move independently

at variable velocities under the influence of external forces such as gravity, air resistance,

and wind gusts. The intensity of rain or snowfall is dependent on the number of particles

and particle size. A light precipitation event will have fewer and smaller particles as opposed

to heavy precipitation. Similarly, the variability in a particle shape is due to motion caused

by external forces acting on the falling particles. Variations in precipitation details due to

changes in number of particles, their sizes, and motion are important visual factors. The

literature review of computer generated rain and snow also points to these three key factors

influencing visual attention when observing a scene. Therefore, such visual factors are

important to consider when producing photorealistic results. The experiments proposed

in this study collect subjective data to analyze and rank the three visual factors.

6.3.2 Measuring Photorealism

There is no standard method to measure photorealism [97]. The images used in the experiments and the test procedure itself can cause variability in the results. Therefore, depending

on the test scenarios, new experiments will have to be devised to measure photorealism.

In this study, answers to survey questions are collected as data. Participants are shown video clips and still photographs of natural scenery with rain and snow. The visual stimuli vary along three factors or dimensions: number, size, and movement of particles. Addi-

tionally, computer rendered natural scenes with precipitation vary in light conditions such

as precipitation during sunlight vs. overcast sky, glow or halo from artificial light sources

such as streetlamps, and fog or atmospheric haze effects. Participants also evaluate stereo-

scopic views of computer generated natural scenery with precipitation by answering survey

questions designed to compare monoscopic and stereoscopic outputs.

The perception of each person varies slightly, thus making visual fidelity extremely

subjective. Therefore, human subjects are asked to complete survey questions from which

conclusions can be drawn about photorealism. A total of thirty healthy adult subjects, who

are not stereo blind, participated in observing rain scenes. Another set of thirty individuals

answered the same survey questions for scenes with snow.

Three different types of experiments are conducted. In the first type of experiment, a

set of questions are designed to determine the perceptual space of precipitation in terms

of number of particles, size, and their motion as they fall towards ground. The second

experiment is designed to evaluate other visual factors such as illumination, fog, and glow

effects in a computer generated precipitation scene. In the final experiment, a question

is asked to compare mono and stereoscopic rendered outputs to study the contribution

of stereo on photorealism. The data gathered from these experiments is analyzed using

statistical tools.

6.3.3 Experiment 1 – Perceptual Space

The perceptual space is defined as the visual experience of a precipitation scene as observed

from the ground. The perceptual space is influenced by visual factors, such as particle

sizes and number of particles that determine the precipitation intensity - light, moderate,

and heavy rain or snowfall conditions. Additionally, variations in particle motion due to wind effects or turbulence is an important visual factor to consider. Rainfall may appear

to come down vertically when there is little wind or turbulence. However, under similar

low wind dynamics, snowfall may exhibit greater variations and randomness in vertical

fall. These factors are ranked based on the results of a series of survey questions asked in

controlled human subject experiments to determine relative influence of a visual factor on

the perceptual space of a rain or snow event. For example, if a respondent is most sensitive to variations in the number of particles, then, to enhance photorealism, a rendering algorithm

can be developed to emphasize this particular visual factor.

The input stimuli are a series of video samples and still pictures of actual rain and

snowfall scenes captured by a monoscopic camera or gathered from various freely available

public websites [98]. The stimuli vary along the three factors such that each factor varies between extreme high and low values. This results in a total of eight input stimuli, varying

in size, number of particles, and particle motion as they fall. The quantities of high and

low values are subjectively judged after visually inspecting and comparing several input

stimuli. For example, an image of a light rain scene will be selected as one of the input

stimuli for having very few and small rain streaks falling vertically after comparing it with several similar images. Figure 6.17 shows sample images used to show heavy rain with

many particles falling on a slant used as an input stimulus for one of eight experiments.

A similar visual stimulus for experiment with snow is also shown. Note that the included

images lose resolution and visual fidelity in rain and snow particles after screen capture

and a resize.

There are two questions per visual factor forming a total of six questions as listed in

Table 6.7. Thirty participants are shown eight input stimuli to respond to the six questions.

The Likert scale is a commonly used scale to rank human responses to survey questions.

Responses to the six questions, which are referred to as Likert items, are recorded on a five-

point Likert scale [94].

Figure 6.17 One of the eight visual stimuli for rain and snow scene

Note that input stimuli with particle attributes of “small” vs. “large”, “few” vs. “many”, and “straight” vs. “angle” are subjectively selected for these experiments after visually inspecting several samples. All even numbered questions are opposite to

the previous odd numbered questions. This identifies incorrect survey answers due to

respondent bias, which is participants’ inability to answer truthfully or accurately. The

participant response time is neither restricted nor measured. It is also important to note

that there is no notion of “correctness” as the responses are subjective. Since Likert scale

responses produce ordinal data with no clear distribution, statistics such as means and standard deviations are not useful for analysis. For example, it is unclear what

the average of “strongly agree” and “strongly disagree” means. Instead calculating median

or measuring frequency of responses in each category provides a more meaningful analysis

[99]. A different population set of participants is used to repeat the above experiments for visual stimuli that have snow precipitation instead of rain.
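A sketch of this kind of descriptive summary is shown below; the response values are placeholders, not the collected survey data.

    from collections import Counter
    from statistics import median, mode

    # Placeholder Likert responses (1-5) for one question; not the survey data.
    responses = [4, 4, 3, 5, 4, 3, 4, 5, 4, 3, 4, 4, 2, 4, 5]

    freq = Counter(responses)                         # count per Likert level
    percentages = {level: 100 * count / len(responses)
                   for level, count in sorted(freq.items())}

    print("median:", median(responses))               # central tendency
    print("mode:  ", mode(responses))                 # most frequent response
    print("freq %:", percentages)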

Table 6.7 Survey questions to determine the perceptual space

Answer each question by writing a number between 1 and 5

No. Survey Questions

1   Particle size is small
2   Particle size is large
3   There are few particles
4   There are many particles
5   Particles are falling straight down
6   Particles are falling at an angle

The results from all experiments are collectively analyzed, except for questions that were

designed to detect respondent bias. Figure 6.18 shows results for rain as visual stimulus.

The horizontal axis represents the Likert scale ordinal values, from “strongly disagree”

represented by rank value of 1 up to “strongly agree” with rank value of 5. The vertical

axis shows corresponding number of responses in percentage. For particle size in rain

experiments, responses ranged from “disagree” to “agree” with median response of “neutral”, while the majority of responses are between “neutral” and “agree”.

For number of particles and particle motion, the respondents tend to favor “agree” or

“strongly agree” with average response rank between “neutral” and “agree”. The central

tendency of both these visual factors is indicated by median response of “agree”. Although

the average response of particle size is also between “neutral” and “agree”, the median is “neutral”, with responses ranging from “disagree” to “agree”.

Figure 6.18 Rain scenes - response to number of particles, size, and motion

The data shows that respondents

have a favorable opinion regarding number of particles and particle motion in a rain

precipitation scene but the same cannot be said about the particle size. Moreover, as shown

in Table 6.8, the mode value for particle size is a neutral opinion as the most frequently

occurring response. On the other hand, the modes of the other two visual factors are more

Table 6.8 Descriptive statistics – rain/snow to determine perceptual space

                        Median          Mode
Visual Factors          Rain   Snow     Rain   Snow
Particle Size            3      3        3      4
Particle Motion          4      5        4      5
Number of Particles      4      4        4      4

towards “agree” or “strongly agree”.

The experiments are repeated with another group of thirty respondents for snow scenes.

The experiments on snowfall as the visual stimulus yield similar results. In comparison, when the visual stimulus is snow, more responses on particle size are in the “agree” column.

Figure 6.19 shows results for snow as visual stimulus.

Figure 6.19 Snow scenes - response to number of particles, size, and motion

Experiments with number of particles in snowfall and rain have similar results with the

same median and mode values. However, the response to particle motion in snowfall has

more variability but the median and mode values are “strongly agree” values. This may

be attributed to randomness in snowfall particle motion. Even with no wind snowflakes

tend not to fall straight down due to variations in snowflake shape. Table 6.8 shows the

median and mode values in comparison with rain as the visual stimulus. In the case of snow, it is noted that the mode is “agree” for particle size as opposed to “neutral” for rain. The results

suggest that the respondents note a change in particle size for snowflakes more frequently.

This may be attributed to the relatively lower velocity of falling snow particles, as compared to raindrops, giving the viewer an opportunity to observe snowfall in more detail.

6.3.4 Experiment 2 – Other Visual Factors

In these experiments survey questions are designed to measure subjective preferences of

different lighting and atmospheric conditions. During natural rain or snowfall, the sky is

overcast and light is diffused. The question is whether simulating overcast lighting con-

tributes towards improvement in photorealism of a rendered precipitation scene. The

participants are asked to rate two different light conditions, daytime and simulated over-

cast light conditions. Other visual factors, such as glow from an artificial light source and

atmospheric haze or fog effects, are also rated for photorealism.

A total of three visual stimuli and three survey questions are evaluated by thirty par-

ticipants. The questions are listed in Table 6.9. Keeping the definition of photorealism in

mind, participants are asked to view computer generated rain animations in 2D. They are

first shown a rain scene in bright daylight conditions. The output is switched to show the

same scene with lighting appropriate for an overcast sky. The participants are then asked

to compare the two outputs. A similar procedure is used to compare a rain scene with glow

or halo from an artificial light source and a scene with atmospheric haze or fog.

The experiments are repeated for snow precipitation with another group of thirty par-

ticipants. The impact of these visual factors on photorealism is investigated by analysis.

The survey questions used for these experiments compare the two visual stimuli shown in

Table 6.9 Survey questions to determine other important visual factors

Answer each question by writing a number between 1 and 5

No. Survey Questions

Daylight vs. Overcast:
1   Overcast light made scene appear photorealistic
Glow:
2   Adding a glow effect made scene appear photorealistic
Fog:
3   A fog effect improved photorealism

sequence in the three experiments. The comparison is made between two different light

conditions, daylight vs. overcast, presence or absence of glow from a simulated manmade

light source, and addition of fog effects in the scene. The participants use the Likert scale

from 1 to 5, “strongly disagree” to “strongly agree”, to rank their responses. The results of

the experiments for rain as input stimulus are summarized in Figure 6.20.

Lighting plays an important role in producing photorealistic results. These experiments

indicate that for a rain scene the overcast lighting produces some responses as “disagree”, which pulls the median results between “neutral” and “agree”. Although overall lighting

change to overcast adds to realism, these results indicate that rain streaks and their cumu-

lative effect is relatively difficult to observe in low light conditions as compared to the glow

or fog effects. The participants’ response to the glow and fog effect is most significant. The

experiments on snowfall as visual stimuli yield similar results with greater variability as

shown in Figure 6.21.

Figure 6.20 Rain scenes - response to lighting, glow, and fog

Figure 6.21 Snow scenes - response to lighting, glow, and fog

The response to snow as the visual stimulus is “neutral” for light conditions and glow, with a favorable response to the fog effect. Since snow particles are opaque and lack light refraction,

they make a scene appear bright. The respondents did not form any strong opinion about a

snow scene except to “agree” on fog contributing towards photorealism. Table 6.10 compares

the median and mode values between the two types of visual stimuli.

Table 6.10 Descriptive statistics – rain/snow to determine other visual factors

                        Median          Mode
Visual Effects          Rain   Snow     Rain   Snow
Fog                      4      4        4      4
Glow                     4      3        4      3
Lighting Condition       3.5    3        4      3

6.3.5 Experiment 3 – Photorealism and Stereo

A computer generated rain scene is used in this experiment that is made up of a noticeable

number of raindrops falling at slightly slanted angles with random variation in sideways

motion. The scene is rendered in both mono and stereo. The subject views 2D output

before viewing the same scene in stereo. The stereo is viewed by using active shutter glasses while the display is switched to produce stereoscopic output. The participants are asked to

compare the two outputs. The survey questions are designed to find whether stereoscopic

results appear more photorealistic relative to monoscopic output. The experiment is re-

peated for snow precipitation. The same question is asked of two independent groups of thirty

participants with one group answering the question with rain as the visual stimulus while

the other group is asked to respond to the same question with snow as the visual stimulus. The

question compares a monoscopic stimulus with a stereo equivalent. The participants use

the Likert scale from 1 to 5, “strongly disagree” to “strongly agree”, to rank their responses.

For both input stimuli, rain and snow, responses are closer to “strongly agree”. The snow

scene has median and mode values slightly higher, median 4.5 and mode 5, as compared

to 4 and 5 respectively for rain. The response frequency of each Likert level, for which there was a response, is shown in Table 6.11. Note that the majority of responses are either “agree”

Table 6.11 Response frequency of rain/snow

Question: Viewing in stereo made the scene appear photorealistic

          Strongly Agree    Agree    Neutral
Rain           37%           33%      30%
Snow           50%           40%      10%

or “strongly agree” for both types of precipitations, which is more pronounced for snow.

With respect to the question regarding photorealism of mono vs. stereo output for rain

and snow visual stimuli, the Mann-Whitney U-test is used to compare the responses of

the two independent groups. This test is well suited to analyze the Likert scale data, which

is ordinal or ranked scale, as we cannot presume that the responses fit a parameterized

distribution. This test requires that results from one experiment do not affect results in the

other and since the responses for rain and snow experiments are from different population

groups, the two group results are independent.

Since rain and snow precipitation are similar, we want to test whether there is a sta-

tistically significant difference in respondents’ opinions regarding the question posed in

this experiment. We can perform a hypothesis test by defining the null and an alternative

hypothesis. The null hypothesis is that there is no difference, using a significance level of

0.05, between the outcome of the two experiments using either rain or snow as visual input when it comes to comparing mono and stereoscopic output. In other words, both groups

are expected to respond similarly regardless of the type of the visual stimuli, rain or snow,

95% of the time. The alternative hypothesis is opposite of the null hypothesis, there is a

difference between the outputs of the two experiments.

An online Mann-Whitney U-test calculator is used to test the null hypothesis [100]. Raw samples are entered as input to the calculator. The results are consistent with the null hypothesis, yielding a z-score of -1.45627 and a p-value of 0.1443. Since the p-value is greater than our significance level of 0.05, we fail to reject the null hypothesis, concluding that both

groups formed similar opinions that viewing in stereo makes the scene appear photorealistic

regardless of precipitation type.
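The same test can be run with SciPy instead of the online calculator; the arrays below are placeholder responses, not the recorded data, so the printed statistic will differ from the values reported above.

    from scipy.stats import mannwhitneyu

    # Placeholder Likert responses (1-5) for the stereo-vs-mono question.
    rain_responses = [5, 4, 3, 4, 5, 3, 4, 4, 3, 5, 4, 3, 5, 4, 4]
    snow_responses = [5, 5, 4, 5, 4, 4, 5, 5, 4, 3, 5, 4, 5, 5, 4]

    # Two-sided test of whether the two independent groups differ in opinion.
    stat, p_value = mannwhitneyu(rain_responses, snow_responses,
                                 alternative="two-sided")
    print(f"U = {stat}, p = {p_value:.4f}")
    # If p_value > 0.05, we fail to reject the null hypothesis, as in the study.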

6.3.6 Phase 3 – Conclusions

In terms of particle size, number of particles, and motion in a precipitation scene, visual stimuli of either rain or snow generate the same response. Both types of precipitation

generated stronger opinions for the number of particles and motion suggesting that these

two visual factors have more effect on participants’ attention. Thus, the rendering algorithm

should emphasize these visual factors to enhance visual realism. The results demonstrate

that the visual factors for photorealism can be ranked as more sensitive to number of

particles and motion than to size.

In studying other visual factors, such as lighting conditions, glow, and fog effects, the results from the two visual stimuli differ slightly. As expected, rain precipitation was more sensitive to these factors, while responses to snow scenes had fewer variations in opinion, remaining close to neutral. The overcast sky condition for rain scenes produced responses slightly higher than neutral, suggesting that participants still considered rain as visually real even in normal daytime light conditions. However, the presence of atmospheric haze and fog produced a stronger response; rendering outdoor scenes with fog effects adds to realism.

Since this part of the experiment used 2D visual stimuli, the glow and fog effects contribute towards photorealism independent of stereo. Moreover, the stereoscopic output contributed towards photorealism when compared to monoscopic results. The median response for a snowfall scene is slightly higher than for a rain scene. This can be explained by snow particles falling at a much slower rate than rain, resulting in more time to observe particles in stereo.

CHAPTER 7

FUTURE ENHANCEMENTS

The objective of this work is to show that stereoscopic, real-time, and photorealistic ren-

dering of precipitation, such as rain and snow is achievable using contemporary graphics

hardware and software. This chapter provides a summary of the study and suggests future

extensions to this work.

7.1 Summary

The study begins with providing a general description of rain and snow formation in nature.

This information is important to understand the key attributes associated with precipita-

tion scenes. This helps in creating realistic simulation and animation of rain or snowfall.

A comprehensive review of related work in monoscopic rendering of precipitation and

stereoscopic rendering of other natural phenomena is presented. This review is important

to build background information needed to achieve the research objectives. Various depth

cues, such as monoscopic and stereoscopic, are also studied for better understanding of

stereoscopic rendering.

After acquiring necessary foundational knowledge, initial experiments are performed

on simple graphics hardware using fixed function pipeline to validate that stereoscopic

rendering of rain is possible in real-time. The use of less efficient fixed function graphics

pipeline is deliberate in these experiments, with the understanding that if slower hardware can maintain a real-time frame rate, then so will more efficient hardware. Stereoscopic

rain scenes can also be produced by taking a monoscopic rain video and using it as an input to a 2D-3D software video converter. A quantitative and subjective comparison between various 2D-3D software video converters is presented. Their effectiveness in producing

high quality 3D videos with scenery containing water phenomena is studied. The results

from this study are further compared with rendered stereoscopic rain scenes to evaluate the

quality of stereo depth based on subjective scoring by individuals who rate their overall visual experience by answering survey questions.

The implementation of stereo rain rendering is extended to include photorealistic stereo

rendering of rain and snow precipitation at video frame rates. This is accomplished by

using current graphics hardware that supports programmable GPU and a modern graphics

library. The experiments with this newer implementation determine the statistical ranking

and importance of these visual factors necessary for producing a photorealistic output. The

experiments are extended to investigate if stereo improves photorealism. Visual stimuli used

in the experiments also include post-processing on rendered output to produce variable

lighting, glow, and fog effects to study their impact on photorealism as the stereo camera

moves in the scene.

7.2 Future Extensions

There are many future extensions to this work, some of which are as follows:

1. The model of falling raindrops is represented by textures of rain streaks. As the hard-

ware and software evolve, for future studies a deformable raindrop mesh model can

replace a texture based representation. This will result in a more physically accurate

and photorealistic output that can provide visual details in a falling raindrop. The

mesh model can also deform easily to model motion blur which can provide more

accurate representation of retinal persistence. Additionally for snowfall, such models

can simulate inflight collision between particles to form bigger snowflakes as this

process is physically accurate and common in nature.

2. Rasterization based rendering can be replaced by a ray-tracing approach. Graphics

vendors like NVIDA are working on hardware acceleration for ray-tracing, which

will facilitate real-time rendering. With improvement in technology, animation in

real-time ray-tracing is viable. It is an open area of research that brings real-time and

photorealistic goals towards each other with more visually and physically accurate

rendering. Extending animation in real-time ray-tracing to stereoscopic displays is a

challenging extension to this work.

3. The GPU based stereoscopic rendering of anaglyphs is limited to a simple method by

using frame buffer objects. The compute mode of a GPU, with use of either a compute

shader, CUDA or OpenCL programs, can be researched to produce computationally

extensive anaglyphs, in real-time, with enhanced color fidelity.

4. Physically based rendering with use of the computationally expensive fluid dynamic

equations can produce a more realistic animation of natural phenomenon. An effi-

cient stereoscopic implementation with fluid dynamics is a useful extension to this

work.

5. Sophisticated post-processing to include simulation of time-of-day, nighttime render-

ing, glow effects, sunrays, lens flare, cinematic camera movements, realistic particle

accumulation and flow are all valid additions to the current problem.

6. Another extension is combination of several natural phenomenon in one larger

weather system. This may include interaction of rain and snow particles with other

natural objects such as grass, trees, or other vegetation. Inclusion of the sound of rain-

fall, or thunder with lightning effects, howling winds, and simulating a snow blizzard

all add to realism.

7. In evaluating 2D-3D converters, the horizontal parallax is measured manually by

counting pixels differences between the left- and the right-eye views. Instead, for

future experiments it is proposed to use an automatic feature detection algorithm

and apply stereo matching between the left- and right-eye views to measure the horizontal

parallax. This will increase the number of feature points to compare and enhance the

test sets for more accurate results.

8. The dimensionality of the problem that measures perceptual space of rain or snow

should be increased. Currently three dimensions are considered, namely particle

size, motion, and number. It is proposed that inclusion of other dimensions such as

raindrop splash or ripple effects, snow accumulation or stability, and sound effects

may also contribute towards defining the perceptual space of rain or snow.

9. The survey experiment to determine apparent photorealism in stereo can be en-

hanced to consider other questions, such as stereo output and visual immersion for

effectiveness of stereoscopic viewing in virtual reality applications, the impact of

glow effects on photorealism to a scene with nighttime ambient light, and particle

interaction with other scene objects, such as snow accumulation or a raindrop splash

effect on photorealism.

10. A future enhancement to this experiment is to improve comparison of a mono to a

stereoscopic visual stimulus, such that we gather opinions from a monoscopic visual

stimulus and evaluate it against opinions from the same output in stereo. Stereoscopic

implications of camera attributes, such as lens distortions on photorealism is an open

area of research. Future research will use methods to exploit the geometry of stereo

pairs to speed up rendering of photorealistic scenes with natural phenomena while

maintaining real-time frame rates.

REFERENCES

[1] Zhao, Q. “Data acquisition and simulation of natural phenomena”. Science China Information Sciences 54.4 (2011), pp. 683–716.

[2] Gotchev, A. et al. “Three-Dimensional Media for Mobile Devices”. Proceedings of the IEEE 99.4 (2011), pp. 708–741.

[3] Akenine-Möller, T. et al. Real-Time Rendering. 3rd ed. Natick, MA, USA: A. K. Peters, Ltd., 2008.

[4] Houze, R. A. Cloud Dynamics. 2nd ed. Seattle, WA, USA: Academic Press, 2014.

[5] Burton, J. & Taylor, K. The Nature and Science of Rain. Nature and Science Series. London, UK: Franklin Watts, 1997.

[6] Libbrecht, K. G. Ken Libbrecht’s Field Guide to Snowflakes. St. Paul, MN, USA: Voyageur Press, 2006.

[7] Srivastava, R. C. “Size distribution of the raindrops generated by their breakup and coalescence”. Journal of Atmospheric Sciences (1971), pp. 410–415.

[8] Pruppacher, H. & Klett, J. Microphysics of Clouds and Precipitation. 2nd ed. Atmospheric and Oceanographic Sciences Library. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1997.

[9] Magono, C. & Lee, W. “Meteorological Classification of Natural Snow Crystals”. Journal of the Faculty of Science 4.2 (1966), pp. 312–335.

[10] Rose, B. M. & McAllister, D. F. “Real-time photorealistic stereoscopic rendering of fire”. Proceedings of SPIE 6490 (2007), 64901O1–64901O–13.

[11] Johnson, T. M. & McAllister, D. F. “Real-time stereo imaging of gaseous phenomena”. Proceedings of SPIE 5664 (2005), pp. 92–103.

[12] Borse, J. A. & McAllister, D. F. “Real-time image-based rendering for stereo views of vegetation”. Proceedings of SPIE 4660 (2002), pp. 292–299.

[13] Jakulin, A. “Interactive Vegetation Rendering with Slicing and Blending”. Proceed- ings of Eurographics (2000). Ed. by Sousa, A. & Torres, J. C.

[14] Garg, K. & Nayar, S. K. Photometric Model of a Rain Drop. Tech. rep. 2004.

144 [15] Roser, M. et al. “Realistic Modeling of Water Droplets for Monocular Adherent Rain- drop Recognition using Bezier Curves”. ACCV Workshop on Computer Vision in Vehicle Technology: From Earth to Mars. Queenstown, New Zealand, 2010.

[16] Garg, K. & Nayar, S. K. “Photorealistic Rendering of Rain Streaks”. Proceedings of ACM SIGGRAPH 23.3 (2006), pp. 996–1002.

[17] Garg, K. & Nayar, S. K. “Vision and Rain”. International Journal of Computer Vision 75.1 (2007), pp. 3–27.

[18] Tatarchuk, N. & Isidoro, J. “Artist-Directable Real-Time Rain Rendering in City En- vironments”. Eurographics Workshop on Natural Phenomena. Ed. by Chiba, N. & Galin, E. The Eurographics Association, 2006.

[19] Slomp, M. et al. “Photorealistic real-time rendering of spherical raindrops with hierarchical reflective and refractive maps”. Journal of Visualization and Computer Animation 22.4 (2011), pp. 393–404.

[20] Puig-Centelles, A. et al. “Multiresolution techniques for rain rendering in virtual environments”. International Symposium on Computer and Information Sciences. 2008, pp. 1–4.

[21] Puig-Centelles, A. et al. “Creation and control of rain in virtual environments”. The Visual Computer 25.11 (2009), pp. 1037–1052.

[22] Puig-Centelles, A. et al. “Rain Simulation in Dynamic Scenes”. International Journal of Creative Interfaces and Computer Graphics 2.2 (2011), pp. 23–36.

[23] Creus, C. & Patow, G. A. “R4: Realistic Rain Rendering in Realtime”. Computers and Graphics 37.2 (2013), pp. 33–40.

[24] Rousseau, P.et al. “Realistic Real-time Rain Rendering”. Computer and Graphics 30.4 (2006), pp. 507–518.

[25] Wang, L. et al. “Real-Time Rendering of Realistic Rain”. ACM SIGGRAPH. 2006.

[26] Rousseau, P.et al. “GPU Rainfall”. Journal of Graphics, GPU, and Game Tools 13.4 (2008), pp. 17–33.

145 [27] Coutinho, B. B. et al. “Rain Scene Animation through Particle Systems and Surface Flow Simulation by SPH”. Proceedings of SIBGRAPI (2010), pp. 255–262.

[28] Tariq, S. “Rain”. Nvida White Paper. 2007.

[29] Starik, S. & Werman, M. “Simulation of Rain in Videos”. International workshop on texture analysis and synthesis. 2003.

[30] Mizukami, Y. et al. “Realistic Rain Rendering”. GRAPP - International Conference on Computer Graphics Theory and Applications (2008), pp. 273–280.

[31] Wang, N. & Wade, B. “Rendering Falling Rain and Snow”. Proceedings of ACM SIG- GRAPH (2004), p. 14.

[32] Wang, C. et al. “Real-Time Modeling and Rendering of Raining Scenes”. The Visual Computer 24.7 (2008), pp. 605–616.

[33] Weber, Y. et al. “A Multiscale Model for Rain Rendering in Real-time”. Computer and Graphics 50 (2015), pp. 61–70.

[34] Wang, C. et al. “Realistic Simulation for Rainy Scene”. Journal of Software 10.1 (2015), pp. 106–115.

[35] Feng, Z.-X. et al. “Real-time rain simulation in cartoon style”. International Confer- ence on Computer Aided Design and Computer Graphics. IEEE Computer Society, 2005.

[36] Yang, Y. et al. “Design and Realtime Simulation of Rain and Snow based on LOD and Fuzzy Motion”. ICPCA - International Conference on Pervasive Computing and Applications (2008), pp. 510–513.

[37] Barnum, P.C. et al. “Analysis of Rain and Snow in Frequency Space”. International Journal of Computer Vision 86.2-3 (2010), pp. 256–274.

[38] Fearing, P.“Computer Modelling Of Fallen Snow”. Proceedings of AMC SIGGRAPH (2000), pp. 37–46.

[39] Feldman, B. E. & O’Brien, J. F.“Modeling the Accumulation of Wind-driven Snow”. Proceedings of ACM SIGGRAPH (2002), p. 218.

146 [40] Fedkiw, R. et al. “Visual Simulation of Smoke”. Proceedings of ACM SIGGRAPH (2001), pp. 15–22.

[41] Haglund,˙ H. et al. “Snow Accumulation in Real-Time”. Proceedings of SIGRAD (2002), pp. 11–15.

[42] Moeslund, T. B. et al. “Modeling Falling and Accumulating Snow”. International Conference on Vision, Video and Graphics. 2005.

[43] Zou, C. et al. “Algorithm for generating snow based on GPU”. Proceedings of ICIMCS (2010), pp. 199–202.

[44] Langer, M. S. et al. “A Spectral-particle Hybrid Method for Rendering Falling Snow”. Proceedings of EGSR (2004), pp. 217–226.

[45] Ohlsson, P.& Seipel, S. “Real-time Rendering of Accumulated Snow”. Proceedings of SIGRAD. 2004, pp. 25–32.

[46] Reynolds, D. T. et al. “Real-time Accumulation of Occlusion-based Snow”. The Visual Computer 31.5 (2015), pp. 689–700.

[47] Tokoi, K. “A Shadow Buffer Technique for Simulating Snow-Covered Shapes”. Pro- ceedings of CGIV (2006), pp. 310–316.

[48] Foldes, D. & Benes, B. “Occlusion-Based Snow Accumulation Simulation”. VRIPHYS - Workshop in Virtual Reality Interactions and Physical Simulation. Ed. by Dingliana, J. & Ganovelli, F.The Eurographics Association, 2007.

[49] Saltvik, I. et al. “Parallel Methods for Real-time Visualization of Snow”. Proceedings of PARA (2007), pp. 218–227.

[50] Festenberg, N. v. & Gumhold, S. “A Geometric Algorithm for Snow Distribution in Virtual Scenes”. Proceedings of Eurographics (2009), pp. 17–25.

[51] Hinks, T. & Museth, K. “Wind-driven snow buildup using a level set approach”. Proceedings of Eurographics 9 (2009), pp. 19–26.

[52] Zhang, J. et al. “Rendering snowing scene on GPU”. Proceedings of ICIS 3 (2010), pp. 199–202.

147 [53] Tan, Y. et al. “Real-Time Snowing Simulation Based on Particle Systems”. 3 (2009), pp. 7–11.

[54] Tan, J. & Fan, X. “Particle System Based Snow Simulating in Real Time”. Proceedings of ESIAT 10 (2011), pp. 1244–1249.

[55] Fan, N. & Zhang, N. “Real-time Simulation of Rain and Snow in Virtual Environment”. International Conference on Industrial Control and Electronics Engineering. 2012.

[56] Ding, W. et al. “Real-time rain and snow rendering”. International Conference on Agro-Geoinformatics (2013), pp. 32–35.

[57] Stomakhin, A. et al. “A material point method for snow simulation”. ACM Transac- tions on Graphics 32.4 (2013), 102:1–102:10.

[58] Sulsky, D. et al. “A particle method for history-dependent materials”. Computer Methods in Applied Mechanics and Engineering (1994), pp. 179–196.

[59] Harlow, F. H. “The Particle-in-Cell Method for Numerical Solution of Problems in Fluid Dynamics”. Proceedings of Symposium on Applied Mathematics 15.269 (1963).

[60] Wong, S.-K. & Fu, I.-T. “Hybrid-based Snow Simulation and Snow Rendering with Shell Textures”. Computer Animation and Virtual Worlds 26.3-4 (2015), pp. 413–421.

[61] Tatarchuk, N. “Artist-Directable Real-Time Rain Rendering in City Environments”. Proceedings of SI3D (2006), p. 30.

[62] Fine, I. & Jacobs, R. A. “Modeling the Combination of Motion, Stereo, and Vergence Angle Cues to Visual Depth”. Neural Computation 11.6 (1999), pp. 1297–1330.

[63] McAllister, D. F.“Display Technology: Stereo and 3D Technologies”. Encyclopaedia on Imaging Science and Technology (2006), pp. 1327–1344.

[64] Goldstein, E. B. Sensation and Perception. 8th ed. Wadsworth Publishing, 2009.

[65] Cumming, B. G. & DeAngelis, G. C. “The Physiology of Stereopsis”. Annual Review of Neuroscience 24.1 (2001), pp. 203–238.

148 [66] Jones, G. R. et al. “Controlling perceived depth in stereoscopic images”. Proceedings of SPIE 4297 (2001), pp. 42–53.

[67] Zhang, Z. & McAllister, D. F.“A uniform metric for anaglyph calculation”. Proceedings of SPIE 6055 (2006), pp. 605513–605513–12.

[68] McAllister, D. F. et al. “Methods for computing color anaglyphs”. Proceedings of SPIE 7524 (2010), 75240S–75240S–12.

[69] Jorke, H. & Fritz, M. “Stereo projection using interference filters”. Proceedings of SPIE 6055 (2006), 60550G–60550G–8.

[70] Dodgson, N. A. “Autostereoscopic 3D Displays”. Computer 38.8 (2005), pp. 31–36.

[71] Liu, J. et al. “Three-dimensional PC: toward novel forms of human-computer inter- action”. 3D Video and Display Devices and Systems SPIE (2000), pp. 5–8.

[72] Hussain, S. A. & McAllister, D. F.“Stereo rendering of rain in real-time”. Proceedings of SPIE 8648 (2013), 86480B–86480B–11.

[73] Wara, C. “Dynamic Stereo Displays”. Proceedings of SIGCHI. Denver, CO, 1995.

[74] Rademacher, P.et al. “Measuring the Perception of Visual Realism in Images”. Pro- ceedings of Eurographics. London, UK, 2001.

[75] Ferwerda, J. “Three varieties of realism in computer graphics”. Santa Clara, CA, 2003.

[76] Adelson, S. J. & Hodges, L. F.“Stereoscopic ray-tracing”. The Visual Computer 10.3 (1993), pp. 127–144.

[77] Reeves, W. T. “Particle Systems&Mdash;a Technique for Modeling a Class of Fuzzy Objects”. ACM Transections on Graphics 2.2 (1983), pp. 91–108.

[78] Marshall, J. S. & Palmer, W. M. “The distribution of raindrops with size”. Journal of Meteorology 5.4 (1948), pp. 154–166.

[79] Hussain, S. A. & McAllister, D. F.“The Effectiveness of 2D-3D Converters in Rendering Natural Water Phenomena”. IJRTET 8.1 (2013), pp. 18–22.

149 [80] Arcsoft, Inc. Arcsoft MediaConverter - Converting 2D to 3D. URL: http://www. arcsoft.com/mediaconverter/?icn=Topics-Win8&ici=AMC8-Top-Learn- Button (visited on 01/03/2013).

[81] AxaraMedia, Ltd. 2D to 3D Video Converter: converting 2D to 3D video files and cre- ating 3D videos. URL: http://www.axaramedia.com/VideoSolutions (visited on 01/10/2013).

[82] SeaPhone Co., Ltd. Realtime 2D/3D converter DIANA-3D Plus. URL: http://www. texnai.co.jp/diana3d-plus/eng/index.html (visited on 01/08/2013).

[83] Leawo Software Co., Ltd. Best Video Converter download. URL: http://www.leawo. com/hd-video-converter (visited on 01/07/2013).

[84] Movavi. 2D to 3D Video Converter. URL: http://www.movavi.com/ (visited on 01/11/2013).

[85] McAllister, D. F. “Stereo Pairs from Linear Morphing”. Proceedings of SPIE 3295 (1998), pp. 46–52.

[86] Scharstein, D. & Szeliski, R. “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms”. International Journal of Computer Vision 47 (2002), pp. 7–42.

[87] Hattori, T. “2D-3D Image Converter by Sea Phone”. The Journal of Three Dimen- sional Images 23.2 (2009), pp. 36–39.

[88] Hattori, T. “Image Processing Concerning Individual Phenomena in Real Time 2D/3D Conversion Software DIANA for PCs”. The Journal of Three Dimensional Images 25.2 (2011), pp. 14–18.

[89] Park, M. et al. “Learning to Produce 3D Media From a Captured 2D Video”. IEEE Transactions on Multimedia 15.7 (2013), pp. 1569–1578.

[90] Zhang, L. et al. “3D-TV Content Creation: Automatic 2D-to-3D Video Conversion”. IEEE Transactions on Broadcasting 57.2 (2011), pp. 372–383.

[91] Saxena, A. et al. “Depth Estimation Using Monocular and Stereo Cues”. Proceedings of IJCAI (2007), pp. 2197–2203.

150 [92] Cheng, E. et al. “RMIT3DV: Pre-announcement of a creative commons uncom- pressed HD 3D video database”. Proceedings of QoMEX (2012). Ed. by Burnett, I. S., pp. 212–217.

[93] Mi˛edzynarodowy Zwi ˛azek Telekomunikacyjny. Subjective Assessment of Stereo- scopic TelevisionPictures - Recommendation ITU-R BT.1438. International Telecom- munication Union, 2000.

[94] Likert, R. “A technique for the measurement of attitudes”. Archives of Psychology 22.140 (1932), pp. 1–55.

[95] Cox, N. “Speaking Stata: Creating and varying box plots”. The Stata Journal 9.3 (2009), pp. 478–496.

[96] Hussain, S. A. & McAllister, D. F.“Stereo rendering of photorealistic precipitation”. Proceedings of SPIE 9 (2017), pp. 158–166.

[97] Hojlind, S. et al. “Why a single measure of photorealism is unrealistic”. 2014.

[98] MediaWiki. Wikimedia Commons. URL: https://commons.wikimedia.org (vis- ited on 01/10/2017).

[99] Jamieson, S. “Likert scales: how to (ab)use them”. Medical education 38.12 (2004), pp. 1217–1218.

[100] Stangroom, J. Social Science Statistics. URL: http://www.socscistatistics. com/tests/Default.aspx (visited on 01/11/2017).

151