ABSTRACT
HUSSAIN, SYED ASIF. Stereoscopic, Real-time, and Photorealistic Rendering of Natural Phenomena – A GPU based Particle System for Rain and Snow. (Under the direction of Dr. David F. McAllister and Dr. Edgar Lobaton.)
Natural phenomena exhibit a variety of forms such as rain, snow, fire, smoke, fog, and clouds. Realistic stereoscopic rendering of such phenomena has implications for scientific visualization, entertainment, and the video game industry. However, among all natural phenomena, creating a convincing stereoscopic view of computer-generated rain or snow, in real-time, is particularly difficult. Moreover, the literature on rendering precipitation in stereo is non-existent, and research in stereo rendering of other natural phenomena is sparse. A survey of recent work in stereo rendering of natural phenomena, such as vegetation, fire, and clouds, is conducted to analyze how stereoscopic rendering is implemented.
Similarly, a literature review of monoscopic rendering of rain and snow is completed to
learn about the behavior and distribution of particles in precipitation phenomena. From
these reviews, it is hypothesized that the monoscopic rendering of rain or snow can be
extended to stereo with real-time and photorealistic results. The goal of this study is to validate this hypothesis and demonstrate it by implementing a particle system using a
graphics processing unit (GPU).
The challenges include modeling realistic particle distributions, use of illumination
models, and the impact of scene dynamics due to environmental changes. The modern OpenGL Shading Language (GLSL) and the single instruction, multiple threads (SIMT) architecture are used to take advantage of data-parallelism in a graphics processor. The particle geometry is modeled by a few vertices, which are morphed into raindrop or snowflake shapes. Every vertex is processed in parallel using the SIMT GPU architecture. A compute shader, a program written for the GPU's new compute mode, is used to implement the effects of physical forces on rain or snow particles. Additionally, for rain, the concept of retinal persistence is used to elongate the raindrop so that it appears as a falling rain streak. Dynamic level of detail on rain streaks and snowflakes is implemented so that particles closer to the viewer have more visual detail than particles farther away.
Illumination models are applied for photorealistic output. The scene is rendered for the left-
and right-eye views to produce stereoscopic output, while reducing rendering complexity
by drawing some features such as object shadows only once.
Additional experiments are performed to evaluate and compare various 2D-3D software video converters. The goal of these experiments is to determine the effectiveness of the 2D-3D converters in producing realistic stereoscopic output of scenes containing water phenomena. Such scenes are challenging to convert due to scene complexity such as
details in scene dynamics, illumination, and reflective distortion. Comparisons between
five 2D-3D software video converters are provided by using quantitative and subjective
evaluations. The study concludes with experiments on the visual factors necessary to produce photorealistic output. The experimental method uses a series of controlled human experiments where participants are presented with video clips and still photographs of real precipitation. The stimuli vary along three visual factors: the number of particles, particle sizes, and their motion. The goal is to determine the statistical ranking and importance of these visual factors for producing a photorealistic output. The experiments
are extended to investigate if stereo improves photorealism. Experimental stimuli include
post-processing on rendered output to produce variable lighting, glow, and fog effects to
study their impact on photorealism as the stereo camera moves in the scene.

© Copyright 2017 by Syed Asif Hussain

All Rights Reserved

Stereoscopic, Real-time, and Photorealistic Rendering of Natural Phenomena – A GPU based Particle System for Rain and Snow
by Syed Asif Hussain
A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy
Computer Engineering
Raleigh, North Carolina
2017
APPROVED BY:
Dr. David F. McAllister, Co-chair of Advisory Committee
Dr. Edgar Lobaton, Co-chair of Advisory Committee
Dr. Edward Grant Dr. Gregory T. Byrd
Dr. Theodore R. Simons

DEDICATION
This dissertation is dedicated to my family, their unending support and love made this work
possible, and to my committee who gave me the necessary foundation to reach my goals.
BIOGRAPHY
Syed grew up in Pakistan. His favorite after school activity was to use his brother’s Sinclair ZX
Spectrum personal computer. This early introduction to computers inspired him to pursue higher education in computer engineering. As luck would have it, his brother won a scholarship to study at Duke University and encouraged Syed to follow his engineering dreams at NC State University. In 1994 Syed graduated from NC State with a Bachelor of Science degree in computer engineering with a minor in mathematics. He continued his graduate studies at NC State while working as a software engineer, which resulted in a Master of Science in electrical engineering, with thesis, in 1998. He went on to work in the software industry and moved to Massachusetts. In 2003 he was conferred an online degree from NC State, a Master of Engineering with a concentration in computer science. Syed came back to Raleigh to rejoin NC State with an ultimate goal: to be a Doctor of Philosophy. Along with achieving his research objectives, Syed has been teaching at a local community college since 2010.
ACKNOWLEDGEMENTS
I would like to acknowledge and thank every member of my committee. Special thanks
to my academic advisor Dr. McAllister for motivating me to move onward and upward.
His expertise is paramount in my understanding of the field. My accomplishments are incomplete without his support. My committee co-chair Dr. Lobaton has always been understanding, accessible, and open for discussion. His insight has led to many improvements in my work. Many thanks to Dr. Grant for his engaging lectures and for always being on my side. Dr. Byrd, always welcoming, has been instrumental in providing me with encouragement in times of need. Thank you for always keeping your door open and being
approachable, kind, and fair. Before anyone on my committee, there was Dr. Simons. He,
along with John Wettroth of Maxim Integrated, played a pivotal role in helping me find my way and pointing me in the direction of success. Many thanks to Dr. Perros; without him being approachable and open to discussing my plans, I would not have been able to form a committee of exceptional faculty. I am forever in your debt. Thank you.
Lastly, I must acknowledge my family. To my wife for giving me hope of success when I
needed it the most and to our daughters and son for giving meaning to our lives. Thank
you for all the joy and opportunities to provide unconditional love.
TABLE OF CONTENTS
LIST OF TABLES ...... viii
LIST OF FIGURES ...... ix
Chapter 1 Introduction ...... 1
  1.1 Research Objectives ...... 3
    1.1.1 Stereoscopic Rendering ...... 4
    1.1.2 Real-time Execution ...... 4
    1.1.3 Photorealistic Output ...... 5
    1.1.4 The 2D-3D Conversion ...... 5
    1.1.5 Perceptual Space and Measuring Photorealism ...... 6
  1.2 Research Contributions ...... 6
  1.3 Chapters Layout ...... 7
Chapter 2 Formation of Rain and Snow ...... 9
  2.1 Types of Rain and Snow ...... 10
    2.1.1 Convectional Rain ...... 11
    2.1.2 Frontal Rain ...... 12
    2.1.3 Relief or Orographic Rain ...... 13
    2.1.4 Dry Snow ...... 13
    2.1.5 Wet Snow ...... 14
  2.2 Rain Intensity, Size and Shape ...... 14
  2.3 Snow Intensity, Size and Shape ...... 15
Chapter 3 Literature Review ...... 19
  3.1 Stereoscopic Rendering of Natural Phenomenon ...... 20
  3.2 Monoscopic Rendering of Rain ...... 21
  3.3 Monoscopic Rendering of Snow ...... 30
  3.4 Computational Analysis ...... 38
    3.4.1 Rendering Performance – Rain ...... 38
    3.4.2 Rendering Performance – Snow ...... 43
Chapter 4 Stereo Rendering ...... 47
  4.1 Depth Perception ...... 48
  4.2 Psychological Depth Cues ...... 49
  4.3 Physiological Depth Cues ...... 51
  4.4 Creating Stereo Pairs ...... 55
  4.5 Viewing Stereo Pairs ...... 57
    4.5.1 Free Viewing ...... 59
    4.5.2 Time-parallel Viewing ...... 59
    4.5.3 Time-multiplexed Viewing ...... 62
  4.6 Challenges with Stereo Rendering ...... 63
Chapter 5 Implementation Framework ...... 66
  5.1 Graphics Hardware ...... 67
  5.2 Graphics Software ...... 68
  5.3 Stereoscopic Implementation ...... 69
  5.4 Real-time Implementation ...... 71
    5.4.1 Shader Mode ...... 72
    5.4.2 Compute Mode ...... 76
    5.4.3 Transform Feedback ...... 78
  5.5 Photorealistic Implementation ...... 80
    5.5.1 Illumination using a GPU ...... 81
  5.6 Particle System ...... 83
  5.7 Precipitation using the Particle System ...... 86
  5.8 Compute Mode Particle Simulation ...... 88
  5.9 Animation ...... 89
Chapter 6 Experiments and Results ...... 91
  6.1 Phase 1 – Real-time Stereo ...... 91
    6.1.1 Method 1 – Horizontal Parallax ...... 92
    6.1.2 Method 2 – Asymmetric View Frustum ...... 98
    6.1.3 Frame rate Comparison ...... 100
    6.1.4 Phase 1 – Conclusions ...... 102
  6.2 Phase 2 – Stereo from 2D-3D Converters ...... 103
    6.2.1 Overview ...... 104
    6.2.2 Depth Estimation Techniques ...... 105
    6.2.3 Quantitative Experiments and Results ...... 107
    6.2.4 Subjective Experiments and Results ...... 115
    6.2.5 Rain and Snow Rendering ...... 118
    6.2.6 Runtime ...... 121
    6.2.7 Phase 2 – Conclusions ...... 123
  6.3 Phase 3 – Measuring Photorealism ...... 123
    6.3.1 Visual Factors for Photorealism ...... 124
    6.3.2 Measuring Photorealism ...... 125
    6.3.3 Experiment 1 – Perceptual Space ...... 126
    6.3.4 Experiment 2 – Other Visual Factors ...... 132
    6.3.5 Experiment 3 – Photorealism and Stereo ...... 135
    6.3.6 Phase 3 – Conclusions ...... 137
Chapter 7 Future Enhancements ...... 139
  7.1 Summary ...... 139
  7.2 Future Extensions ...... 141
REFERENCES ...... 144
LIST OF TABLES
Table 2.1 Snowflake shapes. Image Credit: Snowcrystals.com ...... 17
Table 3.1 Monoscopic rain rendering literature ...... 29
Table 3.2 Monoscopic snow rendering literature ...... 39
Table 3.3 Performance summary of monoscopic rain rendering ...... 43
Table 3.4 Performance summary of monoscopic snow rendering ...... 46

Table 6.1 Frame rate comparison ...... 102
Table 6.2 Parallax (in pixels) for objects closer to the camera ...... 112
Table 6.3 Parallax (in pixels) for objects farther from the camera ...... 112
Table 6.4 Normalized MSE between baseline and 2D-3D converters ...... 115
Table 6.5 Survey questions for visual assessments ...... 116
Table 6.6 Frame rate comparison for rain and snow rendering ...... 120
Table 6.7 Survey questions to determine the perceptual space ...... 129
Table 6.8 Descriptive statistics – rain/snow to determine perceptual space ...... 130
Table 6.9 Survey questions to determine other important visual factors ...... 133
Table 6.10 Descriptive statistics – rain/snow to determine other visual factors ...... 135
Table 6.11 Response frequency of rain/snow ...... 136
LIST OF FIGURES
Figure 2.1 Conversion of water from one form to another ...... 10
Figure 2.2 Convectional Rain ...... 11
Figure 2.3 Frontal Rain ...... 12
Figure 2.4 Relief Rain ...... 13
Figure 2.5 Raindrop changes shape as it grows in size ...... 16

Figure 4.1 Stereo pair as viewed by the two eyes ...... 48
Figure 4.2 Gustave Caillebotte. Paris Street, Rainy Day, 1877. Image Credit: Art Institute of Chicago ...... 50
Figure 4.3 Binocular disparity ...... 52
Figure 4.4 Accommodation - changes in lens thickness accommodate focus on objects ...... 53
Figure 4.5 Vergence - inward or outward movement of both eyes to converge on objects ...... 54
Figure 4.6 Motion parallax - objects closer to viewer appear to move faster ...... 55
Figure 4.7 Stereo visible where the two view frustums overlap (top view) ...... 56
Figure 4.8 Stereo window and horizontal parallax ...... 58

Figure 5.1 Transformation from 3D world space to 2D screen space ...... 70
Figure 5.2 Fragment shader (stereo view with red-cyan anaglyph glasses) ...... 74
Figure 5.3 Input and output of tessellation shader ...... 75
Figure 5.4 Modern graphics pipeline ...... 77
Figure 5.5 Implementation of a particle system using transform feedback ...... 79
Figure 5.6 Particle system block diagram ...... 84
Figure 5.7 Simulation and rendering loops ...... 88

Figure 6.1 Symmetric view frustum with stereo overlap (top view) ...... 93
Figure 6.2 Generating stereo rain streak (top view) ...... 94
Figure 6.3 Rain billboard and mipmaps (side view) ...... 95
Figure 6.4 Single camera setup to add parallax to rain streaks ...... 97
Figure 6.5 Method 1 output (stereo view with red-cyan anaglyph glasses) ...... 98
Figure 6.6 Symmetric view frustum (top view) ...... 99
Figure 6.7 Asymmetric view frustum (top view) ...... 100
Figure 6.8 Method 2 output (stereo view with red-cyan anaglyph glasses) ...... 101
Figure 6.9 Baseline input image to test depth from linear perspective ...... 108
Figure 6.10 Baseline output image: Test case C-1 ...... 108
Figure 6.11 Output of Axara 3D software video converter ...... 109
Figure 6.12 Depth from occlusion: Test case C-2 ...... 110
Figure 6.13 Depth from object placement: Test case C-3 ...... 110
Figure 6.14 Depth from stereoscopic camera: Test cases C-4, C-5 and C-6 ...... 111
Figure 6.15 Axara 3D output for a monoscopic rendered rain scene ...... 117
Figure 6.16 Variation in rating from twenty-five participants ...... 119
Figure 6.17 One of the eight visual stimuli for rain and snow scene ...... 128
Figure 6.18 Rain scenes - response to number of particles, size, and motion ...... 130
Figure 6.19 Snow scenes - response to number of particles, size, and motion ...... 131
Figure 6.20 Rain scenes - response to lighting, glow, and fog ...... 134
Figure 6.21 Snow scenes - response to lighting, glow, and fog ...... 134
CHAPTER 1
INTRODUCTION
Stereoscopic rendering produces the left- and right-eye views of the same scene from two different perspectives. True 3D animations and movie special effects demand a realistic viewing experience. Similarly, software applications in virtual reality (VR) systems, which have become mainstream, must produce stereoscopic output for the viewer to have an immersive feel. In movies, video games, and VR applications, the use of natural phenomena such as rain or snow often enhances the scene or the storyline. Stereoscopic rendering of precipitation is also relevant to scientific visualization applications, such as the study of various weather phenomena. However, creating a convincing stereoscopic rendering of rain or snow, in real-time, is particularly difficult. The dynamics of falling particles, their interaction with light, collision with other objects in the environment, and the effects of external forces such as gravity and wind make them complex phenomena to simulate and render.
The focus of this study is to develop techniques for the stereoscopic, real-time, and photorealistic rendering of natural phenomena, such as rain or snow. The dynamics of such
phenomena are simulated and rendered in stereo at a real-time frame rate with photorealistic output by taking advantage of the data-parallelism, programmability, and new features of current graphics processing units (GPUs). Many commercially available 2D-3D software video converters can also produce stereoscopic outputs. How well these outputs compare to
rendered stereo output is examined. The visual factors necessary to produce photorealistic
output in precipitation scenes are also measured by performing subjective experiments.
Several studies have explored computer rendering of natural phenomena. A comprehensive survey of recent work done in data acquisition and simulation techniques needed to render natural phenomena is provided by Zhao [1]. The majority of these studies use monoscopic rendering, where depth sensation is simulated by using monoscopic cues such as perspective, occlusion, relative size, atmospheric haze, motion parallax, shading, and
shadows. Although monoscopic rendering provides some information about depth, the viewer lacks the ability to look beyond or around objects in that scene. The use of stereoscopic rendering can remedy this shortfall by providing true 3D depth sensation, which
is important in many computer graphics applications. However, research in stereoscopic
rendering of natural phenomena has been limited because visually accurate stereoscopic
output is challenging to produce without causing visual fatigue. These challenges include accounting for variations in stereo perception from person to person. Approximately five percent of the general population is stereoscopically latent or stereo blind and cannot sense depth [2]. Stereo is also difficult to perceive in quickly changing scenes, such as those with fast-moving objects or a fast-moving camera. Despite these challenges, a stereoscopic view of precipitation
greatly enhances a scene giving an immersive feel in applications of virtual reality, scientific visualization, video games and movies.
Stereoscopic rendering also presents many challenges. Due to the appearance of depth in a stereo view, objects in the background become much more noticeable. Thus scenes that may appear visually appealing in monoscopic view, because the background is ignored, may appear distracting in stereo. Therefore, stereo extension of existing monoscopic rendering techniques is challenging. Additionally, the monoscopic and stereo
depth cues are additive. If these are combined incorrectly, they can create a conflict in
depth cues. This may result in eyestrain and an unpleasant viewing experience. Other
challenges include proper scene illumination such as use of light reflection and refraction
models, impact of scene dynamics due to environmental changes, such as wind speed, and
scene synchronization with natural sound effects.
Additionally, monoscopic techniques do not provide the necessary depth information
to permit rendering of left- and right-eye views with the correct parallax needed for stereo.
A technique that appears feasible to implement in real-time for a monoscopic environment
can become too computationally expensive to implement in stereo. Real-time applications
often opt for using heuristics to give an illusion of visual correctness instead of applying
physically accurate simulation models. Such rendering shortcuts may give seemingly realistic output on a monoscopic display. However, their visual accuracy on stereoscopic displays
is an open research question.
1.1 Research Objectives
The objective of this research is to demonstrate that stereoscopic, real-time, and photorealistic rendering of a rain or snow scene is achievable with a contemporary programmable GPU. A major contribution of this study is to extend the monoscopic statistical precipitation models to simulate the behavior and distribution of falling particles for stereo viewing.
The application must respond to new parameters and make necessary adjustments, in
real-time, so that the user continues to view a high quality, photorealistic, stereo output without any visual fatigue. An additional goal of this research is to measure the quality of
stereo output from various 2D-3D video software converters and compare it with the rendered results of this study by analyzing answers to survey questions. Moreover, a series of
subjective experiments are performed to determine the statistical ranking and importance
of visual factors in precipitation scenes important for photorealism, such as number of
particles, size, and motion.
1.1.1 Stereoscopic Rendering
The stereo problem poses the biggest challenge since stereo perception varies from person
to person. The correct use of depth cues without conflict is important. Monoscopic rendering methods are extended using the modern Open Graphics Library (OpenGL) and corresponding OpenGL Shading Language (GLSL) programs that run directly on a GPU to obtain the best stereo results.
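For illustration, the screen parallax that a parallel-camera stereo pair produces can be sketched with a few lines of code. This Python fragment is not part of the dissertation's OpenGL/GLSL implementation; the function name and the 65 mm eye separation are assumptions made for this example only.

```python
# Hypothetical sketch: horizontal screen parallax for a parallel-camera
# stereo model. A point on the zero-parallax (screen) plane has zero
# parallax; distant points approach the full eye separation.

def screen_parallax(eye_separation: float, screen_dist: float, obj_dist: float) -> float:
    """Parallax (same units as eye_separation) of a point at obj_dist when
    the zero-parallax plane is placed at screen_dist."""
    if obj_dist <= 0:
        raise ValueError("object distance must be positive")
    # Similar triangles: p = e * (Z - D) / Z
    return eye_separation * (obj_dist - screen_dist) / obj_dist

# Eye separation 65 mm (a commonly quoted average), screen plane at 2 m.
print(screen_parallax(65.0, 2000.0, 2000.0))  # 0.0  (on the screen plane)
print(screen_parallax(65.0, 2000.0, 1000.0))  # -65.0 (negative parallax, in front of screen)
```

A point twice as far as the screen plane yields half the eye separation as positive parallax, which is why distant rain particles need only coarse depth resolution.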
1.1.2 Real-time Execution
The definition of real-time is taken from Akenine-Möller et al. [3]. A viewpoint change should result in a redraw of the newly rendered scene at a minimum rate of 15 frames per second (fps). This is a lower bound on real-time frame rates. Higher frame rates are desirable and achievable with modern hardware. The user should not observe any flicker or discontinuous motion, since such visual artifacts would have a detrimental effect on stereo viewing. At least 120 fps is desired for flicker-free stereo viewing in an interactive gaming environment. Graphics acceleration hardware is used to meet real-time requirements.
Current hardware offers a programmable graphics pipeline that runs on a GPU using
multiple independent parallel execution threads, i.e. a single instruction runs on multiple
independent threads at the same time in lockstep. This type of parallel architecture, referred
to as Single Instruction Multiple Threads (SIMT), is well suited for solving many graphics
problems in real-time.
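The frame-rate targets above translate directly into per-frame time budgets, which is how they constrain the renderer in practice. The following arithmetic sketch is illustrative only and not taken from the dissertation:

```python
# Illustrative arithmetic: converting the frame-rate targets quoted above
# into the time available to simulate and render a single frame.

def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps

print(round(frame_budget_ms(15), 2))   # 66.67 ms (real-time lower bound)
print(round(frame_budget_ms(60), 2))   # 16.67 ms (typical display refresh)
print(round(frame_budget_ms(120), 2))  # 8.33 ms  (flicker-free stereo target)
```

Since stereo requires two views, the 120 fps target leaves roughly 4 ms of work per eye per frame, which motivates the data-parallel GPU approach.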
1.1.3 Photorealistic Output
Photorealism is the process of image generation by simulating the physical interaction of
light in an environment and how light from the source is reflected and scattered through
the environment. A further goal is to determine how local and global illumination models make precipitation rendering realistic. The challenge is to produce photorealistic output in
real-time with stereoscopic rendering, which is one of the major goals of this study.
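The local illumination components referred to throughout this study (ambient, diffuse, and specular terms) can be illustrated with the classic Phong reflection model. The sketch below is a scalar Python analogue written for this summary, assuming unit light intensity and hypothetical coefficient values; the dissertation's actual implementation runs in GLSL on the GPU.

```python
# A minimal sketch of the classic Phong reflection model: ambient, diffuse,
# and specular terms for one light of unit intensity. The coefficient values
# (ka, kd, ks, shininess) are arbitrary assumptions for the example.

def normalize(v):
    n = sum(c * c for c in v) ** 0.5
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong(normal, to_light, to_eye, ka=0.1, kd=0.7, ks=0.2, shininess=32):
    """Scalar Phong intensity at a surface point."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_eye)
    diffuse = max(dot(n, l), 0.0)
    # Reflection of the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = max(dot(r, v), 0.0) ** shininess if diffuse > 0 else 0.0
    return ka + kd * diffuse + ks * specular

# Light and viewer both along the surface normal: full diffuse and specular.
print(round(phong((0, 0, 1), (0, 0, 1), (0, 0, 1)), 2))  # 1.0
```

With the light behind the surface, only the ambient term survives, which is why ambient light keeps back-facing particles from rendering fully black.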
1.1.4 The 2D-3D Conversion
The alternative to simulating and rendering stereo precipitation is to apply 2D-3D conversion using commercially available software applications. The goal is to measure the
quality of stereo output of 2D-3D software video converters and compare the results with
the output of this study. The quality of the stereo output is measured in two ways. It is
quantitatively measured by comparing the horizontal disparity produced by various 2D-3D
converters with the known disparity taken from baseline images. It is also measured subjectively by asking participants survey questions regarding the quality of the visual experience when viewing 2D-3D video conversions in stereo.
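The quantitative comparison described above can be sketched as a normalized mean squared error between converter-produced and baseline disparities. This is an illustrative Python sketch; the function name and the choice of normalizing by the baseline's mean squared value are assumptions of this example, and the dissertation's exact metric (Chapter 6, Table 6.4) may be defined differently.

```python
# Hypothetical sketch: compare the horizontal disparities produced by a
# 2D-3D converter against known baseline disparities using a normalized MSE.

def normalized_mse(baseline, measured):
    """MSE between disparity lists, normalized by the baseline's mean power."""
    if len(baseline) != len(measured) or not baseline:
        raise ValueError("disparity lists must be non-empty and equal length")
    mse = sum((b - m) ** 2 for b, m in zip(baseline, measured)) / len(baseline)
    power = sum(b * b for b in baseline) / len(baseline)
    return mse / power if power else mse

# A converter reproducing the baseline disparities exactly scores 0.
print(normalized_mse([12.0, 8.0, 4.0], [12.0, 8.0, 4.0]))  # 0.0
```

Normalizing makes scores comparable across test images whose absolute disparity ranges differ.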
1.1.5 Perceptual Space and Measuring Photorealism
The perceptual space is defined as the visual experience of a precipitation scene as observed
from the ground. From the literature review of related work in computer generation of rain
and snow it is concluded that there are three key visual factors influencing the perceptual
space, thus important for producing photorealistic results. They are number, size, and
movement of particles. The experiments performed in this study measure the statistical
ranking and importance of the three visual factors. Additionally, visual stimuli varying along the three factors are used in subjective experiments where participants are asked to
respond to survey questions. Photorealism is quantified by analyzing the results from these
experiments.
1.2 Research Contributions
The major results presented in the latter chapters of this work are the following:
I. Stereoscopic rendering is achieved by using GPU quad buffers. This type of display uses
active shutter goggles to view in stereo.
II. A real-time frame rate of 60 frames per second (fps) per eye is achieved when vertical
sync is turned on. The vertical sync synchronizes GPU output to the display screen
refresh rate, which is typically set at 60 fps.
III. Photorealistic output is achieved by simulating global light effects with reflection, specular, ambient, and diffuse lighting components.
IV. Stereo output is compared and evaluated against commercially available 2D-3D
software video converters.
V. Perceptual space for rain and snow is determined and visual effects of photorealism
are measured by performing survey experiments.
1.3 Chapters Layout
A summary of the remaining chapters follows.
Chapter 2: In order to accurately simulate and render a stereoscopic scene with rain or snow, it is important to understand the attributes and various properties associated with such natural phenomena. This chapter provides a general description of rain and snow formation in nature. Various types of natural precipitation are discussed along with the attributes that define them, such as particle shape, size, and intensity.
Chapter 3: A review of recent literature in monoscopic rendering of rain and snow is
provided. Stereoscopic rendering of natural phenomena such as fire, trees, and clouds is also reviewed. The literature review provides an overview of various monoscopic and stereoscopic rendering techniques, including methods to simulate the dynamics of rain and
snowfall. It also details various algorithms for rain splash and snow accumulation along with listing computational performance data. The topics covered in the chapter provide a
comprehensive background needed to achieve the research objectives.
Chapter 4: This chapter details stereoscopic depth perception and compares it with monoscopic depth cues. Methods of creating stereo pairs and their viewing techniques
are described. Various methods of displaying stereo rendering and related issues are also
discussed.
Chapter 5: An implementation framework of rendering precipitation phenomena in
stereo is given. Such implementation is computationally expensive and an application
running only on a CPU may not have the capacity to maintain a real-time frame rate. Therefore, the implementation takes advantage of hardware acceleration by a GPU. The graphics hardware and rendering pipeline used for the implementation of the demonstration application are detailed in this chapter. It also provides detail about how to achieve a real-time and
photorealistic implementation. Particle system implementation and animation using a programmable GPU are described. Such particle systems can be modified to show other
natural phenomena such as mist, drizzle, hail or even falling autumn leaves.
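The core of any such particle system is a per-particle update step. The fragment below is a CPU-side Python analogue written for this summary, not the dissertation's GPU compute shader; the gravity constant is standard, while the respawn height and ground plane at zero are assumed scene parameters.

```python
# Hypothetical CPU-side sketch of one particle-simulation step: forward
# Euler integration under gravity plus wind, with particles respawning at
# the top of the volume once they fall below the ground plane (y = 0).

GRAVITY = (0.0, -9.8, 0.0)   # m/s^2
CEILING = 10.0               # assumed respawn height for this example

def step(particles, wind, dt):
    """Advance a list of (position, velocity) tuples by dt seconds."""
    out = []
    for (px, py, pz), (vx, vy, vz) in particles:
        ax, ay, az = (g + w for g, w in zip(GRAVITY, wind))
        vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
        px, py, pz = px + vx * dt, py + vy * dt, pz + vz * dt
        if py < 0.0:  # fell below ground: respawn at the top, at rest
            py, vx, vy, vz = CEILING, 0.0, 0.0, 0.0
        out.append(((px, py, pz), (vx, vy, vz)))
    return out

snow = [((0.0, 5.0, 0.0), (0.0, 0.0, 0.0))]
snow = step(snow, wind=(0.5, 0.0, 0.0), dt=0.1)
print(snow[0][0][1] < 5.0)  # True: the flake has fallen
```

On the GPU the same loop body runs once per SIMT thread, one particle per thread, which is what makes hundreds of thousands of particles tractable in real-time.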
Chapter 6: In this chapter various experiments and results are presented including:
computational complexity, a comparison between the outputs of various 2D-3D software video converters, and findings from the survey experiments to determine perceptual space
of rain and snow.
Chapter 7: The last chapter discusses future works and extensions to this study.
CHAPTER 2
FORMATION OF RAIN AND SNOW
This chapter provides the background needed to understand the formation of rain or snow in nature, and thus a better understanding of how to accurately simulate and render such processes. In nature, the causes of precipitation are complex, but the
processes involved are always the same. The Earth’s atmosphere varies in temperature with
altitude. Generally, air at ground level is warmer. As altitude increases, the air temperature
falls. The cooling causes the moisture in the air to condense and form clouds. There are
three basic cloud types: cumulus, stratus, and cirrus, meaning heaped, spread out, and a lock of hair, respectively. To describe clouds capable of precipitation, the term nimbus,
meaning rain, is added [4]. When condensed moisture in nimbus clouds becomes heavy, it falls as rain. It is liquid precipitation different from drizzle. Relative to rain, drizzle droplets are far smaller and more numerous, which results in poor visibility and produces a fog-like effect. If the air temperature falls even further, the moisture in the air condenses into tiny ice crystals to form snowflakes. If enough ice crystals stick together, they become heavy and fall to the ground as snow. This cycle, in which surface moisture converts into water vapor and comes down in various forms of precipitation, is called the water cycle, illustrated in Figure 2.1.
Figure 2.1 Conversion of water from one form to another
2.1 Types of Rain and Snow
Rain is classified into three types, namely convectional, frontal, and relief (also known as orographic) rain [5]. These types produce rain with variable intensity, droplet size, and shape. The same winds that form conditions for rain can form snow, but the air temperature must be close to freezing, between 0 and 2 °C (32 to 35.6 °F). Snow contains a certain amount of moisture. On average, 127 mm (5 inches) of snow melts into 12.7 mm (0.5 inches) of water. Based on the amount of moisture it contains, snow can be characterized into two major types, namely dry and wet snow [6]. The following sections further describe rain and snow types and associated properties.
2.1.1 Convectional Rain
This type of rain is commonly formed in the tropics. It tends to form when the sun is the warmest. Because land warms more quickly than the sea, convectional rain generally forms
over land. It produces cumulus and rapidly towering cumulonimbus clouds, which result in short but heavy rain interspersed with sunny periods. Figure 2.2 illustrates this process.
As the air moves upward and away from the relatively warmer ground surface, it cools down.
Figure 2.2 Convectional Rain
The cooler air is unable to hold as much water vapor as it can when it is warmer. Eventually
the temperature of the rising air reaches the dew point, the temperature at which air cannot hold any more water vapor. At the dew point, the cooler air condenses. Condensation is the process by which the water vapor held in the air is turned back into liquid water droplets, which fall as rain.
2.1.2 Frontal Rain
Frontal rain is common in temperate regions. This type of rain takes a long time to arrive.
Initially, high altitude cirrus clouds are formed. Subsequently, they become layered and turn into stratus clouds that eventually become rain-producing nimbostratus clouds. Frontal rain usually starts slowly and remains steady for several hours. It forms when two air masses meet. If a warm air mass meets a cold air mass, a warm front forms. Conversely, a cold front is formed when a cold air mass meets a warm air mass, as shown in Figure 2.3.
Figure 2.3 Frontal Rain
In a warm front, the warm air, being less dense, gently slides over the cold air. As it rises, it cools and condenses into clouds. In a cold front, since cold air is denser, it forces its way under the warm air. Consequently, the warm air is forced up quickly, which leads to large cumulonimbus clouds producing heavy rain, often with thunder and lightning.
2.1.3 Relief or Orographic Rain
This rain type is characterized by thick clouds that produce light rain conditions. It can occur at any time over mountainous terrain. The warm air over an ocean is forced to rise when it encounters a high land mass. As this moist air gains height, it gets cooler. Water vapor in this air mass gradually condenses to form rain clouds. Figure 2.4 illustrates this
phenomenon.
Figure 2.4 Relief Rain
2.1.4 Dry Snow
Dry snow, sometimes called powdery snow, is formed when snowflakes pass through drier and cooler air at temperatures below or close to freezing. Dry snow contains below-average moisture, sometimes producing 12.7 mm (0.5 inch) of water for every 508 mm (20 inches)
of snow, thus accumulating to greater depths. This type of snow produces a large number of smaller snowflakes, resulting in a powdery texture that can drift more easily with wind. Such snow does not stick together easily, is easy to shovel, and is perfect for winter sports.
2.1.5 Wet Snow
Wet snow is created when the temperature is slightly warmer than freezing. At this temperature, snowflakes melt around the edges and stick together, resulting in a smaller number of bigger, denser snowflakes. Wet snow contains above-average moisture, producing 25.4 mm (1 inch) of water for every 127 mm (5 inches) of snow. Such snow is easy to pack together into various snow sculptures but is much heavier to shovel.
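The moisture figures quoted in this chapter each imply a snow-to-liquid ratio, i.e., the depth of snow that melts to a unit depth of water. The short sketch below only restates that arithmetic; the "dry/average/wet" labels are this example's, not a formal classification.

```python
# Illustrative arithmetic for the moisture figures quoted above: each snow
# type implies a snow-to-liquid ratio (snow depth per unit depth of melt water).

def snow_to_liquid_ratio(snow_depth_mm: float, water_depth_mm: float) -> float:
    return snow_depth_mm / water_depth_mm

print(round(snow_to_liquid_ratio(508.0, 12.7), 1))  # 40.0 : dry (powdery) snow
print(round(snow_to_liquid_ratio(127.0, 12.7), 1))  # 10.0 : average snow
print(round(snow_to_liquid_ratio(127.0, 25.4), 1))  # 5.0  : wet snow
```

For a snowfall simulation, this ratio is a convenient single parameter tying a chosen snow type to particle density and accumulation depth.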
2.2 Rain Intensity, Size and Shape
Variation in the number of raindrops impacts rain intensity, which is measured as the volume of water accumulated per unit of time. The more raindrops, and therefore water, the greater
the rain intensity, which is classified into three categories: light, moderate, and heavy rain.
The rate of light rain varies between a trace and 2.5 mm/hr (millimeters per hour). Moderate
rain falls at a rate between 2.5 and 7.5 mm/hr. Rain is classified as heavy if the rate is more than 7.5
mm/hr. The size of a raindrop is typically greater than 0.5 mm in diameter. In widely scattered
rain, the drops may be smaller, down to 0.1 mm. In general, a raindrop size can vary from 0.1
mm to 9 mm across. However, droplets larger than about 4 mm can become unstable and
split into smaller droplets. The probability of a raindrop breaking up is given as a function of its size and increases exponentially for droplet sizes beyond 3.5 mm [7]. The shape of a raindrop is the result of a tug-of-war between the surface tension of the raindrop and the pressure of the air pushing up against the bottom of the drop as it falls. When the drop is smaller than 2 mm, surface tension wins and pulls the drop into a spherical shape. The fall velocity increases as a raindrop gets bigger, which causes the pressure on the bottom to increase. The raindrop becomes more oblate, with a flatter surface facing the oncoming airflow. Larger drops become increasingly flattened at the bottom, forming a depression. In raindrops that are greater than 4 mm across, this depression grows to form a parachute shape that explodes into many smaller droplets. Examples of this phenomenon are given by Pruppacher et al. [8]. Figure 2.5 illustrates the change in droplet shape as its size increases. The final illustration on the right shows the breakup of a droplet larger than 4 mm into smaller spherical droplets.
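As an illustration of the thresholds above, the intensity categories and drop-shape regimes can be captured in a few lines of Python. This is a sketch only; the function names and the handling of values exactly on a boundary are illustrative and not part of any cited work.

```python
def classify_rain_intensity(rate_mm_per_hr):
    """Classify rainfall by rate: light (up to 2.5 mm/hr),
    moderate (2.5-7.5 mm/hr), heavy (more than 7.5 mm/hr)."""
    if rate_mm_per_hr <= 2.5:
        return "light"
    if rate_mm_per_hr <= 7.5:
        return "moderate"
    return "heavy"

def raindrop_shape(diameter_mm):
    """Approximate shape regime of a falling raindrop by diameter."""
    if diameter_mm < 2.0:
        return "spherical"   # surface tension dominates
    if diameter_mm <= 4.0:
        return "oblate"      # flattened bottom facing the airflow
    return "unstable"        # parachute shape; breaks into smaller drops
```
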
2.3 Snow Intensity, Size and Shape
Like rain, snow is classified into light, moderate and heavy categories. Snow intensity is measured by considering how much equivalent liquid water a snowfall generates. This is called the liquid water equivalent measurement. Light snow falls at a rate of about 1.0 mm/hr or less, moderate snow at between 1.0 and 2.5 mm/hr, and snow is classified as heavy if the rate is more than 2.5 mm/hr. The size of a snowflake varies greatly. The smallest snowflakes are called diamond dust crystals. They are as small as 0.1 mm. These faceted crystals sparkle in sunlight as they float
Figure 2.5 Raindrop changes shape as it grows in size
through the air, which is how they got their name. They form only rarely, in extremely cold weather. The largest snow crystal ever recorded measured 10 mm (0.4 inches) from tip to
tip.
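The liquid water equivalent ratios from Sections 2.1.4 and 2.1.5, together with the intensity thresholds above, can be sketched as follows. The function names and boundary handling are illustrative only.

```python
def classify_snow_intensity(lwe_mm_per_hr):
    """Classify snowfall by its liquid water equivalent rate:
    light (about 1.0 mm/hr or less), moderate (1.0-2.5), heavy (> 2.5)."""
    if lwe_mm_per_hr <= 1.0:
        return "light"
    if lwe_mm_per_hr <= 2.5:
        return "moderate"
    return "heavy"

def liquid_water_equivalent(snow_depth_mm, snow_type):
    """Approximate liquid water equivalent from snow depth using the ratios
    in Sections 2.1.4-2.1.5: dry snow ~ 12.7 mm of water per 508 mm of snow,
    wet snow ~ 25.4 mm of water per 127 mm of snow."""
    ratio = {"dry": 12.7 / 508.0, "wet": 25.4 / 127.0}[snow_type]
    return snow_depth_mm * ratio
```
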
The shape of snowflakes is influenced by the temperature and humidity of the atmo-
sphere. Snowflakes form in the atmosphere when water droplets freeze onto dust particles.
Depending on the temperature and humidity of the air where the snowflakes form, the
resulting ice crystals will grow into a myriad of different shapes. Snow crystals are classified
by Magono and Lee [9] into eight primary categories as explained below and illustrated in Table 2.1.
1. Stellar Dendrites are quite large and a common type of snowflake. The best speci-
mens usually appear when the weather is very cold, close to -15 °C (5 °F). The name
comes from their star-shaped appearance and their branches; dendrite means
tree-like.
Table 2.1 Snowflake shapes. Image Credit: Snowcrystals.com
No. Category Illustration
1 Stellar Dendrites
2 Columns and Needles
3 Capped Columns
4 Fernlike Stellar Dendrites
5 Diamond Dust Crystals
6 Triangular Crystals
7 Twelve-branched Snowflakes
8 Rimed Snowflakes and Graupel
2. Columns and Needles snow crystals form when the temperature is around -6 °C (21
°F). They are small in size and look like bits of white hair. Longer column crystals look
like needles.
3. A Capped Column forms when it travels through multiple temperature layers as it
grows. First a column forms at approximately -6 °C (21 °F). After that, plates grow
on the ends of the columns near -15 °C (5 °F). These types of snowflakes are very
uncommon and difficult to spot.
4. Fernlike Stellar Dendrites are similar to stellar dendrites but slightly
larger. The specimen size can be up to 5 mm. They have a leaf-like structure with
side-branches parallel to neighboring branches. These crystals are not perfectly sym-
metrical. The side-branches on one arm are not usually the same as those on the
other branches.
5. Diamond Dust Crystals are the tiniest snow crystals, no larger than the diameter of a
human hair, forming in extreme cold. The basic ice crystal shape is that of a hexagonal
prism, governed by crystal faceting.
6. Aerodynamic effects help produce Triangular Crystals. They are typically
small, shaped like truncated triangles. Sometimes branches sprout from the six
corners, yielding an unusual symmetry.
7. When two small six-branched snow crystals collide in mid-air, they can stick to-
gether and grow into a Twelve-Branched Snowflake. This occurs frequently in a windy
environment.
8. Snow crystals grow inside clouds made of water droplets. Often a snow crystal will
collide with some water droplets, which freeze onto the ice. These droplets are called
rime. A snow crystal might have no rime, a few rime droplets, or quite a few, and some-
times the crystal is completely covered with rime. Blobs of rime are called graupel,
meaning soft hail.
CHAPTER 3
LITERATURE REVIEW
Literature on rendering natural phenomena in stereo is sparse, and work on rendering
rain and snow in stereo is non-existent. A fair amount of literature focuses on monoscopic
rendering of natural phenomena. This is because rendering techniques that work well in a
monoscopic view, such as bump mapping, which adds realism by modifying surface normal vectors to simulate bumps and wrinkles on the surface of an object, fail in stereo. The depth
information is lost and objects may look flat, giving the rendered scene a cardboard effect.
The literature review of monoscopic rendering of rain and snow helps establish an under-
standing of realistic rain and snow distribution models in virtual space. The literature on
stereoscopic rendering of natural phenomena, such as fire, smoke, fog, or clouds, helps in
understanding stereoscopic rendering techniques in the context of natural phenomena.
Since there is no work in stereo rendering of precipitation, such as rain or snow, there is a
potential for research to extend monoscopic rain and snow models to stereo for use in VR
and true 3D animations, movies and applications. The focus of this chapter is to perform a
literature review on monoscopic rendering of precipitation and stereo rendering of other
natural phenomena, as it is hypothesized that monoscopic rendering of rain or snow can
be extended to stereo.
3.1 Stereoscopic Rendering of Natural Phenomena
Natural phenomena such as fire, gaseous elements and vegetation have been the focus of
stereoscopic rendering. A real-time photorealistic and stereoscopic rendering method to
depict fire is proposed by Rose and McAllister [10]. Just like rain and snow, fire has dynamic characteristics that are challenging to reproduce in real-time with photorealistic effects.
The authors use pre-rendered high-quality images of fire as textures to attain photorealistic
effects. Real-time rendering is achieved by using billboards. Billboarding is a technique
that maps texture onto a polygon, which orients itself according to changing view direction.
This polygon is called a billboard. The billboard rotates around an axis and aligns itself to
face the viewer. A single billboard is not enough to give an illusion of depth. Several layers
of billboards, called slices, are proposed to give depth to the rendered fire scene.
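The billboard alignment described above can be sketched as a small routine that, given a particle and camera position, builds the right and up axes of a camera-facing quad. This is a minimal Python sketch of the general technique; in practice such per-particle work runs in a vertex or geometry shader, and the helper names are illustrative.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    m = math.sqrt(sum(c * c for c in v))
    return tuple(c / m for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def billboard_axes(particle_pos, camera_pos, world_up=(0.0, 1.0, 0.0)):
    """Build right/up vectors for a quad at particle_pos that faces camera_pos.
    The quad's normal points from the particle toward the viewer."""
    to_camera = normalize(tuple(c - p for c, p in zip(camera_pos, particle_pos)))
    right = normalize(cross(world_up, to_camera))
    up = cross(to_camera, right)
    return right, up
```

The four corners of the billboard are then particle_pos ± half_width·right ± half_height·up, so the quad always faces the viewer as the camera moves.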
In another study, Johnson and McAllister [11] present stereo rendering of gaseous natural phenomena that vary in density, cast shadows, are transparent, and have dynamic behavior.
Such phenomena include fog, mist, and clouds. The authors apply a volume rendering
technique called splatting. This technique improves rendering time but is less accurate.
Splatting projects volume elements, called voxels, on the 2D viewing plane. It approximates
this projection by using a Gaussian splat, which depends on the opacity and on the color of
the voxel. Other splat types, such as linear splats, can also be used. A projection to the image
plane is made for every voxel, and the resulting splats accumulate on top of each other in
back-to-front order to produce the final image.
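A minimal sketch of this splatting procedure follows, assuming a scalar color per voxel and a fixed circular Gaussian footprint. Both are simplifications for illustration; practical splatters use elliptical, view-dependent kernels and full RGB color.

```python
import math

def splat_voxels(voxels, width, height):
    """Back-to-front splatting sketch. Each voxel is (x, y, depth, opacity, color):
    it is projected as a small Gaussian footprint and composited over the image."""
    image = [[0.0] * width for _ in range(height)]
    sigma = 1.0  # footprint radius of the Gaussian splat (illustrative)
    # Sort far-to-near so nearer splats composite over farther ones.
    for x, y, _, opacity, color in sorted(voxels, key=lambda v: -v[2]):
        for py in range(height):
            for px in range(width):
                d2 = (px - x) ** 2 + (py - y) ** 2
                a = opacity * math.exp(-d2 / (2 * sigma ** 2))  # Gaussian weight
                image[py][px] = a * color + (1 - a) * image[py][px]  # "over" blend
    return image
</```
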
A real-time stereo implementation of rendering vegetation is described by Borse and
McAllister [12], which uses pre-rendered images of vegetation. They apply image-based rendering in stereo, which is an enhancement of the monoscopic rendering of vegetation
proposed by Jakulin [13]. Rendering from an arbitrary viewpoint is achieved by using the composite of the nearest two image slices that are alpha-blended as the user changes the viewing position. Alpha blending is a combination of two colors allowing for transparency
effects. The value of alpha in the color code ranges from 0.0 to 1.0, where 0.0 represents a
fully transparent color, and 1.0 represents a fully opaque color. Their method improved
stereo quality and demonstrated a reduction in visual artifacts.
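The alpha blending operation described above is, per color channel, out = alpha * src + (1 - alpha) * dst. A minimal sketch:

```python
def alpha_blend(src, dst, alpha):
    """Blend source color over destination color. alpha = 0.0 is fully
    transparent source, 1.0 is fully opaque. Colors are (r, g, b) tuples
    with components in [0, 1]."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))
```
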
3.2 Monoscopic Rendering of Rain
To analyze the effect of rainfall, it is necessary to understand the visual appearance of a
single raindrop. Several geometric and photometric models are proposed by Garg and Nayar
[14], which simulate light reflection and refraction through the raindrops. Such models are critical in understanding the visual representation of rain effects. The appearance of a
raindrop is a complex mapping of the light radiating from the environment. The results of
this technique showed that each drop acts like a wide-angle lens, redirecting light from a
large field of view towards the observer. The raindrop models created in this study provided
fundamental tools to analyze complex rain effects. Another raindrop shape model is
proposed by Roser et al. [15], which incorporates cubic Bezier curves. The authors provided validation of the model and showed that the error between the shape-fitted model and the
real raindrop is significantly less than when a spherical shape model is used. Such shape
models are useful for image correction by removing distortion created by raindrops.
In another study, Garg and Nayar [16] illustrated photorealistic rendering of rain with a system that measures rain streaks. A rain streak is formed when a raindrop looks like a bright
stroke on an image due to its fast falling speed relative to the camera exposure. In humans,
falling raindrops are perceived as streaks due to retinal persistence. An image formed on
the retina takes about 60 to 90 milliseconds to fade away, during which the raindrop keeps
falling, resulting in a composite image on the retina that produces this persistence effect. As a
raindrop falls, it undergoes shape distortions known as oscillations. The interaction of light with oscillating raindrops produces complex brightness patterns, such as speckles and
smeared highlights, within a single motion-blurred rain streak as viewed by an observer. The
authors presented a rain streak appearance model that captures these complex
interactions between various raindrop oscillations, lighting, and viewing directions. To
fully capture all the possible illumination conditions, they constructed an apparatus to
photograph distortion in a falling raindrop from various angles. From these experiments
a large database consisting of thousands of rain streak images is created, capturing variations in the appearance of a falling raindrop with respect to light position and view
direction. Subsequently, an image-based rendering algorithm is applied that utilizes this
database to add rain to a single image or video.
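The persistence effect described above implies that a streak's apparent length is roughly the distance the drop falls during the 60 to 90 ms persistence window (or, for a camera, during the exposure time). A simple illustrative calculation, with names of my own choosing:

```python
def streak_length_mm(fall_speed_m_per_s, persistence_ms):
    """Apparent rain streak length: the distance a drop falls during the
    retinal persistence window (or camera exposure), converted to mm."""
    return fall_speed_m_per_s * (persistence_ms / 1000.0) * 1000.0
```

For example, a drop falling at 9 m/s seen with 75 ms of persistence produces a streak of roughly 675 mm, which is why even modest rain reads as long bright strokes rather than discrete drops.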
A novel understanding of the visual effects of rain is also presented by Garg and Nayar
[17]. They analyzed various influential factors and proposed an effective photorealistic rendering algorithm. The authors first presented a photometric model that describes the
intensities produced by individual rain streaks. Then they produced a dynamic model that
captures the spatio-temporal properties of rain. These two models are used to describe
the visual appearance of rain. The authors then developed an algorithm to be used in
post-processing to remove rain effects from videos. They showed that properties unique to
rain, such as its small size, high velocity, and spatial distribution, make its visibility depend
strongly on camera parameters such as exposure time and depth of field.
Rain-specific effects, such as ripples, splashes, and drips, add to the realism of a rainy
scene. Such effects are described by Tatarchuk and Isidoro [18] in a visually complex virtual city environment rendered in real-time. The authors gave an overview of the lighting system and
presented various approaches of rendering rain and dynamic water simulation using a GPU.
They presented a novel post-processing rain effect that is created by rendering a composite
layer of falling rain. The illumination of rain is computed by using refraction of individual
raindrops and reflection due to surrounding light sources and the Fresnel effect, which
describes the behavior of light when moving between media of differing refractive indices.
Splashes, as rain falls on various objects, are rendered by using a billboard particle system.
Their technique works well for rainfall effects with streaks formed by falling raindrops.
However, this method does not produce the specular richness of a stationary raindrop or
photorealistic effect in a slow-motion video of rain. An alternative is to use hierarchical
refractive and reflective maps presented by Slomp et al. [19]. They use this technique to create a photorealistic real-time rendering of raindrops in a static or slow-motion scene
that are common in movies or instant-replays in video games. Multiple texture maps are
generated, each with decreasing resolution as distance from the viewpoint increases, called
hierarchical maps. These hierarchical maps are mapped on the raindrop for photorealistic
effect and raindrop billboards are used to achieve real-time rendering.
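The Fresnel effect mentioned above is commonly approximated in real-time rendering with Schlick's formula. A minimal sketch for an air-to-water interface follows; the refractive indices are standard physical values, not taken from the cited papers.

```python
def fresnel_schlick(cos_theta, n1=1.0, n2=1.33):
    """Schlick's approximation of Fresnel reflectance between two media
    (defaults: air, n = 1.0, to water, n = 1.33, as at a raindrop surface).
    cos_theta is the cosine of the angle between view ray and surface normal."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2   # reflectance at normal incidence
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5
```

At normal incidence only about 2% of the light is reflected from water, while at grazing angles the reflectance approaches 1.0, which is why raindrop reflections are strongest near the silhouette.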
A similar method uses a multi-resolution technique proposed by Puig-Centelles et al.
[20]. They apply a hierarchy of rain texture models to render rain. To obtain realistic results, the authors also use a level-of-detail technique, which decreases the complexity of the rain
model with an increase in distance from the viewer. Programmable units, called shaders, were used to upload input data to the GPU memory in order to improve rendering time by
taking advantage of the programmable graphics pipeline. In another study [21], the same authors extended their work to include control and management of the rain scene by defining
and operating within a certain rain area. The physical properties of rain were analyzed
and incorporated in rendering of realistic rain within a predefined rain area with real-time
user controls. They further extended the work by taking advantage of a GPU compute
mode, where the GPU can be used as a general-purpose processor [22]. They achieved much higher frame rates in simulating rain dynamics and collision detection with a CUDA
implementation. The presence of fog, halos, and light glows in a rendered scene enhances
photorealism. For greater photorealistic effects, Creus and Patow [23] extended existing monoscopic rendering of rain by adding splashes and illumination effects such as fog, halos,
and light glows. Their algorithm did not impose any restriction on rain area dimensions
but instead only rendered rain streaks in the area around the viewer. The simulation ran
entirely on the GPU, included collision detection with other scene objects, and used
pre-computed images for illumination effects.
Real-time rendering with photorealistic output using a GPU brings user interactivity
to applications. Such a method is proposed by Rousseau et al. [24]. Refraction of the scene inside a raindrop is simulated by using a texture map that is distorted based on the optical
properties of a raindrop. The authors conclude that reflections are limited to the raindrop's silhouette
and thus can be neglected without reducing photorealism. Similar results are presented
by Wang et al. [25] with a two-part approach. First, they use off-line image analysis of rain videos; second, they use particle-based synthesis of rain. Images analyzed from rain videos are used to create a rain mask, which is then used for online synthesis. A pre-
computed radiance transfer function is used for scene radiance and illumination. The
radiance transfer depicts the mapping of the light in the surrounding environment to the
raindrop and how it will appear to the observer. The authors demonstrated photorealistic
results with low computational cost and a small memory footprint. They applied the results to a variety of real videos and incorporated synthetic rain in real-time.
In another study, Rousseau et al. [26] described a complete framework to simulate rainfall in a video game. The authors developed a particle system that was implemented on a GPU.
A particle system is a large set of simple primitive objects, such as a point or a triangle, which
are processed as a group. Each particle has its own attributes including position, velocity,
and lifespan, which can be changed dynamically. The system animated each raindrop as
an individual particle. The authors also simulated the effect of light on raindrops by using a
refraction model developed in one of their previous studies. A retinal persistence model was also developed and included in the final implementation. Another rain scene animation
framework that uses a particle system is presented by Coutinho et al. [27]. Each particle represents a raindrop. Environment lighting is implemented by using a pre-computed
radiance transfer function. The authors also employed smoothed particle hydrodynamics,
a computational method used for simulating fluid flows.
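A minimal sketch of one update step of such a particle system follows: integrate velocity and position under gravity, age each particle, and respawn expired ones at the emitter. The emitter bounds and lifespans are illustrative assumptions; GPU implementations run this logic per-particle in parallel rather than in a Python loop.

```python
import random

GRAVITY = (0.0, -9.8, 0.0)  # m/s^2, acting downward on every particle

def update_particles(particles, dt, emit_height=10.0):
    """One simulation step for a simple rain particle system."""
    for p in particles:
        # Integrate velocity, then position (semi-implicit Euler).
        p["vel"] = tuple(v + g * dt for v, g in zip(p["vel"], GRAVITY))
        p["pos"] = tuple(x + v * dt for x, v in zip(p["pos"], p["vel"]))
        p["life"] -= dt
        # Respawn particles that expired or hit the ground plane y = 0.
        if p["life"] <= 0.0 or p["pos"][1] < 0.0:
            p["pos"] = (random.uniform(-5, 5), emit_height, random.uniform(-5, 5))
            p["vel"] = (0.0, 0.0, 0.0)
            p["life"] = random.uniform(1.0, 3.0)
```
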
A particle system is also used by Tariq [28] to animate and render rain streaks using only a GPU. In this process, rain particles were expanded into billboards to be rendered using
the geometry shader, which is a programmable stage in a modern GPU graphics pipeline
where object shape can be manipulated by adding or removing vertices. Subsequently,
a library of stored textures was used to render raindrops from different viewpoints and
lighting directions.
A video frame sequence of a real scene can be extracted and used as a background
scene in a rendered output. Such a method is proposed by Starik et al. [29]. They added synthetic rain to real video sequences by describing visual properties of rainfall in terms of
time and space. The authors assumed partial knowledge of intrinsic camera parameters,
such as focal length and camera exposure time, and user defined rain parameters, such
as intensity and velocity, to achieve photorealistic results. A similar method is proposed
by Mizukami et al. [30]. Their method renders a realistic rain scene that represents environmental conditions changing from one scene to another. They modeled wind
effects, and the intensity and density of rainfall. They also proposed a technique to calculate raindrop
trajectories that made the rainfall effect more realistic.
A technique that maps textures onto a double cone that is placed around the observer
is presented by Wang and Wade [31]. The orientation of the cones is determined by the position and speed of the camera movement. Several textures of rain or snow are
simultaneously scrolled on the double cone at different speeds to give an impression of motion.
This method is faster than a particle system, but lacks interaction between raindrops or
snowflakes and the respective environment such as splashes, water puddle formation, and
snow accumulation. On the other hand, the method proposed by Wang et al. [32] simulated rain interaction with the ground, giving it a wet appearance and ripple formation
on water puddles. They presented a real-time realistic rain scene model that accounts for
changes in physical characteristics of a raindrop under various conditions. The authors
modeled changes in the shapes, movements and intensity of raindrops. They also presented a new method to calculate rain streaks that accounted for retinal persistence. Besides rain, they implemented a fog effect. The method also allowed real-time changes in scenery by implementing the algorithm on a GPU.
A rain model based on spectral analysis of real rain is presented by Weber et al. [33]. This model is used to determine rain distribution. Furthermore, the visibility and intensity of distant rain streaks are attenuated by using the sparse convolution noise technique, which is used to build new textures at any resolution in real-time. The authors also derived a rain density function to quantify visible rain streaks within the view frustum in terms of rain intensity and camera parameters, such as field of view and exposure time. The view frustum is defined as the volume containing visible scene objects when observed from the camera’s perspective. The authors show that their technique is independent of scene complexity due to the use of image-space post-processing.
Rain clouds and the rendering of lightning from the clouds also add to realism in a rainy scene. This is demonstrated by Wang et al. [34]. They used rain cloud lightning and a wind field effect to enhance realism in a rainy scene. A particle system is used to simulate rainfall dynamics. Photorealistic output is achieved by implementing a ray-tracing algorithm, a method for calculating the path of light rays through a system, to render rain under different lighting conditions. The authors also simulate hazy atomization to create a mist simulation by fusing raindrop textures into the scene background. On the other hand,
Feng et al. [35] implemented a non-photorealistic cartoon-style animation of rain. The authors achieved a real-time rendering frame rate on a GPU-based particle system that includes collision detection and cartoon-style splash effects. Particle motion was based
on Newtonian dynamics to simulate global changes in the rain direction due to wind-driven effects. Rendering runtime is compared between several scene complexities by using object models of various geometries.
The literature review of monoscopic rendering of rain highlights many different techniques. Some methods focused on real-time rendering, such as the use of scrolling textures and particle systems. Other methods discussed realism in terms of rainfall dynamics, raindrop shape, and factors that may improve photorealism such as reflection, refraction, wind, lightning, rainbow effects, and collision with other objects in the scene. One study also described non-photorealistic cartoon-style rainfall effects.
In summary, the key realization is that real-time photorealistic rendering of rain can be achieved in the monoscopic case. However, achieving the same results for a stereoscopic display is more challenging because, in theory, the data processing doubles. Therefore, twice as much time is needed to process the left and right images. However, there is redundancy and parallelism that can be exploited to improve rendering efficiency for stereoscopic images. Table 3.1 summarizes these results.
A total of 22 recent studies on monoscopic rendering of rain have been reviewed. In contrast, only 3 studies were found on rendering natural phenomena in stereo. None of the stereo studies addressed issues related to realistic rain and snow rendering, which is spread over a large 3D virtual space. Therefore, a solution is proposed to extend existing monoscopic techniques of rain or snow rendering to stereo while maintaining real-time and photorealistic output.
Table 3.1 Monoscopic rain rendering literature
No. Authors Realtime Photorealistic Other Attributes
1 Starik et al. (2003) Rain in video
2 Garg & Nayar (2004) Reflection/refraction
3 Wang & Wade (2004) Scrolling textures
4 Feng et al. (2005) Cartoon style
5 Garg & Nayar (2006) Rain streaks
6 Tatarchuk & Isidoro (2006) Cityscape rain
7 Rousseau et al. (2006) Reflection/refraction
8 Wang et al. (2006) Particle based rain
9 Garg & Nayar (2007) Rain distribution
10 Tariq (2007) GPU based rain
11 Rousseau et al. (2008) Rain streaks
12 Mizukami et al. (2008) Wind effect
13 Changbo et al. (2008) Ripples/puddles
14 Puig-Centelles et al. (2008) Particle based rain
15 Puig-Centelles et al. (2009) User controls
16 Roser et al. (2010) Raindrop model
17 Coutinho et al. (2010) Rain surface flow
18 Slomp et al. (2011) Reflection/refraction
19 Puig-Centelles et al. (2011) CUDA based rain
20 Creus et al. (2012) Rain splashes
21 Wang C. et al. (2015) Rainy scene
22 Weber et al. (2015) Rainfall modelling
3.3 Monoscopic Rendering of Snow
The methodologies used to simulate and render rainfall, such as scrolling-texture-based
or particle-based systems, can also be applied to snowfall. Wang and Wade [31] used a scrolling textures method for both rain and snow. The authors applied snow textures to a
double cone that was placed around the observer. Applications such as flight simulation
can take advantage of such a method. Although it can be
implemented in real-time, it lacks realism due to little or no interaction with other objects
in the scene. On the other hand, particle-based systems are more physically realistic
since each snowflake is simulated and rendered individually, with its motion determined
by the influence of various forces, such as gravity, wind, and air resistance.
Another method to render rain and snow is proposed by Yang et al. [36] in which the authors combine two techniques, namely level-of-detail, which decreases complexity of a
3D object representation as it moves away from the viewer, and fuzzy-motion, the blurriness
of moving objects due to persistence of vision. Such a combination is used to increase particle
system efficiency for rendering of natural phenomena. Instead of using a billboard method
to represent rain or snow, the particles were expanded to form a point eidolon that can
be texture mapped. A point eidolon is defined as a point that is stretched to conform to
a rain streak or snowflake shape, which is texture mapped with appropriate rain or snow
texture. The authors showed that the number of polygons required to create a point eidolon
is lower than for a billboard, increasing the performance of the particle system. A
shape and appearance model of a single falling snowflake or rain streak is proposed by
Barnum et al. [37]. It is used to detect rain or snowfall in a video. The detection results are
identified in the frequency domain and then transferred to image space. Once detected,
the amount of snow or rain can be reduced or increased. The authors demonstrate that
the frequency-based approach had greater accuracy in the detection of dynamic snow and
rain particles as compared to a pixel-based approach.
The realism can also be enhanced by having snow accumulate on the ground and on
other scene objects. A method for generating such snow cover is presented by Fearing
[38]. The method consists of two models that address snow accumulation and stability. The accumulation model calculates how much snow each surface should receive. This is
based on a counter-intuitive idea where particles are emitted from upward-facing surfaces
towards the sky to determine exposure to falling snow. The amount of exposure determined
the amount of snow accumulating on the surface. To simulate wind effects, the author used
a global wind direction, which had the advantage of not requiring any fluid computations.
However, it did not produce fully convincing accumulation patterns. The stability model
is used when layers of snow are added to the scene, in an unstable area, to determine
whether they will fall. The method is based on calculating the angle of repose, which is used
to measure the static friction in piles of granular material. If the slope exceeds the angle of
repose, snow is redistributed. To render the scene, the author used commercial rendering
software. While the method is visually superior and photorealistic, it is computationally
expensive.
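The angle-of-repose test described above can be sketched on a 1D height field: when the slope between neighboring snow columns exceeds the repose angle, snow is moved downhill, a crude micro-avalanche pass. The 38° default and the redistribution fraction are illustrative assumptions, not values from the cited work.

```python
import math

def is_stable(height_a, height_b, cell_size, repose_angle_deg=38.0):
    """Snow between two neighboring columns is stable if the slope between
    them stays at or below the angle of repose."""
    slope_deg = math.degrees(math.atan2(abs(height_a - height_b), cell_size))
    return slope_deg <= repose_angle_deg

def redistribute(heights, cell_size, repose_angle_deg=38.0, fraction=0.5):
    """One relaxation pass over a 1D height field: move part of the height
    difference from the taller column to the shorter one where unstable."""
    for i in range(len(heights) - 1):
        if not is_stable(heights[i], heights[i + 1], cell_size, repose_angle_deg):
            hi, lo = (i, i + 1) if heights[i] > heights[i + 1] else (i + 1, i)
            moved = fraction * (heights[hi] - heights[lo]) / 2.0
            heights[hi] -= moved
            heights[lo] += moved
    return heights
```

In practice the pass is repeated over a 2D grid until all slopes settle below the repose angle; total snow volume is conserved because material is only moved, never created.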
Fearing’s work is extended by Feldman and O’Brien [39]. They apply fluid dynamics
techniques described in Fedkiw et al. [40] to create wind velocity fields and use them to drift accumulated snow in a more realistic manner. The snow is accumulated by storing the
amount on a horizontal surface of a three-dimensional grid, a voxel, which is marked as
occupied. New voxels are marked as occupied after enough snow has been accumulated to
fill the voxels beneath. A real-time snow accumulation method is proposed by Haglund et al. [41], which simulates different stages of the buildup of snow cover, starting with a snow-free environment
and ending with a completely snow-covered scene. To store snow depth, height matrices
are placed on all surfaces that could receive snow. When snowflakes hit a surface, the
nearest height value is increased. To render the scene, triangulations are created from the
height matrices, and rendered using Gouraud shading, an interpolation method to produce
continuous shading of surfaces, by means of OpenGL functionality. Their focus is to find a
good trade-off between visual result and performance without physical correctness.
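The height-matrix accumulation described above can be sketched as follows: when a snowflake lands, the nearest entry of a height grid covering the surface is increased. The grid resolution and per-flake depth increment are illustrative assumptions.

```python
def accumulate_snow(height_grid, flake_hits, depth_per_flake=0.1):
    """Increase the height entry nearest to each snowflake hit.
    height_grid is a 2D list of depths; flake_hits is a list of (x, z)
    impact positions in grid coordinates."""
    rows, cols = len(height_grid), len(height_grid[0])
    for x, z in flake_hits:
        # Round the hit to the nearest cell and clamp to the grid bounds.
        r = min(max(int(round(z)), 0), rows - 1)
        c = min(max(int(round(x)), 0), cols - 1)
        height_grid[r][c] += depth_per_flake
    return height_grid
```

The height matrix can then be triangulated and shaded (Gouraud shading in the cited work) to render the accumulated snow surface.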
The type of snow, either wet or dry, can be distinguished by the polygon count. A method
that distinguishes between the two snow types is proposed by Moeslund et al. [42]. In this method, the rendering of snowfall and snow accumulation is based on a collection of ran-
domized triangular polygons to distinguish between wet and dry snow. The appearance
and movement of snowfall is based on physics governing the real processes. The same goes
for the accumulated snow where a correctly modeled wind-field is important for producing
realistic results. The effects of wind also act on individual snowflakes of various shapes, which results in snowflakes moving and tumbling differently from one another, based on
their shapes. Their accumulation model allowed for the generation of snow-covered scenes
of any depth very rapidly. The results show that both the appearance and movement of the
snow, as well as the accumulated snow, are very similar to real snow. This method is used
by Zou et al. [43] to make the scene photorealistic by creating natural-looking snowflakes. The algorithm to create realistic snowflakes uses tessellated triangles that are combined
together to form a snowflake. The sharp edges on the triangles are curved by applying
quadratic Bezier curves, implemented on a GPU, which gives a new snowflake a smoother
and more natural appearance. Photorealistic heavy snowfall requires the use of a high number of
particles. A frequency-domain spectral analysis of the rendered image to reduce the number
of particles in snowfall, yet keep it visually realistic, is proposed by Langer et al. [44]. They combine geometry-based falling particles with image-based spectral synthesis. The method
first renders an image with simple particle-based snowfall representation. It then analyzes
the movement of particles stored inside an image within the frequency domain. This is
used to produce an opacity map, which creates an illusion of denser snowfall than was initially rendered. While this technique provides visually pleasing results
for snowfall, it does not address the problem of rendering snow on the ground or snow
accumulation.
A real-time snow accumulation model is implemented by Ohlsson and Seipel [45]. The authors used an accumulation prediction model, which contained two components, namely
an exposure and an inclination component, which determine how much snow a surface
would receive at a specific point. This method creates snow accumulation on a per-pixel basis.
Their idea was to create an occlusion map to decide how the surface should be rendered in
terms of snow depth. A noise function is used to create surface snow textures. The algorithm
is implemented on the GPU to achieve real-time frame rates. Another real-time occlusion-
based snow accumulation algorithm is proposed by Reynolds et al. [46]. A surface-bound accumulation buffer is assigned to each object in the scene, which forms a height map
of accumulated snow cover. The authors use a shadow mapping technique to render the
scene from above and map directly visible surfaces to their corresponding accumulation
height-map. To reduce the formation of sharp peaks and edges, blurring is performed to obtain a
smoother accumulation height-map, and existing scene geometry is tessellated to add more
detail. The authors are able to achieve real-time snow accumulation on a dynamic, moving
scene. A shadow buffer technique is used by Tokoi [47] to render snowfall in real-time. This method generates a shadow map using the z-buffer, also called the depth buffer. Snow is
accumulated in areas that are not shadowed. The shadow map simulates obstruction to
snowfall. A snow stability method by Fearing [38] is used to create micro-avalanches to stabilize fallen snow on various objects. This method provided good results for small
scenes of only up to 300 × 300 pixels. A procedural modeling method by Foldes and Beneš [48] is used for snow accumulation based on illumination. The authors assumed that there is a constant layer of snow on the surface. In their model, the snow accumulated or
dissipated. The snow accumulation regions are defined by calculating ambient occlusion.
Snow dissipation is caused by either direct or indirect illumination. Previous work by Ohlsson
and Seipel [45] used occlusion techniques for determining obstructions to snowfall, while this method uses occlusion to account for heat and dissipation.
Existing methods in snowfall are extended by Saltvik et al. [49] to include wind simulation and snow accumulation for parallel execution on symmetric multiprocessors and multi-
core systems. The data structures are divided among various parallel threads of execution.
For snowfall modeling, the authors extended the results from Moeslund et al. [42] by tracking the 3D positions of snowflakes to decide where to accumulate snow. In the model,
the position and velocity of each individual snowflake is tracked and updated according to
the forces described by Newtonian dynamics and the laws of motion. For wind simulation,
computational fluid dynamics is used, which is based on the Navier-Stokes equations used
in Fedkiw et al. [40] for smoke simulation. The authors extended Haglund et al. [41] for
snow accumulation. For each frame, objects are checked for intersections with snowflakes.
If a hit occurs, the nearest snow height value is increased.
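A minimal sketch of this kind of per-flake update follows. It is an illustration of the approach, not the authors' code: the drag coefficient, time step, depth increment, and grid cell size are assumed values, and a constant wind velocity stands in for the Navier-Stokes wind-field.

```python
GRAVITY = -9.81     # m/s^2, acting along y
DRAG = 0.5          # assumed air-drag coefficient (illustrative)
DT = 1.0 / 60.0     # time step for a 60 fps simulation

def step_snowflake(pos, vel, wind):
    """Advance one snowflake by explicit Euler integration.

    Forces follow simple Newtonian dynamics: gravity, plus a drag
    term that pulls the flake's velocity toward the local wind.
    """
    acc = (DRAG * (wind[0] - vel[0]),
           GRAVITY + DRAG * (wind[1] - vel[1]),
           DRAG * (wind[2] - vel[2]))
    vel = tuple(v + a * DT for v, a in zip(vel, acc))
    pos = tuple(p + v * DT for p, v in zip(pos, vel))
    return pos, vel

def accumulate(pos, heights, cell=1.0, depth_per_flake=0.001):
    """If a flake reaches the snow surface, raise the nearest height value."""
    key = (int(pos[0] // cell), int(pos[2] // cell))
    if pos[1] <= heights.get(key, 0.0):
        heights[key] = heights.get(key, 0.0) + depth_per_flake
        return True  # flake is retired and can be respawned aloft
    return False
```

Each frame, every live flake is stepped and tested against the height field; landed flakes raise the field and are respawned, mirroring the per-frame intersection test described above.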
An efficient snow distribution method is presented by Festenberg and Gumhold [50] to give realistic snow cover on object surfaces, as an alternative to high-cost particle
system simulation, using a simplified snow surface representation. The authors
use photographs of snow-covered scenes as a primary source for the model development.
Inspired by the real world observations, they derived a statistical snow deposition model.
This work is extended by Hinks and Museth [51] to include a wind-driven snow distribution model that uses a dual level set, a data structure that represents the surfaces of the dynamic
snow and the static boundaries of a scene. The authors introduce the concept of snow-
packages, representations of discrete volumes of snow, which are traced in a wind-field and
on the surfaces of scene objects.
A GPU-based particle system is implemented by Zhang et al. [52] for snow rendering that included both snowfall and accumulation. Their method used
textures to store the data on the GPU necessary to simulate snow particles; e.g., an RGB color value
in a texture represented the xyz position of a particle in space. For each update to the
next frame, new data is written on a new texture. The height fields are used to establish
snow accumulation rules, such as how high snow can accumulate on a certain surface. The
implementation targets real-time-rendering goals instead of achieving photorealism. A wind-effect model is developed by Tan et al. [53] which is applied on a particle system to simulate snowfall. To improve visual fidelity, the authors used eight different snowflake
textures and applied texture indexing to switch between them. The snowflake attributes,
such as position, velocity, and rotational angles, are computed by the particle system imple-
mented using a Direct3D library designed specifically for Microsoft platforms. However,
the implementation ran on a CPU and did not take advantage of the programmable graph-
ics pipeline. This method is extended by Tan and Fan [54] to include snow accumulation. They increased the number of textures used for snowflakes and changed the wind-field
model to use the lattice Boltzmann fluid dynamics equations, which simplified the calculations
and offered a more realistic particle motion with changes in wind directions. The rendering
efficiency is improved by Fan and Zhang [55]. They use OpenGL display lists, a series of graphics commands that define an output image. When a display list is referenced, a group
of stored commands execute in order efficiently. However, implementation of display lists
is still done on the CPU, causing the GPU to reference the CPU for new data for each new
draw call, creating a bottleneck between the CPU and GPU interface.
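The texture-encoded particle state used by Zhang et al. [52], described above, can be sketched in scalar code. The world bounds, 8-bit quantization, constant fall speed, and ping-pong buffer pair are illustrative assumptions standing in for the actual GPU texture passes:

```python
BOUNDS = 100.0  # assumed half-extent: positions lie in [-100, 100]^3

def encode_position(x, y, z):
    """Map a world-space position to an 8-bit RGB texel."""
    def to_byte(v):
        return max(0, min(255, int(round((v + BOUNDS) / (2 * BOUNDS) * 255))))
    return (to_byte(x), to_byte(y), to_byte(z))

def decode_position(rgb):
    """Recover the (quantized) world-space position from an RGB texel."""
    return tuple(c / 255 * 2 * BOUNDS - BOUNDS for c in rgb)

def update_pass(src, dst, dt, fall_speed=1.5):
    """Ping-pong update: read positions from one 'texture', write the
    next frame's positions into the other, as a shader pass would."""
    for i, texel in enumerate(src):
        x, y, z = decode_position(texel)
        dst[i] = encode_position(x, y - fall_speed * dt, z)
```

One consequence of the scheme is visible here: 8 bits per channel quantize positions to steps of roughly 0.8 m over a 200 m range, so a real implementation would typically use higher-precision or floating-point textures.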
Simulation of water due to rain and melting snow is added by Ding et al. [56]. The authors used a height-map, an array that stores height values, to describe various terrain
elevations. For ground water simulation, bump mapping is used to perturb the normal vectors for realistic reflection from a water surface. A study on animation of snow dynamics
is proposed by Stomakhin et al. [57]. The interactions between solid and fluid states produce realistic snow dynamics, especially in wet or dense snow, which exhibits both solid and
fluid-like properties. The authors implement a user-controllable variant of a hybrid of
an Eulerian and Lagrangian material-point-method (MPM). The MPM, first proposed by
Sulsky et al. [58], is a computational method that bridges fluid and solid dynamics and is
an extension of the particle-in-cell (PIC) method [59]. An Eulerian, or grid-based, approach to fluid dynamics is a way of looking at fluid motion that focuses
on specific locations in the space through which the fluid flows as time passes.
In contrast, a Lagrangian, or particle-based, approach is a way
of looking at fluid motion where the observer follows an individual fluid parcel, a small
amount of fluid, as it moves through space and time. In a hybrid Eulerian/Lagrangian MPM approach, instead of direct communication between particles, they communicate through
a background grid, making the method extremely efficient. However, like other methods without a rest state, MPM suffers from drift, which is exacerbated by a Taylor expansion
approximation of the deformation gradient, limiting the ability to simulate large elastic
deformations over long time frames. The authors also demonstrate that MPM occupies
an interesting middle ground among simulation techniques, especially for elasto-plastic materials
undergoing fracture. By adding plasticity to the basic constitutive model, they show
that a range of compressible materials, such as snow, can be simulated.
A shell texture to render snow is implemented by Wong and Fu [60]. This type of texture is formed by creating a series of concentric, semi-transparent textures containing sample
images of snowflakes. The shell textures are generated at the preprocessing stage and are
used in rendering snowflakes. The proposed method is based on a hybrid, particle (La-
grangian) and grid (Eulerian), structure for handling snow. The movable snow is represented
as particles, whereas static snow is modeled as grid cells. The snowflake particles are made
of several shell textures that are bonded together by spring forces applied among them.
The movement of these particles is simulated by a particle system,
while static snow on the ground is simulated by grid cells: occupied grid cells
are filled with snow and empty grid cells have none, though particles can move freely
inside each grid cell. This allows fallen snow on objects to look natural. The final resting
place of the snowflakes is computed according to gravity, collision, and spring forces.
Like rain, numerous studies exist on monoscopic rendering of snow. A total of 22 recent
studies have been reviewed. In addition to simulating snowfall, other problems are also
considered, such as the interaction of fallen snow with the ground and objects in the scene, which includes snow accumulation, formation of snowdrift patterns due to wind effects,
and various shapes of snow piles as they form on objects. Notably, none of the studies
address stereoscopic rendering. Table 3.2 summarizes these results.
3.4 Computational Analysis
Rendering at a 1024 × 1024 screen resolution requires about 1
million pixels to be painted. A single pixel color contains 8 bits per channel for the red, green, and
blue colors and another 8 bits for the alpha channel, which is used for transparency. To
color a single pixel, a total of 32 bits is assigned. Therefore, rendering a single image
requires about 4 MB of data. At a rate of 60 fps, the computer processes upward of 240 MB
of information every second. Processing scene content, object shapes,
lighting, and other characteristics requires additional computational resources. The following
sub-sections summarize the computational analysis performed by the authors of recent literature in
monoscopic rain and snow rendering.
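The arithmetic above can be verified directly:

```python
WIDTH, HEIGHT = 1024, 1024
BITS_PER_PIXEL = 32           # 8 bits each for red, green, blue, and alpha
FPS = 60

pixels_per_frame = WIDTH * HEIGHT                     # about 1 million pixels
bytes_per_frame = pixels_per_frame * BITS_PER_PIXEL // 8
bytes_per_second = bytes_per_frame * FPS

print(pixels_per_frame)           # 1048576
print(bytes_per_frame // 2**20)   # 4   (MB per frame)
print(bytes_per_second // 2**20)  # 240 (MB per second)
```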
3.4.1 Rendering Performance – Rain
A scrolling texture approach instead of a particle system is used by Wang and Wade [31]. This results in the same performance overhead for heavy or light precipitation, independent of the number of particles rendered. Across a range of consumer
graphics cards, their technique maintained frame rates of 15 to 60 fps.

Table 3.2 Monoscopic snow rendering literature

No. | Authors | Realtime | Photorealistic | Other Attributes
1 | Fearing (2000) | | | Accumulation
2 | Feldman & O'Brien (2002) | | | Drifts
3 | Haglund (2002) | | | Accumulation
4 | Langer et al. (2004) | | | Snowfall
5 | Ohlsson & Seipel (2004) | | | Accumulation
6 | Moeslund et al. (2005) | | | Accumulation
7 | Saltvik et al. (2006) | | | Snowfall
8 | Tokoi (2006) | | | Piles & shapes
9 | Foldes & Beneš (2007) | | | Dissipation
10 | Yang et al. (2008) | | | Snowfall
11 | Festenberg et al. (2009) | | | Distribution
12 | Hinks & Museth (2009) | | | Distribution/wind
13 | Tan et al. (2009) | | | Snowfall/wind
14 | Barnum et al. (2010) | | | Snow/rain removal
15 | Zhang et al. (2010) | | | Accumulation
16 | Zou et al. (2010) | | | Snowflakes model
17 | Tan & Fan (2011) | | | Snowfall/wind
18 | Fan & Zhang (2012) | | | Snowfall
19 | Ding et al. (2013) | | | Accumulation
20 | Stomakhin et al. (2013) | | | Distribution
21 | Reynolds et al. (2015) | | | Accumulation
22 | Wong et al. (2015) | | | Accumulation

A cartoon style
non-photorealistic technique by Feng et al. [35] implemented a particle system. The au-
thors compared results between different scene geometries on an AMD Athlon XP 2500+ 1.83GHz processor with an NVIDIA GeForce 6800 LE GPU with 512 MB of main memory.
The performance varied between 10 and 45 fps for particle counts ranging from 6000 to
20,000; the frame rate increases as the number of particles decreases. The variation was due to changes in the quantity of geometry drawn and the complexity of
the particle system.
A particle system used by Rousseau et al. [24] included 5000 particles that are sufficient to provide a realistic rain impression when large raindrops or streaks are used. When using very small raindrops, below a radius of 1 mm, at least 10,000 particles are required for a
realistic rain impression. The authors ran their experiments on a PC with a 2600+ AMD CPU and an NVIDIA Geforce 6800 GT graphics card. Their method generated 100 fps for a
rain scene containing 5000 particles.
An ATI Radeon X1900 XT graphics card with 512 MB of video memory on a 1 GB Dual
Core 3.2 GHz Pentium 4 PC is used by Tatarchuk [61] to render rain in real-time for a visually complex virtual city environment. The author was able to achieve a frame rate of 26 fps for
20,000 rain particles. The frame rate increased to 69 fps when the number of particles decreased
to 5000. Photorealistic results with low computational cost are demonstrated by Wang et al.
[25]. The performance experiments ran on an Intel Pentium 4, 3.2 GHz PC with an NVIDIA GeForce 7800 GT graphics card. Their frame rate varied from 77 to 790 fps and the number
of particles was in the range of 10,000 to 80,000.
An NVIDIA GeForce 8800 GTX graphics card on a 2 GB Intel Core 2 processor is used by
Tariq [28] to implement a particle system to animate and render rain streaks. The author implemented experiments using the Direct3D 10 library designed specifically for the Microsoft
platform and took advantage of the GPU geometry shader to achieve 26 to 545 fps for
particle counts ranging from 200,000 to 5 million. A multiresolution technique for rain
rendering is used by Puig-Centelles et al. [20], taking advantage of programmable GPU shader units. For the experiments, an NVIDIA GeForce 8800 GT graphics card on a Pentium D 2.8
GHz with 2 GB RAM was used. They achieved 187 fps for 50,000 rain particles. On the same
hardware, the authors compared their results with the earlier implementations by Rousseau et
al. and Tariq and showed that their method provides a better rain appearance with fewer
particles.
A method that incorporates wet ground, ripples, puddles, and even a rainbow effect is
implemented by Wang et al. [32]. They used a Pentium 4, 3.2 GHz CPU with an NVIDIA GeForce FX 7900 GTX graphics card for the implementation. The average rendering speed of a
dynamic rain scene reached 20 frames per second. Rain rendering with collision detection
is implemented by Coutinho et al. [27] to give visual effects like rain splashes. They also simulate rain water collecting into lakes and forming rivers. To model the water surface flow
they used a smoothed-particle hydrodynamics (SPH) technique, which is a computational
method used for simulating fluid flows. The framework used CUDA for rain simulation. The visualization was implemented in OpenGL and GLSL. The experiments were performed
using an Intel Quad Core 3.0 GHz, with 4 GB of RAM and an NVIDIA GeForce 9800 GTX,
running on an openSUSE Linux platform. The experiments included simulation of rain,
collision detection and SPH with a total of 60,000 rain particles and achieved a frame rate
of 68 fps. The frame rate went down to 38 fps when simulation also included formation of
river flows and rain splashes.
A technique consisting of two stages, a preprocessing stage that generates a raindrop
mask and a run-time stage that renders raindrops as screen-aligned billboards, is used by
Slomp et al. [19]. The framework was implemented using C++, OpenGL, and GLSL version 1.20. The experiments were performed on an Intel Core 2 Quad 32-bit CPU running
at 2.66 GHz with 4 GB RAM with NVIDIA GeForce GTX 280, 1 GB VRAM graphics card.
A frame rate in the range of 46 to 794 fps was achieved for particle counts ranging from 125,000 to 4 million
raindrops. The authors also implemented tone mapping, a technique to approximate the
appearance of high dynamic range images by mapping one set of colors to another. When
tone mapping is enabled, the frame rate decreased to 595 fps for 125,000 raindrops. An
Intel Core 2 Duo running at 3.0 GHz with an NVIDIA GeForce GTX 280, 1 GB VRAM is
used to implement the Creus and Patow [23] work. Along with rain rendering, the authors implemented rain splashes and various illumination effects such as fog, halos, and light
glows. The frame rate varied from 4 to 56 fps for particle counts in the range of 430,000
to 7.6 million.
An image space rain streaks model is proposed by Weber et al. [33]. The experiments were performed on an NVIDIA GeForce GTX 980. Heavy rain is simulated using 8000 visible
streaks rendered at 30 fps; light rain, corresponding to 2000 visible streaks, performed at
60 fps. Their results show a non-linear relationship between the number of rendered rain
streaks and the frame rate. A ray-tracing technique to render a realistic rain scene is used
by Wang et al. [34]. The authors used an Intel 2.4 GHz Core CPU with 4 GB RAM and an NVIDIA GeForce GT 540M graphics card. For particle counts ranging from 2000 to
15,000, the frame rate varied from 76 to 478 fps. Table 3.3 summarizes performance results
from recent work in monoscopic rendering of rain.
Table 3.3 Performance summary of monoscopic rain rendering

No. | Authors | No. of Particles (in thousands) | Framerate (fps)
1 | Wang & Wade (2004) | N/A | 15 - 60
2 | Feng et al. (2005) | 6 - 20 | 10 - 45
3 | Rousseau et al. (2006) | 5 | 100
4 | Tatarchuk & Isidoro (2006) | 5 - 20 | 26 - 69
5 | Wang et al. (2006) | 10 - 80 | 77 - 790
6 | Tariq (2007) | 200 - 5000 | 26 - 545
7 | Puig-Centelles et al. (2008) | 50 | 187
8 | Wang et al. (2008) | N/A | 20
9 | Coutinho et al. (2010) | 60 | 68
10 | Slomp et al. (2011) | 125 - 4000 | 46 - 794
11 | Creus et al. (2012) | 430 - 7600 | 4 - 56
12 | Weber et al. (2015) | 2 - 8 | 30 - 60
13 | Wang C. et al. (2015) | 2 - 15 | 76 - 478
3.4.2 Rendering Performance – Snow
Real-time rendering of snowfall is achieved by Langer et al. [44] using a static background image and combining it with snowflake textures using an image-based spectral synthesis
method. The implementation used the DirectX 9 library designed for Microsoft platforms
running on a Windows XP PC with an Intel Pentium 4, 2.4 GHz processor and 1 GB of RAM.
In addition, the PC had an ATI Radeon 9800 Pro graphics card with 256 MB of video memory.
The authors simulated light and heavy snow conditions with 2000 to 16,000 snowflakes. The
frame rate varied between 10 and 60 fps.
An accumulation prediction model is used by Ohlsson and Seipel [45], which helped to compute how much snow a specific point on a surface would receive. The implementation was tested on an NVIDIA GeForce FX 5600 Ultra. The performance was directly dependent on
the resolution and the amount of the screen covered with potential snow-covered surfaces.
The authors used a 600 × 600 resolution image with 48,000 vertices. On average, a frame rate of 13 fps was achieved.
Snowfall accumulation and the effects of wind are implemented by Saltvik et al. [49] on a multiprocessor system. They compared results between task- and data-parallel systems.
Task-parallelism is achieved when each processor executes different threads on the same
or different data. In contrast, data-parallelism performs the same task on different data
sets and a single thread controls operations on all pieces of data. The authors showed that
task parallelism gave a 29% performance gain. This is because, in their data-parallel imple-
mentation, the OpenGL thread blocked other executing threads during the rendering cycle.
The experiments ran on an Intel Pentium Xeon 3.2 GHz dual CPU workstation with NVIDIA
Quadro FX 3400 graphics card. The number of simulated snowflakes ranged from 5000 to
40,000 with a frame rate varying from 23 to 133 fps.
A shadow buffer technique was used by Tokoi [47] to render snow cover and distribution. The implementation was on an Intel Pentium III, 800 MHz CPU, 384 MB main memory,
NVIDIA Geforce 2 with 32 MB video memory. The experimental software was developed on
Microsoft Windows XP Home Edition using Microsoft Visual C++ 6.0 and OpenGL version 1.1. The authors used a 300 × 300 image with about 40,000 vertices. On average, a frame rate of 4 fps was achieved.
Level-of-detail and retinal persistence are used by Yang et al. [36] to render falling
precipitation such as snow and rain. The experiments ran on an Intel Pentium 4 CPU at 2.80
GHZ, 512 MB RAM, NVIDIA GeForce FX 5200, with 128 MB video memory. The authors
showed that their implementation needed approximately 56,000 particles to represent
precipitation with an average frame rate of 20 fps.
A GPU-based particle system is implemented by Zhang et al. [52] and compared to a CPU implementation of the same algorithm. The experiments ran on an Intel Pentium
4 2.8 GHz computer with 1 GB RAM, and NVIDIA GeForce 7650 graphics card with 256
MB of video memory under Windows XP, Visual C++ 6.0, and an OpenGL environment. Light snow was represented by 100,000 snowflakes and heavy snow contained 200,000
particles. On the CPU implementation, the frame rate ranged from 8 to 18 fps. However, the GPU
implementation improved results by more than three times, with frame rates ranging from
36 to 60 fps.
A shadow-mapping technique is used by Reynolds et al. [46] to determine which areas on the ground are not occluded, and therefore will accumulate snow. The experiments were
performed on an Intel i7 PC with an NVIDIA GTX 580 GPU. Scenes of differing complexity were used, with up to 750,000 vertices at 1024 × 1024 resolution. The results showed that the performance of snow accumulation was largely independent of scene complexity, with
frame rates varying between 65 and 75 fps. Table 3.4 summarizes performance results from
recent work in monoscopic rendering of snow.
Table 3.4 Performance summary of monoscopic snow rendering

No. | Authors | No. of Particles (in thousands) | Framerate (fps)
1 | Langer et al. (2004) | 2 - 16 | 10 - 60
2 | Ohlsson & Seipel (2004) | 48 | 13
3 | Saltvik et al. (2006) | 5 - 40 | 23 - 133
4 | Tokoi (2006) | 40 | 4
5 | Yang et al. (2008) | 56 | 20
6 | Zhang et al. (2010) | 100 - 200 | 36 - 60
7 | Reynolds et al. (2015) | 750 | 65
CHAPTER 4
STEREO RENDERING
Rendering is a process of converting a three-dimensional world scene into a two-dimensional
image of that scene. Although this two-dimensional image can give some information about
depth using lighting and the viewer’s knowledge of the world, it loses the ability to look
beyond or around objects in that scene and makes some depth relationships ambiguous.
The use of stereovision can remedy this shortfall. The process of rendering in stereo requires
two subtly different views of the same scene, one for the left eye and another for the right
eye. This is analogous to stereophonics, where separate speakers play different sounds into
the left and the right ears. Stereovision requires two forward facing eyes separated by a
horizontal distance, called the inter-ocular distance. Light enters the two eyes and falls
onto a respective two-dimensional surface called the retina. The two separate sets of image
data are sent from each retina to the back of the brain for processing by the visual system where the images merge to produce depth perception. Debate exists about how a vision
system combines various depth cues [62]. How the human visual system functions is an
open area of research and beyond the scope of this study. Suffice it to say that most viewers will perceive depth from planar left- and right-eye views with disparity. Figure 4.1 illustrates
the two views perceived by each eye.
Figure 4.1 Stereo pair as viewed by the two eyes
4.1 Depth Perception
The 3D structure of a scene is perceived from the 2D retinal images using various psychologi-
cal and physiological depth cues [63]. Other taxonomies of depth perception, such as monoc-
ular, binocular, and oculomotor depth cues, also exist in the literature [64]. Depth cues are additive: a correct combination of depth cues gives a better sense of a three-dimensional
environment, but a conflict in depth cues may result in eyestrain and an unpleasant viewing ex-
perience. Some cues are stronger than others in certain situations. For example, a sailor may rely on multiple psychological depth cues to determine the distance to a far-off buoy, such as linear perspective, aerial distortion, and relative size, to name a few. However, a person threading a needle primarily uses physiological depth cues, such as binocular disparity, accommodation, and convergence, to determine the location of the end of the thread and the eye of the needle. An important criterion for the dominance of one cue over another is the distance from the viewer to the objects of interest.
4.2 Psychological Depth Cues
Psychological depth cues include: linear perspective, lighting and shadows, aerial perspective, color, interposition or occlusion, texture gradient, and relative size. Such depth cues are considered monocular because they can be observed by one eye. They give an impression of depth even in a two-dimensional image. Artists have known about psychological depth cues since the Renaissance period and have used them to create depth perception. Since these depth cues are observable in a painting or a picture, they are also known as pictorial depth cues. As an example, in Figure 4.2 an 1877 painting by Gustave Caillebotte, Paris Street;
Rainy Day, shows how the use of psychological depth cues can be effective in creating the illusion of depth even on a planar surface.
Aerial Distortion: Objects that are farther away from the viewer appear cloudy or behind a bluish haze. This is because blue light, having a shorter wavelength, is scattered by the atmosphere more readily. In computer graphics, depth cuing is used to reproduce the effects of aerial perspective, reducing the intensity of an object in proportion to its distance from the viewer.
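A minimal sketch of such linear depth cuing, in the spirit of fixed-function fog, is shown below; the haze color and the near/far distances are illustrative assumptions:

```python
def depth_cue(color, depth, near, far, haze=(0.7, 0.8, 1.0)):
    """Blend a fragment color toward a bluish haze with distance.

    The blend factor t is 0 at the near plane (full object color) and
    1 at the far plane (full haze), reducing the object's intensity
    in proportion to its distance from the viewer.
    """
    t = max(0.0, min(1.0, (depth - near) / (far - near)))
    return tuple((1.0 - t) * c + t * h for c, h in zip(color, haze))

near_red = depth_cue((1.0, 0.0, 0.0), depth=1.0, near=1.0, far=100.0)   # unchanged
far_red = depth_cue((1.0, 0.0, 0.0), depth=100.0, near=1.0, far=100.0)  # pure haze
```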
Figure 4.2 Gustave Caillebotte. Paris Street, Rainy Day, 1877. Image Credit: Art Institute of Chicago
Linear Perspective: The retinal image formed by an object becomes smaller as the object moves away from the viewer. In computer graphics, this phenomenon is modeled by linear perspective projection. All parallel lines that run towards the horizon line appear to converge at a single point called the vanishing point. This is why train tracks in the distance seem to come together.
Interposition or Occlusion: This is one of the simplest depth cues; it works at any distance and occurs when foreground objects hide or overlap background objects, providing information about relative depth.
Relative Size: Certain objects are expected to be smaller than others. Knowledge of the relative size of an object also aids in determining depth. Prior knowledge of normal object size can be used to infer depth.
Color: When light of different wavelengths enters the eye, it is refracted by the fluids in the eye at various angles due to differences in the refractive index at each wavelength. This
causes the color images on the two retinas to have disparity, thus producing depth perception.
This is referred to as chromostereopsis. Typically, red or bright objects appear
closer than blue or dull objects.
Texture Gradient: More detail is perceived in an object that is closer to the viewer,
but with distance, textures get blurry due to perspective transformation. For example, a
brick or stone texture is coarse in the foreground but gets progressively finer with distance. This
texture gradient causes relative depth perception.
Lighting and Shadows: Scene illumination plays an important role in giving a realistic
depth sensation to an image. Accurate use of lighting and shadows also enhances photo-
realism. An object that is further away from the light source appears darker and casts a
smaller shadow. The effects of light and shadow on an object give cues about depth, shape,
relative position and size.
4.3 Physiological Depth Cues
Physiological depth cues are perceived when eye structure changes. Examples include variations in the thickness of the eye lens or convergence of two eyes on an object to bring
it into focus. The physiological depth cues are due to binocular disparity, ocular motion
such as accommodation and convergence, or monocular cues as observed in motion
parallax.
Binocular Disparity: One of the most noticeable and important depth cues is binocular
disparity, also known as retinal disparity. It refers to the difference between the images formed
on the retinas of the left and the right eye. If the two images formed on each retina are
somehow superimposed, two horizontally displaced but overlapping images would be seen.
Depth perception is possible when the brain processes the existence of small differences between the two retinal images, a process known as stereopsis [65]. The differences between the two retinal images are due to the difference in the two angles formed, in each eye, between the retinal projection of the object in focus and any other object in the field of view, as illustrated in Figure 4.3. Two cubes are in the field of view, one farther away than the other. Assume both eyes are fixated on a corner of the cube nearer to the viewer, labeled n1, such that the image of n1 is focused at corresponding points in the center of the fovea of each eye, where the visual acuity is highest. A corner, labeled n2, of the farther cube will be imaged on the retina of each eye at a different distance from the respective fovea; therefore θl ≠ θr. This difference is binocular or retinal disparity. Moreover, retinal disparity in relation to an object of interest can be
Figure 4.3 Binocular disparity
defined as the difference between the convergence angle of that object and the convergence
angle associated with the fixation target. From intersecting lines and opposite angles, it
is deduced that the retinal disparity at point n2 can be expressed as θl − θr = ϕ − α. In other words, when the convergence angle of the object (α) is smaller than the convergence angle
associated with the fixation target (ϕ), the object is farther than the fixation target and the retinal disparity is positive.
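The sign convention can be checked numerically with the convergence angles from Figure 4.3; the 6.5 cm inter-ocular distance and the helper functions below are assumptions for illustration:

```python
import math

def convergence_angle(distance, iod=0.065):
    """Full angle (radians) between the two lines of sight converging
    on a point `distance` meters straight ahead of the viewer."""
    return 2.0 * math.atan((iod / 2.0) / distance)

def retinal_disparity(fixation_dist, object_dist, iod=0.065):
    """phi - alpha: positive when the object lies beyond fixation."""
    phi = convergence_angle(fixation_dist, iod)   # fixation target (n1)
    alpha = convergence_angle(object_dist, iod)   # other object (n2)
    return phi - alpha
```

An object farther away than the fixation point yields a positive disparity, and a nearer object a negative one, matching the rule stated above.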
Accommodation: The contracting or relaxing of the eye muscles that changes the shape
of the eye lens is called accommodation. To see an object that is close to the viewer, the
ciliary muscles contract and the lens thickens. Conversely, the muscles relax, stretching
the lens thin, to see distant objects, as illustrated in Figure 4.4. The changes in the
lens shape focus incoming light rays onto the retina to form a clear image. Accommodation
is categorized as an oculomotor depth cue because eye muscles are used in changing the
focus. Out of focus information from different states of accommodation can also provide
useful depth information.
Figure 4.4 Accommodation - changes in lens thickness accommodate focus on objects
Convergence: It is also referred to as vergence, which is the inward or outward rotation
of the eyes towards a point of interest as it moves towards or away from the viewer as shown
in Figure 4.5. It is also an oculomotor depth cue because eye muscles are used in rotating
the eyeballs. As an object moves away from the viewer, the eyes diverge and move outward
until they reach the maximum parallel position. In normal human vision, the eyes cannot
diverge beyond this point. As both eyes rotate inward or outward to converge on the
object, the lens becomes thicker or thinner, respectively, to bring the object into focus and
accommodate. Thus, in real life, accommodation and convergence occur simultaneously when viewing the world with stereovision.
Figure 4.5 Vergence - inward or outward movement of both eyes to converge on objects
Motion Parallax: Motion parallax is a monocular depth cue that occurs when either
the observer or the scene moves relative to the other. An image formed on the retina by
a distant object moves across the retina more slowly than an object’s image that is closer
to the viewer. Thus, objects closer to the viewer appear to move faster than objects that
are farther away. This allows relative depth judgments to be made. The effects of motion
parallax are evident when looking outside of a window from a moving car, where objects in
the distance appear stationary while objects close by rapidly travel across the viewer's field
of view as depicted in Figure 4.6.
Figure 4.6 Motion parallax - objects closer to the viewer appear to move faster
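The inverse relationship between distance and retinal image speed can be sketched with a first-order approximation: for an observer translating at speed v, a stationary object directly abeam at distance d sweeps across the visual field at roughly v/d radians per second. The function and units below are illustrative, not taken from the text.

```python
def retinal_angular_speed(observer_speed, distance):
    """First-order motion parallax: angular speed (rad/s) of a
    stationary object directly abeam of an observer moving at
    observer_speed (m/s), at perpendicular distance (m)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return observer_speed / distance

# From a car at 20 m/s, a nearby tree sweeps past far faster
# than a distant hill, which appears almost stationary.
near = retinal_angular_speed(20.0, 10.0)
far = retinal_angular_speed(20.0, 1000.0)
```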
4.4 Creating Stereo Pairs
The rendering of a stereo pair requires an understanding of the geometry of viewpoints
represented by two virtual cameras described by a viewing frustum in the right handed
Cartesian coordinate system. A frustum is the view volume formed by the eye’s field of view. In computer graphics, the frustum is clipped by near and far clipping planes. Objects
appearing in front of the near plane or behind the far plane are not visible. The cameras
are horizontally separated by a distance, called the inter-axial distance, which represents
separation between the two eyes, the inter-ocular separation, thus simulating stereovision.
The simplest method of rendering a stereoscopic image pair is to set-up two virtual cameras
by using the parallel axis method shown in Figure 4.7. This model has parallel axes with symmetric view frustums.
Figure 4.7 Stereo visible where the two view frustums overlap (top view)
It is recommended for creating stereo pairs because it does not generate
keystone distortion or vertical parallax and has zero disparity for points at infinity [66]. The field of view common to both cameras is where stereovision is observed. Objects that are visible in one camera but outside the field of view of the other are regions to
be avoided. Such object placement causes visual discomfort due to the object’s visibility
in one eye but not the other. To perceive corresponding points in the left and the right
images correctly, it is important to adjust the image. This is accomplished by either using
an asymmetric view frustum during rendering or cropping the sides of the image during
post-production. In computer graphics, use of an asymmetric view frustum is common
because of simpler implementation.
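The asymmetric (off-axis) view frustum can be sketched as follows. This is a common formulation rather than code from this study, and the parameter names are illustrative: each eye's horizontal frustum bounds are shifted in proportion to near/convergence, so points on the convergence plane project with zero parallax while the camera axes stay parallel.

```python
import math

def stereo_frusta(fov_y_deg, aspect, near, far, convergence, eye_sep):
    """Asymmetric view-frustum bounds (left, right, bottom, top,
    near, far) for each eye of a parallel-axis stereo camera.
    convergence: distance to the zero-parallax (stereo window) plane."""
    top = near * math.tan(math.radians(fov_y_deg) / 2)
    half_w = top * aspect
    # horizontal shift of each frustum, measured at the near plane
    shift = (eye_sep / 2) * near / convergence
    left_eye = (-half_w + shift, half_w + shift, -top, top, near, far)
    right_eye = (-half_w - shift, half_w - shift, -top, top, near, far)
    return left_eye, right_eye
```

Because the frusta remain parallel (no toe-in), this setup introduces no keystone distortion or vertical parallax, matching the recommendation above.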
4.5 Viewing Stereo Pairs
The 2D display surface capable of forming a stereo image by projecting the left- and right-
eye views is called a stereo window. Consider looking through a real window on a building,
objects may appear behind, at, or in front of this window. Similarly, in a computer generated
stereoscopic view, looking at a stereo window is analogous to looking through a real window.
The intersection of two lines from a point in the scene to each eye produces what is called
homologous points on the stereo window. The horizontal distance between the homologous
points is called the horizontal parallax. Three types of horizontal parallaxes that induce
stereoscopic depth cues are shown in Figure 4.8. The sign of the difference between the
abscissas of two homologous points determines the type of parallax: positive, zero, or
negative. An object appears behind the stereo window when the horizontal parallax is
positive. This happens when the projections of the object on the stereo window for each
eye are on the same side as the respective eyes, which are uncrossed. The maximum positive
parallax occurs when the object is at infinity, i.e., both eyes are looking straight ahead with parallel
lines of sight. At this point the horizontal parallax is equal to the inter-ocular distance.
An extreme form of positive parallax occurs when the horizontal parallax is greater than
the viewer’s inter-ocular separation. This is known as diverging parallax resulting in a
phenomenon called walled-eye vision. This phenomenon does not occur under natural viewing conditions for normal human vision. When an object has no perceivable amount
57 Figure 4.8 Stereo window and horizontal parallax
of horizontal parallax it appears to be at the stereo window. The projection of the object on
the stereo window is coincident for both the left and the right eye, hence zero horizontal
parallax.
An object is located in front of the stereo window when the projection for the left eye is
on the right and the projection for the right eye is on the left, hence crossed also known as
negative parallax. Note that a negative parallax is equal to the inter-ocular distance when
the object is halfway between the stereo window and the viewer. As the object moves closer
to the viewer the negative parallax increases to infinity.
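The parallax relationships above follow from similar triangles. With eye separation e and a viewer at distance D from the stereo window, a point at distance z from the viewer produces horizontal parallax e(z − D)/z on the window: zero at the window, approaching +e as z goes to infinity, and exactly −e when the point is halfway between window and viewer. A small sketch, with symbols chosen here to match the description rather than taken from the text:

```python
def horizontal_parallax(e, D, z):
    """Signed horizontal parallax on the stereo window.
    e: inter-ocular separation, D: viewer-to-window distance,
    z: viewer-to-point distance, all in the same units."""
    return e * (z - D) / z

e, D = 6.5, 100.0                                   # centimeters
assert horizontal_parallax(e, D, D) == 0.0          # at the window
assert abs(horizontal_parallax(e, D, D / 2) + e) < 1e-9  # halfway: -e
```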
Several techniques exist to view a stereo pair, with or without optical aid. Viewing a
stereo pair without assistance from any apparatus is called free viewing. With practice, most viewers can fuse stereo pairs using this technique. Viewing a stereo pair with an optical
aid can be grouped into two categories, time-parallel and time-multiplexed methods. The
following sub-sections describe these techniques.
4.5.1 Free Viewing
Parallel (uncrossed) and transverse (crossed) viewing are the two types of free viewing techniques.
In parallel viewing, the left eye image is placed to the left of the right eye image. When the viewer looks at the two images, the eyes are uncrossed. The lines of sight of the viewer’s
eyes move outward toward parallel and meet in the distance at a point well behind and
beyond the image.
In transverse viewing, the left and right eye images are reversed requiring the viewer to
cross eyes to restore image placement and perceive depth. Most people are better at one
type of free viewing over the other.
Free viewing skills require conscious effort, concentration, and practice to master. Consequently, these methods are mostly used by experts in the field, enabling them to view stereo pairs without optical aid.
4.5.2 Time-parallel Viewing
In the time-parallel method both eyes are presented with stereo pairs simultaneously. Such
methods include anaglyphs, advanced wavelength multiplexing approach used by Dolby
3D, and auto-stereoscopic techniques.
Anaglyphs: Various techniques use filters to extract images for the left- and right-eye.
The anaglyph method is one such technique that has been used extensively in viewing
stereo pairs. In anaglyphs the left- and right-eye images are superimposed and the pixel values are computed from the combination of the left eye color and the right eye color.
Anaglyphs require the viewer to see the image with glasses that use complementary color
filters. The filters for viewing the anaglyph on an electronic display are typically designed to
block blue and green wavelengths for the left eye and to block red wavelengths for the right
eye. Other common filter combinations include red-green and red-cyan. Anaglyphs are easy to
create and only require inexpensive color filter glasses to view either on a monitor or in
print. However, major drawbacks exist that make this technique unsuitable for mainstream
media. Since viewing of the anaglyph requires color filters, the true color fidelity of the
scene is often lost. In addition, it suffers from retinal rivalry that can create the appearance
of ghosting. Ghosting, also known as crosstalk, means that one eye can see part of the
image that is intended for the other eye. This causes an unpleasant viewing experience,
eye fatigue, and headaches after an extended period of looking at anaglyphs. Recent
techniques for computing anaglyphs that improve color faithfulness, based on the
transmission properties of the filters and the color characteristics of the display device, have
been proposed. The anaglyph output can be improved by using algorithms like uniform
approximation to produce brighter output [67] or the CIELab approximation method to
preserve color fidelity [68]. However, such algorithms incur significant extra computational overhead in anaglyph calculation.
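The basic channel-composition scheme described above, before any uniform or CIELab refinement, can be sketched per pixel. The functions below are illustrative, assuming RGB triples and a red-cyan filter pair:

```python
def anaglyph_pixel(left_rgb, right_rgb):
    """Naive red-cyan anaglyph: red channel from the left-eye pixel,
    green and blue channels from the right-eye pixel."""
    return (left_rgb[0], right_rgb[1], right_rgb[2])

def anaglyph_image(left, right):
    """Compose two equally sized row-major images of RGB triples."""
    return [[anaglyph_pixel(l, r) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]
```

Because each output pixel simply mixes channels from the two source images, this method is cheap; the color-fidelity algorithms cited above replace this per-channel copy with a per-pixel optimization, which is where their extra computational cost arises.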
Advanced Wavelength Multiplexing: The classic anaglyph approach is a wavelength
multiplexing technique where the whole wavelength range of visible light, from 400 to
700 nanometers (nm), is subdivided into two ranges: red and its complementary color,
cyan. The advanced wavelength multiplexing approach, sometimes referred to
as super-anaglyph, is adopted by Dolby 3D that uses narrower wavelength bands. This
approach works for stereo projection systems. It uses interference narrowband spectral
filters to extract the left- and right-eye image [69]. The filters have to be selected such that
they fall within the red, green, and blue sensitivity ranges of the cone cells, the photoreceptors
responsible for color vision, of the human eye. Each RGB value is split into two channels with slightly different wavelengths. This set of RGB triplets is used to encode the stereo pair.
For example, the left-eye image may use: red = 620 nm, green = 530 nm, and blue = 440 nm wavelengths, while the right-eye image may use slightly different wavelength values:
red = 615 nm, green = 525 nm, and blue = 435 nm. The projected stereo pair is decoded by glasses with appropriate interference filters. Although the glasses are passive, without
any active electronic components, they are expensive and not disposable because they use
specific wavelengths to filter light. Unlike anaglyphs, an advanced wavelength multiplexing
approach has full color output. The projection screen is inexpensive, either simple matte
or low gain, and it is compatible with standard cinema screens.
Auto-stereoscopy: The necessary use of glasses is a hindrance to widespread consumer
acceptance of viewing content in stereo. Auto-stereoscopic techniques provide a glasses-free
method for viewing stereo. Two ways of manufacturing spatially multiplexed auto-stereoscopic
displays are lenticular lenses and parallax barriers [70]. A lenticular surface has an array of small cylindrical lenses in front of the pixel raster. These lenses are placed in such a way that light from adjacent pixel columns falls into different viewing slots at some ideal viewing distance. Each of the viewer’s eyes sees light from only every other pixel column. The
parallax barrier technique achieves the same goal by using small visual barriers in front of
the pixel raster. Auto-stereoscopic displays are specialized, with a narrow viewing
angle. To get an optimum view, the observer needs to sit directly in front of the
screen. Auto-stereoscopic display technology is well suited for personal displays, such as
the Nintendo 3DS game console. Other drawbacks include cost, diminished image quality,
and being limited to viewing only 3D images.
4.5.3 Time-multiplexed Viewing
In the time-multiplexed method a stereo pair is presented to both eyes in a sequence.
Optical techniques are used to block one eye while the other eye is shown the image and vice versa. This method is grouped into two types, one with passive and the other with
active viewing glasses. In both types, the images are delivered at a faster frame rate, at least
120 Hz, to avoid flicker. The use of passive polarized glasses and active shutter glasses
are examples of time-multiplexed methods. Time-multiplexed methods are sometimes
referred to as field sequential if the left- and right-eye images are shown using the interlaced
fields. The term frame sequential is used when the display is progressive, or non-interlaced.
Passive Polarized Glasses: In a system that uses polarized light to display a stereo pair,
the left- and right-eye images are polarized orthogonal to each other. A newer version of
this technology uses circular polarization, where one lens is polarized clockwise and
the other counterclockwise. If the viewer tilts his head, the images will not separate as they would in the linearly polarized case. The viewer uses passive polarized glasses that also
have orthogonal axis of polarization. The combination of the two acts as a shutter. When
the left eye image is displayed, the light is polarized such that it is parallel to the axis of the
left eye lens. Since the right lens’ polarization axis is orthogonal to the left lens, the image
to the right eye is blocked. The passive system does not need any synchronization with
the display device. It also allows several viewers to simultaneously view in stereo and has
a relatively larger field of view. For these reasons this system is popular in movie theaters.
The disadvantage of this type of system is that the display device must produce a polarized
image. The projector must use polarizing lenses, and the viewing screen must be coated with a special material to reflect polarized light. Since the passive polarized glasses essentially act as dark sunglasses, the intensity of the light that reaches the viewer is low, which causes images to appear dark.
Active Shutter Glasses: The glasses used in this method do not have filters. Instead an
LCD acts as a blocking shutter. An electronic signal is used to make the lenses either clear or opaque. The signal alternates for each eye, causing the left eye to see the view while the right is blocked, and vice versa. The view from the glasses is actively synchronized with the current frame on the display via an infrared signal between the active shutter glasses and the display. This method is capable of delivering a full high-resolution progressive image to each eye. Since active shutter glasses do not use polarized light, the light intensity reaching the viewer is much higher. However, the major disadvantage of this system is that it requires additional logic to maintain synchronization between the display and the glasses.
Quad Buffer Stereo: Quad buffering is a technology for implementing stereoscopic rendering that uses double buffering with a front and back buffer for each eye, totaling four frame buffers. Quad buffering allows swapping the front and back buffers for both eyes in sync, allowing the display to seamlessly work with different rendering frame rates. A GPU that supports quad buffering also supports time-sequential images to be viewed by active glasses with liquid crystal shutters.
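A quad-buffered rendering loop amounts to drawing the scene twice per frame, once into each eye's back buffer, before a single synchronized swap. A minimal structural sketch, where the callbacks stand in for OpenGL calls such as selecting the BACK_LEFT or BACK_RIGHT draw buffer; the function and parameter names are illustrative:

```python
def render_stereo_frame(select_draw_buffer, draw_scene, swap_buffers,
                        left_camera, right_camera):
    """One frame of quad-buffered stereo: render each eye's view into
    its own back buffer, then swap both front/back pairs together."""
    select_draw_buffer("BACK_LEFT")
    draw_scene(left_camera)
    select_draw_buffer("BACK_RIGHT")
    draw_scene(right_camera)
    swap_buffers()  # presents left and right front buffers in sync

# Recording stubs show the call order the loop produces.
calls = []
render_stereo_frame(lambda buf: calls.append(buf),
                    lambda cam: calls.append(cam),
                    lambda: calls.append("swap"),
                    "left", "right")
# calls == ["BACK_LEFT", "left", "BACK_RIGHT", "right", "swap"]
```

Swapping both buffer pairs in one call is what lets the display present a consistent stereo frame even when the renderer's frame rate varies.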
4.6 Challenges with Stereo Rendering
Numerous experiments have shown that stereovision can provide significant advantages over monocular vision. In particular, stereovision can aid in spatial localization and in visualizing large amounts of complex data [71]. However, mainstream acceptance of stereo requires a comfortable viewing experience. In most people the dominant depth cue for
stereovision is binocular disparity. The control of depth cues for producing a correct stereo
effect is critical. A conflict in depth cues has a strong detrimental visual effect resulting in
one or more of the following:
1. The dominant depth cue may not be the correct or intended depth cue.
2. The depth perception may get exaggerated or reduced.
3. The stereo view may cause eyestrain and become uncomfortable to watch.
4. An object at the edges of the stereo window with negative parallax results in conflict
in depth cues.
5. The stereo pair may not fuse at all and the viewer may observe two separate images.
In a stereo pair the only difference should be in the horizontal parallax. Any other
difference in the color, geometry, and brightness between the two images will result in visual fatigue. Some people experience eyestrain when viewing a stereoscopic image pair
on a flat display device, where the relationship between accommodation and convergence
breaks down. In viewing a natural scene, both eyes accommodate and converge at the same
point. This is a habitual and learned response. However, viewing an object on a flat screen
makes the two eyes always focus on the screen itself while their lines of sight converge at
some point in space. Only image points with zero parallax have matching accommodation
and convergence. As a result, low horizontal parallax is used to reduce viewer discomfort.
The goal in creating a stereo pair is to provide maximum depth effect with the minimum
amount of horizontal parallax. This can be achieved by varying inter-axial separation.
Bringing the two camera viewpoints towards each other reduces the amount of horizontal
parallax. Conversely, greater inter-axial separation results in greater parallax. As a general
rule, horizontal parallax should not exceed half an inch if viewed from typical viewing
distance of eighteen inches for a desktop monitor. The leakage of the left-eye image into the
right-eye and vice versa results in crosstalk or ghosting. It can result from inaccurate shutter
synchronization in active shutter glasses or from phosphor afterglow in plasma or older CRT
displays. A difference in perception of color, contrast, brightness, and geometry between
the left- and the right-eye all result in a ghosting effect. Reducing horizontal parallax also
helps to diminish this effect.
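The half-inch rule of thumb above corresponds to a fixed angular parallax; scaling the limit linearly with viewing distance keeps that angle constant. A small helper encoding one interpretation of the rule, not a formula from the text:

```python
def max_screen_parallax(viewing_distance_in):
    """Rule-of-thumb parallax budget: 0.5 in at an 18 in viewing
    distance, scaled linearly (i.e., constant angular parallax)."""
    return viewing_distance_in * 0.5 / 18.0

assert max_screen_parallax(18.0) == 0.5   # the stated desktop case
```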
In creating a stereo view it is important to decide where to position the object with
respect to the stereo window. If an object has negative parallax and the edges of the stereo window cut off that object, then there will be a conflict in depth cues. The occlusion of the
object by the stereo window tells the viewer that the object is behind the window. On the
other hand, the stereo cue provided by the parallax tells the viewer that the object is leaping
out of the window. From experience, when looking through a window, an object cannot
be between the window and the viewer, hence the conflict in depth cues.
Strictly speaking, a stereo view is only correct for one viewing position. When looking
through the stereo window, if the viewer shifts the head from side to side, a shearing effect will be observed. If the viewer moves closer or farther away from the screen the scene will
compress or expand. This can be compensated in real-time systems by head tracking.
CHAPTER 5
IMPLEMENTATION FRAMEWORK
Algorithms that execute instructions while performing calculations on multiple indepen-
dent data sets in parallel are well suited for running on graphics hardware. Graphics pro-
cessing takes place on a programmable GPU. It is a specialized processor with dedicated
memory that performs floating point operations required for computer graphics. In re-
sponse to commercial demand for real-time rendering, the current generation of GPUs have
evolved into many-core processors that are specifically designed to perform data-parallel
computation. The single instruction multiple threads (SIMT) framework accomplishes
data-parallelism where multiple processing cores in a GPU execute same instructions on
different data sets using multiple threads executing in lockstep. Particle simulation is an
ideal computational task for a SIMT framework since it is assumed that every raindrop
or snowflake will behave independently. This chapter describes implementation details
needed to demonstrate stereoscopic, real-time, and photorealistic rendering of precipita-
tions.
5.1 Graphics Hardware
In graphics hardware terminology, the host is the CPU executing a graphics
application; it is responsible for capturing events such as key presses and mouse clicks, setting up
the initial state of the graphics program, and creating an output window to draw results.
The term device refers to a graphics card housing a GPU, which consists of thousands of
efficient data-processing cores designed for handling multiple data sets simultaneously.
This increases throughput, the amount of data processed in a given unit of time, which is
ideal for high performance computing or executing graphics applications.
A prototype application is developed to test the performance of real-time rendering
of rain in stereo on an Intel Core 2 Duo CPU running at 2.00 GHz with 2.0 GB of RAM,
equipped with a generic graphics card from the Intel express chipset family.
The experiments performed on this basic graphics hardware validate that stereoscopic
rendering of precipitation scene is possible in real-time.
In later experiments the graphics card used is an NVIDIA QUADRO K5000, which has
3.54 billion transistors. This is powerful graphics hardware, capable of supporting modern
OpenGL version 4.3 and above by featuring a programmable GPU. It has a theoretical
peak single-precision floating-point rate of 3090 GFLOPS. It incorporates
1,536 compute unified device architecture (CUDA) cores clocked at 1.06 GHz. The term
CUDA is a marketing name NVIDIA uses to define a GPU parallel computing platform and
programming model, which enables improvement in computing performance by harness-
ing the data-parallelism provided by the GPU. A CUDA core, also known as a streaming
processor, is a pipelined hardware unit accepting instructions and executing them in parallel. Other GPU vendors, such as AMD, refer to a CUDA core as a single instruction multiple
data (SIMD) unit. In OpenGL terminology, a CUDA core is called a shader unit because
it executes a shader program. The QUADRO K5000 graphics card also supports 4 GB of
GDDR5 memory clocked at 6.08 GHz which produces memory bandwidth of 192.26 GB/s
and a texel (texture element, also known as texture pixel) fill rate of 128.8 GTexel/s.
5.2 Graphics Software
There are two major low-level graphics libraries available to facilitate GPU programming,
DirectX and OpenGL. DirectX is proprietary and specific to the Microsoft Windows operating
system. It is optimized to render video games with real-time performance, sacrificing photorealism. However, with each new release of DirectX, visual quality and experience have
improved through better algorithms that take advantage of the programmability of a
modern GPU. Windows 10 supports the latest version of DirectX, version 12. On the other hand,
OpenGL is an open framework. An application that uses OpenGL can execute on various
platforms. It is shipped as part of the graphics hardware driver and therefore some features
can be vendor specific. It provides a good balance between real-time and photorealistic
output. Additionally, it supports stereoscopic rendering by using quad buffers. Until re-
cently DirectX did not support stereoscopic implementation. Since one major goal of this
study is to produce results in stereo, OpenGL is selected for software implementation.
As the GPU hardware evolved so did the programming model. The prototype application
developed with OpenGL 2.0 to implement stereo rendering of rain uses a fixed-function
graphics pipeline. This pipeline is configurable but can only use fixed functionality, such
as lighting models that run on a GPU but cannot be changed or programmed. Because of
this, it is limited in use and does not take advantage of modern hardware improvements.
In the modern GPU, the fixed-function graphics pipeline has been replaced with
a programmable graphics pipeline. In OpenGL, such programs are written in a C-like
language called the OpenGL Shading Language (GLSL) and are called shader programs.
The name shader is a misnomer since it has little to do with various lighting or shading
models. Shader programs are executable programs that run on a GPU and process
data in parallel. The OpenGL driver and underlying hardware are responsible for parallel
execution, scheduling, and synchronization.
5.3 Stereoscopic Implementation
The monoscopic particle system that simulates the behavior and distribution of falling
particles is extended to stereo, where left- and right-eye views are calculated and rendered
separately. Thus, the scene is rendered from two different perspectives. The scene with
rain or snow particles is modeled using the world coordinate system, while each object
is modeled using its respective model coordinates. Several matrix transformations are
required to transform all model coordinates to screen coordinates. This approach is typical
in a raster graphics pipeline where scene objects are used to form an image on the screen.
Figure 5.1 shows this transformation process. All operations shown in gray boxes are software
implementations while blue boxes are GPU hardware units.
The implementation to produce stereo output is described by Hussain and McAllister
[72], where precipitation is rendered within a bounding box. The farther rain or snow particles are from the stereo camera, the larger the disparity between the two views and the smaller
the particle size. A particle at near infinity will approach maximum disparity and minimum
Figure 5.1 Transformation from 3D world space to 2D screen space
particle size. If the inter-axial distance, the distance between the left- and right-view cameras, is kept similar to the distance between the human eyes, which is about 6.5 cm, then stereovision will only be effective up to 30 meters (m) from the camera [73]. Since stereovision will only be strong for precipitation forming close to the camera, to improve rendering speed the precipitation bounding box is adjusted such that any particle formed outside of this boundary need only be rendered once. Conversely, the inter-axial distance can be increased to produce hyper-stereo output, which is suitable for viewing outdoor scenery and distant rain or snow particles in stereo. The greater the inter-axial separation, the greater the depth effect.
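If a 6.5 cm inter-axial separation yields useful stereovision out to about 30 m, as stated above, and the effective range is assumed to scale linearly with separation (an assumption made here for illustration, not a claim from the text), a hyper-stereo baseline can be related to the desired range:

```python
def effective_stereo_range_m(interaxial_cm):
    """Approximate distance (m) out to which stereo depth remains
    noticeable, assuming linear scaling from 6.5 cm -> 30 m."""
    return 30.0 * interaxial_cm / 6.5

assert effective_stereo_range_m(6.5) == 30.0
# Doubling the baseline roughly doubles the useful stereo range,
# which is the basis of hyper-stereo capture of distant scenery.
assert effective_stereo_range_m(13.0) == 60.0
```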
Initially, the OpenGL 2.0 graphics library is used for experiments. Later implementations use newer graphics hardware that supports OpenGL 4.5. The newer application executes on an Intel quad-core i7 CPU running at 2.00 GHz with 16.0 GB of RAM, equipped with an
NVIDIA Quadro K5000 GPU. The graphics card supports an advanced graphics pipeline
featuring a programmable GPU. The hardware also supports quad buffer stereo, in which time-sequential images are viewed through the active liquid crystal shutter glasses described in Chapter 4.
Alternatively, the frame buffer object (FBO) can be used to produce stereo anaglyphs.
In a fragment shader, instead of writing the pixel data to the frame buffer for rendering,
the FBO is used to save the pixel data. The fragment shader generates stereo anaglyphs
by rendering the left- and right-eye images to separate FBOs, which are then rendered to texture to
produce a composite of left- and right-eye views. The red component comes from the
left-eye image while the green and blue components come from the right-eye image.
5.4 Real-time Implementation
An earlier goal of this study is to investigate whether real-time rendering in stereo is viable.
Stereoscopic real-time output in other natural phenomena, such as fire [10], clouds [11],
and vegetation [12] have been studied. However, rain is distributed over a large 3D space, and therefore representing each particle to produce a visually realistic animation of rain in
stereo is a challenge.
In a previous study [72], a method for real-time rendering of rain in stereo is presented. It does not consider complex illumination and therefore produces an output without
emphasis on photorealism. The OpenGL 2.0 application executes on the Intel Core 2 Duo
CPU with a generic graphics card, which is adequate for basic rendering but does not
support an advanced graphics pipeline or a programmable GPU. Using OpenGL 2.0, the
particle simulation is implemented on the CPU. For each new frame, updates to pixel data
are made and passed to the GPU for rendering. This creates a bottleneck between CPU and
GPU, which is a major reason for GPU hardware and software redesign. The experiments with the older hardware show that a real-time stereo implementation of rain for simple
scenes is possible.
Recent improvements in graphics hardware architecture and corresponding updates to
OpenGL have made it possible for the CPU to send data once to the GPU, which keeps the
data resident in local memory. Thus, all graphics related data processing and simulation
takes place on the GPU. This enables a particle system to be implemented on a GPU. The
desired video frame rate is attained by taking advantage of the inherently parallel architecture of
a modern GPU, which can operate in two program modes: shader mode and compute mode.
5.4.1 Shader Mode
In the shader mode, the current OpenGL programmable graphics pipeline, version 4.5, is
divided into four shader stages. The two required stages are the vertex and fragment
shaders, while the tessellation and geometry shader stages are optional. Every
modern OpenGL program must have at least a vertex and a fragment shader, which together constitute
a shader program.
The data passed from the host to the device is first processed by a vertex shader. It
operates on the vertex data which is in the form of 3D geometric primitives, like points,
lines, or triangles. Basic object geometry is defined by a collection of vertices and associated
attributes, like position, color, texture coordinates, normal vectors, and other attributes
that may define that object. The vertex attributes are initially stored in host system
memory in array form in a Vertex Buffer Array (VBA). This array is passed to device
memory, where it is referred to as a Vertex Buffer Object (VBO). Multiple vertex buffer
objects form a geometric scene. The vertex shader program is responsible for reading each vertex from the vertex buffer object and processing it in parallel. At a minimum, a vertex
shader calculates the projected position of a transformed vertex in screen space. It can
also generate other outputs, such as a color or texture coordinates, for the rasterizer to
blend across the surface of the triangles connecting the vertex. The particle geometry, for
either a raindrop or a snowflake, and its associated attributes are defined in the VBA. Instead of
using detailed geometric models, which may require hundreds of vertices to define, a coarse
icosahedron shape is used. An icosahedron uses only twelve vertices and takes far
less host and device memory to store. Using the optional shader stages, these initial vertices are later subdivided to form the desired particle shape of a raindrop or a snowflake, which can also be texture mapped for greater visual detail.
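The twelve-vertex icosahedron used as the base particle shape can be generated from the cyclic permutations of (0, ±1, ±φ), with φ the golden ratio. This is a standard construction sketched here for illustration, not the study's code:

```python
import math

def icosahedron_vertices():
    """Twelve vertices of a regular icosahedron: the cyclic
    permutations of (0, +-1, +-phi), phi being the golden ratio."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    return verts

verts = icosahedron_vertices()
assert len(verts) == 12
# All vertices lie on a common sphere, so later tessellation stages
# can smoothly refine the shape toward a rounder raindrop.
radii = {round(math.sqrt(x*x + y*y + z*z), 9) for x, y, z in verts}
assert len(radii) == 1
```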
The output of the vertex shader is passed to the next stage in the graphics pipeline. In
the absence of optional shader stages, such as a tessellation or geometry shader, the data
enters a fragment shader after passing through a hardware unit called the rasterizer. The
rasterizer produces visible pixel-size fragments. OpenGL stores the depth values of objects in a
depth buffer, also called a z-buffer. These values are used in a depth test which checks for
objects to be within the near and far clipping planes. A fragment is a candidate pixel that has
not yet been subjected to the depth test. The GPU passes these fragments to a fragment shader.
The fragment shader adds color to each fragment and applies depth values
that are then written into the framebuffer. Common fragment shader operations include
texture mapping and lighting. Since the fragment shader runs independently for every pixel
drawn, it can perform the most sophisticated special effects; however, it is also the most performance-sensitive part of the graphics pipeline. Since the fragment shader manipulates color values, it is used to produce stereo anaglyphs. It produces a composite image that incorporates both the left- and right-eye views. The red component of the composite image comes from the left-eye image, and the green and blue components come from the right-eye image. Figure 5.2 shows the final output of the fragment shader, forming an anaglyph from an initial icosahedron shape.
Figure 5.2 Fragment shader (stereo view with red-cyan anaglyph glasses)
Between the two required vertex and fragment shader stages there are two optional stages, namely the tessellation and geometry shaders. The tessellation shader consists of three
sub-stages for creating more detail in geometry. The first tessellation stage takes the incoming vertices and passes them through tessellation control (TC). This stage determines the number of new vertices to generate. The output of TC passes to the tessellator, a graphics hardware unit responsible for generating new vertices by interpolating between the original and newly specified vertices. The positions of these new vertices are determined by tessellation evaluation (TE), which defines the shape of the new geometry. Tessellation shader stages are used to produce a dynamic level-of-detail in a scene instead of using static texture maps. They are also used to make curved surfaces smoother or to produce sharp edges suitable for terrain rendering. Figure 5.3 shows an input icosahedron shape tessellated into a more refined spherical shape.
Figure 5.3 Input and output of tessellation shader
A geometry shader is the other optional shader stage that is responsible for modifying the geometry of objects before passing to a rasterizer. This stage is ideal for geometric instancing. For example, the geometry of an object is specified once but instead of making a draw call for each new object, instancing is used to make one draw call with multiple
instances of the same object drawn. This stage is useful in creating precipitation scenes where a single particle is drawn many times by a single draw call using multiple instances
of a particle.
With the use of a programmable GPU, and by performing all simulation and rendering
computation on the GPU itself, real-time frame rates for stereoscopic rendering are achieved.
Figure 5.4 shows the modern GPU programmable shader stages for rendering and computa-
tion. Each stage runs on the GPU taking advantage of the parallel hardware architecture. The vertex shader operates on the vertex data which is in the form of 3D geometric primitives,
like points, lines, or triangles. The fragment shader operates on the fragments generated by
the rasterizer. The compute shader is not part of the programmable graphic pipeline but
runs in parallel to other shaders and is responsible for non-graphics related computations,
such as particle simulation.
5.4.2 Compute Mode
In the compute mode, the GPU can be used as a general purpose processor in which data
processing only takes place for simulation purposes and not for rendering. Programming
models defined by CUDA, OpenCL, and compute shader programs are used in GPU com-
pute mode. CUDA is vendor specific and can only be implemented on NVIDIA graphics
cards. OpenCL is an open standard designed to run on graphics cards from any vendor.
The compute shader is a new programming stage introduced in OpenGL version 4.3 to
provide better interoperability between GPU rendering and simulation tasks. It enables a
GPU to be used for general purpose computing.
Figure 5.4 Modern graphics pipeline
The particle simulation for rain and snow is performed in GPU compute mode using a compute shader. It is also used to simulate
environment effects, such as gravity and wind to implement a particle system. The compute
shader uses two buffers; one stores the current velocity of each particle and a second stores
the current position. At each time step, a compute shader updates position and velocity values. Each invocation of a compute shader processes a single particle. The current velocity
and position are read from their respective buffers. A new velocity is calculated for the
particle and then this velocity is used to update the particle’s position. The new velocity
and position are then written back into the buffers. To make the buffers accessible to the
compute shader program, a shader storage buffer object (SSBO) is used. Each particle
is represented as a single element stored in an SSBO. This is memory reserved on the
graphics card. Each member has a position and a velocity that are updated by a compute
shader that reads the current values from one buffer and writes the result into another
buffer. That buffer is then bound as a vertex buffer and used as an instanced input to
the rendering vertex shader. The algorithm then iterates, starting again with the compute
shader, reusing the positions and velocities calculated in the previous pass. No data leaves
the GPU memory, and the CPU is not involved in any calculations. An alternative way to pass
data to the GPU is via buffer textures that are used with image load and store operations.
A particle system can also be implemented using a transform feedback buffer which is
implemented using vertex and geometry shader programs.
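The double-buffered update loop described above can be sketched in plain Python. The names, gravity constant, and frame time below are illustrative; on the GPU each particle would be one compute-shader invocation running in parallel, with the loop standing in for the SIMT hardware:

```python
GRAVITY = (0.0, -9.8, 0.0)  # illustrative constant, m/s^2
DT = 1.0 / 60.0             # one frame at 60 fps

def compute_step(positions, velocities):
    """One simulated compute-shader dispatch: each particle reads the
    current buffers and writes fresh output buffers (no in-place update),
    mirroring the two-SSBO ping-pong described above."""
    new_vel = [tuple(v + g * DT for v, g in zip(vel, GRAVITY))
               for vel in velocities]
    new_pos = [tuple(p + v * DT for p, v in zip(pos, vel))
               for pos, vel in zip(positions, new_vel)]
    return new_pos, new_vel

# Each element of the "SSBO" is one particle: a position and a velocity.
pos = [(0.0, 100.0, 0.0)] * 4
vel = [(0.0, 0.0, 0.0)] * 4
for _ in range(3):          # iterate, reusing the previous pass's output
    pos, vel = compute_step(pos, vel)
```

On the GPU the two buffers are simply swapped between passes, so no particle data ever leaves graphics memory.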
5.4.3 Transform Feedback
Transform feedback captures vertices as they are assembled into primitives (points, lines, or
triangles). Each time a vertex passes through primitive assembly, those attributes that have
been marked for capture are recorded into one or more buffer objects. Those buffer objects
can then be read back by the application. To implement a particle system transform feed-
back requires two passes. On the first pass transform feedback is used to capture geometry
as it passes through the graphics pipeline. The captured geometry is then used in a second
pass along with another instance of transform feedback in order to implement a particle
system that uses the vertex shader to perform collision detection between particles and the
rendered geometry. An implementation of a particle system using transform feedback is
illustrated in Figure 5.5.
In the first pass, a vertex shader is used to transform object space geometry into both world space, and into eye space for rendering. The world space results are captured into a
buffer using transform feedback, while the eye space geometry is passed through to the rasterizer.
Figure 5.5 Implementation of a particle system using transform feedback
The buffer containing the captured world space geometry is attached to a texture
buffer object (TBO) so that it can be randomly accessed in the vertex shader that is used
to implement collision detection in the second simulation pass. Using this mechanism,
any object that would normally be rendered can be captured, so long as the vertex (or
geometry) shader produces world space vertices in addition to eye space vertices. This
allows the particle system to interact with multiple objects, potentially with each render
using a different set of shaders.
The second pass is where the particle system simulation occurs. Particle position and velocity vectors are stored in a pair of buffers. Two buffers are used so that data can be
double-buffered as it is not possible to update vertex data in place. Each instance of the vertex shader performs collision detection between the particle and all of the geometry
captured during the first pass. It calculates new position and velocity vectors, which are
captured using transform feedback, and written into a buffer object ready for the next step
in the simulation. To produce results in stereo two sets of transform feedback buffers are
maintained at the same time.
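The per-particle logic of the second pass can be sketched as follows, assuming the geometry captured in the first pass is a single horizontal ground plane; the plane height, restitution factor, and time step are hypothetical stand-ins for the captured world space geometry:

```python
GROUND_Y = 0.0       # world-space "captured geometry": a hypothetical plane
DT = 1.0 / 60.0
RESTITUTION = 0.3    # illustrative damping applied on a bounce

def simulate_pass(particles):
    """Pass 2: each vertex-shader-style invocation tests its particle
    against the captured geometry; the results would then be recorded by
    transform feedback into the second buffer of the pair."""
    out = []
    for (x, y, z), (vx, vy, vz) in particles:
        vy -= 9.8 * DT                 # gravity
        ny = y + vy * DT
        if ny < GROUND_Y:              # collision with the captured plane
            ny = GROUND_Y
            vy = -vy * RESTITUTION     # reflect and damp the velocity
        out.append(((x + vx * DT, ny, z + vz * DT), (vx, vy, vz)))
    return out

particles = [((0.0, 0.005, 0.0), (0.0, -1.0, 0.0))]
particles = simulate_pass(particles)   # "written" into the other buffer
```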
5.5 Photorealistic Implementation
There are several applications of synthetic precipitation phenomena found in movies, video games, and virtual reality. Special effects are often used in movies but they require
expensive and time consuming off-line processing to achieve photorealism. On the other
hand, video games require a balance between real-time user input and photorealistic
output. Evolution of software algorithms and recent improvements in computer hardware
have made it possible to produce real-time frame rates with emphasis on photorealism.
Photorealism is defined as a detailed representation like that obtained in a photograph in a
non-photographic medium such as a painting or, in our case, computer graphics.
References to photorealism appear in several studies. However, characterization of the
term varies. This is because human response to visual stimuli is as variable as individuals
themselves. Photorealism as defined by Rademacher et al. [74] is an image that is perceptually indistinguishable from a photograph of a real scene. Since a photograph is planar, this
characterization of photorealism is well suited for monoscopic view. It is human conscious-
ness that identifies a scene as real or otherwise. Our brain creates a perception of reality
after processing visual information received by our two eyes, which are taking a snapshot
of the world around us from two different perspectives. A stereoscopic characterization
of photorealism is closer to visual realism where depth is perceived by presenting images
from two different perspectives. An alternate definition of photorealism is given by Ferwerda [75]. The author considers photorealism as images that are photometrically realistic, where photometry is the measure of the eye's response to light energy. Thus, photorealistic
rendering is about simulating light or how photons move around in a scene. The better
the approximation of this process, the closer we can get to photorealism. Techniques like
ray- or path-tracing, which simulate photons bouncing around in a scene, are inherently
better at producing photorealistic results. However, these techniques do not work well for a dynamic scene with many thousands of moving particles. The implementation
is further compounded by stereoscopic output at video frame rates, where a minimum of
120 fps screen refresh is required to achieve a jitter free stereo animation of rain or snow.
Thus, an approximation to photorealistic results is desired. Fortunately, it is difficult for
the human eye to notice subtle differences in a dynamic scene. Therefore, ignoring certain
photorealistic effects, such as soft shadows, which may make a visual difference in an
otherwise static scene, is a viable option.
5.5.1 Illumination using a GPU
Fragment shader programs are used to create realistic illumination effects on a GPU.
Image-based lighting and environment mapping techniques are used to reflect and refract
light from scene objects including rain and snow particles. Cube maps are one commonly
used variant of environment mapping, which maps the reflection and refraction vectors
from the surrounding texture onto the particle. The benefit of this approach is that it
requires less computation and is therefore good for real-time applications. However, the
problem with this approach is that it only deals with the front surface, not the back or any
other intersecting polygons in the main object. This means anything between the front face
of the particle and the background is not considered. Thus the reflection and refraction
of any moving objects will not appear in the particle; only the static surrounding will be
reflected and refracted.
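The reflection and refraction vectors used to sample the cube map can be sketched with the standard formulas (the same computation GLSL's built-in `reflect` and `refract` functions perform); the incident ray, normal, and air-to-water index ratio below are illustrative:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(I, N):
    """GLSL-style reflect: I - 2*dot(N, I)*N, for unit incident I, normal N."""
    d = dot(N, I)
    return tuple(i - 2.0 * d * n for i, n in zip(I, N))

def refract(I, N, eta):
    """GLSL-style refract; returns the zero vector on total internal
    reflection (k < 0), otherwise the transmitted direction."""
    d = dot(N, I)
    k = 1.0 - eta * eta * (1.0 - d * d)
    if k < 0.0:
        return (0.0, 0.0, 0.0)
    f = eta * d + math.sqrt(k)
    return tuple(eta * i - f * n for i, n in zip(I, N))

# A view ray hitting the top of a raindrop head-on:
I = (0.0, -1.0, 0.0)           # incident direction (unit length)
N = (0.0, 1.0, 0.0)            # surface normal (unit length)
R = reflect(I, N)              # direction used to sample the cube map
T = refract(I, N, 1.0 / 1.33)  # air-to-water ratio (assumed index 1.33)
```

In the fragment shader, `R` and `T` would index the surrounding cube map texture to color the particle surface.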
For simple monoscopic scenes, the cube map approach can give some interesting
results. However, to get more realistic stereo results it requires multiple passes that create
a new cube map during each frame cycle. The other background objects in the scene are
rendered to an off-screen buffer, which is used as a cube map for scene illumination. The
stereo implementation of cube maps is performed for both the left- and the right-eye views. This potentially doubles the computation. OpenGL is a rasterization-based system but
there are other methods for generating images such as ray-tracing. To take realism and
global illumination into account, a move towards ray-tracing is necessary. Ray-tracing
simulates natural reflection, refraction, and shadowing of light by 3D surfaces. In general,
ray-tracing is computationally expensive due to many calculations required to determine
object-ray intersections. Parallel implementations using a GPU are ideal for ray-tracing based image generation. In ray-tracing, since pixel color is determined by tracing a ray from the viewpoint towards the light source, this approach inherently avoids hidden-view problems in a stereoscopic image, which can be computed with as little as five percent of the effort required to fully ray-trace both views [76]. A distributed parallel ray-tracing implementation is best suited to a compute mode programming interface where GPUs are utilized as general
purpose processors for speed up. The real-time output from ray-tracing for an animated
scene with thousands of moving particles, such as rain or snow, is
an open area of research.
5.6 Particle System
In this study rain and snow particles are animated and rendered using a particle system.
Pre-computed images of rain streaks and snowflakes are used as billboards to render an
appropriate precipitation scene. This increases rendering speed and also enables the same
particle system to simulate other phenomena such as falling autumn tree leaves by changing
the billboard texture and modifying particle attributes, such as particle mass and velocity.
In nature, precipitations such as rain and snow consist of thousands of tiny particles.
The simulation and rendering of such natural phenomena require complex mathematical
models and computer algorithms. A particle system, first introduced by Reeves
[77], is a technique well suited to simulating and rendering particles that do not have smooth or well-defined surfaces but instead are irregular in shape and complex in behavior. A
particle system is a large set of simple primitive objects, such as a point or a triangle, which are processed as a group. Each particle has its own attributes including position, velocity, and lifespan that can be changed dynamically. The particles move and change
their attributes over time before their extinction, which occurs when the particle lifespan is
exhausted or when an attribute falls below a specified threshold. If the attributes of the
particles are coordinated, the collection of particles can represent an object, such as rain or
snow. To achieve the desired effects of rain or snowfall, many independent particles are
simulated and rendered. In a particle system four basic steps are performed: generation,
rendering, update, and removal of particles from the system. These steps are described in
the following list and illustrated in Figure 5.6.
Figure 5.6 Particle system block diagram
1. Particle Emitter: the initial position of particles is specified on a plane in 3D space called an emitter, also known as a generator. It acts as a source of new particles. The initial number of particles generated defines the density of the desired precipitation.
Fewer initial particles will create an effect of light rain or snow, as opposed to a larger number of particles, which will generate heavy precipitation. At a given time, a rendered frame f contains a total of p particles. Let µ and σ² be user defined values representing the desired mean and variance of the distribution of the number of particles in the system. Let U(a, b) represent a uniformly distributed random number between a and b. Then the number of new particles generated every frame is described by equation 5.1

n_f = µ + σ² · U(a, b).   (5.1)

Once the number of generated particles is known, each particle is assigned attributes such as position in 3D space, initial velocity, air resistance, wind forces, size, color, transparency, and particle lifespan.
2. Particle Rendering: the rendering of particles is complex because the particles can
overlap each other, be translucent, cast shadows, and interact with other objects in
the scene. Therefore, an image of a rain streak or snowflake is used as texture, called
an impostor texture or a sprite, which is mapped to a polygon called a quadrilateral,
also known as a quad, that is made up of two triangles. This textured polygon forms
a billboard, which always faces the camera. It is rendered for each particle. The
use of billboards cuts back on the number of polygons required to model a scene by
replacing geometry with an impostor texture. The rain streak and snowflake billboards
are pre-computed and stored in a GPU texture memory unit for efficient processing.
3. Particle Update: after pixel data in the frame buffer is rendered, every particle is up-
dated for the next frame cycle. The update involves a change in particle position due
to the result of net forces acting on the particle. The particle velocity is updated, lifes-
pan is decreased, and color is changed depending on the environment illumination.
Additionally, an acceleration factor is a user supplied parameter to the particle system
that alters velocity of each particle between frame cycles. This allows for simulating
effects of gravity and other external forces such as wind and air resistance, which
makes particle motion more realistic. When updating the state of a particle, for a
small time interval ∆t , Euler integration is applied. Given initial or previous particle
velocity v and acceleration a, the new velocity is calculated by equation 5.2
v = v + a · ∆t.   (5.2)
This velocity is further integrated with the initial or previous position p to get the new
particle position to be rendered for the current frame as expressed by equation 5.3.
p = p + v · ∆t.   (5.3)
4. Particle Lifespan: when an emitter generates the particles they are given a lifespan,
which is incrementally decreased every frame cycle. The particle is removed from
the particle system when the lifespan expires by reaching a certain threshold, which
is usually zero. An alternative to removing a particle from the system is to recycle it.
This is done by setting a flag when the lifespan expires so that the particle can be
reinitialized for the subsequent frame. The removal of a particle from the system can
have a performance penalty; therefore reusing it for the next frame is preferred.
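The four steps above can be sketched together in Python: equation 5.1 gives the per-frame particle count, equations 5.2 and 5.3 perform the Euler update, and expired particles are recycled rather than removed. All numeric values (frame time, emitter height, lifespan, density, and the U(a, b) bounds) are illustrative, not taken from the implementation:

```python
import random

DT = 1.0 / 60.0       # frame time (illustrative)
GRAVITY = -9.8        # acceleration a in equation 5.2 (y component only)

def particles_per_frame(mu, sigma2, a=-1.0, b=1.0):
    """Equation 5.1: n_f = mu + sigma^2 * U(a, b)."""
    return int(mu + sigma2 * random.uniform(a, b))

def euler_update(p, v, accel):
    """Equations 5.2 and 5.3: v = v + a*dt, then p = p + v*dt."""
    v = v + accel * DT
    p = p + v * DT
    return p, v

def step(particles, emitter_y=50.0):
    """Update each particle and recycle expired ones (lifespan exhausted
    or ground reached) instead of removing them, avoiding the removal
    penalty by reinitializing the particle at the emitter."""
    for prt in particles:
        prt["y"], prt["vy"] = euler_update(prt["y"], prt["vy"], GRAVITY)
        prt["life"] -= 1
        if prt["life"] <= 0 or prt["y"] <= 0.0:
            prt.update(y=emitter_y, vy=0.0, life=300)   # rebirth

random.seed(0)
n = particles_per_frame(mu=1000, sigma2=50)   # light-rain density
particles = [{"y": 50.0, "vy": 0.0, "life": 300} for _ in range(n)]
step(particles)
```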
5.7 Precipitation using the Particle System
For precipitation effects, such as rain and snow, the running simulation must be maintained for the entire world, not just the portion that is within the field of view. This is because particle dynamics may cause particles to move from a portion of the world which is not currently visible to the visible portion, or vice versa. In the case of snow, it is important to appropriately manage various scenarios dealing with the end-of-life of a particle, e.g., snowflake accumulation or depletion. In snow accumulation, a snow layer is formed over the objects upon which the snowflakes fall. In snow depletion the opposite happens: snow particles shrink or disappear due to heat from the sun, thus simulating a melting effect. Snow accumulation can be modeled by terminating the particle
dynamics when the particle strikes a surface, but continuing to draw it in its final position, which is determined by a collision detection algorithm. A difficulty with this solution is
that the number of particles which need to be drawn each frame will grow without bound.
Another solution is to draw the surfaces upon which the particles are falling as textured
surfaces. When a particle strikes the surface it is removed from the particle system and the
snow texture is added to the surface texture. However, this leads to a problem of efficiently
managing texture maps for the collision surfaces. One way to manage these texture maps is
to use the frame buffer object (FBO). Instead of writing the pixel data to the frame buffer for
rendering, the FBO is used to save the pixel data. When the simulation begins, the texture
map for a surface is without snow cover. At the end of each frame, expiring particles are
drawn on the surface using an orthographic projection, which is a viewpoint that is perpen-
dicular to the surface. The resulting texture is saved in the FBO and used in rendering the
surface during the next frame cycle. The process is repeated every frame cycle to simulate
snow accumulation on the surfaces. This method of using FBO for collided snow particles
provides an efficient mechanism for maintaining a constant number of particles in the
system. It works well for the initial snow accumulation on an uncovered surface. However, it
does not model continuous snow accumulation and growth in snow cover over time.
Rain particles are denser and heavier than snow particles. The
particle attributes are different from snow and therefore the effect of gravity and wind is
also different on the rain particles. Heavy rainfall is better simulated using a rain streak
texture, while light rain is represented by motion blurred spherical mesh objects. The initial
accumulation of rain is a more complex problem than snow. In the case of snow, an opaque
accumulation is built up over time, but rain is translucent, thus the shading of the collision surface is more subtle. As in snow accumulation, the FBO is used to texture map the collision surface. However, a multi-pass shading method is used, which partitions the scene into wet and dry pixels. The scene is drawn using two different shading models, one that renders a wet appearance and the other a dry appearance. The texture map is used to choose which output to store in the FBO on a pixel-by-pixel basis. A more efficient method increases the simulation performance by reducing the number of particles: particles are only rendered if they are in front of the viewer. Motion blur on the particles, fog, and illumination effects are used to simulate an overcast sky to enhance realism.
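The FBO accumulation pass can be sketched with a small grid standing in for the surface texture; the resolution, splat brightness, and world size below are illustrative, and the orthographic projection reduces to dropping the vertical coordinate:

```python
W, H = 8, 8                                # texture resolution (illustrative)
snow_tex = [[0.0] * W for _ in range(H)]   # surface starts without snow cover

def splat_expired(particles, world_size=8.0):
    """Orthographic 'draw' of expiring particles into the accumulation
    texture, as the FBO pass would do at the end of a frame; the updated
    texture is then reused to render the surface in the next frame."""
    for x, y, z in particles:
        if y <= 0.0:                       # particle reached the surface
            u = min(W - 1, max(0, int(x / world_size * W)))
            v = min(H - 1, max(0, int(z / world_size * H)))
            snow_tex[v][u] = min(1.0, snow_tex[v][u] + 0.25)  # whiten texel

# Three expiring snowflakes; two land on the same texel.
splat_expired([(1.0, 0.0, 1.0), (1.0, -0.1, 1.0), (7.5, 0.0, 7.5)])
```

The collided particles are then removed from (or recycled within) the particle system, keeping the live particle count constant.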
5.8 Compute Mode Particle Simulation
In the particle system implemented on a GPU, each particle cycles through two key program stages running on the GPU, as shown in Figure 5.7. One is responsible for particle simulation and the other for particle rendering in stereo.
Figure 5.7 Simulation and rendering loops
The particle simulation for rain and snow is performed in GPU compute mode using
a compute shader for better OpenGL interoperability. Each particle is represented as a
single element stored in a shader storage buffer object (SSBO). This is a memory reserved
on the graphics card. Each member has a position and a velocity that are updated by a
compute shader that reads the current values from one buffer and writes the result into
another buffer. That buffer is then bound as a vertex buffer and used as an instanced input
to the rendering vertex shader. The algorithm then iterates, starting again with the compute
shader, reusing the positions and velocities calculated in the previous pass. No data leaves
the GPU memory, and the CPU is not involved in any calculations.
5.9 Animation
Animation is accomplished by using a particle system. It is a well suited animation solution
for scenes that contain many similar objects, such as rain or snow particles. The particle
system also enables us to implement laws of physics that are used to model complex
dynamics in an animated precipitation. Physical forces on particles, such as gravity, air
resistance, and wind are simulated. In the animation cycle, the position and parallax of a new
particle are determined for the left- and right-eye views, laws of physics are applied, and particle
attributes are updated before rendering takes place. Expired particles are reborn with initial
attributes and the cycle continues until the application stops.
It is assumed that all particles are moving towards the ground (bottom) plane at terminal velocity. The terminal velocity is the velocity at which the acceleration of the particle is
zero. This happens when force due to gravity cancels the effect of air resistance on the
particle thus the particle appears to fall at a constant velocity. The effect of wind and gusts
are the only other external forces considered that can change this constant velocity. They are defined by a wind or gust velocity that includes direction and speed. This is implemented by forming a wind bounding box inside the particle boundary. For wind, the size of the wind bounding box is equal to the particle boundary. For gusts, the position and size of the wind bounding box are specified in a configuration file. The wind and gust effects are initiated by a key press. When a particle enters the bounding box, the calculation of its new position is also impacted by the wind or gust direction and speed. The wind bounding box acts like a fan sitting in space: if a raindrop or a snowflake falls in front of the fan then it is blown according to the net result of the external forces due to the presence of wind.
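A minimal sketch of the wind bounding box test, with a hypothetical gust box and velocities standing in for the values read from the configuration file:

```python
# Hypothetical gust box as it might appear in the configuration file:
GUST_BOX = {"min": (2.0, 0.0, 2.0), "max": (4.0, 10.0, 4.0)}
GUST_VELOCITY = (3.0, 0.0, 0.0)     # gust direction and speed (assumed)
DT = 1.0 / 60.0

def inside(p, box):
    """True when point p lies inside the axis-aligned wind bounding box."""
    return all(lo <= c <= hi for c, lo, hi in zip(p, box["min"], box["max"]))

def advect(p, terminal_velocity=(0.0, -5.0, 0.0)):
    """Particles fall at terminal velocity; inside the wind bounding box
    the gust velocity is added to the net motion, like a fan in space."""
    v = terminal_velocity
    if inside(p, GUST_BOX):
        v = tuple(a + b for a, b in zip(v, GUST_VELOCITY))
    return tuple(c + vc * DT for c, vc in zip(p, v))

p_outside = advect((0.0, 5.0, 0.0))   # falls straight down
p_inside = advect((3.0, 5.0, 3.0))    # also pushed along +x by the gust
```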
CHAPTER 6
EXPERIMENTS AND RESULTS
The study is grouped into three phases, each building upon the knowledge acquired from
the previous phase. This chapter describes setup of various experiments and their results.
6.1 Phase 1 – Real-time Stereo
Initial experiments are performed on generic graphics hardware to validate real-time
stereoscopic rendering of rain. The complex issues of scene illumination are ignored during
this initial phase of the study. Instead the emphasis is on the parameters that produce a
realistic rain distribution. The rain model used in monoscopic rendering of rain is extended
for stereo viewing. Stereo output is produced in two different ways. In the first method,
a symmetric view frustum is used to produce stereo by adding random horizontal parallax to
the rain streaks. The second method uses a stereo camera model with an asymmetric view
frustum to produce the left- and the right-eye views due to the rendering of the scene from
two different perspectives.
6.1.1 Method 1 – Horizontal Parallax
In this investigation pre-computed images of rain streaks are used as billboards to increase
rendering speed. A rain streak image represents retinal persistence in human vision when viewing a falling raindrop. The monoscopic statistical rain models to simulate the behavior
and distribution of falling rain are extended to stereo. The complex issues of scene illumina-
tion and hidden surface elimination problems are ignored. Rain streaks that have positive
parallax, the ones that appear behind the stereo window, are considered. The experiment
concentrates on the parameters that produce a stereo-realistic rain distribution. The al-
gorithm first determines the parameters of the stereo view frustum bounded by near and
far clipping planes. The top plane of the view frustum is the rain emitting plane where all
initial positions of rain streaks are formed. The symmetric view frustum of the left- and
the right-eye camera forms a stereo overlap area that can be represented as a view frustum
formed by a single camera, as shown in Figure 6.1.
For each rain streak two uniformly distributed random numbers are generated that
represent position and parallax within the stereo view frustum as illustrated in Figure 6.2.
The horizontal position of a rain streak is determined by generating a random number, x , with a range between 1 and image width, w . This produces a rain streak position for the
left-eye view. To produce the position of the rain streak for the right-eye, the horizontal
parallax value, z , is generated. The range for z is a random number between 0 and maximum
parallax, m. The maximum parallax value is half of the inter-axial distance. The vertical
position of the streak is changed according to speed of descent and other environmental
Figure 6.1 Symmetric view frustum with stereo overlap (top view)
factors such as wind gust introduced by adding a bias in the horizontal position of the rain
streak.
For non-zero parallax, two homologous points are created and the rain streak image is
linearly scaled as an inverse function of depth. The entire process is repeated to animate
and render rain streaks at various depths. An increase in the number of rain streaks creates
a more dense rainfall. The user can modify input parameters, such as inter-axial distance,
raindrop speed of descent, and number of rain streaks, interactively to observe changes in virtual rainfall in stereo.
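The streak generation of this method can be sketched as follows; the image width and inter-axial distance are illustrative, and positive parallax is produced by shifting the right-eye position to the right of the left-eye position:

```python
import random

IMAGE_WIDTH = 1920                 # w, in pixels (illustrative)
INTERAXIAL = 60.0                  # inter-axial distance in pixels (illustrative)
MAX_PARALLAX = INTERAXIAL / 2.0    # m: maximum parallax is half the inter-axial

def make_streak():
    """Generate one rain streak: a left-eye position x in [1, w] and a
    horizontal parallax z in [0, m]. The right-eye homologous point is
    offset by z, placing the streak behind the stereo window
    (positive, uncrossed parallax)."""
    x = random.uniform(1, IMAGE_WIDTH)     # left-eye horizontal position
    z = random.uniform(0, MAX_PARALLAX)    # horizontal parallax
    return x, x + z                        # left- and right-eye positions

random.seed(1)
left_x, right_x = make_streak()
```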
An image of a rain streak is used as texture map on a polygon, a quadrilateral or quad.
This forms a rain billboard, which always faces the camera. Since a stationary camera is
assumed, a cardboard effect associated with rendering of a billboard is not observed. Layers
of billboards, or slices, may be needed to avoid such artifacts. For each rain particle at the
Figure 6.2 Generating stereo rain streak (top view)
rain emitting plane, a rain billboard is rendered. Billboarding is used to cut back on the
number of polygons required to model a scene by replacing geometry with a texture map.
The rain billboards are pre-computed and stored in memory for efficient processing. The
rain billboards farther away from the camera are linearly scaled as an inverse function of
depth.
Rain streaks are scaled by a function of depth as illustrated in Figure 6.3, where n and
f are distances to the near and far clipping planes, respectively. Let s be the distance of
the rain streak from the camera; then the quad length is scaled by a factor (f − s)/(f − n),
making the rain streak smaller when it forms closer to the far clipping plane. A smaller,
lower resolution, texture is needed for the scaled rain streaks. This requires use of texture
filtering technique called mipmaps, which is a texture map created from the original texture
at a reduced resolution and size. Therefore, in a mipmap texture the level of detail decreases
as depth increases. When the rain streak is closer to the camera, the original texture is used
to render the rain streak in full detail. In OpenGL, a function is provided to render mipmaps
Figure 6.3 Rain billboard and mipmaps (side view)
that is responsible for selecting a suitable resolution of the texture based on the distance of the rain streak from the camera. The use of mipmaps increases rendering speed and reduces aliasing. A total of eight mipmaps are used, with the original texture image size at 128×256 producing subsequent mipmaps of size 64×128, 32×64, 16×32, 8×16, 4×8, 2×4, 1×2, and 1×1. During the next rendering iteration, the vertical position (y) is determined based on the speed of descent and updated for all rain streaks to move them towards the bottom plane of the rain boundary.
The length of the rain streaks and how they are distributed in the field of view are modeled using the Marshall-Palmer distribution [78]. The rain distribution has an inverse exponential relationship with the length of the rain streak. The rain streaks in the distance look smaller than those that are closer to the camera. Therefore, rain streaks farther from
the camera are exponentially greater in number than the rain streaks appearing close to the
camera. Let l be the rain streak length at the near clipping plane. As the rain streak forms
farther from the camera this length is scaled, denoted by l_s, as

l_s = l · (f − s)/(f − n),   (6.1)

where s, n, and f are distances of the rain streak texture, near, and far clipping planes from
the camera, respectively, as shown in Figure 6.3. The rain distribution, R, is the inverse
exponential function of l_s and is defined as

R(l_s) = R_0 · e^(−Λ |l_s|),   (6.2)
where l_s is normalized between 0 and 1, R_0 is the rain density given in terms of the number of rain streaks, and Λ is the slope parameter. As the value of l_s approaches 0, the value of the exponential approaches 1. Therefore, the rain streaks farther from the camera will appear more dense, as expressed by the value of R_0. A value of R_0 = 1000 results in light rain and a value approaching 10,000 results in heavy rain. The value of Λ is determined experimentally and is independent of R_0. It affects the rain streak size and the rain distribution in the field of view. Incorrect selection of the slope parameter (Λ) results in smaller rain streaks closer to the camera, producing conflicts in visual cues and an unnatural appearance. It is found that a value of Λ = 2 gives the best results.
It is assumed that the 3D model for the background for the left- and right-eye views already
exists such that we can overlay rendered rain for each eye to get the final scene in stereo. Only
rain streaks are rendered without including any other scene elements such as backgrounds,
light, and other environmental interactions. The number of rain streaks to render, which defines the rain density, is given as an input parameter: 1000 for light, 5000 for medium, and 10,000 for heavy rain. The geometry associated with rain streaks is drawn once per rain streak for the left-eye view. A new position for every rain streak is calculated and redrawn for the right-eye view, as illustrated in Figure 6.4.
Figure 6.4 Single camera setup to add parallax to rain streaks
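Equations 6.1 and 6.2 can be sketched directly; the density R_0 = 1000 and slope Λ = 2 follow the text, while the clipping distances and base streak length are illustrative:

```python
import math

NEAR, FAR = 1.0, 100.0   # n, f: near and far clipping plane distances (assumed)
L0 = 40.0                # l: streak length at the near plane (illustrative)
R0 = 1000                # rain density, light rain
LAMBDA = 2.0             # slope parameter, found experimentally in the text

def streak_length(s):
    """Equation 6.1: l_s = l * (f - s) / (f - n)."""
    return L0 * (FAR - s) / (FAR - NEAR)

def rain_distribution(ls_normalized):
    """Equation 6.2: R(l_s) = R0 * exp(-LAMBDA * |l_s|), l_s in [0, 1]."""
    return R0 * math.exp(-LAMBDA * abs(ls_normalized))

near_len = streak_length(NEAR)   # full length close to the camera
far_len = streak_length(FAR)     # shrinks to zero at the far plane
```

Short (distant) streaks thus dominate the count, matching the inverse exponential Marshall-Palmer behavior.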
The graphics card used in this experiment supports OpenGL 2.0, which is good for basic rendering but lacks support for the advanced, modern programmable graphics pipeline. Each frame displays the current position of the rain streaks for the left- and the right-eye views. Figure 6.5 shows the anaglyph output.
Figure 6.5 Method 1 output (stereo view with red-cyan anaglyph glasses)
6.1.2 Method 2 – Asymmetric View Frustum
In this method, the left- and right-eye views are generated by two separate camera setups,
rendering the scene from two different perspectives. In setting up the left- and right-eye
cameras it is not sufficient to translate the camera position by the inter-axial distance
because this creates a large portion of the output image that is visible to only one eye, as
shown in Figure 6.6. This can cause viewer discomfort. To overcome this issue in stereo
photography, a physical stereo camera, such as Panasonic Lumix 3D, uses a wide angle lens
for both left- and right-eye views such that overlap in the two output images is maximized.
The non-overlapping areas at the boundaries of the two view frustums, where stereo does
not exist, are cropped to only record stereo output. However, in software it is easier to
Figure 6.6 Symmetric view frustum (top view)
define an asymmetric view frustum as shown in Figure 6.7. This follows from how human
eyes form a view frustum. Imagine standing at some distance from the center of a window
looking outside. The left-eye, being at an offset from the center, forms an asymmetric view
frustum with the edges of this window and so does the right-eye. This setup avoids any
post-processing of the rendered output image, such as cropping, because both the left- and
right-eye views are projected on the same screen. Additionally, an asymmetric view frustum works well for VR head-mounted displays because the view frustum remains the same even when the viewer tilts or moves their head.
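The off-axis (asymmetric) frustum for each eye can be derived by projecting the window edges back to the near plane; a minimal sketch, in which the viewer geometry values are illustrative assumptions:

```python
def off_axis_frustum(eye_offset, screen_w, screen_h, screen_dist, near):
    """Frustum bounds (left, right, bottom, top) at the near plane for
    an eye displaced horizontally by eye_offset from the screen center.

    By similar triangles, the window edges seen from the offset eye map
    to near-plane coordinates scaled by near / screen_dist.  The frustum
    is skewed horizontally but stays symmetric vertically, and both eyes
    share the screen rectangle as the zero-parallax plane, so no cropping
    of the rendered output is needed.  The four bounds can be passed to,
    e.g., glFrustum together with the near and far distances.
    """
    scale = near / screen_dist
    left = (-screen_w / 2.0 - eye_offset) * scale
    right = (screen_w / 2.0 - eye_offset) * scale
    bottom = -screen_h / 2.0 * scale
    top = screen_h / 2.0 * scale
    return left, right, bottom, top

# Illustrative viewer geometry (cm): 40x30 window, 60 away, 6.5 inter-axial.
frustum_left_eye = off_axis_frustum(-3.25, 40.0, 30.0, 60.0, 1.0)
frustum_right_eye = off_axis_frustum(+3.25, 40.0, 30.0, 60.0, 1.0)
```

The two frusta are mirror images of each other, each skewed toward the other eye, exactly as Figure 6.7 shows from above.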
In method 2, the fixed function graphics pipeline of OpenGL 2.0 is used to animate
rain. The stereo camera model enables objects in the background to be rendered in stereo.
The background scene is created using a 3D modeling tool called Blender. The model and
related texture maps are imported into the OpenGL during program initialization. The
trunk of the center tree is placed at zero parallax, on the stereo window. The leaves in front
Figure 6.7 Asymmetric view frustum (top view)
of that tree have a slight negative parallax and appear to come out of the screen. All other
objects, including animated rain streaks, have positive parallax. The tree on the left is set
back slightly while the tree on the right is the farthest away. These observations are visible
in the anaglyph image. The left eye view is encoded with red color channel while green
and blue colors filter the right eye view. An output is produced using a red-cyan anaglyph
glasses as shown in Figure 6.8.
6.1.3 Frame rate Comparison
The frame rates achieved by the two stereo rendering methods for light, medium and heavy
rain are compared. The results are an average over 10 experiments. The results are also
compared with monoscopic rendering. In monoscopic viewing, the geometry associated with rain streaks is drawn once per rain streak. For heavy rain 10,000 rain streaks are
Figure 6.8 Method 2 output (stereo view with red-cyan anaglyph glasses)
rendered per frame. Moreover, for each rain streak only the position is calculated. Like monoscopic viewing, method 1 achieves stereo with a single camera setup by creating parallax in the rain streaks. The position and parallax of each rain streak are calculated, and this information is used to create the left- and right-eye output images.
However, the geometry associated with rain streaks is drawn twice per rain streak. In the other stereo rendering approach, method 2, the left- and right-eye views are generated separately by two camera setups, rendering the scene from two different perspectives.
The results in Table 6.1 show that the frame rate decreases with the increase in the number of rain streaks. For light rain, the frame rate of the two stereo implementations is slightly less than the frame rate of the monoscopic rendering. As the number of particles to
Table 6.1 Frame rate comparison

    Precipitation   Number of    Monoscopic   Stereoscopic     Stereoscopic
    Intensity       Particles    (fps)        Method 1 (fps)   Method 2 (fps)
    Light             1000        262          258              259
    Medium            5000        180           93               93
    Heavy            10000         91           46               45
render increases, the difference between the two stereo implementations and the mono-
scopic rendering also increases. At maximum rain density, the frame rate of both stereo
approaches is close to half of the monoscopic rendering rate.
The results from the two stereo implementations are very similar. It is shown that the
real-time stereo implementation of rain for simple scenes is possible on relatively simple
hardware. Method 1 adds parallax to the rain streaks and renders twice while method 2
renders twice based on the two camera positions. In method 1, the right-eye rain
streak is derived from the left-eye rain streak by computing parallax. It is noted that there is
no measurable change in the frame rate when wind is added to the rain streaks. The frame
rate can be improved by an implementation on contemporary graphics processors with a
newer version of the OpenGL programmable graphics pipeline.
6.1.4 Phase 1 – Conclusions
The initial phase of this study is inspired by existing techniques of stereoscopic rendering
of other natural phenomena such as fire and vegetation. The study extends monoscopic
techniques for rendering of rain and presents a solution for modeling real-time stereo
rain animation. A rain streak is rendered for left- and right-eye views based on randomly
generated parallax. A particle system is implemented for animating the rain scene. Wind
and gust effects are also modeled.
Several simplifying assumptions were made in this study: rain streaks only have nonnegative
parallax, the stereo cameras are stationary, rain streaks are moving at terminal velocity,
the 3D model for the background already exists, and complex issues of scene illumination
are ignored. Future research will address these assumptions and enhance this work by
including complex lighting interactions, object and camera motion, acceleration of rain
streaks, and inclusion of sound effects.
6.2 Phase 2 – Stereo from 2D-3D Converters
The alternative to modeling and rendering stereo rain is to apply 2D-3D conversion software
to a monoscopic rain scene. In this phase, the effectiveness of the 2D-3D converters in
producing stereoscopic natural scenes is studied. Five 2D-3D software applications that
convert 2D video into stereoscopic 3D are compared [79]. These five applications are Arcsoft, Axara, DIANA-3D, Leawo, and Movavi. The selection of these five applications is based on
the conversion techniques, ease of use, and software availability.
The Arcsoft Media Converter uses proprietary 3D simulation technology to turn 2D
pictures and movies into 3D format and is included to study how the algorithm compares to
documented methods used by other 2D-3D converters [80]. The Axara Media 2D-3D video converter software applies classifiers and automatic object detection in scenes to perform
transformations from 2D to 3D video files [81]. The DIANA-3D by Sea Phone implements
the method described above by Hattori [82]. The Leawo Video Converter [83] and Movavi
Video Converter 3D [84] both use parallax shift and perspective to provide 2D to 3D video
conversion support and are included to study how the two implementations compare.
In these experiments, the quality of the stereo output is measured in two different ways.
In the first method, two features in an input to the 2D-3D video converters are selected
such that one feature is closer to the viewer than the other. Therefore, a correct output of the 2D-3D video converters has greater positive parallax between the left- and right-eye views for the
feature that is farther from the camera. The difference in the horizontal parallax between
actual values and the values obtained by the output of the 2D-3D video converters is
compared. The second method to evaluate the quality of stereo output of the five 2D-
3D video converters is based on subjective scoring by individuals who rate their overall visual experience. The quality of visual experience is measured by asking subjects to rate
converted output using three criteria: visual comfort, conflict between left- and right-views,
and observable depth in a given scene. The output produced by the 2D-3D video converters
is also compared subjectively with the results of the rendered output produced by method
2 for real-time rain rendering that uses a stereo camera model as described in the initial
phase of this study.
6.2.1 Overview
The problem of converting 2D to 3D addresses the generation of left- and right-eye views with correct horizontal parallax from a given 2D view or video. In the movie industry,
converting old movies to 3D is a meticulous, semi-automatic, and time consuming process.
Many 3D television sets have a 2D-3D conversion mode, but the processing resources are
limited, resulting in a poor quality visual experience. For computers, including tablets and
hand-held devices, many fully automatic conversion algorithms have become available.
An alternative to simulating and rendering rain is to use video of rain scenery as an input to
2D-3D conversion software. Given accurate depth map estimation, such software applica-
tions may produce a stereo rain scene. However, in creating a depth map the 2D-3D video
converters make many assumptions about the 3D scene and visual cues that are often not
correct, resulting in conflicting 3D views. Also, the data available in the 2D input image of
natural phenomena may not have enough information to give a look-around feel to the
converted output image. It also does not solve hidden surface problems where changing
the viewpoint changes the occlusion relationship between objects in the scene.
6.2.2 Depth Estimation Techniques
The proliferation of depth estimation techniques has given rise to many practical software
applications for 2D-3D conversion. Existing 2D-3D conversion algorithms can be grouped
in two categories: algorithms based on a single image and methods that require sequence
of multiple images such as videos. Depth from a single still image can be extracted by
employing monocular depth cues, such as linear perspective, shading, occlusion, relative
size, and atmospheric scattering. Other techniques like blur analysis and image based
rendering methods using bilateral symmetry also exist. McAllister uses linear morphing
between matching features to produce stereo output from a single image with bilateral
symmetry, such as the human face [85].

For methods that require a sequence of multiple images, several heuristics exist to
create depth information. These methods generate a depth map by segmenting the 2D
image sequences, estimating depth by using one or a combination of many visual cues,
and augmenting the 2D images with depth to create left- and right-eye views. A detailed
description of the algorithms useful in computing dense or sparse depth maps is given by
Scharstein and Szeliski [86]. The depth map is computed from multiple images of a scene, either taken from similar vantage points or from a sequence of images acquired from a video.

In another method, Hattori describes real-time 2D-3D converter software that produces a 3D output viewable from different angles [87]. To accomplish this, the author applies the horopter circle projection to the right-eye image. The horopter is the locus of points in space that fall on corresponding points in the two retinas when the left and right eyes fixate on a given object in the scene. All points that lie on the horopter have no binocular disparity. In the absence of binocular disparity, other depth cues such as linear perspective, shading, shadows, atmospheric scattering, occlusion, relative size, texture gradient, and color become more relevant. The author relates the parallax shift to pixel illumination, assuming that brighter objects are closer to the viewpoint while darker objects are in the background. This parallax shift method is used to create the left-eye view. The author further shows that the anaglyph output generated by this real-time 2D-3D converter produces less fatigue due to a decrease in retinal rivalry [88].

Other techniques apply machine learning algorithms and a classifier to automatically detect objects and key features in a given scene to estimate depth. One such algorithm is described by Park et al., where for each video frame a potential stereo match is determined by the classifier [89]. This ensures that the proposed stereo pair meets certain geometric constraints for pleasant 3D viewing. In a sequence of multiple images, depth cues are also estimated from the presence of shadows, focus/defocus, disparity between two images, and motion parallax. There is extensive research on depth estimation in the context of
2D-3D conversion. An excellent overview of 2D-3D conversion techniques for 3D content
generation is provided by Zhang et al. [90].

In principle, depth can be recovered either from monocular or binocular depth cues.
Conventional methods for depth estimation have relied on multiple images using stereo
correspondence between two or more images to compute disparity. However, combining
monocular and binocular cues together can give more accurate depth estimates [91].
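The brighter-is-nearer parallax-shift idea attributed to Hattori above can be illustrated with a toy one-row example. The shift scale, nearest-pixel resampling, and naive gap handling here are assumptions of this sketch, not the published method.

```python
def parallax_shift_row(row, max_shift=2):
    """Approximate a left-eye view of one grayscale scanline (0-255).

    Each pixel is shifted left by an amount proportional to its
    brightness, so bright pixels (assumed near the viewpoint) receive
    the largest disparity while dark background pixels barely move.
    Pixels vacated by the shift simply keep their original value, a
    deliberately naive form of hole filling.
    """
    out = list(row)
    for x, value in enumerate(row):
        shift = round(max_shift * value / 255.0)
        if shift and x - shift >= 0:
            out[x - shift] = value
    return out
```

Pairing the shifted row with the original row yields a stereo pair in which depth is inferred purely from luminance, which is why such converters struggle with dark scenes such as rain.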
6.2.3 Quantitative Experiments and Results
An alternative to simulate, render, and animate stereo precipitation is to apply 2D-3D video
conversion on existing 2D videos of precipitation scenes. The quality of stereo output of
such converters is measured and compared with results among selected 2D-3D convert-
ers. For this purpose, baseline synthetic videos are created by using a 3D modeling tool,
such as Blender. Three such videos are used to create test cases that consider depth from
monoscopic cues, such as linear perspective, occlusion, and depth from object placement
in the scene. Additionally, three more stereoscopic videos are used as baselines to test the
2D-3D converters. These videos are from a collection of downloadable stereoscopic videos
of natural scenes acquired from an integrated twin lens camera system [92]. Six test cases are considered for this experiment. Figure 6.9 shows one such test image emphasizing
linear perspective. This 2D image of a 3D virtual scene is taken by two identical parallel
camera models in Blender, one for each eye, giving a true 3D stereoscopic output. The
scene consists of two identical spherical objects, representative of raindrops that are smaller
than 2 mm in size. The center of the sphere on the right is on the stereo window, while
the sphere on the left is the same object farther from the camera. The stereo window is
a plane perpendicular to the viewer's line of sight on which the left- and right-eye views
Figure 6.9 Baseline input image to test depth from linear perspective
are projected. The stereo output image acquired by using the parallel camera models is
the baseline output image shown in Figure 6.10.

Figure 6.10 Baseline output image: Test case C-1

Notice that the left- and right-eye views of the sphere on the left show greater positive
parallax, as that sphere is placed away from the camera, while the center of the sphere on
the right has zero parallax and shows little disparity between the left- and right-eye views.
This baseline output image is compared with the output of the five 2D-3D video converters. The horizontal parallax value is measured by identifying key features such
as edges or corners of an object in the left- and right-eye views. For test cases where the
baseline image is from a stereo camera, a feature such as an edge or a corner of an object is
easily recognizable. The horizontal parallax of the selected feature from the baseline output
image is the correct value. The difference in this horizontal parallax for the same feature
in the output of the 2D-3D video converters is measured. The difference between the two values is the error in horizontal parallax introduced by the 2D-3D video converter.
The output of a 2D-3D video converter (Axara) to the input baseline image for linear
perspective is shown in Figure 6.11. It is expected that a feature closer to zero parallax would
Figure 6.11 Output of Axara 3D software video converter
show little disparity. However, in the actual output of the 2D-3D video converter, the sphere
closer to the camera exhibits significant horizontal parallax. Furthermore, both spheres
measure similar disparity, indicating that the 2D-3D converter did not correctly account
for linear perspective.
These experiments are repeated for depth implied by occlusion. A baseline input model
is created with two spheres such that one is in front of other. The expected baseline output
and the corresponding output of a 2D-3D video converter are shown in Figure 6.12. In the
baseline output, the occluded sphere is farther away from the camera; therefore it has
positive parallax and appears to recede into the screen. The 2D-3D converter (Axara) output shows
both the front and the occluded sphere with the same parallax, which is not expected.
Figure 6.12 Depth from occlusion. Test case C-2
For some methods, objects at the bottom or center of the scene are assumed to be closer to the camera than objects at the top or near the edges. To test this scenario, a baseline image with objects appearing throughout the scene is used. Figure 6.13 shows an expected output where the baseline image has all spheres on the stereo window. It is noted that the
Figure 6.13 Depth from object placement. Test case C-3
2D-3D converter output shows the spheres in the top half of the screen with positive parallax, farther away from the camera, while the spheres in the lower half of the screen have negative parallax, coming out of the screen towards the camera.
Three videos consisting of a scene with water and wind effects are selected from a stereoscopic 3D HD video library [92]. These videos are used as baseline input to the 2D-3D
video converters. Recognizable features such as edges or corners are used as a reference
point to measure horizontal parallax between the left- and right-eye views in both the
baseline video and the output of the 2D-3D converters. The three baseline images taken
from videos of scenes from a stereoscopic camera and corresponding 2D-3D converters
output are shown in Figure 6.14. The comparison between positions of the selected features
Figure 6.14 Depth from stereoscopic camera: Test case C-4, C-5 and C-6
in the baseline image and the corresponding output from the five 2D-3D video converters is
shown in Table 6.2. The columns titled C-1 to C-6 correspond to the six different test cases

Table 6.2 Parallax (in pixels) for objects closer to the camera

    No.   Converter    C-1   C-2   C-3   C-4   C-5   C-6
    1     Baseline      -2    -2    -2   -12   -18   -14
    2     Arcsoft       -6    -4    -4    -9    -6    -4
    3     Axara         20    20    20    60    55    64
    4     DIANA-3D       6     6     6    12    15    14
    5     Leawo         -4    -4    -4   -17   -20   -20
    6     Movavi        10    10    10    32    30    31
used. The first three tests, from C-1 to C-3, are results from synthetic baseline input images while the results from C-4 to C-6 are from baseline images acquired from a stereoscopic
camera. The values in Table 6.2 are horizontal parallax values for a selected feature that
is closer to the camera. The values for the objects that are farther from the camera are
Table 6.3 Parallax (in pixels) for objects farther from the camera

    No.   Converter    C-1   C-2   C-3   C-4   C-5   C-6
    1     Baseline      26    -2     3    -8     8    -5
    2     Arcsoft       -6    -8    -4    -4    -9    -8
    3     Axara         20    20    20    60    55    64
    4     DIANA-3D       6     6     6    12     7    12
    5     Leawo         -4    -4    -4   -17   -20   -20
    6     Movavi        10    10    10    32    30    31
provided in Table 6.3. These values are measured in pixels. Notice that some values are
negative. Negative values mean negative parallax. For example, in the baseline image, the
sphere is positioned with the stereo window passing through the center; a portion of the
sphere appears in front of the stereo window.
The test case C-1 corresponds to the depth from linear perspective. Arcsoft and Leawo
are the only two converters that place the sphere with negative parallax. However, these values are in error when compared to the true value. The test case C-2 corresponds to depth
due to variation of object placement in the scene. In this case, spheres are placed throughout
the scene at various locations all with centers in the stereo window. The expectation is
for the output image to be at the same parallax unless the 2D-3D video converter is using
scene placement to determine depth. Comparing C-2 with the same column in Table 6.3
should give the same values. Arcsoft is the only 2D-3D video converter that exhibits different
parallax values for objects placed at the bottom of the scene as opposed to the same object
placed on the top.
The C-3 test case includes occlusion. In the baseline image, the sphere farther from the
camera is placed behind the sphere that is near the camera so it is partially occluded. The
horizontal parallax value in column C-3 of the two tables for the baseline image confirms
this fact. The remaining values in the column C-3 show that none of the 2D-3D video
converters distinguished between the two spheres and the horizontal parallax values for
the two spheres are the same.
The test cases from C-4 to C-6 correspond to the videos of natural scenes taken from a
stereoscopic camera. The horizontal parallax in the three baseline images is negative. The
Arcsoft output for test cases C-5 and C-6 is visually conflicting as the feature farther from
the camera has less horizontal parallax than the feature closer to the camera. The Axara, DIANA-3D, and Movavi outputs all have positive parallax and are therefore incorrect. Only the Leawo output exhibited negative parallax for all objects close to the camera.

An important observation from the data in Table 6.2 and Table 6.3 is that, apart from Arcsoft, all the other 2D-3D video converters showed no difference in horizontal parallax between features closer to and farther from the camera, thus adding an equal amount of parallax to all objects. This simply gives a perception of the entire scene appearing behind or in front of the stereo window. The depth perception in these outputs is mainly due to monoscopic depth cues.
It is noted that out of five 2D-3D video converters, four converters (Axara, DIANA-3D,
Leawo, and Movavi) offer a user-adjustable 3D depth setting. For the experiments this setting was left at its default value. Changing the 3D depth setting shifts all objects in a scene to appear behind or in front of the stereo window, adding either positive or negative parallax to the entire scene. It is also noted that, of the five 2D-3D video converters, DIANA-3D is the only one that can convert a video in real-time. All the other converters first upload a 2D video file before writing the converted 3D output.
From the data in Table 6.2 and Table 6.3, a mean square error (MSE) value for each
2D-3D video converter is computed, for both the feature closer to and the feature farther from the camera. For a given test case, the error is the difference between the parallax value of the baseline and the parallax value of the 2D-3D video converter. This error is squared and summed over all test cases for that particular converter. A mean value is calculated by dividing the squared sum by the total number of test cases. Table 6.4 shows the normalized mean squared error for the five 2D-3D converters. The data shows that
Table 6.4 Normalized MSE between baseline and 2D-3D converters
    No.   Converter    Near    Far
    1     Arcsoft      0.015   0.115
    2     Axara        1.000   1.000
    3     DIANA-3D     0.146   0.094
    4     Leawo        0.004   0.165
    5     Movavi       0.371   0.309
Leawo had the least amount of error while Axara had the highest error, followed by Movavi.
The errors for Arcsoft and DIANA-3D are close; Arcsoft performs better for closer objects while DIANA-3D has a smaller error for objects farther from the camera.
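The normalized MSE computation can be reproduced directly from the measured parallax data; the sketch below recovers the "Near" column of Table 6.4 from the values in Table 6.2.

```python
# Near-feature parallax values (in pixels) for the six test cases, Table 6.2.
baseline_near = [-2, -2, -2, -12, -18, -14]
converters_near = {
    "Arcsoft":  [-6, -4, -4, -9, -6, -4],
    "Axara":    [20, 20, 20, 60, 55, 64],
    "DIANA-3D": [6, 6, 6, 12, 15, 14],
    "Leawo":    [-4, -4, -4, -17, -20, -20],
    "Movavi":   [10, 10, 10, 32, 30, 31],
}

def mse(measured, baseline):
    """Mean squared parallax error over the six test cases."""
    return sum((m - b) ** 2 for m, b in zip(measured, baseline)) / len(baseline)

raw = {name: mse(vals, baseline_near) for name, vals in converters_near.items()}
worst = max(raw.values())                      # Axara, the largest error
normalized = {name: round(v / worst, 3) for name, v in raw.items()}
# normalized reproduces the "Near" column of Table 6.4:
# {'Arcsoft': 0.015, 'Axara': 1.0, 'DIANA-3D': 0.146, 'Leawo': 0.004, 'Movavi': 0.371}
```

Running the same computation on the far-feature values of Table 6.3 yields the "Far" column.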
6.2.4 Subjective Experiments and Results
The stereoscopic viewing experience is evaluated based on subjective scoring by individuals.
Twenty-five subjects participate in the study. The International Telecommunication Union
(ITU) has proposed recommendations for performing stereovision tests for participants in
subjective assessment [93]. Guidelines from the ITU recommendations help with screening subjects for visual acuity and stereo blindness.
The input to the 2D-3D video converters is a synthetic monoscopic precipitation scene.
Some 2D-3D video converters cannot convert 2D input in real-time; they upload the entire
2D input video file before writing the converted 3D output. Participants are shown the
resulting video clips produced by each method. It is a blind experiment in which participants
are not aware of the methods used to produce stereoscopic output. The quality of visual
experience is measured by asking subjects to observe output of the five 2D-3D converters
and rate them based on three criteria: 1) visual comfort, 2) conflict between the left- and right-eye views, and 3) observable depth in the scene. The rating of each question is based on a five-point Likert scale, where 1 is poor, 2 is marginal, 3 is average, 4 is good, and 5 is excellent [94]. The questions asked to assess the output are given in Table 6.5.
Table 6.5 Survey questions for visual assessments

Answer each question by writing a number between 1 and 5
(1 = Poor, 2 = Marginal, 3 = Average, 4 = Good, 5 = Excellent)

    No.   Survey Question                          Instructions
    1     Is the scene comfortable to view?        Note any physical discomfort such as
                                                   headache or eye strain.
    2     Do you see a ghost image or double       Note the left-eye image leaking into the
          vision?                                  right eye and vice versa, resulting in a
                                                   ghosting effect.
    3     Is there any observable depth in the     The 3D positions of stereoscopic objects
          scene, specifically depth in particles?  are perceived stereoscopically but may
                                                   appear unnaturally thin.
The outputs of the five 2D-3D converters are compared to each other for their effectiveness in producing stereo results. The Axara 3D video converter output produced for a monoscopic rendered rain scene is shown in Figure 6.15. By default, Axara 3D adds a negative parallax to each object. There is a 3D depth setting that can be adjusted from positive to negative parallax levels; setting it to a positive level increases the positive parallax of all objects by the same amount. The left-eye image is slightly shifted to the left. Similarly, the right-eye image is
Figure 6.15 Axara 3D output for a monoscopic rendered rain scene
shifted to the right. The resolution is decreased slightly, making the smaller rain streaks blur out of the scene. Additionally, the output from the stereoscopic rain rendering described in phase 1, shown in Figure 6.8, is also included in the comparison.
Box and whisker plots are drawn for the three survey questions. A box and whisker plot graphically represents several numeric quantities [95]. The vertical axis represents the Likert scale ordinal values, from "poor" with a rank value of 1 up to "excellent" with a rank value of 5. The box itself represents the first and the third quartile of the data, the inter-quartile range (IQR), where the majority of the responses lie. The red line represents the median response. The whiskers, drawn as dashed lines, show the possible range of user responses.
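The quantities drawn in each box and whisker plot can be computed with Python's statistics module; the response values here are illustrative, not the survey data.

```python
import statistics

# Hypothetical Likert responses (1 = poor ... 5 = excellent) for one question.
responses = [1, 2, 3, 4, 5]

median = statistics.median(responses)                           # the red line
q1, _, q3 = statistics.quantiles(responses, n=4, method="inclusive")
iqr = q3 - q1                                                   # the box height
whiskers = (min(responses), max(responses))                     # the dashed lines
```

A small IQR, as observed for method 2 below, means the responses cluster tightly around the median rating.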
For responses to the three survey questions, Figure 6.16 compares the descriptive statistics,
such as median, range and inter-quartile range (IQR), for the five 2D-3D converters and
the stereo method 2 implemented during phase 1 of this study. It shows that, among all the
2D-3D converters, Leawo produced the best results, along with method 2. Since the IQR is
the smallest for method 2, most participants preferred it over any of the
2D-3D converters; the remaining converters rank, in order, DIANA-3D,
Arcsoft, Movavi, and Axara. These results corroborate the normalized MSE values acquired
from the previous experiments. The Photoshop method is used to compute anaglyphs,
incurring minimal overhead [67]. Since viewing an anaglyph requires color filters, the color fidelity and luminosity of the scene are reduced. Due to the darker background, characteristic
of a rain environment, the luminance intensity in the experiment is already low.
The participants viewing the anaglyph output produced by the 2D-3D video converters
expressed unpleasant viewing experiences such as eye fatigue and headaches that are side
effects of prolonged viewing. The average time spent looking at all output per participant was approximately 10 minutes. The participants also expressed difficulty in observing depth
in the output produced by some 2D-3D video converters. Common issues related to the
2D-3D video converters output are poor resolution, color fidelity, luminance, and lack
of observable depth in rain streaks. From the software specifications of the 2D-3D video
converters it is not clear what algorithm they used to compute the anaglyphs.
6.2.5 Rain and Snow Rendering
The left- and right-eye views are calculated and rendered separately. Thus, the scene is
rendered from two different perspectives. The scene with rain or snow particles is modeled
Figure 6.16 Variation in rating from twenty-five participants

using the world coordinate system, while each object is modeled in its own model coordinates. Several matrix transformations are required to transform model coordinates to screen coordinates. This approach is typical of the raster graphics pipeline, where scene objects are used to form an image on the screen.
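The transformation chain can be sketched with plain 4×4 matrix arithmetic; the object placement and the minimal projection matrix below are illustrative assumptions, not the thesis pipeline.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a homogeneous 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def translate(tx, ty, tz):
    """Model matrix placing an object in world coordinates."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Minimal perspective matrix: w' = -z yields the 1/z foreshortening.
PROJECTION = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]

def to_screen(p_model, model, view, proj, width, height):
    """Model -> world -> eye -> clip -> NDC -> screen coordinates."""
    p = mat_vec(proj, mat_vec(view, mat_vec(model, p_model)))
    ndc = [c / p[3] for c in p[:3]]       # perspective divide
    sx = (ndc[0] + 1.0) / 2.0 * width     # viewport transform
    sy = (1.0 - ndc[1]) / 2.0 * height    # y flipped for raster rows
    return sx, sy

# A particle at its model origin, placed 5 units in front of the camera,
# lands at the center of an 800x600 screen.
center = to_screen([0, 0, 0, 1], translate(0, 0, -5), IDENTITY, PROJECTION, 800, 600)
```

In the stereo renderer this chain runs once per eye, with the view and projection matrices differing between the left- and right-eye cameras.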
The initial rain rendering is enhanced to the simulation and stereo rendering of rain and
snow using a programmable GPU. The experimentation focuses on photorealistic output, which is achieved by using environment lighting methods such as cube maps. The OpenGL
4.5 graphics library with programmable graphics pipeline is used. The application executes
on the Intel Quad Core i7 CPU running at 2.00 GHz with 16.0 GB of RAM installed with an
NVIDIA Quadro K5000 GPU. The graphics card used in this study supports the advanced,
modern graphics pipeline featuring a programmable GPU.
For stereo, the hardware used for the experiments supports quad buffers with active shutter
glasses to view the stereo output. This type of stereoscopic rendering requires a display refresh
rate of at least 120 Hz, which leaves 8.33 ms per frame to complete writing data to the framebuffer.
Table 6.6 shows the frame rates achieved for light, medium and heavy precipitation with
1000, 5000, and 10,000 number of particles respectively. The results are an average over
Table 6.6 Frame rate comparison for rain and snow rendering

    Precipitation   Number of    Monoscopic   Stereoscopic   Stereoscopic
    Intensity       Particles    (fps)        Rain (fps)     Snow (fps)
    Light             1000        1685         825            830
    Medium            5000        1260         660            664
    Heavy            10000        1050         521            525
10 experiments. These results are compared with monoscopic rain rendering where only
one camera is used to render a scene from a single perspective. In the stereo rendering
approach, the left- and right-eye views are separately generated by two camera setups,
rendering the scene from two different perspectives.
6.2.6 Runtime
The running time of the algorithm depends upon a number of factors: single versus multiple
processor machines, processor versions, memory or hard disk access speed, 32- versus 64-bit architecture, the configuration of the machine, and the input to
the algorithm. Time complexity analysis is concerned only with the behavior of the
algorithm in response to various inputs; the rate of growth of the time taken by the algorithm with respect to the input is determined. In other words, the complexity of an algorithm is
a measure of how many steps the algorithm will require in the worst case for an instance
or an input of a given size. It is important to understand some basic terminology such as
problem, problem instance, and algorithm. Moreover, it is important to know how the size
of a problem instance is measured and what constitutes a step in an algorithm.
A problem is an abstract description coupled with a question requiring an answer. For
example, the real-time, photorealistic, stereo rendering of rain problem is: “given a scene with photorealistic objects, what is the maximum number of raindrops that can be animated
in real-time to give a visual sensation of seeing rainfall in stereo?” On the other hand, a
problem instance includes an exact specification of the data, for example: “a photorealistic
scene contains 10 textures, 100 polygons, 30,000 vertices, 10,000 raindrops...” and so on.
Stated more mathematically, a problem can be thought of as a function p that maps an
instance x to an output p(x), which is the answer to the question posed by the problem. An algorithm for a problem is a set of instructions guaranteed to find the correct solution to any problem instance in a finite number of steps. In other words, for a problem p, an algorithm is a finite procedure for computing p(x) for any given input x. In a simple model of a computing device, a “step” consists of one of the following operations: addition, subtraction, multiplication, finite-precision division, and comparison of two numbers.
Time complexity is concerned with how long the algorithm takes as the size of a problem instance gets large. To resolve this, a function of the input size is formulated that is a reasonably tight upper bound on the actual number of steps. Such a function expresses the complexity or running time of the algorithm. An asymptotic analysis is required, which determines how the running time grows as the size of the instance gets very large. For this reason, it is useful to introduce Big-O notation. For two functions f(t) and g(t) of a nonnegative parameter t, f(t) = O(g(t)) if there is a constant c > 0 such that, for all sufficiently large t, f(t) ≤ c·g(t). The function c·g(t) is thus an asymptotic upper bound on f.

To calculate an expression for time complexity, a machine model is assumed. This hypothetical machine has multiple processors and a 64-bit architecture, can process data in parallel, and takes one unit of time to complete simple arithmetic and logical operations such as addition, subtraction, multiplication, and division. It is also assumed that an assignment operation takes unit time and that all other computational costs are negligible.
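Under this unit-cost machine model, the per-frame particle update touches each particle exactly once, so its step count is linear in the particle count. The sketch below is a simplified stand-in for such an update loop (not the dissertation's actual shader code) that makes the O(n) growth explicit by counting one step block per particle.

```python
def update_particles(positions, velocities, dt=0.016):
    """Simplified per-frame update: one unit-cost step block per particle,
    so the total work is O(n) in the particle count n."""
    steps = 0
    for i in range(len(positions)):
        positions[i] = positions[i] + velocities[i] * dt  # advect particle
        steps += 1
    return steps

# The step count grows linearly with input size, matching an O(n) analysis.
for n in (1000, 5000, 10000):
    assert update_particles([0.0] * n, [-9.8] * n) == n
```

Doubling the number of particles doubles the step count, which is why the frame rates in Table 6.6 degrade gracefully rather than abruptly as particle counts grow.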
6.2.7 Phase 2 – Conclusions
The experiments show the relative performance among five selected commercially available
2D-3D video converters and stereo output of rain rendering from method 2 implemented
earlier. Six test cases are applied to measure horizontal parallax. Additionally, subjective
scoring by twenty-five participants measures the overall quality of the visual experience. It is
observed that the depth perception is mainly due to presence of strong monoscopic depth
cues. In the majority of 2D-3D converters tested, the binocular disparity is equally applied
to all objects in the scene. This makes the entire 2D image plane shift into or out of the
screen. The 2D-3D video converters are making assumptions about the 3D scene that are
often not correct, thus giving conflicting visual cues. The quality of the visual experience for
scenes acquired from most 2D-3D converters is poor. Therefore, there is a need to develop
new methods to enable real-time photo-realistic rendering for stereo content generation.
This is achieved using a programmable GPU and environment lighting techniques.
6.3 Phase 3 – Measuring Photorealism
In the final phase of the study, current graphics hardware that supports programmable GPU
and modern graphics library is used. The initial implementation of stereo rain rendering
is extended to include photorealistic stereo rendering of rain and snow precipitation at video frame rates. The experiments in this phase of the study used input visual stimuli that vary along three visual factors: particle numbers, particle size, and motion [96]. The goal is to determine the statistical ranking and importance of these visual factors necessary
for producing a photorealistic output. The experiments were also extended to study the
impact on photorealism of stereo output and of post-processing effects, such as variable lighting conditions, fog, and glow.
6.3.1 Visual Factors for Photorealism
Physically accurate environment lighting is one approach to produce visually realistic
results. However, such physically-based rendering alone is not sufficient to achieve photo-
realism. There are many visual factors that contribute to generate photorealistic results.
For example, sharp edges make objects look artificial as demonstrated by Rademacher et
al. [74]. Making sharp edges smooth and slightly rounded gives an object model a more realistic look. Similar conclusions are drawn for shadows. A sharp contrast of a dark shadow
looks artificial while soft shadows make a computer-generated scene look more realistic.
Imperfections in the camera lens can produce distortions such as lens flare, chromatic
distortion, or the formation of out-of-focus areas. Reproduction of these imperfections in a
computer-generated image adds to photorealism although this study does not consider
such distortions.
A fundamental challenge associated with achieving photorealism in a rain or snow
scene is lighting calculations. In previous studies involving particle systems, Garg and
Nayar [16] investigate photorealistic rain rendering with lighting effects. The study involves the interaction of light with oscillating raindrops, which produces complex brightness patterns
such as speckles and smeared highlights. The importance of light attenuation to achieve
photorealism is demonstrated by Tatarchuk and Isidoro [18]. Atmospheric effects such as fog and addition of glow effects around light sources such as misty halos around streetlights
for nighttime rendering are also considered.
Precipitation scenes consist of several thousands of particles that move independently
at variable velocities under the influence of external forces such as gravity, air resistance,
and wind gusts. The intensity of rain or snowfall is dependent on the number of particles
and particle size. A light precipitation event will have fewer and smaller particles as opposed
to heavy precipitation. Similarly, the variability in a particle shape is due to motion caused
by external forces acting on the falling particles. Variations in precipitation details due to
changes in number of particles, their sizes, and motion are important visual factors. The
literature review of computer generated rain and snow also points to these three key factors
influencing visual attention when observing a scene. Therefore, such visual factors are
important to consider when producing photorealistic results. The experiments proposed
in this study collect subjective data to analyze and rank the three visual factors.
6.3.2 Measuring Photorealism
There is no standard method to measure photorealism [97]. The images used in the experiments and the test procedure itself can cause variability in the results. Therefore, depending
on the test scenarios, new experiments will have to be devised to measure photorealism.
In this study, answers to survey questions are collected as data. Participants are shown video clips and still photographs of natural scenery with rain and snow. The visual stimuli vary along three factors or dimensions: number, size, and movement of particles. Addi-
tionally, computer rendered natural scenes with precipitation vary in light conditions such
as precipitation during sunlight vs. overcast sky, glow or halo from artificial light sources
such as streetlamps, and fog or atmospheric haze effects. Participants also evaluate stereo-
scopic views of computer generated natural scenery with precipitation by answering survey
questions designed to compare monoscopic and stereoscopic outputs.
The perception of each person varies slightly, thus making visual fidelity extremely
subjective. Therefore, human subjects are asked to complete survey questions from which
conclusions can be drawn about photorealism. A total of thirty healthy adult subjects, who
are not stereo blind, participated in observing rain scenes. Another set of thirty individuals
answered the same survey questions for scenes with snow.
Three different types of experiments are conducted. In the first type of experiment, a
set of questions are designed to determine the perceptual space of precipitation in terms
of number of particles, size, and their motion as they fall towards ground. The second
experiment is designed to evaluate other visual factors such as illumination, fog, and glow
effects in a computer generated precipitation scene. In the final experiment, a question
is asked to compare mono and stereoscopic rendered outputs to study the contribution
of stereo on photorealism. The data gathered from these experiments is analyzed using
statistical tools.
6.3.3 Experiment 1 – Perceptual Space
The perceptual space is defined as the visual experience of a precipitation scene as observed
from the ground. The perceptual space is influenced by visual factors, such as particle
sizes and number of particles that determine the precipitation intensity - light, moderate,
and heavy rain or snowfall conditions. Additionally, variations in particle motion due to wind effects or turbulence is an important visual factor to consider. Rainfall may appear
to come down vertically when there is little wind or turbulence. However, under similar
low wind dynamics, snowfall may exhibit greater variations and randomness in vertical
fall. These factors are ranked based on the results of a series of survey questions asked in
controlled human subject experiments to determine the relative influence of a visual factor on
the perceptual space of a rain or snow event. For example, if a respondent is most sensitive to variations in the number of particles, then, to enhance photorealism, a rendering algorithm
can be developed to emphasize this particular visual factor.
The input stimuli are a series of video samples and still pictures of actual rain and
snowfall scenes captured by a monoscopic camera or gathered from various freely available
public websites [98]. The stimuli vary along the three factors such that each factor varies in extreme between high and low values. This results in a total of eight input stimuli, varying
in size, number of particles, and particle motion as they fall. The quantities of high and
low values are subjectively judged after visually inspecting and comparing several input
stimuli. For example, an image of a light rain scene will be selected as one of the input
stimuli for having very few and small rain streaks falling vertically, after comparing it with
several similar images. Figure 6.17 shows sample images of heavy rain with
many particles falling at a slant, used as an input stimulus for one of the eight experiments.
A similar visual stimulus for the experiment with snow is also shown. Note that the included
images lose resolution and visual fidelity in rain and snow particles after screen capture
and a resize.
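The eight input stimuli are simply every high/low combination of the three visual factors. A short sketch (the labels are illustrative, chosen to mirror the survey wording) enumerates the 2³ = 8 conditions:

```python
from itertools import product

# Three visual factors, each varied between a low and a high extreme.
factors = {
    "size":   ("small", "large"),
    "number": ("few", "many"),
    "motion": ("straight", "angled"),
}

# Every combination of extremes gives 2^3 = 8 input stimuli.
stimuli = [dict(zip(factors, combo)) for combo in product(*factors.values())]
assert len(stimuli) == 8
for s in stimuli:
    print(s)
```

Each dictionary corresponds to one stimulus, e.g. large size, many particles, angled motion for the heavy slanted rain scene of Figure 6.17.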
There are two questions per visual factor forming a total of six questions as listed in
Table 6.7. Thirty participants are shown eight input stimuli to respond to the six questions.
The Likert scale is a commonly used scale to rank human responses to survey questions.
Responses to the six questions, which are referred to as Likert items, are recorded on a five-
point Likert scale [94]. Note that input stimuli with particle attributes of “small” vs. “large”,
Figure 6.17 One of the eight visual stimuli for rain and snow scene
“few” vs. “many”, and “straight” vs. “angle” are subjectively selected for these experiments
after visually inspecting several samples. All even numbered questions are opposite to
the previous odd numbered questions. This identifies incorrect survey answers due to
respondent bias, which is participants’ inability to answer truthfully or accurately. The
participant response time is neither restricted nor measured. It is also important to note
that there is no notion of “correctness” as the responses are subjective. Since Likert scale
responses produce ordinal data with no clear distribution, statistics such as
means and standard deviations are not useful for analysis. For example, it is unclear what
the average of “strongly agree” and “strongly disagree” means. Instead calculating median
or measuring frequency of responses in each category provides a more meaningful analysis
[99]. A different population of participants is used to repeat the above experiments for visual stimuli that have snow precipitation instead of rain.
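The median, mode, and per-category frequencies described above can be computed with Python's standard library; the sketch below uses hypothetical Likert responses, not the study's actual data.

```python
from statistics import median, mode
from collections import Counter

# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree);
# illustrative only, not the study's collected data.
responses = [3, 4, 4, 2, 5, 4, 3, 4, 5, 3]

# Means are not meaningful for ordinal data, so report median and mode instead.
print("median:", median(responses))  # central response level
print("mode:", mode(responses))      # most frequent response level

# Frequency of each category, as plotted in the bar charts of Figures 6.18/6.19.
freq = Counter(responses)
total = len(responses)
for level in range(1, 6):
    print(f"level {level}: {100 * freq[level] / total:.0f}%")
```

For this sample the median and mode are both 4 ("agree"), which is the kind of summary reported in Table 6.8.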
Table 6.7 Survey questions to determine the perceptual space
Answer each question by writing a number between 1 and 5

No.  Survey Question
1    Particle size is small
2    Particle size is large
3    There are few particles
4    There are many particles
5    Particles are falling straight down
6    Particles are falling at an angle
The results from all experiments are collectively analyzed, except for questions that were
designed to detect respondent bias. Figure 6.18 shows results for rain as visual stimulus.
The horizontal axis represents the Likert scale ordinal values, from “strongly disagree”
represented by rank value of 1 up to “strongly agree” with rank value of 5. The vertical
axis shows corresponding number of responses in percentage. For particle size in rain
experiments, responses ranged from “disagree” to “agree” with median response of “neutral”, while the majority of responses are between “neutral” and “agree”.
For number of particles and particle motion, the respondents tend to favor “agree” or
“strongly agree” with average response rank between “neutral” and “agree”. The central
tendency of both these visual factors is indicated by median response of “agree”. Although
the average response of particle size is also between “neutral” and “agree”, the median is
Figure 6.18 Rain scenes - response to number of particles, size, and motion
“neutral” with responses ranging from “disagree” to “agree”. The data shows that respondents
have a favorable opinion regarding number of particles and particle motion in a rain
precipitation scene but the same cannot be said about the particle size. Moreover, as shown
in Table 6.8, the mode value for particle size is a neutral opinion as the most frequently
occurring response. On the other hand, the modes of the other two visual factors are more
towards “agree” or “strongly agree”.

Table 6.8 Descriptive statistics – rain/snow to determine perceptual space

                       Median          Mode
Visual Factors         Rain   Snow     Rain   Snow
Particle Size          3      3        3      4
Particle Motion        4      5        4      5
Number of Particles    4      4        4      4
The experiments are repeated with another group of thirty respondents for snow scenes.
The experiments on snowfall as visual stimuli yield similar results. In comparison, when
the visual stimulus is snow, more responses on particle size are in the “agree” column.
Figure 6.19 shows results for snow as visual stimulus.
Figure 6.19 Snow scenes - response to number of particles, size, and motion
Experiments with number of particles in snowfall and rain have similar results with the
same median and mode values. However, the response to particle motion in snowfall has
more variability, but the median and mode values are both “strongly agree”. This may
be attributed to randomness in snowfall particle motion. Even with no wind, snowflakes
tend not to fall straight down due to variations in snowflake shape. Table 6.8 shows the
median and mode values in comparison with rain as visual stimuli. In the case of snow, it is
noted that the mode is “agree” for particle size as opposed to “neutral” for rain. The results
suggest that the respondents note a change in particle size for snowflakes more frequently.
This may be attributed to the relatively lower velocity of falling snow particles, as compared to
raindrops, giving the viewer an opportunity to observe snowfall in more detail.
6.3.4 Experiment 2 – Other Visual Factors
In these experiments survey questions are designed to measure subjective preferences of
different lighting and atmospheric conditions. During natural rain or snowfall, the sky is
overcast and light is diffused. The question is whether simulating overcast lighting con-
tributes towards improvement in photorealism of a rendered precipitation scene. The
participants are asked to rate two different light conditions, daytime and simulated over-
cast light conditions. Other visual factors, such as glow from an artificial light source and
atmospheric haze or fog effects, are also rated for photorealism.
A total of three visual stimuli and three survey questions are evaluated by thirty par-
ticipants. The questions are listed in Table 6.9. Keeping the definition of photorealism in
mind, participants are asked to view computer generated rain animations in 2D. They are
first shown a rain scene in bright daylight conditions. The output is switched to show the
same scene with lighting appropriate for an overcast sky. The participants are then asked
to compare the two outputs. A similar procedure is used to compare a rain scene with glow
or halo from an artificial light source and a scene with atmospheric haze or fog.
The experiments are repeated for snow precipitation with another group of thirty par-
ticipants. The impact of these visual factors on photorealism is investigated by analysis.
The survey questions used for these experiments compare the two visual stimuli shown in
Table 6.9 Survey questions to determine other important visual factors

Answer each question by writing a number between 1 and 5

No.  Survey Question
Daylight vs. Overcast:
1    Overcast light made scene appear photorealistic
Glow:
2    Adding a glow effect made scene appear photorealistic
Fog:
3    A fog effect improved photorealism
sequence in the three experiments. The comparison is made between two different light
conditions, daylight vs. overcast, presence or absence of glow from a simulated manmade
light source, and addition of fog effects in the scene. The participants use the Likert scale
from 1 to 5, “strongly disagree” to “strongly agree”, to rank their responses. The results of
the experiments for rain as input stimulus are summarized in Figure 6.20.
Lighting plays an important role in producing photorealistic results. These experiments
indicate that for a rain scene the overcast lighting produces some responses as “disagree”, which pulls the median results between “neutral” and “agree”. Although overall lighting
change to overcast adds to realism, these results indicate that rain streaks and their cumu-
lative effect is relatively difficult to observe in low light conditions as compared to the glow
or fog effects. The participants’ response to the glow and fog effect is most significant. The
experiments on snowfall as visual stimuli yield similar results with greater variability as
shown in Figure 6.21.
The response to snow as visual stimuli is “neutral” for light conditions and glow with
Figure 6.20 Rain scenes - response to lighting, glow, and fog
Figure 6.21 Snow scenes - response to lighting, glow, and fog
favorable response to the fog effect. Since snow particles are opaque and lack light refraction,
they make a scene appear bright. The respondents did not form any strong opinion about a
snow scene except to “agree” on fog contributing towards photorealism. Table 6.10 compares
the median and mode values between the two types of visual stimuli.
Table 6.10 Descriptive statistics – rain/snow to determine other visual factors

                      Median          Mode
Visual Effects        Rain   Snow     Rain   Snow
Fog                   4      4        4      4
Glow                  4      3        4      3
Lighting Condition    3.5    3        4      3
6.3.5 Experiment 3 – Photorealism and Stereo
This experiment uses a computer generated rain scene made up of a noticeable
number of raindrops falling at slightly slanted angles with random variation in sideways
motion. The scene is rendered in both mono and stereo. The subject views 2D output
before viewing the same scene in stereo. The stereo is viewed by using active shutter glasses while the display is switched to produce stereoscopic output. The participants are asked to
compare the two outputs. The survey questions are designed to find whether stereoscopic
results appear more photorealistic relative to monoscopic output. The experiment is re-
peated for snow precipitation. A same question is asked to two independent groups of thirty
participants with one group answering the question with rain as the visual stimulus while
135 the other group is asked to respond to same question with snow as the visual stimulus. The
question compares a monoscopic stimulus with a stereo equivalent. The participants use
the Likert scale from 1 to 5, “strongly disagree” to “strongly agree”, to rank their responses.
For both input stimuli, rain and snow, responses are closer to “strongly agree”. The snow
scene has median and mode values slightly higher, median 4.5 and mode 5, as compared
to 4 and 5 respectively for rain. The response frequency of each Likert level, for which there was a response, is shown in Table 6.11. Note that the majority of responses are either “agree”
Table 6.11 Response frequency of rain/snow

Question: Viewing in stereo made the scene appear photorealistic

        Strongly Agree   Agree   Neutral
Rain    37%              33%     30%
Snow    50%              40%     10%
or “strongly agree” for both types of precipitations, which is more pronounced for snow.
With respect to the question regarding photorealism of mono vs. stereo output for rain
and snow visual stimuli, the Mann-Whitney U-test is used to compare the responses of
the two independent groups. This test is well suited to analyze the Likert scale data, which
is ordinal or ranked scale, as we cannot presume that the responses fit a parameterized
distribution. This test requires that results from one experiment do not affect results in the
other and since the responses for rain and snow experiments are from different population
groups, the two group results are independent.
Since rain and snow precipitation are similar, we want to test whether there is a
statistically significant difference in respondents’ opinions regarding the question posed in
this experiment. We can perform a hypothesis test by defining the null and an alternative
hypothesis. The null hypothesis is that there is no difference, using a significance level of
0.05, between the outcome of the two experiments using either rain or snow as visual input when it comes to comparing mono and stereoscopic output. In other words, both groups
are expected to respond similarly regardless of the type of the visual stimuli, rain or snow,
95% of the time. The alternative hypothesis is the opposite of the null hypothesis: there is a
difference between the outcomes of the two experiments.
An online Mann-Whitney U-test calculator is used to test the null hypothesis [100]. Raw samples are entered as input to the calculator. The test yields a z-score of -1.45627 and a p-value of 0.1443. Since the p-value is greater
than our significance level of 0.05, we fail to reject the null hypothesis, concluding that both
groups formed similar opinions that viewing in stereo makes the scene appear photorealistic,
regardless of precipitation type.
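Instead of an online calculator, the U statistic and its normal-approximation z-score can be computed directly. The sketch below uses average ranks for tied values but omits the tie correction in the variance, so it is only an approximation; the sample data are illustrative, not the study's responses.

```python
from math import sqrt

def mann_whitney_u(a, b):
    """Mann-Whitney U with average ranks for ties, plus the z-score from the
    normal approximation (no tie correction in the variance)."""
    combined = sorted((v, g) for g, grp in (("a", a), ("b", b)) for v in grp)
    rank_sum_a = 0.0
    pos = 0
    while pos < len(combined):
        # Find the run of tied values starting at pos.
        j = pos
        while j < len(combined) and combined[j][0] == combined[pos][0]:
            j += 1
        avg_rank = (pos + 1 + j) / 2.0  # average of 1-based ranks pos+1 .. j
        for k in range(pos, j):
            if combined[k][1] == "a":
                rank_sum_a += avg_rank
        pos = j
    n1, n2 = len(a), len(b)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    u = min(u_a, n1 * n2 - u_a)       # conventional U is the smaller value
    mu = n1 * n2 / 2.0                 # mean of U under the null hypothesis
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, (u - mu) / sigma

# Illustrative only: two completely separated samples give U = 0.
u, z = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

At the 0.05 significance level (two-tailed), the null hypothesis is rejected only when |z| exceeds roughly 1.96, which is why the study's reported z of -1.456 leads to failing to reject it.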
6.3.6 Phase 3 – Conclusions
In terms of the particle size, number of particles, and motion in a precipitation scene, visual stimuli of either rain or snow generate the same response. Both types of precipitation
generated stronger opinions for the number of particles and motion suggesting that these
two visual factors have more effect on participants’ attention. Thus the rendering algorithm
should emphasize these visual factors to enhance visual realism. The results demonstrate
that, among the visual factors for photorealism, viewers are more sensitive to the number of
particles and motion than to size.
In studying other visual factors, such as lighting conditions, glow, and fog effects, the results from the two visual stimuli differ slightly. As expected, rain precipitation was more sensitive to these factors, while responses to snow scenes had fewer variations in opinion, remaining close to neutral. The overcast sky condition for rain scenes produced responses slightly higher than neutral, suggesting that participants still considered rain as visually real even in normal daytime light conditions. However, the presence of atmospheric haze and fog produced a stronger response; rendering outdoor scenes with fog effects adds to realism.
Since this part of the experiment used 2D visual stimuli, the glow and fog effects contribute towards photorealism independent of stereo. Moreover, the stereoscopic output contributed towards photorealism when compared to monoscopic results. The median response for a snowfall scene is slightly higher than for a rain scene. This can be explained by snow particles falling at a much slower rate than rain, resulting in more time to observe particles in stereo.
CHAPTER 7
FUTURE ENHANCEMENTS
The objective of this work is to show that stereoscopic, real-time, and photorealistic ren-
dering of precipitation, such as rain and snow is achievable using contemporary graphics
hardware and software. This chapter provides a summary of the study and suggests future
extensions to this work.
7.1 Summary
The study begins with providing a general description of rain and snow formation in nature.
This information is important to understand the key attributes associated with precipita-
tion scenes. This helps in creating realistic simulation and animation of rain or snowfall.
A comprehensive review of related work in monoscopic rendering of precipitation and
stereoscopic rendering of other natural phenomena is presented. This review is important
to build background information needed to achieve the research objectives. Various depth
cues, both monoscopic and stereoscopic, are also studied for a better understanding of
stereoscopic rendering.
After acquiring necessary foundational knowledge, initial experiments are performed
on simple graphics hardware using fixed function pipeline to validate that stereoscopic
rendering of rain is possible in real-time. The use of the less efficient fixed-function graphics
pipeline is deliberate in these experiments, with the understanding that if slower hardware can maintain a real-time frame rate, then so can more efficient hardware. Stereoscopic
rain scenes can also be produced by taking a monoscopic rain video and using it as input
to a 2D-3D software video converter. A quantitative and subjective comparison between various 2D-3D software video converters is presented. Their effectiveness in producing
high quality 3D videos with scenery containing water phenomena is studied. The results
from this study are further compared with rendered stereoscopic rain scenes to evaluate the
quality of stereo depth based on subjective scoring by individuals who rate their overall visual experience by answering survey questions.
The implementation of stereo rain rendering is extended to include photorealistic stereo
rendering of rain and snow precipitation at video frame rates. This is accomplished by
using current graphics hardware that supports programmable GPU and a modern graphics
library. The experiments with this newer implementation determine the statistical ranking
and importance of these visual factors necessary for producing a photorealistic output. The
experiments are extended to investigate if stereo improves photorealism. Visual stimuli used
in the experiments also include post-processing on rendered output to produce variable
lighting, glow, and fog effects to study their impact on photorealism as the stereo camera
moves in the scene.
7.2 Future Extensions
There are many future extensions to this work, some of which are as follows:
1. The model of falling raindrops is represented by textures of rain streaks. As the hard-
ware and software evolve, for future studies a deformable raindrop mesh model can
replace a texture based representation. This will result in a more physically accurate
and photorealistic output that can provide visual details in a falling raindrop. The
mesh model can also deform easily to model motion blur which can provide more
accurate representation of retinal persistence. Additionally for snowfall, such models
can simulate inflight collision between particles to form bigger snowflakes as this
process is physically accurate and common in nature.
2. Rasterization based rendering can be replaced by a ray-tracing approach. Graphics
vendors like NVIDIA are working on hardware acceleration for ray-tracing, which
will facilitate real-time rendering. With improvement in technology, animation in
real-time ray-tracing is viable. It is an open area of research that brings real-time and
photorealistic goals towards each other with more visually and physically accurate
rendering. Extending animation in real-time ray-tracing to stereoscopic displays is a
challenging extension to this work.
3. The GPU based stereoscopic rendering of anaglyphs is limited to a simple method by
using frame buffer objects. The compute mode of a GPU, with use of either a compute
shader, CUDA or OpenCL programs, can be researched to produce computationally
intensive anaglyphs, in real-time, with enhanced color fidelity.
4. Physically based rendering using the computationally expensive fluid dynamics
equations can produce a more realistic animation of natural phenomena. An effi-
cient stereoscopic implementation with fluid dynamics is a useful extension to this
work.
5. Sophisticated post-processing to include simulation of time-of-day, nighttime render-
ing, glow effects, sunrays, lens flare, cinematic camera movements, realistic particle
accumulation and flow are all valid additions to the current problem.
6. Another extension is the combination of several natural phenomena in one larger
weather system. This may include the interaction of rain and snow particles with other
natural objects such as grass, trees, or other vegetation. Inclusion of the sound of rainfall,
thunder with lightning effects, howling winds, and a simulated snow blizzard
all add to realism.
7. In evaluating 2D-3D converters, the horizontal parallax is measured manually by
counting pixels differences between the left- and the right-eye views. Instead, for
future experiments it is proposed to use an automatic feature detection algorithm
and apply stereo matching between the left- and right-eye views to measure the horizontal
parallax. This will increase the number of feature points to compare and enhance the
test sets for more accurate results.
8. The dimensionality of the problem that measures perceptual space of rain or snow
should be increased. Currently three dimensions are considered, namely particle
size, motion, and number. It is proposed that inclusion of other dimensions such as
raindrop splash or ripple effects, snow accumulation or stability, and sound effects
may also contribute towards defining perceptual space of rain or snow.
9. The survey experiment to determine apparent photorealism in stereo can be en-
hanced to consider other questions, such as stereo output and visual immersion for
effectiveness of stereoscopic viewing in virtual reality applications, the impact of
glow effects on photorealism to a scene with nighttime ambient light, and particle
interaction with other scene objects, such as snow accumulation or a raindrop splash
effect on photorealism.
10. A future enhancement to this experiment is to improve comparison of a mono to a
stereoscopic visual stimulus, such that we gather opinions from a monoscopic visual
stimulus and evaluate them against opinions from the same output in stereo. Stereoscopic
implications of camera attributes, such as lens distortions, on photorealism are an open
area of research. Future research will use methods to exploit the geometry of stereo
pairs to speed up rendering of photorealistic scenes with natural phenomena while
maintaining real-time frame rates.
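The automatic parallax measurement proposed in item 7 can be prototyped without a full stereo-matching pipeline. The sketch below is a simplified illustration, not the method used in this study: the gradient-energy feature detector and the sum-of-squared-differences patch matcher are assumptions chosen to keep the example self-contained. It detects salient points in the left-eye view and searches along the corresponding scanline of the right-eye view for the best-matching patch, reporting each feature's horizontal parallax in pixels.

```python
import numpy as np

def detect_features(img, max_feats=50, margin=5):
    """Pick the pixels with the largest local gradient energy as crude
    feature points, excluding a border so patch extraction stays in bounds."""
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    energy[:margin, :] = 0
    energy[-margin:, :] = 0
    energy[:, :margin] = 0
    energy[:, -margin:] = 0
    idx = np.argsort(energy, axis=None)[::-1][:max_feats]
    return [np.unravel_index(i, img.shape) for i in idx]

def horizontal_parallax(left, right, patch=4, search=20):
    """Estimate the horizontal parallax (in pixels) of each detected feature
    by matching a small patch along the corresponding scanline of the
    right-eye view. Positive values mean the feature moved rightward."""
    disparities = []
    # Margin of search + patch guarantees every candidate shift is testable.
    for y, x in detect_features(left, margin=search + patch):
        ref = left[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(float)
        best_ssd, best_d = np.inf, 0
        for d in range(-search, search + 1):
            xr = x + d
            if xr - patch < 0 or xr + patch + 1 > right.shape[1]:
                continue  # candidate patch would fall outside the image
            cand = right[y - patch:y + patch + 1,
                         xr - patch:xr + patch + 1].astype(float)
            ssd = np.sum((ref - cand) ** 2)  # sum of squared differences
            if ssd < best_ssd:
                best_ssd, best_d = ssd, d
        disparities.append(best_d)
    return disparities
```

With a right-eye view equal to the left-eye view shifted horizontally, the median of the returned disparities recovers the shift; on rendered stereo pairs the per-feature values map screen parallax across the scene.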
REFERENCES
[1] Zhao, Q. “Data acquisition and simulation of natural phenomena”. Science China Information Sciences 54.4 (2011), pp. 683–716.
[2] Gotchev, A. et al. “Three-Dimensional Media for Mobile Devices”. Proceedings of the IEEE 99.4 (2011), pp. 708–741.
[3] Akenine-Möller, T. et al. Real-Time Rendering. 3rd ed. Natick, MA, USA: A. K. Peters, Ltd., 2008.
[4] Houze, R. A. Cloud Dynamics. 2nd ed. Seattle, WA, USA: Academic Press, 2014.
[5] Burton, J. & Taylor, K. The Nature and Science of Rain. Nature and Science Series. London, UK: Franklin Watts, 1997.
[6] Libbrecht, K. G. Ken Libbrecht’s Field Guide to Snowflakes. St. Paul, MN, USA: Voyageur Press, 2006.
[7] Srivastava, R. C. “Size distribution of the raindrops generated by their breakup and coalescence”. Journal of Atmospheric Sciences (1971), pp. 410–415.
[8] Pruppacher, H. & Klett, J. Microphysics of Clouds and Precipitation. 2nd ed. Atmospheric and Oceanographic Sciences Library. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1997.
[9] Magono, C. & Lee, W. “Meteorological Classification of Natural Snow Crystals”. Journal of the Faculty of Science 4.2 (1966), pp. 312–335.
[10] Rose, B. M. & McAllister, D. F. “Real-time photorealistic stereoscopic rendering of fire”. Proceedings of SPIE 6490 (2007), 64901O-1–64901O-13.
[11] Johnson, T. M. & McAllister, D. F. “Real-time stereo imaging of gaseous phenomena”. Proceedings of SPIE 5664 (2005), pp. 92–103.
[12] Borse, J. A. & McAllister, D. F. “Real-time image-based rendering for stereo views of vegetation”. Proceedings of SPIE 4660 (2002), pp. 292–299.
[13] Jakulin, A. “Interactive Vegetation Rendering with Slicing and Blending”. Proceedings of Eurographics (2000). Ed. by Sousa, A. & Torres, J. C.
[14] Garg, K. & Nayar, S. K. Photometric Model of a Rain Drop. Tech. rep. 2004.
[15] Roser, M. et al. “Realistic Modeling of Water Droplets for Monocular Adherent Raindrop Recognition using Bezier Curves”. ACCV Workshop on Computer Vision in Vehicle Technology: From Earth to Mars. Queenstown, New Zealand, 2010.
[16] Garg, K. & Nayar, S. K. “Photorealistic Rendering of Rain Streaks”. Proceedings of ACM SIGGRAPH 23.3 (2006), pp. 996–1002.
[17] Garg, K. & Nayar, S. K. “Vision and Rain”. International Journal of Computer Vision 75.1 (2007), pp. 3–27.
[18] Tatarchuk, N. & Isidoro, J. “Artist-Directable Real-Time Rain Rendering in City Environments”. Eurographics Workshop on Natural Phenomena. Ed. by Chiba, N. & Galin, E. The Eurographics Association, 2006.
[19] Slomp, M. et al. “Photorealistic real-time rendering of spherical raindrops with hierarchical reflective and refractive maps”. Journal of Visualization and Computer Animation 22.4 (2011), pp. 393–404.
[20] Puig-Centelles, A. et al. “Multiresolution techniques for rain rendering in virtual environments”. International Symposium on Computer and Information Sciences. 2008, pp. 1–4.
[21] Puig-Centelles, A. et al. “Creation and control of rain in virtual environments”. The Visual Computer 25.11 (2009), pp. 1037–1052.
[22] Puig-Centelles, A. et al. “Rain Simulation in Dynamic Scenes”. International Journal of Creative Interfaces and Computer Graphics 2.2 (2011), pp. 23–36.
[23] Creus, C. & Patow, G. A. “R4: Realistic Rain Rendering in Realtime”. Computers and Graphics 37.2 (2013), pp. 33–40.
[24] Rousseau, P. et al. “Realistic Real-time Rain Rendering”. Computers and Graphics 30.4 (2006), pp. 507–518.
[25] Wang, L. et al. “Real-Time Rendering of Realistic Rain”. ACM SIGGRAPH. 2006.
[26] Rousseau, P. et al. “GPU Rainfall”. Journal of Graphics, GPU, and Game Tools 13.4 (2008), pp. 17–33.
[27] Coutinho, B. B. et al. “Rain Scene Animation through Particle Systems and Surface Flow Simulation by SPH”. Proceedings of SIBGRAPI (2010), pp. 255–262.
[28] Tariq, S. “Rain”. NVIDIA White Paper. 2007.
[29] Starik, S. & Werman, M. “Simulation of Rain in Videos”. International workshop on texture analysis and synthesis. 2003.
[30] Mizukami, Y. et al. “Realistic Rain Rendering”. GRAPP - International Conference on Computer Graphics Theory and Applications (2008), pp. 273–280.
[31] Wang, N. & Wade, B. “Rendering Falling Rain and Snow”. Proceedings of ACM SIGGRAPH (2004), p. 14.
[32] Wang, C. et al. “Real-Time Modeling and Rendering of Raining Scenes”. The Visual Computer 24.7 (2008), pp. 605–616.
[33] Weber, Y. et al. “A Multiscale Model for Rain Rendering in Real-time”. Computers and Graphics 50 (2015), pp. 61–70.
[34] Wang, C. et al. “Realistic Simulation for Rainy Scene”. Journal of Software 10.1 (2015), pp. 106–115.
[35] Feng, Z.-X. et al. “Real-time rain simulation in cartoon style”. International Conference on Computer Aided Design and Computer Graphics. IEEE Computer Society, 2005.
[36] Yang, Y. et al. “Design and Realtime Simulation of Rain and Snow based on LOD and Fuzzy Motion”. ICPCA - International Conference on Pervasive Computing and Applications (2008), pp. 510–513.
[37] Barnum, P. C. et al. “Analysis of Rain and Snow in Frequency Space”. International Journal of Computer Vision 86.2-3 (2010), pp. 256–274.
[38] Fearing, P. “Computer Modelling of Fallen Snow”. Proceedings of ACM SIGGRAPH (2000), pp. 37–46.
[39] Feldman, B. E. & O’Brien, J. F. “Modeling the Accumulation of Wind-driven Snow”. Proceedings of ACM SIGGRAPH (2002), p. 218.
[40] Fedkiw, R. et al. “Visual Simulation of Smoke”. Proceedings of ACM SIGGRAPH (2001), pp. 15–22.
[41] Haglund, H. et al. “Snow Accumulation in Real-Time”. Proceedings of SIGRAD (2002), pp. 11–15.
[42] Moeslund, T. B. et al. “Modeling Falling and Accumulating Snow”. International Conference on Vision, Video and Graphics. 2005.
[43] Zou, C. et al. “Algorithm for generating snow based on GPU”. Proceedings of ICIMCS (2010), pp. 199–202.
[44] Langer, M. S. et al. “A Spectral-particle Hybrid Method for Rendering Falling Snow”. Proceedings of EGSR (2004), pp. 217–226.
[45] Ohlsson, P. & Seipel, S. “Real-time Rendering of Accumulated Snow”. Proceedings of SIGRAD. 2004, pp. 25–32.
[46] Reynolds, D. T. et al. “Real-time Accumulation of Occlusion-based Snow”. The Visual Computer 31.5 (2015), pp. 689–700.
[47] Tokoi, K. “A Shadow Buffer Technique for Simulating Snow-Covered Shapes”. Proceedings of CGIV (2006), pp. 310–316.
[48] Foldes, D. & Benes, B. “Occlusion-Based Snow Accumulation Simulation”. VRIPHYS - Workshop in Virtual Reality Interactions and Physical Simulation. Ed. by Dingliana, J. & Ganovelli, F. The Eurographics Association, 2007.
[49] Saltvik, I. et al. “Parallel Methods for Real-time Visualization of Snow”. Proceedings of PARA (2007), pp. 218–227.
[50] Festenberg, N. v. & Gumhold, S. “A Geometric Algorithm for Snow Distribution in Virtual Scenes”. Proceedings of Eurographics (2009), pp. 17–25.
[51] Hinks, T. & Museth, K. “Wind-driven snow buildup using a level set approach”. Proceedings of Eurographics 9 (2009), pp. 19–26.
[52] Zhang, J. et al. “Rendering snowing scene on GPU”. Proceedings of ICIS 3 (2010), pp. 199–202.
[53] Tan, Y. et al. “Real-Time Snowing Simulation Based on Particle Systems”. 3 (2009), pp. 7–11.
[54] Tan, J. & Fan, X. “Particle System Based Snow Simulating in Real Time”. Proceedings of ESIAT 10 (2011), pp. 1244–1249.
[55] Fan, N. & Zhang, N. “Real-time Simulation of Rain and Snow in Virtual Environment”. International Conference on Industrial Control and Electronics Engineering. 2012.
[56] Ding, W. et al. “Real-time rain and snow rendering”. International Conference on Agro-Geoinformatics (2013), pp. 32–35.
[57] Stomakhin, A. et al. “A material point method for snow simulation”. ACM Transactions on Graphics 32.4 (2013), 102:1–102:10.
[58] Sulsky, D. et al. “A particle method for history-dependent materials”. Computer Methods in Applied Mechanics and Engineering (1994), pp. 179–196.
[59] Harlow, F. H. “The Particle-in-Cell Method for Numerical Solution of Problems in Fluid Dynamics”. Proceedings of Symposium on Applied Mathematics 15.269 (1963).
[60] Wong, S.-K. & Fu, I.-T. “Hybrid-based Snow Simulation and Snow Rendering with Shell Textures”. Computer Animation and Virtual Worlds 26.3-4 (2015), pp. 413–421.
[61] Tatarchuk, N. “Artist-Directable Real-Time Rain Rendering in City Environments”. Proceedings of SI3D (2006), p. 30.
[62] Fine, I. & Jacobs, R. A. “Modeling the Combination of Motion, Stereo, and Vergence Angle Cues to Visual Depth”. Neural Computation 11.6 (1999), pp. 1297–1330.
[63] McAllister, D. F. “Display Technology: Stereo and 3D Technologies”. Encyclopedia of Imaging Science and Technology (2006), pp. 1327–1344.
[64] Goldstein, E. B. Sensation and Perception. 8th ed. Wadsworth Publishing, 2009.
[65] Cumming, B. G. & DeAngelis, G. C. “The Physiology of Stereopsis”. Annual Review of Neuroscience 24.1 (2001), pp. 203–238.
[66] Jones, G. R. et al. “Controlling perceived depth in stereoscopic images”. Proceedings of SPIE 4297 (2001), pp. 42–53.
[67] Zhang, Z. & McAllister, D. F. “A uniform metric for anaglyph calculation”. Proceedings of SPIE 6055 (2006), 605513-1–605513-12.
[68] McAllister, D. F. et al. “Methods for computing color anaglyphs”. Proceedings of SPIE 7524 (2010), 75240S-1–75240S-12.
[69] Jorke, H. & Fritz, M. “Stereo projection using interference filters”. Proceedings of SPIE 6055 (2006), 60550G-1–60550G-8.
[70] Dodgson, N. A. “Autostereoscopic 3D Displays”. Computer 38.8 (2005), pp. 31–36.
[71] Liu, J. et al. “Three-dimensional PC: toward novel forms of human-computer interaction”. 3D Video and Display Devices and Systems SPIE (2000), pp. 5–8.
[72] Hussain, S. A. & McAllister, D. F. “Stereo rendering of rain in real-time”. Proceedings of SPIE 8648 (2013), 86480B-1–86480B-11.
[73] Ware, C. “Dynamic Stereo Displays”. Proceedings of SIGCHI. Denver, CO, 1995.
[74] Rademacher, P. et al. “Measuring the Perception of Visual Realism in Images”. Proceedings of Eurographics. London, UK, 2001.
[75] Ferwerda, J. “Three varieties of realism in computer graphics”. Santa Clara, CA, 2003.
[76] Adelson, S. J. & Hodges, L. F. “Stereoscopic ray-tracing”. The Visual Computer 10.3 (1993), pp. 127–144.
[77] Reeves, W. T. “Particle Systems—A Technique for Modeling a Class of Fuzzy Objects”. ACM Transactions on Graphics 2.2 (1983), pp. 91–108.
[78] Marshall, J. S. & Palmer, W. M. “The distribution of raindrops with size”. Journal of Meteorology 5.4 (1948), pp. 165–166.
[79] Hussain, S. A. & McAllister, D. F. “The Effectiveness of 2D-3D Converters in Rendering Natural Water Phenomena”. IJRTET 8.1 (2013), pp. 18–22.
[80] Arcsoft, Inc. Arcsoft MediaConverter - Converting 2D to 3D. URL: http://www.arcsoft.com/mediaconverter/?icn=Topics-Win8&ici=AMC8-Top-Learn-Button (visited on 01/03/2013).
[81] AxaraMedia, Ltd. 2D to 3D Video Converter: converting 2D to 3D video files and creating 3D videos. URL: http://www.axaramedia.com/VideoSolutions (visited on 01/10/2013).
[82] SeaPhone Co., Ltd. Realtime 2D/3D converter DIANA-3D Plus. URL: http://www.texnai.co.jp/diana3d-plus/eng/index.html (visited on 01/08/2013).
[83] Leawo Software Co., Ltd. Best Video Converter download. URL: http://www.leawo.com/hd-video-converter (visited on 01/07/2013).
[84] Movavi. 2D to 3D Video Converter. URL: http://www.movavi.com/ (visited on 01/11/2013).
[85] McAllister, D. F. “Stereo Pairs from Linear Morphing”. Proceedings of SPIE 3295 (1998), pp. 46–52.
[86] Scharstein, D. & Szeliski, R. “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms”. International Journal of Computer Vision 47 (2002), pp. 7–42.
[87] Hattori, T. “2D-3D Image Converter by Sea Phone”. The Journal of Three Dimensional Images 23.2 (2009), pp. 36–39.
[88] Hattori, T. “Image Processing Concerning Individual Phenomena in Real Time 2D/3D Conversion Software DIANA for PCs”. The Journal of Three Dimensional Images 25.2 (2011), pp. 14–18.
[89] Park, M. et al. “Learning to Produce 3D Media From a Captured 2D Video”. IEEE Transactions on Multimedia 15.7 (2013), pp. 1569–1578.
[90] Zhang, L. et al. “3D-TV Content Creation: Automatic 2D-to-3D Video Conversion”. IEEE Transactions on Broadcasting 57.2 (2011), pp. 372–383.
[91] Saxena, A. et al. “Depth Estimation Using Monocular and Stereo Cues”. Proceedings of IJCAI (2007), pp. 2197–2203.
[92] Cheng, E. et al. “RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database”. Proceedings of QoMEX (2012). Ed. by Burnett, I. S., pp. 212–217.
[93] International Telecommunication Union. Subjective Assessment of Stereoscopic Television Pictures - Recommendation ITU-R BT.1438. 2000.
[94] Likert, R. “A technique for the measurement of attitudes”. Archives of Psychology 22.140 (1932), pp. 1–55.
[95] Cox, N. “Speaking Stata: Creating and varying box plots”. The Stata Journal 9.3 (2009), pp. 478–496.
[96] Hussain, S. A. & McAllister, D. F. “Stereo rendering of photorealistic precipitation”. Proceedings of SPIE 9 (2017), pp. 158–166.
[97] Hojlind, S. et al. “Why a single measure of photorealism is unrealistic”. 2014.
[98] MediaWiki. Wikimedia Commons. URL: https://commons.wikimedia.org (vis- ited on 01/10/2017).
[99] Jamieson, S. “Likert scales: how to (ab)use them”. Medical education 38.12 (2004), pp. 1217–1218.
[100] Stangroom, J. Social Science Statistics. URL: http://www.socscistatistics.com/tests/Default.aspx (visited on 01/11/2017).