AUDIOSITY = AUDIO + RADIOSITY

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

MASTER OF SCIENCE

IN

COMPUTER SCIENCE

UNIVERSITY OF REGINA

By

Hao Li

Regina, Saskatchewan

September 2009

© Copyright 2009: Hao Li Library and Archives Bibliotheque et 1*1 Canada Archives Canada Published Heritage Direction du Branch Patrimoine de I'edition 395 Wellington Street 395, rue Wellington Ottawa ON K1A 0N4 Ottawa ON K1A 0N4 Canada Canada

Your file Votre reference ISBN: 978-0-494-65704-1 Our file Notre reference ISBN: 978-0-494-65704-1

NOTICE: AVIS:

The author has granted a non­ L'auteur a accorde une licence non exclusive exclusive license allowing Library and permettant a la Bibliotheque et Archives Archives Canada to reproduce, Canada de reproduire, publier, archiver, publish, archive, preserve, conserve, sauvegarder, conserver, transmettre au public communicate to the public by par telecommunication ou par I'lntemet, preter, telecommunication or on the Internet, distribuer et vendre des theses partout dans le loan, distribute and sell theses monde, a des fins commerciales ou autres, sur worldwide, for commercial or non­ support microforme, papier, electronique et/ou commercial purposes, in microform, autres formats. paper, electronic and/or any other formats.

The author retains copyright L'auteur conserve la propriete du droit d'auteur ownership and moral rights in this et des droits moraux qui protege cette these. Ni thesis. Neither the thesis nor la these ni des extraits substantiels de celle-ci substantial extracts from it may be ne doivent etre imprimes ou autrement printed or otherwise reproduced reproduits sans son autorisation. without the author's permission.

In compliance with the Canadian Conformement a la loi canadienne sur la Privacy Act some supporting forms protection de la vie privee, quelques may have been removed from this formulaires secondaires ont ete enleves de thesis. cette these.

While these forms may be included Bien que ces formulaires aient inclus dans in the document page count, their la pagination, il n'y aura aucun contenu removal does not represent any loss manquant. of content from the thesis.

••I Canada UNIVERSITY OF REGINA

FACULTY OF GRADUATE STUDIES AND RESEARCH

SUPERVISORY AND EXAMINING COMMITTEE

Hao Li, candidate for the degree of Master of Science in , has presented a thesis titled, Audiosity = Audio + Radiosity, in an oral examination held on September 18, 2009. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material.

External Examiner: Dr. Shaun M. Fallat,

Department of Mathematics and Statistics

Supervisor: Dr. David Gerhard, Department of Computer Science

Committee Member: Dr. Yiyu Yao, Department of Computer Science

Committee Member: Dr. Xue Dong Yang, Department of Computer Science

Chair of Defense: Professor Kenneth Runtz, Faculty of Engineering and Applied Science ABSTRACT

It is challenging to render spatialized audio within a dynamic environment in real-time with limited computational resources. Further complicating this process is the fact that compared to direct sound, reflected sound is usually more expensive and time consuming. In this thesis, we introduce a rapid method that we have named audiosity, which generates reflected sound in an enclosed audio environment containing obsta­ cles. Audiosity adopts the radiosity rendering technique from , and applies it to audio animation. At each frame of the animation, audiosity computes the sound energy that propagates between surfaces. The result is the final sound energy in the scene at the state of equilibrium. By treating the audio environment in its entirety, the audiosity scheme accelerates the reflected sound computation. More­ over, a filter grid is pre-calculated and stored so that the speed of the audio animation process can be increased further.

In this thesis, our only concerns are defused reflections during sound propagation.

In addtion, due to other restrictions of the graphics radiosity technique, we focus on

iii rendering without time delay. Although it is a rough approximation, we will justify this reasonable simplification to accelerate the entire audiosity scheme, so it works within real-time applications which are required to respond to events as they happen simultaneously. We also propose a modification to the original audiosity system providing for the inclusion of delay components in order to render reverberation, but this adds computational complexity.

A testing program is provided, along with experiments and evaluations which show that the audiosity scheme is efficient as well as matching our expectations.

iv ACKNOWLEDGEMENTS

I would like to acknowledge and extend my sincere gratitude to all those who have helped me during my graduate studies and research, especially in the preparation of this thesis.

Foremost, I would like to thank my supervisor, Dr. David Gerhard, for the freedom he gave me, which allowed me to explore various research topics in the Computer

Audio field. He has provided me with invaluable guidance and infinite wisdom in my research and graduate study.

Further appreciation is extended to my thesis committee members, Dr. Xue Dong

Yang and Dr. Yiyu Yao for their time and expertise in improving this thesis. In addition, thank the University of Regina, Faculty of Graduate Studies and Research, and the Department of Computer Science for funding and other helps.

Lastly, I would like to give my thanks to my parents, Wancai Li and Miling

Zhang, for both emotional and financial supports throughout my life. This thesis is also dedicated to Vivi for her unending encouragement and support.

v POST DEFENSE ACKNOWLEDGEMENTS

I would like to express thanks to my external examiner, Dr. Shaun Fallat, for his insightful comments and constructive suggestions. Final thanks are given to Prof.

Ken Runtz who presided at my defense.

VI CONTENTS

ABSTRACT iii

ACKNOWLEDGEMENTS v

POST DEFENSE ACKNOWLEDGEMENTS vi

1 INTRODUCTION 1

1.1 TERMINOLOGY AND MOTIVATION 1

1.2 OUTLINE OF THE THESIS 5

2 BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 7

2.1 SPATILIZATION IN AUDIO ANIMATION 7

2.2 FRESNEL ZONE AND CLEARANCE 10

2.3 TSINGOS'AUDIO ANIMATION RENDERING TECHNIQUE ... 14

2.4 LIMITATIONS OF CURRENT AUDIO RENDERING TECHNIQUES 17

2.5 RADIOSITY 18

vii 2.5.1 Form Factor 21

2.5.2 Radiosity Equation 22

2.6 LIMITATIONS OF RADIOSITY 26

3 FIRST-PASS ENERGY DISTRIBUTION 29

3.1 INTRODUCTION 29

3.2 DIFFERENCE BETWEEN GRAPHICS EMITTERS AND SOUND

EMITTERS 30

3.3 FIRST-PASS ENERGY DISTRIBUTION 32

3.4 USAGE OF FIRST-PASS ENERGY DISTRIBUTION 38

4 AUDIOSITY 41

4.1 INTRODUCTION 41

4.2 DIRECT SOUND VS. REFLECTED SOUND 42

4.3 AUDIOSITY SCHEME 45

4.3.1 Solid angle 46

4.3.2 Render with inversed matrix 48

4.3.3 Audio form factor 52

4.3.4 Rendering and output 54

4.3.5 Sound attenuation function 55

4.3.6 Rendering algorithm 56

viii 4.4 FILTER GRID 60

4.5 AUDIOSITY AND TIME DELAY 64

5 EXPERIMENTS AND EVALUATIONS 69

5.1 GUI OF THE TESTING PROGRAM 69

5.2 REASONABLENESS 71

5.2.1 Experiment #1: test with empty environment 72

5.2.2 Experiment #2: test with sound blockers 73

5.3 PERFORMANCE 76

6 CONCLUSION AND FUTURE WORK 82

REFERENCES 86

IX LIST OF TABLES

5.1 Grid construction time 78

5.2 Grid retrieval time 79

x LIST OF FIGURES

2.1 Sampled signals 9

2.2 Presnel zone 12

2.3 Tsingos audio animation rendering technique [47] 16

2.4 The Cornel Box [8] 20

2.5 Form factor 22

2.6 Radiosity equation 24

3.1 Emitter arrangement 1 33

3.2 Emitter arrangement 2 34

3.3 Emitter arrangement 3 35

3.4 Emitter arrangement 4 36

4.1 Direct sound vs. reflected sound 43

4.2 A determinant reflected sound situation 45

4.3 Solid angle [34] 47

xi 4.4 Audiosity rendering architecture 57

4.5 Equalization filter 58

4.6 Equalizer in iTunes 60

4.7 A sound filter grid 62

4.8 Retrieve data from the filter grid 63

4.9 Reverberation 65

4.10 Audiosity with delay time 67

5.1 GUI of the testing program 70

5.2 Experiment 1 73

5.3 Experiment 1 result chart 74

5.4 Experiment 2 75

5.5 Experiment 2 result chart 76

5.6 Fresnel zones for different frequency bands 77

5.7 Grid construction time 78

5.8 Grid retrieval time 80

xii 1 INTRODUCTION

1.1 TERMINOLOGY AND MOTIVATION

In real-time computer audio simulation within a dynamic sound environment, the final audio playback is a combination of all audio waveforms sent out by sound sources and affected by the environment. Certainly, during the propagation, sound will be obscured by obstacles and be reflected by surfaces, so that the physical layout of the environment affects the perceived sound. These effects, along with the sound attenuation during traveling, embed the information of the environment into the final sound, which is then received by the listener. One of the ultimate goals in computer audio is to make the result reasonable and match our expectation.

In more technical terms, we hope to create a realistic illusion of sound sources within a spatial environment. Objects near the sound source and receiver, such as obstacles and flat surfaces, affect the perceived sound based on the physical arrange­ ment of those objects. Such spatial information needs to be encoded into the audio

1 CHAPTER 1. INTRODUCTION 2 to ensure the listener can receive this information through limited number of loud­ speakers (two for stereo, five or more for a surround sound system). In computer audio, this procedure of simulating the audio behaviour based on the physical layout of the environment, including sound sources, sound receivers, and nearby objects, is called audio spatialization.

During the running time computation of large multimedia applications like games, sound processes often occupy a small amount of resource for particular reasons. First, better graphics and better AI systems give the user a better visual impression and gaming experience. As such, developers are willing to spend more computational resources on them. Second, since it is much easier to judge the quality level of the graphics or the AI system in a program, people are more critical and sensitive to them. However, as long as the audio "sounds right" in a game, or is not glaringly defective, players are satisfied. These facts facilitate two standard requirements of a real-time audio animation technique:

1. It must be fast, without over-consuming computer resources

2. It is not necessary to maintain the highest level of quality, if the reduction in

quality is not perceptually noticeable. CHAPTER 1. INTRODUCTION 3

In other words, the goal of any audio animation technique is to sacrifice accuracy for performance without compromising perceptual accuracy. Of course, there always exists the dilemma of how much accuracy is sacrificed. Therefore, the balance point should be chosen with care.

Before moving to the main part of the thesis, it is necessary to clarify a few terms, which will appear repeatedly in later chapters.

Along with audio spatialization, audio animation is another phrase strongly em­ phasized in this thesis. In graphics, animation defines a sequence of images that create an optical illusion of motion according to a psychological phenomenon named persistence of vision, which causes the brain to retain images cast upon the retina of the eye for a fraction of a second beyond their disappearance from the field of sight.

Similarly, audio animation analyzes the movement of sound sources by considering how the perceived sound is changed by a simulated environment. In other words, we now are focusing on the movement rather than a single frame of audio spatialization.

Moreover, both sound sources and nearby objects in the spatialized audio environ­ ment can move. In this case, the situation becomes more complex, and we need to be more careful about balancing accuracy and performance.

Currently, a number of audio spatialization techniques exist. In fact, some of them have achieved remarkable success when solely dealing with direct sound propagation only. However, when sound reverberation (reflected sound) is involved, it is much CHAPTER 1. INTRODUCTION 4 more costly. Compared to their original direct sound version, some techniques become several times more expensive after including sound reflections. This thesis seeks an audio animation scheme that accelerates the calculation of the reflected sound without losing much accuracy. Again, it is necessary to emphasize the word "animation" since we are more interested in the change of the perceived sound during a period of time rather than in a single frame. An effective solution is to adapt the concept of radiosity, which is a graphics rendering technique, and to merge it with an existing audio animation technique.

The title of this thesis, "Audiosity = Audio + Radiosity", simply means adopt the concept of radiosity from computer graphics and apply it into the computer audio field. However, there are certain compatibility issues between audio and radiosity, since the natural characteristics of sound and light are different from each other. The novel contribution of this thesis is seeking a proper method to combine techniques from these two research areas, and merging them into a reasonable scheme, and we name this scheme "audiosity".

We claim the audiosity scheme is efficient, and it will be shown by experiments; however, "accuracy" is difficult and expensive (time-consuming and costly) to numer­ ically quantify. In computer audio, an appropriate way to determine the accuracy is to find a real-world environment, then construct a virtual environment to match. CHAPTER 1. INTRODUCTION 5

Certainly, such a method is not feasible for this thesis. Therefore, here we only com­

pare the result of the auidosity scheme with our daily experience, and determine the

similarity between them. In other words, as long as the final sound that is generated by the scheme matches our expectation, we claim that the audiosity scheme is feasible

and promising.

Nevertheless, it is a possible future extension to have numerical and perceptual

listening experiments with the audiosity scheme to test its accuracy.

1.2 OUTLINE OF THE THESIS

The remainder of this thesis is organized as follows.

Chapter 2 contains two major components. The first part briefly introduces some

fundamental concepts of audio animation and existing rendering techniques, especially

Tsingos' audio ray-tracing technique. The second part of the chapter reviews the fundamentals of radiosity, including the explanation of the radiosity system, form factors, and methods for solving the radiosity system.

Chapter 3 gives the definition and explanation of the first-pass energy distribution, an important novel concept used in our audiosity scheme.

Chapter 4 derives the audiosity scheme by incorporating audio animation concepts into the standard radiosity rendering technique.

Chapter 5 includes the experimental results of the audiosity scheme. In this CHAPTER 1. INTRODUCTION 6 chapter, it is shown that not only the speed of the scheme is sufficient, but also the quality of the audio playback generated by the scheme matches our expectation well.

Chapter 6 concludes the entire thesis, and briefly discusses the possibilities to expand the audiosity scheme. 2 BACKGROUND OF AUDIO

ANIMATION AND RADIOSITY

2.1 SPATILIZATION IN AUDIO ANIMATION

In the computer audio field, spatial audio is sound reconstructed by the spatial in­ formation of a three dimensional environment [17] [39]. This provides a more re­ alistic experience by giving the listener the impression of the physical location and arrangement of sound sources. Moreover, sound spatialization is especially useful in real-time dynamic environments, such as video games, where the positions and ve­ locities of sound sources and the listener change during each frame. In this case, it is not possible to record the audio output in advance, instead, it must be rendered simultaneously in a real-time process based on the spatial information of the objects inside the environment.

7 CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 8

When we consider spatialization, certain questions arise: how does the environ­ ment change the sound, and how do we construct such a change in a virtual envi­ ronment. The impulse response technique is a traditional approach to solve such a problem, and it is popular in signal processing. Impulse response sends out an output signal (an impulse), like a bang or a clap, and the resulting signal (the response) is recorded after reacting with the environment. This impulse response can then be convoluted with a direct signal to re-create what that signal would have sounded like

in the environment where the impulse response was recorded [35] [14] [9]. In addition,

it has been largely used in collecting and analyzing audio spatial information and has become a fundamental analog tool to model the reverberation of an interior space.

For example, in order to simulate the reverberation of a particular cathedral, people

fire an impulse, and then record its echoes. Based on the pre-recorded audio sample of the impulse response, reverberations from within this cathedral can be generated ar­

tificially by using convolution and mixing it with new incoming audio signals, without

actually even walking into the cathedral again. This increases productivity, efficiency,

and ultimately reduces costs.

In the digital world, every continuous function has to be divided piecewisely and quantized into a discrete form. Generally speaking, this continuous range of values

(infinite) must be approximately mapped onto a relatively small (finite) set of discrete values. In the audio field, sound is broken up into instantaneous units, named samples. CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 9

Figure 2.1 demonstrates quantization and sampling of an input signal. These units are processed individually and are combined to produce the final result. The essence of convolution is that we apply the scaled impulse response to each sample of the original sound, and consequently, the resonance of that sound in the given space is simulated.

Figure 2.1: Sampled signals

The impulse response technique perfectly captures the spatial characteristics of a stationary physical environment, because a pre-recorded audio sample is equivalent to the perceived sound depending on the sound source and listener positions. For exam­ ple, in a movie theater, the impulse response between the loudspeaker and a particular seat is sufficient to re-construct the spatialized sound around that seat. However, if CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 10 the sound source or the listener is allowed to move, then we need to collect the im­ pulse response signals between every possible location of sound source and listener.

Therefore, in a dynamic system, which is a system involved with moving objects, the impulse response method becomes prohibitively expensive. Furthermore, it is very difficult to generate and record an impulse response in a fictitious environment, the deck of a spaceship, or the surface of an unknown planet for example. Unfortunately, we need a better approach to determine the effects caused by a physical environment near the sound sources and the listeners.

Currently, two techniques that researchers use to solve such a problem are the

Fresnel number and Fresnel clearance techniques, and both of them are based on the use of the famous Fresnel zone theory. We will introduce them in the next section.

2.2 FRESNEL ZONE AND CLEARANCE

In related research fields, obstacles between the sound source and the listener are called acoustic barriers [37]. A typical barrier can be a wall, a large box, or even a human being. If a barrier is interposed between the sound and the listener, the following three possibilities will happen to part of the sound:

1. penetrate through the barrier

2. be reflected back from the barrier CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 11

3. be diffracted around the barrier

Among these behaviours, penetration is usually ignored, because the amount of energy that passes through a barrier is usually small in comparison to the amount of energy that is reflected or diffracted. Since the audiosity scheme in this thesis is a performance-driven simplification, we will not consider the energy penetration at all. Sound reflection causes reverberations, which will be introduced in later chapters.

The most important and special behavior during sound traveling is diffraction.

It is a common experience that even if a sound source is visually covered by obsta­ cles, the sound can still be heard. This is because, during propagation, sound travels around an obstacle lying in the path. Such an occurrence is called diffraction [40] [21].

This kind of wave-like phenomenon exhibits everywhere around us, however, it often happens on an extremely small scale, which makes it barely noticeable. Compared to light, the diffraction of sound is more obvious since it very likely occurs in the scale of meters depending on its frequency. In a practical situation, determining the amount of sound diffracted around a barrier is not a straightforward task. Much research has been done in this area [45] [2] [31] [27], and Presnel zone [11] [47] is one of the most successful candidates for solving the diffract problem.

A Fresnel zone (Figure 2.2) originally used in optics and radio communications, is an ellipsoid-shape volume that represents the space involved in the radiation pattern

[5] [36] [4]. Like audio, radio signals (waves) do not form perfect straight lines during CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 12

Figure 2.2: Fresnel zone

travel from one transmitter to another; instead, diffraction of the radio wave generates a Fresnel zone between them. The cross section of the Fresnel zone, called a Fresnel disc, is circular and centers at the line of sight (LoS). When it is closer to the emitter or the receiver, the Fresnel disc is more concentrated and becomes smaller. In this way, the Fresnel zone forms an ellipsoid that wraps around both transmitters and the line of sight between them. The Fresnel zone is completely symmetric, so it does not matter which end is the emitter and which end is the receiver. CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 13

The radius of the Fresnel disc is called the Presnel zone radius. Since the Presnel zone is an ellipsoid, the Presnel zone radius has different values at varying locations along the line of sight, a straight-line that links both ends of the zone. The general equation for calculating the Fresnel zone radius at any point P in between endpoints is the following:

jnXdid2 f .

th in which, rn = the n Fresnel zone radius in meters

d\ = the distance of P from one end in meters

G?2 = the distance of P from the other end in meters

A = the wavelength of the transmitted signal in meters

The variable n indicates the degree of the Fresnel zone. Theoretically, there are an infinite number of Fresnel zones around a pair of transmitters, and a higher degree

Fresnel zone is always has a bigger radius than a lower degree one.

Please note that we are still analyzing only one sound frequency, and even this single frequency could generate a sequence of different Fresnel zones at individual degree levels. Fresnel zones with higher degree shift the phase of the oscillation of the original signal. Typically, in the nth Fresnel zone, an (n — 1) x 90 to n x 90 degrees out of phase will be created by obstacles. Odd numbered zones are constructive and even numbered zones are destructive. In fact, the energy strength decreases so fast CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 14 that even the second degree Presnel zone can often be ignored. Therefore, in most cases, we are only concerned with the first degree Presnel, so that in this thesis we always let n = 1.

Once the Presnel zone radius is calculated, the Fresnel zone clearance, which is a single value that interprets how much energy is maintained during propagation, can be determined as well. Usually the signal energy sent out from one transmitter is interfered by obstacles near the path. As such, it will not be completely received by the other transmitter.

2.3 TSINGOS' AUDIO ANIMATION RENDER­

ING TECHNIQUE

Since audio signals share many common properties with radio and wireless signals, concepts of Presnel zone and Fresnel zone clearance have been successfully merged into the acoustic research field. One effective application is Tsingos' audio animation rendering technique [47]. In this paper, sound objects are classified into four different categories: sound emitters, sound receivers, sound blockers, and sound reflectors.

Sound emitters and sound receivers are the actual sound sources and receptors.

One emitter and one receiver are treated as a pair, and a Fresnel zone will be con­ structed on the top of each pair. In this way, it will have many Fresnel zones, if there CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 15 are multiple emitters and receivers.

Sound blockers are equivalent to obstacles, and when they are entirely or partially within a Fresnel zone, they interfere with the transmitted sound. Moreover, the related Fresnel zone clearance is reduced due to the "sound shadows" cast by the blocker. Sound reflectors are surfaces that reflect back the original sound. Such behaviours change both the direction and the shape of the Fresnel zone, making the situation more complex. Typical examples of sound reflectors are walls and ceilings.

In addition, sound blockers and sound reflectors do not have to be isolated from each

other; for example, a sound blocker can be surrounded with sound reflectors. Take a building as the example once more: the building itself plays the role as a sound blocker

in the environment, and its outer surfaces should be treated as sound reflectors as

well.

These four categories (sound sources, sound receivers, sound reflectors, and sound

blockers) cover all objects involved in a sound environment. Nevertheless, necessary

information, including position, size, orientation, and so on, can be directly retrieved

from the data structure (i.e. polygons) of the simulated physical environment.

In Tsingos' audio animation rendering technique, as shown in Figure 2.3, for each

pair of sound emitters and receivers, a set of Fresnel zones is generated, one for

each frequency band (see section 4.3.6). After checking every sound block inside the Fresnel zone, its Fresnel zone clearance can be calculated by testing the overall CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 16

invalid image source

source! valid image source

blocker! A image blocker

microphone

Figure 2.3: Tsingos audio animation rendering technique [47]

coverage ratio of the obstacles to the cross-sections of the Fresnel zone. The Fresnel clearance (or Fresnel zone clearance) equals the percentage of the energy that arrives at the receiver, when the emitter is sending out full energy. However, handling sound reflectors becomes problematic and costly. Tsingos' idea is to mirror all the sound blockers, along with the emitter, to the other side of the reflector. In this way, a virtual sound emitter and virtual sound blockers are added. Then, we test the Fresnel zone clearance between the virtual emitter and the original receiver. If the environment is complex, it is necessary to duplicate a large number of sound blockers for each reflector. Moreover, if multiple reflections occur, there will be multiple copies of each CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 17 single sound blocker before the Presnel zone can be constructed. Roughly speaking, if the original complexity of calculating direct sound is 0(n) (linear complexity) and there are m reflectors, then the total complexity of the entire environment will be

0{mn).

2.4 LIMITATIONS OF CURRENT AUDIO REN­

DERING TECHNIQUES

It is a common thing in many current audio-rendering techniques, including Tsingos' technique and other mainstream methods [3] [30], to solve the reflection effect by using ray-tracing. In other words, they consider one sound "ray" at a time, and study its path and behaviour around blockers and reflectors. For people more familiar with computer graphics, ray-tracing may be a best solution for rendering, because it matches the physical movement of photons in the real world. However, when the domain is moved to the audio field, the ray-tracing technique is certainly mature and well-developed, but might not be the best choice to simulate the real world phenomenon.

Theoretically speaking, all waves form Presnel zones, and light shares the same characteristic. However, the wavelength of light varies in the range from 390nm to

780nm. If we plug such a number into the Presnel zone equation (equation 2.1), CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 18 then the Presnel radius (cross-section of the Fresnel zone) will be so small that can be omitted. Let us take yellow, about 500THz (terahertz), as an example, at a kilometer distance, the largest Presnel radius (at the middle point of the line of sight) is 7.071 x 10-4m. This is the reason that researchers always use ray to describe it.

Opposite to light, sound loses its concentration much easier. A typical wavelength of sound is in meter scale, and with such a large wavelength, the radius of a Fresnel zone can easily expand to meters during sound propagation. For example, human voice frequency ranges from approximately 300Hz to 3400Hz. If the speaker and the listener are 10 meters apart, then the maximum Fresnel radius ranges from about

0.502 meters to about 1.693 meters.

Since sound tends to generate a much "fatter" Fresnel zone then light at the same distance, we always use a wave to describe sound instead of a beam or a ray. Therefore, we need to seek a better rendering method to generate the sound environment instead of the well-known ray-tracing technique. Fortunately, radiosity is a better solution to simulate real-world sound propagation.

2.5 RADIOSITY

Radiosity is a new concept compared to other classical computer graphics rendering techniques, such as scan-line rendering and ray-tracing. The original idea of radiosity came from physics, where it is used to simulate and calculate heat exchanges during CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 19 transmissions between surfaces [25] [12] [42]. In the early 80's, radiosity was intro­ duced into computer graphics by researchers at Cornell University [19] and Fukuyama

University [32].

The term radiosity is a little bit ambiguous, and it contains several different layers of definitions and usages. First, a narrow definition of radiosity is "the total radiation energy leaving a surface per unit time and per unit area" [6], which is more often called Energy flux in physics. Second, in computer graphics, radiosity determines the brightness of different parts of a surface by mimicking the energy propagation between them, or generally speaking, the radiant existence [16] [41] [1]. Last but not least, radiosity also indicates the rendering technique (scheme) that uses energy flux to solve a given environment.

Figure 2.4 is a image generated by using the radiosity technique in the famous

"Cornell Box" environment, which is a simple physical environment designed by Cor­ nell University to test and measure the accuracy of any graphics rendering technique.

This image exhibits the advantages of radiosity: soft shadows and smooth colour transitions.

Currently, there are three basic radiosity algorithms: matrix radiosity [8], pro­ gressive radiosity [7], and wavelet radiosity [23] [20]. These three algorithms can be treated as three parallel approaches to a given radiosity problem, though they are all subtly related. Among them, matrix radiosity is the most well known technique, CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 20

Figure 2.4: The Cornel Box [8]

and the audiosity rendering scheme addressed by this thesis is based on the matrix radiosity as well.

Before any radiosity algorithm is applied, every surface in the scene needs to be divided into patches. A higher patch density provides smoother color variation and softer shadows. In addition, because a patch usually contains one single brightness value, alias errors can occur at the edge between two adjacent patches; therefore, higher patch density can reduce aliasing. Another vital note is that the radiosity technique only handles diffuse light, which means non-directional reflections. Strong directional reflected surfaces, such as mirrors, need to be rendered with other methods. CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 21

Because defused light spreads in all directions equally, this feature gives radiosity the capability of generating soft colour transitions. This is precisely why acoustic propagation is handled well by a system based on radiosity. On the other hand, it is difficult for radiosity to render hard sharp edges.

2.5.1 Form Factor

Form factor is the central idea of radiosity. As mentioned before, form factor deter­ mines the maintainability of the energy flux during the propagation from one patch to another. Between two patches, the form factor (Figure 2.5) can be approximately defined as:

„ Ai cos 6i cos 0j FV ~ — II 112 2-2 7r||r||2 in which, r = vector from patch Pi to patch Pj

9i = angle between the normal vector of patch Pj and r

9j = angle between the normal vector of patch Pj and r

Aj = area of patch Pj

For a scene with n patches, there arenxn different form factors, so that calculating form factors can be time-consuming. There are a few ways to accelerate, such as

Nusselt analog [33] [29], hemi-cube [8], and so on. Once the form factors are retrieved, the radiosity matrix can be constructed as well. CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 22

Figure 2.5: Form factor

2.5.2 Radiosity Equation

Consider a rendering environment with n patches, which are small fragments of a surface. Suppose E\ and Ej are two arbitrary patches and F^- (0 < Fij < 1) denotes the form factor, which is the fraction of energy flux emitted by Ei that is received by Ej. A detailed explanation of the form factor will be given in the next section.

Not all energy sent out from E{ can reach Ej, and the form factor F^ is the ratio of the amount of energy arriving at the receiver Ej to the amount of energy flux that CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 23 departs from the emitter E{. The reciprocity relation between Fij and Fji is:

AiFij = AjFji (2.3)

In which A{ and Aj are the areas of patch Ei and Ej. The total amount of outgoing energy from a given patch consists of the energy inherent in the patch and the energy received from all other patches. It is described by the radiosity equation (Figure 2.6) as follows.

Mi^Moi + pJ^MjFii (2-4) 3 in which, Mi = Radiosity of patch Pi (energy that leaves patch Pi)

Md = Emissivity of patch Pi (initial energy of patch Pi)

Pi = Reflectivity of patch Pj

Mj = Radiosity of patch Pj (energy that leaves patch Pj)

F^ = Form Factor of patch Pj relative to patch Pj

Notice that the reflectivity, p^ is one of the initial properties of the surface. It determines how much energy can be reflected by the surface, and it is always true that 0

^2 Bj Fij (energy reaching this surface from other surfaces)

Ei (energy emitted by this surface)

Pi I] Bj F^ (energy reflected by this surface)

Surface

Figure 2.6: Radiosity equation

Rearranging the original radiosity equation gives a new equation:

Moi = Mi-pY,MjFij (2.5) j

This is the equation for a single patch Pj. If we consider all n patches and list them together, then n equations form a linear system such as: CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 25

( Mol\ / 1 - PlFn -PlF12 -p\Fin \ / Mi \

Mo2 -P2F21 1 - P2F22 -p2F2n M2

MoZ = -P3F31 -P3F32 -P3-^3n M3 (2.6)

\ Mon J \ -pnFni -pnFn2 ... 1 - PnFnn ) \ Mn )

Alternatively, in a simplified notation:

M0 = {I- R)M (2.7)

In this equation, M0 contains all emissivities, which are the initial brightness of each patch, while M is the final brightness of each patch. The n x n matrix, I — R, is called the radiosity matrix, which can be pre-calculated. Solving the linear system for the final brightness level creates energy on every patch at the state of equilibrium.

I—R is a very special matrix. As mentioned before, the form factors Fij determines the ratio of the remaining amount of energy to the original amount of energy after travelling from one patch to another one. The total energy received by a patch from

all other patches should be no more than the sum of energy sent out from all other n patches. In other words, Y^Fy < 1, and the equal sign occurs only when there is no attenuation. In addition, it is always true that 0 < p< 1, because there is no material that either completely reflects energy or completely absorbs energy. In this case, any row in matrix I — R has the following characteristic: CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 26

n n

n

=> \l-PiFii\> ^ \piFyl (2.9)

Note: the reason equation 2.9 is valid is that all PiFij are non-negative.

In linear algebra, if in every row of a matrix, the magnitude of the diagonal entry in that row is larger than the sum of the magnitudes of all the other (non- diagonal) entries in that row, then the matrix is called strictly diagonally dominant.

By the Gersgorin circle theorem, a strictly diagonally dominant matrix is non-singular

(invertible). In other words, the linear system has an unique solution. [18] [50] [26]

The matrix I — R matches the requirement of a strictly diagonally dominant matrix, hence the matrix equation 2.6 must have an unique solution. With the unique solution, the state of equilibrium is guaranteed in the given radiosity environment.

Also, it ensures that there exists an inverse of matrix I — R, denoted by (/ — i?)_1, and this inverse matrix plays a extremely important role in the audiosity scheme.

2.6 LIMITATIONS OF RADIOSITY

In Section 2.4 we indicated that sound propagates in wave-like form rather than ray­ like form, which is the reason it should be considered in its entirety. Now the radiosity CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 27 rendering technique handles the entire scene altogether, and this feature makes it a superior candidate for solving the audio animation problem. However, before it can be applied to the audio field, it is necessary to make a few modifications.

First, in the original radiosity method, emitters must be a part of the scene. In other words, an emitter has to be converted into a patch, and then must take part in the matrix solving. In addition, since it is a patch, the emitter must be in a fixed position. In an audio environment, a sound emitter is usually isolated from the

scene, which usually means the emitter is not involved in the Fresnel zone clearance

detection. Also, in many cases a sound emitter is a moving object as well, so it cannot

stay at the same position all the time. Hence, a new concept called first-pass energy

distribution is introduced in this thesis. The first-pass energy distribution first casts

all energy from the emitter to patches in the environment, and then processes the

classical radiosity method. Detailed information of this concept appears in the next

chapter.

Second, in this sound spatialization study, we tend to treat the sound emitter

and the sound receiver as points, while in the radiosity method an emitter has to be a part of a surface. This is because most emitters are small compared to typical blockers and reflectors, such as walls and buildings. Even if an object like a car or a person is to be included, the emitter itself can be considered the part of the obstacle that makes the sound, for example, the mouth of the person or the engine of the car. CHAPTER 2. BACKGROUND OF AUDIO ANIMATION AND RADIOSITY 28

In this way, in the audiosity scheme, sound emitters and receivers can be removed during the matrix solving process, reducing the computational complexity. Certainly, this is only a reasonable approximation and simplification in order to balance both accuracy and performance in sound animation.

Third, in the original radiosity method, every time movement takes place in the scene, the entire form factor matrix must be reconstructed. It cannot be tolerated in audio animation, because actions of the emitter and the receiver occur in every frame. In order to solve such a problem, the radiosity equation requires some changes.

As such, mathematical proofs are provided in the next chapter to show that these modifications are valid.

Last, in the original radiosity method, it is assumed that all energy arrives si­ multaneously. Compared to the speed of light, sound travels so slow that we need consider delays. One feasible approximation is adding an extra axis of time, and then manually projecting the strength of the energy that arrives at each patch at the equilibrium state onto the time axis. This topic will be discussed in Section 4.5. 3 FIRST-PASS ENERGY DISTRIBUTION

3.1 INTRODUCTION

As discussed previously, it is not easy to apply the radiosity rendering technique to the audio environment directly unless a few modifications are made to the radiosity method. One big problem is that a graphics emitter functions differently from an audio emitter. First-pass energy distribution, a novel idea addressed in this thesis, is a fundamental concept that bridges the difficulties during the transition from radiosity to audiosity. In this chapter, differences between graphics emitters and sound emitters will be addressed first, and then first-pass energy distribution will be defined and explained.

29 CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 30

3.2 DIFFERENCE BETWEEN GRAPHICS EMIT­

TERS AND SOUND EMITTERS

In the original radiosity technique, an emitter is a part of the graphics environment. In other words, an emitter is either a patch or a group of nearby patches. Energy involved in a radiosity system must depart from a surface in the scene, and eventally arrive at another surface. Furthermore, current radiosity algorithms only render stationary objects, which means emitters are at fixed positions. Each time something moves in the scene, including almost all form factors and the radiosity matrix, the entire data structure must be destroyed and reconstructed again. Such a process requires relatively high degree of computational complexity, especially if the environment is involved with multiple moving objects and multiple "sound source / receiver" pairs.

This is the reason radiosity is more commonly used in rendering static scenes rather than real-time animations.

In the audio would, it is a very different situation. Although stationary sound emitters exist, i.e. loudspeaker, most real-world sound emitters are moving objects.

Take a street as an example; if we consider that buildings and the road surface are sound blockers and reflectors, vehicles in the street can be treated as sound emitters.

Unlike in computer graphics where a light source can occupy a large area, in computer audio we normally use an arbitrary and abstract point to indicate a sound emitter or CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 31 a sound receiver. The reason we treat sound emitters and receivers as single points is that the actual thing making sound is likely to be a single point, like the mouth of a person or the engine of a car. On the other hand, it also makes the mathematical computation easier and it is a reasonable simplification considering the scale of an actual sound source and receiver (microphone, mouth, ear), compared to the scale of the sound blockers and reflectors (buildings, vehicles, human body) we are dealing with.

A side effect of representing sound emitters and receivers as points is an object can be the combination of multiple sound elements (emitter, receiver, blocker, reflector).

For example, if we treat the mouth of a person as the sound emitter, then the body can play the role of a sound blocker and a sound reflector at the same time, since the body actually blocks and reflects the sound wave sent out from the mouth. Generally speaking, sound emitters, receivers, blockers, and reflectors do not necessary have to be isolated objects. One sound object usually contains multiple functionalities in an environment.

Since a point represents an emitter in this thesis, it does not fit into the original ra- diosity rendering technique. Somehow, the energy (sound) must be transformed from all emitters and projected onto existing patches in the sound environment. Moreover, it is necessary to find a valid way to do the energy transformation and projection, and it is the reason we introduce the idea of first-pass energy distribution. CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 32

3.3 FIRST-PASS ENERGY DISTRIBUTION

In a graphics environment, light energy travels inside the scene, while it is replaced by sound in an audio environment. As we all know, both light and sound are special cases of energy. In physics, the radiosity technique concentrates on energy exchanges in an enclosed system, and disregards the specific form of such energy (light, sound, or another forms). Similarly, first-pass energy distribution is a general and abstract term. Regardless, it is able to handle various energy forms. In graphics, energy distribution is equivalent to brightness decomposition, and it is equivalent to the first-pass energy distribution in an audio environment.

Since visual examples are easier to perceive and comprehend, we use a graphics environment to explain the idea of first-pass energy distribution. Let us look through the entire radiosity process step-by-step in a different direction.

Suppose there is only one patch, placed on the ceiling, sending out energy. In other words, only one patch, say patch Piy is the original emitter with the initial brightness

Ei in this case. We then solve the following matrix equation and the results, Bi_,x, are the brightness of all patches that received energy from the patch Pi. Both the matrix equation (3.1) and a sample result follow (Figure 3.1). CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 33

/ 0 \ ( B^X \

0 an din \ (3.1)

«nl Q"nn / Bi-+2 Ei

V 0 )

^'iiiia^ft»!^S££i.ii-i2iiatia \ Bi^n /

-r^^mF.^m^'^mpi^^^

Figure 3.1: Emitter arrangement 1

Repeat the same action (ignite one emitter patch only). However, this time let the patch Pj, which is different from patch P;, be the only emitter on the ceiling and

sends energy Ej to all other patches, and we get another matrix equation. Although this equation still uses the same radiosity matrix as the previous one and the vector of the initial state (left side of the equation) is similar, the solution is different. This time, it provides the brightness, Sj_,x, of every patch after receiving energy from patch Pj (Figure 3.2). CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 34

/ 0 \ ( BJ->1 \ 0 on din \ (3.2) Ej B J-H O-nl 0inn /

V 0 / \ Bj^n )

-;-fti,^?fe^.'J^v- •••"'-" ».:rL.^-CA:l::-

Figure 3.2: Emitter arrangement 2

In both previous conditions, only one emitter in the scene has a non-zero value.

Next, if we have both patch Pi and patch Pj on the ceiling sending out energy Ei and

Ej, and keeping other patches dark, and then we have got a different solution. This solution is the brightness, Bitj^x, of each patch enlightened by both the patch Pi and patch Pj (Figure 3.3). CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 35

/ 0 \ / Bij^i \

B.»,J— 2 Ei / an dm \ (3.3) 'i,j—>i Q>nn /

\ r>ij—>n / \ 0 /

; a -103.

Figure 3.3: Emitter arrangement 3

If we keep repeating the same action, that is, if we keep adding different patches as emitters, eventually the matrix equation will be changed back to the original radiosity equation. In this equation, the left hand side of the equal sign indicates all emitting patch values, and the right hand side is the product of the radiosity matrix and the final brightness of each patch at the state of equilibrium (Figure 3.4). CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 36

( B \ an ... ain \ 1 E2 (3.4)

anl t^nn / \En) \Bn)

^.j^ftw*. ! 9 * ifO. ft ^ !«.

Figure 3.4: Emitter arrangement 4

We can prove the following assumption by using these distributed matrix equa­ tions: the result of solving the original radiosity equation is equivalent to the sum­ mation of results after solving the matrix equations of different emitter groups. More precisely, if we divide all emitters into groups and solve matrix equations for each of them, we have multiple sets of brightness for every patch in the scene. Among them, each set of brightness is the solution of a matrix equation related to the corresponding group of emitters. If we add all sets of brightness together, then the total summation is exactly the same as the result of the classical radiosity technique. This can be CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 37

proven by the following matrix equations:

( E, \ an a\n \ B2 Eo

a>ni • Q"nn / \Bn ) \En) ( 0 \

Oil • • • Cbln

i=l a n\ an V 0 I ( 0 \ -l a au \n = £ Ei i=i O'nl V 0 J

Bi-+2

(3.5) = E Bi^i i=l

\ Bi->n )

Note that a completely dark patch can be considered as an emitter that sends out

zero energy. In other words, every patch can be treated as an emitter, no matter if

it sends out energy or not. Combining this revelation, with the previous assumption,

we conclude that different methods of grouping patches do not change the final result. CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 38

Based on all previous assumptions, we can provide a full definition of first-pass energy distribution. Our claim is that as long as the arrangement of the initial energy sources does not change, we can re-organize energy on each patch and the final solution remains the same. Therefore, the first-pass energy distribution is the most important

concept in our audiosity scheme, because without it, emitters are not able to cast their energy to patches and be removed during the matrix solving process.

3.4 USAGE OF FIRST-PASS ENERGY DISTRI­

BUTION

In the last section, we showed that different arrangements of energy in an environment

lead to the same result, as long as the energy distribution remains the same. By

using this property, it is possible to cast all energy from the emitter to all patches in

the environment which are visible to the emitter without causing any change to the

results. In this way, energy sent out from emitters, which are not represented by any

patch to begin with, can be transferred on to patches in the scene. Next, we can use

the original radiosity techniques to solve the environment. Similarly, sound receivers

can be handled by collecting all visible energy from patches after the radiosity system

is solved.

Hence, this is the basic skeleton of the audiosity (audio + radiosity) scheme in CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 39 sequence:

1. Remove (ignore) sound receivers temporarily, because we need them back later.

2. Broadcast all energy from every sound emitter to all the patches in the sound

environment. If a patch is not visible to the emitter, then simply assign a zero

value to it.

3. Remove (ignore) all sound emitters so that there are only patches in the scene.

4. Process the radiosity technique to generate the final energy stored in each patch

at the state of equilibrium.

5. Place all sound receivers back into the scene.

6. Collect energy for each sound receiver from patches in the sound environment.

Here, removing and re-introducing sound sources and sound receivers are dispens­ able. Both sound sources and receivers can be ignored in the following of the steps, since they are not strictly required in the radiosity calculation. The reason we use the word "remove" is that in the traditional radiosity technique, patch is the only component in the data structure of the environment. In order to keep the radiosity rendering process clear, we tend to remove sound sources and receivers temporarily.

Notice that the result (sound) is the distribution of energy participates in reflec­ tions. In other words, the audiosity scheme returns reflected sound at the state of CHAPTER 3. FIRST-PASS ENERGY DISTRIBUTION 40

equilibrium. The direct sound, the sound transmitted from the emitter to the receiver

directly, must be generated separately.

Certainly, energy that departs from the sound source will not distribute evenly to

all patches. The total amount of energy received by a particular patch is controlled

by various parameters. For example, if a patch is not visible to the sound source, then there is no energy transferred to it. Energy distribution is affected by the distance

between the sound source and the patch, the normal direction of the patch, and the

size of the patch as well. In order to solve this problem, we will introduce a new term,

named solid angle in Section 4.3.

In the next chapter, we will expand the skeleton of the audiosity scheme into a

completed sound environment rendering system. Each step in the process will be

discussed in detail, and experiments will be provided to prove the feasibility and the

reliability of the audiosity scheme. 4 AUDIOSITY

4.1 INTRODUCTION

The original radiosity technique analyzes the total light leaving a certain patch, while

being captured by another patch. Similar to the graphics radiosity introduced in

Section 2.5, a narrow definition of audiosity is the total sound leaving a point on a

surface, per unit area on the surface. However, audiosity, in this thesis, has a more

general meaning: an audio animation environment rendering method that determines

the sound at a given location by studying the energy exchange between different

surfaces. The audiosity scheme can be treated as an audio version of the radiosity

technique with a certain degree of modification. In this chapter, we will walk through

the entire audiosity scheme in detail, including the structure and the philosophy of

the scheme as well as mathematics behind it.

To make the explanation simple and clear, time delay will not be covered in the

introduction; nevertheless, we will study the possibility of introducing time delay into

41 CHAPTER 4. AUDIOSITY 42 the scheme later (in Section 4.5). The reason why time delay is ignored here is due to the characteristic of the radiosity technique: only diffused reflections are considered.

Since our audiosity scheme is a superstructure of the original radiosity technique, in this thesis we also concentrate on surfaces that reflect sound diffusely.

Consider a spectrum of the level of reflectivity, which varies from non-reflective to pure directional reflection. A surface covered in anechoic wedge is an extreme case, because it absorbs all sound. A large smooth surface is at the other end of the

spectrum, because it creates strong directional reflections. Reflectors with any other texture stays in the middle between these two extreme cases in the spectrum, and are treated as diffused surfaces. In this way, reflections at a surface will be evenly spread into all directions, without distinguishable echoes. In our audiosity scheme, which is

similar to the radiosity technique introduced earlier, we also use the parameter p to

indicate the reflectivity of a surface texture. For example, the p value of an anechoic wedge is near zero, because it tends to not send out energy after receiving incoming

energy from other patches.

4.2 DIRECT SOUND VS. REFLECTED SOUND

As mentioned before, a sound environment contains both direct and reflected sound,

as shown in Figure 4.1. Direct sound is the audio signal transmitted directly from the sound emitter without any reflections that is received by the receiver. In this CHAPTER 4. AUDIOSITY 43 case, sound reflectors are ignored and only sound blockers are tested. Conversely, reflected sound propagates only through reflections. When the traveling distance is long enough, time delay is no longer negligible and echoes start to appear.

Reflected Sound (reflection)

Sound emitter Wet sound (reflection)

Direct Sound (direct propagation)

Sound receptor

Figure 4.1: Direct sound vs. reflected sound

The final sound received by the receiver is the summation of the final direct sound and the final reflected sound. Both direct and reflected sound attenuate during prop­ agation. Therefore, only a portion of the total energy is able to arrive at the receiver.

Our goal is to determine how much sound (energy), direct and reflected, will successfully arrive at the receiver.

$$E_r = C_d E_e + C_w E_e \qquad (4.1)$$

In the above formula, $E_e$ is the original sound energy and $E_r$ is the amount of energy received by the receiver. $C_d$ is a coefficient indicating the ratio of the final direct sound pressure to the initial sound pressure. Similarly, $C_w$ is the coefficient for the reflected sound. Notice that $0 \le C_d \le 1$ and $0 \le C_w \le 1$.

Since reflected sound usually travels a much longer distance than direct sound, $C_d$ is often much larger than $C_w$, so direct sound usually dominates. In other words, compared to the volume of the direct sound, reflected sound is usually only a relatively small perturbation. Under some extreme conditions, however, $C_w$ can be significant as well. Consider the situation shown in Figure 4.2.

In this example, the direct sound can be blocked entirely, whereas a small amount

of reflected sound can still be received. This is because some energy passes around

the obstacle and reaches the receiver by bouncing between the wall and the edge of the obstacle multiple times.

The direct sound coefficient can be calculated by constructing a Fresnel zone between the sound emitter and the receiver, and then retrieving its Fresnel zone clearance. The reflected sound coefficient, however, is more expensive to obtain. Theoretically, we could simply construct a Fresnel zone for every possible reflection

Figure 4.2: A determinant reflected sound situation (the direct sound is completely blocked, $C_d = 0$, while reflected sound bounced between the walls still reaches the receptor, $C_w \neq 0$)

for each sound environment; however, it is not feasible to do so in huge multimedia applications like games due to the restrictions of computational resources. In this thesis, the audiosity method is suggested as a rapid method to calculate the reflected sound coefficient in an audio animation environment.

4.3 AUDIOSITY SCHEME

According to first-pass energy distribution, it is valid to transfer the energy generated by an emitter onto all patches. The question here is how to determine the amount of energy each specific patch receives. If the emitter were a patch, we could simply use the form factor formula to calculate the energy ratio. However, sound emitters in an audio environment are represented by arbitrary points, so they are not as directional as patches. In other words, most sound emitters send energy out almost equally in all directions, although in some special cases the emitter is partially directional (e.g., a loudspeaker). Take a vehicle as an example: the noise of the engine can be heard from all around it. In this case, only a portion of the original sound energy is transmitted to a particular patch. For that reason, we must calculate the size of this portion.

4.3.1 Solid angle

In mathematics, there is a concept called the solid angle, which is the three-dimensional extension of the angle between two lines. "The solid angle subtended by an object from a point P is the area of the projection of the object onto the unit sphere centered at P" [34] [41]. In this thesis, we introduce a very similar concept and still call it the solid angle.

For each emitter in the environment, we define a new property of the relationship between the emitter and each patch visible to the emitter, called the solid angle and denoted by $\omega_i$. Let $O$ be the unit sphere (sphere of radius one) centered at the location of the emitter. Assume $A_i$ is the projected area of patch $P_i$ onto $O$, and $A_O$ is the whole area of the unit sphere. The solid angle of the emitter to a patch $P_i$, denoted by $\omega_i$ and shown in Figure 4.3, is defined as the ratio of the projected area to the whole area of the unit sphere. That is:

$$\omega_i = \frac{A_i}{A_O} \qquad (4.2)$$

Figure 4.3: Solid angle [34]

If the environment is represented in two dimensions, then the solid angle is simply the value of the arc angle of the projection over $2\pi$.

By using solid angles, we guarantee that the initial energy is distributed evenly in all directions with no energy overflow; in mathematical notation, $\sum_i \omega_i \le 1$. If the environment is not completely enclosed, for example with doors or windows, some energy is lost and so $\sum_i \omega_i < 1$.

With the aid of solid angles, we can now calculate the energy that departs from the given emitter and is received by an arbitrary patch in the scene, say patch $P_i$. Let $\vec{r}$ be the vector that starts from the center of the patch and ends at the location of the emitter. Then:

$$E_i = \cos\theta_i \, \omega_i \, F_a(d) \qquad (4.3)$$

In this formula, $\theta_i$ is the angle between the normal vector of patch $P_i$ and $\vec{r}$, and $\omega_i$ is the solid angle of the emitter to patch $P_i$. $F_a(d)$ is the sound attenuation function controlled by the parameter $d$, the distance between the emitter and the patch, with $0 \le F_a(d) \le 1$. The return value, $E_i$, is the total amount of energy that arrives at patch $P_i$, and is the initial energy of this patch during the radiosity rendering process.
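To make this step concrete, the following sketch evaluates Equation 4.3 for every patch. It is a minimal illustration, not code from the testing program; the `Patch` class, the choice of attenuation function, and all field names are assumptions introduced here for the example.

```java
import java.util.List;

/** Minimal patch representation (hypothetical; fields assumed for illustration). */
class Patch {
    double[] center;       // patch center position
    double[] normal;       // unit normal vector
    double area;           // patch area (used later for form factors)
    boolean isReflector;   // whether the patch lies on a sound reflector
    double solidAngle;     // omega_i, precomputed projection onto the unit sphere
    double energy;         // E_i, filled in by the first-pass distribution
}

class FirstPassDistribution {
    /** Distribute the emitter's energy onto all patches using Equation 4.3. */
    static void distribute(double[] emitter, List<Patch> patches) {
        for (Patch p : patches) {
            // r: vector from the patch center to the emitter
            double[] r = sub(emitter, p.center);
            double d = norm(r);
            // cos(theta_i): angle between the patch normal and r
            double cosTheta = Math.max(0.0, dot(p.normal, r) / d);
            // E_i = cos(theta_i) * omega_i * Fa(d)
            p.energy = cosTheta * p.solidAngle * attenuation(d);
        }
    }

    /** Fa(d): assumed attenuation model; any monotone function in [0, 1] works here. */
    static double attenuation(double d) { return 1.0 / (1.0 + d * d); }

    static double[] sub(double[] a, double[] b) {
        return new double[]{a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }
    static double dot(double[] a, double[] b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
    static double norm(double[] a) { return Math.sqrt(dot(a, a)); }
}
```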

4.3.2 Rendering with the inverse matrix

Next, we remove the sound emitter and sound receiver so that only patches remain within the environment. Remember, energy that arrives at a particular patch has been embedded into that patch. This situation is similar to the initial state of the graphics radiosity technique, except that we store an amount of sound energy in each patch instead of a brightness. The enclosed environment is now a perfect radiosity system, meaning it is possible to solve the scene with the classical radiosity algorithm for the final sound field at the state of equilibrium. However, the original graphics radiosity requires solving the radiosity matrix and re-rendering the scene each time the patches receive new values. This is very time consuming, and the frame rate of the animation would drop to an unacceptable level (intervals between frames become detectable) if applied in real time. Therefore, it is necessary to reformulate the original method to reduce the computational complexity of the real-time process.

$$M_0 = (I - R)M \qquad (4.4)$$

In the radiosity equation, $M_0$ is an $n \times 1$ vector that contains the initial value of every patch for a scene with $n$ patches. $I$ is an $n \times n$ identity matrix, while $R$ is the radiosity matrix generated from the form factors between different patches. $M$ is another $n \times 1$ vector, which represents the final values of the patches in the scene. This is a typical $y = Ax$ matrix equation, and its solution is the final amount of energy in each patch at the state of equilibrium. Matrix solving techniques have been developed for hundreds of years, and there are countless methods, such as Gaussian elimination and LU decomposition. However, with a computational complexity of $O(n^3)$ (cubic complexity), none of these methods is satisfactory for real-time performance.

A useful characteristic of the radiosity equation is that only the values in the vector $M_0$ vary between frames. More precisely, as long as the physical relationship between a pair of patches does not change, their form factor remains the same. Moreover, the reflectivity $\rho$ is a static property of a patch and a constant value, which means that the entire radiosity matrix does not change at all. Therefore, we can take the inverse of the matrix and bring it to the other side of the equal sign.

The new equation looks like this:

$$M = (I - R)^{-1} M_0 \qquad (4.5)$$

Even though matrix inversion is very time consuming, remember that the inverse matrix, $(I - R)^{-1}$, can be pre-constructed and stored (on the hard drive or in memory) because the reflectors and blockers are assumed to be stationary from frame to frame. During real-time animation, only a multiplication between an $n \times n$ matrix and an $n \times 1$ vector is required every time the position of a sound emitter changes. The complexity of such a multiplication is $O(n^2)$, which is much faster than the original matrix solving ($O(n^3)$).

Overall, each time the location of the emitter changes, a new emitter vector $M_0$ is generated based on the relationship between the emitter and every patch. We then multiply the pre-computed matrix $(I - R)^{-1}$ with the new $M_0$, which yields the final energy stored in each patch at the state of equilibrium.
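The per-frame work therefore reduces to one matrix–vector product. Below is a minimal sketch of this idea under the assumptions of this section; the dense Gauss-Jordan inversion and all class names are illustrative, not the thesis implementation.

```java
/** Pre-computes (I - R)^-1 once, then renders each frame with one O(n^2) multiply. */
class AudiositySolver {
    private final double[][] inverse;   // (I - R)^-1, valid while blockers/reflectors are static
    private final int n;

    AudiositySolver(double[][] R) {
        n = R.length;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i][j] = (i == j ? 1.0 : 0.0) - R[i][j];   // I - R
        inverse = invert(a);                                 // O(n^3), done once, off-line
    }

    /** Per-frame solve: M = (I - R)^-1 * M0, only O(n^2). */
    double[] solve(double[] m0) {
        double[] m = new double[n];
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++) s += inverse[i][j] * m0[j];
            m[i] = s;
        }
        return m;
    }

    /** Gauss-Jordan inversion with partial pivoting; runs once in pre-processing. */
    private static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] aug = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, aug[i], 0, n);
            aug[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            double[] tmp = aug[col]; aug[col] = aug[pivot]; aug[pivot] = tmp;
            double p = aug[col][col];
            for (int c = 0; c < 2 * n; c++) aug[col][c] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int c = 0; c < 2 * n; c++) aug[r][c] -= f * aug[col][c];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(aug[i], n, inv[i], 0, n);
        return inv;
    }
}
```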

Here, the assumption is that the positions of sound blockers and sound reflectors in the environment do not change, and that only sound emitters and sound receivers can move. The reason is, as mentioned before, that in real-world cases emitters and receivers are often more active than blockers and reflectors. Moreover, this matrix solving technique can only be applied to a frame with a stationary arrangement of sound blockers and reflectors, which means that every time such an object relocates in the scene, the audiosity scheme must be run again to recalculate the energy intensity on the patches for that particular frame. With the filter grid method, which will be introduced in Section 4.4, the perceived sound propagation between every pair of possible emitter and receiver locations can be pre-generated before the real-time process. The only drawback is that the pre-rendering needs to be run for each different physical arrangement of the environment. If the movements of sound blockers and reflectors are predictable, the audiosity scheme is feasible for a limited number of spatial arrangements. For environments containing many objects whose positions vary randomly from moment to moment, it might be beneficial to switch to a simpler audio spatialization technique.

4.3.3 Audio form factor

As mentioned in the previous chapter, the form factor calculation plays an important role in the radiosity technique. In the audiosity scheme, we also need to seek the audio form factor between different patches in a given sound environment.

The form factor of a patch determines the total amount of energy it reflects when receiving energy from another patch. If a surface is not a sound reflector, the form factors of patches on this surface are zero. This further reduces the total duration of the radiosity matrix construction, because we only need to calculate form factors between patches on sound reflectors. The audio form factor $F_{ij}$, which is the proportion of the total energy leaving patch $P_i$ that is received by patch $P_j$, can be generated from the following formula:

$$F_{ij} = \begin{cases} \dfrac{A_j \cos\theta_i \cos\theta_j}{\pi} \, F_a(d) \, C_{ij} & \text{if patch } P_i \text{ belongs to a sound reflector} \\[2mm] 0 & \text{otherwise} \end{cases} \qquad (4.6)$$

in which:

$A_j$ = area of patch $P_j$

$\theta_i$ = angle between the normal vector of patch $P_i$ and $\vec{r}$

$\theta_j$ = angle between the normal vector of patch $P_j$ and $\vec{r}$

$F_a(d)$ = sound attenuation function controlled by the distance, $d$, between the patches

$C_{ij}$ = Fresnel zone clearance between the patches

Note: in this formula, $\vec{r}$ is the vector that links patch $P_i$ and patch $P_j$.

The audio form factor formula is quite similar to its original version, the graphics form factor, and it determines the energy sent out from a sound reflector. The Fresnel zone clearance, $C_{ij}$, captures the sound obstruction between patch $P_i$ and patch $P_j$. It is computed in the same way as for direct sound: all sound reflectors are temporarily ignored, and all sound blockers are checked to see whether they obstruct the Fresnel zone.

The physical meaning of the audio form factor is that sound energy is decreased in three individual ways during propagation: the physical layout of the patches ($A_j$, $\cos\theta_i$, $\cos\theta_j$), sound attenuation ($F_a(d)$), and Fresnel zone obstruction ($C_{ij}$). These can be calculated separately, and their product is the proportion of the energy that arrives at patch $P_j$.

After computing the form factors between each pair of patches, the inverse radiosity matrix, $(I - R)^{-1}$, can be obtained immediately. Since all values in the matrix are independent of sound emitters and receivers, the matrix can be pre-rendered before the real-time animation, as long as the locations of sound blockers and reflectors are static.
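As an illustration of Equation 4.6, the sketch below assembles the radiosity matrix $R$ from pairwise audio form factors. The `Patch` fields, the `fresnelClearance` stub, and the attenuation model are assumptions made for the example, not the thesis code.

```java
/** Builds the audio radiosity matrix R from Equation 4.6 (illustrative sketch). */
class FormFactors {
    static double[][] buildMatrix(Patch[] patches) {
        int n = patches.length;
        double[][] R = new double[n][n];
        for (int i = 0; i < n; i++) {
            if (!patches[i].isReflector) continue;       // non-reflectors contribute zero rows
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                R[i][j] = formFactor(patches[i], patches[j]);
            }
        }
        return R;
    }

    /** F_ij = (A_j * cos(theta_i) * cos(theta_j) / pi) * Fa(d) * C_ij */
    static double formFactor(Patch pi, Patch pj) {
        double[] r = FirstPassDistribution.sub(pj.center, pi.center); // vector linking the patches
        double d = FirstPassDistribution.norm(r);
        double cosI = Math.max(0.0, FirstPassDistribution.dot(pi.normal, r) / d);
        double cosJ = Math.max(0.0, -FirstPassDistribution.dot(pj.normal, r) / d);
        double layout = pj.area * cosI * cosJ / Math.PI;
        return layout * FirstPassDistribution.attenuation(d) * fresnelClearance(pi, pj);
    }

    /** Fresnel zone clearance between two patches; a stub standing in for the real test. */
    static double fresnelClearance(Patch a, Patch b) { return 1.0; }
}
```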

4.3.4 Rendering and output

The inverse radiosity matrix is prepared before the animation. Then, at every frame, the total energy of each sound emitter is distributed to the patches by first-pass energy distribution, and the product of the inverse radiosity matrix and the emitter vector gives the final sound energy in each patch at the state of equilibrium. The final step is to collect the energy from the patches and feed it into the sound receiver.

Certainly, during collection, energy attenuates before it reaches the sound receiver as well. This process is like a reversed version of the energy broadcast from the sound emitter.

$$C_w = \sum_j M_j \cos\theta_j \, F_a(d) \qquad (4.7)$$

In this formula, $M_j$ is the final amount of sound energy stored in patch $P_j$ at the current frame, calculated by the matrix multiplication. $\theta_j$ is the angle between the normal vector of patch $P_j$ and the vector from the sound receiver to the patch. $F_a(d)$, again, is the sound attenuation function. The return value, $C_w$, is the ratio of the amount of energy that arrives at the sound receiver to the original amount of energy cast by the sound emitter. Generally speaking, $C_w$ determines the proportion of the sound that survives propagation, so that

$0 \le C_w \le 1$. In other words, if the original energy $E$ departs from the sound emitter, then $C_w E$ will arrive at the sound receiver.

Remember that $C_w E$ is only the amount of energy transmitted as reflected sound. The summation of the direct sound and the reflected sound, $C_d E + C_w E$, is the sound heard by the sound receiver.

Note that, in our scheme, both the direct sound coefficient and the reflected sound coefficient are linear ratios; that is, the received energy is a first-degree polynomial function of the original sound pressure. It is also suitable to use a logarithmic function to represent the relationship between the original and final sound energy strength. For example, the decibel (dB) is a well-known logarithmic unit for measuring the magnitude of sound pressure during propagation.
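A sketch of the collection step follows; it simply evaluates Equation 4.7 over the solved patch energies. As before, the field names and the attenuation model are assumptions made for the example.

```java
/** Collects the equilibrium patch energies into the reflected sound coefficient C_w. */
class Collector {
    /** C_w = sum_j M_j * cos(theta_j) * Fa(d), Equation 4.7 (illustrative). */
    static double reflectedClearance(double[] receiver, Patch[] patches, double[] m) {
        double cw = 0.0;
        for (int j = 0; j < patches.length; j++) {
            // vector from the receiver to patch j
            double[] r = FirstPassDistribution.sub(patches[j].center, receiver);
            double d = FirstPassDistribution.norm(r);
            double cosTheta = Math.max(0.0,
                    -FirstPassDistribution.dot(patches[j].normal, r) / d);
            cw += m[j] * cosTheta * FirstPassDistribution.attenuation(d);
        }
        return cw;
    }
}
```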

4.3.5 Sound attenuation function

The sound attenuation function, $F_a(d)$, appears throughout this thesis. Typical sound pressure attenuation along a path is described by the following formula:

$$A_d = 20 \log_{10}(d) + \alpha_1 d \qquad (4.8)$$

Here, $d$ is the distance between the sound source and the receiver, while $\alpha_1$ represents the medium scattering coefficient per meter.

As mentioned in the last section, any reasonable linear or logarithmic function qualifies as the sound attenuation function if accuracy is not the first priority.

Note that the output of the entire audiosity scheme is a combination of the direct and reflected sound attenuation ratios, $C_d$ and $C_w$, which can easily be translated into decibels.
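For illustration, one possible attenuation function consistent with Equation 4.8 is sketched below; the conversion back to a linear ratio in $[0, 1]$ and the chosen $\alpha_1$ value are assumptions.

```java
/** One possible Fa(d): dB attenuation from Equation 4.8 mapped back to a linear ratio. */
class Attenuation {
    static final double ALPHA1 = 0.01;   // assumed medium scattering coefficient, dB per meter

    static double fa(double d) {
        if (d < 1.0) return 1.0;                          // clamp inside the reference distance
        double dB = 20.0 * Math.log10(d) + ALPHA1 * d;    // A_d = 20 log10(d) + alpha1 * d
        return Math.pow(10.0, -dB / 20.0);                // linear pressure ratio in (0, 1]
    }
}
```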

4.3.6 Rendering algorithm

Since we have walked through all the major phases of the audiosity scheme, in this section we detail its algorithm, in particular how to combine the direct sound with the reflected sound rendered by audiosity at each frame during the audio animation.

The architecture of the audiosity scheme is presented in Figure 4.4.

Assume that in a sound environment emitters and receivers are moving, and that blockers and reflectors are stationary. This is a reasonable simplification for most real-world situations, because most sound blockers and reflectors, such as walls, ceilings, buildings, and roads, are stationary. The catch of such a simplification is that when a sound blocker or reflector is a moving object (like a car), we have to re-run the scheme for each of the object's locations. First, we divide all surfaces in the environment into relatively small patches. In the original radiosity technique, a very high patch density is required in order to reduce aliasing in areas that contain colour transitions. However, since the human hearing system is less precise than vision (for example, it is easy for human eyes to discriminate the colour

Figure 4.4: Audiosity rendering architecture

difference between adjacent pixels, but harder to separate sound sources next to each other), audio patches do not need to be as small as patches for graphics radiosity. A typical wavelength of light is several hundred nanometers (approximately 390 nm to 780 nm), and in graphics radiosity we usually construct patches on the scale of centimeters. In comparison, at standard temperature and pressure, where the speed of sound is around 330 m/s, the wavelength of the average frequency of human speech (around 300 Hz) is about 1.1 meters. The wavelength of audible sound ranges from 5.5 meters to 0.02 meters, depending on the person [14]. Therefore, it is reasonable to have sound patches on the scale of meters, which is why we use 1-meter by 1-meter sound patches for the experiments in the next chapter. Before introducing any emitters and receivers, the inverse radiosity matrix can be constructed ahead of time, based on the form factors between patches and the reflectivity of each patch.

For each pair of sound emitters and sound receivers, at a particular frequency, we compute the direct sound coefficient, $C_d$, and the reflected sound coefficient, $C_w$, separately. We can use the Fresnel zone clearance to generate the direct sound coefficient by testing every sound blocker in the scene. For the reflected sound coefficient, the audiosity scheme is applied: at each audio animation frame, we cast all energy from the emitter onto the patches, and then multiply the inverse radiosity matrix with the initial patch values. Collecting the final values from all patches (Equation 4.7) then retrieves the reflected sound coefficient. Finally, the summation of the direct sound coefficient and the reflected sound coefficient, named the overall clearance, determines the proportion of the original audio that can be heard by the listener. A sketch of this per-frame loop follows.
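The sketch below strings the earlier pieces together for one frame; the `directClearance` Fresnel test is a stub, and all names are illustrative rather than the thesis implementation.

```java
/** One audio animation frame: overall clearance = C_d + C_w (illustrative sketch). */
class FrameRenderer {
    static double overallClearance(double[] emitter, double[] receiver,
                                   Patch[] patches, AudiositySolver solver) {
        // Direct sound: Fresnel zone clearance against all blockers (stubbed here).
        double cd = directClearance(emitter, receiver);

        // Reflected sound: first-pass distribution, equilibrium solve, then collection.
        FirstPassDistribution.distribute(emitter, java.util.Arrays.asList(patches));
        double[] m0 = new double[patches.length];
        for (int i = 0; i < patches.length; i++) m0[i] = patches[i].energy;
        double[] m = solver.solve(m0);                        // M = (I - R)^-1 * M0
        double cw = Collector.reflectedClearance(receiver, patches, m);

        return cd + cw;                                       // overall clearance
    }

    /** Stub for the Fresnel zone clearance test between emitter and receiver. */
    static double directClearance(double[] emitter, double[] receiver) { return 1.0; }
}
```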

Figure 4.5: Equalization filter (a Fresnel zone is built for each frequency band between the sound source and sound receiver; the resulting filter is applied to the original sound data in a mixer before playback)

This is how the overall clearance is calculated for a single frequency. For different sound frequencies, the overall clearance values usually differ. If we apply the audiosity scheme to a sequence of individual frequency bands (or frequency ranges), it generates an equalization (EQ) filter, demonstrated in Figure 4.5. This is similar to the functionality of the graphic equalizer on most high-end audio equipment, or in any modern digital audio player. More precisely, an equalizer allows people to manipulate the amplitude (loudness) of a selected range of frequencies in order to alter the acoustic characteristics of the sound. Figure 4.6 shows a typical software equalizer: if we increase the amplitude of the high frequency bands and significantly raise the low frequencies, the result provides the "best sound" for a Rhythm and Blues (R&B) tune. We can construct an audiosity model for every frequency band in order to approximately mimic the subtle clearance differences at distinct frequency levels. A sketch of building such a per-band filter follows.
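A minimal sketch of constructing the equalization filter, assuming one pre-built solver per frequency band (the one-solver-per-band arrangement is an assumption made for the example):

```java
/** Builds an equalization filter: one overall clearance value per frequency band. */
class EqFilterBuilder {
    /**
     * solversPerBand[b] is an AudiositySolver whose radiosity matrix was built
     * with the Fresnel zone clearances of band b (an assumed arrangement).
     */
    static double[] build(double[] emitter, double[] receiver,
                          Patch[] patches, AudiositySolver[] solversPerBand) {
        double[] filter = new double[solversPerBand.length];
        for (int b = 0; b < solversPerBand.length; b++) {
            // One overall clearance (C_d + C_w) per band forms the EQ filter gain.
            filter[b] = FrameRenderer.overallClearance(emitter, receiver,
                                                       patches, solversPerBand[b]);
        }
        return filter;
    }
}
```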

Certainly, other sound filters can be applied to the emitter and the receiver as well. For example, in order to recreate the spatial perception of the human hearing system, Head Related Transfer Functions (HRTFs) may be applied before the sound is received [46] [51] [3] [10].

Figure 4.6: Equalizer in iTunes

4.4 FILTER GRID

Since we assume sound blockers and sound reflectors are stationary in an audio environment, it is possible to pre-render more information to accelerate the real-time process further. More specifically, if the equalization filters for all possible locations of sound emitter and receiver pairs can be pre-calculated and stored in a table, then during the animation we are able to retrieve a filter instantly by searching the table.

It is not possible to generate a filter for every possible location. A well-used method to fix this problem is to store information only at selected locations, and to reconstruct the information at the remaining locations from the existing data. For example, in artificial intelligence (AI), a vector field often contains only a limited number of vectors that are spread evenly across the entire field. In order to determine the moving direction of an object in real-time, the system combines adjacent vectors using interpolation formulas and returns the vector at any arbitrary location. As a result, a small number of vectors is adequate to reconstruct the entire vector field without significant loss of information.

The same idea can be applied to our audio case. A virtual grid matrix (Figure 4.7) is added to the environment, and each grid point is a possible location of a sound source or a sound receiver. Assume that an $N \times N$ grid is constructed and that each entry in this grid can be either the sound source or the sound receiver; then the system contains $\frac{N^2(N^2-1)}{2}$ filters. The reason for the division by two is that, for a pair of grid points ($A$ and $B$), either of them can be the location of the sound source or the receiver; therefore, one filter is enough for both cases ($A$ = sound source, $B$ = sound receiver, or $A$ = sound receiver, $B$ = sound source). A worked example of this count is given below.
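For instance, under the pair-count formula above, a $7 \times 7$ grid (49 grid points, as used in the experiments of Chapter 5) yields

$$\frac{N^2(N^2 - 1)}{2} = \frac{49 \times 48}{2} = 1176 \text{ filters}.$$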

However, it is a different story when we read a filter from the filter grid in real-time.

Since we only have filters between the pre-selected grid points, we must use some form of interpolation. We choose linear interpolation over other methods, such as cosine or cubic interpolation, because linear interpolation is the cheapest and its accuracy is sufficient as long as the grid spacing is small enough. Here is a detailed example (see Figure 4.8) that demonstrates how to obtain the filter between the sound source, located at the point

Figure 4.7: A sound filter grid (filters connect pairs of grid points)

S, and the sound receiver, located at the point R.

Suppose $S_1$, $S_2$, $S_3$ and $S_4$ are the adjacent grid points of $S$, and $R_1$, $R_2$, $R_3$ and $R_4$ are the adjacent grid points of $R$. Among these points, there are 16 related filters:

$(S_1, R_1)$, $(S_1, R_2)$, $(S_1, R_3)$, $(S_1, R_4)$

$(S_2, R_1)$, $(S_2, R_2)$, $(S_2, R_3)$, $(S_2, R_4)$

$(S_3, R_1)$, $(S_3, R_2)$, $(S_3, R_3)$, $(S_3, R_4)$

$(S_4, R_1)$, $(S_4, R_2)$, $(S_4, R_3)$, $(S_4, R_4)$

Figure 4.8: Retrieving data from the filter grid

Based on the relationships between point $R$ and its neighbours $R_1$, $R_2$, $R_3$, $R_4$, running two-dimensional linear interpolation gives a group of filters between $S$'s neighbours and $R$:

$(S_1, R)$, $(S_2, R)$, $(S_3, R)$, $(S_4, R)$

Then we run the two-dimensional linear interpolation one more time, yielding the final filter between $S$ and $R$.

In this way, the entire system needs to apply the two-dimensional linear interpolation algorithm five times in order to get the filter $(S, R)$. Compared with running the original audiosity scheme in real-time, this is faster because only basic arithmetic is necessary. Another advantage of this method is that no matter how complex the environment, and no matter how high the density of the grid (the total number of grid points), the total amount of time spent on grid lookup remains constant.

This occurs because retrieving the exact filter from the grid only requires the

following two steps:

1. Find adjacent filter points.

2. Run two-dimensional linear interpolation.

Of course, for a very complex environment, a higher density grid is necessary

and the pre-rendering takes longer to complete. However, the retrieval time

remains the same.
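The retrieval step could look like the following sketch, which performs the five bilinear interpolations described above on a simplified scalar-per-band filter; the storage layout and helper names are assumptions made for the example.

```java
/** Retrieves an interpolated filter from a pre-rendered grid (illustrative sketch). */
class FilterGrid {
    final int n;                 // grid is n x n points with unit spacing
    final double[][][][] gains;  // gains[si][sj][ri][rj]: filter gain between grid points

    FilterGrid(int n, double[][][][] gains) { this.n = n; this.gains = gains; }

    /** Bilinear interpolation of the receiver position for a fixed source grid point. */
    private double lerpReceiver(int si, int sj, double rx, double ry) {
        int i = (int) Math.min(rx, n - 2), j = (int) Math.min(ry, n - 2);
        double fx = rx - i, fy = ry - j;
        double g00 = gains[si][sj][i][j],     g10 = gains[si][sj][i + 1][j];
        double g01 = gains[si][sj][i][j + 1], g11 = gains[si][sj][i + 1][j + 1];
        return (1 - fx) * (1 - fy) * g00 + fx * (1 - fy) * g10
             + (1 - fx) * fy * g01 + fx * fy * g11;
    }

    /** Five 2D interpolations total: four over R (one per S neighbour), then one over S. */
    double retrieve(double sx, double sy, double rx, double ry) {
        int i = (int) Math.min(sx, n - 2), j = (int) Math.min(sy, n - 2);
        double fx = sx - i, fy = sy - j;
        double f00 = lerpReceiver(i, j, rx, ry),     f10 = lerpReceiver(i + 1, j, rx, ry);
        double f01 = lerpReceiver(i, j + 1, rx, ry), f11 = lerpReceiver(i + 1, j + 1, rx, ry);
        return (1 - fx) * (1 - fy) * f00 + fx * (1 - fy) * f10
             + (1 - fx) * fy * f01 + fx * fy * f11;
    }
}
```

Because the lookup touches a fixed number of stored values regardless of the grid density or scene complexity, the retrieval cost stays constant, as discussed above.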

4.5 AUDIOSITY AND TIME DELAY

So far, we have discussed the audiosity scheme without mentioning the factor of time

delay. In other words, we assume all sound energy arrives at the sound receiver

simultaneously. As mentioned before, this is a simplifying approximation, not only because of the limitation of the original graphics radiosity technique, which solves the entire interior of the environment at once, but also because we assume that all surfaces in our audiosity scheme reflect sound waves diffusely (non-directionally). Due to the embedded characteristics of the radiosity technique, we must ignore time delay for now to make the audiosity scheme work, but we are willing to add it back if there is an adequate method to simulate time delay within the restrictions on computation time and resources.

Remember, our primary goal is to achieve reasonable audio spatialization in real-time, which means that performance (speed) is the highest priority.

A common way to make a real-time, interactive audio application "sound right" is to add an artificial reverb, since generating a real reverb based on the physical arrangement of an environment is time-consuming. The focus of this thesis is to implement the first part of the system (without time delay); however, if the reverberation is so noticeable that it cannot be overlooked in some situations, then we still need to find a way to add time delay to the audiosity scheme.

Figure 4.9: Reverberation (after the direct sound pulse at $t = 0$, early reflected sounds arrive, followed by a collection of many reflected sounds, called reverberation)

Figure 4.9 shows the time-amplitude (energy strength) relationship during a reverberation. The "sound pulse" indicates the original sound triggered at $t = 0$. After the direct sound, as sound bounces off nearby surfaces, initial distinct reflections are heard. Later, as sound waves bounce off more and more surfaces, the reflections merge into a unified decaying response.

Consider the current audiosity scheme. The result of the matrix-solving step can be treated as the relationship between the index of each patch and its energy strength level at the equilibrium state. In other words, it is a two-dimensional table with the patch index on one axis and the energy strength level on the other. Adding time delay is equivalent to adding a third, time axis to the original table; at each time fragment, only some patches have final sound (energy) generated for the sound receiver to collect.

As before, performance is the critical requirement, so we need to design a feasible upgrade of the audiosity scheme with minimal modification. Accuracy-driven methods, for example impulse responses, are ignored here because they require recordings of a real environment, and we are limited to simulated environments. In sum, it is imperative to construct an affordable approximation that fits into the audiosity scheme.

In fact, the final sound energy that arrives at a particular patch is a collection of multiple sound waves propagated along different paths. Take a patch $P_i$ as an example: it contains energy that traveled directly from the emitter, as well as energy reflected by another patch $P_j$, plus energy reflected by patch $P_n$ and then by patch $P_m$, and so on. The difficulty here is that there is no single delay time for a selected patch, because every portion of its energy that travels along a different path has a unique delay time. Hence, the only solution available to us is to assign a reasonable artificial delay time to every patch, in order to make the reverberation "sound right".

Figure 4.10: Audiosity with delay time (axes: patches, energy level, and time delay $t_1, t_2, t_3, \ldots$)

One possible solution that matches our requirements is to mathematically project a one-to-one relation between intervals on the time axis and the "brightness" (energy strength level) of every patch in the environment (Figure 4.10). That is, after solving the audiosity matrix, map the final value of each patch to its own time delay, so that a patch with more energy has a shorter time delay and a patch with less energy has a longer time delay. Because of the attenuation of sound during propagation, a sound wave that travels a longer distance (longer time) before being received by the sound receiver should contain less energy than one with a shorter traveling distance (shorter time). If a sound wave took an infinitely long time to arrive at a patch, the sound energy strength at that patch would be approximately zero.

This is the theory behind the assumption that the patch with a higher value at the equilibrium state is mapped to a shorter time delay, and vice versa.
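One way such a mapping could be sketched is below; the particular monotone mapping function and its constants are assumptions chosen only to illustrate the idea, not a method specified in this thesis.

```java
/** Maps equilibrium patch energies to artificial time delays (illustrative sketch). */
class DelayMapper {
    /**
     * Assign longer delays to weaker patches and shorter delays to stronger ones.
     * The mapping delay = maxDelay * (1 - M_j / M_max) is one assumed monotone choice.
     */
    static double[] delaysSeconds(double[] m, double maxDelay) {
        double mMax = 0.0;
        for (double e : m) mMax = Math.max(mMax, e);
        double[] delays = new double[m.length];
        for (int j = 0; j < m.length; j++) {
            delays[j] = (mMax > 0.0) ? maxDelay * (1.0 - m[j] / mMax) : maxDelay;
        }
        return delays;
    }
}
```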

Compared to methods that treat accuracy as the first priority, this one has many disadvantages. However, since we are seeking a scheme to render real-time spatialized audio animation, a certain degree of inaccuracy should be allowed as long as it significantly benefits performance. Of course, if computational time and resources are abundant, this method can be replaced by a more costly one. Whatever the reverberation algorithm, choosing a balance point between accuracy and performance is always a dilemma.

Please note that time delay and reverberation have not been implemented as part of this thesis; they are left for future work.

5 EXPERIMENTS AND EVALUATIONS

5.1 GUI OF THE TESTING PROGRAM

A Java program is implemented to test the audiosity scheme. In this chapter, we first introduce the graphical user interface (GUI) of the testing program, and then use it to evaluate not only how efficient the audiosity scheme is, but also how reasonable its sound results are.

For the purposes of this study, a two-dimensional interface is used to mimic the sound environment. Traditionally, sound fields are spatialized in two dimensions (2D) because humans are more sensitive to sounds on the horizontal plane than to sounds above or below them. Also, a 2D map makes it easier not only to identify the location of each object, but also to display the scene on the monitor. Current surround sound systems, including Dolby and DTS, employ a horizontal arrangement of loudspeakers

[24] [28]. Additionally, modern sound field techniques use a two-dimensional layout to minimize the total expense [43] [44].


Figure 5.1 demonstrates an example of a Fresnel zone at 500Hz in our interface:


Figure 5.1: GUI of the testing program

In this interface, thick black bars represent sound reflectors, while shadowed squares represent sound blockers. Sound blockers are placed on the periphery of the environment to simulate walls. Every light gray grid cell in the interface is a 1-meter by 1-meter square; therefore, the entire environment is an 18-meter by 16-meter space. The elliptical bubble filled with a darker colour is the area of the Fresnel zone, while the two dots at its ends indicate the sound emitter and the sound receiver.

Here, the ambiguity between the sound emitter and receiver is tolerable: no matter which end is the emitter, the shape and size of the Fresnel zone between them remain the same. Certainly, the value of the Fresnel clearance derived from that Fresnel zone remains the same as well.

5.2 REASONABLENESS

Now, we would like to test the reasonableness of the audiosity scheme. As mentioned before, since the entire audiosity scheme is an approximation in a simulated environment, it is not feasible to determine a numerical accuracy value. Instead, we seek a measure of the similarity between the results of the scheme and our intuitive expectations, which is why we use the term "reasonableness" to describe how reasonable the audio result of this scheme is.

The same 18-meter by 16-meter room is used for the experiments, and here we test only a single frequency band for easier observation. Note that the default sound frequency is set to 500 Hz. To test reasonableness, we have designed two experiments: the first in an empty room, and the second in a room with walls in the middle.

5.2.1 Experiment #1: test with empty environment

In the first experiment, we test the reasonableness of the reflected sound clearance (coefficient). In the interface, two dots indicate the locations of the sound emitter and the sound receiver, and arrows show the paths of their movement. Let points A and C be the start and end points, while point B is the midway position.

Suppose both the sound emitter and the sound receiver move south in parallel along the east wall of the room. Our assumption is that the room itself should affect the reflected sound clearance, because the walls reflect sound energy multiple times before the sound arrives at the receiver. After collecting data from the program, we plot the data in a chart. The interface is shown in Figure 5.2 and the results in Figure 5.3.

The continuous line in the chart indicates the direct sound clearance. Since there is no sound blocker inside the room, the direct sound clearance is affected only by the distance between the emitter and receiver, which remains constant.

Therefore, the direct sound clearance is static, and is represented by a horizontal line. However, the reflected sound clearance, indicated by the dashed line, decreases during the first half of the trip as the emitter and the receiver leave the north wall. At point B in the figure, where the emitter and the receiver are at their maximum distances from the walls, we have the smallest value of reflected sound clearance in the chart. During the second half of the trip, the strength of the reflected sound signal

Figure 5.2: Experiment 1

starts to increase, because the south wall approaches, increasing the strength of the

reflections.

The result of this experiment matches our assumption well, and it is intuitively reasonable.

5.2.2 Experiment #2: test with sound blockers

Earlier, we mentioned that under some extreme conditions reflected sound can contribute the majority of the energy. In the second experiment, such conditions

Figure 5.3: Experiment 1 result chart

are tested. Suppose we have the situation in Figure 5.4.

In this audio environment, two extra walls are placed as sound blockers, and their surfaces can be treated as sound reflectors. The sound emitter is located in a corner of the room, and we move the sound receiver behind the wall, from point A to point

B. As in the interface of the previous experiment, the dots are the sound emitter and receiver. The sound receiver moves along the horizontal arrow, while the sound emitter is stationary. The solid line shows the direct link between the emitter and the receiver at position B, emphasizing that at this moment the direct sound is obstructed.

Our assumption is that after a certain point, the sound emitter will be completely hidden by the sound blockers, eliminating direct sound at the receiver. On the other hand, some amount of sound can still be propagated to the receiver by reflections. In

Figure 5.4: Experiment 2

other words, reflected sound now plays the only role in transmitting sound energy.

As clearly illustrated by the chart in Figure 5.5, the direct sound clearance (solid line) quickly drops to zero because of the sound blockers. However, the reflected sound clearance (dashed line) maintains a positive value throughout, even though its magnitude decreases continuously. This shows that some sound energy wraps around the sound blockers and, following multiple reflections, ultimately arrives at the sound receiver.

Figure 5.5: Experiment 2 result chart

5.3 PERFORMANCE

For testing efficiency, we use the full sequence of frequency bands: 10 octave-spaced frequency bands (20 Hz, 50 Hz, 100 Hz, 200 Hz, 400 Hz, 800 Hz, 1600 Hz, 3200 Hz, 6400 Hz, and 12800 Hz), varying from 20 Hz to 12 kHz, a typical range of the human hearing system. The reason we choose this range is that many sources give the range of human hearing as 20-20,000 Hz, although many adult humans can only hear up to 12 kHz [22] [38]. As mentioned before, Fresnel zones at different frequencies have different shapes, as visible in Figure 5.6.

According to the aforementioned filter grid method, we need to set the density of the grid first. In this experiment, we test the environment three times with different

Figure 5.6: Fresnel zones for different frequency bands

grid densities: a 7 x 7 grid, a 9 x 9 grid, and an 11 x 11 grid. In addition, three levels of environment complexity are involved in the testing, with 50, 70, and 90 sound blockers respectively. Hence, in total, we have nine different combinations of grid densities and numbers of sound blockers. We use a 2 GHz Intel Core 2 Duo processor with 2 GB of memory. Table 5.1 and Figure 5.7 show the total duration of grid construction.

Table 5.1: Grid construction time

                50 blockers         70 blockers         90 blockers
Audiosity       4.076 seconds       9.928 seconds       21.056 seconds
7 x 7 Grid      87.913 seconds      135.251 seconds     194.629 seconds
9 x 9 Grid      369.202 seconds     595.488 seconds     996.515 seconds
11 x 11 Grid    1174.787 seconds    1753.414 seconds    2767.315 seconds

Figure 5.7: Grid construction time (bars for 50, 70, and 90 blockers at each grid density: 7 x 7, 9 x 9, and 11 x 11)

The tabled numbers are much easier to read after being mapped to a graph. The horizontal axis represents the grid density; for example, a 7 x 7 grid indicates a total of 49 intersections in the filter grid. The program spends a relatively long period building a grid for a complex environment: an 11 x 11 grid with 90 sound blockers takes more than half an hour to finish, and this represents only a relatively small environment. However, because of the way the audiosity scheme is constructed, these calculations can be done before the real-time animation starts. In other words, this graph actually shows the amount of time that can be saved by pre-rendering the grid.

Compared to the construction, retrieving filters from the pre-rendered grid during the real-time process is significantly faster. Table 5.2 and Figure 5.8 show the total time for obtaining a sound filter in the different environment setups.

Table 5.2: Grid retrieval time

                50 blockers      70 blockers      90 blockers
7 x 7 Grid      0.015 seconds    0.013 seconds    0.018 seconds
9 x 9 Grid      0.014 seconds    0.016 seconds    0.013 seconds
11 x 11 Grid    0.013 seconds    0.015 seconds    0.017 seconds

As mentioned before, the total time spent retrieving data from the filter grid is relatively constant no matter how complex the environment; theoretically, each bar in this graph should have the same height. The computational durations in this experiment are fast enough for real-time performance: even a scene containing 90 sound blockers requires only about 20 milliseconds to obtain the filter.

Note that once the audiosity scheme for a particular environment is constructed, it can be applied to multiple sound source and receiver pairs, since they all retrieve data from the same filter grid. Each sound pair takes constant time for data retrieval.

Figure 5.8: Grid retrieval time (bars for 50, 70, and 90 blockers at each grid density)

By pre-rendering the filter grid, the audiosity scheme greatly decreases its computational cost during the real-time process. Moreover, the retrieval time is independent of the contents of the audio environment, which places no upper limit on the elaborateness of the physical layout of the environment. This method allows a substantial reduction in the time spent rendering audio animations in a variety of media and opens up the potential for better, more efficient gaming and entertainment experiences.

Imagine that, for a modern video game, developers spend weeks rendering the filter grids of scenes before releasing the game to the market. When a player plays the game, it takes only a few milliseconds to render the equalization filter for audio playback with spatial information embedded in it. We hope this method can offer more possibilities for seamless real-time applications, while also increasing the quality and believability of the audio experience.

6 CONCLUSION AND FUTURE WORK

In this thesis, we present the audiosity scheme, a new audio animation rendering method that adopts the radiosity technique from computer graphics. This rapid method calculates the spatial audio playback according to the physical arrangement of an audio environment. By applying the graphics radiosity technique to audio, only a small amount of resources is necessary to generate both the direct sound and reflected sound effects during the real-time process. In addition, a filter grid can be pre-constructed and stored to further accelerate the scheme. An extension of the audiosity scheme with time delay is also discussed as a way to create reverberation. Not only is the mathematical background behind the audiosity scheme explained, but a graphical testing program is also provided in the thesis. We have demonstrated that this scheme achieves a high standard of performance and is suitable for real-time audio applications.

Currently the audiosity scheme relies solely on matrix radiosity, the most basic radiosity rendering technique. In the future, we plan to apply different radiosity techniques, including progressive radiosity and wavelet radiosity, to our audiosity scheme, and to run an empirical comparison between them. In addition, patches can be optimized so that the same quality of result can be returned with a smaller number of patches. Because of the characteristics of radiosity, it is also possible to use parallel computation to substantially increase performance.

Further study on audio effects is another direction to improve the audiosity scheme.

In this thesis, we briefly discussed adding a time axis to the original scheme in order to handle reverberation. The next step is to design an approximate sound reflection model that provides better accuracy without significantly decreasing the speed of the scheme. Certainly, the accelerating computational capability of modern computers continues to bring extra resources to audio processing, and offers more room for complex calculations in spatialized animation.

Remember, the first priority is to achieve better performance with a minimal loss in accuracy. The speed of the real-time audiosity scheme is always an issue, as it is also its largest motivation. Presently, if a sound blocker or a sound reflector moves, the audiosity scheme and the entire filter grid must be re-rendered. Hence, another possible future study is to modify the current scheme to pre-render all possible movements of sound objects, or to improve the performance of re-rendering the environment when some objects move.

It is also possible to take advantage of information from the rendered graphics of the environment, and then build the audio structure based on such information [49] [15] [48]. By re-using existing graphics data structures, the audio process can be further accelerated.

The accuracy of the audiosity scheme has not been tested numerically in this thesis. To do so, we plan to construct a digital model of a real physical environment and compare its impulse responses with results generated by the audiosity scheme. In this way, we hope the accuracy of the scheme can be quantified.

Another approach would be user evaluation of the sound results of the audiosity scheme, in order to clearly establish how convincing they are.

After many years of research, both computer graphics and computer audio have attained a high standard. Excellent accomplishments occur not only within the graphics and audio fields individually, but also when people connect these media. Naturally, there is a significant difference between computer graphics and computer audio, and it is always a challenge to amalgamate concepts from these two isolated fields. In this thesis, we investigate a method, the audiosity scheme, that joins radiosity (in computer graphics) and spatialized audio (in computer audio). After our research, we can safely claim that this new method is a success. Therefore, for us and for others, future work lies in uniting concepts and techniques from different research fields in computer science in order to advance the state of the art in a variety of media, particularly large applications and video games, where the quality of spatialized sound continues to lag behind graphics innovation. With the audiosity method outlined in this study, we expect that, with further research and development of this approach, a user's experience will be heightened by high-quality audio animation combined with state-of-the-art graphics.

REFERENCES

[1] Ian Ashdown. Radiosity: A Programmer's Perspective. John Wiley & Sons, October 1994.

[2] N. C. Bakhvalov, Ya. M. Zhileikin, and E. A. Zabolotskaya. Nonlinear Theory of Sound Beams. Springer, February 1987.

[3] Durand R. Begault. 3D Sound for Virtual Reality and Multimedia. Morgan Kaufmann Publishing, September 1994.

[4] Max Born and Emil Wolf. Principles of Optics. Cambridge University Press, seventh edition, October 1999.

[5] David Brewster. A Treatise on Optics. Printed for Longman, Rees, Orme, Brown & Green and John Taylor, August 1851.

[6] Yunus A. Cengel. Heat Transfer: A Practical Approach. McGraw-Hill Series in Mechanical Engineering. McGraw-Hill Professional, second edition, 2003.

[7] Michael F. Cohen, Shenchang Eric Chen, John R. Wallace, and Donald P. Greenberg. A progressive refinement approach to fast radiosity image generation. SIGGRAPH '88: Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, pages 75-84, 1988. Available from: http://doi.acm.org/10.1145/54852.378487.

[8] Michael F. Cohen and Donald P. Greenberg. The hemi-cube: A radiosity solution for complex environments. SIGGRAPH '85: Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, pages 31-40, 1985. Available from: http://doi.acm.org/10.1145/325334.325171.

[9] G. Alfaro Degan. Acoustic barriers: Performance and experimental measurements. 2003.

[10] Matteo Dellepiane, Nico Pietroni, Nicolas Tsingos, Manuel Asselot, and Roberto Scopigno. Reconstructing head models from photographs for individualized 3D-audio processing. PACIFIC GRAPHICS 2008: Computer Graphics Forum (Special Issue - Proc. Pacific Graphics), 27(7), October 2008.

[11] Daniel T. DiPerna and Timothy K. Stanton. Fresnel zone effects in the scattering of sound by cylinders of various lengths. The Journal of the Acoustical Society of America, 90(6):3348-3355, 1991.

[12] Ernst R. G. Eckert and Robert M. Drake. Analysis of Heat and Mass Transfer. McGraw-Hill Professional, 1959.

[13] Hugo Elias. Radiosity [online]. 2000. Available from: http://freespace.virgin.net/hugo.elias/radiosity/radiosity.htm [cited 2009-8-25].

[14] Frederick Alton Everest. Master Handbook of Acoustics. McGraw-Hill/TAB Electronics, 2000.

[15] Cameron Foale and Peter Vamplew. Portal-based sound propagation for first-person computer games. IE '07: Proceedings of the 4th Australasian Conference on Interactive Entertainment, pages 1-8, February 2007.

[16] James D. Foley, Andries van Dam, Steven K. Feiner, John F. Hughes, and Richard L. Phillips. Introduction to Computer Graphics. Addison-Wesley Professional, September 1993.

[17] William G. Gardner. 3-D Audio Using Loudspeakers. Springer, 1998.

[18] Semyon Aranovich Gersgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, Otd. Mat. Estest. Nauk, 6:749-754, 1931.

[19] Cindy M. Goral, Kenneth E. Torrance, Donald P. Greenberg, and Bennett Battaile. Modeling the interaction of light between diffuse surfaces. SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pages 213-222, 1984. Available from: http://doi.acm.org/10.1145/800031.808601.

[20] Steven J. Gortler, Peter Schröder, Michael F. Cohen, and Pat Hanrahan. Wavelet radiosity. SIGGRAPH '93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pages 221-230, 1993. Available from: http://doi.acm.org/10.1145/166117.166146.

[21] Walter Greiner. Quantum Mechanics: An Introduction. Springer-Verlag, 1994.

[22] Donald E. Hall. Musical Acoustics. Brooks/Cole Publishing, third edition, August 2001.

[23] Pat Hanrahan, David Salzman, and Larry Aupperle. A rapid hierarchical radiosity algorithm. SIGGRAPH '91: Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, pages 197-206, 1991. Available from: http://doi.acm.org/10.1145/122718.122740.

[24] Tomlinson Holman. 5.1 Surround Sound: Up and Running. Focal Press, December 1999.

[25] Hoyt C. Hottel. Radiative Transfer. McGraw-Hill Series in Mechanical Engineering. McGraw-Hill Professional, 1967.

[26] David C. Lay. Linear Algebra and Its Applications. Addison Wesley, third edition, July 2002.

[27] Yu. N. Makov and V. J. Sanchez-Morcillo. Fresnel number concept and revision of some characteristics in the linear theory of focused acoustic beams. January 2008. Available from: http://arxiv.org/abs/0801.1476.

[28] Jean-Marie Pernaux, Patrick Boussard, and Jean-Marc Jot. Virtual sound source positioning and mixing in 5.1: Implementation on the real-time system Genesis. Proceeding Conference Digital Audio Effects, pages 76-80, 1998.

[29] Gregory M. Maxwell, Michael J. Bailey, and Victor W. Goldschmidt. Calculations of the radiation configuration factor using ray casting. Computer-Aided Design, 18(7):371-379, September 1986.

[30] Wolfgang Mueller and Frank Ullmann. A scalable system for 3D audio ray tracing. ICMCS '99: Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pages 819-823, 1999.

[31] K. A. Naugolnykh and L. Ostrovsky. Nonlinear Wave Processes in Acoustics. Cambridge University Press, May 1998.

[32] Tomoyuki Nishita and Eihachiro Nakamae. Continuous tone representation of 3D objects taking account of shadows and inter-reflection. SIGGRAPH '85: Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, pages 23-30, 1985. Available from: http://doi.acm.org/10.1145/325334.325169.

[33] Wilhelm Nusselt. Graphische Bestimmung des Winkelverhältnisses bei der Wärmestrahlung. Zeitschrift des Vereines Deutscher Ingenieure, 72:673, 1928.

[34] A. Van Oosterom and J. Strackee. The solid angle of a plane triangle. IEEE Transactions on Biomedical Engineering, BME-30(2):125-126, February 1983.

[35] Athanasios Papoulis. Circuits and Systems: A Modern Approach. HRW Series in Electrical and Computer Engineering. Holt, Rinehart and Winston, 1980.

[36] Jeremy Pearce and Daniel Mittleman. Defining the Fresnel zone for broadband radiation. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 66(5):056602.1-056602.4, 2002.

[37] John G. Proakis and Dimitris K. Manolakis. Digital Signal Processing: Principles, Algorithms and Applications. Prentice Hall, fourth edition, 2006.

[38] Curtis Roads. The Computer Music Tutorial. MIT Press, February 1996.

[39] Francis Rumsey. Spatial Audio. Focal Press, 2001.

[40] Raymond A. Serway. Physics for Scientists & Engineers. Saunders College Publishing, third edition, 1990.

[41] François X. Sillion and Claude Puech. Radiosity and Global Illumination. Morgan Kaufmann, first edition, July 1994.

[42] Ephraim M. Sparrow. On the calculation of radiant interchange between surfaces. Modern Developments in Heat Transfer, page 181, 1963.

[43] Thomas Sporer. Wave field synthesis: Generation and reproduction of natural sound environments. Proceeding Conference Digital Audio Effects, pages 133-138, 2004.

[44] Sascha Spors, Herbert Buchner, and Rudolf Rabenstein. Efficient active listening room compensation for wave field synthesis. Audio Engineering Society 116th Convention, May 2004.

[45] W. J. R. Swart, M. Odijk, and J. Jabben. Experimental validation of a model for barrier noise attenuation. August 2002.

[46] R. Teranishi and E. A. Shaw. External-ear acoustic models with simple geometry. Journal of the Acoustical Society of America, July 1968.

[47] Nicolas Tsingos and Jean-Dominique Gascuel. Soundtracks for computer animation: Sound rendering in dynamic environments with occlusions. Proceedings of the Conference on Graphics Interface '97, pages 9-16, 1997.

[48] Nicolas Tsingos. Pre-computing geometry-based reverberation effects for games. AES 35th International Conference, February 2009.

[49] Nicolas Tsingos, Emmanuel Gallo, and George Drettakis. Perceptual audio rendering of complex virtual environments. SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, pages 249-258, 2004. Available from: http://doi.acm.org/10.1145/1186562.1015710.

[50] Richard S. Varga. Gersgorin and His Circles. Springer, first edition, October 2004.

[51] Frederic L. Wightman and Doris J. Kistler. Headphone simulation of free-field listening, parts I and II. Journal of the Acoustical Society of America, 85(2):858-867, February 1989.