SOUNDFIELD ANALYSIS and SYNTHESIS: Recording, Reproduction and Compression
by
SHUAI WANG
A thesis presented to the University of New South Wales in fulfilment of the thesis requirement for the degree of Master of Engineering (Research) in Electrical Engineering
Kensington, Sydney, Australia
© Shuai Wang, 2007

Originality Statement
I hereby declare that this submission is my own work and to the best of my knowledge it contains no material previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.
I further authorize the University of NSW to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.
Acknowledgments
As the author of this dissertation, I would like to express my deep and sincere gratitude to my supervisor, Dr. D. Sen, for his inspiration and guidance throughout all the work of this research. It has been a great pleasure to conduct my research under his supervision, and his extraordinary patience and encouragement have helped me conquer various kinds of problems along the way.
Many thanks also go to other members of both our research group and the School of Electrical Engineering and Telecommunications at the University of New South Wales for their kind help in various forms.
Last but not least, I wish to dedicate this thesis to my family for their constant support and love.
Publications
• S. Wang, D. Sen, W. Lu, “Subband Analysis of Time Delay Estimation in STFT Domain,” Proc. Eleventh Australian International Conference on Speech Science and Technology, pp. 211–215, 2006.
Abstract
Globally, ever increasing consumer interest in multichannel audio is a major factor driving research into soundfield reconstruction and compression. The popularity of the well-commercialized 5.1 surround sound system and its 6-channel audio has been strongly supported by the advent of a powerful storage medium, the DVD, as well as the use of efficient telecommunication techniques. However, this popularity has also revealed potential problems in the development of soundfield systems.
Firstly, currently available soundfield systems have rather poor compatibility with irregular speaker arrangements. Secondly, the bandwidth requirement is dramatically increased for multichannel audio representation with good temporal and spatial fidelity.
This master's thesis addresses these two major issues in soundfield systems. It introduces a new approach to analyzing and synthesizing soundfields, and compares this approach with currently popular systems. To facilitate this comparison, the behavior of soundfields has been reviewed from both physical and psychoacoustic perspectives, along with an extensive study of past and present soundfield systems and multichannel audio compression algorithms. A 1st-order High Spatial Resolution (HSR) soundfield recording and reproduction system has been implemented in this project, and subjectively evaluated using a series of MUSHRA tests to finalize the comparison.
Contents
1 Introduction
  1.1 Background Overview
  1.2 Reconstruction Systems
  1.3 Motivation for Current Project
  1.4 Contribution
  1.5 Dissertation Overview
2 Soundfield Physics and Psychoacoustics
  2.1 Physical Representation
  2.2 Psychoacoustics
    2.2.1 Spatial Hearing with One Sound Source
    2.2.2 Spatial Hearing with Two Sound Sources
  2.3 Summary
3 Soundfield Systems and Multichannel Audio Compression Techniques
  3.1 Historic Review of Soundfield Reconstruction Systems
    3.1.1 3-Channel System
    3.1.2 Stereophony
    3.1.3 Quadraphony
    3.1.4 Ambisonics
    3.1.5 Wave Field Synthesis (WFS)
    3.1.6 Ambiophonics and Vector Based Amplitude Panning (VBAP)
    3.1.7 Summary
  3.2 Summary of Multichannel Audio Compression Techniques
    3.2.1 Lossless Audio Coding
    3.2.2 Lossy Audio Coding
  3.3 Summary
4 Multichannel Audio Compression Technique: Binaural Cue Coding (BCC)
  4.1 Problem Review
  4.2 BCC Design Scheme
    4.2.1 Macroscopic Design
    4.2.2 Frequency Processing
    4.2.3 Encoding
    4.2.4 Decoding
  4.3 Summary
5 High Spatial Resolution (HSR) Soundfield System
  5.1 Review of Microphones
  5.2 High Spatial Resolution (HSR) Recording
  5.3 Reproduction
  5.4 Post-recording Processing
    5.4.1 Analysis
    5.4.2 Synthesis
  5.5 Summary
6 Experiments and Results
  6.1 Experiments on BCC Implementation
    6.1.1 Estimation of ICTD
    6.1.2 Pre-echoing Effect
  6.2 Acoustic Experiments Preparation
    6.2.1 Anechoic Chamber
    6.2.2 Speaker Array Configuration
    6.2.3 Microphone Array Configuration
    6.2.4 Subjective Tests
  6.3 MUSHRA Tests and Results
    6.3.1 MUSHRA Tests on BCC
    6.3.2 MUSHRA Tests on HSR Soundfield Systems
7 Conclusion
  7.1 Summary of This Project
  7.2 Future Works
Appendices
A Spherical Solutions to the Wave Equation
  A.1 Solving the Wave Equation in Spherical Coordinates
  A.2 Spherical Bessel Functions of the 1st Kind
    A.2.1 Plot
    A.2.2 Properties
B Information about Speaker and Wedge
  B.1 GENELEC Loudspeakers (8130A)
  B.2 Wedge ‘A’
C Stimuli of MUSHRA Tests on Soundfield Systems
  C.1 Matrix F and Parameter μ
  C.2 Variations in Stimuli
List of Figures

1.1 General Structure of Sound Reconstruction Systems
1.2 Different Sound Reconstruction Systems
2.1 A Chirp Signal Recorded by an Omnidirectional Microphone
2.2 Spherical Coordinates
2.3 Spatial Hearing with One Sound Source
2.4 Binaural Signals
2.5 The Head Shadowing Effect
2.6 Summing Localization
2.7 Superposition of Multiple Auditory Events
3.1 The Huygens’ Principle
3.2 Steinberg and Snow’s 3-Channel System
3.3 Stereophonic Recording and Reproduction
3.4 Quadraphonic Setup
3.5 Soundfield Microphone
3.6 B-format Directionality
3.7 ITU-R BS.775, 5.1 Surround Sound Speakers Placement
3.8 The Principle of Wave Field Synthesis
3.9 Ambiophonics Reproduction
3.10 Generic Structure of Lossy Audio Coding
4.1 Generic Structure of Binaural Cue Coding (BCC)
4.2 Macroscopic Structures of BCC Analysis and Synthesis
4.3 Overlapped Hann Windows
4.4 Fine Structure of BCC Encoder
4.5 Uniform Quantization Scheme
4.6 Fine Structure of BCC Decoder
5.1 Generic Structure of Soundfield Systems
5.2 Standard Microphones’ 2D Polar Pattern
5.3 The Decomposition of B-format Directionality
5.4 Double M/S Microphone Design Scheme
5.5 ORTF Stereo Recording Microphone
5.6 The Configuration of Decca Tree Microphone
5.7 Post-Recording Analysis System
5.8 Post-recording Analysis
5.9 Post-recording Synthesis
5.10 High Spatial Resolution Soundfield System
6.1 TDE for Multi-Sinusoids: Integer Samples
6.2 TDE for Multi-Sinusoids: Non-integer Samples
6.3 TDE for Multi-Sinusoids: Complex
6.4 Pre-echoing Effects
6.5 Adaptive Windowing Scheme
6.6 3D Geometric Model of the Anechoic Chamber
6.7 Geometric Simulation of Speaker Mounting
6.8 Vogels Loudspeaker Support
6.9 HSR Recording Microphone Array
6.10 Recording Position
6.11 Graphic Terminal (Qterm-Z60) and GUI in MUSHRA Tests
6.12 The Continuous Quality Scale in MUSHRA
6.13 Results of MUSHRA Tests on BCC Performance
6.14 Scores of MUSHRA Tests on HSR Soundfield Systems
6.15 Number of Subjects Grading ‘Good’ Part 1
6.16 Number of Subjects Grading ‘Good’ Part 2
A.1 Various Bessel Functions of the First Kind
B.1 GENELEC Loudspeaker
B.2 Side {A1, A2, A3} and Mounting Point
B.3 Side {A0, A1, A3}
B.4 Side {A2, A0, A3}
List of Tables

4.1 Comparison of Multichannel Audio Compression Algorithms
4.2 Critical Band Boundaries (Hz)
4.3 Comparison Between BCC and Other Popular Multichannel Audio Compression Algorithms
6.1 Correspondence Between DST Audio Channels and Speakers
6.2 Stimuli for MUSHRA Tests on HSR Soundfield Systems
Chapter 1
Introduction
1.1 Background Overview
Human beings live in a space filled with a variety of sound sources originating from different directions. Since we cannot close our ears the way we can close our eyes, people are constantly exposed to a world of sound. This sonic environment is an essential element of our lives, contributing to both orientation and entertainment. Beyond the limited range of vision, the information delivered by sound sources helps people to discover, identify and localize their surroundings. The perception of sound also enriches a human's capability of learning about and exploring the physically unreachable world through communication. Moreover, sound can bring considerable pleasure to listeners. By stimulating the imagination of recipients, sound can present extraordinary auditory scenarios. The enthusiasm of an audience may also be strongly boosted by the melody and harmony of an inspirational piece of music.
However, the fidelity of a particular sound event was rather limited in time and space until 1877[1], when human speech, the famous “Mary Had a Little Lamb”, was first recorded by Thomas Edison. In the same year, a patent on transporting sound was filed by Alexander Graham Bell[2]. These inventions revealed the possibility of transmitting and replicating a sound event at another time and location. As a result, the reproduction of sound events is now extensively deployed in the film and broadcasting industries to enhance perceptual quality and pleasure[3]. Provided with well-reconstructed and synchronized audio tracks, viewers are able to tolerate occasionally missing frames in motion pictures[4]. However, an inappropriate delivery and reproduction of sound sources over a long period of time or distance can cause serious confusion or misunderstanding. Hence, improving the quality of replicated sound events has been a major issue under ongoing investigation ever since[5][6][7].
In the first half of the previous century, most research on sound reproduction focused on retaining the accurate temporal characteristics of a sound event during reconstruction, despite the fact that hearing, the perception of sound, does not only operate in the time domain but also possesses spatiality[8]. In other words, the directional perspective of a sound source, which determines its localization and interaction with listeners, had long been neglected in the process of reconstructing the sound event at another time or location. The lack of spatial analysis and synthesis may lead to a mis-localized sound event and cause confusion to recipients. Even high-fidelity sound reproduction can be deceptive when implemented via different arrangements of loudspeakers.
This situation has gradually changed in recent years. An increasing number of publications discuss the importance of spatial analysis and synthesis, study different scenarios of spatial hearing and propose various kinds of sound reproduction systems[9][10][11][12]. As a result, sound reconstruction has become one of the fastest growing segments of the consumer audio market[13]. With the assistance of Digital Signal Processing (DSP) technology, the applications of sonic replication have been expanded to the fields of virtual reality[14], education[15] and air traffic control[16]. In addition, reproduced sound that maintains good spatial quality has found its way into an extensive range of commercial products, such as home theater entertainment systems[17][18], teleconferencing equipment[19] and gaming surround sound systems[20].
Fundamental to these applications, sound reconstruction is often incorporated into multimedia systems. Involving a mixture of senses, such systems are able to provide a fairly realistic experience for recipients and allow their active participation. Some researchers, for example at Dolby Laboratories, are trying to merge auditory localization with visual perception to immerse recipients in a recreated virtual space, a developing trend in the modern gaming and film industries[21]. It is believed that, compared to visual perception, hearing is usually peripheral and considered the secondary sense for acquiring information within these multimedia systems[22]. As the perception of sound is an integrated process of physics, physiology and psychology[23], it can be considerably affected by the introduction of visual information from the psychological perspective. It seems that our brains tend to believe what we can see more than what we can hear. Nevertheless, this belief underestimates the importance of precisely replicating sound events in multimedia systems. Besides, the consumer market in the audio/music industry has a tremendous interest in creating a realistic and immersive sound environment. Therefore, the pursuit of a faithful reconstruction of live sound events, without the presence of any other perceptual information, never ends in the Audio Engineering Society (AES)[24]. For instance, the ultimate goal of using home theater entertainment systems is defined by Dermot Furlong[25] as being able “to optimally reconstruct the concert hall experience for the domestic living room listener.” Achieving a precise reproduction of sound events, in terms of both spatiality and temporal fidelity, is the primary goal of this master's project as well.
1.2 Reconstruction Systems
In general, currently available sound reconstruction systems in the consumer market consist of three sections (cf. Figure 1.1): recording, encoding and decoding, and playback.

Figure 1.1: General Structure of Sound Reconstruction Systems

The microphone capsule signals are passed through a codec to generate the inputs for the playback section, whose configuration has normally been decided in advance. Compression techniques may be applied to reduce the bandwidth required for representing these signals.

According to the different sound reproduction mechanisms and apparatus in the playback section, these sound reconstruction systems can be roughly classified into the following three families (cf. Figure 1.2).
Figure 1.2: Different Sound Reconstruction Systems
• Binaural Reconstruction System[26]
As implied in its name, the binaural reconstruction focuses on correctly repro- ducing sound signals at the ears. Typically, such a system uses headphones to play sound. In this way, signals can be directly delivered to the ear entrances.
By using this kind of system, recipients are always at the sweet spot, the best location for good auditory perception.
One of the drawbacks of the binaural system is that the awareness of wearing headphones may prevent listeners from feeling completely immersed in an auditory scenario. In addition, as the recipient's head may move during replay, the reproduced sound can be misleading and cause confusion.
• Transaural Reconstruction System[27][28]
Similar to binaural reconstruction, the transaural approach is also dedicated
to retaining the sound signals at the listener's ear entrances, except that speakers, rather than headphones, are deployed in such a system. Using speakers avoids the side effects of headphones while still guaranteeing good precision.
However, listeners are extremely space-limited in such a system, since the sweet spot for a transaural reconstruction system is rather small. As soon as the subject moves away from the ideal position, the quality of perception would be dramatically reduced.
• Soundfield Reconstruction System[29]
Different from the previous two families, a soundfield reconstruction system (cf. Figure 1.2c) utilizes a number of speakers to faithfully reproduce the sound pressure level (SPL) across a large area. In such a system, listeners are free to move around and still able to perceive sounds with relatively good quality. In addition, it is also possible for a group of listeners to enjoy the well-replicated sound environment together within the same listening space.
However, such systems suffer from a severe problem: an enormous number of speakers and channels is required for a good reconstruction of soundfields over a large area[25][30]. This leads to problems in storing and transmitting data[31]. The huge amount of data resulting from a multichannel recording of soundfields using such a system is extremely difficult to store, despite the advent of a powerful storage medium, the Digital Versatile Disc (DVD)[32], in the 1990s. Transmitting the data is also problematic, even with the “huge” bandwidth of currently available networks.
A second problem of soundfield reconstruction systems is that currently well-commercialized systems are tied to certain arrangements of speakers. In other words, adapting these systems to an irregular or arbitrary layout of loudspeakers can be extremely complicated.
Hence, this master's thesis focuses on one of the most challenging families, the soundfield reconstruction system, with the primary goal of faithfully replicating a soundfield from both spatial and temporal perspectives. For clarity, all the reconstruction systems mentioned in the following chapters of this thesis refer to soundfield reconstruction systems using multiple speakers.
1.3 Motivation for Current Project
In modern societies, the need for sensational reproduction of soundfields is booming in the consumer market, due to the influence of the rapidly growing entertainment industry. Take computer games as an example: there is a high demand for an immersive gaming environment. Such a realistic sonic environment can be produced with the incorporation of visual perception. Meanwhile, the accurate reconstruction of soundfields, in terms of both spatiality and fidelity, without the assistance of any other senses, is also a vital part of the market. Being able to enjoy the same incredible sensation at home as in a real concert hall is the primary goal of most research on soundfield reconstruction[25][33][34].
For a long time, the analysis and synthesis of soundfields focused on improving temporal fidelity, while leaving the spatial information unattended. The lack of spatial analysis can eventually affect the perceptual quality of reproduced soundfields, possibly causing severe disorientation to the audience. Hence, it is necessary to take directional characteristics into account during soundfield recording and reproduction.
Meanwhile, currently available soundfield reconstruction systems have several ma- jor issues.
Firstly, as mentioned in Section 1.2, an extraordinarily large number¹ of speakers is required to reproduce a soundfield of good perceptual quality over a large area, and this potentially causes problems in storage and transmission. It has been suggested in the literature that a large number of channels is essential for an immersive soundfield reconstruction. When better spatial and temporal performance over a large area is targeted, this number will dramatically increase. As a result, an enormous increment is introduced to the bitrates required to represent the speaker input signals, if currently popular compression schemes are used. Arguably, the development of storage media capacity and network bandwidth is approaching its limit, and it remains inadequate to handle such a dramatic increase in the bandwidth of multichannel audio representations.
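To put the bandwidth problem in concrete terms, a back-of-envelope calculation (illustrative only; the channel counts and PCM parameters below are assumptions, not figures from this thesis) shows how quickly the raw bitrate grows with the number of channels:

```python
def pcm_bitrate_bps(channels, sample_rate_hz=48_000, bits_per_sample=16):
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample

# A 6-channel (5.1) recording at a common studio resolution:
print(pcm_bitrate_bps(6) / 1e6)   # 4.608 Mbit/s
# A hypothetical 32-speaker soundfield array at the same resolution:
print(pcm_bitrate_bps(32) / 1e6)  # 24.576 Mbit/s
```

Every additional channel adds the full single-channel bitrate, which is why compression schemes that exploit inter-channel redundancy become attractive as speaker counts grow.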
Secondly, currently well-commercialized sound reconstruction systems are limited by the pre-determined arrangement of speakers in the playback section. In most cases, a certain layout of speakers is chosen, and then systems, like Ambisonics², are designed accordingly. The process of converting microphone signals to speaker feeds within these systems is relatively easy to implement. However, the situation on site, which can be either a consumer's home or a stadium, is not always the same. The placement of speakers often differs from the ideal layout, due to variations in space, room structure and furniture arrangement. In this case, the quality of reproduction cannot match what was proposed. Such low compatibility of soundfield reconstruction systems with irregular speaker layouts can lead to large spatial distortion and an unsatisfactory listening experience[35]. Moreover, as the number of speakers increases, it becomes more complex to design a suitable process to calculate the speaker feeds from the microphone signals.

¹ Details about this number can be found in Section 3.1.4.
² Details about this system will be reviewed in Section 3.1.4.
These two issues in current soundfield reconstruction systems are the premises of this thesis. In recent years, some research on multichannel audio and soundfield reconstruction[36][35] has addressed these issues separately. However, little of it synthesizes the two problems and implements a soundfield reconstruction system which succeeds in both quality and efficiency. Meanwhile, a comparison between such a soundfield reconstruction system and currently popular systems on the market can be incredibly valuable for consumers who aspire to a better system.
1.4 Contribution
This master's project addresses both of the above issues in the analysis and synthesis of soundfields, and offers a comparison between a state-of-the-art soundfield reconstruction system and currently well-commercialized systems. This comparison can potentially be used to direct the consumer market towards the development of better soundfield reconstruction systems.
An extensive range of audio recording, reproduction and compression techniques is reviewed during this research project to provide the basis of the comparison. In particular, the principles of the High Spatial Resolution (HSR)³ recording technique
³ The High Spatial Resolution (HSR) technique was proposed by Arnaud Laborie et al. in 2003[110].
and the Binaural Cue Coding (BCC)⁴ algorithm are studied in detail. In order to realize an effective and efficient soundfield system, the entire process of multichannel recording, compression and reproduction using the HSR technique and the BCC compression algorithm is implemented in an anechoic chamber, which is also used as the listening environment for the later subjective tests.
A framework controlled by a number of factors, including the configurations of both the microphone array and the speaker set, is built up to convert multichannel recordings into loudspeaker input signals for reproduction. In addition to its function of encoding and decoding soundfields, this framework is a flexible process that can be adapted to various arrangements of speakers in different home theater systems. Therefore, such a framework can not only be used in future research on soundfield reconstruction, but also potentially be integrated into home theater systems or set-top boxes to improve the quality of on-site soundfield reconstructions. Furthermore, the internal product of this framework, the vector p̂ (cf. Equation 5.8), can be regarded as another format of audio recording and utilized in storage and transmission in the future, because this vector, similar to the B-format in Ambisonics (cf. Section 3.1.4), can be used to describe the recorded soundfields.
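The thesis defines the conversion through a matrix F and a parameter μ (Appendix C.1), whose exact construction is not reproduced here. As a hedged sketch only, assuming the conversion amounts to a regularized least-squares fit of the speaker feeds to the soundfield description p̂, the core computation might look like:

```python
import numpy as np

def speaker_feeds(F, p_hat, mu=1e-3):
    """Hedged sketch: Tikhonov-regularized least-squares mapping from a
    soundfield description p_hat to feeds for an arbitrary speaker layout.
    F, mu and p_hat are stand-ins for the thesis's Matrix F, parameter mu
    (Appendix C.1) and vector p_hat (Equation 5.8); the thesis's actual
    definitions may differ."""
    n_speakers = F.shape[1]
    A = F.conj().T @ F + mu * np.eye(n_speakers)
    return np.linalg.solve(A, F.conj().T @ p_hat)

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 8))   # 4 soundfield components -> 8 speakers
p_hat = rng.standard_normal(4)    # recorded soundfield description
s = speaker_feeds(F, p_hat)
print(s.shape)                    # (8,)
```

Under this assumption, only F and μ depend on the speaker layout, so re-deriving the feeds for an irregular arrangement amounts to rebuilding F, which is what would make such a framework adaptable.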
As an important section of this research project, Binaural Cue Coding, the key algorithm in the recently ratified MPEG⁵ multichannel audio codec, has been reviewed, implemented and applied to compress the feeds of multiple speakers. Compared with currently popular audio compression algorithms, this multichannel coding technique is purported to dramatically reduce the bandwidth required for multichannel audio representation, while maintaining spatiality and temporal fidelity. It is the intention of this project to evaluate just how well these cues are maintained in comparison to the uncompressed soundfield reproduction. During the implementation of BCC, several algorithms are also proposed and compared, in terms of accuracy and computational complexity, to determine the most efficient technique for spatial cue estimation.
⁴ The Binaural Cue Coding (BCC) algorithm was invented by Christof Faller et al. in 2002[109].
⁵ MPEG is the abbreviation of Moving Pictures Expert Group.
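BCC's compactness comes from transmitting a downmix plus low-rate spatial cues rather than all channel waveforms. As a minimal illustration, not the thesis implementation, the inter-channel level and time differences between two channels can be estimated as follows (the signal parameters here are arbitrary):

```python
import numpy as np

def icld_db(x, y, eps=1e-12):
    """Inter-channel level difference: power of y relative to x, in dB."""
    return 10 * np.log10((np.sum(y ** 2) + eps) / (np.sum(x ** 2) + eps))

def ictd_samples(x, y):
    """Inter-channel time difference: lag of the cross-correlation peak."""
    corr = np.correlate(y, x, mode="full")
    return int(np.argmax(corr)) - (len(x) - 1)

fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t)   # reference channel
y = np.zeros_like(x)
y[8:] = 0.5 * x[:-8]              # half amplitude, 8 samples late

print(ictd_samples(x, y))         # 8
print(round(icld_db(x, y)))       # -6  (half amplitude = quarter power)
```

A real coder estimates such cues per critical band of the STFT rather than over the whole broadband signal, but the underlying measurements are of this kind.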
Eventually, subjective experiments were conducted on various replicated soundfields to verify the performance of the reconstruction systems. The results are analyzed and used to conclude the comparison of the different systems. To facilitate this conclusion, a novel approach was undertaken during these experiments: the whole process of soundfield analysis and synthesis was staged. In other words, the subjective evaluation tests were carried out in exactly the same acoustic environment as the recordings, so that the reproduced soundfields are comparable with the original.
1.5 Dissertation Overview
Motivated by the pursuit of faithfully reproduced soundfields, this master's thesis studies two dominant issues in currently popular soundfield reconstruction systems, realizes a good soundfield reconstruction using state-of-the-art techniques, and offers a comparison between such a system and currently well-commercialized multichannel systems. The remainder of this thesis is organized as follows.
Chapter 2 briefly reviews the representation of sound from both physical and psychoacoustic perspectives. The wave equations of sound in both Cartesian and spherical coordinates are first explained. A frequency-domain equivalent solution is then described. This chapter also illustrates several psychoacoustic phenomena which are fundamental to soundfield reconstruction systems.
In order to clearly understand the major issues in reconstructing soundfields, a variety of currently available systems is reviewed in Chapter 3, from the earliest stereo systems to the most popular current ones, as well as some soundfield reconstruction techniques yet to be commercialized. In addition, a brief introduction to modern audio coding schemes is also included in this chapter; it helps with understanding the problems in audio storage and transmission.
A multichannel audio compression algorithm, Binaural Cue Coding (BCC), is discussed in Chapter 4. It first introduces the fundamentals of BCC and its generic coding scheme. The details of the encoding and decoding process, including the estimation of the essential information, are then explained. A comparison with other audio coding techniques is included in this chapter to show the advantage of BCC in regard to coding efficiency.
Chapter 5 looks into the new approach to faithfully reproducing a soundfield, the High Spatial Resolution (HSR) technique. The chapter is divided into several sections which explore the different stages of this soundfield reconstruction system.
It starts with a general review of recording techniques, followed by the recording principle of this new approach. The stage of post-recording analysis, where the acoustic characteristics of a recording are analyzed and refined, is described next. Chapter 5 also discusses the decoding process prior to the playback stage. In general, this chapter proposes a simple framework to enable an accurate reconstruction of soundfields on any loudspeaker arrangement.
All the experiments conducted during this project are described in Chapter 6. Firstly, several methods proposed to estimate the essential cues in BCC are compared in terms of accuracy and computational complexity; the best option is then applied during the implementation. The following sections of this chapter explain the setup and conduct of the subjective experiments. The temporal fidelity of several reproduced multichannel audio clips which have been through only the BCC codec is examined during the first part of the tests, while other experiments are also conducted to evaluate both the spatiality and the temporal quality of various reproduced soundfields. The experimental results are then analyzed to assess the performance of the different soundfield reconstruction systems.
Finally, Chapter 7 briefly summarizes the research work involved in this thesis. Some suggestions for future research are then proposed.
Chapter 2
Soundfield Physics and Psychoacoustics
The perception of sound is recognized as an integrated process involving physics, psychology, and physiology[8]. Therefore, the physical representation of soundfields is first explained in this chapter to reveal the possibility of accurately recording/analyzing and reproducing/synthesizing soundfields. The second part of this chapter summarizes various psychoacoustic phenomena of spatial sound perception which are fundamental to soundfield reproduction and compression. Such a theoretical review of the characteristics of soundfields provides a better understanding of the mechanisms of the existing soundfield systems and compression techniques reviewed in Chapter 3. It also identifies the essential properties of soundfields which should be preserved throughout the entire process of recording, reproduction and compression. As this research project does not look into the physiological perspective of hearing, physiological studies of sound perception will not be further described in the following sections.
2.1 Physical Representation
From a physical perspective, sound is the product of vibrations among the molecules of a medium. These vibrations cause condensation and rarefaction in local regions of the medium and create differences in pressure, which is fundamental to the propagation of sound. As a result, the dynamic sound pressure p(x, y, z, t) is often measured by microphones (cf. Figure 2.1) to describe a sound event.
Figure 2.1: A Chirp Signal Recorded by an Omnidi- rectional Microphone
The propagation of sound in a homogeneous medium can be described by the wave equation (cf. Equation 2.1) in three dimensions[37]. Solutions to this equation can be used to describe a certain soundfield, or explain sonic phenomena within this soundfield, since any valid soundfield should comply with the wave equation:
    Δp = (1/c²) ∂²p/∂t²,                          (2.1)

where Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² is the Laplace operator in three-dimensional (3D) Cartesian coordinates.
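As a quick numerical illustration (an added sketch, not part of the thesis), a 1D plane wave p(x, t) = sin(kx − ωt) satisfies Equation 2.1 precisely when the propagation speed c equals ω/k, which can be checked with central finite differences:

```python
import numpy as np

c = 343.0           # assumed speed of sound in air (m/s)
f = 1000.0          # a 1 kHz tone
w = 2 * np.pi * f   # angular frequency omega
k = w / c           # wavenumber

def p(x, t):
    """1D plane wave travelling in the +x direction."""
    return np.sin(k * x - w * t)

# Central second differences approximate the second partial derivatives.
x0, t0, h = 0.37, 0.002, 1e-5
d2p_dx2 = (p(x0 + h, t0) - 2 * p(x0, t0) + p(x0 - h, t0)) / h ** 2
d2p_dt2 = (p(x0, t0 + h) - 2 * p(x0, t0) + p(x0, t0 - h)) / h ** 2

# Both sides of the wave equation agree: Laplacian = (1/c^2) d2p/dt2.
print(np.isclose(d2p_dx2, d2p_dt2 / c ** 2, rtol=1e-3))  # True
```

The same cancellation (Δp = −k²p and ∂²p/∂t² = −ω²p) underlies the spherical solutions discussed in Appendix A.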
Figure 2.2: Spherical Coordinates: r is the distance, θ is the polar angle and φ is the azimuth angle[37].
Due to the convenience of the alternative spherical coordinate system (cf. Figure 2.2), the wave equation 2.1 is often expressed in spherical coordinates as[37],