Elevated Surround Sound System Using 3D Audio Post Processing
Total Page:16
File Type:pdf, Size:1020Kb
3D Audio Processing for Elevated Speakers Using the TI C62 EVM Board Woon-Seng Gan, See-Ee Tan and Meng-Hwa Er Digital Signal Processing Laboratory, School of Electrical & Electronic Engineering Nanyang Technological University Singapore 639798 [email protected] Abstract surround system is gaining popularity. As the number of loudspeakers increased, the number of 3D audio processing has been used to render virtual cables to these loudspeakers also increased. It is surround sound using 2 or more speakers placed at therefore important to conceal these wires to have a ear level. However, we can extend this 3D audio tidy appearance in a home environment. However, to signal processing to other commonly speakers’ conceal these wires might be very costly. Therefore, setups. Here, we introduce a new technique to bring it is desirable that wires can be concealed in the down sound images produced by elevated ceiling or cornes and the loudspeakers placed near loudspeakers. This feature allows a cleaner setup for the ceiling, which can be more easily achieved loudspeakers for home entertainment and virtual without much cost. A problem with this setup is that reality gaming environments and at the same time the speaker is no longer at ear-level and this cause preserves the auditory image to match the video elevated sound image. The problem can result in a display at eye level. In this paper, we address the mismatch between picture content and sound scene. problem of crosstalk cancellation for multiple In order to compensate this distortion, this paper speakers’ configurations and provide different proposes a correction scheme to reproduce the signal inverse filter structures and configurations to address as if it is being played back from ear-level this problem. loudspeakers. We have implemented these algorithms using the II. Head Related Transfer Function and Texas Instrument TMS320C6201 EVM board that is Elevation Cues interfaced to an 8-channel I/O board, built using 4 Source Crystal CS4218 CODECs. A GUI is also developed on the PC for user to select between different modes and parameters. This allows user to select either a 2, 4 or 6-speakers playback configuration. It also enables the user to control the angle of the elevated HRTFL speakers. Computational complexity, programming techniques and memory usage for the C62xx DSP are HRTFR also discussed in this paper. I. Introduction In home entertainment and computing system, the 60o proposed number of loudspeakers has been Listener increasing with the demand for better sound effects. There are many coding standards that can support 5.1 Figure 1. Sound Transmission from Source to channels such as the Dolby Digital, DTS, MPEG. Listener The new THX surround EX can even support up to 7.1 channels. These standard has found its Figure 1 shows a sound source traveling from an application in Digital Versatile Disk (DVD) and azimuth angle of 60o to the listener. The HRTF [1] is digital television (DTV) system. For computing the spatial transformation of a source from a point in environment, users are no longer satisfied with the free space to the listener's eardrums. Its time domain traditional stereo speakers and the four-point form is known as the Head Related Impulse Response (HRIR). In the context of this paper, we YLe shall use HRTF to refer to both time and frequency y = e Y representations. The three major cues of localization Re encoded in a HRTF are inter-aural intensity X = Le difference (IID), inter-aural time difference (ITD) xe X and modification of spectral profile [2]. These cues Re are the result of path length difference between the H H H = LLe RLe sound source and the two ears, head shadowing, e H H diffraction and reflection of head, outer ear and torso LRe RRe that are linear time-invariant (LTI). where, ye is a column vector representing the signal The ITD cue is dominant at low frequency where the at the eardrums of the listener contributed by the head dimension are small compared with the elevated loudspeakers, xe is a column vector wavelength of the sound. At frequency above representing the signal output to the elevated 1.5kHz, the IID is the dominant cue [1]. In the loudspeakers and He is a matrix representing the median plane and along the cone of confusion, there transmission path from the elevated sound source to the listener's eardrums, which are modeled by the are no IID and ITD, localization therefore must rely o on the spectral profile. Hence, for elevated source respective HRTF (HLLe, HRLe, HLRe and HRRe) at 60 elevation. In this project, all speakers are placed at the spectral profile is important and cannot be o neglected. So, the simple crosstalk cancellation an elevation of 60 with reference to the listener. model in [3] implemented for horizontal can no longer be used. Similarly, the ear-leveled system can be described by the following equations: III. Elevated Image Correction for Two = Loudspeakers y H x (2) Y X X L Le Re y = YR Elevated Loudspeakers X x = L X R HLLe HLRe HRLe HRRe H H H = LL RL H LR H RR Project X X Down L R where, y is a column vector representing the signal at Phantom the eardrums of the listener contributed by the ear- Ear-Leveled H H Loudspeakers leveled loudspeakers, x is a column vector LR RL representing the signal output to the ear-leveled H loudspeakers and H is a matrix representing the LL HRR transmission path from the ear-leveled sound source YL YR to the listener's eardrums, which are modeled by the o respective HRTF (HLL, HRL, HLR and HRR) at 0 Figure 2. Elevated Loudspeaker Setup and the elevation (i.e. listener’s eye level). Desired Phantom Loudspeakers In order to create the phantom ear-leveled Figure 2 shows the correction of the elevated loudspeakers, the signal at the listener's eardrum loudspeakers image of a two-channel system by produced using the physical elevated loudspeakers creating two phantom loudspeakers at ear-leveled via should be identical to that when the loudspeakers are a psycho-acoustics signal processing explained as placed at ear-leveled (i.e. equating Egn (1) to (2)). follows. The elevated loudspeaker system can be Hence, described by the following equations: y = y (3) y = H x (1) e e e e = H e xe H x − ~ x = H 1H x H = t H (10) e e (4) e e = Hk x By inverting Eqn (10) and post-multiplying it with H H = 11k 12k Eqn (9), the distortion, t is cancelled out. where H k (5) H 21k H 22k ~ − ~ − H 1H = H 1H (11) e e Therefore, in order to produce the desired signal at ear-level using the phantom ear-leveled loudspeakers IV. Extension to Multiple Loudspeakers with input signal x, the signal xe applied to the elevated loudspeakers should be preprocessed The discussion above has been based on a two- according to Eqn (4). channel system that can be modelled by a 2x2 matrix. If the number loudspeakers are increased to n, then In 3D audio rendering and crosstalk cancellation, the system can be described by a 2xn matrix. In such certain form of equalization [3] is required to obtain a system where the number of loudspeakers is greater the desired directional information without the than the number of ears, many solutions for He exist distortion caused by a non-flat microphone and [4]. For such cases, by computing the pseudoinverse + loudspeaker response. For headphone, usually a He , a minimum power solution can be obtained. diffused-field equalization is performed on the HRTF used. In loudspeaker reproduction, a free-field + − − −1 H (z) = H T (z 1 )[]H (z)H T (z 1 ) (12) equalization is preferred. It can be shown that for e e e e loudspeakers pair with non-identical response, the However, in this paper another approach is adopted loudspeakers' response does not cancel) ) out each other − by exploiting the linear time invariant property of the through based on the relation H 1H . However, if e system. By partitioning the loudspeakers into pairs, the) differences) in the responses are small, then the signal at the eardrums can be described as: H −1H ≈ H −1H can be easily deduced from (8). For e e y = H x microphone, even non-identical response can be 1 k1 1 cancelled off in the filter design process. y = H x 2 k2 2 (13) ) Μ = H M H S (6) y = H x ) m km m = H e M H e S (7) = + +Λ + y T y k y 2 y m SL 0 M L 0 where S = and M = is the m (14) = 0 SR 0 M R y T ∑ H kixi i=1 response of the loudspeaker and microphone respectively. where, yi is the signal at eardrums contributed by the th th i loudspeaker pair, xi is the input signal to the i S R loudspeaker pair, Hki is the 2x2 matrix that defined ) ) H11k H12k − S the transfer function of the transmission path from H 1H = L (8) e S the ith pair loudspeaker to the eardrums of the listener H L H 21k 22k and m is the total number of loudspeaker pairs. The SR total signal at the eardrums, yT is the sum of contribution from all the m loudspeaker pairs. Further, we can show that the HRTF does not require any further equalization if we assume the response of The non-overlapping grouping scheme for the left and right loudspeakers and also microphones loudspeaker pairs is shown in Figure 3 for the case of are identical and the measuring setup for all the 6 speakers, arranged at an angle of 60o apart.