Dual-Channel Speech Enhancement by Superdirective Beamforming
Total Page:16
File Type:pdf, Size:1020Kb
Hindawi Publishing Corporation EURASIP Journal on Applied Signal Processing Volume 2006, Article ID 63297, Pages 1–14 DOI 10.1155/ASP/2006/63297 Dual-Channel Speech Enhancement by Superdirective Beamforming Thomas Lotter and Peter Vary Institute of Communication Systems and Data Processing, RWTH Aachen University, 52056 Aachen, Germany Received 31 January 2005; Revised 8 August 2005; Accepted 22 August 2005 In this contribution, a dual-channel input-output speech enhancement system is introduced. The proposed algorithm is an adap- tation of the well-known superdirective beamformer including postfiltering to the binaural application. In contrast to conventional beamformer processing, the proposed system outputs enhanced stereo signals while preserving the important interaural ampli- tude and phase differences of the original signal. Instrumental performance evaluations in a real environment with multiple speech sources indicate that the proposed computational efficient spectral weighting system can achieve significant attenuation of speech interferers while maintaining a high speech quality of the target signal. Copyright © 2006 T. Lotter and P. Vary. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1. INTRODUCTION Griffiths-Jim beamformer [6], or extensions, for example, in [7, 8], is most widely known. Adaptive beamformers can be Speech enhancement by beamforming exploits spatial diver- considered less robust against distortions of the desired sig- sity of desired speech and interfering speech or noise sources nal than fixed beamformers. by combining multiple noisy input signals. Typical beam- Beamforming for binaural input signals, that is, signals former applications are hands-free telephony, speech recog- recorded by single microphones at the left and right ear, has nition, teleconferencing, and hearing aids. Beamformer real- found significantly less attention than beamformers for (lin- izations can be classified into fixed and adaptive. ear) microphone arrays. An important application is the en- A fixed beamformer combines the noisy signals of mul- hancement of speech in a difficult multitalker situation using tiple microphones by a time-invariant filter-and-sum opera- binaural hearing aids. tion. The combining filters can be designed to achieve con- Current hearing aids achieve a speech intelligibility im- structive superposition towards a desired direction (delay- provement in difficult acoustic condition by the use of inde- and-sum beamformer) or in order to maximize the SNR im- pendent small endfire arrays, often integrated into behind- provement (superdirective beamformer), for example, [1]. the-ear devices with low microphone distances around 1- As practical problems such as self-noise and amplitude or 2 cm. When hearing aids are used in combination with eye- phase errors of the microphones limit the use of optimal glasses, larger arrays are feasible, which can also form a bin- beamformers, constrained solutions have been introduced aural enhanced signal [9]. that limit the directivity to the benefit of reduced suscepti- Binaural noise reduction techniques get into attention, bility [2–4]. Most fixed beamformer design algorithms as- when space limitation forbids the use of multiple micro- sume the desired source to be positioned in the far field, phones in one device or when the enhancement benefits of that is, the distance between the microphone array and the two independent endfire arrays are to be combined with bin- source is much greater than the dimension of the array. Near- aural processing benefit. In contrast to an endfire array, a field superdirectivity [5] additionally exploits amplitude dif- binaural speech enhancement system must work with a dual- ferences between the microphone signals. Adaptive beam- channel input-output signal, at best without modification of formers commonly consist of a fixed beamformer steered to- the interaural amplitude and phase differences in order not wards a desired direction and a time-varying branch, which to disturb the original spatial impression. adaptively steers beamformer spatial nulls towards inter- Enhancement by exploiting coherence properties [10]of fering sources. Among various adaptive beamformers, the the desired source and the noise [3, 11] has the ability to 2 EURASIP Journal on Applied Signal Processing reduce diffuse noise to a high degree, however fails in sup- and right ears do not only differ in the time difference de- pressing sound from directional interferers, especially un- pending on the position of the source relative to the head. wanted speech. Also, due to the adaptive estimation of the Furthermore, the shadowing effect of the head causes sig- instantaneous coherence in frequency bands, musical tones nificant intensity differences between the left- and right-ear canoccur.In[12, 13], a noise reduction system has been microphone signals. Both effects are described by the head- proposed, that applies a binaural processing model of the related transfer functions (HRTFs) [17]. human ear. To suppress lateral noise sources, the interaural Figure 1(a) shows a time signal s arriving at the micro- level and phase differences are compared to reference values phones from the angle θS in the horizontal plane. The time for the frontal direction. Frequency components are attenu- signals at the left and right microphones are denoted by yl, ated by evaluation of the deviation from reference patterns. yr. The microphone signal spectra can be expressed by the However, the system suffers severely from susceptibility to re- HRTFs towards left and right ears Dl(ω), Dr(ω). As the beam- verberation. In [14], the Griffiths-Jim adaptive beamformer former will be realized in the DFT domain, a DFT representa- [6] has been applied to binaural noise reduction in subbands, tion of the spectra is chosen. At discrete DFT frequencies ωk and listening tests have shown a performance gain in terms with frequency index k, the left- and right-ear signal spectra of speech intelligibility. However, the subband Griffiths-Jim are given by approach requires a voice activity detection (VAD) for the filter adaptation which can cause cancellation of the desired Yl ωk = Dl ωk S ωk , Yr ωk = Dr ωk S ωk . speech when the VAD frequently fails especially at low signal- (1) to-noise ratios. In [15], a two-microphone adaptive system is presented ffi Here, S(ωk) denotes the spectrum of the original signal s.For with the core of a modified Gri ths-Jim beamformer. By k ω lowband-highband separation, a tradeoff is provided be- brevity, the frequency index is used instead of k. tween array-processing benefit and binaural benefit by the The acoustic transfer functions are illustrated in Figure 1. ff The shadowing effect of the head is described by multiplica- choice of the cuto frequency. In the lower band, the bin- ffi S k aural signal is passed to the respective ear. The directional tion of each spectral coe cient of the input spectrum ( ) filter is only applied to the high-frequency regions, whose with an angle and frequency-dependent physical amplitude αphy αphy influence to sound localization and lateralization is consid- factors l , r for the left- and right-ear side. The physical τphy τphy ered less significant. Both adaptive algorithms from [14, 15] time delays l , r , that characterize the propagation time have the ability to adaptively cancel out an interfering source. from the origin to the left and right ears, are approximately However, the beamformer adaptation procedure needs to be considered to be frequency-independent. The HRTF vector coupled to a voice activity detection (VAD) or correlation- D canthusbewrittenby based measure to counteract against possible target cancella- T tion. phy − jω τphy θ phy − jω τphy θ θ k = α θ k e k l ( s) α θ k e k r ( s) . In this contribution, a full-band binaural input-output D s, l s, , r s, array that applies a binaural signal model and the well- (2) known superdirective beamformer as core is presented [16]. The dual-channel system thus comprises the advantages of a For convenience, the physical transfer function can be nor- fixed beamformer, that is, low risk of target cancellation and malized to that of zero degree. With αphy(0◦, k):= αphy(0◦, computational simplicity. l k = αphy ◦ k τphy ◦ = τphy ◦ = τphy ◦ To deliver an enhanced stereo signal instead of a mono ) r (0 , )and (0 ): l (0 ) r (0 ), the αnorm αnorm output, an efficient adaptive spectral weight calculation is in- normalized amplitude factors l , r and time delays τnorm τnorm troduced, in which the desired signal is passed unfiltered and l , r ,respectively,canbewrittenas which does not modify the perceptually important interau- ral time and phase differences of the target and residual noise αphy θ k norm l S, α θS k = signal. To further increase the performance, a well-known l , phy , α 0◦, k Wiener postfilter is also adapted for the binaural application l under consideration of the same requirements. norm phy phy ◦ τ θS = τ θS − τ 0 , The rest of the paper is organized as follow. In Section 2, l l l (3) the binaural signal model is introduced as a basis for the phy αr θS, k beamformer algorithm. Section 3 includes the proposed su- αnorm θS, k = , r αphy ◦ k perdirective beamformer with dual-channel input and out- r 0 , put as well as the adaptive postfilter. Finally, in Section 4 per- τnorm θ = τphy θ − τphy ◦ . formance results are given in a real environment. r S r S r 0 αphy αphy 2. BINAURAL SIGNAL MODEL The transfer vector D or the amplitudes l , r and time τphy τphy delays l , r as well as their normalized versions are in For the derivation of binaural beamformers, an appropriate the following obtained by two different approaches. Firstly, a signal model is required. The microphone signals at the left database of measured head-related impulse responses is used T.