EURASIP Journal on Advances in Signal Processing
Microphone Array Speech Processing
Guest Editors: Sven Nordholm, Thushara Abhayapala, Simon Doclo, Sharon Gannot, Patrick Naylor, and Ivan Tashev
Copyright © 2010 Hindawi Publishing Corporation. All rights reserved.
This is a special issue published in volume 2010 of “EURASIP Journal on Advances in Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Editor-in-Chief
Phillip Regalia, Institut National des Télécommunications, France
Associate Editors
Adel M. Alimi, Tunisia; Kenneth Barner, USA; Yasar Becerikli, Turkey; Kostas Berberidis, Greece; Enrico Capobianco, Italy; A. Enis Cetin, Turkey; Jonathon Chambers, UK; Mei-Juan Chen, Taiwan; Liang-Gee Chen, Taiwan; Satya Dharanipragada, USA; Kutluyil Dogancay, Australia; Florent Dupont, France; Frank Ehlers, Italy; Sharon Gannot, Israel; Samanwoy Ghosh-Dastidar, USA; Norbert Goertz, Austria; M. Greco, Italy; Irene Y. H. Gu, Sweden; Fredrik Gustafsson, Sweden; Ulrich Heute, Germany; Sangjin Hong, USA; Jiri Jan, Czech Republic; Magnus Jansson, Sweden; Sudharman K. Jayaweera, USA; Soren Holdt Jensen, Denmark; Mark Kahrs, USA; Moon Gi Kang, South Korea; Walter Kellermann, Germany; Lisimachos P. Kondi, Greece; Alex Chichung Kot, Singapore; Ercan E. Kuruoglu, Italy; Tan Lee, China; Geert Leus, The Netherlands; T.-H. Li, USA; Husheng Li, USA; Mark Liao, Taiwan; Y.-P. Lin, Taiwan; Shoji Makino, Japan; Stephen Marshall, UK; C. Mecklenbräuker, Austria; Gloria Menegaz, Italy; Ricardo Merched, Brazil; Marc Moonen, Belgium; Christophoros Nikou, Greece; Sven Nordholm, Australia; Patrick Oonincx, The Netherlands; Douglas O’Shaughnessy, Canada; Björn Ottersten, Sweden; Jacques Palicot, France; Ana Perez-Neira, Spain; Wilfried R. Philips, Belgium; Aggelos Pikrakis, Greece; Ioannis Psaromiligkos, Canada; Athanasios Rontogiannis, Greece; Gregor Rozinaj, Slovakia; Markus Rupp, Austria; William Sandham, UK; B. Sankur, Turkey; Erchin Serpedin, USA; Ling Shao, UK; Dirk Slock, France; Yap-Peng Tan, Singapore; João Manuel R. S. Tavares, Portugal; George S. Tombras, Greece; Dimitrios Tzovaras, Greece; Bernhard Wess, Austria; Jar-Ferr Yang, Taiwan; Azzedine Zerguine, Saudi Arabia; Abdelhak M. Zoubir, Germany
Contents
Microphone Array Speech Processing, Sven Nordholm, Thushara Abhayapala, Simon Doclo, Sharon Gannot (EURASIP Member), Patrick Naylor, and Ivan Tashev Volume 2010, Article ID 694216, 3 pages
Selective Frequency Invariant Uniform Circular Broadband Beamformer, Xin Zhang, Wee Ser, Zhang Zhang, and Anoop Kumar Krishna Volume 2010, Article ID 678306, 11 pages
First-Order Adaptive Azimuthal Null-Steering for the Suppression of Two Directional Interferers, René M. M. Derkx Volume 2010, Article ID 230864, 16 pages
Musical-Noise Analysis in Methods of Integrating Microphone Array and Spectral Subtraction Based on Higher-Order Statistics, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano, and Kazunobu Kondo Volume 2010, Article ID 431347, 25 pages
Microphone Diversity Combining for In-Car Applications, Jürgen Freudenberger, Sebastian Stenzel, and Benjamin Venditti Volume 2010, Article ID 509541, 13 pages
DOA Estimation with Local-Peak-Weighted CSP, Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura Volume 2010, Article ID 358729, 9 pages
Shooter Localization in Wireless Microphone Networks, David Lindgren, Olof Wilsson, Fredrik Gustafsson, and Hans Habberstad Volume 2010, Article ID 690732, 11 pages
Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2010, Article ID 694216, 3 pages
doi:10.1155/2010/694216
Editorial
Microphone Array Speech Processing
Sven Nordholm (EURASIP Member),1 Thushara Abhayapala (EURASIP Member),2 Simon Doclo (EURASIP Member),3 Sharon Gannot (EURASIP Member),4 Patrick Naylor (EURASIP Member),5 and Ivan Tashev6
1 Department of Electrical and Computer Engineering, Curtin University of Technology, Perth, WA 6845, Australia
2 College of Engineering & Computer Science, The Australian National University, Canberra, ACT 0200, Australia
3 Institute of Physics, Signal Processing Group, University of Oldenburg, 26111 Oldenburg, Germany
4 School of Engineering, Bar-Ilan University, 52900 Tel Aviv, Israel
5 Department of Electrical and Electronic Engineering, Imperial College, London SW7 2AZ, UK
6 Microsoft Research, USA
Correspondence should be addressed to Sven Nordholm, [email protected]
Received 21 July 2010; Accepted 21 July 2010
Copyright © 2010 Sven Nordholm et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Significant knowledge about microphone arrays has been gained from years of intense research and product development. Numerous applications have been suggested, ranging from large arrays (in the order of >100 elements) for use in auditoriums to small arrays with only 2 or 3 elements for hearing aids and mobile telephones. Apart from that, microphone array technology has been widely applied in speech recognition, surveillance, and warfare. Traditional techniques that have been used for microphone arrays include fixed spatial filters, such as frequency invariant beamformers, and optimal and adaptive beamformers. These array techniques assume either model knowledge or calibration signal knowledge as well as localization information for their design. Thus they usually combine some form of localisation and tracking with the beamforming. Today, contemporary techniques using blind signal separation (BSS) and time-frequency masking have attracted significant attention. Those techniques are less reliant on array model and localization, but more on the statistical properties of speech signals such as sparseness, non-Gaussianity, and non-stationarity. The main advantage that multiple microphones add from a theoretical perspective is the spatial diversity, which is an effective tool to combat interference, reverberation, and noise. The underpinning physical feature used is a difference in coherence in the target field (speech signal) versus the noise field. Viewing the processing in this way, one can also understand the difficulty in enhancing highly reverberant speech, given that we can only observe the received microphone signals.

This special issue contains contributions to traditional areas of research such as frequency invariant beamforming [1], hands-free operation of microphone arrays in cars [2], and source localisation [3]. The contributions show new ways to study these traditional problems and give new insights into them. Small arrays have always had many applications and attracted interest for mobile terminals, hearing aids, and close-up microphones [4]. The novel way to represent small arrays leads to a capability to suppress multiple interferers. Abnormalities in noise and speech stemming from processing are largely unavoidable, and nonlinear processing often results in a significant change of character, particularly in the noise. It is thus important to provide new insights into those phenomena, particularly the so-called musical noise [5]. Finally, new and unusual uses of microphone arrays are always interesting to see. Distributed microphone arrays in a sensor network [6] provide a novel approach to finding snipers. This type of processing has good opportunities to grow in interest for new and improved applications.

The contributions found in this special issue can be categorized into three main aspects of microphone array processing: (i) microphone array design based on eigenmode decomposition [1, 4]; (ii) multichannel processing methods [2, 5]; and (iii) source localisation [3, 6].
The paper by Zhang et al., “Selective frequency invariant uniform circular broadband beamformer” [1], describes a design method for Frequency-Invariant (FI) beamforming. FI beamforming is a well-known array signal processing technique used in many applications such as speech acquisition, acoustic imaging, and communications. However, many existing FI beamformers are designed to have a frequency invariant gain over all angles. This might not be necessary, and if a gain constraint is confined to a specific angle, then the FI performance over that selected region (in frequency and angle) can be expected to improve. Inspired by this idea, the proposed algorithm attempts to optimize the frequency invariant beampattern solely for the mainlobe and relax the FI requirement on the sidelobes. This sacrifice on performance in the undesired region is traded off for better performance in the desired region as well as a reduced number of microphones employed. The objective function is designed to minimize the overall spatial response of the beamformer with a constraint on the gain being smaller than a predefined threshold value across a specific frequency range and at a specific angle. This problem is formulated as a convex optimization problem and the solution is obtained by using the Second-Order Cone Programming (SOCP) technique. An analysis of the computational complexity of the proposed algorithm is presented as well as its performance. The performance is evaluated via computer simulation for different numbers of sensors and different threshold values. Simulation results show that the proposed algorithm is able to achieve a smaller mean square error of the spatial response gain for the specific FI region compared to existing algorithms.

The paper by Derkx, “First-order adaptive azimuthal null-steering for the suppression of two directional interferers” [4], shows that an azimuth steerable first-order superdirectional microphone response can be constructed by a linear combination of three eigenbeams: a monopole and two orthogonal dipoles. Although a (rotation symmetric) first-order response can only exhibit a single null, the paper studies a slice through this beampattern lying in the azimuthal plane. In this way, a maximum of two nulls in the azimuthal plane can be defined. These nulls are symmetric with respect to the main-lobe axis. By placing these two nulls on at most two directional sources to be rejected and compensating for the drop in level for the desired direction, these directional sources can be effectively rejected without attenuating the desired source. An adaptive null-steering scheme for adjusting the beampattern, which enables automatic source suppression, is presented. Closed-form expressions for this optimal null-steering are derived, enabling the computation of the azimuthal angles of the interferers. It is shown that the proposed technique has a good directivity index when the angular difference between the desired source and each directional interferer is at least 90 degrees.

In the paper by Takahashi et al., “Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics” [5], an objective analysis of musical noise is conducted. The musical noise is generated by two methods of integrating microphone array signal processing and spectral subtraction. To obtain better noise reduction, methods of integrating microphone array signal processing and nonlinear signal processing have been researched. However, nonlinear signal processing often generates musical noise. Since such musical noise causes discomfort to users, it is desirable that musical noise is mitigated. Moreover, it has been recently reported that higher-order statistics are strongly related to the amount of musical noise generated. This implies that it is possible to optimize the integration method from the viewpoint of not only noise reduction performance but also the amount of musical noise generated. Thus, the simplest methods of integration, that is, the delay-and-sum beamformer and spectral subtraction, are analysed and the features of musical noise generated by each method are clarified. As a result, it is clarified that a specific structure of integration is preferable from the viewpoint of the amount of generated musical noise. The validity of the analysis is shown via a computer simulation and a subjective evaluation.

The paper by Freudenberger et al., “Microphone diversity combining for in-car applications” [2], proposes a frequency domain diversity approach for two or more microphone signals, for example, for in-car applications. The microphones should be positioned separately to ensure diverse signal conditions and incoherent recording of noise. This enables a better compromise for the microphone position with respect to different speaker sizes and noise sources. This work proposes a two-stage approach: in the first stage, the microphone signals are weighted with respect to their signal-to-noise ratio and then summed, similar to maximum-ratio combining. The combined signal is then used as a reference for a frequency domain least-mean-squares (LMS) filter for each input signal. The output SNR is significantly improved compared to coherence-based noise reduction systems, even if one microphone is heavily corrupted by noise.

The paper by Ichikawa et al., “DOA estimation with local-peak-weighted CSP” [3], proposes a novel weighting algorithm for Cross-power Spectrum Phase (CSP) analysis to improve the accuracy of direction of arrival (DOA) estimation for beamforming in a noisy environment. A human speaker is used as the sound source, and broadband automobile noise is used as the noise source. The harmonic structures in the human speech spectrum can be used for weighting the CSP analysis, because harmonic bins must contain more speech power than the others and thus give us more reliable information. However, most conventional methods leveraging harmonic structures require pitch estimation with voiced-unvoiced classification, which is not sufficiently accurate in noisy environments. The suggested approach employs the observed power spectrum, which is directly converted into weights for the CSP analysis by retaining only the local peaks considered to be coming from a harmonic structure. The presented results show that the proposed approach significantly reduces the errors in localization, and it also shows further improvement when used with other weighting algorithms.

The paper by Lindgren et al., “Shooter localization in wireless microphone networks” [6], is an interesting combination of microphone array technology with distributed communications. By detecting the muzzle blast as well as the ballistic shock wave, the microphone array algorithm is able to locate the shooter in the case when the sensors are synchronized. However, in the distributed sensor case, synchronization is either not achievable or very expensive to achieve, and therefore the accuracy of localization comes into question. Field trials are described to support the algorithmic development.

Sven Nordholm
Thushara Abhayapala
Simon Doclo
Sharon Gannot
Patrick Naylor
Ivan Tashev
References
[1] X. Zhang, W. Ser, Z. Zhang, and A. K. Krishna, “Selective frequency invariant uniform circular broadband beamformer,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 678306, 11 pages, 2010.
[2] J. Freudenberger, S. Stenzel, and B. Venditti, “Microphone diversity combining for in-car applications,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 509541, 13 pages, 2010.
[3] O. Ichikawa, T. Fukuda, and M. Nishimura, “DOA estimation with local-peak-weighted CSP,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 358729, 9 pages, 2010.
[4] R. M. M. Derkx, “First-order adaptive azimuthal null-steering for the suppression of two directional interferers,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 230864, 16 pages, 2010.
[5] Yu. Takahashi, H. Saruwatari, K. Shikano, and K. Kondo, “Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 431347, 25 pages, 2010.
[6] D. Lindgren, O. Wilsson, F. Gustafsson, and H. Habberstad, “Shooter localization in wireless sensor networks,” in Proceedings of the 12th International Conference on Information Fusion (FUSION ’09), pp. 404–411, July 2009.

Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2010, Article ID 678306, 11 pages
doi:10.1155/2010/678306
Research Article
Selective Frequency Invariant Uniform Circular Broadband Beamformer
Xin Zhang,1 Wee Ser,1 Zhang Zhang,1 and Anoop Kumar Krishna2
1 Center for Signal Processing, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
2 EADS Innovation Works, EADS Singapore Pte Ltd., No. 41, Science Park Road, 01-30, Singapore 117610
Correspondence should be addressed to Xin Zhang, zhang [email protected]
Received 16 April 2009; Revised 24 August 2009; Accepted 3 December 2009
Academic Editor: Thushara Abhayapala
Copyright © 2010 Xin Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Frequency-Invariant (FI) beamforming is a well-known array signal processing technique used in many applications. In this paper, an algorithm is proposed that attempts to optimize the frequency invariant beampattern solely for the mainlobe and relaxes the FI requirement on the sidelobes. This sacrifice on performance in the undesired region is traded off for better performance in the desired region as well as a reduced number of microphones employed. The objective function is designed to minimize the overall spatial response of the beamformer with a constraint on the gain being smaller than a predefined threshold value across a specific frequency range and at a specific angle. This problem is formulated as a convex optimization problem and the solution is obtained by using the Second-Order Cone Programming (SOCP) technique. An analysis of the computational complexity of the proposed algorithm is presented as well as its performance. The performance is evaluated via computer simulation for different numbers of sensors and different threshold values. Simulation results show that the proposed algorithm is able to achieve a smaller mean square error of the spatial response gain for the specific FI region compared to existing algorithms.
1. Introduction

Broadband beamforming techniques using an array of microphones have been applied widely in hearing aids, teleconferencing, and voice-activated human-computer interface applications. Several broadband beamformer designs have been reported in the literature [1–3]. One design approach is to decompose the broadband signal into several narrowband signals and apply narrowband beamforming techniques to each narrowband signal [4]. This approach requires several narrowband processing operations to be conducted simultaneously and is computationally expensive. Another design approach is to use adaptive broadband beamformers. Such techniques use a bank of linear transversal filters to generate the desired beampattern. The filter coefficients can be derived adaptively from the received signals. One classic design example is the Frost Beamformer [5]. However, in order to have a similar beampattern over the entire frequency range, a large number of sensors and filter taps will be needed. This again leads to high computational complexity. The third approach to designing broadband beamformers is to use the Frequency-Invariant (FI) beampattern synthesis technique. As the name implies, such beamformers are designed to have a constant spatial gain response over the desired frequency bands.

Over recent years, FI beamforming techniques have developed at a fast pace. It is difficult to make a distinct classification. However, in order to grasp the literature on FI beamforming at a glance, we classify them loosely into the following three types.

One type of FI beamformers includes those that focus on the design based on array geometry. These include, for example, the 3D sensor array design reported in [6], the rectangular sensor array design reported in [7], and the design using subarrays in [8]. In [9], the FI beampattern is achieved by exploiting the relationship among the frequency responses of the various filters implemented at the output of each sensor.

The second type of FI beamformers is designed on the basis of a least-squares approach. For this type of FI beamformers, the weights of the beamformer are optimized such that the error between the actual beampattern and the desired beampattern is minimized over a range of frequencies. Some of these beamformers are designed in the time-frequency domain [10–12], while others are designed in the eigen-space domain [13].

The third type of FI beamformers is designed based on “Signal Transformation.” For this type of beamformers, the signal received at the sensor array is transformed into a domain such that the frequency response and the spatial response of the signal can be decoupled and hence adjusted independently. This is the principle adopted in [14], where a uniform concentric circular array (UCCA) is designed to achieve the FI beampattern. Excellent results have been produced by this algorithm. One limitation of the UCCA beamformer is that a relatively large number of sensors have to be used to form the concentric circular array.

Inspired by the UCCA beamformer design, a new algorithm has been proposed by the authors of this paper and presented in [15]. The proposed algorithm attempts to optimize the FI beampattern solely for the main lobe, where the signal of interest comes from, and relaxes the FI requirement on the side lobes. As a result, the sacrifice on performance in the undesired region is traded off for better performance in the desired region, and fewer microphones are employed. To achieve this goal, an objective function with a quadratic constraint is designed. This constraint function allows the FI characteristic to be accurately controlled over the specified bandwidth at the expense of other parts of the spectrum which are not of concern to the designer. This objective function is formulated into a convex optimization problem and readily solved by SOCP. Our algorithm has a frequency band of interest from 0.3π to 0.95π. If the sampling frequency is 16000 Hz, the frequency band of interest ranges from 2400 Hz to 7600 Hz. This algorithm can be applied in speech processing, as the labial and fricative sounds of speech mostly lie in the 8th to 9th octave. If the sampling frequency is 8000 Hz, the frequency band of interest is from 1200 Hz to 3800 Hz. This frequency range is useful for respiratory sounds [16].

The aim of this paper is to provide the full details of the design proposed in [15]. In addition, a computational complexity analysis of the proposed algorithm and the sensitivity performance evaluations at different numbers of sensors and different constraint parameter values are also included.

The remainder of the paper is organized as follows: in Section 2, the problem formulation is discussed; in Section 3, the proposed beamforming design is described; in Section 4, the design of the beamforming weights using SOCP is shown; numerical results are given in Section 5, and finally, conclusions are drawn in Section 6.

2. Problem Formulation

A uniformly distributed circular sensor array with $K$ microphones is arranged as shown in Figure 1. Each omnidirectional sensor is located at $(r\cos\phi_k, r\sin\phi_k)$, where $r$ is the radius of the circle, $\phi_k = 2k\pi/K$, and $k = 0, \ldots, K-1$. In this configuration, the intersensor spacing is fixed at $\lambda/2$, where $\lambda$ is the wavelength of the signals of interest and its minimum value is denoted by $\lambda_{\min}$. The radius corresponding to $\lambda_{\min}$ is given by [14]

$$r = \frac{\lambda_{\min}}{4\sin(\pi/K)}. \tag{1}$$

Assuming that the circular array is on a horizontal plane, the steering vector is

$$\mathbf{a}(f, \phi) = \left[e^{j2\pi f r\cos(\phi-\phi_0)/c}, \ldots, e^{j2\pi f r\cos(\phi-\phi_{K-1})/c}\right]^{T}, \tag{2}$$

where $T$ denotes transpose. For convenience, let $\omega$ be the normalized angular frequency, that is, $\omega = 2\pi f/f_s$, let $\rho$ be the ratio of the sampling frequency and the maximum frequency, that is, $\rho = f_s/f_{\max}$, and let $\hat{r}$ be the normalized radius, that is, $\hat{r} = r/\lambda_{\min}$. The steering vector can then be rewritten as

$$\mathbf{a}(\omega, \phi) = \left[e^{j\omega\rho\hat{r}\cos(\phi-\phi_0)}, \ldots, e^{j\omega\rho\hat{r}\cos(\phi-\phi_{K-1})}\right]^{T}. \tag{3}$$

Figure 2 shows the system structure of the proposed uniform circular array beamformer. The sampled signals after the sensors are represented by the vector $\mathbf{X}[n] = [x_0[n], x_1[n], \ldots, x_{K-1}[n]]^{T}$, where $n$ is the sampling instance. These sampled signals are transformed into a set of coefficients via the Inverse Discrete Fourier Transform (IDFT), where each of the coefficients is called a phase mode [17]. The $m$th phase mode at time instance $n$ can be expressed as

$$p_m[n] = \sum_{k=0}^{K-1} x_k[n]\, e^{j2\pi km/K}. \tag{4}$$

These phase modes are passed through an FIR (Finite Impulse Response) filter whose coefficients are denoted as $b_m[n]$. The purpose of this filter is to remove the frequency dependency of the received signal $\mathbf{X}[n]$. The beamformer output $y[n]$ is then determined as the weighted sum of the filtered signals:

$$y[n] = \sum_{m=-L}^{L} \left(p_m[n] \ast b_m[n]\right) \cdot h_m, \tag{5}$$

where $h_m$ are the phase spatial weighting coefficients, or beamforming weights, and $\ast$ is the discrete-time convolution operator.

Let $M$ be the total number of phase modes, which is assumed to be an odd number. It can be seen from Figure 2 that the $K$ received signals are transformed into $M$ phase modes, where $L = (M-1)/2$.

The corresponding spectrum of the phase modes can be obtained by taking the Discrete Time Fourier Transform (DTFT) of the phase modes defined in (4):

$$P_m(\omega) = \sum_{k=0}^{K-1} X_k(\omega)\, e^{j2\pi km/K} = S(\omega) \cdot \sum_{k=0}^{K-1} e^{j\omega\rho\hat{r}\cos(\phi-\phi_k)}\, e^{j2\pi km/K}, \tag{6}$$

where $S(\omega)$ is the spectrum of the source signal.

Taking the DTFT on both sides of (5) and using (6), we have