A Deep Learning Approach to Active Noise Control
Hao Zhang (1), DeLiang Wang (1, 2)
(1) Department of Computer Science and Engineering, The Ohio State University, USA
(2) Center for Cognitive and Brain Sciences, The Ohio State University, USA
fzhang.6720, [email protected]

Abstract

We formulate active noise control (ANC) as a supervised learning problem and propose a deep learning approach, called deep ANC, to address the nonlinear ANC problem. A convolutional recurrent network (CRN) is trained to estimate the real and imaginary spectrograms of the canceling signal from the reference signal so that the corresponding anti-noise can eliminate or attenuate the primary noise in the ANC system. Large-scale multi-condition training is employed to achieve good generalization and robustness against a variety of noises. The deep ANC method can be trained to achieve active noise cancellation no matter whether the reference signal is noise or noisy speech. In addition, a delay-compensated strategy is introduced to address the potential latency problem of ANC systems. Experimental results show that the proposed method is effective for wide-band noise reduction and generalizes well to untrained noises. Moreover, the proposed method can be trained to achieve ANC within a quiet zone.

Index Terms: Active noise control, deep learning, deep ANC, spatial ANC, nonlinear distortion

1. Introduction

Active noise control is a noise cancellation methodology based on the principle of superposition of acoustic signals. The goal of ANC systems is to generate an anti-noise with the same amplitude and opposite phase of the primary (unwanted) noise in order to cancel or attenuate the primary noise [1]. Traditionally, an active noise controller is implemented using adaptive filters in a recursive way to optimize filter characteristics by minimizing an error signal. Filtered-x least mean square (FxLMS) and its extensions are the most widely used active noise controllers due to their simplicity, robustness and relatively low computational load [2]. However, nonlinear distortions are inevitably introduced to the anti-noise in applications of ANC due to the limited quality of electronic devices such as amplifiers and loudspeakers. LMS based methods are fundamentally linear and fail to identify the underlying filter accurately in the presence of nonlinearities. Even a small nonlinearity can have a significant, negative impact on the FxLMS behavior [3].

Many adaptive nonlinear ANC algorithms have been proposed to address nonlinear distortions. The Volterra expansion [4, 5] and tangential hyperbolic function based FxLMS (THF-FxLMS) [6] have been shown to be effective for modeling mild nonlinearities for nonlinear ANC. Other algorithms such as bilinear FxLMS, filtered-s LMS, and leaky FxLMS have been investigated to address nonlinearity [7]. However, their performance is limited in the presence of strong nonlinearities. Neural networks have also been introduced to address nonlinear ANC [8], considering their ability in handling nonlinear relations. A multilayer perceptron is introduced in [9] for active control of vibrations. The studies in [10] and [11] use functional link artificial neural network to handle the nonlinear effect in ANC. Other nonlinear adaptive models such as radial basis function networks [12], fuzzy neural networks [13], and recurrent neural networks [14] have been developed to further improve the ANC performance. These neural network architectures for nonlinear ANC utilize online adaptation or training to characterize an optimal controller and thus can still be regarded as adaptive algorithms.

ANC aims to output a canceling signal to eliminate or attenuate the primary noise. In this paper, we propose a new approach, named deep ANC, to address ANC, particularly the nonlinear ANC problems. Deep learning is capable of modeling complex nonlinear relationships and can potentially play an important role in addressing nonlinear ANC problems. Specifically, a convolutional recurrent network (CRN) [15] is trained to estimate the real and imaginary spectrograms of a canceling signal from the reference signal. The subsequent anti-noise is obtained by passing the canceling signal through a loudspeaker and secondary path. Finally, the error signal is used to calculate the loss function for training the CRN model.

To the best of our knowledge, this paper represents the first study to formulate ANC as a supervised learning problem and use deep learning to address it. Our study makes four main contributions. First, complex spectral mapping is employed to estimate both magnitude and phase responses for accurate estimation [16, 17], and large-scale multi-condition training is used to attenuate a variety of noises and cope with the variations in acoustic environments. Second, in addition to attenuating noise from the noise input, we propose to train deep ANC to selectively attenuate the noise components of a noisy speech signal and let the underlying speech pass through. Namely, deep ANC in principle is able to maintain the target signal embedded in noise by selectively canceling the noise components of the noisy signal. Third, we introduce a delay-compensated training strategy to tackle a shortcoming of frequency-domain ANC algorithms: processing latency. Fourth, we expand deep ANC to perform ANC within a small spatial zone (in order to produce a quiet zone). This is a more useful but more challenging task compared to ANC at a given spatial location.

The remainder of this paper is organized as follows. Section 2 presents the deep ANC approach. Evaluation metrics and experimental results are shown in Section 3. Section 4 concludes the paper.

2. Deep ANC

2.1. Signal model

A typical feedforward ANC system is shown in Figure 1, and it consists of a reference microphone, a canceling loudspeaker, and an error microphone. The reference signal x(t) is picked up by a reference microphone. The canceling signal y(t) generated by the ANC is passed through the canceling loudspeaker and the secondary path to get the anti-noise a(t). The corresponding error signal sensed by the error microphone is defined as:

$$
\begin{aligned}
e(t) &= d(t) - a(t) \\
     &= p(t) * x(t) - s(t) * f_{LS}\{ w^{T}(t)\, x(t) \}
\end{aligned}
\tag{1}
$$

where t is the time index, d(t) is the primary signal received at the error microphone, w(t) represents the active noise controller, f_LS{·} denotes the transfer function of the loudspeaker, ∗ denotes linear convolution, and the superscript T means transpose. Furthermore, p(t) and s(t) denote the impulse responses of the primary and secondary path, respectively.

Figure 1: Diagram of a single-channel feedforward ANC system, where P(z) and S(z) denote the frequency responses of primary path and secondary path, respectively. (Blocks shown: reference microphone, error microphone, canceling loudspeaker, ANC.)

Adaptive algorithms alleviate the effect of the secondary path by filtering the reference signal with an estimate of the secondary path Ŝ(z) before feeding it to the controller [18]. The secondary path is usually estimated during an initial stage with separate procedures and the performance of ANC methods depends largely on the accuracy of Ŝ(z) estimation.

Figure 2: Diagram of (a) the deep ANC approach, and (b) CRN based deep ANC. (Blocks shown in (b): encoder, LSTM, decoder.)

[...] target should be for a deep neural network (DNN). Although the ideal canceling signal for attenuating a primary noise is known, it cannot be used directly as the desired output of the DNN due to the existence of the loudspeaker and the secondary path (see Figure 2). Second, the primary and secondary paths can be time-varying and the transfer function that the DNN needs to approximate can be different for different acoustic conditions. This seems to imply that a supervised learning model needs to predict a one-to-many mapping, an impossible job. These obstacles may explain why ANC has not been approached from the deep learning standpoint. However, as detailed in the next section, we have access to the ideal anti-noise to supervise DNN training, and the DNN can be trained to estimate, for a given input, some average of the different outputs for different scenarios. With these observations, ANC can be formulated as a deep learning task.

2.3. Feature extraction and training target

The reference signal x(t) is sampled at 16 kHz and divided into 20-ms frames with a 10-ms overlap between consecutive frames. Then a 320-point short time Fourier transform (STFT) is applied to each time frame to produce the real and imaginary spectrograms of x(t), which are denoted as X_r(m, c) and X_i(m, c), respectively, within a T-F unit at time m and frequency c. The proposed CRN based deep ANC takes X_r(m, c) and X_i(m, c) as input features for complex spectral mapping.

To attenuate the primary noise at the error microphone, the ideal anti-noise (the primary noise) is used as the training target. The CRN is trained to output the real and imaginary spectrograms of the canceling signal, Y_r(m, c) and Y_i(m, c), which are sent to the inverse Fourier transform to derive a waveform signal y(t). The anti-noise is then generated by passing the canceling signal through the loudspeaker and secondary path.

2.4. Two training strategies and their loss functions

Deep ANC can be trained to achieve noise cancellation no matter whether the reference signal is noise or noisy speech by using proper training data and loss functions. Two training strategies are introduced for the deep ANC in this study:

Deep ANC trained with noise n(t): We use a noise signal as the reference signal and train the deep ANC to attenuate the primary noise. The loss function is defined as: