Impulsive Sound Detection Directly in Sigma-Delta Domain
ARCHIVES OF ACOUSTICS
Vol. 42, No. 2, pp. 255–261 (2017)
Copyright © 2017 by PAN – IPPT
DOI: 10.1515/aoa-2017-0028

Igor D. dos S. MIRANDA, Antonio C. de C. LIMA
Department of Industrial Engineering Postgraduate
Federal University of Bahia
2 Aristides Novis Street, 6th Floor, Federação, Salvador, Bahia 40.210-630, Brazil

(received September 26, 2016; accepted February 1, 2017)

Recent implementations of Sigma-Delta (ΣΔ) converters have achieved low cost, low power consumption, and high integration while maintaining resolution as high as in Nyquist-rate converters. However, their usage implies demodulating the source signal delivered from ΣΔ modulation to Pulse-Code Modulation (PCM) in a pre-processing stage. This work proposes an algorithm based on the Discrete Cosine Transform for impulsive signal detection, to be applied directly to a modulated ΣΔ bitstream, aiming to reduce computational cost in acoustic event detection applications such as gunshot recognition systems. Using pre-recorded impulsive sounds in ΣΔ format, it has been shown that the new method presents an error rate similar to that of traditional energy-based approaches in PCM while significantly reducing the number of operations per unit time.

Keywords: impulsive signal detection; sigma-delta modulation; discrete cosine transform.

1. Introduction

Impulsive sound signals are a category of audio signals characterised by an abrupt rise in amplitude, immediately followed by an under-damped motion or exponential decay, as shown in Fig. 1. Claps, glass breaking, hammering, and door slams are examples of everyday impulsive sound sources. When an impulsive signal reaches high power values, it may indicate abnormal events in several environments and human tasks, such as building collapses, gunfire, and explosions in factories. Systems to automatically detect these signals may be useful for monitoring emergency situations.

Fig. 1. Impulsive sound signal waveform of 0.40 semi-automatic pistol gunshot sampled at an 8 kHz rate.

Impulsive signal detection has been addressed by several works based on a variety of approaches, including power (or amplitude) analysis, Wavelet Transform, correlation, and statistical methods (Chacón-Rodríguez et al., 2011; Kauppinen, 2002). Techniques that rely on power assessment have presented lower computational cost with acceptable performance compared to statistical approaches.

These power-based methods are typically implemented by means of a power estimator followed by an onset detector, setting thresholds to activate the detector when limits for power and power slope are reached simultaneously (Chacón-Rodríguez et al., 2011; Dufaux, 2001). Better results were obtained by implementing adaptive thresholds (Dufaux, 2001; Sharkey et al., 1996; Showen, Dunham, 1999). Onset may be found through derivatives, median filters, and other methods applied to the power signal (Dufaux, 2001; Kauppinen, 2002).

In general, impulse detection systems that address explosion and impact monitoring in large areas are composed of a central processing unit (CPU; a powerful server-like machine) and an array of acoustic sensor nodes, whose typical architecture contains a microphone, a microcontroller unit (MCU), and a transceiver (Wessels, Basten, 2016). When an impulsive signal is detected in an acoustic sensor, the audio is transmitted to the CPU, where a pattern classifier is run. If an emergency is detected, signals from different sensors are used to triangulate the source.

Although the CPU performs the most computationally demanding tasks and runs important decision algorithms, sensor nodes have a bigger impact on system performance and cost. Sensors are spread throughout the monitored area such that there are intersection regions to allow triangulation. The overall system cost is then directly linked to the trade-off between microphone quality, impulse detection efficiency, node separation, and MCU cost.

Categories of converters and microphones that present features compatible with those mentioned for sensor nodes are ΣΔ ADCs and micro-electromechanical systems (MEMS) microphones, respectively. Both have been used in a broad range of applications due to their low cost, high quality, and high integration. Currently, semiconductor companies manufacture the MEMS microphone, pre-amplifier, and ΣΔ ADC in a single integrated circuit.

ΣΔ modulation, also known as Pulse-Density Modulation (PDM), represents analog amplitude levels as a density of bits in a bitstream. The modulation process can be modelled as an oversampled ADC (in the MHz range), a single-bit quantiser, and an error feedback loop. This feedback implements the noise shaping technique, which pushes the quantisation noise generated by the low resolution to higher frequencies and leaves the baseband almost intact. Therefore, conversion from ΣΔ to PCM can be performed by means of a low-pass filter followed by a downsampler.

An efficient method to filter and decimate with low computational cost, often used in ΣΔ demodulation, is the Cascaded-Integrator Comb (CIC) filter. CIC decimators are implemented only with adders, decreasing the total number of operations substantially (Park, 1991). However, their passbands are not flat, requiring an additional compensation FIR stage. Figure 2 illustrates a CIC decimator structure.

Fig. 2. CIC decimator structure for converting from ΣΔ to PCM.

In this work, we propose an algorithm to detect impulsive sound signals directly in ΣΔ format. It has been shown that the proposed technique, which is based on DCT, reduces computational cost without increasing the error rate when compared to the reference architecture described in Sec. 2. The final goal is to enable sensor nodes to be implemented using MEMS microphones with built-in ΣΔ converters, taking all of their advantages as well as relaxing MCU requirements.

2. Reference architecture for impulse detection in ΣΔ signals

Throughout this work, the system depicted in Fig. 3 has been chosen as the reference architecture for impulse detection in ΣΔ modulated signals. The modules presented in Fig. 3 are described in this section and comprise three processing stages: a ΣΔ demodulator, a power estimator, and an abrupt change detector.

Fig. 3. Reference architecture for this work.

The ΣΔ demodulator has been implemented using the CIC filter structure shown in Fig. 2. The number of integrator and comb pairs must be chosen according to demodulator requirements and signal characteristics, such as oversampling ratio, noise shape, output bits, and phase linearity. Fourth-order CIC filters are suitable for most ΣΔ demodulation applications.

Since impulsive sounds are generally aperiodic, signal power has been estimated by successively computing the time average of the energy within $L_P$-sample windows, as given by:

$$E_{avg}[k] = \frac{1}{L_P} \sum_{n=0}^{L_P-1} x^2[kL_P + n]. \tag{1}$$

The abrupt change detector has been implemented based on the Conditional Median Filter (CMF), introduced by Kasparis et al. (1992). Firstly, in CMF, a median filter is applied to the power estimate sequence delivered by the previous stage, as in the expression below:

$$MF[k] = \mathrm{median}\{E_{avg}[i] \mid i = k - L_M - 1, \ldots, k\}. \tag{2}$$

Then, a threshold is applied to the median filter output, aiming to recover the power sequence while removing impulsive noise by detecting abrupt transitions, as follows:

$$CMF[k] = \begin{cases} MF[k], & \text{if } |MF[k] - E_{avg}[k-d]| < R_{th}, \\ E_{avg}[k-d], & \text{otherwise,} \end{cases} \tag{3}$$

where $d = (L_M - 1)/2$ and $R_{th}$ is a threshold for abrupt transitions. In the case of an adaptive threshold, this result might be used to estimate the long-term background noise power. For impulse detection, a binary version of CMF has been implemented as follows:

$$BCMF[k] = \begin{cases} 0, & \text{if } |MF[k] - E_{avg}[k-d]| < R_{th}, \\ 1, & \text{otherwise.} \end{cases} \tag{4}$$

Therefore, the reference architecture has been implemented using (1), (2), and (4). In these equations, $L_P$ and $L_M$ determine the maximum impulse width detectable by the system, since the median filter only removes impulses with duration shorter than $L_M/2$. $R_{th}$ must be chosen according to the average power of both impulses and background noise.

A simulation example for the reference architecture presented in this section is shown in Fig. 4. The generated input signal has two impulsive events and a sinusoid fragment between them (Fig. 4a). Note in Fig. 4b that although the impulses and the sinusoid have their power represented by the power estimator, impulses are removed from the median filter's output.

[…] to Parseval's theorem, this abrupt energy transition in the time domain must be reflected by an abrupt energy transition in the frequency domain, if Fourier transforms are taken immediately before and after the impulse onset.

In order to detect impulses in a ΣΔ signal, only the energy in a narrow frequency range should be observed, due to its high-frequency quantisation noise. For example, most gunshots have their energy concentrated below 2 kHz, with peaks ranging from 500 to 600 Hz (Graves, 2012; Millet, Baligand, 2006). This represents less than 1% of the spectrum in a ΣΔ signal sampled at 512 kHz, making frequency selection a hard task to be performed in the discrete time domain, as discussed in Sec. 1. In the frequency domain, a straightforward strategy would be to use a Discrete Fourier Transform (DFT). However, calculating a complete DFT for a wideband signal when only a few terms are of interest is a very complex and wasteful operation.

Simpler and more efficient mathematical tools to represent the energy content of a signal are the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) (Ahmed et al., 1974; Britanak et al., 2010; Rao, Yip, 2014). Both transformations create a modified magnitude/frequency signal representation by correlating it, over time, with cosine and sine functions,
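As an illustration of the reference architecture of Sec. 2, the power estimator of Eq. (1) and the binary conditional median filter of Eqs. (2) and (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `power_estimate` and `bcmf`, the window lengths, and the threshold value are hypothetical; the input is assumed to be PCM samples already demodulated by the CIC stage (which is not modelled here); and the median window is taken as the $L_M$ most recent power estimates so that the delay $d = (L_M - 1)/2$ points at the window centre, which may differ from a literal reading of the bounds in Eq. (2).

```python
from statistics import median


def power_estimate(x, lp):
    """Eq. (1): mean energy of consecutive, non-overlapping lp-sample windows."""
    n_windows = len(x) // lp
    return [
        sum(s * s for s in x[k * lp:(k + 1) * lp]) / lp
        for k in range(n_windows)
    ]


def bcmf(e_avg, lm, r_th):
    """Eqs. (2) and (4): flag power estimates that deviate from the local median.

    Assumption: the median window holds the lm most recent estimates,
    and no decision (flag 0) is taken until the window is full.
    """
    d = (lm - 1) // 2  # delay to the centre of the median window
    flags = []
    for k in range(len(e_avg)):
        if k < lm - 1:
            flags.append(0)  # window not yet full
            continue
        mf = median(e_avg[k - lm + 1:k + 1])
        flags.append(0 if abs(mf - e_avg[k - d]) < r_th else 1)
    return flags


# Illustrative input: low-level background with one short burst (made-up values).
x = [0.1] * 40
x[20:24] = [1.0, -0.8, 0.6, -0.4]

e_avg = power_estimate(x, lp=4)       # ten power estimates, burst in window 5
flags = bcmf(e_avg, lm=5, r_th=0.25)  # hypothetical threshold
print(flags)
```

In this sketch the flag is raised $d$ windows after the burst, because the comparison in Eq. (4) uses the delayed estimate $E_{avg}[k-d]$. A sustained rise in power (such as the sinusoid fragment mentioned for Fig. 4) is followed by the median and is therefore not flagged.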