Turning Overlap-Save Into a Multiband Mixing, Downsampling Filter Bank
Total Page:16
File Type:pdf, Size:1020Kb
[dsp TIPS&TRICKS] Mark Borgerding Turning Overlap-Save into a Multiband Mixing, Downsampling Filter Bank n this article, we show how to mixing, filtering, and decimation of mul- filter coefficients. The actual value extend the popular overlap-save tiple arbitrary channels much faster than depends on the relative strengths of the (OS) fast convolution filtering tech- direct time-domain implementation. platform in question (CPU pipelining, nique to create a flexible and compu- zero-overhead looping, memory tationally efficient bank of filters, SOMETHING OLD addressing modes, etc.). On a desktop Iwith frequency translation and decima- AND SOMETHING NEW processor with a highly optimized FFT tion implemented in the frequency The necessary conditions for vanilla- library, the value may be as low as 16. domain. In addition, we supply some tips flavored fast convolution are covered On a fixed-point DSP with a single-cycle for choosing appropriate fast Fourier pretty well in the literature. However, the multiply-accumulate instruction, the transform (FFT) size. choice of FFT size is not. Filtering multi- efficiency value can be over 50. Fast convolution is a well-known and ple channels from the same forward FFT Fast convolution refers to the block- powerful filtering technique. All but the requires special conditions not detailed wise use of circular convolution to shortest finite impulse response (FIR) in textbooks. To downsample and shift accomplish linear convolution. Fast con- filters can be implemented more effi- those channels in the frequency domain volution can be accomplished by OA or ciently in the frequency domain than requires still more conditions. OS methods. OS is also known as “over- when performed directly in the time The first section is meant to be a lap-scrap” [5]. In OA filtering, each signal domain. The longer the filter impulse quick reminder of the basics before we data block contains only as many sam- response, the greater the speed advan- extend the OS fast convolution tech- ples as allows circular convolution to be tage of fast convolution. nique. If you feel comfortable with these equivalent to linear convolution. The sig- When more than one output is fil- concepts, skip ahead. On the other hand, nal data block is zero-padded prior to the tered from a single input, some parts of if this review does not jog your memory, FFT to prevent the filter impulse the fast convolution algorithm are redun- check your favorite DSP book for “Fast response from “wrapping around” the dant. Removing this redundancy increas- Convolution,” “Overlap-Add” (OA), end of the sequence. OA filtering adds es fast convolution’s speed even more. “Overlap-Save,” or “Overlap-Scrap” the input-on transient from one block Sample rate change by decimation [1]–[5]. with the input-off transient from the pre- (downsampling) and frequency transla- vious block. tion (mixing) techniques can also be REVIEW OF FAST CONVOLUTION In OS filtering, shown in Figure 1, no incorporated efficiently in the frequency The convolution theorem tells us that zero-padding is performed on the input domain. These concepts can be combined multiplication in the frequency domain data, thus the circular convolution is not to create a flexible and efficient bank of is equivalent to convolution in the time equivalent to linear convolution. The filters. Such a filter bank can implement domain [1]. Circular convolution is portions that “wrap around” are useless achieved by multiplying two discrete and discarded. To compensate for this, Fourier transforms (DFTs) to effect con- the last part of the previous input block volution of the time sequences that the is used as the beginning of the next “DSP Tips and Tricks” introduces prac- transforms represent. By using the FFT block. OS requires no addition of tran- tical tips and tricks of design and to implement the DFT, the computa- sients, making it faster than OA. The OS implementation of signal processing tional complexity of circular convolu- filtering method is recommended as the algorithms so that you may be able tion is approximately O(Nlog 2N ) basis for the techniques outlined in the to incorporate them into your instead of O(N 2 ), as in direct linear remainder of this article. The nomencla- designs. We welcome readers who convolution. Although very small FIR ture “FFTN” in Figure 1 indicates that an enjoy reading this column to submit filters are most efficiently implemented FFT’s input sequence is zero-padded to a their contributions. Contact Associate with direct convolution, fast convolu- length of N samples, if necessary, before Editors Rick Lyons ([email protected]) tion is the clear winner as the FIR filters performing the N-point FFT. or Amy Bell ([email protected]). get longer. Conventional wisdom places For clarity, the following list defines the efficiency crossover point at 25–30 the symbols used in this article. IEEE SIGNAL PROCESSING MAGAZINE [158] MARCH 2006 1053-5888/06/$20.00©2006IEEE SYMBOL CONVENTIONS x(n) Input data sequence Overlap Input: Repeat h(n), Length = P h(n) FIR filter impulse P−1 of N Samples response x(n) y(n)=x(n)∗h(n) Convolution of x(n) FFTN and h(n) L Number of new input FFTN samples consumed per data block P Length of h(n) Discard P−1 of N Samples N = L + P − 1 FFT size −1 y(n) V = N/(P − 1) Overlap factor, the FFTN ratio of FFT length to P−1 Samples filter transient length ( ) = ( ) ∗ ( ) D Decimation factor [FIG1] Overlap-save (OS) filtering, y n x n h n . CHOICE OF FFT SIZE: COMPLEXITY IS ■ FFT accuracy—the numerical of the additional coefficients without RELATED TO FILTER LENGTH AND accuracy of fast convolution filtering increasing the computational workload. OVERLAP FACTOR is dependent on the error introduced Processing a single block of input data by the FFT-to-IFFT (inverse FFT) FREQUENCY DOMAIN that produces L outputs incurs a compu- round trip. For floating-point imple- DOWNSAMPLING: ALIASING IS tational cost related to Nlog(N) = mentations, this may be negligible, YOUR FRIEND (L + P − 1) log(L + P − 1). The compu- but fixed-point processing can lose The most intuitive method for reducing tational cost per sample is related to significant dynamic range in these sample rate in the frequency domain is (L + P − 1) log(L + P − 1)/L . The filter transforms. to simply perform a smaller IFFT using length P is generally a fixed parameter, ■ Latency—Fast convolution filtering only those frequency bins of interest. so choosing L such that this equation is process increases delay by at least L This is not a 100% replacement for time minimized gives the theoretically opti- samples. The longer the FFT, the domain downsampling. This simple mal FFT length. It may be worth men- longer the latency. method will cause ripples (Gibbs phe- tioning that the log base is the radix of While there is no substitute for nomenon) at FFT buffer boundaries the FFT. The only difference between dif- benchmarking on the target platform, in caused by the multiplication in the fre- ferent-based logarithms is a scaling fac- the absence of benchmarks, choosing a quency domain by a rectangular window. tor. This is irrelevant to “big O” power-of-two FFT length about four The sampling theorem tells us that scalability, which generally excludes con- times the length of the FIR filter is a sampling a continuous signal aliases stant scaling factors. good rule of thumb. energy at frequencies higher than the Larger FFT sizes are more costly to Nyquist rate (half the signal sample rate) perform, but they also produce more FILTERING MULTIPLE CHANNELS: back into the baseband spectrum (below usable (nonwraparound) output samples. REUSE THE FORWARD FFT the Nyquist rate). This is equally true for In theory, one is penalized more for It is often desirable to apply multiple fil- decimation of a digital sequence as it is choosing too small an overlap factor, V, ters against the same input sample for analog-to-digital conversion [6]. than too large. In practice, the price of sequence. In these cases, the advantages Aliasing is a natural part of downsam- large FFTs paid in computation and/or of fast convolution filtering become even pling. To accurately implement down- numerical accuracy may suggest a differ- greater. The computationally expensive sampling in the frequency domain, it is ent conclusion. operations are the forward FFT, the IFFT, necessary to preserve this behavior. It Some other factors to consider while and the multiplication of the frequency should be noted that, with a suitable deciding the overlap factor for fast con- responses. The forward FFT needs to be antialiasing filter, the energy outside the volution filtering include: computed just once. This is roughly a selected bins might be negligible. This ■ FFT speed—It is common for actual “Buy one filter. Get one 40% off” sale. discussion is concerned with the steps FFT computational cost to differ To realize this computational cost necessary for equivalence. The designer greatly from theory, e.g., due to mem- savings for two or more filters, all filters should decide how much aliasing, if any, ory caching. (“In theory there is no must have the same impulse response is necessary. difference between theory and prac- length. This condition can always be Decimation (i.e., downsampling) can tice. In practice there is.” — Yogi achieved by zero padding the shorter fil- be performed exactly in the frequency Berra, American philosopher and ters. Alternately, the engineer may domain by coherently adding the fre- occasional baseball manager.) redesign the shorter filter(s) to make use quency components to be aliased [7]. IEEE SIGNAL PROCESSING MAGAZINE [159] MARCH 2006 [dsp TIPS&TRICKS] continued tiple of the decimation rate D.