On the Use of Time–Frequency Reassignment in Additive Sound Modeling*
Total Page:16
File Type:pdf, Size:1020Kb
PAPERS On the Use of Time–Frequency Reassignment in Additive Sound Modeling* KELLY FITZ, AES Member AND LIPPOLD HAKEN, AES Member Department of Electrical Engineering and Computer Science, Washington University, Pulman, WA 99164 A method of reassignment in sound modeling to produce a sharper, more robust additive representation is introduced. The reassigned bandwidth-enhanced additive model follows ridges in a time–frequency analysis to construct partials having both sinusoidal and noise characteristics. This model yields greater resolution in time and frequency than is possible using conventional additive techniques, and better preserves the temporal envelope of transient signals, even in modified reconstruction, without introducing new component types or cumbersome phase interpolation algorithms. 0INTRODUCTION manipulations [11], [12]. Peeters and Rodet [3] have developed a hybrid analysis/synthesis system that eschews The method of reassignment has been used to sharpen high-level transient models and retains unabridged OLA spectrograms in order to make them more readable [1], [2], (overlap–add) frame data at transient positions. This to measure sinusoidality, and to ensure optimal window hybrid representation represents unmodified transients per- alignment in the analysis of musical signals [3]. We use fectly, but also sacrifices homogeneity. Quatieri et al. [13] time–frequency reassignment to improve our bandwidth- propose a method for preserving the temporal envelope of enhanced additive sound model. The bandwidth-enhanced short-duration complex acoustic signals using a homoge- additive representation is in some way similar to tradi- neous sinusoidal model, but it is inapplicable to sounds of tional sinusoidal models [4]–[6] in that a waveform is longer duration, or sounds having multiple transient events. modeled as a collection of components, called partials, We use the method of reassignment to improve the time having time-varying amplitude and frequency envelopes. and frequency estimates used to define our partial param- Our partials are not strictly sinusoidal, however. We eter envelopes, thereby enhancing the time –frequency employ a technique of bandwidth enhancement to com- resolution of our representation, and improving its phase bine sinusoidal energy and noise energy into a single par- accuracy. The combination of time–frequency reassign- tial having time-varying amplitude, frequency, and band- ment and bandwidth enhancement yields a homogeneous width parameters [7], [8]. model (that is, a model having a single component type) Additive sound models applicable to polyphonic and that is capable of representing at high fidelity a wide vari- nonharmonic sounds employ long analysis windows, which ety of sounds, including nonharmonic, polyphonic, impul- can compromise the time resolution and phase accuracy sive, and noisy sounds. The reassigned bandwidth- needed to preserve the temporal shape of transients. enhanced sound model is robust under transformation, and Various methods have been proposed for representing the fidelity of the representation is preserved even under transient waveforms in additive sound models. Verma and time dilation and other model-domain modifications. The Meng [9] introduce new component types specifically for homogeneity and robustness of the reassigned bandwidth- modeling transients, but this method sacrifices the homo- enhanced model make it particularly well suited for such geneity of the model. A homogeneous model, that is, a manipulations as cross synthesis and sound morphing. model having a single component type, such as the break- Reassigned bandwidth-enhanced modeling and render- point parameter envelopes in our reassigned bandwidth- ing and many kinds of manipulations, including morph- enhanced additive model [10], is critical for many kinds of ing, have been implemented in the open-source Cϩϩ class library Loris [14], and a stream-based, real-time * Manuscript received 2001 December 20; revised 2002 July implementation of bandwidth-enhanced synthesis is avail- 23 and 2002 September 11. able in the Symbolic Sound Kyma environment [15]. J. Audio Eng. Soc., Vol. 50, No. 11, 2002 November 879 FITZ AND HAKEN PAPERS 1TIME–FREQUENCY REASSIGNMENT cantly to the reconstruction, and the maximum contribu- tion (center of gravity) occurs at the point where the phase The discrete short-time Fourier transform is often used is changing most slowly with respect to time and frequency. as the basis for a time–frequency representation of time- In the vicinity of t ϭ τ (that is, for an analysis window varying signals, and is defined as a function of time index centered at time t ϭ τ), the point of maximum spectral n and frequency index k as energy contribution has time–frequency coordinates that satisfy the stationarity conditions 3 RϪϪV S j2π ^lnkh W Xkn ^^^hhhϭϪ hlnxlexp (1) ! S N W 2 lϭϪ3 8φτω^^, hhϩϪϭ ωt τB 0 (3) T X 2ω NϪ1 2 2 ϩϪϭ Ϫj2πlk 8φτω^^, hh ωt τB 0 (4) ϭϩ!hlxn^^hh l exp e o (2) 2τ NϪ1 N lϭϪ where ( , ) is the continuous short-time phase spec- 2 φ τ ω trum and ω(t Ϫ τ) is the phase travel due to periodic where h(n) is a sliding window function equal to 0 for n < oscillation [18]. The stationarity conditions are satisfied Ϫ Ϫ Ϫ (N 1)/2 and n > (N 1)/2 (for N odd), so that Xn(k) at the coordinates is the N-point discrete Fourier transform of a short-time 2φτω^ , h waveform centered at time n. t ϭϪτ (5) Short-time Fourier transform data are sampled at a rate 2ω equal to the analysis hop size, so data in derivative 2φτω^ , h time–frequency representations are reported on a regular ωt ϭ (6) 2τ temporal grid, corresponding to the centers of the short- time analysis windows. The sampling of these so-called representing group delay and instantaneous frequency, frame-based representations can be made as dense as respectively. desired by an appropriate choice of hop size. However, Discretizing Eqs. (5) and (6) to compute the time and temporal smearing due to long analysis windows needed frequency coordinates numerically is difficult and unreli- to achieve high-frequency resolution cannot be relieved by able, because the partial derivatives must be approxi- denser sampling. mated. These formulas can be rewritten in the form of Though the short-time phase spectrum is known to con- ratios of discrete Fourier transforms [1]. Time and fre- tain important temporal information, typically only the quency coordinates can be computed using two additional short-time magnitude spectrum is considered in the short-time Fourier transforms, one employing a time- time–frequency representation. The short-time phase weighted window function and one a frequency-weighted spectrum is sometimes used to improve the frequency esti- window function. mates in the time–frequency representation of quasi- Since time estimates correspond to the temporal center harmonic sounds [16], but it is often omitted entirely, or of the short-time analysis window, the time-weighted win- used only in unmodified reconstruction, as in the basic dow is computed by scaling the analysis window function Ϫ Ϫ Ϫ sinusoidal model, described by McAulay and Quatieri [4]. by a time ramp from (N 1)/2 to (N 1)/2 for a win- The so-called method of reassignment computes sharp- dow of length N. The frequency-weighted window is ened time and frequency estimates for each spectral com- computed by wrapping the Fourier transform of the analy- Ϫ ponent from partial derivatives of the short-time phase sis window to the frequency range [ π, π], scaling the Ϫ Ϫ Ϫ spectrum. Instead of locating time–frequency compo- transform by a frequency ramp from (N 1)/2 to (N 1)/2, and inverting the scaled transform to obtain a (real) nents at the geometrical center of the analysis window (tn, frequency-scaled window. Using these weighted win- ωk), as in traditional short-time spectral analysis, the com- ponents are reassigned to the center of gravity of their dows, the method of reassignment computes corrections complex spectral energy distribution, computed from the to the time and frequency estimates in fractional sample Ϫ Ϫ Ϫ short-time phase spectrum according to the principle of units between (N 1)/2 to (N 1)/2. The three analy- stationary phase [17, ch. 7.3]. This method was first devel- sis windows employed in reassigned short-time Fourier oped in the context of the spectrogram and called the mod- analysis are shown in Fig. 1. ˆ ified moving window method [18], but it has since been The reassigned time tk,n for the kth spectral component applied to a variety of time–frequency and time-scale from the short-time analysis window centered at time n (in transforms [1]. samples, assuming odd-length analysis windows) is [1] The principle of stationary phase states that the variation R V of the Fourier phase spectrum not attributable to periodic S * W S XkXktn; ^^hhn W tntkn, ϭϪ0 (7) oscillation is slow with respect to frequency in certain S 2 W spectral regions, and in surrounding regions the variation is S Xkn ^ h W relatively rapid. In Fourier reconstruction, positive and T X negative contributions to the waveform cancel in frequency where Xt;n(k) denotes the short-time transform computed regions of rapid phase variation. Only regions of slow using the time-weighted window function and 0 [·] phase variation (stationary phase) will contribute signifi- denotes the real part of the bracketed ratio. 880 J. Audio Eng. Soc., Vol. 50, No. 11, 2002 November PAPERS TIME-FREQUENCY REASSIGNMENT IN SOUND MODELING The corrected frequency ωˆ k,n(k) corresponding to the spectrum, which is very difficult to interpret. The method same component is [1] reassignment is a technique for extracting information R V from the phase spectrum. S * W S XkXkfn; ^^hhn W ωt kn, ϭϩk 1 (8) S 2 W 2 REASSIGNED BANDWIDTH-ENHANCED S Xkn ^ h W ANALYSIS T X The reassigned bandwidth-enhanced additive model where Xf ;n(k) denotes the short-time transform computed using the frequency-weighted window function and 1 [·] [10] employs time–frequency reassignment to improve the time and frequency estimates used to define partial denotes the imaginary part of the bracketed ratio.