Digital Audio Effects and Simulations

By Francis Rumsey Staff Technical Writer

t is increasingly possible either to emulate legacy audio devices and IIeffects or to create new ones using digital signal processing. Often these are implemented as plug-ins to workstation packages, using one of the proprietary systems such as VST, DirectX, Audio Suite, or Audio Units. Papers from a session chaired by David Berners at the AES 127th Convention last year in New York cov- ered a wide range of recent innovations in this field, including , plate reverb, and emu- lation.

ROLL ME A LESLIE SPEAKER Invented by Don Leslie in 1949, the classic Leslie speaker has created an effect that defines the sound of certain bands, having been used to add a char- acteristic or effect to the for many years. The whole process is achieved mechanically, which makes for an interesting challenge to emulate it in the digital domain. The effect results from rotating elements Fig. 1. The Leslie 44W loudspeaker system (Figs. 1–3 courtesy Herrera et al.) that have amplitude-, spectrum-, and phase-modulating influences on the inside cabinet walls, as audio radiated by the system. A small picked up by two micro- amount of varying Doppler shift is also phones a short distance said to be introduced by the system. away (see Fig. 2). Virtual While the mid-range horns actually source images of the rotate, the bass loudspeaker remains reflections were added to physically static, and a rotating baffle the direct sound using is used to modulate the acoustic out- filtered versions passed put. A picture of a Leslie 44W model through a line, for is shown in Fig. 1, in which the loud- different rotational states speaker and baffle elements can clearly of the horn. While this be seen. The two rotating elements go was computationally effi- around at different rates to create a cient, it did not necessar- more interesting result, and there is a ily capture all the slow chorale effect as well as a fast required features of the Fig. 2. Picking up first-order reflections from the Leslie horn–cabinet combination using a pair of tremolo effect. sound. In the current . In “Discrete Time Emulation of the attempt, impulse respon- Leslie Speaker” (AES 127th paper ses picked up from the horn and baffle ments of about 0.9 degrees were used 7925), Herrera et al. describe an arrangement by a spaced pair of B&K for the horn and 2.7 degrees for the attempt to emulate the classic design 4006 microphones were captured for bass baffle. As can be seen from Fig. 3, using a novel method. In a previous numerous rotation angles, using small a typical impulse response contains a attempt at emulation, they included the manual rotations of the loudspeaker dense tail of reflections. The signal was modeling of direct sound and time- system. A sine-sweep method was used split into two frequency regions using a varying first-order reflections from the during this process. Rotation incre- crossover, so that the bass and mid- ➥

420 J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May Digital Audio Effects and Simulations range components could be processed produced a low-level “hashy” sound at question. A simulation model by separately. The acceleration and decel- rotation speeds above 6 Hz, which was Norman Koren was found to be more eration dynamics of the rotating loud- thought to be due to spatial aliasing accurate than one by Leach, as it speaker were carefully measured and and linear interpolation between irreg- modeled the triode behavior better for implemented using separate filters in ularly distributed impulse responses. high grid and plate voltages, as well as order to be able to emulate the effects By projecting the impulse responses limiting the behavior of the model to of speeding up and slowing down, such onto a sinusoidal rotation pattern, then more realistic values in the case of as during the transition between chorus representing them as a set of extreme grid voltages. A couple of and tremolo modes. uniformly spaced (in angle) fixed dynamic aspects of triode modeling Three different FIR filtering tech- filters with spatially band-limited were also tried, including the parasitic niques used for processing the relevant responses, the authors were able to capacitances exhibited by the valve in impulse responses were compared. The eliminate this unpleasant effect even at question that tend to affect the first used a scaled and interpolated high rotation rates. frequency response of the amplifier version of the input signal based on (see Fig. 4). The combined result measurements made at numerous rota- A TRIODE GUITAR AMP appeared to be successful both from a tion angles, which was then added to In “Simulation of a Guitar Amplifier measured point of view and perceptu- the output. The second worked by Stage for Several Triode Models: ally, and it was possible to implement cycling through taps in an FIR filter Examination of Some Relevant Phe- it within a VST plug-in architecture. that accorded to the relevant rotation nomena and Choice of Adapted The authors briefly discuss the effect angle. The third worked in a similar Numerical Schemes” (AES 127th of sampling rate on their simulations, way to an HRTF filtering technique paper 7929), Cohen and Hélie look suggesting that oversampling (at 96 used in binaural processing, whereby into the matter of simulating the high- kHz for example) is generally consid- one of a small number of fixed filters gain triode stage in a guitar preampli- ered desirable in nonlinear simulations. was selected depending on the rotation fier, in particular considering the non- Oversampling in such situations angle, with a form of weighted mixing linear distortions that can give a reduces the likelihood of aliasing (or panning) between the adjacent -based guitar amp its effects becoming audible. It also seems filter outputs according to the rotation unique sound. They choose to investi- to improve the accuracy of the various angle. The authors found that approxi- gate the commonly used 12AX7 triode filters and the speed of convergence of mately 20 filters were sufficient to be valve, also known as the ECC83, an algorithm designed to solve nonlin- able to implement convincing repro- which has certain saturation distortion ear and differential equations. ductions of the original impulse characteristics that are said to be val- responses. (This last method is said to ued by musicians and OVER-THRESHOLD FEEDBACK be very efficient in terms of memory alike. DISTORTION SYNTHESIS requirements and can render multiple Mathematical models aiming to Staying with the “triode sound,” Tom inputs at different angles.) simulate static triode behavior were Rutt describes an algorithm he devel- During testing it was found that the tried and compared with those found oped to synthesize nonlinear distortion first two time-varying filter methods on available datasheets for the valve in using over-threshold power function feedback (AES paper 7930). This is said to emulate closely the effect of vacuum tube triode grid limit distor- tion. The process acts as an instanta- neous soft for both the positive and negative half cycles of the signal and never clips the output signal peaks, even at the maximum peak input level. An electronic circuit implementation of this algorithm is described, as well as a DirectX software plug-in version known as Tuboid (www.tuboid.com). In the latter a mapping array is used that relates certain input signal values to modified (distorted) output signal values, in an attempt to emulate the characteristics of the electronic circuit. While a 1:1 mapping function would require an array of 65,536 sixteen-bit integers, in fact the input signal is scaled by 256, and an array of only Fig. 3. Example of a stereo impulse response picked up from the Leslie horn 256 values is used. The intervening

422 J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May Digital Audio Effects and Simulations plate controls the rate at which energy is dissipated (and hence the reverb time). According to the authors’ analysis, the EMT 140 unit behaves in an almost linear and time-invariant fashion. It has a very fast onset transient response and a high echo density, little influenced by the damping plate, while the low- frequency decay can be controlled to range from around half a second to over eight seconds. When considering how to implement a digital emulation it is suggested that, although an obvious suggestion, the idea of storing, inter- polating, and convolving impulse responses of the plate up to eight seconds is computationally very expen- sive. Physical models have been tried in which the mechanical characteristics of the plate are modeled numerically using partial differential equations, but so far these have been unable to emulate the rapid onset response of a real plate. They are, however, good for emulating the high-level nonlinear behavior of such a plate. In the emulation described here, the authors explain that they have employed a hybrid reverberator that combines two elements: a short convo- lution stage to deal with the onset tran- sient and a computationally efficient feedback-delay network (FDN) to emulate the tail, with access to a range of control parameters. Because the transition to a “late field” (the time after the initial impulse at which the Fig. 4. Frequency response of a triode valve simulation either with (top) or without echo density becomes perceptually (bottom) modeling of parasitic capacitances. The roll-off in HF response can be dense) occurs after only a few millisec- seen in the lower plot (courtesy Cohen and Hélie). onds with the EMT plate, the initial convolution period can be short. It output values are linearly interpolated. tain type of reverb that can be useful depends to some extent on the setting The positive- and negative-going parts for vocals in a mix. In “An Emulation of the damping plate. In fact, in this of the signal are separately processed of the EMT 140 Plate Reverberator implementation a single boundary point to enable the plug-in to be adjusted in a Using a Hybrid Reverberator Struc- between the initial convolution and flexible fashion to emulate a variety of ture” (AES 127th paper 7928), Green- FDN parts is introduced at 300 ms, different kinds of both symmetric and blatt et al. consider how its characteris- regardless of the damping setting. asymmetric tube distortion. The author tics can be emulated digitally, so as As the energy in the FDN builds up has made a number of audio examples to avoid the need to keep and maintain during the first few hundred millisec- available for audition at www.coast a large physical unit such as this in a onds, its output can then be crossfaded enterprises.com/OTPF/demoClips. studio. into the end of the initial convolution Essentially plate reverbs consist of a section using a crossfade with energy CLASSIC EMT 140 PLATE suspended, thin steel plate to which a matching. Measured and modeled REVERBERATION driving transducer and one or more decay traces and spectrograms for The EMT 140 was a classic plate pickup transducers are attached. this reverb emulation are shown in reverberation system introduced in Decaying vibrations are set up in the Figs. 5 and 6. 1957 that was used widely before the plate that behave to some extent like A stereo emulation of the EMT plate days of digital reverb units. Its sound is room reverberation, while a damping can be implemented by introducing a still sought by connoisseurs of a cer- plate located close to the reverberation second FDN signal path for the late ➥

J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May 423 AudioEffectsMAY2010_MAY 2010 5/13/10 11:38 AM Page 4

Digital Audio Effects and Simulations

Fig. 5. Measured (blue) and modeled (red) decay traces for Fig. 6. Measured (left) and modeled (right) impulse response emulation of EMT 140 reverberator (Figs. 5 and 6 courtesy spectrograms of the EMT reverb emulation Greenblatt et al.)

reverberation, which will produce an The impulse responses at different output that is uncorrelated with the points in the chain are shown in Fig. 8, first. An identical window is applied where it can be seen that the output of during the convolution of the early part the comb filter is a simple decaying of the response for left and right chan- train of pulses, whereas that of the filter nels in order to maintain the stereo is an infinitely long decaying noise image. The stereo width of the late part sequence. The time constant of the Fig. 7. Comb filter reverberator structure using a short noise sequence is adjusted to match that of the original comb filter is supposed to be the same output filter (Figs. 7–11 courtesy Lee plate by varying the panning of the two length as the noise sequence, which in et al.) signals between the outputs so that the this case is only in the region of 100 cross-correlation of the channels ms, leading to low computation and matches that of the original. memory requirements. This structure creates a decay that can have similar A SWITCHED CONVOLUTION properties, both spectrally and in terms REVERBERATOR of echo density, to typical late reverber- Moving into the modern age and dis- ation. The authors found that this basic cussing the implementation of new structure worked well for stationary effects algorithms, Lee et al. continue signals but less well for transient sig- the reverberation theme in relation to a nals, having audible periodicity. low-cost algorithm for use in mobile Because the noise sequence is devices in ‘The Switched Convolution repeated in a cyclical fashion, as the Reverberator” (AES paper 7927). Con- authors found above, there can be audi- ventional convolution and FDN-based ble periodicity in the perceived reverber- reverberators need considerable ation decay. For this reason the noise amounts of processing and memory, sequence has somehow to be varied. In a which makes them unsuitable for switched convolution reverberator a mobile devices where these resources different noise sequence can be switched are at a premium. Consequently the in at every period of the comb filter, Fig. 8. Impulse response of the reverberator structure shown in Fig. 7: authors propose the reverberator struc- which helps to reduce periodicity. (a) input signal x(t), (b) comb filter ture shown in Fig. 7, which consists of However, a basic switched convolution output, (c) reverberator output y(t). a recursive filter (comb filter) driving a reverberator was found to have the process that convolves the comb filter opposite problem to the simple filter minimized, and this became a key goal output with a short, exponentially structure described above, in that it of the authors’ research. One proposed decaying noise sequence. The reverber- performed quite well with transient solution, as shown in Fig. 9, employed a ator’s is controlled using signals but less well for stationary crossfade between the noise sequences low-order IIR filters at the input and in signals. It was clear that time-varying to be convolved, having a length of the feedback path of the comb filter. artifacts in the reverberation had to be twice the comb filter impulse peri- ¯

J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May 425 Digital Audio Effects and Simulations

a

b Fig. 9. Switched convolution reverberator structure with cross-faded noise sequences

Fig. 12. Transient removal and interpolation in a novel time-stretching algorithm: (a) The transient is detected and windowed, then it is subtracted from the signal, finally the gap is interpolated; (b) upon resynthesis the stationary signal is stretched, then a gap is prepared for the transient by multiplying with the inverse window used for its original extraction, then the transient is added back in (Figs. 12 and 13 courtesy Nagel and Walther).

A transient detection algorithm highly transient sound such as that from could be used to assist the a castanet. process of selecting the update The authors cite a 1994 paper on time Fig. 10. Impulse response of the crossfaded switched convolution reverberator shown in rate of the noise sequence. For stretching by Quatieri et al. in which the Fig. 9 complex music signals it was algorithm attempts to preserve the over- found to be helpful to make the all envelope of the signal in its time- leaky integrator’s time constant stretched version. However, this leads to frequency-dependent. Sparse the side effect that a time-dilated percus- noise sequences known as sive event will decay more slowly than velvet noise were tried to good the original. In musical signals, they effect, these having less dense argue, it is preferable to preserve the impulse responses but similar envelope of transient events. In the perceptual characteristics to authors’ new method, transients are Gaussian noise. extracted from the composite signal and replaced with a stationary interpolated HANDLING TRANSIENTS signal to fill the gap, as shown in Fig. IN TIME-STRETCHING 12. (The reason for this is to avoid the ALGORITHMS artifacts that can arise when stretching Fig. 11. Switched convolution reverberator Time-stretching algorithms are stationary signals that have gaps in using frequency-dependent noise update used to alter the speed of audio them.) The resulting quasistationary signals without altering their signal is then well suited to time stretch- ods. The system’s impulse response is pitch. Ideally this process should avoid ing, and the transients are subsequently shown in Fig. 10. The perceived result side effects. While sustained signals can used to replace parts of the interpolated was found to represent an improvement; be treated with relative success using signal after it has been stretched. A however, it cost more in terms of various types of overlap-and-add algo- phase vocoding technique was used to computation and memory requirements rithms, transient components are not stretch the stationary parts of the signal. than the basic structure described earlier. always so easy to deal with. In “A Novel As shown in Fig. 13, the signal was FFT Alternative structures were tried that Transient Handling Scheme for Time analyzed in grains of around 10 ms and used only a single convolution stage, Stretching Algorithms” (AES 127th then resynthesized at the new time scale based on the idea of gradually changing paper 7926), Nagel and Walther explain using a different hop size and overlap- the noise signal over time so as to that it is usually necessary to extract the add processing (in this example, a reduce artifacts. A leaky integrator transient components and only process stretching factor of two is shown). concept was used to control the rate at the sustained elements of a signal, The main challenges of this method which the noise sequence was changed, adding the transients back in after pro- were to detect transients accurately and and proposals were made to vary this cessing. In attempting to address this to achieve a perceptually satisfactory rate depending on the time-varying problem they tackle the challenge of interpolation of the stationary part of nature of the input signal. A version stretching a sustained note such as that the signal. In particular there was the based on this idea is shown in Fig. 11. from a pitch pipe, combined with a problem of what would happen if the

426 J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May Digital Audio Effects and Simulations Fig. 13 The phase emulate, they can bring the sounds of vocoder algorithm yesterday to a contemporary audience used for time in a relatively convenient fashion. stretching resynthesizes the original signal by Editor’s note: The papers reviewed in this using a hop size article, and all AES papers, can be purchased that is different to online at www.aes.org/publications/preprints/ that used in the search.cfm and www.aes.org/journal/search.cfm. original FFT analysis (here AES members also have free access to past tech- shown as two nical review articles such as this one and other times longer) tutorials from AES conventions and conferences at www.aes.org/tutorials. stationary signal’s composition changed substantially over the duration of the transient, which the authors proposed to address by using forward and backward prediction with a cross- . Detection of the ends of transient sections was complicated by the fact that the reverberant decays of the tran- sient part typically merge with the Prism Sound is now a global following stationary part, and this sales & support agent for: required some careful handling. Results were promising, but did not appear to Loudspeaker design and analysis solutions outperform state-of-the-art software currently available. Interestingly, some listeners seemed to prefer versions of the time stretch in which the reverb of the transient was stretched along with the stationary part of the signal, which rather contradicted the authors’ original assumption that it would be preferable to treat the transients and their associ- ated decays as an unchanged and electronic test, combined entity. They concluded that more insight into listener preferences acoustic test... are needed. The novel approach is promising, they said, and its essential principles had been demonstrated in the $#"!#"! " !  "!!#" case of a special signal. Anyone inter- ...integrated  "!!  "!"!"!   "!   "! !"#""! "! ested in hearing the results of this  !  !!  approach are invited to ask the authors  "! "!""! ([email protected]) for  " "!  ! !  sample audio files.  "! "!""!!

!  ! !! !" " SUMMARY # #!   !  !  !  "!"! !"" Using today’s signal processing tools "!#! ! "! " one can both emulate the sounds of   yesteryear and implement new effects that were not possible in the days of analog processing. The sound of classic dScope Series III analog and digital systems such as the Leslie speaker and audio analyzers tube guitar amp can be made available in the world of plug-ins, for use in mod- ern digital audio production applica- tions. Although plug-ins may not have   #!    #!  "          some of the raw physical appeal or nos- talgic value of the classic hardware they

J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May 427