The Phase Vocoder: a Tutorial Author(S): Mark Dolson Source: Computer Music Journal, Vol
Total Page:16
File Type:pdf, Size:1020Kb
The Phase Vocoder: A Tutorial Author(s): Mark Dolson Source: Computer Music Journal, Vol. 10, No. 4 (Winter, 1986), pp. 14-27 Published by: The MIT Press Stable URL: http://www.jstor.org/stable/3680093 Accessed: 19/11/2008 14:26 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=mitpress. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. The MIT Press is collaborating with JSTOR to digitize, preserve and extend access to Computer Music Journal. http://www.jstor.org MarkDolson The Phase Vocoder: Computer Audio Research Laboratory Center for Music Experiment, Q-037 A Tutorial University of California, San Diego La Jolla, California 92093 USA Introduction technique has become popular and well understood. The phase vocoder is one of a number of digital For composers interested in the modification of signal processing algorithms that can be catego- natural sounds, the phase vocoder is a digital signal rized as analysis-synthesis techniques. Mathemati- processing technique of potentially great signifi- cally, these techniques are sophisticated algorithms cance. By itself, the phase vocoder can perform very that take an input signal and produce an output sig- high fidelity time-scale modification or pitch trans- nal that is either identical to the input or a modi- position of a wide variety of sounds. In conjunction fied version of it. The underlying assumption is that with a standardsoftware synthesis program,the the input signal can be well representedby a model phase vocoder can provide the composer with arbi- (i.e., a mathematical formula)whose parameters trary control of individual harmonics. But use vary with time. The analysis is devoted to deter- of the phase vocoder to date has been limited mining the values of these model parametersfor the primarily to experts in digital signal processing. signal in question, and the synthesis is simply the Consequently, its musical potential has remained output of the model itself. largely untapped. The benefits of analysis-synthesis formulations Since the is based on In this article, I attempt to explain the operation are considerable. synthesis of the phase vocoder in terms accessible to musi- analysis of a specific signal, the synthesized output this cians. I rely heavily on the familiar concepts of sine can be virtually identical to the original input; in bears waves, filters, and additive synthesis, and I employ a can occur even when the signal question minimum of mathematics. My hope is that this little relation to the assumed model. Furthermore, can tutorial will lay the groundworkfor widespread use the parametervalues derived from the analysis useful modifications of of the phase vocoder, both as a tool for sound analy- be altered to synthesize sis and modification, and as a catalyst for continued the original signal. In the case of alteration, how- musical musical exploration. ever, the perceptual significance and utility of the result depends critically on the degree to which the assumed model matches the signal to be Overview modified. In the phase vocoder, the signal is modeled as a to be deter- The phase vocoder has its origins in a long line of sum of sine waves, and the parameters voice coding techniques aimed at minimizing the mined by analysis are the time-varying amplitude amount of data that must be transmitted for intel- and frequency for each sine wave. These sine waves so this ligible electronic speech communication. Indeed, are not requiredto be harmonically related, the word "vocoder"is simply a contraction of the model is appropriatefor a wide variety of musical term "voice coder."The phase vocoder is so named signals. For example, wind, brass, string, speech, themselves to distinguish it from the earlier channel vocoder, and some percussive sounds all lend which is described in this paper as well. Commer- well to phase-vocoderrepresentation and modifi- and cial analog "vocoders"for electronic music are of cation. Other percussive sounds (e.g., clicks) the channel-vocoder type. The phase vocoder was certain signal-plus-noise sound combinations are first described in an article by Flanaganand Golden not well representedas a sum of sine waves. These can still be but (1966), but it is only in the past ten years that this sounds perfectly resynthesized, attempts to modify them can lead to unanticipated Computer Music Journal,Vol. 10, No. 4, Winter 1986, sonic results. Since the musical utility of the phase ? 1986 Massachusetts Institute of Technology. vocoder lies in its ability to modify sounds predic- 14 Computer Music Journal Fig. 1. Filterbank inter- Fig. 2. Idealized frequency pretation of the phase response of bandpass vocoder. filters. Amplitude No overlap >0 Frequency(Hz) R/2 Lairge overlap I 10'XI Frequency(Hz) Input j ) - Output 1 R/2 (Fig. 1). For resynthesis, the amplitude and fre- quency outputs serve as the control inputs to a bank of sine-wave oscillators. The synthesis is then literally a sum of sine waves with the time-varying amplitude and frequency of each sine wave being Filters Oscillators obtained directly from the correspondingbandpass filter. If the center frequencies of the individual bandpassfilters happen to align with the harmonics tably, signals that fail to fit the model are a poten- of a musical signal, then the outputs of the phase- tial source of difficulty. vocoder analysis are essentially the time-varying In the sections that follow, I show in detail how amplitudes and frequencies of each harmonic. But the phase-vocoder analysis-synthesis is actually this alignment is by no means essential to the performed.In particular,I show that there are two analysis. complementary (but mathematically equivalent) The filterbank itself has only three constraints. viewpoints that may be adopted. I refer to these as First, the frequency response characteristics of the the filterbank interpretation and the Fourier-trans- individual bandpassfilters are identical except that form interpretation, and I discuss each in turn. each filter has its passband centered at a different Lastly, I show how the results of the phase-vocoder frequency. Second, these center frequencies are analysis can be used musically to effect useful mod- equally spaced across the entire spectrum from ifications of recordedsounds. In all cases, the input 0 Hz to R/2 Hz as shown in Fig. 2. Third, the indi- to the phase vocoder is assumed to be a discrete vidual bandpassfrequency response is such that the (i.e., sampled) signal with a sampling rate R of at combined frequency response of all the filters in least twice the highest frequency present in the parallel is essentially flat across the entire spectrum. signal. Thus, for a typical high fidelity application, (This is akin to the situation in a good graphicor R = 50 KHz. The frequency range of interest is parametricequalizer when the gain in each band is then from 0 Hz to R/2 Hz. set to 0 db; the resulting sound has no added colora- tion. In the phase vocoder, this condition ensures that no frequency component is given dispropor- The FilterbankInterpretation tionate weight in the analysis.) As a consequence of these constraints, the only issues in the design The simplest view of the phase-vocoder analysis is of the filterbank are the number of filters and the that it consists of a fixed bank of bandpass filters individual bandpassfrequency response. with the output of each filter expressed as a time- The number of filters should always be suffi- varying amplitude and a time-varying frequency ciently large so that there is never more than one Dolson 15 Fig. 3. An individual bandpass filter. sin(27rft) Magnitude Input Frequency cos(27rft) f partial within the passband of any single filter. (In changes in the input signal. This is a fundamental general, more filters mean narrowerpassbands for tradeoffin the design of any filter-a sharp fre- each filter.) For harmonic sounds, this means that quency response comes only at the expense of a there must be at least one filter for each harmonic slow time response. in the frequency range from 0 Hz to R/2 Hz. Thus, In the phase vocoder, the consequence of this for example, if the sampling rate R is 50 KHz, and tradeoffis that to get sharp filter cutoffs with mini- the fundamental frequency of the sound is, say, 500 mal overlap, one must use filters whose time re- Hz, then there are partials every 500 Hz. Conse- sponse is very sluggish. For slowly-varying sounds, quently, there are (25000/500) = 50 partials in all, a sluggish time response in the individual filters and we need 50 filters in our bank. may be perfectly acceptable. For more rapidly vary- For inharmonic and polyphonic sounds, the ing sounds, however, it may be more desirable for number of filters is usually much greater because the individual filters to have a rapid time response.