
Glitch Free FM Vocal Synthesis

Chris Chafe Center for Computer Research in Music and Acoustics, Stanford University [email protected]

ABSTRACT

Frequency Modulation (FM) and other audio rate non-linear modulation techniques like Waveshaping Digital Synthesis, Amplitude Modulation (AM) and their variants are well-known techniques for generating complex sound spectra. Kleimola [1] provides a comprehensive and up-to-date description of the entire family. One shared trait is that synthesizing vocal sounds and other harmonically-structured sounds comprised of formants can be problematic because of an obstacle which causes distortions when intensifying time-varying controls. Large deflections of pitch or phoneme parameters cause jumps in the required integer approximations of formant center frequencies. Trying to imitate human vocal behavior with its often wide prosodic and expressive excursions causes audible clicks. A partial solution lay buried in some code from the 80’s. This, combined with a phase-synchronous oscillator bank described in Lazzarini and Timoney [2], produces uniform harmonic components which ensure artifact-free, exact formant spectra even under the most extreme dynamic conditions. The paper revisits singing and speech synthesis using the classic FM single modulator / multiple-carrier structure pioneered by Chowning [3]. The revised method is implemented in Faust and is as efficient as its predecessor technique. Dynamic controls arrive multiplexed via an audio rate “articulation stream” which interfaces conveniently with sample-synchronous algorithms written in Chuck. FM for singing synthesis can now be “abused” with radical time-varying controls. It also has potential as an efficient means for low-bandwidth analysis – resynthesis speech coding. Applications of the technique for sonification and in concert music are described.

Copyright: © 2013 Chris Chafe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

Synthesis of singing voice by computer has a history which begins in the very first years of computer music. The song Daisy Bell (Bicycle Built for Two) was sung by a computer in 1961 in an arrangement by Max Mathews and Joan Miller with vocal synthesis by John Kelly and Carol Lochbaum, when the Bell Telephone Laboratories experiments with digital music synthesis were only 4 years old. It was an early case of analysis – resynthesis speech coding providing a means for singing synthesis. Over the years, most synthesis techniques have been applied to emulate the singing voice (additive, subtractive, physical model, FOF, etc.). The quest continues more than fifty years later with composers attracted to vocal synthesizers like Yamaha’s Vocaloid^1 where they can explore a fascination with musical personalities of singers which never existed. This paper joins a thread which began with Chowning’s work in the late 70’s and early 80’s involving FM for vocal synthesis and which has been virtually languishing since its early use in a few musical works.

^1 Vocaloid3 uses a triphone frequency-domain concatenative synthesis engine.

John Chowning’s FM singing voice method was first described in his 1980 article [3] prior to completing Phonē at IRCAM (1981). The multi-channel tape piece features a wide variety of singing voices and morphing of vocal timbres with other FM-generated timbres such as gongs. The technique creates multiple formants with independent tunings using multiple carriers and a shared modulator. Two formants are used for his version of a soprano voice “eee” and three formants for his spectrally-rich basso profondissimo. A later version adds a third formant to the soprano model in a synthesis of the vowel “ahh” [4]. Pitch vibrato which causes synchronous spectral modulation is especially effective and Chowning has often demonstrated how crucial this is to rendering vowels convincingly. “It is striking that the tone only fuses and becomes a unitary percept with the addition of the pitch fluctuations, thus spectral envelope does not make a voice!” [3].

The method has an inherent shortcoming which limits the amount of vibrato excursion and limits phoneme transitions to nearby phonemes. Beyond these limits noticeable artifacts occur which are caused by discrete shifts of formant center frequency. Discontinuities are perceived as clicks and result from integer shifts in the carrier to modulator ratio $c : m$ which are required in order to track a desired formant center frequency $f_c$ for a given pitch $f_p$. The modulating oscillator is always set to $f_p$, so $m = 1.0$. The carrier ratio $c$ is an integer approximation and quantization of the actual real ratio $f_c / f_p$.

Formant synthesis with FM is essentially contradictory to the physics. The harmonic nature of voiced sound allows only harmonic number ratios $c \in \mathbb{N}_{\geq 1}$ for the carrier. Where physical sound production is an excitation – resonance mechanism with independent tuning of both elements, FM models can only approximate the resonance frequencies of the latter when constrained to produce harmonic spectra. The inherent problem is that these approximations are discontinuous in frequency. In practice, this not only severely limits the amount of vibrato, but also portamento and glissando, constraining pitch skew to small ranges. In the method’s original form, it is impossible to shift ratios without causing glitches like those shown in the spectrograms of Fig. 1.

Figure 1. Spectrograms (frequency 0–4000 Hz, time 0–4 s): (a) vibrato, (b) vowel alternation “eee–ooo–eee”. Clicks always occur when transitions to a formant center frequency $f_c$ force a carrier oscillator to change its harmonic ratio. FM vocal formants use a ratio $c : m$ where $c \in \mathbb{N}_{\geq 1}$ and the modulator is set to $f_p$, the desired pitch.

Finding a solution became necessary in order to use FM as the synthesis engine for a body of work involving sonification of brain signals. FM offers advantages for this project, which attempts to fashion singing direct from the mind. The goal isn’t what one probably first imagines, e.g., a singing ensemble of mind-controlled voices. Instead, this choir is a technology for auditory display of rapid fluctuations of EEG and electrocorticography (eCog) recordings. Singing voice synthesis has its attractions in that it can also allude to imagery of “inner voices,” but it is particularly apt because of the ease with which listeners lock on to patterns of phonemic and other voice-like timbral transitions. The range of data encountered in brain recordings (from quiescent to seizure) and a desire to have a very flexible mapping strategy have been motivations for the present investigation into solving the discontinuity problem. The completed work will reach the public as a gallery installation (exploring recorded data) and as a medical monitoring device for detecting seizures (with the singing voice controlled directly from electrodes in real time).

2. EARLY SOLUTION

Marc Le Brun described digital waveshaping synthesis in 1979 as a generalized paradigm for non-linear modulation synthesis [5]. FM is a special case of waveshaping synthesis and, in devising a way to avoid the discontinuity problem for waveshaping, Le Brun also solved it for the FM case. Le Brun’s solution remains unpublished (until now) with one exception: Bill Schottstaedt has preserved it in an FM synthesis instrument in the Common Lisp Music (CLM) project [6]. From the instrument code comment, “Vox, an elaborate multi-carrier FM instrument is the voice instrument written by Marc Le Brun, used in Colony and other pieces.”

Figure 2. Spectrograms (frequency 0–4000 Hz, time 0–4 s): (a) vibrato – deglitched, (b) vowel alternation – deglitched. Result of applying the solution adopted from Le Brun to the synthesis shown in Fig. 1.

Vox avoids the integer ratio shift discontinuities by implementing a cross-fading solution. Two carriers corresponding to even and odd harmonic numbers are assigned to each formant, “bracketing” the true formant center frequency. Their assignments are made from the two nearest harmonics: one is the nearest lower harmonic $f_{lower} = \lfloor f_c / f_p \rfloor$ while the other is the nearest upper harmonic $f_{upper} = \lceil f_c / f_p \rceil$. The assignment of harmonics to individual oscillators is dynamic and depends on whether they are even numbered or odd numbered. When an oscillator is required to change its harmonic number the other will be approaching the actual target $f_c / f_p$. The two carrier oscillators’ amplitudes sum to unity in a mixture whose gains are complementary and linearly determined by proximity to the target. The key feature which makes this work is that it ensures that the oscillator which is having its frequency changed will be muted. As a nice side-effect, it also sharpens the accuracy with which the target formant center frequency is being synthesized.

Le Brun’s paper describes “a unified conceptual framework for a number of nonlinear techniques, including frequency-modulation synthesis. Both the theory and practice of the method are developed fairly extensively, beginning with simple but useful forms and proceeding to more complex and richer variations.” The cross-fade solution, however, only existed in code from the same era. To detail the historical record precisely, its first implementation was written in the MUS10 compiler (Stanford Artificial Intelligence Laboratory’s version of Bell Laboratories’ MusicN compilers). Later, it was ported to CLM as pqw-vox, a “translation from MUS10 of MLB’s waveshaping voice instrument (using phase quadrature waveshaping).” Today, both pqw-vox and the FM version vox can be found translated to Scheme in Schottstaedt’s Snd project [6] as instruments defined in the file clm-ins.scm.^2

^2 One caution: some implementation versions belonging to this family have mistakenly labeled carrier oscillators as “modulators” and the reverse: their “carrier” is actually the modulator.

The cross-fade solution has not been incorporated in common FM vocal synthesis implementations. Today, the most notable is the FMVoice instrument included in the Synthesis Tool Kit (STK) [7]. The class FMvoices.cpp can be freely downloaded as part of STK’s source code and has been ported to various platforms, e.g., Chuck [8] and Max / MSP / PeRColate [9]. In porting this class to Faust [10] and dealing with the discontinuity problem, I subsequently “rediscovered” for myself Le Brun’s early solution. The same cross-fade solution also appears in Lazzarini and Timoney [2].

3. NEW PROBLEM

Lazzarini and Timoney also describe a method for generating formants with phase-synchronous oscillators. Its importance will become apparent. After adopting Le Brun’s code in my own work and verifying that the discontinuity’s clicks were gone (Fig. 2), I noticed that the fix introduced a new problem. This was again an audible artifact plaguing vibrato. Not clicks, but a new kind of artifact. Where perfectly periodic vibrato should elicit perfectly periodic spectral modulation, it in fact didn’t. From one vibrato cycle to the next an overlaid pattern of spectral modulation is heard. The problem arises from phase mismatches in the pair of formants (even and odd harmonic numbers) being mixed for each formant. These are the pair being cross-faded to combat the clicks in what I will now label as the “first-order problem.”

The cross-fade technique assumes that the energy of all coincident pairs of spectral lines will sum arithmetically. However, this assumption does not take phase into account. A “second-order problem” is caused by phase interference of coincident spectral lines. These are the spectral lines (carrier and sideband frequencies) of the two overlapping formant generators which fill out the spectral envelope of the formant. They are identical spectra which are shifted relative to one another by one harmonic number. All phases are generated relative to their respective carrier oscillators rather than to the ensemble of frequencies as a whole. And without phase-synchronous oscillators, these phases are arbitrary in time since they are independently determined by control changes.

As discussed in Sec. 2, the first-order artifact is only apparent under changing conditions of pitch and phoneme target. Similarly, the second-order effect may remain unnoticed under steady-state conditions. With no change to carrier frequencies, carrier phases will also be constant and so will the resultant spectral mix. Nevertheless, interference between unrelated sets of phases can have an effect which alters the static spectral envelope and is perceived as a quality mismatch away from a target steady-state vowel. The problem becomes more apparent when carrier frequencies are being shifted dynamically, especially if these changes are happening periodically. Vibrato is a good way to emphasize the problem. Spectral distortions which may be imperceptible under other conditions are easier to hear with control changes which are repeating. The ear can pick out the distortion effect as a kind of spectral “isorhythm” or aliased pattern which is superimposed. Vibrato with a given period will generate a longer-period pattern of spectral modulation as seen in Fig. 3. If you study the regions around 700 Hz and 1200 Hz, you will notice patterns in which phase-related Moiré fringing is inscribed on the amplitudes of the harmonics.

Figure 3. Phase-related Moiré fringing effect visible in spectrograms: (a) fringing pattern changing over several vibrato periods, (b) fringing detail, zoomed in from Fig. 2(a). When either of the two cross-faded carriers resets its frequency because it needs to shift harmonic ratio, it also resets its phase with respect to the other carrier. The result is Moiré fringing.

3.1 Minimizing Fringing

An initial attempt to minimize the audible effect of phase fringing is worth mentioning even though it isn’t ultimately the solution being adopted. It exploits the fact that phase interference is most notable when the cross-fade mix of the two carriers approaches equal portions (when interaction will be greatest). This is the point at which the carriers are equidistant from the target center frequency. Conversely, the least interference occurs when one of them is closest to the target and the other is essentially muted. Taking advantage of this proportion where one oscillator dominates, by expanding its time in the (vibrato-related) duty cycle, is one way to minimize fringing.

In listening tests, it was found that the cross-fade ramp can be made non-linear and still mask the first-order discontinuity perfectly. By using a power law for the ramp slope, fringing is reduced by causing less time to be spent in the portion of the duty cycle with the problematic mix ratio. Initial experiments under periodic vibrato conditions indicated that even a very significant exponent can be used, e.g., $f(x) = x^7$, and still mute the first-order artifact smoothly enough to avoid a click. This greatly reduces the time spent in “phase interference mode” and all but eliminates the audible effect of the second-order problem.

This fix has its drawbacks. Some fringing still remains during the portion of the duty cycle where the mix briefly crosses through the equal portion region. More significantly, though, altering the temporal dynamics of the cross-fade distorts the mix away from the intended mix, i.e., away from the mix which best approximates the target formant center frequency.

3.2 A Better Way

The root of the problem is in the use of independent oscillators. The way around this is to employ a bank of linked phase-synchronous oscillators such as described in [2]. Here, a single phasor is shared by the modulator and all carriers. A bank can be constructed with any number of oscillators and in the present implementation all the harmonic outputs will be tapped off of a single common phasor.

The familiar sampled sinusoid generates the fundamental (pitch) frequency signal for the bank to be constructed:

$$x(t) = A \sin(\omega t + \phi) \qquad (1)$$

where $A$ is oscillator amplitude and $\phi$ its instantaneous phase. $\omega$ in rad/s is calculated as $2 \pi f$, where $f$ is frequency.

Expressed in pseudo-code, Eq. 1 can be implemented with the modulo function:

    w = f / SR
    mp = 0.0
    for n = 0 to N
        mp = (mp + w) mod 1.0
        y[n] = a * sin(2pi * mp)
    end

The constant SR is the sample rate and the variable mp is the fundamental’s instantaneous phase.

The key to the next step is sharing the instantaneous phase mp with any other oscillators, where o specifies oscillator number and h its harmonic number:

    cp[o] = (h[o] * mp) mod 1.0
    y[o][n] = a[o] * sin(2pi * cp[o])

and, since we’re interested in doing FM,^3

    m[o] = i[o] * sin(2pi * mp)
    cp[o] = (h[o] * mp) mod 1.0
    y[o][n] = a[o] * sin(2pi * cp[o] + m[o])

^3 The above pseudo-code implements one simple FM pair consisting of an independent carrier and shared modulator which produces a formant centered at harmonic frequency $h_o$ of pitch $p$ with modulation index $i_o$. The latter coefficient determines formant bandwidth and is typically used in a low range ($i_o < 2.0$).

In practice, a bank of six (or more) carrier oscillators of this kind will be used to generate a vocal sound. These will create phonemes of 3 (or more) formants represented by a time-varying distribution of h, a, i coefficients.

Figure 4. Spectrogram (frequency 0–4000 Hz, time 0–4 s), fringing free. Re-rendering the vibrato example from 2(a) with phase-synchronous oscillators eliminates the fringing artifact.

The completed glitch-free method consists of Chowning FM singing voice + Le Brun cross-fade algorithm (from Sec. 2) + Lazzarini phase-synchronous oscillator bank (from Sec. 3). Fig. 4 displays a spectrogram of vibrato rendered using the fully-realized solution (Faust and Chuck code to generate the example are included in the program appendices following). Classic phoneme table synthesis using Chowning’s method can now be extended to arbitrary dynamic behavior.

4. APPLICATIONS

The singing voice technique has been used in three projects. Sound examples for each of these can be found online [11].

4.1 Converting eCog Signals to Music

Electrocorticography (eCog) registers brain electrical activity directly from inside the skull. Using sensors placed in regions suspected of giving rise to epilepsy, eCog arrays provide precise diagnostic monitoring as well as signals of great importance for studying the brain itself. In a current sonification project, the data is sung by a digital chorus. Each singer is a vocal simulation synthesized by the present technique. Where arrays have been implanted for therapeutic reasons, a large number of sensors is available (> 50) and the chorus can be made equally large. The aim of the work is to create music directly from these sensors.

A correspondence exists between the temporal structures of music and the dynamics of brain activity monitored by eCog. Musical time has its notes, rhythms, phrases and epochal structures. Brain signals are marked by structures on the same time scales. Translation to music requires no modification of time base. In fact, the present approach avoids “re-composing” or altering the data in any way. The individuals who have contributed data to this project share our interest in discovering the potential of music as a different, new way of comprehending the complexity of brain dynamics.

Seizures we have listened to have a characteristic progression. They arise with a light, fast modulation “aura” not unlike a super-fast vibrato or tremolo which gives way to a nearly regular strong march of pulses. Multiple trains of pulses play against each other polyrhythmically. This crescendoes to an almost unbearable climax, the apex of the seizure. When it seems impossible for it to grow further, there is an abrupt cessation revealing a state of nothing. The paroxysm has switched off and the music is a quiet, calm, sustained chord. Motion is regained after this repose, but it is in a new world. Long, slow undulations characterize the postictal phase whose affect is troubling, almost nauseous. Typically, this can last 45 minutes until normal brain activity is regained.

The method for translating brain signals into music is to have them modulate synthesized tones. The choice of singing synthesis makes an aesthetic connection to the “human-ness” of the data. Our chorus of eCog channels “performs” via modulations of pitch, loudness, vocal qualities and spatial location.

4.2 Speech synthesis

Speech synthesis, with its widely varying pitch and phoneme transitions, provides a good “real-life” test of the formant synthesis technique. The test has been created with a “toy” analysis – resynthesis platform driving synthesis from digitized singing and speech. The formant tracking analyzer is written in Chuck and the formant synthesizer is a Chuck UGen (unit generator) written in Faust. The analysis portion is FFT-based and uses a relatively long (4096 sample) window for formant accuracy (at 48 kHz sample rate). An example speech input fragment and the method’s resynthesized output are compared in the spectrograms of Fig. 5. Signal coding in this version consists simply of recording formant parameter updates which are relatively sparse (and could be greatly optimized). The results are promising for developing this into an FM-based speech coder – the example consists of two different speakers in a heated, emotional dialog. Their voices and identities are preserved, as is their expressive prosody and intelligibility. The analysis tracks populations of short-lived formants which in the example are limited to 4 at a time (using 9 oscillators total).

Figure 5. Analysis – resynthesis of dialog: (a) is the input from an argument between a teenage daughter and her mother, “can’t you please give me some space,” and “no, I will not give you some space.”, (b) FM resynthesis.

4.3 Near the Inner Ear

The formant generation technique was applied in a recent composition for orchestra, computer music and video premiered in 2013 by the Stanford Symphony Orchestra, Jindong Cai, conductor. The orchestral score by Dohi Moon was recorded and analyzed with a formant tracking algorithm. Analyzed formant tracks were also obtained from a recording of the first movement of Beethoven’s Ninth Symphony (the two pieces were performed together for the conclusion of a full Beethoven cycle). Near the Inner Ear incorporated resynthesized clips from both analysis data sets.

The composition used formant-based resynthesis to generate a novel timbral identity whose music is tied to the orchestral writing but whose comportment is different enough to constitute a kind of musical “alter ego.” The work opens with a 90 s section in which an antiphonal exchange exploits such contrasts. The resynthesis instrument returns at various moments during the piece with musical gestures which reinforce the accompanying video composition by John Scott.

5. CONCLUSION

One goal of the analysis – resynthesis system going forward is to create a large database of acquired vocal sounds in order to structure more complex phoneme-based singing synthesis from a vastly expanded table. Applications described above have demonstrated a potential to create rich reservoirs of articulations and timbral identities. The hope is to tap into greater timbral variety for the sonification work, and possibly also to acquire listeners’ own voice traits for EEG-driven synthesis.

Resynthesis in the service of music projects like Near the Inner Ear can also take liberties with acoustic structures in order to create new instrumental identities. In Near the Inner Ear, the formant tracker operating on orchestral sound inferred formants where no model would predict them to exist. Experimenting with resynthesis, it was found that often the tracker would emphasize pitches of inner voices in the analyzed recordings. Rather than impose an f0 fundamental pitch, formant frequencies themselves were allowed to become the f0. The remaining formants would then create a kind of “instrumental singing” whose resonances mapped to the pitch structures from the original recordings.

The present synthesis method can be used for non-vocal sounds whose acoustic structures are also represented with formant-like resonances. Horner has explored timbre matching for a sampled trumpet using a genetic algorithm to find suitable FM formant parameters [12].

The improvements to FM vocal synthesis detailed in this paper can be extended to other audio rate modulation schemes, in particular those which also employ single modulator / multiple carrier structures. A glitch-free AM vocal synthesis “cousin” has also been implemented in Faust. AM has the advantage of simplicity in prediction of dynamic sideband behavior (AM sidebands are free of the Bessel function which determines FM sidebands).

Acknowledgments

Many thanks to John Chowning for his inventions and encouragement, musical and technical. Bill Schottstaedt continues to passage into the future comprehensive sets of synthesis instruments and analysis tools. His Snd project preserves and provides essential computer music algorithms without which much of the present work would not have been possible.

6. REFERENCES

[1] J. Kleimola, “Nonlinear abstract sound synthesis algorithms,” Ph.D. dissertation, Aalto University, Helsinki, Finland, 2013.

[2] V. Lazzarini and J. Timoney, “Theory and practice of modified frequency modulation synthesis,” J. of Audio Eng. Soc., vol. 58, no. 6, pp. 459–471, 2010.

[3] J. Chowning, “Computer synthesis of the singing voice,” in Sound Generation in Winds, Strings, Computers, J. Sundberg, Ed. Royal Swedish Academy of Music, 1980, pp. 4–13.

[4] J. Chowning, “Frequency modulation synthesis of the singing voice,” in Current Directions in Computer Music Research, M. Mathews and J. Pierce, Eds. MIT Press, 1989, pp. 57–64.

[5] M. Le Brun, “Digital waveshaping synthesis,” J. of Audio Eng. Soc., vol. 27, no. 4, pp. 250–266, 1979.

[6] “Scheme, Ruby, and Forth Functions included with Snd,” last viewed 29 Mar. 2013. [Online]. Available: https://ccrma.stanford.edu/software/snd/snd/sndscm.html

[7] “FMVoices Class Reference, in The Synthesis ToolKit in C++,” last viewed 29 Mar. 2013. [Online]. Available: https://ccrma.stanford.edu/software/stk/

[8] “Chuck : Strongly-timed, concurrent, and on-the-fly audio programming language,” last viewed 29 Mar. 2013. [Online]. Available: http://chuck.cs.princeton.edu/

[9] “PeRColate, A collection of synthesis, signal processing, and image processing objects for Max/MSP,” last viewed 29 Mar. 2013. [Online]. Available: http://music.columbia.edu/percolate/

[10] “FAUST (Functional Audio Stream),” last viewed 29 Mar. 2013. [Online]. Available: http://faust.grame.fr/

[11] “Supporting online materials.” [Online]. Available: http://ccrma.stanford.edu/~cc/vox/smac2013som/

[12] A. Horner, J. Beauchamp, and L. Haken, “Machine Tongues XVI: Genetic algorithms and their application to FM matching synthesis,” Computer Music J., vol. 17, no. 4, pp. 17–29, 1993.

7. PROGRAM A

The Faust program, FMVox.dsp [11], implements an FM-voice algorithm with four formants consisting of uniform phase table-lookup harmonic oscillators. A multiplexed audio rate coefficient stream drives the synthesis. For each formant a demuxer is included which extracts its coefficients from the stream. Formants are instantiated in a parallel composition by the Faust process which outputs the sum of their signals. The resulting unit generator compiled by Faust can have a scalable number of formants. A texture of multiple voices can be created from multiple, independent voice control streams which flow to independent unit generators. Using this architecture a choir of voices can be distributed across multiple sample-synchronous threads and / or multiple cores.

The fupho function implements one formant (of uniform phase harmonic oscillators) which is free of glitches caused by dynamic behavior. Its inputs are the fundamental frequency, formant amplitude, bandwidth and center frequency (respectively, f0, a, b, c). A fupho receives demuxed controls via its calling function formant.

    declare name "FMVox";
    import("filter.lib");
    ts = 1 << 16;                      // table size
    fs = float(ts);
    ts1 = ts+1;                        // one guard point for interpolation
    ct = (+(1)~_ ) - 1;                // sample counter 0, 1, 2, ...
    fct = float(ct);
    sc = fct*(2*PI)/fs:sin;            // sine table for the carriers
    sm = fct*(2*PI)/fs:sin:/(2*PI);    // sine table scaled to cycles for the modulator
    dec(x) = x-floor(x);               // fractional part
    pha(f) = f/float(SR):(+:dec) ~ _;  // shared phasor in [0,1)
    tbl(t,p)= s1+dec(f)*(s2-s1)        // linearly interpolated table lookup
    with {
      f = p*fs;
      i = int(f);
      s1 = rdtable(ts1,t,i);
      s2 = rdtable(ts1,t,i+1); };
    fupho(f0,a,b,c) = (even+odd):*(a)
    with {
      cf = c/f0;                       // real-valued carrier ratio
      ci = floor(cf);                  // nearest lower harmonic
      ci1 = ci+1;                      // nearest upper harmonic
      isEven= if((fmod(ci,2)<1),1,0);
      ef = if(isEven,ci,ci1);          // harmonic held by the even carrier
      of = if(isEven,ci1,ci);          // harmonic held by the odd carrier
      frac = cf-ci;
      comp = 1-frac;
      oa = if(isEven,frac,comp);       // complementary cross-fade gains
      ea = if(isEven,comp,frac);
      ph = pha(f0);
      m = tbl(sm,ph):*(b);             // shared modulator
      even = ea:*(tbl(sc,(dec(ef*ph))+m));
      odd = oa:*(tbl(sc,(dec(of*ph))+m));};
    frame(c) = (w ~ _ ) with {
      rst(y)= if(c,-y,1);
      w(x) = x+rst(x); };
    demux(i,ctr,x) = coef
    with {
      trig = (ctr==i);
      coef = (*(1-trig)+x*trig) ~ _;};
    formant(f_num,ctlStream) = fsig
    with {
    process = _<:par(i,nf,formant(i)):>_;

8. PROGRAM B

FMVox is used in the Chuck program, FMVoxVib.dsp [11], to produce Fig. 4. The example defines a master “shred” which executes for 4 s. It sets up a DSP graph in which one FMVox instance receives a sample-synchronous control stream and sends its output to the dac. The master shred “sporks” child shreds for vibrato and a multiplex control stream. The data float array holds “aaa” vowel coefficients for four formants described by their amplitude and center frequency. For this test code formant bandwidths are set the same globally.

    Step stream => FMVox fmv => dac;
    4 => int nFormants;
    1::ms => dur updateRate;
    SinOsc vibLFO => blackhole;
    vibLFO.freq(3);
    vibLFO.gain(0.1);
    Std.mtof(64) => float p => float f0;
    fun void vibrato() {while (true){
      ((vibLFO.last()+1.0)*p) => f0;
      1::ms => now;
    }}
    [ // "aaa"
    [ 349.0, 0.0],[ 918.0,-10.0],
    [2350.0,-17.0],[2731.0,-23.0]
    ] @=> float data[][];
    fun void mux(float val) {
      stream.next(val);
      1::samp => now;
    }
    -1 => int startFrame;
    95 => float db;
    fun void muxStream() {
      updateRate-14::samp => dur padTime;
      while(true){
        padTime => now;
        mux(startFrame);
        mux(f0);
        for (0 => int f; f now;
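The cross-fade assignment of Sec. 2 can be illustrated with a short Python sketch. It mirrors the gain and harmonic selection visible in the fupho function of Program A, but it is only an illustration: the function names quantized_harmonic and crossfade_pair are hypothetical, and the single-carrier rounding rule is an assumption about how an integer ratio would be chosen in the original method.

```python
import math

def quantized_harmonic(fc, fp):
    """Single-carrier case: integer carrier ratio nearest to fc/fp.
    (Assumed rounding rule; the ratio jumps by 1 as fc/fp sweeps.)"""
    return max(1, math.floor(fc / fp + 0.5))

def crossfade_pair(fc, fp):
    """Two-carrier case: return (even_h, even_gain, odd_h, odd_gain).

    The two carriers bracket the real ratio fc/fp. Gains are
    complementary and linear in proximity to the target, so the
    carrier that is about to change harmonic number is muted."""
    ratio = fc / fp
    lower = math.floor(ratio)      # nearest lower harmonic
    upper = lower + 1              # nearest upper harmonic
    frac = ratio - lower           # 0 at lower, approaching 1 at upper
    if lower % 2 == 0:
        # even carrier holds the lower harmonic
        return (lower, 1.0 - frac, upper, frac)
    # odd carrier holds the lower harmonic
    return (upper, frac, lower, 1.0 - frac)

if __name__ == "__main__":
    fp = 100.0
    for fc in (249.0, 251.0, 299.999, 300.0):
        print(fc, quantized_harmonic(fc, fp), crossfade_pair(fc, fp))
```

Sweeping fc from just below 300 Hz to 300 Hz at a pitch of 100 Hz moves the even carrier from harmonic 2 to harmonic 4, but its gain is essentially zero at that moment, which is the muting property the text describes.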
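As a rough companion to the pseudo-code of Sec. 3.2, the following Python sketch renders one formant from a shared phasor, with the even/odd carrier pair cross-faded as in Sec. 2. It is a minimal sketch under stated assumptions, not the FMVox implementation: the name fm_formant, the per-sample control lists, and the requirement that fc stay at or above f0 are all choices made here for illustration.

```python
import math

def fm_formant(f0_track, fc_track, amp, index, sr=48000):
    """Render one FM formant, one output sample per control value.

    f0_track, fc_track: per-sample pitch and formant center (Hz),
    assumed to satisfy fc >= f0. amp scales the output and index is
    the modulation index (formant bandwidth control)."""
    mp = 0.0                                  # shared phase in cycles
    out = []
    for f0, fc in zip(f0_track, fc_track):
        ratio = fc / f0
        lower = math.floor(ratio)             # bracketing harmonics
        frac = ratio - lower                  # linear cross-fade position
        mod = index * math.sin(2 * math.pi * mp)   # shared modulator
        # Both carriers derive their phase from mp, so a change of
        # harmonic number never introduces an arbitrary phase offset.
        lo = math.sin(2 * math.pi * ((lower * mp) % 1.0) + mod)
        hi = math.sin(2 * math.pi * (((lower + 1) * mp) % 1.0) + mod)
        out.append(amp * ((1.0 - frac) * lo + frac * hi))
        mp = (mp + f0 / sr) % 1.0             # advance the phasor
    return out

if __name__ == "__main__":
    n = 960
    y = fm_formant([100.0] * n, [918.0] * n, amp=1.0, index=1.0)
    print(len(y), max(abs(v) for v in y))
```

Because every carrier phase is a multiple of the same phasor, a steady 100 Hz pitch at a 48 kHz sample rate yields an output that repeats every 480 samples, whichever harmonics the cross-fade happens to select.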