Integer-Based Wavetable Synthesis for Low-Computational Embedded Systems

Ben Wright and Somsak Sukittanon
University of Tennessee at Martin, Department of Engineering, Martin, TN, USA

Abstract: The evolution of digital music synthesis spans from tried-and-true frequency modulation to string modeling with neural networks. This project begins from a wavetable basis. The audio was modeled as a superposition of formant sinusoids at various frequencies relative to the fundamental frequency. Piano sounds were reverse-engineered to derive a basis for the final formant structure, which was then used to create a wavetable. The quality of reproduction hangs on a trade-off between rich notes and sampling frequency. For low-computational systems, this calls for an approach that avoids burdensome floating-point calculations. To speed up calculations while preserving resolution, floating-point math was avoided entirely: all numbers involved are integers. The method was built into the Laser Piano. Along its 12-foot length are 24 “keys,” each consisting of a laser aligned with a photoresistive sensor connected to an 8-bit MCU. The synthesis is processed in the sensor array by four microcontrollers running a strictly synchronous wavetable synthesis algorithm. A microcontroller integrated in the laser array can toggle each laser, allowing the piano to play itself or to limit the playable keys.

I. INTRODUCTION

A. Background Review

Over the years, music synthesis has branched into many different techniques. FM synthesis is classically favored for its simplicity but is limited in signal realism. The Karplus-Strong algorithm is favored for rich harmonic content and authenticity, particularly in the transient responses of plucked and struck strings. In [1], Karplus-Strong polynomials are used to define zeros that selectively cancel harmonics from a traditional Karplus-Strong transfer function. The time-domain convolution associated with this method, however, requires more RAM than may be practical for a low-computational system. In [2], the authors proposed an impressive system that synthesizes music by modeling the vibration of a string with Scattering Recurrent Networks, with very accurate results. Implementing this system, however, would still be far beyond the capacity of a typical small embedded system. Techniques like these have been combined in [3] by a control structure that dynamically selects and combines synthesis techniques to benefit from the advantages of each. This control structure could prove invaluable to a system synthesizing a wide range of timbres and pitches but would be unnecessary for a system with sufficiently limited scope.

For embedded systems, wavetable synthesis offers an attractive combination of realism and speed. In this paper, a lightweight implementation of wavetable synthesis is discussed. Section I-B discusses the mathematics explaining the components of the wavetables. Section II examines the music theory used to derive the wavetables and the software implementation. Section III covers the applied results.

B. Mathematics and Music Theory

Many periodic signals can be represented by a superposition of sinusoids, conventionally called a Fourier series. Eigenfunction expansions, a superset of these, can represent solutions to partial differential equations (PDEs) where Fourier series cannot. In these expansions, the frequency of each wave (here called a formant) in the superposition is a function of the properties of the string. This is said merely to emphasize that the response of a plucked string may need a model more complete than a Fourier series to capture its timbre accurately. The string's motion is governed by the wave equation and initial conditions in (1).

$$ \frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}, \qquad u(x,0) = f(x), \qquad \frac{\partial u}{\partial t}(x,0) = g(x) \tag{1} $$

The function $u(x,t)$ represents the displacement of a vibrating string as a function of position $x$ along the string, where $0 < x < a$, and time $t$. $f(x)$ and $g(x)$ represent the initial displacement and velocity, respectively. d’Alembert’s solution [4] to this PDE is given in the form

$$ u(x,t) = \psi(x + ct) + \phi(x - ct), \tag{2} $$

in which $c$ is the wave speed of the travelling waves. $\psi$ and $\phi$ describe travelling waves that move in opposite directions at speed $c$. A solution to this PDE is given by

$$
\begin{aligned}
u(x,t) &= \int_0^\infty \left[ A_\lambda \cos(\lambda c t) + B_\lambda \sin(\lambda c t) \right] \sin(\lambda x)\, d\lambda, \\
A_\lambda &= \frac{2}{a}\int_0^a f(x)\sin(\lambda x)\, dx, \\
B_\lambda &= \frac{2}{\lambda a c}\int_0^a g(x)\sin(\lambda x)\, dx.
\end{aligned} \tag{3}
$$

The initial conditions $f(x)$ and $g(x)$ are used, by an appeal to orthogonality, to derive a pair of coefficients $A_\lambda$ and $B_\lambda$ for each eigenfunction. There exists one eigenfunction for each eigenvalue $\lambda$. These eigenvalues need not be related to one another, nor even be countable. Each eigenfunction is called a formant.

These formants are not as easily determined for a generalized eigenfunction expansion as for a Fourier series. The formants are assumed to relate to the fundamental frequency of each note in identical fashion. A vibrating string, for example, should produce the same timbre (i.e., the quality of sound as described by its formant structure) even when tuned higher or lower. The relations of the formants to the fundamental root were assumed to fit the model of a diatonic scale as described by Western music theory. This limited the formants to a set of the most significant few, which provide a skeleton into which the formants can fit. If not perfectly accurate, this assumption proved a fair approximation and simplified the enforcement of periodicity.

Adding formants at eigenvalue frequencies extends the fundamental period of the superposition. The wavetable is not easily truncated because it must contain no discontinuities. However, the wavetable must be small enough to fit in the limited RAM of a small MCU.

II. ALGORITHM DESIGN

A. Waveform Construction

How to produce a desired timbre could be a subject of considerably deep inquiry. Applying an idealized partial differential equation to such a pursuit would be difficult enough, but easier still than so modeling a realistic physical string. The limitations of this implementation make such precision needless. For this design, spectral analyses of professionally recorded piano notes were studied as a first step toward reverse-engineering piano sound. To lessen RAM consumption on-chip, a wavetable only large enough to capture one period of the note’s fundamental frequency was used. To maintain periodicity, formants were chosen that satisfied periodicity within this window, most notably the note’s perfect fifth and compound major third. This way there are no discontinuities in the final wave.

Once a pattern of formant relationships and gains was recognized, the waveform was reconstructed in MATLAB (as shown by the code in Fig. 4a and plotted with the ADSR envelope in Fig. 1) using a superposition of waves with a similar mapping of frequencies and amplitudes. Multiple MATLAB files were written to simulate the on-chip synthesis as closely as possible. Every note was sampled from an array of 256 8-bit numbers and amplitude-modulated according to the ADSR (Attack, Decay, Sustain, Release) envelope, as illustrated in Fig. 2.

Fig. 2. A set of notes plotted in MATLAB to illustrate the effect of the amplitude modulation on the repetitive waveshape. Each enlarged section shows the signal within 50-ms-wide cuts.

The amplitude modulation proved to be more important to the final sound than was initially expected. The use of an ADSR envelope to modulate the amplitude of the waveform provided the striking attack characteristic of a piano note. Intuitively speaking, this envelope could be just as important to other instruments, particularly drums and woodwinds.
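The waveform construction described in this subsection can be sketched in integer-only C. This is a minimal sketch under stated assumptions, not the authors' MATLAB script from Fig. 4a: the perfect fifth and compound major third are interpreted here as the 3rd and 5th harmonics of the fundamental (which repeat exactly within one fundamental period), the gains in `formants` are hypothetical placeholders, the sine is approximated with Bhaskara I's rational formula so that no floating-point library is needed, and the output is assumed to be centred at 125 within a 0 to 250 range. The names `isin1000` and `build_wavetable` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256               /* samples per fundamental period (from the text) */
#define HALF_SIZE  (TABLE_SIZE / 2)

/* Hypothetical formant set: harmonic number and relative gain (placeholders). */
typedef struct { uint8_t harmonic; uint8_t gain; } formant_t;
static const formant_t formants[] = { {1, 8}, {3, 3}, {5, 2} };
#define NUM_FORMANTS (sizeof formants / sizeof formants[0])

/* Approximates 1000*sin(2*pi*phase/TABLE_SIZE) with integers only,
   using Bhaskara I's formula sin(x) ~ 16x(pi-x) / (5pi^2 - 4x(pi-x)). */
int32_t isin1000(uint32_t phase)
{
    int32_t p = (int32_t)(phase % TABLE_SIZE);
    int32_t sign = 1;
    if (p >= HALF_SIZE) {            /* second half-period mirrors the first */
        p -= HALF_SIZE;
        sign = -1;
    }
    int32_t q = p * (HALF_SIZE - p);
    return sign * ((16000 * q) / (5 * HALF_SIZE * HALF_SIZE - 4 * q));
}

/* Fills table[] with one period of the summed formants, scaled to 0..250. */
void build_wavetable(uint8_t table[TABLE_SIZE])
{
    int32_t gain_sum = 0;
    for (uint32_t k = 0; k < NUM_FORMANTS; k++) {
        gain_sum += formants[k].gain;
    }
    for (uint32_t n = 0; n < TABLE_SIZE; n++) {
        int32_t acc = 0;
        for (uint32_t k = 0; k < NUM_FORMANTS; k++) {
            /* every formant is an integer multiple of the fundamental phase */
            acc += formants[k].gain * isin1000(n * formants[k].harmonic);
        }
        /* acc lies in +/- 1000*gain_sum; map it to 125 +/- 125 */
        table[n] = (uint8_t)(125 + (acc * 125) / (1000 * gain_sum));
    }
}
```

Because every formant in this sketch is an integer multiple of the fundamental, the last table entry flows back into the first without a jump, which is the periodicity requirement stated above.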

Fig. 1. A MATLAB plot of the waveform and ADSR envelopes, each plotted as amplitude against sample index. The waveform shapes the timbre of the notes and the ADSR further controls the amplitude of each note to realistically depict its attack and decay.

B. Firmware Implementation

A program was written in C, using the CodeVision C compiler [5], to synthesize a set of simultaneous piano notes on an embedded system. A timer interrupt function was used to synchronize the wavetable sampling. The code controlling amplitude modulation was written inside an interruptible loop. The output, a superposition of all notes, was written through a byte register to a DAC, whose analog output was filtered and sent straight to the speakers. The flowchart is shown in Fig. 3b.

A timer built into the microcontroller increments a byte register every 32 clock cycles. Each time this register overflows, an interrupt subroutine is called and the counter is reinitialized to tune the frequency of such calls. On every timer interrupt, an 8-bit number for each note is sampled from an index in the wavetable, then amplitude-modulated according to the note's position in the ADSR table. The sum of these numbers is normalized and output to the DAC.
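One sampling period of the interrupt described above might look like the following. It is a sketch rather than the authors' firmware: the `voice_t` layout, the function name `mix_sample`, the multiply-and-shift used for amplitude modulation, and normalization by dividing by the note count are all assumptions. The index is held as a 16-bit value scaled by 256, the format the paper adopts for tuning, so its high byte selects the wavetable entry.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256   /* wavetable length (from the text) */
#define NUM_NOTES  6     /* notes handled per chip (from the text) */

typedef struct {
    uint16_t phase;      /* table index scaled by 256: high byte = entry */
    uint16_t step;       /* added to phase every interrupt; sets the pitch */
    uint8_t  level;      /* current ADSR amplitude, 0..255 */
} voice_t;

/* One sampling period: advance every voice, scale it, and mix to 8 bits. */
uint8_t mix_sample(voice_t voices[NUM_NOTES], const uint8_t wavetable[TABLE_SIZE])
{
    uint16_t sum = 0;
    for (uint8_t i = 0; i < NUM_NOTES; i++) {
        /* 16-bit addition wraps modulo 65536, i.e. once per table period */
        voices[i].phase = (uint16_t)(voices[i].phase + voices[i].step);
        uint8_t sample = wavetable[voices[i].phase >> 8];
        /* amplitude modulation: 8-bit sample times 8-bit level, keep high byte */
        sum = (uint16_t)(sum + (((uint16_t)sample * voices[i].level) >> 8));
    }
    /* normalize so the mix of all notes still fits the 8-bit DAC (assumed) */
    return (uint8_t)(sum / NUM_NOTES);
}
```

In this format a note's pitch is set by `step`, which equals the note frequency divided by the sampling rate, times 65536. The timer reload and the DAC port write are omitted because they depend on the specific MCU and compiler.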

Fig. 3. (a) System diagram of the piano hardware. (b) Flowchart describing the piano synthesis algorithm. Two counters increment on each interrupt to synchronize ADSR timings for both damped and undamped notes.
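The two counters mentioned in the caption of Fig. 3 suggest a scheme like the one below, in which the interrupt only counts and the interruptible loop decides when an envelope step is due. This is a hypothetical sketch: the periods `DAMPED_PERIOD` and `UNDAMPED_PERIOD`, the `adsr_clock_t` type, the table length, and the function names are invented for illustration. Held (undamped) notes advance on the slower counter, following the paper's statement that a note steps through the ADSR more slowly while its input stays present.

```c
#include <assert.h>
#include <stdint.h>

#define ADSR_LAST        255  /* last index of a hypothetical 256-entry ADSR table */
#define DAMPED_PERIOD     40  /* hypothetical: interrupts per step, released notes */
#define UNDAMPED_PERIOD  160  /* hypothetical: interrupts per step, held notes */

typedef struct {
    volatile uint8_t damped_ticks;    /* interrupts since damped notes last stepped */
    volatile uint8_t undamped_ticks;  /* interrupts since held notes last stepped */
} adsr_clock_t;

/* Timer interrupt side: two unconditional increments, so the cost is constant. */
void adsr_clock_tick(adsr_clock_t *clk)
{
    clk->damped_ticks++;
    clk->undamped_ticks++;
}

/* Loop side: report whether a full period has elapsed, and consume it if so.
   Real firmware would protect this read-modify-write from the interrupt. */
uint8_t adsr_step_due(volatile uint8_t *ticks, uint8_t period)
{
    if (*ticks < period) {
        return 0;
    }
    *ticks = (uint8_t)(*ticks - period);
    return 1;
}

/* Loop side: advance each note's ADSR index on its own timing. */
void adsr_service(adsr_clock_t *clk, uint8_t index[], const uint8_t held[], uint8_t count)
{
    uint8_t damped_step = adsr_step_due(&clk->damped_ticks, DAMPED_PERIOD);
    uint8_t undamped_step = adsr_step_due(&clk->undamped_ticks, UNDAMPED_PERIOD);
    for (uint8_t i = 0; i < count; i++) {
        uint8_t step = held[i] ? undamped_step : damped_step;
        if (index[i] < ADSR_LAST) {   /* stop at the end of the envelope */
            index[i] = (uint8_t)(index[i] + step);
        }
    }
}
```

The branches here live only in the interruptible loop. The interrupt side stays free of control statements, so its execution time does not depend on the state of any note. With 8-bit tick counters, the loop must run at least once every 255 interrupts or a counter will wrap.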

These indices are incremented at different rates. The indices controlling position in the wavetable are incremented according to the frequency of each note, so the indices for higher notes step through the wavetable faster than those for lower notes. The indices controlling a note’s position in the ADSR table are incremented slowly enough that this can happen within the interruptible loop.

Critical to this program’s success was the tuning of the notes. The amount by which to increment each index can get muddled by 8-bit precision, because these increments become small for lower notes and higher sampling rates. Floating-point math, on the other hand, proved to process very slowly. Instead, a 16-bit unsigned integer was used to represent each index, scaled up by 256. This yielded tuning more than precise enough for this application while demanding much less CPU time than floating-point math.

The indices for the ADSR table, in contrast, were fairly simple to process. These indices, also 8-bit, were incremented far less frequently. The increment timings were scaled down from the interrupt frequency by incrementing a counter in the interrupt function that the interruptible amplitude-modulating code would check. Also within this loop, checks are made concerning user input and which notes should start and stop. The best performance comes from striving for the highest sampling frequency while leaving enough CPU time to handle amplitude modulation and the sampling of user inputs. Traversing the ADSR table is not as important to the user’s ear as synchronously traversing the wavetable. Above all, output from the wavetable must be synchronous. This is why the amplitude modulation is interruptible but wavetable sampling is not. In addition, every interrupt must consume exactly the same amount of CPU time to remain synchronous and consistent. For this reason, control statements were avoided within the timer interrupt code. The use of Boolean values in formulae allowed if conditions to be avoided completely: a logical 0 or 1 can be used as a coefficient just like a numeric 0 or 1. For example, a number to be incremented if a condition is true may always be incremented without a control flow statement, even when that means the number is incremented by zero.

III. RESULTS

The final product was a laser piano. In parallel 12-foot arrays, lasers (each 5 mW, 650 nm) shine along the floor into an array of photoresistive sensors, one for each of 24 fully independent notes, as Fig. 3a shows. Full 24-note polyphony from C4 to B5 is processed by four microcontrollers (AVR ATmega644), each governing its own range of 6 notes. Overclocked with 27 MHz piezoelectric crystals, each chip performs consistently at a sampling rate just above 10 kHz. The synthesis algorithm uses two separate ADSR timings. A note is stepped through the ADSR more slowly as long as the user input corresponding to that note stays present. This way a laser blockage that remains causes a note to sustain longer, like holding a key down on a real piano. Each sampling period, the amplitude of the superposition of synthesized notes is output to an 8-bit DAC (DAC-08CN).

A microcontroller (ATmega) integrated in the laser array can toggle each laser. Programmed with the Harry Potter theme, the Imperial March, and Für Elise, the piano can effectively play itself. This microcontroller is also programmed with a set of scales to make playing the piano easier. Because such a microcontroller cannot source enough current for all 24 lasers, the lasers are driven by Darlington pairs that the MCU controls.

The sensors were constructed using photoresistors aligned with plastic tubes and wired in series with a fixed resistor. Each photoresistor has high resistance when darkened and lower resistance when light shines on it. The sensor circuit uses voltage division to sense the difference between a laser shining and a blockage. The series resistor was selected to maximize the difference between the maximum and minimum voltages across the photoresistor. It can be shown that the best series resistance is the geometric mean of the highest and lowest resistances of the photoresistor: maximizing $R_{max}/(R_{max}+R_s) - R_{min}/(R_{min}+R_s)$ over the series resistance $R_s$ gives $R_s = \sqrt{R_{max} R_{min}}$. The low and high voltages were reliably distinguishable as TTL logic levels, so the sensors could be wired directly to tri-state pins on the chip.

The DAC selected outputs a current signal, so all the DAC output signals are added simply by wiring them to a common node. This current signal from the DACs was filtered by a parallel RC filter with a cutoff frequency of 1500 Hz, meant to smooth the discontinuities in the DAC output signal. Finally, a series capacitor was added to normalize the voltage output to the speaker and eliminate popping.

IV. CONCLUSION

Even quite complicated timbres can be reproduced with wavetable synthesis. On systems with more computational power, more complicated string modeling techniques can be better exploited to produce higher quality sound. There is much left to do in modeling the responses of musical instruments for synthesis. There may be formants at frequencies lower than the fundamental frequency of the note or outside diatonic scales. When formants are not as limited, a larger waveform envelope may become necessary to ensure periodicity of the output signal. These should be implemented with an algorithm that does not fail to keep wavetable output synchronous while keeping ADSR timings steady. Using integer calculations wherever possible may sacrifice some precision, but yields considerable savings in CPU time. Fig. 4b shows the envelopes initialized as 8-bit integral types (unsigned char in C). These methods proved effective in an implementation on an 8-bit embedded platform. Similar techniques may be effective on larger systems. Another step to make such an algorithm more effective could be to add transient effects, e.g., when synthesizing the sound of an acoustic guitar, the “scratch” when a string is plucked. For such things, pursuing a different algorithm may prove more efficient.

ACKNOWLEDGEMENT

The authors would like to thank Robert Reeves for his discussion and help in this work.

REFERENCES

[1] I. A. Cummings, R. Venugopal, J. Ahmed, and D. S. Bernstein, “Generalizations of the Karplus-Strong Transfer Function for Digital Music Sound Synthesis,” in IEEE Proceedings of the American Control Conferences, 1999, pp. 2210-2214.
[2] A. W. Y. Su and L. San-Fu, “Synthesis of Plucked-String Tones by Physical Modeling With Recurrent Neural Networks,” in IEEE Workshop on Multimedia Signal Processing, 1997, pp. 71-76.
[3] S. D. Trautmann and N. M. Cheung, “Wavetable Synthesis for Multimedia and Beyond,” in IEEE Workshop on Multimedia Signal Processing, 1997, pp. 89-94.
[4] D. L. Powers, “The Wave Equation,” in Boundary Value Problems and Partial Differential Equations, 6th ed., Academic Press, 2009, pp. 229-231.
[5] http://hpinfotech.ro/html/cvavr.htm

Fig. 4. (a) Code snippet depicting waveform construction in MATLAB. (b) Envelope initialization in C, generated by MATLAB for the synthesizer firmware.