CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

Applying Local Search for Polynomial Coefficients for Alias Reduction in Oscillators

A thesis submitted in partial fulfillment of the requirements

For the degree of Master of Science in Computer Science

By

Menglee Guy

May 2018

Copyright by Menglee Guy 2018


The thesis of Menglee Guy is approved:

______Dr. Adam B. Kaplan Date

______Dr. Richard Covington Date

______Dr. Jeff Wiegley, Chair Date

California State University, Northridge


Acknowledgements

I would like to express my deep gratitude to Dr. Jeff Wiegley for chairing my thesis committee and allowing me the freedom to pursue my passion of audio plug-in development. I would also like to thank Dr. Richard Covington and Dr. Adam B. Kaplan for being a part of my committee.

I am also grateful to all of those whom I have had the pleasure to meet on Audio DSP forums, as well as the JUCE community, for all of your help and support in my journey into the world of DSP and audio plug-in development.


Dedication

This paper is dedicated to all the trance lovers in the world!


Table of Contents

Signature Page
Acknowledgments
Dedication
Table of Contents
List of Figures
Abstract
1. Introduction
2. Sampling Fundamentals: Frequency vs. Sample Rate
2.1 The Problem: Aliasing
2.2 Nyquist
3. Classical Waveforms and Trivial Implementations
4. Techniques
4.1 Additive Synthesis
4.2 Wavetable Synthesis
4.3 Bandlimited Impulse Train (BLIT)
4.4 Bandlimited Step (BLEP)
4.5 Polynomial BLEP (PolyBLEP)
4.5.1 PolyBLEP Implementation
5. Proposed Technique
5.1 The Process
5.2 Conclusions
5.3 Future Work
6. BabySynth: Development
6.1 BabySynth: Oscillator Design
6.2 BabySynth: Envelope Design
6.2.1 BabySynth: Envelope Implementation
6.2.2 Envelope: multiplier output
6.3 BabySynth: Filter
6.3.1 Filter: Implementation
Bibliography
Appendix A: Trivial Sawtooth
Appendix B: Additive Synthesis
Appendix C: PolyBLEP


List of Figures

Figure
1 Access Virus TI
2 Logic Pro X on MacBook Pro
3 Sylenth1
4 2-cycle Sine Wave
5 Sine Wave Sampled at 400 Hz
6 Sine Wave Sampling Aliasing
7 Sine Wave Representing Nyquist Frequency
8 Classical Waveforms
9 Sinc Pulse to BLEP Residue
10 Correcting points to smooth discontinuity
11 Triangle pulse
12 PolyBLEP residue
13 FFT spectrum with aliasing at 9.34k
14 FFT spectrum with aliasing at 6.26k
15 FFT spectrum with aliasing at 10.2k
16 FFT spectrum of Welch Window BLEP table
17 Basic ADSR Envelope
18 Multiplier Sample Code
19 Low Pass Filter Sample Code


Abstract

Applying Local Search for Polynomial Coefficients for Alias Reduction in Oscillators

By

Menglee Guy

Master of Science in Computer Science

In the development of virtual instruments, the oscillator is responsible for generating periodic waveforms. The classical sawtooth waveform, described as an increasing slope, then a straight drop, is known for its use in music production. A trivial implementation of an oscillator that generates a sawtooth is only a few lines. It requires using a counter that increments a small amount repeatedly, then resets and starts over.

However, this trivial implementation is known to alias badly, because the sharp discontinuity produces harmonics that extend beyond the Nyquist limit. The purpose of this research is to survey current techniques and to investigate the possibility of alias reduction by modifying the polynomial coefficients in the PolyBLEP algorithm. The simplicity of the PolyBLEP is most attractive, and by modifying the coefficients, its alias reduction can be improved. Using JUCE, a framework for creating audio applications, the development of the virtual instrument BabySynth is also discussed.


1. Introduction

In the music industry, a synthesizer is an electronic instrument that generates and controls sounds. The components of a synthesizer fall into three categories: sources, modifiers, and controllers (Pirkle, 2015). An oscillator, which is a source, generates the audio signal. Modifiers include filters and effects; they are used to alter the audio signal.

Controllers are knobs, sliders, and anything used to control the parameters of the modifiers. A popular hardware synthesizer used in today's electronic dance music is the Access Virus TI.

Figure 1: Access Virus TI, revised in March 2009.

As personal computers become increasingly powerful, they are able to run CPU-intensive software, such as a Digital Audio Workstation (DAW). The DAW software is used for recording, editing, and producing audio files. High-end DAWs are packaged with many smaller pieces of software called audio plug-ins, which enhance the capabilities and features of the DAW. Audio plug-ins are highly specialized tools. They are classified as either effects or virtual instruments. Effects alter the audio signal (e.g., equalizers can be used to boost the bass in audio signals). Virtual instruments are software versions of musical instruments that generate audio signals.

Powerful computers are now affordable to the general population. Many professionals in music, and hobbyists alike, are setting up budget music production studios with their personal computers as the key constituent.

Nowadays, it is not uncommon to produce commercial music with a few items: a personal computer, a DAW, and a collection of audio plug-ins. It is an exciting and lucrative time for the industry. People are spending anywhere from a few hundred to tens of thousands of dollars on DAWs and plug-ins. Therefore, audio plug-in development has garnered considerable attention from software developers, and as a result, there are many third party audio plug-ins available for free or for purchase.

Figure 2: A MacBook Pro running Logic Pro X (Apple’s high end DAW).


Audio plug-in development is not a simple task. It requires a keen grasp of digital signal processing (DSP) theory. Virtual instruments, also known as virtual synthesizers, are a subset of plug-ins that are used to create sound. Because of the increase in popularity of music production software, many developers are interested in building virtual synthesizers that are similar in sound character to popular hardware synthesizers.

Sylenth1 is a popular virtual synthesizer that is widely used in today’s electronic dance music.

Figure 3: Sylenth1 by LennarDigital.

The oscillator is the component that is responsible for generating sound. It generates audio data by producing amplitude values of a periodic waveform. Given an array of these values, one could plot the values and visually see the graph of the periodic waveform. Therefore, an oscillator is just a function that returns amplitude values corresponding to a periodic waveform.

There are a handful of common waveforms, known as classical waveforms, found in all virtual synthesizers: the sine, sawtooth, triangle, and square wave. Each waveform produces a different sound. A sawtooth waveform produces a rich, buzzy sound, while a square wave produces a hollow sound. An oscillator can be programmed to generate any repeating waveform; however, the classical waveforms generally cover the sound characteristics of more complex waveforms, so the effort to code a more complex waveform is usually not worth it. Furthermore, there is a fundamental issue with oscillators known as aliasing.


2. Sampling fundamentals: Frequency vs. Sampling Frequency (Sample Rate)

First, a gentle introduction of terms and definitions is required before the problem of aliasing can be demonstrated. Sampling is the process of taking amplitude values of a continuous waveform at regularly spaced time intervals (Mitchell, 2008). The frequency of a waveform is measured in Hertz (Hz), which describes cycles per second. For example, a frequency of 60 Hz describes a cycle that repeats 60 times per second.

Figure 4: A sine wave with 2 cycles shown. The amplitude is the maximum height. The x axis represents time.

The sampling frequency, also in Hertz, is the number of samples per second, and is denoted fs. For example, a sampling frequency of fs = 400 Hz says that there are 400 samples per second. It should be noted that the standard sampling frequency for CD (compact-disc) quality is 44,100 Hz; therefore, there are 44,100 samples per second. The sampling frequency will now be referred to as the sample rate, which helps distinguish it from the frequency of the waveform.

2.1 The problem: Aliasing

In the oscillator, aliasing will occur when the signal is not properly bandlimited (Pirkle, 2015). To understand this phenomenon, examine the process of sampling. As a crude example, let an oscillator generate the sine waveform in figure 5. Let f, the frequency of the waveform, be 60 Hz, and let fs, the sample rate, be 400 Hz. Through the process of quantization, we can represent any continuous audio signal, such as the one below, as a set of discrete values: simply take the amplitude at every constant interval. This interval can be found by taking the inverse of the sample rate, in this case 1/400 = 0.0025 seconds.
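
To make this concrete, the short C++ sketch below (a minimal illustration, not code from BabySynth; the names and sample count are chosen only for this example) evaluates the 60 Hz sine wave at the 1/400-second interval described above.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Sample a 60 Hz sine wave at a 400 Hz sample rate.
    int main()
    {
        const double pi = 3.14159265358979323846;
        const double frequency = 60.0;              // waveform frequency in Hz
        const double sampleRate = 400.0;            // samples per second
        const double interval = 1.0 / sampleRate;   // time between samples: 0.0025 s

        std::vector<double> samples;
        for (int n = 0; n < 40; ++n)                // 40 samples cover 0.1 seconds, i.e. 6 cycles
        {
            const double t = n * interval;          // time of the n-th sample
            samples.push_back(std::sin(2.0 * pi * frequency * t));
        }

        for (double s : samples)                    // the discrete amplitude values (the green points)
            std::printf("%f\n", s);
        return 0;
    }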

Figure 5: A sine wave with a frequency of 60 Hz, sampled at 400 Hz (Wickert, 2013). The green sample points are amplitude values. Bipolar waveforms have positive and negative values.

Given the discrete amplitude values taken above, there is enough information to render this waveform properly. In other words, using the green points above, one can construct a curve that accurately represents the 60 Hz waveform.

Now, let the frequency of the waveform increase dramatically, to 340 Hz, while the sample rate remains at 400 Hz. Again, record the amplitude values using the constant interval.


Figure 6: The frequency is 340 Hz. The sample rate is 400 Hz (Wickert, 2013). The interval is 1/400. The green points do not give enough information to reconstruct a 340 Hz waveform.

Clearly, there is a problem here. Given these amplitude values, what curve would be reconstructed? There are not enough samples to accurately represent the expected frequency, which is 340 Hz. Aliasing is when a frequency is disguised as another, and that is what is happening here. When sampled and reconstructed, the resulting frequency is less than the expected frequency. Naturally, the next question is: given any sample rate, what is the highest frequency that can be sampled and reconstructed accurately?

2.2 Nyquist

The Nyquist theorem says that a bandlimited continuous signal can be sampled and perfectly reconstructed from its samples if the waveform is sampled at more than twice the rate of its highest frequency component. In other words, there need to be at least two samples per cycle to reconstruct the expected frequency, one value above and one value below the zero amplitude level (Mitchell, 2008).


Figure 7: This waveform has 2 samples per cycle. One point above the x axis. One point below the x axis. The Nyquist frequency is half of the sample rate.

Formally, the sample rate must be at least twice the frequency. In figure 6 above, in order to sample 340 Hz with accurate reconstruction, the sample rate must be at least 680 Hz. This will result in a constant interval of 1/680 ≈ 0.00147 seconds. Furthermore, the frequency that is exactly half the sample rate is known as the Nyquist limit, or Nyquist frequency.

The standard sample rate of CDs is 44,100 Hz. The Nyquist limit for this sample rate is (1/2)(44,100) = 22,050 Hz. This means that audio CDs can represent frequencies up to 22,050 Hz. Furthermore, the human ear can hear a range of frequencies from as low as 20 Hz all the way up to 20,000 Hz (Mitchell, 2008). Therefore, the standard CD sample rate of 44.1 kHz works out nicely to cover the range of human hearing. Other common audio sample rates include 48 kHz and 96 kHz. These higher sample rates use up more memory, and the gains in sound quality may not be worth the space.


3. Classical Waveforms and Trivial Implementations

The classical waveforms found in commercial synthesizers, both hardware and software, include the sine, sawtooth, triangle, and square wave. The trivial implementations of these waveforms are straightforward. The term trivial is used because a complex function is not required to model the waveforms.

Figure 8: The classical waveforms: sine, square, triangle, and sawtooth.

Think about how one could model the sawtooth wave. The slope of the sawtooth waveform increases with time, up to a point, then resets with a sharp drop. This behavior is similar to a modulo function. It is trivial to implement in code. Starting at 0, increment a counter on every iteration by a constant amount, the frequency divided by the sample rate. When the counter reaches the maximum value of 1, simply reset the counter by subtracting 1. This counter is the amplitude value of the waveform at a specific point in time. The maximum value of 1 represents the normalized maximum amplitude value. See Appendix A.

The slope of the triangle increases with time, up to a point, then decreases with time, down to a point. Therefore, instead of resetting the counter by subtracting 1, simply decrease the counter by subtracting the constant increment. The square wave holds a constant value that jumps back and forth over time. The discontinuities in the sawtooth and square wave produce harmonics above the Nyquist limit, which fold back as aliasing frequencies. The sine wave is continuous, has no harmonics above its fundamental, and does not alias. The triangle wave has no discontinuity in value; the waveform is continuous piecewise linear, and its trivial implementation aliases little enough that it can be used to generate sound.
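
As a minimal sketch of the trivial triangle and square described above (written here as shape functions of a 0-to-1 phase counter rather than an up/down counter; the names are illustrative and not BabySynth's):

    // phase runs from 0 to 1 and advances by frequency / sampleRate each sample.
    double trivialTriangle(double phase)
    {
        // Rises from -1 to 1 over the first half of the cycle, falls back over the second half.
        return (phase < 0.5) ? 4.0 * phase - 1.0 : 3.0 - 4.0 * phase;
    }

    double trivialSquare(double phase)
    {
        // Holds +1, then jumps to -1 at the half-cycle point; this jump is the discontinuity that aliases.
        return (phase < 0.5) ? 1.0 : -1.0;
    }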

The trivial implementations are useful in virtual synthesizers because they are quite fast and efficient. However, the aliasing is substantial, especially when the frequencies are high, near the Nyquist limit. Therefore, they are used as Low-Frequency Oscillators (LFOs). LFOs are not used to generate sound. Instead, they are used to modulate effect parameters (e.g., a filter's cutoff) to create unique sounds. A different class of oscillators, called pitched oscillators, is used to generate sound (Pirkle, 2015).


4. Techniques

Oscillators that produce wildly aliasing frequencies are highly undesirable for use as a sound source. Therefore, there is a lot of effort going into finding solutions to this problem. There are many different techniques, and each has its own advantages and disadvantages. The next few sections will look at different pitched oscillator designs.

4.1 Additive Synthesis

Additive synthesis is the summation of sinusoids (Mitchell, 2008). A sinusoid is any continuous wave similar to a sine wave, with a different amplitude and phase shift. Additive synthesis is fairly simple to implement, and it can be bandlimited as well by not including harmonics that are higher than the Nyquist limit. The terms harmonic and partial are used interchangeably; they describe an integer multiple of the fundamental frequency (e.g., the note on a piano being played). A properly bandlimited implementation of additive synthesis eliminates aliasing frequencies completely.

Additive synthesis, while a solution to aliasing, is not used because of its complexity requirement. Let the sample rate be the standard CD sample rate of 44,100 Hz. The Nyquist limit is half of that, 22,050 Hz. The lowest piano frequency is 27.5 Hz. Therefore, 22,050 / 27.5 ≈ 801 partials are needed, which results in 801 calls to the sin function per output sample. See Appendix B.

Now imagine the demands on the CPU as multiple piano keys are being pressed simultaneously. As the CPU demands increase, another ugly problem arises: latency.

Latency is the interval between the note being pressed and the sound being heard by one's ear. In short, for serious real-time audio work, latency must be kept to a minimum.

4.2 Wavetable Synthesis

Wavetable oscillators are less expensive than additive synthesis; however, they can require respectable amounts of storage. The wavetable oscillator can be implemented using a structure similar to a queue. This queue has the data for one cycle of any waveform (e.g., one cycle of a sawtooth). Because the queue structure is pre-loaded with data, wavetable oscillators are often pre-loaded with recorded signals, some very complex. This is an appealing feature of the wavetable oscillator. Many wavetable synthesizers have large collections of recorded samples, which can take up some memory, although the memory issue is becoming less of a factor with current advancements in technology.

As an example of how wavetable oscillators work, let a queue contain 1024 samples of one cycle of any repeating waveform. Then, with each iteration, one sample is outputted. If the CD sample rate of 44,100 Hz is used, then the frequency of the generated signal is 44,100 / 1,024 = 43.066 Hz. This is the frequency of the queue. To produce other frequencies (e.g., when a piano key of 440.0 Hz is pressed), find the increment value using equation (4.1).

increment = tableLength · f_desired / sample_rate (4.1)

This increment value is used to skip values in the queue (Pirkle, 2015). For example, to generate a frequency of 440.0 Hz, the increment is (1,024 · 440.0) / 44,100 = 10.21678. This means that on every iteration, the queue's read position should advance by 10.21678 samples.

The idea is simple; however, there are intricacies in the implementation that the developer should keep in mind. It is rare that the increment value is a whole number. Therefore, to handle fractions, the algorithm needs an interpolation method to calculate a good approximation of the sample output value. Furthermore, the waveform to be loaded in the queue should already be bandlimited. One idea is to calculate bandlimited amplitude values from the additive synthesis method, then load the queue using this data set.
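
A minimal sketch of such a wavetable oscillator follows, assuming a single-cycle table and linear interpolation; the class and member names are illustrative, not taken from any particular implementation.

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Wavetable oscillator: a single-cycle table read with the fractional
    // increment of equation (4.1) and linear interpolation between entries.
    class WavetableOscillator
    {
    public:
        WavetableOscillator(std::vector<double> table, double sampleRate)
            : table_(std::move(table)), sampleRate_(sampleRate) {}

        void setFrequency(double freqHz)
        {
            // increment = tableLength * desiredFrequency / sampleRate
            increment_ = table_.size() * freqHz / sampleRate_;
        }

        double nextSample()
        {
            const std::size_t size = table_.size();
            const std::size_t i0 = static_cast<std::size_t>(readIndex_);
            const std::size_t i1 = (i0 + 1) % size;              // wrap to the table start
            const double frac = readIndex_ - static_cast<double>(i0);

            // Linear interpolation between the two nearest table entries.
            const double sample = table_[i0] + frac * (table_[i1] - table_[i0]);

            readIndex_ += increment_;
            if (readIndex_ >= static_cast<double>(size))         // keep the index inside the table
                readIndex_ -= static_cast<double>(size);
            return sample;
        }

    private:
        std::vector<double> table_;
        double sampleRate_ = 44100.0;
        double increment_ = 0.0;
        double readIndex_ = 0.0;
    };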

4.3 Bandlimited Impulse Train (BLIT)

Stilson and Smith (1996) proposed their Bandlimited Impulse Train Synthesis technique to reduce aliasing. First, they start with an ideal bandlimited impulse train.

Then, they apply a sinc function, and give a closed-form expression for the sampled bandlimited impulse train (Stilson & Smith, 1996).

BLIT(n) = (M/P) · Sinc_M(M·n/P) = sin(π·M·n/P) / (P · sin(π·n/P)) (4.2)

where M is the number of harmonics and P is the period of the waveform in samples.

The integration of this unipolar impulse train produces sawtooth waves with rounded discontinuities, which reduces the aliasing. The integration of a bipolar impulse train produces the square wave.

This method reduces aliasing; however, the computation costs are still high when working with high frequencies. The bandlimited impulse has to be generated for each discontinuity.

4.4 Bandlimited Step (BLEP)

Eli Brandt proposed the Bandlimited Step method in 2001. This method, and variations of it, is the most popular choice for modern software synthesizers. Tests have shown BLEP to be sonically close to additive synthesis; in other words, one cannot tell any difference by ear when comparing it to additive synthesis. The BLEP method still has some aliasing at very high frequencies near Nyquist; however, the aliasing in general has been dramatically reduced.

Brandt’s idea is to use one single bandlimited impulse, which is a solved problem by Stilson and Smith (4.2), and integrate it beforehand. This results in a step signal.

Subtracting a unit step results in a residual.

Figure 9: (a) a sinc pulse windowed at -1 to 1 (b) integration of sinc pulse; Bottom left & right: the residual when subtracting a unit step from (b) (Pirkle, 2015).

This BLEP residual is stored in a BLEP table. Then, the BLEP table is used to round off the discontinuities in a trivial waveform. This method achieves sound quality that is known to be much better than BLIT, and only a few samples need to be modified to achieve such results. Also, the implementation is much simpler than the other methods, and the computational cost is lower.


Figure 10: Visual of merging a residual to round off the discontinuity of a trivial sawtooth (Frei, 2010).

This BLEP method is also popular for another reason: it has many choices available for the developer to experiment with to slightly alter the character of the sound quality. In other words, many variations can possibly result in an oscillator that sounds slightly different from others.
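
As a rough sketch of how a stored residual might be applied to a trivial sample near a discontinuity (the table, its length, and the sign convention are assumptions made for this illustration, not details of any particular implementation):

    #include <cstddef>

    // Apply a precomputed BLEP residual to a trivial sample near a discontinuity.
    // blepTable is assumed to hold the residual sampled over the distance range [-1, 1].
    double applyBlep(double trivialSample, double distance,
                     const double* blepTable, std::size_t tableSize)
    {
        // Map the distance from the discontinuity (-1..1) to a fractional table index.
        const double pos = (distance + 1.0) * 0.5 * static_cast<double>(tableSize - 1);
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = (i0 + 1 < tableSize) ? i0 + 1 : i0;
        const double frac = pos - static_cast<double>(i0);

        // The fractional index is rarely an integer, so interpolate between entries.
        const double residual = blepTable[i0] + frac * (blepTable[i1] - blepTable[i0]);

        // Subtract the residual to round off the discontinuity (cf. figure 10).
        return trivialSample - residual;
    }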

4.5 Polynomial BLEP (PolyBLEP)

Polynomial BLEP, which was proposed by Valimaki and Huovilainen (2007), eliminates the need for a BLEP table lookup. In PolyBLEP, the residual is two polynomial equations. Valimaki and Huovilainen use a triangle pulse as a linear approximation of the sinc pulse. Integrating it produces a curve similar to the BLEP edge.

Just as in BLEP, subtracting out a unit step results in a residual.


Figure 11: Instead of a sinc pulse, Valimaki and Huovilainen integrate this triangle pulse.

Figure 12: The PolyBLEP residual.

The PolyBLEP residual is:

polyblep(t) = t² + 2t + 1,   −1 ≤ t ≤ 0
            = 2t − t² − 1,    0 < t ≤ 1 (4.5)


When compared with the original BLEP residual, the PolyBLEP residual is almost a perfect match. The BLEP method has less aliasing than PolyBLEP; however, the sound quality of PolyBLEP is still excellent. Furthermore, since the curves are polynomial equations, PolyBLEP does not require a table lookup, and it is simple to implement the equations directly.

4.5.1 PolyBLEP implementation

The PolyBLEP algorithm focuses on the discontinuities of a waveform. Using the classical sawtooth waveform as an example, the PolyBLEP oscillator generates a sawtooth with the trivial implementation. The region where the counter reaches maximum amplitude and resets itself is the region of discontinuity.

In the region of discontinuity, PolyBLEP must identify whether a point is on the left or right side of the discontinuity, figure 10(b). In both cases, subtracting a small value, figure 10(c), will smooth the discontinuity, figure 10(d). Note, the sawtooth in figure 10 is bipolar, with maximum at 1, and minimum at -1. The point on the right side of the discontinuity will be a negative value, and subtraction will merge the point upwards, smoothing the discontinuity.

The small value to subtract is found by calculating the distance t from the discontinuity to the point being merged, and using this t value in whichever of the PolyBLEP equations applies, depending on whether the point is on the left or right side of the discontinuity. What the equations return is an offset that can be subtracted from the trivial sawtooth counter, which gently moves the points on either side of the discontinuity closer together, allowing for a smoother transition, figure 10(c-d). See Appendix C.


The BLEP algorithm works the same way; however, the distance between the point and the discontinuity is used to index the BLEP residual table. Again, the resulting index is rarely an integer, and an interpolation method should be used to estimate accurate values.

5. Proposed Idea

The PolyBLEP method and the BLEP table method both achieve tremendous alias reduction. The latter is widely accepted as having sound quality similar to that of additive synthesis. The flexibility of the BLEP table method's implementation allows developers to achieve a sound character different from other virtual instruments.

However, PolyBLEP's simplicity is attractive and warrants exploration. Its reduction is drastic, almost as good as the BLEP table's.

These BLEP and PolyBLEP algorithms are known as quasi-bandlimited algorithms. They produce excellent results, and they are not as CPU intensive as additive synthesis. Quasi-bandlimited algorithms allow some aliasing; the main goal is drastic reduction. Now, the question to ask is, "how much aliasing can an algorithm reduce?"

5.1 The Process

The focus of this research is to explore alias reduction achieved with the PolyBLEP method. PolyBLEP's implementation is impressively simple; see Appendix C. From Valimaki and Huovilainen's calculations, equation (4.5), the coefficients for the first case are 1, 2, and 1 respectively. In this exploration, the objective is to search for polynomial coefficients with the intention of reducing aliasing as much as possible.


It would be fruitless to tinker with the coefficients randomly. This idea of generating polynomial coefficients involves using a searching strategy based on local search. The search space is reasonable, and the coefficients will be in the neighborhood of 1, 2, and 1. The coefficients are stored in an object called a state. The state will have a score. The scoring involves measuring the aliasing with a Fast Fourier Transform (FFT) spectrum analyzer. This tool allows a visual analysis of the waveform. The test frequency is 440.0 Hz. The score is the frequency where the aliasing begins. The goal is to find coefficients where the aliasing begins at higher frequencies.

The benchmark state is the BLEP table. Using MATLAB's polyfit function, the coefficients for the BLEP table are 0.976456801660105, 2.00719985138105, and 1.02169212204272. Using the test frequency of 440.0 Hz, the aliasing begins at 9.34k; in other words, aliasing starts at around 9,340 Hz. Referring to figure 13, each partial should be definitive; however, at about 9,340 Hz, a second frequency is noticeable beside the expected partial. This smaller frequency is an aliasing frequency. Also, note how the aliasing trends downward from 20 kHz to 9 kHz.

Figure 13: Voxengo SPAN’s FFT analyzer. Each partial should be clear-cut. Above 9k (x-axis), another smaller frequency becomes noticeable next to the expected partial. These smaller frequencies are aliasing frequencies.


The current state is the PolyBLEP coefficients (1, 2, and 1), and when tested with 440.0 Hz, it achieves a score of 6.26. As shown in figure 14 below, along the bottom x-axis, there is a tiny frequency beside the first partial greater than 6,000 Hz.

Figure 14: This spectrum shows aliasing that begins between 6000 Hz and 7000 Hz.

The goal is to find some coefficients where aliasing begins at higher frequencies.

To explore locally, the neighborhood will be the absolute value of the difference between the coefficients of the current state and the benchmark state, which is called the radius. The possible coefficient candidates are generated by either subtracting or adding the radius to the current coefficient.

Since the current state produces a score of 6.26 with the first coefficient being 1, and the goal state has a higher score of 9.34 with the first coefficient being less than 1, it is likely that the ideal coefficient is less than 1. Therefore, the search will generate possible coefficients by incrementally subtracting the radius from the current state. To avoid testing every candidate, a divide-and-conquer strategy is employed, and the scores are recorded for each state.
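
The sketch below outlines this search loop. It is a simplified illustration, not the exact procedure used: the scoring function is supplied by the caller, because in this work the score (the frequency at which aliasing becomes visible) was read off the SPAN FFT display by hand rather than computed automatically.

    #include <array>
    #include <cmath>
    #include <cstddef>
    #include <functional>

    using Coeffs = std::array<double, 3>;

    // score(c) should render a 440 Hz PolyBLEP sawtooth with the residual
    // coefficients c and return the frequency at which aliasing begins.
    Coeffs searchCoefficients(const std::function<double(const Coeffs&)>& score)
    {
        const Coeffs current   = { 1.0, 2.0, 1.0 };                 // PolyBLEP coefficients
        const Coeffs benchmark = { 0.976456801660105,               // BLEP-table polyfit
                                   2.00719985138105,
                                   1.02169212204272 };

        Coeffs best = current;
        double bestScore = score(best);

        for (std::size_t i = 0; i < 3; ++i)
        {
            // The "radius" is the distance between the current and benchmark coefficient.
            const double radius = std::fabs(current[i] - benchmark[i]);

            for (int step = -4; step <= 4; ++step)                  // explore the neighborhood
            {
                Coeffs candidate = best;
                candidate[i] = current[i] + step * radius;

                const double s = score(candidate);
                if (s > bestScore)                                  // higher alias onset is better
                {
                    bestScore = s;
                    best = candidate;
                }
            }
        }
        return best;
    }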


Figure 15: Aliasing frequency begins just after 10,000 Hz.

After generating and testing many candidate coefficients, some good results were uncovered. A state with coefficients 0.95291360332021, 1.99280014861894, and 1.02169212204272 resulted in a score of 10.2.

5.2 Conclusion

The coefficients found produce aliasing that begins at 10,200 Hz. This is slightly better than the benchmark state. Therefore, it is possible to achieve more reduction by modifying the coefficients. One should note that the final coefficients are very similar to the original BLEP table coefficients. It is expected that the greatest reductions come from coefficients very near the BLEP table's, because the BLEP table is currently accepted as the best design method.

It is important to understand that the BLEP table method allows many variations of its implementation, and some have achieved unparalleled results. For example, the choice of window size and shape, when choosing which part of the sinc pulse to integrate, will affect the amount of aliasing. Figure 16 below shows a spectrum of a BLEP table implementation that uses a Welch window on the sinc pulse with four-point correction, two points on each side of the discontinuity. The reduction is tremendous, and the aliasing begins at around 13,000 Hz.

Figure 16: Spectrum showing aliasing beginning between 12,000 and 14,000 Hz (Pirkle, 2015).

The BLEP table method is used by professional audio plug-in developers and although most details of the exact implementation of popular plug-ins are proprietary, many oscillators have been analyzed, and some have achieved remarkable results with negligible aliasing.

This research focuses on improving alias reduction using the PolyBLEP method.

Through strategically exploring coefficients near PolyBLEP's, it has been found that slightly modifying the coefficients improves the alias reduction levels. However, because of its customizability, the BLEP table method is still the obvious choice for serious audio plug-in development. Still, when modifying PolyBLEP's coefficients, the alias reduction levels can reach the bottom tier of BLEP table implementations.

5.3 Future Work

Slightly altering the coefficients of the PolyBLEP method produces good results.

The next idea is to build a polynomial coefficient generator. A graphical user interface could be provided to allow the user to draw in the curves themselves. The coefficients would then be retrieved and tested in a PolyBLEP oscillator. A data table of coefficients could be combined with an interpolation algorithm to interpolate between values, which could possibly produce even better results.

6. BabySynth: Development

There are a few options to choose from when building a music application. For making audio plug-ins on Windows, the VST SDK is available for free on the Steinberg website.

For Apple's Audio Unit specification, the process is a bit more tricky. There is no Audio Unit SDK to download. The process requires looking at Audio Unit tutorials and gathering the necessary source code on Apple's developer site. Then, it requires manual setup in Xcode.

For cross-platform applications, a framework is the obvious choice. DPlug is an excellent, simple, and minimal library for creating audio plug-ins for both Windows and macOS. DPlug uses the D programming language. The library is small enough that novice developers are able to ramp up quickly in audio plug-in development. In fact, many of BabySynth's algorithms and components were built and tested in DPlug before being integrated into the chosen framework for this project.

BabySynth is made with the JUCE framework. This framework was chosen because of the ease of building a user interface with its GUI library. In the early stages of audio plug-in development, one should not get bogged down with the mechanics of the user interface; it is important to focus on developing the algorithms and getting the components to work properly. JUCE allows this strategy by making it very simple to build a user interface quickly with widgets. Once the components are working properly, the user interface can receive detailed attention with more customization. JUCE is a C++ framework for creating music applications. It is a mature framework designed for serious cross-platform music application development. It is used by many big names in the music tech space, and it should be in the toolkit of any developer who is serious about audio plug-in development as a career.

6.1 BabySynth: Oscillator Design

The Oscillator class is responsible for generating the waveforms. The user will select the waveform with an oscillator wave knob on the interface. The oscillator wave knob has an integer value that will be used to determine which waveform to generate.

The integer value is associated with enumerated strings representing different waveforms.

Now, a switch statement can be used to handle the different user selected waveforms.

To design the Oscillator object as an independent reusable module, the most important functionality it provides is the ability to generate data on a per-sample basis (e.g., a function named oscillate()). This function contains the switch statement. Every time the Oscillator generates a sample, the current amplitude position will be incremented. Every time a new note is pressed on the keyboard, a member function should update the frequency and increment value. See Appendix.
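
A minimal sketch of such an Oscillator class is shown below; the member names, enumeration values, and waveform formulas are illustrative rather than BabySynth's actual code.

    #include <cmath>

    // Generates one sample per call to oscillate(); the waveform is chosen with a
    // switch on an enumerated mode, and the phase advances by frequency / sampleRate.
    class Oscillator
    {
    public:
        enum class Mode { Sine, Saw, Square, Triangle };

        void setMode(Mode m)             { mode = m; }
        void setSampleRate(double sr)    { sampleRate = sr; updateIncrement(); }
        void setFrequency(double freqHz) { frequency = freqHz; updateIncrement(); } // called on each new note

        double oscillate()
        {
            constexpr double twoPi = 6.283185307179586;
            double value = 0.0;
            switch (mode)
            {
                case Mode::Sine:     value = std::sin(twoPi * phase);         break;
                case Mode::Saw:      value = 2.0 * phase - 1.0;               break;
                case Mode::Square:   value = (phase < 0.5) ? 1.0 : -1.0;      break;
                case Mode::Triangle: value = (phase < 0.5) ? 4.0 * phase - 1.0
                                                           : 3.0 - 4.0 * phase; break;
            }
            phase += increment;                      // advance the current position every sample
            if (phase >= 1.0) phase -= 1.0;
            return value;
        }

    private:
        void updateIncrement() { increment = frequency / sampleRate; }

        Mode mode = Mode::Saw;
        double sampleRate = 44100.0;
        double frequency = 440.0;
        double phase = 0.0;
        double increment = 0.0;
    };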

6.2 BabySynth: Envelope Design

Envelope generators allow the user to control the volume of the note being played. They can also be used to control other parameters such as the pitch and the filter.

Controlling the volume, pitch, and filter allows the user to sculpt more interesting sounds.


Figure 17: A common ADSR envelope. The amplitude of the sound starts at 0, rises to 1 during the attack, then decays to the sustain level. A note-off event, triggered by releasing the key, fades the amplitude to 0 during the release.

A basic envelope, commonly referred to as an ADSR envelope, offers four stages: attack, decay, sustain, and release. The segments do not have to be linear; in fact, exponential curves may sound more natural. The segments offered, and their curves, are limited only by one's imagination.

A finite state machine can be used to keep track of the segments. This way, we can calculate the number of samples in each segment. The envelope described in this section is one that controls the volume, also known as the amplitude envelope. The default state is the off state. When a note on the keyboard is pressed, the envelope will enter the attack state. The envelope will move into the decay and sustain state on its own.

When the note on the keyboard is released, the envelope enters the release state. Since the oscillator generates amplitude values of a waveform, we can alter the signal by multiplying in a value. The envelope will calculate this multiplier, which is a number slightly above or below 1.00. This number is multiplied with the amplitude value of the oscillator, which results in a larger or smaller amplitude value for a specific sample.

How does the envelope know how to enter the decay and sustain states? Based on the values of each segment, which the user can adjust with sliders or knobs on the interface, the state machine calculates the number of samples in each segment.

6.2.1 BabySynth: Envelope Implementation

Using a switch statement to represent the state machine, there are five states for the common ADSR envelope. To determine what state the envelope is in, the first thing that needs to be done is to calculate the samples in the segments. The current sample will always be set to 0, representing the start of the segment. The end sample represents the end of the segment. The number of samples for that segment can be found by multiplying the user selected value by the sample rate. Then, on each iteration, just increment the current sample. When the current sample reaches the end sample, simply set the state machine to the next segment. Each segment will calculate the multiplier value, which is based on the current level, end level, and length of the segment.

case OFF:

The envelope starts in the off state. There is not much to do here except resetting the current level to 0.0 and setting the multiplier to 1.0. Therefore, when multiplied with the oscillator's amplitude value, it has no effect.

case ATTACK:

When a key is pressed, the state changes to the attack state. As an example, let the value of the attack be 0.1. The end sample will be 0.1 * 44,100 (our CD quality sample rate), which is 4,410. Therefore, there are 4,410 samples in the attack segment.


Using a counter, we increment the counter up to 4,410. On every increment, the multiplier value is continually updated. In the attack state, the sound rises to maximum volume; therefore, the multiplier will range from 0 up to a value greater than 1. In this example, an attack of 0.1 is very fast, and the volume of the sound will reach maximum almost instantaneously.

In contrast, let the attack value be 5.0. Then 5 * 44,100 is 220,500 samples in the attack segment. The slope of this attack segment rises slowly, and as a result, the sound will slowly rise to maximum volume. Once we reach the end sample, the state machine moves to the decay state.

case DECAY:

In this state, the amplitude falls. Therefore, starting with the maximum amplitude level reached by the attack state, the multiplier will lower the amplitude over the length in samples of the decay segment.

Using the same process as above, calculate the number of samples in the decay segment using the user selected value multiplied by the sample rate. Increment the counter from 0 to the end sample, and multiply the amplitude with a multiplier less than 1.0, resulting in a decrease in volume over the length of the decay segment. Referring to the ADSR figure, the current level after the decay ends is not 0, although it could be. Once the end sample is reached, the state machine moves into the sustain state.

case SUSTAIN:

The sustain describes how long to hold the volume steady. The current level that the decay finished with will be multiplied by 1.0. This results in no change in amplitude. There is no need to calculate the number of samples in this segment. The state machine stays in sustain until the user releases the key. This event triggers the state machine to enter the release state.

case RELEASE:

In the release state, we want the amplitude to fall to zero. Calculate the number of samples in the segment and multiply the current level with a multiplier less than 1.0, which will drop the amplitude value on each iteration. Then set the state machine to off.
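
A condensed sketch of this state machine follows. For simplicity it steps the level linearly within each segment (the exponential multiplier of section 6.2.2 could be substituted), and the member names and default segment times are illustrative, not BabySynth's.

    #include <cstddef>

    class Envelope
    {
    public:
        enum class Stage { Off, Attack, Decay, Sustain, Release };

        void noteOn()  { enterStage(Stage::Attack); }
        void noteOff() { enterStage(Stage::Release); }

        // Called once per sample; returns the value multiplied with the oscillator output.
        double nextLevel()
        {
            if (stage == Stage::Attack || stage == Stage::Decay || stage == Stage::Release)
            {
                currentLevel += levelStep;                 // move toward the segment's end level
                if (++currentSample >= endSample)
                    enterStage(nextStage());               // segment finished, advance the machine
            }
            return currentLevel;                           // Off holds 0, Sustain holds its level
        }

    private:
        Stage nextStage() const
        {
            switch (stage)
            {
                case Stage::Attack: return Stage::Decay;
                case Stage::Decay:  return Stage::Sustain;
                default:            return Stage::Off;     // Release (and Off) fall back to Off
            }
        }

        void enterStage(Stage s)
        {
            stage = s;
            currentSample = 0;
            double target = 0.0, seconds = 0.0;
            switch (stage)
            {
                case Stage::Off:     currentLevel = 0.0; levelStep = 0.0; endSample = 0; return;
                case Stage::Attack:  target = 1.0;          seconds = attackTime;  break;
                case Stage::Decay:   target = sustainLevel; seconds = decayTime;   break;
                case Stage::Sustain: levelStep = 0.0; endSample = 0; return;        // hold until noteOff
                case Stage::Release: target = 0.0;          seconds = releaseTime; break;
            }
            // Number of samples in the segment: user value (seconds) times the sample rate.
            endSample = static_cast<std::size_t>(seconds * sampleRate);
            levelStep = (endSample > 0) ? (target - currentLevel) / endSample : 0.0;
        }

        Stage stage = Stage::Off;
        double sampleRate = 44100.0;
        double attackTime = 0.1, decayTime = 0.3, sustainLevel = 0.7, releaseTime = 0.5;
        double currentLevel = 0.0, levelStep = 0.0;
        std::size_t currentSample = 0, endSample = 0;
    };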

6.2.2 Envelope: multiplier output

The multiplier value is open ended for experimentation. The human ear does not perceive volume changes as linear. Exponential changes in perceived loudness sound more natural. Therefore, it is more common to use exponential curves for the ADSR envelope.

The equation to produce an exponential curve is:

y = a · (x)^b + c (6.1)

The value for y represents the amplitude, and n is the sample number. The exponential curve can be generated as an incremental calculation of x, raised to some power of b.

Referring to the graph above, the envelope starts at 0 and ends at 0. The amplitude y should be normalized to the range of 0 to 1.

Figure 18: On every sample, the envelope generator calculates the multiplier value that is to be multiplied with the original amplitude value of the signal.


When the value of b is 1, the segment is linear. Values greater than 1 result in exponential curves. Values less than 1 result in logarithmic curves. This function returns a value that is multiplied with the oscillator's generated amplitude value, which affects the loudness of the sample.
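
A small sketch of equation (6.1) as a function is shown below. It is one possible reading of the equation, in which a and c are chosen so the curve runs between the segment's start and end levels; the names are illustrative.

    #include <cmath>
    #include <cstddef>

    // Evaluate y = a * x^b + c at sample n of a segment that is segmentLength samples
    // long. x = n / segmentLength is the normalized position in the segment; a scales
    // the curve to span the two levels and c offsets it to the start level.
    // b = 1 gives a linear segment, b > 1 an exponential-style curve, b < 1 a
    // logarithmic-style curve.
    double segmentLevel(std::size_t n, std::size_t segmentLength,
                        double startLevel, double endLevel, double b)
    {
        const double x = static_cast<double>(n) / static_cast<double>(segmentLength);
        const double a = endLevel - startLevel;
        const double c = startLevel;
        return a * std::pow(x, b) + c;
    }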

6.3 BabySynth: Filter

A filter is a modifier. It takes audio data as input, modifies it, then produces output data. A filter can be used to cut or block out frequencies to modify the sound.

BabySynth’s filter is the last step applied to the sample before it is placed into the audio buffer.

On the user interface, there are two parameters to control the filter section: the filter mode and the cutoff frequency. The user will be able to select the filter mode, which includes low-pass, high-pass, and band-pass.

6.3.1 Filter: Implementation

BabySynth uses a feed-forward filter. As the audio sample is being generated, this feed-forward filter uses two samples and sums them up to produce a final output sample.

The equation for a feed-forward filter looks like this (Pirkle, 2013):

y(n) = a0·x(n) + a1·x(n − 1) (6.2)

The y value represents the final output after the summation. The x values are the current and previous input samples. This can be handled easily in code with a variable that holds the previous sample, also known as a delay sample. The a coefficients are what affect the amplitude values; they will be determined by the user through the cutoff frequency parameter knob on the user interface.


Figure 19: A simple low-pass filter. The current and previous amplitude values are summed up together using coefficients. Many more complex filters are built using this as a building block.

For this example, the range of the coefficient A1 is 0 to 0.49. This value can be determined by the user. In another function that gets triggered every time the user changes the A1 value, the coefficient A0 is determined by subtracting 1.0 from A1. Then, A0 will be -1.0 for the initial default value of 0 for A1. This just inverts the amplitude signal and results in no audible change in the signal. However, as A1 increases towards positive 0.49, A0 rises to -0.51, and when these two values are summed together the result is closer to 0. Now, which harmonics end up with amplitudes that are closer to 0? The low-frequency harmonics, whose successive samples are nearly identical and therefore nearly cancel each other in the sum. With this choice of coefficients the structure attenuates low frequencies and allows higher frequencies to pass through; choosing coefficients with the same sign instead (for example, both near 0.5) averages neighbouring samples and produces low-pass behavior.
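
A minimal sketch of this filter step follows; the class and member names are illustrative, and the coefficient update simply follows the description above.

    // First-order feed-forward filter of equation (6.2): y(n) = a0*x(n) + a1*x(n-1).
    // The previous input is kept in a single delay variable, and the coefficients are
    // recomputed whenever the user moves the cutoff control (a1 in 0..0.49, a0 = a1 - 1.0).
    class FeedForwardFilter
    {
    public:
        void setCutoffControl(double a1Value)      // called when the knob changes
        {
            a1 = a1Value;
            a0 = a1Value - 1.0;
        }

        double process(double x)
        {
            const double y = a0 * x + a1 * previousInput;  // sum of current and delayed sample
            previousInput = x;                             // update the one-sample delay
            return y;
        }

    private:
        double a0 = -1.0;          // default when the control is 0
        double a1 = 0.0;
        double previousInput = 0.0;
    };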


Bibliography

1. Brandt, Eli. "Hard Sync Without Aliasing." Proceedings of the International Computer Music Conference, (2001).

2. Frei, Beat. "Digital Sound Generation." Institute for Computer Music and Sound Technology (ICST), (2010). https://www.zhdk.ch/en/downloads-digital-sound-generation-5383. Accessed 15 March 2018.

3. Mitchell, R. Dan. "BasicSynth: Creating a Music Synthesizer in Software." (2008).

4. Pirkle, Will. "Designing Audio Effect Plug-Ins in C++: With Digital Audio Processing Theory." Focal Press, (2013).

5. Pirkle, Will. "Designing Software Synthesizer Plug-Ins in C++: For RackAFX, VST3, and Audio Units." Focal Press, (2015).

6. Stilson, T. S. and Smith, J. O. "Alias-Free Digital Synthesis of Classic Analog Waveforms." Proceedings of the International Computer Music Conference, (1996).

7. Välimäki, Vesa and Huovilainen, Antti. "Antialiasing Oscillators in Subtractive Synthesis." IEEE Signal Processing Magazine, 24, 116-125, (2007). doi:10.1109/MSP.2007.323276.

8. Wickert, M. "Chapter 4: Sampling and Aliasing." (2013). http://www.eas.uccs.edu/~mwickert/ece2610/lecture_notes/ece2610_chap4.pdf. Accessed 15 March 2018.


Appendix A: Trivial Sawtooth

This example hardcodes the frequency to be 440 Hz. In reality, there would be a "get frequency" function that converts the note pressed on the keyboard into a frequency value. The data in the buffer has been plotted for a visual reference.
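
The original listing is not reproduced here; the following is a minimal sketch consistent with the description above (frequency hardcoded to 440 Hz, output collected into a buffer that could be plotted).

    #include <cstdio>
    #include <vector>

    // Trivial sawtooth: a counter rises by a constant increment each sample and
    // resets by subtracting 1 when it reaches the maximum value of 1.
    int main()
    {
        const double frequency = 440.0;              // hardcoded; a real synth would derive
                                                     // this from the key that was pressed
        const double sampleRate = 44100.0;
        const double increment = frequency / sampleRate;

        std::vector<double> buffer(512);
        double counter = 0.0;
        for (double& sample : buffer)
        {
            sample = 2.0 * counter - 1.0;            // map the 0..1 counter to a -1..1 output
            counter += increment;
            if (counter >= 1.0)
                counter -= 1.0;                      // the discontinuity that causes aliasing
        }

        for (double s : buffer)                      // values that could be plotted
            std::printf("%f\n", s);
        return 0;
    }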


Appendix B: Additive Synthesis

This example shows how to generate waveforms using additive synthesis. Using a frequency of 440 Hz and a sampling frequency of 44,100 Hz, the phase increment value is (2*pi*f/fs) (line 26). The first for loop generates phase increments for all partials and saves them to an array (line 36). Line 37 holds the amplitude values of each partial; naturally, the fundamental partial should be the largest, and all other partials will be a fraction of it. The sawtooth waveform contains both odd and even harmonics, so we increment the partial number by one (line 39). To generate square waves, which have only odd harmonics, increment the partial number by two. Lines 47 and 49 use the sin function to obtain the value at that phase and multiply it by the amplitude. The phase is then incremented by the partial's phase increment and kept within bounds (lines 51-52). Here are some graphs generated with 2, 4, 8, 16, and 32 partials.
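
The original listing (whose line numbers are cited above) is not reproduced here; the sketch below is a minimal reconstruction of the same idea: per-partial phase increments, 1/k amplitudes, and a partial step of 1 for the sawtooth (or 2 for the square).

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Bandlimited additive sawtooth: sum sine partials up to the Nyquist limit.
    int main()
    {
        const double pi = 3.14159265358979323846;
        const double frequency = 440.0;
        const double sampleRate = 44100.0;
        const double nyquist = sampleRate / 2.0;
        const int partialStep = 1;                     // 1 = saw (all harmonics), 2 = square (odd only)

        // Per-partial phase increments and amplitudes (amplitude falls off as 1/k).
        std::vector<double> increments, amplitudes, phases;
        for (int k = 1; k * frequency < nyquist; k += partialStep)
        {
            increments.push_back(2.0 * pi * k * frequency / sampleRate);
            amplitudes.push_back(1.0 / k);
            phases.push_back(0.0);
        }

        std::vector<double> buffer(512, 0.0);
        for (double& sample : buffer)
        {
            for (std::size_t p = 0; p < phases.size(); ++p)
            {
                sample += amplitudes[p] * std::sin(phases[p]);
                phases[p] += increments[p];
                if (phases[p] >= 2.0 * pi)             // keep the phase within bounds
                    phases[p] -= 2.0 * pi;
            }
        }

        for (double s : buffer)
            std::printf("%f\n", s);
        return 0;
    }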


Appendix C: PolyBLEP

PolyBLEP works by smoothing the discontinuity of a trivial waveform: it subtracts a small value from the amplitudes just before and just after the discontinuity. Refer to line 27 of the sample in Appendix A; we could have something like this in its place:
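
(The original listing is not reproduced here; a minimal sketch of that modified line, assuming the trivial counter runs from 0 to 1 and the output is mapped to a bipolar range, might be:)

    buffer[i] = (2.0 * counter - 1.0) - polyBLEP(counter, increment);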

And the PolyBLEP function itself, adapted from (Pirkle, 2015).
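
Again, the original listing is not reproduced; a minimal sketch of the function, following equation (4.5) and the description below, is:

    // PolyBLEP residual (equation 4.5). Returns 0.0 except for the one sample on each
    // side of the discontinuity, where it returns the correction to be subtracted.
    double polyBLEP(double counter, double increment)
    {
        if (counter > 1.0 - increment)                        // left side, about to reset
        {
            const double t = (counter - 1.0) / increment;     // distance t in [-1, 0)
            return t * t + 2.0 * t + 1.0;
        }
        if (counter < increment)                              // right side, just after the reset
        {
            const double t = counter / increment;             // distance t in (0, 1]
            return 2.0 * t - t * t - 1.0;
        }
        return 0.0;                                           // everywhere else the waveform is untouched
    }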

PolyBLEP only alters two points, one on the left and one on the right of the discontinuity; therefore it returns 0.0 most of the time (line 15). Refer to the drawing below. When the modulo counter is greater than 1.0 minus the phase increment, we know the modulo counter is on the left side of the discontinuity and is about to reset. Likewise, after the reset occurs, the modulo counter will be less than the phase increment, so we know it is on the right side.


To calculate the distance t_left, subtract 1.0 from the modulo counter and divide by the phase increment (line 7). To find the distance t_right, simply divide the current modulo counter's value by the phase increment (line 12).
