CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
Applying Local Search for Polynomial Coefficients for Alias Reduction in Oscillators
A thesis submitted in partial fulfillment of the requirements
For the degree of Master of Science in Computer Science
By
Menglee Guy
May 2018
Copyright by Menglee Guy 2018
The thesis of Menglee Guy is approved:
______Dr. Adam B. Kaplan Date
______Dr. Richard Covington Date
______Dr. Jeff Wiegley, Chair Date
California State University, Northridge
Acknowledgements
I would like to express my deep gratitude to Dr. Jeff Wiegley for chairing my thesis committee and allowing me the freedom to pursue my passion for audio plug-in development. I would also like to thank Dr. Richard Covington and Dr. Adam B. Kaplan for being a part of my committee.
I am also grateful to all of those whom I have had the pleasure to meet on Audio DSP forums, as well as the JUCE community, for all of your help and support in my journey into the world of DSP and audio plug-in development.
Dedication
This paper is dedicated to all the trance music lovers in the world!
Table of Contents
Signature Page iii
Acknowledgments iv
Dedication v
Table of Contents vi
List of Figures viii
Abstract ix
1. Introduction 1
2. Sampling Fundamentals: Frequency vs. Sample Rate 5
2.1 The Problem: Aliasing 5
2.2 Nyquist 7
3. Classical Waveforms and Trivial Implementations 9
4. Techniques 11
4.1 Additive Synthesis 11
4.2 Wavetable Synthesis 12
4.3 Bandlimited Impulse Train (BLIT) 13
4.4 Bandlimited Step (BLEP) 13
4.5 Polynomial BLEP (PolyBLEP) 15
4.5.1 PolyBLEP Implementation 17
5. Proposed Technique 18
5.1 The Process 18
5.2 Conclusions 21
5.3 Future Work 22
6. BabySynth: Development 23
6.1 BabySynth: Oscillator Design 24
6.2 BabySynth: Envelope Design 24
6.2.1 BabySynth: Envelope Implementation 26
6.2.2 Envelope: multiplier output 28
6.3 BabySynth: Filter 29
6.3.1 Filter: Implementation 29
Bibliography 31
Appendix A: Trivial Sawtooth 32
Appendix B: Additive Synthesis 33
Appendix C: PolyBLEP 35
List of Figures
Figure Page
1 Access Virus TI 1
2 Logic Pro X on MacBook Pro 2
3 Sylenth1 3
4 2-cycle Sine Wave 5
5 Sine Wave Sampled at 400 Hz 6
6 Sine Wave Sampling Aliasing 7
7 Sine Wave Representing Nyquist Frequency 8
8 Classical Waveforms 9
9 Sinc Pulse to BLEP Residue 14
10 Correcting points to smooth discontinuity 15
11 Triangle pulse 16
12 PolyBLEP residue 16
13 FFT spectrum with aliasing at 9.34k 19
14 FFT spectrum with aliasing at 6.26k 20
15 FFT spectrum with aliasing at 10.2k 21
16 FFT spectrum of Welch Window BLEP table 22
17 Basic ADSR Envelope 25
18 Multiplier Sample Code 28
19 Low Pass Filter Sample Code 30
Abstract
Applying Local Search for Polynomial Coefficients for Alias Reduction in Oscillators
By
Menglee Guy
Master of Science in Computer Science
In the development of virtual instruments, the oscillator is responsible for generating periodic waveforms. The classical sawtooth waveform, described as an increasing slope, then a straight drop, is known for its use in music production. A trivial implementation of an oscillator that generates a sawtooth is only a few lines. It requires using a counter that increments a small amount repeatedly, then resets and starts over.
However, this trivial implementation is known to alias heavily, because the abrupt drop in the waveform contains frequencies higher than the sample rate can represent. The purpose of this research is to survey current techniques and to investigate the possibility of alias reduction by modifying the polynomial coefficients in the PolyBLEP algorithm. The simplicity of PolyBLEP is its main attraction. By modifying the coefficients, the alias reduction levels can be improved. The development of the virtual instrument BabySynth, built with JUCE, a framework for creating audio applications, is also discussed.
1. Introduction
In the music industry, a synthesizer is an electronic instrument that generates and controls sounds. The components of a synthesizer fall into three categories: sources, modifiers, and controllers (Pirkle, 2015). An oscillator, which is a source, generates the audio signal. Modifiers include filters and effects. They are used to alter the audio signal.
Controllers are knobs, sliders, and anything else used to control the parameters of the modifiers. A popular hardware synthesizer used in today’s electronic dance music is the Access Virus TI.
Figure 1: Access Virus TI, revised in March 2009.
As personal computers become increasingly powerful, they are able to run CPU-intensive software, such as a Digital Audio Workstation (DAW). A DAW is used for recording, editing, and producing audio files. High-end DAWs are packaged with many smaller pieces of software called audio plug-ins, which enhance the capabilities and features of the DAW. Audio plug-ins are highly specialized tools. They are classified as either effects or virtual instruments. Effects alter the audio signal (e.g., equalizers can be used to boost the bass in audio signals). Virtual instruments are software versions of musical instruments that generate audio signals.
Powerful computers are now affordable to the general population. Many professionals in music, and hobbyists alike, are setting up budget music production studios with their personal computer as the key constituent.
Nowadays, it is not uncommon to produce commercial music with a few items: a personal computer, a DAW, and headphones. It is an exciting and lucrative time for the music technology industry. People are spending anywhere from a few hundred, to tens of thousands of dollars on DAWs and plug-ins. Therefore, audio plug-in development has garnered considerable attention from software developers, and as a result, there are many third party audio plug-ins available for free or purchase.
Figure 2: A MacBook Pro running Logic Pro X (Apple’s high end DAW).
Audio plug-in development is not a simple task. It requires a keen grasp of digital signal processing (DSP) theory. Virtual instruments, also known as virtual synthesizers, are a subset of plug-ins that are used to create sound. Because of the increase in popularity for music production software, many developers are interested in building virtual synthesizers that are similar in sound character to popular hardware synthesizers.
Sylenth1 is a popular virtual synthesizer that is widely used in today’s electronic dance music.
Figure 3: Sylenth1 by LennarDigital.
The oscillator is the component that is responsible for generating sound. It generates audio data by producing amplitude values of a periodic waveform. Given an array of these values, one could plot the values and visually see the graph of the periodic waveform. Therefore, an oscillator is just a function that returns amplitude values corresponding to a periodic waveform.
There are a handful of common waveforms known as classical waveforms, found in all virtual synthesizers: sine, sawtooth, triangle, and square wave. Each waveform produces a different sound. A sawtooth waveform produces a rich buzz sound, while a square produces a hollow sound. An oscillator can be programmed to generate any repeating waveform; however, the classical waveforms generally have the sound characteristics of any other complex waveforms. Therefore, the effort to code a more complex waveform is usually not worth it. Furthermore, there is a fundamental issue with oscillators known as aliasing.
2. Sampling fundamentals: Frequency vs. Sampling Frequency (Sample Rate)
First, a brief introduction of terms and definitions is required before the problem of aliasing can be demonstrated. Sampling is the process of taking amplitude values of a continuous waveform at regularly spaced time intervals (Mitchell, 2008). The frequency of a waveform is measured in Hertz (Hz), which describes cycles per second. For example, a frequency of 60 Hz describes a cycle that repeats 60 times per second.
Figure 4: A sine wave with 2 cycles shown. The amplitude is the maximum height. The x axis represents time.
The sampling frequency, also in Hertz, is the number of samples per second, and is denoted $f_s$. For example, a sampling frequency of $f_s = 400$ Hz means that there are 400 samples per second. It should be noted that the standard sampling frequency for CD (compact disc) quality is 44,100 Hz. Therefore, there are 44,100 samples per second. From here on, the sampling frequency is referred to as the sample rate, which helps distinguish it from the frequency of the waveform.
2.1 The problem: Aliasing
In the oscillator, aliasing will occur when the signal is not properly bandlimited (Pirkle, 2015). To understand this phenomenon, examine the process of sampling. As a crude example, let an oscillator generate the sine waveform in Figure 5. Let $f$ be the frequency of the waveform, 60 Hz, and let $f_s$ be the sample rate, 400 Hz. Through the process of sampling, we can represent any continuous audio signal, such as the one below, as a set of discrete values. Simply take the amplitude at every constant interval. This interval can be found by taking the inverse of the sample rate, in this case $1/400 = 0.0025$ seconds.
Figure 5: A sine wave with a frequency of 60 Hz, sampled at 400 Hz (Wickert, 2013). The green sample points are amplitude values. Bipolar waveforms have positive and negative values.
Given a sample of the discrete amplitude values taken above, there is enough information to render this waveform properly. In other words, using the green points above, one can construct a curve that accurately represents 60 Hz.
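The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_sine` and its parameters are hypothetical and do not come from the thesis code.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Return amplitude values of a sine wave taken at regular time intervals.

    Hypothetical helper for illustration only.
    """
    interval = 1.0 / sample_rate_hz  # seconds between consecutive samples
    return [amplitude * math.sin(2.0 * math.pi * frequency_hz * n * interval)
            for n in range(num_samples)]

# A 60 Hz sine sampled at 400 Hz: roughly 6.7 samples per cycle,
# taken every 1/400 = 0.0025 seconds.
samples = sample_sine(60.0, 400.0, 20)
```

With 400 / 60 ≈ 6.7 samples per cycle, each cycle of the 60 Hz wave is described by several points, which is why the curve can be recovered from them.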
Now, let the frequency of the waveform increase dramatically, to 340 Hz, while the sample rate remains at 400 Hz. Again, record the amplitude values using the constant interval.
Figure 6: The frequency is 340 Hz. The sample rate is 400 Hz (Wickert, 2013). The interval is 1/400. The green points do not give enough information to reconstruct a 340 Hz wave.
Clearly, there is a problem here. Given these amplitude values, what curve would be reconstructed? There are not enough samples to accurately represent the expected frequency, which is 340 Hz. Aliasing occurs when one frequency is disguised as another, and that is what is happening here. When sampled and reconstructed, the resulting frequency is lower than the expected frequency: at a sample rate of 400 Hz, the samples of a 340 Hz wave are indistinguishable from those of a 60 Hz wave (400 − 340 = 60). Naturally, the next question is: given any sample rate, what is the highest frequency that can be sampled and reconstructed accurately?
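The disguise can be checked numerically. The sketch below (with a hypothetical helper name, `sample_sine`) samples a 340 Hz sine and a 60 Hz sine at 400 Hz and compares them point by point. Because 340 = 400 − 60, every 340 Hz sample is the negative of the corresponding 60 Hz sample, so the two sets of points describe the same 60 Hz oscillation with inverted polarity.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, num_samples):
    """Sample a unit-amplitude sine wave (hypothetical helper for illustration)."""
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(num_samples)]

high = sample_sine(340.0, 400.0, 40)  # frequency above half the sample rate
low = sample_sine(60.0, 400.0, 40)    # frequency it is disguised as

# True when each 340 Hz sample equals the negated 60 Hz sample
# (up to floating-point error).
is_alias = all(abs(h + l) < 1e-9 for h, l in zip(high, low))
```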
2.2 Nyquist
The Nyquist theorem says that a bandlimited continuous signal can be sampled and perfectly reconstructed from its samples if the waveform is sampled at more than twice the rate of its highest frequency component. In other words, there need to be at least two samples per cycle to reconstruct the expected frequency, one value above and one value below the zero amplitude level (Mitchell, 2008).
Figure 7: This waveform has 2 samples per cycle. One point above the x axis. One point below the x axis. The Nyquist frequency is half of the sample rate.
Formally, the sample rate must be at least twice the frequency. In Figure 6 above, in order to sample 340 Hz with accurate reconstruction, the sample rate must be at least 680 Hz. This will result in a constant interval of $1/680 \approx 0.00147$ seconds. Furthermore, a frequency that is exactly half the sample rate is known as the Nyquist limit, or Nyquist frequency.
The standard sample rate of CDs is 44,100 Hz. The Nyquist limit for this sample rate is $\frac{1}{2}(44{,}100) = 22{,}050$ Hz. This means that audio CDs can represent frequencies up to 22,050 Hz. Furthermore, the human ear can hear a range of frequencies from as low as 20 Hz, all the way up to 20,000 Hz (Mitchell, 2008). Therefore, the standard CD sample rate of 44.1 kHz works out nicely to cover the range of human hearing. Other common audio sample rates include 48 kHz and 96 kHz. These higher sample rates use more memory, and the gains in sound quality may not be worth the space.
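As a small worked example of the Nyquist limit, the sketch below computes the limit for a given sample rate and the frequency that a pure tone appears to have after sampling. The function names are hypothetical, and the folding rule is the standard one for a single sinusoid; it is not taken from the thesis code.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2.0

def apparent_frequency(frequency_hz, sample_rate_hz):
    """Frequency a sampled pure tone appears to have, folded into [0, Nyquist]."""
    folded = frequency_hz % sample_rate_hz
    if folded > nyquist_limit(sample_rate_hz):
        folded = sample_rate_hz - folded  # reflect around the Nyquist limit
    return folded

cd_limit = nyquist_limit(44100.0)            # 22050.0
aliased = apparent_frequency(340.0, 400.0)   # 60.0, below the expected 340 Hz
faithful = apparent_frequency(340.0, 680.0)  # 340.0, exactly at the Nyquist limit
```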
3. Classical Waveforms and Trivial Implementations
The classical waveforms found in commercial synthesizers, both hardware and software, include the sine, sawtooth, triangle, and square wave. The trivial implementations of these waveforms are straightforward. The term trivial is used because a complex function is not required to model the waveforms.
Figure 8: The classical waveforms: sine, square, triangle, and sawtooth.
Think about how one could model the sawtooth wave. The amplitude of the sawtooth waveform increases linearly with time, up to a point, then resets with a sharp drop. This behavior is similar to a modulo function. It is trivial to implement in code. Starting at 0, increment a counter on every iteration by a constant amount, the frequency of the waveform divided by the sample rate. When the counter reaches the maximum value of 1, simply reset the counter by subtracting 1. This counter is the amplitude value of the waveform at a specific point in time. The maximum value of 1 represents the normalized maximum amplitude value. See Appendix A.
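A minimal sketch of such a counter-based sawtooth is shown below. It is not the Appendix A listing. The function name is hypothetical, and the per-sample increment is taken to be the frequency divided by the sample rate, an assumption made here so that the counter completes `frequency_hz` cycles per second.

```python
def trivial_sawtooth(frequency_hz, sample_rate_hz, num_samples):
    """Generate a naive (non-bandlimited) sawtooth in the range [0, 1)."""
    increment = frequency_hz / sample_rate_hz  # counter advance per sample
    counter = 0.0
    output = []
    for _ in range(num_samples):
        output.append(counter)  # the counter itself is the amplitude value
        counter += increment
        if counter >= 1.0:
            counter -= 1.0      # the sharp drop: wrap back toward 0
    return output

# 100 Hz at a 400 Hz sample rate: the ramp repeats every 4 samples.
saw = trivial_sawtooth(100.0, 400.0, 8)
```

The wrap subtracts 1 instead of resetting to 0, so any overshoot past 1 is carried into the next cycle and the pitch stays accurate. The sharp drop is what makes this version alias.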
The amplitude of the triangle increases with time, up to a point, then decreases with time, down to a point. Therefore, instead of resetting the counter by subtracting 1, simply decrease the counter by subtracting the constant increment on each iteration. The square wave holds a constant value that jumps back and forth over time. The discontinuities in the sawtooth and square wave produce harmonics that extend above the Nyquist limit, and these harmonics result in aliasing frequencies. The sine wave is continuous, has no harmonics above its fundamental, and does not alias. The triangle wave also has no discontinuity; the waveform is continuous and piecewise linear. Because its harmonics fall off quickly, the trivial implementation of the triangle waveform can be used to generate sound with little audible aliasing.
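The triangle and square descriptions above can be sketched the same way. The function names and the unipolar 0-to-1 output range are assumptions made for illustration. The triangle step is twice the sawtooth increment because the counter must travel up and back down within one cycle.

```python
def trivial_triangle(frequency_hz, sample_rate_hz, num_samples):
    """Naive triangle in [0, 1]: count up to 1, then count back down to 0."""
    step = 2.0 * frequency_hz / sample_rate_hz  # up and down within one cycle
    counter, direction = 0.0, 1.0
    output = []
    for _ in range(num_samples):
        output.append(counter)
        counter += direction * step
        if counter >= 1.0:
            counter = 2.0 - counter  # reflect any overshoot, start descending
            direction = -1.0
        elif counter <= 0.0:
            counter = -counter       # reflect any undershoot, start ascending
            direction = 1.0
    return output

def trivial_square(frequency_hz, sample_rate_hz, num_samples):
    """Naive square: 1 for the first half of each cycle, 0 for the second."""
    increment = frequency_hz / sample_rate_hz
    phase = 0.0
    output = []
    for _ in range(num_samples):
        output.append(1.0 if phase < 0.5 else 0.0)
        phase += increment
        if phase >= 1.0:
            phase -= 1.0
    return output

triangle = trivial_triangle(100.0, 800.0, 9)
square = trivial_square(100.0, 400.0, 5)
```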
The trivial implementations are useful in virtual synthesizers because they are fast and efficient. However, the aliasing is substantial, especially when the frequencies are high, near the Nyquist limit. Therefore, they are used as Low-Frequency Oscillators (LFOs). LFOs are not used to generate sound. Instead, they are used to modulate effect parameters (e.g., a filter’s cutoff) to create unique sounds. A different class of oscillators, called pitched oscillators, is used to generate sound (Pirkle, 2015).
4. Techniques
Oscillators that produce wildly aliasing frequencies are highly undesirable for use as a sound source. Therefore, there is a lot of effort going into finding solutions to this problem. There are many different techniques, and each has its own advantages and disadvantages. The next few sections will look at different pitched oscillator designs.
4.1 Additive Synthesis
Additive synthesis is the summation of sinusoids (Mitchell, 2008). A sinusoid is any wave with the shape of a sine wave, of any amplitude, frequency, and phase shift.
Additive synthesis is fairly simple to implement. Additive synthesis can be bandlimited as well, by not including harmonics that are higher than the Nyquist limit. The terms harmonic and partial are used interchangeably, and they describe an integer multiple of the fundamental frequency (e.g., the frequency of the piano note being played). A properly bandlimited implementation of additive synthesis eliminates aliasing frequencies completely.
Additive synthesis, while a solution to aliasing, is not used because of its computational cost. Let the sample rate be the standard CD sample rate of 44,100 Hz. The Nyquist limit is half of that, 22,050 Hz. The lowest piano frequency is 27.5 Hz. Therefore, the lowest note requires $22{,}050 / 27.5 \approx 801$ partials. This will result in 801 calls to the sin function for every output sample. See Appendix B.
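To make the cost concrete, the following sketch builds a bandlimited sawtooth by summing sine partials up to the Nyquist limit. It is not the Appendix B listing. The function names are hypothetical, and the Fourier series used, $\frac{2}{\pi}\sum_{k}(-1)^{k+1}\sin(2\pi k f t)/k$, is the textbook series for a rising sawtooth in the range −1 to 1.

```python
import math

def num_partials(fundamental_hz, sample_rate_hz):
    """Count the harmonics of the fundamental that do not exceed Nyquist."""
    nyquist_hz = sample_rate_hz / 2.0
    return int(nyquist_hz // fundamental_hz)

def additive_sawtooth(fundamental_hz, sample_rate_hz, num_samples):
    """Bandlimited rising sawtooth built from a truncated Fourier series."""
    partials = num_partials(fundamental_hz, sample_rate_hz)
    output = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        total = 0.0
        for k in range(1, partials + 1):       # one sin call per partial
            sign = 1.0 if k % 2 == 1 else -1.0
            total += sign * math.sin(2.0 * math.pi * k * fundamental_hz * t) / k
        output.append((2.0 / math.pi) * total)
    return output

lowest_piano_partials = num_partials(27.5, 44100.0)  # 801
wave = additive_sawtooth(441.0, 44100.0, 26)
```

The inner loop runs once per partial for every output sample, so a low note is far more expensive to compute than a high one.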
Now imagine the demands on the CPU as multiple piano keys are pressed simultaneously. As the CPU demands increase, another ugly problem arises: latency. Latency is the delay between the note being pressed and the sound being heard. In short, for serious real-time audio work, latency must be kept to a minimum.
4.2 Wavetable Synthesis
Wavetable oscillators are less expensive than additive synthesis; however, they can require respectable amounts of storage. The wavetable oscillator can be implemented using a structure similar to a queue. This queue has the data for one cycle of any waveform (e.g., one cycle of a sawtooth). Because the queue structure is pre-loaded with data, wavetable oscillators are often pre-loaded with recorded signals, some very complex. This is an appealing feature of the wavetable oscillator. Many wavetable synthesizers have large collections of recorded samples, which can take up some memory, although the memory issue is becoming less of a factor with current advancements in technology.
As an example of how wavetable oscillators work, let a queue contain 1024 samples of one cycle of any repeating waveform. Then, with each iteration, one sample is output. If the CD sample rate of 44,100 Hz is used, then the frequency of the generated signal is $44{,}100 / 1024 = 43.066$ Hz. This is the frequency of the queue. To produce other frequencies (e.g., a piano key of 440.0 Hz is pressed), find the increment value using equation (4.1).
$$\mathrm{increment} = \mathrm{tableLength} \times f_{\mathrm{desired}} / \mathrm{sample\_rate} \qquad (4.1)$$
This increment value is used to skip values in the queue (Pirkle, 2015). For example, to generate a frequency of 440.0 Hz, the increment is $(1024 \times 440.0) / 44{,}100 = 10.21678$. This means that on every iteration, the read position in the queue advances by 10.21678 samples.
The idea is simple; however, there are intricacies in the implementation that the developer should keep in mind. It is rare that the increment value is a whole number. Therefore, to handle fractions, the algorithm needs an interpolation method to calculate a good approximation of the sample output value. Furthermore, the waveform to be loaded in the queue should already be bandlimited. One idea is to calculate bandlimited amplitude values from the additive synthesis method, then load the queue using this data set.
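A minimal wavetable oscillator along these lines might look like the sketch below. The names are hypothetical, the table holds a single sine cycle (which is trivially bandlimited), and linear interpolation is an assumption chosen for simplicity. The thesis does not specify which interpolation method to use.

```python
import math

def make_sine_table(table_length):
    """One cycle of a sine wave, used here as a simple bandlimited table."""
    return [math.sin(2.0 * math.pi * i / table_length)
            for i in range(table_length)]

def table_increment(table_length, desired_hz, sample_rate_hz):
    """Equation (4.1): how far the read position moves per output sample."""
    return table_length * desired_hz / sample_rate_hz

def wavetable_oscillator(table, desired_hz, sample_rate_hz, num_samples):
    """Read the table at a fractional position with linear interpolation."""
    length = len(table)
    increment = table_increment(length, desired_hz, sample_rate_hz)
    position = 0.0
    output = []
    for _ in range(num_samples):
        index = int(position)              # whole part: sample at or before
        fraction = position - index        # fractional part: blend weight
        next_index = (index + 1) % length  # wrap around at the table end
        output.append(table[index]
                      + fraction * (table[next_index] - table[index]))
        position += increment
        if position >= length:
            position -= length             # start the next cycle
    return output

table = make_sine_table(1024)
increment_440 = table_increment(1024, 440.0, 44100.0)  # about 10.21678
tone = wavetable_oscillator(table, 440.0, 44100.0, 200)
```

Linear interpolation is the cheapest option that handles fractional positions. Higher-order interpolation trades more computation for a closer approximation.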
4.3 Bandlimited Impulse Train (BLIT)
Stilson and Smith (1996) proposed their Bandlimited Impulse Train Synthesis technique to reduce aliasing. First, they start with an ideal bandlimited impulse train.
Then, they apply a sinc function, and give a closed-form expression for the sampled bandlimited impulse train (Stilson & Smith, 1996).