Analysis and Synthesis of Expressive Guitar Performance
A Thesis
Submitted to the Faculty
of
Drexel University
by
Raymond Vincent Migneco
in partial fulfillment of the
requirements for the degree
of
Doctor of Philosophy
May 2012

© Copyright 2012 Raymond Vincent Migneco. All Rights Reserved.
Table of Contents
List of Tables
List of Figures
Abstract
1 INTRODUCTION
1.1 Contributions
1.2 Overview
2 COMPUTATIONAL GUITAR MODELING
2.1 Sound Modeling and Synthesis Techniques
2.1.1 Wavetable Synthesis
2.1.2 FM Synthesis
2.1.3 Additive Synthesis
2.1.4 Source-Filter Modeling
2.1.5 Physical Modeling
2.2 Summary and Model Recommendation
2.3 Synthesis Applications
2.3.1 Synthesis Engines
2.3.2 Description and Transmission
2.3.3 New Music Interfaces
3 PHYSICALLY INSPIRED GUITAR MODELING
3.1 Overview
3.2 Waveguide Modeling
3.2.1 Solution for the Ideal, Plucked String
3.2.2 Digital Implementation of the Wave Solution
3.2.3 Lossy Waveguide Model
3.2.4 Waveguide Boundary Conditions
3.2.5 Extensions to the Waveguide Model
3.3 Analysis and Synthesis Using Source-Filter Approximations
3.3.1 Relation to the Karplus-Strong Model
3.3.2 Plucked String Synthesis as a Source-Filter Interaction
3.3.3 SDL Components
3.3.4 Excitation and Body Modeling via Commuted Synthesis
3.3.5 SDL Loop Filter Estimation
3.4 Extensions to the SDL Model
4 SOURCE-FILTER PARAMETER ESTIMATION
4.1 Overview
4.2 Background on Expressive Guitar Modeling
4.3 Excitation Analysis
4.3.1 Experiment: Expressive Variation on a Single Note
4.3.2 Physicality of the SDL Excitation Signal
4.3.3 Parametric Excitation Model
4.4 Joint Source-Filter Estimation
4.4.1 Error Minimization
4.4.2 Convex Optimization
5 SYSTEM FOR PARAMETER ESTIMATION
5.1 Onset Localization
5.1.1 Coarse Onset Detection
5.1.2 Pitch Estimation
5.1.3 Pitch Synchronous Onset Detection
5.1.4 Locating the Incident and Reflected Pulse
5.2 Experiment 1
5.2.1 Formulation
5.2.2 Problem Solution
5.2.3 Results
5.3 Experiment 2
5.3.1 Formulation
5.3.2 Problem Solution
5.3.3 Results
5.4 Discussion
6 EXCITATION MODELING
6.1 Overview
6.2 Previous Work on Guitar Source Signal Modeling
6.3 Data Collection Overview
6.3.1 Approach
6.4 Excitation Signal Recovery
6.4.1 Pitch Estimation and Resampling
6.4.2 Residual Extraction
6.4.3 Spectral Bias from Plucking Point Location
6.4.4 Estimating the Plucking Point Location
6.4.5 Equalization: Removing the Spectral Bias
6.4.6 Residual Alignment
6.5 Component-based Analysis of Excitation Signals
6.5.1 Analysis of Recovered Excitation Signals
6.5.2 Towards an Excitation Codebook
6.5.3 Application of Principal Components Analysis
6.5.4 Analysis of PC Weights and Basis Vectors
6.5.5 Codebook Design
6.5.6 Codebook Evaluation and Synthesis
6.6 Nonlinear PCA for Expressive Guitar Synthesis
6.6.1 Nonlinear Dimensionality Reduction
6.6.2 Application to Guitar Data
6.6.3 Expressive Control Interface
6.7 Discussion
7 CONCLUSIONS
7.1 Expressive Limitations
7.2 Physical Limitations
7.3 Future Directions
Appendix A Overview of Fractional Delay Filters
A.1 Overview
A.2 The Ideal Fractional Delay Filter
A.3 Approximation Using FIR Filters
A.3.1 Delay Approximation using Lagrange Interpolation Filters
A.4 Further Considerations
Appendix B Pitch Glide Modeling
B.1 Overview
B.2 Pitch Glide Model
B.3 Pitch Glide Measurement
B.4 Nonlinear Modeling and Data Fitting
B.4.1 Nonlinear Least Squares Formulation
B.4.2 Fitting and Results
B.5 Implementation
Bibliography
VITA
List of Tables
2.1 Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93].
2.2 Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category.
5.1 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter.
5.2 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using a FIR loop filter with length N = 3.
B.1 Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively.
List of Figures
3.1 Traveling wave solution of an ideal string plucked at time t = t1 and its displacement at subsequent time instances t2, t3. The string's displacement (solid) at any position is the summation of the two disturbances (dashed) at that position.
3.2 Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y+) and lower (y-) signal paths represent the right- and left-traveling disturbances, respectively. The string's displacement is obtained by summing y+ and y- at a desired spatial sample.
3.3 Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption.
3.4 Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement.
3.5 Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.
3.6 Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.
3.7 Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis.
3.8 Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal.
4.1 Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style.
4.2 The output of a waveguide model is observed over one period of oscillation. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the 'x' in the top plots) over time.
5.1 Proposed system for jointly estimating the source-filter parameters for plucked guitar tones.
5.2 Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f0 = 330 Hz.
5.3 Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c): Plucked-string signal with coarse and pitch-synchronous onsets overlaid.
5.4 Detail view of the "attack" portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked as well as the incident and reflected pulses from the first period of oscillation.
5.5 Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c) and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d).
5.6 Analysis and resynthesis of the guitar's 1st string in the "open" position (E4, f0 = 329.63 Hz). Top: Original plucked-guitar tone, residual signal and estimated excitation boundaries. Middle: Resynthesized pluck and excitation using estimated source-filter parameters. Bottom: Modeling error.
5.7 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte and forte dynamics.
5.8 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte and forte dynamics.
6.1 Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player's plucking position. S(z) models the string's pitch and decay characteristics.
6.2 Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com.
6.3 Diagram outlining the residual equalization process for excitation signals.
6.4 "Comb filter" effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from the single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are approximately located at multiples of 382 Hz.
6.5 Plucked-guitar tone measured using a piezoelectric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup. Δt indicates the arrival time between impulses.
6.6 (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup.
6.7 Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with a fractional delay filter added to the feedforward path to implement non-integer delay.
6.8 Spectral equalization on a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz).
6.9 Excitation signals corresponding to strings excited using a pick (a) and finger (b).
6.10 Average magnitude spectra of signals produced with pick (a) and finger (b).
6.11 Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance.
6.12 Explained variance of the principal components computed for the set of (a) unwound and (b) wound strings.
6.13 Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings.
6.14 Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings.
6.15 Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b).
6.16 Histogram of basis vector occurrences generated with Mtop = 20.
6.17 Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries.
6.18 Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals.
6.19 Architecture for a 3-4-1-4-3 autoassociative neural network.
6.20 Top: Projection of excitation signals into the space defined by the first two linear principal components. Bottom: Projection of the linear PCA weights along the axis defined by the bottleneck layer of the trained 25-6-2-6-25 ANN.
6.21 Guitar data projected along orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown.
6.22 Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches.
A.1 Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top) and non-integer (bottom) number of samples.
A.2 Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) to provide a fractional delay, dF = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function.
A.3 Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 to provide a fractional delay dF = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted.
B.1 Measured and modeled pitch glide for forte plucks.
B.2 Measured and modeled pitch glide for piano, mezzo-forte and forte plucks.
B.3 Single delay-loop waveguide filter with variable fractional delay filter, HF(z).
Abstract

Analysis and Synthesis of Expressive Guitar Performance
Raymond Vincent Migneco
Advisor: Youngmoo Edmund Kim, Ph.D.
The guitar is one of the most popular and versatile instruments used in Western music cultures.
Dating back to the Renaissance era, the guitar can be heard in nearly every genre of Western music, and is arguably the most widely used instrument in present-day rock music. Over the span of 500 years, the guitar has developed a multitude of performance and compositional styles associated with nearly every musical genre such as classical, jazz, blues and rock. This versatility can be largely attributed to the relative simplicity of the instrument, which can be built from a variety of materials and optionally amplified. Furthermore, the flexibility of the instrument allows performers to develop unique playing styles, which reflect how they articulate the guitar to convey certain musical expressions.
Over the last three decades, physical- and physically-inspired models of musical instruments have emerged as a popular methodology for modeling and synthesizing various instruments, including the guitar. These models are popular since their components relate to the actual mechanisms involved with sound production on a particular instrument, such as the vibration of a guitar string. Since the control parameters are physically relevant, they have a variety of applications including control and manipulation of “virtual instruments.” The focus of much of the literature on physical modeling for guitars is concerned with calibrating the models from recorded tones to ensure that the behavior of real strings is captured. However, far less emphasis is placed on extracting parameters that pertain to the expressive styles of the guitarist.
This research presents techniques for the analysis and synthesis of plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist's articulation during performance. A joint source-filter estimation approach is developed to account for the performer's articulation and the corresponding resonant string response. A data-driven, statistical approach for modeling the source signals is also presented in order to capture the nuances of particular playing styles. This research has several pertinent applications, including the development of expressive synthesizers for new musical interfaces and the characterization of performance through audio analysis.
CHAPTER 1: INTRODUCTION
The guitar is one of the most popular and versatile instruments used in Western music cultures.
Dating back to the Renaissance period, it has been incorporated into nearly every genre of Western music and, hence, has a rich tradition of design and performance techniques pertaining to each genre.
From a cultural standpoint, musicians and non-musicians alike are captivated by the performances of virtuoso guitarists past and present, who introduced innovative techniques that defined or redefined the way the instrument was played. This deep appreciation is no doubt related to the instrument’s adaptability, as it is recognized as a primary instrument in many genres, such as blues, jazz, folk, country and rock.
The guitar's versatility is rooted in its simple design, which has allowed its adoption across multiple musical genres. The basic components of any guitar consist of a set of strings mounted across a fingerboard and a resonant body to amplify the vibration of the strings. The tension on each string is adjusted to achieve a desired pitch when the string is played. Particular pitches are produced by clamping down each string at a specific location along the fingerboard, which changes the effective length of the string and, thus, the associated pitch when it is plucked. Frets, which are metallic strips spanning the width of the fingerboard, are usually installed on the fingerboard to exactly specify the location of notes in accordance with an equal tempered division of the octave.
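The equal-tempered fretting described above admits a simple quantitative statement: each fret shortens the vibrating length so that the sounded pitch rises by a factor of 2^(1/12) per fret. A minimal sketch (the function name is illustrative, not from the thesis):

```python
def fret_frequency(open_string_hz: float, fret: int) -> float:
    """Pitch of a string clamped at a given fret under 12-tone
    equal temperament: one semitone (factor 2**(1/12)) per fret."""
    return open_string_hz * 2 ** (fret / 12)

# The open high-E string (E4, 329.63 Hz) fretted at the 12th fret
# sounds one octave higher (659.26 Hz).
print(round(fret_frequency(329.63, 12), 2))
```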
The basic design of the guitar has been augmented in a multitude of ways to satisfy the demands of different performers and musical genres. For example, classical guitars are strung with nylon strings, which can be played with the fingers or nails, and have a wide fingerboard to permit playing scales and chords with minimal interference from adjacent strings. Often a solo instrument, the classical guitar requires a resonant body for amplification, where the size and materials of the body are chosen to achieve a specific timbre. On the other hand, country and folk guitarists prefer steel strings, which generally produce "brighter" tones. Electric guitars are designed to accommodate the demands of guitarists performing rock, blues and jazz music. These guitars are outfitted with electromagnetic pickups in which string vibration induces an electrical current, which can be processed to apply certain effects (e.g. distortion, reverberation) and eventually amplified. The role of the body is less important for electric guitars (although guitarists argue that it affects the instrument's timbre), and the body is generally thinner to increase comfort during performance. When the electric guitar is outfitted with light gauge strings, it facilitates certain techniques such as pitch-bending and legato, which are more difficult to perform on acoustic instruments.
Though the guitar can be designed and played in different ways to achieve a vast tonal palette, the underlying physical principles of vibrating strings are constant for each variation of the instrument.
Consequently, a popular topic among musicians and researchers is the development of quantitative guitar models that simulate this behavior. Physical- and physically-inspired models of musical instruments have emerged as a popular methodology for this task. The lure of these models is that they simulate the physical phenomena responsible for sound production in instruments, such as vibrating strings or air in a column, and produce high-quality synthetic tones. Properly calibrating these models, however, remains a difficult task and is an ongoing topic in the literature. Several guitar synthesizers have been developed using physically-inspired models, such as waveguide synthesis and the Karplus-Strong algorithm.
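As a point of reference for the models named above, the classic Karplus-Strong algorithm can be rendered in a few lines; this is a generic textbook sketch (variable names and the 0.5 averaging coefficient are the standard formulation, not the calibrated models developed later in this thesis):

```python
import numpy as np

def karplus_strong(f0: float, fs: int = 44100, dur: float = 1.0) -> np.ndarray:
    """Minimal Karplus-Strong pluck: a white-noise burst circulates in a
    delay line whose feedback applies a two-point average (lowpass loss)."""
    n = int(fs / f0)                      # delay-line length sets the pitch
    buf = np.random.uniform(-1, 1, n)     # initial excitation: noise burst
    out = np.empty(int(fs * dur))
    for i in range(out.size):
        out[i] = buf[0]
        avg = 0.5 * (buf[0] + buf[1])     # averaging filter models losses
        buf = np.append(buf[1:], avg)     # feed the filtered sample back
    return out

tone = karplus_strong(330.0, dur=0.5)     # a decaying, string-like tone
```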
In the last decade, there has been considerable interest in digitally modeling analog guitar components and effects using digital signal processing (DSP) techniques. This work is highly relevant to the consumer electronics industry since it promises low-cost, digital "clones" of vintage analog equipment. The promise of these devices is to help musicians consolidate their analog equipment into a single device or acquire the specific tones and capabilities of expensive and/or discontinued equipment at lower cost. Examples of products designed using this technology include Line6 modeling guitars and amplifiers, where DSP is used to replicate the sounds of well-known guitars and tube-based amplifiers [45, 46].
Despite the large amount of research focused on digitally modeling the physics of the guitar and its associated effects, there has been relatively little research analyzing the expressive attributes of guitar performance. The current research is mainly concerned with implementing specific performance techniques in physical models based on detailed physical analysis of the performer-instrument interaction. However, there is a void in the research for guitar modeling and synthesis that is concerned with measuring physical and expressive data from recordings. Obtaining such data is essential for developing an expressive guitar synthesizer; that is, a system that not only faithfully replicates guitar timbres, but is also capable of simulating expressive intentions used by many guitarists.
1.1 Contributions
This dissertation proposes analysis and synthesis techniques for plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist’s articulation during performance.
Specifically, the expression analyzed through recorded performance focuses on how the articulation was applied through the plucking mechanism and strength. The main contributions of this research are summarized as follows:
• Generated a data set of plucked guitar tones comprising variations of the performer's articulation, including the plucking mechanism and strength, which spans all of the guitar's strings and several fretting positions.
• Developed a framework for jointly estimating the source and filter parameters for plucked-guitar tones based on a physically-inspired model.
• Proposed and demonstrated a novel application of principal components analysis to model the source signal for plucked guitar tones, encapsulating characteristics of various string articulations.
• Utilized nonlinear principal components analysis to derive an expressive control space for synthesizing excitation signals corresponding to guitar articulations.
The analysis and synthesis techniques proposed here are based on physically inspired models of plucked-guitar tones. These models are chosen because their operation has a strong physical analog to the process of exciting a string; that is, an impulsive force excites a resonant string response, which gives them great potential for analyzing and synthesizing expressive performance.
These advantages are in contrast to other modeling techniques, such as frequency modulation (FM), additive and spectral modeling synthesis, which are often used for music synthesis tasks, but lack easily controlled parameters that relate to how an instrument is excited (e.g. bowing, picking).
Physical models, on the other hand, relate to the initial conditions of a plucked string and possible variations which produce unique tones when applied to the model. This is intuitive, considering guitarists affect the same physical variables when plucking a string.
The proposed method for deriving the parameters relating to expressive guitar performance is based on a joint source-filter estimation framework. The motivation to implement the estimation in a joint source-filter framework is two-fold. Foremost, musical expression results from an interaction between the performer and the instrument, and estimating the expressive attributes of performance requires accounting for the simultaneous variation of source and filter parameters. For the specific case of the guitar, the performer can be seen as imparting an articulation (i.e. excitation) on the string (i.e. filter), which has a resonant response to the performance input. The second reason for this approach is to facilitate the estimation of the source and filter parameters, which is typically accomplished in two separate tasks.
Building off the joint parameter estimation scheme, component-based analysis is applied to the source (i.e. excitation) signals obtained from recorded performance. Existing modeling techniques treat the excitation signal as a separate entity saved off-line to model a specific articulation, but in doing so provide no mechanism to quantify or manipulate the excitation signal. The application of component analysis is a data-driven, statistical approach used to represent the nuances of specific articulations through linear combinations of component vectors or functions. Using this representation, the articulations can be visualized in the component space and dimensionality reduction is applied to yield an expressive synthesis space that offers control over specific characteristics of the data set.
The proposed guitar modeling techniques presented in this dissertation have many potential applications for music analysis and synthesis tasks. Analyzing the source-filter parameters derived from the recordings of many guitarists could lead to development of quantitative models of guitar expression and a deeper understanding of expression during performance. The application of the estimated parameters using the proposed techniques can expand upon the sonic and expressive capabilities of current synthesizers, which often rely on MIDI or wavetable samples to replicate the tone with little or no expressive control. During the advent of computer music, limited computational power was a major constraint when implementing synthesis algorithms, but this is now much less of a concern given the capabilities of present-day computers and mobile devices. These advances in technology have provided new avenues for interacting with audio through gesture-based technologies.
The guitar analysis and synthesis techniques presented in this dissertation can be harnessed along with these technologies to create new experiences for musical interaction.
1.2 Overview
As computational modeling for plucked guitars is the basis of this thesis, Chapter 2 overviews various approaches for modeling and synthesizing musical sounds. These approaches include wavetable synthesis, spectral modeling, FM synthesis, physical modeling and source-filter modeling. The strengths and weaknesses of each model are evaluated and, based on our assessment, a recommendation is made to base the techniques proposed in this dissertation on a source-filter approximation of physical guitar models.
Physical and source-filter models, which digitally implement the behavior of a vibrating string driven by an external input, are discussed in detail in Chapter 3. The so-called waveguide model, based on a digital implementation of the d'Alembert solution for describing traveling waves on a string, is introduced along with a source-filter approximation of this model.
Chapter 4 presents an approach for capturing the expression contained in specific string articulations via the source signal from a source-filter model. The physical relation of this source signal to the waveguide model is highlighted and it is suggested that a parametric model can be used to capture the nuances of the articulations. The joint estimation of the source and filter models is proposed by finding parameters that minimize the error between the analyzed recording and the synthetic signal. This constrained least squares problem is solved using convex optimization. The implementation of this approach and results are discussed in Chapter 5.
In Chapter 6, principal components analysis (PCA) is applied to a corpus of excitation signals derived from recorded performance. In this application, PCA models each excitation signal as a linear combination of basis functions, where each function contributes to the expressive attributes of the data. We show that a codebook of relevant basis functions can be extracted which describe particular articulations where the plucking device and strength are varied. Furthermore, using components as features, we show that nonlinear PCA (NLPCA) can be applied for dimensionality reduction, which helps visualize the expressive attributes of the data set. This mapping is reversible, so the reduced dimensional space can be used as an expressive synthesizer using the linear basis functions to reconstruct the excitation signals. This chapter also deals with the pre-processing steps required to remove biases from the recovered signals, including the effect of the guitarist's plucking position along the string.
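The linear-combination idea behind this analysis can be illustrated with a generic PCA sketch on synthetic data (the corpus shape and variable names here are hypothetical stand-ins; the thesis's actual excitation corpus and pre-processing differ):

```python
import numpy as np

# Each row stands in for one excitation signal (40 signals, 25 samples each).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 25))

Xc = X - X.mean(axis=0)                     # center the corpus
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                        # keep the top-k components
weights = Xc @ Vt[:k].T                      # per-signal PC weights (features)
recon = weights @ Vt[:k] + X.mean(axis=0)    # rank-k reconstruction of the corpus
```

Each row of `Vt` is a basis function; a signal is resynthesized as its weighted sum, which is exactly the codebook viewpoint developed in Chapter 6.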
The conclusions from this dissertation are presented in Chapter 7, which includes the limitations and future avenues to explore.
CHAPTER 2: COMPUTATIONAL GUITAR MODELING
A number of techniques are available for the computational modeling and synthesis of guitar tones, each with entirely different approaches for capturing its acoustic attributes. This chapter will provide an overview of the sound models most commonly applied to guitar tones, including their computational basis, strengths and weaknesses. For a detailed treatment of these techniques, the reader is referred to the extensive overviews provided by [10] and [89]. The analysis of each synthesis technique will also be used to justify the source-filter modeling approach used throughout this dissertation.
Finally, this chapter will discuss pertinent applications of computational synthesis of guitar tones.
2.1 Sound Modeling and Synthesis Techniques
2.1.1 Wavetable Synthesis
In many computer music applications, wavetable synthesis is a viable means for synthetically generating musical sounds with low computational overhead. A wavetable is simply a buffer that stores the periodic component of a recorded sound, which can be looped repeatedly. As musical sounds vary in pitch and duration, signal processing techniques are required to modify the synthetic tones from a wavetable sample. Pitch shifting is achieved by interpolating the samples in the wavetable: reading through the table at a faster or slower rate raises or lowers the pitch, respectively.
A problem with interpolation in wavetable synthesis is that excessive interpolation of a particular wavetable sample can result in synthetic tones that sound unnatural, since interpolation alters the length of the synthetic signal. To overcome this limitation, multi-sampling is used, where several samples of an instrument spanning its pitch range are stored. Interpolation can then be used between the reference samples without excessive degradation to the synthetic tone, which is preferred to storing every possible pitch the instrument can produce. Multi-sampling can also be used to incorporate different levels of dynamics, or relative loudness, into the system as well. Beyond interpolation, digital filters can be used to adjust the spectral properties
(e.g. brightness) of the wavetable samples as well.
The computational costs of wavetable synthesis are fairly low and the main restriction is the amount of memory available to store samples. The sound quality in these systems can be quite good as long as there is not excessive degradation from modification. However, wavetable synthesis has no true modeling basis (i.e. sinusoidal, source-filter) and is rather “ad-hoc” in its approach. Also, its flexibility in modeling and synthesis is restricted by the samples available to the synthesizer.
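As a concrete illustration of the interpolation scheme described above, the following sketch (not from the thesis; the function name and parameters are illustrative) reads a single-cycle wavetable with a fractional phase increment to produce an arbitrary pitch:

```python
import numpy as np

def wavetable_oscillator(table, f0, fs, n_samples):
    """Loop a single-cycle wavetable at pitch f0 by advancing a fractional
    read pointer and linearly interpolating between adjacent samples."""
    L = len(table)
    inc = f0 * L / fs            # table samples advanced per output sample
    phase = 0.0
    out = np.empty(n_samples)
    for n in range(n_samples):
        i = int(phase)
        frac = phase - i
        out[n] = (1.0 - frac) * table[i] + frac * table[(i + 1) % L]
        phase = (phase + inc) % L
    return out

# One cycle of a sine stored in a 64-sample table, played at 440 Hz
table = np.sin(2 * np.pi * np.arange(64) / 64)
tone = wavetable_oscillator(table, f0=440.0, fs=44100.0, n_samples=44100)
```

A larger phase increment skips through the table faster and raises the pitch; pushed too far, this aliases and sounds unnatural, which is one motivation for the multi-sampling strategy described above.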
2.1.2 FM Synthesis
Frequency Modulation (FM) synthesis is often used to simulate characteristics of sounds that cannot be produced with linear time-invariant (LTI) models. An FM oscillator achieves these sounds by modulating the frequency of a carrier signal with another signal. A simple FM oscillator is given by
    y(t) = A_c \sin\big( 2\pi f_c t + \Delta f_c \cos(2\pi f_m t) \big)    (2.1)
where A_c and f_c are the amplitude and frequency of the carrier signal, respectively, f_m is the modulating frequency and Δf_c is the modulation depth, which determines the maximum deviation of the instantaneous frequency from f_c. The spectrum of the resulting signal y(t) contains a peak located at the carrier frequency and sideband frequencies located at plus and minus integer multiples of f_m about the carrier. When the ratio of the carrier to the modulating frequency is non-integer, FM synthesis creates an inharmonic spectrum where the frequency spacing between the partials is not constant. This is useful for modeling the spectra of certain musical sounds, such as strings and drums, which exhibit inharmonic behavior.
FM synthesis is a fairly computationally efficient technique and can be easily implemented on a microprocessor, which makes it attractive for commercially available synthesizers. Due to the nonlinearity of the FM oscillator, it is capable of producing timbres not possible with other synthesis methods. However, there is no automated approach for matching the synthesis parameters to an acoustic recording [8]. Rather, the parameters must be tweaked by trial and error and/or using perceptual evaluation.
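Equation 2.1 transcribes directly into code. The sketch below (illustrative names; Δf_c is passed as the phase-deviation term exactly as it appears in the equation) generates one second of an FM tone with a harmonic carrier-to-modulator ratio:

```python
import numpy as np

def fm_tone(A_c, f_c, f_m, delta_fc, fs, dur):
    """FM oscillator per Equation 2.1:
    y(t) = A_c * sin(2*pi*f_c*t + delta_fc * cos(2*pi*f_m*t))."""
    t = np.arange(int(fs * dur)) / fs
    return A_c * np.sin(2 * np.pi * f_c * t
                        + delta_fc * np.cos(2 * np.pi * f_m * t))

# Harmonic case: f_c / f_m = 10, so sidebands fall at f_c ± k * f_m
y = fm_tone(A_c=1.0, f_c=1000.0, f_m=100.0, delta_fc=1.0, fs=8000.0, dur=1.0)
```

With a small modulation depth the carrier remains the strongest spectral component; the sideband amplitudes follow Bessel functions of the depth, which is why small parameter changes reshape the timbre drastically and why calibration by trial and error is common.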
2.1.3 Additive Synthesis
Additive, or spectral modeling, synthesis is a sound modeling and synthesis approach based on characterizing the spectra of musical sounds and modeling them appropriately. Sound spectra typically fall into harmonic, inharmonic, noise or mixed categories. Analysis via the additive synthesis approach typically entails performing a short-time analysis on the signal to divide it into relatively short frames where the signal is assumed to be stationary within the frame. In the spectral modeling synthesis technique proposed by Serra and Smith, the sinusoidal, or deterministic, parts of the spectrum within each frame are identified and modeled using amplitude, frequency and phase.
The sound can be re-synthesized by interpolating between the deterministic components of each frame to generate a sum of smooth, time-varying sinusoids. The noise-like, or stochastic, parts of the spectrum can be obtained by subtracting the synthesized, deterministic component from the original signal [68].
There are several benefits to synthesizing musical sounds via additive synthesis. Foremost, the model is very general and can be applied to a wide range of signals including polyphonic audio and speech [50, 68]. Also, the separation of the deterministic and stochastic components permits flexible modification of signals since the sinusoidal parameters are isolated within the spectrum. For example, pitch and time-scale modification can be achieved independently or simultaneously by shifting the frequencies of the sinusoids and altering the interpolation time between successive frames. This leads to synthetic tones that sound more natural and can be extended indefinitely, unlike wavetable interpolation.
A problem with additive synthesis is that transient events present in an analyzed signal are often too short to be adequately modeled by sinusoids and must be accounted for separately. This is especially problematic for signals with a percussive “attack”, such as plucked strings. It is also unclear how to modify the sinusoids in order to achieve certain effects related to the perceived dynamics of a musical tone.
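The resynthesis step described above — interpolating per-frame parameters into smooth, time-varying sinusoids — can be sketched as follows (hypothetical function; the short-time analysis is assumed to have already produced per-frame amplitude and frequency tracks):

```python
import numpy as np

def additive_resynth(frame_amps, frame_freqs, hop, fs):
    """Sum of time-varying sinusoids: per-frame amplitudes and
    frequencies (one column per partial) are linearly interpolated to
    the sample rate, and phase is obtained by integrating the
    instantaneous frequency."""
    n_frames, n_partials = frame_amps.shape
    n_out = (n_frames - 1) * hop
    t_frames = np.arange(n_frames) * hop
    t = np.arange(n_out)
    out = np.zeros(n_out)
    for k in range(n_partials):
        amp = np.interp(t, t_frames, frame_amps[:, k])
        freq = np.interp(t, t_frames, frame_freqs[:, k])
        phase = 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.sin(phase)
    return out

# Two decaying partials at 200 Hz and 405 Hz (slightly inharmonic)
amps = np.outer(np.linspace(1.0, 0.1, 11), [1.0, 0.5])
freqs = np.tile([200.0, 405.0], (11, 1))
signal = additive_resynth(amps, freqs, hop=800, fs=8000.0)
```

Because the parameters are interpolated rather than looped, the tone can be stretched indefinitely by changing `hop`, illustrating the time-scale flexibility noted above; the stochastic residual would be synthesized and added separately.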
2.1.4 Source-Filter Modeling
Analysis and synthesis via source-filter models involves using a complex sound source, such as an impulse or periodic impulse train, to excite a resonant filter. The filter includes the important perceptual characteristics of the sound, such as the overall spectral tilt and the formants, or resonances, characteristic to the sound. When such a filter is excited by an impulse train, for example, the resonant filter is “sampled” at regular intervals in the spectrum as defined by the frequency of the impulse train.
Source-filter models are attractive because they permit the automated analysis of the resonant characteristics through either time or frequency domain based techniques. One of the most well-known examples of this is linear prediction. Linear prediction entails predicting a sample of a signal based on a linear combination of past samples of that signal,

    \hat{x}(n) = \sum_{p=1}^{P} \alpha_p x(n - p)    (2.2)

where \alpha_1, \alpha_2, \ldots, \alpha_P are the prediction coefficients to be estimated from the recording [60]. When a fairly low prediction order P is used, the prediction coefficients yield an all-pole filter that approximates the spectral shape, including resonances, of the analyzed sound. Computationally efficient techniques, such as the autocorrelation and covariance methods, are available for estimating the filter parameters as well.
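The autocorrelation method mentioned above can be sketched in a few lines (illustrative; a production implementation would use the Levinson-Durbin recursion and windowing rather than a direct solve):

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Estimate the prediction coefficients alpha_p of Equation 2.2 by
    solving the normal equations R a = r built from the signal's
    autocorrelation sequence."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[0], r[1], ...
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])          # [alpha_1 ... alpha_P]
```

Applied to a signal generated by a known all-pole filter, the recovered coefficients approximate that filter; the prediction residual then serves as the model's source signal, which is the inverse-filtering idea used throughout this dissertation.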
A significant advantage of source-filter models is that they approximate musical sounds as the output of a linear time-invariant (LTI) system. Therefore, using the estimated resonant filter, the source signal for the model can be recovered through an inverse filtering operation. Analysis of the recovered source signals provides insight into the expression used to produce the sound for the case of musical instruments. Also, source signals derived from certain signals can be used to excite the resonant filters from others, thus permitting cross-synthesis for generating new and interesting sounds. As will be discussed in Chapter 3, source-filter models have a close relation to physical models of musical instruments.
Despite the advantages of source-filter models, they have certain limitations. Namely, as they are based on LTI models, they cannot model the inherent nonlinearities found in real musical instruments. For example, tension modulation in real strings alters the spectral characteristics in a time-varying manner, while source-filter models have fixed fundamental frequencies.
2.1.5 Physical Modeling
Physical modeling systems aim to model the behavior of systems using physical variables such as force, displacement, velocity and acceleration. Physical systems describing sound can range from musical interactions, such as striking a drum or string, to natural sounds such as wind and rolling objects. An example physical system for a musical interaction consists of releasing a string from an initial displacement. The solution to this system is discussed extensively in Chapter 3, but involves computing the infinitesimal forces acting on the string as it is released, which results in a set of differential equations describing the motion of the string with respect to time and space. The digital implementation of physical models for sound can be achieved in a number of ways including modal decomposition, digital waveguides and wave digital filters, to name a few [89].
While physical models are capable of high quality synthesis of acoustic instruments, developing models of these systems is often a difficult task. Taking the plucked string as an example, a complete physical description requires knowledge of the string including its material composition and how it interacts with the boundary conditions at its termination points, which includes fricative forces acting on the string as it travels. Furthermore, there may be coupling forces acting between the string and the excitation mechanism (e.g. the player’s finger), which should be included as well.
For these reasons, the physical system must be known a priori and it cannot be calibrated directly through audio analysis.
2.2 Summary and Model Recommendation
Table 2.1 summarizes the sound modeling techniques presented above by comparing their modeling domains and the range of musical signals that can be produced using each method. The vertical ordering is indicative of the underlying basis and/or structure of the model types. For example, wavetable synthesis is a rather “ad-hoc” approach without a true computational basis, while FM synthesis is based on modulating sinusoids. Additive synthesis and source-filter models have a strict modeling basis using sinusoids plus noise and source-filter parameters, respectively. Physical models are most closely related to musical instruments since they deal with related physical quantities and interactions. As a model’s parameter domain becomes more general, a greater range of sounds can be synthesized with more control over their properties (i.e. pitch, timbre, articulation).
Based on the discussion in Section 2.1, the strengths and weaknesses of each model are evaluated on a scale (Low, Moderate, High) as they pertain to four categories:
1. Computational complexity required for implementation
2. The resulting sound quality when the model is used for sound synthesis of guitar tones
3. The difficulty required to calibrate the model in accordance with acoustic samples
4. The degree of expressive control afforded by the model
Table 2.1: Summary of sound synthesis models including their modeling domain and applicable audio signals. Adapted from Vercoe et al. [93].

Sound Model   | Parameter Domain                                            | Acoustic Range
--------------|-------------------------------------------------------------|--------------------------------------------------------------
Wavetable     | sound samples, manipulation filters                         | discrete pitches, isolated sound events
FM            | carrier and modulating frequencies                          | sounds with harmonic and inharmonic spectra
Additive      | noise sources; time-varying amplitude, frequency and phase  | sounds with harmonic, inharmonic, noisy or mixed spectra
Source-Filter | excitation signal, filter parameters                        | voice (speech, singing), plucked-string or struck instruments
Physical      | physical quantities (length, stiffness, position, etc.)     | plucked, struck, bowed or blown instruments
Table 2.2: Evaluating the attributes of various sound modeling techniques. An asterisk (*) indicates the optimal evaluation for a particular category.

Sound Model   | Computational Complexity | Sound Quality | Calibration Difficulty | Expressive Control
--------------|--------------------------|---------------|------------------------|-------------------
Wavetable     | Low*                     | High*         | High                   | Low
FM            | Low*                     | Moderate      | High                   | Low
Additive      | Moderate                 | High*         | Moderate*              | Moderate
Source-Filter | Moderate                 | High*         | Moderate*              | High*
Physical      | High                     | High*         | High                   | Moderate
Table 2.2 shows the results of this evaluation in accordance with the four categories presented above, with the model(s) earning the best evaluation for each category marked for emphasis. It should be noted that, in general, the computational complexity of the models increases in accordance with the associated model parameter domain in Table 2.1. That is, as the parameters become more general, they are more difficult to implement and harder to calibrate.
For truly flexible and expressive algorithmic synthesis, additive, source-filter and physical models offer the best of all categories. While the additive model provides good sound quality and flexible synthesis (especially with regard to pitch and time shifting), the sinusoidal basis does not allow the performer’s input to be separated from the instrument’s response. Physical models provide this separation, but are difficult to calibrate, especially from a recording, since the physical configuration of the instrument’s components and the performer’s interaction are generally not known a priori.
Of the remaining models, the source-filter model provides the greatest appeal due to its inherent simplicity, especially as it pertains to modeling the performer’s articulation, its relative ease of calibration and the expressive control it affords.
2.3 Synthesis Applications
The techniques for modeling plucked-guitar tones presented in this thesis are applicable to a number of sound synthesis tasks. This section will highlight a few such tasks to provide a larger perspective on the benefits of computational guitar modeling.
2.3.1 Synthesis Engines
There are numerous systems available which encompass a variety of computational sound models for the creation of synthetic audio. One such system is CSound, an audio programming language created by Vercoe et al. based on the C language [92]. CSound offers the implementation of several synthesis algorithms, including general filtering operations, additive synthesis and linear prediction. The Synthesis ToolKit (STK) is another system created by Cook and Scavone, which adopts a hierarchical approach to sound modeling and synthesis using an open-source application programming interface based on C++ [11]. STK handles low level, core sound synthesis via unit generators which include envelopes, oscillators and filters. High-level synthesis routines encapsulate physical modeling algorithms for specific musical instruments, FM synthesis, additive synthesis and other routines.
2.3.2 Description and Transmission
Computational modeling of musical instruments, especially the guitar, is highly applicable in systems requiring generalized audio description and transmission. The MPEG-4 standard is perhaps the most well-known codec (compressor-decompressor) for transmission of multimedia data. However, the compression of raw audio, even using the perceptual codec found in mp3, leaves little or no control over the sound at the decoder. To expand the parametric control of compressed audio, the MPEG-4 standard includes a descriptor for so-called Structured Audio, which permits the encoding, transmission and decoding of audio using highly structured descriptions of sound [21, 66, 93]. The audio descriptors can include high-level performance information for musical sounds, such as pitch, duration, articulation and timbre, and low-level descriptions based on the models (e.g. source-filter, additive synthesis) used to generate the sounds. It should be noted that the structured audio descriptor does not attempt to standardize the model used to parameterize the audio, but provides a means for describing the synthesis method(s), which keeps the standard flexible. The level of description provided by structured audio differentiates it from other formats such as pulse-code modulated audio or mp3, which do not provide contextual descriptions, and MIDI (musical instrument digital interface), which provides contextual description but lacks timbral or expressive descriptors. In essence, structured audio provides a flexible and descriptive “language” for communicating with synthesis engines.
2.3.3 New Music Interfaces
Computer music researchers have long sought to develop new interfaces for musical interaction.
Often, these interfaces deviate from the traditional notion in which an instrument is played in order to appeal to non-musicians or enable entirely new ways of interacting with sound. For the guitar,
Karjalainen et al. developed a “virtual air guitar” where the performer’s hands are tracked using motion sensing gloves [26]. The guitar tones are produced algorithmically using waveguide models in response to gestures made by the performer. More recently, commercially available gesture and multitouch technologies have been used for music creation. The limitation of these systems, however, is that their audio engines utilize sample-based synthesizers and provide little or no parametric control over the resulting sound [20, 55].
The plucked-guitar modeling techniques presented in this dissertation are applicable to each of the sound synthesis areas outlined above. The source and filter parameters extracted from recordings can be used for low bit-rate transmission of audio and are based on algorithms (source-filter) that are either available in many synthesis packages or easily implemented on present-day hardware.
Given the computational power available in present day computers and mobile devices, the analysis techniques and algorithms presented here can be harnessed into applications for new musical interfaces as well.
CHAPTER 3: PHYSICALLY INSPIRED GUITAR MODELING
3.1 Overview
For the past two decades, physically-inspired modeling systems have emerged as a popular method for simulating plucked-string instruments since they are capable of producing high-quality tones with computationally efficient implementations. The emergence of these techniques was due, in part, to the innovations of the Karplus-Strong algorithm, which simulated plucked-string sounds using a simple and efficient model that was later shown to approximate the physical phenomena of traveling waves on a string [22, 30, 31, 72, 89]. Thus, direct physical modeling of a musical instrument aims to simulate, with a digital model, the behavior of the particular elements responsible for sound production (e.g. a vibrating string or resonant air column) due to the musician’s interaction with the instrument (e.g. plucking or breath excitation) [89].
This chapter will briefly overview waveguide techniques for guitar synthesis, which directly model the traveling wave solution resulting from a plucked string. A related model, known as the single delay-loop, is also discussed, which is utilized for the analysis and synthesis tasks presented in this thesis.
3.2 Waveguide Modeling
Directly modeling the complex vibration of guitar strings due to the performer-instrument interaction is a difficult problem. However, by using simplified models of plucked strings, waveguide models offer an intuitive understanding of string behavior and lead to practical and efficient implementations [72]. In this section, the well-known traveling wave solution for ideal, plucked strings is presented [33]. This general solution is then discretized and digitally implemented, as shown by Smith, to constitute a digital waveguide model [72]. Common extensions to the waveguide model are also presented, which correspond to non-ideal string conditions.
3.2.1 Solution for the Ideal, Plucked-String
The behavior of a vibrating string is understood by deriving and solving the well-known wave equation for an ideal, lossless string. The full derivation of the wave equation is documented in several physics texts [33, 52] and is obtained by computing the tension differential across a curved section of string with infinitesimal length. This tension is balanced at all times by an inertial restoring force due to the string’s transverse acceleration.
The wave equation is expressed as [33]
    K_t y'' = \varepsilon \ddot{y}    (3.1)
where K_t and ε are the string’s tension and linear mass density, respectively, and y = y(t, x) is the string’s transverse displacement at a particular time instant, t, and location along the string, x. The curvature of the string is indicated by y'' = \partial^2 y(t, x)/\partial x^2 and its transverse acceleration is given by \ddot{y} = \partial^2 y(t, x)/\partial t^2. The general solution to the wave equation is given by [33]
    y(t, x) = y_r(t - x/c) + y_l(t + x/c),    (3.2)
where y_r and y_l are functions that describe the right and left traveling components of the wave, respectively, and c is the wave speed, a constant given by c = \sqrt{K_t/\varepsilon}. It should be noted that y_r and y_l are arbitrary functions of the arguments (ct - x) and (ct + x), and it can be verified that substituting any twice-differentiable function with these arguments for y(t, x) will satisfy Equation 3.1 [33, 72].
Equation 3.2 indicates that the wave solution can be represented by two functions, each depending on a time and a spatial variable. This notion becomes clear by analyzing an ideal, plucked-string at a few instances after its initial displacement as shown in Figure 3.1. After the string is released, its total displacement is obtained by summing the amplitudes of the right- and left-traveling wave shapes, which propagate away from the plucking position, along the entire length of the string.
3.2.2 Digital Implementation of the Wave Solution
As demonstrated in Figure 3.1, the traveling wave solution has both time and spatial dependencies, which must be discretized to digitally implement Equation 3.2. Temporal sampling is achieved by employing a change of variable in Equation 3.2 such that t_n = nT_s, where T_s is the audio sampling
Figure 3.1: Traveling wave solution of an ideal string plucked at time t = t1 and its displacement at subsequent time instances t2,t3. The string’s displacement (solid) at any position is the summation of the two disturbances (dashed) at that position.
interval. The wave’s position is discretized by setting x_m = mX, where X = cT_s, such that the waves are sampled at a fixed spatial interval along the string. Substituting t and x with t_n and x_m in Equation 3.2 yields [72]:
    y(t_n, x_m) = y_r(t_n - x_m/c) + y_l(t_n + x_m/c)    (3.3)
                = y_r(nT_s - mX/c) + y_l(nT_s + mX/c)    (3.4)
                = y_r((n - m) T_s) + y_l((n + m) T_s)    (3.5)
Since all arguments are multiplied by T_s, it is suppressed and the terms corresponding to the right- and left-traveling waves can be simplified to [72, 89]:

    y^+(n) \triangleq y_r(nT_s), \quad y^-(n) \triangleq y_l(nT_s)    (3.6)
Smith showed that Equation 3.5 could be schematically realized as a so-called “digital waveguide” model shown in Figure 3.2 [70, 71, 72]. When the upper and lower signal paths, or “rails”, of Figure 3.2 are initialized with the values of the string’s left and right wave shapes, the traveling wave phenomena in Figure 3.1 and Equation 3.2 are reproduced by shifting the transverse displacement values for the wave shapes in the upper and lower rails. For example, during one temporal sampling instance, the right-traveling wave shifts by the amount cT_s along the string, which is equivalent to delaying y^+ by one sample in Figure 3.2. The waveguide model also provides an intuitive understanding for how the traveling waves relate to the string’s total displacement, which is obtained by
Figure 3.2: Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y^+) and lower (y^-) signal paths represent the right and left traveling disturbances, respectively. The string’s displacement is obtained by summing y^+ and y^- at a desired spatial sample.
summing the values of y^+ and y^- at a desired spatial sample x = mcT_s. It should be noted that the values obtained at the sampling instants in the waveguide model are exact, although band-limited interpolation can be used to obtain the displacement between spatial sampling instants if desired [89].
3.2.3 Lossy Waveguide Model
The lossless waveguide model in Figure 3.2 clearly represents the phenomena of the traveling wave solution for a plucked string under ideal conditions. However, this model does not incorporate the characteristics of real strings, which are subject to a number of non-ideal effects, such as internal friction and losses due to boundary collisions. In the context of sound synthesis, incorporating these properties is essential for modeling tones that behave naturally both from a physical and perceptual standpoint.
Non-ideal string propagation is hindered by energy losses from internal friction and drag imposed by the surrounding air. If these losses are modeled as a term proportional to the wave’s transverse velocity, \dot{y}, with proportionality constant µ, Equation 3.1 can be modified as [72]
    K_t y'' = \varepsilon \ddot{y} + \mu \dot{y}    (3.7)

where the additional term, \mu \dot{y}, incorporates the fricative losses applied to the string in the transverse direction. The solution to Equation 3.7 has the same form as Equation 3.2, but with exponential terms that attenuate the right- and left-traveling waves as a function of propagation distance. The solution
Figure 3.3: Waveguide model incorporating losses due to propagation at the spatial sampling in- stances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption.
is given by [72]:
(µ/2")x/c (µ/2")x/c y(t, x)=e y (t x/c)+e y (t + x/c) (3.8) r l
To obtain the lossy waveguide model, Equation 3.8 is discretized by applying the same change of variables that were used to discretize Equation 3.1. This yields a waveguide model with a gain factor, g = e^{-\mu T_s/2\varepsilon}, inserted after each delay element in the waveguide as shown in Figure 3.3. Thus, a particular point along the right- or left-traveling wave shape is subject to an amplitude attenuation by the amount g as it advances one spatial sample through the waveguide.
By using a linear time-invariant (LTI) assumption, Figure 3.3 can be simplified to reduce the number of delay and gain elements required for the model. For example, if the output of the waveguide is observed at x = (M + 1)X, then the previous M delay and gain elements can be consolidated into a single delay, z^{-M}, and loss factor, g^M. This greatly reduces the complexity of the waveguide model, which is desirable for practical implementations.
3.2.4 Waveguide Boundary Conditions
In practice, the behavior of a vibrating string is determined by boundary conditions at the string’s termination points. In the case of the guitar, each string is terminated at the “nut” and “bridge”, where the former is located near the guitar’s headstock and the latter is mounted on the guitar’s saddle. The behavior of the string at these locations depends on several factors, including the string’s tensile properties, how it is fastened and the construction of the bridge and nut. For simplistic modeling, however, it suffices to assume that guitar strings are rigidly terminated such that there is no displacement at these positions.
By assuming rigid terminations for a string with length L, a set of boundary conditions are obtained for solving the wave equation [33]
    y(t, 0) = 0, \quad y(t, L) = 0.    (3.9)
By substituting these conditions into Equation 3.2 and discretizing, the following relations between y^+ and y^- are obtained [72]:

    y^-(n) = -y^+(n)    (3.10)
    y^+(n - D/2) = -y^-(n + D/2)    (3.11)
In Equation 3.11, D = 2L/X is often referred to as the “loop delay” since it indicates the delay time, in samples, for a point on the right wave shape, for example, to travel from x = 0 to x = L and back along the string. Thus, points located at the same spatial sample on the right and left wave shapes will have the same amplitude displacement every D/2 samples. Viewed another way, D can be calculated as the ratio of the sampling frequency to the string’s pitch, which is determined by the string’s length,
    D = \frac{2L}{X} = \frac{2L}{cT_s} = \frac{2Lf_s}{c} = \frac{f_s}{f_0}    (3.12)

where the fundamental frequency, f_0, was substituted based on the wave relationship f_0 = c/2L, where 2L is the wavelength and c is the wave speed.
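The relationships above can be exercised with a small simulation (not from the thesis; the rail layout, loss value and pickup index are illustrative). Two D/2-sample rails hold the traveling waves, both terminations reflect with inversion per Equations 3.10 and 3.11, and the resulting pitch is f_s/D per Equation 3.12:

```python
import numpy as np

def pluck_waveguide(D, pluck_pos, n_samples, g=0.996, pickup=12):
    """Bidirectional waveguide with loop delay D: two D//2-sample rails
    hold the right- and left-going waves, each initialized with half of
    a triangular pluck shape.  Rigid terminations reflect with
    inversion; a gain g lumps the propagation losses at each end."""
    half = D // 2
    x = np.arange(half, dtype=float)
    # Triangular initial displacement, zero at both terminations
    shape = np.minimum(x / pluck_pos, (half - 1 - x) / (half - 1 - pluck_pos))
    y_r = 0.5 * shape
    y_l = 0.5 * shape
    out = np.empty(n_samples)
    for n in range(n_samples):
        out[n] = y_r[pickup] + y_l[pickup]   # displacement at the pickup
        end_r, end_l = y_r[-1], y_l[0]
        # Shift rails one spatial sample and apply inverting reflections
        y_r = np.concatenate(([-g * end_l], y_r[:-1]))
        y_l = np.concatenate((y_l[1:], [-g * end_r]))
    return out

# D = 100 samples of loop delay gives a pitch of fs / 100
out = pluck_waveguide(D=100, pluck_pos=12, n_samples=8000)
```

One full round trip through both rails takes D samples, so the output repeats (with decay) every D samples, consistent with Equation 3.12.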
Figure 3.4 shows the lossy waveguide model with boundary conditions superimposed on a guitar body to illustrate the physical relationship between the model and the instrument. The loss factors due to wave propagation and rigid boundary conditions are consolidated into two filters located at x = 0 and x = L, which correspond to the guitar’s bridge and nut positions, respectively. The individual delay elements are merged into two bulk delay lines, each having a length of D/2 samples, which store the shapes of the left- and right-traveling waves at any time during the simulation. Furthermore, this model allows the string’s initial conditions to be specified relative to a spatial sample in the delay line that represents the plucking point position. Initializing the waveguide in this way removes
Figure 3.4: Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string’s initial displacement.
the need to explicitly model the coupling effects arising from the interaction between the string and excitation mechanism [72]. The guitar’s output is observed at the “pickup” location by summing the values of the upper and lower delay lines at a desired spatial sample.
The simplistic nature of the waveguide model in Figure 3.4 leads to computationally efficient hardware and software implementations of realistic plucked guitar sounds. Memory requirements are minimal, since only two buffers are required to store the string’s initial conditions, and the lossy boundaries can be implemented with simple digital filters. Furthermore, as Smith showed, the contents of the delay lines can be shifted via pointer manipulation to reduce the load on the processor [10, 72]. Karjalainen showed that using such techniques enables several string models to be implemented on a single DSP chip, whose computational capabilities are eclipsed by present day (2012) microprocessors [25].
3.2.5 Extensions to the Waveguide Model
An important extension is providing fractional delay for the waveguide model, since strings are often tuned to fundamental frequencies for which the ratio of the sampling frequency to the pitch is not an integer delay line length. While certain hardware and software configurations support multiple sampling rates, it is generally undesirable to vary the sampling rate to achieve a particular tuning, especially when synthesizing multiple string tones with different pitches. Instead, Karjalainen proposed adding fractional delay into the waveguide loop via a Lagrange interpolation filter. Thus, an FIR filter is computed to add the required fractional delay to precisely tune the waveguide [25].
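The Lagrange fractional-delay filter can be computed directly from the classic product formula; the sketch below (illustrative function name) returns the FIR coefficients for a given fractional delay:

```python
import numpy as np

def lagrange_fd(delay, order):
    """FIR coefficients h[n] = prod_{k != n} (delay - k) / (n - k)
    approximating a fractional delay of `delay` samples; the
    approximation is best when delay is near order / 2."""
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

# A 3rd-order filter delaying by 1.4 samples, e.g. to fine-tune a string
h = lagrange_fd(1.4, 3)
```

Convolving the delay-line output with h supplies the fractional part of the loop delay, so the total loop length can match a non-integer f_s/f_0.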
Smith proposed using all-pass filters to simulate the effects of dispersion in strings, where the string’s internal stiffness causes higher frequency components of the wave to travel faster than lower ones. This has the effect of constantly altering the shape of the string. All-pass filters introduce frequency-dependent group delay to simulate this effect [72].
Tolonen et al. incorporate the effects of “pitch glide,” or tension modulation, exhibited by real strings using a non-linear waveguide model [79, 80, 91]. At rest, a string exhibits a nominal length and tension. However, as the string is displaced from its equilibrium position, it undergoes elongation, which increases its tension. After release, the tension and, thus, the wave speed constantly fluctuate as the string oscillates about its nominal position. This constant fluctuation does not allow a fixed spatial sampling scheme to suffice, and the wave must be resampled at each time instant to account for the elongation.
3.3 Analysis and Synthesis Using Source-Filter Approximations
The waveguide model discussed in the previous section provides an intuitive methodology for implementing the traveling wave solution and simulating plucked-string tones. However, accurate re-synthesis of plucked-guitar tones using the waveguide model requires knowledge of the string’s initial conditions and loss filters that are correctly calibrated to simulate naturally decaying tones.
The former requirement is a significant limitation since the exact initial conditions of the string are not available from a recorded signal and must be measured during performance, which is often impractical. Therefore, when performance and physical data are unavailable, the utility of the waveguide model is limited for analysis-synthesis tasks, such as characterizing recorded performance.
An alternative model, known as the single delay-loop (SDL), was developed to simplify the waveguide model from a computational standpoint by consolidating the delay lines and loss filters.
The SDL model is also widely used in the literature because it permits the analysis of plucked-guitar tones from a source-filter perspective; that is, an external signal excites a filter to simulate the resonant behavior of a plucked string. Thus, the physical specifications for the guitar and its strings are generally not required to calibrate the SDL model, since linear time-invariant methods can be applied for this task. A number of guitar synthesis systems are based on SDL models [26, 56, 74, 75, 90].
3.3.1 Relation to the Karplus-Strong Model
For a more streamlined structure, the bidirectional waveguide model from Figure 3.4 can be reduced to a single, D-length delay line and a loop filter that consolidates the losses incurred from the bridge and nut [7, 72]. This reduction is shown in Figure 3.5, where the lower delay line is concatenated with the upper delay line at the nut position. The wave shape contained in the lower delay line is inverted to incorporate the reflection at the rigid nut, which has been removed.
Figure 3.5: Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.
The new waveguide structure in Figure 3.5 (right) demonstrates the basic SDL model and is identical to the well-known Karplus-Strong (KS) plucked-string model, whose discovery pre-dated waveguide synthesis techniques [22, 31]. Unlike waveguide techniques, where the excitation is based on wave variables, the KS model works by initializing a D-length delay line with random values and circularly shifting the samples through a loss filter. The random initialization of the delay line simulates the transient noise burst perceived during the attack of plucked-string instruments, though this “excitation” signal has no physical relation to the string, while the feedback loop acts as a comb filter so that only the harmonically-related frequencies are passed. The loss filter, Hl(z), employs low-pass filtering to implement the frequency-dependent decay characteristics of real strings, so that high-frequency energy dissipates faster than the lower frequencies.
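The KS recursion just described can be sketched in a few lines. The following is a minimal illustration using the original two-tap averaging loss filter from [31] (function name and parameter defaults are choices of this sketch, not from the literature):

```python
import numpy as np

def karplus_strong(f0, fs=44100.0, dur=1.0, seed=0):
    """Basic Karplus-Strong pluck: a D-length delay line filled with
    random values, circulated through a two-tap averaging loss filter."""
    rng = np.random.default_rng(seed)
    D = int(round(fs / f0))            # integer loop delay sets the pitch
    buf = rng.uniform(-1.0, 1.0, D)    # random "excitation" burst
    out = np.empty(int(dur * fs))
    for n in range(out.size):
        out[n] = buf[n % D]
        # two-tap averager: mild low-pass, so highs decay faster than lows
        buf[n % D] = 0.5 * (buf[n % D] + buf[(n + 1) % D])
    return out
```

The output begins as broadband noise (the attack transient) and settles into a decaying, harmonically-related tone, exactly the behavior described above.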
3.3.2 Plucked String Synthesis as a Source-Filter Interaction
By modeling plucked-guitar tones with the single-delay loop (SDL), the physical interpretation of traveling wave shapes on a string is no longer clear as it was for the bidirectional waveguide.
However, Valimaki et al. show that the SDL can be derived from the bidirectional waveguide model by computing a transfer function between the spatial samples representing the plucking position and output samples [30, 89]. This derivation is still physically valid, though the model’s excitation signal is treated as an external input rather than a set of initial conditions describing the string’s displacement.
Figure 3.6 shows a complete source-filter model for plucked guitar synthesis based on waveguide modeling principles. The SDL model is contained in the block labeled S(z), which is equivalent to the single delay line structure shown in Figure 3.5, except the model is driven by an external excitation signal rather than a random initialization as in the Karplus-Strong model. S(z) alone cannot simulate the complete behavior of plucked strings found in the waveguide model. Notably missing is the ability to manipulate the plucking point and pickup positions, both of which are achieved by selecting a desired spatial sample in the waveguide model corresponding to the location where the string is displaced and where the vibration is observed as the output. Valimaki showed that this functionality could be achieved by adding comb filters before and after the SDL to simulate the effects of plucking point and pickup positions present in the waveguide model.
Figure 3.6 shows a comb filter C(z) preceding S(z) to simulate the effect of the plucking point position. For simplicity, the input p(n) can be an ideal impulse. The comb filter delay determines when p(n) is reflected, which is analogous to a sample in the digital waveguide model encountering a rigid boundary. The number of samples between the initial and reflected impulses is specified as a fraction of the loop delay, where D indicates the number of samples corresponding to one period of string vibration. Similarly, the comb filter U(z) following S(z) simulates the position of the pickup seen on electric guitars. In this filter, the comb filter delay specifies the delay between arriving pulses associated with a relative position along the string. It should be noted that, since each of the blocks in Figure 3.6 is a linear time-invariant (LTI) system, they may be freely interchanged as desired.
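A feed-forward comb of this kind is simple to state concretely. In the sketch below (a hypothetical helper, with the pluck position given as a fraction λ of the loop delay D), a delayed and inverted copy of the input is added, which attenuates the harmonics that have a node at the pluck point:

```python
import numpy as np

def pluck_position_comb(x, loop_delay, pluck_frac):
    """Feed-forward comb C(z) = 1 - z^-M, with M = round(pluck_frac * loop_delay).
    An impulse in yields the impulse plus a delayed, inverted reflection,
    mimicking a traveling wave meeting a rigid boundary."""
    M = int(round(pluck_frac * loop_delay))
    y = np.asarray(x, dtype=float).copy()
    if M > 0:
        y[M:] -= x[:len(x) - M]
    return y
```

For example, with a loop delay of 100 samples and a pluck at λ = 1/4 of the string, the comb places spectral nulls at every fourth harmonic, consistent with plucking at a quarter of the string length.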
3.3.3 SDL Components
Whereas the comb filters in Figure 3.6 specify initial and output observation conditions for the plucked guitar tone, the SDL filter in S(z) is responsible for modeling the string vibration, including its fundamental frequency and decay. As in the case of the bidirectional waveguide, the total “loop delay”, D, of the SDL denoted by S(z) determines the pitch of the resulting guitar tone, as determined by Equation 3.12. Since D is typically a non-integer, the fractional delay filter, HF(z), is used to add the required fractional group delay, while z^-DI provides the bulk, integer delay component of D. All-pass and Lagrange interpolation filters are commonly used for HF(z), with the latter being
Figure 3.6: Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.
especially popular in synthesis systems since it can achieve variable delay for pitch modification without significant transient effects [26, 30]. Additional information pertaining to fractional delay
filters is provided in Appendix A.
Hl(z) is the so-called “loop filter” and is responsible for implementing the non-ideal characteristics of real strings, including losses due to wave propagation and terminations at the nut and bridge positions. In the early developments of waveguide synthesis, Hl(z) was chosen as a two-tap averaging filter for simplicity and efficiency [31], but a low-order FIR filter is often too simplistic to match the magnitude decay characteristics of plucked-guitar tones. In the literature, a first-order IIR filter is often used for Hl(z) and has the form
Hl(z) = g / (1 - α0 z^-1)    (3.13)

where α0 and g must be determined for proper calibration [29, 62, 86, 90]. It is useful to analyze the total delay, D, in the SDL as a sum of the delays contributed by each component in the feedback loop,
D = τl + DF + DI    (3.14)
where τl, DF, and DI are the group delays associated with Hl(z), HF(z), and z^-DI, respectively. Thus, the bulk and fractional delay components should be chosen to compensate for the group delay introduced by the loop filter, which varies as a function of α0. For spectral-based analysis, the transfer function of the SDL model between input, p(n), and output, y(n), can be expressed in the z-transform domain as
S(z) = 1 / (1 - Hl(z) HF(z) z^-DI).    (3.15)
Equation 3.15 can be thought of as a modified linear prediction where the prediction occurs over DI samples due to the periodic nature of plucked-guitar tones. The “prediction” coefficients are determined by the coefficients of the loop and fractional delay filters in the feedback loop of S(z).
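To make the feedback structure of S(z) concrete, the loop can be run directly in the time domain. The sketch below uses the one-pole loop filter of Equation 3.13 and only the integer delay z^-DI (the fractional delay filter HF(z) is omitted, and the parameter values are purely illustrative; the DC loop gain g/(1 - α0) must stay below one for stability):

```python
import numpy as np

def sdl_synthesize(excitation, DI, g=0.94, alpha0=0.05, n_out=44100):
    """Time-domain SDL loop: y(n) = p(n) + Hl{ y(n - DI) }, where
    Hl(z) = g / (1 - alpha0 z^-1) is realized by a one-pole recursion.
    The fractional delay filter HF(z) is omitted in this sketch."""
    y = np.zeros(n_out)
    lp_state = 0.0                                # loop-filter state
    for n in range(n_out):
        fb = y[n - DI] if n >= DI else 0.0        # integer delay z^-DI
        lp_state = g * fb + alpha0 * lp_state     # one-pole low-pass Hl(z)
        x = excitation[n] if n < len(excitation) else 0.0
        y[n] = x + lp_state
    return y
```

Driving this loop with an impulse yields a train of echoes spaced DI samples apart, each echo slightly attenuated and smeared by the loop filter, which is the harmonically-decaying behavior the transfer function in Equation 3.15 describes.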
The SDL model in Figure 3.6 is attractive from an analysis-synthesis perspective since, unlike the bidirectional waveguide model, it does not require specific data about the string during performance
(e.g. initial conditions, instrument materials, plucking technique) to faithfully replicate plucked-guitar tones. Rather, the problem becomes properly calibrating the filters from recorded tones via model-based analysis. A significant portion of the literature for plucked-guitar synthesis is dedicated to developing calibration schemes for extracting optimal SDL components [26, 29, 62, 69, 86, 90].
3.3.4 Excitation and Body Modeling via Commuted Synthesis
When using the SDL model for guitar synthesis, the output signal is assumed to be strictly the result of the string’s vibration, where the only external forces acting on the string are due to frictional losses.
This assumption is not necessarily true when dealing with real guitars, since the instrument’s body acts as a resonant filter, which affects its timbre, and interacts with the strings via nonlinear coupling. Valimaki et al. describe the acoustic guitar body as a multidimensional resonator, which requires computationally expensive modeling techniques to implement [89].
While an exhaustive review of acoustic body modeling techniques is beyond the current scope, several attempts have been made to reduce the complexity of this task [7, 28, 57]. Measurement of the acoustic guitar body response is typically achieved by striking the resonant body of the instrument with a hammer with the strings muted. The acoustic radiation is recorded to capture the resonant body modes. In some cases, electro-mechanical actuators are used to excite and measure the resonant body in a controlled manner [63]. Digital implementation of the acoustic body involves designing a
Figure 3.7: Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis.
filter that captures the resonant modes. This can be achieved using FIR or IIR filters, though precise modeling requires very high order filters. Karjalainen et al. proposed using warped filter models for computationally efficient modeling and synthesis of acoustic guitar bodies. The warped filter is advantageous since the frequency resolution of the filter can favor the lower, resonant frequency modes, which are perceptually important to capture for re-synthesis, while keeping the required filter orders low enough for efficient synthesis [24]. For “cross-synthesis” applications, Karjalainen et al. introduced a technique to “morph” electric guitar sounds into acoustic tones through equalization of the magnetic pickups found on electric guitars. A filter, which encapsulates the body effects of the acoustic guitar, was then applied to a digital waveguide model of the instrument [27].
A popular method for dealing with the absent resonant body effects in the SDL model involves using so-called commuted synthesis, which was independently developed by Smith and Karjalainen [29, 73].
This technique exploits the commutative property of linear time-invariant (LTI) systems in order to extract an aggregate signal that encapsulates the effects of the resonant body filter and the string excitation, p(n), of the SDL model when the loop filter parameters are known. This approach avoids the computational cost incurred with explicitly modeling the body with a high-order filter.
Figure 3.7 shows the SDL model augmented by inserting excitation and body filters before and after the SDL loop, respectively. The excitation filter is a general LTI block that encapsulates several aspects of synthesis, including “pluck-shaping” filters to model certain dynamics in the articulation and the comb filtering effects from the plucking point and/or pickup locations as shown in Figure
3.6. Assuming that S(z) and y(n) are known, the LTI system can be rearranged as
Y(z) = E(z) S(z) B(z)    (3.16)
     = E(z) B(z) S(z)    (3.17)
     = A(z) S(z),    (3.18)

where A(z) is an aggregation of the body and excitation filters. By inverse filtering y(n) in the frequency domain with S(z), the impulse response for A(z) is obtained. Thus, by making an LTI assumption on the model, this residual signal contains the additional model components which are unaccounted for by the SDL alone. For practical considerations, Valimaki notes that several hundred milliseconds of the residual signal may be required to capture the perceptually relevant resonances of the acoustic body during resynthesis [90], but for many applications the tradeoff of storing this signal outweighs the cost of explicit body modeling.
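Because 1/S(z) = 1 - Hl(z)HF(z) z^-DI is itself a simple filter, the inverse filtering can also be carried out in the time domain. The sketch below (ignoring HF(z), and assuming the one-pole loop filter parameters g and α0 are already calibrated; the function name is hypothetical) recovers the residual impulse response of A(z) from a recorded tone:

```python
import numpy as np

def inverse_filter_sdl(y, DI, g, alpha0):
    """Apply 1/S(z) = 1 - Hl(z) z^-DI to a recorded tone y(n).
    The result a(n) is the residual described by commuted synthesis:
    the impulse response of A(z) = E(z) B(z)."""
    # v(n) = Hl{ y(n) }, via the one-pole recursion of Eq. 3.13
    v = np.zeros_like(y)
    state = 0.0
    for n in range(len(y)):
        state = g * y[n] + alpha0 * state
        v[n] = state
    # a(n) = y(n) - v(n - DI)
    a = y.copy()
    a[DI:] -= v[:len(y) - DI]
    return a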
It should be noted that, even when plucked-guitar tones do not exhibit prominent effects from the resonant body, commuted synthesis is still a valid technique for obtaining the SDL excitation signal, p(n). This is often the case for electric guitar tones, where the output is measured by a transducer and is relatively “dry” compared to an acoustic guitar signal. Also, any excitation signal extracted via commuted synthesis will contain biases from the plucking point and pickup locations unless these phenomena are specifically accounted for in the “excitation filter” block of Figure 3.7.
If the plucking point and pickup locations are known with respect to the SDL model, the excitation signal can be “equalized” to remove the biases. There are several techniques utilized in the literature to estimate the plucking point location directly from recordings of plucked guitar tones. Traube and
Smith developed frequency-domain techniques for acoustic guitars [81, 82, 83, 84], while Penttinen et al. employed time-domain analysis to determine the relative plucking position along the string
[58, 59].
3.3.5 SDL Loop Filter Estimation
Before the SDL excitation signal can be extracted via commuted synthesis, the loop filter, Hl(z), needs to be calibrated from the recorded tone. This task has been the primary focus in much of the literature, since the loop filter provides the synthesized tones with natural decay characteristics
[14, 29, 39, 62, 69, 86, 90]. This section will overview some of the techniques used in the literature.
Early attempts at modeling the loop filter for the violin involved using deconvolution in the frequency domain to obtain an estimate of the loop filter’s magnitude response. Smith employed various filter design techniques, including autoregressive methods, in order to model the contours of the spectra; however, the measured spectra were subject to amplified noise due to the deconvolution process [69].
Karjalainen introduced a more robust algorithm that extracts magnitude response specifications for the loop filter by analyzing the recorded tone with a short-time Fourier transform (STFT) analysis [29]. Phase characteristics of the STFT are not considered in the loop filter design since the magnitude response is considered to be perceptually more important for plucked-guitar modeling
[29, 86].
Lee et al. expand on Karjalainen’s STFT-based approach by adapting the so-called Energy Decay Relief (EDR) [40, 64] to model the frequency-dependent attenuation of the waveguide. The EDR was adapted from Jot [23] in order to de-emphasize the effects of beating in the string so that the resulting magnitude trajectories for each partial are strictly monotonic. Thus, the EDR at time t and frequency f is computed by summing all the remaining energy at that frequency from t to infinity. Due to the decaying nature of plucked-guitar tones, this leads to a set of monotonically decreasing curves for each partial analyzed.
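Given an STFT magnitude matrix, the EDR computation is a one-liner. In the sketch below (frequency bins along rows, time frames along columns; the function name is hypothetical), a reversed cumulative sum realizes the "sum of all remaining energy from t onward":

```python
import numpy as np

def energy_decay_relief(stft_mag):
    """EDR(f, t) = sum over frames >= t of |STFT(f, frame)|^2.
    Rows are frequency bins, columns are time frames; each row of
    the result is monotonically non-increasing in time."""
    energy = np.asarray(stft_mag, dtype=float) ** 2
    return np.cumsum(energy[:, ::-1], axis=1)[:, ::-1]
```

Because every summed term is non-negative, each frequency row of the EDR is non-increasing by construction, which is exactly the monotonicity property exploited for slope fitting.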
Example algorithm for Loop Filter Estimation
An example of Karjalainen’s calibration scheme is shown in Figure 3.8 and can be summarized with the following steps:
1. Determine the pitch, f0, of the recorded tone, y(n).
2. Compute the STFT on the plucked tone y(n).
3. For each frame in the STFT, estimate the magnitudes of the harmonically-related partials.
4. Estimate the slope of each partial’s magnitude trajectory across all frames in the STFT.
5. Compute a gain profile, G(fk), based on the magnitude trajectories for each harmonically-related partial.
6. Apply filter design techniques (e.g. least-squares) to determine the parameters of Hl(z) that satisfy the gain profile.
The details of each step in Karjalainen’s calibration scheme vary depending on the specific implementation. For example, the number of partials chosen to analyze is typically between 10 and 20. Also, partial-tracking across each frame can be achieved by bandpass filtering techniques when the pitch is known [90].
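Steps 2 through 5 can be sketched compactly. The following illustrative implementation (not the scheme from [29] itself; all names and defaults are choices of this sketch) picks the nearest FFT bin for each harmonic, fits a linear decay in dB across frames, and converts the slope into a per-loop-traversal gain G(fk):

```python
import numpy as np

def estimate_loop_gains(y, f0, fs, n_partials=10, frame=2048, hop=512):
    """Track the magnitude of each harmonic of f0 across STFT frames,
    fit a dB-per-frame decay slope, and convert the slope to the gain
    the loop filter must apply per loop traversal (D = fs/f0 samples)."""
    n_frames = (len(y) - frame) // hop
    win = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    mags = np.zeros((n_partials, n_frames))
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(win * y[i * hop:i * hop + frame]))
        for k in range(n_partials):
            bin_k = np.argmin(np.abs(freqs - (k + 1) * f0))
            mags[k, i] = spec[bin_k] + 1e-12   # floor avoids log(0)
    t = np.arange(n_frames)
    gains = np.empty(n_partials)
    for k in range(n_partials):
        slope_db = np.polyfit(t, 20 * np.log10(mags[k]), 1)[0]  # dB/frame
        # one loop traversal = (fs/f0) samples = (fs/f0)/hop frames
        gains[k] = 10 ** (slope_db * (fs / f0) / hop / 20)
    return gains
```

The resulting gains play the role of the profile G(fk); a filter design step (step 6) would then fit Hl(z) so that |Hl(e^{jωk})| matches each G(fk).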
The gain profile, G(fk), extracted from the STFT analysis is computed as [29]