
Analysis and Synthesis of Expressive Guitar Performance

A Thesis

Submitted to the Faculty

of

Drexel University

by

Raymond Vincent Migneco

in partial fulfillment of the

requirements for the degree

of

Doctor of Philosophy

May 2012

© Copyright 2012 Raymond Vincent Migneco. All Rights Reserved.

Table of Contents

List of Tables ...... vi

List of Figures ...... vii

Abstract ...... xi

1 INTRODUCTION ...... 1
1.1 Contributions ...... 3
1.2 Overview ...... 4
2 COMPUTATIONAL GUITAR MODELING ...... 6
2.1 Sound Modeling and Synthesis Techniques ...... 6
2.1.1 Wavetable Synthesis ...... 6
2.1.2 FM Synthesis ...... 7
2.1.3 Additive Synthesis ...... 8
2.1.4 Source-Filter Modeling ...... 8
2.1.5 Physical Modeling ...... 9
2.2 Summary and Model Recommendation ...... 10
2.3 Synthesis Applications ...... 12
2.3.1 Synthesis Engines ...... 12
2.3.2 Description and Transmission ...... 12
2.3.3 New Music Interfaces ...... 13
3 PHYSICALLY INSPIRED GUITAR MODELING ...... 14
3.1 Overview ...... 14
3.2 Waveguide Modeling ...... 14
3.2.1 Solution for the Ideal, Plucked String ...... 15
3.2.2 Digital Implementation of the Wave Solution ...... 15
3.2.3 Lossy Waveguide Model ...... 17
3.2.4 Waveguide Boundary Conditions ...... 18
3.2.5 Extensions to the Waveguide Model ...... 20
3.3 Analysis and Synthesis Using Source-Filter Approximations ...... 21
3.3.1 Relation to the Karplus-Strong Model ...... 22
3.3.2 Plucked String Synthesis as a Source-Filter Interaction ...... 22
3.3.3 SDL Components ...... 23
3.3.4 Excitation and Body Modeling via Commuted Synthesis ...... 25
3.3.5 SDL Loop Filter Estimation ...... 27
3.4 Extensions to the SDL Model ...... 31
4 SOURCE-FILTER PARAMETER ESTIMATION ...... 32
4.1 Overview ...... 32
4.2 Background on Expressive Guitar Modeling ...... 32
4.3 Excitation Analysis ...... 33
4.3.1 Experiment: Expressive Variation on a Single Note ...... 34
4.3.2 Physicality of the SDL Excitation Signal ...... 36
4.3.3 Parametric Excitation Model ...... 38
4.4 Joint Source-Filter Estimation ...... 38
4.4.1 Error Minimization ...... 38
4.4.2 Convex Optimization ...... 40
5 SYSTEM FOR PARAMETER ESTIMATION ...... 43
5.1 Onset Localization ...... 43
5.1.1 Coarse Onset Detection ...... 44
5.1.2 Pitch Estimation ...... 45
5.1.3 Pitch Synchronous Onset Detection ...... 46
5.1.4 Locating the Incident and Reflected Pulse ...... 48
5.2 Experiment 1 ...... 49
5.2.1 Formulation ...... 49
5.2.2 Problem Solution ...... 52
5.2.3 Results ...... 53
5.3 Experiment 2 ...... 58
5.3.1 Formulation ...... 58
5.3.2 Problem Solution ...... 59
5.3.3 Results ...... 61
5.4 Discussion ...... 62
6 EXCITATION MODELING ...... 63
6.1 Overview ...... 63
6.2 Previous Work on Guitar Source Signal Modeling ...... 64
6.3 Data Collection Overview ...... 66
6.3.1 Approach ...... 67
6.4 Excitation Signal Recovery ...... 68
6.4.1 Pitch Estimation and Resampling ...... 69
6.4.2 Residual Extraction ...... 69
6.4.3 Spectral Bias from Plucking Point Location ...... 70
6.4.4 Estimating the Plucking Point Location ...... 71
6.4.5 Equalization: Removing the Spectral Bias ...... 74
6.4.6 Residual Alignment ...... 76
6.5 Component-based Analysis of Excitation Signals ...... 77
6.5.1 Analysis of Recovered Excitation Signals ...... 77
6.5.2 Towards an Excitation Codebook ...... 78
6.5.3 Application of Principal Components Analysis ...... 79
6.5.4 Analysis of PC Weights and Basis Vectors ...... 81
6.5.5 Codebook Design ...... 84
6.5.6 Codebook Evaluation and Synthesis ...... 85
6.6 Nonlinear PCA for Expressive Guitar Synthesis ...... 88
6.6.1 Nonlinear Dimensionality Reduction ...... 89
6.6.2 Application to Guitar Data ...... 90
6.6.3 Expressive Control Interface ...... 92
6.7 Discussion ...... 94
7 CONCLUSIONS ...... 95
7.1 Expressive Limitations ...... 96
7.2 Physical Limitations ...... 97
7.3 Future Directions ...... 98
Appendix A Overview of Fractional Delay Filters ...... 100
A.1 Overview ...... 100
A.2 The Ideal Fractional Delay Filter ...... 100
A.3 Approximation Using FIR Filters ...... 102
A.3.1 Delay Approximation using Lagrange Interpolation Filters ...... 103
A.4 Further Considerations ...... 104
Appendix B Pitch Glide Modeling ...... 106
B.1 Overview ...... 106
B.2 Pitch Glide Model ...... 107
B.3 Pitch Glide Measurement ...... 107
B.4 Nonlinear Modeling and Data Fitting ...... 108
B.4.1 Nonlinear Least Squares Formulation ...... 108
B.4.2 Fitting and Results ...... 109
B.5 Implementation ...... 110
Bibliography ...... 113
VITA ...... 119

List of Tables

2.1 Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93]. ...... 11

2.2 Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category. ...... 11

5.1 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter. ...... 58

5.2 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using an FIR loop filter with length N = 3. ...... 61

B.1 Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively. ...... 112

List of Figures

3.1 Traveling wave solution of an ideal string plucked at time t = t_1 and its displacement at subsequent time instances t_2, t_3. The string's displacement (solid) at any position is the summation of the two disturbances (dashed) at that position. ...... 16

3.2 Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y^+) and lower (y^-) signal paths represent the right- and left-traveling disturbances, respectively. The string's displacement is obtained by summing y^+ and y^- at a desired spatial sample. ...... 17

3.3 Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption. ...... 18

3.4 Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement. ...... 20

3.5 Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop. ...... 22

3.6 Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively. ...... 24

3.7 Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis. ...... 26

3.8 Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal. ...... 30

4.1 Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style. ...... 35

4.2 The output of a waveguide model is observed over one period of oscillation. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the 'x' in the top plots) over time. ...... 37

5.1 Proposed system for jointly estimating the source-filter parameters for plucked guitar tones. ...... 43

5.2 Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental period for a signal with f0 = 330 Hz. ...... 46

5.3 Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the pitch period of the string. (c): Plucked-string signal with coarse and pitch-synchronous onsets overlaid. ...... 47

5.4 Detail view of the "attack" portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked as well as the incident and reflected pulses from the first period of oscillation. ...... 48

5.5 Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c) and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d). ...... 50

5.6 Analysis and resynthesis of the guitar's 1st string in the "open" position (E4, f0 = 329.63 Hz). Top: Original plucked-guitar tone, residual signal and estimated excitation boundaries. Middle: Resynthesized pluck and excitation using estimated source-filter parameters. Bottom: Modeling error. ...... 54

5.7 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte and forte dynamics. ...... 55

5.8 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte and forte dynamics. ...... 56

6.1 Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player's plucking position. S(z) models the string's pitch and decay characteristics. ...... 65

6.2 Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com. ...... 67

6.3 Diagram outlining the residual equalization process for excitation signals. ...... 69

6.4 "Comb filter" effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are approximately located at multiples of 382 Hz. ...... 70

6.5 Plucked-guitar tone measured using a piezoelectric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup. Δt indicates the arrival time between impulses. ...... 73

6.6 (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup. ...... 73

6.7 Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with fractional delay filter added to the feedforward path to implement non-integer delay. ...... 75

6.8 Spectral equalization on a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz). ...... 76

6.9 Excitation signals corresponding to strings excited using a pick (a) and finger (b). ...... 77

6.10 Average magnitude spectra of signals produced with pick (a) and finger (b). ...... 78

6.11 Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance. ...... 79

6.12 Explained variance of the principal components computed for the set of (a) unwound and (b) wound strings. ...... 82

6.13 Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings. ...... 83

6.14 Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings. ...... 83

6.15 Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b). ...... 84

6.16 Histogram of basis vector occurrences generated with Mtop = 20. ...... 86

6.17 Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries. ...... 87

6.18 Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals. ...... 88

6.19 Architecture for a 3-4-1-4-3 autoassociative neural network. ...... 89

6.20 Top: Projection of excitation signals into the space defined by the first two linear principal components. Bottom: Projection of the linear PCA weights along the axis defined by the bottleneck layer of the trained 25-6-2-6-25 ANN. ...... 91

6.21 Guitar data projected along orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown. ...... 92

6.22 Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches. ...... 93

A.1 Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top) and non-integer (bottom) number of samples. ...... 102

A.2 Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) to provide a fractional delay, dF = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function. ...... 104

A.3 Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 to provide a fractional delay dF = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted. ...... 105

B.1 Measured and modeled pitch glide for forte plucks. ...... 110

B.2 Measured and modeled pitch glide for piano, mezzo-forte and forte plucks. ...... 111

B.3 Single delay-loop waveguide filter with variable fractional delay filter, HF(z). ...... 111

Abstract

Analysis and Synthesis of Expressive Guitar Performance
Raymond Vincent Migneco
Advisor: Youngmoo Edmund Kim, Ph.D.

The guitar is one of the most popular and versatile instruments used in Western music cultures.

Dating back to the Renaissance era, the guitar can be heard in nearly every genre of Western music, and is arguably the most widely used instrument in present-day rock music. Over the span of 500 years, the guitar has developed a multitude of performance and compositional styles associated with nearly every musical genre, from classical to rock. This versatility can be largely attributed to the relative simplicity of the instrument, which can be built from a variety of materials and optionally amplified. Furthermore, the flexibility of the instrument allows performers to develop unique playing styles, which reflect how they articulate the guitar to convey certain musical expressions.

Over the last three decades, physical- and physically-inspired models of musical instruments have emerged as a popular methodology for modeling and synthesizing various instruments, including the guitar. These models are popular since their components relate to the actual mechanisms involved with sound production on a particular instrument, such as the vibration of a guitar string. Since the control parameters are physically relevant, they have a variety of applications including control and manipulation of "virtual instruments." Much of the literature on physical modeling for guitars is concerned with calibrating the models from recorded tones to ensure that the behavior of real strings is captured. However, far less emphasis is placed on extracting parameters that pertain to the expressive styles of the guitarist.

This research presents techniques for the analysis and synthesis of plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist's articulation during performance. A joint source-filter estimation approach is developed to account for the performer's articulation and the corresponding resonant string response. A data-driven, statistical approach for modeling the source signals is also presented in order to capture the nuances of particular playing styles. This research has several pertinent applications, including the development of expressive synthesizers for new musical interfaces and the characterization of performance through audio analysis.


CHAPTER 1: INTRODUCTION

The guitar is one of the most popular and versatile instruments used in Western music cultures.

Dating back to the Renaissance period, it has been incorporated into nearly every genre of Western music and, hence, has a rich tradition of design and performance techniques pertaining to each genre.

From a cultural standpoint, musicians and non-musicians alike are captivated by the performances of virtuoso guitarists past and present, who introduced innovative techniques that defined or redefined the way the instrument was played. This deep appreciation is no doubt related to the instrument’s adaptability, as it is recognized as a primary instrument in many genres, such as blues, jazz, folk, country and rock.

The guitar’s versatility is inherent in its simple design, which can be attributed to its use in multiple musical genres. The basic components of any guitar consist of a set of strings mounted across a fingerboard and a resonant body to amplify the vibration of the strings. The tension on each string is adjusted to achieve a desired pitch when the string is played. Particular pitches are produced by clamping down each string at a specific location along the fingerboard, which changes the e↵ective length of the string and, thus, the associated pitch when it is plucked. , which are metallic strips spanning the width of the fingerboard, are usually installed on the fingerboard to exactly specify the location of notes in accordance with an equal tempered division of the octave.

The basic design of the guitar has been augmented in a multitude of ways to satisfy the demands of different performers and musical genres. For example, classical guitars are strung with nylon strings, which can be played with the fingers or nails, and have a wide fingerboard to permit playing scales and chords with minimal interference from adjacent strings. Often a solo instrument, the classical guitar requires a resonant body for amplification, where the size and materials of the body are chosen to achieve a specific timbre. On the other hand, country and folk guitarists prefer steel strings, which generally produce "brighter" tones. Electric guitars are designed to accommodate the demands of guitarists performing rock, blues and jazz music. These guitars are outfitted with electromagnetic pickups in which string vibration induces an electrical current, which can be processed to apply certain effects (e.g. distortion, reverberation) and eventually amplified. The role of the body is less important for electric guitars (although guitarists argue that it affects the instrument's timbre), and the body is generally thinner to increase comfort during performance. When the electric guitar is outfitted with light gauge strings, it facilitates certain techniques such as pitch-bending and vibrato, which are more difficult to perform on acoustic instruments.

Though the guitar can be designed and played in different ways to achieve a vast tonal palette, the underlying physical principles of vibrating strings are constant for each variation of the instrument.

Consequently, a popular topic among musicians and researchers is the development of quantitative guitar models that simulate this behavior. Physical- and physically-inspired models of musical instruments have emerged as a popular methodology for this task. The lure of these models is that they simulate the physical phenomena responsible for sound production in instruments, such as a vibrating string or air in a column, and produce high-quality synthetic tones. Properly calibrating these models, however, remains a difficult task and is an ongoing topic in the literature. Several guitar synthesizers have been developed using physically-inspired models, such as waveguide synthesis and the Karplus-Strong algorithm.

In the last decade, there has been considerable interest in digitally modeling analog guitar components and effects using digital signal processing (DSP) techniques. This work is highly relevant to the consumer electronics industry since it promises low-cost, digital "clones" of vintage, analog equipment. The promise of these devices is to help musicians consolidate their analog equipment into a single device or acquire the specific tones and capabilities of expensive and/or discontinued equipment at lower cost. Examples of products designed using this technology include Line6 modeling guitars and amplifiers, where DSP is used to replicate the sounds of well-known guitars and tube-based amplifiers [45, 46].

Despite the large amount of research focused on digitally modeling the physics of the guitar and its associated effects, there has been relatively little research conducted which analyzes the expressive attributes of guitar performance. The current research is mainly concerned with implementing specific performance techniques into physical models based on detailed physical analysis of the performer-instrument interaction. However, there is a void in the research for guitar modeling and synthesis that is concerned with measuring physical and expressive data from recordings. Obtaining such data is essential for developing an expressive guitar synthesizer; that is, a system that not only faithfully replicates guitar timbres, but is also capable of simulating the expressive intentions used by many guitarists.

1.1 Contributions

This dissertation proposes analysis and synthesis techniques for plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist’s articulation during performance.

Specifically, the expression analyzed through recorded performance focuses on how the articulation was applied through plucking mechanism and strength. The main contributions of this research are summarized as follows:

• Generated a data set of plucked guitar tones comprising variations of the performer's articulation, including the plucking mechanism and strength, which spans all of the guitar's strings and several fretting positions.

• Developed a framework for jointly estimating the source and filter parameters for plucked-guitar tones based on a physically-inspired model.

• Proposed and demonstrated a novel application of principal components analysis to model the source signal for plucked guitar tones to encapsulate characteristics of various string articulations.

• Utilized nonlinear principal components analysis to derive an expressive control space to synthesize excitation signals corresponding to guitar articulations.

The analysis and synthesis techniques proposed here are based on physically inspired models of plucked-guitar tones. These types of models are chosen because they have great potential for analyzing and synthesizing expressive performance: their operation has a strong physical analog to the process of exciting a string; that is, an impulsive force excites a resonant string response.

These advantages are in contrast to other modeling techniques, such as frequency modulation (FM), additive and spectral modeling synthesis, which are often used for music synthesis tasks, but lack easily controlled parameters that relate to how an instrument is excited (e.g. bowing, picking).

Physical models, on the other hand, relate to the initial conditions of a plucked string and possible variations which produce unique tones when applied to the model. This is intuitive, considering guitarists affect the same physical variables when plucking a string.

The proposed method for deriving the parameters relating to expressive guitar performance is based on a joint source-filter estimation framework. The motivation to implement the estimation in a joint source-filter framework is two-fold. Foremost, musical expression results from an interaction between the performer and the instrument, and estimating the expressive attributes of performance requires accounting for the simultaneous variation of source and filter parameters. For the specific case of the guitar, the performer can be seen as imparting an articulation (i.e. excitation) on the string (i.e. filter), which has a resonant response to the performance input. The second reason for this approach is to facilitate the estimation of the source and filter parameters, which is typically accomplished in two separate tasks.

Building off of the joint parameter estimation scheme, component-based analysis is applied to the source (i.e. excitation) signals obtained from recorded performance. Existing modeling techniques treat the excitation signal as a separate entity saved off-line to model a specific articulation, but in doing so provide no mechanism to quantify or manipulate the excitation signal. The application of component analysis is a data-driven, statistical approach used to represent the nuances of specific articulations through linear combinations of component vectors or functions. Using this representation, the articulations can be visualized in the component space and dimensionality reduction is applied to yield an expressive synthesis space that offers control over specific characteristics of the data set.

The proposed guitar modeling techniques presented in this dissertation have many potential applications for music analysis and synthesis tasks. Analyzing the source-filter parameters derived from the recordings of many guitarists could lead to the development of quantitative models of guitar expression and a deeper understanding of expression during performance. The application of the estimated parameters using the proposed techniques can expand upon the sonic and expressive capabilities of current synthesizers, which often rely on MIDI or wavetable samples to replicate the tone with little or no expressive control. During the advent of computer music, limited computational power was a major constraint when implementing synthesis algorithms, but this is now much less of a concern given the capabilities of present-day computers and mobile devices. These advances in technology have provided new avenues for interacting with audio through gesture-based technologies.

The guitar analysis and synthesis techniques presented in this dissertation can be harnessed along with these technologies to create new experiences for musical interaction.

1.2 Overview

As computational modeling for plucked guitars is the basis of this thesis, Chapter 2 overviews various approaches for modeling and synthesizing musical sounds. These approaches include wavetable synthesis, spectral modeling, FM synthesis, physical modeling and source-filter modeling. The strengths and weaknesses of each model are evaluated and, based on our assessment, a recommendation is made to base the techniques proposed in this dissertation on a source-filter approximation of physical guitar models.

Physical and source-filter models are discussed in detail in Chapter 3; these models digitally implement the behavior of a vibrating string due to an external input. The so-called waveguide model, which is based on a digital implementation of the d'Alembert solution for describing traveling waves on a string, is introduced, as well as a source-filter approximation of this model.

Chapter 4 presents an approach for capturing the expression contained in specific string articulations via the source signal from a source-filter model. The physical relation of this source signal to the waveguide model is highlighted and it is suggested that a parametric model can be used to capture the nuances of the articulations. The joint estimation of the source and filter models is proposed by finding parameters that minimize the error between the analyzed recording and the synthetic signal. This constrained least squares problem is solved using convex optimization. The implementation of this approach and results are discussed in Chapter 5.

In Chapter 6, principal components analysis (PCA) is applied to a corpus of excitation signals derived from recorded performance. In this application, PCA models each excitation signal as a linear combination of basis functions, where each function contributes to the expressive attributes of the data. We show that a codebook of relevant basis functions can be extracted which describes particular articulations where the plucking device and strength are varied. Furthermore, using components as features, we show that nonlinear PCA (NLPCA) can be applied for dimensionality reduction, which helps visualize the expressive attributes of the data set. This mapping is reversible, so the reduced dimensional space can be used as an expressive synthesizer using the linear basis functions to reconstruct the excitation signals. This chapter also deals with the pre-processing steps required to remove biases from the recovered signals, including the effect of the guitarist's plucking position along the string.

The conclusions from this dissertation are presented in Chapter 7, which includes the limitations and future avenues to explore.

CHAPTER 2: COMPUTATIONAL GUITAR MODELING

A number of techniques are available for the computational modeling and synthesis of guitar tones, each with entirely different approaches for capturing its acoustic attributes. This chapter will provide an overview of the sound models most commonly applied to guitar tones, including their computational basis, strengths and weaknesses. For detailed treatment of these techniques, the reader is referred to the extensive overviews provided by [10] and [89]. The analysis of each synthesis technique will also be used to justify the source-filter modeling approach used throughout this dissertation.

Finally, this chapter will discuss pertinent applications of computational synthesis of guitar tones.

2.1 Sound Modeling and Synthesis Techniques

2.1.1 Wavetable Synthesis

In many computer music applications, wavetable synthesis is a viable means for synthetically generating musical sounds with low computational overhead. A wavetable is simply a buffer that stores the periodic component of a recorded sound, which can be looped repeatedly. As musical sounds vary in pitch and duration, signal processing techniques are required to modify the synthetic tones from a wavetable sample. Pitch shifting is achieved by interpolating the samples in the wavetable, where reading through the table at a faster or slower rate raises or lowers the pitch, respectively.
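As a concrete illustration of the lookup-and-interpolation process, the following Python sketch (not from the thesis; all names are hypothetical) implements a single-cycle wavetable oscillator. The per-sample phase increment N·f0/fs determines the pitch, and linear interpolation supplies values between stored table entries:

    import numpy as np

    def wavetable_oscillator(table, f0, fs, duration):
        """Loop a single-cycle wavetable at pitch f0 using linear interpolation."""
        N = len(table)
        inc = N * f0 / fs                  # fractional table samples per output sample
        out = np.empty(int(duration * fs))
        phase = 0.0
        for n in range(len(out)):
            i = int(phase)
            frac = phase - i
            # interpolate between adjacent entries; the table wraps at its end
            out[n] = (1.0 - frac) * table[i] + frac * table[(i + 1) % N]
            phase = (phase + inc) % N
        return out

    # Example: a 64-sample sine cycle looped back at 330 Hz
    table = np.sin(2 * np.pi * np.arange(64) / 64)
    tone = wavetable_oscillator(table, f0=330.0, fs=44100, duration=0.5)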

A problem with interpolation in wavetable synthesis is that excessive interpolation of a particular wavetable sample can result in synthetic tones that sound unnatural, since interpolation alters the length of the synthetic signal. To overcome this limitation, multi-sampling is used, where several samples of an instrument are used and these samples span the pitch range of the instrument. Interpolation can now be used between the reference samples without excessive degradation to the synthetic tone, which is preferred to storing every possible pitch the instrument can produce. Multi-sampling can also be used to incorporate different levels of dynamics, or relative loudness, into the system as well. Beyond interpolation, digital filters can be used to adjust the spectral properties

(e.g. brightness) of the wavetable samples as well.

The computational costs of wavetable synthesis are fairly low and the main restriction is the amount of memory available to store samples. The sound quality in these systems can be quite good as long as there is not excessive degradation from modification. However, wavetable synthesis has no true modeling basis (i.e. sinusoidal, source-filter) and is rather “ad-hoc” in its approach. Also, its flexibility in modeling and synthesis is restricted by the samples available to the synthesizer.

2.1.2 FM Synthesis

Frequency modulation (FM) synthesis is often used to simulate characteristics of sounds that cannot be produced with linear time-invariant (LTI) models. An FM oscillator is one such way of achieving these sounds; it operates by modulating the base frequency of one signal with another signal. A simple FM oscillator is given by

y(t) = A_c \sin(2\pi t f_c + \Delta f_c \cos(2\pi t f_m))    (2.1)

where A_c and f_c are the amplitude and frequency of the carrier signal, respectively, f_m is the modulating frequency and Δf_c is the maximum frequency deviation from f_c. The spectrum of the resulting signal y(t) contains a peak located at the carrier frequency and sideband frequencies located at plus and minus integer multiples of f_m. When the ratio of the carrier to the modulating frequency is non-integer, FM synthesis creates an inharmonic spectrum where the frequency spacing between the partials is not constant. This is useful for modeling the spectra of certain musical sounds, such as strings and drums, which exhibit inharmonic behavior.
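For illustration, Equation 2.1 maps directly to a few lines of Python (a sketch with arbitrary parameter values, not a calibrated instrument model). Choosing a non-integer carrier-to-modulator ratio, as below, yields the inharmonic spectra described above:

    import numpy as np

    def fm_tone(fs, duration, Ac, fc, fm, delta_fc):
        """FM oscillator per Equation 2.1:
        y(t) = Ac * sin(2*pi*t*fc + delta_fc * cos(2*pi*t*fm))."""
        t = np.arange(int(fs * duration)) / fs
        return Ac * np.sin(2 * np.pi * fc * t + delta_fc * np.cos(2 * np.pi * fm * t))

    # fc/fm is non-integer, so the sidebands fall at inharmonic spacings
    y = fm_tone(fs=44100, duration=1.0, Ac=0.8, fc=330.0, fm=230.0, delta_fc=200.0)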

FM synthesis is a fairly computationally efficient technique and can be easily implemented on a microprocessor, which makes it attractive for commercially available synthesizers. Due to the nonlinearity of the FM oscillator, for example, it is capable of producing timbres not possible with other synthesis methods. However, there is no automated approach for matching the synthesis parameters to an acoustic recording [8]. Rather, the parameters must be tweaked by trial and error and/or using perceptual evaluation.

2.1.3 Additive Synthesis

Additive, or spectral modeling, synthesis is a sound modeling and synthesis approach based on characterizing the spectra of musical sounds and modeling them appropriately. Sound spectra categories typically consist of harmonic, inharmonic, noise or mixed spectra. Analysis via the additive synthesis approach typically entails performing a short-time analysis on the signal to divide it into relatively short frames where the signal is assumed to be stationary within the frame. In the spectral modeling synthesis technique proposed by Serra and Smith, the sinusoidal, or deterministic, parts of the spectrum within each frame are identified and modeled using amplitude, frequency and phase parameters.

The sound can be re-synthesized by interpolating between the deterministic components of each frame to generate a sum of smooth, time-varying sinusoids. The noise-like, or stochastic, parts of the spectrum can be obtained by subtracting the synthesized, deterministic component from the original signal [68].
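The deterministic half of this procedure can be sketched as follows (an illustration only: it assumes per-frame partial amplitude and frequency tracks are already available, and it omits the stochastic component and the phase matching of the full spectral modeling system):

    import numpy as np

    def resynthesize_partials(frame_amps, frame_freqs, hop, fs):
        """Sum of time-varying sinusoids: frame-wise amplitude and frequency
        tracks (n_frames x n_partials) are linearly interpolated across each
        hop, and phase is accumulated sample by sample."""
        n_frames, n_partials = frame_amps.shape
        t_frames = np.arange(n_frames) * hop           # frame positions, in samples
        t = np.arange((n_frames - 1) * hop)            # output time axis
        out = np.zeros(len(t))
        for k in range(n_partials):
            amp = np.interp(t, t_frames, frame_amps[:, k])
            freq = np.interp(t, t_frames, frame_freqs[:, k])
            phase = 2 * np.pi * np.cumsum(freq) / fs   # accumulated phase
            out += amp * np.sin(phase)
        return out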

There are several benefits to synthesizing musical sounds via additive synthesis. Foremost, the model is very general and can be applied to a wide range of signals including polyphonic audio and speech [50, 68]. Also, the separation of the deterministic and stochastic components permits

flexible modification of signals since the sinusoidal parameters are isolated within the spectrum.

For example, pitch and time-scale modification can be achieved independently or simultaneously by shifting the frequencies of the sinusoids and altering the interpolation time between successive frames. This leads to synthetic tones that sound more natural and can be extended indefinitely, unlike wavetable interpolation.

A problem with additive synthesis is that transient events present in an analyzed signal are often too short to be adequately modeled by sinusoids and must be accounted for separately. This is problematic especially for signals with a percussive "attack" such as plucked strings. It is also unclear how to modify the sinusoids in order to achieve certain effects related to the perceived dynamics of a musical tone.

2.1.4 Source-Filter Modeling

Analysis and synthesis via source-filter models involves using a complex sound source, such as an impulse or periodic impulse train, to excite a resonant filter. The filter includes the important perceptual characteristics of the sound, such as the overall spectral tilt and the formants, or resonances, characteristic to the sound. When such a filter is excited by an impulse train, for example, the resonant filter is "sampled" at regular intervals in the spectrum as defined by the frequency of the impulse train.

Source-filter models are attractive because they permit the automated analysis of the resonant characteristics through either time- or frequency-domain based techniques. One of the most well-known examples of this is linear prediction. Linear prediction entails predicting a sample of a signal based on a linear combination of past samples for that signal

x(n) = \sum_{p=1}^{P} \alpha_p x(n - p)    (2.2)

where α_1, α_2, ..., α_P are the prediction coefficients to be estimated from the recording [60]. When a fairly low prediction order P is used, the prediction coefficients yield an all-pole filter that approximates the spectral shape, including resonances, of the analyzed sound. Computationally efficient techniques, such as the autocorrelation and covariance methods, are available for estimating the

filter parameters as well.
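A minimal numpy sketch of the autocorrelation method (the function names are illustrative, not from the thesis): the Toeplitz normal equations built from the signal's autocorrelation are solved for the coefficients α_p and, anticipating the inverse filtering discussed next, the residual is recovered with the estimated all-pole filter:

    import numpy as np

    def lpc_autocorrelation(x, P):
        """Estimate prediction coefficients alpha_1..alpha_P per Equation 2.2."""
        r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[0], r[1], ...
        R = np.array([[r[abs(i - j)] for j in range(P)] for i in range(P)])
        return np.linalg.solve(R, r[1:P + 1])              # solve R * alpha = r

    def inverse_filter(x, alpha):
        """Recover the source signal: e(n) = x(n) - sum_p alpha_p * x(n - p)."""
        a = np.concatenate(([1.0], -alpha))                # A(z) = 1 - sum alpha_p z^-p
        return np.convolve(x, a)[:len(x)]

    # Usage: alpha = lpc_autocorrelation(signal, P=12)
    #        residual = inverse_filter(signal, alpha)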

A significant advantage of source-filter models is that they approximate musical sounds as the output of a linear time-invariant (LTI) system. Therefore, using the estimated resonant filter, the source signal for the model can be recovered through an inverse filtering operation. Analysis of the recovered source signals provides insight into the expression used to produce the sound for the case of musical instruments. Also, source signals derived from certain signals can be used to excite the resonant filters from others, thus permitting cross-synthesis for generating new and interesting sounds. As will be discussed in Chapter 3, source-filter models have a close relation to physical models of musical instruments.

Despite the advantages of source-filter models, they have certain limitations. Namely, as they are based on LTI models, they cannot model the inherent nonlinearities found in real musical instruments. For example, tension modulation in real strings alters the spectral characteristics in a time-varying manner, while source-filter models have fixed fundamental frequencies.

2.1.5 Physical Modeling

Physical modeling systems aim to model the behavior of systems using physical variables such as force, displacement, velocity and acceleration. Physical systems describing sound can range from musical interactions, such as striking a drum or string, to natural sounds, such as wind and rolling objects. An example physical system for a musical interaction consists of releasing a string from an initial displacement. The solution to this system is discussed extensively in Chapter 3, but involves computing the infinitesimal forces acting on the string as it is released, which results in a set of differential equations describing the motion of the string with respect to time and space. The digital implementation of physical models for sound can be achieved in a number of ways including modal decomposition, digital waveguides and wave digital filters, to name a few [89].

While physical models are capable of high quality synthesis of acoustic instruments, developing models of these systems is often a difficult task. Taking the plucked string as an example, a complete physical description requires knowledge of the string including its material composition and how it interacts with the boundary conditions at its termination points, which includes fricative forces acting on the string as it travels. Furthermore, there may be coupling forces acting between the string and the excitation mechanism (e.g. the player's finger), which should be included as well.

For these reasons, the physical system must be known a priori and it cannot be calibrated directly through audio analysis.

2.2 Summary and Model Recommendation

Table 2.1 summarizes the sound modeling techniques presented above by comparing their modeling domains and the range of musical signals that can be produced using each method. The vertical ordering is indicative of the underlying basis and/or structure of the model types. For example, wavetable synthesis is a rather “ad-hoc” approach without a true computational basis, while FM synthesis is based on modulating sinusoids. Additive synthesis and source-filter models have a strict modeling basis using sinusoids plus noise and source-filter parameters, respectively. Physical models are most closely related to musical instruments since they deal with related physical quantities and interactions. As a model’s parameter domain becomes more general, a greater range of sounds can be synthesized with more control over their properties (i.e. pitch, timbre, articulation).

Based on the discussion in Section 2.1, the strengths and weaknesses of each model are evaluated on a scale (Low, Moderate, High) as they pertain to four categories:

1. Computational complexity required for implementation

2. The resulting sound quality when the model is used for sound synthesis of guitar tones

3. The difficulty required to calibrate the model in accordance with acoustic samples

4. The degree of expressive control afforded by the model

Table 2.1: Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93].

Sound Model   | Parameter Domain                                            | Acoustic Range
Wavetable     | sound samples, manipulation filters                         | discrete pitches, isolated sound events
FM            | carrier and modulating frequencies                          | sounds with harmonic and inharmonic spectra
Additive      | noise sources, time-varying amplitude, frequency and phase | sounds with harmonic, inharmonic, noisy or mixed spectra
Source-Filter | excitation signal, filter parameters                        | voice (speech, singing), plucked-string or struck instruments
Physical      | physical quantities (length, stiffness, position, etc.)    | plucked, struck, bowed or blown instruments

Table 2.2: Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category.

Sound Model   | Computational Complexity | Sound Quality | Calibration Difficulty | Expressive Control
Wavetable     | Low                      | High          | High                   | Low
FM            | Low                      | Moderate      | High                   | Low
Additive      | Moderate                 | High          | Moderate               | Moderate
Source-Filter | Moderate                 | High          | Moderate               | High
Physical      | High                     | High          | High                   | Moderate

Table 2.2 shows the results of this evaluation in accordance with the four categories presented above. The model(s) earning the best evaluation for each category are highlighted in bold face font for emphasis. It should be noticed that, in general, the computational complexity of the models increases in accordance with the associated model parameter domain in Table 2.1. That is, as the parameters become more general, they are more difficult to implement and harder to calibrate.

For truly flexible and expressive algorithmic synthesis, additive, source-filter and physical models offer the best of all categories. While the additive model provides good sound quality and flexible synthesis (especially with regard to pitch and time shifting), the sinusoidal basis does not allow the performer's input to be separated from the instrument's response. Physical models provide this separation, but are difficult to calibrate, especially from a recording, since the physical configuration of the instrument's components and the performer's interaction are generally not known a priori.

Of the remaining models, the source-filter model provides the greatest appeal due to its inherent simplicity, especially as it pertains to modeling the performer's articulation, relative ease of calibration and available expressive control.

2.3 Synthesis Applications

The techniques for modeling plucked-guitar tones presented in this thesis are applicable to a number of sound synthesis tasks. This section will highlight a few such tasks to provide a larger perspective on the benefits of computational guitar modeling.

2.3.1 Synthesis Engines

There are numerous systems available which encompass a variety of computational sound models for the creation of synthetic audio. One such system is CSound, an audio programming language created by Vercoe et al. based on the C language [92]. CSound offers the implementation of several synthesis algorithms, including general filtering operations, additive synthesis and linear prediction. The Synthesis ToolKit (STK) is another system, created by Cook and Scavone, which adopts a hierarchical approach to sound modeling and synthesis using an open-source application programming interface based on C++ [11]. STK handles low-level, core sound synthesis via unit generators, which include envelopes, oscillators and filters. High-level synthesis routines encapsulate physical modeling algorithms for specific musical instruments, FM synthesis, additive synthesis and other routines.

2.3.2 Description and Transmission

Computational modeling of musical instruments, especially the guitar, is highly applicable in systems requiring generalized audio description and transmission. The MPEG-4 standard is perhaps the most well-known codec (compressor-decompressor) for transmission of multimedia data. However, the compression of raw audio, even using the perceptual codec found in mp3, leaves little or no control over the sound at the decoder. To expand the parametric control of compressed audio, the

MPEG-4 standard includes a descriptor for so-called Structured Audio, which permits the encoding, transmission and decoding of audio using highly structured descriptions of sound [21, 66, 93]. The audio descriptors can include high-level performance information for musical sounds, such as pitch, duration, articulation and timbre, and low-level descriptions based on the models (e.g. source-filter, additive synthesis) used to generate the sounds. It should be noted that the structured audio descriptor does not attempt to standardize the model used to parameterize the audio, but provides a means for describing the synthesis method(s), which keeps the standard flexible. The level of description provided by structured audio differentiates it from other formats such as pulse-code modulated audio or mp3, which do not provide contextual descriptions, and MIDI (musical instrument digital interface), which provides contextual description but lacks timbral or expressive descriptors. In essence, structured audio provides a flexible and descriptive "language" for communicating with synthesis engines.

2.3.3 New Music Interfaces

Computer music researchers have long sought to develop new interfaces for musical interaction.

Often, these interfaces deviate from the traditional notion in which an instrument is played in order to appeal to non-musicians or enable entirely new ways of interacting with sound. For the guitar,

Karjalainen et al. developed a "virtual air guitar" where the performer's hands are tracked using motion sensing gloves [26]. The guitar tones are produced algorithmically using waveguide models in response to gestures made by the performer. More recently, commercially available gesture and multitouch technologies have been used for music creation. The limitation of these systems, however, is that their audio engines utilize sample-based synthesizers and provide little or no parametric control over the resulting sound [20, 55].

The plucked-guitar model techniques presented in this dissertation are applicable to each of the sound synthesis areas outlined above. The source and filter parameters extracted from recordings can be used for low bit-rate transmission of audio and are based on algorithms (source-filter) that are either available in many synthesis packages or easily implemented on present-day hardware.

Given the computational power available in present-day computers and mobile devices, the analysis techniques and algorithms presented here can be harnessed into applications for new musical interfaces as well.

CHAPTER 3: PHYSICALLY INSPIRED GUITAR MODELING

3.1 Overview

For the past two decades, physically-inspired modeling systems have emerged as a popular method for simulating plucked-string instruments since they are capable of producing high-quality tones with computationally efficient implementations. The emergence of these techniques was due, in part, to the innovations of the Karplus-Strong algorithm, which simulated plucked-string sounds using a simple and efficient model that was later shown to approximate the physical phenomena of traveling waves on a string [22, 30, 31, 72, 89]. Thus, direct physical modeling of a musical instrument aims to simulate, with a digital model, the behavior of the particular elements responsible for sound production (e.g. a vibrating string or resonant air column) due to the musician's interaction with the instrument (e.g. plucking or breath excitation) [89].
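For reference, the Karplus-Strong algorithm mentioned above is compact enough to sketch in a few lines of Python (a minimal textbook variant with the classic two-point average acting as the loop's lowpass; the names are mine, not from the thesis):

    import numpy as np

    def karplus_strong(f0, fs, duration, seed=0):
        """Plucked-string tone from a noise-initialized delay line with a
        two-point averaging lowpass in the feedback loop."""
        D = int(round(fs / f0))                  # loop delay in samples
        buf = np.random.default_rng(seed).uniform(-1.0, 1.0, D)  # the "pluck"
        out = np.empty(int(fs * duration))
        for n in range(len(out)):
            out[n] = buf[n % D]
            # write back the average of the sample just read and its successor
            buf[n % D] = 0.5 * (buf[n % D] + buf[(n + 1) % D])
        return out

    tone = karplus_strong(f0=330.0, fs=44100, duration=1.0)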

This chapter will briefly overview waveguide techniques for guitar synthesis, which directly model the traveling wave solution resulting from a plucked string. A related model, known as the single delay-loop, is also discussed, which is utilized for the analysis and synthesis tasks presented in this thesis.

3.2 Waveguide Modeling

Directly modeling the complex vibration of guitar strings due to the performer-instrument interaction is a difficult problem. However, by using simplified models of plucked strings, waveguide models offer an intuitive understanding of the vibrating string and lead to practical and efficient implementations [72]. In this section, the well-known traveling wave solution for ideal, plucked strings is presented [33]. This general solution is then discretized and digitally implemented, as shown by Smith, to constitute a digital waveguide model [72]. Common extensions to the waveguide model are also presented, which correspond to non-ideal string conditions.

3.2.1 Solution for the Ideal, Plucked-String

The behavior of a vibrating string is understood by deriving and solving the well-known wave equation for an ideal, lossless string. The full derivation of the wave equation is documented in several physics texts [33, 52] and is obtained by computing the tension differential across a curved section of string with infinitesimal length. This tension is balanced at all times by an inertial restoring force due to the string's transverse acceleration.

The wave equation is expressed as [33]

K_t y'' = \varepsilon \ddot{y}    (3.1)

where K_t and ε are the string's tension and linear mass density, respectively, and y = y(t, x) is the string's transverse displacement at a particular time instant, t, and location along the string, x. The curvature of the string is indicated by y'' = ∂²y(t, x)/∂x² and its transverse acceleration is given by ÿ = ∂²y(t, x)/∂t². The general solution to the wave equation is given by [33]

y(t, x) = y_r(t - x/c) + y_l(t + x/c)    (3.2)

where y_r and y_l are functions that describe the right and left traveling components of the wave, respectively, and c is the wave speed, a constant determined by \sqrt{K_t/\varepsilon}. It should be noted that y_r and y_l are arbitrary functions of the arguments (ct - x) and (ct + x), and it can be verified that substituting any twice-differentiable function with these arguments for y(t, x) will satisfy Equation 3.1 [33, 72].

Equation 3.2 indicates that the wave solution can be represented by two functions, each depending on a time and a spatial variable. This notion becomes clear by analyzing an ideal, plucked string at a few instances after its initial displacement, as shown in Figure 3.1. After the string is released, its total displacement is obtained by summing the amplitudes of the right- and left-traveling wave shapes, which propagate away from the plucking position, along the entire length of the string.

3.2.2 Digital Implementation of the Wave Solution

As demonstrated in Figure 3.1, the traveling wave solution has both time and spatial dependencies, which must be discretized to digitally implement Equation 3.2. Temporal sampling is achieved by employing a change of variable in Equation 3.2 such that t_n = nT_s, where T_s is the audio sampling interval.

Figure 3.1: Traveling wave solution of an ideal string plucked at time t = t_1 and its displacement at subsequent time instances t_2, t_3. The string's displacement (solid) at any position is the summation of the two disturbances (dashed) at that position.

interval. The wave’s position is discretized by setting xm = mX,whereX = cTs, such that the waves are sampled at a fixed spatial interval along the string. Substituting t and x with tn and xm in Equation 3.2 yields [72]:

y (t ,x )=y (t x/c)+y (t + x/c) (3.3) n m r l = y (nT mX/c)+y (nT + mX/c) (3.4) r s l s = y ((n m) T )+y ((n + m) T ) (3.5) r s l s

Since all arguments are multiplied by T_s, it is suppressed and the terms corresponding to the right and left traveling waves can be simplified to [72, 89]:

y^+(n) \triangleq y_r(nT_s), \qquad y^-(n) \triangleq y_l(nT_s)    (3.6)

Smith showed that Equation 3.5 could be schematically realized as the so-called "digital waveguide" model shown in Figure 3.2 [70, 71, 72]. When the upper and lower signal paths, or "rails", of Figure 3.2 are initialized with the values of the string's left and right wave shapes, the traveling wave phenomena in Figure 3.1 and Equation 3.2 are achieved by shifting the transverse displacement values for the wave shapes in the upper and lower rails. For example, during one temporal sampling instance, the right-traveling wave shifts by the amount cT_s along the string, which is equivalent to delaying y^+ by one sample in Figure 3.2. The waveguide model also provides an intuitive understanding of how the traveling waves relate to the string's total displacement, which is obtained by summing the values of y^+ and y^- at a desired spatial sample x = mcT_s.

Figure 3.2: Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y^+) and lower (y^-) signal paths represent the right- and left-traveling disturbances, respectively. The string's displacement is obtained by summing y^+ and y^- at a desired spatial sample.

It should be noted that the values obtained at the sampling instants in the waveguide model are exact, although band-limited interpolation can be used to obtain the displacement between spatial sampling instants if desired [89].
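The equivalence between the shifting delay lines and Equation 3.5 is easy to verify numerically. In the following sketch (an illustration only: Gaussian pulses stand in for arbitrary y_r and y_l), the two rails are shifted one sample per time step and their sum is checked against the closed-form solution at every spatial sample:

    import numpy as np

    # Arbitrary traveling-wave shapes evaluated on an integer grid
    yr = lambda k: np.exp(-0.5 * ((k - 30) / 4.0) ** 2)   # right-traveling
    yl = lambda k: np.exp(-0.5 * ((k + 70) / 6.0) ** 2)   # left-traveling

    M = 128                       # number of spatial samples simulated
    m = np.arange(M)
    upper = yr(-m)                # upper rail holds y+(n - m) at n = 0
    lower = yl(m)                 # lower rail holds y-(n + m) at n = 0

    for n in range(1, 40):
        upper = np.roll(upper, 1)
        upper[0] = yr(n)          # feed y+(n) in at x = 0
        lower = np.roll(lower, -1)
        lower[-1] = yl(n + M - 1) # feed y-(n + M - 1) in at x = (M - 1)X
        # the displacement at every tap matches Equation 3.5
        assert np.allclose(upper + lower, yr(n - m) + yl(n + m))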

3.2.3 Lossy Waveguide Model

The lossless waveguide model in Figure 3.2 clearly represents the phenomena of the traveling wave solution for a plucked string under ideal conditions. However, this model does not capture the behavior of real strings, which are subject to a number of non-ideal effects, such as internal friction and losses due to boundary collisions. In the context of sound synthesis, incorporating these properties is essential for modeling tones that behave naturally both from a physical and perceptual standpoint.

Non-ideal string propagation is hindered by energy losses from internal friction and drag imposed by the surrounding air. If these losses can be modeled as a constant, μ, times the wave's transverse velocity, ẏ, Equation 3.1 can be modified as [72]

K_t y'' = \varepsilon \ddot{y} + \mu \dot{y}    (3.7)

where the additional term, μẏ, incorporates the fricative losses applied to the string in the transverse direction. The solution to Equation 3.7 is the same as Equation 3.1, but with an exponential term that attenuates the right- and left-traveling waves as a function of propagation distance.

Figure 3.3: Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption.

The solution is given by [72]:

y(t, x) = e^{-(\mu/2\varepsilon)x/c} y_r(t - x/c) + e^{(\mu/2\varepsilon)x/c} y_l(t + x/c)    (3.8)

To obtain the lossy waveguide model, Equation 3.8 is discretized by applying the same change of variables that were used to discretize Equation 3.1. This yields a waveguide model with a gain factor,

g = e^{-\mu T_s/2\varepsilon}, inserted after each delay element in the waveguide as shown in Figure 3.3. Thus, a particular point along the right- or left-traveling wave shape is subject to an amplitude attenuation by the amount g as it advances one spatial sample through the waveguide.

By using a linear time-invariant (LTI) assumption, Figure 3.3 can be simplified to reduce the number of delay and gain elements required by the model. For example, if the output of the waveguide is observed at x = (M + 1)X, then the preceding M delay and gain elements can be consolidated into a single delay, z^{-M}, and a single loss factor, g^M. This greatly reduces the complexity of the waveguide model, which is desirable for practical implementations.

3.2.4 Waveguide Boundary Conditions

In practice, the behavior of a vibrating string is determined by boundary conditions at the string's termination points. In the case of the guitar, each string is terminated at the "nut" and "bridge", where the former is located near the guitar's headstock and the latter is mounted on the guitar's saddle. The behavior of the string at these locations depends on several factors, including the string's tensile properties, how it is fastened, and the construction of the bridge and nut. For simplistic modeling, however, it suffices to assume that guitar strings are rigidly terminated such that there is no displacement at these positions.

By assuming rigid terminations for a string of length L, a set of boundary conditions is obtained for solving the wave equation [33]

y(t, 0) = 0, \qquad y(t, L) = 0.   (3.9)

By substituting these conditions into Equation 3.2 and discretizing, the following relations between y^+ and y^- are obtained [72]:

y^+(n) = -y^-(n)   (3.10)
y^+(n - D/2) = -y^-(n + D/2)   (3.11)

In Equation 3.11, D = 2L/X is often referred to as the "loop delay" since it indicates the delay time, in samples, for a point on the right wave shape, for example, to travel from x = 0 to x = L and back along the string. Thus, points located at the same spatial sample on the right and left wave shapes will have the same amplitude displacement every D/2 samples. Viewed another way, D can be calculated as a ratio of the sampling frequency and the string's pitch, which is determined by the string's length,

D = \frac{2L}{X} = \frac{2L}{c T_s} = \frac{2L f_s}{c} = \frac{f_s}{f_0}   (3.12)

where the fundamental frequency, f_0, was substituted based on the wave relationship f_0 = c/2L, where 2L is the wavelength and c is the wave speed.

Figure 3.4 shows the lossy waveguide model with boundary conditions superimposed on a guitar body to illustrate the physical relationship between the model and the instrument. The loss factors due to wave propagation and rigid boundary conditions are consolidated into two filters located at x = 0 and x = L, which correspond to the guitar's bridge and nut positions, respectively. The individual delay elements are merged into two bulk delay lines, each having a length of D/2 samples, which store the shapes of the left- and right-traveling waves at any time during the simulation. Furthermore, this model allows the string's initial conditions to be specified relative to a spatial sample in the delay line that represents the plucking point position. Initializing the waveguide in this way removes the need to explicitly model the coupling effects arising from the interaction between the string and the excitation mechanism [72]. The guitar's output is observed at the "pickup" location by summing the values of the upper and lower delay lines at a desired spatial sample.

Figure 3.4: Plucked-string waveguide model as it corresponds to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement.

The simplistic nature of the waveguide model in Figure 3.4 leads to computationally efficient hardware and software implementations of realistic plucked guitar sounds. Memory requirements are minimal, since only two buffers are required to store the string's initial conditions, and the lossy boundaries can be implemented with simple digital filters. Furthermore, as Smith showed, the contents of the delay lines can be shifted via pointer manipulation to reduce the load on the processor [10, 72]. Karjalainen showed that such techniques enable several string models to be implemented on a single DSP chip whose computational capabilities are eclipsed by present day (2012) microprocessors [25].
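To make the delay-line idea concrete, the following Python sketch renders the model of Figure 3.4 with two buffers for the traveling waves. It is a minimal illustration, not an optimized implementation: the buffers are shifted with np.roll where a real system would advance read/write pointers, and the loss factor, pickup index, and triangular initial shape are illustrative assumptions rather than values from the text.

```python
import numpy as np

def waveguide_pluck(shape, g=0.995, pickup=3, n_out=44100):
    """Bidirectional waveguide with rigid, lossy terminations (a sketch).

    shape  : initial displacement along the string, split across the rails
    g      : lumped loss applied at each reflection (0 < g <= 1)
    pickup : spatial sample where the output is observed
    """
    upper = 0.5 * np.asarray(shape, float)   # right-traveling wave, x = 0..L
    lower = upper.copy()                     # left-traveling wave,  x = 0..L
    out = np.zeros(n_out)
    for n in range(n_out):
        out[n] = upper[pickup] + lower[pickup]   # y(nTs, mX) = y+ + y-
        nut_in = -g * upper[-1]      # right wave inverts/attenuates at nut
        bridge_in = -g * lower[0]    # left wave inverts/attenuates at bridge
        upper = np.roll(upper, 1)    # advance right-traveling rail by one X
        upper[0] = bridge_in
        lower = np.roll(lower, -1)   # advance left-traveling rail by one X
        lower[-1] = nut_in
    return out

# Usage: a triangular pluck on a 330 Hz string at 44.1 kHz
fs, f0 = 44100, 330.0
half = int(round(fs / f0 / 2))               # D/2 spatial samples per rail
tone = waveguide_pluck(np.bartlett(half), g=0.998, n_out=fs)
```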

3.2.5 Extensions to the Waveguide Model

An important extension is providing fractional delay in the waveguide model, since strings are often tuned to pitches whose loop delay is a non-integer number of samples and therefore cannot be obtained as the ratio of the sampling frequency to an integer delay-line length. While certain hardware and software configurations support multiple sampling rates, it is generally undesirable to vary the sampling rate to achieve a particular tuning, especially when synthesizing multiple string tones with different pitches. Instead, Karjalainen proposed adding fractional delay into the waveguide loop via a Lagrange interpolation filter: an FIR filter is computed to add the required fractional delay to precisely tune the waveguide [25].

Smith proposed using all-pass filters to simulate the effects of dispersion in strings, where the string's internal stiffness causes higher frequency components of the wave to travel faster than lower ones. This has the effect of constantly altering the shape of the string. All-pass filters introduce frequency-dependent group delay to simulate this effect [72].

Tolonen et al. incorporate the effects of "pitch glide," or tension modulation, exhibited by real strings using a non-linear waveguide model [79, 80, 91]. At rest, a string exhibits a nominal length and tension. However, as the string is displaced from its equilibrium position, it undergoes elongation, which increases its tension. After release, the tension and, thus, the wave speed constantly fluctuate as the string oscillates about its nominal position. This constant fluctuation means a fixed spatial sampling scheme does not suffice, and the wave must be resampled at each time instant to account for the elongation.

3.3 Analysis and Synthesis Using Source-Filter Approximations

The waveguide model discussed in the previous section provides an intuitive methodology for implementing the traveling wave solution and simulating plucked-string tones. However, accurate re-synthesis of plucked-guitar tones using the waveguide model requires knowledge of the string's initial conditions and loss filters that are correctly calibrated to simulate naturally decaying tones.

The former requirement is a significant limitation since the exact initial conditions of the string are not available from a recorded signal and must be measured during performance, which is often impractical. Therefore, when performance and physical data are unavailable, the utility of the waveguide model is limited for analysis-synthesis tasks, such as characterizing recorded performance.

An alternative model, known as the single delay-loop (SDL), was developed to simplify the waveguide model from a computational standpoint by consolidating the delay lines and loss filters.

The SDL model is also widely used in the literature because it permits the analysis of plucked-guitar tones from a source-filter perspective; that is, an external signal excites a filter to simulate the resonant behavior of a plucked string. Thus, the physical specifications of the guitar and its strings are generally not required to calibrate the SDL model, since linear time-invariant methods can be applied for this task. A number of guitar synthesis systems are based on SDL models [26, 56, 74, 75, 90].

3.3.1 Relation to the Karplus-Strong Model

For a more streamlined structure, the bidirectional waveguide model from Figure 3.4 can be reduced to a single, D-length delay line and a loop filter that consolidates the losses incurred from the bridge and nut [7, 72]. This reduction is shown in Figure 3.5, where the lower delay line is concatenated with the upper delay line at the nut position. The wave shape contained in the lower delay line is inverted to incorporate the reflection at the rigid nut, which has been removed.

Figure 3.5: Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.

The new waveguide structure in Figure 3.5 (right) demonstrates the basic SDL model and is identical to the well-known Karplus-Strong (KS) plucked-string model, whose discovery pre-dated waveguide synthesis techniques [22, 31]. Unlike waveguide techniques, where the excitation is based on wave variables, the KS model works by initializing a D-length delay line with random values and circularly shifting the samples through a loss filter. The random initialization of the delay line simulates the transient noise burst perceived during the attack of plucked-string instruments, though this "excitation" signal has no physical relation to the string, while the feedback loop acts as a comb filter so that only the harmonically-related frequencies are passed. The loss filter, H_l(z), employs low-pass filtering to implement the frequency dependent decay characteristics of real strings, so that high frequency energy dissipates faster than the lower frequencies.
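A minimal sketch of the KS algorithm follows, assuming the classic two-tap averaging loss filter from [31]; the sampling rate, duration, and random-number seed are illustrative choices.

```python
import numpy as np

def karplus_strong(f0, fs=44100, dur=1.0, seed=0):
    """y(n) = 0.5 * (y(n - D) + y(n - D - 1)) with a noise-filled buffer."""
    D = int(round(fs / f0))                   # integer loop delay (Eq. 3.12)
    rng = np.random.default_rng(seed)
    buf = rng.uniform(-1.0, 1.0, D)           # random "excitation"
    out = np.zeros(int(dur * fs))
    prev = 0.0                                # buffer sample read one step earlier
    for n in range(out.size):
        cur = buf[n % D]                      # oldest sample in the loop
        out[n] = cur
        buf[n % D] = 0.5 * (cur + prev)       # two-tap averaging loss filter
        prev = cur
    return out

tone = karplus_strong(330.0)                  # one second of a 330 Hz pluck
```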

3.3.2 Plucked String Synthesis as a Source-Filter Interaction

By modeling plucked-guitar tones with the single delay-loop (SDL), the physical interpretation of traveling wave shapes on a string is no longer as clear as it was for the bidirectional waveguide.

However, Välimäki et al. show that the SDL can be derived from the bidirectional waveguide model by computing a transfer function between the spatial samples representing the plucking position and the output samples [30, 89]. This derivation is still physically valid, though the model's excitation signal is treated as an external input rather than a set of initial conditions describing the string's displacement.

Figure 3.6 shows a complete source-filter model for plucked guitar synthesis based on waveguide modeling principles. The SDL model is contained in the block labeled S(z), which is equivalent to the single delay line structure shown in Figure 3.5, except that the model is driven by an external excitation signal rather than a random initialization as in the Karplus-Strong model. S(z) alone cannot simulate the complete behavior of plucked strings found in the waveguide model. Notably missing is the ability to manipulate the plucking point and pickup positions, both of which are achieved by selecting a desired spatial sample in the waveguide model corresponding to where the string is displaced and where the vibration is observed as the output. Välimäki showed that this functionality could be achieved by adding comb filters before and after the SDL to simulate the effects of the plucking point and pickup positions present in the waveguide model.

Figure 3.6 shows a comb filter C(z) preceding S(z) to simulate the effect of the plucking point position. For simplicity, the input p(n) can be an ideal impulse. The comb filter delay determines when p(n) is reflected, which is analogous to a sample in the digital waveguide model encountering a rigid boundary. The number of samples between the initial and reflected impulses is specified as a fraction of the loop delay, where D indicates the number of samples corresponding to one period of string vibration. Similarly, the comb filter U(z) following S(z) simulates the position of the pickup found on electric guitars. In this filter, the comb filter delay specifies the delay between arriving pulses associated with a relative position along the string. It should be noted that, since each of the blocks in Figure 3.6 is a linear time-invariant (LTI) system, they may be freely interchanged as desired.

3.3.3 SDL Components

Whereas the comb filters in Figure 3.6 specify the initial and output observation conditions for the plucked guitar tone, the SDL filter S(z) is responsible for modeling the string vibration, including its fundamental frequency and decay. As in the case of the bidirectional waveguide, the total "loop delay", D, of the SDL determines the pitch of the resulting guitar tone via Equation 3.12. Since D is typically a non-integer, the fractional delay filter, H_F(z), is used to add the required fractional group delay, while z^{-D_I} provides the bulk, integer delay component of D. All-pass and Lagrange interpolation filters are commonly used for H_F(z), with the latter being especially popular in synthesis systems since it can achieve variable delay for pitch modification without significant transient effects [26, 30]. Additional information pertaining to fractional delay filters is provided in Appendix A.

Figure 3.6: Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.

H_l(z) is the so-called "loop filter" and is responsible for implementing the non-ideal characteristics of real strings, including losses due to wave propagation and terminations at the nut and bridge positions. In the early developments of waveguide synthesis, H_l(z) was chosen as a two-tap averaging filter for simplicity and efficiency [31], but such a low-order FIR filter is often too simplistic to match the magnitude decay characteristics of plucked-guitar tones. In the literature, a first-order IIR filter is often used for H_l(z) and has the form

H_l(z) = \frac{g}{1 - \alpha_0 z^{-1}}   (3.13)

where α_0 and g must be determined for proper calibration [29, 62, 86, 90]. It is useful to analyze the total delay, D, in the SDL as a sum of the delays contributed by each component in the feedback loop,

D = \tau_l + D_F + D_I   (3.14)

where τ_l, D_F, and D_I are the group delays associated with H_l(z), H_F(z), and z^{-D_I}, respectively. Thus, the bulk and fractional delay components should be chosen to compensate for the group delay introduced by the loop filter, which varies as a function of α_0. For spectral-based analysis, the transfer function of the SDL model between the input, p(n), and output, y(n), can be expressed in the z-transform domain as

S(z) = \frac{1}{1 - H_l(z) H_F(z) z^{-D_I}}.   (3.15)

Equation 3.15 can be thought of as a modified linear prediction in which the prediction occurs over D_I samples due to the periodic nature of plucked-guitar tones. The "prediction" coefficients are determined by the coefficients of the loop and fractional delay filters in the feedback loop of S(z).
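As a concrete reading of Equation 3.15, the sketch below synthesizes a tone by driving S(z) with an external excitation. It assumes the first-order loop filter of Equation 3.13, omits the fractional delay filter H_F(z) (so the loop delay is rounded to an integer), and uses placeholder coefficient values.

```python
import numpy as np
from scipy.signal import lfilter

def sdl_synthesize(p, f0, fs=44100, g=0.995, a0=-0.05):
    """Drive the SDL of Eq. 3.15 with excitation p(n), using the one-pole
    loop filter of Eq. 3.13 and an integer loop delay (H_F(z) omitted).

    Clearing the pole of Hl(z) from S(z) = 1 / (1 - Hl(z) z^-D) gives
        y(n) = a0*y(n-1) + g*y(n-D) + p(n) - a0*p(n-1).
    """
    D = int(round(fs / f0))                  # loop delay, Eq. 3.12
    a = np.zeros(D + 1)
    a[0], a[1], a[D] = 1.0, -a0, -g          # denominator 1 - a0 z^-1 - g z^-D
    b = [1.0, -a0]                           # numerator 1 - a0 z^-1
    return lfilter(b, a, p)

# e.g. one second of tone from an ideal impulse excitation:
y = sdl_synthesize(np.r_[1.0, np.zeros(44099)], 330.0)
```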

The SDL model in Figure 3.6 is attractive from an analysis-synthesis perspective since, unlike the bidirectional waveguide model, it does not require specific data about the string during performance (e.g. initial conditions, instrument materials, plucking technique) to faithfully replicate plucked-guitar tones. Rather, the problem becomes properly calibrating the filters from recorded tones via model-based analysis. A significant portion of the literature on plucked-guitar synthesis is dedicated to developing calibration schemes for extracting optimal SDL components [26, 29, 62, 69, 86, 90].

3.3.4 Excitation and Body Modeling via Commuted Synthesis

When using the SDL model for guitar synthesis, the output signal is assumed to be strictly the result of the string’s vibration where the only external forces acting on the string are due to fricative losses.

This assumption does not necessarily hold for real guitars, since the instrument's body acts as a resonant filter that affects the timbre and interacts with the strings via nonlinear coupling. Välimäki et al. describe the acoustic guitar body as a multidimensional resonator, which requires computationally expensive modeling techniques to implement [89].

While an exhaustive review of acoustic body modeling techniques is beyond the current scope, several attempts have been made to reduce the complexity of this task [7, 28, 57]. Measurement of the acoustic guitar body response is typically achieved by striking the resonant body of the instrument with a hammer while the strings are muted. The acoustic radiation is recorded to capture the resonant body modes. In some cases, electro-mechanical actuators are used to excite and measure the resonant body in a controlled manner [63]. Digital implementation of the acoustic body involves designing a filter that captures the resonant modes. This can be achieved using FIR or IIR filters, though precise modeling requires very high order filters. Karjalainen et al. proposed using warped filter models for computationally efficient modeling and synthesis of acoustic guitar bodies. The warped filter is advantageous since its frequency resolution can favor the lower resonant frequency modes, which are perceptually important to capture for re-synthesis, while keeping the required filter orders low enough for efficient synthesis [24]. For "cross-synthesis" applications, Karjalainen et al. introduced a technique to "morph" electric guitar sounds into acoustic tones through equalization of the magnetic pickups found on electric guitars. A filter, which encapsulates the body effects of the acoustic guitar, was then applied to a digital waveguide model of the instrument [27].

Figure 3.7: Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis.

A popular method for dealing with the absent resonant body effects in the SDL model involves so-called commuted synthesis, which was independently developed by Smith and Karjalainen [29, 73]. This technique exploits the commutative property of linear time-invariant (LTI) systems in order to extract an aggregate signal that encapsulates the effects of the resonant body filter and the string excitation, p(n), of the SDL model when the loop filter parameters are known. This approach avoids the computational cost incurred by explicitly modeling the body with a high-order filter.

Figure 3.7 shows the SDL model augmented by inserting excitation and body filters before and after the SDL loop, respectively. The excitation filter is a general LTI block that encapsulates several aspects of synthesis, including "pluck-shaping" filters to model certain dynamics in the articulation and the comb filtering effects from the plucking point and/or pickup locations shown in Figure 3.6. Assuming that S(z) and y(n) are known, the LTI system can be rearranged as

Y(z) = E(z) S(z) B(z)   (3.16)
     = E(z) B(z) S(z)   (3.17)
     = A(z) S(z)   (3.18)

where A(z) is an aggregation of the body and excitation filters. By inverse filtering y(n) in the frequency domain with S(z), the impulse response of A(z) is obtained. Thus, under the LTI assumption, this residual signal contains the additional model components that are unaccounted for by the SDL alone. For practical considerations, Välimäki notes that several hundred milliseconds of the residual signal may be required to capture the perceptually relevant resonances of the acoustic body during resynthesis [90], but for many applications storing this signal is preferable to the cost of explicit body modeling.

It should be noted that even when plucked-guitar tones do not exhibit prominent effects from the resonant body, commuted synthesis is still a valid technique for obtaining the SDL excitation signal, p(n). This is often the case for electric guitar tones, where the output is measured by a transducer and is relatively "dry" compared to an acoustic guitar signal. Also, any excitation signal extracted via commuted synthesis will contain biases from the plucking point and pickup locations unless these phenomena are specifically accounted for in the "excitation filter" block of Figure 3.7.
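Under the same simplifying assumptions as the synthesis sketch above (first-order loop filter, integer loop delay, no H_F(z)), the inverse filtering step of commuted synthesis amounts to swapping the numerator and denominator of the loop's difference equation; the coefficient values remain placeholders.

```python
import numpy as np
from scipy.signal import lfilter

def sdl_inverse_filter(y, f0, fs=44100, g=0.995, a0=-0.05):
    """Recover the aggregate residual of Eq. 3.18 by filtering a recorded
    tone with 1/S(z) (a sketch with an integer loop delay)."""
    D = int(round(fs / f0))
    b = np.zeros(D + 1)
    b[0], b[1], b[D] = 1.0, -a0, -g          # 1 - a0 z^-1 - g z^-D
    a = [1.0, -a0]                           # 1 - a0 z^-1
    return lfilter(b, a, y)
```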

If the plucking point and pickup locations are known with respect to the SDL model, the excitation signal can be "equalized" to remove the biases. Several techniques have been used in the literature to estimate the plucking point location directly from recordings of plucked guitar tones. Traube and Smith developed frequency domain techniques for acoustic guitars [81, 82, 83, 84], while Penttinen et al. employed time-domain analysis to determine the relative plucking position along the string [58, 59].

3.3.5 SDL Loop Filter Estimation

Before the SDL excitation signal can be extracted via commuted synthesis, the loop filter, H_l(z), needs to be calibrated from the recorded tone. This task has been the primary focus in much of the literature, since the loop filter provides the synthesized tones with natural decay characteristics [14, 29, 39, 62, 69, 86, 90]. This section overviews some of the techniques used in the literature.

Early attempts at modeling the loop filter involved using deconvolution in the frequency domain to obtain an estimate of the loop filter's magnitude response. Smith employed various filter design techniques, including autoregressive methods, in order to model the contours of the spectra; however, the measured spectra were subject to amplified noise due to the deconvolution process [69].

Karjalainen introduced a more robust algorithm that extracts magnitude response specifications for the loop filter by analyzing the recorded tone with short-time Fourier transform (STFT) analysis [29]. Phase characteristics of the STFT are not considered in the loop filter design, since the magnitude response is considered to be perceptually more important for plucked-guitar modeling [29, 86].

Lee et al. expand on Karjalainen's STFT-based approach by adapting the so-called Energy Decay Relief (EDR) [40, 64] to model the frequency-dependent attenuation of the waveguide. The EDR was adapted from Jot [23] in order to de-emphasize the effects of beating in the string so that the resulting magnitude trajectories for each partial are strictly monotonic. The EDR at time t and frequency f is computed by summing all the remaining energy at that frequency from t to infinity. Due to the decaying nature of plucked-guitar tones, this leads to a set of monotonically decreasing curves for each partial analyzed.

Example algorithm for Loop Filter Estimation

An example of Karjalainen’s calibration scheme is shown in Figure 3.8 and can be summarized with the following steps:

1. Determine the pitch, f0, of the recorded tone, y(n).

2. Compute the STFT on the plucked tone y(n).

3. For each frame in the STFT, estimate the magnitudes of the harmonically-related partials.

4. Estimate the slope of each partial’s magnitude trajectory across all frames in the STFT.

5. Compute a gain profile, G(f_k), based on the magnitude trajectory of each harmonically related partial.

6. Apply filter design techniques (e.g. least-squares) to determine the parameters of Hl(z) that satisfy the gain profile.

The details of each step in Karjalainen's calibration scheme vary depending on the specific implementation. For example, the number of partials chosen for analysis is typically between 10 and 20. Also, partial-tracking across each frame can be achieved by bandpass filtering techniques when the pitch is known [90].

The gain profile, G(f_k), extracted from the STFT analysis is computed as [29]

G(f_k) = 10^{\lambda_k D / (20 f_{Hop})}   (3.19)

where λ_k is the slope of the k-th partial's magnitude trajectory, D is the "loop delay" in samples and f_{Hop} is the hop size of the STFT analysis. The physical meaning of Equation 3.19 is to determine the amount of attenuation a particular partial of the plucked tone incurs on each pass through the SDL. Thus, Equation 3.19 provides a gain specification for each partial in the STFT that can be used to design a loop filter, H_l(z), with similar magnitude response characteristics.
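The following sketch traces steps 1 through 5 under simplifying assumptions: each partial's trajectory is read from the nearest STFT bin rather than from interpolated spectral peaks, the slope is fit over the entire tone, and the frame and hop sizes are illustrative.

```python
import numpy as np
from scipy.signal import stft

def loop_gain_profile(y, f0, fs, n_partials=10, nfft=2048, hop=512):
    """Per-partial loop gains G(f_k) of Eq. 3.19 from a recorded tone."""
    f, t, Y = stft(y, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    mag_db = 20 * np.log10(np.abs(Y) + 1e-12)
    D = fs / f0                                       # loop delay in samples
    freqs = np.arange(1, n_partials + 1) * f0         # harmonic frequencies
    gains = np.zeros(n_partials)
    for k, fk in enumerate(freqs):
        traj = mag_db[np.argmin(np.abs(f - fk)), :]   # nearest-bin trajectory
        slope = np.polyfit(np.arange(traj.size), traj, 1)[0]  # lambda_k, dB/frame
        gains[k] = 10 ** (slope * D / (20 * hop))     # Eq. 3.19
    return freqs, gains
```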

Filter Design Techniques

Least-squares filter design techniques are typically employed to derive coefficients for the loop filter that satisfy the estimated gain profile [29, 86, 90]. Välimäki et al. utilized a weighted least-squares algorithm to estimate the gain, g, and pole, α_0, of H_l(z) with the transfer function described by Equation 3.13. Since a low-order filter generally cannot match the gain specifications of every partial, the weighted minimization ensures that the magnitudes of the lower, perceptually important partials are more accurately matched to the gain profile [86, 90]. These techniques must ensure that the filter coefficients are constrained for stability, which, for example, requires -1 < α_0 < 0 and 0 < g < 1.
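One hedged realization of such a weighted fit is a brute-force grid search over the stable pole range with a closed-form least-squares gain at each candidate pole; the dedicated weighted least-squares designs in [86, 90] are more refined, but the sketch conveys the idea.

```python
import numpy as np

def fit_loop_filter(freqs, gains, fs, weights=None):
    """Fit g and a0 of Hl(z) = g / (1 - a0 z^-1) to a gain profile."""
    w = np.ones_like(gains) if weights is None else np.asarray(weights)
    omega = 2 * np.pi * np.asarray(freqs) / fs
    best_err, best_g, best_a0 = np.inf, 1.0, -0.01
    for a0 in np.linspace(-0.5, -1e-4, 500):          # stability: -1 < a0 < 0
        h = 1.0 / np.abs(1.0 - a0 * np.exp(-1j * omega))  # |Hl| for g = 1
        g = np.sum(w * h * gains) / np.sum(w * h * h)     # LS-optimal gain
        err = np.sum(w * (g * h - gains) ** 2)
        if err < best_err:
            best_err, best_g, best_a0 = err, g, a0
    return best_g, best_a0

# Weighting the lower partials more heavily, in the spirit of [86, 90]:
# g, a0 = fit_loop_filter(freqs, gains, 44100,
#                         weights=1.0 / np.arange(1, len(gains) + 1))
```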

Erkut and Laurson used Karjalainen's calibration method as the foundation for an iterative scheme based on nonlinear optimization to extract loop filter parameters that best match the amplitude envelope of a recorded tone [14, 39]. The calibration scheme in Figure 3.8 is used to obtain an initial set of loop filter parameters, which are used to resynthesize the plucked signal; an error signal is then computed between the amplitude envelopes of the recorded and synthesized signals. The loop filter parameters are adjusted by a small amount and the process is repeated until a global minimum of the error function is found. While this method has the potential to extract precise model parameters, convergence is not guaranteed and its success depends on the accuracy of the initial parameter estimates.

Figure 3.8: Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal. The stages shown are pitch estimation on the plucked guitar tone y(n) (yielding f_0), STFT and peak detection (yielding the fitted trajectories of the partials), and loop filter design matching the gain profile (yielding g and α_0).

3.4 Extensions to the SDL Model

The SDL model discussed in this chapter simulates plucked strings that vibrate in only the transverse (parallel to the guitar's top plate) direction and behave in accordance with linear time-invariant assumptions. These simplifications prevent modeling additional physical behavior exhibited by guitar strings, described in this section. Real guitar strings vibrate along the axes parallel and perpendicular to the guitar's sound board. The frequency of vibration along each axis is slightly different due to slight differences in the string's length at the bridge and nut terminations. These differences in vibration frequency cause the "beating" phenomenon, in which the sum and difference frequencies are perceived [9]. Furthermore, these vibrations may be coupled at the guitar's bridge termination, which causes a two-stage decay due to the in- and out-of-phase vibration along each axis [43].

In practice, the beating phenomenon is incorporated into synthesis systems by driving two SDL models in parallel, representing string vibration along the transverse and perpendicular axes [30, 26, 86]. From an analysis perspective, it is difficult to simultaneously estimate parameters for both axes from a recording, since pickups measure the total vibration at a particular point on the string. Typically, the parameters for both SDL models are extracted using the methods described in Section 3.3.5, with the exception of slightly mistuning one of the delay lines to simulate the beating effect. In order to estimate the model parameters directly, Riionheimo utilized genetic algorithms to obtain transverse and perpendicular SDL parameters that match recorded signals in a perceptual sense [62]. Alternately, Lee employed a hybrid waveguide-signal approach where the waveguide model is augmented with a resonator bank to implement the beating and two-stage decay phenomena in the lower frequency partials [43].

Modeling the tension modulation in strings necessitates the use of non-linear techniques to model the "pitch-glide" phenomenon [79, 80]. In practice, pitch-glide is simulated by pre-loading a waveguide or SDL model with an initial string displacement and regularly computing the string's slope to determine an elongation parameter. This parameter drives a time-varying delay, representing the wave speed, to reproduce the tension modulation effect. The caveat to this approach, however, is that commuted synthesis cannot be applied to extract an excitation signal from a recorded tone.

For an analysis-synthesis approach, Lee uses a hybrid resonator-waveguide model. The resonator bank is calibrated from a recording to implement pitch-glide in the low-frequency partials since, it is argued, these are perceptually more relevant [42].

CHAPTER 4: SOURCE-FILTER PARAMETER ESTIMATION

4.1 Overview

Despite the vast amount of literature dedicated to developing and calibrating physically inspired guitar models, as discussed in Chapter 3, far less research has been dedicated to estimating expression from recorded performances and incorporating these attributes into the synthesis models. It is well known that guitarists employ a variety of techniques to articulate guitar strings, such as varying the loudness, or dynamics, and the picking device (e.g. finger, pick), which characterizes their playing style. Thus, identifying these playing styles from a performance is essential to developing a system capable of expressive synthesis.

In this chapter, I propose a novel method to capture expressive characteristics of guitar performance from recordings in accordance with the single delay-loop (SDL) model overviewed in Section 3.3. This approach involves jointly estimating the source and filter parameters of the SDL in accordance with a parametric model for the excitation signal, which captures the expressive attributes of guitar performance. Since the SDL is a source-filter abstraction of the waveguide model, this method treats the source signal as the guitarist's string articulation while the filter represents the string's response behavior. The motivation for a joint estimation scheme is to account for the simultaneous variation of source and filter parameters, which characterizes particular playing styles.

Before providing the details of the approach, I briefly overview existing techniques in the literature for modeling expression in guitar synthesis models.

4.2 Background on Expressive Guitar Modeling

Erkut and Laurson present methods to generate plucked tones with different levels of musical dynamics, or relative "loudness", by manipulating a reference excitation signal with a known dynamics level. These methods involve designing pluck-shaping filters that achieve a desired level of musical dynamics when applied to the reference excitation signal [14]. Erkut employs a method that deconvolves a fortissimo (very loud) excitation with forte (loud) and piano (soft) excitations in order to derive their respective pluck-shaping filter coefficients. Laurson used the differences in log-magnitude between two signals with different dynamics, together with autoregressive filter design techniques, to approximate a desired pluck-shaping filter [39]. Both approaches are founded on the argument that a desired level of musical dynamics can be achieved by appropriately filtering a reference excitation signal.

A limitation of this approach, however, is the assumption that the string filter parameters remain constant for all plucking styles, which does not always hold.

Cuzzucoli et al. presented a model for synthesizing guitar expression by considering the finger-string interaction for different plucking styles in classical guitar performance [12]. This work considered two plucking styles: apoyando, where the string is displaced quickly by the finger, and tirando, where the finger slowly displaces the string before releasing it. The effects of these finger-string interactions are incorporated into the waveguide model by modifying the wave equation to incorporate the force exerted on the string depending on the plucking style. For example, in the case of apoyando plucking, the force applied to the string is impulsive, while tirando plucks are characterized by a more gradual change in the string's tension. Cuzzucoli's approach relies on off-line analysis, and no methods are provided for deriving these parameters from a recorded signal.

Though these approaches adequately model expressive intentions, offline analysis is required to compute the model's excitation signal separately from the filter. This separation is counter-intuitive from a musical performance perspective, since it is understood by musicians that expression is, in part, the result of a simultaneous interaction between the performer and instrument.

4.3 Excitation Analysis

The SDL model presented in Section 3.3 assumes that plucked-guitar synthesis can be modeled by a linear and time-invariant system. Accordingly, the model output is the result of a convolution between a source signal p(n), a comb filter C(z) approximating the performer's plucking point position, and the string filter model S(z). For analysis-synthesis tasks, the commuted synthesis technique, as overviewed in Section 3.3.4, is used to compute p_b(n) by inverse filtering the recorded tone, y(n), in the frequency domain with S(z) as shown in Equation 4.1:

P_b(z) = Y(z) S^{-1}(z)   (4.1)

It should be noted that the subscript b on p_b(n) indicates that the excitation signal contains a bias from the performer's plucking point position. Unless the comb filter C(z) from Section 3.3.4 is known, the excitation signal derived from commuted synthesis will always contain this type of bias.

4.3.1 Experiment: Expressive Variation on a Single Note

To determine whether the SDL model can incorporate expressive attributes of guitar performance, excitation signals corresponding to different articulations of the same note on an electric guitar are analyzed by employing commuted synthesis with Equation 4.1. Assuming the string filter parameters are relatively constant for each performance, one might expect the excitation signals to contain the expressive characteristics that distinguish each playing style. Additionally, any similarities observed between the excitations may permit the development of a parametric input model.

To test this hypothesis, recordings of electric guitar performance were analyzed using the following approach. For each plucking style:

1. Vary the relative plucking strength used to excite the string from piano (soft) to forte (loud).

2. Vary the articulation used to excite the string using either a pick or a finger.

3. Calibrate the string filter, S(z), using the methodology described in Section 3.3.5.

4. Extract p_b(n) by inverse filtering the recording, y(n), with S(z).

The tones used for analysis were taken from an electric guitar equipped with a bridge-mounted piezoelectric pickup. These signals are relatively "dry", with negligible effects from the instrument's resonant body, so the recovered excitation signals should primarily indicate the performer's articulation. The bridge-mounted pickup ensures that the output is observed from the same location on the string and that the recovered excitation signal will only contain a bias due to the plucking point effect.

The top panel of Figure 4.1 shows the recorded tones produced by specific articulations applied to the guitar's "open", or unfretted, 1st string, and the corresponding excitation signals obtained using the approach outlined above are shown in the bottom panel. By observation, it is clear that each excitation signal corresponds to the first period of oscillation of its associated signal in the top panel of Figure 4.1, and each has negligible amplitude after this period. This is an intuitive result, since the SDL used for synthesis is tuned to the pitch of the string and its harmonics. By inverse filtering with the SDL, the residual signal is devoid of the periodic and harmonic structure of the recorded tone.

Figure 4.1: Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style. The articulations shown are finger piano, finger forte, pick piano, and pick forte.

The remaining "spikes" in the excitation signal correspond to incident and reflected pulses detected by the pickup after the string is released from displacement (see Section 4.3.2).

Despite the similar contour patterns of the excitation signals in Figure 4.1, there are several distinguishing features related to the perceived differences in timbre. The differences between the amplitudes of overlapping impulses correspond to the relative strength of the articulation used to produce the tone. More interesting, however, are the differences between the tones produced with a pick and those produced with the finger, as the former feature sharper transitions near regions of maximum or minimum amplitude displacement. This observation is correlated with the perceived timbre of each tone, since plucks generated with a pick have a more pronounced "attack" and will excite the high-frequency harmonics of the string.

The common structure of the excitation signals in Figure 4.1 suggests that p_b(n) can be parametrically represented to capture the variations imparted by the guitarist through the applied articulation.

4.3.2 Physicality of the SDL Excitation Signal

The excitation signals in the bottom panel of Figure 4.1 follow the contours of their counterpart plucked signals in the top panel. However, the excitation signal is a short transient event that reduces to residual error after one period of oscillation of the corresponding plucked tone. Essentially, the excitation signal indicates one period of oscillation of the vibrating string measured at a particular position along the string; in this case, the acceleration of the string at the guitar's bridge is the variable observed.

The peaks observed in the excitation signals of Figure 4.1 can be explained by observing the output of a bidirectional waveguide model over one period of oscillation. This is shown in Figure 4.2, where the output at the end of the waveguide representing the guitar's bridge position is traced over time. Initially, the amplitude of the acceleration wave is maximal at the moment the string is released from its initial displacement (Figure 4.2a). After some time, two separate disturbances form and travel in opposite directions along the string (Figure 4.2b). The initial peak in the excitation signal occurs when the right-traveling wave encounters the bridge position (Figure 4.2c). The amplitude of both traveling waves is inverted after reflecting at the boundaries located at the nut and bridge positions. Eventually, the initially left-traveling wave, now with inverted amplitude, encounters the bridge position, forming the second pulse of the excitation signal (Figure 4.2e). After some time, the initial pulse returns and the cycle repeats (Figure 4.2f). As will be discussed in Chapter 6, identifying the pulse locations in the excitation signal can be used to estimate the guitarist's relative plucking position.

Figure 4.2: The output of a waveguide model is observed over one period of oscillation at (a) t = 0 msec, (b) t = 0.56 msec, (c) t = 1.156 msec, (d) t = 2.26 msec, (e) t = 3.37 msec, and (f) t = 5.67 msec. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the 'x' in the top plots) over time.

4.3.3 Parametric Excitation Model

The contour patterns of the excitation signals observed in Figure 4.1 and the simulated waveguide output of Figure 4.2 are consistent with the physical behavior of the vibrating string. This suggests that the variations in the physical behavior of a plucked string due to different articulations can be parametrically represented by capturing the contours of the pulse peaks. Modeling the excitation signal with polynomial segments is a reasonable choice for approximating each contour. By concatenating these polynomial segments together, the excitation signal can be represented by a piecewise function

p_b(n) = c_{1,0} n^0 + c_{1,1} n^1 + \cdots + c_{1,K} n^K + \cdots + c_{J,0} n^0 + c_{J,1} n^1 + \cdots + c_{J,K} n^K   (4.2)

where c_{J,k} is the k-th coefficient of a K-th order polynomial modeling the J-th segment of p_b(n). Therefore, modeling a particular excitation signal requires determining the number of segments, the polynomial degree used to model each segment, and the boundary locations specifying where a particular segment begins and ends.
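A sketch of this parameterization is shown below: it builds the design-matrix columns implied by Equation 4.2, one block of n^0 … n^K columns per segment, active only between that segment's boundaries (the function name and boundary convention are illustrative).

```python
import numpy as np

def excitation_basis(boundaries, K, length):
    """Columns n^0 .. n^K per segment of Eq. 4.2, nonzero only for samples
    n in [boundaries[j], boundaries[j+1])."""
    J = len(boundaries) - 1                   # number of segments
    B = np.zeros((length, J * (K + 1)))
    for j in range(J):
        lo, hi = boundaries[j], boundaries[j + 1]
        n = np.arange(lo, hi, dtype=float)
        for k in range(K + 1):
            B[lo:hi, j * (K + 1) + k] = n ** k
    return B

# pb = excitation_basis([0, 30, 75, 120], K=3, length=120) @ c
# for a stacked coefficient vector c of the c_{j,k}
```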

4.4 Joint Source-Filter Estimation

As shown in Section 4.3.2, the SDL excitation signal reflects one period of oscillation observed at a particular location along the string. It was also shown that these signals differ according to the articulation imparted by the guitarist, and a parametric model was proposed to account for these differences. To model the SDL filter in response to different inputs (i.e. string articulations), this section proposes a joint source-filter approach to simultaneously account for variation in the excitation and string filter parameters. This section details the approach for estimating these parameters by formulating a convex optimization problem.

4.4.1 Error Minimization

Using the SDL model, plucked string synthesis is assumed to result from a convolution between an input signal and a string filter. To estimate these parameters in a joint framework, the error between the excitation model described by Equation 4.2 and the residual signal must be minimized:

e(n) = p_b(n) - \hat{p}_b(n).   (4.3)

Here, p_b(n) is the excitation model from Equation 4.2 and \hat{p}_b(n) is the residual obtained by inverse filtering the output with the string filter. By assuming S(z) is an all-pole filter, e(n) can be expressed in the frequency domain by replacing \hat{p}_b(n) with Y(z) S^{-1}(z) to yield

E(z) = P_b(z) - Y(z) S^{-1}(z)
     = P_b(z) - Y(z) (1 - H_l(z) H_F(z) z^{-D})   (4.4)

where the SDL components discussed in Chapter 3 are used to complete the inverse filtering operation. Making an all-pole assumption on S(z) treats the output of the SDL as a generalized linear prediction problem, where the current output sample y(n) is computed by a linear combination of previous output samples. Due to the periodic nature of the plucked tone, this prediction happens over an interval defined by the loop delay, D.

Since inverse filtering is performed in the time domain, taking the inverse z-transform of E(z) in Equation 4.4 yields

e(n) = p_b(n) - y(n) + \alpha_0 y(n - D) + \alpha_1 y(n - D - 1) + \cdots + \alpha_N y(n - D - N),   (4.5)

where \alpha_0, \alpha_1, \ldots are generalized filter coefficients that are to be estimated. This equation can be rearranged to

where ↵0,↵1,... are generalized filter coecients that are to be estimated. This equation can be rearranged to

e(n)=p (n)+↵ y(n D)+↵ y(n D 1) + + ↵ y(n D N) y(n), (4.6) b 0 1 ··· N

where the unknowns due to the source signal p_b(n) and the filter (\alpha_0, \alpha_1, \ldots) are clearly separated from the recorded tone y(n). This form leads to a convenient matrix formulation, as shown in Equation 4.7.

\begin{bmatrix} e(1) \\ \vdots \\ e(i) \\ e(i+1) \\ \vdots \\ e(m) \end{bmatrix} =
\begin{bmatrix}
1^0 & \cdots & 1^K & 0 & \cdots & 0 & y(1-D) & \cdots & y(1-D-N) \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
i^0 & \cdots & i^K & 0 & \cdots & 0 & y(i-D) & \cdots & y(i-D-N) \\
0 & \cdots & 0 & (i+1)^0 & \cdots & (i+1)^K & y(i+1-D) & \cdots & y(i+1-D-N) \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & m^0 & \cdots & m^K & y(m-D) & \cdots & y(m-D-N)
\end{bmatrix} x
- \begin{bmatrix} y(1) \\ \vdots \\ y(i) \\ y(i+1) \\ \vdots \\ y(m) \end{bmatrix}

e = Hx - y   (4.7)

H contains the time indices corresponding to the boundaries of p_b(n) and the shifted samples of y(n), while the unknown source-filter parameters are contained in a column vector x defined as

x = \begin{bmatrix} c_{1,0} & \cdots & c_{1,K} & \cdots & c_{J,0} & \cdots & c_{J,K} & \alpha_0 & \alpha_1 & \cdots & \alpha_N \end{bmatrix}^T.   (4.8)

Full specification of Equation 4.7 requires determining the number of unknown source and filter parameters. The generalized filter contributes the N + 1 coefficients \alpha_0, \ldots, \alpha_N, while the number of excitation unknowns depends on the piecewise polynomials used to model it: J indicates the number of segments and K is the polynomial order of each segment.
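Building on the excitation_basis helper sketched earlier, the full system matrix of Equation 4.7 can be assembled by appending the delayed copies of the recorded tone; this is again a sketch under the integer-delay assumption, with illustrative names.

```python
import numpy as np

def build_system(y, boundaries, K, D, N):
    """Assemble H of Eq. 4.7: polynomial columns for the source, followed
    by the delayed copies y(n - D - i), i = 0..N, for the filter."""
    m = len(y)
    B = excitation_basis(boundaries, K, m)    # from the previous sketch
    Yd = np.zeros((m, N + 1))
    for i in range(N + 1):
        d = D + i
        Yd[d:, i] = y[:m - d]                 # column of y shifted by D + i
    return np.hstack([B, Yd])                 # so that e = H @ x - y
```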

4.4.2 Convex Optimization

The source-filter parameters are found by identifying the unknowns in x that minimize Equation 4.7. The complexity of this problem is directly related to the number of segments used to parameterize p_b(n) and the order of the generalized filter used to implement the string decay. In general, the number of unknowns is J \times (K + 1) + N + 1.

A common metric for optimizing the estimation of the unknown parameters is the L2-norm of the error term in Equation 4.7, which leads to

\min_x \|e\|^2 = \min_x \|Hx - y\|^2.   (4.9)

Expanding 4.9 yields

\min_x \|Hx - y\|^2 = (Hx - y)^T (Hx - y)
 = x^T H^T H x - 2 y^T H x + y^T y
 = \tfrac{1}{2} x^T F x + g^T x + y^T y   (4.10)

where F = 2H^T H and g^T = -2y^T H. Equation 4.10 is now in the form of a convex optimization problem; in this form, any locally minimal solution must also be a global solution [6].

Before applying a solver to the optimization problem, the constraints on the source-filter parameters in x must be addressed. For example, depending on the structure used for the loop filter, the constraints may specify bounds on the coefficients to yield a stable filter. Specific constraints for the filter models used will be discussed in Sections 5.2 and 5.3. Regardless of the filter structure, the constraints regarding the excitation model are consistent: the segments constituting the excitation should be a smooth concatenation of polynomial functions that are continuous at the boundary locations. As an example, consider an excitation consisting of J = 2 segments, each modeled with a K-order polynomial and sharing a boundary located at n = i. The equality condition ensuring that these segments are continuous can be expressed as

c_{1,0} i^0 + c_{1,1} i^1 + \cdots + c_{1,K} i^K = c_{2,0} i^0 + c_{2,1} i^1 + \cdots + c_{2,K} i^K,

which, in matrix form, is notated as

\begin{bmatrix} i^0 & i^1 & \cdots & i^K & -i^0 & -i^1 & \cdots & -i^K \end{bmatrix}
\begin{bmatrix} c_{1,0} & c_{1,1} & \cdots & c_{1,K} & c_{2,0} & c_{2,1} & \cdots & c_{2,K} \end{bmatrix}^T = 0.

The row vector on the left contains the time indices of the polynomial functions, and the column vector contains the unknown source coefficients. Since the real excitation signals dealt with here consist of more than two segments, an additional equality condition is required for each pair of segments sharing a boundary.

The constraints on the source-filter parameters are specified for the optimization problem via equality and inequality conditions, denoted by A_{eq} and A, respectively. By including these constraints, the optimization problem from Equation 4.10 is expressed as

\min_x f(x) = \tfrac{1}{2} x^T F x + g^T x   (4.11)
subject to  A x \le b
            A_{eq} x = b_{eq}

where the last term of Equation 4.10 is dropped from the objective function f(x) since it is constant with respect to x and does not affect the minimization. In Equation 4.11, b and b_{eq} specify the bounds on the parameters related to the inequality and equality constraint matrices, respectively.

When written in the form of 4.11, Equation 4.9 can be solved using quadratic programming techniques. Several software packages are available for this task, including CVX and the quadprog function in MATLAB's Optimization Toolbox. quadprog employs a "trust region" algorithm, where a gradient approximation is used to evaluate a small neighborhood of possible solutions in x to determine convergence [47]. CVX is also adept at solving quadratic programs, though it formulates the objective function as a second-order cone problem [18]. CVX is the preferred solver for the work in this thesis because the syntax used to specify the quadratic program is identical to the mathematical description of the minimization problem in Equation 4.10.
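For illustration, the same program can be stated almost verbatim in CVXPY, the Python analogue of the CVX package named above; the thesis work itself used CVX under MATLAB, so this translation is a sketch rather than the original implementation.

```python
import cvxpy as cp

def solve_joint(H, y, Aeq, beq, A=None, b=None):
    """Constrained least squares of Eq. 4.11: the statement mirrors
    min ||Hx - y||^2 subject to Ax <= b and Aeq x = beq."""
    x = cp.Variable(H.shape[1])
    constraints = [Aeq @ x == beq]            # segment continuity, etc.
    if A is not None:
        constraints.append(A @ x <= b)        # e.g. filter stability bounds
    cp.Problem(cp.Minimize(cp.sum_squares(H @ x - y)), constraints).solve()
    return x.value
```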

CHAPTER 5: SYSTEM FOR PARAMETER ESTIMATION

Figure 5.1: Proposed system for jointly estimating the source-filter parameters for plucked guitar tones. The recorded tone y(n) passes through coarse onset detection, pitch estimation (yielding f_0), and pitch-synchronous onset detection (yielding the onset and segment boundaries n_0, n_1, ..., n_J), which initialize the least-squares problem ||Hx - y||^2; solving the optimization yields the source-filter parameters x.

This chapter presents the details of the implementation of the joint source-filter estimation scheme proposed in Chapter 4. Figure 5.1 provides a diagram of the proposed system, including the major sub-tasks required for estimating the parameters directly from recordings. Section 5.1 discusses the onset localization of the plucked-guitar signal. This is required to determine the pitch of the tone during the "attack" instant and to localize the indices for the parametric model of the excitation signal. The experiments applying the joint source-filter scheme are presented in Sections 5.2 and 5.3, which include the problem formulation, solution and analysis of the results.

5.1 Onset Localization

To estimate the SDL excitation signal in the joint framework, the physics of a vibrating string fixed at both end points are exploited. When the SDL model is considered without the comb filter effect explicitly accounted for, the excitation signal corresponds to one period of string vibration, which can be identified in the recorded signal. From the physical modeling overview provided in Chapter 3, when the string is released from an initial displacement, two disturbances are produced that travel in opposite directions along the string. These disturbances are measured by the guitar's pickup as impulse-like signals, where the first pulse is incident from the string's initial displacement and the second is inverted by the reflection at the guitar's nut. A simulation of this behavior using acceleration as the wave variable was shown in Section 4.3.2. By identifying these pulses in the initial period of vibration, the portion of the recorded signal corresponding to the excitation signal can be identified.

This section overviews the approach used to identify the boundaries of the excitation within the plucked-guitar signal, which includes locating the incident and reflected pulses. As will be explained in Chapter 6, the spacing of these pulses provides insight on estimating the performer’s relative plucking position along the string. The approach utilizes a two-stage onset detection and is outlined as follows:

1. Employ "coarse" onset detection to determine a rough onset time for the "attack" of the plucked tone.

2. Estimate the pitch of the tone starting from the coarse onset.

3. Using the estimated pitch value, employ pitch-synchronous onset detection to estimate an onset closer to the initial "attack" of the signal.

4. Search for the local minimum and maximum values within the first period of the signal.

5.1.1 Coarse Onset Detection

Onset detection is an important tool used for many tasks in music information retrieval (MIR) systems, such as the identification of performance events in recorded music. For example, on a large scale it may be of interest to identify the beats from a recording of polyphonic music by looking for the drum onsets. For melody detection on a monophonic signal, the onsets must be found to determine when the instrument is actually playing.

A thorough review of onset detection algorithms is provided in [4], which details several sub-tasks of the process, including pre-processing the audio signal, reducing the audio signal to a detection function, and locating the onsets by finding peaks in the detection function. Obtaining a spectral representation of the audio signal is often the initial step in computing a detection function, since the time-varying energy in the spectrum can indicate when certain transient events occur, such as note onsets. The short-time Fourier transform (STFT) provides a time-varying spectral representation and may be computed as:

Y_k(n) = \sum_{m=-N/2}^{N/2-1} y(m) \, w(m - nh) \, e^{-2j\pi mk/N}.   (5.1)

In Equation 5.1, w(m) is an N-point window function and h is the hop size between adjacent windows. The STFT facilitates the computation of several detection functions for onset detection tasks, including spectral flux. For monophonic recordings of instruments with an impulsive attack, such as the guitar, Bello et al. show that spectral flux performs well in identifying onsets [4]. Spectral flux is calculated as the squared distance between successive frames of the STFT,

SF(n) = \sum_{k=-N/2}^{N/2-1} \{ R(|Y_k(n)| - |Y_k(n-1)|) \}^2   (5.2)

where R(x) = (x + |x|)/2 is a rectification function to account for only positive changes in energy while ignoring negative changes.

The "coarse" onset detection is so named because a relatively large window size of N = 2048 samples is used to compute the STFT in Equation 5.1 and the flux in Equation 5.2. The motivation for using such a long window is to identify the "attack" portion of the plucked tone, where the largest energy increase occurs, while ignoring spurious noise preceding the onset. The corresponding detection function is shown in the top panel of Figure 5.3(a), where there is a clear peak. The onset is taken as the time instant two frames prior to the maximum of the detection function.
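A sketch of the coarse detector follows, using SciPy's STFT with the frame size given above; the hop size is an illustrative choice, and summing over the one-sided (non-negative frequency) bins returned by scipy.signal.stft is an implementation convenience relative to the symmetric sum in Equation 5.2.

```python
import numpy as np
from scipy.signal import stft

def coarse_onset(y, fs, nfft=2048, hop=512):
    """Rectified spectral flux (Eq. 5.2) with a long window; the onset is
    taken two frames before the flux maximum, as described in the text."""
    f, t, Y = stft(y, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    mag = np.abs(Y)
    diff = mag[:, 1:] - mag[:, :-1]                 # frame-to-frame change
    flux = np.sum(((diff + np.abs(diff)) / 2.0) ** 2, axis=0)  # R(x), squared
    peak = int(np.argmax(flux)) + 1                 # +1: diff drops frame 0
    return max(peak - 2, 0) * hop / fs              # onset time in seconds
```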

5.1.2 Pitch Estimation

The coarse onset detected in Figure 5.3(a) is still quite far from the "attack" segment of the plucked signal. Searching for the pulse indices too far from the onset of the signal will likely result in false detections, so a closer estimate is required. This is the purpose of pitch-synchronous onset detection. The pitch of the signal is estimated by taking a window of audio equal to three times the STFT frame length, starting from the coarse onset location. Using this window, the pitch is estimated using the well-known autocorrelation function, which is given by

\phi(m) = \frac{1}{N} \sum_{n=0}^{N-1} [y(n + l) w(n)][y(n + l + m) w(n + m)], \quad \text{for } 0 \le m \le N - 1,   (5.3)

where w(n) is a window of length N. Autocorrelation is used extensively for detecting periodicity in signal processing tasks, since it can reveal underlying structure in signals, especially for speech and music. If \phi(m) for a particular signal is periodic with period P, then that signal is also periodic with the same period [61]. The pitch of the plucked signal is estimated by searching for a global maximum in \phi(m) that occurs after the maximum correlation, i.e. the point of zero lag where m = 0. An example autocorrelation plot is provided in Figure 5.2.

Figure 5.2: Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f_0 = 330 Hz.
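A sketch of the autocorrelation-based pitch estimator follows; it applies a single window to the frame (a simplification of Equation 5.3) and restricts the lag search to a plausible pitch range, which is an added practical assumption.

```python
import numpy as np

def estimate_pitch(y, fs, f_min=60.0, f_max=1000.0):
    """Pitch from the autocorrelation peak after zero lag (cf. Eq. 5.3)."""
    yw = y * np.hanning(len(y))                            # windowed frame
    phi = np.correlate(yw, yw, mode='full')[len(y) - 1:]   # lags 0 .. N-1
    lo, hi = int(fs / f_max), int(fs / f_min)              # lag search range
    lag = lo + int(np.argmax(phi[lo:hi]))                  # peak after m = 0
    return fs / lag
```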

5.1.3 Pitch Synchronous Onset Detection

The estimated pitch of the plucked signal is used to recompute the STFT using a frame size equal to half the estimated pitch period, starting from the coarse onset location. The spectral flux is also recomputed using Equation 5.2 and the new frame size. This yields a detection function with much finer time resolution. As an example, the pitch-synchronous onset for a plucked signal is shown in Figure 5.3(b), where the onset is taken as the first locally maximum peak indicated by the detection function. Comparing all the panels of Figure 5.3, it is evident that the two-stage onset detection procedure provides an onset that is sufficiently close to the "attack" portion of the plucked note.


Figure 5.3: Overview of residual onset localization in the plucked-string signal. (a) Coarse onset localization using a threshold based on spectral flux with a large frame size. (b) Pitch-synchronous onset detection using a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c) Plucked-string signal with the coarse and pitch-synchronous onsets overlaid.

5.1.4 Locating the Incident and Reflected Pulse

With the pitch-synchronous onset location, identifying the indices of the incident and reflected pulses is accomplished via a straightforward search for the minimum and maximum peaks within the first period of the signal. This period is known from the previous pitch estimation step. The plucked signal from Figure 5.3 is shown again in detail in Figure 5.4 for emphasis. The indices of the pulses are used as boundaries for fitting polynomial curves to model the excitation signal. It should be noted that a straightforward search for the minima and maxima is sensitive to noise preceding the incident pulse; the pitch-synchronous onset detection is capable of ignoring this noise and yielding an onset closer to the incident pulse location.
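A minimal sketch of the peak search, assuming the period is given in samples:

```python
import numpy as np

def locate_pulses(y, onset, period):
    """Find the incident and reflected pulses as the extreme peaks within
    the first period after the pitch-synchronous onset; the reflected
    pulse has inverted amplitude, so one is a maximum and one a minimum."""
    seg = y[onset:onset + period]
    i_max = onset + int(np.argmax(seg))
    i_min = onset + int(np.argmin(seg))
    incident, reflected = sorted((i_max, i_min))  # incident arrives first
    return incident, reflected
```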


Figure 5.4: Detail view of the “attack” portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked, as well as the incident and reflected pulses from the first period of oscillation.

5.2 Experiment 1

This section presents the application of the joint source-filter estimation scheme proposed in Section 4.4 when the loop filter chosen is a single-pole infinite impulse response (IIR) type. The problem formulation and solution are discussed, as well as the application of the scheme to a corpus of plucked guitar tones.

5.2.1 Formulation

In the literature, the decay rates of the harmonically related partials of plucked-guitar tones are often approximated by a single-pole infinite impulse response (IIR) filter of the form

H_l(z) = \frac{g}{1 - \alpha_0 z^{-1}} .  (5.4)

In this formulation, the pole \alpha_0 is tuned so that the spectral roll-off of the filter's magnitude response approximates the decay rates of the harmonically related partials in the plucked guitar tone. The gain term g in the numerator is tuned to improve the fit.

To estimate this type of filter in the joint source-filter framework, Equation 5.4 is substituted for H_l(z) in the SDL string filter S(z):

S(z) = \frac{1}{1 - H_l(z) H_F(z) z^{-D_I}} = \frac{1 - \alpha_0 z^{-1}}{1 - \alpha_0 z^{-1} - g H_F(z) z^{-D_I}} .  (5.5)

The pole in the numerator of Equation 5.5 poses a problem for the joint source-filter estimation approach because inverse filtering Y(z) with S(z) does not result in an FIR filtering operation. This is problematic because inverse filtering Y(z) and S(z) in the time domain requires previous samples of the excitation signal p_b(n), which is unknown. In practice, we can circumvent this difficulty and still formulate the joint source-filter estimation problem by discarding the numerator of S(z) in Equation 5.5 to yield an all-pole filter. This approximation is justified by a few observations about the source-filter system. First, the magnitude response of S(z), shown in Figure 5.5(c), is dominated by its poles, which create a resonant structure passing frequencies located near the string's harmonically related partials. Examining the values estimated for the loop filter pole \alpha_0 in the literature [14, 39, 86, 90], \alpha_0 is typically very small


Figure 5.5: Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at \alpha_0 = 0.03. The pole-zero and magnitude plots of the full system are shown in (a) and (c); the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d).

(|\alpha_0| \ll 1). As shown in Figure 5.5(a), this places the corresponding zero in the numerator of S(z) close to the origin of the unit circle, giving it a negligible effect on the filter's magnitude response. Figure 5.5(d) shows that the magnitude response of the all-pole approximation is essentially identical to its pole-zero counterpart in Figure 5.5(c).

The next observation is that the model of the excitation signal consists of a short-duration pulse with zero amplitude after the first period of vibration, as discussed in Section 4.3. The non-zero part of the excitation signal pertains to how the string was plucked, while the remaining part is residual error from the string model. By making a zero-input assumption on the excitation signal after the initial period, the recursion from the numerator of S(z) can be ignored without much effect on the estimation.

Taking these observations into account, the numerator of S(z) is discarded and an all-pole approximation is obtained:

\hat{S}(z) = \frac{1}{1 - \alpha_0 z^{-1} - g H_F(z) z^{-D_I}} .  (5.6)

The fractional delay coefficients due to H_F(z) must be addressed before the error minimization between the residual and excitation filter can be formulated (i.e., Equation 4.3). H_F(z) is an Nth-order FIR filter,

H_F(z) = \sum_{n=0}^{N} h(n) z^{-n} ,  (5.7)

where the coefficients for a desired delay can be computed using a number of design techniques.

A consequence of realizing a causal fractional delay filter is that an additional integer delay of \lfloor N/2 \rfloor samples is introduced into the feedback loop of \hat{S}(z). In practice, this can be compensated for, to avoid de-tuning the SDL, by subtracting the added delay of H_F(z) from the bulk delay filter z^{-D_I}, as long as N \ll D_I. The required fractional delay D_F and the bulk delay D_I can be determined from the estimated pitch of the guitar tone discussed in Section 5.1.2, and H_F(z) is computed using the Lagrange interpolation technique overviewed in Appendix A. The error minimization from Equation 4.4 can now be specified for this particular case:

E(z) = P_b(z) - Y(z)\left(1 - \alpha_0 z^{-1} - g(h_0 + h_1 z^{-1} + \cdots + h_N z^{-N}) z^{-D_I}\right).  (5.8)

By expanding Equation 5.8, rearranging terms, and taking the inverse z-transform, the error minimization is expressed in the time domain as

e(n) = p_b(n) + \alpha_0 y(n-1) + \beta_0 y(n - D_I) + \beta_1 y(n - D_I - 1) + \cdots + \beta_N y(n - D_I - N) - y(n),  (5.9)

where \beta_j = g h_j, for j = 0, 1, 2, \ldots, N.

5.2.2 Problem Solution

Using the convex optimization approach presented in Section 4.4.2, minimizing the L2-norm of Equation 5.9 becomes

\min_x \|Hx - y\|_2  (5.10)
subject to 0.001 \le \alpha_0 \le 0.999
0.001 \le \beta_j \le 0.999 \quad \text{for } j = 0, 1, \ldots, N.

The first inequality in the minimization ensures that the estimated loop filter pole \alpha_0 will lie within the unit circle for stability and that the filter has low-pass characteristics. Though \alpha_0 = 0 is a stable solution, the resulting filter would not impose any damping on the frequency response of the loop filter, so 0.001 was chosen as a lower bound on \alpha_0. The second inequality constraint relates to the stability of the overall string filter S(z). If the gain g of the loop filter were permitted to exceed unity, certain frequencies could be amplified, which would result in an unstable string filter response; the product of g with each fractional delay filter coefficient h_j is constrained to avoid this. Each h_j is fixed by the fractional delay filter design, leaving g as the free parameter.

In addition to the inequality constraints, equality constraints were placed on the minimization in Equation 5.10 to enforce continuous excitation boundaries, as discussed in Section 4.4.2. The excitation boundaries were identified using the two-stage onset localization scheme from Section 5.1. While this approach yields 3 segments corresponding to the incident and reflected pulses, it was found that additional segments were needed to adequately model the complex contours of the excitation signal. To reduce the modeling complexity, two equally spaced boundaries were inserted between the incident and reflected pulses, as shown in the top panel of Figure 5.6. Including the boundary after the first period of the signal, this yields a total of 5 boundaries requiring 6 segments to be modeled. 5th-order polynomial functions were found to provide the best approximation of each segment while maintaining feasibility in the optimization problem, since increasing the order also increases the number of unknown variables. Lower-order functions are unable to capture the details of the signal, while higher-order functions generally resulted in the solver failing to converge on a solution.
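As an illustration of the minimization in Equation 5.10, the following sketch uses the Python cvxpy package as a stand-in for the MATLAB CVX package used in this work. The column layout of H and the omission of the excitation-boundary equality constraints are simplifying assumptions.

```python
import cvxpy as cp

def solve_iir_loop_filter(H, y, n_frac):
    """Minimize ||Hx - y||_2 subject to the box constraints of Eq. 5.10.
    Assumed layout: x[0] is the pole alpha_0, x[1:n_frac+2] are the
    beta_j = g*h_j terms, and any remaining entries are the polynomial
    excitation coefficients."""
    x = cp.Variable(H.shape[1])
    constraints = [x[0] >= 0.001, x[0] <= 0.999,
                   x[1:n_frac + 2] >= 0.001, x[1:n_frac + 2] <= 0.999]
    cp.Problem(cp.Minimize(cp.norm(H @ x - y, 2)), constraints).solve()
    return x.value
```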

5.2.3 Results

The source-filter estimation scheme was applied to a corpus of recorded performances of a guitarist exciting each of the 6 strings using various fret positions. Multiple articulations were performed at each position, which included using a finger or pick and altering the dynamics, or relative hardness, of the excitation. Additional details about the data are provided in Section 6.3.

Figure 5.6 demonstrates the analysis and resynthesis for a tone produced by plucking the open 1st string of the guitar. The top panel of Figure 5.6 shows the identification of the boundaries for the excitation signal model within the first period of the recorded tone. The middle panel shows the resynthesized tone and estimated excitation signal using the parameters obtained from the convex optimization. The error computed between the synthetic and recorded tones is shown in the bottom panel of Figure 5.6, along with the error computed between the estimated excitation signal and the residual from inverse filtering. Areas of the error signals with significant amplitude can be attributed to several factors. First, the approximation of the excitation may not capture all the high-frequency details present in the recorded signal. Second, the SDL model has fixed-frequency tuning, whereas the pitch of the recorded tone tends to fluctuate due to changing tension as the string vibrates, which results in misalignment. Finally, the loop filter model assumes that the string's partials decay monotonically over time, even though the decay characteristics of recorded tones are generally more complex. This results in amplitude discrepancies between the analyzed and synthetic signals, which contribute to the error as well.

Figure 5.7 shows that the source-filter estimation approach is capable of estimating the loop filter pertaining to string articulations with varying dynamics. Figures 5.7(a) and 5.7(b) show the amplitude decay characteristics of analyzed and synthesized tones produced with a piano articulation, respectively. In this case, the synthetic tone demonstrates the gradual decay characteristics of its analyzed counterpart. As the articulation dynamics are increased to mezzo-forte, the observed decay is more rapid in both the analyzed and synthetic cases in Figures 5.7(c) and 5.7(d). Finally, Figures 5.7(e) and 5.7(f) show a forte articulation defined by a very rapid decay. In all cases, the synthetic signals constructed from the estimated parameters convey the perceptual characteristics of their analyzed counterparts.

Figure 5.8 shows a similar plot of analyzed and resynthesized signals for various articulations, but focuses on tones produced on a lower-gauge string. In this case, the string's behavior deviates significantly from the SDL model, since the amplitude decay rate fluctuates over time.


Figure 5.6: Analysis and resynthesis of the guitar's 1st string in the “open” position (E4, f0 = 329.63 Hz). Top: original plucked-guitar tone, residual signal, and estimated excitation boundaries. Middle: resynthesized pluck and excitation using estimated source-filter parameters. Bottom: modeling error.

Figure 5.7: Comparison of the amplitude envelopes of synthetic plucked-string tones, produced with the parameters obtained from the joint source-filter algorithm, against their analyzed counterparts: (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte, and forte dynamics.

This is characteristic of tones that exhibit strong beating and tension modulation. Although these behaviors are not captured using the joint estimation approach, the optimization routine identifies loop filter parameters that provide the best overall approximation of the tone's decay characteristics.

Figure 5.8: Comparison of the amplitude envelopes of synthetic plucked-string tones, produced with the parameters obtained from the joint source-filter algorithm, against their analyzed counterparts: (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte, and forte dynamics.

To assess the model “fit” for each signal in the data set, the signal-to-noise ratio (SNR) was computed as

\mathrm{SNR}_{dB} = 10 \log_{10} \left[ \frac{1}{L} \sum_{n=0}^{L} \left( \frac{y(n)}{y(n) - \hat{y}(n)} \right)^2 \right],  (5.11)

where L is the length of the analyzed guitar tone y(n) and \hat{y}(n) is the re-synthesized tone using the parameters from the joint estimation scheme. This metric indicates the average amplitude distortion introduced by the modeling scheme for a particular signal; in the ideal case there is zero amplitude error distorting the signal.
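A direct transcription of Equation 5.11 as a sketch, with a small epsilon added to guard against exact cancellation in the denominator (an implementation choice, not part of the original metric):

```python
import numpy as np

def snr_db(y, y_hat, eps=1e-12):
    """Average squared signal-to-error ratio of Eq. 5.11, in dB."""
    err = y - y_hat
    err = np.where(err == 0, eps, err)  # avoid division by zero
    return 10.0 * np.log10(np.mean((y / err) ** 2))
```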

Table 5.1 summarizes the mean and standard deviation of the SNR computed for particular articulations on each string. For example, the SNR values for all forte plucks produced with the guitarist's finger along the 1st string are computed, and the mean and standard deviation of these values is reported. No distinction is made for different fret positions along a string.

It should be noted that the mean SNR value for a particular dynamic (e.g., forte) corresponding to pick articulations is generally lower than for the same plucking dynamic produced with the guitarist's finger. This can be explained by the action of the plastic pick, which induces rapid frequency excursions in the partials of the string and other nonlinear behaviors such as tension modulation. These effects are prominent near the “attack” portion of the tone, and the associated string decay does not exhibit the monotonically decaying exponential characteristics assumed by the single delay-loop model. The linear time-invariant model cannot capture these complexities of the string vibration, and the estimated loop filter provides a “best fit” to the overall decay characteristics. This leads to a greater amplitude discrepancy between the modeled and analyzed tones and thus a lower SNR value.

For the 3rd string, the SNR values are significantly lower for the pick articulations. A closer inspection revealed that many of these tones exhibited resonant effects from coupling with the guitar's body. This resonance introduces a “hump” in the tone's amplitude decay envelope after the initial attack. Since the string model does not consider the instrument's resonant body, this effect is not accounted for, which leads to increased amplitude error for the affected portions of the signal.

Informal listening tests confirm that the synthetic signals preserve many of the perceptually important characteristics of the original tones, including the transient “attack” portion of the signal related to the guitarist's articulation.

Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                       Pick                                          Finger
String   piano          mezzo-forte     forte          piano          mezzo-forte     forte
1        50.27 ± 1.52   51.92 ± 1.73    52.03 ± 2.12   49.80 ± 2.53   52.70 ± 1.74    54.66 ± 1.51
2        50.23 ± 1.37   50.35 ± 1.19    53.58 ± 2.18   52.10 ± 3.29   55.34 ± 1.39    55.48 ± 1.34
3        48.30 ± 0.99   48.60 ± 1.29    48.85 ± 1.53   50.73 ± 3.86   55.62 ± 3.12    56.36 ± 2.37
4        51.19 ± 1.29   52.11 ± 0.85    51.78 ± 1.98   54.44 ± 2.37   57.06 ± 1.18    56.47 ± 1.30
5        49.80 ± 1.59   50.16 ± 1.80    49.12 ± 1.04   53.63 ± 1.79   56.38 ± 1.53    55.60 ± 1.03
6        51.09 ± 1.23   51.61 ± 1.65    51.98 ± 1.77   53.78 ± 1.84   53.88 ± 1.65    55.09 ± 1.25

Table 5.1: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter.

5.3 Experiment 2

This section investigates the solution of the joint source-filter estimation scheme when a finite impulse response (FIR) filter is used to implement the loop filter. The problem formulation, solution, and results are discussed as well.

5.3.1 Formulation

The Z-transform of a generalized, length-N (order N − 1) FIR filter is given by

H(z) = \sum_{k=0}^{N} h_k z^{-k} ,  (5.12)

where each h_k is an impulse response coefficient of the filter. Using this filter structure for the string model's loop filter, the transfer function of S(z) becomes

S(z) = \frac{1}{1 - H_l(z) H_F(z) z^{-D_I}} .  (5.13)

For the plucked-string system defined by the transfer function of S(z), the output is computed entirely by a linear combination of past output samples once the transient-like excitation has reached a zero-input state. Estimating the filter coefficients through the error minimization technique discussed in Section 4.4.1 becomes complicated, since the loop filter coefficients are convolved with the coefficients of the fractional delay filter H_F(z), which is also modeled as an FIR filter, and the contribution of the loop filter cannot be easily separated. In practice, this difficulty is averted by resampling the recorded signal y(n) to a frequency that can be defined by an integer number of delays determined by the bulk delay term D_I, which allows H_F(z) to be dropped. Though this has the effect of adjusting the frequency of the signal to \hat{f}_0 = f_s / D_I, the fractional delay filter can be re-introduced during synthesis to correct the pitch.

After the resampling operation, the Z-transform of the error minimization becomes

E(z) = P_b(z) - Y(z) S^{-1}(z) = P_b(z) - Y(z)\left(1 - (h_0 + h_1 z^{-1} + \cdots + h_N z^{-N}) z^{-D_I}\right).  (5.14)

Expanding terms and taking the inverse Z-transform of Equation 5.14 yields the time-domain formulation of the error minimization,

e(n) = p_b(n) + h_0 y(n - D_I) + h_1 y(n - D_I - 1) + \cdots + h_N y(n - D_I - N) - y(n),  (5.15)

where the loop filter coefficients h_k can be estimated with the convex optimization approach.

5.3.2 Problem Solution

Before solving for the source and filter parameters, several constraints are imposed on the FIR loop filter. Foremost, the loop filter is required to have a low-pass characteristic to avoid amplifying high-frequency partials. This is consistent with the assumed operation of the loop filter in relation to the behavior of plucked-guitar tones described in Section 3.3.3, where, in general, high-frequency partials are perceived as decaying faster than lower-frequency partials. The next constraint is that the loop filter exhibit a linear phase response, to avoid introducing excessive phase distortion into the frequency response of the string filter S(z). Linear-phase filters also have the convenient property of constant group delay, so they do not drastically de-tune S(z) when the signal is resynthesized.

The low-pass constraints on the FIR filter can be formulated by constraining the magnitude response of the filter at DC and Nyquist. At DC (\omega = 0), the filter gain is required to be \le 1, which yields the following inequality constraint on the filter coefficients:

H(e^{j\omega})\big|_{\omega=0} \le 1
h_0 + h_1 e^{-j0} + h_2 e^{-j0 \cdot 2} + \cdots + h_N e^{-j0 \cdot N} \le 1
h_0 + h_1 + h_2 + \cdots + h_N \le 1.  (5.16)

At the Nyquist frequency (\omega = \pi), we require the filter to have zero magnitude response. This is expressed as an equality constraint on the filter coefficients:

H(e^{j\omega})\big|_{\omega=\pi} = 0
h_0 + h_1 e^{-j\pi} + h_2 e^{-j2\pi} + \cdots + h_N e^{-jN\pi} = 0
h_0 - h_1 + h_2 - \cdots + (-1)^N h_N = 0.  (5.17)

The linear phase constraint requires that the filter coefficients be symmetric. This imposes a final set of equality constraints on the coefficients:

h_k = h_{N-1-k} \quad \text{for } k = 0, \ldots, N.  (5.18)

The process of identifying the boundaries for the segments of the excitation signal is identical to the procedure described in Section 5.2.2, and 5th-order polynomials are again used for segment fitting. Equation 5.19 summarizes the constrained minimization problem after taking the L2-norm of Equation 5.15 and imposing the constraints from Equations 5.16-5.18, in addition to the constraints placed on the input signal as specified in Section 4.4.2.

\min_x \|Hx - y\|_2  (5.19)
subject to \sum_{k=0}^{N} h_k \le 1
\sum_{k=0}^{N} h_k (-1)^k = 0
h_k = h_{N-1-k} \quad \text{for } k = 0, \ldots, N
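A minimal cvxpy sketch of Equation 5.19 follows, again assuming H stacks the loop filter and excitation columns and omitting the excitation-boundary equality constraints:

```python
import numpy as np
import cvxpy as cp

def solve_fir_loop_filter(H, y, N):
    """Eq. 5.19: least squares with a DC gain bound, a Nyquist null,
    and linear-phase (symmetric) coefficients h_0..h_N."""
    x = cp.Variable(H.shape[1])
    h = x[:N + 1]
    signs = np.array([(-1.0) ** k for k in range(N + 1)])
    constraints = [cp.sum(h) <= 1,   # |H(e^{j0})| <= 1   (Eq. 5.16)
                   signs @ h == 0]   # H(e^{j pi}) = 0    (Eq. 5.17)
    constraints += [h[k] == h[N - 1 - k] for k in range(N)]  # symmetry (Eq. 5.18)
    cp.Problem(cp.Minimize(cp.norm(H @ x - y, 2)), constraints).solve()
    return x.value
```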

Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                       Pick                                          Finger
String   piano          mezzo-forte     forte          piano          mezzo-forte     forte
1        50.81 ± 1.61   51.94 ± 1.68    52.03 ± 1.85   49.51 ± 2.77   52.88 ± 1.83    54.77 ± 1.66
2        50.76 ± 1.19   50.68 ± 1.13    52.64 ± 1.93   52.26 ± 3.33   56.03 ± 1.32    55.69 ± 1.29
3        48.78 ± 0.97   48.70 ± 1.20    49.65 ± 1.44   50.89 ± 3.91   56.21 ± 3.48    56.30 ± 2.68
4        51.60 ± 1.05   52.18 ± 0.66    52.32 ± 1.72   54.45 ± 2.16   57.28 ± 2.16    56.45 ± 1.23
5        49.68 ± 1.65   50.10 ± 1.66    49.78 ± 1.92   53.76 ± 2.07   56.48 ± 1.58    55.28 ± 1.05
6        51.30 ± 1.43   51.73 ± 1.51    52.12 ± 1.86   53.92 ± 1.95   54.03 ± 1.84    55.23 ± 1.75

Table 5.2: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using an FIR loop filter with length N = 3.

5.3.3 Results

The source-filter estimation scheme using the FIR loop filter was applied to the same corpus of signals used in Experiment 1, and the MATLAB CVX package was again used to solve the minimization in Equation 5.19. Table 5.2 summarizes the mean and standard deviation of the SNR, computed in the same manner as in Experiment 1 using Equation 5.11. These values were computed by re-synthesizing the plucked-guitar tones using an FIR loop filter with length N = 3.

The values reported in Table 5.2 for this experiment are on par with the values obtained in Experiment 1. That is, the FIR modeling approach exhibits roughly the same average SNR values and trends for different articulations and strings. However, by comparing the synthetic tones produced by the methods of Experiments 1 and 2, we noted that the FIR filter does not always adequately match the decay rates of the high-frequency partials. This yielded synthetic tones that sounded “buzzy”, since the high-frequency partials did not decay fast enough.

We attempted to improve the perceptual qualities of the synthetic tones, to better match their analyzed counterparts, by increasing the length of the FIR loop filter. However, using filters with length N > 3 often resulted in the overall response of the single delay-loop model becoming unstable. Though the FIR loop filter is inherently stable by design, and constraints were placed on the filter at the DC and Nyquist frequencies, the FIR loop filter may occasionally exhibit gains exceeding unity at mid-range frequencies. Since this filter is located in the feedback loop of the single delay-loop model, the overall response is unstable when the excitation signal has energy at those mid-range frequencies.

5.4 Discussion

This chapter presented the implementation details for the joint source-filter estimation scheme proposed in Chapter 4. This included a two-stage onset detection based on a spectral flux computation to estimate the pitch of the plucked tone and identify the location of the incident pulses used to estimate the source signal. The system was implemented using two different loop filter structures, which characterize the frequency-dependent decay of the guitar tones.

The first implementation utilized a one-pole IIR filter to model the string's decay response. The formulation of the joint estimation scheme using this filter required an all-pole approximation of the single delay-loop transfer function. By applying the estimation scheme with this formulation, it was shown that the modeling scheme is capable of capturing the source signals and string decay responses characteristic of the articulations in the data set. The articulations produced with the guitarist's pick led to more complex string responses, and the source-filter estimation method extracts filter parameters that best approximate these characteristics. Modeling error is attributed to the accuracy of the estimated source signal, which may omit some noise-like characteristics, and to the non-ideal decay of real strings, which is generally not monotonic as assumed by the model.

The second implementation utilized an FIR loop filter model, which inherently leads to an all-pole transfer function for the single delay-loop model and is thus more flexible in terms of adding taps to improve the fit. Though a low-order (length N = 3) FIR filter performed similarly to the IIR case in terms of SNR, it did not adequately taper off the high-frequency characteristics of the tones. Increasing the order of this filter led to unstable single delay-loop transfer functions, due to the loop filter gain occasionally exceeding unity. Thus, the IIR loop filter proved to be more robust in terms of stability and provided a better match to the string's decay characteristics for high-frequency partials.

CHAPTER 6: EXCITATION MODELING

6.1 Overview

In Chapter 3, physically inspired models of the guitar were discussed, including the popular waveguide synthesis and the related source-filter models. In particular, the source-filter approximation is attractive for analysis and synthesis tasks because these models provide a clear analog to the physical phenomena involved in exciting a guitar string: an impulsive force from the performer excites the resonant behavior of the string. In Section 4.3, it was shown that analysis via the source-filter approximation can be used to recover excitation signals corresponding to particular string articulations, thereby providing a measure of the performer's expression. In Section 4.4, a technique was proposed to jointly estimate the excitation signal along with the filter model using a piecewise polynomial approximation of the excitation signal, which contains a bias from the performer's relative plucking point position along the string.

Including the method proposed in Section 4.4.1, many techniques are available for estimating and calibrating the resonant filter properties of the source-filter model [29, 36, 86], but less research has been invested in the analysis of the excitation signals, which are responsible for reproducing the unique timbres associated with the performer's articulation. This is a complex problem, since there is a virtually infinite number of ways to pluck a string, each of which will yield a unique excitation (under the source-filter model) even when the tones have a similar timbre. In particular, it is desirable to have methods by which particular articulations can be quantified from analysis of the associated excitation signal. For applications, it would also be desirable to manipulate a parametric representation for arbitrary plucked-string synthesis.

In this chapter, a components analysis approach is applied to a corpus of excitation signals derived from recordings of plucked-guitar tones in order to obtain a quantitative representation of the unique characteristics of guitar articulations. In particular, principal components analysis (PCA) is employed to exploit common features of excitation signals while modeling the finer details using the appropriate principal components. This approach can be viewed as developing a codebook, where the entries are principal component vectors that describe the unique characteristics of the excitation signals. Additionally, these components are used as features for visualization of particular articulations and for dimensionality reduction. Nonlinear PCA is employed to yield a two-way mapping that isolates specific performance attributes, which can be used for synthesizing excitation signals.

This research has several applications, including modeling guitar performance directly from recordings in order to capture expressive and perceptual characteristics of a performer's playing style. Additionally, the codebook entries obtained in this chapter can be applied to musical interfaces for control and synthesis of expressive guitar tones.

6.2 Previous Work on Guitar Source Signal Modeling

Existing excitation modeling techniques are based on either the digital waveguide or the related source-filter models. While both are discussed at length in Chapter 3, the source-filter model and its components are briefly overviewed here to re-introduce notation pertinent to the remainder of the chapter.

Figure 6.1 shows the model obtained when the bi-directional waveguide model is reduced to a source-filter approximation. The lower block, S(z), of Figure 6.1 is referred to as the single delay-loop (SDL) and consolidates the DWG model into a single delay line z^{-D_I} in cascade with a string decay filter H_l(z) and a fractional delay filter H_F(z). These filters are calibrated such that the total delay, D, in the SDL satisfies D = f_s / f_0, where f_s and f_0 are the sampling frequency and fundamental frequency, respectively. H_l(z) is designed using the techniques discussed in Section 3.3.5 [29, 36, 86], while the fractional delay filter can be designed using a number of techniques discussed in Appendix A. The upper block, C(z), of Figure 6.1 is a feedforward comb filter that incorporates the effect of the performer's plucking point position along the string. Since the SDL lacks the bi-directional characteristics of the DWG, C(z) simulates the boundary conditions when a traveling wave encounters a rigid termination. Absent from Figure 6.1 is an additional comb filter modeling the pickup position where the string output is observed. While this affects the resulting excitation signals when commuted synthesis is used for recovery, it is omitted here since the data used for evaluation were collected using a constant pickup position.

While the SDL is essentially a source-filter approximation of the physical system for a plucked string, there are several benefits associated with modeling tones in this manner. For example, modifying the source signal permits arbitrary synthesis of unique tones, even for the same filter model.


Figure 6.1: Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player's plucking position. S(z) models the string's pitch and decay characteristics.

Also, for analysis tasks it is desirable to model the perceptual characteristics of tones from a recorded performance by recovering the source signal using linear filtering operations (see Section 3.3.4 on commuted synthesis), which is possible with a source-filter model.
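As an illustration of the structure in Figure 6.1, the following is a minimal synthesis sketch with a one-pole loop filter H_l(z) = g/(1 - alpha_0 z^{-1}) standing in for a calibrated design, and with the fractional delay filter H_F(z) omitted; all parameter values are illustrative, not the calibrated values used in this work.

```python
import numpy as np
from scipy.signal import lfilter

def sdl_synthesize(p, fs, f0, g=0.99, alpha0=0.05, d_rpp=0.13, n_out=None):
    """Pluck synthesis per Figure 6.1: comb filter C(z) for the plucking
    point, then the single delay-loop S(z) with loop delay D ~= fs/f0."""
    D = int(round(fs / f0))
    n_out = int(n_out or 2 * fs)
    x = np.zeros(n_out)
    x[:len(p)] = p
    # C(z) = 1 - z^{-lambda*D}, with lambda = 1 - d_rpp
    b = np.zeros(int(round((1 - d_rpp) * D)) + 1)
    b[0], b[-1] = 1.0, -1.0
    u = lfilter(b, [1.0], x)
    # S(z) = (1 - alpha0 z^{-1}) / (1 - alpha0 z^{-1} - g z^{-D})
    a = np.zeros(D + 1)
    a[0], a[1] = 1.0, -alpha0
    a[D] -= g
    return lfilter([1.0, -alpha0], a, u)
```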

There are several approaches in the literature for determining the excitation signal for the source-filter model of a plucked guitar. One possible source signal is filtered white noise, which simulates the transient, noise-like characteristics of a plucked string [31]. A well-known technique involves inverse filtering a recorded guitar tone with a properly calibrated string model [29, 36]. When inverse filtering is used, the string model cancels the tone's harmonic components, leaving behind a residual that contains the excitation in the first few milliseconds. In [39], these residuals are processed with “pluck-shaping” filters to simulate the performer's articulation dynamics. For improved reproduction of acoustic guitar tones, this approach is extended by decomposing the tone into its deterministic and stochastic components, separately inverse filtering each signal, and adding the residuals to equalize the spectra of the residual [90]. Other methods utilize non-linear processing to spectrally flatten the recorded tone and use the resulting signal as the source, since it preserves the signal's phase information [38, 41]. Lindroos et al. consider the excitation signal to consist of three parts: the picking noise, the first impulse detected by the pickup, and a second, reflected pulse detected by the pickup at some later time [44]. The picking noise is modeled with low-pass filtered white noise, and the first pulse is modeled with an integrating filter.

Despite the range of modeling techniques described above, these methods are not generalizable to describing a multitude of string articulations. For example, Laurson's approach involves storing the residual signals obtained from inverse-filtering recorded plucks, along with filters that shape a reference residual signal in order to achieve another residual with a particular dynamic level (e.g., piano, forte) [39]. While this approach is capable of “morphing” one residual into another, the relationship between the pluck-shaping filters and the physical effects of modifying plucking dynamics is somewhat arbitrary. Additionally, this method does not remove the bias of the guitarist's plucking point location, which is undesirable since the plucking point should be a free parameter for arbitrary resynthesis. On the other hand, Lee's approach handles this problem by “whitening” the spectrum of the recorded tone to remove the spectral bias. However, this requires preserving the phase information, resulting in a signal equal in duration to the recorded tone, which is not a compact representation of the signal.

6.3 Data Collection Overview

It is understood by guitarists that exactly reproducing a particular articulation on a guitar string is extremely difficult, if not impossible, due to the many degrees of freedom available when exciting the string. These degrees of freedom comprise parts of the guitarist's expressive palette, including:

• Plucking device (e.g. pick, finger, nail)
• Plucking location along the string
• Dynamics (i.e. the relative “hardness” or “softness” during the articulation)

These techniques have a direct impact on the initial shape of the string, yielding perceptually unique timbres, especially during the “attack” phase of the tone. It is important to note that, unlike the waveguide model presented in Chapter 3, the SDL does not allow the initial waveshape to be specified via wave variables (e.g. displacement, acceleration). Instead, signal processing techniques must be used to derive the excitation signals through analysis of recorded tones, and it is initially unclear exactly how to parameterize the effects of the plucking device and dynamics once the signals are recovered. Additionally, a significant amount of data is needed to analyze the effects of these expressive parameters on the resulting excitation signals.

This section details the approach and apparatus used to collect plucked guitar recordings containing the expressive attributes listed above. The recovery of the excitation signals from the data will be explained in Section 6.4.

6.3.1 Approach

The plucked-guitar signals under analysis were produced using an Epiphone Les Paul Standard guitar equipped with a Fishman Powerbridge pickup. A diagram of the Powerbridge pickup is shown in Figure 6.2; it features a piezoelectric sensor mounted on each string's saddle on the bridge [15]. Unlike the magnetic pickups traditionally used for electric guitars, the piezoelectric pickup responds to pressure changes due to the string's vibration at the bridge. For the application of excitation modeling, the piezoelectric pickup has several benefits over magnetic pickups, including the measurement of a relatively “dry” signal that does not include significant resonant effects arising from the instrument's body. Also, magnetic pickups tend to introduce a low-pass filtering effect on the spectra of plucked tones, whereas the piezo pickup records a much wider frequency range, which is useful for modeling the noise-like interaction between the performer's articulation and the string. Finally, recordings produced with the bridge-mounted piezo pickup can be used to isolate the plucking point location for equalization, which will be explained in Section 6.4.2, since the pickup location is constant at the bridge.


Figure 6.2: Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com.

The guitar was strung with a set of D'Addario “10-gauge” nickel-wound strings. The gauge reflects the diameter of the first (highest) string, which is 0.01 inches, while the last (lowest) string has a 0.046-inch diameter. As is common with electric guitar strings, the lowest 3 strings (4-6) feature a wound construction, while the highest 3 (1-3) are unwound. Recordings were made using either the fleshy part of the guitarist's finger or a Dunlop Jazz III pick.

The data set of plucked recordings was produced by varying the articulation across the fretboard of the guitar using either the guitarist's finger or the pick. For each fret, the guitarist produced a specific articulation five consecutive times, for consistency, with both the pick and the finger. The articulations were identified by their dynamic level and consisted of piano (soft), mezzo-forte (medium-loud) and forte (loud). The performer's relative plucking point position along the string was not specified and remained a free parameter during the recordings. The articulations were produced on each of the guitar's six strings using the “open” string position as well as the first five frets, which yielded approximately 1000 plucked-guitar recordings.

The output of the guitar's bridge pickup was fed directly to an M-Audio Fast Track Pro USB interface, which recorded the audio directly to a Macintosh computer. Audacity, an open source sound recording and editing tool, was used to record the samples at a sampling rate of 44.1 kHz with 16-bit depth [49].

Due to the difference in construction between the lower and higher strings on the guitar, the recordings were analyzed in two separate groups reflecting the wound and unwound strings. In terms of the acquisition system, this affects how the signals are resampled in Figure 6.3. For the unwound strings, the signals were resampled to 196 Hz, which corresponds to the tuning of the open 3rd string, the lowest pitch possible on the unwound set. Similarly, the wound strings were resampled to 82.4 Hz, which is the pitch of the open 6th string and the lowest note possible on the wound set.

6.4 Excitation Signal Recovery

On the way to modeling the articulations from recordings of plucked-guitar tones, a few pre-processing tasks must be addressed: 1) estimate the residual signal from the plucked guitar recordings, and 2) remove the bias associated with the guitarist's plucking point position. As discussed in Section 6.2, a limitation of existing excitation modeling methods is that they do not explicitly handle this bias. The system overviewed in Figure 6.3 addresses these tasks, and its various sub-blocks are explained in this section.

6.4.1 Pitch Estimation and Resampling

The initial step of the excitation recovery scheme involves estimating the pitch of the plucked guitar tone. This is achieved using the well-known autocorrelation method, which estimates the pitch over the first 2-3 periods of the signal by searching for the lag corresponding to the maximum of the autocorrelation function (see Section 5.1.2) [61]. The fundamental frequency is computed as f_0 = f_s / \tau_{max}, where f_s is the sampling frequency and \tau_{max} is the lag at the maximum of the autocorrelation function.

Since the plucked-guitar tones under analysis have varying fundamental frequencies, a resampling operation is required to compensate for differences in the pulse width when the residual is recovered. This is a required pre-processing step before principal components analysis, since the goal is to model differences in articulation that are not related to pitch. Otherwise, the extracted basis vectors will not reflect the differences in articulation, but rather the differences between the fundamental periods of the analyzed tones.

The resampling operation on the plucked tone is defined as

\hat{y}(n) = y(\beta n),  (6.1)

where \beta = T_0 / T_{ref} is the resampling factor, and T_{ref} and T_0 indicate the periods, in samples, of the reference frequency and the estimated pitch frequency of the plucked tone, respectively.
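A sketch of this resampling step using scipy, where the reference frequencies are those given in Section 6.3.1:

```python
from scipy.signal import resample

def resample_to_reference(y, f0, f_ref):
    """Stretch the tone so its period T0 = fs/f0 maps to the reference
    period Tref = fs/f_ref (Eq. 6.1); e.g. f_ref = 196.0 for the unwound
    strings, 82.4 for the wound strings."""
    n_new = int(round(len(y) * f0 / f_ref))
    return resample(y, n_new)
```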

6.4.2 Residual Extraction

There are several methods of extracting the residual from the recorded tone. The most generalized approach was discussed in Section 4.3 and involves inverse-filtering the recorded tone with the calibrated string model presented in Section 6.2 to yield the residual excitation p_b(n).

Figure 6.3: Diagram outlining the residual equalization process for excitation signals.


Figure 6.4: “Comb filter” effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from the single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are located at multiples of approximately 382 Hz.

The approach proposed in Chapter 4 outlines an alternate method that jointly estimates the excitation and filter parameters for a plucked guitar tone. It should be noted that the subscript b on p_b(n) indicates that the residual contains a “plucking point bias”, which will eventually be removed.

6.4.3 Spectral Bias from Plucking Point Location

The “Plucking Point Estimation” block in Figure 6.3 is concerned with determining the position at which the guitarist has displaced the string. It is well understood in the literature on string physics and digital waveguide modeling that the plucking point position imparts a comb-filter effect on the spectrum of the vibrating string [17, 30, 64]. This occurs because the harmonics that have a node at the plucking position are not excited and, in the ideal case, have zero amplitude.

Figure 6.4 shows the residual and its spectrum obtained from plucking an open E string (f0 = 331 Hz) approximately 8.4 cm from the bridge of an electric guitar. In Figure 6.4(a), the first spike in the residual results from the impulse produced by the string's initial displacement arriving at the bridge pickup. The subsequent spike also results from the initial string displacement, but has an inverted amplitude due to traveling in the opposite direction along the string and reflecting at the guitar's nut. A detailed description of this behavior is provided in Figure 4.2 in Section 4.3.2. Unlike a pure impulse, which has a flat frequency response, the residual spectrum in Figure 6.4(b) contains deep notches spaced at near-regular frequency intervals. By denoting the relative plucking position along the string as d_{rpp} = l_s / L_s, where l_s is the distance from the bridge and L_s is the length of the string, the notch frequencies can be calculated as

f_{notch,n} = n \frac{f_0}{1 - d_{rpp}}, \quad \text{for } n = 0, 1, 2, \ldots  (6.2)
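Equation 6.2 can be checked numerically against Figure 6.4(b); the 62.9 cm scale length used below is an assumption based on the instrument model, not a value stated in the text.

```python
def notch_frequencies(f0, d_rpp, f_max=5000.0):
    """Notch frequencies of Eq. 6.2, up to f_max."""
    spacing = f0 / (1.0 - d_rpp)
    return [n * spacing for n in range(1, int(f_max / spacing) + 1)]

# f0 = 331 Hz, plucked 8.4 cm from the bridge on an assumed 62.9 cm scale:
# d_rpp = 8.4 / 62.9 ~= 0.134, so the notches fall at multiples of ~382 Hz,
# matching the spacing visible in Figure 6.4(b).
print(notch_frequencies(331.0, 8.4 / 62.9)[:4])
```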

The comb filter bias creates a challenge for parameterizing the excitation signals, since the guitarist's relative plucking position constantly varies depending on the position of the strumming hand and the fretting hand. Even when the guitarist maintains the same plucking distance from the bridge, changing the fretting position along the neck alters the relative plucking position by elongating or shortening the effective length of the string. Because guitarists vary the relative plucking point location, either consciously or subconsciously, during performance, modeling the excitation signal requires estimating the plucking point position and equalizing to remove its spectral bias. Ideally, it is desirable to recover the pure impulsive signal imparted by the guitarist when striking the string, as shown in Figure 6.9, in order to quantify expressive techniques such as plucking mechanism and dynamics. Such analysis requires estimating the plucking point location from recordings and equalizing the residuals to remove the bias.

6.4.4 Estimating the Plucking Point Location

Previous techniques for estimating the plucking point location from guitar recordings have focused on either spectral- or time-domain analysis.

Traube proposed a method of estimating the plucking point location by comparing a sampled-data magnitude spectrum obtained from a recording to synthetic magnitude spectra generated with different plucking point locations [83, 84]. The plucking point location for a particular recording was determined by finding the synthetic string spectrum with a plucking position that minimizes the magnitude error between the measured and ideal spectra.

Later, Traube introduced a plucking-point estimation method based on iterative optimization and the so-called log-correlation, which is computed from recordings of plucked tones [81, 82]. The log-correlation is computed by taking the log of the squared Fourier coefficients of the harmonically related partials in a plucked-guitar spectrum and applying the inverse Fourier transform to these coefficients. The log-correlation function yields an initial estimate for the relative plucking position, d_{rpp} = \tau_{min} / \tau_0, where \tau_{min} and \tau_0 are the lags indicating the minimum and maximum of the log-correlation function, respectively. The estimate of d_{rpp} is used to initialize an iterative optimization scheme, which minimizes the difference between ideal and measured spectra, in order to refine d_{rpp} and improve accuracy.

Penttinen et al. exploited time-domain analysis techniques to estimate the plucking position [58, 59]. Using an under-saddle bridge pickup, Penttinen's technique is based on identifying the impulses associated with the string's initial displacement as they arrive at the bridge pickup. Since the initial string displacement produces two impulses traveling in opposite directions, the arrival time between the impulses at the bridge, \Delta t, provides an indication of the guitarist's relative plucking position along the string.

Figure 6.5 shows the output of a bridge-mounted piezoelectric pickup for a plucked-guitar tone. By determining the onsets when each pulse arrives at the bridge pickup, Penttinen shows that the relative plucking position can be determined by

d_{rpp} = \frac{f_s - T f_0}{f_s},  (6.3)

where T = f_s \Delta t indicates the number of samples between the arrival of each impulse at the bridge pickup [58, 59]. As d_{rpp} lies in the range (0, 1), the actual distance from the bridge is obtained by multiplying d_{rpp} by the length of the string. Penttinen utilizes a two-stage onset detection to determine T, where the first stage isolates the onset of the plucked tone and the second stage uses the estimated pitch of the tone to extract one period of the waveform. The autocorrelation of the extracted period is used to determine T, since the minimum of the autocorrelation function occurs at the lag where the signal's impulses are out of phase. Figure 6.6(a) shows one cycle extracted from the waveform in Figure 6.5, and Figure 6.6(b) shows the corresponding autocorrelation of that signal. \Delta t is identified by searching for the index corresponding to the minimum of the autocorrelation function.
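A sketch of the autocorrelation step of this method, assuming one period of the waveform has already been extracted:

```python
import numpy as np

def estimate_drpp(period, fs, f0):
    """Eq. 6.3: T is the lag at the autocorrelation minimum of one period,
    where the incident and reflected pulses are out of phase."""
    seg = np.asarray(period, dtype=float)
    n = len(seg)
    phi = np.correlate(seg, seg, mode="full")[n - 1:]  # lags 0 .. n-1
    T = int(np.argmin(phi))
    return (fs - T * f0) / fs
```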

There are several strengths and weaknesses associated with the methods proposed by Traube and Penttinen. Traube's approach is generalizable to acoustic guitar tones recorded using an external microphone. However, a relatively large time window, on the order of 100 milliseconds, is required to achieve the frequency resolution needed to resolve the string's harmonically related partials and, thus, compute the autocorrelation function. By including multiple periods of string vibration in the analysis, the effect of the plucking position can become obscured, since non-linear coupling of the string's harmonics can regenerate the missing harmonics [16]. By isolating just one period of the waveform near the onset, Penttinen's technique avoids this physical consequence, since the analyzed segment results from the string's initial displacement.


Figure 6.5: Plucked-guitar tone measured using a piezoelectric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup; \Delta t indicates the arrival time between impulses.


Figure 6.6: (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, \Delta t, between arriving pulses at the bridge pickup.

However, Penttinen's approach requires the guitar to be equipped with a bridge-mounted pickup to isolate the arrival times of the impulses in the first period of vibration. Also, isolating the first period of vibration is difficult, and success depends on the parameters used in the two-stage onset detection.

Handling the effect of a string pickup located at a position other than the bridge is not explicitly addressed by either method. Similar to the spectral bias resulting from the plucking point location, the pickup location also adds a spectral bias, since vibrating modes of the string with a node at the pickup location will not be measured. Traube's methods were developed for the acoustic guitar recorded with a microphone some distance from the instrument's sound hole. In this case, the “pickup” is the radiated acoustic energy from all positions along the string and thus shows no particular spectral bias. For electric guitars, if a bridge-mounted pickup is not available, determining the plucking location is particularly difficult due to the lack of consistency in where the pickups are placed on the instrument and in how many are used. The former constraint makes it difficult to determine which impulse (i.e. the left-traveling or right-traveling pulse) is being measured at the output, and the latter complicates the problem, since some guitars “blend” the signal from two or more pickups.

6.4.5 Equalization: Removing the Spectral Bias

The next step in the excitation acquisition scheme is to remove the comb filter bias associated with the plucking point position. In Figure 6.3, the “Residual Equalization” block handles this task.

The equalization begins by obtaining an estimate of the relative plucking-point location d_{rpp} along the string. Since the signals under analysis were recorded with a bridge-mounted pickup, Penttinen's autocorrelation-based technique was chosen to estimate d_{rpp}. The two-stage onset detection approach presented in Section 5.1 was used to identify the incident and reflected pulses during the initial period of vibration. d_{rpp} is then used to formulate a comb filter that approximates the notches in the spectrum of the residual:

H_{cf}(z) = 1 - \mu z^{-\lfloor \lambda D \rfloor},  (6.4)

where \lambda = 1 - d_{rpp}, and D = f_s / f_0 is the “loop delay” of the digital waveguide model, which determines the pitch of the string [74]. \lfloor \lambda D \rfloor denotes the greatest integer less than or equal to the product \lambda D. \mu is a gain factor applied to the delayed signal, which determines how deep the magnitude notches are at the notch frequencies of the spectrum; \mu values closer to 1 lead to deeper notches [76]. Intuitively, Equation 6.4 specifies the number of samples, as a fraction of the total loop delay, between the arrival of each impulse at the bridge.

The basic comb filter structure of Equation 6.4 and Figure 6.7(a) provides a good approximation of the spectral nulls associated with the plucking point position. However, it is limited to sample-level accuracy, which may not adequately approximate the true notch frequencies in the spectrum. For more precise localization, a fractional delay filter is inserted into the feed-forward path to provide the required non-integer delay, as shown in Figure 6.7(b) [88].


Figure 6.7: Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with fractional delay filter added to the feedforward path to implement non-integer delay.

Thus, the resulting fractional delay comb filter has the form

H_{cf}(z) = 1 - \mu F(z) z^{-\lfloor \lambda D \rfloor},  (6.5)

where F(z) provides the fractional precision lost by rounding the product \lambda D. F(z) can be designed using several techniques available in the literature, including all-pass filters and FIR Lagrange interpolation filters, as discussed in Appendix A.

Using the comb filter structure from Equation 6.4 or 6.5, p_b(n) can be equalized by inverse filtering:

P(z) = \frac{P_b(z)}{H_{cf}(z)}.  (6.6)
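A sketch of the inverse filtering step using the integer-delay comb of Equation 6.4 (the fractional-delay refinement of Equation 6.5 is omitted here); dividing by the FIR comb yields an all-pole IIR filter, which is stable because \mu < 1:

```python
import numpy as np
from scipy.signal import lfilter

def equalize_residual(pb, fs, f0, d_rpp, mu=0.95):
    """Remove the plucking point bias (Eq. 6.6) by inverse comb filtering."""
    lag = int(np.floor((1.0 - d_rpp) * fs / f0))  # floor(lambda * D)
    h_cf = np.zeros(lag + 1)
    h_cf[0], h_cf[lag] = 1.0, -mu                 # H_cf(z) = 1 - mu z^{-lag}
    return lfilter([1.0], h_cf, pb)               # P(z) = P_b(z) / H_cf(z)
```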

Figure 6.8 demonstrates the effects of equalizing the residual in both the time and frequency domains. Figures 6.8(a) and 6.8(b) show the time- and spectral-domain plots, respectively, of the residual obtained from a plucked-guitar tone. Figure 6.8(b) also plots the frequency response of the estimated comb filter, which approximates the deep notches found in the residual. A 5th-order fractional delay was used for the comb filter, and a value of 0.95 was used for the gain term \mu; this value was found to provide the closest approximation of the spectral notches for the signals in the dataset. Figures 6.8(c) and 6.8(d) show the time- and spectral-domain plots when the residual is equalized by inverse filtering. In the spectral domain, inverse comb filtering yields a magnitude spectrum that is relatively free of the deep notches seen in Figure 6.8(b). In the time-domain plot of Figure 6.8(c), this translates into a signal that is much closer to a pure impulse.

Figure 6.8: Spectral equalization of a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz). (a) Residual. (b) Residual spectrum and comb filter approximation. (c) Residual with bias removed. (d) Original and equalized spectra using the inverse comb filter.

6.4.6 Residual Alignment

After equalization, the final step is to align the processed excitation signals with a reference excitation signal. This ensures that the impulse “peak” of each signal is aligned in the time domain to avoid errors in the principal components analysis. In practice, this is accomplished by copying the reference and processed signals and cubing them, which suppresses the amplitudes of the samples relative to the primary peak. The cross correlation is then computed between each signal and the reference pulse, and the lag of maximum correlation indicates the shift needed to align each signal with the reference pulse.
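A minimal sketch of this alignment step, following the cube-then-correlate procedure described above (the function name is illustrative, and np.roll performs a circular shift, a simplification that is adequate for pulses with near-zero tails):

import numpy as np

def align_to_reference(signal, reference):
    """Align the main pulse of `signal` with that of `reference`."""
    # Cubing preserves sign while suppressing low-amplitude samples
    # relative to the dominant peak, sharpening the correlation maximum.
    s3, r3 = signal**3, reference**3
    xcorr = np.correlate(s3, r3, mode="full")
    lag = int(np.argmax(xcorr)) - (len(reference) - 1)  # lag of max correlation
    return np.roll(signal, -lag)   # circular shift; fine for zero-tailed pulses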

For excitation signal modeling and parameterization, the residual equalization scheme has several benefits. From an intuitive standpoint, the impulse-like signals obtained from equalization are more indicative of the performer’s string articulation. Also, signals in this form are simpler to model and therefore better suited to parameterization. Finally, removing the plucking point bias allows the relative plucking point location to remain a free parameter for synthesis applications.

6.5 Component-based Analysis of Excitation Signals

6.5.1 Analysis of Recovered Excitation Signals

By applying the excitation recovery and equalization scheme of the previous section to the corpus of recordings gathered in Section 6.3, analysis of the recovered signals provides insight into the similarities and differences of excitation signals corresponding to various string articulations.


Figure 6.9: Excitation signals corresponding to strings excited using a pick (a) and finger (b).

Figures 6.9 (a) and (b) show overlaid excitation signals obtained from plucked-guitar tones produced using either a plastic pick (a) or the player’s finger (b). For both finger and pick articulations, the dynamics of the pluck consisted of piano (soft), mezzo-forte (moderately loud) and forte (loud). These plots show a common, impulse-like contour with additional high-frequency characteristics depending on the dynamics used. Comparing Figures 6.9 (a) and (b), it is evident that the signals corresponding to finger articulations are generally wider, whereas the pick excitation signals are narrower and closer to an ideal impulse.

Figure 6.10 plots the average magnitude spectrum for each type of articulation in the data set.


Figure 6.10: Average magnitude spectra of signals produced with pick (a) and finger (b).

For each type of articulation (finger or pick), increasing the relative dynamics from piano to forte results in increased high-frequency spectral energy. An interesting observation is that piano-finger articulations show a significant high-frequency ripple. This may be attributed to the deliberately slower plucking action used to produce these articulations, where the string slides more slowly off the player’s finger. When these signals are used to re-synthesize plucked-guitar tones, they often have a qualitative association with the perceived timbre of the resulting tones. Descriptors such as “brightness” are often used to describe the timbre, which generally increases with the dynamics of the articulations. The varying energy in the plots of Figure 6.10 provides quantitative support for this observation.

6.5.2 Towards an Excitation Codebook

Based on the observations of Figures 6.9 and 6.10, we propose a data-driven approach for modeling excitation signals using principal components analysis (PCA). Employing PCA is motivated by the similar, impulse-like structure of the excitation signals shown in Figure 6.9. As discussed, the fine differences between the derived excitation signals can be attributed to the guitarist’s articulation and account, in part, for the spectral characteristics of the perceived tones.

These differences can be modeled using a linear combination of basis vectors to provide the desired spectral characteristics. The results of this analysis will be used to develop a codebook that consists of the essential components required to accurately synthesize a multitude of articulation signals. At present, PCA has not yet been applied to modeling the excitation signals for source-filter models of plucked-string instruments. However, PCA has been applied to speech coding applications, in which principal components are used to model the voice source, including the complex interactions between the vocal tract and glottis [19, 51].

This section presents the application of PCA to the data set and the development of an excitation codebook using the basis vectors. The re-synthesis of excitation signals corresponding to particular string articulations will also be presented.

6.5.3 Application of Principal Components Analysis

The motivation for applying principal components analysis (PCA) to plucked-guitar excitation signals is to achieve a parametric representation of these signals through statistical analysis. In Section 6.5.1 it was shown that excitation signals corresponding to different articulations share a common impulsive contour, but have varying high-frequency details depending on the specific articulation.

The goal of PCA is to apply a statistical analysis to this data set that is capable of extracting basis vectors to model these fine details. By exploiting redundancy in the data set, PCA leads to data reduction for the parametric representation of signals.

PCA is defined as an orthogonal linear transformation of the data set onto a new coordinate system [13]. The first principal axis in this new space explains the greatest variance in the original data set, the second axis explains the greatest remaining variance, and so on. Figure 6.11 depicts the application of PCA to synthetic data in a two-dimensional space. The vectors v1 and v2 define the principal component axes for the data set. The principal components are found by computing the eigenvalues and eigenvectors of the covariance matrix of the data set [5]. This is the well-known Covariance Method for PCA [13].


Figure 6.11: Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance.

The initial step involves formulating a data matrix

P = [p_1 p_2 ... p_N]^T,    (6.7)

where each p_i is an M-length column vector corresponding to a particular excitation signal in the data set. The next step involves computing the covariance matrix for the mean-centered data matrix by taking

Σ = E[(P − u)(P − u)^T],    (6.8)

where E is the expectation operator and u = E[P] is the empirical mean of the data matrix. The principal component basis vectors are obtained through an eigenvalue decomposition of Σ

V^{−1} Σ V = D,    (6.9)

where V = [v_1 v_2 ... v_N] is a matrix of eigenvectors of Σ and D is a matrix containing the associated eigenvalues along its main diagonal. The LAPACK linear algebra software package is used to compute the eigenvectors and eigenvalues [2].

The columns of V are sorted in order of decreasing eigenvalues in D such that λ_1 > λ_2 > ··· > λ_N. This step is performed so that the PC basis vectors are arranged in the order that explains the most variance in the data set.

To reconstruct the excitation signals, the correct linear combination of basis vectors is required.

The correct weights are obtained by projecting the mean-centered data matrix onto the eigenvectors

W = (P − u) V.    (6.10)

Equation 6.10 defines an orthogonal linear transformation of the data onto a new coordinate system defined by the basis vectors. The weight matrix W is defined as

W = [w_1 w_2 ... w_N]^T,    (6.11)

where each w_i is an M-length column vector containing the scores (or weights) pertaining to a particular excitation signal in P. These scores indicate how much each basis vector is weighted when reconstructing the signal, and they are also helpful in visualizing the data, as will be discussed in the next section.
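The Covariance Method of Equations 6.7-6.11 reduces to a few lines of numpy; a sketch is given below, using np.linalg.eigh (which calls LAPACK internally) in place of a direct LAPACK call, with the cumulative explained variance of Equation 6.12 noted at the end. Names are illustrative:

import numpy as np

def pca_covariance(P):
    """Covariance-method PCA (Equations 6.7-6.11).

    P : (N, M) data matrix, one M-sample excitation signal per row.
    Returns the empirical mean u, basis vectors V (columns, sorted by
    decreasing eigenvalue), weight matrix W and the sorted eigenvalues.
    """
    u = P.mean(axis=0)                    # empirical mean, u = E[P]
    Pc = P - u                            # mean-centered data
    Sigma = np.cov(Pc, rowvar=False)      # (M, M) covariance matrix, Eq. 6.8
    evals, V = np.linalg.eigh(Sigma)      # eigendecomposition, Eq. 6.9
    order = np.argsort(evals)[::-1]       # sort by decreasing eigenvalue
    evals, V = evals[order], V[:, order]
    W = Pc @ V                            # weights / scores, Eq. 6.10
    return u, V, W, evals

# Explained variance of the leading M' components (Equation 6.12):
#   EV = evals[:M_prime].sum() / evals.sum()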

6.5.4 Analysis of PC Weights and Basis Vectors

Principal component analysis of the excitation signals is divided into two groups to separately examine the sets of wound and unwound strings, which have different physical characteristics, as described in Section 6.3.

For the set of unwound strings, the recovered excitation signals were normalized to a reference length of M = 570 samples, which is approximately twice the length of the period corresponding to the open 3rd string tuned to 196 Hz. For the set of wound strings, the reference length of the excitation signals was set to M = 910 samples, which is approximately twice the period of the open 6th string tuned to 82.4 Hz. It should be noted that normalization was achieved via downsampling to avoid truncating significant sections of the excitation signal. Downsampling to the lowest possible frequency in the set of strings also avoids the loss of high-frequency information present in the data set. PCA was applied to both groups of excitation signals using the Covariance Method overviewed in Section 6.5.3.

To analyze the compactness of each data set, the explained variance (EV) can be computed using the eigenvalues calculated from PCA

EV = (∑_{m=1}^{M′} λ_m) / (∑_{m=1}^{M} λ_m),    (6.12)

where M′ ≤ M is the number of leading eigenvalues retained. Figure 6.12 plots the explained variance as a function of the number of eigenvalues: M′ = 25 explains more than 95% of the variance for the set of unwound strings, while M′ = 30 is sufficient for more than 95% of the variance in the wound set.


Figure 6.12: Explained variance of the principal components computed for the set of (a) unwound and (b) wound strings.

For insight into the relationship between the basis vectors and the excitation signals, Figure 6.13 plots the first three basis functions alongside example articulations extracted from the data set consisting of the 1st, 2nd and 3rd strings. The general, impulse-like contour is captured by the empirical mean of the data set. In the case of the excitations derived from pick articulations, the plotted basis vectors provide the high-frequency components just before and after the main impulse.

In the case of the finger articulations, these basis vectors are negatively weighted and serve to widen the main impulse. This relationship agrees with the physical act of plucking a string with a pick versus a finger, since the physical characteristics of each plucking device directly affect the shape of the string.

Figure 6.14 shows a similar plot for the 4th, 5th and 6th strings, which have different physical characteristics due to their wound construction. Comparing Figures 6.13 and 6.14, it is evident that the extracted basis vectors are very similar in each case. The difference, however, is in the empirical mean vector, which exhibits a pronounced “bump” immediately after the main impulse. This feature appears to be characteristic of the articulations produced by the finger, perhaps reflecting the slippage of the wound string off of the finger.

Figure 6.15 shows how the data pertaining to the string articulations projects into the space defined by the principal component vectors. Figure 6.15(a) shows the projection of articulations from strings 1-3 along the 1st and 2nd components. This projection shows that the data pertaining to specific articulations have a particular arrangement and grouping in this space.


Figure 6.13: Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings.


Figure 6.14: Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings.


Figure 6.15: Projection of guitar excitation signals into the principal component space. Excitations from strings 1 - 3 (a) and 4 - 6 (b).

In particular, the axis pertaining to the 1st component correlates with the articulation strength, which increases independently for pick and finger articulations. Similarly, the projection of the data pertaining to strings 4-6 is shown in Figure 6.15(b), which shows a different arrangement, but a similar clustering of the data based on articulation type.

6.5.5 Codebook Design

The plots of explained variance in Figure 6.12 demonstrate the relatively low dimensionality of the extracted guitar excitation signals. Here, we present an approach for designing a codebook to further reduce the number of basis vectors required to accurately reconstruct the excitation signals. This step is advantageous for synthesis systems where it is desirable to faithfully capture the perceptual characteristics of the performer-string interaction while minimizing the amount of data required.

Also, this approach separately analyzes the principal component weights for pick and finger articulations to determine the “best” subset of basis vectors for each group of articulations. This method considers that, while PCA yields basis vectors that successively explain the most variance in the data, certain basis vectors may be more essential for synthesizing a particular articulation, based on the magnitude of the associated weight vector.

The codebook design procedure is as follows:

1. Compute the weight matrix for the data set using Equation 6.10. A weight vector w = [w_1 w_2 ... w_M] is obtained for each excitation signal in the data set.

2. Take the absolute value of each weight vector w and sort the entries in descending order so that |w_1| > |w_2| > ··· > |w_M|.

3. Select the first M_top weights from the sorted weight vector, where M_top is an integer.

4. For each of the M_top weights selected, record the occurrence of the associated principal component vector in a histogram.

5. Using the histogram as a guide, select a subset of L basis vectors having the highest occurrences in the histogram (see Figure 6.16), where L < M.

Figure 6.16 shows the histogram computed separately for excitation signals associated with pick and finger articulations. It is interesting to note that the occurrence counts do not decrease monotonically with principal component number. This suggests that certain component vectors are more “essential” than others for representing the ensemble of excitation signals for a particular articulation.
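Under these definitions, the five-step selection procedure can be sketched as follows (W, M_top and L as defined above; the function name is illustrative):

import numpy as np

def design_codebook(W, M_top=20, L=50):
    """Select the L most 'essential' basis vectors (Section 6.5.5).

    W : (N, M) weight matrix, one weight vector per excitation signal.
    Returns the indices of the L principal components that appear most
    often among each signal's M_top largest-magnitude weights.
    """
    counts = np.zeros(W.shape[1], dtype=int)   # histogram over PC indices
    for w in np.abs(W):
        top = np.argsort(w)[::-1][:M_top]      # steps 2-3: sort, keep M_top
        counts[top] += 1                       # step 4: record occurrences
    return np.argsort(counts)[::-1][:L]        # step 5: L highest occurrences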

6.5.6 Codebook Evaluation and Synthesis

After the codebook has been designed, a particular excitation signal can be generated by using a desired number of codebook entries (i.e., basis vectors) and the appropriate weighting for each entry.


Figure 6.16: Histogram of basis vector occurrences generated with Mtop = 20.

Equation 6.13 presents the synthesis equation

p_i = p̄ + ∑_{m=1}^{L} w_{i,m} v̂_m,    (6.13)

where L indicates the number of codebook entries used for re-synthesis. The weight values are obtained by projecting the excitation signal onto the basis vectors. The number of codebook entries used for synthesis depends on the desired accuracy. Figure 6.17 demonstrates the reconstruction for a varying number of entries. It is clear that using a single entry does not capture the high-frequency details found in the reference excitation signal. However, using 10 entries approximates the contour of the signal, and 50 entries captures nearly all of the high-frequency information.

The reconstruction quality can be summarized for the entire data set by computing the signal-to-noise ratio (SNR) for each signal in the set. SNR is defined as

SNR_dB = 10 log_{10} ( ∑_n p(n)² / ∑_n (p(n) − p̂(n))² ),    (6.14)

where p(n) and p̂(n) are the original and reconstructed signals, respectively. Each excitation signal was reconstructed with a varying number of codebook entries, and the SNR was averaged over all excitations at each number of entries. Additionally, separate codebooks were developed for signals associated with pick or finger articulations to improve the error when the number of entries is low. Figure 6.18 summarizes the results of this analysis.
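A sketch of the reconstruction of Equation 6.13 and the SNR of Equation 6.14, assuming the mean u, basis V and codebook indices from the earlier sketches:

import numpy as np

def reconstruct(u, V, codebook_idx, p):
    """Re-synthesize an excitation from codebook entries (Equation 6.13)."""
    Vc = V[:, codebook_idx]          # retained basis vectors
    w = (p - u) @ Vc                 # weights: project signal onto the basis
    return u + Vc @ w

def snr_db(p, p_hat):
    """Reconstruction signal-to-noise ratio in dB (Equation 6.14)."""
    return 10 * np.log10(np.sum(p**2) / np.sum((p - p_hat)**2))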


Figure 6.17: Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries.

It is of note that the SNR computed for finger excitation signals is generally higher than that computed for pick excitations, regardless of the number of codebook entries used. Intuitively, this agrees with previous observations of the excitation signals obtained from our data set. In general, the observed signals pertaining to finger articulations were not as complex as the picked articulations (see Figure 6.10); thus, the finger articulations may be more accurately represented with fewer components.


Figure 6.18: Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals.


The results presented in Figure 6.18 make a strong case for applications requiring accurate and expressive synthesis with low data storage requirements. The initial PCA yielded 570 basis vectors (for strings 1-3), each with a length of 570 samples. From Figure 6.18, it is evident that the SNR of the reconstruction only marginally increases when more than 150 codebook entries are used. 150 codebook entries require only 26% ((150 × 570)/(570 × 570)) of the data obtained from the initial PCA, which significantly reduces the amount of storage required. At a 16-bit quantization level, 150 codebook entries would require approximately 167 kilobytes of storage, which is a modest requirement considering the storage capacities of present-day personal computers and mobile computing devices.

6.6 Nonlinear PCA for Expressive Guitar Synthesis

While the linear PCA technique presented in the previous section provides intuition about the underlying basis functions comprising our data set, it is unclear how exactly the high-dimensional component space relates to the expressive attributes of our data. As shown in Figure 6.15, there is a nonlinear arrangement of the data along the axes pertaining to the first two principal components. Moreover, since additional components are needed to accurately reconstruct the source signals, simply sampling the space defined by the first two components is not sufficient for high-quality synthesis. On the other hand, it is difficult to visualize and infer the underlying structure of the data by projecting it along additional components. In this section, we explore the application of nonlinear principal components analysis (NLPCA) to the data extracted from linear PCA to derive a low-dimensional representation of the data. We show that the reduced-dimensional space derived using NLPCA explains the expressive attributes of the excitation signals in the data set. Moreover, this low-dimensional representation can be inverted and is therefore suitable as an expressive controller using the original linear components.

6.6.1 Nonlinear Dimensionality Reduction

There are many techniques available in the literature for nonlinear dimensionality reduction, or manifold-learning, for the purposes of discovering the underlying nonlinear characteristics of high dimensional data. Such techniques include locally linear embedding (LLE) [65] and Isomap [78].

While LLE and Isomap are useful for data reduction and visualization tasks, their application does not provide an explicit mapping function to project the reduced dimensionality data back into the high dimensional space.

For the purpose of developing an expressive control interface, re-mapping the data back into the original space is essential since we wish to use our linear basis vectors to reconstruct the excitation pulses. To satisfy this requirement, we employ NLPCA via autoassociative neural networks (ANN) to achieve dimensionality reduction with explicit re-mapping functions.

[Figure 6.19 diagram: input layer → mapping layer (T1) → bottleneck layer (T2) → de-mapping layer (T3) → output layer (T4), with sigmoidal (σ) nodes in the mapping and de-mapping layers.]

Figure 6.19: Architecture for a 3-4-1-4-3 autoassociative neural network. 90

The standard architecture for an ANN is shown in Figure 6.19 and consists of 5 layers [34]. The input and mapping layers can be viewed as the “extraction” function, since they project the input variables into a lower-dimensional space specified by the bottleneck layer. The de-mapping and output layers comprise the “generation” function, which projects the data back into its original dimensionality. Using Figure 6.19 as an example, the ANN can be specified as a 3-4-1-4-3 network to indicate the number of nodes at each layer. The nodes in the mapping and de-mapping layers contain sigmoidal functions and are essential for compressing and decompressing the range of the data to and from the bottleneck layer. An example sigmoidal function is the hyperbolic tangent, which compresses values with a range of (−∞, ∞) to (−1, 1). Since the desired values at the bottleneck layer are unknown, direct supervised training cannot be used to learn the mapping and de-mapping functions. Rather, the combined network is learned using backpropagation algorithms to minimize a squared-error criterion such that E = ½‖w − ŵ‖² [34]. From a practical standpoint, the mapping functions are essentially a set of transformation matrices to compress (T1, T2) and decompress (T3, T4) the dimensionality of the data.
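For illustration, a forward pass through the 3-4-1-4-3 example network of Figure 6.19 can be sketched as below; the weight matrices T1-T4 would be learned by backpropagation, and the bias vectors b1-b4 are an assumption of this sketch rather than a detail taken from the figure:

import numpy as np

def ann_forward(w, T1, b1, T2, b2, T3, b3, T4, b4):
    """Forward pass through a 3-4-1-4-3 autoassociative network.

    The mapping (T1) and de-mapping (T3) layers use tanh sigmoids; the
    bottleneck and output layers are taken to be linear in this sketch.
    """
    h1 = np.tanh(T1 @ w + b1)    # mapping layer: 3 -> 4
    z = T2 @ h1 + b2             # bottleneck: 4 -> 1 (nonlinear component)
    h2 = np.tanh(T3 @ z + b3)    # de-mapping layer: 1 -> 4
    w_hat = T4 @ h2 + b4         # output layer: 4 -> 3
    return z, w_hat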

6.6.2 Application to Guitar Data

To uncover the nonlinear structure of the guitar features extracted in Section 6.5.4, NLPCA was applied using 25 scores from the linear components analysis at the input layer of the ANN. Empirically, we found that using 25 scores was sufficient in terms of adequately describing the data and expediting the ANN training. As discussed in Section 6.5.4, 25 linear PCA vectors explain more than 95% of the variance in the data set and lead to good re-synthesis. At the bottleneck layer of the ANN, we chose two nodes in order to have multiple degrees of freedom that could be used to synthesize excitation pulses in an expressive control interface. These design criteria yielded a 25-6-2-6-25 ANN architecture, which was trained using the NLPCA MATLAB Toolbox [67].

Figure 6.20 compares the projection of the data into the linear component space and the reduced-dimension space defined by the bottleneck layer of the ANN. Unlike the linear projection in 6.20(a), the bottleneck layer of the NLPCA space, shown in 6.20(b), has “unwrapped” the nonlinear data arrangement so that it is now clustered about linear axes. Figure 6.21 shows an additional linear rotation applied to this new space for a clearer view of how the axes relate to the data set. By examining this space, the data is clearly organized around the orthogonal z1 and z2 axes. Selected excitation pulses are also shown, which were synthesized by sampling this coordinate space, projecting back into the linear principal component domain using the transformation matrices (T3, T4) from the ANN, and using the resulting scores to reconstruct the pulse with the linear component vectors.
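This decode path, from a sampled (z1, z2) point back to an excitation pulse, can be sketched as follows, assuming the trained de-mapping parameters T3 and T4 (with hypothetical bias vectors b3, b4) and the linear PCA mean u and basis V from the earlier sketches:

import numpy as np

def synthesize_from_control(z, T3, b3, T4, b4, u, V):
    """Map a 2-D control point back to an excitation pulse.

    z : point sampled in the bottleneck (z1, z2) space. The ANN
    de-mapping recovers 25 linear PC scores, which then weight the
    linear basis vectors to reconstruct the pulse.
    """
    scores = T4 @ np.tanh(T3 @ z + b3) + b4   # generation half of the ANN
    return u + V[:, :len(scores)] @ scores    # linear PC reconstruction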


Figure 6.20: Top: projection of excitation signals into the space defined by the first two linear principal components. Bottom: projection of the linear PCA weights along the axes defined by the bottleneck layer of the trained 25-6-2-6-25 ANN.

The nonlinear component defined by the z1 axis describes the articulation type: points sampled in the region z1 < 0 pertain to finger articulations, and points sampled for z1 > 0 pertain to pick articulations. The finger articulations feature a wider excitation pulse, in contrast to the pick, where the pulse is generally narrower and more impulsive. In both articulation regions, moving from left to right increases the relative dynamics. The second nonlinear component, defined by the z2 axis, relates to the contact time of the articulation. As z2 is increased, the excitation pulse width increases for both articulation types.


Figure 6.21: Guitar data projected along orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown.

6.6.3 Expressive Control Interface

We demonstrate the practical application of this research in a touch-based iPad interface shown in Figure 6.22. This interface acts as a “tabletop” guitar, where the performer uses one hand to provide the articulation and the other to key in the desired pitch(es). The articulation is applied to the large gradient square in Figure 6.22, which is a mapping of the reduced-dimensionality space shown in Figure 6.21. Moving up along the vertical axis of the articulation space increases the dynamics of the articulation (piano to forte), and moving right to left on the horizontal axis increases the contact time. The articulation area is capable of multi-touch input, so the performer can use multiple fingers within the articulation area to give each tone a unique timbre.

The colored keys on the left side of Figure 6.22 allow the user to produce specific pitches.


Figure 6.22: Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches.

Adjacent keys on the horizontal axis are tuned a half step apart, and their color indicates that they are part of the same “string,” so that only the leading key on the string can be played at once. Diagonal keys on adjacent strings are tuned to a Major 3rd interval, while the off-diagonal keys represent a Minor 3rd interval. This arrangement allows the performer to easily finger different chord shapes.

The synthesis engine for the tabletop interface is capable of computing the excitation signal corresponding to the performer’s touch point within the articulation space and filtering the resulting excitation signal for multiple tones in real time. The filter module used for the string is implemented with the single delay-loop model shown in Figure 6.1. Though this filter has a large number of delay taps, which depends on the pitch, only a few of these taps have non-zero coefficients, which permits an efficient implementation of infinite impulse response filtering. Currently, the relative plucking position along the string is fixed, though this may become a free parameter in future versions of the application. The excitation signal can be updated in real time during performance, which is made possible by the iPad’s support of hardware-accelerated vector libraries. These include the matrix multiplication routines needed to project the low-dimensional user input into the high-dimensional component space. Through our own testing, we found that the excitation signal is typically computed in under 1 millisecond, which is more than adequate for real-time performance.

6.7 Discussion

In this chapter, a novel, component-based approach was presented for modeling the excitation signals of plucked-guitar tones. This method draws on physically inspired modeling techniques to extract the excitation pulses from recorded performances pertaining to various articulation styles in accordance with a source-filter model. Principal components analysis (PCA) was used to model the excitation pulses using the resulting set of linear basis vectors. While this analysis led to a large number of basis vectors, a codebook was developed to reduce the number required for accurate modeling.

To understand the relation between the linear components and the expressive attributes of the excitation signals in the data set, nonlinear principal components analysis (NLPCA) was used to derive a reduced-dimensional space using the linear weights as inputs to an autoassociative neural network (ANN). Using the ANN, the relation of the expressive attributes of the excitation signals to the axes of the reduced-dimensional space is clear.

A pertinent application of this research includes developing new interfaces for musical expression.

The application of NLPCA to the excitation signal data set derives a low dimensional representation based on linear basis vectors and has a clear relationship to the expressive attributes of the data set.

Since the transformation into the reduced space is invertible, this representation could be leveraged in gesture recognition and control applications for music synthesis. At present, gesture-based recognition systems for guitar synthesis rely on non-parametric, sample-based synthesizers or, at best, physical models where the excitation signals are stored off-line [26, 55]. The component-based modeling approach presented here is limited only by the data used to derive the component vectors and can be used for arbitrary synthesis using the reduced-dimensional space.

Similar to gesture-recognition systems, recent advances in mobile computing technology make touch-based devices a compelling platform for expressive musical interfaces, especially for the guitar. Among the existing interfaces is Apple’s iPad implementation of Garageband, which uses accelerometer data in response to the user’s tapping strength to trigger an appropriate sample for the synthesizer [20]. Similarly, the OMGuitar enables single-note or chorded performance and triggers chord samples based on how quickly the user “strums” the interface [1]. In both cases, sample-based synthesizers are used, though as shown in the previous section, the reduced-dimensional component space is highly applicable to these interfaces.

CHAPTER 7: CONCLUSIONS

This research presented several novel techniques for the analysis and synthesis of guitar performance focusing on the player’s string articulation, which can be summarized as follows:

• Generated a data set of plucked-guitar tones comprising variations of the performer’s articulation, including the plucking mechanism and strength, which spans all of the guitar’s strings and several fretting positions.

• Developed a framework for jointly estimating the source and filter parameters for plucked-guitar tones based on a physically inspired model.

• Proposed and demonstrated a novel application of principal component analysis to model the source signal for plucked-guitar tones, encapsulating characteristics of various string articulations.

• Utilized nonlinear principal components analysis to derive an expressive control space for synthesizing excitation signals corresponding to guitar articulations.

This research is centered on source-filter modeling techniques widely used in the literature, since the model is highly analogous to the process of exciting a resonant string. I have shown that estimating the parameters of the model can be formulated as a joint estimation problem, where the motivation is to account for the simultaneous variation between the performer’s articulation and the string’s resonant response, and that this technique is adept at capturing the parameters and perceptual attributes of recorded plucked-guitar tones produced with different plucking mechanisms and strengths. A novel, data-driven approach for modeling excitation signals based on linear and nonlinear principal components was also presented. This modeling approach decouples the effect of the performer’s plucking position on the string and treats each excitation signal as a weighted combination of basis vectors. Nonlinear components analysis is used to derive an invertible, expressive space which can be used to synthetically generate excitation signals pertaining to specific articulations in the data set.

A practical application of this research was also presented, in which an iPad was used to demonstrate flexible, real-time synthesis of guitar tones with control over the string articulation.

This chapter will discuss limitations of the proposed methods with regard to the techniques employed and the underlying physics of vibrating strings. Future directions for this research will also be discussed.

7.1 Expressive Limitations

The techniques presented in this dissertation are primarily concerned with modeling the performer’s articulation through their plucking action, which includes the effects of plucking mechanism and strength. However, guitarists use additional expressive techniques during performance pertaining to the action of their fretting hand, which controls the pitch of the plucked tone. These techniques include legato, or smooth, transitions between notes and pitch-shifting techniques such as bends and vibrato, which alter the pitch of a tone after it has been excited. Due to the time-varying nature of the tones resulting from these techniques, analysis and synthesis with linear time-invariant source-filter models is extremely difficult or infeasible.

Guitarists typically play with legato style using slides, “hammer-ons” or “pull-offs” between notes. When performing a slide, the note is played at a particular position and the fretting finger moves up or down the string after the note has sounded until the desired pitch is reached. Similarly, a hammer-on involves playing a particular note with a fretting finger and using another finger to clamp down the string at a higher fret position after the note has already sounded, achieving a sudden pitch increase. The complementary technique is the pull-off, where the fretting finger is released and another finger, already in position behind the fretting finger, sets a lower pitch. The discrete pitch changes resulting from tones produced with legato are not easily analyzed with a source-filter model. In particular, sliding into a note causes one or many discrete pitch changes as the guitarist’s finger moves along the fretboard to its final position. The resulting tone will exhibit time-varying pitch and decay characteristics. The hammer-on technique introduces additional complexity into the analysis, since the string is “excited” in a sense by the second finger clamping the new fret in an impulsive manner. Furthermore, melodies can often be performed with hammer-ons and pull-offs without using the articulation hand to initially excite the string, which diverges from the traditional notion of how the string is excited.

While legato performance introduces sudden, discrete pitch changes to the plucked tone, vibrato and string bending alter the pitch of the fretted note without changing the fret position. Vibrato is achieved by rapidly wiggling the fretting finger at a particular position to slightly alter the pitch of the tone. Pitch bending involves physically bending the string at the fretting position, thereby altering its tension to achieve a pitch increase. While a certain degree of vibrato may be negligible from an analysis standpoint, pitch bending produces a signal with noticeable time-varying pitch, which cannot be analyzed using either the proposed joint source-filter estimation scheme or existing spectral-based filter estimation schemes. This is because the harmonically related partials shift with the fundamental frequency, so that the continuously changing partial frequencies and decay rates must be identified. Implementing pitch shifting via post-processing can be achieved, but with certain restrictions. For example, vibrato can be implemented in the source-filter model by varying the fractional delay filter in the feedback loop, as long as the pitch change is small. However, significant pitch shifting requires modification of the bulk delay term in the feedback loop. Such modification requires continuously resampling the delay line to simulate the gradual tension change in the string [80]. In certain synthesis systems, pitch bending is often simulated by applying a pitch-shifting algorithm that performs short-time spectral manipulation to smoothly alter the pitch of a synthetic signal [20]. Alternately, a sinusoidal model can easily be applied to the time-varying characteristics associated with string bending, though the benefits associated with source-filter modeling are then lost.

7.2 Physical Limitations

The so-called single delay-loop (SDL) model that forms the basis of the analysis and synthesis techniques presented in this dissertation describes the basic components of plucked string synthesis including articulation, pitch and frequency-dependent decay. However, there are several physical aspects of vibrating guitar strings that are not encapsulated by the model.

It is well understood that real strings vibrate along the transverse and longitudinal directions, which are perpendicular and parallel to the guitar’s body, respectively. The perceived vibration of the string is the sum of vibration in both directions, including coupling effects, and in certain cases a “beating phenomenon” is heard, which is caused by slight differences in the string’s effective length along the transverse and longitudinal axes [16]. The beating phenomenon causes the sum and difference frequencies to be perceived by the listener. Identification of the beating frequencies in guitar tones through analysis is difficult, since it is a fast-occurring phenomenon requiring high spectral resolution (and thus long window lengths) to identify the distinct frequencies. Lee presents an approach for finding the beating frequencies through identification of the two-stage decay evident in plucked tones, but it is unclear how to automate the process, which is based on an additive synthesis model [43]. While beating is not included in the analysis techniques presented here, its implementation is often accomplished via an ad-hoc approach where two SDL models, each having a slightly different pitch, are placed in parallel. The outputs of each SDL are scaled by a gain factor and mixed to create a synthetic signal with beating present around the fundamental frequency [44]. The presented synthesis techniques can easily be modified to include beating, though automated analysis and identification of the beating frequencies remains an ongoing research problem.

The pitch shifting due to the tension modulation present in real plucked-guitar strings is not explicitly accounted for in the joint source-filter estimation, since it is a slowly time-varying process. However, when the measured pitch shift is relatively small, the fractional delay filter can be slowly varied over time to manipulate the frequency, as discussed in Appendix B. The frequency trajectories are obtained by modeling the pitch of a plucked tone via short-time analysis. Techniques for incorporating tension modulation into a synthesis system involve re-sampling the delay line to alter the pitch [80] or using a sinusoidal model where the frequencies of the harmonically related partials gradually decrease over time [42].

7.3 Future Directions

Beyond the expressive and physical limitations of the modeling techniques demonstrated, the computational model of guitar articulations developed in this thesis could be furthered through the collection of performance data from additional guitarists. However, acquiring this data is challenging due to the specific guitar configuration (e.g., bridge-mounted piezoelectric pickup) required for recording and analyzing the performance. Currently, no publicly available dataset exists that satisfies this recording configuration, which is why a dataset was created specifically for this research.

There is also the issue of recording the guitarist in the context of a live performance. The data set developed here is centered on capturing the acoustic attributes of the expression associated with an articulation in a controlled environment, so that individual strings can be isolated. During a live performance, guitarists will alter their articulation in other ways, especially when strumming the strings to produce chords. This necessitates a divided, or “hexaphonic,” guitar pickup for capturing the audio from individual strings while avoiding the challenging task of separating multiple sources from a polyphonic mixture. Divided pickups are commercially available for common guitar models, but a streamlined apparatus is required to interface the signals with recording equipment without being obtrusive to the performer. Development of this complete, polyphonic recording system for capturing contextual performance remains a task for future work.

With the inclusion of performance data from many guitarists, computational models for specific performers could be developed to determine whether the differences in articulation are discernible using the proposed modeling techniques. These models could then be used to “profile” a particular performer and integrate the related parameters into a synthesis system for new musical interfaces. It was already demonstrated that the excitation synthesis can be implemented on currently available mobile computing platforms, but emerging gesture recognition technologies, such as the Microsoft Kinect, could also be used to harness this technology for performance, entertainment and gaming applications.

From a physical modeling standpoint, additional characteristics of guitars, such as body resonance effects and magnetic pickups, could be studied, including how the performer uses these aspects of the instrument during performance. Foremost, inclusion of these effects is required for acoustically accurate synthesis of a “complete” guitar model, which would necessitate augmenting the source-filter model with blocks implementing the signal processing tasks for modeling the pickups, resonance, etc. Analysis of how the guitarist uses certain techniques, such as plucking position or pickup position, either consciously or subconsciously in the context of a performance, also warrants further study.

Appendix A: Overview of Fractional Delay Filters

A.1 Overview

The waveguide models introduced in Chapter 3 depend on a delay loop parameter, D, that sets the waveguide’s total sample delay and thus the pitch, f_0, of the synthesized tone, such that D = f_s/f_0, where f_s is the sampling frequency. In many cases, however, D is a non-integer, which cannot be realized with an integer number of delay elements. In some systems, it is permissible to adjust the sampling rate to achieve a desired pitch, though this is often undesirable, especially when multiple voices are being synthesized or when certain performance techniques, such as tremolo and vibrato, require D to be a continuously varying parameter.

Fractional delay filters have been widely used in the literature to provide the non-integer delay required for precisely tuning waveguide models [25, 26, 29, 56, 59, 85]. However, the design and implementation of such filters is not straightforward and requires some special consideration.

This appendix will briefly overview the basic theory and practical considerations associated with designing and implementing FIR-type fractional delay filters. While IIR-type filters are also used for this task, FIR filters are preferred in the literature since they can be easily designed with good frequency response characteristics. In particular, the Lagrange interpolation fractional delay filter is examined, which is used in this thesis.

A.2 The Ideal Fractional Delay Filter

To understand fractional delay filters, it is useful to consider a discrete time signal, x(n), delayed by D samples. D is a real number and is expressed as

D = d_I + d_F,    (A.1)

where d_I and d_F are the integer and fractional components, respectively. x(n) is shifted by D samples via convolution with a shifting filter, h_id(n), to yield y(n) = x(n − D) [54]. In the z-transform domain, the transfer function of the ideal shifting filter is

H_id(z) = Y(z)/X(z) = X(z) z^{−D} / X(z) = z^{−D},    (A.2)

and the corresponding frequency response is obtained by setting z = e^{jω} in Equation A.2:

H_id(e^{jω}) = e^{−jωD}.    (A.3)

By computing the magnitude, phase and group delay responses for Equation A.3, it can be verified that H_id(e^{jω}) is distortionless, since it passes an input signal without magnitude or phase distortion, as shown in Equations A.4 - A.6 [37]:

|H_id(e^{jω})| = |e^{−jωD}| = 1    (A.4)

Θ_id(ω) = ∠H_id(e^{jω}) = −ωD    (A.5)

τ_id(ω) = −(∂/∂ω) Θ_id(ω) = D    (A.6)

It is intuitive that the filter will not distort the magnitude of an input signal since it has unity gain, but the importance of the linear phase response shown in Equation A.5 cannot be overstated.

Linear phase implies that the system has a constant group delay such that the input signal is uniformly delayed by D samples regardless of frequency.

The impulse response of H_id(e^{jω}) can be obtained by taking its inverse discrete-time Fourier transform [54]:

h_id(n) = (1/2π) ∫_{−π}^{π} H_id(e^{jω}) e^{jωn} dω    (A.7)
        = (1/2π) ∫_{−π}^{π} e^{−jωD} e^{jωn} dω    (A.8)
        = (1/2π) ∫_{−π}^{π} e^{jω(n−D)} dω.    (A.9)

By evaluating the integral in Equation A.9, the impulse response of H_id(e^{jω}) can be verified as the sinc function shifted by D samples:

h_id(n) = sin(π(n − D)) / (π(n − D)) = sinc(n − D).    (A.10)


Figure A.1: Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top) and non-integer (bottom) number of samples.

Laakso et al. address the problems with implementing a fractional delay filter by comparing the impulse responses of h_id(n) when D takes on integer and non-integer values, as shown in Figure A.1 [35, 87]. In the case where D = 3, h_id(n) reduces to a unit impulse at n = 3, since the sinc function is exactly zero at all other sample values. When D = 3.3, however, h_id(n) cannot be reduced to a simple unit impulse, since the peak of the sinc function is offset from an integer sample value. Now, an interpolation using all samples of the sinc function is required to delay an input signal by D = 3.3 samples. As the bottom panel of Figure A.1 shows, implementing this impulse response is not possible, since h_id(n) is both non-causal and infinite in length.

A.3 Approximation Using FIR Filters

Since the ideal fractional delay (FD) filter cannot be realized in practice, techniques are required to approximate the impulse response for practical implementations. This section will briefly overview the design techniques used to develop approximations based on finite impulse response (FIR) filters.

An FIR filter that approximates the ideal shifting filter has the following form:

H_F(z) = ∑_{n=0}^{N} h(n) z^{−n},    (A.11)

where N indicates the filter order, such that the filter consists of N + 1 coefficients. To determine the coefficients h(n) that approximate the ideal filter, an error function is defined:

E(e^{jω}) = H_id(e^{jω}) − H_F(e^{jω}).    (A.12)

Laakso et al. obtain a time-domain error criterion by applying the L2 norm to Equation A.12 and Parseval’s theorem [35], which yields

e_{L2} = ∑_{n=−∞}^{∞} |h_F(n) − h_id(n)|².    (A.13)

The optimal solution for h_F(n) per Equation A.13 is the ideal impulse response truncated and delayed by the required number of samples. The error decreases as the number of samples used to approximate the sinc function is increased.

A.3.1 Delay Approximation using Lagrange Interpolation Filters

A consequence of implementing fractional delay (FD) filters based on a truncated sinc function is the well-known Gibbs phenomenon [54]. Essentially, the Gibbs phenomenon results from truncating the impulse response of a sinc function with a square window, which causes the FD filter’s magnitude response to exhibit a ripple due to side lobe interaction. This rippling is often undesirable, and thus more sophisticated techniques are required to design FD filters with relatively flat magnitude responses.

Lagrange interpolation filters allow for FD filter design with a maximally flat magnitude response at a frequency of interest. The coefficients for the Lagrange filters are obtained by setting the derivatives of Equation A.12 equal to 0:

d^n E(e^{jω}) / dω^n |_{ω=ω_0} = 0    for n = 0, 1, 2, ..., N    (A.14)

In most cases, it is desired that the maximally flat magnitude response occur near DC, which requires ω_0 = 0. The solution of Equation A.14 is obtained by solving a system of N linear equations, which has the following solution:

h(n) = ∏_{k=0, k≠n}^{N} (D − k)/(n − k)    for n = 0, 1, 2, ..., N,    (A.15)

where D is the total delay, including the fractional component [35, 53]. The name of the Lagrange interpolation filter becomes obvious when considering the case N = 1, which yields coefficients h(0) = 1 − D and h(1) = D, equivalent to a linear interpolation between two samples. Figure A.2 illustrates the tradeoffs associated with designing Lagrange filters with a desired accuracy. As the order N is increased, the values of h(n) approach the ideal fractional delay filter at the expense of adding integer sample delay. Figure A.3 demonstrates the tradeoffs associated with the order of the Lagrange FD filter and its frequency response. As N increases, the cutoff frequency of the filter’s magnitude response increases, providing a flatter magnitude response across a wider bandwidth. Similarly, we also see the tradeoff associated with the group delay of the FD filter, since increasing N maintains the desired flat group delay response over a wider bandwidth.

For an N-th-order Lagrange FD filter designed for a maximally flat response at DC, the associated bulk integer delay, d_I, at this frequency can be computed as ⌊N/2⌋.
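Equation A.15 translates directly into code; the sketch below folds in the bulk integer delay ⌊N/2⌋ so that the requested fractional part d_F is centered within the filter (the function name is illustrative):

import numpy as np

def lagrange_fd(N, d_frac):
    """Order-N Lagrange fractional delay filter coefficients (Eq. A.15).

    The total delay D combines the bulk integer delay floor(N/2) with
    the desired fractional part d_frac. For N = 1, d_frac = 0.3 this
    yields h = [0.7, 0.3], i.e. linear interpolation.
    """
    D = N // 2 + d_frac
    h = np.ones(N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)   # product term of Equation A.15
    return h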


Figure A.2: Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) providing a fractional delay d_F = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function.

A.4 Further Considerations

Lagrange interpolation filters are a popular choice in many waveguide synthesis systems, since the filter coefficients are relatively easy to compute and the frequency response characteristics are sufficient for relatively low-order filters. In general, FIR FD filters are preferred for musical synthesis because they can be varied during synthesis to achieve certain effects (such as pitch bending or vibrato) without noticeable transient artifacts, which are problematic when using IIR FD filters.


Figure A.3: Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 providing a fractional delay d_F = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted.


FD filter design based on maximally flat frequency characteristics is just one of many available techniques. The reader is referred to the work of Laakso and Valimaki for additional FD design techniques, including windowed sinc functions, weighted least squares and IIR design techniques [35, 87]. Additionally, these works provide the theory and techniques required to develop IIR FD filters using all-pass filters, which have their own benefits over FIR implementations.

Appendix B: Pitch Glide Modeling

B.1 Overview

Pitch glide is an important physical consequence of plucking a guitar string. As a plucked string vibrates about its equilibrium, or resting, position, it is subject to elongation of its nominal length. This elongation increases the tension of the string beyond its nominal value and, consequently, increases the fundamental frequency of vibration. As shown in the following equation, the fundamental frequency of vibration is proportional to the square root of the string’s tension:

f_0 = √(K_t / (4mL)),    (B.1)

where K_t, m and L are the string’s tension, mass and length, respectively [17]. Since the string loses energy during vibration due to various frictional forces, the amplitude of its transverse displacements decreases over time and, thus, the elongation decreases as well. After some amount of time, the string vibrates near its nominal, or un-stretched, length and a steady-state pitch is perceived.

Modeling and simulation of pitch glide is an important consideration for an expressive guitar system, since the effect can lead to tones that have a noticeably higher pitch near the “attack” part of the note than some time later. The amount of pitch glide present in a tone depends on the guitarist’s dynamics, or the relative “hardness” used to displace the string. Therefore, as a guitarist increases their dynamics during performance, we expect the resulting notes to have a greater perceived pitch initially than some time after the “attack” phase.

This appendix will discuss the modeling and implementation of pitch glide for expressive guitar synthesis. This includes the estimation of time-varying pitch from plucked-guitar recordings, fitting estimated data to a model of pitch glide and practical implementation.

B.2 Pitch Glide Model

The following model was proposed by Lee et al. [42] to simulate the pitch glide trajectory observed in recorded guitar tones

f(t) = f_ss (1 + α e^{−t/τ}).    (B.2)

This representation consists of multiplying the steady-state pitch value f_ss, which is associated with the nominal tension of the string, by an exponentially decaying function with time constant τ and a multiplicative factor α. This model ensures that the tone decays to its steady-state pitch as t → ∞, which agrees with the physicality of the damped vibrating string. The multiplicative factor α determines the amount of pitch excursion, such that increasing α increases the amount of pitch deviation from the steady-state value. By setting α to an arbitrarily small (or zero) value, the pitch glide effect is effectively eliminated, so that f(t) ≈ f_ss for all values of t. For a physical interpretation of Equation B.2, Lee relates the time-varying fundamental frequency to the square of the slope of the string’s displacement, which decays exponentially over time [42].

The pitch glide model of Equation B.2 is suitable for an expressive synthesis system because its parameters can be related to particular articulations. In particular, the α parameter allows the amount of pitch glide to vary based on the dynamics used by the player. This parameter, and the others, must be determined through analysis of plucked-guitar recordings.

B.3 Pitch Glide Measurement

In this section, we discuss the estimation of pitch glide parameters through analysis of plucked-guitar recordings. The data set used for parameter estimation consists of approximately 1000 samples of guitar tones recorded using a bridge-mounted piezo-electric pick-up. The recorded notes span all six guitar strings and were produced by varying the plucking device and articulation from piano (soft) to mezzo-forte (moderately loud) to forte (loud). More information about the data is provided in Section 6.3.

The first step involves acquiring the pitch glide data from the recordings. A short-time analysis is applied to extract 1500 msec of pitch information for each tone, beginning at the “attack” instant. This audio segment is sub-divided into overlapping frames, each having a duration of 90 msec, with adjacent frames overlapping by 90%.

For each analysis frame, the fast Fourier transform (FFT) is computed and the pitch is determined by searching for the most prominent peak in the frequency spectrum. The frequency bin underlying the spectral peak indicates the pitch of the vibrating string at that moment. This pitch estimate is improved via quadratic interpolation around the spectral peak [77]. Utilizing the peak FFT bin and the magnitudes of the neighboring bins on each side of the peak, the “true” peak is found by locating the maximum of the parabola passing through all three points. The frequency underlying this maximum is taken as the “true” frequency. This step improves the pitch estimate by compensating for the limited frequency resolution of the FFT.
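A minimal MATLAB sketch of this per-frame estimate follows; the function name framePitch and the use of a Hann window on the log-magnitude spectrum are our assumptions, not details specified in the text:

```matlab
function f0 = framePitch(frame, fs)
% Estimate the pitch of one analysis frame via an FFT peak refined by
% quadratic (parabolic) interpolation, in the spirit of [77]. Assumes
% the frame contains a strongly pitched tone whose fundamental is the
% largest spectral peak.
N = numel(frame);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);         % Hann window
X = 20*log10(abs(fft(frame(:).*w)) + eps);  % log-magnitude spectrum
[~, k] = max(X(2:floor(N/2)));              % most prominent peak (skip DC)
k = k + 1;                                  % map back to an index into X
a = X(k-1); b = X(k); c = X(k+1);           % peak bin and its neighbors
p = 0.5*(a - c)/(a - 2*b + c);              % parabola vertex offset, |p| <= 0.5
f0 = (k - 1 + p)*fs/N;                      % interpolated pitch in Hz
end
```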

By repeating the pitch estimation for each frame, a pitch trajectory is obtained for each recording in the data set. Since the approach involves determining the parameters of Equation B.2 from many recordings, each pitch trajectory is normalized by its steady state frequency, $f(t)/f_{ss}$. Dividing Equation B.2 by $f_{ss}$ shows that the measured data must be fit to the following equation:

$f_{norm}(t) = 1 + \alpha e^{-t/\tau}$,  (B.3)

where $f_{norm}(t) = f(t)/f_{ss}$ is the normalized pitch trajectory. The normalized pitch trajectories corresponding to recordings produced with a specific articulation (e.g., piano) are averaged to compute pitch trajectory prototype curves used for model fitting.
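As a sketch of this bookkeeping (the matrix layout and the tail-average estimate of $f_{ss}$ are our assumptions, with synthetic stand-in data so the snippet runs on its own):

```matlab
% F is an (nFrames x nRecordings) matrix of per-frame pitch estimates
% for one articulation class (e.g., all forte plucks of one string).
t  = (0:0.009:1.491)';                        % frame times (9 ms hop)
F  = 110*(1 + 5e-3*exp(-t/0.4))*ones(1,20) + 0.01*randn(numel(t),20);
fss       = mean(F(end-9:end, :), 1);         % tail average estimates f_ss
Fnorm     = F ./ fss;                         % per-recording f(t)/f_ss (R2016b+)
prototype = mean(Fnorm, 2);                   % averaged prototype curve for fitting
```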

B.4 Nonlinear Modeling and Data Fitting

B.4.1 Nonlinear Least Squares Formulation

To determine the model parameters that best describe the measured pitch glide trajectories, a nonlinear least-squares (NLLS) problem is formulated. The problem formulation involves defining a residual function

$r(t) = \hat{f}(t) - F(t, x)$,  (B.4)

where $\hat{f}$ is a prototype pitch glide curve measured from audio and $F(t, x)$ is the pitch glide function in Equation B.3 with unknown parameters $x = [\alpha\ \tau]$. The optimal parameters satisfy $\nabla S(x^*) = 0$, where $S$ is the sum of squared residuals, defined by

$S(x) = \sum_t r(t)^2$.  (B.5)

The unknowns in $x$ are found by taking the gradient of $S$ with respect to $x$ and setting it equal to zero:

$\dfrac{\partial S}{\partial x_i} = 2 \sum_t r_t \dfrac{\partial r_t}{\partial x_i} = 0, \quad i = 1, 2.$  (B.6)

Equation B.6 lacks a closed form solution, since the partial derivatives $\partial r_t/\partial x_i$ of the nonlinear function depend on both the independent variable and the unknown parameters. In practice, nonlinear least squares problems are solved using iterative methods, where initial values of the unknown parameters in $x$ are specified and iteratively refined using successive approximation [32]. Each iteration linearizes the model through a Taylor series expansion, ignoring the higher-order, nonlinear terms.

The algorithm chosen for successive approximation in this implementation is the Gauss-Newton iteration, which is available in many numerical software packages. The MATLAB function lsqnonlin applies NLLS approximation using the Gauss-Newton iteration by default [48]. This function allows the programmer to specify the nonlinear function desired for curve fitting, the initial parameter estimates, bounds for the unknown parameters, the maximum number of iterations and several other options.
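A minimal sketch of such a fit is shown below; the synthetic stand-in data, initial guesses and bounds are our assumptions, chosen only to make the example self-contained:

```matlab
% Fit f_norm(t) = 1 + alpha*exp(-t/tau) to a measured prototype curve
% with lsqnonlin. Here f_meas is a synthetic stand-in; in practice it is
% an averaged prototype trajectory from Section B.3.
t      = (0:0.009:1.491)';                       % 9 ms hop (90 ms frames, 90% overlap)
f_meas = 1 + 7e-3*exp(-t/0.4) + 1e-4*randn(size(t));
resid  = @(x) f_meas - (1 + x(1)*exp(-t/x(2)));  % r(t) = f_hat(t) - F(t,x)
x0 = [1e-3, 0.3];                                % initial guess for [alpha, tau]
lb = [0, 1e-3];  ub = [1, 2];                    % keep both parameters physical
x  = lsqnonlin(resid, x0, lb, ub);               % requires the Optimization Toolbox
fprintf('alpha = %.4g, tau = %.4g s\n', x(1), x(2));
```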

B.4.2 Fitting and Results

We first extract the pitch glide parameters for the forte articulations using MATLAB’s lsqnonlin function. The results of this fit are shown in Figure B.1.

Using the time constant $\tau$ estimated for the forte pitch glide curve, we constrain the NLLS algorithm for the remaining piano and mezzo-forte curves by enforcing the same $\tau$ value for all fits. This results in all pitch glide curves having the same time constant but differing $\alpha$ parameters, which determine the maximum amount of pitch deviation from the steady state value. In this manner, $\alpha$ acts as an expressive control parameter that can be varied to continuously interpolate between the piano and forte pitch glide curves. Figure B.2 shows the observed and estimated pitch glide curves for each articulation and clearly shows the effect of the $\alpha$ parameter on the initial pitch glide value.
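Continuing the sketch above, the constrained refit reduces to a one-parameter problem (variable names and stand-in data are again assumptions):

```matlab
% Constrained refit: hold tau at the forte estimate and solve for alpha
% alone on the piano (or mezzo-forte) prototype curve.
tau_f    = x(2);                                         % tau from the forte fit above
f_meas_p = 1 + 2e-3*exp(-t/0.4) + 1e-4*randn(size(t));   % stand-in piano prototype
resid_a  = @(a) f_meas_p - (1 + a*exp(-t/tau_f));
alpha_p  = lsqnonlin(resid_a, 1e-3, 0, 1);
```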

The extracted parameters for each articulation are summarized in Table B.1.

[Figure: Normalized Frequency (1 to 1.0045) vs. Time (0.2 to 1.4 sec); curves: Forte, measured and Forte, fitted.]

Figure B.1: Measured and modeled pitch glide for forte plucks.

B.5 Implementation

For implementation of the pitch glide effect in a plucked-guitar synthesis system, we employ the well-known single delay-loop model, which was presented in Chapter 3 and is shown in Figure B.3. The pitch of the synthetic tone is determined by the ratio $f_s/D$, where $f_s$ is the sampling frequency and $D$ is the delay line length. Since the delay $D = f_s/f_0$ required for a desired pitch $f_0$ is often non-integer, $H_F(z)$ provides the required fractional delay. Appendix A provides an overview of fractional delay filters.

The fractional delay filter chosen is a variable 5th-order Lagrange interpolation filter inserted into the feedback loop of the single delay-loop model, as shown in Figure B.3. Equation B.3 can be multiplied by the desired steady state pitch value to achieve the correct tuning. The pitch glide is implemented by updating the coefficients of $H_F(z)$ every 50 milliseconds according to the prototype curve for a particular articulation. Updating the coefficients in this manner is possible because the single delay-loop model is implemented as a Type I IIR filter, which has separate delay lines for the input and output feedback [77].
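The MATLAB sketch below pieces these steps together: a single delay loop with a 5th-order Lagrange fractional delay whose coefficients are refreshed every 50 ms from Equation B.2. The noise-burst excitation and the plain loss gain standing in for $H_l(z)$ are our simplifications, not the thesis model:

```matlab
% Pitch glide in a single delay-loop (SDL) string model: sketch only.
fs    = 44100;                       % sample rate (Hz)
f_ss  = 196;                         % steady-state pitch (assumed: string 3 / G)
alpha = 72.91e-4;  tau = 0.3958;     % forte parameters, Table B.1, string 3
g     = 0.996;                       % loss gain standing in for H_l(z)
ord   = 5;                           % Lagrange fractional delay order
N     = round(1.5*fs);               % synthesize 1.5 s
M     = ceil(fs/f_ss) + ord + 4;     % delay-line length with headroom
buf   = zeros(M, 1);  y = zeros(N, 1);  w = 1;
P     = round(fs/f_ss);              % pluck: one period of noise
exc   = [2*rand(P,1) - 1; zeros(N - P, 1)];
hop   = round(0.05*fs);              % refresh H_F(z) every 50 ms
for n = 1:N
    if mod(n-1, hop) == 0            % recompute loop delay from Eq. (B.2)
        f0  = f_ss*(1 + alpha*exp(-((n-1)/fs)/tau));
        D   = fs/f0;                 % total loop delay in samples
        DI  = floor(D) - 2;          % integer part; fractional part in [2,3)
        Del = D - DI;
        h = ones(ord+1, 1);          % Lagrange coefficients for delay Del
        for k = 0:ord
            for m = 0:ord
                if m ~= k, h(k+1) = h(k+1)*(Del - m)/(k - m); end
            end
        end
    end
    s = 0;                           % read taps DI..DI+ord samples back
    for k = 0:ord
        s = s + h(k+1)*buf(mod(w - (DI + k) - 1, M) + 1);
    end
    y(n) = exc(n) + g*s;             % feedback loop of Figure B.3
    buf(w) = y(n);  w = mod(w, M) + 1;
end
% soundsc(y, fs)                     % listen to the glide
```

Centering the fractional part in [2, 3) keeps the 5th-order Lagrange filter in its most accurate operating range; stepping the coefficients only every 50 ms matches the update rate described above.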

[Figure: Normalized Frequency (1 to 1.0045) vs. Time (0.2 to 1.4 sec); measured and fitted curve pairs for Forte, Mezzo-forte and Piano.]

Figure B.2: Measured and modeled pitch glide for piano, mezzo-forte and forte plucks.

[Block diagram: the excitation $p_b(n)$ enters a summing node producing the output $y(n)$; the feedback path passes through the loop filter $H_l(z)$, the fractional delay filter $H_F(z)$ and the integer delay $z^{-D_I}$.]

Figure B.3: Single delay-loop waveguide filter with variable fractional delay filter, $H_F(z)$.

Table B.1: Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively.

String   Dynamic   α (×10⁻⁴)   τ
1        p          1.523      0.2284
1        mf         3.123      0.2284
1        f         11.94       0.2284
2        p          9.337      0.4037
2        mf        19.41       0.4037
2        f         44.39       0.4037
3        p         16.45       0.3958
3        mf        35.51       0.3958
3        f         72.91       0.3958
4        p         26.03       0.3766
4        mf        36.55       0.3766
4        f         60.89       0.3766
5        p         35.04       0.3786
5        mf        60.21       0.3786
5        f         68.28       0.3786
6        p         38.03       0.3523
6        mf        62.76       0.3523
6        f         81.24       0.3523

Bibliography

[1] Amidio. OMGuitar advanced guitar synth. http://amidio.com/portfolio/omguitar/, Jan. 2012.

[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

[3] B. Bank and V. Välimäki. Robust loss filter design for digital waveguide synthesis of string tones. Signal Processing Letters, IEEE, 10(1):18–20, Jan. 2003.

[4] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 2005.

[5] C.M. Bishop. Pattern Recognition and Machine Learning. Information science and statistics. Springer, 2006.

[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, The Edinburgh Building, Cambridge, CB2 8RU, UK, 2004.

[7] K. Bradley, Mu-Huo Cheng, and V.L. Stonick. Automated analysis and computationally efficient synthesis of acoustic guitar strings and body. In Applications of Signal Processing to Audio and Acoustics, 1995, IEEE ASSP Workshop on, pages 238–241, Oct. 1995.

[8] John M. Chowning. The synthesis of complex audio spectra by means of frequency modulation. J. Audio Eng. Soc, 21(7):526–534, 1973.

[9] Perry R. Cook, editor. Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. MIT Press, Cambridge, MA, USA, 1999.

[10] Perry R. Cook. Real Sound Synthesis for Interactive Applications. A. K. Peters, Ltd., Natick, MA, USA, 2002.

[11] Perry R. Cook and Gary P. Scavone. The synthesis toolkit (STK). In International Computer Music Conference, 1999.

[12] G. Cuzzucoli and V. Lombardo. A physical model of the classical guitar, including the player’s touch. Computer Music Journal, 23(2):52–69, Jun. 1999.

[13] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. Pattern Classification and Scene Analysis: Pattern Classification. Wiley, 2001.

[14] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson. Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar. In 108th AES Int. Convention 2000, pages 19–22, Paris, France, Feb. 2000. AES.

[15] Fishman. Pickups: Tune-o-matic powerbridge pickup. http://www.fishman.com/products/ view/tune-o-matic-powerbridge-pickup, Apr. 2012.

[16] N. H. Fletcher. The nonlinear physics of musical instruments. Technical Report 62, Institute of Physics Publishing, 1999.

[17] N.H. Fletcher and T.D. Rossing. The Physics of Musical Instruments. Springer, 1998.

[18] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx.

[19] J. Gudnason, M. R. P. Thomas, P. A. Naylor, and D. P. W. Ellis. Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling. In Proc. of the 2009 Annual Conference of the International Speech Communication Association, Brighton, U.K., Sept. 2009. INTERSPEECH.

[20] Apple Inc. Garageband. http://itunes.apple.com/us/app/garageband/id408709785?mt=8, Jan. 2012.

[21] ISO. Information technology - coding of audio-visual objects - part 3: Audio. http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=53943, Nov. 2011.

[22] D. A. Jaffe and J. O. Smith. Extensions of the Karplus-Strong plucked-string algorithm. Computer Music Journal, 7(2):56–69, Jun. 1983.

[23] J.-M. Jot. An analysis/synthesis approach to real-time artificial reverberation. In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 221–224. ICASSP, Mar. 1992.

[24] M. Karjalainen, A. Harma, U.K. Laine, and J. Huopaniemi. Warped filters and their audio applications. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, page 4 pp. WASPAA, Oct. 1997.

[25] M. Karjalainen and U. K. Laine. A model for real-time sound synthesis of guitar on a floating- point signal processor. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 5, pages 3653–3656. ICASSP, Apr. 1991.

[26] M. Karjalainen, T. Maki-Patola, A. Kanerva, A. Huovilainen, and P. Janis. Virtual air guitar. In Proc. of the 117th Audio Engineering Society Convention. AES, Oct. 2004.

[27] M. Karjalainen, H. Penttinen, and V. Välimäki. Acoustic sound from the electric guitar using DSP techniques. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 773–776. ICASSP, 2000.

[28] M. Karjalainen and J. O. Smith. Body modeling techniques for string instrument synthesis. In Proc. of the International Computer Music Conference. ICMC, 1996.

[29] M. Karjalainen, V. Välimäki, and Z. Janosy. Towards high-quality sound synthesis of the guitar and string instruments. In Proc. of the International Computer Music Conference. ICMC, Sept. 1993.

[30] M. Karjalainen, V. Välimäki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music Journal, 22(3):17–32, Oct. 1998.

[31] K. Karplus and A. Strong. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 7(2):43–55, Jun. 1983.

[32] C. T. Kelley. Iterative Methods for Optimization. Frontiers in Applied Mathematics, SIAM, 1999.

[33] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders. Fundamentals of Acoustics. Wiley, 3rd edition, 1982.

[34] Mark A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.

[35] T. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Processing Magazine, 13(1):30–60, Jan. 1996.

[36] J. Laroche and J.-L. Meillier. Multichannel excitation/filter modeling of percussive sounds with application to the piano. IEEE Transactions on Speech and Audio Processing, 2(2):329–344, Apr. 1994.

[37] B. P. Lathi. Signal Processing And Linear Systems. Oxford University Press, Inc., 198 Madison Avenue, New York, New York, 10016, 1998.

[38] N. Laurenti, G. De Poli, and D. Montagner. A nonlinear method for stochastic spectrum estimation in the modeling of musical sounds. IEEE Transactions on Audio, Speech, and Language Processing, 15(2):531–541, Feb. 2007.

[39] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare. Methods for modeling realistic playing in acoustic guitar synthesis. Computer Music Journal, 25(3):38–49, Oct. 2001.

[40] N. Lee, R. Cassidy, and J.O. Smith. Use of energy decay relief (EDR) to estimate partial-decay-times in a freely vibrating string. In Invited paper at The Musical Acoustics Sessions at the Joint ASA-ASJ meeting, Honolulu, HI, 2006. ASA.

[41] N. Lee, Z. Duan, and J. O. Smith. Excitation signal extraction for guitar tones. In Proc. of the International Computer Music Conference. ICMC, 2007.

[42] N. Lee, J. O. Smith, J. Abel, and D. Berners. Pitch glide analysis and synthesis from recorded tones. In Proc. of the International Conference on Digital Audio Effects, Como, Italy, Sept. 2009. DAFx.

[43] N. Lee, J. O. Smith, and V. Välimäki. Analysis and synthesis of coupled vibrating strings using a hybrid modal-waveguide synthesis model. IEEE Transactions on Audio, Speech and Language Processing, 18(4):833–842, May 2010.

[44] N. Lindroos, H. Penttinen, and V. Välimäki. Parametric electric guitar synthesis. Computer Music Journal, 35(3):18–27, Sept. 2011.

[45] Line6. Line6 modeling amplifiers. http://line6.com/amps, May 2012.

[46] Line6. Line6 Variax guitars. http://line6.com/guitars, May 2012.

[47] MathWorks. Optimization Toolbox 5.0. http://www.mathworks.com/products/optimization/, August 2010.

[48] MathWorks. Curve Fitting Toolbox 3.0. http://www.mathworks.com/products/curvefitting/, November 2011.

[49] D. Mazzoni and R. Dannenberg. Audacity. http://audacity.sourceforge.net/, Oct. 2011.

[50] R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):744–754, Aug. 1986.

[51] P. Mokhtari, H. R. Pfitzinger, and C. T. Ishi. Principal components of glottal waveforms: towards parameterisation and manipulation of laryngeal voice quality. In VOQUAL '03, 2003.

[52] P. M. Morse and K. U. Ingard. Theoretical Acoustics. McGraw-Hill Education, New York, NY, USA, 1968.

[53] G. Oetken. A new approach for the design of digital interpolating filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(6):637–643, Dec. 1979.

[54] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Inc., Upper Saddle River, New Jersey, 1999.

[55] C. O’Shea. Kinect air guitar prototype. http://www.chrisoshea.org/lab/air-guitar-prototype, Jan. 2012.

[56] J. Pakarinen, T. Puputti, and V. Välimäki. Virtual slide guitar. Computer Music Journal, 32(3):42–54, 2008.

[57] H. Penttinen, M. Karjalainen, T. Paatero, and H. Jarvelainen. New techniques to model reverberant instrument body responses. In Proc. of the International Computer Music Conference. ICMC, 2001.

[58] H. Penttinen, J. Siiskonen, and V. Välimäki. Acoustic guitar plucking point estimation in real time. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 209–212. ICASSP, Mar. 2005.

[59] H. Penttinen and V. Välimäki. Time-domain approach to estimating the plucking point of guitar tones obtained with an under-saddle pickup. Applied Acoustics, 65:1207–1220, Dec. 2004.

[60] Thomas Quatieri. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall Press, Upper Saddle River, NJ, USA, 2001.

[61] L. Rabiner. On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1):24–33, Feb. 1977.

[62] Janne Riionheimo and Vesa Välimäki. Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation. EURASIP J. Appl. Signal Process., 2003:791–805, 2003.

[63] M. Roma, L. Gonzalez, and F. Briones. Software based acoustic guitar simulation by means of its impulse response. In 10th Meeting on Audio Engineering of the AES. AES, Portugal, Lisbon, 2009.

[64] Thomas D. Rossing, editor. The Science of String Instruments, chapter 23. Springer Science+Business Media, 233 Spring Street, New York, NY 10013, USA, 1 edition, 2010.

[65] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

[66] E.D. Scheirer. The MPEG-4 structured audio standard. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 6, pages 3801–3804. ICASSP, May 1998.

[67] M. Scholz. Nonlinear PCA toolbox for MATLAB. http://www.nlpca.de/matlab.html, 2011.

[68] Xavier Serra and J. O. Smith. Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition. Computer Music Journal, 14(4):12–24, 1990.

[69] J. O. Smith. Techniques for Digital Filter Design and System Identification with Application to the Violin. PhD thesis, Department of Music, Stanford University, Stanford, CA, Jun. 1983.

[70] J. O. Smith. Music applications of digital waveguides. Technical report, CCRMA, Music Department, Stanford University, 1987.

[71] J. O. Smith. Waveguide filter tutorial. In Proc. of the International Computer Music Conference, pages 9–16. Computer Music Association, 1987.

[72] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91, 1992.

[73] J. O. Smith. Efficient synthesis of stringed musical instruments. In Proc. of the International Computer Music Conference, Tokyo, Japan, 1993. ICMC.

[74] J. O. Smith. Virtual electric guitars and effects using Faust and Octave. In Proc. of the International Linux Audio Conference, Cologne, Germany, 2008.

[75] J. O. Smith. Digital waveguide architectures for virtual musical instruments. In David Havelock, Sonoko Kuwano, and Michael Vorländer, editors, Handbook of Signal Processing in Acoustics, pages 399–417. Springer New York, 2009.

[76] J. O. Smith. Physical Audio Signal Processing. W3K Publishing, 2010. online book.

[77] J. O. Smith. Spectral Audio Signal Processing, October 2008 Draft. CCRMA Stanford, August 22, 2010. online book.

[78] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[79] T. Tolonen, C. Erkut, V. Välimäki, and M. Karjalainen. Simulation of plucked strings exhibiting tension modulation driving force. In Proc. of the International Computer Music Conference. ICMC, 1999.

[80] T. Tolonen, V. Välimäki, and M. Karjalainen. Modeling of tension modulation nonlinearity in plucked strings. IEEE Transactions on Speech and Audio Processing, 8(3):300–310, May 2000.

[81] C. Traube and P. Depalle. Extraction of the excitation point location on a string using weighted least-square estimation of a comb filter delay. In Proc. of International Conference on Digital Audio Effects, London, UK, Sept. 2003. DAFx.

[82] C. Traube, P. Depalle, and M. Wanderley. Indirect acquisition of instrumental gesture based on signal, physical and perceptual information. In Proc. of New Interfaces for Musical Expression, pages 42–47, Montreal, Canada, 2003. NIME.

[83] C. Traube and J. O. Smith. Estimating the plucking point on a guitar string. In COST G-6 Conference on Digital Audio Effects. DAFx, Dec. 2000.

[84] C. Traube and J.O. Smith. Extracting the fingering and the plucking points on a guitar string from a recording. In Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 7–10. WASPAA, 2001.

[85] V. Välimäki. Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters. PhD thesis, Helsinki University of Technology, Espoo, Finland, 1995.

[86] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society, 44(5):331–353, May 1996.

[87] V. Välimäki and T. Laakso. Principles of fractional delay filters. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, Jun. 2000. ICASSP.

[88] V. Välimäki, H. Lehtonen, and T. Laakso. Musical signal analysis using fractional-delay inverse comb filters. In Proc. of International Conference on Digital Audio Effects, Bordeaux, France, Sept. 2007. DAFx.

[89] V. Välimäki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modeling of musical instruments. Technical report, Institute of Physics Publishing, Oct. 2005.

[90] V. Välimäki and T. Tolonen. Development and calibration of a guitar synthesizer. Journal of the Audio Engineering Society, 46(9):766–778, Sept. 1998.

[91] V. Välimäki, T. Tolonen, and M. Karjalainen. Signal-dependent nonlinearities for physical models using time-varying fractional delay filters. In International Computer Music Conference, pages 264–267, Oct. 1998.

[92] B.L. Vercoe and D. P. Ellis. Real-time csound: Software synthesis with sensing and control. In International Computer Music Conference, 1990.

[93] B.L. Vercoe, W.G. Gardner, and E.D. Scheirer. Structured audio: creation, transmission, and rendering of parametric sound representations. Proceedings of the IEEE, 86(5):922–940, May 1998.

VITA

Raymond Vincent Migneco

EDUCATION
Ph.D. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2012
M.S. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2011
B.S. Electrical Engineering, The Pennsylvania State University, University Park, PA, 2005

ACADEMIC HONORS
Eta Kappa Nu Electrical & Computer Engineering Honor Society
Dean's List Honors: Drexel University, The Pennsylvania State University

PROFESSIONAL EXPERIENCE
Graduate Research Assistant, Drexel University, 9/2007 - 6/2012
Electrical Reliability Engineer, Sunoco Chemicals, 8/2005 - 8/2007

TEACHING EXPERIENCE
Teaching Assistant, Drexel University, 9/2007 - 6/2011
NSF Discovery K-12 Fellow, Drexel University, 3/2008 - 6/2009
Teaching Assistant, The Pennsylvania State University, 1/2005 - 5/2005

SELECTED PUBLICATIONS
• Migneco, R., and Kim, Y. E. (2012). “A Component-Based Approach for Modeling Plucked-Guitar Excitation Signals,” Proceedings of the International Conference on New Interfaces for Musical Expression, Ann Arbor, MI: NIME.
• Batula, A. M., Morton, B. G., Migneco, R., Prockup, M., Schmidt, E. M., Grunberg, D. K., Kim, Y. E., and Fontecchio, A. K. (2012). “Music Technology as an Introduction to STEM,” Proceedings of the American Society for Engineering Education Annual Conference, San Antonio, TX: ASEE.
• Migneco, R., and Kim, Y. E. (2011). “Excitation Modeling and Synthesis for Plucked Guitar Tones,” Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY: WASPAA.
• Migneco, R., and Kim, Y. E. (2011). “Modeling Plucked Guitar Tones Via Joint Source-Filter Estimation,” Proceedings of the 14th IEEE Digital Signal Processing Workshop and 6th IEEE Signal Processing Education Workshop, Sedona, AZ: DSP/SPE.
• Scott, J., Migneco, R., Morton, B., Hahn, C. M., Diefenbach, P., and Kim, Y. E. (2010). “An audio processing library for MIR application development in Flash,” Proceedings of the 2010 International Society for Music Information Retrieval Conference, Utrecht, Netherlands: ISMIR.
• Migneco, R., Doll, T. M., Scott, J. J., Hahn, C., Diefenbach, P. J., and Kim, Y. E. (2009). “An audio processing library for game development in Flash,” Accepted to the International IEEE Consumer Electronics Society's Games Innovations Conference.
• Kim, Y. E., Doll, T. M., and Migneco, R. (2009). “Collaborative online activities for acoustics education and psychoacoustic data collection,” in IEEE Transactions on Learning Technologies.
• Doll, T. M., Migneco, R., Scott, J. J., and Kim, Y. E. (2009). “An audio DSP toolkit for rapid application development in Flash,” Accepted to IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil: MMSP.