
EURASIP Journal on Applied Signal Processing

Model-Based Sound Synthesis

Guest Editors: Vesa Välimäki, Augusto Sarti, Matti Karjalainen, Rudolf Rabenstein, and Lauri Savioja


Copyright © 2004 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2004 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief Marc Moonen, Belgium

Senior Advisory Editor: K. J. Ray Liu, College Park, USA

Associate Editors: Kiyoharu Aizawa, Japan; A. Gorokhov, The Netherlands; Antonio Ortega, USA; Gonzalo Arce, USA; Peter Handel, Sweden; Montse Pardas, Spain; Jaakko Astola, Finland; Ulrich Heute, Germany; Ioannis Pitas, Greece; Kenneth Barner, USA; John Homer, Australia; Phillip Regalia, France; Mauro Barni, Italy; Jiri Jan, Czech Republic; Markus Rupp, Austria; Sankar Basu, USA; Søren Holdt Jensen, Denmark; Hideaki Sakai, Japan; Jacob Benesty, Canada; Mark Kahrs, USA; Bill Sandham, UK; Helmut Bölcskei, Switzerland; Thomas Kaiser, Germany; Wan-Chi Siu, Hong Kong; Chong-Yung Chi, Taiwan; Moon Gi Kang, Korea; Dirk Slock, France; M. Reha Civanlar, Turkey; Aggelos Katsaggelos, USA; Piet Sommen, The Netherlands; Tony Constantinides, UK; Mos Kaveh, USA; John Sorensen, Denmark; Luciano Costa, Brazil; C.-C. Jay Kuo, USA; Michael G. Strintzis, Greece; Satya Dharanipragada, USA; Chin-Hui Lee, USA; Sergios Theodoridis, Greece; Petar M. Djurić, USA; Kyoung Mu Lee, Korea; Jacques Verly, Belgium; Jean-Luc Dugelay, France; Sang Uk Lee, Korea; Xiaodong Wang, USA; Touradj Ebrahimi, Switzerland; Y. Geoffrey Li, USA; Douglas Williams, USA; Sadaoki Furui, Japan; Mark Liao, Taiwan; An-Yen (Andy) Wu, Taiwan; Moncef Gabbouj, Finland; Bernie Mulgrew, UK; Xiang-Gen Xia, USA; Sharon Gannot, Israel; King N. Ngan, Hong Kong; Fulvio Gini, Italy; Douglas O’Shaughnessy, Canada

Contents

Editorial, Vesa Välimäki, Augusto Sarti, Matti Karjalainen, Rudolf Rabenstein, and Lauri Savioja Volume 2004 (2004), Issue 7, Pages 923-925

Physical Modeling of the Piano, N. Giordano and M. Jiang Volume 2004 (2004), Issue 7, Pages 926-933

Sound Synthesis of the Harpsichord Using a Computationally Efficient Physical Model, Vesa Välimäki, Henri Penttinen, Jonte Knif, Mikael Laurson, and Cumhur Erkut Volume 2004 (2004), Issue 7, Pages 934-948

Multirate Simulations of String Vibrations Including Nonlinear Fret-String Interactions Using the Functional Transformation Method, L. Trautmann and R. Rabenstein Volume 2004 (2004), Issue 7, Pages 949-963

Physically Inspired Models for the Synthesis of Stiff Strings with Dispersive Waveguides, I. Testa, G. Evangelista, and S. Cavaliere Volume 2004 (2004), Issue 7, Pages 964-977

Digital Waveguides versus Finite Difference Structures: Equivalence and Mixed Modeling, Matti Karjalainen and Cumhur Erkut Volume 2004 (2004), Issue 7, Pages 978-989

A Digital Synthesis Model of Double-Reed Wind Instruments, Ph. Guillemain Volume 2004 (2004), Issue 7, Pages 990-1000

Real-Time Gesture-Controlled Physical Modelling Music Synthesis with Tactile Feedback, David M. Howard and Stuart Rimell Volume 2004 (2004), Issue 7, Pages 1001-1006

Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models, Ixone Arroabarren and Alfonso Carlosena Volume 2004 (2004), Issue 7, Pages 1007-1020

A Hybrid Resynthesis Model for Hammer-String Interaction of Piano Tones, Julien Bensa, Kristoffer Jensen, and Richard Kronland-Martinet Volume 2004 (2004), Issue 7, Pages 1021-1035

Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis, Alexis Glass and Kimitoshi Fukudome Volume 2004 (2004), Issue 7, Pages 1036-1044

EURASIP Journal on Applied Signal Processing 2004:7, 923–925 © 2004 Hindawi Publishing Corporation

Editorial

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: vesa.valimaki@hut.fi

Augusto Sarti
Dipartimento di Elettronica e Informazione, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milan, Italy
Email: [email protected]

Matti Karjalainen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: matti.karjalainen@hut.fi

Rudolf Rabenstein
Multimedia Communications and Signal Processing, University Erlangen-Nuremberg, 91058 Erlangen, Germany
Email: [email protected]

Lauri Savioja
Laboratory of Telecommunications Software and Multimedia, Helsinki University of Technology, P.O. Box 5400, 02015 Espoo, Finland
Email: lauri.savioja@hut.fi

Model-based sound synthesis has become one of the most active research topics in musical signal processing and in musical acoustics. The earliest attempts at generating musical sound with a physical model were made over three decades ago. The first commercial products were seen only some twenty years later. Recently, many refinements to previous signal processing algorithms and several new ones have been introduced. We have learned that new signal processing methods can still be devised, or old ones modified, to advance the field.

Today there exist efficient model-based synthesis algorithms for many sound sources, while there are still some for which we do not have a good model. Certain issues, such as parameter estimation and real-time control, require further work for many model-based approaches. Finally, the capabilities of human listeners to perceive details in synthetic sound should be accounted for, in a way similar to perceptual audio coding, in order to optimize the algorithms. The success and future of the model-based approach depend on researchers and the results of their work.

The roots of this special issue are in a European project called ALMA (Algorithms for the Modelling of Acoustic Interactions, IST-2001-33059, see http://www-dsp.elet.polimi.it/alma/), in which the guest editors and their research teams collaborated in the period from 2001 to 2004. The goal of the ALMA project was to develop an elegant, general, and unifying strategy for a blockwise design of physical models for sound synthesis. A “divide-and-conquer” approach was taken, in which the elements of the structure are individually modeled and discretized, while their interaction topology is separately designed and implemented in a dynamical and physically sound fashion. As a result, several high-quality demonstrations of virtual musical instruments played in a virtual environment were developed. During the ALMA project, the guest editors realized that this special issue could be created, since the field was very active but there had not been a special issue devoted to it for a long time.

This EURASIP JASP special issue presents ten examples of recent research in model-based sound synthesis. The first two papers are related to keyboard instruments. First, Giordano and Jiang discuss physical modeling synthesis of the piano using the finite-difference approach. Then Välimäki et al. show how to synthesize the sound of the harpsichord based on measurements of a real instrument; an efficient implementation using a visual software synthesis package is given for real-time synthesis.

In the third paper, Trautmann and Rabenstein present a multirate implementation of a vibrating string model that is based on the functional transformation method. In the next paper, Testa et al. investigate the modeling of stiff string behavior. The dispersive wave phenomenon, perceivable as inharmonicity in many string instrument sounds, is studied by deriving different physically inspired models.

In the fifth paper, Karjalainen and Erkut propose a very interesting and general solution to the problem of how to build composite models from digital waveguides and finite-difference time-domain blocks. The next contribution is from Guillemain, who proposes a real-time synthesis model of double-reed wind instruments based on a nonlinear physical model.

The paper by Howard and Rimell provides a viewpoint quite different from the others in this special issue. It deals with the design and implementation of user interfaces for model-based synthesis. An important aspect is the incorporation of tactile feedback into the interface.

Arroabarren and Carlosena have studied the modeling and analysis of human voice production, particularly the vibrato used in the singing voice. Source-filter modeling and sinusoidal modeling are compared to gain a deeper insight into these phenomena. Bensa et al. bring the discussion back to the physical modeling of musical instruments, with particular reference to the piano. They propose a source/resonator model of hammer-string interaction aimed at a realistic production of piano sound. Finally, Glass and Fukudome incorporate a plucked-string model into an audio coder for audio compression and instrument synthesis.

The guest editors would like to thank all the authors for their contributions. We would also like to express our deep gratitude to the reviewers for their diligent efforts in evaluating all submitted manuscripts. We hope that this special issue will stimulate further research work on model-based sound synthesis.

Vesa Välimäki
Augusto Sarti
Matti Karjalainen
Rudolf Rabenstein
Lauri Savioja

Vesa Välimäki was born in Kuorevesi, Finland, in 1968. He received the M.S. degree, the Licentiate of Science degree, and the Doctor of Science degree, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001-2002, he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. In August 2002 he returned to HUT, where he is currently Professor of audio signal processing. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to audio and music. Dr. Välimäki is a Senior Member of the IEEE Signal Processing Society and is a Member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Augusto Sarti, born in 1963, received the “Laurea” degree (1988, cum laude) and the Ph.D. (1993) in electrical engineering from the University of Padua, Italy, with research on nonlinear communication systems. He completed his graduate studies at the University of California at Berkeley, where he spent two years doing research on nonlinear system control and on motion planning of nonholonomic systems. In 1993 he joined the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, where he is now an Associate Professor. His current research interests are in the area of digital signal processing, with particular focus on sound analysis, processing, and synthesis; image processing; video coding; and computer vision. Augusto Sarti has authored over 100 scientific publications. He is leading the Image and Sound Processing Group (ISPG) at the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, which has contributed to numerous national projects and eight European research projects. He is currently coordinating the IST-2001-33059 European Project “ALMA: Algorithms for the Modelling of Acoustic Interactions,” and is co-coordinating the IST-2000-28436 European Project “ORIGAMI: A new paradigm for high-quality mixing of real and virtual.”

Matti Karjalainen was born in Hankasalmi, Finland, in 1946. He received the M.S. and the Dr.Tech. degrees in electrical engineering from the Tampere University of Technology, in 1970 and 1978, respectively. Since 1980 he has been a Professor of acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology, his interest is in audio signal processing, such as DSP for sound reproduction, perceptually based signal processing, as well as music DSP and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition; perceptual auditory modeling and spatial hearing; DSP hardware, software, and programming environments; as well as various branches of acoustics, including musical acoustics and modeling of musical instruments. He has written more than 300 scientific or engineering articles and contributed to organizing several conferences and workshops. Professor Karjalainen is an AES Fellow and a Member of the IEEE (Institute of Electrical and Electronics Engineers), ASA (Acoustical Society of America), EAA (European Acoustics Association), ICMA (International Computer Music Association), ESCA (European Speech Communication Association), and several Finnish scientific and engineering societies.

Rudolf Rabenstein received the “Diplom-Ingenieur” and “Doktor-Ingenieur” degrees in electrical engineering and the “Habilitation” degree in signal processing, all from the University of Erlangen-Nuremberg, Germany, in 1981, 1991, and 1996, respectively. He worked with the Telecommunications Laboratory, University of Erlangen-Nuremberg, from 1981 to 1987. From 1988 to 1991, he was with the Physics Department of the University of Siegen, Germany. In 1991, he returned to the Telecommunications Laboratory of the University of Erlangen-Nuremberg. His research interests are in the fields of multidimensional systems theory, multimedia signal processing, and computer music. Rudolf Rabenstein is the author and coauthor of more than 100 scientific publications, has contributed to various books and book chapters, and holds several patents in audio engineering. He is a Board Member of the School of Engineering of the Virtual University of Bavaria, Germany, and a member of several engineering societies.

Lauri Savioja works as a Professor for the Laboratory of Telecommunications Software and Multimedia in the Helsinki University of Technology (HUT), Finland. He received the Doctor of Science degree in Technology in 1999 from the Department of Computer Science, HUT. His research interests include virtual reality, room acoustics, and human-computer interaction.

EURASIP Journal on Applied Signal Processing 2004:7, 926–933 © 2004 Hindawi Publishing Corporation

Physical Modeling of the Piano

N. Giordano Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN 47907-2036, USA Email: [email protected]

M. Jiang Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN 47907-2036, USA Department of Computer Science, Montana State University, Bozeman, MT 59715, USA Email: [email protected]

Received 21 June 2003; Revised 27 October 2003

A project aimed at constructing a physical model of the piano is described. Our goal is to calculate the sound produced by the instrument entirely from Newton’s laws. The structure of the model is described along with experiments that augment and test the model calculations. The state of the model and what can be learned from it are discussed.

Keywords and phrases: physical modeling, piano.

1. INTRODUCTION the instrument. However, as far as we can tell, certain fea- tures of the model, such as hammer-string impulse func- This paper describes a long term project by our group aimed tions and the transfer function that ultimately relates the at physical modeling of the piano. The theme of this volume, sound pressure to the soundboard motion (and other sim- model based sound synthesis of musical instruments, is quite ilar transfer functions), are taken from experiments on real broad, so it is useful to begin by discussing precisely what instruments. This approach is a powerful way to produce re- we mean by the term “physical modeling.” The goal of our alistic musical tones efficiently, in real time and in a man- project is to use Newton’s laws to describe all aspects of the ner that can be played by a human performer. However, this piano. We aim to use F = ma to calculate the motion of the approach cannot address certain questions. For example, it hammers, strings, and soundboard, and ultimately the sound would not be able to predict the sound that would be pro- that reaches the listener. duced if a radically new type of soundboard was employed, Of course, we are not the first group to take such a New- or if the hammers were covered with a completely differ- ton’s law approach to the modeling of a musical instrument. ent type of material than the conventional felt. The physi- For the piano, there have been such modeling studies of the cal modeling method that we describe in this paper can ad- hammer-string interaction [1, 2, 3, 4, 5, 6, 7, 8, 9], string vi- dress such questions. Hence, we view the ideas and method brations [8, 9, 10], and soundboard motion [11]. (Nice re- embodied in work of Bank and coworkers [20] (and the ref- views of the physics of the piano are given in [12, 13, 14, 15].) erences therein) as complementary to the physical modeling There has been similar modeling of portions of other instru- approach that is the focus of our work. 
ments (such as the guitar [16]), and of several other com- In this paper, we describe the route that we have taken plete instruments, including the xylophone and the timpani to assembling a complete physical model of the piano. [17, 18, 19]. Our work is inspired by and builds on this pre- This complete model is really composed of interacting sub- vious work. models which deal with (1) the motions of the hammers and At this point, we should also mention how our work re- strings and their interaction, (2) soundboard vibrations, and lates to other modeling work, such as the digital waveguide (3) sound generation by the vibrating soundboard. For each approach, which was recently reviewed in [20]. The digital of these submodels we must consider several issues, includ- waveguide method makes extensive use of physics in choos- ing selection and implementation of the computational algo- ing the structure of the algorithm; that is, in choosing the rithm, determination of the values of the many parameters proper filter(s) and delay lines, connectivity, and so forth, that are involved, and testing the submodel. After consider- to properly match and mimic the Newton’s law equations of ing each of the submodels, we then describe how they are motion of the strings, soundboard, and other components of combined to produce a complete computational piano. The Physical Modeling of the Piano 927 quality of the calculated tones is discussed, along with the The issue of listening tests brings us to the question of lessons we have learned from this work. A preliminary and goals, that is, what do we hope to accomplish with such a abbreviated report on this project was given in [21]. modeling project? At one level, we would hope that the cal- culated piano tones are realistic and convincing. The model could then be used to explore what various hypothetical pi- 2. OVERALL STRATEGY AND GOALS anos would sound like. 
For example, one could imagine con- structing a piano with a carbon fiber soundboard, and it One of the first modeling decisions that arises is the question would be very useful to be able to predict its sound ahead of of whether to work in the domain or the time do- time, or to use the model in the design of the new sound- main. In many situations, it is simplest and most instructive board. On a different and more philosophical level, one to work in the frequency domain. For example, an under- might want to ask questions such as “what are the most im- standing of the distribution of normal mode , and portant elements involved in making a piano sound like a pi- the nature of the associated eigenvectors for the body vibra- ano?” We emphasize that it is not our goal to make a real time tions of a violin or a piano soundboard, is very instructive. model, nor do we wish to compete with the tones produced However, we have chosen to base our modeling in the time by other modeling methods, such as sampling synthesis and domain. We believe that this choice has several advantages. digital waveguide modeling [20]. First, the initial excitation—in our case this is the motion of a piano hammer just prior to striking a string—is described most conveniently in the time domain. Second, the interac- 3. STRINGS AND HAMMERS tion between various components of the instrument, such Our model begins with a piano hammer moving freely with a as the strings and soundboard, is somewhat simpler when speed v just prior to making contact with a string (or strings, viewed in the time domain, especially when one considers h since most notes involve more than one string). Hence, we the early “attack” portion of a tone. Third, our ultimate goal ignore the mechanics of the action. 
This mechanics is, of is to calculate the room pressure as a function of time, so it is course, quite important from a player’s perspective, since it appealing to start in the time domain with the hammer mo- determines the touch and feel of the instrument [26]. Nev- tion and stay in the time domain throughout the calculation, ertheless, we will ignore these issues, since (at least to a first ending with the pressure as would be received by a listener. approximation) they are not directly relevant to the compo- Our time domain modeling is based on finite difference cal- sition of a piano tone and we simply take v as an input pa- culations [10] that describe all aspects of the instrument. h rameter. Typical values are in the range 1–4 m/s [9]. A second element of strategy involves the determination When a hammer strikes a string, there is an interaction of the many parameters that are required for describing the force that is a function of the compression of the hammer piano. Ideally, one would like to determine all of these pa- felt, y f . This force determines the initial excitation and is rameters independently, rather than use them as fitting pa- thus a crucial factor in the composition of the resulting tone. rameters when comparing the modeling results to real (mea- Considerable effort has been devoted to understanding the sured) tones. This is indeed possible for all of the parame- hammer-string force [1, 2, 3, 4, 5, 6, 7, 27, 28, 29, 30, 31, ters. For example, dimensional parameters such as the string 32, 33]. Hammer felt is a very complicated material [34], diameters and lengths, soundboard dimensions, and bridge and there is no “first principles” expression for the hammer- positions, can all be measured from a real piano. Likewise, ff string force relation Fh(y f ). 
Much work has assumed a sim- various material properties such as the string sti ness, the ple power law function elastic moduli of the soundboard, and the acoustical proper- ties of the room in which the numerical piano is located, are p Fh y f = F0 y ,(1) well known from very straightforward measurements. For a f few quantities, most notably the force-compression charac- where the exponent p is typically in the range 2.5–4 and F0 teristics of the piano hammers, it is necessary to use separate is an overall . This power law form seems to be at (and independent) experiments. least qualitatively consistent with many experiments and we This brings us to a third element of our modeling therefore used (1) in our initial modeling calculations. strategy—the problem of how to test the calculations. The While (1) has been widely used to analyze and inter- final output is the sound at the listener, so one could “test” pret experiments, and also in previous modeling work, it the model by simply evaluating the sounds via listening tests. has been known for some time that the force-compression However, it is very useful to separately test the submod- characteristic of most real piano hammers is not a simple els. For example, the portion of the model that deals with reversible function [7, 27, 28, 29, 30]. Ignoring the hystere- soundboard vibrations can be tested by comparing its pre- sis has seemed reasonable, since the magnitude of the ir- dictions for the acoustic impedance with direct measure- reversibility is often found to be small. Figure 1 shows the ments [11, 22, 23, 24]. Likewise, the room-soundboard com- force-compression characteristic for a particular hammer (a putation can be compared with studies of sound production Steinway hammer from the note middle C) measured in by a harmonically driven soundboard [25]. This approach, two different ways. 
In the type I measurement, the hammer involving tests against specially designed experiments, has struck a stationary force sensor and the resulting force and proven to be extremely valuable. felt compression were measured as described in [31]. We see 928 EURASIP Journal on Applied Signal Processing

cles for the felt. There is considerable hysteresis during these 20 cycles, much more than might have been expected from the Hammer force characteristics type I result. The overall magnitude of the type II force is also Hammer C4 somewhat smaller; the hammer is effectively “softer” under the type II conditions. Since the type II arrangement is the one found in real piano, it is important to use this hammer- force characteristic in modeling.

(N) We have chosen to model our hysteretic type II hammer

h Type I exp. F 10 measurements following the proposal of Stulov [30, 33]. He has suggested the form Fh y f (t) −∞    = F0 g y f (t) − 0 g y f (t ) exp − (t − t )/τ0 dt . t Type II exp. (2) 0 00.20.40.6 Here, τ is a characteristic (memory) time scale associated y f (mm) 0 with the felt, 0 is a measure of the magnitude of the hystere- sis, and ( ) is the variation of the compression with time. Figure 1: Force-compression characteristics measured for a partic- y f t ular piano hammer measured in two different ways. In the type I ex- In other words, (2) says that the felt “remembers” its pre- periment (dotted curve), the hammer struck a stationary force sen- vious compression history over a time of order τ0, and that sor and the resulting force, Fh, and felt compression, y f , were mea- the force is reduced according to how much the felt has been sured. The initial hammer velocity was approximately 1 m/s. The compressed during that period. The inherent nonlinearity of solid curve is the measured force-compression relation obtained in the hammer is specified by the function g(z); Stulov took this a type II measurement, in which the same hammer impacted a pi- to be a power law ano string. This behavior is described qualitatively by (2), with pa- = = × 13  = = × −5 p rameters p 3.5, F0 1.0 10 N, 0 0.90, and τ0 1.0 10 g(z) = z . (3) second. The dashed arrows indicate compression/decompression branches. Stulov has compared (2) to measurements with real ham- mers and reported very good agreement using τ0, 0, p,and F0 as fitting parameters. Our own tests of (2) have not shown that for a particular value of the felt compression, y f , the such good agreement; we have found that it provides only a force is larger during the compression of the hammer- qualitative (and in some cases semiquantitative) description string collision than during decompression. However, this of the hysteresis shown in Figure 1 [35]. 
Nevertheless, it is ff di erence is relatively small, generally no more than 10% of currently the best mathematical description available for the the total force. Provided that this hysteresis is ignored, the hysteresis, and we have employed it in our modeling calcula- type I result is described reasonably well by the power law tions. function (1)withp ≈ 3. However, we will see below that (1) Our string calculations are based on the equation of mo- is not adequate for our modeling work, and this has led us to tion [8, 10, 36] consider other forms for F . h In order to shed more light on the hammer-string force, ∂2 y ∂2 y ∂4 y ∂y ∂3 y we developed a new experimental approach, which we refer = c2 −  − α + α ,(4) ∂t2 s ∂x2 ∂x4 1 ∂t 2 ∂t3 to as a type II experiment, in which the force and felt com- pression are measured as the hammer impacts on a string where y(x, t) is the transverse string displacement at time t [32, 35]. Since the string rebounds in response to the ham- ≡ mer, the hammer-string contact time in this case is consider- and position x along the string. cs µ/T is the wave speed ably longer (by a factor of approximately 3) than in the type I for an ideal string (with stiffness and damping ignored), with measurement. The force-compression relation found in this T the tension and µ the mass per unit length of the string.  type II measurement is also shown in Figure 1.Incontrastto When the parameters , α1,andα2 are zero, this is just the the type I measurements, the type II results for Fh(y)donot simple wave equation. Equation (4) describes only the po- consist of two simple branches (one for compression and an- larization mode for which the string displacement is parallel other for decompression). Instead, the type II result exhibits to the initial velocity of the hammer. The other transverse “loops,” which arise for the following reason. 
Physical Modeling of the Piano

When the hammer first contacts the string, it excites pulses that travel to the ends of the string, are reflected at the ends, and then return. These pulses return while the hammer is still in contact with the string, and since they are inverted by the reflection, they cause an extra series of compression/decompression cycles.

… mode and also the longitudinal mode are both ignored; experiments have shown that both of these modes are excited in real piano strings [37, 38, 39], but we will leave them for future modeling work. The term in (4) that is proportional to ε arises from the stiffness of the string. It turns out that ε = rs²Es/(ρs cs²), where rs, Es, and ρs are the radius, Young's modulus, and density of the string, respectively [9, 36]. For typical piano strings, ε is of order 10⁻⁴, so the stiffness term in (4) is small, but it cannot be neglected, as it produces the well-known effect of stretched octaves [36]. Damping is accounted for with the terms involving α1 and α2; one of these terms is proportional to the string velocity, while the other is proportional to ∂³y/∂t³. This combination makes the damping dependent on frequency in a manner close to that observed experimentally [8, 10].

Our numerical treatment of the string motion employs a finite difference formulation in which both time t and position x are discretized in units ∆ts and ∆xs [8, 9, 10, 40]. The string displacement is then y(x, t) ≡ y(i∆xs, n∆ts) ≡ y(i, n). If the derivatives in (4) are written in finite difference form, this equation can be rearranged to express the string displacement at each spatial location i at time step n + 1 in terms of the displacement at previous time steps, as described by Chaigne and Askenfelt [8, 10]. The equation of motion (4) does not contain the hammer force. This is included by the addition of a term on the right-hand side proportional to Fh, which acts at the hammer strike point. Since the hammer has a finite width, it is customary to spread this force over a small length of the string [8]. So far as we know, the details of how this force is distributed have never been measured; fortunately, our modeling results are not very sensitive to this factor (so long as the effective hammer width is qualitatively reasonable). With this approach to the string calculation, the need for numerical stability together with the desired frequency range requires that each string be treated as 50–100 vibrating numerical elements [8, 10].

4. THE SOUNDBOARD

Wood is a complicated material [41]. Soundboards are assembled from wood that is "quarter sawn," which means that two of the principal axes of the elastic constant tensor lie in the plane of the board. The equation of motion for such a thin orthotropic plate is [11, 22, 23, 42]

ρb hb ∂²z/∂t² = −Dx ∂⁴z/∂x⁴ − (Dx νy + Dy νx + 4Dxy) ∂⁴z/∂x²∂y² − Dy ∂⁴z/∂y⁴ + Fs(x, y) − β ∂z/∂t,    (5)

where the rigidity factors are

Dx = hb³Ex/[12(1 − νx νy)],
Dy = hb³Ey/[12(1 − νx νy)],    (6)
Dxy = hb³Gxy/12.

Here, our board lies in the x-y plane and z is its displacement. (These x and y directions are, of course, not the same as the x and y coordinates used in describing the string motion.) The soundboard coordinates x and y run perpendicular and parallel to the grain of the board. Ex and νx are Young's modulus and Poisson's ratio for the x direction, and so forth for y; Gxy is the shear modulus; hb is the board thickness and ρb is its density. The values of all elastic constants were taken from [41]. In order to model the ribs and bridges, the thickness and rigidity factors are position dependent (since these factors are different at the ribs and bridges than on the "bare" board), as described in [11]. There are also some additional terms that enter the equation of motion (5) at the ends of bridges [11, 17, 18, 43]. Fs(x, y) is the force from the strings on the bridge. This force acts at the appropriate bridge location; it is proportional to the component of the string tension perpendicular to the plane of the board, and is calculated from the string portion of the model. Finally, we include a loss term proportional to the parameter β [11]. The physical origin of this term involves elastic losses within the board. We have not attempted to model this physics according to Newton's laws, but have simply chosen a value of β which yields a quality factor for the soundboard modes similar to that observed experimentally [11, 24].¹ Finally, we note that the soundboard "acts back" on the strings, since the bridge moves and the strings are attached to the bridge. Hence, the interaction of strings in a unison group, and also sympathetic string vibrations (with the dampers disengaged from the strings), are included in the model.

For the solution of (5), we again employed a finite difference algorithm. The space dimensions x and y were discretized, both in steps of size ∆xb; this spatial step need not be related to the step size for the string, ∆xs. As in our previous work on soundboard modeling [11], we chose ∆xb = 2 cm, since this is just small enough to capture the structure of the board, including the widths of the ribs and bridges. Hence, the board was modeled as ∼100 × 100 vibrating elements.

The behavior of our numerical soundboard can be judged by calculations of the mechanical impedance, Z, as defined by

Z = F/vb,    (7)

where F is an applied force and vb is the resulting soundboard velocity. Here, we assume that F is a (single frequency) force applied at a point on the bridge and vb is measured at the same point. Figure 2 shows results calculated from our model [11] for the soundboard from an upright piano. Also shown are measurements for a real upright soundboard (with the same dimensions, bridge positions, etc., as in the model). The agreement is quite acceptable, especially considering that parameters such as the dimensions of the soundboard, the position and thickness of the ribs and bridges, and the elastic constants of the board were taken from either direct measurements or handbook values (e.g., Young's modulus).

¹In principle, one might expect the soundboard losses to be frequency dependent, as found for the string. At present there is no good experimental data on this question, so we have chosen the simplest possible model with just a single loss term in (5).
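The explicit finite-difference string update described above (displacement at step n + 1 computed from the two previous time levels, with a second spatial difference for tension and a fourth difference for stiffness) can be sketched as follows. This is a minimal illustration, not the authors' code: the grid size, Courant number, stiffness coefficient, and initial condition are made-up values, and the damping terms and hammer force are omitted.

```python
import numpy as np

# Leapfrog update for a slightly stiff, lossless string (cf. eq. (4)).
# Illustrative parameters only; not taken from the paper.
N = 100                 # interior grid points (the paper uses 50-100 per string)
courant = 0.5           # c*dt/dx, kept below 1 for stability
r2 = courant ** 2
eps_hat = 1e-4          # dimensionless stiffness coefficient, of order 1e-4

y_now = np.zeros(N + 1)         # y(i, n)
y_now[N // 8] = 1e-3            # crude initial displacement near a "strike point"
y_prev = y_now.copy()           # y(i, n-1): start from rest (zero velocity)

def step(y_now, y_prev):
    """Return y(i, n+1). Ends (and, for brevity, the points next to them)
    are held fixed, a clamped-end approximation."""
    y_next = np.zeros_like(y_now)
    i = np.arange(2, N - 1)
    d2 = y_now[i + 1] - 2.0 * y_now[i] + y_now[i - 1]          # tension term
    d4 = (y_now[i + 2] - 4.0 * y_now[i + 1] + 6.0 * y_now[i]   # stiffness term
          - 4.0 * y_now[i - 1] + y_now[i - 2])
    y_next[i] = 2.0 * y_now[i] - y_prev[i] + r2 * (d2 - eps_hat * d4)
    return y_next

for _ in range(500):            # advance 500 time steps
    y_now, y_prev = step(y_now, y_prev), y_now
```

With the Courant number below 1 and the tiny stiffness coefficient, the scheme stays stable and the initial pulse simply travels and reflects, as the text describes.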

Figure 2: Calculated (solid curve) and measured (dotted curve) mechanical impedance for an upright piano soundboard. Here, the force was applied and the board velocity was measured at the point where the string for middle C crosses the bridge. Results from [11, 24].

5. THE ROOM

Our time domain room modeling follows the work of Botteldooren [44, 45]. We begin with the usual coupled equations for the velocity and pressure in the room:

ρa ∂vx/∂t = −∂p/∂x,
ρa ∂vy/∂t = −∂p/∂y,
ρa ∂vz/∂t = −∂p/∂z,    (8)
∂p/∂t = −ρa ca² (∂vx/∂x + ∂vy/∂y + ∂vz/∂z),

where p is the pressure, the velocity components are vx, vy, and vz, ρa is the density, and ca is the speed of sound in air. This family of equations is similar in form to an electromagnetic problem, and much is known about how to deal with it numerically. We employ a finite difference approach in which staggered grids in both space and time are used for the pressure and velocity. Given a time step ∆tr, the pressure is computed at times n∆tr while the velocity is computed at times (n + 1/2)∆tr. A similar staggered grid is used for the space coordinates, with the pressure calculated on the grid i∆xr, j∆xr, k∆xr, while vx is calculated on the staggered grid (i + 1/2)∆xr, j∆xr, k∆xr. The grids for vy and vz are arranged in a similar manner, as explained in [44, 45].

Sound is generated in this numerical room by the vibration of the soundboard. We situate the soundboard from the previous section on a plane perpendicular to the z direction in the room, approximately 1 m from the nearest parallel wall (i.e., the floor). At each time step the velocity vz of the room air at the surface of the soundboard is set to the calculated soundboard velocity at that instant, as obtained from the soundboard calculation.

The room is taken to be a rectangular box with the same acoustical properties for all 6 walls. The walls of the room are modeled in terms of their acoustic impedance, Z, with

p = Z vn,    (9)

where vn is the component of the (air) velocity normal to the wall [46]. Measurements of Z for a number of materials [47] have found that it is typically frequency dependent, with the form

Z(ω) ≈ Z0 − iZ1/ω,    (10)

where ω is the angular frequency. Incorporating this frequency domain expression for the acoustic impedance into our time domain treatment was done in the manner described in [45]. The time step for the room calculation was ∆tr = 1/22050 ≈ 4.5 × 10⁻⁵ s, as explained in the next section.

The choice of spatial step size ∆xr was then influenced by two considerations. First, in order for the finite difference algorithm to be numerically stable in three dimensions, one must have ∆xr/(√3 ∆tr) > ca. Second, it is convenient for the spatial steps for the soundboard and room to be commensurate. In the calculations described below, the room step size was ∆xr = 4 cm, that is, twice the soundboard step size. When using the calculated soundboard velocity to obtain the room velocity at the soundboard surface, we averaged over 4 soundboard grid points for each room grid point. Typical numerical rooms were 3 × 4 × 4 m³, and thus contained ∼10⁶ finite difference elements.
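A staggered-grid pressure-velocity update in the spirit of (8) can be sketched as follows. This is a toy illustration, not the authors' implementation: the box is small, the walls are taken as perfectly rigid (the impedance condition of (9) and (10) is omitted), and the grid size, step count, and initial impulse are arbitrary.

```python
import numpy as np

# Toy FDTD room: pressure p on the integer grid, velocity components on
# staggered half-integer grids (cf. eq. (8)). Rigid walls: the normal
# velocity on every boundary face is held at zero.
n = 20
rho_a, c_a = 1.2, 343.0        # air density (kg/m^3) and sound speed (m/s)
dx = 0.04                      # 4 cm spatial step, as in the paper
dt = dx / (2.0 * c_a)          # safely below the 3D limit dx/(sqrt(3)*c_a)

p = np.zeros((n, n, n))
p[n // 2, n // 2, n // 2] = 1.0        # impulsive initial pressure
vx = np.zeros((n + 1, n, n))           # staggered in x; vx[0] = vx[n] = 0
vy = np.zeros((n, n + 1, n))
vz = np.zeros((n, n, n + 1))

coef_v = dt / (rho_a * dx)
coef_p = rho_a * c_a ** 2 * dt / dx
for _ in range(50):
    # half-step: update velocities from pressure gradients (interior faces)
    vx[1:-1, :, :] -= coef_v * (p[1:, :, :] - p[:-1, :, :])
    vy[:, 1:-1, :] -= coef_v * (p[:, 1:, :] - p[:, :-1, :])
    vz[:, :, 1:-1] -= coef_v * (p[:, :, 1:] - p[:, :, :-1])
    # full step: update pressure from the velocity divergence
    p -= coef_p * ((vx[1:, :, :] - vx[:-1, :, :])
                   + (vy[:, 1:, :] - vy[:, :-1, :])
                   + (vz[:, :, 1:] - vz[:, :, :-1]))
```

With rigid walls the divergence sums telescope to the boundary velocities, so the total pressure in the box is conserved exactly, which is a convenient sanity check on the staggered update.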
Figure 3 shows results for the sound generation by an upright soundboard. Here, the soundboard was driven harmonically at the point where the string for middle C contacts the bridge, and we plot the sound pressure normalized by the board velocity at the driving point [25]. It is seen that the model results compare well with the experiments. This provides a check on both the soundboard and the room models.

6. PUTTING IT ALL TOGETHER

Our model involves several distinct but coupled subsystems—the hammers/strings, the soundboard, and the room—and it is useful to review how they fit together computationally. The calculation begins by giving some initial velocity to a particular hammer. This hammer then strikes a string (or strings), and they interact through either (1) or (2). This sets the string(s) for that note into motion, and these in turn act on the bridge and soundboard.
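The nesting of the three solvers under the commensurate time steps of (11)—4 string steps per room step, and 6 soundboard steps per string step—can be sketched as follows. The advance_* functions are placeholders for the finite-difference updates, not code from the paper.

```python
# Sketch of the coupled update hierarchy with commensurate time steps
# (cf. eq. (11)): dt_room = 1/22050 s, dt_string = dt_room/4,
# dt_board = dt_string/6.
DT_ROOM = 1.0 / 22050.0
DT_STRING = DT_ROOM / 4
DT_BOARD = DT_STRING / 6

def advance_board(dt):
    pass  # placeholder: one soundboard finite-difference step

def advance_string(dt):
    pass  # placeholder: one string step (includes the hammer force)

def advance_room(dt):
    pass  # placeholder: one room FDTD step

board_steps = 0
for _ in range(22050):              # one second of simulated sound
    for _ in range(4):              # 4 string steps per room step
        for _ in range(6):          # 6 soundboard steps per string step
            advance_board(DT_BOARD)
            board_steps += 1
        advance_string(DT_STRING)
    advance_room(DT_ROOM)

# 22050 * 4 * 6 = 529200 soundboard steps per simulated second
```

The point of the hierarchy is visible in the loop structure: the expensive, finely time-stepped soundboard runs innermost, while the large room grid is advanced only at the audio rate.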

As we have already mentioned, the vibrations of each component of our model are calculated with a finite difference algorithm, each with an associated time step. Since the systems are coupled—that is, the strings drive the soundboard, the soundboard acts back on the strings, and the soundboard drives the room—it would be computationally simpler to use the same value of the time step for all three subsystems. However, the equation of motion for the soundboard is highly dispersive, and the stability requirements demand a much smaller time step for the soundboard than is needed for string and room simulations. Given the large number of room elements, this would greatly (and unnecessarily) slow down the calculation. We have therefore chosen instead to make the various time steps commensurate, with

∆tr = 1/22050 s,
∆ts = ∆tr/4,    (11)
∆tb = ∆ts/6,

where the subscripts correspond to the room (r), string (s), and soundboard (b). To explain this hierarchy, we first note that the room time step is chosen to be compatible with common audio hardware and software; 1/∆tr is commensurate with the data rates commonly used in CD sound formats. We then see that each room time step contains 4 string time steps; that is, the string algorithm makes 4 iterations for each iteration of the room model. Likewise, each string time step contains 6 soundboard steps.

The overall computational speed is currently somewhat less than "real time." With a typical personal computer (clock speed 1 GHz), a 1 minute simulation requires approximately 30 minutes of computer time. Of course, this gap will narrow in the future in accord with Moore's law. In addition, the model should transfer easily to a cluster (i.e., multi-CPU) machine. We have also explored an alternative approach to the room modeling involving ray tracing [48]. Ray tracing allows one to express the relationship between soundboard velocity and sound pressure as a multiparameter map, involving approximately 10⁴ parameters. The values of these parameters can be precalculated and stored, resulting in about an order of magnitude speed-up in the calculation as compared to the room algorithm described above.

Figure 3: Results for the sound pressure normalized by the soundboard velocity for an upright piano soundboard: calculated (solid curve) and measured (dotted curve). The board was driven at the point where the string for middle C crosses the bridge. Results from [25].

7. ANALYSIS OF THE RESULTS: WHAT HAVE WE LEARNED AND WHERE DO WE GO NEXT?

In the previous section, we saw that a real-time Newton's law simulation of the piano is well within reach. While such a simulation would certainly be interesting, it is not a primary goal of our work. We instead wish to use the modeling to learn about the instrument. With that in mind, we now consider the quality of the tones calculated with the current version of the model.

In our initial modeling, we employed power law hammers described by (1) with parameters based on type I hammer experiments by our group [31]. The results were disappointing—it is hard to accurately describe the tones in words, but they sounded distinctly plucked and somewhat metallic. While we cannot include our calculated sounds as part of this paper, they are available on our website http://www.physics.purdue.edu/piano. After many modeling calculations, we came to the conclusion that the hammer model—for example, the power law description (1)—was the problem. Note that we do not claim that power law hammers must always give unsatisfactory results. Our point is that when the power law parameters are chosen to fit the type I behavior of real hammers, the calculated tones are poor. It is certainly possible (and indeed, likely) that power law parameters that will yield good piano tones can be found. However, based on our experience, it seems that these parameters should be viewed as fitting parameters, as they may not accurately describe any real hammers.

This led us to the type II hammer experiments described above, and to a description of the hammer-string force in terms of the Stulov function (2), with parameters (τ0, ε0, etc.) taken from these type II experiments [35]. The results were much improved. While they are not yet "Steinway quality," it is our opinion that the calculated tones could be mistaken for a real piano. In that sense, they pass a sort of acoustical Turing test. Our conclusion is that the hammers are an essential part of the instrument. This is hardly a revolutionary result. However, based on our modeling, we can also make a somewhat stronger statement: in order to obtain a realistic piano tone, the modeling should be based on hammer parameters observed in type II measurements, with the hysteresis included in the model.

There are a number of issues that we plan to address in the future. (1) The hammer portion of the model still needs attention. Our experiments [35] indicate that while the Stulov function does provide a qualitative description of the hammer force hysteresis, there are significant quantitative differences. It may be necessary to develop a better functional description to replace the Stulov form. (2) As it currently stands, our string model includes only one polarization mode, corresponding to vibrations parallel to the initial hammer velocity. It is well known that the other transverse polarization mode can be important [37]. This can be readily included, but will require a more general soundboard model, since the two transverse modes couple through the motion of the bridge. (3) The soundboard of a real piano is supported by a case. Measurements in our laboratory indicate that the case acceleration can be as large as 5% or so of the soundboard acceleration, so the sound emitted by the case is considerable. (4) We plan to refine the room model. Our current room model is certainly a very crude approximation to a realistic room. Real rooms have wall coverings of various types (with differing values of the acoustic impedances), and contain chairs and other objects. At our current level of sophistication, it appears that the hammers are more of a limitation than the room model, but this may well change as the hammer modeling is improved.

In conclusion, we have made good progress in developing a physical model of the piano. It is now possible to produce realistic tones using Newton's laws with realistic and independently determined instrument parameters. Further improvements of the model seem quite feasible. We believe that physical modeling can provide new insights into the piano, and that similar approaches can be applied to other instruments.

ACKNOWLEDGMENTS

We thank P. Muzikar, T. Rossing, A. Tubis, and G. Weinreich for many helpful and critical discussions. We also are indebted to A. Korty, J. Winans II, J. Millis, S. Dietz, J. Jourdan, J. Roberts, and L. Reuff for their contributions to our piano studies. This work was supported by the National Science Foundation (NSF) through Grant PHY-9988562.

REFERENCES

[1] D. E. Hall, "Piano string excitation in the case of small hammer mass," Journal of the Acoustical Society of America, vol. 79, no. 1, pp. 141–147, 1986.
[2] D. E. Hall, "Piano string excitation II: General solution for a hard narrow hammer," Journal of the Acoustical Society of America, vol. 81, no. 2, pp. 535–546, 1987.
[3] D. E. Hall, "Piano string excitation III: General solution for a soft narrow hammer," Journal of the Acoustical Society of America, vol. 81, no. 2, pp. 547–555, 1987.
[4] D. E. Hall and A. Askenfelt, "Piano string excitation V: Spectra for real hammers and strings," Journal of the Acoustical Society of America, vol. 83, no. 4, pp. 1627–1638, 1988.
[5] D. E. Hall, "Piano string excitation. VI: Nonlinear modeling," Journal of the Acoustical Society of America, vol. 92, no. 1, pp. 95–105, 1992.
[6] H. Suzuki, "Model analysis of a hammer-string interaction," Journal of the Acoustical Society of America, vol. 82, no. 4, pp. 1145–1151, 1987.
[7] X. Boutillon, "Model for piano hammers: Experimental determination and digital simulation," Journal of the Acoustical Society of America, vol. 83, no. 2, pp. 746–754, 1988.
[8] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. I. A physical model for a struck string using finite difference method," Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.
[9] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters," Journal of the Acoustical Society of America, vol. 95, no. 3, pp. 1631–1640, 1994.
[10] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.
[11] N. Giordano, "Simple model of a piano soundboard," Journal of the Acoustical Society of America, vol. 102, no. 2, pp. 1159–1168, 1997.
[12] H. A. Conklin Jr., "Design and tone in the mechanoacoustic piano. Part I. Piano hammers and tonal effects," Journal of the Acoustical Society of America, vol. 99, no. 6, pp. 3286–3296, 1996.
[13] H. Suzuki and I. Nakamura, "Acoustics of pianos," Appl. Acoustics, vol. 30, pp. 147–205, 1990.
[14] H. A. Conklin Jr., "Design and tone in the mechanoacoustic piano. Part II. Piano structure," Journal of the Acoustical Society of America, vol. 100, no. 2, pp. 695–708, 1996.
[15] H. A. Conklin Jr., "Design and tone in the mechanoacoustic piano. Part III. Piano strings and scale design," Journal of the Acoustical Society of America, vol. 100, no. 3, pp. 1286–1298, 1996.
[16] B. E. Richardson, G. P. Walker, and M. Brooke, "Synthesis of guitar tones from fundamental parameters relating to construction," Proceedings of the Institute of Acoustics, vol. 12, no. 1, pp. 757–764, 1990.
[17] A. Chaigne and V. Doutaut, "Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars," Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.
[18] V. Doutaut, D. Matignon, and A. Chaigne, "Numerical simulations of xylophones. II. Time-domain modeling of the resonator and of the radiated sound pressure," Journal of the Acoustical Society of America, vol. 104, no. 3, pp. 1633–1647, 1998.
[19] L. Rhaouti, A. Chaigne, and P. Joly, "Time-domain modeling and numerical simulation of a kettledrum," Journal of the Acoustical Society of America, vol. 105, no. 6, pp. 3545–3562, 1999.
[20] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, "Physically informed signal processing methods for piano sound synthesis: a research overview," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.
[21] N. Giordano, M. Jiang, and S. Dietz, "Experimental and computational studies of the piano," in Proc. 17th International Congress on Acoustics, vol. 4, Rome, Italy, September 2001.
[22] J. Kindel and I.-C. Wang, "Modal analysis and finite element analysis of a piano soundboard," in Proc. 5th International Modal Analysis Conference, pp. 1545–1549, Union College, Schenectady, NY, USA, 1987.
[23] J. Kindel, "Modal analysis and finite element analysis of a piano soundboard," M.S. thesis, University of Cincinnati, Cincinnati, Ohio, USA, 1989.

[24] N. Giordano, "Mechanical impedance of a piano soundboard," Journal of the Acoustical Society of America, vol. 103, no. 4, pp. 2128–2133, 1998.
[25] N. Giordano, "Sound production by a vibrating piano soundboard: Experiment," Journal of the Acoustical Society of America, vol. 104, no. 3, pp. 1648–1653, 1998.
[26] A. Askenfelt and E. V. Jansson, "From touch to string vibrations. II. The motion of the key and hammer," Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2383–2393, 1991.
[27] T. Yanagisawa, K. Nakamura, and H. Aiko, "Experimental study on force-time curve during the contact between hammer and piano string," Journal of the Acoustical Society of Japan, vol. 37, pp. 627–633, 1981.
[28] T. Yanagisawa and K. Nakamura, "Dynamic compression characteristics of piano hammer," Transactions of Musical Acoustics Technical Group Meeting of the Acoustic Society of Japan, vol. 1, pp. 14–17, 1982.
[29] T. Yanagisawa and K. Nakamura, "Dynamic compression characteristics of piano hammer felt," Journal of the Acoustical Society of Japan, vol. 40, pp. 725–729, 1984.
[30] A. Stulov, "Hysteretic model of the grand piano hammer felt," Journal of the Acoustical Society of America, vol. 97, no. 4, pp. 2577–2585, 1995.
[31] N. Giordano and J. P. Winans II, "Piano hammers and their force compression characteristics: does a power law make sense?," Journal of the Acoustical Society of America, vol. 107, no. 4, pp. 2248–2255, 2000.
[32] N. Giordano and J. P. Millis, "Hysteretic behavior of piano hammers," in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 237–240, Perugia, Umbria, Italy, September 2001.
[33] A. Stulov and A. Mägi, "Piano hammer: Theory and experiment," in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 215–220, Perugia, Umbria, Italy, September 2001.
[34] J. I. Dunlop, "Nonlinear vibration properties of felt pads," Journal of the Acoustical Society of America, vol. 88, no. 2, pp. 911–917, 1990.
[35] N. Giordano and J. P. Millis, "Using physical modeling to learn about the piano: New insights into the hammer-string force," in Proc. International Congress on Acoustics, S. Furui, H. Kanai, and Y. Iwaya, Eds., pp. III–2113, Kyoto, Japan, April 2004.
[36] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1991.
[37] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.
[38] M. Podlesak and A. R. Lee, "Dispersion of waves in piano strings," Journal of the Acoustical Society of America, vol. 83, no. 1, pp. 305–317, 1988.
[39] N. Giordano and A. J. Korty, "Motion of a piano string: longitudinal vibrations and the role of the bridge," Journal of the Acoustical Society of America, vol. 100, no. 6, pp. 3899–3908, 1996.
[40] N. Giordano, Computational Physics, Prentice-Hall, Upper Saddle River, NJ, USA, 1997.
[41] V. Bucur, Acoustics of Wood, CRC Press, Boca Raton, Fla, USA, 1995.
[42] S. G. Lekhnitskii, Anisotropic Plates, Gordon and Breach Science Publishers, New York, NY, USA, 1968.
[43] J. W. S. Rayleigh, Theory of Sound, Dover, New York, NY, USA, 1945.
[44] D. Botteldooren, "Acoustical finite-difference time-domain simulation in a quasi-Cartesian grid," Journal of the Acoustical Society of America, vol. 95, no. 5, pp. 2313–2319, 1994.
[45] D. Botteldooren, "Finite-difference time-domain simulation of low-frequency room acoustic problems," Journal of the Acoustical Society of America, vol. 98, no. 6, pp. 3302–3308, 1995.
[46] P. M. Morse and K. U. Ingard, Theoretical Acoustics, Princeton University Press, Princeton, NJ, USA, 1986.
[47] L. L. Beranek, "Acoustic impedance of commercial materials and the performance of rectangular rooms with one treated surface," Journal of the Acoustical Society of America, vol. 12, pp. 14–23, 1940.
[48] M. Jiang, "Room acoustics and physical modeling of the piano," M.S. thesis, Purdue University, West Lafayette, Ind, USA, 1999.

N. Giordano obtained his Ph.D. from Yale University in 1977, and has been at the Department of Physics at Purdue University since 1979. His research interests include mesoscopic and nanoscale physics, computational physics, and musical acoustics. He is the author of the textbook Computational Physics (Prentice-Hall, 1997). He also collects and restores antique pianos.

M. Jiang has a B.S. degree in physics (1997) from Peking University, China, and M.S. degrees in both physics and computer science (1999) from Purdue University. Some of the work described in this paper was part of his physics M.S. thesis. After graduation, he worked as a software engineer for two years, developing Unix kernel software and device drivers. In 2002, he moved to Bozeman, Montana, where he is now pursuing a Ph.D. in computer science at Montana State University. Minghui's current research interests include the design of algorithms, computational geometry, and biological modeling and bioinformatics.

EURASIP Journal on Applied Signal Processing 2004:7, 934–948 — © 2004 Hindawi Publishing Corporation

Sound Synthesis of the Harpsichord Using a Computationally Efficient Physical Model

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: vesa.valimaki@hut.fi

Henri Penttinen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: henri.penttinen@hut.fi

Jonte Knif
Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 00251 Helsinki, Finland
Email: jknif@siba.fi

Mikael Laurson Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 00251 Helsinki, Finland Email: laurson@siba.fi

Cumhur Erkut Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland Email: cumhur.erkut@hut.fi

Received 24 June 2003; Revised 28 November 2003

A sound synthesis algorithm for the harpsichord has been developed by applying the principles of digital waveguide modeling. A modification to the loss filter of the string model is introduced that allows more flexible control of decay rates of partials than is possible with a one-pole digital filter, which is the usual choice for the loss filter. A version of the commuted waveguide synthesis approach is used, where each tone is generated with a parallel combination of the string model and a second-order resonator that are excited with a common excitation signal. The second-order resonator, previously proposed for this purpose, approximately simulates the beating effect appearing in many harpsichord tones. The characteristic key-release thump terminating harpsichord tones is reproduced by triggering a sample that has been extracted from a recording. A digital filter model for the soundboard has been designed based on recorded bridge impulse responses of the harpsichord. The output of the string models is injected into the soundboard filter, which imitates the reverberant nature of the soundbox and, particularly, the ringing of the short parts of the strings behind the bridge. Keywords and phrases: acoustic signal processing, digital filter design, musical acoustics.

1. INTRODUCTION

Sound synthesis is particularly interesting for acoustic keyboard instruments, since they are usually expensive and large and may require amplification during performances. Electronic versions of these instruments benefit from the fact that keyboard controllers using MIDI are commonly available and fit for use. Digital pianos imitating the sound and features of grand pianos are among the most popular electronic instruments. Our current work focuses on the imitation of the harpsichord, which is expensive, relatively rare, but is still commonly used in music from the Renaissance and the baroque era. Figure 1 shows the instrument used in this study. It is a two-manual harpsichord that contains three individual sets of strings, two bridges, and has a large soundboard.

Figure 1: The harpsichord used in the measurements has two manuals, three string sets, and two bridges. The picture was taken during the tuning of the instrument in the anechoic chamber.

Instead of wavetable and sampling techniques that are popular in digital instruments, we apply modeling techniques to design an electronic instrument that sounds nearly identical to its acoustic counterpart and faithfully responds to the player's actions, just as an acoustic instrument. We use the modeling principle called commuted waveguide synthesis [1, 2, 3], but have modified it, because we use a digital filter to model the soundboard response. Commuted synthesis uses the basic property of linear systems that in a cascade of transfer functions their ordering can be changed without affecting the overall transfer function. This way, the complications in the modeling of the soundboard resonances extracted from a recorded tone can be hidden in the input sequence. In the original form of commuted synthesis, the input signal contains the contribution of the excitation mechanism—the quill plucking the string—and that of the soundboard with all its vibrating modes [4]. In the current implementation, the input samples of the string models are short (less than half a second) and contain only the initial part of the soundboard response; the tail of the soundboard response is reproduced with a reverberation algorithm.

Digital waveguide modeling [5] appears to be an excellent tool for the synthesis of harpsichord tones. A strong argument supporting this view is that tones generated using the basic Karplus-Strong algorithm [6] are reminiscent of the harpsichord for many listeners.¹ This synthesis technique has been shown to be a simplified version of a waveguide string model [5, 7]. However, this does not imply that realistic harpsichord synthesis is easy. A detailed imitation of the properties of a fine instrument is challenging, even though the starting point is very promising. Careful modifications to the algorithm and proper signal analysis and calibration routines are needed for a natural-sounding synthesis.

The new contributions to stringed-instrument models include a sparse high-order loop filter and a soundboard model that consists of the cascade of a shaping filter and a common reverb algorithm. The sparse loop filter consists of a conventional one-pole filter and a feedforward comb filter inserted in the feedback loop of a basic string model. Methods to calibrate these parts of the synthesis algorithm are proposed.

This paper is organized as follows. Section 2 gives a short overview of the construction and acoustics of the harpsichord. In Section 3, signal-processing techniques for synthesizing harpsichord tones are suggested. In particular, the new loop filter is introduced and analyzed. Section 4 concentrates on calibration methods to adjust the parameters according to recordings. The implementation of the synthesizer using a block-based graphical programming language is described in Section 5, where we also discuss the computational complexity and potential applications of the implemented system. Section 6 contains conclusions and suggests ideas for further research.

¹The Karplus-Strong algorithm manages to sound something like the harpsichord in some registers only when a high sampling rate is used, such as 44.1 kHz or 22.05 kHz. At low sample rates, it sounds somewhat similar to violin pizzicato tones.

2. HARPSICHORD ACOUSTICS

The harpsichord is a stringed keyboard instrument with a long history dating back to at least the year 1440 [8]. It is the predecessor of the pianoforte and the modern piano. It belongs to the group of plucked string instruments due to its excitation mechanism. In this section, we describe briefly the construction and the operating principles of the harpsichord and give details of the instrument used in this study. For a more in-depth discussion and description of the harpsichord, see, for example, [9, 10, 11, 12], and for a description of different types of harpsichord, the reader is referred to [10].

2.1. Construction of the instrument

The form of the instrument can be roughly described as triangular, and the oblique side is typically curved. A harpsichord has one or two manuals that control two to four sets of strings, also called registers or string choirs. Two of the string choirs are typically tuned in unison. These are called the 8′ (8-foot) registers. Often the third string choir is tuned an octave higher, and it is called the 4′ register. The manuals can be set to control different registers, usually with a limited number of combinations. This permits the player to use different registers with left- and right-hand manuals, and therefore vary the timbre and loudness of the instrument. The 8′ registers differ from each other in the plucking point of the strings. Hence, the 8′ registers are called 8′ back and front registers, where "back" refers to the plucking point away from the nut (and the player).

The keyboard of the harpsichord typically spans four or five octaves, which became a common standard in the early 18th century. One end of the strings is attached to the nut and the other to a long, curved bridge. The portion of the string behind the bridge is attached to a hitch pin, which is on top of the soundboard. This portion of the string also tends to vibrate for a long while after a key press, and it gives the instrument a reverberant feel. The nut is set on a very rigid wrest plank. The bridge is attached to the soundboard.
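As a concrete point of reference for the Karplus-Strong remark in the Introduction (and its footnote), the core of that algorithm is just a noise-filled delay line recirculated through a two-point average. A minimal sketch of the classic algorithm, with an arbitrary sampling rate and pitch (not code from this paper):

```python
import numpy as np

# Minimal Karplus-Strong plucked-string tone: a delay line filled with
# noise ("the pluck") is read out and recirculated through a two-point
# average. The average acts as a mild lowpass, so the tone decays.
fs = 44100                      # sample rate (Hz); the footnote notes that a
                                # high rate is needed for a harpsichord-like timbre
f0 = 220.0                      # target fundamental (Hz), arbitrary choice
L = int(round(fs / f0))         # delay-line length in samples
rng = np.random.default_rng(0)
buf = rng.uniform(-1.0, 1.0, L)

out = np.empty(fs)              # one second of sound
for n in range(fs):
    out[n] = buf[n % L]
    # feedback: average the sample just read with the next one in the line
    # (a common variant of the original y[n] = 0.5*(y[n-L] + y[n-L-1]))
    buf[n % L] = 0.5 * (buf[n % L] + buf[(n + 1) % L])
```

Because the loop filter here is a fixed two-point average, there is no control over per-partial decay rates; that limitation is exactly what the loss-filter modifications discussed in Section 3 address.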

[Figure 2 block diagram: excitation samples, triggered at attack time, pass through a timbre-control filter into the string model S(z), in parallel with the resonator R(z); their sum is scaled by gsb and fed to the soundboard filter and tone corrector; release samples, scaled by grelease, are triggered at release time.]

Figure 2: Overall structure of the harpsichord model for a single string. The model structure is identical for all strings in the three sets, but the parameter values and sample data are different.

Therefore, the bridge is mainly responsible for transmitting string vibrations to the soundboard. The soundboard is very thin—about 2 to 4 mm—and it is supported by several ribs installed in patterns that leave trapezoidal areas of the soundboard vibrating freely. The main function of the soundboard is to amplify the weak sound of the vibrating strings, but it also filters the sound. The soundboard forms the top of a closed box, which typically has a rose opening. It causes a Helmholtz resonance, the frequency of which is usually below 100 Hz [12]. In many harpsichords, the soundbox also opens to the manual compartment.

2.2. Operating principle

A plectrum—also called a quill—that is anchored onto a jack plucks the strings. The jack rests on a string, but there is a small piece of felt (called the damper) between them. One end of the wooden keyboard lever is located a small distance below the jack. As the player pushes down a key on the keyboard, the lever moves up. This action lifts the jack up and causes the quill to pluck the string. When the key is released, the jack falls back and the damper comes in contact with the string with the objective to dampen its vibrations. A spring mechanism in the jack guides the plectrum so that the string is not replucked when the key is released.

2.3. The harpsichord used in this study

The harpsichord used in this study (see Figure 1) was built in 2000 by Jonte Knif (one of the authors of this paper) and Arno Pelto. It has the characteristics of harpsichords built in Italy and Southern Germany. This harpsichord has two manuals and three sets of string choirs, namely an 8′ back, an 8′ front, and a 4′ register. The instrument was tuned to the Vallotti tuning [13] with the fundamental frequency of A4 at 415 Hz.² There are 56 keys from G1 to D6, which correspond to fundamental frequencies of 46 Hz and 1100 Hz, respectively, in the 8′ register; the 4′ register is an octave higher, so the corresponding lowest and highest fundamental frequencies are about 93 Hz and 2200 Hz. The instrument is 240 cm long and 85 cm wide, and its strings are all made of brass. The plucking point changes from 12% to about 50% of the string length in the bass and in the treble range, respectively. This produces a round timbre (i.e., weak even harmonics) in the treble range. In addition, the dampers have been left out in the last octave of the 4′ register to increase the reverberant feel during playing. The wood material used in the instrument has been heat treated to artificially accelerate the aging process of the wood.

² The tuning is considerably lower than the current standard (440 Hz or higher). This is typical of old musical instruments.

3. SYNTHESIS ALGORITHM

This section discusses the signal processing methods used in the synthesis algorithm. The structure of the algorithm is illustrated in Figure 2. It consists of five digital filters, two sample databases, and their interconnections. The physical model of a vibrating string is contained in block S(z). Its input is retrieved from the excitation signal database, and it can be modified during run-time with a timbre-control filter, which is a one-pole filter. In parallel with the string, a second-order resonator R(z) is tuned to reproduce the beating of one of the partials, as proposed earlier by Bank et al. [14, 15]. While we could use more resonators, we have decided to target a maximally reduced implementation to minimize the computational cost and the number of parameters. The sum of the string model and resonator output signals is fed through a soundboard filter, which is common to all strings. The tone corrector is an equalizer that shapes the spectrum of the soundboard filter output. By varying the coefficients grelease and gsb, it is possible to adjust the relative levels of the string sound, the soundboard response, and the release sound.

In the following, we describe the string model, the sample databases, and the soundboard model in detail, and discuss the need for modeling the dispersion of harpsichord strings.

3.1. Basic string model revisited

We use a version of the vibrating string filter model proposed by Jaffe and Smith [16]. It consists of a feedback loop, where a delay line, a fractional delay filter, a high-order allpass filter, and a loss filter are cascaded. The delay line and the fractional delay filter determine the fundamental frequency of the tone. The high-order allpass filter [16] simulates dispersion, which

Sound Synthesis of the Harpsichord Using a Physical Model


Figure 3: Structure of the proposed string model. The feedback loop contains a one-pole filter (the denominator of (1)), a feedforward comb filter called the "ripple filter" (the numerator of (1)), the rest of the delay line, a fractional delay filter F(z), and an allpass filter Ad(z) simulating dispersion.

is a typical characteristic of vibrating strings and which introduces inharmonicity in the sound. For the fractional delay filter, we use a first-order allpass filter, as originally suggested by Smith and Jaffe [16, 17]. This choice was made because it allows a simple and sufficient approximation of delay when a high sampling rate is used.³ Furthermore, there is no need to implement fundamental frequency variations (pitch bend) in harpsichord tones. Thus, the recursive nature of the allpass fractional delay filter, which can cause transients during pitch bends, is not harmful.

The loss filter of waveguide string models is usually implemented as a one-pole filter [18], but now we use an extended version. The transfer function of the new loss filter is

    H(z) = b (r + z^(-R)) / (1 + a z^(-1)),    (1)

where the scaling parameter b is defined as

    b = g(1 + a),    (2)

R is the delay line length of the ripple filter, r is the ripple depth, and a is the feedback gain. Figure 3 shows the block diagram of the string model with details of the new loss filter, which is seen to be composed of the conventional one-pole filter and a ripple filter in cascade. The total delay line length L in the feedback loop is 1 + R + L1 plus the phase delay caused by the fractional delay filter F(z) and the allpass filter Ad(z).

The overall loop gain is determined by the parameter g, which is usually selected to be slightly smaller than 1 to ensure stability of the feedback loop. The feedback gain parameter a defines the overall lowpass character of the filter: a value slightly smaller than 0 (e.g., a = −0.01) yields a mild lowpass filter, which causes high-frequency partials to decay faster than the low-frequency ones, which is natural. The ripple depth parameter r is used to control the deviation of the loss filter gain from that of the one-pole filter.

³ The sampling rate used in this work is 44100 Hz.

The delay line length R is determined as

    R = round(rrate L),    (3)

where rrate is the ripple rate parameter that adjusts the ripple density in the frequency domain and L is the total delay length in the loop (in samples, or sampling intervals).

The ripple filter was developed because it was found that the magnitude response of the one-pole filter alone is overly smooth when compared to the required loop gain behavior for harpsichord sounds. Note that the ripple factor r in (1) increases the loop gain, but it is not accounted for in the scaling factor in (2). This is purposeful, because we find it useful that the loop gain oscillates symmetrically around the magnitude response of the conventional one-pole filter (obtained from (1) by setting r = 0). Nevertheless, it must be ensured somehow that the overall loop gain does not exceed unity at any of the harmonic frequencies—otherwise the system becomes unstable. It is sufficient to require that the sum g + |r| remains below one, or |r| < 1 − g. In practice, a slightly larger magnitude of r still results in a stable system when r < 0, because this choice decreases the loop gain at 0 Hz, and the conventional loop filter is a lowpass filter, so its gain at the harmonic frequencies is smaller than g.

With small positive or negative values of r, it is possible to obtain wavy loop gain characteristics, where two neighboring partials have considerably different loop gains and thus decay rates. The frequency of the ripple is controlled by the parameter rrate, so that a value close to one results in a very slow wave, while a value close to 0.5 results in a fast variation where the loop gain for neighboring even and odd partials differs by about 2r (depending on the value of a). An example is shown in Figure 4, where the properties of a conventional one-pole loss filter are compared against the proposed ripply loss filter. Figure 4a shows that by adding a feedforward path with a small gain factor r = 0.002, the loop gain characteristics can be made less regular.

Figure 4b shows the corresponding reverberation time (T60) curve, which indicates how long it takes for each partial to decay by 60 dB. The T60 values are obtained by multiplying the time-constant values τ by −60/[20 log10(1/e)] = 6.9078.
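As an illustration (not code from the paper), the loop-gain and decay-time behavior described above can be reproduced numerically with the parameter values from the Figure 4 caption; the decay time of partial k follows from its loop gain G via the time constant τ = −1/(f0 ln G) and the factor 6.9078 quoted above.

```python
import numpy as np

fs = 44100.0                                # sampling rate (footnote 3)
L = 200                                     # total loop delay in samples, so f0 = fs/L = 220.5 Hz
f0 = fs / L
g, a, r, rrate = 0.995, -0.05, 0.002, 0.5   # values from the Figure 4 caption

R = round(rrate * L)                        # ripple-filter delay line length, Eq. (3)
b = g * (1 + a)                             # scaling parameter, Eq. (2)

def loop_gain(k):
    """Magnitude of the ripply loss filter, Eq. (1), at partial k."""
    z = np.exp(2j * np.pi * k * f0 / fs)
    return abs(b * (r + z ** -R) / (1 + a * z ** -1))

def t60(k):
    """Reverberation time of partial k: T60 = 6.9078 * tau, tau = -1/(f0 ln G)."""
    return 6.9078 * (-1.0 / (f0 * np.log(loop_gain(k))))

for k in range(1, 7):
    print(f"partial {k}: G = {loop_gain(k):.5f}, T60 = {t60(k):.2f} s")
```

With rrate = 0.5, the term z^(−R) alternates sign between even and odd partials, so neighboring loop gains differ by roughly 2r, as stated above, and even partials ring noticeably longer than odd ones.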


Figure 4: The frequency-dependent (a) loop gain (magnitude response) and (b) reverberation time T60 determined by the loss filter. The dashed lines show the smooth characteristics of a conventional one-pole loss filter (g = 0.995, a = −0.05). The solid lines show the characteristics obtained with the ripply loss filter (g = 0.995, a = −0.05, r = 0.0020, rrate = 0.5). The bold dots indicate the actual properties experienced by the partials of the synthetic tone (L = 200 samples, f0 = 220.5 Hz).

The time constants τ(k) for partial indices k = 1, 2, 3, ..., on the other hand, are obtained from the loop gain data G(k) as

    τ(k) = −1 / (f0 ln G(k)).    (4)

The loop gain sequence G(k) is extracted directly from the magnitude response of the loop filter at the fundamental frequency (k = 1) and at the other partial frequencies (k = 2, 3, 4, ...).

Figure 4b demonstrates the power of the ripply loss filter: the second partial can be rendered to decay much more slowly than the first and the third partials. This is also perceived in the synthetic tone: soon after the attack, the second partial stands out as the loudest and longest-ringing partial. Formerly, this kind of flexibility has been obtained only with high-order loss filters [17, 19]. Still, the new filter has only two parameters more than the one-pole filter, and its computational complexity is comparable to that of a first-order pole-zero filter.

3.2. Inharmonicity

Dispersion is always present in real strings. It is caused by the stiffness of the string material. This property of strings gives rise to inharmonicity in the sound. An offspring of the harpsichord, the piano, is famous for its strongly inharmonic tones, especially in the bass range [9, 20]. This is due to the large elastic modulus and the large diameter of the high-strength steel strings in the piano [9]. In waveguide models, inharmonicity is modeled with allpass filters [16, 21, 22, 23]. Naturally, it would be cost-efficient not to implement the inharmonicity, because then the allpass filter Ad(z) would not be needed at all.

The inharmonicity of the recorded harpsichord tones was investigated in order to find out whether it is relevant to model this property. The partials of recorded harpsichord tones were picked semiautomatically from the magnitude spectrum, and with a least-squares fit we estimated the inharmonicity coefficient B [20] for each recorded tone. The measured B values are displayed in Figure 5 together with the threshold of audibility and its 90% confidence intervals taken from listening test results [24]. It is seen that the B coefficient is above the mean threshold of audibility in all cases, but above the frequency 140 Hz, the measured values are within the confidence interval. Thus, it is not guaranteed that these cases actually correspond to audible inharmonicity. At low frequencies, in the case of the 19 lowest keys of the harpsichord, where the inharmonicity coefficients are about 10⁻⁵, the inharmonicity is audible according to this comparison. It is thus important to implement the inharmonicity for the lowest two octaves or so, but it may also be necessary to implement it for the rest of the notes.

This conclusion is in accordance with [10], where inharmonicity is stated to be part of the tonal quality of the harpsichord, and also with [12], where it is mentioned that the inharmonicity is less pronounced than in the piano.

3.3. Sample databases

The excitation signals of the string models are stored in a database from where they can be retrieved at the onset time. The excitation sequences contain 20,000 samples (0.45 s),


Figure 5: Estimates of the inharmonicity coefficient B for all 56 keys of the harpsichord (circles connected with a thick line). Also shown are the threshold of audibility for the B coefficient (solid line) and its 90% confidence intervals (dashed lines) taken from [24].

Figure 6: Time-frequency plot of the harpsichord air radiation when the 8′ bridge is excited. To exemplify the fast decay of the low-frequency modes, only the first 2 seconds and frequencies up to 4000 Hz are displayed.

and they have been extracted from recorded tones by canceling the partials. The analysis and calibration procedure is discussed further in Section 4 of this paper. The idea is to include in these samples the sound of the quill scraping the string plus the beginning of the attack of the sound, so that a natural attack is obtained during synthesis and the initial levels of partials are set properly. Note that this approach is slightly different from the standard commuted synthesis technique, where the full inverse-filtered recorded signal is used to excite the string model [18, 25]. In the latter case, all modes of the soundboard (or soundbox) are contained within the input sequence, and virtually perfect resynthesis is accomplished if the same parameters are used for inverse filtering and synthesis. In the current model, however, we have truncated the excitation signals by windowing them with the right half of a Hanning window. The soundboard response is much longer than that (several seconds), but imitating its ringing tail is taken care of by the soundboard filter (see the next subsection).

In addition to the excitation samples, we have extracted short release sounds from recorded tones. One of these is retrieved and played each time a note-off command occurs. Extracting these samples is easy: once a note is played, the player can wait until the string sound has completely decayed, and then release the key. This way a clean recording of noises related to the release event is obtained, and any extra processing is unnecessary. An alternative way would be to synthesize these knocking sounds using modal synthesis, as suggested in [26].

3.4. Modeling the reverberant soundboard and undamped strings

When a note is plucked on the harpsichord, the string vibrations excite the bridge and, consequently, the soundboard. The soundboard has its own modes depending on the size and the materials used. The radiated acoustic response of the harpsichord is reasonably flat over a frequency range from 50 to 2000 Hz [11]. In addition to exciting the air and structural modes of the instrument body, the pluck excites the part of the string that lies behind the bridge, the high modes of the low strings that the dampers cannot perfectly attenuate, and the highest octave of the 4′ register strings.⁴ The resonance strings behind the bridge are about 6 to 20 cm long and have a very inharmonic spectral structure. The soundboard filter used in our harpsichord synthesizer (see Figure 2) is responsible for imitating all these features. However, as will be discussed further in Section 4.5, the lowest body modes can be ignored, since they decay fast and are present in the excitation samples. In other words, the modeling is divided into two parts, so that the soundboard filter models the reverberant tail while the attack part is included in the excitation signal, which is fed to the string model. Reference [11] discusses the resonance modes of the harpsichord soundboard in detail.

The radiated acoustic response of the harpsichord was recorded in an anechoic chamber by exciting the bridges (8′ and 4′) with an impulse hammer at multiple positions. Figure 6 displays a time-frequency response of the 8′ bridge when excited between the C3 strings, that is, approximately at the middle point of the bridge. The decay times at frequencies below 350 Hz are considerably shorter than in the frequency range from 350 to 1000 Hz. The T60 values at the respective bands are about 0.5 seconds and 4.5 seconds. This can be explained by the fact that the short string portions behind the bridge and the undamped strings resonate and decay slowly.

⁴ The instrument used in this study does not have dampers in the last octave of the 4′ register.

As suggested by several authors, see, for example, [14, 27, 28], the impulse response of a musical instrument body can be modeled with a reverberation algorithm. Such algorithms were originally devised for imitating the impulse response of concert halls. In a previous work, we triggered a static sample of the body response with every note [29]. In contrast to the sample-based solution, which produces the same response every time, the reverberation algorithm produces additional variation in the sound: as the input signal of the reverberation algorithm is changed, or in this case as the key or register is changed, the temporal and frequency content of the output changes accordingly.

The soundboard response of the harpsichord in this work is modeled with an algorithm presented in [30]. It is a modification of the feedback delay network [31], where the feedback matrix is replaced with a single coefficient, and comb allpass filters have been inserted in the delay line loops. A schematic view of the reverberation algorithm is shown in Figure 7. This structure is used because of its computational efficiency. The Hk(z) blocks represent the loss filters, the Ak(z) blocks are the comb allpass filters, and the delay lines are of length Pk. In this work, eight (N = 8) delay lines are implemented.

One-pole lowpass filters are used as loss filters, which implement the frequency-dependent decay. The comb allpass filters increase the diffusion effect, and they all have the transfer function

    Ak(z) = (aap,k + z^(-Mk)) / (1 + aap,k z^(-Mk)),    (5)

where Mk are the delay-line lengths and aap,k are the allpass filter coefficients. To ensure stability, it is required that aap,k ∈ [−1, 1]. In addition to the reverberation algorithm, a tone-corrector filter, as shown in Figure 2, is used to match the spectral envelope of the target response, that is, to suppress the low frequencies below 350 Hz and give some additional lowpass characteristics at high frequencies. The choice of the parameters is discussed in Section 4.5.

4. CALIBRATION OF THE SYNTHESIS ALGORITHM

The harpsichord was brought into an anechoic chamber where the recordings and the acoustic measurements were conducted. The registered signals enable the automatic calibration of the harpsichord synthesizer. This section describes the recordings, the signal analysis, and the calibration techniques for the string and the soundboard models.

4.1. Recordings

Harpsichord tones were recorded in the large anechoic chamber of Helsinki University of Technology. Recordings were made with multiple microphones installed at a distance of about 1 m above the soundboard. The signals were recorded digitally (44.1 kHz, 16 bits) directly onto the hard disk, and to remove disturbances in the infrasonic range, they were highpass filtered. The highpass filter was a fourth-order Butterworth highpass filter with a cutoff frequency of 52 Hz or 32 Hz (for the lowest tones). The filter was applied to the signal in both directions to obtain zero-phase filtering. The recordings were compared in an informal listening test among the authors, and the signals obtained with a high-quality studio microphone by Schoeps were selected for further analysis.

All 56 keys of the instrument were played separately with six different combinations of the registers that are commonly used. This resulted in 56 × 6 = 336 recordings. The tones were allowed to decay into silence, and the key release was included. The length of the single tones varied between 10 and 25 seconds, because the bass tones of the harpsichord tend to ring much longer than the treble tones. For completeness, we recorded examples of different dynamic levels of different keys, although it is known that the harpsichord has a limited dynamic range due to its excitation mechanism. Short staccato tones, slow key pressings, and fast repetitions of single keys were also registered. Chords were recorded to measure the variations of attack times between simultaneously played keys. Additionally, scales and excerpts of musical pieces were played and recorded.

Both bridges of the instrument were excited at several points (four and six points for the 4′ and the 8′ bridge, respectively) with an impulse hammer to obtain reliable acoustic soundboard responses. The force signal of the hammer and the acceleration signal obtained from an accelerometer attached to the bridge were recorded for the 8′ bridge at three locations. The acoustic response was recorded in synchrony.

4.2. Analysis of recorded tones and extraction of excitation signals

Initial estimates of the synthesizer parameters can be obtained from analysis of recorded tones. For the basic calibration of the synthesizer, the recordings were selected where each register is played alone. We use a method based on the short-time Fourier transform and sinusoidal modeling, as previously discussed in [18, 32]. The inharmonicity of harpsichord tones is accounted for in the spectral peak-picking algorithm with the help of the estimated B coefficient values. After extracting the fundamental frequency, the analysis system essentially decomposes the analyzed tone into its deterministic and stochastic parts, as in the spectral modeling synthesis method [33]. However, in our system the decay times of the partials are extracted, and the loop filter design is based on the loop gain data calculated from the decay times. The envelopes of partials in the harpsichord tones exhibit beating and two-stage decay, as is usual for string instruments [34]. The residual is further processed, that is, the soundboard contribution is mostly removed (by windowing the residual signal in the time domain) and the initial level of each partial is adjusted by adding a correction obtained through sinusoidal modeling and inverse filtering [35, 36]. The resulting processed residual is used as an excitation signal to the model.
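The decay-time-to-loop-gain step of this calibration can be sketched as follows (an illustrative reconstruction, not the authors' code, on a synthetic partial with an assumed time constant): the time constant τ is estimated by a linear fit to the log-envelope, and inverting Eq. (4) then yields the loop-gain target G = exp(−1/(f0 τ)).

```python
import numpy as np

f0 = 197.0                    # fundamental of the example tone analyzed in Section 4.3
tau_true = 0.8                # assumed time constant of one partial, in seconds

# Synthetic measured envelope of the partial, sampled every 10 ms.
t = np.arange(0.0, 2.0, 0.01)
envelope = np.exp(-t / tau_true)

# The slope of the log-envelope is -1/tau.
slope = np.polyfit(t, np.log(envelope), 1)[0]
tau_est = -1.0 / slope

# Invert Eq. (4), tau = -1/(f0 ln G), to get the loop gain for this partial.
G = np.exp(-1.0 / (f0 * tau_est))
print(f"tau = {tau_est:.3f} s  ->  loop gain G = {G:.6f}")
```

On measured envelopes, which exhibit beating and two-stage decay as noted above, such a fit is only approximate; that deviation is exactly what the beating-filter design in Section 4.4 exploits.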


Figure 7: A schematic view of the reverberation algorithm used for soundboard modeling.
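The comb allpass block Ak(z) of Figure 7, given in Eq. (5) above, can be realized directly from its difference equation. The sketch below (not from the paper; M and a are arbitrary illustrative values within the stated stability range) also verifies the allpass property numerically.

```python
import numpy as np

def comb_allpass(x, M, a):
    """A(z) = (a + z^-M)/(1 + a z^-M):  y[n] = a*x[n] + x[n-M] - a*y[n-M]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - M] if n >= M else 0.0
        yd = y[n - M] if n >= M else 0.0
        y[n] = a * x[n] + xd - a * yd
    return y

# Check the allpass property on a long truncated impulse response.
M, a = 80, 0.5
imp = np.zeros(8192)
imp[0] = 1.0
h = comb_allpass(imp, M, a)
mag = np.abs(np.fft.rfft(h))
print(mag.min(), mag.max())      # both very close to 1: unit magnitude at all frequencies
```

Because the magnitude response is flat, these blocks add diffusion (phase dispersion) inside the delay-line loops without altering the frequency-dependent decay set by the loss filters.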

4.3. Loss filter design

Since the ripply loop filter is an extension of the one-pole filter that allows improved matching of the decay rate of one partial and simply introduces variations to the others, it is reasonable to design it after the one-pole filter. This kind of approach is known to be suboptimal in filter design, but the highest possible accuracy is not the main goal of this work. Rather, we aim at a simple and reliable routine that automatically processes a large amount of measurement data, thus leaving a minimum amount of erroneous results to be fixed manually.

Figure 8 shows the loop gain and T60 data for an example case. It is seen that the target data (bold dots in Figure 8) contain a fair amount of variation from one partial to the next one, although the overall trend is downward as a function of frequency. Partials with indices 10, 11, 16, and 18 are excluded (set to zero), because their decay times were found to be unreliable (i.e., loop gain larger than unity). The one-pole filter response fitted using a weighted least-squares technique [18] (dashed lines in Figure 8) can follow the overall trend, but it evens up the differences between neighboring partials.

The ripply loss filter can be designed using the following heuristic rules.

(1) Select the partial with the largest loop gain, starting from the second partial⁵ (the sixth partial in this case, see Figure 8); its index is denoted by kmax. Usually one of the lowest partials will be picked once the outliers have been discarded.

(2) Set the absolute value of r so that, together with the one-pole filter, the magnitude response will match the target loop gain of the partial with index kmax, that is, |r| = G(kmax) − |H(kmax f0)|, where the second term is the loop gain due to the one-pole filter at that frequency (in this case |r| = 0.0015).

(3) If the target loop gain of the first partial is larger than the magnitude response of the one-pole filter alone at that frequency, set the sign of r to positive, and otherwise to negative, so that the decay of the first partial is made fast (in the example case in Figure 8, the minus sign is chosen, that is, r = −0.0015).

(4) If a positive r has been chosen, conduct a stability check at the zero frequency. If it fails (i.e., g + r ≥ 1), the value of r must be made negative by changing its sign.

(5) Set the ripple rate parameter rrate so that the longest-ringing partial will occur at the maximum nearest to 0 Hz. This means that the parameter must be chosen according to the following rule:

    rrate = 1/kmax      when r ≥ 0,
    rrate = 1/(2 kmax)  when r < 0.    (6)

In the example case, as the ripple pattern is a negative cosine wave (in the frequency domain) and the peak should hit the 6th partial, we set the rrate parameter equal to 1/12 = 0.0833. This implies that a minimum will occur at every 12th partial and the first maximum will occur at the 6th partial. The result of this design procedure is shown in Figure 8 with the solid line. Note that the peak is actually between the 5th and the 6th partial, because fractional delay techniques are not used in this part of the system and the delay-line length R is thus an integer, as defined in (3). It is obvious that this design method is limited in its ability to follow arbitrary target data. However, as we now know that the resolution of human hearing is also very limited in evaluating differences in decay rates [37], we find the match in most cases to be sufficiently good.

⁵ In practice, the first partial may have the largest loop gain. However, if we tried to match it using the ripply loss filter, the rrate parameter would go to 1, as can be seen from (6), and the delay-line length R would become equal to L rounded to an integer, as can be seen from (3). This practically means that the ripple filter would be reduced to a correction of the loop gain by r, which can be done also by simply replacing the loop gain parameter g by g + r. For this reason, it is sensible to match the loop gain of a partial other than the first one.

Figure 8: (a) The target loop gain for a harpsichord tone (f0 = 197 Hz) (bold dots), the magnitude response of the conventional one-pole filter with g = 0.9960 and a = −0.0296 (dashed line), and the magnitude response of the ripply loss filter with r = −0.0015 and rrate = 0.0833 (solid line). (b) The corresponding T60 data. The total delay-line length is 223.9 samples, and the delay-line length R of the ripple filter is 19 samples.

4.4. Beating filter design

The beating filter, a second-order resonator R(z) coupled in parallel with the string model (see Figure 2), is used for reproducing the beating in harpsichord synthesis. In practice, we decided to choose the center frequency of the resonator so that it brings about the beating effect in one of the low-index partials that has a prominent level and large beat amplitude. These criteria make sure that the single resonator will produce an audible effect during synthesis.

In this implementation, we probed the deviation of the actual decay characteristics of the partials from the ideal exponential decay. This procedure is illustrated in Figure 9. In Figure 9a, the mean-squared error (MSE) of the deviation is shown. The lowest partial that exhibits a high deviation (the 10th partial in this example) is selected as a candidate for the most prominent beating partial. Its magnitude envelope is presented in Figure 9b by a solid curve. It exhibits a slow beating pattern with a period of about 1.5 seconds. The second-order resonator that simulates beating, in turn, can be tuned to result in a beating pattern with this same rate. For comparison, the magnitude envelopes of the 9th and 11th partials are also shown by dashed and dash-dotted curves, respectively.

Figure 9: (a) The mean squared error of exponential curve fitting to the decay of partials (f0 = 197 Hz), where the lowest large deviation has been circled (10th partial), and the acceptance threshold is presented with a dashed-dotted line. (b) The corresponding temporal envelopes of the 9th, 10th, and 11th partials, where the slow beating of the 10th partial and deviations in decay rates are visible.

The center frequency of the resonator is measured from the envelope of the partial. In practice, the offset ranges from practically 0 Hz to a few hertz. The gain of the resonator, that is, the amplitude of the beating partial, is set to be the same as that of the partial it beats against. This simple choice is backed by the recent result by Järveläinen and Karjalainen [38] that the beating in string instrument tones is essentially perceived as an on/off process: if the beating amplitude is above the threshold of audibility, it is noticed, while if it is below it, it becomes inaudible. Furthermore, changes in the beating amplitude appear to be inaccurately perceived. Before knowing these results, in a former version of the synthesizer, we had also decided to use the same amplitude for the two components that produce the beating, because the mixing parameter that adjusts the beating amplitude was not giving a useful audible variation [39]. Thus, we are now convinced that it is unnecessary to add another parameter for all string models by allowing changes in the amplitude of the beating partial.

4.5. Design of soundboard filter

The reverberation algorithm and the tone correction unit are set in cascade, and together they form the soundboard model, as shown in Figure 2. For determining the soundboard filter, the parameters of the reverberation algorithm and its tone corrector have to be set. The parameters for the reverberation algorithm were chosen as proposed in [31]. To match the frequency-dependent decay, the ratio between the decay times at 0 Hz and at fs/2 was set to 0.13, so that T60 at 0 Hz became 6.0 seconds. The lengths of the eight delay lines varied from 1009 to 1999 samples. To avoid superimposing the responses, the lengths were incommensurate numbers [40]. The lengths Mk of the delay lines in the comb allpass structures were set to 8% of the total length of each delay line path Pk, the filter coefficients aap,k were all set to 0.5, and the feedback coefficient gfb was set to −0.25.
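Putting the quoted parameter values together, the modified feedback delay network of Figure 7 can be sketched as follows. This is an illustrative reconstruction under assumptions: the interior delay-line lengths and the one-pole loss-filter values (c, glp) are stand-ins, with glp chosen small enough that the worst-case loop gain 8·|gfb|·glp = 0.9 stays below unity; only the 1009- and 1999-sample endpoints, the 8% allpass-delay rule, aap,k = 0.5, and gfb = −0.25 come from the text.

```python
import numpy as np

fs = 44100
P = [1009, 1123, 1283, 1429, 1571, 1723, 1867, 1999]  # delay lines; only the endpoints from the text
M = [max(1, round(0.08 * p)) for p in P]              # comb allpass delays: 8% of each line
a_ap = 0.5                                            # allpass coefficients aap,k
gfb = -0.25                                           # single feedback coefficient
c, glp = 0.3, 0.45                                    # stand-in loss filter H(z) = glp*(1-c)/(1 - c z^-1)

def soundboard_tail(x, n_out):
    N = len(P)
    dly = [np.zeros(p) for p in P]        # main delay lines (circular buffers)
    apx = [np.zeros(m) for m in M]        # comb allpass input histories
    apy = [np.zeros(m) for m in M]        # comb allpass output histories
    lp = np.zeros(N)                      # one-pole loss filter states
    y = np.zeros(n_out)
    for n in range(n_out):
        s = 0.0
        for k in range(N):
            v = dly[k][n % P[k]]                        # tap delay line k
            lp[k] = glp * (1 - c) * v + c * lp[k]       # loss filter H_k(z)
            w = a_ap * lp[k] + apx[k][n % M[k]] - a_ap * apy[k][n % M[k]]  # A_k(z), Eq. (5)
            apx[k][n % M[k]] = lp[k]
            apy[k][n % M[k]] = w
            s += w
        xin = (x[n] if n < len(x) else 0.0) + gfb * s   # feedback through the single coefficient
        for k in range(N):
            dly[k][n % P[k]] = xin                      # one reading of Figure 7: every line is fed the same signal
        y[n] = s
    return y

tail = soundboard_tail(np.array([1.0]), fs // 2)        # half a second of impulse response
```

The incommensurate delay lengths keep the echo patterns of the eight lines from superimposing, and the decaying tail illustrates the role this block plays behind the tone corrector.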

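The reverberator parameter choices above can be reproduced in a short sketch. Choosing prime lengths is one common way to obtain incommensurate delay-line lengths, and the T60-to-gain mapping is the standard design from Jot and Chaigne [31]; both are illustrative assumptions rather than details given in the text:

```python
import numpy as np

fs = 44100
T60_dc = 6.0               # decay time at 0 Hz (seconds), as in the text

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Eight mutually incommensurate (here: prime) delay-line lengths
# spread from 1009 to 1999 samples.
primes = [p for p in range(1009, 2000) if is_prime(p)]
idx = np.linspace(0, len(primes) - 1, 8).astype(int)
P = [primes[i] for i in idx]               # total delay path lengths P_k

# Comb-allpass delays M_k: 8% of each path; coefficients as in the text.
M = [round(0.08 * Pk) for Pk in P]
a_ap = 0.5                                 # all comb allpass coefficients
g_fb = -0.25                               # feedback coefficient

# One standard way (Jot & Chaigne [31]) to realize T60 at DC:
# per-path loss gain g_k = 10^(-3 * P_k / (fs * T60)).
g_dc = [10 ** (-3 * Pk / (fs * T60_dc)) for Pk in P]
```

The frequency-dependent part of the decay (the 0.13 ratio between T60 at fs/2 and at 0 Hz) would then be realized by making each loss gain a first-order lowpass filter rather than the scalar g_dc shown here.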
The excitation signals for the harpsichord synthesizer are 0.45 second long, and hence contain the necessary fast-decaying modes for frequencies below 350 Hz (see Figure 6). Therefore, the tone correction section is divided into two parts: a highpass filter that suppresses frequencies below 350 Hz and another filter that imitates the spectral envelope at the middle and high frequencies. The highpass filter is a 5th-order Chebyshev type I design with a 5 dB passband ripple, the 6 dB point at 350 Hz, and a roll-off rate of about 50 dB per octave below the cutoff frequency. The spectral envelope filter for the soundboard model is a 10th-order IIR filter designed using linear prediction [41] from a 0.2-second long windowed segment of the measured target response (see Figure 6 from 0.3 second to 0.5 second). Figure 10 shows the time-frequency plot of the target response and the soundboard filter for the first 1.5 seconds up to 10 kHz. The target response has a prominent lowpass characteristic, which is due to the properties of the impulse hammer. While the response should really be inverse filtered by the hammer force signal, in practice we can approximately compensate this effect with a differentiator whose transfer function is Hdiff(z) = 0.5 − 0.5z^−1. This is done before the design of the tone corrector, so the compensation filter is not included in the synthesizer implementation.
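The two-part tone corrector design can be sketched with SciPy and NumPy. The text specifies the filter orders, the 5 dB ripple, the 6 dB point, the differentiator, and the use of linear prediction [41]; placing the Chebyshev ripple edge at 350 Hz and using the autocorrelation-method LPC are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import cheby1, lfilter

fs = 44100

# 5th-order Chebyshev type I highpass with 5 dB passband ripple.
# cheby1 takes the ripple-edge frequency, so using 350 Hz here only
# approximates the paper's 6 dB point at 350 Hz.
b_hp, a_hp = cheby1(5, 5, 350, btype='highpass', fs=fs)

# Pre-compensation differentiator Hdiff(z) = 0.5 - 0.5 z^-1, applied
# to the target response before fitting the tone corrector.
def hammer_compensate(x):
    return lfilter([0.5, -0.5], [1.0], x)

# 10th-order all-pole spectral envelope via linear prediction
# (autocorrelation method); returns the IIR denominator A(z).
def lpc_envelope(x, order=10):
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))
```

A nice property of the autocorrelation method is that the fitted all-pole filter is minimum phase, so the resulting spectral envelope filter is stable by construction.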

5. IMPLEMENTATION AND APPLICATIONS

This section deals with computational efficiency, implementation issues, and musical applications of the harpsichord synthesizer.

5.1. Computational complexity

The computational cost caused by implementing the harpsichord synthesizer and running it at an audio sample rate, such as 44100 Hz, is relatively small. Table 1 summarizes the number of multiplications and additions needed per sample for various parts of the system. In this cost analysis, it is assumed that the dispersion is simulated using a first-order allpass filter. In practice, the lowest tones require a higher-order allpass filter, but some of the highest tones may not have the allpass filter at all, so the first-order filter represents an average cost per string model. Note that the total cost per string is smaller than that of an FIR filter of order 12 (i.e., 13 multiplications and 12 additions). In practice, one voice in harpsichord synthesis is allocated one to three string models, which simulate the different registers. The soundboard model is considerably more costly than a string model: the number of multiplications is more than fourfold, and the number of additions is almost seven times larger. The complexity analysis of the comb allpass filters in the soundboard model is based on the direct form II implementation (i.e., one delay line, two multiplications, and two additions per comb allpass filter section).

Figure 10: The time-frequency representation of (a) the recorded soundboard response and (b) the synthetic response obtained as the impulse response of a modified feedback delay network.

The implementation of the synthesizer, which is discussed in detail in the next section, is based on high-level programming and control. Thus, it is not optimized for the fastest possible real-time operation. The current implementation of the synthesizer runs on a Macintosh G4 (800 MHz) computer, and it can simultaneously run 15 string models in real time without the soundboard model. With the soundboard model, it is possible to run about 10 strings. A new, faster computer and optimization of the code can increase these numbers. With optimized code and fast hardware, it may be possible to run the harpsichord synthesizer with full polyphony (i.e., 56 voices) and the soundboard in real time using current technology.
5.2. Synthesizer implementation

The signal-processing part of the harpsichord synthesizer is realized using a visual software synthesis package called PWSynth [42]. PWSynth, in turn, is part of a larger visual programming environment called PWGL [43]. Finally, the control information is generated using our music notation package ENP (expressive notation package) [44]. In this section, the focus is on design issues that we have encountered when implementing the synthesizer. We also give ideas on

Table 1: The number of multiplications and additions in different parts of the synthesizer.

Part of synthesis algorithm                   Multiplications   Additions
String model
  • Fractional delay allpass filter F(z)             2              2
  • Inharmonizing allpass filter Ad(z)               2              2
  • One-pole filter                                  2              1
  • Ripple filter                                    1              1
  • Resonator R(z)                                   3              2
  • Timbre control                                   2              1
  • Mixing with release sample                       1              1
Soundboard model
  • Modified FDN reverberator                       33             47
  • IIR tone corrector                              11             10
  • Highpass filter                                 12              9
  • Mixing                                           1              1
Total
  • Per string (without soundboard model)           13             10
  • Soundboard model                                57             67
  • All (one string and soundboard model)           70             77
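The per-sample counts in Table 1 scale linearly with the number of strings, so the total operation rate is easy to sanity-check. The helper below is hypothetical and simply counts one multiplication or addition as one operation:

```python
fs = 44100  # audio sample rate used in the cost analysis

# Per-sample operation counts from Table 1.
string_ops = {'mul': 13, 'add': 10}
soundboard_ops = {'mul': 57, 'add': 67}

def ops_per_second(n_strings, with_soundboard=True):
    # Total multiply and add rate for n_strings string models,
    # optionally plus one soundboard model.
    mul = n_strings * string_ops['mul']
    add = n_strings * string_ops['add']
    if with_soundboard:
        mul += soundboard_ops['mul']
        add += soundboard_ops['add']
    return (mul + add) * fs
```

For one string plus the soundboard this reproduces the "All" row of Table 1, (70 + 77) operations per sample, or about 6.5 million operations per second at 44100 Hz.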

how the model is parameterized so that it can be controlled from the music notation software.

Our previous work in designing computer simulations of musical instruments has resulted in several applications, such as the classical guitar [39], the Renaissance lute, the Turkish ud [45], and the clavichord [29]. The two-manual harpsichord tackled in the current study is the most challenging and complex instrument that we have yet investigated. As this kind of work is experimental, and the synthesis model must be refined by interactive listening, a system is needed that is capable of making fast and efficient prototypes of the basic components of the system. Another nontrivial problem is the parameterization of the harpsichord synthesizer. In a typical case, one basic component, such as the vibrating string model, requires over 10 parameters so that it can be used in a convincing simulation. Thus, since the full harpsichord synthesizer implementation has three string sets each having 56 strings, we need at least 1680 (= 10 × 3 × 56) parameters in order to control all individual strings separately.

Figure 11 shows a prototype of a harpsichord synthesizer. It contains three main parts. First, the top-most box (called “num-box” with the label “number-of-strings”) gives the number of strings within each string set used by the synthesizer. This number can vary from 1 (useful for preliminary tests) to 56 (the full instrument). In a typical real-time situation, this number can vary, depending on the polyphony of the musical score to be realized, between 4 and 10. The next box of interest is called “string model”. It is a special abstraction box that contains a subwindow. The contents of this window are displayed in Figure 12. This abstraction box defines a single string model. Next, Figure 11 shows three “copy-synth-patch” boxes that determine the individual string sets used by the instrument. These sets are labeled as follows: “harpsy1/8-fb/”, “harpsy1/8-ff”, and “harpsy1/4-ff/”. Each string set copies the string model patch count times, where count is equal to the current number of strings (given by the upper number-of-strings box). The rest of the boxes in the patch are used to mix the outputs of the string sets.

Figure 12 gives the definition of a single string model. The patch consists of two types of boxes. First, the boxes with the name “pwsynth-plug” (the boxes with the darkest outlines in grey-scale) define the parametric entry points that are used by our control system. Second, the other boxes are low-level DSP modules, realized in C++, that perform the actual sample calculation, and boxes which are used to initialize the DSP modules. The “pwsynth-plug” boxes point to memory addresses that are continuously updated while the synthesizer is running.
Each “pwsynth-plug” box has a label that is used to build symbolic parameter pathnames. While the “copy-synth-patch” boxes (see the main patch of Figure 11) copy the string model in a loop, the system automatically generates new unique pathnames by merging the label from the current “copy-synth-patch” box, the current loop index, and the label found in the “pwsynth-plug” boxes. Thus, pathnames like “harpsy1/8-fb/1/lfgain” are obtained, which refers to the lfgain (loss filter gain) of the first string of the 8′ back string set of a harpsichord model called “harpsy1”.

5.3. Musical applications

The harpsichord synthesizer can be used as an electronic musical instrument controlled either from a MIDI keyboard or from a sequencer software. Recently, some composers have been interested in using a formerly developed model-based guitar synthesizer for compositions, which are either experimental in nature or extremely challenging for human players.
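The pathname-merging scheme described above can be mimicked in a few lines. The function name and the second plug label “lfcoef” below are hypothetical, for illustration only; only the label format “harpsy1/8-fb/1/lfgain” comes from the text:

```python
def make_pathnames(set_label, count, plug_labels):
    # Mimics how each copy-synth-patch iteration merges the string-set
    # label, the 1-based loop index, and a pwsynth-plug label into a
    # unique symbolic parameter pathname.
    return [f"{set_label}{i}/{plug}"
            for i in range(1, count + 1)
            for plug in plug_labels]

# "lfcoef" is a hypothetical second plug label for illustration.
paths = make_pathnames("harpsy1/8-fb/", 56, ["lfgain", "lfcoef"])
```

Because the loop index is embedded in each pathname, every copied string instance gets its own distinct control entry points, which is what makes the 1680-parameter control space addressable from the notation software.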


Figure 11: The top-level prototype of the harpsichord synthesizer in PWSynth. The patch defines one string model and the three string sets used by the instrument.

Another fascinating idea is to extend the range and timbre of the instrument. A version of the guitar synthesizer, which we call the super guitar, has an extended range and a large number of strings [46]. We plan to develop a similar extension of the harpsichord synthesizer.

In the current version of the synthesizer, the parameters have been calibrated based on recordings. One obvious application for a parametric synthesizer is to modify the timbre by deviating the parameter values. This can lead to extended instruments that belong to the same instrument family as the original instrument or, in the extreme cases, to a novel virtual instrument that cannot be recognized by listeners. One of the most obvious subjects for modification is the decay rate, which is controlled with the coefficients of the loop filter.

A well-known limitation of the harpsichord is its restricted dynamic range. In fact, it is a controversial issue whether the key velocity has any audible effect on the sound of the harpsichord. The synthesizer easily allows the implementation of an exaggerated dynamic control, where the key velocity has a dramatic effect on both the amplitude and the timbre, if desired, such as in the piano or in the acoustic guitar. As the key velocity information is readily available, it can be used to control the gain and the properties of a timbre control filter (see Figure 2).

Luthiers who make musical instruments are interested in modern technology and want to try physics-based synthesis to learn about the instrument. A synthesizer allows varying certain parameters in the instrument design, which are difficult or impossible to adjust in the real instrument. For example, the point where the quill plucks the string is structurally fixed in the harpsichord, but as it has a clear effect on the timbre, varying it is of interest. In the current harpsichord synthesizer, it would require the knowledge of the plucking point and then inverse filtering its contribution from the excitation signal. The plucking point contribution can then be implemented in the string model by inserting another feedforward comb filter, as discussed previously in several works [7, 16, 17, 18]. Another prospect is to vary the location of the damper. Currently, we do not have an exact model for the damper, and neither is its location a parameter. Testing this is still possible, because it is known that the nonideal functioning of the damper is related to the nodal points of the strings, which coincide with the locations of the damper. The ripply loss filter allows the imitation of this effect.

Luthiers are interested in the possibility of virtual prototyping without the need for actually building many versions of an instrument out of wood. The current synthesis model may not be sufficiently detailed for this purpose. A real-time or near-real-time implementation of a physical model, where several parameters can be adjusted, would be an ideal tool for testing prototypes.
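The feedforward comb filter mentioned above for the plucking point has a very simple form. A minimal sketch (the function name and the delay values are illustrative; see [7, 16, 17, 18] for the actual string-model context):

```python
import numpy as np

def plucking_point_comb(x, string_delay, mu):
    # Feedforward comb filter y[n] = x[n] - x[n - d] with d = round(mu * L).
    # It places nulls at every (1/mu)-th harmonic, imprinting a pluck at
    # relative position mu along a string of total delay L samples.
    d = int(round(mu * string_delay))
    y = x.astype(float).copy()
    y[d:] -= x[:-d]
    return y
```

For example, with mu = 0.25 (a pluck at one quarter of the string length), every fourth harmonic is cancelled, which is the familiar spectral signature of the plucking position.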


Figure 12: The string model patch. The patch contains the low-level DSP modules and parameter entry points used by the harpsichord synthesizer.

6. CONCLUSIONS

This paper proposes signal-processing techniques for synthesizing harpsichord tones. A new extension to the loss filter of the waveguide synthesizer has been developed which allows variations in the decay times of neighboring partials. This filter will be useful also for the waveguide synthesis of other stringed instruments. The fast-decaying modes of the soundboard are incorporated in the excitation samples of the synthesizer, while the long-ringing modes at the middle and high frequencies are imitated using a reverberation algorithm. The calibration of the synthesis model is made almost automatic. The parameterization and use of simple filters also allow manual adjustment of the timbre. A physics-based synthesizer, such as the one described here, has several musical applications, the most obvious one being the usage as a computer-controlled musical instrument.

Examples of single tones and musical pieces synthesized with the synthesizer are available at http://www.acoustics.hut.fi/publications/papers/jasp-harpsy/.

ACKNOWLEDGMENTS

The work of Henri Penttinen has been supported by the Pythagoras Graduate School of Sound and Music Research. The work of Cumhur Erkut is part of the EU project ALMA (IST-2001-33059). The authors are grateful to B. Bank, P. A. A. Esquef, and J. O. Smith for their helpful comments. Special thanks go to H. Järveläinen for her help in preparing Figure 5.

REFERENCES

[1] J. O. Smith, “Efficient synthesis of stringed musical instruments,” in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.
[2] M. Karjalainen and V. Välimäki, “Model-based analysis/synthesis of the acoustic guitar,” in Proc. Stockholm Music Acoustics Conference, pp. 443–447, Stockholm, Sweden, July–August 1993.
[3] M. Karjalainen, V. Välimäki, and Z. Jánosy, “Towards high-quality sound synthesis of the guitar and string instruments,” in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.
[4] J. O. Smith and S. A. Van Duyne, “Commuted piano synthesis,” in Proc. International Computer Music Conference, pp. 319–326, Banff, Alberta, Canada, September 1995.
[5] J. O. Smith, “Physical modeling using digital waveguides,” Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[6] K. Karplus and A. Strong, “Digital synthesis of plucked string and drum timbres,” Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.
[7] M. Karjalainen, V. Välimäki, and T. Tolonen, “Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond,” Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[8] F. Hubbard, Three Centuries of Harpsichord Making, Harvard University Press, Cambridge, Mass, USA, 1965.
[9] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1991.
[10] E. L. Kottick, K. D. Marshall, and T. J. Hendrickson, “The acoustics of the harpsichord,” Scientific American, vol. 264, no. 2, pp. 94–99, 1991.
[11] W. R. Savage, E. L. Kottick, T. J. Hendrickson, and K. D. Marshall, “Air and structural modes of a harpsichord,” Journal of the Acoustical Society of America, vol. 91, no. 4, pp. 2180–2189, 1992.
[12] N. H. Fletcher, “Analysis of the design and performance of harpsichords,” Acustica, vol. 37, pp. 139–147, 1977.
[13] J. Sankey and W. A. Sethares, “A consonance-based approach to the harpsichord tuning of Domenico Scarlatti,” Journal of the Acoustical Society of America, vol. 101, no. 4, pp. 2332–2337, 1997.
[14] B. Bank, “Physics-based sound synthesis of the piano,” M.S. thesis, Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary, 2000, published as Tech. Rep. 54, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland, 2000.
[15] B. Bank, V. Välimäki, L. Sujbert, and M. Karjalainen, “Efficient physics based sound synthesis of the piano using DSP methods,” in Proc. European Signal Processing Conference, vol. 4, pp. 2225–2228, Tampere, Finland, September 2000.
[16] D. A. Jaffe and J. O. Smith, “Extensions of the Karplus-Strong plucked-string algorithm,” Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.
[17] J. O. Smith, Techniques for digital filter design and system identification with application to the violin, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 1983.
[18] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, “Physical modeling of plucked string instruments with application to real-time sound synthesis,” Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[19] B. Bank and V. Välimäki, “Robust loss filter design for digital waveguide synthesis of string tones,” IEEE Signal Processing Letters, vol. 10, no. 1, pp. 18–20, 2003.
[20] H. Fletcher, E. D. Blackham, and R. S. Stratton, “Quality of piano tones,” Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.
[21] S. A. Van Duyne and J. O. Smith, “A simplified approach to modeling dispersion caused by stiffness in strings and plates,” in Proc. International Computer Music Conference, pp. 407–410, Århus, Denmark, September 1994.
[22] D. Rocchesso and F. Scalcon, “Accurate dispersion simulation for piano strings,” in Proc. Nordic Acoustical Meeting, pp. 407–414, Helsinki, Finland, June 1996.
[23] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, “Physically informed signal processing methods for piano sound synthesis: a research overview,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.
[24] H. Järveläinen, V. Välimäki, and M. Karjalainen, “Audibility of the timbral effects of inharmonicity in stringed instrument tones,” Acoustics Research Letters Online, vol. 2, no. 3, pp. 79–84, 2001.
[25] M. Karjalainen and J. O. Smith, “Body modeling techniques for string instrument synthesis,” in Proc. International Computer Music Conference, pp. 232–239, Hong Kong, China, August 1996.
[26] P. R. Cook, “Physically informed sonic modeling (PhISM): synthesis of percussive sounds,” Computer Music Journal, vol. 21, no. 3, pp. 38–49, 1997.
[27] D. Rocchesso, “Multiple feedback delay networks for sound processing,” in Proc. X Colloquio di Informatica Musicale, pp. 202–209, Milan, Italy, December 1993.
[28] H. Penttinen, M. Karjalainen, T. Paatero, and H. Järveläinen, “New techniques to model reverberant instrument body responses,” in Proc. International Computer Music Conference, pp. 182–185, Havana, Cuba, September 2001.
[29] V. Välimäki, M. Laurson, and C. Erkut, “Commuted waveguide synthesis of the clavichord,” Computer Music Journal, vol. 27, no. 1, pp. 71–82, 2003.
[30] R. Väänänen, V. Välimäki, J. Huopaniemi, and M. Karjalainen, “Efficient and parametric reverberator for room acoustics modeling,” in Proc. International Computer Music Conference, pp. 200–203, Thessaloniki, Greece, September 1997.
[31] J. M. Jot and A. Chaigne, “Digital delay networks for designing artificial reverberators,” in Proc. 90th Convention Audio Engineering Society, Paris, France, February 1991.
[32] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson, “Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar,” in Proc. 108th Convention Audio Engineering Society, p. 17, Paris, France, February 2000.
[33] X. Serra and J. O. Smith, “Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.
[34] G. Weinreich, “Coupled piano strings,” Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.
[35] V. Välimäki and T. Tolonen, “Development and calibration of a guitar synthesizer,” Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766–778, 1998.
[36] T. Tolonen, “Model-based analysis and resynthesis of acoustic guitar tones,” M.S. thesis, Laboratory of Acoustics and Audio Signal Processing, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, 1998, Tech. Rep. 46.
[37] H. Järveläinen and T. Tolonen, “Perceptual tolerances for decay parameters in plucked string synthesis,” Journal of the Audio Engineering Society, vol. 49, no. 11, pp. 1049–1059, 2001.
[38] H. Järveläinen and M. Karjalainen, “Perception of beating and two-stage decay in dual-polarization string models,” in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[39] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, “Methods for modeling realistic playing in acoustic guitar synthesis,” Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.
[40] W. G. Gardner, “Reverberation algorithms,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 85–131, Kluwer Academic, Boston, Mass, USA, 1998.
[41] J. D. Markel and A. H. Gray Jr., Linear Prediction of Speech, Springer-Verlag, Berlin, Germany, 1976.
[42] M. Laurson and M. Kuuskankare, “PWSynth: a Lisp-based bridge between computer assisted composition and sound synthesis,” in Proc. International Computer Music Conference, pp. 127–130, Havana, Cuba, September 2001.
[43] M. Laurson and M. Kuuskankare, “PWGL: a novel visual language based on Common Lisp, CLOS and OpenGL,” in Proc. International Computer Music Conference, pp. 142–145, Gothenburg, Sweden, September 2002.

[44] M. Kuuskankare and M. Laurson, “ENP2.0: a music notation program implemented in Common Lisp and OpenGL,” in Proc. International Computer Music Conference, pp. 463–466, Gothenburg, Sweden, September 2002.
[45] C. Erkut, M. Laurson, M. Kuuskankare, and V. Välimäki, “Model-based synthesis of the ud and the Renaissance lute,” in Proc. International Computer Music Conference, pp. 119–122, Havana, Cuba, September 2001.
[46] M. Laurson, V. Välimäki, and C. Erkut, “Production of virtual acoustic guitar music,” in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 249–255, Espoo, Finland, June 2002.

Vesa Välimäki was born in Kuorevesi, Finland, in 1968. He received the M.S. degree, the Licentiate of Science degree, and the Doctor of Science degree, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001-2002 he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. He is currently Professor of audio signal processing at HUT. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to music and audio. Dr. Välimäki is a Senior Member of the IEEE Signal Processing Society and is a Member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Henri Penttinen was born in Espoo, Finland, in 1975. He received the M.S. degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 2003. He has worked at the HUT Laboratory of Acoustics and Signal Processing since 1999 and is currently a Ph.D. student there. His main research interests are signal processing algorithms, real-time audio applications, and musical acoustics. Mr. Penttinen is also active in music through playing, composing, and performing.

Mikael Laurson was born in Helsinki, Finland, in 1951. His formal training at the Sibelius Academy consists of a guitar diploma (1979) and a doctoral dissertation (1996). In 2002, he was appointed Docent in music technology at Helsinki University of Technology, Espoo, Finland. Between the years 1979 and 1985 he was active as a guitarist. Since 1989 he has been working at the Sibelius Academy as a Researcher and Teacher of computer-aided composition. After conceiving the PatchWork (PW) programming language (1986), he started a close collaboration with IRCAM resulting in the first PW release in 1993. After 1993 he has been active as a developer of various PW user libraries. Since the year 1999, Dr. Laurson has worked in a project dealing with physical modeling and sound synthesis control funded by the Academy of Finland and the Sibelius Academy Innovation Centre.

Cumhur Erkut was born in Istanbul, Turkey, in 1969. He received the B.S. and the M.S. degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Doctor of Science degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 2002. Between 1998 and 2002, he worked as a Researcher at the HUT Laboratory of Acoustics and Audio Signal Processing. He is currently a Postdoctoral Researcher in the same institution, where he contributes to the EU-funded research project “Algorithms for the Modelling of Acoustic Interactions” (ALMA, European project IST-2001-33059). His primary research interests are model-based sound synthesis and musical acoustics.

Jonte Knif was born in Vaasa, Finland, in 1975. He is currently studying music technology at the Sibelius Academy, Helsinki, Finland. Prior to this he studied the harpsichord at the Sibelius Academy for five years. He has built and designed many historical keyboard instruments and adaptations such as an electric clavichord. His present interests include also loudspeaker and studio electronics design.

EURASIP Journal on Applied Signal Processing 2004:7, 949–963
© 2004 Hindawi Publishing Corporation

Multirate Simulations of String Vibrations Including Nonlinear Fret-String Interactions Using the Functional Transformation Method

L. Trautmann
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 91058 Erlangen, Germany
Email: [email protected]
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected].fi

R. Rabenstein
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 91058 Erlangen, Germany
Email: [email protected]

Received 30 June 2003; Revised 14 November 2003

The functional transformation method (FTM) is a well-established mathematical method for accurate simulations of multidimensional physical systems from various fields of science, including optics, heat and mass transfer, electrical engineering, and acoustics. This paper applies the FTM to real-time simulations of transversal vibrating strings. First, a physical model of a transversal vibrating, lossy, and dispersive string is derived. Afterwards, this model is solved with the FTM for two cases: the ideally linearly vibrating string and the string interacting nonlinearly with the frets. It is shown that accurate and stable simulations can be achieved with the discretization of the continuous solution at audio rate. Both simulations can also be performed with a multirate approach with only minor degradations of the simulation accuracy but with preservation of stability. This saves almost 80% of the computational cost for the simulation of a six-string guitar, and therefore it is in the range of the computational cost of digital waveguide simulations.

Keywords and phrases: multidimensional system, vibrating string, partial differential equation, functional transformation, nonlinear, multirate approach.

1. INTRODUCTION

Digital sound synthesis methods can mainly be categorized into classical direct synthesis methods and physics-based methods [1]. The first category includes all kinds of sound processing algorithms like wavetable, granular, and subtractive synthesis, as well as abstract mathematical models, like additive or frequency modulation synthesis. What is common to all these methods is that they are based on the sound to be (re)produced.

The physics-based methods, also called physical modeling methods, start at the physics of the sound production mechanism rather than at the resulting sound. This approach has several advantages over the sound-based methods.

(i) The resulting sound and especially transitions between successive notes always sound acoustically realistic as far as the underlying model is sufficiently accurate.

(ii) Sound variations of acoustical instruments due to different playing techniques or different instruments within one instrument family are described in the physics-based methods with only a few parameters. These parameters can be adjusted in advance to simulate a distinct acoustical instrument, or they can be controlled by the musician to morph between real-world instruments to obtain more degrees of freedom in the expressiveness and variability.

The second item makes physical modeling methods quite useful for multimedia applications where only a very limited bandwidth is available for the transmission of music as, for example, in mobile phones. In these applications, the physical model has to be transferred only once and afterwards it is sufficient to transfer only the musical score while keeping the variability of the resulting sound.

The starting points for the various existing physical modeling methods are always physical models, varying for a certain vibrating object only in the model accuracies. The application of the basic laws of physics to an existing or imaginary vibrating object results in continuous-time, continuous-space models. These models are called initial-boundary-value problems and they contain a partial differential equation (PDE) and some initial and boundary conditions. The discretization approaches to the continuous models and the digital realizations are different for the single physical modeling methods.

One of the first physical modeling algorithms for the simulation of musical instruments was made by Hiller and Ruiz in 1971 [2] with the finite difference method. It directly discretizes the temporal and spatial differential operators of the PDE to finite difference terms. On the one hand, this approach is computationally very demanding, since temporal and spatial sampling intervals have to be chosen small for accurate simulations. Furthermore, stability problems occur especially in dispersive vibrational objects if the relationship between temporal and spatial sampling intervals is not chosen properly [3]. On the other hand, the finite difference method is quite suitable for studies in which the vibration has to be evaluated in a dense spatial grid. Therefore, the finite difference method has mainly been used for academic studies rather than for real-time applications (see, e.g., [4, 5]). However, the finite difference method has recently become more popular also for real-time applications in conjunction with other physical modeling methods [6, 7].

A mathematically similar discretization approach is used in mass-spring models that are closely related to the finite element method. In this approach, the vibrating structure is reduced to a finite number of mass points that are interconnected by springs and dampers. One of the first systems for the simulation of musical instruments was the CORDIS system, which could be realized in real time on a specialized processor [8]. The finite difference method, as well as the mass-spring models, can be viewed as direct discretization approaches of the initial-boundary-value problems. Despite the stability problems, they are very easy to set up, but they are computationally demanding.

In modal synthesis, first introduced in [9], the PDE is spatially discretized at not necessarily equidistant spatial points, similar to the mass-spring models. The interconnections between these discretized spatial points reflect the physical behavior of the structure. This discretization reduces the degrees of freedom for the vibration to the number of spatial points, which is directly transferred to the same number of temporal modes the structure can vibrate in. The reduction does not only allow the calculation of the modes of simple structures, but it can also handle vibrational measurements

17, 18]. The DWG first simplifies the PDE to the wave equation, which has an analytical solution in the form of a forward and backward traveling wave, called the d'Alembert solution. It can be realized computationally very efficiently with delay lines. The sound effects like damping or dispersion occurring in the vibrating structure are included in the DWG by low-order digital filters concentrated in one point of the delay line. This procedure ensures the computational efficiency, but the implementation loses the direct connection to the physical parameters of the vibrating structure.

The focus of this article is the FTM. It was first introduced in [19] for the heat-flow equation and first used for digital sound synthesis in [20]. Extensions to the basic model of a vibrating string and comparisons between the FTM and the above-mentioned physical modeling methods are given, for example, in [12]. In the FTM, the initial-boundary-value problem is first solved analytically by appropriate functional transformations before it is discretized for computer simulations. This ensures a high simulation accuracy as well as an inherent stability. One of the drawbacks of the FTM is so far its computational load, which is about five times higher than the load of the DWG [21].

This article extends the FTM by applying a multirate approach to the discrete realization of the FTM, such that the computational complexity is significantly reduced. The extension is shown for the linearly vibrating string as well as for the nonlinear limitation of the string vibration by a fret-string interaction occurring in slapbass synthesis.

The article is organized as follows. Section 2 derives the physical model of a transversal vibrating, dispersive, and lossy string in terms of a scalar PDE and initial and boundary conditions. Furthermore, a model for a nonlinear fret-string interaction is given. These models are solved in Section 3 with the FTM in continuous time and continuous space. Section 4 discretizes these solutions at audio rate and derives an algorithm to guarantee stability even for the nonlinear discrete system. A multirate approach is used in Section 5 for the simulation of the continuous solution to save computational cost. It is shown that this multirate approach also works for nonlinear systems. Section 6 compares the audio rate and the multirate solutions with respect to the simulation accuracy and the computational complexity.

2. PHYSICAL MODELS

In this section, a transversal vibrating, dispersive, and lossy string is analyzed using the basic laws of physics.
From this of more complicated structures at a finite number of spatial analysis, a scalar PDE is derived in Section 2.1. Section 2.2 points [10]. A commercial product of the modal synthesis, defines the initial states of the vibration, as well as the fixings Modalys, is described, for example, in [11]. For a review of of the string at the nut and the bridge end, in terms of ini- modal synthesis and a comparison to the functional trans- tial and boundary conditions, respectively. In Section 2.3, the formation method (FTM), see also [12]. linear model is extended with a deflection-dependent force The commercially and academically most popular phys- simulating the nonlinear interaction between the string and ical modeling method of the last two decades was the digital the frets, well known as slap synthesis [22]. waveguide method (DWG) because of its computational ef- In all these models, the strings are assumed to be homo- ficiency. It was first introduced in [13] as a physically inter- geneous and isotropic. Furthermore, the smoothness of their preted extension of the Karplus-Strong algorithm [14]. Ex- surfaces may not permit stress concentrations. The deflec- tensions of the DWG are described, for example, in [15, 16, tions of the strings are assumed to be small enough to change Multirate Simulations of String Vibrations Using the FTM 951 neither the cross section area nor the tension on the string so string deflection y(x, t) by replacing v(x, t)withy˙(x, t)and that the string itself behaves linearly. ϕ(x, t) = y(x, t)from(3d) and with (3b)and(3c). Then (3) can be written in a general notation of scalar PDEs 2.1. Linear partial differential equation       derived by basic laws of physics D y(x, t) +L y(x, t) +W y(x, t) (4a) The string under examination is characterized by its ma- = fe1(x, t), x ∈ [0, l], t ∈ [0, ∞), terial and geometrical parameters. 
The material parameters are given by the mass density ρ, the Young’s modulus E, the with laminar air flow damping coefficient d1, and the viscoelastic   ffi D y(x, t) = ρAy¨(x, t)+d1 y˙(x, t), damping coe cient d3. The geometrical parameters consist     of the length l, the cross section area A and the moment of L y(x, t) =−Ts y (x, t)+EIB y (x, t), (4b) inertia I.Furthermore,atensionT is applied to the string in      s = =−  axial direction. Considering only a string segment between W y(x, t) WD WL y(x, t) d3 y˙ (x, t). the spatial positions and + ∆ , the forces on this string xs xs x {} segment can be analyzed in detail. They consist of the restor- Asitcanbeseenin(4), the operator D contains only tem- {} ing force caused by the tension , the bending force poral derivatives, the operator L has only spatial deriva- fT Ts fB {} caused by the stiffness of the string, the laminar air flow force tives, and the operator W consists of mixed temporal and spatial derivatives. The PDE is valid only on the string be- fd1, the viscoelastic damping force fd3 (modeled here without tween x = 0andx = l and for all positive times. Equation memory), and the external excitation force fe. They result at x in (4) forms a continuous-time, continuous-space PDE. For a s unique solution, initial and boundary conditions must be f x , t = T sin ϕ x , t ≈ T ϕ x , t ,(1a)given as specified in the next section. T s s s s s f x , t =−EIb x , t ,(1b) B s s 2.2. Initial and boundary conditions f x , t = d ∆xv x , t ,(1c) d1 s 1 s Initial conditions define the initial state of the string at time fd3 xs, t = d3 sin ϕ˙ xs, t ≈ d3ϕ˙ xs, t ,(1d)t = 0. This definition is written in the general operator nota- tion with where ϕ(xs, t) is the slope angle of the string, b(xs, t) is the curvature of the string, v(x , t) is the velocity, and prime de-   y(x,0) s fT y(x, t) = = 0, x ∈ [0, l], t = 0. (5) notes spatial derivative and dot denotes temporal derivative. 
i y˙(x,0) Note that in (1a) and in (1d) it is assumed that the amplitude of the string vibration is small so that the function can Since the scalar PDE (4) is of second order with respect to be approximated by its argument. Similar equations can be time, only two initial conditions are needed. They are chosen found for the forces at the other end of the string segment at arbitrarily by the initial deflection and the initial velocity of xs + ∆x. the string as seen in (5). For musical applications, it is a rea- All these forces are combined by the equation of motion sonable assumption that the initial states of the strings vanish to at time t = 0asgivenin(5). Note that this does not prevent the interaction between successively played notes since the ρA∆xv˙ x , t = f x , t + f x , t − f x + ∆x, t s y s d3 s y s time is not set to zero for each note. Thus, this kind of initial − fd3 xs + ∆x, t − fd1 xs, t + fe xs, t , condition is only used for, for example, the beginning of a (2) piece of music. In addition to the initial conditions, also the fixings of where fy = fT + fB. Setting ∆x → 0 and solving (2) for the the string at both ends must be defined in terms of bound- excitation force density fe1(xs, t) = fe(xs, t)δ(x − xs), four ary conditions. In most stringed instruments, the strings are coupled equations are obtained, that are valid not only at the nearly fixed at the nut end (x = x0 = 0) and transfer energy = = string segment xs ≤ x ≤ xs + ∆x but also at the whole string at the other end (x x1 l) via the bridge to the resonant 0 ≤ x ≤ l. δ(x) denotes the impulse function. body [2]. For some instruments (e.g., the piano) it is also a justified assumption, that the bridge fixing can be modeled = −  − ˙ fe1(x, t) ρAv˙(x, t)+d1v(x, t) fy (x, t) d3b(x, t), (3a) to be ideally rigid [23]. Then the boundary conditions are  given by fy(x, t) = Tsϕ(x, t) − EIb (x, t), (3b) =  b x1, t ϕ (x, t), (3c)   y x , t T = i = ∈ ∈ ∞  fbi y(x, t)  0, i 0, 1, t [0, ). 
(6) v x1, t = ϕ˙(x, t). (3d) y xi, t

An extended version of the derivation of (3)canbefound It can be seen from (6) that the string is assumed to be fixed, in [12]. The four coupled equations (3) can be simplified allowed to pivot at both ends, such that the deflection y and to one scalar PDE with only one output variable. All the the curvature b = y must vanish. These are boundary con- dependent variables in (3a) can be written in terms of the ditions of first kind. For simplicity, there is no energy fed 952 EURASIP Journal on Applied Signal Processing
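Although the article itself proceeds with the FTM, the direct finite-difference approach mentioned in Section 1 can be sketched for exactly this model: an explicit scheme for the PDE (4) with the boundary conditions (6). This is a minimal illustration, not the article's method; all parameter values are assumed (roughly a nylon guitar string), and the grid spacing is chosen generously above the stability bound discussed in [3].

```python
import numpy as np

# Assumed string parameters (illustrative, roughly a nylon guitar string)
rho_A = 8.96e-4   # mass per unit length rho*A [kg/m]
Ts    = 92.4      # tension T_s [N]
EI    = 2.65e-4   # bending stiffness E*I [N m^2]
d1    = 1e-3      # air-damping coefficient d_1
d3    = 1e-5      # viscoelastic damping coefficient d_3
l     = 0.65      # string length [m]

fs = 44100.0
T  = 1.0 / fs     # temporal sampling interval
N  = 60           # spatial intervals; X stays above the stability bound of [3]
X  = l / N

def d2(u):
    """Second spatial difference; y = y'' = 0 at both ends realizes (6)."""
    v = np.zeros_like(u)
    v[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / X**2
    return v

y  = np.zeros(N + 1)   # deflection at time step k
ym = np.zeros(N + 1)   # deflection at time step k-1
a0 = rho_A / T**2 + d1 / (2.0 * T)

for k in range(2000):
    fe = np.zeros(N + 1)
    if k < 5:
        fe[int(0.28 * N)] = 1.0          # short force burst at the excitation point
    y2, ym2 = d2(y), d2(ym)
    rhs = (Ts * y2 - EI * d2(y2)         # -L{y}:  T_s y'' - EI y''''
           + d3 * (y2 - ym2) / T         # -W{y}:  d_3 * d/dt(y''), backward in time
           + fe
           + rho_A * (2.0 * y - ym) / T**2 + d1 * ym / (2.0 * T))
    y, ym = rhs / a0, y                  # explicit update of the D{y} terms
```

With the chosen grid the scheme stays stable, illustrating the trade-off named in the introduction: every grid point must be updated at every sample, which is exactly what makes this approach expensive compared with the FTM.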

[Figure 1 is a block diagram: the PDE with IC and BC is mapped by L{·} to an ODE with BC, by T{·} to an algebraic equation, and after reordering to the MD TFM; discretization yields the discrete MD TFM, from which T⁻¹{·} gives the discrete 1-D TFM and z⁻¹{·} the discrete solution.]

Figure 1: Procedure of the FTM for solving initial-boundary-value problems defined in the form of PDEs, IC, and BC.

For simplicity, there is no energy fed into the system via the boundary, resulting in homogeneous boundary conditions.

The PDE (4), in conjunction with the initial (5) and boundary conditions (6), forms the linear continuous-time, continuous-space initial-boundary-value problem to be solved and simulated.

2.3. Nonlinear extension to the linear model for slap synthesis

Nonlinearities are an important part of the sound production mechanisms of musical instruments [23]. One example is the nonlinear interaction of the string with the frets, well known from slap synthesis. This effect was modeled first for the DWG in [22] as a nonlinear amplitude limitation. For the FTM, the effect was already applied to vibrating strings in [24].

A simplified model for this interaction interprets the fret as a spring with a high stiffness coefficient $S_\text{fret}$, acting at one position $x_f$ as a force $f_f$ on the string at time instances where the string is in contact with the fret. Since this force depends on the string deflection, it is nonlinear, defined by

$$f_f\big(x_f,t,y,y_f\big) = \begin{cases} S_\text{fret}\,\big(y_f(x_f,t) - y(x_f,t)\big), & \text{for } y(x_f,t) - y_f(x_f,t) > 0, \\ 0, & \text{for } y(x_f,t) - y_f(x_f,t) \le 0. \end{cases} \quad (7)$$

The deflection of the fret from the string rest position is denoted by $y_f$. The PDE (4) becomes nonlinear by adding the slap force $f_f$ to the excitation function $f_{e1}(x,t)$. Thus, a linear and a nonlinear system for the simulation of the vibrating string are derived. Both systems are solved in the next sections with the FTM.

3. CONTINUOUS SOLUTIONS USING THE FTM

To obtain a model that can be implemented in the computer, the continuous initial-boundary-value problem has to be discretized. Instead of using a direct discretization approach as described in Section 1, the continuous analytical solution is derived first and discretized subsequently. This procedure is well known from the simulation of one-dimensional systems like electrical networks. It has several advantages, including simulation accuracy and guaranteed stability.

The outline of the FTM is given in Figure 1. First, the PDE with initial conditions (IC) and boundary conditions (BC) is Laplace transformed ($\mathcal{L}\{\cdot\}$) with respect to time to derive a boundary-value problem (ODE, BC). Then a so-called Sturm-Liouville transformation ($\mathcal{T}\{\cdot\}$) is applied to the spatial variable to obtain an algebraic equation. Solving for the output variable results in a multidimensional (MD) transfer function model (TFM). It is discretized, and by applying the inverse Sturm-Liouville transformation $\mathcal{T}^{-1}\{\cdot\}$ and the inverse $z$-transformation $\mathcal{Z}^{-1}\{\cdot\}$, the discretized solution in the time and space domain results.

The impulse-invariant transformation is used for the discretization shown in Figure 1. It is equivalent to the calculation of the continuous solution by inverse transformation into the continuous time and space domain with subsequent sampling. The calculation of the continuous solution is presented in Sections 3.1 to 3.5; the discretization is shown in Sections 4 and 5.

For the nonlinear system, the transformations obviously cannot result in a TFM. Therefore, the procedure has to be modified slightly, resulting in an MD implicit equation, described in Section 3.6.

3.1. Laplace transformation

As known from linear electrical network theory, the Laplace transformation removes the temporal derivatives in linear and time-invariant (LTI) systems and includes, due to the differentiation theorem, the initial conditions as additive terms (see, e.g., [25]). Since first- and second-order time derivatives occur in (4) and the initial conditions (5) are homogeneous, the application of the Laplace transformation to the initial-boundary-value problem derived in Section 2 results in

$$d_D(s)\, Y(x,s) + L\{Y(x,s)\} + w_D(s)\, W_L\{Y(x,s)\} = F_{e1}(x,s), \quad x \in [0,l], \quad (8a)$$
$$f_{bi}\{Y(x,s)\} = \mathbf{0}, \quad i = 0,1. \quad (8b)$$

The Laplace-transformed functions are written with capital letters, and the complex temporal frequency variable is denoted by $s = \sigma + j\omega$. It can be seen in (8a) that the temporal derivatives of (4a) are replaced with scalar multiplications by

$$d_D(s) = \rho A s^2 + d_1 s, \qquad w_D(s) = -d_3 s. \quad (8c)$$

Thus, the initial-boundary-value problem (4), (5), and (6) is replaced with the boundary-value problem (8) after Laplace transformation.
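The case distinction in the fret force (7) of Section 2.3 is easy to state in code. A minimal sketch follows; the stiffness value is an illustrative assumption, and the sign convention matches (7) as given above, so that the force opposes penetration of the fret:

```python
import numpy as np

S_FRET = 1.0e6    # assumed fret stiffness coefficient S_fret [N/m]

def slap_force(y, y_f, s_fret=S_FRET):
    """Fret force of (7): a stiff spring that acts only while the string
    deflection y penetrates the fret position y_f, and vanishes otherwise."""
    y = np.asarray(y, dtype=float)
    penetration = y - y_f
    return np.where(penetration > 0.0, s_fret * (y_f - y), 0.0)
```

For a deflection of 2 mm against a fret at 1 mm this yields a restoring force of about -1000 N, while a string below the fret experiences no force at all.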

3.2. Sturm-Liouville transformation

The transformation of the spatial variable should have the same properties as the Laplace transformation has for the time variable: it should remove the spatial derivatives, and it should include the boundary conditions as additive terms. Unfortunately, there is no unique transformation available for this task, due to the finite spatial definition range in contrast to the infinite time axis. This calls for a determination of the spatial transformation at hand, depending on the spatial differential operator and the boundary conditions. Since it leads to an eigenvalue problem first solved for simplified problems by Sturm and Liouville between 1836 and 1838, this transformation is called a Sturm-Liouville transformation (SLT) [26]. Mathematical details of the SLT applied to scalar PDEs can be found in [12].

The SLT is defined by

$$\mathcal{T}\{Y(x,s)\} = \bar Y(\mu,s) = \int_0^l K(\mu,x)\, Y(x,s)\, dx. \quad (9)$$

Note that there is a finite integration range in (9), in contrast to the Laplace transformation. The transformation kernels $K(\mu,x)$ of the SLT are obtained as the set of eigenfunctions of the spatial operator $L_W = L + w_D(s) W_L$ with respect to the boundary conditions (8b). The corresponding eigenvalues are denoted by $\beta_\mu^4(s)$, where $\beta_\mu(s)$ is the discrete spatial frequency variable (see, e.g., [12] for details).

For the boundary-value problem defined in (8) with the operators given in (4b), the transformation kernels and the discrete spatial frequency variables result in

$$K(\mu,x) = \sin\Big(\frac{\mu\pi}{l}\, x\Big), \quad \mu \in \mathbb{N}, \quad (10a)$$
$$\beta_\mu^4(s) = EI \Big(\frac{\mu\pi}{l}\Big)^4 + \big(T_s + d_3 s\big) \Big(\frac{\mu\pi}{l}\Big)^2. \quad (10b)$$

Thus, the SLT can be interpreted as an extended Fourier series decomposition.

3.3. Multidimensional transfer function model

Applying the SLT (9) to the boundary-value problem (8) and solving for the transformed output variable $\bar Y(\mu,s)$ results in the MD TFM

$$\bar Y(\mu,s) = \frac{1}{d_D(s) + \beta_\mu^4(s)}\, \bar F_e(\mu,s). \quad (11)$$

Hence, the transformed input forces $\bar F_e(\mu,s)$ are related via the MD transfer function given in (11) to the transformed output variable $\bar Y(\mu,s)$. The denominator of the MD TFM depends quadratically on the temporal frequency variable $s$ and to the power of four on the spatial frequency variable $\beta_\mu$. This is based on the second-order temporal and fourth-order spatial derivatives occurring in the scalar PDE (4). Thus, the transfer function is a two-pole system with respect to time for each discrete spatial eigenvalue $\beta_\mu$.

3.4. Inverse transformations

As explained at the beginning of Section 3, the continuous solution in the time and space domain is now calculated by using inverse transformations.

Inverse SLT

The inverse SLT is defined by an infinite sum over all discrete eigenvalues $\beta_\mu$:

$$Y(x,s) = \mathcal{T}^{-1}\{\bar Y(\mu,s)\} = \sum_\mu \frac{1}{N_\mu}\, \bar Y(\mu,s)\, K(\mu,x). \quad (12)$$

The inverse transformation kernel $K(\mu,x)$ and the inverse spatial frequency variable $\beta_\mu$ are the same eigenfunctions and eigenvalues as for the forward transformation, due to the self-adjointness of the spatial operators $L$ and $W_L$ (see [12] for details). Thus, the inverse SLT can be evaluated at each spatial position by evaluating the infinite sum. Since only quadratic terms of $\mu$ occur in the denominator, it is sufficient to sum over positive values of $\mu$ and double the result to account for the negative values. The norm factor results in that case in $N_\mu = l/4$.

Inverse Laplace transformation

It can be seen from (11) with (8c) and (10b) that the transfer functions consist of two-pole systems with conjugate complex pole pairs for each discrete spatial eigenvalue $\beta_\mu$. Therefore, the inverse Laplace transformation results, for each spatial frequency variable, in a damped sinusoidal term, called a mode.

3.5. Continuous solution

After applying the inverse transformations to the MD TFM, the continuous solution results in

$$y(x,t) = \frac{4}{\rho A l} \sum_{\mu=1}^{\infty} \frac{1}{\omega_\mu} \Big[ e^{\sigma_\mu t} \sin\big(\omega_\mu t\big) * \bar f_e(\mu,t) \Big]\, K(\mu,x)\, \delta_{-1}(t). \quad (13)$$

The step function, denoted by $\delta_{-1}(t)$, is used since the solution is only valid for positive time instances; $*$ denotes temporal convolution. $\bar f_e(\mu,t)$ is the spatially transformed excitation force, derived by inserting $f_{e1}$ into (9). The angular frequencies $\omega_\mu$, as well as their corresponding damping coefficients $\sigma_\mu$, can be calculated from the poles of the transfer function model (11). They directly depend on the physical parameters of the string and can be expressed by

$$\omega_\mu = \sqrt{ \bigg(\frac{EI}{\rho A} - \Big(\frac{d_3}{2\rho A}\Big)^2\bigg) \Big(\frac{\mu\pi}{l}\Big)^4 + \bigg(\frac{T_s}{\rho A} - \frac{d_1 d_3}{2(\rho A)^2}\bigg) \Big(\frac{\mu\pi}{l}\Big)^2 - \Big(\frac{d_1}{2\rho A}\Big)^2 },$$
$$\sigma_\mu = -\frac{d_1}{2\rho A} - \frac{d_3}{2\rho A} \Big(\frac{\mu\pi}{l}\Big)^2. \quad (14)$$

Thus, an analytical continuous solution (13), (14) of the initial-boundary-value problem (4), (5), (6) is derived without temporal or spatial derivatives.
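Equation (14) maps the physical string parameters directly to the modal frequencies and dampings. The following small numerical sketch illustrates this; the parameter values are illustrative assumptions for a nylon guitar string with a fundamental near 247 Hz, not values taken from the article:

```python
import numpy as np

# Assumed string parameters (illustrative nylon guitar string)
rho_A = 8.96e-4          # rho*A [kg/m]
Ts    = 92.4             # tension T_s [N]
EI    = 2.65e-4          # bending stiffness E*I [N m^2]
d1, d3 = 1e-3, 1e-5      # damping coefficients
l     = 0.65             # length [m]

def modal_parameters(mu):
    """Angular frequencies omega_mu and damping coefficients sigma_mu after (14)."""
    k2 = (mu * np.pi / l) ** 2                      # (mu*pi/l)^2
    omega = np.sqrt((EI / rho_A - (d3 / (2.0 * rho_A)) ** 2) * k2**2
                    + (Ts / rho_A - d1 * d3 / (2.0 * rho_A**2)) * k2
                    - (d1 / (2.0 * rho_A)) ** 2)
    sigma = -d1 / (2.0 * rho_A) - d3 / (2.0 * rho_A) * k2
    return omega, sigma

mu = np.arange(1, 60)            # first 59 modes
omega, sigma = modal_parameters(mu)
f_mu = omega / (2.0 * np.pi)     # modal frequencies in Hz
```

The fundamental comes out near $\sqrt{T_s/\rho A}/(2l) \approx 247$ Hz, the stiffness term $EI$ makes the partials slightly inharmonic ($f_2 > 2 f_1$), and all $\sigma_\mu$ are negative, so every mode decays.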

3.6. Implicit equation for slap synthesis

The PDE (4) becomes nonlinear by adding the solution-dependent slap force $f_f(x_f,t,y,y_f)$ of (7) to the right-hand side of the linear PDE. Obviously, the application of the Laplace transformation and the SLT to the nonlinear initial-boundary-value problem cannot lead to an MD TFM, since a TFM always requires linearity. However, assuming that the nonlinearity can be represented as a finite power series and that it does not contain spatial derivatives, both transformations can still be applied to the system [12]. With (7), both premises are fulfilled, such that the slap force can also be transformed into the frequency domains. The $Y(x,s)$-dependency of $\bar F_f$ can be expressed with (12) in terms of $\bar Y(\nu,s)$ to stay consistently in the spatial frequency domain. Then an MD implicit equation is derived in the temporal and spatial frequency domain:

$$\bar Y(\mu,s) = \frac{1}{d_D(s)+\beta_\mu^4(s)} \Big[ \bar F_e(\mu,s) + \bar F_f\big(\mu,s,\bar Y(\nu,s)\big) \Big]. \quad (15)$$

Note that the different argument $\nu$ in the output dependence of $\bar F_f(\mu,s,\bar Y(\nu,s))$ denotes an interaction between all modes caused by the nonlinear slap force. Details can be found in [12].

Since the transfer functions in (11) and (15) are the same, the spatial transformation kernels and frequency variables also stay the same as in the linear case. Thus, the temporal poles of (15) are also the same as in the MD TFM (11), and the continuous solution results in the implicit equation

$$y(x,t) = \frac{4}{\rho A l} \sum_{\mu=1}^{\infty} \frac{1}{\omega_\mu} \Big[ e^{\sigma_\mu t} \sin\big(\omega_\mu t\big) * \Big( \bar f_e(\mu,t) + \bar f_f\big(\mu,t,\bar y(\nu,t)\big) \Big) \Big]\, K(\mu,x)\, \delta_{-1}(t), \quad (16)$$

with $\omega_\mu$ and $\sigma_\mu$ given in (14). It is shown in the next sections that this implicit equation is turned into explicit ones by applying different discretization schemes.

4. DISCRETIZATION AT AUDIO RATE

This section describes the discretization of the continuous solutions for the linear and the nonlinear cases. It is performed at audio rate, for example with the sampling frequency $f_s = 1/T = 44.1$ kHz, where $T$ denotes the sampling interval. The discrete realization is shown as it can be implemented in the computer. For the nonlinear slap synthesis, some extensions of the discrete realization are required and, furthermore, the stability of the entire system must be controlled.

4.1. Discretization of the linear MD model

The discrete realization of the MD TFM (11) consists of the three-step procedure performed below:

(1) discretization with respect to time,
(2) discretization with respect to space,
(3) inverse transformations.

Discretization with respect to time

Discretizing the time variable with $t = kT$, $k \in \mathbb{N}$, and assuming an impulse-invariant system, an $s$-to-$z$ mapping with $z^{-1} = e^{-sT}$ is applied to the MD TFM (11). This procedure directly leads to an MD TFM with the discrete-time frequency variable $z$:

$$\bar Y^d(\mu,z) = \frac{ \dfrac{T}{\rho A \omega_\mu}\, z\, e^{\sigma_\mu T} \sin\big(\omega_\mu T\big) }{ z^2 - 2 z\, e^{\sigma_\mu T} \cos\big(\omega_\mu T\big) + e^{2\sigma_\mu T} }\, \bar F^d(\mu,z). \quad (17)$$

The superscript $d$ denotes discretized variables. The angular frequency variables and the damping coefficients are given in (14). Pole-zero diagrams of the continuous and the discrete system are shown in [27].

Discretization with respect to space

For the spatial frequency domain, there is no need for discretization, since the spatial frequency variable $\mu$ is already discrete. However, a discretization has to be applied to the spatial variable $x$. This spatial discretization consists of simply evaluating the analytical solution (13) at a limited number of arbitrary spatial positions $x_a$ on the string. They can be chosen to be the pickup positions or the fret positions, respectively.

Inverse transformations

The inverse SLT can no longer be performed for an infinite number of $\mu$ due to the temporal discretization. To avoid temporal aliasing, the number of modes must be limited to $\mu_T$, such that $|\omega_{\mu_T} T| \le \pi$, which also ensures realizable computer implementations. Effects of this truncation are described in [12]. The most important conclusion is that the sound quality is not affected, since only modes beyond the audible range are neglected.

By applying the shifting theorem, the inverse $z$-transformation results in $\mu_T$ second-order recursive systems in parallel, each one realizing one vibrational mode of the string. The structure is shown with solid lines in Figure 2. This linear structure can be implemented directly in the computer, since it only includes delay elements $z^{-1}$, adders, and multipliers. Due to (14), the coefficients of the second-order recursive systems in Figure 2 depend only on the physical parameters of the vibrating string.

4.2. Extensions for slap synthesis

The discretization procedure for the nonlinear slap synthesis can be performed with the same three steps described in Section 4.1. Here, the discretized MD TFM is extended with the output-dependent slap force $\bar F_f^d(\mu,z,\bar Y^d(\nu,z))$ and thus stays implicit. However, after discretization with respect to space as described above, and inverse $z$-transformation with application of the shifting theorem, the resulting recursive systems are explicit. This is caused by the time shift of the excitation function due to the multiplication with $z$ in the numerator of (17). Therefore, the linear system given with solid lines in Figure 2 is extended with feedback paths, denoted by dashed lines, from the output to additional inputs between the unit delays of all recursive systems.


Figure 2: Basic structure of the FTM simulations derived from the linear initial-boundary-value problem (4), (5), and (6), with several second-order resonators in parallel. Solid lines represent the basic linear system, while dashed lines represent extensions for the nonlinear slap force.


Figure 3: Recursive system realization of one mode of the transversally vibrating string.
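The recursion behind the structures of Figures 2 and 3 follows directly from (17): with $a_1 = 2e^{\sigma_\mu T}\cos(\omega_\mu T)$, $a_2 = -e^{2\sigma_\mu T}$, and input weight $b_1 = (T/\rho A \omega_\mu)\, e^{\sigma_\mu T}\sin(\omega_\mu T)$, each mode is one second-order recursive system. A minimal single-mode sketch (mode and string parameters are illustrative assumptions) that also demonstrates the impulse-invariance property, namely that the recursion reproduces the sampled continuous mode $(T/\rho A\omega_\mu)\, e^{\sigma_\mu kT}\sin(\omega_\mu kT)$:

```python
import numpy as np

# Assumed mode and string parameters (illustrative values, e.g., from (14))
omega, sigma = 1552.0, -0.6      # omega_mu [rad/s], sigma_mu [1/s]
rho_A = 8.96e-4                  # rho*A [kg/m]
fs = 44100.0
T = 1.0 / fs

# Coefficients of the second-order recursive system in (17)
a1 = 2.0 * np.exp(sigma * T) * np.cos(omega * T)
a2 = -np.exp(2.0 * sigma * T)
b1 = T / (rho_A * omega) * np.exp(sigma * T) * np.sin(omega * T)

K = 1000
f = np.zeros(K); f[0] = 1.0      # unit force impulse at k = 0
out = np.zeros(K)
y1 = y2 = f1 = 0.0               # states y[k-1], y[k-2] and delayed input f[k-1]
for k in range(K):
    yk = a1 * y1 + a2 * y2 + b1 * f1
    out[k] = yk
    y2, y1, f1 = y1, yk, f[k]

# sampled continuous-time mode: the impulse-invariant reference
kk = np.arange(K)
ref = T / (rho_A * omega) * np.exp(sigma * kk * T) * np.sin(omega * kk * T)
```

Up to floating-point round-off, `out` and `ref` coincide, which is exactly the simulation-accuracy argument made for the impulse-invariant transformation in Section 3.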

The feedback paths are weighted with the nonlinear (NL) function (7).

The variables $c_{1,e}(\mu_T)$ and $c_{1,s}(\mu_T)$, denoting the weightings of the linear excitation force $f_e^d(k)$ at $x_e$ and of the slap force $f_f^d(k)$ at $x_f$, respectively, result with (9), (10a), and (17) in

$$c_{1,(e,s)}(\mu_T) = \frac{2T}{\rho A\, \omega_{\mu_T}} \sin\big(\omega_{\mu_T} T\big) \sin\Big(\frac{\mu_T \pi}{l}\, x_{(e,s)}\Big). \quad (18)$$

4.3. Guaranteeing stability

The discretized LTI systems derived in Section 4.1 are inherently stable as long as the underlying continuous physical model is stable, due to the use of the impulse-invariant transformation [25]. However, for the nonlinear system derived in Section 4.2, this stability consideration is not valid any more. It might happen that the passive slap force of the continuous system becomes active with the direct discretization approach [24]. To preserve the passivity of the system, and thus the inherent stability, the slap force must be limited such that the discrete impulses correspond to their continuous counterparts.

The instantaneous energy of the string vibration can be calculated by monitoring the internal states of the modal deflections [12]. The slap force limitation can then be obtained directly from the available internal states. For an illustration of these internal states, the recursive system of one mode $\mu_T$ is given in Figure 3.

The total instantaneous energy of the string vibration without slap force can be calculated with [12, 28] (time step $k$ and mode number $\mu_T$ dependencies are omitted for concise notation)

$$E_\text{vibr}(k) = \frac{4\rho A}{l} \sum_{\mu_T} \big(\sigma_{\mu_T}^2 + \omega_{\mu_T}^2\big)\, \frac{ \bar y_1^{d\,2} - 2\, \bar y_1^d\, \bar y_2^d\, e^{\sigma_{\mu_T} T} \cos\big(\omega_{\mu_T} T\big) + \bar y_2^{d\,2}\, e^{2\sigma_{\mu_T} T} }{ e^{2\sigma_{\mu_T} T} \sin^2\big(\omega_{\mu_T} T\big) }. \quad (19)$$

In (19), the instantaneous energy is calculated without application of the slap force, since the internal states $\bar y_1^d(\mu_T,k)$ are used (see Figure 3). For calculating the instantaneous energy $E_s(k)$ after applying the slap force, $\bar y_1^d(\mu_T,k)$ must be replaced with $\bar y_{1,s}^d(\mu_T,k)$ in (19). To meet the condition of passivity of the elastic slap collision, both energies must be related by $E_\text{vibr}(k) \ge E_s(k)$. Here, only the worst-case scenario with regard to the instability problem is discussed, where both energies are the same. Inserting the corresponding expressions of (19) into this energy equality and solving for the slap force $f_f^d(k)$ results in

$$f_f^d(k) = \sum_{\mu_T} c_5(\mu_T) \Big[ 2\, e^{\sigma_{\mu_T} T} \cos\big(\omega_{\mu_T} T\big)\, \bar y_2^d(\mu_T,k) - 2\, \bar y_1^d(\mu_T,k) \Big], \quad (20a)$$

with

$$c_5(\mu_T) = \frac{ c_{1,s}(\mu_T)\, \big(\sigma_{\mu_T}^2+\omega_{\mu_T}^2\big) \prod_{\nu_T \ne \mu_T} e^{2\sigma_{\nu_T} T} \sin^2\big(\omega_{\nu_T} T\big) }{ \sum_{\kappa_T} c_{1,s}^2(\kappa_T)\, \big(\sigma_{\kappa_T}^2+\omega_{\kappa_T}^2\big) \prod_{\nu_T \ne \kappa_T} e^{2\sigma_{\nu_T} T} \sin^2\big(\omega_{\nu_T} T\big) }. \quad (20b)$$

The force limitation discussed here can be implemented very efficiently. Only one additional multiplication, one summation, and one binary shift are needed for each vibrational mode (see (20a)), since the more complicated constants $c_5(\mu_T)$ have to be calculated only once, and the weighting of $\bar y_2^d(\mu_T,k)$ has to be performed within the recursive system anyway (compare Figure 3).

Discrete realizations of the analytical solutions of the MD initial-boundary-value problems have been derived in this section. For the linear and nonlinear systems, they result in stable and accurate simulations of the transversally vibrating string. The drawback of these straightforward discretization approaches of the MD systems in the frequency domains is the high computational complexity of the resulting realizations. Assuming a typical nylon guitar string with 247 Hz pitch frequency, 59 eigenmodes have to be calculated up to the Nyquist frequency at 22.05 kHz. With an average of 3.1 and 4.2 multiplications per output sample (MPOS) per recursive system for the linear and the nonlinear systems, respectively, the total computational cost for the whole string results in 183 MPOS and 248 MPOS. Note that the fractional parts of the average MPOS result from the assumption that there are only few time instances where an excitation force acts on the string, such that the input weightings of the recursive systems do not have to be calculated at each sample step. Since this is also assumed for the nonlinear slap force, the fractional part in the nonlinear system is higher than in the linear system.

These computational costs are approximately five times higher than those of the most efficient physical modeling method, the DWG [21]. The next section shows that this disadvantage of the FTM can be fixed by using a multirate approach for the simulation of the recursive systems.

5. DISCRETIZATION WITH A MULTIRATE APPROACH

The basic idea of using a multirate approach for the FTM realization is that the single modes have a very limited bandwidth as long as the damping coefficients $\sigma_\mu$ are small. By subdividing the temporal spectrum into different bands that are processed independently of each other, the modes within these bands can be calculated with a sampling rate that is a fraction of the audio rate. Thus, the computational complexity can be reduced with this method. The sidebands generated by this procedure at audio rate are suppressed with a synthesis filter bank when all bands are added up to the output signal. The input signals of the subsampled modes also have to be subsampled. To avoid aliasing, the respective input signals for the modes are obtained by processing the excitation signal $f_e^d(k)$ through an analysis filter bank. This general procedure is shown with solid lines in Figure 4. It shows several modes (RS # $i$), each one running at its respective downsampled rate.

This filter bank approach is discussed in detail in the next two sections for the linear as well as for the nonlinear model of the FTM.

5.1. Discretization of the linear MD model

For the realization of the structure shown in Figure 4, two major tasks have to be fulfilled [29]:

(1) designing an analysis and a synthesis filter bank that can be realized efficiently,
(2) developing an algorithm that can simulate band changes of single sinusoids to keep the flexibility of the FTM.

Filter bank design

There are numerous design procedures for filter banks that are mainly specialized to perfect or nearly perfect reconstruction requirements [30]. In the structure shown in Figure 4, there is no need for perfect reconstruction as in sound-processing applications, since the sound production mechanism is performed within the single downsampled frequency bands. Therefore, inaccuracies of the interpolation filters can be corrected by additional weightings of the subsampled recursive systems. Linear-phase filters with finite impulse responses (FIR) are used for the filter bank due to the variability of the single sinusoids over time. Furthermore, a real-valued generation of the sinusoids in the form of second-order recursive systems, as shown in Figure 2, is preferred to complex-valued first-order recursive systems. On the one hand, this approach avoids additional real-valued multiplications of complex numbers. On the other hand, the nonlinear slap implementation can be performed in a similar way for the multirate approach, as explained for the audio-rate realization in Section 4.2. A multirate realization of the FTM with complex-valued first-order systems is described in [31].

To fulfill these prerequisites and the requirement of low-order filters for computational efficiency, with necessarily flat filter edges, a filter bank with different downsampling factors for different bands has to be designed. A first step is to design a basic filter bank with $P_\text{ED}$ equidistant filters, all using the same downsampling factor $r_\text{ED} = P_\text{ED}$. Due to the flat filter edges, there will be $P_\text{ED} - 1$ frequency gaps between the single filters that have neither a sufficient passband amplification nor a sufficient stopband attenuation. These gaps are filled with low-order FIR filters that realize interpolation with downsampling factors different from $r_\text{ED}$.

[Figure 4 block diagram: the excitation fe^d(k) enters the analysis filter bank (downsampling factors 4 and 6), feeds the recursive systems RS #1 to RS #7, and their outputs pass through the synthesis filter bank (upsampling factors 4 and 6) to yield y^d(xa, k); the slap force ff^d(rk), derived from y^d(xa, rk), is fed back in the downsampled domain.]

Figure 4: Structure of the multirate FTM. Solid lines represent the basic linear system, while dashed and dotted lines represent the extensions for the nonlinear slap force. RS means recursive system. The arrow between RS # 3 and RS # 4 indicates a band change.
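One band of this structure can be sketched in a few lines. The modal parameters, the plain lowpass interpolator, and the omission of the analysis side are simplifying assumptions for illustration, not the article's actual design:

```python
import numpy as np
from scipy import signal

fs = 44100.0          # audio sampling rate
r = 4                 # downsampling factor of this band (as in Figure 4)
T = 1.0 / fs

# Hypothetical modal parameters of one partial realized in this band
f_mu, sigma_mu = 1234.0, 3.0          # frequency (Hz) and damping (1/s)
w_mu = 2 * np.pi * f_mu

# Second-order recursive system at the reduced rate rT
# (coefficients follow from replacing T by rT, compare (22))
a1 = 2.0 * np.exp(-sigma_mu * r * T) * np.cos(w_mu * r * T)
a2 = -np.exp(-2.0 * sigma_mu * r * T)

# Impulse excitation, assumed already band-limited and downsampled
# (the analysis filter bank is omitted here for brevity)
x_down = np.zeros(2048)
x_down[0] = 1.0

# y[k] = a1*y[k-1] + a2*y[k-2] + x[k], run at the reduced rate
y_down = signal.lfilter([1.0], [1.0, -a1, -a2], x_down)

# Upsampling: zero-stuffing followed by a low-order linear-phase FIR
# interpolation filter (a generic lowpass stands in for the band filter)
y_up = np.zeros(len(y_down) * r)
y_up[::r] = y_down
h = signal.firwin(35, 0.9 / r)        # order and cutoff are illustrative
y_audio = r * signal.lfilter(h, [1.0], y_up)
```

The resonator runs at one quarter of the audio rate, so its recursion costs only a fraction of the full-rate realization; the FIR interpolator restores the audio-rate signal.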

filled with low-order FIR filters that realize the interpolation for downsampling factors different from rED. The combination of all filters forms the filter bank. It is used for both the analysis and the synthesis filter bank as shown in Figure 4.

An example of this procedure is shown in Figure 5 with PED = 4. The total number of bands is P = 7. The frequency regions where the single filters are used as passbands in the filter bank are separated by vertical dashed lines. The filters are designed by a weighted least-squares method such that they meet the desired passband bandwidths and stopband attenuations. Note that there are several frequency regions for each filter where the frequency response is not specified explicitly. These so-called "don't care bands" occur since only a part of the Nyquist bandwidth in the downsampled domain is used for the simulation of the modes. Thus, there can only be images of these sinusoids in distinct regions of the upsampled version. All other parts of the spectrum are "don't care bands"; for the lowpass filter they are shown as gray areas in Figure 5. Magnitude ripples of ±3 dB are allowed in the passband, which can be compensated by a correction of the weighting factors of the single sinusoids. The stopbands are attenuated by at least −60 dB, which is sufficient for most listening conditions. Merely in studio-like listening conditions, larger stopband attenuations must be used such that artifacts produced by the filter bank cannot be heard.

Due to the different specifications of the filters concerning bandwidths and edge steepnesses, they have different orders and thus different group delays. The number of coefficients of the interpolation filters is denoted by Mp, where Mmax is the maximum order of all filters. To compensate for the different group delays, delay lines of length (Mmax − Mp)/2 are used in conjunction with the filters. The delay lines consume some memory space but no additional computational cost [32].

Figure 5: Top: frequency responses of the equidistant filters (with downsampling factor four in this example). Center: frequency responses of the filters with other downsampling factors. Bottom: frequency response of the filter bank. The downsampling factors r are given within the corresponding passbands. The FIR filter orders are between Mmin = 34 and Mmax = 72 in this example. They realize a stopband attenuation of at least −60 dB and allow passband ripples of ±3 dB.
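A weighted least-squares FIR design with explicit "don't care" regions can be reproduced, for instance, with scipy.signal.firls, which leaves any frequency range not listed in its band specification unconstrained. The band edges, weights, and filter length below are illustrative, not the values used in the article:

```python
import numpy as np
from scipy import signal

# Illustrative specification in normalized frequency (Nyquist = 1):
# one passband, two stopband regions, and unconstrained "don't care"
# gaps in between; frequencies not covered by `bands` are left free.
numtaps = 35                           # odd length: linear-phase FIR
bands   = [0.0, 0.12, 0.20, 0.30, 0.55, 0.70]
desired = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
weight  = [1.0, 50.0, 50.0]            # weight stopband errors more heavily

h = signal.firls(numtaps, bands, desired, weight=weight)

# Inspect the achieved magnitude response
w, H = signal.freqz(h, worN=8192)
f = w / np.pi
stop = np.abs(H[(f >= 0.20) & (f <= 0.30)])
print("worst magnitude in first stopband:", stop.max())
```

Increasing the stopband weights trades passband accuracy for stopband attenuation, which matches the article's strategy of tolerating passband ripple (corrected later by reweighting the sinusoids) while enforcing the stopband specification.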
Realizing the filter bank in a polyphase structure, each filter bank results in a computational cost of

    C_{filterbank} = \sum_{p=1}^{P} \frac{M_p}{r_p} \ \mathrm{MPOS},    (21)

with the downsampling factors rp of each band. For the example given above, each filter bank needs 73 MPOS. In (21) it is assumed that each band contains at least one mode to be reproduced, so that it is a worst-case scenario. As long as the excitation signal is known in advance, the excitations for each band can be precalculated such that only the synthesis filter bank must be implemented in real time. The case that the excitation signals are known and stored as wavetables in advance is quite frequent in physical modeling algorithms, although the pure physicality of the model is lost by this approach. For example, for string simulations, typical plucking or striking situations can be described by appropriate excitation signals which are determined in advance.

The practical realization of the multirate approach starts with the calculation of the modal frequencies ωµT and their corresponding damping coefficients σµT. The modal frequency determines in which band the mode is synthesized. The coefficients of the recursive systems, as shown in Figure 2 for the audio-rate realization, have to be modified in the downsampled domain since the sampling interval T is replaced by

    T^{(r)} = r T^{(1)} = r T.    (22)

Superscript (r) denotes the downsampled simulation with factor r. The downsampling factors of the different bands rp are given in the top and center plots of Figure 5. No further adjustments have to be performed for the coefficients of the recursive systems in the multirate approach, since modes can be realized in the downsampled baseband or in each of the corresponding images.

Band changes of single modes

One advantage of the FTM is that the physical parameters of a vibrating object can be varied while playing. This is valid not only for successively played notes but also within one note, as occurs, for example, in vibrato playing. As far as one or several modes are at the edges of the filter bank bands, these variations can cause the modes to change bands while they are active. This is shown with an arrow in Figure 4. In such a case, the reproduction cannot be performed by just adjusting the coefficients of the recursive systems with (22) to the new downsampling rate and using the other interpolation filter. This procedure would result in strong transients and in a modification of the modal amplitudes and phases. Therefore, a three-step procedure has to be applied to the band changing modes:

(1) adjusting the internal states of the recursive systems such that no phase shift and no amplitude difference occurs in the upsampled output signal from this mode,
(2) canceling the filter output of the band changing mode,
(3) training of the new interpolation filter to avoid transient behavior.

Similar to the calculation of the instantaneous energy for slap synthesis, the instantaneous amplitude and phase can also be calculated from the internal states of a second-order recursive system, ȳ1 and ȳ2. They can be calculated for the old band with downsampling factor r1, as well as for the new band with factor r2. Demanding the equality of both amplitudes and phases, the internal states of the new band are calculated from the internal states of the old band according to

    \bar{y}_1^{(r_2)} = \bar{y}_1^{(r_1)} \frac{\sin(\omega_\mu r_2 T)}{\sin(\omega_\mu r_1 T)} + \bar{y}_2^{(r_1)} e^{-\sigma_\mu r_1 T} \left( \frac{\sin(\omega_\mu r_2 T)}{\tan(\omega_\mu r_1 T)} - \cos(\omega_\mu r_2 T) \right),    (23)
    \bar{y}_2^{(r_2)} = \bar{y}_2^{(r_1)} e^{-\sigma_\mu (r_1 - r_2) T}.

The second item of the three-step procedure means that the output of the synthesis interpolation filter must not contain those modes that are leaving that band at time instance kchT for time steps kT ≥ kchT. Since the filter bank is a causal system of length MpT, the information of the band change must either be given in advance at (kch − Mp)T or a turbo filtering procedure has to be applied. In turbo filtering, the calculations of several sample steps are performed within one sampling interval at the cost of a higher peak computational complexity. In this case, the turbo filtering must calculate the previous outputs of the modes leaving the band and subtract their contribution from the interpolated output for time instances kT ≥ kchT. Due to the higher peak computational complexity of the turbo filtering and the low orders of the interpolation filters, the additional delay of MpT is preferred here.

In the same way as the band changing mode must not have an effect on the leaving band from kchT on, it must also be included in the interpolation filter of the new band from this time instance on. In other words, the new interpolation filter must be trained to correctly produce the desired mode without transients, as addressed in the third item of the three-step procedure above. This can also be performed with the turbo processing procedure at a higher computational cost, or with the delay of MpT between the information of the band change and its effect in the output signal.

Now, the linear solution (13) of the transversally vibrating string derived with the FTM is also realized with a multirate approach. Since the single modes are produced at a lower rate than the audio rate, this procedure saves computational cost in comparison to the direct discretization procedure derived in Section 4.1. The amount of computational savings with this procedure is discussed in more detail in Section 6.

5.2. Extensions for slap synthesis

In the discretization approach described in Section 4.2, the output y^d(xa, k) is fed back to the recursive systems via the path of the external force fe^d(k) (compare Figure 2).
Using the same path in the multirate system shown in Figure 4 would result in a long delay within the feedback path due to the delays in the interpolation filters of the analysis and the synthesis filter bank. Furthermore, the analysis filter bank should not be realized in real time as long as the excitation signal is known in advance.

Fortunately, the recursive systems directly calculate the instantaneous deflection of the single modes, but in the downsampled domain. Considering a system where all modes are simulated in the baseband, the signal can be fed back between the down- and upsampling boxes in Figure 4 and thus directly in the downsampled domain. In comparison to the full-rate system, the observation of the penetration of the string into the fret might be delayed by up to (rp − 1)T seconds. This delay results in a different slap force, but by applying the stabilization procedure described in Section 4.3 the stability is guaranteed.

However, in realistic simulations there are also modes in higher frequency bands than just in the baseband. This modifies the simulations described above in two ways:

(i) the deflection of the string, and thus the penetration into the fret, depends on the modes of all bands,
(ii) there is an interaction due to the nonlinear slap force between all modes in all bands.

The calculation of the instantaneous string deflection at the downsampled rates is rarely possible, since there are various downsampling rates as shown in Figure 4. Thus, there are only a few time instances kallT where the modal deflections are updated in all bands at the same time. Since in almost all bands one sample value of the recursive systems represents more than half the period of the mode, it is not reasonable to use the previously calculated sample value for the calculation of the deflection at time instances kT = kallT. However, all the equidistant bands of the filter bank as shown on top of Figure 5 have the same downsampling factor and can thus represent the same time instances for the calculation of the deflection. Furthermore, most of the energy of guitar string vibrations is in the lower modes [28], such that the deflection is mostly defined by the modes simulated in the lowest bands. Therefore, the string deflection is determined here at each r1th audio sample from all equidistant bands, and at each ((k mod r1 = 0) ∧ (k mod r2 = 0))th audio sample from all equidistant bands and the bands with the downsampling rate of the lowest band-pass. This is shown in the right dashed and dotted paths in Figure 4. In the example of Figure 5, at each fourth audio sample the deflection is calculated from the four equidistant bands, and at each twelfth audio sample it is calculated also from the second and sixth bands.

In the same way as the string deflection is calculated with varying participation of the different bands, the slap force is also only applied to modes in these bands, as shown in the left dashed and dotted paths in Figure 4. This procedure has two effects: firstly, there is no interaction between all modes at all (downsampled) time instances from the slap force. Secondly, the slap force itself, being an impulse-like signal with a bright spectrum, is filtered by the filter bank. The first effect is not that important since the procedure ensures interactions between most modes but only restricts them to few time instances, in the example above every fourth or twelfth audio sample. These low delays of the interaction are not noticeable. The second effect can be handled by adding impulses directly to the interpolation filters of the synthesis filter bank. The weights of the impulses in each band are determined by the difference between the sum of all slap force impulses in all bands and the applied slap force impulses in that band. In that way, a slap force applied only to baseband modes produces a nearly white-noise slap signal at audio rate.

The stabilization procedure described in Section 4.3 can also be applied to the multirate realization of the nonlinear slap force. The only differences to the audio-rate simulations are that T is replaced by rpT as given in (22), and that the summation for the calculation of the stable slap force ff^d(k) as given in (20a) is only performed over the modes realized in the participating bands. Thus, there are time instances where the slap force is only applied to the modes in the equidistant bands, and time instances where it is applied also to bands with another downsampling factor. This is shown with the dotted lines in Figure 4. Due to the different cases of participating bands, two versions of the constants c5(µT) also have to be calculated, since the products and sums in (20b) depend only on the participating modes.

Now, a stable and realistic simulation of the nonlinear slap force is also obtained in the multirate realization. In the nonlinear case, the simulation accuracy obviously decreases with higher downsampling factors and thus with an increasing number of bands. This effect is discussed in more detail in the next section.

6. SIMULATION ACCURACY AND COMPUTATIONAL COMPLEXITY

In the previous sections, stable linear and nonlinear discrete FTM models have been derived. In the following, the simulation accuracies of these models and their corresponding computational complexities are discussed.

6.1. Simulation accuracies

For the linearly vibrating string, the discrete realization of the single modes at full rate is an exactly sampled version of the continuous modes. This is true as long as the input force can be modeled with discrete impulses, since the impulse-invariant transformation is used as explained in Section 4.1. However, the exactness of the complete system is lost with the truncation of the summation of partials in (12) to avoid aliasing effects. Therefore, the results are only accurate as long as the excitation signal has only low energy in the truncated high-frequency range. This is true for the guitar and most other musical instruments [28]; furthermore, the neglected higher partials cannot be perceived by the human auditory system as long as the sampling interval T is chosen small enough. Since the audible modes are simulated exactly and the simulation error is out of the audible range, the FTM is used here as an optimized discretization approach for sound synthesis applications.
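The exactness of the full-rate realization can be checked numerically: with the impulse-invariant transformation, a second-order recursion reproduces the sampled damped sinusoid of one mode up to rounding. The modal parameters below are illustrative:

```python
import numpy as np
from scipy import signal

fs = 44100.0
T = 1.0 / fs
f_mu, sigma_mu = 247.0, 1.5      # illustrative modal frequency and damping
w_mu = 2 * np.pi * f_mu

# Impulse-invariant second-order recursion for one mode
b = [0.0, np.exp(-sigma_mu * T) * np.sin(w_mu * T)]
a = [1.0,
     -2.0 * np.exp(-sigma_mu * T) * np.cos(w_mu * T),
     np.exp(-2.0 * sigma_mu * T)]

x = np.zeros(1000)
x[0] = 1.0
y = signal.lfilter(b, a, x)

# Sampled continuous-time mode
k = np.arange(1000)
y_exact = np.exp(-sigma_mu * k * T) * np.sin(w_mu * k * T)

print(np.max(np.abs(y - y_exact)))   # numerically zero up to rounding
```

This mirrors the statement above: as long as the excitation is a train of discrete impulses, each simulated mode is an exact sample sequence of its continuous counterpart.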

In multirate simulations of linear systems as described in Section 5.1, the single modes are produced exactly within the downsampled domain. But due to the imperfection of the analysis filter bank, modes are excited not only by the correct frequency components of the excitation force but also by aliasing terms that occur with downsampling. In the same way, the images produced by upsampling the outputs of the recursive systems are not suppressed perfectly by the synthesis filter bank. However, the filter banks have been designed such that the stopband suppressions are at least −60 dB. This is sufficient for most listening conditions as defined in Section 5.1. Furthermore, the filters are designed in a least-mean-squares sense such that the energy of the side lobes in the stopbands is minimized. Further filter bank optimizations with respect to the human auditory system are difficult, since the filter banks are designed only once for all kinds of mode configurations concerning their positions and amplitude relations in the simulated spectrum.

In the audio-rate string model excited nonlinearly with the slap force as described in Section 4.2, the truncation of the infinite sum in (16) also affects the accuracy of the lower modes through the nonlinearity. The simulations are accurate only as long as the external excitation and the nonlinearity have low contributions to the higher modes. Although the external excitation rarely contributes to the higher modes, there is an interaction between all modes due to the slap force. This interaction grows with the modal frequencies. It can be seen directly in the coefficients c5(µT) in (20b), since they have larger absolute values for higher frequencies. However, the force contributions of the omitted modes are distributed to the simulated modes, since the denominator of (20b) decreases for fewer simulated partials. Furthermore, the sign of c5(µT) changes with µT due to (18), as does the expression in parentheses of (20a) with time. Thus, there is a bidirectional interaction between low and high modes and not only an energy shift from low to high frequencies. Neglecting modes out of the audible range results in less energy fluctuation of the audible modes. But since the neglected energy fluctuations have high frequencies, they are also out of the audible range.

In the multirate implementation of the nonlinear model as described in Section 5.2, the interactions between almost all modes are retained. It is more critical here that the observation of the fret-string penetration might be delayed by several audio samples. This not only circumvents the strict limitation of the string deflection by the fret, but also changes the modal interactions because the nonlinear system is not time-invariant. However, the audible slap effect stays similar to the full-rate simulations and sounds realistic. Audio examples can be found at http://www.LNT.de/∼traut/JASP04/sounds.html.

It has been shown that the FTM realizes the continuous solutions of the physical models of the vibrating string accurately. With the multirate approach, the FTM loses the exactness of the linear audio-rate model, but the inaccuracies cannot be heard. For the nonlinear model, the multirate approach leads to audible differences compared to the audio-rate simulations, but the characteristics of the slap sounds are preserved. Thus, simplifications and computational savings due to the filter bank approach are performed here with respect to the human auditory system.

6.2. Computational complexities

The computational complexities of the FTM are explained with two typical examples, a single bass guitar string simulated in different qualities and a six-string acoustic guitar. The first example simulates the vibration of one bass guitar string with a fundamental frequency of 41 Hz. The corresponding physical parameters can be found, for example, in [12]. This string is simulated in different sound qualities by varying the number of simulated modes from 1 to 117, which corresponds to the simulation of all modes up to the Nyquist frequency with a sampling frequency of fs = 44.1 kHz.

Figure 6 shows the dependency of the computational complexities on the number of simulated modes, and thus on the simulation accuracy or sound quality. The procedure used here to enhance the sound quality consists of simulating more and more modes in consecutive order from the lowest mode on. Thus, the enhancement of the sound quality sounds like opening the lowpass in subtractive synthesis. The upper plot shows the computational complexities for the linear system, simulated at audio rate and with the multirate approach using filter banks with P = 7 and P = 15. The bottom plot shows the corresponding graphs for the nonlinear systems. It is assumed that the external forces act on the string only at one tenth of the output samples, such that the weighting of the inputs does not have to be performed at each time instance. Thus, each linear recursive system needs 3.1 MPOS for the calculation of one output sample, whereas the nonlinear system needs 4.2 MPOS.

It can be seen that the multirate implementations are much more efficient than the audio-rate simulations, except for simulations with very few modes. With all 117 simulated modes, the relation between audio-rate and multirate simulations (P = 7) is 363 MPOS to 157 MPOS for the linear system and 492 MPOS to 187 MPOS for the nonlinear system. This is a reduction of the computational complexity of about 60%.

The steps in the multirate graphs denote the offset of the filter bank realization and the fact that the interpolations of the filter bank bands are only calculated as long as there is at least one mode simulated in those bands. On the one hand, the regions between the steps are steeper in the filter bank with P = 7 than in that with P = 15, due to the higher downsampling factors in filter banks with more bands. On the other hand, the steps are higher for filter banks with more bands due to the higher interpolation filter orders. In this example, the multirate approach with P = 7 is superior to the filter bank with P = 15 for high qualities, since there are only a few modes simulated in the higher bands of P = 15, but the filter bank offset is higher. For other configurations with a higher number of simulated modes, this situation is different, as shown in the next example.
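The audio-rate figures quoted above follow directly from the per-mode costs; a short cross-check, where the rounding-up convention and the generic multirate cost model are assumptions, can be written as:

```python
import math

# Per-mode costs quoted in the text (multiplications per output sample)
MPOS_LIN, MPOS_NONLIN = 3.1, 4.2
N_MODES = 117

# Audio-rate costs for the 117-mode bass string (rounded up)
audio_lin = math.ceil(N_MODES * MPOS_LIN)        # 363 MPOS
audio_nonlin = math.ceil(N_MODES * MPOS_NONLIN)  # 492 MPOS

# Multirate cost model: fixed filter-bank offset (73 MPOS for P = 7,
# compare (21)) plus per-mode costs reduced by each band's factor r_p.
# The mode-to-band assignment passed in is hypothetical.
C_FILTERBANK = 73
def multirate_cost(modes_per_band, factors, mpos):
    return C_FILTERBANK + sum(
        n * mpos / r for n, r in zip(modes_per_band, factors))
```

With average downsampling factors around four, this model reproduces the order of magnitude of the reported 157 MPOS multirate figure.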

[Figure 6: two plots, (a) and (b), of computational complexity (MPOS, 0 to 200) versus number of modes (0 to 120).]

Figure 6: Computational complexities of the FTM simulations dependent on the number of simulated modes at audio rate (dotted line), and with multirate approaches with P = 7 (dashed line) and P = 15 (solid line). (a): Linearly vibrating string, (b): vibrating string with nonlinear slap forces.

The second example shows the computational complexities of the simultaneous simulation of six independent strings as they occur in an acoustic guitar. Obviously, only one interpolation filter bank is needed for all strings. The average number of simulated modes for each guitar string is assumed to be 60. In contrast to the first example, it is assumed that the modes are equally distributed in the frequency domain, such that at least one mode is simulated in each band.

Figure 7 shows that the computational complexities depend on the choice of the filter bank. On the one hand, each filter bank needs a fixed amount of computational cost, which grows with the number of bands. On the other hand, filter banks with more bands provide higher downsampling factors for the production of the sinusoids, which saves computational cost. Thus, the choice of the optimal filter bank depends on the number of simultaneously simulated modes. For practical implementations, this has to be estimated in advance.

Figure 7: Computational complexities of the FTM simulations of a six-string guitar dependent on the number of bands for the multirate approach. Solid line: linearly vibrating string. Dashed line: vibrating string with nonlinear slap forces.

It can be seen that for the linear case (solid line) the minimum computational cost is 272 MPOS using the filter bank with P = 11. In the nonlinear case, the filter bank with P = 15 has the minimum computational cost with 319 MPOS for the simulation of all six strings. Compared to the audio-rate simulations with 1116 MPOS and 1512 MPOS for the linear and nonlinear cases, respectively, the multirate simulations allow computational savings of up to 79%. Thus, the multirate simulations have a computational complexity of approximately 45 MPOS (53 MPOS) for each linearly (nonlinearly) simulated string.

Compared to high-quality DWG simulations, the computational complexities of the multirate FTM approach are nearly the same.
Linear DWG simulations need up to 40 MPOS for the realization of the reflection filters [21], and the nonlinear limitation of the string by the fret additionally needs 3 MPOS per fret position [22].
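The guitar figures can be cross-checked in a few lines; all numbers are taken from the text above:

```python
# Per-mode costs and configuration from the text
MPOS_LIN, MPOS_NONLIN = 3.1, 4.2
N_STRINGS, MODES_PER_STRING = 6, 60
n_modes = N_STRINGS * MODES_PER_STRING        # 360 modes in total

audio_lin = n_modes * MPOS_LIN                # 1116 MPOS
audio_nonlin = n_modes * MPOS_NONLIN          # 1512 MPOS

# Best multirate costs reported in the text (P = 11 and P = 15)
multi_lin, multi_nonlin = 272, 319

savings = 1 - multi_nonlin / audio_nonlin     # about 0.79
per_string_lin = multi_lin / N_STRINGS        # about 45 MPOS
per_string_nonlin = multi_nonlin / N_STRINGS  # about 53 MPOS
```

The arithmetic confirms the quoted savings of up to 79% and the per-string figures of roughly 45 and 53 MPOS.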

7. CONCLUSIONS

The complete procedure of the FTM has been described, from the basic physical analysis of a vibrating structure resulting in an initial boundary value problem, via its analytical solution, to efficient digital multirate implementations. The transversally vibrating dispersive and lossy string with a nonlinear slap force served as an example. The novel contribution is a thorough investigation of the implementation and the properties of a multirate realization.

It has been shown that the differences between audio-rate and multirate simulations for linearly vibrating string simulations are not audible. The differences of the nonlinear simulations were audible, but the multirate approach preserves the sound characteristics of the slap sound. The application of the multirate approach saves almost 80% of the computational cost at audio rate. Thus, it is nearly as efficient as the most popular physical modeling method, the DWG.

The multirate FTM is by no means limited to the example of vibrating strings. It can be applied in a similar way to spatially multidimensional systems, like membranes or plates, or even to other physical problems like heat flow or diffusion.

ACKNOWLEDGMENTS

The authors would like to thank Vesa Välimäki for numerous discussions and his help in the filter bank design for the multirate FTM. Furthermore, the financial support of the Deutsche Forschungsgemeinschaft (DFG) for this research is gratefully acknowledged.

REFERENCES

[1] C. Roads, S. Pope, A. Piccialli, and G. De Poli, Eds., Musical Signal Processing, Swets & Zeitlinger, Lisse, The Netherlands, 1997.
[2] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I," Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462–470, 1971.
[3] A. Chaigne and V. Doutaut, "Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars," Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.
[4] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.
[5] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods," Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.
[6] M. Karjalainen, "1-D digital waveguide modeling for improved sound synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1869–1872, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.
[7] C. Erkut and M. Karjalainen, "Finite difference method vs. digital waveguide method in string instrument modeling and synthesis," in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[8] C. Cadoz, A. Luciani, and J. Florens, "Responsive input devices and sound synthesis by simulation of instrumental mechanisms: the CORDIS system," Computer Music Journal, vol. 8, no. 3, pp. 60–73, 1984.
[9] J. M. Adrien, "Dynamic modeling of vibrating structures for sound synthesis, modal synthesis," in Proc. AES 7th International Conference, pp. 291–299, Audio Engineering Society, Toronto, Canada, May 1989.
[10] G. De Poli, A. Piccialli, and C. Roads, Eds., Representations of Musical Signals, MIT Press, Cambridge, Mass, USA, 1991.
[11] G. Eckel, F. Iovino, and R. Caussé, "Sound synthesis by physical modelling with Modalys," in Proc. International Symposium on Musical Acoustics, pp. 479–482, Le Normant, Dourdan, France, July 1995.
[12] L. Trautmann and R. Rabenstein, Digital Sound Synthesis by Physical Modeling Using the Functional Transformation Method, Kluwer Academic Publishers, New York, NY, USA, 2003.
[13] D. A. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.
[14] K. Karplus and A. Strong, "Digital synthesis of plucked-string and drum timbres," Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.
[15] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[16] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.
[17] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.
[18] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[19] R. Rabenstein, "Discrete simulation of dynamical boundary value problems," in Proc. EUROSIM Simulation Congress, pp. 177–182, Vienna, Austria, September 1995.
[20] L. Trautmann and R. Rabenstein, "Digital sound synthesis based on transfer function models," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 83–86, IEEE Signal Processing Society, New Paltz, NY, USA, October 1999.
[21] L. Trautmann, B. Bank, V. Välimäki, and R. Rabenstein, "Combining digital waveguide and functional transformation methods for physical modeling of musical instruments," in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 307–316, Espoo, Finland, June 2002.
[22] E. Rank and G. Kubin, "A waveguide model for slapbass synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 443–446, IEEE Signal Processing Society, Munich, Germany, April 1997.
[23] M. Kahrs and K. Brandenburg, Eds., Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[24] L. Trautmann and R. Rabenstein, "Stable systems for nonlinear discrete sound synthesis with the functional transformation method," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1861–1864, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.
[25] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems, John Wiley & Sons, Chichester, West Sussex, UK, 2001.
[26] R. V. Churchill, Operational Mathematics, McGraw-Hill, New York, NY, USA, 3rd edition, 1972.
[27] R. Rabenstein and L. Trautmann, "Digital sound synthesis of string instruments with the functional transformation method," Signal Processing, vol. 83, no. 8, pp. 1673–1688, 2003.
[28] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1998.
[29] L. Trautmann and V. Välimäki, "A multirate approach to physical modeling synthesis using the functional transformation method," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 221–224, IEEE Signal Processing Society, New Paltz, NY, USA, October 2003.
[30] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
[31] S. Petrausch and R. Rabenstein, "Sound synthesis by physical modeling using the functional transformation method: Efficient implementation with polyphase filterbanks," in Proc. International Conference on Digital Audio Effects, London, UK, September 2003.
[32] B. Bank, "Accurate and efficient method for modeling beating and two-stage decay in string instrument synthesis," in Proc. MOSART Workshop on Current Research Directions in Computer Music, pp. 134–137, Barcelona, Spain, November 2001.

L. Trautmann received his "Diplom-Ingenieur" and "Doktor-Ingenieur" degrees in electrical engineering from the University of Erlangen-Nuremberg, in 1998 and 2002, respectively. In 2003 he was working as a Postdoc in the Laboratory of Acoustics and Audio Signal Processing at the Helsinki University of Technology, Finland. His research interests are in the simulation of multidimensional systems with a focus on digital sound synthesis using physical models. Since 1999, he has published more than 25 scientific papers, book chapters, and books. He holds several patents on digital sound synthesis.

R. Rabenstein received his "Diplom-Ingenieur" and "Doktor-Ingenieur" degrees in electrical engineering from the University of Erlangen-Nuremberg, in 1981 and 1991, respectively, as well as the "Habilitation" in signal processing in 1996. He worked with the Telecommunications Laboratory of this university from 1981 to 1987 and since 1991. From 1987 to 1991, he was with the Physics Department of the University of Siegen, Germany. His research interests are in the fields of multidimensional systems theory and simulation, multimedia signal processing, and computer music. He serves in the IEEE TC on Signal Processing Education. He is a Board Member of the School of Engineering of the Virtual University of Bavaria and has participated in several national and international research cooperations.

EURASIP Journal on Applied Signal Processing 2004:7, 964–977
© 2004 Hindawi Publishing Corporation

Physically Inspired Models for the Synthesis of Stiff Strings with Dispersive Waveguides

I. Testa
Dipartimento di Scienze Fisiche, Università di Napoli "Federico II," Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

G. Evangelista
Dipartimento di Scienze Fisiche, Università di Napoli "Federico II," Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

S. Cavaliere
Dipartimento di Scienze Fisiche, Università di Napoli "Federico II," Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

Received 30 June 2003; Revised 17 November 2003

We review the derivation and design of digital waveguides from physical models of stiff systems, useful for the synthesis of sounds from strings, rods, and similar objects. A transform method approach is proposed to solve the classic fourth-order equations of stiff systems in order to reduce them to two second-order equations. By introducing scattering boundary matrices, the eigenfrequencies are determined and their n² dependency is discussed for the clamped, hinged, and intermediate cases. On the basis of the frequency-domain physical model, the numerical discretization is carried out, showing how the insertion of an all-pass delay line generalizes the Karplus-Strong algorithm for the synthesis of ideally flexible vibrating strings. Knowing the physical parameters, the synthesis can proceed using the generalized structure. Another point of view is offered by Laguerre expansions and frequency warping, which are introduced in order to show that a stiff system can be treated as a nonstiff one, provided that the solutions are warped. A method to compute the all-pass chain coefficients and the optimum warping curves from sound samples is discussed. Once the optimum warping characteristic is found, the length of the dispersive delay line to be employed in the simulation is simply determined from the requirement of matching the desired fundamental frequency. The regularization of the dispersion curves by means of optimum unwarping is experimentally evaluated.

Keywords and phrases: physical models, dispersive waveguides, frequency warping.

1. INTRODUCTION

Interest in digital audio synthesis techniques has been reinforced by the possibility of transmitting signals to a wider audience within the structured audio paradigm, in which algorithms and restricted sets of data are exchanged [1]. Among these techniques, the physically inspired models play a privileged role since the data are directly related to physical quantities and can be easily and intuitively manipulated in order to obtain realistic sounds in a flexible framework. Applications are, amongst others, the simulation of a "physical situation" producing a class of sounds, for example, a closing door, a car crash, the hiss made by a crawling creature, human-computer interaction and, of course, the simulation of musical instruments.

In the general physical models technique, continuous-time solutions of the equations describing the physical system are sought. However, due to the complexity of the real physical systems, from the classic design of musical instruments to the molecular structure of extended objects, solutions of these equations cannot generally be found in an analytic way, and one should resort to numerical methods or approximations. In many cases, the resulting approximation scheme only approximately matches the exact model. For this reason, one could better define these methods as physically inspired models, as first proposed in [2], where the mathematical equations or solutions of the physical problem serve as a solid base to inspire the actual synthesis scheme. One of the advantages of using physically inspired models for sound synthesis is that they allow us to perform a "selection" of the physical parameters actually influencing the sound, so that a trade-off between completeness and particular goals can be achieved.

In the following, we will focus on stiff vibrating systems, including rods and stiff strings as encountered in pianos. However, extensions to two- or three-dimensional systems can be carried out with little effort.

Vibrating physical systems have been extensively studied over the last thirty years for their key role in many musical instruments. The wave equation can be directly approximated by means of finite difference equations [3, 4, 5, 6, 7], or by discretization of the wave functions as proposed by Jaffe and Smith [8, 9], who reinterpreted and generalized the Karplus-Strong algorithm [10] in a wave propagation setting. The outcome of the approximation of the time domain solution of the wave equation is the design of a digital waveguide simulating the string itself: the sound signal simulation is achieved by means of an appropriate excitation signal, such as white noise. However, in order to achieve a more realistic and flexible synthesis, the interaction of the excitation system with the vibrating element is, in turn, physically modeled. Digital waveguide methods for the simulation of physical models have been widely used [11, 12, 13, 14, 15, 16]. One of the reasons for their success is that they are appropriate for real-time synthesis [17, 18, 19, 20]. This result allowed us to change our approach to modeling musical instruments based on vibrating strings: waveguides can be designed for modeling the "core" of the instruments, that is, the vibrating string, but they are also suitable for the integration of interaction models, for example, for the excitation due to a hammer [21] or to a bow [9], the radiation of sound due to the body of the instrument [22, 23, 24, 25], and different side effects in plucked strings [26]. It must be pointed out that, the interactions being highly nonlinear, their modeling and the determination of the range of stability are not an easy task.

In this paper, we will review the design of a digital waveguide simulating a vibrating stiff system, focusing on stiff strings and treating bars as a limit case where the tension is negligible. The purpose is to derive a general framework inspiring the determination of a discrete numerical model. A frequency domain approach has been privileged, which allows us to separate the fourth-order differential equation of stiff systems into two second-order equations, as shown in Section 2. This approach is also useful for the simulation of two-dimensional (2D) systems such as thin plates. By enforcing proper boundary conditions, we obtain the eigenfrequencies and the eigenfunctions of the system as found, for the case of strings, in the classic works by Fletcher [27, 28]. Once the exact solutions are completely characterized, their numerical approximation is discussed [29, 30] together with its justification based on physical reasoning. The discretization of the continuous-time domain solutions is carried out in Section 3, which naturally leads to dispersive waveguides based on a long chain of all-pass filters. From a different point of view, the derived structure can be described in terms of Laguerre expansions and frequency warping [31]. In this framework, a stiff system can be shown to be equivalent to a nonstiff (Karplus-Strong like) system, whose solutions are frequency warped, provided that the initial and the possibly moving boundary conditions are properly unwarped [32, 33]. As a side effect, this property can be exploited in order to perform an analysis of piano sounds by means of pitch-synchronous frequency warped wavelets in which the excitation can be separated from the resonant sound components [34].

The models presented in this paper provide at least two entry points for the synthesis. If the physical parameters and boundary conditions are completely known, or if it is desired to specify them to model arbitrary strings or rods, then the eigenfunctions, hence the dispersion curve, can be determined. The problem is then reduced to that of finding the best approximation of the continuous-time dispersion curve by the phase response of a suitable all-pass chain, using the methods illustrated in Section 3. Another entry point is offered if sound samples of an instrument are available. In this case, the parameters of the synthesis model can be determined by finding the warping curve that best fits the data given by the frequencies of the partials, together with the length of the dispersive delay line. This is achieved by means of a regularization method applied to the experimental dispersion data, as reported in Section 4.

The physical entry point is to be preferred in those situations where sound samples are not available, for example, when we are modeling a nonexisting instrument by extension of the physical model, such as a piano with unusual speaking length. The other entry point is best for approximating real instrument sounds. However, in this case, the synthesis is limited to existing sources, although some variations can be obtained in terms of the warping parameters, which are related to, but do not directly represent, physical factors.

2. PHYSICAL STIFF SYSTEMS

In this section, we present a brief overview of the stiff string and rod equations of motion and of their solution. The purpose is twofold. On the one hand, these equations give the necessary background to the physical modeling of stiff strings. On the other hand, we show that their frequency domain solution ultimately provides the link between continuous-time and discrete-time models, useful for the derivation of the digital waveguide and suitable for its simulation. This link naturally leads to Laguerre expansions for the solution and to frequency warping equivalences. Furthermore, enforcing proper boundary conditions determines the eigenfrequencies and eigenfunctions of the system, useful for fitting experimentally measured resonant modes to the ones obtained by simulation. This fit allows us to determine the parameters of the waveguide through optimization.

2.1. Stiff string and bar equation

The equation of motion for the stiff string can be determined by studying the equilibrium of a thin plate [35, 36]. One obtains the following fourth-order differential equation for the deformation y(x, t) of the string:

    −ε ∂⁴y(x, t)/∂x⁴ + ∂²y(x, t)/∂x² = (1/c²) ∂²y(x, t)/∂t²,
    ε = EI/T,  c = √(T/(ρS)),    (1)

featuring the Young modulus E of the material, the moment of inertia I of the cross-section of the string with respect to the transversal axis (for a circular section of radius r, I = πr⁴/4, as in [36]), the tension T of the string, and the mass per unit length ρS. Note that for ε → 0, (1) becomes the well-known equation of the vibrating string [35]. Otherwise, if the applied tension T is negligible, we obtain

    −ε′ ∂⁴y(x, t)/∂x⁴ = ∂²y(x, t)/∂t²,  ε′ = EI/(ρS),    (2)

which is the equation for the transversal vibrations of rods. Solutions of (1) and (2) are best found in terms of the Fourier transform of y(x, t) with respect to time:

    Y(x, ω) = ∫₋∞⁺∞ y(x, t) exp(−iωt) dt,    (3)

where ω is the angular frequency, related to the frequency f by the relationship ω = 2πf.

By taking the Fourier transform of both members of (1) and (2), we obtain

    ε ∂⁴Y(x, ω)/∂x⁴ − ∂²Y(x, ω)/∂x² − (ω²/c²) Y(x, ω) = 0    (4)

for the stiff string and

    ε′ ∂⁴Y(x, ω)/∂x⁴ − ω² Y(x, ω) = 0    (5)

for the rod.

The second-order spatial differential operator ∂²/∂x² is defined as a repeated application of the L₂ space extension of the −i(∂/∂x) operator [37]. To this purpose, we seek solutions whose spatial and frequency dependencies can be factored, according to the separation of variables method, as follows:

    Y(x, ω) = W(ω)X(x).    (6)

Substituting (6) in (4) and (5) results in the elimination of the W(ω) term, obtaining ordinary differential equations whose characteristic equations, respectively, are

    ελ⁴ − λ² − ω²/c² = 0  (stiff string),
    ε′λ⁴ − ω² = 0  (rod).    (7)

The elementary solutions for the spatial part X(x) have the form X(x) = C exp(λx). It is important to note that in both cases, the characteristic equations have the following form:

    (λ² − ξ₁²)(λ² − ξ₂²) = 0,    (8)

where ξ₁ and ξ₂ are, in general, complex numbers that depend on ω. Equation (8) allows us to factor both equations in (4) and (5) as follows:

    (∂²/∂x² − ξ₁²) · (∂²/∂x² − ξ₂²) Y(x, ω) = 0.    (9)

The operator −∂²/∂x² is selfadjoint with respect to the L₂ scalar product [37]. Therefore, (9) can be separated into the following two independent equations:

    (∂²/∂x² − ξ₁²) Y₁(x, ω) = 0,
    (∂²/∂x² − ξ₂²) Y₂(x, ω) = 0,    (10)

where

    Y(x, ω) = Y₁(x, ω) + Y₂(x, ω).    (11)

As we will see, (10) justifies the use, with proper modifications, of a second-order generalized waveguide based on progressive and regressive waves for the numerical simulation of stiff systems.

2.2. General solution of the stiff string and bar equations

In this section, we will provide the general solution of (8). The particular eigenfunctions and eigenfrequencies of rods and stiff strings are determined by proper boundary conditions and are treated in Section 2.3. From (7), it can be shown that

    ξ₁± = ±√[(1 − √(1 + 4ω²ε/c²)) / (2ε)],
    ξ₂± = ±√[(1 + √(1 + 4ω²ε/c²)) / (2ε)]  (stiff string),

    ξ₁± = ±√(−ω/√ε′),
    ξ₂± = ±√(ω/√ε′)  (rod).    (12)

Note that in both cases, the eigenvalues ξ₁± are complex numbers, while ξ₂± are real numbers. It is also worth noting that

    ξ₁² + ξ₂² = 1/ε  (stiff string),
    ξ₁² + ξ₂² = 0  (rod),    (13)

where ξ₁ corresponds to the positive choice of the sign in front of the square root in (12) and ξ₂ = |ξ₂±|. As expected, if we let T → 0, then both sets of eigenvalues of the stiff string tend to those found for the rod. Using the equations in (12), we then have for both strings and rods

    Y₁(x, ω) = c₁⁺ exp(ξ₁x) + c₁⁻ exp(−ξ₁x),
    Y₂(x, ω) = c₂⁺ exp(ξ₂x) + c₂⁻ exp(−ξ₂x),    (14)

where c₁±, c₂± are, in general, functions of ω. Note that Y₁(x, ω) is an oscillating term while, since ξ₂ is real, Y₂(x, ω) is nonoscillating. For finite-length strings, both positive and negative real exponentials are to be retained.
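The algebra of (12) and (13) is easy to check numerically: the four roots of the characteristic polynomial ελ⁴ − λ² − ω²/c² group into an imaginary pair ±ξ₁ and a real pair ±ξ₂, whose squares sum to 1/ε. A minimal sketch (the parameter values are arbitrary illustrative choices, not data from the paper):

```python
import cmath
import math

# illustrative (not physical) parameter values
eps, c, w = 0.785, 1.84e4, 2 * math.pi * 1000.0

s = math.sqrt(1 + 4 * w**2 * eps / c**2)
xi1 = cmath.sqrt((1 - s) / (2 * eps))   # imaginary root pair (oscillating part)
xi2 = math.sqrt((1 + s) / (2 * eps))    # real root pair (nonoscillating part)

# both +-xi1 and +-xi2 must solve eps*l^4 - l^2 - w^2/c^2 = 0, cf. (7)-(8)
for l in (xi1, -xi1, xi2, -xi2):
    assert abs(eps * l**4 - l**2 - w**2 / c**2) < 1e-6 * (w / c)**2

# the sum rule (13) for the stiff string: xi1^2 + xi2^2 = 1/eps
assert abs(xi1**2 + xi2**2 - 1 / eps) < 1e-9
print(xi1, xi2)
```

The same script with the rod roots ±√(±ω/√ε′) verifies the second line of (13).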

From (12), we see that the primary effect of stiffness is the dependency on frequency of the argument (from now on, phase) of the solutions of (4) and (5). Therefore, the propagation of the wave from one section of the string located at x to the adjacent section located at x + ∆x is obtained by multiplication by a frequency dependent factor exp(ξ₁∆x). Consequently, the group velocity u, defined as u ≡ (dξ₁′/dω)⁻¹, also depends on frequency. This results in a dispersion of the wave packet, characterized by the function ξ₁(ω), whose modulus is plotted in Figure 1 for the case of a brass string using the following values of the physical parameters r, T, ρ, and E:

    r = 1 mm,
    T = 9 · 10⁷ dyne,
    ρ = 8.44 g cm⁻³,
    E = 9 · 10¹¹ dyne cm⁻².    (15)

Figure 1: Plot of the phase modulus |ξ₁| (cm⁻¹) versus frequency (Hz) of the stiff model equation solution for ε = π/4 cm² and c ≈ 2 · 10⁴ cm s⁻¹.

Clearly, the previous example is a very crude approximation of a physical piano string (e.g., real-life piano strings in the low register are built out of more than one material, with a copper or brass wire wrapped around a steel core). For the sake of completeness, we give the explicit expression of |u| in both the cases we are studying. We have

    |u| = 2c √(c² + 4ω²ε) / √(2c² + 2c √(c² + 4ω²ε))  (stiff string),
    |u| = 2 √(ω√ε′)  (rod).    (16)

If T → 0, the two group velocities are equal. Moreover, if in the first line of (16) we let ε → 0, then u → c, which is the limit case of the ideally flexible vibrating string. These facts further justify the use of a dispersive waveguide in the numerical simulation. With respect to this point, a remark is in order: the dispersion introduced by stiffness can be treated as a limiting "nonphysical" consequence of the Euler-Bernoulli beam equation:

    d²/dx² (EI d²y/dx²) = p,    (17)

where p is the distributed load acting on the beam. It is "nonphysical" in the sense that u → ∞ as √ω. However, in the discrete-time domain, this "nonphysical" situation is avoided if we suppose all the signals to be bandlimited.

2.3. Complete characterization of the stiff string and rod solution

Boundary conditions for real piano strings lie in between the conditions of clamped extrema:

    Y(−L/2, ω) = Y(L/2, ω) = 0,
    ∂Y(x, ω)/∂x |₍ₓ₌₋L/2₎ = ∂Y(x, ω)/∂x |₍ₓ₌L/2₎ = 0,    (18)

and of hinged extrema [5, 16, 31, 35, 36]:

    Y(−L/2, ω) = Y(L/2, ω) = 0,
    ∂²Y(x, ω)/∂x² |₍ₓ₌₋L/2₎ = ∂²Y(x, ω)/∂x² |₍ₓ₌L/2₎ = 0.    (19)

Before determining the conditions for the eigenfrequencies of the considered stiff systems, we find a more compact way of writing (18) and (19). Starting from the factorized form of the stiff systems equation (see (10)), and using the symbols introduced in Section 2.2, we have

    Y₁(x, ω) = ψ₁⁺(x, ω) + ψ₁⁻(x, ω),
    Y₂(x, ω) = ψ₂⁺(x, ω) + ψ₂⁻(x, ω),    (20)

where we let

    ψ₁±(x, ω) = c₁± exp(ξ₁± x),
    ψ₂±(x, ω) = c₂± exp(ξ₂± x).    (21)

Conditions (18) can then be rewritten as follows:

    Y₁(−L/2, ω) = −Y₂(−L/2, ω),
    Y₁(L/2, ω) = −Y₂(L/2, ω),
    ∂Y₁(x, ω)/∂x |₍ₓ₌₋L/2₎ = −∂Y₂(x, ω)/∂x |₍ₓ₌₋L/2₎,
    ∂Y₁(x, ω)/∂x |₍ₓ₌L/2₎ = −∂Y₂(x, ω)/∂x |₍ₓ₌L/2₎.    (22)

At the terminations of the string or of the rod, we have

    ψ₁⁺ + ψ₁⁻ = −(ψ₂⁺ + ψ₂⁻),
    ξ₁⁺ψ₁⁺ + ξ₁⁻ψ₁⁻ = −(ξ₂⁺ψ₂⁺ + ξ₂⁻ψ₂⁻),    (23)

which can be rewritten in matrix form:

    [1, 1; ξ₁⁺, ξ₂⁺] [ψ₁⁺; ψ₂⁺] = −[1, 1; ξ₁⁻, ξ₂⁻] [ψ₁⁻; ψ₂⁻].    (24)

By left-multiplying both members of (24) by the inverse of the matrix [1, 1; ξ₁⁺, ξ₂⁺], we have

    [ψ₁⁺; ψ₂⁺] = S_c [ψ₁⁻; ψ₂⁻],    (25)

where we let

    S_c ≡ (1/(ξ₂⁺ − ξ₁⁺)) [−(ξ₂⁺ + ξ₁⁺), −2ξ₂⁺; 2ξ₁⁺, ξ₂⁺ + ξ₁⁺].    (26)

The matrix S_c relates the incident wave with the reflected wave at the boundaries. Independently of the roots ξᵢ, it has the following properties:

    det S_c = −1,
    S_c² = [1, 0; 0, 1].    (27)

In the case of a hinged stiff system (see (19)) at both ends, we have

    ψ₁⁺ + ψ₁⁻ = −(ψ₂⁺ + ψ₂⁻),
    (ξ₁⁺)²ψ₁⁺ + (ξ₁⁻)²ψ₁⁻ = −[(ξ₂⁺)²ψ₂⁺ + (ξ₂⁻)²ψ₂⁻],    (28)

which, in matrix form, becomes

    [1, 1; (ξ₁⁺)², (ξ₂⁺)²] [ψ₁⁺; ψ₂⁺] = −[1, 1; (ξ₁⁻)², (ξ₂⁻)²] [ψ₁⁻; ψ₂⁻].    (29)

By taking the inverse of the matrix [1, 1; (ξ₁⁺)², (ξ₂⁺)²], we obtain

    [ψ₁⁺; ψ₂⁺] = S_h [ψ₁⁻; ψ₂⁻],    (30)

where

    S_h = −[1, 0; 0, 1].    (31)

The S_h matrix for the hinged stiff system is independent of the roots ξᵢ. The matrices S_h and S_c are related in the following way:

    S_h = −S_c²,
    S_h² = S_c².    (32)

In conclusion, the boundary conditions for stiff systems can be expressed in terms of matrices that can be used in the numerical simulation of stiff systems. Moreover, since the real-life boundary conditions of stiff strings in pianos lie in between the conditions given in (18) and (19), we can combine the two matrices S_c and S_h in order to enforce more general conditions, as illustrated in Section 3. In the following, we will solve (4) and (5) applying separately these sets of boundary conditions.

2.3.1. The clamped stiff string and rod

In order to characterize the eigenfunctions in the case of conditions (18), in (12) we let

    ξ₁ = iξ₁′    (33)

for both the stiff string and the rod solution. By definition, ξ₁′ is a real number. Moreover, for the rod, we have ξ₁′ = ξ₂. With this position, it can be shown that conditions (18) for the stiff string lead to the equations [35, 38]

    [tan(ξ₁′L/2), tanh(ξ₂L/2); tanh(ξ₂L/2), −tan(ξ₁′L/2)] [ξ₁′; ξ₂] = [0; 0],    (34)

while, for the rod, we have

    cos(ξ₁′L) cosh(ξ₂L) = 1.    (35)

Equations (34) and (35) can be solved numerically. In particular, taking into account the second line in (12), the solutions of (35) are [35]

    ωₙ = (3.011², 5², 7², ..., (2n + 1)²) (π²/4) α′²,
    α′ = ⁴√ε′ / L.    (36)

A similar trend can be obtained for the stiff string. In view of their historical and practical relevance, we here report the numerical approximation for the allowed eigenfrequencies of the stiff string given by Fletcher [27]:

    ωₙ ≈ (c/L) nπ √(1 + n²π²α²) (1 + 2α + 4α²),
    α = √ε / L.    (37)

If we expand the above expression in a series of powers of α truncated to second order, we have the following approximate formula valid for small values of stiffness:

    ωₙ ≈ (c/L) nπ [1 + 2α + (1 + n²π²/8)(2α)²].    (38)

The last approximation does not apply to bars. For ε = 0, we have α = 0 and the eigenfrequencies tend to the well-known formula for the vibrating string [35]:

    ωₙ = nω₁.    (39)

Typical curves of the relative spacing χₙ ≡ ∆ωₙ/ω₁, where ∆ωₙ ≡ ωₙ₊₁ − ωₙ, of the eigenfrequencies of the stiff string are
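The parameter values quoted in the caption of Figure 1 follow directly from (15) and the definitions in (1), and the limits of (16) can be confirmed numerically. A quick check in CGS units (the tolerance values are arbitrary):

```python
import math

# brass-string data of (15), CGS units
r, T, rho, E = 0.1, 9e7, 8.44, 9e11       # cm, dyne, g/cm^3, dyne/cm^2
S = math.pi * r**2                        # cross-section area
I = math.pi * r**4 / 4                    # moment of inertia of the section
c = math.sqrt(T / (rho * S))              # flexible-string wave speed, (1)
eps = E * I / T                           # stiffness parameter, (1)

assert abs(eps - math.pi / 4) < 1e-9      # caption of Figure 1: eps = pi/4 cm^2
assert abs(c - 2e4) / 2e4 < 0.1           # caption of Figure 1: c ~ 2e4 cm/s

def group_velocity(w, eps, c):
    """|u| of (16), stiff-string case."""
    s = math.sqrt(c**2 + 4 * w**2 * eps)
    return 2 * c * s / math.sqrt(2 * c**2 + 2 * c * s)

# eps -> 0 recovers the ideally flexible string, u -> c, at any frequency
assert abs(group_velocity(2 * math.pi * 440, 0.0, c) - c) < 1e-9
print(c, eps)
```

The same function also exhibits the growth of |u| with frequency that motivates the dispersive waveguide.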

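The transcendental condition (35) for the clamped rod can also be checked directly: a simple bisection on g(x) = cos(x) cosh(x) − 1 recovers the classical roots ξ₁′L ≈ 4.730, 7.853, 10.996, which are well approximated by (π/2)·(3.011, 5, 7), as used in (36). A minimal sketch:

```python
import math

def g(x):
    # eigenvalue condition (35) with x = xi1' * L (rod with clamped ends)
    return math.cos(x) * math.cosh(x) - 1.0

def bisect(f, a, b, n=200):
    # plain bisection; assumes a sign change of f on [a, b]
    for _ in range(n):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

roots = [bisect(g, 4.0, 5.0), bisect(g, 7.0, 8.5), bisect(g, 10.5, 11.5)]
# compare with (pi/2)*(3.011, 5, 7) from (36)
for root, factor in zip(roots, [3.011, 5.0, 7.0]):
    assert abs(root - factor * math.pi / 2) < 0.01
print(roots)
```

The nonzero roots quickly approach odd multiples of π/2 because cosh grows exponentially, forcing cos(x) toward zero.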
shown in Figure 2 with variable r, where the values of the other physical parameters are the same as in (15).

Figure 2: Typical eigenfrequency relative spacing curves of the clamped stiff string for different values of the radius r of the section S.

Figure 3: Typical warping curves (deviation from linearity, in Hz, versus frequency) of the clamped stiff string for different values of the radius r of the section S.

Due to the dependency on frequency of the phase of the solution, the eigenfrequencies of the stiff string are not equally spaced. For a small radius r, hence for a low degree of stiffness of the string (see (1)), the relative spacing is almost constant for all the considered orders of eigenfrequencies. However, for higher stiffness, the spacing of the eigenfrequencies increases, in first approximation, as a linear function of the order of the eigenfrequency. The above results are summarized by the typical "warping curves" of the system, shown in Figure 3, in which the quantity ωₙ − ω̄ₙ, which represents the deviation from linearity with respect to the harmonic progression of (39), is plotted together with the spacing ∆ωₙ between consecutive eigenfrequencies.

In the stiff string case, we have two sets of eigenfunctions, one having even parity and the other one having odd parity, whose analytical expressions are respectively given by [38]

    Y(x, ω) = C(ω) cos(ξ₁′L/2) [cos(ξ₁′x)/cos(ξ₁′(L/2)) − cosh(ξ₂x)/cosh(ξ₂(L/2))],
    Y(x, ω) = C(ω) sin(ξ₁′L/2) [sin(ξ₁′x)/sin(ξ₁′(L/2)) − sinh(ξ₂x)/sinh(ξ₂(L/2))],    (40)

where C(ω) is a constant that can be calculated by imposing the initial conditions.

2.3.2. Hinged stiff string and rod

Conditions (19) lead to the following set of equations for the stiff string:

    sin(ξ₁′L) sinh(ξ₂L) = 0,
    ξ₁² + ξ₂² = 0,    (41)

and for the rod:

    sin(ξ₁′L) sinh(ξ₂L) = 0.    (42)

The second line in (41) has no solutions since, by (13), ξ₁² + ξ₂² = 1/ε ≠ 0 for the stiff string. It follows that hinged stiff systems are described only by (42). In this equation, sinh(ξ₂L) = 0 has no solution, hence the eigenfrequencies are determined by the condition

    ξ₁′ = nπ/L.    (43)

Using the parameters α′ and α, respectively defined in (36) and (37), the eigenfrequencies of the hinged stiff string are exactly expressed as follows:

    ωₙ = (c/L) nπ √(n²π²α² + 1),    (44)

while for the rod, we have

    ωₙ = n²π²α′².    (45)

As the tension T → 0, (44) tends to (45). Figure 4 shows the relative spacing of the eigenfrequencies in the case of the hinged stiff string.

Figure 4: Typical eigenfrequency relative spacing curves of the hinged stiff string for different values of the radius r of the section S.

Figure 5: Typical warping curves (deviation from linearity, in Hz, versus frequency) of the hinged stiff string for different values of the radius r of the section S.

The relative eigenfrequency spacing curves are very similar to the ones of the clamped string, and so are the "warping curves" of the system, as shown in Figure 5.

Using (45), we can give an analytic expression for the spacing of the eigenfrequencies of the hinged rod. We have

    ∆ωₙ = π²α′²(2n + 1).    (46)

Equation (43) leads to the following set of odd and even eigenfunctions for the stiff string [38]:

    Yₙ(x, ω) = 2D(ω) sin(2nπx/L),
    Yₙ(x, ω) = 2D(ω) cos((2n + 1)πx/L),    (47)

where D(ω) must be determined by enforcing the initial conditions. It is worth noting that both functions in (47) are independent of the stiffness parameter ε. In Section 3, we will use the obtained results in order to implement the dispersive waveguides digitally simulating the solutions of (4) and (5).

Finally, we need to stress the fact that the eigenfrequencies of the hinged stiff string are similar to the ones of the clamped case except for the factor (1 + 2α + 4α²). Therefore, for small values of stiffness, they do not differ too much. This can also be seen from the similarity of the warping curves obtained with the two types of boundary conditions.

Taking into account the fact that the boundary conditions of real piano strings lie in between these two cases, we can conclude that the eigenfrequencies of real piano strings can be calculated by means of the approximated formula [27, 28]:

    ωₙ ≈ An √(Bn² + 1),    (48)

where A and B can be obtained from measurements. Approximation (48) is useful in order to match measured vibrating modes against the model eigenfrequencies.

3. NUMERICAL APPROXIMATIONS OF STIFF SYSTEMS

Most of the problems encountered when dealing with the continuous-time equation of the stiff string consist in determining the general solution and in relating the initial and boundary conditions to the integration constants of the equation. In this section, we will show that we can use a similar technique also in discrete time, which yields a numerical transform method for the computation of the solution.

In Section 2, we noted that (1) becomes the equation of the vibrating string in the case of a negligible stiffness coefficient ε. It is well known that the technique known as the Karplus-Strong algorithm implements the discrete-time domain solution of the vibrating string equation [8], allowing us to reach good quality acoustic results. The block diagram of the adopted loop circuit is shown in Figure 6.

Figure 6: Basic Karplus-Strong delay cascade: input X(z), a loop of 2P delays (z⁻²ᴾ) with a low-pass loop filter G(z), and output Y(z).

The transfer function of the digital loop chain can be written as follows:

    H(z) = 1 / (1 − z⁻²ᴾ G(z)),    (49)

where the loop filter G(z) takes into account losses due to nonrigid terminations and to internal friction, and P is the number of sections in which the string is subdivided, as obtained from time and space sampling. Loop filter design can be based on measured partial amplitude and frequency trajectories [18], or on (LPC)-type methods [9]. The filter G(z) can be modelled as IIR or FIR and it must be estimated from samples of the sound or from a model of the string losses, where, for stability, we need |G(e^{jω})| < 1. Clearly, in the IIR case or in the nonlinear phase FIR case, the phase response of the loop filter introduces a
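As a concrete baseline for the dispersive extension that follows, the loop of Figure 6 and (49) can be sketched in a few lines: a 2P-sample delay line in feedback with a two-point averaging low-pass, G(z) = loss · (1 + z⁻¹)/2, excited by a white-noise burst. The filter choice and constants here are illustrative, not taken from the paper:

```python
import random

def karplus_strong(n_samples, two_p=100, loss=0.996, seed=1):
    """Loop of (49) with G(z) = loss * (1 + z^-1)/2 (illustrative constants)."""
    rng = random.Random(seed)
    buf = [rng.uniform(-1.0, 1.0) for _ in range(two_p)]  # noise-burst excitation
    out = []
    for n in range(n_samples):
        i = n % two_p
        j = (n + 1) % two_p
        out.append(buf[i])
        # averaging low-pass in the feedback path; |G(e^{jw})| < 1 keeps it stable
        buf[i] = loss * 0.5 * (buf[i] + buf[j])
    return out

y = karplus_strong(8000)
# bounded, decaying output: the loop gain stays below unity
assert max(abs(v) for v in y) <= 1.0
assert max(abs(v) for v in y[-200:]) < max(abs(v) for v in y[:200])
```

Replacing each unit delay of this loop by the first-order all-pass section introduced below turns the structure into the dispersive waveguide of Figure 8.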

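Equation (48) suggests a direct way to estimate A and B from measured partials: squaring it gives ωₙ² = (A²B)n⁴ + A²n², which is linear in the unknowns A²B and A², so an ordinary least-squares fit applies. A sketch on synthetic data (the "measured" frequencies below are fabricated for illustration, not taken from the paper):

```python
import math

# synthetic "measured" partials generated from (48) with known A, B
A_true, B_true = 2 * math.pi * 58.0, 4e-4
n = list(range(1, 31))
w = [A_true * k * math.sqrt(B_true * k**2 + 1) for k in n]

# normal equations for w^2 = p*n^4 + q*n^2, with p = A^2 * B and q = A^2
s44 = sum(k**8 for k in n)
s42 = sum(k**6 for k in n)
s22 = sum(k**4 for k in n)
b4 = sum(wk**2 * k**4 for k, wk in zip(n, w))
b2 = sum(wk**2 * k**2 for k, wk in zip(n, w))
det = s44 * s22 - s42 * s42
p = (b4 * s22 - b2 * s42) / det
q = (s44 * b2 - s42 * b4) / det

A_est, B_est = math.sqrt(q), p / q
assert abs(A_est - A_true) / A_true < 1e-6
assert abs(B_est - B_true) / B_true < 1e-6
print(A_est / (2 * math.pi), B_est)
```

With real measurements the same fit is applied to the observed partial frequencies; the regularization of noisy dispersion data is the subject of Section 4.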
limited amount of dispersion. Additional phase terms in the form of all-pass filters can be added in order to tune the string model to the required pitch [13], and they contribute to further dispersion.

Since the group velocity of a traveling wave in a stiff system depends on frequency (see (16)), it is natural to substitute, in discrete time, the cascade of unit delays with a chain of circuital elements whose phase responses do depend on frequency. One can show that the only choice that leads to rational transfer functions is given by a chain of first-order all-pass filters [39, 40]. More complex physical systems, for example, the simulation of a monaural room, call for substituting the delay chain with a more general filter, as illustrated in [41]:

    A(z, u) = (z⁻¹ − u) / (1 − uz⁻¹),    (50)

whose phase characteristic is

    θ(Ω) = Ω + 2 arctan( u sin(Ω) / (1 − u cos(Ω)) ).    (51)

The phase characteristics in (51) are plotted in Figure 7 for various values of u.

Figure 7: First-order all-pass phase θ(Ω) plotted for various values of u (from u = −0.9 to u = 0.9).

A comparison between the curve in Figure 1 and the ones in Figure 7 gives more elements of plausibility for the approximation of the solution phase of the stiff model equations, given in (12), by the all-pass filter phase (51). Adopting a circuital scheme similar to the Karplus-Strong algorithm [10], in which the unit delays are replaced by first-order all-pass filters, the approximation is given by

    ξ₁′(Ω fs) (L/P) ≈ θ(Ω),    (52)

where fs is the sampling frequency. Note that, by definition, both members of (52) are real numbers. Therefore, in the z-domain, a nonstiff system can be mapped into a stiff system by means of the frequency warping map

    z⁻¹ → A(z).    (53)

The resulting circuit is shown in Figure 8.

Figure 8: Dispersive waveguide used to simulate dispersive systems: the 2P-fold delay of Figure 6 is replaced by the all-pass cascade A(z)²ᴾ, closed in a loop with the low-pass filter G(z), between input X(z) and output Y(z).

Note that the feedback all-pass chain results in delay-free loops. Computationally, these loops can be resolved by the methods illustrated in [34, 42, 43]. Moreover, the phase response of the loop filter G(z) contributes to the dispersion and it must be taken into account in the global model.

The circuit in Figure 8 can be optimized in order to take into account the losses and the coupling amongst strings (e.g., as in the piano). In the framework of this paper, we confined our interest to the design of the stiff system filter. For a review of the design of lossy filters and coupling models, see [17].

3.1. Stiff system filter parameter determination

Within the framework of the approximation (52), in the case of the dispersive waveguide the integer parameter P can be obtained by constraining the two functions to attain the same values at the extrema of the bandwidth. Since θ(π) = π, we have

    P = ξ₁′(π fs) L / π.    (54)

As we will see, condition (54) is not the only one that can be obtained for the parameter P. The deviation from linearity introduced by the warping θ(Ω) can be written as follows:

    ∆(Ω) ≡ θ(Ω) − Ω = 2 arctan( u sin(Ω) / (1 − u cos(Ω)) ).    (55)

The function ∆(Ω) is plotted, for different values of u, in Figure 9. One can see that the absolute value of ∆(Ω) has a maximum, which corresponds to the maximum deviation from linearity of θ(Ω). It can be shown that this maximum occurs for

    Ω = Ω_M = arccos(u),    (56)
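The closed forms (51), (55), and (56) are easy to verify numerically: the phase of A(e^{iΩ}, u) computed directly from (50) matches θ(Ω) up to the sign convention, and a grid search locates the maximum of |∆| at Ω_M = arccos(u), with value 2 arcsin(u). A quick sketch:

```python
import cmath
import math

def theta(om, u):
    # all-pass phase (51)
    return om + 2 * math.atan2(u * math.sin(om), 1 - u * math.cos(om))

u = 0.6
for om in [0.3, 1.0, 2.0, 3.0]:
    a = (cmath.exp(-1j * om) - u) / (1 - u * cmath.exp(-1j * om))
    assert abs(abs(a) - 1.0) < 1e-12                    # unit magnitude: all-pass
    assert abs(-cmath.phase(a) - theta(om, u)) < 1e-12  # phase matches (51)

# the maximum of the deviation (55) sits at Omega_M = arccos(u), cf. (56)
grid = [k * math.pi / 100000 for k in range(1, 100000)]
om_max = max(grid, key=lambda om: abs(theta(om, u) - om))
assert abs(om_max - math.acos(u)) < 1e-3
# and its value is 2*arcsin(u)
assert abs(theta(om_max, u) - om_max - 2 * math.asin(u)) < 1e-6
```

Setting the derivative of (55) to zero gives cos(Ω) = u directly, which is what the grid search confirms.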

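As an order-of-magnitude check of (54), one can evaluate ξ₁′ at the Nyquist frequency for the brass-string data of (15) (ε = π/4 cm², c ≈ 1.84 · 10⁴ cm/s) with L = 200 cm. The sampling rate fs = 44100 Hz below is our assumption, not stated in the text, so the resulting P is only indicative of the order of magnitude of the P = 209 quoted later:

```python
import math

# brass-string data of (15): eps = pi/4 cm^2, c ~ 1.84e4 cm/s, L = 200 cm
eps, c, L = math.pi / 4, 1.84e4, 200.0
fs = 44100.0   # assumed sampling rate (not specified in the text)

def xi1(w):
    # magnitude of the propagating wavenumber, from (12) with xi1 = i*xi1'
    return math.sqrt((math.sqrt(1 + 4 * w**2 * eps / c**2) - 1) / (2 * eps))

# condition (54): match the phases at the Nyquist frequency, theta(pi) = pi
P = xi1(math.pi * fs) * L / math.pi
assert 100 < P < 300   # same order of magnitude as the reported P = 209
print(round(P))
```

The exact value reported in the paper additionally depends on the optimized all-pass parameters (Q = 4), which this sketch does not reproduce.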
[Figure 9: Plot of the deviation from linearity of the all-pass filter phase for different values of parameter u (deviation from linearity versus discrete frequency, for u ranging from 0.9 down to −0.9).]

The corresponding maximum deviation is

∆(Ω_M, u) = 2 arcsin(u).  (57)

Substituting (56) in (51), we have

θ(Ω_M) = π/2 + arcsin(u).  (58)

Since the solution phase ξ1 is approximated by θ(Ω), it has to satisfy the condition

ξ1(Ω_M / T) (L/P) ≈ π/2 + arcsin(u),  (59)

and therefore we have the following bound on P:

P ≈ ξ1(f_s arccos(u)) L / (π/2 + arcsin(u)).  (60)

For higher-order Q all-pass filters, (60) can be written as follows:

P ≈ (1/Q) Σ_{i=1}^{Q} ξ1(f_s arccos(u_i)) L / (π/2 + arcsin(u_i)).  (61)

An optimization algorithm can be used to obtain the vector parameter u. Based on our experiments, we estimated that an optimal order Q is 4 for the piano string. Therefore, using the values in (15) for the 58 Hz tone of an L = 200 cm brass string, we obtain P = 209. Although this is not a model for a real-life wound inhomogeneous piano string, this example gives a rough idea of the typical number of required all-pass sections. The computation of this long all-pass chain can be too heavy for real-time applications. Therefore, an approximation of the chain by means of a cascade of an all-pass of order much smaller than 2P, together with unit delays, is usually sought [13, 29, 30]. A simple and accurate approach is to model the all-pass as a cascade of first-order sections with variable real parameter u [38]. However, a more general approach calls for including in the design second-order all-pass sections, equivalent to a pair of complex conjugate first-order sections [29]. In Section 4, we will bypass this estimation procedure based on the theoretical eigenfunctions of the string and estimate the all-pass parameters and the number of sections from samples of the piano.

3.2. Laguerre sequences

An invertible and orthogonal transform, which is related to the all-pass chain included in the stiff string model, is given by the Laguerre transform [44, 45]. The Laguerre sequences l_i[m, u] are best defined in the z-domain as follows:

L_i(z, u) = ( sqrt(1 − u²) / (1 − u z^-1) ) ( (z^-1 − u) / (1 − u z^-1) )^i.  (62)

Thus, the Laguerre sequences can be obtained from the z-domain recurrence

L_0(z, u) = sqrt(1 − u²) / (1 − u z^-1),
L_{i+1}(z, u) = A(z) L_i(z, u),  (63)

where A(z) is defined as in (50). Comparison of (62) with (50) shows that the phase of the z-transform of the Laguerre sequences is suitable for approximating the phase of the solution of the stiff model equation. A biorthogonal generalization of the Laguerre sequences, calling for a variable u from section to section, is illustrated in [46]. This is linked to the refined approximation of the solution previously shown.

3.3. Initial conditions

Putting together the results obtained in Section 1, we can write the solution of the stiff model Y(Ω, x) as follows (see (11) and (14)):

Y(ω, x) = c1+(ω) exp(i ξ1 x) + c1−(ω) exp(−i ξ1 x).  (64)

We are now disregarding the transient term due to ξ2 since it does not influence the acoustic frequencies of the system. In discrete time and space, we let x = m(L/P) as in [10]. With the approximation (52), (64) becomes

Y(m, Ω) ≈ c1+(Ω) exp(i m θ(Ω)) + c1−(Ω) exp(−i m θ(Ω)).  (65)

Substituting (63) in (65), we have

Y(Ω, m) ≈ c1+(Ω) L_m(Ω, u)/L_0(z, u) + c1−(Ω) L_{−m}(Ω, u)/L_0(z, u),  (66)

where we have used the fact that

A(e^{iΩ}, u) = (e^{−iΩ} − u) / (1 − u e^{−iΩ}) = exp(i θ(Ω)).  (67)

Physically Inspired Models 973
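The recurrence (63) translates directly into a filtering procedure: l_0 is the impulse response of the one-pole section, and each subsequent sequence is obtained by passing the previous one through the all-pass A(z, u). The sketch below — ours, not the authors' code, with an arbitrary u and truncation length, assuming NumPy — builds the first few Laguerre sequences this way and checks their orthonormality numerically:

```python
import numpy as np

def laguerre_sequences(u, num, length):
    """Impulse responses l_i[n, u] via the recurrence (63):
    L_0(z) = sqrt(1-u^2)/(1 - u z^-1), L_{i+1}(z) = A(z) L_i(z),
    with the all-pass A(z) = (z^-1 - u)/(1 - u z^-1) of eq. (50)."""
    n = np.arange(length)
    seqs = np.empty((num, length))
    seqs[0] = np.sqrt(1.0 - u * u) * u ** n   # one-pole impulse response
    for i in range(1, num):
        x, y = seqs[i - 1], np.zeros(length)
        for t in range(length):
            xd = x[t - 1] if t > 0 else 0.0   # delayed input
            yd = y[t - 1] if t > 0 else 0.0   # delayed output
            y[t] = xd - u * x[t] + u * yd     # all-pass difference equation
        seqs[i] = y
    return seqs

u = 0.4
L = laguerre_sequences(u, 6, 4000)
G = L @ L.T          # Gram matrix: close to the identity (orthonormal family)
print(np.max(np.abs(G - np.eye(6))))
```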

By defining

V+(Ω) ≡ c1+(Ω) / L_0(z, u),  V−(Ω) ≡ c1−(Ω) / L_0(z, u),  (68)

(66) can be written as follows:

Y(m, Ω) ≈ V+(Ω) L_m(Ω, u) + V−(Ω) L_{−m}(Ω, u).  (69)

Taking the inverse discrete-time Fourier transform (IDTFT) on both sides of (69), we obtain

y[m, n] ≈ y+[m, n] + y−[m, n],  (70)

where

y+[m, n] = Σ_{k=−∞}^{∞} v+[n − k] l_m[k, u],
y−[m, n] = Σ_{k=−∞}^{∞} v−[n − k] l_{−m}[k, u],  (71)

and the sequences v±[n] are the IDTFT of V±(Ω). For the sake of conciseness, we do not report here the expression of v±[n] in terms of the constants c1±. For further details, see [31, 38]. The expression of the numerical solution y[m, n] can be written in terms of a generic initial condition

y[m, 0] = y+[m, 0] + y−[m, 0].  (72)

In order to do this, we resort to the extension of Laguerre sequences to negative arguments:

l_m[n, u] = { l_m[n, u],  n ≥ 0;  l_m[−n, u],  n < 0, }  (73)

and to the property

l_m[n, u] = l_n[m, −u].  (74)

If we introduce the quantities

y_k±[u] = Σ_{m=0}^{∞} y±[m, 0] l_k[±m, u],  l_k[±m, u] = l_{±m}[k, u],  (75)

with a simple mathematical manipulation, (71) can be written as follows:

y+[m, n] = Σ_{k=−∞}^{∞} y_k+[u] l_m[k + n, u],
y−[m, n] = Σ_{k=−∞}^{∞} y_k−[u] l_m[k + n, u].  (76)

Therefore, the numeric solution becomes

y[m, n] = Σ_{k=−∞}^{∞} y_k+ l_m[k + n, u] + Σ_{k=−∞}^{∞} y_k− l_m[k + n, u].  (77)

We have just shown that the solution of the discrete-time stiff model equation can be written as a Laguerre expansion of the initial condition. At the same time, this shows that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping obtained by Laguerre expansion.

3.4. Boundary conditions

In Section 1, we discussed the stiff model equation boundary conditions in continuous time (see (18) and (19)). In this section, we discuss the homogeneous boundary conditions (i.e., the first line in both (18) and (19)) in the discrete-time domain. Using approximation (52) and letting the number of sections P of the stiff system be an even integer, we can write the homogeneous conditions as follows (see also (69)):

Y(−P/2, Ω) = 0  ⟹  V+(Ω) L_{−P/2}(Ω, u) + V−(Ω) L_{P/2}(Ω, u) = 0,
Y(+P/2, Ω) = 0  ⟹  V+(Ω) L_{P/2}(Ω, u) + V−(Ω) L_{−P/2}(Ω, u) = 0.  (78)

Like (34), (78) can be expressed in matrix form:

[ L_{P/2}(Ω, u)   L_{−P/2}(Ω, u) ] [ V+(Ω) ]   [ 0 ]
[ L_{−P/2}(Ω, u)  L_{P/2}(Ω, u)  ] [ V−(Ω) ] = [ 0 ].  (79)

As shown in Section 3.3, the functions V±(Ω) are determined by means of the Laguerre expansion of the initial condition sequences through (71) and (76). For any choice of these initial conditions, the determinant of the coefficient matrix in (79) must be zero, which yields the following condition:

L_{P/2}(Ω, u)² − L_{−P/2}(Ω, u)² = 0.  (80)

Recalling the z-transform expression for the Laguerre sequences, we have

sin(θ(Ω) P) = 0,  θ(Ω) = kπ/P,  k = 1, 2, 3, ....  (81)

In the stiff string case, the eigenfrequencies of the system are not harmonically related. In our approximation of the phase of the solution with the digital all-pass phase, the harmonicity is reobtained at a different level: the displacement of the all-pass phase values is harmonic according to the law written in (81). The distance between two consecutive values of this phase is π/P. Due to the nonrigid terminations, the real-life boundary conditions can be given in terms of frequency-dependent functions, which are included in the loop filter. In mapping the stiff structure to a nonstiff one, care must be taken in unwarping the loop filter as well.
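Condition (81) places the all-pass phase values harmonically, so the eigenfrequencies Ω_k themselves are obtained by inverting the warping, θ(Ω_k) = kπ/P. Since θ is monotonically increasing on [0, π] for |u| < 1, a simple bisection suffices. The following sketch (ours, with hypothetical P and u, not values from the paper; NumPy assumed) recovers the stretched, inharmonic partial series:

```python
import numpy as np

def theta(w, u):
    # warping phase of the first-order all-pass, eq. (51)
    return w + 2.0 * np.arctan(u * np.sin(w) / (1.0 - u * np.cos(w)))

def eigenfrequencies(P, u, kmax):
    """Solve theta(Omega_k) = k*pi/P for k = 1..kmax, eq. (81), by bisection."""
    freqs = []
    for k in range(1, kmax + 1):
        target, lo, hi = k * np.pi / P, 0.0, np.pi
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if theta(mid, u) < target:
                lo = mid
            else:
                hi = mid
        freqs.append(0.5 * (lo + hi))
    return np.array(freqs)

P, u = 40, 0.3            # hypothetical section count and warping parameter
om = eigenfrequencies(P, u, 10)
print(om / om[0])         # partial ratios, stretched beyond 1, 2, 3, ... for u > 0
```

For u > 0 the phase θ is concave, so its inverse is convex and the recovered ratios Ω_k/Ω_1 exceed k, mimicking the sharpened partials of a stiff string.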

[Figure 10: Computed all-pass optimized parameters u (warping parameter versus partial number).]

[Figure 11: Warped deviation from linearity (spacing of the partials (Hz) versus partial number).]

4. SYNTHESIS OF SOUND

In order to implement a piano simulation via the physical model, we need to determine the design parameters of the dispersive waveguide, that is, the number of all-pass sections and the coefficients u_i of the all-pass filters. This task could be performed by means of lengthy measurements or estimation of the physical variables, such as tension, Young's modulus, density, and so forth. However, as we already remarked, due to the constitutive complexity of real-life piano strings and terminations, this task seems to be quite difficult and to lead to inaccurate results. In fact, the given physical model only approximately matches the real situation. Indeed, in order to model and justify the measured eigenfrequencies, we resorted to Fletcher's experimental model described by (48). However, in that case, we ignore the exact form of the eigenfunctions, which is required in order to determine the number of sections of the waveguide and the other parameters. A more pragmatic and effective approach is to estimate the waveguide parameters directly from the measured eigenfrequencies ω_n. These can be extracted, for example, from recorded samples of notes played by the piano under exam. Fletcher's parameters A and B can be calculated as follows:

A = (1/(2n)) sqrt( (16 ω_n² − ω_{2n}²) / 3 ),
B = (1/n²) (4γ² − 1) / (1 − 16γ²),  γ = ω_n / ω_{2n}.  (82)

In practice, in the model where the all-pass parameters u_i are equal throughout the delay line, one does not even need to estimate Fletcher's parameters. In fact, in view of the equivalence of the stiff string model with the warped nonstiff model, one can directly determine, through optimization, the parameter u that makes the dispersion curve of the eigenfrequencies the closest to a straight line, using a suitable distance. A result of this optimization is shown in Figure 10. It must be pointed out that our point of view differs from the one proposed in [29, 30], where the objective is the minimization of the number of nontrivial all-pass sections in the cascade.

Given the optimum warping curve, the number of sections is then determined by forcing the pitch of the cascade of the nonstiff model (Karplus-Strong like) with warping to match the required fundamental frequency of the recorded tone. An example of this method is shown in Figure 11, where the measured warping curves pertaining to several piano keys in the low register, as estimated from the resonant eigenfrequencies, are shown. In Figure 12, the optimum sequence of all-pass parameters u for the examined tones is shown. Finally, in Figure 13, the plot of the dispersion curves regularized by means of optimum unwarping is shown. For further details about this method, see [47, 48, 49].

[Figure 12: Optimized all-pass parameters u for the A#3 tone (normalized frequency versus partial number).]
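Equation (82) can be checked by a round trip against Fletcher's law, here assumed in the form ω_n = A n sqrt(1 + B n²) of (48): synthesize two partials from chosen A and B, then recover the parameters. This is our illustration; the values below are arbitrary, not measurements from the paper:

```python
import math

def fletcher_AB(n, w_n, w_2n):
    """Estimate Fletcher's parameters from partials n and 2n, eq. (82):
    A = (1/(2n)) * sqrt((16 w_n^2 - w_2n^2)/3),
    B = (1/n^2) * (4 g^2 - 1)/(1 - 16 g^2), with g = w_n / w_2n."""
    A = math.sqrt((16.0 * w_n**2 - w_2n**2) / 3.0) / (2.0 * n)
    g = w_n / w_2n
    B = (4.0 * g**2 - 1.0) / (n**2 * (1.0 - 16.0 * g**2))
    return A, B

# round trip: synthesize partials from (A0, B0), then recover them
A0, B0, n = 2 * math.pi * 58.0, 3.5e-4, 5       # illustrative values only
wn = A0 * n * math.sqrt(1.0 + B0 * n * n)
w2n = A0 * 2 * n * math.sqrt(1.0 + B0 * 4 * n * n)
print(fletcher_AB(n, wn, w2n))                  # recovers (A0, B0)
```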

[Figure 13: Optimum unwarped regularized dispersion curves (frequency (Hz) versus partial number).]

Frequency warping has also been employed in conjunction with 2D waveguide meshes in the effort of reducing the artificial dispersion introduced by the nonisotropic spatial sampling [50]. Since the required warping curves do not match the first-order all-pass phase characteristic, in order to overcome this difficulty, a technique including resampling operators has been used in [50, 51], according to a scheme first introduced in [33] and further developed in [52] for the wavelet transforms. However, the downsampling operators inevitably introduce aliasing. While in the context of wavelet transforms this problem is tackled with multichannel filter banks, this is not the case for 2D waveguide meshes.

5. CONCLUSIONS

In order to support the design and use of digital dispersive waveguides, we reviewed the physical model of stiff systems, using a frequency-domain approach in both continuous and discrete time. We showed that, for dispersive propagation in discrete time, the Laguerre transform allows us to write the solution of the stiff model equation in terms of an orthogonal expansion of the initial conditions and to reobtain harmonicity at the level of the displacement of the all-pass phase values. Consequently, we showed that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping, in turn obtained by Laguerre expansion. Finally, we showed that, due to this equivalence, the all-pass coefficients can be computed by means of optimization algorithms matching the stiff model with a warped nonstiff one.

The exploration of physical models of musical instruments requires mathematical or physical approximations in order to make the problem treatable. When available, the solutions will only partially reflect the ensemble of mechanical and acoustic phenomena involved. However, the physical models serve as a solid background for the construction of physically inspired models, which are flexible numerical approximations of the solutions. Per se, these approximations are interesting for the synthesis of virtual instruments. However, in order to fine-tune the physically inspired models to real instruments, one needs methods for the estimation of the parameters from samples of the instrument. In this paper, we showed that dispersion from stiffness is a simple case in which the solution of the raw physical model suggests a discrete-time model, which is flexible enough to be used in synthesis and which provides realistic results when the characteristics are estimated from the samples.

REFERENCES

[1] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, "Structured audio: creation, transmission, and rendering of parametric sound representations," Proceedings of the IEEE, vol. 86, no. 5, pp. 922–940, 1998.
[2] P. Cook, "Physically informed sonic modeling (PhISM): synthesis of percussive sounds," Computer Music Journal, vol. 21, no. 3, pp. 38–49, 1997.
[3] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I," Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462–470, 1971.
[4] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part II," Journal of the Audio Engineering Society, vol. 19, no. 7, pp. 542–551, 1971.
[5] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods," Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.
[6] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters," Journal of the Acoustical Society of America, vol. 95, no. 3, pp. 1631–1640, 1994.
[7] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.
[8] D. A. Jaffe and J. O. Smith III, "Extensions of the Karplus-Strong plucked-string algorithm," in The Music Machine, C. Roads, Ed., pp. 481–494, MIT Press, Cambridge, Mass, USA, 1989.
[9] J. O. Smith III, Techniques for digital filter design and system identification with application to the violin, Ph.D. thesis, Electrical Engineering Department, Stanford University (CCRMA), Stanford, Calif, USA, June 1983.
[10] K. Karplus and A. Strong, "Digital synthesis of plucked-string and drum timbres," in The Music Machine, C. Roads, Ed., pp. 467–479, MIT Press, Cambridge, Mass, USA, 1989.
[11] J. O. Smith III, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[12] J. O. Smith III, "Physical modeling synthesis update," Computer Music Journal, vol. 20, no. 2, pp. 44–56, 1996.
[13] S. A. Van Duyne and J. O. Smith III, "A simplified approach to modeling dispersion caused by stiffness in strings and plates," in Proc. 1994 International Computer Music Conference, pp. 407–410, Aarhus, Denmark, September 1994.
[14] J. O. Smith III, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417–466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[15] M. Karjalainen, T. Tolonen, V. Välimäki, C. Erkut, M. Laurson, and J. Hiipakka, "An overview of new techniques and effects in model-based sound synthesis," Journal of New Music Research, vol. 30, no. 3, pp. 203–212, 2001.
[16] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, "The simulation of piano string vibration: from physical models to finite difference schemes and digital waveguides," Journal of the Acoustical Society of America, vol. 114, no. 2, pp. 1095–1107, 2003.
[17] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, "Physically informed signal processing methods for piano sound synthesis: a research overview," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.
[18] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[19] J. O. Smith III, "Efficient synthesis of stringed musical instruments," in Proc. 1993 International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.
[20] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. 1993 International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.
[21] G. Borin and G. De Poli, "A hysteretic hammer-string interaction model for physical model synthesis," in Proc. Nordic Acoustical Meeting, pp. 399–406, Helsinki, Finland, June 1996.
[22] G. E. Garnett, "Modeling piano sound using digital waveguide filtering techniques," in Proc. 1987 International Computer Music Conference, pp. 89–95, Urbana, Ill, USA, August 1987.
[23] J. O. Smith III and S. A. Van Duyne, "Commuted piano synthesis," in Proc. 1995 International Computer Music Conference, pp. 319–326, Banff, Canada, September 1995.
[24] S. A. Van Duyne and J. O. Smith III, "Developments for the commuted piano," in Proc. 1995 International Computer Music Conference, pp. 335–343, Banff, Canada, September 1995.
[25] M. Karjalainen and J. O. Smith III, "Body modeling techniques for string instrument synthesis," in Proc. 1996 International Computer Music Conference, pp. 232–239, Hong Kong, August 1996.
[26] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[27] H. Fletcher, "Normal vibration frequencies of a stiff piano string," Journal of the Acoustical Society of America, vol. 36, no. 1, pp. 203–209, 1964.
[28] H. Fletcher, E. D. Blackham, and R. Stratton, "Quality of piano tones," Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.
[29] D. Rocchesso and F. Scalcon, "Accurate dispersion simulation for piano strings," in Proc. Nordic Acoustical Meeting, pp. 407–414, Helsinki, Finland, June 1996.
[30] D. Rocchesso and F. Scalcon, "Bandwidth of perceived inharmonicity for physical modeling of dispersive strings," IEEE Trans. Speech and Audio Processing, vol. 7, no. 5, pp. 597–601, 1999.
[31] I. Testa, G. Evangelista, and S. Cavaliere, "A physical model of stiff strings," in Proc. Institute of Acoustics (Internat. Symp. on Music and Acoustics), vol. 19, pp. 219–224, Edinburgh, UK, August 1997.
[32] S. Cavaliere and G. Evangelista, "Deterministic least squares estimation of the Karplus-Strong synthesis parameter," in Proc. International Workshop on Physical Model Synthesis, pp. 15–19, Firenze, Italy, June 1996.
[33] G. Evangelista and S. Cavaliere, "Discrete frequency warped wavelets: theory and applications," IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 874–885, 1998.
[34] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011–1031, 2000.
[35] N. H. Fletcher and T. D. Rossing, Principles of Vibration and Sound, Springer-Verlag, New York, NY, USA, 1995.
[36] L. D. Landau and E. M. Lifšits, Theory of Elasticity, Editions Mir, Moscow, Russia, 1967.
[37] N. Dunford and J. T. Schwartz, Linear Operators. Part 2: Spectral Theory, Self Adjoint Operators in Hilbert Space, John Wiley & Sons, New York, NY, USA, 1st edition, 1963.
[38] I. Testa, Sintesi del suono generato dalle corde vibranti: un algoritmo basato su un modello dispersivo, Physics degree thesis, Università Federico II di Napoli, Napoli, Italy, 1997.
[39] H. W. Strube, "Linear prediction on a warped frequency scale," Journal of the Acoustical Society of America, vol. 68, no. 4, pp. 1071–1076, 1980.
[40] J. A. Moorer, "The manifold joys of conformal mapping: applications to digital filtering in the studio," Journal of the Audio Engineering Society, vol. 31, no. 11, pp. 826–841, 1983.
[41] J.-M. Jot and A. Chaigne, "Digital delay networks for designing artificial reverberators," in Proc. 90th Convention Audio Engineering Society, Paris, France, preprint no. 3030, February 1991.
[42] M. Karjalainen, A. Härmä, and U. K. Laine, "Realizable warped IIR filters and their properties," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 2205–2208, Munich, Germany, April 1997.
[43] A. Härmä, "Implementation of recursive filters having delay free loops," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1261–1264, Seattle, Wash, USA, May 1998.
[44] P. W. Broome, "Discrete orthonormal sequences," Journal of the ACM, vol. 12, no. 2, pp. 151–168, 1965.
[45] A. V. Oppenheim, D. H. Johnson, and K. Steiglitz, "Computation of spectra with unequal resolution using the fast Fourier transform," Proceedings of the IEEE, vol. 59, pp. 299–301, 1971.
[46] G. Evangelista and S. Cavaliere, "Audio effects based on biorthogonal time-varying frequency warping," EURASIP Journal on Applied Signal Processing, vol. 2001, no. 1, pp. 27–35, 2001.
[47] G. Evangelista and S. Cavaliere, "Auditory modeling via frequency warped wavelet transform," in Proc. European Signal Processing Conference, vol. I, pp. 117–120, Rhodes, Greece, September 1998.
[48] G. Evangelista and S. Cavaliere, "Dispersive and pitch-synchronous processing of sounds," in Proc. Digital Audio Effects Workshop, pp. 232–236, Barcelona, Spain, November 1998.
[49] G. Evangelista and S. Cavaliere, "Analysis and regularization of inharmonic sounds via pitch-synchronous frequency warped wavelets," in Proc. 1997 International Computer Music Conference, pp. 51–54, Thessaloniki, Greece, September 1997.
[50] L. Savioja and V. Välimäki, "Reducing the dispersion error in the digital waveguide mesh using interpolation and frequency-warping techniques," IEEE Trans. Speech and Audio Processing, vol. 8, no. 2, pp. 184–194, 2000.
[51] L. Savioja and V. Välimäki, "Multiwarping for enhancing the frequency accuracy of digital waveguide mesh simulations," IEEE Signal Processing Letters, vol. 8, no. 5, pp. 134–136, 2001.
[52] G. Evangelista, Dyadic Warped Wavelets, vol. 117 of Advances in Imaging and Electron Physics, Academic Press, NY, USA, 2001.

I. Testa was born in Napoli, Italy, on September 21, 1973. He received the Laurea in Physics from the University of Napoli "Federico II" in 1997 with a dissertation on physical modeling of vibrating strings. In the following years, he has been engaged in physics education research, in the field of secondary school teacher training on the use of computer-based activities, and in teaching computer architecture for the information sciences course. He is currently teaching electronics and telecommunications at the Vocational School Galileo Ferraris, Napoli.

G. Evangelista received the Laurea in Physics (with the highest honors) from the University of Napoli, Napoli, Italy, in 1984 and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Irvine, in 1987 and 1990, respectively. Since 1995, he has been an Assistant Professor with the Department of Physical Sciences, University of Napoli "Federico II". From 1998 to 2002, he was a Scientific Adjunct with the Laboratory for Audiovisual Communications, Swiss Federal Institute of Technology, Lausanne, Switzerland. From 1985 to 1986, he worked at the Centre d'Etudes de Mathématique et Acoustique Musicale (CEMAMu/CNET), Paris, France, where he contributed to the development of a DSP-based sound synthesis system, and from 1991 to 1994, he was a Research Engineer at the Microgravity Advanced Research and Support Center, Napoli, where he was engaged in research in image processing applied to fluid motion analysis and material science. His interests include digital audio, speech, music, and image processing; coding; wavelets; and multirate signal processing. Dr. Evangelista was a recipient of the Fulbright Fellowship.

S. Cavaliere received the Laurea in electronic engineering (with the highest honors) from the University of Napoli "Federico II", Napoli, Italy, in 1971. Since 1974, he has been with the Department of Physical Sciences, University of Napoli, first as a Research Associate and then as an Associate Professor. From 1972 to 1973, he was with CNR at the University of Siena. In 1986, he spent an academic year at the Media Laboratory, Massachusetts Institute of Technology, Cambridge. From 1987 to 1991, he received a research grant for a project devoted to the design of VLSI chips for real-time sound processing and for the realization of the Musical Audio Research Station, a workstation for sound manipulation, IRIS, Rome, Italy. He has also been a Research Associate with INFN for the realization of very large systems for data acquisition from nuclear physics experiments (KLOE in Frascati and ARGO in Tibet) and for the development of techniques for the detection of signals in high-level noise in the Virgo experiment. His interests include sound and music signal processing, in particular for the Web, signal transforms and representations, VLSI, and specialized computers for sound manipulation.

EURASIP Journal on Applied Signal Processing 2004:7, 978–989
© 2004 Hindawi Publishing Corporation

Digital Waveguides versus Finite Difference Structures: Equivalence and Mixed Modeling

Matti Karjalainen Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, 02150 Espoo, Finland Email: matti.karjalainen@hut.fi

Cumhur Erkut Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, 02150 Espoo, Finland Email: cumhur.erkut@hut.fi

Received 30 June 2003; Revised 4 December 2003

Digital waveguides and finite difference time domain schemes have been used in physical modeling of spatially distributed systems. Both of them are known to provide exact modeling of ideal one-dimensional (1D) band-limited wave propagation, and both of them can be composed to approximate two-dimensional (2D) and three-dimensional (3D) mesh structures. Their equal capabilities in physical modeling have been shown for special cases and have been assumed to cover generalized cases as well. The ability to form mixed models by joining substructures of both classes through converter elements has been proposed recently. In this paper, we formulate a general digital signal processing (DSP)-oriented framework where the functional equivalence of these two approaches is systematically elaborated and the conditions of building mixed models are studied. An example of mixed modeling of a 2D waveguide is presented.

Keywords and phrases: acoustic signal processing, hybrid models, digital waveguides, scattering, FDTD model structures.

1. INTRODUCTION

Discrete-time simulation of spatially distributed acoustic systems for sound and voice synthesis finds its roots both in the modeling of speech production and in that of musical instruments. The Kelly-Lochbaum vocal tract model [1] introduced a one-dimensional transmission line simulation of speech production with two-directional delay lines and scattering junctions for nonhomogeneous vocal tract profiles. The delay sections discretize the d'Alembert solution of the wave equation [2], and the scattering junctions implement the acoustic continuity laws of pressure and volume velocity in a tube of varying diameter. Further simplification led to the synthesis models used as the basis for linear prediction of speech [3].

A similar modeling approach to musical instruments, such as string and wind instruments, was formulated later and named the technique of digital waveguides (DWGs) [4, 5]. For computational efficiency reasons, in DWGs two-directional delay lines are often reduced to single delay loops [6]. DWGs have been further discussed in two-dimensional (2D) and three-dimensional (3D) modeling [5, 7, 8, 9, 10], combined sometimes with a finite difference approach into DWG meshes.

Finite difference schemes [11] were introduced to the simulation of the vibrating string as a numerical integration solution of the wave equation [12, 13], and the approach has been developed further, for example, in [14] as a finite difference time domain (FDTD) simulation. The second-order finite difference scheme including propagation losses was formulated as a digital filter structure in [15], and its stability issues were discussed in [16]. This particular structure is the main focus of the finite difference discussions in the rest of this paper, and we will refer to it as the FDTD model structure.

DWG and FDTD approaches to discrete-time simulation of spatially distributed systems show a high degree of functional equivalence. As discussed in [5], in the one-dimensional band-limited case, the ideal wave propagation can be exactly modeled by both methods. The basic difference is that the FDTD model structures process the signals as they are, whereas DWGs process their wave decomposition. There are other known differences between DWGs and FDTD model structures. One of them is the instabilities ("spurious" responses) found in FDTD model structures, but not in DWGs, in response to specific excitations. Another difference is the numeric behavior in finite precision computation.

Comparison of these two different paradigms has been developed further in [10, 17, 18]. In [17], the interesting and important possibility of building mixed models with submodels of DWG and FDTD types was introduced; it was generalized to elements with arbitrary wave impedances in [18]. The problem of functional comparison and compatibility analysis has remained, however, and is the topic of this paper.

The rest of the paper is organized as follows. Section 2 provides the background information and notation that will be used in the following sections. A summary of wave-based modeling and finite difference modeling is also included in this section. Section 3 provides the derivation of the FDTD model structures, including the source terms, scattering, and the continuity laws. Based on the wave equation in the acoustical domain, this section highlights the functional equivalence of DWGs and FDTD model structures. It also presents a way of building mixed models. The formal proofs of equivalence are provided in the appendix. Section 4 is devoted to real-time implementation of mixed models. Finally, Section 5 draws conclusions and indicates future directions.

2. BACKGROUND

Sound synthesis algorithms that simulate spatially distributed acoustic systems usually provide discrete-time solutions to a hyperbolic partial differential equation, that is, the wave equation. According to the domain of simulation, the variables correspond to different physical quantities. The physical variables may further be characterized by their mathematical nature. An across variable is defined here to describe a difference between two values of an irrotational potential function (a function that integrates or sums up to zero over closed trajectories), whereas a through variable is defined here to describe a solenoidal function (a quantity that integrates or sums up to zero over closed surfaces). For example, in the acoustical domain, the deviation from the steady-state pressure p(x, t) is an across variable and the volume velocity u(x, t) is a through variable, where x is the spatial vector variable and t is the temporal scalar variable. Similarly, in the mechanical domain, the across variable is the force and the through variable is the velocity. The ratio of the across and through variables yields the impedance Z. The admittance is the inverse of Z, that is, Y = 1/Z.

In a one-dimensional (1D) medium, the spatial vector variable reduces to a scalar variable x, so that in a homogeneous, lossless, unbounded, and source-free medium the wave equation is written

y_tt = c² y_xx,  (1)

where y is a physical variable, the subscript tt refers to the second partial derivative in time t, xx to the second partial derivative in the spatial variable x, and c is the speed of the wavefront in the medium of interest. For example, in the mechanical domain (e.g., a vibrating string) we are primarily interested in transversal wave motion, for which c = sqrt(T/µ), where T is the tension force and µ is the mass per unit length of the string [2]. The impedance is closely related to the tension T, mass density µ, and propagation speed c, and is given by Z = sqrt(Tµ) = T/c. In the acoustical domain, the admittance is also related to the acoustical propagation speed c. For instance, the admittance of a tube with a constant cross-section area A is given by

Y = A / (ρc),  (2)

where ρ is the gas density in the tube.

The two common forms of discretizing the wave equation for numerical simulation are the traveling wave solution and the finite difference formulation.

2.1. Wave-based modeling

The traveling wave formulation is based on the d'Alembert solution, the propagation of two waves in opposite directions, that is,

y(x, t) = y→(x − ct) + y←(x + ct).  (3)

Here, the arrows denote the right-going and the left-going components of the total wave. Assuming that the signals are bandlimited to half of the sampling rate, we may sample the traveling waves without losing any information by selecting T as the sample interval and X as the position interval between samples so that T = X/c. Sampling is applied on a discrete time-space grid in which n and k are related to time and position, respectively. The discretized version of (3) becomes [5]

y(k, n) = y→(k − n) + y←(k + n).  (4)

It follows that the wave propagation can be computed by updating the state variables in two delay lines by

y→_{k,n+1} = y→_{k−1,n},  y←_{k,n+1} = y←_{k+1,n},  (5)

that is, by simply shifting the samples to the right and left, respectively. The shift is implemented with a pair of delay lines, and this kind of discrete-time modeling is called DWG modeling [5]. Since the physical variables are split into directional wave components, we will refer to such models as W-models. According to (3) or (4), a single physical variable (either through or across) is computed by summing the traveling waves, whereas the other one may be computed implicitly via the impedance.

If the medium is nonhomogeneous, then the admittance varies as a function of the spatial variable. In this case, the energy transfer between the wave components should be computed according to Kirchhoff-type continuity laws, ensuring that the total energy is preserved. These laws may be derived utilizing the irrotational and solenoidal nature of the across and through variables, respectively. In the DWG equivalent, the change in Y across a junction of waveguide sections causes scattering, and the scattering junctions of interconnected ports, with given admittances and wave variables, have to be formulated [5].
For instance, in a parallel junc- Y2 tion of waveguides in the acoustical domain, the Kirchhoff Y1 + N P2 − constraints are P1 P− P1 = P2 =···=PN = PJ , 2 (6) P+ U1 + U2 + ···+ UN + Uext = 0, 1 PJ where Pi and Ui are the total pressure and volume velocity 1 P+ of the ith branch ,respectively,PJ is the common pressure Uext 3 of coupled branches, and Uext is an external volume veloc- ity to the junction. Such a junction is illustrated in Figure 1. − When port pressures are represented by incoming wave com- P3 Y3 + − Y ponents Pi , outgoing wave components by Pi , admittances n attached to each port by Yi,and = + − + = + Pi Pi + Pi , Ui YiPi ,(7) Figure 1: Parallel junction of admittances Yi with associated pres- the junction pressure PJ can be obtained as sure waves indicated. A volume velocity input Uext is also attached.   N = 1 + PJ Uext +2 YiPi ,(8) Ytot i=1 where the short-hand notation yx,t is used instead of y(x, t). ∆ = ∆ = ∆  By selecting t x/c, and using index notation k x/ x = N = ∆ where Ytot i=1 Yi is the sum of all admittances to the and n t/ t,(10)resultin junction. Outgoing pressure waves are obtained from (7)to − = − + = − + − − (11) yield Pi PJ Pi . The resulting junction, a W-node,is yk,n+1 yk 1,n yk+1,n yk,n 1. depicted in Figure 2. The delay lines or termination admit- tances (see appendix) are connected to the W-ports of a W- From (11) we can see that a new sample yk,n+1 at position k node. and time index n + 1 is computed as the sum of its neighbor- A useful addition to DWG theory is to adopt wave digital ing position values minus the value at the position itself one filters (WDF) [10, 19] as discrete-time simulators of lumped sample period earlier. Since yk,n+1 is a physical variable, we ff parameter elements. Being based on W-modeling, they are will refer to models based on finite di erences as K-models, ff computationally compatible with the W-type DWGs [10, 18, with a reference to Kirchho type of physical variables. 20]. 3. 
FORMULATION OF THE FDTD MODEL STRUCTURE 2.2. Finite difference modeling The equivalence of the traveling wave and the finite difference In the most commonly used way to discretize the wave equa- solution of the ideal wave equation (given in (5)and(11), re- tion by finite differences, the partial derivatives in (1)areap- spectively) has been shown, for instance, in [5]. Based on this proximated by centered differences. The centered difference functional equivalence, (11) has been previously expanded approximation to the spatial partial derivative y is given by x without a formal derivation to a scattering junction with ar- [11] bitrary port impedances, where (8) is used as a template for y(x + ∆x/2, t) − y(x − ∆x/2, t) the expansion [18]. The resulting FDTD model structure is y ≈ ,(9) x ∆x illustrated in Figure 3 for a three-port junction. A compari- son of the FDTD model structure in Figure 3 and the DWG where ∆x is the spatial sampling interval. A similar expres- scattering junction in Figure 2 reveals the functional simi- sion is obtained for the temporal partial derivative, if x is larities of the two methods. However, a formal, generalized, kept constant and t is replaced by t ± ∆t,where∆t is the and unified derivation of the FDTD model structure with- discrete-time sampling interval. Iterating the difference ap- out an explicit reference to the DWG method remains to proximations, second-order partial derivatives in (1)areap- be presented. This section presents such a derivation based proximated by on the equations of motion of the gas in a tube. Note that, because of the analogy between different physical domains, − ≈ yx+∆x,t 2yx,t + yx−∆x,t once the formulation is derived, it can be used in different yxx ∆ 2 , x (10) domains as well. Therefore, the derivation below is not lim- y ∆ − 2y + y −∆ ited to the acoustical domain and the resulting structure can y ≈ x,t+ t x,t x,t t , tt ∆t2 also be used in other domains. 3.1. 
Source terms 1 Note that capital letters denote a transform variable. For instance, Pi is In order to explain the excitation Uext and the associated filter − the z-transform of the signal pi(n). H(z) = 1 − z 2 in Figure 3, we consider a piece of tube of Digital Waveguides versus Finite Difference Structures 981
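The functional equivalence of the W-model update (5) and the K-model recursion (11) noted above can be checked numerically. The sketch below is illustrative Python, not from the paper: the circular (wrap-around) grid, the even split of the initial displacement between the two directional waves, and all function names are our own choices.

```python
# Sketch: run the DWG update (5) and the finite difference recursion (11)
# side by side and confirm they produce the same physical variable y(k, n).

def simulate_dwg(y0, steps):
    """Two delay lines; the right- and left-going waves each carry half of y0."""
    right = [v / 2.0 for v in y0]   # y->(k, 0)
    left = [v / 2.0 for v in y0]    # y<-(k, 0)
    out = []
    for _ in range(steps):
        right = [right[-1]] + right[:-1]   # (5): y->(k, n+1) = y->(k-1, n)
        left = left[1:] + [left[0]]        # (5): y<-(k, n+1) = y<-(k+1, n)
        out.append([r + l for r, l in zip(right, left)])  # y = y-> + y<-
    return out

def simulate_fdtd(y0, steps):
    """Recursion (11) on the same circular grid, bootstrapped to match the
    evenly split DWG initial state: y(k, 1) = (y0[k-1] + y0[k+1]) / 2."""
    K = len(y0)
    prev = list(y0)
    cur = [(prev[k - 1] + prev[(k + 1) % K]) / 2.0 for k in range(K)]
    out = [list(cur)]
    for _ in range(steps - 1):
        # (11): y(k, n+1) = y(k-1, n) + y(k+1, n) - y(k, n-1)
        nxt = [cur[k - 1] + cur[(k + 1) % K] - prev[k] for k in range(K)]
        prev, cur = cur, nxt
        out.append(list(cur))
    return out

# Same triangular initial displacement for both schemes:
y0 = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
for row_d, row_f in zip(simulate_dwg(y0, 6), simulate_fdtd(y0, 6)):
    assert all(abs(d - f) < 1e-12 for d, f in zip(row_d, row_f))
```

Because the initial displacement is split evenly between the two wave directions, the finite difference scheme only needs one matching bootstrap step; after that, the two state updates produce identical values of the physical variable.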

Figure 2: (a) N-port scattering junction (three ports shown) with port admittances Yi. Incoming and outgoing pressure waves are Pi⁺ and Pi⁻, respectively. W-port 1 is terminated by admittance Y1. (b) Abstract representation of the W-node in (a).
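One evaluation of the W-node relations (7)-(8) is easy to sketch in code. The following Python fragment is illustrative only; the admittance values, the incoming waves, and the function name `scatter` are our own choices.

```python
# Sketch: one evaluation of the parallel junction equations (7)-(8).

def scatter(Y, p_in, u_ext=0.0):
    """Return (junction pressure, outgoing waves) for a parallel W-node.

    Y     : list of port admittances Yi
    p_in  : list of incoming pressure waves Pi+
    u_ext : external volume velocity input Uext
    """
    y_tot = sum(Y)
    # (8): PJ = (Uext + 2 * sum(Yi * Pi+)) / Ytot
    p_j = (u_ext + 2.0 * sum(yi * pi for yi, pi in zip(Y, p_in))) / y_tot
    # From (7): Pi- = PJ - Pi+
    p_out = [p_j - pi for pi in p_in]
    return p_j, p_out

# Arbitrary three-port test data:
Y = [1.0, 0.5, 2.0]
p_in = [0.3, -0.2, 0.7]
p_j, p_out = scatter(Y, p_in, u_ext=0.1)
# Kirchhoff constraint (6): the port volume velocities Yi*(Pi+ - Pi-)
# sum to -Uext, so total volume velocity into the junction is conserved.
assert abs(sum(yi * (pi - po) for yi, pi, po in zip(Y, p_in, p_out)) + 0.1) < 1e-12
```

The assertion checks the second line of (6) directly: whatever admittances and incoming waves are chosen, the scattering computed from (8) always balances the external volume velocity input.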


Figure 3: (a) Digital filter structure for the finite difference approximation of a three-port scattering node with port admittances Yi. Only the total pressure PJ (a K-variable) is explicitly available. (b) Abstract representation of the K-node in (a).

The pressure p and the volume velocity u (the variables in the acoustical domain, as explained in the previous section) satisfy the following PDE set:

    ρ ∂u/∂t + A ∂p/∂x = 0,    (A/ρc²) ∂p/∂t + ∂u/∂x = s,    (12)

where ρ is the gas density and c is the propagation speed. This set may be combined to yield a single PDE in p and the source term s:

    ∂²p/∂t² − (ρc²/A) ∂s/∂t = c² ∂²p/∂x².    (13)

Defining

    s(t) = (1/2)[s(t − ∆t/2) + s(t + ∆t/2)] + O(∆t²),    (14)

using the index notation k = x/∆x and n = t/∆t, and applying centered differences (see Section 2.2) to (13) with ∆x/∆t = c yields the following difference equation:

    p_k(n+1) = p_{k+1}(n) + p_{k−1}(n) − p_k(n−1) + (ρc∆x/2A)[s_k(n+1) − s_k(n−1)].    (15)

Note that ρc/A is the acoustic impedance that converts the volume velocity source s(t) to a pressure. Since the model output is the pressure at time step n+1, it follows that the source is delayed by two samples, subtracted from its current value, and scaled, corresponding to the filter 1 − z⁻² for Uext in Figure 3.

3.2. Admittance discontinuity and scattering

Now consider an unbounded, source-free tube with a cross-section A(x) that is a smooth real function of the spatial variable x. In this case, the governing PDEs can be combined into a single PDE in the pressure alone [10],

    ∂²p/∂t² = (c²/A(x)) ∂/∂x [A(x) ∂p/∂x],    (16)

which is the Webster horn equation. Discretizing this equation by centered differences yields the following difference equation:

    [p_k(n+1) − 2p_k(n) + p_k(n−1)] / ∆t²
        = (c² / (A_k ∆x²)) { A_{k+1/2} [p_{k+1}(n) − p_k(n)] − A_{k−1/2} [p_k(n) − p_{k−1}(n)] },    (17)

where A_k = A(k∆x). By selecting ∆x = c∆t and using the approximation

    A_k = (1/2)(A_{k−1/2} + A_{k+1/2}) + O(∆x²)    (18)

twice, (17) becomes

    p_k(n+1) + p_k(n−1) = [2/(A_{k−1/2} + A_{k+1/2})] [A_{k−1/2} p_{k−1}(n) + A_{k+1/2} p_{k+1}(n)].    (19)

Finally, by defining Y_{k∓1} = A_{k∓1/2}/(ρc), we obtain

    p_k(n+1) + p_k(n−1) = (2/Ytot) [Y_{k−1} p_{k−1}(n) + Y_{k+1} p_{k+1}(n)],    (20)

where the term Ytot = Y_{k−1} + Y_{k+1} may be interpreted as the sum of all admittances connected to the kth cell. This recursion is implemented with the filter structure illustrated in Figure 4; the output of the structure is the junction pressure p_{J,k}(n). It is worth noting that (20) is functionally the same as the DWG scattering representation given in (8) if the admittances are real; the more general case of complex admittances is considered in the appendix. Whereas the DWG formulation can easily be extended to N-port junctions, this extension is not necessarily possible for a K-model, where the continuity laws are generally not satisfied. In the next subsection, we investigate the continuity laws within the FDTD model structure.

3.3. Continuity laws

We denote the pressure across the impedance 1/Σ Yi as p_a(n) and the volume velocity through the same impedance as u_t(n), with reference to Figure 4. With these notations, Ohm's law in the acoustical domain yields

    p_a(n) = u_t(n) / Ytot,    (21)

whereas the Kirchhoff continuity laws can be written as

    p_a(n) = p_k(n+1) + p_k(n−1),    (22)
    u_t(n) = 2Y_{k−1} p_{k−1}(n) + 2Y_{k+1} p_{k+1}(n).    (23)

Inserting (21) into (23) eliminates u_t(n), and the result may be combined with (22) to give the following equation for the combined continuity laws:

    p_k(n+1) + p_k(n−1) = (2/Ytot) [Y_{k−1} p_{k−1}(n) + Y_{k+1} p_{k+1}(n)].    (24)

This relation is exactly the recursion of the FDTD model structure given in (20), but it is obtained here solely from the continuity laws. We thus conclude that the continuity laws are automatically satisfied by the FDTD model structure of Figure 4.

It is worth noting that more ports may be added to the structure without violating the continuity laws for any number of linear, time-invariant (LTI) admittances, as long as Ytot = Σ Yi. For N ports connected to the ith cell, (23) becomes

    U_t = 2 z⁻¹ Σ_{i=1}^{N} Yi P_{J,i}.    (25)
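A single time step of the recursion (20) can be sketched as follows. This is illustrative Python, not the paper's code; storing the admittances at the half-grid points in an array `Y_half` and the boundary handling (fixed ends) are our own choices.

```python
# Sketch: one step of (20), the K-model recursion for a tube with a
# spatially varying cross-section and hence varying admittance.

def fdtd_step(p_cur, p_prev, Y_half):
    """p_k(n+1) = (2/Ytot)*(Y_{k-1}*p_{k-1}(n) + Y_{k+1}*p_{k+1}(n)) - p_k(n-1),
    with Ytot = Y_{k-1} + Y_{k+1}.  Y_half[k] is the admittance at the
    half-grid point k - 1/2 (the left face of cell k)."""
    K = len(p_cur)
    p_next = [0.0] * K
    for k in range(1, K - 1):          # interior cells; ends stay fixed
        y_l, y_r = Y_half[k], Y_half[k + 1]
        p_next[k] = 2.0 * (y_l * p_cur[k - 1] + y_r * p_cur[k + 1]) / (y_l + y_r) - p_prev[k]
    return p_next

# Arbitrary test state on a 7-cell grid:
p_prev = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0]
p_cur = [0.0, 0.2, 0.4, 0.2, 0.0, -0.1, 0.0]
# With a uniform admittance profile the update collapses to the plain
# recursion (11), which this assertion verifies:
uni = fdtd_step(p_cur, p_prev, [1.0] * 8)
assert all(abs(uni[k] - (p_cur[k - 1] + p_cur[k + 1] - p_prev[k])) < 1e-12
           for k in range(1, 6))
```

With a nonuniform `Y_half` the same loop realizes the scattering that a DWG would compute with explicit wave variables, but using only the K-variables (total pressures) and two rows of state.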

Figure 4: Digital filter structure for the finite difference approximation of an unbounded, source-free tube with a spatially varying cross section.
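The source-term handling of (15) can be sketched in the same style. In this illustrative Python fragment (not from the paper), the gain ρc∆x/2A is lumped into a single coefficient `g`, and the grid size, source position, and input sequence are arbitrary choices of ours.

```python
# Sketch: the uniform-medium recursion (15) with a volume velocity source
# injected at one cell through the s(n+1) - s(n-1) pattern, i.e., the
# 1 - z^-2 excitation filter of Figure 3.

def simulate_with_source(K, steps, k_src, s, g=1.0):
    """p_k(n+1) = p_{k+1}(n) + p_{k-1}(n) - p_k(n-1) + g*(s(n+1) - s(n-1)),
    with the source term applied at cell k_src only; fixed (zero) ends."""
    prev = [0.0] * K
    cur = [0.0] * K
    for n in range(steps):
        nxt = [0.0] * K
        for k in range(1, K - 1):
            nxt[k] = cur[k + 1] + cur[k - 1] - prev[k]
        s_next = s[n + 1] if n + 1 < len(s) else 0.0
        s_prev = s[n - 1] if 0 <= n - 1 < len(s) else 0.0
        nxt[k_src] += g * (s_next - s_prev)
        prev, cur = cur, nxt
    return cur

# A unit impulse injected at the middle of a 21-cell grid:
field = simulate_with_source(21, 8, 10, [0.0, 1.0])
# After 8 steps only two clean traveling impulses remain, at k = 10 -/+ 7:
assert field[3] == 1.0 and field[17] == 1.0
assert sum(abs(v) for v in field) == 2.0
```

Note how the s(n+1) − s(n−1) pattern cancels the residue that a plain impulse injection would leave standing at the source cell: the excitation decomposes entirely into left- and right-going waves.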

Figure 5: FDTD node (left) and a DWG node (right) forming a part of a hybrid waveguide. There is a KW-converter between the K- and W-models. Yi are the wave admittances of the W-lines, K-pipes, and the KW-converter between the junction nodes. P1 and P2 are the junction pressures of the K-node and W-node, respectively.

The recursion in (24) can be expressed in the z-domain as

    P_{J,k} + z⁻² P_{J,k} = (2/Σ Yi) z⁻¹ Σ_{i=1}^{N} Yi P_{J,i}.    (26)

The superposition of the excitation block in (14) and the N-port formulation above completes the formulation of the FDTD model structure. In particular, by setting N = 3, the digital filter structure in Figure 3 is obtained.

3.4. Construction of mixed models

An essential difference between the DWGs of Figure 2 and the FDTD model structures of Figure 3 is that while DWG junctions are connected through two-directional delay lines (W-lines), FDTD nodes have two unit delays of internal memory and delay-free K-pipes connecting the ports between nodes. These junction nodes and ports are thus not directly compatible. The next question is the possibility of interfacing these submodels. The interconnection of a lossy FDTD model structure and a similar DWG has been tackled in [17], where a proper interconnection element (converter) was proposed for the resulting hybrid model in this special case. A generalization has been proposed in [18], which allows making any hybrid model of K-elements (FDTD) and W-elements with arbitrary wave admittances/impedances at their ports (see also [21]).

Here, we derive how a hybrid model (shown in Figure 5) can be constructed in a 1D waveguide between a K-node N1 (left) and a W-node N2 (right), aligned with the spatial grid points k = 1 and 2, respectively. The derivation is based on the fact that the junction pressures are available in both types of nodes, but in the DWG case not at the W-ports.

If N1 and N2 were both W-nodes (see Figure 8 in the appendix), the traveling wave entering the node N2 could be calculated as

    P2⁺ = z⁻¹ P1⁻ = z⁻¹ (P1 − z⁻¹ P2⁻) = z⁻¹ P1 − z⁻² P2⁻.    (27)

Note that P1 is available in the K-node N1 in Figure 5. Conversely, if N1 and N2 were both K-nodes, the junction pressure z⁻¹P2 would be needed for the calculation of P1 (see Figure 10 in the appendix). Although P2 is implicitly available in N2, it can also be obtained by summing up the wave components within the converter.
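The converter relations can be verified sample by sample in the time domain. The following Python sketch is our own illustration, not the paper's code; the test sequences and the helper name `at` are arbitrary. It checks that substituting (27) into the wave-component summation reproduces the expanded delayed-pressure form used by the converter.

```python
# Sketch: time-domain check of the KW-converter bookkeeping.
# p1  : junction pressure P1 of the K-node
# p2m : wave P2- leaving the W-node toward the converter

def at(x, n):
    """Causal access: x[n], with x[n] = 0 for n < 0."""
    return x[n] if n >= 0 else 0.0

p1 = [0.5, -1.0, 0.25, 0.0, 2.0, -0.75]
p2m = [0.1, 0.3, -0.4, 0.8, -0.2, 0.6]

# (27): P2+ = z^-1 P1 - z^-2 P2-
p2p = [at(p1, n - 1) - at(p2m, n - 2) for n in range(len(p1))]

for n in range(len(p1)):
    # Summing the wave components, delayed by one sample:
    # z^-1 P2 = z^-1 (P2+ + P2-)
    lhs = at(p2p, n - 1) + at(p2m, n - 1)
    # Expanded form after substituting (27):
    # z^-1 P2 = z^-2 P1 + z^-1 (1 - z^-2) P2-
    rhs = at(p1, n - 2) + at(p2m, n - 1) - at(p2m, n - 3)
    assert abs(lhs - rhs) < 1e-12
```

The two expressions agree for any input sequences, which is exactly what allows the converter to deliver both the wave P2⁺ to the W-side and the delayed junction pressure z⁻¹P2 to the K-side from the same two inputs P1 and P2⁻.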


This summation gives

    z⁻¹ P2 = z⁻¹ (P2⁺ + P2⁻).    (28)

Equation (27) may be inserted into (28) to yield the following transfer matrix of the two-port KW-converter element:

    [ P2⁺   ]         [ 1     −z⁻¹    ] [ P1  ]
    [ z⁻¹P2 ]  = z⁻¹ [ z⁻¹   1 − z⁻² ] [ P2⁻ ].    (29)

The KW-converter in Figure 5 essentially performs the calculations given in (29) and interconnects the K-type port of an FDTD node and the W-type port of a DWG node. The signal behavior in a mixed modeling structure is further investigated in the appendix.

4. IMPLEMENTATION OF MIXED MODELS

The functional equivalence and the mixed modeling paradigm of DWGs and FDTDs presented above allow for flexible building of physical models from K- and W-type substructures. In this way, it is possible to exploit the advantages of each type. In this section, we explore a simple example of a digital waveguide model that shows how mixed models can be built. Before that, a short discussion of the pros and cons of the different paradigms in practical realizations is presented.

4.1. K-modeling versus W-modeling, pros and cons

An advantage of W-modeling is its numerical robustness. By proper formulation, stability is guaranteed even with fixed-point arithmetic [5, 19]. Another useful property is the relatively straightforward use of fractional delays [22] when building digital waveguides, which makes, for example, the tuning and run-time variation of musical instrument models convenient. In general, it seems that W-modeling is the right choice in most 1D cases.

The advantages of K-modeling by FDTD waveguides are found when realizing mesh-like structures, such as 2D and 3D meshes [7, 8]. In such cases, the number of unit delays (memory positions) is two for any dimensionality, while for a DWG mesh it is two times the dimensionality of the mesh. A disadvantage of FDTDs is their inherent lack of numerical robustness and their tendency toward instability for signal frequencies near DC and the Nyquist frequency. Furthermore, FDTD junction nodes cannot be made memoryless, which may be a limitation in nonlinear and parametrically varying models.

4.2. 2D waveguide mesh case

Figure 6 illustrates a part of a 2D mixed model structure that is based on a rectangular FDTD waveguide mesh, for efficient and memory-saving computation, with DWG elements at the boundaries. Such a model could be, for example, the membrane of a drum or, in a 3D case, a room enclosed by walls. When there is a need to attach W-type termination admittances to the model or to vary the propagation delays within the system, a change from K-elements to W-elements through converters is a useful property. Furthermore, variable-length delays can be used, for example, for passive nonlinearities at the terminations to simulate gongs and other instruments in which nonlinear mode coupling takes place [23]. The same principle can be used to simulate shock waves in brass instrument bores [24]. In such cases, the delay lengths are made dependent on the signal value passing through the delay elements.

Figure 6: Part of a 2D waveguide mesh composed of (a) K-type FDTD elements (bottom left): K-pipes (kp) and K-nodes (k); (b) W-type DWG elements (top and right): delay-controllable W-lines (wl), W-nodes (w), and terminating admittances (yt); and (c) converter elements (kw) to connect K- and W-type elements into a mixed model.

In Figure 6, the elements denoted by kp are K-type pipes between K-type nodes. Elements kw are K-to-W converters, and elements wl are W-lines, where the arrows indicate that they are controllable fractional delays. Elements yt are terminating admittances. In the general case, scattering can be controlled by varying the admittances, although the computational efficiency is improved if the admittances are made equal. On a modern PC, a 2D mesh of a few hundred elements can run in real time at full audio rate. By decimated computation, bigger models can be computed if a lower cutoff frequency is permitted, allowing large physical dimensions of the mesh.

4.3. Mixed modeling in BlockCompiler

The development of the K- and W-models above has led to a systematic formulation of computational elements for both paradigms and for mixed modeling. The W-lines and K-pipes, as well as the related junction nodes, are useful abstractions for a formal specification of model implementation. We have developed a software tool for physical modeling called BlockCompiler [20], which is designed in particular for flexible modeling and efficient real-time computation of the models.

BlockCompiler contains two levels: (a) model creation and (b) model implementation. The model creation level is written in the Common Lisp programming language for maximal flexibility in the symbolic object-based manipulation of model structures. A set of DSP-oriented and physics-oriented computational blocks is available. New block classes can be created either as macro classes composed of predefined elementary blocks or by writing new elementary blocks. The blocks are connected through ports: inputs and outputs for DSP blocks, and K- or W-type ports for physical blocks. A fully interconnected model is called a patch.

The model implementation level is a code generator that does the scheduling of the blocks, writes C source code into a file, compiles it on the fly, and allows for streaming sound in real time or computation by stepping in a sample-by-sample mode. The C code can also be exported to other platforms, such as the Mustajuuri audio platform [25] and pd [26]. Sound examples of mixed models can be found at http://www.acoustics.hut.fi/demos/waveguide-modeling/.

5. SUMMARY AND CONCLUSIONS

This paper has presented a formulation of a specific FDTD model structure and has shown its functional equivalence to DWGs. Furthermore, an example of mixed models consisting of FDTD and DWG blocks and converter elements has been reported. The formulation allows for high flexibility in building 1D or higher-dimensional physical models from interconnected blocks.

The DWG method is used as the primary example of wave-based methods in this paper. Naturally, the KW-converter formulation is applicable to any W-method, such as wave digital filters (WDFs) [19]. In the future, we plan to extend our examples to include WDF excitation blocks. Other important future directions are the analysis of the dynamic behavior of parametrically varying hybrid models, as well as benchmark tests of the computational costs of the proposed structures.

Matlab scripts and demos related to DWGs and FDTDs can be found at http://www.acoustics.hut.fi/demos/waveguide-modeling/.

APPENDIX

A. PROOFS OF EQUIVALENCE

The proofs of the functional equivalence between the DWG and FDTD formulations used in this article are given below. A useful approach for this purpose is based on the Thevenin and Norton theorems [27].

A.1. Termination in a DWG network

Passive termination of a DWG junction port by a given admittance Y is equivalent to attaching a delay line of infinite length and wave admittance Y. In the DWG case, this means an infinitely long sequence of admittance-matched unit delay lines. Since there is no back-scattering in finite time, we can use the left-side port termination of Figure 2, with zero volume velocity at the input terminal. Thus, the admittance filter Y1 is not needed in the computation; it only has to be included when forming the filter 1/Σ Yi.

A.2. Termination in an FDTD network

Deriving the passive port termination for an FDTD junction is not as obvious as for a DWG junction. We can again apply an infinitely long sequence of admittance-matched FDTD sections, as depicted in Figure 7 on the left-hand side. With the notations given and the z-transforms of the variables and admittances, we can write

    P0 = (2Y1/Σ Yi) P−1 z⁻¹ + (2/Σ Yi) Σ_{i=2}^{M} Yi Pi z⁻¹ − P0 z⁻²,    (A.1a)
    P−1 = P0 z⁻¹ + P−2 z⁻¹ − P−1 z⁻²,    (A.1b)
    Pk = P_{k+1} z⁻¹ + P_{k−1} z⁻¹ − Pk z⁻²,  for k < −1,    (A.1c)

where Pi, i = 1, ..., M, are the pressures of all M neighboring junctions linked through admittances Yi to junction i = 0, and Pk, k = 0, −1, −2, ..., are the pressures in the junctions between the admittance-matched elements chained as the termination of junction 0. By applying (A.1c) to (A.1b) iteratively for k = 2, ..., N, we get

Figure 7: FDTD structure terminated by an admittance-matched chain of FDTD elements on the left-hand side.

Figure 8: Structure for derivation of the signal behavior in a DWG network.

    P−1 = P0 z⁻¹ + P−N−1 z^(−N) − P−N z^(−N−1).    (A.2)

When N → ∞, the last two terms cease to have an effect on P−1 in any finite time span, and they can thus be discarded. When the result P−1 = P0 z⁻¹ is used in (A.1a), we get

    P0 = (2Y1/Σ Yi) (P0 z⁻¹) z⁻¹ + (2/Σ Yi) Σ_{i=2}^{M} Yi Pi z⁻¹ − P0 z⁻²,    (A.3)

where the first term on the right-hand side can be interpreted as a way to implement the termination as a feedback through a unit delay, as illustrated in Figure 3 for the left-hand port of the FDTD junction.

A.3. Signal behavior in a DWG network

Figure 8 illustrates a case where an arbitrarily large interconnected DWG network is reduced so that only two scattering junctions, connected through a unit delay line of wave admittance Y2, are shown explicitly. A Norton equivalent source Uext feeds junction node 1, and the equivalent termination admittance is Y1. Junction node 2 is terminated by a Norton equivalent admittance Y3. We now derive the signal propagation from Uext to the junction pressure P1 and the transmission ratio between the pressures P2 and P1. If these "transfer functions" are equal for the DWG, the FDTD, and the mixed case with the KW-converter, the models are functionally equivalent for any topologies and parameter values. This is due to the superposition principle and the Norton theorem.

Figure 9: FDTD structure for derivation of the volume velocity source (Uext) to junction pressure (PJ) transfer function.


Figure 10: FDTD structure for derivation of signal relation between two junction pressures.
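The two-node FDTD relation that this structure realizes can also be exercised in the time domain. The Python sketch below is our own illustration (admittance values and the driving sequence are arbitrary); it implements the recursion behind (A.10) and checks the matched special case Y2 = Y3, for which the transmission reduces to a pure unit delay, P2/P1 = z⁻¹.

```python
# Sketch: time-domain form of the two-node FDTD relation,
#   p2(n) = -p2(n-2) + (2*Y3/(Y2+Y3))*p2(n-2) + (2*Y2/(Y2+Y3))*p1(n-1).

def two_node_response(p1, Y2, Y3):
    a = 2.0 * Y2 / (Y2 + Y3)
    b = 2.0 * Y3 / (Y2 + Y3)
    p2 = []
    for n in range(len(p1)):
        p2_nm2 = p2[n - 2] if n >= 2 else 0.0   # p2(n-2)
        p1_nm1 = p1[n - 1] if n >= 1 else 0.0   # p1(n-1)
        p2.append(-p2_nm2 + b * p2_nm2 + a * p1_nm1)
    return p2

# Matched case: the response to an impulse is the impulse delayed by one sample.
assert two_node_response([1.0, 0.0, 0.0, 0.0], 1.0, 1.0) == [0.0, 1.0, 0.0, 0.0]
```

For Y2 ≠ Y3 the impulse response instead consists of taps 2Y2/(Y2+Y3) times powers of (Y3−Y2)/(Y2+Y3) at odd delays, which is the series expansion of the transfer function (A.7).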


Figure 11: Mixed modeling structure for derivation of DWG to FDTD pressure relation.

From Figure 8, we can write directly for the propagation of the equivalent source Uext to the junction pressure P1

    P1 = Uext / (Y1 + Y2).    (A.4)

The signal transmission ratio between P2 and P1 can be derived from the following set of equations (A.5a), (A.5b), and (A.5c):

    P2 = [2Y2/(Y2 + Y3)] P1⁻ z⁻¹,    (A.5a)
    P1⁻ = P1 − P2⁻ z⁻¹,    (A.5b)
    P2⁻ = P2 − P1⁻ z⁻¹.    (A.5c)

By eliminating the wave variables P1⁻ and P2⁻,

    P1⁻ = (P1 − P2 z⁻¹) / (1 − z⁻²),
    P2⁻ = (P2 − P1 z⁻¹) / (1 − z⁻²),    (A.6)
    P2 = [2Y2/(Y2 + Y3)] (P1 − P2 z⁻¹) z⁻¹ / (1 − z⁻²),

and by solving for P2/P1, we get

    P2/P1 = 2Y2 z⁻¹ / [Y2 + Y3 + (Y2 − Y3) z⁻²].    (A.7)

In the special case of an admittance match, Y2 = Y3, we get P2/P1 = z⁻¹. Forms (A.4) and (A.7) are now the references for proving the equivalence of the FDTD and mixed modeling cases.

A.4. Signal behavior in an FDTD network

Using the notation of Figure 9, which shows a Norton equivalent for an FDTD network, we can write

    PJ = [Uext/(Y1 + Y2)] (1 − z⁻²) − PJ z⁻² + [2Y1/(Y1 + Y2)] PJ z⁻² + [2Y2/(Y1 + Y2)] PJ z⁻²,    (A.8)

which after simplification yields

    PJ = Uext / (Y1 + Y2),    (A.9)

which is equivalent to the DWG form (A.4). Notice that the form (1 − z⁻²) in feeding Uext to the node has zeros on the unit circle at the angles nπ (n an integer), compensating the poles inherent in the FDTD backbone structure. This degrades the numerical robustness of the structure around these frequencies.

For the structure of two FDTD nodes in Figure 10, we can write the equation

    P2 = −P2 z⁻² + [2Y3/(Y2 + Y3)] P2 z⁻² + [2Y2/(Y2 + Y3)] P1 z⁻¹,    (A.10)

which simplifies to

    P2/P1 = 2Y2 z⁻¹ / [Y2 + Y3 + (Y2 − Y3) z⁻²],    (A.11)

being equivalent to the DWG form (A.7). This completes the proof of the equivalence of the DWG and FDTD structures.

A.5. Signal behavior in a mixed modeling structure

To prove the equivalence of the signal behavior also in the mixed modeling structure of Figure 5 with a KW-converter, we have to analyze the junction signal relations in both directions. We first prove the equivalence in the FDTD-to-DWG direction. According to Figure 5, we can write

    P2 = [2Y2/(Y2 + Y3)] P1 z⁻¹ − [2Y2/(Y2 + Y3)] P2⁻ z⁻²,
    P2⁻ = P2 − (P1 z⁻¹ − P2⁻ z⁻²).    (A.12)

Eliminating P2⁻ and solving for P2/P1 yields again the form (A.7), proving the equivalence.

According to Figure 11, we can analyze the signal relationship in the DWG-to-FDTD direction by writing

    P2 = [2Y3/(Y2 + Y3)] P2 z⁻² − P2 z⁻² + [2Y2/(Y2 + Y3)] (P1⁻ − P1⁻ z⁻² + P2 z⁻¹) z⁻¹,
    P1⁻ = P1 − (P2 z⁻¹ − P1⁻ z⁻²).    (A.13)

By eliminating P1⁻ and solving for P2/P1, we again get the form (A.7). This concludes the proof of the equivalence of the mixed modeling case to the corresponding DWG and thus also to the FDTD structures.

ACKNOWLEDGMENTS

This work is part of the Algorithms for the Modelling of Acoustic Interactions (ALMA) project (IST-2001-33059) and has been supported by the Academy of Finland as a part of the project "Technology for Audio and Speech Processing" (SA 53537).

REFERENCES

[1] J. L. Kelly and C. C. Lochbaum, "Speech synthesis," in Proc. 4th International Congress on Acoustics, pp. 1-4, Copenhagen, Denmark, September 1962.
[2] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 2nd edition, 1998.
[3] J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, NY, USA, 1976.
[4] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992.
[5] J. O. Smith, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417-466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[6] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17-32, 1998.
[7] S. A. Van Duyne and J. O. Smith, "Physical modeling with the 2-D digital waveguide mesh," in Proc. International Computer Music Conference, pp. 40-47, Tokyo, Japan, September 1993.
[8] L. Savioja, T. J. Rinne, and T. Takala, "Simulation of room acoustics with a 3-D finite difference mesh," in Proc. International Computer Music Conference, pp. 463-466, Aarhus, Denmark, September 1994.
[9] L. Savioja, Modeling techniques for virtual acoustics, Ph.D. thesis, Helsinki University of Technology, Espoo, Finland, 1999.
[10] S. D. Bilbao, Wave and scattering methods for the numerical integration of partial differential equations, Ph.D. thesis, Stanford University, Stanford, Calif, USA, May 2001.
[11] J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks/Cole, Pacific Grove, Calif, USA, 1989.
[12] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 1," Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462-470, 1971.
[13] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 2," Journal of the Audio Engineering Society, vol. 19, no. 7, pp. 542-551, 1971.
[14] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, pp. 181-211, 1992.
[15] M. Karjalainen, "1-D digital waveguide modeling for improved sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 1869-1872, Orlando, Fla, USA, May 2002.
[16] C. Erkut and M. Karjalainen, "Virtual strings based on a 1-D FDTD waveguide model: Stability, losses, and traveling waves," in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 317-323, Espoo, Finland, June 2002.
[17] C. Erkut and M. Karjalainen, "Finite difference method vs. digital waveguide method in string instrument modeling and synthesis," in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[18] M. Karjalainen, C. Erkut, and L. Savioja, "Compilation of unified physical models for efficient sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 433-436, Hong Kong, China, April 2003.
[19] A. Fettweis, "Wave digital filters: Theory and practice," Proc. IEEE, vol. 74, no. 2, pp. 270-327, 1986.
[20] M. Karjalainen, "BlockCompiler: Efficient simulation of acoustic and audio systems," in Proc. 114th Audio Engineering Society Convention, Amsterdam, Netherlands, March 2003, preprint 5756.
[21] M. Karjalainen, "Time-domain physical modeling and real-time synthesis using mixed modeling paradigms," in Proc. Stockholm Music Acoustics Conference, vol. 1, pp. 393-396, Stockholm, Sweden, August 2003.
[22] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine, "Splitting the unit delay—tools for fractional delay filter design," IEEE Signal Processing Magazine, vol. 13, no. 1, pp. 30-60, 1996.
[23] J. R. Pierce and S. A. Van Duyne, "A passive nonlinear digital filter design which facilitates physics-based sound synthesis of highly nonlinear musical instruments," Journal of the Acoustical Society of America, vol. 101, no. 2, pp. 1120-1126, 1997.
[24] R. Msallam, S. Dequidt, S. Tassart, and R. Caussé, "Physical model of the trombone including nonlinear propagation effects," in Proc. International Symposium on Musical Acoustics, vol. 2, pp. 419-424, Edinburgh, Scotland, UK, August 1997.

[25] T. Ilmonen, "Mustajuuri—an application and toolkit for interactive audio processing," in Proc. International Conference on Auditory Display, pp. 284-285, Espoo, Finland, July 2001.
[26] M. Puckette, "Pure data," in Proc. International Computer Music Conference, pp. 224-227, Thessaloniki, Greece, September 1997.
[27] J. E. Brittain, "Thevenin's theorem," IEEE Spectrum, vol. 27, no. 3, p. 42, 1990.

Matti Karjalainen was born in Hankasalmi, Finland, in 1946. He received the M.S. and the Dr.Tech. degrees in electrical engineering from the Tampere University of Technology, in 1970 and 1978, respectively. Since 1980 he has been a professor in acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology, his interest is in audio signal processing such as digital signal processing (DSP) for sound reproduction, perceptually based signal processing, as well as music DSP and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition, perceptual auditory modeling and spatial hearing, DSP hardware, software, and programming environments, as well as various branches of acoustics, including musical acoustics and modeling of musical instruments. He has written more than 300 scientific and engineering articles and contributed to organizing several conferences and workshops. Professor Karjalainen is a Fellow of the Audio Engineering Society (AES) and a Member of the Institute of Electrical and Electronics Engineers (IEEE), the Acoustical Society of America (ASA), the European Acoustics Association (EAA), the International Computer Music Association (ICMA), the European Speech Communication Association (ESCA), and several Finnish scientific and engineering societies.

Cumhur Erkut was born in Istanbul, Turkey, in 1969. He received his B.S. and M.S. degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Dr.Tech. degree in electrical engineering from the Helsinki University of Technology (HUT), Espoo, Finland, in 2002. Between 1998 and 2002, he worked as a researcher at the Laboratory of Acoustics and Audio Signal Processing of the HUT. He is currently a postdoctoral researcher in the same institution, where he contributes to the EU-funded research project "Algorithms for the Modelling of Acoustic Interactions" (ALMA, IST-2001-33059). His primary research interests are model-based sound synthesis and musical acoustics.

EURASIP Journal on Applied Signal Processing 2004:7, 990–1000
© 2004 Hindawi Publishing Corporation

A Digital Synthesis Model of Double-Reed Wind Instruments

Ph. Guillemain
Laboratoire de Mécanique et d'Acoustique, Centre National de la Recherche Scientifique, 31 chemin Joseph-Aiguier, 13402 Marseille cedex 20, France
Email: [email protected]

Received 30 June 2003; Revised 29 November 2003

We present a real-time synthesis model for double-reed wind instruments based on a nonlinear physical model. One specificity of double-reed instruments, namely, the presence of a confined air jet in the embouchure, for which a physical model has been proposed recently, is included in the synthesis model. The synthesis procedure involves the use of the physical variables via a digital scheme giving the impedance relationship between pressure and flow in the time domain. Comparisons are made between the behavior of the model with and without the confined air jet in the case of a simple cylindrical bore and that of a more realistic bore, the geometry of which is an approximation of an oboe bore. Keywords and phrases: double-reed, synthesis, impedance.

1. INTRODUCTION

The simulation of woodwind instrument sounds has been investigated for many years, since the pioneering studies by Schumacher [1] on the clarinet, which did not focus on digital sound synthesis. Real-time-oriented techniques, such as the well-known digital waveguide method (see, e.g., Smith [2] and Välimäki [3]) and wave digital models [4], have been introduced in order to obtain efficient digital descriptions of resonators in terms of incoming and outgoing waves, and have been used to simulate various wind instruments.

The resonator of a clarinet can be considered approximately cylindrical as a first approximation, and its embouchure is large enough to be compatible with simple airflow models. In double-reed instruments, such as the oboe, the resonator is not cylindrical but conical, and the size of the air jet is comparable to that of the embouchure. In this case, the dissipation of the air jet is no longer free, and the jet remains confined in the embouchure, giving rise to additional aerodynamic losses.

Here, we describe a real-time digital synthesis model for double-reed instruments based on the one hand on a recent study by Vergez et al. [5], in which the formation of the confined air jet in the embouchure is taken into account, and on the other hand on an extension of the method presented in [6] for synthesizing the clarinet. This method avoids the need for incoming and outgoing wave decompositions, since it deals only with the relationship between the impedance variables, which makes it easy to transpose the physical model into a synthesis model.

The physical model is first summarized in Section 2. In order to obtain the synthesis model, a suitable form of the flow model is then proposed, a dimensionless version is written, and the similarities with single-reed models (see, e.g., [7]) are pointed out. The resonator model is obtained by associating several elementary impedances, and is described in terms of the acoustic pressure and flow. Section 3 presents the digital synthesis model, which first requires discrete-time equivalents of the reed displacement and the impedance relations. The explicit scheme solving the nonlinear model, which is similar to that proposed in [6], is then briefly summarized. In Section 4, the synthesis model is used to investigate the effects of the changes in the nonlinear characteristics induced by the confined air jet.

2. PHYSICAL MODEL

The main physical components of the nonlinear synthesis model are as follows.

(i) The linear oscillator modeling the first mode of reed vibration.
(ii) The nonlinear characteristics relating the flow to the pressure and to the reed displacement at the mouthpiece.
(iii) The impedance equation linking pressure and flow.

Figure 1 shows a highly simplified embouchure model for an oboe and the corresponding physical variables described in Sections 2.1 and 2.2.

Figure 1: Embouchure model and physical variables.

2.1. Reed model

Although this paper focuses on the simulation of double-reed instruments, oboe experiments have shown that the displacements of the two reeds are symmetrical [5, 8]. In this case, a classical single-mode model seems to suffice to describe the variations in the reed opening. The opening is based on the relative displacement y(t) of the two reeds when a difference in acoustic pressure occurs between the mouth pressure pm and the acoustic pressure pj(t) of the air jet formed in the reed channel. If we denote the resonance frequency, damping coefficient, and mass of the reeds by ωr, qr, and µr, respectively, the relative displacement satisfies the equation

d²y(t)/dt² + ωr qr dy(t)/dt + ωr² y(t) = −(pm − pj(t))/µr.  (1)

Based on the reed displacement, the opening of the reed channel, denoted Si(t), is expressed by

Si(t) = Θ(y(t) + H) × w (y(t) + H),  (2)

where w denotes the width of the reed channel, H denotes the distance between the two reeds at rest (y(t) = 0 and pm = 0), and Θ is the Heaviside function, the role of which is to keep the opening of the reeds positive by canceling it when y(t) + H < 0.

2.2. Nonlinear characteristics

2.2.1. Physical bases

In the case of the clarinet or saxophone, it is generally recognized that the acoustic pressure pr(t) and volume velocity vr(t) at the entrance of the resonator are equal to the pressure pj(t) and volume velocity vj(t) of the air jet in the reed channel (see, e.g., [9]). In oboe-like instruments, the smallness of the reed channel leads to the formation of a confined air jet. According to a recent hypothesis [5], pr(t) is no longer equal to pj(t); instead, these quantities are related as follows:

pj(t) = pr(t) + (1/2) ρ Ψ q(t)²/Sra²,  (3)

where Ψ is taken to be a constant related to the ratio between the cross section of the jet and the cross section at the entrance of the resonator, q(t) is the volume flow, and ρ is the mean air density. In what follows, we will assume that the area Sra, corresponding to the cross section of the reed channel at the point where the flow is spread over the whole cross section, is equal to the area Sr at the entrance of the resonator.

The relationship between the mouth pressure pm, the pressure of the air jet pj(t), the velocity of the air jet vj(t), and the volume flow q(t), classically used when dealing with single-reed instruments, is based on the stationary Bernoulli equation rather than on the Backus model (see, e.g., [10] for justification and comparisons with measurements). This relationship, which is still valid here, is

pm = pj(t) + (1/2) ρ vj(t)²,
q(t) = Sj(t) vj(t) = α Si(t) vj(t),  (4)

where α, which is assumed to be constant, is the ratio between the cross section of the air jet Sj(t) and the reed opening Si(t).

It should be mentioned that the aim of this paper is to propose a digital sound synthesis model that takes the dissipation of the air jet in the reed channel into account. For a detailed physical description of this phenomenon, readers can consult [5], from which the notation used here was borrowed.

2.2.2. Flow model

In the framework of the digital synthesis model on which this paper focuses, it is necessary to express the volume flow q(t) as a function of the difference between the mouth pressure pm and the pressure at the entrance of the resonator pr(t). From (4), we obtain

vj(t)² = (2/ρ)(pm − pj(t)),  (5)
q(t)² = α² Si(t)² vj(t)².  (6)

Substituting the value of pj(t) given by (3) into (5) gives

vj(t)² = (2/ρ) [pm − pr(t) − ρ Ψ q(t)²/(2 Sr²)].  (7)

Using (6), this gives

q(t)² = α² Si(t)² [(2/ρ)(pm − pr(t)) − Ψ q(t)²/Sr²],  (8)

from which we obtain the expression for the volume flow, namely, the nonlinear characteristics in this case:

q(t) = sign(pm − pr(t)) × [α Si(t) / sqrt(1 + Ψ α² Si(t)²/Sr²)] × sqrt((2/ρ) |pm − pr(t)|).  (9)

2.3. Dimensionless model

The reed displacement and the nonlinear characteristics are converted into the dimensionless equations used in the synthesis model. For this purpose, we first take the reed displacement equation and replace the air jet pressure pj(t) by the

expression involving the variables q(t) and pr(t) (equation (3)):

d²y(t)/dt² + ωr qr dy(t)/dt + ωr² y(t) = −(pm − pr(t))/µr + ρ Ψ q(t)²/(2 µr Sr²).  (10)

On similar lines to what has been done in the case of single-reed instruments [11], y(t) is normalized with respect to the static beating-reed pressure pM defined by pM = H ωr² µr. We denote by γ the ratio γ = pm/pM and replace y(t) by x(t), where the dimensionless reed displacement is defined by x(t) = y(t)/H + γ.

With these notations, (10) becomes

(1/ωr²) d²x(t)/dt² + (qr/ωr) dx(t)/dt + x(t) = pr(t)/pM + ρ Ψ q(t)²/(2 pM Sr²),  (11)

and the reed opening is expressed by

Si(t) = Θ(1 − γ + x(t)) × wH (1 − γ + x(t)).  (12)

Likewise, we use the dimensionless acoustic pressure pe(t) and the dimensionless acoustic flow ue(t) defined by

pe(t) = pr(t)/pM,   ue(t) = (ρc/Sr) q(t)/pM,  (13)

where c is the speed of sound.

With these notations, the reed displacement and the nonlinear characteristics are finally rewritten as follows:

(1/ωr²) d²x(t)/dt² + (qr/ωr) dx(t)/dt + x(t) = pe(t) + Ψ βu ue(t)²,  (14)

and, using (9) and (12),

ue(t) = Θ(1 − γ + x(t)) sign(γ − pe(t)) × [ζ (1 − γ + x(t)) / sqrt(1 + Ψ βx (1 − γ + x(t))²)] × sqrt(|γ − pe(t)|) = F(x(t), pe(t)),  (15)

where ζ, βx, and βu are defined by

ζ = (α c w/(ωr Sr)) sqrt(2ρH/µr),   βx = H² α² w²/Sr²,   βu = H ωr² µr/(2ρc²).  (16)

This dimensionless model is comparable to the model described, for example, in [7, 9] in the case of single-reed instruments, where the dimensionless acoustic pressure pe(t), the dimensionless acoustic flow ue(t), and the dimensionless reed displacement x(t) are linked by the relations

(1/ωr²) d²x(t)/dt² + (qr/ωr) dx(t)/dt + x(t) = pe(t),
ue(t) = Θ(1 − γ + x(t)) sign(γ − pe(t)) × ζ (1 − γ + x(t)) sqrt(|γ − pe(t)|).  (17)

In addition to the parameter ζ, two other parameters βx and βu depend on the height H of the reed channel at rest. Although, for the sake of clarity in the notations, the variable t has been omitted, γ, ζ, βx, and βu are functions of time (but slowly varying functions compared to the other variables). Taking the difference between the jet pressure and the resonator pressure into account results in a flow which is no longer proportional to the reed displacement, and a reed displacement which is no longer linked to pe(t) by an ordinary linear differential equation.

2.4. Resonator model

We now consider the simplified resonator of an oboe-like instrument. It is described as a truncated, divergent, linear conical bore connected to a mouthpiece including the backbore to which the reeds are attached, and an additional bore, the volume of which corresponds to the volume of the missing part of the cone. This model is identical to that summarized in [12].

2.4.1. Cylindrical bore

The dimensionless input impedance of a cylindrical bore is first expressed. By assuming that the radius of the bore is large in comparison with the boundary layer thicknesses, the classical Kirchhoff theory leads to the value of the complex wavenumber for a plane wave, k(ω) = ω/c − (i^(3/2)/2) η c ω^(1/2), where η is a constant depending on the radius R of the bore: η = (2/(R c^(3/2)))(sqrt(lv) + (cp/cv − 1) sqrt(lt)). Typical values of the physical constants, in MKS units, are lv = 4·10⁻⁸, lt = 5.6·10⁻⁸, cp/cv = 1.4 (see, e.g., [13]). The transfer function of a cylindrical bore of infinite length between x = 0 and x = L, which constitutes the propagation filter associated with the Green formulation, including the propagation delay, dispersion, and dissipation, is then given by F(ω) = exp(−ik(ω)L).

Assuming that the radiation losses are negligible, the dimensionless input impedance of the cylindrical bore is classically expressed by

C(ω) = i tan(k(ω)L).  (18)

In this equation, C(ω) is the ratio between the Fourier transforms Pe(ω) and Ue(ω) of the dimensionless variables pe(t) and ue(t) defined by (13). The input admittance of the cylindrical bore is denoted by C⁻¹(ω).

A different formulation of the impedance relation of a cylindrical bore, which is compatible with a time-domain implementation and was proposed in [6], is used and extended here. It consists in rewriting (18) as

C(ω) = 1/(1 + exp(−2ik(ω)L)) − exp(−2ik(ω)L)/(1 + exp(−2ik(ω)L)).  (19)

Figure 2 shows the interpretation of (19) in terms of looped propagation filters. The transfer function of this model corresponds directly to the dimensionless input impedance of a cylindrical bore. It is the sum of two parts. The upper part corresponds to the first term of (19) and the

lower part corresponds to the second term. The filter having the transfer function −F(ω)² = −exp(−2ik(ω)L) stands for the back and forth path of the dimensionless pressure waves, with a sign change at the open end of the bore.

Figure 2: Impedance model of a cylindrical bore.

Although k(ω) includes both dissipation and dispersion, the dispersion is small (e.g., in the case of a cylindrical bore with a radius of 7 mm, η = 1.34·10⁻⁵), and the peaks of the input impedance of a cylindrical bore can be said to be nearly harmonic. In particular, this intrinsic dispersion can be neglected, unlike the dispersion introduced by the geometry of the bore (e.g., the input impedance of a truncated conical bore cannot be assumed to be harmonic).

2.4.2. Conical bore

From the input impedance of the cylindrical bore, the dimensionless input impedance of the truncated, divergent, conical bore can be expressed as a parallel combination of a cylindrical bore and an "air" bore,

S2(ω) = 1 / [1/(iω xe/c) + 1/C(ω)],  (20)

where xe is the distance between the apex and the input. It is expressed in terms of the angle θ of the cone and the input radius R as xe = R/sin(θ/2).

The parameter η involved in the definition of C(ω) in (20), which depends on the radius and characterizes the losses included in k(ω), is calculated by considering the radius of the cone at (5/12)L. This value was determined empirically, by comparing the impedance given by (20) with an input impedance of the same conical bore obtained with a series of elementary cylinders with different diameters (stepped cone), using the transmission line theory.

Denoting by D the differentiation operator, D(ω) = iω, and rewriting (20) in the form S2(ω) = D(ω)(xe/c)/(1 + D(ω)(xe/c) C⁻¹(ω)), we propose the equivalent scheme in Figure 3.

Figure 3: Impedance model of a conical bore.

2.4.3. Oboe-like bore

The complete bore is a conical bore combined with a mouthpiece. The mouthpiece consists of a combination of two bores:

(i) a short cylindrical bore with length L1, radius R1, surface S1, and characteristic impedance Z1. This is the backbore to which the reeds are attached. Its radius is small in comparison with that of the main conical bore, the characteristic impedance of which is denoted Z2 = ρc/Sr; and

(ii) an additional short cylindrical bore with length L0, radius R0, surface S0, and characteristic impedance Z0. Its radius is large in comparison with that of the backbore. Its role is to add a volume corresponding to the truncated part of the complete cone. This makes it possible to reduce the geometrical dispersion responsible for inharmonic impedance peaks in the combination backbore/conical bore.

The impedance C1(ω) of the short cylindrical backbore is based on an approximation of i tan(k1(ω)L1) for small values of k1(ω)L1. It takes the dissipation into account and neglects the dispersion. Assuming that the radius R1 is large in comparison with the boundary layer thicknesses, using (19), C1(ω) is first approximated by

C1(ω) ≈ [1 − exp(−η1 c sqrt(ω/2) L1) exp(−2iωL1/c)] / [1 + exp(−η1 c sqrt(ω/2) L1) exp(−2iωL1/c)],  (21)

which, since L1 is small, is finally simplified as

C1(ω) ≈ [1 − exp(−η1 c sqrt(ω/2) L1)(1 − 2iωL1/c)] / [1 + exp(−η1 c sqrt(ω/2) L1)].  (22)

By denoting G(ω) = [1 − exp(−η1 c sqrt(ω/2) L1)] / [1 + exp(−η1 c sqrt(ω/2) L1)] and H(ω) = (L1/c)(1 − G(ω)), the expression of C1(ω) reads

C1(ω) = G(ω) + iω H(ω).  (23)

This approximation avoids the need for a second delay line in the sampled formulation of the impedance.

The transmission line equation relates the acoustic pressure pn and the flow un at the entrance of a cylindrical bore (with characteristic impedance Zn, length Ln, and wavenumber kn) to the acoustic pressure pn+1 and the flow un+1 at the exit of the bore. With dimensioned variables, it reads

pn(ω) = cos(kn(ω)Ln) pn+1(ω) + i Zn sin(kn(ω)Ln) un+1(ω),
un(ω) = (i/Zn) sin(kn(ω)Ln) pn+1(ω) + cos(kn(ω)Ln) un+1(ω),  (24)

yielding

pn(ω)/un(ω) = [pn+1(ω)/un+1(ω) + i Zn tan(kn(ω)Ln)] / [1 + (i/Zn) tan(kn(ω)Ln) pn+1(ω)/un+1(ω)].  (25)
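The equivalence between (18) and (19), and between the two written forms of (20), can be checked numerically. The following sketch is illustrative only (it is not part of the paper; c, η, L, and xe are assumed values):

```python
# Sanity check (not from the paper): the two-term form (19) equals
# i*tan(k(w)L), and the rewritten form of (20) used for Figure 3 equals
# the parallel combination. Assumed values: c = 340 m/s, eta = 1.34e-5,
# L = 0.46 m, xe = 0.1 m.
import cmath
import math

c, eta, L, xe = 340.0, 1.34e-5, 0.46, 0.1
w = 2 * math.pi * 440.0                               # test frequency (rad/s)
k = w / c - (1j ** 1.5 / 2) * eta * c * math.sqrt(w)  # lossy wavenumber

e = cmath.exp(-2j * k * L)             # back-and-forth propagation factor
C_tan = 1j * cmath.tan(k * L)          # impedance, form (18)
C_loop = 1 / (1 + e) - e / (1 + e)     # impedance, form (19)
assert abs(C_tan - C_loop) < 1e-9

Dxc = 1j * w * xe / c                  # D(w) * xe / c
S2_par = 1 / (1 / Dxc + 1 / C_tan)     # conical bore, parallel form (20)
S2_rw = Dxc / (1 + Dxc / C_tan)        # rewritten form behind Figure 3
assert abs(S2_par - S2_rw) < 1e-9
```

Both identities hold exactly in algebra; the assertions only confirm that the floating-point evaluations agree.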

Using the notations introduced in (20) and (23), the input impedance of the combination backbore/main conical bore reads

p1(ω)/u1(ω) = [Z2 S2(ω) + Z1 C1(ω)] / [1 + (Z2/Z1) S2(ω) C1(ω)],  (26)

which is simplified as p1(ω)/u1(ω) = Z2 S2(ω) + Z1 C1(ω), since Z1 ≫ Z2.

In the same way, the input impedance of the whole bore reads

p0(ω)/u0(ω) = [p1(ω)/u1(ω) + i Z0 tan(k0(ω)L0)] / [1 + (i/Z0) tan(k0(ω)L0) p1(ω)/u1(ω)],  (27)

which, since Z0 ≪ Z1, is simplified as

p0(ω)/u0(ω) = [p1(ω)/u1(ω)] / [1 + (i/Z0) tan(k0(ω)L0) p1(ω)/u1(ω)].  (28)

Since L0 is small and the radius is large, the losses included in k0(ω) can be neglected, and hence k0(ω) = ω/c and tan(k0(ω)L0) = (ω/c)L0. Under these conditions, the input impedance of the bore is given by

p0(ω)/u0(ω) = 1 / [1/(p1(ω)/u1(ω)) + (iω/c)(L0/Z0)] = 1 / [1/(Z2 S2(ω) + Z1 C1(ω)) + (iω/c)(L0 S0/ρc)].  (29)

If we take V to denote the volume of the short additional bore, V = L0 S0, and rewrite (29) with the dimensionless variables Pe and Ue (Ue = Z2 u0), the dimensionless input impedance of the whole resonator relating the variables Pe(ω) and Ue(ω) becomes

Pe(ω) = Ze(ω) Ue(ω),   with
Ze(ω) = (1/Z2) / [iωV/(ρc²) + 1/(Z1 C1(ω) + Z2 S2(ω))].  (30)

After rearranging (30), we propose the equivalent scheme in Figure 4.

Figure 4: Impedance model of the simplified resonator.

It can be seen from (30) that the mouthpiece is equivalent to a Helmholtz resonator consisting of a hemispherical cavity with volume V and radius Rb such that V = (4/6)πRb³, connected to a short cylindrical bore with length L1 and radius R1.

2.5. Summary of the physical model

The complete dimensionless physical model consists of three equations:

(1/ωr²) d²x(t)/dt² + (qr/ωr) dx(t)/dt + x(t) = pe(t) + Ψ βu ue(t)²,  (31)

ue(t) = [ζ (1 − γ + x(t)) / sqrt(1 + Ψ βx (1 − γ + x(t))²)] × Θ(1 − γ + x(t)) sign(γ − pe(t)) × sqrt(|γ − pe(t)|),  (32)

Pe(ω) = Ze(ω) Ue(ω).  (33)

These equations enable us to introduce the reed and the nonlinear characteristics in the form of two nonlinear loops, as shown in Figure 5. The first loop relates the output pe to the input ue of the resonator, as in the case of single-reed instrument models. The second nonlinear loop corresponds to the ue²-dependent changes in x. The output of the model is given by the three coupled variables pe, ue, and x. The control parameters of the model are the length L of the main conical bore and the parameters H(t) and pm(t), from which ζ(t), βx(t), βu(t), and γ(t) are calculated.

Figure 5: Nonlinear synthesis model.

In the context of sound synthesis, it is necessary to calculate the external pressure. Here we consider only the propagation within the main "cylindrical" part of the bore in (20). Assuming again that the radiation impedance can be neglected, the external pressure corresponds to the time derivative of the flow at the exit of the resonator, pext(t) = dus(t)/dt. Using the transmission line theory, one directly obtains

Us(ω) = exp(−ik(ω)L) (Pe(ω) + Ue(ω)).  (34)

From the perceptual point of view, the quantity exp(−ik(ω)L) can be left aside, since it stands for the losses corresponding to a single travel between the embouchure and the open end. This simplification leads to the following expression for the external pressure:

pext(t) = d/dt [pe(t) + ue(t)].  (35)

3. DISCRETE-TIME MODEL

In order to draw up the synthesis model, it is necessary to use a discrete formulation in the time domain for the reed displacement and the impedance models. The discretization schemes used here are similar to those described in [6] for the clarinet, and summarized in [12] for brass instruments and saxophones.

3.1. Reed displacement

We take e(t) to denote the excitation of the reed, e(t) = pe(t) + Ψ βu ue(t)². Using (31), the Fourier transform of the ratio X(ω)/E(ω) can readily be written as

X(ω)/E(ω) = ωr² / (ωr² − ω² + iω qr ωr).  (36)

An inverse Fourier transform provides the impulse response h(t) of the reed model:

h(t) = (2ωr/sqrt(4 − qr²)) exp(−(1/2) ωr qr t) sin((1/2) sqrt(4 − qr²) ωr t).  (37)

Equation (37) shows that h(t) satisfies h(0) = 0. This property is most important in what follows. In addition, the range of variations allowed for qr is ]0, 2[.

The discrete-time version of the impulse response uses two centered numerical differentiation schemes, which provide unbiased estimates of the first and second derivatives when they are applied to sampled second-order polynomials:

iω ≈ (fe/2)(z − z⁻¹),
−ω² ≈ fe²(z − 2 + z⁻¹),  (38)

where z = exp(iω̃), ω̃ = ω/fe, and fe is the sampling frequency.

With these approximations, the digital transfer function of the reed is given by

X(z)/E(z) = z⁻¹ / [(fe²/ωr² + fe qr/(2ωr)) − (2fe²/ωr² − 1) z⁻¹ + (fe²/ωr² − fe qr/(2ωr)) z⁻²],  (39)

yielding a difference equation of the type

x(n) = b1a e(n − 1) + a1a x(n − 1) + a2a x(n − 2).  (40)

This difference equation keeps the property h(0) = 0.

Figure 6 shows the frequency response of this approximated reed model (solid line) superimposed on the exact one (dotted line).

Figure 6: Approximated (solid line) and exact (dotted line) reed frequency response with parameter values fr = 2500 Hz, qr = 0.2, and fe = 44.1 kHz.

This discrete reed model is stable under the condition ωr < 2fe. This stability condition makes the discretization scheme unsuitable for use at low sampling rates, but in practice, at the CD-quality sample rate, this problem does not arise for a reed resonance frequency of up to 5 kHz with a quality factor of up to 0.5. For a more detailed discussion of discretization schemes, readers can consult, for example, [14].

The bilinear transformation does not provide a suitable discretization scheme for the reed displacement. In this case, the impulse response does not satisfy the property h(0) = 0 of the continuous model.

3.2. Impedance

A time-domain equivalent of the inverse Fourier transform of the impedance Ze(ω) given by (30) is now required. Here we express pe(n) as a function of ue(n).

The losses in the cylindrical bore element contributing to the impedance of the whole bore are modeled with a digital low-pass filter. This filter approximates the back and forth losses described by F(ω)² = exp(−2ik(ω)L) and neglects the (small) dispersion. So that they can be adjusted to the geometry of the resonator, the coefficients of the filter are expressed analytically as functions of the physical parameters, rather than by using numerical approximations and minimizations. For this purpose, a one-pole filter is used:

F̃(ω̃) = b0 exp(−iω̃D) / (1 − a1 exp(−iω̃)),  (41)

where ω̃ = ω/fe, and D = 2fe(L/c) is the pure delay corresponding to a back and forth path of the waves.

The parameters b0 and a1 are calculated so that |F(ω)²|² = |F̃(ω̃)|² for two given values of ω.
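The coefficients of the difference equation (40) follow directly from (39). A minimal sketch, using the parameter values of Figure 6 (the names b1a, a1a, a2a mirror (40); the peak-location check at the end is illustrative, not from the paper):

```python
# Discrete reed model: coefficients of (40) from (39), impulse response,
# and a coarse check that the resonance stays near fr.
import cmath
import math

fe = 44100.0                      # sampling frequency (Hz)
fr, qr = 2500.0, 0.2              # reed resonance and damping (Figure 6)
wr = 2 * math.pi * fr

# Denominator terms of (39); dividing through gives the coefficients of (40).
a0 = fe**2 / wr**2 + fe * qr / (2 * wr)
b1a = 1 / a0
a1a = (2 * fe**2 / wr**2 - 1) / a0
a2a = -(fe**2 / wr**2 - fe * qr / (2 * wr)) / a0

# Impulse response of x(n) = b1a*e(n-1) + a1a*x(n-1) + a2a*x(n-2).
h, x1, x2 = [], 0.0, 0.0
for n in range(2048):
    x0 = b1a * (1.0 if n == 1 else 0.0) + a1a * x1 + a2a * x2
    h.append(x0)
    x2, x1 = x1, x0

assert h[0] == 0.0                # the scheme keeps the property h(0) = 0

# Peak of |H(f)| on a coarse DFT grid lies close to fr.
def mag(f):
    z = cmath.exp(2j * math.pi * f / fe)
    return abs(sum(h[n] * z ** (-n) for n in range(len(h))))

peak = max(range(500, 5000, 50), key=mag)
assert abs(peak - fr) < 200
```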

require pe(n) and ue(n) to be known. This makes it possible to solve this system explicitly, as shown in [6], thus doing away with the need for schemes such as the K-method [15].

Since W is always positive, if one considers the two cases γ − pe(n) ≥ 0 and γ − pe(n) < 0 successively, substituting the expression for pe(n) from (48) into (49) eventually gives

ue(n) = sign(γ − Ṽ) × (1/2) [−bc0 W + sqrt(bc0² W² + 4W |γ − Ṽ|)].  (51)
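Equations (46)–(50) fall outside the excerpt reproduced here, so the following sketch simply treats bc0, Ṽ (written V below), and W as given quantities; it checks that the closed form (51) solves a square-root nonlinearity of the shape underlying (32), namely ue² = W|γ − pe| with pe = bc0 ue + Ṽ (here W plays the role of the squared, opening-dependent coefficient multiplying sqrt(|γ − pe|)):

```python
# Verification (not from the paper) that the explicit form (51) solves
# ue = sign(gamma - pe) * sqrt(W * |gamma - pe|) with pe = bc0*ue + V.
# gamma, bc0, V, W are arbitrary test values, with W > 0.
import math

gamma, bc0, V, W = 0.4, 0.3, 0.1, 0.25

s = 1.0 if gamma - V >= 0 else -1.0
ue = s * 0.5 * (-bc0 * W + math.sqrt(bc0**2 * W**2 + 4 * W * abs(gamma - V)))
pe = bc0 * ue + V

# ue must satisfy the square-root nonlinearity it was solved from.
target = math.copysign(math.sqrt(W * abs(gamma - pe)), gamma - pe)
assert abs(ue - target) < 1e-12
```

The closed form arises because substituting pe = bc0·ue + V turns the nonlinearity into a quadratic in |ue|, whose positive root is the bracketed expression in (51).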

The acoustic pressure and flow in the mouthpiece at sampling time n are then finally obtained by the sequential calculation of Ṽ with (46), x(n) with (47), W with (50), ue(n) with (51), and pe(n) with (48).

The external pressure pext(n) is calculated using the difference between the sum of the internal pressure and the flow at sampling times n and n − 1.

4. SIMULATIONS

The effects of introducing the confined air jet into the nonlinear characteristics are now studied in the case of two different bore geometries. In particular, we consider a cylindrical resonator, the impedance peaks of which are odd harmonics, and a resonator the impedance of which contains all the harmonics. We start by checking numerically the validity of the resolution scheme in the case of the cylindrical bore. (Sound examples are available at http://omicron.cnrs-mrs.fr/~guillemain/eurasip.html.)

4.1. Cylindrical resonator

We first consider a cylindrical resonator, and make the parameter Ψ vary linearly from 0 to 4000 during the sound synthesis procedure (1.5 seconds). The transient attack corresponds to an abrupt increase in γ at t = 0. During the decay phase, starting at t = 1.3 seconds, γ decreases linearly towards zero. Its steady-state value is γ = 0.56. The other parameters are constant: ζ = 0.35, βx = 7.5·10⁻⁴, βu = 6.1·10⁻³. The reed parameters are ωr = 2π·3150 rad/second and qr = 0.5. The resonator parameters are R = 0.0055 m and L = 0.46 m.

Figure 8 shows superimposed curves: in the top figure, the digital impedance of the bore is given in dotted lines, and the ratio between the Fourier transforms of the signals pe(n) and ue(n) in solid lines; in the bottom figure, the digital reed transfer function is given in dotted lines, and the ratio of the Fourier transforms of the signals x(n) and pe(n) + Ψ(n) βu ue(n)² (including attack and decay transients) in solid lines.

Figure 8: (a) represents the impedance (dotted line) and the ratio between the spectra of pe and ue (solid line), while (b) represents the reed transfer function (dotted line) and the ratio of the spectra between x and pe + Ψ βu ue² (solid line).

As we can see, the curves are perfectly superimposed. There is no need to check the nonlinear relation between ue(n), pe(n), and x(n), which is satisfied by construction, since ue(n) is obtained explicitly as a function of the other variables in (51). In the case of the oboe-like bore, the results obtained using the resolution scheme are equally accurate.

4.1.1. The case of the beating reed

The first example corresponds to a beating-reed situation, which is simulated by choosing a steady-state value of γ greater than 0.5 (γ = 0.56).

Figure 9 shows the spectrogram (dB) of the external pressure generated by the model. The values of the spectrogram are coded with a grey-scale palette (small values are dark and high values are bright). The bright horizontal lines correspond to the harmonics of the external pressure.

Figure 9: Spectrogram of the external pressure for a cylindrical bore and a beating reed where γ = 0.56.
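To make the scheme concrete for the cylindrical-bore experiment above, here is a deliberately simplified, self-contained toy loop. It is not the paper's exact scheme: since (42)–(50) are not reproduced in this excerpt, the bore is implemented directly from (19) with the one-pole loss filter (41), the loss-filter coefficients b0 and a1 are assumed rather than fitted, and Ψ is held constant:

```python
# Toy synthesis loop (illustrative sketch, NOT the paper's exact scheme).
# Cylindrical bore from (19): P(1 + F^2) = U(1 - F^2), written in the time
# domain as p(n) = u(n) - y(n), with y the loss filter (41) applied to
# w = p + u and delayed by one round trip. b0, a1 and psi are assumed.
import math

fe, c = 44100.0, 340.0
L = 0.46                          # bore length (Section 4.1)
gamma, zeta = 0.56, 0.35          # steady-state values from Section 4.1
psi, beta_x, beta_u = 1000.0, 7.5e-4, 6.1e-3
wr, qr = 2 * math.pi * 3150.0, 0.5

# Reed coefficients, (39)-(40).
a0 = fe**2 / wr**2 + fe * qr / (2 * wr)
b1a, a1a = 1 / a0, (2 * fe**2 / wr**2 - 1) / a0
a2a = -(fe**2 / wr**2 - fe * qr / (2 * wr)) / a0

D = int(round(2 * fe * L / c))    # round-trip delay, in samples
b0, a1 = 0.95, 0.04               # assumed loss-filter coefficients
wline = [0.0] * D                 # circular buffer of past w values

x1 = x2 = e1 = y1 = 0.0
out = []
for n in range(20000):
    x = b1a * e1 + a1a * x1 + a2a * x2        # reed displacement, (40)
    y = a1 * y1 + b0 * wline[n % D]           # wline[n % D] holds w(n - D)
    opening = 1.0 - gamma + x                  # Heaviside argument of (32)
    if opening > 0.0:
        W = (zeta * opening) ** 2 / (1.0 + psi * beta_x * opening**2)
        V = -y                                 # pe(n) = ue(n) + V here
        sgn = 1.0 if gamma - V >= 0 else -1.0
        u = sgn * 0.5 * (-W + math.sqrt(W * W + 4 * W * abs(gamma - V)))
    else:
        u = 0.0                                # beating reed: channel closed
    p = u - y
    wline[n % D] = p + u                       # store w(n) for use at n + D
    e1 = p + psi * beta_u * u * u              # excitation e(n) of the reed
    x2, x1, y1 = x1, x, y
    out.append(p + u)                          # pext is d/dt of this, (35)

assert all(math.isfinite(v) for v in out) and max(map(abs, out)) > 1e-6
```

With this formulation the instantaneous coefficient multiplying ue(n) in pe(n) is 1, so the explicit solve takes the same form as (51) with bc0 = 1; the reflection loop is passive (|F̃| < 1), which keeps the output bounded.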

Increasing the value of Ψ mainly affects the pitch and only slightly affects the amplitudes of the harmonics. In particular, at high values of Ψ, a small increase in Ψ results in a strong decrease in the pitch.

A cancellation of the self-oscillation process can be observed at around t = 1.2 seconds, due to the high value of Ψ, since it occurs before γ starts decreasing.

Odd harmonics have a much higher level than even harmonics, as occurs in the case of the clarinet. Indeed, the even harmonics originate mainly from the flow, which is taken into account in the calculation of the external pressure. However, it is worth noticing that the level of the second harmonic increases with Ψ.

Figures 10 and 11 show the flow ue(n) versus the pressure pe(n), obtained during a small number (32) of oscillation periods at around t = 0.25 seconds, t = 0.5 seconds, t = 0.75 seconds, and t = 1 second. The existence of two different paths, corresponding to the opening or closing of the reed, is due to the inertia of the reed. This phenomenon is also observed on single-reed instruments (see, e.g., [14]). A discontinuity appears in the whole path because the reed is beating. This cancels the opening (and hence the flow) while the pressure is still varying.

Figure 10: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

Figure 11: ue(n) versus pe(n): (a) t = 0.75 second, (b) t = 1 second.

The shape of the curve changes with respect to Ψ. This shape is in agreement with the results presented in [5].

4.1.2. The case of the nonbeating reed

The second example corresponds to a nonbeating-reed situation, which is obtained by choosing a steady-state value of γ smaller than 0.5 (γ = 0.498).

Figure 12 shows the spectrogram of the external pressure generated by the model. Increasing the value of Ψ results in a sharp change in the level of the high harmonics at around t = 0.4 seconds, a slight change in the pitch, and a cancellation of the self-oscillation process at around t = 0.8 seconds, corresponding to a smaller value of Ψ than that observed in the case of the beating reed.

Figure 12: Spectrogram of the external pressure for a cylindrical bore and a nonbeating reed where γ = 0.498.

Figure 13 shows the flow ue(n) versus the pressure pe(n) at around t = 0.25 seconds and t = 0.5 seconds. Since the reed is no longer beating, the whole path remains continuous. The changes in its shape with respect to Ψ are smaller than in the case of the beating reed.

Figure 13: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

4.2. Oboe-like resonator

In order to compare the effects of the confined air jet with the geometry of the bore, we now consider an oboe-like bore,

Figure 14: (a) External acoustic pressure; (b), (c) attack and decay transients.

Figure 15: Spectrogram of the external pressure for an oboe-like bore where γ = 0.4.
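The spectrograms in these figures are standard short-time Fourier displays of the synthesised pressure signal. For readers who want to reproduce this kind of plot, a generic magnitude-STFT sketch follows. This is our own illustration in Python with NumPy, not the authors' analysis code; the frame length, hop size, and sample rate are arbitrary choices.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128, sample_rate=8000.0):
    """Magnitude STFT: Hann-windowed overlapping frames, one FFT per frame.
    Rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (bins, frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, mags

# A 440 Hz test tone: the strongest bin should sit near 440 Hz.
sig = np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000.0)
freqs, mags = spectrogram(sig)
peak_hz = freqs[np.argmax(mags[:, 0])]
```

Plotting 20·log10(mags) against time and frequency gives a display of the kind shown in the figures.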

the input impedance and geometric parameters of which correspond to Figure 7. The other parameters have the same values as in the case of the cylindrical resonator, and the steady-state value of γ is γ = 0.4.

Figure 14 shows the pressure pext(t). Increasing the effect of the air jet confinement with Ψ, and hence the aerodynamical losses, results in a gradual decrease in the signal amplitude. The change in the shape of the waveform with respect to Ψ can be seen in the blowups corresponding to the attack and decay transients.

Figure 15 shows the spectrogram of the external pressure generated by the model. Since the impedance includes all the harmonics (and not only the odd ones, as in the case of the cylindrical bore), the output pressure also includes all the harmonics. This makes for a considerable perceptual change in the timbre in comparison with the cylindrical geometry. Since the input impedance of the bore is not perfectly harmonic, it is not possible to determine whether the “moving ” are caused by a change in the value of Ψ or by a “phasing effect” resulting from the slight inharmonic nature of the impedance.

Increasing the value of Ψ affects the amplitude of the harmonics and slightly changes the pitch. In addition, as in the case of the cylindrical bore with a nonbeating reed, a large value of Ψ brings the self-oscillation process to an end.

Figures 16 and 17 show the flow ue(n) versus the pressure pe(n) at around t = 0.25 seconds, t = 0.5 seconds, t = 0.75 seconds, and t = 1 second. The shape and evolution with Ψ of the nonlinear characteristics are similar to what occurs in the case of a cylindrical bore with a beating reed.

Figure 16: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

Figure 17: ue(n) versus pe(n): (a) t = 0.75 second, (b) t = 1 second.

5. CONCLUSION

The synthesis model described in this paper includes the formation of a confined air jet in the embouchure of double-reed instruments. A dimensionless physical model, the form of which is suitable for transposition to a digital synthesis model, is proposed. The resonator is modeled using a time-domain equivalent of the input impedance and does not require the use of wave variables. This facilitates the modeling of the digital coupling between the bore, the reed, and the nonlinear characteristics, since all the components of the model use the same physical variables. It is thus possible to obtain an explicit resolution of the nonlinear coupled system thanks to the specific discretization scheme of the reed model. This is applicable to other self-oscillating wind instruments using the same flow model, but it still remains to be compared with other methods.

This synthesis model was used to study the influence of the confined jet on the sound generated, by carrying out a real-time implementation. Based on the results of informal listening tests with an oboe player, the sound and dynamics of the transients obtained are fairly realistic. The simulations show that the shape of the resonator is the main factor determining the timbre of the instrument in steady-state parts, and that the confined jet plays a role at the control level of the model, since it increases the oscillation step and therefore plays an important role mainly in the transient parts.

ACKNOWLEDGMENTS

The author would like to thank Christophe Vergez for helpful discussions on the physical flow model, and Jessica Blanc for reading the English.

REFERENCES

[1] R. T. Schumacher, “Ab initio calculations of the oscillation of a clarinet,” Acustica, vol. 48, pp. 71–85, 1981.
[2] J. O. Smith III, “Principles of digital waveguide models of musical instruments,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417–466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[3] V. Välimäki and M. Karjalainen, “Digital waveguide modeling of wind instrument bores constructed of truncated cones,” in Proc. International Computer Music Conference, pp. 423–430, Computer Music Association, San Francisco, Calif, USA, 1994.
[4] M. van Walstijn and M. Campbell, “Discrete-time modeling of woodwind instrument bores using wave variables,” Journal of the Acoustical Society of America, vol. 113, no. 1, pp. 575–585, 2003.
[5] C. Vergez, A. Almeida, R. Caussé, and X. Rodet, “Toward a simple physical model of double-reed musical instruments: influence of aero-dynamical losses in the embouchure on the coupling between the reed and the bore of the resonator,” Acustica, vol. 89, pp. 964–974, 2003.
[6] Ph. Guillemain, J. Kergomard, and Th. Voinier, “Real-time synthesis of wind instruments, using nonlinear physical models,” submitted to Journal of the Acoustical Society of America.
[7] J. Kergomard, “Elementary considerations on reed-instrument oscillations,” in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.
[8] A. Almeida, C. Vergez, R. Caussé, and X. Rodet, “Physical study of double-reed instruments for application to sound synthesis,” in Proc. International Symposium on Musical Acoustics, pp. 221–226, Mexico City, Mexico, December 2002.
[9] A. Hirschberg, “Aero-acoustics of wind instruments,” in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.
[10] S. Ollivier, Contribution à l’étude des oscillations des instruments à vent à anche simple, Ph.D. thesis, Université du Maine, France, 2002.
[11] T. A. Wilson and G. S. Beavers, “Operating modes of the clarinet,” Journal of the Acoustical Society of America, vol. 56, no. 2, pp. 653–658, 1974.
[12] Ph. Guillemain, J. Kergomard, and Th. Voinier, “Real-time synthesis models of wind instruments based on physical models,” in Proc. Stockholm Music Acoustics Conference, Stockholm, Sweden, 2003.
[13] A. D. Pierce, Acoustics—An Introduction to Its Physical Principles and Applications, McGraw-Hill, New York, NY, USA, 1981; reprinted by Acoustical Society of America, Woodbury, NY, USA, 1989.
[14] F. Avanzini and D. Rocchesso, “Efficiency, accuracy, and stability issues in discrete time simulations of single reed instruments,” Journal of the Acoustical Society of America, vol. 111, no. 5, pp. 2293–2301, 2002.
[15] G. Borin, G. De Poli, and D. Rocchesso, “Elimination of delay-free loops in discrete-time models of nonlinear acoustic systems,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 5, pp. 597–605, 2000.

Ph. Guillemain was born in 1967 in Paris. Since 1995, he has been working as a full-time researcher at the Centre National de la Recherche Scientifique (CNRS) in Marseille, France. He obtained his Ph.D. in 1994 on the modeling of natural sounds using time-frequency and wavelet representations. Since 1989, he has been working in the field of musical sound analysis, synthesis, and transformation using signal models and phenomenological models, with an emphasis on propagative models, their link with physics, and the design and control of real-time compatible synthesis algorithms.

EURASIP Journal on Applied Signal Processing 2004:7, 1001–1006
© 2004 Hindawi Publishing Corporation

Real-Time Gesture-Controlled Physical Modelling Music Synthesis with Tactile Feedback

David M. Howard Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK Email: [email protected]

Stuart Rimell Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK

Received 30 June 2003; Revised 13 November 2003

Electronic sound synthesis continues to offer huge potential for the creation of new musical instruments. The traditional approach is, however, seriously limited in that it incorporates only auditory feedback, and it will typically make use of a sound synthesis model (e.g., additive, subtractive, wavetable, and sampling) that is inherently limited and very often nonintuitive to the musician. In a direct attempt to address these issues, this paper describes a system that provides tactile as well as acoustic feedback, with real-time synthesis that invokes a more intuitive response from players since it is based upon mass-spring physical modelling. Virtual instruments are set up via a graphical user interface in terms of the physical properties of basic well-understood sounding objects such as strings, membranes, and solids. These can be interconnected to form complex integrated structures. Acoustic excitation can be applied at any point mass via virtual bowing, plucking, striking, a specified waveform, or any external sound source. Virtual microphones can be placed at any point masses to deliver the acoustic output. These aspects of the instrument are described along with the nature of the resulting acoustic output.

Keywords and phrases: physical modelling, music synthesis, haptic interface, force feedback, gestural control.

1. INTRODUCTION

Musicians are always searching for new sounds and new ways of producing sounds in their compositions and performances. The availability of modern computer systems has enabled considerable processing power to be made available on the desktop, and such machines have the capability of enabling sound synthesis techniques to be employed in real-time that would have required large dedicated computer systems just a few decades ago. Despite the increased incorporation of computer technology in electronic musical instruments, the search is still on for virtual instruments that are closer, in terms of how they are played, to their physical acoustic counterparts.

The system described in this paper aims to integrate music synthesis by physical modelling with novel control interfaces for real-time use in composition and live performances. Traditionally, sound synthesis has relied on techniques involving oscillators, wavetables, filters, time envelope shapers, and digital sampling of natural sounds (e.g., [1]). More recently, physical models of musical instruments have been used to generate sounds which have more natural qualities and have control parameters which are less abstract and more closely related to musicians’ experiences with acoustic instruments [2, 3, 4, 5]. Professional electroacoustic musicians require control over all aspects of the sounds with which they are working, in much the same way as a conductor is in control of the sound produced by an orchestra. Such control is not usually available from traditional synthesis techniques, since user adjustment of available synthesis parameters rarely leads to obviously predictable acoustic results. Physical modelling, on the other hand, offers the potential of more intuitive control, because the underlying technique is related directly to the physical vibrating properties of objects, such as strings and membranes, with which the user can interact through inference relating to expectation.

The acoustic output from traditional electronic musical instruments is often described as “cold” or “lifeless” by players and audience alike. Indeed, many report that such sounds become less interesting with extended exposure. The acoustic output from acoustic musical instruments, on the other hand, is often described as “warm,” “intimate,” or “organic.” The application of physical modelling for sound synthesis produces output sounds that resemble their physical counterparts much more closely.

The success of a user interface for an electronic musical instrument might be judged on its ability to enable the user to experience the illusion of directly manipulating objects, and one approach might be the use of virtual reality interfaces. However, this is not necessarily the best way to achieve such a goal in the context of a musical instrument, since a performing musician needs to be actively in touch visually and acoustically not only with other players, but also with the audience. This is summed up by Shneiderman [6]: “virtual reality is a lively new direction for those who seek the immersion experience, where they block out the real world by having goggles on their heads.” In any case, traditionally trained musicians rely less on visual feedback with their instrument and more on tactile and sonic feedback as they become increasingly accustomed to playing it. For example, Hunt and Kirk [7] note that “observation of competent pianists will quickly reveal that they do not need to look at their fingers, let alone any annotation (e.g., sticky labels with the names of the notes on) which beginners commonly use. Graphics are a useful way of presenting information (especially to beginners), but are not the primary channel which humans use when fully accustomed to a system.”

There is evidence to suggest that the limited information available from the conventional screen and mouse interface is certainly limiting and potentially detrimental for creating electroacoustic music. Buxton [8] suggests that the visual senses are overstimulated, whilst the others are understimulated. In particular, he suggests that tactile input devices also provide output to enable the user to relate to the system as an object rather than an abstract system: “every haptic input device can also be considered to provide output. This would be through the tactile or kinaesthetic feedback that it provides to the user .... Some devices actually provide force feedback, as with some special joysticks.” Fitzmaurice [9] proposes “graspable user interfaces” as real objects which can be held and manipulated, positioned, and conjoined in order to make interfaces which are more akin to the way a human interacts with the real world. It has further been noted that the haptic senses provide the second most important means (after the audio output) by which users observe and interact with the behaviour of musical instruments [10], and that complex and realistic musical expression can only result when both tactile (vibrational and textural) and proprioceptive cues are available in combination with aural feedback [11].

Considerable activity exists on capturing human gesture (see http://www.media.mit.edu/hyperins/ and http://www.megaproject.org/) [12]. Specific to the control of musical instruments is the provision of tactile feedback [13], electronic keyboards that have a feel close to a real piano [14], haptic feedback bows that simulate the feel and forces of real bows [15], and the use of finger-fitted vibrational devices in open air gestural musical instruments [16]. Such haptic control devices are generally one-off, relatively expensive, and designed to operate linked with specific computer systems, and as such, they are essentially inaccessible to the musical masses. A key feature of our instrument is its potential for wide applicability, and therefore inexpensive and widely available PC force feedback gaming devices are employed to provide its real-time gestural control and haptic feedback.

The instrument described in this paper, known as Cymatic [17], took its inspiration from the fact that traditional acoustic instruments are controlled by direct physical gesture, whilst providing both aural and tactile feedback. Cymatic has been designed to provide players with an immersive, easy to understand, as well as tactile musical experience that is more commonly associated with acoustic instruments but rarely found with computer-based instruments. The audio output from Cymatic is derived from a physical modelling synthesis engine which has its origins in TAO [3]. It shares some common approaches with other physical modelling sound synthesis environments such as Mosaic in [4] and Cordis-Anima in [5]. Cymatic makes use of the more intuitive approach to sound synthesis offered by physical modelling, to provide a building block approach to the creation of virtual instruments, based on elemental structures in one (string), two (sheet), three (block), or more dimensions that can be interconnected to form complex virtual acoustically resonant structures. Such instruments can be excited acoustically, controlled in real-time via gestural devices that incorporate force feedback to provide a tactile response in addition to the acoustic output, and heard after placing one or more virtual microphones at user-specified positions within the instrument.

2. DESIGNING AND PLAYING CYMATIC INSTRUMENTS

Cymatic is a physical modelling synthesis system that makes use of a mass-spring paradigm with which it synthesises resonating structures in real-time. It is implemented on a Windows-based PC in C++, and it incorporates support for standard force feedback PC gaming controllers to provide gestural control and tactile feedback. Acoustic output is realised via a sound card that provides support for ASIO audio drivers. Operation of Cymatic is a two-stage process: (1) virtual instrument design and (2) real-time sound synthesis.

Virtual instrument design is accomplished via a graphical interface, with which individual building block resonating elements including strings, sheets, and solids can be incorporated in the instrument and interconnected on a user-specified mass to mass basis. The ends of strings and the edges of sheets and blocks can be locked as desired. The tension and mass parameters of the masses and springs within each building block element can be user defined in value and either left fixed or placed under dynamical control using a gestural controller during synthesis. Virtual instruments can be customised in shape to enable arbitrary structures to be realised by deleting or locking any of the individual masses.

Each building block resonating element will behave as a vibrating structure. The individual axial resonant frequencies will be determined by the number of masses along the given axis, the sampling rate, and the specified mass and tension values. Standard relationships hold in terms of the relative values of resonant frequency between building blocks,

for example, a string twice the length of another will have a fundamental frequency that is one octave lower.

An excitation function, selected from the following list, can be placed on any mass within the virtual instrument: pluck, bow, random, , square wave, triangular wave, or live audio. Parameters relating to the selected excitation, including excitation force and its velocity and time of application where appropriate, can be specified by the user. Multiple excitations can be specified on the basis that each is applied to its own individual mass element. Monophonic audio output to the sound card is achieved via a virtual microphone placed on any individual mass within the instrument. Stereophonic output is available either from two individual microphones or from any number of microphones greater than two, where the output from each is panned between the left and right channels as desired. Cymatic supports whatever range of sampling rates is available on the sound card. For example, when used with an Edirol UA-5 USB audio interface, the following are available: 8 kHz, 9.6 kHz, 11.025 kHz, 12 kHz, 16 kHz, 22.05 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 88.2 kHz, and 96 kHz.

Figure 1 illustrates the process of building up a virtual instrument. The instrument has been built up from a string of 45 masses, a sheet of 7 by 9 masses, and a block of 4 by 4 by 3 masses. There is an interconnection between the string (mass 18 from the left) and the sheet (mass (1, 5)), as well as between the sheet (mass (6, 3)) and the block (mass (3, 2, 1)), as indicated by the dotted lines (a simple process based on clicking on the relevant masses). Two excitations have been included: a random input to the string at mass 33 and a bowed excitation to the block at mass (2, 2, 2). The basic sheet and block have been edited: masses have been removed from both the sheet and the block, as indicated by the gaps in their structure, and the masses on the back surface of the block have all been locked. The audio output is derived from a virtual microphone placed on the sheet at mass (4, 1). These are indicated on the figure as random, bow, and mic1, respectively. Individual components, excitations, and microphones can be added, edited, or deleted as desired.

Figure 1: Example build-up of a Cymatic virtual instrument starting with a string with 45 masses (top left), then adding a sheet of 7 by 9 masses (bottom left), then a block of 4 by 4 by 3 masses (top right), and finally the completed instrument (bottom right). Mic1: audio output virtual microphone on the sheet at mass (4, 1). Random: random excitation at mass 33 of the string. Bow: bowed excitation at mass (2, 2, 2) of the block. Join (dotted line) between string mass 18 and sheet mass (1, 5). Join (dotted line) between sheet mass (6, 3) and block mass (3, 2, 1).

The instrument is controlled in real-time using a Microsoft SideWinder Force Feedback Pro joystick and a Logitech iFeel mouse (see http://www.immersion.com). The various gestures that can be captured by these devices can be mapped to any of the parameters that are associated with the physical modelling process on an element-by-element basis. The joystick offers four degrees of freedom (x, y, z-twist movement and a rotary “throttle” controller) and eight buttons. The mouse has two degrees of freedom (X, Y) and three buttons. Cymatic parameters that can be controlled include the mass or tension of any of the basic elements that make up the instrument and the parameters associated with the chosen excitation, such as bowing pressure, excitation force, or excitation velocity. The buttons can be configured to suppress the effect of any of the gestural movements, to enable the user to move to a new position while making no change; the change can then be made instantaneously by releasing the button. In this way, step variations can be accommodated.

The force feedback capability of the joystick allows for the provision of tactile feedback with a high degree of customisability. It receives its force instructions via MIDI through the combined MIDI/joystick port on most PC sound cards, and Cymatic outputs the appropriate MIDI messages to control its force feedback devices. The Logitech iFeel mouse is an optical mouse which implements Immersion’s iFeel technology (http://www.immersion.com). It contains a vibrotactile device to produce tactile feedback over a range of frequencies and amplitudes via the “Immersion TouchSense Entertainment” software, which converts any audio signal to tactile sensations. The force feedback amplitude is controlled by the acoustic amplitude of the signal from a user-specified virtual microphone, which might be involved in the provision of the main acoustic output, or it could solely be responsible for the control of tactile feedback.

3. PHYSICAL MODELLING SYNTHESIS IN CYMATIC

Physical modelling audio synthesis in Cymatic is carried out by solving for the mechanical interaction between the masses and springs that make up the virtual instrument on a sample-by-sample basis. The central difference method of numerical integration is employed as follows:

x(t + dt) = x(t) + v(t + dt/2) dt,
v(t + dt/2) = v(t − dt/2) + a(t) dt,        (1)

where x = mass position, v = mass velocity, a = mass acceleration, t = time, and dt = sampling interval. The mass velocity is calculated half a time step ahead of its position, which results in a more stable model than an implementation of the Euler approximation.

The acceleration at time t of a cell is calculated by the classical equation

a = F/m,        (2)

where F = the sum of all the forces on the cell and m = cell mass. Three forces act on the cell:

Ftotal = Fspring + Fdamping + Fexternal,        (3)

where Fspring = the force on the cell from springs connected to neighbouring cells, Fdamping = the frictional damping force on the cell due to the viscosity of the medium, and Fexternal = the force on the cell from external excitations.

Fspring is calculated by summing the force on the cell from the springs connecting it to its neighbours, via Hooke’s law:

Fspring = Σ k (pn − p0),        (4)

where k = spring constant, pn = the position of the nth neighbour, and p0 = the position of the current cell.

Fdamping is the frictional force on the cell caused by the viscosity of the medium in which the cell is contained. It is proportional to the cell velocity, where the constant of proportionality is the damping parameter of the cell.

Figure 2: Cymatic virtual instrument consisting of a string and a modified sheet, joined between mass 30 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at point 10 of the string and the virtual microphone is located at mass (6, 3) of the sheet.

Fdamping = −ρ v(t),        (5)

where ρ = the damping parameter of the cell and v(t) = the velocity of the cell at time t. The acceleration of a particular cell at any instant can be established by combining these forces into (2):

a(t) = (1/m) [ Σ k (pn − p0) − ρ v(t) + Fexternal ].        (6)

Figure 3: Force feedback joystick settings dialog.
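Equations (1)–(6) translate directly into a per-sample update loop. The following one-dimensional sketch is our own Python illustration, not the Cymatic C++ implementation; the normalised parameter values (k, m, ρ, dt) are assumptions chosen only so that the leapfrog step is numerically stable.

```python
import numpy as np

def simulate_string(n_masses=45, n_samples=4000, k=0.5, m=1.0, rho=0.001,
                    dt=1.0, excite_at=10, mic_at=30):
    """Leapfrog (central difference) update of a fixed-ended mass-spring
    string: velocities live half a step ahead of positions (Eq. (1));
    acceleration combines Hooke spring forces from the two neighbours,
    viscous damping, and an external excitation (Eqs. (2)-(6))."""
    rng = np.random.default_rng(0)
    x = np.zeros(n_masses)            # positions x(t)
    v = np.zeros(n_masses)            # velocities v(t - dt/2)
    mic = np.empty(n_samples)
    for t in range(n_samples):
        # Eq. (4): spring force from left and right neighbours (fixed ends).
        left = np.concatenate(([0.0], x[:-1]))
        right = np.concatenate((x[1:], [0.0]))
        f_spring = k * (left - x) + k * (right - x)
        f_ext = np.zeros(n_masses)
        if t < 50:                    # brief random excitation at one mass
            f_ext[excite_at] = rng.uniform(-1.0, 1.0)
        a = (f_spring - rho * v + f_ext) / m   # Eqs. (5)-(6)
        v += a * dt                            # v(t + dt/2), Eq. (1)
        x += v * dt                            # x(t + dt),  Eq. (1)
        mic[t] = x[mic_at]            # virtual microphone: cell position
    return mic

out = simulate_string()
```

The half-step velocity update is what makes this scheme behave better than forward Euler, which tends to gain energy on undamped oscillators.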

The position, velocity, and acceleration are calculated once per sampling interval for each cell in the virtual instrument. Any virtual microphones in the instrument output their cell positions to provide an output audio waveform.

4. CYMATIC OUTPUTS

Audio spectrograms provide a representation that enables the detailed nature of the acoustic output from Cymatic to be observed visually. Figure 2 shows a virtual Cymatic instrument consisting of a string and a modified sheet which are joined together between mass 30 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at mass 10 of the string and a virtual microphone (mic1) is located at mass (4, 3) of the sheet. Figure 3 shows the force feedback joystick settings dialog used to control the virtual instrument; it can be seen that the component mass of the string, the component tension, and the damping and mass of the sheet are controlled by the X, Y, Z, and slider (throttle) functions of the joystick. Three of the buttons have been set to suppress X, Y, and Z, a feature which enables a new setting to be jumped to as desired, for example, by pressing button 1, moving the joystick in the X axis, and then releasing button 1. Force feedback is applied based on the output amplitude level from mic1.

Figure 4: Spectrogram of output from the Cymatic virtual instrument, shown in Figure 2, consisting of a string and modified sheet.
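Driving the force feedback amplitude from the virtual-microphone amplitude can be pictured as a simple envelope follower. The sketch below is our own guess at the general idea (the actual MIDI force messages Cymatic sends are not documented here); the frame size and the 0–127 controller-style scaling are assumptions.

```python
import numpy as np

def force_levels(mic_signal, frame_len=256, full_scale=1.0):
    """Per-frame RMS of the virtual-microphone signal, scaled to a 0-127
    controller-style force feedback value (one value per audio frame)."""
    n_frames = len(mic_signal) // frame_len
    frames = mic_signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    levels = np.clip(rms / full_scale, 0.0, 1.0) * 127.0
    return np.round(levels).astype(int)

# Quiet first half, louder second half: the force level should rise.
sig = np.concatenate([0.05 * np.ones(1024), 0.5 * np.ones(1024)])
lv = force_levels(sig)
```

One value per frame, rather than per sample, keeps the control stream slow enough for a MIDI-rate force feedback device.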

Figure 5: Spectrogram of a section of “the child is sleeping” by Stuart Rimell showing Cymatic alone (from the start to A), the word “hush” sung by the four-part choir (A to B) and the “st” of “still” at C.

Figure 4 shows a spectrogram of the output from mic1 of the instrument. The tonality visible (horizontal banding in the spectrogram) is entirely due to the resonant properties of the string and sheet themselves, since the input excitation is random. Variations in the tonality are rendered through gestural control of the joystick, and the step change notable just before half way through is a result of using one of the “suppress” buttons.

Cymatic was used in a public live concert in December 2002, for which a new piece, “the child is sleeping,” was specially composed by Stuart Rimell for a cappella choir and Cymatic (http://www.users.york.ac.uk/∼dmh). It was performed by the Beningbrough Singers in York, conducted by David Howard. The composer performed the Cymatic part, which made use of three cymbal-like structures controlled by the mouse and joystick. The choir provided a backing in the form of a slow moving carol in four-part harmony, while Cymatic played an obbligato solo line. The spectrogram in Figure 5 illustrates this with a section which has Cymatic alone (up to point A), and then the choir enters singing “hush be still,” with the “sh” of “hush” showing at point B and the “st” of “still” at point C. In this particular Cymatic example, the sound colours being used lie at the extremes of the vocal spectral range, but there are clearly tonal elements visible in the Cymatic output. Indeed, these were essential as a means of giving the choir their starting pitches.

5. DISCUSSION AND CONCLUSIONS

An instrument known as Cymatic has been described, which provides its players with an immersive, easy to understand, and tactile musical experience that is rarely found with computer-based instruments but commonly expected from acoustic musical instruments. The audio output from Cymatic is derived from a physical modelling synthesis engine, which enables virtual instruments with arbitrary shapes to be built up by interconnecting one- (string), two- (sheet), three- (block), or higher-dimensional basic building blocks. An acoustic excitation chosen from bowing, plucking, striking, or a waveform is applied at any mass element, and the output is derived from a virtual microphone placed at any other mass element. Cymatic is controlled via gestural controllers that incorporate force feedback to provide the player with tactile as well as acoustic feedback.

Cymatic has the potential to enable new musical instruments to be explored that can produce original and inspiring new timbral palettes, since virtual instruments that are not physically realizable can be implemented. In addition, interaction with these instruments can include aspects that cannot be used with their physical counterparts, such as deleting part of the instrument while it is sounding, or changing its physical properties in real-time during performance. The design of the user interface ensures that all of these activities can be carried out in a manner that is more intuitive than with traditional electronic instruments, since it is based on the resonant properties of physical structures. A user can therefore make sense of what she or he is doing through reference to the likely behaviour of strings, sheets, and blocks. Cymatic has the further potential in the future (as processing speed increases further) to move well away from the real physical world, while maintaining the link with this intuition, since the spatial dimensionality of the virtual instruments can in principle be extended well beyond the three of the physical world.

Cymatic provides the player with an increased sense of immersion, which is particularly useful when developing performance skills since it reinforces the visual and aural feedback cues and helps the player internalise models of the instrument’s response to gesture. Tactile feedback also has the potential to prove invaluable in group performance, where traditionally computer instruments have placed an over-reliance on visual feedback, thereby detracting from the player’s visual attention, which should be directed elsewhere in a group situation, for example, towards a conductor.

ACKNOWLEDGMENTS

The authors acknowledge the support of the Engineering and Physical Sciences Research Council, UK, under Grant number GR/M94137. They also thank the anonymous referees for their helpful and useful comments.

REFERENCES

[1] M. Russ, Sound Synthesis and Sampling, Focal Press, Oxford, UK, 1996.
[2] J. O. Smith III, “Physical modelling synthesis update,” Computer Music Journal, vol. 20, no. 2, pp. 44–56, 1996.
[13] C. Cadoz, A. Luciani, and J. L. Florens, “Responsive input devices and sound synthesis by simulation of instrumental mechanisms: The Cordis system,” Computer Music Journal, vol. 8, no. 3, pp. 60–73, 1984.
[14] B. Gillespie, Haptic display of systems with changing kinematic constraints: The virtual piano action, Ph.D. dissertation, Stanford University, Stanford, Calif, USA, 1996.
[15] C. Nichols, “The vBow: Development of a virtual violin bow haptic human-computer interface,” in Proc. New Interfaces for Musical Expression Conference, pp. 168–169, Dublin, Ireland, May 2002.
[16] J. Rovan and V. Hayward, “Typology of tactile sounds and their synthesis in gesture-driven computer music performance,” in Trends in Gestural Control of Music, M. Wanderley and M. Battier, Eds., pp. 297–320, Editions IRCAM, Paris, France, 2000.
[17] D. M. Howard, S. Rimell, and A. D. Hunt, “Force feedback gesture controlled physical modelling synthesis,” in Proc. Conference on New Musical Instruments for Musical Expression, pp. 95–98, Montreal, Canada, May 2003.

David M. Howard holds a first-class B.S. degree in electrical and electronic engineering from University College London (1978), and a Ph.D. in human communication from the University of London (1985). His Ph.D. topic was the development of a signal pro-
cessing unit for use with a single channel [3] M. D. Pearson and D. M. Howard, “Recent developments with cochlear implant hearing aid. He is now TAO physical modelling system,” in Proc. International Com- with the Department of Electronics at the puter Music Conference, pp. 97–99, Hong Kong, China, August 1996. University of York, UK, teaching and re- [4]J.D.MorrisonandJ.M.Adrien,“MOSAIC:Aframeworkfor searching in music technology. His specific research areas include modal synthesis,” Computer Music Journal,vol.17,no.1,pp. the analysis and synthesis of music, singing, and speech. Current 45–56, 1993. activities include the application of bio-inspired techniques for [5] C. Cadoz, A. Luciani, and J. L. Florens, “CORDIS-ANIMA: A music synthesis, physical modelling synthesis for music, singing modelling system for sound and image synthesis, the general and speech, and real-time computer-based visual displays for pro- formalism,” Computer Music Journal, vol. 17, no. 1, pp. 19–29, fessional voice development. David is a Chartered Engineer, a Fel- 1993. low of the Institution of Electrical Engineers, and a Member of the [6] J. Preece, “Interview with Ben Shneiderman,” in Human- Audio Engineering Society. Outside work, David finds time to con- Computer Interaction,Y.Rogers,H.Sharp,D.Benyon,S.Hol- duct a local 12-strong choir from the tenor line and to play the pipe land, and J. Preece, Eds., Addison Wesley, Reading, Mass, organ. USA, 1994. [7] A. D. Hunt and P. R. Kirk, Digital Sound Processing for Music Stuart Rimell holds a B.S. in electronic mu- and Multimedia, Focal Press, Oxford, UK, 1999. sic and psychology as well as an M.S. in dig- [8] W. Buxton, “There is more to interaction than meets the eye: ital music technology, both from the Uni- Some issues in manual input,” in User Centered System Design: versity of Keele, UK. He worked for 18 New Perspectives on Human-Computer Interaction,D.A.Nor- months with David Howard at the Univer- man and S. W. Draper, Eds., pp. 
319–337, Lawrence Erlbaum sity of York on the development of the Cy- Associates, Hillsdale, NJ, USA, 1986. matic system. There he studied electroa- [9]G.W.Fitzmaurice, Graspable user interfaces, Ph.D. thesis, coustic composition for 3 years under Mike University of Toronto, Ontario, Canada, 1998. Vaughan and Rajmil Fischman. Stuart is in- [10] B. Gillespie, “Introduction haptics,” in Music, Cognition, and terested in the exploration of new and fresh Computerized Sound: An Introduction to Psychoacoustics,P.R. creative musical methods and their computer-based implementa- Cook, Ed., pp. 229–245, MIT Press, London, UK, 1999. [11] D. M. Howard, S. Rimell, A. D. Hunt, P. R. Kirk, and A. M. tion for electronic music composition. Stuart is a guitarist and he Tyrrell, “Tactile feedback in the control of a physical mod- also plays euphonium, trumpet, and piano and has been writing elling music synthesiser,” in Proc. 7th International Conference music for over 12 years. His compositions have been recognized in- on Music Perception and Cognition,C.Stevens,D.Burnham, ternationally through prizes from the prestigious Bourge Festival G. McPherson, E. Schubert, and J. Renwick, Eds., pp. 224– of Electronic Music in 1999 and performances of his music world- 227, Casual Publications, Adelaide, Australlia, 2002. wide. [12] S. Kenji, H. Riku, and H. Shuji, “Development of an au- tonomous humanoid robot, iSHA, for harmonized human- machine environment,” Journal of Robotics and Mechatronics, vol. 14, no. 5, pp. 324–332, 2002. EURASIP Journal on Applied Signal Processing 2004:7, 1007–1020 c 2004 Hindawi Publishing Corporation

Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models

Ixone Arroabarren
Departamento de Ingeniería Eléctrica y Electrónica, Universidad Pública de Navarra, Campus de Arrosadia, 31006 Pamplona, Spain
Email: [email protected]

Alfonso Carlosena
Departamento de Ingeniería Eléctrica y Electrónica, Universidad Pública de Navarra, Campus de Arrosadia, 31006 Pamplona, Spain
Email: [email protected]

Received 4 July 2003; Revised 30 October 2003

The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has been traditionally studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source-filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.

Keywords and phrases: voice quality, source-filter model, inverse filtering, singing voice, vibrato, sinusoidal model.

1. INTRODUCTION

Inverse filtering provides a noninvasive method to study voice quality. In this context, high-quality speech synthesis is developed using a source-filter model, where voice texture is controlled by glottal source characteristics. Efforts to apply this approach to singing voice have failed, the reasons being not clear: either the unsuitability of the model, or the different range of frequencies, or both, could be the cause. Lyric singers, being professionals, have an efficiency requirement, and as a result, they are trained to move their formant positions towards the first harmonics, which could also be another reason for the model's failure [1].

This paper purports to shed light on this problem by comparing two salient methods for glottal source and vocal tract response (VTR) estimation with a novel frequency-domain method proposed by the authors. In this way, the inverse filtering approach will be tested in singing voice analysis. In order to have a benchmark, the source-filter model will be compared to the sinusoidal model, and this comparison will be performed thanks to a particular feature of singing voice: vibrato.

Regarding voice production models, we can distinguish two approaches as follows.

(i) On the one hand, interactive models are closer to the physical features of the vocal system. This system is composed of two resonant cavities (subglottal and supraglottal) which are connected by a valve, the glottis, where the vocal folds are located. The movement of the vocal folds provides the harmonic nature of the air flow of voiced sounds, and also controls the coupling between the two resonant cavities, which will be different during the open and closed phases. As a result of this effect, the VTR will change during a single fundamental period, and there will be a relationship between the glottal source and the VTR. This physical behavior has been modeled in several ways, by physical models [2] or aerodynamic models [3, 4]. From the signal processing point of view, in [4] the VTR variation is related to the glottal area, which controls the coupling of the cavities, and this relationship is represented by a frequency modulation of the central frequency and bandwidth of the formants. Another effect of the source-tract interaction is the increase of the skewness of the glottal source [4], which emphasizes the difference between the glottal area and the glottal source [5].

(ii) On the other hand, noninteractive models separate the glottal source and the VTR, and both are independently modeled as linear time-varying systems. This is the case of the source-filter model proposed by Fant in [6]. The VTR is modeled as an all-pole filter in the case of nonnasal sounds. For the glottal source, several waveform models have been proposed [7, 8, 9], but all of them try to include some of the features of the source-tract interaction, typically the asymmetric shape of the pulse. These models provide a high-quality synthesis framework for speech with a low computational complexity. The synthesis is preceded by an analysis stage, which is divided into two steps: an inverse filtering step, where the glottal source and the VTR are separated [9, 10, 11, 12, 13], and a parameterization step, where the most relevant parameters of both elements are obtained [14, 15, 16].

Figure 1: Noninteractive source-filter model of the voice production system (glottal source → VTR → lip radiation diagram, 1 − l·z⁻¹ → singing voice).

In general, inverse filtering techniques yield worse results as the fundamental frequency increases, as is the case of women and children in speech and singing voice. In the latter case, singing voice, the number of published works is very scarce [1, 17]. In [1], the glottal source features are studied in speech and singing voice by acoustic and electroglottographic signals [18, 19]. From these works, it is not apparent which is the main limitation of inverse filtering in singing voice. It might be possible that the source-tract interaction is more complex than in speech, which would represent a paradox in the noninteractive assumption [20]. Another reason mentioned in [1] is that perhaps the glottal source models used in speech are not suitable for singing voice. These statements are not demonstrated, but are interesting questions that should be answered.

On the other hand, in [17] the noninteractive source-filter model is used as a high-quality singing voice synthesis approach. The main contribution of that work is the development of an analysis procedure that estimates the parameters of the synthesis model [12, 21]. However, there is no evidence that could point to differences between speech and singing as is indicated in [1].

One of the goals of the present work is to clarify whether the noninteractive models are able to model singing voice in the same way as high-quality speech, or whether, on the contrary, the source-tract interaction is different from speech and precludes this linear model assumption. If the noninteractive model could model singing voice, the reason for the failure of inverse filtering techniques would be just the high fundamental frequency of singing voice.

To this end, we will compare in this paper three different inverse filtering techniques, one of them novel and proposed recently by the authors, in order to obtain the source-filter decomposition. Though they work correctly for speech and low-frequency signals, we will show their limitations as the fundamental frequency increases. This is described in Section 2.

Since the fundamental frequency in singing voice is higher than in speech, it seems obvious that the above-mentioned methods fail, apparently due to the limited spectral information provided in high-pitched signals. To compensate for that, we claim that the introduction of a feature such as vibrato may serve to increase the information available, by virtue of the frequency-modulated nature, and therefore wider bandwidth, of vibrato [22, 23, 24]. Frequency variations are influenced by the VTR, and this effect can be used to obtain information about it.

With this in mind, it is not surprising that vibrato has been traditionally analyzed by sinusoidal modeling [25, 26], the most important limitation being the impossibility to separate the sound generation and the VTR. In Section 3, we will take a step forward by introducing a source-filter model which accounts for the physical origin of the main features of singing voice. Making use of this model, we will also demonstrate how the simpler sinusoidal model can serve to obtain complementary information to inverse filtering, particularly in those conditions where the latter method fails.

2. INVERSE FILTERING

Along this section, the noninteractive source-filter model depicted in Figure 1 will be considered, and some of the possible estimation algorithms for it will be reviewed.

According to the block diagram in Figure 1, singing voice production can be modeled by a glottal source excitation that is linearly modified by the VTR and the lip radiation diagram. Typically, the VTR is modeled by an all-pole filter, and relying on the linearity of the model, the lip radiation system is combined with the glottal source, in such a way that the glottal source derivative (GSD) is considered as the vocal tract excitation.

In this context, during the last decades many inverse filtering algorithms to estimate the model elements have been proposed. This technique is usually accomplished in two steps. In the first one, the GSD waveform and the VTR are estimated. In the second one, these signals are parameterized in a few numerical values. This whole analysis can be practically implemented in several ways. For the sake of clarity, we can group these possibilities into two types.

(i) In the first group, the two identification steps are combined in a single algorithm, for instance in [9, 12]. There, a mathematical model for the GSD and an autoregressive (AR) model for the VTR are considered, and the authors estimate the VTR and the GSD model parameters simultaneously. In this way, the GSD model parameterizes a given phonation type. Several different algorithms follow this structure, but all of them are invariably time-domain implementations that require glottal closure instant (GCI) detection [27]. Therefore, they suffer from a high computational load, which makes them very cumbersome.
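The noninteractive production chain of Figure 1 (glottal source, its derivative as vocal tract excitation, and an all-pole VTR) can be sketched numerically. The following minimal pure-Python illustration uses a KLGLOTT88/Rosenberg-style quadratic-cubic glottal pulse and a toy two-pole resonance; the pulse shape, filter coefficients, and period length are illustrative assumptions, not parameters taken from the paper.

```python
def glottal_pulse(n_period, oq=0.6, av=1.0):
    """One KLGLOTT88-style glottal flow cycle: quadratic-cubic rise over
    the open phase (fraction oq of the period), zero in the closed phase."""
    n_open = int(oq * n_period)
    pulse = []
    for n in range(n_period):
        if n < n_open:
            t = n / n_open                          # normalized open-phase time
            pulse.append(av * (27.0 / 4.0) * (t * t - t ** 3))
        else:
            pulse.append(0.0)                       # closed phase: no flow
    return pulse

def differentiate(x):
    """First difference, standing in for the lip-radiation derivative."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def all_pole(x, a):
    """All-pole VTR filter: y(n) = x(n) + sum_k a[k-1] * y(n-k)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y

# Glottal source -> derivative (GSD) -> vocal tract resonance
source = glottal_pulse(80) * 5                      # five pitch periods
gsd = differentiate(source)
voice = all_pole(gsd, [1.3, -0.9])                  # toy stable two-pole "formant"
```

The 27/4 factor normalizes the pulse peak (reached at two-thirds of the open phase) to AV.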

Figure 2: Block diagram of the AbS inverse filtering algorithm (preemphasized speech → covariance LPC → vocal tract parameters; voice source model → voice source parameter optimization).

(ii) The procedures in the second group split the whole process into two stages. Regarding the first step, different inverse filtering techniques have been proposed [11, 13]. These algorithms remove the GSD effect from the speech signal, and the VTR is obtained by linear prediction (LP) [28] or alternatively by discrete all-pole (DAP) modeling [29], which avoids the fundamental frequency dependence of the former.

For this comparative study, three inverse filtering approaches have been selected. The first one is the analysis-by-synthesis (AbS) procedure presented in [9]; the second one is the one proposed by the authors in [13], glottal spectrum based (GSB) inverse filtering. In this way, both groups of algorithms mentioned above are represented. In addition, the closed phase covariance (CPC) method [10] has been added to the comparison. This approach is difficult to classify because it only obtains the VTR, as is the case in the second group, but it is a time-domain implementation as in the first one. The most interesting feature of this algorithm is that it is less affected by the ripple due to the source-tract interaction, because it only takes into account the time interval when the vocal folds are closed. In what follows, the three approaches will be shortly described, and finally compared.

2.1. Analysis by synthesis

This inverse filtering algorithm was proposed in [9]. It is based on covariance LPC [29], but the least squares error is modified in order to include the input of the system:

    E = \sum_{n=0}^{N-1} \big[s(n) - \hat{s}(n)\big]^2
      = \sum_{n=0}^{N-1} \Big[s(n) - \Big(\sum_{k=1}^{p} a_k s(n-k) + a_{p+1} g(n)\Big)\Big]^2,    (1)

where g(n) represents the GSD, and

    H(z) = \frac{a_{p+1}}{1 - \sum_{k=1}^{p} a_k z^{-k}}    (2)

represents the VTR. Since neither the VTR nor the GSD parameters are known, an iterative algorithm is proposed and a simultaneous search is developed. The block diagram of the algorithm is represented in Figure 2.

As in covariance LP without source, this approach allows shorter analysis windows. However, the stability of the system is not guaranteed, and a stabilization step must be included for this purpose. Also, since it is a time-domain implementation, the voice source model must be synchronized with the speech signal, and a high sampling frequency is mandatory in order to obtain satisfactory results. As a result, the computational load is also high. Regarding the GSD parameter optimization, it is dependent on the chosen model. In the results shown in Section 2.4, the LF model is selected because it is one of the most powerful GSD models, and it allows an independent control of the three main features of the glottal source: open quotient, asymmetry coefficient, and spectral tilt. The disadvantage of this model is its computational load. For more details on the topic, readers are referred to [8].

Regarding fundamental frequency limits, it is shown in [1] that this algorithm provides unsatisfactory results for medium and high pitched signals.

2.2. Glottal spectrum based inverse filtering

This technique was proposed by the authors in [13] and will be briefly described here. Unlike the technique described in the previous section, it is essentially a frequency-domain implementation. In the AbS approach, the GSD effect was included in the LP error, and the AR coefficients were obtained by covariance LPC. In our case, a short-term spectrum of speech is considered (3 or 4 fundamental periods), and the GSD effect is removed from the speech spectrum. Then, the AR coefficients of (2) are obtained by DAP modeling [29]. For this spectral implementation, the KLGLOTT88 model [7] has been considered. It is less powerful than the LF model, but of a simpler implementation.

As is shown in Figure 3, there is a basic voicing waveform controlled by the open quotient (Oq) and the amplitude of voicing (AV), the spectral tilt being included by a first-order lowpass filter.
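Equation (1) is an ordinary least-squares problem in the coefficients a_1, ..., a_p, a_{p+1} once g(n) is fixed, which is the core of each AbS iteration. Below is a minimal pure-Python sketch of that single step on a synthetic second-order example; the test signal, the order p = 2, and the coefficient values are made-up illustrations, not the full iterative algorithm of [9].

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def covariance_lp_with_source(s, g, p):
    """Minimize eq. (1): s(n) ~ sum_k a_k s(n-k) + a_{p+1} g(n).
    Regressors are past speech samples plus the glottal source derivative."""
    rows = [[s[n - k] for k in range(1, p + 1)] + [g[n]] for n in range(p, len(s))]
    y = s[p:]
    m = p + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * yn for r, yn in zip(rows, y)) for i in range(m)]
    return solve(A, b)

# Synthetic check: generate s(n) from known a_k and source gain, then recover them.
random.seed(0)
g = [random.uniform(-1, 1) for _ in range(200)]
a_true, gain_true = [1.2, -0.7], 0.5
s = []
for n in range(200):
    v = gain_true * g[n]
    if n >= 1: v += a_true[0] * s[n - 1]
    if n >= 2: v += a_true[1] * s[n - 2]
    s.append(v)
coeffs = covariance_lp_with_source(s, g, p=2)   # recovers [1.2, -0.7, 0.5]
```

In the real algorithm, g(n) is itself a parametric model (LF) whose parameters are optimized in an outer loop, which is where the high computational load comes from.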

Figure 3: Block diagram of the KLGLOTT88 model (basic voicing waveform g(t), controlled by Oq and AV, followed by a first-order lowpass spectral tilt filter 1/(1 − µz⁻¹) to give the GSD).

Figure 4: Block diagram of the GSB inverse filtering algorithm (short-term spectrum of speech, peak detection, separation of voice and vocal tract + ST using the basic voicing spectrum, and (N + 1)th-order DAP modeling of the vocal tract and ST).

Figure 5: Closed phase interval in voice.
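The spectral-tilt block in Figure 3 is just a one-pole lowpass, 1/(1 − µz⁻¹). A minimal sketch, where µ = 0.9 is an arbitrary illustrative value rather than a setting from the paper:

```python
def spectral_tilt(x, mu=0.9):
    """First-order lowpass 1/(1 - mu*z^-1): each output sample is the input
    plus mu times the previous output, attenuating high frequencies."""
    y = []
    prev = 0.0
    for xn in x:
        prev = xn + mu * prev
        y.append(prev)
    return y

# Impulse response is the geometric sequence 1, mu, mu^2, ...
impulse_response = spectral_tilt([1.0, 0.0, 0.0, 0.0])
```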

In our inverse filtering algorithm, once the short-term spectrum is calculated, the glottal source effect is removed, by spectral division, using the spectrum of the basic voicing waveform (3), which can be directly obtained by the Fourier transform of the basic voicing waveform [30]:

    G(f) = \frac{27\,AV}{2 O_q (2\pi f)^3}\Big[\, j e^{-j 2\pi f O_q T_0} + \frac{1 + 2 e^{-j 2\pi f O_q T_0}}{2\pi f O_q T_0} + 3 j\,\frac{1 - e^{-j 2\pi f O_q T_0}}{\big(2\pi f O_q T_0\big)^2} \Big].    (3)

The spectral tilt (ST) and the VTR are combined in an (N + 1)th-order all-pole filter. The block diagram of the algorithm is shown in Figure 4.

Since DAP modeling is the most important part of the algorithm, we should explain its rationale. In classical autocorrelation LP [28], it is a well-known effect that, as the fundamental frequency increases, the resulting transfer function is biased by the spectral peaks of the signal. This happens because the signal is assumed to be the impulse response of the system, and this assumption is obviously not entirely correct. In order to avoid this problem, an alternative proposed in [29] is to obtain the LP error based on the spectral peaks, instead of on the time-domain samples. Unfortunately, this error calculation is based on an aliased version of the right autocorrelation of the signal, and this aliasing grows as the fundamental frequency increases. Then, the resulting transfer function is again not correct. To solve this problem, DAP modeling uses the Itakura-Saito error, instead of the least squares error, and it can be shown that the error is minimized using only the spectral peak information. The details of the algorithm are explained in [29]. This technique allows higher fundamental frequencies than classical autocorrelation LP, but for proper operation it requires a sufficient number of spectral peaks in order to estimate the right transfer function. So, this inverse filtering algorithm will also have a limit in the highest achievable fundamental frequency.

2.3. Closed phase covariance

This inverse filtering technique was proposed in [31]. It is also based on covariance LP, as the AbS approach explained above. However, instead of removing the effect of the GSD from a long speech interval, the classical covariance LP takes into account only a portion of a single cycle where the vocal folds are closed. In this way, in the considered time interval there is no GSD information to be removed, and the application of covariance LP will lead to the right transfer function. Considering the linearity of the model shown in Figure 1, the closed phase interval will be the time interval where the GSD is zero. This situation is depicted in Figure 5.

The most difficult step in this technique is to detect the closed phase in the speech signal. In [10], two-channel speech processing is proposed, making use of electroglottographic signals to detect the closed phase. Electroglottography (EGG) is a technique used to indirectly register laryngeal behavior by measuring the electrical impedance across the throat during speech. Rapid variation in the conductance is mainly caused by movement of the vocal folds. As they approximate and the physical contact between them increases, the impedance decreases, which results in a relatively higher current flow through the larynx structures. Therefore, this signal will provide information about the contact surface of the vocal cords.

The complete inverse filtering algorithm is represented in Figure 6.

Figure 6: Closed phase covariance (CPC): EGG-based closed phase detection, GCI detection, interval selection, and covariance LPC yielding the vocal tract parameters.
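The closed-phase idea can be sketched in the same spirit: restrict the covariance-LP fit to samples where the excitation is zero, so no source term has to be removed. Everything below (the toy AR(2) "vocal tract", the 5-sample "open phase", the 50-sample period) is an illustrative assumption, not the CPC algorithm of [31] itself.

```python
def closed_phase_covariance_lp(s, closed, p=2):
    """Covariance LP restricted to closed-phase samples (GSD == 0 there).
    Hardcoded to order p = 2 so the normal equations solve by Cramer's rule."""
    rows = [(s[n - 1], s[n - 2], s[n])
            for n in range(p, len(s)) if all(closed[n - k] for k in range(p + 1))]
    # Normal equations for s(n) ~ a1*s(n-1) + a2*s(n-2)
    S11 = sum(r[0] * r[0] for r in rows); S12 = sum(r[0] * r[1] for r in rows)
    S22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * r[2] for r in rows); b2 = sum(r[1] * r[2] for r in rows)
    det = S11 * S22 - S12 * S12
    return [(b1 * S22 - b2 * S12) / det, (S11 * b2 - S12 * b1) / det]

# Toy signal: impulse-excited AR(2) resonance; the free ringing after each
# burst of excitation plays the role of the closed phase.
a_true = [1.5, -0.81]
period, n_tot = 50, 400
s, closed = [], []
for n in range(n_tot):
    exc = 1.0 if n % period < 5 else 0.0        # "open phase" excitation
    v = exc
    if n >= 1: v += a_true[0] * s[n - 1]
    if n >= 2: v += a_true[1] * s[n - 2]
    s.append(v)
    closed.append(n % period >= 5)
a_est = closed_phase_covariance_lp(s, closed)   # recovers [1.5, -0.81]
```

The fundamental-frequency limit discussed above shows up directly here: raising F0 shrinks the closed-phase window, leaving fewer rows in the normal equations.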

Figure 7: (a) Estimated GSD, F0 = 100 Hz, vowel "a." (b) Estimated GSD, F0 = 300 Hz, vowel "a." (c) Estimated VTR, F0 = 100 Hz, vowel "a." (d) Estimated VTR, F0 = 300 Hz, vowel "a." (GSB, CPC, AbS, and the original GSD/VTR.)

In Figure 6, a GCI detection block [27] is included because, even though the acoustic and electroglottographic signals are recorded simultaneously, there is a propagation delay between the acoustic signal recorded at the microphone and the impedance variation at the neck of the singer. Thus, a precise synchronization is mandatory.

Since this technique is based on covariance LP, it may work with very short window lengths. However, as the fundamental frequency increases, the time length of the closed phase gets shorter, and there is much less information left for the vocal tract estimation. This fact imposes a fundamental frequency limit, even using covariance LP.

2.4. Practical results

Once the basics of the three inverse filtering techniques have been presented and described, they will be compared by simulations and also by making use of natural singing voice recordings. The main goal of this analysis is to see how the three techniques compare in terms of their fundamental frequency limitations.

2.4.1. Simulation results

First, the noninteractive model for voice production shown in Figure 1 will be used in order to synthesize some artificial test signals. The lip radiation effect and the glottal source are combined in a mathematical model for the GSD, also making use of the LF model. It is well known [1, 17] that the formant position can affect inverse filtering results. In [3], it is also shown that the lower the first formant central frequency is, the higher the source-tract interaction is. So, the interaction is higher in vowels where the first formant central frequency is lower. Therefore, in order to cover all possible situations, two vocal all-pole filters have been used for synthesizing the test signal: one representing the Spanish vowel "a," and the other one representing the Spanish vowel "e." In this latter case, the first formant is located at lower frequencies. In order to see the fundamental frequency dependence of inverse filtering techniques, this parameter has been varied from 100 Hz to 300 Hz in 25 Hz steps. For each fundamental frequency, the three algorithms have been applied, and the GSD as well as the VTR have been estimated. In Figures 7a to

Figure 8: Fundamental frequency dependence. (a) Error_F1 in vowel "a." (b) Error_F1 in vowel "e." (c) Error_GSD in vowel "a." (d) Error_GSD in vowel "e." (GSB, CPC, and AbS; errors plotted against F0 from 90 to 290 Hz.)

7d, the GSD and the VTR estimated by the three approaches are shown for two different fundamental frequencies. Note that in them, and in other figures, the DC level has been arbitrarily modified to facilitate comparisons.

Comparing the results obtained by the three inverse filtering approaches, it is shown that, as the fundamental frequency increases, the error in both the GSD and the VTR increases. Recalling the implementation of the algorithms, the CPC uses only the time interval where the GSD is zero. When the fundamental frequency is low, it is possible to see that the result of this technique is the closest to the original. In the case of the other two techniques, both have slight variations in the closed phase, because in both cases the glottal source effect is removed from the speech signal in an approximated manner. Otherwise, when the fundamental frequency is high, the AbS approach leads comparatively to the best result. However, it provides neither the right GSD nor the right VTR.

In Figure 8, the relative error in the first formant central frequency and the error in the GSD are represented for the three methods, calculated according to the following expressions:

    \mathrm{Error}_{F_1} = \frac{\big|F_1 - \hat{F}_1\big|}{F_1},
    \qquad
    \mathrm{Error}_{GSD} = \frac{1}{N}\sum_{n=0}^{N-1} \big(g(n) - \hat{g}(n)\big)^2,    (4)

where F_1 represents the first formant central frequency, and g(n) and \hat{g}(n) are the original and estimated GSD, respectively.

Although the simulation model does not take into account source-tract interactions, Figure 8 shows that the inverse filtering results are dependent on the first formant position, being worse as it moves to lower frequencies. Also, it is possible to see that both errors increase as the fundamental frequency increases. Therefore, the main conclusion of this simulation-based study is that the inverse filtering results have a fundamental frequency dependence even when applied to a noninteractive source-filter model.
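Both error measures in (4) are one-liners; a minimal sketch, where the numeric values are made up purely to exercise the formulas:

```python
def error_f1(f1_true, f1_est):
    """Relative first-formant frequency error, first expression of eq. (4)."""
    return abs(f1_true - f1_est) / f1_true

def error_gsd(g, g_est):
    """Mean squared GSD deviation, second expression of eq. (4)."""
    return sum((a - b) ** 2 for a, b in zip(g, g_est)) / len(g)

# Made-up values, only to exercise the formulas:
e1 = error_f1(700.0, 735.0)                         # 35/700 = 0.05
e2 = error_gsd([1.0, 0.0, -1.0], [0.9, 0.1, -1.1])  # three 0.01 terms, mean 0.01
```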

Figure 9: (a) Estimated GSD, F0 = 123 Hz, vowel "a." (b) Estimated VTR, F0 = 123 Hz, vowel "a." (c) Estimated GSD, F0 = 295 Hz, vowel "a." (d) Estimated VTR, F0 = 295 Hz, vowel "a." (GSB, CPC, and AbS.)

2.4.2. Natural singing voice

For this analysis, three male professional singers were recorded: two tenors and one baritone. They were asked to sing notes of different fundamental frequency values, in order to register samples of their whole tessitura. In addition, different vocal tract configurations were considered, and thus this exercise was repeated for the five Spanish vowels “a,” “e,” “i,” “o,” “u.” The singing material was recorded in a professional studio, in such a way that reverberation was reduced as much as possible. Acoustic and electroglottographic signals were synchronously recorded, with a bandwidth of 20 kHz, and stored in .wav format. In order to remove low-frequency ambient noise, the signals were filtered by a high-pass linear-phase FIR filter whose cut-off frequency was set to 75% of the fundamental frequency. This filtering was also applied to the electroglottographic signals, because of the low-frequency artifacts, due to larynx movements, that are typical of this kind of signal.

In Figures 9a to 9c, the results obtained for different fundamental frequencies and vowel “a,” for the same singer, are shown. These results are also representative of the other singers’ recordings and of the different vowels. By comparing Figures 9a and 9c, it is possible to conclude that in the case of a low fundamental frequency, the three algorithms provide very close results. In the case of CPC, the GSD presents less formant ripple in the closed-phase interval. Regarding the VTR, the central frequencies of the formants and the frequency responses are very similar. Nevertheless, in the case of a high fundamental frequency, the resulting GSDs of the three analyses are very different from those of Figure 9a, and also from the waveform model provided by the LF model. Also, the calculated VTR is very different for the three methods. Thus, the conclusions with natural recorded voices are similar to those obtained with synthetic signals.

3. VIBRATO IN SINGING VOICE

3.1. Definition

In Section 2, inverse filtering techniques, successfully employed in speech processing, have been used for singing voice processing. It has been shown that as the fundamental frequency increases, they reach a limit, and thus an alternative technique should be used. As we will show in this section, the introduction of vibrato in singing voice provides more information about what may be happening.

Vibrato in singing voice can be defined as a small quasiperiodic variation of the fundamental frequency of the note. As a result of this variation, all of the harmonics of the voice will also present an amplitude variation, because of the filtering effect of the VTR. Due to these nonstationary characteristics of the signal, singing voice has been modeled by the modified sinusoidal model [25, 26]:

s(t) = Σ_{i=0}^{N−1} a_i(t) cos θ_i(t) + r(t),  (5)

where

θ_i(t) = 2π ∫_{−∞}^{t} f_i(τ) dτ,  (6)

and a_i(t) is the instantaneous amplitude of the partial, f_i(t) the instantaneous frequency of the partial, and r(t) the stochastic residual.

Figure 10: AM-FM representation for the first 20 harmonics. Anechoic tenor recording, F0 = 220 Hz, vowel “a.”

The acoustic signal is composed of a set of components (partials), whose amplitude and frequency change with time, plus a stochastic residual, which is modeled by a time-varying spectral density function. Also in [25, 26], detailed information is given on how these time-varying characteristics can be measured.

Of the two features of a vibrato signal, frequency and amplitude variations, frequency is the most widely studied and characterized. In [32, 33], the instantaneous frequency is characterized and decomposed into three main components which account for three musically meaningful characteristics, respectively. Namely,

f(t) = i(t) + e(t) cos ϕ(t),  (7)

where

ϕ(t) = 2π ∫_{−∞}^{t} r(τ) dτ,  (8)

f(t) being the instantaneous frequency, i(t) the intonation of the note, which corresponds to slow variations of pitch; e(t) represents the extent or amplitude of the pitch variations, and r(t) represents the rate or frequency of the pitch variations. All of them are time-dependent magnitudes and depend on the musical context and on the singer’s talent and training. In the case of intonation, its value depends on the sung note and thus on the context. But extent and rate are mostly singer-dependent features, typical values being 10% of the intonation value and 5 Hz, respectively.

Regarding the amplitude variation of the harmonics during vibrato, a well-established parameterization is not accepted, and probably does not exist, because this variation is different for every harmonic. It is therefore not strange that amplitude variation has been the topic of interest of only a few papers. The first work on this topic is [34], where the perceptual relevance of the instantaneous amplitude for spectral envelope discrimination is proven. In [22], the relevance of this feature is experimentally demonstrated in the case of synthesis of singing voice. Also, its physical cause is tackled, and a representation in terms of the instantaneous amplitude versus the instantaneous frequency of the harmonics is introduced for the first time. This representation is proposed as a means of obtaining local information about the VTR in limited frequency ranges. Something similar is done in [35], where the singing voice is synthesized using this local information about the VTR. We have also contributed in this direction, for instance in [23], where the instantaneous amplitude is decomposed into two parts: the first one represents the sound intensity variation, and the other one represents the amplitude variation determined by the local VTR, in an attempt to separate the contributions of the source and the vocal tract.

Moreover, in [24], different time-frequency processing tools have been used and compared in order to identify the relationship between instantaneous amplitude and instantaneous frequency. In that work, the AM-FM representation is defined as the instantaneous amplitude versus instantaneous frequency representation, with time being an implicit parameter. This representation is compared to the magnitude response of an all-pole filter, which is typically used for VTR modeling. Two main conclusions are derived. The first one is that these two representations can only be compared when anechoic recordings are considered; otherwise, the instantaneous magnitudes will be affected by reverberation. The second one is that, as a frequency-modulated input is considered, and frequency modulation is not a linear operation, the phase of the all-pole system will affect the AM-FM representation, leading to a representation different from the vocal tract magnitude response. However, the relevance of this effect depends on the formant bandwidth and on the vibrato characteristics, the vibrato rate in this case. It was also shown that in natural vibrato the phase effect of the VTR is not noticeable, because the vibrato rate is slow compared to the formant bandwidths.

Figure 10 constitutes a good example of the kind of AM-FM representations we are talking about. In it, each harmonic’s instantaneous amplitude is represented versus its instantaneous frequency. For this case, only two vibrato cycles, where the vocal intensity does not change significantly, have been considered. As the harmonic number increases, the frequency range swept by each harmonic widens. Comparing Figure 10 and Figure 9b, the AM-FM representation of the former is very similar to the VTR of Figure 9b. However, in the case of the AM-FM representation, no source-filter separation has been made, and thus both elements are merged in that representation. The results obtained by other authors [22, 35] are quite similar regarding the instantaneous amplitude versus instantaneous frequency representation; however, in those works no comment is made about the recording conditions.

3.2. Simplified noninteractive source-tract model with vibrato

The main conclusion from the results presented above could be that vibrato might be used in order to extract more information about the glottal source and the VTR in singing voice. Therefore, we will propose here a simplified noninteractive source-filter model with vibrato that will be a signal model of vibrato production and will explain the results provided by sinusoidal modeling. We will first make some basic assumptions regarding what is happening with the GSD and the VTR during vibrato. These assumptions are based on perceptual aspects of vibrato, and on the AM-FM representation for natural singing voice.

(1) The GSD characteristics remain constant during vibrato, and only the fundamental frequency of the voice changes. This assumption is justified by the fact that perceptually there is no phonation change during a single note.
(2) The intensity of the sound is constant, at least during one or two vibrato cycles.
(3) The VTR remains invariant during vibrato. This assumption relies on the fact that vocalization does not change along the note.
(4) The three vibrato characteristics remain constant. This assumption is not strictly true, but their time constants are considerably larger than the signal fundamental period.

Taking into account these four assumptions, the simplified noninteractive source-filter model with vibrato can be represented by the block diagram in Figure 11.

Figure 11: Noninteractive source-filter model with vibrato. The glottal source derivative (LF model), driven by F0(t) with vibrato parameters (intonation, rate, extent), feeds the all-pole VTR H(z) = 1/(1 − Σ_{k=1}^{p} a_k z^{−k}) to produce the singing voice.

Based on this model, we will simulate the production of vibrato. The GSD characteristics are the same as in Section 2.4, and the VTR has been implemented as an all-pole filter whose frequency response represents the Spanish vowel “a.” A frequency variation, typical of vibrato, has been applied to the GSD, with a 120 Hz intonation, an extent of 10% of the intonation value, and a rate of 5.5 Hz. All of them are kept constant in the complete register.

We have applied to the resulting signal both inverse filtering (where the presence or absence of vibrato does not influence the algorithm) and sinusoidal modeling, where the instantaneous amplitude and instantaneous frequency of each harmonic need to be measured. The results obtained for this simulation are shown in Figures 12, 13, 14, and 15. In Figure 12, inverse filtering results are shown for a short-window analysis. When the fundamental frequency is low, GSD and VTR are well separated. In Figures 13a and 13b, sinusoidal modeling results are shown. The frequency variations of the harmonics of the signal are clearly observed and, as a result, so is the amplitude variation. On the other hand, in Figure 14, the AM-FM representation of the partials is shown. Taking into account the AM-FM representation of every partial, and comparing it to the VTR shown in Figure 12b, it is possible to conclude that local information about the VTR is provided by this method. However, as no source-filter decomposition has been performed, each AM-FM representation is shifted in amplitude depending on the GSD spectral features. This effect is a result of keeping the GSD parameters constant during vibrato. Comparing Figures 14 and 15, it can be noticed that if the GSD magnitude spectrum is removed from the AM-FM representation of the harmonics, the resulting AM-FM representation provides only VTR information. The result of this operation is shown in Figure 16.

For this simplified noninteractive source-filter model with vibrato, the instantaneous parameters of sinusoidal modeling provide complementary information about both the GSD and the VTR. When inverse filtering works, the GSD effect can be removed from the AM-FM representation provided by sinusoidal modeling, and only the information about the VTR remains.

3.3. Natural singing voice

The relationship between these two signal models, the noninteractive source-filter model and the sinusoidal model, has been established for a synthetic signal where vibrato has been included under the four assumptions stated at the beginning of the section. Now, the question is whether this relationship holds in natural singing voice too. Therefore, both kinds of signal analysis will now be applied to natural singing voice. In order to get close to the simulation conditions, some precautions have been taken in the recording process.

(1) The musical context has been selected in order to control intensity variations of the sound. Singers were asked to sing a word of three notes, where the first and the last simply provide a musical support and the note in between is a long sustained note. This note is two semitones higher than the two accompanying ones.

Figure 12: Inverse filtering results. GSB inverse filtering algorithm. (a) GSD, original and inverse filtered. (b) VTR, original and inverse filtered.
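The basic all-pole inverse filtering idea behind results like those of Figure 12 can be sketched as follows. This is not the paper's GSB algorithm or a closed-phase variant, only the generic linear-prediction version: the VTR is estimated as an all-pole filter H(z) = 1/(1 − Σ a_k z^{−k}) and the signal is passed through A(z) = 1 − Σ a_k z^{−k} to expose the source. The model order, pole positions, and synthetic test signal below are arbitrary choices for illustration.

```python
import numpy as np

def lpc(x, order):
    """All-pole (linear prediction) coefficients a_1..a_p via the
    autocorrelation method, for H(z) = 1 / (1 - sum_k a_k z^{-k})."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Yule-Walker system: R a = r(1..p), with R Toeplitz in r(0..p-1)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def inverse_filter(x, a):
    """Apply the inverse filter A(z) = 1 - sum_k a_k z^{-k}; the residual
    approximates the source that drove the all-pole resonator."""
    y = x.astype(float).copy()
    for k, ak in enumerate(a, start=1):
        y[k:] -= ak * x[:-k]
    return y

# quick check on a synthetic all-pole signal (one "formant")
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)                # white "source"
x = np.zeros_like(e)
a_true = [1.7194, -0.81]                      # pole pair at radius 0.9, angle 0.3 rad
for n in range(2, len(x)):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + e[n]
a_est = lpc(x, 2)                             # estimated "VTR" coefficients
res = inverse_filter(x, a_est)                # residual should resemble the source
```

In a real closed-phase analysis the coefficients would only be estimated over the closed-glottis interval; here the whole record is used for simplicity.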


Figure 13: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 14: AM-FM representation.
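An AM-FM curve like the one in Figure 14 is traced per harmonic by plotting instantaneous amplitude against instantaneous frequency, with time implicit. The paper measures these magnitudes with sinusoidal modeling [25, 26]; the sketch below uses a simpler stand-in (bandpass isolation of one harmonic plus the analytic signal), with the filter length and bandwidth picked arbitrarily.

```python
import numpy as np
from scipy.signal import hilbert, firwin, filtfilt

def am_fm_track(s, fs, f_center, bw=80.0):
    """Isolate one harmonic with a linear-phase bandpass, then read its
    instantaneous amplitude and frequency off the analytic signal."""
    taps = firwin(1025, [f_center - bw, f_center + bw], pass_zero=False, fs=fs)
    band = filtfilt(taps, [1.0], s)            # zero-phase bandpass
    z = hilbert(band)                          # analytic signal
    amp = np.abs(z)
    # instantaneous frequency from the phase derivative
    freq = np.diff(np.unwrap(np.angle(z))) * fs / (2.0 * np.pi)
    return amp[1:], freq

# demo: one vibrato "harmonic" at 220 Hz, 10% extent, 5.5 Hz rate
fs = 8000
t = np.arange(fs) / fs
f_inst = 220.0 + 22.0 * np.cos(2 * np.pi * 5.5 * t)
phase = 2 * np.pi * np.cumsum(f_inst) / fs     # phase = integral of frequency
s = np.cos(phase)
amp, freq = am_fm_track(s, fs, 220.0)
```

Plotting `20*log10(amp)` versus `freq` over two vibrato cycles would give one branch of a Figure 10 or Figure 14 style representation.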

Figure 15: GSD short term spectrum. Blackman-Harris window.
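The simulation of Section 3.2 can be sketched along the lines of Eqs. (5)-(8): the fundamental follows f(t) = i + e cos(2π r t), each harmonic phase is the integral of its instantaneous frequency, and the harmonic amplitudes are read off a fixed resonance curve standing in for the VTR. The paper uses an LF-model GSD and an all-pole filter for the vowel "a"; the constant source amplitudes and the two-peak curve below are arbitrary stand-ins.

```python
import numpy as np

def synthesize_vibrato(f0=120.0, extent=0.10, rate=5.5, n_harm=20,
                       fs=44100, dur=1.0):
    """Noninteractive source-filter vibrato per Eq. (5):
    s(t) = sum_i a_i(t) cos(i * theta(t)), with constant GSD and VTR."""
    t = np.arange(int(fs * dur)) / fs
    # Eqs. (7)-(8): instantaneous fundamental with constant intonation,
    # extent (10% of intonation), and rate
    f_inst = f0 + extent * f0 * np.cos(2 * np.pi * rate * t)
    # Eq. (6): phase is the integral of the instantaneous frequency
    phase0 = 2 * np.pi * np.cumsum(f_inst) / fs

    def vtr_mag(f):
        # toy "VTR": two resonances as a smooth magnitude curve (placeholder)
        return (1.0 / (1 + ((f - 700.0) / 150.0) ** 2)
                + 0.5 / (1 + ((f - 1200.0) / 200.0) ** 2))

    s = np.zeros_like(t)
    for i in range(1, n_harm + 1):
        # harmonic i rides along the VTR at its instantaneous frequency,
        # which is what produces the FM-induced AM
        a_i = vtr_mag(i * f_inst)
        s += a_i * np.cos(i * phase0)
    return t, f_inst, s

t, f_inst, s = synthesize_vibrato()
```

Feeding `s` to the AM-FM measurement of the previous sketch reproduces, per harmonic, a local section of `vtr_mag`, which is the effect exploited in Figures 14 and 16.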

(2) Recordings have been made in a studio where reverberation is reduced, but not completely eliminated as in an anechoic room. In this situation, the AM-FM representation will present slight variations from the actual VTR, but it is still possible to develop a qualitative study.

In Figures 17, 18, 19, and 20, the results of these analyses are shown for a low-pitched baritone recording, F0 = 128 Hz, vowel “a.” Contrary to Figures 12, 13, 14, and 15, here there is no reference for the original GSD and VTR. Comparing Figures 12b, 13b and 17b, 18b, the instantaneous frequency variation is similar in simulation and natural singing voice.
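The compensation that produces Figure 16, in which the GSD magnitude is removed from each harmonic's AM-FM track, amounts on a dB scale to subtracting the GSD spectrum interpolated at the harmonic's instantaneous frequency. A minimal sketch follows; the linear-in-frequency source roll-off and the flat local VTR value are arbitrary placeholders, not the paper's measured curves.

```python
import numpy as np

def remove_source(inst_freq, inst_amp_db, gsd_freqs, gsd_mag_db):
    """Subtract the GSD magnitude (dB), interpolated at each instantaneous
    frequency, from the instantaneous amplitude (dB). What remains
    approximates a local section of the VTR."""
    return inst_amp_db - np.interp(inst_freq, gsd_freqs, gsd_mag_db)

# toy check: if the observed amplitude is VTR + GSD (in dB), the
# compensation returns the VTR
gsd_freqs = np.linspace(50.0, 2000.0, 40)
gsd_mag_db = -0.01 * gsd_freqs                # placeholder source roll-off
inst_freq = np.linspace(200.0, 240.0, 100)    # one harmonic's FM sweep
vtr_db = 5.0                                  # flat local VTR section
inst_amp_db = vtr_db + (-0.01 * inst_freq)    # what the AM-FM track observes
vtr_est = remove_source(inst_freq, inst_amp_db, gsd_freqs, gsd_mag_db)
```

In practice `gsd_mag_db` would come from the inverse-filtered GSD spectrum, as done for Figures 16 and 21.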

Figure 16: AM-FM representation without source, together with the VTR.

However, the extent of vibrato in this baritone recording is lower than in the synthetic signal. In the case of the instantaneous amplitude, the natural singing voice results are not as regular as the synthetic ones. This is because of reverberation and irregularities of the natural voice. Regarding the intensity of the sound, there are no large variations in instantaneous amplitude, and so, for one or two vibrato cycles, it can be considered constant. In this situation, the AM-FM representation of the harmonics, shown in Figure 19, is very similar to the synthetic signal's AM-FM representation, though the already mentioned irregularities are present. In Figure 20, the GSD spectrum is shown for the signal of Figures 17a and 18a. It is very similar to the synthetic GSD spectrum (both are low-frequency periodic signals), although it has slight variations in its harmonic amplitudes that will be explained later.

Now, the so-obtained GSD spectrum will be used to extract the VTR information from the AM-FM representation. The result of this operation is shown in Figure 21. As in the case of the synthetic signal, the compensated AM-FM representation is very close to the VTR obtained by inverse filtering. However, the matching is not as perfect as for the synthetic signal.

From this two-signal-model comparison, it is possible to conclude that the simplified noninteractive source-filter model with vibrato can explain, in an approximate way, what is happening in singing voice when vibrato is present. Now, it is possible to say that the GSD and the VTR do not have large variations during a few vibrato cycles. In this way, the instantaneous amplitude and frequency obtained by sinusoidal modeling provide more, and complementary, information about the GSD and the VTR during vibrato than known analysis methods.

It is important to note that the AM-FM representation by itself does not provide information about the GSD and the VTR separately; rather, it represents, in the vicinity of each harmonic, a small section of the VTR. In order to know what exactly is happening with the GSD and the VTR during vibrato, precautions have to be taken with the recording conditions. Even in nonoptimal conditions, the AM-FM representation of vibrato provides complementary information to that of inverse filtering methods.

4. DISCUSSION OF RESULTS AND CONCLUSIONS

In Section 2, inverse filtering techniques have been reviewed, and their dependence on the fundamental frequency has been shown. It seems obvious that, regardless of the particular technique, inverse filtering of speech fails as frequency increases. In natural singing voice, where the pitch is inherently high, there are no references that make it possible to be sure whether this is the only cause of this failure. In Section 3, with the aim of answering this question, a novel noninteractive source-filter model has been introduced for singing voice modeling, including vibrato as an additional feature. It has been shown that this model can represent vibrato production in singing voice. In addition, this model has allowed a relationship between sinusoidal modeling and the source-filter model, through what the authors have coined as the AM-FM representation.

In this last section, the AM-FM representation will be used again in singing voice analysis, in order to determine whether there are other effects in singing voice when the fundamental frequency increases. To this end, the same analysis of Section 3 has been applied to the signal database of Section 2, corresponding to the three male singers' recordings. On the one hand, inverse filtering is applied and the GSD and VTR are estimated. On the other hand, sinusoidal modeling is considered and the two instantaneous magnitudes (frequency and amplitude for each harmonic) are measured. Then, the AM-FM representation is obtained for each (frequency-modulated) harmonic, and the GSD is removed from this representation using the GSD obtained by inverse filtering.

In Figure 22, the results obtained for several fundamental frequencies, for the baritone singer, are shown. As in Section 2, these results are representative of the other singers' recordings and other vowels. Regarding the AM-FM representation, it is possible to say, looking at Figure 22, that as the fundamental frequency increases, the frequency range swept by one harmonic becomes wider, because of the relationship between extent and intonation. Also, as the fundamental frequency increases, the AM-FM representations of two consecutive harmonics are more separated, which is a direct consequence of their harmonic relationship. In addition to these obvious effects, there is no other evident consequence of the fundamental frequency increase in this analysis, and thus the simplified noninteractive source-filter model with vibrato can model high-pitched singing voice with vibrato, from the signal point of view.

The main limitation of the plain AM-FM representation is that no source-filter separation is possible unless it is combined with another method, and thus, from it alone, nothing can be said about the exact shape of the GSD and the VTR. However, the main advantage of this representation is that it has no fundamental frequency limit, and so it can be applied to every singing voice sample with vibrato. This conclusion brings along another piece of evidence: the noninteractive source-filter model remains valid in singing voice.

We can summarize the main contributions and conclusions of this work as follows.

Figure 17: Inverse filtering results. GSB inverse filtering algorithm. (a) GSD. (b) VTR.


Figure 18: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 19: AM-FM representation.

Figure 20: GSD short term spectrum. Blackman-Harris window.

Figure 21: AM-FM representation without source, together with the VTR of inverse filtering.

(i) Several representative inverse filtering techniques have been critically compared when applied to speech. It has been shown how all of them fail as frequency increases, as is the case in singing voice.

(ii) A novel noninteractive source-filter model has been proposed for singing voice, which includes vibrato as a possible feature.

(iii) The existence of vibrato and the above-mentioned model have made it possible to relate the source-filter model (i.e., inverse filtering techniques) and the simple sinusoidal model. In other words, although both are signal models for singing voice, the first one is related to voice production and the second one is a general signal model, but thanks to vibrato both can be linked.

(iv) Even though sinusoidal modeling does not allow separate information about the sound source and the VTR to be obtained, the AM-FM representation gives complementary information, particularly in high frequency ranges, where inverse filtering does not work.

Figure 22: AM-FM representation removing the source, and VTR given by inverse filtering. (a) F0 = 110 Hz, vowel “a.” (b) F0 = 156 Hz, vowel “a.” (c) F0 = 227 Hz, vowel “a.”

ACKNOWLEDGMENTS

The Gobierno de Navarra and the Universidad Pública de Navarra are gratefully acknowledged for financial support. The authors would also like to acknowledge the support from Xavier Rodet and Axel Roebel (IRCAM, Paris), the material and medical support from Ana Martínez Arellano, and the collaboration of the student Daniel Erro, who implemented some of the algorithms.

REFERENCES

[1] N. Henrich, Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception, Ph.D. thesis, Paris 6 University, Paris, France, 2001.
[2] B. H. Story, “An overview of the physiology, physics and modeling of the sound source for vowels,” Acoustical Science and Technology, vol. 23, no. 4, pp. 195–206, 2002.
[3] B. Guerin, M. Mrayati, and R. Carre, “A voice source taking account of coupling with the supraglottal cavities,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’76), vol. 1, pp. 47–50, Philadelphia, Pa, USA, April 1976.
[4] T. V. Ananthapadmanabha and G. Fant, “Calculation of the true glottal flow and its components,” Speech Communication, vol. 1, no. 3-4, pp. 167–184, 1982.
[5] M. Berouti, D. G. Childers, and A. Paige, “Glottal area versus glottal volume-velocity,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’77), vol. 2, pp. 33–36, Cambridge, Mass, USA, May 1977.
[6] G. Fant, Acoustic Theory of Speech Production, Mouton, The Hague, The Netherlands, 1960.
[7] D. H. Klatt and L. C. Klatt, “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” Journal of the Acoustical Society of America, vol. 87, no. 2, pp. 820–857, 1990.
[8] G. Fant, J. Liljencrants, and Q. Lin, “A four-parameter model of glottal flow,” Speech Transmission Laboratory-Quarterly Progress and Status Report, vol. 85, no. 2, pp. 1–13, 1985.
[9] H. Fujisaki and M. Ljungqvist, “Proposal and evaluation of models for the glottal source waveform,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’86), vol. 11, pp. 1605–1608, Tokyo, Japan, April 1986.
[10] A. K. Krishnamurthy and D. G. Childers, “Two-channel speech analysis,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 730–743, 1986.
[11] P. Alku and E. Vilkman, “Estimation of the glottal pulseform based on discrete all-pole modeling,” in Proc. 2nd International Conf. on Spoken Language Processing (ICSLP ’94), pp. 1619–1622, Yokohama, Japan, September 1994.
[12] H.-L. Lu and J. O. Smith, “Joint estimation of vocal tract filter and glottal source waveform via convex optimization,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA ’99), pp. 79–92, New Paltz, NY, USA, October 1999.
[13] I. Arroabarren and A. Carlosena, “Glottal spectrum based inverse filtering,” in Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH ’03), Geneva, Switzerland, September 2003.
[14] E. L. Riegelsberger and A. K. Krishnamurthy, “Glottal source estimation: methods of applying the LF-model to inverse filtering,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’93), vol. 2, pp. 542–545, Minneapolis, Minn, USA, April 1993.
[15] B. Doval, C. d’Alessandro, and B. Diard, “Spectral methods for voice source parameters estimation,” in Proc. 5th European Conference on Speech Communication and Technology (EUROSPEECH ’97), vol. 1, pp. 533–536, Rhodes, Greece, September 1997.
[16] I. Arroabarren and A. Carlosena, “Glottal source parameterization: a comparative study,” in Proc. ISCA Tutorial and Research Workshop on Voice Quality: Functions, Analysis and Synthesis, Geneva, Switzerland, August 2003.
[17] H.-L. Lu, Toward a high-quality singing synthesizer with vocal texture control, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 2002.
[18] N. Henrich, B. Doval, and C. d’Alessandro, “Glottal open quotient estimation using linear prediction,” in Proc. International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Firenze, Italy, September 1999.
[19] N. Henrich, B. Doval, C. d’Alessandro, and M. Castellengo, “Open quotient measurements on EGG, speech and singing signals,” in Proc. 4th International Workshop on Advances in Quantitative Laryngoscopy, Voice and Speech Research, Jena, Germany, April 2000.
[20] N. Henrich, C. d’Alessandro, and B. Doval, “Spectral correlates of voice open quotient and glottal flow asymmetry: theory, limits and experimental data,” in Proc. 7th European Conference on Speech Communication and Technology (EUROSPEECH ’01), Aalborg, Denmark, September 2001.
[21] H.-L. Lu and J. O. Smith, “Glottal source modeling for singing voice synthesis,” in Proc. International Computer Music Conference (ICMC ’00), Berlin, Germany, August 2000.
[22] R. Maher and J. Beauchamp, “An investigation of vocal vibrato for synthesis,” Applied Acoustics, vol. 30, no. 2-3, pp. 219–245, 1990.
[23] I. Arroabarren, M. Zivanovic, and A. Carlosena, “Analysis and synthesis of vibrato in lyric singers,” in Proc. 11th European Signal Processing Conference (EUSIPCO ’02), Toulouse, France, September 2002.
[24] I. Arroabarren, M. Zivanovic, X. Rodet, and A. Carlosena, “Instantaneous frequency and amplitude of vibrato in singing voice,” in Proc. IEEE 28th Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’03), Hong Kong, China, April 2003.
[25] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis based on a sinusoidal representation,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 744–754, 1986.
[26] X. Serra, “Musical sound modeling with sinusoids plus noise,” in Musical Signal Processing, C. Roads, S. Pope, A. Picialli, and G. De Poli, Eds., Swets & Zeitlinger, Lisse, The Netherlands, May 1997.
[27] C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection from the speech signal,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 258–265, 1994.
[28] J. Makhoul, “Linear prediction: a tutorial review,” Proceedings of the IEEE, vol. 63, no. 4, pp. 561–580, 1975.
[29] A. El-Jaroudi and J. Makhoul, “Discrete all-pole modeling,” IEEE Trans. Signal Processing, vol. 39, no. 2, pp. 411–423, 1991.
[30] B. Doval and C. d’Alessandro, “Spectral correlates of glottal waveform models: an analytic study,” in Proc. IEEE 22nd Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’97), pp. 1295–1298, Munich, Germany, April 1997.
[31] D. Y. Wong, J. D. Markel, and A. H. Gray, “Least squares glottal inverse filtering from the acoustic speech waveform,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 4, pp. 350–355, 1979.
[32] E. Prame, “Vibrato extent and intonation in professional western lyric singing,” Journal of the Acoustical Society of America, vol. 102, no. 1, pp. 616–621, 1997.
[33] I. Arroabarren, M. Zivanovic, J. Bretos, A. Ezcurra, and A. Carlosena, “Measurement of vibrato in lyric singers,” IEEE Trans. Instrumentation and Measurement, vol. 51, no. 4, pp. 660–665, 2002.
[34] S. McAdams and X. Rodet, “The role of FM-induced AM in dynamic spectral profile analysis,” in Basic Issues in Hearing, H. Duifhuis, J. Horst, and H. Wit, Eds., pp. 359–369, Academic Press, London, UK, 1988.
[35] M. Mellody and G. H. Wakefield, “Signal analysis of the singing voice: low-order representations of singer identity,” in Proc. International Computer Music Conference (ICMC ’00), Berlin, Germany, August 2000.

Ixone Arroabarren was born in Arizkun, Navarra, Spain, on December 11, 1975. She received her Eng. degree in telecommunications in 1999 from the Public University of Navarra, Pamplona, Spain, where she is currently pursuing her Ph.D. degree in the area of signal processing techniques as they apply to musical signals. She has collaborated in industrial projects for the vending machine industry.

Alfonso Carlosena was born in Navarra, Spain, in 1962. He received his M.S. degree with honors and his Ph.D. in physics in 1985 and 1989, respectively, both from the University of Zaragoza, Spain. From 1986 to 1992 he was an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of Zaragoza, Spain. Since October 1992, he has been an Associate Professor with the Public University of Navarra, where he has also served as Head of the Technology Transfer Office. In March 2000, he was promoted to Full Professor at the same university. He has also been a Visiting Scholar at the Swiss Federal Institute of Technology, Zurich, and at New Mexico State University, Las Cruces. His current research interests are in the areas of analog circuits and signal processing, digital signal processing, and instrumentation, where he has published over sixty papers in international journals and a similar number of conference presentations. He is currently leading several industrial projects for local firms.

EURASIP Journal on Applied Signal Processing 2004:7, 1021–1035
© 2004 Hindawi Publishing Corporation

A Hybrid Resynthesis Model for Hammer-String Interaction of Piano Tones

Julien Bensa
Laboratoire de Mécanique et d’Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 13402 Marseille Cedex 20, France
Email: [email protected]

Kristoffer Jensen
Datalogisk Institut, Københavns Universitet, Universitetsparken 1, 2100 København, Denmark
Email: [email protected]

Richard Kronland-Martinet
Laboratoire de Mécanique et d’Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 13402 Marseille Cedex 20, France
Email: [email protected]

Received 7 July 2003; Revised 9 December 2003

This paper presents a source/resonator model of hammer-string interaction that produces realistic piano sound. The source is generated using a subtractive signal model. Digital waveguides are used to simulate the propagation of waves in the resonator. This hybrid model allows resynthesis of the vibration measured on an experimental setup. In particular, the nonlinear behavior of the hammer-string interaction is taken into account in the source model and is well reproduced. The behavior of the model parameters (the resonant part and the excitation part) is studied with respect to the velocities and the notes played. This model exhibits physically and perceptually related parameters, allowing easy control of the sound produced. This research is an essential step in the design of a complete piano model. Keywords and phrases: piano, hammer-string interaction, source-resonator model, analysis/synthesis.

1. INTRODUCTION

This paper is a contribution to the design of a complete piano-synthesis model. (Sound examples obtained using the method described in this paper can be found at www.lma.cnrs-mrs.fr/∼kronland/JASP/sounds.html.) It is the result of several attempts [1, 2], eventually leading to a stable and robust methodology. We address here the modeling for synthesis of a key aspect of piano tones: the hammer-string interaction. This model will ultimately need to be linked to a soundboard model to accurately simulate piano sounds.

The design of a synthesis model is strongly linked to the specificity of the sounds to be produced and to the expected use of the model. This work was done in the framework of the analysis-synthesis of musical sounds; we seek both to reconstruct a given piano sound and to use the synthesis model in a musical context. The perfect reconstruction of given sounds is a strong constraint: the synthesis model must be designed so that the parameters can be extracted from the analysis of natural sounds. In addition, the playing of the synthesis model requires a good relationship between the physics of the instrument, the synthesis parameters, and the generated sounds. This relationship is crucial to having a good interaction between the “digital instrument” and the player, and it constitutes one of the most important aspects our piano model has to deal with.

Music based on the so-called “sound objects,” like electro-acoustic music or “musique concrète,” relies on synthesis models allowing subtle and natural transformations of the sounds. The notion of natural transformation of sounds consists here in transforming them so that they correspond to a physical modification of the instrument. As a consequence, such sound transformations call for the model to include physical descriptions of the instrument. Nevertheless, the physics of musical instruments is sometimes too complicated to be exhaustively taken into account, or not modeled well enough to lead to satisfactory sounds. This is the case of the piano, for which hundreds of mechanical components are connected [3], and for which the hammer-string interaction still poses physical modeling problems.

To take into account the necessary simplifications made in the physical description of the piano sounds, we have used hybrid models that are obtained by combining physical and signal synthesis models [4, 5]. The physical model simulates the physical behavior of the instrument, whereas the signal model seeks to recreate the perceptual effect produced by the instrument. The hybrid model provides a perceptually plausible resynthesis of a sound as well as intimate manipulations in a physically and perceptually relevant way. Here, we have used a physical model to simulate the linear string vibration, and a physically informed signal model to simulate the nonlinear interaction between the string and the hammer.

Figure 1: Hybrid model of piano sound synthesis. A control signal drives the excitation (nonlinear signal model), which feeds the resonator (physical model) to produce the sound.

An important problem linked to hybrid models is the coupling of the physical and the signal models. To use a source-resonator model, the source and the resonator must be uncoupled. Yet, this is not the case for the piano, since the hammer interacts with the strings during 2 to 5 milliseconds [6, 7].

Smith proposed efficient resonators [11] by using the so-called digital waveguide. This approach simulates the physics of the propagating waves in the string. Moreover, the waveguide parameters are naturally correlated to the physical parameters, making for easy control. Borin and Bank [12, 13] used this approach to design a synthesis model of piano tones based on physical considerations, by coupling digital waveguides and a “force generator” simulating the hammer impact. The commuted synthesis concept [14, 15, 16] uses the linearity of the digital waveguide to commute and combine elements. Then, for the piano, a hybrid model was proposed, combining digital waveguide, a phenomenologi-

the hammer-string interaction still poses physical modeling problems.

To take into account the necessary simplifications made in the physical description of the piano sounds, we have used hybrid models, obtained by combining physical and signal synthesis models [4, 5]. The physical model simulates the physical behavior of the instrument, whereas the signal model seeks to recreate the perceptual effect produced by the instrument. The hybrid model provides a perceptually plausible resynthesis of a sound as well as intimate manipulations in a physically and perceptually relevant way. Here, we have used a physical model to simulate the linear string vibration, and a physically informed signal model to simulate the nonlinear interaction between the string and the hammer.

An important problem linked to hybrid models is the coupling of the physical and the signal models. To use a source-resonator model, the source and the resonator must be uncoupled. Yet this is not the case for the piano, since the hammer interacts with the strings for 2 to 5 milliseconds [6, 7]. A significant part of the piano sound characteristics is due to this interaction. Even though this observation is true from a physical point of view, this short interaction period is not in itself of great importance from a perceptual point of view. The attack consists of two parts due to two vibrating ways [8]: one percussive, a result of the impact of the key on the frame, and another that starts when the hammer strikes the strings. Schaeffer [9] showed that cutting the first milliseconds of a piano sound (for a bass note, for which the impact of the key on the frame is less perceptible) does not alter the perception of the sound. We have informally carried out such an experiment by listening to various piano sounds cleared of their attack. We found that, from a perceptual point of view, when the noise due to the impact of the key on the frame is not too great (compared to the vibrating energy provided by the string), the hammer-string interaction is not audible in itself. Nevertheless, this interaction undoubtedly plays an important role as an initial condition for the string motion. This is a substantial point justifying the dissociation of the string model and the source model in the design of our synthesis model. Thus, the resulting model consists in what is commonly called a "source-resonator" system (as illustrated in Figure 1). Note that the model still makes sense for high-frequency notes, for which the impact noise is of importance. Actually, the hammer-string interaction only lasts a couple of milliseconds, while the impact sound is an additional sound, which can be simulated using predesigned samples. Since waves are still running in the resonator after the release of the key, repeated keystrokes are naturally taken into account by the model.

Laroche and Meillier [10] used such a source-resonator technique for the synthesis of piano sound. They showed that realistic piano tones can be produced using IIR filters to model the resonator and common excitation signals for several notes. Their simple resonator model, however, yielded excitation signals too long (from 4 to 5 seconds) to accurately reproduce the piano sound. Moreover, that model took into account neither the coupling between strings nor the dependence of the excitation on the velocity and octave variations.

Figure 1: Hybrid model of piano sound synthesis (control → source (nonlinear signal model) → excitation → resonator (physical model) → sound).

Smith proposed efficient resonators [11] by using the so-called digital waveguide. This approach simulates the physics of the propagating waves in the string. Moreover, the waveguide parameters are naturally correlated to the physical parameters, making for easy control. Borin and Bank [12, 13] used this approach to design a synthesis model of piano tones based on physical considerations, by coupling digital waveguides and a "force generator" simulating the hammer impact. The commuted synthesis concept [14, 15, 16] uses the linearity of the digital waveguide to commute and combine elements. Then, for the piano, a hybrid model was proposed, combining a digital waveguide, a phenomenological hammer model, and a time-varying filtering that simulates the soundboard behavior. Our model is an extension of these previous works, to which we added a strong constraint of resynthesis capability. Here, the resonator was modeled using a physically related model, the digital waveguide; and the source, destined to generate the initial condition for the string motion, was modeled using a signal-based nonlinear model.

The advantages of such a hybrid model are numerous:

(i) it is simple enough so that the parameters can be accurately estimated from the analysis of real sounds,
(ii) it takes into account the most relevant physical characteristics of the piano strings (including coupling between strings), and it permits the playing to be controlled (via the velocity of the hammer),
(iii) it simulates the perceptual effect due to the nonlinear behavior of the hammer-string interaction, and it allows sound transformations with both physical and perceptual approaches.

Even though the model we propose is not computationally costly, we address here its design and its calibration rather than its real-time implementation. Hence, the calculus and reasoning are done in the frequency domain. The time-domain implementation should give rise to a companion article.

2. THE RESONATOR MODEL

Several physical models of transverse wave propagation on a struck string have been published in the literature [17, 18, 19, 20]. The string is generally modeled using a one-dimensional wave equation. The specific features of the piano string that are important in wave propagation (dispersion due to the stiffness of the string, and frequency-dependent losses) are further incorporated through several perturbation terms. To account for the hammer-string interaction, this equation is then coupled to a nonlinear force term, leading to a system of equations for which an analytical solution cannot be exhibited. Since the string vibration is transmitted to the radiating soundboard only at the bridge level, it is not useful to numerically calculate the entire spatial motion of the string. The digital waveguide technique [11] provides an efficient way of simulating the vibration at the bridge level of the string, when struck at a given location by the hammer. Moreover, the parameters of such a model can be estimated from the analysis of real sounds [21].

Hybrid Resynthesis of Piano Tones

Figure 2: Elementary digital waveguide (named G), with input E(ω), delay line D(ω), loop filter F(ω), and output S(ω).
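The loop structure of Figure 2 can be sketched in the time domain with a few lines of code. This is a minimal illustration, with the loop filter F collapsed to a single constant gain g (an assumption made here for brevity; in the paper F is frequency dependent and also carries the dispersion):

```python
def waveguide_loop(excitation, g, delay, n_samples):
    """Elementary waveguide loop: the output is fed back after a
    `delay`-sample round trip and scaled by g (a one-tap stand-in
    for the loop filter F)."""
    out = [0.0] * n_samples
    for n in range(n_samples):
        fed_back = g * out[n - delay] if n >= delay else 0.0
        e = excitation[n] if n < len(excitation) else 0.0
        out[n] = e + fed_back
    return out

# A unit impulse as input E produces echoes decaying by g per round trip.
y = waveguide_loop([1.0], g=0.9, delay=50, n_samples=500)
```

With |g| < 1 the loop is stable and the echoes decay geometrically, which is the discrete counterpart of the exponentially damped partials discussed in Section 2.1.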

2.1. The physics of vibrating strings

We present here the main features of the physical modeling of piano strings. Consider the propagation of transverse waves in a stiff damped string governed by the motion equation [21]

\[
\frac{\partial^2 y}{\partial t^2} - c^2 \frac{\partial^2 y}{\partial x^2} + \kappa^2 \frac{\partial^4 y}{\partial x^4} + 2 b_1 \frac{\partial y}{\partial t} - 2 b_2 \frac{\partial^3 y}{\partial x^2 \partial t} = P(x, t), \tag{1}
\]

where y is the transverse displacement, c the wave speed, κ the stiffness coefficient, and b₁ and b₂ the loss parameters. Frequency-dependent loss is introduced via the mixed time-space derivative term (see [21, 22] for more details). We apply fixed boundary conditions

\[
y \big|_{x=0} = y \big|_{x=L} = \frac{\partial^2 y}{\partial x^2} \bigg|_{x=0} = \frac{\partial^2 y}{\partial x^2} \bigg|_{x=L} = 0, \tag{2}
\]

where L is the length of the string. After the hammer-string contact, the force P is equal to zero and this system can be solved. An analytical solution can be expressed as a sum of exponentially damped sinusoids:

\[
y(x, t) = \sum_{n=1}^{\infty} a_n(x)\, e^{-\alpha_n t}\, e^{i \omega_n t}, \tag{3}
\]

where aₙ is the amplitude, αₙ the damping coefficient, and ωₙ the frequency of the nth partial. Due to the stiffness, the waves are dispersed and the partial frequencies, which are not perfectly harmonic, are given by [23]

\[
\omega_n = 2 \pi n \omega_0 \sqrt{1 + B n^2}, \tag{4}
\]

where ω₀ is the fundamental radial frequency of the string without stiffness, and B is the inharmonicity coefficient [23]. The losses are frequency dependent and expressed by [21]

\[
\alpha_n = -b_1 - b_2 \frac{\pi^2}{2 B L^2} \left( -1 + \sqrt{1 + 4 B \left( \frac{\omega_n}{\omega_0} \right)^2} \right). \tag{5}
\]

The spectral content of the piano sound, and of most musical instruments, is modified with respect to the dynamics. For the piano, this nonlinear behavior consists of an increase of the brightness of the sound, and it is linked mainly to the hammer-string contact (the nonlinear generation of longitudinal waves also participates in the increase of brightness; we do not take this phenomenon into account since we are interested only in transverse waves). The stiffness of the hammer felt increases with the impact velocity. In the next paragraph, we show how the waveguide model parameters are related to the amplitudes, damping coefficients, and frequencies of each partial.

2.2. Digital waveguide modeling

2.2.1. The single string case: elementary digital waveguide

To model wave propagation in a piano string, we use a digital waveguide model [11]. In the single string case, the elementary digital waveguide model (named G) we used consists of a single loop system (Figure 2) including

(i) a delay line (a pure delay filter named D) simulating the time the waves take to travel back and forth in the medium,
(ii) a filter (named F) taking into account the dissipation and dispersion phenomena, together with the boundary conditions. The modulus of F is then related to the damping of the partials, and its phase to the inharmonicity of the string,
(iii) an input E corresponding to the frequency-dependent energy transferred to the string by the hammer,
(iv) an output S representing the vibrating signal measured at an extremity of the string (at the bridge level).

The output of the digital waveguide driven by a delta function can be expanded as a sum of exponentially damped sinusoids. The output thus coincides with the solution of the motion equation of transverse waves in a stiff damped string for a source term given by a delta function force. As shown in [21, 24], the modulus and phase of F are related to the damping and the frequencies of the partials by the expressions

\[
\left| F\left(\omega_n\right) \right| = e^{\alpha_n D}, \qquad \arg F\left(\omega_n\right) = \omega_n D - 2 n \pi, \tag{6}
\]

with ωₙ and αₙ given by (4) and (5). After some calculations (see [21]), we obtain the expressions of the modulus and the phase of the loop filter in terms of the physical parameters:

\[
\left| F(\omega) \right| \simeq \exp \left( -D \left( b_1 + \frac{b_2 \pi^2 \xi}{2 B L^2} \right) \right), \tag{7}
\]

\[
\arg F(\omega) \simeq D \omega - \frac{\xi}{2 B} D \omega_0, \tag{8}
\]

with

\[
\xi = -1 + \sqrt{1 + 4 B \frac{\omega^2}{\omega_0^2}} \tag{9}
\]

in terms of the inharmonicity coefficient B [23].
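Relations (4), (5), and (6) map the physical string parameters to samples of the loop filter at the partial frequencies. The following sketch evaluates them numerically; all parameter values are illustrative placeholders, not measured ones:

```python
import math

def partial_params(n, omega0, B, b1, b2, L):
    """Partial frequency (4) and damping coefficient (5) of a stiff damped string."""
    omega_n = 2 * math.pi * n * omega0 * math.sqrt(1 + B * n ** 2)
    xi = -1 + math.sqrt(1 + 4 * B * (omega_n / omega0) ** 2)       # relation (9)
    alpha_n = -b1 - b2 * math.pi ** 2 / (2 * B * L ** 2) * xi
    return omega_n, alpha_n

def loop_filter_at(n, omega_n, alpha_n, D):
    """Modulus and phase of the loop filter F at the nth partial, from (6)."""
    return math.exp(alpha_n * D), omega_n * D - 2 * math.pi * n

# Illustrative values: fundamental omega0, inharmonicity B, losses b1, b2,
# string length L, and loop delay D (assumed, for demonstration only).
omega0, B, b1, b2, L, D = 65.4, 1e-4, 0.5, 1e-6, 1.0, 0.01
w1, a1 = partial_params(1, omega0, B, b1, b2, L)
w2, a2 = partial_params(2, omega0, B, b1, b2, L)
mod1, _ = loop_filter_at(1, w1, a1, D)
```

Two properties of the model are easy to read off: the stiffness term stretches the partials (ω₂ > 2ω₁), and the negative αₙ keeps the loop-filter modulus below one, as the stability discussion below requires.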

2.2.2. The multiple strings case: coupled digital waveguides

In the middle and the treble range of the piano, there are two or three strings for each note, in order to increase the efficiency of the energy transmission towards the bridge. The vibration produced by this coupled system is not the superposition of the vibrations produced by each string. It is the result of a complex coupling between the modes of vibration of these strings [25]. This coupling leads to phenomena like beats and double decays in the amplitudes of the partials, which constitute one of the most important features of the piano sound. Beats are used by professionals to precisely tune the doublets or triplets of strings. To resynthesize the vibration of several strings at the bridge level, we use coupled digital waveguides. Smith [14] proposed a coupling model with two elementary waveguides. He assumed that the two strings were coupled to the same termination, and that the losses were lumped into the bridge impedance. This technique leads to a simple model necessitating only one loss filter, but the decay times and the coupling of the modes are not independent. Välimäki et al. [26] proposed another approach that couples two digital waveguides through real gain amplifiers. In that case, the coupling is the same for each partial, and the time behavior of the partials is similar. For synthesis purposes, Bank [27] showed that perceptually plausible beating sounds can be obtained by adding only a few resonators in parallel.

We have designed two models, with two and three coupled digital waveguides, which are an extension of Välimäki et al.'s approach. They consist in separating the time behavior of the components by using complex-valued and frequency-dependent linear filters to couple the waveguides. The three-coupled digital waveguide is shown in Figure 3. The two models accurately simulate the energy transfer between the strings (see Section 2.4.3). A related method [28] (with an example of piano coupling) has recently become available in the context of digital waveguide networks.

Each string is modeled using an elementary digital waveguide (named G1, G2, G3; the loop filters and delays are named F1, F2, F3 and D1, D2, D3, respectively). The coupled model is then obtained by connecting the output of each elementary waveguide to the input of the others through coupling filters. The coupling filters simulate the wave propagation along the bridge and are thus correlated to the distance between the strings. In the case of a doublet of strings, the two coupling filters (named C) are identical. In the case of a triplet of strings, the coupling filters of adjacent strings (named Ca) are equal but differ from the coupling filters of the extreme strings (named Ce). The excitation signal is assumed to be the same for each elementary waveguide, since we suppose the hammer strikes the strings in a similar way.

Figure 3: The three-coupled digital waveguide (bottom) and the corresponding physical system at the bridge level (top).

To ensure the stability of the different models, one has to respect specific relations. First, the modulus of the loop filters must be inferior to 1. Second, for coupled digital waveguides, the following relations must be verified:

\[
\left| C^2 G_1 G_2 \right| < 1 \tag{10}
\]

in the case of two-coupled waveguides, and

\[
\left| G_1 G_2 C_a^2 + G_1 G_3 C_e^2 + G_2 G_3 C_a^2 + 2 G_1 G_2 G_3 C_a^2 C_e \right| < 1 \tag{11}
\]

in the case of three-coupled waveguides. Assuming that these relations are verified, the models are stable.

This work takes place in the general analysis-synthesis framework, meaning that the objective is not only to simulate sounds, but also to reconstruct a given sound. The model must therefore be calibrated carefully. In the next section, we present the inverse problem allowing the waveguide parameters to be calculated from experimental data. We then describe the experiment and the measurements for one, two, and three coupled strings, and show the validity and the accuracy of the analysis-synthesis process by comparing synthetic and original signals. Finally, the behavior of the model on signals of the real piano is verified.
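The three-string stability condition (11) can be checked numerically on a frequency grid. A small sketch follows, with identical lossy loop responses and frequency-flat coupling filters; all the response values are assumed for illustration:

```python
import cmath

def three_string_gain(G1, G2, G3, Ca, Ce):
    """Left-hand side of stability condition (11) at one frequency."""
    return abs(G1 * G2 * Ca ** 2 + G1 * G3 * Ce ** 2
               + G2 * G3 * Ca ** 2 + 2 * G1 * G2 * G3 * Ca ** 2 * Ce)

def is_stable(G, Ca, Ce, omegas):
    """Check (11) over a grid; G, Ca, Ce map a frequency to a complex response."""
    return all(three_string_gain(G(w), G(w), G(w), Ca(w), Ce(w)) < 1.0
               for w in omegas)

G = lambda w: 0.98 * cmath.exp(-1j * w)    # identical lossy loop responses (assumed)
Ca = lambda w: 0.05                        # weak adjacent-string coupling (assumed)
Ce = lambda w: 0.02                        # weaker extreme-string coupling (assumed)
grid = [k * 0.01 for k in range(629)]      # omega roughly spanning [0, 2*pi)
stable = is_stable(G, Ca, Ce, grid)
```

With weak coupling the condition holds comfortably; pushing the coupling filters towards unity gain violates it, which is the situation the compensation remark in Section 2.4.2 alludes to.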

2.3. The inverse problem

We address here the estimation of the parameters of each elementary waveguide, as well as the coupling filters, from the analysis of a single signal (measured at the bridge level). For this, we assume that in the case of three coupled strings the signal is composed of a sum of three exponentially decaying sinusoids for each partial (and respectively one and two exponentially decaying sinusoids in the case of one and two strings). The estimation method is a generalization of the one described in [29] for one and two strings. It can be summarized as follows: start by isolating each triplet of the measured signal through bandpass filtering (a truncated Gaussian window); then use the Hilbert transform to get the corresponding analytic signal, and obtain the average frequency of the component by differentiating the phase of this analytic signal; finally, extract from each triplet the three amplitudes, damping coefficients, and frequencies of each partial by a parametric method (the Steiglitz-McBride method [30]).

The second part of the process is described in detail in the appendix. In brief, we identify the Fourier transform of the sum of the three exponentially damped sinusoids (the measured signal) with the transfer function of the digital waveguide (the model output). This identification leads to a linear system that admits an analytical solution in the case of one or two strings. In the case of three coupled strings, the solution can be found only numerically. The process gives an estimation of the modulus and of the phase of each filter near the resonance peaks, as a function of the amplitudes, damping coefficients, and frequencies. Once the resonator model is known, we extract the excitation signal by a deconvolution process with respect to the waveguide transfer function. Since the transfer function has been identified near the resonance peaks, the excitation is also estimated at discrete frequency values corresponding to the partial frequencies. This excitation corresponds to the signal that has to be injected into the resonator to resynthesize the actual sound.

2.4. Analysis of experimental data and validation of the resonator model

We describe here first an experimental setup allowing the measurement of the vibration of one, two, or three strings struck by a hammer at different velocities. Then we show how to estimate the resonator parameters from those measurements, and finally, we compare original and synthesized signals. This experimental setup is an essential step that validates the estimation method. Actually, estimating the parameters of one-, two-, or three-coupled digital waveguides from only one signal is not a trivial process. Moreover, in a real piano, many physical phenomena are not taken into account in the model presented in the previous section. It is thus necessary to verify the validity of the model on a laboratory experiment before applying the method to the piano case.

2.4.1. Experimental setup

On the top of a massive concrete support, we have attached a piece of a bridge taken from a real piano. On the other extremity of the structure, we have attached an agraffe on a hardwood support. The strings are tightened between the bridge and the agraffe and tuned manually. It is clear that the strings are not totally uncoupled from their support. Nevertheless, this experiment has been used to record signals of struck strings in order to validate the synthesis models, and it was entirely satisfactory for this purpose. One, two, or three strings are struck with a hammer linked to an electronically piloted key. By imposing different voltages on the system, one can control the hammer velocity in a reproducible way. The precise velocity is measured immediately after escapement by using an optic sensor (MTI 2000, probe module 2125H) pointing at the side of the head of the hammer. The vibration at the bridge level is measured by an accelerometer (B&K 4374). The signals are directly recorded on digital audio tape. Acceleration signals correspond to hammer velocities between 0.8 m·s⁻¹ and 5.7 m·s⁻¹.

2.4.2. Filter estimation

From the signals collected on the experimental setup, a set of data was extracted. For each hammer velocity, the waveguide filters and the corresponding excitation signals were estimated using the techniques described above. The filters were studied in the frequency domain; it is not the purpose of this paper to describe the time-domain method and the fitting of the transfer function using IIR or FIR filters.

Figure 4: Amplitude of filter F as a function of the frequency and of hammer velocity.

Figure 4 shows the modulus of the filter response F for the first twenty-five partials in the case of tones produced by a single string. Here the hammer velocity varies from 0.7 m·s⁻¹ to 4 m·s⁻¹. One notices that the modulus of the waveguide filter is similar for all hammer velocities. The resonator represents the strings, which do not change during the experiment. If the estimated resonator remains the same for different hammer velocities, all the nonlinear behavior due to the dynamics has been taken into account in the excitation part; the resonator and the source are then well separated. This result validates our approach based on a source-resonator separation. For high-frequency partials, however, the filter modulus decreases slightly as a function of the hammer velocity. This nonlinear behavior is not directly linked to the hammer-string contact.
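The core of the per-partial analysis of Section 2.3, frequency from the derivative of the phase of the analytic signal and damping from its envelope, can be sketched as follows. This is a simplified stand-in that assumes one partial has already been isolated by bandpass filtering; the paper uses the Steiglitz-McBride method for the full doublet and triplet cases:

```python
import cmath
import math

def estimate_partial(z, dt):
    """Estimate (omega, alpha) of one partial from its analytic signal
    z[n] ~ a * exp((-alpha + 1j*omega) * n * dt): the mean log-ratio of
    successive samples carries the damping in its real part and the
    frequency in its imaginary part."""
    ratios = [cmath.log(z[n + 1] / z[n]) for n in range(len(z) - 1)]
    mean = sum(ratios) / len(ratios)
    return mean.imag / dt, -mean.real / dt

# Synthetic analytic signal for a 440 Hz partial decaying at alpha = 3 s^-1
# (illustrative values, not measured data).
dt = 1.0 / 44100.0
omega_true, alpha_true = 2 * math.pi * 440.0, 3.0
z = [0.7 * cmath.exp((-alpha_true + 1j * omega_true) * n * dt) for n in range(2000)]
omega_est, alpha_est = estimate_partial(z, dt)
```

On noiseless data the estimate is exact up to rounding; on measured signals, the averaging over many sample pairs is what tames the noise.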

It is mainly due to nonlinear phenomena involved in the wave propagation: at large amplitude motion, the tension modulation introduces greater internal losses (this effect is even more pronounced in plucked strings than in struck strings).

The filter modulus slowly decreases (as a function of frequency) from a value close to 1. Since the higher partials are more damped than the lower ones, the amplitude of the filter decreases as the frequency increases. The value of the filter modulus (close to 1) suggests that the losses are weak. This is true for the piano string and is even more obvious on this experimental setup, since the lack of a soundboard limits the acoustic field radiation. More losses are expected in the real piano.

We now consider the multiple strings case. From a physical point of view, the behavior of the filters F1, F2, and F3 (which characterize the intrinsic losses) of the coupled digital waveguides should be similar to the behavior of the filter F for a single string, since the strings are supposed identical. This is verified except for high-frequency partials. This behavior is shown in Figure 5 for filter F2 of the three-coupled waveguide model. Some artifacts pollute the drawing at high frequencies. The poor signal/noise ratio at high frequency (above 2000 Hz) and low velocity introduces error terms in the analysis process, leading to mistakes in the amplitudes of the loop filters (for instance, a very small value of the modulus of one loop filter may be compensated by a value greater than one for another loop filter; the stability of the coupled waveguide is then preserved). Nevertheless, this does not alter the synthetic sound, since the corresponding high-frequency partials are weak and of short duration.

Figure 5: Amplitude of filter F2 (three-coupled waveguide model) as a function of the frequency and of hammer velocity.

Figure 6: Phase of filter F as a function of the frequency and hammer velocity.

The phase is also of great importance, since it is related to the group delay of the signal and consequently directly linked to the frequencies of the partials. The phase is a nonlinear function of the frequency (see (8)). It is constant with the hammer velocity (see Figure 6), since the frequencies of the partials are always the same (linearity of the wave propagation).

The coupling filters simulate the energy transfer between the strings and are frequency dependent. Figure 7 represents one of these coupling filters for different values of the hammer velocity. The amplitude is constant with respect to the hammer velocity (up to the signal/noise ratio at high frequency and low velocity), showing that the coupling is independent of the amplitude of the vibration. The coupling rises with the frequency; the peaks at frequencies 700 Hz and 1300 Hz correspond to a maximum.

Figure 7: Modulus of filter Ca as a function of the frequency and of hammer velocity.

2.4.3. Accuracy of the resynthesis

At this point, one can resynthesize a given sound by using a single or multicoupled digital waveguide and the parameters extracted from the analysis. For the synthetic sounds to be identical to the original requires describing the filters precisely. The model was implemented in the frequency domain, as described in Section 2, thus taking into account the exact amplitude and phase of the filters (for instance, for a three-coupled digital waveguide, we have to implement three

delays and five complex filters, moduli, and phases). Nevertheless, for real-time synthesis purposes, the filters can be approximated by low-order IIR filters (see, e.g., [26]). This aspect will be developed in future reports. By injecting the excitation signal obtained by deconvolution into the waveguide model, the signal measured on the experimental setup is reproduced. Figures 8, 9, and 10 show the amplitude modulation laws (velocity of the bridge) of the first six partials of the original and the resynthesized sounds. The variations of the temporal envelope are generally well retained, and for the coupled systems (Figures 9 and 10), the beat phenomena are well reproduced. The slight differences, not audible, are due to fine physical phenomena (coupling between the horizontal and the vertical modes of the string) that are not taken into account in our model.

Figure 8: Amplitude modulation laws (velocity of the bridge) for the first six partials, one string, of the (a) original and (b) resynthesized sound.

Figure 9: Amplitude modulation laws (velocity of the bridge) for the first six partials, two strings, of the (a) original and (b) resynthesized sound.

Figure 10: Amplitude modulation laws (velocity of the bridge) for the first six partials, three strings, of the (a) original and (b) resynthesized sound.

In the one-string case, we now consider the second and sixth partials of the original sound in Figure 8. We can see beats (periodic amplitude modulations) that show coupling phenomena on only one string. Indeed, the horizontal and vertical modes of vibration of the string are coupled through the bridge. This coupling was not taken into account in this study, since the phenomenon is of less importance than the coupling between two different strings. Nevertheless, we have shown in [29] that coupling between two modes of vibration can also be simulated using a two-coupled digital waveguide model. The accuracy of the resynthesis validates a posteriori our model and the source-resonator approach.

2.5. Behavior and control of the resonator through measurements on a real piano

To take into account the note dependence of the resonator, we made a set of measurements on a real piano, a Yamaha Disklavier C6 grand piano equipped with sensors.

The vibrations of the strings were measured at the bridge by an accelerometer, and the hammer velocities were measured by a photonic sensor. Data were collected for several velocities and several notes. We used the estimation process described in Section 2.3 for the previous experimental setup, and extracted for each note and each velocity the corresponding resonator and source parameters.

As expected, the behavior of the resonator as a function of the hammer velocity and for a given note is similar to the one described in Section 2.4.2 for the signals measured on the experimental setup. The filters are similar with respect to the hammer velocity. Their modulus is close to one, but slightly weaker than previously, since it now takes into account the losses due to the acoustic field radiated by the soundboard. The resynthesis of the piano measurements through the resonator model and the excitation obtained by deconvolution is perceptively satisfactory, since the sound is almost indistinguishable from the original one.

On the contrary, the shape of the filters is modified as a function of the note. Figure 11 shows the modulus of the waveguide filter F for several notes (in the multiple string case, we calculated an average filter by arithmetic averaging). The modulus of the loop filter is related to the losses undergone by the wave over one period. Note that this modulus increases with the fundamental frequency, indicating decreasing loss over one period as the treble range is approached.

Figure 11: Modulus of the waveguide filters for notes A0, F1, and D3, original and modeled.

The relations (7) and (8), relating the physical parameters to the waveguide parameters, allow the resonator to be controlled in a relevant physical way. We can either change the length of the strings, the inharmonicity, or the losses. But to be in accordance with the physical system, we have to take into account the interdependence of some parameters. For instance, the fundamental frequency is obviously related to the length of the string, and to the tension and the linear mass. If we modify the length of the string, we also have to modify, for instance, the fundamental frequency, considering that the tension and the linear mass are unchanged. This aspect has been taken into account in the implementation of the model.

3. THE SOURCE MODEL

In the previous section, we observed that the waveguide filters are almost invariant with respect to the velocity. In contrast, the excitation signals (obtained as explained in Section 2.3 and related to the impact of the hammer on the string) vary nonlinearly as a function of the velocity, thereby taking into account the timbre variations of the resulting piano sound. From the extracted excitation signals, we here study this behavior and design a source model using signal methods, so as to simulate it precisely. The source signal is then convolved with the resonator filter to obtain the piano bridge signal.

Figure 12: Waveform of three excitation signals of the experimental setup, corresponding to three different hammer velocities (0.8 m/s, 2 m/s, and 4 m/s).

3.1. Nonlinear source behavior as a function of the hammer velocity

Figure 12 shows the excitation signals extracted from the measurement of the vibration of a single string struck by a hammer at three velocities corresponding to pianissimo, mezzo-forte, and fortissimo musical playing. The excitation duration is about 5 milliseconds, which is shorter than what Laroche and Meillier [10] proposed, and in accordance with the duration of the hammer-string contact [6]. Since this interaction is nonlinear, the source also behaves nonlinearly. Figure 13 shows the spectra of several excitation signals obtained for a single string at different velocities regularly spaced between 0.8 and 4 m/s. The excitation corresponding to fortissimo provides more energy than the ones corresponding to mezzo-forte and pianissimo. But this increased

amplitude is frequency dependent: the higher partials increase more rapidly than the lower ones with the same hammer velocity. This increase in the high partials corresponds to an increase in brightness with respect to the hammer velocity. It can be better visualized by considering the spectral centroid [31] of the excitation signals. Figure 14 shows the behavior of this perceptually relevant (brightness) criterion [32] as a function of the hammer velocity. Clearly, for one, two, or three strings, the spectral centroid increases, corresponding to an increased brightness of the sound. In addition to the change of slope, which translates into the change of brightness, Figure 13 shows several irregularities common to all velocities, among which is a periodic modulation related to the location of the hammer impact on the string.

Figure 13: Amplitude (dB) of the excitation signals for one string and several velocities (0.8 m/s to 4 m/s).

Figure 14: The spectral centroid of the excitation signals for one (plain), two (dash-dotted), and three (dotted) strings.

Figure 15: Diagram of the subtractive source model: a static spectrum shaped by a spectral deviation (driven by the hammer velocity) and a gain, with the hammer position fixing the static spectrum.
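The spectral centroid used in Figure 14 is simply the amplitude-weighted mean frequency. A minimal sketch, with two made-up 25-partial spectra in which a faster spectral roll-off stands in for a softer strike:

```python
def spectral_centroid(freqs, mags):
    """Amplitude-weighted mean frequency, a standard brightness correlate."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

freqs = [100.0 * n for n in range(1, 26)]     # 25 partials of a 100 Hz tone
soft = [1.0 / n ** 2 for n in range(1, 26)]   # fast roll-off: duller spectrum
hard = [1.0 / n for n in range(1, 26)]        # slow roll-off: brighter spectrum
c_soft = spectral_centroid(freqs, soft)
c_hard = spectral_centroid(freqs, hard)
```

The harder-strike spectrum yields the larger centroid, mirroring the upward trend of the curves in Figure 14.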

3.2. Design of a source signal model

The amplitude of the excitation increases smoothly as a function of the hammer velocity. For high-frequency components, this increase is greater than for low-frequency components, leading to a flattening of the spectrum. Nevertheless, the general shape of the spectrum stays the same. Formants do not move, and the modulation of the spectrum due to the hammer position on the string is visible at any velocity. These observations suggest that the behavior of the excitation could be well reproduced using a model.

The excitation signal is seen as an invariant spectrum shaped by a smooth frequency-response filter, the characteristics of which depend on the hammer velocity. The resulting source model is shown in Figure 15. The subtractive source model consists of the static spectrum, the spectral deviation, and the gain. The static spectrum takes into account all the information that is invariant with respect to the hammer velocity; it is a function of the characteristics of the hammer and the strings. The spectral deviation and the gain both shape the spectrum as a function of the hammer velocity: the spectral deviation simulates the shifting of the energy toward the high frequencies, and the gain models the global increase of amplitude. Earlier versions of this model were presented in [1, 2]. This type of model has, in addition, been shown to work well for many instruments [33].

In the early days of digital waveguides, Jaffe and Smith [24] modeled the velocity-dependent spectral deviation as a one-pole lowpass filter. Laursen et al. [34] proposed a second-order biquad filter to model the differences between guitar tones with different dynamics. A similar approach was developed by Smith and Van Duyne in the time domain [15]: the hammer-string interaction force pulses were simulated using three impulses passed through three lowpass filters which depend on the hammer velocity. In our case, a more accurate method is needed to resynthesize the original excitation signal faithfully.

3.2.1. The static spectrum

We defined the static spectrum as the part of the excitation that is invariant with the hammer velocity. Considering the expression of the amplitude of the partials, an, for a hammer striking a string fixed at its extremities (see Valette and Cuesta [19]), and knowing that the spectrum of the excitation is related to the amplitudes of the partials by E = anD [29], the static spectrum Es can be expressed as

Es(ωn) = (4L/√T) · sin(nπx0/L) / (nπ√(1 + n²B)),  (12)

where T is the string tension, L its length, B the inharmonicity factor, and x0 the striking position. We can easily measure the striking position, the string length, and the inharmonicity factor on our experimental setup. On the other hand, we have only an estimate of the tension; it can be calculated from the fundamental frequency and the linear mass of the string. Figure 16 shows this static spectrum for a single string. Many irregularities, however, are not taken into account, for several reasons; we will see later their importance from a perceptual point of view. Equation (12) is still used, however, when the hammer position is changed. This is useful when one plays with a different temperament because it reduces dissonance.

Figure 16: The static spectrum Es(ω).

3.2.2. The deviation with the dynamic

The spectral deviation and the gain take into account the dependency of the excitation signal on velocity. They are estimated by dividing the spectrum of the excitation signal by the static spectrum for all velocities:

d(ω) = E(ω) / Es(ω),  (13)

where E is the original excitation signal.

Figure 17: Dynamic deviation of three excitation signals of the experimental setup, original and modeled.

Figure 17 shows this deviation for three hammer velocities. It effectively strengthens the fortissimo, in particular for the medium and high partials. Its evolution with frequency is regular and can successfully be fitted with a first-order exponential polynomial (as shown in Figure 17):

d̂ = exp(af + g),  (14)

where d̂ is the modeled deviation. The term g corresponds to the gain (independent of the frequency) and the term af corresponds to the spectral deviation. The variables g and a depend on the hammer velocity. To get a usable source model, we must consider the parameters' behavior at different dynamics.

Figure 18: Parameters g (gain, top) and a (spectral deviation, bottom) as a function of the hammer velocity for the experimental setup signals, original (+) and modeled (dashed).

Figure 18 shows the two parameters for several hammer velocities. The model is consistent since their behavior is regular. But the tilt increases with the hammer velocity, showing an asymptotic and nonlinear behavior. This observation can be directly related to the physics of the hammer: as we have seen, when the felt is compressed, it becomes harder and thus gives more energy to the high frequencies. But, for high velocities, the felt is totally compressed and its hardness is almost constant. Thus, the amplitude of the corresponding string wave increases further but its spectral content is roughly the same. We have fitted this asymptotic behavior by an exponential model (see Figure 18), for each parameter g and a:

g(v) = αg − βg exp(−γg v),
a(v) = αa − βa exp(−γa v),  (15)
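Equations (13) and (14) amount to a straight-line fit of the log deviation over frequency. A minimal sketch of this estimation step (the helper name `fit_deviation` and all numeric values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def fit_deviation(E: np.ndarray, Es: np.ndarray, freqs: np.ndarray):
    """Fit log d(f) = a*f + g in the least-squares sense (cf. (13)-(14)).

    E  : magnitude spectrum of the measured excitation at one velocity
    Es : static (velocity-invariant) magnitude spectrum
    Returns the spectral-deviation slope a and the gain g.
    """
    log_d = np.log(E / Es)              # log of the deviation d(f)
    a, g = np.polyfit(freqs, log_d, 1)  # first-order polynomial fit
    return a, g

# Synthetic check: build an excitation exactly obeying (14) and recover (a, g).
freqs = np.linspace(0.0, 8000.0, 256)
Es = np.exp(-freqs / 2000.0)            # stand-in static spectrum
a_true, g_true = 1.5e-4, 0.8
E = Es * np.exp(a_true * freqs + g_true)
a_est, g_est = fit_deviation(E, Es, freqs)
print(round(a_est, 6), round(g_est, 3))  # prints 0.00015 0.8
```

Repeating this fit at each measured hammer velocity yields the curves g(v) and a(v), which can then be fitted by the asymptotic exponentials of (15).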

where αi (i = g, a) is the asymptotic value, βi (i = g, a) is the deviation from the asymptotic value at zero velocity (the dynamic range), and γi (i = g, a) is the velocity exponential coefficient, governing how sensitive the attribute is to a velocity change. The parameters of this exponential model were found using a nonlinear weighted curve fit.

3.2.3. Resynthesis of the excitation signal

For a given velocity, the excitation signal can now be recreated using (13), (14), and (15). The inverse Fourier transform of this source model convolved with the transfer function of the resonator leads to a realistic sound of a string struck by a hammer. The increase in brightness with the dynamic is well reproduced. But from a resynthesis point of view, this model is not satisfactory: the reproduced signal is different from the original one; it sounds too regular and monotonous. To understand this drawback of our model, we calculated the error made by dividing the original excitation signal by the modeled one for each velocity. The corresponding curves are shown in Figure 19 for three velocities. Notice that this error term does not depend on the hammer velocity, meaning that our static spectrum model is too straightforward and does not take into account the irregularities of the original spectrum. Irregularities are due to many phenomena, including the width of the hammer-string contact, hysteretic phenomena in the felt, nonlinear phenomena in the string, and mode resonances of the hammer. To obtain a more realistic sound with our source model, we include this error term in the static spectrum. The resulting original and resynthesized signals are shown in Figure 20. The deviations of the resulting excitations are perceptually insignificant; the synthesized sound obtained is then close to the original one.

Figure 19: Example of the error spectrum. The large errors generally fall in the weak parts of the spectrum.

Figure 20: Original and modeled excitation spectrum for three different hammer velocities for the experimental setup signals.

3.3. Behavior and control of the source through measurements on a real piano

The source model parameters were calculated for a subset of the data for the piano, namely the notes A0, F1, B1, G2, C3, G3, D4, E5, and F6. Each note has approximately ten velocities, from about 0.4 m/s to between 3 and 6 m/s. The source extracted from the signals measured on the piano behaves, for all notes, as the data obtained with the experimental setup with respect to the hammer velocity. The dynamic deviation is well modeled by the gain g and the spectral deviation parameter a. As in Section 3.2, their behavior as a function of the velocity is well fitted using an asymptotic exponential curve.

From a perceptual point of view, an increased hammer velocity corresponds both to an increased loudness and a relative increase in high frequencies, leading to a brighter tone. Equations (15) make it possible to resynthesize the excitation signal for a given note and hammer velocity. However, the parameters g and a used in the modeling are linked in a complex way to the two most important perceptual features of the tone, that is, loudness and brightness. Thus, without a thorough knowledge of the model, the user will not be able to adjust the parameters of the virtual piano to obtain a satisfactory tone. To get an intuitive control of the model, the user needs to be provided access to these perceptual parameters, loudness and brightness, closely corresponding to energy and spectral centroid. The energy En is directly correlated to the perception of loudness and the spectral centroid Ba to the perception of brightness [32]. These parameters are given by

En = (1/T) ∫₀^{Fs/2} E²(f) df,
Ba = ∫₀^{Fs/2} E(f) f df / ∫₀^{Fs/2} E(f) df,  (16)

where f is the frequency and Fs the sampling frequency.

To synthesize an excitation signal having a given energy and spectral centroid, we must express the parameters g and a as functions of Ba and En. The centroid actually depends only on a:

Ba = ∫₀^{Fs/2} Es(f) e^{af} f df / ∫₀^{Fs/2} Es(f) e^{af} df.  (17)

We numerically calculate the expression of a as a function of Ba and store the solution in a table. Alternatively, assuming that the brightness change is unaffected by the shape of the static spectrum Es, the spectral deviation parameter a can be calculated directly from the given brightness [35]. Knowing a, we can calculate g from the energy En by the relation

g = (1/2) log( En T / ∫₀^{Fs/2} Es²(f) e^{2af} df ).  (18)

The behavior of Ba and En as a function of the hammer velocity will then determine the dynamic range of the instrument, and it must be defined by the user. Figure 21 shows the behavior of the spectral centroid and the energy for several notes. The curves have similar behavior and differ mainly by a multiplicative constant. We have fitted their asymptotic behavior with an exponential model, similarly to what was done with (15). These functions are applied to the synthesis of each excitation signal and thus characterize the dynamic range of the virtual instrument, which the user can easily modify by changing the shape of these functions.

Figure 21: Spectral centroid (a) and energy (b) for several notes as a function of the hammer velocity, original (plain) and modeled (dotted).

Calculating the excitation signal is then done as follows. To a given note and velocity, we associate a spectral centroid Ba and an energy En (using the asymptotic exponential fit); a is then obtained from the spectral centroid and g from the energy (equation (18)). One finally gets the spectral deviation which, multiplied by the static spectrum, allows the excitation signal to be calculated.

4. CONCLUSION

The reproduction of the piano bridge vibration is undoubtedly the first and most important step for piano sound synthesis. We have shown that a hybrid model consisting of a resonant part and an excitation part is well adapted for this purpose. After accurate calibration, the sounds obtained are perceptually close to the original ones for all notes and velocities. The resonator, which simulates the phenomena intervening in the strings themselves, is modeled by a digital waveguide model that is very efficient in simulating the wave propagation. The resonator model exhibits physical parameters such as the string tension and the inharmonicity coefficient, allowing physically relevant control of the resonator. It also takes into account the coupling effects, which are extremely relevant for perception. The source is extracted using a deconvolution process and is modeled using a subtractive signal model. The source model consists of three parts (static spectrum, spectral deviation, and gain) that depend on the velocities and the notes played. To get intuitive control of the source model, we exhibited two parameters, the spectral centroid and the energy, strongly related to the perceptual parameters brightness and loudness. This perceptual link permits easy control of the dynamic characteristics of the piano.

Thus, the tone of a given piano can be synthesized using a hybrid model. This model is currently implemented in real time using the Max-MSP software environment.

APPENDIX

INVERSE PROBLEM, THREE-COUPLED DIGITAL WAVEGUIDE

We show in this appendix how the parameters of a three-coupled digital waveguide model can be expressed as functions of the modal parameters. This method is an extension of the model presented in [29].

The signal measured at the bridge level is the result of the vibration of three coupled strings. Each partial is actually constituted by at least three components, having frequencies which are slightly different from the frequencies of each individual string. We write the measured signal as a sum of exponentially damped sinusoids:

s(t) = Σ_{k=1}^{∞} ( a1k e^{−α1k t} e^{iω1k t} + a2k e^{−α2k t} e^{iω2k t} + a3k e^{−α3k t} e^{iω3k t} ),  (A.1)

with a1k, a2k, and a3k the initial amplitudes, α1k, α2k, α3k and ω1k, ω2k, ω3k the damping coefficients and the frequencies of the components of the kth partial. The Fourier transform of s(t) is

S(ω) = Σ_{k=1}^{∞} ( a1k/(α1k + i(ω − ω1k)) + a2k/(α2k + i(ω − ω2k)) + a3k/(α3k + i(ω − ω3k)) ).  (A.2)

We identify this expression locally in frequency with the output T(ω) of the three-coupled waveguide model (see Figure 3):

T(ω) = N1/N2,  (A.3)

with

N1 = F1 + F2 + F3 + 2[(Ca − 1)(F1F2 + F2F3) + (Ce − 1)F1F3] + F1F2F3(3 + 4CaCe − 4Ca − 2Ce − Ce²),  (A.4)
N2 = 1 − (F1 + F2 + F3) + (F1F2 + F2F3)(1 − Ca²) + F1F3(1 − Ce²) + F1F2F3(2Ca² + Ce² − 2Ca²Ce − 1),

where Fi (i = 1, 2, 3) are the loop filters of the digital waveguides Gi (i = 1, 2, 3) (without loss of generality, one can assume that D1 = D2 = D3 = D, since the difference in delays can be taken into account in the phase of the filters Fi). For this purpose, since T(ω) is a rational fraction of third-order polynomials in e^{−iωD} (see (6)), it can be decomposed into a sum of three rational fractions of first-order polynomials in e^{−iωD}:

T(ω) = P(ω)e^{−iωD}/(1 − X(ω)e^{−iωD}) + Q(ω)e^{−iωD}/(1 − Y(ω)e^{−iωD}) + R(ω)e^{−iωD}/(1 − Z(ω)e^{−iωD}).  (A.5)

The vibrations generated by the model are assimilated to a superposition of three series of partials whose frequencies and decay times are governed by the quantities X(ω), Y(ω), and Z(ω). By identification between (A.3) and (A.5), we determine the following system of 6 equations:

P + Q + R = F1 + F2 + F3,  (A.6)

PY + PZ + QX + QZ + RX + RY = 2F1F2(1 − Ca) + 2F1F3(1 − Ce) + 2F2F3(1 − Ca),  (A.7)

PYZ + QXZ + RXY = F1F2F3(4CaCe − 4Ca − 2Ce − Ce² + 3),  (A.8)

X + Y + Z = F1 + F2 + F3,  (A.9)

XY + XZ + YZ = F1F2(1 − Ca²) + F2F3(1 − Ca²) + F1F3(1 − Ce²),  (A.10)

XYZ = F1F2F3(1 − 2Ca² − Ce² + 2Ca²Ce).  (A.11)

We identify (A.2) with the excitation signal times the transfer function T (equation (A.5)):

S(ω) = E(ω)T(ω).  (A.12)

Assuming that two successive modes do not overlap (these assumptions are verified for the piano sound) and writing

X(ω) = |X(ω)| e^{iΦX(ω)},
Y(ω) = |Y(ω)| e^{iΦY(ω)},  (A.13)
Z(ω) = |Z(ω)| e^{iΦZ(ω)},

we express (A.12) near each double resonance as

a1k/(α1k + i(ω − ω1k)) + a2k/(α2k + i(ω − ω2k)) + a3k/(α3k + i(ω − ω3k))
≃ E(ω)P(ω)e^{−iωD}/(1 − |X(ω)| e^{−i(ωD−ΦX(ω))}) + E(ω)Q(ω)e^{−iωD}/(1 − |Y(ω)| e^{−i(ωD−ΦY(ω))}) + E(ω)R(ω)e^{−iωD}/(1 − |Z(ω)| e^{−i(ωD−ΦZ(ω))}).  (A.14)

We identify the members of this equation term by term. We take, for example,

a1k/(α1k + i(ω − ω1k)) ≃ E(ω)P(ω)e^{−iωD}/(1 − |X(ω)| e^{−i(ωD−ΦX(ω))}).  (A.15)

The resonance frequencies of each doublet ω1k, ω2k, and ω3k correspond to the minima of the three denominators

1 − |X(ω)| e^{−i(ωD−ΦX(ω))},
1 − |Y(ω)| e^{−i(ωD−ΦY(ω))},  (A.16)
1 − |Z(ω)| e^{−i(ωD−ΦZ(ω))}.

If we assume that the moduli |X(ω)|, |Y(ω)|, and |Z(ω)| are close to one (this assumption is realistic because the propagation is weakly damped), we determine the values of ω1k,

ω2k, and ω3k:

ω1k = (ΦX(ω1k) + 2kπ)/D,
ω2k = (ΦY(ω2k) + 2kπ)/D,  (A.17)
ω3k = (ΦZ(ω3k) + 2kπ)/D.

Taking ω = ω1k + ε with ε arbitrarily small,

a1k/(α1k + iε) ≃ E(ω1k + ε) P(ω1k + ε) e^{−iΦX(ω1k+ε)} e^{−iεD} / (1 − |X(ω1k + ε)| e^{−iεD}).  (A.18)

A limited expansion of e^{−iεD} ≃ 1 − iεD + O(ε²) around ε = 0 (at the zeroth order for the numerator and at the first order for the denominator) gives

E(ω1k + ε) P(ω1k + ε) e^{−iΦX(ω1k+ε)} e^{−iεD} ≃ E(ω1k) P(ω1k) e^{−iΦX(ω1k)},  (A.19)
1 − |X(ω1k + ε)| e^{−iεD} ≃ 1 − |X(ω1k)| (1 − iεD).

Assuming that P(ω) and |X(ω)| are locally constant (in the frequency domain), we identify term by term (the two members are considered as functions of the variable ε). We deduce the expressions of |X(ω)|, |Y(ω)|, and |Z(ω)| as functions of the amplitudes and decay time coefficients of each mode:

|X(ω1k)| = 1/(α1k D + 1),  |Y(ω2k)| = 1/(α2k D + 1),  |Z(ω3k)| = 1/(α3k D + 1).  (A.20)

We also get the relations

E(ω1k) P(ω1k) = a1k D |X(ω1k)|,
E(ω2k) Q(ω2k) = a2k D |Y(ω2k)|,  (A.21)
E(ω3k) R(ω3k) = a3k D |Z(ω3k)|.

From the measured signal, we estimate the modal parameters a1k, a2k, a3k, α1k, α2k, α3k, ω1k, ω2k, and ω3k. Using (A.17) and (A.20), we calculate X, Y, and Z. We still have 9 unknown variables: P, Q, R, E, Ca, Ce, F1, F2, and F3. But we also have a system of 9 equations ((A.6), (A.7), (A.8), (A.9), (A.10), (A.11), and (A.21)). Assuming that the resonance frequencies of a doublet are close and that the variables P, Q, R, E, Ca, Ce, F1, F2, F3, X, Y, and Z have a locally smooth behavior, we can then express the waveguide parameters as functions of the temporal parameters. For the sake of simplicity, we write Ek = E(ω1k) = E(ω2k).

Using (A.6) and (A.9), we obtain Pk + Qk + Rk = Xk + Yk + Zk. Thanks to (A.21), we finally get the expression of the excitation signal at the resonance frequencies:

Ek = D (a1k Xk + a2k Yk + a3k Zk) / (Xk + Yk + Zk).  (A.22)

In the case of a two-coupled digital waveguide, the corresponding system admits analytical solutions (see [29]). But in the case of the three-coupled digital waveguide, we have not found analytical expressions for the variables P, Q, R, Ca, Ce, F1, F2, and F3. We have therefore solved the system numerically.

REFERENCES

[1] J. Bensa, K. Jensen, R. Kronland-Martinet, and S. Ystad, "Perceptual and analytical analysis of the effect of the hammer impact on piano tones," in Proc. International Computer Music Conference, pp. 58–61, Berlin, Germany, August 2000.
[2] J. Bensa, F. Gibaudan, K. Jensen, and R. Kronland-Martinet, "Note and hammer velocity dependence of a piano string model based on coupled digital waveguides," in Proc. International Computer Music Conference, pp. 95–98, Havana, Cuba, September 2001.
[3] A. Askenfelt, Ed., Five Lectures on the Acoustics of the Piano, Royal Swedish Academy of Music, Stockholm, Sweden, 1990. Lectures by H. A. Conklin, A. Askenfelt and E. Jansson, D. E. Hall, G. Weinreich, and K. Wogram, http://www.speech.kth.se/music/5_lectures/.
[4] S. Ystad, Sound modeling using a combination of physical and signal models, Ph.D. thesis, Université de la Méditerranée, Marseille, France, 1998.
[5] S. Ystad, "Sound modeling applied to flute sounds," Journal of the Audio Engineering Society, vol. 48, no. 9, pp. 810–825, 2000.
[6] A. Askenfelt and E. V. Jansson, "From touch to string vibrations. II: The motion of the key and hammer," Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2383–2393, 1991.
[7] A. Askenfelt and E. V. Jansson, "From touch to string vibrations. III: String motion and spectra," Journal of the Acoustical Society of America, vol. 93, no. 4, pp. 2181–2195, 1993.
[8] X. Boutillon, "Le piano: modélisations physiques et développements technologiques," in Congrès Français d'Acoustique, Colloque C2, pp. 811–820, Lyon, France, 1990.
[9] P. Schaeffer, Traité des objets musicaux, Éditions du Seuil, Paris, France, 1966.
[10] J. Laroche and J. L. Meillier, "Multichannel excitation/filter modeling of percussive sounds with application to the piano," IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 329–344, 1994.
[11] J. O. Smith III, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[12] G. Borin, D. Rocchesso, and F. Scalcon, "A physical piano model for music performance," in Proc. International Computer Music Conference, pp. 350–353, Computer Music Association, Thessaloniki, Greece, September 1997.
[13] B. Bank, "Physics-based sound synthesis of the piano," M.S. thesis, Budapest University of Technology and Economics, Budapest, Hungary, 2000. Published as Report 54, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, http://www.mit.bme.hu/~bank.
[14] J. O. Smith III, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, pp. 64–71, Computer Music Association, Tokyo, Japan, September 1993.
[15] J. O. Smith III and S. A. Van Duyne, "Commuted piano synthesis," in Proc. International Computer Music Conference, pp. 335–342, Computer Music Association, Banff, Canada, September 1995.
[16] S. A. Van Duyne and J. O. Smith III, "Developments for the commuted piano," in Proc. International Computer Music

Conference, pp. 319–326, Computer Music Association, Banff, Canada, September 1995.
[17] A. Chaigne and A. Askenfelt, "Numerical simulations of struck strings. I. A physical model for a struck string using finite difference methods," Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.
[18] X. Boutillon, "Model for piano hammers: Experimental determination and digital simulation," Journal of the Acoustical Society of America, vol. 83, no. 2, pp. 746–754, 1988.
[19] C. Valette and C. Cuesta, Mécanique de la corde vibrante, Traité des nouvelles technologies, Série Mécanique, Hermès, Paris, France, 1993.
[20] D. E. Hall and A. Askenfelt, "Piano string excitation V: Spectra for real hammers and strings," Journal of the Acoustical Society of America, vol. 83, no. 6, pp. 1627–1638, 1988.
[21] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, "The simulation of piano string vibration: from physical model to finite difference schemes and digital waveguides," Journal of the Acoustical Society of America, vol. 114, no. 2, pp. 1095–1107, 2003.
[22] A. Chaigne and V. Doutaut, "Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars," Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.
[23] H. Fletcher, E. D. Blackham, and R. Stratton, "Quality of piano tones," Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.
[24] D. A. Jaffe and J. O. Smith III, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.
[25] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.
[26] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[27] B. Bank, "Accurate and efficient modeling of beating and two-stage decay for string instrument synthesis," in Proc. Workshop on Current Research Directions in Computer Music, pp. 134–137, Barcelona, Spain, November 2001.
[28] D. Rocchesso and J. O. Smith III, "Generalized digital waveguide networks," IEEE Trans. Speech and Audio Processing, vol. 11, no. 3, pp. 242–254, 2003.
[29] M. Aramaki, J. Bensa, L. Daudet, P. Guillemain, and R. Kronland-Martinet, "Resynthesis of coupled piano string vibrations based on physical modeling," Journal of New Music Research, vol. 30, no. 3, pp. 213–226, 2002.
[30] K. Steiglitz and L. E. McBride, "A technique for the identification of linear systems," IEEE Trans. Automatic Control, vol. 10, pp. 461–464, 1965.
[31] J. Beauchamp, "Synthesis by spectral amplitude and 'brightness' matching of analyzed musical instrument tones," Journal of the Audio Engineering Society, vol. 30, no. 6, pp. 396–406, 1982.
[32] S. McAdams, S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff, "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychological Research, vol. 58, pp. 177–192, 1992.
[33] K. Jensen, "Musical instruments parametric evolution," in Proc. International Symposium on Musical Acoustics, pp. 319–326, Computer Music Association, Mexico City, Mexico, December 2002.
[34] M. Laursen, C. Erkut, V. Välimäki, and M. Kuuskankare, "Methods for modeling realistic playing in acoustic guitar synthesis," Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.
[35] K. Jensen, Timbre models of musical sounds, Ph.D. thesis, Department of Datalogy, University of Copenhagen, Copenhagen, Denmark, DIKU Tryk, Technical Report No. 99/7, 1999.

Julien Bensa obtained in 1998 his Master's degree (DEA) in acoustics, signal processing, and informatics applied to music from the Pierre et Marie Curie University, Paris, France. He received in 2003 a Ph.D. in acoustics and signal processing from the University of Aix-Marseille II for his work on the analysis and synthesis of piano sounds using physical and signal models (available online at http://www.lma.cnrs-mrs.fr/~bensa). He currently holds a postdoc position in the Laboratoire d'Acoustique Musicale, Paris, France, and works on the relation between the parameters of synthesis models of musical instruments and the perceived quality of the corresponding tones.

Kristoffer Jensen got his Master's degree in computer science at the Technical University of Lund, Sweden, and a DEA in signal processing at the ENSEEIHT, Toulouse, France. His Ph.D. was delivered and defended in 1999 at the Department of Datalogy, University of Copenhagen, Denmark, treating analysis/synthesis, signal processing, classification, and modeling of musical sounds. Kristoffer Jensen has a broad background in signal processing, including musical, speech recognition, and acoustic antenna topics. He has been involved in for children, state-of-the-art next-generation effect processors, and signal processing in music informatics. His current research topic is signal processing with musical applications and related fields, including perception, psychoacoustics, physical models, and expression of music. He currently holds a position at the Department of Datalogy as Assistant Professor.

Richard Kronland-Martinet received a Ph.D. in acoustics from the University of Aix-Marseille II, France, in 1983. He received a "Doctorat d'État ès Sciences" in 1989 for his work on analysis and synthesis of sounds using time-frequency and time-scale representations. He is currently Director of Research at the National Center for Scientific Research (CNRS), Laboratoire de Mécanique et d'Acoustique in Marseille, where he is the head of the group "Modeling, Synthesis and Control of Sound and Musical Signals." His primary research interests are in analysis and synthesis of sounds with a particular emphasis on musical sounds. He has recently been involved in a multidisciplinary research project associating sound synthesis processes and brain imaging techniques, functional Nuclear Magnetic Resonance (fNMR), to better understand the way the brain processes sounds and music.

EURASIP Journal on Applied Signal Processing 2004:7, 1036–1044
© 2004 Hindawi Publishing Corporation

Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis

Alexis Glass Department of Acoustic Design, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan Email: [email protected]

Kimitoshi Fukudome Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan Email: [email protected]

Received 8 July 2003; Revised 13 December 2003

A sound recording of a plucked string instrument is encoded and resynthesized using two stages of prediction. In the first stage of prediction, a simple physical model of a plucked string is estimated and the instrument excitation is obtained. The second stage of prediction compensates for the simplicity of the model in the first stage by encoding either the instrument excitation or the model error using warped linear prediction. These two methods of compensation are compared with each other and with the case of single-stage warped linear prediction; adjustments are introduced, and their applications to instrument synthesis and MPEG-4 audio compression within the structured audio format are discussed.

Keywords and phrases: warped linear prediction, audio compression, structured audio, physical modelling, sound synthesis.

1. INTRODUCTION

Since the discovery of the Karplus-Strong algorithm [1] and its subsequent reformulation as a physical model of a string, a subset of the digital waveguide [2], physical modelling has seen the rapid development of increasingly accurate and disparate instrument models. Not limited to string model implementations of the digital waveguide, such as the kantele [3] and the clavichord [4], models for brass, woodwind, and percussive instruments have made physical modelling ubiquitous.

With the increasingly complex models, however, the task of parameter selection has become correspondingly difficult. Techniques for calculating the loop filter coefficients and excitation for basic plucked string models have been refined [5, 6] and can be quickly calculated. However, as the one-dimensional model gave way to models with weakly interacting transverse and vertical polarizations, research has looked to new ways of optimizing parameter selection. These new methods of optimizing parameter selection use neural networks or genetic algorithms [7, 8] to automate tasks which would otherwise take human operators an inordinate amount of time to adjust. This research has yielded more accurate instrument models, but for some applications it also leaves a few problems unaddressed.

The MPEG-4 structured audio codec allows for the implementation of any coding algorithm, from linear predictive coding to adaptive transform coding to, at its most efficient, the transmission of instrument models and performance data [9]. This coding flexibility means that MPEG-4 has the potential to implement any coding algorithm and to be within an order of magnitude of the most efficient codec for any given input data set [10]. Moreover, for sources that are synthetic in nature, or can be closely approximated by physical or other instrument models, structured audio promises levels of compression orders of magnitude better than what is currently possible using conventional purely signal-based codecs.

Current methods used to parameterize physical models from recordings require, however, a great deal of time for complex models [8]. They also often require very precise and comprehensive original recordings, such as recordings of the impulse response of the acoustic body [5, 11], in order to achieve reproductions that are indistinguishable from the original. Given current processor speeds, these limitations preclude the use of genetic algorithm parameter selection techniques for real-time coding. Real-time coding is also

ffi x(n) y(n) made exceedingly di cult in such cases where body impulse + responses are not available or playing styles vary from model expectations. This paper proposes a solution to this real-time pa- −L rameterization and coding problem for string modelling in F(z) G(z) z the marriage of two common techniques, the basic plucked string physical model and warped linear prediction (WLP) Figure 1: Topology of a basic plucked string physical model. [12].

The justifications for this approach are as follows. Most string recordings can be analyzed using the techniques developed by Smith, Karjalainen et al. [2, 6] in order to parameterize a basic plucked string model, and a considerable prediction gain can be achieved using these techniques. The excitation signal for the plucked string model is constituted by an attack transient that represents the plucking of the string according to the player's style and plucking position [11], and is followed by a decay component. This decay component includes the body resonances of the instrument [11, 13], beating introduced by the string's three-dimensional movement, and further excitation caused by the player's performance. Additional excitations from the player's performance include deliberate expression through vibrato or even unintentional influences, such as scratching of the string or the rattling caused by the string vibrating against the fret with weak fingering pressure. The body resonances and contributions from the three-dimensional movement of the string mean that the excitation signal is strongly correlated and therefore a good candidate for WLP coding. Furthermore, while residual quantization noise in a warped predictive codec is shaped so as to be masked by the signal's spectral peaks [12], in one of the proposed topologies the noise in the physical model's excitation signal is likewise shaped into the modelled harmonics. This shaping of the noise by the physical model results in distortion that, if audible, is neither unnatural nor distracting, thereby allowing codec sound quality to degrade gracefully with decreasing bit rate. In the ideal case, we imagine that at the lowest bit rate the guitar would be transmitted using only the physical model parameters, and that with increasing excitation bit rate the reproduced guitar timbre would become closer to the original target.

This paper is composed of six sections. Following the introduction, the second section describes the plucked string model used in this experiment and the analysis methods used to parameterize it. The third section describes the recording of a classic guitar and an electric guitar for testing. The coding of the guitar tones using a combination of physical modelling and warped linear predictive coding is outlined in Section 4. Section 5 analyzes the results from simulated coding scenarios using the recorded samples from Section 3 and the topologies of Section 4, while investigating methods of further improving the quality of the codec. Section 6 concludes the paper.

2. MODEL STRUCTURE

A simple linear string model extended from the Karplus-Strong algorithm by Jaffe and Smith [14] was used in this study, comprised of one delay line z^{-L} with a first-order allpass fractional delay filter F(z) and a single-pole low-pass loop filter G(z), as shown in Figure 1, where

F(z) = (a + z^{-1}) / (1 + a z^{-1}),   (1)

G(z) = g (1 + a_1) / (1 + a_1 z^{-1}),   (2)

and the overall transfer function of the system can be expressed as

H(z) = 1 / (1 - F(z) G(z) z^{-L}).   (3)

This string model is very simple, and much more accurate and versatile models have been developed since [6, 11, 15]. For the purposes of this study, however, it was required that the model could be quickly and accurately parameterized without the use of complex or time-consuming algorithms, and sufficient that it offers a reasonable first-stage coding gain. The algorithms used to parameterize the first-order model are described in detail in [15] and will only be outlined here as they were implemented for this study.

In the first stage of the model parameterization, the pitch of the target sound was detected from the target's autocorrelation function. The length of the delay line z^{-L} and the fractional delay filter F(z) were determined by dividing the sampling frequency (44.1 kHz) by the pitch of the target. Next, the magnitudes of up to the first 20 harmonics were tracked using short-term Fourier transforms (STFTs). The magnitude of each harmonic versus time was recorded on a logarithmic scale after the attack transient of the pluck was determined to have dissipated, and until the harmonic had decayed 40 dB or disappeared into the noise floor.

A linear regression was performed on each harmonic's decay to determine its slope, β_k, as shown in Figure 2, and the measured loop gain for each harmonic, G_k, was calculated according to the following equation,

G_k = 10^{β_k L / (20 H)},   k = 1, 2, ..., N_h,   (4)

where L is the length of the delay line (including the fractional component) and H is the hop size (adjusted to account for hop overlap). The loop gain at DC, g, was estimated to equal the loop gain of the first harmonic, G_1, as in [15]. Because the target guitar sounds were arbitrary and nonideal, the harmonic envelope trajectories were quite noisy in some cases, so additional measures had to be introduced to stop tracking harmonics when their decays became too erratic or, as in some cases, negative.

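The loop of Figure 1 and (1)-(3) can be sketched directly as a sample-by-sample recursion. The following is a minimal illustration rather than the authors' implementation; the function name and parameter values are our own, and the fractional part of the delay is assumed to be folded into the allpass coefficient a.

```python
import numpy as np

def pluck_string(excitation, L, a, a1, g, n_samples):
    """Run the Figure 1 loop: y(n) = x(n) + G(z) F(z) z^{-L} y(n)."""
    y = np.zeros(n_samples)
    x = np.zeros(n_samples)
    x[:len(excitation)] = excitation
    # filter states for the allpass F(z) and the one-pole lowpass G(z)
    f_in_prev = f_out_prev = g_out_prev = 0.0
    for n in range(n_samples):
        delayed = y[n - L] if n >= L else 0.0
        # F(z) = (a + z^-1) / (1 + a z^-1)
        f_out = a * delayed + f_in_prev - a * f_out_prev
        f_in_prev, f_out_prev = delayed, f_out
        # G(z) = g (1 + a1) / (1 + a1 z^-1); a1 < 0 gives a lowpass
        g_out = g * (1 + a1) * f_out - a1 * g_out_prev
        g_out_prev = g_out
        y[n] = x[n] + g_out
    return y
```

Driving the loop with a short noise or windowed burst as `excitation` yields a decaying plucked-string tone whose decay rate is set by g and whose brightness loss per round trip is set by a1.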

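Equation (4) converts the regression slope of a harmonic's logarithmic envelope into a per-round-trip loop gain. A minimal sketch under our own naming, assuming env_db holds the tracked dB magnitudes, one value per STFT frame:

```python
import numpy as np

def loop_gain_from_decay(env_db, L, H):
    """env_db: harmonic magnitude in dB per STFT frame (decay portion only).
    A linear regression gives the slope beta in dB per frame; (4) then maps
    it to the per-round-trip gain G = 10 ** (beta * L / (20 * H)), with L
    the loop delay in samples and H the STFT hop size in samples."""
    frames = np.arange(len(env_db))
    beta = np.polyfit(frames, env_db, 1)[0]   # dB per frame, negative for a decay
    return 10.0 ** (beta * L / (20.0 * H))
```

A negative slope maps to a gain just below unity; a positive slope (the erratic cases mentioned above) would map to a gain above unity, which is why such harmonics must be excluded.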
Figure 2: The temporal envelopes of the lowest four harmonics of a guitar pluck (dashed) and their estimated decays (solid).

In such cases as when the guitar fret was held with insufficient pressure, additional transients occurred after the first attack transient, and this tended to raise the gain factor in the loop filter, resulting in a model that did not accurately reflect string losses. For the purposes of this study, such effects were generally ignored so long as a positive decay could be measured from the harmonics tracked.

The first-order loop filter coefficient a_1 was estimated by minimizing the weighted error between the target loop filter G_k, as calculated in (4), and candidate filters G(z) from (2). A weighting function W_k, suggested by [15] and defined as

W_k = (1 / (1 - G_k))^{1/2},   (5)

was used such that the error could be calculated as follows:

E(a_1) = Σ_{k=1}^{N_h} W_k | G_k - G(e^{jω_k}, a_1) |,   (6)

where ω_k is the frequency of the harmonic being evaluated.

3. RECORDINGS

The recordings for this study were made to test the combination of the basic plucked string model and WLP coding. No special care was taken, therefore, in the selection of the instruments to be used or the nature of the guitar tones to be analyzed and resynthesized, beyond that they were monophonic, recorded in an anechoic chamber, and each pluck was preceded by silence to facilitate the analysis process. A schematic of the recording environment and signal flow for the classic guitar is pictured in Figure 3.

Figure 3: Schematic for classic guitar pluck recording.

Two guitars were recorded. The first, a classic guitar, was recorded in an anechoic chamber with the guitar held approximately 50 cm from a Bruel & Kjaer type 4191 free field microphone, the output of which was amplified by a Falcon Range 1/2" type 2669 microphone preamp with a Bruel & Kjaer type 5935 power supply and fed into a PC through a Layla 24/96 multitrack recording system. The electric guitar was recorded through its line out and a Yamaha O3D mixer into the Layla. A variety of plucking styles were recorded in both cases, along with the application of vibrato, string scratching, and several cases where insufficient finger pressure on the frets led to further string excitation (i.e., a rattling of the string) after the initial pluck. After capturing approximately 8 minutes of playing, six samples were selected for the coding simulations: four classic guitar plucks (E3, E1, and two B1 plucks with rattling strings) and two electric guitar plucks (E1 and E2).

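The weighted fit of (5)-(6) reduces to a one-dimensional search over a_1. A sketch under our own naming, with a simple grid search standing in for whatever optimizer the authors used:

```python
import numpy as np

def fit_loop_filter(Gk, omega, g):
    """Grid-search the one-pole coefficient a1 minimizing (6),
    E(a1) = sum_k Wk * |Gk - G(e^{j omega_k}; a1)|, using the
    weights Wk = (1 / (1 - Gk)) ** 0.5 of (5)."""
    Wk = (1.0 / (1.0 - Gk)) ** 0.5
    best_a1, best_err = 0.0, np.inf
    for a1 in np.linspace(-0.999, 0.0, 1000):   # lowpass candidates only
        Gw = np.abs(g * (1 + a1) / (1 + a1 * np.exp(-1j * omega)))
        err = np.sum(Wk * np.abs(Gk - Gw))
        if err < best_err:
            best_a1, best_err = a1, err
    return best_a1
```

The weighting emphasizes harmonics whose measured loop gain is close to unity, i.e., the slowly decaying harmonics that dominate the audible decay.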
4. ANALYSIS/RESYNTHESIS ALGORITHMS

4.1. Warped linear prediction

Frequency warping methods [16] can be used with linear prediction coding so that the prediction resolution closely matches the human auditory system's nonuniform frequency resolution. Härmä found that WLP realizes a basic psychoacoustic model [12]. As a control for the study, the target signal was therefore first processed using a twentieth-order WLP coder of lattice structure.

The lattice filter's reflection coefficients were not quantized, and after inverse filtering, the residual was split into two sections, attack and decay, which were quantized using a mid-riser algorithm. The step size in the mid-riser quantizer was set such that the square error of the residual was minimized. The number of bits per sample in the attack residual (BITSA) was set to each of BITSA = {16, 8, 4} for each of the bits per sample in the decay residual, BITSD = {2, 1}. The frame size for the coding was set to equal two periods of the guitar pluck being coded, and the reflection coefficients were linearly interpolated between frames. This bit allocation method was used in order to match the case of the topologies that use a first-stage physical model predictor, where more bits were allocated to the attack excitation than the decay excitation. Härmä found in [12] that near transparent quality could be achieved with 3 bits per sample using a WLP codec. It is therefore reasonable to suggest that the WLP used here could have been optimized by distributing the high number of bits used in the attack throughout the length of the sound to be coded. However, since similar optimizations could also be made in the two-stage algorithms, only the simplest method was investigated in this study.

4.2. Windowed excitation

As the most basic implementation of the physical model, the residual from the string model's inverse filter can be windowed and used as the excitation for the model. In this study, the excitation was first coded using a warped linear predictive coder of order 20 and with BITSA bits of quantization for each sample of the residual. In many cases, the first 100 milliseconds of the excitation contain enough information about the pluck and the guitar's body resonances for accurate resynthesis [13, 15]. The beating caused by the slight three-dimensional movement of the string and the rattling caused by the energetic plucks used in the study, however, were significant enough that a longer excitation was used. Specifically, the window used was unity for the first 100 milliseconds of the excitation and then decayed as the second half of a Hanning window for the following 100 milliseconds. An example of this windowed excitation can be seen in the top of Figure 4. This windowed excitation, considered as the attack component, was input to the string model for comparison to the WLP case and used in the modified extended Karplus-Strong algorithm which will now be described.

Figure 4: The decomposition of an excitation into (a) attack and (b) decay. The attack window is 200 milliseconds long. In this case, decay refers to the portion of the pluck where the greatest attenuation is a result of string losses. Because the string is not otherwise damped, it may also be considered to be the sustain segment of the envelope.

4.3. Two-stage coding topologies

As described in [9], structured audio allows for the parameterization and transmission of audio using arbitrary codecs. These codecs may be comprised of instrument models, effect models, psychoacoustic models, or combinations thereof. The most common methods used for the psychoacoustic compression of audio are transform codecs, such as MP3 [17] and ATRAC [18], and time-domain approaches such as WLP [12]. Because the specific application being considered here is that of the guitar, the first stage of our codec is the simple string model described in Section 2. The second stage of coding was then approached using one of two methods:

(1) the model's output signal error (referred to as model error) could be immediately coded using WLP, or
(2) the model's excitation could be coded using WLP, with the attack segment of the excitation receiving more bits as in the WLP case of Section 4.2.

The topologies of these two strategies are illustrated in Figure 5.

Both topologies require the inverse filtering of the target pluck sound in order to extract the excitation. The decomposition of the excitation into attack and decay components for the first topology, as formerly proposed by Smith [19] and implemented by Välimäki and Tolonen in [13], reflects the wideband and high amplitude portion which marks the beginning of the excitation signal and the decay which typically contains lower frequency components from body resonances or from the three-dimensional movement of the string. However, whereas the authors of [13] synthesized the decay excitation at a lower sampling rate, justified by its predominantly lower frequency components, the excitations in our study often contained wideband components following the initial attack, and no such multirate synthesis was therefore used. A typical attack and decay decomposition of an excitation is shown in Figure 4. The high frequency decay components are a result of the mismatch between the string model and the source recording.

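The 200-millisecond attack window of Section 4.2, unity for the first 100 ms followed by the falling half of a Hanning window, can be sketched as follows; the function name and defaults are our own.

```python
import numpy as np

def attack_window(fs=44100, flat_ms=100, taper_ms=100):
    """Attack window: unity for `flat_ms`, then the falling half of a
    Hanning window for `taper_ms`, as described in Section 4.2."""
    flat = np.ones(int(fs * flat_ms / 1000))
    n_taper = int(fs * taper_ms / 1000)
    taper = np.hanning(2 * n_taper)[n_taper:]   # falling half only
    return np.concatenate([flat, taper])
```

Multiplying the inverse-filtered residual by this window isolates the attack component fed to the string model.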

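The mid-riser quantization of the residual described in Section 4.1, with the step size chosen to minimize the squared error, can be sketched as follows; the names are our own, and a brute-force step search stands in for whatever procedure the authors used.

```python
import numpy as np

def midriser_quantize(x, bits, step):
    """Mid-riser quantizer: decision levels at multiples of `step`,
    reconstruction at bin centres, 2**bits levels in total."""
    half = 2 ** (bits - 1)
    idx = np.clip(np.floor(x / step), -half, half - 1)
    return (idx + 0.5) * step

def best_step(x, bits, candidates):
    """Pick the candidate step size minimizing the squared error."""
    return min(candidates,
               key=lambda s: np.sum((x - midriser_quantize(x, bits, s)) ** 2))
```

A mid-riser characteristic has no zero output level, so even a 1-bit decay residual always re-excites the synthesis filter slightly.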
Figure 5: The WLP coding of model error (WLPCME) topology (top) and WLP coding of model excitation (WLPCMX) topology (bottom). Here, s represents the plucked string recording to be coded and ŝ the reconstructed signal. In this diagram, WLPC indicates the WLP coder, or inverse filter, and WLPD indicates the WLP decoder. Q is the quantizer, with BITSA and BITSD being the number of bits with which the respective signals are quantized.

4.4. Warped linear prediction coding of model error

The WLPCME topology from Figure 5 was implemented such that WLP was applied to the model error as follows:

s_wex = h * x̂_attack,
e_model = s - s_wex,   (7)
ŝ = s_wex + ê_model,

where s is the recorded plucked string input, h is the impulse response of the derived plucked string model from (3), x̂_attack is the WLP-coded windowed excitation introduced in Section 4.2, s_wex is the pluck resynthesized using only the windowed excitation, and e_model is the model error. ê_model is thus the model error coded using WLP and BITSD bits per sample, and ŝ is the reconstructed pluck.

4.5. Warped linear prediction coding of model excitation

In this case, the model excitation was coded instead of the model error. Following the string model inverse filtering, the excitation is whitened using a twentieth-order WLP inverse filter. Next, the signal is quantized with BITSA bits per sample allotted to the residual in the attack, and BITSD bits per sample for the decay residual. This process can be expressed in the following terms:

x_full = h^{-1} * s,
x̃_attack = q_BITSA((p^{-1} * x_full) · w_attack),
x̃_decay = q_BITSD((p^{-1} * x_full) · w_decay),   (8)
x̂_full = p * (x̃_attack + x̃_decay),
ŝ = h * x̂_full,

where s is the original instrument recording being modelled, h^{-1} is the string model's inverse filter, and x_full is thus the model excitation. x̃_attack is therefore the string model excitation whitened by the WLP inverse filter p^{-1} and quantized to BITSA bits, while x̃_decay is likewise whitened and quantized to BITSD bits. The sum of the attack and decay is then resynthesized by the WLP decoder, p. The resulting x̂_full is subsequently used as the excitation to the string model, h, to form the resynthesized plucked string sound ŝ.

5. SIMULATION RESULTS AND DISCUSSION

In order to evaluate the effectiveness of the two proposed topologies, a measure of the sound quality was required. Informal listening tests suggested that the WLPCMX topology offered slightly improved sound quality and a more musical coding at lower bit rates, although it came at the cost of a much brighter timbre. At very low bit rates, WLPCMX introduced considerable distortion, especially for sound sources that were poorly matched by the string model. WLPCME, on the other hand, was equivalent in sound quality to WLPC and sometimes worse. Resynthesis using the windowed excitation alone yielded passable guitar-like timbres, but in none of the test cases came close to reproducing the nuance or fullness of the original target sounds.

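The signal flow of (8) can be sketched end-to-end. For brevity this illustration substitutes an ordinary (unwarped) autocorrelation-method linear predictor for the WLP stage and omits the string model h, so the names and the LP stand-in are our own; with identity quantizers and complementary 0/1 windows the round trip is exact.

```python
import numpy as np

def lp_coeffs(x, order=20):
    """Autocorrelation-method linear predictor (stand-in for the WLP stage)."""
    r = np.correlate(x, x, "full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])

def wlpcmx_roundtrip(x_full, w_attack, w_decay, q_attack, q_decay, order=20):
    """Eq. (8) without the string model: whiten x_full with the predictor
    (p^-1), window and quantize the attack and decay residuals separately,
    then resynthesize with the all-pole decoder p."""
    a = lp_coeffs(x_full, order)
    pred = np.zeros_like(x_full)
    for n in range(len(x_full)):
        for k in range(1, order + 1):
            if n - k >= 0:
                pred[n] += a[k - 1] * x_full[n - k]
    resid = x_full - pred                                   # p^-1 * x_full
    resid_q = q_attack(resid * w_attack) + q_decay(resid * w_decay)
    y = np.zeros_like(x_full)                               # decoder p
    for n in range(len(x_full)):
        y[n] = resid_q[n] + sum(a[k - 1] * y[n - k]
                                for k in range(1, order + 1) if n - k >= 0)
    return y
```

Replacing the identity quantizers with a coarse quantizer such as a mid-riser reproduces the behaviour discussed below, where quantization error in the residual is magnified by the resynthesis filters.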
For a more formal evaluation of the simulated codecs' sound quality, an objective measure of sound quality was calculated by measuring the spectral distance between the frequency-warped STFTs, S_k, of the original pluck recording and the resynthesized output, Ŝ_k, created using the codecs. The frequency-warped STFT sequences were created by first warping each successive frame of each signal using cascaded all-pass filters [16], followed by a Hanning window and a fast Fourier transform (FFT). The method by which the Bark spectral distance (BSD) was measured is as follows:

BSD_k = [ (1/N) Σ_{n=0}^{N-1} ( 20 log_10 |S_k(n)| - 20 log_10 |Ŝ_k(n)| )^2 ]^{1/2},   (9)

with the mean BSD for the whole sample being the unweighted mean over all frames k. A typical profile of BSD versus time is shown in Figure 6 for the three cases WLPC, WLPCMX, and WLPCME.

Figure 6: Bark scale spectral distortion (dB) versus time (seconds). WLPC is solid, WLPCMX is dashed-dotted, and WLPCME is the dashed line.

In the first round of simulations, all six input samples as described in Section 3 were processed using each of the algorithms described in Section 4. The resulting mean BSDs were calculated to be as shown in Figure 7.

Subjective evaluation of the simulated coding revealed that as bit rate decreased, the WLPCMX topology maintained a timbre that, while brighter than the target, was recognizable as a guitar. In contrast, the other methods became noisy and synthetic. Objective evaluation of these same results reveals that both topologies using a first-stage physical model predictor have greater spectral distortion than the case of WLPC, particularly in the case of the recordings with very slow decays (i.e., with a high DC loop gain g). In identifying the cause of this distortion, we must first consider the model prediction. The degradation occurs for the following reason in each of the two topologies.

(A) In the case of WLPCME, the beating that is caused by the three-dimensional vibration of the string causes considerable phase deviation from the phase of the modelled pluck, and the model error often becomes greater in magnitude than the original signal itself. This leads to a noisier reconstruction by the resynthesizer. Additionally, small model parameterization errors in pitch and the lack of vibrato in the model result in phase deviations.

(B) In the case of WLPCMX, with a low bit rate in the residual quantization stage of the linear predictor, a small error in the coding of the excitation is magnified by the resynthesis filter (string model). In addition to this, as noted in [15], the inverse filter may not have been of sufficiently high order to cancel all harmonics, and high frequency noise, magnified by the WLP coding, may have been further shaped by the plucked string synthesizer into bright higher harmonics.

Figure 7: Mean Bark scale spectral distortion (dB) using each of WLPC, WLPCME, and WLPCMX (left to right) for (1) E3 classic, (2) E1 classic, (3) B1 classic (rattle 1), (4) B1 classic (rattle 2), (5) E1 electric, and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.

The distortion caused by the topology in (A) seems impossible to improve significantly without using a more complex model that considers the three-dimensional vibration of the string, such as the model proposed by Välimäki et al. [11] and previously raised in Section 2. Performance control, such as vibrato, would also have to be extracted from the input for a locked phase to be achieved in the resynthesized pluck. The topology of (B), however, allows for some improvement in the reconstructed signal quality by compromising between the prediction gain of the first stage and the WLP coding of the second stage. More explicitly, if the loop filter gain were decreased, then the cumulative error introduced by the quantization in the WLP stage would be correspondingly decreased.

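The distortion measure of (9) can be sketched as follows; this assumes the RMS-in-dB reading of (9) and magnitude STFT matrices of shape (frames, bins), with names of our own choosing.

```python
import numpy as np

def mean_bsd(S, S_hat, eps=1e-12):
    """Eq. (9) per frame on (warped) magnitude STFTs S and S_hat:
    RMS difference of the dB spectra, then the unweighted mean over frames."""
    d = (20 * np.log10(np.abs(S) + eps)
         - 20 * np.log10(np.abs(S_hat) + eps))
    bsd_k = np.sqrt(np.mean(d ** 2, axis=1))   # one value per frame k
    return float(np.mean(bsd_k))
```

A uniform 6 dB level error in the reconstruction, for example, yields a mean BSD of 6 dB.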
Such a downwards adjustment of the loop filter gain in order to minimize coding noise results in a physical model that represents a plucked string with an exaggerated decay. This almost makes the physical model prediction stage appear more like the long-term pitch predictor in a more conventional linear prediction (LP) codec targeted at speech. However, there is still the critical difference that the physical model contains the low-pass component of the loop filter and can still be thought of as modelling the behaviour of a (highly damped) guitar string.

To obtain an appropriate value for the loop gain, multiplier tests were run on all six target samples. The electric guitar recordings and the recording of the classical guitar at E3 represented "ideal" cases; there were no rattles subsequent to the initial pluck, in addition to negligible changes in pitch throughout their lengths. Amongst the remaining recordings, the two rattling guitar recordings represented two timbres very difficult to model without a lengthy excitation or a much more complex model of the guitar string. The mean BSD measure for the electric guitar at E1 is shown in Figure 8.

Figure 8: Mean Bark scale spectral distortion versus loop gain multiplier. WLPCMX is solid and WLPC is the dashed-dotted line.

As can be seen from Figure 8, reducing the loop gain of the physical model predictor increased the performance of the codec and yielded superior BSD scores for loop gain multipliers between 0.1 and 0.9. The greater the model mismatch, as in the case of the recordings with rattling strings, the less the string model predictor lowered the mean BSD. Models which did not closely match also featured minimal mean BSDs at lower loop gains (e.g., 0.5 to 0.7). The simulation used to produce Figure 7 was performed again using a single, approximately optimal, loop gain multiplier of 0.7. The results from this simulation are pictured in Figure 9.

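The multiplier sweep above can be sketched generically; `encode` and `distortion` are hypothetical callables of our own standing in for the codec round trip and the mean BSD measurement.

```python
import numpy as np

def best_loop_gain_multiplier(encode, distortion,
                              multipliers=np.linspace(0.0, 1.0, 11)):
    """Sweep the loop gain multiplier of Section 5: run the codec with the
    string model's DC gain g scaled by m, and keep the m that gives the
    lowest distortion score (mean BSD in the paper)."""
    scores = [distortion(encode(m)) for m in multipliers]
    return multipliers[int(np.argmin(scores))], scores
```

In the paper's setting, `encode(m)` would resynthesize the target with loop gain m * g, and `distortion` would compare the result against the original recording.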
Figure 9: Mean Bark scale spectral distortion (dB) using each of WLPC and WLPCMX (left to right) for (1) E3 classic, (2) E1 classic, (3) B1 classic (rattle 1), (4) B1 classic (rattle 2), (5) E1 electric, and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.

The decreased BSD for all the samples in Figure 9 confirms the efficacy of the two-stage codec. Informal subjective listening tests, described briefly at the beginning of this section, also confirmed that decreasing the bit rate reduced the similarity of the reproduced timbre to the original timbre without obscuring the fact that it was a guitar pluck, and without the "thickening" of the mix that occurs due to the shaped noise in the WLPC codec. This improvement offered by the two-stage codec becomes even more noticeable at lower bit rates, such as with a constant 1 bit per sample quantization of the WLP residual over both attack and decay.

To evaluate the utility of the proposed WLPCMX, it is important to compare it to the alternatives. Existing purely signal-based approaches such as MP3 and WLPC have proven their usefulness for encoding arbitrary wideband audio signals at low bit rates while preserving transparent quality. As an example, Härmä found that wideband audio could be coded using WLPC at 3 bits per sample (= 132.3 kbps @ 44.1 kHz) with good quality [12]. These codecs can be implemented in real-time with minimal computational overhead, but like sample-based synthesis, they do not represent the transmitted signal parametrically in a form that is related to the original instrument. Pure signal-based approaches, using psychoacoustic models, are thus limited to the extent to which they can remove psychoacoustically redundant data from an audio stream.

On the other hand, increasingly complex physical models can now reproduce many classes of instruments with excellent quality. Assuming a good calibration or, in the best case, a performance made using known physical modelling algorithms, transmission of model parameters and continuous controllers would result in a bit rate at least an order of magnitude lower than the case of pure signal-based methods. As an example, if we consider an average score file from a modern sequencing program using only virtual instruments and software effects, the file size (including simple instrument and effect model algorithms) is on the order of 500 kB. For an average song length of approximately 4 minutes, this leads to a bit rate of approximately 17 kbps. For optimized scores and simple instrument models, the bit rate could be lower than 1 kbps. Calibration of these complex instrument models to resynthesize acoustic instruments remains an obstacle for real-time use in coding, however. Likewise, parametric models are flexible within the class for which they are designed, but an arbitrary performance may contain elements not supported by the model. Such a performance cannot be reproduced by the pure physical model and may, indeed, result in poor model calibration for the performance as a whole.

This preliminary study of the WLPCMX topology offers a compromise between the pure physical-model-based approaches and the pure signal-based approaches. For the case of the monophonic plucked string considered in this study, a lower spectral distortion was realized using the model-based predictor. Because more bits were assigned to the attack portion of the string recording, the actual long-term bit rate of the codec is related to the frequency of plucks, but its worst case is limited by the rate of the WLP stage (assuming a loop gain multiplier of 0), and its best case, given a close match between model and recording, approaches the physical model case. For recordings that were well modelled by the string model, such as the electric guitar at E1 and E2 and the E3 classic guitar sample, subjective tests suggested that equivalent quality could be achieved with 1 bit per sample less than the WLPC case. Limitations of the string model prevent it from capturing all the nuances of the recording, such as the rattling of the classical guitar's string, but these unmodelled features are successfully encoded by the WLP stage. Because the predictor reflects the acoustics of a plucked string, degradation in quality at lower bit rates sounds more natural.

6. CONCLUSIONS

The implementation of a two-stage audio codec using a physical model predictor followed by WLP was simulated, and the subjective and objective sound quality analyzed. Two codec topologies were investigated. In the first topology, the instrument response was estimated by windowing the first 200 milliseconds of the excitation, and this estimate was subtracted from the target sample, with the difference being coded using WLP coding. In the second topology, the excitation to the plucked string physical model was coded using WLP before being reconstructed by reapplying the coded excitation to the string model shown in Figure 1. Tests revealed that the limitations of the physical model caused the model error in the first topology to be of greater amplitude than the target sound, and the codec therefore operated with inferior quality to the WLPC control case.

The second topology, however, showed promise in subjective tests, whereby a decrease in the bits allocated to the coding of the decay segment of the excitation reduced the similarity of the timbre without changing its essential likeness to a plucked string. A further simulation was performed wherein the loop gain of the physical model was reduced in order to limit the propagation of the excitation's quantization error due to the physical model's long time constant. This improved objective measures of the sound quality beyond those achieved by the similar WLPC design while maintaining the codec's advantages exposed by the subjective tests. Whereas the target plucks became noisy when coded at 1 bit per sample using WLPC, the allocation of quantization noise to higher harmonics in the second topology meant that the same plucks took on a drier, brighter timbre when coded at the same bit rate.

WLP can easily be performed in real-time, and it could thus be applied to coding model excitations in both audio coders and real-time instrument synthesizers. Analysis of polyphonic scenes is still beyond the scope of the model, however, and the realization of highly polyphonic instruments would entail a corresponding increase in computational demands from the WLP in the decoding of the excitation.

Future exploration of the two-stage physical model/WLP coding schemes should be investigated using more accurate physical models, such as the vertical/transverse string model mentioned in Section 1, which might allow the first topology investigated in this paper to realize coding gains. Implementation of more complicated models reintroduces, however, the difficulties of accurately parameterizing them, though this increased complexity is partially offset by the increased tolerance for error that the excitation coding allows.

ACKNOWLEDGMENTS

The authors would like to thank the Japanese Ministry of Education, Culture, Sports, Science and Technology for funding this research. They are also grateful to Professor Yoshikawa for his guidance throughout, and to the students of the Signal Processing Lab for their assistance, particularly in making the guitar recordings.

REFERENCES

[1] K. Karplus and A. Strong, "Digital synthesis of plucked-string and drum timbres," Computer Music Journal, vol. 7, no. 2, pp. 43-55, 1983.
[2] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992.
[3] C. Erkut, M. Karjalainen, P. Huang, and V. Välimäki, "Acoustical analysis and model-based sound synthesis of the kantele," Journal of the Acoustical Society of America, vol. 112, no. 4, pp. 1681-1691, 2002.
[4] V. Välimäki, M. Laurson, C. Erkut, and T. Tolonen, "Model-based synthesis of the clavichord," in Proc. International Computer Music Conference, pp. 50-53, Berlin, Germany, August-September 2000.
[5] V. Välimäki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766-778, 1998.
[6] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17-32, 1998.
[7] A. Cemgil and C. Erkut, "Calibration of physical models using artificial neural networks with application to plucked string instruments," in Proc. International Symposium on Musical Acoustics, Edinburgh, UK, August 1997.

[8] J. Riionheimo and V. Välimäki, "Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 8, pp. 791–805, 2003.
[9] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, "Structured audio: Creation, transmission, and rendering of parametric sound representations," Proceedings of the IEEE, vol. 86, no. 5, pp. 922–940, 1998.
[10] E. D. Scheirer, "Structured audio, Kolmogorov complexity, and generalized audio coding," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 914–931, 2001.
[11] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, 1993.
[12] A. Härmä, Audio coding with warped predictive methods, Licentiate thesis, Helsinki University of Technology, Espoo, Finland, 1998.
[13] V. Välimäki and T. Tolonen, "Multirate extensions for model-based synthesis of plucked string instruments," in Proc. International Computer Music Conference, pp. 244–247, Thessaloniki, Greece, September 1997.
[14] D. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.
[15] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[16] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011–1031, 2000.
[17] K. Brandenburg and G. Stoll, "ISO/MPEG-audio codec: a generic standard for coding of high quality digital audio," Journal of the Audio Engineering Society, vol. 42, no. 10, pp. 780–791, 1994.
[18] K. Tsutsui, H. Suzuki, O. Shimoyoshi, M. Sonohara, K. Akagiri, and R. M. Heddle, "ATRAC: Adaptive transform acoustic coding for MiniDisc," reprinted from the 93rd Audio Engineering Society Convention, San Francisco, Calif, USA, 1992.
[19] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.

Kimitoshi Fukudome was born in Kagoshima, Japan, in 1943. He received his B.E., M.E., and Dr.E. degrees from Kyushu University in 1966, 1968, and 1988, respectively. He joined Kyushu Institute of Design's Department of Acoustic Design as a Research Associate in 1971 and has been an Associate Professor there since 1990. With the October 1, 2003 integration of Kyushu Institute of Design into Kyushu University, his affiliation has changed to the Department of Acoustic Design, Faculty of Design, Kyushu University. His research interests include digital signal processing for 3D sound systems, binaural stereophony, engineering acoustics, and direction of arrival (DOA) estimation with sphere-baffled microphone arrays.

Alexis Glass received his B.S.E.E. degree from Queen's University, Kingston, Ontario, Canada, in 1998. During his bachelor's degree, he interned for nine months at Toshiba Semiconductor in Kawasaki, Japan. After graduating, he worked for a defense firm in Kanata, Ontario, and a videogame developer in Montreal, Quebec, before winning a Monbusho Scholarship from the Japanese government to pursue graduate studies at Kyushu Institute of Design (KID, now Kyushu University, Graduate School of Design). In 2002, he received his Master of Design degree from KID, where he is currently a doctoral candidate. His interests include sound, music signal processing, instrument modelling, and electronic music.
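As an illustrative aside to the second codec topology summarized in the conclusions, the sketch below pairs a basic Karplus-Strong string loop [1] with a crude uniform quantizer. The quantizer is an assumption made purely for illustration (the paper's actual excitation coder is WLP), and all parameter values here are arbitrary; the point is only to show the mechanism discussed: because the string model is linear in its excitation, coding error recirculates in the loop along with the signal, and reducing the loop gain damps that error faster.

```python
import numpy as np

def karplus_strong(excitation, loop_gain, n_samples):
    """Single-delay-loop string model: a delay line fed back through a
    two-point averaging lowpass scaled by loop_gain (Karplus-Strong [1])."""
    delay = len(excitation)
    buf = excitation.astype(float).copy()   # excitation loaded into the loop
    out = np.empty(n_samples)
    prev = 0.0
    ptr = 0
    for n in range(n_samples):
        cur = buf[ptr]
        out[n] = cur
        buf[ptr] = loop_gain * 0.5 * (cur + prev)  # lowpassed, damped feedback
        prev = cur
        ptr = (ptr + 1) % delay
    return out

def quantize(x, n_bits):
    """Crude uniform quantizer standing in for the WLP excitation coder."""
    step = (x.max() - x.min()) / 2 ** n_bits
    return np.round(x / step) * step

rng = np.random.default_rng(0)
excitation = rng.standard_normal(200)       # idealized pluck excitation

# Topology 2: code the excitation, then resynthesize through the string model.
reference = karplus_strong(excitation, loop_gain=0.996, n_samples=16000)
decoded = karplus_strong(quantize(excitation, 4), loop_gain=0.996, n_samples=16000)

# Lowering the loop gain shortens the loop's decay, limiting how long the
# excitation's quantization error keeps recirculating.
reference_damped = karplus_strong(excitation, loop_gain=0.95, n_samples=16000)
decoded_damped = karplus_strong(quantize(excitation, 4), loop_gain=0.95, n_samples=16000)

err = reference - decoded                   # coding error heard at the output
err_damped = reference_damped - decoded_damped
```

Comparing the tail energy of `err` and `err_damped` shows the trade-off the paper's further simulation probes: the damped loop confines the quantization error to the attack at the cost of a shorter decay.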