

BLENDING BETWEEN BASSOON AND HORN PLAYERS: AN ANALYSIS OF TIMBRAL ADJUSTMENTS DURING MUSICAL PERFORMANCE

SVEN-AMIN LEMBKE, SCOTT LEVINE, & STEPHEN MCADAMS
McGill University, Montreal, Canada

Achieving a blended timbre between two instruments is a common aim of orchestration. It relates to the auditory fusion of simultaneous sounds and can be linked to several acoustic factors (e.g., temporal synchrony, harmonicity, spectral relationships). Previous research has left unanswered if and how musicians control these factors during performance to achieve blend. For instance, timbral adjustments could be oriented towards the leading performer. In order to study such adjustments, pairs of one bassoon and one horn player participated in a performance experiment, which involved several musical and acoustical factors. Performances were evaluated through acoustic measures and behavioral ratings, investigating differences across performer roles as leaders or followers, unison or non-unison intervals, and earlier or later segments of performances. In addition, the acoustical influence of performance room and communication impairment was also investigated. Role assignments affected spectral adjustments in that musicians acting as followers adjusted toward a "darker" timbre (i.e., realized by reducing the frequencies of the main formant or spectral centroid). Notably, these adjustments occurred together with slight reductions in sound level, although this was more apparent for horn than bassoon players. Furthermore, coordination seemed more critical in unison performances and also improved over the course of a performance. These findings compare to similar dependencies found concerning how performers coordinate their timing and suggest that performer roles also determine the nature of adjustments necessary to achieve the common aim of a blended timbre.

Received: January 27, 2015, accepted April 7, 2017.

Key words: coordination, music performance, timbre, blend, spectrum

Among the many aims of orchestration, the combination of instruments into a blended timbre is one that is most relevant perceptually. Although decisions concerning orchestration can be primarily guided by personal preference, blend relies on a set of perceptual factors. It is commonly assumed to concern the auditory fusion of concurrent sounds into a single timbre, with the individual sounds losing their distinctness. Furthermore, it is thought to span a perceptual continuum from complete blend to distinct perception of individual timbres (Kendall & Carterette, 1993; Lembke & McAdams, 2015; Reuter, 1996; Sandell, 1991, 1995; Tardieu & McAdams, 2012). Perceptual cues that are favorable to blend range from synchronous note onsets and pitch relationships emphasizing the harmonic series, to instrument-specific acoustical traits.

Concerning pitch relationships, higher blend is achieved for unison than for non-unison intervals (Kendall & Carterette, 1993). Whereas dissonant pitch intervals exhibit greater frequency divergence between harmonics that may render the identities of constituent instruments in a mixture more distinct, combinations in highly consonant intervals (octaves, fifths) can be assumed to be more blended. For the latter, auditory fusion can be further enhanced by parallel movement of voices (Bregman, 1990). For all non-unison intervals, certain combinations of instruments can be expected to lead to higher degrees of blend than others, which may influence the instrumentation choices orchestrators make.

With respect to acoustic traits, previous studies have shown spectral properties to have the strongest effect on blend between sounds from sustained instruments. The global spectral shape of many wind instruments has been shown to be largely invariant with respect to pitch and may also bear prominent features such as spectral maxima (Lembke & McAdams, 2015). These maxima are also termed formants, in direct analogy to the pitch-independent spectral maxima found in human voice production (Fant, 1960). Previous explanations that relate blend to spectral features are either based on global spectral characterization or focus on local, prominent spectral traits.

Music Perception, Volume 35, Issue 2, pp. 144-164. © 2017 by The Regents of the University of California. DOI: https://doi.org/10.1525/MP.2017.35.2.144

The global and more general hypothesis was established from studies for instrument dyads, in which the spectral centroids of individual instruments were evaluated. The spectral centroid represents the global, amplitude-weighted frequency average of a spectrum. It has been shown that higher degrees of blend are obtained when the sum of the spectral centroids of the constituent instruments are lower (Sandell, 1995; Tardieu & McAdams, 2012). The alternative hypothesis argues that localized spectral features influence blend, more specifically, concerning formant relationships between instruments: when two instruments exhibit coincident formant locations, high blend is achieved, whereas increasingly divergent formant locations decrease blend, as the individual identities of instruments are thought to become more distinct (Reuter, 1996).

Lembke and McAdams (2015) followed up on the formant hypothesis by studying frequency relationships between the most prominent main formants. The investigation considered dyads of recorded and synthesized instrument sounds. The recorded sound remained a static reference and the synthesized sound was varied parametrically with respect to its formant frequency. For the instruments with prominent formant structure, namely bassoon, (French) horn, , and , blend was found to decrease markedly when the synthesized main formant exceeded that of the reference, whereas comparably high degrees of blend were achieved if the synthesized formant remained at or below the reference. This rule proved to be robust across different pitches, with the exception of the highest instrument registers, and even applied to non-unison pitch intervals. However, this rule relies on one instrument serving as a reference, which raises the conundrum of which of two instruments in an arbitrary combination would function as the reference. The answer may lie in musical practice: either the instrument leading the joint performance or the one with a more dominant timbre could assume this function.

In musical practice, achieving blended timbres involves two stages: its conception and its realization. Blend is first conceived by composers and orchestrators, who lay out the foundations by providing necessary perceptual cues, i.e., ensuring that musical parts have synchronous note onsets and pitch relationships favorable to blend, with the parts being assigned to suitable instrument combinations. The successful realization of blend as perceived by listeners still depends on musical performance, which necessitates precise execution by several performers with respect to intonation, timing, and likely also coordination of timbre. Previous research precluded the influence of performance by relying on stimuli that were mixed from instrument sounds that had been recorded in isolation, with there being a single exception (Kendall & Carterette, 1993) in which dyad stimuli had been recorded in a joint performance (Kendall & Carterette, 1991). The interaction between performers may in fact influence blend in a way that previous research has not considered. For instance, differences between performer roles could provide answers to the question of a certain instrument serving as a reference.

MUSICAL PERFORMANCE

Psychological research on musical performance has primarily investigated temporal properties. Although past investigations have focused on note synchronization and timing between performers (Goebl & Palmer, 2009; Keller & Appel, 2010; Rasch, 1988) as well as related motion cues (D'Ausilio et al., 2012; Goebl & Palmer, 2009; Keller & Appel, 2010), performer coordination with respect to timbral properties remains largely unexplored (Keller, 2014; Papiotis, Marchini, Perez-Carrillo, & Maestre, 2014). Rasch (1988) established that a certain degree of asynchrony between performers is common and practically unavoidable, whereas perceptual simultaneity between musical notes is still conveyed. For example, typical asynchronies between wind instruments (e.g., single and double reeds) performing in non-unison are reported as falling within 30-40 ms. Moreover, the asynchronies relate to different roles assumed by musical voices (e.g., the melody generally precedes bass and middle voices).

Two studies investigated the relationship between two pianists being assigned performer roles as either leader or follower. In one study, followers exhibited delayed note onsets relative to leaders (Keller & Appel, 2010), whereas in the other, followers displayed a higher temporal variability, thought to be linked to a strategy of error correction relative to leaders (Goebl & Palmer, 2009). In addition, the second study showed that under impaired acoustical feedback, performers increasingly relied on visual cues to maintain synchrony. Investigations with a sole focus on performance-related factors within the auditory domain would therefore need to prevent visual communication between musicians.

Role dependencies between performers are indeed common to performance practice. They have been investigated for larger ensembles (D'Ausilio et al., 2012) and have been discussed in terms of joint action (Keller, 2008), in which they may modulate how performers rely on cognitive functions such as anticipatory imagery, integrative attention, and adaptive coordination. In terms of musical interpretation, leaders commonly assume charge of phrasing, articulation, intonation, and timing, whereas followers "adapt their own expressive intentions to accommodate or blend with another part" (Goodman, 2002, p. 158). It therefore appears plausible that the performance of blended timbre may similarly rely on role assignments between musicians. For instance, when two instruments are doubled in unison, one of them assumes the leadership in performance, toward which followers may orient their timbral and timing adjustments. In addition, these adjustments may continually be refined, as it likely takes some time for both musicians to improve their coordination, given their individual roles and respective performance goals.

The current study explores what timbral adjustments are employed in achieving blend and how these interact in a performance scenario with two musicians. A set of acoustic measures monitors the spectral change and potential covariates that are assumed to be related to timbral adjustments. In addition, performances are also evaluated through musicians' self-assessments. Besides timbral adjustments, performances naturally also involve aspects related to timing, intonation, and adjustment of dynamics. Intonation has not been previously discussed as relating to blend, likely due to past research having precluded performance-related aspects, but reports from performers argue that correct intonation aids blending. Given the emphasis on timbre, however, performer coordination with respect to synchronization and intonation remains outside the focus of the current study. Moreover, they represent aspects that are important to accurate delivery of musical performance in general, which greatly limits the extent to which they can be varied independently to affect blend. As a result, the emphasis in this article lies on the spectrum, which likely governs instrumentation choices composers make and relates to the timbral adjustments over which performers have independent control.

The investigation considers a realistic account of factors encountered in musical practice and situates musicians in an approximation to the ecologically valid setting of a concert hall, realized through controlled and reproducible virtual performance environments. In concert halls, the coloration of instrument timbre as a function of relative position inside the room has been reported to be perceptible (Goad & Keefe, 1992), which would similarly extend to differences between rooms. Furthermore, an impairment of the acoustical communication between musicians (Goebl & Palmer, 2009) may be relevant to the performance of blended timbre as well. Because the investigation considers a potential effect of performer roles, an instrument combination should be chosen that allows for sufficient timbral coordination, i.e., by avoiding situations in which one instrument's timbre dominates the other when a change in role assignments is unlikely to overcome the strong timbral mismatch. An instrument combination that is widely used in the orchestral repertoire is bassoon and horn. Orchestration treatises discuss these two instruments as forming a common blended pairing (Koechlin, 1954; Rimsky-Korsakov, 1964), with these observations reflected in findings of high degrees of blend in perceptual investigations (Reuter, 1996; Sandell, 1995). The horn is often considered an unofficial member of the woodwind family, bearing a timbral versatility that succeeds in blending with woodwinds, brass, and even strings, which suggests that, at the very least, it should succeed in bridging timbral differences with the bassoon.

In summary, this investigation tests several hypotheses based on the following experimental variables or factors (set in italics and capitalized). It is expected that musicians will perform differently as leaders than as followers, with those in the Role of followers adjusting their timbre to that of the leader. Unison Intervals are hypothesized to yield higher perceived blend than the non-unison case, as well as possibly showing more coordination between instrumentalists. Furthermore, the coordination between performers is predicted to increase throughout a performance, i.e., it should be higher in a later than an earlier musical Phrase. With respect to the influence of room acoustics, differences between Rooms may affect the degree of coordination between performers to some extent, although it is not clear in what way. Finally, given an assumed stronger dependency of followers on leaders than vice versa, performances in which leaders lack acoustical feedback from followers are not expected to differ from the case with unimpaired Communication.

ACOUSTIC MEASURES FOR TIMBRE ADJUSTMENTS

Our acoustical analysis of instruments focuses on the spectral envelope, which represents the envelope or profile outlined by the partial tones contained in an instrument's spectrum (Rodet & Schwarz, 2007). Unlike conventional Fourier spectra, which characterize spectral fine structure by delineating individual partial tones and the gaps between them, a spectral envelope is a smooth, continuous function approximating the broader spectral structure of instruments, e.g., revealing the presence of formants, which one might conceive of as the resonant structure that shapes the amplitudes across frequencies. Spectral envelopes can be determined for audio signals across their time course (Villavicencio, Röbel, & Rodet, 2006) or they can concern pitch-generalized descriptions from a compilation of spectra obtained across entire pitch ranges of instruments (Lembke & McAdams, 2015).

FIGURE 1. Spectral envelopes for bassoon (white area) and horn (grey area) at dynamic marking piano, estimated using the pitch-generalized method (Lembke & McAdams, 2015). Frequency descriptors for the main formant, Fmax (solid red) and F3dB (dashed red), and the global spectral centroid SC (solid blue) reflect the similarity in prominent spectral-envelope features of the two instruments. The spectral envelopes are offset vertically by 6 dB for better comparison.

FIGURE 2. Spectrogram of horn playing an A-major scale from A2 to A4, based on time-variant spectral-envelope estimates (True Envelope, Villavicencio et al., 2006). The plot displays spectral-envelope magnitude (colormap at the far right) along frequency (y-axis) and time (x-axis), spectral measures Fmax, F3dB, and SC and fundamental frequency f0 (solid red, dashed red, blue, and white curves, respectively), as well as sound level Lrms summed across frequencies (separate horizontal strip at the bottom). Sound levels were normalized to the maximum level of the excerpt (0 dB).

With regard to the latter, bassoon and horn bear a high resemblance, as illustrated in Figure 1 (see color version online) for the dynamic marking piano. As their most prominent traits, main formants are located around 500 Hz and can be characterized by the frequency Fmax (solid red line) corresponding to the maximum magnitude and the frequency above Fmax where the magnitude has decreased by 3 dB, termed the upper frequency bound F3dB (dashed red line). Both instruments' main formants exhibit similarities, with their Fmax differing by only about 80 Hz, whereas their F3dB lie much closer. In addition, the spectral centroids SC (solid blue line) are located in the vicinity of the main formants, showing the global spectral distribution to be strongly influenced by the prominence of the main formants. Still, the horn exhibits a slightly broader, more dominant main-formant region, which may equate to a similar difference in timbral dominance.

Although the pitch-generalized description in Figure 1 approximates the instruments' structural invariants (i.e., related to what informs orchestrators in their choice of instruments), in practice these structural constraints still allow for a certain degree of timbral variation that musicians can exploit. Because wind instruments act as acoustic systems in which all sound originates from common structural elements (e.g., , resonator tube), timbral adjustments are expected to be inherently linked to the primary parameters of sound excitation performers focus on, namely, pitch and dynamic intensity. For both instruments, blend-related adjustments of timbre can be assumed to relate to spectral changes, which can be monitored by evaluating time-variant spectral envelopes (e.g., by way of True Envelope [TE] estimation, Villavicencio et al., 2006), again employing the descriptive measures Fmax, F3dB, and SC. An example is given in Figure 2 (see color version online), showing a horn playing an ascending A-major scale over two octaves, visualized as a spectrogram of TE estimates across time frames. Apart from the spectral descriptors Fmax, F3dB, and SC, the figure includes the temporal evolution of pitch and dynamics, represented by the fundamental frequency f0 (white curve) and the relative sound level Lrms (level sum across all frequencies: separate horizontal strip at the bottom), respectively. Gaps in the spectral descriptors Fmax and F3dB (red curves) are due to unreliable detection of formants.

From a preliminary qualitative investigation with bassoon and horn players, the timbre variability at the players' control was found to be greater for horn than for bassoon. For the latter, the location and shape of the main formant is relatively fixed, with spectral changes primarily affecting the magnitudes of higher frequency regions relative to the main formant, whereas the structural constraints of the horn allow for greater changes to main-formant location and shape, as also becomes apparent in Figure 2. Musicians reported that during performance, the greatest timbre change could be achieved by varying dynamics, which suggests a dependency between them.
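For orientation, the following sketch shows one way the three descriptors introduced above (SC, Fmax, F3dB) can be derived from a single spectral-envelope frame. It is a minimal illustration under simplifying assumptions (the main formant is taken to be the global envelope maximum, and the envelope is assumed to be estimated already, e.g., by True Envelope); it is not the authors' formant-analysis algorithm.

```python
import numpy as np

def spectral_descriptors(freqs, env_db):
    """Toy descriptors from one spectral-envelope frame.

    freqs  : frequency bins (Hz)
    env_db : spectral-envelope magnitude at those bins (dB)
    Returns (SC, Fmax, F3dB) in Hz.
    """
    # Spectral centroid: amplitude-weighted frequency average,
    # computed on linear amplitudes rather than dB values.
    amp = 10.0 ** (env_db / 20.0)
    sc = np.sum(freqs * amp) / np.sum(amp)

    # Main-formant maximum: here simply the envelope's global peak.
    i_max = np.argmax(env_db)
    f_max = freqs[i_max]

    # Upper bound F3dB: first frequency above Fmax at which the
    # envelope has dropped 3 dB below the formant maximum.
    above = np.where((freqs > f_max) & (env_db <= env_db[i_max] - 3.0))[0]
    f_3db = freqs[above[0]] if above.size else np.nan

    return sc, f_max, f_3db
```

Applied frame by frame, such descriptors yield the time series plotted in Figure 2.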

The identification of perceived dynamic markings has been shown to be mediated by both timbre and sound level (Fabiani & Friberg, 2011), which argues that when performers adjust dynamics, both timbre and the sound level (Lrms) are affected.

Apart from dynamics, pitch presents another source of covariation with spectral measures, with pitch being expressed through the fundamental frequency (f0) for harmonic sounds. In Figure 2, all spectral measures show some variation as pitch ascends, which can be quantified descriptively by the linear correlation coefficient (Pearson's r): The strongest covariation with f0 is apparent for SC, r = .92, whereas the correlation with main-formant measures is less pronounced, r < .40, with Fmax and F3dB meandering around idealized average values. Given these differences in covariation with f0, the two types of spectral measures seem to capture independent contributions of timbral change. It is important to note that even f0 and Lrms yield a clear degree of correlation, r = .72, with about 10 dB of level change across the two octaves. In orchestration practice, this correlation corresponds to the notion of pitch-driven dynamics, with experimental evidence showing that ascending pitch contour can enhance the identification of changes in dynamics, e.g., crescendo (Nakamura, 1987). In summary, this preliminary investigation suggests that timbral adjustments should be evaluated by way of combined measures of spectral variation and other potential factors of covariation, such as pitch and dynamics.

Method

PARTICIPANTS

Sixteen musicians were recruited primarily from the Schulich School of Music at McGill University and the music faculty of the Université de Montréal. The bassoonists, three female and five male, had a median age of 21 years (range = 18-31). The hornists, six female and two male, had a median age of 20 years (range = 17-44). Across both instruments, 10 participants considered themselves professional musicians, and overall, the musicians reported playing or practicing their respective instruments for the median duration of 21 hours per week (range = 5-35). All musicians were paid for their participation and provided informed consent. The study was reviewed for ethical compliance by the McGill Research Ethics Board.

STIMULI

Three musical parts were investigated, all taken from a single excerpt in Mendelssohn-Bartholdy's A Midsummer Night's Dream, Op. 61, No. 7 (measures 1-16). The chosen instrument combination is featured prominently in this musical passage. In a thin orchestral texture, low strings, second horn, and establish the harmonic structure through long, separated notes, while two bassoons accompany a solo horn melodically. In the absence of other salient voices, the combination of bassoons with horn can therefore be thought to aim for a homogeneous, blended timbre. All parts were transposed by a fifth down to A major from the original key of E major, to reduce the impact of player fatigue through repeated performances in high instrument registers, at the same time ensuring little change in key signature. The transposed parts are shown in Figure 3. The melody, voice A, was used for unison performances, whereas voices B and C served as non-unison material. Across the different experimental conditions, each voice was played by both instruments, regardless of whether a voice had been assigned to only one particular instrument in the original score.

Although the musicians played in separate rooms in order to record their individual sounds, they heard themselves and the other player over headphones in a simulated virtual-acoustics environment, which allowed the control over acoustical factors (see Design). The simulation was achieved through binaural reproduction (Paul, 2009) using real-time convolution of the instruments' source signals with individualized binaural room impulse responses (RIRs).

FIGURE 3. Musical parts A, B, and C in A-major transposition, based on Mendelssohn-Bartholdy's A Midsummer Night's Dream. The 'V' marks the separation into the first and second phrases (see Musical factors).

Each musician's performance was captured through an omnidirectional microphone (DPA 4003-TL). Both microphone signals were routed to a control room, where preamplification gain was digitally matched for both performers. The analog signals were converted to 96 kHz / 24-bit PCM digital data, recorded at full resolution for later acoustical analysis and at the same time fed into separate convolution engines that processed the source signals with customized RIRs, based on the manipulation of acoustical factors. Individualized binaural signals were then fed to headphones for each performer. Headphone amplifier volume was held constant, as were the circumaural closed-ear headphones (Beyerdynamic DT770). A latency inherent to the convolution delayed the arrival of the simulated room feedback by about 8.4 ms, affecting both performers equally. The RIRs had been previously collected in real concert venues and were measured with a binaural head-and-torso system (Brüel & Kjær Type 4100), excited by a loudspeaker (JBL LSR6328P) positioned to emulate the instruments' main sound-radiation directivity (Meyer, 2009).

In the simulated environment, musicians would hear themselves and the other musician in a common performance space, which provided realistic room-acoustical cues (e.g., room size, its reverberation characteristics, relative spatial positions of players). The instrument locations were based on a typical orchestral setup: horns on the conductor's left front side and bassoons on the conductor's right front. For instance, hornists heard themselves in direct proximity and the bassoonist towards their left, at a distance of 3.6 m, whereas the bassoonists' viewpoint was reversed in orientation. In order to take these individual viewpoints into account, i.e., as performers heard themselves (self) and the other musician (other), the acoustical analyses of performances considered the individualized binaural signals. Although four possible binaural signal paths resulted from a performer having two ears and hearing two sources at the self and other positions, only two paths were considered for simplicity: self considered the ear facing away from the other performer, and other considered the ear closer to the other performer.

DESIGN

Performances were studied as a function of musical and acoustical factors using a repeated-measures design to rule out confounding individual differences for instruments and playing technique or style with the investigated effects.

Musical factors. Three independent variables considered the performer role, the influence of different musical voice contexts, and performance differences across time. For the Role factor, one instrumentalist was assigned the role of leader, while the other performer acted as follower (i.e., took on an accompanying role). According to the Interval factor, musicians either performed a melodic phrase in unison (voice A in Figure 3) or a two-voice phrase in non-unison (B and C); in non-unison, the top voice (B) was assigned to the leader. The Phrase factor divided the musical excerpt into two, with the separation occurring right before beat three of measure eight (see the 'V' in Figure 3). This separation yielded two musical phrases of identical length consisting of similar musical material, more so for unison than for non-unison parts.

Acoustical factors. Two other variables investigated effects for communication directivity between performers and the room-acoustical properties of performance venues. The Communication factor assessed the influence of whether both performers were able to hear each other or whether only the follower could hear the leader, denoted two-way or one-way, respectively. For the Room factor, the influence of acoustics was assessed for two different performance spaces: musicians were simulated as performing in either a large, multipurpose performance space (RT60 = 2.1 s, the time for reverberation to decrease by 60 dB) or in a mid-sized recital hall (RT60 = 1.3 s).¹

PROCEDURE

The experiment was conducted in two research laboratories at the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) at McGill University. Separate laboratory spaces were called for in order to create individual acoustical environments for each participant, ensuring the capture of separate source signals as well as preventing visual cues between performers. Each performance laboratory was treated to be relatively non-reverberant, with RT60 < 0.5 s. Performers received instructions and provided feedback through dedicated computer interfaces. Musical notation for all three parts was provided on a music stand, and performances were temporally coordinated by a silent video of a conductor. With both performers seated on chairs, the stand was positioned to allow the performer's field of view to cover both the musical notation and the conductor, arranged similarly to the binaurally simulated situation (i.e., the stand slightly to the right of the conductor as seen from a hornist and to the left for a bassoonist).

¹ The performance venues correspond to the Music Multimedia Room and Tanna Schulich Hall, respectively. Both are located at the Schulich School of Music, McGill University. More details under http://www.mcgill.ca/music/node/48232. (Last accessed on May 18, 2017.)

The video was recorded in advance by having an experienced conductor (with baton) outline the metrical structure of the musical excerpt, including gestures related to phrasing and articulation. He used a constant reference tempo of 58 beats per minute.

A pair of one bassoon and one horn player was tested in a single experimental session, being instructed to perform together to achieve the highest degree of blend possible. They performed three repetitions of 16 different experimental conditions (four factors by two treatment levels, excluding Phrase), leading to a total of 48 experimental trials. The experiment lasted around two hours in total, including a break scheduled after half of the trials. To avoid disorientation of musicians through strongly varying performer-role and voice assignments, the musical factors were grouped in separate blocks. Participants assumed the role of either leader or follower throughout the first or second half of the experiment. Furthermore, shorter eight-trial blocks grouped conditions based on voice assignment (e.g., four unison trials, another four non-unison), with the repetitions occurring after each block. For instance, a given participant would begin as leader for 24 trials, performing the first repetition of four unison trials, then proceed to four non-unison trials, followed by the second repetition of the same four unison trials, etc. The four possible block-ordering schemes were counterbalanced across all participants and instruments. The acoustical-factor combinations were nested in sub-blocks of four trials and randomly ordered. Three practice trials were conducted under the guidance of two experimenters ahead of the main experiment, involving the experimental conditions from the first block of four trials.

A single experimental trial consisted of three stages: preparation, performance, and ratings. During preparation, musicians were asked to prepare the assigned musical parts and individual performer roles, while being able to hear themselves in the current simulated room environment. After both participants indicated being prepared, the actual performance commenced and once it ended, each participant judged their individual experience of the performance by providing two ratings. The first rating assessed how well they thought they had individually performed given their assigned role on a continuous scale with the verbal anchors very badly and very well. The second rating concerned the perceived degree of achieved blend with the other performer on a continuous scale with the verbal anchors low blend and high blend.

ACOUSTIC MEASURES

In addition to the behavioral ratings, several acoustic measures accounted for blend-related timbre features and were evaluated as time series. Timbral adjustments were evaluated through spectral descriptors and also monitored through the covariate measures pitch and dynamics. Two additional cues important to blend—namely, intonation and synchrony—were initially considered in order to allow their influence to be filtered out subsequently. Time series were analyzed with respect to the time-averaged magnitude of an acoustic measure, its temporal variability during performance, and its temporal coordination between performers. Therefore, each measure yielded three corresponding dependent variables (DVs).

All acoustic measures were based on spectral analyses across the time course of performances, for which short-time Fourier transforms (STFT) and further derived representations were computed using dedicated software (AudioSculpt/SuperVP, IRCAM, Paris). STFT was based on the fast Fourier transform (FFT), using Hann-windowed analysis frames consisting of 7,620 samples, FFT length of 8,192 bins, and an overlap of 25% between successive frames. Given the sampling rate of 96 kHz, this corresponded to a frequency and time resolution of 11.7 Hz and 19.8 ms, respectively. Pitch detection employed harmonic analysis of the STFT spectra (Doval & Rodet, 1991), with the identified fundamental frequency f0 configured to fall within the possible range f0 ∈ [92.5, 370] Hz, which reflected the pitch range across all parts expanded by a whole tone on each end. The f0 estimates provided by AudioSculpt were complemented by corresponding confidence scores (i.e., the likelihood for identified harmonics to be linked to f0), which in turn were used to discard time frames falling below 80% confidence from further analysis for all measures. This elimination improved the reliability of both f0 and spectral measures. Based on the remaining STFT frames, spectral envelopes were obtained through True Envelope (TE) estimation (Villavicencio et al., 2006). The TE algorithm applied iterative cepstral smoothing on STFT-magnitude spectra, yielding individual spectral-envelope estimates per time frame, based on a constant cepstral order oriented at an f0 of 300 Hz. Then, a formant-analysis algorithm evaluated the spectral envelopes, identifying main formants (F1), which were quantified in terms of frequencies characterizing their maximum Fmax and the upper bound F3dB, as well as computing the spectral centroid SC (Peeters, Giordano, Susini, Misdariis, & McAdams, 2011). The spectral envelopes also served to quantify dynamics by determining relative, root mean square (RMS) power levels Lrms, which corresponded to the level summed across all frequencies of the spectrum.
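As a point of reference, the reported analysis parameters translate into the STFT setup sketched below, here using scipy rather than the AudioSculpt/SuperVP software actually employed. The stated "25% overlap" is read as a frame advance of 25% of the window (1,905 samples), which matches the stated 19.8 ms time resolution at 96 kHz; this interpretation is an assumption.

```python
import numpy as np
from scipy.signal import stft

FS = 96_000      # sampling rate (Hz), as reported
NPERSEG = 7_620  # Hann-windowed analysis frame (samples)
NFFT = 8_192     # FFT length (bins): 96000 / 8192 ≈ 11.7 Hz per bin
HOP = 1_905      # frame advance: 1905 / 96000 ≈ 19.8 ms time resolution

def analysis_stft(x):
    """Magnitude STFT (dB) with parameters matching those reported."""
    f, t, Z = stft(x, fs=FS, window='hann', nperseg=NPERSEG,
                   noverlap=NPERSEG - HOP, nfft=NFFT, boundary=None)
    return f, t, 20.0 * np.log10(np.abs(Z) + 1e-12)
```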

As the raw time-series data for the measures exhibited some fine temporal variation and occasional outliers, some prior data treatment was needed. All measures were smoothed by a weighted moving-average filter. Weights were based on the f0-confidence scores, assuming that higher confidence reflected a more robust and reliable parameter estimate. Smoothing used a sliding-window duration of 475 ms, which corresponded to an eighth note at the performed tempo. Especially for horn signals, the automated formant detection at times led to erroneous estimates, which could be identified and eliminated. Prior to smoothing, the main-formant descriptors Fmax and F3dB were filtered for outlying values that lay beyond an octave below and two-thirds of an octave above their time-averaged median value, because unrelated spectral features beyond these frequencies were occasionally classified as the main formant. Deemed an artifact of cepstral smoothing, the TE estimates for horns sometimes also exhibited spectral-envelope maxima at 0 Hz, in which case formant identification failed. Therefore, resulting gaps for Fmax greater than two metrical beats were replaced by f0 values, serving as the lowest tonal signal components. The corresponding F3dB values were determined from the replaced Fmax. The final step of data treatment ensured that the measures yielded values across all analysis frames of a performance, allowing comparisons between performers across all time points. This was achieved through linear interpolation of all remaining gaps to a reference time grid. Extrapolation was applied for values missing at the edges, which rarely exceeded a quarter-note duration (e.g., delayed entry of the first note or the final note not being held for its entire duration).

The investigation focused on timbral adjustments as reflected in spectral changes. However, not all spectral changes were necessarily related to the intent to achieve blend. Performer actions related to errors in intonation or timing could also have evoked a certain degree of spectral change. Therefore, the performances were filtered for cases in which bad intonation and/or synchrony were apparent. Intonation was measured by comparing f0 between performers, expressed as the relative deviation in cents. For unison, this characterized deviations from an f0 ratio of unity; for non-unison, the deviation considered f0 ratios of the corresponding intervals in equal temperament. Asynchrony could also be assessed through the intonation measure, because asynchronous note entries also introduced substantial deviations from perfect intonation for the duration by which they were offset from synchrony. The time series for all measures retained only values falling within the intonation range of ±25 cents, which corresponds to musically acceptable intonation (Rakowski, 1990).

Unlike intonation and timing, pitch (f0) and dynamics (Lrms) were intrinsically related to the spectral measures and could not be directly excluded from further analysis, but were instead monitored for similar trends along the spectral measures' time series. The influence of f0 was twofold: First, systematic differences in f0 between the musical parts were likely reflected in deviations between unison and non-unison performances. Second, f0 also varied over time, and all spectral measures covaried with f0 to some extent. By taking residuals (ε) from the linear regression of the f0 time series onto the time series of each of the three spectral measures and adding the residual scores to the spectral time-series means, the linear covariation with f0 over the parts could be removed. This procedure yielded the residual measures εFmax, εF3dB, and εSC.

The performance analysis considered individual performers and evaluated each acoustic measure with three DVs. The first DV quantified the acoustic measure's average magnitude, using the median across time values. The second DV assessed the temporal variability along a measure, expressed as a robust coefficient of variation (CV): the ratio between interquartile range and median. The third DV assessed the temporal coordination between performers, evaluating the maximum cross-correlation coefficient (XC) for their time series.² Due to the expected covariation with f0, the XCs for the spectral measures were assumed to be inflated by the inherent similarity in f0 profiles between parts A and A (unison), and even B and C (non-unison). Therefore, this DV considered the residual measures (ε), whereas the remaining DVs were based on the original acoustic measures. Furthermore, in considering the individual viewpoints of performers within the binaural simulation, the DVs evaluating median and CV were based on time series for the binaural signal self, whereas the DV evaluating XC compared self with other.

² Although cross-correlation time lags were also evaluated, no evidence for relative delays in coordination was found across all measures. For instance, Lrms displayed a median lag of 0 ms across all conditions and both instruments, with the interquartile range also being 0 ms, showing hardly any variation along this measure. SC exhibited a median lag of 0 ms with an extremely wide interquartile range of 871 ms, which reflects little agreement across participants.

Results

The presentation of results focuses on the hypotheses established in the introduction, which were tested by a total of five factors, namely, Role, Interval, Room, Communication, and Phrase, with two treatment levels each.
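Before turning to the results, the following sketch restates the ε-residualization and the three DVs as defined above, assuming the time series have already been cleaned, smoothed, and aligned to a common time grid. It illustrates the definitions only and is not the authors' analysis code.

```python
import numpy as np

def residualize(spectral, f0):
    """ε-measure: remove linear covariation with f0 by simple regression,
    then re-add the spectral series' mean, as described in the text."""
    slope, intercept = np.polyfit(f0, spectral, 1)
    residuals = spectral - (slope * f0 + intercept)
    return residuals + np.mean(spectral)

def median_dv(x):
    """DV 1: time-averaged magnitude as the median across time."""
    return np.median(x)

def cv_dv(x):
    """DV 2: robust coefficient of variation (IQR / median)."""
    q75, q25 = np.percentile(x, [75, 25])
    return (q75 - q25) / np.median(x)

def xc_dv(x_self, y_other):
    """DV 3: maximum normalized cross-correlation between performers
    (equal-length series assumed)."""
    x = (x_self - np.mean(x_self)) / np.std(x_self)
    y = (y_other - np.mean(y_other)) / np.std(y_other)
    return np.max(np.correlate(x, y, mode='full') / len(x))
```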

In the experiment, performances across the 16 factorial combinations (excluding Phrase) were repeated three times. The subsequent analysis retained only the two "best" repetitions per participant pair (i.e., those that yielded the highest self-assessed performance ratings), which needed to reflect agreement between the two participants performing together. Out of three repetitions, at least one found mutual agreement between both performers as to having been rated among the highest two. If there was no further mutual agreement, the repetition yielding the higher average rating across performers was taken. Some unforeseen technical issues during two experimental sessions rendered data for a total of five trials unusable. Fortunately, this affected only one repetition per experimental condition, allowing the remaining two repetitions to be used. In the analyses, separate performances were considered as independent cases, i.e., corresponding to a total of 16 cases (eight performers × two repetitions) per instrument.

Analyses of variance (ANOVA) tested effects across within-subjects musical and acoustical factors. The within-subject residuals yielded slight departures from a normal distribution (Shapiro-Wilk test). Based on the known robustness of ANOVA to violations of normality for equal sample sizes (Harwell, Rubinstein, Hayes, & Olds, 1992), the use of ANOVA was considered justified for DVs exhibiting less than 10 violations over all 32 factor cells, which all reported statistical effects fulfilled. Furthermore, the two Instrument groups could be implemented as a between-subjects factor if both groups exhibited similar variances. This condition was fulfilled for the behavioral ratings, as both groups of players used identical rating scales and did not exhibit systematic differences in their ratings. The acoustic measures, however, exhibited clear violations (Levene's test), brought about by consistent differences in their acoustical characterization. As a result, the acoustic measures involved separate ANOVAs by instrument. In line with the use of ANOVA for repeated measures, reported main effects consider statistics for within-subjects differences between two levels of a single factor, i.e., means and standard errors across participants for individual differences along the factor in question. For a quantification of several DVs in terms of group means for individual factor cells, please refer to the two tables in the supplementary materials section accompanying the online version of this paper.

BEHAVIORAL RATINGS

Participants provided two ratings quantifying their perception of blend and assessment of their own performance given their assigned role. As the ratings applied to entire performances, mixed ANOVAs included the four within-subjects factors Role (leader, follower), Interval (unison, non-unison), Room (large, small), and Communication (two-way, one-way), with Instrument (bassoon, horn) forming a between-subjects factor.

For blend ratings, performers acting as leaders did not provide ratings for the impaired acoustical feedback as they were unable to hear the follower. To work around these missing values, separate ANOVAs evaluated two subsets of the blend ratings, which each excluded one of the problematic factors. The first only considered unimpaired feedback across the remaining within-subjects factors Role × Interval × Room; the second comprised only performers acting as followers across Interval × Room × Communication. Both analyses suggested that performances were perceived as more blended in unison than in non-unison, without other factors interacting. Whereas performances under unimpaired communication yielded clear trends for higher blend in unison, F(1, 30) = 19.40, p < .01, ηp² = .39, analysis of only the followers' ratings resulted in only marginally higher blend ratings for unison, F(1, 30) = 3.94, p = .06, ηp² = .12. In numerical terms, the observed blend-rating differences between unison and non-unison conditions amounted to a mean within-subject difference of about .04 (standard error .01) on a full scale range of [0, 1]. In summary, performances under unimpaired communication led to higher blend ratings for unison conditions, although the exclusion of leaders' ratings or the inclusion of ratings for impaired communication may have compromised this effect.

Performance ratings only led to a marginally significant main effect for Interval, F(1, 30) = 3.90, p = .06, ηp² = .12, but this factor still yielded two-way interactions with Role, F(1, 30) = 6.43, p = .02, ηp² = .18, and Communication, F(1, 30) = 4.70, p = .04, ηp² = .14. Figure 4 presents differences between Roles (rating as leader minus rating as follower) and Communication direction (one-way minus two-way condition). As is apparent in Figure 4 (top panel), the first interaction involved musicians rating themselves as having performed their role better as followers than as leaders in unison conditions, with the inverse relationship holding for non-unison performances. The second interaction (Figure 4, bottom panel) suggested that in unison performances, musicians rated their performances higher for unimpaired, two-way communication, whereas the ratings for non-unison performances appeared to be unaffected by communication directivity.

Two additional interactions involved differences between instruments.

Figure 5 presents the differences between Instruments (bassoon minus horn) and Roles (leader minus follower). As illustrated in Figure 5 (top panel), a two-way interaction with Role, F(1, 30) = 6.49, p = .02, ηp² = .18, yielded higher performance ratings for bassoons than horns in the role of followers, whereas no difference between instruments was found for leaders. The same interaction suggested that bassoonists provided higher ratings as followers than as leaders (Figure 5, bottom panel), with the opposite applying to horns. A related three-way interaction (Figure 5, bottom panel) added the influence of the Room factor, F(1, 30) = 4.22, p = .05, ηp² = .12. For bassoons, the difference between roles became larger in the smaller room, whereas for horns, the role difference appeared to be limited to just the smaller room.

Overall, these interdependencies suggest that communication impairment had a stronger effect on unison performances and that followers were more satisfied with their performances than were leaders. Differences between instruments and across roles could be related to instrument-specific issues concerning playability of the corresponding parts. Furthermore, the less reverberant acoustics of the small room seemed to affect performances (or their evaluation) more critically.

FIGURE 4. Within-subject differences in performance ratings across the factor interactions Role × Interval (top; leader minus follower) and Communication × Interval (bottom; one-way minus two-way). Bars and intervals represent means and standard errors, respectively.

FIGURE 5. Within-subject differences in performance ratings across the factor interactions Instrument × Role (top; bassoon minus horn) and Role × Instrument × Room (bottom; leader minus follower). Bars and intervals represent means and standard errors, respectively.

ACOUSTIC MEASURES

The way in which bassoonists and hornists coordinated their playing to achieve blend was analyzed across the time course of performances by taking several acoustic measures into account. The analysis approach examined performer coordination as a function of the musical and acoustical factors being studied.

Figure 6 (see color version online) visualizes a single performance by one bassoon and one horn player in two spectrograms obtained through TE estimation. The superimposed curves represent the time courses for all acoustic measures, Fmax, F3dB, SC, and f0, and the separate horizontal strip at the bottom traces the temporal evolution of Lrms. In this example, the unison part was performed under normal, two-way communication in the larger room, with the bassoon acting as leader. This example also considers the bassoon's viewpoint, i.e., involving binaural signals for bassoon and horn as heard from the self and other positions, respectively. Three DVs were derived from each measure—median, CV, and XC—and were analyzed in repeated-measures ANOVAs investigating the factors Role, Interval, Room, Communication, and Phrase.

Because the acoustic measures and associated DVs were quantified along physical scales or quantities derived from them, statistical effects were also evaluated against psychoacoustically meaningful thresholds. For median Lrms, differences needed to exceed 1 dB, as this value estimated the just noticeable difference (JND) for amplitude (Zwicker & Fastl, 1999). Spectrum-related JNDs for formant frequencies or spectral centroid amount to about 15 Hz for the frequency range in question (Kendall & Carterette, 1996; Kewley-Port & Watson, 1994), whereas spectral-envelope variation has also been linked to lowering JNDs for fundamental frequency (Moore & Moore, 2003). As the latter case points to an even more acute discrimination of spectral change, a more liberal threshold of 5 Hz was adopted for the spectral measures (Fmax, F3dB, SC). This threshold is based on the discrimination threshold of about 1% for fundamental frequency in complex tones (Moore & Moore, 2003; Zwicker & Fastl, 1999) when applied to the investigated main-formant frequencies. For CV, differences below 10% were considered negligible, because even confounding variables could be shown to introduce greater variability (see Covariates). Lastly, XC differences below 1% (e.g., 0.3% improved temporal coordination) were considered of too little value to be reported. The threshold for XC was expressed in terms of explained variance, i.e., differences between R² values.
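As a toy illustration of how such reporting thresholds can be applied (the threshold values are taken from the text; the helper itself is not from the paper), differences in XC are compared on the scale of explained variance, i.e., R² = XC².

```python
# Psychoacoustically motivated reporting thresholds, as stated in the text.
THRESHOLDS = {
    "Lrms_dB": 1.0,      # just noticeable level difference (dB)
    "spectral_Hz": 5.0,  # Fmax, F3dB, SC (Hz)
    "CV": 0.10,          # 10% change in temporal variability
    "XC_R2": 0.01,       # 1% explained variance
}

def exceeds_xc_threshold(xc_a, xc_b):
    """Compare two cross-correlation maxima on the explained-variance scale."""
    return abs(xc_a**2 - xc_b**2) > THRESHOLDS["XC_R2"]
```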


FIGURE 6. Spectrograms of a joint performance of the unison part by one bassoon (top) and one horn (bottom) player, employing TE estimation (Villavicencio et al., 2006). Curves display the time series of (smoothed) acoustic measures Fmax, F3dB, SC, and f0; the separate horizontal strip at the bottom displays Lrms. See caption of Figure 2 for further details.

Covariates. As the acoustic measures were based on real-life signals, they may have contained some differences between factor levels that were unrelated to deliberate timbre adjustments by performers. For instance, different rooms typically impose a characteristic coloration, i.e., frequency filter, that may induce shifts in the spectral measures. Likewise, the apparent differences in f0 register between parts likely imposed spectral shifts that lay beyond the performers' control. These possible sources of covariation will therefore be assessed in this section to determine baselines against which to interpret any related effects in the following sections.

The assessment of potential room effects compared fixed reference performances simulated at the self positions in the small vs. large rooms. For greater representativeness, this procedure was applied to two selected performances per participant, for parts A and C, yielding 2 × 16 cases. For the median DV measures, the comparison of group medians by room yielded shifts for all spectral measures and Lrms: identical horn performances exhibited slightly stronger dynamics in the large than in the small room, with the opposite applying to bassoon. Likewise, the spectral measures varied by about 1% in main-formant frequency between rooms. In terms of CV, the spectral measures exhibited up to 30% more temporal variability in the large room, whereas variability in Lrms decreased by up to 10% in the same room. It appears that higher reverberation introduced greater spectral fluctuation, whereas it smoothed out temporal variability in dynamics. As only single performances at the self position were considered for the comparison between rooms, the change in XC could not be assessed, because the cross-correlation compared two performers at separate positions. Still, differences in reverberation between rooms may have had an effect on XC as well. As is apparent in Figure 6, the performance at the other position (bottom panel) yielded more variability than at the self position (top), i.e., signals heard from farther away were also more reverberated.

i.e., signals heard from farther away were also more reverberated. Differences in reverberation between rooms could have therefore modulated the disparity between the two positions, and hence also XC, in some additional way. Unfortunately, these observations suggested that pre-existing, systematic differences between rooms introduced a confounding influence on all measures and across all DVs, compromising the ability to tease apart differences in performer adjustments from those introduced by room acoustics. As a result, obtained ANOVA effects were evaluated against the threshold values quantified above, serving as baselines for the systematic variation. The resulting baselines for median DV between rooms are visualized as the horizontal lines in Figure 7.

Spectral covariation with f0 between parts A, B, and C was quantified on the actual performer data. The comparison considered separate group medians by part, with the spectral shifts expressed relative to part A, which had the highest median f0. Spectral shifts could also be compared to corresponding changes in f0 itself, represented by the median across pitches per part, which was weighted by the relative duration of individual pitches. Table 1 displays these comparisons: Although f0 varied as much as 42%, the spectral shifts were less pronounced, nonetheless exhibiting a monotonic decrease by part, i.e., C was lower than B, which was lower than A. Bassoons exhibited only up to 13% of covariation, whereas horns showed decreases up to 24%. The averaged frequency shifts for B and C were taken as the baselines for spectral shifts induced from f0 changes alone and are visualized as the horizontal lines in Figure 8.

Given the covariate influence of rooms and f0, the presentation of results for the factors Room and Interval precedes the three remaining ones. Figures 7, 8, 9, and 11 visualize potential main effects for median DV across all acoustic measures, i.e., Fmax, F3dB, SC, and Lrms (individual panels from left to right, respectively). The bars and intervals symbolize means and standard errors, respectively, for within-subject differences between factor levels for the factors Room, Interval, Role, or Phrase. The labels above and below the zero-axis indicate the orientation of a difference between two factor levels. For instance, for the factor Interval (Figure 8) and positive values in SC, the spectral centroid was higher for unison than non-unison; the reverse applies for negative values. In addition, Table 2 summarizes the effects for CV and XC.

FIGURE 7. Within-subject differences for the Room factor (large minus small) and DVs evaluating performers' medians of the acoustic time series derived from the performances (individual panels) by instrument. Labels above and below zero indicate the orientation of differences between the two factor levels. For example, positive values for Lrms signify that the sound level was higher while playing in the large than in the small room. Bars and intervals represent means and standard errors, respectively; asterisks (*) indicate significant main effects. Black horizontal lines indicate the expected covariation arising from room-acoustical variability alone.

Room. ANOVAs on the median DVs yielded differences between rooms for the spectral measures and sound level that strongly mirrored the expected covariate baselines, as illustrated in Figure 7 by comparing the bars to the corresponding horizontal lines. Assuming these mirrored trends to reflect pre-existing differences in room acoustics, only discrepancies from these baselines beyond the psychoacoustically meaningful threshold will be considered. All but one of the effects fulfilled this criterion, with F3dB for bassoon barely exceeding the baseline by about 5 Hz, F(1, 15) = 22.86, p < .01, ηp² = .60. Also the CV exhibited greater temporal variability in the larger room, as indicated in Table 2. The main-formant measures yielded differences up to 23%, for both the horn, F(1, 15) ≥ 7.74, p < .02, ηp² ≥ .34, and the bassoon, F(1, 15) ≥ 5.29, p < .04, ηp² ≥ .26, which again mirrored the expected trends for room-acoustical variation alone. Similar trends also applied to the temporal coordination, with XC changing up to 8%. Both instruments' Lrms exhibited greater XC in the larger room, F(1, 15) ≥ 8.32, p ≤ .01, ηp² ≥ .36. In addition, temporal coordination for horn was also higher in the larger room concerning SC and F3dB, F(1, 15) ≥ 9.29, p < .01, ηp² ≥ .38. In summary, all findings appeared to closely reflect patterns expected from pre-existing, systematic differences in room acoustics and did not allow effects caused by deliberate performer actions to be clearly identified.
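To make the plotted quantity concrete, the following minimal sketch (Python, not the study's analysis code) illustrates how a within-subject difference DV of the kind shown in Figures 7-9 and 11 could be derived; the table contents, values, and column names are purely illustrative.

import numpy as np
import pandas as pd

# Hypothetical long-format table: one row per performer and condition,
# holding that performer's median of one acoustic measure (e.g., SC in Hz).
df = pd.DataFrame({
    "performer": ["p1", "p1", "p2", "p2"],
    "room":      ["large", "small", "large", "small"],
    "median_SC": [1010.0, 980.0, 995.0, 990.0],
})

# Within-subject difference for the Room factor: large minus small, per performer.
wide = df.pivot(index="performer", columns="room", values="median_SC")
diff = wide["large"] - wide["small"]

# Bar height and error bar as in Figures 7-9 and 11: mean and standard error.
mean_diff = diff.mean()
sem_diff = diff.std(ddof=1) / np.sqrt(len(diff))
print(f"Room effect on SC: {mean_diff:.1f} Hz +/- {sem_diff:.1f} Hz (SE)")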

TABLE 1. Influence of Pitch Differences (f0) Among Musical Parts on the Spectral Measures (Fmax, F3dB, SC)

Part    f0 (rel. to A)         Bassoon                    Horn
        Hz        %            Fmax    F3dB    SC         Fmax    F3dB    SC
B       62        25           4       2       6          19      12      13
C       104       42           13      7       13         24      12      21

Note: The covariation was evaluated for parts B and C relative to A (in % if not indicated otherwise), quantified as medians across all performances of a part. f0 per part considered the median across the pitches of all performed notes, weighted by their relative durations.
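As a reading aid for Table 1, here is a hedged sketch of how a duration-weighted median f0 per part and the relative shifts could be computed; all pitches, durations, and spectral values below are placeholders rather than data from the study.

import numpy as np

def weighted_median(values, weights):
    """Median of `values`, with each value weighted by its relative duration."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cum, 0.5)]

# Placeholder (f0 in Hz, note duration in s) pairs for two musical parts.
parts = {
    "A": ([247.0, 262.0, 294.0], [1.0, 0.5, 2.0]),
    "B": ([196.0, 220.0, 233.0], [1.5, 1.0, 1.0]),
}
f0_median = {p: weighted_median(f, d) for p, (f, d) in parts.items()}

# Placeholder per-part group medians of one spectral measure (e.g., SC in Hz).
sc_median = {"A": 1040.0, "B": 980.0}

# Shifts of part B relative to part A, expressed in percent (as in Table 1).
f0_shift = 100 * (f0_median["B"] - f0_median["A"]) / f0_median["A"]
sc_shift = 100 * (sc_median["B"] - sc_median["A"]) / sc_median["A"]
print(f"f0 shift B vs. A: {f0_shift:.0f}%   SC shift B vs. A: {sc_shift:.0f}%")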

Interval. The median DV for spectral measures and sound level exhibited higher values in unison than in non-unison. As in the preceding section, the observed differences for Interval generally matched the covariate baselines for f0 register, as illustrated in Figure 8 when comparing the bars against the horizontal lines. Only for the horn, the spectral measures exhibited higher frequencies for unison, F(1, 15) ≥ 106.45, p < .01, ηp² ≥ .88, which moreover fell below the baselines by 10 to 20 Hz. These discrepancies could be due to the baselines overestimating the actual within-subjects differences between intervals for hornists, as they were derived from group medians. Nonetheless, these effects could still not be assumed to correspond to blend-related performer actions, as they were dictated by the musical notation. The pronounced influence of Interval, however, is still important for interpreting interaction effects among the remaining factors.

In addition, as summarized in Table 2, bassoonists showed greater temporal coordination playing in unison than in non-unison, with XC increasing by 4% for ΔSC, F(1, 15) = 4.82, p < .05, ηp² = .24, although the difference was mainly apparent in the smaller room, Interval × Room: F(1, 15) = 5.69, p = .03, ηp² = .28. By contrast, horns exhibited 8% greater coordination in Lrms in non-unison performances, F(1, 15) = 12.00, p < .01, ηp² = .44, with the difference being only half as pronounced in the second phrase, Interval × Phrase: F(1, 15) = 7.76, p = .01, ηp² = .34. These effects were complemented by analogous differences for CV measures of Lrms, in that bassoons showed greater temporal variability in unison, F(1, 15) = 4.81, p < .05, ηp² = .24, whereas the opposite applied to horns, F(1, 15) = 6.26, p = .02, ηp² = .30, with the latter being limited to followers, Interval × Role: F(1, 15) = 9.05, p < .01, ηp² = .38. In summary, whereas the Interval factor introduced an upward bias to the acoustical measures for unison performances, which affected both instruments similarly, the DVs for temporal variability and coordination showed a few opposing trends between instruments.

FIGURE 8. Within-subject differences for the Interval factor (unison minus non-unison) and DVs evaluating performers' medians of the acoustic time series derived from the performances (individual panels) by instrument. Black horizontal lines indicate the expected covariation arising from f0-register variability alone. See Figure 7 caption.

Role. The clearest indication for timbre adjustments by performers concerned differences between leader and follower roles. For the median DVs, role-based differences across spectral features and dynamics become apparent in Figure 9. Musicians produced higher spectral frequencies and increased sound levels as leaders compared to when performing as followers. For bassoon, the main-formant measures were higher for leaders, F(1, 15) ≥ 33.02, p < .01, ηp² ≥ .69, but this appeared to be limited to non-unison conditions, which was likely related to the f0 difference between parts B and C, Role × Interval: F(1, 15) ≥ 34.76, p < .01, ηp² ≥ .70.

TABLE 2. Summary Table of Main Effects for DVs Evaluating Performers’ Temporal Variability (CV) and Temporal Coordination (XC) Across Acoustic Measures for the Factors Room, Interval, Role, and Phrase

Factor      DV (row pair)                  Levels compared (upper / lower field)
Room        CV (temporal variability)      larger / smaller room
            XC (temporal coordination)     larger / smaller room
Interval    CV (temporal variability)      unison / non-unison
            XC (temporal coordination)     unison / non-unison
Role        CV (temporal variability)      leader / follower
            XC (temporal coordination)     leader / follower
Phrase      CV (temporal variability)      phrase 1 / phrase 2
            XC (temporal coordination)     phrase 1 / phrase 2

Columns (per instrument): Bassoon and Horn, each across the measures Fmax, F3dB, SC, and Lrms.
[The black/white/grey shading of the individual measure cells in the original table cannot be reproduced in this text version.]

Note: Vertically adjacent pairs of black and white fields represent main effects and their orientation. For instance, in the top row for bassoon and Fmax, more temporal variability was obtained in the larger room (black) than in the smaller room (white). No significant differences were found for the grey-shaded fields.
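For orientation, the two summary DVs in Table 2 can be illustrated with plausible stand-ins: CV as the coefficient of variation of one performer's acoustic time series, and XC as a correlation between the two performers' time-aligned series. The study's exact operationalization is given earlier in the article; the sketch below, using synthetic signals, only conveys the general idea.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder time series of one acoustic measure (e.g., SC in Hz), sampled
# frame by frame over a performance, for the two players of a duo.
sc_bassoon = 1000 + 30 * np.sin(np.linspace(0, 6, 500)) + rng.normal(0, 5, 500)
sc_horn    =  990 + 25 * np.sin(np.linspace(0, 6, 500)) + rng.normal(0, 5, 500)

# Temporal variability (CV): standard deviation relative to the mean.
cv_bassoon = np.std(sc_bassoon) / np.mean(sc_bassoon)

# Temporal coordination (XC): correlation between the time-aligned series.
xc = np.corrcoef(sc_bassoon, sc_horn)[0, 1]

print(f"CV (bassoon SC): {cv_bassoon:.3f}   XC (bassoon vs. horn SC): {xc:.2f}")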

Likewise, performances for leaders exhibited higher SC than did those for followers, F(1, 15) = 60.24, p < .01, ηp² = .80, however, more so in the non-unison conditions, for similar reasons as before, Role × Interval: F(1, 15) = 76.50, p < .01, ηp² = .84. At the same time, Lrms increased slightly for leaders, F(1, 15) = 14.49, p < .01, ηp² = .49.

The differences obtained for horn exhibited similar patterns. Both Fmax and F3dB yielded higher frequencies for leaders, F(1, 15) ≥ 9.45, p < .01, ηp² ≥ .39, with the difference for F3dB appearing to be limited to unison performances, Role × Interval: F(1, 15) = 10.19, p < .01, ηp² = .40. Also SC yielded a difference between performer roles, with higher frequencies for leaders, F(1, 15) = 45.91, p < .01, ηp² = .75, being more pronounced for non-unison performances, Role × Interval: F(1, 15) = 6.43, p = .02, ηp² = .30. Analogous differences concerned leaders yielding higher Lrms, F(1, 15) = 22.84, p < .01, ηp² = .60, and more so in the non-unison conditions, Role × Interval: F(1, 15) = 30.23, p < .01, ηp² = .67.

In other words, these findings argue that in the attempt to blend with leaders, followers adjusted to "darker" timbres and, interestingly, spectral features and dynamics changed in a coherent way. For both instruments, SC dropped by about 30 Hz and Lrms decreased by 1-3 dB for followers. Figure 10 (see color version online) relates the observed differences between performer roles to equivalent spectral-envelope changes. These spectral envelopes (curves) and the indicated acoustic measures (vertical lines) represent medians taken across all performances, collapsed across the remaining factors. Although these aggregate differences do not correspond to within-subject differences, they still show how the effects influenced the entire spectrum. As illustrated by the black arrows traversing the pairs of spectral envelopes, the main formants of followers (dark grey) receded in frequency and level compared to the leaders' (light grey). This was reflected in analogous differences across the acoustic measures (vertical lines), although the detailed analysis mirrors the observed differences between instruments (e.g., differences in line width). The main formants in unison bassoon performances remained fixed (top-left panel), whereas the change in SC suggested spectral adjustments relative to the main formant, which co-occurred with a slight change in Lrms. For the same unison conditions, the horns exhibited more change in formant measures and sound level (top-right).

FIGURE 9. Within-subject differences for the Role factor (leader minus follower) and DVs evaluating performers' medians of the acoustic time series derived from the performances (individual panels) by instrument. See Figure 7 caption.

With regard to temporal variation, the DVs quantifying the CV exhibited instrument-specific effects, as summarized in Table 2.

[Figure 10 panels: spectral envelopes (power level in dB vs. frequency in Hz, 400-1200 Hz) with indicated Fmax, F3dB, and SC, plotted for leader and follower, separately for bassoon and horn in unison (top) and non-unison (bottom) conditions; see caption below.]

FIGURE 10. Spectral-envelope change as a function of performer roles (curves), by Interval (top and bottom panels) and instrument (left and right panels). Arrows trace the adjustments toward lower frequencies and sound levels from leader to follower roles. Corresponding shifts along median DV for Fmax, F3dB, and SC (vertical lines) illustrate the trend toward a "darker" spectrum, with the line width corresponding to the actual shift in frequency.
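For readers wishing to relate the envelopes and vertical lines in Figure 10 to concrete quantities, the following sketch computes a frame-wise spectral centroid (SC) and RMS level from a synthetic signal; it is an illustration only, not the analysis pipeline used in the study.

import numpy as np

def frame_descriptors(frame, sample_rate):
    """Spectral centroid (Hz) and RMS level (dB) of one windowed audio frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    sc = np.sum(freqs * spectrum) / np.sum(spectrum)      # amplitude-weighted mean frequency
    l_rms = 20 * np.log10(np.sqrt(np.mean(frame ** 2)))   # level relative to full scale
    return sc, l_rms

# Placeholder frame: a 440-Hz tone with one harmonic, sampled at 44.1 kHz.
sr = 44100
t = np.arange(2048) / sr
frame = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 880 * t)
print(frame_descriptors(frame, sr))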

Leading hornists varied more than followers along F3dB and SC, F(1, 15) ≥ 9.15, p < .01, ηp² ≥ .38, whereas the contrary applied to bassoonists across all spectral measures, F(1, 15) ≥ 22.42, p < .01, ηp² ≥ .60. For both instruments, these effects were limited to non-unison performances, which suggests that they arose from instrument-specific issues related to parts B and C, Role × Interval: F(1, 15) ≥ 5.93, p < .03, ηp² ≥ .28. For instance, the low registral range of part C posed more playing difficulty to hornists than to bassoonists. Other role-dependent differences were specific to horns, in which temporal variation of Lrms was greater for followers, F(1, 15) = 17.07, p < .01, ηp² = .53, whereas the temporal coordination as quantified by XC was up to 3% higher for leaders concerning ΔF3dB and ΔSC, F(1, 15) ≥ 5.68, p ≤ .03, ηp² ≥ .28. In summary, the effects between performer roles for temporal variation and coordination yielded less coherent patterns than those for median DVs. The observed tendencies were mainly instrument-specific, which seemed more pronounced for spectral variation in the lower pitch registers.

Phrase. Comparisons between the first and second phrases indicated that both musicians adapted their playing throughout a performance, adjusting their timbres toward an assumedly improved blend. With regard to median DV, leading bassoonists lowered SC by about 12 Hz towards the second phrase, whereas followers increased by 10 Hz, still remaining below leaders, Phrase × Role: F(1, 15) = 25.63, p < .01, ηp² = .63. The effect for followers appeared limited to non-unison conditions, whereas in unison, followers did not vary SC in their performances, Phrase × Role × Interval: F(1, 15) = 31.22, p < .01, ηp² = .68. This notable interaction revealed that even leaders attempted to close larger gaps in SC, whereas followers fulfilled the same objective by remaining stable or closing gaps in the opposite direction. Hornists showed similar effects, although without interactions with other factors, as illustrated in Figure 11. The formant measures decreased by about 5 Hz in the second phrase, F(1, 15) ≥ 6.69, p ≤ .02, ηp² ≥ .31. Likewise, Lrms also decreased by about 1 dB throughout performances, F(1, 15) = 28.22, p < .01, ηp² = .65. Overall, the difference in spectral frequencies between phrases spanned between 5 Hz and 12 Hz, which given the prior discussion of thresholds may not have yielded clearly perceptible differences in all cases.

FIGURE 11. Within-subject differences for the Phrase factor (1st phrase minus 2nd phrase) and DVs evaluating performers' medians of the acoustic time series derived from the performances (individual panels) by instrument. See Figure 7 caption.

Similar effects for temporal coordination supported the previous findings, as outlined in Table 2. For Lrms, the second phrase yielded 6% and 8% higher XC for bassoon, F(1, 15) = 37.93, p < .01, ηp² = .72, and horn, F(1, 15) = 125.05, p < .01, ηp² = .89, respectively. Similarly, the coordination in ΔSC also increased in the later phrase by 3% for bassoon, F(1, 15) = 9.86, p < .01, ηp² = .40, and 5% for horn, F(1, 15) = 19.14, p < .01, ηp² = .56. The increased coordination in the later phrase may have been related to the notated crescendo-decrescendo (see Figure 3, measures 13-14), which could likewise have explained a corresponding increase in horn players' CV for Lrms, F(1, 15) = 92.41, p < .01, ηp² = .86. In summary, performances in the second phrase exhibited a greater degree of temporal coordination, and they also seemed to involve adjustments toward a more similar and moderately "darker" spectrum.

Communication. Among the acoustic measures, no clear indications were obtained that the absence of auditory feedback from the follower affected performances differently than in the unimpaired case. Of the few statistically significant findings, all fell below the predefined thresholds for psychoacoustically meaningful differences.

Discussion

When two musicians aim to achieve a blended timbre during performance, they coordinate their playing in a certain way.

Both performers aim for the idealized timbre the musical score conveys, which usually also implies the instrument that should lead in performance. The leading musician determines timing, intonation, and phrasing, providing reference cues that accompanying musicians closely follow, who likely also adjust their timbres to ensure blend. The employed strategies of performer coordination may or may not be influenced by whether they are playing in unison or non-unison, whether they perform in different venues, or whether the leading instrument is unable to hear the other musician (as in offstage playing, for example). These factors were studied for pairs of one bassoon and one horn player, focusing on the timbral adjustments they employed. Performances were evaluated over their time courses through a set of acoustic measures, complemented by self-assessment from the performers, delivering a differentiated picture of how performers adjust timbre in achieving blend.

Measuring timbre adjustments as they occur in the realistic setting of musical performance involves a high degree of complexity. These adjustments were evaluated through spectral features, which in some cases, however, seemed inseparable from covariation with pitch and dynamics. These covariates are what a musical score essentially communicates to performers, and although timbre is implied through instrumentation and articulation markings, for a given instrument it also occurs as a by-product of notated pitches and dynamics. These covariates also determine how performers excite their instruments' acoustic systems, in turn establishing inherent links to the resulting spectral properties. Although correlation analyses on their own do not prove causal relationships, the inherent coupling of pitch, dynamics, and spectral properties in wind instruments has been established physically (Benade, 1976), and this should hence justify their association.

Correlations between spectral measures and the covariates of pitch (f0) and dynamics (Lrms) are visualized in Figure 12. As individual differences across performers and their instruments were to be expected, the evaluation considered correlations across all performances of individual players and then summarized these as medians and interquartile ranges for bassoon and horn separately. An impact of pitch variation becomes clearly apparent, reflected in positive correlations, r ≥ .55, between f0 and all spectral measures. This applied to both instruments, with spectral centroid (SC) being most affected and there being little variability among players. The potential influence of dynamics on the spectral measures differed fundamentally across instruments. Whereas correlations with Lrms for bassoonists were nearly absent, r ≤ .10, hornists exhibited clearly positive correlations, r ≥ .40. In addition, there was also a trend for positive correlations between pitch and dynamics, which differed in magnitude between instruments. In summary, pitch appears to induce substantial spectral change, and due to it being dictated by the musical notation, these changes lie beyond performers' expressive control.

FIGURE 12. Quantification of the covariation introduced by pitch (f0) and dynamics (Lrms) on the spectral measures (Fmax, F3dB, SC). Bar heights and error bars represent medians and interquartile ranges, respectively, of Pearson correlation coefficients computed per performer, which considered all available time-series data (N ≥ 65,000).

Although the results from the experiment showed tendencies for increases in sound level to reflect increases in spectral measures (Fmax, F3dB, SC), a linear covariation was only obtained for horns. Regardless of these differences, dynamics may still have afforded performers of both instruments greater liberty in timbral control, although not necessarily in the same way. Subtle changes in dynamics that remain within the notated dynamic markings could thus be used for slight timbre adjustments and may be more easily achieved than adjustments independent of both dynamics and pitch. Experienced orchestrators likely have internalized the inherent links between pitch, dynamics, and timbral properties in their instrumentation knowledge (e.g., pitch-driven dynamics), whereas the current findings argue that research on timbre perception that aims to situate it within musical practice should abandon its definition as that residual quality alongside pitch and dynamics, instead accepting the notion of it being closely entwined with the other musical parameters (McAdams & Goodchild, 2017).

Assigning roles to performers yielded the clearest effects for timbral adjustments related to blend. Players acting as leaders indeed functioned as a reference toward which followers oriented their playing. In order to achieve blend, followers adjusted towards darker timbres compared to when they performed as leaders. For both instruments, the darker timbre corresponded to shifts of SC by about 30 Hz towards lower frequencies, whereas the main formant shifted as well, but only for the horn.

These selective spectral adjustments can be compared with similar strategies undertaken by singers to blend into a choir (Goodwin, 1980; Ternström, 2003). At the same time, a darker timbre occurred together with softer dynamics, which suggests that performers may have partially achieved the timbre change through subtle changes in dynamics, in addition to potential changes in embouchure or the position of the right hand in the bell of the horn. The extent to which spectral change was employed varied between instruments, with the horn clearly producing more change; it is also known to be the timbrally more versatile instrument. Due to the nature of the within-subjects design, these role comparisons considered how the same musicians performed differently as followers than as leaders (i.e., they did not assess how bassoonist followers darkened their timbre relative to hornist leaders and vice versa). At the least, Figure 10 suggests that as followers, hornists lowered their upper-bound formant frequencies (F3dB) to be about the same as that of the bassoonists, which is necessary to avoid a marked decrease in perceived blend (Lembke & McAdams, 2015).

With regard to the magnitude of changes in dynamics, differences in Lrms (e.g., 1-3 dB) were not so pronounced as to signify a departure from the notated dynamic marking piano. From interviewing players of both instruments, musicians appear to consciously consider adjustments of both dynamics and timbre as strategies to achieve blend. For instance, in accompanying a leading instrument, a hornist described his goal as achieving a "rounder" or less brilliant timbre, at the same time reporting that playing with woodwinds, he would need to avoid "overpowering" the other instrument in dynamics. Likewise, a bassoonist reported the importance of loudness balance to blend, also clarifying that to her, dynamics and timbre were not independent. Yet, it cannot be ruled out that spectral changes occurred as by-products of sound level adjustments made for wholly other reasons than achieving blend. For instance, adjusting the self-to-other ratio, i.e., the sound-level difference between oneself and another performer, may improve communication amongst musicians (Fulford et al., 2014; Keller, 2014; Ternström, 2003). Despite this possibly confounding influence, it still seems justified to assume some quantity of the observed spectral changes to stem from blend-related adjustments, as no clear correlation between Lrms and the spectral measures is apparent, especially for the bassoon (see Figure 12).

Unison performances were indeed perceived as yielding significantly higher blend than their non-unison counterparts, but the mean difference between the two was merely 4% of the full range of the rating scale. This small difference may be explained in a number of ways. Listening experiments conducted in the past obtained clearer differences in blend ratings between unison and non-unison. In the current experiment, however, participants provided retrospective ratings alongside the more demanding performance task, with the ratings also being well separated in time, which did not allow immediate comparisons of unison vs. non-unison performances. Furthermore, performers were asked to use the rating scale based on their previous musical experience, i.e., judging performances and blend relative to what they had learned was achievable in musical practice. In addition, blending could have also been understood as how "coupled" the musicians' performance was (i.e., related to additional factors such as synchrony and intonation). Lastly, the musicians' own playing could have partially masked their perception of the other player (e.g., hearing their instrument in greater proximity and via bone conduction), which does not compare to conventional listening experiments, where participants are presented with a comparatively balanced rendition of two instruments. Together, these factors could have led to the less pronounced rating differences between the interval conditions.

Nonetheless, higher blend may still relate to unison performances influencing player coordination more critically. In unison, the performance ratings suggest that followers gave higher ratings than did leaders, which could imply that leaders were generally less satisfied with their performance, given their more important role and responsibility for its success. By contrast, non-unison performances yielded higher ratings by leaders than followers. This result could be related to part C being located in a low register, which may have led to some noticeable playing difficulty for a few players.

While communication directivity did not appear to affect performances as measured acoustically, the only time it did become relevant concerned unison performances, as impaired communication was judged to be detrimental to musicians' performance. In a similar way, although room-acoustical effects related to blend could not be deduced from the acoustic measures, the performance ratings revealed more pronounced effects between performer roles in the smaller, less reverberant room. These effects suggest that performer coordination between instruments was more critical in the smaller room, which may have allowed more subtle differences to become audible. Indeed, temporal coordination for one spectral measure was found to be higher for unison and the smaller room, although this remained limited to global spectral change (ΔSC) and bassoons.

Several indications suggest that musicians improved their coordination throughout a performance. The temporal coordination for both instruments improved in the later phrase for both dynamics (Lrms) and global spectral change (ΔSC) by up to 5% and 8%, respectively. It should be noted that while median values of temporal coordination (XC) across both measures and instruments were comparable, r ≈ .24, they indicate a fairly weak positive correlation, which suggests that timbre-related performer coordination does not operate at a fine time resolution but only appears to apply to larger time segments, such as first vs. second phrase. Furthermore, the assessment of temporal change suggests that even leaders adjust their timbre. For instance, regardless of assigned role, horn players slightly reduced their main-formant frequencies and dynamic level in the second phrase. Although these changes were of considerably smaller magnitude than the ones between performer roles (e.g., 5 vs. 30 Hz), and likely of lesser perceptual salience, performer coordination appears to motivate adjustments by both musicians to a limited degree. Overall, this result both suggests that performer coordination adapts over time, ideally leading to an improvement, and that the reference function of leaders still allows for a certain degree of bilateral adjustment between performers. As there was no indication that performer coordination was modulated by either communication impairment or performance venue among the acoustic measures, the strategies musicians employ in achieving blend appear to be fairly robust to acoustical factors.

This investigation represents a case study by featuring two instruments that commonly form a blended timbre in the orchestral literature. Given the high timbral similarity between bassoon and horn, an effect of performer roles was obtained across both instruments, i.e., regardless of which was leading in performance, whereas obtaining a role-based effect would become less likely when there are starker differences between instrument timbres. In the latter scenario, the more dominant timbre would seem predisposed to assume the lead and serve as the reference, into which the other instrument would either succeed or fail to blend. This case concerns what Sandell (1995) referred to as the augmented timbre, in which a dominant instrument is timbrally enriched by another instrument. With this case being a common goal in orchestration, its success depends on the ability of the other instrument to blend into the context defined by the reference. Either its spectral envelope lacks any prominent features that would otherwise "challenge" the dominant instrument or it bears a sufficiently high resemblance to the latter. In the current investigation, both instrument timbres were similar, yet the greater timbral versatility of the horn allowed it to blend into a bassoon sound (see Figure 10), whereas the bassoon would not have succeeded in adjusting towards a more brilliant or "brassy" timbre in return. This imbalance in timbral adjustments, paired with instrument-specific issues related to the playability of parts, could explain the differences in performance ratings between instruments. For example, hornists generally gave higher ratings of their performances as leaders than as followers, which could be linked to the greater ease of playing in their default timbre as leaders, as opposed to having to adjust to a substantially darker timbre as followers. This implies that even in this common pairing, the horn may generally assume the more dominant role over bassoons, which also manifests itself in the orchestral repertoire. Their combination in unison is in fact less common, likely explained by their high similarity not adding much timbral enrichment, whereas their combination in non-unison is widespread. In the latter cases, bassoons are often substituted for missing horns, because up to the mid-nineteenth century, orchestras generally only included two horns. The addition of bassoons overcame this limitation, as is also the case in the investigated orchestral passage by Mendelssohn-Bartholdy. In practice, bassoonists more often find themselves blending into the horn timbre than vice versa.

Despite the various scenarios concerning instrument combinations as well as dominance or role relationships, a common rule seems to apply to all: In attaining perceptual blend, the accompanying instrument darkens its timbre in order to avoid "outshining" the leading, dominant instrument. In other words, when an accompanying instrument blends into the leading instrument, it adopts a strategy of remaining subdued and low-key, very similar to how it subordinates itself to the lead instrument's cues for intonation, timing, and phrasing.

Conclusion

The current investigation showcases how the orchestration goal of achieving blended timbres is mediated by factors related to musical performance. For instrument combinations exhibiting similar timbres (e.g., bassoon and horn), the assignment of performer roles may determine which instrument serves as a reference toward which accompanying musicians adapt their timbre to be darker. In an arbitrary combination of instruments, a possible dominance of one timbre likely biases that instrument toward assuming the reference and leading role, requiring that another instrument be able to blend in, otherwise resulting in a heterogeneous timbre.

With respect to previous research on musical performance, the current findings illustrate a case in which performer coordination, as related to concepts like joint action and leadership, directly applies to performers' control of timbre. Achieving a blended timbre requires coordinated action in which an orchestrator's intention becomes the common aim of two or more performers, involving strategies based on relative performer roles that ensure the idealized goal is realized. Standing in the limelight of performance, leading musicians assume the responsibility for the accurate and expressive delivery of musical ideas, whereas the accompanist's primary concern is to blend in, and if successful, remain somewhat obscured in the lead instrument's timbral shadow.

Author Note

The authors would like to acknowledge CIRMMT for student awards to Scott Levine and Sven-Amin Lembke and for the use of its equipment and facilities. We would like to expressly thank its technical staff, Harold Kilianski, Yves Méthot, and Julien Boissinot, for their technical advice and assistance. The authors also thank Martha de Francisco from the Sound Recording program at the Schulich School of Music, McGill University, for her co-supervision and valuable feedback throughout the project. This research was partly funded by a Canadian Natural Sciences and Engineering Research Council grant (RGPIN 312774-2010) and a Canada Research Chair to Stephen McAdams. We thank three anonymous reviewers for valuable comments on earlier versions of this article.

Correspondence concerning this article should be addressed to Sven-Amin Lembke, currently at De Montfort University, Clephan Building, Room 00.07b, Leicester LE1 9BH, United Kingdom; email: [email protected]

References

Benade, A. H. (1976). Fundamentals of musical acoustics. New York: Oxford University Press.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.

D'Ausilio, A., Badino, L., Li, Y., Tokay, S., Craighero, L., Canto, R., Aloimonos, Y., & Fadiga, L. (2012). Leadership in orchestra emerges from the causal relationships of movement kinematics. PloS One, 7(5), e35757.

Doval, B., & Rodet, X. (1991). Estimation of fundamental frequency of musical sound signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. V-3657–V-3660). Toronto: IEEE ICASSP.

Fabiani, M., & Friberg, A. (2011). Influence of pitch, loudness, and timbre on the perception of instrument dynamics. Journal of the Acoustical Society of America, 130, EL193-199.

Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.

Fulford, R., Hopkins, C., Seiffert, G., & Ginsborg, J. (2014). Sight, sound and synchrony: Effects of attenuating auditory feedback on duo violinists' behaviours in performance. In Proceedings of the 13th International Conference on Music Perception and Cognition (pp. 166-171). Seoul, Korea: ICMPC.

Goad, P. J., & Keefe, D. H. (1992). Timbre discrimination of musical instruments in a concert hall. Music Perception, 10, 369-404.

Goebl, W., & Palmer, C. (2009). Synchronization of timing and motion among performing musicians. Music Perception, 26, 427-438.

Goodman, E. (2002). Ensemble performance. In J. Rink (Ed.), Musical performance: A guide to understanding (pp. 153-167). Cambridge, UK: Cambridge University Press.

Goodwin, A. W. (1980). An acoustical study of individual voices in choral blend. Journal of Research in Music Education, 28, 119-128.

Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational and Behavioral Statistics, 17, 315-339.

Keller, P. E. (2008). Joint action in music performance. In F. Morganti, A. Carassa, & G. Riva (Eds.), Enacting intersubjectivity (pp. 205-221). Amsterdam: IOS Press.

Keller, P. E. (2014). Ensemble performance: Interpersonal alignment of musical expression. In D. Fabian, R. Timmers, & E. Schubert (Eds.), Expressiveness in music performance: Empirical approaches across styles and cultures (pp. 260-282). Oxford: Oxford University Press.

Keller, P. E., & Appel, M. (2010). Individual differences, auditory imagery, and the coordination of body movements and sounds in musical ensembles. Music Perception, 28, 27-46.

Kendall, R. A., & Carterette, E. C. (1991). Perceptual scaling of simultaneous timbres. Music Perception, 8, 43-62.

Kendall, R. A., & Carterette, E. C. (1993). Identification and blend of timbres as a basis for orchestration. Contemporary Music Review, 9, 51-67.

Kendall, R. A., & Carterette, E. C. (1996). Difference thresholds for timbre related to spectral centroid. In Proceedings of the 4th International Conference on Music Perception and Cognition (pp. 166-171). Montreal, Canada: ICMPC.

Kewley-Port, D., & Watson, C. S. (1994). Formant-frequency discrimination for isolated English vowels. Journal of the Acoustical Society of America, 95, 485-496.

Koechlin, C. (1954). Traité de l'orchestration: En quatre volumes [Treatise of orchestration: in four volumes]. Paris: M. Eschig.

Lembke, S.-A., & McAdams, S. (2015). The role of spectral-envelope characteristics in perceptual blending of wind-instrument sounds. Acta Acustica United with Acustica, 101, 1039-1051.

McAdams, S., & Goodchild, M. (2017). Musical structure: Sound and timbre. In R. Ashley & R. Timmers (Eds.), The Routledge companion to music cognition (pp. 129-139). New York: Routledge.

Meyer, J. (2009). Acoustics and the performance of music (5th ed.). New York: Springer.

Moore, B. C., & Moore, G. A. (2003). Discrimination of the fundamental frequency of complex tones with fixed and shifting spectral envelopes by normally hearing and hearing-impaired subjects. Hearing Research, 182, 153-163.

Nakamura, T. (1987). The communication of dynamics between musicians and listeners through musical performance. Perception and Psychophysics, 41, 525-533.

Papiotis, P., Marchini, M., Perez-Carrillo, A., & Maestre, E. (2014). Measuring ensemble interdependence in a string quartet through analysis of multidimensional performance data. Frontiers in Psychology, 5, 963.

Paul, S. (2009). Binaural recording technology: A historical review and possible future developments. Acta Acustica United with Acustica, 95, 767-788.

Peeters, G., Giordano, B. L., Susini, P., Misdariis, N., & McAdams, S. (2011). The Timbre Toolbox: Extracting audio descriptors from musical signals. Journal of the Acoustical Society of America, 130, 2902-2916.

Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical contexts. Psychology of Music, 18, 60-72.

Rasch, R. A. (1988). Timing and synchronization in ensemble performance. In J. A. Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and composition (pp. 70-90). Oxford: Clarendon Press.

Reuter, C. (1996). Die auditive Diskrimination von Orchesterinstrumenten - Verschmelzung und Heraushörbarkeit von Instrumentalklangfarben im Ensemblespiel [The auditory discrimination of orchestral instruments: Fusion and distinguishability of instrumental timbres in ensemble playing]. Frankfurt am Main: P. Lang.

Rimsky-Korsakov, N. (1964). Principles of orchestration. New York: Dover Publications.

Rodet, X., & Schwarz, D. (2007). Spectral envelopes and additive + residual analysis/synthesis. In J. W. Beauchamp (Ed.), Analysis, synthesis, and perception of musical sounds (pp. 175-227). New York: Springer.

Sandell, G. J. (1991). Concurrent timbres in orchestration: A perceptual study of factors determining blend (Unpublished doctoral dissertation). Northwestern University.

Sandell, G. J. (1995). Roles for spectral centroid and other factors in determining "blended" instrument pairings in orchestration. Music Perception, 13, 209-246.

Tardieu, D., & McAdams, S. (2012). Perception of dyads of impulsive and sustained instrument sounds. Music Perception, 30, 117-128.

Ternström, S. (2003). Choir acoustics – An overview of scientific research published to date. International Journal of Research in Choral Singing, 1, 3-12.

Villavicencio, F., Röbel, A., & Rodet, X. (2006). Improving LPC spectral envelope extraction of voiced speech by true-envelope estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. I-869–I-872). Toulouse: IEEE ICASSP.

Zwicker, E., & Fastl, H. (1999). Psychoacoustics: Facts and models (2nd ed.). Berlin: Springer.