
Experiment Vocals. Speech and Music in Carl Stumpf's Papers and Phono-Objects

Loescher, NN

Introduction

Timbre has been regarded as a key component of aesthetic 'coding' and perception in music since the days of Helmholtz. A famous example is Ravel's Boléro, which repeats the same theme (identical in rhythm and pitch) on different instruments, thus changing the semantics of the aesthetic message along the 'gliding' timbre. To the present day there is no definition of timbre except the negative one (ex negativo): timbre is that in which two tones of equal pitch and loudness still differ, for instance when played on different instruments (American Standards Association, 1951, 23). In contrast to pitch and loudness, timbre cannot be measured on a one-dimensional scale; it is, per se, a multidimensional phenomenon. That is why multidimensional scaling of musical timbre typically works with a three-dimensional timbre space (McAdams, 2013). Moreover, timbre categories are aesthetic in their very wording in the psychological literature. Not only musical instruments have timbre: voices have it, too. Several experimental designs have confirmed an effect of voiced sounds on encoding in working memory and/or on forced-choice performance: designs that manipulated the spectral envelope of voice signals; that simulated voice, for example with formant sweeps containing an "interspersed steady state portion" (Husain et al, 2003); that used noise envelopes correlated with sound curves; or that presented human voiceless, human voiced, and environmental sounds pairwise in blocks (Belin, 2000; Halpern et al, 2004). Timbre is a prime candidate for this differentia specifica because it is the one auditory feature that the 'mock stimuli' in these experiments lacked. The topic of timbre in voice (and thus, among other things, in speech) is underrepresented in the literature because psychologists interested in acoustics, in contrast to 'empirical' musicologists, are not inclined to deal with its aesthetic implications.
This is surprising because speech sounds and musical tones have a lot in common regarding their sound signals: both are organized in terms of a fundamental and partials, which are 'layered' in a harmonic overtone spectrum¹. Consequently, because of the fundamental, both sound signals have a pitch². Speech vowels do correspond to musical pitch, as Plomp noted in the seventies, quoting Carl Stumpf (Plomp, 1973, 97). The most striking similarity, though, is the spectral envelope, that is, the distribution of energy across the frequency bands of the sound signal and its change over time. It is generally agreed that the 'profile' of the spectral envelope is an indicator of timbre, and thus of aesthetic coding and perception (Koelsch, 2011; Zatorre et al, 2004). This might be due to:

a, spectral flux, that is, vertical glides of the formants in the spectrum over time (formant sweeps). Spectral flux is usually the reason why an auditory object is perceived as 'poignant', 'granular', 'clear', or 'distinctive'.

b, the spectral centroid, that is, the amplitude-weighted mean frequency of all frequency bands or 'active' harmonics. Some musical instruments are brighter than others, that is, they have a higher spectral centroid. Voices bear this feature, too.

c, roughness due to beats between adjacent 'active' frequencies (in musical terms: inharmonic complex tones). Roughness is an important component of musical (and vocal) tension and relaxation, of generating a 'steady state' of emergent sound signals in terms of an auditory/aesthetic object³.

¹ With vowels (and consonants) these partials are called formants.

² To be more precise: there is a 'fingerprint' of the fundamental in the partials, rendering the fundamental superfluous for pitch detection.
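As a rough illustration of features a and b, the following Python sketch computes a spectral centroid and a simple spectral flux for two synthetic harmonic tones. Everything in it is an assumption made for illustration: the sample rate, frame length, harmonic amplitudes, function names, and the particular flux definition (Euclidean distance between unit-normalized magnitude spectra) are not taken from the studies cited here. Roughness (c) is left out because it requires a psychoacoustic model of beating partials.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, illustrative value
FRAME_LEN = 256     # samples per analysis frame (bin spacing 31.25 Hz)


def harmonic_tone(f0, amplitudes, n=FRAME_LEN, sr=SAMPLE_RATE):
    """Synthesize one frame as a sum of harmonics of f0 with given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sr)
            for k, a in enumerate(amplitudes))
        for t in range(n)
    ]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for a demo)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, n=FRAME_LEN, sr=SAMPLE_RATE):
    """Amplitude-weighted mean frequency of the spectrum, in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum((k * sr / n) * m for k, m in enumerate(mags)) / total


def spectral_flux(mags_a, mags_b):
    """Euclidean distance between two unit-normalized magnitude spectra."""
    def unit(v):
        norm = math.sqrt(sum(m * m for m in v)) or 1.0
        return [m / norm for m in v]
    a, b = unit(mags_a), unit(mags_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same pitch (250 Hz) and the same four harmonics, but energy is
# concentrated either in the low or in the high partials.
dull = magnitude_spectrum(harmonic_tone(250, [1.0, 0.5, 0.25, 0.125]))
bright = magnitude_spectrum(harmonic_tone(250, [0.125, 0.25, 0.5, 1.0]))

centroid_dull = spectral_centroid(dull)
centroid_bright = spectral_centroid(bright)
print("centroid (dull tone):  ", round(centroid_dull, 1), "Hz")
print("centroid (bright tone):", round(centroid_bright, 1), "Hz")
print("flux dull -> bright:   ", round(spectral_flux(dull, bright), 3))
```

The two tones share their fundamental, so they have the same pitch, yet the second has the higher centroid. This is the sense in which a voice or instrument can be 'brighter' at equal pitch. For recorded speech one would compute these values frame by frame over the whole signal.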

We assume that there is a reliable analogy between the essentially aesthetic coding and perception of spectral envelope changes in music and that in (speaking) voices. In the proposed sequence of experiments we will concentrate on the latter.

Methods/Experiments

Experiment 1. We will measure female and male timbre (especially formant sweeps) in single sentences of professional speakers (historical, present). These data will be correlated with those of neutral speakers (historical, present). We will equalize the energy spectrum, loudness, and pitch of the sound waves in order to isolate timbre as the independent variable (using acoustic feature extraction techniques; see Alluri et al, 2012). For a first analysis, the software Praat will be used to visualize sound spectra, formant distribution, etc. For more elaborate analyses we will turn to the MIRtoolbox (Matlab), which computes a set of timbre features from audio input. Finally, we will conduct multidimensional scaling of timbre in 'aesthetic'/'neutral' voices in line with studies that scale musical timbre (MDS via similarity/dissimilarity ratings of subjects; see McAdams, 2013).
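The scaling step can be sketched as follows. This is a minimal example of classical (Torgerson) MDS using NumPy, under stated assumptions: the dissimilarity matrix is generated from hypothetical voice positions rather than from listener ratings, the function and variable names are illustrative, and the timbre studies cited above may use other MDS variants.

```python
import numpy as np


def classical_mds(dissimilarity, n_dims=3):
    """Classical (Torgerson) MDS: place items in n_dims dimensions so that
    Euclidean distances approximate a symmetric dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_dims]      # keep the largest ones
    top_vals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical positions of five voices in a three-dimensional timbre space.
# In the experiment, the matrix below would instead hold averaged
# dissimilarity ratings from listeners.
hypothetical_positions = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 1.0, 1.0],
])
dissimilarity = pairwise_distances(hypothetical_positions)

coords = classical_mds(dissimilarity, n_dims=3)
print(np.round(pairwise_distances(coords), 3))
```

The recovered coordinates are only determined up to rotation and reflection, so the axes carry no meaning by themselves. Interpreting them (for example as brightness or flux) requires relating each dimension to acoustic features measured on the same stimuli.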

Experiment 2. If experiment 1 shows a reliable effect (timbre in aesthetic performance > timbre in neutral performance), we will design an imaging study (fMRI) along the lines of the voice recognition study by Belin et al (Nature, 403, 2000). Belin's guiding question, "How do humans decode voices?", is modified to: "How do humans decode aesthetic features of voices?" We will use the stimuli of experiment 1. Stimuli will be presented pairwise (aesthetic/neutral performance). Task 1 will be listening during stimulus presentation after a prompt ("Bright"/"Distinct"/"Rough") has been given. Task 2 will be a forced-choice paradigm ("Is the second sound sequence brighter/rougher/more distinct than the first?"). In line with previous studies of timbre processing, we expect activation of the planum temporale (Griffiths et al, 2002) and parts of Heschl's gyrus relative to the baseline of scanner noise activation. We expect task-related neuronal activation patterns and behavioral data to correlate significantly with the analyses and the scaling of experiment 1.

³ This is, of course, my adaptation of Lerdahl and Jackendoff's famous book "A Generative Theory of Tonal Music". See their chapter on 'prolongational reduction'.

References

Alluri, Vinoo; Petri Toiviainen; Iiro Jääskeläinen; Enrico Glerean; Mikko Sams; Elvira Brattico: "Large-scale brain networks emerge from dynamic processing of musical timbre, key, and rhythm", NeuroImage, vol. 59, 2012, 3677-3689.

American Standards Association: "Acoustical Terminology", New York, 1951.

Boersma, Paul; David Weenink: "Praat. Doing phonetics by computer", 2013. http://www.fon.hum.uva.nl/praat/

Fueller, Carina; Jens Loescher; Peter Indefrey: "Writing superiority in cued recall", Frontiers in Psychology, vol. 4, no. 00764, 2013. http://www.frontiersin.org/cognitive_science/10.3389/fpsyg.2013.00764/abstract

Griffiths, Timothy D.; Jason D. Warren: "The planum temporale as a computational hub", Trends in Neurosciences, vol.25, no.7, 2002, 348-353.

Griffiths, Timothy D.; Jason D. Warren: "What is an auditory object?", Nature Reviews Neuroscience, vol. 5, 2004.

Halpern, Andrea R.; Robert J. Zatorre; Marc Bouffard; Jennifer A. Johnson: "Behavioral and neural correlates of perceived and imagined musical timbre", Neuropsychologia, vol. 42, 2004, 1281–1292.

Koelsch, Stefan: "Toward a neural basis of music perception – a review and updated model", Frontiers in Psychology, vol. 2, 2011.

Lange, Kathrin; Daniela Czernochowski: "Does this sound familiar? Effects of timbre change on episodic retrieval of novel melodies", Acta Psychologica, vol.143, 2013, 136-145.

Lartillot, Olivier; Petri Toiviainen; Tuomas Eerola: MIRtoolbox, 2013. https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox

Latinus, Marianne; Pascal Belin: "Human Voice Perception", Current Biology, vol.21, no.4, 2011.

McAdams, Stephen: "Musical Timbre Perception", The Psychology of Music, ed. by Diana Deutsch, Elsevier, 2013.

Nolden, Sophie; Patrick Bermudez; Kristelle Alunni-Menichini; Christine Lefebvre; Stephan Grimault; Pierre Jolicoeur: "Electrophysiological correlates of the retention of tones differing in timbre in auditory short-term memory", Neuropsychologia, vol. 51, 2013, 2740-2746.

Zatorre, Robert J.; Marc Bouffard; Pascal Belin: "Sensitivity to Auditory Object Features in Human Temporal Neocortex", Journal of Neuroscience, vol 24, no.14, 2004.

Appendix

1. Karl Kraus: "Die letzten Tage der Menschheit", original recording, 1921. Deutsches Literaturarchiv Marbach. "Zahlt's ihnen heim, ihr Götter! Sei's darum!" ("Pay them back, you gods! So be it!")

2. Neutral speaker: "Ich gehe in den Wald und nehme mein Fahrrad mit" ("I am going into the forest and taking my bicycle along")

Sound curves of recordings 1 and 2, generated with Praat. [Two panels; horizontal axis: Time (s).]

Spectral envelopes of 1. Kraus and 2. Neutral speaker. Note the large differences in spectral density and distribution. [Two panels; panel label: Track_No02_short; horizontal axis: Time (s).]