<<

AN ENERGY-BASED GENERATIVE SEQUENCE MODEL FOR TESTING SENSORY THEORIES OF WESTERN HARMONY

Peter M. C. Harrison
Cognitive Science Research Group, Queen Mary University of London

Marcus T. Pearce
Cognitive Science Research Group, Queen Mary University of London

arXiv:1807.00790v1 [cs.SD] 2 Jul 2018

ABSTRACT

The relationship between sensory consonance and Western harmony is an important topic in music theory and psychology. We introduce new methods for analysing this relationship, and apply them to large corpora representing three prominent genres of Western music: classical, popular, and jazz music. These methods centre on a generative sequence model with an exponential-family energy-based form that predicts chord sequences from continuous features. We use this model to investigate one aspect of instantaneous consonance (harmonicity) and two aspects of sequential consonance (spectral distance and voice-leading distance). Applied to our three musical genres, the results generally support the relationship between sensory consonance and harmony, but lead us to question the high importance attributed to spectral distance in the psychological literature. We anticipate that our methods will provide a useful platform for future work linking music psychology to music theory.

1. INTRODUCTION

Music theorists and psychologists have long sought to understand how Western harmony may be shaped by natural phenomena universal to all humans [13, 27, 36]. Key to this work is the notion of sensory consonance, describing a sound's natural pleasantness [32, 35, 38], and its inverse sensory dissonance, describing natural unpleasantness.

Sensory consonance has both instantaneous and sequential aspects. Instantaneous consonance is the consonance of an individual sound, whereas sequential consonance is a property of a progression between sounds.

Instantaneous sensory consonance primarily derives from roughness and harmonicity. Roughness is an unpleasant sensation caused by interactions between spectral components in the inner ear [8, 41], whereas harmonicity¹ is a pleasant percept elicited by a sound's resemblance to the harmonic series [4, 20].

Sequential sensory consonance is primarily determined by spectral distance and voice-leading distance. Spectral distance² describes how much a sound's acoustic spectrum perceptually differs from neighbouring spectra [22–24, 27, 29]. Voice-leading distance³ describes how far notes in one chord have to move to produce the next chord [2, 39, 40]. Consonance is associated with low spectral and voice-leading distance.

Many Western harmonic conventions can be rationalized as attempts to increase pleasantness by maximizing sensory consonance. The major triad maximizes consonance by minimizing roughness and maximizing harmonicity; the circle of fifths maximizes consonance by minimizing spectral distance; tritone substitutions are consonant through voice-leading efficiency [39].

This idea – that Western harmony seeks to maximize sensory consonance – has a long history in music theory [31]. Its empirical support is surprisingly limited, however. The best evidence comes from research linking sensory consonance maximization to rules from music theory [15, 27, 39], but this work is constrained by the subjectivity and limited scope of music-theoretic textbooks.

A better approach is to bypass textbooks and analyse musical scores directly. Usefully, large datasets of digitised musical scores are now available, as are many computational models of consonance. However, statistically linking them is non-trivial. One could calculate distributions of consonance features, but this would give only limited causal insight into how these distributions arise. Better insight might be achieved by regressing transition probabilities against consonance features, but this approach is statistically problematic because of variance heterogeneity induced by the inevitable sparsity of the transition tables. This paper presents a new statistical model developed for tackling this problem.
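The sparsity problem is easy to demonstrate. A minimal sketch (the vocabulary size and corpus length are hypothetical, and the uniform random sequence is a deliberately crude stand-in for a real chord corpus):

```python
import numpy as np

rng = np.random.default_rng(0)

n_chords = 200   # hypothetical chord vocabulary size
n_events = 5000  # hypothetical corpus length (chord tokens)

# Draw a random chord sequence and tally first-order transitions.
seq = rng.integers(0, n_chords, size=n_events)
counts = np.zeros((n_chords, n_chords), dtype=int)
for prev, nxt in zip(seq[:-1], seq[1:]):
    counts[prev, nxt] += 1

# With 40,000 cells and only 4,999 observed transitions, most cells
# are empty, so cell-wise probability estimates are extremely noisy.
empty = np.mean(counts == 0)
print(f"{empty:.1%} of transition cells have zero observations")
```

Regression on such cell-wise estimates inherits this heterogeneous variance, which motivates a model that shares statistical strength across the whole transition table.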
The model is generative and feature-based, defining a probability distribution over symbolic sequences based on features derived from these sequences. Unlike previous feature-based sequence models, it is specialized for continuous features, making it well-suited to consonance modelling. Moreover, the model parameters are easily interpretable and have quantifiable uncertainty, enabling error-controlled statistical inference.

We use this new model to test sensory theories of harmony as follows. We fit the model to corpora of chord sequences from classical, popular, and jazz music, using psychological models of sensory consonance as features. We then compute feature importance metrics to quantify how different aspects of consonance constrain harmonic movement. This work constitutes the first corpus analysis comprehensively linking sensory consonance to harmonic practice.

¹ Related concepts include tonalness [27], toneness [15], fusion [14, 36], complex sonorousness [29], and multiplicity [29].
² Spectral distance is also known by its antonym spectral similarity [23]. Pitch commonality [29] is a similar concept. Psychological models of harmony in the auditory short-term memory (ASTM) tradition typically rely on some kind of spectral distance measure [1, 7, 17].
³ Voice-leading distance is termed horizontal motion in [2]. Parncutt's notion of pitch distance [28, 29] is also conceptually similar to voice-leading distance.

© Peter M. C. Harrison, Marcus T. Pearce. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Peter M. C. Harrison, Marcus T. Pearce. "An energy-based generative sequence model for testing sensory theories of Western harmony", 19th International Society for Music Information Retrieval Conference, Paris, France, 2018.

2. METHODS

2.1 Representations

2.1.1 Input

Chord progressions are represented as sequences of pitch-class sets. Exact chord repetitions are removed, but changes of chord inversion are represented as repeated pitch-class sets.

2.1.2 Pitch-Class Spectra

Some of our features use pitch-class spectra as defined in [22, 24]. A pitch-class spectrum is a continuous function that describes perceptual weight as a function of pitch class (p_c). Perceptual weight is the strength of perceptual evidence for a given pitch class being present. Pitch classes (p_c) take values in the interval [0, 12) and relate to frequency (f, Hz scale) as follows:

    p_c = \left( 9 + 12 \log_2 \frac{f}{440} \right) \bmod 12.    (1)

Pitch-class sets are transformed to pitch-class spectra by expanding each pitch class into its implied harmonics. Pitch classes are modelled as harmonic complex tones with 12 harmonics, after [22]. The jth harmonic in a pitch class has level j^{-\rho}, where \rho is the roll-off parameter (\rho > 0). Partials are represented by Gaussians with mass equal to partial level, mean equal to partial pitch class, and standard deviation \sigma. Perceptual weights combine additively.

Formally, W(p_c, X) defines a pitch-class spectrum, returning the perceptual weight at pitch class p_c for an input pitch-class set X = \{x_1, x_2, \ldots, x_m\}:

    W(p_c, X) = \sum_{i=1}^{m} T(p_c, x_i).    (2)

Here i indexes the pitch classes, and T(p_c, x) is the contribution of a harmonic complex tone with fundamental pitch class x to an observation at pitch class p_c:

    T(p_c, x) = \sum_{j=1}^{12} g\left(p_c, j^{-\rho}, h(x, j)\right).    (3)

Now j indexes the harmonics, g(p_c, l, p_x) is the contribution from a harmonic with level l and pitch class p_x to an observation at pitch class p_c,

    g(p_c, l, p_x) = \frac{l}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{d(p_c, p_x)}{\sigma} \right)^2 \right),    (4)

d(p_x, p_y) is the distance between two pitch classes p_x and p_y,

    d(p_x, p_y) = \min\left( |p_x - p_y|,\; 12 - |p_x - p_y| \right),    (5)

and h(x, j) is the pitch class of the jth partial of a harmonic complex tone with fundamental pitch class x:

    h(x, j) = (x + 12 \log_2 j) \bmod 12.    (6)

\rho and \sigma are set to 0.75 and 0.0683 after [22].

2.2 Features

2.2.1 Spectral Distance

Spectral distance is operationalised using the psychological model of [22, 24]. The spectral distance between two pitch-class sets X, Y is defined as 1 minus the continuous cosine similarity between the two pitch-class spectra:

    D(X, Y) = 1 - \frac{\int_0^{12} W(z, X)\, W(z, Y)\, dz}{\sqrt{\int_0^{12} W(z, X)^2\, dz}\, \sqrt{\int_0^{12} W(z, Y)^2\, dz}}    (7)

with W as defined in Equation 2. The measure takes values in the interval [0, 1], where 0 indicates maximal similarity and 1 indicates maximal divergence.

2.2.2 Harmonicity

Our harmonicity model is inspired by the template-matching algorithms of [21] and [29]. The model simulates how listeners search the auditory spectrum for occurrences of harmonic spectra. These inferred harmonic spectra are termed virtual pitches. High harmonicity corresponds to a strong virtual pitch percept.

Our model differs from previous models in two ways. First, it uses a pitch-class representation, not a pitch representation. This makes it voicing-invariant and hence more suitable for modelling pitch-class sets. Second, it takes into account the strength of all virtual pitches in the spectrum, not just the strongest virtual pitch.

The model works as follows. The virtual pitch-class spectrum Q defines the spectral similarity of the pitch-class set X to a harmonic complex tone with pitch class p_c:

    Q(p_c, X) = 1 - D(\{p_c\}, X)    (8)

with D as defined in Equation 7. Normalising Q to unit mass produces Q':

    Q'(p_c, X) = \frac{Q(p_c, X)}{\int_0^{12} Q(y, X)\, dy}.    (9)

Previous models compute harmonicity by taking the peak of this spectrum. In our experience this works for small chords but not for larger chords, where several virtual pitches need to be accounted for. We therefore instead compute a spectral peakiness measure. Several such measures are possible, but here we use Kullback-Leibler divergence from a uniform distribution. H(X), the harmonicity of a pitch-class set X, can therefore be written as follows:

    H(X) = \int_0^{12} Q'(y, X) \log_2\left( 12\, Q'(y, X) \right) dy.    (10)

Harmonicity has a large negative correlation with the number of notes in a chord. Some correlation is expected, but not to this degree: the harmonicity model considers a tritone (the least consonant two-note chord) to be more consonant than a major triad (the most consonant three-note chord). We therefore separate the two phenomena by adding a 'chord size' feature, corresponding to the number of notes in a given chord, and rescaling harmonicity to zero mean and unit variance across all chords with a given chord size.

2.2.3 Roughness

Roughness has traditionally been considered to be an important contributor to sensory consonance, though some recent research disputes its importance [20]. We originally planned to include roughness in our model, but then discovered that the phenomenon is highly sensitive to chord voicing. Since the voicing of a pitch-class set is undefined, its roughness is therefore unpredictable. Roughness is therefore not modelled in the present study.

2.2.4 Voice-Leading Distance

A voice leading connects the individual notes in two pitch-class sets to form simultaneous melodies [39]. Pitch-class sets of different sizes can be connected by allowing pitch classes to participate in multiple melodies. Voice-leading distance is an aggregate measure of the resulting melodic distance. We operationalise voice-leading distance using the geometric model of [39].

Consider two pitch-class sets X = \{x_1, x_2, \ldots, x_m\} and Y = \{y_1, y_2, \ldots, y_n\}. A voice leading between X and Y can be written A \to B, where A = (a_1, a_2, \ldots, a_N), B = (b_1, b_2, \ldots, b_N), and the following holds: if x \in X then x \in A, if y \in Y then y \in B, if a \in A then a \in X, if b \in B then b \in Y, and n \le N.

The distance of the voice leading A \to B is denoted V(A, B) and uses the taxicab norm:

    V(A, B) = \sum_{i=1}^{N} d(a_i, b_i)    (11)

with d(a_i, b_i) as defined in Equation 5.

The voice-leading distance between pitch-class sets X, Y is then defined as the smallest value of V(A, B) over all legal A, B. This minimal distance can be efficiently computed using the algorithm described in [39].

2.2.5 Summary

This section defined three sensory consonance features. These included one instantaneous measure (harmonicity) and two sequential measures (spectral distance, voice-leading distance). Harmonicity correlated strongly with chord size, which could have confounded our analyses. We therefore controlled for chord size by normalising harmonicity for each chord size and including chord size as a feature.

2.3 Statistical Model

2.3.1 Overview

The statistical model is generative, defining a probability distribution over chord sequences (e.g. [12, 25, 33]). It is feature-based, using features of the chord and its context to predict chord probabilities (e.g. [12]). It is energy-based, defining scalar energies for each feature configuration which are then transformed and normalised to produce the final probability distribution (e.g. [3, 10, 30]). It is exponential-family in that the energy function is a linear function of the feature vector (e.g. [10, 30]). Informally, the model might be said to generalise linear regression to symbolic sequences.

2.3.2 Form

Let A denote the set of all possible chords, and let e_0^n denote a chord sequence of length n, where e_0 is always a generic start symbol. Let e_i \in A denote the ith chord and e_i^j the subsequence (e_i, e_{i+1}, \ldots, e_j). Let w be the weight vector that parametrises the model.

The probability of a chord sequence is factorised into a chain of conditional chord probabilities:

    P(e_0^n \mid w) = \prod_{i=1}^{n} P\left( e_i \mid e_0^{i-1}, w \right)    (12)

These are given energy-based expressions:

    P\left( e_i \mid e_0^{i-1}, w \right) = \frac{\exp\left( -E(e_0^{i-1}, e_i, w) \right)}{Z(e_0^{i-1}, w)}    (13)

where E is the energy function and Z is the partition function. Z normalises the probability distribution to unit mass:

    Z(e_0^{i-1}, w) = \sum_{x \in A} \exp\left( -E(e_0^{i-1}, x, w) \right).    (14)

High E corresponds to low probability. E is defined as a sum of feature functions, f_j, weighted by -w:

    E(e_0^{i-1}, x, w) = - \sum_{j=1}^{m} f_j(e_0^{i-1} :: x)\, w_j    (15)

where w_j is the jth component of w, m is the dimensionality of w, equalling the number of feature functions f_j, and e_0^{i-1} :: x is the concatenation of e_0^{i-1} and x \in A.

Feature functions measure a property of the last element of a sequence. Our feature functions are chord size, harmonicity, spectral distance, and voice-leading distance.
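Equations 13–15 amount to a softmax over negated linear energies. The following sketch (hypothetical feature values; not the authors' R/C++ implementation) computes the conditional distribution over candidate chords, where row x of `features` holds f_1, ..., f_m evaluated on the context extended by candidate chord x:

```python
import numpy as np

def chord_distribution(features, w):
    """P(x | context) for every candidate chord x (Equations 13-15).

    features: (|A|, m) array; row x holds f_1..f_m evaluated on the
              context extended by candidate chord x.
    w:        (m,) weight vector.
    """
    energies = -features @ w        # Equation 15: E = -sum_j f_j w_j
    unnorm = np.exp(-energies)      # numerator of Equation 13
    return unnorm / unnorm.sum()    # divide by Z (Equation 14)

# Hypothetical example: 3 candidate chords, 2 features.
features = np.array([[0.2, 1.0],
                     [0.9, 0.1],
                     [0.5, 0.5]])
w = np.array([1.5, -0.8])  # positive weight promotes the first feature
p = chord_distribution(features, w)
print(p, p.sum())
```

For numerical stability a practical implementation would subtract the maximum of `features @ w` before exponentiating; the normalisation is unaffected.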
Chord size and harmonicity are context-independent, whereas spectral and voice-leading distance relate the last chord to the penultimate chord. When the penultimate chord is undefined, mean values are imputed for spectral and voice-leading distance, with the mean computed over all possible chord transitions.

2.3.3 Estimation

The model is parametrised by the weight vector w. This weight vector is optimised using maximum-likelihood estimation on a corpus of sequences, as follows.

Let e_{0,k}^{n_k} denote the kth sequence from a corpus of size N, where n_k is the sequence's length. The negative log-likelihood of the weight vector w with respect to the corpus is then

    C(w) = -\sum_{k=1}^{N} \sum_{i=1}^{n_k} \log P\left( e_{i,k} \mid e_{0,k}^{i-1}, w \right).    (16)

After some algebra, the gradient can be written

    \frac{dC}{dw} = \sum_{k=1}^{N} \sum_{i=1}^{n_k} \left( \frac{Z'(e_{0,k}^{i-1}, w)}{Z(e_{0,k}^{i-1}, w)} - f(e_{0,k}^{i}) \right)    (17)

where

    Z'(e_{0,k}^{i-1}, w) = \sum_{x \in A} f(e_{0,k}^{i-1} :: x) \exp\left( -E(e_{0,k}^{i-1}, x, w) \right)    (18)

and f is the vector of feature functions. This expression can be plugged into a generic optimiser to find a weight vector minimising the negative log-likelihood. The present work used the BFGS optimiser [37].

2.3.4 Feature Importance

This section introduces three complementary feature importance measures. These are weight, explained entropy, and unique explained entropy.

Weight describes a feature's relationship to chord probability. The weight for a feature function f_j is the parameter w_j, corresponding to (minus) the change in the energy function E in response to a one-unit change in the feature function f_j (Equation 15). Weight is a signed feature importance measure: the sign dictates whether the model prefers high (positive weight) or low (negative weight) feature values, and the magnitude dictates the strength of preference. To aid weight comparability between features, feature functions are scaled to unit variance over the set of all possible chord transitions.

Dividing the cost function (Equation 16) by the number of chords in the corpus (\sum_{k=1}^{N} n_k) gives an estimate of cross entropy in units of nats. Cross entropy measures chord-wise unpredictability with respect to a given model. From it we define two further measures: explained entropy and unique explained entropy.

Explained entropy for a feature f_j is computed by comparing cross entropy estimates for two models: a model trained using feature f_j and a null model trained with no features. Explained entropy is the difference between the two cross entropies. Higher values indicate that the feature explains a lot of structure in the corpus.

Unique explained entropy for a feature f_j is the amount that cross entropy changes when feature f_j is removed from the full feature set. It measures the unique explanatory power of a feature while controlling for other features.

2.3.5 Related Work

The literature contains several alternative approaches for feature-based modelling of chord sequences. One is the multiple viewpoint method [11, 12]. However, this method is specialised for discrete features, not the continuous features required for consonance modelling. A second alternative is the maximum-entropy approach of [10, 30]. This approach has some formal similarities with the present work, but its binary feature functions are incompatible with our continuous features. A third possibility is the feature-based dynamic networks of [33]; however, these networks would need substantial modification to represent the kind of feature dependencies required here.

2.4 Corpora

Our corpora represent three musical genres: classical music (1,022 movements/pieces), popular music (739 pieces), and jazz music (1,186 pieces). The classical corpus was compiled from KernScores [34], including ensemble music and keyboard music from several major composers of common-practice tonal music (Bach, Haydn, Mozart, Beethoven, Chopin). Chord labels were obtained using the algorithm of [26] with an expanded chord dictionary, and with segment boundaries co-located with metrical locations as estimated from time signatures. Chord inversions were identified as the lowest-pitch chord tone in the harmonic segment being analysed. The popular and jazz corpora corresponded to publicly available datasets: the McGill Billboard corpus [6] and the iRB corpus [5].

2.5 Efficiency

Computation can be reduced by identifying repeated terms in the cost and cost gradient (Equations 16, 17). These repeated terms only need to be evaluated once. Our feature functions never look back further than the previous chord, and they are invariant to chord transposition; this means that repeated terms occur whenever a chord pair is repeated at some transposition. Collapsing over these repetitions reduces computation by a factor of 20–100 for our corpora.

2.6 Numeric Integration

The features related to pitch-class spectra all use integration. These integrals are numerically approximated using the rectangle rule with 1,200 subintervals, after [24].

2.7 Software

The statistical model was implemented in R and C++; source code is available from the authors on request.

3. RESULTS

3.1 Corpus Level

Figure 1 plots feature importances for the three consonance measures: harmonicity (normalised by chord size), spectral distance, and voice-leading distance. Analyses are split by musical corpus, and confidence intervals are calculated using nonparametric bootstrapping [9].

3.1.1 Importance by Feature

All the consonance features contribute to harmonic structure in some way. The order of feature importance is fairly consistent between genres and importance measures. Broadly speaking, voice-leading distance is most important, followed by harmonicity, then spectral distance.

3.1.2 Importance by Corpus

Harmonicity is particularly important for popular music, less so for classical, and least for jazz. Spectral distance is most important for classical music, less so for popular, and unimportant for jazz.

The relative importance of voice-leading distance depends on the measure used: it scores highly on explained entropy, but less on weight and unique explained entropy. This may be because voice-leading distance and chord size capture some common information: moving from a small chord to a large chord typically involves a large voice-leading distance. If we wish to assess the unique effect of voice-leading distance, we can look at weight and unique explained entropy: these measures tell us that voice-leading distance is most important for jazz music, less for classical music, and least for popular music.

3.1.3 Signs of Weights

The sign of a feature weight determines whether the model prefers positive or negative values of the feature. The observed feature signs are all consistent with theory. Harmonicity has a positive weight for all genres, indicating that harmonicity is universally promoted. Spectral distance and voice-leading distance both have negative weights, indicating preference for lower values of these features.

3.2 Composition Level

We also explored the application of these techniques to individual compositions (Figure 2). While the composition-level analyses reflect the same trends as the corpus-level analyses (Figure 1), they also reveal substantial overlap between the corpora. We assessed the extent of this overlap by training a generic machine-learning classifier to predict genre from the complete set of feature importance measures. Our classifier was a random forest model trained using the randomForest package in R [18], with 2,000 trees and four variables sampled at each split. Performance was assessed using 10-fold cross-validation repeated and averaged over 10 runs, resulting in a classification accuracy of 86% and a kappa statistic of .79. This indicates that genre differences in sensory consonance are moderately salient, even at the composition level.

4. CONCLUSION

This paper introduces new methods for testing relationships between sensory consonance and Western harmony. The methods centre on a new statistical model that predicts symbolic sequences using continuous features. We demonstrate these methods through application to three corpora representing classical, popular, and jazz music.

The results strongly support theoretical relationships between sensory consonance and harmonic structure. The three aspects of sensory consonance tested – harmonicity, spectral distance, and voice-leading distance – all predicted harmonic movement. Not all aspects were equally important, however. Spectral distance performed poorly, particularly in jazz. This is interesting given the high importance attributed to spectral distance in recent psychological literature [1, 7, 22]. Harmonicity performed well in popular music, but less so in classical and jazz. In contrast, voice-leading distance performed consistently well.

The corpus analyses deserve further development. It would be worth probing the true universality of sensory consonance by exploring a broader range of styles and using more refined stylistic categories, possibly at the level of the composer. The validity of the classical analyses could also be improved through more principled sampling [19] and manual chord-labelling [16].

The three feature importance measures provide useful complementary perspectives, but it is unnecessary to plot each one every time. In future we recommend inspecting the weights to check whether a feature is promoted or avoided, but then plotting just unique explained entropy. Unique explained entropy is preferable to weight because its units are well-defined, and preferable to explained entropy because it controls for other features, thereby providing a better handle on causality.

We focused on interpreting the statistical model through feature importance measures, but an alternative strategy would be to use the model to generate chord sequences for subjective evaluation. This route lacks the objectivity of feature-importance analysis, but it would give a uniquely intuitive perspective on what the model has learned.

The modelling techniques could be developed further. An important limitation of the current model is the linearity of the energy function, which restricts it to monotonic feature effects. A polynomial energy function would address this problem. It would also be interesting to develop the psychological features further, perhaps adding echoic memory to the spectral distance measure [17], and introducing a roughness measure generalised across chord voicings.

Despite these limitations, we believe that the current results have important implications for our understanding of Western tonal harmony. In particular, the results imply that voice-leading efficiency is a better candidate for a harmonic universal than spectral similarity. This result is important for music psychology, where voice-leading efficiency is relatively underemphasised compared to harmonicity and spectral similarity (though see [2, 27, 39]). Future psychological work may wish to re-examine the role of voice-leading efficiency in harmony perception.
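As a concrete illustration of the entropy-based importance measures (Section 2.3.4), suppose cross-entropy estimates in nats per chord are available for a null model, single-feature models, the full model, and leave-one-out models. Explained entropy and unique explained entropy are then simple differences. All numbers below are hypothetical placeholders, not the paper's results:

```python
# Hypothetical cross-entropy estimates (nats per chord).
h_null = 4.20                  # model trained with no features
h_single = {"voice_leading": 3.10, "harmonicity": 3.40, "spectral": 3.90}
h_full = 2.80                  # model trained with all features
h_loo = {"voice_leading": 3.30, "harmonicity": 3.00, "spectral": 2.85}

# Explained entropy: improvement of a single-feature model over null.
explained = {f: h_null - h for f, h in h_single.items()}

# Unique explained entropy: cost of dropping the feature from the
# full feature set.
unique = {f: h - h_full for f, h in h_loo.items()}

print(explained)
print(unique)
```

In this made-up example a feature can score highly on explained entropy yet modestly on unique explained entropy, mirroring the overlap between voice-leading distance and chord size discussed above.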


Figure 1. Measures of feature importance as a function of musical corpus. These measures are calculated from statistical models trained on the corpus level. Error bars represent 99% confidence intervals estimated by nonparametric bootstrapping [9]. Signs of feature weights are reversed for spectral distance and voice-leading distance, so that positive weights correspond to consonance maximisation.
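The nonparametric bootstrap behind these error bars can be sketched as follows: resample the data with replacement, recompute the statistic of interest, and take percentiles of the resampled statistics. The helper and data here are hypothetical illustrations, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci(values, stat=np.mean, n_boot=2000, level=0.99):
    """Nonparametric percentile bootstrap CI for stat(values)."""
    values = np.asarray(values)
    boots = np.array([
        stat(rng.choice(values, size=len(values), replace=True))
        for _ in range(n_boot)
    ])
    alpha = (1 - level) / 2
    return np.quantile(boots, [alpha, 1 - alpha])

# Hypothetical per-composition feature-importance values.
values = rng.normal(loc=0.5, scale=0.2, size=100)
lo, hi = bootstrap_ci(values)
print(f"99% CI: [{lo:.3f}, {hi:.3f}]")
```

In the paper the resampled units would be sequences (or compositions) rather than raw values, but the percentile logic is the same.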


Figure 2. Distributions of feature importance measures as calculated for individual compositions within the three corpora. Distributions are represented by Epanechnikov kernel density functions. Signs of feature weights are reversed for spectral distance and voice-leading distance, so that positive weights correspond to consonance maximisation.

5. ACKNOWLEDGEMENTS

The authors would like to thank Emmanouil Benetos and Matthew Purver for useful feedback and advice regarding this project. PH is supported by a doctoral studentship from the EPSRC and AHRC Centre for Doctoral Training in Media and Arts Technology (EP/L01632X/1).

6. REFERENCES

[1] Emmanuel Bigand, Charles Delbé, Bénédicte Poulin-Charronnat, Marc Leman, and Barbara Tillmann. Empirical evidence for musical syntax processing? Computer simulations reveal the contribution of auditory short-term memory. Frontiers in Systems Neuroscience, 8, 2014.

[2] Emmanuel Bigand, Richard Parncutt, and Fred Lerdahl. Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training. Perception & Psychophysics, 58(1):124–141, 1996.

[3] Nicolas Boulanger-Lewandowski, Pascal Vincent, and Yoshua Bengio. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proc. of the 29th International Conference on Machine Learning (ICML-12), 2012.

[4] Daniel L. Bowling and Dale Purves. A biological rationale for musical consonance. Proceedings of the National Academy of Sciences, 112(36):11155–11160, 2015.

[5] Yuri Broze and Daniel Shanahan. Diachronic changes in jazz harmony: A cognitive perspective. Music Perception, 31(1):32–45, 2013.

[6] John Ashley Burgoyne. Stochastic Processes & Database-Driven Musicology. PhD thesis, McGill University, Montréal, Québec, Canada, 2011.

[7] Tom Collins, Barbara Tillmann, Frederick S. Barrett, Charles Delbé, and Petr Janata. A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior. Psychological Review, 121(1):33–65, 2014.

[8] P. Daniel and R. Weber. Psychoacoustical roughness: Implementation of an optimized model. Acta Acustica united with Acustica, 83(1):113–123, 1997.

[9] B. Efron and R. J. Tibshirani. An introduction to the bootstrap. Chapman & Hall, Boca Raton, FL, 1993.

[10] Gaëtan Hadjeres, Jason Sakellariou, and François Pachet. Style imitation and chord invention in polyphonic music with exponential families. http://arxiv.org/abs/1609.05152, 2016.

[11] Peter M. C. Harrison and Marcus T. Pearce. A statistical-learning model of harmony perception. In Proc. of DMRN+12: Digital Music Research Network One-Day Workshop, page 15, London, UK, 2017.

[12] Thomas Hedges and Geraint A. Wiggins. The prediction of merged attributes with multiple viewpoint systems. Journal of New Music Research, 45(4):314–332, 2016.

[13] Hermann Helmholtz. On the sensations of tone. Dover, New York, NY, 1954. First published in 1863; translated by Alexander J. Ellis.

[14] David Huron. Tonal consonance versus tonal fusion in polyphonic sonorities. Music Perception, 9(2):135–154, 1991.

[15] David Huron. Tone and voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception, 19(1):1–64, 2001.

[16] Nori Jacoby, Naftali Tishby, and Dmitri Tymoczko. An information theoretic approach to chord categorization and functional harmony. Journal of New Music Research, 44(3):219–244, 2015.

[17] Marc Leman. An auditory model of the role of short-term memory in probe-tone ratings. Music Perception, 17(4):481–509, 2000.

[18] Andy Liaw and Matthew Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.

[19] Justin London. Building a representative corpus of classical music. Music Perception, 31(1):68–90, 2013.

[20] Josh H. McDermott, Andriana J. Lehr, and Andrew J. Oxenham. Individual differences reveal the basis of consonance. Current Biology, 20(11):1035–1041, 2010.

[21] Andrew J. Milne. A computational model of the cognition of tonality. PhD thesis, The Open University, Milton Keynes, UK, 2013.

[22] Andrew J. Milne and Simon Holland. Empirically testing Tonnetz, voice-leading, and spectral models of perceived triadic distance. Journal of Mathematics and Music, 10(1):59–85, 2016.

[23] Andrew J. Milne, Robin Laney, and David Sharp. A spectral pitch class model of the probe tone data and scalic tonality. Music Perception, 32(4):364–393, 2015.

[24] Andrew J. Milne, William A. Sethares, Robin Laney, and David B. Sharp. Modelling the similarity of pitch collections with expectation tensors. Journal of Mathematics and Music, 5(1):1–20, 2011.

[25] Jean-François Paiement, Douglas Eck, and Samy Bengio. A probabilistic model for chord progressions. In Proc. of the 6th International Conference on Music Information Retrieval, London, UK, 2005.

[26] Bryan Pardo and William P. Birmingham. Algorithms for chordal analysis. Computer Music Journal, 26(2):27–49, 2002.

[27] Richard Parncutt. Harmony: A psychoacoustical approach. Springer-Verlag, Berlin, Germany, 1989.

[28] Richard Parncutt and Graham Hair. Consonance and dissonance in music theory and psychology: Disentangling dissonant dichotomies. Journal of Interdisciplinary Music Studies, 5(2):119–166, 2011.

[29] Richard Parncutt and Hans Strasburger. Applying psychoacoustics in composition: "Harmonic" progressions of "nonharmonic" sonorities. Perspectives of New Music, 32(2):88–129, 1994.

[30] Jeremy Pickens and Costas Iliopoulos. Markov random fields and maximum entropy modeling for music information retrieval. In Proc. of the 6th International Conference on Music Information Retrieval, pages 207–214, London, UK, 2005.

[31] Jean-Philippe Rameau. Treatise on harmony. Jean-Baptiste-Christophe Ballard, Paris, France, 1722.

[32] Pascaline Regnault, Emmanuel Bigand, and Mireille Besson. Different brain mechanisms mediate sensitivity to sensory consonance and harmonic context: Evidence from auditory event-related brain potentials. Journal of Cognitive Neuroscience, 13(2):241–255, 2001.

[33] Martin A. Rohrmeier and Thore Graepel. Comparing feature-based models of harmony. In Proc. of the 9th International Symposium on Computer Music Modeling and Retrieval (CMMR), pages 357–370, London, UK, 2012.

[34] C. S. Sapp. Online database of scores in the Humdrum file format. In Proc. of the 6th International Society for Music Information Retrieval Conference (ISMIR 2005), pages 664–665, 2005.

[35] E. Glenn Schellenberg and Laurel J. Trainor. Sensory consonance and the perceptual similarity of complex-tone harmonic intervals: Tests of adult and infant listeners. Journal of the Acoustical Society of America, 100(5):3321–3328, 1996.

[36] Carl Stumpf. The Origins of Music. Oxford University Press, Oxford, UK, 2012. First published in 1911; translated by David Trippett.

[37] Wenyu Sun and Ya-Xiang Yuan. Optimization Theory and Methods: Nonlinear Programming. Springer Science & Business Media, New York, NY, 2006.

[38] Laurel J. Trainor, Christine D. Tsang, and Vivian H. W. Cheung. Preference for sensory consonance in 2- and 4-month-old infants. Music Perception, 20(2):187–194, 2002.

[39] Dmitri Tymoczko. The geometry of musical chords. Science, 313(5783):72–74, 2006.

[40] Dmitri Tymoczko. A Geometry of Music. Oxford University Press, New York, NY, 2011.

[41] Václav Vencovský. Roughness prediction based on a model of cochlear hydrodynamics. Archives of Acoustics, 41(2):189–201, 2016.