Visualising Singing Style Under Common Musical Events Using Pitch-Dynamics Trajectories and Modified TRACLUS Clustering
†Kin Wah Edward Lin, †Hans Anderson, †Natalie Agus, ‡Clifford So, †Simon Lui
†Singapore University of Technology and Design and ‡Chinese University of Hong Kong
{edward lin, hans anderson, natalie [email protected], [email protected], simon [email protected]

Abstract—We present a novel method for visualising the singing style of vocalists. To illustrate our method, we take 26 audio recordings of a cappella solo vocal music from two different professional singers and we visualise the performance style of each vocalist in a two-dimensional space of pitch and dynamics. We use our own novel modification of a trajectory clustering algorithm called TRACLUS to generate four representative paths, called trajectories, in that two-dimensional space. Each trajectory represents the characteristic style of a vocalist during one of four common musical events: (1) Crescendo, (2) Diminuendo, (3) Ascending Pitches and (4) Descending Pitches. The unique shapes of these trajectories characterize the singing style of each vocalist with respect to each of these events. We present the details of our modified version of the TRACLUS algorithm and demonstrate graphically how the plots produced indicate distinct stylistic differences between singers. Potential applications for this method include: (a) automatic identification of singers and automatic classification of singing styles, and (b) automatic retargeting of performance style to add human expression to computer-generated vocal performances and to allow singing synthesisers to imitate the styles of specific famous professional vocalists.

Keywords—Visualising; Singing Style; Music Event; TRACLUS Clustering;

This work is supported by the SUTD-MIT International Design Center (IDC) Research Grant (IDG31200107 / IDD11200105 / IDD61200103).

I. INTRODUCTION

Automatic characterization of musical performance is important primarily for two applications. First, it provides a set of rules that can be applied to musical synthesis to permit computers to produce more expressive, more human-sounding performances. Secondly, it is useful for information retrieval applications such as automatic recognition of individual performers and automatic classification of musical styles. Many existing methods of characterizing vocal performance style have provided insight into which audio features are most significant and into how we should process these features so that they can be applied in a useful manner to the applications mentioned above.

Lui et al. [1] show that the expressive performance style of violin playing can be represented by using dynamics, tempo and articulation as features. The style can be extracted and retargeted by a support vector machine (SVM). Lui et al. use each feature separately during the synthesis stage. Dannenberg et al. [2] place a strong emphasis on modelling high-quality frequency and amplitude envelopes (pitch and dynamics) for wind instrument synthesis. Widmer et al. [3] also use these features to represent piano performance style. Nakano et al. [4] propose a singing synthesis system called VocaListener, in which they model singing style based on pitch, dynamics and timbre. Saitou et al. [5] verify that micro-tonal fluctuations in tone, such as vibrato and pitch bends, are essential to human perception of singing style.

In this paper, we focus on two of the most significant performance features, dynamics and macro-tonal pitch, and their relationship over time.

G. Widmer et al. [6] demonstrate a machine learning algorithm capable of identifying pianists by their performance style. They use a technique called the performance worm [7], in which variations in tempo and loudness are plotted against each other. The performance worm separates an entire audio recording into segments of length two beats and constructs a symbolic representation of the stylistic content in each segment. The complete set of stylistic elements in a given song is represented symbolically, and the collection of all these symbols is called an alphabet. In this way, the style of an entire performance can be symbolically represented as a string of alphanumeric symbols. After collecting a set of performance strings from different pianists, they find the sub-strings that occur most frequently in the performances of a particular pianist. They claim that these frequently occurring sub-strings characterize the performance style. For example, in a string representing a particular pianist's performances of Mozart, they frequently find a sub-string representing a crescendo followed by a slight accelerando, followed again by a decrescendo with nearly constant tempo. Inspired by their method, the method we present in this paper is an alternative way to visualise singing style using paths in pitch-dynamics space.

In this paper, a trajectory represents a plot of the dynamics and pitch (in a two-dimensional space) over the entire duration of a song. Specifically, a trajectory is a set of ordered pairs (p_t, d_t), representing a time series of samples of pitch p_t and loudness d_t at time t, taken at regular intervals. We use a modified version of Jae-Gil Lee's trajectory clustering algorithm, TRACLUS [8], to compare vocal performances of two or more songs and identify similar sub-trajectories, representing portions where the performances share stylistic similarities. We then group those similar sub-trajectories into clusters and represent each cluster with a representative trajectory. Each representative trajectory of a cluster of sub-trajectories indicates a stylistic performance event that occurs in a similar way across several performances.

The remainder of the paper is organised as follows. In Section II, we describe the audio samples in the database we used to illustrate our method. Then, we explain how we process each sample to generate the pitch-dynamics trajectories in Section III. Since our design makes use of the TRACLUS algorithm to further process these trajectories, we briefly state the TRACLUS algorithm and justify its use in this application in Section IV. Our main contribution, which is to use a modified version of the TRACLUS algorithm to visualise singing style based on representative trajectories in a two-dimensional plot of pitch and dynamics, is presented in Section V. Finally, several possible future directions are discussed in Section VI.

Song Name                                Duration (m:ss)   Tempo   Key        Time sig.
4 Minutes                                0:28              112     B♭ major   4/4
And I Am Telling You (I'm Not Going)     0:28              118     B♭ major   4/4
Disco Inferno                            1:01              126     E♭ major   4/4
Don't Wanna Lose You                     1:40              80      F major    4/4
Hate On Me                               1:15              120     E♭ major   4/4
Hell To The No                           1:35              133     G major    4/4
I Look To You                            1:27              104     G major    4/4
I Will Always Love You                   1:11              66      A major    4/4
Shake It Out                             0:18              112     F major    4/4
Sweet Transvestite                       1:23              104     E major    4/4
Take Me Or Leave Me                      0:28              118     F major    4/4
Try A Little Tenderness                  1:48              80      G major    4/4
USA National Anthem                      1:09              104     A♭ major   4/4

Table I: 13 a cappella solos by Amber Riley.

Song Name                      Duration (m:ss)   Tempo   Key        Time sig.
Being Good Isn't Good Enough   2:00              108     G major    4/4
Don't Cry For Me Argentina     2:50              87      D major    4/4
Firework                       1:47              124     A♭ major   4/4
Get It Right                   3:22              84      D major    4/4
Go Your Own Way                1:52              136     F major    4/4
Jar Of Hearts                  2:26              75      E♭ major   4/4
My Man                         2:01              102     E major    4/4
Oops!... I Did It Again        2:09              95      E major    4/4
Take A Bow                     2:01              85      E major    4/4
The Only Exception             1:40              95      B major    6/8
Torn                           1:07              96      F major    4/4
What I Did For Love            2:32              72      C major    4/4
Without You                    2:31              128     D major    4/4

Table II: 13 a cappella solos by Lea Michele.

II. SINGING SAMPLES

Using a publicly available database is always desirable, as it provides a benchmark for researchers to scientifically justify their research findings. However, for the current research, no publicly accessible database was ideally suitable for our needs. Such a database would, at best, be similar to the one G. Widmer used [6], wherein there is a set of singers S = {s_1, ..., s_n} and a set of songs (or performances) P = {p_1, ..., p_m} such that for every p_j ∈ P, p_j is performed by each s_i ∈ S. The requirement that all vocalists sing the same songs makes it impossible to assemble such a database using existing recordings.

In addition to the above requirement, clean recordings without excessive reverberation or other mixed vocal effects are also major criteria for selection into our database. We searched and found that two popular female vocalists have such sets of clean recordings: Amber Riley¹ and Lea Michele².

III. PITCH-DYNAMICS TRAJECTORIES

In this section, we illustrate how the pitch-dynamics trajectories are generated, using the songs Disco Inferno and Being Good Isn't Good Enough as examples. We first cut the songs into frames of fixed length (4096 sample points, or 92.9 ms) with 50% frame overlap. Then we measure the loudness and pitch of each frame and represent the measurement as a point in pitch-dynamics space.

We measure the dynamics on the sone scale [9] using Glasberg and Moore's loudness model [10]. Our implementation is based on the one from Genesis Audio³, using 4096 sample points per frame and a sampling frequency of 44.1 kHz. We also use this same frame size for the pitch detection step. For our measurement of pitch we use YIN⁴ [11], a common algorithm for pitch detection in melodic
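The frame-based trajectory extraction described in Section III can be sketched in code. The paper uses Glasberg and Moore's sone-scale loudness model and the full YIN algorithm; the self-contained sketch below instead substitutes RMS energy for loudness and a minimal YIN-style estimator (cumulative mean normalized difference with an absolute threshold and a local-minimum descent, no parabolic interpolation), so it illustrates the pipeline rather than reproducing the authors' implementation.

```python
import numpy as np

def frames(x, size=4096, hop=2048):
    """Split a mono signal into fixed-length frames with 50% overlap."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def yin_pitch(frame, sr=44100, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Minimal YIN-style pitch estimate (a simplification of [11])."""
    max_tau = int(sr / fmin)          # longest lag considered
    min_tau = int(sr / fmax)          # shortest lag considered
    lags = np.arange(1, max_tau + 1)
    # Difference function d(tau) = sum_j (x[j] - x[j+tau])^2
    d = np.array([np.sum((frame[:-tau] - frame[tau:]) ** 2) for tau in lags])
    # Cumulative mean normalized difference function
    cmndf = d * lags / np.maximum(np.cumsum(d), 1e-12)
    tau = min_tau
    while tau < max_tau:
        if cmndf[tau - 1] < threshold:
            # descend to the local minimum after the threshold crossing
            while tau < max_tau and cmndf[tau] < cmndf[tau - 1]:
                tau += 1
            return sr / tau
        tau += 1
    return sr / (int(np.argmin(cmndf[min_tau - 1:])) + min_tau)  # fallback

def trajectory(x, sr=44100):
    """One (pitch, dynamics) point per frame; RMS energy stands in
    for the sone-scale loudness used in the paper."""
    return [(yin_pitch(f, sr), float(np.sqrt(np.mean(f ** 2))))
            for f in frames(x)]

# One second of a 440 Hz test tone yields 20 overlapping frames,
# each mapping to a point near pitch 440 in pitch-dynamics space.
sr = 44100
t = np.arange(sr) / sr
traj = trajectory(np.sin(2 * np.pi * 440.0 * t), sr)
```

With 4096-sample frames and a 2048-sample hop, consecutive points are 46.4 ms apart, matching the 50% overlap described above.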
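For orientation, the TRACLUS workflow applied to these trajectories — partition each trajectory into line segments, cluster similar segments, and emit one representative trajectory per cluster — can be caricatured as follows. This is a deliberately simplified stand-in: TRACLUS proper partitions with an MDL cost and density-clusters segments under a weighted perpendicular/parallel/angle distance, neither of which is implemented here, and `angle_thresh` and `eps` are illustrative parameters, not values from the paper.

```python
import math

def dist(p, q):
    """Euclidean distance between two (pitch, dynamics) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def partition(traj, angle_thresh=0.5):
    """Crude stand-in for TRACLUS's MDL partitioning: start a new line
    segment wherever the heading turns by more than angle_thresh radians
    (no angle wrap-around handling). Returns (start, end) point pairs."""
    cuts = [0]
    for i in range(1, len(traj) - 1):
        a = math.atan2(traj[i][1] - traj[i-1][1], traj[i][0] - traj[i-1][0])
        b = math.atan2(traj[i+1][1] - traj[i][1], traj[i+1][0] - traj[i][0])
        if abs(b - a) > angle_thresh:
            cuts.append(i)
    cuts.append(len(traj) - 1)
    return [(traj[s], traj[e]) for s, e in zip(cuts, cuts[1:]) if s != e]

def cluster_segments(segments, eps=1.0):
    """Greedily group segments whose start and end points both lie
    within eps of the group's first member."""
    clusters = []
    for seg in segments:
        for c in clusters:
            ref = c[0]
            if dist(seg[0], ref[0]) < eps and dist(seg[1], ref[1]) < eps:
                c.append(seg)
                break
        else:
            clusters.append([seg])
    return clusters

def representative(cluster):
    """Representative segment: average the members' endpoints."""
    n = len(cluster)
    start = tuple(sum(s[0][k] for s in cluster) / n for k in (0, 1))
    end = tuple(sum(s[1][k] for s in cluster) / n for k in (0, 1))
    return start, end
```

Two nearly parallel ascending-pitch sub-trajectories fall into one cluster, while a descending one forms its own, which is the behaviour the four-event visualisation relies on.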
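The sub-string statistics behind the performance-string approach of Widmer et al., described in the introduction, amount to counting fixed-length n-grams across a performer's set of strings. A minimal sketch (the symbols in the usage note are invented placeholders, not Widmer's actual alphabet):

```python
from collections import Counter

def frequent_substrings(performance_strings, n=3, top=3):
    """Count every length-n sub-string across a performer's set of
    performance strings and return the most common ones."""
    counts = Counter()
    for s in performance_strings:
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    return counts.most_common(top)
```

For example, `frequent_substrings(["abcab", "xabcy"], n=3, top=1)` surfaces `"abc"` as the sub-string shared by both performances; in Widmer's setting such a recurring sub-string might encode a crescendo-accelerando-decrescendo pattern.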