
Master thesis on Sound and Music Computing Universitat Pompeu Fabra

Automatic Chord-Scale Type Detection using Chroma Features

Emir Demirel

Supervisor: Baris Bozkurt

July 2018


Contents

1 Introduction

1.1 Motivation

1.2 Conceptual Background

1.3 On-line Music Courses

1.4 MusicCritic

2 State of the Art

2.1 Review of Chord-Scale Detection Methods

2.2 Review on Improvisation Assessment Metrics

3 Datasets

3.1 The Chord-Scale Dataset

3.2 Data from MusicCritic

4 Feature Extraction

4.1 Preprocessing

4.2 Chroma Feature Extraction

4.3 Reference Frequency Determination for Pitch-Class Mapping

4.4 Post-Processing

5 Classification

5.1 Template Based Additive Likelihood Estimation

5.2 Support Vector Classification

6 Experiments

6.1 Evaluation on the Chord-Scale Dataset

6.2 Evaluation of Student Performances using MusicCritic framework

7 Results

7.1 Evaluation on the Chord-Scale Dataset

7.2 Evaluation on Student Performances in the Scale Exercise

8 Discussions

9 Conclusion

List of Figures

List of Tables

Bibliography

Acknowledgement

There are many people to whom I would like to express my gratitude. It has been a great experience to be involved in the research done at the Music Technology Group (MTG) of Universitat Pompeu Fabra (UPF) in Barcelona. From the first day to the last, I have constantly been in the process of learning. Thank you to everyone at the MTG for making this research possible.

First, I would like to thank Xavier Serra for providing me with the opportunity to work and do research at the MTG, and moreover for being a mentor for me since the first day I came to Barcelona. I would like to thank Baris Bozkurt, my supervisor, for all of his technical and personal support throughout my master's studies, and for being so patient with all my questions throughout my research.

I was very lucky to be able to work within the TECSOME project with such great people, who were great inspirations for me in my research. I would like to thank Vsevolod Eremenko, Oriol Romani, Rong Gong, Rafael Caro Repetto and Blazej Kotowski for all of their great ideas and support. Thank you to all the researchers at the MTG for their valuable conversations, their enthusiasm for music and research, and, more so, for being true inspirations.

I have learned so much valuable information and gained priceless insight into Music and Informatics thanks to the top-notch academic staff of UPF, especially within the Sound and Music Computing (SMC) masters program. Thank you Perfecto Herrera, Agustin Martorell, Sergi Jorda, Rafael Ramirez, Enric Gine, Alastair Porter, Dimitri Bogdanov, Frederic Font, Davinia Hernandez, and again, Xavier Serra and Baris Bozkurt. I would like to thank the administration staff of the MTG, Cristina Garrido and Sonia Espi Fernandez, for their support and assistance. I would also like to especially thank all of the fellow SMC masters students for being companions along this journey.

It would have been impossible to come to Barcelona and continue my career as a researcher without the support of my family: my mother Nevin Demirel, my father Mustafa Demirel and my sister, Gozde Demirel Akan. Their full support kept me going and motivated even in the most challenging times.

Special thanks to Toprak Barut and Hikmet Altunbaslier, two brilliant musicians from Ankara, Turkey, who have made the computational part of my project possible. I would also like to thank all my fellow musician friends from Ankara for their vision and understanding of music. Thank you Gurkan Uysal, my first music mentor, who sparked this passion of mine for music. Your initial guidance in music has always been invaluable to me.

Abstract

There has been great effort in designing data-driven applications which exploit computational methods to solve musically related problems in the research field of Music Information Retrieval. However, the current state of music modeling needs to be expanded in consideration of musical domain knowledge, as well as the human perception and cognition of music. In this thesis work, we study and evaluate different computational methods to carry out a "modal analysis" of improvisation performances by modeling the concept of "chord-scales". The Chord-Scale Theory is a theoretical concept that explains the relationship between the harmonic context of a musical piece and the possible scale types to be used in improvisation. This work proposes different computational approaches to detect (or recognize) the chord-scale type present in a target Jazz solo, given the harmonic context. The experiments are conducted on two different datasets which were created within the course of this work. One of the datasets is made publicly available. To achieve the task of chord-scale type detection, we exploit a rule-based and a supervised learning method. The rule-based approach is developed in order to reveal possibilities for the computational modeling of chord-scales. In the supervised learning algorithm, Support Vector Machines are chosen as classifiers. The classification of audio data is performed using chroma features. Furthermore, we conduct a case study on user (student) performances using "MusicCritic", a novel framework for automatic student performance assessment. This work has its value in conducting one of the first studies on numeric representations of chord-scales in improvised solo performances and in pointing out several possible directions for exploring some of the core elements behind Jazz improvisation.

Keywords: computational musicology; music information retrieval; chord-scale detection; chroma features; supervised learning; machine learning; Jazz improvisation

Chapter 1

Introduction

Being able to play an instrument and improvise with it requires high musical knowledge and a vast amount of training. Most skilled and experienced Jazz improvisers exploit the musical concept called "scales" in their solos. In general, scale information gives a reflection of the harmonic structure of a musical piece. Therefore, learning how to play scales is a crucial step both for improving improvisational skills and for having a stronger sense of musical harmony. In this work, we present a novel approach to detect or estimate the chord-scale type of a musical performance using both machine learning and rule-based methods that exploit tonal features. Moreover, our study examines the performance of other existing scale estimation methods within the context of our research. By doing so, we aim to highlight the potential of the proposed approach for the automatic chord-scale detection task. We have created an open-source dataset to conduct our experiments, which consists of improvisation performances with manually annotated chord-scales. To demonstrate a use case of our technology, we exploit MusicCritic, a novel framework for automatic musical performance assessment. The student improvisation performances obtained via MusicCritic are computationally analyzed and automatically assessed based on musical heuristics.


1.1 Motivation

There has been great effort in the last few decades in designing computational models to solve problems within the research field of Music Information Retrieval (MIR), using data-driven techniques like machine learning, deep learning, etc. However, the current state of music modeling needs to be expanded with a focus on music theoretical (domain) knowledge, in consideration of the human perception and cognition of music. A focus on theoretical knowledge to model "chord-scales" would help us discover and understand some of the core elements behind Jazz improvisation. In this thesis work, we study an essential theoretical concept in Jazz improvisation, the chord-scale, for the computational analysis of improvisation performances.

Figure (1) Three dimensions of research in Music Information Retrieval

There are numerous sources for getting proper training and education to be able to understand and produce music, to learn how to play an instrument and to be able to improvise with it. The availability of today's on-line education tools provides a way to enhance this training process. Although these tools are useful for self-training, there is room for improvement in terms of the effectiveness of the education output. First of all, the process of learning relies heavily on feedback on the student's performance.

There has been a paradigm shift in the education system over the last decades due to the increasing number of on-line education platforms. With reference to the data collected by Class Central (https://www.classcentral.com/), the total number of students who enrolled in at least one on-line course had already exceeded 58 million in 2016. Over 700 universities around the world have released free on-line courses. These on-line education platforms are able to reach a very large audience of students in various fields with the availability of Massive Open Online Courses (MOOCs) [1].

The absence of an automatic feedback mechanism requires human experts to grade the musical performances for the student, which could be highly time consuming in the context of MOOCs. The work of this paper aims to create and develop a system that could be used for providing automatic feedback to the students for improvisation or any other chord-scale related exercises within the on-line music learning context.

This study also intends to touch upon the significance of scale-based analysis for the overall harmonic analysis of a musical excerpt. Chord-scales reflect many im- portant aspects regarding the harmonic content of a musical performance. Hence, the information regarding the concept of chord-scales may reveal new directions in computational musicology.

The main motivation for the work studied in this thesis is, therefore, to explore features that are suitable for our task, to design and implement different computational models, to perform a harmonic analysis of monophonic solo performances based on chord-scales, and to explore further directions in music modeling in the context of Jazz improvisation.

In Section 1.2, we provide a brief summary of the musical concepts that our study is built on. First, we define chord-scales within the context of Jazz improvisation and bring out the concept of 'The Chord-Scale Theory' to set a theoretical basis for our research. In the following section, we give an outline of on-line music courses and, more specifically, the course that gives form to our analyses: Gary Burton's Jazz Improvisation. Finally, in Section 1.4, we review MusicCritic, a novel framework for automatic musical performance assessment, which we have exploited in this study to demonstrate a real-world test case of the chord-scale type detection algorithm introduced in this thesis work.

1.2 Conceptual Background

Due to the high degree of interdisciplinarity of the research area of Music Information Retrieval, it is essential to have a comprehensive outlook that comprises both music theoretical (or domain) knowledge and the common computational methods for music modeling when doing research. A sufficient conceptual background is necessary both for the researchers, so that their work can conveniently link with a real-world scenario, and for the readers, so that they are able to grasp the core of the studies belonging to these research fields.

The goal of this section is to set a proper description of 'chord-scales' within the context of our research. Before attempting to computationally model this concept, it is important to have a broad understanding of the main characteristics of chord-scales.

As proposed in Alamo (2012), one essential strategy for Jazz improvisation is the use of scales in the manner of the specific jazz style. To avoid playing a solo that lacks a sense of melody and musical cohesiveness, one should avoid running entire scales up and down or stressing the root note, and should instead exploit chromatic ornamentations of basic scales to sound more "interesting".

A scale is a step-wise arrangement of the pitch-classes contained within an octave [2]. The major and minor scales are considered among the basic mother scales and are composed of seven pitch-classes, which are called scale degrees. These scale degrees are notated in Figures 2 and 3.

The Chord-Scale Theory

The chord-scale system is a method of mapping a list of scales to a given chord.

Figure (2) Major Scale and Degrees

Figure (3) Minor Scale and Degrees

The scales commonly used today consist of the seven modes of the diatonic major scale, the seven modes of the melodic minor scale, the diminished scales, the whole-tone scale, the altered scale, and the pentatonic and bebop scales [3]. From a student's perspective, this list of scales may seem overwhelming, and the need to practice each scale in all 12 pitch-classes multiplies the complexity of the whole system. In a real-world scenario, the number of possible chord-scales to be played within a certain harmonic context clearly depends on the performer's artistic decisions and personal creativity. Hence, it would be too optimistic to expect perfect scale detection for every improvised performance, even when the harmonic content is provided. For practical reasons, a limitation on the list of chord-scales to detect or estimate needs to be considered. In principle, the list of chord-scales involved in this study is determined according to the content of Gary Burton's on-line course on Jazz improvisation, as explained in Section 1.3. In addition to the scales involved in the course, we consider the other mother scales included in [4]. The complete chord-scale mapping considered in this study can be seen in Table 1.

Table 1 shows that multiple scales can be played over certain chord types. Thus, the choice of chord-scale among the list of possible scales needs to be made properly to achieve a good-sounding solo. According to the chord-scale theory, there are 3 main criteria for the choice of chord-scale [5].

Table (1) The Chord-Scale Mapping

Chords : Scale Types
Major 7 : Ionian, Lydian
Minor 6 : Melodic Minor
Minor 7 : Dorian, Aeolian, Phrygian
Minor Major 7 : Melodic Minor, Harmonic Minor
Minor 7b5 : Locrian
Dominant 7 : Mixolydian, Altered, Lydian b7, Whole-Tone, Symmetrical Diminished
Diminished 7 : Symmetrical Diminished

1 - The Melody: When there are multiple options for choosing the right chord-scale, the melody (or main theme) of the piece should be considered. The chord-scale that contains the notes of the melody would be the best choice. Note that if the melody uses chromatic passing notes, these notes cannot be used as a criterion for choosing the best chord-scale.

2 - The Previous Chord-Scale: This is the most important criterion in choosing the best chord-scale. In general, the goal is to retain common notes from one chord-scale to the next successive chord-scale.

3 - The Following Chord: This is important especially for choosing the best chord-scale for dominant 7 chords. Dominant chords usually set up the sound of the next chord, as they create a sense of harmonic motion along "the Circle of 5ths". Hence, the following chord is checked when determining the best chord-scale for the underlying dominant chord.

1.3 On-line Music Courses

With the increasing range of content in today's Massive Open Online Courses (MOOCs), on-line education seems to be emerging as the new wave of education. On-line music courses are a part of this trend as well. Many music schools, institutions and creative arts companies offer comprehensive on-line music courses.

One of the main content providers in the on-line music education world is Berklee College of Music. Jazz Improvisation, taught by Gary Burton, is among the most popular courses in the format of a MOOC on Coursera.org. Besides being free and open-content, the course has thousands of students from all over the world. The content of the course is brief but explanatory, and it aims to teach improvisational skills mostly within the context of "chord-scales". The students of the course are required to complete assignments that are evaluated with the peer-review grading method. Our work is motivated by performing automatic assessment and generating feedback for the reviewers or the instructors and the students in the context of on-line music education. If this goal is achieved, we believe that this strategy could accelerate the process of music learning and grading.

Figure (4) Coursera, Gary Burton’s Jazz Improvisation

Considering the context of the assignments in Gary Burton's Jazz Improvisation course, one needs to take into account some constraints when designing an automatic assessment system for improvisation performances. Initially, the set of chord-scales to be included within the context of assessment needs to be determined. In the above-mentioned course, there are ten chord-scales most commonly used by musicians in the context of Jazz improvisation. Consequently, we limit the number of scales within our framework according to the content of this course. These chord-scales are: the seven modes of the major scale (in Greek: Ionian, Dorian, Phrygian, Lydian, Mixolydian, Aeolian / natural minor, Locrian), the super-Locrian / Altered, the Lydian dominant and the half-step whole-step symmetrical diminished scales. Further explanation of the choice of scale types considered in this thesis work is given in the Classification chapter.

1.4 MusicCritic

Massive Open Online Courses (MOOCs) for music education have the potential to widen the range of music and instrument learning on a global scale. However, there remain drawbacks regarding the scalability of these courses for large audiences. The limitations due to the difficulties in recording the student performances, assessing these recordings and providing feedback in the case of large-scale audiences emerge as the main challenges of today's MOOCs for music education. In [1], the authors present MusicCritic, a novel framework that aims to overcome the challenges mentioned above. In addition to its capabilities of providing automatic or semi-automatic assessment and feedback (depending on the exercise design and content) on the student performances, the design of the framework also involves tools for designing exercises, the interfaces of the exercises and an architecture for maintaining the student recording data.

In Figure 5, the main architecture of the MusicCritic framework is shown. The Learning Management System (LMS) is the part where the student-computer and teacher-computer interaction takes place. The MusicCritic software performs the automatic and semi-automatic assessment procedures. According to the design of this framework, the Learning Management System (LMS) is in communication with MusicCritic. The main workflow for a typical use case of the framework can be summarized as follows:

Figure (5) MusicCritic Framework [1]

1 - The teacher prepares an exercise using the LMS and MusicCritic.

2 - The learner uses the interface for practice and recording, then uploads his/her performance to the LMS.

3 - The LMS sends the recording to MusicCritic, where the semi-automatic or automatic assessment is done; the output of MusicCritic's assessments is then sent to the teacher for further evaluation.

The value of MusicCritic lies not only in its unique exercise interface and its capability of providing automatic feedback to the students or easing the process of grading and assessment done by the teachers, but also in its capacity to store student data.

Chapter 2

State of the Art

In this section, a review of the state-of-the-art methods that could be exploited specifically for the task of 'chord-scale detection' is provided. Then, a literature review on improvisation assessment metrics in the musical context follows.

2.1 Review of Chord-Scale Detection Methods

The task of automatic chord-scale detection from audio is a relatively new field of research in Sound and Music Computing. With the increasing popularity of on-line music courses and recommendation systems, scale-based analysis has the potential to gain more importance in the near future. Even though there is still no standardized way of performing automatic chord-scale estimation, there exist a few inspirational works that could shed light on possible directions for the task studied in this paper. In this section, the methodologies that inspired our approach to chord-scale detection are reviewed.

Social Virtual Band

Social Virtual Band is an environment designed to provide the jazz student with software support to record himself over a realistic accompaniment and to store and manage his collection of solos in the Cloud [6]. The system aims to give automatic feedback on the solo performances. The feedback is generated through an

automatic harmonic analysis applied on the MIDI representation of the predominant melody. The MIDI notes of the melody are obtained through automatic transcription of the audio with the method explained in [7]. These MIDI notes are quantized within the boundaries of the notated start and end times in order to obtain an aligned audio-to-score representation. The solo performances are evaluated with automatic harmonic analysis through the following steps.

1 - For each chord label, a list of all possible scales is computed. The harmonic analysis is performed over 3 basic scales: major, harmonic minor and melodic minor. The analysis procedure is performed as explained in [8].

2 - A cost function is introduced over the transitions between the resulting harmonic analyses of consecutive chord labels.

3 - The first chord is appended after the final chord, assuming that the given progression loops over itself.

The system described in this section is valuable for highlighting important challenges in generating automatic feedback on solo performances in terms of the scales played over chords. Yet, the imperfection of automatic transcription algorithms remains an obstacle for analyzing solo performances. Moreover, the harmonic analysis is limited to 3 basic scales, which is far from comprehensive for the goal of this study.

Figure (6) The evaluation interface of SocialVirtualBand [6]

Scale Matching for Audio Tonality Analysis

In [9], Weiss and Habryka propose an algorithm for visualizing the tonal characteristics of classical audio recordings through the concept of scales. In the second approach presented in the paper, the authors apply maximum likelihood estimation on audio frames using chroma features to estimate the scale type. The list of scale types to be estimated is chosen to investigate different classical musical concepts (Table 2).

Table (2) Scale Types in Weiss (2014)

Scale Types : Binary Templates
Diatonic : (1 0 1 0 1 1 0 1 0 1 0 1)
Pentatonic : (1 0 1 0 1 0 0 1 0 1 0 0)
Whole Tone : (1 0 1 0 1 0 1 0 1 0 1 0)
Octatonic : (1 1 0 1 1 0 1 1 0 1 1 0)
Hexatonic : (1 1 0 0 1 1 0 0 1 1 0 0)
Acoustic : (1 0 1 0 1 0 1 1 0 1 1 0)
Chromatic : (1 1 1 1 1 1 1 1 1 1 1 1)

The proposed methodology computes the likeliest scale type on the frames of analysis. First, the chroma features are extracted using the method in [10] by employing a filter bank and summing up all energies belonging to a pitch-class. For each frame, a chroma vector $c = (c_1, c_2, c_3, \ldots, c_N)^T$ of dimension $N := 12$ is obtained. The vectors are then normalized with respect to the $\ell_2$ norm. Then the normalized chroma vectors are summed up and normalized again over a larger analysis window. This process may also be referred to as smoothing of the chroma vectors. By applying smoothing on the time-domain chroma features, they obtain chroma histograms $g$ over a larger analysis frame for estimating the scale type.

Then, scale-type estimates $S_q$ are computed using the following formula:

$$S_q = \prod_{i=0}^{11} (g_i)^{T_i} \qquad (2.1)$$

where $T$ stands for the scale-type template in binary form. Then, the maximum likelihood over all scale types is used as the final scale estimate:

$$S_{max} = \max_q S_q \qquad (2.2)$$

To account for the varying number of notes in the estimated scales, a normalization factor is introduced as follows:

$$M := \sum_{i=0}^{11} T_i \qquad (2.3)$$

where $M$ corresponds to the number of notes in a scale template. Then the normalization is applied:

$$S_{normalized} := S_{max}/\lambda \qquad (2.4)$$

with the normalization factor:

$$\lambda = (1/M)^M \qquad (2.5)$$
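To make the estimation procedure concrete, the following is a minimal Python sketch of Equations 2.1-2.5. The template set is truncated to three of the seven types in Table 2, and the example histogram is synthetic; variable names are illustrative and not taken from [9].

```python
import numpy as np

# Binary scale-type templates (subset of Table 2)
TEMPLATES = {
    "diatonic":   np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]),
    "pentatonic": np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]),
    "whole_tone": np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]),
}

def scale_type_likelihoods(g):
    """g: smoothed, normalized 12-bin chroma histogram."""
    scores = {}
    for name, T in TEMPLATES.items():
        s = np.prod(g ** T)        # Eq. 2.1: product over template notes
        M = T.sum()                # Eq. 2.3: number of notes in the scale
        lam = (1.0 / M) ** M       # Eq. 2.5: normalization factor
        scores[name] = s / lam     # Eq. 2.4: normalized likelihood
    return scores

# Synthetic histogram emphasizing a diatonic pitch-class set
g = np.full(12, 1.0)
g[[0, 2, 4, 5, 7, 9, 11]] += 4.0
g /= np.linalg.norm(g)             # l2 normalization

scores = scale_type_likelihoods(g)
print(max(scores, key=scores.get)) # Eq. 2.2: maximum over scale types
```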

Note that in their approach, the authors aim to estimate the likeliest scale type considering the transpositions of each scale type in all 12 keys. The scale detection method proposed in this thesis work estimates the type of chord-scale in one certain key, namely the root of the corresponding chord. This assumption is made in accordance with the Chord-Scale Theory, as explained in the previous section.

The value of this work lies in being one of the few papers that has studied the modeling and automatic estimation of scales. In our study, we have expanded the ideas proposed in [9] for the chord-scale analysis applied to Jazz improvisation performances. Moreover, the likelihood estimates could be useful features for creating probabilistic machine learning models.

2.2 Review on Improvisation Assessment Metrics

The review of prior research on assessment metrics of improvisation in jazz is provided in order to highlight the multi-dimensionality of the task and to show that it is possible to evaluate an improvisation performance using reliable measures [11]. The summary below is confined to listing the related work on improvisation assessment metrics, since they are outside the technical focus of our analyses.

Evaluations of musical, and more specifically improvisation, performances still have ambiguities due to the lack of clear definitions of such musical metrics. The assessment of improvised music is particularly mysterious, in spite of insightful discussions of the cognitive processes involved in the act of improvisation [12]. The cognitive processes involved in improvisation have been studied empirically in [13], where the findings of the study support the information-processing model of improvisation introduced in [12], the role of anticipation described by [14] and the role of flow studied in [15]. [16] provides an extensive study on these cognitive attributes, like attention, flow, working memory and technical fluency.

In the prior works of [17], [16], [18], [19], there are several factors that lead to success in jazz improvisation, such as aural imitation ability, technical facility and self-evaluation ability. [20] and [21] have further developed and tested measurement tools for the assessment of improvisation achievement. These measurement tools vary from broad evaluation concepts like musical syntax, creativity, musical quality or fluency [20], to more specific metrics related to technique/tone quality, structure/development, rhythm and style, and expression [21].

Chapter 3

Datasets

Computational methodologies require proper and established datasets for testing and evaluation. Due to the lack of such open-content datasets related to our task of automatic chord-scale detection, we have created a dataset which comprises monophonic recordings of improvisation performances with manual annotations in a machine-readable format. Moreover, we have utilized MusicCritic for testing the proposed chord-scale detection algorithm in a real use case.

3.1 The Chord-Scale Dataset

The manual annotation of scales in an improvisation performance is a time-consuming task and requires trained ears and an excellent level of theoretical and practical expertise. This is a challenging task due to frequent scale changes on the temporal level and the variety of possible scale choices that could be played in a 'real' improvisation performance.

Figure (7) Sample sounds from Chord-scale dataset (freesound.org)


In order to overcome the problem of data creation, we have followed a simple, yet useful strategy. Thanks to the fellow musicians who accepted to participate in this project, we have recorded 41 monophonic improvisation performances on trumpet (by Hikmet Altunbaslier) and tenor saxophone (by Toprak Barut). Each of these recordings is a performance in one scale (or "mode") from the list of scales shown in Table 3.

Table (3) Scale Types in the Chord-Scale Dataset

Ionian (Major)
Dorian
Phrygian
Lydian
Mixolydian
Aeolian (Natural Minor)
Locrian
Melodic Minor
Lydian b7
Harmonic Minor
Altered (Super Locrian)
Half-Whole Step (Symmetrical) Diminished

The improvisation performances are played over a certain backing track with harmonically related chord progressions for each type of chord-scale or mode. The performances are annotated in terms of start and end times per "phrase", "sentence" or "motif" (Figure 8). This annotation is done manually and carefully. Even though each performance has a varying duration, the number of phrases (or motifs) per scale type is kept constant in order to maintain a balanced dataset.

Figure (8) Data Annotation Format (Chord-Scale Dataset)

3.2 Data from MusicCritic

The MusicCritic framework offers a web environment that conveniently handles the gathering of student performance data. To test the proposed chord-scale detection system in a real-world use case, we have used the student data obtained via MusicCritic.

For convenience, we have designed a simple exercise for practicing scales and improvisation skills. The students are provided with a backing track and instructions regarding the harmony and the chord-scales in the lead sheet format shown in Figure 9. For each region in the figure, the students are expected to improvise using the indicated chord-scales. In the backing track, the chords in the lead sheet are played by an acoustic guitar at 100 bpm on every downbeat. The key signatures for each type of chord-scale are also provided as additional guidelines for the students.

Figure (9) Modal (Chord-Scale) Exercise

As required by the MusicCritic framework, the backing track and the student performances have the same duration of 21.6 s, which is the exact duration of 36 beats at 100 bpm. The recordings start with an empty measure with clicks. Using these parameters, we have created the exercise via the interface shown in Figure 10.
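As a quick sanity check of this duration (a worked example added here, not part of the original exercise specification):

$$36 \ \text{beats} \times \frac{60 \ \text{s/min}}{100 \ \text{beats/min}} = 21.6 \ \text{s}$$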

The lead sheet in Figure 9 is exported as an XML file, which is directly fed to our system for further processing. The chord-scale types, keys, and start and end times of each region in the exercise are obtained using a structure similar to that of Figure 8.

Figure (10) Exercise setup interface of MusicCritic

Chapter 4

Feature Extraction

We have used chroma features for the recognition of the chord-scale type of a musical phrase or motif in our system architecture. Chroma features provide a useful multi-dimensional tonal representation of audio, which is studied in detail in this section. As mentioned in Section 1.2, scales are a set of pitch-classes sequentially ordered within an octave. Thus, we have decided to use chroma features for performing 'automatic chord-scale detection'.

One may argue that monophonic solo performances can also be analyzed in the symbolic domain using MIDI, instead of representing the musical signal with chroma vectors. Although this is a valid argument, the conversion of raw audio (waveform) to symbolic data requires the transcription of the solo performances, which can be done either manually by a human expert or automatically, which introduces complexities and still cannot be performed perfectly. Hence, we use the chroma vector representation of musical signals and perform chord-scale type classification based on chroma features.

4.1 Preprocessing

We use tonal features of sound in our methodology, which are extracted through spectral analysis. Initially, we preprocess the audio signal in order to get a better representation of the tonal content. The main goal of the preprocessing steps is to

20 4.1. Preprocessing 21 make the audio signal more robust to noises and artifacts and prepare the signal for further processing (i.e feature extraction).

Equalisation

The audio signal is filtered with an inverted approximation of the equal-loudness curves in order to account for the non-linear perception of the spectra in the human auditory system [22].

Frame-based Spectral Analysis

The sampled and filtered audio signal is divided into a series of analysis frames of size $N_{frame}$ with a hop size of $N_{hop}$. Then, each analysis frame $x(n + l \cdot N_{hop})$ is multiplied with a "Hanning" window function $w(n)$ to obtain the windowed audio signal $x_w(n)$, where $n = 0, 1, \ldots, N_{frame} - 1$ and $l$ indicates the index of the frame being analyzed.

$$x_{w_l}(n) = x(n + l \cdot N_{hop}) \cdot w(n), \quad n = 0, \ldots, N_{frame} - 1 \qquad (4.1)$$

Essentially, the windowed frame is centered in the time domain to obtain a zero-phase window before the Discrete Fourier Transform (DFT) operation:

$$x_{w_{centered}}(n) = x_w\left(n + \frac{N_{frame}}{2}\right), \quad n = -\frac{N_{frame}}{2}, \ldots, \frac{N_{frame}}{2} - 1 \qquad (4.2)$$

The DFT of these centered frames is computed with the formula:

$$X(k) = DFT[x(n)] = \sum_{n=-N_{frame}/2}^{N_{frame}/2 - 1} x_{w_{centered}}(n) \cdot e^{-j 2\pi n k / N_{frame}} \qquad (4.3)$$

$X(k)$ denotes the frame-based frequency spectrum of the signal $x(n)$.

Table (4) Spectral Analysis Parameters

Parameter : Value
Sample Rate (fs) : 44100 Hz
FFT Size (N) : 2048
Window Function (W) : Hanning
Window Size (M) : 200 ms
Hop Size (H) : 100 ms
Bins per Octave in Chroma Vectors : 12
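The following Python sketch illustrates the frame-based analysis of Equations 4.1-4.3 with the parameters of Table 4. It is a minimal illustration, not the thesis implementation; the `audio` array is a stand-in for a loaded mono recording, and the zero-phase centering is realized with a circular shift.

```python
import numpy as np

fs = 44100
n_frame = int(0.200 * fs)        # 200 ms window (Table 4)
n_hop = int(0.100 * fs)          # 100 ms hop (Table 4)
window = np.hanning(n_frame)

audio = np.random.randn(5 * fs)  # stand-in for a loaded mono signal

spectra = []
for start in range(0, len(audio) - n_frame, n_hop):
    xw = audio[start:start + n_frame] * window  # Eq. 4.1: windowing
    xw_centered = np.fft.fftshift(xw)           # Eq. 4.2: zero-phase centering
    X = np.fft.rfft(xw_centered)                # Eq. 4.3: DFT of the frame
    spectra.append(np.abs(X))
```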

Peak Detection

Spectral peak detection is performed using the Sinusoidal Modeling Synthesis (SMS) framework, which is introduced in [23]. In SMS, a 'spectral peak' is defined as a local maximum in the magnitude spectrum, with the constraints that the frequency lies within a certain range and that the amplitude is above a predetermined threshold. The peak detection procedure explained below is performed as in [24], and its implementation is taken from [22].

The resolution of the spectrum is determined by the FFT (Fast Fourier Transform) size. Consequently, the accuracy of the detected peaks is restricted by the spectrogram resolution, where a spectrogram bin represents a frequency of $f_s/N_{fft}$, $f_s$ being the sampling rate and $N_{fft}$ the FFT size. Zero-padding is applied to the audio signal to increase the resolution of the spectrum. However, as shown in [25], in the case of using a rectangular window (whose main lobe is 2 bins wide) and requiring an accuracy as fine as 0.1 percent of the width of the main lobe of the transformed window, a zero-padding factor of 1000 is necessary. Thus, to decrease this factor, a further step is taken. Quadratic spectral interpolation is applied only to the samples which are closest to the detected minimum/maximum frequencies. In this procedure, the three points closest to the min/max frequency are considered to form a parabola as follows:

$$y(x) = a(x - p)^2 + b \qquad (4.4)$$

where $p$ is the center of the parabola, $a$ is a measure of its concavity and $b$ is the offset. The three points considered are the closest points around the peak frequency bin $k_\beta$:

$$y(-1) = \alpha = 20\log_{10}|X(k_\beta - 1)|$$
$$y(0) = \beta = 20\log_{10}|X(k_\beta)| \qquad (4.5)$$
$$y(1) = \gamma = 20\log_{10}|X(k_\beta + 1)|$$

with $\alpha \leq \beta \geq \gamma$, where these three points are assumed to form a parabola, as defined above. The peak location $p$ can then be calculated as:

$$p = \frac{1}{2} \cdot \frac{\alpha - \gamma}{\alpha - 2\beta + \gamma} \qquad (4.6)$$

Finally, the frequency bin of the detected peak is:

$$k^* = k_\beta + p \qquad (4.7)$$

and the magnitude of the peak is therefore:

$$y(p) = 20\log_{10}|X(k^*)| = \beta - \frac{1}{4}(\alpha - \gamma)\, p \qquad (4.8)$$

Note that the peaks to be detected and mapped are limited to the bandwidth of $f = [100, 5000]$ Hz. This limitation is due to the fact that higher frequencies introduce a lot of noise and artifacts into the spectrum and, consequently, into the pitch-class distribution.
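The quadratic interpolation of Equations 4.4-4.8 can be sketched in a few lines of Python. This is a hedged illustration on a synthetic sinusoid, not the SMS implementation of [22]; `k_beta` must be an interior local maximum of the dB magnitude spectrum.

```python
import numpy as np

def interpolate_peak(mag_db, k_beta):
    alpha = mag_db[k_beta - 1]
    beta = mag_db[k_beta]
    gamma = mag_db[k_beta + 1]
    p = 0.5 * (alpha - gamma) / (alpha - 2 * beta + gamma)  # Eq. 4.6
    k_star = k_beta + p                                     # Eq. 4.7
    y_p = beta - 0.25 * (alpha - gamma) * p                 # Eq. 4.8
    return k_star, y_p

fs, N = 44100, 2048
t = np.arange(N) / fs
x = np.hanning(N) * np.sin(2 * np.pi * 1001.0 * t)   # test tone at 1001 Hz
mag_db = 20 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)

k_star, _ = interpolate_peak(mag_db, int(np.argmax(mag_db)))
print(k_star * fs / N)   # interpolated frequency, close to 1001 Hz
```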

4.2 Chroma Feature Extraction

Chromagrams or Pitch Class Profiles (PCPs) have been widely used in many applications that aim to extract mid-level musical information from the audio signal, like automatic chord recognition, key extraction, tonality detection and so on, since they were introduced in [26]. Chroma features are a projection of the entire frequency spectrum onto 12 pitch classes (C, C#, ..., B), representing the 12 distinct semitones of the musical octave, where the pitch height is discarded and all pitch classes lie on a 2-D tonal surface [27].

We employ Harmonic Pitch Class Profiles [24] for extracting chroma features. Initially, Harmonic Pitch Class Profile (HPCP) features are extracted from the audio at the frame level. The features are further processed to obtain cleaner data for classification. Then, the features are statistically summarized over each annotated audio segment and prepared for the classification stage.

Harmonic Pitch Class Profiles

In our methodology, we extract the Harmonic Pitch Class Profile (HPCP) features in a similar fashion to that of [24], which is explained in this section. The HPCP extraction algorithm of [24] is implemented in Essentia, the software library introduced in [22], which is the main feature extraction library used in this study.

In principle, Harmonic Pitch Class Profiles are modified versions of the Pitch Class Distributions (PCDs) introduced in [26]. PCDs are obtained by summing the squared spectral amplitudes of the frequencies corresponding to each pitch-class, according to the logarithmic mapping function $M$ in Equation 4.9:

$$M(i) = \begin{cases} -1, & \text{if } i = 0 \\ \mathrm{round}\left(12 \cdot \log_2\left(\frac{f_s \cdot i / N}{f_{ref}}\right)\right) \bmod 12, & \text{if } i = 1, 2, \ldots, N/2 \end{cases} \qquad (4.9)$$

where $f_{ref}$ represents the reference frequency (or $PCP_{bin_0}$), $f_s$ is the sampling rate and $N$ is the DFT size. Thus, $f_s \cdot i / N$ represents the central frequency of spectrum bin $i$.

For the computation of Harmonic Pitch Class Profiles, [24] introduces a weighting function for the PCDs, which is defined as:

$$w(n, f_i) = \begin{cases} \cos^2\left(\frac{\pi}{2} \cdot \frac{d_i}{0.5 \cdot l}\right), & \text{if } |d_i| \leq 0.5 \cdot l \\ 0, & \text{if } |d_i| > 0.5 \cdot l \end{cases} \qquad (4.10)$$

where $d_i$ is the distance in semitones between the peak frequency $f_p$ and the center bin frequency $f_{bin}$, $n$ is the index of the bin in the chroma vector, $m$ is the integer that minimizes the modulus of $d$, and $l$ is the length of the weighting window. Note that $l$ is a parameter for changing the bin resolution of the HPCP vectors.

The HPCP model considers the influence of harmonics on the pitch-class distribution vectors. The model proposes a weighting function over the harmonics that should contribute to the pitch-class of their fundamental frequency ($f_i, f_i/2, f_i/3, \ldots, f_i/n_{Harmonics}$), where the weight of each harmonic decreases with the following function:

$$w_{harm}(n) = s^{n-1} \qquad (4.11)$$

where $s < 1$, to achieve the decreasing contribution. In our experiments, we choose $s = 0.6$, as suggested in [24].

Finally, using the weighting function $w(n, f_i)$ described above, the HPCP vectors are computed with the following formula:

$$HPCP(n) = \sum_{i=1}^{nPeaks} w(n, f_i) \cdot a_i^2, \quad n = 1, \ldots, size \qquad (4.12)$$

where $a_i$ and $f_i$ are the amplitude and frequency values of peak $i$, $nPeaks$ is the number of spectral peaks to be considered and $n$ is the bin index of the HPCP vector.
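Since the HPCP implementation used in this thesis comes from Essentia [22], a typical extraction chain can be sketched as below. This is an assumed, minimal configuration: the file name is a placeholder, and the exact parameter values used in the thesis (e.g. the number of harmonics) may differ.

```python
import essentia.standard as es

fs = 44100
audio = es.MonoLoader(filename='solo.wav', sampleRate=fs)()  # placeholder file

windowing = es.Windowing(type='hann')
spectrum = es.Spectrum()
# Peak search limited to 100-5000 Hz, as in Section 4.1
peaks = es.SpectralPeaks(sampleRate=fs, minFrequency=100, maxFrequency=5000)
hpcp = es.HPCP(size=12, referenceFrequency=440.0, harmonics=8)

chromagram = []
for frame in es.FrameGenerator(audio, frameSize=8820, hopSize=4410):
    freqs, mags = peaks(spectrum(windowing(frame)))   # spectral peak detection
    chromagram.append(hpcp(freqs, mags))              # Eq. 4.12 weighting
```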

Tuning Frequency

For most of the musical material available today within the context of Western music, A4 = 440 Hz is considered the standard reference frequency and A ("la") is the reference pitch class. Most chroma computation methodologies use this fixed reference frequency value for constructing the pitch class distribution (PCD) vector. However, it would be naive to assume that all musical performances are tuned to A4 = 440 Hz perfectly. Thus, in order to construct more robust chroma features which are ideally independent of the tuning frequency, we need to estimate the tuning frequency of the instruments in the musical performance. Therefore, we compute the tuning frequency, $f_{tuning}$, of the analysis frames using the method in [28].

Once the tuning frequency is computed, the harmonic peaks in the frame spectra are mapped to pitch-classes using the reference frequency $f_{ref}$. Depending on the type of chroma features used, we employ two different strategies for determining the reference frequency of the chroma vectors.

In Jazz improvisation performances, phrases, lines and motifs are constructed according to the contextual harmonic information which is evident from the score of the composition (or exercise). Jazz musicians use chord-scales for improvisation, and these chord-scales are essentially sets of pitch-classes constructed on top of the given key of the chord played by the accompaniment, in consideration of a few other criteria, which are explained thoroughly in Section 1.2. In our methodology, we construct the chroma vectors using this knowledge by setting the reference bin (bin 0 or $PCP_{bin_0}$) of the vector to the frequency of the key of the chord. By doing so, we aim to achieve key/tonality invariant chroma features.

4.3 Reference Frequency Determination for Pitch-Class Mapping

When using Harmonic Pitch Class Profiles as chroma features, the reference frequency is estimated before the pitch-class mapping of the harmonic peaks. The tuning frequency estimation outputs the frequency value corresponding to the musical note A4, which is generally expected to have a value around 440 Hz. The key/tonality information $K$ of the musical segment under analysis is obtained from the manual annotations introduced in Section 3.1. The frequency of the reference bin of the HPCP vectors is then determined using the following formula:

$$f_{ref} = f_{tuning} \cdot 2^{\,\delta_k \cdot 100 / 1200} \qquad (4.13)$$

where $\delta_k$ is the distance in semitones between the key of the tuning frequency, $K_{tuning}$, and the target key $K$.
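Equation 4.13 reduces to a one-line computation; the following sketch uses illustrative names.

```python
def reference_frequency(f_tuning, delta_k):
    """Shift the estimated tuning frequency by delta_k semitones
    to obtain the frequency of the reference HPCP bin (Eq. 4.13)."""
    return f_tuning * 2 ** (delta_k * 100 / 1200.0)

# Example: tuning estimated at 441 Hz, chord root 3 semitones above A
print(reference_frequency(441.0, 3))   # ~524.5 Hz
```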

4.4 Post-Processing

The goal of the post-processing steps explained in this section is to prepare the feature data for the classification stage.

Normalization

In order to establish dynamic invariance in the feature vectors, each of the frame-based HPCP vectors is normalized with respect to a suitable norm. In our methodology, we employ the unitSum norm, which is defined as:

$$HPCP_{unitSum}(n) = \frac{HPCP(n)}{\sum_{i=1}^{numBins} HPCP(i)} \qquad (4.14)$$

By applying the unitSum norm, we obtain the relative weights of the pitch-classes in the frame-based HPCP vectors.

Segment Features

The classification techniques explained in Chapter 5 require statistical summaries of each audio segment for estimating the type of scale present in the musical performance under analysis. Thus, frame-based features are statistically summarized over the audio segments of analysis. The pitch-class mapping of spectral harmonic peaks causes artifacts in the HPCP vectors, in addition to the effect of external noise. In order to minimize the presence of these artifacts in the segment-wise statistically summarized features, we consider only the chroma bin (or pitch-class) with the maximum value in the feature computation procedure and set the rest of the chroma bins to zero (Equation 4.15). This process is valid for our case since the analyzed audio files are monophonic / one-instrument performances, and only one dominant pitch-class is expected in each feature frame.

$$HPCP_{onlyMax}(n) = \begin{cases} HPCP(n), & \text{if } n = \arg\max_n HPCP(n) \\ 0, & \text{otherwise} \end{cases} \qquad (4.15)$$

For the use cases studied in this thesis, the boundaries of each audio segment of analysis is annotated with timestamps. The availability of these annotations elimi- nates the necessity of the automatic segmentation of the performances. Therefore, we can directly obtain the feature statistics of each audio segment.

Noise Removal

We apply further processing on the frame-based chroma features in order to reduce the influence of noise and artifacts in the summarized features for classification. This noise removal step is based on heuristics and is exploited with practical motivations such as simplicity and fast processing. The logic of our noise removal method is as follows:

$$\text{if } \arg\max(HPCP_{onlyMax}(k)) \neq \arg\max(HPCP_{onlyMax}(k \pm 1)): \quad HPCP_{onlyMax}(k) = 0 \qquad (4.16)$$

Figure (11) $HPCP_{unprocessed}$ vs. Time

Figure (12) $HPCP_{onlyMax}$ vs. Time

Figure (13) $HPCP_{post-processed}$ (after transient removal) vs. Time

The audio file from which the figures above are extracted can be found at https://github.com/emirdemirel/Chord-ScaleDetection/blob/master/toprakMinor.mp3.

Statistical Summarization

Frame-based features are extracted with a window size of 200 ms, which is assumed to be too short to perceive a pitch-class. Hence, if the non-zero element of the $HPCP_{onlyMax}$ vector (i.e. the most predominant pitch-class within the chroma vector) is not the same pitch-class in consecutive chroma vectors, we assume that this non-zero element does not imply a pitch-class. Thus, the chroma vectors which satisfy the above logic are discarded from the summarized features.

In this study, we use two main statistics, the means and the standard deviations of the HPCP bins over the audio segments, which are computed using the following equations:

$$HPCP_{mean}(x) = \frac{\sum_{i=0}^{N} HPCP_i(x)}{N} \qquad (4.17)$$

$$HPCP_{std}(x) = \sqrt{\frac{\sum_{i=0}^{N} \left(HPCP_i(x) - HPCP_{mean}(x)\right)^2}{N - 1}} \qquad (4.18)$$

where $i$ indexes the frames of the segment and $x$ the chroma bin.

As suggested in [29], standard deviations seem to provide useful features for mode recognition. As illustrated in Figure 14, the strong and weak bins of the mean and standard deviation HPCP vectors follow similar trends, representing similar harmonic content. Hence, we use the standard deviation features in addition to the average HPCP features.

The normalized and summarized statistical features of the audio segments can also be considered as "chroma histograms" (Figure 14), which may also be used as a visualization of the harmonic content of the audio segment. The use case of this type of visualization is discussed in Section 6.2.
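The post-processing chain of this section (Equations 4.15-4.18) can be sketched as follows. The chromagram below is synthetic, and the transient-removal logic is one plausible reading of Equation 4.16: frames whose predominant pitch-class differs from both neighbours are discarded.

```python
import numpy as np

def summarize_segment(chromagram):
    """chromagram: (n_frames x 12) unitSum-normalized HPCP vectors."""
    n_frames = len(chromagram)
    idx = chromagram.argmax(axis=1)

    # Eq. 4.15: keep only the maximum bin of each frame
    only_max = np.zeros_like(chromagram)
    only_max[np.arange(n_frames), idx] = chromagram[np.arange(n_frames), idx]

    # Eq. 4.16: discard frames whose predominant pitch-class differs
    # from both of its neighbours (too short to imply a pitch-class)
    keep = np.ones(n_frames, dtype=bool)
    for k in range(1, n_frames - 1):
        if idx[k] != idx[k - 1] and idx[k] != idx[k + 1]:
            keep[k] = False
    cleaned = only_max[keep]

    # Eqs. 4.17-4.18: segment statistics per chroma bin
    return cleaned.mean(axis=0), cleaned.std(axis=0, ddof=1)

# Synthetic chromagram with five sustained pitch-classes
chromagram = np.full((200, 12), 0.01)
notes = np.repeat([0, 4, 7, 11, 9], 40)
chromagram[np.arange(200), notes] = 1.0
chromagram /= chromagram.sum(axis=1, keepdims=True)   # unitSum norm

hpcp_mean, hpcp_std = summarize_segment(chromagram)
```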


Figure (14) Chroma (HPCP) histograms, $HPCP_{mean}$ (left), $HPCP_{std}$ (right)

Chapter 5

Classification

In our methodology, we consider 'Automatic Chord-Scale Detection' as an audio classification task. In our proposed model, we apply and evaluate two distinct approaches. The first approach is based on predefined binary scale templates, and the classification of audio segments into scale types is carried out by extracting the likelihoods of each scale type given the segment chroma features (or chroma histograms). In the second approach, automatic classification is performed using Support Vector Machines (SVMs), a data-driven approach based on statistical learning theory [?]. In this chapter, the workflows of these two distinct approaches are explained.

5.1 Template Based Additive Likelihood Estimation

The first chord-scale detection method we have developed is inspired by the visualization method for scales in Classical Western music proposed in [9], which is essentially a likelihood estimation procedure based on predefined scale templates. As in [9], we employ binary templates to represent the scale types (Table 5). Hence, these templates are not learned and do not require an existing dataset for scale estimation.


Table (5) Scale Dictionary

Scale Type s : Binary Template
Ionian (Major) : (101011010101)
Dorian : (101101010110)
Phrygian : (110101011010)
Lydian : (101010110101)
Mixolydian : (101011010110)
Aeolian (Natural Minor) : (101101011010)
Locrian : (110101101010)
Melodic Minor : (101101010101)
Lydian b7 : (101010110110)
Harmonic Minor : (101101011001)
Altered (Super Locrian) : (110110101010)
Whole Tone : (101010101010)
Half-Whole Step (Symmetrical) Diminished : (110110110110)
Chromatic : (111111111111)

The bins of the frame-based chroma vectors $HPCP(k)$ are statistically summarized over the audio segments and normalized as explained in Section 4.4. The binary scale templates are used as pitch-class activation functions on these summarized chroma vectors. The scale-type likelihoods $S_{likelihood}$ are then obtained by computing the inner products of the statistically summarized chroma vectors $HPCP$ with each of the scale templates $T_{template}(s)$:

$$S_{likelihood}(s) = HPCP \cdot T_{template}(s) \qquad (5.1)$$

Finally, the chord-scale type $s$ with the maximum likelihood is determined as the estimated (or detected) scale type:

$$S_{estimated} = \max_s S_{likelihood}(s) \qquad (5.2)$$

Due to the summarization / addition of HPCP bins, in contrast to the likelihood estimation method introduced in [9], we refer to this method as 'Template-Based Additive Chord-Scale Likelihood Estimation'.
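A minimal sketch of this estimation procedure is given below, with three of the fourteen templates of Table 5 spelled out; `hpcp_hist` stands in for a summarized, unitSum-normalized chroma histogram.

```python
import numpy as np

SCALE_TEMPLATES = {                          # subset of Table 5
    "Ionian":     np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]),
    "Dorian":     np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0]),
    "Mixolydian": np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0]),
}

def detect_scale_type(hpcp_hist):
    # Eq. 5.1: inner product of the histogram with each binary template
    likelihoods = {s: float(hpcp_hist @ T) for s, T in SCALE_TEMPLATES.items()}
    # Eq. 5.2: the scale type with maximum likelihood is the estimate
    return max(likelihoods, key=likelihoods.get), likelihoods

hpcp_hist = np.zeros(12)
hpcp_hist[[0, 2, 4, 5, 7, 9, 10]] = 1 / 7.0   # a Mixolydian pitch-class set
print(detect_scale_type(hpcp_hist)[0])         # -> "Mixolydian"
```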

The use of this scale estimation method can be illustrated on a sample from the Chord-Scale Dataset. Figure 15(a) shows the scale type estimation applied on the accumulated frame-based HPCP vectors. In Figure 15(a), the darker regions towards the beginning are due to only a few notes being present in the accumulated HPCP vectors; hence the likelihoods of all scale types are low. As more notes appear in the accumulated HPCP vectors, the likelihoods of the scale types increase. Finally, the density of the expected scale type is higher than the rest, where similar scale types are denser than more dissimilar scale types. In Figure 15(b), the frame-based HPCP vectors in the time domain are provided.

Figure (15) Scale Likelihoods vs. Time (left) Chroma Features vs. Time (right)

5.2 Support Vector Classification

The Support Vector Machine (SVM), first introduced in [?], is a classification and regression tool that utilizes machine learning theory to maximize predictive accuracy while avoiding over-fitting to the data. SVMs use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory [30].

Theory

Given $l$ training samples $(x_i, y_i)$, $i = 1, \ldots, l$, where each sample has $d$ inputs or features ($x_i \in \mathbb{R}^d$) and a class label ($y_i \in \{-1, 1\}$), all hyperplanes in $\mathbb{R}^d$ are parametrized by a vector $w$ and a constant $b$ through the following equation:

$$w \cdot x + b = 0 \qquad (5.3)$$

Note that the vector $w$ satisfying the condition in Equation 5.3 is orthogonal to the hyperplane. Considering that the data is to be separated by this hyperplane, we can define a function for classifying the data:

$$f(x) = \mathrm{sign}(w \cdot x + b) \qquad (5.4)$$

Now, let us define a canonical hyperplane which separates the data from the hyperplane defined in Equation 5.3 by a functional distance of at least 1. Hence, the data to be classified should satisfy the conditions below:

$$x_i \cdot w + b \geq +1, \quad \text{when } y_i = +1$$
$$x_i \cdot w + b \leq -1, \quad \text{when } y_i = -1 \qquad (5.5)$$

The above inequalities indicate that all such hyperplanes have a 'functional distance' $\geq 1$. To get the geometric distance from the hyperplane to a data point (or sample), the functional distance is normalized by the norm of $w$:

$$d\big((w, b), x_i\big) = \frac{y_i (x_i \cdot w + b)}{\lVert w \rVert} \geq \frac{1}{\lVert w \rVert} \qquad (5.6)$$

Here, our goal is to maximize the geometric distance of the hyperplane to the data points, in order to obtain a large margin for better classification. From Equation 5.6, we see that this is accomplished by minimizing $\lVert w \rVert$. The most common method for doing so is with Lagrange multipliers [31]. The minimization problem (also referred to as a Quadratic Programming or QP problem) is transformed into the following form:

$$\text{minimize:} \quad W(\alpha) = -\sum_{i=1}^{l} \alpha_i + \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j)$$
$$\text{subject to:} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \leq \alpha_i \leq C \ (\forall i) \qquad (5.7)$$

where $\alpha$ is the vector of $l$ non-negative Lagrange multipliers to be determined, and $C (> 0)$ is a constant, tunable parameter. When $C = \infty$, the hyperplane completely separates the data. For finite $C$, our problem becomes a 'soft-margin' classifier problem [31], which allows misclassification. Hence, lower values of $C$ result in a more flexible hyperplane, which is constructed to minimize the margin error for each data point:

$$\text{Margin Error:} \quad y_i (w \cdot x_i + b) < 1 \qquad (5.8)$$

Kernel Functions

For most cases, the samples in a dataset are not linearly separable. A straightforward approach would be trying to separate the data with polynomial curves. However, optimizing the parameters of such curves to fit the data is difficult. Instead, we can 'pre-process' the data so that an optimal hyperplane can be found using the method described above. For that, a mapping function $z = \phi(x)$ is introduced, which transforms the $d$-dimensional feature vector $x$ into a $d'$-dimensional vector $z$. Our goal at this step is to choose a proper mapping $\phi(\cdot)$, so that the new training data is separable by a hyperplane.

Given the mapping $z = \phi(x)$, our classifier function (recall Equation 5.4) takes the following form:

$$f(x) = \mathrm{sign}(w \cdot \phi(x) + b) = \mathrm{sign}\left(\Big[\sum_i \alpha_i y_i \phi(x_i)\Big] \cdot \phi(x) + b\right) = \mathrm{sign}\left(\sum_i \alpha_i y_i \big(\phi(x_i) \cdot \phi(x)\big) + b\right) \qquad (5.9)$$

The equation above shows that, if we knew the dot product $\phi(x_a) \cdot \phi(x_b)$ in the higher-dimensional space (which is called a kernel), we would not need to deal with the mapping $z = \phi(x)$ manually. We can thus define the kernel function as:

$$K(x_a, x_b) = \phi(x_a) \cdot \phi(x_b) \qquad (5.10)$$

Our classifier function would then simply be $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$.

One of the most popular kernel functions used in SVM classification is the Gaussian RBF kernel:

$$K(x_a, x_b) = \exp\left(-\frac{\lVert x_a - x_b \rVert^2}{2\sigma^2}\right) \qquad (5.11)$$

where $\sigma$ is a parameter to be tuned. This kernel function results in the classifier function shown below:

$$f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i \exp\left(-\frac{\lVert x_i - x \rVert^2}{2\sigma^2}\right) + b\right) \qquad (5.12)$$

which is called the Radial Basis Function (RBF), and the kernel function is called the RBF kernel.

Hyperparameter Optimization - Grid Search Method

In machine learning theory, hyperparameter optimization is the procedure of choosing a set of optimal parameters for the learning algorithm [32]. In our machine learning pipeline, we have used the Grid Search algorithm for tuning/optimizing the parameters of our Support Vector Classifier.

Grid Search is an exhaustive search method which generates a score for each combination from a finite set of parameter candidates, given a performance metric. In our SVM classifier with the RBF kernel, we need to tune 2 parameters: $C$, the regularization constant, and $\gamma$, the kernel parameter. In order to achieve higher generalization, the parameters are validated using 10-fold stratified cross-validation.

The parameter sets used in our experiments are:

C : {0.001, 0.01, 0.1, 1, 10, 100, 1000}

γ : {0.001, 0.01, 0.1, 1}

The implementations of SVMs and Grid Search are obtained from scikit-learn, a Python module that integrates a broad range of state-of-the-art machine learning algorithms for medium-scale problems [33].
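The grid search described above can be reproduced with scikit-learn [33] along the following lines. This is a hedged sketch: `X` and `y` are random placeholders for the segment chroma statistics and scale-type labels.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "gamma": [0.001, 0.01, 0.1, 1],
}

X = np.random.rand(120, 24)        # e.g. mean + std HPCP features per segment
y = np.repeat(np.arange(12), 10)   # 12 scale types, 10 segments each

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```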

Chapter 6

Experiments

The main content of this thesis work is concentrated on the experiments conducted for the chord-scale analysis of monophonic improvisation performances. In this section, we examine two case studies based on two individual datasets for the demonstration of the proposed methodology for automatic chord-scale recognition and automatic performance assessment. In the first stage of the experiments, we evaluate different feature sets and two distinct classification algorithms. Then, we study a real-world use case of the proposed methodology.

6.1 Evaluation on the Chord-Scale Dataset

The initial phase of the experiments is conducted on the dataset described in Chapter 3. Despite its small size, it is the only dataset available for research purposes focused on the chord-scale detection task. In this study, we evaluate two different phases of the overall chord-scale detection system.

Feature Selection

Each audio file is manually annotated phrase-wise with timestamps indicating the start and end times of the segments of analysis, as described in Chapter 3. The frame-based chroma features are extracted for each audio segment using the methodologies explained in Chapter 4, and then statistically summarized over the segments in two aspects: mean and standard deviation. By doing so, we obtain the chroma histograms of these audio segments (Figure 16).

Figure (16) Chroma Histograms: HPCP.mean (right), HPCP.std (left)
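A minimal sketch of this summarization step, assuming the frame-wise HPCP vectors of one annotated segment are stacked into an array of shape (n_frames, 12); the helper name is hypothetical:

```python
import numpy as np

def summarize_segment(hpcp_frames):
    # Segment-level chroma histograms: per-bin mean and standard
    # deviation of the frame-wise HPCP vectors (cf. Figure 16).
    hpcp_frames = np.asarray(hpcp_frames, dtype=float)
    return hpcp_frames.mean(axis=0), hpcp_frames.std(axis=0)
```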

Evaluation of Classification Algorithms

Scale detection is performed using two distinct approaches: a rule-based method based on heuristics, and supervised learning. These two approaches are tested and evaluated on the Chord-Scale Dataset in terms of overall accuracy and F-scores. Since the dataset is balanced, macro-averaged evaluation metrics are valid.

The overall accuracy score is based on whether the estimated scale type matches the ground-truth scale type:

$$\mathrm{Accuracy} = \frac{\mathrm{TruePositives} + \mathrm{TrueNegatives}}{\text{Total number of samples}} \tag{6.1}$$

The supervised learning methodology is evaluated using 10-fold cross-validation, where each fold is formed in a stratified fashion and the model is trained on 90% of the entire dataset; scale-type predictions are made on the remaining 10% each time. In order to increase the generalization power of the evaluation measure, this procedure is performed 10 times, each time on a randomized (shuffled) ordering of the data. The overall accuracy is then obtained by averaging these 10 results.
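A sketch of this repeated cross-validation scheme, assuming an SVM classifier and a feature matrix X with labels y as above (the function name and defaults are illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def repeated_stratified_cv(X, y, n_repeats=10, n_splits=10):
    # Run stratified 10-fold CV on 10 different shufflings of the data
    # and average the fold accuracies, as described above.
    scores = []
    for seed in range(n_repeats):
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(SVC(kernel="rbf"), X, y, cv=cv))
    return float(np.mean(scores))
```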

6.2 Evaluation of Student Performances using MusicCritic framework

Scale Exercise

This exercise is designed within the context of the demonstration of MusicCritic. The goal of the exercise is to practice the major (Ionian) and minor (Aeolian) scales in a given harmonic context. The students are provided with a lead sheet (Figure 9, Section 3.2) and a backing track. As seen in Figure 9, the students are expected to improvise using two chord-scale types, major and minor, in three keys: E, C and Ab. Along with the target chord-scale types, the chords of the backing track and the key signatures are provided in the lead sheet.

For each designated region, the performance assessment is done individually. Therefore, based on the ground-truth key, feature extraction and scale detection are applied separately for each region.

Student Performance Assessment

The performance assessment methods for the study case (the MusicCritic framework) are explained in this chapter. As discussed in detail in Section 2.2, the evaluation of an improvisation performance is a multi-dimensional task. Although it is challenging to achieve a complete evaluation of improvisation performances automatically, our proposed system of chord-scale detection can be used for performance assessment regarding musical syntax [20] or technical quality [21]. Regarding the musical syntax, we provide feedback on the pitch-classes played within an audio segment of analysis and on the choice of scale type given the harmonic context. To sum up, we provide performance assessment scores in three dimensions: In-scale Rate, Scale Completeness and Scale Correctness.

In-Scale Rate

The In-scale Rate score is the ratio of the summed amplitudes of the in-scale pitch-classes over the amplitudes of all pitch-classes within an octave:

$$\mathrm{InScaleRate} = \frac{\sum_{i=0}^{numBins} HPCP_{inscale}(i)}{\sum_{i=0}^{numBins} HPCP(i)} \tag{6.2}$$

where

$$HPCP_{inscale} = HPCP \cdot S_{template} \tag{6.3}$$

with · denoting the element-wise product with the binary scale template.

Note that the In-scale Rate assesses the correct pitch-classes according to the target chord-scale type; it does not additionally penalize out-of-scale pitch-classes present in the performance.
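A minimal sketch of Equations 6.2-6.3 in Python, assuming a length-12 chroma histogram and a binary scale template (the function name is our own):

```python
import numpy as np

def in_scale_rate(hpcp_hist, scale_template):
    # Share of total chroma amplitude that falls on in-scale bins.
    hpcp_hist = np.asarray(hpcp_hist, dtype=float)
    in_scale = hpcp_hist * np.asarray(scale_template)  # Equation 6.3
    return in_scale.sum() / hpcp_hist.sum()            # Equation 6.2
```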

Scale Completeness

Scale Completeness is the ratio of the number of non-zero in-scale HPCP bins (i.e., the number of in-scale pitch-classes played by the performer) over the total number of in-scale pitch-classes:

$$\mathrm{ScaleCompleteness} = \frac{\text{number of in-scale pitch-classes in the performance}}{\text{total number of in-scale pitch-classes}} \tag{6.4}$$

For instance, assume the target chord-scale is the C major scale and the performer plays only C (do), E (mi) and G (sol), which is simply a C major triad. The C major scale is a diatonic scale and consists of 7 pitch-classes; hence, the Scale Completeness would be 3/7.
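Following the same conventions, Equation 6.4 can be sketched as below; the amplitude threshold deciding whether a pitch-class counts as 'played' is our own assumption:

```python
import numpy as np

def scale_completeness(hpcp_hist, scale_template, threshold=0.0):
    # Fraction of the target scale's pitch-classes actually played: an
    # in-scale bin counts as played if its amplitude exceeds the threshold.
    played = (np.asarray(hpcp_hist) * np.asarray(scale_template)) > threshold
    return played.sum() / np.count_nonzero(scale_template)
```

For the C major triad example above, three of the seven in-scale bins are non-zero, so the function returns 3/7.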

Scale Correctness

In this performance evaluation method, we assess the scale choice of the student given a musical excerpt. It is not unusual to play pitch-classes that are not included in the scale of choice during improvisation; these out-scale pitch-classes may be present in the improvised motif or phrase. From a music-theoretical point of view, these out-of-scale pitch-classes (or notes) can be classified as chromatic passing notes or out-scale notes [4].

The number of possible scales that can be played in an improvisation context is as large as the number of combinations of at least five pitch-classes within one octave, considering that at least 5 different notes are necessary for constructing a "scale" [4]. To simplify the analysis for our research, we consider a limited number of scale types for assessment. In addition to the chord-scale types within the Chord-Scale Dataset, we also include the whole-tone scale in the set of scale-type choices to be estimated, since it belongs to the set of main chord-scales listed by Levine [4].
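For illustration, two of these binary templates can be written as 12-dimensional vectors with the scale root at index 0 (the dictionary layout is our own, not the thesis' actual data structure):

```python
import numpy as np

# Binary pitch-class templates (semitones above the root), cf. Figure 17.
SCALE_TEMPLATES = {
    "major":      np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]),  # Ionian
    "whole-tone": np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]),
}
```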

Figure (17) Binary representation of the scale types considered in this case study

We estimate the scale type chosen by the student using the method explained in Section 5.1. As this methodology suggests, the estimation yields the likeliest scale type given the amplitudes of the pitch-classes present in the chroma histogram under analysis. According to this approach, the presence of dominantly played chromatic passing notes or out-of-scale notes affects the estimated scale type, which decreases the correctness of the scale choice. Similarly, a wrong scale choice or a random choice of pitch-classes would result in a different estimated scale type, which consequently implies an incorrect choice of scale type. Using this insight, it is appropriate to apply a similarity metric between the estimated scale type and the expected scale type to measure the correctness of the student's scale choice.
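Under the assumption that the additive likelihood of a template is the summed amplitude of its in-scale bins (a simplification of the method of Section 5.1; see there for the exact formulation), the estimation step can be sketched as:

```python
import numpy as np

def estimate_scale_type(hpcp_hist, templates):
    # Score each binary template by the chroma amplitude it captures
    # and return the likeliest scale-type label.
    hpcp_hist = np.asarray(hpcp_hist, dtype=float)
    scores = {name: float(hpcp_hist @ t) for name, t in templates.items()}
    return max(scores, key=scores.get)
```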

Cosine Similarity

Cosine similarity is a similarity measure that computes the cosine of the angle between two non-zero vectors in an inner-product space. Equation 6.5 computes the cosine similarity between two binary scale templates $S_1$ and $S_2$ as defined in Section 5.1:

$$\mathrm{CosineSimilarity} = \cos(S_1, S_2) = \frac{S_1 \cdot S_2}{\|S_1\|\,\|S_2\|} = \frac{\sum_{i=1}^{n} S_{1i} S_{2i}}{\sqrt{\sum_{i=1}^{n} S_{1i}^2}\,\sqrt{\sum_{i=1}^{n} S_{2i}^2}} \tag{6.5}$$
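Equation 6.5 translates directly into a few lines of Python (a sketch; the function name is our own):

```python
import numpy as np

def cosine_similarity(s1, s2):
    # Cosine of the angle between two (binary) scale templates.
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    return float(np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2)))
```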

In our Scale Correctness assessment method, we apply the cosine similarity between the binary template of the scale type estimated from the student's performance and that of the expected scale type. In the plot below, we provide the cosine similarity measures between each pair of chord-scale types included within the context of this assessment.

Figure (18) Cosine Similarity Matrix of the binary Scale Templates

Results

In this chapter, the evaluation results of the proposed algorithms for chord-scale detection are shown, and a real-world use case of our algorithm using the MusicCritic framework is demonstrated. We begin with tests on different feature sets and classification approaches. Standard evaluation metrics such as overall accuracy and weighted F1 scores are used. In order to gain more intuition on how the features and classification algorithms affect this audio classification task, we provide confusion matrix plots for each test case.

Table (6) Test Case Indexes for HPCP (1-7)

Test Index | Features Used | Classification Method
1 | HPCP - MEAN | LIKELIHOOD ESTIMATION (MULTIPLICATIVE)
2 | HPCP - STD | LIKELIHOOD ESTIMATION (MULTIPLICATIVE)
3 | HPCP - MEAN | LIKELIHOOD ESTIMATION (ADDITIVE)
4 | HPCP - STD | LIKELIHOOD ESTIMATION (ADDITIVE)
5 | HPCP - MEAN | SUPERVISED LEARNING (SVM)
6 | HPCP - STD | SUPERVISED LEARNING (SVM)
7 | HPCP - MEAN + STD | SUPERVISED LEARNING (SVM)

7.1 Evaluation on the Chord-Scale Dataset

In this section, the proposed chord-scale detection algorithms are evaluated. First, the confusion matrices of the different classification methods and feature sets are provided and commented on. The evaluation scores are given at the end of this section (see Figure 27) in terms of overall accuracy and F-score.

Results of Classification Using Template-Based Likelihood Estimation Methods

1) FEATURES : HPCPmean , CLASSIFICATION METHOD : Template Based Multiplicative Likelihood Estimation (Weiss, 2014)

Figure (19) Confusion Matrix using HPCPmean as features and Likelihood Estimation (multiplicative) for classification

Figure 19 shows the confusion matrix of the scale-type estimation algorithm in [9], evaluated on the Chord-Scale Dataset. Note that, due to the multiplication of HPCP bins in the chord-scale likelihood estimation, we refer to this method as 'Template-Based Multiplicative Scale Likelihood Estimation'. From the figure, it is seen that there is a trend of misclassifying the chord-scale types towards the 'major' scale.

This misclassification might stem from the HPCPmean features having boosted amplitudes at the major scale degrees due to the influence of harmonics, which is one of the main bottlenecks of chroma feature extraction using Harmonic Pitch Class Profiles.

When the HPCPstd features are used instead (Figure 20), the instances commonly misclassified with HPCPmean features are reduced considerably, and the misclassification trend towards the major scale disappears.

2) FEATURES = HPCPstd , CLASSIFICATION METHOD : Template Based Multiplicative Likelihood Estimation (Weiss, 2014)

Figure (20) Confusion Matrix using HPCPstd as features and Likelihood Estimation (multiplicative) for classification

The misclassification bias towards the 'major' scale is still present when performing "Template-Based Scale-type Classification" (explained in Section 5.1) using HPCPmean features. These results substantiate our inferences regarding HPCPmean features from the first stage of the experiments. The confusion matrices in Figures 19, 20, 21 & 22 show that the rule-based approaches generally have higher misclassification rates for similar chord-scale types. Lydian b7 and Altered scales are the chord-scale types that are misclassified most frequently. Also notice that the misclassified instances are detected as scale types that are not significantly different from the target scales. For instance, Lydian b7 instances are frequently misclassified as Mixolydian, where only the diatonic 4th differs between these two chord-scale types.

3) FEATURES = HPCPmean , CLASSIFICATION METHOD : Template-Based Additive Likelihood Estimation

Figure (21) Confusion Matrix using HPCPmean as features and Likelihood Estimation (additive) for classification

4) FEATURES = HPCPstd , CLASSIFICATION METHOD : Template-Based Additive Likelihood Estimation

Figure (22) Confusion Matrix using HPCPstd as features and Likelihood Estimation (additive) for classification

Results of Classification Using Supervised Learning Method

The supervised learning algorithm is evaluated using 10-fold stratified cross-validation as mentioned in the previous sections. In addition to the illustrations of confusion matrices for the results of the supervised learning method, we provide violin plots for each supervised learning test case, to see the extrema and the distribution of evaluation scores for the iterations in the cross-validation procedure.

5) FEATURES = HPCPmean , CLASSIFICATION METHOD : SVM

Figure (23) Evaluation of HPCPmean features for Supervised Learning stage

Note that the misclassification trend observed when using HPCPmean features with the rule-based classification methods does not appear in the case of supervised learning. It can be inferred that the learning algorithm internally reduces the influence of harmonics when performing the automatic classification.

6) FEATURES = HPCPstd , CLASSIFICATION METHOD : SVM

Figure (24) Evaluation of HPCPstd features for Supervised Learning stage

7) FEATURES = HPCPstd & HPCPmean , CLASSIFICATION METHOD : SVM

Figure (25) Evaluation of combined HPCP features for Supervised Learning stage

The violin plot in Figure 26 shows the distribution of the evaluation scores over the 10 randomized iteration sets. As seen in the figure, the variance of the evaluation scores is higher when using only the means of the HPCP features. The inclusion of the standard deviations of the HPCP features reduces the score variance, while slightly increasing the overall score. Note that the line in the middle of each violin indicates the median of the evaluation scores over the randomized splits.

Figure (26) Distribution of standard evaluation scores over validation sets

The figure below shows the evaluation scores of all the experiment instances.

Figure (27) Classification Scores

For the rule-based approaches, both mean and standard deviation features show similar performance. The Template-Based Additive Likelihood Estimation approach proposed in this study outperforms the Template-Based Multiplicative Likelihood Estimation method suggested in [9]. This performance gap may be due to the musical data consisting of monophonic solo performances: the exponential nature of the multiplicative likelihood estimation suggests that, in cases where the pitch-class distributions are far from uniform, the estimation is biased towards the major scale, due to the influence of harmonics falling on diatonic major scale degrees (the major 3rd, the perfect 5th, etc.).

The supervised learning method performs slightly better than the rule-based approaches when the same set of features is used. It is also seen that the standard deviation features alone yield a higher classification score than their combination with the mean features. From these results, it can be deduced that the standard deviations of the chroma bins perform better as features for automatic classification.

7.2 Evaluation on Student Performances in the Scale Exercise

Here, we present the test cases for the application of scale-based automatic assessment of improvisation performances on the scale exercise explained in the previous chapters. We apply the additive template-based likelihood estimation method for several reasons.

First, the lack of training data prevents the usage of the supervised learning method. Even if we had enough performances, the quality of the data would be another issue: in order to successfully evaluate and grade student performances using a learning system, the training data must be chosen carefully. If the goal is to measure the 'correctness' of a student performance based on the structure of the desired scale types, or to evaluate aspects that involve aesthetics in music, the student performances have to be manually annotated by an expert (e.g., a teaching assistant or music professor) with respect to the grading or performance assessment aspect in question.

Here we consider grading in four degrees of performance quality: 'Very Good Performance', 'Good Performance', 'Average Performance' and 'Poor Performance'. Below, two distinct student performances are provided, one graded as a very good and one as a poor performance.

For both cases, we provide the automatic assessment measures explained in Section 6.2. For the scale-type detection, the additive template-based scale estimation method is used. The table below summarizes the possible cases for each grading degree.

Performance Grading Degree | Possible cases
Very Good | perfect assessment scores; all notes played in the target chord-scale
Good | performances with minor mistakes; occasional out-scale pitch-classes; not all notes of the target scale are played
Average | incorrect scale choice; major mistakes; incomplete scales
Poor | performance out of context; too much chromaticism

The figures below are obtained from the Jupyter Notebook demonstration of the study¹. In the figures, the performance assessment results are given for each section of the exercise. In addition to the assessment scores, the chroma features of the student performances and the accumulated scale estimations over time are illustrated for the user. Note that the sample audio files used in this section can be found in the related GitHub repository.

¹ https://github.com/emirdemirel/Chord-ScaleDetection/blob/master/Chord-ScaleDetectionPART3.ipynb

Very Good Performance

Figure (28) Case study : Very Good Performance

Assessment Scores

Figure (29) Case study : Very Good Performance (ASSESSMENT SCORES)

The chroma visualization and the automatic assessment scores of the first performance we analyze in this section can be seen in Figure 28. In most parts of the exercise, the grades in all three assessment dimensions are above 90%, which makes this performance arguably a 'very good performance'. Notice the low in-scale rate in the first part of the exercise. Here, the in-scale rate is calculated according to the target chord-scale type.

In the chroma feature vs. time plot, a high rate of chromaticism is apparent. When this performance is listened to, it can be noticed that the performer uses a guitar technique called 'sliding', which is fretting a note and then moving (sliding) to another fret without releasing the finger pressure. Even though these slides may cause a lower in-scale rate, they do not affect the scale correctness, which is determined according to the predominant chord-scale in the performance. Notice that the proposed algorithm for chord-scale detection is able to detect the intended chord-scale even when the Scale Completeness measure is not 100%. Hence, we can comfortably rate this performance as a 'very good performance'.

Poor Performance - Performance out of Context

ASSESSMENT SCORES

Figure (30) Case study : Student Performance - Poor Performance (ASSESSMENT SCORES)

ASSESSMENT SCORES

Figure (31) Case study : Student Performance - poor grades (ASSESSMENT SCORES)

As seen from the 'Estimated Scales' section of the assessment scores table, none of the scale choices by the performer match the target scales in the exercise (major or minor). The in-scale rates are far from perfect. The scale completeness gives good scores for certain instances; the chroma vs. time plots show that in the parts with high 'scale completeness' scores (refer to Part 5 of the above example), the performer has played almost all the pitch-classes within one octave, in an almost random fashion. This case indicates that the evaluation metrics should also penalize wrong (out-scale) pitch-classes, instead of assessing the performance only according to the correct (in-scale) pitch-classes.

Discussions

As mentioned within the context of this work, chord-scales within a musical performance carry stylistic information. Hence, an analysis that does not take any stylistic approach into account would oversimplify chord-scales as an improvisation tool for musicians. Our rule-based approach models chord-scales as binary templates; by doing so, we treat the notion of 'in-scale' as a binary activation. This strategy includes neither the concept of tonal hierarchy in scales, first explained in [34], nor any other perceptual bias among the pitch-classes within a chord-scale. Modeling chord-scales as vectors constructed from empirical data would be one of the future directions.

Improvisation exists in many other music traditions that utilize concepts similar to chord-scales, and there has been extensive research on the analysis of 'modes' in other music traditions. In our prior work on the Makam Music (Turkish Classical Music) tradition [29], we performed automatic mode (makam) recognition based on chroma features, using the supervised classification method explained in Section 5.2. Our chroma-based automatic classification method achieved a 77% overall accuracy score on 1000 songs in 20 different makam types, which sets the state of the art in the task of automatic makam recognition¹. That study showed that distinct makams sharing the same scale are highly misclassified (Sultan-i Yegah with Nihavend, or Rast with Mahur) when the chroma vectors are constructed with respect to the tonic frequency. This indicates that automatic audio classification based on scales works well in this tradition too, yet there is still room for improvement.

¹ For further information on the dataset used for automatic makam recognition mentioned in this context, please refer to the Ottoman-Turkish Makam Music dataset [35].

An important discussion that this thesis work discloses is the robustness of chroma features, in terms of timbre, for the task of chord-scale detection. The automatic classification procedures proposed in this study use pitch-classes as features. By definition, chroma features represent the tonal content of a musical signal in terms of pitch-classes; therefore, chroma features are expected to be well suited to our task. We have used Harmonic Pitch Class Profiles, which are extracted by mapping spectral harmonics to pitch-classes within one octave. Due to the influence of non-uniform spectral harmonics, HPCPs are expected to be highly timbre-sensitive, since each instrument has its unique harmonic characteristics. Our experiments were carried out on musical signals from three different instruments, and the proposed method performed consistently across them. However, the effect of the choice of instrument on chord-scale detection was not tested systematically in this study; observing the effect of instrumentation would be an important contribution.

The supervised learning algorithm works slightly better than the rule-based approaches. This shows that modeling chord-scales in terms of tonal feature vectors is a valid approach. In our experiments, we constructed feature vectors of size 12, with regard to the equal-temperament system. The inclusion of microtonality, or of other tonal representations of audio, could improve the classification performance; it would also allow observing the presence of microtonal inflections in Western music traditions.

In the second phase of our experiments, we analyzed student performances given an improvisation exercise based on chord-scales. The automatic assessment metrics stated in this work are based on heuristics. It would be interesting to compare the proposed performance assessment metrics to human expert gradings on the same data.

Conclusion

In this thesis, we have studied the concept of chord-scales using computational methods. The importance of chord-scales within the context of improvisation is not yet adequately reflected in Music Information Retrieval research. Thus, we aimed to shed light on some of the basic characteristics of chord-scales that can be modeled computationally. Even though our analysis includes several music-theoretical assumptions, a more thorough approach to modeling chord-scales is essential.

Our study shows that information regarding chord-scales can be extracted from a musical signal using the chroma representation. Even though the proposed algorithm works for all the instruments present in our experiments, the nature of chroma feature extraction algorithms calls for further analysis regarding instrumentation. Another possible approach for modeling chord-scales could be using transcribed audio or MIDI as features (as described in [6]).

To carry out the experiments, we gathered data from two professional musicians and created an open-source dataset, which is shared on Freesound¹. The Chord-Scale Dataset is the first dataset that consists of monophonic jazz improvisation performances with 'chord-scale' annotations in a machine-readable format. We have provided a baseline for the chord-scale detection task on this dataset, and the experiments are shared on GitHub² for reproducibility. The sample student performances studied in Section 7.2 are also provided within the repository.

¹ https://freesound.org/people/emirdemirel/packs/24075/

In addition to the analysis on the Chord-Scale Dataset, we have demonstrated a real use case of the technology developed within the course of this thesis. This demonstration is hosted on the MusicCritic framework. As the online education market has been growing rapidly in recent years, we believe that our technology would be of use within the context of music education.

² https://github.com/emirdemirel/Chord-ScaleDetection

List of Figures

1 Three dimensions of research in Music Information Retrieval ...... 2
2 Major Scale and Degrees ...... 5
3 Minor Scale and Degrees ...... 5
4 Coursera, Gary Burton's Jazz Improvisation ...... 7
5 MusicCritic Framework [1] ...... 9

6 The evaluation interface of SocialVirtualBand [6] ...... 11

7 Sample sounds from Chord-scale dataset (freesound.org) ...... 15
8 Data Annotation Format (Chord-Scale Dataset) ...... 17
9 Modal (Chord-Scale) Exercise ...... 18
10 Exercise setup interface of MusicCritic ...... 19

11 HPCPunprocessed vs. Time ...... 29

12 HPCPonlyMax vs. Time ...... 29

13 HPCPpost-processed (after transient removal) vs. Time ...... 29

14 Chroma (HPCP) histograms, HPCPmean (left), HPCPstd (right) . . 31

15 Scale Likelihoods vs. Time (left) Chroma Features vs. Time (right) . 34

16 Chroma Histograms: HPCP.mean (right), HPCP.std (left) ...... 40
17 Binary representation of the scale types considered in this case study ...... 43
18 Cosine Similarity Matrix of the binary Scale Templates ...... 45

19 Confusion Matrix using HPCPmean as features and Likelihood Estimation (multiplicative) for classification ...... 47

20 Confusion Matrix using HPCPstd as features and Likelihood Estimation (multiplicative) for classification ...... 48


21 Confusion Matrix using HPCPmean as features and Likelihood Estimation (additive) for classification ...... 49

22 Confusion Matrix using HPCPstd as features and Likelihood Estimation (additive) for classification ...... 49

23 Evaluation of HPCPmean features for Supervised Learning stage . . . 50

24 Evaluation of HPCPstd features for Supervised Learning stage ...... 51
25 Evaluation of combined HPCP features for Supervised Learning stage ...... 51
26 Distribution of standard evaluation scores over validation sets ...... 52
27 Classification Scores ...... 52
28 Case study : Very Good Performance ...... 55
29 Case study : Very Good Performance (ASSESSMENT SCORES) ...... 55
30 Case study : Student Performance - Poor Performance (ASSESSMENT SCORES) ...... 56
31 Case study : Student Performance - poor grades (ASSESSMENT SCORES) ...... 57

List of Tables

1 The Chord-Scale Mapping ...... 6

2 Scale Types in Weiss (2014) ...... 12

3 Scale Types in the Chord-scale Dataset ...... 16

4 Spectral Analysis Parameters ...... 22

5 Scale Dictionary ...... 33

6 Test Case Indexes for HPCP (1-7) ...... 46

Bibliography

[1] Bozkurt, B., Gulati, S., Romani, O. & Serra, X. Musiccritic: A technological framework to support online music teaching for large audiences. 33rd World Conference of International Society For Music Education (ISME) (2018). URL https://doi.org/10.5281/zenodo.1211450.

[2] Fraser, W. A. Jazzology: A study of the tradition in which jazz musicians learn to improvise (Afro-American). Ph.D. thesis, University of Pennsylvania (1983). URL https://repository.upenn.edu/dissertations/AAI8406667.

[3] Nettles, B. & Graf, R. The Chord Scale Theory & Jazz Harmony (1997).

[4] Levine, M. The Jazz Theory Book (O'Reilly Media, Inc., 2011).

[5] Saindon, E. The chord scale theory and jazz harmony. Percussive Notes (2011).

[6] Ramona, M., Pachet, F. & Gorlow, S. Giant steps in jazz practice with the social virtual band. Music Learning with Massive Open Online Courses (2015). URL http://ebooks.iospress.nl/volumearticle/42034.

[7] Ozaslan, T., Guaus, E., Palacios, E. & Arcos, J. L. Identifying attack articulations in classical guitar. Computer Music Modeling and Retrieval 6684, 219-241 (2011). URL http://doi.org/10.1007/978-3-642-23126-1_15.

[8] Pachet, F. An object-oriented representation of pitch-classes, intervals, scales and chords: The basic MusES. In Proceedings of Journées d'Informatique Musicale (JIM) (1994).


[9] Weiss, C. & Habryka, J. Chroma based scale matching for audio tonality analysis. Proceedings of the 9th Conference on Interdisciplinary Musicology, CIM 2014, 168-173 (2014). URL http://publica.fraunhofer.de/dokumente/N-345665.html.

[10] Muller, M. & Ewert, S. Chroma toolbox: Matlab implementations for extracting variants of chroma-based audio features. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), 215-220 (2011). URL http://doi.org/10.1.1.399.9397.

[11] Smith, K. A. Perspectives on improvisation in beginning string pedagogy: A description of teacher anxiety, confidence, and attitude. Ph.D. thesis, Kent State University (2010). URL https://etd.ohiolink.edu/rws_etd/document/get/kent1271081122/inline.

[12] Pressing, J. Improvisation: Methods and models. In Sloboda, J. A. (ed.), Generative processes in music: The psychology of performance, improvisation, and composition, 129-178 (1988). URL http://dx.doi.org/10.1093/acprof:oso/9780198508465.003.0007.

[13] Biasutti, M. & Frezza, L. Dimensions of music improvisation. Creativity Research Journal 21, 232-242 (2009). URL https://doi.org/10.1080/10400410902861240.

[14] Kenny, M. & Barry, J. Improvisation. The Science and Psychology of Music Performance, 117-134 (2002). URL http://dx.doi.org/10.1093/acprof:oso/9780195138108.003.0008.

[15] Csikszentmihalyi, M. The domain of creativity. (Sage Publications, Inc, 1990).

[16] Palmer, C. M. An Analysis of Instrumental Jazz Improvisation Development Among High School and College Musicians. Ph.D. thesis (2013). URL http://dx.doi.org/10.1177/0022429416664897.

[17] May, L. F. Factors and abilities influencing achievement in instrumental jazz improvisation. Journal of Research in Music Education 51, 245–258 (2003). URL http://doi.org/10.2307/3345377.

[18] Watson, K. E. The effects of aural versus notated instructional materials on achievement and self-efficacy in jazz improvisation. Journal of Research in Music Education 58, 240-259 (2010). URL https://doi.org/10.1177/0022429410377115.

[19] Wehr-Flowers, E. Differences between male and female students' confidence, anxiety, and attitude toward learning jazz improvisation. Journal of Research in Music Education 54, 337-349 (2006). URL http://doi.org/10.1177/002242940605400406.

[20] McPherson, G. Evaluating improvisational ability of high school instrumentalists. Bulletin of the Council for Research in Music Education, 11-20 (1993). URL https://www.jstor.org/stable/40318607.

[21] Smith, D. T. Development and validation of a rating scale for wind jazz im- provisation performance. Journal of Research in Music Education 57, 217–235 (2009). URL https://doi.org/10.1177/0022429409343549.

[22] Bogdanov, D. et al. Essentia: An audio analysis library for music information retrieval. In Proceedings of the 14th Conference of the International Society for Music Information Retrieval (ISMIR), 493-498 (2013). URL http://hdl.handle.net/10230/32252.

[23] Serra, X. & Smith, J. Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition. Computer Music Journal 14, 12–24 (1990). URL http://doi.org/10.2307/3680788.

[24] Gómez, E. Tonal Description of Music Audio Signals. Ph.D. thesis, Universitat Pompeu Fabra (2006). URL https://doi.org/10.1287/ijoc.1040.0126.

[25] Amatriain, X., Bonada, J., Loscos, A. & Serra, X. Spectral processing. In Zölzer, U. (ed.), DAFX - Digital Audio Effects (John Wiley & Sons, 2002). URL http://doi.org/10.1002/9781119991298.ch10.

[26] Fujishima, T. Real-time chord recognition of musical sound: A system using Common Lisp Music. In Proceedings of the International Computer Music Conference, 464-467 (1999). URL http://hdl.handle.net/2027/spo.bbp2372.1999.446.

[27] Shepard, R. N. Circularity in judgments of relative pitch. The Journal of the Acoustical Society of America 36, 2346-2353 (1964). URL https://doi.org/10.1121/1.1919362.

[28] Gómez, E. Key estimation from polyphonic audio. Music Information Retrieval Evaluation Exchange (MIREX '05) (2005). URL http://doi.org/10.1287/ijoc.1040.0126.

[29] Demirel, E., Bozkurt, B. & Serra, X. Automatic makam recognition using chroma features. In Proceedings of Folk Music Analysis 2018 (FMA) (2018). URL https://doi.org/10.5281/zenodo.1239435.

[30] Vapnik, V. N. An overview of statistical learning theory. IEEE Transactions on Neural Networks 10, 988-999 (1999). URL http://doi.org/10.1109/72.788640.

[31] Cortes, C. & Vapnik, V. Support-vector networks. Machine Learning 20, 273-297 (1995). URL http://doi.org/10.1023/A:1022627411411.

[32] Claesen, M. & De Moor, B. Hyperparameter search in machine learning (2015). URL https://arxiv.org/abs/1502.02127.

[33] Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825-2830 (2011). URL http://dl.acm.org/citation.cfm?id=1953048.2078195.

[34] Temperley, D. & Marvin, E. W. Pitch-class distribution and the identification of key. Music Perception: An Interdisciplinary Journal 25, 193–212 (2008). URL https://doi.org/10.1525/mp.2008.25.3.193.

[35] Karakurt, A., Şentürk, S. & Serra, X. MORTY: A toolbox for mode recognition and tonic identification. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology, 9-16 (2016). URL https://doi.org/10.1145/2970044.2970054.