A Corpus-Based Approach to the Computational Modelling of Melody
A Corpus-based Approach to the Computational Modeling of Melody in Raga Music

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by
Kaustuv Kanti Ganguli (Roll No. 134076013)

Under the guidance of
Prof. Preeti Rao

Department of Electrical Engineering
Indian Institute of Technology Bombay, Mumbai, India.
May 2019

To my grandparents ...

INDIAN INSTITUTE OF TECHNOLOGY BOMBAY, INDIA

CERTIFICATE OF COURSE WORK

This is to certify that Kaustuv Kanti Ganguli (Roll No. 134076013) was admitted to the candidacy of Ph.D. degree on July 2013, after successfully completing all the courses required for the Ph.D. programme. The details of the course work done are given below.

S.No  Course Code  Course Name                                      Credits
1     EE 635       Applied Linear Algebra                           6
2     EE 603       Digital Signal Processing and its Applications   6
3     EE 679       Speech Processing                                6
4     EE 601       Statistical Signal Analysis                      6
5     EE 610       Image Processing                                 6
6     EE 779       Advanced Topics in Signal Processing             6
7     EE 712       Embedded System Design                           6
8     EE 678       Wavelets                                         6
9     EE 692       R & D Project                                    6
10    EES 801      Seminar                                          4
11    HS 699       Communication and Presentation Skills            PP
12    CS 725       Foundations of Machine Learning                  AU
                   Total Credits                                    58

IIT Bombay
Date:
Dy. Registrar (Academic)

© Copyright by Kaustuv Kanti Ganguli, May 2019
All Rights Reserved

Abstract

Indian art music is predominantly an oral tradition with pedagogy involving the oral transmission of raga lessons. There exist text resources for musicology, wherein the lexicon finds its place in a rather prescriptive manner. Raga performance allows for considerable flexibility in interpretation of the raga grammar in order to incorporate elements of creativity via improvisation. It is therefore of interest to understand how the musical concepts are manifested in performance, and how the artiste improvises i.e.
uses the stock musicological knowledge in “new” ways, while carefully maintaining the distinctiveness of each raga in the ears of trained listeners. Alongside addressing the issues of subjectivity, scalability, and reproducibility in musicological research, this work proposes novel methods for relevant music information retrieval (MIR) applications such as rich transcription, melody segmentation, motif spotting, and raga recognition. While a general engineering approach is to optimize certain evaluation metrics, we aimed to ensure that our findings are informed by musicological knowledge and human judgment. To achieve this, our approach is two-fold: computational modeling of the melody on a sizable, representative corpus, followed by validation of the models through behavioral experiments that probe the schema learned by trained musicians.

We propose computational representations that robustly capture the particular melodic features of the raga under study while being sensitive enough to the differences between ragas tested within a sizable, representative music corpus. To provide a sound basis for tuning hyper-parameters, we exploit the notion of “allied ragas” that use the same tonal material but differ in their order, hierarchy, and phraseology. Our results show that computational representations of distributional and structural information in the melody, combined with suitable distance measures, give insights into how raga distinctiveness is manifested in practice over different time scales by creative performers. Finally, motivated by the parallels between musical structure and prosodic structure in speech, we present listening experiments that explore musicians’ perception of ecologically valid synthesized variants of a raga-characteristic phrase. Our findings suggest that trained musicians clearly demonstrate elements of categorical perception in the context of the technical boundary of a raga.
The last leg of the thesis discusses the application areas, namely mainstream MIR tasks as well as computational musicology paradigm. In sum, we aim to impart music knowledge into a data-driven computational model towards modeling human-judgment of melodic similarity. This cognitively-based model can be useful in music pedagogy, as a compositional aid, or building retrieval tools for music exploration and recommendation.

Contents

Abstract
List of Tables
List of Figures
1 Introduction
  1.1 The broad perspective
  1.2 Organization of the thesis
2 Literature on melodic similarity in MIR and computational musicology
  2.1 Motivation
  2.2 The distributional view
    2.2.1 Introduction
    2.2.2 The theory of tonal hierarchy in music
    2.2.3 First-order pitch distributions in audio MIR
  2.3 The structural view
    2.3.1 Melodic representation
    2.3.2 Melodic (dis)similarity
  2.4 Literature on perceptual experiments
3 Hindustani music concepts
  3.1 Motivation
  3.2 Raga grammar and performance
    3.2.1 Allied ragas
  3.3 Datasets
    3.3.1 Allied-ragas dataset
    3.3.2 Phrase dataset
  3.4 Musicological viewpoint of improvisation
    3.4.1 Musical expectancy
  3.5 Scope of the thesis
4 Audio processing for melodic analysis
  4.1 Motivation
  4.2 Background
    4.2.1 Objective modeling for musical entities
  4.3 Methodology
    4.3.1 Pitch time-series extraction from audio
    4.3.2 Svara segmentation and labeling
  4.4 Melodic representations
    4.4.1 Distributional representations
    4.4.2 Structural representation
    4.4.3 Event segmentation for melodic motifs
  4.5 Distance measures between pitch distributions
    4.5.1 A note on dynamic programming approaches
  4.6 Statistical model fit: regression
5 The distributional view
  5.1 Motivation
  5.2 Background
  5.3 Evaluation criteria and experiments
    5.3.1 Experiment 1: unsupervised clustering
    5.3.2 Experiment 2: detecting ungrammaticality
  5.4 Discussion
    5.4.1 Performance across allied raga-pairs
    5.4.2 Bin resolution: coarse or fine?
    5.4.3 Validation of svara transcription parameters
    5.4.4 New insights on time scales
  5.5 Conclusion
  5.6 Discussions on individual raga-pairs
    5.6.1 Raga-pair Deshkar-Bhupali
    5.6.2 Raga-pair Puriya-Marwa
    5.6.3 Raga pair Multani-Todi
6 The structural view
  6.1 Motivation
  6.2 Background
  6.3 Methodology and experiments
    6.3.1 Musicological hypotheses testing
    6.3.2 Realization of emphasis
  6.4 MIR applications
    6.4.1 Feature selection and evaluation
    6.4.2 Comparing phrase representations
  6.5 Summary
7 Perceptual similarity of raga phrases
  7.1 Introduction
  7.2 Literature review
    7.2.1 Literature summary and proposed method
  7.3 General method
    7.3.1 Choice of the phrase: relation to acoustic measurements
    7.3.2 Stimulus creation
    7.3.3 Participant summary
  7.4 Experiment 1: testing for the perceptual magnet effect
    7.4.1 Method
    7.4.2 Results and discussion: goodness rating
    7.4.3 Results and discussion: PME discrimination
  7.5 Experiment 2: testing for categorical perception
    7.5.1 Method
    7.5.2 Results and discussion: identification
    7.5.3 Results and discussion: CP discrimination
  7.6 Experiment 3: production based perception testing
    7.6.1 Method
    7.6.2 Observations
    7.6.3 Statistical model fit
  7.7 General discussion
  7.8 Extended experiments
    7.8.1 Type B stimuli
    7.8.2 Equivalent discussion on type A stimuli
    7.8.3 Visualization of perceptual space
  7.9 Concluding remarks
8 Retrieval applications
  8.1 Motivation
  8.2 Background
  8.3 Dataset and proposed systems
    8.3.1 Baseline system
    8.3.2 Behavior based system
    8.3.3 Pseudo-note system
  8.4 Experimental results
  8.5 Extension of the best performing proposed system
    8.5.1 Scoring scheme
  8.6 Discussion