Bridging Chasms in Hindustani Music Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
Bridging Chasms in Hindustani Music Retrieval A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy by Joe Cheri Ross (Roll No. 114050001) Under the guidance of Prof. Preeti Rao and Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, INDIAN INSTITUTE OF TECHNOLOGY BOMBAY December 2017 Declaration I declare that this written submission represents my ideas in my own words and where others ideas or words have been included I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I un- derstand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed. Joe Cheri Ross (Roll No: 114050001) Date: 22nd December 2017 Abstract Development of a music recommender system, one of the key applications of Music Informa- tion Retrieval (MIR), necessitates research into methods to represent and retrieve music infor- mation efficiently. Considering the specific characteristics of each music culture and the diverse requirements thereof, the methods must be culture-aware with many of the associated tasks be- ing culture-specific. This research is motivated by the importance of a music recommendation system for Hindustani music. The investigated tasks focus primarily on information extraction from melodic audio and text content, realizing the significance of information from multi-modal sources. In the context of information extraction from audio signals, we present our investiga- tions on melodic motif detection involving mukhda (main title phrase of a composition) and pakad (raga characteristic phrase) detection. Our investigation on extracting meta-information from natural language text focuses on coreference resolution, with the aim of improving rela- tion extraction from textual content. The task of raga similarity detection is investigated with both text and music content (available as music notation). Melodic motifs form essential building blocks in Indian Classical music. The mukhda and the pakad phrases provide strong cues to the identity of the composition and the underlying raga respectively. Automatic detection of such recurring basic melodic phrases is highly relevant to music information retrieval in Hindustani music. This thesis discusses approaches to detect mukhda and pakad in a concert audio recording by exploring musicological cues and similarity computations. Considering the large number of ragas in Hindustani music, detection of similarities be- tween them is beneficial to music recommendation. The problem of raga similarity detection is investigated with two diverse data sources viz., discussions on Hindustani ragas and compo- sition notations. Each of these sources help in extracting different aspects of raga similarity. While text discussions aid to extraction of similarities generally perceived by musicians, sim- ilarities based on melodic attributes are extracted through composition notations. Both the ii approaches learn representations for ragas, and the similarities between the representations in- dicate the similarities between the ragas. Realizing the importance of coreference resolution to improve relation extraction from text, our investigations on extracting meta-information focuses on the same from music discus- sion forums. The attempt to design a specific approach is motivated by the nature of the text and the domain specificity. While the feature design considers domain specificity and nature of the text, we also observe the need for a hybrid approach. The proposed modification to best- first clustering, for the clustering step in the mention-pair model, considers relation between candidate antecedents while resolving for an anaphoric mention. We also discuss a method to identify the semantic class, a crucial feature for coreference resolution, with the help of web re- sources. This approach to semantic class identification is generalizable for any domain-specific dataset with similar challenges. The investigations with eye-tracking and memory networks initiate research in the direction of bridging the gap between the cognitive process involved and the machine understanding of coreference resolution. iii Contents Abstract ii List of Tables ix List of Figures xii 1 Introduction 1 1.1 Music Information Retrieval (MIR) . .2 1.2 Introduction to Hindustani Music Concepts . .3 1.2.1 Bandish and Performance . .3 1.2.2 Swara . .4 1.2.3 Raga . .5 1.2.4 Tala (Rhythm) . .6 1.2.5 Mukhda and pakad ...........................7 1.3 Music information retrieval for Hindustani music . .8 1.4 Knowledge representation: MIR . 10 1.5 Motivation . 11 1.6 Objectives . 14 1.7 Overview of Thesis Contribution . 15 1.7.1 Melodic Motif Detection . 15 1.7.2 Raga Similarity Detection . 17 1.7.3 Coreference Resolution . 19 1.8 Thesis Organization . 23 2 Literature Survey 26 2.1 MIR from Audio: Motif Detection . 26 iv 2.1.1 Symbolic Representation . 26 2.1.2 Audio . 27 2.2 Raga Similarity Detection . 28 2.3 Coreference Resolution . 30 2.3.1 Linguistic and Other Considerations . 30 2.3.2 Mention detection . 31 2.3.3 Rule Based Approaches . 32 2.3.4 Data-driven Approaches . 33 2.4 Summary . 41 3 Mukhda Detection 42 3.1 Dataset and Evaluation Methods . 43 3.2 Automatic Mukhda Detection . 45 3.2.1 Vocal Pitch Detection . 46 3.2.2 Motif Candidate Selection . 47 3.2.3 Similarity Modeling . 48 3.3 Experiments and Results . 50 3.4 Summary . 53 4 Pakad Detection 55 4.1 Dataset and Annotation . 56 4.2 Audio Processing . 59 4.2.1 Audio Processing Stages . 59 4.3 Phrase-level pitch curve characteristics . 61 4.3.1 Intra-phrase-class similarity . 62 4.4 Similarity Computation . 65 4.4.1 Classification of test segment . 65 4.4.2 Constraint Learning . 67 4.5 Experiments and Results . 71 4.6 Summary . 75 5 Raga Similarity Detection from Composition Notation 76 5.1 Raga Similarity Based on Notation: Motivation and Central Idea . 77 v 5.2 Neural Network Architecture for Learning Note-Embeddings .......... 78 5.3 Raga Similarities from Note-Embeddings . 81 5.4 Baselines for Comparison . 81 5.4.1 N-gram based approach . 81 5.4.2 Pitch Class Distribution (PCD) . 81 5.4.3 Uni-directional LSTM . 82 5.5 Dataset . 82 5.6 Data Pre-processing . 84 5.7 Experiments . 84 5.7.1 Evaluation Methods . 84 5.7.2 Setup . 86 5.8 Results . 86 5.9 Summary . 89 6 Raga Similarity Detection from Textual Discussions 91 6.1 Distributional Semantics . 92 6.2 Neural Network Architecture . 92 6.2.1 Mikolov’s Architecture . 92 6.2.2 Proposed Approach . 93 6.3 Datasets . 94 6.4 Experiments . 95 6.4.1 Quantitative Evaluation . 95 6.4.2 Qualitative Evaluation . 96 6.5 Results . 96 6.5.1 Qualitative Evaluation . 98 6.6 Applicability of the Approach to Other Domain-Specific Datasets . 101 6.7 Summary . 102 7 Information Extraction from Music Discussion Forums: Relevance of Coreference Resolution 103 7.1 Dataset: Rasikas.org . 104 7.2 Our Approach . 106 7.3 Knowledge Source for Coreference Resolution . 107 vi 7.3.1 Grammatical Role Features . 109 7.4 Mention Detection . 111 7.5 Bayesian Network for Small Dataset . 112 7.5.1 Bayesian Network . 113 7.5.2 Bayesian Network Design . 114 7.6 Hybrid Approach . 114 7.6.1 Coreference Evaluation . 116 7.7 Experiments & Results . 116 7.7.1 Mention Detection . 116 7.7.2 Bayesian Network for Small Dataset . 117 7.7.3 Feature Evaluation . 119 7.7.4 Hybrid Approach . 121 7.7.5 Results with an Existing Approach . 126 7.8 Summary . 127 8 Improved Best-First Clustering 128 8.1 Best-First Clustering . 129 8.2 Improved Best-First Clustering . 130 8.2.1 Algorithm . 132 8.2.2 Dynamic λ ................................ 133 8.3 Experiments & Results . ..