Automatic Labelling of Tabla Signals
Total Page:16
File Type:pdf, Size:1020Kb
Automatic Labelling of Tabla Signals Olivier K. GILLET Gael¨ RICHARD GET-ENST (TELECOM Paris) GET-ENST (TELECOM Paris) 46, rue Barrault 46, rue Barrault 75013 Paris, France 75013 Paris, France [email protected] [email protected] Abstract a set of general descriptors such as genre, style, musicians • and instruments of a music piece. Most of the recent developments in the field of music a set of more specific descriptors such as beat, notes, indexing and music information retrieval are focused • chords or nuances. on western music. In this paper, we present an auto- matic music transcription system dedicated to Tabla - Automatic labelling of instrument sounds therefore represents a North Indian percussion instrument. Our approach one of the component of a complete indexing system. Pre- is based on three main steps: firstly, the audio signal is vious efforts in automatic labelling of instruments sounds segmented in adjacent segments where each segment have mainly been dedicated to sounds with definite pitch represents a single stroke. Secondly, rhythmic infor- ([Kaminskyj, 2001], [Martin, 1999] or [Herrera et al., 2000] for mation such as relative durations are calculated using a critical review). Percussive sounds is a specific class of in- beat detection techniques. Finally, the transcription strument sounds. They can be pitched (as for xylophone or (recognition of the strokes) is performed by means of marimba) or unpitched (as for drum or congas). There is much a statistical model based on Hidden Markov Model interest in devoting more effort to the latter class since there is (HMM). The structure of this model is designed in or- a number of specific applications for this class of sounds (auto- der to represent the time dependencies between suc- matic drum loop recognition and retrieval, virtual dancer, drum- cessives strokes and to take into account the specifici- controlled synthesizers . ). ties of the tabla score notation (transcription symbols may be context dependent). Realtime transcription In fact, more attention is now given to the class of Tabla soli (or performances) with an error rate of of unpitched percussive sounds (see for example 6.5% is made possible with this transcriber. The tran- [Herrera et al., 2003],[McDonald and Tsang, 1997], scription system, along with some additional features [Gouyon and Herrera, 2001])), but these efforts are still such as sound synthesis or phrase correction, are inte- limited to western music and for most studies only isolated grated in a user-friendly environment called Tablas- sounds are considered. cope. We propose in this paper a study on the automatic transcrip- tion of tabla performances. The tabla is a percussion instrument widely used in (North) Indian classical and semi-classical mu- 1 Introduction sic, and which has gained more and more recognition among Due to the exponential growth of available digital information, fusion musicians. As for a number of percussive instruments there is a need for techniques that would make this information (such as congas, drum), the characteristics of the audio signal more readily accessible to the user. As a consequence, auto- produced is rather impulsive and successive events are therefore matic indexing and retrieval of information based on content is rather easily detected. Another advantage for automatic analy- becoming more and more important and represent very chal- sis is that such instruments produce only limited sound mixtures lenging research areas. Automatic indexing of digital informa- (1 or 2 events at the same time where in polyphonic music over tion permits to extract a textual description of this information 50 notes can be played simultaneously). However, the transcrip- (i.e. meta data). In the context of music signals, if such a de- tion of tabla signals is quite challenging because there is strong scription would ultimately be a complete transcription (for ex- time dependencies that are essential to take into account for a ample in the form of music scores), it should at least target to successful transcription. There are two types of time dependen- extract two types of descriptors from the audio signal: cies : firstly, the instrument can produce long, resonant strokes which will overlap and alter the timbre of the following strokes. Secondly, the symbol used to represent a stroke can depend on Permission to make digital or hard copies of all or part of this work the context in which it is used. Prior works related to computeri- for personal or classroom use is granted without fee provided that zation of the Tabla include a phrase retrieval system using MIDI copies are not made or distributed for profit or commercial advan- input [Roh and Wilcox, 1995] or specific controller and synthe- tage and that copies bear this notice and the full citation on the first sis models [A. Kapur and Cook, 2002], but to our knowledge no page. c 2003 Johns Hopkins University. transcription system of tabla performances have been reported. A tabla sequence or rhythm is composed of successive bols of different durations (note, half-note, quarter note,...) however the rhythmic structure can be quite complex. The basic rhythmic structures (called taal) can have a large variety of beats (for example 6,7,8, 10, 12, 16,..) which are grouped in measures. The successives measures may have different number of beats (for example Dhin Na Dhin Dhin Na Tin Na Dhin Dhin j j j Na where the vertical bars symbolise the measure boundaries). The grouping characteristics are in a sense similar to those ob- served in spoken or written languages where ”words” can be grouped in sentences. These properties can be modelled by Figure 1: The tabla is a pair of drums : a metallic bass drum means of a Language model that will provide an estimation of (the bayan) and a wooden treble drum : the dayan the probability of a given sequence of words. For the tabla, since the association between a sound and the corresponding bol can be context-sensitive (i.e. depends on previous bols), it The paper is organized as follows. The next section presents the seems quite appropriate to use language modelling approaches instrument and the specific transcription that is commonly used to model these dependencies. to describe tabla performances. Section 3 is dedicated to the In this work, the transcription is limited to the recognition of description of the transcription system including a detailed sec- the successives bols with their corresponding relative duration tion on the model proposed. Section 4 presents some evaluation (note, half-note etc...) but does not intend to extract the complex results on real tabla performances. Finally, section 5 suggests rhythmic structure (i.e the measures). some conclusions. 3 Transcription of Tabla phrases 2 Presentation of the tabla 3.1 Architecture of the system The tabla (see figure 1) is a pair of drums traditionally used for The architecture of our system is organized in several modules: the accompaniment of Indian classical or semi-classical music. The bayan is the metallic bass drum, played by the left hand. Parametric representation the signal parametric repre- It is used to produce a loud resonant sound or a damped non- • sentation is obtained by means of two main sub-modules: resonant one. By sliding the palm of the hand on the drum head the onset detection which permits to segment the signal while striking, the perceived pitch can be modified. The dayan into individual strokes and the features extraction which is the wooden treble drum, played by the right hand. A larger provides an acoustic vector from the segment previously variety of sounds is produced on this drum. As for timbals for extracted. example, the drum can be tuned according to the accompanied voice or instrument. It is also important to note that for this Sequence Modelling: the sequence of feature vectors class of instruments the main resonance (that corresponds to the • (one vector per stroke) is then modelled by a classification main perceived pitch) is much above the resonance of the other technique such as k nearest neighbors (k-NN) or Hidden harmonics. Further elements on the physics of the Tabla can be Markov Models (HMM). found in [Fletcher and Rossing, 1998] or in the early work of Raman [Raman, 1934]. Transcription: The final transcription is given in the form • of a succession of bols or words accompanied with a rhyth- A mnemonic syllable or bol is associated to each of these mic notation obtained from the tempo detection module strokes. Since the musical tradition in India is mostly oral, com- positions are often transmitted from masters to students as sung bols. Common bols are : Ge, Ke (bayan bols), Na, Tin, Tun, Figure 2 provides the general architecture of the system includ- Ti, Te (dayan bols). Strokes on the Bayan and dayan can be ing the transcription modules (in bold line), and the tabla se- combined, like in the bols : Dha (Na + Ge), Dhin (Tin + Ge), quence generator/synthesizer (in dotted line) that is further ex- Dhun (Tun + Ge)... Because of these characteristics, even if plained in section 5.2. It is important to note that the input to the two drums are played simultaneously, the transcription can al- system can either be an audio file (for example in .wav format) ways be considered as monophonic - a single symbol is used or audio streams that are processed in real time. even if the corresponding stroke is compound. 3.2 Parametric representation A specificity of this notation system is that two different bols 3.2.1 Segmentation in strokes can sound very similar. For example, Ti and Te nearly sound the same, but are played with different fingers. Though, it is As for many transcription system, the first step of our tran- possible to figure out which bol is used, since a fast sequence of scriber consists in performing a segmentation of the incoming these strokes is played by alternating between Ti and Te.