Melody Track Identification in Music Symbolic Files
David Rizo, Pedro J. Ponce de León, Antonio Pertusa, Carlos Pérez-Sancho, José M. Iñesta
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, Spain
{drizo,pierre,pertusa,cperez,inesta}@dlsi.ua.es
http://grfia.dlsi.ua.es

Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Standard MIDI files contain data that can be considered a symbolic representation of music (a digital score). Most of them are structured as a number of tracks, one of which usually contains the melodic line of the piece while the others contain the accompaniment. The objective of this work is to identify the track containing the melody using statistical properties of the notes and pattern recognition techniques. Finding that track is useful for a number of applications, such as melody matching when searching MIDI databases or motif extraction. First, a set of descriptors is extracted from each track of the target file. These descriptors are the input to a random forest classifier that assigns to each track a probability of being a melodic line. The track with the highest probability is selected as the one containing the melodic line of that MIDI file. Promising results have been obtained in tests on a number of databases of different music styles.

Introduction

There are different file formats to represent a digital score. Some of them are proprietary and others are open standards, like MIDI [1] or MusicXML [2], that have been adopted by many sequencers and score processors as data interchange formats. As a result, thousands of digital scores can be found on the Internet in these formats. A standard MIDI file is a representation of music designed to make it sound through electronic instruments, and it is usually structured as a number of tracks, one for each voice of the music piece. One of them usually contains its melodic line, especially in the case of modern popular music. The melodic line (also called melody voice) is the leading part in a composition with accompaniment. The goal of this work is to automatically find this melodic line track in a MIDI file using statistical properties of the notes and pattern recognition techniques. The proposed methodology can be applied to other symbolic music file formats, because the information used to make the decision is based only on how the notes are arranged in each voice of the digital score. Only the feature extraction front-end needs to be changed to deal with other formats.

The identification of the melodic track is useful for a number of applications. One example is melody matching when searching MIDI databases, both in symbolic format (Uitdenbogerd & Zobel 1999) and in audio format (Ghias et al. 1995) (in this case, the problem is often named 'query by humming', and the first stage is often an identification of the notes in the sound query). In all these cases, search queries are always a small part of the melody, which should therefore be clearly identified in the database files to perform melodic comparisons. Another application is motif extraction to build music thumbnails for music collection indexing.

The literature on melody voice identification is rather sparse. In the digital sound domain, several papers aim to extract the melodic line from audio files (Berenzweig & Ellis 2001; Eggink & Brown 2004). In the symbolic domain, Ghias and co-workers (Ghias et al. 1995) built a system to process MIDI files that extracts something similar to the melodic line using simple heuristics not described in their paper and discarding the MIDI percussion channel.

In (Uitdenbogerd & Zobel 1998), four algorithms were developed for detecting the melodic line in polyphonic [3] MIDI files, assuming that a melodic line is a monophonic [4] sequence of notes. These algorithms are based mainly on the note pitches; for example, keeping at every time the note of highest pitch from those that sound at that time (skyline algorithm).

Other works focus on how to split a polyphonic source into a number of monophonic sequences by partitioning it into a set of melodies (Marsden 1992) or selecting at most one note at every time step (Uitdenbogerd & Zobel 1999). In general, these works are called monophonic reduction techniques (Lemstrom & Tarhio 2000). Different approaches, like using voice information (when available), average pitch, and entropy measures have been proposed.

Another approach, related to motif extraction, focuses on the development of techniques for identifying patterns as repetitions that are able to capture the most representative notes of a music piece (Cambouropoulos 1998; Lartillot 2003; Meudic 2003).

Nevertheless, in this work the aim is not to extract a monophonic line from a polyphonic score, but to decide which of the tracks contains the main melody in a multitrack standard MIDI file. For this, we need to assume that the melody is indeed contained in a single track. This is often the case in popular music.

The features that characterize melody and accompaniment voices must be defined in order to be able to select the melodic track. There are some features in a melodic track that, at first sight, seem to suffice to identify it, like the presence of higher pitches (see (Uitdenbogerd & Zobel 1998)) or being monophonic (actually, a melody can be defined as a monophonic sequence). Unfortunately, any empirical analysis will show that these hypotheses do not hold in general, and more sophisticated criteria need to be devised in order to make accurate decisions.

To overcome these problems, a classifier able to learn in a supervised manner what a melodic track is, based on note distribution statistics, has been used. For that, a number of training sets based on different music styles have been constructed, consisting of multitrack standard MIDI files with all the tracks labelled either as melody or not melody. Each track is analyzed and represented by a vector of features, as described in the methodology section.

Therefore, a set of descriptors is extracted from each track of a target MIDI file, and these descriptors are the input to a classifier that assigns to each track a probability of being a melodic line. The tracks with a probability under a given threshold are filtered out, and then the one with the highest probability is selected as the melodic line for that file.

Several experiments were performed with different pattern recognition algorithms, and the random forest classifier (Breiman 2001) yielded the best results. The WEKA (Witten & Frank 1999) toolkit was chosen to implement the system.

The rest of the paper is organized as follows: first the methodology is described, both the way to identify a melody track and how to select one track for a song. Then, the experiments to test the method are presented, and the paper finishes with some conclusions.

[1] http://www.midi.org
[2] http://www.recordare.com
[3] In polyphonic music there can be several notes sounding simultaneously.
[4] In a monophonic line no more than one note can be sounding simultaneously.

Methodology

MIDI Track characterization

The content of each track is characterized by a vector of statistical descriptors based on descriptive statistics of note pitches and durations that summarize track content information.

Table 1: Extracted descriptors

Category            Descriptors
Track information   Relative duration
                    Number of notes
                    Occupation rate
                    Polyphony rate
Pitch               Highest
                    Lowest
                    Mean
                    Standard deviation
Pitch intervals     Number of different intervals
                    Largest
                    Smallest
                    Mean
                    Mode
                    Standard deviation
Note durations      Longest
                    Shortest
                    Mean
                    Standard deviation
Syncopation         Number of syncopated notes

In Table 1, the left column indicates the property analyzed and the right one the kind of statistics describing the property.

Four features were designed to describe the track as a whole and 15 to describe particular aspects of its content. For these 15 descriptors, absolute and normalized relative versions have been computed. These latter descriptors were calculated using the formula (value_i − min)/(max − min), where value_i is the descriptor to be normalized corresponding to the i-th track, and min and max are, respectively, the minimum and maximum value of this descriptor over all the tracks of the target MIDI file. This expresses these properties as proportions relative to the other tracks in the same file, using non-dimensional values. This way, a total of 4 + 15 × 2 = 34 descriptors were initially computed for each track.

The track overall descriptors are its relative duration (using the same scheme as above), number of notes, occupation rate (proportion of the track length occupied by notes), and the polyphony rate, defined as the ratio between the number of ticks in the track where two or more notes are active simultaneously and the track duration in ticks.

Pitch descriptors are measured using MIDI pitch values. The maximum possible MIDI pitch is 127 (note G8) and the minimum is 0 (note C−2).

The interval descriptors summarize information about the difference in pitch between consecutive notes. Pitch interval values are either positive, negative, or zero. Absolute values have been computed instead.
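As an illustration of the relative descriptors, the min-max formula (value_i − min)/(max − min) could be computed per descriptor across the tracks of one file. The following is a minimal Python sketch, not the authors' implementation; the function name, the sample values, and the handling of the case max = min (which the text does not specify) are assumptions.

```python
def normalize_descriptor(values):
    """Min-max normalize one descriptor over all tracks of a single file.

    values[i] is the absolute descriptor value for the i-th track.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: the paper does not say what to do when every track
        # has the same value; this sketch maps all tracks to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical mean-pitch values for the four tracks of one MIDI file.
mean_pitch = [60.0, 72.0, 48.0, 66.0]
normalized = normalize_descriptor(mean_pitch)

# Descriptor count stated in the text: 4 track-level descriptors plus
# 15 content descriptors, each in absolute and normalized form.
TRACK_LEVEL = 4
CONTENT = 15
total_descriptors = TRACK_LEVEL + CONTENT * 2
```

With these sample values the track with the lowest mean pitch maps to 0 and the one with the highest maps to 1, so the normalized descriptor only expresses a track's position relative to the other tracks of the same file.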
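The occupation rate and the polyphony rate can be illustrated with a tick-by-tick count of active notes. This is a sketch under stated assumptions: notes are given as hypothetical (onset, duration) pairs in ticks, the track length in ticks is known, and the same length is used as denominator for both rates. None of these representation details come from the paper.

```python
def track_rates(notes, track_length):
    """Return (occupation_rate, polyphony_rate) for one track.

    notes: list of (onset_tick, duration_ticks) pairs (assumed layout).
    track_length: track duration in ticks, must be positive.
    """
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    # Number of simultaneously active notes at every tick of the track.
    active = [0] * track_length
    for onset, duration in notes:
        for tick in range(onset, min(onset + duration, track_length)):
            active[tick] += 1
    occupied = sum(1 for count in active if count >= 1)
    polyphonic = sum(1 for count in active if count >= 2)
    return occupied / track_length, polyphonic / track_length


# Hypothetical track: two overlapping notes followed by a short one.
example_notes = [(0, 4), (2, 4), (8, 2)]
occupation, polyphony = track_rates(example_notes, 16)
```

In this example 8 of the 16 ticks carry at least one note and 2 ticks carry two notes at once. A per-tick array is simple but memory-hungry for long tracks; an interval sweep would be a more economical alternative.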
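The pitch interval descriptors of Table 1 can be sketched from a sequence of MIDI pitches by taking absolute differences between consecutive notes. The function name and the dictionary keys below are illustrative. The use of the population standard deviation is an assumption, since the text does not say which estimator was used, and the sketch assumes Python 3.8 or later, where `statistics.mode` returns the first mode found when several values tie.

```python
import statistics


def interval_descriptors(pitches):
    """Absolute pitch-interval statistics for one track.

    pitches: MIDI pitch values (0..127) of consecutive notes.
    """
    if len(pitches) < 2:
        raise ValueError("at least two notes are needed to form an interval")
    # Absolute differences, as the text states that absolute values are used.
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return {
        "different": len(set(intervals)),
        "largest": max(intervals),
        "smallest": min(intervals),
        "mean": statistics.mean(intervals),
        "mode": statistics.mode(intervals),
        # Assumption: population standard deviation.
        "std": statistics.pstdev(intervals),
    }


# Hypothetical five-note fragment: C4, D4, E4, D4, G4.
descriptors = interval_descriptors([60, 62, 64, 62, 67])
```

For this fragment the absolute intervals are 2, 2, 2 and 5 semitones, so the descending step D4 to... E4 to D4 counts the same as an ascending one.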
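The track selection rule described in the introduction (discard tracks whose melody probability falls under a threshold, then keep the most probable one) could look like the sketch below. The threshold value, the track names, the probabilities, and the choice to return None when no track passes the threshold are all assumptions made for illustration; in the paper the probabilities come from the random forest classifier.

```python
def select_melody_track(probabilities, threshold=0.5):
    """Pick the melody track from per-track melody probabilities.

    probabilities: dict mapping a track identifier to the probability
    assigned by a classifier. threshold is a hypothetical default.
    """
    # Filter out tracks whose probability is under the threshold.
    candidates = {
        track: p for track, p in probabilities.items() if p >= threshold
    }
    if not candidates:
        # Assumption: report that no track qualifies as melody.
        return None
    # Among the remaining tracks, take the one with the highest probability.
    return max(candidates, key=candidates.get)


# Hypothetical classifier output for one four-track file.
example_probs = {"bass": 0.05, "piano": 0.30, "vocals": 0.85, "strings": 0.60}
selected = select_melody_track(example_probs)
none_selected = select_melody_track(example_probs, threshold=0.9)
```

Filtering before taking the maximum lets the procedure decline to answer for files in which no track looks melodic, instead of always returning some track.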
Note duration descriptors were computed in terms of beats and are therefore independent of the MIDI file resolution.

This kind of statistical description of musical content is sometimes referred to as shallow structure description (Pickens 2001; Ponce de León, Iñesta, & Pérez-Sancho 2004). A set of descriptors has been defined, based on several categories of features that assess melodic and rhythmic properties of a music sequence, as well as track properties.

Feature selection

The descriptors listed above are a complete list of all the