Classifying Musical Scores by Instrument
Bryce Cai and Michael Svolos
Stanford University, Stanford, CA 94305

Abstract

A given line of music can be played with greater or lesser ease by different musical instruments because of the fixed properties of their construction and design, and ideally these instrumental idioms are taken into account when writing a part for a particular instrument. The question posed by this paper is whether these idioms, these differences, can be used to classify instrumental parts by instrument. We trained a multiclass classifier using linear regression over 860 instrumental parts from Classical-era musical scores. Our one-vs.-one algorithm achieved 23% accuracy, exceeding the range-based baseline (14%) by 9 percentage points.

1 Task definition

Instrumental music from the Classical era (1750-1825) is written with an individual line of music for each instrument. All lines are played at the same time to create the sound of the complete piece, whether it is a Bach piece for solo flute, a Mozart concerto for clarinet and orchestra, or a Haydn string quartet. Every wind or string instrument has its own range of notes that it is capable of playing; for instance, the flute cannot play below the B below middle C. Each instrument also has a different, much more nebulous set of limitations on what it can play easily. These limitations are a consequence of how each instrument was designed physically and acoustically. One example is a quick passage on trombone that requires many long shifts of the slide: such a passage is technically possible, but it demands great skill at the instrument. Similarly, passages with large leaps in pitch might be difficult on wind instruments, since they require quick changes in mouth position, but easy on violin, since the player can simply place a finger on the string for the new note.
This idea, that different instruments can play different parts with varying ease, is known as instrumental idiom. It is important to note that instrumental idiom is a quality of instruments, not of parts. A composer could write any part for any instrument, provided players capable of performing it can be found. The best composers, though, are able to write beautiful and expressive music that is also well suited to the instruments they are writing for. This makes it easier for performers to play the music well and is seen as part of the craft of composition. This brings us to the fundamental question of this paper: are the idiomatic differences between instrumental parts large enough to identify which instrument a part was written for? We measure success with accuracy, i.e., the fraction of parts that were labeled correctly out of all parts.

2 Infrastructure

One of the challenges of this project was collecting data. Several options were available to us: would we use audio files of pieces, images of the score, or score data in some musical score file format? Each option has its own advantages and disadvantages. For instance, there are many more recordings of pieces or scans of scores than there are digitized scores, but analyzing audio or images is much more difficult and outside the scope of this project. For this reason, we decided to work with digitized scores.

We downloaded our scores manually from the Mutopia Project, a collection of public-domain, digitized scores of classical music. There are a number of file formats for sheet music, including MusicXML, MIDI, and formats specific to certain score editors such as MuseScore or Sibelius. MIDI is a general standard recognized by most editors, so we used scores digitized in this format, which carry the .mid suffix.

Preprint. Under review.
Translating scores into this format is time-consuming, so only 2000 scores were available for download on the website. MIDI files are organized into tracks: one master track that contains general information, such as the title of the piece and timing resolution, and one or more other tracks, one for each line of music. To process our data, we used a Python toolkit called python-midi to split each file into tracks, so that each track represented one instrumental part to classify. Each track is a list of MIDI events. These include track-wide information such as the instrument name (which served as the true label against which we checked our predictions), as well as events giving the pitch and timing of each individual note.

To filter our data, we only evaluated instruments that are mostly monophonic, i.e., that play only one note at a time (trumpet: yes; piano: no). We also excluded instruments with fewer than 40 parts in the dataset and capped every instrument at 100 parts. Finally, we truncated each part at 1000 MIDI events. This left us with ten instruments and 860 parts in total.

Figure 1: Number of parts per instrument.

3 Approach

3.1 Baseline and Oracle

A simple baseline for this classification problem is to choose an instrument at random for each part. Since there are 10 instruments, this baseline gives 10% accuracy. A stronger baseline restricts this random choice from the set of all instruments to the set of instruments that are able to play the part at all. We determined this by examining the full range of each part, noting its highest and lowest notes. If that range fell outside the playable range of an instrument, the instrument was excluded from the candidate set; the assignment was then chosen at random from the remaining instruments (i.e., those whose playable range contained the entire part). This range baseline improved accuracy to 14%.
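The filtering rules and the range baseline above can be illustrated with a short Python sketch. It is not the authors' code: the `Part` record, the representation of each event as a single MIDI pitch number, and the playable ranges in `RANGES` are illustrative assumptions, and the fallback to a guess over all instruments when no range fits is also an assumption, since the text does not specify that case.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

MIN_PARTS, MAX_PARTS, MAX_EVENTS = 40, 100, 1000


@dataclass
class Part:
    instrument: str    # label taken from the track's instrument-name event
    events: List[int]  # simplification (assumed): one MIDI pitch per note event


def filter_parts(parts: List[Part], monophonic: Set[str]) -> List[Part]:
    """Keep mostly monophonic instruments with at least MIN_PARTS parts,
    cap each instrument at MAX_PARTS parts, and truncate parts to MAX_EVENTS."""
    by_instrument: Dict[str, List[Part]] = defaultdict(list)
    for part in parts:
        if part.instrument in monophonic:
            by_instrument[part.instrument].append(part)
    kept = []
    for instrument, group in by_instrument.items():
        if len(group) < MIN_PARTS:
            continue  # too few examples of this instrument
        for part in group[:MAX_PARTS]:
            kept.append(Part(instrument, part.events[:MAX_EVENTS]))
    return kept


# Hypothetical, approximate playable ranges (MIDI numbers, 60 = middle C);
# illustrative values, not the table used in the paper.
RANGES: Dict[str, Tuple[int, int]] = {
    "flute": (59, 96),     # B3 (with B foot) to C7
    "violin": (55, 103),   # G3 to G7
    "cello": (36, 84),     # C2 to C6
    "trombone": (40, 72),  # E2 to C5
}


def range_baseline(pitches: List[int], ranges: Dict[str, Tuple[int, int]],
                   rng: random.Random) -> str:
    """Guess uniformly among instruments whose range covers the whole part."""
    low, high = min(pitches), max(pitches)
    candidates = [name for name, (lo, hi) in ranges.items()
                  if lo <= low and high <= hi]
    if not candidates:  # assumption: fall back to a guess over all instruments
        candidates = list(ranges)
    return rng.choice(candidates)
```

With 10 instruments and no range information, a uniform guess is correct with probability 1/10, which is the 10% random baseline; the range filter can only help when it shrinks the candidate set below all 10 instruments.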
Oracles for this problem are either poor or very difficult to obtain. One rudimentary oracle is to look up the true assignment beforehand, which always gives 100% accuracy; this oracle is not very useful, as all it shows is that the maximum accuracy of a classifier is 100%. A more useful oracle is human classification by experts trained in known instrumental idioms. However, such experts are hard to find: assigning a part to one of 10 instruments requires expert knowledge of playing each of those instruments. As another rudimentary oracle, our own attempts at assigning instruments to parts achieved an accuracy of approximately 35%; note that this oracle relied on non-expert musical knowledge. An oracle drawing on more expert knowledge would likely achieve higher accuracy. We would also expect a well-trained classifier to outperform this oracle, since most of our own knowledge was range-based.

3.2 Multiclass Classification

Since our problem is one of multiclass classification, we implemented several methods in an effort to find the best one.

3.2.1 One-vs.-one

The main method we implemented was the one-vs.-one approach to multiclass classification, which reduces the problem of classifying a part into one of $n$ categories (for our problem, $n = 10$) to $\binom{n}{2}$ binary classification problems, one for each pair of instruments. Instead of training one $n$-ary classifier, we train $\binom{n}{2} = n(n-1)/2$ binary classifiers (45 for $n = 10$). One way to implement one-vs.-one classification is as follows. During training, each data point (in our case, a solo part) is passed to the $n - 1$ binary classifiers that compare that data point's category (in our case, the part's instrument) with each other category (in our case, each other instrument).
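The pairwise reduction can be made concrete with a small Python sketch that builds one training set per pair of instruments. The function name and the `(features, label)` data layout are hypothetical; the sketch shows only how the data is split, not how each binary classifier is fit.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, instrument label)


def one_vs_one_training_sets(
    data: List[Example],
) -> Dict[Tuple[str, str], List[Tuple[Sequence[float], int]]]:
    """Return one binary training set per unordered pair of labels.

    In the set for pair (a, b), parts of instrument a get target +1 and parts
    of instrument b get target -1; parts of other instruments are left out.
    """
    labels = sorted({label for _, label in data})
    subsets = {}
    for a, b in combinations(labels, 2):
        subsets[(a, b)] = [(x, 1 if y == a else -1)
                           for x, y in data if y in (a, b)]
    return subsets
```

With 10 instrument labels the dictionary has 10 * 9 / 2 = 45 entries, and every part appears in exactly 9 of them, once for each other instrument.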
In our implementation, these binary classifiers are simply linear classifiers. After training, we have $\binom{n}{2} = n(n-1)/2$ binary classifiers comparing pairs of categories, each trained on the subset of data labeled with one of the two categories it compares. To classify a new data point, we pass it through each of the $n(n-1)/2$ binary classifiers and count the number of times each category “wins” a comparison (i.e., for each category, the number of classifiers that assign the data point to that category; equivalently, the number of other categories $A$ for which the comparison between $A$ and that category favors that category). The data point is assigned to the category with the most wins. Ties are broken arbitrarily (in our case, in favor of the instrument that comes first in a fixed ordering passed to the model).

3.2.2 One-vs.-rest

Another approach to linear multiclass classification that we attempted was one-vs.-rest, in which $n$ binary classifiers are trained, one for each category, on the whole training set. Each training point is passed to the classifier for each instrument $i$ as an example of either “$i$” or “not $i$.” In this way, all $n$ classifiers see every training point, and multiclass classification is again reduced to a series of binary classification problems. However, the final one-vs.-rest prediction depends on confidence scores rather than on the binary decisions alone, since a test point can be classified into multiple categories or into no category at all. To ensure accurate predictions, the confidence values of the different binary classifiers must be kept on the same scale.

3.2.3 Implementation

Again, our model took as input musical parts formatted as MIDI files.
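The two decision rules described in 3.2.1 and 3.2.2 can be contrasted in a minimal Python sketch. It assumes each binary classifier is a linear score w · x + b (the abstract says the model was fit with linear regression; fitting is not shown here), and the function names are hypothetical. One-vs.-one counts wins and breaks ties by a fixed instrument order; one-vs.-rest takes the instrument with the highest score, which is only meaningful if the scores are on comparable scales.

```python
from typing import Dict, List, Sequence, Tuple

Linear = Tuple[Sequence[float], float]  # (weights w, bias b); score = w . x + b


def score(clf: Linear, x: Sequence[float]) -> float:
    w, b = clf
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def ovo_predict(x: Sequence[float], classifiers: Dict[Tuple[str, str], Linear],
                order: List[str]) -> str:
    """One-vs.-one: a positive score is a win for a, otherwise a win for b.
    Ties go to the instrument that appears first in `order`."""
    wins = {inst: 0 for inst in order}
    for (a, b), clf in classifiers.items():
        wins[a if score(clf, x) > 0 else b] += 1
    best = max(wins.values())
    return next(inst for inst in order if wins[inst] == best)


def ovr_predict(x: Sequence[float], classifiers: Dict[str, Linear]) -> str:
    """One-vs.-rest: pick the instrument whose "i vs. not i" score is highest.
    Assumes the scores of different classifiers are on the same scale."""
    return max(classifiers, key=lambda inst: score(classifiers[inst], x))
```

In the paper's setting, `order` would list the ten instruments, `ovo_predict` would consult 45 pairwise models, and `ovr_predict` would consult 10 one-vs.-rest models.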