Transfer Learning for Piano Sustain-Pedal Detection
Beici Liang, György Fazekas and Mark Sandler
Centre for Digital Music, Queen Mary University of London, London, United Kingdom
Email: {beici.liang, g.fazekas, mark.sandler}@qmul.ac.uk
arXiv:2103.13219v1 [cs.SD] 24 Mar 2021

This work is supported by the Centre for Doctoral Training in Media and Arts Technology (EPSRC and AHRC Grant EP/L01632X/1), the EPSRC Grant EP/L019981/1 "Fusing Audio and Semantic Technologies for Intelligent Music Production and Consumption (FAST-IMPACt)" and the European Commission H2020 research and innovation grant AudioCommons (688382). Beici Liang is funded by the China Scholarship Council (CSC).

Abstract—Detecting piano pedalling techniques in polyphonic music remains a challenging task in music information retrieval. While other piano-related tasks, such as pitch estimation and onset detection, have seen improvement through applying deep learning methods, little work has been done to develop deep learning models that detect playing techniques. In this paper, we propose a transfer learning approach for the detection of sustain-pedal techniques, which are commonly used by pianists to enrich the sound. In the source task, a convolutional neural network (CNN) is trained to learn the spectral and temporal contexts that arise when the sustain pedal is pressed, using a large dataset generated by a physical-modelling virtual instrument. The CNN is designed and experimented with by exploiting knowledge of piano acoustics and physics, and achieves an accuracy score of 0.98 in the validation results. In the target task, the knowledge learned from the synthesised data is transferred to detect the sustain pedal in acoustic piano recordings. A concatenated feature vector using the activations of the trained convolutional layers is extracted from the recordings and classified into frame-wise pedal press or release. We demonstrate the effectiveness of our method on acoustic piano recordings of Chopin's music. From the cross-validation results, the proposed transfer learning method achieves an average F-measure of 0.89 and an overall performance of 0.84 obtained using the micro-averaged F-measure. These results outperform applying the pre-trained CNN model directly or the model with a fine-tuned last layer.

Fig. 1. Different representations of the same note played without (first note) or with (second note) the sustain pedal, including music score, melspectrogram and messages (0-127) from MIDI or sensor data.

I. INTRODUCTION

Learning to use the piano pedals strongly relies on listening to nuances in the sound. Instructions on when the pedal should be pressed and for what duration are required to develop critical listening. To facilitate the learning process, we pose a research question: "Can a computer point out pedalling techniques when a piano recording from a virtuoso performance is given?" Pedalling techniques change very specific acoustic features, which can be observed from their spectral and temporal characteristics on isolated notes. However, their effects are typically obscured by variations in pitch, dynamics and other elements of polyphonic music. Therefore, automatic detection of pedalling techniques using hand-crafted features is a challenging problem. Given enough labelled data, deep learning models have shown the ability to learn hierarchical features. If these features can represent the acoustic characteristics corresponding to pedalling techniques, such a model can serve as a detector.

In this paper, we focus on detecting the technique of the sustain pedal, which is the most frequently used of the three standard piano pedals. All dampers are lifted off the strings when the sustain pedal is pressed. This mechanism helps to sustain the currently sounding notes and allows strings associated with other notes to vibrate due to coupling via the bridge. A phenomenon known as sympathetic resonance [1] is thereby enhanced and embraced by pianists to create a "dreamy" sound effect. We can observe how this phenomenon is reflected in the melspectrogram in Figure 1, where note F4 is played without (first) and with (second) the sustain pedal in two bars respectively. Note that the symbol under the second bar of the music score in Figure 1 can be used to indicate the sustain-pedal technique. Yet, even if pedal notations are provided, pedalling in the same piano passage can be executed in many different ways. Playing techniques are typically adjusted to the performer's sense of tempo and dynamics, as well as to the location where the performance takes place [2].
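To make the melspectrogram representation of Figure 1 concrete, the following minimal sketch computes and plots a log-scaled melspectrogram of a short piano excerpt. It assumes the librosa library is available; the file name and analysis parameters (sampling rate, number of mel bands, hop size) are illustrative choices and not necessarily the settings used in this work.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative settings; not necessarily the parameters used in the paper.
AUDIO_FILE = "f4_with_and_without_pedal.wav"  # hypothetical excerpt, as in Fig. 1
SR = 44100
N_MELS = 128
HOP_LENGTH = 1024

# Load the excerpt and compute a mel-scaled spectrogram.
y, sr = librosa.load(AUDIO_FILE, sr=SR)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS, hop_length=HOP_LENGTH)

# Convert power to decibels so the sustained partials and the resonance
# that persists after the note decays are easier to see.
mel_db = librosa.power_to_db(mel, ref=np.max)

librosa.display.specshow(mel_db, sr=sr, hop_length=HOP_LENGTH,
                         x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Melspectrogram of F4 without and with the sustain pedal")
plt.tight_layout()
plt.show()
```

In such a plot, the pedalled bar should show richer and longer-lasting energy around and after the note's partials, which is the sympathetic-resonance cue discussed above.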
Given that detecting pedalling nuances from the audio signal alone is a rather challenging task [3], several measurement systems have been developed to capture the pedal movement. For instance, the Yamaha Disklavier piano can encode this movement into MIDI messages (0-127) along with note events. A dedicated system proposed in [4] enables synchronously recording the pedalling gestures and the piano sound. This can be deployed on common acoustic pianos, and it is used to provide the ground truth dataset introduced in Section III.
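For reference, the pedal values (0-127) recorded by such systems can be converted into the frame-wise press/release labels used as ground truth. The sketch below is only an illustration of the usual labelling convention, assuming the pretty_midi library; the file name, hop size and the use of MIDI control change 64 (the standard sustain-pedal controller) are assumptions rather than the exact labelling pipeline of this paper, whose ground truth comes from the sensor system of [4].

```python
import numpy as np
import pretty_midi

# Hypothetical file name and frame rate, for illustration only.
MIDI_FILE = "performance.mid"
HOP_SECONDS = 0.02  # frame hop for the frame-wise labels

pm = pretty_midi.PrettyMIDI(MIDI_FILE)

# Collect sustain-pedal (MIDI CC 64) events from all instruments.
pedal_events = [(cc.time, cc.value)
                for inst in pm.instruments
                for cc in inst.control_changes
                if cc.number == 64]
pedal_events.sort()

# Sample the pedal state on a regular frame grid: values of 64 or above
# are conventionally interpreted as "pedal pressed".
frame_times = np.arange(0.0, pm.get_end_time(), HOP_SECONDS)
labels = np.zeros(len(frame_times), dtype=int)
value = 0
idx = 0
for i, t in enumerate(frame_times):
    while idx < len(pedal_events) and pedal_events[idx][0] <= t:
        value = pedal_events[idx][1]
        idx += 1
    labels[i] = 1 if value >= 64 else 0  # 1 = press, 0 = release
```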
Detection of pedalling techniques from audio recordings is necessary in cases where installing sensors on the piano is not practical. We approach sustain-pedal detection from the audio domain using transfer learning [5], as illustrated in Figure 2. Transfer learning exploits the knowledge gained during training on a source task and applies it to a target task [6]. This is crucial for our case, where the target-task data is obtained from recordings of a different piano; it is therefore difficult to learn a "good" representation because of mechanical and acoustical deviations. In our source task, a convolutional neural network (denoted by convnet hereafter) is trained to distinguish synthesised music excerpts with or without the sustain-pedal effect. The convnet is then used as a feature extractor, aiming to transfer the sustain-pedal effect learned from the source task to the target task. Support vector machines (SVMs) [7] are trained using the frame-wise convnet features from the acoustic piano recordings to finalise the feature-representation transfer as the target task. The SVM serves as a classifier that localises which frames are played with the sustain pedal. The performance is expected to improve significantly with the new feature representation. To sum up, the main contributions of this paper are:

1) A novel strategy of model design, which incorporates knowledge of piano acoustics and physics, enabling the convnet to become more effective at representing the sustain-pedal effect.
2) A transfer learning method that allows the convnet trained on the source task to be adapted to the target task, where the recording instruments and room acoustics are different. This also allows effective learning with a smaller dataset.
3) A visual analysis of the convolutional layers of the convnet, promoting model designs with fewer trainable parameters while maintaining their discriminating power.

The rest of this paper is organised as follows.

Fig. 2. Framework of the proposed method. In the source task, input.mid is rendered into paired pedal.wav and no-pedal.wav excerpts used to train the convnet; in the target task, convnet features extracted from audio recording.wav are classified by an SVM to produce the results.

II. RELATED WORK

Past research in music information retrieval (MIR) abounds in the recognition of musical instruments, but automatic detection of instrumental playing techniques (IPT) remains underdeveloped [8]. IPT creates a variety of spectral and temporal variations of the sounds of different instruments. Recent research has attempted to transcribe IPT on drums [9], erhu [10], guitar [11], [12] and violin [13], [14]. Hand-crafted features are commonly designed based on instrument acoustics to capture the salient variations induced by IPT. The sustain-pedal technique leads to rather subtle variations, so most studies managed to detect the technique on isolated notes only [15]–[17]. This challenge is further intensified in polyphonic music, where clean features extracted from isolated notes cannot be easily obtained. In our prior work [18], the first research aiming to extract pedalling techniques in polyphonic piano music, we proposed a method for detecting pedal onset times using a measure of sympathetic resonance. Yet, this method assumes the availability of a model of the specific acoustic piano that is also used in evaluation. Moreover, it is prone to errors due to its reliance on note transcription.

Convolutional neural networks (CNNs) have been used to boost performance in MIR tasks, with the ability to efficiently model temporal features [19] and timbre representations [20]. We choose CNNs to facilitate learning the time-frequency contexts related to the sustain pedal, using synthesised excerpts in pairs (pedal versus no-pedal versions). Using this method, contexts that are invariant to large changes in pitch and dynamics can be learned.

To apply a convnet trained on synthesised data to the context of real recordings, a transfer learning approach can be used. It has been gaining attention in MIR for alleviating the data sparsity problem and for its ability to be used across different tasks. For example, Choi et al. [21] obtained features from CNNs that were trained for music tagging in the source task. These features outperformed MFCC features in the target tasks, such as genre and vocal/non-vocal classification. We believe such a strategy is suited to the challenges of detecting the sustain pedal in polyphonic piano music recorded under different acoustic and recording conditions.

In our case, training a convnet with the synthesised data is considered the source task. In the target task, the learnt representations from the trained convnet are used as features, extracted from every frame of a real piano recording, to train a dedicated classifier adapted to the actual acoustics of the piano and the performance venue used in the recording. This transfer learning approach is expected to better identify frames played with the sustain pedal.
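As a sketch of this feature-representation transfer, the code below extracts concatenated activations of the convolutional layers of a pre-trained convnet for a batch of melspectrogram excerpts and trains an SVM on them. The model file, layer-pooling scheme, excerpt shape and placeholder data are assumptions for illustration only and do not reproduce the exact architecture or settings reported in this paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from tensorflow import keras

# Hypothetical artefact: a convnet trained on synthesised pedal/no-pedal
# excerpts in the source task.
convnet = keras.models.load_model("pedal_convnet.h5")

# Build a feature extractor from the activations of the convolutional layers.
conv_layers = [l for l in convnet.layers if isinstance(l, keras.layers.Conv2D)]
extractor = keras.Model(inputs=convnet.input,
                        outputs=[l.output for l in conv_layers])

def convnet_features(excerpts):
    """Concatenate globally pooled activations of each conv layer into one
    feature vector per melspectrogram excerpt."""
    activations = extractor.predict(excerpts)            # list of 4-D maps
    pooled = [a.mean(axis=(1, 2)) for a in activations]  # pool over time/frequency
    return np.concatenate(pooled, axis=1)

# Placeholder target-task data: melspectrogram excerpts from an acoustic
# recording and their frame-wise labels (1 = pedal press, 0 = release).
# Shapes must match the convnet input; these values are illustrative only.
X_excerpts = np.random.rand(200, 128, 43, 1).astype("float32")
y = np.random.randint(0, 2, size=200)

# Train and evaluate an SVM on the transferred features.
X = convnet_features(X_excerpts)
svm = SVC(kernel="rbf")
print(cross_val_score(svm, X, y, cv=5, scoring="f1_micro"))
```

The micro-averaged F-measure used in the scoring argument mirrors the overall performance metric reported in the abstract; in practice the placeholder arrays would be replaced by excerpts and sensor-derived labels from the recordings described in Section III.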