Monophonic Piano Music Transcription
Indian Journal of Science and Technology, Vol 9(28), DOI: 10.17485/ijst/2016/v9i28/97359, July 2016
ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645

Yong Yee Zien* and Yap Fa Toh
School of Computer Sciences, Universiti Sains Malaysia, Penang - 11800, Malaysia; [email protected], [email protected]
*Author for correspondence

Abstract
This paper proposes a method for computational monophonic piano music transcription. The method detects the pitches in piano music and identifies the corresponding musical notations. It consists of two main algorithms: the Onset Detection Algorithm and the Pitch Detection Algorithm. The Onset Detection Algorithm involves sound wave filtering and sound wave segmentation. The Pitch Detection Algorithm involves period determination, frequency computation and musical notation identification. Both algorithms use a time-domain method and are built on observations of the characteristics of the piano sound signal. The program is fast and simple to use, and it outputs results with 88% accuracy. However, the method is limited to a specific kind of input: monophonic piano music at a slow or average tempo of up to 120 crotchet beats per minute. This limitation exists because the performance of the algorithms depends on the threshold values set in the program. Further investigation and research are therefore needed to improve the performance of the program.

Keywords: Automatic Music Transcription, Onset Detection Algorithm, Pitch Detection Algorithm

1. Introduction
Music transcription is the process of writing down musical notation based solely on a recording of the music. To transcribe a piece of music manually, the transcriptionist must know musical notation. The transcriptionist must also be proficient in analyzing musical sound in order to transcribe the melody. This process is tedious and time consuming, because the transcriptionist has to listen to the music numerous times and write the notes into sheet music one by one. Computational music transcription methods have therefore been developed to help musicians transcribe music automatically.

The background of this work is the development of a desktop software application called Musical Score Transcriber (MST). Its main function is to transcribe piano music into sheet music automatically. Automatic transcription requires methods for analyzing and processing sound signals, and the music transcription algorithms presented here were developed for this application. However, the application can only transcribe monophonic music, which is less complex than polyphonic music.

The current work focuses on monophonic piano music transcription. In monophonic music, every pitch stands alone and never sounds simultaneously with another pitch. In other words, monophonic music consists of a single melody line. The computational music transcription method proposed in this paper is built around the Onset Detection Algorithm and the Pitch Detection Algorithm. These algorithms analyze the sound wave of the piano music signal, estimate the pitches in the piano music, and identify the corresponding musical notations.

2. Literature Review
Extracting musical information from music audio is a challenging task. It involves music transcription, note transcription, score alignment, chord transcription and structure detection¹. Pitch detection is the fundamental technique for extracting musical data from music audio. Most pitch detection algorithms were designed for vocal signals. One example is Linear Predictive Coding (LPC)², which is used to encode speech and estimate speech parameters. Another is the F0 estimator by Klapuri³, which can estimate the pitches of several concurrent sounds in a speech signal. These pitch detection algorithms can be further developed to perform music transcription. Onset detection also plays an important role in music transcription. Klapuri⁴ used a psychoacoustic model in an onset detection system to determine the perceptual onsets of sounds in acoustic signals. Zhou⁵ proposed an onset detection system based on Resonator Time-Frequency Image (RTFI) analysis that combines pitch-based and energy-based detection algorithms.
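Zhou's RTFI-based system is too involved for a short example. The energy-based half of the idea, however, can be sketched in a few lines. The Python below is a generic energy-jump detector, not Zhou's or Klapuri's method, and it works as follows:

- It splits a time-domain signal into fixed, non-overlapping frames.
- It computes the energy of each frame.
- It marks an onset wherever the energy rises by more than a chosen factor over the previous frame.

The function names and the `frame_len`, `ratio` and `floor` values are illustrative assumptions.

```python
from typing import List, Sequence


def frame_energies(x: Sequence[float], frame_len: int) -> List[float]:
    """Energy (sum of squared samples) of consecutive non-overlapping frames."""
    return [
        sum(s * s for s in x[i:i + frame_len])
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]


def energy_onsets(x: Sequence[float], frame_len: int = 256,
                  ratio: float = 4.0, floor: float = 1e-3) -> List[int]:
    """Sample indices of frames whose energy jumps by more than `ratio`.

    `frame_len`, `ratio` and `floor` are illustrative values, not taken
    from any of the cited works.
    """
    onsets = []
    prev = 0.0
    for i, e in enumerate(frame_energies(x, frame_len)):
        # Require an absolute minimum level and a clear rise over the previous frame.
        if e > floor and e > ratio * max(prev, floor):
            onsets.append(i * frame_len)
        prev = e
    return onsets


if __name__ == "__main__":
    # Two crude square-wave "notes" separated by silence.
    signal = [0.0] * 512 + [0.5, -0.5] * 256 + [0.0] * 512 + [0.5, -0.5] * 256
    print(energy_onsets(signal))  # prints [512, 1536]
```

A relative-jump rule like this is cheap to compute. Its thresholds, however, have to be tuned to the tempo and dynamics of the recording, which is the same kind of sensitivity to threshold values that the abstract notes.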
Music transcription methods can be divided into two main domains: time-domain methods and frequency-domain methods. Time-domain methods analyze how the amplitude and period of the signal vary over time. Frequency-domain methods analyze the signal in terms of its frequency components, which usually involves a transformation such as the Fourier transform. Monophonic music transcription mainly involves three parameters: onset, duration and pitch⁶.

2.1 Autocorrelation
Most researchers agree that autocorrelation is an efficient method for detecting the pitch of a sound. It is widely used in signal analysis, speech recognition and automatic music transcription. Monti⁶,⁷ proposed autocorrelation pitch tracking for monophonic music transcription, and the method estimates the pitch of the music signal using Equation 1.

Equation 1. Autocorrelation of an N-length sequence x(k)⁶,⁷

The music transcription system consists of four processes:

- First, the pitch is tracked while the envelope of the signal is calculated simultaneously.
- Next, the pitch data are converted into a key number, kn, relative to the note, using Equation 2.
- A module called the collector then manages the signal amplitude by sorting the onset, pitch and offset. The collector recognizes when the pitch keeps the same value and proposes a note onset at the first value of the constant sequence.
- After an onset, the offset is detected by checking whether the signal energy falls below the audibility threshold. This determines the duration of the note.

Equation 2. Key number relative to the note

Similarly, Onder⁸ uses the autocorrelation method to estimate the pitch of a played note. First, the autocorrelation of the musical signal is computed starting from zero lag, and the result is normalized to its maximum value. Peaks are then expected before and after this maximum. For a signal without harmonically related peaks, the distance between the maximum and the first peak is the period T of the played note, and the fundamental frequency is 1/T. When the signal has strong harmonically related peaks, determining the fundamental frequency of the played note is more complicated. Consider, for example, a signal with one strong harmonic component. If the distance from the maximum to the first peak differs from the distance between the first and second peaks by more than one sampling period, then the period of the played tone is the distance from the maximum to the second peak.

The autocorrelation method measures the self-similarity of a signal over time, and the peaks of the autocorrelation indicate the fundamental frequency of the signal⁹. However, this method works well only in the sustain region of the sound envelope, where the signal is more consistent.
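For reference, the common textbook form of the autocorrelation of an N-length sequence x(k) is r(n) = Σ_{k=0}^{N−1−n} x(k)·x(k+n). Equation 1 of the cited work may differ in normalization. The sketch below works in four steps:

1. It computes r(n) and normalizes it by its zero-lag value.
2. It applies a simplified reading of the peak rule described by Onder. The lag of the first peak is taken as the period T, unless the spacing to the second peak differs from it by more than one sample. In that case the second peak's lag is used.
3. It converts T (in samples) to a frequency as fs / T.
4. It maps that frequency to a piano key number. Equation 2 is not recoverable from the source, so this step assumes the standard 88-key numbering in which A4 (440 Hz) is key 49, giving kn = 12·log₂(f / 440) + 49, rounded. The cited work may use a different reference note.

The `min_height` threshold and all function names are illustrative assumptions, and the collector stage is not shown. The direct O(N²) sum is only meant for short frames taken from the sustain region.

```python
import math
from typing import List, Optional, Sequence


def autocorrelation(x: Sequence[float]) -> List[float]:
    """Textbook autocorrelation r(n) = sum_{k=0}^{N-1-n} x(k) * x(k+n)."""
    size = len(x)
    return [sum(x[k] * x[k + lag] for k in range(size - lag)) for lag in range(size)]


def period_in_samples(x: Sequence[float], min_height: float = 0.3) -> Optional[int]:
    """Estimate the period T in samples with a simplified first/second-peak rule."""
    r = autocorrelation(x)
    if r[0] <= 0.0:
        return None  # silent frame: no pitch to estimate
    r = [v / r[0] for v in r]  # normalize: the zero-lag maximum becomes 1
    # Local maxima after zero lag that are tall enough to count as peaks.
    peaks = [
        n for n in range(1, len(r) - 1)
        if r[n] > r[n - 1] and r[n] >= r[n + 1] and r[n] >= min_height
    ]
    if not peaks:
        return None
    first = peaks[0]
    if len(peaks) >= 2:
        spacing = peaks[1] - first
        # If the two spacings differ by more than one sample, treat the first
        # peak as harmonic-related and take the second peak as the period.
        if abs(first - spacing) > 1:
            return peaks[1]
    return first


def fundamental_hz(x: Sequence[float], fs: float) -> Optional[float]:
    """f0 = 1 / T, i.e. fs / T when T is measured in samples."""
    t = period_in_samples(x)
    return None if t is None else fs / t


def key_number(freq_hz: float) -> int:
    """Nearest 88-key piano key number, assuming A4 = 440 Hz is key 49."""
    return round(12 * math.log2(freq_hz / 440.0)) + 49


if __name__ == "__main__":
    fs = 8000.0
    # 200 Hz sine: exactly 40 samples per period, 400-sample analysis frame.
    tone = [math.sin(2 * math.pi * 200.0 * k / fs) for k in range(400)]
    f0 = fundamental_hz(tone, fs)
    print(f0, key_number(f0))  # prints 200.0 35
```

The sum over k shrinks as the lag grows, so later peaks are naturally lower than earlier ones. Normalizing by r(0) lets a single `min_height` threshold apply regardless of the signal's loudness.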
2.2 K-Nearest Neighbor (KNN) Algorithm
Pishdadian¹⁰ proposed an instance-based classification approach to pitch detection, the K-Nearest Neighbor (KNN) algorithm. This music transcription algorithm focuses on frequency-domain features of the music signal. The pitch class is identified from the distance between the observed frequency-domain feature vector and the training feature vectors. The KNN algorithm assumes that each training or testing sample corresponds to a point in n-dimensional Euclidean space. If the training data set is sufficiently large, a KNN pitch classifier identifies the target pitch class of each note event. If the training database is small, a two-step algorithm is used instead. First, a list of pitch candidates is identified with a semi-KNN algorithm. Second, the Viterbi algorithm is applied to a trellis of pitch candidates to find the most likely note sequence, or melody line. KNN is a low-complexity method with a very simple training stage, and it can achieve high accuracy provided that the number of training samples is large. With a small training database, however, the classification results tend to degrade dramatically because of noisy training data.
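The classification step can be illustrated with a plain k-nearest-neighbor vote over Euclidean distances. In the sketch below, the three-element feature vectors and their labels are invented toy values. They stand in for the frequency-domain features used by Pishdadian. The choice `k = 3` is arbitrary, and the semi-KNN and Viterbi stages for small training sets are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_pitch_class(query: Sequence[float],
                    training: List[Tuple[Sequence[float], str]],
                    k: int = 3) -> str:
    """Return the majority pitch label among the k nearest training vectors."""
    # Sort stored examples by Euclidean distance to the query and keep k of them.
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties go to the label seen first, i.e. the one with the nearer example.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 3-bin "spectral" features; values are invented for illustration.
    train = [
        ([0.9, 0.1, 0.0], "A4"), ([0.8, 0.2, 0.1], "A4"), ([1.0, 0.0, 0.1], "A4"),
        ([0.1, 0.9, 0.2], "C5"), ([0.0, 1.0, 0.1], "C5"), ([0.2, 0.8, 0.0], "C5"),
    ]
    print(knn_pitch_class([0.85, 0.15, 0.05], train))  # prints A4
```

The only "training" is storing labelled vectors, which is why the text calls the training stage very simple. The cost moves to classification time, where every query is compared with every stored vector. This is also why accuracy depends so directly on how many clean training examples are available.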
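The existing music transcription algorithms each have their own strengths and weaknesses. Some perform well and give accurate results, but they are highly complex and computationally expensive. This paper therefore proposes a simple and fast method as an alternative approach to automatic music transcription.

3. Proposed Solution
A piece of music usually consists of more than one pitch, and the pitches are connected to form a melody. This paper therefore proposes segmenting the pitches of the music in an initial stage, so that each single pitch can be processed separately. The period and fundamental frequency of each single pitch can then be determined, which makes it possible to identify the corresponding musical notation. To perform these processes, this paper introduces two time-domain methods, the Onset Detection Algorithm and the Pitch Detection Algorithm, as an alternative approach to monophonic piano music transcription. The Onset Detection Algorithm involves sound wave segmenta-

Figure 1. Graphical representation of sound wave.
Figure 2. A sound envelope is made up of four regions: Attack, Decay, Sustain and Release.
Figure 3. A sound envelope consists of three points: Onset, Peak and Offset.

… highest point of amplitude after being sounded. Following that is Decay, which is the time it takes to fall from the maximum amplitude to the sustain level.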