Monophonic Piano Music Transcription
Indian Journal of Science and Technology, Vol 9(28), DOI: 10.17485/ijst/2016/v9i28/97359, July 2016
ISSN (Print): 0974-6846 | ISSN (Online): 0974-5645

Yong Yee Zien* and Yap Fa Toh
School of Computer Sciences, Universiti Sains Malaysia, Penang - 11800, Malaysia; [email protected], [email protected]
*Author for correspondence

Abstract

This paper proposes a method for computational monophonic piano music transcription, which detects the pitches of piano music and thus identifies the corresponding musical notations. The method consists of two main algorithms: the Onset Detection Algorithm and the Pitch Detection Algorithm. The Onset Detection Algorithm involves wave filtering and sound wave segmentation, while the Pitch Detection Algorithm involves period determination, frequency computation and musical notation identification. The proposed algorithms adopt the time-domain method and are built based on the observation and characteristics of piano sound. The program is fast and simple to use, and is able to output results with 88% accuracy. However, this music transcription method is limited to a specific sound input only, namely monophonic piano music at a slow or average speed of up to 120 crotchet beats per minute, because the performance of the algorithms depends on the threshold values set in the program. Therefore, further investigation and research have to be carried out in order to improve the performance of the program.

Keywords: Automatic Music Transcription, Onset Detection Algorithm, Pitch Detection Algorithm

1. Introduction

Music transcription is a process of writing musical notations based solely on a recording of music. To transcribe a piece of music manually, the transcriptionist must have knowledge of musical notation and must be proficient in analyzing the music sound in order to transcribe the melody. However, this process becomes tedious and time consuming, as the transcriptionist has to listen to the music numerous times and write down the music notes one by one into sheet music. Hence, a computational music transcription method is developed to assist musicians in transcribing music automatically.

The background of this work is to develop a desktop software application called Musical Score Transcriber (MST). The main function of MST is to transcribe piano music into sheet music automatically. Hence, a method of sound signal analysis and processing has to be developed in order to perform automatic music transcription. The music transcription algorithms are developed for this application. However, it is only able to transcribe monophonic music, which is less complex compared to polyphonic music.

The current work focuses on monophonic piano music transcription. Monophonic music is defined as music in which every single pitch is standalone and does not occur simultaneously with other pitches. In other words, monophonic music consists of merely a single melody line. The proposed computational music transcription method in this paper emphasizes the Onset Detection Algorithm and the Pitch Detection Algorithm, which basically involve analyzing the sound of the piano music signal, estimating the pitches in a piano piece, and identifying the corresponding musical notations.

2. Literature Review

Extracting musical information from a piece of music audio is a challenging task, as it involves music transcription, note transcription, score alignment, chord transcription, and structure detection1. On the other hand, pitch detection is the fundamental technique to extract musical data from music audio. Most of the pitch detection algorithms are invented to detect the pitch of vocal signals, such as Linear Predictive Coding (LPC)2, which is used to encode speech and estimate speech parameters, and the F0 estimator by Klapuri3, which is able to estimate the pitches of several concurrent sounds in a speech signal.


These pitch detection algorithms can be further developed to perform music transcription. Besides that, onset detection also plays an important role in music transcription. Klapuri4 used a psychoacoustic model in an onset detection system to determine the perceptual onsets of sounds in acoustic signals, while Zhou5 proposed an onset detection system that uses a combination of pitch-based and energy-based detection algorithms based on the Resonator Time-Frequency Image (RTFI) analysis.

Music transcription can basically be categorized into two main domains: the time-domain method and the frequency-domain method. The time-domain method involves analyzing the variation of amplitude and period of the signal data over time, while the frequency-domain method analyzes the signal data in terms of frequency components and usually involves a transformation such as the Fourier transform. Monophonic music transcription mainly involves three parameters: onset, duration and pitch6.

2.1 Autocorrelation

Most researchers agree that autocorrelation is one of the efficient methods for detecting the pitch of sound. The autocorrelation method is widely used in signal analysis, speech recognition, and automatic music transcription. Autocorrelation pitch tracking was proposed by Monti6,7 for monophonic music transcription. The method estimates the pitch in a music signal based on equation 1.

Equation 1. Autocorrelation of an N-length sequence x(k)6,7

The music transcription system consists of four processes. Firstly, the pitch is tracked while the envelope of the signal is calculated simultaneously. Next, the pitch data obtained is converted to a key number, kn, relative to the note by using equation 2. Following that, a module called the collector manages the signal amplitude by sorting the onset, pitch and offset. The collector recognizes when the pitch maintains the same value, and proposes a note onset at the first value of the constant sequence. After an onset, the offset is detected by checking whether the signal energy falls below the audibility threshold. Hence the duration of the note is determined.

Equation 2. Key number relative to the note

Similarly, it is mentioned by Onder8 that the autocorrelation method is also used to estimate the pitch of a played music note. Initially, the autocorrelation of the musical signal is computed with zero lag, and the results are normalized to a maximum value. Then, peaks are expected after or before this maximum. For a signal with no harmonic-related peaks, the length between the maximum and the first peak is the period of the played note (T) and the fundamental frequency is 1/T. When the signal has strong harmonic-related peaks, the determination of the fundamental frequency of the played note is more complicated. For example, for a signal with one strong harmonic component, if the difference between the length from the maximum to the first peak and the length from the first to the second peak is greater than one sampling period, then the length from the maximum to the second peak is the period of the played tone.

The autocorrelation method calculates the self-similarity of a signal over time, and the peaks of the autocorrelation indicate the fundamental frequency of the signal9. However, this method works well only on the sustain region of the sound envelope, where the signal is more consistent.
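To make the autocorrelation approach concrete, the following sketch estimates the fundamental frequency of a single monophonic frame by normalizing the autocorrelation and picking the strongest peak within the piano range. It is a generic illustration of the technique described above, not the Monti6,7 or Onder8 implementation; since Equation 1 is not reproduced here, the autocorrelation form, frame length and peak-picking rule are assumptions.

import numpy as np

def autocorrelation_pitch(frame, sample_rate=44100, fmin=27.5, fmax=4186.0):
    # Estimate the fundamental frequency of a monophonic frame from the
    # strongest autocorrelation peak inside the piano range (A0 to C8).
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # non-negative lags
    ac = ac / ac[0]                      # normalize so that lag 0 equals 1
    lag_min = int(sample_rate / fmax)    # smallest lag worth searching
    lag_max = int(sample_rate / fmin)    # largest lag worth searching
    period = lag_min + np.argmax(ac[lag_min:lag_max])  # T, in samples
    return sample_rate / period          # fundamental frequency = 1/T

# A synthetic 440 Hz tone (A4) should yield an estimate close to 440 Hz.
t = np.arange(0, 0.1, 1.0 / 44100)
print(autocorrelation_pitch(np.sin(2 * np.pi * 440 * t)))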

2.2 K-Nearest Neighbor (KNN) Algorithm

Pishdadian10 proposed an instance-based classification approach, the K-Nearest Neighbour (KNN) algorithm, for pitch detection. The proposed music transcription algorithm focuses on frequency-domain features of the music signal. The pitch class is identified based on the distance between the observed frequency-domain feature vector and the training feature vectors. The KNN algorithm works on the assumption that each training or testing sample corresponds to a point in an n-dimensional Euclidean space. If the training data set is sufficiently large, the KNN pitch classifier is employed to identify the target pitch class for each note event. On the other hand, if the training database is small, a two-step algorithm is employed. The first step is to identify a list of pitch candidates using a semi-KNN algorithm. In the second step, the Viterbi algorithm is applied to a trellis of pitch candidates to find the most likely note sequence or melody line. The KNN algorithm is a low-complexity method with a very simple training stage. However, it is capable of yielding high accuracy provided the amount of training samples is large. If the training database is small, then the classification results tend to degrade dramatically due to the noisy training data.

The existing music transcription algorithms have their own strengths and weaknesses. Some of the algorithms are able to perform well and give accurate results; however, the methods are highly complex and the cost of computation is expensive. Therefore, this paper proposes a simple and fast method as an alternative for automatic music transcription.
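The instance-based idea behind the KNN classifier can be sketched in a few lines: each note event is reduced to a frequency-domain feature vector and labelled by a majority vote among its nearest training vectors. The feature dimensionality, training data and value of k below are placeholder assumptions, not Pishdadian's10 actual configuration.

import numpy as np

def knn_pitch_class(feature, train_features, train_labels, k=3):
    # Assign a pitch class by majority vote among the k nearest
    # training feature vectors, using Euclidean distance.
    distances = np.linalg.norm(train_features - feature, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example with 2-dimensional placeholder features.
train_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_y = ["C4", "C4", "G4", "G4"]
print(knn_pitch_class(np.array([0.85, 0.15]), train_x, train_y))  # prints C4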

3. Proposed Solution

A piece of music usually consists of more than one pitch, and the pitches are connected to form a melody. Hence, this paper proposes that each pitch of the music be segmented in the initial stage so that every single pitch can be processed separately. After that, the period and fundamental frequency of the single pitch can be determined, which in turn enables the corresponding musical notation to be identified.

To perform these processes, the time-domain methods, the Onset Detection Algorithm and the Pitch Detection Algorithm, are introduced in this paper as an alternative for monophonic piano music transcription. The Onset Detection Algorithm involves sound wave segmentation, while the Pitch Detection Algorithm involves period determination and frequency computation. To develop the algorithms, an understanding of the characteristics and structure of the sound wave of piano music is necessary. The characteristics of a sound wave include wavelength, amplitude, time period and frequency. The peak of a sound wave is known as a compression or crest, while the valley of a sound wave is known as a rarefaction or trough.

Figure 1. Graphical representation of sound wave.

The sound wave of a single pitch is called a sound envelope, which is made up of four regions: Attack, Decay, Sustain and Release. The first region is the Attack, which is the duration or time that it takes for a signal to reach the highest point of amplitude after being sounded. Following that is the Decay, which is the time that it takes to fall from the maximum amplitude down to the sustain level. Next is the Sustain, which is the level at which the signal remains held. Last is the Release, which is the time that it takes to fall from the Sustain level to zero amplitude.

Figure 2. A sound envelope is made up of four regions: Attack, Decay, Sustain and Release.

From another perspective, a music sound envelope can also be described as consisting of three points: Onset, Peak and Offset. The Onset is the power content at the start time of a musical note. The Peak is the maximum amplitude of the sound signal. And the Offset is the power content at the end time of a musical note.

Figure 3. A sound envelope consists of three points: Onset, Peak and Offset.

By understanding the structure and nature of an ordinary sound envelope, music transcription can be carried out by analyzing the pattern of the music sound signal.


3.1 Sound Wave Segmentation

The first step of this proposed music transcription method is sound wave segmentation, which is the process of segmenting the music into single pitches by determining the onset of each pitch. After segmentation, each pitch of the music can be analyzed separately. This segmentation process basically involves two steps: sound signal filtering and onset detection. The sound signal filtering involves three stages, which are mainly for smoothing the sound signal so that it is easier to process.

3.1.1 First Stage

Identify all the crests in the sound signal. This stage of filtering obtains the values of the crests, while the remaining samples are set to 0. The purpose of the first stage of filtering is to highlight all the maximum amplitudes as the prominent samples within the sound signal by eliminating the inconspicuous samples.

3.1.2 Second Stage

Obtain the value of each sample which is greater than both the previous sample and the next sample, while the remaining samples are set to 0. The purpose of the second stage of filtering is to eliminate those maximum amplitudes which are less significant among the samples selected in the first stage of filtering.

3.1.3 Third Stage

Obtain the value of each sample which is less than both the previous sample and the next sample, and then eliminate it by setting it to 0. This stage has to be done in order to obtain a smoother shape of the sound signal.

Next, onset detection is carried out after the three stages of filtering are done. This process starts by comparing two series of a fixed number of samples along the sound signal. If the latter series of samples is greater than the former series by a defined ratio, then the connecting point of the two series is taken to be the onset. The process starts from the first selected sample, s_n, by comparing it with the next selected samples, s_(n+m), m = 1, 2, 3, ..., M. If a certain number of s_(n+m) are greater than s_n, then the program proceeds to compare s_(n-m), m = 1, 2, 3, ..., M with s_(n+m), m = 1, 2, 3, ..., M. If a certain number of s_(n+m) are greater than s_(n-m) as well, it indicates that s_(n-m) has a similar property to s_n. When the program obtains a certain number of samples which have a similar property to s_n, it can infer that s_n is the onset. This process continues for all the samples until all the onsets in the sound signal are determined.

Figure 4. Onset of Sound Envelope.
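A condensed sketch of this segmentation step is given below. The three filtering stages are collapsed into a single local-maxima pass, and the sample-by-sample comparison is approximated by comparing the mean of the filtered samples before and after each position; the window size and ratio are placeholders, since the paper does not state its threshold values.

import numpy as np

def keep_crests(signal):
    # Simplified stages 1-3: keep samples greater than both neighbours
    # (the crests) and set every other sample to zero.
    out = np.zeros_like(signal)
    for i in range(1, len(signal) - 1):
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]:
            out[i] = signal[i]
    return out

def possible_onsets(filtered, window=64, ratio=1.5):
    # Mark position n as a possible onset when the filtered samples that
    # follow n are, on average, `ratio` times larger than those before it.
    onsets = []
    for n in range(window, len(filtered) - window):
        before = np.mean(filtered[n - window:n])
        after = np.mean(filtered[n:n + window])
        if before > 0 and after / before >= ratio:
            onsets.append(n)
    return onsets

# A quiet passage followed by a louder attack should produce possible
# onsets near the boundary between the two segments.
sig = np.concatenate([0.05 * np.random.rand(500), np.random.rand(500)])
print(possible_onsets(keep_crests(sig))[:5])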

3.2 Period Determination and Frequency Computation

After all the onsets of the pitches in a music sound wave are determined, the pitches are separated and can be processed separately. The period of a pitch can be determined by estimating the time duration of a complete cycle. Figure 5 shows that the waveform of a piano sound note consists of many crests and troughs, and their sizes vary from one to another. Nevertheless, the pattern of the crests and troughs repeats when one cycle of the sound wave is completed, and the patterns of the cycles are almost identical, representing the period of the sound wave.

Figure 5. A segment of waveform of piano music.

Several computations have to be done in order to determine a cycle in a sound wave of piano music. The first is calculating the area and width of each crest group, and then computing the standard deviation of the area and of the width between any two crest groups.


σa: standard deviation of the areas of two crest groups
ci: area of the current crest group
cj: area of another crest group
i = 1, 2, ..., T, where T = total number of crest groups in a note
j = 1, 2, ..., T, where i ≠ j

Equation 3. Standard deviation of the areas of two crest groups

σw: standard deviation of the widths of two crest groups
ci: width of the current crest group
cj: width of another crest group
i = 1, 2, ..., T, where T = total number of crest groups in a note
j = 1, 2, ..., T, where i ≠ j

Equation 4. Standard deviation of the widths of two crest groups

If the value of the standard deviation is minimal, it indicates that the two crest groups are identical, and the signal from the first crest group to the starting point of the next crest group is considered a cycle. The length of the cycle is the period of the pitch. The computation is repeated until all possible cycles in a pitch are determined, and a list of all cycle values is obtained. In the list of cycles, the values might differ from one another; hence, the cycle value which occurs most frequently is taken as the period of the pitch. After the period of the pitch is determined, the frequency of the pitch is computed using equation 5.

Equation 5. Frequency of pitch
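Since the images of Equations 3 to 5 are not reproduced in this text, the sketch below assumes the standard deviation of two values (|ci - cj| / 2) and, with the period expressed in samples, a frequency equal to the sample rate divided by the period; the tolerances used to decide that two crest groups are identical are placeholders.

import numpy as np
from collections import Counter

def crest_groups(filtered):
    # Split the filtered (non-negative) signal into crest groups, i.e.
    # maximal runs of non-zero samples, returned as (start, area, width).
    groups, start = [], None
    for i, v in enumerate(filtered):
        if v > 0 and start is None:
            start = i
        elif v <= 0 and start is not None:
            groups.append((start, float(np.sum(filtered[start:i])), i - start))
            start = None
    if start is not None:
        groups.append((start, float(np.sum(filtered[start:])), len(filtered) - start))
    return groups

def estimate_period(groups, area_tol=0.05, width_tol=2.0):
    # Record a candidate cycle length whenever two crest groups have
    # nearly identical area and width, then return the most frequent one.
    cycles = []
    for gi, (si, ai, wi) in enumerate(groups):
        for sj, aj, wj in groups[gi + 1:]:
            sigma_area = abs(ai - aj) / 2.0    # standard deviation of two values
            sigma_width = abs(wi - wj) / 2.0
            if sigma_area <= area_tol * max(ai, aj) and sigma_width <= width_tol:
                cycles.append(sj - si)
                break
    return Counter(cycles).most_common(1)[0][0] if cycles else None

def frequency_of_pitch(period_samples, sample_rate=44100):
    # Assumed form of Equation 5: frequency = sample rate / period in samples.
    return sample_rate / period_samples

print(frequency_of_pitch(167))  # the C4 example in Section 4.2

With the 167-sample period reported for C4 in Section 4.2, this gives roughly 264 Hz, close to the nominal 261.6 Hz of C4.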

3.3 Musical Notation Identification

When the frequency of a pitch is obtained, the musical notation can be computed using equation 6.

Equation 6. Note number
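Because Equation 6 itself is not reproduced in this text, the sketch below uses the standard MIDI note-number relation, n = 69 + 12·log2(f/440), as a stand-in for mapping a detected frequency to its note name; the exact formula used by MST may differ.

import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_from_frequency(freq_hz):
    # Assumed mapping: round the MIDI note number and convert it to a
    # name such as "C4" (MIDI note 60 corresponds to C4).
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(note_from_frequency(264.07))  # frequency of a 167-sample period -> C4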


4. Results and Findings

4.1 Onset Detection

Several experiments have been carried out to investigate the pattern of the sound wave. A piece of piano music with a tune in the fourth octave is used as a sample to test the program. The process of finding onsets is implemented, and all the possible onsets are obtained at the initial step. Figure 6 shows the sound signal of two connected piano notes, where the left part is the decay region of one sound note while the right part is the attack region of another sound note. The program has to determine the onset which connects the two sound notes in order to perform sound wave segmentation.

Figure 6. The possible onsets.

To determine the right onset among the possible onsets, the distance between two consecutive possible onsets is calculated. If the distance value is greater than a specific amount, then the first possible onset of the two consecutive possible onsets is taken as the right onset. The specific amount is set to the length of a thirty-second note, which is 5512 samples. This value is set by assuming the duration of a crotchet note is one second, while the default sampling rate is 44100 Hz. Hence, the duration of a thirty-second note is 1 × 2^-3 = 0.125 s, which is equivalent to 44100 × 2^-3 ≈ 5512 samples. By presenting a graph of the possible onsets as in Figure 7, the change is obvious, which makes it easy to identify the right onsets. Figure 7 shows that the 17th, 36th and 45th possible onsets have a distance greater than 5512. Hence, it can be concluded that the 17th, 36th and 45th possible onsets are the right onsets for the music signal.

Figure 7. The graph of distance of two consecutive possible onsets; Distance vs. Timeline.
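The rule for picking the right onsets can be expressed directly: keep the first onset of every consecutive pair whose spacing exceeds the thirty-second-note threshold of 44100 × 2^-3 ≈ 5512 samples. The list of possible onsets below is a placeholder; the threshold follows the assumption in Section 4.1 that one crotchet lasts one second at a 44100 Hz sampling rate.

SAMPLE_RATE = 44100
THRESHOLD = SAMPLE_RATE // 8   # thirty-second note when a crotchet lasts one second

def right_onsets(possible, threshold=THRESHOLD):
    # Keep the first onset of every consecutive pair whose distance
    # exceeds the thirty-second-note threshold (about 5512 samples).
    return [a for a, b in zip(possible, possible[1:]) if b - a > threshold]

# Placeholder list of possible onset positions, in samples.
candidates = [100, 300, 6200, 6400, 13000]
print(right_onsets(candidates))  # prints [300, 6400]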

4.2 Period Determination

After all the onsets in the piano music are determined, each of the piano pitches can be processed separately. The next process is period determination, which involves finding out all the cycle values in a piano pitch. Figure 8 shows all the cycle values in the piano pitch C4; the cycle value with the highest occurrence is 167, which occurs a total of 59 times. Hence, it can be concluded that the cycle value of 167 is the right period of the piano pitch C4. Afterwards, the frequency of the piano pitch can be calculated from the period obtained.

Figure 8. All cycle values of a sample pitch (C4); Number of Occurrence vs. Cycle Value.

4.3 Musical Notation Identification

The musical note number is the final output of the music transcription system. Many testing experiments have been carried out, and the transcription outputs have shown satisfactory results. The accuracy of the output is measured based on the correctness of the onset detection and period determination. The tables below show examples of program testing.

Table 1. Example 1 of Transcription Output
Music Name: Happy Birthday to You
Actual Notes: C4, C4, D4, C4, F4, E4, C4, C4, D4, C4, G4, F4, C4, C4, C5, A4, F4, E4, D4, A#4, A#4, A4, F4, G4, F4
Transcribed Output: C4, C4, D4, C4, F4, E4, C4, C4, D4, D#5, A#6, G4, F4, C4, C4, C4, A4, F4, E4, D4, A#4, A#4, A4, F3, G4, F4
Error Ratio: 3/25 = 0.12
Error Percentage (%): 12.0%
* There are 3 errors: 1. D#5, A#6 should be C4; 2. C4 should be C5; 3. F3 should be F4.

Table 2. Example 2 of Transcription Output
Music Name: Twinkle Twinkle Little Star
Actual Notes: C4, C4, G4, G4, A4, A4, G4, F4, F4, E4, E4, D4, D4, C4, G4, G4, F4, F4, E4, E4, D4, G4, G4, F4, F4, E4, E4, D4, C4, C4, G4, G4, A4, A4, G4, F4, F4, E4, E4, D4, D4, C4
Transcribed Output: C4, C4, G4, G4, A4, A4, G4, F4, F4, E4, E4, D4, D4, C4, G4, G4, F4, F4, F4, __, D4, G4, G4, F4, __, E4, E4, D4, C4, C4, G4, G4, A4, A4, G4, F4, F4, E4, __, D4, D4, C4, C4
Error Ratio: 5/42 = 0.12
Error Percentage (%): 12.0%
* There are 5 errors; 2 wrong notes and 3 missing notes: 1. F4 should be E4; 2. C4 is an extra note that should be combined with the previous C4; 3. the missing notes are E4, F4, E4.

The errors might be caused by two factors. First, errors occur during sound wave segmentation when the onset is not detected properly. The onset detection failure might be due to the tempo of the music being too fast, which causes the offset and onset of two consecutive notes to overlap. If the onset is not detected correctly, then the sound envelope will not be segmented correctly either. Second, errors might also occur when detecting the period of the sound envelope. Sometimes the algorithm might not be able to identify the correct cycle in a sound envelope for certain types of waveform, especially the uniform type of sound wave. A uniform sound wave is one in which the crests or troughs of a sound pitch are almost identical to each other, and this causes the program to misidentify the cycle. As a result, the computed frequency will be wrong and thus the output musical note will be wrong as well.

5. Discussion

Onset detection and period determination are the two main steps which play the most crucial role in this proposed music transcription method. The purpose of the Onset Detection Algorithm is to perform sound wave segmentation as the initial step of the music transcription, which is to isolate all the sound notes. It must be reliable, as its output directly influences the performance of the period determination. If the onset of a pitch is not detected correctly, then the program might not be able to determine the period of the pitch accurately either.

In this work, piano sound is selected as the sample for building the algorithms, as the piano is the most common musical instrument. By recording the played piano sound, its pitches are known in advance, which makes it easy to carry out output checking. As a result, the algorithms can only work well provided the input sound is piano music.

The algorithms operate based on the threshold values set in the program. There are three main threshold values. The first is the minimum ratio of two series of compared sound samples, which is set in the third stage of the filtering process. The second is the minimum value of the standard deviation of the area and the width of two compared crest groups. The third is the minimum count used to check whether a sound sample is a possible onset. These threshold values are set based on observation of the sound signal. Hence, the program might only suit a specific instrument or a specific kind of melody. For example, the designed algorithms only suit transcribing piano music, as the threshold values are set based on the pattern of the piano sound signal. If other instruments such as violin or guitar are used as the input sound, then the algorithms might not perform well, and this causes the percentage of accuracy to be low. Besides, the algorithms are only able to transcribe music with an average or slow speed. This is because the Onset Detection Algorithm is unable to detect the onsets of a fast-moving sound signal, since the peaks of different pitches are too close together, which makes the decay region absent from the sound signal. If the decay region is not clear in the sound signal, then the program will fail to detect the onset. Hence, it can be concluded that all of these problems occur because the threshold values set are subjective to the uniqueness of every single music piece and its timbre.

6. Conclusion and Future Work

During the system development, piano sound is adopted as the main testing sample because the pitch of the piano sound is known in advance; hence, it is easier to verify the output results from the program. By taking the piano sound as the input sound, the algorithms are built based on the pattern and behaviour of its sound waveform, and the threshold values are set in such a way as to fulfill the conditions of the piano sound. However, problems might occur when the input sound type is changed, which causes the algorithms to be unable to adapt to the pattern of the unfamiliar sound waveform, and thus to return imprecise results.

It is difficult to avoid imprecision errors in the algorithms; thus, there should be alternatives to overcome the imperfections. One of the approaches is to embed a function in the application system allowing the user to manipulate the output sheet music, such as adding, editing and deleting the musical notations. Besides, investigation should be carried out in order to further improve the algorithms. The algorithms should be developed in such a way as to adapt to all types of recording, including various types of instruments and hummed tunes, and thus be able to return precise output with minimal error.

7. References

1. Gold B, Morgan N, Ellis D. Speech and audio signal processing: processing and perception of speech and music. John Wiley & Sons; 2011 Nov 1. p. 568–76.
2. Bharathi V, Asaph AA, Ramya R. Vocal pitch detection for musical transcription. 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), IEEE. 2011 Jul 21. p. 724–6.
3. Klapuri A. Multipitch analysis of polyphonic music and speech signals using an auditory model. IEEE Transactions on Audio, Speech, and Language Processing. 2008 Feb; 16(2):255–66.
4. Klapuri A. Sound onset detection by applying psychoacoustic knowledge. 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings. IEEE. 1999 Mar 15; 6. p. 3089–92.
5. Zhou R. Feature extraction of musical content for automatic music transcription, 2006.
6. Bello JP, Monti G, Sandler MB. Techniques for automatic music transcription. In ISMIR. 2000 Oct 23.
7. Monti G, Sandler M. Monophonic transcription with autocorrelation. In Proc. COST G-6 Conf on Digital Audio Effects, DAFX. 2000 Dec 7.


8. Onder M, Akan A, Bingol S. Pitch detection for monophonic musical notes. In Third International Conference on Electrical and Electronic Engineering - ELECO. 2003 Dec; 1.
9. Porter A. Monophonic transcription, 2011. Available from: http://www.music.mcgill.ca/~alastair/621/porter11-monophonic-summary.pdf
10. Pishdadian F, Nelson JK. On the transcription of monophonic melodies in an instance-based pitch classification scenario. 2013 IEEE Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), IEEE. 2013 Aug 11; p. 222–7.
