
Journal of Information Processing Vol.27 683–692 (Nov. 2019)

[DOI: 10.2197/ipsjjip.27.683] Regular Paper

HamoKara: A System that Enables Amateur Singers to Practice Backing Vocals for Karaoke

Mina Shiraishi1,a) Kozue Ogasawara1,b) Tetsuro Kitahara1,c)

Received: January 24, 2019, Accepted: September 11, 2019

Abstract: Creating harmony in karaoke by a lead vocalist and a backing vocalist is enjoyable, but backing vocals are not easy for non-advanced karaoke users. First, it is difficult to find musically appropriate submelodies (melodies for backing vocals). Second, the backing vocalist has to practice the backing vocals in advance in order to sing them accurately, because singing a submelody is often influenced by the singing of the main melody. In this paper, we propose a backing vocal practice system called HamoKara. This system automatically generates a submelody with a rule-based or probabilistic-model-based method, and provides users with an environment for practicing backing vocals. Users can check whether their pitch is correct through audio and visual feedback. Experimental results show that the generated submelodies are musically appropriate to some degree, and that the system helped users learn to sing submelodies to some extent.

Keywords: backing vocal, submelody generation, probabilistic model, karaoke

1  College of Humanities and Sciences, Nihon University, Setagaya, Tokyo 156–8550, Japan
a) [email protected]
b) [email protected]
c) [email protected]

The content of this paper has been presented at the 15th Sound and Music Computing Conference [1]. A video related to this paper is available at: https://www.youtube.com/watch?v=C2 P7-jMwC8

1. Introduction

Karaoke is one of the most familiar forms of entertainment related to music. At a bar or a designated room, a person who wants to sing orders the karaoke machine to play back the accompaniment of his/her favorite songs. During the playback of the accompaniment, he/she enjoys singing. Because karaoke is enjoyable for many people, techniques enhancing the entertainability of karaoke have been developed [2], [3], [4].

One way to enjoy karaoke is to create harmony with two singers. Typically, one person plays lead vocals (singing the main melody) and the other person plays backing vocals (singing a different melody). The melody sung by the backing vocalist (we call this a submelody here) typically has the same rhythm as the main melody but has different pitches (for example, a third above or below the main melody). If the lead vocalist and backing vocalist are able to sing in harmony with each other, the sound will be pleasant. However, this is not easy in practice. First, it is difficult to find a musically appropriate submelody. Particularly for songs that do not contain backing vocals in the original CD recordings, the backing vocalist has to create an appropriate submelody him/herself, which requires musical knowledge and skill. Second, accurately playing backing vocals in pitch requires advanced singing skill. If the two singers do not have sufficient singing skill, the pitch of the backing vocals may be influenced by the lead vocals. To avoid this, the backing vocalist needs to learn to sing accurately in pitch by practicing the backing vocals in advance.

In this paper, we propose a system called HamoKara, which enables a user to practice backing vocals. This system has two functions. The first function is automatic submelody generation. For a specified song, the system generates submelodies and indicates them with both a piano-roll display and a guide tone. The second function is support of backing vocal practice. While the user is singing the indicated submelody, the system shows the pitch (fundamental frequency, F0) of the singing voice on the piano-roll display. Because we adopted almost the same interface design as used in conventional karaoke machines, the user can easily use our system. The user can find out how close the pitch of his/her singing voice comes to the correct pitch.

The rest of the paper is organized as follows: in Section 2, we present a brief survey on related work from the perspectives of harmonization and singing training. In Section 3, we describe the overview of our system. In Section 4, we present the details of our automatic submelody generation methods. In Section 5, we report results of evaluations conducted to confirm the effectiveness of our system. Finally, we conclude the paper in Section 6.

2. Related Work

2.1 Harmonization
Harmonization has two general approaches. The output of the first approach is a sequence of chord symbols, such as C-F-G7-C, for a given melody [5], [6], [7]. The second approach assigns concrete notes to voices other than the melody voice. The typical form in the latter approach of harmonization is a four-part harmony, which consists of soprano, alto, tenor, and bass voices.

Four-part harmonization is a traditional part of the theoretical education of Western classical music, and so numerous researchers have attempted to generate it automatically [8], [9], [10], [11], [12], [13], [14], [15].

The feature of our task is the creation of a two-voice harmony under the existence of the accompaniment. This is because, in karaoke, one sings together with the accompaniment played back by the karaoke machine. Submelodies should therefore be consonant with the accompaniment, typically given as MIDI-equivalent sequence data, as well as with the main melody.

A very similar problem was dealt with by Kiribuchi et al. [16]. To compose duet pieces automatically, they developed a method for yielding submelodies to a piece composed by Orpheus [17]. Because Orpheus generates accompaniment tracks, the submelody generator should consider the consonance of the submelody with the accompaniment. Although the details are not presented in Ref. [16], we suppose that the information of conditions for composition, such as the chord progression, is also available in the submelody generator. On the other hand, our input is MIDI tracks, in which a chord transcription is not assumed to exist. We, therefore, have to extract harmonic features from the given MIDI tracks.

2.2 Singing Practice Support
As commonly known, almost all commercial karaoke machines have a singing scoring function. For example, in US Patent 5719344 [18], a technique has been proposed that is based on comparing the karaoke user's voice and a referential voice extracted from the original CD recording. In addition, Tsai et al. [19] also proposed an automatic evaluation method of karaoke singing.

Computer-assisted singing learning systems have been developed, such as Singad, Albert, Sino & See, and WinSINGAD (see [20] for details). These systems aim at self-monitoring of the user's singing with visual feedback of the pitch, spectra, and/or other acoustic/musical features.

In addition, Hirai et al. [21] developed a Shooting Game to improve poor-pitch singing. In this game, the vertical position of the user's character on the screen is synchronized with the pitch of the user's singing voice. The user has to control his/her pitch accurately to control the vertical position of the character and to catch target items.

Nakano et al. [22] developed a singing-skill visualization system called MiruSinger. This system extracts and visualizes acoustic features, particularly the pitch and vibrato sections, from the user's singing voice.

Lin et al. [23] developed a real-time interactive application that runs on a smartphone, which enables the user to learn intonation and timing while singing.

These studies and systems focus on solo singing, not on situations where two persons sing together in harmony. Fukumoto et al. [24] developed a system, RelaPitch, that enables the user to train relative pitch. Listening to a standard tone, the user tries to sing the pitch that has a specified interval from the standard tone. This system focused on training the creation of harmony by singing, but does not focus on creating harmony in actual songs.

3. System Overview

3.1 Basic System Design
The goal of our system is to enable the user to practice backing vocals for a song from popular music (especially Japanese popular music, or J-pop). To achieve this, the system has to support the following functions:
(1) Submelody generation
    Some songs do not have submelodies in the original CD recordings. For such songs, the system has to generate submelodies automatically.
(2) Practice support through audio and visual feedback
    The submelody for a specified song is indicated through the piano-roll display and the guide tone during the playback of the accompaniment. It is not easy for most users to recognize whether the singing voice has the correct pitch (i.e., the same pitch as the guide tone). The system should therefore show the user how accurate his/her singing pitch is through audio and visual feedback.

Our system assumes that music data (main melodies, accompaniments, and lyrics of songs) are given in the MIDI format, because most modern karaoke machines use MIDI-equivalent data.

Given the MIDI file of the target song, a submelody is generated for the entire main melody (from beginning to end). The submelody is assumed to have the same rhythm as the main melody; hence, each note in the submelody corresponds to each note in the main melody one-for-one. Submelodies are often sung only in limited sections (such as chorus sections) in actual music scenes, but our system generates a submelody for the entire main melody. This is so that users can simply ignore the submelody indication except in the sections where they sing harmony.

It is possible to generate submelodies that have higher or lower pitches than the main melody. The user can select an upper submelody or a lower submelody at the start screen.

A typical phenomenon occurs when one person plays lead vocals and another person plays backing vocals: their singing pitches are influenced by each other. In particular, backing vocals may be greatly influenced by lead vocals because the backing vocalist has to sing an unfamiliar melody. One solution to this problem is to enable the backing vocalist to check his/her pitch accuracy during the practice phase. Our system enables the backing vocalist to control the volume balance between his/her own voice and the guide tone in order to easily check his/her voice. In addition, he/she can find the pitch accuracy through visual feedback. The pitch of his/her voice and the correct pitch are displayed on the piano-roll screen in real time.

Because karaoke is a well-established form of entertainment, we thought the interface design should be as close as possible to conventional karaoke. Therefore, we designed the user interface (particularly the screen) based on commonly used karaoke machines. In the lower half of the screen, the lyrics of the song being played are displayed. The color of the lyrics changes with the progression of the song's playback. In the upper half of the screen, the piano roll is displayed to show the user the pitch of his/her singing voice and the correct pitch. This display was made as large as possible because it is the most important indication for singing practice. Although moving pictures are an important factor in enhancing the entertainment of karaoke, we omitted them because they are not necessarily needed for singing practice. The details of the screen design are described in Section 3.4.


Fig. 1 Block diagram of our system. Our system consists of a PC, an audio interface, and an analogue mixer as well as microphones and headphones. The audio interface has separate volume controls for the backing vocals and the accompaniment sound.

3.2 System Configuration
The block diagram of our system is shown in Fig. 1. Loading and playing back MIDI data, generating submelodies, and displaying lyrics and submelodies on the screen are performed by the PC. The microphone for the backing vocals is connected to the USB audio interface communicating with the PC. The audio signal of the backing vocals is sent from the audio interface to the PC, and the accompaniment sound is sent from the PC to the audio interface. The volume of the backing vocals and the accompaniment can be adjusted independently on the audio interface. In contrast, the microphone for lead vocals is connected to the analogue mixer, not to the audio interface. This is in order not to send the audio signal of the lead vocals to the PC. To make it possible to listen to both lead vocals and backing vocals, the headphones for both vocalists are connected to the analogue mixer. If only a backing vocalist uses this system without any lead vocalist, the analogue mixer is not necessary.

Fig. 2 Start screen. (a) Song selection, (b) accompaniment volume, (c) upper submelody or lower submelody, (d) guide tone volume, (e) practice of the main melody or submelody, (f) key transposition.

3.3 Start Screen
Once the system is launched, the screen shown in Fig. 2 appears. The user then selects a song to be played back. Then, the user selects an upper submelody or a lower submelody and how many semitones the key is transposed by (the degree of transposition, if necessary), and adjusts the volume of the guide tone of the main melody and the accompaniment sound.

Fig. 3 Main screen. (a) Lyrics, (b) the piano roll of the main melody, (c) the piano roll of the generated submelody, (d) the pitch of the user's singing voice.

3.4 Main Screen
After the user selects a song and other settings, a submelody is generated for the song. Then, the playback of the accompaniment starts. During the playback, the lyrics are displayed in a typical karaoke display style (Fig. 3). In the upper half of the screen, the main melody and the submelody are displayed on the piano-roll screen.

The user's singing voice is analyzed in real time. Every 5 ms, the pitch (fundamental frequency, F0) of the user's singing voice is estimated with the DIO F0 estimator [25] and displayed on the piano-roll screen.

Every four measures, the ratio of the frames in which the difference between the singing pitch and the correct pitch is smaller than 200 cent is calculated. In this calculation, frames in which this difference is greater than 800 cent are excluded, along with silent frames, because such a large difference could be caused by a double-pitch or half-pitch estimation error. If this ratio is lower than 50%, the message "Listen to your voice more carefully" is displayed at the bottom of the screen. This message is prepared to urge the user to turn the volume of his/her voice up.
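As a concrete illustration of this per-four-measure check, the following is a minimal sketch (our own illustration, not the authors' implementation), assuming the frame-wise F0 estimates and the reference pitch are already given as arrays of values in cents, with 0 marking silent frames.

```python
import numpy as np

def pitch_accuracy_ratio(sung_cent, target_cent):
    """Ratio of frames whose pitch error is below 200 cents.

    sung_cent, target_cent: one value per 5-ms frame, in cents; 0 marks a
    silent frame. Frames whose error exceeds 800 cents are discarded as
    likely double-/half-pitch estimation errors, as described above.
    """
    sung = np.asarray(sung_cent, dtype=float)
    target = np.asarray(target_cent, dtype=float)
    diff = np.abs(sung - target)
    valid = (sung > 0) & (target > 0) & (diff <= 800)  # drop silence and octave errors
    if not valid.any():
        return 0.0
    return float((diff[valid] < 200).sum() / valid.sum())

# The "Listen to your voice more carefully" message would correspond to
# pitch_accuracy_ratio(...) < 0.5 over the last four measures.
```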

4. Submelody Generation Methods

The simplest method for generating a submelody is to generate the note of the major or minor 3rd below each note of the main melody (in the case of a lower submelody). However, which of the 3rds below is suitable depends on the harmony of the accompaniment. Sometimes the perfect 4th below is more suitable because both the major and the minor 3rd below may cause dissonance with the accompaniment. To select an appropriate note interval between the main melody and the submelody, we propose two methods: a probabilistic-model-based (shortened to prob-based below) method and a rule-based method.

4.1 Prob-based Method
Let n_1, \cdots, n_m and h_1, \cdots, h_m be sequences of notes of the main melody and submelody, respectively. Each n_i and each h_i takes an integer between 0 and 11, which corresponds to a pitch class (C, C♯, \cdots, B).


The relative degree of appearance of each pitch class in the accompaniment from the onset to the offset of each note n_i is calculated and is denoted as a 12-dimensional vector b_i; each element of this vector takes a value between 0.0 and 1.0 (Fig. 4). Let k be the key of the target song. *1

*1 For simplicity, our modeling does not consider modulation. For a song involving modulation, we treat the sections before and after the modulation as separate songs.

Fig. 4 Pitch classes of the main melody {n_i}, pitch classes of a submelody {h_i}, and the relative degree of each pitch class in the accompaniment {b_i}. This example has only one part in the accompaniment for simplicity, but the accompaniment actually consists of more than one part.

What we calculate here is h_1, \cdots, h_m maximizing g(h_1, \cdots, h_m) given by

    g(h_1, \cdots, h_m) = P(h_1, \cdots, h_m \mid n_1, \cdots, n_m, b_1, \cdots, b_m, k).

Using Bayes' theorem, this conditional probability can be written as follows:

    g(h_1, \cdots, h_m) = \frac{P(n_1, \cdots, n_m, b_1, \cdots, b_m \mid h_1, \cdots, h_m, k) \, P(h_1, \cdots, h_m \mid k) \, P(k)}{P(n_1, \cdots, n_m, b_1, \cdots, b_m, k)}.

Considering that we optimize this conditional probability with respect to only (h_1, \cdots, h_m), g(h_1, \cdots, h_m) satisfies

    g(h_1, \cdots, h_m) \propto P(n_1, \cdots, n_m, b_1, \cdots, b_m \mid h_1, \cdots, h_m, k) \, P(h_1, \cdots, h_m \mid k).

We assume h_1, \cdots, h_m and n_1, \cdots, n_m to have the Markov property. Leaving out the dependency between the observable variables (n_i and b_i), this can be written as

    g(h_1, \cdots, h_m) \propto P(h_1 \mid k) P(n_1 \mid h_1) P(b_1 \mid h_1) \prod_{i=2}^{m} P(h_i \mid h_{i-1}, k) P(n_i \mid h_i) P(b_i \mid h_i).

Then, we decompose P(h_i \mid h_{i-1}, k) into P(h_i \mid h_{i-1}) and P(h_i \mid k) with log-linear interpolation as follows:

    P(h_i \mid h_{i-1}, k) = \frac{1}{w} P(h_i \mid h_{i-1}) \, P(h_i \mid k),

where w is a coefficient for letting the summation of the combined distribution be 1. Thus, the formula to be maximized can be described as

    g'(h_1, \cdots, h_m) = P(h_1 \mid k) P(n_1 \mid h_1) P(b_1 \mid h_1) \prod_{i=2}^{m} P(h_i \mid k) P(h_i \mid h_{i-1}) P(n_i \mid h_i) P(b_i \mid h_i).

We can maximize g(h_1, \cdots, h_m) by maximizing this g'(h_1, \cdots, h_m) using dynamic programming. Below, we describe the details of each factor of this probability.

• P(h_i \mid k) is a probability for the note name of the submelody's i-th note given key k. This probability is defined as

    P(h_i \mid k) = 0.15  (h_i is a tonic, mediant, or dominant note for key k)
                    0.10  (h_i is one of the other diatonic notes for key k)
                    0.03  (h_i is not a diatonic note for key k)

• P(h_i \mid h_{i-1}) is a probability for note transitions in the submelody. We assume the tendency of note transitions in the submelody to be close enough to that of note transitions in the main melody. We then calculate this probability from the main melody of the target song.

• P(n_i \mid h_i) represents the relationship between the main melody and the submelody. Both melodies tend to have an interval of a major 3rd, minor 3rd, or perfect 4th. We then calculate this probability based on the following equation:

    P(n_i \mid h_i) =  α     (|n_i − h_i| = 0, perfect 1st)
                       2α    (|n_i − h_i| = 1, minor 2nd)
                       4.5α  (|n_i − h_i| = 2, major 2nd)
                       15α   (|n_i − h_i| = 3, minor 3rd)
                       20α   (|n_i − h_i| = 4, major 3rd)
                       15α   (|n_i − h_i| = 5, perfect 4th)
                       0     (|n_i − h_i| = 6, augmented 4th)
                       5α    (|n_i − h_i| = 7, perfect 5th)
                       0     (|n_i − h_i| ≥ 8)

  where α is a coefficient to make the summation of the probabilities equal to 1.

• P(b_i \mid h_i) represents the relationship between the submelody and the accompaniment. As described above, the 12-dimensional vector b_i = (b_{i,0}, \cdots, b_{i,11}) represents the relative degree of appearance of each pitch class in the accompaniment from the onset to the offset of the main melody's i-th note n_i. By assuming each element in b_i to be independent of the others, the above probability can be reduced as follows:

    P(b_i \mid h_i) = \prod_{j=0}^{11} P(b_{i,j} \mid h_i),

  where P(b_{i,j} \mid h_i) is an emission probability for how apparent the pitch class j is in the accompaniment (i.e., an observation) given the submelody's pitch class h_i (i.e., a hidden state). When the interval between h_i and j is a minor 2nd or a major 2nd, the appearance of the pitch class j should not be high in the accompaniment because it will cause dissonance. We therefore define P(b_{i,j} \mid h_i) as following a normal distribution N(b_{i,j}; 0.0, σ_1^2) (σ_1 is experimentally determined). Otherwise, we consider P(b_{i,j} \mid h_i) to follow a continuous uniform distribution *2 (Fig. 5).

*2 In the actual implementation, we use a normal distribution N(b_{i,j}; 0.5, σ_2^2) with a sufficiently large constant σ_2 as an approximation of a continuous uniform distribution because the programming library we used supports only normal distributions.
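The dynamic-programming step can be illustrated as follows. This is a minimal Viterbi-style sketch (our own illustration under the stated assumptions, not the authors' implementation); the tables key_prob, trans, interval_prob, and emit are assumed to hold the factors P(h_i|k), P(h_i|h_{i-1}), P(n_i|h_i), and P(b_i|h_i) estimated as described above.

```python
import numpy as np

def viterbi_submelody(key_prob, trans, interval_prob, emit):
    """Maximize g'(h_1, ..., h_m) over the 12 pitch classes by dynamic programming.

    key_prob[h]         : P(h | k), shape (12,)
    trans[h_prev, h]    : P(h | h_prev), shape (12, 12)
    interval_prob[i, h] : P(n_i | h) for the i-th note, shape (m, 12)
    emit[i, h]          : P(b_i | h) for the i-th note, shape (m, 12)
    Returns the most probable submelody pitch-class sequence (length m).
    """
    m = emit.shape[0]
    log = lambda x: np.log(np.maximum(x, 1e-12))  # guard against log(0)
    # i = 1 term: P(h_1|k) P(n_1|h_1) P(b_1|h_1)
    delta = log(key_prob) + log(interval_prob[0]) + log(emit[0])
    back = np.zeros((m, 12), dtype=int)
    for i in range(1, m):
        # scores[h_prev, h]: log-probability of reaching pitch class h at note i
        scores = (delta[:, None] + log(trans)
                  + log(key_prob)[None, :]
                  + log(interval_prob[i])[None, :]
                  + log(emit[i])[None, :])
        back[i] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    # backtrack the best path
    path = [int(delta.argmax())]
    for i in range(m - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```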


Fig. 5 Probability distribution of P(b_i | h_i).

4.2 Rule-based Method
As in the prob-based method, n_1, \cdots, n_m and h_1, \cdots, h_m represent sequences of notes of the main melody and submelody, respectively, and each variable takes an integer between 0 and 11, corresponding to a pitch class. From the accompaniment, a sequence of 12-dimensional vectors {b_i} is extracted in the same way as in the prob-based method.

For each i, the highest four values among the 12 elements of b_i = (b_{i,0}, \cdots, b_{i,11}) are extracted and the pitch classes corresponding to them are obtained. The set of the obtained pitch classes is denoted here by C_i. In the case of lower submelodies, the pitch class of the i-th note of the submelody, h_i, is determined by the following rules:
(1) If C_i includes the pitch class of the major 3rd below n_i, h_i is set to this pitch class (h_i = n_i − 4).
(2) If C_i includes the pitch class of the minor 3rd below n_i, h_i is set to this pitch class (h_i = n_i − 3).
(3) If C_i includes the pitch class of the perfect 4th below n_i, h_i is set to this pitch class (h_i = n_i − 5).
(4) Otherwise, h_i is determined so that it has a parallel motion with the main melody from the previous note (h_i = h_{i−1} + (n_i − n_{i−1})).
Rules for upper submelodies are similarly defined.
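The following is a minimal sketch of these lower-submelody rules (our own illustration, not the authors' implementation), where chord_sets[i] plays the role of C_i, the set of pitch classes with the four largest values in b_i; all pitch classes are integers modulo 12.

```python
def rule_based_lower_submelody(main_pcs, chord_sets):
    """main_pcs[i]  : pitch class (0-11) of the i-th note of the main melody
    chord_sets[i]   : the set C_i of pitch classes corresponding to the four
                      largest elements of b_i (prominent accompaniment notes)
    Returns the pitch classes of a lower submelody."""
    sub = []
    for i, n in enumerate(main_pcs):
        c = chord_sets[i]
        if (n - 4) % 12 in c:        # rule (1): major 3rd below
            h = (n - 4) % 12
        elif (n - 3) % 12 in c:      # rule (2): minor 3rd below
            h = (n - 3) % 12
        elif (n - 5) % 12 in c:      # rule (3): perfect 4th below
            h = (n - 5) % 12
        elif i > 0:                  # rule (4): parallel motion from the previous note
            h = (sub[-1] + (n - main_pcs[i - 1])) % 12
        else:                        # first note: the paper does not specify this case;
            h = (n - 4) % 12         # fall back to the major 3rd below (hypothetical)
        sub.append(h)
    return sub
```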
5. Experiments

5.1 Evaluation of Harmonization
5.1.1 Conditions
We asked a graduate student at Japan's top-level music university to add submelodies (both upper and lower submelodies) to the MIDI data of 85 J-pop songs. The MIDI data were purchased at Yamaha Music Data Shop (https://yamahamusicdata.jp/). The MIDI data sold there consist of the main melody (in Track 1), the accompaniment (in other tracks), and lyrics text data (as MIDI Meta Events).

We calculated the rates of concordance (ROCs) between these ground truth submelodies and the generated submelodies. Note that generated submelodies are not necessarily musically inappropriate even if they are not concordant with the ground truth, because there can be more than one musically appropriate submelody for the same song even though we have only one ground truth for each song.
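For clarity, the ROC used below can be read as the following note-level agreement (a minimal sketch based on our reading; the paper does not give an explicit formula, and the one-to-one note alignment follows from the submelody sharing the main melody's rhythm).

```python
def rate_of_concordance(generated, ground_truth):
    """Percentage of notes for which the generated submelody note matches the
    expert's ground-truth note; the two sequences are aligned note-for-note
    because both share the rhythm of the main melody."""
    assert len(generated) == len(ground_truth)
    matches = sum(g == t for g, t in zip(generated, ground_truth))
    return 100.0 * matches / len(ground_truth)
```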


Table 1  Rates of concordance (ROCs) between generated submelodies and ground truth [%].

            Prob-based              Rule-based
         Ave    Max    Min      Ave    Max    Min
Upper    52.3   64.5   24.3     55.5   75.0   33.6
Lower    53.6   71.0   30.5     64.2   87.9   37.6

5.1.2 Results
The ROCs are listed in Table 1. The average rates are 52.3% for upper submelodies and 53.6% for lower submelodies with the prob-based method, while they are 55.5% for upper submelodies and 64.2% for lower submelodies with the rule-based method. Fig. 6 and Fig. 7 show the ROC for each song. We can see that the ROCs vary among songs in all conditions.

Fig. 6 ROC for each song (upper submelodies).
Fig. 7 ROC for each song (lower submelodies).

5.1.3 Discussions
We consider two reasons why the prob-based method is inferior to the rule-based method. The first reason is the frequent appearance of unison (the same note is selected for both the main melody and the submelody; that is, n_i = h_i). In fact, 4.78% of the generated submelody notes are in unison. In the submelodies created by the expert (the ground truth), only 0.20% of the notes are in unison (unison is not generated by the rule-based method). When n_i = h_i, P(n_i|h_i) is set to α (≈ 0.016), which is sufficiently smaller than P(n_i|h_i) given that |n_i − h_i| = 3, 4, 5. A possible reason why unison appeared despite such a low emission probability could be that the transition probabilities were not appropriately estimated.

The second reason for the rule-based method's superiority is that the same pitch class is consecutively repeated with the prob-based method. The numbers of appearances of repetitions of the same pitch class are shown in Fig. 8 and Fig. 9. From these figures, we can see that, for example, a sequence of ten consecutive notes with the same pitch class appears at about 200 points in the 85 upper submelodies generated with the prob-based method. This occurs because the probability of self-transition P(h_i = x | h_{i−1} = x) is higher than P(h_i = y | h_{i−1} = x) (for all y ≠ x). This could be caused by data sparseness: the transition probabilities were calculated from the main melody of the target song only. In 30 songs for which the generated submelody includes 10 or more consecutive notes with the same pitch class, 10 or more pitch classes have the highest transition probability for self-transition. In the other 25 songs, 5 or 6 pitch classes have a higher transition probability for non-self-transition than that for self-transition.

Fig. 8 The total number of appearances of repetitions of the same pitch class in 85 upper submelodies.
Fig. 9 The total number of appearances of repetitions of the same pitch class in 85 lower submelodies.

5.1.4 Subjective Evaluation by an Expert
We discussed submelody generation performance based on the ROCs above, but low ROCs do not necessarily mean musical inappropriateness. We therefore asked an expert (an associate professor at a university of music) to subjectively evaluate the generated submelodies. Specifically, she checked musically inappropriate notes by listening to the submelodies with the main melodies and the accompaniments. To avoid burdening her, only the five lowest-ROC songs were evaluated for each method.

The results are listed in Table 2. The rates of musically appropriate notes fall between 67% and 97%; hence, it was confirmed that our submelody generation methods are appropriate, at least to some extent.

Table 2  Results of subjective evaluation by a musical expert. Rate 1: ROC to the ground truth; Rate 2: the rate of appropriate notes judged by the expert.

Method   Song                       Rate 1    Rate 2
Prob     Natsumatsuri (upper)       24.36%    81.58%
         Attakaindakara (lower)     30.51%    83.24%
         Osaka Lover (lower)        30.98%    73.80%
         (lower)                    33.45%    97.46%
         365 Nichi (lower)          34.09%    79.81%
Rule     Natsumatsuri (lower)       33.64%    85.53%
         Hana (lower)               37.56%    90.30%
         Aoi Benchi (upper)         37.77%    85.33%
         Attakaindakara (upper)     38.98%    67.03%
         Attakaindakara (lower)     39.23%    78.92%

5.1.5 Example-based Qualitative Discussion
We qualitatively discuss differences between generated submelodies and the ground truth based on some examples. Four examples are shown in Fig. 10. Our observations are summarized as follows:

• Natsumatsuri (upper)
  The ground truth of the upper submelody for this song consists of only the note of the major or minor 3rd above each note in the main melody. In this sense, it is a simple submelody, but it is characterized by the fact that most notes included in the submelody are not chord notes but 7th notes. Use of 7th notes for submelodies enriches the harmony, so this approach is commonly used. On the other hand, the generated submelodies use many notes of the perfect 4th above the main melody (e.g., B for F, E for B, and F for C) because notes for submelodies are preferentially selected from chord notes. To achieve the approach using 7th notes like the ground truth, we need to model, for example, the richness of a harmony, not only the avoidance of dissonance.

• Attakaindakara (upper)
  The ground truth of the submelody mainly consists of notes of the 3rd above the main melody but also includes notes of the 6th above. In addition, the submelody sometimes uses oblique motions (in the 2nd, 4th, and 5th measures). On the other hand, all notes except one note in the prob-based and rule-based submelodies are notes of the major 3rd, minor 3rd, or perfect 4th above the main melody. This could be one reason why some musically unnatural notes were generated. In particular, the rule-based method tends to select the note of the major 3rd above preferentially, so it sometimes selects such notes (e.g., F in the 7th measure) even though they are musically unnatural.

• Attakaindakara (lower)
  Similarly to the upper submelody for the same song, the ground truth includes both parallel motions with 3rd intervals and oblique motions (in the 2nd and 4th measures). The prob-based submelody uses chord notes where passing notes are better (e.g., the last B in the auftakt, the last A in the 5th measure, and the last B in the 8th measure). Indeed, these notes were judged to be musically inappropriate in the subjective evaluation (Section 5.1.4). In the rule-based submelody, the problem of musically unnatural notes of the major 3rd from the main melody is even more severe here (B in the auftakt, B and A in the 2nd measure, and E in the 5th measure).

• Hana (lower)
  Almost all notes in the ground truth of the submelody are a 6th below the main melody. They are just one octave below the upper submelody. The prob-based submelody mainly consists of notes of the 4th below the main melody, and the G note is repeated. This submelody does not cause strong dissonance, but it is slightly monotonous because it consists almost only of G, C, and E. The rule-based submelody for this song also suffered from the problem of musically unnatural notes of the major 3rd from the main melody (e.g., A in the auftakt and B in the 2nd measure).

c 2019 Information Processing Society of Japan 688 Journal of Information Processing Vol.27 683–692 (Nov. 2019)

Fig. 10 Examples of generated submelodies (cross symbols "×" are attached to notes considered musically inappropriate by the human evaluator in Section 5.1.4). The melodies of these songs were written by Hashi Jinta (Natsumatsuri), Kumamushi (Attakaindakara), and Orange Range (Hana).

To summarize, at least in these four examples, the prob-based method rarely generates dissonant submelodies, but it tends to fail to select passing notes. The rule-based method sometimes selects notes of the major 3rd from the main melody that cause dissonance.

5.2 Evaluation of Effects of Audio/Visual Feedback
We conducted an experiment to confirm how effective the audio and visual feedback functions are in the practice of backing vocals.
5.2.1 Used Systems
The systems used are listed in Table 3. Because the purpose of the experiment is to confirm the effects of the audio and visual feedback functions, we compared our system with systems in which either of the feedback functions was disabled. To reduce the participants' burden, we omitted the comparison with a system in which both feedback functions were disabled.

Table 3  Systems used in the experiments (*proposed).

             Audio feedback    Visual feedback
System A*    Enabled           Enabled
System B     Enabled           Disabled
System C     Disabled          Enabled

5.2.2 Participants
The participants were 12 university students (male: 5, female: 7). Two of the participants had more than five years of piano experience, but the others had no particular musical experience. No one had special singing experience.


Fig. 11 Results of questionnaire in Section 5.2.
Fig. 12 Pitch accuracy in the experiment in Section 5.2.

5.2.3 Procedure
Every participant was asked to use the three systems listed in Table 3. For each system, participants did the following:
(1) Repeat the following three times:
    (a) Practice Song 1 with the given system for two minutes
    (b) Sing Song 1 as if really singing in karaoke (non-practice singing)
(2) Answer a questionnaire
(3) Repeat the following three times:
    (a) Practice Song 2 with the given system for two minutes
    (b) Sing Song 2 as if really singing in karaoke (non-practice singing)
(4) Answer the same questionnaire
To prevent the effect of getting used to the songs and the systems, we used different songs for different systems and counterbalanced the order of the systems among the participants. In non-practice singing, the experimenter (the second author) sang the main melody together with the participant's submelody singing; the visual feedback function was disabled (the audio feedback was enabled) regardless of the system given for the practice phase. To obtain sufficient accuracy of pitch estimation, the participant and the experimenter used headphones in all cases.
In the questionnaire, we asked each participant the following four questions:
Q1 Could you grasp the pitch of your voice accurately?
Q2 Do you think that the practice phase was useful for the non-practice singing?
Q3 Could you practice the submelody singing and prevent it from being influenced by the main melody?
Q4 Was your voice influenced by the main melody's voice in the non-practice singing?
The participants answered these questions on a scale of 0 to 6. Lower ratings are better only for Question 4. The ratings were tested using the Wilcoxon rank-sum test. The Bonferroni correction was used for multiple comparisons.
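As an illustration of this testing procedure, the following is a minimal sketch (our own illustration, not the authors' analysis script) of pairwise rating comparisons with a Bonferroni-corrected Wilcoxon rank-sum test, using SciPy.

```python
from itertools import combinations
from scipy.stats import ranksums

def compare_systems(ratings, alpha=0.05):
    """ratings: dict mapping a system name ("A", "B", "C") to a list of
    questionnaire scores (0-6). Runs pairwise Wilcoxon rank-sum tests and
    applies a Bonferroni correction over the number of pairwise comparisons."""
    pairs = list(combinations(ratings, 2))
    corrected_alpha = alpha / len(pairs)  # Bonferroni correction
    results = {}
    for a, b in pairs:
        _, p = ranksums(ratings[a], ratings[b])
        results[(a, b)] = (p, p < corrected_alpha)
    return corrected_alpha, results
```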
5.2.4 Used Songs
We used six famous J-pop songs: "Sakuranbo" (Ai Ohtsuka), "Natsu No Hi No 1993" (Class), "Glamorous Sky" (Mika Nakashima), "Asu No Tobira" (I WiSH), and "RPG" (Sekai No Owari). To minimize the burden on participants, we used only the chorus sections of these songs. To prevent the effect of the difference in lyrics, we asked the participants to sing them by "la-la-la."
5.2.5 Results —Questionnaire—
The results (Fig. 11) can be summarized as follows:
Q1 The ratings for System A are significantly superior to those for System B with a significance level of 1%. This means that the visual feedback of the pitch of users' singing is to some extent effective in helping them grasp the pitch accuracy of their voices.
Q2 No significant differences among the three systems are shown.
Q3 Also for this question, no significant differences among the systems are shown. This occurs because our system plays the main melody with a guide tone, not a singing voice. Therefore, our system could not create a situation in which singing the submelody is influenced by the main melody in the practice phase. This could be improved by introducing a singing synthesis engine, such as Vocaloid.
Q4 This question also shows no significant difference among the systems. However, four participants commented, in an interview after the experiment, that they were less influenced by the main melody when they practiced with System A than with the other systems. This implies a possibility that our system is effective in preventing the user's singing from being influenced by the main melody.
Although we did not conduct a systematic evaluation of the interface design, all participants were able to use our system without confusion.
5.2.6 Results —Pitch Accuracy—
We analyzed how the pitch is improved by comparing pitch accuracy in the first, second, and third non-practice singing phases of the same song. The same method as the pitch estimator used for the audio feedback was applied. Specifically, the pitch of the singing voice is estimated every 5 ms, and the ratio of the frames in which the difference from the correct pitch is smaller than 200 cent is calculated. The silent frames and the frames in which the difference is greater than 800 cent are excluded in calculating this ratio. Only for System A, two participants were excluded from this analysis because more than half of the frames of their singing voices were excluded.
The results are shown in Fig. 12. By comparing the pitch accuracy at the first and third singing, we can see that six of the 10 participants (60%) improved their pitch accuracy by more than 5% with System A. On the other hand, five of the 12 participants (42%) improved with System B.

5.3 Evaluation of Key Transposition
When the lead vocalist's pitch range does not match the range of the main melody of the target song, he/she may transpose the key to match those pitch ranges. If this is the case, the backing vocalist also has to sing the submelody in the transposed key. However, it is difficult to sing the submelody in a key different from the key used in the practice phase. Our system therefore enables the user to practice backing vocals in any key by transposing the key. In this experiment, we confirm the effects of this transposition function.


Fig. 13 Results of questionnaire in Section 5.3.
Fig. 14 Pitch accuracy in the experiment in Section 5.3.

5.3.1 Participants
The participants were 13 university students (male: 6, female: 7) who did not participate in the procedure in Section 5.2. Six participants had more than five years of piano experience, and one participant had more than five years of bass experience.
5.3.2 Procedure
In each of the transposition-enabled and transposition-disabled conditions, every participant followed the same procedure as in Section 5.2 (however, here they sang three songs). During the non-practice singing, the key was transposed up a major 2nd in both conditions. During the practice phase, the key was set to the original key in the transposition-disabled condition, while the participants freely selected the key in the transposition-enabled condition. Because they were informed in advance that they had to sing in the major-2nd-above key in the non-practice singing phase, all participants selected this key.
The questionnaire consists of the following questions:
Q1 Do you think that the practice phase was useful for the non-practice singing?
Q2 Do you think that you sang well in the practice phase?
Q3 Was it easy to sing submelodies in the non-practice singing?
Q4 Did you experience difficulty singing the submelody in a non-original key (a key different from the original CD recording) in the non-practice singing?
The participants answered these questions on a scale of 0 to 6. Lower ratings are better only for Question 4. The ratings were tested using the Wilcoxon rank-sum test.
5.3.3 Used Songs
We used the same songs as in Section 5.2 in the same way.
5.3.4 Results —Questionnaire—
The results (Fig. 13) can be summarized as follows:
Q1–Q3 The ratings in the transposition-enabled condition were on average at least 1.3 higher than in the transposition-disabled condition, and the significance of the differences was confirmed with a significance level of 5% (for Q1, even with 1%). This shows that practice is more effective when the same key as used in the non-practice singing phase is selected.
Q4 The rating in the transposition-enabled condition was on average at least 2.0 lower than in the transposition-disabled condition, and a significant difference was shown with a significance level of 1%. This implies that it is not easy to accurately sing submelodies in a non-original key after practicing the submelodies in the original key.
5.3.5 Results —Pitch Accuracy—
We analyzed the improvement of pitch accuracy during the non-practice singing in the same way as in Section 5.2.6. One participant (the same participant in both conditions) was excluded, because more than half of the frames of this participant's singing were excluded.
The results are shown in Fig. 14. When we compare the first and third singing, four of the 12 participants markedly improved their pitch accuracy (by more than 10%) in the transposition-enabled condition, while two participants improved in the transposition-disabled condition.

6. Conclusions
In this paper, we proposed a system to practice backing vocals for karaoke. To enable karaoke users to learn backing vocals, we had to resolve two issues: automatic generation of submelodies, and support of backing vocal practice through audio and visual feedback. Through experiments, we confirmed that our system resolved these issues to some extent.
However, we still have a number of issues. There is still room for improvement in the parameter tuning of our model. We also should try data-driven learning of the model parameters. In addition, the experiments were short in consideration of the participants' burden; longer experiments are necessary to investigate how the backing vocal skill improves.

Acknowledgments This research was partly supported by JSPS KAKENHI Grant Nos. JP16K16180, JP16H1744, JP16KT0136, and JP17H00749.

References
[1] Shiraishi, M., Ogasawara, K. and Kitahara, T.: HamoKara: A System for Practice of Backing Vocals for Karaoke, Proc. 15th Sound and Music Computing Conference (SMC 2018), pp.511–518 (2018).
[2] Cano, P., Loscos, A., Bonada, J., de Boer, M. and Serra, X.: Voice Morphing System for Impersonating in Karaoke Applications, Proc. ICMC (2000).
[3] Daido, R., Hahm, S.-J., Ito, M., Makino, S. and Ito, A.: A System for Evaluating Singing Enthusiasm for Karaoke, Proc. ISMIR (2011).
[4] Kurihara, T., Kinoshita, N., Yamaguchi, R. and Kitahara, T.: A Tambourine Support System to Improve the Atmosphere of Karaoke, Proc. SMC (2015).
[5] Hu, J., Guan, Y. and Zhou, C.: A hierarchical approach to simulation of the melodic harmonization process, IEEE International Conference on Intelligent Computing and Intelligent Systems, Vol.2, pp.780–784 (2010).
[6] Kawakami, T., Nakai, M., Shimodaira, H. and Sagayama, S.: Hidden Markov Model Applied to Automatic Harmonization of Given Melodies, IPSJ SIG Notes, 99-MUS-34, pp.59–66 (2000). (in Japanese).
[7] Miura, M., Kurokawa, S., Aoi, A. and Yanagida, Y.: Yielding harmony to given melodies, Proc. 18th International Congress on Acoustics, pp.3417–3420 (2004).


[8] Ebcioğlu, K.: An expert system for harmonizing chorales in the style of J.S. Bach, Journal of Logic Programming, Vol.8, pp.145–185 (1990).
[9] Hild, H., Feulner, J. and Menzel, W.: HARMONET: A neural net for harmonizing chorales in the style of J.S. Bach, Advances in Neural Information Processing, Vol.4, pp.267–274 (1991).
[10] Pachet, F. and Roy, P.: Formulating constraint satisfaction problems on part-whole relations: The case of automatic harmonisation, Workshop at European Conference on Artificial Intelligence (1998).
[11] Allan, M. and Williams, C.K.I.: Harmonising chorales by probabilistic inference, Vol.17, pp.25–32 (2005).
[12] Phon-Amnuaisuk, S., Smaill, A. and Wiggins, G.: Chorale harmonization: A view from a search control perspective, Journal of New Music Research, Vol.35, pp.279–305 (2006).
[13] Yi, L. and Goldsmith, J.: Automatic generation of four-part harmony, Conference on Uncertainty in Artificial Intelligence – Applications Workshop, Vol.268 (2007).
[14] Buys, J. and van der Merwe, B.: Chorale harmonization with weighted finite-state transducers, 23rd Annual Symposium of the Pattern Recognition Association of South Africa, pp.95–101 (2012).
[15] Suzuki, S. and Kitahara, T.: Four-part Harmonization Using Bayesian Networks: Pros and Cons of Introducing Chord Nodes, Journal of New Music Research, Vol.43, No.3, pp.331–353 (2014).
[16] Kiribuchi, D., Fukayama, S., Saito, D. and Sagayama, S.: Automatic Duet Composition from Japanese Lyrics, Proc. 75th National Convention of IPSJ, Vol.2, pp.303–304 (2013). (in Japanese).
[17] Fukayama, S., Nakatsuma, K., Sako, S., Nishimoto, T. and Sagayama, S.: Automatic Song Composition from the Lyrics exploiting Prosody of the Japanese Language, Proc. Sound and Music Computing Conference, pp.299–302 (2010).
[18] Pawate, B.: Method and System for Karaoke Scoring (1995), US Patent 5719344.
[19] Tsai, W.-H. and Lee, H.-C.: Automatic Evaluation of Karaoke Singing Based on Pitch, Volume, and Rhythm Features, IEEE Trans. Audio, Speech, Lang. Process., Vol.20, No.4, pp.1233–1243 (2012).
[20] Hoppe, D., Sadakata, M. and Desain, P.: Development of real-time visual feedback assistance in singing training: A review, Journal of Computer Assisted Learning, Vol.22, pp.308–316 (2006).
[21] Hirai, S., Katayose, H. and Inokuchi, S.: Clinical Support System for Poor Pitch Singers, IEICE Trans. Inf. & Syst., Vol.J84-D-II, No.9, pp.1933–1941 (2001). (in Japanese).
[22] Nakano, T., Goto, M. and Hiraga, Y.: MiruSinger: A Singing Skill Visualization Interface Using Real-Time Feedback and Music CD Recordings as Referential Data, Proc. IEEE International Symposium on Multimedia, pp.75–76 (2007).
[23] Lin, K.W.E., Anderson, H., Hamzeen, M. and Lui, S.: Implementation and Evaluation of Real-Time Interactive User Interface Design in Self-learning Singing Pitch Training Apps, Proc. ICMC-SMC, pp.1693–1697 (2014).
[24] Fukumoto, A., Hashida, M. and Katayose, H.: RelaPitch: A Relative Pitch Training System With Singing, Proc. Entertainment Computing Symposium, pp.332–333 (2016).
[25] Morise, M., Kawahara, H. and Katayose, H.: Fast and reliable F0 estimation method based on the period extraction of vocal fold vibration of singing voice and speech, AES 35th International Conference (2009).

Mina Shiraishi received her B.S. degree from Nihon University in 2018. When she was at the university, she was engaged in music computing research.

Kozue Ogasawara received her B.S. degree from Nihon University in 2018. When she was at the university, she was engaged in music computing research.

Tetsuro Kitahara received his B.S. degree from Tokyo University of Science in 2002 and his M.S. and Ph.D. degrees from Kyoto University in 2004 and 2007, respectively. From 2005 to 2007, he was a JSPS Research Fellow (DC2). From 2007 to 2010, he was a PostDoc researcher at Kwansei Gakuin University. Since 2010, he has been a faculty member at Nihon University, where he is currently an Associate Professor. His research interests include sound and music computing. He received several awards including the 2nd Kyoto University President's Award. He is a member of IPSJ, IEICE, and ASJ.
