
Proceedings of the International Computer Music Conference 2011, University of Huddersfield, UK, 31 July - 5 August 2011

MODELING SPEED DOUBLING IN CARNATIC MUSIC

Srikumar K. Subramanian, Lonce Wyse, Kevin McGee
National University of Singapore
Department of Communications and New Media
{srikumar.k.subramanian, lonce.wyse, mckevin}@nus.edu.sg

ABSTRACT

We consider the problem of modeling the feature relationships between multiple speed renditions of parts of compositions called varnams in Carnatic music, discuss related work in speech and singing synthesis and in synthesizing Carnatic music from solfege notation, present style dependent but arguably rāgā independent rules for simplifying and adjusting gamakās (continuous pitch movements) in the slower speed performance of one composition to derive the more rhythmic double speed performance, and find that the performance derived using these rules compares favourably with the double speed rendition by the same artist.

1. BACKGROUND

Salient musical features that depend on speed occur in some musical genres such as jazz and in Indian classical music, where performers are known to alter musical details of a composition to suit different speeds. In south Indian classical music (called “carnatic music”), the ornate gamakās (continuous pitch movements) used in a slower speed are simplified during performance at higher speeds and have a greater rhythmicity to them than their lower speed counterparts. The movements in the higher speed renditions are fewer and appear to follow a rhythmic pulse determined by the composition’s “tālā” (time structure). A transformation involving detail reduction while increasing such rhythmicity appears to be intricate and raises the question of how much genre knowledge is needed to execute it. The nature of these speed related transformations is the subject of this paper.

Within carnatic music, compositions in the category of varnam feature sections that are performed in multiple speeds within a single concert performance. Varnams are therefore suitable material for studying the changes that a performer makes to the slower speed rendition when performing it in a higher speed. A typical varnam consists of four parts, with the first three making up the first half of the composition. The first three parts (pallavi, anupallavi and muktayisvaram) are performed first at a slow speed and followed by one or more higher speed versions related to the original speed by simple integer factors. A varnam performance is guaranteed to feature at least a speed doubled version of the first three parts, though a performer may choose additional speed multiples such as 3/2, 3, 4, 5, 6, 7, 8 and 9. The tālā (time structure or meter) is kept constant throughout the different speeds. The fourth part - caranam - is by convention performed at double the speed at which the piece is begun and in this case, the tālā is also doubled in speed. The caranam therefore provides additional raw material to study the characteristics of gamakās at higher speeds.

Gamakās that feature in the slower speeds cannot be preserved in the faster speeds. Since the slower speed already packs much more detail per note, a direct speed up would require absurd levels of detail in higher speeds that would be impossible for a performer to execute and would overload listeners. The work presented here is an attempt to model the kind of detail reduction that happens when increasing performance speed.

2. MOTIVATION

Our long term goal is to develop a synthesizer for the sparse and discrete “prescriptive notation”¹ that is used for musical communication in the Carnatic genre. Though the prescriptive notation omits the all-important gamakās (complex continuous pitch movements) that characterize the genre, trained musicians are able to fill in these details. Therefore a synthesizer for prescriptive notation can be said to capture the knowledge that a trained musician brings to the interpretation of a sparsely notated composition. Understanding the influence of speed on the choice and structure of gamakās is an important part of this larger synthesis problem. Performing such detail reduction of gamakās when given only the slower speed performance is also an important skill for a student of the genre. Therefore, computer modeling of this transformation, in addition to contributing to the musicology of the genre, may have pedagogical applications.

We now discuss work that has been done in the related areas of jazz swing modeling, text to speech synthesis, expressive music synthesis and gamakā synthesis in carnatic music.

¹ The term “prescriptive notation” was introduced by the ethnomusicologist Charles Seeger to denote notation forms that serve as instructions for performers, in contrast with “descriptive notation” which captures the details of a specific performance after the fact [13].

3. RELATED WORK

In considering musical features that depend on speed, jazz “swing” has been a subject of considerable study. In jazz, swing ratios - the ratio of the longer note or beat to the following shorter one - executed by performers are known to change with tempo [6, 4, 2, 3]. In [4], Friberg and Sundström report that swing ratios vary from 3.5:1 at slow tempi to 1:1 at fast tempi. In [6], Honing reports that professional jazz drummers have “enormous control over their timing” within a precision of milliseconds. Nevertheless, swing ratios are not kept constant and are “varied systematically with tempo”. Honing also notes that this is “in line with the more general hypothesis that expressive timing in music performance does not scale proportionately with tempo”.

Speech intonation models deal with the generation of the fundamental frequency contour, known as the “F0 contour”, which is an important component of prosody and is related to gamakās. The most common model used for generating F0 contours for speech is the Fujisaki model, which has been applied to both speech and singing [10]. According to this model, the F0 contour is generated as the response of a second order linear system to a sequence of discrete linguistic commands [5]. The “tilt intonation” model developed by Taylor and Black [19, 21] views the F0 contours of speech as a series of pitch “excursions” and describes each using an extent, a duration and a “tilt” parameter which varies from –1 (a pure fall) through 0 (a rise followed by a fall) to +1 (a pure rise). Portele and Heuft’s “maximum-based description” uses another parameterization that is similar to Taylor’s model. They specify a contour by identifying F0 maxima, their times and their left and right slopes [11]. The minima are implicit in this model and sinusoidal interpolation of F0 is used to generate the complete contour using this information. Regarding the question of “naturalness” of such intonation models, Taylor notes in [20] that “the linguistic justification for any existing intonation systems are weak”, though Fujisaki does provide physiological justifications for his model. Modeling “speaking rate” control is another relevant area of speech synthesis where non-linear temporal stretching is used to preserve intelligibility of speech. Vowels and consonants, in particular, are time stretched by different amounts. For example, in [22], Yoshimura et al. describe how to implement such speaking rate controls [...] that both these systems for different forms of musical expression use a rule based system that was originally built for text to speech synthesis.

In the domain of carnatic music, M. Subramanian [17] has built an expert system featuring a per-rāgā database of context dependent rules for the automatic derivation of gamakās from the sparse prescriptive notation. Such a synthesizer is one way to approach the speed dependency problem, since it needs to handle both the normal speed and the double speed renditions, both of which have the same prescriptive notation. His system sits within the framework of his “svara notation” synthesis program called “Gaayaka” [15]. In his approach to synthesizing prescriptive notation, the preceding and following pitches of a notated pitch are used as the melodic context, together with the duration of the pitch to be expanded, to determine the final phrase. To deal with the fact that different gamakās are needed for faster phrases, Subramanian populates the database with entries for five different duration ranges for each pitch triad. To account for multiple interpretations of a given notation fragment, Subramanian presents the possibilities as a choice to the user of his Gaayaka system. Though this is a viable approach to handling the speed dependency of gamakās, it doesn’t account for the possibility that the detail reduction upon increase of speed might have a pattern to it, potentially spanning multiple rāgās.

In the following sections we describe this speed doubling problem and present a logical model for the speed dependencies of gamakās for one rāgā. Such a model can capture some of the musical understanding that a practicing musician might bring to the act of changing speeds, besides helping to reduce the complexity of phrase databases such as the one used by Subramanian.

4. PROBLEM

A qualitative assessment of the relationship between the “first speed” and “second speed” renditions of portions of a varnam indicates that the latter has a stronger rhythmicity to it and has fewer details in its melodic movements. The problem, therefore, is to determine the extent to which the second speed rendition can be derived from the first speed rendition. The nature of this transformation appears discrete and therefore different from the continuous nature of the transformations in text to speech and expressive music synthesis discussed in section 3.
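
To illustrate why the second speed rendition cannot simply be a time-compressed copy of the first, the following minimal sketch compresses a pitch contour uniformly and measures how densely it moves. It is not the model developed in this paper. It assumes a contour is represented as (onset in seconds, pitch in semitones) breakpoints, and the function names and phrase values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumed representation: (onset_seconds, pitch_in_semitones) breakpoints.
Contour = List[Tuple[float, float]]


def naive_speed_up(contour: Contour, factor: float) -> Contour:
    """Uniformly compress time by `factor`, keeping every breakpoint."""
    return [(t / factor, p) for t, p in contour]


def movements_per_second(contour: Contour) -> float:
    """Count pitch changes between successive breakpoints, per second."""
    duration = contour[-1][0] - contour[0][0]
    moves = sum(1 for (_, a), (_, b) in zip(contour, contour[1:]) if a != b)
    return moves / duration


# Hypothetical first-speed phrase; the values are illustrative, not measured.
first_speed = [(0.0, 0.0), (0.25, 2.0), (0.5, 0.0), (0.75, 2.0), (1.0, 0.0)]
doubled = naive_speed_up(first_speed, 2.0)

print(movements_per_second(first_speed))  # 4 movements in 1.0 s
print(movements_per_second(doubled))      # the same 4 movements in 0.5 s
```

Under these assumptions the naive transformation keeps every movement and therefore doubles the movement density, whereas the renditions described above have fewer movements at the second speed. The transformation of interest must therefore remove or merge movements rather than merely rescale time.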