Music Structure Type     Music Style        Proportion (%)   Total (%)
Coordinate Structure     A                  17.50            75.42
                         A+B                20.42
                         A+B+C               3.33
                         A+B+C+D+...        34.17
Reproducing Structure    A+B+A              10.00            10.00
Cyclotron Structure      A+B+A+C+A           1.25             1.25
                         A+B+A+C+A+...       0
Circular Structure       A+B+A+B             5.00            12.92
                         A+B+A+B+A+B         1.67
                         A+B+A+B+A+B+...     6.25

Table 3. Statistical results of HaoZi-Hunan music structure styles

The last columns of Table 1, Table 2 and Table 3 all show that the folk songs in the three regions have general characteristics: the coordinate structure occupies the largest proportion, and the cyclotron structure the least. The coordinate structure is the most common because it is the simplest combination of the music structure and is the foundation of all music structure types. On the other hand, the strict requirements for the formation of the cyclotron structure make it the least common. It needs two inconsistent clips among three adjacent clips with the same label, which makes it unstable and prone to transitioning to the reproducing structure and the circular structure.

We also compare the proportions of each music structure style in the three regions' folk songs from Table 1, Table 2, and Table 3. We can see another indication of the general characteristics of the three regions' folk songs, as all the music structure styles have similar ratios.

In conclusion, the general characteristics of the three regions' folk songs are that they have strong similarities in the music structure types and styles, with similar ratios, the coordinate structure being the most common and the cyclotron structure the least.

5. CONCLUSIONS

This paper studies the general characteristics of Chinese folk songs using the styles of folk songs' music structure types. The process consists of three steps. First, each folk song is segmented into clips with the LSD audio segmentation algorithm we proposed. Then, music structure annotation is applied to these clips. Finally, statistics are compiled on the styles of each folk song's music structure types and their general characteristics are analyzed.

The experiments show that the LSD audio algorithm we proposed is effective for audio segmentation according to music similarity. The F-measure can reach 90.39%. It is feasible to automatically analyze the general characteristics of folk songs based on the music structure types we proposed, and the general characteristic of the three regions' folk songs is that all the music structure types and styles have similar ratios, with the coordinate structure being the most common and the cyclotron structure the least.

6. ACKNOWLEDGEMENT

The work is supported in part by the fundamental research funds for the central universities: sk2016017. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

7. REFERENCES

[1] L. K, “Theme Motif Analysis as Applied in Chinese Folk Songs,” Art of Music-Journal of the Shanghai Conservatory of Music, 2005.

[2] Z.-Y. P, “The Common Characteristics of Folk Songs in Plains of Northeast China,” Journal of Jilin College of the Arts, 2010.

[3] Y. Ruiqing, “Chinese folk melody form (22),” Journal of music education and creation, vol. 5, pp. 18–20, 2015.

[4] Z. Shenghao, “The interpretation of Chinese folk songs and music ontology elements,” Ge Hai, vol. 4, pp. 48–50, 2013.

[5] G. Mantena, S. Achanta, and K. Prahallad, “Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping,” IEEE/ACM Trans. Audio, Speech, Lang. Process, vol. 22, no. 5, pp. 946–955, 2014.

[6] G. Min, X. Zhang, J. Yang, and Y. Chen, “Sparse representation and performance analysis for LSP parameters via dictionary learning,” Journal of PLA University of Science & Technology, 2014.

[7] D. Bogdanov, J. Serrà, N. Wack, P. Herrera, and X. Serra, “Unifying low-level and high-level music similarity measures,” IEEE Trans. Multimedia, vol. 13, no. 4, pp. 687–701, 2011.

[8] M. Müller and S. Ewert, “Chroma Toolbox: Matlab Implementations for Extracting Variants of Chroma-Based Audio Features,” in Proc. of ISMIR International Society for Music Information Retrieval Conference, 2011, pp. 215–220.

[9] Z. Fu, G. Lu, K. M. Ting, and D. Zhang, “A survey of audio-based music classification and annotation,” IEEE Trans. Multimedia, vol. 13, no. 2, pp. 303–319, 2011.

[10] B. K. Mishra, A. Rath, N. R. Nayak, and S. Swain, “Far efficient K-means clustering algorithm,” in Proc. of ACM International Conference on Advances in Computing, Communications and Informatics, 2012, pp. 106–110.

[11] Y. Tamura and S. Miyamoto, “A method of two stage clustering using agglomerative hierarchical algorithms with one-pass k-means++ or k-median++,” in Proc. of IEEE Granular Computing, 2014, pp. 281–285.

[12] R. Etemadpour, R. Motta, J. G. de Souza Paiva, R. Minghim, F. de Oliveira, M. Cristina, and L. Linsen, “Perception-based evaluation of projection methods for multidimensional data visualization,” IEEE Trans. Vis Comput Grap., vol. 21, no. 1, pp. 81–94, 2015.

[13] T. E. C. of “Integration of Chinese folk songs”, Integration of Chinese folk songs. Chinese ISBN center, 1994.


[...] with Band-Limited Oscillator Sections

Peter Pabon
Institute of Sonology, Royal Conservatoire, Juliana van Stolberglaan 1, 2595 CA Den Haag, The Netherlands
[email protected]

So Oishi
Institute of Sonology, Royal Conservatoire, Juliana van Stolberglaan 1, 2595 CA Den Haag, The Netherlands
[email protected]

Copyright: © 2016 Peter Pabon et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

ABSTRACT

The band-limited oscillator (BLOsc) is atypical as it produces signal spectra with distinctive edgings instead of distinct peaks. An edging at low frequency can have a perceptual effect comparable to that of a spectral peak. When modulated, the BLOsc has the advantage that it preserves spectral textures and contrasts that tend to blur with a resonance-based (subtractive) synthesis approach. First, the simple math behind the BLOsc is described. Staying close to this formulation helps to keep the model malleable and to maintain the dynamic consistencies within the model. Next, an extended processing scheme is presented that essentially involves a sectioned evaluation of the frequency range. The [...] and the application of convolution- and chance-mechanisms are examined. Stochastic control, MFCC based control and the options of [...] modeling are briefly discussed. Implementations in Max/MSP and SuperCollider are used to demonstrate the different options.

1. INTRODUCTION

Before digital [...] became the leading approach in electronic sound synthesis, Moorer [1] introduced the band-limited oscillator (BLOsc) principle as a means to synthesize complex audio spectra with only a limited set of frequency-coupled oscillators. At the time, in 1976, the technique was still called “discrete summation”. The BLOsc uses an efficient calculation scheme to synthesize signals that, over a hard-limited harmonic range, show a constant exponentially varying spectrum envelope, one that is smooth when measured on a scale in dB/harmonic.

[Figure 1 plot: panel A shows sound pressure level (dB/Hz) against frequency (Hz); panel B shows the signal.]

Figure 1. Sectioned BLOsc, (A) spectrum & (B) signal.

Expanding on this principle, the harmonic frequency range can again be arbitrarily sub-sectioned in discrete regions by maintaining synced [...]-couplings to a common divisor term. In this extended BLOsc version, each region can be given its own independent exponential sloping (see Figure 1).

1.1 Nyquist

A large part of the literature on the band-limitation paradigm is concerned with the problem of generating non-aliased versions of the standard oscillator waveforms found with the analog [...] [2][3][4]. With all Nyquist problems solved, we can safely do subtractive synthesis with our familiar palette of waveforms, but now in the digital domain. Yet, in this case, the traditional subdivision additive-versus-subtractive is far from trivial. With a subtractive scheme, the developing spectrum envelope contrasts depend on the amount of filtering. With the additive BLOsc approach, a large contrast can be there from the start and remain preserved when modulated. So this earlier classification expresses a critical division; very different musical results may emerge not only due to a difference in compositional strategy, but also due to a different valuation of the spectral factors and perceptual cues that determine the [...] of a sound.

1.2 Lower frequency limit

A peculiar perceptual phenomenon appears when the limiting frequency of the BLOsc is no longer close to the Nyquist frequency but transposed downwards to a lower, more audible frequency setting, somewhere below 3 kHz. Typically, more or less involuntarily, the BLOsc sound will attain a voice-like character, where the cutoff frequency will associate with a distinct vowel identity. A first inexplicit suggestion of an articulating voice may become more apparent, or more inevitable, when the limiting frequency or the [...] are modulated and follow distinct gestures over time. The effect is audible in S. Oishi's electronic compositions, and with his BLOsc SuperCollider objects you can simply explore this phenomenon yourself [6]. It is a known phenomenon. Assmann and Nearey [7] already report how discrete, equal-intensity (flat) harmonic configurations may trigger the perception of specific vowel identities, and they were able to link the cutoff [...] to intensity changes in the first formant region. Their study provides answers using a static, constant frequency viewpoint, but we were specifically interested in the dynamics.
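Sections 1.1 and 1.2 both turn on where the limiting frequency sits relative to the fundamental. As a minimal sketch, the number of harmonics that fit under a given limit can be counted as below. The helper name `harmonics_below`, the 110 Hz fundamental and the 44.1 kHz sample rate are illustrative assumptions, not values from the paper.

```python
import math


def harmonics_below(f0_hz: float, limit_hz: float) -> int:
    """Count the harmonics n * f0_hz (n >= 1) that lie at or below limit_hz."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    return max(0, math.floor(limit_hz / f0_hz))


SAMPLE_RATE_HZ = 44100.0   # assumed sample rate (illustrative)
F0_HZ = 110.0              # assumed fundamental (illustrative)

# Limit placed at the Nyquist frequency, as in the anti-aliasing literature.
n_nyquist = harmonics_below(F0_HZ, SAMPLE_RATE_HZ / 2)

# Limit transposed down to 3 kHz, the region where the voice-like character appears.
n_lowered = harmonics_below(F0_HZ, 3000.0)

print(n_nyquist, n_lowered)
```

With the limit lowered, far fewer harmonics remain, so the position of the highest harmonic becomes a prominent spectral feature rather than an inaudible safeguard against aliasing.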

Proceedings of the International Computer Music Conference 2016

1.3 Sharp Peak or Steep Cutoff

Formants are the designated spectrum structures that determine the vowel identity, where the first two resonances, called F1 and F2, are the most important. The formant frequencies vary with the articulated vowel, where F1 typically moves in the frequency range from 200 to 1200 Hz, while F2 can be found in the range from 500 to 2700 Hz [8]. There is an obvious overlap with the earlier-mentioned range below 3 kHz where the BLOsc attains this voice-like character, and a relevant question is: how can a steep spectrum cut-off bring up the suggestion of a dual formant resonator system?

The general notion is that the center frequencies of F1 and F2 determine the vowel identity. Although the formant peak is widely seen as the discriminating factor, this idea is not as absolute as often thought. There are different ways to conceive the spectrum level contrasts that appear below 3 kHz. In running speech, formants are seldom sharp. Formants cut out distinct spectrum areas; they occupy a certain bandwidth. When formants overlap in range, they together build a raised structure, but they also build a larger level contrast seen over a wider frequency range. Even when formants are characterized as moving spectral peaks, as for instance in your cell phone that uses an LPC-like coding technique, the peaks still implicitly code for contrasting slopes seen over a wider frequency area.

When formants move (and they move fast in speech [9][10]), all peaks blur while the dynamically moving edgings become the distinctive elements. Note that our ears are particularly good at, or even predisposed to, detecting significant differences in these fast changing spectral settings. Those who have played around with cross-synthesis (vocoding) might have experienced the following: when an arbitrary complex sound is used as a carrier and speech as the modulator, typically most of the original spectrum character of the carrier remains preserved up to the moment that a time-varying modulation brings the speech interpretation to the foreground. Note that the dynamic aspect of our hearing is generally understood from a static frequency viewpoint. A sinusoidal sweep is modeled as a sequential rippling through successive band filters, where it stops being a single coherent unit.

Plomp [8][11][15] forwarded a simple analysis system of two spectral weighting curves that performs equally well in positioning a vowel in the F1/F2 plane. The first dimension senses the specific spectrum region where F1 is most effectively varying the spectrum contrast, and the other does this for F2. It is actually the derivative of the spectrum envelope curve that does the work; the shoulders tell where the formant is, and again the cut-off frequency becomes the critical factor.

It is possible to synthesize a plausible singing voice sound without modeling any formant peaks; a flat spectrum envelope with a sharp-edged gap suffices [12]. The designated sound examples also demonstrate how a shift in the frequency location of only an up-going spectrum edge may change our perception of the voice type. Although we generally assume that our ears search for spectrum peaks, we could actually be listening for the locations of sharp spectrum edges that often coincide with the peak locations.

There is no reason to doubt the wisdom of locating spectrum peaks. What sticks out in the spectrum remains important, as it informs on the precise locations of the characteristic resonance modes of a system. But this typically applies to static, time-invariant modeling. From a static viewpoint, our sectioned BLOsc approach will be cumbersome. Better or more efficient schemes can be found to precisely model a designated spectrum shape. However, many spectrum cues move with time. Moreover, spectrum characteristics can seldom be observed in full detail, as there is constant competition with features of other sounds. Catching the salient spectrum contrasts while maintaining a dynamic continuation of this contrast over both the time and frequency axis becomes the issue. This thinking positions our BLOsc modeling.

2. FORMULATION

In ancient times, Euclid already described in one of his “Elements” the underlying mathematical principle that the BLOsc is based on: the sum formula for a geometric series (see derivation 1).

$$
\begin{aligned}
z^0 + z^1 + z^2 + z^3 + \dots + z^{N-1}
&= \frac{\left(z^0 + z^1 + z^2 + z^3 + \dots + z^{N-1}\right)(1 - z)}{1 - z} \\
&= \frac{z^0 + z^1 + z^2 + z^3 + \dots + z^{N-1} - z^1 - z^2 - z^3 - \dots - z^N}{1 - z} \\
&= \frac{1 - z^N}{1 - z} = \sum_{n=0}^{N-1} z^n
\end{aligned}
\tag{1}
$$

In his book “Fractals, Chaos, Power Laws”, Manfred Schroeder presents several geometrical proofs of the above derivation in what he calls “a simple case of self-similarity” [13]. Essentially, our extended BLOsc model aims at breaking up the frequency range in sections with a self-similar harmonic power development. This sectioned behavior is successively condensed using the above geometric series abstraction.

When the z in (1) is replaced by the complex exponential $z = re^{i\omega t}$, the above sum obtains a double identity: for the $r^n$ multiplier it is still a “geometric series” with exponentially incrementing (or decrementing) magnitude, but the $e^{in\omega t}$ term, which unfolds into $\cos(n\omega t)$ and $\sin(n\omega t)$ terms, also makes it a “harmonic series” with linearly incrementing $n\omega t$ terms (2).

$$
\sum_{n=0}^{N-1} \left(re^{i\omega t}\right)^n
= \sum_{n=0}^{N-1} r^n e^{in\omega t}
= \sum_{n=0}^{N-1} r^n \left(\cos(n\omega t) + i\sin(n\omega t)\right)
\tag{2}
$$

$$
= \frac{1 - r^N e^{iN\omega t}}{1 - re^{i\omega t}} = q(t) = q(\omega)
\tag{3}
$$

The quotient in (3) that comprises the sum from (2) can be interpreted as a signal q(t) resulting from an additive Fourier synthesis at time instant t, where the successive harmonic nω terms have a magnitude that scales exponentially with the frequency index. The same quotient can also be seen to represent the result q(ω) of a Fourier analysis at frequency ω, where the analyzed signal has the form of an exponentially decaying impulse train $r^n$ that is sampled at successive nt time intervals. So, when we listen to the periodic signal q(t) synthesized by the BLOsc, we actually hear the sampled circular frequency characteristic q(ω) of a truncated exponential decaying pulse train of length N-1 coming from a first order recursive filter (one pole). To avoid confusion, we will next stick to the time-domain signal q(t) interpretation only.

[Figure 2 plot: rows A, B and C, each showing q(t) and S(ω); spectra drawn on log(A)/lin(f) and log(A)/log(f) scales.]

Figure 2. BLOsc signals with different r-coefficient. Signal q(t) with real (all cos-phase, black curve) and imaginary (all sin-phase sum, grey curve) plus its corresponding amplitude spectrum S(ω) seen with different frequency scales. N=10 harmonics. A: down-going slope -3 dB/harmonic ($r = 2^{-1/2}$), B: flat envelope ($r = 1$), C: up-going +3 dB/harmonic ($r = 2^{1/2}$).

3. REALIZATION

Our BLOsc will always generate a so-called “analytic signal”: two outputs with a 90-degree phase difference. Except for a DC offset, the real (cosine-sum) and imaginary (sine-sum) output will sound identical, as our ears will be deaf to this constant within-period phase difference [15]. Maintaining complex calculus throughout serves several purposes: (I) rounding errors will stay low, (II) the processing scheme and dynamic control stay simple as you remain close to the formulation, which also makes it easier to later section the mechanism, and (III) spectral blocks do not mirror on zero frequency, which is essential for a later auto-convolution by raising time domain power. Even if the calculation load is considered, there will be no benefit in converting to an all real-valued signal implementation. The x86-based processors found in many computers have inbuilt CORDIC-based [14], double-argument instructions that do a polar-to-Cartesian or Cartesian-to-polar conversion as fast as a floating-point multiplication or division. The exp() and log() functions are comparably fast. Any “vintage” digital version that builds on wavetables will not have a different sound; it will only be slower, less flexible and stand in the way of a further development of the BLOsc scheme in a new musical direction.

3.1 Decay control

In computer sound synthesis environments like ChucK or SuperCollider (SC), standard implementations of the BLOsc can be found. Typically, only the spectrally flat (r=1) option is advertised. This simplification in a way downgrades this powerful oscillator mechanism, as it almost unnoticeably directs users in the direction of a subtractive synthesis paradigm.

The idea of a dynamic r-modulation was already there in the original description by Moorer [1]. The r-coefficient allows a fluent and precise control of the spectrum slope, ranging from a pure sine (the fundamental) to a flat spectrum of a band limited impulse train (BLIT), and even beyond to a configuration where the spectrum gets progressively weaker at the low frequency end, to the point that only the highest bounding harmonic resides (Figure 2).

4. EXTENSIONS

4.1 Sectioning

When frequency (and phase) maintain their unidirectional interpretation, complete self-similar harmonic progressions can be shifted, or rotated, without their content further spreading over the frequency axis. Such a harmonic section will preserve the sine-in-sine-out linear system property of all its constituting components. The instantaneous frequency and amplitude can be modulated or swept as with one single sine, with the additional modulation of the slope coefficient r as an extra bonus.

Building on this line of thought, an additive synthesis scheme is implemented where, instead of single harmonics, complete harmonic sections of variable width and decay are controlled separately, but where all sections are still synced to the same fundamental frequency base (see Figure 3). For each section only the first harmonic component needs to be generated; the first harmonic in the next section is the “N” of the previous section.

Figure 3. Coupled band-limited oscillator sections. This sketch was used to generate the signal in Figure 1.

Note that section levels may jump at each bound; a finite slope value can be modeled by introducing a new (close) section bound. With each section another quotient q(t) is associated. If all sections share the same r-coefficient as an overall spectrum slope parameter, then all quotients may share the same denominator base. In its complex-logarithmic representation, the whole scheme turns into a simple series of additions and subtractions within a dB/Hz scaled framework.

As all frequency multiples stem from the same phasor, we are free to choose any harmonic block-width within this additive scheme and do overlap-add in the frequency domain. There is no need for an equal center spacing or constant overlap factor.
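The sectioning idea can be sketched numerically. The code below is one possible reading of the scheme, not the authors' implementation: each section is the quotient (3) restarted at its own first harmonic, and adjacent sections share their bounds. The function names, the bounds, the gains and the per-section r values are all hypothetical, chosen only for illustration. With `first=0` and `gain=1` a section reduces to the quotient (3) itself.

```python
import cmath
import math


def section_quotient(theta, first, stop, gain, r):
    """Closed-form sum of harmonics first..stop-1 at phase theta = omega * t.

    Harmonic n has amplitude gain * r**(n - first). With first=0 and gain=1
    this is the quotient (3) with N = stop.
    """
    z = r * cmath.exp(1j * theta)                 # shared phasor, scaled by r
    count = stop - first                          # harmonics in this section
    lead = gain * cmath.exp(1j * first * theta)   # first harmonic of the section
    if abs(1 - z) < 1e-12:                        # z -> 1: quotient tends to count
        return lead * count
    return lead * (1 - z ** count) / (1 - z)


def sectioned_blosc(theta, bounds, gains, slopes):
    """Sum of adjacent sections; bounds[k + 1] closes section k and opens k + 1."""
    return sum(
        section_quotient(theta, bounds[k], bounds[k + 1], gains[k], slopes[k])
        for k in range(len(bounds) - 1)
    )


def direct_sum(theta, bounds, gains, slopes):
    """Reference: the same spectrum built harmonic by harmonic."""
    total = 0j
    for k in range(len(bounds) - 1):
        for n in range(bounds[k], bounds[k + 1]):
            amp = gains[k] * slopes[k] ** (n - bounds[k])
            total += amp * cmath.exp(1j * n * theta)
    return total


# Illustrative settings (not from the paper): three sections over harmonics 1..23.
BOUNDS = [1, 6, 14, 24]
GAINS = [1.0, 0.25, 0.5]
SLOPES = [2 ** -0.5, 1.0, 2 ** 0.5]   # falling, flat and rising sections

# Real part: cosine-sum output; imaginary part: sine-sum output.
q = sectioned_blosc(math.pi / 3, BOUNDS, GAINS, SLOPES)
print(q.real, q.imag)
```

The cost of `sectioned_blosc` grows with the number of sections rather than with the number of harmonics, which is the practical appeal of keeping each section in closed form. `direct_sum` is included only as a check.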

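The r values quoted in the caption of Figure 2 can be tied to a slope in dB/harmonic with one line of arithmetic. The worked step below assumes only that harmonic n has amplitude $r^n$, as in (2); the symbol $L_n$ for the level of harmonic n is introduced here for illustration.

```latex
\begin{align*}
L_n &= 20\log_{10}\!\left(r^{n}\right) = n \cdot 20\log_{10} r \quad \text{dB},\\
L_{n+1} - L_n &= 20\log_{10} r \quad \text{dB/harmonic},\\
r = 2^{-1/2} &\;\Rightarrow\; 20\log_{10} 2^{-1/2} = -10\log_{10} 2 \approx -3.01\ \text{dB/harmonic},\\
r = 1 &\;\Rightarrow\; 0\ \text{dB/harmonic},\\
r = 2^{1/2} &\;\Rightarrow\; +10\log_{10} 2 \approx +3.01\ \text{dB/harmonic}.
\end{align*}
```

Because the level difference between neighbouring harmonics does not depend on n, the envelope is a straight line on a dB/harmonic scale, which is the smoothness property stated in the introduction.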
4.2 Modulation

The amplitude (the magnitude |q(t)|), the fundamental frequency ω, and the slope coefficient r can all be instantly modulated without violating the generalization. The spectral spreading can be predicted by extrapolating the modulation rules for a single frequency component. However, any instantaneous update of the bounding harmonic number (the N) will generally result in a glitch. At the crossing of the zero-phase instant, the rates of change in the phase curve and the log amplitude curve are minimal. At this point in the fundamental period all sine sums are crossing zero and all cosine sums reach their maximum. This is the normalization point where amplitude sums in the time and frequency domain equate. It is thus the best point to switch over N, and/or change the signal power number (discussed later). To dynamically change the “N”, the BLOsc must operate in a period-by-period mode. The zero-phase instant is also the least-penalty point to start or stop the oscillator in a one-shot period mode. Although the abrupt begin and end violate the infinite time principle the BLOsc is based on, that does not mean that the sound results will be less interesting.

4.3 Alternative envelope controls

In the GENDY model from Xenakis, waveform breakpoints are varied using a stochastic model. This idea can be ported in a direct way to a frequency domain sectioned BLOsc implementation, where the breakpoints vary on a dB/Hz scale, while updating is done in a period-by-period fashion. As a different overlap is allowed, the [...]

[Figure 4 plot: panels A to F.]

Figure 4. Raising power in the time domain. Signal q(t) (A) real (cosine-sum) only, for BLOsc (r=1 & N=6), $q(t)^2$ (C) and $q(t)^3$ (E). Corresponding Lin(A)/Lin(f) spectra (B, D, F) demonstrate the convolution effect in the frequency domain.

In the example in Figure 4, six harmonics of equal amplitude were deliberately chosen to draw a parallel to the uniform chance function that is seen for each digit of a six-sided dice. When more dice are thrown at the same time, each dice will be an independent, identically distributed (IID) process. As the chance functions combine as in a convolution procedure, the harmonic distributions seen for $q(t)^2$ and for $q(t)^3$ will match the probability functions that result when two, and when three dice are [...]

[Figure 5 plot: panels labelled 0 dB, +6 dB, -6 dB, +12 dB, -12 dB, +18 dB, -18 dB and +24 dB; sub-panels a to d, with c on a Lin(A) scale and d on a Log(A) scale.]

Figure 5. Center shift as function of r. In each panel: (a) initial harmonic balance, (b) signal $q(t)^8$, (c) resulting spectrum as Lin(A)/Lin(f) and (d) as Log(A)/Lin(f).

The resulting distribution will reshape from skewed-to-the-left, via normal, to skewed-to-the-right bound due to a gradual rebalancing of the prominence within the initial harmonic pair. Such a spectral progression will mimic a moving formant in a realistic way, however, with sudden stops at the region bounds.

[4] [...] Effects (DAFx09), Como, Italy, September 1–4, 2009.

[5] J. Pekonen, et al. "Variable fractional delay filters in bandlimited oscillator algorithms for music synthesis." Green Circuits and Systems (ICGCS), 2010 International Conference on. IEEE, 2010.

[6] S. Oishi, “Timbral Movements in [...] Composition”. Master thesis, Institute of Sonology, Royal Conservatoire, the Hague, 2015.

[7] P. F. Assmann and T. M. Nearey, “Perception of front vowels: The role of harmonics in the first formant region.” The Journal of the Acoustical Society of America 81.2 (1987): 520-534.

[8] L. C. W. Pols, H. R. C. Tromp, and R. Plomp. "Frequency analysis of Dutch vowels from 50 male speakers." The Journal of the Acoustical Society of America 53.4 (1973): 1093-1101.

[9] T. Gay, "Effect of speaking rate on diphthong formant movements." The Journal of the Acoustical Society of America 44.6 (1968): 1570-1573.

[10] D. J. Hermes. "Vowel-onset detection." The Journal [...]
of the Acoustical Society of America 87.2 (1990): breakpoints may also be randomly redistributing on a thrown together. Unfortunately, the above spreading 866-873. logarithmic frequency scale to better agree with a percep- mechanism will disassociate when fractional powers are 6. CONCLUSION [11] L. Pols, CW, LJ Th Van der Kamp, and R. Plomp. tual (critical band) organization. A log(A)/Log(f) setting used. This is a pity as it could have presented us a simple One chooses this extended BLOsc model for the strict "Perceptual and physical space of vowel sounds." compares to the domain where the MFCC’s are defined direct scheme to arrive at a linear spectrum slope control frequency partitioning that it holds, even on modulation. The Journal of the Acoustical Society of America from. Thus the sectioned BLOsc presents a simple in a log(A)/Log(f) scale setting. Complete harmonic sections, or formant like structures, 46.2B (1969): 458-467. scheme to re-synthesize this spectrum envelope abstrac- can be moved as one unit while preserving the same de- tion using an arbitrary fundamental frequency carrier. 5.2 Formant shaping [12] http://kc.koncon.nl/staff/pabon/SingingVoiceSynthe gree of spectrum contrast. It is thus possible to stay sharp sis/CutoffF3/CutoffFreqF3.htm, retrieved Feb 29, To exploit the above spreading mechanism at least two in frequency while following a sharp defined path in 2016. 5. CONVOLUTION AND CHANCE harmonics are needed. An initial setting with equal har- time. Time-invariance is a major constraint in filter de- monic amplitude compares to the 50% chance seen with a sign. To realize steep spectrum contrasts we generally [13] M. Schroeder, “Fractals, Chaos, Power Laws,” W.H 5.1 Raising power in the time domain coin flipping process. Following this analogy, we can need higher-order (inflexible) filter-structures with long Freeman and Company, New York, 1991. 
Fourier theory presents the principle that multiplication in predict that each power increment will successively add a impulse responses. For this reason, this additive scheme new harmonic to the series, where the numbers from Pas- can offer you “dynamic spectral consistencies” that are [14] O. Spaniol, “Computer Arthmetic, Logic and the frequency domain compares to a convolution process Design,” John Wiley & Sons, New York, 1981. in the time domain, but the opposite is also true. So, when cal’s triangle will predict the series. For not hard to realize with any subtractive (filtering) synthesis the analytical signal q(t) is multiplied with itself, produc- even that high (third) power the harmonic amplitude dis- scheme, and perhaps new and interesting sound results. [15] R. Plomp, Aspects of tone sensation: A ing the (still complex-valued) squared signal q(t)2, this tribution will already reasonably approximate a Gaussian psychophysical study. Academic Press, 1976. simple operation will correspond to an auto-convolution shape, as shown in Figure 4F. 7. REFERENCES (or cross-correlation, or filtering) in the frequency do- Note that the frequency information will still be band- [16] Unser, M. Splines; A Perfect Fit for Image and [1] J.A. Moorer, “The synthesis of complex audio main. The principle is demonstrated in Figure 4, where limited to a width set by the power-index for the signal in Signal Processing. IEEE Signal Processing spectra by means of discrete summation,” J. Audio the harmonic series 1..6, is convolved with itself by simp- the time domain. On the logarithmic dB-scale the approx- magazine, 16. (1999) 6. imated Gaussian shape reveals its minus-squared depend- Eng. Soc., 24, pp. 717–727, 1976. ly squaring q(t). This brings about a new harmonic series [17] M. Puckette, "Formant-based audio synthesis using ency as an inverted parabola (see Figure 5d). The thus with the information spread over double the frequency [2] V. Välimäki, and A. Huovilainen. 
"Oscillator and produced spectral prominence can be used to model a nonlinear distortion." Journal of the Audio width, but with still the original harmonic spacing. Rising filter algorithms for virtual analog synthesis." formant resonance in an additive synthesis scheme. It Engineering Society 43.1/2 (1995): 40-47. q(t) to the third power will spread the information up to Computer Music Journal 30.2 (2006): 19-31. harmonic 18. For signal q(t) the spectrum envelope is flat follows the thinking that is also seen with the VOSIM, [18] Tempelaars, Stan. "The VOSIM signal spectrum." (constant, zero order). With the squared signal q(t)2 the FOF and PAF models [17][18][19]. [3] V. Välimäki, J. Pekonen, and J. Nam. "Perceptually Journal of New Music Research (AKA Interface) 6.2 spectrum envelope goes linearly up/down (first order), A disadvantage of this time-domain power-controlled informed synthesis of bandlimited classical (1977): 81-96. and with a cubic signal q(t)3, the spectrum follows a par- formant-shaping approach is that the bandwidth of the waveforms using integrated polynomial abolic (second order) curvature. A comparable generic peak will increase with the fundamental frequency. It can interpolation." The Journal of the Acoustical Society [19] X. Rodet, Y. Potard, and J.B. Barriere, “The scheme that builds from a flat zero-order kernel, to higher thus be difficult to model real sharp peaks, as to give of America 131.1 (2012): 974-986. CHANT project: from the synthesis of the singing order shapes is seen with B-spline interpolation [16]. formants steep shoulders; high signal-powers are needed, voice to synthesis in general.” Computer Music which also widens the distribution. This problem can be [4] J. Nam, et al. "Alias-free virtual analog oscillators Journal 8/3, (1984) pp. 15-31. partly solved by varying the r-coefficient of the initial using a feedback delay loop." Proceedings of the pair (see Figure 5). 12th International Conference on
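As a numerical check on the quotient in (3), the following minimal Python sketch compares the direct additive sum from (2) with the closed form. The function names `blosc_sum` and `blosc_quotient` are hypothetical and chosen only for this illustration; the handling of the 0/0 case at r=1 and ωt=0 is an assumption of the sketch (it returns the limit value N), not something the paper prescribes.

```python
import cmath

def blosc_sum(r, omega_t, n_harm):
    """Direct additive sum from (2): sum over n = 0..N-1 of (r e^{i w t})^n."""
    z = r * cmath.exp(1j * omega_t)
    return sum(z ** n for n in range(n_harm))

def blosc_quotient(r, omega_t, n_harm):
    """Closed form from (3): (1 - r^N e^{iNwt}) / (1 - r e^{iwt})."""
    z = r * cmath.exp(1j * omega_t)
    if abs(1 - z) < 1e-12:
        # Assumption of this sketch: at r = 1 and zero phase the quotient
        # is 0/0, so return its limit, which equals N.
        return complex(n_harm)
    return (1 - z ** n_harm) / (1 - z)

if __name__ == "__main__":
    # Print the absolute difference between both forms for a few settings.
    for r in (0.8, 1.0, 1.1):
        for omega_t in (0.0, 0.3, 2.0):
            direct = blosc_sum(r, omega_t, 6)
            closed = blosc_quotient(r, omega_t, 6)
            print(r, omega_t, abs(direct - closed))
```

At the zero-phase instant (ωt = 0) the imaginary (sine-sum) part vanishes and the real (cosine-sum) part reaches its maximum, which for r=1 equals N. This agrees with the description of the zero-phase instant as the normalization point.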

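The dice and Pascal's-triangle analogies used for raising q(t) to an integer power can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the harmonic amplitudes are held in a plain list whose index counts from the lowest harmonic in the block, and the helper names `convolve`, `power_spectrum_coeffs` and `dice_sum_counts` are hypothetical. The dice counts are obtained by brute-force enumeration so that they form an independent check on the convolution result.

```python
from itertools import product
from math import comb

def convolve(a, b):
    """Linear convolution of two amplitude lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def power_spectrum_coeffs(amps, power):
    """Harmonic amplitudes of q(t)**power, given the amplitudes of q(t).

    Multiplying signals in the time domain convolves their harmonic
    amplitude lists, so the list is convolved with itself repeatedly.
    """
    out = [1]
    for _ in range(power):
        out = convolve(out, amps)
    return out

def dice_sum_counts(n_dice, sides=6):
    """Count, by enumeration, the ways to reach each total with n_dice dice.

    Faces are numbered from 0 so that index 0 is the lowest possible total.
    """
    counts = [0] * (n_dice * (sides - 1) + 1)
    for faces in product(range(sides), repeat=n_dice):
        counts[sum(faces)] += 1
    return counts

if __name__ == "__main__":
    flat = [1] * 6                          # six equal harmonics, r = 1, N = 6
    print(power_spectrum_coeffs(flat, 2))   # triangular (first order) envelope
    print(power_spectrum_coeffs(flat, 3))   # 16 values: harmonics 3..18
    print(power_spectrum_coeffs([1, 1], 8)) # row 8 of Pascal's triangle
    print(power_spectrum_coeffs([1, 2], 8)) # unequal pair: weight shifts upward
```

For the unequal pair, the k-th amplitude equals the binomial coefficient `comb(8, k)` multiplied by the k-th power of the amplitude ratio, which is how a change in the balance of the initial pair moves the center of the resulting distribution.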
7. REFERENCES

[1] J. A. Moorer, "The synthesis of complex audio spectra by means of discrete summation," J. Audio Eng. Soc., vol. 24, pp. 717–727, 1976.

[2] V. Välimäki and A. Huovilainen, "Oscillator and filter algorithms for virtual analog synthesis," Computer Music Journal, vol. 30, no. 2, pp. 19–31, 2006.

[3] V. Välimäki, J. Pekonen, and J. Nam, "Perceptually informed synthesis of bandlimited classical waveforms using integrated polynomial interpolation," The Journal of the Acoustical Society of America, vol. 131, no. 1, pp. 974–986, 2012.

[4] J. Nam, et al., "Alias-free virtual analog oscillators using a feedback delay loop," Proceedings of the 12th International Conference on Digital Audio Effects (DAFx09), Como, Italy, September 1–4, 2009.

[5] J. Pekonen, et al., "Variable fractional delay filters in bandlimited oscillator algorithms for music synthesis," Green Circuits and Systems (ICGCS), 2010 International Conference on, IEEE, 2010.

[6] S. Oishi, "Timbral Movements in Electronic Music Composition," Master thesis, Institute of Sonology, Royal Conservatoire, The Hague, 2015.

[7] P. F. Assmann and T. M. Nearey, "Perception of front vowels: The role of harmonics in the first formant region," The Journal of the Acoustical Society of America, vol. 81, no. 2, pp. 520–534, 1987.

[8] L. C. W. Pols, H. R. C. Tromp, and R. Plomp, "Frequency analysis of Dutch vowels from 50 male speakers," The Journal of the Acoustical Society of America, vol. 53, no. 4, pp. 1093–1101, 1973.

[9] T. Gay, "Effect of speaking rate on diphthong formant movements," The Journal of the Acoustical Society of America, vol. 44, no. 6, pp. 1570–1573, 1968.

[10] D. J. Hermes, "Vowel-onset detection," The Journal of the Acoustical Society of America, vol. 87, no. 2, pp. 866–873, 1990.

[11] L. C. W. Pols, L. J. Th. van der Kamp, and R. Plomp, "Perceptual and physical space of vowel sounds," The Journal of the Acoustical Society of America, vol. 46, no. 2B, pp. 458–467, 1969.

[12] http://kc.koncon.nl/staff/pabon/SingingVoiceSynthesis/CutoffF3/CutoffFreqF3.htm, retrieved Feb 29, 2016.

[13] M. Schroeder, "Fractals, Chaos, Power Laws," W. H. Freeman and Company, New York, 1991.

[14] O. Spaniol, "Computer Arithmetic, Logic and Design," John Wiley & Sons, New York, 1981.

[15] R. Plomp, "Aspects of tone sensation: A psychophysical study," Academic Press, 1976.

[16] M. Unser, "Splines: A Perfect Fit for Image and Signal Processing," IEEE Signal Processing Magazine, vol. 16, no. 6, 1999.

[17] M. Puckette, "Formant-based audio synthesis using nonlinear distortion," Journal of the Audio Engineering Society, vol. 43, no. 1/2, pp. 40–47, 1995.

[18] S. Tempelaars, "The VOSIM signal spectrum," Journal of New Music Research (AKA Interface), vol. 6, no. 2, pp. 81–96, 1977.

[19] X. Rodet, Y. Potard, and J. B. Barriere, "The CHANT project: from the synthesis of the singing voice to synthesis in general," Computer Music Journal, vol. 8, no. 3, pp. 15–31, 1984.
