Keiji Hirata* and Tatsuya Aoyagi† Computational *NTT Communication Science Laboratories 2–4, Seika-cho Hikaridai, Soraku-gun, Kyoto 619–0237 Japan Representation Based [email protected] † Tsuda College on the Generative Tsuda-machi 2–1-1, Kodaira-shi, Tokyo, 187–8577 Japan Theory of Tonal Music [email protected] and the Deductive Object-Oriented Database

Much conventional music software places very lit- Sleator (1999), and Lerdahl (2001). These works are tle importance on . These programs pioneering and significant to music theory and its often simply manipulate surface structures of mu- computer implementation. sic, not the underlying structures handled in theo- Based on the above considerations, we have retical approaches such as the Generative Theory developed three music applications (Hirata and of Tonal Music (GTTM) of Lerdahl and Jackendoff Aoyagi 1999, 2000; Hirata and Hiraga 2000) in (1983), the Implication-Realization Model (Nar- which we combined time-span analysis of GTTM mour 1990), and Preference Rule Systems (Temper- as the music theory with the deductive object- ley 2001). Because music theory can serve as a oriented database (DOOD; Yokota 1992; Kifer, Lau- basis for analyzing and understanding the vague sen, and Wu 1995) for knowledge representation. and subjective meanings of music, it should con- (DOOD was originally the name of an international tribute to the development of an intelligent music conference and a research field. Here, it only refers application. In particular, we think that music the- to the name of a knowledge representation ory should underlie a music knowledge representa- method.) We think that DOOD has a theoretical tion language to enhance its descriptive power. As foundation and is thus tractable. We assume that a result, users should be able to correctly predict GTTM’s time-span analysis yields a useful model the run-time behavior of a music application. for music knowledge representation, because Many issues must be considered in designing a GTTM is one of a few music theories that involves music knowledge representation language that em- the concept of reduction, which plays a fundamen- ploys music theory (Dannenberg 1993; Wiggins et tal role in our music representation method. It is al. 1993; Selfridge-Field 1997). Among them, the true that GTTM itself has been a controversial the- following point is crucial: software implementation ory of music, but this article is not about proving is not a goal of music theory. Because it has been the validity of GTTM (see, for example, Botelho described in neither a formal nor an algorithmic 1994; London 1994). Also, to write a working pro- way, the heuristics, such as the rules for a listener’s gram, more efforts for formalizing GTTM are subjectivity and unconscious preferences, are needed than Preference Rule Systems. needed to realize working music applications that The main contribution of our research is to pro- include key identification, melody segmentation, vide a formal framework of music representation melodic similarities, metrical analysis, harmonic and to achieve musical computation based on analysis, and voice separation. To our knowledge, music theory. Our framework will be helpful in the attempts at computer implementation of music designing a wide variety of musical tasks and algo- theory at large include Nord (1992), Temperley and rithms in a consistent way. However, this article primarily describes music representation, not appli- Computer Music Journal, 27:3, pp. 73–89, Fall 2003 cations of our music representation method. We ᭧ 2003 Massachusetts Institute of Technology. emphasize here that major work (i.e., a formal rep-

Hirata and Aoyagi 73

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 resentational theory of music) must be fully devel- Outline of GTTM oped before computers can automatically derive a time-span analysis from a musical score. GTTM comprises a grouping-structure analysis,a First, let us define homophony and as metrical-structure analysis,atime-span reduction, used in this article. The conventional definition of and a prolongational reduction. The grouping homophony is music in which there is a clear-cut structure analysis segments a homophony into distinction between melody and / nested groups of varying sizes. When we sing a long or in which all the parts move in the melody, we must find the proper points for drawing same rhythmic pattern (Sadie 1980). Polyphony a breath; these points correspond to the boundaries consists of music in more than one part, music in of groups. The metrical structure analysis identifies many parts, and the style in which all or several of the positions of strong and weak beats at the levels the musical parts move to some extent indepen- of a quarter note, half note, whole note, and so on. dently (Sadie 1980). In this article we introduce for- A conductor moves a baton to conduct the music mal definitions of these words. Homophony means played, and a listener keeps time by hand clapping a melody superimposed by subordinate melodies or foot tapping; both of these activities correspond and interpreted as a single melody temporally, and in some way to beats. polyphony refers to a hierarchical collection of in- Time-span reduction represents the intuitive idea dependent homophonies or that may that if we remove ornamental notes from a long be temporally overlapped. melody, we obtain a simple melody that sounds This article is organized as follows. First, we ex- similar. An entire piece of Western tonal music can plain GTTM and DOOD. Next, we present the eventually be reduced to an important note or a framework for representing polyphony based on tonic triad. The time-span reduction is performed GTTM and DOOD and describe the primitive based on the results of the grouping structure and operations for polyphony represented in our frame- metrical structure analyses in a bottom-up manner work. Then, we show some examples and discuss in the sense that parts come together to form a the advantages and disadvantages of our framework whole. together with related work. Finally, we conclude by Finally, prolongational reduction reflects the mu- mentioning future work. sical intuition related to the progression of a piece both at the harmonic and melodic levels. It is usu- ally recognized that parts of a piece or even an en- tire piece exhibit patterns of tension and release. Generative Theory of Tonal Music GTTM provides local structures to support the tension-relaxation pattern: repetition (prolongation) and a change of movement (a ‘‘turning point’’). The Lerdahl and Jackendoff (1983) proposed GTTM as a prolongational reduction is performed based on the theory for formalizing the intuition of listeners. result of the time-span reduction in a top-down GTTM provides concepts and procedures for ana- manner. That is, an important note or chord is first lyzing and interpreting Western tonal music writ- selected from an entire homophony in terms of the ten in common music notation (i.e., on a score). repetition or turning point, and the piece is then A score is just a surface structure of music, and split at that note or chord as a pivot. This step is GTTM derives from the surface structure the iterated recursively. score’s hidden hierarchical structures that corre- The rules of GTTM comprise well-formedness spond to a listener’s intuition represented as the rules (for constructing a tree structure based on underlying structures. The music treated by GTTM analyses) and preference rules (the knowledge for is limited to homophony. GTTM is modeled on the application of the well-formedness rules). The Noam Chomsky’s transformational generative results of time-span reduction and prolongational grammar, as it proposes a finite set of rules for gen- reduction are obtained in the form of a time-span erating many musical pieces. tree and a prolongational tree. The time-span and

74 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 1. Example of a homophonic passage and a time-span tree, from the first phrase of J. S. Bach’s ‘‘O Haupt,’’ after Lerdahl and Jackendoff (1983, p. 115).

score. In Figure 1, under the primary branch, note W occurs at the last eighth beat of the first mea- sure, and under the secondary branch, note V oc- curs at the upbeat of the third beat. Because V and W are adjacent to each other at Level d, they make a group. Here we say that V is a left-branching elab- oration of W, and this branching indicates that W is more stable than V. The head of this group is W. In this case, W is made the head of the group in the ordinary way, which is one of four ways to make a head in GTTM. (The others are fusion, transforma- tion and cadential retain.)

prolongational trees represent a underlying struc- ture of a piece in the GTTM sense. GTTM basi- Deductive Object-Oriented Database cally attempts to look for a unique underlying structure by applying the preference rules as much As described before, music can be vague and sub- as possible. However, a piece can be interpreted in jective on many levels. Here, vagueness means that various ways, and, indeed, GTTM occasionally de- the information required for resolving the ambigu- rives more than one time-span tree and prolonga- ity of the GTTM analyses is lacking or only par- tional tree. tially given. Hence, the knowledge representation method, especially for music, must be able to treat the situations in which some attributes are lacking Time-Span Reduction or only the type of a value, not a value itself, is given or declared. We formalize time-span reduction in the following Next, subjectivity determines which underlying manner. A time-span tree is built from the bottom structure derived from the analyses should be up. First, notes or chords that are adjacent to each adopted. The underlying structures reflect the other are made into a group. Among the notes or user’s preferences, and some of them may be con- chords contained in the group, the most stable note tradictory. We think that the case-based reasoning or chord in the time span covering the duration of technique (Kolodner 1993) can properly treat such the group (called a ‘‘head’’ in GTTM) represents the subjectivity. Because cases can serve as both data group. The degree of stability in terms of the time- and rules, we can achieve high readability and effi- span reduction is determined by the results of the cient inference if we employ a scheme that can rep- grouping-structure and metrical-structure analyses. resent data and rules in the same form. A typical In applying time-span reduction, less important method amenable to this purpose is first-order notes or chords are eliminated earlier. Next, neigh- logic. boring heads similarly make a group. This process The Deductive Object-Oriented Database continues until a whole piece makes a group. (DOOD) is an extension of first-order logic in that Figure 1 depicts a sample homophonic passage it can represent the lacking of attributes and type and its time-span tree, the first phrase of Bach’s declaration (Yokota 1992) and has almost the same chorale ‘‘O Haupt’’ in Figure 5.8 (p. 115) of Lerdahl functions as the feature structure from the and Jackendoff (1983). knowledge-description point of view (Carpenter A time-span tree is a binary tree, and in this arti- 1992). Plaza (1995) claimed that the feature struc- cle we refer to an important branch at a node as ture is suitable for describing cases. A prominent primary, depicted as a stem, and the other as sec- feature of DOOD is a deductive rule for DOOD ondary, depicted as a lateral line. At the terminal terms to formally define any relation, which brings level of a time-span tree, notes or chords occur on a about descriptive efficiency, accuracy, and expand-

Hirata and Aoyagi 75

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 ability. Incidentally, DOOD is similar to XML C), and note(pitch : C, octave : 4), where note is (www.w3c.org/XML) and RDF (www.w3c.org/RDF) the concept to represent a single abstract note, and owing to its object-attribute structure (Castan, it is extensionally equivalent to the set of all notes. Good, and Roland 1999). The note(pitch : C) is the note only with the infor- mation of pitch class, and it is more instantiated Object Term than note. When we say ‘‘a C-major chord consists of notes the C, E, and G,’’ note C is represented as The DOOD framework is motivated by the insight note(pitch : C). The note(pitch : C, octave : 4) is that real-world entities can be represented as the more instantiated than note(pitch : C) and repre- combination of a basic (atomic) object and a set of sents the C4 key on the piano keyboard, for exam- attributes. Hereafter, we identify an object term ple. Due to the deductive rule of the subsumption with an object itself. We write an object term as relation, note(pitch : C, octave : 4) ʕ note(pitch : o(...,l:v,. . .), where o is an atomic symbol for a C) and note(pitch : C) ʕ note both hold. Because basic object, l:vis an attribute, l is an attribute’s transitivity also holds with respect to the subsump- name (label), and v is an attribute’s value. We in- tion relation, we can also write note(pitch : C, oc- troduce for convenience another notation: for an tave : 4) ʕ note. object term p(...,l:v,...), then p.l is equal to the Next, we define the subsumption relation be- attribute value v. tween the sets of objects. Assume Oi is the set of ס ס objects oij (i 1, 2 and j 1. . .Ni). We can think Subsumption Relation of various definitions for the subsumption relation

between O1 and O2. Typical ones in the deductive The most fundamental relation among real-world rule form are objects is the ‘‘is_a’’ relation, and it is modeled as ʕ ←∀ ʦ ∃ ʦ ʕ the subsumption relation defined by the deductive O1 H O2 s O1 t O2 s t ʕ ←∀ ʦ ∃ ʦ ʕ rule in the DOOD framework. That is, the sub- O1 S O2 s O2 t O1 t s sumption relation (written as ʕ) represents the ʕ The relation H represents the so-called Hoare or- relation ‘‘a more informative object ʕ a less infor- ʕ der of sets, and S the so-called Smyth order. For mative object.’’ In other words, it represents ‘‘an example, assuming b ʕ b and d ʕ d , we then ʕ 1 2 1 2 instantiated object an abstract object’’ or ‘‘a spe- have {b ,d } ʕ {a,b ,c,d } and {a,b ,c,d } ʕ {b ,d }. ʕ 1 1 H 2 2 1 1 S 2 2 cific object a general object.’’ If there is a disjunctive relation between the ele- The subsumption relation is defined in the form ments of a set, the Hoare order is appropriate for of the deductive rule as follows. For two objects comparing the sets, and if the conjunctive relation, p(. . ., l :v . . .), the ס p(. . ., l : v . . .) and o ס o 1 m m 2 n n the Smyth order is appropriate. subsumption relation between o1 and o2 is defined as ʕ $ ס ∃ ∀← ʕ o1 o2 ln lm (ln lm o1.lm o2.ln) This definition of the subsumption relation means Representation of Polyphony

that if for all attributes of o2 the values of attrib-

utes of o1 are more instantiated, then o1 is consid- When we design a music knowledge representation

ered more instantiated than o2 (or o2 is more method using DOOD, a fundamental question is

abstract than o1). For example, an object that has raised: To what concept in music theory should the fewer attributes is considered more abstract. Be- subsumption relation of DOOD correspond? In tween two objects that have distinct basic object other words, what does the ‘‘is_a’’ relation mean in terms, the subsumption relation does not hold. By music? We assert that the counterpart of the sub- this definition, the subsumption relation can be sumption relation is the time-span reduction in proven to be transitive. GTTM, because the subsumption relation in gen- We show a musical example of the subsumption eral represents the instantiation-abstraction rela- relation. Consider three objects, note, note(pitch : tion, and the time-span reduction associates an

76 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 instantiated melody with an abstract one. We be- framework deals with alignment as simply as lieve that several examples shown later support possible, we consider the proximate homophonic this assertion appropriately, and hence this corre- passages (in fact, their heads) to be just adjacent spondence is the most natural. branches of a time-span tree. In response, we al- That GTTM treats only homophony presents a ways place the ordering among these homophonic major limitation, as most Western tonal music is passages in terms of stability. Figure 2 outlines the polyphonic. Therefore, we now extend the time- time-span reduction of polyphony in our frame- span reduction of GTTM to treat polyphony. work. According to the definition of polyphony, the polyphonic passage in Figure 2 is considered to be a Time-Span Reduction of Polyphony collection of three homophonies temporally over- According to the definition of polyphony, we as- lapped; each homophonic passage is enclosed in a box. Take the left box in Figure 2 that represents a sume that the time-span reduction of polyphony   homophony G4–A4–B 4–E 5. By time-span reducing proceeds as follows. A polyphonic passage is first  decomposed to components until they become the homophonic passage to a single note, an E 5is homophonic (independent streams). The conven- obtained as the head. Similarly, the head of the tional time-span reduction is then applied to each center passage is C2, and that of the right-most is homophonic stream, and the results are finally F2. Then, we determine the ordering among these combined into a single overall grouping structure heads based on stability, which yields the time- in some way. Lerdahl (2001) proposed a method for span tree one-level higher on the lower side of Fig- the combination in which proximate homophonic ure 2. We will discuss how to set up a head of a passages are aligned in phase with a metrical grid. time span later in this section. Within the aligned homophony, the conventional time-span reduction is applied. We call the method Representation of the Time-Span Tree homophonization by alignment. The scope of GTTM is limited to homophony First, we consider two simple melodies, both of owing to the grouping well-formedness rules and which consist of C4 and E4 (see Figures 3a and 3b). the strong reduction hypothesis (SRH). The SRH To represent a node of the time-span tree, we in- claims that a piece is heard in a strict structural troduce a ts object, which has attributes head, pri- manner and that a less structurally important note mary, and secondary. The ts object that has only is heard in association with a surrounding note that the head attribute represents a terminal node of a is more structurally important. Further, every note time-span tree. It stands for a special group consist- or group always belongs to a group one level ing of a single note. The following object K␣ repre- higher, and, at the same level, the time-spans of sents the time-span tree in Figure 3a: any group do not temporally overlap each other. ס To treat polyphony, this latter rule should be pre- K␣ ts(head : C4, cluded; i.e., time-spans of groups are allowed to primary : ts(head : C4), overlap. In light of SRH, the homophonization by secondary : ts(head : E4))

alignment transforms a polyphonic passage without Then, we make a time-span tree from K␣ by de- changing its musical meaning and yields the situa- leting the less-important branch E4; the new time- tion where the latter rule holds. span tree obtained is designated by the following

However, Lerdahl’s method raises a problem object Kb. from the knowledge-representation point of view. (ts(head : C4 ס K That is, we must introduce a new data structure for b ʕ representing alignment amenable to the DOOD Here, K␣ Kb holds. framework. This new data structure may bring However, the object representing the time-span about an additional complexity. To ensure our tree in Figure 3b is equal to that for Figure 3a (i.e.,

Hirata and Aoyagi 77

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 2. Polyphonic composed into homo- Figure 3. Simple time-span grouping. The first two phonic segments, and trees that consist of the bars of ‘‘Autumn Leaves’’ analysis results are then same pitches. performed by a pian- combined into a single ist. The polyphony is de- grouping structure.

(a) (b)

distinct durations for each note, we may naturally perceive them as similar to each other, because we understand homophony at an abstract level of pitch sequence. Condition 3 is motivated by our intui- tion of the abstraction of the ordering. Abstracting the ordering means that the ordering ‘‘note N is preceded by note M’’ becomes the loose ordering ‘‘note N is preceded by or occurs at the same time as note M.’’ Let us consider an example where we make the K␣), because this representation method cannot de- abstraction of a broken chord (e.g., Alberti bass) and scribe the temporal relation between the primary then obtain the chord consisting of the notes of the and secondary attributes. Thus, we must introduce same pitches. The first half of this abstraction an attribute representing the temporal relation. makes the abstraction of the time difference be- tween notes of a broken chord, supported by Con- dition 2. The second half reduces the ordering to a single chord, which is supported by Condition 3. Conditions Required for Temporal Relation

To represent the temporal relation for the onset time of a note contained in a polyphonic passage, Representation of Temporal Structure we impose three conditions. First, the onset time of a relevant note should be given by the difference To satisfy these three conditions, we represent the between the note itself and the more-important onset time of a note by using four attributes: a pre- nearest note on a time-span tree (Condition 1). Sec- ceding note, a succeeding note, an adjacent stable ond, abstracting the difference between onset times note, and a time difference. (We assume that notes should provide the ordering information between surround a relevant note.) Then, one of the tempo- them, not absolute timings (Condition 2). Finally, rally preceding notes is considered the preceding abstracting the ordering information should provide note p, and, similarly, one of the temporally suc- the reduced ordering information (Condition 3). ceeding notes the succeeding note r. Then, the on- Condition 1 is implicitly prescribed in the defini- set time of a relevant note is represented by the tion of the time-span reduction. Condition 2 is mo- constraint that it occurs somewhere after p and be- tivated by our intuition in terms of homophony. fore r, which guarantees the ordering information. For example, assuming that there are two homo- Furthermore, we impose the constraint that either phonic passages of the same pitch sequence with p or r is the head of a group to which the relevant

78 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 4. (a) Temporal structure. (b) Reduced structure.

(a) (b)

note anchors. The stable-note attribute indicates correspond to a preceding note, a succeeding note, whether p or r is the head. The time difference is a stable note, and the time difference between the the interval between the relevant note and the sta- relevant note and the stable note, respectively. ble one. If the time difference is given, then the ab- Here we can think of a couple of methods for de- solute time of the relevant note is obtained, and if signing an object of the temporal structure. In one, not, then only its ordering information is. to interpret a temporal structure, the structure is Here we assume that if a note or a chord of a traversed from the origin point of an entire piece group becomes a head by time-span reduction, the (note p in Figures 4a and 4b) in a top-down manner. onset time of the head is equal to that of the note In the other, it is traversed from every note con- or the chord. We make this assumption because we tained in a piece in a bottom-up manner. Our want to preserve the amount of information as framework employs the latter, because it is desir- much as possible in the time-span reduction, al- able that a single object be able to represent a poly- though we just deviate from GTTM (discussed phonic passage, and thus the temporal structure for again later). the onset time of every note must be embedded Figure 4 shows an example of a temporal struc- into the structure of a time-span tree as shown in ture based on our framework. Figures 3a and 3b. In Figures 4a and 4b, p, q, and r are quarter notes. For example, the onset time of note q in Figure Assume that the onset time of note p is the origin 4a, T , is written as point of a whole polyphonic passage represented by q , the symbol in the figures. Here, for the purpose temp(pred : Tp, succ : Tr, stable : Tp, difference : 1/4) of uniform description, we introduce positive infi- .ϱ) and negative infinite Note that this representation satisfies Condition 1ם nite time (denoted as T T ϱ). Then, the note preceding r is p and r are objects for representing the onsetמ time (denoted as ϱ, be- times of the preceding note p and the succeedingם p, and the time of its succeeding note is cause no note succeeds it. Note r occurs on the sec- note r, respectively. Thus, we know that note q oc- ond beat from p. Similarly, the note preceding q is curs after p and before r. The value of the difference p, and its succeeding note is r. Note q occurs on attribute 1/4 is the duration of a quarter note. Be- ס the first beat from p. The arrow (R) in the figure cause Tq.stable Tp, note q occurs one beat after p. represents the anchoring relation of a relevant note Abstracting Tq with respect to the difference at- to a head; the source of the arrow corresponds to tribute, we get Ts, representing only the ordering the relevant note, and the target its superordinate information: note on a time-span tree. temp(pred : T , succ : T , stable : T ) To represent the onset time of a note as a DOOD p r p ʕ term, we introduce a new object temp that has the Then, we obtain Tq Ts, which satisfies Condi- attributes pred, succ, stable, and difference, which tion 2.

Hirata and Aoyagi 79

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Lastly, consider notes q and qЈ in Figures 4a and so that the value of the at attribute is a temp ob- 4b as an example for examining Condition 3. Com- ject. The head attribute keeps the set of note ob- paring the temporal structure containing q and one jects that make a chord. The note object stands for containing qЈ, we can consider the latter more ab- the pitch class, octave, and duration of an individ- stract, because qЈ displaces towards p and is even- ual note of a chord. If the value of the head attri- tually reduced to p. This reduction occurs because bute is a singleton set (i.e., the number of notes is for note q, p is more superordinate than r on the one), it is equivalent to a single note. Since the re- time-span tree. Hence, we need a new deductive lation between chord notes is conjunctive, the rule for the subsumption relation for this reduction Smyth order is appropriate for the ordering of the of the temporal structure, which is head attribute values. A duration attribute is the 1/N ʕ ← ʕ duration of a chord, and is the duration of one T T.stable T temp Nth of a measure. For the attributes of a temp ob- In the example shown in Figure 4b, because ject, refer to the previous section. ס Tq.stable Tp, from the deductive rule we obtain ʕ Ј Ј Tq Tp. When q is reduced to p, and q is equiva- ʕ lent to p, it follows that Tq TqЈ. That is, the Setting a Head temporal structure in which q occurs is more in- stantiated than one in which qЈ occurs (at the posi- As mentioned before, to handle polyphony, a head tion of q), which satisfies Condition 3. can be placed anywhere within a time span in our framework. There are two ways to set the head at- tributes (position and duration). One is to place the head at the beginning of a time span and set its du- The Syntax of Object Terms ration to the width of the time span, according to the time-span reduction prescribed in GTTM (ex- The syntax of objects for representing a time-span cept for the cases such as auftakt and upbeat). We tree and a temporal structure is call this a ‘‘GTTM head.’’ In contrast, we can set a ts(head : {note}, stable note or chord itself to a head without chang- at : temp| 0, ing its position and duration. We call this a ‘‘con- primary : ts, servative head,’’ which has a practical advantage in secondary : ts) that less information is lost in the time-span reduc- tion. That is, when a stable note or chord itself be- note(pitch-class : PC, octave : Integer, duration : comes the head of a time span, the head preserves part of the information on the actual shape of the (םRational polyphony before the time-span reduction. The ,ϱם | ϱ, succ : temp | 0מ | temp(pred : temp | 0 conservative head plays an essential role when cre- (ϱ, difference : Rationalע | stable : temp | 0 ating complicated polyphony by instantiation (the

Here, objclass(name1 : objclass1, . . .) stands for inverse of reduction): the information of a stable

the name of an object class, name1 represents the note or the chord to be placed within its time span

name of an attribute, and objclass1 the name or the is retained. Our framework adopts the conservative data type of an object class at the name attribute. head rather than the GTTM head.  1 PC means a pitch class, such as C, C , etc. Integer means an integer number, Rational a rational num- a positive rational number, and 0 Example of Simple Polyphony םber, Rational an origin time. The notation ‘‘x | y’’ means ‘‘x or y’’, and {x} represents a set of x. Figure 5 shows a time-span tree and a temporal To hold the temporal structure of an individual structure represented in our framework with a con- note, we introduce an at attribute into a ts object servative head.

80 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 5. A time-span tree Figure 6. A subsumption and temporal structure. relation.

tive operators for polyphony: subsumption relation (ʕ), least upper bound (lub), and greatest lower bound (glb). These primitive operators correspond to calculation of the abstraction-instantiation rela- tion, the largest common part, and the superimpo- sition of two polyphonic passages, respectively The ts object for the time-span tree and the tem- (Hirata and Matsuda 2002). poral structure in Figure 5 is described in the fol- First, we briefly discuss the mathematical back- lowing DOOD terms: ground for our framework. The subsumption rela- tion is a partial order. A partial order is a relation R ts(head : {N , N }, 3 4 such that it is reflexive (i.e., xRx), transitive (i.e., at : 0, xRy$ yRzr xRz), and antisymmetric (i.e., xR primary : ts(head : {N , N }, y). The ordering is partial, rather ס y $ yRxr x 4 3 at : 0), than total, when there may exist elements x and y secondary : ts(head : {N }, 1 for which neither xRynor yRx). Then, the do- at : T , C4 main of ts objects makes a complete lattice with primary : ts(head : {N }, 1 respect to the subsumption relation. A lattice is a at : T ) C4 set in which a partial ordering is defined and all fi- secondary : ts(head : {N }, 2 nite subsets have an intersection (lub) and a union at : T ))) D4 (glb). A complete lattice is a lattice whose every in- ס N1 note(pitch-class : C, octave : 4, duration : 1/8) finite subset has an intersection and a union. ס N2 note(pitch-class : D, octave : 4, duration : 1/8) ס N3 note(pitch-class : E, octave : 4, duration : 1/4) ס N4 note(pitch-class : G, octave : 4, duration : 1/4) Subsumption Relation ,ϱ,succ:0מ : temp(pred ס T C4 Given a polyphonic passage, the subsumption rela- stable : 0, difference : 1/4) tion associates it with a reduced or elaborated one ,temp(pred : T , succ : 0 ס T D4 C4 with respect to the time-span reduction. Figure 6 stable : T , difference : 1/8) C4 shows that, by eliminating the less important note Here, the ts object is shown with intermediate vari- D4, a more abstract polyphony is obtained.

ables N1,...,TC4, . . ., for readability, but the actual Because we adopt the conservative head, the ts object on a computer may be constructed with- eliminated D4 is replaced with an eighth rest, and out these intermediate variables. the other notes are unchanged. Hereafter, we some- times omit a time-span tree and temporal struc- tures for space efficiency. Primitives for Polyphony Figure 7 shows an example of Alberti bass. As

the time-span reduction is carried out from P1 to P2

So far, we have given a formal representation of po- to P3, less important notes are eliminated, and lyphony. In this section, we introduce three primi- more abstract melodies are obtained.

Hirata and Aoyagi 81

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 7. A subsumption relation of Alberti bass (Mozart, Sonata K. 332, I, m. 1).

ʕ ʕ ʕ ʕ We then have P1 P2, P2 P3, and P1 P3, and such that p x, and given polyphonic passages p these relations agree with musical intuition. and q, we can check if p ʕ q holds. Figure 8 shows an example of a melody (Varia- tion III of Mozart’s Variations on ‘‘Ah! vous dirais- je, maman’’). Least Upper Bound The subsumption relation holds between Varia- tion III and its reduced melody. We performed We usually define a primitive lub(x, y)asmin({z|x time-span analysis of this melody using our ʕ z $ y ʕ z}), procedurally defined as method, although the time-span trees and the tem- T if a ϶ b ס (lub(a,b poral structures are not shown here. We used vari- b ס a if a ס (lub(a,b ous ways of making heads; chord C4–E4–G4 at the o ס ((lub(o, o(l:v first beat of the first bar is made in the ‘‘fusion’’ (( o(l : lub(v , v ס ((lub(o(l:v), o(l:v way, for example, as is the chord at the second beat 1 2 1 2 of the first bar. On the other hand, chord G5–C6 at Here, a and b are basic (atomic) objects, T repre- the first beat of the second bar is made in the sents the top element in the complete lattice con- .o ס (transformational’’ way. structed from objects, and we assume o(l:T‘‘ Note that the subsumption relation can be used The lub operator extracts the largest common part both actively (generatively) and passively (con- or the most common information of the time-span sumptively). For example, given a polyphonic pas- trees of two polyphonic passages in a top-down sage p, we can obtain a new polyphonic passage x manner. If all the heads are made in the ‘‘ordinary’’ way, calculating lub involves deleting less stable notes from the two passages until their ‘‘pruned’’ versions are identical. Figure 8. Reduced melody Figure 9 shows the lub calculation of polyphonic from Variation III of Mo- zart’s K. 265, Variations passages P1 and P3. on ‘‘Ah! vous dirais-je, In this lub calculation, we do not show incom- maman’’ (equivalent to plete objects that do not hold all proper attributes.

‘‘Twinkle, Twinkle Little Note that the temp objects of TC4 in P1 and TC4 in

Star’’). P3 are identical.

82 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 9. Calculating lub.

The lub of these melodies yields their most com- mon information. From the subsumption in Figure 8, let us consider the lub calculation of the reduced Variation III and Variation VI, where we assume that these have the same time-span trees and tem- poral structures. For instance, we compare the first chords of these melodies, C4–E4–G4 of duration equal to one-third of a quarter note (1/12), and E4– G4–C5 of eighth-note duration (1/8). Notes E4 and G4 are common. C4 and C5 are distinct, but pitch In particular, for the duration attribute of chord class C is common. Therefore, the result of lub of the first chords is the chord E4–G4–C with a dura- objects d1 and d2, we introduce the definition .Ͻ tion equal to one-third of a quarter note ס lub(d1, d2) d1 if d1 d2. The rationale for this is to maintain consistency with the conservative Note that ‘‘C’’ represents an abstract note whose head, in which the onset and duration of a note to pitch class is specified, but not its octave. We say be a head are unchanged after the time-span reduc- the note is ‘‘incomplete.’’ The remaining chords are tion. Here, we can think of another simpler defini- similarly calculated. According to the above lub def- (inition regarding duration, we obtain lub(1/12, 1/8 ס ס tion: lub(d1, d2) d1 if d1 d2,orT otherwise. .1/12 ס -Although this definition can also maintain consis tency, we do not adopt it, because we think that it loses too much information in the case of T. For ex-

ample, the duration of note C4 in lub(P1, P3)is1/8, ס because lub(1/4, 1/8) 1/8. Figure 10. Calculating lub Figure 10 shows the lub calculation of Variation and the abstract melody III and Variation VI from our previous Mozart obtained (Mozart, Varia- example. tions K. 265).

Hirata and Aoyagi 83

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 11. Calculating glb.

Furthermore, the onset-timings of the third mel- ody in Figure 10 are abstracted to the ordering in- formation of the last melody. At that time, the octave information for each note is abstracted, and it is all eliminated. Note that the subsumption re- lation can associate not only a real melody with another real one, but also an abstract melody of in- complete notes with an abstract melody or a real melody.

Greatest Lower Bound formance and composition. That is, music theory ,(max({z|zʕ x $ z has provided procedures for analysis (reduction ס (Similarly, we define glb(x, y -Ќ. Primitive glb joins two po- not synthesis (instantiation). We think the glb op ס (ʕ y}), and o(l:Ќ lyphonies in the top-down manner as long as the erator proposed realizes instantiation appropriately. structures of two polyphonies are consistent. If we arrive at a leaf node of either of the two polypho- nies, the remaining structure of the other polyph- Some Further Examples ony is expanded at the leaf node. If the value of glb of temp objects is, for example, Ќ, then the entire Here we present two examples to which our frame- object is also Ќ owing to our definition. work has been applied. The first is an arrangement Figure 11 shows the sample calculation of glb of program. Figure 12 shows the input and output pol-

polyphonic passages P1 and P4. yphonic passages of our arrangement program that

Looking at the time-span trees of P1 and P4,we employs case-based reasoning. know that the most important events are the last The upper half of Figure 12 shows the four-bar

chord in P1 (E4–G4) and the last note in P4 (A4). input. The input is first divided into four one-bar

They are joined to E4–G4–A4 in glb(P1,P4), and the fragments. For each input fragment, the arrange- chord is placed on the origin time (,). The first ment program selects the most similar case from a

note in P1 (C4) and the first note in P4 (F4) are left- case library, carries out case-based reasoning, and branching elaborations of their respectively most concatenates each output. That is, arrangement important events, and they occur one beat before proceeds bar by bar. The input consists of a mono- the origin time. Because the positions of these phonic theme for a right hand and a simple chord notes are consistent in terms of time-span structure progression for a left hand. The output is a solo and temporal structure, they can be joined to a piano performance for both hands. The sample ar-

chord, C4–F4, in glb(P1,P4). rangement in the figure uses as cases the real 32- In this glb calculation, for the duration attribute, bar performance of ‘‘Autumn Leaves’’ and a 16-bar ʜ ס glb(n1, n2) n1 n2, because we employ the performance of ‘‘Nefertiti’’ played by a jazz pianist. Smyth order between the values of the duration at- The jazz pianist looked at a simple score of ‘‘Au- tribute. In particular, for the duration attribute of tumn Leaves’’ and ‘‘Nefertiti’’ like the upper half of Ͻ ס the chord object, d1 and d2, glb(d1, d2) d2 if d1 Figure 12 and created improvisations like the lower

d2, for the same reason as with the lub calculation. half, although the entire scores and improvisations Ќ ס Note that glb(P1, P3) . For the first notes of P1 are not shown to save space. (The left-hand side of

and P4 in Figure 11 (C4 eighth note and F4 quarter Figure 12 is part of an improvisation.) Thus, the .case library here consists of 48 total one-bar cases .1/4 ס (note), we have glb(1/4, 1/8 Music theory in general is a framework for un- In the lower half of Figure 12, we see a couple of  derstanding music on a score, not for analyzing per- auftakt (upbeat) notes consisting of B 4–C5 (D4);

84 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 12. (a) Input poly- phonic passage (first four bars of ‘‘Over the Rain- bow’’). (b) Sample output of arrangement applica- tion.

(a)

(b)

one lies at the very beginning of the score, and the compute f(X) with respect to the function f satisfy- -A. (Here we use a single case.) The ar ס (other begins at the third beat of the first bar. These ing f(E auftakts are originally contained in the case perfor- rangement f(X) is also a one-bar polyphonic mances used (Hirata and Aoyagi 2001). passage. Because not enough information is given We will now briefly discuss the arrangement al- to infer the function f, many such functions can be gorithm. Figure 13 illustrates the arrangement task derived. Therefore, we impose some musical con- as an imitation of a function (Hirata and Aoyagi straints onto f and, as a result, we obtain 2000). glb(lub(A,X),E), approximating f(X) to some extent. In our formalization, the input data comprises an The sample arrangement is calculated by unknown set piece X, a sample arrangement con- glb(lub(A,X),E) for each bar. (The standard MIDI sisting of a set piece E, and the resulting piece of its file outputs of the arrangement program are avail- arrangement A. Here, X, E, and A are all one-bar able online at www.brl.ntt.co.jp/people/hirata/ polyphonic passages. Then, to arrange X, we simply nttmusicmachines.html#babibun.) The second application example is a melodic similarity calculation for polyphony. Melodic simi- larity calculations play a central role in discovering Figure 13. Formalization of a piece’s structure with a music summarizer (Hir- the arrangement task as ata and Matsuda 2002). To measure similarity be- case-based reasoning. tween two polyphonic passages, we measure how much information is lacking from the two poly- phonic passages by the lub calculation. As de- scribed before, lub calculates the largest common part or the most common information of two input passages. For instance, if the calculation result is equal to its input polyphonic passages, the lub cal- culation does not lose any information, and the in- put passages are considered identical (i.e., most similar). If the calculation result is, in contrast, T, there is no common part, and they are considered unrelated (i.e., least similar). We think the lacking

Hirata and Aoyagi 85

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 information should be concerned with the time- one of a few fundamental attempts toward a formal span tree and the temporal structure, which corre- treatment of music. We think that Nyquist has the spond to the two aspects of music formalized in optimal (i.e., necessary and sufficient) specifica- our framework. tions as a music description language. Cope’s sys- To make the lacking information quantitative, tem is well known for its artistic compositions in we introduce three measures, RN, RA, and RT, de- various genres. fined as follows. The models provided by Music Ontology and -lub(P,Q)| Nyquist use only the superficial relations of rele| ס R$(P,Q) max(|P|,|$$Q|) vant notes on a score with their surrounding notes, which are essentially superimposition and juxtapo- where $ represents N, A, and T. Here, let P and Q sition. The operations on these superficial relations be polyphonic passages. |P|N represents the number of note objects in P,|P| is the total number of at- include get_next_note, interval_to_previous_note, A and is_simultaneous, which work as the applica- tributes of all note objects in P, and |P|T is the total number of attributes of all temp objects in P. tion program interface (API). The feature space method basically maps a given melody to a feature In some sense, |P|N and |P|A quantify the informa- space, the dimensions of which typically include tion conveyed by the time-span tree, and |P|T quan-

tifies the temporal structure. Therefore, RN the number of notes occurring in a relevant mea- expresses the similarity of time-span trees at the sure, the average pitch of all notes, and the up/ note level, where an incomplete note is also re- down vector of pitch intervals of a melody. These

garded as a note. RA expresses the similarity of features are also regarded as superficial relations time-span trees at the attribute level and indicates and selected in a more or less ad hoc way. These to what extent the pitch class, octave, duration, three approaches require more or fewer heuristics and onset-time attributes of all notes in the result peculiar to each application, which imposes a dif- of lub are instantiated. Likewise, RT expresses the ferent system model on the user, while they have similarity of temporal structures at the attribute an advantage in that their models are small and level, and it indicates to what extent the four at- comprehensive. tributes of temp objects for all notes in the result of Other attempts at employing GTTM include lub are instantiated. Widmer’s performance-rendering system (Widmer Let us examine the real values of these similarity 1995, 1996); Arcos’s and de Ma´ntaras’s saxophone and ,1.0 ס R ס R ס Q, then R ס measures. If P N A T sound synthesizer (2000); and a learning system de- ס R ס R ס T in contrast, then R ס (if lub(P,Q N A T scribed by Uwabu, Katayose, and Inokuchi based 0.0. For an interesting case, consider the similarity between P and Q such that P ʕ Q. Then, because on prolongational reduction from a corpus (1997). Unfortunately, Widmer, Arcos, and de Ma´ntaras ס Ն this leads to |P|$ |Q|$ for $ N, A, or T, we have -give just the schematic pictures related to knowl ס R$ |Q|$/|P|$. For more details, please refer to Hirata and Matsuda (2002). edge representation. It does not seem that these data structures are suitably designed for the argu- ments of some mathematical operators. Uwabu and Discussion colleagues proposed an actual procedure for prolon- gational reduction by using the co-occurrence rela- Related Work tions extracted from corpora, where a Music representation methods and music descrip- prolongational tree is described in the ML lan- tion languages that do not employ GTTM include guage, not a time-span tree. Their data structure is Balaban’s Music Ontology (Balaban 1996), Dannen- dedicated to discovering the co-occurrence relation, berg’s Nyquist (Dannenberg 1997), and feature and the temporal information of a piece is not in- space (Rowe 1993; Cope 1996). Music Ontology is cluded.

86 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Figure 14. Screen shot of the TS-editor currently un- der development, which shows the first seven bars of Mozart’ ‘‘Rondo alla Turca’’ from Mozart’s Sonata in A, K. 331.

Problems conservative head and the GTTM head from the listener’s recognition point of view. We think there are four problems to be solved. Second, our framework provides three primitive First, GTTM claims that a head dominates the operators: the subsumption relation, least upper time span including it, and a head is hence placed bound, and greatest lower bound. However, the de- at the beginning of a time span on a score (the scriptive power of these primitives may not be ‘‘GTTM head’’). It turns out that the notes during enough to express the functions for complicated the time-span reduction do not represent real notes musical tasks. Hence, we should invent or discover but rather the notes in the listener’s recognition. new primitive operators to gain more descriptive Thus, Lerdahl and Jackendoff might not think that power. it was meaningful to examine the similarity be- Third, because music itself is comprised of many tween the original homophony and the intermedi- features, music should be analyzed from more than ate melodies during the time-span reduction. one aspect to obtain a more precise representation Dibben (1994) investigated the similarity between of musical intuition and concepts. This may lead to an original homophony and the intermediate melo- the use of other music theories, such as Preference dies with the GTTM head and reported an affirma- Rule Systems, prolongational reduction of GTTM, tive result, while there were no considerations on and the implication-realization model. However, the idea of the conservative head. We would look formalizing the music theories that lack the reduc- forward to future investigations that compare the tion concept will require some other techniques.

Hirata and Aoyagi 87

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Additionally, efficiency of description and trans- Music theory has addressed musical intuition position should be improved. Because parts of ts and recognition without the concept of automation and temp objects represent the same information for a long time. At present, there are a few ongoing (a stable note), we may improve their descriptions projects for computer implementation of music by introducing inheritance, for example. Next, our theory, such as Temperley and Sleator (1999) and method cannot compare two polyphonic passages Lerdahl (2001). We believe that a fundamental that have the same contour in different keys (i.e., problem for an intelligent music application is mu- transposed polyphony). We must formalize the sic knowledge representation as well as computer transposition concept in our framework. implementation of music theory. We hope our re- Fourth, the time-span analysis must be done search will be a first step in bridging the gap be- manually in our framework, but we do not offer tween music theory and music knowledge any ideas on how a time-span tree can be algo- representation. rithmically derived from a musical surface in this article. This article focuses on music knowledge representation and presents usage of the time-span Acknowledgments analysis results after computers automatically de- rive a time-span tree from a musical surface. We We are very grateful to two anonymous reviewers think at present it is the most appropriate for users for valuable comments and suggestions on the pre- to manually perform the time-span reduction of po- vious version of this article. We would like to lyphony, not only because current technology to do thank Prof. Yuzuru Hiraga and Mr. Yoshihiro Take- so automatically is premature, but also because ex- uchi for their valuable suggestions to the material cessive automation may yield unintended analysis of this article. Thanks are also due to Dr. Shunichi results. Hence, we are developing a dedicated editor Uchida, the director of Research Institute for Ad- (even for users who are not familiar with music vanced Information Technology (AITEC), for his theory), called TS-editor, that allows users to at- support. tach a time-span tree and temporal structures (see Figure 14). A score is displayed in piano-roll format on the TS-editor’s screen. The editor enables a user to effi- References ciently indicate the relations among head, primary, and secondary, and the information of stable and Arcos, J. L., and R. L. de Ma´ntaras. 2000. ‘‘Combining AI difference in a temporal structure. For more details, Techniques to Perform Expressive Music by Imita- see Hirata and Matsuda (2002). tion.’’ AAAI Workshop: Artificial Intelligence and Mu- sic. Menlo Park, California: AAAI Press, pp. 41–47. Balaban, M. 1996. ‘‘The Music Structures Approach to Knowledge Representation for Music Processing.’’ Conclusion Computer Music Journal 20(2):96–111. Botelho, M. 1994. ‘‘Tonal Grouping: An Addendum to This article proposed a music knowledge represen- Lerdahl and Jackendoff’s ‘A Generative Theory of To- tation framework based on the generative theory of nal Music’.’’ In Proceedings of the Third International tonal music (GTTM). Our framework succeeds in Conference for Music Perception and Cognition, pp. formalizing polyphony to some extent from the 265–266. Carpenter, B. 1992. The Logic of Typed Feature Struc- GTTM point of view, though this work is merely a tures. Cambridge, UK: Cambridge University Press. first step towards a methodology for designing in- Castan, G., M. Good, and P. Roland. 1999. ‘‘Extensible telligent music applications. To evaluate our Markup Language (XML) for Music Applications: An framework, we will have to build several music ap- Introduction.’’ Computing in Musicology 12:95–102. plications and examine the descriptive power, pre- Cope, D. 1996. Experiments in Musical Intelligence. cision, and efficiency of our framework. Middleton, WI: A-R Editions.

88 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021 Dannenberg, R. B. 1993. ‘‘Music Representation Issues, Conference for Music Perception and Cognition, pp. Techniques, and Systems.’’ Computer Music Journal 261–262. 17(3):20–30. Narmour, E. 1990. The Analysis and Cognition of Basic Dannenberg, R. B. 1997. ‘‘Machine Tongues XIX: Ny- Melodic Structures – The Implication-Realization quist, a Language for Composition and Sound Synthe- Model. Chicago: University of Chicago Press. sis.’’ Computer Music Journal 21(3):50–60. Nord, T. A. 1992. ‘‘Toward Theoretical Verification: De- Dibben, N. 1994. ‘‘The Cognitive Reality of Hierarchic veloping a Computer Model of Lerdahl and Jacken- Structure in Tonal and Atonal Music.’’ Music Percep- doff’s Generative Theory of Tonal Music.’’ Ph.D. Diss., tion 12(1):1–25. The University of Wisconsin–Madison. Hirata, K., and T. Aoyagi. 1999. ‘‘A Musically Intelligent Plaza, E. 1995. ‘‘Cases as Terms: A Feature Term Ap- Agent for Composition and Interactive Performance.’’ proach to the Structured Representation of Cases.’’ Proceedings of the 1999 International Computer Mu- Lecture Notes in Artificial Intelligence. Berlin: sic Conference. San Francisco: International Computer Springer-Verlag. Music Association, pp. 167–170. Rowe, R. 1993. Interactive Music Systems: Machine Lis- Hirata, K., and T. Aoyagi. 2000. ‘‘Ba-Bi-Bun: Case-Based tening and Composing. Cambridge, Massachusetts: Reasoning Musical Arrangement System Understand- MIT Press. ing a User’s Intention at Note Level.’’ IPSJ SIG Notes Sadie, S., ed. 1980. The New Grove Dictionary of Music 2000-MUS-37:17–23. and Musicians. London: Macmillan Publishers Ltd. Hirata, K., and T. Aoyagi. 2001. ‘‘Pa-Pi-Pun: Creativity- Selfridge-Field, E. 1997. Beyond MIDI. Cambridge, Mas- Support Tool for Generating Jazz Harmony.’’ IPSJ Jour- sachusetts: MIT Press. nal 42(3):633–641. Temperley, D. 2001. The Cognition of Basic Musical Hirata, K., and R. Hiraga. 2000. ‘‘Next Generation Perfor- Structures. Cambridge, Massachusetts: MIT Press. mance Rendering: Exploiting Controllability.’’ In Pro- Temperley, D., and D. Sleator. 1999. ‘‘Modeling Meter ceedings of the 2000 International Computer Music and Harmony: A Preference Rule Approach.’’ Com- Conference. San Francisco: International Computer puter Music Journal 23(1):10–27. Music Association, pp. 360–363. Uwabu, Y., H. Katayose., and S. Inokuchi. 1997. ‘‘A Hirata, K., and S. Matsuda. 2002. ‘‘Interactive Music Structural Analysis Tool for Expressive Performance.’’ Summarization based on GTTM.’’ In Proceedings of Proceedings of the 1997 International Computer Mu- the Third International Conference on Music Informa- sic Conference. San Francisco: International Computer tion Retrieval (ISMIR 2002). Paris: IRCAM, pp. 86–93. Music Association, pp. 121–124. Kifer, M., G. Lausen, and J. Wu. 1995. ‘‘Logical Founda- Widmer, G. 1995. ‘‘Modeling the Rational Basis of Musi- tions of Object-Oriented and Frame-Based Languages.’’ cal Expression.’’ Computer Music Journal 19(2):76–96. State University of New York at Stony Book Depart- Widmer, G. 1996. ‘‘Learning Expressive Performance: ment of Computer Science Technical Report TR-90– The Structure-Level Approach.’’ Journal of New Music 003. Research 25:179–205. Kolodner, J. 1993. Case-Based Reasoning. Burlington, Wiggins, G., E. Miranda, A. Smaill, and M. Harris. 1993. MA: Morgan Kaufmann. ‘‘A Framework for the Evaluation of Music Represen- Lerdahl, F. 2001. Tonal Pitch Space. Oxford, UK: Oxford tation Systems.’’ Computer Music Journal 17(3):31–42. University Press. Yokota, K. 1992. ‘‘Towards an Integrated Knowledge-Base Lerdahl, F., and R. Jackendoff. 1983. A Generative The- Management System: Overview of R&D on Databases ory of Tonal Music. Cambridge, Massachusetts: MIT and Knowledge Bases in the FGCS Project.’’ Proceed- Press. ings of International Conference on Fifth Generation London, J. 1994. ‘‘The ‘Strong Reduction Hypothesis’ of Computer Systems. Institute for New Generation GTTM.’’ In Proceedings of the Third International Computer Technology, pp. 89–112.

Hirata and Aoyagi 89

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892603322482547 by guest on 30 September 2021