
Informativeness and Automatic Pitch Accent Modeling

Shimei Pan and Kathleen R. McKeown
Dept. of Computer Science, Columbia University
New York, NY 10027, USA
{pan, kathy}@cs.columbia.edu

Abstract

In intonational phonology and speech synthesis research, it has been suggested that the relative informativeness of a word can be used to predict pitch prominence: the more information conveyed by a word, the more likely it will be accented. But there are others who express doubts about such a correlation. In this paper, we provide some empirical evidence to support the existence of such a correlation by employing two widely accepted measures of informativeness. Our experiments show that there is a positive correlation between the informativeness of a word and its pitch accent assignment. They also show that informativeness enables statistically significant improvements in pitch accent prediction. The computation of word informativeness is inexpensive and can be incorporated into speech synthesis systems easily.

1 Introduction

The production of natural, intelligible speech remains a major challenge for speech synthesis research. Recent research has focused on prosody modeling (Silverman, 1987; Hirschberg, 1990; Santen, 1992), which determines the variations in pitch, tempo and rhythm. One of the critical issues in prosody modeling is pitch accent assignment. Pitch accent is associated with the pitch prominence of a word. For example, some words may sound more prominent than others within a sentence because they are associated with a sharp pitch rise or fall. Usually, the prominent words bear pitch accents while the less prominent ones do not. Although native speakers of a particular language have no difficulty in deciding which words in their utterances should be accented, the general pattern of accenting in a language, such as English, is still an open question.

Some linguists speculate that the relative informativeness, or semantic weight, of a word can influence accent placement. Ladd (1996) claims that "the speakers assess the relative semantic weight or informativeness of potentially accentable words and put the accent on the most informative point or points" (ibid, pg. 175). He also claims that "if we understand relative semantic weight, we will automatically understand accent placement" (ibid, pg. 186). Bolinger (Bolinger, 1972) also uses the following examples to illustrate the phenomenon:

1. "He was arrested because he KILLED a man."

2. "He was arrested because he killed a POLICEMAN."

The capitalized words in the examples are accented. In (1), "man" is semantically empty relative to "kill"; therefore, the verb "kill" gets accented. However, in (2), "policeman" is semantically rich and is accented instead.

However, different theories, not based on informativeness, have been proposed to explain the above phenomenon. For example, Bresnan's (1971) explanation is based on syntactic function. She suggests that "man" in the above sentence does not get accented because "man" and other words like "guy" or "person" or "thing" form a category of "semi-pronouns". Counter-examples listed below raise more questions about the usefulness of semantic informativeness. The accent pattern in the following examples cannot be explained solely by semantic informativeness:

3. "HOOVER dam."

4. "Hoover TOWER."

While researchers have discussed the possible influence of semantic informativeness, there has been no known empirical study of the claim, nor has this type of information been incorporated into computational models of prosody. In this work, we employ two measurements of informativeness. First, we adopt an information-based framework (Shannon, 1948), quantifying the "Information Content" (IC) of a word as the negative log likelihood of the word in a corpus. The second measurement is TF*IDF (Term Frequency times Inverse Document Frequency) (Salton, 1989; Salton, 1991), which has been widely used to quantify word importance in information retrieval tasks. Both IC and TF*IDF are well established measurements of informativeness and therefore good candidates to investigate. Our empirical study shows that word informativeness not only is closely related to word accentuation, but also provides new power in pitch accent prediction. Our results suggest that information content is a valuable feature to be incorporated in speech synthesis systems.

In the following sections, we first define IC and TF*IDF. Then, a description of the corpus used in this study is provided. We then describe a set of experiments conducted to study the relation between informativeness and pitch accent. We explain how machine learning techniques are used in the pitch accent modeling process. Our results show that:

• Both IC and TF*IDF scores are strongly correlated with pitch accent assignment.

• IC is a more powerful predictor than TF*IDF.

• IC provides better prediction power in pitch accent prediction than previous techniques.

The investigated pitch accent models can be easily adopted by speech synthesis systems.

2 Definitions of IC and TF*IDF

Following the standard definition in information theory (Shannon, 1948; Fano, 1961; Cover and Thomas, 1991), the IC of a word is

IC(w) = -log(P(w))

where P(w) is the probability of the word w appearing in a corpus. P(w) is estimated as F(w)/N, where F(w) is the frequency of w in the corpus and N is the cumulative count of all word occurrences in the corpus. Intuitively, if the probability of a word increases, its informativeness decreases, and it is therefore less likely to be an information focus. Similarly, it is less likely to be communicated with pitch prominence.

TF*IDF is defined as the product of two components. TF (Term Frequency) is the word frequency within a document; IDF (Inverse Document Frequency) is the logarithm of the ratio of the total number of documents to the number of documents containing the word. The TF*IDF product is higher if a word has a high frequency within the document, which signifies high importance for the current document, and low dispersion in the corpus, which signifies high specificity. In this research, we employed a variant of the TF*IDF score used in SMART (Buckley, 1985), a popular information retrieval package:

(TF*IDF)_{w_i,d_j} = [(1.0 + log F_{w_i,d_j}) log(N/N_{w_i})] / sqrt( Σ_{k=1}^{M} [(1.0 + log F_{w_k,d_j}) log(N/N_{w_k})]^2 )

where F_{w_i,d_j} is the frequency of word w_i in document d_j, N is the total number of documents, N_{w_i} is the number of documents containing word w_i, and M is the number of distinct stemmed words in document d_j.

IC and TF*IDF capture different kinds of informativeness. IC is a metric global to the domain of a corpus, and each word in the corpus has a unique IC score. TF*IDF balances a metric local to a given document (TF) against a metric global to the corpus (IDF). Therefore, the TF*IDF score of a word changes from one document to another (different TF). However, some global features are also captured by TF*IDF: for example, a common word in the domain tends to get a low TF*IDF score in all the documents in the corpus.

3 Corpus Description

In order to empirically study the relations between word informativeness and pitch accent, we use a medical corpus which includes a speech portion and a text portion. The speech corpus includes fourteen segments which total about 30 minutes of speech. The speech was collected at Columbia Presbyterian Medical Center (CPMC), where doctors informed residents or nurses about the post-operative status of a patient who had just undergone bypass surgery. The speech corpus was transcribed orthographically by a medical professional and was also intonationally labeled with pitch accents by a ToBI (Tone and Break Index) (Silverman et al., 1992; Beckman and Hirschberg, 1994) expert. The text corpus includes 1.24 million words in 2,422 discharge summaries, spanning a larger group of patients, the majority of whom have also undergone cardiac surgery. The orthographic transcripts as well as the text corpus are used to calculate the IC and TF*IDF scores. First, all the words in the text corpus as well as the speech transcripts are processed by a stemming model so that words like "receive" and "receives" are treated as one word. We employ a revised version of Lovins' stemming algorithm (Lovins, 1968) which is implemented in SMART. Although the usefulness of stemming is arguable, we choose to use stemming because we think "receive" and "receives" are equally likely to be accented. Then, IC and TF*IDF are calculated. After this, the effectiveness of informativeness in accent placement is verified using the speech corpus. Each word in the speech corpus has an IC score, a TF*IDF score, a part-of-speech (POS) tag and a pitch accent label. Both IC and TF*IDF are used to test the correlation between informativeness and accentuation. POS is also investigated by several machine learning techniques in automatic pitch accent modeling.

4 Experiments

We conducted a series of experiments to determine whether there is a correlation between informativeness and pitch accent, and whether informativeness provides an improvement over other known indicators of pitch accent, such as part-of-speech. We experimented with different forms of machine learning to integrate indicators within a single framework, testing whether rule induction or hidden Markov modeling provides a better model.

4.1 Ranking Word Informativeness in the Corpus

Tables 1 and 2 show the most and least informative words in the corpus. The IC order indicates the rank among all the words in the corpus, while the TF*IDF order in the table indicates the rank among the words within a document; the document was picked randomly from the corpus. In general, most of the least informative words are function words, such as "with" or "and". However, some content words are selected, such as "patient", "year" and "old". These content words are very common in this domain and are mentioned in almost all the documents in the corpus. In contrast, the majority of the most informative words are content words. Some of the selections are less expected. For example, "your" ranks as the most informative word in a document using TF*IDF. This indicates that listeners or readers are rarely addressed in the corpus: it appears only once in the entire corpus.
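As a concrete illustration, the two scores defined in Section 2 are inexpensive to compute. The following is a minimal sketch over whitespace-tokenized, already-stemmed documents; the function names are ours, and this is not the SMART implementation itself.

```python
import math
from collections import Counter

def ic_scores(docs):
    """IC(w) = -log(F(w)/N), where N is the total token count of the corpus.
    docs is a list of documents, each a list of tokens."""
    counts = Counter(w for d in docs for w in d)
    n = sum(counts.values())
    return {w: -math.log(f / n) for w, f in counts.items()}

def tfidf_scores(docs):
    """SMART-style TF*IDF: (1 + log F_{w,d}) * log(N_docs / N_w), divided by
    the Euclidean norm of the document's weight vector."""
    n_docs = len(docs)
    df = Counter()                      # N_w: number of documents containing w
    for d in docs:
        df.update(set(d))
    out = []
    for d in docs:
        raw = {w: (1.0 + math.log(f)) * math.log(n_docs / df[w])
               for w, f in Counter(d).items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        out.append({w: v / norm for w, v in raw.items()})
    return out

# Sorting by these scores reproduces rankings like those in Tables 1 and 2
# (toy data; "zophrin" stands in for a rare domain term).
docs = [["patient", "received", "zophrin"], ["patient", "stable"]]
scores = ic_scores(docs)
most_informative_first = sorted(scores, key=scores.get, reverse=True)
```

Note that a word occurring in every document gets IDF 0, which is why very common domain words such as "patient" fall to the bottom of both rankings.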

Rank  IC Most Informative Words  IC        IC Least Informative Words  IC
1     zophrin                    14.02725  with                        4.08777
2     name1                      14.02725  on                          4.20878
3     xyphoid                    14.02725  patient                     4.26354
4     wytensin                   14.02725  in                          4.35834
5     pyonephritis               14.02725  she                         4.52409
6     orobuccal                  14.02725  he                          4.52918
7     tzanck                     14.02725  for                         4.66436
8     synthetic                  14.02725  no                          4.69019
9     Rx                         14.02725  day                         4.78832
10    quote                      14.02725  had                         4.98343

Table 1: IC Most and Least informative words

Rank  TF*IDF Most Informative Words  TF*IDF   TF*IDF Least Informative Words  TF*IDF
1     your                           0.15746  and                             0.00008
2     vol                            0.15238  a                               0.00009
3     tank                           0.15238  the                             0.00009
4     sonometer                      0.15238  to                              0.00016
5     papillary                      0.15238  was                             0.00020
6     pancuronium                    0.15238  of                              0.00024
7     name2                          0.15238  with                            0.00034
8     name3                          0.15238  in                              0.00041
9     incomplete                     0.14345  old                             0.00068
10    yes                            0.13883  year                            0.00088

Table 2: TF*IDF Most and Least informative words

4.2 Testing the Correlation of Informativeness and Accent Prediction

In order to verify whether word informativeness is correlated with pitch accent, we employ Spearman's rank correlation coefficient ρ and its associated significance test (Conover, 1980) to estimate the correlations between IC and pitch prominence as well as between TF*IDF and pitch prominence. As shown in Table 3, both TF*IDF and IC are closely correlated with pitch accent (p = 2.67×10^-65 for TF*IDF; the p-value for IC is smaller still). Because the correlation coefficient ρ is positive, this indicates that the higher the IC and TF*IDF scores are, the more likely a word is to be accented.

4.3 Learning IC and TF*IDF Accent Models

The correlation test suggests that there is a strong connection between informativeness and pitch accent. But we also want to show how much performance gain can be achieved by adding this information to pitch accent models. To study the effect of TF*IDF and IC on pitch accent, we use machine learning techniques to learn models that predict the effect of these indicators on pitch accent.
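The Spearman statistic used above is straightforward to reproduce. The following pure-Python sketch computes ρ with average ranks for ties (the accent labels are binary, so ties abound); it is an illustration only, not the statistics package the authors used, and the significance test is omitted.

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks,
    using average ranks for tied values."""
    def avg_ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        ranks = [0.0] * len(vs)
        i = 0
        while i < len(order):
            j = i
            # Extend j over the run of tied values starting at position i.
            while j + 1 < len(order) and vs[order[j + 1]] == vs[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1       # average of 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    rx, ry = avg_ranks(xs), avg_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

With informativeness scores as xs and 0/1 accent labels as ys, a positive ρ means higher scores go with accented words, as in Table 3.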

Feature  Correlation Coefficient  Significance Level
TF*IDF   ρ = 0.29                 p = 2.67×10^-65
IC       ρ = 0.34                 p < 2.67×10^-65

Table 3: The Correlation of Informativeness and Accentuation

We use both RIPPER (Cohen, 1995) and Hidden Markov Models (HMMs) (Rabiner and Juang, 1986) to build pitch accent models. RIPPER is a system that learns sets of classification rules from training data. It automatically selects rules which maximize information gain and employs heuristics to decide when to stop, to prevent over-fitting. The performance of RIPPER is comparable with most benchmark rule induction systems, such as C4.5 (Quinlan, 1993). We train RIPPER on the speech corpus using 10-fold cross-validation, a standard procedure for training and testing when the amount of data is limited. In this experiment, the predictors are IC or TF*IDF, and the response variable is the pitch accent assignment. Once a set of RIPPER rules is acquired, it can be used to predict which words should be accented in a new corpus.

An HMM is a probability model which has been used successfully in many applications, such as speech recognition (Rabiner, 1989) and part-of-speech tagging (Kupiec, 1992). An HMM is defined as a triple λ = (A, B, Π), where A is a state transition probability matrix, B is an observation probability distribution matrix, and Π is an initial state distribution vector. In this experiment, the hidden states are the accent statuses of words, which can be either "accented" or "not accented". The observations are the IC or TF*IDF scores of each word. Because of the limited size of the speech corpus, we use a first-order HMM, where the following condition is assumed:

P(Q_{t+1} = i | Q_t = j, Q_{t-1} = k, ..., Q_1 = n) = P(Q_{t+1} = i | Q_t = j)

where Q_t is the state at time t. Because we employ a supervised training process, no sophisticated parameter estimation procedure, such as the Baum-Welch algorithm (Rabiner, 1989), is necessary. All the parameters are calculated directly from counts, using the following formulas:

A = {a_{ij} : i = 1,..,N; j = 1,..,N},   a_{ij} = F(Q_{t-1} = i, Q_t = j) / F(Q_{t-1} = i)

B = {b_{jm} : j = 1,..,N; m = 1,..,M},   b_{jm} = F(Q_t = j, O_t = m) / F(Q_t = j)

Π = {π_i : i = 1,..,N},   π_i = F(Q_1 = i) / F(Q_1)

where N is the number of hidden states, M is the number of distinct observations, F(·) denotes a count over the training data, and F(Q_1) is the total number of training sequences.

Once all the parameters of the HMM are set, we employ the Viterbi algorithm (Viterbi, 1967; Forney, 1973) to find an optimal accentuation sequence, one which maximizes the probability of the observed IC or TF*IDF sequence given the HMM.

Both RIPPER and HMMs are widely accepted machine learning systems, but their theoretical bases are very different: an HMM optimizes a whole sequence of accent assignments instead of isolated accent assignments. By employing both of them, we want to show that our conclusions hold for both approaches. Furthermore, we expect the HMM to do better than RIPPER because the influence of context words is incorporated.

We use a baseline model where all words are assigned a default accent status (accented). 52% of the words in the corpus are actually accented, and thus the baseline has a performance of 52%. Our results are summarized in Table 4.

Models        HMM Performance  RIPPER Performance
Baseline      52.02%           52.02%
TF*IDF Model  67.25%           65.66%
IC Model      71.96%           70.06%

Table 4: Comparison of the IC and TF*IDF models with the baseline model
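The supervised HMM estimation and Viterbi decoding described in Section 4.3 can be sketched as follows. This is our illustration, not the authors' code: the state and observation names are hypothetical, transitions are normalized by the count of the preceding state, and real-valued informativeness scores would first have to be binned into discrete observation symbols.

```python
from collections import Counter

def train_hmm(sequences):
    """Supervised HMM estimation by counting. Each training sequence is a
    list of (state, observation) pairs, e.g. ('accented', 'high-IC')."""
    trans, emit, start = Counter(), Counter(), Counter()
    state_count, prev_count = Counter(), Counter()
    for seq in sequences:
        start[seq[0][0]] += 1
        for (s, o) in seq:
            state_count[s] += 1
            emit[(s, o)] += 1
        for (s1, _), (s2, _) in zip(seq, seq[1:]):
            trans[(s1, s2)] += 1
            prev_count[s1] += 1
    A = {k: v / prev_count[k[0]] for k, v in trans.items()}
    B = {k: v / state_count[k[0]] for k, v in emit.items()}
    Pi = {s: v / len(sequences) for s, v in start.items()}
    return A, B, Pi

def viterbi(obs, states, A, B, Pi):
    """Most likely state sequence for an observation sequence."""
    V = [{s: Pi.get(s, 0.0) * B.get((s, obs[0]), 0.0) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: V[-1][p] * A.get((p, s), 0.0))
            row[s] = V[-1][best] * A.get((best, s), 0.0) * B.get((s, o), 0.0)
            ptr[s] = best
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

A production version would work in log space to avoid underflow on long word sequences; the multiplicative form is kept here for readability.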

Table 4 shows that when TF*IDF is used to predict pitch accent, performance increases over the 52% baseline to 67.25% and 65.66% for the HMM and RIPPER respectively. With the IC model, performance is further increased to 71.96% and 70.06%. These results are obtained using 10-fold cross-validation. We can draw two conclusions from these results. First, both IC and TF*IDF are very effective in pitch accent prediction: all the improvements over the baseline model are statistically significant with p < 1.11×10^-16 (the statistics package reports p = 0 because of underflow; the real p-value is less than 1.11×10^-16, the smallest value the computer can represent in this case), using the χ² test (Fienberg, 1983; Fleiss, 1981). Second, the IC model is more powerful than the TF*IDF model: it outperforms the TF*IDF model with p = 3.8×10^-5 for the HMM model and p = 0.0002 for the RIPPER model. The low p-values show that the improvements achieved by the IC models are significant. Since IC performs better than TF*IDF in pitch accent prediction, we choose IC to measure informativeness in all the following experiments. Another observation is that the HMM models do show some improvement over the RIPPER models, but the difference is marginal; more data is needed to test its significance.

4.4 Incorporating IC in Reference Accent Models

In order to show that IC provides more power in predicting pitch accent than current models, we need to directly compare the influence of IC with that of other reference models. In this section, we describe experiments that compare IC alone against a part-of-speech (POS) model for pitch accent prediction, and then compare a model that integrates IC with POS against the POS model. Finally, anticipating the possibility that other features within a traditional TTS system, in combination with POS, may provide equal or better performance than the addition of IC, we carried out experiments that directly compare the performance of a Text-to-Speech (TTS) synthesizer alone with a model that integrates TTS with IC.

In most speech synthesis systems, part-of-speech (POS) is the most powerful feature for pitch accent prediction. Therefore, showing that IC provides additional power over POS is important. In addition to the importance of POS within TTS for predicting pitch accent, there is a clear overlap between POS and IC: we have shown that the words with the highest IC are usually content words and the words with the lowest IC are frequently function words. This is an added incentive for comparing IC with POS models. Thus, we want to explore whether the new information added by IC can provide any improvement when both are used to predict accent assignment.

In order to create a POS model, we first utilize MXPOST, a maximum entropy part-of-speech tagger (Ratnaparkhi, 1996), to get the POS information for each word. The performance of the MXPOST tagger is comparable with most benchmark POS taggers, such as Brill's tagger (Brill, 1994). After this, we map all the part-of-speech tags into seven categories: "noun", "verb", "adjective", "adverb", "number", "pronoun" and "others". This mapping is conducted because keeping all the initial tags (about 35) would drastically increase the amount of training data required.

Models        HMM Performance  RIPPER Performance
IC Model      71.96%           70.06%
POS Model     71.33%           70.52%
POS+IC Model  74.06%           73.71%

Table 5: Comparison of POS+IC model with POS model
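The tag-collapsing step described in Section 4.4 can be sketched as a lookup table. The paper does not give the exact mapping the authors used, so the grouping below, from Penn Treebank tags (as produced by a tagger such as MXPOST) to the seven coarse categories, is purely illustrative.

```python
# Hypothetical mapping from Penn Treebank tags to the paper's seven coarse
# categories; any tag not listed falls into "others".
COARSE = {
    "NN": "noun", "NNS": "noun", "NNP": "noun", "NNPS": "noun",
    "VB": "verb", "VBD": "verb", "VBG": "verb", "VBN": "verb",
    "VBP": "verb", "VBZ": "verb", "MD": "verb",
    "JJ": "adjective", "JJR": "adjective", "JJS": "adjective",
    "RB": "adverb", "RBR": "adverb", "RBS": "adverb",
    "CD": "number",
    "PRP": "pronoun", "PRP$": "pronoun", "WP": "pronoun", "WP$": "pronoun",
}

def coarse_pos(tag):
    """Collapse a fine-grained POS tag into one of the seven categories."""
    return COARSE.get(tag, "others")
```

Collapsing roughly 35 tags to 7 categories shrinks the predictor space by a factor of five, which is what keeps the training-data requirements manageable on a 30-minute speech corpus.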

Models        HMM Performance  RIPPER Performance
TTS Model     71.75%           71.75%
TTS+IC Model  72.30%           72.75%
POS+IC Model  74.06%           73.71%

Table 6: Comparison of TTS+IC model with TTS model
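The pairwise model comparisons reported in this section rely on a χ² test. For two models evaluated on the same words, one simple formulation compares their correct/incorrect counts in a 2×2 contingency table; the sketch below (with hypothetical counts) illustrates that test, not the authors' exact setup.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the 2x2 table
    [[a, b], [c, d]] (1 degree of freedom), e.g. rows = two accent models,
    columns = correct/incorrect predictions. Assumes all margins are non-zero."""
    n = a + b + c + d
    x = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)).
    return x, math.erfc(math.sqrt(x / 2))

# Hypothetical: model 1 gets 740 of 1000 words right, model 2 gets 700 right.
stat, p = chi2_2x2(740, 260, 700, 300)
```

Small p-values, as in the comparisons above, indicate that the accuracy difference between the two models is unlikely to be due to chance.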

The obtained POS tag is the predictor in the POS model. As shown in Table 5, the performance of the two POS models is 71.33% and 70.52% for the HMM and RIPPER respectively, which is comparable with that of the IC model. This comparison further shows the strength of IC, because it has similar power to POS in pitch accent prediction and it is very easy to compute. When the POS models are augmented with IC, the POS+IC model performance increases to 74.06% and 73.71% respectively. The improvement is statistically significant, with p = 0.015 for the HMM model and p = 0.005 for the RIPPER model, which means the new information captured by IC provides additional predicting power for the POS+IC models. These experiments produce new evidence confirming that IC is a valuable feature in pitch accent modeling.

We also tried another reference model, Text-to-Speech (TTS) synthesizer output, to evaluate the results. The TTS pitch accent model is more comprehensive than the POS model: it takes many features into consideration, such as discourse and semantic information, and it is well established and has been evaluated in various situations. In this research, we adopted Bell Laboratories' TTS system (Sproat, 1997; Olive and Liberman, 1985; Hirschberg, 1990). We first ran it on our corpus to get the TTS pitch accent assignments. Comparing the TTS accent assignments with the expert accent assignments, the TTS performance is 71.75%, which is statistically significantly lower than the HMM POS+IC model, with p = 0.039. We also tried to incorporate IC in the TTS model. A simple way of doing this is to use the TTS output and IC as predictors and train them on our data. The obtained TTS+IC models achieve marginal improvement: the performance of the TTS+IC model increases to 72.30% and 72.75% for the HMM and RIPPER respectively, which is lower than that of the POS+IC models. We speculate that this may be due to the corpus we used. The Bell Laboratories' TTS pitch accent model is trained on a totally different domain, and our medical corpus seems to negatively affect the TTS performance (71.75%, compared to around 80%, its normal performance). Since the TTS+IC models involve two totally different domains, the effectiveness of IC may be compromised. If this assumption holds, we think that the TTS+IC model would perform better if IC were trained together with the TTS internal features on our corpus directly. But since this requires retraining a TTS system for a new domain and it is very hard for us to conduct such an experiment, no further comparison was conducted to verify this assumption.

Although TF*IDF is less powerful than IC in pitch accent prediction, since they measure two different kinds of informativeness, it is possible that a TF*IDF+IC model can


perform better than the IC model. Similarly, if TF*IDF is incorporated in the POS+IC model, the overall performance might increase for a POS+IC+TF*IDF model. However, our experiments show no improvement when TF*IDF is incorporated in the IC and POS+IC models: IC is always the dominant predictor when both IC and TF*IDF are present.

5 Related Work

Information-based approaches have been applied in natural language applications before. In (Resnik, 1993; Resnik, 1995), IC was used to measure semantic similarity between words, and it was shown to be more effective than traditional measurements of semantic distance within the WordNet hierarchy. A similar log-based, information-like measurement was also employed in (Leacock and Chodorow, 1998) to measure semantic similarity. TF*IDF scores are mainly used in keyword-based information retrieval tasks. For example, TF*IDF has been used in (Salton, 1989; Salton, 1991) to index the words in a document and is also implemented in SMART (Buckley, 1985), a general-purpose information retrieval package providing basic tools and libraries to facilitate information retrieval tasks.

Some early work on pitch accent prediction in speech synthesis uses only the distinction between content words and function words. Although this approach is simple, it tends to assign more pitch accents than necessary. We also tried the content/function word model on our corpus and, as expected, found it to be less powerful than the part-of-speech model. More advanced pitch accent models make use of other information, such as part-of-speech, given/new distinctions and contrast information (Hirschberg, 1993). Semantic information is also employed in predicting accent patterns for complex nominal phrases (Sproat, 1994). Other comprehensive pitch accent models have been suggested in (Pan and McKeown, 1998) in the framework of Concept-to-Speech generation, where the output of a natural language generation system is used to predict pitch accent.

6 Discussion

Since IC is not a perfect measurement of informativeness, it can cause problems in accent prediction. Moreover, even if a perfect measurement of informativeness were available, more features may be needed in order to build a satisfactory pitch accent model. In this section, we discuss each of these issues.

IC does not directly measure the informativeness of a word; it measures the rarity of a word in a corpus, and a rare word is not necessarily informative. Semantically empty words can be ranked high by IC as well. For example, CABG is a common operation in this domain, and "CABG" is almost always used whenever the operation is mentioned. However, in a few instances it is referred to as a "CABG operation". As a result, the semantically empty word (in this context) "operation" gets a high IC score, and it is very hard to distinguish high IC scores resulting from this situation from those that accurately measure informativeness. This causes problems in precisely measuring the IC of a word. Similarly, misspelled words can also have high IC scores due to their rarity.

Although IC is not ideal for quantifying word informativeness, even with a perfect measurement of informativeness there are still many cases where this information by itself would not be enough. For example, each word gets a unique IC score regardless of its context; yet it is well known that context information, such as given/new status and contrast, plays an important role in accentuation. In the future, we plan to build a comprehensive accent model with more pitch accent indicators, such as syntactic, semantic and discourse features.

7 Conclusion

In this paper, we have provided empirical evidence for the usefulness of informativeness for accent assignment. Overall, there is a

positive correlation between indicators of informativeness, such as IC and TF*IDF, and pitch accent: the more informative a word is, the more likely it is that a pitch accent is assigned to the word. Both measurements of informativeness improve significantly over the baseline performance. We also show that IC is a more powerful measure of informativeness than TF*IDF for pitch accent prediction. When comparing IC-empowered POS models with POS models, we found that IC enables additional, statistically significant improvements in pitch accent assignment. This performance also significantly outperforms the TTS pitch accent model. Overall, IC is not only effective, as shown in the results, but also relatively inexpensive to compute for a new domain. Almost all speech synthesis systems, text-to-speech as well as concept-to-speech, can employ this feature as long as a large corpus is available. In the future, we plan to explore other information content measurements and incorporate them in a more comprehensive accent model with more discourse and semantic features included.

8 Acknowledgements

Thanks to Julia Hirschberg, Vasileios Hatzivassiloglou and James Shaw for comments and suggestions on an earlier version of this paper. Thanks to Desmond Jordan for helping us with the collection of the speech and text corpus. This research is supported in part by the National Science Foundation under Grant No. IRI 9528998, the National Library of Medicine under project R01 LM06593-01, and the Columbia University Center for Advanced Technology in High Performance Computing and Communications in Healthcare (funded by the New York State Science and Technology Foundation).

References

Mary Beckman and Julia Hirschberg. 1994. The ToBI annotation conventions. Technical report, Ohio State University, Columbus.

Dwight Bolinger. 1972. Accent is predictable (if you're a mind-reader). Language, 48:633-644.

Joan Bresnan. 1971. Sentence stress and syntactic transformations. Language, 47:257-280.

Eric Brill. 1994. Some advances in rule-based part of speech tagging. In Proceedings of the 12th National Conference on Artificial Intelligence.

Chris Buckley. 1985. Implementation of the SMART information retrieval system. Technical Report 85-686, Cornell University.

William Cohen. 1995. Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning.

W. J. Conover. 1980. Practical Nonparametric Statistics. Wiley, New York, 2nd edition.

Thomas M. Cover and Joy A. Thomas. 1991. Elements of Information Theory. Wiley, New York.

Robert M. Fano. 1961. Transmission of Information: A Statistical Theory of Communications. MIT Press, Cambridge, Massachusetts.

Stephen E. Fienberg. 1983. The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge, Massachusetts, 2nd edition.

Joseph L. Fleiss. 1981. Statistical Methods for Rates and Proportions. Wiley, New York, 2nd edition.

G. David Forney. 1973. The Viterbi algorithm. Proceedings of the IEEE, 61(3).

Julia Hirschberg. 1990. Assigning pitch accent in synthetic speech: The given/new distinction and deaccentability. In Proceedings of the Seventh National Conference of the American Association of Artificial Intelligence, pages 952-957, Boston.

Julia Hirschberg. 1993. Pitch accent in context: predicting intonational prominence from text. Artificial Intelligence, 63:305-340.

Julian Kupiec. 1992. Robust part-of-speech tagging using a hidden Markov model. Computer Speech and Language, 6(3):225-242, July.

D. Robert Ladd. 1996. Intonational Phonology. Cambridge University Press, Cambridge.

Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, chapter 11. MIT Press.

Julie Beth Lovins. 1968. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, volume 11.

Joseph P. Olive and Mark Y. Liberman. 1985. Text to Speech -- An overview. Journal of the Acoustical Society of America, 78(Fall):S6.

Shimei Pan and Kathleen R. McKeown. 1998. Learning intonation rules for concept to speech generation. In Proceedings of COLING/ACL '98, Montreal, Canada.

John R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo.

Lawrence R. Rabiner and B. H. Juang. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine, pages 4-15, January.

Lawrence R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.

Adwait Ratnaparkhi. 1996. A maximum entropy part of speech tagger. In Eric Brill and Kenneth Church, editors, Conference on Empirical Methods in Natural Language Processing. Univ. of Pennsylvania.

Philip Resnik. 1993. Semantic classes and syntactic ambiguity. In Proc. of the ARPA Workshop on Human Language Technology, pages 278-283. Morgan Kaufmann Publishers.

Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal, Canada.

Gerard Salton. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, Massachusetts.

Gerard Salton. 1991. Developments in automatic text retrieval. Science, 253:974-980, August.

Jan P. H. van Santen. 1992. Contextual effects on vowel duration. Speech Communication, 11:513-546, January.

Claude E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal, 27:379-423 and 623-656, July and October.

Kim Silverman, Mary Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet Pierrehumbert, and Julia Hirschberg. 1992. ToBI: a standard for labelling English prosody. In Proceedings of ICSLP92, volume 2.

Kim Silverman. 1987. The structure and processing of fundamental frequency contours. Ph.D. thesis, Cambridge University.

Richard Sproat. 1994. English noun-phrase accent prediction for Text-to-Speech. Computer Speech and Language, 8:79-94.

Richard Sproat. 1997. Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Kluwer, Boston.

Andrew J. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2).
