Discourse Structure in Spoken Language: Studies on Speech Corpora
The Harvard community has made this article openly available.

Citation: Nakatani, Christine H., Julia Hirschberg, and Barbara J. Grosz. 1995. Discourse structure in spoken language: Studies on speech corpora. Paper presented at 1995 AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation in Palo Alto, Calif., March 27–29, 1995.

Published Version: http://www.aaai.org/Symposia/Spring/sss95.php

Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:2580299

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Christine H. Nakatani†
Aiken Computation Laboratory, Division of Applied Sciences, Harvard University, Cambridge, MA, USA
chn@das.harvard.edu

Julia Hirschberg
AT&T Bell Laboratories, Mountain Avenue, Murray Hill, NJ, USA
julia@research.att.com

Barbara J. Grosz
Aiken Computation Laboratory, Division of Applied Sciences, Harvard University, Cambridge, MA, USA
grosz@das.harvard.edu

Abstract

A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new corpus of direction-giving monologues collected in both spontaneous and read speaking styles. Preliminary analyses of the direction-giving corpus show that the availability of speech significantly affects the reliability of discourse segmentation for a set of trained discourse labelers.

Introduction

This paper reports on ongoing corpus-based research on the intonational characteristics of spoken discourse in American English. The scientific goal of this research is to lay the foundations for a bootstrapping process in which empirical evidence from spoken language informs us of strengths and weaknesses in a discourse theory, and in which our best current understanding of discourse structure suggests more sophisticated interpretations of intonational meaning. The technological goal of this research is to improve the quality of speech synthesis by exploiting the ability of intonation to reliably convey linguistic structure at the discourse level.

Cognitive studies based on linguistic research have shown that the lack of contextually appropriate intonational variation can hinder processing by the human listener (Terken & Nooteboom; Nooteboom & Kruyt). Yet algorithms for manipulating prosodic variation lag behind even our present understanding of how intonational meaning is conveyed, despite the fact that basic synthesis technologies for producing natural intonation already exist.[1]

The research reported here was partially supported by two NSF IRI grants from the National Science Foundation.
† Partially supported by a National Science Foundation Graduate Research Fellowship.
[1] That is, input to systems such as DECTalk and the AT&T Text-to-Speech System can be hand annotated to produce quite natural sounding speech.

Theoretical and Methodological Foundations

Several decades of research have resulted in numerous findings on how discourse-level meaning can be conveyed by acoustic-prosodic properties such as pitch range and pausal duration (Avesani & Vayra; Ayers; Brown, Currie, & Kenworthy; Lehiste; Silverman; cf. Woodbury), amplitude (Brown, Currie, & Kenworthy), speaking rate (Lehiste), and intonational prominence (Brown; Terken). Most of these studies have relied on intuitive analyses of notions such as topic structure, or on operational definitions of discourse-level properties, such as paragraph markings as indicators of discourse segment boundaries.

In contrast to most previous work, two recent studies utilized an independent definition of discourse structure to obtain discourse segmentation data from multiple subjects. In (Hirschberg & Grosz; Grosz & Hirschberg), discourse structural elements were determined by trained subjects following (Grosz & Sidner), and were correlated with intonational properties. In (Passonneau & Litman), discourse segmentations were obtained from naive subjects based on an informal notion of speaker intention. For a narrative corpus, pausal duration above a certain threshold predicted segment boundaries with high recall but low precision. Passonneau and Litman suggest that intonational cues be integrated with text-based cues, such as cue phrases (Hirschberg & Litman) and other lexical information (Morris & Hirst; Hearst), in spoken language processing systems using multiple knowledge sources.

The potential contributions of speech cues in such an architecture remain largely unexplored. Intonational variables need to be interrelated in new algorithms, and a fuller spectrum of speech properties needs to be correlated with a theoretically motivated yet empirically determined representation of discourse structure. The approach we have taken in our work is to conduct corpus-based empirical work on intonational features of spoken language, analyze discourse properties based on an independently motivated theory of discourse structure, and examine the correlations between the two sources of linguistic structure.

Prosodic Analysis

The methods we use for measuring speech properties such as rate, energy (rms), pauses, and fundamental frequency are widely used in the speech community. These measures can be obtained automatically, given orthographic and prosodic transcriptions of the speech. The prosodic transcription, a more abstract representation of the intonational prominences, phrasing, and melodic contours, is obtained by hand-labeling. We employ the ToBI standard for prosodic transcription (Silverman et al.; Pitrelli, Beckman, & Hirschberg), which is based upon Pierrehumbert's theory of American English intonation (Pierrehumbert).

The ToBI transcription provides us with a breakdown of the speech sample into minor or intermediate phrases, in Pierrehumbert's terms (Pierrehumbert). This level of prosodic phrase serves as our primary unit of analysis for measuring both speech and discourse properties. For each intermediate phrase, we calculate values for pitch range (from the fundamental frequency (f0) maximum occurring within an accented syllable in the phrase); amount of f0 change between phrases (comparing the f0 of phrase i with that of the adjacent phrase); amplitude and energy (rms) maxima within the vowel of the syllable containing the phrase's f0 peak; contour type and type of nuclear accent (identified in ToBI notation); speaking rate, measured in syllables per second (sps); and pausal duration between intermediate as well as intonational phrases.

Discourse structure analysis

We base our discourse analysis on the theory of discourse structure presented in (Grosz & Sidner), hereafter G&S, in which discourse structure is comprised of intentional structure, attentional state, and linguistic structure. G&S's model [...]

Intonation is an element of the linguistic structure that can provide information important for computing both attentional state and intentional structure. In our research, G&S's model of discourse structure provides both a foundation for segmenting discourses into constituent parts and a set of theoretical constructs that may serve to mediate our interpretation of the discourse functions of intonational features. Further, intonation provides information about both levels of discourse structure. For example, at the global level, cue phrases that mark segment boundaries (Sidner; Reichman-Adar) exhibit reliable intonational properties (Hirschberg & Litman; Hirschberg). At the local level, intonation may indicate whether a phrase is parenthetical or may influence the perceived salience of some mentioned entity.

We devised a set of instructions based on G&S for labeling the intentional and linguistic structures at both the local and global levels (Hirschberg & Grosz; Grosz & Hirschberg). While the studies reported here utilize these so-called expert instructions, a parallel set of intention-based segmentation instructions suitable for naive subjects is being developed for use in the Boston Directions study, which is described below.

Speech Corpora

We utilize three corpora in our investigations: professionally read AP news stories; non-professional spontaneous narrative; and non-professional elicited task-oriented monologues, both spontaneous and read. Below we summarize results of two pilot studies utilizing the first two corpora, respectively. The first pilot study investigated intonational correlates of discourse structure, while the second focused on discourse structural constraints on intonational prominence. Although the pilot study results were encouraging, our experiences with the respective corpora revealed ways in which choices of speaking style (e.g., read vs. spontaneous, professional vs. non-professional) and genre generally influence both discourse and speech properties. These single-speaker corpora also did not address the problem of individual variation across speakers. To overcome problems with these corpora, a third corpus of multi-speaker, elicited, task-oriented monologues was designed.

This corpus, the Boston Directions Corpus, exhibits discourse and speech