Discourse Structure in Spoken Language: Studies on Speech Corpora


Citation Nakatani, Christine H., Julia Hirschberg, and Barbara J. Grosz. 1995. Discourse structure in spoken language: Studies on speech corpora. Paper presented at 1995 AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation in Palo Alto, Calif., March 27–29, 1995.

Published Version http://www.aaai.org/Symposia/Spring/sss95.php

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:2580299

Terms of Use This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Discourse Structure in Spoken Language: Studies on Speech Corpora*

Christine H. Nakatani†
Aiken Computation Laboratory, Division of Applied Sciences
Harvard University, Cambridge, MA, USA
chn@das.harvard.edu

Julia Hirschberg
AT&T Bell Laboratories
Mountain Avenue, Murray Hill, NJ, USA
julia@research.att.com

Barbara J. Grosz
Aiken Computation Laboratory, Division of Applied Sciences
Harvard University, Cambridge, MA, USA
grosz@das.harvard.edu

* The research reported here was partially supported by NSF IRI grants from the National Science Foundation.
† Partially supported by a National Science Foundation Graduate Research Fellowship.

Abstract

A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new corpus of direction-giving monologues collected in both spontaneous and read speaking styles. Preliminary analyses of the direction-giving corpus show that the availability of speech significantly affects the reliability of discourse segmentation for a set of trained discourse labelers.

Introduction

This paper reports on ongoing corpus-based research on the intonational characteristics of spoken discourse in American English. The scientific goal of this research is to lay the foundations for a bootstrapping process in which empirical evidence from spoken language informs us of strengths and weaknesses in a discourse theory, and in which our best current understanding of discourse structure suggests more sophisticated interpretations of intonational meaning. The technological goal of this research is to improve the quality of speech synthesis by exploiting the ability of intonation to reliably convey linguistic structure at the discourse level.

Cognitive studies based on linguistic research have shown that the lack of contextually appropriate intonational variation can hinder processing by the human listener (Terken & Nooteboom; Nooteboom & Kruyt). Yet algorithms for manipulating prosodic variation lag behind even our present understanding of how intonational meaning is conveyed, despite the fact that basic synthesis technologies for producing natural intonation already exist [1].

[1] That is, input to systems such as DECTalk and the AT&T Text-to-Speech System can be hand-annotated to produce quite natural sounding speech.

Theoretical and Methodological Foundations

Several decades of research have resulted in numerous findings on how discourse-level meaning can be conveyed by acoustic-prosodic properties such as pitch range and pausal duration (Avesani & Vayra; Ayers; Brown, Currie, & Kenworthy; Lehiste; Silverman; cf. Woodbury), amplitude (Brown, Currie, & Kenworthy), speaking rate (Lehiste), and intonational prominence (Brown; Terken). Most of these studies have relied on intuitive analyses of notions such as topic structure, or on operational definitions of discourse-level properties, such as paragraph markings, as indicators of discourse segment boundaries.

In contrast to most previous work, two recent studies utilized an independent definition of discourse structure to obtain discourse segmentation data from multiple subjects. In (Hirschberg & Grosz; Grosz & Hirschberg), discourse structural elements were determined by trained subjects following (Grosz & Sidner), and were correlated with intonational properties. In (Passonneau & Litman), discourse segmentations were obtained from naive subjects based on an informal notion of speaker intention. For a narrative corpus, pausal duration above a certain threshold predicted segment boundaries with high recall but low precision. Passonneau and Litman suggest that intonational cues be integrated with text-based cues such as cue phrases (Hirschberg & Litman) and other lexical information (Morris & Hirst; Hearst) in spoken language processing systems using multiple knowledge sources.

The potential contributions of speech cues in such an architecture remain largely unexplored. Intonational variables need to be interrelated in new algorithms
and a fuller spectrum of speech properties needs to be correlated with a theoretically motivated yet empirically determined representation of discourse structure. The approach we have taken in our work is to (1) conduct corpus-based empirical work on intonational features of spoken language; (2) analyze discourse properties based on an independently motivated theory of discourse structure; and (3) examine the correlations between the two sources of linguistic structure.

Prosodic Analysis

The methods we use for measuring speech properties such as rate, energy (rms), pauses, and fundamental frequency are widely used in the speech community. These measures can be obtained automatically given orthographic and prosodic transcriptions of the speech. The prosodic transcription, a more abstract representation of the intonational prominences, phrasing, and melodic contours, is obtained by hand-labeling. We employ the ToBI standard for prosodic transcription (Silverman et al.; Pitrelli, Beckman, & Hirschberg), which is based upon Pierrehumbert's theory of American English intonation (Pierrehumbert).

The ToBI transcription provides us with a breakdown of the speech sample into minor or intermediate phrases, in Pierrehumbert's terms (Pierrehumbert). This level of prosodic phrase serves as our primary unit of analysis for measuring both speech and discourse properties. For each intermediate phrase, we calculate values for pitch range (from the fundamental frequency (f0) maximum occurring within an accented syllable in the phrase); amount of f0 change between phrases (the f0 of phrase i compared with the f0 of the adjacent phrase); amplitude and energy (rms) maxima within the vowel of the syllable containing the phrase's f0 peak; contour type and type of nuclear accent, identified in ToBI notation; speaking rate, measured in syllables per second (sps); and pausal duration between intermediate as well as intonational phrases.

Discourse Structure Analysis

We base our discourse analysis on the theory of discourse structure presented in (Grosz & Sidner), hereafter G&S, in which discourse structure is comprised of (1) intentional structure, (2) attentional state, and (3) linguistic structure. G&S's model also distinguishes between two levels of discourse processing, global and local (Grosz; Grosz & Sidner). Discourse segments, embedding relations, discourse segment purposes (DSPs), and relations between them are part of the global level. Attentional state at this level is modeled by a stack of focus spaces. The local level of discourse structure concerns features of the utterances within a discourse segment and relations among them. Attentional state at this level is modeled by centering theory (Grosz, Joshi, & Weinstein).

Intonation is an element of the linguistic structure that can provide information important for computing both attentional state and intentional structure. In our research, G&S's model of discourse structure provides both a foundation for segmenting discourses into constituent parts and a set of theoretical constructs that may serve to mediate our interpretation of the discourse functions of intonational features. Further, intonation provides information about both levels of discourse structure. For example, at the global level, cue phrases that mark segment boundaries (Sidner; Reichman-Adar) exhibit reliable intonational properties (Hirschberg & Litman; Hirschberg). At the local level, intonation may indicate whether a phrase is parenthetical or may influence the perceived salience of some mentioned entity.

We devised a set of instructions based on G&S for labeling the intentional and linguistic structures at both the local and global levels (Hirschberg & Grosz; Grosz & Hirschberg). While the studies reported here utilize these so-called expert instructions, a parallel set of intention-based segmentation instructions suitable for naive subjects is being developed for use in the Boston Directions study, which is described below.

Speech Corpora

We utilize three corpora in our investigations: (1) professionally read AP news stories; (2) non-professional spontaneous narrative; and (3) non-professional elicited task-oriented monologues, both spontaneous and read. Below we summarize results of two pilot studies utilizing the first two corpora, respectively. The first pilot study investigated intonational correlates of discourse structure, while the second focused on discourse structural constraints on intonational prominence. Although the pilot study results were encouraging, our experiences with the respective corpora revealed ways in which choices of speaking style (e.g. read vs. spontaneous, professional vs. non-professional) and genre generally influence both discourse and speech properties. These single-speaker corpora also did not address the problem of individual variation across speakers. To overcome problems with these corpora, a third corpus of multi-speaker elicited task-oriented monologues was designed.

This corpus, the Boston Directions Corpus, exhibits discourse and speech properties more characteristic of the language used in interactive spoken language systems. After describing the corpus, we present recent initial results on a portion of it that extend findings of our pilot studies.

Pilot Study 1: Intonational Correlates of Discourse Structure

In one set of pilot studies (Hirschberg & Grosz; Grosz & Hirschberg), we analyzed a corpus of three Associated Press (AP) news stories recorded by a
professional speaker. Results confirmed previous findings that pitch range and timing variation are important in signaling topic structure, and further established that these relationships hold when topic structure has been independently determined from consensus subject labeling.

Analysis & Results

Two groups of subjects labeled the stories using the expert intention-based discourse segmentation instructions. One group labeled from text alone (group T), while the other group annotated the text while simultaneously listening to the corresponding speech (group S). We then analyzed intonational and acoustic features for those discourse structural elements agreed upon by all labelers in a given group (the consensus labels) [2]. We separately examined group T's and group S's consensus labelings for discourse segment beginnings (SBEGs) and discourse segment endings (SFs) for one news story.

[2] Use of consensus labels is a conservative measure of labeler agreement; average inter-labeler agreement for structural elements varied within each group. Results in (Passonneau & Litman) show that, with a larger number of labelers, more sophisticated criteria such as boundary strength can be usefully employed.

We found statistically significant correlations of aspects of pitch range, amplitude, and timing with features of global and local structure for both group T and group S labelings [3]. Further analyses of this and two additional news stories showed that global and local structures could be reliably identified from hand-labeled acoustic and prosodic features, with cross-validated success rates reported in (Hirschberg & Grosz; Grosz & Hirschberg). Two central contributions of this pilot study were (a) the discovery of new relationships among intonational features of discourse structure; and (b) the development of a methodology for obtaining discourse segmentations by theoretically motivated empirical methods independent of the acoustic and prosodic factors under investigation.

[3] Details are reported in (Hirschberg & Grosz; Grosz & Hirschberg).

These results demonstrated the possibility of discovering intonational correlates of discourse structure through corpus analysis. However, the study also revealed certain weaknesses in our corpus and in our methodology. Some aspects of news stories proved difficult for our labelers to segment reliably. Also, radio speech has been claimed to exhibit certain idiosyncrasies, which might confound our results, since a major goal of radio news writers is to capture and maintain listener attention. This goal of engaging an audience may interact with other discourse purposes. In addition, the normal processes of news editing may alter the originally intended discourse structure of the news story. Issues such as these introduce difficulties into segmentation analysis. We also wanted to see if our results from speech read by a professional speaker would generalize to spontaneous speech and to speech from non-professionals.

Pilot Study 2: Intonational Prominence and Discourse Structure

A second pilot study investigated the relationship between intonational prominence assignment and local and global discourse structure in spontaneous narrative (Nakatani). Results of (Brown) showed a general tendency for given information to be unaccented and new information accented, where lexical items referring to referents previously mentioned in the discourse are considered given. Other research on the problem of predicting accented given information has identified a variety of relevant factors, e.g. (Horne) on metrical-phonological constraints, (Selkirk) on syntactic factors, (Terken & Hirschberg) on persistence of grammatical function and surface position, and (Hirschberg) on the interaction of these and other variables in pitch accent assignment algorithms trained on corpora. While (Terken) studied an additional factor, namely discourse structural constraints, he operationally defined the notion of discourse or topic structure based on task structure (Terken). Referents were identified as given/new relative to a topic segment. We extend Terken's findings by providing an independent discourse analysis for our narrative and by recasting the given/new distinction at two levels of discourse structure, using G&S's attentional state model to identify discourse referents as locally given/new and globally given/new.

Analysis & Results

The animate noun phrase referring expressions in a single-speaker, unrestricted spontaneous narrative were analyzed for lexical form, grammatical function, and intonational prominence (i.e. H* (high star) or complex pitch accents in Pierrehumbert's notation (Pierrehumbert)) [4]. The linguistic and intentional structures were analyzed according to the expert intention-based instructions. We found statistically significant asymmetries in the interactions of accentuation with grammatical position and lexical form, which we accounted for by noting that the presence or absence of intonational prominence combines with lexical and syntactic information to mark shifts in attention at both the local and global levels of discourse structure.

[4] The narrative was collected by Virginia Merlini for the purpose of studying American gay male speech. We thank Mark Liberman at the University of Pennsylvania for making it available to us.

While this study confirmed general findings that full forms bear accent and reduced forms do not (Brown; Terken), it also went beyond previous work in suggesting a uniform explanation of both expected
patternings and so-called mismatches between lexical form and accentuation, i.e. cases of accented pronouns and unaccented full forms. So-called mismatches may be reinterpreted as cases in which accent marks the attentional status of a discourse referent, where lexical form and grammatical position convey conflicting statuses.

However, the problem of reliable segmentation applied acutely to our narrative, which is considerably longer than the AP news stories; by comparison, the task-oriented speech segments studied in (Brown) averaged a few hundred words. Also, owing to the subject matter of the narrative, the majority of referring expressions were realized as pronouns and proper names. To further test and refine our hypotheses, we wanted to examine more segmentation data and a fuller variety of referring expressions produced by multiple speakers.

Current Investigations: Boston Directions Corpus

To build upon our preliminary results, we have undertaken a more extensive study using spontaneous and read speech in a direction-giving task. The new corpus is made up of elicited monologues produced by multiple non-professional speakers, who are given written instructions to perform a series of increasingly complex direction-giving tasks. Speakers first explain simple routes, such as getting from one station to another on the subway, and progress gradually to the most complex task of planning a round-trip journey from Harvard Square to several Boston tourist sights. The speakers are provided with various maps, and may write notes to themselves as well as trace routes on the maps. For the duration of the experiment, the speakers are in face-to-face contact with a silent experimental partner (a confederate) who traces on her map the routes described by the speakers. The speech is subsequently orthographically transcribed, with false starts and other speech errors repaired or omitted; subjects return several weeks after their first recording to read the transcribed speech. Both sets of recordings are then acoustically and prosodically labeled [5]. Preliminary results described below are available for both the spontaneous and the read speech for one speaker performing five direction-giving tasks. These tasks yielded parallel sets of read and spontaneous speech, each measured in minutes and in intermediate phrases.

[5] Speech recording and analysis is carried out using the WAVES software package (Talkin) and the ToBI labeling convention and tools (Silverman et al.; Pitrelli, Beckman, & Hirschberg) on Silicon Graphics workstations.

Analysis & Results: Discourse Segmentation

Discourse segmentations using the expert instructions were obtained from three subjects labeling from text alone (group T) and three labeling from speech and text (group S). Percentages for consensus labels for segment-initial (SBEG), segment-final (SF), and segment-medial (SCONT, defined as neither SBEG nor SF) are given in Table 1 [6]. Two interesting trends emerge. First, in contrast to the AP news story findings, group S segmentations differ significantly from those of group T. Table 1 shows that listening to speech while segmenting produces more consensus boundaries for both read and spontaneous speech than does segmenting from text alone. When the read and spontaneous data are pooled, labelers from speech and text agree upon significantly more SBEG boundaries, as well as SF boundaries, than labelers from text alone. Further, it is not the case that segmenters from text alone simply choose to place fewer boundaries in the discourse; if this were so, then we would expect a high number of SCONT consensus labels where no SBEGs or SFs were identified. Instead, we find that the number of consensus SCONTs is significantly higher for labelings from speech and text, for both read and spontaneous speech. These factors combine to yield significantly higher percentages of consensus labels overall for labelings from speech and text, for both read and spontaneous speech. We conclude that aspects of the speech signal can help disambiguate among alternate segmentations of the same text, and thus the availability of speech critically influences the outcome of discourse structure analysis.

The second trend to emerge concerns a somewhat surprising effect of speaking style on segmentation, namely that of read versus spontaneous speaking modes. Spontaneous speech is generally claimed to exhibit less reliable prosodic indicators of discourse structure than read speech (cf. (Ayers)). Yet in our corpus, spontaneous speech actually produced significantly more SCONT consensus labels than did read speech, for groups S and T combined. The higher overall percentages of consensus labels for spontaneous speech are attributable to this difference in SCONT labelings.

[6] Note that the value in Table 1 for "Sum of all types" can be slightly less than the sum of percentages for the three types, due to the fact that one phrase may be simultaneously labeled segment-initial and segment-final. This occurs when a single phrase comprises a complete segment.
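
As an illustration of how consensus percentages of the kind reported in Table 1 can be tallied, the following Python sketch checks, for each intermediate phrase, whether every labeler in a group agreed on SBEG, on SF, or on neither (SCONT). The function name, the data layout (one set of labels per phrase per labeler), and the toy labelings are assumptions made for illustration; they are not the scripts or the data used in the study.

```python
def consensus_percentages(labelings):
    """Tally consensus labels for one group of labelers.

    labelings: one list per labeler; each list holds one set per
    intermediate phrase, containing "SBEG" and/or "SF", or empty when
    the labeler marked the phrase as segment-medial.
    Returns percentages of phrases with consensus SBEG, SF, SCONT, and
    with any consensus label at all ("ANY").
    """
    n_phrases = len(labelings[0])
    if n_phrases == 0 or any(len(lab) != n_phrases for lab in labelings):
        raise ValueError("all labelers must label the same non-empty phrase list")

    counts = {"SBEG": 0, "SF": 0, "SCONT": 0, "ANY": 0}
    for phrase_labels in zip(*labelings):
        sbeg = all("SBEG" in labels for labels in phrase_labels)
        sf = all("SF" in labels for labels in phrase_labels)
        scont = all(len(labels) == 0 for labels in phrase_labels)
        counts["SBEG"] += sbeg
        counts["SF"] += sf
        counts["SCONT"] += scont
        # A phrase is counted once here even if it is both SBEG and SF.
        counts["ANY"] += sbeg or sf or scont
    return {name: 100.0 * c / n_phrases for name, c in counts.items()}


# Hypothetical labelings from three labelers over five phrases.
labeler_a = [{"SBEG"}, set(), {"SF"}, {"SBEG", "SF"}, set()]
labeler_b = [{"SBEG"}, set(), set(), {"SBEG", "SF"}, {"SF"}]
labeler_c = [{"SBEG"}, set(), {"SF"}, {"SBEG", "SF"}, set()]
result = consensus_percentages([labeler_a, labeler_b, labeler_c])
print(result)
```

In this toy example the fourth phrase is a complete one-phrase segment, so it counts toward both SBEG and SF but only once toward the overall total, which is why the overall percentage is smaller than the sum of the three per-type percentages.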

CONSENSUS LABELS FOR READ SPEECH

                      Seg-initial   Seg-final   Segment-medial   Sum of all types
                      (SBEG)        (SF)        (SCONT)
Text alone (T)
Speech & Text (S)

CONSENSUS LABELS FOR SPONTANEOUS SPEECH

                      Seg-initial   Seg-final   Segment-medial   Sum of all types
                      (SBEG)        (SF)        (SCONT)
Text alone (T)
Speech & Text (S)

Table 1: Percentage of Consensus Labels by Segment Boundary Type
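
The group comparisons in the segmentation analysis are reported with p values and degrees of freedom. A minimal sketch of one way such a comparison could be run is a Pearson chi-square test on a 2x2 table of consensus versus non-consensus phrase counts for groups S and T. The choice of test for this particular comparison, the function name, and the counts below are all assumptions for illustration, not figures or procedures taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: phrases with and without a consensus SBEG label.
#                  consensus   no consensus
# group S (speech)     30           70
# group T (text)       10           90
stat, p = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {stat:.2f}, df = 1, p = {p:.5f}")
```

With real data, the rows would hold the counts for the two labeling conditions and the columns the counts of phrases that did or did not receive the consensus label under test.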

Analysis & Results: Intonational Correlates

We examined the following acoustic and prosodic correlates of consensus labelings of intermediate phrases labeled as SBEGs and SFs: f0 maximum and average f0; rms maximum and average; speaking rate; and duration of preceding and subsequent pauses. We compared segmentation labels not only for group S versus group T, but also for spontaneous versus read speech. As noted, while intonational correlates for segment boundaries have been identified in read speech, they have been observed in spontaneous speech rarely and descriptively.

We found strong correlations for consensus SBEG and SF labels for groups S and T in both spontaneous speech and read speech [7]. Results on consensus SBEG labels were as follows: given group T segmentations, we found significantly higher maximum and average f0, higher maximum and average rms, and shorter subsequent pause for both spontaneous and read speech; for read speech we also found significant correlations for preceding pauses. Given group S segmentations, we found significantly higher maximum and average f0, higher maximum rms, and longer preceding and shorter succeeding pauses for read and spontaneous speech; we found higher average rms as well for read speech.

[7] T-tests were used to test for statistical significance of difference in the means of phrases, e.g. beginning and not beginning segments. Results reported are statistically significant.

Results on consensus SF labels were as follows: given group T segmentations, we found significantly lower average f0 and rms maximum for both read and spontaneous speech, and lower rms average and subsequent pause in addition for read speech. Given group S segmentations, we found lower average f0, lower rms maximum and average, shorter preceding pause, and longer subsequent pause for both read and spontaneous speech, and in addition, lower f0 maximum for read speech.

While these results now hold for only a single speaker, they are quite encouraging. We may hypothesize that speakers can convey structural information about a discourse in spontaneous as well as in read speech. We may also hypothesize that this structural information is in fact signalled, at least in part, by prosodic and acoustic information, since discourse labelings produced while listening to speech correlated with more acoustic-prosodic features than labelings from text alone. Certain acoustic-prosodic features, such as preceding pause, appear to have been made use of in segmentation decisions for group S.

Analysis & Results: Intonational Prominence

The noun phrases in the read speech for the five direction-giving discourses were analyzed for lexical form (e.g. proper names, definite/indefinite NPs), grammatical function (e.g. subject, direct object, prepositional object), surface-order position (sentence-initial, medial, final), position in major intonational phrase (phrase-initial, medial, final), and accentuation (unaccented or pitch accent type). Similar to the pilot study findings, a share of the noun phrases were accentually reduced, i.e. bearing fewer pitch accents than the citation form [8]. Preliminary analysis indicates that lexical form and grammatical function are significant factors in determining accentuation, with names being less reduced than full NPs and subjects less reduced than objects [9]. Surface-order and intonational phrase position were not significant.

[8] The AT&T Text-to-Speech System (TTS) was used to determine the majority of citation-form accent assignments. Two native speaker judgments were used for items not in the TTS lexicon, such as street and restaurant names.

[9] Chi-square tests were used to test significance. Results reported are statistically significant.

As for the pilot study, the simple notion that references to given entities in the discourse should be accentually reduced fails to predict accentuation, since re-introductions of discourse-old entities were often accented. Interestingly, it was not the case either that repetitions of the same referring expressions were accentually reduced any more than were alternate lexical expressions referring to discourse-old entities. In the case of repeated referring expressions, the second
mention was usually unreduced in intonational prominence when a discourse segment boundary intervenes between it and the first mention. Thus, accentual reduction cannot be considered an epiphenomenon of lexical givenness; if it were so, then lexical repetitions should simply be reduced. Rather, discourse structure interacts with lexical and other factors to constrain the deaccentuation of given information.

Finally, certain cases of accentual reduction did not arise previously in the narrative pilot study. In the Boston Directions Corpus, head nouns were frequently deaccented in full NPs with accented adjectival modifiers. This phenomenon typically occurs when the speaker contrasted two referential tokens of the same type, e.g. RED line SUBWAY vs. GREEN line subway; RIGHT TURN vs. ANOTHER right turn. However, the two tokens were not confined to the same discourse segment. This poses a problem for the global focusing mechanism in (Grosz & Sidner), which assumes that entities in sister segments cannot be simultaneously in global focus. We will explore whether limited relaxations of this assumption, such as considering the most recently popped focus space to be in non-immediate global focus, can account for our cases of accentual reduction. A similar relaxation was necessary to account for the deaccentuation of object proper names in the narrative study.

Conclusion

Our studies of intonation and discourse provide empirical evidence that discourses can be segmented reliably, that intonation is used by speakers to convey linguistic structure at the discourse level, and that the relationship among intonational features and discourse elements is more complex than previous studies have suggested. Preliminary analysis of the Boston Directions Corpus has supported these hypotheses, and has also uncovered important effects of speaking style and segmentation methodology on the ability to obtain reliable analyses of discourse structure. Contrary to expectation, we found that discourse structure analysis is most robust for spontaneous speech labeled from speech and text together. Our continuing analysis of this corpus will test the generality of these trends against more data, including speech from multiple speakers and discourse segmentations produced by naive subjects. Findings of these corpus-based studies, in sum, suggest that looking at spoken language can lead to improvements in the descriptive and computational adequacy of theories about discourse structure, as well as theories of intonational meaning.

Acknowledgements

The authors thank Nancy Chang, Andy Kehler, Candy Sidner, and Gregory Ward for their expert participation in this research.

References

Avesani, C., and Vayra, M. Discorso, segmenti di discorso e un'ipotesi sull'intonazione. In Atti del Convegno Internazionale "Sull'Interpunzione".

Ayers, G. M. Discourse functions of pitch range in spontaneous and read speech. Presented at the Linguistic Society of America Annual Meeting.

Brown, G.; Currie, K.; and Kenworthy, J. Questions of Intonation. Baltimore: University Park Press.

Brown, G. Prosodic structure and the given/new distinction. In Ladd, D. R., and Cutler, A., eds., Prosody: Models and Measurements. Berlin: Springer Verlag.

Grosz, B., and Hirschberg, J. Some intonational characteristics of discourse structure. In Proceedings of the International Conference on Spoken Language Processing. Banff: ICSLP.

Grosz, B. J., and Sidner, C. L. Attention, intentions, and the structure of discourse. Computational Linguistics.

Grosz, B.; Joshi, A.; and Weinstein, S. Providing a unified account of definite noun phrases in discourse. In Proceedings of the Annual Meeting. Cambridge, MA: Association for Computational Linguistics.

Grosz, B. J. The representation and use of focus in dialogue understanding. Technical Report, SRI International, Menlo Park, Ca. University of California at Berkeley PhD Thesis.

Hearst, M. Multi-paragraph segmentation of expository discourse. In Proceedings of the Annual Meeting. Las Cruces, NM: Association for Computational Linguistics.

Hirschberg, J., and Grosz, B. Intonational features of local and global discourse structure. In Proceedings of the Speech and Natural Language Workshop. Harriman, NY: DARPA.

Hirschberg, J., and Litman, D. Now let's talk about now: Identifying cue phrases intonationally. In Proceedings of the Annual Meeting. Stanford University: Association for Computational Linguistics.

Hirschberg, J., and Litman, D. Empirical studies on the disambiguation of cue phrases. Computational Linguistics.

Hirschberg, J. Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence.

Horne, M. Why do speakers accent "given" information? In Proceedings of the Second European Conference on Speech Communication and Technology. Genova: Eurospeech.

Lehiste, I. Perception of sentence and paragraph boundaries. In Lindblom, B., and Oehman, S., eds., Frontiers of Speech Research. London: Academic Press.

Lehiste, I. Phonetic characteristics of discourse. Paper presented at the Meeting of the Committee on Speech Research, Acoustical Society of Japan.

Morris, J., and Hirst, G. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics.

Nakatani, C. H. Accenting on pronouns and proper names in spontaneous narrative. In Proceedings of the European Speech Communication Association Workshop on Prosody. Lund, Sweden: ESCA.

Nakatani, C. H. Discourse structural constraints on accent in spontaneous narrative. In Proceedings of the European Speech Communication Association/IEEE Workshop on Speech Synthesis. New Paltz, NY: ESCA/IEEE.

Nooteboom, S. G., and Kruyt, J. G. Accent, focus distribution, and the perceived distribution of given and new information: An experiment. Journal of the Acoustical Society of America.

Passonneau, R., and Litman, D. Feasibility of automated discourse segmentation. In Proceedings of ACL. Ohio State University: Association for Computational Linguistics.

Pierrehumbert, J. B. The Phonology and Phonetics of English Intonation. PhD Dissertation, Massachusetts Institute of Technology. Distributed by the Indiana University Linguistics Club.

Pitrelli, J. F.; Beckman, M.; and Hirschberg, J. Evaluation of prosodic transcription labeling reliability in the ToBI framework. In Proceedings of ICSLP. Yokohama: International Conference on Spoken Language Processing.

Reichman-Adar, R. Extended person-machine interface. AI Journal.

Selkirk, E. O. Sentence Prosody: Intonation, Stress and Phrasing. Basil Blackwell.

Sidner, C. L. Focusing in the comprehension of definite anaphora. In Brady, M., and Berwick, R., eds., Computational Models of Discourse. Cambridge, MA: MIT Press.

Silverman, K.; Beckman, M.; Pierrehumbert, J.; Ostendorf, M.; Wightman, C.; Price, P.; and Hirschberg, J. ToBI: A standard scheme for labeling prosody. In Proceedings of ICSLP. Banff: International Conference on Spoken Language Processing.

Silverman, K. The Structure and Processing of Fundamental Frequency Contours. PhD Dissertation, Cambridge University, Cambridge, UK.

Talkin, D. Looking at speech. Speech Technology.

Terken, J., and Hirschberg, J. Deaccentuation and persistence of grammatical function and surface position. Ms.

Terken, J., and Nooteboom, S. G. Opposite effects of accentuation and deaccentuation on verification latencies for given and new information. Language and Cognitive Processes.

Terken, J. The distribution of pitch accents in instructions as a function of discourse structure. Language and Speech.

Woodbury, A. C. Rhetorical structure in a central Alaskan Yupik Eskimo traditional narrative. In Sherzer, J., and Woodbury, A., eds., Native American Discourse: Poetics and Rhetoric. Cambridge, UK: Cambridge University Press.