An Analysis of Sentence Segmentation Features for

Broadcast News, Broadcast Conversations, and Meetings

12 1 Sebastien Cuendet1 Elizabeth Shriberg Benoit Favre

[email protected] [email protected] [email protected] 1 James Fung1 Dilek Hakkani-Tur

[email protected] [email protected] 2 1 ICSI SRI International 1947 Center Street 333 Ravenswood Ave

Berkeley, CA 94704, USA Menlo Park, CA 94025, USA

t sp eaking stylesbroadcast news BN broadcast

ABSTRACT dieren

conversations BC and facetoface multiparty meetings

Information retrieval techniques for sp eech are based on

MRDA We fo cus on the task of automatic sentence seg

those develop ed for text and thus exp ect structured data as

mentation or nding b oundaries of sentence units in the

An essential task is to add sentence b oundary infor input

otherwise unannotated devoid of punctuation capitaliza

mation to the otherwise unannotated stream of out

tion or formatting stream of words output by a sp eech

systems We analyze put by automatic sp eech recognition

recognizer

sentence segmentation p erformance as a function of feature

Sentence segmentation is of particular imp ortance for sp eech

ual versus automatic for news typ es and transcription man

understanding applications b ecause techniques aimed at se

sp eech meetings and a new corpus of broadcast conversa

mantic pro cessing of sp eech inputsuch as machine trans

tions Results show that overall features for broadcast

lation are typ

news transfer well to meetings and broadcast conversations

ically develop ed for textbased applications They thus as

pitch and energy features p erform similarly across cor

sume the presence of overt sentence b oundaries in their in

p ora whereas other features duration pause turnbased

put In addition many sp eech pro cessing tasks

and lexical show dierences the eect of sp eech recogni

show improved p erformance when sentence b oundaries are

tion errors is remarkably stable over features typ es and cor

provided F or instance sp eech summarization p erformance

p ora with the exception of lexical features for meetings and

improves when sentence b oundary information is provided

broadcast conversations a new typ e of data for sp eech

as observed in Similarly named entity extraction and

technology b ehave more like news sp eech than like meetings

partofsp eech tagging in sp eech is improved using sentence

for this task Implications for mo deling of dierent sp eaking

b oundary cues in and the use of sentence b oundaries for

styles in sp eech segmentation are discussed

is shown to b e b enecial for machine

in Sentence b oundary annotation is also

General Terms translation

imp ortant for aiding human readability of the output of au

Proso dic Mo deling Sentence Segmentation

tomatic sp eech recognition systems and could b e used for

determining semantically and proso dically coherent b ound

for playback of sp eech to users in tasks involving audio

Keywords aries

search

Sp oken Language Pro cessing Sentence Segmentation Broad

While sentence segmentation of broadcast news and to

cast Conversations Sp ontaneous Sp eech Proso dy Bound

some extent of meetings has b een studied in previous work

ary Classication Bo osting

little is known ab out broadcast conversations Indeed data

for this task has only recently b ecome available for work

sp eech technology Studying the prop erties of broadcast

1. INTRODUCTION in

conversations and comparing them with those of meetings

We investigate the role of identicallydened lexical and

and broadcast news is of interest b oth theoretically and also

proso dic features when applied to the same task across three

practically esp ecially b ecause there is currently less data

available for broadcast conversations than for the other two

typ es studied here For example if two sp eaking styles share

haracteristics one can p erform adaptation from one to an

Permission to make digital or hard copies of all or part of this work for c

to improve the p erformance of the sentence segmen

personal or classroom use is granted without fee provided that copies are other

as proved previously for meetings by using conversa

not made or distributed for profit or commercial advantage and that copies tation

telephone sp eech

bear this notice and the full citation on the first page. To copy otherwise, to tional

goal of this study is to analyze how dierent sets of

republish, to post on servers or to redistribute to lists, requires prior specific The

including lexical features proso dic features and permission and/or a fee. features

Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...5.00.

MRDA TDT BC

their combination p erform on the task of automatic sen

Training set size

tence segmentation for dierent sp eaking styles More sp ecif

Test set size

ically we ask the following questions

Heldout set size

Vo cabulary size

Ho w do dierent feature typ es p erform for the dierent

Mean sentence length

sp eaking styles

Table Data set statistics Values are given in

What is the eect of sp eech recognition errors on p er

num b er of words based on the output of the sp eech

formance and how do es this eect dep end on the fea

recognizer STT

ture typ es or on the sp eaking style

For this task are broadcast conversations more like

broadcasts or more like conversations general tend to contain syntactically simpler sentences and

signicant pronominalization News sp eech is typically read

Results have implications not only for the task of sentence

from a transcript and more closely resembles written text

b oundary detection but more generally for proso dic mo del

It contains for example app ositions center emb eddings and

ing for natural language understanding across genres

prop er noun comp ounds among other characteristics that

The next section describ es the data set features and ap

contribute to longer sentences Discourse phenomena also

proach to sentence segmentation Section rep orts on ex

obviously dier across corp ora with meetings containing

p eriments with proso dic and lexical features and provides

more turn exchanges incomplete sentences and higher rates

further analysis and a discussion of usage of various feature

of short backchannels such as y eah and uhhuh than

typ es or groups and comparison across sp eaking styles A

sp eech in news broadcasts and in the broadcast conversa

summary and conclusions are provided in Section

tions

Sentence b oundary lo cations are based on reference tran

for all three corp ora Sentences b oundaries are

2. METHOD scriptions

annotated in BN transcripts directly For the meeting data

are obtained by mapping dialog act b oundaries

2.1 Data and annotations b oundaries

to sentence b oundaries The meetings data are lab eled ac

To study the dierences b etween the meetings BN and

cording to classes of dialog acts backchannels o orgrabb ers

BC sp eech for the task of sentence segmentation we use the

and o orholders questions statements and incompletes

ICSI Meetings MRDA the TDT English Broadcast

In order to b e able to compare the three corp ora all dia

YQ Broadcast Conversations News and the GALE

log act classes are mapp ed to the sentence b oundary class

corp ora

The BC Corpus frequently lacked sentence b oundary anno

The ICSI Meeting Corpus is a collection of meetings

tations but included line breaks and capitalization as well

including simultaneous multichannel audio recordings word

as dialog act tag annotations In order to use this data

level orthographic transcriptions The meetings range in

we implemented heuristic rules based on human analysis to

length from to minutes but generally run just under

pro duce punctuation annotations For BC data we similarly

an hour each summing to hours We use a meeting

mapp ed the typ es of dialog acts dened statements ques

subset of this corpus that was also used in the previous re

tions backchannels and incompletes to sentence b ound

search with the same split into training heldout and

aries Note that we have chosen to map incomplete sentence

test sets TDT Corpus was collected by LDC and includes

b oundaries to the b oundary class even though they are not

multilingual raw material news wires and other electronic

full b oundaries This is b ecause the rate of incompletes

text web audio broadcast radio and television We use

while not negligible was to o low to allow for adequate train

a subset of TDT English broadcast radio and television

ing of a third class in the BC data given the size of the cur

data in this study The GALE YQ Broadcast Conversa

rently available data We thus chose to group it with the

tions Corpus also collected by LDC is a set of instudio

b oundary class even though incompletes also share some

talk shows with two or more participants including formal

characteristics with nonb oundaries Namely material to

oneonone interviews and debates with more participants

the left of an incomplete resembles nonb oundaries whereas

Three shows last for half an hour and the rest of the shows

material to the right resembles b oundaries

run an hour each for a total of ab out hours

In the exp eriments to follow classication mo dels are

on a set of data tuned on a heldout set and tested

trained 2.2 Automatic

on an unseen test set within each genre The corp ora are Automatic sp eech recognition results for the ICSI Meet

available with the words transcrib ed by humans reference ings data the TDT data and the BC data were obtained

and with the words output by the sp eech recognizer STT using the stateoftheart SRI conversational sp eech recog

For the reference conditions word start and end times are nition system BN system and BC system

obtained by using a exible alignment pro cedure Ref resp ectively The meetings recognizer was trained using no

erence b oundaries in sp eech recognizer output and exible acoustic data or transcripts from the analyzed meetings cor

alignments are obtained by aligning these with manual tran pus The word error rate for the recognizer output of the

scriptions with annotations Statistics on these data sets are complete meetings corpus is Recognition scores for

shown in Table for the STT conditions the TDT corpus is not easily denable as only closed cap

Note that the three dierent sp eaking styles dier sig tions are available that frequently do not match well with the

nicantly in mean sentence length with sentences in meet actual words of the broadcast news shows The estimated

ings b eing only ab out half the length on average as those word error rate lies b etween and The word error

in broadcast news Meetings and conversational sp eech in rate for the recognizer output of the BC data is

2.3 Features 2.5 Metrics

Sentence segmentation can b e seen as a binary classi The quality of a sentence segmentation is usually com

cation problem in which every word b oundary has to b e puted with Fmeasure and NIST error The Fmeasure is

lab eled as a sentence b oundary or as a nonsentence b ound the harmonic mean of the recall and precision measures of

1

ary We dene a large set of lexical and proso dic features the sentence b oundaries hyp othesized by the classier to the

computed automatically based on the output of a sp eech ones assigned by human lab elers The NIST error rate is the

recognizer ratio of the numb er of wrong hyp otheses made by the classi

er to the numb er of reference sentence b oundaries In this

ork we rep ort only the FMeasure p erformances

Lexical features. w

Previous work on sentence segmentation in broadcast news

h and in telephone conversations has used lexical and

sp eec 2.6 Chance performance computation

proso dic information Additional work has studied

What is of interest in the following exp eriments is the p er

the contribution of syntactic information Lexical fea

formance gain obtained by the classier towards the baseline

tures are usually represented as N grams of words In this

p erformance that one would achieve without any knowledge

work lexical information is represented by N gram fea

ab out the data but the prior of the classes The easiest way

tures for each word b oundary unigrams and

of doing so is to compute the prior probability p s of hav

t

Naming the word preceding the word b oundary of

ing a sentence b oundary on the training set and classify

interest as the current word and the preceding and follow

each word b oundary in the test set as a sentence b oundary

ing words as the pr evious and next word resp ectively the

with probability p s Concretely the chance score is evalu

t

lexical features are as follows

ated by computing the probability of each error and correct

class true p ositives false p ositives and false negatives and

unigrams fpreviousg fcurrentg fnextg

the ensuing value for the FMeasure computation The nal

chance p erformance only dep ends on the prior probabilities

bigrams fcurrent nextg

of having a sentence b oundary on the training set and on

trigram fprevious current nextg

the test set Therefore the chance p erformance can dier

slightly on the reference and STT exp eriments due to word

and deletion errors intro duced by automatic sp eech

Prosodic Features. insertion

recognition

Proso dic information is represented using mainly continu

ous values We use proso dic features dened for and ex

from the regions around each interword b oundary

tracted 3. RESULTS AND DISCUSSION

Features include pause duration at the b oundary normal

This section discusses results for the three dierent cor

ized phone durations of the word preceding the b oundary

p ora in two subsections The main section Section

and a variety of sp eakernormalized pitch features and en

presents results for feature groups eg pitch energy pause

ergy features preceding following and across the b oundary

and for combinations of the groups Section examines

Features are based in part on those describ ed in The

feature subgroups within the pitch and energy features for

extraction region around the b oundary comprises either the

example pitch reset versus pitch range features to gain

words or time windows on either side of the b oundary Mea

further understanding of which features contribute most to

sures include the maximum minimum and mean of pitch

p erformance

and energy values from these wordbased and timebased

Pitch features are normalized by sp eaker using a

regions 3.1 Performance by feature group

metho d to estimate a sp eakers baseline pitch as describ ed

Performance results for exp eriments using one or more

in Duration features which measure the duration of

feature typ es are summarized for reference in Table To

the last vowel and the last rhyme in the word b efore the

convey trends they are plotted in Figure lines connect

word b oundary of interest are normalized by statistics on

p oints from the same data set for readability The feature

the relevant phones in the training data We also include

are arranged in approximate order conditions on the Xaxis

turn features based on sp eaker changes

of p erformance for the b estp erforming condition ie for

A using reference transcriptions

2.4 Boosting Classifiers MRD

Although chance p erformance is higher for MRDA than

For classication of word b oundaries we use the AdaBo ost

for the broadcast corp ora consistent with the shorter aver

algorithm Bo osting aims to combine weak base clas

age sentence length in meetings all corp ora have low chance

siers to come up with a strong classier The learning

p erformance While chance p erformance changes slightly

algorithm is iterative In each iteration a dierent distribu

for reference versus automatic transcriptions they are close

tion or weighting over the training examples is used to give

within a corpus As a consequence one can compare the F

more emphasis to examples that are often misclassied by

Measure results almost directly across conditions To sim

the preceding weak classiers For this approach we use the

plify the discussion we dene the relative error reduction

Bo osTexter to ol describ ed in Bo osTexter handles b oth

Since the FMeasure is a harmonic mean of two error typ es

discrete and continuous features which allows for a conve

one can compute the relative error reduction for a mo del

nient incorp oration of the proso dic features describ ed ab ove

with FMeasure F and the asso ciated chance p erformance c

no binning is needed The weak learners are onelevel de

as

cision trees stumps

1

c F F c

More detailed mo dels may distinguish questions from state

c c

ments or complete from incomplete sentences TDT4 ref TDT4 stt 80 MRDA ref MRDA stt BC stt

60

40 F measure

20

0 Chance Dur Energy Pitch Turn Pause AllPros Lex Lex+Pau ALL

Features

Figure Fmeasure Results by Condition and Features Included in Mo del Dur duration features AllPros

all proso dic feature typ es Lex lexical features LexPau lexical feature plus pause features ALL

lexical plus all proso dic features

Since chance error rates are near the relative reduc

tion in error after normalizing for chance p erformance is

nearly the same value as the FMeasure itself That is an

Fmeasure of corresp onds to a relative error reduction

of ab out for the data sets considered

When comparing p erformance within a corpus TDT or

MRDA for reference versus automatic transcripts results

show a remarkably consistent p erformance drop asso ciated

Features TDT TDT MRDA MRDA BC

with ASR errors This implies that all feature typ es are

ref STT ref STT STT

aected to ab out the same degree by ASR errors The in

Chance

teresting clear exception is the p erformance of lexical fea

Duration

tures for the meeting data which degrades more in the face

Energy

of ASR errors than do other conditions For example rel

Pitch

ative to the allproso dy features condition lexical features

Turn

in this corpus give b etter p erformance for reference tran

Pause

scripts but worse p erformance for automatic transcripts

All pros

In contrast TDT shows ab out the exp ected drop for lex

Lex only

ical features from ASR features One p ossible explanation

Lexpause

is that in MRDA there is a high rate of backchannel sen

ALLlexpros

tences such as uhhuh which comprise a rather small set

of words sometimes with fairly low energy that are more

Table Overall results FMeasure scores for each

prone to recognition errors or that cause class errors when

column for all group of features showed in the rst

misrecognized The same argument could b e made for other

combinations of corpusconditions

frequent words in MRDA that are strong cues to sentence

starts such as I and various llers and discourse mark

ers Further analysis in which selected dialog acts such

as backchannels are removed from the train and test data

could shed light on these hyp otheses

If we consider that the BC data set is much smaller than

the other two sets and thus the training material for the

classier smaller all three corp ora are quite similar in p er also the two feature typ es that are normalized for the par

formance in b oth energy and pitch features although we ticular sp eaker or sp eaker estimated via diarization in our

will see in the next section that within these feature classes exp eriments Our feature sets for each of these two feature

there are some corpus dierences The corp ora also share typ es consisted of three subgroups

the trend that duration features are less useful than pitch or Subgroup uses an estimate of baseline pitch or en

energy features and that pause features are the most use ergy intended to capture the minimum value for the par

ful individual feature typ e Interestingly duration features ticular talker Features in this subgroup reect pitch or en

alone are more useful in MRDA than in either of the broad ergy values in the word or short time window preceding the

cast corp ora A listening analysis using class probabilities b oundary in question and compares those values to the es

of errors from the mo del revealed a p ossible explanation In timated baseline value for the talker The idea is to capture

broadcast sp eech sp eakers do lengthen phones b efore sen how high or low the preb oundary sp eech is relative to that

tence ends but they also lengthen phones considerably in sp eakers range with lower values correlating with sentence

other lo cations including during the pro duction of frequent ends We refer to these features as range features since they

prominences and at the more frequent subsentential syn capture the sp eakers lo cal value within their range Note

tactic b oundaries found in news sp eech Both characteris that b ecause these features lo ok at information prior to the

tics app ear to lead to considerable false alarms on duration b oundary itself they could b e used for online pro cessing

features in the broadcast corp ora ie to predict b oundary typ es b efore the following word is

Another noticeable dierence across corp ora is visible for uttered

turn features Here again the meeting data diers from the Subgroup lo oks across the b oundary ie at words or

broadcast data This result reects b oth the higher rate time windows b oth b efore and after the b oundary in ques

of turn changes in the meeting data esp ecially for short tion These features compare various pitch and energy statis

utterances such as backchannels and the way that the data tics for example maximum mean minimum of the preced

is pro cessed As already mentioned in Section the turn is ing and following sp eech using various normalizations The

computed dierently in the meetings than in the two other idea is to capture resets typical of sentence ends in which

corp ora In the broadcast data the turn is only estimated the pitch or energy has b ecome low b efore the end of a sen

by an external diarization system that may intro duce errors tence and is then reset to a higher value at the onset of the

whereas in the meetings the turn information is the true one next sentence Such features are dened only when b oth

since each sp eaker have their own channel Furthermore the preceding and following sp eech is present within a time

while turns in b oth broadcast and meetings data are broken threshold for any pause at the b oundary We refer to these

by second pauses the meeting pauses are derived from as reset features

the reference or STT transcript while the broadcast data Subgroup like subgroup lo oks only at words or time

pauses come from the lesssophisticated sp eechnonsp windows at one side of the b oundary The idea is to capture eech

the size of pitch or energy excursions by using the slop e of prepro cessor of the diarization system

regions close to the b oundary The slop e is taken from linear A nal observation from Figure concerns the patterns

ts of pitch and energy contours after various prepro cessing for the BC data This is a newer corpus in the sp eech com

techniques to remove outliers Sentence ends in conventional munity and little is understo o d ab out whether it is more like

linguistic studies of proso dy are asso ciated with a large nal broadcast news sp eech or more like conversational sp eech

fall which these features are intended to capture They The results here b oth for chance p erformance and for p er

may also however capture excursions related to prominent formance across feature typ es clearly indicate that in terms

at nonb oundaries of sentence b oundary cues broadcast conversations are more

Results for the three feature typ es for b oth energy and like broadcast news and less like conversations The overall

pitch are shown in Figure For ease of readability lines lower results for BC data are as noted earlier likely ex

connect p oints for the same condition We lo ok only at plained simply by the smaller set of training data available

the three STT conditions since reference results for TDT The one exception visible from Figure in this trend is in

and MRDA show a similar pattern to their resp ective STT the condition using lexical features only We would exp ect

results and using STT results allows us to compare all three the BC result here to b e lower than that for TDT STT

corp ora To compare relative usage of the dierent feature given the overall lower p erformances for BC than TDT

subgroups directly we lo ok at relative error reduction results But instead we see a higherthanexp ected result for BC

see previous section although as noted there the absolute in this condition similar in trend to the pattern seen for

Fmeasure results will lo ok similar MRDA STT for lexical features We hyp othesize that BC

A rst p oint to note ab out Figure which can b e con shares with conversational data the added utility of lexical

strued by comparing to results in Figure is that in all features from either backchannels that start and end sen

conditions subgroups p erform less well on their own than tences or from words like llers discourse markers and rst

the allenergy and allpitch groups This indicates that the p erson pronouns that tend to start sentences Further an

subgroups contribute some degree of complementary infor alyzes of the BC data suggest that while backchannels may

mation Second across all conditions and across the two not play a large role in broadcast conversations the second

feature typ es it is clear that the reset features p erform b et class of words ie those that tend to start sentences are

ter than features based on lo cal range or slope The inter fairly frequent and thus probably aid the lexical mo del

pretation is that it is b etter to use information from words

windows on b oth sides of a b oundary in question than

3.2 Performance by feature subgroup or

to lo ok only at one side or the other Third pitch features

A further feature analysis step is to lo ok more closely at

are relatively more useful than energy features for all three

the two feature typ es that capture framelevel proso dic val

corp ora but the largest dierential is for the TDT data

ues namely pitch features and energy features These are

while the rank of the feature typ es was the same the du

0.40 TDT4 stt

features were comparatively more useful in meetings

MRDA stt ration

in the two other corp ora most likely b ecause of the

BC stt than

tendency of broadcast sp eakers to lengthen phones not only

near sentence b oundaries but also in other lo cations

0.35

A closer lo ok at pitch and energy features in terms of

feature subgroups revealed that subgroups provide comple

mentary information but some subgroups are clearly b et

than others For all three corp ora there was greatest

0.30 PITCH ter

from features that compare sp eech b efore and after

SUBGROUPS b enet

interword b oundaries Broadcast news diered from the

conversational corp ora in b eing able to also take go o d ad

Relative Error Reduction

antage of features that lo ok only at one side of the b ound 0.25 v

ENERGY

likely reecting the more careful and regular proso dic

SUBGROUPS ary

patterns asso ciated with read as opp osed to sp ontaneous

sp eech

0.20

Comparisons of the reference and sp eechtotext condi

showed interestingly that nearly all feature typ es are

Range Reset Slope Range Reset Slope tions

aected to ab out the same degree by ASR errors The excep

Figure Relative error reduction for the three sub

tion was lexical features in the case of the meetings which

groups of features r ange reset and slope for b oth

degrade more than exp ected from ASR errors Possible ex

energy and pitch

planations for this are that sentence segmentation p erfor

mance in meetings relies more heavily on certain oneword

utterance like backchannels as well as on a small class of

Note however that the strongest subgroup ie pitch reset

highly predictive sentence onset words such as I llers

shows nearly identical relative error reduction p erformance

and discourse markers

for b oth TDT and MRDA BC is not far b ehind given the

In future work we plan to explore metho ds for improving

much smaller amount of data available for training

p erformance on BC data including adaptation and addition

Finally these results show one example in which the BC

of similar data from other corp ora We also plan to study

data compares more closely with meeting data than with

the impact of removing sp ecic classes of dialog acts from

broadcast news data TDT more so than the two con

the meetings to determine the b ehavior of lexical features

versational corp ora can make use of additional subtyp es

for this corpus as just describ ed ab ove is related to sp ecic

such as slop e for energy features and range for pitch fea

dialog acts or to some other phenomenon Finally we hop e

tures Thus although BC lo oks more like TDT than like

that further work along the lines of the studies describ ed

MRDA when examining overall feature usage see Figure

herein can add to our longer term understanding of the

it shares with MRDA that certain feature subtyp es such as

relationship b etween sp eaking style and various techniques

those based on lo oking at only one side of a b oundary are

and features for natural language pro cessing

much less robust than the reset features that lo ok across

This suggests that the conversational data may

b oundaries 5. ACKNOWLEDGMENTS

have greater range variation and less dened excursions than

This material is based up on work supp orted by the De

read news sp eech

fense Advanced Research Pro jects Agency DARPA under

tract No HRC Any opinions ndings

4. SUMMARY AND CONCLUSIONS con

and conclusions or recommendations expressed in this ma

We have studied the p erformance of sentence segmenta

terial are those of the authors and do not necessarily re

tion across two sp ontaneous sp eaking stylesbroadcast con

ect the view of DARPA The authors would like to thank

versations and meetings and a more formal onebroadcast

Matthias Zimmermann Yang Liu and Mathew Magimai

news The average length of sentences and comparison of

Doss for their help and suggestions

the lexical and proso dic feature typ es p erformance showed

in terms of sentence b oundary cues broadcast con

that 6. REFERENCES

versations are more like broadcasts and less like meetings

S Cuendet D HakkaniTur and G Tur Mo del

However the p erformance of the lexical features suggests

adaptation for sentence unit segmentation from

that BC shares with meetings the added utility of lexical

sp eech In Proceedings of SLT Aruba

features from word like llers discourse markers and rst

S Furui T Kikuchi Y Shinnaka and C Hori

p erson pronouns Other similarities b etween meetings and

Sp eechtotext and sp eechtosp eech summarization of

BC were also observed such as the b enet of proso dic fea

sp ontaneous sp eech Speech and Audio Processing

tures that lo oking at characteristics of b oth sides rather

IEEE Transactions on

than only one side of an interword b oundary

D HakkaniTur and G Tur Statistical sentence

The three sp eaking styles showed similarities in the role

extraction for information distillation In Proceedings

of individual features Pitch and energy features as overall

Honolulu HI of ICASSP

groups p erform surprisingly similarly in absolute terms for

all three corp ora Also for all corp ora pause features are R Grishman D Hillard Z Huang H Ji

the most useful individual typ e of features and duration D HakkaniTur M Harp er M Ostendorf and

features are less useful than energy and pitch However W Wang Impact of automatic comma prediction on

p osname tagging of sp eech In Spoken Language

Technologies SL T

D Jones W Shen E Shrib erg A Stolcke T Kamm

and D Reynolds Two exp eriments comparing reading

with listening for human pro cessing of conversational

OSPEECH telephone sp eech In Proceedings of EUR

pages

Y Liu E Shrib erg A Stolcke B Peskin J Ang

D Hillard M Ostendorf M Tomalin P Wo o dland

and M Harp er Structural metadata research in the

EARS program In Proceedings of ICASSP

J Makhoul A Baron I Bulyko L Nguyen

L Ramshaw D Stallard R Schwartz and B Xiang

The eects of sp eech recognition and punctuation on

information extraction p erformance In In Proc of

Interspeech Lisb on

E Matusov D Hillard M MagimaiDoss

D HakkaniTur M Ostendorf and H Ney

Improving sp eech translation with automatic

b oundary prediction In Proceedings of ICSLP

Antwerp Belgium

J Mrozinski E W D Whittaker P Chatain and

S Furui Automatic sentence segmentation of sp eech

for In Proc ICASSP

Philadelphia PA

B Roark Y Liu M Harp er R Stewart M Lease

M Snover I Shafran B Dorr J Hale

A Krasnyanskaya and L Yung Reranking for

sentence b oundary detection in conversational sp eech

In Proceedings of ICASSP Toulouse France

R E Schapire and Y Singer Bo ostexter A

b o ostingbased system for text categorization

Machine Learning

E Shrib erg R Dhillon S Bhagat J Ang and

H Carvey The ICSI meeting recorder dialog act

MRDA corpus In Proceedings of SigDial Workshop

Boston MA

E Shrib erg A Stolcke D HakkaniTur and G Tur

Proso dybased automatic segmentation of sp eech into

ation sentences and topics Speech Communic

A Stolcke B Chen H Franco V R R Gadde

M Graciarena MY Hwang K Kirchho

A Mandal N Morgan X Lin T Ng M Ostendorf

K Sonmez A Venkataraman D Vergyri W Wang

J Zheng and Q Zhu Recent innovations in

sp eechtotext transcription at SRIICSIUW In

IEEE Trans Audio Speech and Language Processing

volume pages

S Strassel and M Glenn Creating the annotated

TDT Y evaluation corpus In TDT

Evaluation Workshop NIST

A Venkataraman A Stolcke W Wang D Vergyri

V Gadde and J Zheng SRIs broadcast news

sp eech to text system In EARS Rich Transcription

workshop Palisades

Q Zhu A Stolcke B Chen and N Morgan Using

MLP features in SRIs conversational sp eech

recognition system In Proceedings of

INTERSPEECH pages Lisb on Portugal