Um and Uh, and the Expression of Stance in Conversational Speech
Esther Le Grézause

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2017

Reading Committee:
Richard Wright, Chair
Nicolas Ballier
Gina-Anne Levow
Mari Ostendorf
Ellen Kaisse

Program Authorized to Offer Degree: UW Department of Linguistics

©Copyright 2017 Esther Le Grézause

Abstract

Chair of the Supervisory Committee: Professor Richard Wright, Department of Linguistics

Um and uh are some of the most frequent items in spoken American and British English (Biber et al., 1999). They have traditionally been treated as disfluencies, but recent research has focused on their discursive functions and acoustic properties and suggests that um and uh are not just filled pauses or random speech errors. At this stage, there is little agreement on whether they should be considered by-products of the planning process (speech errors) or pragmatic markers. In addition, most work on um and uh treats them as the same variable, collapsing both into a single category.

The present work investigates the discursive and acoustic properties of um and uh in spontaneous speech, with the aim of finding out whether they occur in systematic ways and whether they correlate with specific variables. The analysis of um and uh is conducted on two corpora, ATAROS and Switchboard, to determine how the markers are used in different spontaneous speech activities. The Switchboard corpus consists of phone conversations between strangers, which allows us to study how speakers use um and uh in this context. It has different transcript versions (original and corrected), which allows us to test how transcribers perceive the two markers by aligning the original transcript with the corrected one.
The ATAROS corpus consists of collaborative tasks between strangers and is annotated for stance strength and polarity, which allows us to investigate how um and uh relate to stance. The term stance refers to subjective spoken attitudes toward something (Haddington, 2004). Stance strength is the degree to which stance is expressed, and it has four possible values: no stance, weak, moderate, and strong stance. Stance polarity is the direction of the expression of stance, and it can be positive, neutral, or negative.

The results of this study show that um and uh have different discursive cues, which correlate with variables such as speaker, speaker gender, speaker involvement, and naturalness of the conversation. Um and uh have different acoustic cues, which show some correlation with different degrees of stance and with stance polarity for different acoustic properties depending on the marker. The presence and the position of um and uh in utterances affect the likelihood that an utterance is marked with a certain degree or polarity of stance. These findings are incorporated in a classification experiment, to test whether information pertaining to um and uh can be used to train a classifier to automatically label stance. The results of this experiment reveal that um and uh are valuable word unigram features and indicate that the position features and certain acoustic features increase the performance of the system in predicting stance. The results also indicate that um is a slightly more important word unigram feature than uh, that features pertaining to um are more informative in the prediction of binary stance, and that features relating to uh are more informative in the prediction of three-way stance strength.

The findings confirm that um and uh are distinct entities. The discourse and acoustic features of um and uh are different. The marker um tends to vary to a greater extent than the marker uh. Transcribers perceive um more reliably than uh.
Um and uh are relevant word unigram features. Features associated with um yield higher accuracy than those associated with uh in predicting binary stance, and features associated with uh yield higher accuracy than those associated with um in predicting three-way stance strength. The work presented in this dissertation supports the view that um and uh are not just fillers or disfluencies, but rather have a wide range of uses, from fillers to pragmatic and stance markers.

TABLE OF CONTENTS

List of Figures
List of Tables
Glossary

Chapter 1: Introduction
  1.1 Um and uh
  1.2 Stance
  1.3 Cross corpora study
  1.4 Study goals and contributions
  1.5 Study structure

Chapter 2: Defining Disfluencies
  2.1 Introduction
  2.2 General introduction on disfluencies
  2.3 Clinical disfluencies
  2.4 Normally occurring disfluencies
  2.5 Synthesis

Chapter 3: Um and uh
  3.1 Introduction
  3.2 Synthesis
  3.3 Issues to address

Chapter 4: Stance
  4.1 Introduction
  4.2 Definitions
  4.3 Why stance?

Chapter 5: Corpora
  5.1 Introduction
  5.2 Corpus choice rationale
  5.3 The ATAROS corpus
  5.4 The Switchboard corpus
  5.5 Summary

Chapter 6: Distribution of um and uh in ATAROS and Switchboard
  6.1 Introduction
  6.2 Methodology
  6.3 The distribution of um and uh in ATAROS
  6.4 The distribution of um and uh in Switchboard
  6.5 Summary and discussion

Chapter 7: Transcription errors of um and uh in Switchboard
  7.1 Introduction
  7.2 Methodology
  7.3 Overall description of the data
  7.4 Substitutions of um and uh
  7.5 Missed and hallucinated ums and uhs
  7.6 Paralinguistic variables
  7.7 Position
  7.8 Conclusion

Chapter 8: Presence and position of um and uh and stance marking
  8.1 Introduction
  8.2 Methodology
  8.3 Presence of um and uh
  8.4 Position
  8.5 Position in ATAROS vs. Switchboard
  8.6 Summary and discussion

Chapter 9: Stance values and Acoustic properties of um and uh in the ATAROS corpus
  9.1 Introduction
  9.2 Methodology
  9.3 Duration and Stance
  9.4 Pitch and Stance
  9.5 Intensity and Stance
  9.6 Vowel quality (F1 and F2) and Stance
  9.7 Chapter summary and discussion

Chapter 10: Automatic classification of stance
  10.1 Introduction
  10.2 Methodology
  10.3 Experiment 1: Classification of stance using lexical features
  10.4 Experiment 2: Classification of stance using acoustic and discourse features
  10.5 Summary
  10.6 Chapter summary and discussion

Chapter 11: Conclusion
  11.1 Result summary
  11.2 Contributions and impact
  11.3 Future directions

Appendix A: Word category: list of function and other words
Appendix B: Token Frequency of monosyllabic words with the vowel /ʌ/ in ATAROS
Appendix C: Top 200 most frequent words in ATAROS
Appendix D: Main python scripts used in Chapter 10
Appendix E: List of communications
  E.1 Refereed conferences
  E.2 Presentations
  E.3 Article
  E.4 Scholarships and Certificates

LIST OF FIGURES

2.1 Linguistic factors in stuttering, after Kadri et al. (2011)
2.2 Disfluency regions, after Shriberg (1994, 2001)
6.1 Average number of words spoken (left) and average speaking time in s. (right) across speakers for each gender and each task
6.2 Task effect on duration of um and uh for each gender in the ATAROS corpus
6.3 Effect of dyad gender on the rates (left) and on the duration (right) in s. of um and uh in the ATAROS corpus
6.4 Effect of dyad gender on the rates (left) and on the duration (right) in s. of um and uh in the Switchboard corpus
6.5 Effect of naturalness ratings on the average number of words and the average speaking time in s. per conversation in the Switchboard corpus
6.6 Effect of naturalness ratings on the average number and on the duration of tokens in s. per conversation in the Switchboard corpus
6.7 Effect of speaker participation in the.