A Taiwan Southern Min Spontaneous Speech Corpus for Discourse Prosody

TRASP'2013 Aix-en-Provence, France [P1-1] A Taiwan Southern Min spontaneous speech corpus for discourse prosody Sheng-Fu Wang, Janice Fon Graduate Institute of Linguistics, National Taiwan University, Taiwan 10617 Le-xue Building, No.1, Sec. 4, Roosevelt Rd., Taipei, Taiwan (R.O.C.) [email protected], [email protected] Abstract 2. Corpus Construction This paper presents a Taiwan Southern Min (Taiwanese) spon- The corpus is the continuation of a Mandarin-Min bilingual taneous speech corpus primarily constructed and annotated for spontaneous speech corpus started in 2004 [2]. Currently in studying discourse prosody. The corpus contains monologue- the annotated dataset, there are eight hours of monologues con- like speech elicited from interviews. Eight hours of speech tributed by sixteen native speakers of Taiwan Southern Min. contributed by sixteen interviewees, evenly split by gender and These speakers can be grouped according to gender and age. age, have been transcribed and annotated. Transcription and the For the age of the speakers, the young group contains speak- recordings were aligned at the level of syllable with the aid of ers born in the 1980s and the old group contains speakers born EasyAlign (Goldman, 2011). Discourse annotation was done in the 1940s. All of the speakers were from the same region by identifying one-verb clausal units and labeling the strength (Taichung, the mid-Taiwan metropolitan area) so that research of unit transitions to show the hierarchical structure of discourse claims based on the corpus would not be confounded by dialec- using Grosz and Sidner’s model (1986). As for prosodic label- tal differences. ing, two major levels of prosodic breaks were identified,along Speech was elicited in the form of an interview in which the with truncation and prolongation caused by disfluencies and interviewer asked the interviewee to talk about his or her per- hesitation. The present state of the corpus allows for research sonal experiences in childhood, in school, or at work. Marriage, on the relationship between acoustic cues, prosodic structure, health, and traveling experiences were also common topics. The and discourse organization in unscripted speech. aim of the interviewer was to elicit longer monologues rather Index Terms: discourse, spontaneous speech, Taiwanese, than to engage in a conversation with the interviewee. When- Southern Min, prosodic break ever the interview became loaded with short turn exchanges between the interviewer and the interviewee, the recording would 1. Introduction be not be included in the present dataset. The currently annotated dataset contains 110873 syllables, Natural speech is inherently variable. The construction of spon- 10603 discourse boundaries, and 19433 prosodic breaks. The taneous speech corpora is one way of approaching such fas- annotation conventions for discourse and prosodic break will cinating variability. This kind of corpora that contain speech be described in Section 3. collected by a more natural setting provides a greater range of variability than a reading task or other kinds of simple buts de- 2.1. Transcription signed laboratory setting. Especially in studying the relationship between natural discourse and speech, spontaneous speech The recordings were transcribed with the Taiwanese- Romanization convention mainly based on the online is able to reveal phenomena that would otherwise be unavailable n to researchers. dictionary constructed by Iu [3]. The convention uses both The paper presents a spontaneous Taiwan Southern Min Chinese characters and romanized transcription. Chinese speech corpus constructed with the aforementioned objectives characters are used when it is possible to identify the source in mind. Another important object concerns the target language. characters for a given Southern Min expression. When the Taiwan Southern Min, more commonly known as Taiwanese, characters were not readily identifiable, romanization was used is the native language of approximately 70% of the popula- instead. tion in Taiwan [1]. Because of the policy that promoted Man- When the transcription was aligned with the recording in darin Chinese as the official language, Taiwan Southern Min Praat [4], all Chinese characters were romanized. The partic- ularities of the romanization includes using the h symbol for was marginalized in the realm of education, media, and admin- h istration. A consequence in linguistic studies was that language plosive aspiration (e.g., pha for /p a/), double n for nasalized ˜ resources such as annotated speech corpora in Taiwan Southern vowels (e.g., pinn for /pi/), and h for glottal stops at the coda Min have been very scarce as compared with resources in Man- position (e.g., peh for /peP/). darin Chinese. The construction of a spontaneous speech corpus Since Taiwan Southern Min is a tone language, informa- will certainly be an important step to further understandings on tion on lexical tones is provided in the annotation. In addition, relevant theoretical issues on Taiwan Southern Min, as well as since lexical tones in Taiwan Southern Min may be realized as a providing an important resource for applicational purposes such fixed low tone [5], and the occurrence of such a low tone is not as speech synthesis and speech recognition. predictable from the transcription, further effort was devoted in identifying syllables with the low tone. 20 TRASP'2013 Aix-en-Provence, France [P1-1] 2.2. Syllable alignment (6) [對以Meˆ查某來講Á婿W/¶ñeˆ)]DBI0 [@以一定愛PZ] Syllable segmentation and alignment were first automatically “[a husband was like the sky for women in the conducted with the EasyAlign plug-in [6] and were further man- past] ually checked by the first author. Since investigation on the rela- DBI0 [so getting married was a must]” tionship between syllable duration and other linguistic parame- ters was a primary goal for research based on this corpus, it was DBI1 refers to the juncture around which the clauses de- crucial to determine the criteria for deciding syllable bound- scribes about different subtopics or ”scenes” within a theme or aries, especially at the utterance-final positions. The major cri- an episode in narration. In the corpus, DBI1 is often used to la- terion taken by the labeler was to put the syllable boundary at bel the following cases: an anaphoric change or update (Exam- the decay of formant structures as shown on Praat spectrograms. ple 7), a change of aspect or the introduction of a new time ref- A second trained phonetician labeled 10% of the data and a erence (Example 8), comments (Example 9), and the boundary test of cross-labeler agreement was conducted. Results showed between a complex discourse unit to a simple or another com- a mean difference of 12.4 ms in terms of the alignment of sylla- plex discourse unit where the units between the boundary have ble boundaries, which amounts to 5.7% of mean syllable dura- the relation of cause-effect or topic-comment(Example 10). tion. In addition, 91% of the alignment differences are smaller (7) [因º.哥哥他們攏讀0高w]DBI0 than 20% of mean syllable duration. This degree of agreement [讀了專科讀了]DBI0 is on par with, or even slightly better as compared with simi- [1攏»Z工\J]DBI1 lar tests for the Switchboard Corpus [7], which reports a mean [?,來..?l的意思/講]... phone-alignment difference of 16.4 ms (19% of mean phone duration), and of the Buckeye corpus [8], which reports that “[because my brothers they went to the vocational high 75% of the phone-alignment differences are smaller than 20% school]DBI0 of phone duration. [after graduating from the vocational high school]DBI0 [(they) went to work]DBI1 3. Annotation [and originally my grandpa’s intention was]...” 3.1. Discourse annotation (8) [若講¤朋Ëf]DBI0 Discourse segmentation was done with the transcribed texts of [我v,上都/專門`he lah]DBI1 the recordings. The annotation was based on Fon’s [9] adapta- [ú,上J我讀二專eˆBc]... tion of Grosz and Sidner’s [10] model on “Discourse Segment Purpose”, which identifies the basic intentional units that struc- “[when talking about making friends]DBI0” tured in a hierarchical fashion. In practice, the texts were first [basically I particularly depended on that]DBI1 segmented into basic discourse units, which are clauses, defined [basically, when I was studying at the two-year as units that contain one verb, according to the definition of “a technological college]” simple clause” by a classical study on the functional grammar of Mandarin Chinese [11]. Next, the relationship between clauses (9) [G£鳥仔>ú`leh] [³a喜] was judged according to the level of discourse juncture. Four DBI1 “[just like birds out of their cage] [very happy]” different levels of “Discourse Boundary Indices (DBI)” were DBI1 F/我/ ¯ distinguished in this study. (10) [ 6ti²¶]DBI1 @ 這棉 ²嘛真真 The first level was DBI0, which suggests that the two adja- [ 以對 leh¡ eˆz long´ 熟]DBI0 有= cent clauses describe the same entity or event, thus the boundary [ah實( »Z] between these two is merely a clausal boundary. In the corpus, DBI0 is often used in the following situations: between a ma- “[but I lived at the farm]DBI1 trix and a subordinate clause Examples 1 and 2), two clauses [so I was quite familiar with the work concerning the showing parallel syntax (Example 3), between a tag question silkworms]DBI0 and its preceding clause (Example 4), between clauses sharing [I actually did it]” an anaphora (Example 5), and the boundary between two simple discourse units having the relation of cause-effect or topic- DBI2 referred to situations where the boundary clearly comment (Example 6). differentiates two themes or episodes, yet these themes and episodes are still within a bigger general topic. Example 11 (1) [伊講] [”我嘛2來”] DBI0 showed one such switch, in which the speaker was still narrat- “[he said] [’I also want to come’]” DBI0 ing the experiences working in a hospital but changed from the (2) [這我感º]DBI0 [³a喜e]ˆ sentiment of the fragility of human-beings to the description of “[it makes me feel]DBI0 [very happy] how long he had worked there.

Load more