Data mining Mandarin tone contour shapes

Shuo Zhang
CED Applied Research, Bose Corporation
The Mountain Rd, Framingham, MA 01701
shuo [email protected]

Abstract

In spontaneous speech, Mandarin tones that belong to the same tone category may exhibit many different contour shapes. We explore the use of data mining and NLP techniques for understanding the variability of tones in a large corpus of Mandarin newscast speech. First, we adapt a graph-based approach to characterize the clusters (fuzzy types) of tone contour shapes observed in each tone n-gram category. Second, we show correlations between these realized contour shape types and a bag of automatically extracted linguistic features. We discuss the implications of the current study within the context of phonological and information theory.

1 Introduction

One of the central phenomena of interest in lexical tone production is the deviation of surface realizations from the canonical templates of tone categories (Xu, 1997; Prom-on et al., 2009; Surendran, 2007). In a tone language, tone categories differing in pitch movement can distinguish different lexical meanings of a syllable (e.g., in Mandarin, the syllable "ma" spoken with a high level pitch contour means "mother", whereas the same syllable spoken with a falling pitch contour means "to scold"). Even though each tone category is defined by a general pitch contour profile (level, rising, falling, etc.), tones typically exhibit great variability in spontaneous speech. As an example, Figure 1 shows many different realizations of Mandarin Tone 1, observed during speech production experiments in the lab.

Previous works in tone production, speech synthesis, and tone recognition have investigated this variability by asking questions such as: (1) What factors contribute to the variability in tone production (Xu, 1997)? (2) How can we model the tone contour trajectory in synthesized speech (Prom-on et al., 2009)? (3) What features can we use to improve the accuracy of automatic tone recognition (Surendran, 2007)? Each of these works was driven by a particular set of theoretical or practical motivations and offered a slice of understanding of the problem.

In this work, we are interested in looking at the tone variability problem from a data mining perspective: we explore the structure and distribution of tone contour shapes within a large amount of data. By taking a data mining approach, we contrast our work with works that focus on tone recognition or tone learning (whether by machine or by human): we seek to extract tone patterns of empirical significance from a large data set of tones from spontaneous speech.

Working with the MCPST corpus (see Section 3) of Mandarin newscast speech (about 100,000 tones), we ask two questions: (1) For each tone category, what are the (coarse) types/classes of tone contour shapes we observe in this corpus? (2) For a particular tone category, what linguistic factors caused the same tone to be realized as these different types of shapes?

Inspired by works in natural language processing (NLP), we further extend these research questions in two directions. First, we extend our investigation of tone categories to series of n consecutive tones, or tone n-grams. The n-gram is a classic technique in NLP language modeling (footnote 1); in the current context, we study tone n-grams because of the importance of context in tone variability (Xu, 1997): a tone category may be realized differently depending on its neighboring tones. What can we learn from data mining tone contour shapes for tone unigrams, bigrams, and trigrams?

Footnote 1: Readers may refer to the classic NLP textbook chapter if needed: https://web.stanford.edu/~jurafsky/slp3/3.pdf

Second, to study the interface between linguistic structure and prosody in the MCPST data, we use automatic methods (NLP and other) to extract linguistic features from the text, including Named Entity Recognition (NER), coreference resolution, part-of-speech (POS) tagging, dependency parsing, and other phonological, morphological and contextual features. In order to find out the importance of these linguistic factors in shaping tone variability, we run the following experiment: given a particular tone (or tone n-gram) category, how well can we predict the type of tone contour shape it will take in running speech, using linguistic features that exclude information about the pitch contour f0 values?

Figure 1: Samples of Mandarin Tone 1 by the same speaker in lab speech. Data source: (Xu, 1997). The canonical contours of Mandarin Tones 1, 2, 3, and 4 are: high level, low rising, low dipping, and high falling, where low and high denote the pitch starting point of the tone.

Previous works showed that many linguistic factors (such as focus, topic, etc.) affect tone production or prosody (see Section 2). In this work, we extend this to a more comprehensive set of linguistic features, motivated by an information-theoretic account of tone production. We hypothesize that there exist information content inequalities resulting from the probability distributions of events in various linguistic domains (phonological, semantic, etc.). These inequalities affect speakers' speech production, resulting in gradient variants of tone contour shapes within a given tone category. We investigate the relative importance of these factors in predicting the types of contour shapes any particular tone n-gram will take.

The rest of the paper is organized as follows. Section 2 discusses relevant previous works. Next, we describe the data used in this paper in Section 3. In order to characterize the types of contour shapes a tone n-gram will take, we develop a method to derive clusters of tone contour shape types using network analysis (Section 4). In Section 5, we discuss feature engineering and feature extraction from various linguistic domains (syntax, morphology, semantics, information structure, etc.). Section 6 reports machine learning experiments and results on predicting tone contour shape types and the analysis of feature importance. Finally, in Section 7 we discuss the implications of this work in the context of information theory and the phonological theory of speech and tone production.

2 Related Work

There has been a long line of research on the variability of tone contour shapes as well as on the interface between other linguistic factors and prosody (Li, 2009; Buring, 2013). In linguistic research on Mandarin tones, most works have focused on the effect of local tonal context (e.g., neighboring tones and pitch range, such as (Gauthier et al., 2007; Xu, 1997)) and broader context (e.g., focus, topic, information structure, and long-term f0 variations, such as (Xu et al., 2004; Liu et al., 2006; Wang and Xu, 2011)). The data in these works usually consisted of a small number of tone observations obtained in speech production experiments in the lab. They have informed later works on improving the performance of supervised or unsupervised tone recognition ((Levow, 2005; Surendran, 2007), etc.). Other works, such as (Surendran, 2007) and (Yu, 2011), have shown the importance of signals in speech outside of f0 for tone recognition and learning.

In the PENTA (Xu, 1997, 2005) and qTA (quantitative target approximation) models (Prom-on et al., 2009), the surface f0 contour is viewed as the result of asymptotic approximation to an underlying pitch target, which can be a static target (High or Low) or a dynamic target (Rise or Fall). An important contribution of qTA is that it provides a mathematical model of the process that generates a particular realization of a tone template, defined by a pitch target (with slope and intercept parameters) and an acceleration rate. As such, the specific shape of the contour depends on the starting pitch, the ending pitch target, and how fast the pitch moves.
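For concreteness, the qTA mechanism just described can be written compactly. The rendering below is our paraphrase of the model in Prom-on et al. (2009), with symbol names of our choosing: the underlying pitch target is a line, and the surface f0 approaches it with an exponentially decaying transient.

    x(t) = m t + b
    f_0(t) = x(t) + (c_1 + c_2 t + c_3 t^2) e^{-\lambda t}

Here m and b are the slope and intercept of the pitch target, lambda is the rate of target approximation, and c_1, c_2, c_3 are set by the f0 value, velocity, and acceleration carried over from the end of the previous syllable; the realized shape therefore depends on the starting pitch, the target, and how fast the target is approached, exactly the ingredients mentioned above.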

A fundamental theoretical question is how we should view the underlying factors that account for tone surface variability. Previous research offers two opposing theories on this question. The first approach (Cooper et al., 1985; Cooper and Sorenson, 1981) postulates a direct link between communicative functions and surface acoustic forms by finding the acoustic correlates of certain communicative functions, such as focus, givenness, newness, questions, etc. Such approaches have met criticisms from phonologists (Ladd, 1996; Liberman and Pierrehumbert, 1984), who argue that prosodic meanings are not directly mapped onto acoustic correlates. Instead, intonational meanings should first be mapped onto phonological structures, which are in turn linked to surface acoustic forms through phonetic implementation rules. In this work, we attempt to show a new middle ground between these two theories.

3 Data

All the data in this work come from the Mandarin Chinese Phonetic Segmentation and Tone (MCPST) corpus (footnote 2), developed by the Linguistic Data Consortium (LDC). It contains 7,849 Mandarin Chinese newscast speech "utterances" and their phonetic segmentation and tone labels. Utterances are defined as the time-stamped between-pause units in the transcribed news recordings. We used the auto-correlation algorithm implemented in Praat (footnote 3) for f0 (pitch) estimation from the speech audio signal. We obtained f0 pitch contour data for 100,161 syllables. After pre-processing the pitch tracks (e.g., speaker-dependent normalization, f0 outlier detection and removal, pitch interpolation, downsampling), we generate tone unigram, bigram and trigram f0 data sets, giving rise to a total of 75 unigram (5), bigram (16), and trigram (54) data sets for the prediction task (footnote 4). The total number of tone n-grams in these data sets is on the order of 250k. All tone unigram, bigram, and trigram f0 vectors are downsampled to lengths of 30, 100, and 200 samples respectively.

Footnote 2: https://catalog.ldc.upenn.edu/LDC2015S05
Footnote 3: Boersma, Paul and Weenink, David (2019). Praat: doing phonetics by computer [Computer program]. Version 6.0.48, retrieved 17 February 2019 from http://www.praat.org/
Footnote 4: Mandarin Chinese has four regular tone categories plus one neutral tone. Since neutral tones occur infrequently, we did not include them in the analysis of tone bigrams and trigrams due to data sparsity in the conditional distributions. Similarly, we also excluded n-gram categories where the data points are sparse.

4 Deriving tone n-gram contour shape types through network analysis

4.1 Problem formulation

We define a tone n-gram category as a consecutive sequence of n tones ti, for i = 1, ..., n, where ti ∈ {0, 1, 2, 3, 4}, the five tone categories of Mandarin. In this paper we restrict n to {1, 2, 3}.

Given the set S (represented as a network) that contains all observations of f0 vectors belonging to a particular tone n-gram category, an algorithm A, defined in this section, partitions S into k clusters c1, c2, ..., ck, where all tone contours within ci are highly similar to each other, and members of ci are maximally distinct from those of cj for i ≠ j. For a particular tone n-gram category, we define the centroid f0 vector of ci to be its tone contour shape type ti. If we denote by C the set of types {c1, c2, ..., ck}, our goal in this section is to describe the algorithm A that learns a function g : S → C. We adapt an algorithm first proposed by Gulati et al. (2016), which has been shown to be effective in identifying clusters in time-series data such as pitch contours. It also has several advantages over baseline algorithms such as k-means clustering, including outlier pruning and no need to determine the number of clusters beforehand.

At a high level, this method represents all tone contours in a data set as a fully connected network G. It then filters G using heuristics based on the pairwise similarity of tone contour shapes. After the filtering step, only those nodes whose similarity is beyond a threshold remain connected. It then leverages network community detection algorithms to optimize the community structure, effectively deriving the tone contour shape types T = {t1, t2, ..., tk}.

4.2 Network construction

To construct the network described above, we first partition all data in the MCPST corpus by tone n-gram category. For each category, we construct a network where each node stores the f0 vector of an observation, and the edge between two nodes holds the Euclidean distance between the two f0 vectors as its weight. We thus derive an undirected, weighted and fully connected network G of tone n-gram patterns for each n-gram category.

4.3 Network filtering

In this step, we take the fully connected network G of a given tone n-gram category and use a principled method to remove edges from the network. Our goal is to find an appropriate threshold so that all edges whose weights (the distance between two tone n-gram f0 vectors) are greater than the threshold are cut. In the resulting network, only those nodes representing sufficiently similar tone contour shapes remain connected.
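As a concrete illustration of the construction in Sections 4.2 and 4.3, the following Python sketch (our code, not the authors'; it assumes numpy, scipy, and networkx) builds the fully connected weighted network for one tone n-gram category and applies a distance threshold:

    import numpy as np
    import networkx as nx
    from scipy.spatial.distance import pdist, squareform

    def build_contour_network(f0_vectors):
        """Fully connected, weighted, undirected network for one tone n-gram
        category: one node per f0 vector, edge weight = Euclidean distance
        between the two contours (Section 4.2)."""
        dist = squareform(pdist(np.asarray(f0_vectors), metric="euclidean"))
        n = len(f0_vectors)
        G = nx.Graph()
        G.add_nodes_from(range(n))
        for i in range(n):
            for j in range(i + 1, n):
                G.add_edge(i, j, weight=dist[i, j])
        return G

    def apply_threshold(G, threshold):
        """Keep only edges whose distance is below the threshold (Section 4.3);
        the result is treated as an unweighted graph."""
        H = nx.Graph()
        H.add_nodes_from(G.nodes())
        H.add_edges_from((u, v) for u, v, d in G.edges(data=True)
                         if d["weight"] < threshold)
        return H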

Specifically, we decide the threshold value by a six-step process: (1) we search for the appropriate threshold in the set of values Φ ∈ {1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5} for bigrams and trigrams, and Φ ∈ {0.2, 0.4, 0.6, 0.8} for unigrams; these values are empirically chosen; (2) we iterate over this set of values and each time apply a threshold to the network; (3) after applying the threshold, we convert the network to an unweighted network G0 in which only those nodes whose distance is below the threshold remain connected; (4) we produce a randomized network Gr by randomly swapping edges of G0 k times while keeping the degree of the nodes constant, where k is equal to the number of edges in G0; this can be seen as producing a maximally random network given the degree distribution of the current network; (5) we compute the difference in Clustering Coefficient (CC) between G0 and Gr; (6) after repeating this for all values in Φ, we pick the threshold that yields the largest difference of CCs (footnote 5).

Footnote 5: Clustering coefficient (CC) measures the extent to which the nodes in a network tend to cluster together. Intuitively, it expresses how saturated the network is, i.e., how many of the possible connections are actually expressed. The CC for a network of k nodes and n edges is computed as CC = 2n / (k(k − 1)).

Figure 2: Histogram of number of shape classes in n-gram data sets.
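A rough sketch of this six-step search, under our reading of the procedure (the candidate thresholds and the number of swaps follow the text; apply_threshold is the helper from the sketch above, and the choice of networkx routines is ours):

    import networkx as nx

    def select_threshold(G, candidates):
        """Pick the threshold whose filtered network differs most in clustering
        coefficient from a degree-preserving random rewiring."""
        best_t, best_delta = None, -1.0
        for t in candidates:                       # steps (1)-(2)
            G0 = apply_threshold(G, t)             # step (3): unweighted filtered net
            k = G0.number_of_edges()
            if k < 2:
                continue
            Gr = G0.copy()                         # step (4): swap edges k times,
            nx.double_edge_swap(Gr, nswap=k, max_tries=10 * k)  # degrees fixed
            # step (5): networkx's average clustering coefficient; note that the
            # formula given in footnote 5 corresponds to graph density (nx.density).
            cc_diff = abs(nx.average_clustering(G0) - nx.average_clustering(Gr))
            if cc_diff > best_delta:               # step (6): largest CC difference
                best_t, best_delta = t, cc_diff
        return best_t

    # bigrams/trigrams: select_threshold(G, [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
    # unigrams:         select_threshold(G, [0.2, 0.4, 0.6, 0.8])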
This indicates how well these tone contour shape types can be predicted with complete infor- 4.6 Evaluation of tone contour profile classes mation of its pitch trajectory, which will serve as an upper bound to our next prediction task using We have leveraged the intrinsic structure of the linguistic factors without information about pitch tone data to derive the tone contour shape types for movements f0 values. 5Clustering coefficient (CC) measures the extent to which the nodes in a network tend to cluster together. Intuitively, it 5 Linguistic features expresses how saturated the network is — how many of the possible connections are actually expressed. The CC for a network of k nodes and n edges is computed as: After we obtained tone contour shape types emerged from each tone n-gram category in the 2n CC = corpus, we now describe the linguistic features k(k − 1) used to predict which type of shape the tone n-

147 2.0 0 0 0 1.00 1 1 1 0.6 1.5 2 3 2 0.75 3 4 3 0.4 1.0 4 6 4 5 7 0.50 0.5 6 0.2 8

0.0 0.25 0.0

0.2 0.5 0.00 1.0 0.4 0.25 1.5 0.6 0.50 2.0 0.8 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 20 40 60 80 100

(a) Tone 1 (b) Tone 2 (c) Tone 2-3

0 0 0 2 0.6 1 1 0.75 1 2 3 2 3 0.4 5 3 0.50 1 4 7 4 5 6 0.2 0.25

0 0.0 0.00

0.2 0.25 1 0.4 0.50

0.75 2 0.6

0 20 40 60 80 100 0 25 50 75 100 125 150 175 200 0 25 50 75 100 125 150 175 200

(d) Tone 4-2 (e) Tone 3-4-2 (f) Tones 2-2-4

Figure 3: Example contour shape type clusters found in randomly selected tone n-gram categories, with two examples from each of tone unigram, bigram, and trigram: (a) Tone 1, (b) Tone 2, (c) Tone 2-3, (d) Tone 4-2, (e) Tone 3-4-2, (f) Tone 2-2-4. Each cluster is represented by a mean pitch vector with error bars and is indexed by the integers shown in the legend. The x-axis shows the number of samples (discrete time index) for the tone contour f0 vector. The y-axis shows the speaker-normalized pitch values.

5 Linguistic features

After obtaining the tone contour shape types that emerge from each tone n-gram category in the corpus, we now describe the linguistic features used to predict which type of shape a tone n-gram will take. All syntactic and semantic features are extracted using Stanford CoreNLP for Chinese (Manning et al., 2014).

5.1 Syntactic features

Syntax and prosody have been the subject of investigation in (psycho)linguistic studies (Bard and Aylett, 1999). We extract the part-of-speech (POS) tags for all syllables in a tone n-gram. In addition, we also extract the dependency function (Chen and Manning, 2014) of all syllables in the tone n-gram. Therefore, there are 2 * N syntactic features, where N is the number of syllables included in the tone n-gram under consideration. The original tag set used in CoreNLP comes from the Penn Chinese Treebank (footnote 6) and is too fine-grained. To avoid data sparseness, we collapsed several categories. For both POS tags and dependency edge function categories, we compute their distributions using the original tag set and collapse any categories that appear fewer than 5 times in the data. For POS tags, we mapped the original 33 tags onto 5 categories. For dependency functions, we collapsed all tags with a subcategory separated by a colon (e.g., "advmod:loc" and "advmod:rcomp" are mapped to "advmod", etc.).

Footnote 6: https://catalog.ldc.upenn.edu/LDC2013T21

5.2 Semantic features

We extract two semantic features for a tone n-gram data point: (1) whether the tone n-gram includes a named entity; (2) whether the tone n-gram includes a singleton (as opposed to being part of a coreference chain in the discourse). Semantic features such as information structure have been postulated to have an effect on the prosody domain (Buring, 2013). In particular, given information may encode prosodic features differently from new information. This could also apply to named entities vs. non-named entities. Named entities point to definite, specific objects in the real world. Whether the token is a singleton (i.e., does not co-refer to an entity with another mention in the text) or part of a coreference chain can be correlated with information structure (Recasens et al., 2013). That is, a singleton may signify new information in discourse, while a non-singleton is part of a coreference chain with a potential antecedent or anaphor, pointing to a potentially different information structure. Both can have distinguishing effects on mental representations and the production of speech prosody (indirectly related to redundancy in (Aylett and Turk, 2004)).
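To make the syntactic feature extraction concrete, here is a small sketch using the stanza Python pipeline as a stand-in for the Stanford CoreNLP toolkit the authors used (the collapsing of colon-separated dependency subtypes follows the text; mapping word-level tags down to syllables and the 33-to-5 POS grouping are left out):

    import stanza

    # Chinese pipeline; depparse needs the tokenize, pos, and lemma processors.
    nlp = stanza.Pipeline("zh", processors="tokenize,pos,lemma,depparse")

    def syntactic_tags(sentence_text):
        """Return (treebank POS tag, collapsed dependency label) per word;
        in the paper, each syllable of a tone n-gram would inherit the tags
        of the word containing it (our reading)."""
        doc = nlp(sentence_text)
        tags = []
        for sent in doc.sentences:
            for word in sent.words:
                dep = word.deprel.split(":")[0]   # "advmod:loc" -> "advmod"
                tags.append((word.xpos, dep))
        return tags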


5.3 Morphological features

In Mandarin Chinese, each word usually consists of one to four syllables. Building on the intuition that the first syllable is usually spoken with higher prominence (e.g., the neutral tone, which does not carry stress, only occurs in word-final positions), we extract a morphological feature for each syllable in the given tone n-gram: whether it crosses a word boundary or not. There are n features in this category in total.

Figure 4: Distribution of classification accuracies for all 75 data sets in unigram (U), bigram (B), and trigram (T). For each n-gram the labels on the x-axis are: data: full feature set; dfp: start/end pitch only baseline; no_syn: without syntactic features; no_Ntone: without prev/next tone features; no_pitch: without start/end pitch features; random: baseline with randomly assigned labels; MLE: MLE baseline.

5.4 Phonological features

A basic representation of phonological features is the identity of the phonemes in each syllable of the n-gram. However, due to the sparseness of this feature representation, we instead designed 7 binary features to encode the phonological properties of the syllables in the tone n-gram: (1) whether the syllable includes a nasal; (2) whether the syllable includes a diphthong; (3) whether the syllable includes a high vowel; (4) whether the syllable includes a low vowel; (5) whether the syllable includes a front vowel; (6) whether the syllable includes a back vowel; (7) whether the syllable includes a round vowel. In addition, we add two contextual tone features: the tone identities of the syllables immediately preceding and following the tone n-gram in question.

5.5 Other features

We add two pitch features to the feature set: the beginning and ending pitch of the tone n-gram. This is based on the notion in generative Mandarin tone modeling (the Parallel ENcoding and Target Approximation model, or PENTA) that in speech production, the actual realized shape of a given tone category highly depends on the starting point of the pitch contour and its distance to the actual pitch target of the current tone, which affects its trajectory as it approximates the target (Prom-on et al., 2009). An additional feature is the position of the current tone n-gram within the current sentence, expressed as a percentage. It is a known effect that pitch tends to decline in speech production as the sentence progresses (Wang and Xu, 2011). Therefore, we also want to account for the effect of sentence position.

5.6 Bag of features

In this task, we note that the unit of feature extraction is not as straightforward as it would be in classic NLP tasks. That is, instead of a typical syntactic constituent (word, phrase, sentence) as the feature extraction unit, our target here is the tone n-gram, a sequence of n syllables that may or may not be a syntactic constituent. As described above, for many features we have adopted a "bag of features" approach (similar to the speech coreference resolution work of Rösiger and Riester (2015)), where each feature describes whether the n-gram contains a certain target value in any position. For the other features, we simply use a set of n features applied to each syllable in the n-gram in question. These are precisely described in Table 1.

6 Predicting tone contour profiles

6.1 Experimental setup

For each of the 5 unigram, 16 bigram, and 54 trigram data sets, the extracted linguistic feature vector (f1, f2, ..., fm) forms the input space X. The contour shape types T form the output space Y. Our goal is to learn a function h : X → Y minimizing the expected loss. We use an SVM with a linear kernel so that we can extract feature importance in subsequent analyses. Each data set is randomly split 90/10 into train and test sets. Since the classes in Y are balanced, we evaluate classifier performance directly with accuracy on the test sets.
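A minimal sketch of the per-data-set setup of Section 6.1 with scikit-learn (the linear kernel and the 90/10 split follow the text; the assumption that X is already numerically encoded, and everything else, is our choice):

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    def run_one_dataset(X, y):
        """X: feature matrix for one tone n-gram data set (categorical features
        assumed one-hot encoded beforehand); y: learned contour shape types."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.1, random_state=0)          # 90/10 split
        clf = SVC(kernel="linear")                        # linear kernel keeps the
        clf.fit(X_train, y_train)                         # coefficients interpretable
        acc = accuracy_score(y_test, clf.predict(X_test))
        mle_baseline = 1.0 / len(set(y))                  # chance level: 1/d
        return clf, acc, mle_baseline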

Table 1: Feature set overview. "1...N" indicates that the feature is computed for each of the N syllables in the n-gram. The total number of features is N*10 + 7 (37 for trigrams, 27 for bigrams, etc.).

Syntactic        Morphological    Semantic       Phonological         Others
POS_Tag_1...N    Tok_Bound_1...N  is_entity      is_nasal_1...N       sent_position
Dep_Func_1...N                    is_singleton   is_dipthong_1...N    start_pitch
                                                 is_round_1...N       end_pitch
                                                 is_front_1...N       prev_tone
                                                 is_back_1...N        next_tone
                                                 is_high_1...N
                                                 is_low_1...N
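For illustration only, one way to lay out the Table 1 features for a single tone n-gram as a dictionary (the field names mirror the table; the per-syllable and context inputs are hypothetical placeholders, not the authors' data structures):

    def ngram_feature_dict(syllables, context):
        """syllables: per-syllable records for the n syllables of the n-gram;
        context: n-gram-level values (entity/coreference flags, pitch, position,
        neighboring tones). Returns 10*N + 7 entries, matching Table 1."""
        feats = {
            "is_entity": context["is_entity"],
            "is_singleton": context["is_singleton"],
            "sent_position": context["sent_position"],
            "start_pitch": context["start_pitch"],
            "end_pitch": context["end_pitch"],
            "prev_tone": context["prev_tone"],
            "next_tone": context["next_tone"],
        }
        phon_keys = ("is_nasal", "is_dipthong", "is_round", "is_front",
                     "is_back", "is_high", "is_low")
        for i, syl in enumerate(syllables, start=1):
            feats[f"pos{i}"] = syl["pos_tag"]
            feats[f"func{i}"] = syl["dep_func"]
            feats[f"tok_bound{i}"] = syl["tok_bound"]
            for key in phon_keys:
                feats[f"{key}{i}"] = syl[key]
        return feats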

6.2 Results

To visualize model performance on a large number of tone n-gram data sets (75), we choose the boxplot because of its efficiency in conveying statistical information about the distribution of results across all data sets. Figure 4 gives an overview of our proposed model's classification accuracies for all unigram (U), bigram (B), and trigram (T) data sets, compared to several baselines. In this figure, data, dfp, no_syn, no_Ntone, and no_pitch denote results using different sets of features: the full set, start/end pitch only, no syntactic features, no prev/next tone features, and no start/end pitch features, respectively. The random baseline uses the same set of linguistic features as our proposed model, but the target tone contour shape type is randomly assigned (while keeping the number of shape types constant). Finally, the Maximum Likelihood Estimation (MLE) baseline reflects chance-level performance if linguistic factors are independent of the output tone contour shape types, and is calculated as 1/d, where d is the number of output classes in a data set.

Figure 5: Feature weights for all unigram data sets.

First, the proposed model significantly outperforms the MLE baselines for unigrams, bigrams, and trigrams. Second, the accuracy for predicting learned tone contour shape types is significantly higher than for randomly assigned clusters. This serves as a sanity check for the validity of these learned tone contour shape types. Overall, this result supports the hypothesis that a variety of linguistic and contextual features are strongly correlated with the realization of a particular category of tone n-grams. In particular, we observe that using the set of features excluding the syntactic features (POS tags and dependency functions) allows the model to achieve the best median accuracy on all n-grams (the no_syn baseline). In Section 6.3, we provide a more detailed analysis and discussion of feature importance.

A significant trend to note in Figure 4 is that we observe a negative correlation between the model accuracy and the baseline (MLE, random) accuracy as the value of n becomes larger. This is striking because it indicates a decrease in predictive power as n grows larger. Moreover, the variance in model accuracy also increases as n becomes larger. From inspecting the tone contour shape types we obtained (such as those shown in Figure 3), we attribute this to three factors from the data perspective: (1) the dimensionality of the unigram, bigram, and trigram f0 vectors in our data sets increases; (2) the number of data sets also increases as n grows larger; (3) the complexity of tone contour shapes tends to increase from unigrams to bigrams to trigrams. On a linguistic level, we hypothesize that the longer the window of n-grams, the stronger the effect of unaccounted factors that come into play (longer-range prosodic factors such as focus and topic, as demonstrated in (Xu et al., 2004)).

To gain a more detailed understanding of the feature importance, Figure 4 also shows how the model performs with partially ablated feature sets across different n-grams. First, to understand the role of start/end pitch vs. non-pitch linguistic features, we observe that the dfp baseline (using only start/end pitch features) has lower results than the other models with linguistic features. This is true for all the markers on the boxplots (min, max, first quartile, median, third quartile) when comparing dfp to no_syn. However, there is also considerable overlap in these accuracy distributions, to different degrees as n varies, which indicates cases where the dfp baseline outperforms the other models with more linguistic features. To examine this possibility, we plotted the difference (delta) in accuracy values over all data sets for the pair of baselines no_syn - dfp in Figure 8. It shows that for 80% of the data sets, the no_syn baseline with linguistic features outperforms the pitch-only baseline, mostly by a margin of less than 20% in accuracy.

Second, comparing across different n-grams, the no_pitch baseline (only linguistic features) performs worst for unigrams and best for trigrams. This shows that the pitch features become less important as n grows larger. The same trend is observed in the weights of the pitch features in the feature importance analysis. This observation is also consistent with (Prom-on et al., 2009): the target approximation of tones operates only on syllable units, therefore the effect of start/end pitch should diminish when n > 1. Third, even though the feature weights are small for the previous/next tones, in these results we did not see an improvement when we excluded these features in the no_Ntone baseline. Finally, we confirm the importance of the non-pitch linguistic features, since the no_pitch baseline significantly outperforms the random and MLE baselines.

Figure 6: Feature weights for all bigram data sets.

6.3 Feature importance

To analyze feature importance, we extracted the weight vectors (coefficients) associated with all features in the linear SVM classifiers (following (Guyon et al., 2002)) for all data sets in each of the n-grams. We take the absolute values of the feature weights and normalize them to be comparable across data sets. We then aggregate the feature weights over all classes and across all data sets in a given n-gram.
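Our reading of this computation, assuming the fitted linear SVM clf from the Section 6.1 sketch (the aggregation over the one-vs-one class pairs and the normalization are our guesses at the exact bookkeeping):

    import numpy as np

    def feature_importance(clf):
        """Absolute linear-SVM coefficients, summed over the one-vs-one class
        pairs and normalized so the values are comparable across data sets."""
        w = np.abs(clf.coef_)        # shape: (n_class_pairs, n_features)
        w = w.sum(axis=0)            # aggregate over classes
        return w / w.sum()           # normalize within the data set

    # Importances from all data sets of one n-gram size can then be pooled
    # per feature and plotted, as in Figures 5-7.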
Figures 5, 6, and 7 show the distributions of feature weights (importance) for all features for the unigram, bigram, and trigram data, respectively. Since feature importance values tend to behave similarly within their linguistic domains, we group them together instead of analyzing the features individually. We observe three levels of feature importance based on the weights, consistent across different n-grams. (1) High: starting pitch, ending pitch, and sentence position (especially when n > 1). The importance of starting and ending pitch is consistent with the qTA model of Mandarin tones (Prom-on et al., 2009). The latter (sentence position) is consistent with the effect of downdrift (Wang and Xu, 2011). (2) Medium: phonological features and the morphological token boundary features, as well as the coreference (singleton) and entity features, which reflect the information structure of the discourse and semantics (givenness and newness of information in speech). (3) Low: syntactic features (POS tags, dependency functions) and contextual features (previous and next tone). This is consistent with (Bard and Aylett, 1999) and (Surendran, 2007) (footnote 7).

Footnote 7: Specifically, (Bard and Aylett, 1999) showed the dissociation of syntax from deaccenting in spontaneous speech, and (Surendran, 2007) showed that context did not help tone recognition as much as expected.

7 Discussion

In this work, we first described a method to mine tone contour shape types from a large amount of tone data. These shape types, as well as those mined from tone n-grams with larger values of n (n > 3), can be further analyzed with linguistic knowledge to better understand the behavior of tones in different tonal contexts in future work. We also showed that, using linguistic and contextual factors, we can predict with reasonable accuracy the contour shape type that a tone n-gram will take in the MCPST data.


Figure 7: Feature weights for all trigram data sets.

Figure 8: Delta (differences in accuracies) for the no_syn - dfp baselines. A point above the y = 0 line indicates that the model with linguistic features outperforms the model with pitch-only features.

We analyzed the feature importance to form a relative ranking of the linguistic factors. These results should be interpreted with caution because they are constrained by the particular representations of the linguistic features used in this study, as well as by the accuracy of the NLP software used to extract them. Nonetheless, we envision that by mining correlates between speech prosody and automatically extracted linguistic features, this line of work could have potential applications in improving the quality and naturalness of prosody in speech synthesis, such as Text-To-Speech (TTS) technologies.

Previous works targeting information theory and information structure in the prosody domain have largely looked at acoustic correlates directly, such as accent and duration, all of which may in turn have an impact on the shape of tone contours in speech production. Therefore, looking at tone contour shapes can be thought of as a different level of manifestation of such phenomena, an amalgamation of single-dimension acoustic correlates (e.g., duration and intensity). It is also a level that is more difficult to quantify and measure in traditional linguistic/phonetic investigations on a smaller scale. One possible extension of this work is to look at specific ways in which tone contour shapes correlate with particular features when those features are perturbed in a certain direction. Moreover, it is also of interest to demonstrate this change in a quantified manner using an information-theoretic formulation.

In Section 2 we raised a fundamental theoretical question of whether there is a direct link between communicative functions and surface acoustic forms, a question on which we found disagreement in the literature (as summarized in (Xu, 2005) and in Section 2 of the current paper). In this paper, we showed that by taking a data-driven approach, we can predict the contour shape type of a prosodic category (such as a tone n-gram) using linguistic factors, even though we are not predicting its exact shape. In doing so, we give an approximate solution for a middle ground between the two theories.

Acknowledgements

We thank Sankalp Gulati, Xavier Serra, Amir Zeldes, Elizabeth Zsiga, George Wilson, Richard Wright and Gina-Anne Levow for insights and discussions, as well as the anonymous reviewers for valuable comments on previous versions of this paper.

References

Matthew Aylett and Alice Turk. 2004. The smooth signal redundancy hypothesis: a functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1):31–56.

E. G. Bard and M. P. Aylett. 1999. The dissociation of deaccenting, givenness, and syntactic role in spontaneous speech. In Proceedings of ICPhS99, pages 1753–1756.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10008(10):6.

Daniel Buring. 2013. Syntax, information structure, and prosody. In Cambridge Handbooks in Language and Linguistics, pages 860–896. Cambridge University Press.

Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 740–750, Doha, Qatar.

E. Cooper, J. Eady, and R. Mueller. 1985. Acoustical aspects of contrastive stress in question-answer contexts. The Journal of the Acoustical Society of America, 77:2142–2156.

E. Cooper and M. Sorenson. 1981. Fundamental frequency in sentence production. Springer-Verlag, New York.

Bruno Gauthier, Rushen Shi, and Yi Xu. 2007. Learning phonetic categories by tracking movements. Cognition, 103(1):80–106.

Sankalp Gulati, Joan Serra, Vignesh Ishwar, and Xavier Serra. 2016. Discovering raga motifs by characterizing communities in networks of melodic patterns. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 286–290, Shanghai, China.

Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. 2002. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1):389–422.

R. Ladd. 1996. Intonational Phonology. Cambridge University Press, Cambridge.

Gina-Anne Levow. 2005. Context in multi-lingual tone and pitch accent recognition. In Interspeech, pages 1809–1812.

Kening Li. 2009. The information structure of Mandarin Chinese: Syntax and prosody. PhD Dissertation, Department of Linguistics, University of Washington.

M. Liberman and J. Pierrehumbert. 1984. Intonational invariance under changes in pitch range and length. In M. Aronoff and R. Oehrle (Eds.), Language Sound Structure, pages 157–233. M.I.T. Press, Cambridge, Massachusetts.

Fang Liu, Dinoj Surendran, and Yi Xu. 2006. Classification of statement and question intonation in Mandarin. In Proceedings of Speech Prosody.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.

Santitham Prom-on, Yi Xu, and Bundit Thipakorn. 2009. Modeling tone and intonation in Mandarin and English as a process of target approximation. The Journal of the Acoustical Society of America, 125(1):405–424.

Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts. 2013. The life and death of discourse entities: Identifying singleton mentions. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 627–633, Atlanta, Georgia. Association for Computational Linguistics.

Ina Rösiger and Arndt Riester. 2015. Using prosodic annotations to improve coreference resolution of spoken text. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 2.

Dinoj Surendran. 2007. Analysis and Automatic Recognition of Tones in Mandarin Chinese. PhD Thesis, Department of Computer Science, University of Chicago.

Bei Wang and Yi Xu. 2011. Differential prosodic encoding of topic and focus in sentence-initial position in Mandarin Chinese. Journal of Phonetics, 39(4):595–611.

Yi Xu. 1997. Contextual tonal variations in Mandarin. Journal of Phonetics, 25(1):61–83.

Yi Xu. 2005. Speech melody as articulatorily implemented communicative functions. Speech Communication, 46(3-4):220–251.

Yi Xu, Ching X. Xu, and Xuejing Sun. 2004. On the temporal domain of focus. In Speech Prosody, pages 81–84.

Kristine Yu. 2011. The learnability of tones from the speech signal. PhD Dissertation, Department of Linguistics, UCLA.

Shuo Zhang. 2016. Mining linguistic tone patterns with symbolic representation. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 1–9. Association for Computational Linguistics.
