
The productivity of variable disyllabic tone sandhi in Tianjin Chinese

Jie Zhang1 · Jiang Liu2

Journal of East Asian Linguistics

ISSN 0925-8558 Volume 25 Number 1

J East Asian Linguist (2016) 25:1–35 DOI 10.1007/s10831-015-9135-0

Received: 9 January 2013 / Accepted: 23 December 2014 / Published online: 3 November 2015 © Springer Science+Business Media Dordrecht 2015

Abstract Tianjin Chinese has one of the more complex tone sandhi systems among Northern Chinese dialects. Due to its close contact with Standard Chinese, many of its tone sandhi patterns are also variable. This article first reports a detailed acoustic study of tone sandhi patterns in both real lexical items and novel words in Tianjin. The data were collected from 48 speakers of Tianjin, who were instructed to pronounce disyllabic sequences as real words based on voice prompts. The results showed that the productivity of the sandhis in novel words varied depending on the sandhi: some were less productive than in real words, and some were more productive, indicating a combination of underlearning, overlearning, and proper learning of the sandhis from the lexicon. We then propose a theoretical model that predicts the productivity patterns based on the phonetic properties of the sandhis and statistical generalizations about the sandhis over the lexicon.

Keywords Tone · Tone sandhi · Tianjin · Productivity · Optimality theory · Maximum entropy grammar

Electronic supplementary material The online version of this article (doi: 10.1007/s10831-015-9135-0) contains supplementary material, which is available to authorized users.


1 Department of Linguistics, The University of Kansas, 1541 Lilac Lane, Blake Hall, Room 427, Lawrence, KS 66045-3129, USA

2 Department of Asian Languages and Literatures, University of Minnesota, 220 Folwell Hall, 9 Pleasant Street SE, Minneapolis, MN 55455, USA


1 Introduction

1.1 Two types of evidence for phonological knowledge

Kenstowicz and Kisseberth, in Chap. 5 of their seminal Generative phonology: Description and theory (1979), raised a serious methodological issue for generative phonology research. They questioned the assumption that the phonological abstractions derived by traditional research methods, which focused on lexically manifested patterns of sound distribution and morpheme alternation, were the same abstractions found in speakers’ unconscious phonological knowledge, that is, the knowledge that generative phonology aims to uncover. Consequently, they advocated complementing the evidence gleaned from such traditional sources with evidence from speakers’ linguistic behavior that directly manifests their unconscious knowledge, from speech errors and language games to loanwords and second language acquisition. Their skepticism turned out to be well founded, as subsequent research showed that speakers know both more and less than the lexical patterns. A number of recent studies have shown that speakers possess phonological knowledge that the lexical patterns of their language do not inform them of, a scenario that we will refer to as “overlearning.” For instance, Zuraw (2007) showed through a corpus study of loans and a web-based survey on novel words that Tagalog speakers possessed knowledge of the splittability of word-initial consonant clusters that could not be deduced from the lexicon. Berent et al. (2007) demonstrated through a series of experiments that English speakers preferred /bd/ as an onset cluster over /lb/, even though neither is a legal onset cluster in English. In an artificial language-learning setting, Wilson (2006) established that when English speakers were presented with velar palatalization before mid vowels, they could extend the process to high vowels but not vice versa. These findings have been taken as “poverty of the stimulus” arguments for the relevance of Universal Grammar or substantive biases in phonological learning.
“Underlearning,” alternatively termed “the surfeit of the stimulus” (Becker et al. 2011), refers to speakers’ subpar knowledge, and sometimes total ignorance, of generalizable patterns in the lexicon. For example, Becker et al. (2011) found that Turkish speakers generalized to novel words the statistical relations in the lexicon between an obstruent voicing alternation on the one hand and word size and the place of articulation of the obstruent on the other, but they were oblivious to a similar, statistically significant relation between the voicing alternation and properties of the preceding vowel (height, backness). Hayes et al. (2009b) investigated the variation patterns in suffixal vowel harmony in Hungarian and compared how speakers internalized two types of gradient patterns in novel words: natural ones, in which the harmony behavior is based on the properties of the stem vowels (number of triggers, height of the trigger), and unnatural ones, in which the harmony is correlated with features of the stem-final consonant. They found that speakers learned both the natural and unnatural patterns, but the unnatural patterns were undervalued and learned less robustly than the natural ones. Using an artificial language-learning paradigm, Moreton (2008) showed that English speakers learned a vowel height-voicing dependency significantly more poorly than a height-height dependency, despite the facts that (a) neither dependency is attested in English, (b) the dependency in question was present in the learning experiment, and (c) the two dependencies have comparable phonetic precursors. These results also suggest that speakers’ phonological knowledge is the combined result of learned lexical patterns and a priori knowledge.

These studies support Kenstowicz and Kisseberth’s thesis that evidence for speakers’ phonological knowledge needs to come from both within and beyond lexical patterns. Beyond the areas they identified, such as speech errors and loanwords, corpus-external evidence has emerged from experimental investigations of productivity, especially in the form of wug tests (Berko 1958), in which speakers are asked to respond to novel words in contexts that facilitate the application of the phonological process in question. This methodology has been widely used to test the productivity of phonological alternations (e.g., Albright et al. 2001; Hayes and Londe 2006; Zuraw 2007; Hayes et al. 2009b; Becker et al. 2011) as well as regular and irregular morphological rules (e.g., Bybee and Pardo 1981; Albright 2002; Albright and Hayes 2003; Pierrehumbert 2006).

1.2 The role of productivity in tone sandhi research

Tone sandhi research, particularly descriptive work, has a long tradition in Chinese phonology. Both detailed descriptions of tone sandhi in individual dialects and typological works on cross-linguistic patterns of tone sandhi abound (see Zhang 2014a, b for reviews and references). The relation between tone sandhi and theoretical phonology, however, has been an uncomfortable one. Chinese tone sandhi patterns have presented considerable challenges to theoretical phonology in both rule-based and constraint-based frameworks, and complete theoretical analyses of any given tone sandhi system have proven difficult. Beyond the sheer complexity of the tone sandhi patterns often observed in Chinese dialects, especially in the Wu and Min groups, three other properties of tone sandhi are responsible for this difficulty. First, as the result of diachronic changes, many of the sandhi patterns in present-day systems are phonetically arbitrary. This presents particular challenges to their analysis in Optimality Theory (Prince and Smolensky 1993), which relies on surface-oriented, generalizable markedness constraints. Second, many tone sandhi patterns are phonologically opaque (Kiparsky 1973). For example, in Taiwanese, four of the five tones in the tonal inventory on non-checked syllables are involved in a circular chain shift: 55 → 33 → 21 → 51 → 55 (Cheng 1968; Chen 1987); in , the following synchronic chain shifts are attested: 32 → 44 → 53 → 21 / __ {212, 242}; 44 → 53 → 32 → 24 / __ 32 (Liang and Feng 1996). These patterns also pose analytical challenges for Optimality Theory: a circular chain shift has been shown to be incomputable by a “conservative” OT grammar that uses only IO-faithfulness and markedness constraints (Moreton 2004), and a regular chain shift requires additional mechanisms such as constraint conjunction to be captured (Kirchner 1996). Third, due to complex contact situations as well as internal factors, many sandhi patterns are riddled with variation and exceptions.

Given these properties, it is particularly worthwhile to ask, via productivity studies, whether the lexical sandhi patterns that speakers encounter are a true reflection of their phonological knowledge. Do speakers overlearn, or generalize, sandhi patterns in the face of variation and exceptions? Do speakers underlearn lexical regularities in tone sandhi due to their phonetic arbitrariness and phonological opacity? In other words, we need to expand the empirical basis from which theoretical analysis of tone sandhi proceeds to include not only lexical patterns of tone sandhi but also experimental evidence of tone sandhi productivity. This was exactly Kenstowicz and Kisseberth’s recommendation to phonologists over 30 years ago.

The use of wug tests to investigate the productivity of tone sandhi patterns can be traced back to the ground-breaking work of Hsieh (1970, 1975, 1976), who showed that the opaque tone sandhi circle in Taiwanese is generally not productive. Later works by Wang (1993), Zhang and Lai (2008), and Zhang et al. (2009, 2011) replicated and expanded Hsieh’s studies and reached similar conclusions. Zhang and Lai (2008) and Zhang et al. (2009, 2011), in addition, showed that sandhi productivity is also correlated with the frequencies of the sandhi patterns in the lexicon and the phonetic nature of the tone change: sandhis that have higher type and token frequencies in the lexicon tend to have higher productivity, and sandhis that turn longer tones into shorter tones have a productivity advantage over sandhis that turn shorter tones into longer tones, due to the reduced duration of the sandhi position compared to the non-sandhi position.
Zhang and Lai (2010) tested the productivity difference between the third-tone sandhi (213 → 35 / __ 213) and the half-third sandhi (213 → 21 / __ T, T ≠ 213) in Standard Chinese in two wug test experiments and showed that the former applies less productively in novel words than the latter. They argued that the results were due to the fact that the half-third sandhi is a contour reduction process directly related to the shortened duration in non-final positions and thus has a clearer phonetic motivation than the third-tone sandhi, which (a) has a long diachronic history, (b) involves a pitch raising not easily explainable by phonetics, and (c) is perceptually neutralizing. Zhang and Meng (2012) demonstrated that in Shanghai Wu, rightward contour extension, which effectively reduces contour tones on both syllables, is more productive than rightward contour displacement, which does not level the contour and at the same time causes large phonetic mismatches in both and tonal contour between the base and sandhi tones. These studies indicate that wug testing the productivity of tone sandhi patterns is a worthwhile research endeavor, as speakers’ phonological knowledge can indeed differ from lexically manifested sandhi patterns due to the phonetic (e.g., tone duration, tone similarity) and phonological (e.g., opacity) properties of the sandhis.

In this article, we present a productivity study of the tone sandhi system in Tianjin Chinese, which differs from previously investigated sandhi systems in a number of respects. First, as a northern dialect with a close affinity to the Beijing dialect and Standard Chinese, Tianjin’s sandhi pattern is also “right-dominant” (Yue-Hashimoto 1987), in that the tone at the right edge of the sandhi domain remains intact while non-final tones undergo sandhi. But its sandhi pattern is considerably more complex than that of Beijing and Standard Chinese. Second, unlike other “right-dominant” dialects such as Taiwanese, the Tianjin sandhi pattern does not involve phonological opacity. Third, the sandhi pattern in Tianjin is riddled with variation and exceptions, likely due to its close contact with the Beijing dialect and the dominance of Standard Chinese. The productivity study of Tianjin tone sandhi, therefore, allows us to expand the typology of sandhi productivity, address new questions such as the effect of variation and exceptions on productivity, and at the same time provide further tests of some of the hypotheses mentioned earlier, such as the relevance of lexical frequency and phonetic properties to sandhi productivity. In the rest of the article, we first introduce the Tianjin tone sandhi pattern in Sect. 1.3, then discuss the hypotheses and methodology of the productivity study in Sect. 2. The results of our experiment follow in Sect. 3. We then provide a theoretical model for our results in Sect. 4. Discussion and concluding remarks are provided in Sect. 5.

1.3 Tianjin tone sandhi

Tianjin Chinese is spoken in the city of Tianjin, 65 miles southeast of Beijing. Its four lexical tones are cognate with the four tones of Standard Chinese, but the pitch values of the tones in the two dialects differ, as shown in (1) (Chen 2000).1 The four-way contrast maH ‘mother’ vs. maMH ‘hemp’ vs. maMLH ‘horse’ vs. maHL ‘to scold’ in Standard Chinese, for example, is realized as maL vs. maH vs. maLH vs. maHL in Tianjin.

(1) Lexical tones in Tianjin Chinese and Standard Chinese:

Tone 1 Tone 2 Tone 3 Tone 4

Tianjin L H LH HL

Standard Chinese H MH MLH HL

As mentioned previously, despite its close affinity and similarity to Standard Chinese, Tianjin has a considerably more complex system of tone sandhi. The traditional disyllabic sandhis reported in Li and Liu (1985) and later confirmed by Shi (1986), Yang et al. (1999), and Chen (2000) are summarized in (2). The T3+T3 sandhi in (2b) is cognate with the third-tone sandhi in Standard Chinese, which also changes a T3 to a T2 before another T3. The other three sandhis are not attested in Standard Chinese, nor, to the best of our knowledge, do they have extensive synchronic counterparts in other dialects.

1 The transcriptions of the Tianjin tones vary from source to source. For example, using Chao’s tone numbers (Chao 1968), Li and Liu (1985) transcribed the four tones as 21, 45, 213, 54, respectively, while Shi (1990) used 11, 55, 24, 53. We use Chen’s (2000) notation here. For more detailed discussion and acoustic data on Tianjin citation tones, see Zhang and Liu (2011).

(2) Tianjin disyllabic tone sandhi I:
a. L+L → LH+L (T1+T1 → T3+T1)
b. LH+LH → H+LH (T3+T3 → T2+T3)
c. HL+L → H+L (T4+T1 → T2+T1)
d. HL+HL → L+HL (T4+T4 → T1+T4)

Shi (1988) noted that the four sandhi processes in (2) applied with different propensities in Tianjin. Using as criteria the number of lexical exceptions and the likelihood with which the base-tone combinations surface as the result of tone sandhi in longer sequences, Shi ordered the sandhis according to their “strength” as follows: (T3+T3) > (T1+T1) > (T4+T4) > (T4+T1). From the recordings of 204 Tianjin speakers in different age groups, Shi and Wang (2004) showed that the T4+T1 sandhi tended to apply with greater regularity among younger speakers (close to 100 % application for speakers younger than 20 but only around 60 % for speakers older than 70), and that the T4+T4 sandhi had generally become obsolete for younger speakers (close to 0 % application for speakers younger than 20; around 40 % for those older than 70).2 The disappearance of the T4+T4 sandhi has also been reported in Liu and Gao (2003) and Gao (2004), who attributed it to the influence of Standard Chinese, which has a similar T4 (51) that does not undergo sandhi before another T4. Shi and Wang’s (2004) results were in general agreement with Zhang and Liu’s (2011) acoustic findings on disyllabic tone sandhi from 12 Tianjin speakers (average age = 34.3), which showed that the T3+T3 and T1+T1 sandhis applied consistently, the T4+T1 sandhi had a small number of exceptions, and the T4+T4 sandhi applied to only a handful of words for a small subset of the speakers. Furthermore, Zhang and Liu (2011) showed that the sandhi patterns, even when they applied, generally did not result in tonal neutralization as the description in (2) implies, as the sandhi tone always preserved certain pitch properties of the base tone. Wee (2004) reported two additional tone sandhis for Tianjin, given in (3).
These sandhis likely originated from the half-third sandhi in Standard Chinese, whereby the falling-rising T3 is realized as its first half before a tone other than T3 (213+T → 21+T, T ≠ 213). Although Wee (2004) reported these sandhis as neutralizing sandhis (neutralization of T3 and T1 in the sandhi contexts), Ma and Jia’s (2006) acoustic and perceptual studies showed that neither sandhi in (3) was truly neutralizing: the sandhi tones partially preserved the rising property of T3, and listeners could identify the difference between T1 and T3 in the sandhi contexts with an accuracy rate of over 85 %. Zhang and Liu’s (2011) acoustic results further supported the incomplete neutralization of these two sandhis. In our discussion of the sandhis below, we will still use the conventional categorical transcriptions, but only as a convenient shorthand.

2 In addition, Shi and Wang (2004) also found that for T1+T1, younger speakers (<20 years) consistently used T2+T1 as the sandhi tones, not the previously reported T3+T1, while older speakers (>70 years) varied between T3+T1 and T2+T1. See Lu (1997, 2004) and Zhang and Liu (2011) for similar findings and additional discussion.

(3) Tianjin disyllabic tone sandhi II:
a. LH+H → L+H (T3+T2 → T1+T2)
b. LH+HL → L+HL (T3+T4 → T1+T4)

The complexity of tone sandhi in Tianjin, therefore, comes not only from the intricacy of the pattern itself but also from the variation and exceptions in the pattern and the changes that it is currently undergoing. The pattern, then, is not only interesting in its own right but also presents an opportunity to contribute to the theoretical debate on the roles of variation and exceptions in formal grammar, an issue that has attracted much attention in the recent phonological literature (see Coetzee and Pater 2011 for a review). It is also worth noting that the complexity of the Tianjin sandhi pattern does not involve opaque chain shifts as in Taiwanese. A study of the productivity of the tone sandhi pattern in Tianjin, therefore, allows us to investigate speakers’ knowledge of a typologically different kind of sandhi system. We lay out the specific hypotheses about the productivity of Tianjin tone sandhi and the methodology for the study in the next section.
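As a minimal sketch of the categorical shorthand only, the six disyllabic sandhis in (2) and (3) can be encoded as a lookup table keyed on the tone pair, in which only the first (non-final) tone changes, mirroring the right-dominant pattern. The names `TONE_VALUES`, `SANDHI_RULES`, `YOUNGER_RULES`, `apply_sandhi`, and `as_pitch` are hypothetical. The sketch assumes deterministic application and ignores the variation, exceptions, and incomplete neutralization discussed above; a variant grammar, such as the T2+T1 output for T1+T1 reported for younger speakers in footnote 2 together with the loss of the T4+T4 sandhi, is modeled simply by editing the table.

```python
# Categorical tone values in Tianjin, following the notation in (1).
TONE_VALUES = {"T1": "L", "T2": "H", "T3": "LH", "T4": "HL"}

# Sandhis in (2) and (3): (first tone, second tone) -> new first tone.
# Right-dominant: the final tone always stays intact.
SANDHI_RULES = {
    ("T1", "T1"): "T3",  # (2a) L+L   -> LH+L
    ("T3", "T3"): "T2",  # (2b) LH+LH -> H+LH
    ("T4", "T1"): "T2",  # (2c) HL+L  -> H+L
    ("T4", "T4"): "T1",  # (2d) HL+HL -> L+HL
    ("T3", "T2"): "T1",  # (3a) LH+H  -> L+H
    ("T3", "T4"): "T1",  # (3b) LH+HL -> L+HL
}


def apply_sandhi(first, second, rules=SANDHI_RULES):
    """Return the categorical surface tones of a disyllable.

    Tone pairs not listed in `rules` surface unchanged.
    """
    return rules.get((first, second), first), second


def as_pitch(tones):
    """Render tone categories as pitch labels, e.g. ('T3', 'T1') -> 'LH+L'."""
    return "+".join(TONE_VALUES[t] for t in tones)


# Hypothetical grammar for a younger speaker (footnote 2; Shi and Wang 2004):
# T1+T1 -> T2+T1, and the T4+T4 sandhi no longer applies.
YOUNGER_RULES = {**SANDHI_RULES, ("T1", "T1"): "T2"}
del YOUNGER_RULES[("T4", "T4")]

if __name__ == "__main__":
    for pair in [("T1", "T1"), ("T3", "T4"), ("T2", "T2")]:
        print(pair, "->", as_pitch(apply_sandhi(*pair)))
```

Keeping the rules in data rather than in branching code makes it easy to compare variant grammars side by side, although a categorical table of this kind cannot express the gradient pitch differences that the acoustic studies report.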

2 Hypotheses and methodology

2.1 Hypotheses

We have seen in Sect. 1.2 that a series of studies on the productivity of tone sandhi patterns in Chinese dialects has shown that the phonological transparency, phonetic properties, and lexical frequency of a sandhi can all affect its productivity in novel words. Phonological transparency is not relevant here, as all Tianjin tone sandhis are transparent, but we expect the effects of phonetic properties and lexical frequency to manifest themselves in Tianjin. In particular, we first hypothesize that regular sandhis with a strong phonetic basis, such as the half-third sandhis LH+H → L+H (T3+T2 → T1+T2) and LH+HL → L+HL (T3+T4 → T1+T4), will be more productive than the other regular sandhis, L+L → LH+L (T1+T1 → T3+T1) and LH+LH → H+LH (T3+T3 → T2+T3), whose phonetic basis is weaker. Our judgment of the strength of the phonetic basis follows Zhang and Lai’s (2010) judgment for Standard Chinese. The half-third sandhi is a contour reduction process directly related to the shortened duration in non-final positions.3 The other sandhis have properties that are not directly related to phonetic reduction. The T1+T1 sandhi involves a contouring process in non-final position, which is typologically rare (Yue-Hashimoto 1987; Zhang 2002). It also cannot easily be interpreted as phonetically motivated, as coarticulatory dissimilation typically involves the raising of a high tone before a low tone (see Gandour et al. 1994 for Thai; Xu 1997 for Standard Chinese; Peng 1997 for Taiwanese; and Zhang and Liu 2011 for Tianjin). As in Standard Chinese, the third-tone sandhi also involves a pitch raising not easily explainable by phonetics.

Second, we hypothesize that, ceteris paribus, sandhi patterns with higher type and token frequencies will be more productive than those with lower frequencies. This should be most clearly manifested in the comparison between the two half-third sandhi patterns: both are equally motivated by insufficient duration, yet Tone 2 has lower type and token frequencies than Tone 4 (based on Da 2004). Therefore, we expect the T3+T2 sandhi to be less productive than the T3+T4 sandhi, a result also found in Zhang and Lai’s (2010) study of Standard Chinese. Relatedly, we also hypothesize that the token frequency of a particular lexical item is related to how the sandhi applies to the item, in that higher frequency leads to higher productivity. If so, then any underlearning or overlearning effects in novel words may be interpreted as exaggerated frequency effects. This will further inform the theoretical model of the speakers’ sandhi knowledge.

Third, for the sandhis with exceptions, we predict that they will tend to change in the innovative direction in novel words, because we expect new words to take on the behavior that represents the direction of change. In other words, the disappearing HL+HL → L+HL (T4+T4 → T1+T4) should show further underlearning in novel words, while the sandhi gaining popularity, HL+L → H+L (T4+T1 → T2+T1), should be overlearned and generalized.

3 An anonymous reviewer questioned the phonetic basis of the T3+T2 sandhi, as the opposite pattern, whereby L+H → LH+H, is attested in African languages. But this type of regressive tone spreading is considerably rarer than progressive spreading (Maddieson 1978; Hyman 2007; Zhang 2007). Hyman (2007) in fact goes on to argue that regressive tone spreading is due to special circumstances involving tone attraction to stressed positions or pressure from at the right edge and therefore is not a diachronically natural process.
In short, we hypothesize that a Tianjin speaker’s knowledge of tone sandhi is a combination of proper learning, underlearning, and overlearning from the lexicon: the lexical frequency of a sandhi pattern is positively correlated with its productivity; however, sandhis that lack phonetic motivation should be underlearned and lack full productivity, whereas sandhis with a limited number of exceptions should be overlearned and generalized.

2.2 Methodology

2.2.1 Experimental design

To test these hypotheses, we designed a wug test in which native speakers of Tianjin were asked to pronounce two separately presented syllables together as a real disyllabic word in Tianjin. All six sandhis in (2) and (3) were tested, and within each sandhi, three types of words were used: real disyllabic words in Tianjin, pseudo words composed of two actually occurring syllables in Tianjin, and novel words in which the first syllable was an accidental gap in the Tianjin syllabary. An accidental gap is a syllable in which both the segmentals and the tone are legal, but their combination happens to be missing in Tianjin. We will refer to these three groups as REAL, PSEUDO, and NOVEL henceforth. REAL words were further divided into four subtypes according to whether the disyllable and the first syllable had high or low token frequency, as in Fig. 1a, and PSEUDO words were further divided into two subtypes depending on whether the first syllable had high or low token frequency, as in Fig. 1b. For each word-(sub)type/sandhi-type combination, we used four different words, which resulted in 168 test words (6 × 7 × 4).

Fig. 1 Stimulus design for (a) REAL words (high- vs. low-frequency words, each split into high- vs. low-frequency first syllables) and (b) PSEUDO words (high- vs. low-frequency first syllables)

Token frequency data were derived from a corpus of written Chinese with 28,278,285 bigrams compiled from online resources by Da (2004). The mean raw bigram frequency is 3721 for the high-frequency disyllabic words and 178 for the low-frequency words. Frequencies for the first syllables in high-frequency REAL, low-frequency REAL, and PSEUDO words include the frequencies of all homophonous characters, provided that the characters are among the 3500 most commonly used characters in Da’s character corpus. In other words, these frequencies approximate the frequencies of the phonetic syllables with tones. High-frequency syllables all have a mean raw frequency over 210,000, while low-frequency syllables all have a frequency under 80,000. Care was taken to minimize the effect of tonal combination on word and syllable frequencies. We also used 160 fillers, 16 for each of the 10 disyllabic tonal combinations that do not undergo sandhi. We did not control for whether the REAL words were verbs or nouns or whether the PSEUDO words were more easily interpreted as verbs or nouns, as word category is not known to affect the application of disyllabic tone sandhi in either Tianjin or Standard Chinese. Additional information on the selection of the stimuli and the entire word list is given in Appendix 1 (see Supplementary material). The 328 experimental stimuli were recorded in their monosyllabic citation form by a 23-year-old male native speaker of Tianjin in an anechoic chamber at the University of Kansas. Each monosyllable was read twice without sentential context, and the token the two authors judged clearer was used in the experiment. The experiment was implemented in Paradigm® (Perceptional Research Systems). The stimuli were evenly divided into two blocks.
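The arithmetic of the stimulus design can be checked with a short enumeration. This sketch assumes the seven word-(sub)types described above (four REAL, two PSEUDO, one NOVEL) and derives the filler combinations as every tone pair that is not one of the six sandhi pairs. The subtype labels and the function name `design_counts` are hypothetical and are not taken from the authors' materials.

```python
from itertools import product

SANDHI_PAIRS = ["T1+T1", "T3+T3", "T4+T1", "T4+T4", "T3+T2", "T3+T4"]
ALL_PAIRS = [f"T{a}+T{b}" for a, b in product(range(1, 5), repeat=2)]
# Fillers: the tone combinations that do not undergo sandhi.
FILLER_PAIRS = [p for p in ALL_PAIRS if p not in SANDHI_PAIRS]

# REAL: word frequency x first-syllable frequency (Fig. 1a);
# PSEUDO: first-syllable frequency only (Fig. 1b); NOVEL: no subtypes.
SUBTYPES = (
    [f"REAL/word-{w}/syl1-{s}" for w, s in product(["high", "low"], repeat=2)]
    + [f"PSEUDO/syl1-{s}" for s in ["high", "low"]]
    + ["NOVEL"]
)


def design_counts(words_per_cell=4, fillers_per_pair=16):
    """Return (test words, fillers, total stimuli) for the design."""
    cells = list(product(SANDHI_PAIRS, SUBTYPES))  # sandhi x word-(sub)type
    n_test = len(cells) * words_per_cell
    n_fillers = len(FILLER_PAIRS) * fillers_per_pair
    return n_test, n_fillers, n_test + n_fillers


if __name__ == "__main__":
    print(len(SUBTYPES), len(FILLER_PAIRS), design_counts())
```

With the default arguments, the enumeration recovers the counts reported in the text: 7 subtypes, 10 filler combinations, 168 test words, 160 fillers, and 328 stimuli in total.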
Block A included all stimuli with the tonal combinations T1+T1, T3+T2, and T3+T3, as well as fillers with the tonal combinations T1+T2, T1+T3, T1+T4, T2+T1, and T2+T2. Block B included all T3+T4, T4+T1, and T4+T4 stimuli and the T2+T3, T2+T4, T3+T1, T4+T2, and T4+T3 fillers. Half of the subjects took Block A first, and the other half took Block B first. There was a 5-min break between the blocks. Within each block, the stimuli were randomized by Paradigm® for each speaker. Each stimulus consisted of two monosyllables separated by an 800 ms interval. The stimuli were played to the subjects through a pair of headphones. For each stimulus, the subjects were asked to put the two syllables together and pronounce them as a real disyllabic word in Tianjin as naturally as possible. Before the experiment began, there was an introduction in Tianjin that the subjects heard through the headphones and simultaneously read on a computer screen in front of them. The introduction explained the task both in prose and through examples. There was then a practice session of 9 words that did not appear in the real experiment (three each of REAL, PSEUDO, and NOVEL words). The instructions and practice items were recorded by the same male speaker whose voice was used in the experiment. The experiment began after the subjects verbally confirmed that they were ready. The entire experiment took around 45 min.

Fifty native speakers of Tianjin participated in the experiment. Two of them were recorded in an anechoic chamber in the Phonetics and Psycholinguistics Laboratory of the University of Kansas using a Marantz PMD 671 solid-state recorder sampling at 22.05 kHz and an Electro-Voice RE-20 microphone. The other 48 were recorded in a quiet room in the Phonetics Laboratory of the Department of and Literature at Nankai University in Tianjin using the same model of solid-state recorder and an EV N/D 767a microphone. All of these speakers self-reported being native Tianjin speakers, and all were bilingual in Tianjin and Standard Chinese. We made it clear to them that we were interested in the Tianjin dialect, and the native-Tianjin instructions and practice should also have oriented them to the Tianjin context. The speakers’ recordings were judged to be native-Tianjin-like by a native Tianjin consultant in the US and a trained Tianjin linguist in Tianjin. The data from two of the speakers recorded in Tianjin could not be used: one speaker was from a suburb of Tianjin and spoke a different native dialect; the other’s data were lost due to a software malfunction. The 48 speakers whose data we did use were all from the six inner-city districts of Tianjin and used both Tianjin and Standard Chinese in their daily lives; 14 were male and 34 were female, with an average age of 23.4 at the time of the experiment.
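The block assignment and counterbalancing described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the Paradigm® configuration: the item IDs produced by the hypothetical `make_items` helper are placeholders, subjects with even indices are assumed to take Block A first, and Python's `random.Random` stands in for Paradigm®'s per-speaker randomization.

```python
import random
from itertools import product

SANDHI_PAIRS = {"T1+T1", "T3+T3", "T4+T1", "T4+T4", "T3+T2", "T3+T4"}
ALL_PAIRS = [f"T{a}+T{b}" for a, b in product(range(1, 5), repeat=2)]

# Block A as described in the text; every other combination goes to Block B.
BLOCK_A = {"T1+T1", "T3+T2", "T3+T3",                     # test sandhis
           "T1+T2", "T1+T3", "T1+T4", "T2+T1", "T2+T2"}   # fillers


def make_items():
    """Hypothetical placeholder items: 28 per sandhi pair
    (7 subtypes x 4 words) and 16 per filler pair."""
    items = []
    for pair in ALL_PAIRS:
        n = 28 if pair in SANDHI_PAIRS else 16
        items.extend((pair, f"{pair}#{k}") for k in range(n))
    return items


def session_order(items, subject_index, seed=None):
    """Presentation order for one subject: each block is shuffled separately,
    and block order alternates by subject index (an assumed counterbalancing rule)."""
    rng = random.Random(seed)
    blocks = {"A": [], "B": []}
    for item in items:
        blocks["A" if item[0] in BLOCK_A else "B"].append(item)
    for block in blocks.values():
        rng.shuffle(block)
    order = ("A", "B") if subject_index % 2 == 0 else ("B", "A")
    return [item for name in order for item in blocks[name]]
```

Under these assumptions, each block holds 164 items (three sandhi pairs with 28 items each plus five filler pairs with 16 each), which matches the statement that the 328 stimuli were evenly divided between the two blocks.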

2.2.2 Data analysis

All acoustic analyses of the data were conducted in Praat (Boersma and Weenink 2009). For the first syllable in all test words, we took an f0 measurement at every 10 % of the rhyme duration using Yi Xu’s TimeNormalizedF0 Praat script (Xu 2005), giving eleven f0 measurements for each syllable. The Maxf0 and Minf0 parameters in the script, as well as the octave-jump cost, were adjusted for each speaker, and the f0 measurements were hand-checked against narrow-band spectrograms in Praat. A token was excluded from further analysis in two situations: first, if neither the TimeNormalizedF0 script nor the narrow-band spectrogram could produce reliable pitch measurements for it; second, if its second syllable was pronounced as a stressless syllable, as judged by both authors, who are native speakers of Standard Chinese.4 The latter cases were excluded because stressless syllables in Tianjin have a reduced tonal inventory, and words with stressless syllables have a different set of tone sandhi behaviors, as shown in Jiang (1994) and Wang (2002). Of the 8064 tokens recorded (168 test words × 48 speakers), 932 were excluded for these two reasons, an attrition rate of 11.56 %.

4 Although neither author is a native speaker of Tianjin, we believe that our judgment was accurate as stressless syllables in Tianjin have significantly reduced duration (Jiang 1994), similar to Standard Chinese. 123 Author's personal copy


The f0 measurements in Hz were converted to semitones using the formula in (4a) to better reflect pitch perception (Rietveld and Chen 2006). The semitone values were then z-score transformed using the formula in (4b) over all measurements from a given speaker, in order to normalize for between-speaker variation, especially differences between male and female speakers (Rose 1987; Zhu 2004). Then, for each speaker, the f0 values of the four words within each word-(sub)type/sandhi-type combination were averaged, and the averaged data were submitted to statistical analyses.

(4) a. $\mathrm{ST} = 39.87 \times \log_{10}(\mathrm{Hz}/50)$
    b. $z = \dfrac{\mathrm{ST}_x - \frac{1}{n}\sum_{i=1}^{n}\mathrm{ST}_i}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\mathrm{ST}_i - \frac{1}{n}\sum_{j=1}^{n}\mathrm{ST}_j\right)^2}}$
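The two normalization steps in (4), and the averaging over the four words in each cell, can be sketched in Python as follows. This is an illustration only. It assumes plain lists of f0 values in Hz for one speaker. The function names and sample values are hypothetical, since the actual measurements came from Xu's TimeNormalizedF0 Praat script. `statistics.stdev` uses the n − 1 denominator, which matches (4b).

```python
import math
from statistics import mean, stdev


def hz_to_semitones(hz: float, ref_hz: float = 50.0) -> float:
    """Formula (4a): ST = 39.87 * log10(Hz / 50)."""
    return 39.87 * math.log10(hz / ref_hz)


def z_normalize(values: list[float]) -> list[float]:
    """Formula (4b): z-score each value against the mean and the sample
    standard deviation (n - 1 denominator) of all of one speaker's values."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def average_contours(contours: list[list[float]]) -> list[float]:
    """Point-by-point average of several 11-point contours, e.g. the four
    words in one word-(sub)type/sandhi-type cell for one speaker."""
    return [mean(points) for points in zip(*contours)]


# Hypothetical f0 values (Hz) pooled over one speaker's measurements.
speaker_hz = [110.0, 125.0, 140.0, 180.0, 210.0]
speaker_z = z_normalize([hz_to_semitones(f) for f in speaker_hz])
print([round(z, 2) for z in speaker_z])
```

Because 39.87 ≈ 12 / log10(2), one octave (a doubling of Hz) comes out as about 12 semitones.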

3 Results and discussions

3.1 Word-type results

We first report the productivity differences among the three word types (REAL, PSEUDO, and NOVEL) for the different sandhis. For REAL and PSEUDO words, we further averaged the pitch values of the first syllable across lexical frequencies within each word type. For each sandhi, three two-way Repeated-Measures ANOVAs were conducted, with Word-Type [two levels each: (1) REAL vs. PSEUDO; (2) PSEUDO vs. NOVEL; (3) REAL vs. NOVEL] and Data-Point (11 levels) as independent variables. A significant main effect of Word-Type indicates that the pitches from the two word types under comparison have different means. A significant main effect of Data-Point indicates that the pitch value changes over time. A significant interaction between Word-Type and Data-Point indicates that the pitches from the two word types have different slopes. Huynh–Feldt adjusted values were used to correct for sphericity violations. The average pitches for the different word types for each sandhi are plotted in Fig. 2, and the ANOVA results are summarized in Appendix 2 (see Supplementary material).

Let us first note that, regardless of the word type, the general sandhi patterns agree with the acoustic findings in Zhang and Liu (2011). Most notably, compared to traditional descriptions:

- for the T1+T1 sandhi, the sandhi tone is higher than expected and closer to a base T2 (Fig. 2a);
- for the T3+T3 sandhi, the sandhi tone is lower than expected and does not neutralize with T2 (Fig. 2b);
- for the T4+T4 sandhi, the sandhi tone has the same falling shape as the base tone, indicating that the sandhi has indeed become obsolete (Fig. 2d).

Crucially, however, the sandhis applied differently to the three types of words. This is shown by the frequent significant differences in pitch means and pitch slopes between the word types under comparison.
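For intuition about what these effects track, the sketch below computes two descriptive quantities for an 11-point contour. The first is its mean, which is what a Word-Type main effect is sensitive to. The second is its least-squares slope over the Data-Point index, a simple proxy for the shape differences that an interaction picks up. This is not a repeated-measures ANOVA and applies no Huynh–Feldt correction. The contours are hypothetical z-scored values.

```python
from statistics import mean

DATA_POINTS = list(range(11))  # 0%, 10%, ..., 100% of the rhyme


def contour_mean(contour: list[float]) -> float:
    """Average pitch over the 11 time-normalized points."""
    return mean(contour)


def contour_slope(contour: list[float]) -> float:
    """Ordinary least-squares slope of pitch against the data-point index."""
    mx, my = mean(DATA_POINTS), mean(contour)
    num = sum((x - mx) * (y - my) for x, y in zip(DATA_POINTS, contour))
    den = sum((x - mx) ** 2 for x in DATA_POINTS)
    return num / den


# Hypothetical z-scored contours for one sandhi in two word types.
real = [-1.0 + 0.15 * i for i in range(11)]
novel = [-1.2 + 0.10 * i for i in range(11)]
print(contour_mean(real) - contour_mean(novel))    # difference in means
print(contour_slope(real) - contour_slope(novel))  # difference in slopes
```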
Fig. 2 Average pitch contours of the first syllable in REAL, PSEUDO, and NOVEL disyllabic words for the six different sandhis (a–f). Significant comparisons in pitch means and pitch slopes are noted in the graphs; *, **, and *** indicate significant differences at the p < 0.05, p < 0.01, and p < 0.001 levels, respectively. For detailed ANOVA results, see Appendix 2 in Supplementary material. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

Specifically, we can categorize the six sandhis into three types, depending on whether the NOVEL words, the REAL words, or neither share more phonetic properties with the base tone of the first syllable of the stimuli. The properties under comparison are the pitch mean and pitch slope of the tones.

- In L+L → LH+L (Fig. 2a), LH+LH → H+LH (Fig. 2b), HL+HL → L+HL (Fig. 2d), and LH+H → L+H (Fig. 2e), the sandhi tone for NOVEL words shares more phonetic properties with the base tone than it does for REAL and PSEUDO words. For instance, in L+L → LH+L (Fig. 2a), the sandhi tone for NOVEL words is lower in pitch than that for REAL words. The base tone (L) has a lower pitch than the expected sandhi tone (LH), so this sandhi falls under the category in which the NOVEL words share more phonetic properties with the base tone.
- In HL+L → H+L (Fig. 2c), on the other hand, the sandhi tone in NOVEL words has more properties of the expected sandhi tone H. It has an overall higher pitch than the sandhi tones in REAL and PSEUDO words.
- Finally, for LH+HL → L+HL (Fig. 2f), there is no difference in pitch mean or pitch slope among the sandhi tones for the three types of words.

Our interpretation of these results is as follows. The first type of sandhi is underlearned by the speakers. The sandhi applies less productively in NOVEL words than in REAL words, as shown by the greater phonetic similarity between the sandhi tone and the base tone in NOVEL words. The second type of sandhi, the sandhi with exceptions HL+L → H+L (Fig. 2c), has been generalized and thus applies with greater regularity in NOVEL words. We consider this an instance of overlearning. The last type of sandhi, LH+HL → L+HL (Fig. 2f), is properly learned from the lexicon and applies to PSEUDO and NOVEL words in the same fashion as to REAL words. These results and their interpretations are summarized in Table 1.

A note of caution is in order for the interpretation of the gradient differences among word types. The average pitches in Fig. 2 represent all usable tokens in the recorded data, regardless of whether the token undergoes the sandhi per the rules in (2). The reason is that, except in a handful of cases, it is often difficult to determine whether a tonal combination has undergone the sandhi categorically, has undergone it incompletely, or has not undergone it at all.
The evidence differs by sandhi:

- For L+L → LH+L and LH+LH → H+LH, all usable tokens have undergone sandhi regardless of word type. The gradient differences among word types were therefore due to incomplete application of the sandhis to at least some of the wug tokens.
- For HL+L → H+L and HL+HL → L+HL, however, there are tokens in which the sandhi clearly did not apply: a handful for the former and the vast majority for the latter. The gradient differences seen in Fig. 2 are thus likely due to both categorical and gradient differences in the application of the sandhis.
- For the two half-third sandhis LH+H → L+H and LH+HL → L+HL, whether the sandhi had applied to a token was particularly difficult to decide. We surmise that the gradient differences observed for the former are primarily caused by different degrees of gradient application of the sandhi.

Table 1 A summary of the word-type results for the six tone sandhi patterns

Sandhi pattern | Acoustic results for sandhi tone | Learning classification
L → LH / __ L | Lower pitch in NOVEL than REAL words | Underlearning
LH → H / __ LH | Lower pitch in NOVEL than REAL words | Underlearning
HL → H / __ L | Higher pitch in NOVEL than REAL words | Overlearning
HL → L / __ HL | Higher pitch in NOVEL than REAL words | Underlearning
LH → L / __ H | Higher pitch in NOVEL than REAL words | Underlearning
LH → L / __ HL | No pitch difference among word types | Proper learning

Our results are in agreement with our hypotheses. We have shown that a tone sandhi pattern may be underlearned despite its full productivity in the lexicon, and that the underlearning may be realized gradiently as incomplete application of the sandhi. As hypothesized, the set of sandhis that shows underlearning includes the regular and the obsolete sandhis, L+L → LH+L, LH+LH → H+LH, and HL+HL → L+HL. It also includes the durationally based LH+H → L+H (reduction of contour due to insufficient duration). The other durationally based sandhi, LH+HL → L+HL, however, shows proper learning, as expected.

It is possible that, in order to approach proper learning, the pattern needs the help of both phonetics and high lexical frequency. The trigger of the properly learned half-third sandhi, HL (Tone 4), has considerably higher type and token frequencies than the trigger of the underlearned half-third sandhi, H (Tone 2). Zhang and Lai (2010) found the same underlearning and proper learning patterns for the half-third sandhis before Tone 2 and Tone 4 in Standard Chinese as well.

For the underlearning of the obsolete sandhi HL+HL → L+HL, our interpretation is that the real words that still undergo the sandhi are listed in the lexicon, but the sandhi itself has become unproductive, which manifests in the results as underlearning. For HL+L → H+L, by contrast, the exceptions to the sandhi are listed in the lexicon, but the sandhi itself is productive, which manifests in the results as overlearning.

3.2 Lexical frequency results

The effects of lexical frequency on sandhi productivity are reported in three separate graphs for each tone sandhi, as shown in Fig. 3: two for REAL words and one for PSEUDO words. The two comparisons for the REAL words are based on the token frequency of the disyllable (high vs. low) and the token frequency of the first syllable (high vs. low). The comparison for the PSEUDO words is based on the token frequency of the first syllable. Each graph shows the average pitch contours of the first syllable of the two word types under comparison. A two-way Repeated-Measures ANOVA was conducted for each comparison, with Frequency (two levels) and Data-Point (11 levels) as independent variables. A significant main effect of Frequency indicates that the pitches from the two frequency profiles have different means. A significant interaction between Frequency and Data-Point indicates that the pitches from the different frequencies have different slopes. Huynh–Feldt adjusted values were again used. Significant comparisons are indicated in Fig. 3. Detailed ANOVA results are given in Appendix 3 (see Supplementary material).

Fig. 3 Effects of lexical frequency on the productivity of different sandhis. For each sandhi, three comparisons are shown: REAL-Word-High vs. REAL-Word-Low; REAL-Syll1-High vs. REAL-Syll1-Low; PSEUDO-Syll1-High vs. PSEUDO-Syll1-Low. All graphs show the pitch contours of the first syllable of the two word types under comparison. In the graphs, *, **, and *** indicate significant differences at the p < 0.05, p < 0.01, and p < 0.001 levels, respectively. For detailed ANOVA results, see Appendix 3 in Supplementary material. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

The frequency results in Fig. 3 show that higher token frequency generally leads to higher productivity. L+L → LH+L (Fig. 3a) and LH+LH → H+LH (Fig. 3b) are both sandhis that raise the base tone. For both, the σ1 comparison for the PSEUDO words showed that a higher-frequency σ1 leads to higher pitch. This indicates that higher frequency for a syllable likely leads to a stronger allomorph listing for its sandhi tone. In turn, this supports the hypothesis that the gradient underlearning of sandhis exhibited in wug words is an exaggerated frequency effect.

The higher sandhi productivity for high-frequency words is also attested in the two REAL comparisons of the obsolete sandhi HL+HL → L+HL (Fig. 3d). There, the sandhi tones for high-frequency words are lower than those for low-frequency words. Because this sandhi lowers the base tone, the lower pitch indicates higher productivity for the high-frequency words. This is likely because high-frequency words are more conservative in maintaining exceptional behavior, which in this case is to maintain the sandhi.

Interestingly, the sandhi with exceptions, HL+L → H+L (Fig. 3c), shows the reverse. In the REAL-Word-High vs. REAL-Word-Low comparison, higher frequency in fact leads to lower productivity, as evidenced by the lower pitch of the sandhi tone for the high-frequency words. This reversal may also be caused by the conservative nature of high-frequency words in maintaining exceptional patterns. In this case, however, the exceptional pattern is the failure to apply the sandhi.

We also found a difference between the two half-third sandhis in the frequency effect. For LH+H → L+H, two of the frequency comparisons (REAL-Word-High vs. REAL-Word-Low; PSEUDO-σ1-High vs. PSEUDO-σ1-Low) showed a significant difference in pitch slope. No frequency comparison for LH+HL → L+HL reached significance. In both significant comparisons for LH+H → L+H, the high-frequency words had a more pronounced pitch fall at the beginning. The half-third sandhi in Tianjin primarily renders the first syllable a falling tone, as shown in Zhang and Liu (2011), so this seems to indicate a productivity advantage for the high-frequency words. This result, again, may be due to the overall higher lexical frequency of HL (Tone 4) than of H (Tone 2), which further encourages proper learning of the sandhi involving the former.
We are unable to interpret two of the frequency patterns observed in PSEUDO words: the high productivity of HL+L → H+L when σ1 has a high frequency, and the low productivity of HL+HL → L+HL when σ1 has a high frequency. It is interesting to note that these anomalies occur in the two sandhis with exceptional behaviors, and that they are the mirror images of the patterns observed for REAL words for these sandhis. We do not yet understand the significance of these observations.

4 A learning model

Our experimental results showed that Tianjin speakers’ knowledge of tone sandhi is a combination of proper learning, underlearning, and overlearning from the lexicon: on the one hand, lexical statistics do inform learning as evidenced by the frequency effects found in our experiment; on the other hand, productive patterns in the lexicon can be underlearned, especially when the patterns do not have strong phonetic bases, and the underlearning can be gradiently manifested in the phonetic realization of the sandhi tones, yet patterns with exceptions can also be overlearned and generalized to novel words. It is therefore imperative to have a learning model that is able to make these predictions.



4.1 The maximum entropy (MaxEnt) model

To this end, we designed a substantively biased learning model based on the Maximum Entropy (MaxEnt) grammar. In MaxEnt, each constraint is associated with a weight. For each input, the probability of a particular candidate surfacing as the output depends on how well this candidate satisfies the weighted constraints compared with all other candidates. Specifically, a candidate's probability is proportional to e raised to the negative of its harmony, the weighted sum of its constraint violations. Learning in a MaxEnt grammar consists of finding the constraint weights that maximize the log probability of the learning data. For each constraint, the learner can impose a Gaussian prior over its weight, with a mean of μ and a variance of σ², to prevent overfitting the data. The μ represents the default weight for the constraint, and σ² determines the severity of the penalty when the weight of the constraint deviates from μ: the smaller the σ², the greater the penalty. Formally, the learner then maximizes the log probability of the data minus the sum over constraints of (w − μ)²/(2σ²). Crucially, learning biases can be encoded as different σ² values for different constraints. For more details on MaxEnt grammars and learning biases as Gaussian priors, see Goldwater and Johnson (2003), Wilson (2006), Jäger (2007), and Hayes and Wilson (2008), among others.
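The two ingredients just described can be written down compactly: MaxEnt candidate probabilities, and the objective penalized by a Gaussian prior. The Python sketch below assumes violation counts are given as one list per candidate, together with observed output counts per input. It is a generic illustration of the objective, not the implementation inside the MaxEnt Grammar Tool, and all data in it are hypothetical.

```python
import math


def candidate_probs(violations: list[list[float]], weights: list[float]) -> list[float]:
    """MaxEnt: P(cand) is proportional to exp(-H), where the harmony H is
    the weighted sum of the candidate's constraint violations."""
    harmonies = [sum(w * v for w, v in zip(weights, row)) for row in violations]
    scores = [math.exp(-h) for h in harmonies]
    total = sum(scores)
    return [s / total for s in scores]


def objective(data, weights, mus, sigma2s) -> float:
    """Log probability of the observed outputs minus the Gaussian prior
    penalty sum((w - mu)^2 / (2 * sigma^2)); learning maximizes this."""
    log_lik = 0.0
    for violations, counts in data:
        probs = candidate_probs(violations, weights)
        log_lik += sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    penalty = sum((w - m) ** 2 / (2 * s2) for w, m, s2 in zip(weights, mus, sigma2s))
    return log_lik - penalty


# Hypothetical toy input: two candidates, one constraint, outputs observed 2:1.
toy = [([[0.0], [1.0]], [2, 1])]
# A smaller sigma^2 penalizes the same departure from mu = 0 far more.
print(objective(toy, [1.0], [0.0], [10.0]), objective(toy, [1.0], [0.0], [1e-3]))
```

With μ = 0 and a very small σ², the penalty term dominates and keeps a weight near its default. This is the sense in which a small σ² makes a constraint "harder to promote."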

4.2 Constraints

We also base our analysis on the dual listing/generation model of Zuraw (2000, 2010). This model assumes that existing forms are lexically listed and are protected by highly ranked faithfulness constraints. Lower, stochastically ranked constraints can then encode both patterns of lexical statistics and phonetically based generalizations. One crucial type of constraint in our model is USELISTED, inspired by Zuraw (2000). We propose two types of USELISTED constraints:

- Disyllable-level (5a). The speakers performed the sandhis better in real words than in wug words. We therefore posit that disyllabic words are listed in the lexicon with their sandhi tones, and that USELISTED constraints on disyllables force the listed disyllables to be used.
- Syllable-level (5b). Our results also showed that the speakers performed the sandhis better when the first syllable is an existing Tianjin syllable than when it is an accidental gap. This suggests that sandhi allomorphs of existing syllables are listed as well. We therefore posit a second group of USELISTED constraints that forces the listed syllable allomorphs to be used in non-final sandhi positions.

Note that the term "allomorph" in (5b) is used in a more abstract sense than the traditional, morpheme-specific definition, since it refers to syllables that can cue multiple homophonous morphemes. For example, [pan^HL] is an existing syllable in Tianjin and can represent morphemes meaning "half," "partner," "to mix," "to act as," "to deal with," and "to trip." This syllable therefore has a listed allomorph [pan^H] to be used before an L-toned syllable. USELISTED(pan^HL/_L) requires the use of this allomorph in the appropriate context, regardless of which morpheme the syllable represents.



(5) USELISTED constraints:
a. USELISTED(σ^L–σ^L): Use the listed /σ^LH–σ^L/ for /σ^L/+/σ^L/.
   Mutatis mutandis for USELISTED(σ^LH–σ^LH), USELISTED(σ^HL–σ^L), USELISTED(σ^HL–σ^HL), USELISTED(σ^LH–σ^H), and USELISTED(σ^LH–σ^HL).
b. USELISTED(σ^L/_L): Use the listed allomorph /σ^LH/ for /σ^L/ before an /L/-toned syllable.
   Mutatis mutandis for USELISTED(σ^LH/_LH), USELISTED(σ^HL/_L), USELISTED(σ^HL/_HL), USELISTED(σ^LH/_H), and USELISTED(σ^LH/_HL).

In our implementation of the model, the USELISTED constraints in (5a) are word-specific, and the ones in (5b) are syllable-specific. In other words, there are as many (5a)-type USELISTED constraints as there are words in Tianjin, and as many (5b)-type USELISTED constraints as there are syllable types. This is in the same spirit as the lexically indexed constraints of Coetzee (2009), Becker et al. (2011), and Coetzee and Kawahara (2013). It can be seen as one possible way in which lexical entries interact with the rest of the phonological grammar: the strength of a lexical entry is represented as the weight of its USELISTED constraint.

The USELISTED constraints employed here differ from USELISTED in Zuraw (2000) in three respects:

- Zuraw employs USELISTED only for morphologically complex forms, not for allomorphs.
- Zuraw assumes that each candidate is an input–output pairing, and defines her USELISTED constraint as "The input portion of a candidate must be a single lexical entry" (p. 50). We have made a different assumption: the candidate that is identical to the listed form is necessarily derived from the listed form.
- Zuraw uses only one USELISTED constraint. She encodes the strength of a lexical entry as a listedness value from 0 to 1, determined by the entry's lexical frequency, which reflects the availability of the lexical entry in the derivation of the output. We, on the other hand, have a proliferation of USELISTED constraints. Their weights reflect the strengths of lexical entries and syllable allomorph listings as determined by their frequencies.

The assumption, then, is as follows. Whenever a sandhied form is encountered and accepted by the speaker, a lexical entry for the word, or an abstract allomorph for the syllable, is built with the appropriate tone sandhi, together with a USELISTED constraint. The weight of that USELISTED constraint gradually increases as the form is encountered again.5

5 A reviewer asked what restrictions USELISTED constraints would have and whether they refer only to tones of a given language. These are interesting and difficult questions. Suppose that (a) a lexical phonological pattern is not entirely productive, and (b) there are productivity differences among different types of phonological patterns, indicating that the lack of full productivity is not just a task effect. Then it is necessary to encode the effect of lexicality for this pattern in the grammar. USELISTED constraints would therefore be applicable to any type of phonological pattern, not just tonal ones. It is possible to conceive of USELISTED constraints simply as IO-faithfulness constraints that require the output to be identical to the listed form. This is essentially how we have used these constraints here, so it is less of a surprise that they are applicable to other phonological features. In a published update of Zuraw (2000), Zuraw (2010) in fact rephrased the USELISTED constraints in similar terms. She distinguished the correspondence between the output and the listed form from the correspondence between the output and the "underlying" form, shifting the burden of the latter to Output–Output correspondence. We have simply maintained the distinction between USELISTED and IO-faithfulness here. The proliferation of USELISTED constraints is necessary for the analysis of lexical frequency effects on productivity as well as lexical variation, and Coetzee (2009), Becker et al. (2011), and Coetzee and Kawahara (2013), among others, have used a similar strategy.

Our model also includes the markedness constraints in (6), which militate against certain tonal combinations and hence motivate tone sandhi,6 and the faithfulness constraints in (7), which protect underlying tones.

(6) Markedness constraints:
a. *L–L
b. *LH–LH
c. *HL–L
d. *HL–HL
e. *LH–H
f. *LH–HL

(7) Faithfulness constraints:7
a. PRESERVE(L)
b. PRESERVE(H)
c. PRESERVE(LH)
d. PRESERVE(HL/_L)
e. PRESERVE(HL/_HL)

To capture the gradience observed in sandhi application, we define these constraints as gradient: a candidate may incur different degrees of violation of a constraint, encoded as different numbers of violation marks. We assume that the number of violations for each constraint ranges from 0 to 4. A value of 0 indicates that the output tone completely satisfies the requirement set forth by the constraint, and 4 indicates that the output tone deviates maximally from it. The 0–4 scale is admittedly ad hoc. It nonetheless represents a reasonable trade-off between the contrastive tone differences in Tianjin and the potential gradient steps between contrastive tones, given how tones are produced and perceived.

As an illustration, Table 2 evaluates five candidates for a real word with /L/+/L/ base tones and a listed /LH–L/ form against USELISTED(σ^L–σ^L), USELISTED(σ^L/_L), PRESERVE(L), and *L–L. The five candidates [L–L], [LL↑–L], [LM–L], [LH↓–L], and [LH–L] are phonetically evenly spaced between [L–L] and [LH–L]. The closer a candidate is to /L–L/, the more violations it incurs for USELISTED(σ^L–σ^L), USELISTED(σ^L/_L), and *L–L, but the fewer violations it incurs for PRESERVE(L).

6 The markedness constraints should be taken as phonotactic generalizations that speakers make when tonal alternations are encountered. This differs from the canonical OT assumption that all constraints are in UG (Prince and Smolensky 1993). For modeling the learning of phonotactic constraints, see Hayes and Wilson (2008).
7 We use PRESERVE instead of IDENT in our faithfulness constraints because, in its formal definition, IDENT(F) requires [F] to be a distinctive feature. The featural representation of tone, however, is controversial in both the number of tone levels and whether there are contour tone features (see Zhang 2010 for a review of the issue). We have therefore chosen the theory-neutral PRESERVE to avoid this controversy.


Table 2 Constraint evaluations (base: /L/+/L/; listed: /LH–L/)

Candidate | USELISTED(σ^L–σ^L) | USELISTED(σ^L/_L) | PRESERVE(L) | *L–L
L–L | 4 | 4 | 0 | 4
LL↑–L | 3 | 3 | 1 | 3
LM–L | 2 | 2 | 2 | 2
LH↓–L | 1 | 1 | 3 | 1
LH–L | 0 | 0 | 4 | 0
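The small Python sketch below shows how the violation profiles in Table 2 turn into candidate probabilities, and why dropping USELISTED constraints yields less sandhi. The weights are hypothetical placeholders, not the fitted weights. They are chosen only to echo the overall pattern reported for the simulations: high markedness and word-level USELISTED weights, and a zero-weighted PRESERVE(L). Following the treatment of test items described in Sect. 4.4, the PSEUDO-like evaluation drops the word-level USELISTED constraint, and the NOVEL-like evaluation drops both USELISTED constraints. Candidate labels are ASCII spellings of the Table 2 labels.

```python
import math

CONSTRAINTS = ["UseListed(word)", "UseListed(syll/_L)", "Preserve(L)", "*L-L"]

# Violation profiles from Table 2 (base /L/+/L/, listed /LH-L/);
# "(up)" and "(down)" stand for the raised and lowered tone marks.
CANDIDATES = {
    "L-L": [4, 4, 0, 4],
    "LL(up)-L": [3, 3, 1, 3],
    "LM-L": [2, 2, 2, 2],
    "LH(down)-L": [1, 1, 3, 1],
    "LH-L": [0, 0, 4, 0],
}

# Hypothetical weights for illustration only (NOT the learner's output).
WEIGHTS = {"UseListed(word)": 3.0, "UseListed(syll/_L)": 1.0, "Preserve(L)": 0.0, "*L-L": 2.0}


def predict(active: set[str]) -> dict[str, float]:
    """MaxEnt probabilities using only the constraints in `active`."""
    harmony = {
        cand: sum(WEIGHTS[c] * v for c, v in zip(CONSTRAINTS, viols) if c in active)
        for cand, viols in CANDIDATES.items()
    }
    scores = {cand: math.exp(-h) for cand, h in harmony.items()}
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}


real = predict(set(CONSTRAINTS))
pseudo = predict({"UseListed(syll/_L)", "Preserve(L)", "*L-L"})
novel = predict({"Preserve(L)", "*L-L"})
for label, probs in [("REAL", real), ("PSEUDO", pseudo), ("NOVEL", novel)]:
    print(label, round(probs["LH-L"], 3))
```

With these placeholder weights, the probability of the full-sandhi candidate [LH–L] decreases from REAL to PSEUDO to NOVEL. This is the direction of the REAL > PSEUDO > NOVEL gradation that the model is meant to capture.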

4.3 Learning biases as σ² values

We set the default weight μ to 0 and the default σ² to 10^−3 for all constraints. We also encode two learning biases by adjusting the σ² values of the USELISTED and the markedness constraints.

The first bias applies to USELISTED constraints. The σ² value of each USELISTED constraint is multiplied by a coefficient B_Listed that is smaller than 1, which biases against promoting the weight of the constraint. For each USELISTED constraint, we define B_Listed as 10 raised to the negative of a logistic function, as in (8). In this function, x represents the number of morphemes that the USELISTED constraint covers:

- For the USELISTED constraints for disyllabic words, x is naturally 1.
- For the USELISTED constraints for syllable-level allomorphs, x equals the number of homophones that the syllable represents.

Table 3 summarizes the average numbers of homophones for a syllable in each of the tones in Mandarin, as estimated from Da's (2004) corpus. We use these numbers to approximate the x values for the syllable-level USELISTED constraints in our learning simulation. Table 3 also gives the resulting B_Listed values.

The intuition behind this bias coefficient has two parts. First, learners use lexical information together with grammatical resources, such as the MARKEDNESS » FAITHFULNESS ranking, to make phonological generalizations, but they do so cautiously. The model expresses this caution by penalizing USELISTED constraints more heavily when they deviate from the default weight of 0, so that their weights are harder to promote. Second, learners are unwilling to treat large amounts of data as listed behavior. The model expresses this as even greater penalties for syllable-level USELISTED constraints, which makes these constraints still harder to promote along the weight scale.

(8) $B_{\text{Listed}} = 10^{-1/(1+e^{1-0.25x})}$ (x = the number of morphemes that the USELISTED constraint covers.)

The second bias favors phonetically grounded patterns. It applies to the USELISTED constraints that regulate base–sandhi mappings with a strong phonetic basis [i.e., USELISTED(σ^LH/_H), USELISTED(σ^LH/_HL)] and to the relevant markedness constraints (i.e., *LH+H, *LH+HL). Their σ² values are multiplied by a coefficient B_Phonetics = 10, which makes their weights easier to promote. All other constraints are assumed to have B_Phonetics = 1. This coefficient expresses a substantive bias, in the manner of Wilson (2006), that gives phonetically motivated patterns an edge in learning over other patterns (see also Zhang and Lai 2010; Zhang et al. 2009, 2011).

Each USELISTED constraint's σ² value, then, is 10^−3 multiplied by its B_Listed and B_Phonetics values. The σ² values of all other constraints are 10^−3 multiplied by their respective B_Phonetics values. The σ² values for all constraints are summarized in Table 4.

Table 3 B_Listed values for USELISTED constraints

USELISTED constraint | x | B_Listed
USELISTED(σ–σ) | 1 | 0.4777
USELISTED(σ^L/_L) | 5.45 | 0.2573
USELISTED(σ^LH/_LH) | 3.72 | 0.3292
USELISTED(σ^HL/_L) | 5.76 | 0.2465
USELISTED(σ^HL/_HL) | 5.76 | 0.2465
USELISTED(σ^LH/_H) | 3.72 | 0.3292
USELISTED(σ^LH/_HL) | 3.72 | 0.3292
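The bias arithmetic in (8) and the σ² composition just described can be checked with a few lines of Python. The sketch uses only values stated in the text: default σ² = 10^−3, the x values from Table 3, and B_Phonetics = 10 for the two half-third USELISTED constraints and *LH+H, *LH+HL. The dictionary keys are our own ASCII spellings of the constraint names.

```python
import math

DEFAULT_SIGMA2 = 1e-3

# x = number of morphemes the USELISTED constraint covers (Table 3).
USELISTED_X = {
    "UseListed(s-s)": 1.0,
    "UseListed(sL/_L)": 5.45,
    "UseListed(sLH/_LH)": 3.72,
    "UseListed(sHL/_L)": 5.76,
    "UseListed(sHL/_HL)": 5.76,
    "UseListed(sLH/_H)": 3.72,
    "UseListed(sLH/_HL)": 3.72,
}
# Constraints given B_Phonetics = 10; every other constraint has B_Phonetics = 1.
PHONETIC = {"UseListed(sLH/_H)", "UseListed(sLH/_HL)", "*LH+H", "*LH+HL"}


def b_listed(x: float) -> float:
    """Formula (8): 10 raised to minus a logistic function of x."""
    return 10 ** (-1.0 / (1.0 + math.exp(1.0 - 0.25 * x)))


def sigma2(name: str) -> float:
    """sigma^2 = 10^-3 * B_Listed (USELISTED only) * B_Phonetics."""
    b_phon = 10.0 if name in PHONETIC else 1.0
    b_list = b_listed(USELISTED_X[name]) if name in USELISTED_X else 1.0
    return DEFAULT_SIGMA2 * b_list * b_phon


for name in [*USELISTED_X, "*L+L", "*LH+H", "Preserve(L)"]:
    print(f"{name:20s} {sigma2(name):.7f}")
```

The printed values should agree with Tables 3 and 4 up to rounding in the last reported digit.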

4.4 Learning simulations

The goal of our learning simulation is to train the learner on a representative sample of the Tianjin lexicon, so that it acquires a grammar that can predict our speakers' wug-test behavior. The learning was simulated using the MaxEnt Grammar Tool (Hayes et al. 2009a). The training dataset included 20 real words for each of the base tone combinations L+L, LH+LH, HL+L, HL+HL, LH+H, and LH+HL. Of the 20 words for each tonal combination, 10 were high-frequency and 10 were low-frequency. To set the token frequencies of the words in the training dataset, we used the average raw frequencies from Da's corpus of the disyllabic words in each tonal combination used in our experiment, as shown in Table 5. For example, for L+L, each of the 10 high-frequency words had a token frequency of 4615, and each of the 10 low-frequency words had a token frequency of 75. For each word, the learner considered five candidates whose initial syllables were phonetically evenly spaced between the base tone and the sandhi tone, as in Table 2. The listed forms were as follows:

- For L+L, LH+LH, LH+H, and LH+HL, the base tone was consistently listed as undergoing sandhi.
- For HL+L, one high-frequency word and one low-frequency word were listed as not undergoing sandhi.
- For HL+HL, only one high-frequency word and one low-frequency word were listed as undergoing sandhi.

The USELISTED constraints were indexed to the words and the syllables. Each sandhi was tested separately, and the learner acquired the weights of the constraints relevant to that sandhi. Because of the large number of word- and syllable-specific USELISTED constraints, we do not list the weights for individual constraints. Overall, however, the learned weights show the following pattern:

- The USELISTED constraints for high-frequency disyllabic words have higher weights than those for low-frequency disyllabic words.
- The USELISTED constraints for high-frequency syllable allomorphs have higher weights than those for low-frequency syllable allomorphs.
- USELISTED for a disyllabic word has a higher weight than USELISTED for the syllable allomorph of its first syllable.
- The markedness constraints generally have high weights, except for *HL–HL, which has a weight of 0.
- The faithfulness constraints have a weight of 0, except for PRESERVE(HL/_HL), which has a high weight.

Table 4 σ² values for all constraints

Constraint | σ²
USELISTED(σ–σ) | 0.0004777
USELISTED(σ^L/_L) | 0.0002573
USELISTED(σ^LH/_LH) | 0.0003292
USELISTED(σ^HL/_L) | 0.0002465
USELISTED(σ^HL/_HL) | 0.0002465
USELISTED(σ^LH/_H) | 0.003292
USELISTED(σ^LH/_HL) | 0.003292
*L+L | 0.001
*LH+LH | 0.001
*HL+L | 0.001
*HL+HL | 0.001
*LH+H | 0.01
*LH+HL | 0.01
PRESERVE(L) | 0.001
PRESERVE(H) | 0.001
PRESERVE(LH) | 0.001
PRESERVE(HL/_L) | 0.001
PRESERVE(HL/_HL) | 0.001

Table 5 Token frequencies of disyllabic words in the training dataset

Tone combination | High frequency | Low frequency
L+L | 4615 | 75
LH+LH | 3267 | 201
HL+L | 3652 | 307
HL+HL | 3629 | 173
LH+H | 2851 | 117
LH+HL | 4291 | 216

To test the accuracy of the learning model, we considered the learner's predictions for five types of words:

- high- and low-frequency real words (REAL-High, REAL-Low);
- pseudo words in which σ1 comes from high- and low-frequency real words (PSEUDO-High, PSEUDO-Low);
- novel words with a nonce σ1 (NOVEL).

For PSEUDO words, we assumed that only the syllable-level USELISTED constraints were relevant. For NOVEL words, none of the USELISTED constraints were relevant. For HL+L and HL+HL, we tested both words that are listed to undergo sandhi and words that are listed not to.

The learner predicted percentages for the five output candidates, whose initial syllables were phonetically evenly spaced between the base tone and the sandhi tone. For all the sandhis, the base and the sandhi tones differ in pitch only at either the left or the right edge of the tone. In reporting the learner's predictions, we therefore give the average pitch of this crucial edge according to the predicted outputs for each base tone combination. To facilitate the pitch calculation, we represented the pitch on a 1–5 numerical scale, on which 5 = H, 4 = H↓, 3 = M, 2 = L↑, and 1 = L.

To illustrate, take a high-frequency real word with the base tones /L/+/L/. Suppose the learner predicts the five candidates [L–L], [LL↑–L], [LM–L], [LH↓–L], and [LH–L] with the percentages 0.005, 0.062, 0.705, 8.021, and 91.206%, respectively. The predicted average offset pitch for σ1 is then 1 × 0.005% + 2 × 0.062% + 3 × 0.705% + 4 × 8.021% + 5 × 91.206% = 4.9036. For HL+L and HL+HL, the average pitch was derived by proportionally combining the predictions for forms with listed sandhi and those for forms with listed no-sandhi (9:1 for HL+L, 1:9 for HL+HL).

Fig. 4 The learner's predictions for the behavior of different sandhis for five word types: REAL-High, REAL-Low, PSEUDO-High, PSEUDO-Low, and NOVEL. Bars in the graphs represent the average pitches among the predicted outputs of the edge of the tone where the base and the sandhi tones differ. The pitch is represented on a 1–5 numerical scale: 5 = High and 1 = Low. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

The learner's predictions are summarized in Fig. 4. The sandhi L+L → LH+L changes L to LH, so the higher the right edge of the output tone, the more productively the sandhi has applied. Our model makes two predictions for this sandhi (Fig. 4a). First, there is a general gradation of sandhi productivity from REAL to PSEUDO to NOVEL. Second, the sandhi applies more productively in high-frequency than in low-frequency words. Both predictions were borne out in our experimental results. The pattern for LH+LH → H+LH (Fig. 4b) is similar.

For the two half-third sandhi patterns, LH+H → L+H (Fig. 4e) and LH+HL → L+HL (Fig. 4f), our model predicts smaller differences among the word types. This is due to the B_Phonetics coefficient, which allows the weights of the relevant USELISTED and markedness constraints to be promoted more easily. In our experiment, the LH+H → L+H sandhi showed underlearning in PSEUDO and NOVEL words but no clear results based on lexical frequency, and the LH+HL sandhi showed proper learning. Our model in fact predicts slightly smaller pitch differences among word types for the latter, because the LH+HL words have higher token frequencies in the learner's input.

For the sandhis with exceptional behavior, the model predicts mirror-image patterns. For HL+HL → L+HL, it predicts higher productivity in real words, especially high-frequency ones, as indicated by a lower σ1 onset pitch (Fig. 4d). For HL+L → H+L, it predicts lower productivity in high-frequency real words, as indicated by a lower σ1 offset pitch (Fig. 4c). Both predictions agreed with the experimental results. In our model, the predicted productivity differences are a combination of categorical and gradient differences in the application of the sandhis.
This also echoes the experimental results. We compared the current model with a baseline model in which the phonetic nature of the sandhi is not encoded in the grammar; in other words, Bphonetics = 1 for all constraints. This baseline model makes different predictions for LH+H and LH+HL, as shown in Fig. 5. The main difference is in the magnitude of the predicted pitch difference: the baseline model predicts a considerably larger productivity difference between word types and between words of different lexical frequencies. In our experimental results, lexical frequency had a significant effect on productivity for only a subset of the comparisons for LH+H (Fig. 3e), and neither word type nor lexical frequency had a significant effect on productivity for LH+HL. The smaller effect predicted by the model with the Bphonetics coefficient is therefore more consistent with the experimental results.

Fig. 5 The predictions of the baseline learner in which Bphonetics = 1 for all constraints for the LH+H and LH+HL sandhis. a LH+H → L+H (T3+T2 → T1+T2), b LH+HL → L+HL (T3+T4 → T1+T4)

We also compared our results to a second baseline model, which had no bias against the promotion of USELISTED constraints (in other words, BListed = 1) but retained the phonetic bias. The predictions of this baseline model are given in Fig. 6. Compared to the original model, this baseline model predicts similar patterns, but the predicted differences among word types and frequencies are slightly greater. This works to the advantage of the non-phonetic sandhis L+L, LH+LH, HL+L, and HL+HL, for which the magnitude of the effects predicted by the original model was smaller than the attested effects. It works to the disadvantage of the two phonetic sandhis LH+H and LH+HL, for which our experimental results showed no consistent effect. However, we do not consider this baseline model theoretically sound. Earlier work by Zhang et al. (2009, 2011) has shown that the BListed coefficients are crucial to the learning of opaque tone sandhi patterns in Taiwanese, and there is no reason to assume that they would be irrelevant for Tianjin.
These coefficients are particularly important for opaque sandhis because such sandhis cannot be captured by the MARKEDNESS » FAITHFULNESS schema and must be acquired through lexical and allomorph listings in our model. To capture the lack of full productivity of the opaque sandhis that wug tests reveal, the learning model must actively suppress the promotion of weights for USELISTED constraints, especially those regarding syllable and tonal allomorphs. For transparent sandhis like those in Tianjin, the patterns are the combined result of both USELISTED and markedness constraints. Suppressing the weights for USELISTED therefore has a less dramatic effect, as the markedness constraints compensate by acquiring greater weights. Indeed, in the baseline simulation where BListed was set to 1, the weights for USELISTED constraints were greater, but the weights for the markedness constraints were smaller. This trade-off produced effects of word type and frequency similar to those of the original model. Finally, we also tested a baseline model in which both Bphonetics and BListed are set to 1. This model has the theoretical problem, just mentioned, of not suppressing the weights for USELISTED constraints. It also has the same empirical problem as the first baseline model: it predicts a productivity difference between word types and between words of different lexical frequencies for the two half-third sandhis, as shown in Fig. 7. Its predicted patterns for the sandhis without clear phonetic motivation are identical to those in Fig. 6a–d. Overall, we believe that our biased model is both theoretically sound and empirically successful.
It succeeds in predicting the simultaneous underlearning and overlearning of the sandhi patterns. The learner can slightly underlearn sandhi patterns despite their full productivity in the lexicon, but it can also overgeneralize sandhis with exceptions to wug words. Both the underlearning and the overlearning are also correlated with frequency effects in the right direction. Regarding proper learning for one of the half-third sandhis, the biased model predicts only smaller differences in productivity between real and wug words, not identity between the two. The predicted differences would likely be reduced further if we took type frequency into account in our model, as the HL tone that triggers the properly learned half-third sandhi has the highest syllable-type frequency among all tones.

Fig. 6 The predictions of the baseline learner in which BListed = 1 for all USELISTED constraints for the six sandhi patterns. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)
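The bar heights in Figs. 4–7 are average edge pitches computed from the learner’s predicted candidate percentages. The worked /L/+/L/ example earlier in this section shows the calculation. For HL+L and HL+HL, the bars combine listed-sandhi and listed-no-sandhi forms in a 9:1 and 1:9 ratio, respectively. The following Python sketch reproduces that arithmetic. The function and constant names are hypothetical, and the only data it uses are the percentages quoted in the text.

```python
"""Sketch: turning predicted candidate percentages into average edge pitches.

The 1-5 scale and the 9:1 / 1:9 mixing ratios follow the text; all names are
hypothetical and chosen only for illustration.
"""
from typing import Dict, Sequence

# Pitch of the crucial tone edge for the five evenly spaced candidates
# (1 = L, 2 = L-raised, 3 = M, 4 = H-lowered, 5 = H).
EDGE_PITCHES = (1, 2, 3, 4, 5)

# Share of listed-sandhi forms when combining predictions (9:1 and 1:9).
LISTED_SANDHI_SHARE: Dict[str, float] = {"HL+L": 0.9, "HL+HL": 0.1}


def average_edge_pitch(percentages: Sequence[float],
                       pitches: Sequence[int] = EDGE_PITCHES) -> float:
    """Weighted mean edge pitch; percentages are given in the same order as pitches."""
    if len(percentages) != len(pitches):
        raise ValueError("need exactly one percentage per candidate")
    return sum(pct / 100 * pitch for pct, pitch in zip(percentages, pitches))


def mix_listed(listed_sandhi: float, listed_no_sandhi: float, sandhi_share: float) -> float:
    """Combine the averages for listed-sandhi and listed-no-sandhi forms proportionally."""
    return sandhi_share * listed_sandhi + (1 - sandhi_share) * listed_no_sandhi


# High-frequency real word /L/+/L/: candidates [L-L], [LL-raised-L], [LM-L],
# [LH-lowered-L], [LH-L] with the percentages given in the text.
ll_high = average_edge_pitch([0.005, 0.062, 0.705, 8.021, 91.206])
print(round(ll_high, 4))  # 4.9036
```

Because the candidates are evenly spaced, the result can be read directly on the 1–5 scale. A value near 5 for /L/+/L/ means that σ1 is predicted to surface almost fully as the sandhi tone LH.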



Fig. 7 The predictions of the baseline learner in which Bphonetics = BListed = 1 for all constraints for the LH+H and LH+HL sandhis. a LH+H → L+H (T3+T2 → T1+T2), b LH+HL → L+HL (T3+T4 → T1+T4)

Our model, however, still needs improvement in the following areas. First, the overall magnitudes of the predicted productivity differences are currently too small compared to our wug test results. Second, we could not interpret the frequency patterns in PSEUDO words for the two sandhis with exceptional behavior, and our model likewise fails to capture them. Third, although we have commented on the influence of Beijing and Standard Chinese (SC) on Tianjin, our model has not formally taken this influence into account. It can hence model only underlearning or overlearning effects due to Tianjin-internal factors. A more comprehensive model should be able to predict how the SC input helps shape the productivity patterns.
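The biased learner and its baselines differ only in coefficients (Bphonetics, BListed) that make particular constraint weights easier or harder to promote. This section does not spell out the update rule, so the following is a minimal sketch under explicit assumptions. It assumes a MaxEnt grammar, in which each candidate's probability is proportional to exp(-harmony). It also assumes plain gradient ascent on log-likelihood with non-negative weights, and bias coefficients that multiply only promoting updates. In this sketch, a value below 1 suppresses promotion, as BListed does. A value above 1 would ease promotion, in the spirit of Bphonetics. The constraint names, the single toy word, and the value 0.5 are hypothetical. The original model may implement the biases differently, for example through per-constraint priors.

```python
"""Sketch of a MaxEnt learner whose weight promotions are scaled by bias coefficients.

Assumptions (not taken from the article): gradient ascent on log-likelihood,
non-negative weights, and coefficients that multiply only promoting updates.
Constraint names and the coefficient value are hypothetical.
"""
import math
from typing import Dict, List

Violations = Dict[str, int]  # constraint name -> number of violations


def maxent_probs(weights: Dict[str, float], candidates: List[Violations]) -> List[float]:
    """P(candidate) is proportional to exp(-harmony); harmony = sum of weight * violations."""
    harmonies = [sum(weights[c] * n for c, n in cand.items()) for cand in candidates]
    lowest = min(harmonies)  # shifting by a constant leaves the probabilities unchanged
    scores = [math.exp(-(h - lowest)) for h in harmonies]
    total = sum(scores)
    return [s / total for s in scores]


def train(candidates: List[Violations], observed: int, bias: Dict[str, float],
          rate: float = 0.1, steps: int = 200) -> Dict[str, float]:
    """Fit weights (all starting at 0) to a single observed winning candidate."""
    names = sorted({c for cand in candidates for c in cand})
    weights = {c: 0.0 for c in names}
    for _ in range(steps):
        probs = maxent_probs(weights, candidates)
        for c in names:
            expected = sum(p * cand.get(c, 0) for p, cand in zip(probs, candidates))
            # Gradient of log P(observed) with respect to this constraint's weight.
            step = rate * (expected - candidates[observed].get(c, 0))
            if step > 0:  # the bias coefficient scales promotion only
                step *= bias.get(c, 1.0)
            weights[c] = max(0.0, weights[c] + step)
    return weights


# Hypothetical /L+L/ word listed with sandhi: faithful [L+L] violates the markedness
# constraint *L-L and USELISTED; the observed sandhi form [LH+L] violates IDENT.
CANDS: List[Violations] = [{"*L-L": 1, "USELISTED": 1}, {"IDENT": 1}]
unbiased = train(CANDS, observed=1, bias={})
biased = train(CANDS, observed=1, bias={"USELISTED": 0.5})

for label, w in (("BListed = 1", unbiased), ("BListed = 0.5", biased)):
    p_sandhi = maxent_probs(w, CANDS)[1]
    print(label, {c: round(v, 3) for c, v in w.items()}, "P(sandhi) =", round(p_sandhi, 3))
```

In this toy setting, the coefficient below 1 has three effects. USELISTED ends with a lower weight, the markedness constraint ends with a higher weight in compensation, and the sandhi output is predicted slightly less often. This qualitatively mirrors the trade-off between USELISTED and markedness weights described above for the BListed = 1 baseline.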

5 General discussion

5.1 The theoretical model

Earlier theoretical analyses of disyllabic tone sandhi in Tianjin (e.g., Wang 2002; Lin 2008) take the four sandhi patterns in (2) as given, in terms of both their productivity and their neutralizing nature. They account for the patterns via the interaction between various types of tonal Obligatory Contour Principle (OCP; Leben 1973) constraints and tonal faithfulness constraints. For example, Lin (2008) accounts for the sandhi L+L → LH+L as in (9). The tones are represented on two levels. The tonal level (T) is directly associated with the syllable. The tonemic level (t) consists of the level components of contour tones and is dominated by the tonal level. OCP and faithfulness constraints on tones can be defined on either the T or the t level, and a subscripted L or R indicates the level tone on the left or right of a contour tone. The conjoined constraint [IDENT(t)R and IDENT(t)L]T militates against changing both the left and the right edge of a contour tone.
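The following sketch makes the constraint definitions above concrete by evaluating four candidates for /L+L/ under strict domination. It is a toy illustration, not a reproduction of Lin’s (2008) tableau in (9). The following are all assumptions made for this illustration:

- the candidate set
- the restriction of changes to σ1
- the treatment of a level tone as a single toneme, so that both of its edges change when it becomes another level tone
- the ranking of OCP(t) and the conjoined constraint above the simple IDENT(t) constraints

```python
"""Toy strict-domination evaluation of /L+L/ with t-level OCP and edge faithfulness.

Assumptions (not Lin's tableau): only sigma1 changes, a level tone is a single toneme,
and OCP(t) and the conjoined constraint outrank the simple IDENT(t) constraints.
"""
from typing import Dict, List, Tuple

Tone = Tuple[str, ...]  # tonemes (t level) from left to right, e.g. ("L", "H") for LH

RANKING = ["OCP(t)", "[IDENT(t)R and IDENT(t)L]T", "IDENT(t)L", "IDENT(t)R"]


def violations(base: Tone, out1: Tone, sigma2: Tone) -> Dict[str, int]:
    """Violation counts for an output sigma1 (out1) followed by an unchanged sigma2."""
    left_changed = out1[0] != base[0]
    right_changed = out1[-1] != base[-1]
    return {
        # identical adjacent tonemes across the syllable boundary
        "OCP(t)": int(out1[-1] == sigma2[0]),
        # violated only when both edges of sigma1's tone change
        "[IDENT(t)R and IDENT(t)L]T": int(left_changed and right_changed),
        "IDENT(t)L": int(left_changed),
        "IDENT(t)R": int(right_changed),
    }


def winner(base: Tone, sigma2: Tone, candidates: List[Tone]) -> Tone:
    """Strict domination: compare violation profiles lexicographically in ranking order."""
    def profile(cand: Tone) -> Tuple[int, ...]:
        v = violations(base, cand, sigma2)
        return tuple(v[name] for name in RANKING)
    return min(candidates, key=profile)


CANDIDATES: List[Tone] = [("L",), ("L", "H"), ("H",), ("H", "L")]
best = winner(("L",), ("L",), CANDIDATES)
print("".join(best) + "+L")  # LH+L
```

Under these assumptions, [L+L] and [HL+L] lose on OCP(t). [H+L] loses on the conjoined constraint because both edges of σ1 change. [LH+L] changes only the right edge and therefore wins.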



We have taken a very different approach in our analysis. Its advantage is that it more accurately reflects speakers’ knowledge of Tianjin disyllabic tone sandhi, which involves exceptions and incomplete neutralization and whose productivity varies depending on the sandhi. An analysis along the lines of (9) misses these nuanced yet important generalizations. The price that we pay, however, is a proliferation of USELISTED constraints that interact with the rest of the grammar. Moreover, the syllable-level USELISTED constraints partially duplicate the function of the MARKEDNESS » FAITHFULNESS ranking. Zhang et al. (2009, 2011) have shown that this duplication is empirically necessary to capture the lack of full productivity of the opaque tone sandhis in Taiwanese. What we have seen here is that full productivity is not guaranteed even for transparent sandhis that can be captured by the interaction of markedness and faithfulness, and lexical listing is still necessary. Our grammar combines traditional markedness and faithfulness constraints with ad hoc USELISTED constraints that require surface forms to use listed allomorphs. This coexistence is in line with Moreton’s (2004) argument that the grammar has two components. One is an innate and “conservative” component of markedness and faithfulness constraints. The other is a language-specific component with constraints that require particular lexical items to have particular surface representations in particular environments. We share with Moreton (2004) the intuition that the grammar needs such constraints in any case, to deal with processes that target specific lexical items and morphological categories, that are suppletive, and that have lexical exceptions. We take the position one step further, however, by positing that speakers will build lexical constraints in any event.
In other words, USELISTED can be considered a universal template into which learners plug the specifics of their language. Finally, the model proposed here has affinities with exemplar-based models of grammar (e.g., Bybee 2001, 2006; Pierrehumbert 2001, 2002; Gahl and Yu 2006), in that it allows usage frequency effects on phonological patterning to be captured. In our model, however, the frequency effects do not simply emerge from the lexicon. They are derived through the weights of USELISTED constraints, which interact with other constraints in the grammar. The frequency effects are thus predicted to interact with other grammatical effects in ways constrained by the grammar.



5.2 Size of the acoustic effects

The acoustic results showed that some of the word-type or frequency comparisons yield significant differences in either pitch mean or pitch slope, but these differences are typically small. Absolute f0 differences in the comparisons are generally on the order of a few hertz before normalization. An anonymous reviewer questioned whether such small differences can support a claim of learning differences and, hence, of grammatical differences. The point we would like to emphasize, however, is that our main result lies in the different behaviors of different sandhi patterns. We have interpreted these behaviors in terms of lexical and phonetic properties of the sandhi patterns that are known to affect phonological productivity in general. The pitch differences, though small, therefore cannot easily be attributed to the nature of the task and swept under the rug; they need to be accounted for in other ways. The position we have taken is that the lexical and phonetic properties of a sandhi directly influence its production grammar. That the small acoustic differences may not be perceptible does not contradict the fact that different sandhis are processed differently in production. This is in fact a familiar scenario. Production studies of incomplete neutralization and near merger often find small but consistent acoustic differences. Perception studies, however, show that speakers’ use of these subtle cues is highly context-dependent and often unreliable (e.g., Jassem and Richter 1989; Port and Crawford 1989; Peng 2000; Warner et al. 2004; Yu 2007; Herd et al. 2010).

5.3 Aggregate vs. individual differences

As an anonymous reviewer pointed out, it is important to recognize that our grammatical model is based on the aggregate acoustic results from multiple speakers of Tianjin. The model is therefore only a representation of the behavior of an idealized native speaker of Tianjin. As we have commented in Sects. 1.3 and 3, there were clearly individual differences in how the speakers behaved, so each individual speaker’s grammar will deviate from the model that we have proposed. However, we opted not to construct a separate grammar from each speaker’s data. The speakers’ idiosyncrasies would likely be overrepresented in such grammars, whereas a grammar based on the aggregate results is more likely to be representative of the Tianjin language. This is common practice for modeling analyses of phonological patterns based on experimental results or corpus data (e.g., Wilson 2006; Hayes and Londe 2006; Coetzee and Pater 2008; Hayes et al. 2009b; Becker et al. 2011; Zuraw 2010; Coetzee and Kawahara 2013).

6 Conclusions

The tone sandhi patterns of Tianjin Chinese are variable, gradient, and full of exceptions. To understand how speakers of Tianjin tackle phonological patterns of such complexity, we conducted a wug test to investigate the productivity of the sandhi patterns. Our results indicate that a Tianjin speaker’s knowledge of tone sandhi may differ from the sandhi patterns in the lexicon in nuanced ways. Sandhis with exceptions can be generalized and overlearned, while a number of sandhis that are fully productive in the lexicon are underlearned; both outcomes illustrate the effects of frequency and lexical listing on sandhi productivity. In addition, the phonetic nature of a sandhi may encourage learning, bringing underlearning closer to proper learning. We claim that these mismatches are informative about the nature of speakers’ phonological grammars. A model of the grammar, consequently, needs to be quantitative and flexible enough to capture the variability, gradience, and exceptions, as well as the resulting overlearning and underlearning effects.

Acknowledgments We are indebted to Ping Wang, Xiaoyu Zeng, and Feng Shi at Nankai University for hosting us during data collection and discussing various aspects of this project with us. We also thank Geng Wang for serving as our Tianjin language consultant and the speakers of Tianjin who participated in our experiment. We are grateful to the participants at GLOW-Asia 8 and the second Pan-American/Iberian Meeting on Acoustics, especially James Myers, Doug Whalen, and Charles Yang, for their comments on this research. We, however, remain fully responsible for the opinions expressed here. This research was supported by the National Science Foundation grant BCS-0750773 and the University of Kansas General Research Fund 2301166.

References

Albright, Adam. 2002. Islands of reliability for regular morphology: Evidence from Italian. Language 78(4): 684–709.
Albright, Adam, and Bruce Hayes. 2003. Rules vs. analogy in English past tenses: A computational/experimental study. Cognition 90: 119–161.
Albright, Adam, Argelia Andrade, and Bruce Hayes. 2001. Segmental environments of Spanish diphthongization. In UCLA working papers in linguistics 7 (Papers in phonology 5), ed. Adam Albright, and Taehong Cho, 117–151. Los Angeles: UCLA Department of Linguistics.
Becker, Michael, Nihan Ketrez, and Andrew Nevins. 2011. The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language 87: 84–125.
Berent, Iris, Donca Steriade, Tracy Lennertz, and Vered Vaknin. 2007. What we know about what we have never heard: Evidence from perceptual illusions. Cognition 104: 591–630.
Berko, Jean. 1958. The child’s learning of English morphology. Word 14: 150–177.
Boersma, Paul, and David Weenink. 2009. Praat: Doing phonetics by computer (computer program). http://www.praat.org/. Accessed 5 Jan 2009.
Bybee, Joan. 2001. Phonology and language use. Cambridge: Cambridge University Press.
Bybee, Joan. 2006. From usage to grammar: The mind’s response to repetition. Language 82: 711–733.
Bybee, Joan, and Elly Pardo. 1981. On lexical and morphological conditioning of alternations: A nonce-probe experiment with Spanish verbs. Linguistics 19: 937–968.
Chao, Yuen Ren. 1968. A grammar of spoken Chinese. Berkeley and Los Angeles: University of California Press.
Chen, Matthew Y. 1987. The syntax of Xiamen tone sandhi. Phonology Yearbook 4: 109–150.
Chen, Matthew Y. 2000. Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge University Press.
Cheng, Robert L. 1968. Tone sandhi in Taiwanese. Linguistics 41: 19–42.
Coetzee, Andries W. 2009. Learning lexical indexation. Phonology 26: 109–145.
Coetzee, Andries W., and Shigeto Kawahara. 2013. Frequency biases in phonological variation. Natural Language and Linguistic Theory 31: 47–89.
Coetzee, Andries W., and Joe Pater. 2008. Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. Natural Language and Linguistic Theory 26: 289–337.



Coetzee, Andries, and Joe Pater. 2011. The place of variation in phonological theory. In The handbook of phonological theory, 2nd ed, ed. John A. Goldsmith, Jason Riggle, and Alan C.L. Yu, 401–434. Cambridge, MA and Oxford, UK: Blackwell.
Da, Jun. 2004. Chinese text computing. http://lingua.mtsu.edu/chinese-computing. Accessed 1 Sept 2008.
Gahl, Susanne, and Alan Yu. 2006. Special issue on exemplar-based models in linguistics. The Linguistic Review 23(3): 289–318.
Gandour, Jackson T., Siripong Potisuk, and Sumalee Dechongkit. 1994. Tonal coarticulation in Thai. Journal of Phonetics 22: 474–492.
Gao, Jing. 2004. The changing sandhi rules in Tianjin dialect. In Phonetic and phonological studies on Tianjin dialect, ed. Lu Jilun, 193–247. Beijing: Beijing Institute of Technology Press.
Goldwater, Sharon, and Mark Johnson. 2003. Learning OT constraint ranking using a maximum entropy model. In Proceedings of the Stockholm workshop on variation within optimality theory, ed. Jennifer Spenader, Anders Eriksson, and Osten Dahl, 111–120. Stockholm: Stockholm University.
Hayes, Bruce, and Zsuzsa C. Londe. 2006. Stochastic phonological knowledge: The case of Hungarian vowel harmony. Phonology 23: 59–104.
Hayes, Bruce, and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379–440.
Hayes, Bruce, Colin Wilson, and Benjamin George. 2009a. Maxent grammar tool. Java program. http://www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool/. Accessed 22 May 2009.
Hayes, Bruce, Kie Zuraw, Péter Siptár, and Zsuzsa Londe. 2009b. Natural and unnatural constraints in Hungarian vowel harmony. Language 85: 822–863.
Herd, Wendy, Allard Jongman, and Joan Sereno. 2010. An acoustic and perceptual analysis of /t/ and /d/ flaps in American English. Journal of Phonetics 38: 504–516.
Hsieh, Hsin-I. 1970. The psychological reality of tone sandhi rules in Taiwanese. In Papers from the 6th meeting of the Chicago Linguistic Society, ed. M.A. Campbell, 489–503. Chicago: Chicago Linguistic Society.
Hsieh, Hsin-I. 1975. How generative is phonology. In The transformational-generative paradigm and modern linguistic theory, ed. E.F. Koerner, 109–144. Amsterdam: John Benjamins.
Hsieh, Hsin-I. 1976. On the unreality of some phonological rules. Lingua 38: 1–19.
Hyman, Larry. 2007. Universals of tone rules: 30 years later. In Tones and tunes vol 1: Typological studies in word and sentence prosody, ed. Tomas Riad, and Carlos Gussenhoven, 1–34. Berlin: Mouton de Gruyter.
Jäger, Gerhard. 2007. Maximum Entropy models and stochastic optimality theory. In Architectures, rules and preferences: Variation on themes by Joan W. Bresnan, ed. Annie Zaenen, Jane Simpson, Tracy H. King, Jane Grimshaw, Joan Maling, and Chris Manning, 467–479. Stanford: CSLI Publications.
Jassem, Wiktor, and Lutoslawa Richter. 1989. Neutralization of voicing in Polish obstruents. Journal of Phonetics 17: 317–325.
Jiang, Hui. 1994. The phonetic description of neutral tone in Tianjin dialect. MA thesis, Tianjin Normal University, Tianjin.
Kenstowicz, Michael, and Charles Kisseberth. 1979. Generative phonology: Description and theory. San Diego: Academic.
Kiparsky, Paul. 1973. Abstractness, opacity, and global rules. In Three dimensions of linguistic theory, ed. Osamu Fujimura, 57–86. Tokyo: TEC Company Ltd.
Kirchner, Robert. 1996. Synchronic chain shifts in optimality theory. Linguistic Inquiry 27: 341–350.
Leben, William. 1973. Suprasegmental phonology. PhD dissertation, MIT.
Li, Xing-Jian, and Si-Xun Liu. 1985. Tianjin fangyan de liandu biandiao [Tone sandhi in the Tianjin dialect]. Zhongguo Yuwen [Studies of the Chinese Language] 1985(1): 76–80.
Liang, Yuzhang, and Aizhen Feng. 1996. Fuzhouhua yindang [The sound system of Fuzhou dialect]. Shanghai: Shanghai Education.
Lin, Huishan. 2008. Variable directional applications in Tianjin tone sandhi. Journal of East Asian Linguistics 17: 181–226.
Liu, Yu-Zhen, and Jiang Gao. 2003. Qu-Qu liandu biandiao guize: shehui yuyanxue bianxiang [FF sandhi rule in Tianjin dialect: A sociolinguistic variable]. Tianjin Shifan Daxue Xuebao—Shehui Kexue Ban [Journal of Tianjin Normal University—Social Sciences] 2003(5): 65–69.
Lu, Ji-Lun. 1997. Tianjin fangyan zhong de yizhong xin de liandu biandiao [A new tone sandhi rule in Tianjin dialect]. Tianjin Shida Xuebao [Journal of Tianjin Normal University] 1997(4): 67–72.



Lu, Ji-Lun. 2004. A new phenomenon in Tianjin tone sandhi. In Phonetic and phonological studies on Tianjin dialect: Festschrift for Professor Wang Jialing’s 70th birthday, ed. Lu Ji-Lun, 89–137. Beijing: Beijing Institute of Technology Press.
Ma, Qiuwu, and Yuan Jia. 2006. Tianjinhua shangsheng de liangtiao “biandiao guize” bianxi [Two new third tone sandhi rules in Tianjin dialect—a critical reanalysis]. Tianjin Shifan Daxue Xuebao—Shehui Kexue Ban [Journal of Tianjin Normal University—Social Science] 2006(1): 53–58.
Maddieson, Ian. 1978. Universals of tone. In Universals of human language, vol. 2: Phonology, ed. Joseph H. Greenberg, 335–366. Stanford: Stanford University Press.
Moreton, Elliott. 2004. Non-computable functions in optimality theory. In Optimality theory in phonology, ed. John McCarthy, 141–164. Malden: Blackwell.
Moreton, Elliott. 2008. Analytical bias and phonological typology. Phonology 25: 83–127.
Peng, Shu-Hui. 1997. Production and perception of Taiwanese tones in different tonal and prosodic contexts. Journal of Phonetics 25: 371–400.
Peng, Shu-Hui. 2000. Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In Language acquisition and the lexicon: Papers in laboratory phonology 5, ed. Michael B. Broe, and Janet B. Pierrehumbert, 152–167. Cambridge: Cambridge University Press.
Pierrehumbert, Janet B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. In Frequency and the emergence of linguistic structure, ed. Joan Bybee, and Paul Hopper, 137–157. Amsterdam: John Benjamins.
Pierrehumbert, Janet B. 2002. Word-specific phonetics. In Laboratory phonology 7, ed. Carlos Gussenhoven, and Natasha Warner, 101–139. Berlin: Mouton de Gruyter.
Pierrehumbert, Janet B. 2006. The statistical basis of an unnatural alternation. In Laboratory phonology 8. Varieties of phonological competence, ed. Louis Goldstein, Douglas H. Whalen, and Catherine Best, 81–107. Berlin: Mouton de Gruyter.
Port, Robert, and Penny Crawford. 1989. Incomplete neutralization and pragmatics in German. Journal of Phonetics 17: 257–282.
Prince, Alan, and Paul Smolensky. 1993. Optimality theory: Constraint interactions in generative grammar. New Brunswick: Rutgers Center for Cognitive Science, Rutgers University. (Reprinted in 2004 by MIT Press, Cambridge, MA.)
Rietveld, Toni, and Aoju Chen. 2006. How to obtain and process perceptual judgements of intonational meaning. In Methods in empirical prosody research, ed. Stefan Sudhoff, Denisa Lenortová, Roland Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek, Nicole Richter, and Johannes Schießer, 283–319. Berlin: Walter de Gruyter.
Rose, Phil. 1987. Considerations in the normalization of the fundamental frequency in linguistic tone. Speech Communication 6: 343–351.
Shi, Feng. 1986. Tianjin fangyan shuangzizu shengdiao fenxi [An analysis of disyllabic tones in Tianjin dialect]. Yuyan Yanjiu [Linguistic Research] 1986(1): 77–90.
Shi, Feng. 1988. Shilun Tianjinhua de shengdiao jiqi bianhua—xiandai yuyinxue biji [On tones and their recent changes in Tianjin dialect—modern phonetics notes]. Zhongguo Yuwen [Studies of the Chinese Language] 1988(5): 351–360.
Shi, Feng. 1990. Hanyu he Dong-Tai yu de shengdiao geju [Tone systems in Chinese and Kam-Tai languages]. PhD dissertation, Nankai University, Tianjin.
Shi, Feng, and Ping Wang. 2004. Tianjinhua shengdiao de xin bianhua [New changes in Tianjin tones]. In The joy of research: A festschrift in honor of Professor William S.-Y. Wang on his seventieth birthday, ed. Feng Shi, and Zhongwei Shen, 176–188. Tianjin: Nankai University Press.
Wang, Samuel H. 1993. Taiyu biandiao de xinli texing [On the psychological status of Taiwanese tone sandhi]. Tsinghua Xuebao [Tsinghua Journal of Chinese Studies] 23: 175–192.
Wang, Jia-Ling. 2002. Youxuanlun he Tianjinhua de liandu biandiao ji qingsheng [Optimality Theory and tone sandhi and neutral tone in Tianjin dialect]. Zhongguo Yuwen [Studies of the Chinese Language] 2002(4): 363–371.
Warner, Natasha, Allard Jongman, Joan Sereno, and Rachèl Kemper. 2004. Incomplete neutralization of sub-phonemic durational differences in production and perception of Dutch. Journal of Phonetics 32: 251–276.
Wee, Lian-Hee. 2004. Inter-tier correspondence theory. PhD dissertation, Rutgers University, New Brunswick, NJ.
Wilson, Colin. 2006. Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science 30(5): 945–982.
Xu, Yi. 1997. Contextual tonal variations in Mandarin. Journal of Phonetics 25: 61–83.


Xu, Yi. 2005. TimeNormalizedF0. Praat script. http://www.phon.ucl.ac.uk/home/yi/tools.html. Accessed 1 Dec 2005.
Yang, Zi-Xiang, He-Tong Guo, and Xiang-Dong Shi. 1999. Tianjinhua Yindang [The sound system of Tianjin dialect]. Shanghai: Shanghai Education Press.
Yu, Alan C.L. 2007. Understanding near mergers: The case of morphological tone in Cantonese. Phonology 24: 187–214.
Yue-Hashimoto, Anne O. 1987. Tone sandhi across Chinese dialects. In Wang Li memorial volumes, English volume, ed. Chinese Language Society of Hong Kong, 445–474. Hong Kong: Joint Publishing Co.
Zhang, Jie. 2002. The effects of duration and sonority on contour tone distribution: A typological survey and formal analysis. New York: Routledge.
Zhang, Jie. 2007. A directional asymmetry in Chinese tone sandhi systems. Journal of East Asian Linguistics 16: 259–302.
Zhang, Jie. 2010. Issues in the analysis of Chinese tone. Language and Linguistics Compass 4(12): 1137–1153.
Zhang, Jie. 2014a. Tones, tonal phonology, and tone sandhi. In The handbook of Chinese linguistics, ed. C.-T. James Huang, Y.-H. Audrey Li, and Andrew Simpson, 443–464. Oxford: Wiley-Blackwell.
Zhang, Jie. 2014b. Tone sandhi. In Oxford bibliographies in linguistics, ed. Mark Aronoff. New York: Oxford University Press. http://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0160.xml. Accessed 15 July 2014.
Zhang, Jie, and Yuwen Lai. 2008. Phonological knowledge beyond the lexicon in Taiwanese double reduplication. In Interfaces in Chinese phonology: Festschrift in honor of Matthew Y. Chen on his 70th birthday, ed. Yuchau E. Hsiao, Hui-Chuan Hsu, Lian-Hee Wee, and Dah-An Ho, 183–222. Taipei: Academia Sinica.
Zhang, Jie, and Yuwen Lai. 2010. Testing the role of phonetic knowledge in Mandarin tone sandhi. Phonology 27(1): 153–201.
Zhang, Jie, and Jiang Liu. 2011. Tone sandhi and tonal coarticulation in Tianjin Chinese. Phonetica 68(3): 161–191.
Zhang, Jie, and Yuanliang Meng. 2012. Structure-dependent tone sandhi in real and nonce words in Shanghai Wu. In Proceedings of the 3rd international symposium on tonal aspects of languages, ed. Gu Wentao. Nanjing: Nanjing Normal University.
Zhang, Jie, Yuwen Lai, and Craig Sailor. 2009. Opacity, phonetics, and frequency in Taiwanese tone sandhi. In Current issues in unity and diversity of languages: Collection of papers selected from the 18th International Congress of Linguists, ed. Manghyu Pak, 3019–3038. Seoul: Linguistic Society of Korea.
Zhang, Jie, Yuwen Lai, and Craig Sailor. 2011. Modeling Taiwanese speakers’ knowledge of tone sandhi in reduplication. Lingua 121(2): 181–206.
Zhao, Yuan, and Dan Jurafsky. 2009. The effect of lexical frequency and Lombard reflex on tone hyperarticulation. Journal of Phonetics 37: 231–247.
Zhu, Xiaonong. 2004. Jipin guiyihua — ruhe chuli shengdiao de suiji chayi? [F0 normalization — How to deal with between-speaker tonal variations?]. Yuyan Kexue [Linguistic Sciences] 3(2): 3–19.
Zuraw, Kie. 2000. Patterned exceptions in phonology. PhD dissertation, University of California, Los Angeles.
Zuraw, Kie. 2007. The role of phonetic knowledge in phonological patterning: Corpus and survey evidence from Tagalog infixation. Language 83: 277–316.
Zuraw, Kie. 2010. A model of lexical variation and the grammar with application to Tagalog nasal substitution. Natural Language and Linguistic Theory 28: 417–472.
