24. Prosodic Systems: China and Siberia

Jie Zhang, San Duanmu, and Yiya Chen

Abstract

This chapter provides a summary of the prosodic systems of varieties of Chinese spoken in China as well as languages in Siberia, in particular Ket. What the Chinese languages and Ket share is their tonal nature. Three unique aspects of the prosody of these languages are highlighted. First, the typologically complex patterns of tonal alternation known as ‘tone sandhi’ are surveyed and a summary is given of current experimental findings on the productivity of these patterns. Second, the patterns of lexical and phrasal stress and their interaction with tone are discussed, with a focus on the similar metrical principles that underlie tone languages and other languages. Third, the interaction between lexical tone and utterance-level intonation, particularly pitch variations for focus and interrogativity, is reviewed. These issues are first discussed in the context of Chinese languages, and then echoed in a brief summary of Ket prosody.

Keywords: tone, tone sandhi, stress, intonation

24.1. Introduction

This chapter provides a summary of the prosodic systems of languages in Northern Asia, including varieties of Chinese spoken in China and Taiwan as well as languages in Siberia, in particular Ket. A common theme in the prosody of these languages is their ability to use pitch to cue lexical meaning differences; i.e., they are tone languages. The well-known quadruplet ma55/ma35/ma213/ma51 ‘mother/hemp/horse/to scold’1 in Standard Chinese is an exemplification of the tonal nature of the languages in this area.

We start with a brief discussion of the typology of syllable and tonal inventories in Chinese languages (§24.2). These typological properties lead to three unique aspects of prosody in these languages: the prevalence of complex tonal alternations, also known as ‘tone sandhi’ (§24.3), the interaction between tone and word and phrase-level stress (§24.4), and the interaction between tone and intonation (§24.5). The prosodic properties of Ket are discussed briefly in §24.6. The last section provides a summary (§24.7).

24.2. The syllable and tone inventories of Chinese languages

The maximal syllable structure of Chinese languages is CGVV or CGVC (G=glide, VV=long vowel or diphthong) (Duanmu 2008: 72). The syllabic position of the prenuclear glide is controversial, and it has been analyzed as part of the onset (Duanmu 2007, 2008, 2017), part of the rime (Wang and Chang 2001), occupying a position of its own (Weijer and Zhang 2008), or variably belonging to the onset or the rime depending on the language, the phonotactic constraints within a language, and the speaker (Bao 1990, Wan 2002, Yip 2003). Yip (2003) specifically used the ambiguous status of the prenuclear glide as an argument against the subsyllabic onset-rime constituency. The coda inventory is reduced to different degrees, from Northern dialects in which only nasals and occasionally [ɻ] are legal to southern dialects (e.g., Wu, Min, Yue, Hakka) where stops [p, t, k, ʔ] may also appear in addition to the nasals. Syllables closed by a stop are often referred to as checked syllables (ru sheng) in Chinese phonology, and they are considerably shorter than non-checked (open or sonorant-closed) syllables.

1 Tones are transcribed in Chao numbers (Chao 1948, 1968), where ‘5’ and ‘1’ indicate the highest and lowest pitches in the speaker’s pitch range, respectively. Juxtaposed numbers represent contour tones; e.g., ‘51’ indicates a falling tone from the highest pitch to the lowest pitch.

There are typically three to six contrastive tones on non-checked syllables in a Chinese dialect. On checked syllables, the tonal inventory is reduced — one or two tones are common, and three tones are occasionally attested. Table 1 illustrates the tonal inventories on non-checked and checked syllables in Shanghai (Wu), Fuzhou (Min), and Cantonese (Yue).

Table 1. Tonal inventories in three dialects of Chinese

    Dialect                               Non-checked syllables     Checked syllables
    Cantonese (Matthews and Yip 1994)     55, 33, 22, 35, 21, 23    5, 3, 2
    Shanghai (Zhu 2006)                   52, 34, 14                4, 24
    Fuzhou (Liang and Feng 1996)          44, 53, 32, 212, 242      5, 23

24.3. Tone sandhi in Chinese languages

A prominent aspect of the prosody of Chinese languages is that they often have a complex system of ‘tone sandhi,’ whereby tones alternate depending on the adjacent tones or the prosodic/morphosyntactic environment in which they appear (Chen 2000, Zhang 2014). Two examples of tone sandhi from Standard Chinese and Xiamen (Min) are given in (1). In Standard Chinese, the third tone 213 becomes 35 before another third tone;2 in Xiamen, tones undergo regular changes whenever they appear in nonfinal positions of a syntactically defined tone sandhi domain (Chen 1987, Lin 1994).

(1) Tone sandhi examples:
    a. Tonally induced tone sandhi in Standard Chinese:
       213 → 35 / ___ 213
    b. Positionally induced tone sandhi on non-checked syllables in Xiamen, in nonfinal positions of a tone sandhi domain:
       53 → 44 → 22 → 21 → 53
       24 → 22
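The two patterns in (1) can be sketched as simple mappings over strings of Chao numbers. The following Python fragment is only an illustration under stated assumptions: the function names are hypothetical, the third tone is written 213 as in the quadruplet in the introduction, a sandhi domain is assumed to be given as a list of base tones, and the Standard Chinese rule is applied pairwise without regard to the prosodic structure that governs longer third-tone strings.

```python
# Illustrative sketch only; names and data layout are assumptions, not part of the source.

# Xiamen nonfinal mapping on non-checked syllables (the circular chain plus 24 -> 22).
XIAMEN_NONFINAL = {"53": "44", "44": "22", "22": "21", "21": "53", "24": "22"}


def xiamen_sandhi(domain):
    """Change every tone except the last one in a tone sandhi domain."""
    if not domain:
        return []
    nonfinal = [XIAMEN_NONFINAL.get(tone, tone) for tone in domain[:-1]]
    return nonfinal + [domain[-1]]  # the final syllable keeps its base tone


def third_tone_sandhi(tones, third="213", sandhi="35"):
    """Change a third tone to 35 when the following base tone is also a third tone."""
    result = []
    for i, tone in enumerate(tones):
        next_is_third = i + 1 < len(tones) and tones[i + 1] == third
        result.append(sandhi if tone == third and next_is_third else tone)
    return result


print(xiamen_sandhi(["53", "24", "44"]))
print(third_tone_sandhi(["213", "213"]))
```

Both functions leave the final syllable untouched, which is the sense in which these patterns are right-dominant.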

2 This is a vast simplification. While in identification tasks, T2 is indistinguishable from the sandhi tone for T3 (e.g., Wang and Li 1967, Peng 2000), recent phonetic, psycholinguistic, and neurolinguistic evidence indicates the sandhi tone for T3 is neither acoustically identical to T2 (e.g., Peng 2000; Yuan and Chen 2014), nor is it processed the same way as T2 in on-line spoken word processing (e.g., Li and Chen 2015, Nixon et al. 2015).

Tone sandhi patterns can be generally classified as ‘left-dominant’ or ‘right-dominant.’ Right-dominant sandhi, found in most Southern Wu, Min, and Northern dialects, preserves the base tone on the final syllable in a sandhi domain and changes the tones on nonfinal syllables; left-dominant sandhi, typified by Northern Wu dialects, preserves the tone on the initial syllable (Yue-Hashimoto 1987, Chen 2000, Zhang 2007, 2014). It has been argued that there is an asymmetry in how the sandhi behaves based on directionality, in that right-dominant sandhi tends to involve local or paradigmatic tone change, while left-dominant sandhi tends to involve the extension of the initial tone rightward (Yue-Hashimoto 1987, Duanmu 1993, Zhang 2007). We have seen in (1) that both the tone sandhi patterns in Standard Chinese and Xiamen are right-dominant and involve local paradigmatic tone change. In the left-dominant Shanghai tone sandhi pattern in (2), however, the tone on the first syllable is spread across the disyllabic word, neutralizing the tone on the second syllable (Zhu 2006).

(2) Shanghai tone sandhi for non-checked tones:
    52-X → 55-31
    34-X → 33-44
    14-X → 11-14
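The left-dominant pattern in (2) can be sketched as a lookup keyed only on the first syllable's tone. This is a minimal Python illustration; the function name and the tuple representation of disyllabic words are assumptions made for the example.

```python
# Illustrative sketch only; the function name and data layout are hypothetical.

# Surface tones of a disyllabic word, determined by the first syllable's base tone.
SHANGHAI_DISYLLABIC = {
    "52": ("55", "31"),
    "34": ("33", "44"),
    "14": ("11", "14"),
}


def shanghai_sandhi(first_tone, second_tone):
    """Return the surface tones; the second syllable's base tone plays no role."""
    return SHANGHAI_DISYLLABIC[first_tone]


print(shanghai_sandhi("52", "34"))
print(shanghai_sandhi("52", "14"))
```

Because the second argument is never consulted, words that differ only in the tone of the second syllable receive the same surface pattern, which is the neutralization described in the text.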

Zhang (2007) argued that the typological asymmetry is due to two phonetic effects. One is that the prominent positions in the two types of dialects have different phonetic properties: the final position in right-dominant systems has longer duration and can maintain the contrastive tonal contour locally; the initial position in left-dominant systems has shorter duration and therefore needs to allocate the tonal contour over a longer stretch in the sandhi domain. The other is the directionality effect of tonal coarticulation, which tends to be perseverative and assimilatory; the phonologization of this type of coarticulatory effect could then potentially lead to a directional asymmetry in tone sandhi. Duanmu (1993, 1994, 1999, 2007), on the other hand, argued that the difference stems from the syllable structure, and hence, stress pattern difference between the two types of languages, as discussed in §24.4. Despite these typological tendencies, phonetically arbitrary tone sandhi patterns abound in Chinese dialects. For instance, the circular chain shift in the Xiamen pattern (1b) has no phonotactic, and hence, phonetic motivation, as the base tone itself is not phonotactically illegal in the sandhi position. Left-dominant sandhi, likewise, often has phonetic changes that cannot be predicted by a straightforward tone mapping mechanism. In Wuxi (Wu), for example, the tone on the initial syllable of a word needs to be first replaced with another tone before it spreads rightward (Chan and Ren 1989), and Yan and Zhang (2016) argued that the tone substitution involves a circular chain shift, as in (3).

(3) Wuxi tone sandhi for non-checked tones with voiceless initials:
    53-X → 43-34
    323-X → 33-44
    34-X → 55-31
    Tone substitution on the initial syllable: Falling → Dipping → Rising → Falling
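The two-step analysis of (3), substitution followed by rightward spreading, can be sketched in Python as follows. The names are hypothetical, and the spreading templates are an assumption read off the surface forms in (3): each template is taken to be the disyllabic realization of the substituted tone category.

```python
# Illustrative sketch only; names and spreading templates are assumptions.

BASE_CATEGORY = {"53": "falling", "323": "dipping", "34": "rising"}

# Step 1: circular substitution on the initial syllable.
SUBSTITUTION = {"falling": "dipping", "dipping": "rising", "rising": "falling"}

# Step 2: the substituted tone is spread over the disyllabic word.
SPREAD = {
    "dipping": ("43", "34"),
    "rising": ("33", "44"),
    "falling": ("55", "31"),
}


def wuxi_sandhi(first_tone):
    """Substitute the initial tone, then spread the substitute over two syllables."""
    substituted = SUBSTITUTION[BASE_CATEGORY[first_tone]]
    return SPREAD[substituted]


for base in ("53", "323", "34"):
    print(base, "->", "-".join(wuxi_sandhi(base)))
```

Separating the two steps mirrors the productivity contrast reported for Wuxi, where the spreading step and the substitution step behave differently.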

The phonetic arbitrariness and complexity of the synchronic tone sandhi patterns raise the question whether all these patterns are equally productive and learnable for speakers. This question has been investigated using ‘wug’ tests in a long series of work since the 1970s. For instance, Hsieh (1970, 1975, 1976), Wang (1993), and Zhang et al. (2011) have shown that the circular chain shift in Taiwanese Southern Min (a very similar pattern to Xiamen in (1b)) is not fully productive. Zhang and Meng (2016) and Yan and Zhang (2016) provided a comparison between Shanghai and Wuxi tone sandhi and showed that the Shanghai pattern is generally productive; for Wuxi, the spreading aspect of the tone sandhi is likewise productive, but the substitution aspect of the sandhi is unproductive due to its circular chain shift nature. The relevance of phonetic naturalness to tone sandhi productivity has also been investigated in non-chain-shift patterns. Zhang and Lai (2010), for instance, tested the productivity difference between the phonetically less natural third-tone sandhi and the more natural half-third sandhi in Standard Chinese and showed that, although both apply consistently to novel words, the former involves incomplete application of the sandhi phonetically and is thus less productive. In general, the productivity studies of tone sandhi demonstrate that, to understand how native speakers internalize the complex sandhi patterns in their language, we need to look beyond the sandhi patterns manifested in the lexicon and consider ways that more directly tap into the speakers’ tacit generalizations. In our current understanding, the synchronic grammar of tone sandhi likely includes both productive derivations from the base tone to the sandhi tone and allomorph listings of sandhi tones depending on the nature of the sandhi.

24.4. Lexical and phrasal stress in Chinese languages

We begin with word stress. All monosyllabic content words occur in heavy syllables, are long and stressed, and have a lexical tone, such as lian3 ‘face’.3 Function words can have stress and carry a lexical tone, too, but they often occur in light syllables, are short and unstressed, and have no lexical tone, such as the aspect marker le5. The pattern is captured by the generalizations in (4) and (5).

(4) Metrical structure in monosyllables (Hayes 1995):
    A heavy syllable has two moras, forms a moraic foot, and is stressed.
    A light syllable has one mora, cannot form a foot, and has no stress.

(5) The Tone-Stress Principle (Liberman 1975, Goldsmith 1981, Duanmu 2007): A stressed syllable can be assigned a lexical tone. An unstressed syllable is not assigned a lexical tone.

In two-syllable words or compounds, stress patterns are more complicated. Three degrees of stress can be distinguished, represented in (6) and (7) as S (strong), M (medium), and L (light or unstressed). Tones are omitted, since they differ from dialect to dialect.

3 Unless otherwise noted, examples in section 24.4 are taken from Standard Chinese. Tones are indicated by digits 1-5, representing tone 1, tone 2, tone 3, tone 4, and lack of tone, respectively.

(6) Stress patterns in final positions:
    Variety     Stress type    Example in Pinyin    Gloss
    Beijing     MS (67%)       da-xue               ‘university’
                SM (17%)       bao-dao              ‘report’
                SL (14%)       ma-ma                ‘mom’
    Chengdu     SM             da-xue               ‘university’
                SL             ma-ma                ‘mom’
    Shanghai    SL             da-xue               ‘university’

(7) Stress patterns in non-final positions:
    Variety     Stress type    Example in Pinyin    Gloss
    Beijing     SM             da-xue               ‘university’
                SL             ma-ma                ‘mom’
    Chengdu     SM             da-xue               ‘university’
                SL             ma-ma                ‘mom’
    Shanghai    SL             da-xue               ‘university’

Stress in Beijing is based on Yin (1982). Stress in Chengdu has a robust phonetic realization in syllable duration (Ran 2011). Stress in Shanghai is realized in both syllable duration (Zhu 1995) and tone sandhi (Xu et al. 1988). In Chengdu and Shanghai, the stress patterns remain the same whether the position is final or non-final. In Beijing, however, MS is found in final position only, and it changes to SM in non-final positions. For example, da-XUE ‘university’ is MS when final but SM when non-final, as in DA-xue jiao-SHI ‘university teacher’ (uppercase indicates S). The patterns raise three questions, given in (8).

(8) Three questions to explain:
    a. Out of nine possible combinations (SS, SM, SL, MS, MM, ML, LS, LM, and LL), why are only SM and SL found?
    b. Why is MS found in final position only?
    c. How do we account for dialectal differences?

(8a) is explained by (9), which allows (SM) and (SL) but not *MM, *ML, *LM, *LL (no main stress), or *SS (two main stresses).

(9) Constraint on stress patterns in non-final positions:
    Chinese has syllabic trochees.

(8b) is explained by (10), where 0 is an empty beat, which is realized as either a pause or lengthening of the preceding syllable.

(10) Stress Shift:
    (SM) → M(S0) / __ #

(8c) is related to the complexity of syllable rimes. As shown in (11), Beijing has the most complex rimes and Shanghai the simplest.

(11) Rime complexity:
    Variety     Diphthongs    [-n -ŋ] contrast
    Beijing     Yes           Yes
    Chengdu     Yes           No
    Shanghai    No            No

Rime complexity can explain differences in stress patterns: (12a) explains why stress shift occurs in Beijing but not in Chengdu or Shanghai. (12b) explains why Shanghai has S and L but no M.

(12) Stress and rime complexity:
    a. Stress Shift occurs in languages that have both diphthongs and contrastive codas.
    b. Languages without diphthongs or contrastive codas have no inherent heavy syllables.
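The way (10), (11), and (12a) fit together can be sketched in Python. This is a deliberately simplified illustration: the function names and the dictionary layout are hypothetical, and the sketch treats Stress Shift as exceptionless, so it does not model the minority of Beijing words listed as SM in final position in (6).

```python
# Illustrative sketch only; names are hypothetical and lexical exceptions are ignored.

# Rime complexity by variety, as in (11).
RIME_COMPLEXITY = {
    "Beijing": {"diphthongs": True, "coda_contrast": True},
    "Chengdu": {"diphthongs": True, "coda_contrast": False},
    "Shanghai": {"diphthongs": False, "coda_contrast": False},
}


def has_stress_shift(variety):
    """(12a): Stress Shift requires both diphthongs and contrastive codas."""
    rimes = RIME_COMPLEXITY[variety]
    return rimes["diphthongs"] and rimes["coda_contrast"]


def surface_pattern(variety, pattern, final):
    """(10): (SM) becomes M(S0), written here as MS, in final position only."""
    if pattern == "SM" and final and has_stress_shift(variety):
        return "MS"
    return pattern


print(surface_pattern("Beijing", "SM", final=True))
print(surface_pattern("Beijing", "SM", final=False))
print(surface_pattern("Chengdu", "SM", final=True))
```

Under these assumptions only Beijing shows a difference between final and non-final positions, in line with the comparison of (6) and (7).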

There is a common view that a language can only choose one foot type (Hayes 1995), which seems to contradict our assumption that Chinese has both moraic trochees and syllabic trochees. However, a standard assumption in metrical phonology is that multiple tiers of metrical constituents are needed, such as in the analysis of main and secondary word stress in English. The foot type of a language is simply the foot type at the lowest level of metrical structure. Our analysis suggests that the lowest metrical tier in Chinese is the moraic foot. Monomorphemic words longer than two syllables are mostly foreign names, in which binary feet are built from left to right. Some examples in Shanghai are shown in (13), transcribed in Pinyin, where uppercase indicates stress.

(13) Stress in polysyllabic foreign names
    ZI-jia-ge             ‘Chicago’
    DE-ke-SA-si           ‘Texas’
    JIA-li-FO-ni-ya       ‘California’
    JE-ke-SI-luo-FA-ke    ‘Czechoslovakia’

Let us now consider phrasal stress. Chomsky and Halle (1968) proposed two cyclic rules for English, shown in (14).

(14) Phrasal stress (Chomsky and Halle 1968):
    Nuclear Stress Rule (NSR): In a phrase [A B], assign stress to B.
    Compound Stress Rule (CSR): In a compound [A B], assign stress to B if it is branching, otherwise assign stress to A.

The rules have been reinterpreted as a single rule and extended to other languages (Gussenhoven 1983a, b, Duanmu 1990, 2007, Cinque 1993, Truckenbrodt 1995, Zubizarreta 1998). Let us assume the version in (15).

(15) Stress-XP (Truckenbrodt 1995):
    In a syntactic unit [X XP] or [XP X], XP is assigned phrasal stress.
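A minimal Python sketch of (15) is given here for concreteness. It assumes that the syntactic labels X and XP are supplied by hand, that a bracketed unit is a nested list, and that a word is a (label, word) pair; the function name and this encoding are hypothetical. The two English compounds are bracketed as [[XP X] X] and [[XP X][XP X]].

```python
# Illustrative sketch only; the tree encoding and function name are assumptions.

def stress_xp(node):
    """Collect, in one noncyclic pass, every word labelled XP."""
    if isinstance(node, tuple):  # a leaf: (label, word)
        label, word = node
        return [word] if label == "XP" else []
    stressed = []
    for child in node:  # a bracketed unit: list of nodes
        stressed.extend(stress_xp(child))
    return stressed


# [[XP X] X]
whale_oil_lamp = [[("XP", "whale"), ("X", "oil")], ("X", "lamp")]
# [[XP X][XP X]]
law_school_language_exam = [
    [("XP", "law"), ("X", "school")],
    [("XP", "language"), ("X", "exam")],
]

print(stress_xp(whale_oil_lamp))
print(stress_xp(law_school_language_exam))
```

The function returns a flat list with no ranking among the stressed words, which reflects the point that Stress-XP yields fewer stress levels than a cyclic rule.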

Stress-XP can be noncyclic. A comparison of CSR and Stress-XP is shown in (16), with English compounds.

(16) A comparison of cyclic CSR and noncyclic Stress-XP
                       whale-oil lamp     law-school language-exam
    CSR    Cycle 2     x                             x
           Cycle 1     x                  x          x
                       [[XP X] X]         [[XP X][XP X]]
    Stress-XP          x                  x          x
                       [[XP X] X]         [[XP X][XP X]]

On cycle 1, CSR assigns stress to the left in whale-oil, law-school, and language-exam. On cycle 2, CSR assigns stress to whale-oil (because lamp is not branching) and language-exam (because it is branching). In contrast, Stress-XP assigns stress to each XP in one step. There are three differences between the analyses. First, as Gussenhoven (1983a, b) notes, Stress-XP achieves the result in one step, while CSR cannot. Second, CSR produces many stress levels, while Stress-XP produces far fewer, in support of Gussenhoven (1991). Third, in law-school language-exam, CSR assigns more stress to language, while Stress-XP assigns equal stress to law and language. In what follows, we shall consider Stress-XP only, since it is a simpler theory and can account for all Chinese data.

Now, consider a compound and a phrase in Shanghai Chinese (Xu et al. 1988, Duanmu 1999), shown in (17). The foot/weight tier shows foot boundaries and syllable weight (H for heavy, L for light, 0 for an empty beat). On the tone tiers, H means high and L means low.

(17) A compound and a phrase in Shanghai Chinese
                        Compound          Verb phrase
    Stress              x                     x
    Syntax              [XP X]            [X XP]
    Foot/weight         (HL)              H(H0)
    IPA                 tso ve            tso ve
    Underlying-tone     LH-LH             LH-LH
    Surface-tone        L-H               LH-LH
    Gloss               ‘fry-rice         ‘fry rice
                        (fried rice)’     (to fry rice)’

In the compound, ‘fry’ has phrasal stress; ‘rice’ has no stress and loses its underlying tones. The underlying tones of ‘fry’ are then split between the syllables. In the phrase, ‘rice’ has phrasal stress; ‘fry’ does not but remains heavy, because no expression in Chinese starts with a light syllable. The three degrees of weight, L, H, and (H0), are quite clear phonetically (Zhu 1995, Tables [10L] and [10M]).

Next, we consider speech style, shown in (18). Of interest is the fact that in careful speech both expressions form two domains, but in casual speech the compound can reduce to one domain, while the verb phrase stays with two domains (Xu et al. 1988).4

(18) Speech style on a compound and a phrase in Shanghai Chinese
                        Compound                      Verb phrase
    Careful
      Stress            x       x                     x        x
      Syntax            [[XP X][XP X]]                [[XP X][XP X]]
      Foot/weight       (HL)(HL)                      (HL)(HL)
      IPA               nø-ʨĩ da-ɦoʔ                  kʰo-zɑ̃ da-ɦoʔ
      UR-tone           LH-HL LH-LH                   LH-LH LH-LH
      Surface-tone      L-H   L-H                     L-H   L-H
    Casual
      Stress            x                                      x
      Syntax            [XP X]                        [X XP]
      Foot/weight       (HL) LL                       (HL)(HL)
      IPA               nø-ʨĩ da-ɦoʔ                  kʰo-zɑ̃ da-ɦoʔ
      UR-tone           LH-HL LH-LH                   LH-LH LH-LH
      Surface-tone      L-H   0-0                     L-H   L-H
    Gloss               ‘South-capital big-school     ‘exam-enter big-school
                        (Nanjing University)’         (enter university by exam)’

In careful speech, each expression has two XPs; each XP yields a stress and a tonal domain, as expected. In casual speech, we may assume that each disyllabic unit is treated as a single word. As a result, the compound is now [XP X], with just one XP and one stress (on ‘South-capital’). The verb phrase is now [X XP], with phrasal stress on the object still. The verb has no phrasal stress but gets stress by a separate requirement that a Chinese expression cannot start with an unstressed syllable.

Before we end this section, let us consider the role of function words. An example is shown in (19), where [jɪʔ] can be an article ‘a’ or a numeral ‘one’.

4 As a reviewer pointed out, the difference between casual and careful styles may be gradient. Nevertheless, the tonal difference between typical casual and careful styles is quite noticeable, as described by Xu et al. (1988).

(19) Example with [jɪʔ] ‘a/one’ in Shanghai Chinese
                    [jɪʔ] as ‘a’              [jɪʔ] as ‘one’
    Feet/weight     (HL)L(H0)                 H(HL)(H0)
    Feet/words      (ma jɪʔ) po (se)          ma (jɪʔ po) (se)
    Gloss           ‘buy a CL umbrella’       ‘buy one CL umbrella’

When [jɪʔ] means ‘a’, it is not an XP and hence unstressed (the classifier CL [po] is not an XP either). When [jɪʔ] means ‘one’, it is an XP (numeral phrase) and stressed. The verb [ma] gets stress, again because a Chinese expression cannot start with a light syllable. Finally, ‘umbrella’ is an XP (noun phrase) and is always stressed. Selkirk and Shen (1990) used (19) to exemplify a phonology-syntax mismatch. A metrical analysis can explain how the mismatch occurred. The discussion above barely scratches the surface of metrical effects in Chinese, but we hope the reader could see that (i) stress plays a crucial role in Chinese phonology and (ii) the fundamental metrical principles in Chinese are the same as in other languages.

24.5. Intonation in Chinese languages

Intonation in Chinese is present in every utterance and serves diverse linguistic and paralinguistic functions beyond word meanings. It marks the prosodic organization of an utterance (Li and Yang 2009, Li et al. 2011) and helps to organize its information structure (Chen et al. 2016). It also signals the makeup of a discourse (Tseng et al. 2005, Yang and Yang 2012) and regulates turn-taking between interlocutors (Levow 2005). Moreover, speakers employ intonation to perform speech acts (Ho 1977, Chen and He 2005) as well as to express emotional states and attitudes (Liu and Pell 2012, Li 2015).

Both intonation and lexical tone involve the modification of various acoustic aspects of the speech signal, but their primary correlate is fundamental frequency (f0) changes. The multiplexing of the f0 channel raises the intriguing question of how exactly utterance-level intonation and word-level lexical tones interact in Chinese. Early discussions of Chinese intonation include Chao (1933, 1968), Gårding (1987), Shen (1989), Shen (1992), Cao (2002), Lin (2004), and Wang and Shi (2010). To date, most quantitative studies of intonation have focused upon the f0 marking of focus and interrogativity.

Focus

Focus refers to the highlighting of information that speakers intend to bring to the discourse, as opposed to other alternatives. It is an important strategy that languages adopt for efficient speech communication. In answer to the question of who teaches linguistics (20a), 玛丽 in (20b) would be focused (indicated with FOC) and uttered with prominence, implying that among the set of possible teachers, it is MARY who teaches linguistics.

(20) a. 谁教语言学?
        Shui jiao yuyanxue
        who teach linguistics
        ‘Who teaches linguistics?’

     b. 玛丽FOC教语言学
        mali jiao yuyanxue
        Mary teach linguistics
        ‘{MARY}FOC teaches linguistics’

Focal prominence in Chinese is cued via an ensemble of acoustic variations including not only the distinctive realization of lexical tone contours and durational lengthening (Chen 2003, Chen 2006, Chen and Gussenhoven 2008), but also higher intensity (Shih 1988, Chen et al. 2014) and hyperarticulated segmental contrasts (Chen 2008). What has been of great interest is the underlying mechanism that leads to the f0-marking of focus. Gårding et al. (1983) were the first to adopt the notion of a pitch range grid to describe f0 modifications of tones under emphasis in Standard Chinese (SC): an expanded range for focus and a compressed range out of focus. This pattern has been repeatedly observed in subsequent studies in SC (e.g., Jin 1996, Xu 1999, Yuan 2004), which led to the view that focus is encoded via tri-zone pitch range control: expansion under focus, compression after focus, and little or no change before focus (Xu and Xu 2005).

Both within- and cross-dialectal variation in focus-induced f0 adjustments has been reported. Shanghai Chinese (a Wu dialect), for example, has five lexical tones with two rising ones differing mainly in pitch register. Chen (2009) showed that while syllables with the low-register rising tone do show significant f0 range expansion under focus, syllables with the high-register rising tone do not. This is presumably to ensure the distinctness of the two rising tones, because significant f0 range expansion of both would cause the f0 ranges of the two rising tones to overlap, making them less distinguishable. Taiwanese Southern Min is another Chinese variety that lacks consistent focus-induced f0 range expansion across lexical tones. Pan (2007) showed that f0 raising is clearly present in the HH and HL tones but not in the MM and ML tones. Taiwanese thus parallels Shanghai Chinese in that focus-related f0 modification is robust only when it does not sacrifice the distinctive realization of lexical tonal contrasts.
The two studies converge on the importance of lexical tonal properties (such as the role of f0 register for tonal contrasts) for focus-induced f0 range manipulations. In the post-focus position, a range of languages spoken in China has been documented to lack f0 compression, including Cantonese (Gu et al. 2006, Gu and Lee 2007), Taiwanese, and Taiwan Mandarin (Xu et al. 2012), as well as Wa, Deang and Yi (Wang et al. 2011). Chen (2010) reported the lack of compression in certain tonal contexts even in Standard Chinese and argued that post-focus f0 realization is conditioned by lexical tonal properties and the weak implementation of the tonal targets in the post-focus condition. Xu et al. (2012) hypothesized that post-focus compression (PFC) has a single origin which evolved into a typological divide of languages with vs. without PFC. Given that this study and its follow-up research were typically based on speakers’ production of one stimulus sentence (therefore with limited tonal context), larger scale investigations are certainly needed to test this hypothesis further.

Focus f0 expression is also sensitive to higher-level prosodic organizations as evident in Wu dialects. In Wenzhou Wu Chinese, where a disyllabic word serves as the domain for tone sandhi, when only one syllable within the disyllabic sandhi domain is contrastively focused, f0 range expansion is quite uniformly distributed over the entire disyllabic domain (Scholz 2012, Scholz and Chen 2014). Shanghai Wu Chinese shows a similar pattern of sensitivity of focus to tone sandhi domain in addition to other lexical prosodic properties (Chen 2009, Ling and Liang 2017).

The finding of the multiple cues for focal prominence and the within- and cross-dialectal variation in focus-induced f0 range manipulations are compatible with the prominence-marking view of focus expression, explored in Chen (2003, 2010) and Chen and Gussenhoven (2008). (See Chen 2012 and references therein for a cross-linguistic perspective.) This view holds that the phonological reflex of focus is prosodic prominence, whose phonetic expression is contingent upon lexical and prosodic properties of the focused constituent. While focus-induced f0 range manipulation is one of the important means to signal focus, other f0 adjustments (such as delayed f0 rise/fall in the rising/falling tones) are prioritized which, as an ensemble, ensure that lexical tones are produced with enhanced distinctiveness of their characteristic f0 contours. Moreover, while focus-induced f0 variation is largely independent of f0 variation for lexical tones, they do interact, as evident in the dialects where f0 adjustments may be absent or compromised.

Interrogativity

Generally speaking, interrogativity in Chinese is encoded via a global f0 raising with greater magnitude towards the end of the utterance, which is consistent with a cross-linguistic tendency (Cruttenden 1997; cf. Rialland 2007). Chinese dialects differ in the way the rising question intonation is implemented in production and utilized in perception.

Cantonese, in the face of competing f0 cues for lexical tone and question intonation, favors cueing interrogativity with an utterance-final f0 rise at the cost of tonal neutralization and misidentification, as evident in the f0 realizations of the utterance-final lexical tones (T21/T23/T22), which are indistinguishable from the high-rising tone T25 (Ma et al. 2006). The substantial effect that question intonation has on lexical tone contours in Cantonese allows the local f0 rise to serve as a reliable cue for intonation perception (Ma et al. 2006), but also leads to a high error rate for utterance-final lexical tone identification, especially for the low and low-rising tones (Ma et al. 2011), while listeners’ sensitivity to f0 raising over non-final syllables to mark interrogativity is reduced (Xu and Mok 2012).

Standard Chinese, in contrast, opts to cue lexical tones at the cost of potential intonation misidentification (Ho 1977, Shih 1988, Liu and Xu 2005). A falling tone at the end of a question maintains its falling f0 contour (but would not fall as low as in declaratives), while a rising tone at the end of a statement is realized with its characteristic rising f0 contour (but at a relatively lowered f0 level compared to that in questions). Neither the global f0 raising nor the local f0 rise distorts the lexical tone contours.

Thus, while Cantonese shows a more direct mapping between phonetics (final f0 rise) and meaning (interrogativity), in SC, the mapping is more obscure as the phonetic implementation of the so-called question-induced final f0 ‘rise’ varies as a function of the utterance-final lexical tone. This renders the final syllable a less reliable f0 cue-bearer for interrogativity than that in Cantonese. Listeners therefore tune more into the pre-final f0 raising as an additional cue for question perception (Jiang and Chen 2016). They can also quickly and accurately identify different lexical tones with near-ceiling levels of accuracy (Liu et al. 2016a). The recognition rate of intonation, however, is contingent on the identity of the utterance-final lexical tone. Lower identification accuracy and more variance were reported for the rising tone in question intonation (Yuan 2011), while higher accuracy rates and a faster response speed were found for the falling tone in statements (Liu et al. 2016a).

The cross-dialect differences in the interplay between lexical tone and intonation in behavioral studies have also been echoed by event-related potential (ERP) brain response data (Ren et al. 2009, Ren et al. 2012, Kung et al. 2014, Liu et al. 2016b). They conjointly suggest that dialects differ in the extent to which interrogative intonation may be grammaticalized into a local boundary tone due to their different lexical tone systems. More data, from more dialects/languages, with statistical validity, are essential to replicate the findings and to elucidate the range of possible interactions between tone and intonation.

24.6. The prosody of Siberian languages

Central Siberia is a linguistically complex region, with at least five genetically distinct groups of languages present: Samoyedic, Ob-Ugric, Yeniseic, Tungusic, and Turkic (Anderson 2004: 2). The prosody of these languages is understudied, and we focus on the Yeniseic language Ket here, for two reasons. One is that, like Chinese languages, Ket has lexical tones. The other is that there exist relatively detailed descriptions of tone in the language (e.g., Werner 1997, Vajda 2000, 2003, 2004, Georg 2007). The following summary is primarily based on Vajda (2000, 2003).

Monosyllabic words spoken in isolation can have one of four tones in Ket: a high tone, a glottalized tone, a rising/falling tone (the tone rises, then falls), and a falling tone. Tonal pitch contours vary allophonically based on the carrier syllable. On disyllabic or longer words, however, only two tonal melodies appear on the two leftmost syllables of the word: a rising/falling contour (peak on the rising portion) and a rising/high-falling contour (peak on the falling portion; the fall is also less pronounced). These are considered allotones of the monosyllabic rising/falling tone and high tone, respectively. When a monosyllabic root is followed by a syllabic suffix, the tone on the suffix, to some extent, predicts the disyllabic melody of the word. For instance, a rising/high-falling contour is more likely when the suffix has a rising/falling tone, but a rising/falling contour is more likely when the suffix has a high tone (Vajda 2003: 409). But there are many exceptions. In phonological phrases, however, the monosyllabic tones are preserved. From these, Vajda (2000, 2003) concluded that the domain of tone for Ket is the word rather than the syllable. This description of Ket tone is strikingly similar to that of the tonal systems of Northern Wu dialects of Chinese, which have a similar word/phrase distinction and word-level tones. The affinity between Ket and Chinese languages in the nature of tone is recognized by Vajda (2003: 416).

Word and phrase-level stress in Ket has not been described in the literature. Therefore, the question whether its similarity in tone patterns to certain dialects of Chinese entails similar stress effects remains unanswered. The intonation patterns of Ket are also understudied and a comprehensive study remains to be conducted (Vajda 2003: 411).

24.7. Summary

Chinese languages and Siberian languages like Ket are known for being tonal. The scholarship on the prosody of these languages has contributed to our understanding of the typology of prosodic systems in a number of important ways, of which we focused on three in this chapter. First, tonal alternation patterns, generally known as ‘tone sandhi’ in the Chinese context, are typologically diverse and often complex. Experimental studies on the productivity of different types of tonal alternation can shed light on the learnability of phonological alternation and the nature of the synchronic grammar in languages with complex alternations in general. Second, although the tonal aspect of these languages may obscure stress cues, the metrical structure of these tone languages shows striking similarities with that of non-tone languages. Third, the interaction between intonation and tone shows that intonational realization in tone languages is not only influenced by the general tonal nature of the language, but also dependent on the specific tonal contrasts used in the language. It is hoped that future work on the prosody of languages in these regions, particularly on understudied languages and dialects, will continue to explore the ways in which tone interacts with other aspects of the prosodic and grammatical systems.

References:

Anderson, Gregory D. S. (2004). The languages of Central Siberia. In Edward J. Vajda (ed.), Languages and prehistory of Central Siberia, pp. 1-119. Amsterdam: John Benjamins.

Bao, Zhiming (1990). Fanqie languages and reduplication. Linguistic Inquiry 21: 317-350.

Cao, Jianfeng (2002). The relationship between tone and intonation in . 3: 195-202.

Chan, Marjorie K. M. and Hongmo Ren (1989). Wuxi tone sandhi: From last to first syllable dominance. Acta Linguistica Hafniensia 21: 35-64.

Chao, Yuen Ren (1933). Tone and intonation in Chinese. Bulletin of the Institute of History and Philology, Academia Sinica 4.2: 121-134.

Chao, Yuen Ren (1948). Mandarin primer: An intensive course in spoken Chinese. Cambridge, MA: Harvard University Press.

Chao, Yuen Ren (1968). A grammar of spoken Chinese. Berkeley: University of California Press.

Chen, Matthew Y. (2000). Tone sandhi: Patterns across Chinese dialects. Cambridge, UK: Cambridge University Press.

Chen, Ying, Yi Xu, and Susan Guion-Anderson (2014). Prosodic realization of focus in bilingual production of Southern Min and Mandarin. Phonetica 71: 249-270.

Chen, Yiya (2003). The phonetics and phonology of contrastive focus in Standard Chinese. PhD dissertation. Stony Brook: Stony Brook University.

Chen, Yiya (2006). Durational adjustment under corrective focus in Standard Chinese. Journal of Phonetics 34: 176-201.

Chen, Yiya (2008). The acoustic realization of vowels of Shanghai Chinese. Journal of Phonetics 36: 629-648.

Chen, Yiya (2009). Prosody and information structure mapping: Evidence from Shanghai Chinese. Chinese Journal of Phonetics 2: 123-133.

Chen, Yiya (2010). Post-focus f0 compression - Now you see it, now you don’t. Journal of Phonetics 38: 517-525.

Chen, Yiya (2012). Message-related variation. In A. Cohn, C. Fougeron, and M. Huffman (eds.), Oxford Handbook of Laboratory Phonology, pp. 103-115. Oxford: Oxford University Press.

Chen, Yiya, and Carlos Gussenhoven (2008). Emphasis and tonal implementation in Standard Chinese. Journal of Phonetics 36: 724-746.

Chen, Yiya, Pepina Lee, and Haihua Pan (2016). Focus and topic marking in Chinese. In Caroline Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure. Oxford: Oxford University Press.

Chomsky, Noam, and Morris Halle (1968). The sound pattern of English. New York: Harper and Row.

Cinque, Guglielmo (1993). A null theory of phrase and compound stress. Linguistic Inquiry 24.2: 239-297.

Cruttenden, Alan (1997). Intonation. Cambridge, UK: Cambridge University Press.

Duanmu, San (1990). A formal study of syllable, tone, stress and domain in Chinese languages. Doctoral dissertation, MIT, Cambridge, MA.

Duanmu, San (1993). Rime length, stress, and association domains. Journal of East Asian Linguistics 2: 1-44.

Duanmu, San (1994). Syllabic weight and syllabic duration: A correlation between phonology and phonetics. Phonology 11: 1-24.

Duanmu, San (1999). Metrical structure and tone: Evidence from Mandarin and Shanghai. Journal of East Asian Linguistics 8: 1-38.

Duanmu, San (2007). The phonology of Standard Chinese, 2nd edition. Oxford: Oxford University Press.

Duanmu, San (2008). Syllable structure: The limits of variation. Oxford: Oxford University Press.

Duanmu, San (2017). From non-uniqueness to the best solution in phonemic analysis: Evidence from Chengdu Chinese. Lingua Sinica 3.1: 1-23.

Gårding, Eva (1987). Speech act and tonal pattern in Standard Chinese: Constancy and variation. Phonetica 44: 13-29.

Gårding, Eva, Jialu Zhang, and Jan-Olof Svantesson (1983). A generative model for tone and intonation in Standard Chinese based on data from one speaker. Lund Working Papers 25: 53-65.

Georg, Stefan (2007). A descriptive grammar of Ket (Yenisei-Ostyak), Part I: Introduction, phonology, morphology. Kent, UK: Global Oriental.

Goldsmith, John (1981). English as a tone language. In Didier L. Goyvaerts (ed.), Phonology in the 1980’s, pp. 287-308. Ghent, Belgium: E. Story-Scientia.

Gu, Wentao, Keikichi Hirose, and Hiroya Fujisaki (2006). Modeling the effects of emphasis and question on fundamental frequency contours of Cantonese utterances. IEEE Transactions on Audio, Speech and Language Processing 14: 1155-1170.

Gu, Wentao, and Tan Lee (2007). Effects of tonal context and focus on Cantonese f0. Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007). Dudweiler: Pirrot.

Gussenhoven, Carlos (1983a). Focus, mode and the nucleus. Journal of Linguistics 19.2: 377-417.

Gussenhoven, Carlos (1983b). Testing the reality of focus domains. Language and Speech 26: 61-80.

Gussenhoven, Carlos (1991). The English rhythm rule as an accent deletion rule. Phonology 8.1: 1-35.

Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge: Cambridge University Press.

Hayes, Bruce (1995). Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press.

Ho, Aichen T. (1977). Intonation variation in a Mandarin sentence for three expressions: Interrogative, exclamatory, and declarative. Phonetica 34: 446-457.

Hsieh, Hsin-I (1970). The psychological reality of tone sandhi rules in Taiwanese. In M. A. Campbell (ed.), Papers from the 6th Meeting of the Chicago Linguistic Society, pp. 489-503. Chicago: Chicago Linguistic Society.

Hsieh, Hsin-I (1975). How generative is phonology. In E. F. Koerner (ed.), The transformational-generative paradigm and modern linguistic theory, pp. 109-144. Amsterdam: John Benjamins.

Hsieh, Hsin-I (1976). On the unreality of some phonological rules. Lingua 38: 1-19.

Jiang, Ping, and Aishu Chen (2016). Representation of Mandarin intonation: Boundary tone revisited. Proceedings of the 23rd North American Conference on Chinese Linguistics: 97-109.

Jin, Shunde (1996). An acoustic study of sentence stress in Mandarin Chinese. PhD dissertation. Columbus: Ohio State University.

Kung, Carmen, Dorothee Chwilla, and Herbert Schriefers (2014). The interaction of lexical tone, intonation and semantic context in on-line spoken word recognition: An ERP study on Cantonese Chinese. Neuropsychologia 53: 293-309.

Levow, Gina (2005). Turn-taking in Mandarin dialogue: Interactions of tones and intonation. Proceedings of the SIGHAN Workshop: 72-78.

Li, Aijun (2015). Encoding and decoding of emotional speech: A cross-cultural and multimodal study between Chinese and Japanese. Springer.

Li, Xiaoqing and Yiya Chen (2015). Representation and processing of lexical tone and tonal variants: Evidence from the mismatch negativity. PLoS ONE 10: e0143097.

Li, Xiaoqing, Yiya Chen, and Yufang Yang (2011). Immediate integration of different types of prosodic information during on-line spoken language comprehension: An ERP study. Brain Research 1386: 139-152.

Liang, Yuzhang and Aizhen Feng (1996). Fuzhouhua yindang [The sound system of the ]. Shanghai: Shanghai Jiaoyu Chubanshe [Shanghai Educational Press].

Liberman, Mark (1975). The intonational system of English. Doctoral dissertation, MIT, Cambridge, MA.

Lin, Maocai (2004). Chinese intonation and tone. Applied Linguistics 3: 57-67.

Ling, B. and J. Liang (2017). Focus encoding and prosodic structure in Shanghai Chinese. Journal of the Acoustical Society of America 141: EL610.

Liu, Fang, and Yi Xu (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62: 70-87.

Liu, Min, Yiya Chen, and Niels Schiller (2016a). Context effects on tone and intonation processing in Mandarin. Speech Prosody, Boston, USA.

Liu, Min, Yiya Chen, and Niels Schiller (2016b). Online processing of tone and intonation in Mandarin: Evidence from ERPs. Neuropsychologia: 307-317.

Liu, Pan, and Marc Pell (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal stimuli. Behavior Research Methods 44: 1042-1051.

Ma, Joan, Valter Ciocca, and Tara Whitehill (2006). Effect of intonation on Cantonese lexical tones. Journal of the Acoustical Society of America 120: 3978-3987.

Ma, Joan, Valter Ciocca, and Tara Whitehill (2011). The perception of intonation questions and statements in Cantonese. Journal of the Acoustical Society of America 129: 1012-1023.

Matthews, Stephen and Virginia Yip (1994). Cantonese: A comprehensive grammar. London: Routledge.

Mencken, H. L. (1948). American street names. American Speech 23.2: 81-88.

Nixon, Jessie S., Yiya Chen, and Niels Schiller (2015). Speech variants are processed as abstract categories and context-specific instantiations: Evidence from Mandarin lexical tone production. Language, Cognition, and Neuroscience 30: 491-505.

Pan, Ho-Hsien (2007). Focus and Taiwanese unchecked tones. In C. Lee, M. Gordon, and D. Büring (eds.), Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation. Springer.

Peng, Shu-Hui (2000). Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In M. Broe and J. Pierrehumbert (eds.), Papers in laboratory phonology 5: Acquisition and the lexicon, pp. 152-167. Cambridge, UK: Cambridge University Press.

Ran, Qibin (2011). Beijinghua, Sichuanhua qiyi ‘dong-dan + ming-dan’ jiegou de yuyin chayi ji yiyi [Phonetic difference of ambiguous ‘V-monosyllable + N-monosyllable’ phrases in Beijing Dialect and Sichuan Dialect and some theoretical thoughts]. Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 25.4: 235-448.

Ren, Guiqin, Yufang Yang, and Xiaoqing Li (2009). Early cortical processing of linguistic pitch patterns as revealed by the mismatch negativity. Neuroscience 162: 87-95.

Ren, Guiqin, Yiyuan Tang, Xiaoqing Li, and Xue Sui (2013). Pre-attentive processing of Mandarin tone and intonation: Evidence from event-related potentials. In F. Signorelli and D. Chirchiglia (eds.), Functional Brain Mapping and the Endeavor to Understand the Working Brain: Ch. 06.

Rialland, Annie (2007). Question prosody: An African perspective. In C. Gussenhoven and T. Riad (eds.), Tones and Tunes, vol. 1: Typological Studies in Word and Sentence Prosody, pp. 35-62. Mouton de Gruyter.

Scholz, Franziska (2012). Tone sandhi, prosodic phrasing, and focus marking in Wenzhou Chinese. PhD dissertation, Leiden University.

Scholz, Franziska, and Yiya Chen (2014). The independent effects of prosodic structure and information status on tonal coarticulation: Evidence from Wenzhou Chinese. In J. Caspers, Y. Chen, W. Heeren, J. Pacilly, N. O. Schiller, and E. van Zanten (eds.), Above and beyond the segments: Experimental linguistics and phonetics. John Benjamins Publishing Company.

Selkirk, Elisabeth, and Tong Shen (1990). Prosodic domains in Shanghai Chinese. In Sharon Inkelas and Draga Zec (eds.), The phonology-syntax connection, pp. 313-337. CSLI, Stanford University, Stanford, CA. Distributed by University of Chicago Press.

Shen, Jong (1992). On Chinese intonation models. Chinese Studies 4: 16-24.

Shen, Xiaonan (1989). The prosody of Mandarin Chinese. University of California Publications, Linguistics, vol. CXVIII. Berkeley: University of California Press.

Shih, Chilin (1988). Tone and intonation in Mandarin. Working Papers of the Cornell Phonetics Laboratory 3: Stress, Tone and Intonation. Ithaca: Cornell University.

Truckenbrodt, Hubert (1995). Phonological phrases: Their relation to syntax, focus, and prominence. Doctoral dissertation, MIT, Cambridge, MA.

Tseng, Chiu-yu, Shao-huang Pin, Yehlin Lee, Hsin-min Wang, and Yong-cheng Chen (2005). Fluent speech prosody: Framework and modeling. Speech Communication 46: 284-309.

Vajda, Edward J. (2000). Ket prosodic phonology. München: Lincom Europa.

Vajda, Edward J. (2003). Tone and in Ket. In Dee Ann Holisky and Kevin Tuite (eds.), Current trends in Caucasian, East European, and Inner Asian linguistics: Papers in honor of Howard I. Aronson, pp. 393-418. Amsterdam: John Benjamins.

Vajda, Edward J. (2004). Ket. München: Lincom Europa.

Wan, I-Ping (2002). The status of prenuclear glides in Mandarin syllables: Evidence from psycholinguistics and experimental acoustics. Journal of Chinese Phonology 11: 232-248.

Wang, Bei, Ling Wang, and Tursun Qadir (2011). Prosodic realization of focus in six languages/dialects in China. ICPhS 17: 144-147.

Wang, H. Samuel (1993). Taiyu biandiao de xinli texing [On the psychological status of Taiwanese tone sandhi]. Tsinghua Xuebao [Tsinghua Journal of Chinese Studies] 23: 175-192.

Wang, H. Samuel and Chih-ling Chang (2001). On the status of the prenucleus glide in Mandarin Chinese. Language and Linguistics 2: 243-260.

Wang, Ping, and Feng Shi (2011). Intonation patterns. Nankai Journal of Language 2: 1-11.

Wang, William S.-Y. and Kung-Pu Li (1967). Tone 3 in Pekinese. Journal of Speech and Hearing Research 10: 629-636.

Weijer, Jeroen van de and Jisheng Zhang (2008). An X-bar approach to the syllable structure of Mandarin. Lingua 118: 1416-1428.

Werner, Heinrich (1997). Die Ketische Sprache. Wiesbaden: Harrassowitz Verlag.

Xu, Baohua, Zhenzhu Tang, Rujie You, Nairong Qian, Rujie Shi, and Yaming Shen (1988). Shanghai Shiqü fangyan zhi [Shanghai City Dialect Gazette]. Shanghai: Shanghai Jiaoyu Chubanshe [Shanghai Educational Press].

Xu, Bo, and Peggy Mok (2012). Cross-linguistic perception of intonation by Mandarin and Cantonese listeners. In Proceedings of Speech Prosody 2012, 99-102. Shanghai.

Xu, Yi (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27: 55-105.

Xu, Yi and Ching X. Xu (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics 33: 159-197.

Xu, Yi, Szu-wei Chen, and Bei Wang (2012). Prosodic focus with and without post-focus compression (PFC): A typological divide within the same language family? The Linguistic Review 29: 131-147.

Yan, Hanbo and Jie Zhang (2016). Pattern substitution in Wuxi tone sandhi and its implication for phonological learning. International Journal of Chinese Linguistics 3: 1-45.

Yang, Xiaohong and Yufang Yang (2012). Prosodic realization of rhetorical structure in Chinese discourse. IEEE Transactions on Audio, Speech, and Language Processing 20: 1196-1206.

Yin, Zuoyan (1982). Guanyu Putonghua shuangyin changyong ci qingzhongyin de chubu kaocha [A preliminary study of accents and atonics in disyllabic words in common use]. Zhongguo Yuwen [Studies of the Chinese Language] 1982.3 (168): 168-173.

Yip, Moira (2003). Casting doubt on the Onset–Rime distinction. Lingua 113: 779-816.

Yuan, Jiahong (2004). Intonation in Mandarin Chinese: Acoustics, perception, and computational modeling. Doctoral dissertation, Cornell University.

Yuan, Jiahong (2011). Perception of intonation in Mandarin Chinese. Journal of the Acoustical Society of America 130: 4063-4069.

Yuan, Jiahong and Yiya Chen (2014). 3rd tone sandhi in Standard Chinese: A corpus approach. Journal of Chinese Linguistics 42.1: 218-237.

Yue-Hashimoto, Anne O. (1987). Tone sandhi across Chinese dialects. In Chinese Language Society of Hong Kong (ed.), Wang Li memorial volumes, English volume, pp. 445-474. Hong Kong: Joint Publishing Co.

Zhang, Jie (2007). A directional asymmetry in Chinese tone sandhi systems. Journal of East Asian Linguistics 16: 259-302.

Zhang, Jie (2014). Tones, tonal phonology, and tone sandhi. In C.-T. James Huang, Y.-H. Audrey Li, and Andrew Simpson (eds.), The handbook of Chinese linguistics, pp. 443-464. Oxford: Wiley-Blackwell.

Zhang, Jie and Yuwen Lai (2010). Testing the role of phonetic knowledge in Mandarin tone sandhi. Phonology 27: 153-201.

Zhang, Jie, Yuwen Lai, and Craig Sailor (2011). Modeling Taiwanese speakers’ knowledge of tone sandhi in reduplication. Lingua 121: 181-206.

Zhang, Jie and Yuanliang Meng (2016). Structure-dependent tone sandhi in real and nonce words in Shanghai Wu. Journal of Phonetics 54.1: 169-201.

Zhu, Xiaonong (1995). Shanghai tonetics. Doctoral dissertation, Australian National University, Canberra.

Zhu, Xiaonong (2006). A grammar of Shanghai Wu. München: Lincom Europa.

Zubizarreta, Maria Luisa (1998). Prosody, focus, and word order. Cambridge, MA: MIT Press.
