Language and Speech

Aren’t prosodyFor and Peer syntax Reviewmarking bias in questions?

Journal: Language and Speech

Manuscript ID LAS-18-0085.R2

Manuscript Type: Original Article

Date Submitted by the n/a Author:

Complete List of Authors: Arnhold, Anja; University of Alberta Faculty of Arts, Department of Linguistics; University of Konstanz, Linguistics Braun, Bettina; University of Konstanz, Linguistics Romero, Maribel; University of Konstanz, Linguistics

Keywords: yes-no question, speaker certainty, bias, Ladd's ambiguity

As first observed by Ladd (1981. A first look at the semantics and pragmatics of negative questions and tag questions. Proceedings of Chicago linguistic society 17, pp. 164–171), English polar questions with high negation (HiNQ, e.g. Aren’t they adding a menu item?) can be used both to check the speaker’s belief that the proposition p is true (e.g. p = they are adding a menu item) and to check the addressee’s belief that p is not true (¬p). We hypothesized that this ambiguity can be disambiguated prosodically. We further hypothesized that the prosodic disambiguation is absent in German, because the checked proposition can be marked morpho-syntactically, with HiNQs checking p and low negation questions (LowNQ, e.g. Are they not adding a menu item?) checking ¬p. Abstract: A production study tested these hypotheses with 24 speakers of Western Canadian English and German each (764 and 767 utterances total, respectively). The results showed that, when the speaker originally believed p and the addressee implied ¬p, English speakers preferred HiNQs over LowNQs, confirming Ladd’s observation, and used intonation to mark whose proposition they were checking, as hypothesized. By contrast, German speakers marked this distinction morpho-syntactically, realizing mostly HiNQs to check their own proposition and LowNQs to check the addressee’s proposition. Their prosody, in turn, was largely determined by the morpho-syntactic question form. The study further manipulated the speaker’s certainty of the checked proposition, but, in contrast to studies on Romance languages, found that certainty itself

https://mc.manuscriptcentral.com/las Page 1 of 106 Language and Speech

1 2 3 was not marked. 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 2 of 106

1 2 3 4 5 Full report 6 7 8 9 10 Corresponding Author: 11 12 Anja Arnhold, University of Alberta, Department of Linguistics, 4-24 Assiniboia Hall, 13 14 Edmonton, AB, T6G 2E7, Canada 15 16 17 Email: [email protected] 18 19 For Peer Review 20 21 Aren’t prosody and syntax marking bias 22 23 24 25 in questions? 26 27 28 29 30 Anja Arnhold1,2 Bettina Braun2 and Maribel Romero2 31 32 33 1University of Alberta, Edmonton, AB, Canada 34 35 2 36 University of Konstanz, Konstanz, 37 38 39 Abstract 40 41 42 As first observed by Ladd (1981. A first look at the semantics and pragmatics of negative 43 44 questions and tag questions. Proceedings of Chicago linguistic society 17, pp. 164–171), 45 46 47 English polar questions with high negation (HiNQ, e.g. Aren’t they adding a menu item?) 48 49 can be used both to check the speaker’s belief that the proposition p is true (e.g. p = they 50 51 are adding a menu item) and to check the addressee’s belief that p is not true (p). We 52 53 54 hypothesized that this ambiguity can be disambiguated prosodically. We further 55 56 hypothesized that the prosodic disambiguation is absent in German, because the checked 57 58 59 60 https://mc.manuscriptcentral.com/las Page 3 of 106 Language and Speech

1 2 3 proposition can be marked morpho-syntactically, with HiNQs checking p and low 4 5 6 negation questions (LowNQ, e.g. Are they not adding a menu item?) checking p. 7 8 9 A production study tested these hypotheses with 24 speakers of Western Canadian 10 11 English and German each (764 and 767 utterances total, respectively). The results 12 13 showed that, when the speaker originally believed p and the addressee implied p, 14 15 16 English speakers preferred HiNQs over LowNQs, confirming Ladd’s observation, and 17 18 used intonation to mark whose proposition they were checking, as hypothesized. By 19 For Peer Review 20 contrast, German speakers marked this distinction morpho-syntactically, realizing mostly 21 22 HiNQs to check their own proposition and LowNQs to check the addressee’s proposition. 23 24 25 Their prosody, in turn, was largely determined by the morpho-syntactic question form. 26 27 The study further manipulated the speaker’s certainty of the checked proposition, but, in 28 29 contrast to studies on Romance languages, found that certainty itself was not marked. 30 31 32 33 Keywords 34 35 36 yes-no question, bias, Ladd’s ambiguity, speaker certainty 37 38 39 40 1. Introduction 41 42 43 Polar questions (or yes-no questions) are questions that expect an answer of either yes or 44 45 46 no. The form of the question can be positive or negative (e.g. Are you hungry? vs. Aren’t 47 48 you hungry? or Are you not hungry?), and even though positive and negative forms have 49 50 traditionally received the same semantic representation (e.g. a set containing the checked 51 52 53 proposition and its negation), more recently several authors have argued that they are not 54 55 in fact equivalent in terms of use-conditions (Asher & Reese, 2007; Büring & Gunlogson, 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 4 of 106

1 2 3 2000; Romero & Han, 2004, 2004; van Rooij & Šafářová, 2003; Sudo, 2013). Instead, the 4 5 6 felicity of their use depends both on the speaker’s prior beliefs regarding the checked 7 8 proposition p and the new evidence provided by the conversational context. For example, 9 10 Domaneschi, Romero and Braun (2017) manipulated the original speaker bias (e.g., (1a); 11 12 13 speaker bias for the proposition p) and contextual evidence (e.g. (1b), evidence for p) 14 15 and found that in such situations with conflicting biases, English participants most often 16 17 chose negative polar questions with negation in a high position, as in (1c); by contrast, 18 19 For Peer Review 20 positive polar questions as in (1d) were almost never uttered by participants as a response 21 22 in the same context and thus seem inappropriate (see Domaneschi et al., 2017, for 23 24 empirical results on these and other combinations of biases). 25 26 27 28 29 (1) Polar questions checking proposition p (p = there is a train in the early morning) 30 31 a. Tomorrow, you need to go from Nottingham to Sheffield very early. Your 32 33 brother goes there quite frequently and you remember he told you that he 34 35 36 usually takes a train in the early morning, before 7:00. 37 38 b. You go to the ticket office and you ask for a train for the next morning. 39 40 The operator answers to you: “The only train available is at 11:00.” 41 42 43 c. Isn’t there a train in the early morning? 44 45 d. #Is there a train in the early morning? 46 47 (from Domaneschi et al., 2017, p. 12) 48 49 In this article, we further investigate negative questions with high negations such as (1c) 50 51 52 in German and English. While these questions have been argued to be ambiguous 53 54 between two readings in English, their German counterparts are not. We tested whether 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 5 of 106 Language and Speech

1 2 3 the two readings are disambiguated by prosody in English and how the same difference is 4 5 6 signalled in German. 7 8 As first pointed out by Ladd (1981), English negative polar questions with high negation 9 10 (HiNQ, e.g. Isn’t there a vegetarian restaurant around here?) show an interesting 11 12 ambiguity, later termed ‘Ladd’s ambiguity’ in the semantic literature: Speakers can either 13 14 15 use HiNQs to check the proposition p that they are biased for (p = there is a vegetarian 16 17 restaurant) or, having just inferred that the addressee seems to believe that the 18 19 proposition is not true,For check this Peer inference thatReview p. Ladd’s original examples are given 20 21 22 in (2) and (3) below. In (2b) Kathleen uses a HiNQ to double-check her original belief 23 24 that p is true. In (3c), Bob, who also had originally assumed that p was true, uses the 25 26 same HiNQ to check p, which he has just inferred from Kathleen’s previous statement. 27 28 29 30 31 (2) HiNQ checking proposition p (p = there is a vegetarian restaurant) 32 33 (Situation: Kathleen and Jeff have just come from Chicago on the Greyhound bus 34 35 36 to visit Bob in Ithaca.) 37 38 a. Bob: You guys must be starving. You want to go get something to eat? 39 40 b. Kathleen: Yeah, isn’t there a vegetarian restaurant around here--- 41 42 43 Moosewood, or something like that? 44 45 c. Bob: Gee, you’ve heard of Moosewood all the way out in Chicago, huh? 46 47 OK, let’s go there. (from Ladd, 1981, p. 164) 48 49 50 (3) HiNQ checking proposition p 51 52 (Situation: Bob is visiting Kathleen and Jeff in Chicago while attending CIS.) 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 6 of 106

1 2 3 a. Bob: I’d like to take you guys out to dinner while I’m here-we’d have time 4 5 6 to go somewhere around here before the evening session tonight, don’t 7 8 you think? 9 10 b. Kathleen: I guess, but there’s not really any place to go in Hyde Park. 11 12 c. Bob: Oh, really, isn’t there a vegetarian restaurant around here? 13 14 15 d. Kathleen: No, about all we can get is hamburgers and souvlaki. 16 17 (from Ladd, 1981, p. 164) 18 19 For Peer Review 20 21 22 This ambiguity is not present in German, as first pointed out by Büring and Gunlogson 23 24 (2000). They state that, when the negative question checks proposition p as in (2) – ‘outer 25 26 negation’ in Ladd’s terminology –, the question translates into German as (4), where the 27 28 29 negation nicht precedes the indefinite determiner ein. When the negative question checks 30 31 the proposition p as in (3) – ‘inner negation’–, German is said to use the equivalent 32 33 negative polar question with the negation in a low position (LowNQ) in (5), where the 34 35 36 negation and the determiner are merged into kein ‘none’. 37 38 (4) Gibt es hier nicht ein vegetarisches Restaurant? (HiNQ) 39 40 Gives EXPL here not a vegetarian restaurant 41 42 43 ‘Isn’t there a vegetarian restaurant around here?’ 44 45 (from Büring and Gunlogson, 2000, p. 10) 46 47 (5) Gibt es hier kein vegetarisches Restaurant? (LowNQ) 48 49 Gives EXPL here no vegetarian restaurant 50 51 52 ‘Is there not a vegetarian restaurant around here?’ 53 54 (from Büring and Gunlogson, 2000, p. 9) 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 7 of 106 Language and Speech

1 2 3 4 5 Hence, while German may disambiguate syntactically between checking p vs. p, 6 7 English may not.1 We hypothesized that, in English, Ladd’s ambiguity is disambiguated 8 9 10 prosodically, i.e. that the intonation of English HiNQs is different when they are 11 12 employed to check p than when they are used to check p. We therefore constructed a 13 14 production experiment comparing the prosody of English and German negative polar 15 16 17 questions – namely HiNQs and LowNs – checking p vs. checking p. 18 19 Although to our knowledgeFor it hasPeer not previously Review been tested whether prosody marks the 20 21 22 checked proposition of English HiNQs, an observation by Domaneschi et al. (2017) hints 23 24 that this may be the case. They showed that in a context in which a speaker with an 25 26 existing bias for p encountered evidence that their addressee believed p (as in (1) 27 28 29 above), when German participants selected HiNQs in the experiment they uniformly 30 31 realized them with a nuclear pitch rise, whereas British English participants sometimes 32 33 used rises and sometimes falls. Domaneschi et al. (2017) hypothesized that German 34 35 speakers uttering HiNQ always used them to check their original bias for p, whereas the 36 37 38 two different intonational realizations for English may have been used to mark the 39 40 difference between speakers checking their original bias for p vs. the newly presented 41 42 bias for p. However, since their experimental design did not control which proposition 43 44 45 participants checked, the authors were unable to indicate which of the two contours 46 47 would be used to check which bias. 48 49 Note that to test how the checked proposition is marked, it is crucial to use a so-called 50 51 52 contradiction scenario where the speaker’s original belief is p and the addressee implies 53 54 p, as in (1) and (3), and not a so-called suggestion scenario as in Ladd’s context given 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 8 of 106

1 2 3 in (2). It has been shown in the literature that, while HiNQs checking p can be used both 4 5 6 in contradiction and in suggestion scenarios, HiNQs checking p are only felicitous in 7 8 contradiction scenarios (AnderBois, 2011, p. 187-189; Romero & Han, 2004, p. 618ff.; 9 10 Sudo, 2013, table 19): In contrast to scenario (2), Bob could utter both Oh, really, isn’t 11 12 13 there some vegetarian restaurant around here? and Oh, really, isn’t there any vegetarian 14 15 restaurant around here? in scenario (3). Since we are interested in whether prosody 16 17 marks the difference between checking p and p in the case of ambiguity for English 18 19 For Peer Review 20 HiNQs, the relevant contexts are contradiction scenarios. 21 22 23 1.1 Prosody of polar questions 24 25 26 27 For both English and German, scholars have suggested various different, but related 28 29 explanations for differences in the intonation of questions, with most of the discussion 30 31 32 centered around the difference between final pitch rises and falls. Rises have been 33 34 described as signaling interrogativity and falls as making questions more statement-like, 35 36 with authors generally acknowledging that there is no one-to-one correspondence 37 38 39 between intonation and interrogativity and that other factors, such as politeness, play a 40 41 role (e.g. for English: Halliday & Greaves, 2008; O’Connor & Arnold, 1973; Palmer, 42 43 1922; Schubiger, 1958; for German: Altmann, 1984; von Essen, 1964; Féry, 1993; 44 45 Isačenko & Schädlich, 1966; Pheby, 1975; Zacharias & Stock, 1973; for other prosodic 46 47 48 markers of interrogativity also see Michalsky, 2017; Petrone & Niebuhr, 2014). In the 49 50 following, we are going to summarize more concrete analyses of the meaning of question 51 52 intonation, first for English and then for German. 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 9 of 106 Language and Speech

1 2 3 For English, Pierrehumbert and Hirschberg (1990) argue that the primary meaning of 4 5 6 phrase-final high pitch in English is to signal that the phrase should be interpreted as 7 8 related to a following phrase, whereas low pitch signals that it should not (see also 9 10 O’Connor & Arnold, 1973, p. 47; Pike, 1945, p. 51; Peters, 2006, adopts this 11 12 interpretation for German). A partially overlapping proposal by Bartels (1997b, 2013) 13 14 15 claims that final high pitch is associated with continuation dependence, whereas final low 16 17 pitch is associated with assertiveness. Similarly, Nilsenová (2006) posits an association 18 19 between final pitch risesFor and epistemic Peer uncertainty Review on the part of the speaker. Steedman 20 21 22 (2004), expanding a suggestion by Gussenhoven (1983, p. 201-202), interprets final high 23 24 pitch as the speaker attributing responsibility for the propositional content to the 25 26 addressee, whereas falls mark a commitment of the speaker (also see Gunlogson, 2001, 27 28 29 on falls and rises in declarative questions vs. statements, and Escandell-Vidal, 1998, 30 31 1999, 2017, for a similar account of Peninsular European Spanish question contours). 32 33 34 35 Conversely, another line of research has taken syntactic and pragmatic factors as its 36 37 38 starting point and investigated their influence on the prosodic realization of questions. 39 40 The role of the checked proposition is generally not explicitly addressed in studies 41 42 describing the prosody of English or German HiNQs. For English, Romero and Han 43 44 45 (2002) suggested that the preposed negation in HiNQs is focused per se (it contributes the 46 47 VERUM operator, see also Romero & Han, 2004) and that the complex of auxiliary and n’t 48 49 carries a different pitch contour and higher pitch than the auxiliary in a positive polar 50 51 52 question. They left open whether to analyse this contour as a rising (L+H*) or falling 53 54 (H*+L) accent, but stated that it is followed by a low (L*) accent on the focus exponent, 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 10 of 106

1 2 3 followed by a high (H-H%) boundary tone, i.e. a sentence-final pitch rise. Hedberg and 4 5 6 Sosa (2002) compared different question types in a corpus of spontaneous speech, and 7 8 found that the auxiliary in negative polar questions overwhelmingly carried a rising 9 10 (L+H*) accent (eight out of nine cases), but in terms of utterance-final pitch movements, 11 12 they were almost evenly split between final falling and final rising pitch. Nilsenová 13 14 15 (2006) also found an equal number of final rises and falls on the nine negative polar 16 17 questions in her corpus (four HiNQs, five positive polar questions with negative tags), all 18 19 of which she classifiedFor s as biased Peer towards aReview positive answer. Without distinguishing 20 21 22 positive and negative polar questions (or HiNQs and LowNQs), Palmer (1922) stated that 23 24 polar questions implying an answer of opposite polarity are realized with falling pitch, 25 26 whereas without such a speaker bias, a final rise is most common. Pierrehumbert and 27 28 29 Hirschberg (1990) contrasted the standard polar question contour rising from a low level 30 31 (L* H-H%) with a rise from a high level (H* H-H%), which they claimed is used more 32 33 often for confirmation questions. They only provide examples of positive confirmation 34 35 questions with a positive speaker bias. Based on experimental data from Western 36 37 38 Canadian English, Dehé and Braun (2019) showed that information-seeking questions are 39 40 predominantly realized with a high rise (H-H%), while rhetorical questions are realized 41 42 with a high plateau (H-L%). 43 44 45 In summary, research on the prosody of English polar questions has described differences 46 47 between positive and negative (HiNQ) questions, as well as marking of the expected 48 49 answer. Thus, while both morpho-syntactic question type and bias have been argued to 50 51 52 influence the prosodic realization of English questions, the picture is fragmented and 53 54 incomplete, in particular regarding the role of checked proposition. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 11 of 106 Language and Speech

1 2 3 For German, no differences between the prosody of positive and negative polar questions 4 5 6 or between HiNQs and LowNQs have been described in the literature yet (but for 7 8 differences between polar questions, wh-questions, alternative questions and questions 9 10 with declarative syntax in their association with final pitch rises vs. falls, see e.g. 11 12 Altmann, 1984; Bierwisch, 1966; von Essen, 1964; Isačenko & Schädlich, 1966; Kohler, 13 14 15 2004; Pheby, 1975; Stock, 1999). Some authors have interpreted falls as signaling that 16 17 the speaker is focused on facts, but rises as indicating a focus on the addressee or the 18 19 relationship between theFor speaker Peer and addressee Review (Kohler, 2004; Stock, 1999; Stock & 20 21 22 Zacharias, 1973) or suggested that rises signal openness and an allocation of choice to the 23 24 addressee, whereas falls restrict the choice (Bartels, 1997a; von Essen, 1964; Kohler, 25 26 2004), relating the two to extralinguistic factors such as perceived friendliness and 27 28 29 interest in the addressee (von Essen, 1964, p. 59-60; Kohler, 2004; Peters, 2006). 30 31 Crucially for our present topic, Pheby (1975, p. 154-155) states that the intonational 32 33 realization of polar questions depends on the presence of bias, distinguishing three cases. 34 35 First, final pitch rises generally mark neutral questions requesting information. Second, a 36 37 38 final fall appears when speakers use a question to assure themselves of something or to 39 40 be assured that the addressees are certain. Third, questions with final falling-rising pitch 41 42 convey an expectation on the part of the speaker with respect to the truth of the checked 43 44 45 proposition. The third case is most relevant in the present context, since it seems to 46 47 correspond to the cases of interest in our study, i.e. biased polar questions where the 48 49 speaker checks the truth of a proposition p (or its negation p). To illustrate this case, 50 51 52 Pheby gives examples of both positive and negative polar questions carrying what he 53 54 calls a ‘negative’ expectation, i.e. a speaker bias towards answers of the opposite polarity, 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 12 of 106

1 2 3 namely Darfst du das? ‘Are you allowed to do that?’ expecting that the addressee is not 4 5 2 6 allowed to do it and Darfst du das nicht? ‘Are you not allowed to do that?’ expecting 7 8 that the addressee is allowed to do it. But he states that the polarity of the expected 9 10 answer may also match that of the question, supporting his claim with an example of a 11 12 positive polar question with the same fall-rise contour carrying a positive expectation. 13 14 15 Interestingly, Pheby (1975, p. 171) further subdivides this third case, questions with 16 17 falling-rising intonation, into questions with a smaller vs. a larger pitch range for the fall- 18 19 rise, with the latter signallingFor a Peerstronger expectation Review of an opposite-polarity answer.3 20 21 22 In line with that, Batliner (1989) suggests that the size of the final pitch rise in a question 23 24 can make a question more or less prototypically question-like (“fragehaltig”, Batliner, 25 26 1989, p. 155), whereby a less question-like question can for example express a bias for a 27 28 29 certain answer or signal a lack of interest in the answer, whereas a more question-like one 30 31 is neutral with respect to the expected answer and shows strong interest in the response. 32 33 Note, however, that Batliner is discussing gradient differences in meaning here, which 34 35 according to him can only be employed when the size of the final pitch rise is not needed 36 37 38 to mark a categorical distinction. Thus, prosody can modify how prototypical a morpho- 39 40 syntactic polar question is, but cannot be used in the same way when it is needed to mark 41 42 the contrast between question and declarative in so-called declarative questions, which 43 44 45 are morpho-syntactic statements and marked as questions only by prosody (Batliner, 46 47 1989, p. 155-157). 48 49 Similarly, Kügler (2003) observed that positive polar questions in semi-spontaneous and 50 51 52 spontaneous Upper Saxon German were always realized with rising nuclear accents 53 54 (L*H), but that these accents were followed by a final pitch rise for information-seeking 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 13 of 106 Language and Speech

1 2 3 questions, i.e. questions without a bias, and by a pitch fall for confirmation-seeking 4 5 6 questions, i.e. questions with a positive speaker bias. He specifically noted that the 7 8 absence of falling accents in polar questions in his Upper Saxon data constitutes a 9 10 difference from .4 11 12 German hence uses the final contour to mark bias in polar questions (Pheby, 1975; 13 14 15 Kügler, 2003) and seems to use gradual differences in pitch range to mark the strength of 16 17 an expectation of an opposite-polarity answer (Pheby, 1975). There are no experimental 18 19 data, however, on whetherFor the checPeerked proposition Review affects the morpho-syntactic question 20 21 22 form or the prosodic realization. 23 24 Similar pragmatic differences have been found to affect the prosody of polar questions in 25 26 various other languages. In particular, a prosodic distinction between information-seeking 27 28 29 and confirmation-seeking questions or between biased and non-biased questions has been 30 31 found in experimental studies for several varieties of Romance languages, with authors 32 33 frequently pointing to the relevance of other factors like the speaker’s confidence in 34 35 inferred information or the strength of their bias, also termed speaker certainty or 36 37 38 commitment (e.g. Armstrong, 2017; Armstrong & Prieto, 2015, for Puerto Rican Spanish; 39 40 Prieto & Borràs-Comes, 2018, for Central Catalan; Grice & Savino, 1995, 2003, for Bari 41 42 Italian; Michelas, Portes, & Champagne-Lavau, 2016, for French; Vanrell, Mascaro, 43 44 45 Torres-Tamarit, & Prieto, 2013, for Majorcan Catalan; Vanrell, Ballone, Schirru, & 46 47 Prieto, 2014, for Sardinian). 48 49 In sum, various factors have been argued to influence the intonation of polar questions in 50 51 52 the literature. We believe that the most important factors are: (i) syntactic form or 53 54 question type (e.g. HiNQ vs. LowNQ), (ii) pragmatic function (neutral information- 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 14 of 106

1 2 3 seeking use vs. biased questions seeking reassurance/confirmation or expecting a given 4 5 6 answer), (iii) the degree of the speaker’s certainty or commitment to the proposition in 7 8 the question, which may be related to more interpersonal aspects like the speaker’s 9 10 perceived friendliness, surprise or disbelief and (iv) the propositional content – p vs. p – 11 12 13 checked by the question. Importantly, while most authors acknowledge the relevance of 14 15 several factors, experimental studies rarely control more than one factor at a time and 16 17 often implicitly or explicitly conflate factors that can at least conceptually be 18 19 For Peer Review 20 differentiated. Furthermore, factor (iv) is at this point a suggestion in Domaneschi et al. 21 22 (2017) and has not been tested experimentally. 23 24 25 1.2 Experiment 26 27 28 29 Here, we report an experiment on negative polar questions that independently 30 31 32 manipulated and crossed the checked proposition (speaker’s/p vs. addressee’s/p) and 33 34 the speaker’s certainty regarding the checked proposition (high vs. low level of certainty) 35 36 to investigate morpho-syntactic marking (use of HiNQ vs. LowNQ) and prosodic 37 38 39 realization (distribution of boundary tones and accent types). Thus, we controlled factor 40 41 (ii) from section 1.1. to be constant for all target items (biased questions used to resolve a 42 43 conflict between speaker’s and addressee’s beliefs) while systematically varying factors 44 45 (iii) and (iv) and investigating the effects on morpho-syntactic question type (i) and 46 47 48 prosody (as well effects of question type on prosody). We concentrated on English and 49 50 German negative polar questions, which present an ideal test case for these factors due to 51 52 the differences between the languages described above with respect to Ladd’s ambiguity. 53 54 55 Based on previous studies, we tested the following hypotheses: 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 15 of 106 Language and Speech

1 2 3 4 5 6 1. Speakers mark which proposition is being checked by their question. German uses 7 8 question type, English uses prosody. 9 10 a. English speakers use HiNQs both to check the speaker’s proposition p and 11 12 13 to check the addressee’s proposition p (e.g. Ladd, 1981). 14 15 b. English speakers use prosody, in particular final intonation, to mark the 16 17 checked proposition (suggestion in Domaneschi et al., 2017). 18 19 For Peer Review 20 c. German speakers use HiNQs to check their own proposition p and 21 22 LowNQs to check the addressee’s proposition p (Büring & Gunlogson, 23 24 2000) 25 26 27 2. The speaker’s level of certainty regarding the checked proposition influences the 28 29 prosodic realization of the question (e.g. Pheby, 1975, for German; for Romance 30 31 languages Armstrong, 2017; Armstrong & Prieto, 2015; Prieto & Borràs-Comes, 32 33 2018; Michelas et al., 2016). 34 35 36 37 38 Note that since effects of speaker certainty have not been directly investigated for either 39 40 German or English, we left the precise nature of this factor’s influence open in hypothesis 41 42 43 2. Similarly, it was not possible to specify the nature of the prosodic marking of the 44 45 checked proposition in English in hypothesis 1b, since it has not previously been 46 47 investigated. In this paper we focus on categorical differences in boundary tones and 48 49 50 accent (placement and type). 51 52 53 2. Method 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 16 of 106

1 2 3 4 2.1 Participants 5 6 7 8 2.1.1 English 9 10 For the English version of the experiment, 34 participants were recruited at the University 11 12 of Alberta in Edmonton, Canada. Of these participants, we excluded seven participants 13 14 15 for the following reasons: four were native speakers of other varieties of English than 16 17 Western Canadian, two were bilingual or heritage speakers of another language and one 18 19 participant was both. OfFor the remaining Peer 27 participants, Review we chose 24 such that each of the 20 21 22 two basic experimental lists (see below) contained an equal number of participants. The 23 24 data that were analyzed came from 24 monolingual speakers of Western Canadian 25 26 English (15 female, 8 male, 1 other; age 18-33, mean 21.7). 27 28 Participants received partial course credit or CAD15 as reimbursement for their time. The 29 30 31 study was reviewed and approved by the Research Ethics Board 2 of the University of 32 33 Alberta. 34 35 36 2.1.2 German 37 38 For the German version of the experiment, 32 participants were tested at the University 39 40 of Konstanz, Germany. Except for one, all of them were native speakers of German who 41 42 43 had not acquired any other language before age six. Data from 24 monolingual speakers 44 45 of German were evaluated to match the number of speakers in the English data (20 46 47 female, 4 male; age 18-30, mean 22.5). Again, evaluated participants were equally 48 49 50 distributed across experimental lists. All participants received EUR8 as reimbursement 51 52 for their time. The study was reviewed by the ethics board at the University of Konstanz. 53 54 55 2.2 Procedure and Materials 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 17 of 106 Language and Speech

1 2 3 The experiment manipulated two factors, the proposition checked by asking a negative 4 5 6 polar question – HiNQ or LowNQ – and the speaker’s level of certainty regarding the 7 8 checked proposition, which were crossed to create four experimental conditions (A: 9 10 checking p, 90% sure, B: checking p, 60% sure, C: checking p, 60% sure, D: checking 11 12 13 p, 90% sure). Both factors were manipulated within-subjects. Procedure and equipment 14 15 were the same for both languages. 16 17 Participants read descriptions of dialogue scenarios between two fictional characters and 18 19 For Peer Review 20 were told to imagine themselves as one of the characters (see Table 1 for an English 21 22 example item). 23 24 The text appeared paragraph by paragraph (rows 1.-6. in Table 1), with participants 25 26 27 pressing a button on a keypad to proceed when they had read the paragraph and were 28 29 ready for the next one. After a short introductory statement (1.), the second paragraph 30 31 stated the participant’s positive bias towards a proposition p, which was made stronger in 32 33 conditions A and B (2., here: p = they are planting a water lily) than in conditions C and 34 35 36 D. Next, the interlocutor in the scenario, their addressee, was introduced as having a 37 38 certain level of expertise with respect to p, which was lowest in condition A and highest 39 40 in condition D (3.). The addressee’s belief that p was then introduced in all conditions 41 42 43 (4.) before the participant was prompted to resolve the contradiction with a question (5.). 44 45 The participant’s belief was always p, while the addressee’s belief was p, but conditions 46 47 48 varied in the following way: In conditions A and B, participants were instructed to utter a 49 50 question to check their own belief p, whereas they were told to check the addressee’s 51 52 belief p in conditions C and D (below: ‘checked proposition’). Additionally, they were 53 54 55 told about their level of certainty of the checked proposition, which was 90% for 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 18 of 106

1 2 3 conditions A and D and 60% for conditions B and C (below: ‘certainty’). Finally, 4 5 6 participants saw a HiNQ and a LowNQ checking p/p and had to choose one of them by 7 8 uttering it out loud (6.; participants did not see the labels HiNQ and LowNQ). 9 10 11 12 13 Table 1. English example item in the four conditions: A (checking p, 90% sure), B 14 15 (checking p, 60% sure), C (checking p, 60% sure) and D (checking p, 90% sure). 16 17 18 19 Condition ForA Condition Peer B ReviewCondition C Condition D 20 21 22 1. Your next-door neighbours, the Kapoors, have a very nice garden. 23 24 25 Looking out of the kitchen window, you have just spotted them in a 26 27 corner of the garden by the pond. 28 29 30 2. You are sure that they are planting You think that they are planting a 31 32 a water lily. water lily. 33 34 35 36 3. Your husband, Your husband, Your husband, Your husband, 37 38 who really who probably who has who has better 39 40 needs new needs new excellent eyesight than 41 42 43 glasses, has glasses some eyesight, has you, has 44 45 noticed the time soon, has noticed the noticed the 46 47 Kapoors, too. noticed the Kapoors, too. Kapoors, too. 48 49 50 Kapoors, too. 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 19 of 106 Language and Speech

1 2 3 4 4. He says: “Look! Our neighbours are just lounging in the sun for once. 5 6 Should we invite them over for a tea?” 7 8 9 5. You are You are You are You are 10 11 12 surprised to surprised to surprised to surprised to 13 14 hear this, but hear this, but hear this, but hear this, but 15 16 you are still you are still you are 60% you are 90% 17 18 19 90% convincFored 60% Peer convinced Review convinced that convinced that 20 21 you are right. you are right. he is right. he is right. 22 23 However, to However, to However, to However, to 24 25 26 make sure, you make sure, you make sure, you make sure, you 27 28 want to check want to check want to check want to check 29 30 your your his assumption. his assumption. 31 32 assumption. assumption. You ask: You ask: 33 34 35 You ask: You ask: 36 37 38 6. Aren’t they planting a water lily? [HiNQ] 39 40 41 Are they not planting a water lily? [LowNQ] 42 43 44 45 46 There were 16 different experimental scenarios of this structure for English and German 47 48 each. The target questions all followed the same pattern, given in Table 2, where V 49 50 51 denotes a verbal stem and N a compound noun. Compound nouns were all quadrisyllabic, 52 53 with primary stress on the first syllable and secondary stress on the third syllable; they 54 55 contained only voiced segments to facilitate prosodic analysis. Verbs were less 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 20 of 106

1 2 3 prosodically controlled, but consisted of at least two syllables (see Appendix A for the 4 5 6 complete list of target questions). We also controlled information structure to ensure that 7 8 participants used the question to check the truth of the proposition p/p as a whole and 9 10 all questions were uttered with broad/VP focus (instead of a possible interpretation with 11 12 13 narrow focus on the final compound noun). To this end, we took particular care in 14 15 constructing the text introducing the addressee’s belief that p. For example, for the item 16 17 illustrated in Table 1, we made sure that the addressee’s turn in 4. established a conflict 18 19 For Peer Review 20 with the speaker’s belief not only with respect to what the neighbours are planting 21 22 (leading to narrow focus on the compound noun water lily in 6.), but with respect to 23 24 whether the neighbours are doing anything at all (leading to broad/VP focus in 6.). 25 26 27 Table 2. Structure of target questions. V denotes a verbal stem, N a compound noun. 28 29 30 HiNQ LowNQ 31 32 33 34 English Aren’t they V-ing a(n) N? Are they not V-ing a(n) N? 35 36 37 German V-en sie nicht eine(n) N? V-en sie keine(n) N? 38 39 V-INF they not a N? V-INF they not.a N? 40 41 42 43 44 45 The 16 items x 4 conditions were split up to create two basic lists, which in turn consisted 46 47 of two experimental blocks each. Conditions and items were counterbalanced so that each 48 49 list contained every item twice, once in each experimental block, in different conditions. 50 51 Each condition appeared for overall eight trials per list. Experimental items were 52 53 54 interspersed with 15 fillers presenting scenarios similar to the experimental items, but 55 56 offering different question types as response options. Five fillers gave participants the 57 58 59 60 https://mc.manuscriptcentral.com/las Page 21 of 106 Language and Speech

1 2 3 choice between a simple positive polar question and a positive polar question including 4 5 6 really, five between a simple positive polar question and a declarative question, and five 7 8 between a positive polar question including really and a declarative question. Since these 9 10 question types cannot be used to check the speaker’s own proposition in contradictions 11 12 scenarios, filler items only occurred in two conditions, similar to conditions C and D for 13 14 15 experimental items. Like the target items, fillers were counterbalanced across lists and 16 17 participants encountered each filler in each block, i.e. twice per experimental session in 18 19 different conditions. Thus,For an experimental Peer sessionReview contained 16 x 2 experimental trials 20 21 22 and 15 x 2 filler trials, totalling in 62 trials, preceded by four practice trials. One of the 23 24 practice trials was similar to an experimental trial and three were similar to filler trials. 25 26 Within experimental blocks, items were presented in a different random order for each 27 28 29 participant. The experiment was conducted in conjunction with a second study, 30 31 counterbalancing the order of the experiments between participants, so that a complete 32 33 experimental session took about 45 minutes. 34 35 36 37 38 To keep participants engaged and attentive and to encourage them to read the contexts 39 40 properly, one third of the trials were followed by questions on whose proposition they 41 42 were checking or on their level of surprise or by comprehension questions on details 43 44 45 about the context. The English participants scored 89% correct on comprehension 46 47 questions, the German participants 81%, which suggests that they were reading the 48 49 contexts closely. 50 51 52 English- and German-speaking participants were tested in a sound-attenuated booth at the 53 54 Centre for Comparative Psycholinguistics at the University of Alberta and at the PhonLab 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 22 of 106

1 2 3 at the University of Konstanz, respectively. Stimulus presentation and data collection 4 5 6 were controlled with the software Presentation (Neurobehavioral Systems, 2014). The 7 8 productions were directly recorded on the computer (44.1kHz, 16Bit) using a head- 9 10 mounted unidirectional dynamic Shure SM10A microphone with an audio interface 11 12 audient iD14. Acoustic measurements were performed with Praat (Boersma & Weenink, 13 14 15 2016). 16 17 18 3. Data treatment and analysis 19 For Peer Review 20 21 22 23 Of the 768 target trials for each language (16 items × 2 repetitions × 24 participants), 24 25 three were discarded in the English version of the experiment because the participants did 26 27 not choose either of the two questions, but instead responded with a declarative question. 28 29 A further trial was lost because the participant pressed the button to move on to the next 30 31 32 trial too early, so the programme did not record their response. We thus evaluated 33 34 responses to 764 trials. For the German data, one trial had to be discarded because the 35 36 participant spoke so fast and silently that her response could not be analyzed. Thus 767 37 38 39 target trials were included in the analysis. 40 41 42 43 44 45 3.1. Morphosyntactic and prosodic analysis 46 47 48 The productions were first coded as HiNQ or LowNQ. Then, the first author, a native 49 50 51 speaker of German who had lived in Canada for over two years at that point, annotated 52 53 the pitch contours of all English recordings according to the Tone and Break Indices 54 55 guidelines for American English, MAE-ToBI (Silverman et al., 1992; Veilleux et al., 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 23 of 106 Language and Speech

1 2 3 2006), marking presence and type of accents and boundary tones (for the autosegmental- 4 5 6 metrical model of prosody on which ToBI annotations are based, see e.g. Pierrehumbert, 7 8 1980; Ladd 1996). Two native German speaking research assistants performed 9 10 corresponding annotations for the German data following the German GToBI guidelines 11 12 (Grice et al., 2005; Grice, Baumann, Ritter, & Röhr, 2017). We checked the reliability of 13 14 15 the annotation by pseudo-randomly selecting 15% of the data (115 utterances per 16 17 language, including utterances from all participants, utterances with all items and 18 19 conditions) for a new Forannotation Peer by a different Review annotator, the first author for the German 20 21 22 utterances and a native English-speaking research assistant for the English ones. The new 23 24 annotators were first trained with a similarly selected set of 24 utterances. The agreement 25 26 on the 115 test utterances of these new annotations with respect to the prosodic 27 28 29 characteristics analyzed statistically below was 74% for English and 75% for German, 30 31 corresponding to Cohen’s Kappa (κ) values of 0.67 and 0.70, respectively (Cohen, 1960; 32 33 Carletta, 1996; calculated with R package irr by Gamer, Lemon, Fellows, & Singh, 34 35 2019).4 Evaluating only agreement in whether each annotator had assigned the most 36 37 38 frequent prosodic category or not, i.e. the classification that formed the basis for 39 40 statistical analyses reported below, yielded an agreement of 87% (κ = 0.73) for English 41 42 and 83% (κ = 0.65) for the German annotations. The overall values obtained with either 43 44 45 method of evaluation, with κ ranging between 0.60 and 0.80, represent a “[r]eliable 46 47 distinction with substantial agreement” (Breen, Dilley, Kraemer, & Gibson, 2012, p. 293, 48 49 and references therein). 50 51 52 53 3.2 Statistical analysis 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 24 of 106

1 2 3 We analyzed the data with (generalized) linear mixed-effects models in R (R Core Team, 4 5 6 2017), using the package lme4 (Bates, Mächler, Bolker, & Walker, 2015; Bates et al., 7 8 2016). We additionally employed the package lmerTest (Kuznetsova, Brockhoff, & 9 10 Christensen, 2016) to obtain p-values based on Satterthwaite’s approximation of degrees 11 12 of freedom. Linear mixed-effects models are a type of linear regression modelling that 13 14 15 can for example take into account associations between several data points coming from 16 17 the same participant by fitting by-participant random slopes and/or intercepts in addition 18 19 to modelling effects ofFor predictor Peer variables as Review fixed effects (Baayen, Davidson, & Bates, 20 21 22 2008). As dependent measures, we analyzed the following binary response variables by 23 24 fitting separate binomial generalized linear mixed-effects models (with a binomial linking 25 26 function) to them: participants’ choice of morpho-syntactic question types (HiNQ vs. 27 28 29 LowNQ), the final boundary tone (final high rise vs. no final high rise), and the 30 31 realization of accents on all constituents (most frequent vs. other realization). We used 32 33 the default coding in R for these two-level factors (dummy coding). To validate our 34 35 categorization of utterance final boundary tones (see footnotes 5 and 6 below), we 36 37 38 additionally fit linear mixed-effects models to the numeric response variable utterance- 39 40 final f0 height in semitones (st). Model fitting always started with a model with random 41 42 intercepts for participant and item to account for inherent differences between 43 44 45 participants and items. The fixed effects of the first model included an interaction 46 47 between proposition checked (own proposition/p vs. addressee’s proposition/p) and 48 49 certainty (high/90% vs. low/60%), as well as the factor trial number (centered by 50 51 52 subtracting the mean) to account for possible changes in behavior over the course of the 53 54 experiment. For models of prosodic dependent variables (final boundary tone, accents 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 25 of 106 Language and Speech

1 2 3 and f0 measures), we additionally included the morpho-syntactic form of the question 4 5 6 (HiNQ vs. LowNQ) as a predictor (below: ‘question type’), as well as a three-way 7 8 interaction between question type, proposition and certainty. Starting from this model, we 9 10 then added random slopes for the predictors question type (for prosodic response 11 12 variables only), trial, checked proposition, certainty and the interaction between them and 13 14 15 compared the resulting models to the original one using the ANOVA function, until we 16 17 found the most complex random effects structure providing an improved fit to the data 18 19 (forward fitting). If noFor random byPeer-item slopes Review were found to improve model fit, we 20 21 22 additionally checked whether the by-item intercept could be removed altogether. Most of 23 24 our best models included random intercepts to account for effects of participant and item, 25 26 but no random slopes. Where models had a more or less complex random structure, we 27 28 29 will report this below. 30 31 Then we backward-fitted the fixed effects by removing interactions and predictors not 32 33 associated with significant effects, as long as this did not significantly decrease model fit. 34 35 Finally, we performed model criticism on the resulting best model by inspecting the 36 37 38 residuals. Where we found outlier residuals deviating more than ±2.5 standard deviations 39 40 from 0 (the regression line), we trimmed the model excluding these data points and 41 42 refitting, to make sure reported effects were not driven by outliers. We report the 43 44 45 resulting best models below, stating the number of removed data points where trimming 46 47 was necessary. Where the random effects structure is not explicitly described, it consists 48 49 of random intercepts for participants and items without random slopes (see Baayen, 2008, 50 51 52 ch. 7; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017, on model fitting procedures). 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 26 of 106

1 2 3 4 4. Results 5 6 7 8 3.1 Morpho-syntax 9 10 11 12 As predicted based on Ladd’s ambiguity, participants in the English version of the 13 14 experiment showed an overall preference for HiNQ, choosing it over LowNQ in 62% of 15 16 target trials. As Figure 1 illustrates, this preference appeared for all conditions and was 17 18 19 influenced neither by Forthe speaker’s Peer certainty Reviewnor by whether they asked the question to 20 21 check their own proposition or that of the addressee. Statistical modelling indicated no 22 23 significant effects of proposition checked, certainty or an interaction between them. The 24 25 26 best linear mixed-effects model of the participants’ choice of question type contained an 27 28 insignificant effect of checked proposition, β = 0.006, SE = 0.314, z = 0.02, p = 0.985, 29 30 kept in the model since random effects included a by-participant effect slope for checked 31 32 proposition. Including certainty as a predictor or an interaction between the two factors 33 34 35 did not significantly improve model fit (the random by-item intercept was removed for 36 37 the same reason). The fixed effects only showed a significant effect of trial, β = -0.026, 38 39 SE = 0.004, z = -5.59, p < 0.001, suggesting that participants were overall more likely to 40 41 42 choose LowNQs later in the experimental session. 43 44 To confirm that participants indeed showed an overall preference for HiNQs, we 45 46 performed two separate one-sided one-sample t-tests over participant and item averages 47 48 49 for the percentage of HiNQ choices. These tests indicated that the percentage of uttered 50 51 HiNQs was indeed significantly above the chance level of 50%, t(23) = 3.8, p < 0.001 by 52 53 subjects; t(15) = 5.8, p < 0.001 by items. 54 55 56 [insert Figure 1] 57 58 59 60 https://mc.manuscriptcentral.com/las Page 27 of 106 Language and Speech

1 2 3 [insert Figure 2] 4 5 6 For the German participants, by contrast, a significant interaction appeared between 7 8 checked proposition and certainty, β = 0.767, SE = 0.333, z = 2.31, p = 0.021 (see Figure 9 10 11 2). The best linear mixed-effects model additionally contained a significant main effect of 12 13 checked proposition, β = -1.957, SE = 0.390, z = -5.01, p < 0.001, indicating that 14 15 participants responded with HiNQs less often when checking the addressee’s proposition 16 17 18 p than when checking their own proposition p. The main effect of certainty was not 19 For Peer Review 20 significant, β = -0.161, SE = 0.232, z = -0.69, p = 0.487, but the interaction showed that 21 22 participants’ preference for checking p with a HiNQ and p with a LowNQ was stronger 23 24 25 when they were 90% certain than when they were only 60% certain (the best model again 26 27 included a by-participant effect of checked proposition). 28 29 To assess directly whether the effects of the manipulated variables differed between the 30 31 32 languages, we combined both data sets and tested whether the three-way interaction 33 34 between checked proposition, certainty and language was significant. However, this 35 36 interaction did not significantly contribute to model fit, and the best model included only 37 38 39 a significant interaction between language and proposition, β = -1.473, SE = 0.244, z = 40 41 6.03, p < 0.001, suggesting that the effect of proposition – participants chose less HiNQ 42 43 when checking the addressee’s proposition than when checking their own – was larger in 44 45 the German than in the English data (in addition, the model included insignificant main 46 47 48 effects of proposition, β = -0.020, SE = 0.244, z = -0.08, p = 0.934, and language, β = - 49 50 0.186, SE = 0.166, z = 1.12, p = 0.263). This statistically corroborates the differences in 51 52 the effect of checked proposition for German and English participants suggested by the 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 28 of 106

1 2 3 separate models reported above: German speakers marked the checked proposition 4 5 6 morpho-syntactically, but English speakers did not. 7 8 9 3.2 Prosody 10 11 12 13 3.2.1 Final boundary tone 14 15 16 Participants uttered overall 85% of the English questions with a final f0 rise annotated as 17 18 H-H% boundary tones (N = 648), while only 10% of utterances showed the high plateau 19 For Peer Review 20 H-L% (N = 80), 3% the low L-L% (N = 20) and 2% the low-rising L-H% (N = 16). The 21 22 23 dominance of H-H% endings was similar for both question types (82% for HiNQs; 89% 24 25 for LowNQ), for both checked propositions (82% for speaker’s/p vs. 88% for 26 27 addressee’s/p) and both levels of certainty (85% for both levels). However, it was 28 29 30 apparent that some of these final H-H% rises were clearly higher, moving from a low 31 32 point to the top of the participant’s range (see Figure 3). Other utterances were annotated 33 34 with H-H% boundary tones, but still did not reach the same level (Figure 4). Therefore, 35 36 we systematically categorized the difference between high rises and smaller rises during 37 38 39 the annotation (this distinction was also used by the second annotator and evaluated in the 40 41 reliability check reported in section 3.1 above). Overall, 420 H-H% rises were classified 42 43 as high rises and 228 as smaller rises (55% vs. 30% of the total data). We leave open 44 45 46 whether this distinction truly reflects a further categorical bifurcation within the MAE- 47 48 ToBI category H-H%. An alternative explanation could be that we underutilized the H- 49 50 L% category, though small H-H% rises generally showed final rises and were auditorily 51 52 53 quite distinct from the final plateaus annotated as H-L% (compare Figures 4 and 5). 54 55 Crucially, a clear difference appeared between utterances annotated as high-rising H-H%, 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 29 of 106 Language and Speech

1 2 3 which were the overall most frequent realization in the data, and all other boundary 4 5 6 realizations. For statistical analysis, we therefore employed a binary classification into 7 8 utterances with and without high final f0 rises, where the latter category collapsed H-H% 9 10 boundary tones resulting in smaller rises with the categories H-L%, L-L% and L-H%.65 11 12 13 [insert Figure 3] 14 15 16 [insert Figure 4] 17 18 [insert Figure 5] 19 For Peer Review 20 21 The best linear mixed-effects model of this binary variable included a significant 22 23 24 interaction between checked proposition and certainty, β = -0.895, SE =0.325, z = -2.75, p 25 26 = 0.006, see Figure 6. Overall, high final rises were significantly more frequent when 27 28 participants checked the addressee’s proposition p than when checking their own 29 30 31 proposition p, β = 1.584, SE = 0.238, z = 6.64, p < 0.001, but this difference was smaller 32 33 for low certainty than for high certainty. By contrast, the main effect of certainty was 34 35 marginal, β = 0.394, SE = 0.223, z = 1.77, p = 0.077. The factor morpho-syntactic 36 37 38 question type did not significantly improve model fit and was therefore not included in 39 40 the best model. Evaluating only utterances annotated with H-H% boundary tones yielded 41 42 essentially the same results, with the best model showing the same interaction between 43 44 45 proposition checked and certainty, β = -0.916, SE =0.371, z = -2.47, p = 0.014, and the 46 47 same significant effect of proposition checked, β = 1.716, SE =0.278, z = 6.17, p < 0.001 48 49 (as well as an insignificant main effects of certainty, β = 0.327, SE = 0.246, z = 1.33, p = 50 51 0.183, question type, β = -0.351, SE = 0.205, z = -0.247, p = 0.805, and trial, β = -0.006, 52 53 54 SE = 0.007, z = -0.795, p = 0.427; question type was retained because removing this 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 30 of 106

1 2 3 factor resulted in convergence issues, while trial was included because by-item random 4 5 6 slope of trial improved model fit). 7 8 [insert Figure 6] 9 10 11 [insert Figure 7] 12 13 14 In annotating the German data, we directly distinguished high final f0 rises from less high 15 16 rises. In the GToBI system, this difference is reflected in the presence of two distinct 17 18 boundary tone combinations: H-^H%, a rise followed by a high plateau (H-) and a further 19 For Peer Review 20 21 rise perceived as high rising (^H%), contrasts with H-%, a simple rise ending in a 22 23 plateau. Overall 53% of the utterances were annotated with the H-^H% tones indicating a 24 25 final high rise, 21% with H-%, 11% with L-H%, 8% with L-% and for 7% of the 26 27 28 utterances, the annotators were unsure about the final boundary tone. We evaluated a 29 30 binary distinction between final high rises (H-^H%) and all other contours (see Figure 31 32 7).67 33 34 35 In contrast to the English results, the best binomial linear mixed-effects model only 36 37 included a significant effect of syntactic question type, β = 1.111, SE = 0.287, z = 3.87, p 38 39 = 0.002. It indicated that participants produced more final high rises when uttering a 40 41 42 HiNQ than when uttering a LowNQ (recall that HiNQs, in turn, were selected more often 43 44 when participants checked their own proposition p than when they checked the 45 46 addressee’s proposition p). The best model additionally included a random by- 47 48 49 participant slope of question type. 50 51 To directly assess the differences between the languages, we tested for a four-way 52 53 interaction between proposition, certainty, question type and language. However, this 54 55 56 model did not converge, indicating that there were not enough data points to model this 57 58 59 60 https://mc.manuscriptcentral.com/las Page 31 of 106 Language and Speech

1 2 3 kind of complexity. Therefore, we fitted a simpler model, which showed a significant 4 5 6 three-way interaction between proposition, certainty and language, β = 1.196, SE = 0.475, 7 8 z = 2.52, p = 0.012, suggesting that the interaction between proposition and certainty, 9 10 which appeared as significant for the intercept language English, β = -0.894, SE = 0.326, 11 12 z = -2.75, p = 0.006, was significantly weaker for the German data, in line with the 13 14 15 findings reported above. The model also contained an interaction between question type 16 17 and language, β = 1.209, SE = 0.259, z = 4.66, p < 0.001, while the main effect of 18 19 question type was notFor significant, Peer β = -0.174, Review SE = 0.175, z = -0.99, p = 0.322, matching 20 21 22 the finding of a significant effect of question type for German, but not for English. 23 24 25 26 In sum, this section found clear differences between English and German in the 27 28 29 realization of utterance-final boundary tones: English participants showed an effect of 30 31 proposition checked – realizing higher final f0 rises when checking their addressee’s 32 33 proposition p than for their own proposition p –, which was further modulated by their 34 35 36 certainty of that proposition, whereas the occurrence of high final f0 rises in German was 37 38 only predicted by morpho-syntactic question type, with more high final rises for HiNQs 39 40 than LowNQs. 41 42 43 3.2.2 Accents 44 45 Auxiliaries in the English utterances (are; note that only English items contained 46 47 48 auxiliaries, so that this category was absent from the German data) were the only 49 50 constituents to show an effect of one the manipulated variables. Auxiliaries were most 51 52 frequently realized with a rising accent with a delayed peak (L+H*<, N = 383, i.e. 50% 53 54 overall). For statistical analyses, the dependent variable coded whether auxiliaries were 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 32 of 106

1 2 3 realized with the most frequent accent category L+H*< or any of the other categories. 4 5 6 Binomial linear mixed-effects modelling indicated that this accent choice was influenced 7 8 by the factor proposition checked: Participants used L+H*< accents slightly more often 9 10 when checking their addressee’s proposition p than when checking their own 11 12 13 proposition p, β = 0.686, SE = 0.296, z = 2.32, p = 0.020 (see Figure 8). The best model 14 15 additionally contained a significant main effect of question type, associated with a much 16 17 larger estimate, β = 8.873 SE = 1.289, z = 6.89, p < 0.001, showing that L+H*< appeared 18 19 For Peer Review 20 more often in HiNQs than in LowNQs, which may be connected to the accentuation of 21 22 negations (24 data points or 3% trimmed, the best model did not contain a random 23 24 intercept for item). Less frequent realizations consisted mostly of unaccented auxiliaries 25 26 (N = 268, i.e. 35% overall), while all other accents together accounted for only 15% (N = 27 28 29 113). 30 31 32 [insert Figure 8] 33 34 In addition to auxiliaries, we evaluated the accents on the sentence-final compound nouns 35 36 37 (e.g. water lily), main verbs (e.g. planting) and negations (n’t and not in English, kein 38 39 ‘none’ and nicht ‘not’ in German). Below, we present short summaries of the 40 41 accentuation of these constituents, see Appendix B for further details. The subject 42 43 44 constituent (they in English, sie ‘they’ in German) was never accented and is thus not 45 46 further considered here. The realization of the accents on the other constituents did not 47 48 show an effect of proposition, certainty or an interaction between them in either the 49 50 English or the German data set. Instead, accentuation of all other constituents (except 51 52 53 finite verbs, which did not show any significant differences), was determined by morpho- 54 55 syntactic question type for both languages, though reasons for this seem more trivial for 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 33 of 106 Language and Speech

1 2 3 English (see Appendix B for statistical analyses using a binary dependent variable that 4 5 6 coded whether each word was produced with the most frequent realization or not). 7 8 Figures 9 and 10 illustrate utterances where all constituents were realized with their most 9 10 frequent realizations for English and German, respectively. 11 12 13 [insert Figure 9] 14 15 16 [insert Figure 10] 17 18 English negations were frequently accented in LowNQs, but never in HiNQs (overall 19 For Peer Review 20 21 65% unaccented, N = 493). This is expected, as negations in HiNQs are cliticized to the 22 23 auxiliary (Aren’t) and thus do not contain a vowel of their own that could carry an accent. 24 25 Among the overall 271 negations realized with an accent, the most common accent 26 27 28 category was L* (N = 119, i.e. 16% overall), while all other accents constituted overall 29 30 20% of the realizations (N = 152). 31 32 Compound nouns in the English data carried L* accents most of the time (N = 683, 89% 33 34 35 overall), but even more so in HiNQs than in LowNQs. Among the compound nouns not 36 37 realized with the L* accent, most were unaccented (N = 61 or 8% of all realizations), 38 39 while other accent categories constituted about 3% of all realizations (N = 20). 40 41 42 Among the main verbs, 34% had a low accent (L*, N = 258), 26% a high accent (H*, N = 43 44 200), 18% a rising accent with a delayed peak (L+H*<, N = 140), 14% no accent (N = 45 46 108), while all other accents together comprised 8% of the data (N = 58). No significant 47 48 difference appeared between HiNQs and LowNQs. 49 50 51 Turning to the German data, overall 33% of the negations were unaccented (N = 253). 52 53 However, while unaccented negations formed the largest category in HiNQs (54%, N = 54 55 216), they made up only overall 10% of productions with LowNQs (N = 37). Instead, 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 34 of 106

1 2 3 negations in LowNQs were most often realized with the rising accents L+H* (34%, N = 4 5 6 126) and L*+H (24%, N = 90), which in turn appeared for HiNQs in 22% (N = 87) and 7 8 4% (N = 16) of the cases, respectively. Note that this difference between HiNQs and 9 10 LowNQs cannot be explained in the same way as for English, since both negation words 11 12 are accentable in terms of their segmental make-up. 13 14 15 For German sentence-final compound nouns, the accent L*+H was the overall most 16 17 frequent realization for final compound nouns in both question types (N = 435, i.e. 57%), 18 19 though it accounted forFor 73% in PeerHiNQs and onlyReview for 39% in LowNQs (N = 290 vs. N = 20 21 22 145). Less frequent realizations of final compound nouns were L* accents (overall 20%, 23 24 N = 157), L+H* accents (overall 10%, N = 74), no accent (overall 7%, N = 55) and H* 25 26 accents (overall 6%, N = 45). 27 28 29 German verbs appeared most often with a high H* accent (N = 349, i.e. 46%) or a low L* 30 31 accent (N = 203, i.e. 26%), while unaccented realizations made up 12% (N = 90) and all 32 33 other accents combined 16% (N = 125). 34 35 In summary, accentuation of most constituents was influenced by morpho-syntactic 36 37 38 question type in English and German. In contrast to the results for German, the effect was 39 40 either small (utterance-final compounds) or had a straightforward explanation (negations 41 42 and auxiliaries) for the English data. The only significant effect of proposition checked – 43 44 45 albeit small – appeared for English auxiliaries, which were realized with L+H*< accents 46 47 (vs. unaccented realizations) more often when checking the addressee’s proposition. No 48 49 effects of certainty and no interaction between the two factors appeared in either 50 51 52 language. 53 54 55 4. Summary and Discussion 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 35 of 106 Language and Speech

1 2 3 In this section, we first examine the hypotheses in light of the results of our experimental 4 5 6 study that orthogonally manipulated checked proposition and speaker certainty for 7 8 negative polar questions in English and German (section 4.1). Next, we compare our 9 10 participants’ productions to previous descriptions of the prosody of negative polar 11 12 questions in these languages (section 4.2). Finally, we relate our results to accounts of 13 14 15 intonational meaning (section 4.3) before ending with a conclusion (section 4.4). 16 17 18 4.1 Evaluation of hypotheses 19 For Peer Review 20 21 22 Our study was designed to test the following hypotheses: 23 24 25 26 27 1. Speakers mark which proposition is being checked by their question. German uses 28 29 question type, English uses prosody. 30 31 a. English speakers use HiNQs both to check the speaker’s proposition p and 32 33 34 to check the addressee’s proposition p (e.g. Ladd, 1981). 35 36 b. English speakers use prosody, in particular final intonation, to mark the 37 38 checked proposition (suggestion in Domaneschi et al., 2017). 39 40 41 c. German speakers use HiNQs to check their own proposition p and 42 43 LowNQs to check the addressee’s proposition p (Büring & Gunlogson, 44 45 2000) 46 47 48 2. The speaker’s level of certainty regarding the checked proposition influences the 49 50 prosodic realization of the question (e.g. Pheby, 1975, for German; for Romance 51 52 languages Armstrong, 2017; Armstrong & Prieto, 2015; Prieto & Borràs-Comes, 53 54 55 2018; Michelas et al., 2016). 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 36 of 106

1 2 3 4 5 6 In line with hypothesis 1, we found experimental support for ‘Ladd’s ambiguity’: 7 8 Speakers of English use negative questions with the negation in a high position (HiNQ, 9 10 e.g. Aren’t they adding a menu item?) both to check their own belief that the proposition 11 12 p is true (e.g., p = they are adding a menu item) and their addressee’s belief that it is not 13 14 15 (Ladd, 1981). In our data, speakers of Western Canadian English most frequently chose 16 17 this question type over the alternative with the negation in a lower position (LowNQ, e.g. 18 19 Are they not adding a Formenu item? Peer) in both cases, Review supporting hypothesis 1a. Note that 20 21 22 while HiNQs constituted the majority of responses (62% overall), LowNQs still appeared 23 24 frequently (38% overall). This may to some degree be related to the fact that participants 25 26 made a binary choice between these two options. However, two facts speak against this 27 28 29 interpretation: First, according to Romero and Han (2004), while HiNQs obligatorily 30 31 convey a question bias, LowNQs may be used as neutral questions, but can also be used 32 33 in a biased context. Second, Domaneschi et al. (2017), who gave their participants five 34 35 different response options, still found 26% LowNQ choices and observed a similar 36 37 38 percentage of HiNQ responses (67%) in the same context, i.e. where the speaker had a 39 40 previous bias towards p and the addressee expressed a bias for p. Note also that all the 41 42 43 other response options in Domaneschi et al. (2017), which included the possibility for 44 45 speakers to freely formulate their own question response, together constituted less than 46 47 10% of responses in this context, suggesting that providing a binary choice between 48 49 HiNQ and LowNQ in the present study was appropriate. A search of the 560-million- 50 51 52 words Corpus of Contemporary American English (COCA; Davies, 2008, 2009) for text 53 54 strings indicating the use of HiNQs and LowNQs suggests that in spontaneous 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 37 of 106 Language and Speech

1 2 3 conversations and naturally-occurring written texts, HiNQs are relatively more frequent 4 5 6 than in these experimental contexts, which were specifically constructed to elicit biased 7 8 questions, but LowNQs also constitute between 8% and 43% of negative polar questions, 9 10 depending on the subject pronoun (Table 2; note that sentences with full noun phrases, 11 12 demonstratives etc. as subjects were not counted; also, the search was restricted to strings 13 14 15 including present tense forms of the auxiliary/verb to be). 16 17 18 19 For Peer Review 20 Table 2. String frequencies for LowNQs and HiNQs similar to those occurring in the 21 22 present studies with various subject pronouns in COCA (Davies, 2008, 2009), with 23 24 searched strings given in brackets, and relative percentage of HiNQs, calculated as 25 26 27 number of HiNQs * 100 / (number of HiNQs + number of LowNQs), rounded. 28 29 30 Subject pronoun Number of LowNQs Number of HiNQs % HiNQs 31 32 33 I 323 (am I not) 352 (aren’t I) + 69 (ain’t I) 57 34 35 36 37 You 447 (are you not) 5103 (aren’t you) 92 38 39 40 He 204 (is he not) 1208 (isn’t he) 86 41 42 43 She 77 (is she not) 658 (isn’t she) 90 44 45 46 47 It 1372 (is it not) 10974 (isn’t it) 89 48 49 50 We 311 (are we not) 1158 (aren’t we) 79 51 52 53 54 They 300 (are they not) 1445 (aren’t they) 83 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 38 of 106

1 2 3 4 SUM 3034 20967 87 5 6 7 8 9 Supporting hypothesis 1b, we further found that speakers of English prosodically 10 11 disambiguated the checked proposition, expanding a suggestion by Domaneschi et al. 12 13 14 (2017). Participants ended their questions with higher f0 rises more often when checking 15 16 the addressee’s proposition than when checking their own. In addition, an effect of 17 18 proposition checked also appeared for the accentuation of English auxiliaries (namely, 19 For Peer Review 20 21 more L+H*> accents when checking p than when p), although it was small (see below 22 23 for more detailed information on the prosodic marking). The importance of the difference 24 25 in accentuation should therefore not be overstated, but see below for some discussion of 26 27 how this finding could be related to accounts of intonational meaning. 28 29 30 In line with hypothesis 1c, we found that German differs from English with respect to the 31 32 morpho-syntactic marking of the checked proposition: German speakers chose a majority 33 34 of HiNQs when checking their own proposition p, but uttered LowNQs more frequently 35 36 37 to check their addressee’s proposition that p, in line with the previous literature (Büring 38 39 & Gunlogson, 2000). The prosodic realization, however, did not directly encode 40 41 proposition checked in German. Instead, it was best explained by the morpho-syntactic 42 43 44 question form: Unaccented negations constituted the largest group in HiNQs, whereas 45 46 negations in LowNQs were most frequently realized with L*+H accents. L*+H accents 47 48 on the sentence-final compound nouns were more frequent in HiNQs than in LowNQs. 49 50 51 Additionally, HiNQs were more often realized with a high final f0 rise (boundary tone H- 52 53 ^H%) than LowNQs. Thus, while Figure 10 above shows the typical intonational 54 55 realization of a HiNQ, Figure 11 illustrates the accents and boundary tone found most 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 39 of 106 Language and Speech

1 2 3 frequently for German LowNQs. Note, however, that German LowNQs were 4 5 6 intonationally more variable than their HiNQ counterparts: While there were only three 7 8 LowNQs that had the intonational pattern shown in Figure 11 (about 1% of German 9 10 LowNQs), Figure 10 is representative of 99 realizations that had the most frequent 11 12 intonational realizations for all constituents (25% of German HiNQs). Still, the picture 13 14 15 that emerges is that for the German participants, their choice of morpho-syntactic 16 17 question type largely determined, or at least restricted, options for the prosodic 18 19 realization. This contrastsFor with English,Peer where Review accentuation was also affected by the 20 21 22 choice of question type, but these effects were either very small or can simply be 23 24 explained through the availability of syllable nuclei for accentuation, as discussed below. 25 26 27 [insert Figure 11] 28 29 30 31 32 Summing up the evaluation of hypothesis 1, our results suggest that in German, the 33 34 distinction between checking the speaker’s own epistemic bias for the proposition p and 35 36 37 checking the addressee’s bias for p is encoded syntactically via the choice of HiNQ vs. 38 39 LowNQ and this morpho-syntactic choice then largely determines the prosodic 40 41 realization. In English, by contrast, the same morpho-syntactic form (HiNQ) is uniformly 42 43 44 preferred, and the checked proposition is directly prosodically encoded via accentuation 45 46 of the auxiliary and the final boundary tone (rising accent on auxiliary and high boundary 47 48 tone when checking the addressee’s proposition, unaccented auxiliary and less high 49 50 boundary tone when checking the speaker’s own). Thus, both languages reflect the 51 52 53 distinction between double-checking the speaker’s own bias and double-checking the 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 40 of 106

1 2 3 addressee’s bias. Prosody is influenced directly by proposition checked in English, and 4 5 6 follows from the choice of morpho-syntactic question type in German. 7 8 In hypothesis 2, we hypothesized that certainty would play a role. In our data, there was 9 10 no main effect of certainty in either language. However, certainty played a role in that it 11 12 interacted with the marking of proposition checked in some cases: In particular, it 13 14 15 affected only the choice of morpho-syntactic form in German and only the prosodic form 16 17 in English. Specifically, regarding morpho-syntactic marking in German, speakers’ 18 19 choice of HiNQ vs. LowNQFor further Peer interacted Review with their level of certainty regarding the 20 21 22 proposition they checked with their question: When participants were 90% certain (that 23 24 their own belief or that of their addressee was true), they chose their preferred question 25 26 type – HiNQ for checking p, LowNQ for checking p – more often than when they were 27 28 29 only 60% certain. 30 31 In English, where checked proposition did not affect the choice of question type, 32 33 certainty did not have an effect on choice of question type, either. Instead, the interaction 34 35 36 between proposition and certainty appeared in the prosodic realization of the questions in 37 38 our English data: The effect of checked proposition on use of utterance-final high-rising 39 40 boundary tones got boosted when participants were 90% certain compared to when they 41 42 43 were 60% certain. However, statistical modelling did not confirm a significant 44 45 interaction with certainty for the effect of proposition on the accentuation of auxiliaries, 46 47 though Figure 8 shows that for HiNQs, the difference between the two propositions was 48 49 numerically larger for 90% certainty than for 60% certainty. 50 51 52 Thus, for both languages, certainty was not itself encoded prosodically or in the choice of 53 54 morpho-syntactic question type, but it strengthened the preference for encoding the 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 41 of 106 Language and Speech

1 2 3 checked proposition. Thereby, our results contrast with the observed prosodic marking of 4 5 6 speaker certainty, confidence or commitment reported in many Romance languages, e.g. 7 8 for Bari Italian (Grice & Savino, 1995, 2003), Majorcan Catalan (Vanrell et al., 2013), 9 10 French (Michelas et al., 2016), Puerto Rican Spanish (Armstrong, 2017), Central Catalan 11 12 (Prieto & Borràs-Comes, 2018). Whether or not these results indicate a contrast between 13 14 15 the West- English and German on the one hand and the Romance 16 17 languages on the other hand is presently unclear due to differences in study design. 18 19 Whereas we treated checkedFor proposition Peer and Review certainty as two separate factors crossed in 20 21 22 our study design, previous studies often treated speaker certainty as situated on a gradient 23 24 (explicitly, e.g. Armstrong, 2017), with the checking of the speaker’s or addressee’s 25 26 proposition following from the speaker’s certainty of their own proposition.7 8 In this 27 28 29 context, the results of the third study reported in Romero, Arnhold, Braun, and 30 31 Domaneschi (2017) might be illuminating: In a modified version of the present study 32 33 investigating only the choice of morpho-syntactic question type, the speaker’s and 34 35 addressee’s original beliefs were flipped such that the speaker originally held a bias for 36 37 38 p, whereas the addressee’s bias was for p. As in the present study, participants’ choice 39 40 of question type was significantly influenced by the checked proposition, however, there 41 42 43 was no effect of certainty and no interaction. These results do not support an epistemic 44 45 gradient ranging from the speaker being totally sure of their own proposition to the 46 47 speaker being totally convinced of their addressee’s proposition. Instead, they seem to 48 49 suggest that the primary factor, at least for English and German, is the checked 50 51 52 proposition – as well as the presence of biases for specific propositions in the first place 53 54 (e.g. Domaneschi et al., 2017). Thus, the combined results of the present study, the third 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 42 of 106

1 2 3 study in Romero et al. (2017) and Domaneschi et al. (2017) suggest that, in line with the 4 5 6 previous literature, the prosody of polar questions is influenced by (i) syntactic form or 7 8 question type (e.g. HiNQ vs. positive polar question), (ii) pragmatic function (neutral 9 10 information-seeking use vs. biased questions seeking reassurance/confirmation or 11 12 13 expecting a given answer) and (iii) the propositional content – p vs. p – checked by the 14 15 question and/or whether the propositional content is ‘owned’ by the speaker or the 16 17 addressee. Various combinations of these factors are signalled with different morpho- 18 19 For Peer Review 20 syntactic and/or prosodic polar question forms. By contrast, at least for the West- 21 22 Germanic languages investigated here, speaker certainty or commitment does not seem to 23 24 be marked per se, but can only increase the marking of one of the other factors.Should 25 26 future studies confirm the apparent contrast between West-Germanic and Romance 27 28 29 languages, a possible explanation could follow from the fact that Romance languages 30 31 tend to be pro-drop languages: While interrogativity can be marked by subject 32 33 inversion/dislocation, this is not possible in sentences without overt expression of 34 35 36 subjects. As a result, sentences without overt subjects, which are frequent, must be 37 38 marked as statements or questions by prosody (Escandell-Vidal, 1998, 1990; Grice & 39 40 Savino, 2003; Vanrell et al. 2013), and the same can be argued to apply to question bias 41 42 43 and certainty. This could be the reason for intonation playing a more important role in 44 98 45 this . However, the opposite could also be speculated to be true, so that 46 47 in a language where intonation is already used to mark several distinctions, prosody may 48 49 not be available to mark a further distinction. In this respect, it might be noteworthy that 50 51 52 interactions with certainty in our data were clear for German, which allows morpho- 53 54 syntactic marking of the checked proposition, and somewhat less pervasive for English, 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 43 of 106 Language and Speech

1 2 3 which relies exclusively on prosody for this purpose. A third alternative would be that 4 5 6 Romance and Germanic languages simply differ in the semantic/pragmatic distinctions 7 8 they ‘choose’ to encode grammatically, in the same way that some languages 9 10 morphologically mark aspect or evidentiality and others do not. 11 12 13 14 4.2 Prosody of negative polar question in English and German 15 16 17 18 Before we address the wider questions of the semantics of intonation and its interaction 19 For Peer Review 20 with syntax, a comment is due regarding the exact prosodic form of the English polar 21 22 questions in our data. Recall that hypothesis 1b remained vague with respect to how 23 24 proposition checked would be marked prosodically. The prosodic realizations in our data 25 26 27 only partially corresponded to descriptions in the literature. Most importantly, while the 28 29 discussion of utterance-final pitch movements in the literature has centered on rises vs. 30 31 falls (see introduction), final pitch falls were virtually absent from our data. Instead, 32 33 34 English negative polar questions in our data overwhelmingly ended in pitch rises, with 35 36 85% of utterances having final high H-H% boundary tones and 89% of preceding final 37 38 accents being low L*. A likely explanation for the differences in final pitch movement 39 40 41 are regional differences, in particular the fact that Canadian English is a so-called ‘uptalk’ 42 43 in which many statement utterances end in a high boundary tone (James, Mahut, 44 45 & Latkiewicz, 1989; Shokeir, 2008; Talla Sando, 2009): It is possible that the final rises 46 47 in Domaneschi et al.’s (2016) British English data (and potentially those in the American 48 49 50 English data evaluated by Hedberg & Sosa, 2002, and Nilsenová, 2006) correspond to 51 52 high rises in our Western Canadian data, while the absence of a high rise in Western 53 54 Canadian English may be equivalent to a fall when marking the checked proposition in a 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 44 of 106

1 2 3 negative polar question in British (and American) English. This would mean that 4 5 6 Canadian English preserves the distinction between final rises and falls reported for 7 8 question intonation in other varieties as a distinction in final pitch height: In the relative 9 10 absence of pitch falls due to uptalk, even utterances with final rises may be perceived as 11 12 prototypical ‘statement intonation’ (in its ‘uptalked’ variant) as long as the rise is not too 13 14 15 high, while only high final rises would thus constitute prototypical ‘question intonation’ 16 17 appropriate for checking the addressee’s proposition (future perception experiments to 18 19 determine the boundaryFor between Peer the two categories Review are desirable). Thus, we predict that 20 21 22 our findings can be extended to British and non-uptalk varieties of American English: We 23 24 hypothesize that in these varieties, HiNQs checking the speaker’s proposition p are 25 26 realized with final f0 falls and HiNQs checking the addressee’s proposition p are 27 28 29 realized with a final f0 rises. 30 31 Alternatively, it is possible that our observation of a prosodic disambiguation of Ladd’s 32 33 ambiguity is a peculiarity of Western Canada that does not extend to other varieties of 34 35 36 English. This does, however, seem less likely both in view of the extended discussion 37 38 regarding the potential functions of final rises and falls in British and American English 39 40 summarized in the introduction, and based on Ladd’s (1981) original article, in which he 41 42 43 discusses the role of prosody in disambiguating the scope of negation and interrogation in 44 45 tag questions. Furthermore, with the exception of the final boundary tone, the prosodic 46 47 form of HiNQs in our data corresponded quite closely to previous descriptions of 48 49 American English. As described by Hedberg and Sosa (2002) and Romero and Han 50 51 52 (2001, 2002), we mostly found a rising accent on the auxiliary (L+H*< in our data, L+H* 53 54 in theirs). As described by Romero and Han (2001, 2002) this was most frequently 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 45 of 106 Language and Speech

1 2 3 followed by a final low L* combined with an H-H% high boundary tone, forming what is 4 5 6 generally described as the unmarked question contour (but note the variety of nuclear 7 8 contours reported by Hedberg & Sosa, 2002). 9 10 In a similar vein, it is interesting to note that even though both final pitch rises and final 11 12 falls have been described as frequent realizations of German polar questions, with various 13 14 15 interpretations suggested for the difference (cf. introduction), final falls were in fact rare 16 17 in our German data, as well. Altogether 74% of the utterances carried high H-^H% or H- 18 19 % boundary tones andFor the most Peerfrequent accent Review category for the sentence-final compound 20 21 22 noun was the L*+H rise, followed in frequency by the low L* accent. The final fall-rise 23 24 contour described for biased polar questions by Pheby (1975) was largely absent: This 25 26 fall-rise contour would either correspond to a GToBI annotation with a combination of a 27 28 29 final high H* or rising L+H* accent and a rising boundary tone L-H% or to a 30 31 combination of a final falling H+L* or H+!H* accent and a high H-^H% or H-% 32 33 boundary tone. However, L-H% boundary tones occurred in only 11% of the utterances, 34 35 while final compound nouns were never realized with H+L* accents and only very rarely 36 37 109 38 with H+!H* accents (less than 1% of the data). In fact, the prevalence of final rises in 39 40 our data corresponds very well with their frequency in Kohler’s (2005b) corpus of 41 42 spontaneous spoken appointment-making dialogues in German: Of 121 polar questions, 43 44 45 he found 39% to have a final high rising contour, 30% to have a final low rising contour, 46 47 and only 21% to end in falls and 10% in other contours. Thus, it seems likely that the 48 49 intonation contours in our data set are representative of German prosody more generally, 50 51 52 so that the present findings shed light on the previously under-described prosodic 53 54 differences between German HiNQs and LowNQs. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 46 of 106

1 2 3 4 4.3 Intonational meaning 5 6 7 Relating our findings to proposals of intonational meaning, we are first going to discuss 8 9 10 results regarding utterance-final pitch movements in English and German before we turn 11 12 to the realization of accentuation in both languages. 13 14 The utterance-final pitch movements in our English data can be quite straightforwardly 15 16 17 related to the literature if we are correct that, as suggested above, only the high final rises 18 19 in our Western CanadianFor data correspondPeer to Reviewrises in other English varieties, whereas the 20 21 lower final rises are equivalent to falls in other varieties discussed in the literature. If this 22 23 equivalence is correct, the high rises our participants used preferably to check the 24 25 26 addressee’s proposition p correspond to H-H% in other varieties, a boundary tone 27 28 combination that according to Steedman (2008) directly attributes the responsibility for 29 30 the propositional content to the addressee. By contrast, the less high or low boundary 31 32 33 tones that participants used most often to check their own proposition p (annotated as 34 35 lower H-H% or the less frequent H-L% or L-L%) would correspond to L-L% in other 36 37 varieties and thus attribute the responsibility for the propositional content to the speaker 38 39 40 according to Steedman (we disregard L-H% boundary tones here and below, as they 41 42 constituted only 2% of our data overall). Thus, our findings show very good agreement 43 44 with his account (and related accounts by Gunlogson, 2001, and Escandell-Vidal, 1998, 45 46 47 1999, 2017), as the contrast between double-checking the speaker’s vs. the addressee’s 48 49 proposition dovetails with attributing responsibility for the propositional content to the 50 51 speaker vs. the addressee. 52 53 The correspondence with Pierrehumbert and Hirschberg’s (1990) account is slightly less 54 55 56 clear. According to them, as well as Bartels (1997b, 2013), H-H% points towards the 57 58 59 60 https://mc.manuscriptcentral.com/las Page 47 of 106 Language and Speech

1 2 3 relevance of an upcoming utterance for interpreting the current utterance (in the case of 4 5 6 questions like in our data, the upcoming utterance is the expected answer of the 7 8 addressee). By contrast, L-L% (which would correspond to the lower final rises in our 9 10 Western Canadian data) does not signal a dependence on an upcoming utterance 11 12 according to their proposal. At first blush, it is not easy to see why negative polar 13 14 15 questions checking the speaker’s proposition p and the addressee’s proposition p should 16 17 differ in this respect: Both seem to convey a similar ‘forward reference’, to use 18 19 For Peer Review 20 Pierrehumbert and Hirschberg’s term, to the upcoming expected response of the 21 110 22 addressee. 23 24 However, an alternative interpretation could be constructed based on Bartels’ (1997b, 25 26 2013) observation that polar questions with final falling intonation are similar to 27 28 29 alternative questions in signaling that the speaker is only interested in a yes or no answer, 30 31 whereas other answers might be more acceptable when polar questions bear final rising 32 33 intonation. For example, Do you like it? with a fall is similar to Do you like it or not?, 34 35 36 whereas a realization with a rise seems to be more encouraging of the addressee 37 38 presenting other information. In the semantic-pragmatic literature, Bartels’ description 39 40 has recently been further developed by Westera (2017, also cf. Bäuerle, 1979, Biezma & 41 42 43 Rawlins, 2012). He proposes that polar questions with final falling pitch signal ‘thematic 44 45 exhaustivity’ whereas the same questions with rising pitch signal that something else, 46 47 another answer besides yes or no, may be relevant. When combined with Roberts’ (2012) 48 49 discourse structure framework using questions under discussion (QUDs), this means that 50 51 52 the function of rising intonation is to signal that the current QUD has a sibling in the 53 54 QUD-tree. For example, asking Are you American? with a rise signals that other parallel 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 48 of 106

1 2 3 questions like Are you Canadian? are sub-parts of the same mother-QUD (e.g. What is 4 5 6 your nationality?), meaning that the addressee is invited to already answer such questions 7 8 and that the speaker may in fact ask them next if receiving only a yes or no answer. Such 9 10 a question could for example be asked to start a conversation at a party. By contrast, a US 11 12 immigration officer would typically ask the same question with falling pitch, which 13 14 15 signals that only a yes or no answer is to be given and announces that other nationalities 16 17 are irrelevant (i.e. sibling QUDs like Are you Canadian? are not relevant). Bringing this 18 19 back to our results, oneFor could suggest Peer that our Review English-speaking participants preferably 20 21 22 used final high rises when checking the addressees’ proposition p to signal that they 23 24 were not only interested in double-checking the proposition itself, but also in further 25 26 information, for example exactly how much credence the addressees put in their 27 28 29 proposition p. By contrast, when participants double-checked their own proposition p, 30 31 they may have used the lower final rises (which we assume correspond to falls in other 32 33 varieties of English) to signal that they were indeed only interested in the truth of the 34 35 36 proposition itself, and ideally in the addressee confirming that the speaker was right all 37 38 along. 39 40 Turning now to sentence-final pitch movements in the German data, a match with 41 42 43 accounts of intonational meaning is not completely straightforward. As summarized in 44 45 the introduction, various, partially overlapping interpretations have been proposed for the 46 47 contrast between rises and falls (as well as specific meanings for fall-rises and rise-falls). 48 49 50 However, there is little explicit discussion of the contrast between high rises and less- 51 52 high rising final plateaus, annotated in GToBI as H-^H% vs. H-% boundary tones 53 54 combined with a L*(+H) nuclear accent, which was the most salient contrast in our data. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 49 of 106 Language and Speech

1 2 3 However, Grice and Baumann’s (2002) and Grice et al.’s (2005) tables illustrating 4 5 6 common nuclear contours for German show the combination L* H-^H%, which likely 7 8 corresponds to the final high rises annotated as L*+H H-^H% in our data, as typical for 9 10 neutral polar questions (or echo questions), see also Braun, Dehé, Neitsch, Wochner and 11 12 Zahner (2018). While they caution against interpreting their examples as directly 13 14 15 illustrating intonational meaning and do not show examples of questions with a L*(+H) 16 17 H-% contour, it could maybe be concluded that polar questions with less high rising 18 19 plateaus, simply by virtueFor of not Peer being neutral Review high rises, would convey some kind of 20 21 22 bias (in Braun et al. 2018, these contours were used to convey a rhetorical question 23 24 meaning). 25 26 In our German data, the distinction between H-^H% vs. H-% boundary tones when 27 28 29 following L*+H nuclear accents correlated with the proposition checked by the question. 30 31 Although the primary marking of this contrast was morpho-syntactic, HiNQs (typically 32 33 checking the speaker’s proposition p) were more frequently realized with H-^H% than 34 35 36 LowNQs (checking p), so that each checked proposition was indirectly associated with 37 38 a prosodic form. We are unable to say with certainty whether proposition checked was a 39 40 factor that independently affected morphosyntactic question form and prosody or whether 41 42 43 proposition checked affected morphosyntactic form, which in turn affected prosodic 44 45 realization. However, since morphosyntax and prosody are not independent of each other, 46 47 it is almost impossible to discuss intonational meaning for our negative question forms . 48 49 What is interesting, though, is that in contrast to the English data, higher final rises were 50 51 52 more strongly associated with checking the speaker’s proposition p than with checking 53 54 addressee’s proposition p. The reason for this difference could be that in contrast to 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 50 of 106

1 2 3 English, proposition checked is already marked by the morpho-syntactic question type in 4 5 6 German. Therefore, prosody can be used for other purposes in German. When checking 7 8 their own proposition p with a HiNQ, speakers mostly used a clear final high rise 9 10 (L*(+H) H-^H%) signaling neutrality and thus openness towards the addressee’s position 11 12 (strongly fragehaltig in Batliner’s, 1989, terms). By contrast, when checking the 13 14 15 addressee’s proposition p with a LowNQ, speakers mostly used the less high rising H- 16 17 % boundary tones to signal lack of neutrality and that they were ready to accommodate 18 19 For Peer Review 20 the addressee’s position. This could be related to previous research describing question 21 22 prosody in terms of politeness as summarized in the introduction. 23 24 Next, our results regarding accent realizations can be related to compositional accounts of 25 26 intonational meaning. Starting again with the English data, the only constituent that 27 28 29 showed a significant effect of one of our manipulated variables, proposition checked, was 30 31 the auxiliary. Here, the most frequent realization was an L+H*> accent, which contrasted 32 33 with the next-most frequent category, unaccented realizations. In addition to the small, 34 35 36 but significant effect of proposition checked (more L+H*> when checking the 37 38 addressee’s proposition p than when checking the speaker’s proposition p), we 39 40 observed larger effect of question type (more L+H*> in HiNQs than in LowNQs). Before 41 42 43 discussing these effects, some comments are due on the meaning conveyed by the choice 44 45 of L+H*(>) as the most frequent accent type. 46 47 Pierrehumbert and Hirschberg (1990) state that rising accents (L+H* and L*+H) convey 48 49 50 the salience of a scale, with L+H* marking that the accented item is to be believed 51 52 instead of believing a related item on a scale (i.e. a related item in a set of alternatives in 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 51 of 106 Language and Speech

1 2 3 the sense of Rooth, 1985, 1992). This explains why these accents are typically used in 4 5 6 corrections or to mark contrasts, as in their example of this naturally occurring exchange: 7 8 (6) A: It’s awfully warm for January. 9 10 B: It’s even warm for December. 11 12 L+H* L H% 13 14 15 (from Pierrehumbert & Hirschberg, 1990, p. 296) 16 17 By realizing the word December with a L+H* rising accent, B invokes the scale 18 19 contrasting months ofFor the year andPeer signals that Review she is making a predication (i.e. not 20 21 22 January, but December). Since in our study, the truth of the proposition as a whole was 23 24 under discussion, it makes sense that the L+H*< accent frequently appeared on the 25 26 auxiliary and negation complex. Prominent accents on the finite verb are typically taken 27 28 29 to mark either polarity focus (Goodhue, 2018; Rooth, 1992) or verum focus (see Höhle, 30 31 1992; Turco, Dimroth, & Braun, 2013). Following Pierrehumbert and Hirschberg’s 32 33 account, the use of L+H*(<) on auxiliaries in our data could thus be described as the 34 35 36 speaker making salient the scale {p, p} (for polarity focus) or the scale {‘you are totally 37 38 sure that p’, ‘you are quite sure that p’, …} (for verum focus). 39 40 Taking this meaning of the L+H*< accent as a starting point, two explanations can be 41 42 43 given for the effect of question type (L+H*< accents on the auxiliary were more frequent 44 45 in HiNQs than in LowNQs). First, in both HiNQ and LowNQ, this accent marks focus on 46 47 the polarity, namely on the negative polarity, of the sentence. Thus, both the auxiliary 48 49 50 complex (i.e. Aux + n’t) and the negation could be accented to mark this polarity focus. 51 52 However, segmental phonology favours the auxiliary over the negation as an accent 53 54 location for HiNQs, but not for LowNQs: As already mentioned above, the negation is 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 52 of 106

1 2 3 reduced to a clitic in English HiNQs, whereas it is a full prosodic word in LowNQs 4 5 6 (Aren’t they…? vs. Are they not…?). Consequently, negations were never accented, but 7 8 their host auxiliaries regularly carried accents in HiNQs, whereas in LowNQs, negations 9 10 almost always carried an accent, but the preceding auxiliaries did not. 11 12 Second, Romero and Han (2002) suggest that preposing the auxiliary introduces a VERUM 13 14 15 operator, i.e. that English HiNQs always express a verum focus, whereas English 16 17 LowNQs do not (which also explains why they are overall preferred in situation like 18 19 those in our study, whereFor speakers’ Peer and addressees’ Review biases conflict). According to this 20 21 22 interpretation, the observed difference in accent realization between HiNQs and LowNQs 23 24 would actually be meaningful in that the L+H*< accents preferred on auxiliaries would 25 26 mark verum focus in HiNQs, whereas LowNQs rarely showed rising accents on 27 28 29 auxiliaries and thus would not mark verum focus. In line with this idea, negations in 30 31 LowNQs, which were preferably accented instead of auxiliaries, carried various types of 32 33 accents, including some realizations as L+H*, but the most frequent category in this 34 35 context was the non-contrastive L* accent. 36 37 38 Regarding the effect of proposition checked, the fact that L+H*> accents were slightly 39 40 more frequent when checking the addressee’s proposition p than when checking their 41 42 43 own proposition p could receive various explanations, but at this point, we do not see one 44 121 45 that is clearly superior. However, it is important to remember that the effect was small 46 47 and did not interact with question type. Thus, participants overwhelmingly responded by 48 49 using HiNQs, in which they mostly realized L+H*> accents on the auxiliaries. 50 51 52 Turning to the accentuation of other constituents, our data fit well with Pierrehumbert and 53 54 Hirschberg’s (1990) suggestion for the compositional meaning of accents: Simple L* 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 53 of 106 Language and Speech

1 2 3 accents were most frequently used on verbs and sentence-final compound nouns in our 4 5 6 data. L* accents, according to Pierrehumbert and Hirschberg, serve to highlight a 7 8 constituent, but explicitly exclude it from the speaker’s predication, which is typical of 9 10 canonical questions, where the speaker does not make a statement but asks the addressee 11 12 to make one instead. 13 14 15 In contrast to Pierrehumbert and Hirschberg’s account, the accentuation of HiNQs in our 16 17 English data may be harder to reconcile with Steedman’s (2000, 2008) account. He 18 19 suggests that L+H* accents,For which Peer were the Reviewmost frequent realization of auxiliaries, mark 20 21 22 themes that are mutually believed by the speaker and the hearer, i.e. uncontentious, 23 24 whereas L*+H accents appear on themes that are not marked as agreed between the 25 26 speaker and the hearer. As there was clearly disagreement in the dialogues in our study, 27 28 29 the occurrence of L+H*(<) accents would be unexpected, apart from the fact that it is 30 31 questionable whether auxiliaries can be themes. His statement that L* accents mark 32 33 rhemes that are not marked as agreed between speaker and hearer could fit with their 34 35 occurrence as the most frequent realization of compound nouns, if the accent on this 36 37 38 sentence-final constituent is assumed to project a larger rhematic constituent. 39 40 Finally, regarding accentuation in the German data, it seems fitting that our German 41 42 participants mostly used LowNQs to check the addressee’s proposition and in turn mostly 43 44 45 realized the negation in these LowNQs with the rising accents L+H* or L*+H, accents 46 47 that according to Pierrehumbert and Hirschberg (1990) invoke a scale, and frequently a 48 49 contrast. As the negation is the part of the utterance that was in question in these short 50 51 52 dialogues, it makes sense that speakers chose to mark it as contrastive. While both types 53 54 of rising accents can be used to accomplish this, German speakers employed the L+H* 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 54 of 106

1 2 3 accent slightly more frequently than the L*+H accent (note that we did not statistically 4 5 6 evaluate the relative frequency of these two accent types since the main contrast in the 7 8 realization of German negations was between accented and unaccented). Thus, they used 9 10 the same accent type that English speakers used primarily on auxiliaries in the same 11 12 13 condition, i.e. when checking the addressee’s proposition p. Interestingly, in contrast to 14 15 the English data, speakers also used a sizable amount of L*+H accents, which according 16 17 to Pierrehumbert and Hirschberg in contrast to L+H* “convey a lack of predication” 18 19 For Peer Review 20 (1990, p. 296). While Pierrehumbert and Hirschberg’s examples for both accent types 21 22 come only from statements, they discuss the contrast between H* vs. L* accents in polar 23 24 questions, which likewise contrast with respect to the speaker making a predication in 25 26 statements: The H* accent typically involves predication – the speaker marks the 27 28 29 constituent realized with H* as new and signals that the proposition resulting from 30 31 inserting this constituent is to be added as a mutual belief–, while the L* exempts the 32 33 constituent bearing it from the predication. Here, they state that in morpho-syntactic polar 34 35 36 questions, is less striking than for statements. However, they remark that H* accents 37 38 frequently appear in confirmation questions, i.e. when the speaker believes the answer 39 40 will affirm the semantic nucleus of the question (in their example, the expectation is of a 41 42 43 yes answer to a positive polar question). Both marking and not marking such an 44 45 expectation seem appropriate in the contexts we provided to our participants. 46 47 Importantly, both rising accents highlight the polarity contrast (p vs. p) as expected in 48 49 50 the context of our study. 51 52 To sum up this discussion, we found several matches between our results and 53 54 compositional accounts of meaning suggested in the literature – taking into account the 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 55 of 106 Language and Speech

1 2 3 morpho-syntactic marking of our data as questions where appropriate. In particular, the 4 5 6 effect of proposition checked on the use of high-rising utterance-final pitch in our English 7 8 data fits well with the meaning suggested for H-H% boundary tones in the literature 9 10 (attribution of responsibility according to Steedman, 2008, or marking an interest in 11 12 further information according to Bartels, 1997b, 2013, and others). In contrast, further 13 14 15 research is required on the meaning of the contrast between high and less high utterance- 16 17 final rises in German (H-^H% vs. H-% in GToBI). Regarding the meaning of accents, we 18 19 again found good agreementFor between Peer the frequent Review use of contrast-marking rising accents 20 21 22 on auxiliaries or negations in both our English and German and the literature (evocation 23 24 of a scale according to Pierrehumbert & Hirschberg, 1990). 25 26 27 28 4.4 Conclusion 29 30 31 32 To conclude, our study of English and German negative polar questions confirmed 33 34 experimentally that English HiNQs can be used to check both the speaker’s bias for the 35 36 proposition p and the addressee’s proposition p, whereas German marks this distinction 37 38 39 morpho-syntactically (Ladd, 1981; Büring & Gunlogson, 2000). Moreover, we found that 40 41 speakers of English frequently resolved this ambiguity within HiNQs prosodically by 42 43 marking the checked proposition through sentence-final pitch movements and 44 45 46 accentuation of the auxiliary (high rise boundary tone and rising accent on the auxiliary 47 48 for checking the addressee's proposition, normal-high or low boundary tones and 49 50 unaccented auxiliary for checking the speaker's proposition). By contrast, the prosody of 51 52 53 the German negative polar questions was largely dependent on the morpho-syntactic 54 55 choice, with HiNQs including the negation nicht ‘not’ generally carrying different 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 56 of 106

1 2 3 intonation contours than LowNQs containing the negation kein ‘no’. Speaker certainty 4 5 6 was not directly encoded, but affected the marking of the checked proposition, with 7 8 higher certainty making the marking more consistent where it occurred – prosodically in 9 10 English, morpho-syntactically in German. 11 12 13 14 Acknowledgements 15 16 17 18 We would like to thank Clara Huttenlauch, Katharina Zahner and Devon Gozjolko for 19 For Peer Review 20 help with data annotation, Clara Huttenlauch for help with data collection, Keely Morrow 21 22 23 for checking the English items and Oleksiy Bobrov and Achim Kleinmann for technical 24 25 support. We further thank Juhani Järvikivi for hosting our data collection at the Centre 26 27 for Comparative Psycholinguistics at the University of Alberta, as well as lab members 28 29 for practical support. 30 31 32 33 Funding statement 34 35 36 37 This work was supported by the German Research Foundation (DFG) [Priority 38 39 Programme 1727 “XPrag.de: New Pragmatic Theories based on Experimental Evidence”, 40 41 42 project “BiasQ: Bias in Polar Questions”]. 43 44 45 46 Notes 47 48 49 1 50 As already pointed out by Ladd, the ambiguity between English HiNQs checking p vs. 51 52 p is resolved when polarity items are present: Positive polarity items like some and too 53 54 indicate that the speaker is checking p, whereas negative polarity items like any and 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 57 of 106 Language and Speech

1 2 3 either indicate that they are checking p (e.g. Isn’t there some/any vegetarian restaurant 4 5 6 around here?). However, since disambiguating polarity items are frequently absent, 7 8 HiNQs often remain string-wise ambiguous. Additionally, Romero and Han (2004) note 9 10 that LowNQs can occasionally express original speaker bias and show the checking p/p 11 12 13 ambiguity. 14 15 2 Note that while we have translated the negative question with an English LowNQ, the 16 17 position of the negation cannot be determined in the German original since the object is a 18 19 For Peer Review 20 pronoun. 21 22 3 Pheby elsewhere describes a similar effect of pitch range also for the first case, i.e. the 23 24 final rise normally marking neutral questions. He states that when the final rise has a 25 26 27 large pitch range it can carry a ‘negative’ expectation, expressing speaker surprise and 28 29 potentially disbelief regarding the checked proposition (Pheby, 1975, p. 60; examples are 30 31 positive polar questions; also note that we have translated his use of the German 32 33 Implikation with ‘expectation’ here, since it is not clear whether he means ‘implicature’ 34 35 36 in the semantic sense). 37 38 4 As pointed out by a reviewer, two context descriptions in Grice and Baumann’s (2002) 39 40 table of common nuclear contours, which is repeated in Grice et al. (2005), seem related 41 42 43 to bias: contour 2, the rise-fall (late peak) L*+H L-% described as “selbstverständliche 44 45 Feststellung” (Grice and Baumann, 2002, p. 19) / “self-evident assertion” (Grice et al., 46 47 2005, p. 71) and contour 6a, the early peak H+!H* L-% described as “Bestätigung einer 48 49 50 bekannten Tatsache” (Grice and Baumann, 2002, p. 20) / “established fact” (Grice et al., 51 52 2005, p. 72), although Grice and colleagues state that “[t]he contexts provided in the table 53 54 contain pragmatic interpretations referring to specific examples; they should not be taken 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 58 of 106

1 2 3 as abstract meanings for given contours” (Grice et al., 2005, p. 70). The example 4 5 6 utterances for both contours, which are taken from previous literature, are statements in 7 8 both cases. As these contours seem to have been described almost exclusively for 9 10 statements, we do not provide an exhaustive review here, but for the former contour see 11 12 Pheby (1975, p. 147; 1984, p. 847, 881-882), who describes the rise-fall as infrequent, 13 14 15 but familiar and more obliging than a simple fall, and Féry (1993:93-96), who transcribes 16 17 it as L*HL and states that it has a meaning like ‘of course’; on the latter contour, the early 18 19 peak, see Féry (1993:103For-106), Peerwho transcribes Review it as H H*L and notes that it is frequently 20 21 22 used by television reporters, as well as several publications of perception studies by 23 24 Kohler and colleagues (e.g. Kohler, 1991, 2005a; Niebuhr, 2003). The only mention 25 26 regarding questions that we could find comes from Uhmann (1991:171-174), who 27 28 29 transcribes the rise-fall as T*+H T% (note that she consistently uses T instead of L to 30 31 mark low targets) and remarks that it was rare in her data, appearing exclusively on a few 32 33 precursor questions (Vorläuferfragen), although she remarks that a use on certain 34 35 declaratives is maybe possible. Neither of these two nuclear contours appeared in our 36 37 38 data. 39 40 5 These overall numbers are based on evaluating boundary tones and pitch accent types 41 42 on all constituents together, with the absence of a pitch accent treated as a pitch accent 43 44 45 type (“none”). Agreement regarding the presence vs. absence of a pitch accent was 92% 46 47 (κ = 0.84) for English and 91% (κ = 0.81) for German, while agreement regarding pitch 48 49 accent type for constituents where both annotators had marked the presence of a pitch 50 51 52 accent was 72% (κ = 0.53) for English and 70% (κ = 0.58) for German. Agreement 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 59 of 106 Language and Speech

1 2 3 regarding boundary tones was 58% (κ = 0.37) for English and 82% (κ = 0.70) for 4 5 6 German. 7 8 65 To test the validity of this distinction, we measured f0 height at the end of the utterance 9 10 (in semitones relative to 100Hz, measured 50ms from the end of the utterance to avoid 11 12 microprosodic effects; note that f0 was not measurable for 15 utterances, i.e. 2% of the 13 14 15 data). We then fitted a linear mixed-effects model to this data, using participant gender 16 17 and the binary classification into high rises and other final boundary tones as a predictor. 18 19 The resulting model (trimmed,For removingPeer 37 Reviewdata points or 4%) contained a fixed effect of 20 21 22 the binary classification into final high rises vs. other boundary tones showing that the 23 24 former were associated with significantly higher f0 measurements, β = 3.672, SE = 0.467, 25 26 t = 7.87, p < 0.001 (mean: 16.2st for high final rises vs. 11.6st for others). The best model 27 28 29 additionally contained a by-participant random slope of boundary tone classification. The 30 31 fixed effect of gender was also significant, indicating that compared to the female 32 33 participants, male participants produced overall significantly lower f0 values, β = - 34 35 12.114, SE = 1.205, t = -10.05, p < 0.001, as did the participant who identified their 36 37 38 gender as ‘other’, β = -8.530, SE = 2.942, t = -2.90, p = 0.008 (excluding data from this 39 40 participant produced the same results). Finally, restricting the analysis to only utterances 41 42 annotated with H-H% boundary tones resulted in a nearly identical best model, indicating 43 44 45 that those H-H% tones classified as high final rises indeed had significantly higher f0 46 47 than those that were not, β = 3.118, SE = 0.462, t = 6.75, p < 0.001 (mean: 16.2st for high 48 49 rises vs. 11.7st for non-high rises; trimming removed 33 data points, i.e. 5% of the 648 50 51 52 utterances with H-H% tones). This model also again contained a by-participant random 53 54 slope of the classification of high rises, as well as a significant effect of gender, 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 60 of 106

1 2 3 indicating overall lower pitch for male participants, β = -12.306, SE = 1.241, t = -9.92, p 4 5 6 < 0.001, and the participant identifying with the third gender category ‘other’, β = -8.842, 7 8 SE = 2.970, t = -2.98, p = 0.007, compared to female speakers. 9 10 Note also the difference between this classification, which is parallel to our classification 11 12 of the German data below, and another use of the term ‘high-rise’, which contrasts rises 13 14 15 from a final high accent (H* H-H%) with those from a final low accent (L* H-H%), i.e. is 16 17 based on accent category, not the height of the f0 realizing the final boundary tone (e.g. 18 19 Bartels, 1997b, 2013; ForHirschberg Peer & Ward, 1995;Review Levis, 1999; Pierrehumbert & 20 21 22 Hirschberg, 1990). 23 24 67 We again validated this choice by performing f0 measurements and linear mixed- 25 26 effects modelling as for the English data (final f0 not measurable for 16 utterances, i.e. 27 28 29 2% of the data; best model trimmed, removing 33 data points or 4%). Modelling 30 31 indicated that utterances classified as having a final high rise ended in significantly 32 33 higher f0, β = 2.490, SE = 0.576, t = 4.32, p < 0.001 (mean: 17.7st for high rises vs. 34 35 14.5st for others). Also, the model contained a by-participant effect suggesting that the 36 37 38 absolute difference differed between speakers and a significant main effect of gender, β = 39 40 -11.310, SE = 0.8317, t = -13.60, p < 0.001. Comparing only H-% and H-^H% yielded a 41 42 model with the same effect, importantly indicating that final f0 was higher for H-^H%, β 43 44 45 = 1.518, SE = 0.438, t = 3.46, p = 0.005 (37 data points or 5% of 569 trimmed; mean: 46 47 17.7st for H-^H% vs. 15.4st for H-%). 48 49 87 A slightly different approach is taken by Prieto and Borràs-Comes (2018), who treat 50 51 52 speaker commitment (to their own proposition) and speaker agreement (with their 53 54 addressee’s proposition) as two separate factors, which were however not crossed in their 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 61 of 106 Language and Speech

1 2 3 study design. The stimuli of their perception study consisted of twelve different contexts 4 5 6 each followed by a matching question (with all questions presented with four different 7 8 intonational contours, i.e. each participant rated 12 x 4 = 56 stimuli). The twelve contexts 9 10 represented six different conditions (i.e. two items per condition), three of which 11 12 manipulated only speaker commitment (low, mid, high), while the other three 13 14 15 manipulated speaker agreement (low, mid, high). Importantly, the condition with high 16 17 speaker agreement at the same time had low speaker commitment, while both the mid 18 19 and low speaker agreementFor conditions Peer had midReview speaker commitment. 20 21 98 22 We thank an anonymous reviewer for suggesting this possible explanation. 23 24 109 However, it is a bit difficult to match up Pheby’s descriptions with GToBI 25 26 annotations. Importantly, all of his fall-rise contours stretch across several words, with a 27 28 29 high or falling nuclear accent (‘Tonsilbe’) followed by a tail (‘Nachlauf’) consisting of a 30 31 low plateau and a final rise (see Pheby, 1975, p. 54-55), which would correspond to the 32 33 GToBI annotations given here. However, if his accents on the ‘Tonsilbe’ were interpreted 34 35 as pre-nuclear instead of nuclear, the fall-rise could correspond to a GToBI annotation of 36 37 38 H* L*(+H) H-(^H)%, i.e. a pre-nuclear high accent, followed by a nuclear low or rising 39 40 accent and a high boundary tone, i.e. the most common realization of HiNQs in our data 41 42 as illustrated in Figure 10. 43 44 110 45 Note that we slightly simplify in the main text here and above by referring to the 46 47 meaning of the boundary tone combinations H-H% vs. L-L% as a whole, since the 48 49 contrast between these two combinations is most relevant to our data and most discussed 50 51 52 in the literature. Pierrehumbert and Hirschberg (1990) attribute very similar meanings to 53 54 H- and H% (forming part of a larger utterance/dependence on upcoming speech) and to 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 62 of 106

1 2 3 L- and L% (emphasizing the separation of the current unit from a subsequent 4 5 6 phrase/absence of dependence on upcoming speech). According to Bartels’ interpretation, 7 8 “to P[ierrehumbert] & H[irschberg], the difference in function between phrase accents 9 10 and boundary tones lies merely in the size of the prosodic unit for which they cue 11 12 discourse integration–intermediate vs. intonational phrase” (Bartels 2013, p. 59). Bartels 13 14 15 (1997b, 2013) assumes that L% and H- are inherently meaningless, so that the meaning 16 17 of L-L% is entirely determined by the meaning of L- (assertiveness), whereas the 18 19 meaning of H-H% correspondsFor Peerto the meaning Review of H% (dependence on upcoming units of 20 21 22 speech). In contrast to H-L% and L-H%, the simplification regarding the 23 24 compositionality of meaning is thus justified for the H-H% and L-L% cases we discuss 25 26 here. 27 28 112 29 For example, in line with the high rise of HiNQs double-checking p, evocation of a 30 31 scale – and thus the use of the accent L+H*> – is more important/natural if the speaker 32 33 cares about other alternatives on the scale beyond the current one. Alternatively, 34 35 36 highlighting the polarity contrast is additionally important when the proposition checked 37 38 conflicts with the speaker’s beliefs. 39 40 41 42 References 43 44 45 46 Altmann, H. (1984). Linguistische Aspekte der Intonation am Beispiel Satzmodus 47 48 (Forschungsberichte des Instituts für Phonetik und sprachliche Kommunikation der 49 50 Universität München (FIPKM) No. 19). Institut für Phonetik und sprachliche 51 52 53 Kommunikation der Universität München: München. 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 63 of 106 Language and Speech

1 2 3 AnderBois, S. (2011). Issues and Alternatives (PhD thesis). UC Santa Cruz. 4 5 6 7 Asher, N., & Reese, B. (2007). Intonation and Discourse: Biased Questions. (S. Ishihara, 8 9 S. Jannedy, & A. Schwarz, Eds.), Interdisciplinary Studies on Information Structure 10 11 (ISIS). Working Papers of the SFB 632 (Vol. 8). Potsdam: Universitätsverlag 12 13 14 Potsdam, 1–38. 15 16 17 Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with 18 19 crossed random effectsFor for Peersubjects and Reviewitems. Journal of Memory and Language, 20 21 22 59(4), 390–412. http://doi.org/10.1016/j.jml.2007.12.005 23 24 25 Bartels, C. (1997a). The Pragmatics of Wh-Question Intonation in English. U. Penn 26 27 Working Papers in Linguistics, 4(2), 1–17. 28 29 30 31 Bartels, C. (1997b). Towards a compositional interpretation of English statement and 32 33 question intonation. PhD thesis, University of Massachusetts Amherst. 34 35 36 37 Bartels, C. (2013). The intonation of English statements and Questions. A compositional 38 39 interpretation. New York and London: Routledge (first published 1999 by Garland 40 41 Publishing, New York). 42 43 44 45 Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects 46 47 models using lme4. Journal of Statistical Software, 67, 1–48. 48 49 http://doi.org/10.18637/jss.v067.i01 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 64 of 106

1 2 3 Bates, D., Mächler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., … 4 5 6 Green, P. (2016). lme4: Linear mixed-effects models using “Eigen” and S4. R 7 8 package version 1.1-13. Retrieved from https://CRAN.R-project.org/package=lme4 9 10 11 Bäuerle, Rainer. (1979). Questions and Answers. In R. Bäuerle, U. Egli & A. von 12 13 14 Stechow (Eds.), Semantics from different points of view. , Heidelberg, New 15 16 York: Springer, 61–74.Bierwisch, M. (1966). Regeln für die Intonation deutscher 17 18 Sätze. In M. Bierwisch (Ed.), Untersuchungen über Akzent und Intonation im 19 For Peer Review 20 21 Deutschen (pp. 99–201). Berlin: Akademie-Verlag. 22 23 24 Biezma, M., & Rawlins, K. (2012). Responding to Alternative and Polar Questions. 25 26 Linguistics and Philosophy, 35(5): 361–406. http://doi.org/10.1007/s10988-012- 27 28 29 9123-z 30 31 32 Boersma, P., & Weenink, D. (2016). Praat. Doing phonetics by computer (Version 33 34 6.0.18). Retrieved from www.praat.org. 35 36 37 38 Braun, B., N. Dehé, J. Neitsch, D. Wochner & Zahner, K. (2018). The prosody of 39 40 rhetorical and information-seeking questions in German. Advance Online 41 42 Publication. Language and Speech. https://doi.org/10.1177/0023830918816351. 43 44 45 46 Breen, M., Dilley, L. C., Kraemer, J., & Gibson, E. (2012). Inter-transcriber reliability for 47 48 two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP 49 50 51 (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory, 8(2), 277–312. 52 53 https://doi.org/10.1515/cllt-2012-0011 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 65 of 106 Language and Speech

1 2 3 Büring, D., & Gunlogson, C. (2000). Aren’t positive and negative polar questions the 4 5 6 same? Unpublished manuscript, UCLA and UCSC. 7 8 9 Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. 10 11 Computational Linguistics, 22(2), 249–254. 12 13 14 15 Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and 16 17 Psychological Measurement, XX(1), 37–46. 18 19 For Peer Review 20 21 Davies, M. (2008). The corpus of contemporary American English (COCA): 560 million 22 23 words, 1990-present. Retrieved April 18, 2019, from https://www.english- 24 25 corpora.org/coca/ 26 27 28 29 Davies, M. (2009). The 385+ million word Corpus of Contemporary American English 30 31 (1990-2008+). International Journal of Corpus Linguistics, 14(2), 159–190. 32 33 https://doi.org/10.1075/ijcl.14.2.02dav 34 35 36 37 Dehé, N. & Braun, B. (2019). The prosody of rhetorical questions in English. English 38 39 Language and Linguistics. Advance Online Publication. 40 41 https://doi.org/10.1017/S1360674319000157 42 43 44 45 Domaneschi, F., Romero, M., & Braun, B. (2017). Bias in polar questions: Evidence 46 47 from English and German production experiments. Glossa, 2(1), 1–28. 48 49 http://doi.org/10.5334/gjgl.27 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 66 of 106

1 2 3 Escandell-Vidal, M. V. (1998). Intonation and procedural encoding: The case of Spanish 4 5 6 interrogatives. In V. Rouchota & A. H. Jucker (Eds.), Current issues in relevance 7 8 theory (pp. 169–203). Amsterdam/Philadelphia: John Benjamins. 9 10 11 Escandell-Vidal, M. V. (1999). Los enunciados interrogativos. Aspectos semánticos y 12 13 14 pragmáticos. 15 16 17 Escandell-Vidal, V. (2017). Intonation and evidentiality in Spanish polar interrogatives. 18 19 Language and SpeechFor, 60(2), Peer 224–241. Reviewhttps://doi.org/10.1177/0023830917698178 20 21 22 23 von Essen, O. (1964). Grundzüge der hochdeutschen Satzintonation. Ratingen / 24 25 Düsseldorf: Alois Henn. 26 27 28 29 Féry, C. (1993). German intonational patterns. Tübingen: Max Niemeyer Verlag. 30 31 32 Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019) irr: Various Coefficients of 33 34 Interrater Reliability and Agreement. R package version 0.84.1. Retrieved from 35 36 37 https://CRAN.R-project.org/package=irr 38 39 40 Goodhue, Daniel. (2018). On asking and answering biased polar questions (PhD thesis). 41 42 McGill University. 43 44 45 46 Grice, M., & Baumann, S. (2002). Deutsche Intonation und GToBI. Linguistische 47 48 Berichte, 181(1993), 267–298. 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 67 of 106 Language and Speech

1 2 3 Grice, M., Baumann, S., & Benzmüller, S. (2005). German intonation in autosegmental- 4 5 6 metrical phonology. In S.-A. Jun (Ed.), Prosodic typology: The phonology of 7 8 intonation and phrasing (pp. 55–83). Oxford: Oxford University Press. 9 10 11 Grice, M., Baumann, S., Ritter, S., & Röhr, C. (2017). GToBI. Übungsmaterialien zur 12 13 14 deutschen Intonation. Retrieved November 15, 2017, from http://www.gtobi.uni- 15 16 koeln.de/index.html 17 18 19 Grice, M., & Savino, ForM. (1995). Peer Intonation andReview communicative function in a regional 20 21 22 variety of Italian. Phonus (Vol. 1). Saarbrücken: Saarland University, 19-32. 23 24 25 Grice, M., & Savino, M. (2003). Map Tasks in Italian: Asking questions about given, 26 27 accessible and new information. Catalan Journal of Linguistics, 2, 153–180. 28 29 30 31 Gunlogson, C. (2001). True to form: Rising and falling declaratives as questions in 32 33 English. PhD thesis, University of California, Santa Cruz. 34 35 36 37 Gussenhoven, C. (1983). On the grammar and semantics of sentence accents. Dordrecht: 38 39 Foris Publications. 40 41 42 Halliday, M. A. K., & Greaves, W. S. (2008). Intonation in the grammar of English. 43 44 45 London: Equinox. 46 47 48 Hedberg, N., & Sosa, J. M. (2002). The Prosody of Questions in Natural Discourse. In 49 50 51 Proceedings of Speech Prosody 2002: The first international conference on speech 52 53 prosody (pp. 375–378). Aix-en-Provence: ISCA. 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 68 of 106

1 2 3 Höhle, T. N. (2013). Über Verum-Fokus im Deutschen. In J. Jacobs (Ed.), 4 5 6 Informationsstruktur und Grammatik (pp. 112–141). Opladen: Westdeutscher 7 8 Verlag. https://doi.org/10.1007/978-3-663-12176-3_5 9 10 11 Isačenko, A., & Schädlich, H.-J. (1966). Untersuchungen über die deutsche 12 13 14 Satzintonation. In M. Bierwisch (Ed.), Untersuchungen über Akzent und Intonation 15 16 im Deutschen (pp. 7–67). Berlin: Akademie-Verlag. 17 18 19 James, E., Mahut, C., For& Latkiewicz, Peer G. (1989). Review The investigation of an apparently new 20 21 22 intonation pattern in Toronto English. Information Communication, 10, 11–17. 23 24 25 Kügler, F. (2003). Do we know the answer? – Variation in yes-no question intonation. In 26 27 S. Fischer & R. van de Vijver (Eds.), Experimental Studies in Linguistics 1 (pp. 9– 28 29 30 29). Potsdam: Universitätsverlag Potsdam. 31 32 33 Kügler, F., Smolibocki, B., Arnold, D., Baumann, S., Braun, B., Grice, M., … Röhr, C. 34 35 36 T. (2015). DIMA – Annotation Guidelines for German Intonation. In The Scottish 37 38 Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of 39 40 Phonetic Sciences (ICPhS 2015) (Paper number 317). Glasgow, UK: University of 41 42 Glasgow. Retrieved from https://www.internationalphoneticassociation.org/icphs- 43 44 45 proceedings/ICPhS2015/Papers/ICPHS0317.pdfKohler, K. J. (1991). Terminal 46 47 intonation patterns in single-accent utterances of German: Phonetics, phonology and 48 49 semantics. In K. J. Kohler (Ed.), Studies in German intonation. Arbeitsbericht Nr. 25 50 51 52 des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel 53 54 (AIPUK) (Vol. 25, pp. 115–185). Kiel: Universität Kiel. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 69 of 106 Language and Speech

1 2 3 Kohler, K. J. (2005a). Timing and communicative functions of pitch contours. Phonetica, 4 5 6 62(2–4), 88–105. https://doi.org/10.1159/000090091 7 8 9 Kohler, K. J. (2005b). Pragmatic and attitudinal meanings of pitch patterns in German 10 11 syntactically marked questions (Arbeitsberichte des Instituts für Phonetik und 12 13 14 digitale Sprachverarbeitung der Universität Kiel (AIPUK) No. 35a). (K. J. Kohler, F. 15 16 Kleber, & B. Peters, Eds.), Prosodic Structures in German Spontaneous Speech. 17 18 Kiel: Institut für Phonetik und digitale Sprachverarbeitung (IPDS) der Universität 19 For Peer Review 20 21 Kiel, 125–142. 22 23 24 Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2016). lmerTest: Tests in 25 26 linear mixed effects models. R package version 2.0-33. Retrieved from https://cran.r- 27 28 29 project.org/package=lmerTest 30 31 32 Ladd, D. R. (1981). A first look at the semantics and pragmatics of negative questions 33 34 and tag questions. In Proceedings of Chicago linguistic society vol. 17 (pp. 164– 35 36 37 171). 38 39 40 Ladd, D. R. (1996). Intonational Phonology. Cambridge: Cambridge University Press. 41 42 43 44 Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type 45 46 I Error and Power in Linear Mixed Models. Journal of Memory and Language, 94, 47 48 305–315. http://doi.org/10.1016/j.jml.2017.01.001 49 50 51 52 Michalsky, J. (2017). Frageintonation im Deutschen: Zur intonatorischen Markierung 53 54 von Interrogativität und Fragehaltigkeit. Berlin/Boston: Walter de Gruyter. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 70 of 106

1 2 3 Michelas, A., Portes, C., & Champagne-Lavau, M. (2016). When pitch accents encode 4 5 6 speaker commitment: Evidence from French intonation. Language and Speech, 59, 7 8 266–293. http://doi.org/10.1177/0023830915587337 9 10 11 Neurobehavioral Systems (2014). Presentation. Version 17.0. 12 13 14 15 Niebuhr, O. (2003). Perceptual study of timing variables in F0 peaks. In Proceedings of 16 17 the 15th ICPhS, Barcelona, Spain (pp. 1225–1228). 18 19 For Peer Review 20 21 Nilsenová, M. (2006). Rises and falls: Studies in the semantics and pragmatics of 22 23 intonation. PhD thesis, Amsterdam University. 24 25 26 O’Connor, J. D., & Arnold, G. F. (1973). Intonation of Colloquial English (2nd edition). 27 28 29 London: Longman. 30 31 32 Peters, J. (2006). Intonation. In M. Wermke, K. Kunzel-Razum, & W. Scholze- 33 34 Stubenrecht (Eds.), Duden. Die Grammatik (revised reprint of the 7th edition, pp. 95– 35 36 37 128). Mannheim, , Wien, Zürich: Dudenverlag. 38 39 40 Petrone, C., & Niebuhr, O. (2014). On the Intonation of German Intonation Questions: 41 42 The Role of the Prenuclear Region. Language and Speech, 57(1), 108–146. 43 44 45 http://doi.org/10.1177/0023830913495651 46 47 48 Pheby, J. (1975). Intonation und Grammatik im Deutschen. Berlin: Akademie-Verlag. 49 50 51 52 Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. PhD 53 54 thesis, MIT. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 71 of 106 Language and Speech

1 2 3 Pierrehumbert, J. B., & Hirschberg, J. (1990). The meaning of intonational contours in 4 5 6 the interpretation of discourse. In P. R. Cohen, J. Morgan, & M. E. Pollack (Eds.), 7 8 Intentions in communication (pp. 271–311). Cambridge, MA: MIT Press. 9 10 11 Pike, K. L. (1945). The intonation of American English. Ann Arbor: University of 12 13 14 Michigan Press. 15 16 17 Prieto, P., & Borràs-Comes, J. (2018). Question intonation contours as dynamic 18 19 epistemic operators.For Natural Peer Language andReview Linguistic Theory, 36, 563–586. 20 21 22 http://doi.org/10.1007/s11049-017-9382-z 23 24 25 R Core Team. (2017). R: A Language and Environment for Statistical Computing. 26 27 Vienna, Austria: R Foundation for Statistical Computing. Version 3.4.0 (2017-04- 28 29 30 21). Retrieved from https://www.r-project.org/ 31 32 33 Roberts, C. (2012). Information Structure in Discourse: Towards an Integrated Formal 34 35 36 Theory of Pragmatics. Semantics and Pragmatics, 6: 1–69. 37 38 http://doi.org/10.3765/sp.5.6 39 40 41 Romero, M., & Han, C. (2002). Verum Focus in Negative Yes/No Questions and Ladd’s 42 43 44 p/-p Ambiguity. Proceedings of SALT XII, 1(4), 204–224. 45 46 47 Romero, M., & Han, C.-H. (2004). On Negative “Yes / No” Questions. Linguistics and 48 49 Philosophy, 27(5), 609–658. 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 72 of 106

1 2 3 Romero, M., Arnhold, A., Braun, B., & Domaneschi, F. (2017). Negative Polar Question 4 5 6 Types in English. In A. Lamont & K. A. Tetzloff (Eds.), NELS 47: Proceedings of 7 8 the Forty-Seventh Annual Meeting of the North East Linguistic Society: Volume 3. 9 10 Amherst, MA: GLSA: Graduate Linguistics Student Association, 35–48. 11 12 13 14 Rooth, M. (1985). Association with Focus. PhD thesis, University of Massachusetts, 15 16 Amherst. 17 18 19 Rooth, M. (1992). A TheoryFor of PeerFocus Interpretation. Review Natural Language Semantics, 1(1): 20 21 22 75–116. http://doi.org/10.1007/BF02342617 23 24 25 van Rooy, R., & Šafářová, M. (2003). On polar questions. Proceedings of SALT XIII, 26 27 292–309. 28 29 30 31 Schubiger, M. (1958). English intonation. Its form and function. Tübingen: Max 32 33 Niemeyer Verlag. 34 35 36 37 Shokeir, V. (2008). Evidence for the stable use of uptalk in South Ontario English. 38 39 University of Pennsylvania Working Papers in Linguistics, 14(2), 16–24. Silverman, 40 41 K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., … Hirschberg, 42 43 44 J. (1992). TOBI: A standard for labeling English prosody. In Second International 45 46 Conference on Spoken Language Processing (pp. 867–870). 47 48 http://doi.org/10.1515/cllt-2012-0011 49 50 51 52 Steedman, M. (2000). Information structure and the syntax-phonology interface. 53 54 Linguistic Inquiry, 31(4), 649–689. 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 73 of 106 Language and Speech

1 2 3 Steedman, M. (2008). Information-Structural Semantics for English Intonation. In C. Lee, 4 5 6 M. Gordon, & D. Büring (Eds.), Topic and focus. Cross-linguistic perspectives on 7 8 meaning and intonation (pp. 245–264). Dordrecht: Springer. 9 10 http://doi.org/10.1007/978-1-4020-4796-1_13 11 12 13 14 Stock, E. (1999). Deutsche Intonation (4th edition). Berlin, München, Leipzig: 15 16 Langenscheidt - Verlag Enzyklopädie. 17 18 19 Stock, E., & Zacharias,For C. (1973). Peer Deutsche ReviewSatzintonation (2nd revised edition). 20 21 22 Leipzig: VEB Verlag Enzyklopädie. 23 24 25 Sudo, Y. (2013). Biased Polar Questions in English and Japanese. In D. Gutzmann & H.- 26 27 M. Gärtner (Eds.), Beyond expressives: Explorations in use-conditional meaning (pp. 28 29 30 275–295). Leiden: Brill. http://doi.org/10.1163/9789004183988_009 31 32 33 Talla Sando, Y. (2009). Upspeak across Canadian English accents: Acoustic and 34 35 36 sociophonetic evidence. In Proceedings of the 2009 annual conference of the 37 38 Canadian Linguistic Association (pp. 1–12). 39 40 41 Turco, G., Dimroth, C., & Braun, B. (2013). Intonational means to mark verum focus in 42 43 44 German and French. Language and Speech, 56(564), 461–491. 45 46 https://doi.org/10.1177/0023830912460506 47 48 49 Vanrell, M. D. M., Mascaro, I., Torres-Tamarit, F., & Prieto, P. (2013). Intonation as an 50 51 52 Encoder of Speaker Certainty: Information and Confirmation Yes-No Questions in 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 74 of 106

1 2 3 Catalan. Language and Speech, 56(2), 163–191. 4 5 6 http://doi.org/10.1177/0023830912443942 7 8 9 Vanrell, M., Ballone, F., Schirru, C., & Prieto, P. (2014). Intonation and its interfaces in 10 11 Sardinian polar questions, Loquens 1(2), e014. 12 13 14 http://doi.org/10.3989/loquens.2014.014 15 16 17 Veilleux, N., Shattuck-Hufnagel, S., & Brugos, A. (2006). 6.911 Transcribing Prosodic 18 19 Structure of SpokenFor Utterances Peer with ToBI. Review January IAP 2006. Massachusetts 20 21 22 Institute of Technology: MIT OpenCourseWare. Retrieved January 25, 2016, from 23 24 https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911- 25 26 transcribing-prosodic-structure-of-spoken-utterances-with-tobi-january-iap-2006/ 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 75 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 1. Participant’s choice of HiNQ vs. LowNQ responses by certainty and checked proposition for English 46 data. Error bars indicate standard error of the mean in all figures. 47 215x279mm (300 x 300 DPI) 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 76 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 2. Participant’s choice of HiNQ vs. LowNQ responses by certainty and checked proposition for German 46 data. 47 215x279mm (300 x 300 DPI) 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 77 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 Figure 3. High final rise annotated with H-H% boundary tones. Utterance Aren’t they removing a jury 28 woman? (HiNQ) realized by participant 28. Accented syllables are indicated by capital letters. 29 30 241x152mm (300 x 300 DPI) 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 78 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 Figure 4. Less high final rise annotated with H-H% boundary tones. Utterance Aren’t they planting a water 28 lily? (HiNQ) realized by participant 28. Accented syllables are indicated by capital letters. 29 30 241x152mm (300 x 300 DPI) 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 79 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 Figure 5. Final plateau annotated with H-L% boundary tones. Utterance Aren’t they planting a water lily? 28 (HiNQ) realized by participant 28. Accented syllables are indicated by capital letters. 29 30 241x152mm (300 x 300 DPI) 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 80 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 6. Proportion of question-final high f0 rises for English data. 46 47 215x279mm (300 x 300 DPI) 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 81 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 7. Proportion of question-final high f0 rises for German data. 46 47 215x279mm (300 x 300 DPI) 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 82 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 8. Proportion of auxiliaries realized with L+H*< accents in English data. 46 47 215x279mm (300 x 300 DPI) 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 83 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 Figure 9. Sentence Aren’t they reanalyzing a mammal molar? (HiNQ) realized with each constituent carrying 28 the most frequent accent realization (Aux = auxiliary, Neg = negation, S = subject, V = main verb, N = 29 sentence-final compound noun). Accented syllables are indicated by capital letters. 30 31 241x152mm (300 x 300 DPI) 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 84 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 Figure 10. Sentence Importieren sie nicht eine Mangelware? ‘Aren’t they importing a scarce commodity?’ 31 (HiNQ) realized with each constituent carrying the most frequent accent realization (V = main verb, S = 32 subject, Neg = negation, N = sentence-final compound noun). Accented syllables are indicated by capital 33 letters. 34 35 241x178mm (300 x 300 DPI) 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 85 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 Figure 11. Sentence Besuchen sie keine Rummelmeile? ‘Are they not visiting a fun fair?’ (LowNQ) realized 31 with each constituent carrying the accent realization most frequent in German LowNQs (V = main verb, S = 32 subject, Neg = negation, N = sentence-final compound noun). Accented syllables are indicated by capital 33 letters. 34 35 241x178mm (300 x 300 DPI) 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 86 of 106

1 2 3 4 5 Appendix A: List of items 6 7 8 9 Table A. List of target questions for all English items. 10 11 12 Item HiNQ LowNQ 13 14 15 1. Aren’t they assigning a learner level? Are they not assigning a learner level? 16 17 18 19 2. Aren’t they addingFor a menu Peer item? ReviewAre they not adding a menu item? 20 21 22 3. Aren’t they cutting a mini melon? Are they not cutting a mini melon? 23 24 25 4. Aren’t they inviting a novel writer? Are they not inviting a novel writer? 26 27 28 29 5. Aren’t they releasing a vinyl volume? Are they not releasing a vinyl volume? 30 31 32 6. Aren’t they invoking a motor neuron? Are they not invoking a motor neuron? 33 34 35 7. Aren’t they celebrating a rally winner? Are they not celebrating a rally winner? 36 37 38 39 8. Aren’t they installing an angle-iron? Are they not installing an angle-iron? 40 41 42 9. Aren’t they modelling an atom layer? Are they not modelling an atom layer? 43 44 45 46 10. Aren’t they stealing a venom vial? Are they not stealing a venom vial? 47 48 49 11. Aren’t they deploying a NATO army? Are they not deploying a NATO army? 50 51 52 12. Aren’t they analyzing a movie villain? Are they not analyzing a movie villain? 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 87 of 106 Language and Speech

1 2 3 4 13. Aren’t they photographing a river valley? Are they not photographing a river valley? 5 6 7 14. Aren’t they reanalyzing a mammal molar? Are they reanalyzing a mammal molar? 8 9 10 11 15. Aren’t they planting a water lily? Are they not planting a water lily? 12 13 14 16. Aren’t they removing a jury woman? Are they not removing a jury woman? 15 16 17 18 19 For Peer Review 20 Table B. List of target questions for all German items. 21 22 23 Item HiNQ LowNQ 24 25 26 1. Berücksichtigen sie nicht eine Berücksichtigen sie keine 27 28 Lehrermeinung? Lehrermeinung? 29 30 31 ‘Aren’t they considering a teacher’s ‘Are they not considering a teacher’s 32 33 34 opinion?’ opinion?’ 35 36 37 2. Sanieren sie nicht eine Römervilla? Sanieren sie keine Römervilla? 38 39 40 ‘Aren’t they reconstructing a Roman ‘Are they not reconstructing a Roman 41 42 villa?’ villa?’ 43 44 45 3. Verwenden sie nicht eine Nylonrolle? Verwenden sie keine Nylonrolle? 46 47 48 ‘Aren’t they using a nylon paint ‘Are they not using a nylon paint 49 50 roller?’ roller?’ 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 88 of 106

1 2 3 4 4. Beschäftigen sie nicht einen Beschäftigen sie keinen 5 6 Maurerlehrling? Maurerlehrling? 7 8 9 ‘Aren’t they employing a bricklayer ‘Are they not employing a bricklayer 10 11 apprentice?’ apprentice?’ 12 13 14 15 5. Veranstalten sie nicht ein Veranstalten sie kein Rallyrennen? 16 17 Rallyrennen? ‘Are they not organising a rally race?’ 18 19 For Peer Review 20 ‘Aren’t they organising a rally race?’ 21 22 23 6. Empfehlen sie nicht einen Empfehlen sie keinen Nierenwärmer? 24 25 Nierenwärmer? 26 ‘Are they not recommending sashes?’ 27 28 ‘Aren’t they recommending sashes?’ 29 30 31 7. Erwarten sie nicht einen Juraneuling? Erwarten sie keinen Juraneuling? 32 33 34 ‘Aren’t they expecting a law novice?’ ‘Are they not expecting a law novice?’ 35 36 37 38 8. Ersetzen sie nicht eine Neonröhre? Ersetzen sie keine Neonröhre? 39 40 ‘Aren’t they replacing a neon lamp?’ ‘Are they not replacing a neon lamp?’ 41 42 43 44 9. Vertreten sie nicht eine Malerinnung? Vertreten sie keine Malerinnung? 45 46 47 ‘Aren’t they representing a painter’s ‘Are they not representing a painter’s 48 49 guild?’ guild?’ 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 89 of 106 Language and Speech

1 2 3 4 10. Prognostizieren sie nicht eine Prognostizieren sie keine 5 6 Mehreinnahme? Mehreinnahme? 7 8 9 ‘Aren’t they predicting additional ‘Are they not predicting additional 10 11 revenue?’ revenue?’ 12 13 14 15 11. Repräsentieren sie nicht einen Repräsentieren sie keinen 16 17 Wählerwillen? Wählerwillen? 18 19 For Peer Review 20 ‘Aren’t they representing [a] voters’ ‘Are they not representing [a] voters’ 21 22 intention?’ intention?’ 23 24 25 12. Importieren sie nicht eine Importieren sie keine Mangelware? 26 27 28 Mangelware? ‘Are they not importing a scarce 29 30 ‘Aren’t they importing a scarce commodity?’ 31 32 33 commodity?’ 34 35 36 13. Besuchen sie nicht eine Besuchen sie keine Rummelmeile? 37 38 Rummelmeile? 39 ‘Are they not visiting a fun fair / 40 41 ‘Aren’t they visiting a fun fair / amusement mile?’ 42 43 amusement mile?’ 44 45 46 47 14. Planen sie nicht einen Angelweiher? Planen sie keinen Angelweiher? 48 49 50 ‘Aren’t they planning a fishing pond?’ ‘Are they not planning a fishing pond?’ 51 52 53 15. Tanzen sie nicht einen Ringelreihen? Tanzen sie keinen Ringelreihen? 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 90 of 106

1 2 3 ‘Aren’t they dancing ring-a-ring-a- ‘Are they not dancing ring-a-ring-a- 4 5 6 roses?’ roses?’ 7 8 9 16. Errichten sie nicht ein Errichten sie kein Marmormahnmal? 10 11 Marmormahnmal? 12 ‘Are they not erecting a marble 13 14 ‘Aren’t they erecting a marble memorial?’ 15 16 memorial?’ 17 18 19 For Peer Review 20 21 Appendix B: Accents 22 23 24 25 Figure AppendixB1 shows the raw number of all accent type realizations by condition 26 27 (proposition and certainty) for the separate constituents and sentence types. Table C lists 28 29 the number and percentage of realizations with the most frequent accent realization for all 30 31 32 constituents in the English data. 33 34 [insert Figure AppendixB1] 35 36 37 Table C. Proportion of most frequent accent realizations in the English data by 38 39 constituent, checked proposition, certainty and morpho-syntactic question type, given in 40 41 42 percent, rounded, and absolute number of realizations (in brackets). 43 44 45 Constituent Realization Checked Certainty In HiNQs In 46 47 proposition LowNQs 48 49 50 51 Auxiliary L+H*< Speaker’s/p High/90% 71% (84) 5% (4) 52 53 54 Speaker’s/p Low/60% 75% (89) 5% (4) 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 91 of 106 Language and Speech

1 2 3 4 Addressee’s/p High/90% 82% (93) 8% (6) 5 6 7 Addressee’s/p Low/60% 80% (97) 9% (6) 8 9 10 11 Negation No accent Speaker’s/p High/90% 100% (118) 4% (3) 12 13 14 Speaker’s/p Low/60% 100% (119) 4% (3) 15 16 17 Addressee’s/p High/90% 100% (114) 9% (7) 18 19 For Peer Review 20 21 Addressee’s/p Low/60% 100% (122) 10% (7) 22 23 24 Compound L* Speaker’s/p High/90% 94% (111) 82% (61) 25 26 27 noun 28 Speaker’s/p Low/60% 92% (110) 81% (59) 29 30 31 Addressee’s/p High/90% 91% (104) 89% (68) 32 33 34 Low/60% 93% (113) 84% (57) 35 Addressee’s/p 36 37 38 Verb L* Speaker’s/p High/90% 36% (43) 28% (21) 39 40 41 Speaker’s/p Low/60% 30% (36) 32% (23) 42 43 44 45 Addressee’s/p High/90% 39% (44) 28% (21) 46 47 48 Addressee’s/p Low/60% 36% (44) 38% (26) 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 92 of 106

1 2 3 For negations in the English data, Pearson’s chi-squared test with Yates’ continuity 4 5 6 correction indicated that the presence of an accent on the negation was significantly 7 8 influenced by question type, χ2 (1) = 678.57, p < 0.001, i.e. there were significantly fewer 9 10 accented negations in HiNQ than in LowNQ. However, this could not be confirmed with 11 12 linear mixed-effects modelling due to the lack of variation in the HiNQ productions, 13 14 15 which constituted the majority of the data. Therefore, we did not use question type as a 16 17 predictor in these models. The resulting best binomial linear mixed-effects model did not 18 19 indicate any significantFor effects ofPeer checked proposition Review or certainty, either. The main 20 21 22 effect of proposition included in the model was not significant, β = -0.110, SE = 0.319, z 23 24 = -0.34, p = 0.731. The model further contained a significant fixed effect of trial, β = 25 26 0.026, SE = 0.005, z = 5.49, p < 0.001, suggesting that participants accented negations (in 27 28 29 LowNQs) more often over the course of the experimental session, as well as a by- 30 31 participant slope of checked proposition in the random structure. 32 33 For utterance-final compound nouns, the best linear fixed-effects model indicated that the 34 35 most frequent accent category, L*, appeared more often in HiNQs than in LowNQs, β = 36 37 38 15.778, SE = 5.403, z = 2.92, p = 0.004, but no other effects were significant (trimming 39 40 removed 49 data points or 6%). 41 42 In evaluating accentuation of verbs in the English data, we found no significant effect of 43 44 45 any of the predictors. We contrasted the most frequent realization, a low L* accent, with 46 47 a second category comprising all other realizations. Note that there were not enough data 48 49 points to test a three-way interaction between proposition, certainty and question type. 50 51 52 [insert Figure AppendixB2] 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 93 of 106 Language and Speech

1 2 3 Table D. Proportion of most frequent accent realizations in the German data by 4 5 6 constituent, checked proposition, certainty and morpho-syntactic question type, given in 7 8 percent, rounded, and absolute number of realizations (in brackets). 9 10 11 Constituent Realization Checked Certainty In HiNQ In LowNQ 12 13 14 proposition 15 16 17 Negation No accent Speaker’s/p High/90% 59% (79) 12% (7) 18 19 For Peer Review 20 Speaker’s/p Low/60% 52% (66) 14% (9) 21 22 23 High/90% 44% (26) 9% (12) 24 Addressee’s/p 25 26 27 Addressee’s/p Low/60% 56% (45) 8% (9) 28 29 30 Compound L*+H Speaker’s/p High/90% 74% (98) 49% (29) 31 32 33 noun 34 Speaker’s/p Low/60% 72% (92) 43% (28) 35 36 37 Addressee’s/p High/90% 69% (41) 36% (48) 38 39 40 41 Addressee’s/p Low/60% 74% (59) 36% (40) 42 43 44 Verb H* Speaker’s/p High/90% 53% (70) 44% (26) 45 46 47 Speaker’s/p Low/60% 49% (62) 38% (25) 48 49 50 51 Addressee’s/p High/90% 47% (28) 41% (55) 52 53 54 Addressee’s/p Low/60% 41% (33) 45% (50) 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 94 of 106

1 2 3 4 5 6 For German (see Figure AppendixB2, Table D), linear mixed-effects modelling of 7 8 accentuation of negation words indicated that checked proposition and certainty did not 9 10 have a significant effect. Instead, the best model only contained an effect of question 11 12 type, β = -3.932, SE = 0.332, z = -11.85, p < 0.001 (trimming removed 16 data points or 13 14 15 2% or the data). This effect suggested that participants accented negations overall less 16 17 often in HiNQs than in LowNQs, i.e. HiNQs had a higher proportion of unaccented 18 19 realizations. For Peer Review 20 21 22 German sentence-final compound nouns were most often realized with rising L*+H 23 24 accents. This realization was overall more frequent in HiNQs than in LowNQs, as shown 25 26 by the best binomial linear mixed-effects model, β = 1.818, SE = 0.367, z = 4.95, p < 27 28 29 0.001 (the model additionally contained a by-participant effect of question type). Thus, 30 31 the accent L*+H was the overall most frequent realization for final compound nouns in 32 33 both question types, though it accounted for 73% in HiNQs and only for 39% in LowNQs 34 35 (N = 290 vs. N = 145). Finally, the model included a fixed effect of trial, indicating fewer 36 37 38 L*+H accents later in the experimental sessions, β = -0.011, SE = 0.005, z = -2.17, p = 39 40 0.030. 41 42 For the accentuation of verbs in the German data, we evaluated a binary dependent 43 44 45 variable distinguishing H* accents from all other realizations. Again, no significant effect 46 47 of any of the predictors emerged. However, an insignificant fixed effect of question type, 48 49 β = 0.365, SE = 0.273, z = 1.33, p = 0.182, was retained in the final model due to the 50 51 52 presence of a by-participant effect of question type. In addition to the most frequent 53 54 realization with H* accents (N = 349 or 46%), German verbs also appeared with low L* 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 95 of 106 Language and Speech

1 2 3 accents (N = 203 or 26%), unaccented (N = 90 or 12%), with rising L+H* accents (N = 4 5 6 87 or 11%), with rising L*+H accents (N = 36 or 5%), and with H+!H* and H+L* 7 8 accents (one realization each). 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 96 of 106

1 2 3 4 5 Appendix A: List of items 6 7 8 9 Table A. List of target questions for all English items. 10 11 12 Item HiNQ LowNQ 13 14 15 1. Aren’t they assigning a learner level? Are they not assigning a learner level? 16 17 18 19 2. Aren’t they addingFor a menu Peer item? ReviewAre they not adding a menu item? 20 21 22 3. Aren’t they cutting a mini melon? Are they not cutting a mini melon? 23 24 25 4. Aren’t they inviting a novel writer? Are they not inviting a novel writer? 26 27 28 29 5. Aren’t they releasing a vinyl volume? Are they not releasing a vinyl volume? 30 31 32 6. Aren’t they invoking a motor neuron? Are they not invoking a motor neuron? 33 34 35 7. Aren’t they celebrating a rally winner? Are they not celebrating a rally winner? 36 37 38 39 8. Aren’t they installing an angle-iron? Are they not installing an angle-iron? 40 41 42 9. Aren’t they modelling an atom layer? Are they not modelling an atom layer? 43 44 45 10. Aren’t they stealing a venom vial? Are they not stealing a venom vial? 46 47 48 49 11. Aren’t they deploying a NATO army? Are they not deploying a NATO army? 50 51 52 12. Aren’t they analyzing a movie villain? Are they not analyzing a movie villain? 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 97 of 106 Language and Speech

1 2 3 4 13. Aren’t they photographing a river valley? Are they not photographing a river valley? 5 6 7 14. Aren’t they reanalyzing a mammal molar? Are they reanalyzing a mammal molar? 8 9 10 15. Aren’t they planting a water lily? Are they not planting a water lily? 11 12 13 14 16. Aren’t they removing a jury woman? Are they not removing a jury woman? 15 16 17 18 19 Table B. List of targetFor questions Peer for all German Review items. 20 21 22 23 Item HiNQ LowNQ 24 25 26 1. Berücksichtigen sie nicht eine Berücksichtigen sie keine 27 28 Lehrermeinung? Lehrermeinung? 29 30 31 ‘Aren’t they considering a teacher’s ‘Are they not considering a teacher’s 32 33 34 opinion?’ opinion?’ 35 36 37 2. Sanieren sie nicht eine Römervilla? Sanieren sie keine Römervilla? 38 39 40 ‘Aren’t they reconstructing a Roman ‘Are they not reconstructing a Roman 41 42 villa?’ villa?’ 43 44 45 3. Verwenden sie nicht eine Nylonrolle? Verwenden sie keine Nylonrolle? 46 47 48 ‘Aren’t they using a nylon paint ‘Are they not using a nylon paint 49 50 roller?’ roller?’ 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 98 of 106

1 2 3 4 4. Beschäftigen sie nicht einen Beschäftigen sie keinen 5 6 Maurerlehrling? Maurerlehrling? 7 8 9 ‘Aren’t they employing a bricklayer ‘Are they not employing a bricklayer 10 11 apprentice?’ apprentice?’ 12 13 14 15 5. Veranstalten sie nicht ein Veranstalten sie kein Rallyrennen? 16 17 Rallyrennen? ‘Are they not organising a rally race?’ 18 19 For Peer Review 20 ‘Aren’t they organising a rally race?’ 21 22 23 6. Empfehlen sie nicht einen Empfehlen sie keinen Nierenwärmer? 24 25 Nierenwärmer? 26 ‘Are they not recommending sashes?’ 27 28 ‘Aren’t they recommending sashes?’ 29 30 31 7. Erwarten sie nicht einen Juraneuling? Erwarten sie keinen Juraneuling? 32 33 34 ‘Aren’t they expecting a law novice?’ ‘Are they not expecting a law novice?’ 35 36 37 38 8. Ersetzen sie nicht eine Neonröhre? Ersetzen sie keine Neonröhre? 39 40 ‘Aren’t they replacing a neon lamp?’ ‘Are they not replacing a neon lamp?’ 41 42 43 44 9. Vertreten sie nicht eine Malerinnung? Vertreten sie keine Malerinnung? 45 46 47 ‘Aren’t they representing a painter’s ‘Are they not representing a painter’s 48 49 guild?’ guild?’ 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 99 of 106 Language and Speech

1 2 3 4 10. Prognostizieren sie nicht eine Prognostizieren sie keine 5 6 Mehreinnahme? Mehreinnahme? 7 8 9 ‘Aren’t they predicting additional ‘Are they not predicting additional 10 11 revenue?’ revenue?’ 12 13 14 15 11. Repräsentieren sie nicht einen Repräsentieren sie keinen 16 17 Wählerwillen? Wählerwillen? 18 19 For Peer Review 20 ‘Aren’t they representing [a] voters’ ‘Are they not representing [a] voters’ 21 22 intention?’ intention?’ 23 24 25 12. Importieren sie nicht eine Importieren sie keine Mangelware? 26 27 28 Mangelware? ‘Are they not importing a scarce 29 30 ‘Aren’t they importing a scarce commodity?’ 31 32 33 commodity?’ 34 35 36 13. Besuchen sie nicht eine Besuchen sie keine Rummelmeile? 37 38 Rummelmeile? 39 ‘Are they not visiting a fun fair / 40 41 ‘Aren’t they visiting a fun fair / amusement mile?’ 42 43 amusement mile?’ 44 45 46 47 14. Planen sie nicht einen Angelweiher? Planen sie keinen Angelweiher? 48 49 50 ‘Aren’t they planning a fishing pond?’ ‘Are they not planning a fishing pond?’ 51 52 53 15. Tanzen sie nicht einen Ringelreihen? Tanzen sie keinen Ringelreihen? 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 100 of 106

1 2 3 ‘Aren’t they dancing ring-a-ring-a- ‘Are they not dancing ring-a-ring-a- 4 5 6 roses?’ roses?’ 7 8 9 16. Errichten sie nicht ein Errichten sie kein Marmormahnmal? 10 11 Marmormahnmal? 12 ‘Are they not erecting a marble 13 14 ‘Aren’t they erecting a marble memorial?’ 15 16 memorial?’ 17 18 19 For Peer Review 20 21 Appendix B: Accents 22 23 24 25 Figure AppendixB1 shows the raw number of all accent type realizations by condition 26 27 (proposition and certainty) for the separate constituents and sentence types. Table C lists 28 29 the number and percentage of realizations with the most frequent accent realization for all 30 31 32 constituents in the English data. 33 34 [insert Figure AppendixB1] 35 36 37 Table C. Proportion of most frequent accent realizations in the English data by 38 39 constituent, checked proposition, certainty and morpho-syntactic question type, given in 40 41 42 percent, rounded, and absolute number of realizations (in brackets). 43 44 45 Constituent Realization Checked Certainty In HiNQs In 46 47 proposition LowNQs 48 49 50 51 Auxiliary L+H*< Speaker’s/p High/90% 71% (84) 5% (4) 52 53 54 Speaker’s/p Low/60% 75% (89) 5% (4) 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 101 of 106 Language and Speech

1 2 3 4 Addressee’s/p High/90% 82% (93) 8% (6) 5 6 7 Addressee’s/p Low/60% 80% (97) 9% (6) 8 9 10 11 Negation No accent Speaker’s/p High/90% 100% (118) 4% (3) 12 13 14 Speaker’s/p Low/60% 100% (119) 4% (3) 15 16 17 Addressee’s/p High/90% 100% (114) 9% (7) 18 19 For Peer Review 20 21 Addressee’s/p Low/60% 100% (122) 10% (7) 22 23 24 Compound L* Speaker’s/p High/90% 94% (111) 82% (61) 25 26 27 noun 28 Speaker’s/p Low/60% 92% (110) 81% (59) 29 30 31 Addressee’s/p High/90% 91% (104) 89% (68) 32 33 34 Low/60% 93% (113) 84% (57) 35 Addressee’s/p 36 37 38 Verb L* Speaker’s/p High/90% 36% (43) 28% (21) 39 40 41 Speaker’s/p Low/60% 30% (36) 32% (23) 42 43 44 45 Addressee’s/p High/90% 39% (44) 28% (21) 46 47 48 Addressee’s/p Low/60% 36% (44) 38% (26) 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 102 of 106

1 2 3 For negations in the English data, Pearson’s chi-squared test with Yates’ continuity 4 5 6 correction indicated that the presence of an accent on the negation was significantly 7 8 influenced by question type, χ2 (1) = 678.57, p < 0.001, i.e. there were significantly fewer 9 10 accented negations in HiNQ than in LowNQ. However, this could not be confirmed with 11 12 linear mixed-effects modelling due to the lack of variation in the HiNQ productions, 13 14 15 which constituted the majority of the data. Therefore, we did not use question type as a 16 17 predictor in these models. The resulting best binomial linear mixed-effects model did not 18 19 indicate any significantFor effects ofPeer checked proposition Review or certainty, either. The main 20 21 22 effect of proposition included in the model was not significant, β = -0.110, SE = 0.319, z 23 24 = -0.34, p = 0.731. The model further contained a significant fixed effect of trial, β = 25 26 0.026, SE = 0.005, z = 5.49, p < 0.001, suggesting that participants accented negations (in 27 28 29 LowNQs) more often over the course of the experimental session, as well as a by- 30 31 participant slope of checked proposition in the random structure. 32 33 For utterance-final compound nouns, the best linear fixed-effects model indicated that the 34 35 most frequent accent category, L*, appeared more often in HiNQs than in LowNQs, β = 36 37 38 15.778, SE = 5.403, z = 2.92, p = 0.004, but no other effects were significant (trimming 39 40 removed 49 data points or 6%). 41 42 In evaluating accentuation of verbs in the English data, we found no significant effect of 43 44 45 any of the predictors. We contrasted the most frequent realization, a low L* accent, with 46 47 a second category comprising all other realizations. Note that there were not enough data 48 49 points to test a three-way interaction between proposition, certainty and question type. 50 51 52 [insert Figure AppendixB2] 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 103 of 106 Language and Speech

1 2 3 Table D. Proportion of most frequent accent realizations in the German data by 4 5 6 constituent, checked proposition, certainty and morpho-syntactic question type, given in 7 8 percent, rounded, and absolute number of realizations (in brackets). 9 10 11 Constituent Realization Checked Certainty In HiNQ In LowNQ 12 13 14 proposition 15 16 17 Negation No accent Speaker’s/p High/90% 59% (79) 12% (7) 18 19 For Peer Review 20 Speaker’s/p Low/60% 52% (66) 14% (9) 21 22 23 High/90% 44% (26) 9% (12) 24 Addressee’s/p 25 26 27 Addressee’s/p Low/60% 56% (45) 8% (9) 28 29 30 Compound L*+H Speaker’s/p High/90% 74% (98) 49% (29) 31 32 33 noun 34 Speaker’s/p Low/60% 72% (92) 43% (28) 35 36 37 Addressee’s/p High/90% 69% (41) 36% (48) 38 39 40 41 Addressee’s/p Low/60% 74% (59) 36% (40) 42 43 44 Verb H* Speaker’s/p High/90% 53% (70) 44% (26) 45 46 47 Speaker’s/p Low/60% 49% (62) 38% (25) 48 49 50 51 Addressee’s/p High/90% 47% (28) 41% (55) 52 53 54 Addressee’s/p Low/60% 41% (33) 45% (50) 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 104 of 106

1 2 3 4 5 6 For German (see Figure AppendixB2, Table D), linear mixed-effects modelling of 7 8 accentuation of negation words indicated that checked proposition and certainty did not 9 10 have a significant effect. Instead, the best model only contained an effect of question 11 12 type, β = -3.932, SE = 0.332, z = -11.85, p < 0.001 (trimming removed 16 data points or 13 14 15 2% or the data). This effect suggested that participants accented negations overall less 16 17 often in HiNQs than in LowNQs, i.e. HiNQs had a higher proportion of unaccented 18 19 realizations. For Peer Review 20 21 22 German sentence-final compound nouns were most often realized with rising L*+H 23 24 accents. This realization was overall more frequent in HiNQs than in LowNQs, as shown 25 26 by the best binomial linear mixed-effects model, β = 1.818, SE = 0.367, z = 4.95, p < 27 28 29 0.001 (the model additionally contained a by-participant effect of question type). Thus, 30 31 the accent L*+H was the overall most frequent realization for final compound nouns in 32 33 both question types, though it accounted for 73% in HiNQs and only for 39% in LowNQs 34 35 (N = 290 vs. N = 145). Finally, the model included a fixed effect of trial, indicating fewer 36 37 38 L*+H accents later in the experimental sessions, β = -0.011, SE = 0.005, z = -2.17, p = 39 40 0.030. 41 42 For the accentuation of verbs in the German data, we evaluated a binary dependent 43 44 45 variable distinguishing H* accents from all other realizations. Again, no significant effect 46 47 of any of the predictors emerged. However, an insignificant fixed effect of question type, 48 49 β = 0.365, SE = 0.273, z = 1.33, p = 0.182, was retained in the final model due to the 50 51 52 presence of a by-participant effect of question type. In addition to the most frequent 53 54 realization with H* accents (N = 349 or 46%), German verbs also appeared with low L* 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 105 of 106 Language and Speech

1 2 3 accents (N = 203 or 26%), unaccented (N = 90 or 12%), with rising L+H* accents (N = 4 5 6 87 or 11%), with rising L*+H accents (N = 36 or 5%), and with H+!H* and H+L* 7 8 accents (one realization each). 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Language and Speech Page 106 of 106

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure AppendixB1. Number of realization of different accent categories by constituent, morpho-syntactic 46 question type and condition crossing proposition checked (speaker’s/p vs. addressee’s/¬p) and certainty 47 regarding the proposition (high 90% certain vs. low 60% certain) in the English data. 48 215x279mm (300 x 300 DPI) 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las Page 107 of 106 Language and Speech

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure AppendixB2. Number of realization of different accent categories by constituent, morpho-syntactic 46 question type and condition crossing proposition checked (speaker’s/p vs. addressee’s/¬p) and certainty 47 regarding the proposition (high 90% certain vs. low 60% certain) in the German data. 48 215x279mm (300 x 300 DPI) 49 50 51 52 53 54 55 56 57 58 59 60 https://mc.manuscriptcentral.com/las