SEQUENTIAL COLLABORATION 1

Sequential Collaboration: Comparing the Accuracy of Dependent, Incremental Judgments to Wisdom of Crowds

Maren Mayer¹ & Daniel W. Heck²

¹ University of Mannheim

² Philipps University of Marburg

Author Note

Maren Mayer, Department of Psychology, School of Social Sciences, University of Mannheim, Germany. https://orcid.org/0000-0002-6830-7768

Daniel W. Heck, Department of Psychology, Philipps University of Marburg, Germany. https://orcid.org/0000-0002-6302-9252

Data and R scripts for the analyses are available at the Open Science Framework (https://osf.io/96nsk/).

The present work was presented at the 62nd Conference of Experimental Psychologists (Virtual TeaP, 2021). The present manuscript has not yet been peer reviewed. A preprint was uploaded to PsyArXiv and ResearchGate for timely dissemination (version from June 2, 2021).

This work was supported by the Heidelberg Academy of Sciences and Humanities (WIN project Shared Data Sources) and the Research Training Group “Statistical Modeling in Psychology” funded by the German Research Foundation (DFG grant GRK 2277).

The authors made the following contributions. Maren Mayer: Conceptualization, Investigation, Methodology, Writing - Original Draft, Writing - Review & Editing; Daniel W. Heck: Conceptualization, Methodology, Writing - Review & Editing.

Correspondence concerning this article should be addressed to Maren Mayer, B6, 30-32, 68169 Mannheim. E-mail: [email protected]

Abstract

In recent years, online collaborative projects in which users generate extensive knowledge bases such as Wikipedia or OpenStreetMap have become increasingly popular while yielding highly accurate information. Collaboration in such projects is organized sequentially with one contributor creating an entry and the following contributors deciding whether to adjust or maintain the presented information. We refer to this process as sequential collaboration as individual judgments are dependent on the latest judgment. As sequential collaboration has not yet been examined systematically, we investigated whether dependent incremental judgments as obtained in sequential collaboration become increasingly accurate and whether the final judgments are more accurate than estimates obtained with equally large groups in wisdom of crowds. For this purpose, we conducted three preregistered studies with groups of four to six contributors using general knowledge questions as well as geographic maps on which cities had to be positioned. As expected, individual judgments in sequential collaboration became more accurate and the final group estimates were slightly more accurate than those based on aggregated judgments in wisdom of crowds. These results show that sequential collaboration can profit from dependent incremental judgments, thereby extending the literature on dependent judgments and shedding light on collaboration in large-scale online collaborative projects.

Keywords: judgment and decision making, teamwork, mass collaboration, group decision making

Sequential Collaboration: Comparing the Accuracy of Dependent, Incremental Judgments to Wisdom of Crowds

Collaborative online projects that provide user-generated content have become a popular source for gathering and acquiring information over the last twenty years. The most prominent example is Wikipedia, an online encyclopedia that allows users to contribute semantic information on various topics in the form of structured articles (Wikipedia Contributors, 2021). Another, less well-known example of online collaboration is OpenStreetMap, a collaborative project that aims at generating a comprehensive, open, and free-to-use map of the world (OpenStreetMap Contributors, 2021). OpenStreetMap comprises not only numeric geographical information about the locations of objects, such as coordinates, but also semantic information such as names of streets, areas, and buildings as well as other useful information (e.g., addresses or websites of shops and restaurants). Giles (2005) showed that Wikipedia is very accurate in general. Moreover, its information on certain topics such as cancer or certain drugs is about as accurate as official health information or textbooks (Kräenbring et al., 2014; Leithner et al., 2010). Comparisons of OpenStreetMap with commercial map providers or governmental sources also revealed comparable accuracy (Girres & Touya, 2010; Zheng & Zheng, 2014; Zielstra & Zipf, 2010).

The high accuracy of Wikipedia and other online collaborative projects has often been attributed to wisdom of crowds (Arazy et al., 2006; Baeza-Yates & Saez-Trumper, 2015; Chen et al., 2010; Kittur et al., 2007; Kittur & Kraut, 2008; Niederer & Dijck, 2010). However, wisdom of crowds refers to a technique of aggregating independent individual judgments (Galton, 1907; Larrick & Soll, 2006; Surowiecki, 2004). The high accuracy of judgments in wisdom of crowds is due to the central limit theorem, which ensures that errors in independent, individual judgments cancel out (Hogarth, 1978). Wisdom of crowds has been shown to yield highly accurate estimates for various tasks and contexts (Hueffer et al., 2013; Keck & Tang, 2020; Larrick & Soll, 2006; Steyvers et al., 2009; Wagner & Vinaimont, 2010). Aggregating independent individual judgments is especially successful when judgments bracket the true answer (Larrick & Soll, 2006; Simmons et al., 2011) and are negatively correlated and unbiased (Davis-Stober et al., 2014; Keck & Tang, 2020).
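The role of bracketing can be illustrated with a small numerical sketch. The judgments and the true value below are hypothetical and chosen only for illustration (Python is used here for convenience; the analyses reported in this manuscript were run in R). When judgments fall on both sides of the true value, the error of the averaged judgment is smaller than the average individual error; when all judgments err in the same direction, averaging yields no gain.

```python
from statistics import mean


def mean_abs_error(judgments, truth):
    """Average absolute error of the individual judgments."""
    return mean(abs(j - truth) for j in judgments)


def crowd_error(judgments, truth):
    """Absolute error of the unweighted average (the crowd estimate)."""
    return abs(mean(judgments) - truth)


truth = 300  # hypothetical true value
bracketing = [250, 280, 330, 360]      # judgments on both sides of the truth
non_bracketing = [320, 340, 360, 380]  # all judgments overestimate the truth

for label, judgments in [("bracketing", bracketing),
                         ("non-bracketing", non_bracketing)]:
    print(label,
          "| mean individual error:", mean_abs_error(judgments, truth),
          "| error of the average:", crowd_error(judgments, truth))
```

By the triangle inequality, the error of the average can never exceed the average individual error, so averaging cannot hurt on this measure; bracketing determines how much it helps.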

In contrast, judgments in online collaborative projects are not collected independently and then aggregated afterwards, but rather elicited in a dependent and sequential manner. Instead of providing independent individual judgments, contributors encounter already existing entries and decide whether to change the presented information, which reflects the latest version of an entry, or to leave it as it is. We refer to this way of collaborating as sequential collaboration.

In the following, we will first describe the process of sequential collaboration, distinguish it from other forms of collaboration, and embed it into existing research on dependent judgments, which has shown both positive and detrimental effects of dependency. Furthermore, we compare sequential collaboration and wisdom of crowds to highlight why eliciting incremental, dependent judgments in sequential collaboration can be beneficial for judgment accuracy compared to aggregating independent judgments. In three studies, two of them preregistered, we used general knowledge questions and maps on which cities should be positioned to test whether sequential collaboration yields improved judgments within small groups of four to six contributors. Moreover, we tested whether the final judgments at the end of a sequential chain are more accurate than estimates obtained by aggregating independent individual judgments in wisdom of crowds. In line with our hypotheses, we found that judgment accuracy increased over the course of a sequential chain and that sequential collaboration yielded more accurate results than wisdom of crowds in two of the three studies.

Sequential Collaboration

As outlined above, collaboration in collaborative online projects is organized sequentially by making incremental changes to the latest available information. Sequential collaboration starts with one contributor creating an initial independent entry. The following contributors who encounter this entry can then decide whether to adjust or maintain the presented information. Whenever the entry is changed, the information is updated such that only the latest version of the entry is presented to the following contributors. For example, a first contributor might answer the question “How tall is the Eiffel Tower?” with 420 meters. A second contributor encountering this judgment could simply maintain it while a third contributor might adjust the height to 290 meters. After several contributors have adjusted and maintained the judgment, the correct height of 300 meters may be entered. In the domain of geographical maps, the first contributor could create an initial entry by outlining the layout of a building not yet mapped in OpenStreetMap. While a second contributor could improve the outline of the building, a third might not change any information, and a fourth could add semantic tags to describe that the building belongs to a university. Throughout several sequential steps of adjusting and maintaining the entry, the building might finally be represented by an adequate outline and be tagged as a university building with additional information such as the university’s website and address. The sequence of decisions whether to maintain or adjust entries made by a previous contributor forms a sequential chain. Figure 1 displays how group estimates are generated in sequential collaboration and in wisdom of crowds. In the former, the final estimate is the last judgment in a sequential chain generated by adjusting and maintaining previous judgments; in the latter, the aggregated estimate is obtained by averaging independent individual judgments.
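The two ways of forming a group estimate can be sketched in a few lines. The chain below follows the Eiffel Tower example from the text; the set of independent judgments is hypothetical and serves only to contrast the two procedures. The function names are illustrative and not taken from the authors' scripts.

```python
from statistics import mean


def sequential_estimate(chain):
    """Sequential collaboration: the group estimate is the last judgment.

    A maintained judgment is represented by repeating the previous value.
    """
    return chain[-1]


def crowd_estimate(judgments):
    """Wisdom of crowds: unweighted mean of independent judgments."""
    return mean(judgments)


true_height = 300  # meters, as stated in the example above

# First entry 420 m, maintained by the second contributor, adjusted to
# 290 m by the third, and adjusted to 300 m by a later contributor.
chain = [420, 420, 290, 300]

# Hypothetical independent judgments from an equally large group.
independent = [420, 350, 290, 260]

print("sequential collaboration:", sequential_estimate(chain))
print("wisdom of crowds:", crowd_estimate(independent))
```

Note that the sequential estimate depends only on the final position in the chain, whereas every judgment enters the crowd estimate with equal weight.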

Even though sequential collaboration is performed by a group of individuals and shares some aspects with other forms of group decision making, it also has some unique features that distinguish it from other forms of collaboration. In research on group decision making, group work usually takes place simultaneously (Kerr & Tindale, 2004; Lu et al., 2012; Stasser & Titus, 1985), even though interaction does not necessarily take place in person (Dennis, 1996; Dennis et al., 1998; Lu et al., 2012). In a paradigm organized like this, all members of the group have the opportunity to listen to all judgments and opinions, ask other group members questions, and share reasons for judgments and other information. In sequential collaboration, however, information is shared only by adding to or correcting the judgment of a previous contributor, which implies that the dependency between judgments is limited to the displayed information. Furthermore, direct interactions with other contributors are neither necessary nor possible in sequential collaboration, and additional information such as the number of adjustments already made to this information or the reasons why information was adjusted is initially not available.

Figure 1. Illustration of forming a group estimate in (a) wisdom of crowds compared to (b) sequential collaboration. [Panel (a), Wisdom of Crowds: individual judgments 120, 250, 250, 300; aggregate estimate: 240. Panel (b), Sequential Collaboration: judgments 120, 250, 250, 300; final estimate: 300.]

A form of collaboration similar to sequential collaboration is the Delphi method (Dalkey & Helmer, 1963; Geist, 2010; Jeste et al., 2010). The Delphi method was designed to obtain judgments on a given topic from a group of experts who do not interact directly. After each expert has provided a judgment and the reasons for it, a moderator combines all judgments in a report. This report is sent to all experts, who can then revise their judgments based on the judgments and information included in the report. When experts have reached a sufficient consensus, the individual judgments are aggregated to a final result. Both forms of collaboration are similar in that contributors do not interact directly with one another. However, in sequential collaboration, contributors are not presented with judgments of multiple other contributors and do not get to know the reasons for specific judgments. Moreover, contributors are not necessarily required to provide a judgment, and even if they do, they may not notice when their judgment is in turn adjusted by others. Finally, the Delphi method focuses on eliciting judgments by a group of experts, whereas in sequential collaboration, neither the specific contributors nor the number of contributors has to be predefined.

Possible issues and benefits of sequential collaboration.

Even though sequential collaboration seems to be a successful way of integrating judgments of various individuals, the process of sequentially deciding whether to adjust or maintain a previous judgment has not been systematically examined yet. Nonetheless, there are findings on related phenomena that can be applied to sequential collaboration and allow us to derive testable predictions.

Possible issues for the accuracy of sequential collaboration may arise from the anchoring effect (Tversky & Kahneman, 1974). Anchoring describes the robust phenomenon that a presented numerical value influences a subsequent, often unrelated numerical judgment (Mussweiler et al., 2004). This effect may undermine the accuracy of sequential collaboration because adjustments made to a previous judgment may be systematically biased toward that judgment. Especially when the previous judgment heavily over- or underestimates the correct value, anchoring might affect later judgments, which may in turn delay or prevent other contributors from arriving at accurate, unbiased estimates.

The conditions under which information provided by others is considered in forming a judgment have been extensively studied in the advice-taking literature (Bonaccio & Dalal, 2006). A typical finding in advice taking is egocentric discounting, which describes the phenomenon that advice is generally underweighted relative to one’s own initial judgment (Bonaccio & Dalal, 2006; Yaniv & Kleinberger, 2000). This results in less accurate judgments compared to equally weighting the advice and one’s own judgment. In sequential collaboration, egocentric discounting could lead contributors to adjust the presented previous judgment mainly according to their prior beliefs, which in turn could be detrimental to accuracy as the chain may not converge to the correct answer. However, advice taking improves when no initial individual judgment is formed before receiving advice (Koehler & Beauregard, 2006). This resembles the situation in sequential collaboration more closely since contributors are directly confronted with the previous judgment and do not have to form an initial, independent judgment. Hence, contributors in sequential collaboration may be more likely to accept previous judgments than participants in the standard advice-taking paradigm.

Previous research also provides preliminary evidence in favor of the accuracy of sequential collaboration. Providing participants with a frame of reference improves subsequent judgments, especially because it prevents extreme judgments (Bonner et al., 2007; Laughlin et al., 1999). Thus, previous judgments in a sequential chain may serve as a frame of reference that prevents extreme judgments and fosters reaching an accurate estimate earlier. However, especially at the beginning, judgments by the previous contributors may not provide an accurate frame of reference.

Providing judgments of other individuals can also improve the accuracy of wisdom of crowds. Imitating successful individuals leads to more accurate judgments (King et al., 2012), and discussions in dyads also improve judgments but only when initial independent judgments are formed (Minson et al., 2017). Moreover, Becker et al. (2017) showed that information about others’ judgments is beneficial when this information equally weights all other judgments (as opposed to overweighting the judgment of a single, highly influential individual). Given that individual judgments can be improved by providing judgments of others, providing contributors in sequential collaboration with previous judgments may lead to more accurate judgments. Especially the finding that imitating successful individuals improves accuracy (King et al., 2012) is relevant for sequential collaboration as contributors may often be presented with the currently best judgment in the sequential chain. Moreover, contributors do not need to actively imitate successful individuals; they can simply maintain the presented judgment and thereby imitate it. However, while King et al. (2012) selected the currently most accurate judgment from a large pool of independent judgments, the judgments presented in a sequential chain are not necessarily very accurate, especially if only a few contributors have encountered and edited the entry. It is also unlikely that the judgments of previous contributors are equally weighted as in the study by Becker et al. (2017); instead, single judgments can be dominant.

Sequential collaboration may also benefit from the fact that in group work, not all group members contribute to a given task equally and some do not contribute at all (free-rider effect, Bray et al., 1978) and that group members often contribute less the more they feel that their contribution is dispensable (Kerr & Bruun, 1983). Such effects may also be observed in sequential collaboration since contributors can decide not to adjust a previous judgment when they do not feel confident that they can substantially contribute to it. This mechanism could in turn improve accuracy since giving respondents the possibility to select the questions to be answered improves accuracy in wisdom of crowds (Bennett et al., 2018). The fact that contributors can self-select which judgments to adjust may thus lead to a higher accuracy of the resulting judgments. However, this requires that contributors can accurately distinguish which judgments to maintain (assuming they cannot substantially contribute to them) and which judgments to adjust (assuming they can improve the present state of an entry).

Hypotheses

Overall, we hypothesize that the probability that a judgment is changed decreases (Hypothesis 1a) while the accuracy of judgments increases (Hypothesis 1b) over the course of a sequential chain. These two hypotheses form the basis of sequential collaboration and need to be tested before further examining sequential collaboration and comparing its accuracy to that of wisdom of crowds.

Given its high accuracy, wisdom of crowds can be used as a benchmark for other forms of collaboration. We expect that sequential collaboration yields more accurate results than wisdom of crowds (Hypothesis 2). As discussed above, sequential collaboration may profit from the possibility that contributors are not required to adjust the displayed information (Bennett et al., 2018). Instead, contributors who are not confident may perceive their own judgments to be dispensable (Kerr & Bruun, 1983) and in turn not adjust the presented judgment. Furthermore, the accuracy of sequential collaboration should be higher than that of wisdom of crowds given that providing information about the judgments of others can improve judgments (Becker et al., 2017; King et al., 2012; Minson et al., 2017). However, sequential collaboration cannot profit from the central limit theorem (Hogarth, 1978) or from positive effects due to negatively correlated judgments (Davis-Stober et al., 2014; Keck & Tang, 2020). Taken together with the fact that wisdom of crowds is already known to yield highly accurate estimates, we expect only small effect sizes for the comparison of sequential collaboration and wisdom of crowds.

To test our hypotheses, we conducted three preregistered online experiments using short chains of four to six contributors in sequential collaboration and respective group sizes for wisdom of crowds. The material comprised general knowledge questions with numerical judgments in the first two experiments and geographic maps on which contributors had to position cities in the last experiment.

Experiment 1

Prior to conducting the three experiments reported in the present manuscript, we conducted a pilot study to develop and pretest the experimental paradigm. Based on the results of the pilot study, we improved the items, the experimental design, and the countermeasures aimed at ensuring participants’ compliance with the task. Thereby, we ensured the collection of valid data for testing our hypotheses and limited the number of data exclusions due to non-compliant behavior of the participants.

Method

Materials. We presented 65 difficult general knowledge questions such as “How tall is the Eiffel Tower?” or “When was Leonardo da Vinci born?” to the participants. The questions were taken from an item pool of general knowledge questions (Pohl, 1998) and updated with contemporary information whenever necessary. The median percentage of correctly answered questions was 0.53% (MAD = 0.78%), indicating that the questions were indeed difficult for participants to answer. All items, their correct numerical answers, and the unit in which the answer had to be given are provided in Table A1 in the Appendix.

Participants. For this online study, 310 German college students were recruited via a German panel provider. In order to control data quality while collecting the data, participants who changed their browser window or switched to other programs more than five times were excluded during participation. Based on the results of the pilot study, we suspected participants of looking up answers when more than 10% of the questions were answered correctly. This was the case for three participants, who were not considered for building sequences and whose data were excluded from the analysis. One participant was excluded due to irregular answer patterns in more than 10% of the questions (i.e., answering with number series such as “23456”). Lastly, two participants were excluded since the same position in a sequential chain was assigned to two participants due to a technical issue. In these cases, we kept the data of the participant whose data were used throughout the rest of the sequential chain. Our final sample comprised 304 participants, of whom 76.32% were female, 22.70% were male, and 0.99% identified as diverse. The mean age of the sample was 23.82 years (SD = 3.20).

Design and Procedure. Participants were randomly assigned to either the wisdom-of-crowds questionnaire (193 participants) or the sequential-collaboration questionnaire (111 participants). After consenting to the study, they were introduced to the task. When answering the wisdom-of-crowds questionnaire, participants were presented with one general knowledge question at a time and were required to answer the question in an open text box before they could proceed to the next question. When answering the sequential-collaboration questionnaire, participants were also presented with one general knowledge question at a time. Additionally, the answer of a previous participant was given below the question. Then, participants were asked whether they would like to adjust or maintain the given judgment. Only if participants decided to adjust the presented judgment did the text box appear in which the new judgment could be entered before proceeding to the next question. Figure 2 displays the design of the question for (a) the wisdom-of-crowds questionnaire and (b) the sequential-collaboration questionnaire. Questions in both questionnaires were presented in random order, and the unit in which the judgment had to be given was provided next to the open text box. After answering all general knowledge questions, participants indicated demographic variables before being thanked for participation and debriefed. To prevent participants from looking up answers, we implemented a time limit of 30 seconds to enter a judgment in both conditions. Additionally, we implemented a waiting time of two seconds in the sequential-collaboration condition to prevent participants from clicking through the study.

Since sequences in sequential collaboration require initial judgments which can be presented to the first participant in a sequential chain, we used participants who answered the wisdom-of-crowds questionnaire to initialize sequences in sequential collaboration. Hence, 37 participants who completed the wisdom-of-crowds questionnaire served to initialize sequences. Since participants are included in only one experimental condition for the analysis, this resulted in 156 participants in the wisdom-of-crowds condition and 148 participants in the sequential-collaboration condition. We used a sequence length of four, meaning that a sequential chain in this experiment consists of one participant who completed the wisdom-of-crowds questionnaire followed by three participants who completed the sequential-collaboration questionnaire consecutively. Each participant was presented with only the latest judgment in the sequential chain.

Figure 2. Questionnaire for (a) wisdom of crowds and (b) sequential collaboration in Experiment 1 and Experiment 2.

Results

Before analyzing the data, we excluded judgments that were timed out after 30 seconds. We identified 315 judgments that were timed out by the experimental software, which resulted in the exclusion of 810 judgments in total since sequential chains containing a judgment that was timed out were excluded completely. From 19,760 judgments, 18,950 judgments remained after the exclusion.
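The chain-wise exclusion rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the field names (`chain`, `item`, `timed_out`) are hypothetical, independent wisdom-of-crowds judgments are marked with `chain = None`, and the unit of exclusion is assumed to be the sequence of judgments that one chain produced for one item.

```python
def exclude_timed_out(rows):
    """Drop timed-out judgments and every chain-by-item sequence containing one.

    Assumption: a "chain" is excluded item-wise, i.e., a timed-out judgment
    removes all judgments of the same chain for the same item. Independent
    judgments (chain is None) are dropped individually.
    """
    flagged = {
        (r["chain"], r["item"])
        for r in rows
        if r["timed_out"] and r["chain"] is not None
    }
    return [
        r for r in rows
        if not r["timed_out"]
        and (r["chain"] is None or (r["chain"], r["item"]) not in flagged)
    ]


# Hypothetical toy data: one chain, two items, plus two independent judgments.
rows = [
    {"chain": 1, "item": "eiffel", "position": 1, "timed_out": False},
    {"chain": 1, "item": "eiffel", "position": 2, "timed_out": True},
    {"chain": 1, "item": "eiffel", "position": 3, "timed_out": False},
    {"chain": 1, "item": "davinci", "position": 1, "timed_out": False},
    {"chain": None, "item": "eiffel", "position": None, "timed_out": True},
    {"chain": None, "item": "davinci", "position": None, "timed_out": False},
]

kept = exclude_timed_out(rows)
print(len(rows), "judgments before exclusion,", len(kept), "after")
```

Excluding whole sequences explains why the number of excluded judgments (810) exceeds the number of timed-out judgments (315).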

Afterwards, we standardized the raw judgments item-wise to obtain comparable values over all items for the following analyses. To standardize the judgments, we subtracted the correct answer for each question from the raw judgments and divided the result by the standard deviation of judgments obtained with the wisdom-of-crowds questionnaire. This procedure offers several benefits: First, since judgments are given on vastly different scales (e.g., single digits for the length of a soccer goal, year dates for the year Leonardo da Vinci was born, or millions for the number of students enrolled in German universities), standardization makes the judgments for these questions comparable for the later analysis. Second, this standardization avoids issues arising with other standardization procedures, especially with logarithmic transformation. As some participants answered questions correctly, applying a logarithmic transformation to the difference of the raw judgment and the correct answer would result in a value of minus infinity. Additionally, the logarithmic transformation is not linear, thus weighting judgments differently depending on their value. Lastly, our transformation eases the interpretation of judgments since a negative value now indicates underestimation, a positive value indicates overestimation, and a value of zero indicates that the judgment was correct.

After standardizing the judgments, we removed the 1% most extreme judgments from the data as these judgments may distort the results obtained.¹ Excluding outliers between conditions is also recommended by André (2021), who demonstrated that excluding outliers within conditions can increase false-positive rates. We identified 190 outliers with this procedure. Again, we excluded these judgments as well as sequences containing judgments that were identified as outliers, which resulted in a final sample of 18,694 judgments.
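One way to identify the most extreme 1% of judgments is sketched below. The text does not define "most extreme"; this sketch assumes that extremity is measured by the absolute standardized judgment, pooled over both conditions, and that the number of outliers is rounded up. Under these assumptions, 1% of 18,950 judgments corresponds to 190 outliers, which matches the number reported above. The function name is hypothetical.

```python
import math


def flag_extreme(values, percent=1):
    """Return the indices of the `percent` most extreme values.

    Assumption: extremity is the absolute size of the standardized judgment,
    and the number of flagged values is rounded up.
    """
    n_out = math.ceil(len(values) * percent / 100)
    order = sorted(range(len(values)),
                   key=lambda i: abs(values[i]),
                   reverse=True)
    return set(order[:n_out])


# Hypothetical standardized judgments: 198 moderate values and 2 extreme ones.
values = [0.1] * 198 + [5.0, -7.0]
outliers = flag_extreme(values)
print(sorted(outliers))

# Number of outliers implied for the sample size reported in the text.
print(math.ceil(18950 * 1 / 100))
```

As with timed-out judgments, sequences containing a flagged judgment would then be removed as a whole, which is why the final sample shrinks by more than 190 judgments.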

Model-based analysis.

To test Hypothesis 1a, which states that the probability of change decreases over the course of a sequential chain, we only considered data obtained in the sequential-collaboration questionnaire (i.e., data of participants on position 2, 3, or 4 in the sequential chain) since only these participants were able to decide whether to change or maintain the presented judgments during the study. Thus, we fitted a generalized linear mixed model using R with the packages lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017). The decision to adjust or maintain a judgment served as dependent variable and chain position in the sequence as independent variable. Since the dependent variable can only be 0 (deciding not to adjust the presented judgment) or 1 (adjusting the presented judgment), we used a logit link function to handle the dichotomous dependent variable. Additionally, as every participant answered the same 65 items, we added random intercepts for items and participants to account for the nested structure of our data (Pinheiro & Bates, 2000). Lastly, we set polynomial contrasts to test for a decline in change rate with increasing chain position.

¹ This approach was proposed by a reviewer of an earlier version of this manuscript. We applied this routine to all studies reported in this manuscript and added an exploratory analysis to demonstrate the effects of excluding extreme judgments on the results of Hypothesis 2 (comparison of estimates obtained by wisdom of crowds and sequential collaboration).
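The model described above can be written compactly as follows. This is a sketch of the specification implied by the text, not a reproduction of the authors' code: the symbols are introduced here for illustration, and the scaling of the polynomial contrasts (orthonormal, as in R's default polynomial contrasts) is an assumption.

```latex
\operatorname{logit} \Pr(y_{ij} = 1)
  = \beta_0 + \beta_L \, c_L(k_i) + \beta_Q \, c_Q(k_i) + u_i + v_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad v_j \sim \mathcal{N}(0, \sigma_v^2)
```

Here, y_ij indicates whether participant i adjusted (1) or maintained (0) the presented judgment for item j, k_i ∈ {2, 3, 4} is the chain position of participant i, and u_i and v_j are the random intercepts for participants and items. For three chain positions, the linear and quadratic contrasts take the values c_L = (−1, 0, 1)/√2 and c_Q = (1, −2, 1)/√6, so that a negative β_L corresponds to a change rate that declines with chain position.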

Figure 3 displays the mean change rate for each chain position and the corresponding 95% confidence intervals. Even though the pattern was descriptively in line with Hypothesis 1a, the effect of chain position was significant for neither the linear trend (β = −0.311, CI = [−0.685, 0.063], z = −1.629, p = .103) nor the quadratic trend (β = 0.162, CI = [−0.213, 0.538], z = 0.846, p = .397). Thus, Hypothesis 1a was not supported.

363 To test whether judgments become more accurate over the course of a sequential

364 chain (Hypothesis 1b), we only considered data obtained in the sequential-collaboration

365 condition and used absolute values of the standardized judgments as dependent

366 measure to capture how far or close the judgments are to the true answer and, thus,

367 how accurate they are. We computed a linear mixed model with absolute standardized

368 judgments as dependent variable and chain position as independent variable. Since

369 fixed-effect coefficients in linear mixed models have been shown to be robust against

370 violations concerning the residual distribution (LeBeau et al., 2018; Schielzeth et al.,

371 2020) and our data contains natural zeros that are not adequately transformed using

372 typical transformations applied to left skewed data, we did not transform the dependent

373 variable to conform to these distributional assumptions. Furthermore, we added

374 random intercepts for participants and items to account for the nested structure of our

375 data, and set polynomial contrasts to test for a decrease in absolute standardized

376 judgments over the course of a sequential chain.

Figure 3 Change rate within a sequential chain

[Figure: change rate (y-axis) by chain position (x-axis); panels: Experiment 1 (chain positions 2 to 4) and Experiment 2 (chain positions 2 to 6).]

Note. Bars display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.

377 Figure 4 displays the mean absolute standardized judgments for each chain

378 position and the corresponding 95% confidence intervals. In line with Hypothesis 1b,

379 judgments became more accurate as the distance to the correct answer declined over the

380 course of a sequential chain. This pattern was also confirmed by the linear mixed model

381 showing a significant negative linear trend (β = −0.024, CI = [−0.036, −0.011],

382 t(143.55) = −3.800, p < .001), thus supporting Hypothesis 1b. Neither the quadratic

383 (β = 0.000, CI = [−0.012, 0.012], t(143.55) = 0.019, p = .985) nor the cubic trend

384 (β = 0.001, CI = [−0.011, 0.013], t(143.55) = 0.173, p = .863) was significant.

385 In order to test whether judgments obtained by sequential collaboration are

386 more accurate than judgments obtained by wisdom of crowds (Hypothesis 2), we first

387 generated the corresponding estimates. For sequential collaboration, the estimate for each

388 chain is the judgment at the last chain position. For wisdom of crowds, we computed

389 estimates by randomly assigning the participants to virtual groups of four and averaging

Figure 4 Accuracy of judgments within a sequential chain

[Figure: mean standardized judgments (y-axis) by chain position (x-axis); panels: Experiment 1 (chain positions 1 to 4) and Experiment 2 (chain positions 1 to 6).]

Note. Bars display the empirical means for each condition; error bars show the 95% between-subjects confidence intervals for each condition.

390 the standardized judgments for each question in these groups.² This procedure resulted

391 in estimates from 39 groups of participants in the wisdom-of-crowds condition and

392 estimates from 37 sequences of participants in the sequential-collaboration condition on

393 the 65 items presented in the study. We use the absolute value of the obtained

394 estimates as a dependent variable to assess the accuracy of judgments.
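The construction of the two kinds of estimates described above can be sketched as follows. This is a minimal illustration rather than the authors' R code: the function names, the data layout (one list of standardized judgments per participant), the fixed seed, and the handling of leftover participants are all assumptions.

```python
import random
from statistics import mean

def woc_estimates(judgments, group_size=4, seed=1):
    """Average standardized judgments within randomly formed virtual groups.

    judgments: dict mapping participant id -> list of judgments (one per item).
    Participants that do not fill a complete group are dropped (assumption).
    """
    ids = sorted(judgments)
    random.Random(seed).shuffle(ids)
    n_groups = len(ids) // group_size
    estimates = []
    for g in range(n_groups):
        members = ids[g * group_size:(g + 1) * group_size]
        n_items = len(judgments[members[0]])
        estimates.append([mean(judgments[m][i] for m in members)
                          for i in range(n_items)])
    return estimates

def sc_estimates(chains):
    """Use the judgments at the last chain position as the chain's estimate."""
    return [chain[-1] for chain in chains]

def abs_error(estimates):
    """Absolute standardized estimates: distance of each estimate from the truth."""
    return [[abs(e) for e in row] for row in estimates]
```

Repeating `woc_estimates` with different seeds corresponds to the robustness check over random groupings reported in the footnote.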

395 To analyze Hypothesis 2, we computed a linear mixed model with the absolute

396 value of the estimate as dependent variable and condition, either wisdom of crowds or

397 sequential collaboration, as independent variable. Additionally, we considered random

398 intercepts for items and group of participants from which the estimate was derived.

2 To check the robustness of the grouping, we computed the mean difference in absolute standardized estimates and the linear mixed model for 100 different random groupings. The mean difference in absolute standardized estimates was 0.004 (SD = 0.0004). The results of the linear mixed model remained the same for all 100 comparisons.

399 Figure 5 displays the mean absolute standardized estimates and the corresponding 95%

400 confidence intervals. Contrary to Hypothesis 2, estimates in the sequential-collaboration

401 condition showed a slightly higher mean absolute standardized estimate, meaning that these

402 judgments were descriptively farther away from the correct answers and thus less

403 accurate. However, the linear mixed model did not reveal a significant effect of

404 condition on the absolute standardized estimate (β = 0.008, CI = [−0.005, 0.022],

405 t(72.15) = 1.237, p = .220). These results do not support Hypothesis 2.

Figure 5 Comparison of estimates of wisdom of crowds and sequential collaboration

[Figure: mean absolute standardized estimate (y-axis) by condition (x-axis: WoC, Seq); panels: Experiment 1 and Experiment 2.]

Note. WoC = wisdom-of-crowds condition, Seq = sequential-collaboration condition. Bars display the empirical means for each condition; error bars show the 95% between-subjects confidence intervals for each condition.

406 Effect of outlier analysis on the results. Since extreme judgments and the

407 exclusion of extreme judgments may have affected estimates obtained by wisdom of

408 crowds and sequential collaboration differently, we additionally checked how varying

409 levels of outlier exclusion affect the results of the linear mixed model computed for

410 Hypothesis 2. We compared the model coefficient for the difference between conditions when

411 excluding between 0% and 5% of the most extreme judgments in

412 0.25% steps. The results of this exploratory analysis are displayed in Figure 6. The plot

413 shows that when excluding only a few extreme judgments, sequential collaboration was

414 more accurate. However, when excluding at least the 0.75% most extreme judgments,

415 wisdom-of-crowds estimates became more accurate than sequential-collaboration

416 estimates, which was even significant in more than half of the comparisons.

Figure 6 Effect of outlier removal on the comparison of wisdom of crowds and sequential collaboration

[Figure: model coefficient (y-axis) by percent of extreme judgments removed (x-axis, 0.00 to 0.05); panels: Experiment 1 and Experiment 2; points are marked as significant or not significant.]

Note. Positive values indicate that estimates are more accurate in wisdom of crowds, negative values indicate more accurate estimates in sequential collaboration.

417 Robustness analysis. Given the pronounced impact of outliers in the present

418 analysis, we also used observation-oriented modeling to perform a robustness analysis

419 (Grice et al., 2012, 2017; Grice, 2011). Observation-oriented modeling is a technique to

420 identify underlying patterns in the raw data by classifying observations according to

421 their fit to the predictions of the hypotheses. Instead of performing typical statistical

422 tests, effect sizes are derived directly from the data. For instance, the Percent

423 Classification Correct (PCC) describes the proportion of observations that are in line

424 with the hypothesis. This method focuses on patterns in the raw data rather than on

425 standardized or aggregated data and has a straightforward interpretation since the PCC

426 is always between 0% and 100%. Observation-oriented modeling is assumption free and

427 robust to outliers (Grice et al., 2012, 2017). Therefore, we perform all robustness

428 analyses on data (1) with exclusion of the 1% most extreme judgments as well as (2)

429 with no exclusion of extreme judgments to further provide insights into the robustness

430 of the reported effects. Since observation-oriented modeling has no requirements

431 concerning the distribution of the data, we perform all robustness analyses using raw

432 data.

433 We applied the PCC to check the robustness of Hypothesis 1a and Hypothesis

434 1b. For Hypothesis 1a, we computed the relative frequency of items for which the

435 change rate decreased over the course of a sequential chain. As in the model-based

436 analysis, we only considered chain positions 2, 3, and 4. As the strictest test, we

437 computed the relative frequency of items for which the change rate decreased

438 monotonically in every sequential step (strict monotonicity). To improve interpretation,

439 we additionally computed a benchmark for the expected relative frequency under the

440 condition that the hypothesis does not hold. Assuming that an increase and decrease in

441 the change rate have the same probability in every sequential step, we computed a

442 benchmark of 1/3! = 16.67% for the strict monotonicity. For data with the 1% most

443 extreme judgments excluded, 20% of all items (i.e., 13 items) showed a strictly

444 monotonic decrease in change rate. Analyzing the data with no outliers excluded, for

445 18.46% of all items (i.e., 12 items) the change rate decreased strictly monotonically.

446 Both of these PCCs exceeded the benchmark, indicating that Hypothesis 1a is

447 supported by the robustness analysis. Even though Hypothesis 1a was not supported in

448 the model-based analysis, the pattern of mean change rates already resembled what was

449 expected under the hypothesis. As a less strict criterion, we additionally computed the

450 relative frequency of items for which the change rate decreased from the second to the

451 last chain position (global monotonicity). For data with extreme judgments excluded,

452 64.62% of all items (i.e., 42 items) showed a lower change rate at the last chain position

453 than at the second chain position. A similar pattern emerged for data with no outlier

454 exclusion (66.15% of all items, i.e., 43 items), further supporting Hypothesis 1a.
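The strict and global monotonicity criteria for Hypothesis 1a, together with the 1/3! benchmark, can be sketched as follows. The helper names and the input layout (one list of change rates per item, ordered by chain position) are assumptions for illustration, not the authors' implementation.

```python
from math import factorial

def strictly_decreasing(rates):
    """True if the change rate drops at every sequential step."""
    return all(a > b for a, b in zip(rates, rates[1:]))

def globally_decreasing(rates):
    """True if the last change rate is lower than the first one considered."""
    return rates[-1] < rates[0]

def pcc(items, criterion):
    """Percent of items whose change rates satisfy the criterion."""
    return 100 * sum(criterion(r) for r in items) / len(items)

def strict_benchmark(n_positions):
    """Chance level if every ordering of the change rates is equally likely."""
    return 100 / factorial(n_positions)

# Toy data: change rates at chain positions 2, 3, and 4 for three items.
toy_items = [[0.30, 0.20, 0.10], [0.25, 0.30, 0.20], [0.10, 0.15, 0.20]]
strict_pcc = pcc(toy_items, strictly_decreasing)   # one of three items
global_pcc = pcc(toy_items, globally_decreasing)   # two of three items
```

Only one of the 3! = 6 possible orderings of three change rates is strictly decreasing, which gives the 16.67% benchmark; with five positions the same reasoning gives 1/5!.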

455 For Hypothesis 1b, we computed the PCC as the relative frequency of chains in

456 sequential collaboration for which the judgments become more accurate or remain

457 stable over the course of a sequential chain. Since participants can decide to maintain

458 judgments, we can only test for weak monotonicity by computing the relative frequency

459 of items for which the accuracy increased or remained stable in every sequential step.

460 Again, we computed a benchmark: Assuming that improving accuracy, keeping a

461 judgment, and worsening a judgment have equal probabilities, we would expect that

462 $0.66^3 = 28.75\%$ of all chains improve or remain stable for every sequential step under

463 the condition that the hypothesis does not hold. We found that for data excluding

464 (including) outliers, 79.61% (79.44%) of all chains had stable or improving accuracy for

465 every sequential step. The PCC clearly exceeded the benchmark and supported

466 Hypothesis 1b. Additionally, we computed the relative frequency of chains for which the

467 last judgment is more accurate than the first judgment (global monotonicity). This was

468 the case for 92.72% of all chains with extreme judgments removed and for 92.64% of all

469 chains when no judgments were removed, which further supported Hypothesis 1b.
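The weak monotonicity criterion and its benchmark can be sketched in the same way. The function names and the input (a list of absolute errors ordered by chain position) are assumptions for illustration. The default probability of 0.66 reproduces the benchmark as reported in the text; with the unrounded probability 2/3 the three-step benchmark would be about 29.63% instead.

```python
def weakly_improving(abs_errors):
    """True if accuracy improves or stays the same at every sequential step."""
    return all(b <= a for a, b in zip(abs_errors, abs_errors[1:]))

def globally_improving(abs_errors):
    """True if the last judgment is closer to the truth than the first."""
    return abs_errors[-1] < abs_errors[0]

def weak_benchmark(n_steps, p=0.66):
    """Chance level (in percent) if 'improve or keep' has probability p per step."""
    return 100 * p ** n_steps

# A chain of four judgments has three sequential steps.
reported = weak_benchmark(3)          # about 28.75
unrounded = weak_benchmark(3, 2 / 3)  # about 29.63
```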

470 To compare the accuracy of wisdom of crowds and sequential collaboration

471 (Hypothesis 2), we computed the common language effect size (McGraw & Wong,

472 1992). The common language effect size is a nonparametric measure, similar to the PCC,

473 that is defined as the probability that a randomly drawn value of the dependent

474 variable in one condition is smaller than a randomly drawn value in another condition.

475 To test the robustness of Hypothesis 2, we computed the relative frequency that an

476 estimate is more accurate in sequential collaboration than in wisdom of crowds by

477 comparing all estimates of both conditions item-wise. For data with the 1% most

478 extreme judgments removed, we found that for 52.37% of all comparisons estimates in

479 sequential collaboration were more accurate than wisdom of crowds. This relative

480 frequency was significantly larger than chance in a one-sample t-test

481 (t(64) = 1.705, p = .047). Similarly, when no outliers were excluded, for 54.94% of all

482 comparisons sequential collaboration had more accurate estimates

483 (t(64) = 3.871, p < .001). Even though the model-based analysis did not support

484 Hypothesis 2, the robustness analysis shows some support for Hypothesis 2.
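The item-wise comparison behind the common language effect size can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties are counted as not favoring sequential collaboration, and the t statistic is the plain one-sample formula tested against a chance level of 0.5.

```python
from math import sqrt
from statistics import mean, stdev

def item_share(seq_errors, woc_errors):
    """Share of all pairwise comparisons for one item in which the
    sequential-collaboration estimate is closer to the truth."""
    wins = sum(s < w for s in seq_errors for w in woc_errors)
    return wins / (len(seq_errors) * len(woc_errors))

def one_sample_t(values, mu=0.5):
    """One-sample t statistic and degrees of freedom against chance level mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1

# Toy example: per-item shares for four items (the study used 65 items).
shares = [0.4, 0.6, 0.5, 0.7]
t_value, df = one_sample_t(shares)
```

With one share per item, the degrees of freedom equal the number of items minus one, which matches the t(64) reported for 65 items.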

485 Discussion

486 Overall, Experiment 1 yielded mixed results for the hypotheses. Hypothesis 1b

487 was clearly supported by the model-based analysis as well as the robustness analysis.

488 However, Hypothesis 1a was only supported by the robustness analysis relying on

489 patterns in the raw data but not by the model-based analysis even though mean change

490 rates for each chain position resembled the expected pattern. Similarly, Hypothesis 2

491 was also not supported by the model-based analysis but by the robustness analysis.

492 Nonetheless, the difference in accuracy was very small and not significant, and the

493 analysis of the effect of removing outliers shows that sequential collaboration might only

494 yield more accurate results than wisdom of crowds when extreme judgments are not

495 removed from the data. This might be the case since sequential collaboration requires only

496 one participant who adjusts the presented judgment to eliminate extreme

497 judgments while averaging in wisdom of crowds requires far more judgments to yield

498 the same effect.

499 Experiment 1 has some limitations restricting the generalizability of the results.

500 First, the sample was a sample of college students. Thus, participants were similar in

501 age and educational background, which might have limited the possibilities to improve

502 judgments presented in sequential collaboration since the knowledge of this sample

503 might have been too homogeneous. Furthermore, we implemented a rather short chain

504 length of four. The results, however, might differ when using longer chains.

505 Experiment 2

506 To further examine sequential collaboration and address some of the limitations

507 of Experiment 1, we conducted a second experiment using the same material but

508 increasing the chain length from four to six and collecting an adult sample with no

509 restrictions in age or education. Thereby, we test the robustness of the findings,

510 especially concerning the continuous improvement of judgments within a sequential

511 chain, and further extend the paradigm to a different sample and a longer sequential

512 chain. The design and model-based analysis were preregistered at aspredicted.org.³

513 Note that we did not preregister Hypothesis 1a concerning the change rate within a

514 sequential chain. Moreover, based on feedback from previous reviews, we improved the

515 exclusion criteria for extreme judgments, and added an exploratory analysis to

516 investigate the effects of removing outliers. Furthermore, we conducted all analyses

517 concerning the accuracy of judgments (Hypothesis 1b and Hypothesis 2) using absolute

518 standardized judgments.

519 Method

520 Material, Design, and Procedure. For Experiment 2, we used the same

521 design (Figure 2) and material (Table A1) as in Experiment 1 but made some minor

522 adjustments. Since the sample was not restricted in age, we extended the time limit for

523 answering the questions from 30 to 40 seconds. Furthermore, we implemented a chain

524 length of six meaning that the first participant in a sequential chain answered the

525 wisdom-of-crowds questionnaire and was then followed by five participants answering

526 the sequential-collaboration questionnaire.

527 Participants. A German panel provider sampled 686 participants for this

528 study. During data collection, 21 participants were identified who entered more than

529 10% correct answers, were thus suspected of looking up answers, and were excluded from building

530 sequential chains and later analysis. Additionally, five participants with irregular

531 answer patterns were identified and excluded. One participant was excluded since the

532 position in the sequential chain was allocated to two different participants. After

533 excluding these participants and, if necessary, participants in the same sequential chain,

3 The preregistration form is available at https://aspredicted.org/blind.php?x=5q2n5z.

534 the final sample comprised 654 participants. Half of the participants were female

535 (50.00%), the mean age was 47.90 years (SD = 19.55). Most participants had a college

536 degree (27.68%), followed by a high-school diploma (25.69%), and vocational education

537 (22.78%), while 23.85% of all participants had a lower educational attainment.

538 Results

539 As preregistered and established in Experiment 1, we first identified and

540 excluded judgments that were timed out after 40 seconds and chains containing timed

541 out judgments. From 42,520 initial judgments, 40,814 judgments remained after this

542 exclusion. The remaining judgments were then standardized item-wise using the same

543 procedure as in Experiment 1 (subtracting the correct answer from the individual

544 judgment and dividing the result by the standard deviation as computed from the

545 wisdom-of-crowds questionnaire). Lastly, the 1% most extreme judgments and chains

546 containing these judgments were excluded from the data resulting in 40,125 judgments

547 for the analysis. We conducted the same analyses as in Experiment 1 for the

548 model-based analysis, for the exploration of the effect of excluding extreme judgments,

549 and for the robustness analysis.
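The item-wise standardization and the exclusion of the most extreme judgments can be sketched as follows. The function names are hypothetical, and two details are assumptions not confirmed by the text: the sample standard deviation is used, and extremity is ranked by the absolute standardized judgment. The additional removal of whole chains that contain excluded judgments is omitted here.

```python
from statistics import stdev

def standardize(judgments, correct, woc_judgments):
    """Subtract the correct answer and divide by the standard deviation
    of the wisdom-of-crowds judgments for the same item."""
    sd = stdev(woc_judgments)
    return [(j - correct) / sd for j in judgments]

def drop_most_extreme(z_values, share=0.01):
    """Remove the given share of judgments with the largest absolute value,
    keeping the remaining judgments in their original order."""
    n_drop = round(len(z_values) * share)
    by_extremity = sorted(range(len(z_values)), key=lambda i: abs(z_values[i]))
    keep = sorted(by_extremity[:len(z_values) - n_drop])
    return [z_values[i] for i in keep]

# Toy item with correct answer 10 and two wisdom-of-crowds judgments.
z = standardize([12, 8], correct=10, woc_judgments=[8, 12])
```

Calling `drop_most_extreme` with shares from 0 to 0.05 in steps of 0.0025 would mirror the exploratory sweep over exclusion levels.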

550 Model-based analysis.

551 We again estimated a generalized linear mixed model to test Hypothesis 1a with

552 the decision whether to adjust or maintain a judgment as dependent and chain position

553 as independent variable. Figure 3 displays the mean change rate for each chain position

554 with the corresponding confidence intervals. As hypothesized, the plot shows that the change

555 rate decreased over the course of a sequential chain. This pattern is also supported by

556 the generalized linear mixed model showing a significant negative linear trend

557 (β = −0.581, CI = [−0.982, −0.181], z = −2.843, p = .004). No other trend was

558 significant (β = 0.154, CI = [−0.245, 0.553], z = 0.759, p = .448 for the quadratic trend,

559 β = 0.085, CI = [−0.317, 0.487], z = 0.413, p = .679 for the cubic trend, and β = 0.189,

560 CI = [−0.209, 0.588], z = 0.932, p = .351 for the trend to the power of four).

561 Next, we estimated a linear mixed model with absolute standardized judgments

562 as dependent variable and chain position as independent variable to test Hypothesis 1b.

563 Supporting Hypothesis 1b, the model revealed a significant negative linear trend

564 between chain position and absolute standardized judgment (β = −0.045,

565 CI = [−0.054, −0.036], t(290.96) = −9.528, p < .001). All other trends were not

566 significant (β = 0.007, CI = [−0.002, 0.016], t(291.00) = 1.572, p = .117 for the

567 quadratic trend, β = −0.003, CI = [−0.012, 0.006], t(290.94) = −0.696, p = .487 for the

568 cubic trend, β = 0.001, CI = [−0.008, 0.010], t(290.94) = 0.174, p = .862 for the trend

569 to the power of four, and β = 0.001, CI = [−0.008, 0.010], t(291.02) = 0.184, p = .854

570 for the trend to the power of five). The significant linear trend is also displayed in

571 Figure 4, which shows a decrease in mean absolute standardized judgments over the course

572 of a sequential chain.

573 Before analyzing Hypothesis 2, we again computed the wisdom-of-crowds

574 estimates from randomly composed groups of six participants. Then we computed a

575 linear mixed model with absolute standardized estimate as dependent and condition as

576 independent variable as described in Experiment 1. Figure 5 displays the mean absolute

577 standardized estimates for the wisdom-of-crowds and the sequential-collaboration

578 condition showing that estimates obtained by sequential collaboration are slightly more

579 accurate than estimates obtained by wisdom of crowds. This impression was confirmed

580 by the linear mixed model showing a significant difference between absolute

581 standardized estimates in favor of sequential collaboration (β = −0.014,

582 CI = [−0.023, −0.005], t(100.71) = −3.067, p = .003), which supports Hypothesis 2.

583 Effect of outlier analysis on the results. To test the effect of excluding

584 extreme judgments on the results of the analysis for Hypothesis 2, we again computed

585 the linear mixed model for data excluding up to 5% of the most extreme

586 judgments. The results displayed in Figure 6 show that the effect is more stable in

587 Experiment 2 than in Experiment 1. While the accuracy advantage of sequential

588 collaboration compared to wisdom of crowds is stronger when no or only a small

589 proportion of outliers is excluded, the effect remains significant for almost all levels of

590 outlier exclusion applied to the data.

591 Robustness Analysis. To further strengthen the results of the model-based

592 analysis, we again computed a robustness analysis based on observation-oriented

593 modeling. To examine the robustness of Hypothesis 1a, we compared the change rates

594 over chain positions for each item. For data excluding (including) outliers, we found that

595 1.54% (3.08%) of all items showed a strictly monotonic decline in change rate. Even

596 though this result exceeds the benchmark of 1/5! = 0.83%, only one (two) item follows

597 the strict monotonicity. Thus, we also compared the change rate between the second

598 and last chain position. For 92.31% (90.77%) of all items we found that the change rate

599 decreases from the second to the last chain position, supporting Hypothesis 1a.

600 To test Hypothesis 1b, we computed the PCC for sequential chains that showed

601 improvement in accuracy. When excluding (including) outliers, 59.83% (59.24%) of all

602 chains show a weak monotonic increase in accuracy, which is more than the benchmark

603 of $0.66^5 = 12.52\%$. Additionally, for 89.59% (89.41%) of all chains judgments at the last

604 chain position were more accurate than judgments at the first chain position. These

605 results further support Hypothesis 1b.

606 Lastly, we computed the common language effect size to compare the accuracy of

607 estimates obtained by wisdom of crowds and sequential collaboration. For the subset of

608 data excluding outliers, we found that in 57.22% of all comparisons sequential

609 collaboration yielded more accurate results than wisdom of crowds which was

610 significantly larger than chance (t(64) = 4.526, p < .001). Similarly, when outliers

611 remained in the data, 59.58% (t(64) = 6.311, p < .001) of all comparisons were in favor

612 of sequential collaboration, which further supports Hypothesis 2.

613 Discussion

614 In Experiment 2, all hypotheses were supported using model-based analyses as

615 well as robustness analyses. For sequential collaboration, the results showed a

616 significant decrease in change rate over the course of a sequential chain while judgment

617 accuracy increased, which was also supported by our robustness analyses. Additionally,

618 estimates obtained by sequential collaboration were more accurate than estimates

619 obtained by wisdom of crowds. This effect was also stable for almost all tested

620 proportions of outlier exclusion and was also shown in the robustness analysis.

621 While these results seem promising for sequential collaboration, Experiments 1

622 and 2 both used the same material. This limits the generalizability of the obtained

623 results. Additionally, the general knowledge questions used in the experiments have

624 some limitations. First, the questions are prone to extreme judgments. For instance,

625 one participant answered 120,000,000,000,000,000 kilometers to the question “How long

626 is the mean distance between Earth and Moon?” for which the correct answer is

627 384,400 kilometers. Since the exclusion of outliers has an effect on the results obtained

628 such that sequential collaboration seems especially beneficial when extreme judgments

629 are considered, having extreme judgments in the data might distort the performance of

630 wisdom of crowds and sequential collaboration. Furthermore, general knowledge

631 questions rarely occur in online collaboration projects, which limits the

632 ecological validity of the conclusions. The results should therefore be replicated using

633 different material less prone to extreme judgments and closer to actual online

634 collaboration projects.

635 Experiment 3

636 In Experiment 3, we conceptually replicated Experiment 1. Instead of general

637 knowledge questions, we generated geographic maps on which participants had to give

638 judgments about the position of different cities. Hence, we focus on two-dimensional

639 location judgments (i.e., x- and y-coordinates) rather than one-dimensional numerical

640 judgments. In contrast to general knowledge questions, two-dimensional location

641 judgments on geographical maps are naturally constrained by the size of the map (more

642 precisely, by the maximum distance between the correct location and all possible

643 judgments) which limits the range of extreme judgments. Otherwise, the study design

644 was similar to Experiment 1 and only minor changes were applied due to the different

645 material. This study was also preregistered at aspredicted.org.⁴ In addition to the

646 preregistration, we also analyze whether the frequency of changes decreases over the

647 course of a sequential chain (Hypothesis 1a). Furthermore, we adjusted the outlier

648 analysis to the procedure we used in Experiment 1 and Experiment 2 and added the

649 analysis on the effect of outlier exclusion on the results of Hypothesis 2 as well as the

650 robustness analysis with observation-oriented modeling.

651 Method

652 Participants. We recruited 417 adult participants via a commercial German

653 panel provider. Since participants were presented with maps in the study, they were

654 supposed to only participate using a computer. Due to issues in the recruitment of

655 participants by the panel provider, 39 participants were nonetheless able to access and

656 complete the study using mobile devices. We excluded all participants using mobile

657 devices and sequences that included participants using mobile devices which resulted in

658 a total of 70 participants excluded. Additionally, four participants were able to access

659 and complete the study a second time. Therefore, we excluded the data collected at the

660 second participation. Since two of those participants were assigned to the

661 sequential-collaboration condition for their second participation and sequences were

662 built based on their judgments, we excluded another 10 participants in total. We also

663 checked whether participants looked up the correct answers or whether participants

664 clicked at a similar position for all items. We identified one participant who was

665 suspected of looking up answers. The final sample comprised 333 participants of whom

666 45.95% were female. The mean age was 45.49 years (SD = 15.17). Participants had a

667 diverse educational background with 35.44% holding a college degree, 24.92% having a

668 high school diploma, 24.02% having vocational education, and 18.32% having a lesser

669 educational attainment.

670 Material. As stimulus material, we selected seven maps displaying different

4 The complete preregistration form is available at https://aspredicted.org/blind.php?x=e7cm3e.

671 European countries (i.e., Italy, France, Germany, United Kingdom and Ireland, Austria

672 and Switzerland, Spain and Portugal, and, lastly, Poland, Czech, Hungary, and Slovenia

673 combined). All maps were on a scale of 1:5,000,000 with an image resolution of 800 x

674 500 pixels. Regarding the available geographic information, the maps only showed land

675 mass, oceans, and country borders (but no rivers, mountains, forests, or other cities).

676 The countries of interest were colored white while all other countries were colored gray;

677 oceans were colored blue and country borders were represented as black lines. For each

678 map, we selected several cities while considering the expected geographic knowledge of

679 German participants. This resulted in between four cities (for the map of Poland, Czech,

680 Hungary, and Slovenia) and seventeen cities (for the map of Germany). Overall, we

681 selected 57 cities across all seven maps. A comprehensive overview of the material can

682 be found in Table B1 in the Appendix. All presented maps are available in the

683 supplemental material at the OSF (https://osf.io/96nsk/).

684 Design and Procedure. We used a between-subjects design and randomly

685 assigned participants to either the sequential-collaboration questionnaire (112

686 participants) or the wisdom-of-crowds questionnaire (221 participants). After being

687 informed about the aim of and consenting to the study, participants were informed

688 about their task. In the wisdom-of-crowds questionnaire, participants were asked to

689 indicate the position of the given cities on the presented map as accurately as possible.

690 In the sequential-collaboration questionnaire, participants were provided with the

691 location judgment of a city given by a previous participant. Subsequently, they could

692 choose either to modify the given position by indicating a new position or to directly

693 continue to the next city without changing the current location judgment. The order in

694 which the seven maps were presented was randomized as was the order of the presented

695 city within each map. Furthermore, each trial asked about the position of only one city

696 such that participants provided only a single location judgment before continuing to the

697 next city. Participants were given 40 seconds to indicate the city’s position or to decide

698 to not change the presented position. Additionally, participants completing the

699 sequential-collaboration questionnaire had a waiting period of 2 seconds before they

700 could continue to the next city. After positioning the cities or deciding not to change

701 the presented position for all 57 cities, participants were asked for demographic

702 information. Lastly, they were debriefed and thanked for participation.

703 As in Experiment 1, we formed sequences of four participants meaning that one

704 participant who answered the wisdom-of-crowds questionnaire started a sequential chain

705 followed by three participants who completed the sequential-collaboration

706 questionnaire. This resulted in 183 participants in the wisdom-of-crowds condition and

707 150 participants in the sequential collaboration condition.

708 Results

709 Before testing the hypotheses, we computed the Euclidean distance to the correct

710 answer for each judgment as dependent variable for Hypothesis 1b and Hypothesis 2.⁵

711 Next, we excluded judgments that were timed out after 40 seconds. With this

712 procedure, we identified 225 judgments that were timed out; from originally 18,981

713 judgments, 18,433 judgments remained after excluding the timed out judgments and

714 sequential chains containing timed out judgments. As already applied in Experiment 1

715 and Experiment 2, we additionally excluded the 1% most extreme judgments (i.e., 184

716 judgments) as defined by the distance to the correct answer. After excluding these

717 extreme judgments and sequential chains that contained extreme judgments, 18,161

718 judgments remained for analysis. The model-based analysis as well as the exploration

719 concerning the effect of outlier exclusion on the results and the robustness analysis were

720 conducted analogous to the procedure in Experiment 1 and Experiment 2.
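The distance measure used as dependent variable can be sketched as follows. The function names are hypothetical, coordinates are assumed to be pixel positions on the 800 x 500 map image with the origin in one corner, and the second helper illustrates why location judgments are naturally bounded: the farthest possible judgment from any true position lies in one of the four map corners.

```python
from math import hypot

def pixel_distance(judgment, truth):
    """Euclidean distance in pixels between a judged and the true position."""
    return hypot(judgment[0] - truth[0], judgment[1] - truth[1])

def max_possible_distance(truth, width=800, height=500):
    """Largest distance any judgment on the map can have from the true position."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    return max(pixel_distance(corner, truth) for corner in corners)

# A judgment 3 pixels right and 4 pixels below the true position.
example = pixel_distance((103, 204), (100, 200))
```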

721 Model-based analysis.

722 To analyze Hypothesis 1a, we applied a generalized linear mixed model to the

723 data with whether a judgment was adjusted or maintained as dependent and chain

724 position as independent variable. Figure 7 displays the change rate for each chain

5 All hypotheses were also analyzed using the x- and y-coordinate separately as dependent variables. These analyses yielded the same results as the analysis using Euclidean distances as dependent variable.

725 position with according 95% between-subjects confidence intervals. As expected, the

726 plot shows a decreasing change rate with increasing chain position. This trend is

727 confirmed by the generalized linear model showing a significant negative linear trend

728 between change rate and chain position (β = −0.937, CI = [−1.845, −0.028],

729 z = −2.021, p = .043), thus supporting Hypothesis 1a. The quadratic trend was not

730 significant (β = 0.409, CI = [−0.503, 1.322], z = 0.879, p = .379).
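Linear, quadratic, and cubic trends of this kind are the terms that orthogonal polynomial contrasts on an ordered factor produce. Assuming that coding (it is the default for ordered factors in R, but the text does not state it), the contrast weights for chain position can be derived as follows. `poly_contrasts` is a hypothetical helper written for illustration, not part of the authors' scripts.

```python
import math

def poly_contrasts(n_levels):
    """Orthonormal polynomial contrasts for equally spaced levels (Gram-Schmidt)."""
    xs = [i - (n_levels - 1) / 2 for i in range(n_levels)]  # centered positions
    basis = [[1.0] * n_levels]                               # constant term
    contrasts = []
    for degree in range(1, n_levels):
        v = [x ** degree for x in xs]
        for b in basis:
            # Remove the part of v that is explained by lower-order terms.
            scale = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - scale * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        basis.append(v)
        contrasts.append(v)
    return contrasts

linear, quadratic = poly_contrasts(3)  # three chain positions
```

Three chain positions (2 to 4, as for the change rate) yield a linear and a quadratic contrast; four positions (1 to 4, as for the distance analysis) add a cubic one.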

Figure 7. Change rate within a sequential chain. [Plot "Sequence of judgments over chains": change rate (y-axis) by chain position 2 to 4 (x-axis).] Note. Points display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.

To test Hypothesis 1b, we computed a linear mixed model as in Experiment 1 and Experiment 2. The Euclidean distance of every judgment to the true position of the city served as the dependent variable and the chain position as the independent variable. The model revealed a significant linear trend between chain position and distance (β = −1.070, CI = [−1.513, −0.626], t(143.10) = −4.700, p < .001). The quadratic trend was also significant (β = 0.531, CI = [0.087, 0.974], t(143.10) = 2.332, p = .021), whereas the cubic trend was not (β = −0.054, CI = [−0.498, 0.390], t(143.10) = −0.238, p = .812). The negative linear trend in combination with a positive quadratic trend indicates a steep initial decrease in distance that levels off with increasing chain position. This pattern is also displayed in Figure 8. As expected in Hypothesis 1b, the distance to the true position of a city decreases with increasing chain position.

Figure 8. Accuracy of judgments within a sequential chain. [Plot "Sequence of judgments over chains": mean distance to correct answer in pixels (y-axis) by chain position 1 to 4 (x-axis).] Note. Points display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.

Before analyzing Hypothesis 2, we computed the estimates for each condition according to the procedure in Experiment 1 and Experiment 2. To obtain estimates for wisdom of crowds, we randomly assigned participants to groups of four and averaged their judgments for each coordinate. Based on the mean position for each city in those groups, we computed the Euclidean distance to the true position for each estimate as the dependent variable. The estimate from the sequential-collaboration condition is the last judgment in each chain. Figure 9 displays the mean distance to the true position over all estimates with the corresponding confidence intervals. As expected, estimates obtained by sequential collaboration show less distance to the true position than estimates obtained by wisdom of crowds. This impression is also supported by a linear mixed model with the Euclidean distance as the dependent variable and condition as the independent variable. We found a significant negative effect of condition on distance (β = −7.088, CI = [−13.971, −0.215], t(80.61) = −2.027, p = .046), indicating that sequential collaboration yielded more accurate estimates than wisdom of crowds.
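The two kinds of estimates compared here can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: judgments are represented as `(x, y)` tuples for a single city, the function names are hypothetical, and participants who do not fill a complete group are simply dropped, which the text does not specify.

```python
import math
import random

def woc_estimates(judgments, true_pos, group_size=4, seed=0):
    """Average (x, y) judgments within random groups; return each group's distance."""
    rng = random.Random(seed)
    shuffled = judgments[:]
    rng.shuffle(shuffled)
    distances = []
    # Leftover participants that do not fill a complete group are dropped.
    for start in range(0, len(shuffled) - group_size + 1, group_size):
        group = shuffled[start:start + group_size]
        mean_x = sum(x for x, _ in group) / group_size
        mean_y = sum(y for _, y in group) / group_size
        distances.append(math.hypot(mean_x - true_pos[0], mean_y - true_pos[1]))
    return distances

def sequential_estimate(chain, true_pos):
    """Distance of the last judgment in a chain, which serves as the chain's estimate."""
    x, y = chain[-1]
    return math.hypot(x - true_pos[0], y - true_pos[1])
```

Averaging the coordinates first and computing the distance afterwards is what lets independent errors cancel in the wisdom-of-crowds estimate; the sequential estimate uses no aggregation at all.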

Figure 9. Estimates obtained by wisdom of crowds and sequential collaboration. [Plot "Comparison of wisdom of crowds and sequential collaboration": mean distance to correct answer in pixels (y-axis) by condition, WoC versus Seq (x-axis).] Note. Points display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.

Effect of outlier analysis on the results. As in Experiment 1 and Experiment 2, we additionally explored the effect of excluding extreme judgments from the data before comparing the accuracy of estimates obtained by wisdom of crowds and sequential collaboration. Again, we computed the coefficient of the linear mixed model for Hypothesis 2 for data with 0% to 5% of the most extreme judgments excluded in 0.25% steps. The results of this exploratory analysis are displayed in Figure 10. As already observed in Experiment 2, the accuracy advantage of sequential collaboration decreases the more extreme judgments are excluded as outliers. However, the coefficient remains significantly negative. Thus, estimates obtained by sequential collaboration are more accurate for every proportion of excluded extreme outliers.
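The logic of this sweep can be illustrated with a simplified Python sketch. It is not the authors' procedure: instead of refitting the linear mixed model at each step, it compares plain condition means, and it applies the cutoff to pooled distances rather than to the raw judgments. All function names are hypothetical.

```python
from statistics import mean

def exclusion_grid(max_percent=5.0, step_percent=0.25):
    """Exclusion proportions from 0 to max_percent in equal steps."""
    n_steps = round(max_percent / step_percent)
    return [i * step_percent / 100 for i in range(n_steps + 1)]

def mean_difference_after_exclusion(seq, woc, proportion):
    """Mean distance (sequential minus crowd) after dropping the most extreme values.

    Negative values favor sequential collaboration.
    """
    pooled = sorted(seq + woc, reverse=True)
    n_drop = int(len(pooled) * proportion)
    cutoff = pooled[n_drop - 1] if n_drop > 0 else float("inf")

    def keep(values):
        return [v for v in values if v < cutoff]

    return mean(keep(seq)) - mean(keep(woc))

grid = exclusion_grid()  # 21 proportions: 0.0, 0.0025, ..., 0.05
```

In a toy case where one crowd estimate is extreme, the difference shrinks toward zero once that value is excluded, which mirrors the pattern described for Figure 10.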

Figure 10. Effect of outlier removal on the comparison of wisdom of crowds and sequential collaboration. [Plot: model coefficient (y-axis) by proportion of removed extreme judgments, 0.00 to 0.05 (x-axis).] Note. Positive values indicate that judgments are more accurate in wisdom of crowds, negative values indicate more accurate judgments in sequential collaboration.

Robustness Analysis. To check the robustness of Hypothesis 1a, we again computed the relative frequency of items for which the change rate decreases at every sequential step (strict monotonicity) as well as the frequency of items for which the change rate decreased from the second to the last chain position (global monotonicity) as PCC. As a benchmark, we assume that an increasing and a decreasing change rate are equally probable, so that 1/3! = 16.67% of items are expected to conform to the hypothesis by chance. For the subset of data excluding (including) all outliers, we found that for 40.35% (40.35%) of all items the change rate decreased strictly monotonically from the second to the last chain position. Since the relative frequency of items conforming to strict monotonicity in change rate exceeds the benchmark of 16.67% for both subsets of data, these results further confirm the model-based analysis of Hypothesis 1a. Additionally, 91.23% (89.47%) of all items showed a decrease in change rate from the second to the last chain position.
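The two monotonicity checks and the chance benchmark can be written out as a short Python sketch. The data layout (a dict mapping each item to its change rates at chain positions 2, 3, and 4) and the function name are assumptions made for illustration.

```python
from math import factorial

def pcc_change_rate(change_rates):
    """Share of items with a strictly decreasing and a globally decreasing change rate.

    `change_rates` maps an item id to its change rates at chain positions 2, 3, 4.
    """
    rates = list(change_rates.values())
    # Strict monotonicity: the change rate drops at every sequential step.
    strict = sum(all(a > b for a, b in zip(r, r[1:])) for r in rates)
    # Global monotonicity: the last change rate is below the first one.
    overall = sum(r[0] > r[-1] for r in rates)
    return strict / len(rates), overall / len(rates)

# Benchmark: one of the 3! possible orderings of three positions is strictly decreasing.
chance_strict = 1 / factorial(3)
```

The benchmark treats every ordering of the three change rates as equally likely, so only the single strictly decreasing ordering counts as conforming by chance.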

To check the robustness of Hypothesis 1b, we computed the PCC for increasing accuracy (operationalized as decreasing distance of the judgments to the true position) over the course of a sequential chain. As a benchmark for weak monotonicity, we assumed that improving, maintaining, and impairing a judgment are all equally likely (i.e., 33%). Under this assumption, 0.66³ = 28.75% of all chains are expected to have only improving or maintained judgments. For data excluding (including) outliers, we found that 35.20% (35.30%) of all chains show a monotonic increase in accuracy; both values are above our benchmark and further confirm Hypothesis 1b. Additionally, 71.97% (72.07%) of all chains improve in accuracy from the first to the last chain position.
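The weak-monotonicity check and its benchmark can be sketched in Python as follows. Chains are assumed to be lists of distances ordered by chain position, and the function names are hypothetical. A chain of four judgments has three transitions, hence the exponent of 3.

```python
def is_weakly_improving(distances):
    """True if no judgment is farther from the truth than its predecessor."""
    return all(later <= earlier for earlier, later in zip(distances, distances[1:]))

def share_weakly_improving(chains):
    """Share of chains that only contain improving or maintained judgments."""
    return sum(is_weakly_improving(c) for c in chains) / len(chains)

benchmark_reported = 0.66 ** 3   # 0.287496, the 28.75% given in the text
benchmark_exact = (2 / 3) ** 3   # about 0.2963 when 2/3 is not rounded to 0.66
```

The small gap between the two benchmarks comes only from rounding 2/3 to 0.66; the observed shares of about 35% lie above either value.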

To analyze the robustness of Hypothesis 2, we computed for each item the frequency of estimate comparisons in which sequential collaboration was more accurate than wisdom of crowds (common language effect size). When excluding outliers, we found that for 57.07% of all comparisons the sequential-collaboration estimate was more accurate than the wisdom-of-crowds estimate, which differs significantly from chance (t(56) = 5.019, p < .001). When outliers remained in the data, sequential collaboration yielded a more accurate estimate for 61.78% of all comparisons (t(56) = 7.603, p < .001). These results further support our model-based analysis of Hypothesis 2.
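A common language effect size of this kind, together with a test against chance, can be sketched in Python as follows. The sketch assumes that every sequential estimate of an item is compared with every crowd estimate of that item, that ties count against sequential collaboration, and that the per-item shares are tested with a one-sample t-test against 0.5. None of these details is spelled out in the text, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def cles(seq_distances, woc_distances):
    """Share of pairwise comparisons in which the sequential estimate is closer."""
    wins = sum(s < w for s in seq_distances for w in woc_distances)
    return wins / (len(seq_distances) * len(woc_distances))

def t_against_chance(item_shares, chance=0.5):
    """One-sample t statistic and degrees of freedom for per-item shares."""
    n = len(item_shares)
    t = (mean(item_shares) - chance) / (stdev(item_shares) / math.sqrt(n))
    return t, n - 1
```

Under this reading, the reported t(56) is consistent with 57 items entering the test, one share per item.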

Discussion

In Experiment 3, we replicated the results of Experiment 2 using geographic maps instead of general knowledge questions, showing that sequential collaboration yields more accurate estimates over the course of a sequential chain while the change rate of judgments decreases. Additionally, sequential collaboration yielded more accurate results than wisdom of crowds. Furthermore, the analysis of the effect of outlier exclusion showed that this effect becomes smaller the more extreme judgments are removed, but it remains significant.

General Discussion

Sequential collaboration describes a collaboration method in which contributors form a sequential chain of judgments by deciding to adjust or to maintain the latest judgment provided by a previous contributor. In three studies using general knowledge questions and geographic maps, we examined whether the change rate decreases over the course of a sequential chain (Hypothesis 1a) while judgment accuracy increases (Hypothesis 1b). Additionally, we compared the accuracy of estimates obtained by sequential collaboration and wisdom of crowds (Hypothesis 2). While the results of all three experiments were in line with Hypothesis 1b, only Experiment 2 and Experiment 3 supported Hypothesis 1a and Hypothesis 2. However, robustness analyses supported the hypotheses in all three experiments.

Hence, sequential collaboration provides accurate results and was not obstructed by anchoring effects (Mussweiler et al., 2004; Tversky & Kahneman, 1974) or high rates of inaccurate changes due to egocentric discounting (Bonaccio & Dalal, 2006). When the judgments improve and the change rate decreases over the course of a sequential chain, judgments may finally converge to a correct judgment that is not changed anymore. Since large-scale online collaboration projects rely on this basic mechanism, our results shed light on a possible mechanism that renders those projects successful with respect to yielding highly accurate information.

Moreover, our findings extend research on how individual judgments are influenced when information about others’ judgments is provided. Several studies hinted at dependent judgments being beneficial in certain situations (Becker et al., 2017; King et al., 2012; Koehler & Beauregard, 2006; Minson et al., 2017), and our results show that dependent incremental judgments can also yield accurate estimates. Overall, providing judgments of others can improve individual judgments.

Furthermore, estimates obtained by sequential collaboration were more accurate than estimates obtained by wisdom of crowds. Nonetheless, we found that wisdom of crowds seems to profit more from outlier exclusion than sequential collaboration. Sequential collaboration allows extreme judgments to be eliminated completely from the sequential chain when contributors adjust them. Thus, an accurate estimate can still be obtained even though the chain may have started with an extreme judgment. In wisdom of crowds, extreme judgments can distort estimates more seriously since judgments are aggregated without weighting. Thus, far more judgments or an equally large extreme judgment on the other side of the distribution are necessary to absorb the initial extreme judgment. However, sequential collaboration was also successful in a paradigm where extreme outliers are very unlikely (Experiment 3). Moreover, excluding outliers reduced but did not eliminate the accuracy advantage over wisdom of crowds. This shows that even though wisdom of crowds already yields highly accurate results and can profit from error cancellation in judgments, some improvements in accuracy are obtainable by dependent incremental judgments. As expected, since wisdom of crowds already yields highly accurate estimates, this effect was rather small.

Possible mechanisms

Even though the results are encouraging for future research on sequential collaboration, our experiments did not explain why sequential collaboration yields accurate results. Nonetheless, existing research gives an impression of which processes could lead to improved judgments over the course of a sequential chain. For example, while most studies on group decision making show that groups perform worse than individuals on several tasks (Kerr & Tindale, 2004), some studies found that judgments of groups can be more accurate than the average of the group members’ individual judgments (Laughlin et al., 1999; Sniezek & Henry, 1989, 1990). This is attributed to increases in individual capability through group interaction (group-to-individual transfer, Schultze et al., 2012). Group-to-individual transfer also occurs when group members only interact once (Stern et al., 2017) and when interaction does not take place in person (Maciejovsky & Budescu, 2007). Applying this research to sequential collaboration, contributors may profit from group-to-individual transfer through the judgments they encounter and, thus, subsequently give more accurate judgments themselves. However, in our study design, when less capable contributors encountered judgments of contributors with higher capability, those more capable contributors could already have adjusted the same judgments. Thus, less capable contributors were probably not able to substantially contribute to the already given judgments.

Besides, sequential collaboration may yield accurate results because contributors implicitly weight judgments by expertise. Judgments in wisdom of crowds improve when the questions to be answered are self-selected (Bennett et al., 2018), while Kerr and Bruun (1983) found that participants refrain from contributing to group work when they feel their judgment is dispensable. Furthermore, weighting judgments by expertise was found to improve estimates in wisdom of crowds (Budescu & Chen, 2014; Merkle et al., 2020). Combining these findings, the task structure of sequential collaboration allows contributors to maintain judgments when they do not feel they can sufficiently contribute to the presented judgment and to adjust presented judgments when they feel that an adjustment improves them. Thereby, judgments are implicitly weighted by expertise such that contributors maintain judgments that they cannot improve and adjust judgments that they can improve. In an ideal case, this leads to improvements in judgments until a correct judgment is no longer adjusted. Future research should examine whether these possible mechanisms contribute to the improvement of judgments in sequential collaboration.

Limitations and future research directions

Even though sequential collaboration seems a promising paradigm for future research, our experiments have some limitations. First, we only studied the basic mechanism of sequential collaboration. However, Wikipedia and OpenStreetMap have several additional functions such as discussion sites, a board of moderators checking on the contributors’ activities, and a history of all changes ever made to an entry. These additional functions may also influence the changing behavior in online collaborative projects even though they are less prominent than the information presented in an article or on the map.

Second, the results of the exploratory analysis concerning the effect of outlier removal hint toward sequential collaboration requiring a heterogeneous sample to yield significantly more accurate results than wisdom of crowds, since this effect was not found in a student sample in Experiment 1 but in more heterogeneous samples in Experiment 2 and Experiment 3. Future research should address how a crowd of potential contributors should be composed to optimally generate accurate estimates.

Lastly, our studies used short chains of four to six contributors. However, sequential chains in online collaborative projects can be much longer, comprising dozens of contributions. Very long chains may be prone to obstruction by extreme judgments since a single contributor can worsen an already correct judgment, while extreme judgments in wisdom of crowds can be absorbed when many judgments are available. Thus, the behavior of contributors in longer chains should be examined in the future and be compared to estimates obtained by wisdom of crowds.

Conclusion

Sequential collaboration as an underlying process of large-scale online collaborative projects such as Wikipedia and OpenStreetMap has become increasingly important. Our studies show that contributors can successfully collaborate through adjusting and maintaining previous judgments of other contributors. Sequential collaboration can thus be a promising paradigm for future research due to the high practical and theoretical relevance of how dependency and opting out of giving a judgment affect sequential collaboration.

References

André, Q. (2021). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General. https://doi.org/10.1037/xge0001069

Arazy, O., Morgan, W., & Patterson, R. (2006). Wisdom of the crowds: Decentralized knowledge construction in Wikipedia. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.1025624

Baeza-Yates, R., & Saez-Trumper, D. (2015). Wisdom of the crowd or wisdom of a few? An analysis of users’ content generation. Proceedings of the 26th ACM Conference on Hypertext & Social Media - HT ’15, 69–74. https://doi.org/10.1145/2700171.2791056

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Becker, J., Brackbill, D., & Centola, D. (2017). Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences, 114, E5070–E5076. https://doi.org/10.1073/pnas.1615978114

Bennett, S. T., Benjamin, A. S., Mistry, P. K., & Steyvers, M. (2018). Making a wiser crowd: Benefits of individual metacognitive control on crowd performance. Computational Brain & Behavior, 1, 90–99. https://doi.org/10.1007/s42113-018-0006-4

Bonaccio, S., & Dalal, R. S. (2006). Advice taking and decision-making: An integrative literature review, and implications for the organizational sciences. Organizational Behavior and Human Decision Processes, 101, 127–151. https://doi.org/10.1016/j.obhdp.2006.07.001

Bonner, B. L., Sillito, S. D., & Baumann, M. R. (2007). Collective estimation: Accuracy, expertise, and extroversion as sources of intra-group influence. Organizational Behavior and Human Decision Processes, 103, 121–133. https://doi.org/10.1016/j.obhdp.2006.05.001

Bray, R. M., Kerr, N. L., & Atkin, R. S. (1978). Effects of group size, problem difficulty, and sex on group performance and member reactions. Journal of Personality and Social Psychology, 36, 1224–1240. https://doi.org/10.1037/0022-3514.36.11.1224

Budescu, D. V., & Chen, E. (2014). Identifying expertise to extract the wisdom of crowds. Management Science, 61, 267–280. https://doi.org/10.1287/mnsc.2014.1909

Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10, 821. https://doi.org/10.1145/1753326.1753447

Dalkey, N., & Helmer, O. (1963). An experimental application of the DELPHI method to the use of experts. Management Science, 9, 458–467. https://doi.org/10.1287/mnsc.9.3.458

Davis-Stober, C. P., Budescu, D. V., Dana, J., & Broomell, S. B. (2014). When is a crowd wise? Decision, 1(2), 79–101. https://doi.org/10.1037/dec0000004

Dennis, A. R. (1996). Information exchange and use in small group decision making. Small Group Research, 27, 532–550. https://doi.org/10.1177/1046496496274003

Dennis, A. R., Hilmer, K. M., & Taylor, N. J. (1998). Information exchange and use in GSS and verbal group decision making: Effects of minority influence. Journal of Management Information Systems, 14(3), 61–88. https://search.proquest.com/docview/218899978/abstract/3186065D57B54572PQ/1

Galton, F. (1907). Vox populi. Nature, 75, 450–451. https://doi.org/10.1038/075450a0

Geist, M. R. (2010). Using the Delphi method to engage stakeholders: A comparison of two studies. Evaluation and Program Planning, 33, 147–154. https://doi.org/10.1016/j.evalprogplan.2009.06.006

Giles, J. (2005). Internet encyclopaedias go head to head. Nature, 438, 900–901. https://doi.org/10.1038/438900a

Girres, J.-F., & Touya, G. (2010). Quality assessment of the French OpenStreetMap dataset. Transactions in GIS, 14, 435–459. https://doi.org/10.1111/j.1467-9671.2010.01203.x

Grice, J. W. (2011). Observation oriented modeling: Analysis of cause in the behavioral sciences. Elsevier Academic Press.

Grice, J. W., Barrett, P. T., Schlimgen, L. A., & Abramson, C. I. (2012). Toward a brighter future for psychology as an observation oriented science. Behavioral Sciences, 2, 1–22. https://doi.org/10.3390/bs2010001

Grice, J. W., Yepez, M., Wilson, N. L., & Shoda, Y. (2017). Observation-oriented modeling: Going beyond “is it all a matter of chance?” Educational and Psychological Measurement, 77, 855–867. https://doi.org/10.1177/0013164416667985

Hogarth, R. M. (1978). A note on aggregating opinions. Organizational Behavior and Human Performance, 21, 40–46. https://doi.org/10.1016/0030-5073(78)90037-5

Hueffer, K., Fonseca, M. A., Leiserowitz, A., & Taylor, K. M. (2013). The wisdom of crowds: Predicting a weather and climate-related event. Judgment and Decision Making, 8, 16.

Jeste, D. V., Ardelt, M., Blazer, D., Kraemer, H. C., Vaillant, G., & Meeks, T. W. (2010). Expert consensus on characteristics of wisdom: A Delphi method study. The Gerontologist, 50, 668–680. https://doi.org/10.1093/geront/gnq022

Keck, S., & Tang, W. (2020). Enhancing the wisdom of the crowd with cognitive-process diversity: The benefits of aggregating intuitive and analytical judgments. Psychological Science, 1272–1282. https://doi.org/10.1177/0956797620941840

Kerr, N. L., & Bruun, S. E. (1983). Dispensability of member effort and group motivation losses: Free-rider effects. Journal of Personality and Social Psychology, 44, 78–94. https://doi.org/10.1037/0022-3514.44.1.78

Kerr, N. L., & Tindale, R. S. (2004). Group performance and decision making. Annual Review of Psychology, 55, 623–655. https://doi.org/10.1146/annurev.psych.55.090902.142009

King, A. J., Cheng, L., Starke, S. D., & Myatt, J. P. (2012). Is the true ‘wisdom of the crowd’ to copy successful individuals? Biology Letters, 8, 197–200. https://doi.org/10.1098/rsbl.2011.0795

Kittur, A., & Kraut, R. E. (2008). Harnessing the wisdom of crowds in Wikipedia: Quality through coordination. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, 37–46. https://doi.org/10.1145/1460563.1460572

Kittur, A., Pendleton, B. A., Suh, B., & Mytkowicz, T. (2007). Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. CHI ’07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Koehler, D. J., & Beauregard, T. A. (2006). Illusion of confirmation from exposure to another’s hypothesis. Journal of Behavioral Decision Making, 19, 61–78. https://doi.org/10.1002/bdm.513

Kräenbring, J., Monzon Penza, T., Gutmann, J., Muehlich, S., Zolk, O., Wojnowski, L., Maas, R., Engelhardt, S., & Sarikas, A. (2014). Accuracy and completeness of drug information in Wikipedia: A comparison with standard textbooks of pharmacology. PLoS ONE, 9(9). https://doi.org/10.1371/journal.pone.0106930

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13

Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle. Management Science, 52, 111–127. https://doi.org/10.1287/mnsc.1050.0459

Laughlin, P. R., Bonner, B. L., Miner, A. G., & Carnevale, P. J. (1999). Frames of reference in quantity estimations by groups and individuals. Organizational Behavior and Human Decision Processes, 80, 103–117. https://doi.org/10.1006/obhd.1999.2848

LeBeau, B., Song, Y. A., & Liu, W. C. (2018). Model misspecification and assumption violations with the linear mixed model: A meta-analysis. SAGE Open, 8. https://doi.org/10.1177/2158244018820380

Leithner, A., Maurer-Ertl, W., Glehr, M., Friesenbichler, J., Leithner, K., & Windhager, R. (2010). Wikipedia and osteosarcoma: A trustworthy patients’ information? Journal of the American Medical Informatics Association: JAMIA, 17, 373–374. https://doi.org/10.1136/jamia.2010.004507

Lu, L., Yuan, Y. C., & McLeod, P. L. (2012). Twenty-five years of hidden profiles in group decision making: A meta-analysis. Personality and Social Psychology Review, 16, 54–75. https://doi.org/10.1177/1088868311417243

Maciejovsky, B., & Budescu, D. V. (2007). Collective induction without cooperation? Learning and knowledge transfer in cooperative groups and competitive auctions. Journal of Personality and Social Psychology, 92, 854–870. https://doi.org/10.1037/0022-3514.92.5.854

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365. https://doi.org/10.1037/0033-2909.111.2.361

Merkle, E. C., Saw, G., & Davis-Stober, C. (2020). Beating the average forecast: Regularization based on forecaster attributes. Journal of Mathematical Psychology, 98, 102419. https://doi.org/10.1016/j.jmp.2020.102419

Minson, J. A., Mueller, J. S., & Larrick, R. P. (2017). The contingent wisdom of dyads: When discussion enhances vs. undermines the accuracy of collaborative judgments. Management Science, 64, 4177–4192. https://doi.org/10.1287/mnsc.2017.2823

Mussweiler, T., Englich, B., & Strack, F. (2004). Anchoring effect. In R. F. Pohl (Ed.), Cognitive illusions (1st ed., pp. 183–199). Psychology Press.

Niederer, S., & Dijck, J. van. (2010). Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society, 12, 1368–1387. https://doi.org/10.1177/1461444810365297

OpenStreetMap Contributors. (2021). OpenStreetMap. https://www.openstreetmap.org/about

Pinheiro, J. C., & Bates, D. M. (Eds.). (2000). Linear mixed-effects models: Basic concepts and examples. In Mixed-effects models in S and S-PLUS (pp. 3–56). Springer. https://doi.org/10.1007/978-1-4419-0318-1_1

Pohl, R. F. (1998). The effects of feedback source and plausibility of hindsight bias. European Journal of Cognitive Psychology, 10, 191–212. https://doi.org/10.1080/713752272

Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N. A., Garamszegi, L. Z., & Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11, 1141–1152. https://doi.org/10.1111/2041-210X.13434

Schultze, T., Mojzisch, A., & Schulz-Hardt, S. (2012). Why groups perform better than individuals at quantitative judgment tasks: Group-to-individual transfer as an alternative to differential weighting. Organizational Behavior and Human Decision Processes, 118, 24–36. https://doi.org/10.1016/j.obhdp.2011.12.006

Simmons, J. P., Nelson, L. D., Galak, J., & Frederick, S. (2011). Intuitive biases in choice versus estimation: Implications for the wisdom of crowds. Journal of Consumer Research, 38, 1–15. https://doi.org/10.1086/658070

Sniezek, J. A., & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organizational Behavior and Human Decision Processes, 43, 1–28. https://doi.org/10.1016/0749-5978(89)90055-1

Sniezek, J. A., & Henry, R. A. (1990). Revision, weighting, and commitment in consensus group judgment. Organizational Behavior and Human Decision Processes, 45, 66–84. https://doi.org/10.1016/0749-5978(90)90005-T

Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision making: Biased information sampling during discussion. Journal of Personality and Social Psychology, 48, 1467–1478. https://doi.org/10.1037/0022-3514.48.6.1467

Stern, A., Schultze, T., & Schulz-Hardt, S. (2017). How much group is necessary? Group-to-individual transfer in estimation tasks. Collabra: Psychology, 3(16). https://doi.org/10.1525/collabra.95

Steyvers, M., Miller, B., Hemmer, P., & Lee, M. (2009). The wisdom of crowds in the recollection of order information. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 1785–1793). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2009/file/4c27cea8526af8cfee3be5e183ac9605-Paper.pdf

Surowiecki, J. (2004). The wisdom of crowds (1. ed). Anchor Books.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. https://doi.org/10.1126/science.185.4157.1124

Wagner, C., & Vinaimont, T. (2010). Evaluating the wisdom of crowds. Issues in Information Systems, 11, 724–732.

Wikipedia Contributors. (2021). Wikipedia:About. https://en.wikipedia.org/wiki/Wikipedia:About

Yaniv, I., & Kleinberger, E. (2000). Advice taking in decision making: Egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes, 83, 260–281. https://doi.org/10.1006/obhd.2000.2909

Zheng, S., & Zheng, J. (2014). Assessing the completeness and positional accuracy of OpenStreetMap in China. In T. Bandrova, M. Konecny, & S. Zlatanova (Eds.), Thematic cartography for the society (pp. 171–189). Springer International Publishing. https://doi.org/10.1007/978-3-319-08180-9_14

Zielstra, D., & Zipf, A. (2010). Quantitative studies on the data quality of OpenStreetMap in Germany. AGILE 2010. The 13th AGILE international conference on geographic information science.

Appendix A General Knowledge Questions

Table A1 Table of items for Experiment 1 and Experiment 2 using general knowledge questions.

Item Question Correct answer

1 How large is the Eiffel Tower? 300 meters

2 How many sovereign countries are located in Africa? 54 countries

3 How long is the Nile? 6650 kilometers

4 How old was Johann Wolfgang von Goethe? 82 years old

5 How many bones does a human have? 214 bones

6 What is Earth’s mean radius? 6371 kilometers

7 How old was Martin Luther King Jr.? 39 years old

8 How tall is the Gate? 26 meters

9 How high was the highest temperature ever measured on Earth? 57 °C

10 At what temperature does lead melt? 328 °C

11 In which year did the first manned space flight take place? 1961

12 How high is Mount Everest? 8848 meters

13 How much does a tennis ball weigh? 57 grams

14 How many keys does a typical piano have? 88 keys

15 How fast can a cheetah run? 112 kilometers per hour

16 How long can a blue whale get? 33 meters

17 How much do ten liters of oxygen weigh? 14 grams

18 How many sovereign countries are located in Africa? 54 countries

19 How many prime numbers are in the interval between 1 and 1000? 168 prime numbers

20 How many star constellations are officially recognized? 88 constellations

21 How many kilocalories do ten gummy bears have (i.e., 30 grams)? 98 kilocalories

22 How long is a soccer goal? 7 meters

23 When was the last capital punishment enforced in France? 1977

24 How many plays of Shakespeare are preserved? 33 plays

25 How long is the kidney of a full-grown person? 12 centimeters

26 How many species of the Hawaiian honeycreeper exist? 21 species

27 When was the lightning rod invented? 1752

28 When did the first modern Olympic Games take place? 1896

29 How fast can a raindrop fall? 9 meters per second

30 When was Leonardo da Vinci born? 1452

31 What is the maximum time that a total solar eclipse can take? 7 minutes

32 How many strings does a concert harp have? 47 strings

33 What is the mean life expectancy of women in Germany? 81 years

34 How wide is Lake Constance at its widest point? 14 kilometers

35 How long is the distance between Earth and the Sun in million kilometers? 150 million kilometers

36 When was women’s suffrage adopted in Switzerland? 1971

37 How many chapters does the Quran have? 114 chapters

38 How many times larger is the diameter of Jupiter compared to the diameter of Earth? 11 times

39 How large is the island of Borkum? 31 square kilometers

40 How many singles did the Beatles officially release? 22 singles

41 How old was Alexander the Great when he waged his first campaign? 18 years old

42 How many species of insects live in Antarctica? 52 species

43 How many federal states does Austria have? 9 federal states

44 When was the first human heart transplant performed? 1967

45 How many marriages were there in Germany in 2018? 449466 marriages

46 How many students were enrolled at German universities in the winter term of 2019/2020? 2897336 students

47 How many floors does Burj Khalifa have? 163 floors

48 How far is Frankfurt (Main) from (linear distance)? 424 kilometers

49 When was the first color film available in Germany? 1936

50 When was the numerus clausus first applied in German universities? 1968

51 How far is Paris from London (linear distance)? 343 kilometers

52 How far is Dortmund from (linear distance)? 284 kilometers

53 How far is Munich from Athens (linear distance)? 1496 kilometers

54 How tall is the Statue of Liberty including its pedestal? 93 meters

55 When was slavery officially ended in the United States? 1865

56 When was the first Autobahn inaugurated? 1921

57 When did Albert Schweitzer receive the Nobel Peace Prize? 1952

58 How long is the mean distance between Earth and Moon? 384400 kilometers

59 In which year was Uranus discovered by William Herschel? 1781

60 How many letters does the Arabic script have? 28 letters

61 How deep is the Pacific at the deepest point? 10094 meters

62 When was Astrid Lindgren born? 1907

63 How much does the heart of a full-grown person weigh? 300 grams

64 How long can a green anaconda get? 8 meters

65 After how many days has a person’s top layer of skin completely renewed? 28 days

Appendix B Cities selected for different maps

Table B1. Items for Experiment 3 using map material.

Item Map City

1 Austria and Switzerland Zurich

2 Austria and Switzerland Geneva

3 Austria and Switzerland Basel

4 Austria and Switzerland Bern

5 Austria and Switzerland Vienna

6 Austria and Switzerland Graz

7 Austria and Switzerland Linz

8 Austria and Switzerland Salzburg

9 France Paris

10 France Marseille

11 France Lyon

12 France Toulouse

13 France Nice

14 Italy Rome

15 Italy Milan

16 Italy Naples

17 Italy Florence

18 Italy Venice

19 Spain and Portugal Madrid

20 Spain and Portugal Barcelona

21 Spain and Portugal Seville

22 Spain and Portugal Lisbon

23 Spain and Portugal Porto

24 United Kingdom and Ireland London

25 United Kingdom and Ireland Birmingham

26 United Kingdom and Ireland Glasgow

27 United Kingdom and Ireland Liverpool

28 United Kingdom and Ireland Dublin

29 Poland, Czech, Hungary and Slovenia Warsaw

30 Poland, Czech, Hungary and Slovenia Prague

31 Poland, Czech, Hungary and Slovenia Bratislava

32 Poland, Czech, Hungary and Slovenia Budapest

33 Germany Berlin

34 Germany Hamburg

35 Germany Cologne

36 Germany Frankfurt

37 Germany Stuttgart

38 Germany Düsseldorf

39 Germany Leipzig

40 Germany Dortmund

41 Germany Essen

42 Germany

43 Germany Dresden

44 Germany Hannover

45 Germany Nuremberg

46 Germany Duisburg

47 Germany Wuppertal

48 Germany Bielefeld

49 Germany Bonn

50 Germany Münster

51 Germany Karlsruhe

52 Germany Mannheim

53 Germany Augsburg

54 Germany Wiesbaden

55 Germany Braunschweig

56 Germany Kiel

57 Germany Munich