SEQUENTIAL COLLABORATION 1
Sequential Collaboration: Comparing the Accuracy of Dependent, Incremental Judgments to Wisdom of Crowds
Maren Mayer¹ & Daniel W. Heck²
¹ University of Mannheim
² Philipps University of Marburg
Author Note
Maren Mayer, Department of Psychology, School of Social Science, University of Mannheim, Germany. https://orcid.org/0000-0002-6830-7768
Daniel W. Heck, Department of Psychology, Philipps University of Marburg, Germany. https://orcid.org/0000-0002-6302-9252
Data and R scripts for the analyses are available at the Open Science Framework (https://osf.io/96nsk/).
The present work was presented at the 62nd Conference of Experimental Psychologists (Virtual TeaP, 2021). The present manuscript has not yet been peer reviewed. A preprint was uploaded to PsyArXiv and ResearchGate for timely dissemination (version from June 2, 2021).
This work was supported by the Heidelberg Academy of Sciences and Humanities (WIN project Shared Data Sources) and the Research Training Group “Statistical Modeling in Psychology” funded by the German Research Foundation (DFG grant GRK 2277).
The authors made the following contributions. Maren Mayer: Conceptualization, Investigation, Methodology, Writing - Original Draft, Writing - Review & Editing; Daniel W. Heck: Conceptualization, Methodology, Writing - Review & Editing.
Correspondence concerning this article should be addressed to Maren Mayer, B6, 30-32, 68169 Mannheim. E-mail: [email protected]
Abstract
In recent years, online collaborative projects in which users generate extensive knowledge bases such as Wikipedia or OpenStreetMap have become increasingly popular while yielding highly accurate information. Collaboration in such projects is organized sequentially with one contributor creating an entry and the following contributors deciding whether to adjust or maintain the presented information. We refer to this process as sequential collaboration as individual judgments are dependent on the latest judgment. As sequential collaboration has not yet been examined systematically, we investigated whether dependent incremental judgments as obtained in sequential collaboration become increasingly accurate and whether the final judgments are more accurate than estimates obtained with equally large groups in wisdom of crowds. For this purpose, we conducted three preregistered studies with groups of four to six contributors using general knowledge questions as well as geographic maps on which cities had to be positioned. As expected, individual judgments in sequential collaboration became more accurate and the final group estimates were slightly more accurate than those based on aggregated judgments in wisdom of crowds. These results show that sequential collaboration can profit from dependent incremental judgments, thereby extending the literature on dependent judgments and shedding light on collaboration in large-scale online collaborative projects.
Keywords: judgment and decision making, teamwork, mass collaboration, group decision making
Sequential Collaboration: Comparing the Accuracy of Dependent, Incremental Judgments to Wisdom of Crowds
Collaborative online projects that provide user-generated content have become a popular source for gathering and acquiring information over the last twenty years. The most prominent example is Wikipedia, an online encyclopedia that allows users to contribute semantic information on various topics in the form of structured articles (Wikipedia Contributors, 2021). Another, less well-known example of online collaboration is OpenStreetMap, a collaborative project that aims at generating a comprehensive, open, and free-to-use map of the world (OpenStreetMap Contributors, 2021). OpenStreetMap comprises not only numeric geographical information about the locations of objects, such as coordinates, but also semantic information such as names of streets, areas, and buildings, as well as other useful information (e.g., addresses or websites of shops and restaurants). Giles (2005) showed that Wikipedia is very accurate in general. Moreover, its entries on certain topics, such as cancer or certain drugs, are about as accurate as official health information or textbooks (Kräenbring et al., 2014; Leithner et al., 2010). Comparisons of OpenStreetMap with commercial map providers or governmental sources also revealed comparable accuracy (Girres & Touya, 2010; Zheng & Zheng, 2014; Zielstra & Zipf, 2010).
The high accuracy of Wikipedia and other online collaborative projects has often been attributed to wisdom of crowds (Arazy et al., 2006; Baeza-Yates & Saez-Trumper, 2015; Chen et al., 2010; Kittur et al., 2007; Kittur & Kraut, 2008; Niederer & Dijck, 2010). However, wisdom of crowds refers to a technique of aggregating independent individual judgments (Galton, 1907; Larrick & Soll, 2006; Surowiecki, 2004). The high accuracy of judgments in wisdom of crowds is due to the central limit theorem which ensures that errors in independent, individual judgments cancel out (Hogarth, 1978). Wisdom of crowds has been shown to yield highly accurate estimates for various tasks and contexts (Hueffer et al., 2013; Keck & Tang, 2020; Larrick & Soll, 2006; Steyvers et al., 2009; Wagner & Vinaimont, 2010). Aggregating independent individual judgments is especially successful when judgments bracket the true answer (Larrick & Soll, 2006; Simmons et al., 2011) and are negatively correlated and unbiased (Davis-Stober et al., 2014; Keck & Tang, 2020).
In contrast, judgments in online collaborative projects are not collected independently and then aggregated afterwards, but rather elicited in a dependent and sequential manner. Instead of providing independent individual judgments, contributors encounter already existing entries and decide whether to change the presented information, which reflects the latest version of an entry, or whether to leave the presented information as it is. We refer to this way of collaborating as sequential collaboration.
In the following, we will first describe the process of sequential collaboration, distinguish it from other forms of collaboration, and embed it into already existing research on dependent judgments which has shown both positive and detrimental effects of dependency. Furthermore, we compare sequential collaboration and wisdom of crowds to highlight why eliciting incremental, dependent judgments in sequential collaboration can be beneficial for judgment accuracy compared to aggregating independent judgments. In three studies, two of them preregistered, we used general knowledge questions and maps on which cities should be positioned to test whether sequential collaboration yields improved judgments within small groups of four to six contributors. Moreover, we tested whether the final judgments at the end of a sequential chain are more accurate than estimates obtained by aggregating independent individual judgments in wisdom of crowds. In line with our hypotheses, we found that judgment accuracy increased over the course of a sequential chain and that sequential collaboration yielded more accurate results than wisdom of crowds in two of the three studies.
Sequential Collaboration
As outlined above, collaboration in collaborative online projects is organized sequentially by making incremental changes to the latest available information. Sequential collaboration starts with one contributor creating an initial independent entry. The following contributors who encounter this entry can then decide whether to adjust or maintain the presented information. Whenever the entry is changed, the information is updated such that only the latest version of the entry is presented to the following contributors. For example, a first contributor might answer the question “How tall is the Eiffel Tower?” with 420 meters. A second contributor encountering this judgment could simply maintain it while a third contributor might adjust the height to 290 meters. After several contributors have adjusted and maintained the judgment, the correct height of 300 meters may be entered. In the domain of geographical maps, the first contributor could create an initial entry by outlining the layout of a building not yet mapped in OpenStreetMap. While a second contributor could improve the outline of the building, a third might not change any information, and a fourth could add semantic tags to describe that the building belongs to a university. Throughout several sequential steps of adjusting and maintaining the entry, the building might finally be represented by an adequate outline and be tagged as a university building with additional information such as the university’s website and address. The sequence of decisions whether to maintain or adjust entries made by a previous contributor forms a sequential chain. Figure 1 displays how group estimates are generated in sequential collaboration and in wisdom of crowds. In the former, the final estimate is the last judgment in a sequential chain generated by adjusting and maintaining previous judgments; in the latter, the aggregated estimate is obtained by averaging independent individual judgments.
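The two ways of forming a group estimate can be illustrated with a minimal Python sketch. It is not part of the original materials: the function names are hypothetical, the chain values loosely follow the Eiffel Tower example, and the four independent judgments passed to the crowd function are invented for illustration.

```python
from statistics import mean

def wisdom_of_crowds(judgments):
    """Aggregate independent judgments by their unweighted mean."""
    return mean(judgments)

def sequential_collaboration(initial, decisions):
    """Return every state of an entry along a sequential chain.

    `decisions` holds one element per later contributor: None means
    "maintain the presented judgment", a number means "adjust to this value".
    """
    states = [initial]
    for decision in decisions:
        presented = states[-1]  # only the latest version is shown
        states.append(presented if decision is None else decision)
    return states

# Hypothetical chain: 420 m is entered, maintained, adjusted to 290 m,
# and finally adjusted to 300 m.
chain = sequential_collaboration(420, [None, 290, 300])
final_estimate = chain[-1]  # the last judgment is the group estimate

# Hypothetical independent judgments for the same question.
crowd_estimate = wisdom_of_crowds([420, 350, 290, 300])
```

In this sketch, the sequential estimate depends only on the last state of the chain, whereas every independent judgment enters the crowd estimate with equal weight.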
Figure 1. Illustration of forming a group estimate in (a) wisdom of crowds compared to (b) sequential collaboration. [Figure omitted. Panel (a), Wisdom of Crowds, ends in an aggregate estimate of 240; panel (b), Sequential Collaboration, ends in a final estimate of 300.]
Even though sequential collaboration is performed by a group of individuals and shares some aspects with other forms of group decision making, it also has some unique features that distinguish it from other forms of collaboration. In research on group decision making, group work usually takes place simultaneously (Kerr & Tindale, 2004; Lu et al., 2012; Stasser & Titus, 1985) even though interaction does not necessarily take place in person (Dennis, 1996; Dennis et al., 1998; Lu et al., 2012). In a paradigm organized like this, all members of the group have the opportunity of listening to all judgments and opinions, asking other group members questions, and sharing reasons for judgments and other information. In sequential collaboration, however, information is shared only by adding to or correcting the judgment of a previous contributor, which implies that the dependency between judgments is limited to the displayed information. Furthermore, direct interactions with other contributors are neither necessary nor possible in sequential collaboration, and additional information such as the number of adjustments already made to this information or reasons why information was adjusted are initially not available.
A form of collaboration similar to sequential collaboration is the Delphi method (Dalkey & Helmer, 1963; Geist, 2010; Jeste et al., 2010). The Delphi method was designed to obtain judgments on a given topic from a group of experts who do not interact directly. After providing a judgment and reasons for this judgment, all judgments are combined in a report by a moderator. This report is sent to all experts who can then revise their judgments based on the judgments and information included in the report. When experts have reached a sufficient consensus, the individual judgments are aggregated to a final result. Both forms of collaboration are similar in that no interaction happens directly with one another. However, in sequential collaboration, contributors are not presented with judgments of multiple other contributors and do not get to know the reasons for specific judgments. Moreover, contributors are not necessarily required to provide a judgment, and even if they do, they may not notice when their judgment is in turn adjusted by others. Finally, the Delphi method focuses on eliciting judgments by a group of experts, whereas in sequential collaboration, neither the specific contributors nor the number of contributors has to be predefined.
Possible issues and benefits of sequential collaboration.
Even though sequential collaboration seems to be a successful way of integrating judgments of various individuals, the process of sequentially deciding whether to adjust or maintain a previous judgment has not been systematically examined yet. Nonetheless, there are findings on related phenomena that can be applied to sequential collaboration and allow us to derive testable predictions.
Possible issues for the accuracy of sequential collaboration may arise from the anchoring effect (Tversky & Kahneman, 1974). Anchoring describes the robust phenomenon that a presented numerical value influences a subsequent, often unrelated numerical judgment (Mussweiler et al., 2004). This effect may undermine the accuracy of sequential collaboration such that adjustments made to a previous judgment are systematically biased toward the previous judgment. Especially when the previous judgment heavily over- or underestimates the correct value, anchoring might affect later judgments, which may in turn delay or prevent other contributors from arriving at accurate, unbiased estimates.
The conditions under which information provided by others is considered in forming a judgment have been extensively studied in the advice-taking literature (Bonaccio & Dalal, 2006). A typical finding in advice taking is egocentric discounting, which describes the phenomenon that advice is generally underweighted relative to one’s own initial judgment (Bonaccio & Dalal, 2006; Yaniv & Kleinberger, 2000). This results in less accurate judgments compared to weighting the advice and one’s own judgment equally. In sequential collaboration, egocentric discounting could lead contributors to adjust the presented previous judgment mainly according to their prior beliefs, which in turn could be detrimental to accuracy as the chain may not converge to the correct answer. However, advice taking improves when no initial individual judgment is formed before receiving advice (Koehler & Beauregard, 2006). This resembles the situation in sequential collaboration more closely since contributors are directly confronted with the previous judgment and do not have to form an initial, independent judgment. Hence, contributors in sequential collaboration may be more likely to accept previous judgments compared to the standard advice-taking paradigm.
Previous research also provides preliminary evidence in favor of the accuracy of sequential collaboration. Providing participants with a frame of reference improves subsequent judgments, especially because it prevents extreme judgments (Bonner et al., 2007; Laughlin et al., 1999). Thus, previous judgments in a sequential chain may serve as a frame of reference that prevents extreme judgments and fosters reaching an accurate estimate earlier. However, especially at the beginning, judgments by the previous contributors may not provide an accurate frame of reference.
Providing judgments of other individuals can also improve the accuracy of wisdom of crowds. Imitating successful individuals leads to more accurate judgments (King et al., 2012), and discussions in dyads also improve judgments but only when initial independent judgments are formed (Minson et al., 2017). Moreover, Becker et al. (2017) showed that information about others’ judgments is beneficial when this information weights all other judgments equally (as opposed to overweighting the judgment of a single, highly influential individual). Given that individual judgments can be improved by providing judgments of others, providing contributors in sequential collaboration with previous judgments may lead to more accurate judgments. The finding that imitating successful individuals improves accuracy (King et al., 2012) is especially relevant for sequential collaboration as contributors may often be presented with the currently best judgment in the sequential chain. Moreover, contributors do not have to actively imitate successful individuals; they can simply maintain the presented judgment and thereby imitate it. However, while King et al. (2012) selected the currently most accurate judgment from a large pool of independent judgments, the judgments presented in a sequential chain are not necessarily very accurate, especially if only a few contributors have encountered and edited the entry. It is also unlikely that the judgments of previous contributors are equally weighted as in the study by Becker et al. (2017); instead, single judgments can be dominant.
Sequential collaboration may also benefit from the fact that in group work, not all group members contribute to a given task equally and some do not contribute at all (free-rider effect, Bray et al., 1978) and that group members often contribute less the more they feel that their contribution is dispensable (Kerr & Bruun, 1983). Such effects may also be observed in sequential collaboration since contributors can decide not to adjust a previous judgment when they do not feel confident that they can substantially contribute to it. This mechanism could in turn improve accuracy since giving respondents the possibility to select the questions to be answered improves accuracy in wisdom of crowds (Bennett et al., 2018). The fact that contributors can self-select which judgments to adjust may thus lead to a higher accuracy of the resulting judgments. However, this requires that contributors can accurately distinguish which judgments to maintain (assuming they cannot substantially contribute to them) and which judgments to adjust (assuming they can improve the present state of an entry).
Hypotheses
Overall, we hypothesize that the probability that a judgment is changed decreases (Hypothesis 1a) while the accuracy of judgments increases (Hypothesis 1b) over the course of a sequential chain. These two hypotheses form the basis of sequential collaboration and need to be tested before further examining sequential collaboration and comparing its accuracy to that of wisdom of crowds.
Given its high accuracy, wisdom of crowds can be used as a benchmark for other forms of collaboration. We expect that sequential collaboration yields more accurate results than wisdom of crowds (Hypothesis 2). As discussed above, sequential collaboration may profit from the possibility that contributors are not required to adjust the displayed information (Bennett et al., 2018). Instead, contributors who are not confident may perceive their own judgments to be dispensable (Kerr & Bruun, 1983) and in turn not adjust the presented judgment. Furthermore, the accuracy of sequential collaboration should be higher than that of wisdom of crowds given that providing information about the judgments of others can improve judgments (Becker et al., 2017; King et al., 2012; Minson et al., 2017). However, sequential collaboration cannot profit from the central limit theorem (Hogarth, 1978) or from positive effects due to negatively correlated judgments (Davis-Stober et al., 2014; Keck & Tang, 2020). Taken together with the fact that wisdom of crowds is known to yield highly accurate estimates already, we only expect small effect sizes for the comparison of sequential collaboration and wisdom of crowds.
To test our hypotheses, we conducted three preregistered online experiments using short chains of four to six contributors in sequential collaboration and respective group sizes for wisdom of crowds. The material comprised general knowledge questions with numerical judgments in the first two experiments and geographic maps on which contributors had to position cities in the last experiment.
Experiment 1
Prior to conducting the three experiments reported in the present manuscript, we conducted a pilot study to develop and pretest the experimental paradigm. Based on the results of the pilot study, we improved the items, the experimental design, and the countermeasures which aimed at assuring participants’ compliance with the task. Thereby, we ensured the collection of valid data for testing our hypotheses and limited the amount of data exclusions due to unruly behavior of the participants.
Method
Materials. We presented 65 difficult general knowledge questions such as “How tall is the Eiffel Tower?” or “When was Leonardo da Vinci born?” to the participants. The questions were taken from an item pool of general knowledge questions (Pohl, 1998) and updated with contemporary information whenever necessary. The median percentage of correctly answered questions was 0.53% (MAD = 0.78%), indicating that the questions were indeed difficult for participants to answer. All items, their correct numerical answers, and the unit in which the answer had to be given are provided in Table A1 in the Appendix.
Participants. For this online study, 310 German college students were recruited via a German panel provider. In order to control data quality while collecting the data, participants who changed their browser window or switched to other programs more than five times were excluded during participation. Based on the results of the pilot study, we suspected participants of looking up answers when more than 10% of the questions were answered correctly. This was the case for three participants who were not considered for building sequences and whose data were excluded from the analysis. One participant was excluded due to irregular answer patterns in more than 10% of the questions (i.e., answering with number series such as “23456”). Lastly, two participants were excluded since the same position in a sequential chain was assigned to two participants due to a technical issue. We kept the data of the participant whose data were used throughout the rest of this sequential chain. Our final sample comprised 304 participants, of whom 76.32% were female, 22.70% were male, and 0.99% identified as diverse. The mean age of the sample was 23.82 years (SD = 3.20).
Design and Procedure. Participants were randomly assigned to either the wisdom-of-crowds questionnaire (193 participants) or the sequential-collaboration questionnaire (111 participants). After consenting to the study, they were introduced to the task. When answering the wisdom-of-crowds questionnaire, participants were presented with one general knowledge question at a time and were required to answer the question in an open text box before they could proceed to the next question. When answering the sequential-collaboration questionnaire, participants were also presented with one general knowledge question at a time. Additionally, the answer of a previous participant was given below the question. Then, participants were asked whether they would like to adjust or maintain the given judgment. Only if participants decided to adjust the presented judgment did the text box appear in which the new judgment could be entered before proceeding to the next question. Figure 2 displays the design of the question for (a) the wisdom-of-crowds questionnaire and (b) the sequential-collaboration questionnaire. Questions in both questionnaires were presented in random order, and the unit in which the judgment had to be given was provided next to the open text box. After answering all general knowledge questions, participants provided demographic information before being thanked for participation and debriefed. To prevent looking up answers, we implemented a time limit of 30 seconds to enter a judgment in both conditions. Additionally, we implemented a waiting time of two seconds in the sequential-collaboration condition to prevent clicking through the study.
Since sequences in sequential collaboration require initial judgments which can be presented to the first participant in a sequential chain, we used participants who answered the wisdom-of-crowds questionnaire to initialize sequences in sequential collaboration. Hence, 37 participants who completed the wisdom-of-crowds questionnaire served to initialize sequences. Since participants are only included in one experimental condition for the analysis, this resulted in 156 participants in the wisdom-of-crowds condition and 148 participants in the sequential-collaboration condition. We used a sequence length of four, meaning that a sequential chain in this experiment consists of one participant who completed the wisdom-of-crowds questionnaire followed by three participants who completed the sequential-collaboration questionnaire consecutively. For each participant, only the latest judgment in the sequential chain was presented.
Figure 2. Questionnaire for (a) wisdom of crowds and (b) sequential collaboration in Experiment 1 and Experiment 2.
Results
Before analyzing the data, we excluded judgments that were timed out after 30 seconds. We identified 315 judgments that were timed out by the experimental software, which resulted in the exclusion of 810 judgments in total since sequential chains containing a judgment that was timed out were excluded completely. From 19,760 judgments, 18,950 judgments remained after the exclusion.
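The chain-wise exclusion can be sketched as follows. This is an illustrative Python sketch, not the authors' R script: the record layout and the function name are hypothetical, and it assumes that exclusion operates per chain and item, with independent wisdom-of-crowds judgments treated as chains of length one. The reported totals are consistent with 304 participants answering 65 items each (304 × 65 = 19,760) and with 19,760 − 810 = 18,950 remaining judgments.

```python
def drop_flagged_chains(records):
    """Remove every judgment that shares a (chain, item) pair with a flagged one."""
    flagged = {(r["chain"], r["item"]) for r in records if r["timed_out"]}
    return [r for r in records if (r["chain"], r["item"]) not in flagged]

# Hypothetical data: two chains of four judgments on one item.
# In chain "c1", the judgment at position 3 timed out.
records = [
    {"chain": "c1", "item": "eiffel", "position": p, "timed_out": p == 3}
    for p in range(1, 5)
] + [
    {"chain": "c2", "item": "eiffel", "position": p, "timed_out": False}
    for p in range(1, 5)
]

kept = drop_flagged_chains(records)  # all four judgments of "c1" are dropped
```

Under this assumption, a single timed-out judgment removes up to four judgments from a chain, which explains why the number of excluded judgments exceeds the number of timeouts.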
Afterwards, we standardized the raw judgments item-wise to obtain comparable values over all items for the following analyses. To standardize the judgments, we subtracted the correct answer for each question from the raw judgments and divided the result by the standard deviation of judgments obtained with the wisdom-of-crowds questionnaire. This procedure offers several benefits: First, since judgments are given on vastly different scales (e.g., single digits for the length of a soccer goal, year dates for the year Leonardo da Vinci was born, or millions for the number of students enrolled in German universities), standardization makes the judgments for these questions comparable for the later analysis. Second, this standardization avoids issues arising with other standardization procedures, especially with logarithmic transformation. As some participants answered questions correctly, applying a logarithmic transformation to the difference of the raw judgment and the correct answer would result in a value of minus infinity. Additionally, the logarithmic transformation is not linear, thus weighing judgments differently depending on their value. Lastly, our transformation improves interpretation of judgments since a negative value now indicates underestimation while a positive value indicates overestimation and a value of zero indicates that the judgment was correct.
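The standardization described above amounts to z = (x − c) / s, where x is the raw judgment, c the correct answer, and s the standard deviation of the wisdom-of-crowds judgments for the same item. The following Python sketch illustrates this; the function name and the example values are hypothetical, and the use of the sample standard deviation (denominator n − 1) is an assumption, as the text does not specify which estimator was used.

```python
from statistics import stdev

def standardize(raw, correct, crowd_judgments):
    """Signed error of `raw` in units of the crowd's standard deviation."""
    return (raw - correct) / stdev(crowd_judgments)  # sample SD (assumption)

# Hypothetical independent judgments for one item (in meters).
crowd = [250, 300, 350, 420]

z_under = standardize(290, 300, crowd)  # negative: underestimation
z_exact = standardize(300, 300, crowd)  # zero: correct judgment
z_over = standardize(420, 300, crowd)   # positive: overestimation
```

Unlike a logarithmic transformation of the error, this linear transformation remains defined for correct answers, which map to exactly zero.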
After standardizing the judgments, we removed the 1% most extreme judgments from the data as they may distort the results obtained.¹ Excluding outliers across conditions is also recommended by André (2021), who demonstrated that excluding outliers within conditions can increase false-positive rates. We identified 190 outliers with this procedure. Again, we excluded these judgments as well as sequences containing judgments that were identified as outliers, which resulted in a final sample of 18,694 judgments.
¹ This approach was proposed by a reviewer of an earlier version of this manuscript. We applied this routine to all studies reported in this manuscript and added an exploratory analysis to demonstrate the effects of excluding extreme judgments on the results for Hypothesis 2 (comparison of estimates obtained by wisdom of crowds and sequential collaboration).
Model-based analysis.
To test Hypothesis 1a, which states that the change rate decreases over the course of a sequential chain, we only considered data obtained in the sequential-collaboration questionnaire (i.e., data of participants on position 2, 3, or 4 in the sequential chain) since only these participants were able to decide whether to change or maintain the presented judgments during the study. We fitted a generalized linear mixed model using R with the packages lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017). The decision to adjust or maintain a judgment served as dependent variable and chain position in the sequence as independent variable. Since the dependent variable can only be 0 (deciding not to adjust the presented judgment) or 1 (adjusting the presented judgment), we used a logit link function to handle the dichotomous dependent variable. Additionally, as every participant answered the same 65 items, we added random intercepts for items and participants to account for the nested structure of our data (Pinheiro & Bates, 2000). Lastly, we set polynomial contrasts to test for a decline in change rate with increasing chain position.
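To make the polynomial contrasts concrete, the following Python sketch constructs orthonormal linear and higher-order contrasts for equally spaced chain positions. It is an illustration only: the function name is hypothetical, and the construction via Gram-Schmidt orthogonalization is one standard way to obtain such contrasts, not necessarily the routine used in the original analysis. For three levels, the result should correspond to the coding that R's contr.poly(3) provides.

```python
from math import sqrt

def polynomial_contrasts(n_levels):
    """Orthonormal polynomial contrasts for equally spaced levels.

    Returns a list of contrast vectors (linear, quadratic, ...), built by
    Gram-Schmidt orthogonalization of the powers of the centered positions.
    """
    center = (n_levels - 1) / 2
    x = [i - center for i in range(n_levels)]
    basis = [[1.0] * n_levels]  # constant term, dropped at the end
    for degree in range(1, n_levels):
        v = [xi ** degree for xi in x]
        for b in basis:  # remove all lower-order components
            scale = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - scale * bi for vi, bi in zip(v, b)]
        norm = sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]

# Chain positions 2, 3, and 4 in Experiment 1 yield two contrasts.
linear, quadratic = polynomial_contrasts(3)
```

With this coding, a negative coefficient for the linear contrast indicates a decrease across chain positions, while the quadratic contrast captures curvature that is independent of the linear trend.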
Figure 3 displays the mean change rate for each chain position and the corresponding 95% confidence intervals. Even though the pattern was descriptively in line with Hypothesis 1a, the effect of chain position was significant for neither the linear trend (β = −0.311, CI = [−0.685, 0.063], z = −1.629, p = .103) nor the quadratic trend (β = 0.162, CI = [−0.213, 0.538], z = 0.846, p = .397). Thus, Hypothesis 1a was not supported.
To test whether judgments become more accurate over the course of a sequential chain (Hypothesis 1b), we only considered data obtained in the sequential-collaboration condition and used absolute values of the standardized judgments as dependent measure to capture how close the judgments are to the true answer and, thus, how accurate they are. We computed a linear mixed model with absolute standardized judgments as dependent variable and chain position as independent variable. Since fixed-effect coefficients in linear mixed models have been shown to be robust against violations concerning the residual distribution (LeBeau et al., 2018; Schielzeth et al., 2020) and our data contain natural zeros that are not adequately transformed using typical transformations applied to right-skewed data, we did not transform the dependent variable to conform to these distributional assumptions. Furthermore, we added random intercepts for participants and items to account for the nested structure of our data, and set polynomial contrasts to test for a decrease in absolute standardized judgments over the course of a sequential chain.
Figure 3. Change rate within a sequential chain. [Figure omitted: bar chart titled “Change rate over chains” with panels for Experiment 1 (chain positions 2 to 4) and Experiment 2 (chain positions 2 to 6); x-axis: chain position; y-axis: change rate, ranging from 0.0 to 0.3.] Note. Bars display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.
377 Figure 4 displays the mean absolute standardized judgments for each chain
378 position and the corresponding 95% confidence intervals. In line with Hypothesis 1b,
379 judgments became more accurate, as the distance to the correct answer declined over the
380 course of a sequential chain. This pattern was also confirmed by the linear mixed model
381 showing a significant negative linear trend (β = −0.024, CI = [−0.036, −0.011],
382 t(143.55) = −3.800, p < .001), thus supporting Hypothesis 1b. Neither the quadratic
383 (β = 0.000, CI = [−0.012, 0.012], t(143.55) = 0.019, p = .985) nor the cubic trend
384 (β = 0.001, CI = [−0.011, 0.013], t(143.55) = 0.173, p = .863) was significant.
385 In order to test whether judgments obtained by sequential collaboration are
386 more accurate than judgments obtained by wisdom of crowds (Hypothesis 2), we first
387 generated the corresponding estimates. For sequential collaboration, the estimate for each
388 chain is the judgment at the last chain position. For wisdom of crowds, we computed
389 estimates by randomly assigning the participants to virtual groups of four and averaging
390 the standardized judgments for each question in these groups (see Footnote 2). This procedure resulted
391 in estimates from 39 groups of participants in the wisdom-of-crowds condition and
392 estimates from 37 sequences of participants in the sequential-collaboration condition on
393 the 65 items presented in the study. We used the absolute value of the obtained
394 estimates as a dependent variable to assess the accuracy of judgments.
Figure 4. Accuracy of judgments within a sequential chain (panels: Experiment 1, chain positions 1 to 4; Experiment 2, chain positions 1 to 6; x-axis: chain position; y-axis: mean standardized judgments). Note. Bars display the empirical means for each condition; error bars show the 95% between-subjects confidence intervals for each condition.
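The construction of wisdom-of-crowds estimates from virtual groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the data layout (one list of standardized judgments per participant), the fixed seed, and the handling of leftover participants (they are dropped) are all hypothetical choices.

```python
import random
from statistics import mean

def wisdom_of_crowds_estimates(judgments, group_size=4, seed=0):
    """Average standardized judgments within randomly formed virtual groups.

    judgments: dict mapping participant id -> list of standardized judgments,
               one per item, in the same item order for every participant.
    Returns one list per group holding the absolute value of the group mean
    for each item (smaller values mean more accurate estimates).
    """
    rng = random.Random(seed)
    ids = list(judgments)
    rng.shuffle(ids)  # random assignment to virtual groups
    n_groups = len(ids) // group_size  # assumption: leftover participants are dropped
    estimates = []
    for g in range(n_groups):
        members = ids[g * group_size:(g + 1) * group_size]
        per_item = zip(*(judgments[p] for p in members))
        estimates.append([abs(mean(values)) for values in per_item])
    return estimates

# Toy data: errors of opposite sign cancel out on the first item.
toy = {"p1": [1.0, 0.5], "p2": [-1.0, 0.5], "p3": [2.0, 0.5], "p4": [-2.0, 0.5]}
toy_estimates = wisdom_of_crowds_estimates(toy)
```

Repeating the call with different seeds corresponds to the robustness check over different random groupings.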
395 To analyze Hypothesis 2, we computed a linear mixed model with the absolute
396 value of the estimate as dependent variable and condition, either wisdom of crowds or
397 sequential collaboration, as independent variable. Additionally, we considered random
398 intercepts for items and for the group of participants from which each estimate was derived.
399 Figure 5 displays the mean absolute standardized estimates and the corresponding 95%
400 confidence intervals. Contrary to Hypothesis 2, estimates in the sequential-collaboration
401 condition showed a slightly higher mean absolute standardized estimate, meaning that these
402 judgments were descriptively farther away from the correct answers and thus less
403 accurate. However, the linear mixed model did not reveal a significant effect of
404 condition on the absolute standardized estimate (β = 0.008, CI = [−0.005, 0.022],
405 t(72.15) = 1.237, p = .220). These results do not support Hypothesis 2.
Footnote 2. To check the robustness of the grouping, we computed the mean difference in absolute standardized estimates and the linear mixed model for 100 different random groupings. The mean difference in absolute standardized estimates was 0.004 (SD = 0.0004). The results of the linear mixed model remained the same for all 100 comparisons.
Figure 5. Comparison of estimates of wisdom of crowds and sequential collaboration (panels: Experiment 1 and Experiment 2; x-axis: condition; y-axis: mean absolute standardized estimate). Note. WoC = wisdom-of-crowds condition, Seq = sequential-collaboration condition. Bars display the empirical means for each condition; error bars show the 95% between-subjects confidence intervals for each condition.
406 Effect of outlier analysis on the results. Since extreme judgments and the
407 exclusion of extreme judgments may have affected estimates obtained by wisdom of
408 crowds and sequential collaboration differently, we additionally checked how varying
409 levels of outlier exclusion affect the results of the linear mixed model computed for
410 Hypothesis 2. We compared the model coefficient for the difference between conditions when
411 excluding between 0% and 5% of the most extreme judgments in
412 0.25% steps. The results of this exploratory analysis are displayed in Figure 6. The plot
413 shows that when only few extreme judgments were excluded, sequential collaboration was
414 more accurate. However, when at least the 0.75% most extreme judgments were excluded,
415 wisdom-of-crowds estimates became more accurate than sequential-collaboration
416 estimates, a difference that was significant in more than half of the comparisons.
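The grid of exclusion levels used in this exploratory analysis can be sketched as follows. The sketch only generates the levels and trims a list of standardized judgments; refitting the linear mixed model at each level and removing the affected chains are omitted. Both function names are hypothetical, and rounding the number of excluded judgments down is an assumption.

```python
import math

def exclusion_levels(max_percent=5.0, step=0.25):
    """Exclusion levels in percent: 0, 0.25, ..., max_percent."""
    n_steps = round(max_percent / step)
    return [i * step for i in range(n_steps + 1)]

def drop_most_extreme(values, percent):
    """Remove the `percent` % of values with the largest absolute size."""
    n_drop = math.floor(len(values) * percent / 100)  # assumption: round down
    ordered = sorted(values, key=abs)  # least extreme first
    return ordered[:len(values) - n_drop]

levels = exclusion_levels()
# In a full analysis, the model would be refit on the trimmed data at each level:
# for level in levels: trimmed = drop_most_extreme(all_judgments, level); ...
```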
Figure 6. Effect of outlier removal on the comparison of wisdom of crowds and sequential collaboration (panels: Experiment 1 and Experiment 2; x-axis: percent removed extreme judgments; y-axis: coefficient; legend: significant vs. not significant). Note. Positive values indicate that estimates are more accurate in wisdom of crowds; negative values indicate more accurate estimates in sequential collaboration.
417 Robustness analysis. Given the pronounced impact of outliers in the present
418 analysis, we also used observation-oriented modeling to perform a robustness analysis
419 (Grice et al., 2012, 2017; Grice, 2011). Observation-oriented modeling is a technique to
420 identify underlying patterns in the raw data by classifying observations according to
421 their fit to the patterns predicted by the hypotheses. Instead of performing typical statistical
422 tests, effect sizes are derived directly from the data. For instance, the Percent
423 Classification Correct (PCC) describes the proportion of observations that are in line
424 with the hypothesis. This method focuses on patterns in the raw data rather than on
425 standardized or aggregated data and has a straightforward interpretation since the PCC
426 is always between 0% and 100%. Observation-oriented modeling is assumption free and
427 robust to outliers (Grice et al., 2012, 2017). Therefore, we perform all robustness
428 analyses on data (1) with exclusion of the 1% most extreme judgments as well as (2)
429 with no exclusion of extreme judgments to further provide insights into the robustness
430 of the reported effects. Since observation-oriented modeling has no requirements
431 concerning the distribution of the data, we perform all robustness analyses using raw
432 data.
433 We applied the PCC to check the robustness of Hypothesis 1a and Hypothesis
434 1b. For Hypothesis 1a, we computed the relative frequency of items for which the
435 change rate decreased over the course of a sequential chain. As in the model-based
436 analysis, we only considered chain positions 2, 3, and 4. As the strictest test, we
437 computed the relative frequency of items for which the change rate decreased
438 monotonically in every sequential step (strict monotonicity). To improve interpretation,
439 we additionally computed a benchmark for the expected relative frequency under the
440 condition that the hypothesis does not hold. Assuming that an increase and decrease in
441 the change rate have the same probability in every sequential step, we computed a
442 benchmark of 1/3! = 16.67% for the strict monotonicity. For data with the 1% most
443 extreme judgments excluded, 20% of all items (i.e., 13 items) showed a strictly
444 monotonic decrease in change rate. Analyzing the data with no outliers excluded, for
445 18.46% of all items (i.e., 12 items) the change rate decreased strictly monotonically.
446 Both of these PCCs exceeded the benchmark, indicating that Hypothesis 1a is
447 supported by the robustness analysis. Even though Hypothesis 1a was not supported in
448 the model-based analysis, the pattern of mean change rates already resembled what was
449 expected under the hypothesis. As a less strict criterion, we additionally computed the
450 relative frequency of items for which the change rate decreased from the second to the
451 last chain position (global monotonicity). For data with extreme judgments excluded,
452 64.62% of all items (i.e., 42 items) showed a lower change rate at the last chain position
453 than at the second chain position. A similar pattern emerged for data with no outlier
454 exclusion (66.15% of all items, i.e., 43 items), further supporting Hypothesis 1a.
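The two PCC criteria for Hypothesis 1a can be illustrated with a short sketch. The change rates below are made-up toy values, and the function names are hypothetical. The benchmark 1/3! can be read as the chance that three values fall in one specific order (here, strictly decreasing) when all 3! orderings are equally likely.

```python
from math import factorial

def strictly_decreasing(rates):
    """Strict monotonicity: the change rate drops at every sequential step."""
    return all(later < earlier for earlier, later in zip(rates, rates[1:]))

def global_decrease(rates):
    """Global monotonicity: the last change rate is below the first one considered."""
    return rates[-1] < rates[0]

def pcc(items, criterion):
    """Proportion of items whose change-rate sequence satisfies the criterion."""
    return sum(criterion(rates) for rates in items) / len(items)

# Toy change rates per item at chain positions 2, 3, and 4 (hypothetical values).
toy_items = [
    [0.30, 0.20, 0.10],  # strict and global decrease
    [0.30, 0.35, 0.20],  # global decrease only
    [0.10, 0.20, 0.30],  # neither
    [0.20, 0.20, 0.10],  # global decrease only (tie at the first step)
]
strict_pcc = pcc(toy_items, strictly_decreasing)
global_pcc = pcc(toy_items, global_decrease)
benchmark = 1 / factorial(3)  # one of 3! equally likely orderings
```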
455 For Hypothesis 1b, we computed the PCC as the relative frequency of chains in
456 sequential collaboration for which the judgments become more accurate or remain
457 stable over the course of a sequential chain. Since participants can decide to maintain
458 judgments, we can only test for weak monotonicity by computing the relative frequency
459 of items for which the accuracy increased or remained stable in every sequential step.
460 Again, we computed a benchmark: Assuming that improving accuracy, keeping a
461 judgment, and worsening a judgment have equal probabilities, we would expect that
462 0.66^3 = 28.75% of all chains improve or remain stable for every sequential step under
463 the condition that the hypothesis does not hold. We found that for data excluding
464 (including) outliers, 79.61% (79.44%) of all chains had stable or improving accuracy for
465 every sequential step. The PCC clearly exceeded the benchmark and supported
466 Hypothesis 1b. Additionally, we computed the relative frequency of chains for which the
467 last judgment is more accurate than the first judgment (global monotonicity). This was
468 the case for 92.72% of all chains with extreme judgments removed and for 92.64% of all
469 chains when no judgments were removed, which further supported Hypothesis 1b.
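The weak-monotonicity criterion and its benchmark can be sketched as follows. The function names and the toy chain are hypothetical. Note that the reported benchmark uses the rounded probability 0.66 for "improve or keep" in each of the three steps; with the exact value 2/3, the same calculation would give about 29.63%.

```python
def weakly_improving(abs_errors):
    """True if the absolute error never increases from one chain position to the next."""
    return all(later <= earlier for earlier, later in zip(abs_errors, abs_errors[1:]))

def weak_benchmark(n_steps, p_not_worse=0.66):
    """Chance that a chain never gets worse if each step is improve/keep/worsen
    with equal probability (0.66 is the rounded value used in the text)."""
    return p_not_worse ** n_steps

# A chain of four judgments has three sequential steps.
benchmark_exp1 = weak_benchmark(3)
example_chain_ok = weakly_improving([0.50, 0.50, 0.20, 0.10])
```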
470 To compare the accuracy of wisdom of crowds and sequential collaboration
471 (Hypothesis 2), we computed the common language effect size (McGraw & Wong,
472 1992). The common language effect size is a nonparametric measure, similar to the PCC,
473 that is defined as the probability that a randomly drawn value of the dependent
474 variable in one condition is smaller than a randomly drawn value in another condition.
475 To test the robustness of Hypothesis 2, we computed the relative frequency that an
476 estimate is more accurate in sequential collaboration than in wisdom of crowds by
477 comparing all estimates of both conditions item-wise. For data with the 1% most
478 extreme judgments removed, we found that for 52.37% of all comparisons estimates in
479 sequential collaboration were more accurate than in wisdom of crowds. This relative
480 frequency was significantly larger than chance in a one-sample t-test
481 (t(64) = 1.705, p = .047). Similarly, when no outliers were excluded, for 54.94% of all
482 comparisons sequential collaboration had more accurate estimates
483 (t(64) = 3.871, p < .001). Even though the model-based analysis did not support
484 Hypothesis 2, the robustness analysis shows some support for Hypothesis 2.
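The item-wise comparison behind the common language effect size can be sketched as follows. Several details are assumptions rather than the authors' documented procedure: ties are counted as not favoring sequential collaboration, and the one-sample t statistic is computed on per-item proportions against 0.5 (which would be consistent with the reported 64 degrees of freedom for 65 items). Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def itemwise_cles(seq_abs, woc_abs):
    """Share of (sequence, group) pairs for one item in which the
    sequential-collaboration estimate is closer to the correct answer."""
    wins = sum(s < w for s in seq_abs for w in woc_abs)  # ties count as no win
    return wins / (len(seq_abs) * len(woc_abs))

def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1

# Toy absolute standardized estimates for a single item (hypothetical values).
cles_item = itemwise_cles([0.1, 0.3], [0.2, 0.4])
# Toy per-item proportions across four items (hypothetical values).
t_value, df = one_sample_t([0.6, 0.7, 0.5, 0.6])
```

A proportion above 0.5 indicates that estimates from sequential collaboration tend to be more accurate than wisdom-of-crowds estimates for that item.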
485 Discussion
486 Overall, Experiment 1 yielded mixed results for the hypotheses. Hypothesis 1b
487 was clearly supported by the model-based analysis as well as the robustness analysis.
488 However, Hypothesis 1a was only supported by the robustness analysis relying on
489 patterns in the raw data but not by the model-based analysis, even though mean change
490 rates for each chain position resembled the expected pattern. Similarly, Hypothesis 2
491 was not supported by the model-based analysis but was supported by the robustness analysis.
492 Nonetheless, the difference in accuracy was very small and not significant, and the
493 analysis of the effect of removing outliers shows that sequential collaboration might only
494 yield more accurate results than wisdom of crowds when extreme judgments are not
495 removed from the data. This might be the case since sequential collaboration requires only
496 one participant who adjusts the presented judgment to eliminate extreme
497 judgments, while averaging in wisdom of crowds requires far more judgments to yield
498 the same effect.
499 Experiment 1 has some limitations restricting the generalizability of the results.
500 First, the sample consisted of college students. Thus, participants were similar in
501 age and educational background, which might have limited the possibilities to improve
502 judgments presented in sequential collaboration since the knowledge of this sample
503 might have been too homogeneous. Furthermore, we implemented a rather short chain
504 length of four. The results, however, might differ when using longer chains.
505 Experiment 2
506 To further examine sequential collaboration and address some of the limitations
507 of Experiment 1, we conducted a second experiment using the same material but
508 increasing the chain length from four to six and collecting an adult sample with no
509 restrictions in age or education. Thereby, we test the robustness of the findings,
510 especially concerning the continuous improvement of judgments within a sequential
511 chain, and further extend the paradigm to a different sample and a longer sequential
512 chain. The design and model-based analysis were preregistered at aspredicted.org (see Footnote 3).
513 Note that we did not preregister Hypothesis 1a concerning the change rate within a
514 sequential chain. Moreover, based on feedback from previous reviews, we improved the
515 exclusion criteria for extreme judgments, and added an exploratory analysis to
516 investigate the effects of removing outliers. Furthermore, we conducted all analyses
517 concerning the accuracy of judgments (Hypothesis 1b and Hypothesis 2) using absolute
518 standardized judgments.
519 Method
520 Material, Design, and Procedure. For Experiment 2, we used the same
521 design (Figure 2) and material (Table A1) as in Experiment 1 but made some minor
522 adjustments. Since the sample was not restricted in age, we extended the time limit for
523 answering the questions from 30 to 40 seconds. Furthermore, we implemented a chain
524 length of six, meaning that the first participant in a sequential chain answered the
525 wisdom-of-crowds questionnaire and was then followed by five participants answering
526 the sequential-collaboration questionnaire.
527 Participants. A German panel provider sampled 686 participants for this
528 study. During data collection, 21 participants were identified who entered more than
529 10% correct answers; they were therefore suspected of looking up answers and were excluded from building
530 sequential chains and from later analysis. Additionally, five participants with irregular
531 answer patterns were identified and excluded. One participant was excluded since the
532 position in the sequential chain was allocated to two different participants. After
533 excluding these participants and, if necessary, participants in the same sequential chain,
534 the final sample comprised 654 participants. Half of the participants were female
535 (50.00%), and the mean age was 47.90 years (SD = 19.55). The largest group of participants had a college
536 degree (27.68%), followed by a high-school diploma (25.69%) and vocational education
537 (22.78%), while 23.85% of all participants had a lower educational attainment.
Footnote 3. The preregistration form is available at https://aspredicted.org/blind.php?x=5q2n5z.
538 Results
539 As preregistered and established in Experiment 1, we first identified and
540 excluded judgments that were timed out after 40 seconds and chains containing timed
541 out judgments. From 42,520 initial judgments, 40,814 judgments remained after this
542 exclusion. The remaining judgments were then standardized item-wise using the same
543 procedure as in Experiment 1 (subtracting the correct answer from the individual
544 judgment and dividing the result by the standard deviation as computed from the
545 wisdom-of-crowds questionnaire). Lastly, the 1% most extreme judgments and chains
546 containing these judgments were excluded from the data resulting in 40,125 judgments
547 for the analysis. We conducted the same analyses as in Experiment 1 for the
548 model-based analysis, for the exploration of the effect of excluding extreme judgments,
549 and for the robustness analysis.
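The item-wise standardization described above can be sketched as follows. This is a minimal illustration, assuming that the standard deviation is the sample standard deviation of the raw judgments given in the wisdom-of-crowds questionnaire for the same item; the function name and the toy values are hypothetical.

```python
from statistics import stdev

def standardize_item(judgments, correct_answer, woc_judgments):
    """Standardize judgments for one item: subtract the correct answer and
    divide by the SD of the wisdom-of-crowds judgments for that item."""
    sd = stdev(woc_judgments)  # assumption: sample SD (n - 1 in the denominator)
    return [(j - correct_answer) / sd for j in judgments]

# Toy item with correct answer 3 and wisdom-of-crowds judgments 1, 2, 3 (SD = 1).
z = standardize_item([5, 2], correct_answer=3, woc_judgments=[1, 2, 3])
abs_z = [abs(value) for value in z]  # absolute standardized judgments (accuracy)
```

A standardized judgment of zero corresponds to the correct answer, so smaller absolute values indicate more accurate judgments.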
550 Model-based analysis.
551 We again estimated a generalized linear mixed model to test Hypothesis 1a with
552 the decision whether to adjust or maintain a judgment as dependent and chain position
553 as independent variable. Figure 3 displays the mean change rate for each chain position
554 with the corresponding confidence intervals. As hypothesized, the plot shows that the change
555 rate decreased over the course of a sequential chain. This pattern is also supported by
556 the generalized linear mixed model showing a significant negative linear trend
557 (β = −0.581, CI = [−0.982, −0.181], z = −2.843, p = .004). No other trend was
558 significant (β = 0.154, CI = [−0.245, 0.553], z = 0.759, p = .448 for the quadratic trend,
559 β = 0.085, CI = [−0.317, 0.487], z = 0.413, p = .679 for the cubic trend, and β = 0.189,
560 CI = [−0.209, 0.588], z = 0.932, p = .351 for the quartic trend).
561 Next, we estimated a linear mixed model with absolute standardized judgments
562 as dependent variable and chain position as independent variable to test Hypothesis 1b.
563 Supporting Hypothesis 1b, the model revealed a significant negative linear trend
564 between chain position and absolute standardized judgment (β = −0.045,
565 CI = [−0.054, −0.036], t(290.96) = −9.528, p < .001). None of the other trends was
566 significant (β = 0.007, CI = [−0.002, 0.016], t(291.00) = 1.572, p = .117 for the
567 quadratic trend, β = −0.003, CI = [−0.012, 0.006], t(290.94) = −0.696, p = .487 for the
568 cubic trend, β = 0.001, CI = [−0.008, 0.010], t(290.94) = 0.174, p = .862 for the quartic
569 trend, and β = 0.001, CI = [−0.008, 0.010], t(291.02) = 0.184, p = .854
570 for the quintic trend). The significant linear trend is also displayed in
571 Figure 4, which shows a decrease in mean absolute standardized judgments over the course
572 of a sequential chain.
573 Before analyzing Hypothesis 2, we again computed the wisdom-of-crowds
574 estimates from randomly composed groups of six participants. Then we computed a
575 linear mixed model with absolute standardized estimate as dependent and condition as
576 independent variable as described in Experiment 1. Figure 5 displays the mean absolute
577 standardized estimates for the wisdom-of-crowds and the sequential-collaboration
578 condition, showing that estimates obtained by sequential collaboration were slightly more
579 accurate than estimates obtained by wisdom of crowds. This impression was confirmed
580 by the linear mixed model showing a significant difference between absolute
581 standardized estimates in favor of sequential collaboration (β = −0.014,
582 CI = [−0.023, −0.005], t(100.71) = −3.067, p = .003), which supports Hypothesis 2.
583 Effect of outlier analysis on the results. To test the effect of excluding
584 extreme judgments on the results of the analysis for Hypothesis 2, we again computed
585 the linear mixed model for data excluding up to 5% of the most extreme
586 judgments. The results displayed in Figure 6 show that the effect is more stable in
587 Experiment 2 than in Experiment 1. While the accuracy advantage of sequential
588 collaboration compared to wisdom of crowds is stronger when no or only a small
589 proportion of outliers is excluded, the effect remains significant for almost all levels of
590 outlier exclusion applied to the data.
591 Robustness Analysis. To further strengthen the results of the model-based
592 analysis, we again computed a robustness analysis based on observation-oriented
593 modeling. To examine the robustness of Hypothesis 1a, we compared the change rates
594 over chain positions for each item. For data excluding (including) outliers, we found that
595 1.54% (3.08%) of all items showed a strictly monotonic decline in change rate. Even
596 though this result exceeds the benchmark of 1/5! = 0.83%, only one item (two items) followed
597 the strict monotonicity. Thus, we also compared the change rate between the second
598 and last chain position. For 92.31% (90.77%) of all items, we found that the change rate
599 decreased from the second to the last chain position, supporting Hypothesis 1a.
600 To test Hypothesis 1b, we computed the PCC for sequential chains that showed
601 improvement in accuracy. When excluding (including) outliers, 59.83% (59.24%) of all
602 chains showed a weak monotonic increase in accuracy, which is more than the benchmark
603 of 0.66^5 = 12.52%. Additionally, for 89.59% (89.41%) of all chains judgments at the last
604 chain position were more accurate than judgments at the first chain position. These
605 results further support Hypothesis 1b.
606 Lastly, we computed the common language effect size to compare the accuracy of
607 estimates obtained by wisdom of crowds and sequential collaboration. For the subset of
608 data excluding outliers, we found that in 57.22% of all comparisons sequential
609 collaboration yielded more accurate results than wisdom of crowds, which was
610 significantly larger than chance (t(64) = 4.526, p < .001). Similarly, when outliers
611 remained in the data, 59.58% (t(64) = 6.311, p < .001) of all comparisons were in favor
612 of sequential collaboration, which further supports Hypothesis 2.
613 Discussion
614 In Experiment 2, all hypotheses were supported using model-based analyses as
615 well as robustness analyses. For sequential collaboration, the results showed a
616 significant decrease in change rate over the course of a sequential chain while judgment
617 accuracy increased, which was also supported by our robustness analyses. Additionally,
618 estimates obtained by sequential collaboration were more accurate than estimates
619 obtained by wisdom of crowds. This effect was also stable for almost all tested
620 proportions of outlier exclusion and was also shown in the robustness analysis.
621 While these results seem promising for sequential collaboration, Experiments 1
622 and 2 both used the same material. This limits the generalizability of the obtained
623 results. Additionally, the general knowledge questions used in the experiments have
624 some limitations. First, the questions are prone to extreme judgments. For instance,
625 one participant answered 120,000,000,000,000,000 kilometers to the question “How long
626 is the mean distance between Earth and Moon?” for which the correct answer is
627 384,400 kilometers. Since the exclusion of outliers has an effect on the results obtained
628 such that sequential collaboration seems especially beneficial when extreme judgments
629 are considered, having extreme judgments in the data might distort the performance of
630 wisdom of crowds and sequential collaboration. Furthermore, general knowledge
631 questions occur rather seldom in online collaboration projects, which limits the
632 ecological validity of the conclusions. Therefore, the results should be replicated using
633 different material less prone to extreme judgments and closer to actual online
634 collaboration projects.
635 Experiment 3
636 In Experiment 3, we conceptually replicated Experiment 1. Instead of general
637 knowledge questions, we generated geographic maps on which participants had to give
638 judgments about the position of different cities. Hence, we focused on two-dimensional
639 location judgments (i.e., x- and y-coordinates) rather than one-dimensional numerical
640 judgments. In contrast to general knowledge questions, two-dimensional location
641 judgments on geographical maps are naturally constrained by the size of the map (more
642 precisely, by the maximum distance between the correct location and all possible
643 judgments) which limits the range of extreme judgments. Otherwise, the study design
644 was similar to Experiment 1 and only minor changes were applied due to the different
645 material. This study was also preregistered at aspredicted.org (see Footnote 4). In addition to the
646 preregistration, we also analyze whether the frequency of changes decreases over the
647 course of a sequential chain (Hypothesis 1a). Furthermore, we adjusted the outlier
648 analysis to the procedure we used in Experiment 1 and Experiment 2 and added the
649 analysis on the effect of outlier exclusion on the results of Hypothesis 2 as well as the
650 robustness analysis with observation-oriented modeling.
651 Method
652 Participants. We recruited 417 adult participants via a commercial German
653 panel provider. Since participants were presented with maps in the study, they were
654 supposed to only participate using a computer. Due to issues in the recruitment of
655 participants by the panel provider, 39 participants were nonetheless able to access and
656 complete the study using mobile devices. We excluded all participants using mobile
657 devices and sequences that included participants using mobile devices which resulted in
658 a total of 70 participants excluded. Additionally, four participants were able to access
659 and complete the study a second time. Therefore, we excluded the data collected at the
660 second participation. Since two of those participants were assigned to the
661 sequential-collaboration condition for their second participation and sequences were
662 built based on their judgments, we excluded another 10 participants in total. We also
663 checked whether participants looked up the correct answers or whether participants
664 clicked at a similar position for all items. We identified one participant who was
665 suspected to look up answers. The final sample comprised 333 participants of whom
666 45.95% were female. The mean age was 45.49 years (SD = 15.17). Participants had a
667 diverse educational background with 35.44% holding a college degree, 24.92% having a
668 high school diploma, 24.02% having vocational education, and 18.32% having a lower
669 educational attainment.
Footnote 4. The complete preregistration form is available at https://aspredicted.org/blind.php?x=e7cm3e.
670 Material. As stimulus material, we selected seven maps displaying different
671 European countries (i.e., Italy, France, Germany, United Kingdom and Ireland, Austria
672 and Switzerland, Spain and Portugal, and, lastly, Poland, the Czech Republic, Hungary, and Slovenia
673 combined). All maps were on a scale of 1:5,000,000 with an image resolution of 800 ×
674 500 pixels. Regarding the available geographic information, the maps only showed land
675 mass, oceans, and country borders (but no rivers, mountains, forests, or other cities).
676 The countries of interest were colored white while all other countries were colored gray;
677 oceans were colored blue and country borders were represented as black lines. For each
678 map, we selected several cities while considering the expected geographic knowledge of
679 German participants. This resulted in between four cities (for the map of Poland, the Czech Republic,
680 Hungary, and Slovenia) and seventeen cities (for the map of Germany). Overall, we
681 selected 57 cities across all seven maps. A comprehensive overview of the material can
682 be found in Table B1 in the Appendix. All presented maps are available in the
683 supplemental material at the OSF (https://osf.io/96nsk/).
684 Design and Procedure. We used a between-subjects design and randomly
685 assigned participants to either the sequential-collaboration questionnaire (112
686 participants) or the wisdom-of-crowds questionnaire (221 participants). After being
687 informed about the aim of and consenting to the study, participants were informed
688 about their task. In the wisdom-of-crowds questionnaire, participants were asked to
689 indicate the position of the given cities on the presented map as accurately as possible.
690 In the sequential-collaboration questionnaire, participants were provided with the
691 location judgment of a city given by a previous participant. Subsequently, they could
692 choose either to modify the given position by indicating a new position or to directly
693 continue to the next city without changing the current location judgment. The order in
694 which the seven maps were presented was randomized, as was the order of the presented
695 cities within each map. Furthermore, each trial asked about the position of only one city
696 such that participants provided only a single location judgment before continuing to the
697 next city. Participants were given 40 seconds to indicate the city’s position or to decide
698 to not change the presented position. Additionally, participants completing the
699 sequential-collaboration questionnaire had a waiting period of 2 seconds before they
700 could continue to the next city. After positioning the cities or deciding not to change
701 the presented position for all 57 cities, participants were asked for demographic
702 information. Lastly, they were debriefed and thanked for participation.
703 As in Experiment 1, we formed sequences of four participants meaning that one
704 participant who answered the wisdom-of-crowds questionnaire started a sequential chain
705 followed by three participants who completed the sequential-collaboration
706 questionnaire. This resulted in 183 participants in the wisdom-of-crowds condition and
707 150 participants in the sequential-collaboration condition.
708 Results
709 Before testing the hypotheses, we computed the Euclidean distance to the correct
710 answer for each judgment as dependent variable for Hypothesis 1b and Hypothesis 2 (see Footnote 5).
711 Next, we excluded judgments that were timed out after 40 seconds. With this
712 procedure, we identified 225 judgments that were timed out; from originally 18,981
713 judgments, 18,433 judgments remained after excluding the timed out judgments and
714 sequential chains containing timed out judgments. As already applied in Experiment 1
715 and Experiment 2, we additionally excluded the 1% most extreme judgments (i.e., 184
716 judgments) as defined by the distance to the correct answer. After excluding these
717 extreme judgments and sequential chains that contained extreme judgments, 18,161
718 judgments remained for analysis. The model-based analysis as well as the exploration
719 concerning the effect of outlier exclusion on the results and the robustness analysis were
720 conducted analogously to the procedure in Experiment 1 and Experiment 2.
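The distance measure used as dependent variable can be illustrated with a short sketch. It assumes that judgments and correct positions are stored as (x, y) pixel coordinates on the map image; the function name and the coordinates are hypothetical.

```python
from math import dist

def distance_to_truth(judgment, truth):
    """Euclidean distance (in pixels) between a location judgment and the
    correct position, both given as (x, y) coordinates."""
    return dist(judgment, truth)

# Toy example: a judgment 3 pixels right of and 4 pixels below the correct position.
error_px = distance_to_truth((403, 254), (400, 250))
```

Because the maps have a fixed size, this distance is bounded, which limits how extreme a single judgment can be.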
721 Model-based analysis.
722 To analyze Hypothesis 1a, we applied a generalized linear mixed model to the
723 data with whether a judgment was adjusted or maintained as dependent and chain
724 position as independent variable. Figure 7 displays the change rate for each chain
725 position with the corresponding 95% between-subjects confidence intervals. As expected, the
726 plot shows a decreasing change rate with increasing chain position. This trend is
727 confirmed by the generalized linear mixed model showing a significant negative linear trend
728 between change rate and chain position (β = −0.937, CI = [−1.845, −0.028],
729 z = −2.021, p = .043), thus supporting Hypothesis 1a. The quadratic trend was not
730 significant (β = 0.409, CI = [−0.503, 1.322], z = 0.879, p = .379).
Footnote 5. All hypotheses were also analyzed using the x- and y-coordinate separately as dependent variables. These analyses yielded the same results as the analysis using Euclidean distances as dependent variable.
Figure 7. Change rate within a sequential chain (x-axis: chain position, 2 to 4; y-axis: change rate). Note. Points display the empirical means for each chain position; error bars show the 95% between-subjects confidence intervals for each condition.
To test Hypothesis 1b, we computed a linear mixed model as in Experiment 1
and Experiment 2. The Euclidean distance of every judgment to the true position of the
city served as the dependent variable and chain position as the independent variable.
The model revealed a significant negative linear trend between chain position and
distance (β = −1.070, CI = [−1.513, −0.626], t(143.10) = −4.700, p < .001). The
quadratic trend was also significant (β = 0.531, CI = [0.087, 0.974], t(143.10) = 2.332,
p = .021), whereas the cubic trend was not (β = −0.054, CI = [−0.498, 0.390],
t(143.10) = −0.238, p = .812). The negative linear trend in combination with the
positive quadratic trend indicates that distance decreases steeply over the first chain
positions and levels off thereafter. This pattern is also displayed in Figure 8. As
expected in Hypothesis 1b, the distance to the true position of a city decreases with
increasing chain position.
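To make the linear, quadratic, and cubic trends concrete, the following Python sketch builds orthonormal polynomial contrasts for equally spaced chain positions. It assumes that the trends were coded with orthogonal polynomial contrasts of the kind produced by R's contr.poly; the text above does not state the coding explicitly, and the function name is hypothetical.

```python
import math


def polynomial_contrasts(n_levels):
    """Orthonormal polynomial contrasts for equally spaced levels.

    Returns n_levels - 1 contrast vectors (linear, quadratic, ...),
    built by Gram-Schmidt orthogonalization of the powers of the
    centered level index.
    """
    center = (n_levels - 1) / 2
    x = [i - center for i in range(n_levels)]
    # Start with the constant vector so every contrast sums to zero.
    basis = [[1.0] * n_levels]
    contrasts = []
    for degree in range(1, n_levels):
        v = [xi ** degree for xi in x]
        # Remove the projection onto each previously built vector.
        for b in basis:
            dot_vb = sum(vi * bi for vi, bi in zip(v, b))
            dot_bb = sum(bi * bi for bi in b)
            v = [vi - dot_vb / dot_bb * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        basis.append(v)
        contrasts.append(v)
    return contrasts


linear, quadratic, cubic = polynomial_contrasts(4)
```

For four chain positions, the quadratic contrast weights the two outer positions positively and the two inner positions negatively. Under this coding, a negative linear coefficient together with a positive quadratic coefficient describes a decrease that is steeper at the start of the chain than at its end. With three levels (chain positions 2 to 4, as in the change-rate analysis) only the linear and quadratic contrasts exist.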
Figure 8
Accuracy of judgments within a sequential chain
[Plot omitted. Title: Sequence of judgments over chains. Y-axis: mean distance to correct answer in pixels (0 to 60). X-axis: chain position (1 to 4).]
Note. Points display the empirical means for each chain position, error bars show the 95% between-subjects confidence intervals for each condition.
Before analyzing Hypothesis 2, we computed the estimates for each condition
according to the procedure in Experiment 1 and Experiment 2. To obtain estimates for
wisdom of crowds, we randomly assigned participants to groups of four and averaged
their judgments for each coordinate. Based on the mean position for each city in those
groups, we computed the Euclidean distance of each estimate to the true position as
the dependent variable. The estimate from the sequential-collaboration condition is the
last judgment in each chain. Figure 9 displays the mean distance to the true position
over all estimates with the corresponding confidence intervals. As expected, estimates
obtained by sequential collaboration show less distance to the true position than
estimates obtained by wisdom of crowds. This impression is supported by a linear mixed
model with the Euclidean distance as the dependent variable and condition as the
independent variable. We found a significant negative effect of condition on distance
(β = −7.088, CI = [−13.971, −0.215], t(80.61) = −2.027, p = .046), indicating that
sequential collaboration yielded more accurate estimates than wisdom of crowds.
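The construction of the wisdom-of-crowds estimates can be illustrated with a short Python sketch for a single city. The function name, the fixed random seed, and the handling of participants who do not fill a complete group (they are dropped here) are assumptions of this sketch and are not taken from the original analysis.

```python
import math
import random


def crowd_estimate_distances(judgments, true_position, group_size=4, seed=0):
    """Distances of group-averaged positions to the true position.

    `judgments` is a list of (x, y) judgments for one city, one per
    participant. Participants are shuffled, split into groups of
    `group_size`, and the coordinates are averaged within each group.
    Leftover participants that do not fill a group are dropped
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    shuffled = list(judgments)
    rng.shuffle(shuffled)
    distances = []
    for start in range(0, len(shuffled) - group_size + 1, group_size):
        group = shuffled[start:start + group_size]
        mean_x = sum(x for x, _ in group) / group_size
        mean_y = sum(y for _, y in group) / group_size
        # Euclidean distance between the averaged position and the truth.
        distances.append(math.dist((mean_x, mean_y), true_position))
    return distances
```

In the sequential-collaboration condition no aggregation is needed, since the estimate is simply the last judgment of each chain, and its distance to the true position can be computed with the same `math.dist` call.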
Figure 9
Estimates obtained by wisdom of crowds and sequential collaboration
[Plot omitted. Title: Comparison of wisdom of crowds and sequential collaboration. Y-axis: mean distance to correct answer in pixels (0 to 40). X-axis: condition (WoC, Seq).]
Note. Points display the empirical means for each condition, error bars show the 95% between-subjects confidence intervals.
Effect of outlier analysis on the results. As in Experiment 1 and
Experiment 2, we additionally explored the effect of excluding extreme judgments from
the data before comparing the accuracy of estimates obtained by wisdom of crowds and
sequential collaboration. Again, we computed the coefficient of the linear mixed model
for Hypothesis 2 for data with 0% to 5% of the most extreme judgments excluded, in
0.25% steps. The results of this exploratory analysis are displayed in Figure 10. As
already observed in Experiment 2, the accuracy advantage of sequential collaboration
decreases the more extreme judgments are excluded as outliers. However, the
coefficient remains significantly negative. Thus, estimates obtained by sequential
collaboration are more accurate for every proportion of excluded extreme judgments.
Figure 10
Effect of outlier removal on the comparison of wisdom of crowds and sequential collaboration
[Plot omitted. Y-axis: coefficient (−0.75 to 0.00). X-axis: percent removed extreme judgments (0.00 to 0.05).]
Note. Positive values indicate that judgments are more accurate in wisdom of crowds, negative values indicate more accurate judgments in sequential collaboration.
Robustness Analysis. To check the robustness of Hypothesis 1a, we again
computed, as PCC, the relative frequency of items for which the change rate decreases
with every sequential step (strict monotonicity) as well as the relative frequency of
items for which the change rate decreased from the second to the last chain position
(global monotonicity). As a benchmark, we assume that an increasing and a decreasing
change rate are equally likely, which results in 1/3! = 16.67% of items that are
expected to conform to the hypothesis by chance. For the subset of data excluding
(including) all outliers, we found that for 40.35% (40.35%) of all items the change rate
decreased strictly monotonically from the second to the last chain position. Since this
relative frequency exceeds the benchmark of 16.67% for both subsets of data, these
results further corroborate the model-based analysis of Hypothesis 1a. Additionally,
91.23% (89.47%) of all items showed a decrease in change rate from the second to the
last chain position.
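The two monotonicity criteria and the chance benchmark can be written down compactly. The following Python sketch assumes that each item is represented by its three change rates at chain positions 2, 3, and 4; the function names are hypothetical and the sketch is not the authors' implementation.

```python
from math import factorial


def strictly_decreasing(rates):
    """Strict monotonicity: the change rate drops at every step."""
    return all(a > b for a, b in zip(rates, rates[1:]))


def globally_decreasing(rates):
    """Global monotonicity: the last change rate is below the first."""
    return rates[-1] < rates[0]


def pcc(items, criterion):
    """Share of items whose change rates satisfy `criterion`."""
    return sum(criterion(rates) for rates in items) / len(items)


# Chance benchmark: exactly one of the 3! possible orderings of three
# distinct change rates is strictly decreasing.
benchmark = 1 / factorial(3)
```

For example, `pcc(items, strictly_decreasing)` would be compared against `benchmark`, whereas `pcc(items, globally_decreasing)` corresponds to the less strict criterion reported last in the paragraph above.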
To check the robustness of Hypothesis 1b, we computed the PCC for increasing
accuracy (operationalized as decreasing distance of the judgments to the true position)
over the course of a sequential chain. As a benchmark for weak monotonicity, we
assumed that improving, maintaining, and impairing a judgment are all equally likely
(i.e., 33% each). Under this assumption, 0.66^3 = 28.75% of all chains are expected to
have only improving or maintained judgments. For data excluding (including) outliers,
we found that 35.20% (35.30%) of all chains show a monotonic increase in accuracy;
both values are above our benchmark and further support Hypothesis 1b. Additionally,
71.97% (72.07%) of all chains improve in accuracy from the first to the last chain
position.
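The benchmark for weak monotonicity follows from the three transitions in a chain of four judgments. A minimal Python sketch, with hypothetical function names, reproduces the arithmetic and the chain-level check:

```python
def weakly_improving(distances):
    """True if the distance to the true position never increases."""
    return all(b <= a for a, b in zip(distances, distances[1:]))


def chance_benchmark(n_steps, p_not_worse=0.66):
    """Probability that all n_steps transitions improve or maintain."""
    return p_not_worse ** n_steps


# Four chain positions give three transitions: 0.66 ** 3 is about 0.2875.
benchmark = chance_benchmark(3)
```

The value 0.66 is the rounded probability used in the text. With the unrounded probability 2/3 the benchmark would be 8/27, or about 29.63%, which is still below the observed shares reported above.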
To analyze the robustness of Hypothesis 2, we computed for each item the
relative frequency of estimate comparisons in which sequential collaboration was more
accurate than wisdom of crowds (common language effect size). When excluding
outliers, we found that the sequential-collaboration estimate was more accurate than
the wisdom-of-crowds estimate in 57.07% of all comparisons, which differs significantly
from chance (t(56) = 5.019, p < .001). When outliers remained in the data, sequential
collaboration yielded the more accurate estimate in 61.78% of all comparisons
(t(56) = 7.603, p < .001). These results further support our model-based analysis of
Hypothesis 2.
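The common language effect size and the test against chance can be sketched as follows. This Python sketch assumes that every sequential-collaboration estimate of an item is compared with every wisdom-of-crowds estimate of the same item, that ties count as not more accurate, and that the item-wise proportions are then tested against 0.5 with a one-sample t-test; these details and the function names are assumptions rather than a description of the original scripts.

```python
import math
from itertools import product


def common_language_effect(seq_distances, woc_distances):
    """Share of all pairs in which the sequential estimate is closer."""
    pairs = list(product(seq_distances, woc_distances))
    return sum(s < w for s, w in pairs) / len(pairs)


def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(var / n), n - 1
```

Applying `common_language_effect` to each item and passing the resulting proportions to `one_sample_t` yields a t statistic whose degrees of freedom equal the number of items minus one.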
Discussion
In Experiment 3, we replicated the results of Experiment 2 using geographic
maps instead of general knowledge questions, showing that sequential collaboration
yields more accurate estimates over the course of a sequential chain while the change
rate of judgments decreases. Additionally, sequential collaboration yielded more
accurate results than wisdom of crowds. Furthermore, the analysis of the effect of
outlier exclusion showed that this advantage becomes smaller the more extreme
judgments are removed, but it remains significant.
General Discussion
Sequential collaboration describes a collaboration method in which contributors
form a sequential chain of judgments by deciding to adjust or to maintain the latest
judgment provided by a previous contributor. In three studies using general knowledge
questions and geographic maps, we examined whether the change rate decreases over
the course of a sequential chain (Hypothesis 1a) while judgment accuracy increases
(Hypothesis 1b). Additionally, we compared the accuracy of estimates obtained by
sequential collaboration and wisdom of crowds (Hypothesis 2). While the results of all
three experiments were in line with Hypothesis 1b, only Experiment 2 and Experiment
3 supported Hypothesis 1a and Hypothesis 2. However, robustness analyses supported
the hypotheses in all three experiments.
Hence, sequential collaboration provides accurate results and was not
obstructed by anchoring effects (Mussweiler et al., 2004; Tversky & Kahneman, 1974) or
high rates of inaccurate changes due to egocentric discounting (Bonaccio & Dalal,
2006). When judgments improve and the change rate decreases over the course of a
sequential chain, judgments may finally converge to a correct judgment that is not
changed anymore. Since large-scale online collaboration projects rely on this basic
mechanism, our results shed light on a possible mechanism that renders those projects
successful with respect to yielding highly accurate information.
Moreover, our findings extend research on how individual judgments are
influenced by information about others’ judgments. Several studies hinted at dependent
judgments being beneficial in certain situations (Becker et al., 2017; King et al., 2012;
Koehler & Beauregard, 2006; Minson et al., 2017), and our results show that
dependent incremental judgments can also yield accurate estimates. Overall, providing
the judgments of others can improve individual judgments.
Furthermore, estimates obtained by sequential collaboration were more accurate
than estimates obtained by wisdom of crowds. Nonetheless, we found that wisdom of
crowds seems to profit more from outlier exclusion than sequential collaboration.
Sequential collaboration allows extreme judgments to be eliminated completely from
the sequential chain when contributors adjust them. Thus, an accurate estimate can
still be obtained even though the chain may have started with an extreme judgment.
In wisdom of crowds, extreme judgments can distort estimates more seriously since
judgments are aggregated without weighting. Thus, far more judgments or an equally
extreme judgment on the other side of the distribution are necessary to absorb the
initial extreme judgment. However, sequential collaboration was also successful in a
paradigm where extreme outliers are very unlikely (Experiment 3). Moreover, excluding
outliers reduced but did not eliminate the accuracy advantage over wisdom of crowds.
This shows that even though wisdom of crowds already yields highly accurate results
and can profit from error cancellation in judgments, some improvements in accuracy are
obtainable by dependent incremental judgments. As expected, since wisdom of crowds
already yields highly accurate estimates, this effect was rather small.
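The point about absorbing an extreme judgment in an unweighted average can be made concrete with a small, idealized calculation. The Python sketch below assumes that exactly one judgment deviates from the correct answer and that all other judgments are exactly correct; both functions are hypothetical illustrations and do not model the experimental data.

```python
import math


def mean_error_with_outlier(n_judgments, outlier_error):
    """Absolute error of the unweighted mean when one judgment is off by
    `outlier_error` and the remaining n_judgments - 1 are exactly correct."""
    return abs(outlier_error) / n_judgments


def judgments_needed(outlier_error, tolerance):
    """Smallest crowd size for which the mean error is within `tolerance`."""
    return math.ceil(abs(outlier_error) / tolerance)
```

In a group of four, an outlier that is off by 90 units shifts the mean by 22.5 units, and 18 judgments are needed to bring the error down to 5 units. In a sequential chain, by contrast, a single later contributor who corrects the judgment removes the error entirely.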
Possible mechanisms
Even though the results are encouraging for future research on sequential
collaboration, our experiments did not explain why sequential collaboration yields
accurate results. Nonetheless, existing research gives an impression of which processes
could lead to improved judgments over the course of a sequential chain. For example,
while most studies on group decision making show that groups perform worse than
individuals on several tasks (Kerr & Tindale, 2004), some studies found that judgments
of groups can be more accurate than the average of the group members’ individual
judgments (Laughlin et al., 1999; Sniezek & Henry, 1989, 1990). This is attributed to
increases in individual capability through group interaction (group-to-individual
transfer, Schultze et al., 2012). Group-to-individual transfer also occurs when group
members interact only once (Stern et al., 2017) and when interaction does not take
place in person (Maciejovsky & Budescu, 2007). Applying this research to sequential
collaboration, contributors may profit from group-to-individual transfer through the
judgments they encounter and thus subsequently give more accurate judgments
themselves. However, in our study design, when less capable contributors encountered
judgments of more capable contributors, the latter had already had the opportunity to
adjust these judgments. Thus, less capable contributors were probably not able to
substantially contribute to the already given judgments.
Besides, sequential collaboration may yield accurate results because contributors
implicitly weight judgments by expertise. Judgments in wisdom of crowds improve when
the questions to be answered are self-selected (Bennett et al., 2018), while Kerr and
Bruun (1983) found that participants refrain from contributing to group work when
they feel their judgment is dispensable. Furthermore, weighting judgments by expertise
was found to improve estimates in wisdom of crowds (Budescu & Chen, 2014; Merkle et
al., 2020). Taken together, these findings suggest that the task structure of sequential
collaboration allows contributors to maintain a presented judgment when they do not
feel they can sufficiently contribute to it and to adjust it when they feel that an
adjustment leads to an improvement. Thereby, judgments are implicitly weighted by
expertise such that contributors maintain judgments that they cannot improve and
adjust judgments that they can improve. In an ideal case, this leads to improvements in
judgments until a correct judgment is no longer adjusted. Future research should
examine whether these possible mechanisms contribute to the improvement of
judgments in sequential collaboration.
Limitations and future research directions
Even though sequential collaboration seems a promising paradigm for future
research, our experiments have some limitations. First, we only studied the basic
mechanism of sequential collaboration. However, Wikipedia and OpenStreetMap have
several additional functions such as discussion sites, a board of moderators checking on
the contributors’ activities, and a history of all changes ever made to an entry. These
additional functions may also influence the changing behavior in online collaborative
projects even though they are less prominent than the information presented in an
article or on the map.
Second, the results of the exploratory analysis concerning the effect of outlier
removal hint toward sequential collaboration requiring a heterogeneous sample to yield
significantly more accurate results than wisdom of crowds, since this effect was not
found in a student sample in Experiment 1 but was found in more heterogeneous
samples in Experiment 2 and Experiment 3. Future research should address how a
crowd of potential contributors should be composed to optimally generate accurate
estimates.
Lastly, our studies used short chains of four to six contributors. However,
sequential chains in online collaborative projects can be much longer, comprising dozens
of contributions. Very long chains may be prone to obstruction by extreme judgments
since a single contributor can worsen an already correct judgment, whereas extreme
judgments in wisdom of crowds can be absorbed when many judgments are available.
Thus, the behavior of contributors in longer chains should be examined in the future
and compared to estimates obtained by wisdom of crowds.
Conclusion
Sequential collaboration as an underlying process of large-scale online
collaborative projects such as Wikipedia and OpenStreetMap has become increasingly
important. Our studies show that contributors can successfully collaborate through
adjusting and maintaining previous judgments of other contributors. Sequential
collaboration can thus be a promising paradigm for future research due to the high
practical and theoretical relevance of how dependency and opting out of giving a
judgment affect sequential collaboration.
911 References
912 André, Q. (2021). Outlier exclusion procedures must be blind to the researcher’s
913 hypothesis. Journal of Experimental Psychology: General.
914 https://doi.org/10.1037/xge0001069
915 Arazy, O., Morgan, W., & Patterson, R. (2006). Wisdom of the crowds:
916 Decentralized knowledge construction in Wikipedia. SSRN Electronic
917 Journal. https://doi.org/10.2139/ssrn.1025624
918 Baeza-Yates, R., & Saez-Trumper, D. (2015). Wisdom of the crowd or wisdom of
919 a few? An analysis of users’ content generation. Proceedings of the 26th ACM
920 Conference on Hypertext & Social Media - HT ’15, 69–74.
921 https://doi.org/10.1145/2700171.2791056
922 Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear
923 mixed-effects models using lme4. Journal of Statistical Software, 67 (1), 1–48.
924 https://doi.org/10.18637/jss.v067.i01
925 Becker, J., Brackbill, D., & Centola, D. (2017). Network dynamics of social
926 influence in the wisdom of crowds. Proceedings of the National Academy of
927 Sciences, 114, E5070–E5076. https://doi.org/10.1073/pnas.1615978114
928 Bennett, S. T., Benjamin, A. S., Mistry, P. K., & Steyvers, M. (2018). Making a
929 wiser crowd: Benefits of individual metacognitive control on crowd
930 performance. Computational Brain & Behavior, 1, 90–99.
931 https://doi.org/10.1007/s42113-018-0006-4
932 Bonaccio, S., & Dalal, R. S. (2006). Advice taking and decision-making: An
933 integrative literature review, and implications for the organizational sciences.
934 Organizational Behavior and Human Decision Processes, 101, 127–151.
935 https://doi.org/10.1016/j.obhdp.2006.07.001
936 Bonner, B. L., Sillito, S. D., & Baumann, M. R. (2007). Collective estimation:
937 Accuracy, expertise, and extroversion as sources of intra-group influence.
938 Organizational Behavior and Human Decision Processes, 103, 121–133.
939 https://doi.org/10.1016/j.obhdp.2006.05.001
940 Bray, R. M., Kerr, N. L., & Atkin, R. S. (1978). Effects of group size, problem
941 difficulty, and sex on group performance and member reactions. Journal of
942 Personality and Social Psychology, 36, 1224–1240.
943 https://doi.org/10.1037/0022-3514.36.11.1224
944 Budescu, D. V., & Chen, E. (2014). Identifying expertise to extract the wisdom
945 of crowds. Management Science, 61, 267–280.
946 https://doi.org/10.1287/mnsc.2014.1909
947 Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group
948 productivity and member withdrawal in online volunteer groups. Proceedings
949 of the 28th International Conference on Human Factors in Computing
950 Systems - CHI ’10, 821. https://doi.org/10.1145/1753326.1753447
951 Dalkey, N., & Helmer, O. (1963). An experimental application of the DELPHI
952 method to the use of experts. Management Science, 9, 458–467.
953 https://doi.org/10.1287/mnsc.9.3.458
954 Davis-Stober, C. P., Budescu, D. V., Dana, J., & Broomell, S. B. (2014). When
955 is a crowd wise? Decision, 1 (2), 79–101. https://doi.org/10.1037/dec0000004
956 Dennis, A. R. (1996). Information exchange and use in small group decision
957 making. Small Group Research, 27, 532–550.
958 https://doi.org/10.1177/1046496496274003
959 Dennis, A. R., Hilmer, K. M., & Taylor, N. J. (1998). Information exchange and
960 use in GSS and verbal group decision making: Effects of minority influence.
961 Journal of Management Information Systems, 14 (3), 61–88.
962 https://search.proquest.com/docview/218899978/abstract/3186065D57B54572PQ/1
963 Galton, F. (1907). Vox populi. Nature, 75, 450–451.
964 https://doi.org/10.1038/075450a0
965 Geist, M. R. (2010). Using the Delphi method to engage stakeholders: A
966 comparison of two studies. Evaluation and Program Planning, 33, 147–154.
967 https://doi.org/10.1016/j.evalprogplan.2009.06.006
968 Giles, J. (2005). Internet encyclopaedias go head to head. Nature, 438, 900–901.
969 https://doi.org/10.1038/438900a
970 Girres, J.-F., & Touya, G. (2010). Quality assessment of the French
971 OpenStreetMap dataset. Transactions in GIS, 14, 435–459.
972 https://doi.org/10.1111/j.1467-9671.2010.01203.x
973 Grice, J. W. (2011). Observation oriented modeling: Analysis of cause in the
974 behavioral sciences. Elsevier Academic Press.
975 Grice, J. W., Barrett, P. T., Schlimgen, L. A., & Abramson, C. I. (2012).
976 Toward a brighter future for psychology as an observation oriented science.
977 Behavioral Sciences, 2, 1–22. https://doi.org/10.3390/bs2010001
978 Grice, J. W., Yepez, M., Wilson, N. L., & Shoda, Y. (2017).
979 Observation-oriented modeling: Going beyond “is it all a matter of chance?”
980 Educational and Psychological Measurement, 77, 855–867.
981 https://doi.org/10.1177/0013164416667985
982 Hogarth, R. M. (1978). A note on aggregating opinions. Organizational Behavior
983 and Human Performance, 21, 40–46.
984 https://doi.org/10.1016/0030-5073(78)90037-5
985 Hueffer, K., Fonseca, M. A., Leiserowitz, A., & Taylor, K. M. (2013). The
986 wisdom of crowds: Predicting a weather and climate-related event. Judgment
987 and Decision Making, 8, 16.
988 Jeste, D. V., Ardelt, M., Blazer, D., Kraemer, H. C., Vaillant, G., & Meeks, T.
989 W. (2010). Expert consensus on characteristics of wisdom: A Delphi method
990 study. The Gerontologist, 50, 668–680.
991 https://doi.org/10.1093/geront/gnq022
992 Keck, S., & Tang, W. (2020). Enhancing the wisdom of the crowd with
993 cognitive-process diversity: The benefits of aggregating intuitive and
994 analytical judgments. Psychological Science, 1272–1282.
995 https://doi.org/10.1177/0956797620941840
996 Kerr, N. L., & Bruun, S. E. (1983). Dispensability of member effort and group
997 motivation losses: Free-rider effects. Journal of Personality and Social
998 Psychology, 44, 78–94. https://doi.org/10.1037/0022-3514.44.1.78
999 Kerr, N. L., & Tindale, R. S. (2004). Group performance and decision making.
1000 Annual Review of Psychology, 55, 623–655.
1001 https://doi.org/10.1146/annurev.psych.55.090902.142009
1002 King, A. J., Cheng, L., Starke, S. D., & Myatt, J. P. (2012). Is the true ‘wisdom
1003 of the crowd’ to copy successful individuals? Biology Letters, 8, 197–200.
1004 https://doi.org/10.1098/rsbl.2011.0795
1005 Kittur, A., & Kraut, R. E. (2008). Harnessing the wisdom of crowds in
1006 Wikipedia: Quality through coordination. Proceedings of the 2008 ACM
1007 Conference on Computer Supported Cooperative Work, 37–46.
1008 https://doi.org/10.1145/1460563.1460572
1009 Kittur, A., Pendleton, B. A., Suh, B., & Mytkowicz, T. (2007). Power of the few
1010 vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. CHI
1011 ’07: Proceedings of the SIGCHI Conference on Human Factors in Computing
1012 Systems.
1013 Koehler, D. J., & Beauregard, T. A. (2006). Illusion of confirmation from
1014 exposure to another’s hypothesis. Journal of Behavioral Decision Making, 19,
1015 61–78. https://doi.org/10.1002/bdm.513
1016 Kräenbring, J., Monzon Penza, T., Gutmann, J., Muehlich, S., Zolk, O.,
1017 Wojnowski, L., Maas, R., Engelhardt, S., & Sarikas, A. (2014). Accuracy and
1018 completeness of drug information in Wikipedia: A comparison with standard
1019 textbooks of pharmacology. PLoS ONE, 9 (9).
1020 https://doi.org/10.1371/journal.pone.0106930
1021 Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest
1022 package: Tests in linear mixed effects models. Journal of Statistical Software,
1023 82 (13), 1–26. https://doi.org/10.18637/jss.v082.i13
1024 Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions:
1025 Misappreciation of the averaging principle. Management Science, 52,
1026 111–127. https://doi.org/10.1287/mnsc.1050.0459
1027 Laughlin, P. R., Bonner, B. L., Miner, A. G., & Carnevale, P. J. (1999). Frames
1028 of reference in quantity estimations by groups and individuals.
1029 Organizational Behavior and Human Decision Processes, 80, 103–117.
1030 https://doi.org/10.1006/obhd.1999.2848
1031 LeBeau, B., Song, Y. A., & Liu, W. C. (2018). Model misspecification and
1032 assumption violations with the linear mixed model: A meta-analysis. SAGE
1033 Open, 8. https://doi.org/10.1177/2158244018820380
1034 Leithner, A., Maurer-Ertl, W., Glehr, M., Friesenbichler, J., Leithner, K., &
1035 Windhager, R. (2010). Wikipedia and osteosarcoma: A trustworthy patients’
1036 information? Journal of the American Medical Informatics Association :
1037 JAMIA, 17, 373–374. https://doi.org/10.1136/jamia.2010.004507
1038 Lu, L., Yuan, Y. C., & McLeod, P. L. (2012). Twenty-five years of hidden
1039 profiles in group decision making: A meta-analysis. Personality and Social
1040 Psychology Review, 16, 54–75. https://doi.org/10.1177/1088868311417243
1041 Maciejovsky, B., & Budescu, D. V. (2007). Collective induction without
1042 cooperation? Learning and knowledge transfer in cooperative groups and
1043 competitive auctions. Journal of Personality and Social Psychology, 92,
1044 854–870. https://doi.org/10.1037/0022-3514.92.5.854
1045 McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic.
1046 Psychological Bulletin, 111, 361–365.
1047 https://doi.org/10.1037/0033-2909.111.2.361
1048 Merkle, E. C., Saw, G., & Davis-Stober, C. (2020). Beating the average forecast:
1049 Regularization based on forecaster attributes. Journal of Mathematical
1050 Psychology, 98, 102419. https://doi.org/10.1016/j.jmp.2020.102419
1051 Minson, J. A., Mueller, J. S., & Larrick, R. P. (2017). The contingent wisdom of
1052 dyads: When discussion enhances vs. undermines the accuracy of
1053 collaborative judgments. Management Science, 64, 4177–4192.
1054 https://doi.org/10.1287/mnsc.2017.2823
1055 Mussweiler, T., Englich, B., & Strack, F. (2004). Anchoring effect. In R. F. Pohl
1056 (Ed.), Cognitive illusions (1st ed., pp. 183–199). Psychology Press.
1057 Niederer, S., & Dijck, J. van. (2010). Wisdom of the crowd or technicity of
1058 content? Wikipedia as a sociotechnical system. New Media & Society, 12,
1059 1368–1387. https://doi.org/10.1177/1461444810365297
1060 OpenStreetMap Contributors. (2021). OpenStreetMap.
1061 https://www.openstreetmap.org/about
1062 Pinheiro, J. C., & Bates, D. M. (Eds.). (2000). Linear mixed-effects models:
1063 Basic concepts and examples. In Mixed-effects models in S and S-PLUS (pp.
1064 3–56). Springer. https://doi.org/10.1007/978-1-4419-0318-1_1
1065 Pohl, R. F. (1998). The effects of feedback source and plausibility of hindsight
1066 bias. European Journal of Cognitive Psychology, 10, 191–212.
1067 https://doi.org/10.1080/713752272
1068 Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H.,
1069 Teplitsky, C., Réale, D., Dochtermann, N. A., Garamszegi, L. Z., &
1070 Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to
1071 violations of distributional assumptions. Methods in Ecology and Evolution,
1072 11, 1141–1152. https://doi.org/10.1111/2041-210X.13434
1073 Schultze, T., Mojzisch, A., & Schulz-Hardt, S. (2012). Why groups perform
1074 better than individuals at quantitative judgment tasks: Group-to-individual
1075 transfer as an alternative to differential weighting. Organizational Behavior
1076 and Human Decision Processes, 118, 24–36.
1077 https://doi.org/10.1016/j.obhdp.2011.12.006
1078 Simmons, J. P., Nelson, L. D., Galak, J., & Frederick, S. (2011). Intuitive biases
1079 in choice versus estimation: Implications for the wisdom of crowds. Journal
1080 of Consumer Research, 38, 1–15. https://doi.org/10.1086/658070
1081 Sniezek, J. A., & Henry, R. A. (1989). Accuracy and confidence in group
1082 judgment. Organizational Behavior and Human Decision Processes, 43, 1–28.
1083 https://doi.org/10.1016/0749-5978(89)90055-1
1084 Sniezek, J. A., & Henry, R. A. (1990). Revision, weighting, and commitment in
1085 consensus group judgment. Organizational Behavior and Human Decision
1086 Processes, 45, 66–84. https://doi.org/10.1016/0749-5978(90)90005-T
1087 Stasser, G., & Titus, W. (1985). Pooling of unshared information in group
1088 decision making: Biased information sampling during discussion. Journal of
1089 Personality and Social Psychology, 48, 1467–1478.
1090 https://doi.org/10.1037/0022-3514.48.6.1467
1091 Stern, A., Schultze, T., & Schulz-Hardt, S. (2017). How much group is
1092 necessary? Group-to-individual transfer in estimation tasks. Collabra:
1093 Psychology, 3 (16). https://doi.org/10.1525/collabra.95
1094 Steyvers, M., Miller, B., Hemmer, P., & Lee, M. (2009). The wisdom of crowds
1095 in the recollection of order information. In Y. Bengio, D. Schuurmans, J.
1096 Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information
1097 processing systems (Vol. 22, pp. 1785–1793). Curran Associates, Inc.
1098 https://proceedings.neurips.cc/paper/2009/file/4c27cea8526af8cfee3be5e183ac9605-Paper.pdf
1100 Surowiecki, J. (2004). The wisdom of crowds (1. ed). Anchor Books.
1101 Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics
1102 and biases. Science, 185, 1124–1131.
1103 https://doi.org/10.1126/science.185.4157.1124
1104 Wagner, C., & Vinaimont, T. (2010). Evaluating the wisdom of crowds. Issues
1105 in Information Systems, 11, 724–732.
1106 Wikipedia Contributors. (2021). Wikipedia:about.
1107 https://en.wikipedia.org/wiki/Wikipedia:About
1108 Yaniv, I., & Kleinberger, E. (2000). Advice taking in decision making: Egocentric
1109 discounting and reputation formation. Organizational Behavior and Human
1110 Decision Processes, 83, 260–281. https://doi.org/10.1006/obhd.2000.2909
1111 Zheng, S., & Zheng, J. (2014). Assessing the completeness and positional
1112 accuracy of OpenStreetMap in China. In T. Bandrova, M. Konecny, & S.
1113 Zlatanova (Eds.), Thematic cartography for the society (pp. 171–189).
1114 Springer International Publishing.
1115 https://doi.org/10.1007/978-3-319-08180-9_14
1116 Zielstra, D., & Zipf, A. (2010). Quantitative studies on the data quality of
1117 OpenStreetMap in Germany. AGILE 2010. The 13th AGILE international
1118 conference on geographic information science.
Appendix A General Knowledge Questions
Table A1 Table of items for Experiment 1 and Experiment 2 using general knowledge questions.
Item Question Correct answer
1 How large is the Eiffel Tower? 300 meters
2 How many sovereign countries are located in Africa? 54 countries
3 How long is the Nile? 6650 kilometers
4 How old was Johann Wolfang von Goethe? 82 years old
5 How many bones does a human have? 214 bones
6 What is earth’s mean radius? 6371 kilometers
7 How old was Martin Luther King Jr.? 39 years old
8 How tall is the Brandenburg Gate? 26 meters
9 How high was the highest temperature ever measured on earth? 57 °C
10 At what temperature does lead melt? 328 °C
11 In which year did the first manned space flight take place? 1961
12 How high is Mount Everst? 8848 meters
13 How much does a tennis ball weigh? 57 gramms
14 How many keys does a typical piano have? 88 keys
15 How fast can a cheetah run? 112 kilometers per hour
16 How long can a blue whale get? 33 meters
17 How much do ten liters of oxygen weigh? 14 gramms
18 How many sovereign countries are located in Africa? 54 countries
19 How many prime numbers are in the interval between 1 and 1000? 168 prime numbers
20 How many star constellations are officially recognized? 88 constellations
21 How many kilocalories do ten gummy bears have (i.e. 30 gramms)? 98 kilocalories
22 How long is a soccer goal? 7 meters
23 When was the last capital punishment enforced in France? 1977
24 How many plays of Shakespeare are preserved? 33 plays
25 How long is the kidney of a full-grown person? 12 centimeters
26 How many species of the Hawaiian honeycreeper exist? 21 species
27 When was the lightning rod invented? 1752
28 When did the first modern Olympic Games take place? 1896
29 How fast can a raindrop fall? 9 meters per second
30 When was Leonardo da Vinci born? 1452
31 What is the maximum time that a total solar eclipse can last? 7 minutes
32 How many strings does a concert harp have? 47 strings
33 What is the mean life expectancy of women in Germany? 81 years
34 How wide is Lake Constance at its widest point? 14 kilometers
35 How long is the distance between Earth and the Sun in million kilometers? 150 million kilometers
36 When was women’s suffrage adopted in Switzerland? 1971
37 How many chapters does the Quran have? 114 chapters
38 How many times larger is the diameter of Jupiter compared to the diameter of Earth? 11 times
39 How large is the island of Borkum? 31 square kilometers
40 How many singles did the Beatles officially release? 22 singles
41 How old was Alexander the Great when he waged his first campaign? 18 years old
42 How many species of insects live in Antarctica? 52 species
43 How many federal states does Austria have? 9 federal states
44 When was the first human heart transplant performed? 1967
45 How many marriages were there in Germany in 2018? 449466 marriages
46 How many students were enrolled in German universities in the winter term of 2019/2020? 2897336 students
47 How many floors does Burj Khalifa have? 163 floors
48 How far is Frankfurt (Main) from Berlin (linear distance)? 424 kilometers
49 When was the first color film available in Germany? 1936
50 When was the numerus clausus first applied in German universities? 1968
51 How far is Paris from London (linear distance)? 343 kilometers
52 How far is Dortmund from Hamburg (linear distance)? 284 kilometers
53 How far is Munich from Athens (linear distance)? 1496 kilometers
54 How tall is the Statue of Liberty including its pedestal? 93 meters
55 When was slavery officially ended in the United States? 1865
56 When was the first Autobahn inaugurated? 1921
57 When did Albert Schweitzer receive the Nobel Peace Prize? 1952
58 How long is the mean distance between Earth and Moon? 384400 kilometers
59 In which year was Uranus discovered by William Herschel? 1781
60 How many letters does the Arabic script have? 28 letters
61 How deep is the Pacific at the deepest point? 10094 meters
62 When was Astrid Lindgren born? 1907
63 How much does the heart of a full-grown person weigh? 300 gramms
64 How long can a green anaconda get? 8 meters
65 After how many days has a person’s top layer of skin completely renewed? 28 days
Appendix B
Cities Selected for Different Maps
Table B1
Items for Experiment 3 using map material.
Item Map City
1 Austria and Switzerland Zurich
2 Austria and Switzerland Geneva
3 Austria and Switzerland Basel
4 Austria and Switzerland Bern
5 Austria and Switzerland Vienna
6 Austria and Switzerland Graz
7 Austria and Switzerland Linz
8 Austria and Switzerland Salzburg
9 France Paris
10 France Marseille
11 France Lyon
12 France Toulouse
13 France Nice
14 Italy Rome
15 Italy Milan
16 Italy Naples
17 Italy Florence
18 Italy Venice
19 Spain and Portugal Madrid
20 Spain and Portugal Barcelona
21 Spain and Portugal Seville
22 Spain and Portugal Lisbon
23 Spain and Portugal Porto
24 United Kingdom and Ireland London
25 United Kingdom and Ireland Birmingham
26 United Kingdom and Ireland Glasgow
27 United Kingdom and Ireland Liverpool
28 United Kingdom and Ireland Dublin
29 Poland, Czech Republic, Hungary and Slovakia Warsaw
30 Poland, Czech Republic, Hungary and Slovakia Prague
31 Poland, Czech Republic, Hungary and Slovakia Bratislava
32 Poland, Czech Republic, Hungary and Slovakia Budapest
33 Germany Berlin
34 Germany Hamburg
35 Germany Cologne
36 Germany Frankfurt
37 Germany Stuttgart
38 Germany Düsseldorf
39 Germany Leipzig
40 Germany Dortmund
41 Germany Essen
42 Germany Bremen
43 Germany Dresden
44 Germany Hannover
45 Germany Nuremberg
46 Germany Duisburg
47 Germany Wuppertal
48 Germany Bielefeld
49 Germany Bonn
50 Germany Münster
51 Germany Karlsruhe
52 Germany Mannheim
53 Germany Augsburg
54 Germany Wiesbaden
55 Germany Braunschweig
56 Germany Kiel
57 Germany Munich