Intuitive : Identifying Children’s Data Comparison Strategies using Eye Tracking

Bradley J. Morris ([email protected]), Patrick F. Cravalho ([email protected]), Angela Junglen ([email protected]), Christopher Was ([email protected]) Department of Educational , 405 White Hall Kent, OH 44242 USA

Amy M. Masnick ([email protected]) Department of Psychology, Hauser Hall Hempstead, NY 11549 USA

Abstract Dehaene, & Spelke, 2004; Opfer & Siegler, 2012). People often compare sets of numbers informally, in Differences between single-digit numbers are detected more considering prices or sports performance. Children who lack quickly and accurately as the ratio of the numbers increases knowledge of formal comparison strategies (e.g., statistics) (Dehaene, 2009). For example, reaction times are slower may use intuitive strategies like estimation that create and evaluations are less accurate when comparing 9 and 10 summary values with approximations of means and variance. (9:10 ratio of numbers), than when comparing 3 and 9 (1:3 There were two goals for this experiment: (1) to classify data comparison strategies and (2) to evaluate whether children’s ratio of numbers). This distance effect is evidence that strategy discovery and selection is effective. Using eye numerical quantities are compared using approximate tracking, we identified strategies used by 41 8-12-year-old representations. This effect is detected across species children when comparing number sets, by examining how the (Brannon, 2003), across age of participants (Feigenson et properties of the data sets (e.g., mean ratios and variance) al., 2004), and across presentation formats (e.g., dots, influenced accuracy and confidence in differences. We Arabic numbers, fractions; Buckley & Gillman, 1974; classified strategies from eye tracking patterns; these Dehaene, 2001; Sprute & Temple, 2011). strategies were associated with different levels of accuracy, Multi-digit number comparisons (e.g., 63 vs. 72) and strategy selection was adaptive in that selection was demonstrate a unique distance effect in that two- and three- related to the statistical properties of the sets being compared. The results demonstrate that children are quite adept at digit numbers produce distance effects for each place value informally comparing data sets and adaptively select unit (i.e., tens vs. ones; Korvorst & Damian, 2008; Nuerk, strategies to match the properties of the sets themselves. Weger, & Willmes, 2001). Eye tracking results demonstrate more fixations when there is incompatibility between places Keywords: Intuitive statistics, cognitive development, (e.g., 47 vs. 51), where one number has a larger value in the eye tracking tens column, but the other one has a larger value in the ones column (Moeller, Fischer, Nuerk, & Willmes, 2009). Introduction The Approximate Number System (ANS) allows children, even infants, to discriminate quantities, although the threshold at which accurate discrimination occurs changes How does a shopper determine which store has the lowest over development (Mou & vanMarle, in press). Specifically, prices? This could be achieved by comparing sets of prices infants can distinguish quantities at a ratio of 1:2; by early using a formal statistical analysis like a t-test. However, elementary school, children distinguish at a 4:5 ratio; and when comparing numbers in context (data), outside of a adults discriminate quantities at a 9:10 ratio (Halberda & research setting, such comparisons are more likely to occur Feigenson, 2008; Xu & Spelke, 2000). Children are also informally. Surprisingly, there has been relatively little adept at using the ANS to estimate solutions to problems for research investigating how people compare sets of numbers which they lack formal solution strategies (Gilmore, (e.g., Morris & Masnick, in press) despite the rapidly McCarthy, & Spelke, 2007). One explanation for the growing literature investigating comparisons between single increasing acuity is the strategies children (and adults) use numbers (e.g., Dehaene, 2009). In the current paper, we to compare quantities (Hyde, 2011; Mu & van Marle, in investigate the comparison of number sets in children using press). That is, strategies are related to goals (e.g., accuracy) eye tracking in order to classify and evaluate naïve and constrained by processing limitations (e.g., working strategies for performing data comparisons. memory; Hyde, 2011).

Number representation and comparison Comparing number sets Numbers are represented as both a verbal category (e.g. There is evidence that when comparing number sets, “twelve”, an exact value) and an activation function on an adults create and compare approximate summaries of the approximate number system (Dehaene, 2009; Feigenson,

2651 statistical properties of these sets, which include information differences between data sets, and then linked these about means and variances (Morris & Masnick, in press; strategies to specific behavioral predictions. Table 1 Masnick & Morris, 2008; Obrecht, Chapman, & Suarez, summarizes the set of possible strategies and predicted 2010). For example, adults were asked to compare sets of 3- behavior, given each strategy, described in more detail digit numbers that varied in set size (4 or 8), ratio of means below. We suggested an additional strategy in which (2:3, 4:5, or 9:10), and coefficient of variation (.10 or .20 of participants sought the highest or lowest value. However, no mean). The results suggested that summaries were subject used this strategy and it was eliminated from our compared much like single values in that accuracy and coding. confidence were higher and visual fixations shorter for comparisons between high contrast sets (e.g., high mean Data Comparison Strategies ratio, low variance) than low contrast sets (e.g., low mean One possible approach for comparing sets of numbers is ratio, high variance; Morris & Masnick, in press). Children to attend to only a subset of the number set. We suggest two as young as 8 show some perception of variance in data sets, possible types of subset strategies. The use of any of these reducing their confidence in conclusions with more varied strategies reduces information about set properties. data, though they do not change their confidence as much as Strategy 1: Pairwise comparisons. One possible subset adults (Masnick & Klahr, 2003; Masnick & Morris, 2008). strategy is to make paired comparisons, comparing One model suggests that summaries emerge from multiple individual values within one set to the individual values activation areas in the approximate number system (Morris within the other set. This strategy would involve direct & Masnick, in press). More specifically, approximate means comparisons within each pair of data, such as noting which emerge from the overlap between two or more values and golfer had the longer drive on each trial. After comparing approximate variance could emerge from spread of values each pair of drives, the Golfer with the most “wins” (e.g., 5 within the set. Sets with large mean ratios and low variance of 6 times one golfer had a longer drive) would be would constitute high contrast sets, in that they would create determined to have hit the ball farther. From a processing summary activation areas that would be far apart on a standpoint, such a strategy would require a series of number line (low contrast sets would be close together). comparisons that determine the larger value and a simple In order to create the aforementioned summary tally of the results. Even if used on all pairs in a data set, representations, a reasoner must use a strategy that encodes such comparisons would ignore potentially relevant information necessary to derive summary values. For information about the statistical properties of the set that example, given two columns of three-digit numbers, when may influence the overall evaluation (e.g., mean ratios, an adult attends to the hundreds place value of both variance). columns, she can quickly detect differences given high Strategy 2. Subset comparisons. A second strategy is contrast sets (e.g., left has mostly 4s and 5s in the hundreds comparing a sample of values, such as comparing only the column, while the right has mostly 7s and 8s, therefore the first two drives in a set and ignoring the remaining numbers. right column is larger). However, if a second reasoner only Using this strategy, comparisons for sets would operate attends to the first three-digit numbers in each set, then he exactly as predicted in the comparison between two will not encode the properties of the set, which will numbers. From a processing standpoint, such a strategy influence the types of possible comparisons. Adults are would not require any calculation. consistent in attending to the numeric properties of an entire Strategy 3: Calculation. Another possible strategy for data set (Masnick & Morris, 2008; Morris & Masnick, in comparing sets is for reasoners to calculate values explicitly. press). For example, one might compute a mean value for each Children are often less consistent than adults, column and then compare these means. One might also add demonstrating high variability in strategy use (Siegler, the values and compare the sums (which would lead to the 2007). Although more variable, children often demonstrate same conclusion as computing means). highly adaptive strategy use in that the selection of Strategy 4: Gist. Another strategy for comparing sets of strategies is related to processing goals (Siegler, 1996). numbers is to summarize the statistical properties of the When solving complex addition problems, children discover numbers, including unique characteristics of sets new strategies without conscious awareness and modify unavailable in individual number comparisons, such as their strategies to improve solution accuracy without explicit means and variances. This strategy would lead to a quick feedback (Siegler & Stern, 1998). Thus, we investigated decision, but one that was based on a rough estimation of naïve data comparison strategies of elementary school mean and spread of each data set. Such summaries likely children. emerge from scanning a set of numbers because the In the current study, we asked children to compare data activation of multiple approximate magnitudes for each presented as columns of three-digit numbers (framed as the member of the set results in a summary activation area in distances two golfers drove a golf ball, when doing so regards to a number line. Sets with large mean ratios and repeatedly), and to choose which golfer hit the ball farther. low variance would constitute high contrast sets, in that they We examined potential strategies for assessing such would create summary activation areas that would be far

2652 Table 1. Data comparison strategies, descriptions, and hypothesized response patterns Strategy Accuracy Confidence Eye fixation pattern Pairwise High with high No change with Sequential pattern of fixations comparing pairs within each row Comparison contrast sets. sample size. (e.g., fixations targeted between first two numbers in sets) before moving on to next set until all pairs have been compared.

Subset High with high No change with Fixations directed to only a subset of pairs (e.g. first two, last Comparison contrast sets sample size two). No fixations on more than half of all numbers in the sets.

Calculation High; No change Decreases as sample Long fixations on place values or entire numbers in each set with set size increases properties. Gist Increases as set Increases with high Rapid scanning by place values of numbers; often associated contrast contrast sets, smaller with decomposition of numbers (e.g., focus on hundreds place increases. sample sizes. value only) Gist + Increases as set Increases with high Initial scan strategy with additional information pickup. For contrast contrast sets, smaller example, subjects may iteratively scan different place values increases. sample sizes. (e.g., tens) or whole number comparisons. apart on a number line. If the summary representations are Method compared like single number representations, then a Participants distance effect for sets would be expected, and high contrast Participants were 41 8-12 year-old children (M = 10.76, sets would be associated with faster, more accurate SD = .46) who returned a signed parental consent form and comparisons and low contrast sets with slower, less accurate provided verbal assent to participate. One participant was comparisons. not included in the analysis because of problems with the Strategy 5: Gist+. This strategy begins as the Gist strategy eye-tracking recording. described above. However, instead of terminating the search and producing a response, subjects continue to search for additional information, update information in working Procedure memory, or process information. One key difference from Participants saw 36 data set pairs with the following the Gist strategy is that, without detecting a clear difference properties: (a) set size 4, 6, or 8, (b) low (9:10) or high (4:5) between the two sets, subjects may simply continue to scan ratio of means, and (c) either low or high variance. Numbers for more information. In some cases, this scanning might be were presented in 42-point Times New Roman font and deliberate. For example, subjects may scan, then conduct each column of numbers was centered within two columns pairwise comparisons between whole numbers in the sets or in a table. Within each number, an extra space was placed may search for the highest/lowest value before responding. between the hundreds, tens, and ones values and 1.5 spacing In other cases, the scanning may be incidental. For example, was used between numbers in each column. additional scanning might not be directed towards specific Participants were asked to determine which golfer (left or elements (e.g., iterative scans of tens columns), but might be right) hit the ball farther (accuracy) and rate their confidence simply looking at various elements until they reach either in this evaluation on a 4-point scale (1 = Not at all sure, 4 = some criterion or simply terminate the search and produce a Totally sure). On each trial, participants first saw a fixation guess. slide in which a + was placed in the center of the screen for 1 second. After one second, participants saw a data slide Current paper. We investigated how children compare consisting of the two sets of data (positioned on the left or number sets by presenting participants with sets of data (i.e., right side of the screen) and were given unlimited time to numbers in context) that varied systematically in the mean view the datasets. Participants were then asked to determine ratio, relative variance, and number of observations. We which golfer, on average, hit the ball farther and how asked participants to compare these sets, and we measured confident they were in this difference (using the scale eye fixations, accuracy, and confidence in comparisons to positioned in front of them). Data sets were presented investigate set comparisons when the mean, variance and sequentially in blocks by sample size (4, 6, or 8 number of observations were systematically manipulated. observations) because pilot testing demonstrated that the From these data, we classified their comparison strategies, change in stimuli between presentations (e.g., presenting a evaluated the accuracy of strategies, and investigated set of four, then a set of eight) resulted in a higher number whether children’s strategy discovery and selection was of fixations compared to blocked presentations. In this way, adaptive. blocked presentation controls for errant fixations and

2653 focuses attention on features associated with the stimuli, not difference between sets decreased F(2, 39) = 14, p = .001, with the mode of presentation. η2 = .45; more fixations in the hundreds column as variance Data sets were presented on a Tobii® T-60XL eye increased F(2, 39) = 8.8, p = .01, η2 = .39; and significantly tracker. Participants were given a 9-point calibration before more fixations in the hundreds column for low contrast beginning the task. Areas of Interest (AOIs) were recorded compared to high contrast sets F(3, 39) = 12, p = .001, η2 = around the hundreds, tens, and ones columns and around .36. An additional ANOVA was run with fixation duration whole numbers. Inter-rater reliability was 92% before as the dependent measure, and it yielded a similar pattern of discussion, and all disagreements were resolved with the results. first author. Results Low Variability High Variability Aggregated Results Accuracy and Confidence. A 3-way repeated measures 20 ANOVA was run in which mean ratio (4:5; 9:10), 15 Switches coefficient of variation (low, high) and sample size (4, 6, 8) were independent variables, and accuracy (out of 3) was 10 the dependent measure. For accuracy, there was a main effect for mean ratio, with lower accuracy for sets with 5 9:10 ratio (M = 2.33, SD = .4), and higher accuracy for 4:5 0 sets (M = Figure 1). There were no significant differences Column Mean for confidence ratings. 4:5 9:10 Ratio of Mean Difference

Low Variability High Variability Figure 2. Mean number of column switches by condition. Note. Error bars represent SE of Mean. 3

Column switches. Another measure of comparison was the 2 number of times people moved their eyes from one side to the other. In reading comprehension research, regressions 1 are measured as fixations in which subjects re-read text, and are an accurate index of text complexity (Rayner &

Mean Accuracy Mean Pollatsek, 1989). Here, we considered looking at a column 0 of data repeatedly in a similar manner. Two raters counted 4:5 9:10 the number of times subjects switched fixations between Ratio of Mean Difference columns. The two coders agreed 95% of the time and discrepancies were resolved through discussions with the Figure 1. Mean accuracy by condition. first author. We then examined the effect of data Note. Error bars represent SE of Mean. characteristics on the frequency of fixation switches. A mean ratio (4:5, 9:10) x coefficient of variation (low or Eye Fixation Counts. Eye fixation counts were used to high) x sample size (4, 6, 8) repeated measures ANOVA explore the pattern of where people look for information in was conducted with the number of fixation switches as the this task. There were main effects for all variables in our 4- dependent measure. Overall, there were more column way ANOVA in which mean ratio (4:5, 9:10), coefficient of switches given low mean ratios (F (1, 39) = 53.3, p < .01, variation (low, high), sample size (4, 6, 8), and column partial η2 = .44), and high variance (F (1, 39) = 36, p < .01, (hundreds, tens, ones) were all repeated measures partial η2 = .36). There was also an interaction in which independent variables, and number of fixations with the switches were more frequent with low contrast sets (F (2, Areas of Interest defined by each of the six columns 39) = 16.6, p < .01, partial η2 = .28; See Figure 2). (hundreds, tens, ones on the left and right sides of the screen) was the dependent measure. A repeated measure Strategy analysis ANOVA demonstrated two main effects: a significant In order to investigate data comparison strategies, we overall increase in the number of fixations as set size examined the patterns of eye motions to compare them to increased, F(2, 39) = 6.6, p = .01, η2 = .32, and significantly the proposed patterns associated with strategies outlined in more fixations in the hundreds column than in the tens or Table 1. Two raters coded the eye movement patterns. They one column, F(2, 39) = 21.8, p = .001, η2 = .61. There were had 87% agreement, and resolved discrepancies after more fixations in the hundreds column as the mean discussion with the first author.

2654 Analysis showed substantial variability across the set of trials. Ninety-five percent of participants used at least 3 Subset Gist Gist+ strategies during the experiment. The pairwise comparison and calculation strategies were excluded from further 3 analyses because one participant used them each once. A 2.5 McNemar Chi-square was used to compare strategy use between high and low contrast sets only. The use of the 2 Gist+ strategy was significantly higher for low contrast sets 1.5 while the use of the Gist strategy was significantly higher 1 for high contrast sets (40, df = 6, p = .002; See Figure 3). Mean Accuracy Mean 0.5 Subset Gist Gist+ 0 High Contrast Low Contrast 20 Figure 4. Strategy accuracy by condition. Note. Error bars

15 represent SE of Mean

10 Discussion

5 Our results provide insights into how children make sense of data. Recall that children were asked to compare a series Strategyfrequency 0 of data sets that differed in their statistical properties (e.g., High Contrast Low Contrast mean ratio, variance). Adults are quite adept at such comparisons (Morris & Masnick, in press); however, adults likely have greater knowledge of number properties and Figure 3. Strategy frequency by condition. Note. Error bars mathematical operations as well as greater experience with represent SE of Mean such comparisons than children. Our results provide four main findings. One, children A mean ratio (4:5, 9:10) x coefficient of variation low or were highly accurate at comparing data sets even when high) x sample size (4, 6, 8) repeated measures ANOVA those sets were quite similar (e.g., 9:10 ratio of mean was conducted comparing accuracy by strategy codes. The difference with high variability). Most children attended to results indicate that there was no difference in strategy highly relevant information when making comparisons. For accuracy for high contrast sets (F (1, 39) = 1.2, p > .10). example, children were more likely to attend to the For low contrast sets, there was a significant relationship information in the hundreds place value when making between strategy and accuracy. Specifically, the comparison comparisons. strategy was associated with significantly lower accuracy Two, children used a variety of strategies to make these than the Gist or Gist+ strategies (F (1, 39) = 24.5, p < .01, comparisons. For example, nearly all children used at least partial η2 = .23) and the Gist+ strategy was associated with three different strategies during the experiment. Three, lower accuracy than the Gist strategy (F (2, 39) = 13.9, p < changes in strategy use were associated with changes in the .01, partial η2 = .15; See Figure 4). Unlike accuracy, statistical properties of the data sets. For example, the use of confidence ratings did not significantly differ by strategy or a Gist strategy was more frequent with high contrast than sample size. The subset strategy was used more frequently with low contrast sets and the use of the Gist+ strategy was for high contrast sets than for low contrast sets. The subset more frequent with low contrast set than high contrast sets. strategy was as accurate for high contrast sets as Gist and This change may indicate a purposeful search for additional Gist+ but was significantly less accurate for low contrast information (e.g., tens column), re-activation of information sets F (2, 39) = 21, p < .001, partial η2 = .30. in working memory that was previously scanned (e.g., fixating on hundreds column for a second time), or simply might indicate additional looking time associated with a lack of certainty regarding any conclusion. Finally, strategies were associated with accuracy. For high contrast sets, there was no difference in accuracy; however, strategy use was associated with accuracy for low contrast sets. Specifically, the use of the subset strategy, in which children attended to the first two numbers in the set, was significantly less accurate than strategies in which

2655 children attended to the entire set. For high contrast sets it is Korvorst, M., & Damian, M. F. (2008). The differential highly probable that comparing any two individual values in influence of decades and units on multidigit number the set would be diagnostic of the overall differences comparison. The Quarterly Journal of Experimental between sets. For low contrast sets, this strategy was Psychology, 61(8), 1250-1264. significantly less accurate because the comparison of two doi:10.1080/17470210701503286 values from the set are much less likely to be diagnostic of Masnick, A. M., & Klahr, D. (2003). Error Matters: An set differences than in high contrast sets. The use of Gist initial exploration of elementary school children's and Gist+ strategies would provide information about the understanding of experimental error. Journal of Cognition entire set, which would provide information about the and Development, 4(1), 67-98. statistical properties of the set. Interestingly, for low Masnick, A. M., & Morris, B. J. (2008). Investigating the contrast sets the additional information gained from the development of data evaluation: The role of data Gist+ strategy did not provide a significant increase in characteristics. Child Development, 79(4), 1032-1048. accuracy above the Gist strategy. It is possible that children doi:10.1111/j.1467-8624.2008.01174.x either focused on irrelevant information in these additional Moeller, K., Fischer, M. H., Nuerk, H., & Willmes, K. fixations or children exceeded their working memory (2009). Sequential or parallel decomposed processing of capacity with the additional information. two-digit numbers? Evidence from eye-tracking. The In sum, many 8-12-year-old children displayed an Quarterly Journal of Experimental Psychology, 62(2), intuitive understanding of the statistical properties of 323-334. doi:10.1080/17470210801946740 number sets, using them to draw conclusions and evaluate Morris, B. J., & Masnick, A. M. (in press). Comparing data strategy application. This nascent understanding sets: Implicit summaries of the statistical properties of demonstrates that the Approximate Number System may number sets. Cognitive Science: A Multidisciplinary support rapid, implicit comparisons between number sets Journal. without the need for deliberate calculation. These findings Mou, Y., & vanMarle, K. (in press). Two core systems of may be able to inform formal scientific and statistical numerical representations in infants. Developmental training by highlighting critical elements in creating Review. effective problem representations and developing effective Nuerk, H., Weger, U., & Willmes, K. (2001). Decade breaks instructional tools. in the mental number line? Putting the tens and units back in different bins. Cognition, 82(1), B25-B33. References doi:10.1016/S0010-0277(01)00142-1 Brannon, E. M. (2003). Number knows no bounds. Trends Obrecht, N. A., Chapman, G. B., & Suárez, M. T. (2010). in Cognitive Sciences, 7(7), doi:10.1016/S1364- Laypeople do use sample variance: The effect of 6613(03)00137-2 embedding data in a variance-implying story. Thinking & Buckley, P. B., & Gillman, C. B. (1974). Comparisons of Reasoning, 16(1), 26-44. digits and dot patterns. Journal of Experimental doi:10.1080/13546780903416775 Psychology, 103(6), 1131-1136. doi:10.1037/h0037361 Opfer, J. E., & Siegler, R. S. (2012). Development of Dehaene, S. (2009). Origins of Mathematical Intuitions: The quantitative thinking. In K. Holyoak and R. Morrison Case of Arithmetic. The Year in Cognitive Neuroscience (Eds.), Oxford Handbook of Thinking and Reasoning. 2009: Annual New York Academy of Science, 1156, 232– Rayner, K., & Pollatsek, A. (1989). The psychology of 259. doi: 10.1111/j.1749-6632.2009.04469 reading. Englewood Cliffs, NJ US: Prentice-Hall, Inc. Dehaene, S. (2001). Précis of the number sense. Mind & Siegler, R. S. (2007). Cognitive variability. Developmental , 16, 16-36. Science, 10(1), 104-109. Feigenson, L., Dehaene, S., & Spelke, E. (2004). Core Siegler, R. S. (1996). Emerging minds: The process of systems of number. Trends in Cognitive Sciences, 8(7), change in children’s thinking. New York: Oxford 307-314. doi:10.1016/j.tics.2004.05.002 University Press. Gilmore, C. K., McCarthy, S. E., & Spelke, E. S. (2007). Siegler, R. S., & Stern, E. (1998). Conscious and Symbolic arithmetic knowledge without instruction. unconscious strategy discoveries: a microgenetic Nature, 447(7144), 589-591. doi:10.1038/nature05850 analysis. Journal of Experimental Psychology: Halberda, J., & Feigenson, L. (2008). Developmental General, 127(4), 377. change in the acuity of the “Number Sense": The Sprute, L., & Temple, E. (2011). Representations of Approximate Number System in 3-, 4-, 5-, and 6-year- fractions: Evidence for accessing the whole magnitude in olds and adults. , 44(5), 1457. adults. Mind, Brain, and Education, 5(1), 42-47. Hyde, D. C. (2011). Two systems of non-symbolic doi:10.1111/j.1751-228X.2011.01109.x numerical cognition. Frontiers in neuroscience, 5, Xu, F., & Spelke, E. S. (2000). Large number discrimination 150, 1-8. in 6-month-old infants. Cognition, 74(1), B1-B11.

2656