SELF-ADMINISTERED TEST OF SOCIAL INTELLIGENCE 1
The Social Shapes Test as a Self-Administered, Online Measure of Social Intelligence:
Three Studies with Typically Developing Adults and Adults with Autism Spectrum
Disorder
Matt I. Brown1, Christopher F. Chabris1, & Patrick R. Heck1
1Geisinger Health System, Lewisburg, PA
Running Head: SELF-ADMINISTERED TEST OF SOCIAL INTELLIGENCE
Version 1 submitted for peer review at the Journal of Autism and Developmental Disorders, 29 July 2021. This paper has not been peer reviewed. Please do not copy or cite without the authors' permission.
*Address correspondence to: Matt I. Brown Geisinger Autism and Developmental Medicine Institute 120 Hamm Drive, Suite 2A, MC 60-36 Lewisburg, PA 17837 [email protected]
Acknowledgments: We thank Lauren Walsh for her research assistance and Antoinette DiCriscio, Cora Taylor, and Vanessa Troiani for their helpful feedback on our manuscript. We also thank Jamie Toroney, Kiely Law, and the SPARK Consortium for their assistance in recruiting participants for this study. This work was supported by funds from the National Institutes of Health (grant U01MH119705).
Abstract
The Social Shapes Test (SST) was designed to measure social intelligence, theory of mind, or mentalizing without using human faces or relying on extensive verbal ability, and it is completely self-administered online. Despite promising validity evidence, no studies have examined whether this task is suitable for adults with autism spectrum disorder (ASD). Across three studies, we find that the SST is as reliable for adults with ASD as for typically developing adults. We detect a modest difference in SST performance attributable to ASD and no evidence of differential item functioning. We also provide a ten-item version for use when testing time is limited. The SST is a suitable measure of social intelligence, especially for studies requiring remote, online assessment.
Keywords: autism spectrum disorder, social cognition, social intelligence, theory of mind, online testing
Social and emotional skills are important for adaptive functioning in everyday life (Soto et al., 2020). These abilities have been categorized or labeled using a variety of terms, including social intelligence (SI), perspective-taking, mentalizing, and Theory of Mind (Olderbak &
Wilhelm, 2020). Clinical researchers have developed an array of psychological assessments to measure these abilities (e.g., Abrahams et al., 2019) and to investigate the phenotypes of various psychological conditions, including autism spectrum disorder (ASD; Morrison et al., 2019) and schizophrenia (Pinkham et al., 2017a). Cognitive neuroscience researchers have also used these tools to identify areas of the brain associated with social functioning (Schaafsma et al., 2015).
Research on the development and implementation of social-cognitive measures can therefore contribute to our understanding of these etiologies (Casaletto & Heaton, 2017). However, most existing tools focus on recognizing emotion in human facial expressions or on interpreting written descriptions of social interactions.
Due in part to the COVID-19 pandemic, many clinical and research practices have needed to shift from administering in-person assessments to implementing measures that can be administered remotely online or during telemedicine visits (Türközer & Öngür, 2020). Online, non-proctored assessment had already become the predominant mode in employment testing before the pandemic (e.g., Tippins, 2015). In contrast, many of the assessments used in clinical or developmental research are designed for in-person administration. Although recent work has helped establish new, web-based cognitive ability assessments (e.g., Biagianti et al.,
2019; Wright, 2020), few have focused on designing remote measures of SI or Theory of Mind.
This is important given that these forms of social cognition are generally affected by ASD and similar developmental disorders (Velikonja et al., 2019). Presently, many of the SI measures that can be administered online rely on self- or observer-reports, such as the Social
Responsiveness Scale (SRS; Constantino et al., 2003), the Autism Quotient (AQ; Baron-Cohen et al., 2001b), or the Broad Autism Phenotype Questionnaire (BAPQ; Hurley et al., 2007). In contrast, many of the validated, objective SI tests are designed to be completed in person and administered by a proctor or trained clinician. This makes them ill-suited for remote administration and presents a challenge given the growing need for remote, web-based assessments. To this end, the Social Shapes Test (SST; Brown et al., 2019) was developed as a simple, self-administered SI test using materials distinct from those of many other SI measures, with promising evidence for convergent validity. To date, however, the SST has only been used in samples of typically developing adults. Therefore, we conducted the present study to examine whether the SST can provide reliable yet remote measurement of SI for adults with or without an ASD diagnosis.
Measuring SI using Animated Shape Tasks
Heider and Simmel (1944) famously observed that research participants often described the movements of animated, geometric shapes in human psychological terms. This pioneering work inspired research by Klin (2000), who found that individuals with autism or Asperger's disorder made fewer social attributions to the Heider and Simmel video compared to members of an unaffected control group. At the same time, Abell and colleagues (2000) reported similar results using animations based on the Heider and Simmel video. Subsequent research inspired by these findings has largely observed group differences due to ASD among children and adults in performing different types of animated shape tasks (Burger-Caplan et al., 2016). Moreover, researchers have also used shape tasks to quantify differences in SI or social attribution due to schizophrenia (Bell et al., 2010; Johannesen et al., 2018; Koelkebeck et al., 2010) and to identify neurophysiological antecedents, including differences in eye movement (Zwickel et al., 2010) and regions of brain activation (Weisberg et al., 2012).
One benefit of these animated shape tasks is that they rely less on reading skill or verbal knowledge and comprehension compared to other measures of SI. More verbally-loaded tasks may allow individuals to use their verbal skills to compensate for low SI (Livingston & Happé,
2017), or may be performed poorly not because of low SI but because of low verbal ability. This is important given that one of the few SI tests that can be freely administered online, the
Reading the Mind in the Eyes test (RMET; Baron-Cohen et al., 2001a), has been criticized for its reliance on verbal content (Peterson & Miller, 2012). For example, the RMET presents emotion words that are not all commonly used in everyday language, implying that one’s own vocabulary skill may affect RMET performance (Kittel et al., 2021). In addition, tasks involving human faces tend to display score differences due to race or ethnicity among clinical and nonclinical populations (Dodell-Feder et al., 2019; Pinkham et al., 2017b). In contrast, animated shape tasks have displayed few if any group differences in past research (Brown et al., 2019, Brown et al.,
2021; Lee et al., 2018).
Despite these advantages, many animated shape tasks are not well-suited for remote testing. Nearly all existing tasks require proctoring from a clinician or an administrator. For example, both tasks used by Klin (2000) and Abell and colleagues (2000) require an administrator to ask questions and to record and score verbal responses from participants. Not only could this introduce confounding effects of verbal ability or rater bias, but it also increases administration time and financial costs (Livingston et al., 2019). Some tasks use a combination of free response and multiple choice questions (e.g., the Frith-Happé animations; White et al.,
2011) or only a fixed set of multiple choice questions (e.g., Social Attribution Test – Multiple
Choice or SAT-MC; Bell et al., 2010). However, both examples require an administrator to be present to operate the specific video segments for each question, which prevents participants from completing these tasks remotely. These barriers will likely discourage researchers from using animated shape tasks as research studies increasingly shift from in-person to online administration.
The Social Shapes Test (SST)
The SST is a 23-item multiple choice social intelligence test. Each SST item consists of a short, 13–23 second animated video which includes a standard set of colored, geometric shapes.
These animations were designed to represent a broader range of emotional and mental states compared to the original Heider and Simmel video and have been found to elicit a similar degree of social attributions in narrative descriptions compared to those reported by Klin (Ratajska et al., 2020). Each video features a different social plot where the shapes display a variety of behaviors including bullying, helping, comforting, deception, and playing. All videos are controlled by the participant and can be viewed as many times as desired. Before starting the
SST, all participants are given a sample item followed by feedback indicating the correct response (Figure 1). All 23 items were administered in the same order for all participants.
Unlike other SI tasks, the SST was explicitly designed to be completely self-administered online, as was done in our initial validation studies (Brown et al., 2019). All questions are scored using an objective scoring key, which prevents the potential rater bias that can arise when scoring the open-ended responses used in other animated shape tasks (White et al., 2011). Lastly, all SST questions and video files are freely available for research use and can be accessed via the Open
Science Framework. Researchers are free to use the SST videos to administer the test as part of an online survey or to adapt the videos in order to suit their own individual studies. This makes the SST more easily accessible for researchers, especially compared to other video-based social intelligence tests owned or distributed by commercial test publishers (e.g., The Awareness of
Social Inference Test – TASIT; McDonald, 2012).
Despite these advantages, it is uncertain whether the SST is sensitive to differences in social intelligence among adults with ASD or other developmental disorders. SST scores have been found to correlate positively with existing measures of emotion perception and emotion understanding, even after controlling for differences in verbal and non-verbal ability. However, this research has only been conducted with samples of typically functioning adults such as undergraduate college students (Brown et al., 2021) or crowdsourced workers from Amazon
Mechanical Turk (Brown et al., 2019). As with the SST, there is promising evidence for convergent validity between SAT-MC scores and other measures of SI and standardized assessments of social functioning (Altschuler et al., 2018; Johannesen et al., 2018). Although no study has administered both tests together, both the SST and SAT-MC have been found to correlate strongly with the RMET (Brown et al., 2019; Lee et al., 2018). Moreover, research suggests that the SAT-MC is sensitive to differences in SI among adults and children based on diagnoses of ASD or schizophrenia (Burger-Caplan et al., 2016; Pinkham et al., 2017a).
Present Study
We designed the present study to investigate whether the SST can be self-administered remotely to measure social intelligence among typically functioning adults and adults with
ASD. Based on similarities in test content and past convergent validity evidence, we expect that the SST and the SAT-MC each measure the same underlying construct of SI. Therefore, we expect that adults with ASD should score lower on the SST compared to unaffected adults (UA).
In line with past research, we investigate whether the SST is sensitive to differences in SI due to diagnoses of ASD or Asperger's disorder (AS; e.g., Morrison et al., 2019). To test this hypothesis, we conduct two studies in which we recruit adults with an ASD diagnosis to complete the SST using a self-administered, online survey. In each study we report these results in comparison to test scores from samples of typically developing adults. We also examine whether SST items function similarly for adults with or without an ASD diagnosis and conduct a third study to determine whether an abbreviated form of the SST has similar psychometric properties to the full version.
Study 1
Methods
Participants
We recruited a total of 563 participants for Study 1 from the Simons Foundation
Powering Autism Research for Knowledge (SPARK; SPARK Consortium, 2018) cohort, which consists of individuals and families who have agreed to participate in autism research. A broader description of adults in the SPARK cohort was recently reported by Fombonne et al. (2020). All participants were given a $10 Amazon gift card for completing the study. Participants included independent adults who self-reported a diagnosis of autism spectrum disorder (ASD), autistic disorder, or Asperger's disorder (n = 261) as well as unaffected adults (UA) who are parents of one or more children with an ASD or AS diagnosis but have never been given a diagnosis themselves (n = 302). Although diagnosis history was collected using self- or parent-reports, rather than direct clinical evaluation, past research has found that this method yields reliable accounts of autism diagnoses in other research registries (Daniels et al., 2012). Although the UA may also have a genetic predisposition to ASD, unaffected, first-degree relatives have been observed to perform better on social cognition tasks relative to individuals with clinical diagnoses in prior research (e.g., Lavoie et al., 2013). Therefore, we used the UA sample as a control group.
Prior to data analysis, we removed participants who had a median response time of less than 15 seconds per SST item. This response time threshold was chosen to increase the likelihood that participants watched the entire video for each item and to remove potential cases of non-purposeful responding (21 of 23 SST videos were 15 seconds or longer). We observed that a greater proportion of UA participants were removed based on our response time threshold
(21%) compared to those with either an ASD or AS diagnosis (3%). We also removed participants who failed to respond correctly to all four attention-check items. In each such item, participants were asked to watch a different shape animation and to identify which of four shapes did not appear in the video. The attention-check items depend only on basic cognitive processes
(e.g. vision, attention, memory) and should not require social intelligence to solve. More than
90% of participants in each of the diagnosis conditions were able to correctly identify the missing shape in all four items. We also removed 61 cases where participants reported receiving an ASD or AS diagnosis after the age of 30. This left us with a total sample of n = 367 participants (ASD n = 68; AS n = 79; UA n = 220).
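The two screening rules above (a minimum median response time and passing all attention checks) can be sketched as a simple filter. The following Python snippet is illustrative only; the function name, record format, and example values are hypothetical and are not drawn from the study data.

```python
# Illustrative sketch of the two exclusion rules described above. The
# participant records and field values below are hypothetical, not study data.
from statistics import median

MIN_MEDIAN_RT = 15.0   # seconds per SST item; 21 of 23 videos run 15 s or longer
N_ATTENTION_ITEMS = 4  # all four attention-check items must be answered correctly

def keep_participant(item_rts, n_attention_correct):
    """Return True if a participant passes both screening rules."""
    too_fast = median(item_rts) < MIN_MEDIAN_RT
    failed_checks = n_attention_correct < N_ATTENTION_ITEMS
    return not (too_fast or failed_checks)

# A careful responder passes; a rushed responder is excluded.
print(keep_participant([18.2, 22.5, 16.0, 19.4], 4))  # → True
print(keep_participant([6.1, 4.9, 7.3, 5.5], 4))      # → False
```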
A full summary of the key demographic variables for each group is reported in Table 1.
Participants in the UA group were older on average compared to the ASD and AS groups,
F(2,364) = 67.73, p < .001. We also observed significant differences in educational attainment among the three groups, F(2,364) = 10.18, p < .001. There were no significant group differences in sex or race/ethnicity.
Procedure
All study materials were presented in an online survey. Each participant was emailed a unique link to the study. Participants were given a brief set of instructions and a practice item before beginning the SST. Afterwards, participants completed several demographic items regarding their geographical location, educational attainment, self-identified race/ethnicity, and approximate annual income. Participant sex and age were provided by the SPARK consortium.
SST scores were calculated as the simple sum of correct responses across the 23 items. We observed adequate internal consistency for the SST within the ASD (α = .78), AS (α = .71), and
UA groups (α = .73). We also observed similar corrected item-total correlations (CITC) for SST items within each of the three groups (CITC: ASD = .33; AS = .26; UA = .29), which indicates that individual items discriminate between high and low ability equally well across groups.
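For readers less familiar with these two reliability statistics, the following Python sketch computes Cronbach's alpha and a corrected item-total correlation on a toy 0/1 response matrix. The matrix and all values are invented for illustration and are not SST responses.

```python
# Toy computation of Cronbach's alpha and corrected item-total correlations
# (CITC) on a small 0/1 response matrix. Rows are participants, columns are
# items; the matrix is invented for illustration only.
from statistics import mean, pvariance

def cronbach_alpha(X):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(X[0])
    item_vars = [pvariance([row[j] for row in X]) for j in range(k)]
    total_var = pvariance([sum(row) for row in X])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def corrected_item_total(X, j):
    """Correlation of item j with the total score computed without item j."""
    item = [row[j] for row in X]
    rest = [sum(row) - row[j] for row in X]
    mi, mr = mean(item), mean(rest)
    cov = mean((a - mi) * (b - mr) for a, b in zip(item, rest))
    return cov / (pvariance(item) ** 0.5 * pvariance(rest) ** 0.5)

X = [[1, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1],
     [0, 0, 0, 0]]
print(round(cronbach_alpha(X), 2))           # → 0.79
print(round(corrected_item_total(X, 0), 2))  # → 0.88
```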
Statistical Analysis
All analyses were performed using R version 3.6.3. We report the standardized mean difference (Cohen’s d) when interpreting differences in SST scores based on diagnosis group.
We also use multiple linear regression in order to statistically control for demographic differences between the three diagnosis groups.
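As a point of reference, Cohen's d is the difference between two group means divided by the pooled standard deviation. A minimal Python sketch follows; the two score lists are invented and do not correspond to any group in this study.

```python
# Minimal sketch of Cohen's d: mean difference over pooled standard deviation.
# The two score lists are invented and do not correspond to any study group.
from statistics import mean, variance

def cohens_d(g1, g2):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / pooled_var ** 0.5

group_a = [12, 14, 11, 13, 15]  # hypothetical lower-scoring group
group_b = [15, 17, 14, 16, 18]  # hypothetical higher-scoring group
print(round(cohens_d(group_a, group_b), 2))  # negative: group_a scores lower
```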
Results
To test for SST score differences between participants with an ASD and AS diagnosis compared to no diagnosis (UA), we first regressed SST scores on dummy-coded diagnosis variables. We report the descriptive statistics for all three groups in Table 2. Participants with
ASD scored significantly lower on the SST relative to UA (β = –.15, p = .001, d = –0.44, 95% CI
= [–0.66, –0.11]). We failed to detect any difference in SST scores between AS and UA (β = –
.05, p = .93, d = –0.13, 95% CI = [–0.27, 0.25]). Next, we tested for differences between participants who reported an ASD or AS diagnosis. We found a significant difference in SST scores between ASD and AS groups (d = –0.36, 95% CI = [–0.69, –0.04], t(143) = 2.25, p =
.03). This result suggests that the SST was sensitive enough to detect differences in SI between adults reporting either an ASD or AS diagnosis.
We also sought to test for SST score differences after statistically controlling for differences in age, sex, educational attainment, and race/ethnicity between the three condition groups (Table 3). After controlling for these demographic differences, we found that participants with ASD still scored lower on the SST compared to the UA group (β = –.13, p = .005). No difference between AS and UA groups could be detected (β = –.03, p = .56). Among the demographic control variables, educational attainment was positively related to SST scores (β =
.28, p < .001) and participants who identified as White scored higher on the SST, compared to all others (β = .17, p < .001).
Discussion
The results of Study 1 suggest that the SST provides an adequately reliable measure of SI for adults who have previously been diagnosed with AS or ASD. We observed a similar degree of internal consistency and comparable item statistics across all three participant groups. We also detected a moderate difference in SI between unaffected adults and adults with ASD, even after controlling for demographic differences. Yet the mean score difference due to ASD diagnosis that we observed (d = –0.44) was slightly weaker than some of the estimates reported in past research for similar measures (e.g., SAT-MC; Pinkham et al., 2017a). This result may be due to the nature of our control group (unaffected parents of children with ASD) or indicative of greater SI among our participants (independent adults with ASD) compared to samples obtained in clinical settings. It could also be that measures reporting larger differences rely somewhat more on verbal ability than the SST does. Therefore, we conducted a second study to see if we could replicate these findings using an additional sample of adults with ASD recruited from a neurodevelopmental clinic.
Study 2
Methods
Participants
We recruited a second sample of 31 adults who had previously received a clinical diagnosis of ASD from a neurodevelopmental pediatrician. All participants in Study 2 had sought clinical services in the Northeastern U.S. and had consented to be contacted for ongoing research studies. Due to the smaller pool of eligible participants than in Study 1, participants were given a larger reward of a $35 Amazon gift card for completing this study. All participants were recruited online and completed the SST without a proctor or administrator. Before data analysis, six cases were removed for answering at least one attention-check item incorrectly, leaving a sample of n = 25 adults with an ASD diagnosis. All remaining participants exceeded the median response time threshold that we used in Study 1, and no other participants were excluded.
Participants ranged from 18 to 34 years of age (mean age = 20.8, SD = 3.9). Most participants identified as male (20/25; 80%) and as White, non-Hispanic (22/25; 88%). Based on assessment scores obtained from the electronic medical record, the average full-scale IQ score among the
ASD group was 86.1 (SD = 22.4). T-scores from the Social Responsiveness Scale (SRS;
Constantino et al., 2003) indicated an elevated level of autistic symptoms among most of the participants in ASD group as well (M = 74.2, SD = 12.4).
In order to address some of the limitations of our control group in Study 1, here we relied on data collected from typically-developing adult participants in two prior studies (Brown et al., in review; Brown et al., 2019). Unlike the control group in Study 1, these participants were not likely to share a potential genetic predisposition to ASD or to other developmental disorders. A total of 829 participants had been recruited from undergraduate psychology courses at a public university in the Midwestern U.S. and from Amazon's Mechanical Turk. Most participants in this control group identified as female (59%) and White, non-Hispanic (56%). Like the ASD group, all control group participants had completed the SST as part of a self-administered, online
Qualtrics survey. We observed several demographic differences between our ASD and UA groups (Table 4). Participants in the UA sample were significantly older (F(1,852) = 14.95, p <
.001), reported higher educational attainment (F(1,852) = 37.01, p < .001), and were more likely to identify as female, χ2(1) = 15.13, p < .001. We also observed a smaller proportion of participants who identified as White, non-Hispanic within the UA sample compared to the ASD sample (χ2(1) = 10.15, p = .001).
Procedure
All measures were administered using an online survey. Participants first completed the
23-item SST (with four interspersed attention-check items) and then responded to several demographic questions including self-reported race/ethnicity and educational attainment. We observed adequate internal consistency for the SST in both the ASD (α = .74) and UA groups (α
= .69) and the average corrected item-total correlation was similar between groups (CITC ASD =
.30, CITC UA = .24).
Statistical Analysis
All analyses were again performed using R version 3.6.3, and we used many of the same statistical tests as in Study 1 to examine differences in SST scores between ASD and UA groups. We also used several statistical packages for additional exploratory analyses which were unique to this study. We used the "ps" function from the twang R package to compute propensity scores (McCaffrey et al., 2004). These propensity scores were calculated in an attempt to estimate
SST score differences between participants with ASD and UA participants whose demographic makeup (e.g., age, sex, race/ethnicity) is most similar to that of our clinical sample. We also used the difR R package to estimate logistic models testing for differential item functioning
(DIF) between participants with or without an ASD diagnosis (Magis et al., 2010).
Results
We first used linear regression to test for unadjusted differences in SST scores between
ASD and UA groups (Table 5). Participants with an ASD diagnosis scored significantly lower on the SST compared to the UA group (β = –.07, p = .04; d = –0.42, 95% CI = [–0.82, –0.02]). We also observed practically no difference in SST scores between adults with ASD in Studies 1 and
2 (d = 0.07, 95% CI = [–0.38, 0.53], t(90) = 0.25, p = .80) or between UA groups in Studies 1 and 2 (d = 0.08, 95% CI = [–0.07, 0.23], t(1047) = 1.04, p = .30). Again, we found that participants with ASD scored lower on the SST even after controlling for demographic differences between groups (β = –.07, p < .04). Among the demographic variables, we did not find the effects of age or sex that we observed in Study 1 (Table 6). We failed to detect an effect for educational attainment, which was inconsistent with our findings in the prior study.
Given the fairly large demographic differences between the ASD group and our larger
UA group, we conducted exploratory analyses using propensity score weighting in an attempt to better estimate SST score differences that could be attributed to diagnosis group independent of demographic factors. Propensity scores are determined by regressing group status (e.g., treatment or control condition; here, ASD or UA group) onto a set of auxiliary variables in order to minimize any pre-existing differences between treatment and control groups in observational studies (Austin, 2011). This method has been widely used in medical research (e.g., Austin &
Stuart, 2015) among other research fields to statistically control for potential confounding variables in non-experimental studies. We estimated propensity scores based on participant age, gender, self-reported race/ethnicity, and educational attainment using generalized boosted regression as described by McCaffrey and colleagues (2004). Given the focus of our study, we sought to estimate the average treatment effect for the treated (ATT) which attempts to determine the effect of the independent variable (e.g., presence or absence of ASD diagnosis) by comparing the “treatment group” to a subset of control group subjects who are most like those in the diagnosis condition (Austin, 2011). In this approach, propensity scores are estimated to identify members of the UA group who otherwise most closely match members of the ASD group based on the demographic variables that were included in the model.
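Once propensity scores are in hand, the ATT weighting scheme described above reduces to giving each treated case a weight of 1 and each control a weight of p/(1 - p), so that controls resembling the treated group dominate the comparison. The study estimated propensity scores with generalized boosted regression (the ps function in the twang R package); the Python sketch below assumes the scores are already estimated and uses invented numbers purely to illustrate the weighting step.

```python
# Sketch of average-treatment-effect-on-the-treated (ATT) weighting, assuming
# propensity scores have already been estimated (the study used generalized
# boosted regression via twang's ps function in R). All numbers are invented.
def att_weight(p, treated):
    """ATT weights: treated cases keep weight 1; controls get p / (1 - p)."""
    return 1.0 if treated else p / (1.0 - p)

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical control-group SST scores paired with estimated propensities of
# belonging to the ASD group; high-propensity controls dominate the estimate.
controls = [(14, 0.10), (15, 0.05), (11, 0.60), (10, 0.70)]
weights = [att_weight(p, treated=False) for _, p in controls]
print(round(weighted_mean([s for s, _ in controls], weights), 2))  # → 10.55
```

Note how the weighted control mean is pulled toward the scores of the two controls most similar to the treated group, which is exactly the matching behavior the ATT estimand is meant to produce.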
After weighting via this procedure, we found that participants with ASD scored lower on the SST relative to matched controls, but this did not reach statistical significance, d = –0.27, t =
1.05, p = .30. However, when reviewing the descriptive statistics after weighting, we found that our weighted sample size for the control condition was rather small (only n = 27, or 3% of the full UA sample). Among participants with an ASD diagnosis, 72% reported achieving a high school degree or some high school as their highest level of education so far. In contrast, only
12% of participants in the UA group reported achieving only a high school degree or some high school. Due to these disparities between groups, as well as meta-analytic evidence for the positive relationship between SI and academic performance (e.g., MacCann et al., 2020;
Sánchez-Álvarez et al., 2020), we also estimated propensity weights without including educational attainment in the model. This nearly doubled the sample size of the weighted control group (n = 51) while also maintaining no significant differences between the ASD and UA groups for the other demographic variables. When using these adjusted propensity score weights, we detected a significant difference between ASD and UA groups, d = –0.52, t = 2.20, p = .03.
Taken together, these findings add some support for the validity of the SST to detect differences in social intelligence between typically-functioning adults and adults with an ASD diagnosis.
Lastly, we tested for DIF between UA and ASD groups for each of the 23 SST items
(Table 7). An example of DIF is an item that is less difficult for members of one group (e.g., unaffected adults) than for members of a comparison group (e.g., adults with ASD) who otherwise have an identical ability level. This would indicate that the item is biased in favor of one group, so that group differences are a function of the item's properties rather than a demonstration of group differences in the underlying construct of interest. DIF can be uniform, where the difference between groups is consistent across the ability spectrum, or nonuniform, when there is an interaction between group status and ability level (Swaminathan &
Rogers, 1990). It is important to rule out DIF in order to determine whether the observed score differences between groups are due to true differences in the construct of interest and not a result of differences in the test’s measurement properties (Vandenberg & Lance, 2000). DIF testing has been used in prior studies to observe whether instruments function similarly between demographic groups based on factors such as gender, race, and ethnicity (e.g., Harrison et al.,
2017). For this analysis, we pooled the data from Studies 1 and 2 to provide the maximum degree of statistical power to detect differential functioning between groups. None of the 23 SST items displayed any evidence of either uniform or nonuniform DIF. Based on these results, we conclude that the SST provides an equivalent measure of social intelligence for adults with or without an ASD diagnosis. The rank order of item difficulty (percent correct) within the SST was also highly consistent between ASD and UA samples (r = .97, p < .001). We also observed a positive correlation between corrected item-total correlations from the ASD and UA samples (r =
.76, p < .001), which indicates that the relative contributions of each SST item to the total score were consistent regardless of diagnosis status. Therefore, differences in SST scores between
ASD and UA groups are more likely to be explained by differences in social intelligence ability than by an artifact of DIF.
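The item-level consistency checks reported above (per-group item difficulty and the correlation of those item statistics across groups) can be illustrated with a short Python sketch. It does not reproduce the logistic DIF models fit with the difR package, and both response matrices below are invented toy data.

```python
# Toy version of the item-level consistency checks: per-group item difficulty
# (proportion correct) and the correlation of those difficulties across groups.
# Rows are participants, columns are the same four items for both groups.
from statistics import mean

def difficulty(X):
    """Proportion of participants answering each item correctly."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

ua_group  = [[1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
asd_group = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
print(difficulty(ua_group))  # → [1.0, 0.75, 0.5, 0.5]
print(round(pearson(difficulty(ua_group), difficulty(asd_group)), 2))  # → 0.98
```

A high correlation between the two difficulty vectors, as in the paper's pooled sample, indicates that items keep the same relative ordering of difficulty in both groups.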
Discussion
Not only did we replicate our initial finding of SST score differences between UA and a new sample of adults with a clinical diagnosis of ASD, but we also found meaningful effects of ASD on SST scores after controlling for demographic group differences using propensity score weighting. Furthermore, we combined our ASD groups from Studies 1 and
2 to test for DIF relative to a large, normative sample of typically developing adults. The result of this analysis indicates that the SST items are equally valid measures of SI for adults with or without an ASD diagnosis. This also suggests that the observed score differences between groups can be attributed to differences in SI and are not due to any item or measurement biases.
Study 3
Although the 23-item SST typically takes roughly 15 minutes to complete, this may be too long for some studies involving a series of different assessments and measures. Prior studies have developed shorter versions of commonly-used SI tests (e.g., short form Reading the Mind in the Eyes Test; Olderbak et al., 2015) to accommodate researchers who wish to include performance measures of SI without making the length of the study discouraging or prohibitive to prospective participants. Therefore, we conducted Study 3 to develop an abbreviated form of the SST which could be used in situations where a shorter administration time is needed while retaining most of the psychometric properties of the full 23-item version.
Methods
Participants
We recruited 461 U.S.-based adults using Prolific (Peer et al., 2017). Participants were each paid $4.70 for completing the study. We did not screen participants for any history of ASD diagnoses but assume that most participants in this sample are within a typical range of development, as others have found for similar crowdsourcing sites (e.g., Amazon Mechanical
Turk; McCredie & Morey, 2019). We removed 39 participants who failed one or more attention check items or did not meet our median response time criteria, resulting in a final sample of 422 participants. Most participants identified as White (51%) and female (52%). The average participant age was 31.7 years old (SD = 11.0). Data from this study were also reported by Heck et al. (2021) but there is no overlap in research questions or primary analyses between that publication and this one.
Procedure
All measures in Study 3 were administered using an online survey hosted by Qualtrics.
All tasks were completed by participants in the same, fixed order. After completing the social intelligence and general intelligence tasks described below, participants were also asked to complete several self-report personality and confidence measures which were not relevant to the research questions asked here. The median time to complete the study was 31 minutes.
Measures
We created a short, ten-item SST form based on an item analysis of the full 23-item version using item-level data reported by Brown et al. (2019). We selected items based on difficulty (percent correct) and corrected item-total correlations as suggested by Stanton et al.
(2002). We also considered the differences in item difficulty between UA and ASD groups reported in Table 7. This short form had only slightly worse internal consistency (α = .69) compared to the estimates found for the full SST in Studies 1 and 2.
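The selection logic described above can be sketched as follows: rank items by corrected item-total correlation (CITC) and keep the best among those with moderate difficulty. All response data below are simulated stand-ins; the actual selection used the item statistics reported by Brown et al. (2019) and Table 7, not the hypothetical values generated here.

```python
# Sketch of short-form item selection by difficulty and CITC (simulated data).
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:        # constant item: correlation undefined
        return 0.0
    return cov / (vx * vy) ** 0.5

def select_short_form(matrix, k=10, diff_range=(0.30, 0.90)):
    """Return indices of the k items with the highest corrected item-total
    correlation whose difficulty (proportion correct) falls in diff_range."""
    stats = []
    for j in range(len(matrix[0])):
        item = [row[j] for row in matrix]
        rest = [sum(row) - row[j] for row in matrix]   # total excluding item j
        stats.append((j, sum(item) / len(item), pearson(item, rest)))
    eligible = [s for s in stats if diff_range[0] <= s[1] <= diff_range[1]]
    eligible.sort(key=lambda s: s[2], reverse=True)    # best CITC first
    return sorted(j for j, _, _ in eligible[:k])

# Hypothetical 23-item responses driven by a latent ability, so that items
# correlate with the total score:
random.seed(2)
data = []
for _ in range(400):
    ability = random.random()
    data.append([1 if random.random() < 0.15 + 0.03 * j + 0.4 * ability else 0
                 for j in range(23)])

short = select_short_form(data)
print("Hypothetical 10-item subset (item indices):", short)
```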
In addition to the SST, all participants completed ten-item short form versions of several social and general intelligence measures. Ten Reading the Mind in the Eyes Test (RMET; Baron-
Cohen et al., 2001) items were selected based on a short form developed by Olderbak et al.
(2015). A ten-item version of the Situational Test of Emotional Understanding (STEU; MacCann
& Roberts, 2008) was created using item analysis data reported by Brown et al. (2019). Both short forms yielded acceptable internal consistency (RMET α = .66; STEU α = .64). We measured verbal ability using the ten-item Wordsum vocabulary test (α = .78). We also measured abstract reasoning with a ten-item test whose matrix items are similar to those used in Raven’s
Progressive Matrices (α = .72).
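The internal consistency values reported throughout (e.g., α = .69, .66, .64, .78, .72) are coefficient (Cronbach's) alpha: α = k/(k−1) × (1 − Σ item variances / variance of total scores). A minimal sketch on simulated 0/1 responses, where a shared latent ability induces the positive inter-item correlations that alpha summarizes (the data and rates are hypothetical, not the study's):

```python
# Coefficient alpha on a hypothetical 10-item, 300-person response matrix.
import random

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    k = len(matrix[0])
    item_vars = sum(variance([row[j] for row in matrix]) for j in range(k))
    total_var = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - item_vars / total_var)

random.seed(7)
data = []
for _ in range(300):
    ability = random.random()   # latent trait shared across items
    data.append([1 if random.random() < 0.3 + 0.5 * ability else 0
                 for _ in range(10)])

print(f"alpha = {cronbach_alpha(data):.2f}")
```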
Results
Notwithstanding the slight decrease in internal consistency relative to the full 23-item
SST, scores on the 10-item short form SST were positively related to scores on the RMET (r =
.53, p < .001) and the STEU (r = .53, p < .001). We also found that SST performance (as percent correct of total) among unaffected adults was similar for the full 23-item version (M = 69% in
Study 2) and the shortened, 10-item form (M = 70%). We report descriptive statistics for the short form SST in Table 8. To test for incremental validity, we regressed SST scores onto both
RMET and STEU scores after controlling for verbal ability and abstract reasoning. Together, the two SI tests predicted an additional 11% of variance in SST scores (∆R2 = .11, F(∆df = 2) = 40.17, p < .001).
Additionally, the RMET (β = .25, p < .001) and STEU (β = .22, p < .001) were each significant predictors of SST scores beyond the effects of the general intelligence measures. These results suggest that the 10-item version of the SST retains much of the convergent validity that has been demonstrated in previous research using the full 23-item version (Brown et al., 2019).
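The hierarchical regression above can be illustrated as follows. All scores are simulated stand-ins (the effect sizes baked into the simulation are arbitrary), but the mechanics follow the standard incremental-validity procedure: R² with the controls only, R² after adding the two SI tests, then an F test on the R² change.

```python
# Sketch of the incremental-validity test on simulated scores.
import numpy as np

rng = np.random.default_rng(0)
n = 422
verbal = rng.normal(size=n)       # stand-in for Wordsum
matrices = rng.normal(size=n)     # stand-in for matrix reasoning
rmet = 0.4 * verbal + rng.normal(size=n)
steu = 0.4 * verbal + rng.normal(size=n)
sst = (0.3 * verbal + 0.2 * matrices + 0.3 * rmet + 0.3 * steu
       + rng.normal(size=n))

def r_squared(y, *predictors):
    """OLS R^2 with an intercept and the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = ((y - y.mean()) ** 2).sum()
    return 1 - (resid ** 2).sum() / tss

r2_controls = r_squared(sst, verbal, matrices)
r2_full = r_squared(sst, verbal, matrices, rmet, steu)
delta = r2_full - r2_controls
q, p_full = 2, 4                  # predictors added / predictors in full model
F = (delta / q) / ((1 - r2_full) / (n - p_full - 1))
print(f"dR2 = {delta:.3f}, F({q}, {n - p_full - 1}) = {F:.2f}")
```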
We compared the short form results from this sample of unaffected adults (UA) to the combined ASD samples from Studies 1 and 2 (n = 93). For this combined sample of adults with
ASD, we observed their performance on the ten-item subset from the full SST that they completed. Compared to the unaffected adult sample, the short form SST demonstrated a similar level of internal consistency among ASD participants (α = .65) and corrected item-total correlations among the ten items (mean CITC = .32). We also found that participants with ASD scored lower on the short form SST compared to the UA sample (d = –0.24, 95% CI = [–0.47, –0.02], t(513) = 2.13, p = .03). These results suggest that the short form SST may provide a valid measure of SI for studies where the full 23-item version cannot be administered.
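Group comparisons like the one above are standardized mean differences: Cohen's d with a pooled SD and, here, a large-sample 95% confidence interval. The summary statistics below are hypothetical stand-ins chosen for illustration, not the actual SST group data.

```python
# Cohen's d with pooled SD and a normal-approximation 95% CI.
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """d = (m1 - m2) / pooled SD, with the usual large-sample CI."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                       / (n1 + n2 - 2))
    d = (m1 - m2) / pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - 1.96 * se, d + 1.96 * se

# Hypothetical 10-item score summaries: one smaller group vs. a larger one.
d, lo, hi = cohens_d(6.5, 2.1, 93, 7.0, 2.0, 422)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Note that a CI excluding zero corresponds to the significant t test reported above; the width of the interval is driven mostly by the smaller group's size.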
General Discussion
Our study is the first to explore how adults with ASD perform on a self-administered, online SI test compared to typically developing adults. We found that the SST and its items functioned equally well for both groups of participants and that the test detected modest group mean differences due to ASD. We provide a histogram of SST scores for participants in UA and
ASD groups in Studies 1 and 2 (Figure 2). We also failed to detect any consistent evidence for
DIF compared to a large normative sample of 1049 unaffected adults. Even in Study 2, where our ASD sample was characterized by below average IQ scores, participants were able to complete the SST online without the presence of a test proctor or clinician. These results suggest that the SST holds promise as a valid, online, remote assessment of SI for adults in clinical or subclinical populations. Unlike self- or observer-reported measures of autistic traits which have been found to correlate with personality traits in non-clinical samples (Ingersoll et al., 2011;
Schwartzman et al., 2016), past research also indicates that SST scores are practically unrelated to personality or trait emotional intelligence (Brown et al., 2019). This suggests that the SST can be used as a complement to many of the popular existing self- or observer-reported measures.
These findings are especially promising given the growing need for valid, online, self-administered assessments. Although past research has documented the development of internet-based general cognitive ability or intelligence tests (e.g., Brown & Grossenbacher, 2017;
Sliwinski et al., 2018; Wright, 2020), few objective SI tests besides the RMET have been used online without a proctor. Even though our participants completed the SST outside of a clinical setting and using their own devices (e.g., tablet, laptop, or desktop computer), we did not detect any degradation in measurement precision or item validity. Based on these results, the SST appears useful for assessing SI while allowing participants to complete the test remotely without having to travel to a clinic or research site. We also designed a 10-item short form of the SST that retains modest reliability and demonstrates similar validity evidence.
This may help researchers recruit larger samples or may make participating in research studies more accessible to potential participants. Based on findings from recent research, the use of animation in the SST may also create a more engaging and enjoyable experience for participants relative to text-based assessments (Karakolidis et al., 2021).
Implications and Directions for Future Research
Even though the SST was not explicitly designed to detect ASD or other developmental disorders or to quantify traits related to ASD, the test does appear to be somewhat sensitive to differences in SI due to ASD. We found modest mean score differences on the SST between adults with and without a prior diagnosis of ASD, consistent with previous studies using different measures of SI. Our estimates were only slightly lower than the difference between patients with schizophrenia and controls reported for the SAT-MC by Pinkham et al. (2017a; d = 0.64).
However, larger score differences have been reported for the SAT-MC among children with or without ASD (d = 0.88; Burger-Caplan et al., 2016) and for other animated shape tasks
(Bliksted et al., 2016). Likewise, other SI tests like the RMET have generally displayed larger mean differences due to ASD or schizophrenia diagnoses (Morrison et al., 2019; Peñuelas-Calvo et al., 2019). Future test development work may help improve the sensitivity of the SST. Item results in Table 7 indicate that the largest score differences were found for items which were designed as false belief tasks (SST 18 and 23) that likely stress Theory of Mind capacities. In each of these items, participants were asked to take the perspective of one of the shape characters (e.g., “Who does X think is in the house at the end of the video?”). These results suggest that this item format may be particularly effective for differentiating between adults with and without an ASD diagnosis and may be used to further optimize the SST for this purpose.
Another important avenue for future research is to determine the boundary conditions for administering the SST online. In our samples, adults with ASD were able to complete the SST outside of a controlled research or clinic setting. However, many of these adults appear to be relatively high functioning, based on their self-reported educational attainment. Future studies should seek to identify criteria which would help researchers determine whether a participant could be expected to provide valid responses in a self-administered, online assessment. Likewise, our samples only included participants who were 18 years of age or older even though animated shape tasks have been used to measure SI among children and adolescents (Altschuler et al.,
2018; Burger-Caplan et al., 2016; Salter et al., 2008). Given that the SST items were designed to require as little reading as possible and thus be more independent of verbal ability or language skills, we expect that the test can be used in younger populations, but this has yet to be explored empirically. Thus, further research is needed to observe how this test functions when administered to younger participants in clinical or nonclinical populations.
Future research is also needed to observe the heritability and genetic predictors of SST scores. Twin studies have estimated genetic contributions to individual differences in social cognition (Isaksson et al., 2019) and measures of social functioning (Constantino & Todd, 2003).
More specifically, a recent study reported a heritability estimate of 28% for performance on the
RMET (Warrier et al., 2018). We would expect to find similar heritability estimates for the SST and similar animated shape tasks based on the convergent validity evidence with the RMET, but this has yet to be empirically observed. This work could potentially identify whether different measures of SI share common genetic influences and how distinct those influences may be from those which predict performance on more general intelligence or cognitive tests.
Study Limitations
We observed that the SST provided adequate, but not ideal, internal consistency for adults with or without ASD (α < .80). Based on these results, the current version of the SST is best suited for research purposes, where even modest reliability may be sufficient for detecting true effects (Schmitt, 1996), but is not reliable enough for high-stakes, diagnostic use, where
Nunnally (1978) suggests a threshold of α ≥ .90. There is also currently only one SST form, so future research should develop a parallel test form and demonstrate test-retest reliability among clinical and nonclinical samples. Pinkham et al. (2017a) found that a similar animated shape task, the SAT-MC, had relatively weak test-retest reliability due in part to a poorly-performing alternative test form. This level of reliability may have attenuated the observed mean differences between adults with or without ASD. Another limitation is that we did not obtain cognitive or intelligence test scores for all participants in Studies 1 and 2. Therefore, we were only able to control for coarse-grained educational attainment and other demographic differences when testing for score differences due to ASD.
Conclusions
Across two studies, we detected differences in SI between typically developing adults and adults with ASD using a remotely administered, freely-available online test. In both instances, we found that the SST was equally reliable for those with or without ASD. We also found no instances of differential item functioning based on ASD diagnosis. These results indicate that the SST is a promising tool for measuring SI, especially in situations where in-person, on-site assessments are either impractical or not possible. We also identified a shorter,
10-item version of the test which can be used in place of the full version. Although future research is needed to further optimize the SST and boost its reliability for clinical purposes, this tool may help researchers obtain a quantitative measure of SI while avoiding some of the practical limitations of other existing instruments.
References
Abell, F., Happé, F., & Frith, U. (2000). Do triangles play tricks? Attribution of mental states to
animated shapes in normal and abnormal development. Cognitive Development, 15, 1–16.
https://doi.org/10.1016/S0885-2014(00)00014-9
Abrahams, L., Pancorbo, G., Primi, R., Santos, D., Kyllonen, P., John, O. P., & De Fruyt, F.
(2019). Social-emotional skill assessment in children and adolescents: Advances and
challenges in personality, clinical, and educational contexts. Psychological Assessment,
31, 460-473. https://doi.org/10.1037/pas0000591
Altschuler, M., Sideridis, G., Kala, S., Warshawsky, M., Gilbert, R., Carroll, D., Burger-Caplan,
R., & Faja, S. (2018). Measuring individual differences in cognitive, affective, and
spontaneous theory of mind among school-aged children with autism spectrum disorder.
Journal of Autism and Developmental Disorders, 48, 3945–3957.
https://doi.org/10.1007/s10803-018-3663-1
Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of
confounding in observational studies. Multivariate Behavioral Research, 46, 399–424.
https://doi.org/10.1080/00273171.2011.568786
Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse
probability of treatment weighting (IPTW) using the propensity score to estimate causal
treatment effects in observational studies. Statistics in Medicine, 34, 3661–3679.
https://doi.org/10.1002/sim.6607
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001a). The “Reading the
Mind in the Eyes” test revised version: A study with normal adults, and adults with
Asperger syndrome or high-functioning autism. Journal of Child Psychology and
Psychiatry, 42, 241–251. https://doi.org/10.1111/1469-7610.00715
Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001b). The Autism-
Spectrum Quotient (AQ): Evidence from Asperger syndrome/high-functioning autism,
males and females, scientists and mathematicians. Journal of Autism and Developmental
Disorders, 31, 5–17. https://doi.org/10.1023/A:1005653411471
Bell, M. D., Fiszdon, J. M., Greig, T. C., & Wexler, B. E. (2010). Social attribution test –
multiple choice (SAT-MC) in schizophrenia: Comparison with community sample and
relationship to neurocognitive, social cognitive and symptom measures. Schizophrenia
Research, 122, 164–171. https://doi.org/10.1016/j.schres.2010.03.024
Biagianti, B., Fisher, M., Brandrett, B., Schlosser, D., Loewy, R., Nahum, M., & Vinogradov, S.
(2019). Development and testing of a web-based battery to remotely assess cognitive
health in individuals with schizophrenia. Schizophrenia Research, 208, 250–257.
https://doi.org/10.1016/j.schres.2019.01.047
Brown, M. I., & Grossenbacher, M. A. (2017). Can you test me now? Equivalence of GMA tests
on mobile and non-mobile devices. International Journal of Selection and Assessment,
25, 61–71. https://doi.org/10.1111/ijsa.12160
Brown, M. I., Ratajska, A., Hughes, S. L., Fishman, J. B., Huerta, E., & Chabris, C. F. (2019).
The social shapes test: A new measure of social intelligence, mentalizing, and theory of
mind. Personality and Individual Differences, 143, 107–117.
https://doi.org/10.1016/j.paid.2019.01.035
Brown, M. I., Speer, A. B., Tenbrink, A. P., & Shihadeh, M. (2021). Investigating group
differences in verbal and non-verbal social intelligence. Poster presented at the 35th
Meeting for the Society for Industrial and Organizational Psychology; virtual conference.
Burger-Caplan, R., Saulnier, C., Jones, W., & Klin, A. (2016). Predicting social and
communicative ability in school-age children with autism spectrum disorder: A pilot
study of the Social Attribution Task, Multiple Choice. Autism, 20, 952–962.
https://doi.org/10.1177/1362361315617589
Casaletto, K. B., & Heaton, R. K. (2017). Neuropsychological assessment: Past and future.
Journal of the International Neuropsychological Society, 23, 778–790.
https://doi.org/10.1017/S1355617717001060
Constantino, J. N., Davis, S. A., Todd, R. D., Schindler, M. K., Gross, M. M., Brophy, S. L., ...
& Reich, W. (2003). Validation of a brief quantitative measure of autistic traits:
Comparison of the social responsiveness scale with the autism diagnostic interview-
revised. Journal of Autism and Developmental Disorders, 33, 427–433.
https://doi.org/10.1023/A:1025014929212
Constantino, J. N., & Todd, R. D. (2003). Autistic traits in the general population. Archives of
General Psychiatry, 60, 524–530. https://doi.org/10.1001/archpsyc.60.5.524
Cor, M. K., Haertel, E., Krosnick, J. A., & Malhotra, N. (2012). Improving ability measurement
in surveys by following the principles of IRT: The Wordsum vocabulary test in the
General Social Survey. Social Science Research, 41, 1003–1016.
https://doi.org/10.1016/j.ssresearch.2012.05.007
Daniels, A. M., Rosenberg, R. E., Anderson, C., Law, J. K., Marvin, A. R., & Law, P. A. (2012).
Verification of parent-report of child autism spectrum disorder diagnosis to a web-based
autism registry. Journal of Autism and Developmental Disorders, 42, 257–265.
https://doi.org/10.1007/s10803-011-1236-7
Dodell-Feder, D., Ressler, K. J., & Germine, L. T. (2020). Social cognition or social class and
culture? On the interpretation of differences in social cognitive performance.
Psychological Medicine, 50, 133–145. https://doi.org/10.1017/S003329171800404X
Fombonne, E., Green Snyder, L., Daniels, A., Feliciano, P., Chung, W., & SPARK Consortium.
(2020). Psychiatric and medical profiles of autistic adults in the SPARK cohort. Journal
of Autism and Developmental Disorders, 50, 3679–3698. https://doi.org/10.1007/s10803-
020-04414-6
Harrison, A. J., Long, K. A., Tommet, D. C., & Jones, R. N. (2017). Examining the role of race,
ethnicity, and gender on social and behavioral ratings within the Autism Diagnostic
Observation Schedule. Journal of Autism and Developmental Disorders, 47, 2770–2782.
https://doi.org/10.1007/s10803-017-3176-3
Heck, P. R., Brown, M. I., & Chabris, C. F. (2021). Socially (un)skilled and unaware of it: A
robust negative relationship between self-report and performance measures of social
intelligence. Unpublished manuscript.
Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. American
Journal of Psychology, 57, 243–259. https://doi.org/10.2307/1416950
Hurley, R. S. E., Losh, M., Parlier, M., Reznick, J. S., & Piven, J. (2007). The broad autism
phenotype questionnaire. Journal of Autism and Developmental Disorders, 37, 1679–
1690. https://doi.org/10.1007/s10803-006-0299-3
Ingersoll, B., Hopwood, C. J., Wainer, A., & Donnellan, M. B. (2011). A comparison of three
self-report measures of the broader autism phenotype in a non-clinical sample. Journal of
Autism and Developmental Disorders, 41, 1646–1657. https://doi.org/10.1007/s10803-
011-1192-2
Isaksson, J., Westeinde, A. V., Cauvet, E., Kuja-Halkola, R., Lundin, K., Neufeld, J., Willfors,
C., & Bolte, S. (2019). Social cognition in autism and other neurodevelopmental
disorders: A co-twin control study. Journal of Autism and Developmental Disorders, 49,
2838–2848. https://doi.org/10.1007/s10803-019-04001-4
Jodoin, M. G. and Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect
size measure with logistic regression procedure for DIF detection. Applied Measurement
in Education, 14, 329–349. https://doi.org/10.1207/S15324818AME1404_2
Johannesen, J. K., Fiszdon, J. M., Weinstein, A., & Ciosek, D. (2018). The Social Attribution
Task – Multiple Choice (SAT-MC): Psychometric comparison with social cognitive
measures for schizophrenia research. Psychiatry Research, 262, 154–161.
https://doi.org/10.1016/j.psychres.2018.02.011
Karakolidis, A., O’Leary, M., & Scully, D. (2021). Animated videos in assessment: Comparing
validity evidence from and test-takers’ reactions to an animated and a text-based
situational judgment test. International Journal of Testing, 21, 57–79.
https://doi.org/10.1080/15305058.2021.1916505
Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2021). Sty in the mind’s eye: A meta-analytic
investigation of the nomological network and internal consistency of the “Reading the
Mind in the Eyes” test. Assessment. Advance online publication.
https://doi.org/10.1177/1073191121996469
Klin, A. (2000). Attributing social meaning to ambiguous visual stimuli in higher-functioning
autism and Asperger syndrome: The social attribution task. Journal of Child Psychiatry
and Allied Disciplines, 41, 831–846. https://doi.org/10.1111/1469-7610.00671
Lavoie, M. A., Plana, I., Lacroix, J. B., Godmaire-Duhaime, F., Jackson, P. L., & Achim, A. M.
(2013). Social cognition in first-degree relatives of people with schizophrenia: A meta-
analysis. Psychiatry Research, 209, 129–135.
https://doi.org/10.1016/j.psychres.2012.11.037
Lee, H. S., Corbera, S., Poltorak, A., Park, K., Assaf, M., Bell, M. D., Wexler, B. E., Cho, Y. I.,
Jung, S., Brocke, S., & Choi, K. H. (2018). Measuring theory of mind in schizophrenia
research: Cross-cultural validation. Schizophrenia Research, 201, 187–195.
https://doi.org/10.1016/j.schres.2018.06.022
Livingston, L. A., Carr, B., & Shah, P. (2019). Recent advances and new directions in measuring
Theory of Mind in autistic adults. Journal of Autism and Developmental Disorders, 49,
1738–1744. https://doi.org/10.1007/s10803-018-3823-3
Livingston, L. A., & Happé, F. (2017). Conceptualising compensation in neurodevelopmental
disorders: Reflections from autism spectrum disorder. Neuroscience and Biobehavioral
Reviews, 80, 729–742. https://doi.org/10.1016/j.neubiorev.2017.06.005
Ludwig, N. N., Hecht, E. E., King, T. Z., Revill, K. P., Moore, M., Fink, S. E., & Robins, D. L.
(2020). A novel social attribution paradigm: The Dynamic Interacting Shape Clips
(DISC). Brain and Cognition, 138, 105507. https://doi.org/10.1016/j.bandc.2019.105507
MacCann, C., Jiang, Y., Brown, L. E., Double, K. S., Bucich, M., & Minbashian, A. (2020).
Emotional intelligence predicts academic performance: A meta-analysis. Psychological
Bulletin, 146, 150–186. https://doi.org/10.1037/bul0000219
MacCann, C., & Roberts, R. D. (2008). New paradigms for assessing emotional intelligence:
Theory and data. Emotion, 8, 540–551. https://doi.org/10.1037/a0012746
Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R
package for the detection of dichotomous differential item functioning. Behavior
Research Methods, 42, 847–862. https://doi.org/10.3758/BRM.42.3.847
Magis, D., Raîche, G., Béland, S., & Gérard, P. (2011). A generalized logistic regression
procedure to detect differential item functioning among multiple groups. International
Journal of Testing, 11, 365–386. https://doi.org/10.1080/15305058.2011.602810
Martinez, G., Mosconi, E., Daban-Huard, C., Parellada, M., Fananas, L., Gaillard, R., Fatjo-
Vilas, M., Krebs, M. O., & Amado, I. (2019). “A circle and a triangle dancing together”:
Alteration of social cognition in schizophrenia compared to autism spectrum disorders.
Schizophrenia Research, 210, 94–100. https://doi.org/10.1016/j.schres.2019.05.043
McCaffrey, D., Ridgeway, G., & Morral, A. (2004). Propensity score estimation with boosted
regression for evaluating adolescent substance abuse treatment. Psychological Methods,
9, 403–425. https://doi.org/10.1037/1082-989X.9.4.403
McCredie, M. N., & Morey, L. C. (2019). Who are the Turkers? A characterization of MTurk
workers using the personality assessment inventory. Assessment, 26, 759–
766. https://doi.org/10.1177/1073191118760709
McDonald, S. (2012). New frontiers in neuropsychological assessment: Assessing social
perception using a standardised instrument, The Awareness of Social Inference
Test. Australian Psychologist, 47, 39–48. https://doi.org/10.1111/j.1742-
9544.2011.00054.x
Montgomery, C. B., Allison, C., Lai, M. C., Cassidy, S., Langdon, P. E., & Baron-Cohen, S.
(2016). Do adults with high functioning autism or Asperger syndrome differ in empathy
and emotion recognition? Journal of Autism and Developmental Disorders, 46, 1931–
1940. https://doi.org/10.1007/s10803-016-2698-4
Morrison, K. E., Pinkham, A. E., Kelsven, S., Ludwig, K., Penn, D. L., & Sasson, N. J. (2019).
Psychometric evaluation of social cognitive measures for adults with autism. Autism
Research, 12, 766–778. https://doi.org/10.1002/aur.2084
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.
Olderbak, S., Semmler, M., & Doebler, P. (2019). Four-branch model of ability emotional
intelligence with fluid and crystalized intelligence: A meta-analysis of relations. Emotion
Review, 11, 166–183. https://doi.org/10.1177/1754073918776776
Olderbak, S., & Wilhelm, O. (2020). Overarching principles for the organization of
socioemotional constructs. Current Directions in Psychological Science, 29, 63–70.
https://doi.org/10.1177/0963721419884317
Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015).
A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for
research and applied settings. Frontiers in Psychology, 6, 1503–1520.
https://doi.org/10.3389/fpsyg.2015.01503
Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond the Turk: Alternative
platforms for crowdsourcing behavioral research. Journal of Experimental Social
Psychology, 70, 153–163. https://doi.org/10.1016/j.jesp.2017.01.006
Peñuelas-Calvo, I., Sareen, A., Sevilla-Llewellyn-Jones, J., & Fernández-Berrocal, P. (2019).
The “Reading the Mind in the Eyes” test in autism-spectrum disorders comparison with
healthy controls: A systematic review and meta-analysis. Journal of Autism and
Developmental Disorders, 49, 1048–1061. https://doi.org/10.1007/s10803-018-3814-4
Peterson, E., & Miller, S. F. (2012). The eyes test as a measure of individual differences: How
much of the variance reflects verbal IQ? Frontiers in Psychology, 3, 220.
https://doi.org/10.3389/fpsyg.2012.00220
Pinkham, A. E., Harvey, P. D., & Penn, D. L. (2017a). Social cognition psychometric evaluation:
Results of the final validation study. Schizophrenia Bulletin, 44, 737–748.
https://doi.org/10.1093/schbul/sbx117
Pinkham, A. E., Kelsven, S., Kouros, C., Harvey, P. D., & Penn, D. L. (2017b). The effect of
age, race, and sex on social cognitive performance in individuals with schizophrenia.
Journal of Nervous Mental Disorders, 205, 346–352.
https://doi.org/10.1097/NMD.0000000000000654
Ratajska, A., Brown, M. I., & Chabris, C. F. (2020). Attributing social meaning to animated
shapes: A new experimental study of apparent behavior. American Journal of
Psychology, 133, 295–312. https://www.jstor.org/stable/10.5406/amerjpsyc.133.3.0295
Ridgeway, G., McCaffrey, D., Morral, A., Griffin, B. A., Burgette, L., & Cefalu, M. (2020).
twang: Toolkit for Weighting and Analysis of Nonequivalent Groups. R package version
1.6.
Salter, G., Seigal, A., Claxton, M., Lawrence, K., & Skuse, D. (2008). Can autistic children read
the mind of an animated triangle? Autism, 12, 349–371.
https://doi.org/10.1177/1362361308091654
Sánchez-Álvarez, N., Berrios Martos, M. P., & Extremera, N. (2020). A meta-analysis of the
relationship between emotional intelligence and academic performance in secondary
education: A multi-stream comparison. Frontiers in Psychology, 11, 1517.
https://doi.org/10.3389/fpsyg.2020.01517
Schaafsma, S. M., Pfaff, D. W., Spunt, R. P., & Adolphs, R. (2015). Deconstructing and
reconstructing theory of mind. Trends in Cognitive Sciences, 19, 65–72.
https://doi.org/10.1016/j.tics.2014.11.007
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.
https://doi.org/10.1037/1040-3590.8.4.350
Schwartzman, B. C., Wood, J. J., & Kapp, S. K. (2016). Can the five factor model of personality
account for the variability of autism symptom expression? Multivariate approaches to
behavioral phenotyping in adult autism spectrum disorder. Journal of Autism and
Developmental Disorders, 46, 253–272. https://doi.org/10.1007/s10803-015-2571-x
Sliwinski, M. J., Mogle, J. A., Hyun, J., Munoz, E., Smyth, J. M., & Lipton, R. B. (2018).
Reliability and validity of ambulatory cognitive assessments. Assessment, 25, 14–30.
https://doi.org/10.1177/1073191116643164
Soto, C. J., Napolitano, C. M., & Roberts, B. W. (2020). Taking skills seriously: Toward an
integrative model and agenda for social, emotional, and behavioral skills. Current
Directions in Psychological Science. Advance online publication.
https://doi.org/10.1177/0963721420978613
Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for
reducing the length of self-report scales. Personnel Psychology, 55, 167–194.
https://doi.org/10.1111/j.1744-6570.2002.tb00108.x
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic
regression procedures. Journal of Educational Measurement, 27, 361–370.
https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
The SPARK Consortium (2018). SPARK: A US cohort of 50,000 families to accelerate Autism
research. Neuron, 97, 488–493. https://doi.org/10.1016/j.neuron.2018.01.015
Tippins, N. T. (2015). Technology and assessment in selection. Annual Review of Organizational
Psychology and Organizational Behavior, 2, 551-582. https://doi.org/10.1146/annurev-
orgpsych-031413-091317
Türközer, H. B., & Öngür, D. (2020). A projection for psychiatry in the post-COVID-19 era:
Potential trends, challenges, and directions. Molecular Psychiatry, 25, 2214–2219.
https://doi.org/10.1038/s41380-020-0841-2
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance
literature: Suggestions, practices, and recommendations for organizational research.
Organizational Research Methods, 3, 4–69. https://doi.org/10.1177/109442810031002
Velikonja, T., Fett, A. K., & Velthorst, E. (2019). Patterns of nonsocial and social cognitive
functioning in adults with autism spectrum disorder. JAMA Psychiatry, 76, 135–151.
https://doi.org/10.1001/jamapsychiatry.2018.3645
Warrier, V., Grasby, K. L., Uzefovsky, F., Toro, R., Smith, P., Chakrabarti, B. … et al. (2018).
Genome-wide meta-analysis of cognitive empathy: Heritability, and correlates with sex,
neuropsychiatric conditions and cognition. Molecular Psychiatry, 23, 1402–1409.
https://doi.org/10.1038/mp.2017.122
Weisberg, J., Milleville, S. C., Kenworthy, L., Wallace, G. L., Gotts, S. J., Beauchamp, M. S., &
Martin, A. (2012). Social perception in autism spectrum disorders: Impaired category
sensitivity for dynamic but not static images in ventral temporal cortex. Cerebral Cortex,
24, 37–48. https://doi.org/10.1093/cercor/bhs276
White, S. J., Coniston, D., Rogers, R., & Frith, U. (2011). Developing the Frith-Happé
animations: A quick and objective test of theory of mind for adults with autism. Autism
Research, 4, 149–154. https://doi.org/10.1002/aur.174
Wright, A. J. (2020). Equivalence of remote, digital administration and traditional, in-person
administration of the Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V).
Psychological Assessment, 32, 809–817. https://doi.org/10.1037/pas0000939
Zwickel, J., White, S. J., Coniston, D., Senju, A., & Frith, U. (2010). Exploring the building
blocks of social cognition: Spontaneous agency perception and visual perspective taking
in autism. Social Cognitive and Affective Neuroscience, 6, 564–571.
https://doi.org/10.1093/scan/nsq088
Table 1. Study 1 Participant Demographics

                  ASD (n = 68)     AS (n = 79)      UA (n = 220)
                  M       SD       M       SD       M       SD
Age               30.67   9.98     30.72   8.49     41.84   8.83
Sex               0.66    0.48     0.61    0.49     0.55    0.50
Race/Ethnicity    0.84    0.37     0.82    0.38     0.78    0.42
Education         6.18    2.33     7.28    1.96     7.59    2.34

Note: ASD = participants reporting a diagnosis of autism spectrum or autism disorder; AS = participants reporting a diagnosis of Asperger's syndrome; UA = unaffected adults; Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Sex (1 = Female, 0 = Male); Education was self-reported on a ten-point rating scale (1 = did not attend high school, 2 = some high school, 3 = GED, 4 = high school graduate, 5 = trade or vocational school, 6 = associate degree, 7 = completed some college, 8 = currently in college, 9 = baccalaureate degree, 10 = graduate/professional degree). Group differences in age, F(2, 364) = 67.73, p < .001; sex, χ²(2) = 2.92, p = .23; race/ethnicity, χ²(2) = 1.57, p = .46; education, F(2, 364) = 10.18, p < .001.
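The chi-square tests in the Table 1 note can be approximately reproduced from the reported proportions and group sizes. In the sketch below, the female/male counts are reconstructed from the rounded sex proportions (an assumption for illustration; the raw counts are not reported here):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Approximate female/male counts: proportion female x group size, rounded
counts = np.array([
    [45, 23],    # ASD (n = 68,  0.66 female)
    [48, 31],    # AS  (n = 79,  0.61 female)
    [121, 99],   # UA  (n = 220, 0.55 female)
])
chi2, p, dof, _ = chi2_contingency(counts, correction=False)
print(round(chi2, 2), round(p, 2), dof)   # ≈ 2.92, 0.23, 2 (matches the note)
```

The race/ethnicity test in the note can be reconstructed the same way from the Race/Ethnicity proportions.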
Table 2. SST Descriptive Statistics by Diagnosis Group in Study 1

             ASD (n = 68)      AS (n = 79)       UA (n = 220)
Mean         14.13             15.57             15.61
SD           4.25              3.63              3.68
α            .78               .71               .73
Mean CITC    .33               .26               .29
Cohen's d    –0.39             –0.01             —
95% CI       [–0.66, –0.11]    [–0.27, 0.25]     —

Note. Cohen's d = standardized mean difference relative to the unaffected adult (UA) group; ASD = autism spectrum disorder; AS = Asperger's syndrome; CITC = corrected item-total correlation.
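The effect sizes in Table 2 follow directly from the reported means, SDs, and group sizes. A minimal sketch reproducing the ASD-vs.-UA comparison, using the pooled-SD formula for Cohen's d and a normal-approximation 95% CI (the helper function is illustrative, not from the manuscript):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with pooled SD and a normal-theory 95% CI."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - 1.96 * se, d + 1.96 * se)

# ASD vs. UA summary statistics from Table 2
d, (lo, hi) = cohens_d(14.13, 4.25, 68, 15.61, 3.68, 220)
print(round(d, 2), round(lo, 2), round(hi, 2))   # -0.39 [-0.66, -0.11]
```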
Table 3. Differences in SST Scores between Diagnosis Groups in Study 1

                    B (SE)          β      p
Model 1
  ASD               –1.48 (0.52)    –.15   < .01
  AS                –0.04 (0.50)    –.05   .93
Model 2
  Age               –0.03 (0.02)    –.08   .19
  Sex                0.23 (0.39)     .03   .56
  Race/Ethnicity     1.66 (0.47)     .18   < .01
  Education          0.46 (0.08)     .28   < .01
  ASD               –1.27 (0.54)    –.13   .02
  AS                –0.30 (0.53)    –.03   .56

Note. N = 367; β = standardized regression coefficient; ASD = autism spectrum disorder; AS = Asperger's syndrome; Sex (1 = Female, 0 = Male); Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Education was self-reported on a ten-point rating scale (1 = did not attend high school, 2 = some high school, 3 = GED, 4 = high school graduate, 5 = trade or vocational school, 6 = associate degree, 7 = completed some college, 8 = currently in college, 9 = baccalaureate degree, 10 = graduate/professional degree); Model 1 R² = .02, F(2, 364) = 4.19, p = .016; Model 2 R² = .12, F(6, 360) = 8.96, p < .001.
Table 4. Study 2 Participant Demographics

                  ASD (n = 25)     UA (n = 829)
                  M       SD       M       SD
Age               20.80   3.88     29.69   11.48
Sex               0.20    0.41     0.59    0.49
Race/Ethnicity    0.88    0.33     0.56    0.50
Education         2.20    0.58     3.26    0.86

Note: ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults; all unaffected adults were sampled from Amazon's Mechanical Turk or undergraduate psychology students (Brown et al., 2019; Brown et al., 2021); Sex (1 = Female, 0 = Male); Race/Ethnicity (1 = identified as White, 0 = all other participants); Education was coded on a six-point rating scale (1 = less than high school, 2 = high school degree, 3 = some college, 4 = 4-year college degree, 5 = some graduate school, 6 = graduate degree). Group differences in sex, χ²(1) = 15.13, p < .001; group differences in race/ethnicity, χ²(1) = 10.15, p = .001.
Table 5. SST Descriptive Statistics by Diagnosis Group in Study 2

             ASD (n = 25)      UA (n = 829)
Mean         14.44             15.89
SD           3.82              3.42
α            .75               .67
Mean CITC    .29               .24
Cohen's d    –0.42
95% CI       [–0.82, –0.02]

Note. ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults; α = coefficient alpha; CITC = corrected item-total correlation; CI = confidence interval.
Table 6. Differences in SST Scores between Diagnosis Groups in Study 2

                    B (SE)          β      p
Model 1
  ASD               –1.45 (0.70)    –.07   .04
Model 2
  Age                0.00 (0.01)     .01   .74
  Sex               –0.27 (0.24)    –.04   .26
  Race/Ethnicity     0.56 (0.25)     .08   .03
  Education          0.18 (0.14)     .05   .20
  ASD               –1.51 (0.73)    –.07   .04

Note. N = 854; β = standardized regression coefficient; ASD = autism spectrum disorder; Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Sex (1 = Female, 0 = Male); Education was coded on a six-point rating scale (1 = less than high school, 2 = high school degree, 3 = some college, 4 = 4-year college degree, 5 = some graduate school, 6 = graduate degree); Model 1 R² = .01, F(1, 852) = 4.32, p = .04; Model 2 R² = .01, F(5, 848) = 2.89, p = .01.
Table 7. Differential Item Functioning Results for All SST Items

           ASD (n = 93)     UA (n = 1049)    UA/ASD DIF
           M      CITC      M      CITC      Wald χ²   p
Item 1     .37    .25       .40    .12       0.72      .70
Item 2     .40    .26       .41    .26       3.49      .17
Item 3     .88    .27       .94    .28       2.37      .31
Item 4     .85    .51       .90    .28       2.15      .34
Item 5     .91    .33       .95    .24       0.02      .99
Item 6     .82    .23       .84    .26       2.31      .32
Item 7     .70    .48       .83    .35       1.23      .54
Item 8     .47    .15       .63    .13       4.14      .13
Item 9     .76    .28       .76    .24       2.74      .25
Item 10    .83    .49       .86    .35       2.45      .29
Item 11    .61    .26       .60    .09       2.46      .29
Item 12    .61    .38       .65    .33       1.90      .39
Item 13    .87    .32       .92    .25       0.02      .99
Item 14    .72    .42       .80    .28       0.39      .82
Item 15    .47    .14       .55    .14       0.47      .79
Item 16    .62    .23       .66    .25       1.79      .41
Item 17    .66    .45       .74    .29       0.82      .66
Item 18    .52    .54       .66    .45       0.51      .77
Item 19    .18    .21       .31    .12       3.23      .20
Item 20    .35    .09       .44    .13       1.16      .56
Item 21    .46    .20       .58    .32       5.04      .08
Item 22    .74    .28       .81    .28       0.81      .67
Item 23    .40    .45       .57    .36       2.35      .31

Note. M = mean correct responses; DIF = differential item functioning; all estimates were calculated using the difR package (Magis et al., 2010); each statistic represents an omnibus test for uniform and nonuniform DIF for each item; UA = unaffected adults from Studies 1 and 2 (n = 1049); ASD = adults with an autism spectrum disorder diagnosis from Studies 1 and 2 (n = 93); CITC = corrected item-total correlation; items included in the 10-item short-form SST are shown in italics and bolded.
Table 8. SST Short Form Descriptive Statistics in Study 3

             ASD (n = 93)      UA (n = 422)
Mean         6.47              7.02
SD           2.19              2.27
α            .65               .69
Mean CITC    .32               .35
Cohen's d    –0.24
95% CI       [–0.47, –0.02]

Note. ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults recruited from Prolific; α = coefficient alpha; CITC = corrected item-total correlation; CI = confidence interval.
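For context on the short form's reliability, the Spearman–Brown formula predicts the α that would result from shortening the 23-item test at random; the observed short-form α of .69 in the UA group exceeds that prediction, consistent with retaining the best-functioning items rather than a random subset. A sketch of the calculation (the formula is standard; it is not taken from the manuscript):

```python
def spearman_brown(alpha, k_new, k_old):
    """Predicted reliability after changing test length by the factor k_new / k_old."""
    k = k_new / k_old
    return k * alpha / (1 + (k - 1) * alpha)

# Shortening the 23-item SST (UA-group alpha = .73, Table 2) to 10 items
predicted = spearman_brown(0.73, 10, 23)
print(round(predicted, 2))   # 0.54, below the observed short-form alpha of .69
```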
Figure 1. Practice SST Item. All participants were given a practice item (Panel A) before starting the 23-item SST. After responding to the practice item, participants received feedback that identified the correct response (Panel B).
Figure 2. Distribution of SST Scores for Participants with ASD and Unaffected Adults (Studies 1 and 2 combined). The y-axis represents the proportion of participants within each group. The x-axis represents the number of correct responses on the 23-item SST.