
SELF-ADMINISTERED TEST OF SOCIAL INTELLIGENCE 1

The Social Shapes Test as a Self-Administered, Online Measure of Social Intelligence: Three Studies with Typically Developing Adults and Adults with Autism Spectrum Disorder

Matt I. Brown1, Christopher F. Chabris1, & Patrick R. Heck1

1Geisinger Health System, Lewisburg, PA

Running Head: SELF-ADMINISTERED TEST OF SOCIAL INTELLIGENCE

Version 1 submitted for peer review at the Journal of Autism and Developmental Disorders, 29 July 2021. This paper has not been peer reviewed. Please do not copy or cite without the authors' permission.

*Address correspondence to: Matt I. Brown Geisinger Autism and Developmental Medicine Institute 120 Hamm Drive, Suite 2A, MC 60-36 Lewisburg, PA 17837 [email protected]

Acknowledgments: We thank Lauren Walsh for her research assistance and Antoinette DiCriscio, Cora Taylor, and Vanessa Troiani for their helpful feedback on our manuscript. We also thank Jamie Toroney, Kiely Law, and the SPARK Consortium for their assistance in recruiting participants for this study. This work was supported by funds from the National Institutes of Health (grant U01MH119705).

Abstract

The Social Shapes Test (SST) was designed as a measure of social intelligence, or mentalizing, that does not use human faces or rely on extensive verbal ability and is completely self-administered online. Despite promising validity evidence, there have been no studies of whether this task is suitable for adults with autism spectrum disorder (ASD). Across three studies, we find that the SST is as reliable for adults with ASD as for typically developing adults. We detect a modest difference in SST performance associated with ASD and no evidence of differential item functioning. We also provide a ten-item version for use when testing time is limited. The SST is a suitable measure of social intelligence, especially for studies where remote, online assessment is needed.

Keywords: autism spectrum disorder, social cognition, social intelligence, theory of mind, online testing


Social and emotional skills are important for adaptive functioning in everyday life (Soto et al., 2020). These abilities have been categorized or labeled using a variety of terms, including social intelligence (SI), perspective-taking, mentalizing, and Theory of Mind (Olderbak & Wilhelm, 2020). Clinical researchers have developed an array of psychological assessments to measure these abilities (e.g., Abrahams et al., 2019) and to investigate the phenotypes of various psychological conditions, including autism spectrum disorder (ASD; Morrison et al., 2019) and schizophrenia (Pinkham et al., 2017a). Cognitive neuroscience researchers have also used these tools to identify areas of the brain associated with social functioning (Schaafsma et al., 2015).

Research on the development and implementation of social-cognitive measures can therefore contribute to our understanding of these important conditions (Casaletto & Heaton, 2017). However, most existing tools focus on recognizing emotion in human facial expressions or interpreting written descriptions of social interactions.

Due in part to the COVID-19 pandemic, many clinical and research practices have needed to shift from administering in-person assessments to implementing measures that can be administered remotely online or during telemedicine visits (Türközer & Öngür, 2020). Online, non-proctored assessment had already become the predominant mode in employment testing prior to the pandemic (e.g., Tippins, 2015). In contrast, many of the assessments used in clinical or developmental research are designed for in-person administration. Although recent work has helped establish new, web-based cognitive ability assessments (e.g., Biagianti et al., 2019; Wright, 2020), few efforts have focused on designing remote measures of SI or Theory of Mind.

This is important given that these forms of social cognition are generally affected by ASD and similar developmental disorders (Velikonja et al., 2019). Presently, many of the SI measures that can be administered online rely on self- or observer-reports, such as the Social Responsiveness Scale (SRS; Constantino et al., 2003), the Autism Quotient (AQ; Baron-Cohen et al., 2001b), or the Broad Autism Phenotype Questionnaire (BAPQ; Hurley et al., 2007). In contrast, many of the validated, objective SI tests are designed to be completed in person and administered by a proctor or trained clinician. This makes them ill-suited for remote administration and presents a challenge given the growing need for remote, web-based assessments. To this end, the Social Shapes Test (SST; Brown et al., 2019) was developed as a simple, self-administered SI test that uses materials distinct from those of many other SI measures and has shown promising evidence of convergent validity. To date, however, the SST has only been used in samples of typically developing adults. Therefore, we conducted the present study to examine whether the SST can provide reliable yet remote measurement of SI for adults with or without an ASD diagnosis.

Measuring SI using Animated Shape Tasks

Heider and Simmel (1944) famously observed that research participants often described the movements of animated, geometric shapes in human psychological terms. This pioneering work inspired research by Klin (2000), who found that individuals with autism or Asperger’s disorder made fewer social attributions to the Heider and Simmel video compared to members of an unaffected control group. At the same time, Abell and colleagues (2000) reported similar results using animations based on the Heider and Simmel video. Subsequent research inspired by these findings has largely observed group differences due to ASD among children and adults in performing different types of animated shape tasks (Burger-Caplan et al., 2016). Moreover, researchers have also used shape tasks to quantify differences in SI or social attribution due to schizophrenia (Bell et al., 2010; Johannesen et al., 2018; Koelkebeck et al., 2010) and to identify neurophysiological antecedents, including differences in eye movement (Zwickel et al., 2010) and regions of brain activation (Weisberg et al., 2012).

One benefit of these animated shape tasks is that they rely less on reading skill or verbal comprehension than other measures of SI. More verbally loaded tasks may allow individuals to use their verbal skills to compensate for low SI (Livingston & Happé, 2017), or may be performed poorly not because of low SI but because of low verbal ability. This is important given that one of the few SI tests which can be freely administered online, the Reading the Mind in the Eyes test (RMET; Baron-Cohen et al., 2001a), has been criticized for its reliance on verbal content (Peterson & Miller, 2012). For example, the RMET presents emotion words that are not all commonly used in everyday language, implying that one’s own vocabulary skill may affect RMET performance (Kittel et al., 2021). In addition, tasks involving human faces tend to display score differences due to race or ethnicity among clinical and nonclinical populations (Dodell-Feder et al., 2019; Pinkham et al., 2017b). In contrast, animated shape tasks have displayed few if any group differences in past research (Brown et al., 2019; Brown et al., 2021; Lee et al., 2018).

Despite these advantages, many animated shape tasks are not well-suited for remote testing. Nearly all existing tasks require proctoring by a clinician or an administrator. For example, the tasks used by Klin (2000) and Abell and colleagues (2000) both require an administrator to ask questions and to record and score verbal responses from participants. Not only could this introduce confounding effects of verbal ability or rater bias, but it also increases administration time and financial costs (Livingston et al., 2019). Some tasks use a combination of free-response and multiple-choice questions (e.g., the Frith-Happé animations; White et al., 2011) or only a fixed set of multiple-choice questions (e.g., the Social Attribution Test – Multiple Choice, or SAT-MC; Bell et al., 2010). However, both examples require an administrator to be present to operate the specific video segments for each question, which prevents participants from completing these tasks remotely. These barriers will likely discourage researchers from using animated shape tasks as research studies increasingly shift from being conducted in person to online.

The Social Shapes Test (SST)

The SST is a 23-item, multiple-choice social intelligence test. Each SST item consists of a short (13–23 second) animated video featuring a standard set of colored, geometric shapes. These animations were designed to represent a broader range of emotional and mental states than the original Heider and Simmel video and have been found to elicit a degree of social attribution in narrative descriptions similar to that reported by Klin (Ratajska et al., 2020). Each video features a different social plot in which the shapes display a variety of behaviors, including bullying, helping, comforting, deception, and playing. All videos are controlled by the participant and can be viewed as many times as desired. Before starting the SST, all participants are given a sample item followed by feedback indicating the correct response (Figure 1). All 23 items are administered in the same order for all participants.

Unlike other SI tasks, the SST was explicitly designed to be completely self-administered online, as was done in our initial validation studies (Brown et al., 2019). All questions are scored using an objective scoring key, which helps prevent the potential rater bias involved in scoring the open-ended responses used in other animated shape tasks (White et al., 2011). Lastly, all SST questions and video files are freely available for research use and can be accessed via the Open Science Framework. Researchers are free to use the SST videos to administer the test as part of an online survey or to adapt the videos to suit their own studies. This makes the SST more easily accessible to researchers, especially compared to other video-based social intelligence tests owned or distributed by commercial test publishers (e.g., The Awareness of Social Inference Test – TASIT; McDonald, 2012).

Despite these advantages, it is uncertain whether the SST is sensitive to differences in social intelligence among adults with ASD or other developmental disorders. SST scores have been found to correlate positively with existing measures of emotion perception and emotion understanding, even after controlling for differences in verbal and non-verbal ability. However, this research has only been conducted with samples of typically functioning adults, such as undergraduate college students (Brown et al., 2021) and crowdsourced workers from Amazon Mechanical Turk (Brown et al., 2019). As with the SST, there is promising evidence of convergent validity between SAT-MC scores and other measures of SI and standardized assessments of social functioning (Altschuler et al., 2018; Johannesen et al., 2018). Although no study has administered both tests together, both the SST and the SAT-MC have been found to correlate strongly with the RMET (Brown et al., 2019; Lee et al., 2018). Moreover, research suggests that the SAT-MC is sensitive to differences in SI among adults and children based on diagnoses of ASD or schizophrenia (Burger-Caplan et al., 2016; Pinkham et al., 2017a).

Present Study

We designed the present study to investigate whether the SST can be self-administered remotely to measure social intelligence among typically functioning adults and adults with ASD. Based on similarities in test content and past convergent validity evidence, we expect that the SST and the SAT-MC each measure the same underlying construct of SI. Therefore, we expect that adults with ASD should score lower on the SST than unaffected adults (UA). In line with past research, we investigate whether the SST is sensitive to differences in SI associated with diagnoses of ASD or Asperger's disorder (AS; e.g., Morrison et al., 2019). To test this hypothesis, we conduct two studies in which we recruit adults with an ASD diagnosis to complete the SST using a self-administered, online survey. In each study we compare these results to test scores from samples of typically developing adults. We also examine whether SST items function similarly for adults with or without an ASD diagnosis, and we conduct a third study to determine whether an abbreviated form of the SST has psychometric properties similar to the full version.

Study 1

Methods

Participants

We recruited a total of 563 participants for Study 1 from the Simons Foundation Powering Autism Research for Knowledge (SPARK; SPARK Consortium, 2018) cohort, which consists of individuals and families who have agreed to participate in autism research. A broader description of adults in the SPARK cohort was recently reported by Fombonne et al. (2020). All participants were given a $10 Amazon gift card for completing the study. Participants included independent adults who self-reported a diagnosis of autism spectrum disorder (ASD), autistic disorder, or Asperger's disorder (AS) (n = 261) as well as unaffected adults (UA) who are parents of one or more children with an ASD or AS diagnosis but have never been given a diagnosis themselves (n = 302). Although diagnosis history was collected using self- or parent-reports rather than direct clinical evaluation, past research has found that this method yields reliable accounts of autism diagnoses in other research registries (Daniels et al., 2012). Although the UA may also have a genetic predisposition to ASD, unaffected first-degree relatives have been observed to perform better on social cognition tasks relative to individuals with clinical diagnoses in prior research (e.g., Lavoie et al., 2013). Therefore, we used the UA sample as a control group.

Prior to data analysis, we removed participants who had a median response time of less than 15 seconds per SST item. This threshold was chosen to increase the likelihood that participants watched the entire video for each item and to remove potential cases of non-purposeful responding (21 of the 23 SST videos are 15 seconds or longer). A greater proportion of UA participants (21%) were removed based on this threshold than participants with either an ASD or AS diagnosis (3%). We also removed participants who failed to respond correctly to all four attention-check items. In each such item, participants were asked to watch a different shape animation and to identify which of four shapes did not appear in the video. The attention-check items depend only on basic cognitive processes (e.g., vision, attention, memory) and should not require social intelligence to solve. More than 90% of participants in each of the diagnosis conditions correctly identified the missing shape in all four items. Finally, we removed 61 cases in which participants reported receiving an ASD or AS diagnosis after the age of 30. This left us with a total sample of n = 367 participants (ASD n = 68; AS n = 79; UA n = 220).
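The three screening rules described above can be sketched as a simple filter. This is an illustrative reconstruction in Python, not the authors' code; the argument names and per-participant data layout are assumptions.

```python
from statistics import median

MIN_MEDIAN_RT = 15.0  # seconds; 21 of the 23 SST videos run 15 s or longer

def keep_participant(item_rts, attention_correct, age_at_diagnosis=None):
    """Return True if a participant passes all three screening rules:
    median per-item response time, all four attention checks, and no
    self-reported diagnosis after age 30 (None = no diagnosis)."""
    if median(item_rts) < MIN_MEDIAN_RT:   # likely did not watch the videos
        return False
    if not all(attention_correct):         # missed an attention-check item
        return False
    if age_at_diagnosis is not None and age_at_diagnosis > 30:
        return False
    return True
```

For example, a participant whose median item response time is 10 seconds is excluded regardless of attention-check performance.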

A full summary of the key demographic variables for each group is reported in Table 1.

Participants in the UA group were older on average compared to the ASD and AS groups, F(2,364) = 67.73, p < .001. We also observed significant differences in educational attainment among the three groups, F(2,364) = 10.18, p < .001. There were no significant group differences in sex or race/ethnicity.

Procedure

All study materials were presented in an online survey. Each participant was emailed a unique link to the study. Participants were given a brief set of instructions and a practice item before beginning the SST. Afterwards, participants completed several demographic items regarding their geographical location, educational attainment, self-identified race/ethnicity, and approximate annual income. Participant sex and age were provided by the SPARK Consortium. SST scores were calculated as the simple sum of correct responses across the 23 items. We observed adequate internal consistency for the SST within the ASD (α = .78), AS (α = .71), and UA (α = .73) groups. We also observed similar mean corrected item-total correlations (CITC) for SST items within each of the three groups (ASD: CITC = .33; AS: CITC = .26; UA: CITC = .29), which indicates that individual items discriminate between high and low ability equally well across groups.
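For readers who wish to reproduce these reliability statistics, Cronbach's α and the corrected item-total correlation can be computed directly from a 0/1 item-response matrix. The sketch below is generic Python (the original analyses were run in R); the respondents × items data layout is an assumption.

```python
from statistics import variance, mean

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of 0/1 scores."""
    k = len(scores[0])
    items = list(zip(*scores))                      # transpose to items x respondents
    item_var = sum(variance(col) for col in items)  # sum of per-item variances
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

def pearson(x, y):
    """Pearson correlation, hand-rolled to avoid version dependencies."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(scores, item):
    """Correlate one item with the total score excluding that item."""
    item_col = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    return pearson(item_col, rest)
```

On dichotomously scored items such as the SST's, this α formula is equivalent to KR-20.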

Statistical Analysis

All analyses were performed using R version 3.6.3. We report the standardized mean difference (Cohen’s d) when interpreting differences in SST scores based on diagnosis group. We also use multiple linear regression in order to statistically control for demographic differences between the three diagnosis groups.
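As a reference for the effect sizes reported here, Cohen's d is the mean difference divided by the pooled standard deviation. A minimal sketch follows (the analyses themselves were run in R; this Python version is an illustration only):

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = variance(group1), variance(group2)   # sample variances
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd
```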

Results

To test for SST score differences between participants with an ASD or AS diagnosis and those with no diagnosis (UA), we first regressed SST scores on dummy-coded diagnosis variables. We report the descriptive statistics for all three groups in Table 2. Participants with ASD scored significantly lower on the SST relative to UA (β = –.15, p = .001, d = –0.44, 95% CI = [–0.66, –0.11]). We failed to detect any difference in SST scores between AS and UA (β = –.05, p = .93, d = –0.13, 95% CI = [–0.27, 0.25]). Next, we tested for differences between participants who reported an ASD or AS diagnosis. We found a significant difference in SST scores between the ASD and AS groups (d = –0.36, 95% CI = [–0.69, –0.04], t(143) = 2.25, p = .03). This result suggests that the SST was sensitive enough to detect differences in SI between adults reporting either an ASD or AS diagnosis.

We also tested for SST score differences after statistically controlling for differences in age, sex, educational attainment, and race/ethnicity between the three groups (Table 3). After controlling for these demographic differences, we found that participants with ASD still scored lower on the SST than the UA group (β = –.13, p = .005). No difference between the AS and UA groups could be detected (β = –.03, p = .56). Among the demographic control variables, educational attainment was positively related to SST scores (β = .28, p < .001), and participants who identified as White scored higher on the SST than all others (β = .17, p < .001).

Discussion

The results of Study 1 suggest that the SST provides an adequately reliable measure of SI for adults who have previously been diagnosed with AS or ASD. We observed a similar degree of internal consistency and comparable item statistics across all three participant groups. We also detected a moderate difference in SI between unaffected adults and adults with ASD, even after controlling for demographic differences. Yet the mean score difference associated with an ASD diagnosis (d = –0.44) was somewhat smaller than some of the estimates reported in past research for similar measures (e.g., SAT-MC; Pinkham et al., 2017a). This result may be due to the nature of our control group (unaffected parents of children with ASD) or indicative of greater SI among our participants (independent adults with ASD) compared to samples obtained in clinical settings. It could also be that measures showing larger differences rely somewhat more on verbal ability than the SST does. Therefore, we conducted a second study to see whether we could replicate these findings using an additional sample of adults with ASD recruited from a neurodevelopmental clinic.

Study 2

Methods

Participants

We recruited a second sample of 31 adults who had previously received a clinical diagnosis of ASD from a neurodevelopmental pediatrician. All participants in Study 2 had sought clinical services in the Northeastern U.S. and had consented to be contacted for ongoing research studies. Due to the smaller pool of eligible participants than in Study 1, participants were given a larger reward of a $35 Amazon gift card for completing this study. All participants were recruited online and completed the SST without a proctor or administrator. Before data analysis, six cases were removed for answering at least one attention-check item incorrectly, leaving a sample of n = 25 adults with an ASD diagnosis. All remaining participants exceeded the median response time threshold used in Study 1, and no other participants were excluded.

Participants ranged from 18 to 34 years of age (mean age = 20.8, SD = 3.9). Most participants identified as male (20/25; 80%) and as White, non-Hispanic (22/25; 88%). Based on assessment scores obtained from the electronic medical record, the average full-scale IQ score in the ASD group was 86.1 (SD = 22.4). T-scores from the Social Responsiveness Scale (SRS; Constantino et al., 2003) indicated an elevated level of autistic symptoms among most participants in the ASD group as well (M = 74.2, SD = 12.4).

To address some of the limitations of our control group in Study 1, here we relied on data collected from typically developing adult participants in two prior studies (Brown et al., in review; Brown et al., 2019). Unlike the control group in Study 1, these participants were not likely to share a potential genetic predisposition to ASD or other developmental disorders. A total of 829 participants had been recruited from undergraduate psychology courses at a public university in the Midwestern U.S. and from Amazon’s Mechanical Turk. Most participants in this control group identified as female (59%) and White, non-Hispanic (56%). Like the ASD group, all control group participants had completed the SST as part of a self-administered, online Qualtrics survey. We observed several demographic differences between our ASD and UA groups (Table 4). Participants in the UA sample were significantly older (F(1,852) = 14.95, p < .001), reported higher educational attainment (F(1,852) = 37.01, p < .001), and were more likely to identify as female, χ²(1) = 15.13, p < .001. We also observed a smaller proportion of participants who identified as White, non-Hispanic within the UA sample compared to the ASD sample (χ²(1) = 10.15, p = .001).

Procedure

All measures were administered using an online survey. Participants first completed the 23-item SST (with four interspersed attention-check items) and then responded to several demographic questions, including self-reported race/ethnicity and educational attainment. We observed adequate internal consistency for the SST in both the ASD (α = .74) and UA (α = .69) groups, and the average corrected item-total correlation was similar between groups (ASD: CITC = .30; UA: CITC = .24).

Statistical Analysis

As in Study 1, all analyses were performed using R version 3.6.3, and we used many of the same statistical analyses to test for differences in SST scores between the ASD and UA groups. We also used several statistical packages for additional exploratory analyses unique to this study. We used the "ps" function from the twang R package to compute propensity scores (McCaffrey et al., 2004). These propensity scores were calculated in an attempt to estimate SST score differences between participants with ASD and the UA participants who share the most similar demographic makeup (e.g., age, sex, race/ethnicity) with our clinical sample. We also used the difR R package to estimate logistic models testing for differential item functioning (DIF) between participants with or without an ASD diagnosis (Magis et al., 2010).

Results

We first used linear regression to test for unadjusted differences in SST scores between the ASD and UA groups (Table 5). Participants with an ASD diagnosis scored significantly lower on the SST than the UA group (β = –.07, p = .04; d = –0.42, 95% CI = [–0.82, –0.02]). We also observed practically no difference in SST scores between adults with ASD in Studies 1 and 2 (d = 0.07, 95% CI = [–0.38, 0.53], t(90) = 0.25, p = .80) or between the UA groups in Studies 1 and 2 (d = 0.08, 95% CI = [–0.07, 0.23], t(1047) = 1.04, p = .30). Again, we found that participants with ASD scored lower on the SST even after controlling for demographic differences between groups (β = –.07, p < .04). Among the demographic variables, we did not find the effects of age or sex that we observed in Study 1 (Table 6). We also failed to detect an effect of educational attainment, which was inconsistent with our findings in the prior study.

Given the fairly large demographic differences between the ASD group and our larger UA group, we conducted exploratory analyses using propensity score weighting in an attempt to better estimate SST score differences that could be attributed to diagnosis group independent of demographic factors. Propensity scores are determined by regressing group status (e.g., treatment or control condition; here, ASD or UA group) onto a set of auxiliary variables in order to minimize pre-existing differences between treatment and control groups in observational studies (Austin, 2011). This method has been widely used in medical research (e.g., Austin & Stuart, 2015), among other fields, to statistically control for potential confounding variables in non-experimental studies. We estimated propensity scores based on participant age, gender, self-reported race/ethnicity, and educational attainment using generalized boosted regression as described by McCaffrey and colleagues (2004). Given the focus of our study, we sought to estimate the average treatment effect on the treated (ATT), which attempts to determine the effect of the independent variable (here, the presence or absence of an ASD diagnosis) by comparing the "treatment" group to the subset of control group subjects who are most like those in the diagnosis condition (Austin, 2011). In this approach, propensity scores are used to identify and weight members of the UA group who most closely match members of the ASD group on the demographic variables included in the model.
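Under ATT weighting, treated cases receive weight 1 and each control receives the odds p/(1 − p) of its estimated propensity score, so the reweighted controls mirror the treated group's covariate distribution. The sketch below takes the propensity scores as given (twang estimates them via generalized boosted regression) and is a simplified illustration, not the twang implementation:

```python
def att_weights(propensity, treated):
    """ATT weights: treated units get weight 1; each control gets the
    odds p/(1-p) of its propensity score."""
    return [1.0 if t else p / (1.0 - p) for p, t in zip(propensity, treated)]

def weighted_mean_diff(outcome, treated, weights):
    """Weighted treated-minus-control difference in mean outcome."""
    def wmean(vals, ws):
        return sum(v * w for v, w in zip(vals, ws)) / sum(ws)
    y_treated = wmean([y for y, t in zip(outcome, treated) if t],
                      [w for w, t in zip(weights, treated) if t])
    y_control = wmean([y for y, t in zip(outcome, treated) if not t],
                      [w for w, t in zip(weights, treated) if not t])
    return y_treated - y_control
```

Dividing the weighted difference by a pooled standard deviation then yields a weighted standardized effect size.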

After weighting via this procedure, we found that participants with ASD scored lower on the SST relative to matched controls, but this difference did not reach statistical significance, d = –0.27, t = 1.05, p = .30. However, when reviewing the descriptive statistics after weighting, we found that our weighted sample size for the control condition was rather small (only n = 27, or 3% of the full UA sample). Among participants with an ASD diagnosis, 72% reported a high school degree or some high school as their highest level of education so far. In contrast, only 12% of participants in the UA group reported this level of education. Due to these disparities between groups, as well as meta-analytic evidence for a positive relationship between SI and academic performance (e.g., MacCann et al., 2020; Sánchez-Álvarez et al., 2020), we also estimated propensity weights without including educational attainment in the model. This nearly doubled the sample size of the weighted control group (n = 51) while maintaining no significant differences between the ASD and UA groups on the other demographic variables. When using these adjusted propensity score weights, we detected a significant difference between the ASD and UA groups, d = –0.52, t = 2.20, p = .03.

Taken together, these findings add some support for the ability of the SST to detect differences in social intelligence between typically functioning adults and adults with an ASD diagnosis.

Lastly, we tested for DIF between the UA and ASD groups for each of the 23 SST items (Table 7). An example of DIF is an item that is less difficult for members of one group (e.g., unaffected adults) than for members of a comparison group (e.g., adults with ASD) who otherwise have an identical ability level. Such an item is biased in favor of one group, and the resulting group differences are a function of the item's properties rather than a demonstration of group differences in the underlying construct of interest. DIF can be uniform, where the difference between groups is consistent across the ability spectrum, or nonuniform, where there is an interaction between group status and ability level (Swaminathan & Rogers, 1990). It is important to rule out DIF in order to determine whether observed score differences between groups reflect true differences in the construct of interest and not differences in the test's measurement properties (Vandenberg & Lance, 2000). DIF testing has been used in prior studies to examine whether instruments function similarly across demographic groups based on factors such as gender, race, and ethnicity (e.g., Harrison et al., 2017). For this analysis, we pooled the data from Studies 1 and 2 to provide the maximum degree of statistical power to detect differential functioning between groups. None of the 23 SST items displayed evidence of either uniform or nonuniform DIF. Based on these results, we conclude that the SST provides an equivalent measure of social intelligence for adults with or without an ASD diagnosis. The rank order of item difficulty (percent correct) within the SST was also highly consistent between the ASD and UA samples (r = .97, p < .001). We also observed a positive correlation between the corrected item-total correlations from the ASD and UA samples (r = .76, p < .001), which indicates that the relative contribution of each SST item to the total score was consistent regardless of diagnosis status. Therefore, differences in SST scores between the ASD and UA groups are more likely to be explained by differences in social intelligence than by an artifact of DIF.
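The logistic DIF procedure compares nested logistic regressions of each item response on total score, with and without a group term (and, for nonuniform DIF, a group × total interaction). The following is a hand-rolled, self-contained sketch of the uniform-DIF likelihood-ratio comparison, not difR's implementation:

```python
import math

def _prob(beta, xi):
    """Logistic probability with the linear predictor clamped for stability."""
    z = sum(b * x for b, x in zip(beta, xi))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=5000, lr=0.1):
    """Minimal logistic regression via gradient ascent; returns the
    coefficient vector and the maximized log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            resid = yi - _prob(beta, xi)
            for j in range(k):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    ll = sum(yi * math.log(_prob(beta, xi)) +
             (1 - yi) * math.log(1 - _prob(beta, xi))
             for xi, yi in zip(X, y))
    return beta, ll

def uniform_dif_stat(item, total, group):
    """Likelihood-ratio statistic (1 df): does group membership predict
    item responses after conditioning on total score?"""
    base = [[1.0, t] for t in total]                       # ability only
    full = [[1.0, t, g] for t, g in zip(total, group)]     # ability + group
    _, ll0 = fit_logistic(base, item)
    _, ll1 = fit_logistic(full, item)
    return 2.0 * (ll1 - ll0)
```

The statistic is referred to a χ² distribution with 1 df (critical value 6.63 at p = .01); adding a t × g column to `full` would test for nonuniform DIF.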

Discussion

Not only did we replicate our initial finding of SST score differences between UA adults and a new sample of adults with a clinical diagnosis of ASD, but we also found meaningful effects of ASD on SST scores after controlling for demographic group differences using propensity score weighting. Furthermore, we combined our ASD groups from Studies 1 and 2 to test for DIF relative to a large, normative sample of typically developing adults. The results of this analysis indicate that the SST items are equally valid measures of SI for adults with or without an ASD diagnosis. This also suggests that the observed score differences between groups can be attributed to differences in SI and are not due to item or measurement biases.

Study 3

Although the 23-item SST typically takes roughly 15 minutes to complete, this may be too long for some studies involving a series of different assessments and measures. Prior studies have developed shorter versions of commonly used SI tests (e.g., the short-form Reading the Mind in the Eyes Test; Olderbak et al., 2015) to accommodate researchers who wish to include performance measures of SI without making the length of the study discouraging or prohibitive to prospective participants. Therefore, we conducted Study 3 to develop an abbreviated form of the SST that could be used when a shorter administration time is needed while retaining most of the psychometric properties of the full 23-item version.

Methods

Participants

We recruited 461 U.S.-based adults using Prolific (Peer et al., 2017). Participants were each paid $4.70 for completing the study. We did not screen participants for any history of ASD diagnoses but assume that most participants in this sample are within a typical range of development, as others have found for similar crowdsourcing sites (e.g., Amazon Mechanical Turk; McCredie & Morey, 2019). We removed 39 participants who failed one or more attention check items or did not meet our median response time criteria, resulting in a final sample of 422 participants. Most participants identified as White (51%) and female (52%). The average participant age was 31.7 years old (SD = 11.0). Data from this study were also reported by Heck et al. (2021), but there is no overlap in research questions or primary analyses between that publication and this one.

Procedure

All measures in Study 3 were administered using an online survey hosted by Qualtrics. Participants completed all tasks in the same, fixed order. After completing the social intelligence and general intelligence tasks described below, participants were also asked to complete several self-report personality and confidence measures which were not relevant to the research questions asked here. The median time to complete the study was 31 minutes.

Measures

We created a short, ten-item SST form based on an item analysis of the full 23-item version using item-level data reported by Brown et al. (2019). We selected items based on difficulty (percent correct) and corrected item-total correlations, as suggested by Stanton et al. (2002). We also considered the differences in item difficulty between UA and ASD groups reported in Table 7. This short form had only slightly lower internal consistency (α = .69) than the estimates found for the full SST in Studies 1 and 2.
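The two selection statistics used above, item difficulty (proportion correct) and the corrected item-total correlation (the item's correlation with the total score excluding that item), are straightforward to compute. The sketch below is purely illustrative, assuming dichotomously scored responses; it is not the analysis code used in this study.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_stats(responses):
    """Per-item difficulty (proportion correct) and corrected item-total correlation.

    responses: one list of 0/1 item scores per participant (toy format).
    """
    n_items = len(responses[0])
    out = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        rest = [sum(r) - r[j] for r in responses]  # total score excluding item j
        out.append((mean(item), pearson(item, rest)))
    return out
```

Short-form construction then amounts to ranking items on these statistics and retaining a subset that spans a range of difficulties while keeping the corrected item-total correlations high, in the spirit of Stanton et al. (2002).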

In addition to the SST, all participants completed ten-item short form versions of several social and general intelligence measures. Ten Reading the Mind in the Eyes Test (RMET; Baron-Cohen et al., 2001a) items were selected based on a short form developed by Olderbak et al. (2015). A ten-item version of the Situational Test of Emotional Understanding (STEU; MacCann & Roberts, 2008) was created using item analysis data reported by Brown et al. (2019). Both short forms yielded acceptable internal consistency (RMET α = .66; STEU α = .64). We measured verbal ability using the ten-item Wordsum vocabulary test (α = .78). We also measured abstract reasoning using a ten-item test with matrix items similar to those used in Raven’s Progressive Matrices (α = .72).
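The internal consistency estimates reported throughout (e.g., α = .66, .64, .78, .72 above) are Cronbach's alpha coefficients, which can be computed directly from item-level data. A minimal sketch, assuming dichotomous items stored one row per participant; the function is illustrative rather than the code used here.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha from per-participant lists of item scores."""
    k = len(responses[0])
    item_cols = [[r[j] for r in responses] for j in range(k)]
    sum_item_var = sum(variance(col) for col in item_cols)  # sample item variances
    total_var = variance([sum(r) for r in responses])       # variance of total scores
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)
```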

Results

Notwithstanding the slight decrease in internal consistency relative to the full 23-item SST, scores on the 10-item short form SST were positively related to scores on the RMET (r = .53, p < .001) and the STEU (r = .53, p < .001). We also found that SST performance (as percent correct of total) among unaffected adults was similar for the full 23-item version (M = 69% in Study 2) and the shortened, 10-item form (M = 70%). We report descriptive statistics for the short form SST in Table 8. To test for incremental validity, we regressed SST scores onto both RMET and STEU scores after controlling for verbal ability and abstract reasoning. Together, the two SI tests predicted an additional 11% of variance in SST scores (ΔR² = .11, F(Δdf = 2) = 40.17, p < .001). Additionally, the RMET (β = .25, p < .001) and STEU (β = .22, p < .001) were each significant predictors of SST scores beyond the effects of the general intelligence measures. These results suggest that the 10-item version of the SST retains much of the convergent validity demonstrated in previous research using the full 23-item version (Brown et al., 2019).
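The incremental validity test above is a hierarchical regression: fit a baseline model with the general intelligence controls, fit a full model that adds the SI predictors, and take the gain in R² as ΔR². A minimal, self-contained Python sketch of that logic follows; the ordinary least squares solver and example data are illustrative and not the analysis code used in this study.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(X, y):
    """R^2 from OLS with an intercept; X is a list of predictor rows."""
    Xd = [[1.0] + row for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    yhat = [sum(b * x for b, x in zip(beta, row)) for row in Xd]
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def delta_r2(X_base, X_full, y):
    """Incremental variance explained by the predictors added in the full model."""
    return r_squared(X_full, y) - r_squared(X_base, y)
```

In the study's design, X_base would hold the verbal ability and abstract reasoning scores and X_full would add the RMET and STEU scores, with the SST total as y.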

We compared the short form results from this sample of unaffected adults (UA) to the combined ASD samples from Studies 1 and 2 (n = 93). For this combined sample of adults with ASD, we scored their performance on the ten-item subset of the full SST that they completed. Compared to the unaffected adult sample, the short form SST demonstrated a similar level of internal consistency among ASD participants (α = .65) and similar corrected item-total correlations among the ten items (mean CITC = .32). We also found that participants with ASD scored lower on the short form SST than the UA sample (d = –0.24, 95% CI = [–0.47, –0.02], t(513) = 2.13, p = .03). These results suggest that the short form SST may provide a valid measure of SI for studies where the full 23-item version cannot be administered.
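The standardized mean difference and interval reported here are a pooled-SD Cohen's d with an approximate confidence interval. The sketch below shows one common large-sample formulation; it is illustrative (the example arrays are invented) and uses a normal-theory approximation for the interval rather than the exact method a stats package might apply.

```python
import math
from statistics import mean, stdev

def cohens_d_ci(a, b, z=1.96):
    """Pooled-SD Cohen's d with a large-sample 95% confidence interval."""
    na, nb = len(a), len(b)
    sp = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                   / (na + nb - 2))
    d = (mean(a) - mean(b)) / sp
    # Common approximation for the sampling variance of d.
    se = math.sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
    return d, (d - z * se, d + z * se)
```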

General Discussion

Our study is the first to explore how adults with ASD perform on a self-administered, online SI test compared to typically developing adults. We found that the SST and its items functioned equally well for both groups of participants and that the test detected modest group mean differences due to ASD. We provide a histogram of SST scores for participants in the UA and ASD groups in Studies 1 and 2 (Figure 2). We also failed to detect any consistent evidence for DIF compared to a large normative sample of 1,049 unaffected adults. Even in Study 2, where our ASD sample was characterized by below-average IQ scores, participants were able to complete the SST online without the presence of a test proctor or clinician. These results suggest that the SST holds promise as a valid, online, remote assessment of SI for adults in clinical or subclinical populations. Whereas self- or observer-reported measures of autistic traits have been found to correlate with personality traits in non-clinical samples (Ingersoll et al., 2011; Schwartzman et al., 2016), past research indicates that SST scores are practically unrelated to personality (Brown et al., 2019). This suggests that the SST can be used as a complement to many of the popular existing self- and observer-reported measures.

These findings are especially promising given the growing need for valid, online, self-administered assessments. Although past research has documented the development of internet-based general cognitive ability or intelligence tests (e.g., Brown & Grossenbacher, 2017; Sliwinski et al., 2018; Wright, 2020), few objective SI tests besides the RMET have been used online without a proctor. Even though our participants completed the SST outside of a clinical setting and on their own devices (e.g., tablet, laptop, or desktop computer), we did not detect any degradation in measurement precision or item validity. Based on these results, the SST appears useful for assessing SI while allowing participants to complete the test remotely without having to travel to a clinic or research site. We also designed a 10-item short form of the SST that retains modest reliability and demonstrates similar validity evidence. This may help researchers recruit larger samples or may make participating in research studies more accessible to potential participants. Based on findings from recent research, the use of animation in the SST may also create a more engaging and enjoyable experience for participants relative to text-based assessments (Karakolidis et al., 2021).

Implications and Directions for Future Research

Even though the SST was not explicitly designed to detect ASD or other developmental disorders, or to quantify traits related to ASD, the test does appear to be somewhat sensitive to differences in SI due to ASD. We found modest mean score differences on the SST between adults with and without a prior diagnosis of ASD, consistent with previous studies using different measures of SI. Our estimates were only slightly lower than the difference between patients with schizophrenia and controls reported for the SAT-MC by Pinkham et al. (2017a; d = 0.64). However, larger score differences have been reported for the SAT-MC among children with or without ASD (d = 0.88; Burger-Caplan et al., 2016) and for other animated shape tasks (Bliksted et al., 2016). Likewise, other SI tests like the RMET have generally displayed larger mean differences due to ASD or schizophrenia diagnoses (Morrison et al., 2019; Peñuelas-Calvo et al., 2019). Future test development work may help improve the sensitivity of the SST. Item results in Table 7 indicate that the largest score differences were found for items designed as false belief tasks (SST 18 and 23), which likely stress Theory of Mind capacities. In each of these items, participants were asked to take the perspective of one of the shape characters (e.g., “Who does X think is in the house at the end of the video?”). These results suggest that this item format may be particularly effective for differentiating between adults with or without an ASD diagnosis and may be used to further optimize the SST for this purpose.

Another important avenue for future research is to determine the boundary conditions for administering the SST online. In our samples, adults with ASD were able to complete the SST outside of a controlled research or clinic setting. However, many of these adults appear to be relatively high functioning, based on their self-reported educational attainment. Future studies should seek to identify criteria that would help researchers determine whether a participant could be expected to provide valid responses in a self-administered, online assessment. Likewise, our samples only included participants who were 18 years of age or older, even though animated shape tasks have been used to measure SI among children and adolescents (Altschuler et al., 2018; Burger-Caplan et al., 2016; Salter et al., 2008). Given that the SST items were designed to require as little reading as possible, and thus to be more independent of verbal ability or language skills, we expect that the test can be used in younger populations, but this has yet to be explored empirically. Thus, further research is needed to observe how this test functions when administered to younger participants in clinical or nonclinical populations.

Future research is also needed to examine the heritability and genetic predictors of SST scores. Twin studies have estimated genetic contributions to individual differences in social cognition (Isaksson et al., 2019) and measures of social functioning (Constantino & Todd, 2003). More specifically, a recent study reported a heritability estimate of 28% for performance on the RMET (Warrier et al., 2018). We would expect to find similar heritability estimates for the SST and similar animated shape tasks based on their convergent validity with the RMET, but this has yet to be empirically observed. This work could potentially identify whether different measures of SI share common genetic influences and how distinct those influences may be from those which predict performance on more general intelligence or cognitive tests.

Study Limitations

We observed that the SST provided adequate, but not ideal, internal consistency for adults with or without ASD (α < .80). Based on these results, the current version of the SST is best suited for research purposes, where even modest reliability may be sufficient for detecting true effects (Schmitt, 1996), but it is not reliable enough for high-stakes, diagnostic use, where Nunnally (1978) suggests a threshold of α ≥ .90. There is also currently only one SST form, so future research should develop a parallel test form and demonstrate test-retest reliability among clinical and nonclinical samples. Pinkham et al. (2017a) found that a similar animated shape task, the SAT-MC, had relatively weak test-retest reliability, due in part to a poorly performing alternative test form. This level of reliability may have attenuated the observed mean differences between adults with or without ASD. Another limitation is that we did not obtain cognitive or intelligence test scores for all participants in Studies 1 and 2. Therefore, we were only able to control for coarse-grained educational attainment and other demographic differences when testing for score differences due to ASD.
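One way to see how imperfect reliability caps observable group differences is the classical correction for attenuation, under which the disattenuated effect is the observed effect divided by the square root of the test's reliability. The sketch below applies that standard approximation, plugging in this paper's short-form values (d = 0.24, α = .69) purely as an illustration; the paper itself does not report this corrected estimate.

```python
import math

def disattenuated_d(d_observed, reliability):
    """Classical correction for attenuation: the group difference a perfectly
    reliable version of the test would be expected to show (approximation)."""
    return d_observed / math.sqrt(reliability)
```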

Conclusions

Across two studies, we detected differences in SI between typically developing adults and adults with ASD using a remotely administered, freely available online test. In both instances, we found that the SST was equally reliable for those with or without ASD. We also found no instances of differential item functioning based on ASD diagnosis. These results indicate that the SST is a promising tool for measuring SI, especially in situations where in-person, on-site assessments are either impractical or not possible. We also identified a shorter, 10-item version of the test which can be used in place of the full version. Although future research is needed to further optimize the SST and boost its reliability for clinical purposes, this tool may help researchers obtain a quantitative measure of SI while avoiding some of the practical limitations of other existing instruments.

References

Abell, F., Happé, F., & Frith, U. (2000). Do triangles play tricks? Attribution of mental states to

animated shapes in normal and abnormal development. Cognitive Development, 15, 1–16.

https://doi.org/10.1016/S0885-2014(00)00014-9

Abrahams, L., Pancorbo, G., Primi, R., Santos, D., Kyllonen, P., John, O. P., & De Fruyt, F.

(2019). Social-emotional skill assessment in children and adolescents: Advances and

challenges in personality, clinical, and educational contexts. Psychological Assessment,

31, 460-473. https://doi.org/10.1037/pas0000591

Altschuler, M., Sideridis, G., Kala, S., Warshawsky, M., Gilbert, R., Carroll, D., Burger-Caplan,

R., & Faja, S. (2018). Measuring individual differences in cognitive, affective, and

spontaneous theory of mind among school-aged children with autism spectrum disorder.

Journal of Autism and Developmental Disorders, 48, 3945–3957.

https://doi.org/10.1007/s10803-018-3663-1

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of

confounding in observational studies. Multivariate Behavioral Research, 46, 399–424.

https://doi.org/10.1080/00273171.2011.568786

Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse

probability of treatment weighting (IPTW) using the propensity score to estimate causal

treatment effects in observational studies. Statistics in Medicine, 34, 3661–3679.

https://doi.org/10.1002/sim.6607

Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001a). The “Reading the

Mind in the Eyes” test revised version: A study with normal adults, and adults with

Asperger syndrome or high-functioning autism. Journal of Child Psychology and

Psychiatry, 42, 241–251. https://doi.org/10.1111/1469-7610.00715

Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001b). The Autism-

Spectrum Quotient (AQ): Evidence from Asperger syndrome/high-functioning autism,

males and females, scientists and mathematicians. Journal of Autism and Developmental

Disorders, 31, 5–17. https://doi.org/10.1023/A:1005653411471

Bell, M. D., Fiszdon, J. M., Greig, T. C., & Wexler, B. E. (2010). Social attribution test –

multiple choice (SAT-MC) in schizophrenia: Comparison with community sample and

relationship to neurocognitive, social cognitive and symptom measures. Schizophrenia

Research, 122, 164–171. https://doi.org/10.1016/j.schres.2010.03.024

Biagianti, B., Fisher, M., Brandrett, B., Schlosser, D., Loewy, R., Nahum, M., & Vinogradov, S.

(2019). Development and testing of a web-based battery to remotely assess cognitive

health in individuals with schizophrenia. Schizophrenia Research, 208, 250–257.

https://doi.org/10.1016/j.schres.2019.01.047

Brown, M. I., & Grossenbacher, M. A. (2017). Can you test me now? Equivalence of GMA tests

on mobile and non-mobile devices. International Journal of Selection and Assessment,

25, 61–71. https://doi.org/10.1111/ijsa.12160

Brown, M. I., Ratajska, A., Hughes, S. L., Fishman, J. B., Huerta, E., & Chabris, C. F. (2019).

The social shapes test: A new measure of social intelligence, mentalizing, and theory of

mind. Personality and Individual Differences, 143, 107–117.

https://doi.org/10.1016/j.paid.2019.01.035

Brown, M. I., Speer, A. B., Tenbrink, A. P., & Shihadeh, M. (2021). Investigating group

differences in verbal and non-verbal social intelligence. Poster presented at the 35th

Meeting for the Society for Industrial and Organizational Psychology; virtual conference.

Burger-Caplan, R., Saulnier, C., Jones, W., & Klin, A. (2016). Predicting social and

communicative ability in school-age children with autism spectrum disorder: A pilot

study of the Social Attribution Task, Multiple Choice. Autism, 20, 952–962.

https://doi.org/10.1177/1362361315617589

Casaletto, K. B., & Heaton, R. K. (2017). Neuropsychological assessment: Past and future.

Journal of the International Neuropsychological Society, 23, 778–790.

https://doi.org/10.1017/S1355617717001060

Constantino, J. N., Davis, S. A., Todd, R. D., Schindler, M. K., Gross, M. M., Brophy, S. L., ...

& Reich, W. (2003). Validation of a brief quantitative measure of autistic traits:

Comparison of the social responsiveness scale with the autism diagnostic interview-

revised. Journal of Autism and Developmental Disorders, 33, 427–433.

https://doi.org/10.1023/A:1025014929212

Constantino, J. N., & Todd, R. D. (2003). Autistic traits in the general population. Archives of

General Psychiatry, 60, 524–530. https://doi.org/10.1001/archpsyc.60.5.524

Cor, M. K., Haertel, E., Krosnick, J. A., & Malhotra, N. (2012). Improving ability measurement

in surveys by following the principles of IRT: The Wordsum vocabulary test in the

General Social Survey. Social Science Research, 41, 1003–1016.

https://doi.org/10.1016/j.ssresearch.2012.05.007

Daniels, A. M., Rosenberg, R. E., Anderson, C., Law, J. K., Marvin, A. R., & Law, P. A. (2012).

Verification of parent-report of child autism spectrum disorder diagnosis to a web-based

autism registry. Journal of Autism and Developmental Disorders, 42, 257–265.

https://doi.org/10.1007/s10803-011-1236-7

Dodell-Feder, D., Ressler, K. J., & Germine, L. T. (2020). Social cognition or social class and

culture? On the interpretation of differences in social cognitive performance.

Psychological Medicine, 50, 133–145. https://doi.org/10.1017/S003329171800404X

Fombonne, E., Green Snyder, L., Daniels, A., Feliciano, P., Chung, W., & SPARK Consortium.

(2020). Psychiatric and medical profiles of autistic adults in the SPARK cohort. Journal

of Autism and Developmental Disorders, 50, 3679–3698. https://doi.org/10.1007/s10803-

020-04414-6

Harrison, A. J., Long, K. A., Tommet, D. C., & Jones, R. N. (2017). Examining the role of race,

ethnicity, and gender on social and behavioral ratings within the Autism Diagnostic

Observation Schedule. Journal of Autism and Developmental Disorders, 47, 2770–2782.

https://doi.org/10.1007/s10803-017-3176-3

Heck, P. R., Brown, M. I., & Chabris, C. F. (2021). Socially (un)skilled and unaware of it: A

robust negative relationship between self-report and performance measures of social

intelligence. Unpublished manuscript.

Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. American

Journal of Psychology, 57, 243–259. https://doi.org/10.2307/1416950

Hurley, R. S. E., Losh, M., Parlier, M., Reznick, J. S., & Piven, J. (2007). The broad autism

phenotype questionnaire. Journal of Autism and Developmental Disorders, 37, 1679–

1690. https://doi.org/10.1007/s10803-006-0299-3

Ingersoll, B., Hopwood, C. J., Wainer, A., & Donnellan, M. B. (2011). A comparison of three

self-report measures of the broader autism phenotype in a non-clinical sample. Journal of

Autism and Developmental Disorders, 41, 1646–1657. https://doi.org/10.1007/s10803-

011-1192-2

Isaksson, J., Westeinde, A. V., Cauvet, E., Kuja-Halkola, R., Lundin, K., Neufeld, J., Willfors,

C., & Bolte, S. (2019). Social cognition in autism and other neurodevelopmental

disorders: A co-twin control study. Journal of Autism and Developmental Disorders, 49,

2838–2848. https://doi.org/10.1007/s10803-019-04001-4

Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect

size measure with logistic regression procedure for DIF detection. Applied Measurement

in Education, 14, 329–349. https://doi.org/10.1207/S15324818AME1404_2

Johannesen, J. K., Fiszdon, J. M., Weinstein, A., & Ciosek, D. (2018). The Social Attribution

Task – Multiple Choice (SAT-MC): Psychometric comparison with social cognitive

measures for schizophrenia research. Psychiatry Research, 262, 154–161.

https://doi.org/10.1016/j.psychres.2018.02.011

Karakolidis, A., O’Leary, M., & Scully, D. (2021). Animated videos in assessment: Comparing

validity evidence from and test-takers’ reactions to an animated and a text-based

situational judgment test. International Journal of Testing, 21, 57–79.

https://doi.org/10.1080/15305058.2021.1916505

Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2021). Sty in the mind’s eye: A meta-analytic

investigation of the nomological network and internal consistency of the “Reading the

Mind in the Eyes” test. Assessment. Advance online publication.

https://doi.org/10.1177/1073191121996469

Klin, A. (2000). Attributing social meaning to ambiguous visual stimuli in higher-functioning

autism and Asperger syndrome: The social attribution task. Journal of Child Psychiatry

and Allied Disciplines, 41, 831–846. https://doi.org/10.1111/1469-7610.00671

Lavoie, M. A., Plana, I., Lacroix, J. B., Godmaire-Duhaime, F., Jackson, P. L., & Achim, A. M.

(2013). Social cognition in first-degree relatives of people with schizophrenia: A meta-

analysis. Psychiatry Research, 209, 129–135.

https://doi.org/10.1016/j.psychres.2012.11.037

Lee, H. S., Corbera, S., Poltorak, A., Park, K., Assaf, M., Bell, M. D., Wexler, B. E., Cho, Y. I.,

Jung, S., Brocke, S., & Choi, K. H. (2018). Measuring theory of mind in schizophrenia

research: Cross-cultural validation. Schizophrenia Research, 201, 187–195.

https://doi.org/10.1016/j.schres.2018.06.022

Livingston, L. A., Carr, B., & Shah, P. (2019). Recent advances and new directions in measuring

Theory of Mind in autistic adults. Journal of Autism and Developmental Disorders, 49,

1738–1744. https://doi.org/10.1007/s10803-018-3823-3

Livingston, L. A., & Happé, F. (2017). Conceptualising compensation in neurodevelopmental

disorders: Reflections from autism spectrum disorder. Neuroscience and Biobehavioral

Reviews, 80, 729–742. https://doi.org/10.1016/j.neubiorev.2017.06.005

Ludwig, N. N., Hecht, E. E., King, T. Z., Revill, K. P., Moore, M., Fink, S. E., & Robins, D. L.

(2020). A novel social attribution paradigm: The Dynamic Interacting Shape Clips

(DISC). Brain and Cognition, 138, 105507. https://doi.org/10.1016/j.bandc.2019.105507

MacCann, C., Jiang, Y., Brown, L. E., Double, K. S., Bucich, M., & Minbashian, A. (2020).

Emotional intelligence predicts academic performance: A meta-analysis. Psychological

Bulletin, 146, 150–186. https://doi.org/10.1037/bul0000219

MacCann, C., & Roberts, R. D. (2008). New paradigms for assessing emotional intelligence:

Theory and data. Emotion, 8, 540–551. https://doi.org/10.1037/a0012746

Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R

package for the detection of dichotomous differential item functioning. Behavior

Research Methods, 42, 847–862. https://doi.org/10.3758/BRM.42.3.847

Magis, D., Raîche, G., Béland, S., & Gérard, P. (2011). A generalized logistic regression

procedure to detect differential item functioning among multiple groups. International

Journal of Testing, 11, 365–386. https://doi.org/10.1080/15305058.2011.602810

Martinez, G., Mosconi, E., Daban-Huard, C., Parellada, M., Fananas, L., Gaillard, R., Fatjo-

Vilas, M., Krebs, M. O., & Amado, I. (2019). “A circle and a triangle dancing together”:

Alteration of social cognition in schizophrenia compared to autism spectrum disorders.

Schizophrenia Research, 210, 94–100. https://doi.org/10.1016/j.schres.2019.05.043

McCaffrey, D., Ridgeway, G., & Morral, A. (2004). Propensity score estimation with boosted

regression for evaluating adolescent substance abuse treatment. Psychological Methods,

9, 403–425. https://doi.org/10.1037/1082-989X.9.4.403

McCredie, M. N., & Morey, L. C. (2019). Who are the Turkers? A characterization of MTurk

workers using the personality assessment inventory. Assessment, 26, 759–

766. https://doi.org/10.1177/1073191118760709

McDonald, S. (2012). New frontiers in neuropsychological assessment: Assessing social

perception using a standardised instrument, The Awareness of Social Inference

Test. Australian Psychologist, 47, 39–48. https://doi.org/10.1111/j.1742-

9544.2011.00054.x

Montgomery, C. B., Allison, C., Lai, M. C., Cassidy, S., Langdon, P. E., & Baron-Cohen, S.

(2016). Do adults with high functioning autism or Asperger syndrome differ in

empathy and emotion recognition? Journal of Autism and Developmental Disorders, 46, 1931–

1940. https://doi.org/10.1007/s10803-016-2698-4

Morrison, K. E., Pinkham, A. E., Kelsven, S., Ludwig, K., Penn, D. L., & Sasson, N. J. (2019).

Psychometric evaluation of social cognitive measures for adults with autism. Autism

Research, 12, 766–778. https://doi.org/10.1002/aur.2084

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.

Olderbak, S., Semmler, M., & Doebler, P. (2019). Four-branch model of ability emotional

intelligence with fluid and crystalized intelligence: A meta-analysis of relations. Emotion

Review, 11, 166–183. https://doi.org/10.1177/1754073918776776

Olderbak, S., & Wilhelm, O. (2020). Overarching principles for the organization of

socioemotional constructs. Current Directions in Psychological Science, 29, 63–70.

https://doi.org/10.1177/0963721419884317

Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015).

A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for

research and applied settings. Frontiers in Psychology, 6, 1503–1520.

https://doi.org/10.3389/fpsyg.2015.01503

Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond the Turk: Alternative

platforms for crowdsourcing behavioral research. Journal of Experimental Social

Psychology, 70, 153–163. https://doi.org/10.1016/j.jesp.2017.01.006

Peñuelas-Calvo, I., Sareen, A., Sevilla-Llewellyn-Jones, J., & Fernández-Berrocal, P. (2019).

The “Reading the Mind in the Eyes” test in autism-spectrum disorders comparison with

healthy controls: A systematic review and meta-analysis. Journal of Autism and

Developmental Disorders, 49, 1048–1061. https://doi.org/10.1007/s10803-018-3814-4

Peterson, E., & Miller, S. F. (2012). The eyes test as a measure of individual differences: How

much of the variance reflects verbal IQ? Frontiers in Psychology, 3, 220.

https://doi.org/10.3389/fpsyg.2012.00220

Pinkham, A. E., Harvey, P. D., & Penn, D. L. (2017a). Social cognition psychometric evaluation:

Results of the final validation study. Schizophrenia Bulletin, 44, 737–748.

https://doi.org/10.1093/schbul/sbx117

Pinkham, A. E., Kelsven, S., Kouros, C., Harvey, P. D., & Penn, D. L. (2017b). The effect of

age, race, and sex on social cognitive performance in individuals with schizophrenia.

Journal of Nervous Mental Disorders, 205, 346–352.

https://doi.org/10.1097/NMD.0000000000000654

Ratajska, A., Brown, M. I., & Chabris, C. F. (2020). Attributing social meaning to animated

shapes: A new experimental study of apparent behavior. American Journal of

Psychology, 133, 295–312. https://www.jstor.org/stable/10.5406/amerjpsyc.133.3.0295

Ridgeway, G., McCaffrey, D., Morral, A., Griffin, B. A., Burgette, L., & Cefalu, M. (2020).

twang: Toolkit for Weighting and Analysis of Nonequivalent Groups. R package version

1.6.

Salter, G., Seigal, A., Claxton, M., Lawrence, K., & Skuse, D. (2008). Can autistic children read

the mind of an animated triangle? Autism, 12, 349–371.

https://doi.org/10.1177/1362361308091654

Sánchez-Álvarez, N., Berrios Martos, M. P., & Extremera, N. (2020). A meta-analysis of the

relationship between emotional intelligence and academic performance in secondary

education: A multi-stream comparison. Frontiers in Psychology, 11, 1517.

https://doi.org/10.3389/fpsyg.2020.01517

Schaafsma, S. M., Pfaff, D. W., Spunt, R. P., & Adolphs, R. (2015). Deconstructing and

reconstructing theory of mind. Trends in Cognitive Sciences, 19, 65–72.

https://doi.org/10.1016/j.tics.2014.11.007

Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.

https://doi.org/10.1037/1040-3590.8.4.350

Schwartzman, B. C., Wood, J. J., & Kapp, S. K. (2016). Can the five factor model of personality

account for the variability of autism symptom expression? Multivariate approaches to

behavioral phenotyping in adult autism spectrum disorder. Journal of Autism and

Developmental Disorders, 46, 253–272. https://doi.org/10.1007/s10803-015-2571-x

Sliwinski, M. J., Mogle, J. A., Hyun, J., Munoz, E., Smyth, J. M., & Lipton, R. B. (2018).

Reliability and validity of ambulatory cognitive assessments. Assessment, 25, 14–30.

https://doi.org/10.1177/1073191116643164

Soto, C. J., Napolitano, C. M., & Roberts, B. W. (2020). Taking skills seriously: Toward an

integrative model and agenda for social, emotional, and behavioral skills. Current

Directions in Psychological Science. Advance online publication.

https://doi.org/10.1177/0963721420978613

Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for

reducing the length of self-report scales. Personnel Psychology, 55, 167–194.

https://doi.org/10.1111/j.1744-6570.2002.tb00108.x

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic

regression procedures. Journal of Educational Measurement, 27, 361–370.

https://doi.org/10.1111/j.1745-3984.1990.tb00754.x

The SPARK Consortium (2018). SPARK: A US cohort of 50,000 families to accelerate Autism

research. Neuron, 97, 488–493. https://doi.org/10.1016/j.neuron.2018.01.015

Tippins, N. T. (2015). Technology and assessment in selection. Annual Review of Organizational

Psychology and Organizational Behavior, 2, 551-582. https://doi.org/10.1146/annurev-

orgpsych-031413-091317

Türközer, H. B., & Öngür, D. (2020). A projection for psychiatry in the post-COVID-19 era:

potential trends, challenges, and directions. Molecular Psychiatry, 25, 2214–2219.

https://doi.org/10.1038/s41380-020-0841-2

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance

literature: Suggestions, practices, and recommendations for organizational research.

Organizational Research Methods, 3, 4–69. https://doi.org/10.1177/109442810031002

Velikonja, T., Fett, A. K., & Velthorst, E. (2019). Patterns of nonsocial and social cognitive

functioning in adults with autism spectrum disorder. JAMA Psychiatry, 76, 135–151.

https://doi.org/10.1001/jamapsychiatry.2018.3645

Warrier, V., Grasby, K. L., Uzefovsky, F., Toro, R., Smith, P., Chakrabarti, B. … et al. (2018).

Genome-wide meta-analysis of cognitive empathy: Heritability, and correlates with sex,

neuropsychiatric conditions and cognition. Molecular Psychiatry, 23, 1402–1409.

https://doi.org/10.1038/mp.2017.122

Weisberg, J., Milleville, S. C., Kenworthy, L., Wallace, G. L., Gotts, S. J., Beauchamp, M. S., &

Martin, A. (2012). Social perception in autism spectrum disorders: Impaired category SELF-ADMINISTERED TEST OF SOCIAL INTELLIGENCE 36

sensitivity for dynamic but not static images in ventral temporal cortex. Cerebral Cortex,

24, 37–48. https://doi.org/10.1093/cercor/bhs276

White, S. J., Coniston, D., Rogers, R., & Frith, U. (2011). Developing the Frith-Happé

animations: A quick and objective test of theory of mind for adults with autism. Autism

Research, 4, 149–154. https://doi.org/10.1002/aur.174

Wright, A. J. (2020). Equivalence of remote, digital administration and traditional, in-person

administration of the Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V).

Psychological Assessment, 32, 809–817. https://doi.org/10.1037/pas0000939

Zwickel, J., White, S. J., Coniston, D., Senju, A., & Frith, U. (2010). Exploring the building

blocks of social cognition: Spontaneous agency perception and visual perspective taking

in autism. Social Cognitive and Affective Neuroscience, 6, 564–571.

https://doi.org/10.1093/scan/nsq088


Table 1. Study 1 Participant Demographics

                  ASD (n = 68)      AS (n = 79)       UA (n = 220)
                  M       SD        M       SD        M       SD
Age               30.67   9.98      30.72   8.49      41.84   8.83
Sex               0.66    0.48      0.61    0.49      0.55    0.50
Race/Ethnicity    0.84    0.37      0.82    0.38      0.78    0.42
Education         6.18    2.33      7.28    1.96      7.59    2.34

Note: ASD = participants reporting a diagnosis of autism spectrum or autism disorder; AS = participants reporting a diagnosis of Asperger’s syndrome; UA = unaffected adults; Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Sex (1 = Female, 0 = Male); Education was self-reported using a ten-point rating scale (1 = did not attend high school, 2 = some high school, 3 = GED, 4 = high school graduate, 5 = trade or vocational school, 6 = associate degree, 7 = completed some college, 8 = currently in college, 9 = baccalaureate degree, 10 = graduate/professional degree); Group differences in age F(2, 364) = 67.73, p < .001; Group differences in sex χ²(2) = 2.92, p = .23; Group differences in race/ethnicity χ²(2) = 1.57, p = .46; Group differences in education F(2, 364) = 10.18, p < .001.

Table 2. SST Descriptive Statistics by Diagnosis Group in Study 1

              ASD (n = 68)      AS (n = 79)       UA (n = 220)
Mean          14.13             15.57             15.61
SD            4.25              3.63              3.68
α             .78               .71               .73
Mean CITC     .33               .26               .29
Cohen’s d     –0.39             –0.01
95% CI        [–0.66, –0.11]    [–0.27, 0.25]

Note. Cohen’s d = standardized mean difference relative to the unaffected adult (UA) group; ASD = autism spectrum disorder; AS = Asperger’s syndrome; α = coefficient alpha; CITC = corrected item-total correlation
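The d values and confidence intervals in Table 2 follow directly from the group summary statistics, using a pooled standard deviation and a common large-sample approximation for the standard error of d. A minimal standard-library Python sketch (the function name and choice of SE approximation are ours, not the authors'):

```python
import math

def cohens_d_ci(m1, s1, n1, m2, s2, n2, z=1.96):
    """Cohen's d (pooled SD) with an approximate 95% CI from summary stats."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Common large-sample approximation to the standard error of d
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# ASD group vs. unaffected adults (UA), summary values from Table 2
d, (lo, hi) = cohens_d_ci(14.13, 4.25, 68, 15.61, 3.68, 220)
print(round(d, 2), round(lo, 2), round(hi, 2))  # -0.39 -0.66 -0.11
```

The same call with the AS group's summary values (15.57, 3.63, 79) reproduces the second column of the table.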


Table 3. Differences in SST Scores between Diagnosis Groups in Study 1

                      B (SE)          β       p
Model 1
  ASD                 –1.48 (0.52)    –.15    <.01
  AS                  –0.04 (0.50)    –.05    .93
Model 2
  Age                 –0.03 (0.02)    –.08    .19
  Sex                 0.23 (0.39)     .03     .56
  Race/Ethnicity      1.66 (0.47)     .18     <.01
  Education           0.46 (0.08)     .28     <.01
  ASD                 –1.27 (0.54)    –.13    .02
  AS                  –0.30 (0.53)    –.03    .56

Note. N = 367; β = standardized regression coefficient; ASD = autism spectrum disorder; AS = Asperger’s syndrome; Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Sex (1 = Female, 0 = Male); Education was self-reported using a ten-point rating scale (1 = did not attend high school, 2 = some high school, 3 = GED, 4 = high school graduate, 5 = trade or vocational school, 6 = associate degree, 7 = completed some college, 8 = currently in college, 9 = baccalaureate degree, 10 = graduate/professional degree); Model 1 R2 = .02, F(2, 364) = 4.19, p = .016; Model 2 R2 = .12, F(6, 360) = 8.96, p < .001
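Standardized coefficients relate to raw coefficients via β = B × SD(x)/SD(y). As a consistency check (not the authors' computation), the Model 1 β for the ASD dummy can be reconstructed from the Table 1 group sizes and Table 2 SST summaries, recovering the overall SST standard deviation from its within- and between-group parts:

```python
import math

# (n, mean, SD) of SST scores per Study 1 group: ASD, AS, UA (Tables 1-2)
groups = [(68, 14.13, 4.25), (79, 15.57, 3.63), (220, 15.61, 3.68)]
n = sum(g[0] for g in groups)
grand = sum(g[0] * g[1] for g in groups) / n

# Total SST variance = (within-group SS + between-group SS) / (n - 1)
ss_within = sum((g[0] - 1) * g[2] ** 2 for g in groups)
ss_between = sum(g[0] * (g[1] - grand) ** 2 for g in groups)
sd_y = math.sqrt((ss_within + ss_between) / (n - 1))

# Sample SD of the ASD dummy (1 for the 68 ASD participants, else 0)
p = 68 / n
sd_x = math.sqrt(p * (1 - p) * n / (n - 1))

beta = -1.48 * sd_x / sd_y  # raw B for ASD from Table 3, Model 1
print(round(beta, 2))  # -0.15
```

The reconstructed β matches the tabled value of –.15, confirming that the standardized and unstandardized coefficients are mutually consistent.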


Table 4. Study 2 Participant Demographics

                  ASD (n = 25)      UA (n = 829)
                  M       SD        M       SD
Age               20.80   3.88      29.69   11.48
Sex               0.20    0.41      0.59    0.49
Race/Ethnicity    0.88    0.33      0.56    0.50
Education         2.20    0.58      3.26    0.86

Note: ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults; All unaffected adults were sampled from Amazon’s Mechanical Turk or undergraduate psychology students (Brown et al., 2019; Brown et al., 2021); Sex (1 = Female, 0 = Male); Race/Ethnicity (1 = identified as White, 0 = all other participants); Education was coded using a six-point rating scale (1 = less than high school, 2 = high school degree, 3 = some college, 4 = 4-year college degree, 5 = some graduate school, 6 = graduate degree); Group differences in sex χ²(1) = 15.13, p < .001; Group differences in race/ethnicity χ²(1) = 10.15, p = .001.


Table 5. SST Descriptive Statistics by Diagnosis Group in Study 2

              ASD (n = 25)      UA (n = 829)
Mean          14.44             15.89
SD            3.82              3.42
α             .75               .67
Mean CITC     .29               .24
Cohen’s d     –0.42
95% CI        [–0.82, –0.02]

Note. ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults; α = coefficient alpha; CITC = corrected item-total correlation; CI = confidence interval
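Coefficient alpha and the corrected item-total correlations (CITC) reported in Tables 2, 5, 7, and 8 are standard internal-consistency statistics. A standard-library Python sketch, applied to a small hypothetical 0/1 response matrix (made up for illustration, not real SST data):

```python
from statistics import variance

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def alpha_and_citc(rows):
    """Coefficient alpha and corrected item-total correlations.

    rows: one list per respondent, one 0/1 entry per item.
    """
    k = len(rows[0])
    items = [[r[i] for r in rows] for i in range(k)]
    totals = [sum(r) for r in rows]
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    a = k / (k - 1) * (1 - sum(variance(it) for it in items) / variance(totals))
    # CITC: correlate each item with the total score EXCLUDING that item
    citc = [pearson(items[i], [t - r[i] for r, t in zip(rows, totals)])
            for i in range(k)]
    return a, citc

# Hypothetical 4 respondents x 3 items (1 = correct, 0 = incorrect)
rows = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
a, citc = alpha_and_citc(rows)
print(round(a, 2))  # 0.75
```

The "corrected" in CITC refers to removing the item from its own total before correlating, which avoids inflating the item-total relationship.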


Table 6. Differences in SST Scores between Diagnosis Groups in Study 2

                      B (SE)          β       p
Model 1
  ASD                 –1.45 (0.70)    –.07    .04
Model 2
  Age                 0.00 (0.01)     .01     .74
  Sex                 –0.27 (0.24)    –.04    .26
  Race/Ethnicity      0.56 (0.25)     .08     .03
  Education           0.18 (0.14)     .05     .20
  ASD                 –1.51 (0.73)    –.07    .04

Note. N = 854; β = standardized regression coefficient; ASD = autism spectrum disorder; Race/Ethnicity (1 = White/non-Hispanic, 0 = all other participants); Sex (1 = Female, 0 = Male); Education was coded using a six-point rating scale (1 = less than high school, 2 = high school degree, 3 = some college, 4 = 4-year college degree, 5 = some graduate school, 6 = graduate degree); Model 1 R2 = .01, F(1, 852) = 4.32, p = .04; Model 2 R2 = .01, F(5, 848) = 2.89, p = .01


Table 7. Differential Item Functioning Results for All SST Items

           ASD (n = 93)      UA (n = 1049)     UA/ASD DIF
           M      CITC       M      CITC       Wald χ²    p
Item 1     .37    .25        .40    .12        0.72       .70
Item 2     .40    .26        .41    .26        3.49       .17
Item 3     .88    .27        .94    .28        2.37       .31
Item 4     .85    .51        .90    .28        2.15       .34
Item 5     .91    .33        .95    .24        0.02       .99
Item 6     .82    .23        .84    .26        2.31       .32
Item 7     .70    .48        .83    .35        1.23       .54
Item 8     .47    .15        .63    .13        4.14       .13
Item 9     .76    .28        .76    .24        2.74       .25
Item 10    .83    .49        .86    .35        2.45       .29
Item 11    .61    .26        .60    .09        2.46       .29
Item 12    .61    .38        .65    .33        1.90       .39
Item 13    .87    .32        .92    .25        0.02       .99
Item 14    .72    .42        .80    .28        0.39       .82
Item 15    .47    .14        .55    .14        0.47       .79
Item 16    .62    .23        .66    .25        1.79       .41
Item 17    .66    .45        .74    .29        0.82       .66
Item 18    .52    .54        .66    .45        0.51       .77
Item 19    .18    .21        .31    .12        3.23       .20
Item 20    .35    .09        .44    .13        1.16       .56
Item 21    .46    .20        .58    .32        5.04       .08
Item 22    .74    .28        .81    .28        0.81       .67
Item 23    .40    .45        .57    .36        2.35       .31

Note. M = mean correct responses; DIF = differential item functioning; All estimates were calculated using the difR package (Magis et al., 2010); Each statistic represents an omnibus test for uniform and nonuniform DIF for each item; UA = unaffected adults from Studies 1 and 2 (n = 1049); ASD = adults with an autism spectrum disorder diagnosis from Studies 1 and 2 (n = 93); CITC = corrected item-total correlation; Items included in the 10-item short form SST are shown in italics and bolded.


Table 8. SST Short Form Descriptive Statistics in Study 3

              ASD (n = 93)      UA (n = 422)
Mean          6.47              7.02
SD            2.19              2.27
α             .65               .69
Mean CITC     .32               .35
Cohen’s d     –0.24
95% CI        [–0.47, –0.02]

Note. ASD = participants with a diagnosis of autism spectrum disorder; UA = unaffected adults recruited from Prolific; α = coefficient alpha; CITC = corrected item-total correlation; CI = confidence interval


Figure 1. Practice SST Item. All participants were given a practice item (A) before starting the 23-item SST. After responding to the practice item, participants received feedback that identified the correct response (B).


Figure 2. Distribution of SST Scores for Participants with ASD and Unaffected Adults (both Studies 1 and 2 combined). The y-axis represents the proportion of participants within each group. The x-axis represents the number of correct responses to the 23-item SST.