News Feature: How online studies are transforming psychology research

The samples are large and diverse, but will this trend strengthen the field or merely introduce new sources of error?

Amber Dance
Science Writer

After six weeks spent busily coding, psychology graduate student Brian Nosek finished his project in a cab on the way to the airport. When he landed in Seattle, he attended a press conference about his new website, Project Implicit (projectimplicit.org), which allows people to test their unconscious leanings: a deeply buried belief, for example, that boys are better than girls at math or that one religion outranks another. Nosek and his collaborators wanted to study people’s responses. It was 1998 and “the Internet wasn’t being used for research,” recalls Nosek, now a professor at the University of Virginia in Charlottesville. “We had no idea if it would be of any success.”

Within two days, 50,000 people had taken Nosek’s tests. Today, Project Implicit has gathered data from 18 million participants, enough to keep the organizers busy for their whole careers, should they wish. It has resulted in more than 100 publications on topics ranging from how the election of an African American president affected racial attitudes [not much (1)] to a slowly rising tolerance for gay people (2).

Project Implicit is no longer alone on the Web; psychologists have taken their studies online in droves in recent years. The benefits are clear: easily accessible, large, diverse subject pools that extend beyond psychology students seeking course credit.

But researchers must guard against online pitfalls. Participation is inherently biased toward those who choose to take online surveys or tests, sometimes for money. Psychologists must take care to understand their respondents, and make inferences only for the categories of people represented. However, researchers have no way to confirm who their online subjects are, and they have to watch out for dishonest or distracted participants who might introduce misleading data.

Of course, every psychological research method and group of test subjects has its upsides and downsides. “There is no perfect method,” says Samuel Gosling of the University of Texas at Austin. Ideally, experts say, researchers should use multiple methods to confirm results. But online studies offer another potent and increasingly popular method to the mix.

Psychology studies done online allow for the collection of data from tens of thousands of subjects. Image courtesy of Dave Cutler.

WEIRD Science

Picking the right pool of people to study has long been an issue in psychology research. “The psychology undergraduate has become the ‘model organism’ for psychological studies,” says Laura Germine, a postdoctoral fellow in psychiatry at Harvard Medical School and Massachusetts General Hospital in Boston. Retirees and young children are also fairly easy subjects to attract, adds Joshua Hartshorne, an incoming assistant professor of psychology at Boston College. The problem is that those volunteers typically come from societies that are Western, Educated, Industrialized, Rich, and Democratic: as psychologists say, they’re WEIRD. “There are giant holes in our knowledge about adults between 20 and 65, or about people not college-educated and living in the West,” Hartshorne says.

Sometimes, it might not really matter who takes the test, so long as they have a human brain. For example, consider a classic test of attention: subjects are told to count how many times a group of basketball players in a video pass the ball, and many are so focused on the count that they fail to notice a man in a gorilla suit stroll across the screen. If you want to understand how the human brain ignores the gorilla, any study participants, so long as they can perform the task, may suffice.

However, for other questions, demographics matter. If you want to know what kinds of voters support Donald Trump for president, you can’t ask a sample of all liberals.

Can use of the online community fix the WEIRD problem? Yes and no. Internet research subjects are more diverse than psychology undergrads, but they are still a skewed participant pool. For example, in a 2004 review, Gosling et al. (3) compared the demographics of traditional samples, gleaned from a year’s worth of studies in the Journal of Personality and Social Psychology, to the population that used the personality-test website outofservice.com, run by Gosling’s collaborator. Sample composition varies depending on the school, its locale, and the sample selection procedures; but overall, in-person psychology subjects were 29% male. Online, that number rose to 43%. The Internet sample also had slightly more non-Whites, 23% compared with 20% in traditional studies, but with percentages of African Americans and Latinos still far below the United States census numbers. The online sample also had nearly double the proportion of people from outside the United States (3).

World Wide Subject Pool

Although researchers like Nosek have been conducting studies online since the days of dial-up, skepticism about the practice was widespread even just a few years ago. Germine recalls one comment when she presented at a meeting in 2010: “‘How do you know that your participants are not drunk and watching porn?’” accused the questioner. There was a common perception, Germine says, that “there’s some dark underbelly of society and they’re on the Internet.”

However, when Gosling tested his online respondents for neuroticism and introversion, characteristics linked to depression and social isolation, he found they scored the same as in-person pools (3). “Now, it’s pretty much accepted that people on the Internet are just people,” Germine says. Indeed, Hartshorne says he doesn’t know many researchers who have never done at least a tiny pilot study online.

The Internet provides an easy way for psychologists to obtain the large, diverse, or specialized samples they need. For example, in one paper Germine and Hartshorne report a study of how cognitive ability varies with age (4). Such a study requires numerous people of all ages, and they simply don’t have the space or the experimenters to recruit so many subjects and test them in a laboratory, Hartshorne says. Online, they accumulated nearly 50,000 people who took IQ and memory tests. The researchers found that some abilities, such as mental processing speed, peak in the late teens. Others, such as vocabulary, peak around age 50. The authors compared their Internet scores to a smaller, standard dataset from in-person testing, and most of the findings held up across both testing groups, thus confirming the findings with multiple methods.

In another case, Nosek used Project Implicit data for a study of how attitudes toward gender and career vary around the world. Study participants were given two kinds of words to categorize: gender-related words and careers in science or liberal arts. The subjects then had to sort, as quickly as they could, the two categories simultaneously. It turns out that most people sift more quickly and accurately if they have to bin masculine words such as “man” and “boy” together with science-related words like “physics” than if they have to match feminine words to science topics. Among more than half a million people who took that test, about 70% associated males with science more often than females, Nosek et al. reported in 2009 (5).

The researchers accumulated data from 34 different countries, allowing them to analyze how those implicit attitudes correlated with education. In those countries where the subconscious stereotype was strongest, boys and girls had the biggest differential in performances on science and math. This result complemented previous findings that implicit stereotypes correlate with science and math scores on an individual level.

Take this Quiz!

Research online yields big datasets, fast. But running an online study is not quite as simple as posting a survey and sitting back while the results roll in. Not many psychologists have trained in how to do an online study properly, Hartshorne says. So he has become a bit of an evangelist, traveling and speaking about the mistakes he’s learned the hard way, such as the time he posted a study for Japanese speakers, with monetary rewards, and ended up with participants who clearly didn’t understand Japanese at all.

Researchers must carefully monitor their respondent pool to ensure the integrity of the data they collect. Image courtesy of Dave Cutler.

First, researchers have to hook their potential participants. People love to take quizzes online, but researchers are competing with such popular frivolous offerings as “Which Disney princess are you?” “If you put up a website that’s boring, three people come in, then it’s just spiders,” Hartshorne says, referring to the automated webcrawlers that index sites. Some psychologists have reverse-engineered the clickbait traits of popular Facebook quizzes, and fit their studies to those parameters. For example, outofservice.com doesn’t say, “Take this standard personality test”; it trumpets, “Find your Star Wars twin” or offers up similar questions in a questionnaire labeled “All About You - A Guide to Your Personality” (3).

For psychologists trained in traditional experimental settings, going online may be unsettling. “You’re giving up a lot of perception of control,” says Fred Sabb, a cognitive neuroscientist at the University of Oregon in Eugene. Under the watchful eye of a research assistant, subjects are likely to pay attention to their tasks. Far away on their computers or mobile phones, they might be concurrently watching a movie for all investigators know.

Researchers have ways to make sure subjects are focused on their tasks in the laboratory, and these are even more important online. “Catch trials” are questions that anyone who is paying attention should ace, such as, “Have you ever had a heart attack and died?” “Manipulation checks” ensure the person understood the instructions. For example, the screen might offer up what looks like a multiple-choice question, but the instructions say to click on the question title, not an answer. Researchers can also time the responses, because robot programs could be very fast and movie-watchers slow, and directly ask people if they were distracted. Because online studies net so many participants, researchers can liberally throw out suspect results while maintaining adequate sample size, Sabb points out.

Participants are often anonymous, which can help recruitment. Germine recently performed a study of how childhood traumas, such as sexual abuse, affected adult thought processes. She was worried she might not get many subjects, but netted 30,000. People were probably more likely to participate because they did not have to identify themselves or their family members, Germine surmises. In an analysis of a subset of those participants, she discovered that parental abuse correlated with certain deficiencies, such as trouble inferring the thoughts and feelings of another person (6). Previous work had reported children who faced traumas had a hard time understanding other people’s emotions; Germine’s study showed those problems extend into adulthood.

That anonymity comes at a price. People can certainly lie in person, but in the laboratory, it’s often easy to tell if a volunteer doesn’t qualify for a study. Online, people who want to take studies or are financially compensated for doing so might be inclined to lie to make themselves look like perfect candidates. However, studies indicate that people online lie about as often as they do in person (7).

Rise of the Turk

Doing research online has gotten even easier since the 2005 launch of the Mechanical Turk, a service from Amazon. “MTurk,” as it’s affectionately called by users, allows people to post tasks that require a human to perform: for example, identifying objects in pictures. It was named for an 18th century hoax, a chess-playing automaton that turned out to have a human player concealed inside.

“Turkers” are the online users or “workers” who perform those tasks, expecting about 10 cents a minute in return. Psychologists cottoned on to MTurk’s possibilities around 2010, says Jesse Chandler, an adjunct professor at the University of Michigan’s Institute for Social Research in Ann Arbor and a survey researcher at Mathematica Policy Research. With the service, they can get surveys done by lots of people, fast, for as little as dimes, nickels, or even pennies per subject. No joke: Gosling once offered one cent for answering a two-question survey, and garnered 500 responses in just 33 hours (8).

Who’s willing to take surveys for pocket change? As it turns out, Turkers are still a bit WEIRD. They have above-average education, but report fairly low income; Chandler suspects many are unemployed or underemployed Millennials living with their parents. Most Turkers reside in the United States or India, where Amazon pays in cash, not merely store credit. Turkers certainly do not represent humanity as a whole, but they still beat out university undergraduates, as well as many other online samples, for diversity (8, 9).

Another advantage is that each Turker has a unique ID, which psychologists can use to prevent people from taking the same survey multiple times or to recontact study participants who meet a certain profile. (Chandler used these IDs to ask people repeat questions over time to trawl for liars.) And since MTurk “employers” can deny payment for a task poorly done, Turkers are motivated to follow instructions.

“I think it’s probably the most well-studied sample out there right now,” Chandler says. It can be particularly good for researchers seeking specific populations, such as parents of toddlers, or people who are gay, he says.

And its applications extend beyond simple online surveys. In one MTurk study, published in 2012, researchers investigated effective problem solving in groups. They assigned teams of 16 Turkers to explore a virtual desert for oil fields in a game called “Wildcat Wells.” As participants searched different parts of the landscape, the game shared their findings with just three other team members. Some played in a group with an efficient network, where this sharing distributed the information evenly throughout the team, whereas other groups dealt with an inefficient network in which small clusters of players mostly shared findings with each other. Previous research offline suggested that inefficient networks would get the best results, but the efficient networks performed best in the MTurk experiment (10, 11).

Turk Tomfoolery?

It’s fast, easy, and cheap, but does MTurk magnify the perils of online research? Dan Kahan, a professor of law and psychology at Yale University in New Haven, laments how often researchers turn to it without considering that it may be unsuitable for their work. “You should wonder about the value of something you’re paying a few cents for,” he warns. Kahan cautions that people on MTurk might be atypical precisely because they are attracted to the site.

Because so many psychologists are using MTurk, and some individual Turkers are completing hundreds or thousands of studies, the workers may be more familiar with classic research materials than the average person, Chandler notes. That might skew results: for example, someone who takes a lot of IQ tests is likely to get better at them over time.

Another downside, Kahan points out, is that Turkers connect online to exchange tips on the best studies and how to complete them quickly. Foreknowledge of an experiment could alter results. “I don’t want to have a lounge where people who are about to take the study mingle with people who have just finished,” he says. “It’s just bad social science hygiene.” Chandler thinks this problem is rare, but says he usually asks at the end of a study if participants saw the survey discussed anywhere.

“The question is always, is the sample valid given the kind of inference you want to draw?” Kahan says. “With respect to a lot of things people are using MTurk for, the answer is ‘no.’” For example, he’d worry about studies of Turkers who claim to have psychological symptoms, based on Chandler’s report that about 10% of Turkers scored high for malingering: they claimed rare symptoms, perhaps in the hopes of qualifying for more work (12).

Of course, the Internet will never work for many kinds of psychology studies. Germine points out that no survey can replace a face-to-face clinical interview. Nor can Internet tests replace observational studies of people interacting in person, whether in real-life activities or in response to staged scenarios, although they can link up multiple online participants in games or cooperative tests. “There will never be a time, especially with behavioral research, where there isn’t a use for having people in person,” Nosek says.

Despite these potential stumbling blocks, psychologists continue to plumb the Internet for greater and greater subject numbers. Hartshorne, for example, expects to publish a study soon with 700,000 participants, on how age affects learning of a second language. Learning another language can take decades, he points out. “I don’t have 30 years to wait for the longitudinal study to get done.” Thanks to the online community, he doesn’t have to.

1 Schmidt K, Nosek BA (2010) Implicit (and explicit) racial attitudes barely changed during Barack Obama’s presidential campaign and early presidency. J Exp Soc Psychol 46(2):308–314.
2 Westgate EC, Riskind RG, Nosek BA (2015) Implicit preferences for straight people over lesbian women and gay men weakened from 2006 to 2013. Collabra 1(1). Available at collabra.org/articles/10.1525/collabra.18/. Accessed October 28, 2015.
3 Gosling SD, Vazire S, Srivastava S, John OP (2004) Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. Am Psychol 59(2):93–104.
4 Hartshorne JK, Germine LT (2015) When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the life span. Psychol Sci 26(4):433–443.
5 Nosek BA, et al. (2009) National differences in gender-science stereotypes predict national sex differences in science and math achievement. Proc Natl Acad Sci USA 106(26):10593–10597.
6 Germine L, Dunn EC, McLaughlin KA, Smoller JW (2015) Childhood adversity is associated with adult theory of mind and social affiliation, but not face processing. PLoS One 10(6):e0129612.
7 Paolacci G, Chandler J (2014) Inside the Turk: Understanding Mechanical Turk as a participant pool. Curr Dir Psychol Sci 23(3):184–188.
8 Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspect Psychol Sci 6(1):3–5.
9 Paolacci G, Chandler J, Ipeirotis PG (2010) Running experiments on Amazon Mechanical Turk. Judgm Decis Mak 5(5):411–419.
10 Mason WA, Jones A, Goldstone RL (2008) Propagation of innovations in networked groups. J Exp Psychol Gen 137(3):422–433.
11 Mason W, Watts DJ (2012) Collaborative learning in networks. Proc Natl Acad Sci USA 109(3):764–769.
12 Shapiro DN, Chandler J, Mueller PA (2013) Using Mechanical Turk to study clinical populations. Clin Psychol Sci 1(2):213–220.

www.pnas.org/cgi/doi/10.1073/pnas.1520769112 PNAS | November 24, 2015 | vol. 112 | no. 47 | 14399–14401
