Reverse-Engineering Censorship in China
RESEARCH ARTICLE
POLITICAL SCIENCE
SCIENCE sciencemag.org 22 AUGUST 2014 • VOL 345 ISSUE 6199 1251722-1

Reverse-engineering censorship in China: Randomized experimentation and participant observation

Gary King,1* Jennifer Pan,1 Margaret E. Roberts2

1 Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA. 2 Department of Political Science, University of California, San Diego, La Jolla, CA 92093, USA.
*Corresponding author. E-mail: [email protected]

Existing research on the extensive Chinese censorship organization uses observational methods with well-known limitations. We conducted the first large-scale experimental study of censorship by creating accounts on numerous social media sites, randomly submitting different texts, and observing from a worldwide network of computers which texts were censored and which were not. We also supplemented interviews with confidential sources by creating our own social media site, contracting with Chinese firms to install the same censoring technologies as existing sites, and (with their software, documentation, and even customer support) reverse-engineering how it all works. Our results offer rigorous support for the recent hypothesis that criticisms of the state, its leaders, and their policies are published, whereas posts about real-world events with collective action potential are censored.

The Chinese government has implemented “the most elaborate system for Internet content control in the world” (1), marshaling hundreds of thousands of people to strategically slow the flow of certain types of information among the Chinese people. Yet the sheer size and influence of this organization make it possible to infer via passive observation a great deal about its purpose and procedures, as well as the intentions of the Chinese government. To get around the well-known inferential limitations inherent in observational work, our experiment depended on large-scale random experimentation and participant observation.

We begin with the theoretical context. The largest previous study of the purpose of Chinese censorship (2) distinguished between the “state critique” and “collective action potential” theories of censorship and found that, with few exceptions, the first was wrong and the second was right: Criticisms of the government in social media (even vitriolic ones) are not censored, whereas any attempt to physically move people in ways not sanctioned by the government is censored. Even posts that praise the government are censored if they pertain to real-world collective action events (2).

In both theories, regime stability is the assumed ultimate goal (3–6). Scholars had previously thought that the censors pruned the Internet of government criticism and biased the remaining news in favor of the government, thinking that others would be less moved to action on the ground as a result (7–9). However, even if biasing news positively would in fact reduce the potential for collective action, this state critique theory of censorship misses the value to the central Party organization of the information content provided by open criticism in social media (10–13). After all, much of the job of leaders in an autocratic system is to keep the people sufficiently mollified that they will not take action that may affect their hold on power. In line with the literature on responsive authoritarianism, the knowledge that a local leader or government bureaucrat is engendering severe criticism (perhaps because of corruption or incompetence) is valuable information (14, 15). That leader can then be replaced with someone more effective at maintaining stability, and the system can then be seen as responsive. This responsiveness would seem likely to have a considerably larger effect on reducing the probability of collective action than merely biasing the news in predictable ways.

The collective action potential hypothesis holds that the Chinese censorship organization first detects a volume burst of social media posts within a specific topic area, and then identifies the real-world event that gives rise to the volume burst (2). If the event is classified as having collective action potential, then all posts within the burst are censored, regardless of whether they are critical or supportive of the state and its leaders. Unlike the uncertain process involved in coherently classifying individual posts as to their collective action potential, this procedure is easily implemented with extremely high levels of intercoder reliability. No evidence exists as to whether any such rules were invented and directed by a person or committee in the Chinese government, or whether they merely represent an emergent pattern of this large-scale activity.

Although the largest existing study analyzed more than 11 million social media posts from almost 1400 websites across China (2), it and other quantitative studies of censorship (16, 17) were solely observational, with some conclusions necessarily depending on untestable assumptions. For example, the data for these studies were controlled by an earlier stage in which many social media websites use automated review (based on techniques such as keyword matching) to immediately move large numbers of prospective posts into a temporary limbo to receive extra scrutiny before possible publishing (for a guide, see Fig. 1). Whereas the ex post content-filtering decision is conducted largely by hand and takes as long as 24 hours, the ex ante decision of whether posts are slotted for review is automated, instantaneous, and thus cannot be detected by observational methods. This also means that the automated review process could induce selection bias in existing studies of censorship, which could only observe those submissions that were not stopped from publication by automated review. And, of course, observational research generally also risks endogeneity bias, confounding bias, and other problems.

To avoid these potential biases and to study how automated review works, we conducted a large-scale experimental study in which random assignment, controlled by the investigators, substituted for statistical assumptions. We created accounts on numerous social media sites across China; wrote a large number of unique social media posts; randomized the assignment of different types of posts to accounts; and, to evade detection, observed from a network of computers all over the world which types were published or censored. Throughout, we attempted to avoid disturbing the flow of normal discourse by producing social media content on topics similar to those in real social media posts (including the content of those censored, which our methods could access). Although very-small-scale nonrandomized efforts to post on Chinese websites and observe censorship have been informative (18), randomized experiments have not before been used in the study of Chinese censorship.

In addition to our randomized experiment, from which we drew causal inferences, we also sought to produce more reliable descriptive knowledge of how the censorship process works. This is important information in its own right; the process is intensely studied and contested in the academic and policy communities. Until now, such information has mostly come from highly confidential interviews with censors or their agents at social media sites or in government. This information is necessarily partial, incomplete, potentially unsafe for research subjects, and otherwise difficult to gather. Participant observation provided us with a new source of information absent from previous studies of censorship. From inside China, we created our own social media website, purchased a URL, rented server space, contracted with one of the most popular software platforms in China used to create these sites, submitted, automatically reviewed, posted, and censored our own submissions. The website we created was available only to our research team, so as to avoid affecting the object of our study or otherwise interfering with existing Chinese social media discourse. However, we had complete access to the software, documentation, help forums, and extensive consultation with support staff; we were even able to get their recommendations on how to conduct censorship on our own site in compliance with government standards. The “interviews” we conducted in this way were unusually informative because the job

[...] word lists and the absence of a coherent or unified interpretation for the operation of these lists at individual sites. Consistent with this disagreement, we show that the large number of local social media sites in China have considerable flexibility, and choose diverse technical and software options, in implementing censorship. Second, we show that the automated review process affects large numbers of posts on fully two-thirds of Chinese social media sites, but is a largely ineffective step in implementing the government’s censorship goals. This is surprising

[...] action potential hypothesis, by using very large numbers of human coders to produce post hoc corrections to automated review and to censorship in general.

Our research offers clear support for the collective action potential hypothesis and then offers some important extensions. We find (consistent with the implications of this theory, but untested in prior research) that there is no censorship of posts about collective action events outside mainland China, collective action events occurring solely online, social media posts