Three Researchers, Five Conjectures: an Empirical Analysis of TOM-Skype Censorship and Surveillance
Total Page:16
File Type:pdf, Size:1020Kb
Three Researchers, Five Conjectures: An Empirical Analysis of TOM-Skype Censorship and Surveillance Jeffrey Knockel, Jedidiah R. Crandall, and Jared Saia University of New Mexico Dept. of Computer Science {jeffk, crandall, saia}@cs.unm.edu Abstract as if keyword-based censorship is effective at stopping We present an empirical analysis of TOM-Skype censor- protests when censorship keywords target specific adver- ship and surveillance. TOM-Skype is an Internet tele- tised protest locations, e.g., •'!W$• "ª phony and chat program that is a joint venture between TN (Corning West and Da Zhi Street intersection, Cen- TOM Online (a mobile Internet company in China) and tury Lianhua gate). Estimating the effectiveness of this Skype Limited. TOM-Skype contains both voice-over- entails an understanding of psychology to quantify the IP functionality and a chat client. The censorship and effects of perceived surveillance and uncertainty, meme surveillance that we studied for this paper is specific to spreading, social networking, content filtering, linguis- the chat client and is based on keywords that a user might tics to anticipate attempts to evade the censorship, and type into a chat session. many other factors. We were able to decrypt keyword lists used for censor- In this paper, we propose five conjectures about cen- ship and surveillance. We also tracked the lists for a pe- sorship. Our conjectures are based on our recent re- riod of time and witnessed changes. Censored keywords sults in reverse-engineering TOM-Skype censorship and range from obscene references, such as 'so (two surveillance, combined with past studies of Internet cen- girls one cup, the motivation for our title), to specific pas- sorship. TOM-Skype is an Internet telephony and chat sages from 2011 China Jasmine Revolution protest in- program that is a joint venture between TOM Online structions, such as ý %™ ¦S#èM (McDon- (a mobile Internet company in China) and Skype Lim- ald’s in front of Chunxi Road in Chengdu). Surveillance ited. TOM-Skype contains both voice-over-IP function- keywords are mostly related to demolitions in Beijing, ality and a chat client, the former of which implements such as u" !" (Ling Jing Alley demolition). keyword-based censorship and surveillance that we have Based on this data, we present five conjectures that we reverse-engineered. believe to be formal enough to be hypotheses that the In- We present these conjectures in a formal way, in an at- ternet censorship research community could potentially tempt to propose them as testable hypotheses on which answer with more data and appropriate computational future research can focus. We do not expect all of our and analytic techniques. conjectures to be true. However, all of them have the properties that: 1) they can in principle be empirically tested; and 2) determining whether they are true or false will advance our understanding of Internet censorship. 1 Introduction We contend that the enumeration of such testable con- How effective is keyword censorship at stifling the jectures is critical in order for the study of Internet cen- spread of ideas? Is constant surveillance necessary for sorship to continue as a viable area of scientific research. effective Internet censorship? What are the computa- 1.1 TOM-Skype results tional, linguistic, political, and social problems faced by both the censors and the people seeking to evade censor- In this paper, we give preliminary results from reverse- ship? engineering different versions of TOM-Skype. Our re- A good understanding of how Internet censorship sults include the cryptography algorithms used for both works, how it is applied, and what its impacts are will censorship and surveillance, differences between TOM- require both ideas from the social sciences and computa- Skype versions, fully decrypted lists of keywords with tional ideas. Consider a relatively simple question such translations, changes to the lists over time, and a rough categorization of three of the lists. There are several factors that make the list we have Recently, Nart Villeneuve demonstrated that the chat obtained from TOM-Skype unique among the lists that functionality of TOM-Skype triggers on certain key- have been made public thus far. The first is that, not only words, preventing their communication and uploading is the TOM-Skype list more up-to-date, but in the three messages to a server in China [16]. He provided some weeks that we have been monitoring it we have recorded high-level analysis of what is censored and how this many updates to it. We plan to record daily updates for mechanism works. In this paper, we provide a more de- a long period of time. Also, we believe our list is the tailed analysis of TOM-Skype, including the algorithms first to provide more than anecdotal evidence of some for protecting the keywords that trigger censorship and of the shorter-term applications of censorship, such as surveillance. All versions of TOM-Skype have at least the censorship of specific intersections where protesters one of two separate lists: one that triggers both censor- planned to meet or events in the news. Our list is not ship and surveillance and one that only triggers surveil- only complete, but there is a clear separation between lance. We have decrypted all lists for all versions of censorship keywords and surveillance keywords. TOM-Skype that we analyzed, and we have translated The prior work that is closest to ours is the aforemen- the most recent version’s lists and tracked changes in the tioned study of TOM-Skype by Villeneuve [16]. That lists. The encryption for protecting the keyword lists analysis was based on obtaining the uploaded conversa- in earlier versions is based on a simple algorithm in- tions, which were available for download on the server volving additions and exclusive-or operations on each that TOM-Skype uploaded them to at the time, and per- byte, whereas the encryption for later versions is AES- forming clustering and other aggregate analyses of this based. The encryption for uploading conversations that data. In contrast, the analysis we present in this paper trigger surveillance is DES-based. By overcoming the is based on reverse-engineering of the TOM-Skype im- anti-debugging functionalities built into both Skype and plementation of censorship and of the cryptography for TOM-Skype, we also have a detailed understanding of protecting the keyword blacklist. Thus, we are able to how the censorship and surveillance is implemented. present exactly what the keywords are and make a clear Based on our analysis, we propose a set of five conjec- distinction between keywords that evoke censorship and tures. surveillance and those that only evoke surveillance. There has been some amount of historical, political, 1.2 Related work economic, and legal discussion of the potential effective- The Open Net Initiative is an excellent source of infor- ness and applications of Internet censorship in China and mation about censorship in a variety of countries [6]. elsewhere [1, 13, 7, 2, 5]. Our goal in this paper is to However, the descriptions of what is filtered are rel- present data that can help the research community move atively high-level. The report by Zittrain and Edel- toward formalizing conjectures about Internet censorship man [17] is a good overview of some of China’s cen- that can be tested computationally. We propose five such sorship implementations. conjectures after presenting our data. The methods of China’s HTTP keyword filtering were This paper is structured as follows. In Section 2 first published by the Global Internet Freedom Consor- we present our findings from reverse-engineering TOM- tium [10]. Clayton et al. [3] published a more de- Skype. Section 3 discusses the two keyword blacklists tailed study of this mechanism. The ConceptDoppler for the most recent version of TOM-Skype, which we project [4] studied multiple routes and also used Latent have translated and analyzed in detail. Then we propose Semantic Analysis [12] to reverse-engineer 122 black- five conjectures and some recommendations for future listed keywords by clustering around sensitive concepts work in Section 4. and then probing. ConceptDoppler has generated two 2 Empirical analysis of TOM-Skype more recent lists as well. Human Rights Watch [11], Reporters Without Bor- In this section we describe the basic empirical results ders [14], and others [13, 1] have released reports de- from our efforts to reverse-engineer TOM-Skype. scribing China’s censorship regime. These reports often include insider information about what is censored [14] 2.1 Censorship mechanisms or perhaps full leaked blacklists, such as the list of key- Each version of TOM-Skype that we analyzed features words blocked in the QQChat chat program [11, Ap- shared censorship mechanisms. Each binary contains a pendix I] or by a particular blog site [11, Appendix II]. In built-in, encrypted list of keywords to censor, and, via the case of the QQChat list, hackers found the list in the HTTP, each client downloads at least one additional en- QQ software by doing a simple string dump of QQChat’s crypted list of censored keywords called a “keyfile” from dynamically linked libraries, i.e., there was no encryp- TOM’s servers. Each client uses at least one of these lists tion. to censor incoming and/or outgoing chat messages at any time. an.skype.tom.com/installer/agent/keyfile However, we also found stark differences in their cen- n sorship implementations. Different versions use different where is a pseudorandom, uniformly-distributed inte- encryption algorithms, different built-in keyword lists, ger between 1 and 8, inclusive. This file can be decrypted and download keyfiles from different locations. More- using the previous algorithm, as can these clients’ built- over, implementations vary in whether they censor in- in keyword lists.