Message In A Bottle: Sailing Past Censorship

Luca Invernizzi Christopher Kruegel Giovanni Vigna UC Santa Barbara UC Santa Barbara UC Santa Barbara California, USA California, USA California, USA [email protected] [email protected] [email protected]

ABSTRACT To sneak by the censorship, the dissident populations have re- Exploiting recent advances in monitoring technology and the drop sorted to technology. A report from Harvard’s Center for Internet of its costs, authoritarian and oppressive regimes are tightening & Society [44] shows that the most popular censorship-avoidance the grip around the virtual lives of their citizens. Meanwhile, vectors are web proxies, VPNs, and Tor [17]. These systems share the dissidents, oppressed by these regimes, are organizing online, a common characteristic: They have a limited amount of entry their activity with anti-censorship systems that typically points. Blocking these entry points, and evading the block, has consist of a network of anonymizing proxies. The censors have become an arms race: Since 2009, China is enumerating and become well aware of this, and they are systematically finding banning the vast majority of Tor’s bridges [57]. In 2012, Iran and blocking all the entry points to these networks. So far, they took a more radical approach and started blocking encrypted have been quite successful. We believe that, to achieve resilience traffic [54], a move that Tor countered the same day by deploying to blocking, anti-censorship systems must abandon the idea of a new kind of traffic camouflaging [53]. having a limited number of entry points. Instead, they should In this paper, we take a step back and explore whether it is establish first contact in an online location arbitrarily chosen possible to design a system with so many available entry points by each of their users. To explore this idea, we have developed that it is impervious to blocking, without disconnecting from the Message In A Bottle, a protocol where any post becomes a global network. potential “drop point” for hidden messages. We have developed Let’s generalize the problem at hand with the help of Alice, and released a proof-of-concept application of our system, and a dissident who lives in the oppressed country of Tyria. Alice demonstrated its feasibility. To block this system, censors are wants to establish, for the first time, a confidential communication left with a needle-in-a-haystack problem: Unable to identify what with Bob, who lives outside the country. To use any censorship- bears hidden messages, they must block everything, effectively resistant protocol, Alice must know something about Bob, and disconnecting their own network from a large part of the Internet. how to bootstrap the protocol. In the case of anonymizing proxies This, hopefully, is a cost too high to bear. or onion-routing/mix networks (e.g., Tor), Alice needs the address of at least one of the entry points into the network, and Bob’s address. Also, to achieve confidentiality, Alice needs Bob’s public Keywords key, or something equivalent to that. In protocols that employ Censorship Resistance, Deniable Communications, Steganography steganography to hide messages in files uploaded to media-hosting sites (such as Collage [12]) or in network traffic (such as Telex [62]), 1. INTRODUCTION Alice must know the location of the first rendezvous point, and 1 The revolutionary wave of protests and demonstrations known Bob’s public key . as the Arab Spring rose in December 2010 to shake the foundations The fact that Alice needs to know this information inevitably of a number of countries (e.g., Tunisia, Libya, and Egypt), and means that the censor can learn it too (as he might pose as showed the Internet’s immense power to catalyze social awareness Alice). Bob cannot avoid this, without having some way to through the free exchange of ideas. This power is so threatening distinguish Alice from the censor (but this becomes a chicken- to repressive regimes that censorship has become a central point in and-egg problem: How did Bob come to know that, since he their agendas: Regimes have been investing in advanced censoring never had any confidential communication with Alice before?). technologies [43], and even resorted to a complete isolation from We believe that this initial something that Alice must know is a the global network in critical moments [14]. For example, Pakistan fundamental weakness of existing censorship-resistant protocols, recently blocked Wikipedia, YouTube, Flickr, and Facebook [2], which forms a crack in their resilience to blocking. For example, and Syria blocked citizen-journalism media sites [51]. this is the root cause of the issues that Tor is facing when trying to distribute bridge addresses to its users, without exposing these addresses to the censor [52]. It is because of this crack that China has been blocking the majority of Tor traffic since 2009 [57]: The Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies 1 bear this notice and the full citation on the first page. To copy otherwise, to To be precise, in Telex, Alice does not need to know the precise republish, to post on servers or to redistribute to lists, requires prior specific location of the proxy running the Telex protocol. However, she permission and/or a fee. Request permissions from [email protected]. needs to be aware that a Telex proxy will occur somewhere in the ACSAC ’13 December 09-13, 2013, New Orleans, LA, USA path to the decoy destination. Here, we consider this destination Copyright 2013 ACM 978-1-4503-2015-3/13/12 ...$15.00. to be the rendezvous point she must know. number of entry points is finite, and a determined attacker can His intent is to discover and suppress the spreading of dissident enumerate them by pretending to be Alice. ideas. Determining the current and potential capabilities of With Message In A Bottle (miab), we propose a protocol in modern censors of this kind (e.g., Iran and China) is a difficult which Alice needs to know the least possible about Bob, and how task, as the inner workings of the censoring infrastructure are to bootstrap the protocol. In fact, we will show that it is enough often kept secret. However, we can create a reasonable model for for Alice to know Bob’s public key, and nothing else. Alice must our adversary by observing that, ultimately, the Censor will be know at least Bob’s public key to authenticate him, so that she constrained by economic factors. Therefore, we postulate that can be sure she is not talking to a disguised censor. However, the censorship we are facing is influenced by these two factors: contrary to systems like Collage and Telex, there is no prearranged • The censoring infrastructure will be constrained by its cost rendezvous point where Alice and Bob must meet. and technological effort; This may now sound like a needle-in-a-haystack problem: If • Over-censoring will have a negative impact on the economy neither Alice nor Bob know how to contact the other one, how of the country. can they ever meet? To make this possible, and reasonably fast, From these factors, we can now devise the capabilities of our miab exploits one of the mechanisms that search engines employ Censor. We choose to do so in a conservative fashion, preferring to generate real-time results from blog posts and news articles: to overstate the Censor’s powers than to understate them. We blog pings. Using these pings as a broadcast system, Bob gets make this choice also because we understand that censorship is an covertly notified that a new message from Alice is available, and arms race, in which the Censor is likely to become more powerful where to fetch it from. We will show that, just like a search as technology advances and becomes more pervasive. engine, Bob can monitor the majority of the published on We assume that it is in the Censor’s interest to let the general the entire Internet with limited resources, and in quasi real-time. population benefit from Internet access, because of the social In some sense, every blog becomes a potential meeting point and economic advantages of connectivity. This is a fundamental for Alice and Bob. However, there are over 165 million blogs assumption for any censorship-avoidance system: If the country online [8], and since a blog can be opened trivially by anybody, runs an entirely separate network, there is no way out2. for our practical purposes they constitute an infinite resource. The Censor’s Capabilities. As part of his infrastructure, As a result of one of our experiments (see Section 5.1), we will the Censor might deploy multiple monitoring stations anywhere show that, to blacklist all the potential miab’s drop points in a within his jurisdiction. Through these stations, he can capture, three-months period, the Censor should block at least 40 million analyze, modify, disrupt, and store indefinitely network traffic. fully qualified domain names, and four million IP addresses (as In the rest of the world, he will only be able to harness what is we will show, these are conservative estimates). For comparison, commercially available to him. The Censor can analyze traffic blacklisting a single IP address would block Collage’s support for aggregates to discover outliers in traffic patterns, and profile Flickr (the only one implemented), and supporting additional encrypted traffic with statistical attacks on the packets content media-hosting sites requires manual intervention for each one. or timing. Also, he might drill down to observe the traffic of In miab, Alice will prepare a message for Bob and encrypt it individual users. with his public key. This ciphertext will be steganographically The Censor might issue false TLS certificates with a Certificate embedded in some digital photos. Then, Alice will prepare a blog Authority under its control. With them, he might man-in-the- post about a topic that fits those photos, and publish it, along middle connections with at least one endpoint within his country. with the photos, on a blog of her choosing. Meanwhile, Bob will The Censor will have knowledge of any publicly available be monitoring the stream of posts as they get published. He will censorship-avoidance technology, including miab. In particular, recognize Alice’s post (effectively, the bottle that contains the he might join the system as a user, or run a miab instance to lure proverbial message), and recover the message. dissidents into communicating with him. Also, he might inject To achieve these properties, the miab protocol imposes sub- additional traffic into an existing miab instance to try to confuse stantial overhead. We do not strive for miab’s performance to or overwhelm it. be acceptable for low latency (interactive) communication over Because the Censor bears some cost from over-blocking, he the network (such as web surfing). Instead, we want our users will favor blacklisting over whitelisting, banning traffic only when to communicate past the Censor by sending small delay-tolerant it deems it suspicious. When possible, the Censor will prefer messages (e.g., emails, articles, tweets). Also, miab can be em- traffic disruption over modification, as the cost of deep-packet ployed to exchange the necessary information to bootstrap more inspection and modification is higher than just blocking the demanding censorship-resistant protocols. These might be more stream. For example, if the Censor suspects that the dissidents are efficient than miab, at the cost of requiring more data upon communicating through messages hidden in videos on YouTube, bootstrap (such as Collage, or Telex). We will showcase a few he is more likely to block access to the site rather than re–encoding applications of miab in Section 4. every downloaded video, as this would impose a high toll on his Our main contributions are: computational capabilities. Also, we assume that if the Censor • A new primitive for censorship-resistant protocols that chooses to alter uncensored suspicious traffic, he will do so in a requires Alice to have minimal initial knowledge about Bob, manner that the user would not easily notice. and how to bootstrap the protocol; • A study of the feasibility of this approach; • An open-source implementation of a proof-of-concept ap- 3. DESIGN plication of miab that lets user tweet anonymously and In its essence, miab is a system devised to allow Alice, who deniably on Twitter. lives in a country ruled by an oppressive regime, to communicate confidentially with Bob, who resides outside the country. Alice 2. THREAT MODEL does not need to know any information, but Bob’s public key. The adversary we are facing, Tyria’s Censor, is a country-wide 2We are strictly speaking about traditional ways to leak messages authority that monitors and interacts with online communications. to the Internet through network communication. 3. Alice uses the miab client software to embed software to embed a message M into the photos. The message is hidden using public-key steganography [58] (PKS), with Bob’s public key. We will discuss the available choices for the steganographic algorithm in Section 3.2. 4. Alice publishes the blog post, containing the processed photos. Alice can choose the blog arbitrarily, provided it supports blog pings. In Section 5.1, we will show that most of the popular blog engines provide blog pings. 5. The blog emits a to some ping servers. 6. Meanwhile, Bob is monitoring some of the ping servers, Figure 1: Alice sends a message to Bob, using the miab looking for steganographic content encrypted with his public protocol in its basic form. key. Within minutes, he discovers Alice’s post, and decrypts the message. In particular, we aim to satisfy these properties for miab: 7. Bob reads the message, and acts upon its content (more on • Confidentiality: The Censor should not be able to read this later). the messages exchanged between Alice and Bob. This concludes the miab protocol in its basic form; we will expand • Availability: The Censor should not be able to block miab it further in Section 4. without incurring unacceptable costs (by indiscriminately Distributing the software bundle. To use miab, Alice blocking large portions of the Internet). needs a copy of the software, and Bob’s public key. The key • Deniability: When confidentiality holds, the Censor should should rarely change, so it can be distributed with the software. not be able to distinguish whether Alice is using the system, This bundle can be downloaded in clear-text before the Censor or behaving normally. becomes too controlling. Otherwise, it can be distributed via • Unobtrusive deployment: Deploying a instance miab USB drives, or spam. Also, it could be published in multiple should be easy and cheap, so that small organizations or locations online, encoded in a variety of formats (e.g., DeCSS code private citizens with some disposable income (e.g., Bob) diffusion [1]), so that the Censor cannot easily fingerprint and can do it. block them all. In this case, Alice should collect several of these We will give a detailed analysis about to which extent we bundles and compare them, to mitigate the possibility that she is achieved these properties in Section 6. using bundles that were disseminated by the Censor. However, The only requirement that Alice must satisfy to use this protocol this possibility cannot be ruled out with certainty. We expect is to be able to make a blog post. She can create this post on that the Censor will also have a copy of the software. any blog hosted (or self-hosted) outside the Censor’s jurisdiction. Alice’s need for the initial bundle is certainly a shortcoming of miab. However, distributing the software and establishing a 3.1 Components root of trust is a fundamental problem, with no easy solution, Before explaining the details of the miab protocol, we must that affects every related work (e.g., Collage, Tor – see Section 7). introduce the notion of blog pings, a concept that will play a Moreover, miab represents a step toward solving this fundamental crucial role in our system. A blog ping is a message sent from a problem, as its bundle need updates only in exceptional cases (e.g., blog to a centralized network service (a ping server) to notify the Bob’s key has expired), and therefore can be used to bootstrap server of new or updated content. Blog pings were introduced other protocols, distributing the rendezvous points. We note that in October 2001 by Dave Winer, the author of the popular also Telex has this property, but it needs the collaboration of ping server weblogs.com, and are now a well-established reality. ISPs to be deployed, whereas miab does not. Over the last ten years, the rising popularity of pings pushed Steganographic scheme. The choice of an appropriate for the development of a wealth of protocols that compete for steganographic scheme is critical to achieve our confidentiality performance and scalability (e.g., FriendFeed’s SUP [29], Google’s and deniability goals. Ideally, we would like to have a perfectly PubSubHubbub [31], and rssCloud [48]). secure stegosystem, which is a system where the stego object (i.e., Search engines use blog pings to efficiently index new content in the object with an embedded hidden message) exactly matches real time. Since search engines drive a good part of the traffic on the probability distribution of the cover source (i.e., the object the Internet, blog writers adopt pings to increase their exposure before the embedding). Solutions to this problem exist [49,61], but and for Search Engine Optimization. Consequently, the vast they require an exact knowledge of the probability distribution of majority of blogging platforms support pings, and have them the cover source. Unfortunately, this knowledge is hard to obtain, enabled by default (e.g., Wordpress, Blogger, Tumblr, Drupal). if not impossible [9], from real digital media, such as photos. To overcome this setback, these solutions opt to artificially generate the cover object. More pragmatic approaches, instead, focus on 3.2 The miab Protocol hiding messages in real digital media by minimizing perturbations, Our scene opens with Alice, who lives in a country controlled in the hope that the image noise will make these alterations appear by the Censor. Alice wants to communicate with Bob, who is inconspicuous. The positive dependence between the fraction of residing outside the country, without the Censor ever knowing changes in the cover and their detectability is formalized under that this communication took place. To do so, Alice sends a the “square root law of steganography” [27]. Fortunately, miab message with the miab protocol, following these steps (also shown needs to embed a very limited payload, consisting only of a few in Figure 1): hundred bytes, which are embedded in cover photos whose size is 1. Alice authors a blog post of arbitrary content. The content in the hundreds of Kbytes. To minimize our detectability, then, should be selected to be as innocuous as possible. we need to minimize the alterations made during the embedding, 2. Alice snaps one or more photos to include in the post. and to curtail our changes in difficult-to-model areas of the digital media. Among the current state-of-the-art proposals in this field, we selected, and used in our experiments, an adaptive LSB-matching steganographic approach that chooses the location of the pixels to alter with the help of Syndrome-Trellis Codes (STCs) [25]. LSB-matching, also called ± embedding, consists in randomly adding ±1 to the least significant bit of pixels to match the hidden message. LSB-matching has been shown to be near optimal when only a single pixel can be utilized [23]. Pure LSB-matching schemes have been broken [30,45], exploiting an invalid underlying assumption of LSB-matching, which considers natural image noise to be independent among pixels. Using STCs, LSB-matching can be augmented by choosing the pixels in which to embed the message wisely, minimizing detectable Figure 2: Covert messaging with miab. alterations. In particular, we chose a near-optimal algorithm [26] that minimizes distortions. This algorithm first associates a cost To bootstrap a Collage installation, the database of tasks that to each possible pixel alteration, and then it finds the least costly Collage employs must be distributed offline. This database needs series of modification needed to embed the message. Both its to be constantly updated as the Censor reacts and the drop space and time complexity are linear with respect to the size of sites change (both in location and structure). It is, therefore, the cover source. LSB-matching with STCs is also the approach crucial that this bootstrap database is up-to-date: Otherwise, the chosen by HUGO [46], which is a stegosystem that is optimized agreement on the drop points between sender and receiver will to embed large messages. Although miab does not benefit from be lost, breaking the communication channel. Once Collage has HUGO’s efficient embedding, as its messages are short, this been bootstrapped, further updates can be received through the stegosystem can give us a reference point about the state of the Collage network. To receive these updates, however, the Collage art in the steganalysis of a system close to ours. This is because client must be connected to the Internet. When the connectivity HUGO has been targeted by steganalysis experts from all over the is sporadic, the client’s database might become obsolete, and a world, in the first international challenge on steganalysis called new bootstrap round will be necessary. “Break Our Steganographic System” (BOSS [6]). For details on We believe that miab is a good fit for bootstrapping Collage, HUGO’s resilience to attacks, see the next section. If we consider because the only information that Alice must know in order also that, as we will show in the next section, tens of thousands to request a current copy of the task database is Bob’s public of photos are published in blog posts around the world every key. miab could also be used to communicate with a censorship- minute, we expect that, if our adversary tries to detect stego resistant micro-blogging service, like #h00t [4], or a privacy- images in the blog posts stream, she will be overwhelmed by the preserving one, like Hummingbird [16]. sheer number of false positives. Finally, we would like to note Covert Messaging. Another option to achieve two-way mes- that the protocol is decoupled from the particular scheme in miab saging is to extend the miab protocol to let Bob reply to Alice. use, and swapping schemes is quite easy. Because of this, a miab To do so, Alice will need to inform Bob about the next rendezvous implementation can support multiple PKS schemes, much like point, and about the steganographic scheme Bob should use in the SSL/TLS protocol does with cipher suites. In this scenario, his reply. If Bob is a human (i.e., he is not running an automated Alice can pick the PKS scheme she feels safe to use, and encode service), Alice will run the same algorithm presented in Section 3.2, her choice into a pre-determined place in the blog post (e.g., the with a small modification in Step 3. In particular, fist letter of the title will decide the scheme). Other schemes • Alice will choose the next rendezvous point R (e.g., site_ that we have been considering are Gibbs constructions [24], and 1 a.com in Figure 2). As rendezvous point, Alice will choose Greenstadt’s work on an image-steganography scheme that can the URL of a web page where Bob can post additional withstand re-encoding [32]. content: for example, a forum, a media-sharing service, or an online newspaper that lets its users comment the news. 4. ACHIEVING TWO-WAY MESSAGING If possible, Alice should indicate a website that she routinely The miab protocol can be used to achieve two-way communi- visits, so that the Censor will not be alarmed when she cation in two ways: either by bootstrapping a more demanding visits that URL. This website should be located outside censorship resistent protocol (that requires more than just Bob’s Tyria, so that the Censor cannot see if Bob is visiting the public key to be initiated), or with a channel-hopping protocol site (the censor would need to know Bob’s IP address, but that establishes rendezvous points in Internet locations chosen at miab does not prevent this from happening). will by the users. • Alice will decide which steganographic scheme S1 Bob Bootstrapping another protocol. We can use miab to should use to hide his answer. Note that now Alice is bootstrap Collage [12], which is a protocol that uses user-generated not restricted to PKS schemes: She can also indicate a content (e.g., photos, tweets) as drop sites for hidden messages. shared-key scheme, along with a secret password to initialize Differently from the miab approach, in Collage, the drop sites the scheme. must be decided upon in advance: These rendezvous points are • If Alice intends to send additional messages to Bob, she the secret that Alice and Bob share. To put and retrieve the should also specify an additional rendezvous point, R2 messages from these rendezvous points, the users of Collage have (site_b.com in the Figure), and steganographic scheme S2. to perform a series of tasks. For example, if the drop site is a Alice will then append this information to the message M, which photo posted on Flickr under the keyword “flowers,” the sender she will then embed in the photos. Following the usual protocol, task will be the series of HTTP requests required to post a photo Bob will then recover Alice’s message, and obtain R1, R2, S1, and with that keyword, and the receiver task will be composed of S2. Bob will then prepare his reply to Alice, and post it (Point requests designed to retrieve the latest photos with that tag. 8 in Figure 2), encoded with S1, on the rendezvous point R1. Alice will periodically check R1, looking for Bob’s reply (Point 9 in the same Figure – we will empirically show in Section 5 that this wait is less than ten minutes). Alice could check R1 manually or, if the rendevouz point supports it, by email or RSS feed. When she finds Bob’s message, she will decode it with S1, completing the two-way messaging cycle. If Alice requires any further communication with Bob, she will post her reply, encoded with S2, on the rendezvous point R2, which Bob will periodically check. Following the guidelines for the previous message from Alice, this reply will also contain R and S , if Alice expects 3 3 Figure 3: BOSS false positives, false negatives and ac- an answer from Bob (and, optionally, R and S , if she plans 4 4 curacy, for the winning team and all teams combined to send additional messages). If Bob is running an automated (by simulating collusion among teams) service, Alice will also need to instruct the service on how to experiment. Without it, miab can be run on a single machine. reply. To do so, much like a Collage task, she will add to M the We have successfully done so, and the code published online refers sequence of actions that the automated service must follow to post to this second version. a reply (e.g., perform a POST request to http://site_a.com, Ping server. We use webblogs.com. This server is the oldest with payload comment=ANSWER). Alice has the option to provide ping server of the web, and receives million of pings every day. the cover channel for the automated service’s reply (i.e., she could For example, blogs hosted on Google’s blogger.com ping this write a comment relevant to R , or an image, and embed it in M). 1 server by default. The server does not allow real-time monitoring Otherwise, the automated service will embed the reply into a spam of pings, but releases a tarball with the latest pings every five message, which will then post. Spam messages can be generated minutes. (e.g., see [39]), or real spam messages can be collected from a set of Steganography For our experiments, we have implemented email accounts. Note that, for the sake of simplicity, in describing the stegosystem described in Section 3.2. To embed a message, this covert messaging protocol we assumed that Alice choses the first we encrypt it with RSA using Bob’s public key, then we rendezvous points and encryption schemes manually. However, apply our tool. As we mentioned in Section 3.2, we can estimate this can be automated, so that Alice can use the protocol without our resilience to steganography by looking at how effectively the hassle. In particular: • miab’s client software can detect the available stegano- HUGO stegosystem was attacked during the BOSS competition. graphic schemes on Alice’s computer, and pick one at For the reader’s convenience, a summary of the results is given in random Figure 3. Since miab’s embedding rate is at least two orders of • Alice, with the help of a browser extension, can collect po- magnitude smaller than the one used during BOSS, and that the tential rendezvous points while surfing the web in her usual square root law of steganography applies, we expect a dramatic routine. The extension would observe when Alice makes decrease of detection rates, and increase of false positive rates. POST requests containing files or long strings, without being In particular, a direct application of this law indicates that we authenticated to the website she is interacting with (e.g., should expect single digit detection ratios. Unfortunately, we over clear-text HTTP, or without a Cookie HTTP header) cannot verify this, as the detection systems used during BOSS , and record them as possible rendezvous points. Essentially, have not been release publicly. Even if we consider the detection the extension would observe Alice’s browsing, looking for rates of Figure 3, the number of false positives detected is likely commenting systems and media-sharing web services. to overwhelm the Censor, considering the vast prevalence of clean Once one of these rendezvous point is used in the covert photos in the blogs’ stream. Also, Alice can throw the Censor messaging, the browser extension can collect the answer off track by publishing a few clean images for each stego photo automatically, after waiting for the appropriate time for published. Bob to answer (see Section 5 for details). Since our stegosystem does not use a key, the Censor will be able to extract the hidden encrypted message, if there is any. If there is none, the Censor will just extract gibberish. If it is 5. IMPLEMENTATION possible to tell the difference between the two kinds of extracted To demonstrate the feasibility of miab, we have implemented a data, the Censor will break our stegosystem. Since our messages proof-of-concept application that can be used to post anonymous are encrypted with RSA, they will have high entropy, which is messages on Twitter, circumventing the block on social networks a feature that can be easily quantified to be used for machine- imposed by Tyria’s Censor. learning classification. However, our stegosystem is designed to To provide a more open evaluation, we have published the select the pixels in which to embed the message preferring noisy, code online. The application can be downloaded at http://www. hard to modify areas of natural photos, which are also highly message-in-a-bottle.us, together with our public key. entropic. To check if it is likely that the Censor will be able We have set up a small cluster of machines that monitors one to use the extracted data to tell apart clean images from the of the most popular blog ping servers (webblogs.com), looking ones with stego content, we have trained a few classifiers that for miab messages (our cluster plays Bob’s part in the algorithm). aim to do so. First, we have prepared two datasets. In the first When a message is found, its content is posted on Twitter under dataset, we have applied (the decoder part of) our stegosystem the handle @corked_bottle. to extract gibberish from 5,000 clean photos downloaded from Also, the code can be used with a new public/private key-pair Flickr. To reach a broad number of photographers, we selected to verify how messages are recovered. the photos using Flickr popular tags. In the second dataset, Blog-fetching cluster. We use four servers with Intel Xeon we have encrypted with RSA 5,000 random messages. From 2.40GHz processors, 4 Gigabytes of RAM, and high-speed Internet these datasets, we extracted features that quantify the entropy connection. The need for a cluster of machines comes from of the strings. In particular, we have calculated the Hamming’s the additional overhead of our detailed data collection for this entropy, the Chi-Square test’s p value, the arithmetic mean of the byte, the Monte Carlo approximation for π, and the correlation since the title is usually part of the URL. Of course, the Censor coefficient between consecutive bytes. The precision and recall will also be aware of this, so this device will reduce the workload of the best in precision for each class of classifiers is shown in equally for both the Censor and Bob. Table 1. We attribute the poor performance of the classifiers to The maximum length of the embedded message depends on the combination of the shortness of the messages and the natural the channel and the PKS scheme in use. and digital noise in the photos, that is empirically entropic enough In our scheme, we enforce a size limit of 450 bytes, since as the to be hardly distinguishable from RSA-encrypted messages. covert message becomes larger, the chance that the image might be flagged as suspicious by the Censor also increases. Classifier Precision Recall Scale. miab should cope well with a high volume of hidden RSA Flickr RSA Flickr messages. A single machine can accommodate 15-20 million posts K-NN (k = 14) 52.05% 51.37% 41.76% 61.52% per day. To put this in perspective, if the entire population of Iran Naive Bayes 50.11% 50.98% 90.48% 9.90% ANN (2 hidden layers) 50.14% 50.09% 40.36% 59.86% would start blogging daily, Bob would need five more machines. SVM 48.15% 48.77% 58.52% 38.52% Also, a large number of useless posts made by miab users should Vote 82.56% 17.08% 49.89% 49.48% not affect search engine results, because search engines already deal with a high volume of spam pings [36]. Table 1: Best classifiers for each class. Choosing the right blog. To send a message through miab, Certificate. We ship a 2,048-bit RSA public key. Alice has to publish a post on a blog that supports pings. Alice might already have posting privileges on such a blog: In this case, 5.1 Evaluation she can just use it. Otherwise, Alice needs to open one, or be added as an author to an existing one. It is therefore interesting miab relies on Bob’s ability to fetch and process all the blog to see how difficult it is for Alice to open a blog supporting pings. posts created on the Internet in real time. To prove that this In Table 2, we show the four most popular blogging platforms is feasible, we have done so with our cluster, as well as our for the Alexa’s top million websites in February 2012. These single-machine version of miab. platforms account for more than 85% of all blogs, and they all Over the period of three months (72 days), we have seen support pings. In Table 3, we show the platform that the 100 814, 667, 299 blog posts. The average number of posts seen per most popular blog chose to use in 2009. Of these, at least 75 day is 11, 314, 823, and the highest traffic we have experienced is support pings. 13, 083, 878 posts in a day. If the blog Alice has chosen does not support blog pings, Alice During the course of a day, we have found that the rate of blog can perform the ping manually, as most ping servers offer this posts is quite stable, with an average of 44, 146 posts every five feature (e.g., http://blogsearch.google.com/ping). minutes, and a low standard deviation of 3, 963. We observed that approximately half of the blog posts contain photos. Performance. The ping server we chose releases every five Platform Number of sites (%) Ping support minutes the pings that it received during that period. To keep WordPress 63.49 yes Joomla! 11.17 yes up with the ping rate, we process all the blog posts of one chunk Drupal 8.57 yes of pings before the next one is released. Therefore, the maximum Blogger 3.14 yes latency between the publishing of a post with miab content and our tweeting is ten minutes (five minutes in the ping server’s Table 2: Blogging platform popularity: Alexa’s top queue, and another five before we process the post). million (source: builtwith.com [11]) An advantage of the stego algorithm we have chosen is that it is much faster in extracting potential hidden messages than Platform Number of sites Ping support embedding them. On average, it takes 0.56 seconds per image to WordPress 32 yes embed a message, and 0.02 seconds to recover it. In particular, TypePad 16 yes Moveable Type 20 yes performing exclusively our steganographic extraction on a 5- BlogSmith 14 n/a, proprietary minute pings chunk takes, on average, 2m:51s on a Intel-i7 eight- custom made 8 n/a core laptop. Performing exclusively the RSA decryption takes Drupal 4 yes Blogger 3 yes 2m:35s. Fetching all the images from the blogs takes an average Expression Engine 1 n/a, proprietary of 4m:17s. Since the steganographic test is CPU bound, and Scoop 1 no the blogs fetching is I/O bound, the two operations can execute Bricolage 1 no, CMS simultaneously with an acceptable loss in performance: processing a five-minutes chunk of pings with the full miab, with either Table 3: Blogging platforms: 100 most popular blogs scheme, completes under five minutes on average, with a processor (source: pingdom.com [47]) load under 90%. Estimating the blocking set. The Censor might attempt to Depending on the running time and memory requirements of block all blogs by blacklisting their domain names, or IP addresses. the PKS scheme Alice and Bob have chosen, a single machine may It is essential to miab’s availability that the blocking set (that is, not be able to cope with the workload. To solve this issue, Bob the set of domains or IP addresses to block) is large, as this makes can opt to scale horizontally, adding more machines. Otherwise, the maintenance of a blacklist impractical (this will be further Alice and Bob may agree beforehand that Bob will only fetch and discussed in Section 6). To estimate the size of the blocking set, analyze blog posts whose URL has some particular properties. For we generated three blacklists from three months of recorded pings, example, they might agree that blog posts with stego messages targeting FQDNs, second-level domain names, and IP addresses 3 will have a URL whose length is even (or odd). In this example, (see Figure 4). this agreement would reduce the workload approximately by 3Note that some blogs advertise only their IP address in their a factor of two, and Alice can make sure that Bob will keep pings: this is why the number of IP addresses shows a faster examining her posts by choosing the blog post title accordingly, trend than the number of second-level domains. ism [28]. Despite the block, Kazakh bloggers have moved to self-hosted or less popular blogging platforms, and continue to grow in numbers [35]. There are over 165 million blogs on the Internet [8], and a good part of them are self-hosted, making the maintenance of a blacklist impractical (see Section 5.1). Even if the Censor is able to fingerprint blogs and successfully block each of them, there are other services that generate blog pings, for example, commenting services (e.g., Disqus, which is installed on 750,000 websites). Also, for her miab messaging, Alice might open a blog in favor of the ruling party: The Censor will be less likely to block that, because it shows that the population is supporting the regime. Figure 4: Estimating the blacklist size, when target- The Censor might also block any domain that emits a ping. ing second-level domain names, fully-qualified domain We discussed how to evade a blacklist in Section 5.1. Also, this names, and IP addresses. approach is vulnerable to a denial of service attack: Bob could Bypassing the blacklist. These blacklists are cumbersome forge bogus pings claiming to be any arbitrary domain (blog in size. For example, according to Netcraft, during the time of our pings, just like emails, do not authenticate the sender), forcing experiment, there were 662, 959, 946 FQDNs running web servers the Censor to add it to the blacklist. Iterating this attack, Bob on the Internet [41]. In our experiment, we would have blacklisted can de facto render the blacklist useless. To counter this, the 33, 361, 754 FQDNs, which is 5% of the ones found by Netcraft. Censor will have to maintain a whitelist of allowed domains. This Blocking second level domains is also not practical. In our is harmful to the regime, because of the social and economic costs experiment, we blacklisted 1, 803, 345 of them, which account for of over-blocking, and can be easily circumvented (as we discussed 1.4% of the global domain registrations (which are 140, 035, 323, in Section 5.1). according to DomainTools [18]). Note that our blacklists only The Censor might want to block pings, or shut down ping cover blogs that sent pings during our experiment. To better servers. This attack is irrelevant when the blog (or ping server) is quantify this, we checked the (hypothetical) blacklists’ coverage hosted outside the Censor’s jurisdiction. on the pings that were emitted the day after we stopped updating The Censor might try to prevent Alice to post on any blog. the blacklists. We found that the blacklist targeting FQDNs had This requires both the ability to fingerprint a blog, and the 12% miss ratio, the one targeting second level domains had 11% technological power to deeply inspect the network traffic to detect miss ratio, and the one targeting IP addresses had 11% miss ratio. and block posting. The cost of this attack increases as Alice Even if the Censor chooses to pay the price of over-blocking will most likely be using a TLS connection (i.e., https://) to and enforces these blacklists, Bob and Alice can still communicate make the post, requiring the Censor to either man-in-the-middle through miab, thanks to a weakness in this blacklisting approach. the connection or perform statistical timing attacks. Also, Alice Bloggers usually associate multiple domain names/IP addresses can create a post in a variety of ways, for example using client to their blog: on blogger.com, this is configurable in the “Basic applications or by email (as many blogging platforms support Settings” admin page. These domain names constitute the entry that). The Censor therefore has to block all these vectors. points to the blog. Blog pings contain only one of these entry The Censor might try to coerce content hosts to drop miab points. Bob only needs one of these entry points to visit the blog. content. However, to do so, he needs to identify such content Instead, the Censor needs each one of the entry points, because (when deniability should prevent him from doing so). Also, the he wants to prevent Alice from accessing her blog. So, Alice can Censor has to drop the content fast: Bob will fetch the blog after side-step the blacklist using an entry point not included in her just a few minutes, since blog pings are sent in real time. blog’s pings, which the Censor cannot easily discover. The Censor might try to overwhelm miab by creating large quantities of bogus messages. Bob should be able to provide a fair 6. SECURITY ANALYSIS service to its users by rate limiting blogs, and ban them when they exhibit abusive behavior. Since Bob will first inspect the list In this section, we discuss miab’s resilience against the attacks of pings, he will avoid fetching blogs that are over their quota or that the Censor might mount to break the security properties blacklisted. Also, search engines will fetch the same blog pings that miab guarantees: availability, deniability, and confidentiality. and will analyze and index their content. Since the Censor, to Since we lack a formal verification of the security of the stegano- create a large number of blog posts, will have to either generate graphic primitives, and the powers of a real Censor are unknown, the content automatically or replay content, search engines should these goals are necessarily best effort. detect his behavior and mark it as spam. Blog ping spam (or Availability. The Censor might prevent Alice from sending sping) is already a problem that search engines are facing, since messages through miab. Our main argument against this is that pings are used in Search Engine Optimization campaigns. Because the Censor will be forced to ban a large part of the web, and of this, much research has been invested into detecting this type he is likely unwilling to do so because of the pushback from its of behavior. Bob could leverage that research by querying the population and economic drawbacks. We will now describe in search engine for the reputation of blog or blog posts. Using more detail the attacks that the Censor might devise to limit a combination of blacklists, rate limitation, and search engines availability. For this discussion, we assume that deniability holds: reputation systems, Bob is likely to thwart this attack. In particular, the Censor cannot detect content generated with The Censor might perform traffic analysis to distinguish regular miab. blog posts from ones. However, since the blog post is created The Censor could block blogging platforms. In fact, this has miab by a human, this is unlikely to be successful. already been attempted: Kazakhstan has been banning blogging The Censor might re-encode images to remove any stegano- platforms and social networks since 2008 [35]. In 2011, a court graphic content. Since the censor cannot find all the entry points established they contributed to the spread of religious terror- to the blogs, he must re-encode every image crossing his country’s Unobtrusive deployment. With our proof-of-concept ap- borders (not only blogs). Moreover, miab can easily evolve and plication, we have demonstrated that it is possible to run a miab hide the ciphertext in other parts of the post (e.g., in the text), or instance on a single modern machine with a fast domestic In- use a steganographic system that can withstand re-encoding (see ternet connection. We recommend a 100Mbps connection, such also Section 3.2). For example, a technique to do so is used in the as the one provided by Comcast in the US, also considering the shared-key steganographic scheme YASS [50], which can survive fluctuations in the speed achievable at rush hours. Therefore, a second compression, additive noise, and light filtering. The any private citizen with some disposable income, and living in a steganographic primitive and channel (images, videos, or text) country with a modern offering for Internet connectivity, can run can be chosen at will, and should be selected when deploying a miab instance. Moreover, this burden is sustained only by Bob; miab, considering the particular Censor faced. Alice has minimal requirements to participate in the protocol. Deniability. The Censor might try to identify who is using miab, thus compromising deniability. 7. RELATED WORK The Censor might try to detect steganography. uses miab There has been an extensive and ongoing discussion on anony- steganography as a primitive, so its deniability relies on a good mous and censorship-resistant communication. In this section, steganographic algorithm, and will benefit from any advancement we review the main directions of research, highlighting how they in steganography. Also, the Censor will need to detect stegano- measure against the goals that we have set for miab. graphic content with a very low fraction of false positives, because Anonymization proxies. The most established way to achieve of the vast prevalence of clean image in blog posts. For more online anonymity is through proxies, possibly with encrypted details, see in Section 3.2. traffic [3,37]. In 2010, a report from Harvard’s Center for Internet The Censor might try to detect anomalous blogging behavior. & Society [44] shows that 7 of the 11 tools with at least 250,000 Since Alice manually publishes the post, it will be difficult to unique monthly users are simple web proxies. detect anomalous behavior in a single post (provided that the These systems focus on hiding the user identities from the message that Alice composes is innocuous). However, the Censor websites they are surfing to. They have a low latency, which might detect an anomaly if the posting frequency is higher than makes web surfing possible (which miab does not). In doing the other blogs hosted on the same platform. Alice can mitigate so, they do not satisfy any of the goals of miab: They can be this risk by keeping under control her post rate. Also, Alice blocked, since it is possible to discover the addresses of the proxies can use a blogging platform that encourages frequent posts with and block them (low availability). Also, even if the traffic is photos (e.g., a photoblog, like Instagram). Alice might also have encrypted, the users cannot deny that they were using the system multiple online personas, each of which posts on a different blog. (no deniability), and it is possible to mount fingerprinting and The Censor might try to run an instance of to trick Alice miab timing attacks to peek into the users’ activity [10,33]. into communicating with him. This is a limitation common to any One of the first systems that address deniability is Infranet [20]. system that relies on a public key to identify the receiving party. Uncensored websites deploying Infranet would discreetly offer This is mitigated by publishing Bob’s public key in a variety of censored content upon receiving HTTP traffic with steganographic formats on the Internet, so that the Censor cannot find them and content. Availability is still an issue, since the Censor might block or alter them. discover and block these websites, so the collaboration of a large The Censor might try to perform a timing attack, correlating number of uncensored websites becomes essential. the effect of the action specified in the message (in our miab Mix/onion networks. Mix networks (e.g., Mixminion [15]), implementation, the posting of a new tweet) with the publishing of and Onion networks (e.g., Tor [17]), focus on anonymity and a new post. We can mitigate this by letting the user specify miab unlinkability, offering a network of machines through which users when he wants the aforementioned action to be taken, inserting can establish a multi-hop encrypted tunnel. They also typically random delays and generating decoys (in our implementation, have a low latency, which makes web surfing possible (in contrast tweeting a message that was not received). with miab). In some cases, depending on the location of the The Censor might try to perform a replay attack. Let’s suppose proxies with respect to the Censor, it is possible to link sender and that the Censor suspects that Alice used to post a message M miab receiver [7,40]. These systems do not typically provide deniability, to Twitter. To confirm his suspects, the Censor would collect blog although Tor tries to mask its traffic as an SSL connection [55]. posts from Alice, and publish the same posts on a blog platform The availability of these systems depends on the difficulty to under his control. If the same message M is tweeted when the enumerate their entry points, and their resistance to flooding [19]. posts are re-published, the Censor’s suspicion is probably correct. Protecting the entry nodes has proven to be an arms race: For The Censor can repeat the process to rule out any remaining example, the original implementation of Tor provided a publicly- doubt. Replay attacks can be thwarted by including a hash of accessible list of addresses. These entry points were blocked in the URL of the blog post in the message. After receiving the China in 2009 [56]. In response, Tor has implemented bridges [52], message, Bob will check that the hashed URL in the message which is a variation on Feamster’s key-space hopping [21]. However, matches with the one of the post, and will discard the message if these bridges have some limitations that can expose the address the check fails. Also, Bob could keep a hash of the photo in a of a bridge operator when he is visiting websites through Tor [38]. Bloom filter, to avoid reuse. Also, distributing the bridges’ addresses without exposing them Confidentiality. To break confidentiality, the Censor must get to the Censor is theoretically impossible (since the Censor could access to the plaintext of the message that Alice sent. The message play Alice’s role, and Tor cannot distinguish one from the other). is encrypted with a public key, of which only Bob knows the However, it might be possible to increase the cost of discovering a private counterpart. Assuming that the cryptographic primitive bridge so much that the Censor finds it impractical to enumerate is secure, the Censor will not be able to read the message. Also, them. This problem remains very complex and, so far, none of the Censor might run an instance of to trick Alice, as we miab the approaches attempted has succeeded: Proof of this is the fact discussed in the previous section. that China has been blocking the vast majority of bridges since 2010, as the metrics published by Tor show [57]. An interesting variation on proxy-based censorship circumven- message fetching, but this decision would result in unbounded tion is described in Fifield et al. [22]. Here, a group of Internet delays between the posting of the message and its retrieval, chal- users run an Adobe Flash application in their web browser. This lenging the availability of the system. An effective solution for application creates a short-lived proxy (in their implementation, this denial-of-service is to generate a separate task list for any the proxy is for the Tor network), so that these users can con- pair of people that need to communicate with each other. This tribute a part of their bandwidth to the network. These ephemeral can be done using miab as a primitive to disseminate these web proxies lower the barrier to set up a Tor proxy, and their databases to the parties, so that they can bootstrap a Collage short lives make their enumeration more cumbersome. However, session. Once the two parties are exchanging messages, they can a central part of this system is a special service, called the Facili- arbitrate the Collage channel they want to employ, depending on tator, whose job is to keep track and distribute proxy addresses. the performance that their communication requires. By sniffing the Facilitator’s traffic, the Censor can learn the Telex is a system that should be deployed by Internet Service proxy addresses, as they get requested from the clients. The Providers (ISPs) that sympathize with the dissidents. A user authors point out that, once the system gets popular, keeping using the Telex system opens a TLS connection to an unblocked a blacklist might be cumbersome, because of the sheer number site (that does not participate in the protocol), steganographically of available proxies. Also, as the author point out, the Censor embedding a request in the TLS handshake. When the ISP might flood proxy registrations, and exhaust the list of proxies discovers the request, it diverts the SSL connection to a blocked that the Facilitator keeps. This, also, might be mitigated by sheer site. While Telex deniability is sound, it can be subject to a numbers, as the authors point out. denial of service by the Censor. Also, the Censor can discover Anonymous publishing. Here, the focus is on publishing ISPs implement Telex by probing the paths to various points on and browsing content anonymously. Freenet [13] and Publius [60] the Internet, and prevent the traffic originating from the country explore the use of peer-to-peer networks for anonymous publishing. from going through those ISPs. Overall, Telex comes close to These systems deliver anonymity and unlinkability, but do not satisfying the goals that we set for miab. However, it requires provide deniability or availability, as the Censor can see where the ISP collaboration, which is hard to set up, as the authors the user is connecting, and block him. Another direction that is mention. miab, on the other hand, requires minimal resources being explored is to make it difficult for the Censor to remove (a good Internet connection and a single machine should suffice, content without affecting legitimate content (Tangler [59]). This when implemented efficiently), and hence, it can be setup today, system suffers similar pitfalls as Freenet, although availability of as we demonstrated by hosting our public implementation of this the documents is improved by the document entanglement. system. Finally, we would like to stress that the Censor we are Deniable messaging. Systems for deniable messaging can facing in miab is capable and willing to man-in-the-middle or be split into two categories: The ones that require no trusted block HTTPS connections (e.g., like China [42], and Iran [54] peer outside the country (e.g., CoverFS [5], Collage [12]), and are doing). If this is not true, there is a simpler solution to our the ones that do (e.g., Telex [62]). CoverFS is a FUSE file problem: the usage of a third-party service. For example, Alice system that facilitates deniable file sharing amongst a group of could send a message though one of the many webmail services people. The file system synchronization happens though messages hosted outside the censored country (e.g., Gmail) to an address hidden in photos uploaded to a web media service (Flickr, in their where an anti-censorship service run by Bob is listening. In fact, implementation). As mentioned by the authors, while CoverFS this is one of the channels used to distribute Tor bridges (by focuses on practicality, the traffic patterns could be detected by a sending an email to [email protected]). This method powerful Censor. This would affect its availability and deniability. is highly available, because of the great number of free webmail Collage, which we already described in Section 4, improves offerings. However, even if Alice and Bob have a deniable method CoverFS’ concept of using user-generated content hosted on of embedding and retrieving messages from ordinarily-looking media services to establish covert channels. Collage has higher emails, the webmail service will have an indisputable proof that availability, as the sites where the hidden messages are left change Alice and Bob exchanged something, and the Censor might, at over time, and it provides a protocol to synchronize these updates. any point in the future, pressure the webmail service to release As the authors point out, Collage deniability and availability that information to her (e.g., like China did by wire-tapping are challenged when the Censor performs statistical and timing Skype [34]). Instead, a steganographic message published in a attacks on traffic. Also, if the Censor records the traffic, he can broadcast channel, like miab’s, will be out of reach from the join Collage, download the files where the messages are embedded, Censor, at least as long as Bob’s private key is kept safe. and find out from his logs who uploaded them. To join the Collage network, a user needs to have an up-to-date version of 8. CONCLUSIONS the tasks database, which contains the rendezvous points. Since We have introduced miab, a deniable protocol for censorship its distribution has to be timely, this is challenging (and can resistance that works with minimal requirements. miab can be be solved with ). Also, without regular updates, the tasks miab used as a standalone communication system, or to bootstrap database can go stale, which requires a new bootstrap. protocols that achieve higher performance, at the cost of more In Collage, the tasks defining the rendezvous points are gener- demanding requirements. We have demonstrated miab feasibility ated manually. Although every media-sharing site can potentially with the implementation of a proof-of-concept prototype, and be used with Collage, generating all these tasks requires substan- released its code open-source. The deployment of a miab instance tial work. , instead, can use millions of domains out of the miab requires minimal configuration and resources, since it relies on box without human intervention. well-established and popular technologies (blog pings and blogging A Censor might also join Collage and disseminate large quan- platforms). Also, we have shown that miab is resilient to attacks tities of bogus messages. Since the burden of finding the right to its availability, deniability and confidentiality. Although a message is on Collage’s users, they would be forced into fetch- powerful Censor might be able to disrupt miab, in doing so he ing many of the bogus messages, thus exposing themselves to will suffer a high cost, effectively cutting the population of his statistical attacks. This could be mitigated by rate-limiting the jurisdiction out of a major part of the Internet. 9. ACKNOWLEDGEMENTS SPIE, Electronic Imaging, Security and Forensics of Multimedia This work was supported in part by the Office of Naval Re- Contents XI (2009). [28] France-Presse, A. Kazakhstan blocks popular blogging platforms, search (ONR) under grant N000140911042, the Army Research 2011. Office (ARO) under grant W911NF0910553, the National Science [29] FriendFeed. Simple update protocol. Foundation (NSF) under grants CNS-0845559 and CNS-0905537, code.google.com/p/simpleupdateprotocol. and Secure Business Austria. [30] Goljan, M., Fridrich, J., and Holotyak, T. New blind steganalysis and its implications. In Proceedings of SPIE (2006). [31] Google. Pubsubhubhub. code.google.com/p/pubsubhubbub. 10. REFERENCES [32] Greenstadt, R. Zebrafish: A steganographic system. MIT (2002). [1] Gallery of css descramblers. www.cs.cmu.edu/~dst/DeCSS/Gallery/. [33] Hintz, A. Fingerprinting websites using traffic analysis. In Privacy [2] Agence France-Presse. Pakistan blocks Facebook over Enhancing Technologies (2002). Mohammed cartoon. www.google.com/hostednews/afp/article/ [34] Human Rights Watch. Letter from human rights watch to skype ALeqM5iqKZNUdJFQ6c8ctdkUW0C-vktIEA. and skype’s response. www.hrw.org/node/11259/section/19. [3] Anonymizer. Home page. plone.anonymizer.com. [35] Kelly, S., and Cook, S. Freedom on the net. Freedom House [4] Bachrach, D., Nunu, C., Wallach, D., and Wright, M. #h00t: (2011). Censorship resistant microblogging. arXiv:1109.6874 (2011). [36] Kolari, P. Pings, spings, splogs and the splogosphere: 2007. [5] Baliga, A., Kilian, J., and Iftode, L. A web based covert file [37] Labs, B. The lucent personalized web assistant. system. In Proceedings of the 11th USENIX workshop on Hot www.bell-labs.com/project/lpwa, 1997. topics in operating systems (2007), USENIX Association. [38] McLachlan, J., and Hopper, N. On the risks of serving whenever [6] Bas, P., Filler, T., and Pevny,` T. “break our steganographic you surf: vulnerabilities in tor’s blocking resistance design. In system”: The ins and outs of organizing boss. In Information Proceedings of the 8th ACM workshop on Privacy in the Hiding (2011). electronic society. [7] Bauer, K., McCoy, D., Grunwald, D., Kohno, T., and Sicker, D. [39] Mimic, S. Homepage. spammimic.com. Low-resource routing attacks against tor. In Proceedings of the [40] Murdoch, S., and Danezis, G. Low-cost traffic analysis of tor. In 2007 ACM workshop on Privacy in electronic society. Security and Privacy, 2005 IEEE Symposium on. [8] BlogPulse. Report on indexed blogs. goo.gl/SEpDH, 2011. [41] Netcraft. May 2012 web server survey. [9] Bohme,¨ R. Advanced statistical steganalysis. Springer-Verlag, 2010. news.netcraft.com/archives/2012/05/02. [10] Brumley, D., and Boneh, D. Remote timing attacks are practical. [42] Netresec. Forensics of chinese mitm on github. Computer Networks (2005). www.netresec.com/?page=Blog&post=Forensics-of-Chinese-MITM-on- [11] BuiltWith. Content management system distribution. GitHub. trends.builtwith.com/cms, 2012. [43] Noman, H., and York, J. West censoring east: The use of western [12] Burnett, S., Feamster, N., and Vempala, S. Chipping away at technologies by middle east censors, 2010-2011. censorship firewalls with user-generated content. In USENIX [44] Palfrey, J., Roberts, H., York, J., Faris, R., and Zuckerman, E. Security Symposium (2010). 2010 circumvention tool usage report. [13] Clarke, I., Sandberg, O., Wiley, B., and Hong, T. Freenet: A [45] Pevny, T., Bas, P., and Fridrich, J. Steganalysis by subtractive distributed anonymous information storage and retrieval system. In pixel adjacency matrix. information Forensics and Security, Designing Privacy Enhancing Technologies (2001), Springer. IEEE Transactions on (2010). [14] CNN. Egyptians brace for friday protests as internet, messaging [46] Pevny,´ T., Filler, T., and Bas, P. Using High-Dimensional Image disrupted. articles.cnn.com/2011-01-27. Models to Perform Highly Undetectable Steganography. 161–177. [15] Danezis, G., Dingledine, R., and Mathewson, N. Mixminion: [47] Pingdom. The blog platforms of choice among the top 100 blogs. Design of a type iii anonymous remailer protocol. In Security and royal.pingdom.com/2009/01/15, 2009. Privacy, 2003. Proceedings. 2003 Symposium on. [48] rssCloud. Homepage. rsscloud.org. [16] De Cristofaro, E., Soriente, C., Tsudik, G., and Williams, A. [49] Ryabko, B. Y., and Ryabko, D. B. Asymptotically optimal perfect Hummingbird: Privacy at the time of twitter. IEEE Symposium steganographic systems. Problems of Information Transmission on Security and Privacy (2012). (2009). [17] Dingledine, R., Mathewson, N., and Syverson, P. Tor: The [50] Sarkar, A., Solanki, K., and Manjunath, B. Further study on yass: second-generation onion router. In Proceedings of the 13th Steganography based on randomized embedding to resist blind conference on USENIX Security Symposium-Volume 13 (2004), steganalysis. Proceedings SPIE, Electronic Imaging, Security, USENIX Association. Forensics, Steganography, and Watermarking of Multimedia [18] DomainTools. Internet statistic. Contents (2008). www.domaintools.com/internet-statistics/. [51] TechCrunch. Syrian government blocks live video of crisis. [19] Evans, N., Dingledine, R., and Grothoff, C. A practical eu.techcrunch.com/2012/02/17. congestion attack on tor using long paths. In Proceedings of the [52] Tor. Bridges. www.torproject.org/docs/bridges. 18th conference on USENIX security symposium. [53] Tor. Help users in Iran reach the internet. [20] Feamster, N., Balazinska, M., Harfst, G., Balakrishnan, H., and lists.torproject.org/pipermail/tor-talk/2012-February. Karger, D. Infranet: Circumventing web censorship and [54] Tor. Iran blocks encrypted traffic. blog.torproject.org/blog/iran- surveillance. In Proceedings of the 11th USENIX Security partially-blocks-encrypted-network-traffic. Symposium, August (2002). [55] Tor. Iran blocks Tor; Tor releases same-day fix. [21] Feamster, N., Balazinska, M., Wang, W., Balakrishnan, H., and [56] Tor. Tor partially blocked in China. Karger, D. Thwarting web censorship with untrusted messenger blog.torproject.org/blog/tor-partially-blocked-china. discovery. In Privacy Enhancing Technologies (2003), Springer. [57] Tor. Tor users via bridges from china. metrics.torproject.org. [22] Fifield, D., Hardison, N., Stark, J., Porras, R., Boneh, D., and [58] Von Ahn, L., and Hopper, N. Public-key steganography. In Evading censorship with browser-based proxies. Tor, S. Advances in Cryptology-EUROCRYPT 2004. [23] Filler, T., and Fridrich, J. Fisher information determines capacity [59] Waldman, M., and Mazieres, D. Tangler: a censorship-resistant of ε-secure steganography. In Information Hiding (2009), Springer. publishing system based on document entanglements. In ACM [24] Filler, T., and Fridrich, J. Gibbs Construction in Steganography. conference on Computer and Communications Security (2001). IEEE Transactions on Information Forensics and Security 5, 4 [60] Waldman, M., Rubin, A., and Cranor, L. Publius: A robust, (Dec. 2010), 705–720. tamper-evident, censorship-resistant web publishing system. In [25] Filler, T., and Fridrich, J. Design of Adaptive Steganographic USENIX Security Symposium (2000). Schemes for Digital Images. Proceedings of SPIE, Electronic [61] Wang, Y., and Moulin, P. Perfectly secure steganography: Imaging, Media Watermarking, Security, and Forensics XIII Capacity, error exponents, and code constructions. Information (Feb. 2011). Theory, IEEE Transactions on (2008). [26] Filler, T., Judas, J., and Fridrich, J. Minimizing embedding [62] Wustrow, E., Wolchok, S., Goldberg, I., and Halderman, J. A. impact in steganography using trellis-coded quantization. IEEE Telex: Anticensorship in the network infrastructure. In USENIX Transactions on Information Forensics and Security (Feb. 2011). Security Symposium (2011). [27] Filler, T., Ker, A. D., and Fridrich, J. The Square Root Law of Steganographic Capacity for Markov Covers. In Proceedings of