Arxiv:1606.01979V2 [Cs.NI] 29 Oct 2016 Ing Various Aspects of Online Information Controls
Total Page:16
File Type:pdf, Size:1020Kb
Exploring the Design Space of Longitudinal Censorship Measurement Platforms Abbas Razaghpanah Anke Li Arturo Filasto` Rishab Nithyanand Stony Brook University Stony Brook University The Tor Project Stony Brook University Vasilis Ververis Will Scott Phillipa Gill Humboldt University Berlin University of Washington Stony Brook University Abstract gitudinal data from a diverse set of regions while per- forming accurate analysis using robust and well specified Despite the high perceived value and increasing sever- techniques. We present and compare two such platforms ity of online information controls, a data-driven under- – ICLab and OONI– that represent different points in the standing of the phenomenon has remained elusive. In censorship measurement design space. this paper, we consider two design points in the space In this paper, we first identify three primary design of Internet censorship measurement with particular em- decisions made in the development of censorship mea- phasis on how they address the challenges of locating surement platforms. Then, we describe how ICLab and vantage points, choosing content to test, and analyzing OONI address these decisions, while considering the im- results. We discuss the trade offs of decisions made by pact of these decisions on the measurement results pro- each platform and show how the resulting data provides duced by the systems. Finally, we show how ICLab complementary views of global censorship. Finally, we and OONI, when used together, provide a unique insight discuss lessons learned and open challenges discovered into the current state of information controls around the through our experiences. globe. 1 Introduction 2 Design Decisions The last five years have cemented the Internet as critical We now discuss the choices of (1) where measurements infrastructure for communication. In particular, it has are run, (2) how measurements are run, and (3) how data demonstrated high utility for citizens and political ac- is intepreted . These three design decisions are central to tivists to obtain accurate information, organize political the design of a global censorship measurement platform. movements, and express dissent. This fact has not gone unnoticed, with governments clamping down on this 2.1 Where measurements are run medium via censorship and information controls. Con- sequently, there has been a surge of interest in measur- There are two options when considering vantage points: arXiv:1606.01979v2 [cs.NI] 29 Oct 2016 ing various aspects of online information controls. More crowd-sourced and dedicated infrastructure vantage specifically, data obtained from such measurements has points. Platforms using a crowd-sourced approach rely been used by (1) political activists to understand the mo- on volunteers running measurement software. They have tivation for and the impact of such government policies the ability to turn citizens in any location into vantage and (2) researchers to build safer and more secure cen- points. Dedicated infrastructure, on the other hand are sorship circumvention tools by understanding the tech- distributed and operated exclusively for the platform. niques used to implement these policies [1]. Both approaches have their own benefits and drawbacks. While there have been numerous efforts to character- Cost and availability. A hurdle in setting up a dedi- ize online information controls [2–7], the data gathered cated infrastructure is distributing infrastructure globally. or used by these measurements have limited scope due to However, once this infrastructure is in place, it has the the specificity of locations and time-periods considered. capability of performing on-demand measurements; lim- In order to gain a nuanced understanding of the evolution ited only by the reliability and uptime of the infrastruc- of Internet censorship, in terms of policy and techniques, ture. Crowd-sourced platforms, on the other hand, in- a measurement platform needs to be able to gather lon- cur no setup cost but are dependent on the availability of 1 volunteers to execute measurements. As a consequence, processing gathered data, and lower bandwidth for com- crowd-sourced platforms are unable to provide a reliable municating processed results of measurements. flow of measurements from a region. Representativeness and diversity of measurements. 2.3 Gathering and interpreting data Crowd-sourced platforms have the potential to obtain a view of the Internet from a wide variety of networks (e.g., A censorship measurement platform must specify data residential, academic, and corporate). In contrast, dedi- collected and how it will identify censorship in this data. cated infrastructure faces an uphill battle of distributing Type and quantity of data gathered. A platform may devices or may leverage existing infrastructure (e.g., aca- record packet captures of entire tests or selectively gather demic networks, or dedicated hosting networks). Van- data such as packet headers and responses. While com- tage point location can impact conclusions drawn from plete packet captures are ideal for deep aposteriori anal- their measurements. As an example, measurements con- ysis and to identify censorship not visible at the appli- ducted from the UK academic network (JANET) do not cation layer. However, they require root privileges, high observe the “Great Firewall of Cameron” [8], since they storage and bandwidth requirements, and may acciden- are placed outside of its purview. Crowd-sourced plat- tally collect data of other system users. forms can also leverage public interest and news cover- Identifying censorship events. Another challenge that age to introduce additional vantage points. arises during the processing of gathered data is defin- Safety and risk. Information controls measurements ing when “censorship” has occurred. This task is com- using humans in the field poses a significant and hard to plicated by strange protocol implementations (e.g., load quantify risk. In many regions (e.g., Syria) this risk has balancers that cause gaps in TCP sequence numbers), been determined to be too high for volunteers. Such risks server side blocking [10], and regular network failures. impede crowd-sourced measurements, while infrastruc- ture (such as VPN or hosting networks) allow for mea- 3 The ICLab and OONI Platforms surements while posing little to no risk to users. In this section we describe the design decisions made 2.2 Measurement autonomy during the development of the ICLab and OONI plat- forms. A censorship measurement platform can either use a cen- tral server to schedule experiments, or leave these tasks to each vantage point. This dichotomy has an impact on 3.1 Vantage points several capabilities of the platform. The most fundamental difference between the ICLab and Time and context sensitive measurements. The time OONI platforms is the approach each system takes to re- and political context of measurements are important for cruit vantage points. ICLab relies on a dedicated infras- understanding evolving and abrupt policy changes. As tructure to perform measurements. This allows measure- an example, during the rise of ISIS in 2014, the Indian ments that require permissions that may not be compat- government blocked (and subsequently unblocked) ac- ible with software to be run on end-user systems. As cess to 32 websites including GitHub, Vimeo, and Paste- a consequence, the system has thus far focused on de- Bin for propagating “Anti India content” [9]. Centrally ployment on VPN vantage points and a limited deploy- controlled measurement platforms have the advantage of ment of Raspberry Pi’s installed with ICLab software. being able to evolve existing and schedule new measure- In contrast, OONI takes a lighter-weight software-based ments in response to changing political and social situa- approach. and assumes some amount of technical savvy tions. Locally controlled platforms, however, do not have on the part of volunteers. This leads to differences in this capability. Instead they are dependent on the update the availability and representativeness of measurements schedule of the local vantage point. from each platform. In Figure 1a, we see that as a re- Infrastructure requirements. The ability to remotely sult of the decision to use VPN end points, ICLab is able schedule experiments and aggregate data centrally allows to provide vantage points for measurements in signifi- cantly more countries than OONI (151 for ICLab and 46 for the use of computationally constrained infrastructure, 1 not needing technically savvy local maintenance efforts. for OONI in the last 100 days ). However, we found This comes with the cost of bandwidth requirements that that OONI’s crowd-sourced model is able to provide associated with shipping unprocessed data to a central more AS-level diversity – i.e., OONI provides vantage server. Locally controlled platforms require local man- points from an average of 3.15 different networks (ASes) agement of the platform infrastructure to ensure up-to- 1Since its release in 2012, OONI has received nearly 10M measure- date experiments, higher computational capabilities for ments from volunteers in 95 countries. 2 in each country, compared to ICLab’s average of 1.46 ity. Volunteers also have the option to add their own tests networks per country. and modify inputs to existing tests (e.g., they may change the set of URLs being used by a test). Performing measurements. ICLab and OONI also differ in their approach to performing measurements. ICLab takes a “simple node” approach, with