Dclaims: a Censorship Resistant Web Annotations System (Extended Abstract of the Msc Dissertation)
Total Page:16
File Type:pdf, Size:1020Kb
DClaims: A Censorship Resistant Web Annotations System (extended abstract of the MSc dissertation) Joao˜ Ricardo Marques dos Santos Departamento de Engenharia Informatica´ Instituto Superior Tecnico´ Advisors: Professors Nuno Santos and David Dias I. INTRODUCTION the text highlighted by her friends (or anyone chosen by her) and when she places her mouse over the highlighted portions The web plays a critical role in informing modern democ- she sees comments made by the users about the highlighted racies. A study [1] by the Reuters Institute for the Study text. What web annotations allow is for the creation of a of Journalism at the University of Oxford, conducted in new layer of data, on top of the existing websites, without 2016, reveals that the Internet makes 53% to 84% of the changing the original resources. Currently, there are several primary sources of information used by UK citizens aged web annotation services available to the general public [12], 18-44. Unfortunately, we have become more aware of often but they have a centralised nature. An annotation made by lack of credibility, falsehood and incompleteness of that very a user using service A cannot be used by service B. We same information – which can have serious consequences. call these services silos because the information they hold Democracies are fed by public opinion which in turn is and services they offer are only useful and relevant to their shaped by the information individuals receive. Campaigns ecosystem. Other platforms such as Hypothes.is improve of disinformation have the power to start wars [2], influence over the interoperability aspect by using a standard type government elections [3], endanger human lives [4] and even of web annotation data model, but still have total control jeopardise the future of the human species [5]. Therefore, over the data and offer no guarantees of permanent storage. it is essential to allow people to have access to reliable Consequently, these services are vulnerable to the same type sources of information – to the point of the identification of censorship as the one present in social networks. and classification of low-quality information has become one The goal of our work is to build a system that allows of the most active areas of research[6], [7], [8], [9]. for uncensored access to web annotations on the Internet by There have been several attempts to improve the reliability eliminating central points of control where a powerful actor of information on the Web. Fact-checking platforms and can exert pressure to fabricate, modify, or suppress exchanged social networks are two. The problem with these platforms messages. Examples of such actors include governments or is that they too can be censored. There are multiple instances powerful media organisations. of state censorship of information. There is evidence of Our system must satisfy the following requirements: efforts by governments of at least 13 countries (including China, Egypt, Turkey, United Kingdom, Russia and France) • Authenticity and Integrity Assurances: Ensure that to either block access to these platforms or censor user- the end-users know who created the web annotations, generated content [10], [11]. Google, Facebook and Twitter and that they have not been tampered with. publish annual reports of the number of government requests • Data Permanence and Portability: Links to web of remove content 1 2 3. annotations should not get broken if they change loca- Besides being vulnerable to censorship, the information tion. This means that the links should remain the same, created by these platforms is scattered across multiple profile independent of the web server where content is stored. pages and websites. An alternative way to achieve the goal • Financial Cost Efficiency: The infrastructure cost of of informing would be to display the information directly operating such a system will be supported by voluntary on the source webpage. Web annotations4 is a new way of institutions, and should not be higher than one of current interacting with information on the web, which allows for platforms which provide similar services. users to make and share annotations of web pages, just like • Scalability: The system should handle workloads simi- they would annotate a physical notepad. It empowers end- lar to the ones on Facebook’s news pages. users to highlight text, create sticky notes or comment specific • Compatibility with standards: The data model used parts of a web page. A typical usage scenario consists of a should be compatible with current standards. This is im- user visiting a news webpage article which shows portions of portant to ensure interoperability between applications. This paper presents a novel system for the censorship- 1Google Transparency Report: https://transparencyreport.google.com/ resistant exchange of web annotations over the Internet, government-removals/overview called DClaims. DClaims has a decentralised architecture, 2 Facebook Transparency Report: https://web.archive.org/web/ which is based upon two building blocks: the Ethereum 20180501211320/https://transparency.facebook.com/government/ 5 6 3Twitter Transparency Report: https://web.archive.org/web/ blockchain and the Inter-Planetary File System(IPFS) . To 20180501211317/https://transparency.twitter.com/en/gov-tos-reports.html 4https://web.archive.org/web/20180502111019/https://www.w3.org/ 5https://www.ethereum.org/ annotation/ 6https://ipfs.io/ marry the feature set of web annotations, with the integrity Table I: Comparison Of All The Presented Web Annotation and censorship resistance assurances of IPFS and the ordered Services As Well As Dclaims registry and freshness of Ethereum, our system stores web Multiple Website Cross-Platform Self Censorship Revocability Support Compatible Verifiability Permanence Resistant annotations on IPFS and records the IPFS links of these files Medium Yes No No No No No Genius Yes Yes No No No No on Ethereum. Hypothes.is Yes Yes Yes No No No In summary, the contributions of this paper are: DClaims Yes Yes Yes Yes Yes Yes • The development of the DClaims Protocol, which de- fines how web annotations are generated, transported and stored. With the goal of increasing interoperability between web annotations services, the World Wide Web Consortium • Implementation of the DClaims platform, which imple- 8 ments the DClaims Protocol and can run on any modern (W3C) created a standard for web annotations. Following the web browser or web server. standard, a new web annotation service, called Hypothes.is, was created. Hypothes.is inherits the properties of the stan- • An experimental evaluation of the DClaims platform, comparing its performance with other widely adopted dard’s data model and transport protocol. The features offered platforms. by their platform are comparable to the ones of Genius’ (users can annotate any webpage and share it with anyone) with the The rest of this document is structured as follows: Section advantage of interoperability, meaning that the annotations II is the related work where we overview Web Annotations, produced by Hypothese.is’ platform will be compatible with blockchain and IPFS. In Section III we present the architec- the ones produced by any other platform that implements ture of the DClaims protocol. After, in Section IV we offer the same standard. Furthermore, Hypothes.is has developed our implementation, followed in Section V by the evaluation specific software for use cases in education, journalism, of the system. Finally in Section VI we present the conclusion publishing and research. However, the annotations are being and discussion. stored by Hypothes.is containers (servers run and controlled by Hypothes.is) and users still need to trust the organisation II. RELATED WORK to keep their best interests in mind. In this section, we analyse the current systems for Web Discussion: Web Annotations are a handy tool for the Annotations and explain how they can be a useful tool for web. They allow for the creation of a layer on top of the fact-checking information on the web. To understand the existing websites, enriching their content and improving the general model of web annotations services, consider a simple way information about them is shared, all of this without the example. Alice is a blogger who just published an article on need to change the original resources. Table I is a comparison Medium (a blog hosting website). Bob visited the website and between the previously presented services. Platforms such found a paragraph of the blog post particularly interesting, as Genius are feature rich, offering great functionality for he highlighted that portion of the text and used the website’s their users, but a cost of no interoperability. The standard highlight feature. Later that day, Charlie, visits Medium and data model for Web Annotation is the one produced by reads Alice’s blog post. The paragraph that Bob highlighted W3C which has interoperability and decentralisation in mind. is displayed highlighted to Charlie. In this scenario Alice Unfortunately, as of yet, there is no real fully decentralised is the content creator, Bob is the annotator, Charlie is the implementation of web annotations, as all the existing ser- reader. Medium is the website and the holder (since the vices use their services to store the data. Furthermore, users information about the highlights is kept under their control), have to trust the service to not withhold data from them or and the highlight is the annotation. tamper with the data. Ideally, web annotations should have a set of desired We argue that to make web annotations resilient, two properties, which are enforced differently depending on the objectives need to be achieved. First, the annotations need to type of annotation service implementation. The main of such be provided with integrity assurances and, second, the whole properties are self-verifiability (not be dependent on third architecture needs to be decentralised, to not have centralised parties to be verified), permanency, revocability, portability points where censorship can occur.