
A Collaborative Internet Archive for Personal and Social Use

Ensuring File Availability and User Friendliness Through a Peer-to-peer Internet Archiving System

Tonje Røyeng

Thesis submitted for the degree of Master in Programming and System Architecture, 60 credits

Department of Informatics
Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

Spring 2020

A Collaborative Internet Archive for Personal and Social Use

Ensuring File Availability and User Friendliness Through a Peer-to-peer Internet Archiving System

Tonje Røyeng

© 2020 Tonje Røyeng

A Collaborative Internet Archive for Personal and Social Use http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo

Abstract

The nature of the Internet is ephemeral. Hyperlinks break, technologies are rendered obsolete, information is changed, deleted or lost, and we are left unable to access the same information we could only a little while ago. One solution to this is to archive permanent copies of websites, whether for their cultural and historical importance or for personal use. This archiving effort can ensure that the content does not change or disappear over time, and adds a persistence of data that the Internet itself lacks. The goal of this project is to implement a prototype for a peer-to-peer system for personal archiving of Internet content that is both reliable and user friendly. This thesis explores the theoretical aspects of peer-to-peer and Internet archiving systems, and gives an overview of the systems that already exist. This examination acts as the foundation of our system design, where we seek to combine the good qualities of existing Internet archiving systems with the robustness of a peer-to-peer system. The result is a peer-to-peer Internet archiving system that uses a distributed hash table to structure the network, and an application that allows the user to interact with the system through a graphical user interface. The experimental results show that the system performs as expected. It is possible to archive and view sites through a graphical user interface, and to fetch the archived files through their file ID. The files are duplicated across a network of peers, and they are republished every hour to maintain the number of copies. The system is also acceptably performant and user friendly, judged against research on attention spans and established design principles. The result is a system that combines the qualities of peer-to-peer and Internet archiving systems to create a collaborative Internet archiving system for personal and social use.

Acknowledgements

First of all, I would like to thank my supervisors, Prof. Eric Jul and Oleks Shturmov, for their guidance throughout this project, their constructive criticism, and for letting me shape my own project. Their experience and help along the way have been invaluable. Secondly, I want to thank my friends, both the ones I have met during my studies, and anyone else who has provided me with much-needed distractions and support along the way. A special thanks to everyone who was willing to participate in my user test. Finally, I want to acknowledge the support of my family, in particular my mum, for always listening to me and encouraging me, and my partner, who has been a shoulder to lean on, a helpful discussion partner, and a very patient household member in these last few months.

Contents

I Preliminaries 1

1 Introduction 2
   1.1 Research Question ...... 2
   1.2 Goal ...... 2
   1.3 Approach ...... 3
   1.4 Design and Implementation ...... 4
   1.5 Evaluation ...... 5
      1.5.1 Results ...... 5
   1.6 Conclusion ...... 7
   1.7 Work Done ...... 8
   1.8 Limitations ...... 8
   1.9 Outline ...... 9

2 Background 11
   2.1 Introduction to Background ...... 11
   2.2 Peer-to-peer Systems ...... 12
      2.2.1 Decentralised and Distributed ...... 12
      2.2.2 Characteristics ...... 13
      2.2.3 Distributed Hash Table (DHT) ...... 14
      2.2.4 Taxonomy ...... 15
      2.2.5 Summary of Peer-to-peer Systems ...... 31
   2.3 Internet Archiving ...... 31
      2.3.1 Saving the Internet ...... 31
      2.3.2 Issues in Internet Archiving ...... 32
      2.3.3 Taxonomy ...... 35
      2.3.4 Summary of Internet Archiving Systems ...... 46
   2.4 Sharing in a Cultural Context ...... 46
      2.4.1 Pirating ...... 46
      2.4.2 Social Media ...... 47
      2.4.3 Summary of Sharing in a Cultural Context ...... 47
   2.5 Summary of Background ...... 47

II Project 49

3 Analysis 50
   3.1 Problem ...... 50

      3.1.1 Core Issues ...... 50
      3.1.2 Summary of Problem ...... 53
   3.2 Solution ...... 53
      3.2.1 Required Functionality ...... 53
      3.2.2 Evaluation ...... 54
      3.2.3 Summary of Solution ...... 55
   3.3 Conclusion of Analysis ...... 56

4 Design 57
   4.1 Core functionality ...... 57
      4.1.1 Peer Communication ...... 57
      4.1.2 Archiving Sites ...... 58
      4.1.3 Fetching Files From DHT ...... 58
      4.1.4 Sharing ...... 58
   4.2 Structured Peer-to-peer System ...... 58
      4.2.1 Kademlia ...... 59
      4.2.2 Summary of Structure ...... 59
   4.3 Files ...... 60
      4.3.1 Local saving ...... 60
      4.3.2 File Duplication ...... 60
      4.3.3 File Location ...... 61
      4.3.4 Summary of Files ...... 61
   4.4 Graphical User Interface ...... 61
      4.4.1 Clarity ...... 62
      4.4.2 Consistency ...... 63
      4.4.3 Simplicity ...... 64
      4.4.4 Summary of Graphical User Interface ...... 64
   4.5 Trade-offs ...... 65
   4.6 Summary of Design ...... 65

5 Implementation 67
   5.1 Two-tier Architecture ...... 67
   5.2 Running the Application ...... 68
      5.2.1 Application ...... 68
      5.2.2 CLI ...... 69
   5.3 Back-end ...... 70
      5.3.1 File Structure ...... 70
      5.3.2 Peer Handling ...... 71
      5.3.3 File Handling ...... 73
      5.3.4 Summary of Back-end ...... 75
   5.4 Front-end ...... 75
      5.4.1 File Structure ...... 75
      5.4.2 Functionality ...... 76
      5.4.3 Visual Design ...... 79
      5.4.4 Summary of Front-end ...... 82
   5.5 Summary of Implementation ...... 83

6 Evaluation 84
   6.1 Implementation Tests ...... 84
      6.1.1 File Structure Overview ...... 84
      6.1.2 Functionality ...... 85
      6.1.3 Reliability ...... 86
      6.1.4 Performance ...... 90
      6.1.5 Summary of Implementation Tests ...... 93
   6.2 Graphical User Interface Evaluation ...... 94
      6.2.1 Survey ...... 94
      6.2.2 Analysis ...... 96
      6.2.3 Conclusion of Graphical User Interface Evaluation ...... 99
   6.3 Conclusion of Evaluation ...... 99

III Conclusion and future work 102

7 Conclusion 103
   7.1 Summary ...... 103
      7.1.1 Results ...... 104
   7.2 Limitations ...... 105
   7.3 Perspective ...... 105
   7.4 Future Work ...... 106

Appendices 113

A Output From Functionality Test 114

B Questionnaire 118

List of Tables

2.1 Comparison of peer-to-peer systems ...... 30
2.2 Comparison of Internet archiving systems ...... 45

6.1 Time it takes to archive file ...... 92
6.2 Time it takes to store file in DHT ...... 92
6.3 Time it takes to fetch file ...... 93

List of Figures

2.1 Difference between centralised and decentralised systems ...... 12
2.2 Launch screen of uTorrent Web ...... 24
2.3 Download screen of uTorrent Web ...... 24
2.4 -gtk GUI ...... 25
2.5 IPFS status menu ...... 26
2.6 IPFS desktop GUI ...... 27
2.7 GUI ...... 28
2.8 ArchiveBox file overview ...... 39
2.9 Webrecorder landing page ...... 40
2.10 Webrecorder collection page ...... 41
2.11 Webrecorder manage collection page ...... 42
2.12 Pocket landing page ...... 42
2.13 Pocket action menu for article ...... 43
2.14 Article saved to Pocket, viewed in web app ...... 43
2.15 Internet Archive landing page ...... 44
2.16 Wayback Machine landing page ...... 45

4.1 Example system state ...... 58
4.2 Example node state, using marked node from figure 4.1 ...... 59
4.3 First wireframe of system design ...... 62
4.4 Colours used ...... 64

5.1 Structure of back-end scripts ...... 72
5.2 Program flow when saving a new site to archive ...... 74
5.3 Front-end application home page ...... 77
5.4 Front-end application display page ...... 78
5.5 File menu ...... 78
5.6 Share pop-up ...... 79
5.7 Share pop-up, copied ID ...... 79
5.8 Delete pop-up ...... 80
5.9 Loading icon ...... 80
5.10 Site archived icon ...... 81
5.11 Invalid URL error message ...... 81
5.12 Site not found in network ...... 82
5.13 Closeup of delete pop-up ...... 82

6.1 Static reliability test results ...... 89
6.2 Dynamic reliability test results ...... 90

6.3 Example of screenshots in questionnaire ...... 95
6.4 Examples of share and delete icons used in other systems ...... 97

Part I

Preliminaries

Chapter 1

Introduction

This thesis seeks to address the lack of persistence of data on the Internet, like the many Internet archiving systems before it. It aims to do this through the use of peer-to-peer technology to add a layer of reliability that client-server solutions lack, while still being user friendly and intended for personal use. There are many good solutions for Internet archiving out there that are user friendly, personal and have a sharing function. However, all the systems we have examined that meet all these criteria are dependent on central servers to archive content. Our goal is to make it possible for users to archive websites with a high probability that the files will not be changed or deleted over time, as content on the Internet is prone to be. To ensure that the files stay available, and are not dependent on one server, we have chosen to create a peer-to-peer file archiving system. This way, the files will be replicated across many machines, which allows the users to access them even when a number of the machines are offline. The decentralised and distributed nature of the system is at the core of this project and is the main difference between our system and other Internet archiving systems.

1.1 Research Question

How can we create a collaborative peer-to-peer Internet archiving system for personal use that is reliable, user friendly and performant?

1.2 Goal

The goal of this thesis is to design and implement a prototype for a peer-to-peer Internet archiving system intended for both personal and social use. The background for this system comes from multiple different areas, but it is primarily rooted in the robust and collaborative character of peer-to-peer systems and the ephemeral nature of the Internet. The former dictates the structure of the system, and the latter embeds the system in the tradition of Internet archiving, an endeavour of cultural and individual importance. As the title of this thesis suggests, the system is intended for both personal and

social use. For this system, this means that the archival of Internet sites is personal, but that the user has the option to share their archived sites with other users of the system. This adds a social aspect to the user experience, which reminds the users that the system is collaborative. In terms of functionality, the prototype should allow the user to archive websites and fetch the archived files from the network of peers. The system should be both reliable and user friendly, meaning that files should be available at all times and that the system should be easy to use for anyone who has some experience using computers. Traditional client-server architecture suffers from relying on a single server to keep track of any files that are needed, which becomes a point of vulnerability. Peer-to-peer systems seek to combat this by utilising every participating node as a server, without relying on one single node for the system to survive. File availability is achieved through the duplication of files across several nodes, which makes sure that it is possible to fetch a given file from multiple locations. Adding an overlaying structure such as a distributed hash table can make it possible to efficiently locate a copy of a file. User friendliness will be addressed in two ways: through the user interface and the system’s performance. The user interface should be based on established design guidelines to ensure that it is easy to use for as many users as possible, and the system should be performant so as to keep the user’s attention for as long as they use the system.

1.3 Approach

This project is a work of software engineering, and will primarily be conducted through an analysis of existing systems, the development of a software prototype and an experimental evaluation of the prototype. As such, the work is structured much like traditional software engineering projects, which are split into four main categories of work: analysis, design, code and testing. However, the result of the project is not a finished piece of software that is ready to be deployed, but rather a prototype that can be seen as the first iteration of a more iterative development process [1, pp. 36–38]. For any software development process, it is important to understand the context of the system, because the resulting piece of software will not exist in a vacuum and may benefit from the years of knowledge already available in the field [1, p. 249]. Additionally, an understanding of the users and the context of the system may be an aid in the visual design process, as different users have different experiences and different views on what "user friendly" means [1, p. 406]. Therefore, this thesis starts by surveying which systems and solutions already exist by conducting two systematic comparisons, one of several peer-to-peer systems and one of Internet archiving systems. In both cases, a set of categories for comparison was chosen that are relevant to the project in one way or another. An analysis of these comparisons revealed some core aspects to focus on in our system, as we discovered what works well and what does not, both in

peer-to-peer and Internet archiving systems. This analysis, in turn, aided in the design of the system. Following the implementation of the system, experimental testing was conducted, in order both to ensure that the goals for the system were met, and to uncover any issues that would need to be addressed in a later version of the system. The automated tests primarily addressed three areas: system functionality, reliability and performance. Additionally, the user interface was examined through analysis and a small survey conducted with test users.

1.4 Design and Implementation

The system design is split into two main parts: the system structure and the graphical user interface. The network of peers is organised using the Kademlia [2, 3] distributed hash table. Using a distributed hash table allows the peers to communicate in a coordinated and efficient way, and Kademlia offers a fast and reliable algorithm for inter-peer communication that performs routing within O(log n) steps. The system is a file system where it is necessary to keep track of all files at any time, and where the user should always be able to access their files, which is made possible through the distributed hash table. Additionally, files will automatically be saved to the user’s device, so that they can retrieve the files fast, and without an Internet connection. The user can also share their files, by sharing the file ID with another user.

For the graphical user interface, we based our design on the findings from our analyses of existing systems. For the analyses, we chose four established design principles for user friendliness and examined each system with these in mind. These principles were clarity, consistency, simplicity and responsiveness, chosen because they cover some of the most important aspects of what makes a graphical user interface user friendly. The findings were used to guide our design, to make sure that the design was user friendly from the start.

The technical prototype was implemented in Node.js, using a two-tier architecture with a clear separation between the back-end and front-end. The back-end was implemented using the Express library for Node.js to create an API for the front-end to communicate with. The back-end handles everything that has to do with the peer network and the files, using libp2p, a JavaScript implementation of the Kademlia distributed hash table, for file distribution and routing. The front-end is a React app that displays the archived files in the browser and allows the user to archive new sites, as well as share and delete existing archived sites. In the interest of not spending too much time and effort on the files themselves, any site is simply archived as a screenshot.
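To make this setup concrete, the listing below is a minimal sketch of how such a peer could be created and used with the 2020-era js-libp2p API (v0.28.x). The module names, configuration options and the use of contentRouting for DHT records are assumptions based on that API rather than the actual prototype code, and later libp2p releases have changed these interfaces:

    // Sketch only: 2020-era js-libp2p (v0.28.x) with the Kademlia DHT enabled.
    // Module names and options are assumptions and differ in later releases.
    const Libp2p = require('libp2p')
    const TCP = require('libp2p-tcp')
    const { NOISE } = require('libp2p-noise')
    const MPLEX = require('libp2p-mplex')
    const KadDHT = require('libp2p-kad-dht')

    async function createPeer () {
      const node = await Libp2p.create({
        addresses: { listen: ['/ip4/0.0.0.0/tcp/0'] },
        modules: {
          transport: [TCP],
          connEncryption: [NOISE],
          streamMuxer: [MPLEX],
          dht: KadDHT
        },
        config: { dht: { enabled: true } }
      })
      await node.start()
      return node
    }

    // Storing and fetching an archived file's bytes under its file ID.
    // A real deployment also needs record validators for custom key formats.
    async function storeFile (node, fileId, bytes) {
      await node.contentRouting.put(Buffer.from(fileId), bytes)
    }

    async function fetchFile (node, fileId) {
      return node.contentRouting.get(Buffer.from(fileId))
    }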

1.5 Evaluation

The system was evaluated according to three evaluation metrics: reliability, user friendliness and performance, which all address some of the core issues we have outlined:

Reliability Reliability is very important because the user should be able to access their files at any time. The decision to make a decentralised system where files are stored on multiple machines was made in part because of the vulnerability of centralised systems, and, conversely, the reliability of decentralised ones. While it is impossible to guarantee file availability in any system, the duplication of files across multiple sources can ensure that the file is available even if several of the sources fail.

User friendliness Another aspect of the system that we consider important is the user friendliness. This involves both the visual appearance of the system and its performance, the latter of which will be discussed on its own. User friendliness is important to ensure that the users do not become frustrated or confused when using the system.

Performance Performance can be viewed as a subcategory of user friendliness because systems that take too long to respond can cause the user to get bored and therefore unwilling to use the system. Therefore, it is important to evaluate the system’s ability to respond to any user input by timing the main user actions.

The system was evaluated through automated tests and an examination of the user friendliness. The automated tests can be divided into three main categories: functionality, reliability and performance. The functionality test ensured that it is possible to perform the actions that have been described in the design of the system. Reliability was tested through two different tests that simulate a network of unreliable peers and continually check the file availability. Finally, the performance was tested by performing the main user actions multiple times and finding the average and median run times. The graphical user interface of the system was examined both by analysing the user interface according to the principles that will be outlined in the comparison of existing systems, and by publishing a small survey to get real user input.
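As an illustration of how such timings can be gathered, the following minimal Node.js harness runs an asynchronous user action repeatedly and reports the average and median run time. The harness and the names in it (including the archiveSite example) are illustrative, not taken from the actual test code:

    // Sketch of a timing harness: run an async action `runs` times and
    // report the average and median duration in milliseconds.
    async function timeAction (action, runs = 20) {
      const times = []
      for (let i = 0; i < runs; i++) {
        const start = process.hrtime.bigint()
        await action()
        const end = process.hrtime.bigint()
        times.push(Number(end - start) / 1e6)
      }
      times.sort((a, b) => a - b)
      const average = times.reduce((sum, t) => sum + t, 0) / times.length
      const median = times[Math.floor(times.length / 2)]
      return { average, median }
    }

    // Example usage with a hypothetical user action:
    // timeAction(() => archiveSite('https://example.com')).then(console.log)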

1.5.1 Results

In terms of functionality, the system meets its goals. This means that the user can archive a site, which is then added to a network of peers through a distributed hash table. The front-end application allows the user to archive, view and delete sites, as well as access a share ID that can be shared with other users, allowing them to fetch a copy of the same file. The functionality is tested by calling the back-end functions that are used by the application, and tests both the functionality of the distributed hash

table and the file archiving. The results show that this works as intended, except for occasional errors that occurred particularly during the more computationally heavy tests.

Reliability was tested by simulating a small network of peers using Docker containers, which allowed us to disconnect and connect peers on the fly. The intention of this was to simulate a real network state with unreliable peers, without having to distribute the peers across different machines. We ran two separate, yet similar, reliability tests. Both tests created ten peers, had one peer archive multiple sites, and then periodically checked how many copies were available for the various archived sites. The difference between the tests was the actual network simulation: one disconnected and reconnected peers to the network, and the other stopped peers and created new ones. The results show that the system is reliable to an extent. As we decided to use a pre-existing implementation for the distributed hash table, it was difficult to determine whether values were republished to new peers. The tests ran for ten hours, to give the system ample time to republish values, which it should do each hour, according to the Kademlia algorithm. Using libp2p alone, this seemed not to be the case, so we implemented a rudimentary republishing function that runs every hour, but leaves the republishing up to the peer that initially archived the file (a sketch of such a loop is given at the end of this subsection). While this is not an ideal solution for a final system, results show that it works as intended, rendering the system reliable according to our slightly limited tests.

The performance was tested primarily through the two main user actions: archiving a site and retrieving it from the network. To ensure that the tests reflected a more realistic system state, these tests were run on two machines connected through the system. Additionally, the archiving of a site was separated into two different actions to test the file handling and peer handling separately. The results from the tests show that archiving a site takes on average about 4 seconds, and that storing and retrieving it from the network takes less than 1 second. Archiving a site is the most time-consuming user action, and while this means that the user might lose their immediate focus on the task, it does not take so long that they grow bored and abandon it entirely [4, p. 135]. Because of this slightly longer response time, the graphical user interface includes a loading animation, to indicate to the user that the system is working. Storing and fetching a file from the DHT is fast, and well below the 1-second limit for keeping the user’s attention. The system, in terms of performance, is therefore considered to be user friendly.

In a more developed version of the system, it would be necessary to examine whether the tools currently used to archive files are the best fit. They are well suited to capture screenshots of websites, but as a later version of the system would aim to archive more information from a site, it would be necessary to archive the files in some other way. This would, in turn, call for a reexamination of the performance of the system. As it stands now, the performance is satisfactory.

To ensure that the graphical user interface is user friendly, we analysed existing peer-to-peer and Internet archiving systems with regards to

established design principles, and based our system design on the findings from this. We also conducted a short analysis of the final design according to the same principles. Given the time constraints and scope of the project, it was decided not to conduct extensive user testing, as this would take a lot of time and effort, and would ultimately be difficult to organise under the social distancing rules in force during the Coronavirus pandemic, when this part of the project was conducted. As a result, we only got input from a small number of test users through a survey, which nonetheless proved valuable and gave the analysis a stronger foundation. The conclusion is that the graphical user interface is user friendly, as it only has a limited set of possible user actions, and these are easy to understand and based on established norms and principles for web design.
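The rudimentary hourly republishing mentioned earlier in this subsection could look roughly like the following sketch, in which the peer that originally archived a file periodically re-puts its records into the DHT. Here, dht.put and localArchive are hypothetical helpers standing in for the actual implementation:

    // Sketch: the archiving peer re-puts each of its records every hour
    // so that copies lost to departed peers are recreated on current ones.
    // `dht` and `localArchive` are assumed helpers, not the actual names.
    function startRepublishing (dht, localArchive) {
      const ONE_HOUR = 60 * 60 * 1000
      return setInterval(async () => {
        for (const { fileId, bytes } of localArchive.entries()) {
          try {
            await dht.put(fileId, bytes) // re-replicates to currently connected peers
          } catch (err) {
            console.error(`republish failed for ${fileId}:`, err.message)
          }
        }
      }, ONE_HOUR)
    }

As noted above, this leaves republishing entirely to the original archiver; a full Kademlia implementation would instead have every storing peer republish its records.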

1.6 Conclusion

The goal of this thesis was to design and implement a prototype for a peer-to-peer Internet archiving system, rooted in the robust and collaborative nature of peer-to-peer systems. Our system is primarily designed for personal use, as intended, but it also offers the possibility to share file IDs with other users, allowing them to fetch a copy of the file from the system. As a peer-to-peer system is dependent on a certain level of collaboration between the users to function, this adds a social aspect to the system that reminds the users that they, as individual users, benefit from contributing to the whole. A thorough examination of existing peer-to-peer and Internet archiving systems revealed that many systems meet some, but not all, of our criteria. This examination was therefore used to guide the design of our system, drawing inspiration from the solutions that are already available.

Our three main evaluation criteria were reliability, performance and user friendliness. One of the driving factors for this project was the unreliable nature of client-server structures, and therefore of the Internet itself, which is also the motivation behind most Internet archiving systems. Reliability, therefore, means that a file should remain available in the system indefinitely, and is one of the main arguments for choosing a peer-to-peer structure for the system. Creating a peer-to-peer system, and using a distributed hash table for efficient and organised communication between peers, makes it possible to replicate files across a network of machines, eliminating the single point of vulnerability that is present in client-server structures. Our test results show that this duplication of files ensures that a file remains available even if multiple machines go down.

Performance and user friendliness go hand in hand, as they both make sure that the system is easy and enjoyable to use. Good performance means that the user actions do not take too long to perform, which in turn keeps the user’s attention on the system without boring them. In addition, designing the graphical user interface in such a way that it is easy to use makes the system accessible to a larger number of users, and ensures that the experience of using the system does not cause frustration in any way. The test results show that the system is performant to a

satisfactory degree and that the graphical user interface is user friendly, taking into consideration the constraints of the project.

The main contribution of this project is the way it combines the robust and collaborative nature of peer-to-peer systems with the user friendliness and usability of personal Internet archiving systems. While the reliability of the final prototype was not as good as the project set out to achieve, the project still illustrates the advantages of using a peer-to-peer system with a distributed hash table in the creation of a decentralised and distributed Internet archiving system.

1.7 Work Done

One topic that was initially explored in the background section, and researched for the design and implementation of the system, was the nature of the archived content. This specifically entails whether or not the copy of the website should look and function in the same way as the original, or if it should merely capture content like text, images and videos without preserving the structure of the original website. In particular, this involved research into WARC files, a file format specifically designed for archiving websites, and how to handle these. It would have taken a lot of time and effort to use these more complex files in the final prototype. Therefore, it was ultimately decided that the appearance of the files was not directly tied to any of the goals for the prototype, and this research was all but removed from the final thesis.

1.8 Limitations

The system is a prototype, and as such does not have all the functionality that a final version of the system should have. As mentioned in the previous section, one aspect that was greatly compromised was the archived files. All the examined Internet archiving systems fetch either the whole site or all the text and images that are on the site to be archived. After examining which libraries were available for Node.js, it was decided to use screenshots instead of actual text, images and styling. This was easier to implement, and as it did not directly address any of the core issues we had identified, we decided it was enough for the initial prototype. Smaller compromises were also made, such as the decision not to set up a proper database, and not allowing the user to decide whether they want a local copy of a given file.

As mentioned earlier, using a pre-existing distributed hash table implementation has had both advantages and disadvantages. On the one hand, it has saved us a lot of time and effort, but it has also left one of our most prominent requirements, namely reliability, out of our control. The result of this is that it was difficult both to test the reliability and to draw any definite conclusions as to whether this criterion was met to a satisfactory degree. Another limitation that ties into this was the inability to test with a large-scale network of peers. A lack of time and access to a large

number of remote machines meant that none of the tests were conducted in a realistic environment, which makes it difficult to say for certain whether the evaluation criteria were met.

The evaluation of the graphical user interface also ended up being very limited. It was never intended to be extensive, as user friendliness was only one of several evaluation criteria. However, as explained in section 1.5.1, it ended up both limited and somewhat rushed. As such, the examination of the user friendliness might be biased, both because it is difficult to be impartial in an analysis of something of one’s own creation, and because the test users were few and homogeneous. This will be examined in more detail in the evaluation itself. Another limitation to the user friendliness is the way that the system is run by the user. The prototype requires knowledge of how to use a terminal and package managers, which means that, as it stands right now, the installation and setup process is not user friendly, and the discussion of user friendliness is therefore limited to the graphical user interface itself.

1.9 Outline

The thesis is divided into three overarching parts, with one or several chapters each. The following list gives a short overview of the various parts and chapters:

Part I: Preliminaries

Chapter 2: Background examines both peer-to-peer and Internet archiving systems in detail, and provides a systematic comparison of multiple specific systems within each category. It will also briefly address the cultural context of our project.

Part II: Project

Chapter 3: Analysis analyses the findings from the comparisons, and outlines the problem that the project is solving. It also suggests some core functionality and how to evaluate the system once it is completed.

Chapter 4: Design presents the core functionality of the system, the technical system design and the graphical user interface. Kademlia, the distributed hash table algorithm used in the prototype, is described in detail, as well as other details regarding the files that will be saved in the system. Finally, the graphical user interface is outlined with a low-fidelity prototype, and a discussion of the design according to established design principles.

Chapter 5: Implementation gives an overview of the technical implementation of the system. First, the general structure is presented, before a description of how to run the system is given. Then, the back-end and front-end are described in detail, including file structure, implementation choices and important components.

Chapter 6: Evaluation presents both the tests that were written to evaluate the system and the results from these. The functionality, and by extension the evaluation criteria, were all tested with automated tests, while the user friendliness was examined with a survey and an analysis of the graphical user interface according to established design principles.

Part III: Conclusion and future work

Chapter 7: Conclusion summarises the project, including the back- ground, design, implementation and evaluation, as well as the limitations. The chapter also reflects on the system in a broader perspective and outlines future work.

Chapter 2

Background

2.1 Introduction to Background

This chapter will be the foundation on which we build our system. The topics examined and discussed in this chapter are all important for understanding the background and motivation for the system proposed in this thesis. Throughout the chapter, the historical, technical and cultural background of the project is examined. Additionally, both peer-to-peer and Internet archiving systems are thoroughly examined and compared, which will lay the basis for our system design.

The first section looks at peer-to-peer systems, first by examining the characteristics of such systems, before moving on to a comparison of five different systems. This comparison reveals the advantages and disadvantages of various types of peer-to-peer systems. It also highlights that a decentralised and distributed system is a good structure for a file system, particularly one that includes file sharing. Users in a peer-to-peer network can communicate and transfer files to each other without having to go through a central server. Using a distributed hash table can ensure that this inter-peer communication is efficient and structured.

Following this, we take a closer look at Internet archiving systems. Similarly to the previous section, this is done by looking at their history and characteristics and by comparing five different systems. Most existing Internet archiving systems are centralised and do not offer the advantages of peer-to-peer systems that we wish to utilise in our system. However, as the comparison shows, there are good solutions out there that are intended for personal use, have a sharing function and are user friendly, all of which are important aspects of our system.

The cultural and historical importance of Internet archiving is discussed in the section on Internet archiving, as this is an integral part of the very existence of such systems. However, there is another cultural aspect of our project that is not explored in any of the other sections, namely sharing. Therefore, the final section briefly examines sharing in a cultural context by looking at torrenting systems and social media.

2.2 Peer-to-peer Systems

As the project is centred around making a peer-to-peer system, this section will explore the characteristics and background of such systems, and give an overview of similar systems and related work that already exists. First, the section looks at the decentralised and distributed nature of peer-to-peer systems, before giving a brief overview of some of their characteristics and a more detailed description of distributed hash tables. Following this, five examples of peer-to-peer systems are discussed in a systematic comparison, where they are compared according to a number of categories that tie directly into this project.

2.2.1 Decentralised and Distributed

Figure 2.1: Difference between centralised and decentralised systems

Before introducing peer-to-peer systems in detail, this section will look at the decentralised and distributed nature of such systems, and how this separates them from other computer networks. Generally speaking, a computer network is distributed by nature, because it runs on multiple machines, as opposed to systems that only execute on a single machine. However, there is an important distinction between decentralised and centralised computer networks that will be explored in this section. Distribution, as mentioned, is concerned with the location of the entities of the system, whereas centralisation and decentralisation are a matter of control. If the control of the system is located in one place, the system is centralised (see figure 2.1). A typical example of this is client-server systems, where one server works as the communication point for all the

clients. It is, however, possible for a centralised system to be distributed. An example of this is cloud services, which potentially utilise multiple machines to save the users’ files, while still maintaining control of the entire system in one place. Decentralised systems, on the other hand, have multiple points of control and do not rely on one central entity to maintain control of the entire system (see figure 2.1). This is especially useful to reduce the possibility of a system crash being caused by one machine failing (see 2.2.1). However, it can be difficult to create one cohesive decentralised system without adding too much communication overhead in order to maintain a synchronised system through distributed consensus. It is, therefore, necessary to ensure that protocols and algorithms are established to ensure good communication and synchronisation. Centralisation, then, is a matter of control, whereas distribution concerns the location of system entities. As mentioned, distributed systems can be both centralised and decentralised, and peer-to-peer systems are usually both distributed and decentralised, though there is at least one example of a centralised, but distributed, peer-to-peer system, which we will come back to in section 2.2.4.

Single Point of Failure

As mentioned, one of the main arguments for decentralised systems is that there is no single point of failure. A single point of failure is one machine or central server that all requests have to go through, making it essential to a system’s functionality. In traditional client-server systems, the server will be one of those points. Such a point can pose a major problem because if it fails, it will cripple the entire system. One solution to combat single points of failure is redundancy, meaning that the system contains a certain number of copies of whatever asset needs to be available to users. For a centralised system, this may mean that there are multiple servers to minimise the likelihood of a complete system failure. In a peer-to-peer system, this entails both the fact that the system is decentralised and that shared resources are replicated across the available peers. For a peer-to-peer file-sharing system, this means that the files need to be replicated across multiple computers to ensure access even if a number of them crash. This is necessary because the users’ own computers act as the nodes, and it is to be expected that they will go down with a relatively high frequency (see 2.2.4).

2.2.2 Characteristics

A peer-to-peer (P2P) system is a decentralised network system where every peer (node) in the network is considered equal, and the goal is to share resources such as storage space and bandwidth without relying on a central server [5, pp. 424–425]. One of the biggest differences between peer-to-peer systems and client-server systems is that the peers communicate directly with each other in the peer-to-peer system, as opposed to having

one central point of control. This approach to network computing avoids certain bottlenecks, as discussed above, and utilises the computing power that exists on the users’ own machines. With these underlying ideas, several different types of P2P systems have emerged, with variations in structure and the level of centralisation. Structured P2P systems have a so-called Distributed Hash Table (described in 2.2.3) which links content and IP address. Unstructured P2P systems have no correlation between content and IP address, which may result in large network loads whenever a peer searches for content, depending on the level of centralisation. P2P systems having varying degrees of centralisation may sound at odds with their definition, but it is a compromise that has been made in many systems to ensure that a search will yield results within a reasonable time frame. There are three overarching types of unstructured P2P systems, depending on their level of centralisation: centralised, pure and hybrid.

Centralised These P2P systems rely on a central server to look up information about the location of content, while the storage and transfer of content still occur between peers. This system structure is great for easy and fast lookup, but problematic because the server is a single point of failure, and needs a lot of storage space to account for growth [6, p. 38].

Pure Pure P2P systems do not have any type of centralisation, and all peers connect directly to a number of other peers. To connect to the system or locate a file, the network is simply flooded with messages, to ensure a high probability of getting to the right place eventually. This results in a lot of potentially long-distance messages, which may result in a slow network due to high bandwidth consumption [6, p. 49].

Hybrid These systems do not have one central server, but instead add an additional layer of peers. These peers act as servers for a limited number of nodes [6, p. 49]. This is done in an attempt to minimise the message load while avoiding having one single point of failure.

2.2.3 Distributed Hash Table (DHT)

Unstructured P2P systems can suffer from either too much search traffic or vulnerable components. Researchers have attempted to solve these problems by designing systems that use distributed hash tables to look up content. A distributed hash table (DHT) is a hash table where the content is saved on a specific peer according to its address. The hash table itself matches the content’s hash (key) to the actual content (value) and then saves the (key, value) pair on the appropriate peer. To be able to find out where to save and fetch content from, DHT systems utilise a routing algorithm to make consistent saving and lookup possible. There are various approaches to these algorithms, as can be seen for example in the Pastry [7] and Tapestry [8] algorithms. The Pastry system

seeks to utilise the technical aspects of P2P systems, like decentralised control and self-organisation, and provides a content location and routing system through an overlay network of connected nodes; it will be examined more closely in the next section (2.2.4). Tapestry is also an overlay network that provides a distributed hash table and routing mechanisms to create a self-organising and fault-tolerant P2P system. Both these systems avoid the large amount of traffic that flooding the network with messages causes, by using routing tables and passing messages through the system based on node IDs.
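Kademlia, the DHT used later in this thesis, takes a related approach but measures closeness between IDs with the XOR metric. The following sketch illustrates the general idea of mapping a key to the nodes responsible for storing it; it is an illustration only, not code from any of these systems, and IDs are shortened to small integers for readability:

    // XOR distance between two IDs; in a real DHT, IDs are 160-bit values.
    function xorDistance (a, b) {
      return a ^ b
    }

    // A key is stored on the k nodes whose IDs are closest to it.
    function closestNodes (key, nodeIds, k) {
      return [...nodeIds]
        .sort((x, y) => xorDistance(key, x) - xorDistance(key, y))
        .slice(0, k)
    }

    // Example: key 13 in a network of five nodes, replicated on 2 of them.
    console.log(closestNodes(13, [2, 7, 9, 12, 15], 2)) // [12, 15]

Because every node can compute these distances locally, a lookup can always be forwarded to a node that is strictly closer to the key, which is what bounds routing to a logarithmic number of steps.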

2.2.4 Taxonomy

For P2P systems, we will examine five different file-sharing systems that are of either historical or cultural relevance to this project. We wanted the selection to be diverse to showcase the pros and cons of various types of P2P systems. The goal of this section is to give a brief overview of the systems and highlight some key aspects of them to provide grounds for a discussion and comparison. The findings from this section will then aid in the design of our own system. This section starts by giving a brief introduction to the systems, before moving on to the key aspects, which will be examined one by one. We have chosen to look at the systems’ level of centralisation, whether they are structured or unstructured, and how they approach file sharing and file availability, as well as their user friendliness and performance. The level of centralisation and file availability address the benefits of creating a decentralised file system; file sharing examines the systems as communities for sharing; and user friendliness and performance both address the user experience. The structure of the system is a matter of implementation and is discussed in relation to this.

Systems

Gnutella
Gnutella is a large P2P network that used to be one of the most popular file-sharing services on the Internet, next to similar systems like FastTrack [9, p. 21]. The first iteration of the system had no overlaying structure and relied on a method called “flooding” to get messages to the right place [6, p. 43]. Whenever a peer requested a file, this request would be sent to every peer it was connected to. They would, in turn, forward the message to every peer they were connected to until the file was located. Later iterations of Gnutella added a new type of peer called super peers (named ultrapeers in Gnutella), which keep track of the peers that are connected to them, and are in charge of routing messages to the right place [6, p. 37]. To connect to the Gnutella network, the user has to download a Gnutella client. Before its discontinuation in 2010, the most popular Gnutella client was LimeWire, which allowed users to search for and download files [10].

Napster
This system was one of the first to popularise the sharing of media content like music in a peer-to-peer manner [9, p. 19]. The company itself was bought out and is still active, and Napster is currently the name of a music streaming service, but the discussion in this thesis will revolve around the inoperative P2P file-sharing software. The Napster system was made up of a central server and a number of peers. The server kept track of all files available in the system, and which peers had copies of each file, while the peers themselves could upload files and request files. Peers had to communicate with the server every time they uploaded or requested a file, but the server itself only contained information; file storing and file transfer were left up to the peers. To participate in the service, users had to download the Napster client, which provided a user interface to keep track of the user’s files and downloads.

PAST (Pastry)
PAST is a P2P storage system based on the Pastry location and routing scheme [7]. Pastry is a DHT where each node is assigned a random and unique node identifier. Each node keeps track of two node sets, a leaf set and a neighbourhood set, as well as a routing table. The leaf set contains the nodes that are numerically closest to the given node by nodeId, and the neighbourhood set contains the nodes that are closest to the given node by a proximity metric decided by the system that uses Pastry. The routing table is used in routing messages and ensures that the message is always being brought closer to its destination with each routing step [7, pp. 331–332]. When a file is inserted into the PAST system, a fileId is assigned to it and it is routed to the n nodes whose nodeIds are numerically closest to the 128 most significant bits of the fileId [11, p. 76]. This file replication, coupled with the routing algorithm provided by Pastry, means that the PAST system can ensure file availability and scalability in a P2P file system.
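This placement rule can be illustrated with a short sketch: a file is replicated on the k nodes whose IDs are numerically closest to the fileId. Real PAST compares 160-bit identifiers (using the 128 most significant bits of the fileId); plain BigInts are used here purely to show the idea:

    // Sketch of PAST-style replica placement with simplified numeric IDs.
    function replicaNodes (fileId, nodeIds, k) {
      const distance = (nodeId) =>
        fileId > nodeId ? fileId - nodeId : nodeId - fileId
      return [...nodeIds]
        .sort((a, b) => (distance(a) < distance(b) ? -1 : 1))
        .slice(0, k)
    }

    // Example: replicate fileId 90 on the 3 numerically closest nodes.
    console.log(replicaNodes(90n, [10n, 70n, 85n, 120n, 200n], 3)) // [85n, 70n, 120n]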

BitTorrent
BitTorrent is the largest P2P network in the world [9, p. 21; 12; 13, p. 205]. BitTorrent is a file-sharing P2P communication protocol, which splits files into little pieces and allows the user to download different pieces from different places, making it very effective in the transfer of large files, such as video files. To use BitTorrent, the user needs to download a client like uTorrent1 or Vuze2, which implements the BitTorrent protocol. The initial version of the protocol was dependent on so-called BitTorrent trackers to function. BitTorrent trackers are servers that keep track of where copies of a file are located and which peers are currently available [14, p. 368]. This information is relayed to any peer that requests a file. Information about trackers and file metadata can be found in torrent files, which have to be downloaded from a website that keeps an index of torrent files. To eliminate the need for trackers, several

1 https://www.utorrent.com/
2 https://www.vuze.com/

BitTorrent clients have also implemented a DHT, which allows for “trackerless” torrents. According to the BitTorrent protocol, a user will simultaneously download and upload a file, which will make it faster and easier for other users to access it. When the download is finished, the user can choose to continue uploading the file [14, p. 368].
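The piece-splitting idea at the heart of the protocol can be sketched in a few lines: a file is cut into fixed-size pieces and each piece is hashed, so that pieces downloaded from different peers can be verified independently. This shows the general principle only, not the actual .torrent metadata format:

    // Sketch: split a file into fixed-size pieces and hash each one.
    // BitTorrent uses SHA-1 piece hashes to verify downloaded pieces.
    const crypto = require('crypto')

    function pieceHashes (fileBuffer, pieceLength = 256 * 1024) {
      const hashes = []
      for (let offset = 0; offset < fileBuffer.length; offset += pieceLength) {
        const piece = fileBuffer.slice(offset, offset + pieceLength)
        hashes.push(crypto.createHash('sha1').update(piece).digest('hex'))
      }
      return hashes
    }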

IPFS
The InterPlanetary File System (IPFS), first released in 2015, is a P2P file-sharing system with the goal of making the entire Internet a distributed P2P system [15]. The creators of IPFS argue that the current client-server structure of the Internet is vulnerable and that websites’ availability suffers from being dependent on whatever server they are located on. They suggest using content-addressing, as opposed to addressing by location. This means that the system will locate a website by its content rather than its physical location. It also makes it possible for the same content to be stored in multiple different locations; on request from the user, the system will locate the one nearest to them [16]. IPFS uses libp2p, a DHT implementation for message routing and content location [17], as the basis for its network. To use IPFS, a user has to download the desktop app. There is also a browser extension available to make use of the system easier.
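Content-addressing can be illustrated in a few lines of Node.js: the lookup key is derived from the bytes themselves, so identical content has the same address no matter where it is stored. IPFS actually uses multihash-based content identifiers (CIDs); a bare SHA-256 digest is used here purely to show the principle:

    // Sketch: derive a lookup key from content rather than from location.
    const crypto = require('crypto')

    function contentAddress (bytes) {
      return crypto.createHash('sha256').update(bytes).digest('hex')
    }

    // Identical content always maps to the same address...
    console.log(contentAddress(Buffer.from('hello')) ===
                contentAddress(Buffer.from('hello'))) // true
    // ...while any change in content moves it to a different address.
    console.log(contentAddress(Buffer.from('hello!')) ===
                contentAddress(Buffer.from('hello'))) // false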

Centralised

While it could be said that a centralised system is not a P2P system at all, this distinction between centralised and decentralised is kept in the comparison to accommodate Napster. Centralised, in this context, does not mean that the entire system is centralised, but that there is a central entity that the entire system depends on. Napster, therefore, is centralised and distributed, and is classified as a P2P system because the peers communicate directly with each other to request and transfer files. As mentioned, Napster was one of the first systems of its kind, and it is important to recognise this type of system in the emergence of P2P systems, both to highlight the problems that this system structure solves and to highlight the issues that can arise in such a system. P2P systems can be hard to scale because it is difficult to organise a large number of peers, which will be touched upon in section 2.2.4. Having a central server, as Napster does, eliminates the need for a routing algorithm that ensures that every message reaches its destination within a reasonable amount of time. This makes it very easy for peers to connect to the system, request a file and contact other peers, as the central server keeps track of all files and peers. However, this server will also be a single point of failure, and will be subject to congestion should it receive too many requests at one time. It is possible, and necessary for systems with many users, to do some load balancing. This entails using multiple servers to distribute network traffic to prevent congestion. However, acquiring and maintaining a large number of servers is both expensive and time-consuming. So, while centralised P2P systems can make it easy to keep track of

the peers, their files, and the communication between them, it ultimately does not utilise the aspects of P2P systems that make them desirable to many in the first place. They lack decentralised control and have a single point of failure that the peers cannot function without. Napster’s role in popularising the often illegal distribution of music over a P2P network will be further discussed in section 2.4.1.

Decentralised

This category encompasses every system that does not have one central unit of control, but is administered by the network of peers. This could be either hybrid systems, where some peers have more administrative roles than others, or pure peer-to-peer systems, where nodes communicate directly with each other on equal ground. Gnutella, PAST, BitTorrent and IPFS are all decentralised systems, even though they could be said to have varying degrees of decentralisation. One definitive advantage of decentralised systems is the lack of one central unit that the entire system depends on. These units are a vulnerability for the entire system, as discussed in section 2.2.1. This also means that the system does not need centralised control, as it can govern and organise itself. In reality, it is difficult for a system to be entirely self-governed, and many of these systems use different measures to ensure that peers are able to locate the files they want.

Gnutella implemented so-called super peers to improve the system’s performance and scalability, as discussed later in this section. This added a hierarchy to the nodes in the network, and left all peers in one of two categories: super peers or leaf nodes [6, p. 49]. Leaf nodes only communicate with super peers, which act as servers for all their connected leaf nodes. Any messages go through the super peers, which keep track of their leaf nodes and which files they have. To ensure that each super peer does not act as a single point of failure for a portion of the system, each leaf node is connected to a number of super peers [5, p. 447]. This way, the leaf nodes are not dependent on one super peer to be able to participate in the system.

Even though BitTorrent is no longer dependent on trackers to function properly, it still needs index sites, which are sites that keep track of large amounts of torrent files. Examples of this are , RARBG and TorrentDownloads [18]. In addition, to be able to join the DHT offered by some BitTorrent clients, the joining peer needs to know about at least one other peer in the DHT, which makes it necessary to have some kind of bootstrap server. Both these index sites and bootstrap servers are vulnerabilities that could cause the system harm, should they be compromised. However, seeing as there are many sites available, and BitTorrent clients each have their own DHT and can connect to it through other peers they are communicating with, the chance of this being a critical vulnerability to the BitTorrent network is relatively small [19, p. 2222].

IPFS and PAST both use a DHT to keep track of and route messages

between peers. This eliminates the need to implement super peers, as every peer plays an equal part in the work of keeping an updated record of peers. In both of these systems, peers communicate directly with each other and keep track of the peers that are closest to them in routing tables. As peers leave and join the network, the peers update their routing tables according to which peers are available to them. Any communication between peers is routed from peer to peer in a manner that ensures that the message is always brought closer to the target peer. All the systems mentioned above are P2P systems where the peers communicate directly with each other. Even so, they usually cannot operate entirely on their own, and utilise a number of solutions for efficient communication, file locating and peer initialisation. Implementing super peers is one solution that introduces a hierarchy of peers to limit the information most of the peers need to keep track of, and increases the efficiency of the communication. In the case of torrenting services, the peers do not need to know which other peers are available to them until they want to download a given file. Therefore, it makes sense for the files to keep track of a number of peers that have the file, and to locate other peers after the download has begun. Finally, using a DHT allows a system to be more or less fully decentralised and self-organised, as peers continually keep track of which peers are available to them.

Structured/Unstructured

In this category, structured means that the network of peers uses some sort of overlaying structure to communicate, like a DHT. Unstructured systems are systems that do not systematically route messages through the network but use other techniques to ensure that messages get where they are supposed to. One example of a structured system is PAST, which uses a DHT to organise the network so that routing messages will be both fast and consume as few resources as possible. This approach is beneficial because it allows the system to be self-organised, and guarantees that messages will reach their destination in a limited number of steps, as discussed in section 2.2.4. The downside of this approach is that initialisation and upkeep of the system will be more costly; each time a peer joins or leaves, routing tables across the network have to be updated. DHT algorithms attempt to make this as efficient as possible, but it is still a source of overhead for structured systems that use DHTs.

Napster and BitTorrent are both interesting cases that fall in between structured and unstructured. Napster could be said to be structured, as it has a central server that keeps track of files and peers, but it does not organise the peers in any way, which is why it is also defined as unstructured. BitTorrent, on the other hand, used to be unstructured and has since added a DHT, which works together with the traditional trackers, to improve the system. This has allowed the peers themselves to keep track of a given file’s available peers, without relying on a server for this [20].

The earliest iteration of Gnutella, in particular, suffered from being unstructured, as flooding peers with messages potentially takes up a lot of bandwidth and time, especially if the peer the message is intended for is not close to the peer who sent it. This becomes an even bigger problem as the size of the system grows, and messages have to travel through a lot of peers to get to their destinations. The implementation of super peers improved the Gnutella system’s scalability and its ability to efficiently route messages.

Structured systems, therefore, are fast and reliable in terms of routing, but they require more upkeep to ensure that the structure holds even after multiple nodes leave and join the network. Unstructured systems are faster to set up and require less upkeep, but they suffer in terms of bandwidth and routing.

File Availability File availability is one of the core issues of our system, which will be discussed further in section 3.1.1. It entails whether or not the system can guarantee the availability of a file, and what the limitations of P2P file sharing systems are in this respect. This issue is at the very core of P2P file systems, as their decentralised and distributed nature relies on the users' machines to store files. Unlike server farms, which exist for the purpose of always being on and available, personal computers are constantly being turned on and off. This means that these types of systems are, at their core, unreliable, and measures need to be taken to ensure file availability.

As mentioned, the users of the system provide both the network and the storage facilities in a P2P system, and these users are connected with a wide variety of machines, in terms of availability, storage space and computational power. One solution to the issue of file availability is to make sure that any file is duplicated a certain number of times throughout the network. To make this work in a real system, it is also necessary to keep an updated record of any replicas that are dynamically created as machines come and go. Usually, a P2P file sharing system will rely on file duplication to keep files available even when multiple peers disconnect from the system.

Napster, Gnutella and BitTorrent are all reliant on the users to have copies of the files. These systems will only have as many copies of a file as there are users who have decided to upload it, and if none of those users are online, the file is not available to other users. None of these systems perform file duplication to keep a file available, but users of BitTorrent will simultaneously download and upload a file. A peer who is downloading a file is called a leecher, and a peer who is uploading a file is called a seeder. BitTorrent peers use a tit-for-tat strategy, which means that each peer tries to optimise its download speed by punishing peers who only leech [21], as sketched below. BitTorrent does this, in part, to ensure the availability of files. However, as all these systems rely on users to keep a file in the system, they cannot guarantee file availability, which often results in less popular files being rendered unavailable [22, 23].
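The following Python sketch illustrates the tit-for-tat idea in simplified form; it is not BitTorrent's actual choking algorithm. The slot count, the rate bookkeeping and the peer names are assumptions made for illustration.

```python
import random

def choose_unchoked(download_rates: dict[str, float], regular_slots: int = 3) -> set[str]:
    """Simplified tit-for-tat unchoking.

    `download_rates` maps peer IDs to the rate at which each peer has
    recently uploaded to us. Peers who only leech (rate 0) tend to lose
    their upload slots; one extra slot goes to a random remaining peer
    (an "optimistic unchoke") so newcomers get a chance to contribute.
    """
    best_first = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(best_first[:regular_slots])  # reward the best contributors
    remaining = [p for p in download_rates if p not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))  # optimistic unchoke
    return unchoked

rates = {"alice": 120.0, "bob": 45.5, "carol": 0.0, "dave": 88.2, "eve": 0.0}
print(choose_unchoked(rates))  # the two pure leechers compete for one slot
```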

PAST utilises file duplication to ensure file availability, and the files are strategically placed on peers according to their hash. This makes it easier to locate a file, as it is possible to use the routing table to find a peer who has it. PAST will also maintain a given number of copies, as long as there are enough peers to store copies on.

In a P2P system where file availability over time is a core element, file duplication is necessary to be able to guarantee availability. Many of the systems in the comparison make no attempt at guaranteeing file availability, because they mainly exist as file sharing systems rather than file storage systems. The result is that the most popular files remain available and exist across many peers, while less popular files will be difficult to come by, and may disappear over time. A file system that seeks to maintain copies of the users' files cannot follow the same approach, because the files belong to users who most likely want to keep everything they archive. One solution to this is file duplication combined with systemised placement according to file hash, which makes the files easy to locate.
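A maintenance pass of this kind could look like the following Python sketch. It is a minimal sketch under stated assumptions, not PAST's actual algorithm: the replication factor, the interval, and the count_replicas and store_on operations are hypothetical stand-ins for what a real DHT layer would provide.

```python
import time

REPLICATION_FACTOR = 4      # desired number of copies per file (assumed value)
REPUBLISH_INTERVAL = 3600   # seconds between maintenance passes (assumed: hourly)

def maintain_replicas(stored_file_ids, live_peer_ids, count_replicas, store_on):
    """One maintenance pass: top up the replica count of every stored file."""
    for file_id in stored_file_ids:
        missing = REPLICATION_FACTOR - count_replicas(file_id)
        if missing <= 0:
            continue  # enough copies already exist
        # Place the missing copies on the peers whose IDs are closest to the
        # file ID, mirroring the strategy of storing files by hash so that
        # the routing table can later be used to find them.
        closest_first = sorted(live_peer_ids, key=lambda peer: peer ^ file_id)
        for peer in closest_first[:missing]:
            store_on(peer, file_id)

def republish_loop(node):
    # Run forever, once per interval, so copies lost to departing
    # peers are continually replaced.
    while True:
        maintain_replicas(node.files, node.peers, node.count_replicas, node.store_on)
        time.sleep(REPUBLISH_INTERVAL)
```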

File Sharing Historically, many P2P systems have been centred around file sharing, and all the systems in this comparison have ties to this category. As this project seeks to create a system that is both personal and collaborative through sharing, this is an interesting category to explore. This section will look at how the various systems approach file sharing, and how these approaches differ.

Gnutella, Napster and BitTorrent are all examples of file-sharing P2P systems that have been, perhaps wrongfully, classified as "illegal" file-sharing systems. This is because they have been widely used for the illegal distribution of files, which will be discussed further in section 2.4.1, but neither the clients, networks nor protocols they use are in any way illegal. Due to the vast amount of media content like music, tv-shows and movies available within these systems, they have also been, and still are, immensely popular and widely used. Napster reached millions of users [24, p. 339] within months of its inception, Gnutella is still alive to this day3, and a study from 2018 shows that BitTorrent at the time made up 4.10% of the global Internet traffic [12]. What these systems have in common is that they have been used by anyone who wants to transfer or download files, and have been popularised by the presence of readily available media content.

PAST is a file system that has not been implemented as anything other than a research project, but it was intended to be a global file storage utility. Sharing was never the main purpose of the PAST system, whose intention was to share bandwidth and storage space to be able to archive files, both to be able to store larger files and to ensure backup [11]. Still, sharing a file was possible through the distribution of a file's fileId, even if this was not a central aspect of the system. This system is, therefore, a file storage system for individual users that seeks to utilise the collaborative benefits of a P2P system.

IPFS is not so much a file sharing system as it is a website sharing system. Much like PAST, IPFS wishes to use the distributed and collaborative aspects of P2P systems to make the Internet more reliable and robust. One of the main arguments for the IPFS system is that it is possible to access a file at any time because it is not located on one particular server. The way the web is designed today, there is no simple way of achieving this, as all files on the Internet are located by their address, which corresponds to their physical location on a server. However, if multiple machines had copies of a file, it would be possible to access it, even if some of them were unavailable.

The most widespread use of P2P in file sharing, then, is the distribution of media like music, tv-shows, movies and books. These so-called "pirating" systems have been branded as illegal, though their use and functionality are not necessarily so. Other systems, like PAST, are research projects that were created with the intention of implementing a large-scale file storage utility based on P2P technology. Finally, IPFS has taken the advantages of P2P technology and applied them to the endeavour of making the Internet itself into a P2P system, to be able to access websites more efficiently and prevent them from disappearing.

3 http://gtk-gnutella.sourceforge.net/en/

User Friendliness User friendliness, in this thesis, entails whether or not the system's user interface (UI) is easy and intuitive to use. This matters because we want our system to be usable by anyone who uses the Internet and wishes to be able to access the content they consume, even if it is changed, removed or lost. Most of the systems in this comparison are available through a graphical user interface (GUI).

To define user friendliness, it is necessary to keep the intended user group in mind, as various groups of users have various needs and will view computer applications differently. Variations will also appear within one user group, but limiting the user group makes it easier to envision which prerequisites the user has for using the system. The target group for our system is people between the ages of 20 and 50, who have experience using both computers and existing bookmarking services. This could either be a bookmarking system provided by a browser, or a more advanced system like the ones described in section 2.3.

The UI of the various systems will be evaluated according to four design principles from Galitz [25], namely clarity, consistency, simplicity, and responsiveness. As many of the principles in Galitz's book are tightly interconnected, we chose to limit the comparison to these four. These principles also appear in many other books on user interface design and user friendliness, and the following list examines each principle and outlines what will be emphasised in the analysis.

Clarity This design principle is tightly connected with simplicity, but has been kept as its own category to focus more on individual elements and vocabulary rather than the overall structure of the UI. Clarity, as described by Galitz [25, p. 46], means that: "Metaphors, or analogies, should be realistic and simple. Interface words and text should be simple, unambiguous, and free of computer jargon." In particular, this category will look at the use of metaphors such as icons, and the vocabulary used in the UI.

Consistency Consistency is a design principle that is often brought up in writings about usability [25, pp. 48–49; 26, pp. 174–175; 1, pp. 404–405], and focuses on consistency in design and functionality. This means that any visual elements that have similar functions should look similar, that the use of fonts and colours should be the same across the UI, and that any user action should yield the same result every time. The users develop a mental model of how a system works as they use it, and this should be supported by the UI with predictable behaviour and visual design.

Simplicity This design principle is also one that has been emphasised often and repeatedly [25, pp. 56–57; 26, pp. 170–172; 27, p. 45], and as the name suggests, it highlights the importance of keeping user interfaces simple. To achieve this, the designer is encouraged to use well-known icons, words and UI controls, and to keep the UI well structured and free of clutter. This category will explore the overall complexity of the UI, looking at how many different elements are present and how these are organised.

Responsiveness This category mainly concerns the performance of the systems, and will therefore not be examined in detail until the next section, which specifically addresses this aspect of the systems in the comparison.

The user friendliness of Gnutella and BitTorrent depends on the client, as both networks are available through a variety of different clients. For the purpose of this comparison, Gnutella will be examined through one of the few clients still available, gtk-gnutella4 1.1.15 for Windows [28], and BitTorrent through its most popular client, uTorrent5 [29], specifically uTorrent Web 1.8.7 for macOS. IPFS will be examined through its desktop application, version 0.11.4 for macOS. Napster, as it is no longer available as an application, will be examined through a screenshot. As PAST is a research project, and not readily available for public use, it is exempt from the comparison. In addition, as none of the systems in the comparison are Internet archiving systems, they will be analysed based on their visual appearance only. The functionality of the systems is very different from the one we will be making in this project, and it was therefore deemed unnecessary to spend time and effort on this.

4 http://gtk-gnutella.sourceforge.net/en/
5 https://www.utorrent.com/

Figure 2.2: Launch screen of uTorrent Web

Figure 2.3: Download screen of uTorrent Web

BitTorrent uTorrent is now primarily available as a web app, called uTorrent Web. It still has to be downloaded and installed on the user's computer, but the UI is only available through a web browser.6 The uTorrent Web UI that meets the user upon launch, as seen in figure 2.2, is simple and straightforward. It allows the user to either search for a torrent, add a torrent manually, or view a video tutorial. There is also a small menu for user profile and settings in the top right corner. When the user adds a torrent, a different page with a list of the torrents is shown. Figure 2.3 shows this page, with one torrent in the list, which is opened to show the details.

In terms of clarity and simplicity, the uTorrent Web application has narrowed its functionality down to the bare essentials and displays no unnecessary information. The landing page, in particular, has eliminated anything that is not essential to the task at hand, adding a torrent, which results in a very simple design that makes it clear to the user what the possible actions are. For any users who are unfamiliar with the application, there is also a tutorial available, but this is not presented unprompted. As for the clarity of icons and vocabulary, the application uses only a few icons, and overall the vocabulary is clear and conveys the intention of every element in the UI. As for icons that are unaccompanied by text, the application has a small menu in the top right corner, as well as a small menu for each torrent file, both visible in figure 2.3.

uTorrent also has clear visual consistency, as can be seen through its use of the same colours and fonts throughout the design. The main colour scheme is a dark grey background with light grey accents, white text and a bright green colour to highlight certain elements. A blue colour is also used to highlight the download progress for a torrent.

6 This is the case for macOS after Catalina; Windows users can still download and use the desktop application.

Figure 2.4: gtk-gnutella GUI

Gnutella gtk-gnutella, as shown in figure 2.4, is a desktop app that automatically connects the user to the network and allows them to search for files. The search field is located at the top of the screen and includes an input field, options to change the media type to search for, and two configuration parameters. On the left side of the screen, there is a panel showing the user's searches, and multiple animated bars showing uploads, downloads and network traffic. The main portion of the screen is made up of the connected machines in the network, but this section of the GUI can be changed to show a number of different things, like search, downloads and statistics, as can be seen in the header for the section. At the bottom of the GUI, there are options to customise the network connections or disconnect from the network.

The app uses a lot of descriptive words in its GUI, which is to say that any menus or options are quite verbose and leave little doubt as to their function. Some might require domain knowledge to understand, like "Søkeovervåker" (Search Monitor), but in general, the clarity of the vocabulary is good. However, there are a lot of different options, menus and sections in the GUI, which affects both its clarity and simplicity. Additionally, everything is designed to look rather uniform, so no parts of the UI are highlighted and differentiated from the others, making the visual hierarchy entirely dependent on the elements' location on the screen. The design is simple in that one thing dominates the GUI, but there is still a lot of information on the fringes of the design that battles for the user's attention.

Visually, the design is consistent in that it has a simple colour scheme throughout. The background is mainly white with black text, while the header and sidebar have a grey background. Some bright colours are used to highlight certain elements, particularly the green used to indicate the traffic speed, and the red used by the "What's New?" button in the top right corner and the "Koble fra" (Disconnect) button at the bottom of the screen. This use of contrasting colours is not overwhelming, and successfully grabs the user's attention, though the moving green bars may also be a source of distraction, as they are very bright and take up a good portion of the sidebar.

Figure 2.5: IPFS status menu

IPFS IPFS will be analysed through its desktop application, which is made up of a status menu icon, as seen in figure 2.5, and a desktop app, as seen in figure 2.6. Upon launch, IPFS only appears as the status menu icon, which indicates that the system is running, and gives a short list of options for the system. The clarity of this menu is good, as it uses descriptive names, and it is both simple and concise. Clicking either "Status", "Files" or "Peers" will open the desktop app.

Figure 2.6: IPFS desktop GUI

The desktop app GUI is divided into three sections: the sidebar, the header and the main work area. The sidebar has five different menu items and is accentuated by a dark blue colour. The header has a light blue background to separate it from the rest of the design, and only contains a text input field with an accompanying button, and a menu with two icons. Everything else is displayed in the main work area, which in figure 2.6 shows the peer connection status, and the traffic both over time and currently.

The clarity of the design is good, and it is very simple and straightforward. Throughout the app, domain-specific vocabulary such as "peer" and "QmHash" is used, which might lessen the clarity of the system for some users. However, due to the nature of the system, it would be difficult to eradicate this completely, and it is likely that the users of IPFS either have knowledge of the domain, or the ability to acquire it. To this end, the question mark icon in the top-right header menu provides step-by-step explanations of the different elements of the main work area. In terms of simplicity, the design uses a lot of white space and differently coloured backgrounds to differentiate between the various elements. There are also few menu items in the sidebar, and the main work area only shows the essential information, with the option to show more advanced information.

The colour scheme of the GUI is consistent, using mainly shades of blue and teal. The sidebar, highlighted menu item and header are all shades of the same blue colour, while the IPFS logo, the "in" traffic in the graphics, and the icon menu in the top right corner are all in shades of teal. Two colours deviate from this: the blue colour used for links in the box with information about the peer, and the orange accent colour used for "out" traffic in the graphics. The blue colour corresponds well to the blue that is standard for HTML links and is likely used to indicate that these are clickable links, which they are. The orange colour is used in contrast to the blue and green colours in the rest of the GUI, and works well to differentiate the two directions of traffic in the graphics. Clicking the items in the sidebar menu only changes the main work area, while the sidebar and header stay the same, which adds a layer of overall consistency to the GUI.

Figure 2.7: Napster GUI

Napster As the initial description of Napster mentioned, the Napster software is no longer available, but it was a desktop app, as pictured in figure 2.7 (screenshot from [30]). Napster's GUI was split into three main sections: a header, the main work area and a footer. The header contains a menu for the different functionality offered by the program, and the footer shows information about the network and files. In figure 2.7, the "Transfer" menu item is selected, and the main work area shows files that the user is downloading or uploading, as well as two options to clear the finished files or cancel a download or upload.

In terms of clarity, this design does well, as it uses a lot of descriptive words that clearly define the difference between the various menu options, file information types and any other information given. This verbose approach might make it more difficult to quickly scan through the elements of the GUI, but it leaves no doubt as to the intention of the various elements. Like gtk-gnutella, the clarity of Napster's GUI might suffer from how uniform both the colours and the elements are, as there is nothing to really differentiate the various parts from each other aside from borders.

However, the Napster design is less cluttered and keeps the elements on one page to a minimum, which makes it easier to navigate. Overall, both the clarity and the simplicity of the GUI are good.

The consistency of the GUI is good and adheres to the look and feel of the Windows 95 operating system. It has a simple colour scheme, where the background and buttons are all the same shade of grey, and the main work area is emphasised with a white background. All the text is black, and two colours are used to emphasise and differentiate between the download and upload progress.

Performance This category examines the performance of the systems, that is, how fast they perform the actions the user requests. More specifically, this section will investigate the systems' ability to route messages through the network, which for file systems centres on the ability to locate a file. The performance of the systems will, in the context of this project, also be seen as a subcategory of user friendliness, as the two are closely related. If a system takes too long to respond to a user request, the user will get bored and lose interest [4, pp. 135–137, 31], which is not desirable. Therefore, the design principle responsiveness encompasses performance.

As long as a file is available in the system, Napster can guarantee that a lookup will be fast and successful, with an O(1) lookup time, as it only requires one message to the central server. The same goes for BitTorrent with the use of trackers, because a torrent file contains a direct link to its tracker. This results in good performance when it comes to file location for both systems, but it also introduces a single point of failure that will suffer under heavy traffic, as discussed in previous sections.

PAST, through the use of Pastry, performs message routing in fewer than ⌈log_{2^b} n⌉ steps on average (where b is a configuration parameter, usually with value 4) [11, p. 78]. This way, the system can guarantee that a message is always brought closer to its destination by systematically routing it through the network of peers. Other DHTs like Chord [32] or Kademlia [2] are also able to perform routing in O(log n) steps. The latter is used by both the BitTorrent DHTs and IPFS.

Gnutella's initial solution to message routing was very inefficient (see section 2.2.4) because it would simply flood the network with messages, hoping to reach the right peer within a limited number of hops. The number of hops was limited by a time to live (TTL) flag that was decreased with every hop. If the TTL reached zero, or the message reached a node it had already visited, the message was not passed on. The result was that, in the worst case, a query in the Gnutella network never reached its destination and returned no result, even if the peer it was trying to reach was available. While the performance of the network was greatly improved by the implementation of super peers, the system is still limited by the super peers using the same flooding techniques as the earlier protocol [33, p. 85].
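This flooding behaviour, and why it can fail, can be illustrated with a small Python sketch. The Node structure is hypothetical, and the default TTL of 7 is an assumption based on common Gnutella client defaults; the point is that the query dies when the TTL runs out, whether or not a matching peer exists further away.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical Gnutella-like peer: some shared files and some neighbours."""
    shared_files: list[str]
    neighbours: list["Node"] = field(default_factory=list)

def flood_query(node: Node, query: str, ttl: int = 7, seen: set[int] | None = None) -> list[str]:
    # Forward the query to every neighbour, decrementing the TTL at each hop.
    # A message whose TTL reaches zero, or that revisits a node, is dropped,
    # so a file held by a peer too many hops away is never found even though
    # that peer is online -- the worst case described above.
    seen = set() if seen is None else seen
    if ttl == 0 or id(node) in seen:
        return []
    seen.add(id(node))
    hits = [f for f in node.shared_files if query in f]
    for neighbour in node.neighbours:
        hits += flood_query(neighbour, query, ttl - 1, seen)
    return hits

# A three-node chain: querying from `a` with TTL 2 never reaches `c`'s file.
c = Node(["rare-song.mp3"])
b = Node([], [c])
a = Node([], [b])
print(flood_query(a, "rare-song", ttl=2))  # [] -- the file exists but is out of reach
```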

As this category shows, the systems can be divided into three groups when it comes to performance. There are the systems that have a constant lookup time, as they only require one message to a central server or tracker, like Napster and BitTorrent. The systems that use a DHT, in our case PAST and IPFS, have a logarithmic lookup time. Finally, there are systems that have no guarantee that a lookup will even be successful, such as the early iteration of Gnutella, which makes for poor worst-case performance. All these solutions have advantages and disadvantages. Using a central server is efficient, but the server becomes a vulnerable point for the system. A DHT makes message and content routing fast, but requires all the peers to keep track of the peers close to them, which takes time and effort.

Summary of Taxonomy This taxonomy has looked at five different file-sharing P2P systems in light of a number of categories that are relevant to this project. Table 2.1 shows the results of the comparison, where a checkmark indicates that the system meets the criteria for the category, a cross means that it does not, and a tilde means the criteria are met to some degree. This section has looked at categories that highlight the advantages and disadvantages of certain system structures and implementation choices that can have an impact on the design of our own system.

                   Gnutella   Napster   PAST   BitTorrent   IPFS
Decentralised         ✓          ✗       ✓         ✓          ✓
Structured            ✗          ∼       ✓         ∼          ✓
File availability     ∼          ∼       ✓         ∼          ✓
File sharing          ✓          ✓       ✓         ✓          ∼
User friendliness     ∼          ✓       /         ✓          ✓
Performance           ✗          ✓       ✓         ✓          ✓

Table 2.1: Comparison of peer-to-peer systems

The section started out by looking at the systems' level of centralisation, which revealed that using a DHT can make it possible to create a self-governing P2P system. While the time and effort it takes to set up and maintain a DHT might be greater than with other system structures, it eliminates the need to keep track of peers and files in any other way. Additionally, in a system where file availability is key, this type of structure can also make it easier to keep track of which files are available in the system, and to maintain a certain number of copies of each file to make sure they do not disappear over time. As sharing is a core element of this project, this section also briefly looked at file sharing as one of the more popular uses of P2P systems today. Finally, this section analysed the user friendliness of the systems, taking into consideration both their visual design and performance. The former showed the importance of having a simple GUI that does not overwhelm or confuse the user, and the latter showed that using a DHT is a good way to ensure both good performance and reliability.

2.2.5 Summary of Peer-to-peer Systems

This section has looked at P2P systems in detail, by examining both their background and characteristics, and by comparing a variety of P2P systems that already exist. The section started by outlining the differences between P2P systems and traditional centralised systems, before delving into more detail on the characteristics of these types of systems. In the final, and largest, part, five different P2P systems were compared to highlight the differences and similarities between them. This comparison, along with a similar comparison of Internet archiving systems, will act as the basis for our own system.

2.3 Internet Archiving

While the structure of the system in this project is P2P, its functionality is that of an Internet archiving system. Therefore, this section will look at the history and culture surrounding Internet archiving, as well as three central issues: tampering, funding and licensing. Then, like the last section, five different Internet archiving systems will be examined according to categories that are relevant to this project.

2.3.1 Saving the Internet

Within the field of digital archiving, there is a long-lived tradition of archiving Internet content in various forms. One of the IPFS system's main arguments for its redefined web is the fact that websites are short-lived and that the history of humanity is being deleted daily as old websites become obsolete and unavailable. The fact that there is such a vast amount of media and information on the Internet, combined with the knowledge that the nature of this medium is ephemeral and uncertain, has spawned a multitude of Internet archiving systems. One such system is The Internet Archive, which is perhaps the most widely known organisation dealing with the archiving of a variety of online artefacts, including websites, books and media [34, 35]. There has also been an effort, through the Memento project [36], to collect all the Internet archives in one place to make it easier to access archived content without having to check multiple sources. Memento is available as a web browser extension that lets the user go back in time to view previous versions of the site they are currently visiting, if there exists at least one archived copy of the site.

Another system that specialises in archiving various information from a browser, including history, bookmarks and websites, is ArchiveBox [37]. As opposed to The Internet Archive's public and cultural goal for archiving digital content, ArchiveBox is meant for personal use, to be able to revisit web content the way it was at a certain point in time. However, the underlying principles are the same: to save local copies of online content that is prone to change or disappear over time. Systems like ArchiveBox, which are intended for personal use, can be viewed as advanced bookmarking services. Like the bookmarking services that are available in web browsers, they allow the user to save a website for later consumption. In addition to this, Internet archiving services often make the content available offline, and ensure both that the content remains unchanged and that it will be available even if the original website changes or disappears. The list of similar systems is long, containing all kinds of archiving services for small- or large-scale archiving of web content [38–42]. A selection of these systems will be examined more closely in section 2.3.3.

A web article from 2018 highlights the importance of web archiving and lists a variety of reasons why it matters [43]. Weigle focuses on the cultural importance of such systems, and on how the Internet is an integral part of our history and embedded in our culture. This effort to record online history is also evident through nationwide efforts to archive online content, usually directed by national libraries. In Norway, the national library has an initiative called Nettarkivet, an Internet archive dedicated to collecting websites registered under the Norwegian domain .no, as well as any other sites that are of cultural relevance to Norway [44]. This illustrates that the preservation of Internet content in the interest of cultural importance is not merely an endeavour for the particularly interested, but also a matter of national concern.

2.3.2 Issues in Internet Archiving

This section looks at the issues of tampering, funding and licensing, and how they relate to Internet archiving systems. Tampering examines how to prevent the archived files from being edited, to achieve the persistence of data that the Internet lacks. Funding looks at the various ways that Internet archiving initiatives are funded, and how a P2P structure can help alleviate some of the costs. Finally, licensing discusses the copyright issues that might arise when permanent copies of websites are created, and how to make sure that the operations of an Internet archiving system are legal.

Tampering One issue to take into consideration is tampering. Tampering, in this context, means changing the archived files in any way, either by editing the contents or by corrupting the file. As one of the main arguments for Internet archiving is that saving copies of websites saves them from being tampered with at a later time, preventing this is important.

One solution to tampering that is used by IPFS, among others, is content addressing. As mentioned in section 2.2.4, this entails addressing by content rather than location. An address, in this context, is a name referring to the point that makes it possible to access an entity in a system [45, p. 238]. Traditionally, the Internet uses location addressing, which means locating a website by the physical location of the site's host, through its IP address. This separates the addressing of the site from its actual content, which allows a website to change over time without having to also change its IP address. Through content addressing, a much tighter link between content and its address is created. For IPFS, and most other DHTs, this means assigning a unique hash value to each file, calculated from its contents. This hash value then acts as both identifier and address for the content, because the identifier is used to locate the content [45, p. 246]. The result is that any change to the content of the file will result in a new hash value. This works as a measure against tampering because it is easy to check whether a file has been tampered with, by computing the hash again and checking it against the existing hash.
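As a concrete illustration, the following Python sketch derives a file ID from the content itself and uses it to detect tampering. The choice of SHA-256 is an assumption made for illustration; IPFS, for instance, wraps its digests in a multihash format, which is omitted here.

```python
import hashlib

def content_address(data: bytes) -> str:
    # The identifier is derived from the content itself (SHA-256 here,
    # as a simplifying assumption).
    return hashlib.sha256(data).hexdigest()

def is_untampered(data: bytes, expected_id: str) -> bool:
    # Recompute the hash and compare: any edit to the archived file,
    # however small, yields a different hash value.
    return content_address(data) == expected_id

snapshot = b"<html>an archived page, exactly as it appeared</html>"
file_id = content_address(snapshot)

assert is_untampered(snapshot, file_id)
assert not is_untampered(snapshot + b"<!-- edited later -->", file_id)
```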

Funding Another issue that needs to be addressed by anyone who wishes to make an Internet archiving system is funding. This section examines the issue by looking at three different types of funding, before discussing why a P2P system is a good solution for avoiding the brunt of the cost. The three types of funding examined in this section are private donations, state funding, and profit.

While many of the initiatives for Internet archiving are open-source and non-profit, and so do not have any employees, the storage space needed to archive potentially large files is costly, and this, as well as other operational costs, needs to be funded. Consequently, many of these systems are dependent on donations to keep the operation running. Large systems like The Internet Archive need large sums of money to be maintained, and receive donations from various foundations and funds [46]. They also accept donations from individual users, which is often also the case for smaller systems, like ArchiveBox.

The other type of initiative that does not earn any money is the state-funded project, like Nettarkivet. These projects are funded by the government, and state employees undertake the task of maintaining the system, which means that their operations are not dependent on outside sources.

The last type is for-profit corporations that offer paid services to the user. To use these products, the user either has to pay outright, or they are offered limited functionality that can be extended by paying.

All these types of organisations need money to fund storage space, which is not necessary in a P2P system, as the users themselves provide the storage space. This leaves the cost of creating and maintaining the system, which can be covered by the collaborative nature of open-source development. This is the case for ipwb, a P2P Internet archive that will be discussed in further detail in section 2.3.3. An issue that arises in P2P systems is that you also need users to get storage space, and creating incentives that will convince users to use the system is not always easy, which will also be discussed further in section 2.3.3.

There are multiple ways that an Internet archiving system can be funded. This section has looked at three different types and offered a P2P structure as a cheaper alternative. Internet archiving systems are generally either funded by other organisations or their users, funded by the state, or run as for-profit corporations whose services the user has to pay for. An alternative to all of these is a P2P system, where the users provide the storage space, which works well in systems where there is no need for a lot of central storage.

Licensing One final issue that requires attention is licensing. Internet archiving systems create permanent copies of websites, and wherever there is copying of artefacts, there is a question of copyright. Content like articles, images and videos that are posted to websites, and even the websites themselves, are created by people who may not wish for their content to be copied without their permission. Internet archiving is special in this sense, because most of the time the website is copied and archived exactly as it appears, including any credit attributed to the creators. Yet it is an issue that has been addressed by Internet archives, and it needs to be considered in the creation of such a system.

One way of tackling this issue is to not share any of the websites with the public. Many library initiatives, like Nettarkivet, do not publicly release their archived content, and so avoid any licensing issues by keeping a closed archive in the name of cultural and national interest. This is, however, not a solution that will work for Internet archives that seek to make the history of the Internet publicly available, or that allow users to archive the content they want for later consumption. Both library initiatives and archives like The Internet Archive generally fall under fair use, which exempts them from copyright claims, as their archiving efforts are in the name of research and public benefit [47].

Many archives, like The Internet Archive, will allow users to archive any website as long as the website does not prevent it, and will remove any websites on request from the creator. This also goes for Pocket, a personal archiving service that will be further discussed in the next section. Their terms of service state that "By posting, sharing or saving any videos, articles or content, you represent that doing so does not infringe any third party's copyrights, trademarks, privacy rights or other intellectual property or legal rights of any kind." [48] This "lazy" approach places the responsibility on the website creators and the services' own users. Either the website creators have to make it impossible to copy the site, or the users have to make sure that they are not violating any terms of service when creating the copy.

The legality of Internet archiving, and copyright issues on the Internet in general, is not an easy problem to solve [49]. In a personal archiving system, there is no need for the content to be available to all the users at one time, and like the closed national Internet archives, keeping the files private can help with copyright issues. Encrypting the files in the system, so that they are only available to those who have the key to access them, would mean that the copies are made privately and that they remain private throughout their lifespan.
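A private archive along these lines could encrypt each snapshot before it leaves the user's machine, as in the Python sketch below. The use of the third-party cryptography library's Fernet scheme (authenticated symmetric encryption) is an assumption made for illustration; nothing in this chapter prescribes a particular cipher.

```python
# Requires the third-party `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

def archive_privately(snapshot: bytes) -> tuple[bytes, bytes]:
    # Encrypt the snapshot before handing it to other peers for storage.
    # The key never leaves the archiving user unless they choose to share it.
    key = Fernet.generate_key()
    return key, Fernet(key).encrypt(snapshot)

def read_archived(key: bytes, ciphertext: bytes) -> bytes:
    # Only holders of the key can recover the archived page, so the copies
    # stay private even though they physically live on other users' machines.
    return Fernet(key).decrypt(ciphertext)

key, blob = archive_privately(b"<html>my saved article</html>")
assert read_archived(key, blob) == b"<html>my saved article</html>"
```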

Summary of Issues in Internet Archiving This section has looked at Internet archiving in light of three specific issues, namely tampering, funding and licensing. Tampering looked at how to prevent files from being edited after archival and suggested content addressing as a possible solution to this end. Funding outlined the various ways that Internet archiving might be funded, like donations, state funding or by offering paid services and then looked at how a P2P system structure can help lessen the cost of the system. Licensing examined copyright issues that might arise in a P2P system and highlighted that this might not be an issue in a system where the files are not made publicly available and suggested encryption as a way of keeping the files private.

2.3.3 Taxonomy

In the same way that we compared and examined key features of P2P systems, we will also look at five different Internet archiving systems that are relevant to our system. These systems all specialise in the archiving of Internet content, usually whole websites, either for cultural reasons or as an advanced bookmarking service for individuals. The goal of this section is to examine and compare a variety of Internet archiving systems to see the advantages and disadvantages of how they approach Internet archiving, as a way to guide the design of our system.

As with the comparison in section 2.2.4, this section starts with a short introduction to the systems that are compared in the taxonomy, before moving on to the actual categories of comparison. For Internet archiving systems, five categories that tie directly into our project will be explored, either in the form of issues that we seek to address, or implementation choices for the prototype. The systems' level of centralisation is addressed in the P2P category, their personal and collaborative nature is examined in the categories Personal and Sharing, and user friendliness is examined as before, through an analysis of their UI. The final category, Local Saving, looks at an implementation choice that has to be made.

Systems

Webrecorder.io The Webrecorder.io (Webrecorder)7 system allows the user to capture browsable copies of websites, which makes it possible to save, revisit and share content that might disappear in the future [39]. The service requires the user to sign up for an account to be able to create a permanent collection of websites. The service is also available as a desktop app. It is possible to use the service without creating a user, but this requires the user to download the desktop app and save the websites locally on their own computer. There is also another desktop app available, called Webrecorder Player, which allows the user to revisit local copies of archived sites offline.

7 As of June 12, 2020, the name of Webrecorder has changed to Conifer, and its visual design is different, but the functionality of the system remains the same.

ArchiveBox This is another service intended for personal use, much like an advanced bookmarking service. ArchiveBox has support for saving multiple types of information from the browser, including history, bookmarks and websites [37]. This service does not require the user to register, and it is only available through a command-line interface.

Pocket While Webrecorder and ArchiveBox both focus on saving websites specifically, Pocket does not aim to save browsable websites, but rather snippets from them, like articles, videos and stories [50]. This content is available on any device that has Pocket installed, and the user can read it whenever and wherever they wish. Pocket requires the user to sign up to use the service, which is available in the browser, or as an app for the phone or tablet.

The Internet Archive This project, as mentioned, aims to be a digital library containing various types of digital content, such as websites, books, audio, video, images and software. All the content that is archived here is freely available to the public. This system is not intended for personal use, as it has a much larger, cultural goal in mind. However, it is possible for users to utilise the Wayback Machine [51] to archive sites that they want to preserve, to visit old versions of a website, or to view archived sites that are no longer available.

ipwb InterPlanetary Wayback (ipwb) [41] seeks to archive WARC files in the IPFS network and provide the means to replay these using pywb [40]. In their paper [52], the authors point out that projects like The Internet Archive depend on the organisation to ensure that files stay available, which is not the case when a P2P network is used to save and distribute files. To use ipwb, a user has to download and install the system and interact with it through a command-line interface.

Personal This category entails whether the system is intended for personal use, like an advanced bookmarking service, or whether the goal is to archive the entire Internet. As this project is centred around making an application for personal use, yet seeks inspiration from the importance of Internet archiving in general, this category was included in the comparison. Webrecorder, Pocket and ArchiveBox are all intended for personal use, while The Internet Archive and ipwb both seek to archive the Internet for cultural and historic reasons, while also allowing private users to utilise their capabilities for their own purposes.

One difference between personal and impersonal systems is that the impersonal systems generally seek to archive as much of the Internet as possible, and, in theory, all of it. While this goal is difficult to realise, as the Internet continues to grow and change every day, it is still evident that these systems aim to save a lot of data. This data has to be saved somewhere, which means that Internet archives like these necessarily need a lot of storage space. While personal systems also potentially need a lot of storage space with a large user base, the storage space per user is significantly smaller. Additionally, the files do not need to be publicly available in a system intended for personal use, and thus it becomes easier to avoid licensing issues.

As the system in this project is a P2P system, it is essential that the users see the benefit of participating, since a P2P system depends on having active peers in order to persist. A system meant for personal use has the benefit of gathering one user's files in one place, while still keeping the collaborative nature of the system present through the sharing function. That the system is tailored towards the user's experience, rather than some higher goal, might help incentivise some users to participate in the network.

Local Saving Local saving means that a user has to save the files on their own computer, as opposed to a server provided by the service. The only system that requires the user to save files locally is ArchiveBox. Webrecorder, Pocket and The Internet Archive all provide servers for the content that is archived. This largely eliminates the need to save the content locally, but it adds a vulnerability in that the user is dependent on the central servers of these systems. Pocket and Webrecorder both allow the user to view the content offline, which requires the user to download it.

The dependency on central servers outside of one's control requires trust from the user. The content will become unavailable to the user if the server fails, and if it is compromised, the content may be lost. Using a P2P structure like ipwb's to save the files can allow the user to share storage space with other users, and to contribute as much as they are capable of. It does, however, require that the user gives up some of their storage space to share with others. Some users may not be willing to do this, which will be discussed further in the next section. With a file duplication algorithm, the system will also be able to guarantee file availability as long as a certain number of peers is available.

Sharing Sharing is an essential part of this project and means that the user should have the option to share the content they save, typically through a share button. Having a sharing function separates the system from purely personal use, and reminds the user that it is possible to share the content they consume. Through sharing, the use of the system extends beyond the individual user and opens up a collaborative environment where users share information, knowledge, and entertainment.

Webrecorder, Pocket and The Internet Archive all have the option to share a website you have saved. Webrecorder has a share button, with support for sharing on Twitter and Facebook, in addition to providing a link that the user can share however they want, and a snippet of HTML code if the user would like to embed the captured website. Pocket offers a range of different possibilities: copying a link, or sharing within the service or outside it. Within the service, it is possible to share with a friend or to recommend the content publicly. In terms of social media sites, it supports Facebook, Twitter, LinkedIn, Reddit, Tumblr and Buffer. Finally, The Internet Archive has two share buttons, Facebook and Twitter, and any archived site is also available through its URL, which the user can choose to share.

As mentioned in section 2.3.2, it can be difficult to convince users to use a P2P system, as they have to give up storage space and bandwidth to participate. A system which is only intended for personal use might suffer from this, because the users are not incentivised to collaborate with other users. However, implementing sharing as a core function, and facilitating social use of the system, can act as encouragement for the users to share resources as well as content. One example of this is BitTorrent, which is a system that depends on its users' participation in order to function. BitTorrent, as mentioned in section 2.2.4, separates its users into two groups: seeders and leechers. BitTorrent users are incentivised both by the punishment they receive for not contributing, and by the sense of community. It is obvious that a file will not be available if nobody aids in the seeding of it, and so users are willing to use their bandwidth and storage space to keep the system usable.

P2P As the end goal of this project is to create a P2P system, this category was included in the comparison to examine whether or not this is a common system structure for Internet archiving systems. As it is, only one of the systems we have chosen to examine is a P2P system, namely ipwb.

One definite advantage of making a decentralised system is the lack of a central server, as discussed in section 2.2.1, and this also goes for Internet archiving systems. As long as the control and storage of the system are centralised and beyond the users' control, there is no guarantee that the files will remain available. Creating a distributed and decentralised Internet archive can make it possible to ensure a high degree of file availability, without leaving the users dependent on one central server, or on their own ability to create and store safety copies. Another aspect that has already been addressed in section 2.3.2 is that creating the archiving system as a P2P system can eliminate the need to fund storage space, as long as the users are willing to use the system.

User Friendliness This category will examine the user friendliness of the Internet archiving systems in almost the same way as for the P2P systems. This means that the systems' UIs will be examined, to determine whether or not they are user friendly with respect to our target group. The Internet archiving systems are available either through a GUI or a command-line interface (CLI). As several of them have both an online service and an optional desktop or mobile application, we will primarily examine the online service. Like last time, the UIs will be examined in relation to the design principles from [25], specifically clarity, consistency, simplicity, and responsiveness. Last time, responsiveness was moved into a category of its own, as the functionality of the systems was exempt from the comparison. This time, however, the functionality of the systems bears an important resemblance to our own project, and will therefore be briefly examined, along with the responsiveness.

ArchiveBox and ipwb are both dependent on a CLI. A command-line interface typically requires the user to have previous knowledge of using CLIs, it offers little to no visual cues for how to use it, and it may have difficult syntax [25, p. 14]. The result is that these systems may be frustrating to use, and may even discourage inexperienced users from ever using them. These systems, therefore, are categorised as not user friendly.

Figure 2.8: ArchiveBox file overview

ArchiveBox ArchiveBox does, however, have a UI that lets the user browse their archived sites, as seen in figure 2.8, which will be briefly discussed according to the principles. The functionality of this GUI is rather limited, and so both its clarity and its simplicity are quite good. It also uses descriptive words that emphasise what the various parts of the user's archived sites are. In terms of consistency, the main colour scheme is consistent, with a black background and red header. However, there are both yellow and green links, with no real distinction between them, even if the different colours suggest that there should be one, which makes the GUI somewhat inconsistent.

Webrecorder The Webrecorder landing page, as shown in figure 2.9, is what greets the user when they log in to the service. The most prominent feature is the New Capture option, which allows the user to archive a new site. Each field in this form has a descriptive name or placeholder text to guide the user.

Figure 2.9: Webrecorder landing page

The clarity of this design is good, but the landing page also contains some unclear elements. Below the New Capture option, the user's collections are displayed. Clicking the Default Collection headline leads to the page shown in figure 2.10, which does not show any of the sites that have been archived, neither in the "Overview" nor the "Browse All" tab. However, the Manage Collection button leads to the page shown in figure 2.11, which does show the archived sites. Here, the intention of the words is unclear, as the user might assume that the archived sites are located under the collection overview.

The design uses the same colours and only one font, which makes it visually consistent. It also uses the same header throughout the system, which adds a sense of uniformity. Due to the confusing wording on the landing page, the Webrecorder design is not as simple as it could have been. Archived sites do not appear on the collection overview page unless they have been added to a List, which is an option on the Manage Collection page. Another inconsistency in the design is the words "capture" and "session". The New Capture feature and the New Session button that is visible under each collection both do the same thing: archive a website from its URL. However, if the user browses the website as it is captured, all the links they visit will also be captured in the same session. This functionality may be very helpful, but unless you read the user guide, it is difficult to understand that this is happening, and the two different names can add to the confusion.

In terms of responsiveness, Webrecorder does well. Archiving a site takes about one second, the site shows the progress as it goes, and the user can start browsing the archived site before it is fully captured. If the user clicks any links on the site, the new URL will also be captured. Opening an archived file takes the same amount of time as capturing it, around one second.

Figure 2.10: Webrecorder collection page

Overall, the Webrecorder UI is reasonably user friendly. The design is consistent in its use of colours and fonts, and it responds quickly to user input. While the form that lets the user capture a new site is clear and easy to use, there are some misleading links that can make it difficult to locate the archived sites.

Pocket The Pocket landing page, shown in figure 2.12, mainly consists of the user's archived sites, in addition to three prominent menus. The menus are clearly separated by location and appearance, which makes it apparent that they serve separate purposes. The language used is straightforward and consistent, and guides the user to understand the possible actions. One aspect of the Pocket service that may negatively affect its clarity is the prominent use of icons. This can be seen in the top right of figure 2.12, and in figure 2.13, which shows the possible actions for an opened article. For experienced users of the Internet, the meaning of these icons may seem obvious, but this is not necessarily always the case. These icons are based on conventions, like using a star for favouriting and a trashcan for deleting, but they are also dependent on the users knowing these conventions. Pocket solves this by using the same icons in the menu on the left-hand side of the landing page, which ties the icons to concrete actions, as well as by showing text on hover for the icons that do not have accompanying text.

The Pocket design has three main bright colours: pink, teal and orange. These are used sparingly and consistently to highlight menu elements and buttons, which gives the design a simple, yet aesthetic feel. While the header bar changes between the landing page and the article view, as seen in figure 2.14, it stays in the same place and adds a feeling of continuity within the service. The design also uses only one font, which adds to the consistency. Any user actions also perform as expected, and yield the same result each time.

Figure 2.11: Webrecorder manage collection page

Figure 2.12: Pocket landing page

As mentioned, the most prominent feature on the landing page is the archived sites, which makes sense, as this is what the user is there for. There are no elements that compete with the user’s attention, and the menus are placed at the top and left side of the page, which creates a clear hierarchy. Pocket also has a browser extension one of its most common actions, which allows the user to archive the site they are currently browsing with one click. This means that the user does not have to copy the URL of the site, and only needs to open the Pocket service when they intend to revisit the sites they have archived. There is an option to archive sites within the service, but this has been moved to the top-right menu and requires two clicks and an URL to complete. One thing that could potentially cause confusion in the Pocket design is the difference between archiving and deleting a file. The icons for these two are very similar, and the difference

Figure 2.13: Pocket action menu for article

Figure 2.14: Article saved to Pocket, viewed in web app

Archiving a site to Pocket is very fast and takes only a few milliseconds, both when archiving manually and when using the browser extension. Opening an archived site takes a little longer, but still less than one second. The Pocket UI is, according to our chosen metrics, user friendly. Its design is consistent, simple and performant. Emphasis is placed on the archived sites in the UI, and archival is made easy through a browser extension. The extensive use of icons may seem confusing to some users, but the icons follow conventions and have explanatory text on hover if needed.

The Internet Archive

For The Internet Archive, we will examine both the landing page of the archive itself and the landing page of the Wayback Machine. The archive's landing page, as shown in figure 2.15, has multiple menus, search fields and other elements. The result is that the page appears cluttered, and it can be difficult to know where to look for what you are interested in. Because it has multiple menus, and multiple sections below each other, the Internet Archive landing page has a somewhat confusing hierarchy. The same things that affect the site's clarity also have an impact on its simplicity. The Wayback Machine search bar is prominently featured early in the site hierarchy and has a helpful headline that informs the user of its function. However, for inexperienced users, it may not be obvious what the difference between the Wayback Machine and the Internet Archive is, or if there even is one. One downside to the functionality of the Wayback Machine is that it can be difficult to find one's way back to a particular instance of an archived site without its exact URL.

Figure 2.15: Internet Archive landing page

The Internet Archive uses nearly every colour, but it does so systematically. Each colour represents one of the six main categories of content that the archive provides. For example, web is yellow and books are orange. The colours used in the top collections that are visible at the bottom of figure 2.15 are not related to these. The Wayback Machine, as seen in figure 2.16, has a different colour scheme. Aside from the logos of the archive and the Wayback Machine, only one font, in different styles, is used on the site. In terms of consistency, there are some good elements, like the categorising by colour and the font use, but the rest of the colours do not necessarily match up with this, and so the design can appear a bit lacklustre. An interesting feature that may not be obvious at first sight is that the top menu bar does not navigate to different sub-sites of the Internet Archive; instead, it changes the content of the dark grey field that in figure 2.15 shows the Wayback Machine search field. While this is perhaps unexpected behaviour from a site menu, it does not disrupt the use of the site, and can, in fact, make it easier to browse the contents of the site without having to move to a new sub-site each time. Capturing a site with the Wayback Machine takes a few seconds, and opening a captured site is even faster. According to our metrics, the Internet Archive UI is somewhat user friendly, but has some aspects that affect its clarity, simplicity and consistency. However, it is still possible, and fairly easy, to navigate. Finding the archived sites can be a bit tricky, but seeing as the Wayback Machine is only part of the archive's large variety of archived content, it is understandable that this is not the main feature of the site.

Figure 2.16: Wayback Machine landing page

Summary of Taxonomy

Like last time, this section has compared five different Internet archiving systems according to five categories that tie directly into this project. Table 2.2 shows the taxonomy with the same symbols as last time. As this section has looked at systems that are similar to our own in terms of functionality, it has addressed categories that are more directly related to the functionality of the systems than those used for the P2P systems. More specifically, it has looked at the balance between a personal and a collaborative system, how local saving can impact the system and its users, what problems a P2P structure can solve, and user friendliness. One important discovery is that the majority of the systems in the comparison have some kind of sharing function implemented. Even if this is not an integral part of the system's functionality, it is still present.

                    Webrecorder  ArchiveBox  Pocket  The Internet Archive  ipwb
Personal                 X           X          X              X             X
Local saving             X           X          ∼              X             ∼
Sharing                  X           X          X              X             X
P2P                      X           X          X              X             X
User friendliness        X           X          X              ∼             X

Table 2.2: Comparison of Internet archiving systems

One fact is made evident from the systems that have been examined in this chapter; there is no one system that is both decentralised, made for personal and collaborative use, and user friendly. Webrecorder and Pocket, in particular, resemble the system in this project, but both are centralised

systems. However, as they have much of the same functionality from a user perspective, they will be sources of inspiration in the rest of the project, particularly when it comes to the GUI.

2.3.4 Summary of Internet Archiving Systems

As the prototype in this project is to be an Internet archiving system, this section has explored this type of system in detail. It started off by looking at the initiatives for Internet archiving that are prevalent both through organisations like The Internet Archive, national initiatives like Nettarkivet, and smaller systems intended for personal use like ArchiveBox. Then, the section went on to outline three central issues that should be addressed when making an Internet archiving system, namely tampering, funding and licensing. Finally, five different Internet archiving systems were compared according to five categories that are relevant to this project and will be important when we outline the core issues of our system.

2.4 Sharing in a Cultural Context

The discussion on Internet archiving systems is, as stated in section 2.3.1, rooted in the cultural importance of the Internet. As an extension of this, this section will briefly address the cultural context of file-sharing systems, to place this project in a context that illustrates the need for, and importance of, such a system. It will focus on two types of file-sharing systems: P2P torrenting systems and social media.

2.4.1 Pirating

One of the most widespread uses of P2P file-sharing technology is the distribution of media like songs, movies and TV shows through file-sharing services like Napster, Gnutella and BitTorrent [53]. While the content shared on these services is not necessarily illegal, this is often the case, which has led to multiple lawsuits against the services. This is also the reason why the use of such systems is commonly called "pirating". The history of peer-to-peer file-sharing started with Napster in 1999. This service became very popular before it was shut down in 2001 [54, p. 54][55, 56] after a long period of legal issues and outrage from the music industry. In the wake of Napster's rise and fall, multiple similar services emerged, many of which are still up and running today. Despite a huge number of lawsuits against services like BitTorrent [57], these peer-to-peer systems are seemingly impossible to take down. Their decentralised nature makes it hard to pin down a single party to blame, and now, 20 years after Napster's inception, pirating is still a widely popular use of peer-to-peer systems. It is evident through their immense popularity that pirating services play an important part in our digital culture. Through these services, consumers of digital media have created huge collaborative communities for

sharing and distribution of files, with very little central governance. Uricchio [58] relates these communities to other, similar online communities for news sharing and open-source software. He argues that participation in these communities entails a form of cultural citizenship that has the potential to rival the traditional political citizenship outside of the Internet. The impact of these communities is, he claims, that they will change the way we think about culture and consume media.

2.4.2 Social Media

While torrenting systems are perhaps the best example of popular and widely used peer-to-peer file-sharing systems, they pale in comparison with social media when it comes to visibility in our society and culture as a sharing platform. Social media like Twitter and Facebook have huge numbers of daily active users [59, 60], and a vast amount of content is shared daily through these channels. A study from 2011 [61] defines sharing as one of the fundamental building blocks of social media and emphasises that people on social media are connected by the shared object, which differs depending on the platform on which it is shared. For example, on YouTube, the users are connected by shared videos, and on LinkedIn, they are connected by shared career updates. The cultural significance of social media and sharing is undeniable and has been a field of study for years [62–64]. One thing is certain: a lot of people participate in social media networks, and consequently in the sharing culture that these networks cultivate. This highlights the importance of sharing in an online network and may explain why almost every single Internet archiving system in the comparison in the last section has a sharing function. An interesting takeaway from this is the feeling of community that sharing facilitates in a social network.

2.4.3 Summary of Sharing in a Cultural Context

As these two kinds of sharing platforms illustrate, there is a prevalent culture of online sharing in our society. Whether it is the illegal distribution and downloading of music, sharing pictures of cats on Instagram, or sharing news that you deem important, it is impossible to deny that sharing plays an important role in our day-to-day lives, especially online.

2.5 Summary of Background

This chapter has examined three different topics that are essential to understand where a system such as the one proposed in this thesis comes from. First, P2P systems were examined through an outline of their characteristics before moving on to a comparison of various types of P2P systems. By creating a decentralised and distributed system, it is possible to combat the weaknesses of centralised systems, such as having a single point of failure. P2P systems are also commonly used as file systems, and

more specifically, file-sharing systems. This is because they allow users to communicate and transfer data directly between each other, which cuts out the need for a central server. It is also possible to make P2P systems performant through the use of DHTs, as examined. Internet archiving systems were also examined, by looking at their history and three issues that need to be addressed in any system that seeks to archive the web. As with P2P systems, we also compared several Internet archiving systems to see what functionality is already offered, and what they may be lacking that our proposed system can address. Most of the systems in the comparison were centralised systems, which means that the functionality of the system, and therefore the users, are dependent on central servers. For some of the systems, Internet archiving is a matter of cultural interest, while others are personalised solutions that allow the user to archive what matters the most to them. What most of them have in common is that they allow the user to share whatever they save on a multitude of platforms. Finally, we examined these types of systems in a cultural context. One of the main arguments for Internet archiving on a large scale is that the Internet has become an integral part of our history and culture, and that this makes it necessary to preserve the huge amounts of information that are available on the Internet. The final section of this chapter looked at sharing as an important aspect of our social lives in the digital age. These points anchor the project in the current cultural context and show the relevance of such a system today.

Part II

Project

Chapter 3

Analysis

Before moving on to the actual design and implementation of the system, we will tie the theoretical discussion and comparisons of the previous chapter directly to our project, through an analysis of the problem and central issues. This chapter also contains an overview of the proposed system, including core elements and evaluation criteria.

3.1 Problem

The creation of this system, like the Internet archiving systems before it, is a direct response to the temporary and changeable nature of hyperlinks and websites. The goal of this project is to address this by adding to the existing tools that allow the user to save and share content from websites, with a focus on file availability, user friendliness and sharing. The system proposed in this thesis bears a strong resemblance to some of the Internet archiving systems that we examined in the previous chapter, Webrecorder and Pocket in particular. Its main difference from these two systems is that it is a P2P system, and it seeks to utilise the aspects of existing P2P systems that make them reliable and robust in the creation of a file system. It also differs from the one system in our comparison that spans both categories, ipwb, in that it is intended for personal use and aims to be more user friendly. The following section will examine the four core issues in more detail.

3.1.1 Core Issues

Most of the categories explored in the previous chapter can be divided into four overarching themes: centralisation versus decentralisation, user friendliness, personalisation and sharing. As these four will be the foundation on which we build our system, we will address each of them in detail, and suggest design and implementation choices that directly address the issue.

Decentralisation

The problem with centralised solutions like Webrecorder and Pocket is that they are dependent on central servers, and the user has no control over the files themselves unless they choose to download them to their own computer. Local saving is also what ArchiveBox is based on, which certainly offers the user more control. If needed, the user can also save copies of the file on hard drives or other computers to decrease the chance that the file is somehow lost. The downside to this is that the user has to save multiple copies of the file manually and take responsibility for the safety of those copies. An important aspect of file archiving, then, is the ability to ensure that the information is available whenever, and wherever, it is needed. Accessing a file in a network where the nodes used to save said file are highly unreliable is not easy. The users of the system provide both the network and the storage facilities and are connected to the system with a wide variety of machines, in terms of availability, storage space and computational power. By using an archiving system that automatically generates copies of a file, and maintains a certain number of copies distributed across a network of machines, users can be sure that the content they save will not disappear over time. To make this work in a real system, it is also necessary to keep an updated record of any replicas that are dynamically created as machines come and go. File availability is tightly linked with machine availability, and while many of the same problems and solutions can apply to both of these issues, it is appropriate to separate the two, as they need to be approached in different, though interlinked, ways. As with file availability, it is the unreliable nature of the network nodes that can cause failures in the system, and it is important to consider this unreliability an integral part of the system. Using a DHT to organise the system is one way to ensure that machines leaving and joining the network are handled appropriately. Our system will be decentralised to tackle the issues of centralised systems and will use a DHT to keep track of active peers. It will also duplicate any file that is archived in the system across several peers to ensure that it remains available.

User friendliness

The intention is for the system to be used by anyone who has an interest in consuming and sharing information from websites. For this to be possible, the system must be user friendly and understandable for someone who does not have a Computer Science background. The target audience is, therefore, people who have experience using computers and existing bookmarking systems, and who frequently use the Internet to find information and read the news. While it is expected that the users have experience using the Internet and various digital applications, the application still needs to be simple and intuitive to use. To achieve this, it is important to take into consideration

how the user interface looks, and to pay particular attention to how it guides the user through the interactions. This can be done, for example, by limiting options and paying attention to the ratio and visual hierarchy of elements in the application [65]. It became evident through the examination of both P2P and Internet archiving systems' user friendliness that simple solutions, which make it clear what the options are without overwhelming the user, are better in terms of user friendliness. Drawing inspiration from these, as well as the metrics used in the discussion, can ensure that the GUI follows design guidelines that can be evaluated by actual users. Another important aspect of user friendliness is performance. As mentioned in section 2.2.4, good performance is necessary if users are to continue using a system. Users have short attention spans, and therefore do not have the patience to wait for a system that takes too long to perform their desired actions. In addition to adding structure to the system, using a DHT will also provide reliable message routing that can be performed in O(log n) steps. It is also important that any file duplication and upkeep does not slow the system down or hinder user interaction. Established design principles will be used in the design of the UI of the system, to ensure that the design is simple and does not overwhelm the user. We will also utilise a DHT to make sure that message routing can be performed quickly.

Personalisation

One of the categories we examined in the comparison of Internet archiving systems was whether they were intended for personal use or not. Our system is personal, which means that it does not seek to archive the entire Internet, but merely the parts that the individual users wish to save. Like Pocket, Webrecorder and ArchiveBox, the system will be aimed towards individual users. This means that each user has an archive where they can add and remove files as they wish. Nobody else should be able to access their archive unless they choose to share files with another user. The result of this is that users do not have to give up a lot of storage space, at least not beyond what is reasonable. If the system mainly aims to save articles containing text and a limited number of images, the files that are generated will generally be small. Both content addressing and encryption, as discussed in section 2.3.1, can ensure that the files in the system are only accessible to the people who archived them. Taking these kinds of measures can also make the system seem trustworthy to the users. The users should also be able to trust that the files they save will be duplicated and saved on other machines, which can provide safety and incentivise users to participate in the system.

Sharing

Sharing, as discussed in section 2.3.3, can act as a motivator for users to participate in the system by adding a social factor as well as personal gain

to the merits of the system. Adding a social factor to the system can reinforce the feeling that the users are part of a community, rather than a group of users who operate individually. This feeling can make the users more willing to participate in the network and give up their resources, because they will be reminded of the communal benefit of participating. Most of the Internet archiving systems we examined have some sort of sharing function, which shows that sharing is an integral part of the way we view and consume content on the Internet. As our system is not a social network intended for communication between end-users, it makes sense for the sharing function to be implemented similarly to the Internet archiving systems that already exist. This means that sharing will be made possible through a sharing button, which will be a central part of the action menu for a file.

3.1.2 Summary of Problem

In this section, we have looked at decentralisation, user friendliness, personalisation and sharing in relation to our system, and discussed how these may be reflected in the implementation. All of these categories are reflected in the systems from the previous chapter's comparison, but none have all four. In particular, the decentralised and distributed nature of a P2P system is what separates our system from the others, especially the ones that are very similar in functionality. By drawing inspiration from the existing systems, we have examined the four core issues in more detail and suggested design and implementation choices that may aid in the creation of our system.

3.2 Solution

The goal of this project is to create a distributed Internet archive that allows users of the World Wide Web to save persistent copies of the websites they visit and store these across a P2P network. The result of this project is a prototype for such an archiving and sharing system, as well as a theoretical discussion that ties this prototype to existing systems and research. Content, in this thesis, typically entails online articles containing text and images. This section briefly outlines the core functionality of the system and describes how this functionality should be evaluated following the implementation.

3.2.1 Required Functionality

To limit the scope of the project, we have chosen two aspects of the functionality that must be implemented to address all the core issues described in the previous section.

Save and Share Internet Content

By this, it is meant that the user should be able to save content from a website through the prototype and share this with others. This experience should be fast and user friendly, and the files should remain available even if multiple peers go down. This functionality mainly involves the system as it appears to the user, and will be represented in the finished prototype.

File Duplication Across a Network of Peers

File duplication is necessary to ensure file availability. This means that the prototype system should maintain a given number of copies of each file at all times, by keeping track of which peers are alive. The system will have to detect when peers leave and arrive to ensure that a file is duplicated as needed.

3.2.2 Evaluation

The evaluation of the system will happen in several ways. First and foremost, saving and sharing content should work as intended through a GUI, which can be tested through the prototype of the system. We have also chosen three metrics, which will evaluate the core functionality in various ways. These will be described in further detail below.

Reliability

As the system is going to be a file archiving and sharing system, an important factor is reliability. This means that a user should be able to access any file they wish, at any time. Focusing on this metric will ensure that the core functionality of the system works the way it should, meaning that users can access and share the files that they have archived. To evaluate whether the system is reliable, we will simulate the life cycle of the system by stress-testing it. This will entail killing and reviving peers randomly over time to check that the number of duplicated files stays the same, even in an unreliable environment. The results from this will be presented as a graph showing file availability over time.
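To make the idea concrete, the following is a minimal sketch of such a stress test in Node.js; it is an illustration of the approach, not the actual test code, and the peer count and replica parameter are assumed values.

// Hypothetical stress-test simulation (illustrative parameters, not the real test).
const TOTAL_PEERS = 50;
const REPLICAS = 20; // copies stored when the file is first archived

const alive = Array.from({ length: TOTAL_PEERS }, () => true);
const holdsCopy = Array.from({ length: TOTAL_PEERS }, (_, i) => i < REPLICAS);

for (let step = 0; step < 10; step++) {
  // Kill or revive one random peer per step.
  const i = Math.floor(Math.random() * TOTAL_PEERS);
  alive[i] = !alive[i];

  // Count replicas that are still reachable on live peers.
  const available = holdsCopy.filter((holds, j) => holds && alive[j]).length;
  console.log(`step ${step}: ${available} live replicas`);
}

A real test would additionally let the system's republishing mechanism create new copies on live peers, so that the replica count recovers between steps rather than only decaying.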

User Friendliness

User friendliness will be examined in two ways: by evaluating the performance of the system, and through an examination of the GUI. Performance will be addressed in detail in the next section. As mentioned in section 3.1.1, the design of the system will rely heavily on established design principles and the findings from the analysis of similar systems. Thus, the approach to visual design will primarily be based on experience and research. This approach takes less time and effort than design methods that involve a lot of user participation, which is a good fit for this project, as the user friendliness of the GUI is just one of several important metrics.

However, as this approach in practice cannot guarantee a user friendly result, it will be necessary to also get some user input. Therefore, to evaluate the user friendliness of the GUI, we will both conduct a small survey to gain user feedback and perform the same analysis on our system that we did for each of the systems compared in chapter 2. Understanding the users and their needs is an important part of any software development process [66]. In a large-scale software development process, this would typically entail the involvement of users before implementation, but considering the scope of this particular project, it was deemed sufficient to involve some users in the evaluation. This feedback could then be valuable in the further development of the prototype.

Performance

Performance is an important aspect of user friendliness and will be evaluated by measuring the time it takes to perform specific actions. Jakob Nielsen [4, p. 135] has studied human response times, and groups them into three categories: 0.1 seconds, 1 second and 10 seconds. Nielsen claims that as long as the response time is under 0.1 seconds, the user will feel like the system reacts instantly. 1 second marks the limit of the user's uninterrupted flow of thought: if the delay is 1 second, the user will notice it, but not lose focus. Anything above 1 second will likely cause the user's attention to divert, but as long as the response time is under 10 seconds, they will not lose their focus completely. For delays between 1 and 10 seconds, it is important to show the user that the system is working, for example with a message that it is loading. For longer delays, an estimated time remaining should also be shown. The bottom line is: the shorter the response time, the more interactive the system feels, which in turn is more likely to keep the user's attention. The evaluation of the response time will therefore mostly involve timing of user actions, but also of actions that the system performs in the background that could potentially slow down the rest of the system. The following are examples of actions, and of how to measure their response time (a minimal timing sketch follows the list):

• Archive content from a website, measured from click.

• Open a file that has been shared, measured from click.

• Duplicate a file across peers, measured from when the file is archived.
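As a minimal sketch of how such measurements could be taken, assuming nothing about the project's actual test code, a small Node.js helper can time any asynchronous action:

// Hypothetical timing helper; the action passed in is a stand-in
// for e.g. archiving a site or opening a shared file.
async function timeAction(label, action) {
  const start = process.hrtime.bigint();
  await action();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
}

// Example usage with a dummy 120 ms action:
timeAction('archive site', () => new Promise((resolve) => setTimeout(resolve, 120)));

Comparing the measured times against Nielsen's 0.1, 1 and 10 second thresholds then gives a direct verdict on each action.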

3.2.3 Summary of Solution

This section has given a brief overview of the core functionality of the system, and how this is to be evaluated once the prototype is finished. The prototype should be an application that allows the user to archive websites and share the archived files with other users. The files they archive should also be duplicated and maintained across a network of peers. Through an evaluation of the system that directly addresses its reliability, user

friendliness and performance, it will be possible to determine whether the system meets these goals.

3.3 Conclusion of Analysis

This chapter has tied the discussion from the previous chapter to our project, by presenting the problem it seeks to solve before moving on to the proposed solution. The creation of our project is tied to several issues that are all addressed in other systems, but not all together. Our goal is to draw inspiration from these existing systems to create a decentralised Internet archiving system that is reliable, user friendly and performant. The system should be designed for both personal and social use, in that the users can archive the websites that they want, and share these with their peers. By creating a decentralised P2P system that creates multiple duplicates of each archived file, users of the system should be confident that the content they save will not change or disappear over time.

Chapter 4

Design

This chapter contains a detailed description of the design of the system, from the system structure to the graphical user interface. The system will be a structured P2P system, using the Kademlia DHT to keep track of the system state and content routing. Any file that a user archives will be saved to their device, as well as duplicated throughout the network according to the Kademlia algorithm. The main user interface will be an in-browser file archive, which in time should be implemented as a browser extension. The chapter starts with a brief outline of the core functionality of the system. Then, an overview of the DHT that will be the basis for the peer network is given, before moving on to the files, discussing how the files are to be duplicated and maintained in the system. Following this, the visual design is shown through a wireframe and discussed according to design principles. Lastly, any trade-offs that have been made in the design process are summarised.

4.1 Core functionality

In the previous chapter, the core functionality was limited to two main categories, namely the saving and sharing of Internet content, and file duplication across a network of peers. These two categories can both be broken down even further into a set of specific actions that the system needs to be capable of performing. This section will give an overview of some specific actions that should be made possible through the implementation of the system.

4.1.1 Peer Communication

Once a peer is connected to the network, it should be able to communicate with other peers according to the Kademlia DHT. This peer communication will ultimately be used to save and fetch files.

4.1.2 Archiving Sites

It should be possible to archive sites, meaning that the user of the system should be able to provide a website URL and retain a permanent copy of this site. This copy should then be distributed in the network of peers according to the Kademlia algorithm.

4.1.3 Fetching Files From DHT

The users of the system should be able to fetch their files from the DHT. Fetching files is tightly connected with sharing, as these two essentially describe the same action: fetching a file from the DHT using its hash. Therefore, there may not be any difference between the implementation of these two in the back-end of the system.

4.1.4 Sharing

To share a file with a user, the key of the file is copied and shared with the person through another channel, such as a messaging service, a social media platform, or a written note. The user can then search for the file with its key, which allows the system to locate a copy of the file in the network and provide the user with this. This functionality will be reflected in the UI of the system, as the user needs to access a file's hash to be able to share it.

4.2 Structured Peer-to-peer System

Figure 4.1: Example system state

As discussed in section 2.2.4, it can be difficult to efficiently route messages in unstructured P2P systems. Our system relies on being able to quickly route messages and locate files to meet its requirements, which is why we have decided to make a structured P2P system using a DHT. This also makes it easier to use an algorithm for file duplication that stores files on nodes in a way that makes it possible to locate them using the DHT's routing algorithm. The system needs to be able to locate and maintain copies of

Figure 4.2: Example node state, using marked node from figure 4.1

files that the users archive, and cannot rely on file popularity like torrenting systems do, or on unreliable searches like Gnutella's.

4.2.1 Kademlia

The system will use the Kademlia [2, 3] DHT to organise nodes and route messages. Peers in a network that uses Kademlia are organised in a binary tree structure, where each leaf node of the tree represents a peer. Figure 4.1 shows an example of a system state. Each node in the network is assigned a unique 160-bit node ID. To ensure that routing is done in O(log n) steps, each node knows at least one node in each subtree of the whole tree of nodes. This makes it possible for the node to perform successive queries to get closer and closer to the target node. To store information in the system, Kademlia stores ⟨key, value⟩ pairs on nodes with IDs close to the key, where closeness is calculated using an XOR metric. The keys, like node IDs, are 160-bit identifiers. The XOR metric calculates the distance between two 160-bit identifiers as their bitwise exclusive or, interpreted as an integer. To route messages, each node contains a routing table, as shown in figure 4.2. This table is a binary tree that consists of several lists called k-buckets, one for each 0 ≤ i < 160. Each k-bucket is a list of routing addresses, containing the IP address, UDP port and node ID for nodes in the network whose distance from the node is between 2^i and 2^(i+1). The k-buckets are sorted by the time last seen, placing recently seen nodes at the tail and least recently seen nodes at the head. k is the maximum size of the buckets and is set as a system-wide parameter, chosen such that any k nodes are unlikely to fail close to each other in time. Each node starts with one k-bucket, and dynamically expands the table as needed. Kademlia also provides file duplication and upkeep, which will be outlined in section 4.3.2.
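As an illustration of the XOR metric (a sketch, not the Kademlia implementation used later in the prototype), the distance between two IDs can be computed by XORing them byte by byte and reading the result as an integer:

const a = Buffer.from('aa'.repeat(20), 'hex'); // two 20-byte (160-bit) example IDs
const b = Buffer.from('ab'.repeat(20), 'hex');

// Kademlia distance: bitwise XOR of the IDs, interpreted as an integer.
function xorDistance(idA, idB) {
  const out = Buffer.alloc(idA.length);
  for (let i = 0; i < idA.length; i++) out[i] = idA[i] ^ idB[i];
  return BigInt('0x' + out.toString('hex'));
}

console.log(xorDistance(a, b)); // a smaller value means closer in ID space

A useful property of this metric is that it is symmetric: the distance from a to b equals the distance from b to a, which lets nodes learn useful routing information from the queries they receive.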

4.2.2 Summary of Structure

To be able to efficiently route messages and keep track of where files are located in the system, we have decided to create a structured P2P system based on the Kademlia DHT. The peers in the system will then be organised in a binary tree structure. Information is stored in the system in key-value

pairs on multiple nodes, depending on the key's closeness to the node ID according to the XOR metric. Each node knows at least one other node in each subtree of the system binary tree, kept in a routing table consisting of lists called k-buckets.

4.3 Files

As the system at its core is a file archival system, the files themselves play an important part. This section will address a few design choices that have to do specifically with files. One of them, local saving, was among the categories discussed in section 2.3.3, and will now be related to our system. The remaining subsections will look at file duplication, monitoring and location according to the Kademlia DHT.

4.3.1 Local saving

One of the categories that were examined in the comparison of Internet archiving systems (section 2.3.3) was local saving. This category did not directly address one of the core issues, but rather represented an implementation choice. For our system, it is not really a choice: each user needs to save the files they archive themselves, as well as copies of other users' files, locally. In a P2P file system, this is necessarily the case, as there is no central server to depend on for storage. As files are replicated across several devices that do not belong to the owner of the file, it is important to ensure that other users cannot access or tamper with files that do not belong to them. To this end, we will use content addressing, meaning that the key for a given file will be calculated based on the contents of the file. This will be done by using a hash function to calculate a key that has a high probability of being unique. Any file that has been tampered with will produce a different hash, and no longer match the key. The system is a network that requires Internet access to function properly, and it will therefore be assumed that the users have Internet access while using it. However, offline access may be desirable to some users, because it allows them to access the archived files whenever they want. Offline access will also make retrieval of the files fast, as it is likely that the users will access the same files multiple times on the same device. Because of this, any file that is archived in the system will also be saved on the user's device by default.
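The principle behind content addressing can be illustrated in a few lines of Node.js; SHA-1 is shown here only because its 160-bit output matches Kademlia's ID length, and the hash function actually used by the prototype may differ:

const crypto = require('crypto');

// The key is derived from the file's contents, so two identical files
// always get the same key, and a tampered copy no longer matches it.
function contentKey(fileContents) {
  return crypto.createHash('sha1').update(fileContents).digest('hex');
}

console.log(contentKey('<html>an archived page</html>'));
console.log(contentKey('<html>a tampered page</html>')); // a completely different key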

4.3.2 File Duplication

As mentioned, the Kademlia DHT uses the XOR metric to store information in the system. To duplicate a file, the node will first find the k nodes closest to the key, before sending STORE messages to each of those nodes. This is done through the FIND_NODE and STORE operations. FIND_NODE uses a recursive algorithm to find the k closest nodes to a given node ID or key.

The number of file copies in the initial prototype will be determined by the example k parameter suggested by Maymounkov and Mazières [2], namely 20. Initial experimental results may lead to an adjustment of this parameter if it becomes apparent that more or fewer copies are needed. Maintenance of the key-value pairs in the network will be done according to the Kademlia algorithm. This is done by republishing the key-value pairs once an hour, to account for peers leaving and joining. If a node receives a STORE message for a given key, it will not republish the key-value pair within the next hour, to ensure that only one node republishes the pair every hour.
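A minimal sketch of this republishing rule, with a stand-in function in place of the real FIND_NODE and STORE operations, could look as follows:

const REPUBLISH_INTERVAL_MS = 60 * 60 * 1000; // one hour
const storedPairs = new Map();       // key -> value held by this node
const lastStoreReceived = new Map(); // key -> timestamp of last incoming STORE

// Called (by the network layer, not shown) when another node sends a STORE.
function onStoreReceived(key, value) {
  storedPairs.set(key, value);
  lastStoreReceived.set(key, Date.now());
}

// Stand-in for FIND_NODE followed by STORE to the k closest nodes.
function sendStoreToClosestNodes(key, value) {
  console.log(`republishing ${key}`);
}

function republish() {
  const now = Date.now();
  for (const [key, value] of storedPairs) {
    const seen = lastStoreReceived.get(key) || 0;
    // Skip pairs that received a STORE within the last hour, so that
    // only one node republishes a given pair each hour.
    if (now - seen >= REPUBLISH_INTERVAL_MS) sendStoreToClosestNodes(key, value);
  }
}

setInterval(republish, REPUBLISH_INTERVAL_MS);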

4.3.3 File Location

File location is done through the FIND_VALUE operation, which behaves much like FIND_NODE, except that it returns the stored value and terminates as soon as the value is found.

4.3.4 Summary of Files

This section has outlined the design choices that relate specifically to the files. Namely, the files should be saved locally on the user's machine, and for the file duplication to work as intended, it is also necessary to save other users' files. The files will be addressed by their contents by using a hash function, which can help prevent tampering. In the future, encrypting the files is also a measure that should be taken to this end. The files will be duplicated and located according to the Kademlia DHT, which uses the XOR metric to store files on nodes that have IDs close to the file ID in ID space.

4.4 Graphical User Interface

This section gives an overview of the visual design of the system, and which decisions have been made about the design along the way. It does this by presenting an initial low-fidelity prototype of the system in the form of a wireframe, which shows the layout of the system, as well as any colours and other visual elements that are part of the GUI. To make sure that the system is user friendly according to the design principles that we examined in the comparisons of existing systems, this section also addresses each of the principles in relation to our design. The user interface is an in-browser application for browsing and accessing the archived files. Figure 4.3 shows the first wireframe for the system, made using plain HTML and CSS. The wireframe shows the main page of the application, displaying the archived files in a grid. Each file is represented by a preview image, its title and original URL, and has an action menu in the bottom right corner. Aside from the files, the most notable part of the design is the header, which contains a menu and a form consisting of a text field and a button. The menu items are primarily

placeholders in this version of the design, and all apart from one have been removed in the final design.

Figure 4.3: First wireframe of system design

Of the existing systems that we examined, our design most closely resembles Pocket, and has been inspired by Pocket's grid display of the archived sites. One reason that this type of design was chosen is that it gives a very visual representation of the archived sites, and allows the user to quickly scan through their files using both the preview images and the file titles. This gives the user multiple mental hooks to see and recognise the files by [26, p. 260], as opposed to just using text, like Webrecorder and ArchiveBox. In addition to being easily scannable, a grid like this adds ample white space in between the files, which increases readability [25, pp. 751–752]. The following sections look at clarity, consistency and simplicity in turn. These principles, and any design choices that have been made along the way, will be explained and tied directly to the wireframe.

4.4.1 Clarity

The prototype will have very limited functionality, and so there are no extensive menus or vocabulary that the users need to understand to be able to use it. Because of this, the clarity is likely to be good, as there are few causes for confusion. However, there are two aspects of the design that should be addressed concerning clarity, namely the use of icons and the vocabulary. Like Pocket, we use some icons in our design, but we have decided to keep these to a minimum, so as not to confuse the user. To keep the intention of these icons clear, we have chosen icons that are used widely across the Internet. The icons were fetched from Google's Material Design icons 1, which are part of a collection of user interface components based

1https://material.io/resources/icons/

on Google's Material Design guidelines 2. The icons themselves, therefore, are based on established norms and researched design principles. In the initial design, we used a download icon, and the trashcan icon to delete a file. However, in the final design, the download icon was swapped for the network icon to represent sharing. Another aspect of clarity is the vocabulary that is used in the system. There should be no doubt as to what the various text elements refer to, and this vocabulary should also be consistent throughout the design. Once again, our system has the advantage of being a limited prototype, as this means that each piece of application-specific vocabulary only appears once or twice in the GUI. There are, however, two important vocabulary choices that have been made. One of these was changed between the wireframe and the final design, namely what to call the user files. In the wireframe, they are called "files", and in the final design they are called "sites". This change was made to separate the archived sites from files as the user knows them from their computer's file system. "Sites" was chosen because it describes what the files are, while still using vocabulary that ought to be known to the users. The second vocabulary choice concerns what to call the hash value that is used to share a file. Ultimately, "Share ID" was chosen, as this term is descriptive and uses the word "share", which should be familiar to most users. However, like all design choices, the vocabulary should be examined through user testing and may be subject to change in the future.

4.4.2 Consistency

To achieve consistency, the GUI contains a limited number of colours and fonts throughout the design. Galitz [25] argues that one should be consistent with colour use and that the number of colours should be limited to four or five at most. He also emphasises that one should design for monochrome first, meaning that the design should not rely upon colours to make sense. Figure 4.4 shows which colours have been used in the design. We have mainly chosen to use grayscale, but added one vibrant colour to highlight elements such as menu items and buttons. This aids in drawing attention to these elements, and can help the user get oriented. Inspired by most websites, the Internet archiving systems that were examined, and the recommendation in [25], we have decided to have a white background and black text. A dark grey colour is used for the menu header to highlight it, marking it as something separate from the main portion of the page. The header should also look the same throughout the application, even on different pages, to create a feeling of uniformity and therefore consistency. The same grey colour is used for the original site URL to lessen its visibility, in order to highlight the title link, which should open the file in the app. A light grey colour is also used for borders on the top and bottom of each file element. We use only one font throughout the design. Using too many fonts in a design can cause it to appear confusing and inconsistent, just like using too many colours will.

2https://material.io/

Figure 4.4: Colours used

To differentiate between the different text elements of the system, we have used different font sizes, colours and weights, which help structure the design without appearing inconsistent [25, p. 164].

4.4.3 Simplicity

The files will be the most prominent part of the in-browser archive design, as this part of the system is mainly intended for revisiting and sharing archived sites. The files take up most of the screen space and are displayed in a grid, with three files in each row. Along with a preview image of the saved site, each file element also shows the title, URL and user menu for the file. This way, no unnecessary information is displayed, allowing the user to navigate their files without having to process anything superfluous. The site does not have any overarching menus, as the prototype only displays the user's files. The user menu for each file only consists of the two actions that are necessary to comply with the system design. This menu appears simple on account of it having few elements, and because the intention of each element is clear, as discussed in section 4.4.1.

4.4.4 Summary of Graphical User Interface

This section has given a short presentation of a low-fidelity wireframe of the GUI and discussed the various design choices in relation to the design principles that were examined in the comparison of existing systems. The GUI is split into two main parts: a header and a grid view of the user's files. To achieve clarity, consistency and simplicity, it was decided to use a limited set of icons, colours, fonts and menus, and to use ample white space in the design. The files themselves are presented with a mix of graphical and textual elements, which gives the user multiple frames of reference when browsing their archive. Because the prototype's functionality is very limited compared to other similar systems, it is important to keep in mind that it may be easier to achieve the goals set by the design principles.

4.5 Trade-offs

This section outlines the major trade-offs that have been made in the design process of the system. The decision to create a structured P2P system was made to be able to reliably and efficiently locate archived files. This means that the system continually needs to keep track of available peers, and of how many copies of a file are available. Systems that use a DHT spend more time and effort on this upkeep than systems with alternative solutions, which may affect the response time of the system. However, in a system where it is necessary to keep track of all files, and where efficient file location is important, the benefits of using a DHT outweigh the time it takes to maintain it. Additional measures, such as local saving of files, can also make up for the overhead created by the system upkeep. Using a centralised server to keep track of files could have been another way to locate files efficiently, but this would have created exactly the vulnerability that we seek to avoid by creating a P2P system. In the same vein, it would have been possible to create a centralised system for Internet archiving, but we have chosen to create a decentralised and distributed system. The vulnerable nature of the client-server structure of websites is one of the reasons that we are creating a web archiving service in the first place. Offering a solution that uses a more robust architecture for file archiving has been one of the driving forces of this project. Besides, multiple centralised solutions already exist that offer the same, or mostly the same, functionality as our system, as seen in section 2.3.3. One challenge when creating a P2P system is that it demands more from the users, as they are expected to provide storage and networking resources instead of just passively using them, as they would with a centralised solution. To many users, a system that has a central server may seem like the easier solution, as they do not have to worry about giving up their own machine's storage space. For a P2P system, as previously mentioned, it is therefore important to incentivise the users to make this sacrifice for the benefit of every user in the system. Sharing is, therefore, an integral part of our system design.

4.6 Summary of Design

In short, the core functionality of the system is archiving sites, storing and fetching the archived sites from the DHT, and sharing the archived sites using the hash value. The system will be a structured P2P system, using the Kademlia DHT for system structure and content routing. The files themselves will be saved locally on the users' devices, and in the network in key-value pairs on nodes whose IDs are close to the file key according to the XOR metric. The system will use content addressing, meaning that the key for a file will be a hash value calculated from its content, which can help prevent tampering. The GUI will be an in-browser application that allows the user to view

their archived files, and archive new sites. The visual design, as shown through a wireframe, displays the user's files in a grid, and has a header with a form field that lets the user archive a site, using the site's URL. Three design principles, clarity, consistency and simplicity, acted as a guide for the visual design and led to the final design having a limited number of icons, colours, fonts and menus. Achieving these goals was also made easier by the fact that the system has limited functionality at this stage.

Chapter 5

Implementation

This chapter gives a detailed description of the implementation of the system, including its overall structure, the code and the visual design. The prototype was implemented with Node.js and React, with a clear two-tier architecture. Node.js and its Express library were used to create the back-end and an API for the front-end to communicate with, and the back-end handles everything that involves files and the peer network. The front-end is a React app that allows the user to interact with the system in the browser, by displaying the user's files and making it possible to archive new sites. First, this chapter gives a brief explanation of the system structure, before giving a breakdown of how to run the application, either with the front-end app or as a CLI. Following this, there is a detailed description of both the back-end and the front-end, including an overview of key files for both. The section detailing the back-end looks at the peer handling and file handling in turn, whereas the section on the front-end explains the functionality and then the visual design of the React app.

5.1 Two-tier Architecture

In terms of structure, the system has a two-tier architecture, giving it a clear separation between the back-end and front-end. The back-end takes care of routing, duplication of files and sharing, while the front-end is an in-browser app that lets the user interact with the system. One advantage of a two-tier architecture is that it makes the code more modular, and therefore easier to write, maintain and test. Modular code, where each component has a limited set of responsibilities, or even just one responsibility at function level, ensures that no part of the code gets overly complicated. It should be possible to add a visual feature to the front-end application without having to change the back-end code, and vice versa. This way, the front-end app does not have to perform any file or DHT operations, and the back-end app does not have to react to user input. The front-end communicates with the back-end through an API with a limited set of functions. Another advantage of separating the front-end and back-end is that it

makes it much easier to scale the system to other uses. The same back-end can be moved to remote servers, or be used to create a mobile application with a different front-end. In the same vein, this separation also makes it easier to test the system, as it is possible to use and test all the functionality with automated tests through the back-end, without having to go through the front-end.
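As an illustration of this separation, the following is a hedged sketch of how a front-end could call such an API; the /saveFile path and the response shape are assumptions for illustration, not necessarily the project's actual endpoints:

// Hypothetical front-end call to the back-end API.
async function archiveSite(url) {
  const response = await fetch('/saveFile', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url }), // the back-end captures and stores the site
  });
  return response.json(); // a content object for the React app to display
}

The front-end only ever sees this small surface; everything involving files and the DHT stays behind it.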

5.2 Running the Application

To run the application, it is necessary to have Node.js 1 and the Yarn 2 package manager installed. Node.js is a runtime environment that makes it possible to run JavaScript code both on the client- and server-side of an application. A package manager, like Yarn, takes care of installing, updating and removing packages that are used in the code, and allows the user to create pre-defined scripts to run the application or any tests related to it. The project can be found at https://github.uio.no/tonjro/distarchive. The following bash commands will clone the project and install any dependencies:

$ git clone git@github.uio.no:tonjro/distarchive.git
$ cd distarchive/
$ yarn install-api && yarn install-client

There are two separate ways to run the system; one is to run the application that uses the front-end, and the other is to run a command-line interface (CLI). These will be described separately in the following sections. As the full application that includes the front-end is dependent on using specific localhost ports to connect the front-end and back-end, it was decided that it would be easier not to connect this application to other peers, but to create a separate interface to manually test the DHT and peer network.

5.2.1 Application

To run the full application, it is necessary to open two terminals and execute each of the following commands in a separate terminal:

$ yarn api

$ yarn client

The API terminal will display the HTTP status code for any API calls made by the client. The client will be available on http://localhost:3000/. Through this, it is possible to view and use the front-end application, where the user can save and view files as described in the previous chapter. While running the application in this manner creates and stores the files in a DHT, it is currently not possible to spawn any more peers and

1https://nodejs.org/en/
2https://yarnpkg.com/

connect them to the network. This mode, therefore, is primarily intended for the development and testing of the front-end.

5.2.2 CLI

The CLI is implemented as a command loop, where the user is presented with five choices and is allowed to interact with the system through the terminal. It is possible to run the CLI in two different ways: locally in multiple terminals on one machine, or in a network of machines. In either case, it is necessary to run the following command in one terminal or on one machine:

$ yarn cli -f

This is because the Kademlia implementation we have used, which will be described in section 5.3.2, does not support connecting to a network without knowing about at least one peer in the network. For a final version of the system, this would likely be solved with one or more bootstrap servers, like torrenting services use, that keep an updated record of peers through which new peers can connect to the network. As a temporary solution during the development of the prototype, the initial peer is always created using the same peer ID and port, so that other peers can easily connect to the network. To run the application locally, the remaining peers are automatically connected through the local IP address, and therefore it is only necessary to run the following command in as many terminals as desired:

$ yarn cli

However, if the CLI is executed in a network of remote machines, it is necessary to add the IP address, including the IP version and port, of the initial peer to the command. The IP address can be found in the log after running the initial peer, which will look something like this:

> peer started. listening on addresses:
> /ip4/127.0.0.1/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm
> /ip4/192.168.1.252/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm

Here, /ip4/192.168.1.252/tcp/10333 is the address needed to connect to the initial peer, and the following command can be run from any number of remote machines:

$ yarn cli --ip /ip4/192.168.1.252/tcp/10333

It is possible to use any of the peers already connected to the network to connect a new peer, but then both the address and the ID need to be provided:

$ yarn cli --ip /ip4/192.168.1.252/tcp/10333 --id QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm

The CLI makes it possible to ping other peers using their peer ID, store and locate files, and find the number of providers for a given file. Files

Files are saved using the URL of the website and located using the hash that is displayed once the file is saved. Upon closing, the program removes all local files and cleans up the database.

5.3 Back-end

The back-end was implemented with Node.js, using the Express 3 framework to create an API for the front-end to communicate with. Node.js was chosen because of the existing JavaScript implementation of Kademlia, which will be described in the next section, and because it pairs well with a JavaScript front-end. Node.js allows us to build both the back-end and front-end on a JavaScript foundation, which makes seamless communication between the various parts of the system easier. This section gives a detailed description of the back-end scripts. It starts with a brief overview of the file structure, before examining the peer handling and file handling in turn.

5.3.1 File Structure

The back-end code is located in the api/ folder. Most of these folders and files are essential to the back-end of our application and are described in detail below. Some are left out; these are files and folders that were automatically generated when initialising the Express app and have no significant role in our system specifically.

bin/ The file in this folder, www, is an executable that is used by Yarn to start the API. This file is also automatically generated by Express, but we have made one notable addition to it: when the API is started, a peer is created and a global peer variable is set, making the same peer accessible across the so-called routes described below.

public/ This folder contains everything that is accessible to the people connecting to the application, which in our case is only the individual user. The important folder here is images/, which contains all the images presented in the front-end app, both previews and full-size versions.

routes/ In Express, the routers 4 are what connect the front-end to the back-end. When the front-end communicates with the API, it uses one of these routes to make HTTP requests. In our system, the communication happens through one of four routes (a sketch of one such router is given after this overview):

delete.js This router removes a given file from local storage. The file is not removed from the DHT, as there is no option to do this in the libp2p API (see section 5.3.2).

3 https://expressjs.com/
4 https://expressjs.com/en/guide/routing.html

fetchFile.js Fetching a file means using the hash of a file to fetch it from the DHT. A JavaScript object named content (see section 5.3.3) is returned to the front-end app to be displayed.

index.js This router is called each time the front-end application is loaded, and fetches any existing files from the database.

saveFile.js Saving a file means using its URL to capture the site and save it to the database and the DHT. As in fetchFile.js, an object is returned to the front-end app.

src/ The source folder contains all the back-end code, and the database. All the files in this folder will be described in detail in sections 5.3.2 and 5.3.3, and their structure and functions are presented in figure 5.1. The following list gives a brief overview of the folders and files.

db/ The database, which consists of a JSON file.

fileHandler/ The FileHandler takes care of all file handling and communication with the database. All file handling functions are defined in the index.js file.

peerHandler/ The PeerHandler handles everything that involves the peers, including peer creation and communication between peers, as well as storing and fetching files in the DHT. All peer handling functions are defined in the index.js file, while the p2p/index.js file contains the module used to create peers (see section 5.3.2).

index.js In order to ensure that the routers and test files do not rely on using the FileHandler and PeerHandler directly, their functionality has been abstracted up one level into this general Handler. This makes the system more modular, as each of these scripts can evolve and change without the rest of the API having to be updated.

test/ This folder contains all test files, which are explained in chapter 6.

app.js This file ties the whole API together. It is also automatically generated, aside from the addition of the routers and a static folder that gives the front-end app access to the images.
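To illustrate how the routes connect to the rest of the back-end, the following is a hedged sketch of what a router like saveFile.js might look like; the endpoint path, the Handler method name and the response shape are assumptions based on the description above, not the thesis' exact code.

const express = require('express');
const router = express.Router();

// The general Handler described under src/ above (assumed import path).
const handler = require('../src');

// POST with a URL in the body: archive the site and return the
// resulting content object to the front-end.
router.post('/', async (req, res) => {
  try {
    const content = await handler.saveFile(req.body.url);
    res.json(content);
  } catch (err) {
    res.status(500).json({ error: 'Unable to save site' });
  }
});

module.exports = router;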

5.3.2 Peer Handling

The DHT was created using libp2p 5 [17], the same technology that IPFS is built on. libp2p offers an updated library for building P2P applications with JavaScript and Node.js, based on the Kademlia DHT. Using a pre-existing implementation of a DHT with a built-in content routing interface makes the implementation process easier and allows us to focus on the system as a whole.

5 https://github.com/libp2p/js-libp2p

Figure 5.1: Structure of back-end scripts

To be able to create libp2p peers, we have created a class, DistArchPeer. This class extends the Libp2p class, defines how to communicate with other peers, and enables the DHT. All handling of peers, as well as of the DHT, is implemented in the PeerHandler. Peers are created with the createPeer function. The first peer to be initialised is used as a bootstrap peer, and always has the same peer ID, as defined by the peer-id.json file in the peerHandler/ folder. This is because other peers need to know of at least one other peer in order to connect to the network, as previously mentioned. Any peer created after the initial peer connects to the network by dialling the initial peer to establish a connection, before performing a lookup for the peers closest to itself and dialling all of these. If the initial peer goes down, the network continues to run, and new peers can connect if they know the address and ID of an existing peer, as described in section 5.2.2.
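The following is a rough sketch of how such a peer might be created and connected, based on the js-libp2p API as it looked around 2020; the module choices and option names are assumptions, and the actual DistArchPeer class may differ in its details.

const Libp2p = require('libp2p');
const TCP = require('libp2p-tcp');
const MPLEX = require('libp2p-mplex');
const { NOISE } = require('libp2p-noise');
const KadDHT = require('libp2p-kad-dht');

async function createPeer(bootstrapAddr) {
  const peer = await Libp2p.create({
    addresses: { listen: ['/ip4/0.0.0.0/tcp/0'] },
    modules: {
      transport: [TCP],
      streamMuxer: [MPLEX],
      connEncryption: [NOISE],
      dht: KadDHT,
    },
    config: { dht: { enabled: true } }, // enable the Kademlia DHT
  });
  await peer.start();
  if (bootstrapAddr) {
    // Join the network by dialling the known bootstrap peer.
    await peer.dial(bootstrapAddr);
  }
  return peer;
}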

The main purpose of the peer handler is to store and fetch data from the DHT. The storeData function takes the file content and creates a hash, converts both of these to buffers, and stores them in the DHT as a key-value pair. The hash is returned from the function. To find data, the findData function is called with the file hash, and the buffer located in the DHT is returned. Additionally, the peer handler has a getNumberOfProviders function that finds the number of providers of a file, given its hash as a buffer. It is only possible to provide the key argument to this function as a buffer, which is why the hash is converted into a buffer when storing data. This function, as well as the ping function, is mostly intended for testing purposes. The peer handler also has a republish function that was added following the initial reliability tests. The purpose of this function is to republish key-value pairs every hour; it is discussed further in chapter 6.

The hash function is implemented in the peer handler using the Node.js Crypto library 6, which provides cryptographic features such as hashing functions. The hash is based on the content received from the file handler, but the peerId of the local peer is also added to ensure that the hashes are unique, even if two peers archive the same site at the exact same time. The hash is represented in the rest of the system as a hex string, which can be used to fetch content from the DHT.

In summary, the PeerHandler and the accompanying DistArchPeer module handle everything that has to do with the peer network and the DHT, using the libp2p library. All peers are automatically connected to the first bootstrap peer, before locating other peers. Besides the creation of peers, the peer handler has two main functions to put and get data from the DHT. It also has a ping function and, for testing purposes, a function to fetch the number of duplicates of a given file.
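A minimal sketch of the two main functions follows, assuming the content routing interface that js-libp2p exposed at the time (contentRouting.put/get) and a SHA-256 digest; the function names follow the description above, but the exact code may differ.

const crypto = require('crypto');

async function storeData(peer, content) {
  // Mix the local peer ID into the hash so that two peers archiving
  // the same site at the same time still produce unique keys.
  const hash = crypto
    .createHash('sha256')
    .update(JSON.stringify(content) + peer.peerId.toB58String())
    .digest('hex');
  // Both key and value must be buffers before they go into the DHT.
  await peer.contentRouting.put(
    Buffer.from(hash),
    Buffer.from(JSON.stringify(content))
  );
  return hash; // hex string used in the rest of the system
}

async function findData(peer, hash) {
  const value = await peer.contentRouting.get(Buffer.from(hash));
  return JSON.parse(value.toString());
}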

5.3.3 File Handling

The file handling consists mainly of fetching content from websites and saving content to the local database. It also offers options to fetch all local files and to convert a value fetched from the DHT into file data. The websites are captured using the Node.js Puppeteer 7 library. When the user provides a URL, Puppeteer launches a Chromium browser, visits the given website and captures a screenshot. Instead of attempting to create a browsable rendition of the site, it was decided to use a screenshot to represent the website, both to narrow the scope of the project and because this functionality does not directly tie into our main focus. The images are saved as PNG files.
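The capture step can be sketched as follows using Puppeteer's standard API; the file naming and the returned fields are illustrative assumptions based on the description above.

const puppeteer = require('puppeteer');

async function captureSite(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until the network is mostly idle so the page has rendered.
  await page.goto(url, { waitUntil: 'networkidle2' });
  const title = await page.title();
  // Timestamped file name, matching the paths seen in the content
  // object below (assumed convention).
  const path = `public/images/${new Date().toISOString().slice(0, 19)}.png`;
  await page.screenshot({ path });
  await browser.close();
  return { title, url, site: path };
}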

6 https://nodejs.org/api/crypto.html
7 https://github.com/puppeteer/puppeteer

Figure 5.2: Program flow when saving a new site to archive

Figure 5.2 shows the data flow when a user saves a new site to the archive. Based on the site URL, the system fetches the title and a screenshot of the site, and returns the title, the URL, and the file paths to the screenshot and a smaller preview image to the Handler. The Handler, in turn, sends the data, with each image encoded as a buffer, to the peer handler. The peer handler appends the peer ID to the data, calculates the hash and adds the file to the DHT. Finally, the hash is returned to the Handler, which adds it to the content before the information is sent to the front-end app via the router, where the preview image is fetched from the API and displayed along with the title and URL of the site. The content object that is finally returned to the front-end app looks like this:

{
  title: "Example title",
  url: "www.example.com",
  preview: "images/previews/2020-04-13T14:07:36.png",
  site: "images/2020-04-13T14:07:36.png",
  hash: "0e21ce51e269bd1923740cb85d56543a7cd5a1f28308ffd3b22c703",
}

As stated in chapter 4, any file that is saved by the user is stored locally on the user's device, in addition to in the DHT. In this small prototype version of the system, this is solved by saving data about the files to a JSON 8 file, and the site images to a local folder. Setting up a database was considered, but in light of the size and scope of the project, it was decided to go with a simpler solution that works well with small quantities of data.

8 https://www.json.org/json-en.html

The JSON file format is commonly used to save and distribute information on the web and is easy for humans to read. In a more extensive version of the system, this choice would need to be revisited. Additionally, in a more extensive version, the user should be able to remove some or all files from their device without deleting them from the system. It should also be possible to turn off the automatic local copy.

5.3.4 Summary of Back-end

The back-end was implemented in Node.js and functions as an API for the front-end to communicate with. In addition, the back-end contains the scripts that deal with peer and file handling. Peer handling and the DHT were implemented using libp2p, a pre-existing Node.js library that supports the creation of P2P systems. The peer handler takes care of creating peers and connecting them to the network, as well as storing and fetching content from the DHT. File handling involves both the capturing of websites and any communication with the local database. In order to create a uniform interface for communication with the API, a general Handler was created that mediates between the API and the peer and file handlers.

5.4 Front-end

The front-end of the system is a React 9 app that displays the user's files in the browser. React is a JavaScript library for building front-end applications that is particularly suited to creating single-page apps. A React app is made up of components that each have a state, which can be updated dynamically as the user uses the app. In our particular case, the app allows the user to interact with the system through a limited set of actions: a site can be archived using either its URL or its hash, and once a site is archived, the user can view the file, see its hash or delete the file. This section looks at the front-end app in detail and outlines both its functionality and its design. It starts by giving an overview of the files that make up the app, before moving on to a detailed description of the functionality and, finally, a presentation and explanation of the visual design.

5.4.1 File Structure

All the front-end code is located in the client/ folder. Unlike the back-end, most of the contents of this folder were automatically generated, and the only files that make up the main part of the front-end application are the following, all located in the src/ folder.

App.js This file contains all the functionality provided by the application, as well as the HTML code that is rendered. The App component that is exported from this file is responsible for all API calls, for displaying files and for handling user input.

9 https://reactjs.org/

components/ In addition to the main App component, there are three smaller components that are used to handle and display the files. These are all located in this folder.

CurrentFile.js This component is used to display a file that the user has chosen, which for this prototype version means displaying the file's title and URL, a user action menu and the full-size screenshot.

File.js This component is used to display the archived files on the main page of the application. For each file, this entails showing a preview image and, like the CurrentFile component, the file's title, URL and an action menu.

FileMenu.js Both the CurrentFile and File components have the same action menu, which allows the user to see and copy a file's ID or delete the file. This functionality is separated into its own component, which both of the other components use.

styles/style.css All the styling for the application is located in this file.

5.4.2 Functionality

The application has two different pages: the home page, shown in figure 5.3, and the display page, shown in figure 5.4. The former is the landing page for the app and displays the user's archived sites in a grid, showing, as mentioned, a preview image, file title, URL and action menu for each file. If the user clicks one of the files, by clicking either the title or the preview image, the display page opens. Here, the title, URL and action menu are prominently displayed at the top of the page, above the full-size screenshot. The header contains a link to the main page and a form field that allows the user to archive sites or fetch files from the DHT. Throughout this section, these components, and the functionality they make possible, are described in more detail.

As mentioned, a React app has a state that is continually updated as the user interacts with the system. Our app consists of one main component, the App, and three smaller components: File, CurrentFile and FileMenu. Each of these has a number of state variables that keep track of the data that is dynamically displayed in the app, such as the files, the text field and any error messages. These are updated whenever the user interacts with any element on the site, allowing the app to react accordingly.

Header

The header contains two main parts: a link to the home page and a form field where it is possible to archive a site through its URL or retrieve a file from the DHT with its ID. As can be seen in the first wireframe for the design in figure 4.3, the link to the home page was initially intended to be part of a menu, but as we limited the functionality of the system, the rest of the links were discarded, and the remaining link was made to redirect to the home page.

Figure 5.3: Front-end application home page

However, the underlying menu structure has been kept in the code. The most important part of the header is the form, consisting of a text field and a button. If the user writes anything in the text field, a state variable is updated on every change to the field. Clicking the button checks whether the given input is valid in the handleSubmit function, before either archiving the site or fetching the file. Both the archiving and fetching functions, saveURL and fetchFile respectively, perform an API call with the input, parse the response and check whether the action was successful. If the API call succeeded, the new file is saved and displayed; if not, an error message is displayed. The files are all saved as File components in an array that is displayed on the home page.
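As an illustration, the archiving call might look like the following hedged sketch; the endpoint path and the two callbacks are assumptions standing in for the component's state updates.

async function saveURL(url, onSuccess, onError) {
  try {
    const res = await fetch('/saveFile', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url }),
    });
    if (!res.ok) throw new Error('save failed');
    // The content object (title, url, preview, site, hash) is added
    // to the array of File components and displayed.
    onSuccess(await res.json());
  } catch (err) {
    onError('Unable to save site, please try again');
  }
}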

Home Page

The main portion of the home page is used to display the user's archived files in a grid. Each displayed file contains a preview image fetched from the API, the title of the file, which is also a link that opens the file, the original URL of the file, and a file menu. Clicking either the file title or image will open the display page, and clicking the original URL will open the original website in a new tab. When the user clicks the file title or image, the openFile function is triggered, which updates the current file, a state variable whose purpose is to keep track of the information about the selected file. In addition, the click changes the route of the page from the home page to the display page, which shows the current file.

Figure 5.4: Front-end application display page

File Menu

At the bottom of each file element (see figure 5.5), there is a file menu consisting of two icons. The share icon, on the left, triggers a pop-up right above the icon, as pictured in figure 5.6. Because the share ID is fairly long, it is displayed in a fixed-size, read-only input field, which allows the user to scroll to see the entire ID if needed. There is also a button that lets the user copy the whole ID to their clipboard. The delete icon, on the right, also triggers a pop-up, but this one covers the whole page, as shown in figure 5.8. If the user clicks the "Cancel" button, or anywhere in the greyed-out area outside of the pop-up, the pop-up closes. If they click the "Delete" button, the deleteFile function is triggered, which removes the file from the list of files and makes an asynchronous API call to remove the file from the database. Additionally, this function reloads the site. This was done to prevent problems with the display page, which will be explained below.

Figure 5.5: File menu
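A hedged sketch of the deleteFile flow described above; the endpoint and the state-updating callback are assumptions.

async function deleteFile(hash, files, setFiles) {
  // Remove the file from the displayed list immediately.
  setFiles(files.filter((file) => file.hash !== hash));
  // Ask the API to remove the local copy (the DHT copy remains).
  await fetch(`/delete/${hash}`, { method: 'DELETE' });
  // Reload to avoid a stale display page, as described above.
  window.location.reload();
}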

Display Page

The main purpose of the display page (figure 5.4) is to show the file, and therefore there is no extra functionality here that is not already available on the home page, aside from a link to return to the home page.

Figure 5.6: Share pop-up

Figure 5.7: Share pop-up, copied ID

There is a link to the original site, which opens in a new tab, and the same file menu as described above. If the user deletes the file while on this page, the site redirects to the home page, showing the remaining files. The user can still archive sites while on this page, and the header will show all the informational messages, but the page will not change until the user exits to the home page. If the user reloads the page, the site redirects to the home page. This was done because the state is reset when the site updates, which means that the current file variable would no longer be available.

5.4.3 Visual Design

The final visual design of the application closely resembles the wireframe presented in chapter 4 in look and feel. The colour scheme, layout and general components of the site are all the same, while features such as pop-ups and error messages were added during the development of the front-end app. This section addresses the various components in turn, and discusses any changes or additions to the design.

Header

As discussed in the previous section, one of the major changes to the header is the change from a menu to a single link to the home page. The design itself was, however, not changed, both in the interest of keeping the menu structure and because the remaining item serves well as an informational heading as well as a link. The thick underline was intended as an indication of which menu item was selected, but ended up serving to emphasise the heading. As a result of these changes, the header at the top of the display page has been removed in the final design. Otherwise, both the text field that makes it possible to add a URL and the save button have been enlarged and styled with rounder edges. The placeholder text inside the text field has been changed to "URL/Share ID to save" to inform the user that a site can be archived from a URL or by its share ID. The mention of the share ID is also intended to hint to the user what the share ID can be used for.

Figure 5.8: Delete pop-up

Figure 5.9: Loading icon

Status Messages

One of the more important additions to the visual design is the set of messages displayed to the user beside the header text field. These messages are intended to show the user that the system is working and to inform them of any successes and errors along the way. When the user archives a site, the system works for a few seconds, and therefore a loading animation, shown in figure 5.9, is displayed to indicate that the system is working. As mentioned in section 3.2.2, any action that takes more than 1 and less than 10 seconds to complete should be coupled with an indicator such as this, so that the user understands that the system is working and does not get distracted or frustrated. Once the site is archived successfully, a checkmark icon, shown in figure 5.10, appears in the same place as the loading animation to show that the archival was a success. This is coupled with the new file appearing in the grid and is only meant to act as an extra confirmation.

There are also multiple error messages that may appear. If the user provides something that is neither a URL nor a hash, "Invalid URL" (figure 5.11) is displayed. If the user provides a valid hex value that is not a valid file hash, "Site not found in the network" (figure 5.12) is displayed, and if they provide a hash that belongs to a file already in their archive, "Site already saved" is displayed. The final error message appears if the user provides a valid URL but the API is unable to archive it, in which case "Unable to save site, please try again" is displayed. With all of these messages, the intention was to make the text short but informative, telling the user what went wrong and, in some cases, prompting further action, such as asking them to try again.

Figure 5.10: Site archived icon

Figure 5.11: Invalid URL error message

Home Page

Each of the file elements in the grid on the home page is marked by an upper and lower border in a light grey colour. This border was added to clearly separate the files and to mark the boundaries of the grid elements. The white space between the title and URL of the file and its file menu makes room for longer titles, ensuring that the elements stay the same height regardless of the size of their contents. The border also helps to group the items in the element together [25, p. 140]. To highlight the difference between the title link and the URL link, the title is both larger and a different colour. Additionally, hovering over the original URL underlines the link, further separating it from the title and image links, which are only signified by the cursor changing.

File Menu

The design choices regarding the file menu have remained the same since the first wireframe, even if one of the icons has been swapped out. In the first wireframe, a download icon was used, but this has since been replaced with a share icon. The download icon was chosen both as a placeholder and under the assumption that more icons would be added later. This was, however, not the case, and as the functionality of the prototype was limited and the sharing function prioritised, this change was made. As described above, clicking the share icon triggers a pop-up, as displayed in figure 5.6. This type of pop-up was chosen because it provides a quick and easy way to display what the user needs without changing the environment they are interacting with [25, p. 414]. In the pop-up, it is specified that the hash is a share ID. Using the same word in the pop-up as in the text field in the header creates a connection between the two, and hints at what the intention of a share ID is and how it is used. As mentioned in the previous section, the share ID is displayed in a scrollable text field, because there is no need to read the entire ID in order to copy it. If the user clicks the "Copy" button, the button disappears and an informative text that reads "Copied!" is displayed, as shown in figure 5.7.

Figure 5.12: Site not found in network

This indicates to the user that the text has been copied and that they can move on. In a later version of the system, this pop-up would likely also contain shortcuts to social media, making it possible to share the ID in fewer steps.

Figure 5.13: Closeup of delete pop-up

Unlike the share pop-up, the delete pop-up (figure 5.8) covers the entire page and breaks the flow of the user journey to a much larger degree. This is intentional, and a common feature of delete operations across software and web development [25, p. 572], because of the destructive and often permanent nature of delete actions. Therefore, a confirmation message is displayed (figure 5.13), forcing the user to confirm that they want to delete the file. The delete button in the pop-up is also coloured red, a colour commonly used to warn the user, causing them to think twice before they click [25, pp. 706–707].

Display Page

The display page is fairly simple and does not show any information that is not available on the front page, aside from the larger screenshot. As a result, the design is also simple, with all the elements of the page centred and displayed on one line each. The only new element on this page is the arrow button that takes the user back to the front page. This was added at the very top of the page, making it easy to see and its meaning easy to grasp. The look and functionality of the file menu are exactly the same here as on the front page, which adds a level of consistency.

5.4.4 Summary of Front-end

The front-end was implemented using React and is an in-browser app that displays the user's files and allows the user to interact with the system, either to archive sites, or to share or delete existing files. Visually, the design has not changed much since the initial wireframe, but some new features have been added and minor changes have been made. The files on the front page are displayed in a grid, with a preview image, the title, the original URL and a file menu for each file.

Opening a file will open the display page, where the same information and a larger image of the site are displayed. One of the more notable changes in the visual design is the addition of indicators, such as a loading animation and error messages, that inform the user of what is happening in the system and let them know when actions succeed or fail. Functionality has also been added to the file menu, which uses two pop-ups to let the user either copy a file's share ID or delete the file.

5.5 Summary of Implementation

This chapter has outlined the implementation of the system, starting with the overall structure before moving on to a more detailed description of the code and the visual design. The goal was to give an overview of the system prototype, which was implemented with a two-tier architecture using Node.js and React. The final prototype is an application that runs a graphical user interface in the browser, allowing the user to interact with the system. Communication between the front-end and back-end happens through an API, which allows the back-end to perform any actions specified by the user. All peer and file handling is managed by the back-end, separated into two scripts in order to keep a clear separation between the different parts of the functionality. These are interconnected through a general Handler, creating one interface for communicating with the back-end that can be used both by the API and by the automated tests. The GUI is presented through the front-end app, whose responsibility is handling user input and updating accordingly. This entails both displaying the user's files with accompanying actions, such as sharing and deleting, and handling the site archival. Together, the back-end and front-end make up the final prototype, which allows the user to archive, view, share and delete sites as specified in chapter 4.

Chapter 6

Evaluation

In this chapter, both the core functionality and the evaluation metrics presented in section 3.2.2 are addressed through a variety of tests. The tests are separated into automated implementation tests and user tests, which together cover both the functionality and the graphical user interface of the system. The purpose of the implementation tests was to ensure that the core functionality works as described in section 4.1, and to see whether, and how well, the system meets the evaluation metrics reliability and performance. User friendliness was evaluated by distributing a questionnaire to a small number of test users, and by analysing the GUI with the answers from the test and the design principles clarity, consistency and simplicity in mind.

6.1 Implementation Tests

Through automated tests, this section addresses two of the evaluation metrics introduced in chapter 3, as well as the core functionality of the system as outlined in chapter 4. The following sections look at functionality, reliability and performance in turn, describing the tests, how to run them, and the results. Through this, it is possible to determine whether the system meets the requirements, and to which degree. Functionality was tested by creating four peers within the same script and repeatedly exercising the core functionality: peer communication, site archiving and fetching files from the DHT. The reliability tests used Docker containers to create multiple peers and simulated an unreliable network to examine file availability over time. Performance was tested by timing the main user actions of the system: site archival and fetching from the DHT.

6.1.1 File Structure Overview

All automated tests are located in the api/test/ folder. There are three main categories of tests: functionality, reliability and performance. Each of the main test files contains all the tests of its type.

Functionality

functionality-test.js This test ensures that the core functionality works as intended, by performing the main user actions multiple times.

Reliability

reliability.py This file contains two reliability tests that run in similar, yet different ways. Both use Docker containers and a Docker network to run ten peers simulating an unreliable network of peers. This is done either by disconnecting and reconnecting peers to the network, or by stopping them entirely and creating new containers.

setup.js This is a setup file intended to run in a Docker container. The script starts a peer; for the initial peer it also archives files and then continuously checks the average number of copies of the files.

Performance

performance.js The performance tests time the execution of the two main user actions: archiving a site and retrieving the file from the DHT. This requires the tester to run the test file in at least two terminals or on at least two machines to recreate a more realistic use situation.

urls.txt This file contains 527 URLs fetched from several Norwegian digital news sites in March 2020. All the test programs use this list when archiving sites.

6.1.2 Functionality

The functionality tests were created to make sure that the core functionality works. The following are considered core aspects of the functionality:

• Inter-peer communication

• Saving a file to the DHT

• Retrieving a file from the DHT

This test program creates four peers on one machine and runs all the tests locally. Using libp2p, it is possible to create and connect multiple peers within the same program without having to open multiple terminals manually. The peers will be available at different ports and communicate the same way that peers spawned in multiple terminals would. The result is, therefore, the same, but easier to implement, and fully automatic. One consequence of this is that every action is performed sequentially, which is unlike a real use situation. As these tests are only supposed to test functionality, this does not matter. Emulating a more realistic use situation will be a core aspect of the reliability tests, and is therefore not covered here.

Initially in the test program, the peers ping each other to test that communication between peers is possible, especially between the peers that were connected through the bootstrap peer. Following this, two of the peers test that archiving a file is possible by each archiving five different sites, using random URLs. The archiving peer then checks that the file has been duplicated across all the connected peers. Finally, two peers check that they can fetch files from the DHT using the file's hash, by each fetching five files.
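The shape of the test program can be sketched as follows, reusing the assumed helpers from section 5.3.2; the details of the real functionality-test.js may differ, and the bootstrap address handling is simplified.

const assert = require('assert');

async function functionalityTest(urls) {
  // Four peers on one machine; the first acts as the bootstrap peer.
  const bootstrap = await createPeer();
  const peers = [bootstrap];
  for (let i = 0; i < 3; i++) {
    peers.push(await createPeer(bootstrap.multiaddrs[0]));
  }

  // 1. Inter-peer communication: ping between non-bootstrap peers.
  await peers[1].ping(peers[2].peerId);

  // 2. Archiving: store five files and remember their hashes.
  const hashes = [];
  for (const url of urls.slice(0, 5)) {
    hashes.push(await storeData(peers[1], { url }));
  }

  // 3. Retrieval: fetch each file from the DHT through another peer.
  for (const hash of hashes) {
    const content = await findData(peers[3], hash);
    assert.ok(content.url, 'file should be retrievable by its hash');
  }
}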

Running the Tests

This test can be run by executing the following command:

$ yarn func

Results

The results from an example run of this test are shown in appendix A. As the test output shows, the peers are able to communicate, archive files and retrieve them, and the files are duplicated across the network of peers. However, in this particular run, the archiving of a file fails in one of the ten times it is attempted. This is an error that appeared semi-frequently in the automated tests, especially in the more complex tests run for reliability and performance, but it never occurred while manually testing the system via the front-end application. While the wording of this error does not reveal much about why it occurs, it often coincides with various timeout errors, either in the Puppeteer code or while adding a file to the DHT, as in this particular case. This, coupled with the fact that the errors tend to occur during computationally heavy tasks, such as running multiple peers at once on one machine, seems to indicate that the error is caused by a lack of processing power at the time of execution. The program has enough peers to add the value to, as it successfully adds files to the other peers both before and after the error occurs. Because of this, and because the error rarely occurs when using the front-end app, no significant effort was made to fix it in the current prototype of the system. As a result, it can be argued that the system's core functionality works, but not necessarily as well as it should.

6.1.3 Reliability

The reliability tests were implemented using Docker 1 and Python 2 scripts. Docker is a tool that allows for the creation, deployment and use of software through so-called containers. A container is a bundle containing all the software and dependencies needed to run a program. In this, containers operate similarly to virtual machines, but unlike virtual machines, they share the host operating system kernel, which makes them both faster and more lightweight [67]. Python is a high-level programming language with a simple and straightforward syntax that makes code fast and easy to both write and read.

1 https://www.docker.com/
2 https://www.python.org/

There is also a Python library 3 available that allows the developer to run Docker commands from within a Python program, making it easy to keep track of and work with several containers at once. To simulate a P2P network of separate but connected machines, we used a Docker bridge network 4, which allows the containers connected to it to communicate in isolation from other containers. A bridge network also makes it possible to disconnect and connect running containers on the fly, making it a good fit for simulating an unreliable P2P network.

Reliability was tested in two similar, yet different ways. The first test disconnected and reconnected peers to the bridge network, while the second removed containers entirely and added new ones to the network. Each approach is described in more detail in the following sections. These tests seek to examine file availability in a simulation of a real system state. This means that peers should be unreliable and come and go at random intervals, much like the peers in a real P2P network. As such, there is no need for any peer other than the initial one to perform any actions. The intention was to use these peers to create an unreliable network, and having them archive and fetch sites would merely add unnecessary complexity. Therefore, the only peer that performs any actions beyond connecting to the network is the initial peer, as described below.

Both tests spawn ten Docker containers and attach them to the pre-made bridge network distarch. In the Python script, the network and an array with the containers are fetched using the Docker library. The first container in the array is used as the initial peer and runs the setup file first. Every other peer connects to this peer and runs the same setup file. Because the tests run every peer on the same machine, it was not possible to run a larger and more realistic number of peers at one time. The result is a network of peers that is not realistic compared to an imagined real use situation, where potentially hundreds of peers would be connected, but for an initial test it was enough to show whether or not the files remain available in an unreliable network over time.

In both tests, peers leave and join the network at random intervals of between 20 minutes and one hour. Twenty minutes was chosen as the lower bound because it was important that not every peer left the network too fast. This way, at least one peer containing a given file should remain alive to successfully replicate the file at the hourly republish. Similarly, it was also important that some peers left during the tests, and so the upper bound of one hour was chosen to make sure that there were some changes to the network at least every hour.

setup.js

This script runs the peers continuously while the test program takes care of disconnecting and connecting peers. Upon running the script, a peer is created and connected to the network.

3 https://docker-py.readthedocs.io/en/stable/index.html
4 https://docs.docker.com/network/bridge/

The initial peer, after a timeout of ten minutes to allow all the peers to connect, archives 20 files and saves their hashes in an array. To be able to run this script with a larger number of peers, it was decided to exclude the file handling from these tests. This decision was made because running Chromium in the Docker images used too much memory and caused the tests to crash. Therefore, only the URL is archived in the DHT, which does not have any impact on reliability: the important part of the reliability tests is to see that the object stored in the DHT is replicated and maintained. Following the archival, the script runs for five hours, printing the average number of copies of the files every fifteen minutes. This is logged to the console, which can then be accessed from the test program.

Running the Tests

To run the tests, it is necessary to have Docker installed and to build a Docker image of the system. Additionally, both tests require a Docker network named distarch. Building the image and creating the network can be done with the following commands:

$ docker build --tag distarchive:v.0.1 .
$ docker network create distarch

Following this, it is possible to run each of the tests, as described in the sections below.

Static Peers

The goal of this test is to ensure that files stay available on peers, even if the peers leave the network and rejoin at a later time. Static peers, in this context, means that the same set of peers is used throughout the test. To simulate an unreliable network, a random peer is disconnected from, or reconnected to, the network at random intervals, as described above. The following command will run the test:

$ yarn reliability-static

Results

As can be seen in figure 6.1, some of the files that were previously available on a peer remain so. However, the files are not republished to match the number of peers along the way. This test therefore shows that files can still be accessible from peers that have been disconnected and reconnected to the network, but that this is not always the case. As the graph shows, the average number of file copies rises and sinks in step with the number of peers, which may be because the containers that run the peers never stop; they merely disconnect from the network. When a container is reconnected to the network, the same peer process is still running, and the other peers in the network can keep communicating with it the same way they did before it disconnected. As for reliability, this test shows that a certain number of copies of a file is maintained in the DHT over time, but that the number of copies may not match the number of peers.

Figure 6.1: Static reliability test results

Dynamic Peers

As the test using static peers does not check whether the values in the DHT are republished to new peers every hour, the goal of this test was to make sure that this is the case. This test uses dynamic peers, meaning that it removes random peers entirely from the system and adds new ones throughout the test. This is once again done at random intervals, as described. The test can be run with the following command:

$ yarn reliability-dynamic

Results

The initial run of this test revealed that the values in the DHT did not seem to be republished every hour, an unfortunate side effect of using a pre-existing implementation of the DHT with incomplete documentation. This discovery led to an update of the code to include a simple function in the PeerHandler that runs as long as a peer is alive, and republishes the peer's values once every hour. While this solution is not optimal and does not operate as part of the DHT itself, it provided the basic functionality needed to achieve a higher level of reliability. As shown in figure 6.2, the average number of file duplicates increases to match the number of peers as more peers connect. However, it must be noted that this functionality, like the errors discussed in section 6.1.2, seemed to occasionally suffer under the more computationally expensive operations of the reliability tests.

Figure 6.2: Dynamic reliability test results

While the result of these tests is that the system is not as reliable as the project goals outline, it does highlight a few important issues. One of the most important is that although there are many advantages to using a pre-existing implementation of a system feature, this comes at the cost of control and a deeper understanding of the system. Additionally, the system does not perform well when the processor is under stress, which would need to be addressed in any further development. All in all, this reliability test showed that the system was not reliable; this was then amended with a solution that increased the reliability, but the test still revealed a larger weakness in the system as a whole.
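The workaround can be sketched as a simple timer in the PeerHandler; names are assumed, and the real function may differ.

const REPUBLISH_INTERVAL = 60 * 60 * 1000; // one hour, per Kademlia

function startRepublishing(peer, publishedEntries) {
  // publishedEntries: Map from key buffer to value buffer for every
  // key-value pair this peer has put into the DHT.
  return setInterval(async () => {
    for (const [key, value] of publishedEntries) {
      try {
        await peer.contentRouting.put(key, value);
      } catch (err) {
        console.error('republish failed:', err.message);
      }
    }
  }, REPUBLISH_INTERVAL);
}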

6.1.4 Performance

The performance tests were implemented in JavaScript, with several functions to time the various user actions and print the results. Two user actions were timed: archiving a site and fetching it from the DHT. To separate the peer handling from the file handling, the archiving of a site was split into two separate functions, one to capture the site and one to store the file in the DHT. This was done because archiving is the most time-consuming user action, and as it is performed in two steps, it was useful to learn which step consumed the most time. In the tests, each action is performed and timed 20 times, before the average and median are calculated and printed along with the shortest and longest times.

To make the user actions easy to test, they were not measured from click, but rather by calling the back-end functions that perform the given action from a separate test program. As a result, there is an additional delay of a few milliseconds when performing the user actions in the front-end app, caused by the communication between the front-end and back-end. This delay, however, is so short that it does not have any significant impact on the results. Additionally, all tests were performed in the same way, meaning that it is possible to predict what the actual response time from click would be. For the site archiving and storing there is also a short delay added by storing the data needed to perform the remaining actions later in the test.

Initially, the plan was to run the performance tests on multiple remote machines to capture a realistic delay in peer communication. However, as it was difficult to set up a larger network of machines, the performance tests were executed with two separate machines, which gives a more realistic communication delay than running the tests on one machine would.
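Each measurement can be sketched as below: run the action 20 times, collect the durations, and report the statistics shown in the tables that follow (the helper name is an assumption).

async function timeAction(label, action, runs = 20) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    await action();
    // Convert nanoseconds to seconds.
    times.push(Number(process.hrtime.bigint() - start) / 1e9);
  }
  times.sort((a, b) => a - b);
  const average = times.reduce((sum, t) => sum + t, 0) / runs;
  // Median of an even number of runs: mean of the two middle values.
  const median = (times[runs / 2 - 1] + times[runs / 2]) / 2;
  console.log(label, {
    shortest: times[0],
    longest: times[runs - 1],
    average,
    median,
  });
}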

Running the Test

This test requires at least two terminals or machines to run, to ensure that the files are copied to more than just the local machine. Like the CLI, this test can be run either locally on one machine with multiple terminals, or across multiple machines. To run locally, open as many terminals as desired and run the following commands.

In the first terminal:

$ yarn perf -f

To use more than two terminals, run the following command in as many terminals as desired (this step can be skipped):

$ yarn perf

In the last terminal, which will perform the timing:

$ yarn perf -l

To run the test on remote machines, the commands are largely the same, except that the IP address of the initial peer must be added. On the first machine:

$ yarn perf -f

which will print something like this:

> peer started. listening on addresses:
> /ip4/127.0.0.1/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm
> /ip4/192.168.1.251/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm

Then, using the IP address, in this case /ip4/192.168.1.251/tcp/10333, run the following command on any number of remote machines (this step can also be skipped):

$ yarn perf --ip /ip4/192.168.1.251/tcp/10333

On the last machine, the following command will run the tests:

$ yarn perf -l --ip /ip4/192.168.1.251/tcp/10333

Archiving File From URL

The most time-consuming user action, as shown in table 6.1, is archiving a site.

Shortest  1.162 s
Longest   8.641 s
Average   4.1823 s
Median    4.3975 s

Table 6.1: Time it takes to archive a file

With an average and median run time of about 4 seconds, this is above the 1 second limit for keeping the user focused directly on the task without thinking about other things [4, p. 135]. However, it is also well below the 10 second limit for keeping the user focused on the task overall. While there is some variation, with the shortest run time being about 1 second and the longest close to 9 seconds, no run falls outside the 1-10 second interval. The fact that the run time for file saving resides in this interval has implications for the visual design: it is recommended that the user gets some visual feedback if the delay is longer than 1 second, to indicate that the system is working. To that end, we have implemented a spinning wheel animation that is visible while a new URL is being archived, as discussed in section 5.4.3.

Currently, the prototype uses Puppeteer and Chromium to fetch a screenshot of the site, which is the most time-consuming part of the archival. Storing the file in the DHT does not take long, as discussed below. Seeing as the choice to use Puppeteer was made with the temporary solution of fetching a screenshot in mind, and because the delay is not too long, no attempt was made to alter the code to use a faster solution. In a more developed version of the system, different tools for site archival would likely need to be examined, which would require a new performance test. As this functionality is not one of the main concerns of this project, the performance of the site archiving as it stands was deemed satisfactory.

Saving File to DHT

Shortest  0.097 s
Longest   0.839 s
Average   0.31595 s
Median    0.3005 s

Table 6.2: Time it takes to store file in DHT

Unlike the site capturing, the DHT and its functionality are among the main concerns of this project, and so it was decided to measure the DHT-specific action of storing a file on its own. As can be seen in table 6.2, this action takes less than a second, about 0.3 seconds on average. This time is well below the 1 second limit for keeping the user focused, and this action is therefore considered performant.

Fetching File From DHT

Shortest  0.05 s
Longest   0.379 s
Average   0.1618 s
Median    0.154 s

Table 6.3: Time it takes to fetch file

The second DHT action, fetching a file, performs as well as, if not better than, storing. As shown in table 6.3, the average time for fetching a file from the DHT is about 0.16 seconds. This, together with the storing results above, shows that content routing with a DHT is a highly efficient way to replicate, store and fetch files in a network of peers.

6.1.5 Summary of Implementation Tests

This section has described the automated tests that were created to test the system prototype, and their results. The tests primarily addressed three different areas of the implementation: the core functionality, the system's reliability, and its performance. In the functionality tests, multiple peers were spawned in the same script, before peer communication, site archiving, duplication and fetching were tested repeatedly. The result of this test is that all the core functionality works well, except for the occasional failure of the site archival, likely caused by a timeout due to the heavy workload on the processor.

The reliability tests used Docker containers and a Docker network to create multiple containers that could easily be disconnected from and reconnected to the network, simulating an unreliable network of peers. The initial reliability tests revealed a weakness in the DHT, namely that its values were not republished every hour as the Kademlia algorithm specifies. To remedy this, a function was implemented in the peer handler to republish any files archived by a peer every hour. Both reliability tests show that this works as intended.

Finally, the performance tests timed the main user actions: site archival and fetching from the DHT. The archiving was split in two to separate the performance of the file handler and the peer handler. The results show that capturing the screenshot of the site is the most time-consuming action, taking about 4 seconds on average, which has been compensated for by adding a loading icon to the GUI. However, as this action rarely took more than 10 seconds, and the functionality of the file handler would inevitably have to be changed in a later version of the system, this was deemed acceptable. On the other hand, both saving to and fetching from the DHT took less than one second on average, which shows that the DHT is performant.


6.2 Graphical User Interface Evaluation

User friendliness, in this project, was evaluated through both performance and the visual design. Performance, and therefore the design principle responsiveness, has already been discussed above, and this section examines the GUI in light of the remaining principles. The GUI was evaluated in two ways: first by conducting a few tests with real users through a questionnaire, and then by giving a brief analysis of the system in relation to the design principles. The answers from the questionnaire were used in the analysis to highlight both strengths and weaknesses in the design.

6.2.1 Survey

To get some user input, a small survey with an online questionnaire was used. Questionnaires are often used to gather quantitative data, distributed to a large number of users so that statistics can be generated from the responses, but they can also be used to gather qualitative data [68]. For this project, qualitative data were desirable to get detailed feedback and more insight into the users' thought processes. Therefore, the questions in the questionnaire were designed like semi-closed interview questions, where the interviewee is encouraged to elaborate rather than give short answers that are easy to measure. While user testing was never intended to be a large part of this project, the original plan was to conduct a focus group with actual users; as the Coronavirus pandemic occurred towards the end of the project, however, planning any type of in-person user testing proved difficult. It was also decided to avoid any tests that would require the user to download, install and run the application on their own machine, both because it was desirable to test with users who have no experience running software in this way, and because sticking to one test saved time and effort. The result is that the user test was rather limited. Nevertheless, it provided some user input on the visual design, which was very valuable to the analysis.

Questionnaire

The online questionnaire is available in its entirety in appendix B. It starts with a short introduction to the project and the purpose of the survey, as well as an explanation of the questions and a disclaimer that all answers will be deleted at the end of the project. There are five different sections in the questionnaire. The first four each start with one or more screenshots (see figure 6.3) of one part of the GUI, followed by questions that specifically address this part. Some questions ask the user to describe what they would do to perform a certain action, while others ask about their expectations of the system's behaviour.

The final section allows the user to freely write any additional comments they might have.

Figure 6.3: Example of screenshots in questionnaire

The questionnaire was distributed to five test users between the ages of 20 and 27, all acquaintances of the author. Two of the users had a computer science background, and the other three did not. Ideally, test users should be representative of, and as diverse as possible within, the target group [68, pp. 18–19]. However, finding and selecting suitable test users takes both time and effort, and so for the scope and time frame of this project, we decided to keep the user test, and therefore the number of test users, small. One advantage of using acquaintances as test users is that they are easy to reach and more likely to respond. One disadvantage is that there is a higher chance of the answers being influenced by the test users' relation to the author, which must be taken into account when analysing them. Despite this, the purpose of the survey was not necessarily to get objective measurements, but to gather some thoughts on the user friendliness of the prototype's GUI, and our test users were sufficient for this purpose.

Some test users reported as feedback on the questionnaire that they felt there were right and wrong answers to the questions. While this is difficult to avoid with a questionnaire like this, because every part of the GUI is designed with a specific purpose in mind, it does highlight the fact that a questionnaire is not the best way to gather qualitative user input. The test format and the phrasing of the questions can make the users feel like they are taking a test and should answer "correctly" rather than say what they think, which is easier to avoid in a real-life setting where users are allowed to interact with the system more freely.

Because of this, the users were encouraged to be honest, so as to uncover which parts of the design were problematic.

One challenge of using screenshots to evaluate the system is that the users did not get to see any of the visual feedback present in the GUI. Perhaps the biggest issue here is that the error messages were not evaluated at all, as these are an intricate part of the user actions. The user's understanding of them is interconnected with how they interact with the system, and so they were left out of the questionnaire entirely. For that reason, they are not examined in the analysis, even though they are an important part of the vocabulary, and therefore of the clarity of the GUI.

Results

Overall, the feedback on the GUI was positive, and the test users seemed to understand how to use it and what the intention of the various elements was. Most importantly, all of the test users' descriptions of how they would archive and share a site matched the actual functionality of the system. When asked to describe in detail what they would expect to happen when clicking the icons, the test users also seemed to expect the pop-ups, particularly the one that asks the user to confirm before deleting a file. This shows that this behaviour is expected, and may indicate that pop-ups are a good solution for actions such as these. Some of the test users expressed confusion about the share icon and the term "Share ID", as well as some of the links, which will be discussed in the next section.

6.2.2 Analysis

This section examines the GUI in relation to the same design principles that were addressed both in the comparisons of similar systems and in section 4.4. As before, responsiveness, and therefore performance, is excluded because it has already been explored in the previous section. The remaining principles, clarity, consistency and simplicity, are analysed in connection with the data from the questionnaire. While all these principles have been addressed in section 4.4, it is interesting to examine whether the assumptions made in the design section reflect what the test users think. It should also be noted that we are analysing our own work, which increases the likelihood of bias affecting the analysis.

Clarity

In our analyses, clarity mainly addressed vocabulary and symbols such as icons, as well as site hierarchy and menus. As the functionality of the prototype in this project is rather limited and does not have any elements or menus competing for the user's attention, the latter two are not relevant here. However, both the icons and the vocabulary should be addressed, as well as some of the links on the home page. As discussed in section 4.4.1, the icons used for the file menu are fetched from Google's Material Design icons, and have been designed for exactly this purpose, as their names "Share" and "Delete" suggest. The use of these icons for this purpose is also widespread across the Internet and can be seen in some of the other systems in the comparison. For example, the share icon is used by Webrecorder, and the delete icon by Pocket (see figure 6.4). One test user remarked that they did not know what the share icon meant because they had never interacted with one before. However, they wrote that "I personally don't know what the network icon means, but by clicking on it I would instantly understand it." This indicates that while there might not be a universal understanding of the icon, its function is made evident through the design.

Figure 6.4: Examples of share and delete icons used in other systems: (a) Webrecorder, (b) Pocket

Choosing the right vocabulary is not easy, as users all have different mental models and may not share the same understanding of what a word means. It was decided to use the word "sites" to refer to the archived sites, as shown through the "My sites" header/link in the GUI. However, the test users also used words such as "archived sites", "articles" and "pages" in their answers, which implies that there is some discrepancy in the test users' vocabulary. This discrepancy should be examined further to find the word that best describes the user files, and that word should then be used consistently. One definite pain point in the system is the term "Share ID": in response to the question "Where do you think you might find a file's "Share ID"?", one user simply wrote "I don't know what that is." This indicates that the term either needs some explanation or should be renamed. Another aspect that evoked different answers from the test users was the function of the "My sites" link and the original URL provided on each file element. Most users expected that the "My sites" link would lead back to the home page, but two users answered differently. They wrote that they would expect a list of the archived sites to show up, and one of them even specified that they would expect the list to be a drop-down menu. As for the original URL, the question "What would you expect to happen if you clicked the URL provided underneath each site?" revealed that only one user expected this to take them to the original website. One user answered that they would expect either the archived site to open or to be taken to the original site, and the remaining users all thought the link would lead to the archived site. One user also pointed out that "It wouldn't be my first instinct to click on the URL unless it was underlined or looked like a hyperlink", which suggests that the intention of this link is not clear. Altogether, the clarity of the design is good, and most of the test users understood the use of icons and vocabulary. However, there were some exceptions that should be taken into consideration. As mentioned, it would be beneficial to conduct further tests where the users are allowed to interact with the system. The screenshots presented to the test users through the questionnaire have no on-hover visual feedback, such as the changing cursor or hyperlink underlines, that may help indicate to the users what the intention of the various elements is. In particular, the icons, links and error messages should be examined further in more extensive user tests.

Consistency

The consistency of the design, in terms of colours, fonts and structure, has not changed much since the initial wireframe. There were also no questions in the questionnaire that directly address consistency in the same way as clarity, and so this section reflects on the final design compared to the wireframe. The same colours have been used, with the addition of red to mark error messages and the delete button in the pop-up. While this colour deviates from the main colour scheme, it was added to grab the user's attention and differentiate these elements from the rest of the design [25, pp. 706–707, 27, p. 81]. The font has not been changed and remains the same across the various parts of the GUI. Colour, capital letters and text decoration such as underlines have been used to emphasise, or downplay, the various pieces of text.

Simplicity

Simplicity, like consistency, was not directly addressed in the questionnaire, and will therefore also mainly be related to the discussion in section 4.4.3. The files are still the most prominent part of the GUI, and the amount of information for each file is the same as in the wireframe. One question in the questionnaire that does address this is "Is there any information you would wish to be displayed here that isn't already?", which was asked about the display page specifically. The display page was not prototyped in the same manner as the home page, but as mentioned in section 5.4.3, the design was kept simple, and no new information is displayed here that was not already available on the home page. In response to the question, four of the test users wrote that they did not miss any information and that the design was both straightforward and succinct. One test user wished for a timestamp on the display page showing when the site was archived, which should be considered in a more developed version of the system. Overall, the simplicity of the design is good, which seems to be reflected by the test users in the few answers that relate to this design principle. One test user also remarked in the additional comments that "I like that the design is simple. It makes it easier to navigate."

6.2.3 Conclusion of Graphical User Interface Evaluation

This section has looked at the GUI of the prototype in light of the design principles clarity, consistency and simplicity. First, a small survey was conducted, consisting of a questionnaire that was distributed to a small number of test users. The results from this questionnaire were briefly outlined, followed by a more thorough analysis of the design principles, both in relation to the initial wireframe for the system and the answers from the questionnaire. The questionnaire results show that the test users were positive about the GUI and that they largely understood how to use the system and what the icons, words and elements meant. There was some confusion about the share icon and the use of the term "Share ID", as well as a few discrepancies in the understanding of the various links in the GUI, all of which affect the clarity of the GUI. However, the majority of the users understood the intention of the various elements, and their expectations reflected the assumptions made in sections 4.4 and 5.4.3, which indicates that the clarity of the prototype's GUI is good. Consistency and simplicity were not examined as closely in the questionnaire, and since the design has not changed much since the initial wireframe, the conclusion is that the GUI is both consistent and simple. Yet, it must be emphasised that the limited functionality of the prototype is one reason that it satisfies all the design principles, and that it would therefore be necessary to continually keep the principles in mind in further development of the system.

6.3 Conclusion of Evaluation

This chapter has given an overview of the tests that were conducted to evaluate the system and the results from these. The system’s core functionality and the metrics reliability and performance were evaluated with automated tests, and the user friendliness of the GUI was evaluated with a short questionnaire that acted as the basis for analysis in light of the design principles clarity, consistency and simplicity. The following list gives an overview of the tests that have been performed and what the results were.

Functionality The purpose of the functionality tests was to check if the core functionality, as outlined in section 4.1, works as described. This test ran four peers locally and tested peer communication through pings, site archiving and fetching. As the results show, all of this works as intended, with one minor weakness in the site archiving that is likely due to a lack of processing power as the test runs multiple peers on one machine.

Reliability Reliability was tested by simulating an unreliable network of peers and checking file availability over time. Two different simulations were run: one where peers were disconnected and reconnected to the network, and one where peers were killed and new ones were added to the network. The results from these tests were somewhat mixed and revealed that the DHT implementation used in the system did not republish values every hour as it should. As a result, a rudimentary republish function was implemented in the peer handler. The final results show that the number of file copies rises and falls in tune with the number of peers, but that there may not be as many copies of a file as there are peers. Altogether, the tests show that the system is somewhat reliable and that this is an aspect that needs further development and testing.
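To make the republishing behaviour concrete, the following is a minimal sketch of what such an hourly republish loop in the peer handler could look like. It is written in Node.js to match the prototype, but the identifiers (localFiles, dht.put) are illustrative assumptions, not the thesis's actual code.

// Hypothetical sketch of an hourly republish loop; localFiles and the
// dht.put signature are assumptions made for illustration.
const REPUBLISH_INTERVAL_MS = 60 * 60 * 1000 // one hour

function startRepublishing (dht, localFiles) {
  return setInterval(async () => {
    // Re-put every locally archived file so copies lost to churn are restored.
    for (const { key, value } of localFiles()) {
      try {
        await dht.put(key, value)
      } catch (err) {
        // A failed put is simply retried on the next hourly cycle.
        console.error('republish failed:', err)
      }
    }
  }, REPUBLISH_INTERVAL_MS)
}

Leaving republishing to a timer in each peer is a blunt instrument compared to the coordinated republishing Kademlia specifies, which is why the text above calls it a rudimentary solution.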

Performance Performance was tested by measuring the two main user actions: archiving a site and fetching a file from the DHT. The former was further split in two to time the file handling and the peer handling separately. The results show that both DHT actions complete in less than 1 second, and the DHT is therefore performant. Archiving a site takes longer, with an average run time of about 4 seconds, which has been compensated for in the design with a loading animation. As this is still below the 10-second limit for keeping the user's attention, and because this functionality is still incomplete, this was deemed acceptable, and the system is therefore considered performant.
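Measurements of this kind can be gathered with a few lines of Node.js; the sketch below shows one way the two user actions could be timed. The archiveSite and fetchFile names are placeholders for the prototype's actual functions, not a reproduction of its test code.

// A minimal timing harness using Node's perf_hooks module.
const { performance } = require('perf_hooks')

async function timeAction (label, action) {
  const start = performance.now()
  await action()
  console.log(label + ': ' + Math.round(performance.now() - start) + ' ms')
}

// Usage (placeholder function names):
// await timeAction('archive site', () => archiveSite(url))
// await timeAction('fetch file', () => fetchFile(fileId))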

User friendliness To evaluate the user friendliness, a short questionnaire was distributed to a small number of users, and the answers were then used to analyse the GUI of the system. The results show that the test users understood how to use the system, and the feedback given was largely positive. Some things were unclear to some users, such as the intention of some of the links and the vocabulary used, which indicates that the GUI should be evaluated more thoroughly. Additionally, none of the more interactive elements of the GUI, like the loading animation and the error messages, were evaluated, and these are all aspects of the design that could affect the usability. Overall, the results from the test and analysis show that the GUI is user friendly, but that more testing is necessary.

As these results show, the system largely meets its goals. The remainder of this section examines the evaluation metrics in detail to uncover which parts of the system work well and which do not work to a satisfactory degree. Each of the evaluation metrics is addressed in turn and discussed in light of the test results.

The shortcomings in terms of reliability are one of the system's biggest weaknesses. With the added functionality to republish values every hour, the system meets the criteria, but as the reliability tests show, this is not a stable, long-term solution. This project shows that a DHT is a good fit for a P2P file system, but the results from the reliability tests show that using a pre-existing implementation of a DHT may come with some critical consequences. In a more developed version of the system, the peer handling has to be addressed, either by implementing or finding a more suitable DHT.

Performance-wise, it is evident from the tests that the DHT is performant, which underlines that this is a good structure for a P2P file system. However, the performance tests in this evaluation do not take into account that the machines in the system may, and most likely would, be located across greater distances, and that there would be many more machines in the network. Both of these factors may have an impact on the performance, as the messages have to travel further and file routing takes more steps. The result of our performance tests is therefore that the DHT is performant to a degree, but additional tests with more machines are necessary before this can be concluded with confidence. The performance of the site archiving is another shortcoming of the system. It was, however, deemed acceptable, both because it did not exceed 10 seconds, and because this functionality would have to be revised in a more developed version of the system anyway. This waiting time, like other aspects of the GUI that demand user interaction to evaluate, was not evaluated in the user tests, which it probably should have been to back up the claim that the waiting time is acceptable.

While the system can be deemed user friendly, with support in both the user tests and the design principles, there are still some aspects that were not evaluated and that may impact the user friendliness. As previously mentioned, this mainly entails the parts of the GUI that are interwoven with user interaction, like the waiting times, error messages and visual cues such as hover effects. The error messages are particularly important here, as they can be a cause of frustration with the system, especially if they are not clear enough. Some even argue that error messages should be eradicated [66, pp. 644–648], which means that this aspect of the GUI should be examined closer.

Part III

Conclusion and future work

Chapter 7

Conclusion

7.1 Summary

The goal of this thesis was to suggest and implement a new Internet archiving system, based on P2P technology and intended for both personal and social use, that is reliable, performant and user friendly. The project is rooted in several disciplines, primarily the collaborative and robust nature of P2P systems, the cultural significance of Internet archiving, social use of a system through sharing, and the importance of user friendliness. The system is influenced by a long tradition of Internet archiving, and like many of the existing systems, it was motivated by the unreliable nature of the Internet and the client-server structure it is based on. As a solution to this, we decided to create a P2P system, which eliminates the need for a central server and can ensure that files stay available through replication, even in an unreliable network of peers. In this project, we have examined and compared both P2P and Internet archiving systems, both to anchor our project in existing systems and to get an idea of what works and what does not in similar systems. Through this, it became evident that many good solutions already exist, but that none of them combines the core elements of our proposed system, namely reliability through file duplication and user friendliness, both in terms of performance and the GUI. These comparisons and the subsequent analysis became the basis for our system design, influencing both the system structure and its GUI. The network of peers in our system is organised using a pre-existing implementation of the Kademlia DHT, which takes care of peer communication and content routing. The application itself has a two-tier architecture, with a clear separation between the back-end and front-end. The back-end is implemented with Node.js, using the Express library to create an API for the front-end to communicate with. Additionally, there is a file handler, which takes care of archiving and fetching local files, and a peer handler, which is in charge of adding and fetching content from the DHT. The front-end is implemented with React, and displays the user files in the browser, allowing the user to archive sites and view, delete and share their existing files.
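To illustrate the two-tier structure, the following is a minimal sketch of the kind of Express API the back-end could expose. The route names and the fileHandler/peerHandler call signatures are assumptions made for illustration; they do not reproduce the prototype's actual code.

const express = require('express')
// Hypothetical handler modules standing in for the prototype's file and
// peer handlers.
const fileHandler = require('./fileHandler')
const peerHandler = require('./peerHandler')

const app = express()
app.use(express.json())

// Archive a site: the file handler captures and stores it locally,
// and the peer handler replicates it into the DHT.
app.post('/archive', async (req, res) => {
  const file = await fileHandler.archive(req.body.url)
  await peerHandler.store(file.id, file.contents)
  res.json({ id: file.id })
})

// Fetch an archived file by its ID, via the DHT if it is not local.
app.get('/files/:id', async (req, res) => {
  res.json(await peerHandler.fetch(req.params.id))
})

app.listen(3000)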

7.1.1 Results

Our main evaluation metrics were reliability, performance and user friendliness.

Reliability Experimental results show that files remain available to an extent, even in an unreliable simulated environment where peers continuously leave and join the system. However, it appeared that the Kademlia implementation that we used does not actually republish values every hour, as expected. This was therefore implemented in the peer handler, which leaves the republishing of values up to the individual peer, rather than the DHT itself. The result is that the system is perhaps not as reliable as first intended, even if the test results show that files are republished as they are supposed to be. Another limitation of the reliability tests was the inability to test this in a real environment with a larger number of peers.

Performance The performance was evaluated by timing the two main user actions: archiving a site and fetching files from the DHT. Site archiving was split in two to separate the file handling from the peer handling. Therefore, one test shows how long it takes to capture the site and save the file locally, while the other shows how long storing the file in the DHT takes. The results show that archiving a site takes on average about 4 seconds, which is slow enough that it was necessary to add a loading animation to the GUI, to indicate to the user that the system is working. However, it is not so slow that the user will lose their focus completely, and so it is still acceptable. Both storing and fetching a file from the DHT take on average less than 1 second and are therefore sufficiently performant.

User Friendliness The user friendliness was not evaluated as closely by actual users as initially intended, and the evaluation of the GUI was mainly based on the design principles by which we analysed existing systems. However, a few test users answered a short survey to provide some user insight. The feedback from these tests and the result of our analysis show that the system is user friendly. Given more time, it would be preferable to perform more extensive user testing to receive more feedback, but for an initial prototype, the results are satisfactory.

Final Conclusion

The prototype we have created throughout this project illustrates that it is possible to create a P2P Internet archiving system that is reliable to a degree, user friendly, and facilitates sharing. Most importantly, the project shows how P2P technology and personal Internet archiving systems can be combined, and which benefits can come from doing so. The goal of the project was met through our prototype, and it is a good starting point for a more developed system that can facilitate a community based on stable, archive-based content sharing, where any user can be certain that the copies they save, share and receive will not change over time or disappear.

7.2 Limitations

Throughout the previous part of the thesis, there were mentions of shortcomings that should be addressed in a more developed version of the system. Perhaps the most obvious of these is the decision to use screenshots, instead of text, images and styling, to portray the archived sites. Other examples include the decision not to use a proper database, not allowing the user to choose how much storage space to give up or whether to archive a file locally, and the lack of file encryption. However, as the system implemented in this thesis is a prototype whose intention was to reach specific goals, these compromises were made along the way to save time and effort and to keep the focus of the project on the goals. One decision that has had a major impact on both the development and the final results of the project was the decision to use a pre-existing implementation of a DHT. Using libp2p to implement the peer handler made the implementation much easier and allowed us to quickly set up the system, but it turned out to have a large impact on the reliability of the system. The reliability tests revealed that stored values were never republished, so the number of copies of a given file was not maintained, which resulted in poor reliability and had to be remedied with a function in the peer handler. Another limitation was the scope and execution of the usability tests. The original plan was to perform user tests in person with a focus group, but as the final part of the project coincided with the Coronavirus pandemic, it ended up being difficult to perform tests where the users interacted with the system. The latter was also made difficult by the fact that the system has to be run using a terminal and a package manager, and not all the test users had prior experience with running programs this way. Therefore, the user friendliness tests were limited and probably biased, both due to the few, homogeneous test users and the fact that we were evaluating our own work.

7.3 Perspective

Our project concludes that it is possible to create a P2P Internet archiving system for personal and social use that is reliable to a degree, performant and user friendly. Such a system is also necessarily rooted in the history of both P2P and Internet archiving systems, and in this particular case, it is also directly inspired by existing systems from both categories. This thesis has been about creating the system and examining whether it works, but it is also necessary to reflect on the advantages and disadvantages of the system in a broader perspective, which is the goal of this section.

One advantage of a P2P structure that has not been emphasised in this thesis is the added layer of privacy that comes with a system that has decentralised control. Archiving your files on a central server, or even on distributed servers controlled by one organisation, requires trust not only in the organisation's ability to keep the servers online, but also in its ability to keep the files safe from tampering, hackers and the organisation itself. Given that the files are encrypted, and that there are other anti-tampering measures in place such as content addressing, a P2P system can add security that may be desirable to users.

The system was originally intended to be implemented as a browser extension, and though the prototype from this project ended up as a more rudimentary version of the system, this is still considered a good end goal. One of the test users even remarked on this in the additional comments section of the questionnaire: "I think this would be a handy tool to have as an extension on my Chrome." Implementing the system as a browser extension would also make it possible to archive sites on the fly as the user visits them, by implementing a widget for the browser menu bar.

While there are many advantages to such a system, it is difficult to predict whether it would be used by a substantial number of users. This thesis has determined that it is possible to create the system, but whether it is a system that users want is a question that would have to be answered before any further development. Understanding what the users need is an important first step in any software development process, and it is a question that this thesis does not answer. It must be determined whether users distrust the existing client-server Internet archiving systems, and thus whether there is a space to be filled by a P2P system. One prediction that can be made is that the potential users of the system may be people who already know what P2P technology is and understand its benefits over centralised client-server solutions. Other users, who have no prior knowledge of P2P systems, may be intimidated by the unfamiliar structure and seek other, more traditional solutions instead. To that end, it is important to consider the target group, as user friendliness throughout the entire process of using the system, including onboarding, is very important if the system is also aimed at more inexperienced users. However, it may be desirable to narrow the target group down and create a system for those already familiar with the technology.
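As a brief illustration of the content-addressing point above: the file hashes printed in appendix A are 64 hexadecimal characters, consistent with SHA-256. Assuming the ID is a hash of the archived file's contents, a fetched file can be verified against the ID it was requested by, as in the sketch below; this is an illustration of the principle, not the prototype's actual verification code.

const crypto = require('crypto')

// If the file ID is the SHA-256 digest of the file contents, any
// modification of the contents changes the ID and is detectable.
function fileId (contents) {
  return crypto.createHash('sha256').update(contents).digest('hex')
}

function verify (id, contents) {
  return fileId(contents) === id
}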

7.4 Future Work

The prototype that is the ultimate result of this project is fairly rudimentary, and as the evaluation shows, it would benefit from further testing and implementation changes. The first step in this process would be extensive user testing, both in terms of the system's user friendliness and concerning whether this is a system that users want, and which users might be interested in it, as discussed above. In terms of user friendliness, the system does well according to our evaluation criteria, but it still has a long way to go when it comes to accessibility and responsiveness. Both of these are important to consider when creating software and should be addressed in a more developed version of the system. After this, some important updates to the functionality of the system are file encryption, a new DHT implementation and making the system a browser extension. File encryption addresses both the tampering and licensing issues discussed in this thesis and the privacy aspect mentioned in the previous section. Any copyright issues with the archiving and sharing of Internet content also need to be addressed in greater detail. A new DHT implementation could improve the reliability, and the network connection code could be changed to something less dependent on a single initial peer. Finally, implementing the system as a browser extension could improve its usability and make it possible to test the system in a more realistic use situation.
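As a starting point for the file encryption mentioned above, a minimal sketch using Node's built-in crypto module is given below. It assumes an authenticated cipher (AES-256-GCM) and leaves key management, which would be a design question of its own, entirely open.

const crypto = require('crypto')

// Encrypt file contents with AES-256-GCM; key must be a 32-byte Buffer.
// The authentication tag makes tampering with the ciphertext detectable
// on decryption.
function encrypt (key, plaintext) {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return { iv, ciphertext, tag: cipher.getAuthTag() }
}

function decrypt (key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ciphertext), decipher.final()])
}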

Bibliography

[1] Roger S Pressman. Software engineering: a practitioner's approach. Palgrave Macmillan, 2005.
[2] Petar Maymounkov and David Mazieres. 'Kademlia: A peer-to-peer information system based on the xor metric'. In: International Workshop on Peer-to-Peer Systems. Springer. 2002, pp. 53–65.
[3] Kevin Leffew. A Brief Overview of Kademlia, and its use in various decentralized platforms. 15th Feb. 2019. URL: https://web.archive.org/web/20191206104320/https://medium.com/coinmonks/a-brief-overview-of-kademlia-and-its-use-in-various-decentralized-platforms-da08a7f72b8f (visited on 06/12/2019).
[4] Jakob Nielsen. Usability engineering. 1st ed. Elsevier, 1994.
[5] George Coulouris et al. Distributed systems: concepts and design. 5th ed. Pearson Education, 2012.
[6] Jörg Eberspächer and Rüdiger Schollmeier. 'First and Second Generation of Peer-to-Peer Systems'. In: Peer-to-Peer Systems and Applications. Ed. by Ralf Steinmetz and Klaus Wehrle. 1st ed. Springer-Verlag Berlin Heidelberg, 2005, pp. 35–56.
[7] Antony Rowstron and Peter Druschel. 'Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems'. In: IFIP/ACM International Conference on Distributed Systems Platforms and Open Distributed Processing. Ed. by Rachid Guerraoui. Springer. Berlin, Heidelberg, 2001, pp. 329–350.
[8] Ben Yanbin Zhao, John Kubiatowicz, Anthony D Joseph et al. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. 2001.
[9] Jörg Eberspächer and Rüdiger Schollmeier. 'Past and Future'. In: Peer-to-Peer Systems and Applications. Ed. by Ralf Steinmetz and Klaus Wehrle. 1st ed. Springer-Verlag Berlin Heidelberg, 2005, pp. 17–22.
[10] Jennifer Zahn. The life and death of LimeWire. 4th Nov. 2010. URL: https://web.archive.org/web/20191008120111/https://marquettewire.org/3777966/tribune/marquee/the-life-and-death-of-limewire-mr1-se2-je3/ (visited on 08/10/2019).
[11] Peter Druschel and Antony Rowstron. 'PAST: A large-scale, persistent peer-to-peer storage utility'. In: Proceedings Eighth Workshop on Hot Topics in Operating Systems. IEEE. 2001, pp. 75–80.
[12] Ernesto Van der Sar. Netflix Dominates Internet Traffic Worldwide, BitTorrent Ranks Fifth. 17th Nov. 2018. URL: https://web.archive.org/web/20190930121343/https://torrentfreak.com/netflix-dominates-internet-traffic-worldwide-bittorrent-ranks-fifth-181116/ (visited on 30/09/2019).
[13] Johan Pouwelse et al. 'The bittorrent p2p file-sharing system: Measurements and analysis'. In: International Workshop on Peer-to-Peer Systems. Springer. 2005, pp. 205–216.
[14] Dongyu Qiu and Rayadurgam Srikant. 'Modeling and performance analysis of BitTorrent-like peer-to-peer networks'. In: ACM SIGCOMM Computer Communication Review. Vol. 34. 4. ACM. 2004, pp. 367–378.
[15] The web of tomorrow needs IPFS today. URL: https://web.archive.org/web/20191014102207/https://ipfs.io/ (visited on 14/10/2019).
[16] What is IPFS? URL: https://web.archive.org/web/20191014102912/https://docs.ipfs.io/introduction/overview/ (visited on 14/10/2019).
[17] What is libp2p? URL: https://web.archive.org/web/20191014102947/https://docs.libp2p.io/introduction/what-is-libp2p/ (visited on 14/10/2019).
[18] Paul Gil. Top Torrent Sites. 4th Oct. 2019. URL: https://web.archive.org/web/20191008132916/https://www.lifewire.com/top-torrent-sites-alternatives-to-kat-2483512 (visited on 08/10/2019).
[19] Giovanni Neglia et al. 'Availability in BitTorrent Systems'. In: IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications. IEEE. 2007, pp. 2216–2224.
[20] Chris Hoffman. How Does BitTorrent Work? 13th Apr. 2018. URL: https://web.archive.org/web/20191007100715/https://www.howtogeek.com/141257/htg-explains-how-does-bittorrent-work/ (visited on 07/10/2019).
[21] Bram Cohen. 'Incentives build robustness in BitTorrent'. In: Workshop on Economics of Peer-to-Peer systems. Vol. 6. 2003, pp. 68–72.
[22] Michael Piatek et al. 'Do incentives build robustness in BitTorrent'. In: Proc. of NSDI. Vol. 7. 4. 2007.
[23] Anirudh Ramachandran, Atish Das Sarma and Nick Feamster. 'BitStore: An incentive-compatible solution for blocked downloads in BitTorrent'. In: NetEcon+IBC. 2007.
[24] Tom McCourt and Patrick Burkart. 'When creators, corporations and consumers collide: Napster and the development of on-line music distribution'. In: Media, Culture & Society 25.3 (2003), pp. 333–350.
[25] Wilbert O Galitz. The essential guide to user interface design: an introduction to GUI design principles and techniques. 3rd ed. John Wiley & Sons, 2007.
[26] Debbie Stone et al. User interface design and evaluation. Elsevier, 2005.
[27] Michael O Leavitt, Ben Shneiderman et al. 'Research-Based Web Design & Usability Guidelines'. In: Background and Methodology (2006).
[28] Bradley Mitchell. Gnutella P2P Free File Sharing and Download Network. 16th Oct. 2019. URL: https://web.archive.org/web/20191029112912/https://www.lifewire.com/definition-of-gnutella-818024 (visited on 29/10/2019).
[29] Ernesto Van der Sar. uTorrent is the Most Used BitTorrent Client By Far. 5th Apr. 2020. URL: https://web.archive.org/web/20200610150441/https://torrentfreak.com/utorrent-is-the-most-used-bittorrent-client-by-far-200405/ (visited on 10/06/2020).
[30] Mark Scanlon. 'Study of Peer-to-Peer Network Based Cybercrime Investigation: Application on Botnet Technologies'. PhD thesis. University College Dublin, Oct. 2013. URL: https://www.researchgate.net/figure/Screenshot-of-Napster-Downloads-can-be-seen-at-the-top-with-uploads-at-the-bottom_fig6_321745039.
[31] Fiona Fui-Hoon Nah. 'A study on tolerable waiting time: how long are web users willing to wait?' In: Behaviour & Information Technology 23.3 (2004), pp. 153–163.
[32] Ion Stoica et al. 'Chord: A scalable peer-to-peer lookup service for internet applications'. In: ACM SIGCOMM Computer Communication Review. Vol. 31. 4. ACM, 2001, pp. 149–160.
[33] Eng Keong Lua et al. 'A survey and comparison of peer-to-peer overlay network schemes.' In: IEEE Communications Surveys and Tutorials 7.1-4 (2005), pp. 72–93.
[34] Eli Edwards. 'Ephemeral to enduring: the Internet Archive and its role in preserving digital media'. In: Information Technology and Libraries 23.1 (2004), pp. 3–8.
[35] About the Internet Archive. URL: https://web.archive.org/web/20191014102144/https://archive.org/about/ (visited on 14/10/2019).
[36] About the Memento Project. 23rd Jan. 2015. URL: https://web.archive.org/web/20191104112035/http://mementoweb.org/about/ (visited on 04/11/2019).
[37] Archivebox. URL: https://web.archive.org/web/20191014102106/https://archivebox.io/ (visited on 14/10/2019).
[38] About Archive-It. URL: https://web.archive.org/web/20191014102124/https://archive-it.org/ (visited on 14/10/2019).
[39] Webrecorder About. URL: https://web.archive.org/web/20191014103144/http://webrecorder.io/_faq (visited on 14/10/2019).
[40] Ilya Kreymer et al. Webrecorder pywb 2.2. 10th Apr. 2019. URL: https://web.archive.org/web/20191014112751/https://github.com/webrecorder/pywb (visited on 14/10/2019).
[41] Mat Kelly and Alam Sawood. InterPlanetary Wayback (ipwb). 4th Mar. 2019. URL: https://web.archive.org/web/20191014112816/https://github.com/oduwsdl/ipwb (visited on 14/10/2019).
[42] Jonathan Poltak Samosir et al. WorldBrain's Memex. 27th May 2019. URL: https://web.archive.org/web/20191014112839/https://github.com/WorldBrain/Memex (visited on 14/10/2019).
[43] Michele C. Weigle. On the importance of web archiving. 19th Sept. 2018. URL: https://web.archive.org/web/20191014112932/https://items.ssrc.org/parameters/on-the-importance-of-web-archiving/ (visited on 14/10/2019).
[44] Nettarkivering. URL: https://web.archive.org/web/20191014114252/https://www.nb.no/samlingen/nettarkivet/nettarkivering/ (visited on 14/10/2019).
[45] Maarten van Steen and Andrew S. Tanenbaum. Distributed Systems. 3rd ed. Maarten van Steen, 2017.
[46] Credits: Thank You from The Internet Archive. URL: https://web.archive.org/web/20200613084347/https://archive.org/about/credits.php (visited on 13/06/2020).
[47] Jocelyn Mackie. Overview of Website Copyright Law. 3rd Oct. 2017. URL: https://web.archive.org/web/20191113095305/https://www.termsfeed.com/blog/website-copyright-law/ (visited on 13/11/2019).
[48] Terms of Service. 12th Feb. 2018. URL: https://web.archive.org/web/20191113092223/https://getpocket.com/tos (visited on 13/11/2019).
[49] National Research Council et al. The digital dilemma: Intellectual property in the information age. National Academies Press, 2000.
[50] About Pocket. URL: https://web.archive.org/web/20191014103324/https://getpocket.com/about (visited on 14/10/2019).
[51] Wayback Machine General Information. 23rd Aug. 2018. URL: https://web.archive.org/web/20191014124947/https://help.archive.org/hc/en-us/articles/360004716091-Wayback-Machine-General-Information (visited on 14/10/2019).
[52] Mat Kelly et al. 'InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives'. In: Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries. Hamburg, Germany, June 2016, pp. 411–416.
[53] Todd Spangler. Global Piracy in 2017: TV and Music Illegal Activity Rose, While Film Declined. 21st Mar. 2018. URL: https://web.archive.org/web/20191014114123/https://variety.com/2018/digital/news/piracy-global-2017-tv-music-film-illegal-streaming-1202731243/ (visited on 14/10/2019).
[54] James Allen-Robertson. Digital Culture Industry. 1st ed. Palgrave Macmillan, 2013.
[55] Guy Douglas. 'Copyright and Peer-To-Peer Music File Sharing: The Napster Case and the Argument Against Legislative Reform'. In: Murdoch University Electronic Journal of Law 11.1 (Mar. 2004). URL: http://www.murdoch.edu.au/elaw/issues/v11n1/douglas111.html.
[56] Raymond Shih Ray Ku. 'The Creative Destruction of Copyright: Napster and the New Economics of Digital Technology'. In: The University of Chicago Law Review 69.1 (2002), pp. 263–324. URL: http://www.jstor.org/stable/1600355.
[57] Jemima Kiss. BitTorrent: Copyright lawyers' favourite target reaches 200,000 lawsuits. 9th Aug. 2011. URL: https://web.archive.org/web/20131204002125/http://www.theguardian.com/technology/pda/2011/aug/09/bittorrent-piracy (visited on 03/06/2019).
[58] William Uricchio. Cultural citizenship in the age of P2P networks. na, 2004.
[59] Dan Noyes. Top 10 Twitter Statistics. Apr. 2019. URL: https://web.archive.org/web/20190527095718/https://zephoria.com/twitter-statistics-top-ten/ (visited on 27/05/2019).
[60] Dan Noyes. The Top 20 Valuable Facebook Statistics. May 2019. URL: https://web.archive.org/web/20190527054614/https://zephoria.com/top-15-valuable-facebook-statistics/ (visited on 27/05/2019).
[61] Jan H Kietzmann et al. 'Social media? Get serious! Understanding the functional building blocks of social media'. In: Business Horizons 54.3 (2011), pp. 241–251.
[62] Stefan Stieglitz and Linh Dang-Xuan. 'Emotions and information diffusion in social media—sentiment of microblogs and sharing behavior'. In: Journal of Management Information Systems 29.4 (2013), pp. 217–248.
[63] Chei Sian Lee and Long Ma. 'News sharing in social media: The effect of gratifications and prior experience'. In: Computers in Human Behavior 28.2 (2012), pp. 331–339.
[64] Alexandra Weilenmann, Thomas Hillman and Beata Jungselius. 'Instagram at the museum: communicating the museum experience through social photo sharing'. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM. 2013, pp. 1843–1852.
[65] Peep Laja. 8 Web Design Principles to Know in 2019. 17th Apr. 2019. URL: https://web.archive.org/web/20191014114345/https://conversionxl.com/blog/universal-web-design-principles/ (visited on 14/10/2019).
[66] Alan Cooper et al. About face: the essentials of interaction design. 4th ed. John Wiley & Sons, 2014.
[67] Roderick Bauer. What's the Diff: VMs vs Containers. 28th June 2018. (Visited on 14/06/2020).
[68] Bill Gillham. Developing a questionnaire. A&C Black, 2008.
[69] Ralf Steinmetz and Klaus Wehrle, eds. Peer-to-Peer Systems and Applications. 1st ed. Springer-Verlag Berlin Heidelberg, 2005.

Appendices

Appendix A

Output From Functionality Test

peer started. listening on addresses:
/ip4/127.0.0.1/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm
/ip4/192.168.1.251/tcp/10333 QmcrQZ6RJdpYuGvZqD5QEHAv6qX4BrQLJLQPQUrTrzdcgm
peer started. listening on addresses:
/ip4/127.0.0.1/tcp/65093 QmeFxniYqpGypKTdERnmJbhhf5YJwwJHKuXt7mW6g9TwmP
/ip4/192.168.1.251/tcp/65093 QmeFxniYqpGypKTdERnmJbhhf5YJwwJHKuXt7mW6g9TwmP
peer started. listening on addresses:
/ip4/127.0.0.1/tcp/65094 QmY7LDtu7Tt8NE9qjR3ME5SdMpdHX4dmjr9SAwo9Ujb6Qc
/ip4/192.168.1.251/tcp/65094 QmY7LDtu7Tt8NE9qjR3ME5SdMpdHX4dmjr9SAwo9Ujb6Qc
peer started. listening on addresses:
/ip4/127.0.0.1/tcp/65095 QmTRvFWfbi7U5jknxrUAmWqQBgkr7oMCLzxxpb9YQc7d1t
/ip4/192.168.1.251/tcp/65095 QmTRvFWfbi7U5jknxrUAmWqQBgkr7oMCLzxxpb9YQc7d1t
testing ping:
> peer2 pings peer3
> peer2 pinged peer3 in 1ms
> peer3 pings peer4
> peer3 pinged peer4 in 1ms
> peer4 pings peer2
> peer4 pinged peer2 in 1ms
testing store data:
> peer4 saves file
> file url: https://www.vg.no/rampelys/i/AdM07r/herodes-falsk-om-jahn-teigen-vi-var-aldri-uvenner
> file saved with hash e5c00281e0a803b156a331f74176b84c910af53aae4947ff23d0ac0b471e51f7
> file saved on 4 machines
testing store data:
> peer4 saves file
> file url: https://www.vg.no/nyheter/innenriks/i/mR4VBl/trine-skei-grande-gaar-av-trekker-seg-som-partileder-og-statsraad-og-tar-ikke-gjenvalg
> file saved with hash d4215fe9bdac0fa5819c50afcedfeeb4c97a66c42412d37f2dddf3f80021ebb2
> file saved on 4 machines
testing store data:
> peer4 saves file
> file url: https://www.nrk.no/sport/tomme-tribuner-pa-norge-serbia.--gradighet-i-koronaens-tid.-1.14938017
> file saved with hash dcaa217f2af270c1037e0576919f7eef8494475cf5503c8c966b4244fe5753d1
> file saved on 4 machines
testing store data:
> peer4 saves file
> file url: https://www.nettavisen.no/stort-sexpress-og-mye-nakenbilder-blant-ungdommer-under-16-ar-mener-politiet-vi-ma-bare-innse-at-ungene-starter-med-seksualitet-mye-tidligere-enn-for/s/5-21-709733
> file saved with hash 9b4a6174d9f91876ade8f21efb9e11f27c659a8ab5dee0b094eda6c42db7186d
> file saved on 4 machines
testing store data:
> peer4 saves file
> file url: https://www.tv2.no/nyheter/11288429/
> file saved with hash e8d5465f945a4c4c672ba6c2998b9d778cc67d11ff6855a9a95688bb772876d0
> file saved on 4 machines
testing store data:
> peer2 saves file
> file url: https://www.aftenposten.no/amagasinet/i/Wb1G6K/Du-drikker-nesten-garantert-bobler-fra-feil-glass
> file saved with hash 6b4c0a759337ff0065adaea2d35165a04e35adfdb03fe7eae83a846bd5c985db
> file saved on 4 machines
testing store data:
> peer2 saves file
> file url: https://www.godt.no/artikkel/24799163/restaurantanmeldelse-av-signalen
> file saved with hash 703decc949055530e54fee2a5325c0bb0b5e4c91806db646c2a0f3b2dd7764a6
> file saved on 4 machines
testing store data:
> peer2 saves file
> file url: https://www.vg.no/nyheter/innenriks/i/OpJexl/slik-soner-norges-farligste-ungdommer
Error: Failed to put value to enough peers: 0/3
    at Object.put (/Users/tonjeroyeng/Documents/Project/distarchive/api/node_modules/libp2p-kad-dht/src/content-fetching/index.js:126:31)
    at async Object.storeData (/Users/tonjeroyeng/Documents/Project/distarchive/api/src/javascripts/peerHandler/index.js:98:3)
    at async Object.saveFile (/Users/tonjeroyeng/Documents/Project/distarchive/api/src/javascripts/index.js:18:18)
    at async /Users/tonjeroyeng/Documents/Project/distarchive/api/test/functionality-test.js:49:15 {
  code: 'ERR_NOT_ENOUGH_PUT_PEERS'
}
> error saving file
testing store data:
> peer2 saves file
> file url: https://www.nrk.no/norge/siste-nytt-om-koronaviruset-i-norge-1.14938353
> file saved with hash 21b837b738f2430951588864ba71cfa5956fde9a0bb513221c11a25fd68b10ad
> file saved on 4 machines
testing store data:
> peer2 saves file
> file url: https://www.nrk.no/mr/barn-pa-kristiansund-sjukehus-mogleg-koronasmitta-1.14939900
> file saved with hash 960036a683a54f98dc3cf1a4cc25c92a778adecae44841025d772f0a9cc050c6
> file saved on 4 machines
testing find file:
> peer4 fetches file with hash
> hash : 960036a683a54f98dc3cf1a4cc25c92a778adecae44841025d772f0a9cc050c6
> found file with title: Barn paa Kristiansund sjukehus mogleg koronasmitta - NRK More og Romsdal - Lokale nyheter, TV og radio
testing find file:
> peer4 fetches file with hash
> hash : 9b4a6174d9f91876ade8f21efb9e11f27c659a8ab5dee0b094eda6c42db7186d
> found file with title: Nettavisen - Stort sexpress og mye nakenbilder blant ungdommer under 16 aar, mener politiet: - Vi maa bare innse at ungene starter med seksualitet mye tidligere enn for
testing find file:
> peer4 fetches file with hash
> hash : 960036a683a54f98dc3cf1a4cc25c92a778adecae44841025d772f0a9cc050c6
> found file with title: Barn paa Kristiansund sjukehus mogleg koronasmitta - NRK More og Romsdal - Lokale nyheter, TV og radio
testing find file:
> peer4 fetches file with hash
> hash : 703decc949055530e54fee2a5325c0bb0b5e4c91806db646c2a0f3b2dd7764a6
> found file with title: Restaurantanmeldelse av Signalen: En fergetur verdig - Godt.no
testing find file:
> peer4 fetches file with hash
> hash : e8d5465f945a4c4c672ba6c2998b9d778cc67d11ff6855a9a95688bb772876d0
> found file with title: Avinor vurderer aa stenge norske flyplasser
testing find file:
> peer3 fetches file with hash
> hash : 960036a683a54f98dc3cf1a4cc25c92a778adecae44841025d772f0a9cc050c6
> found file with title: Barn paa Kristiansund sjukehus mogleg koronasmitta - NRK More og Romsdal - Lokale nyheter, TV og radio
testing find file:
> peer3 fetches file with hash
> hash : 960036a683a54f98dc3cf1a4cc25c92a778adecae44841025d772f0a9cc050c6
> found file with title: Barn paa Kristiansund sjukehus mogleg koronasmitta - NRK More og Romsdal - Lokale nyheter, TV og radio
testing find file:
> peer3 fetches file with hash
> hash : dcaa217f2af270c1037e0576919f7eef8494475cf5503c8c966b4244fe5753d1
> found file with title: Tomme tribuner paa Norge-Serbia. Graadighet i koronaens tid. - NRK Sport - Sportsnyheter, resultater og sendeplan
testing find file:
> peer3 fetches file with hash
> hash : 21b837b738f2430951588864ba71cfa5956fde9a0bb513221c11a25fd68b10ad
> found file with title: Siste nytt om koronaviruset i Norge - NRK Norge - Oversikt over nyheter fra ulike deler av landet
testing find file:
> peer3 fetches file with hash
> hash : 6b4c0a759337ff0065adaea2d35165a04e35adfdb03fe7eae83a846bd5c985db
> found file with title: Du drikker nesten garantert bobler fra feil glass
cleanup, removing local files

Appendix B

Questionnaire

Distributed Internet Archive Evaluation

The purpose of this survey is to gather user input on the visual design of the system I have made in my master thesis to evaluate the user friendliness. The system is an application in the browser that lets you archive permanent copies of web sites (particularly articles), like a cross between a bookmarking service and the Internet archive. The sites will be saved as they are at the time, and will not be subject to change, even if the original site changes.

Each part of the survey starts with a screenshot, followed by questions about how you would use the system, and what your expectations of the system would be.

All answers are anonymous and will be deleted at the end of this project, by September 2020.

It is okay to answer the survey in Norwegian.

Main page
This is the main page of the system, which displays the archived sites.

Close-up of header form

Close-up of site item

1. How would you archive a site to the system? Describe where you would click and what information you think you would need to provide.

2. What would you expect to happen if you clicked "My sites" in the header? It is the text in the top left of the header, if the image is blurry.

3. Where would you click to open the file "How to Think Without Googling - Forge"?

4. What would you expect to happen if you clicked the URL provided underneath each site?

5. Where do you think you might find a file's "Share ID"?

File menu
This is a close-up of the file menu for each file.

6. What do you think the left icon communicates (network icon)?

7. Describe in detail what you would expect to happen when clicking the network icon.

8. What do you think the right icon communicates (trashcan icon)?

9. Describe in detail what you would expect to happen when clicking the trashcan icon.

Share pop-up
This pop-up shows when you click the share icon.

Share file

10. How would you share a file with someone? Describe where you would click and what you would do.

Display page
When you open a site, this page is displayed, where the site is available along with any information about it.

11. What would you expect to happen if you clicked "My sites" on this page?

12. What would you expect to happen if you clicked the arrow at the top?

13. Is there any information you would wish to be displayed here that isn't already?

Additional comments

14. Write any additional comments on the design here. Any feedback is much appreciated.


