This Electronic Thesis Or Dissertation Has Been Downloaded from the King's Research Portal At
Total Page:16
File Type:pdf, Size:1020Kb
This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/ Understanding the feasibility of network edge caching through measurements Raman, Aravindh Awarding institution: King's College London The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement. END USER LICENCE AGREEMENT Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/ You are free to copy, distribute and transmit the work Under the following conditions: Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work). Non Commercial: You may not use this work for commercial purposes. No Derivative Works - You may not alter, transform, or build upon this work. Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above. Take down policy If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 07. Oct. 2021 Understanding the feasibility of network edge caching through measurements Aravindh Raman Understanding the feasibility of network edge caching through measurements Aravindh Raman Department of Informatics, Faculty of Natural and Mathematical Sciences King's College London This dissertation is submitted for the degree of Doctor of Philosophy 2020 Understanding the feasibility of network edge caching through measurements Abstract Recent developments in both hardware industry to bring down the cost/size of the user devices and in content systems to deliver various forms of content have resulted in the burdening of the Internet. In this work, we study how to tackle this growing network burden, leveraging local content access patterns, and the presence of edge caches and computes. The analysis is done through the data gathered from three new and trending applications: (i) on- demand video streaming (ii) live video streaming and (iii) decentralised social web, with the analysis being carried out on BBC iPlayer, Facebook Live and Mastodon respectively. Our proposition is { careful planning to share content at the edge would not only earn significant traffic savings but also improves resiliency due to a redundant availability of content. We notice that the traffic load of on-demand video traffic can be decreased by up to 70% by strategic content sharing at the edge and 22% of upload live-traffic can be saved just by offloading the content to the edge and sharing amongst the local viewers. We also find, instead of popularity based content replication in the decentralised social web (which happens at present based on the federating nodes), strategic placements are much more efficient. For example, random replication of content in 5 different nodes improves the content availability by 68% even when 25 most popular nodes are down. Thus, we investigate the spread of diverse media content arising from three different popular applications and propose how to optimally serve the content making use of the storage available at network edge. Thesis Supervisor: Prof. Nishanth Sastry Thesis Examiners: Dr. Marwan Fayed Prof. S. Keshav To Geeth and Sid Acknowledgements I am grateful to everyone who helped me along the path of writing this dissertation. First and foremost, I would like to express the deepest gratitude to my advisor Nishanth Sastry, and I am thankful for his unwavering support throughout my Ph.D. The right questions and suggestions made by him have helped me in structuring my research. Nishanth's unique way of deriving insights and his balanced style of mentoring will have an influence on me for the years to come. I am truly fortunate to work with and learn from Nishanth, for without his guidance, this dissertation would have been a distant dream! My journey as a Ph.D student would not have been possible without amazing collabora- tions. I was lucky enough to team up with brilliant people from both academia and industry, to name a few { Gareth Tyson, Diego Perino, Arjuna Sathiaseelan, Mostafa Salehi, Nader Mokari, Ming-chun Lee and Andy Molisch. Their ideas and perspective have always helped me in shaping my research in a much better way. I am incredibly fortunate to share the lab with smart computer scientists who are also excellent friends. Uncountable coffees accompanied with intellectual discussions, arguments, lunches and the humour I have shared with Sagar Joglekar, Dmytro Karamshuk, Pushkal Agarwal, Emeka Obiodu and Changtao Zhong are simply priceless. In fact, some of the chats have turned into impactful research papers/grants. Beyond our lab, I am also grateful to friends and colleagues at Center for Telecommunication Research { Mischa Dohler, Maria Lema, Fragkiskos Sardis, Oliver Holland and Luis Sequeira for involving me in networking implementations. My sincere thanks to King's College London, for supporting me with a Professor Sir Richard Trainor Scholarship. The confidence to choose the research path was seeded at Tata Institute of Fundamental Research and nurtured at IIT Delhi. I would like to thank all mentors and friends especially Aaditeshwar Seth, Zahir Koradia, Dipanjan Chakraborty, Karthyek Murthy and Nithin Varma. Indeed, they are the biggest inspirations to land in network measurements and data research. 5 6 Moving abroad and pursuing Ph.D could not have been possible without the support of my family. I would like to thank amma and appa for unwavering belief in my decisions at every point of time and instilling the desire in me to strive for the best. I am greatly indebted to my anna and manni, who have always encouraged me through my successes and been pillars of support during trough times. I am forever grateful to my beautiful and caring wife, Geetha, who has been a constant source of support while writing this dissertation. She has always stood by me and filled my life with fun and happiness. Finally, I would like to thank my Ph.D examiners, Marwan Fayed and Srinivasan Keshav. Their valuable suggestions greatly helped to improve the presentation of this dissertation. Contents Acknowledgments 5 Contents 7 List of Figures 10 List of Tables 13 List of Acronyms 14 1 Introduction 17 1.1 Thesis Statement . 19 1.2 Research Goals . 20 1.3 Methodology & Challenges . 21 1.3.1 Data Anonymisation and Integrity . 22 1.3.2 Data Consistency . 22 1.3.3 Data Storage and Ethics . 22 1.4 Contributions . 23 1.5 List of Publications . 23 2 Background 26 2.1 Web Caching: A Brief History . 26 2.1.1 Intercache Protocols . 27 2.1.2 Early Caching Topologies . 28 2.2 Content Delivery Networks . 29 2.2.1 Content Distribution and Management . 30 2.2.2 Challenges . 32 7 CONTENTS 8 2.3 Caching at the Network Edge . 33 2.3.1 Video Caching . 34 2.3.2 Information-centric Networking (ICN) Architectures . 35 2.3.3 Heterogeneous networks and D2D Caching . 38 2.3.4 Co-operation at the Edge . 38 2.3.5 Inference at the Edge . 39 2.4 Measurement & Content: State-of-the-art . 39 2.4.1 Network Measurements . 39 2.4.2 Studying the Content Systems . 41 3 On Demand Streaming and Content Sharing 45 3.1 Introduction . 45 3.2 Dataset . 48 3.2.1 BBC iPlayer. 48 3.2.2 Wireless Geographic Logging Engine (WiGLE) . 49 3.3 Exploring traffic savings from sharing . 51 3.3.1 Distribution of Access Points . 51 3.3.2 Potential bounds on savings . 52 3.3.3 Effect of storage limits . 53 3.3.4 On Scalability of Sharing . 54 3.3.5 Towards strategic content caching and sharing at the Edge . 55 3.4 Wi-Stitch: A framework for content sharing . 59 4 Live Broadcasts and Content Caching 61 4.1 Introduction . 61 4.2 Dataset . 63 4.2.1 Data Capture Methodology . 63 4.2.2 On Facebook's geo-coordinates . 65 4.2.3 On Facebook's infrastructure . 66 4.3 Characterising Live Broadcasts . 67 4.3.1 How popular is Facebook Live? . 67 4.3.2 How long are broadcasts? . 70 4.3.3 Is broadcast really necessary? . 71 4.4 Geographical Exploration . 72 4.4.1 Where are the users? . 72 4.4.2 How far away are viewers? . 73 4.4.3 Domestic or international? . 74 CONTENTS 9 4.5 Understanding Engagement . 76 4.5.1 Evolution of views over time . 76 4.5.2 Social engagement . 78 5 Decentralised web and Content Replication 80 5.1 Introduction . 80 5.2 Dataset . 82 5.3 Characterising Mastodon . 85 5.3.1 Instance Categories . 88 5.4 Exploring Instances . 90 5.4.1 Instance Hosting . 90 5.4.2 Instance Availability . 92 5.5 Exploring Federation . 97 5.5.1 Breaking the Content Federation . 97 6 Summary and Conclusions 101 6.1 Summary and Takeaways . 101 6.1.1 Content Sharing . 101 6.1.2 Content Caching . 102 6.1.3 Content Replication . 103 6.2 Future Directions . 104 6.3 Final Remarks . 104 Bibliography 106 List of Figures 2-1 Taxonomy of Cache and Content Distribution research in Content Delivery Networks . 30 3-1 Heatmap showing the clumped dispersion of home Wi-Fi access points across six districts in the United Kingdom, indicating a potential density of caches 46 3-2 Dispersion of Wi-Fi access points in 37 adjacent cells at Hammersmith and Fulham .