Social Networks Can Be Used to Tailor Content Staging Decisions to the User Base and Thereby Build Better Data Delivery Infrastructures
Total Page:16
File Type:pdf, Size:1020Kb
Social Network Support for Data Delivery Infrastructures Nishanth Sastry St. John’s College University of Cambridge This dissertation is submitted for the degree of Doctor of Philosophy 2011 Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation does not exceed the regulation length of 60 000 words, including tables and footnotes. Social Network Support for Data Delivery Infrastructures Summary Network infrastructures often need to stage content so that it is accessible to consumers. The standard solution, deploying the content on a centralised server, can be inadequate in several situations. Our thesis is that information encoded in social networks can be used to tailor content staging decisions to the user base and thereby build better data delivery infrastructures. This claim is supported by two case studies, which apply social information in challenging situations where traditional content staging is infeasible. Our approach works by examining empirical traces to identify relevant social properties, and then exploits them. The first study looks at cost-effectively serving the “Long Tail” of rich- media user-generated content, which need to be staged close to viewers to control latency and jitter. Our traces show that a preference for the un- popular tail items often spreads virally and is localised to some part of the social network. Exploiting this, we propose Buzztraq, which decreases replication costs by selectively copying items to locations favoured by viral spread. We also design SpinThrift, which separates popular and unpopular content based on the relative proportion of viral accesses, and opportunisti- cally spins down disks containing unpopular content, thereby saving energy. The second study examines whether human face-to-face contacts can efficiently create paths over time between arbitrary users. Here, content is staged by spreading it through intermediate users until the destination is reached. Flooding every node minimises delivery times but is not scalable. We show that the human contact network is resilient to individual path fail- ures, and for unicast paths, can efficiently approximate flooding in delivery time distribution simply by randomly sampling a handful of paths found by it. Multicast by contained flooding within a community is also efficient. However, connectivity relies on rare contacts and frequent contacts are often not useful for data delivery. Also, periods of similar duration could achieve different levels of connectivity; we devise a test to identify good periods. We finish by discussing how these properties influence routing algorithms. Acknowledgements First of all, I would like to thank Jon Crowcroft. For allowing me to discover my own thesis topic and approach, without any pressure. For opening up a whole host of collaboration opportunities. For quick and efficient feedback on my paper (and dissertation) drafts. For trademark-worthy names like Buzztraq and SpinThrift. One should be so lucky in their choice of supervisors. I have had more than my fair share of excellent mentors. Karen Sollins took me under her wing at MIT during my first year. I thank her for making it possible for me to start my PhD at the other Cambridge, and for her gentle and caring support since then. Discussions with collaborators and others have greatly shaped my think- ing. D. Manjunath helped me with my first baby steps in using probabilistic arguments. Anthony Hylick pointed me towards storage subsystems, and this led to SpinThrift. An email exchange with Steve Hand and the wider netos group helped sharpen my thinking on Buzztraq. Discussions with Vito Latora led me to create and experiment with synthetic versions of the MIT and UCSD traces. I have also benefited from lively and stimulating conversations on a wide range of topics with many of my FN07 officemates, past and present: Derek, Malte, Ganesh, Steve, Jisun, Mark, Henry, Chris, Stephen, Periklis, Theo, and frequent FN07 visitor Eiko. At MIT, I had Rob, Chintan and Ji in my office, and Mythili upstairs. Given the subject of this dissertation, it seems fitting to acknowledge that in a wider sense, the same purpose has been served by my social network support during these PhD years: Lava, Alex, Klaudia, Sriram, Alka, Amitabha, Mohita, Kiran, Lavanya, Vijay, Roopsha and Olga. My wife, Viji, has played such a central role that I don’t know where to begin. She is usually the first guinea pig on whom I try out my ideas. Having to explain to a non-computer scientist makes my thinking clearer and my ideas simpler. She suffers through unpolished versions of my talks. She stays up late during conference deadlines, for moral support. Once, she v vi even spell-checked my paper at 3 A.M. in the morning. At crucial junctures, she has completely taken over on the domestic front, allowing me to focus on work. In short, this dissertation is what it is because of (or inspite of) her. She is also responsible for ViVa, whom I need to thank separately for 1.5 years of pure fun. Amma and Nana planted the seeds for this dissertation a long while ago, when they bought me the eminently readable Chitra Kathegalu (Picture Stories) and later (what must have been for them an expensive book), The Children’s Book of Questions and Answers. I am forever in their debt. My sister Nikhila, five years younger, and therefore in need of teaching, was the first “student” I bossed over. That must have been the formative experience that set me off on an academic path. Finally, my sincere thanks to my College, St. John’s, for generously sup- porting me with a Benefactors’ Scholarship, and to The Cambridge Philo- sophical Society, who awarded me a Research Studentship. Contents Contents vii 1 Introduction1 1.1 The content staging problem . .1 1.2 Thesis and its substantiation . .6 1.3 List of publications . 14 1.4 Contributions and chapter outlines . 15 2 Social Networks: an overview 17 2.1 Different varieties of social networks . 18 2.2 Properties of social networks . 24 2.3 Anti-properties . 30 2.4 Social network-based systems . 34 2.5 Present dissertation and future outlook . 38 3 Akamaising the Long Tail 39 3.1 Introduction . 39 3.2 The tail of Internet-based catalogues: Long Tail versus Super Star Effect . 42 3.3 The tail of user-generated content . 47 3.4 Saving energy in the storage subsystem . 62 3.5 Tailoring content delivery for the tail . 71 3.6 Selective replication for viral workloads . 78 3.7 Conclusions . 89 vii viii CONTENTS 4 Data delivery properties of human contact networks 91 4.1 The Pocket Switched Network . 93 4.2 Setup and methodology . 97 4.3 Delivery over fixed number of contacts . 102 4.4 Delivery over fixed duration windows . 109 4.5 Understanding path delays . 115 4.6 Evaluating the effect of failures . 124 4.7 Multicast within communities . 134 4.8 Related work . 136 4.9 Design recommendations . 138 5 Reflections and future work 141 5.1 Summary of contributions . 142 5.2 A Unifying Social Information Plane . 143 5.3 On adding social network support at a systems level . 145 Bibliography 149 CHAPTER 1 Introduction 1.1 The content staging problem In this dissertation, we are interested in supporting “eyeball” information flows, where the consumer or target of the flow is a human being. In many eyeball information flows, the producer or source of the flow is also human, but this is not essential. Examples of such flows include instant-messaging, VoIP conversations, email, or a user-generated video being streamed online. A data communications network is only useful insofar as it is able to deliver content effectively to its consumers. Often, in eyeball information flows, the producer of the content and the consumers cannot be guaranteed to connect to the network infrastructure at the same time. In this situation, some intermediate infrastructure needs to play the key role of staging the content so that it becomes accessible to the consumer asynchronously. A simple, well understood staging mechanism is to deploy the content on a centralised server. The server increases the availability of the content in the temporal dimension by acting as an always-on proxy for the producer. This kind of asynchronous access is enabled by Bulletin Board Services (BBSes), news servers, online video and photo sharing sites, etc. However, a centralised server may be inadequate either because of spe- cial characteristics of the content, or because of inadequate support from 1 2 CHAPTER 1. INTRODUCTION the network itself1. For example, certain content, like rich-media video streams, have strict delivery constraints and require tight control over la- tency and jitter. This may be difficult to achieve with a centralised server and a globally distributed consumer base, given the best-effort nature of the current Internet. To resolve this, the producer needs to be proxied in space as well as in the time dimension, by mirroring the content in different parts of the network. Specialised global replication infrastructures called Content Delivery Networks (CDNs) have been developed for this purpose. Delay-tolerant networks like the Pocket Switched Network (PSN) [HCS+05] offer even lesser support and might not even guarantee any-any connectivity. In a PSN, content is opportunistically transmitted hop-by-hop over face- to-face contacts and connectivity is achieved over time through a sequence of social contacts. Clearly, each user involved as an intermediate node is mobile and can be thought of as increasing the availability of the content in a particular region of space-time where they are located, by acting as a proxy for the producer. Because of the extremely constrained connectivity, the coverage of individual proxies can be limited.