Utilizing Distributed Resources for Content Delivery and User Collaboration
Total Page:16
File Type:pdf, Size:1020Kb
FROM PEERS TO THE CLOUD: UTILIZING DISTRIBUTED RESOURCES FOR CONTENT DELIVERY AND USER COLLABORATION by Haiyang Wang M.Eng., Tongji University, 2006 B.Sc., Jiangxi Normal University, 2003 a Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Computing Science Faculty of Applied Sciences c Haiyang Wang 2013 SIMON FRASER UNIVERSITY Summer 2013 All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for “Fair Dealing.” Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately. APPROVAL Name: Haiyang Wang Degree: Doctor of Philosophy Title of Thesis: From Peers to the Cloud: Utilizing Distributed Resources for Content Delivery and User Collaboration Examining Committee: Dr. Janice Regan Chair Dr. Jiangchuan Liu, Senior Supervisor Associate Professor Dr. Mohamed Hefeeda, Supervisor Associate Professor Dr. Qianping Gu, SFU Examiner Professor Dr. Kui Ren, External Examiner Associate Professor, University at Buffalo The State University of New York Date Approved: ii Partial Copyright Licence Acknowledgments First, I give my foremost gratitude to my senior supervisor Dr. Jiangchuan Liu. His significant enlightenment, guidance and encouragement on my research are invaluable for the success of my PhD Degree in the past six years. He is the one who totally changes the path of my life and gives me the abilities to finally find a job in academia. I love you, JC! We all love you! You are simply the best supervisor ever! I thank Dr. Ke Xu, for his great support and guidance since the year 2004. He is always like a mentor to me, and I sincerely cannot imagine how my life would be without his help. My gratitude also goes to Dr. Mohamed Hefeeda, Dr. Qianping Gu and Dr. Kui Ren for serving on the thesis examining committee. I thank them for their precious time reviewing my thesis and for advising me on improving this thesis. I would like to thank Dr. Janice Regan for chairing my PhD thesis defence. I thank Dr. Feng Wang, and Dr. Dan Wang, for helping me with the research as well as many other problems during my PhD study. I thank Mrs. Ji Xu, for giving us wonderful group parties every year. I just want to let her know that we really appreciate it. She is always our nice colleague, good friend and the best hostess! I thank my colleagues and friends at Simon Fraser University, as well as ex-colleagues whom I worked with. Although I am not listing your names here, I am deeply indebted to you. Last but certainly not least, I thank my family for their love, care, and support: my dearest wife Qi, my parents and my grandparents. I sincerely hope that I have made them proud of my achievements today. This thesis is dedicated to you all. iii Abstract In this thesis, we tackle the problem of content delivery and user collaboration with emerging Internet technologies. Our investigation starts from peer-to-peer (P2P) sharing with social relations to contemporary cloud computing with flexible resource provisioning. We seek to leverage distributed resources for efficient sharing and collaboration, which leads to a hybrid system design that seamlessly bridges users’ local resources to public datacenters. We first explore social-network-based optimizations in peer-to-peer content delivery. We give solid evidences that long-term social relations can be found and applied to enhance the sharing efficiency in peer-to-peer networks, and present practical implementation strategies for the popular BitTorrent system. We then investigate the performance of cloud-based file synchronization applications and identify the bottlenecks in their system design, in particular, the task interferences. We propose an interference-aware provisioning algorithm, which effectively mitigates the problem. We further examine the users’ interactions in state-of-the-art cloud-based distributed interactive applications. We find that, despite the benefit in terms of cost savings and better scalability, the cloud-based deployment greatly increases the users’ interaction latency. We demonstrate that a smart assignment algorithms for virtual machines can remarkably reduce such latency. Finally, we present a real-world system design that effectively bridges users’ local resources to enterprise cloud platforms. Our measurements as well as system analysis indicate that it serves as a complement of great potentials to enterprise cloud services. iv Contents Approval ii Acknowledgments iii Abstract iv Contents v List of Tables viii List of Figures ix 1 Introduction 1 1.1 RelatedWorks ................................... 3 1.1.1 P2P-based Content Delivery . 3 1.1.2 Cloud-based User Collaboration and File Synchronization . 4 1.1.3 Cloud-based Distributed Interactive Applications . ........... 5 1.1.4 Cloud Computing and Customer Resources . ... 6 1.2 OrganizationoftheThesis. .... 7 2 Accelerating P2P with Social Relations 8 2.1 Introduction.................................... 9 2.2 BackgroundandMeasurementScheme . 10 2.2.1 BitTorrent and Social Applications . ..... 10 2.2.2 MeasurementScheme . .. .. .. .. .. .. .. .. 11 2.3 CommonInterestsAmongPeers . 12 2.4 Whether Peers Can Meet Each Other Again? . 13 v 2.5 How to Identify Socially Active Peers? . ....... 15 2.5.1 TraceAnalysis ............................... 15 2.5.2 Hadamard Transform based Social Index . 17 2.6 Can Social Networks Accelerate Content Sharing? . ......... 19 2.6.1 Collaboration Among BT Peers: A Simple Solution . ..... 19 2.6.2 Evaluation ................................. 20 2.6.3 PerformancewithaHybridSystem. 21 2.7 Summary ...................................... 23 3 Resource Provisioning for Cloud User Collaboration 24 3.1 Introduction.................................... 25 3.2 Dropbox Design: Measurement and Analysis . ...... 26 3.2.1 Background................................. 26 3.2.2 DropboxProtocolAnalysis . 27 3.2.3 Overhead of Collaboration/Synchronization . ........ 29 3.3 Interferences Between Bandwidth-intensive and CPU-intensive Tasks . 30 3.4 Revisiting Instance Selection for Dropbox . ......... 32 3.4.1 Motivation ................................. 32 3.4.2 Problem Formulation and Solution . 33 3.5 PerformanceEvaluation . 37 3.6 FurtherDiscussions.. .. .. .. .. .. .. .. .. 39 3.7 Summary ...................................... 40 4 Latency Minimization in Cloud-Based User Interaction 43 4.1 Introduction.................................... 44 4.2 Cloud-Based DIA: Background and Framework . ...... 46 4.3 NetworkLatencyinCDIA. .. .. .. .. .. .. .. 48 4.4 Latency Optimization in CDIA: Basic Model and Solution . .......... 50 4.4.1 TheBasicCDIALatencyProblem . 50 4.4.2 AnOptimalSolution............................ 52 4.5 ProcessingLatencyonCloudProxy . 53 4.5.1 MeasurementofResponseTime. 53 4.6 Latency Optimization in CDIA: An Enhanced Model . ....... 55 4.7 PerformanceEvaluation . 57 vi 4.8 Summary ...................................... 61 5 Customer-Provided Resources for Cloud Computing 62 5.1 Introduction.................................... 63 5.2 Enabling Customer Resources for Cloud: An Overview . ......... 65 5.3 Provisioning across Dynamic Customer-Provided Resources .......... 66 5.3.1 TheResourceProvisioningProblem . 66 5.3.2 Optimal Provisioning with Dynamic Resources . ...... 68 5.4 Pricing with Heterogeneous Resource Providers . .......... 71 5.5 SpotCloud: A Practical System Implementation . ......... 73 5.6 Lease- vs Migration-Cost: A Closer Look . ...... 78 5.7 Summary ...................................... 82 6 Conclusion and Future Work 83 Bibliography 86 vii List of Tables 3.1 CPU benchmark running time under different traffic loads on virtualized EC2 smallinstance.................................... 32 viii List of Figures 2.1 Sampledgraphvisualization . ..... 11 2.2 An illustration to show the existence of highly overlappedpeers. 13 2.3 # of peers that encountered with peer# 313 . ...... 13 2.4 #ofpeerencounters............................... 14 2.5 Length of overlap time slots between peer pair samples (in Twitter swarms) . 14 2.6 The autocorrelation function for the data in 6 hours . .......... 16 2.7 Amplitude distribution of peer#117 and peer#313 . ......... 16 2.8 Randomnessofpeers’onlinebehaviors . ....... 17 2.9 CDFofsocialindex ................................ 17 2.10Completiontime ................................. 19 2.11Startupdelay ................................... 19 2.12 Downloadingrate................................ 20 2.13 Downloading completion time (larger swarm) . ......... 20 2.14 Downloading completion time (mixed swarm) . ........ 22 2.15 Startupdelay(mixedswarm). ..... 22 3.1 Dropboxframework ................................ 27 3.2 Synchronizationdelay . 28 3.3 Increasing of CPU benchmark on small instance . ....... 29 3.4 Increasing of CPU benchmark on large instance . ........ 30 3.5 Anexamplefortheresourceprovisioning . ....... 33 3.6 Comparison of average synchronization delay . ......... 36 3.7 Reducing of average synchronization delay . ......... 38 3.8 Comparison of total synchronization delay(total delay for Dropbox to fulfill alltheuserdemands) ............................... 39 ix 3.9 Comparisonofrentingcost . 40 3.10 Algorithm to compute the load assignment matrix. .......... 42 4.1 BasicFrameworkofGaikai . 46 4.2 Path of client interaction . ..... 48 4.3 Time cost between user and