Copyright by Tae Won Cho 2010

The Dissertation Committee for Tae Won Cho certifies that this is the approved version of the following dissertation:

Enabling Information-centric Networking: Architecture, Protocols, and Applications

Committee:
Yin Zhang, Supervisor
Mohamed Gouda
Raymond Mooney
Lili Qiu
K. K. Ramakrishnan

Enabling Information-centric Networking: Architecture, Protocols, and Applications

by

Tae Won Cho, B.S., M.A.

DISSERTATION
Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT AUSTIN
August 2010

To Christy, Brandon, and Claire.

Enabling Information-centric Networking: Architecture, Protocols, and Applications

Tae Won Cho, Ph.D.
The University of Texas at Austin, 2010

Supervisor: Yin Zhang

As the Internet becomes information-centric, network services increasingly demand scalable and efficient communication of information between a multitude of information producers and large groups of interested information consumers. Such information-centric services are growing rapidly in use and deployment. Examples of deployed information-centric services include IPTV, MMORPG, VoD, video conferencing, file sharing, software updates, RSS dissemination, online markets, and grid computing.

To effectively support future information-centric services, the network infrastructure for multi-point communication has to address a number of significant challenges: (i) how to understand massive information-centric groups in a scalable manner, (ii) how to analyze and predict the evolution of those groups accurately and efficiently, and (iii) how to disseminate content from information producers to a vast number of groups with potentially long-lived membership and highly diverse, dynamic group activity levels.
This dissertation proposes a novel architecture and protocols that effectively address the above challenges in supporting multi-point communication for future information-centric network services. In doing so, we make the following three major contributions:

(1) We develop a novel technique called Proximity Embedding (PE) that can approximate a family of path-ensemble-based proximity measures for information-centric groups. We also develop Clustered Spectral Graph Embedding (CSGE), which captures the essential structure of large graphs in a highly efficient and scalable manner. Our techniques help to explain the proximity (closeness) of users in information-centric groups, and can be applied to a variety of analysis tasks on complex network structures.

(2) Based on CSGE, we develop new supervision-based link prediction techniques called Clustered Spectral Learning (CSL) and Clustered Polynomial Learning (CPL) that enable us to predict the evolution of massive and complex network structures accurately and efficiently. By exploiting supervised information from past snapshots of network structures, our methods yield up to 20% improvement in link prediction accuracy compared to existing state-of-the-art methods.

(3) Finally, we develop a novel multicast infrastructure called Multicast with Adaptive Dual-state (MAD). MAD supports a large number of groups and group memberships, and provides efficient content dissemination in the presence of dynamic group activity. We demonstrate the effectiveness of our approach through extensive simulation and analysis, and through emulation using a real system implementation.

Acknowledgments

First of all, I would like to thank my advisor Yin Zhang. Yin has guided me through my entire PhD career, and he has been an excellent advisor at all times. In particular, he has exceptional vision in research and in selecting challenging problems. He also gave me extensive experience in formulating problems and developing solutions based on intuition.
I feel very fortunate to have worked with him.

I thank my mentors at AT&T Labs Research: K. K. Ramakrishnan and Divesh Srivastava. Through two summer internships and four years of collaboration with AT&T Labs Research, they have been excellent mentors. K. K. has great vision in research and extensive experience with real-world problems. He helped me greatly in designing a new multicast architecture and in formulating it as information-centric networking. Divesh provided invaluable insights based on his expertise in the database area. His sharp analytical skills have always inspired me. They gave me many helpful comments on positioning our work and, finally, on getting it published in a top-tier conference.

It has been a great pleasure to work with my co-authors Inderjit Dhillon, Han Hee Song, Berkant Savas, and Vacha Dave. Inderjit provided many helpful comments and insights drawn from his knowledge of data mining and mathematics. Han is a good friend and an excellent colleague to work with. Berkant helped me a lot in understanding and solving tough problems using his expertise in data mining and scientific computing. Vacha has been a great asset to our team. She is very good at collecting and analyzing tremendous amounts of network data.

I would like to thank my other committee members Mohamed Gouda, Raymond Mooney, and Lili Qiu. Professor Gouda gave me many helpful comments on network protocols and architectures. Raymond showed great interest in the link prediction problem, and I received very good feedback drawing on his experience and knowledge in the machine learning field. Lili is an excellent researcher in the wireless area. Her passion and hard work in research have motivated me a lot. She also gave me many helpful comments on writing papers.

Without my colleagues and lab members, I could not have survived graduate school. I want to thank Ajay Mahimkar, Upendra Shevade, Mikyoung Han, Eric Rozner, and the LASR group members.
I also thank the lab alumni Jayaram Mudigonda, Ravi Kokku, Taylor Riche, and Umamaheswararao Karyampudi. They helped me set up the lab environment and understand research problems early in my graduate career.

Finally, I dedicate my dissertation to my loving wife Christy, my mischievous son Brandon, and my beautiful daughter Claire. They have always supported me with love and motivated me to move forward in my life.

Table of Contents

List of Tables xii
List of Figures xiii
Chapter 1. Introduction 1
1.1 Challenges 3
1.2 Approach 5
1.2.1 Information-centric Group Formation 5
1.2.2 Scalable Proximity Embedding 6
1.2.3 Supervised Link Prediction 6
1.2.4 MAD: Multicast with Adaptive Dual-state 7
1.3 Outline 9
Chapter 2. Information-centric Group 10
2.1 Information-centric Group Formation 10
2.1.1 User-based Group 10
2.1.1.1 Online Social Network 11
2.1.2 Content-based Group 12
2.1.2.1 Multicast 12
Chapter 3. Scalable Proximity Embedding 14
3.1 Introduction 14
3.2 Background 16
3.2.1 Proximity Measures 16
3.2.2 Spectral Graph Embedding 18
3.2.3 Graph Clustering 19
3.3 Proximity Embedding 21
3.3.1 Problem Formulation 21
3.3.2 Scalable Proximity Inversion 23
3.3.2.1 Preparation 23
3.3.2.2 Proximity Embedding 24
3.4 Clustered Spectral Graph Embedding 27
3.4.1 Proposed Algorithm 27
3.4.2 Advantages of Our Approach 29
3.4.3 Scalability Analysis 32
3.4.4 Proximity Estimation Using CSGE 34
3.5 Evaluation 36
3.5.1 Dataset Description 36
3.5.2 Graph Clustering 38
3.5.3 Scalability 38
3.5.4 Proximity Estimation 42
3.5.4.1 Evaluation Methodology 43
3.5.4.2 Estimating Proximity Metrics 44
3.6 Related Work 48
Chapter 4. Supervised Link Prediction 50
4.1 Introduction 50
4.2 Our Approach 52
4.2.1 Problem Setup 52
4.2.2 Supervised Learning Metric 54
4.2.3 Training and Testing of Link Predictors 56
4.2.4 Alignment 57
4.3 Link Prediction Evaluation 58
4.3.1 Dataset Description 58
4.3.2 Evaluation Methodology 59
4.3.3 Scalability Evaluation 60
4.3.4 Accuracy Evaluation 62
Chapter 5. MAD: Multicast with Adaptive Dual-state 67
5.1 Introduction 67
5.1.1 Requirements of Information-centric Network Services 68
5.1.2 MAD Approach and Contributions 70
5.2 Related Work and Limitations 72
5.3 MAD Overview 75
5.4 MAD Protocol Design 79
5.4.1 Mode Transition 80
5.4.2 Failure Recovery 81
5.5 Scaling of MAD Trees 82
5.5.1 Simulation Evaluation 82
5.5.1.1 Simulation Setup 82
5.5.1.2 Simulation Results 84
5.5.2 Formal Analysis of State Requirement 86
5.6 Evaluation of Implementation 91
Chapter 6. Conclusions and Future Work 96
6.1 Contributions 96
6.2 Future Work 98
Bibliography 101
Vita 107

List of Tables

3.1 Summary of Proximity Measures 35
3.2 Summary of Online Social Network Characteristics 37
3.3 Clustering Results 38
3.4 Preparation Time of Graph Embedding Algorithms (r = 100) 39
3.5 Query Time of Proximity Estimation Algorithms (r = 100, 0.6 million samples) 39
3.6 Clustered Spectral Graph Embedding Computation Time and Memory Usage (0.6 million samples) 42
4.1 Preparation Time of Graph Embedding Algorithms (r = 100) 60
4.2 Query Time of Proximity Estimation Algorithms