University of Cincinnati

UNIVERSITY OF CINCINNATI

Date: 5/29/2008

I, Chad Yoshikawa, hereby submit this work as part of the requirements for the degree of: Ph.D.
in: Computer Science

It is entitled: Adaptive Client-Server Load Balancing on Bounded-Degree Network Overlays

This work and its defense approved by:

Chair: Dr. Kenneth A. Berman
Dr. Fred S. Annexstein
Dr. Urmila Ghia
Dr. Matthew J. Grismer
Dr. Jerome Paul

Galaxy: Adaptive Client-Server Load Balancing on Bounded-Degree Network Overlays

A dissertation submitted to the Division of Research and Advanced Studies of the University of Cincinnati in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in the Department of Computer Science of the College of Engineering

June 2008

by

Chad Owen Yoshikawa

B.S., Carnegie Mellon University, 1994, Pittsburgh, PA
M.S., University of California, Berkeley, 1999, Berkeley, CA

Dissertation Adviser and Committee Chair: Kenneth A. Berman, Ph.D.

Abstract

In this work we describe the Galaxy system which enables communication between disconnected computers by providing a set of public waypoints which accept incoming data from sources and subsequently relay the data to destinations. A prototype of this system has been built which has a Windows Explorer frontend interface and Java-based relay server code deployed on the global PlanetLab network. The primary focus of this thesis is the theoretical analysis of the Galaxy distributed load-balancing algorithm which attempts to maximize the throughput of the system and provide a measure of fairness to each Galaxy client.
The analysis is carried out by first modeling the Galaxy system network as a bipartite graph $G = (U \cup V, E)$ where each communicating sender/receiver pair is a single node $u \in U$, each node $v \in V$ is a relay, and each edge $e = (u, v)$ represents a network connection between a client $u$ and a proximate relay $v$. We call degree-bounded bipartite graphs with maximum client out-degree $\Delta$ and minimum relay in-degree $\delta$ “$(\Delta, \delta)$ Dual Bounded”. We show that on $(\Delta, \delta)$ Dual Bounded networks the Galaxy load-balancing algorithm always reaches at least a fraction of optimal equal to $\min\left(1, \frac{\Delta^2 - \delta}{2\Delta^2 - \delta\Delta - \Delta}\right)$ and that this bound is tight. Further, we show that the Galaxy load-balancing algorithm converges in $O(|U|)$ rounds and achieves constant-competitiveness in only $O(\log \Delta)$ rounds. This result (1) suggests that we should minimize the number of relays to which each client may connect and maximize the number of clients to which each relay is connected, and (2) provides exact bounds on throughput and on the time required to reach convergence as functions of the structure of the network. In addition to being useful for our Galaxy system, the throughput bound given above is also applicable to other distributed Internet services which use TCP/IP.

Acknowledgements

First, I would like to thank Dr. Urmila Ghia and Dr. Kirti Ghia for being advisers to me during every phase of my life. They have guided me throughout high school, my college years, and back to the University of Cincinnati where I finally realized my dream of finishing a Ph.D. in computer science.

Next, this thesis would not have been possible without the many hours of time that my adviser and co-adviser have so graciously provided. Dr. Kenneth A. Berman always made time for me and my questions, and helped tremendously with the research contained in this thesis. During our long conversations on research I would often think to myself that ‘this is my favorite time at the University of Cincinnati’. Dr. Fred S. Annexstein also provided so much of his time brainstorming with me, finding holes in my proofs, and helping me with the theoretical analysis used in my research.

Without funding this research would not have been possible. I thank the Ohio Space Grant Consortium and Ohio Aeronautics Institute which funded the first three years of my graduate research. This funding opportunity was made possible through the efforts of Dr. Kirti Ghia, Dr. Urmila Ghia, and especially Dr. Gary Slater who worked on my behalf at the OSGC meeting to secure funding. I thank Dr. Ken DeWitt and Laura Stacko who have helped out tremendously during the many trips I made to the OSGC Symposiums. The Ohio Board of Regents funded the last two years of my research and also provided a much-needed equipment grant which funded the U.C. PlanetLab computers. I could not have gotten funding without the help of the University of Cincinnati OBR committee members, Dr. Jerry Paul and Dr. Raj Bhatnagar. Some of the experiments in this thesis were made possible by equipment provided through a National Science Foundation MRI equipment grant number 0521189. Also, the Emulab and PlanetLab consortium provided computer time for the networking experiments contained in this thesis.

On a personal level, I would also like to thank my family - my Mom & Dad, sister Nicolle, and extended family for everything they have provided to my wife and me over the years. Finally, I would like to thank my wife, Svetlana Strunjaš, for her help with mathematics, her thoughtfulness, and her companionship which helped me get through these years of thesis research and dissertation work. I look forward to starting our new life together.

Table of Contents

List of Latin Symbols
List of Greek Symbols
1 Introduction
  1.1 Application Level Relays
2 Galaxy
  2.1 PlanetLab
  2.2 Choosing Galaxy Relays
  2.3 Load Balancing
3 Problem Statement
  3.1 Maximizing Throughput with Max-Min Fairness
  3.2 Problem Statement
  3.3 Summary of Results
    3.3.1 Convergence Time
    3.3.2 Performance Upper Bound
    3.3.3 Performance Lower Bounds
    3.3.4 Expected Performance
  3.4 Throughput Bounds
  3.5 Convergence Bounds
4 Terminology
5 Aggressive Increase Algorithm
  5.1 Client Algorithm
  5.2 Server Algorithm
6 Experimental Results
  6.1 Simulation Results
    6.1.1 Heuristics
    6.1.2 Experimental Results
    6.1.3 Discussion
7 Rate of Convergence
  7.1 Analysis
    7.1.1 One Server
    7.1.2 All Servers
    7.1.3 Competitive Ratio Comparisons
8 Throughput Upper Bounds
  8.1 Upper Bound: MAXMIN
  8.2 Upper Bound: GREEDY
  8.3 Upper Bound: PROPORTIONAL
  8.4 Summary
9 Lower Bounds
  9.1 Sharp Bounds for Degree 2
    9.1.1 MAXMIN and ARBITRARY: 3/4 of Optimal
    9.1.2 GREEDY and PROPORTIONAL: 5/6 of Optimal
  9.2 Simple Lower Bound
  9.3 General Lower Bound
    9.3.1 Dual Bounded Graphs
    9.3.2 Throughput of $\mathcal{A}_{AI}$ on Dual Bounded Graphs
    9.3.3 Notation
    9.3.4 Server Flow
    9.3.5 Client Flow
    9.3.6 Flow Equations
    9.3.7 Upper Bound
  9.4 Nonlinear Program
  9.5 Summary
10 Additional Degree Restrictions
  10.1 Cross Ratios
  10.2 One-Sided Fixed
11 Probabilistic Analysis
  11.1 Lower Bound for δ
    11.1.1 Poisson Approximation
    11.1.2 Degree Distribution
12 Galaxy Frontend
  12.1 Background
  12.2 Architecture
    12.2.1 Shell Namespace Extension
  12.3 Filesystem Extensions
    12.3.1 NFS Extension
    12.3.2 Mirror Extension
  12.4 Benchmarks
    12.4.1 Experimental Setup
  12.5 Results
    12.5.1 NFS File Extension
    12.5.2 Galaxy Mirror Filesystem Client
13 Related Work
  13.1 Convergence Time of Flow Algorithms
  13.2 Constant Competitive Load Balancing
  13.3 Max Min Fairness
  13.4 Multipath Relays
  13.5 Networking Systems
  13.6 Software Routers
  13.7 Traffic Studies
  13.8 User-level Filesystems
    13.8.1 Interposition at System call
    13.8.2 Interposition inside the Kernel
    13.8.3 Interposition at the network protocol layer
    13.8.4 Interposition at the Namespace Extension Level
14 Conclusion
  14.1 Open Problems and Future Directions

List of Figures

1.1 Voice-over-IP Network Example
2.1 PlanetLab world map
2.2 PlanetLab node bandwidth capacities.
2.3 Symmetry of incoming vs. outgoing bandwidth on PlanetLab nodes
2.4 PlanetLab relay locations in the United States.
2.5 Picking relays using the lens metric.
2.6 Distance vs. Bandwidth from a University of Cincinnati PlanetLab node.
2.7 Bandwidth of small file Galaxy transfers between nodes in Cincinnati and Utah
2.8 Bandwidth of large file Galaxy transfers between nodes in Cincinnati and Utah
3.1 MaxMin Assignment Example
4.1 VOIP example represented as a bipartite graph.
5.1 Steps of the $\mathcal{A}_{AI}$ algorithm under MAXMIN arbitration for saturated clients.
5.2 Steps of the $\mathcal{A}_{AI}$ algorithm under GREEDY arbitration for saturated clients.
5.3 Steps of the $\mathcal{A}_{AI}$ algorithm under PROPORTIONAL arbitration for saturated clients.
6.1 performance on a Focal graph. ..
