Scalable and Efficient Self-configuring Networks

Changhoon Kim [email protected]

Today's Networks Face New Challenges
• Networks growing rapidly in size
  – Tens of thousands to millions of hosts
• Networking environments getting highly dynamic
  – Mobility, traffic volatility, frequent re-adjustment
• Networks increasingly deployed in non-IT industries and developing regions
  – Limited support for management

Networks must be scalable and yet easy to build and manage!

Pains In Large Dynamic Networks
• Control-plane overhead to store and disseminate host state
  – Millions of host-info entries, while commodity switches/routers can store only ~32K/300K entries
  – Frequent state updates, each disseminated to thousands of switches/routers
• Status quo: Hierarchical addressing and routing
• Victimizes ease of management
  – Leads to complex, high-maintenance, hard-to-debug, and brittle networks

More Pains …
• Limited data-plane capacity
  – Tbps of traffic workload on core links, while the fastest links today are only 10 ~ 40 Gbps
  – Highly volatile traffic patterns
• Status quo: Over-subscription
• Victimizes efficiency (performance)
  – Lowers server utilization and end-to-end performance
  – Trying to mitigate this via traffic engineering can make management harder

All We Need Is Just A Huge L2 Switch, or An Abstraction of One!
• Host up to millions of hosts
• Obviate manual configuration
• Ensure high capacity and small end-to-end latency

[Diagram: the huge-L2-switch abstraction built over many routers (R) and switches (S), annotated with the three goals: Scalability, Efficiency (Performance), and Self-Configuration.]

Research Strategy and Goal
• Emphasis on architectural solutions
  – Redesign underlying networks
• Focus on edge networks
  – Enterprise, campus, and data-center networks
  – Virtual private networks (VPNs)

[ Research Goal ] Design, build, and deploy architectural solutions enabling scalable, efficient, self-configuring edge networks

Why Is This Particularly Difficult?

• Scalability, self-configuration, and efficiency can conflict!
  – Self-configuration can significantly increase the amount of state and make traffic forwarding inflexible
  – Examples: Ethernet bridging, VPN routing

Present solutions to resolve this impasse

Universal Hammers

Flat Addressing (for self-configuration)
• Utilize location-independent names to identify hosts
• Let the network self-learn hosts' info

Traffic Indirection (for scalability)
• Deliver traffic through intermediaries (chosen systematically or randomly)

Usage-driven Optimization (for efficiency)
• Populate routing and host state only when and where needed

Solutions For Various Edge Networks

Enterprise/Campus Network
• Self-configuring semantics: Config-free addressing and routing
• Main scalability bottleneck: Huge overhead to store and disseminate individual hosts' info
• Key technique in a nutshell: Partition hosts' info over switches
• Solution: SEATTLE [SIGCOMM'08]

Data-Center Network
• Self-configuring semantics: Config-free addressing, routing, and traffic engineering
• Main scalability bottleneck: Limited server-to-server capacity
• Key technique in a nutshell: Spread traffic randomly over multiple paths
• Solution: VL2 [SIGCOMM'09]

Virtual Private Network
• Self-configuring semantics: Config-free site-level addressing and routing
• Main scalability bottleneck: Expensive memory to store customers' info
• Key technique in a nutshell: Keep routing info only when it's needed
• Solution: Relaying [SIGMETRICS'08]

Strategic Approach for Practical Solutions
Aim for a workable solution with sufficient improvement, positioned along the spectrum from clean slate to backwards compatible (SEATTLE and VL2 toward the clean-slate end, Relaying toward the backwards-compatible end).

Enterprise/Campus Network (SEATTLE)
• Technical obstacle: Heterogeneity of end-host environments
• Approach: Clean slate on the network
• Evaluation: Several independent prototypes

Data-Center Network (VL2)
• Technical obstacle: Lack of programmability at switches and routers
• Approach: Clean slate on end hosts
• Evaluation: Real-world deployment

Virtual Private Network (Relaying)
• Technical obstacles: Immediate deployment; transparency to customers
• Approach: Backwards compatible
• Evaluation: Pre-deployment evaluation

SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

Work with Matthew Caesar and Jennifer Rexford

Current Practice in Enterprises
A hybrid architecture composed of several small Ethernet-based IP subnets interconnected by routers

[Diagram: routers (R) interconnecting several LANs/VLANs; each IP subnet is one Ethernet broadcast domain (LAN or VLAN).]
• Loss of self-configuring capability
• Complexity in implementing policies
• Limited mobility support
• Inflexible route selection

Need a protocol that combines only the best of IP and Ethernet

Objectives and Solutions

Objective                             Approach                                         Solution
1. Avoid flooding                     Resolve host info via unicast                    Network-layer one-hop DHT
2. Restrain broadcasting              Bootstrap hosts via unicast                      Network-layer one-hop DHT
3. Reduce routing state               Populate host info only when and where needed    Traffic-driven resolution with caching
4. Enable shortest-path forwarding    Allow switches to learn topology                 L2 link-state routing maintaining only switch-level topology

* Meanwhile, avoid modifying end hosts

Network-layer One-hop DHT
• Switches maintain <key, value> pairs using a common hash function F
  – F: Consistent hash mapping a key to a switch
  – LS routing ensures each switch knows all the other live switches, enabling one-hop DHT operations
• Unique benefits
  – Fast, efficient, and accurate reaction to churn
  – Reliability and capacity naturally grow with network size
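To make the mapping concrete, here is a minimal consistent-hashing sketch (the switch names, MAC address, and the use of SHA-1 with virtual nodes are illustrative assumptions, not SEATTLE's exact mechanism). Every switch that knows the live switch set computes the same key-to-resolver mapping, so publishing or resolving a host's info takes a single unicast hop.

```python
# Minimal sketch of a network-layer one-hop DHT via consistent hashing (illustrative only).
import hashlib
from bisect import bisect_right

class OneHopDHT:
    """Maps a key (e.g., a host's MAC address) to its resolver switch."""

    def __init__(self, live_switches, vnodes=16):
        # Place each live switch (learned via link-state routing) at several
        # points on a hash ring; virtual nodes smooth out load imbalance.
        self.ring = sorted(
            (self._hash(f"{sw}#{i}"), sw)
            for sw in live_switches
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def resolver(self, key):
        """Return the switch responsible for storing <key, value>."""
        h = self._hash(key)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

# Every switch computes the same mapping independently, so a lookup or a
# publication is a single unicast hop to the resolver switch.
dht = OneHopDHT(["sw-A", "sw-B", "sw-C", "sw-D"])
print(dht.resolver("00:1a:2b:3c:4d:5e"))
```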

Location Resolution

[Diagram: host x attaches to its owner switch A. Through host discovery, A computes the hash F(MACx) = B and publishes <MACx, A> to resolver switch B, which stores the entry. When host y (attached to switch D) sends a frame to MACx, D also computes F(MACx) = B and forwards the frame to resolver B, which delivers it toward A and notifies D of x's location. Legend: switches, end hosts, control messages, data traffic.]

Handling Host Dynamics

• Host location, MAC-addr, or IP-addr can change

Dealing with host mobility

[Diagram: host x moves from its old location (switch A) to a new location (switch D). The new location is re-published, and the stale <x, A> entries at the resolver and at switches that had been talking with x are updated to <x, D>.]

MAC- or IP-address change can be handled similarly
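A tiny continuation of the earlier sketch, purely illustrative, showing why mobility is cheap under the one-hop DHT: a host's info lives at exactly one resolver, determined by the hash of its MAC address, so only that entry needs to be overwritten when the host moves.

```python
# Illustrative continuation of the OneHopDHT sketch above (hypothetical names).
host_info = {}                           # at the resolver: MAC -> current owner switch

def publish(mac, owner_switch):
    """Called by the owner switch after host discovery or after a host moves."""
    host_info[mac] = owner_switch        # insert or overwrite the single entry in place

publish("00:1a:2b:3c:4d:5e", "sw-A")     # original location
publish("00:1a:2b:3c:4d:5e", "sw-D")     # re-publication after the host moves
print(host_info)                          # {'00:1a:2b:3c:4d:5e': 'sw-D'}
```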

Further Enhancements Implemented

Goals                                           Solutions
Handling network dynamics                       Host-info re-publication and purging
Isolating host groups (e.g., VLAN)              Group-aware resolution
Supporting link-local broadcast                 Per-group multicast
Dealing with switch-level heterogeneity         Virtual switches
Ensuring highly-available resolution service    Replicating host information
Dividing administrative control                 Multi-level one-hop DHT

Prototype Implementation
• Link-state routing: XORP OSPFD
• Host-info management and traffic forwarding: Click

[Diagram: a SeattleSwitch prototype. A user-level XORP OSPF daemon exchanges link-state advertisements and maintains the network map; host-info publication and resolution messages cross the user/kernel boundary to a kernel-level Click module containing the Ring Manager, Host-Info Manager, and Switch Table, which forwards data frames.]

Amount of Routing State

[Plot: routing-table size vs. number of hosts h. Ethernet grows as y ≈ 34.6h, SEATTLE with caching as y ≈ 2.4h, and SEATTLE without caching as y ≈ 1.6h.]

SEATTLE reduces the amount of routing state by more than an order of magnitude

Cache Size vs. Stretch
Stretch = actual path length / shortest path length (in latency)

[Plot: stretch vs. cache size, comparing SEATTLE against ROFL [SIGCOMM'06].]

SEATTLE offers near-optimal stretch with a very small amount of routing state

SEATTLE Conclusion
• Enterprises need a huge L2 switch
  – Config-free addressing and routing, support for mobility, and efficient use of links
• Key lessons
  – Coupling a DHT with link-state routing offers huge benefits
  – Reactive resolution and caching ensure scalability

Flat Addressing: MAC-address-based routing
Traffic Indirection: Forwarding through resolvers
Usage-driven Optimization: Ingress caching, reactive cache update

Further Questions

• What other kinds of networks need SEATTLE?

• What about other configuration tasks?

• What if we were allowed to modify hosts?

These motivate my next work for data centers

VL2: A Scalable and Flexible Data-Center Network

Work with Albert Greenberg, Navendu Jain, Srikanth Kandula, Dave Maltz, Parveen Patel, and Sudipta Sengupta

Data Centers
• Increasingly used for non-customer-facing decision-support computations
• Many of them will soon be outsourced to cloud-service providers
• Demand for large-scale, high-performance, cost-efficient DCs is growing very fast

Tenets of Cloud-Service Data Centers

• Scaling out: Use large pools of commodity resources
  – Achieves reliability, performance, and low cost
• Agility: Assign any resource to any service
  – Increases efficiency (statistical multiplexing gain)
• Low management complexity: Self-configuring
  – Reduces operational expenses, avoids errors

Scalability, efficiency, self-configuration: the conventional DC network ensures none of these

Status Quo: Conventional DC Network

[Diagram: the Internet feeds Core Routers (CR) at DC-Layer 3, which connect to Access Routers (AR), then to L2 switches (S) and load balancers (LB) at DC-Layer 2, and finally to racks of application servers (A); roughly 1,000 servers per pod, with each pod an IP subnet. Key: CR = Core Router (L3), AR = Access Router (L3), S = Switch (L2), LB = Load Balancer, A = Rack of app servers. Reference: "Data Center: Load Balancing Data Center Services", Cisco 2004.]

• Limited server-to-server capacity
• Poor utilization and reliability
• Fragmentation of resources
• Dependence on proprietary, high-cost solutions

Status Quo: Traffic Patterns in a Cloud
• Instrumented a cluster used for data mining, then computed representative traffic matrices
• Traffic patterns are highly divergent
  – A large number (100+) of representative TMs is needed to cover a day's traffic
• Traffic patterns are unpredictable
  – No representative TM accounts for more than a few hundred seconds of traffic

Optimization approaches might cause more trouble than benefit

Objectives and Solutions

Objective                                          Approach                                       Solution
1. Ensure layer-2 semantics                        Employ flat addressing                         Name-location separation & resolution service
2. Offer uniform high capacity between servers     Guarantee bandwidth for hose-model traffic     Random traffic indirection (VLB) over a topology with huge aggregate capacity
3. Avoid hot spots w/o frequent reconfiguration    Use randomization to cope with volatility      (same solution as 2)

* Embrace end systems!

Addressing and Routing: Name-Location Separation
Cope with host churn with very little overhead

Switches run link-state routing and maintain only switch-level topology
• No LSA flooding for host info
• No broadcast msgs to update hosts
• No host or switch reconfiguration

[Diagram: a directory-based Resolution Service stores name-to-location mappings (x at ToR2, y at ToR3, z at ToR4, updated when hosts move). A sender looks up the destination's current ToR (lookup and response) and its packets are encapsulated toward that ToR; hosts use flat names.]
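To make the lookup-and-encapsulate step concrete, here is a minimal, hypothetical sketch (the addresses, dictionary-based directory, and function names are illustrative; the deployed system uses a replicated directory service and an in-kernel agent, as the Implementation slide notes):

```python
# Illustrative sketch of VL2-style name-location separation.
# The directory maps a host's application address (AA) to the locator address (LA)
# of the ToR switch it currently sits behind; senders encapsulate toward that LA.

directory = {
    "10.0.0.55": "20.0.3.1",   # AA of host y -> LA of ToR3 (example values)
    "10.0.0.66": "20.0.3.1",   # AA of host z -> LA of ToR3
}

def send(dst_aa, payload):
    tor_la = directory[dst_aa]            # lookup (cached after the first query)
    return {"outer_dst": tor_la,          # outer header: toward the destination ToR
            "inner_dst": dst_aa,          # inner header: the flat name of the host
            "data": payload}              # the ToR decapsulates and delivers locally

# When host z moves behind ToR4, only the directory entry changes; z keeps its AA,
# so applications and switches need no reconfiguration.
directory["10.0.0.66"] = "20.0.4.1"
print(send("10.0.0.66", b"hello"))
```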

Example Topology: Clos Network
Offer huge aggregate capacity at modest cost

[Diagram: a Clos network with D/2 intermediate switches (Int), each with K 10G ports; K aggregation switches (Aggr), each with D/2 x 10G uplinks and D/2 x 10G downlinks; DK/4 ToR switches, each with 2 x 10G uplinks and 20 servers; 20(DK/4) servers in total.]
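A quick back-of-the-envelope sketch of how the figure's parameters translate into scale (the example values of D and K below are hypothetical; only the port-count relationships come from the figure):

```python
# Sizing sketch for the Clos topology in the figure.
# D = number of 10G ports per aggregation switch (D/2 up, D/2 down);
# K = number of aggregation switches (= 10G ports per intermediate switch).

def clos_size(D, K, servers_per_tor=20, link_gbps=10):
    n_int = D // 2                               # intermediate switches
    n_tor = D * K // 4                           # ToR switches, each with 2 x 10G uplinks
    servers = servers_per_tor * n_tor
    aggr_capacity_gbps = n_int * K * link_gbps   # all Int-to-Aggr links combined
    return {"intermediate": n_int, "aggregation": K, "tor": n_tor,
            "servers": servers, "aggregate_capacity_gbps": aggr_capacity_gbps}

# Hypothetical example: 144-port aggregation switches and 144 of them.
print(clos_size(D=144, K=144))
```

With these example values the fabric hosts roughly 100K servers with about 1 Gbps of core capacity per server, in line with the "over 100K servers w/o oversubscription" claim later in the deck.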

Traffic Forwarding: Random Indirection
Cope with arbitrary TMs with very little overhead

[ IP anycast + flow-based ECMP ]

[Diagram: intermediate switches I1, I2, and I3 share an anycast address IANY; ToRs T1-T6 sit below. A packet from host x to host y is sent up toward IANY (flow-based ECMP picks one intermediate and one up path) and then down to the destination ToR. Legend: links used for up paths, links used for down paths.]

• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
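A minimal sketch of the per-flow decision, under the assumption that the sender encapsulates each flow toward a randomly chosen intermediate (via the shared anycast locator) and that switches hash the flow's 5-tuple for ECMP; the switch names and hash choice are illustrative:

```python
# Illustrative sketch of VLB-style random indirection with flow-based ECMP.
# Every flow is "bounced" off one intermediate switch chosen by a flow hash, so any
# traffic matrix gets spread evenly over the Clos fabric without reconfiguration.
import hashlib

INTERMEDIATES = ["I1", "I2", "I3"]    # switches sharing the anycast locator IANY

def ecmp_choice(candidates, flow_5tuple):
    """Deterministic per-flow choice: the same flow always takes the same path,
    which keeps TCP segments in order while different flows spread out."""
    digest = hashlib.md5(repr(flow_5tuple).encode()).hexdigest()
    return candidates[int(digest, 16) % len(candidates)]

def forward(flow_5tuple, dst_tor):
    bounce = ecmp_choice(INTERMEDIATES, flow_5tuple)
    # Two-level encapsulation: up to the chosen intermediate, then down to the ToR.
    return [bounce, dst_tor]

print(forward(("10.0.0.5", 34512, "10.0.0.55", 80, "tcp"), "ToR3"))
```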

Implementation

Switches
• Commodity Ethernet ASICs
• Custom settings for line-rate decapsulation
• Default buffer-size settings
• No QoS or flow control

Directory service
• Replicated state-machine (RSM) servers and lookup proxies
• Various distributed-systems techniques

App servers
• Custom Windows kernel for encapsulation & directory-service access

Data-Plane Evaluation
• Ensures uniform high capacity
  – Offered various TCP traffic patterns, then measured overall and per-flow goodput

                          VL2       Fat Tree                           DCell
Goodput efficiency        93+%      75+% (w/o opt), 95+% (with opt)    40 ~ 60%
Fairness between flows    0.995§    n/a                                n/a

§ Jain's fairness index, defined as (∑ xᵢ)² / (n ∙ ∑ xᵢ²)

• Works nicely with real traffic as well
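For reference, a small sketch of the metric itself (an index of 1.0 means all flows get equal shares):

```python
# Jain's fairness index over per-flow goodputs: (sum x_i)^2 / (n * sum x_i^2).
def jain_fairness(xs):
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))

print(jain_fairness([9.8, 10.1, 10.0, 9.9]))   # ~0.9999: nearly perfect fairness
print(jain_fairness([19.0, 1.0, 1.0, 1.0]))    # ~0.33: one flow dominates
```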

[Plot: Jain's fairness index of the Aggr-to-Int links' utilization over time (0 to 500 s), with the y-axis spanning 0.80 to 1.00.]

VL2 Conclusion
• Cloud-service DCs need a huge L2 switch
  – Uniform high capacity, oblivious TE, L2 semantics
• Key lessons
  – Hard to outsmart haphazardness; tame it with dice
  – Recipe for success: intelligent hosts + a rock-solid network built with proven technologies

Flat Addressing: Name-location separation
Traffic Indirection: Random traffic spreading (VLB + ECMP)
Usage-driven Optimization: Utilizing ARP, reactive cache update

Relaying: A Scalable VPN Routing Architecture

Work with Alex Gerber, Shubho Sen, Dan Pei, and Carsten Lund

Virtual Private Network
• Logically-isolated communication channels for corporate customers, overlaid on the provider backbone
  – Direct any-to-any reachability among sites
  – Customers can avoid full-meshing by outsourcing routing

[Diagram: customer sites 1, 2, and 3 attach to Provider-Edge (PE) routers at the border of the provider backbone.]

VPN Routing and Its Consequence

Site-level Flat Addressing: Virtual PEs (VPEs) self-learn and maintain full routing state in the VPN (i.e., routes to every address block used in each site)

Memory footprint of a VPE (forwarding table size)

[Diagram: PE1-PE4 each host virtual PEs (VPEs) for VPNs X, Y, and Z; every VPE's forwarding table adds to the memory footprint of its PE.]

Mismatch in Usage of Router Resources

[Diagram: a PE hosting VPEs for VPNs X, Y, and Z, with most of its ports (network interfaces) unused.]

• Memory is full, whereas lots of ports are still unused
• Revenue is proportional to provisioned bandwidth
• A large VPN with a thin connection per site is the worst case
• Unfortunately, there are many such worst cases
• Providers are seriously pinched

Key Questions
• What can we do better with existing resources and capabilities only, while maintaining complete transparency to customers?

• Do we really need to provide direct reachability for every pair of sites?
  – Even when most (84%) PEs communicate only with a small number (~10%) of popular PEs …

Relaying Saves Memory
• Each VPN has two different types of PEs
  – Hubs: Keep full routing state of the VPN
  – Spokes: Keep local routes and a single default route to a hub
• A spoke uses one hub consistently for all non-local traffic (indirect forwarding)

[Diagram: PE1-PE3 host hub and spoke VPEs for VPNs X, Y, and Z (each VPN has one hub, e.g., Hub-X on PE1, Hub-Y on PE2, Hub-Z on PE3); spoke-to-spoke traffic is forwarded indirectly through the VPN's hub.]
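A tiny sketch of the resulting forwarding state, with hypothetical prefixes: a spoke keeps only its local routes plus one default pointing at its hub, while the hub keeps the VPN's full table.

```python
# Illustrative hub vs. spoke forwarding state under Relaying (hypothetical prefixes).
hub_routes = {                    # a hub keeps routes to every address block in the VPN
    "10.1.0.0/16": "site-1",
    "10.2.0.0/16": "site-2",
    "10.3.0.0/16": "site-3",
    # ... hundreds more entries in a large VPN
}

spoke_routes = {                  # a spoke keeps local routes plus one default to its hub
    "10.2.0.0/16": "site-2",      # the locally attached site
    "0.0.0.0/0": "hub-PE1",       # everything else is relayed through the hub
}

def next_hop(table, dst_prefix):
    # Longest-prefix matching is elided; the point is the difference in table size.
    return table.get(dst_prefix, table.get("0.0.0.0/0"))

print(next_hop(spoke_routes, "10.3.0.0/16"))   # 'hub-PE1': indirect forwarding
print(next_hop(hub_routes, "10.3.0.0/16"))     # 'site-3': the hub knows the direct route
```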

Real-World Deployment Requires A Mgmt-Support Tool
• Two operational problems to solve
  – Hub selection: Which PEs should be hubs?
  – Hub assignment: Which hub should a given spoke use?
• Constraint
  – The stretch penalty must be bounded to keep SLAs
• Solve the problems individually for each VPN
  – Hub selection and assignment decisions for a VPN are totally independent of those for other VPNs
  – Ensures both simplicity and flexibility

Latency-Constrained Relaying (LCR)
• Notation
  – PE set: P = {1, 2, ..., n}
  – Hub set: H ⊆ P
  – The hub of PE i: hub(i) ∈ H
  – Usage-based conversation matrix: C = [c_{i,j}]
  – Latency matrix: L = [l_{i,j}]
• Formulation
  – Choose as few hubs as possible, while limiting the additional distance due to Relaying:

    minimize |H|
    subject to: for all s, d ∈ P with c_{s,d} = 1,
                l_{s,hub(s)} + l_{hub(s),d} ≤ l_{s,d} + θ

    (θ is the configurable latency-penalty parameter)

Memory Saving and Cost of LCR
Based on the entire traffic of 100+ VPNs for a week in May 2007

[Chart: gain and cost in percent; the three operating points correspond to additional latencies of 0, ~2.5, and ~11.5 msec. Gain: fraction of routes removed. Costs: fraction of traffic relayed and increase of backbone load.]

LCR can save ~90% of memory with very small path inflation

Deployment and Operation

• Oblivious optimization also leads to significant benefits
• Can be implemented via minor configuration changes at PEs
• Performance degrades very little over time
  – Cost curves are fairly robust
  – Weekly/monthly adjustment: 94%/91% of hubs remain hubs
• Can ensure high availability
  – Need more than one hub, located in different cities
  – 98.3% of VPNs spanning 10+ PEs have at least 2 hubs anyway
  – Enforcing "|H| > 1" reduces the memory saving by only 0.5%

Relaying Conclusion
• VPN providers need a huge L2-like switch
  – Site-level plug-and-play networking, any-to-any reachability, and scalability
• Key lessons
  – Traffic locality is our good friend
  – Presenting an immediately-deployable solution requires more than just designing an architecture

Flat Addressing: Hierarchy-free site addressing
Traffic Indirection: Forwarding through a hub
Usage-driven Optimization: Popularity-driven hub selection

Goals Attained

SEATTLE
• Self-configuration: Eliminates addressing and routing configuration
• Scalability: Significantly reduces control overhead and memory consumption
• Efficiency: Improves link utilization and convergence

VL2
• Self-configuration: Additionally eliminates configuration for traffic engineering
• Scalability: Allows a DC to host over 100K servers w/o oversubscription
• Efficiency: Boosts DC-server utilization by enabling agility

Relaying
• Self-configuration: Retains self-configuring semantics for VPN customers
• Scalability: Allows existing routers to serve an order of magnitude more VPNs
• Efficiency: Only slightly increases end-to-end latency and traffic workload

Summary and Future Work
• Designed, built, and deployed huge L2-like switches for various networks
• The universal hammers are applicable in a wide range of situations
• Future work
  – Other universal hammers?
  – Self-configuration at Internet scale?
  – Architecture for distributed mini data centers?
