Maglev: A Fast and Reliable Software Network Load Balancer

Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Roman Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang†* and Jinnah Dylan Hosein‡*

Google Inc. †UCLA ‡SpaceX
Abstract

Maglev is Google’s network load balancer. It is a large distributed software system that runs on commodity Linux servers. Unlike traditional hardware network load balancers, it does not require a specialized physical rack deployment, and its capacity can be easily adjusted by adding or removing servers. Network routers distribute packets evenly to the Maglev machines via Equal Cost Multipath (ECMP); each Maglev machine then matches the packets to their corresponding services and spreads them evenly to the service endpoints. To accommodate high and ever-increasing traffic, Maglev is specifically optimized for packet processing performance. A single Maglev machine is able to saturate a 10Gbps link with small packets. Maglev is also equipped with consistent hashing and connection tracking features, to minimize the negative impact of unexpected faults and failures on connection-oriented protocols. Maglev has been serving Google’s traffic since 2008. It has sustained the rapid global growth of Google services, and it also provides network load balancing for Google Cloud Platform.

Figure 1: Hardware load balancer and Maglev.

balancers form a critical component of Google’s production network infrastructure. A network load balancer is typically composed of multiple devices logically located between routers and service endpoints (generally TCP or UDP servers), as shown in Figure 1. The load balancer is responsible for matching each packet to its corresponding service and forwarding it to one of that service’s endpoints. Network load balancers have traditionally been imple-