Virtually Eliminating Router Bugs Eric Keller∗ Minlan Yu∗ Matthew Caesar† Jennifer Rexford∗ ∗ Princeton University, Princeton, NJ, USA † UIUC, Urbana, IL, USA
[email protected] {minlanyu, jrex}@cs.princeton.edu
[email protected] ABSTRACT 1. INTRODUCTION Software bugs in routers lead to network outages, security The Internet is an extremely large and complicated dis- vulnerabilities, and other unexpected behavior. Rather than tributed system. Selecting routes involves computations across simply crashing the router, bugs can violate protocol se- millions of routers spread over vast distances, multiple rout- mantics, rendering traditional failure detection and recovery ing protocols, and highly customizable routing policies. Most techniques ineffective. Handling router bugs is an increas- of the complexity in Internet routing exists in protocols im- ingly important problem as new applications demand higher plemented as software running on routers. These routers availability, and networks become better at dealing with tra- typically run an operating system, and a collection of proto- ditional failures. In this paper, we tailor software and data col daemons which implement the various tasks associated diversity (SDD) to the unique properties of routing proto- with protocol operation. Like any complex software, routing cols, so as to avoid buggy behavior at run time. Our bug- software is prone to implementation errors, or bugs. tolerant router executes multiple diverse instances of routing software, and uses voting to determine the output to publish 1.1 Challenges in dealing with router bugs to the forwarding table, or to advertise to neighbors. We de- sign and implement a router hypervisor that makes this par- The fact that bugs can produce incorrect and unpredictable allelism transparent to other routers, handles fault detection behavior, coupled with the mission-critical nature of Inter- and booting of new router instances, and performs voting in net routers, can produce disastrous results.