<<

Networks 36 )2001) 137±151 www.elsevier.com/locate/comnet

Customizable virtual service with QoS L. Keng Lim, Jun Gao, T.S. Eugene Ng, Prashant R. Chandra, Peter Steenkiste *, Hui Zhang

Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Abstract In this paper, we propose and implement Virtual Network Service )VNS), a value-added network service for de- ploying virtual private networks )VPNs) in a managed wide-area IP network. The key feature of VNS is its capability of providing a customer with a VPNthat is customizable with management capabilities and performance properties comparable to a dedicated physical network. In addition, VNS ensures con®dentiality of data and principals through the use of IPSEC. The main technique underlying VNS is the virtualization of routers in both control and data planes. Virtualization of the control plane enables customizable and signaling per VPN. On the data plane, packet forwarding and link bandwidth are virtualized. Virtualization of the forwarding mechanism on the data plane enables forwarding of trac according to each VPN's topology and policies. Virtualization of the link bandwidth enables each VPNto have guaranteed )QoS) and customized resource management policies. We have developed a VNS prototype for deployment on the CAIRN network. The VNS prototype implements several resource management mechanisms including packet scheduling, signaling and runtime monitoring. A graphical user interface enables service providers to manage, con®gure and deploy VPNs remotely. Ó 2001 Elsevier Science B.V. All rights reserved.

Keywords: Virtual private networks; Network quality of service; Programmable networks

1. Introduction network infrastructure amongst multiple VPNs. The ubiquity of the makes it an ideal in- The Internet is gradually evolving into an in- frastructure for providing the VPNservice. Fig. 1 frastructure for network-based services. Virtual illustrates the situation where two di€erent VPN private network )VPN) service will be one of the topologies are created on top of the same under- important Internet services. A VPNservice allows lying shared network infrastructure. a customer to build a virtual wide-area network on Various forms of private networking services top of a shared wide-area network infrastructure, have been available to enterprises for years. Ini- such as the Internet, without setting up any dedi- tially, private networks were built using dedicated cated physical connections. There is strong eco- leased lines, but the cost of building a large private nomic incentive for the VPNservice because of the network using dedicated hardware is prohibitive to opportunity to share a common expensive physical all but the largest corporations. Then, with the introduction of low-cost, packet-switched virtual circuit-based services such as and X.25, virtual private networking became possible. * Corresponding author. Tel.: +1-412-268-3261; fax: +1-412- 268-5576. Unfortunately, the availability and functionality E-mail address: [email protected] )P. Steenkiste). of these services is very limited. For an Internet-

1389-1286/01/$ - see front matter Ó 2001 Elsevier Science B.V. All rights reserved. PII: S 1 3 8 9 - 1 2 8 6 ) 0 1 ) 0 0 173-6 138 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

In this paper, we propose and implement Virtual Network Service )VNS), a value-added network service for deploying VPNs in a man- aged wide-area IP network. VNS is built on top of the IP layer to ensure interoperability across various layer 2 technologies )e.g., ATM, MPLS). A VPNis constructed from virtual links. A vir- tual link is a link abstraction connecting any two physical nodes that are in the VPN's topology. Communication over the VPNis secure, and each virtual link is allocated a guaranteed bandwidth. Moreover, unused bandwidth is shared statistically between VPNs for additional performance gains. Fig. 1. Two VPNs built on top of one shared physical network The key advantage of VNS is that it deploys in VNS. VANESA is a graphical VPN management tool. The VPNcontroller is responsible for carrying out commands from VPNs that have a level of performance and degrees VANESA. of freedom in management that are comparable to physical private networks. For instance, instead of being restricted to only site-to-site virtual links, a based VPNservice to be a viable alternative, it customer has full control of the VPNtopology, must have properties comparable to that of a and how the VPNtopology maps onto the un- dedicated physical network. The service must derlying network. This has two advantages. First, provide mechanisms to enforce quality of service the topology can be engineered such that appli- )QoS) and con®dentiality of data must be guar- cations that are sensitive to the anteed as the data travels over the common in- )such as multicast applications) can achieve the frastructure. In addition, the service must o€er best performance. Second, by carefully choosing each VPNwith the autonomy to customize re- the topology, statistical sharing of bandwidth source management. within the VPNcan be optimized. In addition to Most Internet-based commercial VPNsolutions customizing the topology, each VPNcan also se- today construct virtual links using either site-to- lect its own control protocols. For example, it can site IP tunnels or site-to-site MPLS paths. The use a customized routing protocol that supports con®guration of the VPNtopology is therefore load balancing, policy-based routing or QoS highly restricted. The services supported are often routing. VNS also provides guaranteed QoS on limited to best-e€ort site-to-site connectivity and each virtual link in a VPN. Moreover, because link between sites. If QoS is bandwidth is virtualized using hierarchical packet o€ered, it is usually provided by over-provisioning scheduling, each VPNcan even have its own sig- network resources so that QoS service-level naling protocol )e.g., RSVP) to customize resource agreements are unlikely to be violated. Recently, sharing policies in the VPNor to provide per-¯ow some e€orts such as in [5,12] use QoS strategies QoS to real-time applications. that require VPNtrac to be regulated at ingress The main technique underlying VNS is the vir- nodes. The downside is that the opportunity for tualization of the control and data planes in rou- statistical sharing of unused resources is reduced. ters. Virtualization of the control plane enables Another important limitation of these approaches each VPNto have the autonomy to execute cus- is the lack of customizability. For example, a tom routing and signaling protocols while sharing customer cannot control the routing of VPNtrac a common physical infrastructure. Our approach for load balancing or QoS routing, nor can a to provisioning customizable control planes le- customer specify resource management policies in verages a programmable architecture that the VPN. provides an open programmable interface [29]. L.K. Lim et al. / Computer Networks 36 .2001) 137±151 139

In the data plane, packet forwarding and link 1. VANESA: VANESA is a Java-based centralized bandwidth are virtualized per VPN. Virtualization graphical user interface for con®guring and of the forwarding mechanism enables isolation managing VPNs. Fig. 2 is a screen capture of and routing of trac according to virtual topolo- VANESA. The idea here is similar to the con- gies. Virtualization of the link bandwidth provides cept of a software toolkit for deploying virtual each VPNwith virtual links of guaranteed capac- networks as described in [13] by Ferrari and ity, and the autonomy to specify its own band- Delgrossi. VANESA provides a simple interface width sharing policy. Earlier work in VPNservices for the network administrator to con®gure VPN such as in [7,15,24,31] did not consider statistical properties such as the virtual topology, band- sharing of under-utilized resources. In this work, width requirements of virtual links, parameters the additional performance bene®t of statistical for security con®guration and VPN membership multiplexing is achieved without compromising information. Members of a VPNare described any bandwidth guarantees by using the fair service by the member end ' IP addresses and/or curve )H-FSC) [27] hierarchical packet scheduler. the member subnets' network pre®xes. In addi- Architecturally, VNS is based on the Darwin [8] tion, VANESA can also be used to specify cus- router design, which is programmable and capable tom routing and signaling protocols that are to of virtualizing the link bandwidth. The Beagle [9] be deployed within a VPN. signaling protocol is used for resource allocation 2. VPN controller: The VPNcontroller is a pro- and control plane customization. In order to virtu- cess that runs on a host or router that has direct alize routing and forwarding, we extend the Darwin access to the network where VNS is deployed. router design to allow each VPNto have its own The job of the VPNcontroller is to act as a routing protocol and forwarding table. Secure proxy for control between VANESA communication is achieved through IPSEC [18]. and routers in the WANwhere VNSis de- The virtual network system administrator )VA- ployed. This enables VANESA to be executed NESA), a Java-based VPN management tool, pro- remotely from anywhere in the Internet. Fur- vides a user interface that hides the complexity of thermore, the complexity of the signaling re- the signaling from the user. VNS is targeted towards quired to set up the VPNis handled by the deployment on the CAIRNresearch network [1]. VPNcontroller and decoupled from the user in- The rest of this paper is organized as follows. In terface. This setup is depicted in Fig. 1. Section 2, we examine the overall system design of 3. Virtualizable VNS routers: VNS routers are VNS. In Section 3, we explain the key concept of Darwin-based routers built on commodity PC virtualization by describing the mechanisms used hardware running a variant of FreeBSD Unix. to enforce virtualization of bandwidth, control Usually, a minimal PC router performs packet plane protocols, and the forwarding mechanism. forwarding based on a single forwarding table We then survey related work in Section 4 and and a routing daemon that does route compu- summarize our work in Section 5. tation. Darwin routers have enhancements such as a signaling protocol module, a sophisticated packet scheduler, packet classi®er, and a pro- 2. VNS system overview grammable interface for deploying value-added services. Leveraging these existing features of In this section, we describe the major compo- Darwin, we extended the Darwin router design nents of VNS and how they interoperate. A more for VNS. Control plane and data plane re- detailed desciption of the techniques used in vir- sources on a VNS router are virtualized to sup- tualizing routers is presented in Section 3. port the unique needs of each VPN. 3. In the data plane, each VPNis allocated its own 2.1. Components resources such as link bandwidth and a forwarding table. In the control plane, a VNS The main VNS components are: router has mechanisms that enforce isolated 140 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

Fig. 2. A screen-shot of VANESA.

execution of custom-VPNrouting and signaling example, VPN#2's members are subnet 10.1.1/24 protocols. Fig. 3 illustrates the virtualization of attached at router A and subnet 10.2.1/24 attached a router. at router E. The router that is the access point to Next, we describe service provisioning in VNS the network for a VPNmember is called an edge by explaining the interactions between the com- router. All other interior routers in the network ponents of the system during the design, setup, and that are part of a VPNbut are not directly con- operation of a VPN. nected to VPNmembers are called core routers. In order to provide QoS to virtual links and 2.2. VPN design support per-VPNforwarding, virtual routers need to maintain VPN-speci®c information for QoS We will describe the design of a VPNusing the enforcement and per-VPNforwarding. In addi- example VPNshown in Fig. 4. Each VPN'svirtual tion, edge routers must maintain VPNmember- topology is constructed from virtual links, illus- ship information, IPSEC security parameters, and trated as dotted lines for VPN#1 and as the lightly the encapsulating IP headers to use for each shaded lines for VPN#2 in Fig. 4. A router that is VPN. part of a VPN's topology is called a virtual router. For instance, VPN#2's virtual routers are A, B, D 2.3. VPN setup and E. A VPNprovides connectivity for end hosts or During the setup phase, the network adminis- subnets identi®ed as members of the VPN. In our trator speci®es a VPN's properties through VA- L.K. Lim et al. / Computer Networks 36 .2001) 137±151 141

Fig. 3. A virtualized VNS router with three instances of virtual control planes and customized forwarding tables.

Fig. 4. Basic concepts illustrated with two VPNs.

NESA's graphical interface. These properties in- three setup messages to the VPNcontroller since clude the VPN's virtual topology, bandwidth re- each of these con®guration steps corresponds to a quirements of the virtual links in the topology, speci®c type of setup request. members, local routing policies for virtual routers, Upon receiving the VPNsetup messages, the security information and encapsulating IP headers VPNcontroller initiates requests to routers in the for tunneling VPNtrac. After specifying the virtual topology through the Beagle signaling VPNdescription, the network administrator sub- protocol [9]. While it would also be possible to set mits the request for setting up this VPNby clicking up resource reservations with ¯ow-based signaling on the ``Submit'' button on VANESA's interface. protocols such as RSVP [4], we chose Beagle be- Subsequently, VANESA sends appropriate setup cause it provides support for the allocation of re- messages to the VPNcontroller based on the re- sources for mesh structures such as VPN quest. There are several types of setup messages. topologies. All VPNconnection management Each is related to a request to con®gure one of the tasks are handled by the Beagle daemon on the VPNproperties. For instance, in a minimal VPN VPNcontroller and the Beagle daemons on the setup that has no security con®guration, VANESA routers that are part of the virtual topology. In will be used to set up virtual links with bandwidth Fig. 5, we show this setup procedure for one of the guarantees, dispatch membership information and routers that is part of the VPN. con®gure local routing policies of routers in the For virtual link resource reservations, the Bea- virtual topology. VANESA would therefore send gle daemon on every router of a VPNcon®gures 142 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

Fig. 5. Control path in VNS. the local classi®ers and schedulers of the appro- 2.4. VPN operation priate network interfaces to reserve resources. Beagle is also used to deploy VPN-speci®c routing The operation of a VPNis based on IP-in-IP and signaling protocol modules on the routers of a tunneling, but support is provided to maintain VPN. The customization of control protocols is privacy of the data and to allow per-VPNcus- discussed in Section 3.2. tomization of packet handling inside the core of During the setup of a VPN, Beagle also per- the network. We discuss the main tasks performed forms two con®guration steps that are speci®c to during the operation of a VPNin more detail in edge routers. The ®rst step is to provide edge this section )Fig. 6). routers with VPNmembership information and As in a private physical network, we believe the the globally unique VPNidenti®er )VPN-ID)that basic security service a VPNshould have is the was chosen by VANESA; this information is nee- con®dentiality of data and principals when VPN ded so edge routers can inject packets appropri- packet ¯ows in the core of the network. This is ately into the VPN. The second VPN-speci®c step provided in VNS by establishing ESP [17] tunnels is to establish security associations between the between the ingress and egress edge routers. This edge routers; the security associations are used to means that for any VNS data stream, crypto- provide and of the data graphic packet processing is performed at edge that travels over the VPN. Both operations are routers only. It can be argued that this is less described in more detail below. secure than an alternative model that requires

Fig. 6. Datapath through a VNS-enabled network. L.K. Lim et al. / Computer Networks 36 .2001) 137±151 143

Fig. 7. VNS packet format. re-keying at every link. Our choice in keeping the gress IP, egress IP> tuples. The member security model simple is motivated by a perfor- source address in the tuple identi®es a VPNmem- mance trade-o€, i.e., we reduce the overhead on ber that is reachable through that network inter- the core routers. face. Using the source and destination addresses of Using the membership information provided to a packet, the packet is classi®ed to be part of a VPN them by Beagle, an ingress edge router can cor- if it matches the portion of a tuple in the membership list. The then injects the packet in the appropriate IP-in-IP packet is then encrypted by IPSEC and prepended tunnel and tags the packet with the globally unique with the corresponding VPN-ID, and at last the VPN-ID of the VPN. The VPN-ID is necessary packet is encapsulated with the ingress and egress because once a packet enters a VPNtunnel, the routers' IP addresses found in the tuple. Fig. 7 il- original packet is encrypted, so core routers can no lustrates the resulting packet format. We can pro- longer use the header ®elds to identify what VPN vide more ®ne grain control over what trac enters the packet belongs to. To di€erentiate between a VPNby using additional ®elds )e.g., source and packets so as to enable per-VPNforwarding and destination port numbers) in the ®lter that is used resource management, the VPN-ID is added to the to classify packets. encapsulating header at the ingress edge router as When a core router receives a packet, it uses the an IPOPT_SATID IP option. This approach does VPN-ID to identify the VPN that the packet be- not support inter-VPNcommunication, though an longs to. It can then service the packet in a way that easy extension to enable this would be to supple- is appropriate for that VPN. Packet forwarding ment a pair of VPN-IDs identifying the source and packet scheduling )QoS) can be customized on VPNand destination VPN,respectively. a per-VPNbasis, as is discussed in more detail in By relegating the task of tagging packets with a Section 3. This allows packets to be scheduled VPN-ID to the edge routers, we allow any end based on the policies of the VPNand forwarded host to become a VPNmember without requiring according to the VPN's topology. At the egress any changes. Implicitly, this limits the freedom of edge router, the packet is decrypted and decapsu- hosts to directly control what VPNs they partici- lated. The inner packet is then examined and for- pate in, since the information of what trac uses warded to the locally attached VPNdestination. what VPNhas to be stored on the edge routers We have also modi®ed the route, tracero- using a signaling protocol. End-hosts can be given ute and netstat commands for the VNS envi- more control by making them VNS-aware so they ronment such that we can create the initial routing can insert a VPN-ID into the packets they send. table setup and verify VPNroutes. This way, the end host can control more easily which speci®c VPN-ID they want to use for spe- ci®c applications. 3. Virtualization VPNmembership is maintained at each network interface of an edge router in the form of

CBQ, H-FSC is capable of decoupling the alloca- tion of delay and bandwidth resources and char- acterizing the provided service precisely. As a result, real-time trac can enjoy a low delay without over-reserving resources. This allows the router to have greater ¯exibility in resource allo- cation and increases resource utilization. We ex- tended the packet classi®er from the Darwin implementation to support VPN-ID-based classi- ®cation. 2 Another important property of the H-FSC Fig. 8. Hierarchical resource tree of link bandwidth. scheduler is that it allows sibling nodes in the re- source tree to borrow bandwidth from each other when possible. This means that if a ¯ow inside a 3.1. Virtualization of link bandwidth VPNdoes not use all the bandwidth that is allo- cated to it, other ¯ows within the same VPNwill Enforcement of bandwidth guarantees to vir- ®rst have the opportunity to use that bandwidth. If tual links is performed using a packet classi®er and a VPNdoes not fully utilizing its capacity on a a hierarchical packet scheduler. For any router, we virtual link, the extra bandwidth will be shared by represent the division of the bandwidth of a link at trac belonging to other coexisting VPNs. This the router as a hierarchical resource tree. In the additional performance gain from statistical mul- context of VNS, each VPN virtual link created tiplexing demonstrates that VPNs in VNS can over a physical link is represented by a node 1 in actually do better than a physical private network the ®rst tier of nodes underneath the root node in with ®xed capacity. the hierarchical resource tree. A certain amount of bandwidth is reserved for each node at the VPN 3.2. Virtualization of the control plane protocols set up time. The e€ect of this is that each virtual link will have a guaranteed capacity. Fig. 8 is an The control plane of a commodity PC router example of what a resource tree at a physical link running the Unix operating system typically con- might look like with three VPNs. In this example, sists of user-level daemons that implement various VPN#3 reserved 40% of the link bandwidth, control protocols. For example, a routing daemon which ensures that the virtual link of VPN#3 has creates and maintains the routing table on a rou- a capacity of about 62 Mb per second. The hier- ter, which governs the packet forwarding behav- archical scheduler allows a VPNto further divide ior, by exchanging routing protocol messages with its bandwidth across the trac classes it carries by peer routing daemons on other routers in the creating a subtree. For instance, VPN#3 allocates network. In a traditional )physical) network, net- 40% of its bandwidth to its TCP trac in our ex- work administrators can deploy a di€erent routing ample. protocol by installing new routing daemons on the VNS uses the H-FSC [27] packet scheduler de- routers within the network. Similarly, we would veloped in the context of Darwin. An advantage of like the administrators of VPNs to be able to using H-FSC as opposed to other class-based choose and deploy their own control plane pro- scheduling discipline such as H-PFQ [2] and CBQ tocols and network management policies within [14] is H-FSC's ¯exibility in de®ning and enforcing their VPN. To meet this requirement, the control QoS on a multi-tier hierarchy. Unlike H-PFQ and

2 In the case of encrypted trac, an additional ¯ow identi®er 1 Generally, a node corresponds to one or multiple ¯ows.A must be added to the packet header at the ingress router in ¯ow is de®ned using a ¯ow_spec which includes ®elds from IP order to di€erentiate between ¯ows inside the VPN. This and transport layer headers and an optional application ID. feature is not implemented in the current VNS prototype. L.K. Lim et al. / Computer Networks 36 .2001) 137±151 145 plane of the network that supports VPNservices The coordinated actions of these routing delegates needs to be virtualized. In other words, the control will create VPN-speci®c routing tables according plane can be sub-divided into multiple VPNcon- to the VPN's topology. This means that a virtual trol planes, each running a VPN-speci®c set of router will have multiple routing delegates run- control daemons. ning, each responsible for the trac of a separate VPN. 3.2.1. Darwin programmability support To demonstrate the concept of routing virtual- To control the behavior of the router, a control ization, we use RIP-2 [19] as an intra-VPNrouting protocol daemon needs to interact with modules in protocol. For each VPN, a separate RIP-2 routing the data plane, e.g., a routing daemon must be able daemon will be started by Beagle. We modi®ed the to update the routing table in the kernel, and a existing CAIRNrouting daemon, mrtd [26], to signaling daemon must be able to change the states support multiple RIP clouds over a single physical of the classi®er and scheduler. However, a tradi- network. The RIP-2 speci®cation requires all RIP tional router is shipped as a ``closed box'' with a messages to be exchanged at the set of standard vendor protocols. It is dicult if 224.0.0.9 and port 520. In order to support mul- not impossible for users to install any customized tiple RIP clouds, we extend the RIP protocol to control protocols. In this project, we take on a support the exchange of RIP messages at an as- programmable network approach to support signable port number. The idea here is to allow a control plane virtualization. In a programmable VPNto select an unused port number at the RIP network, the control plane functionality of the multicast address and have VPNrouting daemons routers can be extended dynamically by installing use that port number for RIP messages. This way, customized control protocols on the router. These we ensure isolation of VPN-speci®c RIP messages protocol can modify the forwarding behavior of and prevent VPNs from leaking routes into each the data plane in a controlled fashion through a others' domain. In our implementation, port 520 programming interface. remains as the port used by RIP-2 for default VNS leverages the programmability of the routing, and for each VPNdeployed, VANESA Darwin system [8] to dynamically deploy VPN- assigns a unique and well-known port number to speci®c control protocols. In Darwin, mobile code the VPN. All RIP-2 messages pertinent to this segments, called delegates, can be transferred to VPNwill then be exchanged via this port. the router and instantiated in the delegate runtime Another possible approach would be to assign environment )DRE) using the Beagle signaling each VPNwith a speci®c multicast address for protocol. Delegates can implement control plane RIP-2 protocol messages. This address would be protocols, customized control policies or custom- chosen from the administratively scoped range ized services. They run at user level within the )239.192/14) [21] and the only requirement is that DRE and change the router's behavior by con- the multicast address must be uniquely mapped to trolling data plane modules, such as the classi®er, a speci®c VPN. This approach has the advantage routing table and the scheduler through Darwin's that a router will only receive VPN-speci®c RIP-2 programming interface, the router control inter- messages if the router is a virtual router in the face )RCI) [16]. Delegates can only modify the VPN, but it requires that multicast is available. forwarding behavior of the trac ¯ows that are The VNS approach of executing independent explicitly assigned to them. per-VPNrouting daemons on a router o€ers cus- tomers the ¯exibility of deploying VPN-speci®c 3.2.2. Routing virtualization routing protocols. However, it has the disadvan- We demonstrate control plane virtualization by tage that it will not scale well to large numbers of showing that VPN-speci®c routing protocols can VPNs. Each routing daemon will consume re- be deployed using delegates. During VPNsetup, sources such as CPU cycles and memory, which delegates implementing a selected routing protocol may degrade the router's performance when it are installed on all the virtual routers of the VPN. supports a large number of VPNs. When multiple 146 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

VPNs use the same routing protocol, we can re- 3.3. Virtualization of forwarding mechanism duce the number of routing daemons by deploying a single routing delegate that sends and receives all In this section, we will discuss a speci®c virtu- the routing messages belonging to the VPNs using alization technique for forwarding packets ac- the same routing protocol. The delegate then cording to virtual topologies. Conceptually, this demultiplexes the messages internally to compute means that we may have to forward packets des- routes for each VPNseparately. tined for the same destination )egress router) dif- Besides multiple routing daemons, routing vir- ferent. However, FreeBSD Unix only supports tualization also requires multiple routing tables in single path routing [20]. This is an inherent limi- the data plane. We made extensions to the FreeBSD tation of the forwarding table radix-tree-based Unix forwarding mechanism so that packets be- lookup algorithm and data structures [25]. longing to di€erent VPNs are forwarded by looking Our solution for route isolation in the for- up the next hop in a VPN-speci®c forwarding table. warding mechanism is to simply require the system We discuss the details of this extension to the for- to maintain a separate forwarding table for each warding mechanism later in this section. VPN. Every forwarding table is populated with routes computed based on the VPN's virtual to- 3.2.3. QoS management within a VPN pology. Whenever a packet arrives at a router and A virtualized router control plane allows a VPN needs to be forwarded, the forwarding mechanism to deploy other VPN-speci®c control plane pro- classi®es the packet. If the packet is classi®ed to a tocols. As an example, we discuss how a VPNcan VPN, it will be forwarded based on a route lookup deploy its own signaling protocol to perform using that VPN's forwarding table. Moreover, our VPN-speci®c resource management. system's routing architecture must correctly de- As discussed earlier, each virtual router em- multiplex routing messages that are exchanged ploys a hierarchical packet scheduler, i.e., the between the user space and the kernel space. In the bandwidth of each link is shared in a hierarchical remainder of this section, we present the extensions fashion. As shown in Fig. 8, the ®rst level in the that we made to the FreeBSD Unix routing system. resource tree corresponds to the bandwidth shar- ing across the VPNs running on the physical link. 3.3.1. Packet forwarding in FreeBSD Unix To further exploit the merit of the hierachical In a FreeBSD Unix router, the user-level rout- scheduler, the owner of a VPNlink, i.e., a node in ing daemon and the kernel communicate using the ®rst tier of the resource tree, can set up more messages [32]. The core information carried in sophisticated bandwidth sharing policies for ap- these messages are addresses of destinations and plications running within its VPN, as is illustrated gateways. These addresses are stored as one or for VPN#3 in Fig. 8. more sockaddr structures in the payload of these To manage the resource reservations within a messages. Fig. 9 is a simpli®ed illustration of the virtual network, a VPNmay need to deploy its forwarding mechanism in FreeBSD Unix. own signaling protocol. This can be done by in- Forwarding and routing are organized on the stantiating per-VPNsignaling deamons )e.g., basis of di€erent address families. Separate routing Beagle, RSVP), similar to what VNS does for tables are used for di€erent address families, and routing daemons. Sigaling messages must be tag- routing daemons inform the kernel what family of ged with a VPN-ID, the same way as other VPN addresses they are responsible for. To make this trac, and they will be forwarded according to the system work correctly, routing messages must be VPNtopology, i.e., use the VPNforwarding table demultiplexed to the appropriate routing daemon managed by the routing delegate of that VPN. The and forwarding table updates have to be applied to actions of the signaling daemon will be restricted the right table. Also when there are local changes to the resources of a speci®c VPN, i.e., the daemon in routes or route policies, the kernel's routing will only be able to modify the resource allocations subsystem must be able to dispatch these changes within a speci®c subtree of the resource tree. to the correct routing daemon. L.K. Lim et al. / Computer Networks 36 .2001) 137±151 147

Fig. 9. Forwarding mechanism in FreeBSD Unix.

To demultiplex to the correct forwarding table, daemons. Any forwarding table updates will a pointer to the forwarding table is obtained by ``leak'' to other VPNs. using the sa_family ®eld of addresses as an in- dex into the rt_tables[ ] array. Similarly, to 3.3.2. Routing and packet forwarding in VNS dispatch routing messages to routing daemons, the We provide per-VPNpacket forwarding by forwarding mechanism searches through the con- supporting demultiplexing to di€erent forwarding trol block list in the kernel in order to ®nd a tables based on the VPN-ID, as is illustrated in control block which would give a back pointer to Fig. 10. This requires that we virtualize the various the routing daemon. The search strategy is an ex- kernel data structures involved in routing and haustive search that returns any control block that packet forwarding: has its values match the 1. Create an array vpn_rt_tables[ ] for VPN key . locates two arrays; vpn_rt_tables[ ] for It is clear that the above routing architecture VPNforwarding tables and rt_tables[ ] cannot support the multiple forwarding table so- for forwarding tables of all other protocol fam- lution needed for per-VPNpacket forwarding and ilies. Each entry in vpn_rt_tables[ ] con- routing. All addresses in the VPNs are IP ad- tains a pointer to a forwarding table and an dresses and will therefore have the protocol family unsigned integer that stores the VPN-ID for ®eld set to AF_INET. As a result, all VPNs will the associated forwarding table. share the same IP forwarding table and any rout- 2. Initialize routing daemon with VPN-ID:Whena ing update will be dispatched to all VPNrouting routing daemon is instantiated, it is given the

Fig. 10. Virtualization of forwarding mechanism in a VNS router's kernel. 148 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

VPN-ID of the VPN it is responsible for. This In VNS, the extra step involved in this lookup VPN-ID is used by the routing daemon to inter- consists of a classi®cation step to determine if the act with the kernel. packet belongs to a VPN. The classi®cation step 3. Label routing sockets with a VPN-ID: We aug- checks for the availability of the IPOPT_SATID mented the kernel socket structure with an ad- option in the packet header. If the option exists, ditional unsigned integer ®eld named vpn_id. the packet is assumed to be a VPNpacket and the After a VPNrouting daemon has created a option value is used as the VPN-ID. The destina- routing socket, it will make an additional io- tion address of the packet is then packed into a ctl& ) system call to set the vpn_id ®eld of sockaddr structure and tagged with the VPN- the socket structure in the kernel to the VPN- ID. This sockaddr structure is then passed to the ID of the VPN. forwarding mechanism for a route lookup. 4. Label the raw socket control blocks with a VPN- Our virtualization of the forwarding mechanism ID: We modi®ed the raw socket control block is straightforward and results in no changes to the structure by adding a ®eld named vpn_id.As tree-based forwarding table lookup algorithm and in step 3, vpn_id is set to the VPN-ID of the its associated data structures. An alternative VPNassociated with the routing daemon. would be revamp the forwarding table lookup al- All the functions responsible for processing gorithm and data structures, as is done in the routing messages entering the kernel from user Detour project [10] or to use MPLS. A more de- space have access to the kernel socket structure of tailed discussion of Detour and MPLS in com- the process that generated the routing messages. parison to VNS is provided in Section 4. As a result, we can use the vpn_id ®eld in the kernel socket to associate the routing messages with the correct VPN. For example, using the 4. Related work vpn_id ®eld as the index to the vpn_rt_tables array, we can easily obtain the pointer to the ap- One of the distinguishing features of VNS is propriate VPNforwarding table. that it can provide VPNservices with customizable In the other direction, when routing messages intra-VPNQoS support. To the best of our need to be dispatched to the routing daemon, we knowledge, other approaches such as the X-Bone cannot easily associate these routing messages with [30], Genesis [6] and Supranet [13] are more fo- a kernel socket. In the IP domain, the forwarding cused on providing an overall service architecture mechanism uses as the and have not fully developed techniques for en- search key to ®nd a match from the list of raw abling per-VPNQoS. Earlier research in VPNs, socket control blocks. We extended the search to such as in [7,15,24] focused on VPNservices on use the tuple . Within the broadband ATM networks, i.e., they developed forwarding mechanism, we overloaded the func- methods for managing and mapping VPNs on tionality of the sa_family ®eld in sockaddr to virtual circuits. While some measure of QoS is encode the VPN-ID in the following way. If its attainable through dedicated virtual circuits, these value falls outside the set of well-known protocol ATM-based solutions typically do not allow families, then we know that sa_family must be a bandwidth sharing across VPNs, since VPN ¯ows VPN-ID. Consequently, in the search for the cor- are mapped directly to virtual circuits. This leads responding raw socket control block, the vpn_id to a lower utilization of the bandwidth resources. ®eld in the control block structure will be used for Other QoS strategies that regulate trac exclu- comparison. sively at the ingress router, such as in [5,12], also Finally, an extra step is added into the packet cannot capitalize on statistical multiplexing gains forwarding mechanism. As shown in Fig. 10, a as easily. route lookup is performed in the IP_FORWARD Our approach to QoS is based on an IP layer module after the IP_INPUT module determines mechanism that provides bandwidth guarantees to that a packet has yet to reach its ®nal destination. VPNs and has the added bene®t of statistical L.K. Lim et al. / Computer Networks 36 .2001) 137±151 149 multiplexing gained through the use of the H-FSC proaches rely on trac engineering and regulating packet scheduler. When ¯ows are inactive, their trac at the ingress router using service models unused bandwidth can be utilized by other active such as Di€Serv [3]. For the purpose of labeling ¯ows. packets, MPLS-based solutions require the inser- A programmable network router architecture tion of a shim layer between layer 2 and layer 3 facilitates the virtualization of a router's control protocols or overloading of existing layer 2 pro- plane. Projects such as Tempest [31], Genesis [6], tocol ®elds. The networks where such a service is and Virtual Active Network )VAN) [28] represent deployed must therefore be MPLS aware. In con- recent e€orts in using concepts of programmable trast, the VNS approach is an IP layer solution networks for deploying virtual networks. Their and is independent of the underlying link layer. approaches are conceptually similar to ours. Ar- chitecturally, Tempest is an ATM-based solution that uses logical entities called switchlets for iso- 5. Summary lating multiple control architectures. Genesis on the other hand, has an architecture for spawning In this paper, we presented the design and a virtual networks through the operating system prototype implementation of VNS, a VPN service services of the Genesis kernel. VANuses a func- that is customizable and supports VPN-level QoS. tional language [11] to specify virtual networks We had three design goals. First, we wanted VPN and virtual networks generated from VANare support at the IP level for interoperability across deployed as application layer tunnels using UDP multiple network technologies. Second, we wanted encapsulation. In VNS, we leverage Darwin's VPNs to be very similar to physical networks by programmable router architecture, which provides providing the ¯exibility to use a variety of QoS programmability of the routers through the use of models inside the VPNs. In fact, we want VPNs to delegates and an open programming interface be better than physical networks in the sense that called RCI. heavily used VPNs can share the unused capacity Virtualization of packet forwarding can be im- of lightly loaded VPNs through statistical multi- plemented in several di€erent ways. To our plexing. Finally, users should be able to customize knowledge, VNS and Detour are the only two the management and control functions of their projects that implement virtual forwarding by VPN. modifying the behavior of the forwarding mecha- Our proposed VNS design uses IP tunnels and nism in a router's kernel. VNS virtualizes the IP security as the basic VPNinfrastructure. To forwarding mechanism by maintaining multiple support the VPNisolation and customization that forwarding tables to isolate VPNroutes. In De- is needed to meet the above goals we use three tour, the forwarding mechanism looks up routes in complementary mechanisms. A H-FSC scheduler a ¯ow database and tunnels packets using IP-in-IP provides bandwidth isolation between VPNs and encapsulation every time the packet traverses from allows each VPNto independently manage the one virtual node to another virtual node. In our bandwidth that is assigned to it. Customization of approach, encapsulation occurs only once at the control plane functionality is provided by using a edge of the network and no tunneling is needed in programmable router platform that supports the the core of the network. Furthermore, because execution of third party control plane protocols. routing algorithms are not considered as part of These customized control protocols can control the Detour framework, the ¯ow database used for the data path functions for the trac they are route lookups in Detour are manually con®gured responsible for through a RCI. Finally, we virtu- with routes. The VNS approach allows for auto- alized critical functions in the data plane. The matic construction of a VPNforwarding table by H-FSC scheduler already supports virtualization using a VPN-speci®c routing protocol. of resource allocation )scheduling) and in our MPLS-based VPNsolutions such as in [22,23] prototype we also demonstrate the virtualization have also been proposed. For QoS, these ap- of packet forwarding. 150 L.K. Lim et al. / Computer Networks 36 .2001) 137±151

We implemented a VNS prototype based on [11] S. DaSilva, D. Florissi, Y. Yemini, NetScript: a language- this design using the Darwin network as a foun- based approach to active networks, Technical Report, dation. Darwin is a programmable network that Computer Science Department, Columbia University, New York, January 1998. uses the H-FSC scheduler and it also provides a [12] N.G. Dueld, P. Goyal, A. Greenberg, P. Mishra, K.K. signaling protocol that makes bandwidth reserva- Ramakrishnan, J.E. van der Merwe, A ¯exible model for tions and installs customized control protocols. resource management in virtual private networks, in: Our prototype is rich enough to demonstrate Proceedings of the ACM SIGCOMM 1999, September bandwidth isolation, isolation of bandwidth 1999, pp. 95±108. [13] D. Ferrari, L. Delgrossi, Supranets, Technical Report management, and customization of routing and CTR-96-001, Center for Research on the Applications of packet forwarding. We plan to expand our pro- Telematics to Organizations and Society )CRATOS), totype to further evaluate the possibilities of our Universita Cattolica, Piacenza, Italy, September 1996. approach, e.g., by providing support for custom- [14] S. Floyd, V. Jacobson, Link-sharing and resource man- ized signaling protocols and hierarchical VPNs. agement models for packet networks, IEEE/ACM Trans- actions on Networking 3 )4) )1995) 365±386. [15] S. Fotedar, M. Gerla, P. Crocetti, L. Fratta, ATM virtual private networks, Communications of the ACM 38 )2) References )1995). [16] J. Gao, P. Steenkiste, E. Takahashi, A. Fisher, A [1] Collaborative Advanced Inter Agency Research Network. programmable router architecture supporting control http://www.cairn.net. plane extensibility, IEEE Communications Magazine, [2] J.C.R. Bennett, H. Zhang, Hierarchical packet fair queue- March 2000. ing algorithms, IEEE/ACM Transactions on Networking 5 [17] S. Kent, R. Atkinson, IP Encapsulation Security Payload )5) )1997) 675±689 )also in SIGCOMM'96). )ESP), Request for Comments )Standards Track) 2406, [3] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Internet Engineering Task Force, November 1998. Weiss, An Architecture for Di€erentiated Services, Request [18] S. Kent, R. Atkinson, Security Architecture for the Internet for Comments )Informational) 2475, Internet Engineering Protocol, Request for Comments )Standards Track) 2401, Task Force, December 1998. Internet Engineering Task Force, November 1998. [4] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, [19] G. Malkin, RIP Version 2, Request for Comments Resource Reservation Protocol RSVP ± Version 1 Func- )Standards Track) 2453, Internet Engineering Task Force, tional Speci®cation, Request for Comments )Standards November 1998. Track) 2205, Internet Engineering Task Force, September [20] M.K. McKusick, K. Bostic, M.J. Karels, J.S. Quarterman, 1997. in: The Design and Implementation of the 4.4BSD Unix [5] T. Braun, M. Gunter, I. Khalil, An architecture for Operating System, Addison-Wesley, Reading, MA, 1996. managing QoS-enabled VPNs over the Internet, in: 24th [21] D. Meyer, Administratively Scoped IP Multicast, Request IEEE Annual Conference on Local Computer Networks for Comments )Best Current Practice) 2365, Internet )LCN99), Lowell, Boston, MA, October 1999. Engineering Task Force, July 1998. [6] A.T. Campbell, M.E. Kounvanis, D.A. Villela, J. Vicente, [22] K. Muthukrishnan, A. Malis, Core MPLS IP VPN K. Miki, H.G. De Meer, K.S. Kalaichelvan, Spawning Architecture, Internet draft draft-muthukrishnan-mpls- networks, IEEE Network Magazine, July/August 1999. corevpn-arch-03.txt, Work in Progress, expires December [7] M.C. Chan, A.A. Lazar, R. Stadler, Customer manage- 2000. ment and control of broadband VPNservices, in: Pro- [23] E. Rosen, Y. Rekhter, BGP/MPLS VPNs, Request for ceedings of the Fifth IFIP/IEEE International Symposium Comments )Informational) 2547, Internet Engineering on Integrated Network Management, May 1997. Task Force, March 1999. [8] P. Chandra, A. Fisher, C. Kosak, T.S.E. Ng, P. Steenkiste, [24] J.M. Schneider, T. Preuss, P.S. Nielsen, Management of E.Takahashi, H. Zhang, Darwin: resource management for virtual private networks for integrated broadband com- value-added customizable network service, in: Sixth IEEE munication, in: Proceedings of the ACM SIGCOMM 1993, International Conference on Network Protocols September 1993, pp. 224±237. )ICNP'98), Austin, TX, October 1998. [25] K. Sklower, A tree-based packet routing table for Berkeley [9] P. Chandra, A. Fisher, P. Steenkiste, A signaling protocol UNIX, in: Proceedings of the Usenix Winter Conference, for structured resource allocation, in: Proceedings of the Dallas, TX, January 1991. IEEE Infocomm '99, New York, March 1999. [26] Software available for download at http://www.mrtd.net. [10] A. Collins, The Detour framework for packet rerouting, [27] I. Stoica, H. Zhang, T.S.E. Ng, A hierarchical fair service Ph.D. Qualifying Examination, Department of Computer curve algorithm for link sharing, real-time and priority Science and Engineering, University of Washington, No- service, in: Proceedings of the ACM SIGCOMM, Septem- vember 1998. ber 1997. L.K. Lim et al. / Computer Networks 36 .2001) 137±151 151

[28] G. Su, Virtual Active Network: A White Paper. http:// Prashant R. Chandra received his B.E. www.cs.columbia.edu/gongsu. in Electronics Engineering from [29] E. Takahashi, P. Steenkiste, J. Gao, A. Fisher, A Bangalore University in 1991, M.S. in Computer Engineering from West programming interface for network resource management, Virginia University in 1994 and Ph.D. in: Proceedings of the 1999 IEEE Open Architectures and in Computer Engineering from Car- Network Programming, New York, March 1999, pp. 34± negie Mellon University in 2000. He is 44. currently a network architect at Intel Corporation. His research interests are [30] J. Touch, S. Hotz, The X-Bone, in: The Third Global in the areas of programmable net- Internet Mini-conference in Conjunction with Globecom works, signaling protocols and trac '98, Sydney, Australia, November 1998. engineering. [31] J.E. van der Merwe, S. Rooney, I.M. Leslie, S.A. Crosby, The Tempest ± a practical framework for network programmability, IEEE Network 12 )3) )1998) 20±28. Peter Steenkiste is an Associate Pro- [32] G.R. Wright, W.R. Stevens, TCP/IP Illustrated. Vol. 2. fessor in the School of Computer Sci- The Implementation, Addison-Wesley, Reading, MA, ence and the Department of Electrical and Computer Engineering at Carne- 1995. gie Mellon University. He received the degree of Electrical Engineer from the University of Gent in Belgium in 1982, L. Keng Lim received his M.S. in In- and M.S. and Ph.D. degrees in Elec- formation Networking )2000) and B.S. trical Engineering from Stanford Uni- )1993) in Math/Computer Science versity in 1983 and 1987. His research from Carnegie Mellon University. His interests are in the area of network research interest is in virtualization of support for electronic services. network infrastructure for providing QoS to scalable overlay IP networks. He is currently a software engineer working on a new class of optical In- Hui Zhang is the Finmeccanica Asso- ternet switching systems that combines ciate Professor at the School of Com- optical and IP technologies at Laurel puter Science of Carnegie Mellon Networks Inc. University. He received a B.S. in Computer Science from Beijing Uni- versity in 1988, an M.S. in Computer Engineering from Rensselaer Poly- Jun Gao received his B.S. degrees in technic Institute in 1989, and Ph.D. in Engineering Physics and Computer Computer Science from the University Science in 1995 from Tsinghua Uni- of California at Berkeley in 1993. Hui versity, Beijing, China, and an M.S. Zhang's research interests are in degree in Nuclear Engineering in 1997 scalable solutions for QoS and value- from University of Virginia, and an added services over the Internet. He M.S. degree in Computer Science in received the National Science Foun- 1999 from Carnegie Mellon Univer- dation CAREER Award in 1996 and sity. He is currently a Ph.D. candidate the Alfred Sloan Fellowship in 2000. of Computer Science at Carnegie Mellon. His research interests include network resource management mech- anisms and customizable Internet ser- vices.

T.S. Eugene Ng received his B.S. in Computer Engineering from Univer- sity of Washington in 1995 and M.S. in Computer Science from Carnegie Mellon University in 1998. He is cur- rently a Ph.D. candidate in Computer Science at CMU. His thesis research focuses on developing a 3rd-party network service to enable connectivity across Internet networks of heteroge- neous address spaces, and perfor- mance optimization techniques in a wide range of 3rd-party network ser- vices.