CREID: Development of reliable and scalable DHCP system for carrier IP networks

Katsuhiro Naito Department of Electrical and Electronic Engineering, Mie University, 1577 Kurimamachiya, Tsu, 514-8507, Japan Email: [email protected] Makoto Nishide Net Step Inc. 213 Obatachomiyamae, Ise, 519-0504, Japan Email: [email protected] Eiji Miyazoe OSS BroadNet Inc. 3-5-7 Hisamoto, Takatsuku, Kawasaki, 213-0011, Japan Email: [email protected]

Abstract— Dynamic Host Configuration Protocol (DHCP) is an essential service for configuring network information on user terminals in Internet service providers (ISPs), and many DHCP programs have been released on the Internet. However, few free DHCP programs achieve the reliability and scalability that ISPs require. In this paper, we develop a reliable and scalable DHCP system called CNR Emulator on ISC-DHCP (CREID), based on free software such as ISC-DHCP, Distributed Replicated Block Device (DRBD), and Pacemaker. ISC-DHCP supports Internet Protocol version 4 (IPv4) and IPv6, both of which are required in commercial ISP services. DRBD and Pacemaker can build a cluster across several physical computers. The numerical results show that the developed DHCP system achieves the DHCP transaction performance required in commercial ISPs and a service availability above 99.999%.

Keywords— DHCP, Carrier IP networks, ISC-DHCP, DRBD, Pacemaker

I. INTRODUCTION

Dynamic Host Configuration Protocol (DHCP) [1], [2] has become an increasingly important function for configuring networks automatically as computer networks have developed. In particular, Internet Service Providers (ISPs) require DHCP to manage user terminals and assign IP addresses [3], [4], [5], [6]. They therefore need reliable and scalable DHCP systems to provide stable network services. As a result, high-end DHCP systems for commercial ISPs have achieved high reliability and scalability [7]. These systems achieve high transaction performance for DHCP requests and support fail-over mechanisms against server troubles. On the other hand, they are expensive owing to the lack of competitors. It is therefore difficult for small ISPs, which have limited capital investment budgets, to employ them even though they offer better and more stable performance.

ISC-DHCP [8] is the well-known free DHCP software provided by Internet Systems Consortium. The implementation of ISC-DHCP is distributed with various kinds of UNIX OS, so many users run it in local networks. However, the transaction performance of ISC-DHCP is not high compared with the requirements of large commercial ISPs. Additionally, the fail-over mechanisms of ISC-DHCP are not sufficient to achieve a stable DHCP service in commercial ISPs [9], [10], [11].

In this paper, we develop a reliable and scalable DHCP system called CNR Emulator on ISC-DHCP (CREID), based on free software such as ISC-DHCP, DRBD [12], and Pacemaker [13]. ISC-DHCP supports IPv4 and IPv6, both of which are required in commercial ISP services. DRBD and Pacemaker can build a cluster across several physical computers. Therefore, CREID can provide the functions required in commercial ISPs and can achieve reliability through cluster fail-over mechanisms. The evaluation experiments show that the developed DHCP system achieves the DHCP transaction performance required in commercial ISPs and a service availability above 99.999%.

II. FUNDAMENTAL PERFORMANCE OF ISC-DHCP

The developed DHCP system employs ISC-DHCP as its underlying DHCP software. It is therefore important to evaluate the fundamental performance of ISC-DHCP before designing the system. Table I shows the hardware specifications used in the fundamental evaluation of ISC-DHCP.

TABLE I
HARDWARE SPECIFICATIONS IN FUNDAMENTAL EVALUATION

Item      DHCP Server                    DHCP Client
CPU       Intel Xeon(R) E5620 2.40GHz    Intel Celeron G1101 2.26GHz
Memory    8GB                            1GB
HDD       300GB, 6G SAS, 15000rpm        250GB, SATA2, 7200rpm
Network   1Gbps
OS        Scientific Linux 6.1
DHCP      ISC-DHCP 4.1-ESV-R2

Generally, IP address ranges are registered in a DHCP server beforehand. The transaction performance of assigning an IP address deteriorates as the IP address range grows. In this evaluation, we use dhcperf [14], an evaluation tool for DHCP services. We measure the DHCP Discover transaction performance, where one transaction includes Discover, Offer, Request, Ack, and Release.

Fig. 1 shows the DHCP transaction performance versus the IP address range. The results show that the transaction performance deteriorates drastically as the IP address range increases. For example, a Common Antenna TeleVision (CATV) operator serves 50,000 Cable Modems (CMs) with one Cable Modem Termination System (CMTS), and the IP address range should be set to twice the number of CMs. As a result, the transaction performance of ISC-DHCP in large networks is not sufficient for commercial ISPs.

Fig. 1. DHCP transaction performance of ISC-DHCP (transaction performance [transactions/s] versus number of IP address range).

Some ISPs register subscribers' MAC address information in the DHCP system to provide appropriate services. ISC-DHCP always reads the MAC address information when a DHCPD process boots, so the boot time depends on the number of registered MAC addresses. ISC-DHCP cannot provide DHCP service while a DHCPD process is booting. Boot time is therefore an important factor in achieving high service availability.

Fig. 2 shows the boot time of the DHCPD process versus the number of registered MAC addresses. The boot time increases up to a few seconds as the number of MAC addresses increases. DHCP client software generally implements retry mechanisms for the case where DHCP service is temporarily unavailable. Therefore, such a short boot time does not affect the availability of DHCP service.

Fig. 2. Boot time of DHCPD process with registered MAC address (boot time of DHCP process [s] versus number of registered MAC addresses).

ISC-DHCP servers add lease information to lease files when DHCP clients request a new IP address or renew an assigned IP address. The lease files therefore grow over the service period. In ISC-DHCP, DHCPD processes optimize the lease files when they boot.

Fig. 3 shows the boot time of the DHCPD process with leased records. The boot time increases as the number of leased IP addresses increases. However, the maximum boot time is short and does not affect the availability of DHCP service when retry mechanisms are implemented at DHCP clients.

Fig. 4 shows the boot time of the DHCPD process with timeout records. The boot time increases drastically with the number of timeout records. Fig. 5 shows the boot time of the DHCPD process with renewed records. The boot time increases as the number of renewed records increases. While the DHCPD process is booting, the DHCP service is not available. Therefore, reducing the boot time is important in commercial use.

Fig. 3. Boot time of DHCPD process with leased records (boot time of DHCP process [s] versus number of leased IP addresses).

Fig. 4. Boot time of DHCPD process with timeout records of ISC-DHCP (boot time of DHCP process [s] versus number of lease timeout terminals).

Fig. 5. Boot time of DHCPD process with renewed records of ISC-DHCP (boot time of DHCP process [s] versus number of records).

III. SERVICE AVAILABILITY

Generally, system managers of DHCP services edit configuration files according to user contracts and network maintenance. DHCPD processes must be restarted to reload the latest configuration files whenever the files are edited. During the restart, DHCP services are down. DHCP services are also down when the DHCP system switches from the main server to the backup server because of main server troubles. Therefore, the service availability of DHCP services is an important factor in commercial use. In this section, we consider the service availability ratio for DHCP services.

First, we classify the reasons for restarting into four categories: short-term maintenance, long-term maintenance, server troubles, and optimization of lease files.

• Short-term maintenance
System managers edit configuration files according to user contracts day by day. We define the short-term maintenance frequency as $P_{short}$ and the required boot time of processes as $T_{short}$.
• Long-term maintenance
System managers edit configuration files according to network maintenance, policy changes, and redefinition of subnets, IP address ranges, etc. We define the long-term maintenance frequency as $P_{long}$ and the required boot time of processes as $T_{long}$.
• Server troubles
DHCP systems should switch from the main server to a backup server to continue DHCP services to user terminals when troubles happen on the main server. We define the occurrence frequency of main server troubles as $P_{fail}$ and the required switching period from the main server to the backup server as $T_{fail}$.
• Optimization of lease files
ISC-DHCP requires a long period to optimize the lease files when they include timeout or renewed records. Therefore, system managers should restart DHCP processes periodically to reduce the size of the lease files. We define the periodic restart frequency as $P_{opt}$ and the required boot time of DHCPD processes as $T_{opt}$.

The optimization process is performed whenever the DHCPD process is restarted. Therefore, we redefine the required times including the optimization process as $T'_{short}$, $T'_{long}$, and $T'_{fail}$.

DHCP clients renew an assigned IP address when the lease time ends, and retry several times when they cannot receive DHCP messages from DHCP servers. During the retry period, a client continues to use the assigned IP address. Therefore, we should consider the timeout period of the DHCP renew process. We define the timeout period at the DHCP client as $T_{timeout}$. The actual service down period depends on each period and the timeout period. For example, the actual service down period due to short-term maintenance, $T''_{short}$, is

$$
T''_{short} =
\begin{cases}
0 & (T'_{short} \le T_{timeout}) \\
T'_{short} - T_{timeout} & (T'_{short} > T_{timeout}).
\end{cases}
\tag{1}
$$

Then, we can calculate the total service down period per day as

$$
T_{down} = P_{short} T''_{short} + P_{long} T''_{long} + P_{fail} T''_{fail} + P_{opt} T_{opt}.
\tag{2}
$$

When we assume that each user terminal boots up independently, the number of DHCP clients that request to renew an assigned IP address is

$$
N_{lease} = N_{cm} / T_{lease},
\tag{3}
$$

where $N_{cm}$ is the number of user terminals and $T_{lease}$ is the lease period of DHCP services. Then we can obtain the service available ratio per user terminal as

$$
P_{available} = 1 - N_{lease} T_{down} / N_{cm}.
\tag{4}
$$

IV. CREID SYSTEM

A. System model

Fig. 6 shows the system model of CREID. The figure assumes a CATV network in which CREID provides DHCP services to cable modems (CMs). To improve on the transaction performance of ISC-DHCP, CREID runs multiple DHCPD processes, each of which provides independent DHCP service for CMs. Additionally, we construct a cluster with DRBD and Pacemaker to provide fail-over mechanisms. As a result, CREID can switch from the main server to the backup server when troubles happen in the main server.

B. Experimental results

We perform experimental measurements of CREID to evaluate its performance. In the measurements, we use the hardware shown in Table II.

[Fig. 6 diagram. Labeled components: Main DHCP Server and Backup DHCP Server (each with dhcpd processes 1 to L, one MAC list per process, and DRBD+Pacemaker), network interfaces (IF), two switches, CMTS, and Management Server.]

Fig. 6. System model of CREID.
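Fig. 6 gives each dhcpd process its own MAC list. The text does not specify how subscribers' MAC addresses are divided among the processes, so the following Python sketch only illustrates one possible scheme: a deterministic hash of the MAC address. The function names `assign_process` and `partition_macs` and the CRC32-based mapping are hypothetical and are not taken from CREID.

```python
import zlib


def assign_process(mac: str, num_processes: int) -> int:
    """Map a MAC address to a dhcpd process index in [0, num_processes).

    Hypothetical scheme: CRC32 of the normalized MAC, modulo the process count.
    """
    normalized = mac.lower().replace("-", ":")
    return zlib.crc32(normalized.encode("ascii")) % num_processes


def partition_macs(macs, num_processes):
    """Split a list of MAC addresses into one list per dhcpd process."""
    lists = [[] for _ in range(num_processes)]
    for mac in macs:
        lists[assign_process(mac, num_processes)].append(mac)
    return lists


if __name__ == "__main__":
    # Synthetic MAC addresses, for illustration only.
    macs = [f"00:11:22:33:{i // 256:02x}:{i % 256:02x}" for i in range(1000)]
    for index, mac_list in enumerate(partition_macs(macs, 4)):
        print(f"dhcpd {index}: {len(mac_list)} MAC addresses")
```

Because the mapping depends only on the MAC address and the process count, a given modem would always be served by the same process under this scheme, which keeps each per-process MAC list small.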

TABLE II
HARDWARE SPECIFICATIONS IN CREID

Item        DHCP Server                    DHCP Client
CPU         Intel Xeon(R) E5620 2.40GHz    Intel Celeron G1101 2.26GHz
Memory      8GB                            1GB
HDD         300GB, 6G SAS, 15000rpm        250GB, SATA2, 7200rpm
Network     1Gbps
OS          Scientific Linux 6.1
DHCP        ISC-DHCP 4.1-ESV-R2
Pacemaker   1.0.11
Heartbeat   3.0.5
DRBD        8.4.1

Fig. 7. DHCP Discover transaction performance of CREID (transaction performance [transactions/s] versus number of DHCPD processes).

1) Clustering performance: CREID employs DRBD and Pacemaker to provide fail-over mechanisms for the DHCP service. Clustering mechanisms generally incur performance overhead due to the additional processing of the clustering layer. CREID must access the hard disk whenever an IP address is leased, so I/O performance is an important factor in CREID. In the measurements, CREID mounts a disk with the synchronous option. We then measure the time needed to copy sixty files from the primary server to the backup server. The size of each file is 10 [MBytes].

The results show that the throughput without DRBD is 56.34 [Mbps], and that with DRBD is 3.81 [Mbps]. Since lease files for DHCP processes contain only text information, the throughput with DRBD is sufficient to exchange it.

2) Transaction performance with multiple DHCPD processes: Fig. 7 shows the DHCP Discover transaction performance of CREID. Fig. 8 shows the DHCP Renew transaction performance of CREID. The results show that the performance can be improved by selecting an adequate number of DHCPD processes. In CATV networks, the required transaction performance is between 100 and 300 transactions per second [15]. Therefore, CREID can achieve the required performance by using free software.

Fig. 8. DHCP Renew transaction performance of CREID (transaction performance [transactions/s] versus number of DHCPD processes).

3) Boot time performance with multiple DHCPD processes: Fig. 9 shows the boot time of the DHCPD processes of CREID with registered MAC addresses. The boot time is less than 200 [ms] even as the number of registered MAC addresses increases. CREID allocates a subset of the MAC addresses to each DHCPD process, which reduces the number of registered MAC addresses per process.

Fig. 10 shows the boot time of the DHCPD processes with leased records. CREID keeps the boot time short as the number of leased IP addresses increases.

Fig. 11 shows the boot time of the DHCPD processes with timeout records. The boot time remains quite short even as the number of timeout records increases. Fig. 12 shows the boot time of the DHCPD processes with renewed records. CREID also reduces the boot time as the number of renewed records increases. These measurements show that CREID can provide a scalable DHCP system for the large numbers of user terminals found in commercial ISPs.

Fig. 9. Boot time of DHCPD process with registered MAC address of CREID (boot time of DHCP processes [s] versus number of registered MAC addresses).

Fig. 10. Boot time of DHCPD process with leased records of CREID (boot time of DHCP processes [s] versus number of leased IP addresses).

Fig. 11. Boot time of DHCPD process with timeout records of CREID (boot time of DHCP processes [s] versus number of lease timeout terminals).

4) Processing performance: Table III shows the memory usage and swap memory usage with multiple DHCPD processes. The memory usage increases with the number of DHCPD processes, but the increase is not large. Therefore, the system will work stably when enough physical memory is installed.

Table IV shows the CPU load with multiple DHCPD processes. The maximum value of r increases with the number of DHCPD processes. However, the average value of r is smaller than the maximum value, so the CPU load is not constantly high. Additionally, the maximum value of b increases, but the value of b is smaller than the value of r. A server system generally works well when the average CPU load is small. Therefore, CREID also works well on real hardware.

C. Service available ratio

We evaluate the service available ratio of CREID according to the experimental results. We assume that CREID provides DHCP services to 50,000 user terminals. The lease time is set to 24 hours, $P_{short}$ to 10 times per day, $P_{long}$ to once per three months, and $P_{opt}$ to once per day. From the numerical results, the boot time is about 0.2 [s]. Therefore, we set $T_{short}$, $T_{long}$, and $T_{opt}$ to 0.25 [s].

Additionally, the measured switching period of the fail-over mechanisms is about 23.2 [s]. Therefore, we set $T_{fail}$ to 23.5 [s] and $P_{fail}$ to once per year. In the ISC-DHCP client implementation, the timeout period is 10 [s]. However, to evaluate a more critical situation, we set $T_{timeout}$ to 0 [s]. The parameters are summarized in Table V.

From the equations in Section III, we obtain a $T_{down}$ of 5.3 [s] and a $P_{available}$ of 99.9991%. Therefore, CREID can achieve high service availability according to the experimental measurements.

V. CONCLUSIONS

This paper proposed a reliable and scalable DHCP system called CREID. The developed system consists of free software such as ISC-DHCP, DRBD, and Pacemaker. The experimental measurements show that CREID can achieve the high transaction performance required in commercial ISPs and a service available ratio of more than 99.999%.

TABLE III
MEMORY AND SWAP MEMORY USAGE

Number of processes       1         10          20          25          40          50          80          100
Memory usage (KB)         830,144   1,003,468   1,141,088   1,214,228   1,424,456   1,577,352   1,999,016   2,289,708
Swap memory usage (KB)    0         0           0           0           0           0           0           0
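As a rough check on how memory usage scales in Table III, the following Python sketch computes the average memory increase per additional DHCPD process between the smallest and largest configurations. The helper name `incremental_kb_per_process` is hypothetical, and the endpoint average is only one simple way to summarize the table.

```python
# Values copied from Table III.
PROCESSES = [1, 10, 20, 25, 40, 50, 80, 100]
MEMORY_KB = [830_144, 1_003_468, 1_141_088, 1_214_228,
             1_424_456, 1_577_352, 1_999_016, 2_289_708]


def incremental_kb_per_process(processes, memory_kb):
    """Average memory increase (KB) per added process, from first to last row."""
    return (memory_kb[-1] - memory_kb[0]) / (processes[-1] - processes[0])


if __name__ == "__main__":
    per_process = incremental_kb_per_process(PROCESSES, MEMORY_KB)
    print(f"about {per_process:,.0f} KB per additional DHCPD process")
```

On these figures the average is roughly 14,700 KB per added process, and swap usage stays at zero in every configuration.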

TABLE IV
CPU LOAD

Number of processes                 1        10       20        25        40        50        80        100
Maximum value of r                  1        7        19        22        25        34        58        66
Average value of r                  0        1        1         2         1         7         12        7
Maximum usage ratio of users (%)    7        4        5         6         7         9         13        15
Average usage ratio of users (%)    6        3        4         4         6         7         11        14
Maximum value of b                  3        2        3         3         2         3         3         5
Average value of b                  1        1        1         2         1         1         2         2
Maximum CPU wait ratio (%)          9        19       18        17        17        19        17        17
Average CPU wait ratio (%)          6        16       16        16        15        17        15        14
Increased value of bi               1,202    746      220       452       208       630       179       347
Increased value of bo               28,902   96,571   105,216   111,363   124,097   137,204   161,086   184,383

TABLE V
PARAMETERS FOR EVALUATING AVAILABILITY

Number of CMs           50,000
Lease period of DHCP    24 hours
Pshort                  10 times / day
Tshort                  0.25 s
Plong                   1 time / 3 months
Tlong                   0.25 s
Pfail                   1 time / year
Tfail                   23.5 s
Poptimization           1 time / day
Toptimization           0.25 s
Ttimeout                0 s
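The following Python sketch evaluates Eqs. (1) to (4) with the parameters of Table V. It rests on two assumptions that the text does not state explicitly: each restart time including optimization is taken as T' = T + T_opt, and "once per three months" and "once per year" are converted to 1/90 and 1/365 events per day. Under these assumptions the sketch gives a T_down of about 5.3 s per day, in line with the reported value. All function names are illustrative.

```python
def effective_down(t_restart: float, t_timeout: float) -> float:
    """Eq. (1): downtime left after the client-side retry window."""
    return max(0.0, t_restart - t_timeout)


def total_down_per_day(p_short, t_short, p_long, t_long,
                       p_fail, t_fail, p_opt, t_opt, t_timeout):
    """Eq. (2): total service down period per day, in seconds."""
    # Assumption: every restart also runs the lease-file optimization,
    # so T' = T + T_opt for the short, long, and fail cases.
    tp_short = t_short + t_opt
    tp_long = t_long + t_opt
    tp_fail = t_fail + t_opt
    return (p_short * effective_down(tp_short, t_timeout)
            + p_long * effective_down(tp_long, t_timeout)
            + p_fail * effective_down(tp_fail, t_timeout)
            + p_opt * t_opt)


def available_ratio(n_cm: float, t_lease: float, t_down: float) -> float:
    """Eqs. (3) and (4); t_lease and t_down must use the same time unit."""
    n_lease = n_cm / t_lease                 # Eq. (3)
    return 1.0 - n_lease * t_down / n_cm     # Eq. (4)


if __name__ == "__main__":
    t_down = total_down_per_day(
        p_short=10, t_short=0.25,        # 10 times per day
        p_long=1 / 90, t_long=0.25,      # assumed: 3 months = 90 days
        p_fail=1 / 365, t_fail=23.5,     # assumed: 1 year = 365 days
        p_opt=1, t_opt=0.25,             # once per day
        t_timeout=0.0,                   # critical case used in the evaluation
    )
    print(round(t_down, 2))  # 5.32
```

The short-term maintenance term dominates the total: ten restarts per day at 0.5 s each account for 5.0 s of the 5.32 s. `available_ratio` implements Eqs. (3) and (4) as written, but the text does not state the time unit used for the lease period when deriving the reported ratio, so the sketch does not attempt to reproduce that figure.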

Fig. 12. Boot time of DHCPD process with renewed records of CREID (boot time of DHCP processes [s] versus number of records).

ACKNOWLEDGMENT

Authors thank Casa Systems, Inc. and OSS BroadNet Inc. for experimental measurements.

REFERENCES

[1] R. Droms, “Dynamic Host Configuration Protocol,” IETF RFC2131, March 1997.
[2] R. Droms, J. Bound, B. Volz, T. Lemon, C. Perkins, and M. Carney, “Dynamic Host Configuration Protocol for IPv6 (DHCPv6),” IETF RFC3315, July 2003.
[3] S. Alexander and R. Droms, “DHCP Options and BOOTP Vendor Extensions,” IETF RFC2132, March 1997.
[4] J. Littlefield, “Vendor-Identifying Vendor Options for Dynamic Host Configuration Protocol version 4 (DHCPv4),” IETF RFC3925, October 2004.
[5] Cable Television Laboratories Inc., “Operations Support System Interface Specification,” Cable Television Laboratories, Inc., CM-SP-OSSIv3.0-I07-080522, May 2008.
[6] Cable Television Laboratories Inc., “CableLabs’ DHCP Options Registry,” Cable Television Laboratories, Inc., CL-SP-CANN-DHCP-Reg-I02-080306, March 2008.
[7] Cisco, “Cisco Network Registrar,” http://www.cisco.com, retrieved: January 2012.
[8] Internet Systems Consortium, ISC-DHCP, http://www.isc.org, retrieved: January 2012.
[9] R. Droms, K. Kinnear, M. Stapp, B. Volz, S. Gonczi, G. Rabil, M. Dooley, and A. Kapur, “Draft, DHCP Failover Protocol,” IETF INTERNET DRAFT, March 2003.
[10] http://paulroberts69.wordpress.com/2011/10/27/isc-dhcp-failover-is-just-too-complex/, retrieved: January 2012.
[11] http://www.accumuli.com/using-infoblox-dhcp-failover-part-1-i-3232.php, retrieved: January 2012.
[12] B. Hellman, F. Haas, P. Reisner, and L. Ellenberg, “The DRBD User's Guide,” http://www.drbd.org/users-guide/, retrieved: March 2012.
[13] A. Beekhof, “Pacemaker 1.0 Configuration Explained,” http://www.clusterlabs.org, retrieved: January 2012.
[14] dhcperf, http://www.nominum.com, retrieved: January 2012.
[15] http://cn.teldevice.co.jp/product/infoblox/ib spec.html, retrieved: January 2012.