Self-Configuring Network Traffic Generation
Joel Sommers and Paul Barford
University of Wisconsin–Madison

ABSTRACT

The ability to generate repeatable, realistic network traffic is critical in both simulation and testbed environments. Traffic generation capabilities to date have been limited to either simple sequenced packet streams typically aimed at throughput testing, or to application-specific tools focused on, for example, recreating representative HTTP requests. In this paper we describe Harpoon, a new application-independent tool for generating representative packet traffic at the IP flow level. Harpoon generates TCP and UDP packet flows that have the same byte, packet, temporal and spatial characteristics as measured at routers in live environments. Harpoon is distinguished from other tools that generate statistically representative traffic in that it can self-configure by automatically extracting parameters from standard Netflow logs or packet traces. We provide details on Harpoon's architecture and implementation, and validate its capabilities in controlled laboratory experiments using configurations derived from flow and packet traces gathered in live environments. We then demonstrate Harpoon's capabilities in a router benchmarking experiment that compares Harpoon with commonly used throughput test methods. Our results show that the router subsystem load generated by Harpoon is significantly different, suggesting that this kind of test can provide important insights into how routers might behave under actual operating conditions.

Categories and Subject Descriptors: C.2.6 [Computer-Communication Networks]: Internetworking—Routers; C.4 [Performance of Systems]: Measurement techniques, Modeling techniques, Performance attributes
General Terms: Measurement, Performance
Keywords: Traffic Generation, Network Flows

1. INTRODUCTION

The network research community has a persistent need to evaluate new algorithms, systems and protocols using tools that (1) create a range of test conditions similar to those experienced in live deployment and (2) ensure reproducible results [15, 30]. Having appropriate tools for generating scalable, tunable and representative network traffic is therefore of fundamental importance. Such tools are critical in laboratory test environments (such as [10, 11]) where they can be used to evaluate behavior and performance of new systems and real networking hardware. They are also critical in emulation environments (such as [48, 49]) and simulation environments (such as [9, 12]) where representative background traffic is needed. Without the capability to consistently create realistic network test conditions, new systems run the risk of unpredictable behavior and unacceptable performance when deployed in live environments.

Current best practices for traffic generation have focused on either simple packet streams or recreation of a single application-specific behavior. Packet streaming methods such as those used in tools like iperf [6] consist of sequences of packets separated by a constant interval. These methods form the basis for standard router performance tests such as those recommended in RFC 2544 [21] and RFC 2889 [40]. Another example is an infinite FTP source which is commonly used for traffic generation in simulations. While these approaches provide some insight into network system capabilities, they lack nearly all of the richness and diversity of packet streams observed in the live Internet [32, 34, 44].
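As a simple illustration of this style of traffic generation, the sketch below emits fixed-size UDP packets at a constant interval. It is not taken from iperf or from this paper; the function name, destination, packet size, rate, and duration are invented for the example.

    # Sketch of a constant-interval (constant bit rate) UDP packet stream.
    # All parameter values are illustrative only.
    import socket
    import time

    def constant_interval_stream(dst=("127.0.0.1", 9000),
                                 payload_bytes=1000,
                                 packets_per_second=1000,
                                 duration_seconds=10.0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = b"\x00" * payload_bytes
        interval = 1.0 / packets_per_second      # constant inter-packet gap
        deadline = time.time() + duration_seconds
        while time.time() < deadline:
            sock.sendto(payload, dst)            # same size, same spacing, every packet
            time.sleep(interval)

    if __name__ == "__main__":
        constant_interval_stream()

Such a stream exercises raw forwarding throughput, but, as noted above, it reproduces none of the burstiness, size diversity, or temporal variation seen in live traffic.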
A number of successful application-specific workload generators have been developed including [13, 18, 37]. These tools typically focus on generating application-level request sequences that result in network traffic that has the same statistical properties as live traffic from the modeled application. While they are useful for evaluating the behavior and performance of host systems (such as servers and caches), these tools can be cumbersome for use in router or switch tests and obviously only recreate one kind of application traffic, not the diversity of traffic observed in the Internet. These observations motivate the need for a tool capable of generating a range of network traffic such as might be observed either at the edges or core of the Internet.

The contribution of this paper is the description and evaluation of a new network traffic generator capable of recreating IP traffic flows (where flow is defined as a series of packets between a given IP/port pair using a specific transport protocol - see Section 3 for more detail) representative of those observed at routers in the Internet. We are aware of no other tools that target representative traffic generation at the IP flow level. Our approach is to abstract flow-level traffic generation into a series of application-independent file transfers that use either TCP or UDP for transport. Our model also includes both temporal (diurnal effects associated with traffic volume) and spatial (vis-à-vis IP address space coverage) components. The resulting constructive model can be manually parameterized or can be self-parameterized using Netflow [2] or packet trace data and components of the traffic generation tool itself. The self-parameterizing capability simplifies use and further distinguishes our system from other tools that attempt to generate statistically representative traffic.

We realized this model in a tool we call Harpoon that can be used in a router testbed environment or emulation testbed environment to generate scalable, representative network traffic. Harpoon has two components: client threads that make file transfer requests and server threads that transfer the requested files using either TCP or UDP. The result is byte/packet/flow traffic at the first hop router (from the perspective of the server) that is qualitatively the same as the traffic used to produce input parameters.
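To make the client/server split and the file-transfer abstraction concrete, the following is a minimal sketch under our own assumptions; it is not Harpoon's actual code, threading model, or wire protocol. A server thread answers each request with a "file" whose size is sampled from an empirical distribution (here a hard-coded stand-in for sizes that would be extracted from flow records), and a client simply requests transfers over TCP. The port number, the example size list, and all names are hypothetical.

    # Minimal sketch, not Harpoon's implementation: clients request transfers,
    # servers respond with "files" whose sizes are drawn from an empirical
    # distribution (a hard-coded stand-in for sizes taken from flow records).
    import random
    import socket
    import threading
    import time

    EMPIRICAL_FLOW_SIZES = [1_500, 4_096, 12_000, 65_536, 1_000_000]  # bytes; illustrative

    def server(port=5001):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(16)
        while True:
            conn, _ = srv.accept()
            size = random.choice(EMPIRICAL_FLOW_SIZES)   # sample the size distribution
            conn.sendall(b"x" * size)                    # the "file transfer" = one TCP flow
            conn.close()

    def client(host="127.0.0.1", port=5001, transfers=10):
        for _ in range(transfers):                       # each request produces one flow
            s = socket.create_connection((host, port))
            while s.recv(65536):                         # read until the server closes
                pass
            s.close()

    if __name__ == "__main__":
        threading.Thread(target=server, daemon=True).start()
        time.sleep(0.5)                                  # give the listener time to start
        client()

The point of the sketch is only the structure: because each transfer rides on a real transport connection, packet-level behavior such as segmentation and congestion control comes from the protocol stack rather than from the generator itself.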
We evaluated Harpoon's capabilities in two ways. First, we conducted experiments in a controlled laboratory environment using two different data sets to show that we can qualitatively recreate the same traffic using Harpoon. We used a one week data set of Netflow records collected at a border router of the University of Wisconsin-Madison and a series of packet traces (from which we constructed flow records using a standard tool) collected at the border of the University of Auckland. These experiments demonstrate Harpoon's capability to generate representative flow level traffic. Next, we used the same laboratory environment to conduct a throughput evaluation of a Cisco 6509 switch/router following the testing protocol described in RFC 2889. We show that while the throughputs achieved using standard packet streaming methods and Harpoon are similar, the loads placed on router subsystems are substantially different. The implication of this result is that a tool like Harpoon can augment standard test suites to give both router designers and network operators insight into system behavior under actual operating conditions.

This paper is organized as follows. After discussing related work in §2, we describe the design requirements and architecture of our flow-level workload model in §3. We present the details of Harpoon's implementation in §4 and the results of the validation tests in §5.

2. RELATED WORK

… measured at a router in a live network. Our model can be realized in a tool for generating representative packet traffic in a live (or emulated) network environment or for creating traffic in a simulation environment.

An approach similar to ours has been used in a number of application-specific workload generators. In [18], the authors describe the SURGE Web workload generator which was built by combining distributional properties of Web use. A similar approach was used in developing Web Polygraph [13] and GISMO [37] which are used to generate scalable, representative Web cache and streaming workloads respectively. Harpoon differs from these tools in that it is application independent, uses empirical distributions, generates both UDP and TCP traffic and includes both spatial and temporal characteristics. It should also be noted that there are several commercial products that target application-specific workload generation including [8].

A model closely related to ours was introduced by Feldmann et al. in [28]. That model generalizes the SURGE model for Web traffic for the purpose of general background traffic generation in the ns-2 simulator. A similar model based on connection-rate superposition was developed by Cleveland et al. in [25]. Our model is more general in that it has no Web-specific parameters, includes the capability to transfer files via UDP and incorporates temporal and spatial request characteristics. Uhlig presents a flow level traffic model in [47] that is also similar to ours. That model, however, uses three parameters (compared with the eight that we use) and does not rely on an underlying transport protocol to transfer individual files. Our work is also distinct from each of these models in that we create and evaluate a tool (Harpoon) that can be used for traffic generation on real systems and that can be automatically parameterized from flow records, thus requiring no additional modeling steps.

In contrast to workload generators based on models designed to synthesize distributional characteristics are tools that replay traffic based on raw packet traces. tcpreplay and flowreplay [46], for example, take a very low-level approach by attempting to generate packets (while optionally rewriting IP addresses) that mirror the specific timings recorded in a packet trace. Monkey-See Monkey-Do [23] …