Swing: Realistic and Responsive Network Traffic Generation

1 Swing: Realistic and Responsive Network Traffic Generation Kashi Venkatesh Vishwanath and Amin Vahdat University of California, San Diego {kvishwanath, vahdat}@cs.ucsd.edu Abstract— This paper presents Swing, a closed-loop, network- instance, and faithfully reproduce the corresponding effects responsive traffic generator that accurately captures the packet on the generated traffic. interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the Usage of link over time network, Swing automatically extracts distributions for user, 100 application, and network behavior. It then generates live traffic HTTP corresponding to the underlying models in a network emulation NNTP 80 Mawi environment running commodity network protocol stacks. We Mawi3 Mawi2 SMTP find that the generated traffic is statistically similar to the original Napster traffic. Further, to the best of our knowledge, we are the first to 60 reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial 40 sensitivity analysis reveals the importance of our individual model parameters to accurately reproduce such burstiness. Finally, we Percentage by bytes 20 explore Swing’s ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios. 0 Time from 04/2001 − 02/2006 Index Terms— Traffic Generator, Modeling, Wavelet Scaling, Burstiness, Structural Model. Fig. 1. Trends in usage of protocol by bytes over time for Mawi traces. Second, we require techniques to populate the model from I. INTRODUCTION existing packet traces [11]–[13] to validate its efficacy in capturing trace conditions. That is, traffic based on a model The goal of this work is to design a framework capable populated from a given packet trace should reproduce the of generating live network traffic representative of a wide essential characteristics of the original trace. Of course, the range of both current and future scenarios. Such a framework model can be populated from a designer’s “first principles” as would be valuable in a variety of settings [1], including: well, enabling traffic generation corresponding to a variety of capacity planning [2], high-speed router design, queue man- scenarios, both real and projected. agement studies [3], worm propagation models [4], bandwidth In this paper, we present the design, implementation, and measurement tools [5], [6], network emulation [7], [8] and evaluation of Swing, a traffic generation tool that addresses simulation [9]. We define traffic generation to result in a time- these challenges. The principal contribution of our work is an stamped series of packets arriving at and departing from a understanding of the requirements for matching the burstiness particular network interface with realistic values for at least of the packet arrival process of an original trace at a variety of the layer 3 (IP) and layer 4 (TCP/UDP) headers. This traffic timescales, ranging from fine grained (1 ms) to coarse grained should accurately reflect arrival rates and variances across a (multiple minutes). Swing matches burstiness for: i) both bytes range of time scales, e.g., capturing both average bandwidth and packets, ii) both directions (arriving and departing) of and burstiness. The traffic should further appropriately map to a network interface, iii) a variety of individual applications flow and packet-size distributions, e.g., capturing flow arrival within a trace (e.g., HTTP, P2P, SMTP, NNTP, etc.), and iv) rate, length distributions, etc. original traces at a range of speeds and taken from a variety We consider two principal challenges to achieving this goal. of locations. First, we require an underlying model with simple, semanti- Critical to the success of this effort is our ability to both cally meaningful parameters that fully specify the character- measure and reproduce in our traffic generation infrastructure istics of a given trace. By semantically meaningful, we mean the prevailing wide-area network characteristics at the time of that it should be straightforward to map high-level applica- the trace. Earlier work shows that it is possible to recreate tion and network conditions to the model. It is well known aggregate trace characteristics (e.g., average bandwidth over that traffic characteristics across a given link change over a period of minutes) without reproducing wide-area network time for a variety of reasons including changing application conditions [14]. We show that reproducing burstiness at a mix [10]. Consider packet-level traces from the Wide working range of timescales (especially sub-RTT) requires recreating group [11] over a five-year period in Figure 1, for instance, network conditions for the transmitting/receiving hosts in the that shows a shift in popularity among HTTP, NAPSTER, original trace. Of course, extracting wide-area network con- NNTP and SMTP. Any traffic generation model should extend ditions from a single packet trace is a difficult problem [15], the ability to express such changes in application mix, for [16]. Thus, a contribution of this work is an understanding of the extent to which existing techniques for passively measur- in correspondingly meaningful changes in the characteristics ing wide-area network conditions are sufficient to accurately of the resulting trace. reproduce the burstiness of a given trace and whether chang- By maximally random, we mean that a trace generation ing assumptions about prevalent network conditions result in tool should be able to generate a family of traces constrained correspondingly meaningful changes in resulting traces (e.g., only by the target characteristics of the original trace and not based on halving or doubling the prevalent round trip time). the particular pattern of communication in the trace. Thus, Given the ability to reproduce original trace characteristics, multiple traces generated to follow a given set of charac- Swing can then be used to explore a variety of what-if sce- teristics should vary (perhaps significantly) across individual narios by tuning user, application, and network parameters. connections while still following the appropriate underlying Further, because Swing uses real TCP connections and closed- distributions. This requirement eliminates the trivial solution loop sessions, the generated traffic is “responsive” for instance, of simply replaying the exact same connections in the exact to changing network conditions, or competing application same order seen in some original trace. While quantifying the traffic. extent to which we are successful in generating maximally In the Sections that follow we first describe our methodol- random traffic is beyond the scope of this paper, this require- ogy (§ II-C) to parameterize (§ III-A) a given trace and use it ment significantly influences our approach to, and architecture to generate (§ III-C) live traffic and the corresponding packet for, trace generation. trace. We then validate (§ V) Swing’s ability to faithfully reproduce trace characteristics, critically examine (§ V-C) the B. Overview sensitivity of generated traces to individual model parameters and finally explore Swing’s ability to project (§ V-D) traffic Our hypothesis is that realistic and responsive packet gen- characteristics into the future. eration must be informed by accurate models of: i) the users and programs initiating communication across the target link, II. THE SWING APPROACH ii) the hardware, software, and protocols hosting the programs, and iii) the large space of other network links responsible for A. Requirements carrying packets to and from the target link. Without modeling This section describes our goals, assumptions, and approach users, it is impossible to study the effects of architectural to packet trace generation. We extract bi-directional char- changes on end users or to capture the effects of changing acteristics of packets traversing a single network link, our user behavior (e.g., if user patience for retrieving web content target. Before we describe our approach to trace generation, is reduced). Similarly, without an understanding of a wide mix we present our metrics for success: realism, responsiveness, of applications and protocols, it is difficult to understand the and maximally random. effects of evolving application popularity on traffic patterns at Packet trace generation is not an end in itself, rather a tool a variety of timescales. Finally, the bandwidth, latency, and to aid higher-level studies. Thus, the definition of realism for a loss rate of the links upstream and downstream of the target trace generation mechanism must be considered in the context affect packet inter-arrival characteristics and TCP transmission of its usage. For instance, generating traces for capacity plan- behavior of the end hosts communicating across the target link ning likely only requires matching aggregate trace characteris- (see § V-C). tics such as bandwidth over relatively coarse timescales. Trace We first describe our methodology for trace generation generation for high-speed router design, queue management assuming perfect knowledge of these three system compo- policies, bandwidth estimation tools, or flow classification nents. In the following sections we describe how to relax algorithms have much more stringent requirements. this assumption, extracting approximate values for all of these We aim to generate realistic traces for a range of usage sce- characteristics

Load more