Quick viewing(Text Mode)

A First Look at Modern Enterprise Traffic

A First Look at Modern Enterprise Traffic

A First Look at Modern Enterprise Traffic

Ruoming Pang†, Mark Allman‡, Mike Bennett¶, Jason Lee¶, Vern Paxson‡,¶, Brian Tierney¶ †Princeton University, ‡International Computer Science Institute, ¶Lawrence Berkeley National Laboratory (LBNL)

Abstract study of OSPF routing behavior in [21]. Our aim is to com- plement that study with a look at the make-up of traffic as While wide-area traffic has been heavily studied seen at the packet level within a contemporary enterprise for many years, the characteristics of traffic inside Inter- network. net enterprises remain almost wholly unexplored. Nearly all of the studies of enterprise traffic available in the liter- One likely reason why enterprise traffic has gone un- ature are well over a decade old and focus on individual studied for so long is that it is technically difficult to mea- LANs rather than whole sites. In this paper we present sure. Unlike a site’s Internet traffic, which we can generally a broad overview of internal enterprise traffic recorded at record by monitoring a single access link, an enterprise of a medium-sized site. The packet traces span more than significant size lacks a single choke-point for its internal 100 hours, over which activity from a total of several thou- traffic. For the traffic we study in this paper, we primarily sand internal hosts appears. This wealth of data—which recorded it by monitoring (one at a time) the enterprise’s we are publicly releasing in anonymized form—spans a two central routers; but our measurement apparatus could wide range of dimensions. While we cannot form general only capture two of the 20+ router ports at any one time, so conclusions using data from a single site, and clearly this we could not attain any sort of comprehensive snapshot of sort of data merits additional in-depth study in a number of the enterprise’s activity. Rather, we piece together a partial ways, in this work we endeavor to characterize a number of view of the activity by recording a succession of the enter- the most salient aspects of the traffic. Our goal is to provide prise’s subnets in turn. This piecemeal tracing methodol- a first sense of ways in which modern enterprise traffic is ogy affects some of our assessments. For instance, if we similar to wide-area Internet traffic, and ways in which it is happen to trace a portion of the network that includes a quite different. large mail server, the fraction of mail traffic will be mea- sured as larger than if we monitored a subnet without a mail server, or if we had an ideally comprehensive view of 1 Introduction the enterprise’s traffic. Throughout the paper we endeavor to identify such biases as they are observed. While our When C´aceres captured the first published measurements methodology is definitely imperfect, to collect traces from of a site’s wide-area Internet traffic in July, 1989 [4, 5], a site like ours in a comprehensive fashion would require a the entire Internet consisted of about 130,000 hosts [13]. large infusion of additional tracing resources. Today, the largest enterprises can have more than that many Our study is limited in another fundamental way, namely hosts just by themselves. that all of our data comes from a single site, and across only It is striking, therefore, to realize that more than 15 years a few months in time. It has long been established that after studies of wide-area Internet traffic began to flourish, the wide-area Internet traffic seen at different sites varies the nature of traffic inside Internet enterprises remains al- a great deal from one site to another [6, 16] and also over most wholly unexplored. 
The characterizations of enter- time [16, 17], such that studying a single site cannot be rep- prise traffic available in the literature are either vintage resentative. Put another way, for wide-area Internet traffic, LAN-oriented studies [11, 9], or, more recently, focused the very notion of “typical” traffic is not well-defined. We on specific questions such as inferring the roles played by would expect the same to hold for enterprise traffic (though different enterprise hosts [23] or communities of interest this basic fact actually remains to be demonstrated), and within a site [2]. The only broadly flavored look at traf- therefore our single-site study can at best provide an exam- fic within modern enterprises of which we are aware is the ple of what modern enterprise traffic looks like, rather than

USENIX Association Internet Measurement Conference 2005 15 a general representation. For instance, while other studies D0 D1 D2 D3 D4 Date 10/4/04 12/15/04 12/16/04 1/6/05 1/7/05 have shown peer-to-peer file sharing applications to be in Duration 10 min 1 hr 1 hr 1 hr 1 hr widespread use [20], we observe nearly none of it in our Per Tap 1 2 1 1 1-2 traces (which is likely a result of organizational policy). # Subnets 22 22 22 18 18 # Packets 17.8M 64.7M 28.1M 21.6M 27.7M Even given these significant limitations, however, there Snaplen 1500 68 68 1500 1500 is much to explore in our packet traces, which span more Mon. Hosts 2,531 2,102 2,088 1,561 1,558 LBNL Hosts 4,767 5,761 5,210 5,234 5,698 than 100 hours and in total include activity from 8,000 in- Remote Hosts 4,342 10,478 7,138 16,404 23,267 ternal addresses at the Lawrence Berkeley National Labo- ratory and 47,000 external addresses. Indeed, we found the Table 1: Dataset characteristics. very wide range of dimensions in which we might exam- ine the data difficult to grapple with. Do we characterize individual applications? Transport protocol dynamics? Ev- the main components of the traffic, while § 4 looks at the idence for self-similarity? Connection locality? Variations locality of traffic sources and destinations. In § 5 we ex- over time? Pathological behavior? Application efficiency? amine characteristics of the applications that dominate the Changes since previous studies? Internal versus external traffic. § 6 provides an assessment of the load carried by traffic? Etc. the monitored networks. § 7 offers final thoughts. We note Given the many questions to explore, we decided in this that given the breadth of the topics covered in this paper, first look to pursue a broad overview of the characteristics we have spread discussions of related work throughout the of the traffic, rather than a specific question, with an aim paper, rather than concentrating these in their own section. towards informing future, more tightly scoped efforts. To this end, we settled upon the following high-level goals:

• To understand the makeup (working up the protocol 2 Datasets stack from the network layer to the application layer) of traffic on a modern enterprise network. We obtained multiple packet traces from two internal net- • To gain a sense of the patterns of locality of enterprise work locations at the Lawrence Berkeley National Labora- traffic. tory (LBNL) in the USA. The tracing machine, a 2.2 GHz PC running FreeBSD 4.10, had four NICs. Each cap- • To characterize application traffic in terms of how in- tured a unidirectional traffic stream extracted, via network- tranet traffic characteristics can differ from Internet controllable Shomiti taps, from one of the LBNL net- traffic characteristics. work’s central routers. While the kernel did not report • To characterize applications that might be heavily any packet-capture drops, our analysis found occasional used in an enterprise network but only rarely used out- instances where a TCP receiver acknowledged data not side the enterprise, and thus have been largely ignored present in the trace, suggesting the reports are incomplete. by modeling studies to date. It is difficult to quantify the significance of these anomalies. We merged these streams based on timestamps synchro- • To gain an understanding of the load being imposed nized across the NICs using a custom modification to the on modern enterprise networks. NIC driver. Therefore, with the four available NICs we Our general strategy in pursuing these goals is “under- could capture traffic for two LBNL subnets. A further lim- stand the big things first.” That is, for each of the dimen- itation is that our vantage point enabled the monitoring of sions listed above, we pick the most salient contributors traffic to and from the subnet, but not traffic that remained to that dimension and delve into them enough to under- within the subnet. We used an expect script to periodically stand their next degree of structure, and then repeat the change the monitored subnets, working through the 18–22 process, perhaps delving further if the given contributor re- different subnets attached to each of the two routers. mains dominant even when broken down into components, Table 1 provides an overview of the collected packet or perhaps turning to a different high-level contributor at traces. The “per tap” field indicates the number of traces this point. The process is necessarily somewhat opportunis- taken on each monitored router port, and Snaplen gives tic rather than systematic, as a systematic study of the data the maximum number of bytes captured for each packet.

would consume far more effort to examine, and text to dis- For example, D0 consists of full-packet traces from each cuss, than is feasible at this point. of the 22 subnets monitored once for 10 minutes at a time,

The general structure of the paper is as follows. We be- while D1 consists of 1 hour header-only (68 bytes) traces gin in § 2 with an overview of the packet traces we gath- from the 22 subnets, each monitored twice (i.e., two 1-hour ered for our study. Next, § 3 gives a broad breakdown of traces per subnet).

16 Internet Measurement Conference 2005 USENIX Association D0 D1 D2 D3 D4 in-depth study of characteristics that the scanning traffic IP 99% 97% 96% 98% 96% !IP 1% 3% 4% 2% 4% exposes is a fruitful area for future work. ARP 10% 6% 5% 27% 16% We now turn to Table 3, which breaks down the traffic by IPX 80% 77% 65% 57% 32% transport protocol (i.e., above the IP layer) in terms of pay- Other 10% 17% 29% 16% 52% load bytes and packets for the three most popular transports Table 2: Fraction of packets observed using the given net- found in our traces. The “Bytes” and “Conns” rows give work layer protocol. the total number of payload bytes and connections for each dataset in Gbytes and millions, respectively. The ICMP traffic remains fairly consistent across all datasets, in terms 3 Broad Traffic Breakdown of fraction of both bytes and connections. The mix of TCP and UDP traffic varies a bit more. We note that the bulk We first take a broad look at the protocols present in our of the bytes are sent using TCP, and the bulk of the con- traces, examining the network, transport and application nections use UDP, for reasons explored below. Finally, we layers. observe a number of additional transport protocols in our Table 2 shows the distribution of “network layer” proto- datasets, each of which make up only a slim portion of cols, i.e., those above the link layer. IP dominates, the traffic, including IGMP, IPSEC/ESP, PIM, GRE, and constituting more than 95% of the packets in each dataset, IP protocol 224 (unidentified). with the two largest non-IP protocols being IPX and ARP; the distribution of non-IP traffic varies considerably across Category Protocols the datasets, reflecting their different subnet (and perhaps backup Dantz, Veritas, “connected-backup” time-of-day) makeup.1 bulk FTP, HPSS email SMTP, IMAP4, IMAP/S, POP3, POP/S, LDAP Before proceeding further, we need to deal with a some- interactive SSH, telnet, rlogin, X11 what complicated issue. The enterprise traces include scan- name DNS, Netbios-NS, SrvLoc net-file NFS, NCP ning traffic from a number of sources. The most significant net-mgnt DHCP, ident, NTP, SNMP, NAV-ping, SAP, NetInfo-local of these sources are legitimate, reflecting proactive vulnera- streaming RTSP, IPVideo, RealStream web HTTP, HTTPS bility scanning conducted by the site. Including traffic from windows CIFS/SMB, DCE/RPC, Netbios-SSN, Netbios-DGM scanners in our analysis would skew the proportion of con- misc Steltor, MetaSys, LPD, IPP, Oracle-SQL, MS-SQL nections due to different protocols. And, in fact, scanners can engage services that otherwise remain idle, skewing not Table 4: Application categories and their constituent proto- only the magnitude of the traffic ascribed to some protocol cols. but also the number of protocols encountered. Next we break down the traffic by application category. We group TCP and UDP application protocols as shown in D0 D1 D2 D3 D4 Bytes (GB) 13.12 31.88 13.20 8.98 11.75 Table 4. The table groups the applications together based TCP 66% 95% 90% 77% 82% on their high-level purpose. We show only those distin- UDP 34% 5% 10% 23% 18% ICMP 0% 0% 0% 0% 0% guished by the amount of traffic they transmit, in terms of Conns (M) 0.16 1.17 0.54 0.75 1.15 packets, bytes or connections (we omit many minor addi- TCP 26% 19% 23% 10% 8% tional categories and protocols). In § 5 we examine the UDP 68% 74% 70% 85% 87% characteristics of a number of these application protocols. 
ICMP 6% 6% 8% 5% 5% Figure 1 shows the fraction of unicast payload bytes and Table 3: Fraction of connections and bytes utilizing various connections from each application category (multicast traf- transport protocols. fic is discussed below). The five bars for each category correspond to our five datasets. The total height of the bar In addition to the known internal scanners, we identify represents the percentage of traffic due to the given cate- additional scanning traffic using the following heuristic. gory. The solid part of the bar represents the fraction of We first identify sources contacting more than 50 distinct the total in which one of the endpoints of the connection hosts. We then determine whether at least 45 of the dis- resides outside of LBNL, while the hollow portion of the tinct addresses probed were in ascending or descending or- bar represents the fraction of the total that remains within der. The scanners we find with this heuristic are primarily LBNL’s network. (We delve into traffic origin and local- external sources using ICMP probes, because most other ity in more depth in § 4.) We also examined the traffic external scans get blocked by scan filtering at the LBNL breakdown in terms of packets, but since it is similar to the border. Prior to our subsequent analysis, we remove traffic breakdown in terms of bytes, we do not include the plot due from sources identified as scanners along with the 2 inter- to space constraints. We note, however, that when measur- nal scanners. The fraction of connections removed from ing in terms of packets the percentage of interactive traffic the traces ranges from 4–18% across the datasets. A more is roughly a factor of two more than when assessing the

USENIX Association Internet Measurement Conference 2005 17 Many connections are for “name” traffic across all the

60 wan datasets (45–65% of the connections). However, the byte enterprise count for “name” traffic constitutes no more than 1% of the 50 aggregate traffic. The “net-mgnt”, “misc” and “other-udp”

40 categories show similar patterns. While most of the con- nections are short transaction-style transfers, most of the 30 d a

o bytes that traverse the network are from a relatively few l y a

p 20 connections. Figure 1(a) shows that the “bulk”, “network- % file” and “backup” categories constitute a majority of the 10 bytes observed across datasets. In some of the datasets, “windows”, “streaming” and “interactive” traffic each con- 0 w e n b b n i w s n m o o e m e a u a n i t e i t t b a t c l m t n r t s h h i - k k e e d e - c e e l f u r o a m r r i p a w m g - - l c s i n t u tribute 5–10% of the bytes observed, as well. The first two e t n t c d i g p p v e make sense because they include bulk-transfer as a compo- nent of their traffic; and in fact interactive traffic does too, in the form of SSH, which can be used not only as an inter- (a) Bytes active login facility but also for copying files and tunneling other applications.

wan 70 enterprise Most of the application categories shown in Figure 1 60 are unbalanced in that the traffic is dominated by either

50 the connection count or the byte count. The “web” and

t “email” traffic categories are the exception; they show non- n

u 40 o c negligible contributions to both the byte and connection n o

i 30 t

c counts. We will characterize these applications in detail e n n

o 20 in 5, but here we note that this indicates that most of c § % the traffic in these categories consists of connections with 10 modest—not tiny or huge—lengths.

0 w e n b b n i w s n m o o e m e a u a n i t e i t t b a t c l m t n r t s h h i - k k e e d e - c e e l f u r o a m r r i p a w m g - - l c s i n t u e t n t c d i g p p v In addition, the plot highlights the differences in traf- e fic profile across time and area of the network monitored. For instance, the number of bytes transmitted for “backup” (b) Connections activities varies by a factor of roughly 5 from D0 to D4. This could be due to differences in the monitored loca- Figure 1: Fraction of traffic using various application layer tions, or different tracing times. Given our data collection protocols. techniques, we cannot distill trends from the data; how- ever this is clearly a fruitful area for future work. We note that most of the application categories that significantly traffic in terms of bytes, indicating that interactive traffic contribute to the traffic mix show a range of usage across consists, not surprisingly, of small packets. the datasets. However, the percentage of connections in The plots show a wider range of application usage the “net-mgnt” and “misc” categories are fairly consistent within the enterprise than over the WAN. In particular, we across the datasets. This may be because a majority of observed 3–4 times as many application categories on the the connections come from periodic probes and announce- internal network as we did traversing the border to the ments, and thus have a quite stable volume. WAN. The wider range likely reflects the impact of ad- ministrative boundaries such as trust domains and firewall Finally, we note that multicast traffic constitutes a sig- rules, and if so should prove to hold for enterprises in gen- nificant fraction of traffic in the “streaming”, “name”, eral. The figure also shows that the majority of traffic ob- and “net-mgnt” categories. We observe that 5–10% of served is local to the enterprise. This follows the familiar all TCP/UDP payload bytes transmitted are in multicast pattern of locality in computer and network systems which, streaming—i.e., more than the amount of traffic found in for example, plays a part in memory, disk block, and web unicast streaming. Likewise, multicast traffic in “name” page caching. (SrvLoc) and “net-mgnt” (SAP) each constitutes 5–10% of In addition, Figure 1 shows the reason for the finding all TCP/UDP connections. However, multicast traffic in the above that most of the connections in the traces use UDP, remaining application categories was found to be negligi- while most of the bytes are sent across TCP connections. ble.

18 Internet Measurement Conference 2005 USENIX Association 4 Origins and Locality § 5.1.1 Automated HTTP client activities constitute a significant fraction of internal HTTP traffic. § 5.1.2 IMAP traffic inside the enterprise has characteristics similar to We next analyze the data to assess both the origins of traffic wide-area email, except connections are longer-lived. and the breadth of communications among the monitored § 5.1.3 Netbios/NS queries fail nearly 50% of the time, apparently due hosts. First, we examine the origin of the flows in each to popular names becoming stale. dataset, finding that the traffic is clearly dominated by uni- § 5.2.1 Windows traffic is intermingled over various ports, with Net- bios/SSN (139/tcp) and SMB (445/tcp) used interchangeably for cast flows whose source and destination are both within the carrying CIFS traffic. DCE/RPC over “named pipes”, rather than enterprise (71–79% of flows across the five datasets). An- Windows File Sharing, emerges as the most active component in CIFS traffic. Among DCE/RPC services, printing and user au- other 2–3% of unicast flows originate within the enterprise thentication are the two most heavily used. but communicate with peers across the wide-area network, § 5.2.2 Most NFS and NCP requests are reading, writing, or obtaining and 6–11% originate from hosts outside of the enterprise file attributes. contacting peers within the enterprise. Finally, 5–10% of § 5.2.3 Veritas and Dantz dominate our enterprise’s backup applications. the flows use multicast sourced from within the enterprise Veritas exhibits only client → server data transfers, but Dantz connections can be large in either direction. and 4–7% use multicast sourced externally. We next assess the number of hosts with which each Table 5: Example application traffic characteristics. monitored host communicates. For each monitored host H we compute two metrics: (i) fan-in is the number of hosts that originate conversations with H, while (ii) fan- hosts have only internal fan-in, and more than half with out is the number of hosts to which H initiates conversa- only internal fan-out — much more than the fraction of tions. We calculate these metrics in terms of both local hosts with only external peers. This difference matches traffic and wide-area traffic. our intuition that local hosts will contact local servers (e.g., SMTP, IMAP, DNS, distributed file systems) more fre- quently than requesting services across the wide-area net- 1 work, and is also consistent with our observation that a 0.9 wider variety of applications are used only within the en- 0.8 terprise. 0.7

F While most hosts have a modest fan-in and fan-out—

D 0.6 C 0.5 over 90% of the hosts communicate with at most a couple

0.4 D2 - enterprise dozen other hosts—some hosts communicate with scores D3 - enterprise 0.3 D2 - WAN to hundreds of hosts, primarily busy servers that communi- D3 - WAN 0.2 1 10 100 1000 cate with large numbers of on- and off-site hosts (e.g., mail Fan-In servers). Finally, the tail of the internal fan-out, starting around 100 peers/source, is largely due to the peer-to-peer (a) Fan-in communication pattern of SrvLoc traffic. In keeping with the spirit of this paper, the data presented

1 in this section provides a first look at origins and locality 0.9 in the aggregate. Future work on assessing particular ap- 0.8 0.7 plications and examining locality within the enterprise is 0.6 needed. F

D 0.5 C 0.4 0.3 D2 - enterprise 0.2 D3 - enterprise 5 Application Characteristics 0.1 D2 - WAN D3: WAN 0 1 10 100 1000 In this section we examine transport-layer and application- Fan-Out layer characteristics of individual application protocols. (b) Fan-out Table 5 provides a number of examples of the findings we make in this section. We base our analysis on connection summaries gener- Figure 2: Locality in host communication. ated by Bro [18]. As noted in § 2, D1 and D2 consist of traces that contain only the first 68 bytes of each packet. Figure 2 shows the distribution of fan-in and fan-out for Therefore, we omit these two datasets from analyses that 2 D2 and D3. We observe that for both fan-in and fan-out, require in-depth examination of packet payloads to extract the hosts in our datasets generally have more peers within application-layer protocol messages. the enterprise than across the WAN, though with consider- Before turning to specific application protocols, how- able variability. In particular, one-third to one-half of the ever, we need to first discuss how we compute failure

USENIX Association Internet Measurement Conference 2005 19 Request Data Including these activities skews various HTTP characteris- D0/ent D3/ent D4/ent D0/ent D3/ent D4/ent Total 7098 16423 15712 602MB 393MB 442MB tics. For instance, both Google bots and the scanner have a scan1 20% 45% 19% 0.1% 0.9% 1% very high “fan-out”; the scanner provokes many more “404 google1 23% 0.0% 1% 45% 0.0% 0.1% File Not Found” HTTP replies than standard web brows- google2 14% 8% 4% 51% 69% 48% ifolder 1% 0.2% 10% 0.0% 0.0% 9% ing; iFolder clients use POST more frequently than regu- All 58% 54% 34% 96% 70% 59% lar clients; and iFolder replies often have a uniform size of Table 6: Fraction of internal HTTP traffic from automated 32,780 bytes. Therefore, while the presence of these activi- ties is the biggest difference between internal and wide-area clients. HTTP traffic, we exclude these from the remainder of the analysis in an attempt to understand additional differences. rates. At first blush, counting the number of failed con-

nections/requests seems to tell the story. However, this 1 method can be misleading if the client is automated and 0.8 n

endlessly retries after being rejected by a peer, as happens o i t c

a ent:D0:N=127 r 0.6

in the case of NCP, for example. Therefore, we instead de- F ent:D1:N=302 e

v ent:D2:N=285 i t

a ent:D3:N=174 termine the number of distinct operations between distinct l 0.4 u ent:D4:N=197 m

u wan:D0:N=358

host-pairs when quantifying success and failure. Such op- C wan:D1:N=684 0.2 wan:D2:N=526 erations can span both the transport layer (e.g., a TCP con- wan:D3:N=378 wan:D4:N=437 0 nection request) and the application layer (e.g., a specific 1 10 100 1000 name lookup in the case of DNS). Given the short dura- Number of Peers per Source tion of our traces, we generally find a specific operation be- tween a given pair of hosts either nearly always succeeds, Figure 3: HTTP fan-out. The N in the key is the number or nearly always fails. of samples throughout the paper – in this case, the number of clients.

5.1 Internal/External Applications Fan-out: Figure 3 shows the distribution of fan-out from We first investigate applications categories with traffic in monitored clients to enterprise and WAN HTTP servers. both the enterprise network and in the wide-area network: Overall, monitored clients visit roughly an order of mag- web, email and name service. nitude more external servers than internal servers. This seems to differ from the finding in § 4 that over all traffic clients tend to access more local peers than remote peers. 5.1.1 Web However, we believe that the pattern shown by HTTP trans- HTTP is one of the few protocols where we find more wide- actions is more likely to be the prevalent application-level area traffic than internal traffic in our datasets. Character- pattern and that the results in § 4 are dominated by the fact izing wide-area Web traffic has received much attention in that clients access a wider variety of applications. This the literature over the years, e.g., [14, 3]. In our first look serves to highlight the need for future work to drill down at modern enterprise traffic, we find internal HTTP traf- on the first, high-level analysis we present in this paper. fic to be distinct from WAN HTTP traffic in several ways: Connection Success Rate: Internal HTTP traffic shows (i) we observe that automated clients—scanners, bots, and success rates of 72–92% (by number of host-pairs), while applications running on top of HTTP—have a large im- the success rate of WAN HTTP traffic is 95–99%. The root pact on overall HTTP traffic characteristics; (ii) we find cause of this difference remains a mystery. We note that a lower fan-out per client in enterprise web traffic than in the majority of unsuccessful internal connections are ter- WAN web traffic; (iii) we find a higher connection failure minated with TCP RSTs by the servers, rather than going rate within the enterprise; and (iv) we find heavier use of unanswered. HTTP’s conditional GET in the internal network than in the Conditional Requests: Across datasets and localities, WAN. Below we examine these findings along with several HTTP GET commands account for 95–99% of both the additional traffic characteristics. number of requests and the number of data bytes. The Automated Clients: In internal Web transactions we POST command claims most of the rest. One notable find three activities not originating from traditional user- difference between internal and wide area HTTP traf- browsing: scanners, Google bots, and programs running fic is the heavier use internally of conditional GET com- on top of HTTP (e.g., Novell iFolder and Viacom Net- mands (i.e., a GET request that includes one of the Meeting). As Table 6 shows, these activities are highly sig- If-Modified-Since headers, per [8]). Internally we nificant, accounting for 34–58% of internal HTTP requests find conditional GET commands representing 29–53% of and 59–96% of the internal data bytes carried over HTTP. web requests, while externally conditional GET commands

20 Internet Measurement Conference 2005 USENIX Association Request Data 1 enterprise wan enterprise wan text 18% – 30% 14% – 26% 7% – 28% 13% – 27% 0.8 n o

image 67% – 76% 44% – 68% 10% – 34% 16% – 27% i t c a application 3% – 7% 9% – 42% 57% – 73% 33% – 60% r 0.6 F e v

Other 0.0% – 2% 0.3% – 1% 0.0% – 9% 11% – 13% i t a

l 0.4 u ent:D0:N=1411 m

u ent:D3:N=5363 Table 7: HTTP reply by content type. “Other” mainly in- C ent:D4:N=5440 0.2 wan:D0:N=39961 wan:D3:N=69636 cludes audio, video, and multipart. wan:D4:N=63215 0 1 10 100 1000 100001000001e+06 1e+07 1e+08 Size (Bytes) account for 12–21% of the requests. The conditional re- quests often yield savings in terms of the number of data Figure 4: Size of HTTP reply, when present. bytes downloaded in that conditional requests only account Bytes D0/all D1/all D2/all D3/all D4/all for 1–9% of the HTTP data bytes transfered internally and SMTP 152MB 1658MB 393MB 20MB 59MB 1–7% of the data bytes transfered from external servers. SIMAP 185MB 1855MB 612MB 236MB 258MB IMAP4 216MB 2MB 0.7MB 0.2MB 0.8MB We find this use of the conditional GET puzzling in that Other 9MB 68MB 21MB 12MB 21MB we would expect that attempting to save wide-area network resources (by caching and only updating content when Table 8: Email Traffic Size needed) would be more important than saving local net- work resources. Finally, we find that over 90% of web re- successfully and exchange a pair of application messages, quests are successful (meaning either the object requested after which the client tears down the connection almost im- is returned or that an HTTP 304 (“not modified”) reply is mediately. As the contents are encrypted, we cannot deter- returned in response to a conditional GET). mine whether this reflects application level fail-and-retrial We next turn to several characteristics for which we do or some other phenomenon. not find any consistent differences between internal and wide-area HTTP traffic. Content Type: Table 7 provides an overview of object 5.1.2 Email types for HTTP GET transactions that received a 200 or Email is the second traffic category we find prevalent in 206 HTTP response code (i.e., success). The text, image, both internally and over the wide-area network. As shown and application content types are the three most popular, in Table 8, SMTP and IMAP dominate email traffic, con- with image and application generally accounting for most stituting over 94% of the volume in bytes. The remainder of the requests and bytes, respectively. Within the appli- comes from LDAP, POP3 and POP/SSL. The table shows cation type, the popular subtypes include javascript, octet a transition from IMAP to IMAP/S (IMAP over SSL) be- stream, zip, and PDF. The other content types are mainly tween D0 and D1, which reflects a policy change at LBNL audio, video,ormultipart objects. We do not observe sig- restricting usage of unsecured IMAP. nificant differences between internal and WAN traffic in Datasets D0−2 include the subnets containing the main terms of application types. enterprise-wide SMTP and IMAP(/S) servers. This causes

HTTP Responses: Figure 4 shows the distribution of a difference in traffic volume between datasets D0−2 and HTTP response body sizes, excluding replies without a D3−4, and also other differences discussed below. Also, body. We see no significant difference between internal note that we conduct our analysis at the transport layer,

and WAN servers. The short vertical lines of the D0/WAN since often the application payload is encrypted. curve reflect repeated downloading of javascripts from a We note that the literature includes several studies of particular website. We also find that about half the web ses- email traffic (e.g., [16, 10]), but none (that we are aware sions (i.e., downloading an entire web page) consist of one of) focusing on enterprise networks. object (e.g., just an HTML page). On the other hand 10– We first discuss areas where we find significant differ- 20% of the web sessions in our dataset include 10 or more ence between enterprise and wide-area email traffic. objects. We find no significant difference across datasets or Connection Duration: As shown in Figure 5(a), the dura- server location (local or remote). tion of internal and WAN SMTP connections generally dif- HTTP/SSL: Our data shows no significant difference in fers by about an order of magnitude, with median durations HTTPS traffic between internal and WAN servers. How- around 0.2–0.4 sec and 1.5–6 sec, respectively. These re- ever, we note that in both cases there are numerous small sults reflect the large difference in round-trip times (RTTs)

connections between given host-pairs. For example, in D4 experienced across the two types of network. SMTP ses- we observe 795 short connections between a single pair sions consist of both an exchange of control information of hosts during an hour of tracing. Examining a few at and a unidirectional bulk transfer for the messages (and random shows that the hosts complete the SSL handshake attachments) themselves. Both of these take time propor-

USENIX Association Internet Measurement Conference 2005 21 tional to the RTT [15], explaining the longer SMTP dura- tions. 1 1 0.8 0.8 n n o o i i

In contrast, Figure 5(b) shows the distribution of t t c c

a ent:D0:N=967 a r 0.6 r 0.6

F ent:D1:N=6671 F e e

IMAP/S connection durations across a number of our v ent:D2:N=1942 v i i t t

a ent:D3:N=447 a l 0.4 l 0.4 u ent:D4:N=460 u ent:D1:N=8392 m m

datasets. We leave off to focus on IMAP/S traffic, and u wan:D0:N=1030 u ent:D2:N=3266 D0 C wan:D1:N=10189 C ent:D3:N=776 0.2 wan:D2:N=3589 0.2 ent:D4:N=742 WAN traffic because these datasets do not include wan:D3:N=262 wan:D1:N=1849 D3−4 wan:D4:N=222 wan:D2:N=1010 0 0 subnets with busy IMAP/S servers and hence have little 1 1000 1e+06 1e+09 1 1000 1e+06 1e+09 Size (Bytes) Size (Bytes) wide-area IMAP/S traffic. The plot shows internal connec- tions often last 1–2 orders of magnitude longer than wide- (a) SMTP from client (b) IMAP/S from server area connections. We do not yet have an explanation for the difference. The maximum connection duration is generally Figure 6: SMTP and IMAP/S: flow size distribution 50 minutes. While our traces are roughly 1 hour in length we find that IMAP/S clients generally poll every 10 min- utes, generally providing only 5 observations within each Flow Size: Internal and wide-area email traffic does not trace. Determining the true length of IMAP/S sessions re- show significant differences in terms of connection sizes, as quires longer observations and is a subject for future work. shown in Figure 6. As we would expect, the traffic volume of SMTP and IMAP/S is largely unidirectional (to SMTP servers and to IMAP/S clients), with traffic in the other di- 1 rection largely being short control messages. Over 95% of 0.8 the connections to SMTP servers and to IMAP/S clients re- n o i t c

a ent:D0:N=967 main below 1 MB, but both cases have significant upper r 0.6

F ent:D1:N=6671 e

v ent:D2:N=1942

i tails. t

a ent:D3:N=447 l 0.4 u ent:D4:N=460 m

u wan:D0:N=1030

C wan:D1:N=10189 0.2 wan:D2:N=3589 wan:D3:N=262 5.1.3 Name Services wan:D4:N=222 0 0.0001 0.001 0.01 0.1 1 10 100 1000 10000 The last application category prevalent in both the inter- Duration (Seconds) nal and the wide-area traffic is domain name lookups. We (a) SMTP observe a number of protocols providing name/directory services, including DNS, Netbios Name Service (Net- bios/NS), Service Location Protocol (SrvLoc), SUN/RPC 1 ent:D1:N=8392 Portmapper, and DCE/RPC endpoint mapper. We also note ent:D2:N=3266 0.8 ent:D3:N=776 that wide-area DNS has been studied by a number of re- ent:D4:N=742 n

o wan:D1:N=1849 i t

c wan:D2:N=1010 searchers previously (e.g., [12]), however, our study of a

r 0.6 F

e name lookups includes both enterprise traffic and non-DNS v i t a

l 0.4 u name services. m u C 0.2 In this section we focus on DNS and Netbios/NS traf- fic, due to their predominant use. DNS appears in both 0 0.0001 0.001 0.01 0.1 1 10 100 1000 10000 wide-area and internal traffic. We find no large differences Duration (Seconds) between the two types of DNS traffic except in response (b) IMAP/S latency. For both services a handful of servers account for most of the traffic, therefore the vantage point of the monitor can Figure 5: SMTP and IMAP/S connection durations. significantly affect the traffic we find in a trace. In particu-

lar, D0−2 do not contain subnets with a main DNS server, We next focus on characteristics of email traffic that are and thus relatively few WAN DNS connections. Therefore, similar across network type. in the following discussion we only use D3−4 for WAN Connection Success Rate: Across our datasets we find DNS traffic. Similarly, more than 95% of Netbios/NS re-

that internal SMTP connections have success rates of 95– quests go to one of the two main servers. D0−2 captures all 98%. SMTP connections traversing the wide-area net- traffic to/from one of these and D3−4 captures all traffic to work have success rates of 71–93% in D0−2 and 99-100% both. Finally, we do not consider D1−2 in our analysis due in D3−4. Recall that D0−2 include heavily used SMTP to the lack of application payloads in those datasets (which servers and D3−4 do not, which likely explains the dis- renders our payload analysis inoperable). crepancy. The success rate for IMAP/S connections is 99– Given those considerations, we now explore several 100% across both locality and datasets. characteristics of name service traffic.

22 Internet Measurement Conference 2005 USENIX Association Latency: We observe median latencies are roughly features of applications used only within the enterprise. 0.4 msec and 20 msec for internal and external DNS Given the degree to which such protocols have not seen queries, respectively. This expected result is directly at- much exploration in the literature before, we aim for a tributable to the round-trip delay to on- and off-site DNS broad rather than deep examination. A great deal remains servers. Netbios/NS, on the other hand, is primarily used for future work to develop the characterizations in more within the enterprise, with inbound requests blocked by the depth. enterprise at the border. Clients: A majority of DNS requests come from a few 5.2.1 Windows Services clients, led by two main SMTP servers that perform We first consider those services used by Windows hosts lookups in response to incoming mail. In contrast, we for a wide range of tasks, such as Windows file sharing, find Netbios/NS requests more evenly distributed among authentication, printing, and messaging. In particular, we clients, with the top ten clients generating less than 40% of examine Netbios Session Services (SSN), the Common In- all requests across datasets. ternet File System (SMB/CIFS), and DCE/RPC. We do not Request Type: DNS request types are quite similar both tackle the Netbios Datagram Service since it appears to be across datasets and location of the peer (internal or remote). largely used within subnets (e.g., for “Network Neighbor- A majority of the requests (50–66%) are for A records, hoods”), and does not appear much in our datasets; and we while 17–25% are for AAAA (IPv6) records, which seems cover Netbios/NS in § 5.1.3. surprisingly high, though we have confirmed a similar ratio One of the main challenges in analyzing Windows traffic in the wide-area traffic at another site. Digging deeper re- is that each communication scheme can be used in a vari- veals that a number of hosts are configured to request both ety of ways. For instance, TCP port numbers reveal little A and AAAA in parallel. In addition, we find 10–18% of the about the actual application: services can be accessed via requests are for PTR records and 4–7% are for MX records. multiple channels, and a single port can be shared by a va- Netbios/NS traffic is also quite similar across the riety of services. Hosts appear to interchangeably use CIFS datasets. 81–85% of requests consist of name queries, with via its well-known TCP port of 445 and via layering on top the other prevalent action being to “refresh” a registered of Netbios/SSN (TCP port 139). Similarly, we note that name (12–15% of the requests). We observe a number DCE/RPC clients have two ways to find services: (i) using of additional transaction types in small numbers, includ- “named pipes” on top of CIFS (which may or may not be ing commands to register names, release names, and check layered on top of Netbios/SSN) and (ii) on top of standard status. TCP and UDP connections without using CIFS, in which Netbios/NS Name Type: Netbios/NS includes a “type” in- case clients consult the Endpoint Mapper to discover the dication in queries. We find that across our datasets 63– port of a particular DCE/RPC service. Thus, in order to 71% of the queries are for workstations and servers, while understand the Windows traffic we had to develop rich Bro 22–32% of the requests are for domain/browser informa- protocol analyzers, and also merge activities from different tion. 
transport layer channels. With this in place, we could then Return Code: We find DNS has similar success analyze various facets of the activities according to appli- (NOERROR) rates (77–86%) and failure (NXDOMAIN) rates cation functionality, as follows. (11–21%) across datasets and across internal and wide- Connection Success Rate: As shown in Table 9, we ob- area traffic. We observe failures with Netbios/NS 2–3 serve a variety of connection success rates for different times more often: 36–50% of distinct Netbios/NS queries kinds of traffic: 82–92% for Netbios/SSN connections, 99– yield an NXDOMAIN reply. These failures are broadly 100% for Endpoint Mapper traffic, and a strikingly low 46– distributed—they are not due to any single client, server, or 68% for CIFS traffic. For both Netbios/SSN and CIFS traf- query string. We speculate that the difference between the fic we find the failures are not caused by a few erratic hosts, two protocols may be attributable to DNS representing an but rather are spread across hundreds of clients and dozens administratively controlled name space, while Netbios/NS of servers. Further investigation reveals most of CIFS con- uses a more distributed and loosely controlled mechanism nection failures are caused by a number of clients connect- for registering names, resulting in Netbios/NS names going ing to servers on both the Netbios/SSN (139/tcp) and CIFS “out-of-date” due to timeouts or revocations. (445/tcp) port in parallel—since the two ports can be used interchangeably. The apparent intention is to use whichever port works while not incurring the cost of trying each in 5.2 Enterprise-Only Applications turn. We also find a number of the servers are configured to listen only on the Netbios/SSN port, so they reject con- The previous subsection deals with application categories nections to the CIFS port. found in both internal and wide-area communication. In this section, we turn to analyzing the high-level and salient

USENIX Association Internet Measurement Conference 2005 23 Host Pairs but not as dominant as user authentication functions (NetL- Netbios/SSN CIFS Endpoint Mapper Total 595 – 1464 373 – 732 119 – 497 ogon and LsaRPC), which account for 68% of the requests Successful 82% – 92% 46% – 68% 99% – 100% and 52% of the bytes. These figures illustrate the varia- Rejected 0.2% – 0.8% 26% – 37% 0.0% – 0.0% tions present within the enterprise, as well as highlighting Unanswered 8% – 19% 5% – 19% 0.2% – 0.8% the need for multiple vantage points when monitoring. (For

Table 9: Windows traffic connection success rate (by num- instance, in D0 we monitor a major authentication server, ber of host-pairs, for internal traffic only) while D3−4 includes a major print server.)

Netbios/SSN Success Rate: After a connection is estab- 5.2.2 Network File Services lished, a Netbios/SSN session goes through a handshake NFS and NCP3 comprise the two main network file system before carrying traffic. The success rate of the handshake protocols seen within the enterprise and this traffic is nearly (counting the number of distinct host-pairs) is 89–99% always confined to the enterprise.4 We note that several across our datasets. Again, the failures are not due to any trace-based studies of network file system characteristics single client or server, but are spread across a number of have appeared in the filesystem literature (e.g., see [7] and hosts. The reason for these failures merits future investiga- enclosed references). We now investigate several aspects tion. of network file system traffic. CIFS Commands: Table 10 shows the prevalence of var- Aggregate Sizes: Table 12 shows the number of NFS and ious types of commands used in CIFS channels across our NCP connections and the amount of data transferred for datasets, in terms of both the number of commands and each dataset. In terms of connections, we find NFS more volume of data transferred. The first category, “SMB Ba- prevalent than NCP, except in D0. In all datasets, we find sic”, includes common commands used for session initial- NFS transfers more data bytes per connection than NCP. As ization and termination, and accounts for 24–52% of the in previous sections, we see the impact of the measurement messages across the datasets, but only 3–15% of the data location in that the relative amount of NCP traffic is much bytes. The remaining categories indicate the tasks CIFS higher in D0−2 than in D3−4. Finally, we find “heavy hit- connections are used to perform. Interestingly, we find ters” in NFS/NCP traffic: the three most active NFS host- DCE/RPC pipes, rather than Windows File Sharing, make pairs account for 89–94% of the data transfered, and for the up the largest portion of messages (33–48%) and data bytes top three NCP host-pairs, 35–62%. (32–77%) across datasets. Windows File Sharing consti- tutes 11–27% of messages and 8% to 43% of bytes. Fi- Keep-Alives: We find that NCP appears to use TCP keep- nally, “LANMAN”, a non-RPC named pipe for manage- alives to maintain long-lived connections and detect run- ment tasks in “network neighborhood” systems, accounts away clients. Particularly striking is that 40–80% of the for just 1–3% of the requests, but 3–15% of the bytes. NCP connections across our datasets consist only of peri- odic retransmissions of 1 data byte and therefore do not DCE/RPC Functions: Since DCE/RPC constitutes an im- include any real activity. portant part of Windows traffic, we further analyze these calls over both CIFS pipes and stand-alone TCP/UDP UDP vs. TCP We had expected that NFS-over-TCP connections. While we include all DCE/RPC activities would heavily dominate modern use of NFS, but find this traversing CIFS pipes, our analysis for DCE/RPC over is not the case. Across the datasets, UDP comprises stand-alone TCP/UDP connections may be incomplete for 66%/16%/31%/94%/7% of the payload bytes, an enormous two reasons. First, we identify DCE/RPC activities on range. Overall, 90% of the NFS host-pairs use UDP, while ephemeral ports by analyzing Endpoint Mapper traffic. only 21% use TCP (some use both). 
Therefore, we will miss traffic if the mapping takes place Request Success Rate: If an NCP connection attempt before our trace collection begins, or if there is an alternate succeeds (88–98% of the time), about 95% of the subse- method to discover the server’s ports (though we are not quent requests also succeed, with the failures dominated aware of any other such method). Second, our analysis tool by “File/Dir Info” requests. NFS requests succeed 84% to currently cannot parse DCE/RPC messages sent over UDP. 95% of the time, with most of the unsuccessful requests be- While this may cause our analysis to miss services that only ing “lookup” requests for non-existing files or directories. use UDP, DCE/RPC traffic using UDP accounts for only a Requests per Host-Pair: Since NFS and NCP both use a small fraction of all DCE/RPC traffic. message size of about 8 KB, multiple requests are needed Table 11 shows the breakdown of DCE/RPC functions. for large data transfers. Figure 7 shows the number of re- Across all datasets, the Spoolss printing functions—and quests per client-server pair. We see a large range, from WritePrinter in particular—dominate the overall traf- a handful of requests to hundreds of thousands of requests

fic in D3−4, with 63–91% of the requests and 94–99% of between a host-pair. A related observation is that the inter- the data bytes. In D0, Spoolss traffic remains significant, val between requests issued by a client is generally 10 msec

24 Internet Measurement Conference 2005 USENIX Association Request Data D0/ent D3/ent D4/ent D0/ent D3/ent D4/ent Total 49120 45954 123607 18MB 32MB 198MB SMB Basic 36% 52% 24% 15% 12% 3% RPC Pipes 48% 33% 46% 32% 64% 77% Windows File Sharing 13% 11% 27% 43% 8% 17% LANMAN 1% 3% 1% 10% 15% 3% Other 2% 0.6% 1.0% 0.2% 0.3% 0.8%

Table 10: CIFS command breakdown. “SMB basic” includes the common commands shared by all kinds of higher level applications: protocol negotiation, session setup/tear-down, tree connect/disconnect, and file/pipe open. Request Data D0/ent D3/ent D4/ent D0/ent D3/ent D4/ent Total 14191 13620 56912 4MB 19MB 146MB NetLogon 42% 5% 0.5% 45% 0.9% 0.1% LsaRPC 26% 5% 0.6% 7% 0.3% 0.0% Spoolss/WritePrinter 0.0% 29% 81% 0.0% 80% 96% Spoolss/other 24% 34% 10% 42% 14% 3% Other 8% 27% 8% 6% 4% 0.6%

Table 11: DCE/RPC function breakdown.

or less. File” or reporting error), 10 bytes for “GetFileCurrent- Size”, and 260 bytes for (a fraction of) “ReadFile” requests.

1 1

0.8 0.8

n n 1 1 o o i i t t c c a a

r 0.6 r 0.6 F F 0.8 0.8 e e v v n n i i t t o o a a i i l l 0.4 0.4 t t c c u u a a r r m m 0.6 0.6 u u F F C C e e v v

0.2 0.2 i i ent:D0:N=104 ent:D0:N=441 t t a a

l 0.4 l 0.4

ent:D3:N=48 ent:D3:N=168 u u

ent:D4:N=57 ent:D4:N=188 m m 0 0 u u 1 10 100 1000 10000 100000 1e+06 1 10 100 1000 10000 100000 1e+06 C C 0.2 ent:D0:N=697512 0.2 ent:D0:N=681233 Number of Requests per Host Pair Number of Requests per Host Pair ent:D3:N=303386 ent:D3:N=299292 ent:D4:N=607108 ent:D4:N=605900 0 0 1 10 100 1000 10000 1 10 100 1000 10000 Size (Bytes) Size (Bytes) (a) NFS (b) NCP (a) NFS request (b) NFS reply Figure 7: NFS/NCP: number of requests per client-server pair, for those with at least one request seen. 1 1

0.8 0.8 n n o o i i t t c c

Breakdown by Request Type: Table 13 and 14 show that a a r 0.6 r 0.6 F F e e v v i i t t a a

in both NFS and NCP, file read/write requests account for l 0.4 l 0.4 u u m m u u the vast majority of the data bytes transmitted, 88–99% and C C 0.2 ent:D0:N=869765 0.2 ent:D0:N=868910 ent:D3:N=219819 ent:D3:N=219767 ent:D4:N=267942 ent:D4:N=267867 92–98% respectively. In terms of the number of requests, 0 0 1 10 100 1000 10000 1 10 100 1000 10000 obtaining file attributes joins read and write as a dominant Size (Bytes) Size (Bytes) function. NCP file searching also accounts for 7–16% of the requests (but only 1–4% of the bytes). Note that NCP (c) NCP request (d) NCP reply provides services in addition to remote file access, e.g., di- rectory service (NDS), but, as shown in the table, in our Figure 8: NFS/NCP: request/reply data size distribution datasets NCP is predominantly used for file sharing. (NFS/NCP message headers are not included) Request/Reply Data Size Distribution: As shown in Fig- ure 8(a,b), NFS requests and replies have clear dual-mode distributions, with one mode around 100 bytes and the 5.2.3 Backup other 8 KB. The latter corresponds to write requests and read replies, and the former to everything else. NCP re- Backup sessions are a rarity in our traces, with just a small quests exhibit a mode at 14 bytes, corresponding to read number of hosts and connections responsible for a huge requests, and each vertical rise in the NCP reply size fig- data volume. Clearly, this is an area where we need longer ure corresponds to particular types of commands: 2-byte traces. That said, we offer brief characterizations here to replies for completion codes only (e.g. replying to “Write- convey a sense of its nature.

USENIX Association Internet Measurement Conference 2005 25 Connections Bytes D0/all D1/all D2/all D3/all D4/all D0/all D1/all D2/all D3/all D4/all NFS 1067 5260 4144 3038 3347 6318MB 4094MB 3586MB 1030MB 1151MB NCP 2590 4436 2892 628 802 777MB 2574MB 2353MB 352MB 233MB

Table 12: NFS/NCP Size Request Data D0/ent D3/ent D4/ent D0/ent D3/ent D4/ent Total 697512 303386 607108 5843MB 676MB 1064MB Read 70% 25% 1% 64% 92% 6% Write 15% 1% 19% 35% 2% 83% GetAttr 9% 53% 50% 0.2% 4% 5% LookUp 4% 16% 23% 0.1% 2% 4% Access 0.5% 4% 5% 0.0% 0.4% 0.6% Other 2% 0.9% 2% 0.1% 0.2% 1%

Table 13: NFS requests breakdown.

Connections Bytes VERITAS-BACKUP-CTRL 1271 0.1MB 1 VERITAS-BACKUP-DATA 352 6781MB 0.9 DANTZ 1013 10967MB 0.8 CONNECTED-BACKUP 105 214MB 0.7 0.6 F

Table 15: Backup Applications D 0.5 C 0.4 0.3

0.2 1 second 0.1 10 seconds We find three types of backup traffic, per Table 15: 60 seconds 0 two internal traffic giants, Dantz and Veritas, and a much 0.01 0.1 1 10 100 smaller, “Connected” service that backs up data to an Peak Utilization (Mbps) external site. Veritas backup uses separate control and (a) Peak Utilization data connections, with the data connections in the traces all reflecting one-way, client-to-server traffic. Dantz, on the other hand, appears to transmit control data within 1 0.9 the same connection, and its connections display a de- 0.8 gree of bi-directionality. Furthermore, the server-to-client 0.7 0.6

flow sizes can exceed 100 MB. This bi-directionality does F

D 0.5 C 0.4 not appear to reflect backup vs. restore, because it exists Minimum 0.3 Maximum not only between connections, but also within individual Average 0.2 25th perc. 0.1 Median connections—sometimes with tens of MB in both direc- 75th perc. 0 tions. Perhaps this reflects an exchange of fingerprints used 0.0001 0.001 0.01 0.1 1 10 100 for compression or incremental backups or an exchange of Utilization (Mbps) validation information after the backup is finished. Alter- (b) Utilization natively, this may indicate that the protocol itself may have a peer-to-peer structure rather than a strict server/client de- lineation. Clearly this requires further investigation with Figure 9: Utilization distributions for D4. longer trace files. utilization. Figure 9(a) shows the distribution of the peak 6 Network Load usage over 3 different timescales for each trace in the D4 dataset. As expected, the plot shows the networks A final aspect of enterprise traffic in our preliminary inves- to be less than fully utilized at each timescale. The 1 sec- tigation is to assess the load observed within the enterprise. ond interval does show network saturation (100 Mbps) in One might naturally assume that campus networks are un- some cases. However, as the measurement time interval in- derutilized, and some researchers aim to develop mecha- creases the peak utilization drops, indicating that saturation nisms that leverage this assumption [19]. We assess this is short-lived. assumption using our data. Figure 9(b) shows the distributions of several metrics

Due to limited space, we discuss only D4, although the calculated over 1 second intervals. The “maximum” line on other datasets provide essentially the same insights about this plot is the same as the “1 second” line on the previous

                   Requests                      Data
                   D0/ent   D3/ent   D4/ent     D0/ent  D3/ent  D4/ent
Total              869765   219819   267942     712MB   345MB   222MB
Read               42%      44%      41%        82%     70%     82%
Write              1%       21%      2%         10%     28%     11%
FileDirInfo        27%      16%      26%        5%      0.9%    3%
File Open/Close    9%       2%       7%         0.9%    0.1%    0.5%
File Size          9%       7%       5%         0.2%    0.1%    0.1%
File Search        9%       7%       16%        1%      0.6%    4%
Directory Service  2%       0.7%     1%         0.7%    0.1%    0.4%
Other              3%       3%       2%         0.2%    0.1%    0.1%

Table 14: NCP requests breakdown.

Figure 9(b) shows the distributions of several metrics calculated over 1 second intervals. The "maximum" line on this plot is the same as the "1 second" line on the previous plot. The second plot concretely shows that typical (over time) network usage is 1–2 orders of magnitude less than the peak utilization and 2–3 orders less than the capacity of the network (100 Mbps).

We can think of packet loss as a second dimension for assessing network load. We can form estimates of packet loss rates using TCP retransmission rates. These two might not fully agree, due to (i) TCP possibly retransmitting unnecessarily, and (ii) TCP adapting its rate in the presence of loss, while non-TCP traffic will not. But the former should be rare in LAN environments (little opportunity for retransmission timers to go off early), and the latter arguably at most limits our analysis to applying to the TCP traffic, which dominates the load (cf. Table 3).
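The sketch below illustrates one way to compute per-trace retransmission rates of the kind plotted in Figure 10, including the exclusion of 1-byte keep-alive retransmissions discussed below. The per-packet record format and the retransmission flag are assumptions; the paper's results were produced with its own trace-analysis tools, not this code.

```python
# Sketch: per-trace TCP retransmission rate, split into internal (ENT) and
# wide-area (WAN) traffic, excluding spurious 1-byte keep-alive retransmissions.
# The (trace_id, is_internal, is_retx, payload_len) record format is assumed.
from collections import defaultdict

def retransmission_rates(tcp_packets, min_packets=1000):
    """tcp_packets: iterable of (trace_id, is_internal, is_retx, payload_len).
    Returns {(trace_id, 'ENT'|'WAN'): fraction of packets retransmitted},
    skipping trace/category pairs with fewer than `min_packets` packets."""
    sent = defaultdict(int)
    retx = defaultdict(int)
    for trace_id, is_internal, is_retx, payload_len in tcp_packets:
        if is_retx and payload_len == 1:
            continue  # likely a keep-alive (e.g., NCP, SSH); not real load
        key = (trace_id, "ENT" if is_internal else "WAN")
        sent[key] += 1
        if is_retx:
            retx[key] += 1
    return {k: retx[k] / sent[k] for k in sent if sent[k] >= min_packets}
```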

Figure 10: TCP Retransmission Rate Across Traces (for traces with at least 1000 packets in the category). The plot shows the fraction of retransmitted packets for each trace, separately for internal (ENT) and wide-area (WAN) traffic.

We found a number of spurious 1 byte retransmissions due to TCP keep-alives by NCP and SSH connections. We exclude these from further analysis because they do not indicate load imposed on network elements. Figure 10 shows the remaining retransmission rate for each trace in all our datasets, for both internal and remote traffic. In the vast majority of the traces, the retransmission rate remains less than 1% for both. In addition, the retransmission rate for internal traffic is less than that of traffic involving a remote peer, which matches our expectations since wide-area traffic traverses more shared, diverse, and constrained networks than does internal traffic. (While not shown in the Figure, we did not find any correlation between internal and wide-area retransmission rates.)

We do, however, find that the internal retransmission rate sometimes eclipses 2%, peaking at roughly 5% in one of the traces. Our further investigation of this last trace found the retransmissions dominated by a single Veritas backup connection, which transmitted 1.5 M packets and 2 GB of data from a client to a server over one hour. The retransmissions happen almost evenly over time, and over one-second intervals the rate peaks at 5 Mbps, with a 95th percentile of around 1 Mbps. Thus, the losses appear due to either significant congestion in the enterprise network downstream from our measurement point, or a network element with a flaky NIC (reported in [22] as not a rare event).

We can summarize these findings as: packet loss within an enterprise appears to occur significantly less often than across the wide-area network, as expected, but it exceeds 1% a non-negligible proportion of the time.

7 Summary

Enterprise networks have been all but ignored in the modern measurement literature. Our major contribution in this paper is to provide a broad, high-level view of numerous aspects of enterprise network traffic. Our investigation runs the gamut from re-examining topics previously studied for wide-area traffic (e.g., web traffic), to investigating new types of traffic not assessed in the literature to our knowledge (e.g., Windows protocols), to testing assumptions about enterprise traffic dynamics (e.g., that such networks are mostly underutilized).

Clearly, our investigation is only an initial step in this space. An additional hope for our work is to inspire the community to undertake more in-depth studies of the raft of topics concerning enterprise traffic that we could only examine briefly (or not at all) in this paper. Towards this end, we are releasing anonymized versions of our traces to the community [1].

Acknowledgments

We thank Sally Floyd for interesting discussions that led to § 6, Craig Leres for help with several tracing issues, and Martin Arlitt and the anonymous reviewers for useful comments on a draft of this paper. Also, we gratefully acknowledge funding support for this work from NSF grants 0335241, 0205519, and 0335214, and DHS grant HSHQPA4X03322.

References

[1] LBNL Enterprise Trace Repository, 2005. http://www.icir.org/enterprise-tracing/.
[2] W. Aiello, C. Kalmanek, P. McDaniel, S. Sen, O. Spatscheck, and K. van der Merwe. Analysis of Communities of Interest in Data Networks. In Proceedings of the Passive and Active Measurement Workshop, Mar. 2005.
[3] P. Barford and M. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In ACM SIGMETRICS, pages 151–160, July 1998.
[4] R. Cáceres. Measurements of Wide Area Internet Traffic. Technical report, 1989.
[5] R. Cáceres, P. Danzig, S. Jamin, and D. Mitzel. Characteristics of Wide-Area TCP/IP Conversations. In ACM SIGCOMM, 1991.
[6] P. Danzig, S. Jamin, R. Cáceres, D. Mitzel, and D. Estrin. An Empirical Workload Model for Driving Wide-Area TCP/IP Network Simulations. Internetworking: Research and Experience, 3(1):1–26, 1992.
[7] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Passive NFS Tracing of Email and Research Workloads. In USENIX Conference on File and Storage Technologies, 2003.
[8] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1, June 1999. RFC 2616.
[9] H. J. Fowler and W. E. Leland. Local Area Network Traffic Characteristics, with Implications for Broadband Network Congestion Management. IEEE Journal on Selected Areas in Communications, SAC-9:1139–1149, 1991.
[10] L. Gomes, C. Cazita, J. Almeida, V. Almeida, and W. Meira Jr. Characterizing a Spam Traffic. In Internet Measurement Conference, Oct. 2004.
[11] R. Gusella. A Measurement Study of Diskless Workstation Traffic on an Ethernet. IEEE Transactions on Communications, 38(9):1557–1568, Sept. 1990.
[12] J. Jung, E. Sit, H. Balakrishnan, and R. Morris. DNS Performance and the Effectiveness of Caching. In ACM SIGCOMM Internet Measurement Workshop, Nov. 2001.
[13] M. Lottor. Internet Growth (1981–1991), Jan. 1992. RFC 1296.
[14] B. Mah. An Empirical Model of HTTP Network Traffic. In Proceedings of INFOCOM '97, Apr. 1997.
[15] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP Throughput: A Simple Model and its Empirical Validation. In ACM SIGCOMM, Sept. 1998.
[16] V. Paxson. Empirically-Derived Analytic Models of Wide-Area TCP Connections. IEEE/ACM Transactions on Networking, 2(4):316–336, Aug. 1994.
[17] V. Paxson. Growth Trends in Wide-Area TCP Connections. IEEE Network, 8(4):8–17, July/August 1994.
[18] V. Paxson. Bro: A System for Detecting Network Intruders in Real-Time. Computer Networks, Dec. 1999.
[19] P. Sarolahti, M. Allman, and S. Floyd. Evaluating Quick-Start for TCP. May 2005. Under submission.
[20] S. Sen and J. Wang. Analyzing Peer-to-Peer Traffic Across Large Networks. In Internet Measurement Workshop, pages 137–150, Nov. 2002.
[21] A. Shaikh, C. Isett, A. Greenberg, M. Roughan, and J. Gottlieb. A Case Study of OSPF Behavior in a Large Enterprise Network. In IMW '02: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 217–230, New York, NY, USA, 2002. ACM Press.
[22] J. Stone and C. Partridge. When The CRC and TCP Checksum Disagree. In ACM SIGCOMM, Sept. 2000.
[23] G. Tan, M. Poletto, J. Guttag, and F. Kaashoek. Role Classification of Hosts within Enterprise Networks Based on Connection Patterns. In Proceedings of the USENIX Annual Technical Conference, June 2003.

Notes

1. Hour-long traces we made of ≈ 100 individual hosts (not otherwise analyzed here) have a makeup of 35–67% non-IPv4 packets, dominated by broadcast IPX and ARP. This traffic is mainly confined to the host's subnet and hence not seen in our inter-subnet traces. However, the traces are too low in volume for meaningful generalization.

2. Note that the figures in this paper are small due to space considerations. However, since we are focusing on high-level notions in this paper, we ask the reader to focus on the general shape and large differences illustrated rather than the small changes and minor details (which are difficult to discern given the size of the plots).

3. NCP is the Netware Control Protocol, a veritable kitchen-sink protocol supporting hundreds of message types, but primarily used within the enterprise for file sharing and print service.

4. We found three NCP connections with remote hosts across all our datasets!