A Scalable Architecture for Network Traffic Monitoring and Analysis Using Free Open Source Software
Total Page:16
File Type:pdf, Size:1020Kb
Int. J. Communications, Network and System Sciences, 2009, 6, 528-539 doi:10.4236/ijcns.2009.26058 Published Online September 2009 (http://www.SciRP.org/journal/ijcns/). A Scalable Architecture for Network Traffic Monitoring and Analysis Using Free Open Source Software Olatunde ABIONA1, Temitope ALADESANMI2, Clement ONIME3, Adeniran OLUWARANTI4, Ayodeji OLUWATOPE5, Olakanmi ADEWARA6, Tricha ANJALI7, Lawrence KEHINDE8 1Department of Computer Information Systems, Indiana University Northwest, Garry, USA 2,6 Information Technology and Communications Unit, Obafemi Awolowo University, Nigeria 3Information and Communication Technology Section, Abdus Salam International Centre for Theoretical Physics, Trieste, Italy 4,5 Department of Computer Science and Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria 7Electrical and Computer Engineering Department, Illinois Institute of Technology, Chicago, USA 8Department of Engineering Technologies, Texas Southern University, Houston, Texas, USA Email: [email protected], {2taladesanmi,4aranti,5aoluwato,6kanmiade}@oauife.edu.ng, [email protected], [email protected], [email protected] Received February 5, 2009; revised April 8, 2009; accepted June 10, 2009 ABSTRACT The lack of current network dynamics studies that evaluate the effects of new application and protocol de- ployment or long-term studies that observe the effect of incremental changes on the Internet, and the change in the overall stability of the Internet under various conditions and threats has made network monitoring challenging. A good understanding of the nature and type of network traffic is the key to solving congestion problems. In this paper we describe the architecture and implementation of a scalable network traffic moni- toring and analysis system. The gigabit interface on the monitoring system was configured to capture net- work traffic and the Multi Router Traffic Grapher (MRTG) and Webalizer produces graphical and detailed traffic analysis. This system is in use at the Obafemi Awolowo University, Ile-Ife, Nigeria; we describe how this system can be replicated in another environment. Keywords: Scalable Network Monitoring, Traffic Analysis, Web Log Analysis, Open Source 1. Introduction upgrade is done the congestion problem will improve for a while and later gradually deteriorate as the users The rapid growth of the Internet in size, complexity and change their behavior in response to the upgrade. The traffic types has made network management a challeng- alternative solution to this problem is to deploy a scal- ing task. The ability of a monitoring system to provide able network traffic monitoring and analysis system, in accurate information about the nature and type of the order to understand the dynamics of the traffic and network traffic can not be over emphasized. Information changes in the internet and overall stability of the net- about who is generating the most traffic, what protocols work. In addition to knowing the health status of the are in use, where is the traffic originating from or where network, monitoring of network activity also has the is the destination of the traffic can be very important to benefits of detecting denial of service (DoS) and band- solving congestion problems. Many network administra- width theft attacks. In order to conduct analysis of wide tors spend a lot of time, trying to know what is degrading range of network behaviors, it is necessary to collect the performance of their network. network traffic on a continuous basis rather than as a one A typical solution to congestion problem is to upgrade time event which only captures transient behaviors that network infrastructure, i.e. replace servers with high end provides insight into network problems. Collecting long servers and increase the bandwidth. This solution is ex- term network traffic data will provide valuable informa- pensive, short term and does not scale. As soon as the tion for improving and understanding the actual network Copyright © 2009 SciRes. IJCNS A SCALABLE ARCHITECTURE FOR NETWORK TRAFFIC MONITORING AND ANALYSIS 529 USING FREE OPEN SOURCE SOFTWARE dynamics. for web pages. These graphs can be viewed using a web The rest of the paper is organized as follows: in Sec- browser. Although MRTG gives a graphical overview, it tion 2, we review related work, followed by the system however does not give details about the host and protocol design in Section 3. In Section 4 we describe the imple- responsible for the traffic monitored. Windmill [10] is a mentation and configuration of the system. In Section 5, modular system for monitoring network protocol events; we present the example application of the system and in it is useful for acquiring the data from the network, but it Section 6 we conclude the paper. is however limited in its capability by not providing any facility to aid in the analysis of those events or non pro- 2. Related Work tocol events acquired. WebTrafMon [11] uses a probe to extract data from network packets and composes log files. The libpcap [1] tool has greatly simplified the task of Analysis results are based on the collected log files. Fur- acquiring network packets for measurement. The limita- thermore the user is able to view the analysis result via a tion of the tool is its inability to analyze the captured data, generic web browser. WebTrafMon can show traffic it will only capture the data and the programmer or net- information according to the source and destination host work administrator is left to carry out analysis manually. through any web interface; it can also show the traffic This task can be time consuming and cumbersome and in status according to each protocol in use. Although Web- most cases accurate information about the network is not TrafMon has good capabilities, it can not monitor and obtained. Some researchers have developed modular analyze traffic in a switched network such as Fast software architectures for extensible system [2,3], how- Ethernet and Gigabit Ethernet. ever only a few of these systems are optimized to handle The continuous Query Systems such as [12] and [13] large amount of data and continuous monitoring. Other share many of the concerns of other systems in acquiring researchers have developed systems for streaming data and filtering continuous streams of data, this system through protocol layers [4] and routing functions [5], but however lack the ability to easily add new functions over not much attention has been given to the analysis of the data, hence they are not extensible. The agile and large/huge or broad data collected over time. scalable analysis of network events is a system based on Simple Network Management Protocol (SNMP) cov- modular analysis and continuous queries [14] allowing ers a class of tools such as Multi Router Traffic Grapher users to assemble modules into an efficient system for (MRTG) and Cricket [6], which collect counter statistics analyzing multiple types of streaming data. The system is from network infrastructures, and visualizes these statis- optimized for analyzing and filtering large streams of tics by means of graphs. The most common use for these data, and makes extensive use of polymorphic compo- tools is graphing the InOctet and OutOctet counters on nents that can perform common functions on new and router interfaces, which respectively provide counts of unforeseen types of data without requiring any additional the number of bytes passing in and out of the interface. programming. This data analysis system provides a Many other tools also support SNMP monitoring [7]. scalable, flexible system for composing ad-hoc analyses This includes monitoring performed by Remote Moni- of high speed streaming data but does not provide infra- toring (RMON) agents, which can be useful in deter- structures for data gathering and network monitoring. mining the top hosts with regards to traffic, as well as the Ntop [15] provides a display similar to the UNIX top distribution of packet sizes on a network. SNMP can command, but for network traffic, it can be used for traffic only be used to monitor devices that are SNMP managed, measurement and monitoring. Features such as the em- the reading and writing of the In Octet and OutOctet bedded HTTP server, support for various network media counters in a router could generate substantial traffic. types, light CPU utilization, portability across various Tcpdump [8] prints out the headers of packets on a net- platforms, and storage of traffic information into an SQL work interface that match the boolean expression. It is a database makes ntop versatile. However, ntop is limited network sniffer with in-built filtering capabilities; it can by its high memory requirements when operating in a only collect the data from the network, but does not ana- continuous monitoring environment. The extensive cache lyze collected data. The collected data can be analyzed usage has the drawback that memory usage is increased offline with another utility namely, tcpshow and tcptrace. [16]. This makes ntop a memory and computational in- As useful and powerful as tcpdump is, it is only suitable tensive application. Like several of the other tools, ntop for troubleshooting i.e., for tracking network and proto- uses the same packet capture library to obtain the net- col related connectivity problems. work data. Since ntop operates in continuous mode, it is MRTG is a versatile tool for graphing network data designed to operate on networks with speeds of less than [9], this tool can run on a Web server. Every five minutes, 100 Mbps [17]. A continuous network tracing infra- it reads the inbound and outbound octet counter of the structure was described in [18], it is multi-user based and gateway router, and then logs the data to generate graphs capable of collecting archives and analyzing network Copyright © 2009 SciRes. IJCNS 530 O. ABIONA ET AL. data captured. The system is limited by not providing a universal web interface to display the results of its monitoring. The system requires huge storage, 11TB of shared disk space for data repository since it is not web based and also there is high overhead since the storage system is accessed through an NFS server over an Ethernet link.