Open Source Tools for Monitoring the MPLS nodes

Manish Anand

Electronics & Electrical Communication Engineering

IIT Kharagpur [email protected]

Project guide: Dr. N.P. Dhavale

DGM, INFINET Department

Institute of Development and Research in Banking Technology (IDRBT)

Road No. 1, Castle Hills, Masab Tank, Hyderabad – 500 057 http://www.idrbt.ac.in/

28 July,2012

1 | P a g e

CONTENTS

Certificate

Introduction…………………………………………………………………………………………………4

Knowing the Network…………………………………………………………………………………..5

What to monitor………………………………………………………………………………………....6

The Problem………………………………………………………………………………………………..10

Cacti……………………………………………………………………………………………………………..12

Cacti Installation……………………………………………………………………………………………13

Device Addition…………………………………………………………………………………………….18

Cacti Graphs………………………………………………………………………………………………….20

Cacti Plugins………………………………………………………………………………………………….22

Conclusion…………………………………………………………………………………………………….25

References…………………………………………………………………………………………………….26

2 | P a g e

CERTIFICATE

This is to certify that project report titled Open Source Networking Tools for monitoring the MPLS nodes submitted by Manish Anand of B.Tech. 2nd year, Dept. of Electronics & Electrical Communication Engineering,

IIT Kharagpur, is record of a bonafide work carried out by her under my guidance during the period 30th April 2012 to 30th June 2012 at Institute of

Development and Research in Banking Technology, Hyderabad.

The project work is a research study, which has been successfully completed as per the set objectives.

Dr. N.P. Dhavale

DGM, Infinet office

IDRBT,Hyderabad

3 | P a g e

INTRODUCTION

Networking as we see today has constantly evolved from merely just two computers connected through a wire to a broader platform of Internet which acclaims to connect one part of the world to the other. As a result modern computer networks tend to be large heterogeneous collections of computers, switches, routers and a large assortment of other devices. To a large degree, the growth of such networks is ad-hoc and based on the current and perceived future needs of the users. As networks get larger and faster, the job of monitoring and managing them gets more complex. However, the job of managing computer networks becomes increasingly more important as society becomes more dependent on computers and the Internet for everyday business tasks. Network downtime now costs significant amounts of money so it is important that network and system managers are aware of everything that is happening on the networks for which they are responsible. As a result a solitary network consultant monitoring network activity is required based on the Open Systems Interconnect (OSI) reference model proposed by International Telecommunications Union (ITU).This task is accomplished by using some form of tool to gather, analyze and represent information about a therefore, in general, network monitoring involves a set of tools to aid people to monitor and maintain computer networks.

With a resource this valuable, ensuring its availability is essential. It’s also challenging because of threats such as hackers, denial of service attacks, viruses, and information theft, all of which can lead to downtime, loss of data, and overall decreasing credibility and profitability. Additionally, the network is evolving drastically, with new technologies, devices, and strategies, such as virtualization and service-oriented architectures. That’s why network management is such an important function and capability for businesses of all sizes. If our business depends on our network, then network management is critical.

Network management is a broad functional area incorporating device monitoring, application management, security, ongoing maintenance, service levels, troubleshooting, planning, and other tasks – ideally all coordinated and overseen by an experienced and reliable network administrator. Yet even the most knowledgeable and capable network administrator is only as good as the network information that is visible, and that he or she can manage and act on. Administrators need to know what’s happening on their networks at all times, including real-time and historical information on usage, performance, and status of every device, application, and all data on the network.

This is the domain of network monitoring, the most critical function of network management. The only way to know if everything on our network is operating as it should is to monitor it continuously.

4 | P a g e

KNOWING THE NETWORK

Today’s networks can be astounding in their complexity. Routers, switches, and hubs link the multitude of workstations to critical applications on myriad servers and to the Internet. In addition, there are numerous security and communications utilities and applications installed, including firewalls, virtual private networks (VPNs), and spam and virus filters. These technologies span all verticals and companies of all sizes. Network management, therefore, is not confined to only certain industries or solely to large, public companies.

Understanding the composition and complexity of our network, and having the capacity to be informed of how all the individual elements are performing at any given time, is a key success factor in maintaining the performance and integrity of the network – and often of the business – as a whole. There are potentially thousands of data points to monitor on a network, and it is critical to be able to access meaningful, accurate, and current information at any given time. Network administrators need to feel confident that they know what’s happening on their network from end to end at any given point in time. It is critical to “know our network” at all times.

A network is no longer a monolithic structure. It includes the Internet, local area networks (LANs) , wide area networks (WANs), virtual LANS (VLANS), wireless networks, and all of the devices, servers, and applications that run on them. Whatever enables users to access and share information, utilize applications, and communicate with each other and with the outside world – either through voice, data, or images – is, in essence, our network. A network typically has both internal and external users, including employees, customers, partners, and other stakeholders. Suboptimal network performance affects companies in different ways, depending on the type of user. For example, if employees can’t access the applications and information they need to do their jobs, it means lost productivity and missed deadlines. When customers can’t complete transactions online, it means lost revenues and damaged reputation. And when strategic partners can’t collaborate or communicate with the company, it harms the relationship and affects their bottom line. Even stakeholders such as investors and analysts who can’t get the information they need when they need it will also look unfavorably at our company, leading to loour stock prices and loss of shareholder value.

The fact is, though, that networks are so complex that something will go wrong. Every component in the network represents a potential point of failure. That’s why it’s essential to implement redundancy and/or a failover strategy in order to minimize downtime. This way, if a server or fails, another one waiting idly until needed can automatically come online to mitigate the impact of the failed equipment.

Of course, not every problem can be addressed quite so proactively before any warning signs are apparent. However, if we can monitor network performance proactively in real time, we can identify problems before they become emergencies. An overloaded server, for example, can be replaced before it crashes – but only if we know that its utilization rate is increasing to such an extent that a crash is all but imminent. With network monitoring, we should know the status of everything on our network without having to watch it personally,

5 | P a g e and be able to take the timely action needed to minimize and, when necessary, quickly correct problems.

WHAT TO MONITOR AND WHY ?

For something as mission-critical as our network, it’s important to have the right information at the right time. Of primary importance is to capture status information about current network devices (e.g., routers and switches) and critical network servers. A network administrator also needs to know that essential services (e.g., email, website, and file transfer services) are consistently available.

The following table contains a representative list of some of the key types of network status information we need to know every minute of every day – and why ?

What to monitor? Why to monitor? Availability of network devices (such as The plumbing of a network keeps the switches, routers, servers, etc.). network running Availability of all critical services on our The whole network doesn’t have to be down network. to have a negative impact; loss of email, HTTP, or FTP server availability for even just one hour can shut a business down. Amount of disk space in use on our key Applications require disk capacity. It’s also servers important to be aware of any anomalous behavior in disk capacity, which can indicate a problem with a specific application or system Percentage of our routers’ maximum If we anticipate when we need to upgrade throughput utilized on average. before we feel the pain of needing to upgrade, we’ll minimize disruption to our business Average memory and processor utilization of If we wait until memory is used up, users will our key CPUs/servers. never let we forget it Function of firewalls, antivirus protection, There’s a difference between having update servers, and spyware/malware security, and having security that’s working. defenses. Availability of all network devices. Most networks are a combination of heterogeneous devices; we need to be able to monitor Windows, Linux, UNIX, and other types of servers, workstations, and printers.

6 | P a g e

When there are issues, we should be alerted immediately, either through audio alerts, on- screen displays, or emails automatically generated by the network monitoring solution. The sooner we know what is going on – and the more complete the information included with the alert – the sooner we can take corrective action. Alerts should announce not only when a problem has occurred (or a threshold is being approached), but also whenever a new application or piece of equipment is brought online. They should contain information about the device, the issue, and the event that triggered the notification.

At the same time, it’s important to generate only meaningful alerts and to minimize the number of alerts stemming from the same problem or event on the network. For example, we want the flexibility to configure the monitoring solution so that it doesn’t alert when scheduled maintenance downtime is initiated. And if availability to many devices is constrained because of a problem with a router or switch, eliminating dependent alerts enables the administrator to more effectively and efficiently diagnose the actual problem. Suppressing these dependencies decreases the information we have to assimilate and increases overall confidence in the alerts we do receive.

WHAT TO LOOK IN A NETWORK MONITORING APPLICATION

To really know our network, we need a network monitoring solution that can tell we what we need to know – in real time and from anywhere, anytime.

For businesses of all sizes, we also need a solution that’s easy to use, quick to deploy, and offers low total cost of ownership – yet also delivers all the features we need. We need a solution with comprehensive capabilities and the same reliability we expect from our network. If we want our network running at high availability, we need a proven solution that we can depend on as well.

Remember, we’re monitoring a lot of network components and we’re collecting a lot of information. In order to see things clearly and quickly, we need a solution that displays this data – including a network map, report data, alerts, historical information, problem areas, and other useful information – as a network operating center (NOC) dashboard.

As discussed earlier, alerts are important. However, they are like alarm clocks – we want them to go off when we need them to, not when we don’t. For example, just as we don’t want our alarm to go off on Saturday morning, we don’t want our network monitoring solution to alert we during planned service periods. We want to be able to program our weekly maintenance schedule into the system so it can distinguish between planned and unplanned downtime. In other words, no false alarms.

Networks have to run 24/7 regardless of what hours our employees work. And while our network generally stays in one location, our employees sometimes travel. Regardless, we need to be able to access our network monitoring solution anywhere, anytime. For that matter, different people will need to access the system for different reasons, and not everyone should be able to access the same level of information. We need a solution that affords role-based views, that assigns levels of permissions based on the user’s function in

7 | P a g e the organization. This not only makes the user more productive, it also adds an important layer of security around the information.

Finally, we should look for a solution that supports multiple methods of monitoring devices. SNMP (Simple Network Management Protocol) is a flexible technology that lets we manage and monitor network performance devices, troubleshoot problems, and better prepare for future network growth. Many network devices support SNMP, making it easy to monitor them using a solution that supports SNMP.

IDRBT NETWORK

IDRBT(Institute for Development and Research in Banking Technology) started INFINET( Indian Financial Network) . INFINET is the communication backbone for the Indian Banking and Financial Sector. It is a Closed User Group Network for the exclusive use of member banks and financial institutions and is the communication backbone for the National Payments System, which caters mainly to inter-bank applications like RTGS, Delivery Vs Payment , Government Transactions, Automatic Clearing House, etc.

The network is a hybrid one of terrestrial leased lines and VSATs(Very Small Aperture Terminal) was the main communication backbone for inter-bank requirements. Over the years, with the decline in prices of leased lines, the reliance on VSATs for running applications declined. The VSAT technology also matured over the years with the increase in the size of the market and the number of private VSAT operators. The terrestrial leased line market also underwent significant change with the introduction of MPLS / VPN service being offered by many service providers. Further, as the technology had matured, the need for IDRBT to play a role of intermediary between the banks and the commercial VSAT operators also diminished.

With the availability of better and more reliable technology in the form of Multi-Protocol Label Switching (MPLS), IDRBT decided to migrate the INFINET backbone to MPLS. The IP VPN MPLS network is an improvement over the Leased Line Network. The Leased Line Network is less scalable and since it is a partial mesh network, adding a new site to the network is difficult. Upgradation of bandwidth too is a time consuming and cumbersome process. Packet switching is sloour compared to MPLS and the quality of service for applications too is not of a high standard.

MPLS(Multi Protocol Label Switching)

The MPLS is a combination of packet forwarding and label switching through a network. It is an integration of high speed layer-2 switching with layer-3 routing using label switching. It improves efficient use of resources and enhances performance of the network. MPLS also enables easy to implement Quality of Service and Class of Service based on application needs. Moreover, implementation of IPSec tunnels (secure tunnels between which data is

8 | P a g e encrypted) between CPE (Customer Premises Equipment i.e. router) to CPE is easy. When an unlabeled packet enters the ingress router and needs to be passed on to an MPLS tunnel, the router first determines the forwarding equivalence class (FEC) the packet should be in, and then inserts one or more labels in the packet's newly-created MPLS header. The packet is then passed on to the next hop router for this tunnel.

Google images Fig. MPLS uses a combination of packet as well as circuit switching

IDRBT MPLS Architecture

The INFINET MPLS Architecture is uniquely designed to provide high-level redundancy. Its salient features are full meshed communications at all locations (backbone); two service providers to enable high speed fault tolerance; a VPN between two locations could be across service providers; all VPNs between CPEs will be encrypted; and availability of Quality of Service and Traffic Engineering on the last mile as well.

The INFINET MPLS network provides the performance characteristics of layer-2 networks and the connectivity and network services of layer-3 networks, improved scalability and easy upgradation of bandwidth through a configuration change at the provider-end and the time involved in upgrading the link is less.

The INFINET MPLS network provides for low latency since it involves minimal processing time at the router. The present SLA is for latency of not over 100ms. The architecture changes are underway to bring it to below 50 ms and even better for latency sensitive

9 | P a g e payment system applications as per user requirements. The INFINET MPLS improves the possibilities for Traffic Engineering and supports the delivery of services with Quality of Service (QoS) guarantees.

*

Fig. Basic operation of MPLS network at IDRBT

The problem-

IDRBT while hosting the INFINET has to take care of the communication backbone of the inter bank transfer and other activities. As a result they need to constantly need to monitor over the entire RBI locations all across the country and if for any link going down ,need to report the network administrator about it and take necessary actions . This calls for the use of a network monitoring tool quite robust and stable which can counter over these problems and comes with the worthwhile result.

The Approach

Network monitoring tool can be broadly categorized into two main segments-Open Source and Licensed version . Our project had been a group project under the common heading of “ Open source Networking Tools for monitoring the Mpls nodes” which saw the broad division of 13 networking tools both under open source as well as licensed version namely as described in the next page. IDRBT source material

10 | P a g e

Open Source Networking Tool

 Zabbix  Argus  Nagios  Cacti  NetDisco  Zenoss  Spiceworks  Open QRM  Open NMS  Frame Flow

Licensed Networking Tool

 OpManager  PRTG  NetFlow Analyser

On the part of the group project I was assigned the task to have a look over one of the tool and that was Cacti.

11 | P a g e

CACTI

Cacti is a complete frontend to RRDTool, it stores all of the necessary information to create graphs and populate them with data in a MySQL database. The frontend is completely PHP driven. Along with being able to maintain Graphs, Data Sources, and Round Robin Archives in a database, cacti handles the data gathering. There is also SNMP support for those used to creating traffic graphs with MRTG

Some of the important features-

• A tool to monitor, store and present network and system/server statistics • Designed around RRDTool (Round Robin Database)with a special emphasis on the graphical interface • Almost all of Cacti's functionality can be configured via the Web. • A tool to monitor, store and present network and system/server statistics • Designed around RRDTool with a special emphasis on the graphical interface • Almost all of Cacti's functionality can be configured via the Web.

General Description of Cacti

 Cacti is written as a group of PHP scripts.  The key script is “poller.php”, which runs every 5 minutes (by default). It resides in /usr/share/cacti/site  Cacti uses RRDtool to create graphs for each device and data that is collected about that device. You can adjust all of this from within the Cacti web interface.  The RRD files are located in /var/lib/cacti/rra.

Installation under ubuntu

– Available in RPM form and packages for Gentoo, Red Hat, Fedora, SuSE, FreeBSD,etc. – It is necessary to install cactid separately if you wish to use this for larger installations.Again, this code has not been formally measured for improved performance. – In Ubuntu/Debian…

# apt-get install cacti

12 | P a g e

Fig:- Setting the root password for Mysql

Fig;-Again confirming the password

Fig:- Informational Message

13 | P a g e

Fig:- In this case Apache2 web server is being used

Fig:- Here the Option has been selected to automatically configure database

Fig:- Database administrative password is set here

14 | P a g e

Fig:-Mysql database for cacti password is set here

Fig:- Confirmation of password takes here

Now the web browser is opened up and pointed to the following url- http://localhost/cacti

The following page comes up

15 | P a g e

Fig- is clicked upon

Fig:- is selected and is clicked over

16 | P a g e

Fig:- The following page comes over. We have to make sure that RRDTool 1.3x is selected

17 | P a g e

Fig- The first login page appears like this. For this username and password

Fig:- We are required to change the password for our next login

Adding devices to cacti

Fig:- basic console on login then click on tab to add devices to monitor

18 | P a g e

Fig:-Basic device addition web portal.Here SNMP version and other details oure made available by IDRBT

19 | P a g e

Creating Graphs

Fig:-On clicking Create Graphs for a particular host

Fig:- The template required for a particular graph is selected from the list

20 | P a g e

Fig:- The following graphs oure obtained of the various routers and switches

21 | P a g e

CACTI Plugins

As said before there are thousands of data point that can be monitored over. So specifically we oure asked to monitor the following details and to trigger alarms and notification if anything exceeded its upper or lower limit.

Measurement of following Network Parameters

 Latency(<100ms )

 Packet loss(<.5%)

 Link utilization(<75%)

 Link availability(> 99.99%)

 CPU utilization(<75%)

 Memory utilization(<75%)

 Alerts and alarms

Cacti basic installation comes with only with two segment-Console and Graph. To further add features we need to install plugins greatly concerned with what we require to monitor .The plugin addition is quite handy. The major plugin involved here was thold which greatly looks over threshold addition and alert generation. The different important plugins available are listed below:-

Name Description thold Threshold Alert Module discovery Auto Host Discovery mobile A simple display to see down hosts and alerts from your mobile device nectar This Plugin sends Graphs and Text to given mail address(es)

Thus the required plugins were installed and network was monitored over .

22 | P a g e

Fig:- Cacti with different plugins successfully installed

Fig- Thold Plugin web interface

23 | P a g e

Conclusion:-

Since our project was group project we came to conclusion the none of the tool was perfectly suitable for the given set of parameters. So the entire group sat together and discussed the pros and cons of each tool and came to the final result of integrating the four best networking tools over the servers of IDRBT

• Cacti is very flexible due to its use of templates.

• Once you understand the concepts behind RRDTool,then how Cacti works should be (more or less) intuitive.

• The visualization hierarchy of devices helps to organize and discover new devices quickly.

• There are very few to no statistics available about the performance of cactid .

• It is not easy to do a rediscover of devices.

• To add lots of devices requires lots of time and effort.

24 | P a g e

Difficulty of InstallationUsability o/s Hardware reqiurementsOpen Sourcedash-boardgraphs facality Alerts

Argus Complicated Not Good Ubuntu nothing special Y N Y Y

Zabbix simple friendly Ubuntu nothing special Y Y Y Y openNMS Complicated Friendly Ubuntu nothing special Y Y Y Y

Nagios Conplicated Friendly Ubuntu nothing special Y Y N Y

Cacti Simple friendly windows/linuxnothing special Y Y Y Y

NetDisco Complicated not good Ubuntu nothing special Y Y N N

2.4 Ghz Dual Core Processo r,2 GB RAM,250 GB Hard Disk Space

Netflow Analysersimple friendly Windows XP & above/Linux N Y Y Y

PRTG Simple User FriendlyWindows XPminimum or later 1024 MB RAMN memoryY Y Y opManagerSimple user friendlyWindows 2 GHz processor,4 GB Nmemory, 20GBY hard diskY space Y

SpiceworksSimple Friendly Windows/Ubuntu N Y N Y

Zenoss Simple Friendly Ubuntu nothing special Y Y Y Y

Frame FlowSimple Friendly windows nothing special Y Y Y Y

OpenQRMSimple Friendly Ubuntu/linuxnothing special Y Y Y Y

25 | P a g e

notificationsPing TestPacket LossCRC Error RTT Latency Jitter Memory UtilizationCPU Utilization

Argus Y Y N N N Y N N N

Zabbix Y Y Y Y Y Y N Y Y openNMSY Y N N N N N N N

Nagios Y Y Y Y Y N N

Cacti Y Y N N N N N Y Y

NetDiscoN N N N N N N N N

Netflow YAnalyser N Y N Y Y Y Y Y

PRTG Y Y N N N N Y Y Y opManagerY N Y N N Y Y Y Y

SpiceworksY Y N N N N N N N

Zenoss Y Y N N N N N Y Y

Frame FlowY Y Y N N N N Y Y

OpenQRMY Y Y Y Y Y Y Y

26 | P a g e

Protocol Status Link Availability Link UtilizationConfigurationNetwork Monitoring MappingLicense Amount

Argus N N N N NA

Zabbix Y Y Y Y N NA openNMS Y Y Y Y N NA

Nagios N Y N N N NA

Cacti N N Y N Y NA

NetDisco N N N N Y NA

Netflow AnalyserY Y Y N Y Rs. 457440 (INR)

PRTG Y Y Y N N 2230 USD for a pack including 1000 sensors (and apart from this software maitenance and VAT to be included / Overall 2653 USD opManagerN N N N Y 1,43,640

SpiceworksN Y Y Y Y NA

Zenoss N N N N Y NA

Frame FlowN N N N Y NA - plugins not free

OpenQRMY Y Y Y N NA

27 | P a g e

References:-  www.cacti.net

 http://docs.cacti.net/plugins

 Networking Essentials- Microsoft(Publication House)

 http://en.wikipedia.org/wiki/Cacti_(software)

 http://forums.cacti.net/

 Cacti manual- http://www.cacti.net/downloads/docs/html/

28 | P a g e