Distributed Monitoring with OpenNMS
Distributed environment, distributed monitoring, central visibility

Executive Summary

As the edges of enterprise networks expand with more devices, processes, services, and locations, so do the challenges of distributed monitoring. Highly distributed networks present issues such as security, privacy, reachability, and latency that make the monitoring, collection, and processing of large volumes of data difficult.

This white paper explores some of the challenges to effective monitoring in distributed network environments, and solutions to address them:

• Distributed data collection to monitor systems and networks that are otherwise inaccessible
• Digital experience monitoring (DEM) from different perspectives to provide a better understanding of local conditions
• Dynamic scaling to adapt to changing network conditions and volumes of data collected for processing and storage
• Data visualization and alarm correlation to better understand the data collected and improve response times
• Customization for your unique monitoring, workflow, and personnel needs

Introduction

As the edges of enterprise networks expand with more devices, processes, services, and locations, so do the challenges of distributed monitoring. Keeping up with this type of growth presents unique challenges: how to monitor everything you need to and effectively process and interpret the volume of data such monitoring produces, given the issues (security, privacy, reachability, and latency) that highly distributed networks present.
To ensure maximum uptime and optimal performance of your network, you need to be able to do the following:

• Monitor and collect data from anywhere, including remote and restricted locations
• Monitor digital experience (DEM) from different perspectives
• Scale to process large volumes of data
• View data from a central location, with one tool
• Correlate and categorize alarms
• Customize your monitoring environment for your unique monitoring needs
• Delegate issues to the right people at the right time
• Store data for analysis to predict and adapt

Monitor and collect data from anywhere

Infrastructure, services, and applications located at remote sites within large, distributed corporate networks can be difficult, if not impossible, to reach and monitor from a central location such as a data center or the cloud. Specific challenges include firewalls, network address translation (NAT) traversal, overlapping IP address ranges, and locked-down environments.

A network monitoring platform needs to be deployable in a distributed configuration to provide reach into systems and networks that would otherwise be inaccessible, while keeping the monitoring logic centralized for easier operation and administration. The OpenNMS Minion provides access to the inaccessible, with the resilience and scalability to expand monitoring capabilities as your network expands.

Comprehensive fault, performance, and traffic monitoring

OpenNMS Horizon/Meridian offers comprehensive fault, performance, and traffic monitoring, as well as alarm generation, for your entire network from one central place. Using a host of protocols, from SNMP to NetFlow to gRPC and more, OpenNMS collects data on the devices, interfaces, and services you define during provisioning. It triggers alarms when it detects a problem and stores the metrics it collects, so you can analyze trends for better capacity management and network optimization.
Minion acts as the eyes and ears of OpenNMS, extending its reach so it can:

• Operate behind firewalls and NAT
• Handle overlapping address spaces with a separate Minion in each space
• Provide resilient deployments with multiple Minions per location
• Scale ingestion of flow, trap, and syslog messages horizontally with multiple Minions per location
• Scale flow processing with OpenNMS Sentinel

How it works

Minion is a stateless service that runs in the lightweight Karaf application container and communicates with devices and services in remote locations, while OpenNMS Core maintains state and performs coordination and task delegation. A location defines a network area associated with a Minion: an isolated network in a data center, a department, a branch office, or a customer’s network.

Minions can operate behind a firewall and/or NAT as long as they can communicate with OpenNMS via an ActiveMQ or Apache Kafka message broker or through gRPC. Because it is stateless and simple in design, a Minion is easy to maintain, horizontally scalable, and easy to orchestrate.

Figure: Sample Minion Configuration (Remote Location A and Remote Location B, each with a Minion and its monitored elements)

The Minion connects to an OpenNMS REST endpoint for initial provisioning and to update its configuration. The REST endpoint can be secured with HTTPS and is authenticated with a username and password.

The messaging system provides a second communication channel for the actual job of monitoring. When a device sends a message such as an SNMP trap to Minion, the trap is transformed into a Minion message and pushed to the message broker. OpenNMS listens on the location queues and transforms the message from the Minion into an OpenNMS event, which appears in the central OpenNMS UI.
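The trap flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenNMS code: an in-memory `queue.Queue` stands in for the message broker, and the names `MinionMessage`, `minion_forward_trap`, and `core_consume` are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class MinionMessage:
    """Hypothetical envelope that a Minion pushes to the broker."""
    location: str    # the Minion's location, e.g. "branch-office"
    source_ip: str   # device that sent the trap
    trap_oid: str    # trap identifier as received from the device


# Stand-in for the message broker: one queue per location (assumption).
broker = {"branch-office": queue.Queue()}


def minion_forward_trap(location, source_ip, trap_oid):
    """Minion side: wrap the trap and push it to the location's queue.

    The Minion keeps no record of the trap after forwarding it.
    """
    broker[location].put(MinionMessage(location, source_ip, trap_oid))


def core_consume(location):
    """Core side: drain the location queue and turn messages into events."""
    events = []
    location_queue = broker[location]
    while not location_queue.empty():
        msg = location_queue.get()
        events.append({
            "type": "trap-event",
            "location": msg.location,
            "source": msg.source_ip,
            "oid": msg.trap_oid,
        })
    return events


# A device at the branch office sends a linkDown trap to its local Minion.
minion_forward_trap("branch-office", "10.0.0.5", "1.3.6.1.6.3.1.1.5.3")
events = core_consume("branch-office")
print(events)
```

In this sketch every message carries its location, so the same source address reported from two different locations would remain distinguishable on the central side.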
Figure: OpenNMS - Minion messaging (REST, Minion, SNMP trap)

Minion also checks for messages from OpenNMS that carry monitoring tasks (remote procedure calls), and sends the responses back on the response queue in the message broker.

Monitor digital experience from different perspectives

Understanding location-specific conditions makes it easier to pinpoint not only where an issue occurs, but also its impact on a user’s (or machine’s) digital experience. When your central New York location can see the availability of a service hosted in Houston as accessed from Seattle, you can identify the perspective from which an outage occurs and troubleshoot the problem more easily.

Application Perspective Monitoring (APM)

APM uses the Minion infrastructure to monitor a service or application (central or external) from each Minion’s location, allowing you to view the reachability of a service from many different perspectives. When a service is not responsive, OpenNMS generates an outage record that includes the perspective that identified the outage. APM can combine these perspectives to provide a holistic view of the application or service.

With APM you can perform digital experience monitoring (DEM) of corporate services and applications from many different physical, geographical, or logical locations that are representative of a client’s perspective. Testing availability and measuring latency from different locations provides a better understanding of local conditions. The Minion’s ability to monitor from remote locations is what makes APM possible.

How it works

Implementing APM requires at least one Minion set up on your network and a simple configuration through the OpenNMS web UI. Configure one or more Minions to monitor the services from specific locations. In the OpenNMS database model, an application combines several monitored services and contains references to locations. The application also references an optional polling package that users can customize.
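The data model described above can be pictured roughly as follows. This is an illustrative sketch only: the class and field names (`Application`, `MonitoredService`, `Outage`, `perspective`) are assumptions made for this example and do not reflect the actual OpenNMS schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MonitoredService:
    node: str   # host that runs the service (hypothetical field)
    name: str   # e.g. "HTTP"


@dataclass
class Application:
    name: str
    services: list = field(default_factory=list)      # monitored services
    perspectives: list = field(default_factory=list)  # referenced locations
    polling_package: Optional[str] = None             # optional, customizable


@dataclass
class Outage:
    service: MonitoredService
    perspective: str   # location whose Minion detected the outage


def perspectives_reporting(outages, service):
    """Return the sorted locations from which `service` was seen down."""
    return sorted({o.perspective for o in outages if o.service == service})


web = MonitoredService(node="web01", name="HTTP")
app = Application(
    name="Intranet",
    services=[web],
    perspectives=["Seattle", "Houston", "New York"],
)

# Only the Seattle Minion has recorded an outage for the service.
outages = [Outage(web, "Seattle")]
down_from = perspectives_reporting(outages, web)
still_up_from = [p for p in app.perspectives if p not in down_from]
print(down_from)       # ['Seattle']
print(still_up_from)   # ['Houston', 'New York']
```

Keeping the perspective on each outage record, rather than on the service, is what lets one service be reported down from one location and up from the others at the same time.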
When a remote outage occurs, OpenNMS registers the outage and includes the location from which the outage was detected. This enables you to see the perspective from which an outage occurred, and to filter for local-only or remote-only outages.

The following diagram illustrates a sample use scenario for APM. Three locations each have a Minion: Stuttgart, Raleigh, and Ottawa. Each Minion is configured to monitor an HTTP service. If the HTTP service goes down on one of the servers, the OpenNMS UI displays the outage, including the locations (perspectives) from which the outage was detected.

Figure: Sample APM Configuration (Minions at locations Stuttgart, Raleigh, and Ottawa; monitored services; management traffic and monitoring traffic)

APM provides granularity for detecting outages. If only some locations detect an outage, the issue could lie between those perspective locations and the monitored location, rather than at the monitored location itself (since other locations still see the service operating).

Topology view with APM

OpenNMS extends the APM feature with effective visualization. The topology view in the UI displays the applications and services for each location, and includes service status from the perspective of the location monitoring them. A table below the topology map provides detailed status.

Scale to process large volumes of data

A network monitoring system must be able to collect and process tens of thousands of data points per second. Of course, networks are not static: the volume of data you process increases as your network expands, and changes with fluctuations in network traffic, peak/off-peak hours, and other factors. An NMS that can scale dynamically to collect and process large volumes of data helps administrators respond to the most current issues in a timely manner.
OpenNMS Minion increases the total scale of your monitoring system by distributing the data collection load instead of handling it on one server with an OpenNMS instance. Minion is stateless software that users can containerize and deploy alone or in groups in various network locations to provide a secure and simple communications infrastructure. The ability to use more than one Minion per location provides resiliency